Merge tag 'rust-6.9' of https://github.com/Rust-for-Linux/linux
author Linus Torvalds <torvalds@linux-foundation.org>
Mon, 11 Mar 2024 19:31:28 +0000 (12:31 -0700)
committer Linus Torvalds <torvalds@linux-foundation.org>
Mon, 11 Mar 2024 19:31:28 +0000 (12:31 -0700)
Pull Rust updates from Miguel Ojeda:
 "Another routine one in terms of features. We got two version upgrades
  this time, but in terms of lines, 'alloc' changes are not very large.

  Toolchain and infrastructure:

   - Upgrade to Rust 1.76.0

     This time around, due to how the kernel and Rust schedules have
     aligned, there are in fact two upgrades. These allow us to remove
     two more unstable features ('const_maybe_uninit_zeroed' and
     'ptr_metadata') from the list, among other improvements
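
     As a loose illustration of one of those removals: with Rust >= 1.75,
     'MaybeUninit::zeroed()' is usable in const context on stable, which
     is what 'const_maybe_uninit_zeroed' used to gate (a sketch, not the
     kernel's actual call site):

       use core::mem::MaybeUninit;

       // Compiles on stable Rust >= 1.75; previously this needed
       // #![feature(const_maybe_uninit_zeroed)].
       const ZEROED: MaybeUninit<[u8; 16]> = MaybeUninit::zeroed();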

   - Mark 'rustc' (and others) invocations as recursive, which fixes a
     new warning and prepares us for the future in case we eventually
     take advantage of the Make jobserver

  'kernel' crate:

   - Add the 'container_of!' macro
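
     A minimal userspace sketch of the idea behind 'container_of!' (the
     in-kernel macro is more general and is meant for unsafe callers);
     'Outer'/'Inner' are made-up illustration types, and 'offset_of!'
     here is the one stabilized in Rust 1.77:

       use core::mem::offset_of;

       struct Inner { _x: u64 }
       struct Outer { _a: u32, inner: Inner }

       // Recover a pointer to the containing `Outer` from a pointer
       // to its `inner` field by stepping back the field's offset.
       fn outer_of(inner: *const Inner) -> *const Outer {
           (inner as *const u8)
               .wrapping_sub(offset_of!(Outer, inner))
               .cast::<Outer>()
       }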

   - Stop using the unstable 'ptr_metadata' feature by employing the now
     stable 'byte_sub' method to implement 'Arc::from_raw()'
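
     A loose, non-generic sketch of the trick (the real 'Arc::from_raw()'
     is generic and obtains the offset differently); 'ArcInner' below is
     an assumed stand-in layout:

       use core::mem::offset_of;

       #[repr(C)]
       struct ArcInner { refcount: usize, data: u64 }

       // SAFETY contract (assumed): `data` points at the `data` field
       // of a live `ArcInner`. `byte_sub` is stable since Rust 1.75,
       // which is what made the unstable `ptr_metadata` feature
       // unnecessary here.
       unsafe fn inner_from_data(data: *const u64) -> *const ArcInner {
           unsafe { data.byte_sub(offset_of!(ArcInner, data)).cast() }
       }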

   - Add the 'time' module with a 'msecs_to_jiffies()' conversion
     function to begin with, to be used by Rust Binder
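
     A rough userspace approximation of the conversion (the in-kernel
     function delegates to the C helper, with its rounding and overflow
     handling; 'HZ' below is an assumed config value):

       const HZ: u64 = 250; // assumed CONFIG_HZ

       // Round up so a nonzero timeout never truncates to 0 jiffies.
       fn msecs_to_jiffies(msecs: u64) -> u64 {
           (msecs * HZ).div_ceil(1000)
       }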

   - Add 'notify_sync()' and 'wait_interruptible_timeout()' methods to
     'CondVar', to be used by Rust Binder
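
     In std terms, 'wait_interruptible_timeout()' enables the familiar
     "wait with a timeout, then tell wake-up from timeout" pattern; a
     userspace analogue only, since the kernel 'CondVar' works with
     kernel locks and also reports pending signals:

       use std::sync::{Condvar, Mutex};
       use std::time::Duration;

       fn wait_ready(lock: &Mutex<bool>, cv: &Condvar) -> bool {
           let guard = lock.lock().unwrap();
           let (guard, res) = cv
               .wait_timeout_while(guard, Duration::from_millis(100),
                                   |ready| !*ready)
               .unwrap();
           // True only if we were actually woken with the flag set.
           !res.timed_out() && *guard
       }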

   - Update integer types for 'CondVar'

   - Rename 'wait_list' field to 'wait_queue_head' in 'CondVar'

   - Implement 'Display' and 'Debug' for 'BStr'
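
     The rough shape of such an impl, on a stand-in wrapper type (the
     in-kernel 'BStr' version differs in detail):

       use core::fmt;

       struct BStr<'a>(&'a [u8]);

       impl fmt::Display for BStr<'_> {
           fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
               for &b in self.0 {
                   match b {
                       // Printable ASCII passes through verbatim...
                       0x20..=0x7e => write!(f, "{}", b as char)?,
                       // ...anything else is hex-escaped.
                       _ => write!(f, "\\x{:02x}", b)?,
                   }
               }
               Ok(())
           }
       }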

   - Add the 'try_from_foreign()' method to the 'ForeignOwnable' trait
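
     The provided method boils down to a NULL-tolerant wrapper around
     'from_foreign()' (signatures simplified from the in-kernel trait):

       trait ForeignOwnable: Sized {
           unsafe fn from_foreign(ptr: *const core::ffi::c_void) -> Self;

           unsafe fn try_from_foreign(ptr: *const core::ffi::c_void)
               -> Option<Self>
           {
               if ptr.is_null() {
                   None
               } else {
                   // SAFETY: same contract as `from_foreign`,
                   // delegated to the caller.
                   Some(unsafe { Self::from_foreign(ptr) })
               }
           }
       }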

   - Add reexports for macros so that they can be used from the right
     module (in addition to the root)
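
     A small crate-level illustration of the pattern: '#[macro_export]'
     always places a macro at the crate root, so a 'pub use' gives it
     the module path users expect ('new_thing!' is a made-up example):

       #[macro_export]
       macro_rules! new_thing {
           () => { 42 };
       }

       pub mod things {
           // Now both `crate::new_thing!` and
           // `crate::things::new_thing!` work.
           pub use crate::new_thing;
       }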

   - A series of code documentation improvements, including adding
     intra-doc links, consistency improvements, typo fixes...

  'macros' crate:

   - Place generated 'init_module()' function in '.init.text'
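
     Roughly what the generated entry point now looks like (simplified;
     the actual 'module!' expansion carries more): placing it in
     '.init.text' lets the kernel discard it once the module has
     finished initializing:

       #[link_section = ".init.text"]
       #[no_mangle]
       pub extern "C" fn init_module() -> core::ffi::c_int {
           // ...the real expansion calls the module's `init()` here...
           0
       }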

  Documentation:

   - Add documentation on Rust doctests and how they work"

* tag 'rust-6.9' of https://github.com/Rust-for-Linux/linux: (29 commits)
  rust: upgrade to Rust 1.76.0
  kbuild: mark `rustc` (and others) invocations as recursive
  rust: add `container_of!` macro
  rust: str: implement `Display` and `Debug` for `BStr`
  rust: module: place generated init_module() function in .init.text
  rust: types: add `try_from_foreign()` method
  docs: rust: Add description of Rust documentation test as KUnit ones
  docs: rust: Move testing to a separate page
  rust: kernel: stop using ptr_metadata feature
  rust: kernel: add reexports for macros
  rust: locked_by: shorten doclink preview
  rust: kernel: remove unneeded doclink targets
  rust: kernel: add doclinks
  rust: kernel: add blank lines in front of code blocks
  rust: kernel: mark code fragments in docs with backticks
  rust: kernel: unify spelling of refcount in docs
  rust: str: move SAFETY comment in front of unsafe block
  rust: str: use `NUL` instead of 0 in doc comments
  rust: kernel: add srctree-relative doclinks
  rust: ioctl: end top-level module docs with full stop
  ...

2606 files changed:
.mailmap
CREDITS
Documentation/ABI/testing/sysfs-class-net-queues
Documentation/ABI/testing/sysfs-class-net-statistics
Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon
Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon
Documentation/ABI/testing/sysfs-nvmem-cells
Documentation/ABI/testing/sysfs-platform-silicom
Documentation/RCU/checklist.rst
Documentation/RCU/rcu_dereference.rst
Documentation/RCU/whatisRCU.rst
Documentation/accel/introduction.rst
Documentation/admin-guide/kernel-parameters.rst
Documentation/admin-guide/kernel-parameters.txt
Documentation/admin-guide/kernel-per-CPU-kthreads.rst
Documentation/arch/arm64/silicon-errata.rst
Documentation/arch/x86/mds.rst
Documentation/conf.py
Documentation/dev-tools/kselftest.rst
Documentation/dev-tools/kunit/usage.rst
Documentation/devicetree/bindings/Makefile
Documentation/devicetree/bindings/ata/ceva,ahci-1v84.yaml
Documentation/devicetree/bindings/clock/google,gs101-clock.yaml
Documentation/devicetree/bindings/display/bridge/nxp,tda998x.yaml
Documentation/devicetree/bindings/display/samsung/samsung,exynos-mixer.yaml
Documentation/devicetree/bindings/gpio/xlnx,zynqmp-gpio-modepin.yaml
Documentation/devicetree/bindings/media/cnm,wave521c.yaml
Documentation/devicetree/bindings/net/marvell,prestera.yaml
Documentation/devicetree/bindings/net/renesas,ethertsn.yaml
Documentation/devicetree/bindings/reset/xlnx,zynqmp-reset.yaml
Documentation/devicetree/bindings/sound/allwinner,sun4i-a10-spdif.yaml
Documentation/devicetree/bindings/sound/google,sc7280-herobrine.yaml
Documentation/devicetree/bindings/sound/nvidia,tegra-audio-max9808x.yaml
Documentation/devicetree/bindings/tpm/tpm-common.yaml
Documentation/devicetree/bindings/ufs/samsung,exynos-ufs.yaml
Documentation/devicetree/bindings/usb/dwc3-xilinx.yaml
Documentation/devicetree/bindings/usb/microchip,usb5744.yaml
Documentation/devicetree/bindings/usb/xlnx,usb2.yaml
Documentation/driver-api/dpll.rst
Documentation/filesystems/files.rst
Documentation/filesystems/index.rst
Documentation/filesystems/locking.rst
Documentation/filesystems/ntfs.rst [deleted file]
Documentation/filesystems/overlayfs.rst
Documentation/filesystems/vfs.rst
Documentation/kbuild/Kconfig.recursion-issue-01
Documentation/netlink/specs/dpll.yaml
Documentation/netlink/specs/rt_link.yaml
Documentation/networking/devlink/devlink-port.rst
Documentation/networking/net_cachelines/inet_sock.rst
Documentation/networking/net_cachelines/net_device.rst
Documentation/networking/net_cachelines/tcp_sock.rst
Documentation/process/cve.rst [new file with mode: 0644]
Documentation/process/index.rst
Documentation/process/maintainer-netdev.rst
Documentation/process/security-bugs.rst
Documentation/sphinx/kernel_feat.py
Documentation/sphinx/templates/kernel-toc.html
Documentation/sphinx/translations.py
Documentation/usb/gadget-testing.rst
Documentation/userspace-api/ioctl/ioctl-number.rst
Documentation/virt/hyperv/index.rst
Documentation/virt/hyperv/vpci.rst [new file with mode: 0644]
Documentation/virt/kvm/api.rst
MAINTAINERS
Makefile
arch/Kconfig
arch/arc/include/asm/jump_label.h
arch/arm/boot/dts/amazon/alpine.dtsi
arch/arm/boot/dts/aspeed/aspeed-bmc-facebook-bletchley.dts
arch/arm/boot/dts/aspeed/aspeed-bmc-facebook-wedge400.dts
arch/arm/boot/dts/aspeed/aspeed-bmc-opp-tacoma.dts
arch/arm/boot/dts/aspeed/aspeed-g4.dtsi
arch/arm/boot/dts/aspeed/aspeed-g5.dtsi
arch/arm/boot/dts/aspeed/aspeed-g6.dtsi
arch/arm/boot/dts/aspeed/ast2600-facebook-netbmc-common.dtsi
arch/arm/boot/dts/broadcom/bcm-cygnus.dtsi
arch/arm/boot/dts/broadcom/bcm-hr2.dtsi
arch/arm/boot/dts/broadcom/bcm-nsp.dtsi
arch/arm/boot/dts/intel/ixp/intel-ixp42x-gateway-7001.dts
arch/arm/boot/dts/intel/ixp/intel-ixp42x-goramo-multilink.dts
arch/arm/boot/dts/marvell/kirkwood-l-50.dts
arch/arm/boot/dts/nuvoton/nuvoton-wpcm450.dtsi
arch/arm/boot/dts/nvidia/tegra30-apalis-v1.1.dtsi
arch/arm/boot/dts/nvidia/tegra30-apalis.dtsi
arch/arm/boot/dts/nvidia/tegra30-colibri.dtsi
arch/arm/boot/dts/nxp/imx/imx6q-b850v3.dts
arch/arm/boot/dts/nxp/imx/imx6q-bx50v3.dtsi
arch/arm/boot/dts/nxp/imx/imx6qdl-apalis.dtsi
arch/arm/boot/dts/nxp/imx/imx6qdl-colibri.dtsi
arch/arm/boot/dts/nxp/imx/imx6qdl-emcon.dtsi
arch/arm/boot/dts/nxp/imx/imx6qdl-phytec-pfla02.dtsi
arch/arm/boot/dts/nxp/imx/imx6qdl-phytec-phycore-som.dtsi
arch/arm/boot/dts/nxp/imx/imx6ull-phytec-tauri.dtsi
arch/arm/boot/dts/nxp/imx/imx7d-flex-concentrator.dts
arch/arm/boot/dts/nxp/imx/imx7d-pico-dwarf.dts
arch/arm/boot/dts/nxp/imx/imx7s.dtsi
arch/arm/boot/dts/nxp/vf/vf610-zii-dev-rev-b.dts
arch/arm/boot/dts/qcom/qcom-sdx55.dtsi
arch/arm/boot/dts/renesas/r8a7790-lager.dts
arch/arm/boot/dts/renesas/r8a7790-stout.dts
arch/arm/boot/dts/renesas/r8a7791-koelsch.dts
arch/arm/boot/dts/renesas/r8a7791-porter.dts
arch/arm/boot/dts/renesas/r8a7792-blanche.dts
arch/arm/boot/dts/renesas/r8a7793-gose.dts
arch/arm/boot/dts/renesas/r8a7794-alt.dts
arch/arm/boot/dts/renesas/r8a7794-silk.dts
arch/arm/boot/dts/rockchip/rv1108.dtsi
arch/arm/boot/dts/samsung/exynos4212-tab3.dtsi
arch/arm/boot/dts/st/stm32429i-eval.dts
arch/arm/boot/dts/st/stm32mp157c-dk2.dts
arch/arm/boot/dts/ti/omap/am335x-moxa-uc-2100-common.dtsi
arch/arm/boot/dts/ti/omap/am5729-beagleboneai.dts
arch/arm/configs/imx_v6_v7_defconfig
arch/arm/include/asm/jump_label.h
arch/arm/mach-ep93xx/core.c
arch/arm/mm/fault.c
arch/arm64/Makefile
arch/arm64/boot/dts/allwinner/Makefile
arch/arm64/boot/dts/amazon/alpine-v2.dtsi
arch/arm64/boot/dts/amazon/alpine-v3.dtsi
arch/arm64/boot/dts/broadcom/northstar2/ns2.dtsi
arch/arm64/boot/dts/broadcom/stingray/stingray.dtsi
arch/arm64/boot/dts/exynos/google/gs101.dtsi
arch/arm64/boot/dts/freescale/Makefile
arch/arm64/boot/dts/freescale/imx8mm-phygate-tauri-l.dts
arch/arm64/boot/dts/freescale/imx8mm-venice-gw72xx.dtsi
arch/arm64/boot/dts/freescale/imx8mm-venice-gw73xx.dtsi
arch/arm64/boot/dts/freescale/imx8mn-var-som-symphony.dts
arch/arm64/boot/dts/freescale/imx8mp-beacon-kit.dts
arch/arm64/boot/dts/freescale/imx8mp-data-modul-edm-sbc.dts
arch/arm64/boot/dts/freescale/imx8mp-dhcom-pdk3.dts
arch/arm64/boot/dts/freescale/imx8mp-dhcom-som.dtsi
arch/arm64/boot/dts/freescale/imx8mp-tqma8mpql-mba8mpxl.dts
arch/arm64/boot/dts/freescale/imx8mp-venice-gw72xx.dtsi
arch/arm64/boot/dts/freescale/imx8mp-venice-gw73xx.dtsi
arch/arm64/boot/dts/freescale/imx8mp-venice-gw74xx.dts
arch/arm64/boot/dts/freescale/imx8mp.dtsi
arch/arm64/boot/dts/freescale/imx8mq-kontron-pitx-imx8m.dts
arch/arm64/boot/dts/lg/lg1312.dtsi
arch/arm64/boot/dts/lg/lg1313.dtsi
arch/arm64/boot/dts/marvell/armada-ap80x.dtsi
arch/arm64/boot/dts/mediatek/mt8183-kukui.dtsi
arch/arm64/boot/dts/mediatek/mt8192-asurada.dtsi
arch/arm64/boot/dts/mediatek/mt8195-demo.dts
arch/arm64/boot/dts/nvidia/tegra234-p3737-0000+p3701-0000.dts
arch/arm64/boot/dts/nvidia/tegra234.dtsi
arch/arm64/boot/dts/qcom/ipq6018.dtsi
arch/arm64/boot/dts/qcom/ipq8074.dtsi
arch/arm64/boot/dts/qcom/msm8996.dtsi
arch/arm64/boot/dts/qcom/sc8280xp-crd.dts
arch/arm64/boot/dts/qcom/sc8280xp-lenovo-thinkpad-x13s.dts
arch/arm64/boot/dts/qcom/sm6115.dtsi
arch/arm64/boot/dts/qcom/sm8650-mtp.dts
arch/arm64/boot/dts/qcom/sm8650-qrd.dts
arch/arm64/boot/dts/renesas/ulcb-kf.dtsi
arch/arm64/boot/dts/rockchip/px30.dtsi
arch/arm64/boot/dts/rockchip/rk3328.dtsi
arch/arm64/boot/dts/rockchip/rk3399-gru-bob.dts
arch/arm64/boot/dts/rockchip/rk3399-gru-scarlet.dtsi
arch/arm64/boot/dts/rockchip/rk3588-coolpi-cm5-evb.dts
arch/arm64/boot/dts/rockchip/rk3588-coolpi-cm5.dtsi
arch/arm64/boot/dts/rockchip/rk3588-evb1-v10.dts
arch/arm64/boot/dts/rockchip/rk3588-jaguar.dts
arch/arm64/boot/dts/rockchip/rk3588-nanopc-t6.dts
arch/arm64/boot/dts/rockchip/rk3588s-coolpi-4b.dts
arch/arm64/boot/dts/rockchip/rk3588s-indiedroid-nova.dts
arch/arm64/crypto/aes-neonbs-glue.c
arch/arm64/include/asm/alternative-macros.h
arch/arm64/include/asm/cpufeature.h
arch/arm64/include/asm/cputype.h
arch/arm64/include/asm/fpsimd.h
arch/arm64/include/asm/jump_label.h
arch/arm64/include/asm/vdso.h
arch/arm64/kernel/Makefile
arch/arm64/kernel/cpu_errata.c
arch/arm64/kernel/fpsimd.c
arch/arm64/kernel/ptrace.c
arch/arm64/kernel/signal.c
arch/arm64/kernel/stacktrace.c
arch/arm64/kernel/suspend.c
arch/arm64/kernel/vdso32/Makefile
arch/arm64/kvm/Kconfig
arch/arm64/kvm/hyp/pgtable.c
arch/arm64/kvm/pkvm.c
arch/arm64/kvm/vgic/vgic-its.c
arch/csky/include/asm/jump_label.h
arch/loongarch/Kconfig
arch/loongarch/boot/dts/loongson-2k0500-ref.dts
arch/loongarch/boot/dts/loongson-2k1000-ref.dts
arch/loongarch/include/asm/acpi.h
arch/loongarch/include/asm/jump_label.h
arch/loongarch/include/asm/kvm_vcpu.h
arch/loongarch/kernel/acpi.c
arch/loongarch/kernel/setup.c
arch/loongarch/kernel/smp.c
arch/loongarch/kvm/mmu.c
arch/loongarch/kvm/vcpu.c
arch/loongarch/mm/kasan_init.c
arch/loongarch/mm/tlb.c
arch/loongarch/vdso/Makefile
arch/m68k/Makefile
arch/m68k/emu/nfblock.c
arch/mips/alchemy/common/prom.c
arch/mips/alchemy/common/setup.c
arch/mips/bcm63xx/boards/board_bcm963xx.c
arch/mips/bcm63xx/dev-rng.c
arch/mips/bcm63xx/dev-uart.c
arch/mips/bcm63xx/dev-wdt.c
arch/mips/bcm63xx/irq.c
arch/mips/bcm63xx/setup.c
arch/mips/bcm63xx/timer.c
arch/mips/cobalt/setup.c
arch/mips/fw/arc/memory.c
arch/mips/include/asm/checksum.h
arch/mips/include/asm/jump_label.h
arch/mips/include/asm/mach-au1x00/au1000.h
arch/mips/include/asm/mach-cobalt/cobalt.h
arch/mips/include/asm/ptrace.h
arch/mips/kernel/elf.c
arch/mips/kernel/ptrace.c
arch/mips/kernel/traps.c
arch/mips/lantiq/prom.c
arch/mips/loongson64/init.c
arch/mips/loongson64/numa.c
arch/mips/sgi-ip27/Makefile
arch/mips/sgi-ip27/ip27-berr.c
arch/mips/sgi-ip27/ip27-common.h
arch/mips/sgi-ip27/ip27-hubio.c [deleted file]
arch/mips/sgi-ip27/ip27-irq.c
arch/mips/sgi-ip27/ip27-memory.c
arch/mips/sgi-ip27/ip27-nmi.c
arch/mips/sgi-ip30/ip30-console.c
arch/mips/sgi-ip30/ip30-setup.c
arch/mips/sgi-ip32/crime.c
arch/mips/sgi-ip32/ip32-berr.c
arch/mips/sgi-ip32/ip32-common.h [new file with mode: 0644]
arch/mips/sgi-ip32/ip32-irq.c
arch/mips/sgi-ip32/ip32-memory.c
arch/mips/sgi-ip32/ip32-reset.c
arch/mips/sgi-ip32/ip32-setup.c
arch/parisc/Kconfig
arch/parisc/Makefile
arch/parisc/include/asm/assembly.h
arch/parisc/include/asm/extable.h [new file with mode: 0644]
arch/parisc/include/asm/jump_label.h
arch/parisc/include/asm/kprobes.h
arch/parisc/include/asm/special_insns.h
arch/parisc/include/asm/uaccess.h
arch/parisc/kernel/cache.c
arch/parisc/kernel/drivers.c
arch/parisc/kernel/ftrace.c
arch/parisc/kernel/processor.c
arch/parisc/kernel/unaligned.c
arch/parisc/kernel/unwind.c
arch/parisc/kernel/vmlinux.lds.S
arch/parisc/mm/fault.c
arch/powerpc/include/asm/ftrace.h
arch/powerpc/include/asm/jump_label.h
arch/powerpc/include/asm/papr-sysparm.h
arch/powerpc/include/asm/ppc-pci.h
arch/powerpc/include/asm/reg.h
arch/powerpc/include/asm/rtas.h
arch/powerpc/include/asm/sections.h
arch/powerpc/include/asm/thread_info.h
arch/powerpc/include/asm/uaccess.h
arch/powerpc/include/uapi/asm/papr-sysparm.h
arch/powerpc/kernel/cpu_setup_6xx.S
arch/powerpc/kernel/cpu_specs_e500mc.h
arch/powerpc/kernel/interrupt_64.S
arch/powerpc/kernel/iommu.c
arch/powerpc/kernel/irq_64.c
arch/powerpc/kernel/rtas.c
arch/powerpc/kernel/trace/ftrace.c
arch/powerpc/kernel/trace/ftrace_64_pg.c
arch/powerpc/kernel/vmlinux.lds.S
arch/powerpc/kvm/book3s_hv.c
arch/powerpc/kvm/book3s_hv_nestedv2.c
arch/powerpc/mm/kasan/init_32.c
arch/powerpc/platforms/85xx/mpc8536_ds.c
arch/powerpc/platforms/85xx/mvme2500.c
arch/powerpc/platforms/85xx/p1010rdb.c
arch/powerpc/platforms/85xx/p1022_ds.c
arch/powerpc/platforms/85xx/p1022_rdk.c
arch/powerpc/platforms/85xx/socrates_fpga_pic.c
arch/powerpc/platforms/85xx/xes_mpc85xx.c
arch/powerpc/platforms/pseries/iommu.c
arch/powerpc/platforms/pseries/lpar.c
arch/powerpc/platforms/pseries/pci_dlpar.c
arch/powerpc/sysdev/udbg_memcons.c
arch/riscv/Kconfig
arch/riscv/boot/dts/sifive/hifive-unmatched-a00.dts
arch/riscv/boot/dts/sophgo/sg2042.dtsi
arch/riscv/boot/dts/starfive/jh7100.dtsi
arch/riscv/boot/dts/starfive/jh7110.dtsi
arch/riscv/include/asm/arch_hweight.h
arch/riscv/include/asm/bitops.h
arch/riscv/include/asm/checksum.h
arch/riscv/include/asm/cpufeature.h
arch/riscv/include/asm/csr.h
arch/riscv/include/asm/ftrace.h
arch/riscv/include/asm/hugetlb.h
arch/riscv/include/asm/hwcap.h
arch/riscv/include/asm/jump_label.h
arch/riscv/include/asm/pgalloc.h
arch/riscv/include/asm/pgtable-64.h
arch/riscv/include/asm/pgtable.h
arch/riscv/include/asm/stacktrace.h
arch/riscv/include/asm/suspend.h
arch/riscv/include/asm/tlb.h
arch/riscv/include/asm/tlbflush.h
arch/riscv/include/asm/vmalloc.h
arch/riscv/include/uapi/asm/kvm.h
arch/riscv/kernel/Makefile
arch/riscv/kernel/cpufeature.c
arch/riscv/kernel/paravirt.c
arch/riscv/kernel/return_address.c [new file with mode: 0644]
arch/riscv/kernel/suspend.c
arch/riscv/kvm/vcpu_onereg.c
arch/riscv/kvm/vcpu_sbi_sta.c
arch/riscv/lib/csum.c
arch/riscv/mm/hugetlbpage.c
arch/riscv/mm/init.c
arch/riscv/mm/tlbflush.c
arch/riscv/net/bpf_jit_comp64.c
arch/s390/configs/compat.config [new file with mode: 0644]
arch/s390/configs/debug_defconfig
arch/s390/configs/defconfig
arch/s390/configs/zfcpdump_defconfig
arch/s390/include/asm/jump_label.h
arch/s390/kvm/priv.c
arch/s390/kvm/vsie.c
arch/s390/mm/gmap.c
arch/s390/pci/pci.c
arch/sparc/Makefile
arch/sparc/include/asm/jump_label.h
arch/sparc/video/Makefile
arch/um/Makefile
arch/um/drivers/ubd_kern.c
arch/um/include/asm/cpufeature.h
arch/x86/Kconfig.cpu
arch/x86/Makefile
arch/x86/boot/header.S
arch/x86/boot/setup.ld
arch/x86/entry/entry.S
arch/x86/entry/entry_32.S
arch/x86/entry/entry_64.S
arch/x86/entry/entry_64_compat.S
arch/x86/hyperv/hv_vtl.c
arch/x86/hyperv/ivm.c
arch/x86/include/asm/coco.h
arch/x86/include/asm/cpufeature.h
arch/x86/include/asm/cpufeatures.h
arch/x86/include/asm/entry-common.h
arch/x86/include/asm/intel-family.h
arch/x86/include/asm/jump_label.h
arch/x86/include/asm/kmsan.h
arch/x86/include/asm/kvm_host.h
arch/x86/include/asm/nospec-branch.h
arch/x86/include/asm/rmwcc.h
arch/x86/include/asm/set_memory.h
arch/x86/include/asm/special_insns.h
arch/x86/include/asm/syscall_wrapper.h
arch/x86/include/asm/uaccess.h
arch/x86/include/asm/vsyscall.h
arch/x86/kernel/alternative.c
arch/x86/kernel/cpu/amd.c
arch/x86/kernel/cpu/bugs.c
arch/x86/kernel/cpu/common.c
arch/x86/kernel/cpu/intel.c
arch/x86/kernel/e820.c
arch/x86/kernel/fpu/signal.c
arch/x86/kernel/kvm.c
arch/x86/kernel/nmi.c
arch/x86/kvm/Kconfig
arch/x86/kvm/hyperv.c
arch/x86/kvm/hyperv.h
arch/x86/kvm/mmu/mmu.c
arch/x86/kvm/svm/sev.c
arch/x86/kvm/svm/svm_ops.h
arch/x86/kvm/vmx/pmu_intel.c
arch/x86/kvm/vmx/run_flags.h
arch/x86/kvm/vmx/vmenter.S
arch/x86/kvm/vmx/vmx.c
arch/x86/kvm/vmx/vmx_ops.h
arch/x86/kvm/x86.c
arch/x86/lib/getuser.S
arch/x86/lib/putuser.S
arch/x86/mm/fault.c
arch/x86/mm/ident_map.c
arch/x86/mm/maccess.c
arch/x86/mm/numa.c
arch/x86/mm/pat/set_memory.c
arch/x86/xen/smp.c
arch/xtensa/include/asm/jump_label.h
arch/xtensa/platforms/iss/simdisk.c
block/bdev.c
block/bfq-cgroup.c
block/bfq-iosched.c
block/bio-integrity.c
block/bio.c
block/blk-cgroup.c
block/blk-cgroup.h
block/blk-core.c
block/blk-crypto-fallback.c
block/blk-flush.c
block/blk-integrity.c
block/blk-iocost.c
block/blk-iolatency.c
block/blk-lib.c
block/blk-map.c
block/blk-merge.c
block/blk-mq.c
block/blk-settings.c
block/blk-stat.c
block/blk-sysfs.c
block/blk-throttle.c
block/blk-wbt.c
block/blk-zoned.c
block/blk.h
block/bounce.c
block/bsg-lib.c
block/fops.c
block/genhd.c
block/holder.c
block/ioctl.c
block/opal_proto.h
block/partitions/core.c
block/partitions/mac.c
block/sed-opal.c
block/t10-pi.c
crypto/algif_hash.c
crypto/cbc.c
crypto/lskcipher.c
drivers/accel/ivpu/ivpu_debugfs.c
drivers/accel/ivpu/ivpu_drv.c
drivers/accel/ivpu/ivpu_drv.h
drivers/accel/ivpu/ivpu_fw.c
drivers/accel/ivpu/ivpu_gem.c
drivers/accel/ivpu/ivpu_gem.h
drivers/accel/ivpu/ivpu_hw_37xx.c
drivers/accel/ivpu/ivpu_hw_40xx.c
drivers/accel/ivpu/ivpu_ipc.c
drivers/accel/ivpu/ivpu_job.c
drivers/accel/ivpu/ivpu_job.h
drivers/accel/ivpu/ivpu_mmu.c
drivers/accel/ivpu/ivpu_mmu.h
drivers/accel/ivpu/ivpu_mmu_context.c
drivers/accel/ivpu/ivpu_pm.c
drivers/accel/ivpu/ivpu_pm.h
drivers/acpi/apei/ghes.c
drivers/acpi/ec.c
drivers/android/binder.c
drivers/ata/ahci.c
drivers/ata/ahci.h
drivers/ata/ahci_ceva.c
drivers/ata/libata-core.c
drivers/ata/libata-sata.c
drivers/atm/idt77252.c
drivers/base/arch_topology.c
drivers/base/base.h
drivers/base/core.c
drivers/base/regmap/regmap-kunit.c
drivers/block/amiflop.c
drivers/block/aoe/aoeblk.c
drivers/block/aoe/aoecmd.c
drivers/block/aoe/aoenet.c
drivers/block/ataflop.c
drivers/block/brd.c
drivers/block/drbd/drbd_int.h
drivers/block/drbd/drbd_main.c
drivers/block/drbd/drbd_nl.c
drivers/block/drbd/drbd_state.c
drivers/block/drbd/drbd_state_change.h
drivers/block/floppy.c
drivers/block/loop.c
drivers/block/mtip32xx/mtip32xx.c
drivers/block/n64cart.c
drivers/block/nbd.c
drivers/block/null_blk/main.c
drivers/block/null_blk/null_blk.h
drivers/block/null_blk/trace.h
drivers/block/null_blk/zoned.c
drivers/block/pktcdvd.c
drivers/block/ps3disk.c
drivers/block/ps3vram.c
drivers/block/rbd.c
drivers/block/rnbd/rnbd-clt.c
drivers/block/rnbd/rnbd-srv.c
drivers/block/rnbd/rnbd-srv.h
drivers/block/sunvdc.c
drivers/block/swim.c
drivers/block/swim3.c
drivers/block/ublk_drv.c
drivers/block/virtio_blk.c
drivers/block/xen-blkback/blkback.c
drivers/block/xen-blkback/common.h
drivers/block/xen-blkback/xenbus.c
drivers/block/xen-blkfront.c
drivers/block/z2ram.c
drivers/block/zram/zram_drv.c
drivers/block/zram/zram_drv.h
drivers/bluetooth/btqca.c
drivers/bluetooth/hci_bcm4377.c
drivers/bluetooth/hci_qca.c
drivers/bus/imx-weim.c
drivers/cache/ax45mp_cache.c
drivers/cdrom/gdrom.c
drivers/clk/samsung/clk-gs101.c
drivers/comedi/drivers/comedi_8255.c
drivers/comedi/drivers/comedi_test.c
drivers/connector/cn_proc.c
drivers/counter/counter-core.c
drivers/cpufreq/amd-pstate.c
drivers/cpufreq/intel_pstate.c
drivers/crypto/allwinner/sun8i-ce/sun8i-ce-cipher.c
drivers/crypto/caam/caamalg_qi2.c
drivers/crypto/caam/caamhash.c
drivers/crypto/ccp/sev-dev.c
drivers/crypto/intel/qat/qat_4xxx/adf_4xxx_hw_data.c
drivers/crypto/rockchip/rk3288_crypto_ahash.c
drivers/crypto/virtio/virtio_crypto_akcipher_algs.c
drivers/cxl/acpi.c
drivers/cxl/core/cdat.c
drivers/cxl/core/mbox.c
drivers/cxl/core/memdev.c
drivers/cxl/core/pci.c
drivers/cxl/core/region.c
drivers/cxl/core/trace.h
drivers/cxl/cxl.h
drivers/cxl/cxlmem.h
drivers/cxl/mem.c
drivers/cxl/pci.c
drivers/dma-buf/heaps/cma_heap.c
drivers/dma/at_hdmac.c
drivers/dma/dw-edma/dw-edma-v0-core.c
drivers/dma/dw-edma/dw-hdma-v0-core.c
drivers/dma/dw-edma/dw-hdma-v0-regs.h
drivers/dma/fsl-dpaa2-qdma/dpaa2-qdma.c
drivers/dma/fsl-edma-common.c
drivers/dma/fsl-edma-common.h
drivers/dma/fsl-edma-main.c
drivers/dma/fsl-qdma.c
drivers/dma/idxd/cdev.c
drivers/dma/idxd/debugfs.c
drivers/dma/idxd/idxd.h
drivers/dma/idxd/init.c
drivers/dma/idxd/irq.c
drivers/dma/ptdma/ptdma-dmaengine.c
drivers/dma/ti/edma.c
drivers/dma/ti/k3-udma.c
drivers/dpll/dpll_core.c
drivers/dpll/dpll_core.h
drivers/dpll/dpll_netlink.c
drivers/dpll/dpll_nl.c
drivers/dpll/dpll_nl.h
drivers/firewire/core-card.c
drivers/firewire/core-device.c
drivers/firewire/ohci.c
drivers/firmware/arm_ffa/driver.c
drivers/firmware/arm_scmi/clock.c
drivers/firmware/arm_scmi/common.h
drivers/firmware/arm_scmi/mailbox.c
drivers/firmware/arm_scmi/perf.c
drivers/firmware/arm_scmi/raw_mode.c
drivers/firmware/arm_scmi/shmem.c
drivers/firmware/efi/arm-runtime.c
drivers/firmware/efi/capsule-loader.c
drivers/firmware/efi/cper.c
drivers/firmware/efi/efi-init.c
drivers/firmware/efi/libstub/Makefile
drivers/firmware/efi/libstub/alignedmem.c
drivers/firmware/efi/libstub/efistub.h
drivers/firmware/efi/libstub/kaslr.c
drivers/firmware/efi/libstub/randomalloc.c
drivers/firmware/efi/libstub/x86-stub.c
drivers/firmware/efi/libstub/x86-stub.h
drivers/firmware/efi/libstub/zboot.c
drivers/firmware/efi/riscv-runtime.c
drivers/firmware/microchip/mpfs-auto-update.c
drivers/firmware/sysfb.c
drivers/gpio/gpio-74x164.c
drivers/gpio/gpio-eic-sprd.c
drivers/gpio/gpiolib-acpi.c
drivers/gpio/gpiolib.c
drivers/gpu/drm/Kconfig
drivers/gpu/drm/amd/amdgpu/amdgpu.h
drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
drivers/gpu/drm/amd/amdgpu/cik_ih.c
drivers/gpu/drm/amd/amdgpu/cz_ih.c
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c
drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
drivers/gpu/drm/amd/amdgpu/iceland_ih.c
drivers/gpu/drm/amd/amdgpu/ih_v6_0.c
drivers/gpu/drm/amd/amdgpu/ih_v6_1.c
drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c
drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_5.c
drivers/gpu/drm/amd/amdgpu/navi10_ih.c
drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c
drivers/gpu/drm/amd/amdgpu/si_ih.c
drivers/gpu/drm/amd/amdgpu/soc15.c
drivers/gpu/drm/amd/amdgpu/soc21.c
drivers/gpu/drm/amd/amdgpu/tonga_ih.c
drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c
drivers/gpu/drm/amd/amdgpu/vega10_ih.c
drivers/gpu/drm/amd/amdgpu/vega20_ih.c
drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx10.asm
drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c
drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c
drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
drivers/gpu/drm/amd/amdkfd/kfd_priv.h
drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
drivers/gpu/drm/amd/amdkfd/kfd_topology.c
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c
drivers/gpu/drm/amd/display/dc/basics/dce_calcs.c
drivers/gpu/drm/amd/display/dc/bios/bios_parser2.c
drivers/gpu/drm/amd/display/dc/clk_mgr/dcn301/vg_clk_mgr.c
drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c
drivers/gpu/drm/amd/display/dc/core/dc.c
drivers/gpu/drm/amd/display/dc/core/dc_state.c
drivers/gpu/drm/amd/display/dc/dc.h
drivers/gpu/drm/amd/display/dc/dc_dmub_srv.c
drivers/gpu/drm/amd/display/dc/dc_types.h
drivers/gpu/drm/amd/display/dc/dce/dce_panel_cntl.c
drivers/gpu/drm/amd/display/dc/dcn30/dcn30_dpp_cm.c
drivers/gpu/drm/amd/display/dc/dcn301/dcn301_panel_cntl.c
drivers/gpu/drm/amd/display/dc/dcn31/dcn31_panel_cntl.c
drivers/gpu/drm/amd/display/dc/dcn32/dcn32_dio_link_encoder.c
drivers/gpu/drm/amd/display/dc/dcn35/dcn35_dio_link_encoder.c
drivers/gpu/drm/amd/display/dc/dml/Makefile
drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c
drivers/gpu/drm/amd/display/dc/dml/dcn35/dcn35_fpu.c
drivers/gpu/drm/amd/display/dc/dml2/dml2_translation_helper.c
drivers/gpu/drm/amd/display/dc/dml2/dml2_wrapper.c
drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c
drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.h
drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.c
drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_hwseq.h
drivers/gpu/drm/amd/display/dc/hwss/dcn21/dcn21_hwseq.c
drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_hwseq.c
drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_hwseq.h
drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_init.c
drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_hwseq.c
drivers/gpu/drm/amd/display/dc/hwss/hw_sequencer.h
drivers/gpu/drm/amd/display/dc/hwss/hw_sequencer_private.h
drivers/gpu/drm/amd/display/dc/inc/core_types.h
drivers/gpu/drm/amd/display/dc/inc/hw/panel_cntl.h
drivers/gpu/drm/amd/display/dc/inc/resource.h
drivers/gpu/drm/amd/display/dc/link/link_factory.c
drivers/gpu/drm/amd/display/dc/link/link_validation.c
drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_dpia_bw.c
drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_training.c
drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_training_dpia.c
drivers/gpu/drm/amd/display/dc/link/protocols/link_dpcd.c
drivers/gpu/drm/amd/display/dc/resource/dcn301/dcn301_resource.c
drivers/gpu/drm/amd/display/dc/resource/dcn32/dcn32_resource.c
drivers/gpu/drm/amd/display/dc/resource/dcn35/dcn35_resource.c
drivers/gpu/drm/amd/display/dmub/inc/dmub_cmd.h
drivers/gpu/drm/amd/display/modules/power/power_helpers.c
drivers/gpu/drm/amd/display/modules/power/power_helpers.h
drivers/gpu/drm/amd/include/amd_shared.h
drivers/gpu/drm/amd/include/amdgpu_reg_state.h
drivers/gpu/drm/amd/pm/amdgpu_pm.c
drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c
drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c
drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c
drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c
drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_7_ppt.c
drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0.c
drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_0_ppt.c
drivers/gpu/drm/bridge/analogix/anx7625.c
drivers/gpu/drm/bridge/analogix/anx7625.h
drivers/gpu/drm/bridge/aux-hpd-bridge.c
drivers/gpu/drm/bridge/parade-ps8640.c
drivers/gpu/drm/bridge/samsung-dsim.c
drivers/gpu/drm/bridge/sii902x.c
drivers/gpu/drm/display/drm_dp_mst_topology.c
drivers/gpu/drm/drm_buddy.c
drivers/gpu/drm/drm_crtc.c
drivers/gpu/drm/drm_prime.c
drivers/gpu/drm/drm_probe_helper.c
drivers/gpu/drm/drm_syncobj.c
drivers/gpu/drm/exynos/exynos5433_drm_decon.c
drivers/gpu/drm/exynos/exynos_drm_fimd.c
drivers/gpu/drm/exynos/exynos_drm_gsc.c
drivers/gpu/drm/i915/Kconfig
drivers/gpu/drm/i915/Makefile
drivers/gpu/drm/i915/display/icl_dsi.c
drivers/gpu/drm/i915/display/intel_display_power_well.c
drivers/gpu/drm/i915/display/intel_display_types.h
drivers/gpu/drm/i915/display/intel_dp.c
drivers/gpu/drm/i915/display/intel_dp.h
drivers/gpu/drm/i915/display/intel_dp_hdcp.c
drivers/gpu/drm/i915/display/intel_dp_mst.c
drivers/gpu/drm/i915/display/intel_modeset_setup.c
drivers/gpu/drm/i915/display/intel_psr.c
drivers/gpu/drm/i915/display/intel_sdvo.c
drivers/gpu/drm/i915/display/intel_tv.c
drivers/gpu/drm/i915/display/intel_vdsc_regs.h
drivers/gpu/drm/i915/gem/i915_gem_userptr.c
drivers/gpu/drm/i915/gvt/handlers.c
drivers/gpu/drm/i915/intel_gvt.c
drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
drivers/gpu/drm/meson/meson_encoder_cvbs.c
drivers/gpu/drm/meson/meson_encoder_dsi.c
drivers/gpu/drm/meson/meson_encoder_hdmi.c
drivers/gpu/drm/msm/adreno/a6xx_gpu.c
drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c
drivers/gpu/drm/msm/disp/dpu1/dpu_rm.c
drivers/gpu/drm/msm/dp/dp_ctrl.c
drivers/gpu/drm/msm/dp/dp_display.c
drivers/gpu/drm/msm/dp/dp_link.c
drivers/gpu/drm/msm/dp/dp_reg.h
drivers/gpu/drm/msm/msm_gem_prime.c
drivers/gpu/drm/msm/msm_gpu.c
drivers/gpu/drm/msm/msm_iommu.c
drivers/gpu/drm/msm/msm_mdss.c
drivers/gpu/drm/msm/msm_ringbuffer.c
drivers/gpu/drm/nouveau/Kconfig
drivers/gpu/drm/nouveau/include/nvkm/core/client.h
drivers/gpu/drm/nouveau/include/nvkm/subdev/gsp.h
drivers/gpu/drm/nouveau/nouveau_abi16.c
drivers/gpu/drm/nouveau/nouveau_abi16.h
drivers/gpu/drm/nouveau/nouveau_drm.c
drivers/gpu/drm/nouveau/nouveau_drv.h
drivers/gpu/drm/nouveau/nouveau_exec.c
drivers/gpu/drm/nouveau/nouveau_fence.c
drivers/gpu/drm/nouveau/nouveau_fence.h
drivers/gpu/drm/nouveau/nouveau_gem.c
drivers/gpu/drm/nouveau/nouveau_sched.c
drivers/gpu/drm/nouveau/nouveau_sched.h
drivers/gpu/drm/nouveau/nouveau_svm.c
drivers/gpu/drm/nouveau/nouveau_uvmm.c
drivers/gpu/drm/nouveau/nvkm/core/client.c
drivers/gpu/drm/nouveau/nvkm/core/object.c
drivers/gpu/drm/nouveau/nvkm/subdev/bar/r535.c
drivers/gpu/drm/nouveau/nvkm/subdev/bios/shadow.c
drivers/gpu/drm/nouveau/nvkm/subdev/gsp/r535.c
drivers/gpu/drm/panel/Kconfig
drivers/gpu/drm/panel/panel-boe-tv101wum-nl6.c
drivers/gpu/drm/panel/panel-samsung-s6d7aa0.c
drivers/gpu/drm/panel/panel-simple.c
drivers/gpu/drm/rockchip/rockchip_drm_vop2.c
drivers/gpu/drm/scheduler/sched_main.c
drivers/gpu/drm/tegra/drm.c
drivers/gpu/drm/tests/drm_buddy_test.c
drivers/gpu/drm/tests/drm_mm_test.c
drivers/gpu/drm/ttm/ttm_device.c
drivers/gpu/drm/ttm/ttm_pool.c
drivers/gpu/drm/v3d/v3d_submit.c
drivers/gpu/drm/virtio/virtgpu_drv.c
drivers/gpu/drm/xe/abi/guc_actions_abi.h
drivers/gpu/drm/xe/abi/guc_actions_slpc_abi.h
drivers/gpu/drm/xe/abi/guc_communication_ctb_abi.h
drivers/gpu/drm/xe/abi/guc_klvs_abi.h
drivers/gpu/drm/xe/abi/guc_messages_abi.h
drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_object.h
drivers/gpu/drm/xe/tests/xe_migrate.c
drivers/gpu/drm/xe/tests/xe_mocs_test.c
drivers/gpu/drm/xe/tests/xe_wa_test.c
drivers/gpu/drm/xe/xe_bo.c
drivers/gpu/drm/xe/xe_bo.h
drivers/gpu/drm/xe/xe_device.c
drivers/gpu/drm/xe/xe_device.h
drivers/gpu/drm/xe/xe_device_types.h
drivers/gpu/drm/xe/xe_display.c
drivers/gpu/drm/xe/xe_dma_buf.c
drivers/gpu/drm/xe/xe_drm_client.c
drivers/gpu/drm/xe/xe_exec.c
drivers/gpu/drm/xe/xe_exec_queue.c
drivers/gpu/drm/xe/xe_exec_queue_types.h
drivers/gpu/drm/xe/xe_execlist.c
drivers/gpu/drm/xe/xe_gt.c
drivers/gpu/drm/xe/xe_gt_idle.c
drivers/gpu/drm/xe/xe_gt_mcr.c
drivers/gpu/drm/xe/xe_gt_pagefault.c
drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
drivers/gpu/drm/xe/xe_guc_pc.c
drivers/gpu/drm/xe/xe_guc_submit.c
drivers/gpu/drm/xe/xe_hw_fence.c
drivers/gpu/drm/xe/xe_hwmon.c
drivers/gpu/drm/xe/xe_lrc.c
drivers/gpu/drm/xe/xe_migrate.c
drivers/gpu/drm/xe/xe_mmio.c
drivers/gpu/drm/xe/xe_pt.c
drivers/gpu/drm/xe/xe_pt_walk.c
drivers/gpu/drm/xe/xe_pt_walk.h
drivers/gpu/drm/xe/xe_query.c
drivers/gpu/drm/xe/xe_range_fence.c
drivers/gpu/drm/xe/xe_sched_job.c
drivers/gpu/drm/xe/xe_sync.c
drivers/gpu/drm/xe/xe_sync.h
drivers/gpu/drm/xe/xe_sync_types.h
drivers/gpu/drm/xe/xe_tile.c
drivers/gpu/drm/xe/xe_trace.h
drivers/gpu/drm/xe/xe_vm.c
drivers/gpu/drm/xe/xe_vm.h
drivers/gpu/drm/xe/xe_vm_types.h
drivers/gpu/host1x/dev.c
drivers/gpu/host1x/dev.h
drivers/hid/bpf/hid_bpf_dispatch.c
drivers/hid/bpf/hid_bpf_dispatch.h
drivers/hid/bpf/hid_bpf_jmp_table.c
drivers/hid/hid-ids.h
drivers/hid/hid-logitech-hidpp.c
drivers/hid/hid-multitouch.c
drivers/hid/hid-nvidia-shield.c
drivers/hid/hid-steam.c
drivers/hid/hidraw.c
drivers/hid/i2c-hid/i2c-hid-core.c
drivers/hid/i2c-hid/i2c-hid-of.c
drivers/hid/intel-ish-hid/ishtp/bus.c
drivers/hid/intel-ish-hid/ishtp/client.c
drivers/hid/wacom_sys.c
drivers/hid/wacom_wac.c
drivers/hv/channel.c
drivers/hv/hv_util.c
drivers/hv/vmbus_drv.c
drivers/hwmon/aspeed-pwm-tacho.c
drivers/hwmon/coretemp.c
drivers/hwmon/gigabyte_waterforce.c
drivers/hwmon/nct6775-core.c
drivers/hwmon/pmbus/mp2975.c
drivers/i2c/busses/Makefile
drivers/i2c/busses/i2c-aspeed.c
drivers/i2c/busses/i2c-i801.c
drivers/i2c/busses/i2c-imx.c
drivers/i2c/busses/i2c-pasemi-core.c
drivers/i2c/busses/i2c-qcom-geni.c
drivers/i2c/busses/i2c-wmt.c
drivers/iio/accel/Kconfig
drivers/iio/accel/adxl367.c
drivers/iio/accel/adxl367_i2c.c
drivers/iio/adc/ad4130.c
drivers/iio/adc/ad7091r8.c
drivers/iio/humidity/Kconfig
drivers/iio/humidity/Makefile
drivers/iio/humidity/hdc3020.c
drivers/iio/imu/bno055/Kconfig
drivers/iio/imu/inv_mpu6050/inv_mpu_ring.c
drivers/iio/imu/inv_mpu6050/inv_mpu_trigger.c
drivers/iio/industrialio-core.c
drivers/iio/light/hid-sensor-als.c
drivers/iio/magnetometer/rm3100-core.c
drivers/iio/pressure/bmp280-spi.c
drivers/iio/pressure/dlhl60d.c
drivers/infiniband/hw/bnxt_re/ib_verbs.c
drivers/infiniband/hw/bnxt_re/main.c
drivers/infiniband/hw/bnxt_re/qplib_fp.c
drivers/infiniband/hw/hfi1/pio.c
drivers/infiniband/hw/hfi1/sdma.c
drivers/infiniband/hw/irdma/defs.h
drivers/infiniband/hw/irdma/hw.c
drivers/infiniband/hw/irdma/verbs.c
drivers/infiniband/hw/mlx5/cong.c
drivers/infiniband/hw/mlx5/devx.c
drivers/infiniband/hw/mlx5/wr.c
drivers/infiniband/hw/qedr/verbs.c
drivers/infiniband/ulp/srpt/ib_srpt.c
drivers/input/joystick/xpad.c
drivers/input/keyboard/atkbd.c
drivers/input/keyboard/gpio_keys_polled.c
drivers/input/rmi4/rmi_driver.c
drivers/input/serio/i8042-acpipnpio.h
drivers/input/touchscreen/goodix.c
drivers/interconnect/qcom/sc8180x.c
drivers/interconnect/qcom/sm8550.c
drivers/interconnect/qcom/sm8650.c
drivers/interconnect/qcom/x1e80100.c
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
drivers/iommu/arm/arm-smmu/arm-smmu.c
drivers/iommu/intel/iommu.c
drivers/iommu/intel/iommu.h
drivers/iommu/intel/nested.c
drivers/iommu/intel/pasid.c
drivers/iommu/intel/pasid.h
drivers/iommu/iommu-sva.c
drivers/iommu/iommu.c
drivers/iommu/iommufd/hw_pagetable.c
drivers/iommu/iommufd/io_pagetable.c
drivers/iommu/iommufd/iommufd_test.h
drivers/iommu/iommufd/iova_bitmap.c
drivers/iommu/iommufd/selftest.c
drivers/irqchip/irq-brcmstb-l2.c
drivers/irqchip/irq-gic-v3-its.c
drivers/irqchip/irq-loongson-eiointc.c
drivers/irqchip/irq-mbigen.c
drivers/irqchip/irq-qcom-mpm.c
drivers/irqchip/irq-sifive-plic.c
drivers/md/bcache/bcache.h
drivers/md/bcache/super.c
drivers/md/dm-core.h
drivers/md/dm-crypt.c
drivers/md/dm-integrity.c
drivers/md/dm-ioctl.c
drivers/md/dm-raid.c
drivers/md/dm-stats.c
drivers/md/dm-table.c
drivers/md/dm-verity-target.c
drivers/md/dm-verity.h
drivers/md/dm-writecache.c
drivers/md/dm-zoned-metadata.c
drivers/md/dm.c
drivers/md/md-bitmap.c
drivers/md/md-linear.h [deleted file]
drivers/md/md-multipath.h [deleted file]
drivers/md/md.c
drivers/md/md.h
drivers/md/raid0.c
drivers/md/raid1-10.c
drivers/md/raid1.c
drivers/md/raid1.h
drivers/md/raid10.c
drivers/md/raid5-ppl.c
drivers/md/raid5.c
drivers/media/common/videobuf2/videobuf2-core.c
drivers/media/common/videobuf2/videobuf2-v4l2.c
drivers/media/platform/chips-media/wave5/wave5-vpu.c
drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
drivers/media/platform/rockchip/rkisp1/rkisp1-common.h
drivers/media/platform/rockchip/rkisp1/rkisp1-csi.c
drivers/media/platform/rockchip/rkisp1/rkisp1-dev.c
drivers/media/platform/rockchip/rkisp1/rkisp1-isp.c
drivers/media/rc/Kconfig
drivers/media/rc/bpf-lirc.c
drivers/media/rc/ir_toy.c
drivers/media/rc/lirc_dev.c
drivers/media/rc/rc-core-priv.h
drivers/memstick/core/ms_block.c
drivers/memstick/core/mspro_block.c
drivers/misc/fastrpc.c
drivers/misc/lis3lv02d/lis3lv02d_i2c.c
drivers/misc/mei/gsc_proxy/mei_gsc_proxy.c
drivers/misc/mei/hw-me-regs.h
drivers/misc/mei/pci-me.c
drivers/misc/mei/vsc-tp.c
drivers/misc/open-dice.c
drivers/mmc/core/mmc.c
drivers/mmc/core/queue.c
drivers/mmc/core/slot-gpio.c
drivers/mmc/host/mmci_stm32_sdmmc.c
drivers/mmc/host/sdhci-pci-o2micro.c
drivers/mmc/host/sdhci-xenon-phy.c
drivers/mtd/devices/block2mtd.c
drivers/mtd/mtd_blkdevs.c
drivers/mtd/mtdcore.c
drivers/mtd/nand/raw/marvell_nand.c
drivers/mtd/nand/spi/gigadevice.c
drivers/mtd/ubi/block.c
drivers/net/arcnet/arc-rawmode.c
drivers/net/arcnet/arc-rimi.c
drivers/net/arcnet/capmode.c
drivers/net/arcnet/com20020-pci.c
drivers/net/arcnet/com20020.c
drivers/net/arcnet/com20020_cs.c
drivers/net/arcnet/com90io.c
drivers/net/arcnet/com90xx.c
drivers/net/arcnet/rfc1051.c
drivers/net/arcnet/rfc1201.c
drivers/net/bonding/bond_main.c
drivers/net/can/dev/netlink.c
drivers/net/dsa/dsa_loop_bdinfo.c
drivers/net/dsa/microchip/ksz8795.c
drivers/net/dsa/mt7530.c
drivers/net/dsa/mv88e6xxx/chip.c
drivers/net/dsa/qca/qca8k-8xxx.c
drivers/net/ethernet/8390/8390.c
drivers/net/ethernet/8390/8390p.c
drivers/net/ethernet/8390/apne.c
drivers/net/ethernet/8390/hydra.c
drivers/net/ethernet/8390/stnic.c
drivers/net/ethernet/8390/zorro8390.c
drivers/net/ethernet/adi/Kconfig
drivers/net/ethernet/amd/pds_core/adminq.c
drivers/net/ethernet/amd/pds_core/auxbus.c
drivers/net/ethernet/amd/pds_core/core.c
drivers/net/ethernet/amd/pds_core/core.h
drivers/net/ethernet/amd/pds_core/debugfs.c
drivers/net/ethernet/amd/pds_core/dev.c
drivers/net/ethernet/amd/pds_core/devlink.c
drivers/net/ethernet/amd/pds_core/fw.c
drivers/net/ethernet/amd/pds_core/main.c
drivers/net/ethernet/aquantia/atlantic/aq_ptp.c
drivers/net/ethernet/aquantia/atlantic/aq_ring.c
drivers/net/ethernet/aquantia/atlantic/aq_ring.h
drivers/net/ethernet/broadcom/asp2/bcmasp.c
drivers/net/ethernet/broadcom/asp2/bcmasp_intf.c
drivers/net/ethernet/broadcom/bcm4908_enet.c
drivers/net/ethernet/broadcom/bgmac-bcma-mdio.c
drivers/net/ethernet/broadcom/bgmac-bcma.c
drivers/net/ethernet/broadcom/bgmac-platform.c
drivers/net/ethernet/broadcom/bgmac.c
drivers/net/ethernet/broadcom/bnxt/bnxt.c
drivers/net/ethernet/broadcom/bnxt/bnxt.h
drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c
drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c
drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
drivers/net/ethernet/brocade/bna/bnad.c
drivers/net/ethernet/cavium/liquidio/lio_core.c
drivers/net/ethernet/cirrus/ep93xx_eth.c
drivers/net/ethernet/cisco/enic/vnic_vic.c
drivers/net/ethernet/engleder/tsnep_main.c
drivers/net/ethernet/ezchip/nps_enet.c
drivers/net/ethernet/freescale/enetc/enetc.c
drivers/net/ethernet/freescale/fec_main.c
drivers/net/ethernet/freescale/fman/fman_memac.c
drivers/net/ethernet/freescale/fsl_pq_mdio.c
drivers/net/ethernet/google/gve/gve_rx.c
drivers/net/ethernet/intel/e1000e/e1000.h
drivers/net/ethernet/intel/e1000e/ich8lan.c
drivers/net/ethernet/intel/e1000e/ptp.c
drivers/net/ethernet/intel/i40e/i40e_dcb.c
drivers/net/ethernet/intel/i40e/i40e_dcb.h
drivers/net/ethernet/intel/i40e/i40e_main.c
drivers/net/ethernet/intel/i40e/i40e_prototype.h
drivers/net/ethernet/intel/i40e/i40e_txrx.c
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
drivers/net/ethernet/intel/i40e/i40e_xsk.c
drivers/net/ethernet/intel/ice/ice_base.c
drivers/net/ethernet/intel/ice/ice_dpll.c
drivers/net/ethernet/intel/ice/ice_lag.c
drivers/net/ethernet/intel/ice/ice_lag.h
drivers/net/ethernet/intel/ice/ice_lib.c
drivers/net/ethernet/intel/ice/ice_lib.h
drivers/net/ethernet/intel/ice/ice_main.c
drivers/net/ethernet/intel/ice/ice_osdep.h
drivers/net/ethernet/intel/ice/ice_sriov.c
drivers/net/ethernet/intel/ice/ice_txrx.c
drivers/net/ethernet/intel/ice/ice_txrx.h
drivers/net/ethernet/intel/ice/ice_txrx_lib.h
drivers/net/ethernet/intel/ice/ice_type.h
drivers/net/ethernet/intel/ice/ice_virtchnl.c
drivers/net/ethernet/intel/ice/ice_virtchnl_allowlist.c
drivers/net/ethernet/intel/ice/ice_xsk.c
drivers/net/ethernet/intel/idpf/idpf_lib.c
drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
drivers/net/ethernet/intel/idpf/virtchnl2.h
drivers/net/ethernet/intel/igb/igb.h
drivers/net/ethernet/intel/igb/igb_main.c
drivers/net/ethernet/intel/igb/igb_ptp.c
drivers/net/ethernet/intel/igc/igc_main.c
drivers/net/ethernet/intel/igc/igc_phy.c
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
drivers/net/ethernet/litex/litex_liteeth.c
drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
drivers/net/ethernet/marvell/octeontx2/af/mbox.c
drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
drivers/net/ethernet/marvell/octeontx2/nic/otx2_ethtool.c
drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c
drivers/net/ethernet/mediatek/mtk_eth_soc.c
drivers/net/ethernet/mellanox/mlx5/core/cmd.c
drivers/net/ethernet/mellanox/mlx5/core/devlink.c
drivers/net/ethernet/mellanox/mlx5/core/dpll.c
drivers/net/ethernet/mellanox/mlx5/core/en.h
drivers/net/ethernet/mellanox/mlx5/core/en/fs_tt_redirect.c
drivers/net/ethernet/mellanox/mlx5/core/en/params.c
drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
drivers/net/ethernet/mellanox/mlx5/core/en/tc/post_act.c
drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c
drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec.c
drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
drivers/net/ethernet/mellanox/mlx5/core/en_common.c
drivers/net/ethernet/mellanox/mlx5/core/en_main.c
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
drivers/net/ethernet/mellanox/mlx5/core/esw/bridge_mcast.c
drivers/net/ethernet/mellanox/mlx5/core/esw/ipsec_fs.c
drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
drivers/net/ethernet/mellanox/mlx5/core/health.c
drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
drivers/net/ethernet/mellanox/mlx5/core/lib/aso.c
drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c
drivers/net/ethernet/mellanox/mlx5/core/vport.c
drivers/net/ethernet/microchip/lan966x/lan966x_lag.c
drivers/net/ethernet/microchip/lan966x/lan966x_port.c
drivers/net/ethernet/microchip/sparx5/sparx5_mactable.c
drivers/net/ethernet/microchip/sparx5/sparx5_main.c
drivers/net/ethernet/microchip/sparx5/sparx5_main.h
drivers/net/ethernet/microchip/sparx5/sparx5_packet.c
drivers/net/ethernet/netronome/nfp/flower/conntrack.c
drivers/net/ethernet/netronome/nfp/flower/tunnel_conf.c
drivers/net/ethernet/netronome/nfp/nfp_net_common.c
drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c
drivers/net/ethernet/pensando/ionic/ionic_bus_pci.c
drivers/net/ethernet/pensando/ionic/ionic_dev.c
drivers/net/ethernet/pensando/ionic/ionic_ethtool.c
drivers/net/ethernet/pensando/ionic/ionic_fw.c
drivers/net/ethernet/pensando/ionic/ionic_lif.c
drivers/net/ethernet/pensando/ionic/ionic_main.c
drivers/net/ethernet/pensando/ionic/ionic_txrx.c
drivers/net/ethernet/renesas/ravb_main.c
drivers/net/ethernet/stmicro/stmmac/common.h
drivers/net/ethernet/stmicro/stmmac/dwmac-imx.c
drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c
drivers/net/ethernet/stmicro/stmmac/dwmac4_lib.c
drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c
drivers/net/ethernet/stmicro/stmmac/dwxgmac2.h
drivers/net/ethernet/stmicro/stmmac/dwxgmac2_core.c
drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c
drivers/net/ethernet/stmicro/stmmac/hwif.c
drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
drivers/net/ethernet/ti/Kconfig
drivers/net/ethernet/ti/am65-cpsw-nuss.c
drivers/net/ethernet/ti/cpsw.c
drivers/net/ethernet/ti/cpsw_new.c
drivers/net/ethernet/ti/cpts.c
drivers/net/ethernet/toshiba/ps3_gelic_net.c
drivers/net/fddi/skfp/skfddi.c
drivers/net/fjes/fjes_hw.c
drivers/net/geneve.c
drivers/net/gtp.c
drivers/net/hyperv/netvsc.c
drivers/net/hyperv/netvsc_drv.c
drivers/net/ieee802154/fakelb.c
drivers/net/ipa/ipa_interrupt.c
drivers/net/ipvlan/ipvtap.c
drivers/net/macsec.c
drivers/net/netdevsim/dev.c
drivers/net/phy/mdio_devres.c
drivers/net/phy/mediatek-ge-soc.c
drivers/net/phy/micrel.c
drivers/net/phy/realtek.c
drivers/net/plip/plip.c
drivers/net/ppp/bsd_comp.c
drivers/net/ppp/ppp_async.c
drivers/net/ppp/ppp_deflate.c
drivers/net/ppp/ppp_generic.c
drivers/net/ppp/ppp_synctty.c
drivers/net/ppp/pppoe.c
drivers/net/tun.c
drivers/net/usb/dm9601.c
drivers/net/usb/lan78xx.c
drivers/net/usb/smsc95xx.c
drivers/net/veth.c
drivers/net/wireless/ath/ar5523/ar5523.c
drivers/net/wireless/ath/ath11k/core.h
drivers/net/wireless/ath/ath11k/debugfs.c
drivers/net/wireless/ath/ath11k/debugfs.h
drivers/net/wireless/ath/ath11k/mac.c
drivers/net/wireless/ath/wcn36xx/main.c
drivers/net/wireless/broadcom/brcm80211/brcmfmac/bca/module.c
drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
drivers/net/wireless/broadcom/brcm80211/brcmfmac/cyw/module.c
drivers/net/wireless/broadcom/brcm80211/brcmfmac/wcc/module.c
drivers/net/wireless/intel/iwlwifi/fw/acpi.c
drivers/net/wireless/intel/iwlwifi/fw/api/debug.h
drivers/net/wireless/intel/iwlwifi/fw/api/txq.h
drivers/net/wireless/intel/iwlwifi/fw/dbg.c
drivers/net/wireless/intel/iwlwifi/iwl-dbg-tlv.c
drivers/net/wireless/intel/iwlwifi/iwl-drv.c
drivers/net/wireless/intel/iwlwifi/iwl-nvm-parse.c
drivers/net/wireless/intel/iwlwifi/mvm/d3.c
drivers/net/wireless/intel/iwlwifi/mvm/mac-ctxt.c
drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
drivers/net/wireless/intel/iwlwifi/mvm/mld-mac80211.c
drivers/net/wireless/intel/iwlwifi/mvm/mvm.h
drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c
drivers/net/wireless/intel/iwlwifi/mvm/sta.c
drivers/net/wireless/intel/iwlwifi/mvm/sta.h
drivers/net/wireless/intel/iwlwifi/mvm/time-event.c
drivers/net/wireless/intel/iwlwifi/mvm/tx.c
drivers/net/wireless/intersil/p54/fwio.c
drivers/net/wireless/intersil/p54/p54spi.c
drivers/net/wireless/mediatek/mt76/mt7603/main.c
drivers/net/wireless/mediatek/mt76/mt7615/main.c
drivers/net/wireless/mediatek/mt76/mt7615/mmio.c
drivers/net/wireless/mediatek/mt76/mt7615/sdio.c
drivers/net/wireless/mediatek/mt76/mt7615/usb.c
drivers/net/wireless/mediatek/mt76/mt7615/usb_sdio.c
drivers/net/wireless/mediatek/mt76/mt76_connac_mcu.c
drivers/net/wireless/mediatek/mt76/mt76x0/eeprom.c
drivers/net/wireless/mediatek/mt76/mt76x0/pci.c
drivers/net/wireless/mediatek/mt76/mt76x0/usb.c
drivers/net/wireless/mediatek/mt76/mt76x02_usb_mcu.c
drivers/net/wireless/mediatek/mt76/mt76x02_util.c
drivers/net/wireless/mediatek/mt76/mt76x2/eeprom.c
drivers/net/wireless/mediatek/mt76/mt76x2/pci.c
drivers/net/wireless/mediatek/mt76/mt76x2/usb.c
drivers/net/wireless/mediatek/mt76/mt7915/mmio.c
drivers/net/wireless/mediatek/mt76/mt7921/main.c
drivers/net/wireless/mediatek/mt76/mt7921/pci.c
drivers/net/wireless/mediatek/mt76/mt7921/sdio.c
drivers/net/wireless/mediatek/mt76/mt7921/usb.c
drivers/net/wireless/mediatek/mt76/mt7925/main.c
drivers/net/wireless/mediatek/mt76/mt7925/pci.c
drivers/net/wireless/mediatek/mt76/mt7925/usb.c
drivers/net/wireless/mediatek/mt76/mt792x_core.c
drivers/net/wireless/mediatek/mt76/mt792x_usb.c
drivers/net/wireless/mediatek/mt76/mt7996/mcu.c
drivers/net/wireless/mediatek/mt76/mt7996/mmio.c
drivers/net/wireless/mediatek/mt76/sdio.c
drivers/net/wireless/mediatek/mt76/usb.c
drivers/net/wireless/mediatek/mt76/util.c
drivers/net/wireless/microchip/wilc1000/netdev.c
drivers/net/wireless/microchip/wilc1000/sdio.c
drivers/net/wireless/microchip/wilc1000/spi.c
drivers/net/wireless/ti/wl1251/sdio.c
drivers/net/wireless/ti/wl1251/spi.c
drivers/net/wireless/ti/wl12xx/main.c
drivers/net/wireless/ti/wl18xx/main.c
drivers/net/wireless/ti/wlcore/main.c
drivers/net/wireless/ti/wlcore/sdio.c
drivers/net/wireless/ti/wlcore/spi.c
drivers/net/xen-netback/netback.c
drivers/nvdimm/btt.c
drivers/nvdimm/pmem.c
drivers/nvme/common/auth.c
drivers/nvme/common/keyring.c
drivers/nvme/host/apple.c
drivers/nvme/host/auth.c
drivers/nvme/host/constants.c
drivers/nvme/host/core.c
drivers/nvme/host/fabrics.c
drivers/nvme/host/fabrics.h
drivers/nvme/host/fc.c
drivers/nvme/host/ioctl.c
drivers/nvme/host/multipath.c
drivers/nvme/host/nvme.h
drivers/nvme/host/pci.c
drivers/nvme/host/rdma.c
drivers/nvme/host/sysfs.c
drivers/nvme/host/tcp.c
drivers/nvme/host/zns.c
drivers/nvme/target/admin-cmd.c
drivers/nvme/target/configfs.c
drivers/nvme/target/core.c
drivers/nvme/target/discovery.c
drivers/nvme/target/fabrics-cmd.c
drivers/nvme/target/fc.c
drivers/nvme/target/fcloop.c
drivers/nvme/target/io-cmd-bdev.c
drivers/nvme/target/loop.c
drivers/nvme/target/nvmet.h
drivers/nvme/target/passthru.c
drivers/nvme/target/rdma.c
drivers/nvme/target/tcp.c
drivers/nvme/target/zns.c
drivers/nvmem/core.c
drivers/of/property.c
drivers/of/unittest.c
drivers/pci/bus.c
drivers/pci/controller/dwc/pcie-designware-ep.c
drivers/pci/controller/dwc/pcie-qcom.c
drivers/pci/msi/irqdomain.c
drivers/pci/pci.c
drivers/pci/pci.h
drivers/pci/pcie/aspm.c
drivers/perf/arm-cmn.c
drivers/perf/cxl_pmu.c
drivers/perf/riscv_pmu.c
drivers/perf/riscv_pmu_legacy.c
drivers/perf/riscv_pmu_sbi.c
drivers/phy/freescale/phy-fsl-imx8-mipi-dphy.c
drivers/phy/microchip/lan966x_serdes.c
drivers/phy/qualcomm/phy-qcom-eusb2-repeater.c
drivers/phy/qualcomm/phy-qcom-m31.c
drivers/phy/qualcomm/phy-qcom-qmp-combo.c
drivers/phy/qualcomm/phy-qcom-qmp-usb.c
drivers/phy/renesas/phy-rcar-gen3-usb2.c
drivers/phy/ti/phy-omap-usb2.c
drivers/pinctrl/core.c
drivers/pinctrl/pinctrl-amd.c
drivers/pinctrl/stm32/pinctrl-stm32mp257.c
drivers/platform/mellanox/mlxbf-pmc.c
drivers/platform/mellanox/mlxbf-tmfifo.c
drivers/platform/x86/amd/pmf/Kconfig
drivers/platform/x86/amd/pmf/core.c
drivers/platform/x86/amd/pmf/pmf.h
drivers/platform/x86/amd/pmf/spc.c
drivers/platform/x86/amd/pmf/tee-if.c
drivers/platform/x86/intel/ifs/load.c
drivers/platform/x86/intel/int0002_vgpio.c
drivers/platform/x86/intel/uncore-frequency/uncore-frequency-common.c
drivers/platform/x86/intel/uncore-frequency/uncore-frequency-common.h
drivers/platform/x86/intel/vbtn.c
drivers/platform/x86/intel/wmi/sbl-fw-update.c
drivers/platform/x86/p2sb.c
drivers/platform/x86/serdev_helpers.h [new file with mode: 0644]
drivers/platform/x86/think-lmi.c
drivers/platform/x86/thinkpad_acpi.c
drivers/platform/x86/touchscreen_dmi.c
drivers/platform/x86/wmi.c
drivers/platform/x86/x86-android-tablets/core.c
drivers/platform/x86/x86-android-tablets/lenovo.c
drivers/platform/x86/x86-android-tablets/other.c
drivers/platform/x86/x86-android-tablets/x86-android-tablets.h
drivers/pmdomain/arm/scmi_perf_domain.c
drivers/pmdomain/core.c
drivers/pmdomain/mediatek/mtk-pm-domains.c
drivers/pmdomain/qcom/rpmhpd.c
drivers/pmdomain/renesas/r8a77980-sysc.c
drivers/power/supply/Kconfig
drivers/power/supply/bq27xxx_battery_i2c.c
drivers/power/supply/qcom_battmgr.c
drivers/regulator/max5970-regulator.c
drivers/regulator/pwm-regulator.c
drivers/regulator/rk808-regulator.c
drivers/regulator/ti-abb-regulator.c
drivers/rtc/lib_test.c
drivers/s390/block/dasd.c
drivers/s390/block/dasd_3990_erp.c
drivers/s390/block/dasd_alias.c
drivers/s390/block/dasd_devmap.c
drivers/s390/block/dasd_diag.c
drivers/s390/block/dasd_eckd.c
drivers/s390/block/dasd_eer.c
drivers/s390/block/dasd_erp.c
drivers/s390/block/dasd_fba.c
drivers/s390/block/dasd_genhd.c
drivers/s390/block/dasd_int.h
drivers/s390/block/dasd_ioctl.c
drivers/s390/block/dasd_proc.c
drivers/s390/block/dcssblk.c
drivers/s390/block/scm_blk.c
drivers/s390/cio/device_ops.c
drivers/s390/net/qeth_l3_main.c
drivers/scsi/Kconfig
drivers/scsi/fcoe/fcoe_ctlr.c
drivers/scsi/fnic/fnic.h
drivers/scsi/fnic/fnic_fcs.c
drivers/scsi/fnic/fnic_main.c
drivers/scsi/fnic/fnic_scsi.c
drivers/scsi/initio.c
drivers/scsi/isci/request.c
drivers/scsi/lpfc/lpfc_scsi.c
drivers/scsi/mpi3mr/mpi3mr_transport.c
drivers/scsi/mpt3sas/mpt3sas_base.c
drivers/scsi/scsi.c
drivers/scsi/scsi_error.c
drivers/scsi/scsi_lib.c
drivers/scsi/scsi_priv.h
drivers/scsi/scsi_scan.c
drivers/scsi/sd.c
drivers/scsi/smartpqi/smartpqi_init.c
drivers/scsi/storvsc_drv.c
drivers/scsi/virtio_scsi.c
drivers/soc/apple/mailbox.c
drivers/soc/microchip/Kconfig
drivers/soc/qcom/pmic_glink.c
drivers/soc/qcom/pmic_glink_altmode.c
drivers/spi/spi-bcm-qspi.c
drivers/spi/spi-cadence-quadspi.c
drivers/spi/spi-cadence.c
drivers/spi/spi-cs42l43.c
drivers/spi/spi-hisi-sfc-v3xx.c
drivers/spi/spi-imx.c
drivers/spi/spi-intel-pci.c
drivers/spi/spi-mxs.c
drivers/spi/spi-omap2-mcspi.c
drivers/spi/spi-ppc4xx.c
drivers/spi/spi-sh-msiof.c
drivers/spi/spi.c
drivers/staging/iio/impedance-analyzer/ad5933.c
drivers/staging/media/atomisp/pci/atomisp_cmd.c
drivers/staging/media/atomisp/pci/atomisp_internal.h
drivers/staging/media/atomisp/pci/atomisp_ioctl.c
drivers/staging/media/atomisp/pci/atomisp_v4l2.c
drivers/target/target_core_configfs.c
drivers/target/target_core_iblock.c
drivers/target/target_core_iblock.h
drivers/target/target_core_pscsi.c
drivers/target/target_core_pscsi.h
drivers/tee/optee/device.c
drivers/thermal/intel/intel_powerclamp.c
drivers/thunderbolt/switch.c
drivers/thunderbolt/tb_regs.h
drivers/thunderbolt/usb4.c
drivers/tty/hvc/Kconfig
drivers/tty/serial/8250/8250_dw.c
drivers/tty/serial/8250/8250_pci1xxxx.c
drivers/tty/serial/amba-pl011.c
drivers/tty/serial/fsl_lpuart.c
drivers/tty/serial/imx.c
drivers/tty/serial/max310x.c
drivers/tty/serial/mxs-auart.c
drivers/tty/serial/qcom_geni_serial.c
drivers/tty/serial/serial_core.c
drivers/tty/serial/serial_port.c
drivers/tty/serial/stm32-usart.c
drivers/tty/vt/vt.c
drivers/ufs/core/ufshcd.c
drivers/usb/cdns3/cdns3-gadget.c
drivers/usb/cdns3/core.c
drivers/usb/cdns3/drd.c
drivers/usb/cdns3/drd.h
drivers/usb/cdns3/host.c
drivers/usb/chipidea/ci.h
drivers/usb/chipidea/core.c
drivers/usb/common/ulpi.c
drivers/usb/core/hub.c
drivers/usb/core/port.c
drivers/usb/dwc3/core.h
drivers/usb/dwc3/dwc3-pci.c
drivers/usb/dwc3/gadget.c
drivers/usb/dwc3/gadget.h
drivers/usb/dwc3/host.c
drivers/usb/gadget/function/f_mass_storage.c
drivers/usb/gadget/function/f_ncm.c
drivers/usb/gadget/udc/omap_udc.c
drivers/usb/gadget/udc/pch_udc.c
drivers/usb/host/uhci-grlib.c
drivers/usb/host/xhci-mem.c
drivers/usb/host/xhci-plat.c
drivers/usb/host/xhci-ring.c
drivers/usb/host/xhci.h
drivers/usb/roles/class.c
drivers/usb/serial/cp210x.c
drivers/usb/serial/option.c
drivers/usb/serial/qcserial.c
drivers/usb/storage/isd200.c
drivers/usb/storage/scsiglue.c
drivers/usb/storage/uas.c
drivers/usb/typec/altmodes/displayport.c
drivers/usb/typec/tcpm/tcpm.c
drivers/usb/typec/ucsi/ucsi.c
drivers/usb/typec/ucsi/ucsi_acpi.c
drivers/usb/typec/ucsi/ucsi_glink.c
drivers/video/fbdev/core/fbcon.c
drivers/video/fbdev/hyperv_fb.c
drivers/video/fbdev/savage/savagefb_driver.c
drivers/video/fbdev/sis/sis_main.c
drivers/video/fbdev/stifb.c
drivers/video/fbdev/vt8500lcdfb.c
drivers/xen/events/events_base.c
drivers/xen/gntalloc.c
drivers/xen/pcpu.c
drivers/xen/privcmd.c
drivers/xen/xen-balloon.c
drivers/xen/xenbus/xenbus_client.c
fs/9p/vfs_file.c
fs/Kconfig
fs/Makefile
fs/affs/affs.h
fs/affs/super.c
fs/afs/dir.c
fs/afs/dynroot.c
fs/afs/file.c
fs/afs/flock.c
fs/afs/internal.h
fs/afs/main.c
fs/afs/proc.c
fs/afs/server.c
fs/afs/volume.c
fs/aio.c
fs/attr.c
fs/backing-file.c
fs/bcachefs/alloc_background.c
fs/bcachefs/backpointers.c
fs/bcachefs/bcachefs.h
fs/bcachefs/btree_iter.c
fs/bcachefs/btree_locking.c
fs/bcachefs/btree_update_interior.c
fs/bcachefs/debug.c
fs/bcachefs/fs-io-buffered.c
fs/bcachefs/fs-io-direct.c
fs/bcachefs/fs-io.c
fs/bcachefs/fs-ioctl.c
fs/bcachefs/fs.c
fs/bcachefs/fsck.c
fs/bcachefs/io_write.c
fs/bcachefs/journal.c
fs/bcachefs/journal_io.c
fs/bcachefs/journal_reclaim.c
fs/bcachefs/mean_and_variance.h
fs/bcachefs/printbuf.c
fs/bcachefs/recovery.c
fs/bcachefs/sb-members.c
fs/bcachefs/snapshot.c
fs/bcachefs/str_hash.h
fs/bcachefs/super-io.c
fs/bcachefs/super.c
fs/bcachefs/super_types.h
fs/bcachefs/thread_with_file.c
fs/bcachefs/util.c
fs/bcachefs/util.h
fs/btrfs/block-group.c
fs/btrfs/block-group.h
fs/btrfs/block-rsv.c
fs/btrfs/block-rsv.h
fs/btrfs/compression.c
fs/btrfs/compression.h
fs/btrfs/defrag.c
fs/btrfs/delalloc-space.c
fs/btrfs/dev-replace.c
fs/btrfs/disk-io.c
fs/btrfs/disk-io.h
fs/btrfs/extent-tree.c
fs/btrfs/extent_io.c
fs/btrfs/inode.c
fs/btrfs/ioctl.c
fs/btrfs/lzo.c
fs/btrfs/qgroup.c
fs/btrfs/ref-verify.c
fs/btrfs/scrub.c
fs/btrfs/send.c
fs/btrfs/space-info.c
fs/btrfs/subpage.c
fs/btrfs/super.c
fs/btrfs/transaction.c
fs/btrfs/tree-checker.c
fs/btrfs/volumes.c
fs/btrfs/volumes.h
fs/btrfs/zlib.c
fs/btrfs/zoned.c
fs/buffer.c
fs/cachefiles/cache.c
fs/cachefiles/daemon.c
fs/cachefiles/ondemand.c
fs/ceph/caps.c
fs/ceph/inode.c
fs/ceph/locks.c
fs/ceph/mds_client.c
fs/ceph/mds_client.h
fs/ceph/mdsmap.c
fs/ceph/mdsmap.h
fs/ceph/super.h
fs/coda/inode.c
fs/coredump.c
fs/cramfs/inode.c
fs/crypto/fname.c
fs/crypto/hooks.c
fs/dcache.c
fs/direct-io.c
fs/dlm/plock.c
fs/ecryptfs/crypto.c
fs/efivarfs/internal.h
fs/efivarfs/super.c
fs/efivarfs/vars.c
fs/efs/super.c
fs/erofs/compress.h
fs/erofs/data.c
fs/erofs/decompressor.c
fs/erofs/decompressor_deflate.c
fs/erofs/decompressor_lzma.c
fs/erofs/fscache.c
fs/erofs/inode.c
fs/erofs/internal.h
fs/erofs/namei.c
fs/erofs/super.c
fs/erofs/utils.c
fs/erofs/zdata.c
fs/eventfd.c
fs/eventpoll.c
fs/exec.c
fs/exfat/exfat_fs.h
fs/exfat/file.c
fs/exfat/inode.c
fs/exfat/nls.c
fs/exfat/super.c
fs/exportfs/expfs.c
fs/ext4/ext4.h
fs/ext4/extents.c
fs/ext4/file.c
fs/ext4/fsmap.c
fs/ext4/indirect.c
fs/ext4/inode.c
fs/ext4/ioctl.c
fs/ext4/mballoc.c
fs/ext4/mballoc.h
fs/ext4/move_extent.c
fs/ext4/namei.c
fs/ext4/super.c
fs/ext4/symlink.c
fs/f2fs/f2fs.h
fs/f2fs/namei.c
fs/f2fs/segment.c
fs/f2fs/super.c
fs/fat/inode.c
fs/fcntl.c
fs/fhandle.c
fs/file_table.c
fs/fs-writeback.c
fs/fs_parser.c
fs/fuse/cuse.c
fs/fuse/file.c
fs/fuse/fuse_i.h
fs/fuse/inode.c
fs/gfs2/bmap.c
fs/gfs2/dentry.c
fs/gfs2/file.c
fs/gfs2/inode.c
fs/gfs2/ops_fstype.c
fs/hfsplus/hfsplus_fs.h
fs/hfsplus/super.c
fs/hfsplus/wrapper.c
fs/hugetlbfs/inode.c
fs/inode.c
fs/internal.h
fs/ioctl.c
fs/iomap/buffered-io.c
fs/iomap/direct-io.c
fs/iomap/trace.h
fs/jfs/jfs_dmap.c
fs/jfs/jfs_logmgr.c
fs/jfs/jfs_logmgr.h
fs/jfs/jfs_mount.c
fs/jfs/super.c
fs/kernfs/mount.c
fs/libfs.c
fs/lockd/clnt4xdr.c
fs/lockd/clntlock.c
fs/lockd/clntproc.c
fs/lockd/clntxdr.c
fs/lockd/svc4proc.c
fs/lockd/svclock.c
fs/lockd/svcproc.c
fs/lockd/svcsubs.c
fs/lockd/xdr.c
fs/lockd/xdr4.c
fs/locks.c
fs/mbcache.c
fs/minix/inode.c
fs/mnt_idmapping.c
fs/mpage.c
fs/namei.c
fs/namespace.c
fs/netfs/buffered_read.c
fs/netfs/buffered_write.c
fs/netfs/direct_write.c
fs/netfs/fscache_cache.c
fs/netfs/io.c
fs/netfs/misc.c
fs/nfs/blocklayout/blocklayout.h
fs/nfs/blocklayout/dev.c
fs/nfs/client.c
fs/nfs/delegation.c
fs/nfs/dir.c
fs/nfs/file.c
fs/nfs/nfs3proc.c
fs/nfs/nfs4_fs.h
fs/nfs/nfs4file.c
fs/nfs/nfs4proc.c
fs/nfs/nfs4state.c
fs/nfs/nfs4trace.h
fs/nfs/nfs4xdr.c
fs/nfs/write.c
fs/nfsd/filecache.c
fs/nfsd/nfs4callback.c
fs/nfsd/nfs4layouts.c
fs/nfsd/nfs4state.c
fs/nilfs2/file.c
fs/nilfs2/recovery.c
fs/nilfs2/segment.c
fs/nsfs.c
fs/ntfs/Kconfig [deleted file]
fs/ntfs/Makefile [deleted file]
fs/ntfs/aops.c [deleted file]
fs/ntfs/aops.h [deleted file]
fs/ntfs/attrib.c [deleted file]
fs/ntfs/attrib.h [deleted file]
fs/ntfs/bitmap.c [deleted file]
fs/ntfs/bitmap.h [deleted file]
fs/ntfs/collate.c [deleted file]
fs/ntfs/collate.h [deleted file]
fs/ntfs/compress.c [deleted file]
fs/ntfs/debug.c [deleted file]
fs/ntfs/debug.h [deleted file]
fs/ntfs/dir.c [deleted file]
fs/ntfs/dir.h [deleted file]
fs/ntfs/endian.h [deleted file]
fs/ntfs/file.c [deleted file]
fs/ntfs/index.c [deleted file]
fs/ntfs/index.h [deleted file]
fs/ntfs/inode.c [deleted file]
fs/ntfs/inode.h [deleted file]
fs/ntfs/layout.h [deleted file]
fs/ntfs/lcnalloc.c [deleted file]
fs/ntfs/lcnalloc.h [deleted file]
fs/ntfs/logfile.c [deleted file]
fs/ntfs/logfile.h [deleted file]
fs/ntfs/malloc.h [deleted file]
fs/ntfs/mft.c [deleted file]
fs/ntfs/mft.h [deleted file]
fs/ntfs/mst.c [deleted file]
fs/ntfs/namei.c [deleted file]
fs/ntfs/ntfs.h [deleted file]
fs/ntfs/quota.c [deleted file]
fs/ntfs/quota.h [deleted file]
fs/ntfs/runlist.c [deleted file]
fs/ntfs/runlist.h [deleted file]
fs/ntfs/super.c [deleted file]
fs/ntfs/sysctl.c [deleted file]
fs/ntfs/sysctl.h [deleted file]
fs/ntfs/time.h [deleted file]
fs/ntfs/types.h [deleted file]
fs/ntfs/unistr.c [deleted file]
fs/ntfs/upcase.c [deleted file]
fs/ntfs/usnjrnl.c [deleted file]
fs/ntfs/usnjrnl.h [deleted file]
fs/ntfs/volume.h [deleted file]
fs/ntfs3/attrib.c
fs/ntfs3/attrlist.c
fs/ntfs3/bitmap.c
fs/ntfs3/dir.c
fs/ntfs3/file.c
fs/ntfs3/frecord.c
fs/ntfs3/fslog.c
fs/ntfs3/fsntfs.c
fs/ntfs3/index.c
fs/ntfs3/inode.c
fs/ntfs3/namei.c
fs/ntfs3/ntfs.h
fs/ntfs3/ntfs_fs.h
fs/ntfs3/record.c
fs/ntfs3/super.c
fs/ntfs3/xattr.c
fs/ocfs2/cluster/heartbeat.c
fs/ocfs2/locks.c
fs/ocfs2/stack_user.c
fs/ocfs2/super.c
fs/open.c
fs/openpromfs/inode.c
fs/overlayfs/copy_up.c
fs/overlayfs/namei.c
fs/overlayfs/overlayfs.h
fs/overlayfs/ovl_entry.h
fs/overlayfs/params.c
fs/overlayfs/readdir.c
fs/overlayfs/super.c
fs/overlayfs/util.c
fs/pidfs.c [new file with mode: 0644]
fs/pipe.c
fs/posix_acl.c
fs/proc/array.c
fs/proc/base.c
fs/proc/inode.c
fs/proc/root.c
fs/qnx4/inode.c
fs/qnx6/inode.c
fs/reiserfs/journal.c
fs/reiserfs/procfs.c
fs/reiserfs/reiserfs.h
fs/reiserfs/super.c
fs/remap_range.c
fs/romfs/super.c
fs/select.c
fs/smb/client/cached_dir.c
fs/smb/client/cifsencrypt.c
fs/smb/client/cifsfs.c
fs/smb/client/cifsglob.h
fs/smb/client/cifssmb.c
fs/smb/client/connect.c
fs/smb/client/dfs.c
fs/smb/client/file.c
fs/smb/client/fs_context.c
fs/smb/client/inode.c
fs/smb/client/namespace.c
fs/smb/client/readdir.c
fs/smb/client/sess.c
fs/smb/client/smb2file.c
fs/smb/client/smb2inode.c
fs/smb/client/smb2ops.c
fs/smb/client/smb2pdu.c
fs/smb/client/smb2proto.h
fs/smb/client/smbencrypt.c
fs/smb/client/transport.c
fs/smb/server/ksmbd_netlink.h
fs/smb/server/misc.c
fs/smb/server/smb2pdu.c
fs/smb/server/transport_ipc.c
fs/smb/server/transport_tcp.c
fs/smb/server/vfs.c
fs/super.c
fs/sysv/inode.c
fs/sysv/itree.c
fs/tracefs/event_inode.c
fs/tracefs/inode.c
fs/tracefs/internal.h
fs/ubifs/dir.c
fs/ubifs/super.c
fs/xfs/libxfs/xfs_attr.c
fs/xfs/libxfs/xfs_rtbitmap.c
fs/xfs/libxfs/xfs_rtbitmap.h
fs/xfs/libxfs/xfs_sb.c
fs/xfs/libxfs/xfs_sb.h
fs/xfs/libxfs/xfs_types.h
fs/xfs/scrub/rtbitmap.c
fs/xfs/scrub/rtsummary.c
fs/xfs/xfs_aops.c
fs/xfs/xfs_buf.c
fs/xfs/xfs_buf.h
fs/xfs/xfs_mount.c
fs/xfs/xfs_super.c
fs/zonefs/file.c
fs/zonefs/super.c
include/asm-generic/barrier.h
include/drm/bridge/aux-bridge.h
include/kunit/test.h
include/linux/backing-dev-defs.h
include/linux/backing-dev.h
include/linux/blk-integrity.h
include/linux/blk-mq.h
include/linux/blk_types.h
include/linux/blkdev.h
include/linux/bvec.h
include/linux/ceph/messenger.h
include/linux/ceph/osd_client.h
include/linux/compiler-gcc.h
include/linux/compiler_attributes.h
include/linux/compiler_types.h
include/linux/cper.h
include/linux/cxl-event.h
include/linux/dcache.h
include/linux/device-mapper.h
include/linux/dmaengine.h
include/linux/dpll.h
include/linux/file.h
include/linux/filelock.h
include/linux/fs.h
include/linux/fscrypt.h
include/linux/gfp.h
include/linux/gpio/driver.h
include/linux/hid_bpf.h
include/linux/hrtimer.h
include/linux/hyperv.h
include/linux/iio/adc/ad_sigma_delta.h
include/linux/iio/common/st_sensors.h
include/linux/iio/imu/adis.h
include/linux/io_uring_types.h
include/linux/iomap.h
include/linux/iommu.h
include/linux/kvm_host.h
include/linux/libata.h
include/linux/lockd/lockd.h
include/linux/lockd/xdr.h
include/linux/lsm_hook_defs.h
include/linux/maple_tree.h
include/linux/memblock.h
include/linux/mlx5/driver.h
include/linux/mlx5/fs.h
include/linux/mlx5/mlx5_ifc.h
include/linux/mlx5/qp.h
include/linux/mlx5/vport.h
include/linux/mman.h
include/linux/mmzone.h
include/linux/netdevice.h
include/linux/netfilter.h
include/linux/netfilter/ipset/ip_set.h
include/linux/nfs_fs_sb.h
include/linux/ns_common.h
include/linux/nvme-rdma.h
include/linux/nvme.h
include/linux/pci.h
include/linux/pid.h
include/linux/pidfs.h [new file with mode: 0644]
include/linux/pktcdvd.h
include/linux/poison.h
include/linux/poll.h
include/linux/proc_fs.h
include/linux/proc_ns.h
include/linux/ptrace.h
include/linux/rcu_sync.h
include/linux/rcupdate.h
include/linux/rw_hint.h [new file with mode: 0644]
include/linux/sched.h
include/linux/sched/signal.h
include/linux/sched/topology.h
include/linux/seq_buf.h
include/linux/serial_core.h
include/linux/skmsg.h
include/linux/spi/spi.h
include/linux/swap.h
include/linux/syscalls.h
include/linux/tcp.h
include/linux/trace_seq.h
include/linux/uio.h
include/linux/usb/gadget.h
include/net/af_unix.h
include/net/busy_poll.h
include/net/cfg80211.h
include/net/inet_connection_sock.h
include/net/inet_sock.h
include/net/ip.h
include/net/llc_pdu.h
include/net/mctp.h
include/net/netfilter/nf_flow_table.h
include/net/netfilter/nf_tables.h
include/net/sch_generic.h
include/net/sock.h
include/net/switchdev.h
include/net/tcp.h
include/net/tls.h
include/net/xdp_sock_drv.h
include/scsi/scsi_device.h
include/sound/cs35l56.h
include/sound/soc-card.h
include/sound/tas2781.h
include/trace/events/afs.h
include/trace/events/ext4.h
include/trace/events/filelock.h
include/trace/events/io_uring.h
include/trace/events/qdisc.h
include/trace/events/rxrpc.h
include/uapi/drm/ivpu_accel.h
include/uapi/drm/nouveau_drm.h
include/uapi/drm/xe_drm.h
include/uapi/linux/btrfs.h
include/uapi/linux/fs.h
include/uapi/linux/iio/types.h
include/uapi/linux/in6.h
include/uapi/linux/io_uring.h
include/uapi/linux/magic.h
include/uapi/linux/netfilter/nf_tables.h
include/uapi/linux/pidfd.h
include/uapi/linux/serial.h
include/uapi/linux/ublk_cmd.h
include/uapi/sound/asound.h
include/uapi/xen/gntalloc.h
init/Kconfig
init/do_mounts.c
init/do_mounts.h
init/init_task.c
init/initramfs.c
init/main.c
io_uring/Makefile
io_uring/cancel.c
io_uring/cancel.h
io_uring/fdinfo.c
io_uring/filetable.h
io_uring/io_uring.c
io_uring/io_uring.h
io_uring/kbuf.c
io_uring/kbuf.h
io_uring/napi.c [new file with mode: 0644]
io_uring/napi.h [new file with mode: 0644]
io_uring/net.c
io_uring/opdef.c
io_uring/openclose.c
io_uring/poll.c
io_uring/poll.h
io_uring/register.c
io_uring/rsrc.h
io_uring/rw.c
io_uring/sqpoll.c
io_uring/sqpoll.h
io_uring/truncate.c [new file with mode: 0644]
io_uring/truncate.h [new file with mode: 0644]
io_uring/uring_cmd.c
io_uring/xattr.c
kernel/bpf/cpumap.c
kernel/bpf/helpers.c
kernel/bpf/task_iter.c
kernel/bpf/verifier.c
kernel/cgroup/cpuset.c
kernel/context_tracking.c
kernel/events/uprobes.c
kernel/exit.c
kernel/fork.c
kernel/futex/core.c
kernel/futex/pi.c
kernel/irq/irqdesc.c
kernel/kprobes.c
kernel/nsproxy.c
kernel/pid.c
kernel/power/swap.c
kernel/rcu/Kconfig
kernel/rcu/rcu.h
kernel/rcu/rcuscale.c
kernel/rcu/rcutorture.c
kernel/rcu/srcutree.c
kernel/rcu/sync.c
kernel/rcu/tasks.h
kernel/rcu/tiny.c
kernel/rcu/tree.c
kernel/rcu/tree.h
kernel/rcu/tree_exp.h
kernel/rcu/tree_nocb.h
kernel/rcu/tree_plugin.h
kernel/sched/core.c
kernel/sched/membarrier.c
kernel/signal.c
kernel/sys.c
kernel/time/clocksource.c
kernel/time/hrtimer.c
kernel/time/tick-sched.c
kernel/time/time_test.c
kernel/trace/fprobe.c
kernel/trace/ftrace.c
kernel/trace/ring_buffer.c
kernel/trace/trace.c
kernel/trace/trace_btf.c
kernel/trace/trace_events_synth.c
kernel/trace/trace_events_trigger.c
kernel/trace/trace_osnoise.c
kernel/trace/trace_output.c
kernel/trace/trace_probe.c
kernel/trace/trace_probe.h
kernel/trace/tracing_map.c
kernel/workqueue.c
lib/Kconfig.debug
lib/Makefile
lib/checksum_kunit.c
lib/cmdline_kunit.c
lib/iov_iter.c
lib/kobject.c
lib/kunit/device-impl.h
lib/kunit/device.c
lib/kunit/executor.c
lib/kunit/executor_test.c
lib/kunit/kunit-test.c
lib/kunit/test.c
lib/livepatch/Makefile [deleted file]
lib/maple_tree.c
lib/memcpy_kunit.c
lib/nlattr.c
lib/seq_buf.c
lib/stackdepot.c
lib/test_maple_tree.c
mm/backing-dev.c
mm/compaction.c
mm/damon/core.c
mm/damon/lru_sort.c
mm/damon/reclaim.c
mm/damon/sysfs-schemes.c
mm/debug_vm_pgtable.c
mm/filemap.c
mm/huge_memory.c
mm/kasan/common.c
mm/kasan/generic.c
mm/kasan/kasan.h
mm/kasan/quarantine.c
mm/madvise.c
mm/memblock.c
mm/memcontrol.c
mm/memory-failure.c
mm/memory.c
mm/migrate.c
mm/mmap.c
mm/page-writeback.c
mm/page_alloc.c
mm/readahead.c
mm/shmem.c
mm/swap.h
mm/swap_state.c
mm/swapfile.c
mm/userfaultfd.c
mm/vmscan.c
mm/zswap.c
net/6lowpan/core.c
net/8021q/vlan_netlink.c
net/atm/mpc.c
net/batman-adv/multicast.c
net/bluetooth/hci_core.c
net/bluetooth/hci_event.c
net/bluetooth/hci_sync.c
net/bluetooth/l2cap_core.c
net/bluetooth/mgmt.c
net/bluetooth/rfcomm/core.c
net/bridge/br_multicast.c
net/bridge/br_netfilter_hooks.c
net/bridge/br_private.h
net/bridge/br_switchdev.c
net/bridge/netfilter/nf_conntrack_bridge.c
net/can/j1939/j1939-priv.h
net/can/j1939/main.c
net/can/j1939/socket.c
net/ceph/messenger_v1.c
net/ceph/messenger_v2.c
net/ceph/osd_client.c
net/core/datagram.c
net/core/dev.c
net/core/dev.h
net/core/filter.c
net/core/gso_test.c
net/core/page_pool_user.c
net/core/request_sock.c
net/core/rtnetlink.c
net/core/skmsg.c
net/core/sock.c
net/devlink/core.c
net/devlink/port.c
net/handshake/handshake-test.c
net/hsr/hsr_device.c
net/hsr/hsr_forward.c
net/ipv4/af_inet.c
net/ipv4/ah4.c
net/ipv4/arp.c
net/ipv4/devinet.c
net/ipv4/esp4.c
net/ipv4/inet_connection_sock.c
net/ipv4/inet_hashtables.c
net/ipv4/ip_gre.c
net/ipv4/ip_output.c
net/ipv4/ip_sockglue.c
net/ipv4/ip_tunnel.c
net/ipv4/ip_tunnel_core.c
net/ipv4/ip_vti.c
net/ipv4/ipip.c
net/ipv4/ipmr.c
net/ipv4/raw.c
net/ipv4/tcp.c
net/ipv4/tunnel4.c
net/ipv4/udp.c
net/ipv4/udp_tunnel_core.c
net/ipv4/xfrm4_tunnel.c
net/ipv6/addrconf.c
net/ipv6/addrconf_core.c
net/ipv6/af_inet6.c
net/ipv6/ah6.c
net/ipv6/esp6.c
net/ipv6/exthdrs.c
net/ipv6/ip6_output.c
net/ipv6/ip6_tunnel.c
net/ipv6/ip6_udp_tunnel.c
net/ipv6/mip6.c
net/ipv6/route.c
net/ipv6/seg6.c
net/ipv6/sit.c
net/ipv6/tunnel6.c
net/ipv6/xfrm6_tunnel.c
net/iucv/iucv.c
net/key/af_key.c
net/l2tp/l2tp_ip6.c
net/llc/af_llc.c
net/llc/llc_core.c
net/mac80211/Kconfig
net/mac80211/cfg.c
net/mac80211/debugfs_netdev.c
net/mac80211/debugfs_netdev.h
net/mac80211/iface.c
net/mac80211/mlme.c
net/mac80211/rate.c
net/mac80211/scan.c
net/mac80211/sta_info.c
net/mac80211/tx.c
net/mac80211/wbrf.c
net/mctp/route.c
net/mptcp/diag.c
net/mptcp/fastopen.c
net/mptcp/options.c
net/mptcp/pm_netlink.c
net/mptcp/pm_userspace.c
net/mptcp/protocol.c
net/mptcp/protocol.h
net/mptcp/subflow.c
net/netfilter/ipset/ip_set_bitmap_gen.h
net/netfilter/ipset/ip_set_core.c
net/netfilter/ipset/ip_set_hash_gen.h
net/netfilter/ipset/ip_set_list_set.c
net/netfilter/nf_conntrack_core.c
net/netfilter/nf_conntrack_h323_asn1.c
net/netfilter/nf_conntrack_netlink.c
net/netfilter/nf_conntrack_proto_sctp.c
net/netfilter/nf_conntrack_proto_tcp.c
net/netfilter/nf_flow_table_core.c
net/netfilter/nf_log.c
net/netfilter/nf_nat_core.c
net/netfilter/nf_tables_api.c
net/netfilter/nfnetlink_queue.c
net/netfilter/nft_chain_filter.c
net/netfilter/nft_compat.c
net/netfilter/nft_ct.c
net/netfilter/nft_flow_offload.c
net/netfilter/nft_limit.c
net/netfilter/nft_nat.c
net/netfilter/nft_rt.c
net/netfilter/nft_set_hash.c
net/netfilter/nft_set_pipapo.c
net/netfilter/nft_set_pipapo.h
net/netfilter/nft_set_pipapo_avx2.c
net/netfilter/nft_set_rbtree.c
net/netfilter/nft_socket.c
net/netfilter/nft_synproxy.c
net/netfilter/nft_tproxy.c
net/netfilter/nft_tunnel.c
net/netfilter/nft_xfrm.c
net/netlink/af_netlink.c
net/netrom/af_netrom.c
net/netrom/nr_dev.c
net/netrom/nr_in.c
net/netrom/nr_out.c
net/netrom/nr_route.c
net/netrom/nr_subr.c
net/nfc/nci/core.c
net/openvswitch/flow_netlink.c
net/phonet/datagram.c
net/phonet/pep.c
net/rds/af_rds.c
net/rds/rdma.c
net/rds/recv.c
net/rds/send.c
net/rxrpc/ar-internal.h
net/rxrpc/call_event.c
net/rxrpc/call_object.c
net/rxrpc/conn_event.c
net/rxrpc/input.c
net/rxrpc/output.c
net/rxrpc/proc.c
net/rxrpc/rxkad.c
net/sched/act_mirred.c
net/sched/cls_api.c
net/sched/cls_flower.c
net/sched/em_canid.c
net/sched/em_cmp.c
net/sched/em_meta.c
net/sched/em_nbyte.c
net/sched/em_text.c
net/sched/em_u32.c
net/sctp/inqueue.c
net/smc/af_smc.c
net/smc/smc_core.c
net/smc/smc_diag.c
net/sunrpc/svc.c
net/sunrpc/svcsock.c
net/switchdev/switchdev.c
net/tipc/bearer.c
net/tls/tls_main.c
net/tls/tls_sw.c
net/unix/af_unix.c
net/unix/diag.c
net/unix/garbage.c
net/wireless/Kconfig
net/wireless/core.c
net/wireless/nl80211.c
net/wireless/scan.c
net/xdp/xsk.c
net/xdp/xsk_buff_pool.c
net/xfrm/xfrm_algo.c
net/xfrm/xfrm_device.c
net/xfrm/xfrm_output.c
net/xfrm/xfrm_policy.c
net/xfrm/xfrm_user.c
samples/bpf/asm_goto_workaround.h
samples/cgroup/.gitignore [new file with mode: 0644]
scripts/Kconfig.include
scripts/Makefile.compiler
scripts/Makefile.defconf
scripts/Makefile.extrawarn
scripts/bpf_doc.py
scripts/clang-tools/gen_compile_commands.py
scripts/gdb/linux/symbols.py
scripts/kconfig/symbol.c
scripts/link-vmlinux.sh
scripts/mksysmap
scripts/mod/modpost.c
scripts/mod/modpost.h
scripts/mod/sumversion.c
scripts/package/kernel.spec
security/apparmor/lsm.c
security/integrity/digsig.c
security/keys/encrypted-keys/encrypted.c
security/landlock/fs.c
security/security.c
security/selinux/hooks.c
security/tomoyo/common.c
security/tomoyo/tomoyo.c
sound/core/Makefile
sound/core/pcm.c
sound/core/pcm_native.c
sound/core/ump.c
sound/firewire/amdtp-stream.c
sound/pci/hda/Kconfig
sound/pci/hda/cs35l41_hda_property.c
sound/pci/hda/cs35l56_hda.c
sound/pci/hda/hda_controller.c
sound/pci/hda/hda_intel.c
sound/pci/hda/patch_conexant.c
sound/pci/hda/patch_cs8409.c
sound/pci/hda/patch_realtek.c
sound/pci/hda/tas2781_hda_i2c.c
sound/soc/amd/acp/acp-mach-common.c
sound/soc/amd/acp/acp-sof-mach.c
sound/soc/amd/acp/acp3x-es83xx/acp3x-es83xx.c
sound/soc/amd/yc/acp6x-mach.c
sound/soc/amd/yc/pci-acp6x.c
sound/soc/codecs/cs35l45.c
sound/soc/codecs/cs35l56-shared.c
sound/soc/codecs/cs35l56.c
sound/soc/codecs/cs35l56.h
sound/soc/codecs/cs42l43.c
sound/soc/codecs/es8326.c [changed mode: 0755->0644]
sound/soc/codecs/es8326.h
sound/soc/codecs/lpass-wsa-macro.c
sound/soc/codecs/madera.c
sound/soc/codecs/rt5645.c
sound/soc/codecs/tas2781-comlib.c
sound/soc/codecs/tas2781-i2c.c
sound/soc/codecs/wcd9335.c
sound/soc/codecs/wcd934x.c
sound/soc/codecs/wcd938x.c
sound/soc/codecs/wm8962.c
sound/soc/codecs/wm_adsp.c
sound/soc/codecs/wsa883x.c
sound/soc/fsl/fsl_xcvr.c
sound/soc/intel/avs/core.c
sound/soc/intel/avs/topology.c
sound/soc/intel/boards/bytcht_cx2072x.c
sound/soc/intel/boards/bytcht_da7213.c
sound/soc/intel/boards/bytcht_es8316.c
sound/soc/intel/boards/bytcr_rt5640.c
sound/soc/intel/boards/bytcr_rt5651.c
sound/soc/intel/boards/bytcr_wm5102.c
sound/soc/intel/boards/cht_bsw_rt5645.c
sound/soc/intel/boards/cht_bsw_rt5672.c
sound/soc/qcom/lpass-cdc-dma.c
sound/soc/qcom/qdsp6/q6apm-dai.c
sound/soc/qcom/sc8280xp.c
sound/soc/sh/rcar/adg.c
sound/soc/soc-card.c
sound/soc/soc-core.c
sound/soc/sof/amd/acp-ipc.c
sound/soc/sof/amd/acp.c
sound/soc/sof/intel/pci-lnl.c
sound/soc/sof/intel/pci-tgl.c
sound/soc/sof/ipc3-topology.c
sound/soc/sof/ipc3.c
sound/soc/sof/ipc4-pcm.c
sound/soc/sunxi/sun4i-spdif.c
sound/usb/clock.c
sound/usb/format.c
sound/usb/midi.c
sound/usb/midi2.c
sound/usb/quirks.c
sound/virtio/virtio_card.c
sound/virtio/virtio_ctl_msg.c
sound/virtio/virtio_pcm_msg.c
tools/arch/x86/include/asm/cpufeatures.h
tools/arch/x86/include/asm/msr-index.h
tools/arch/x86/include/asm/rmwcc.h
tools/arch/x86/include/uapi/asm/kvm.h
tools/arch/x86/lib/memcpy_64.S
tools/arch/x86/lib/memset_64.S
tools/include/asm-generic/unaligned.h
tools/include/linux/compiler_types.h
tools/include/uapi/asm-generic/unistd.h
tools/include/uapi/drm/drm.h
tools/include/uapi/drm/i915_drm.h
tools/include/uapi/linux/fcntl.h
tools/include/uapi/linux/kvm.h
tools/include/uapi/linux/mount.h
tools/include/uapi/linux/stat.h
tools/net/ynl/lib/ynl.c
tools/perf/Documentation/perf-list.txt
tools/perf/Makefile.perf
tools/perf/builtin-list.c
tools/perf/builtin-record.c
tools/perf/builtin-top.c
tools/perf/pmu-events/arch/x86/alderlake/adl-metrics.json
tools/perf/pmu-events/arch/x86/alderlaken/adln-metrics.json
tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json
tools/perf/tests/shell/daemon.sh
tools/perf/tests/shell/list.sh
tools/perf/tests/shell/script.sh
tools/perf/trace/beauty/statx.c
tools/perf/util/evlist.c
tools/perf/util/hist.c
tools/perf/util/include/linux/linkage.h
tools/perf/util/metricgroup.c
tools/perf/util/print-events.c
tools/perf/util/synthetic-events.c
tools/power/cpupower/bench/Makefile
tools/testing/cxl/Kbuild
tools/testing/cxl/test/Kbuild
tools/testing/cxl/test/cxl.c
tools/testing/cxl/test/mock.c
tools/testing/cxl/test/mock.h
tools/testing/kunit/kunit_kernel.py
tools/testing/nvdimm/Kbuild
tools/testing/selftests/Makefile
tools/testing/selftests/bpf/prog_tests/iters.c
tools/testing/selftests/bpf/prog_tests/read_vsyscall.c [new file with mode: 0644]
tools/testing/selftests/bpf/prog_tests/timer.c
tools/testing/selftests/bpf/prog_tests/xdp_bonding.c
tools/testing/selftests/bpf/progs/iters_task.c
tools/testing/selftests/bpf/progs/read_vsyscall.c [new file with mode: 0644]
tools/testing/selftests/bpf/progs/timer.c
tools/testing/selftests/bpf/progs/verifier_iterating_callbacks.c
tools/testing/selftests/core/close_range_test.c
tools/testing/selftests/drivers/net/bonding/bond_options.sh
tools/testing/selftests/drivers/net/bonding/lag_lib.sh
tools/testing/selftests/drivers/net/bonding/settings
tools/testing/selftests/drivers/net/netdevsim/udp_tunnel_nic.sh
tools/testing/selftests/drivers/net/team/config
tools/testing/selftests/dt/Makefile
tools/testing/selftests/dt/test_unprobed_devices.sh
tools/testing/selftests/filesystems/overlayfs/dev_in_maps.c
tools/testing/selftests/ftrace/ftracetest
tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc
tools/testing/selftests/ftrace/test.d/ftrace/func_hotplug.tc [new file with mode: 0644]
tools/testing/selftests/ftrace/test.d/trigger/trigger-hist-mod.tc
tools/testing/selftests/futex/functional/futex_requeue_pi.c
tools/testing/selftests/hid/tests/test_wacom_generic.py
tools/testing/selftests/iommu/config
tools/testing/selftests/iommu/iommufd.c
tools/testing/selftests/iommu/iommufd_utils.h
tools/testing/selftests/kselftest/ktap_helpers.sh [moved from tools/testing/selftests/dt/ktap_helpers.sh with 66% similarity]
tools/testing/selftests/kvm/aarch64/arch_timer.c
tools/testing/selftests/kvm/aarch64/hypercalls.c
tools/testing/selftests/kvm/aarch64/page_fault_test.c
tools/testing/selftests/kvm/aarch64/smccc_filter.c
tools/testing/selftests/kvm/aarch64/vpmu_counter_access.c
tools/testing/selftests/kvm/demand_paging_test.c
tools/testing/selftests/kvm/dirty_log_perf_test.c
tools/testing/selftests/kvm/dirty_log_test.c
tools/testing/selftests/kvm/get-reg-list.c
tools/testing/selftests/kvm/guest_print_test.c
tools/testing/selftests/kvm/hardware_disable_test.c
tools/testing/selftests/kvm/include/test_util.h
tools/testing/selftests/kvm/include/x86_64/processor.h
tools/testing/selftests/kvm/kvm_create_max_vcpus.c
tools/testing/selftests/kvm/kvm_page_table_test.c
tools/testing/selftests/kvm/lib/aarch64/processor.c
tools/testing/selftests/kvm/lib/aarch64/vgic.c
tools/testing/selftests/kvm/lib/elf.c
tools/testing/selftests/kvm/lib/kvm_util.c
tools/testing/selftests/kvm/lib/memstress.c
tools/testing/selftests/kvm/lib/riscv/processor.c
tools/testing/selftests/kvm/lib/s390x/processor.c
tools/testing/selftests/kvm/lib/test_util.c
tools/testing/selftests/kvm/lib/userfaultfd_util.c
tools/testing/selftests/kvm/lib/x86_64/processor.c
tools/testing/selftests/kvm/lib/x86_64/vmx.c
tools/testing/selftests/kvm/memslot_modification_stress_test.c
tools/testing/selftests/kvm/memslot_perf_test.c
tools/testing/selftests/kvm/riscv/get-reg-list.c
tools/testing/selftests/kvm/rseq_test.c
tools/testing/selftests/kvm/s390x/resets.c
tools/testing/selftests/kvm/s390x/sync_regs_test.c
tools/testing/selftests/kvm/set_memory_region_test.c
tools/testing/selftests/kvm/system_counter_offset_test.c
tools/testing/selftests/kvm/x86_64/amx_test.c
tools/testing/selftests/kvm/x86_64/cpuid_test.c
tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
tools/testing/selftests/kvm/x86_64/flds_emulation.h
tools/testing/selftests/kvm/x86_64/hyperv_clock.c
tools/testing/selftests/kvm/x86_64/hyperv_features.c
tools/testing/selftests/kvm/x86_64/hyperv_ipi.c
tools/testing/selftests/kvm/x86_64/hyperv_tlb_flush.c
tools/testing/selftests/kvm/x86_64/kvm_clock_test.c
tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c
tools/testing/selftests/kvm/x86_64/platform_info_test.c
tools/testing/selftests/kvm/x86_64/pmu_event_filter_test.c
tools/testing/selftests/kvm/x86_64/sev_migrate_tests.c
tools/testing/selftests/kvm/x86_64/smaller_maxphyaddr_emulation_test.c
tools/testing/selftests/kvm/x86_64/sync_regs_test.c
tools/testing/selftests/kvm/x86_64/ucna_injection_test.c
tools/testing/selftests/kvm/x86_64/userspace_io_test.c
tools/testing/selftests/kvm/x86_64/vmx_apic_access_test.c
tools/testing/selftests/kvm/x86_64/vmx_dirty_log_test.c
tools/testing/selftests/kvm/x86_64/vmx_exception_with_invalid_guest_state.c
tools/testing/selftests/kvm/x86_64/vmx_nested_tsc_scaling_test.c
tools/testing/selftests/kvm/x86_64/xapic_ipi_test.c
tools/testing/selftests/kvm/x86_64/xcr0_cpuid_test.c
tools/testing/selftests/kvm/x86_64/xss_msr_test.c
tools/testing/selftests/landlock/common.h
tools/testing/selftests/landlock/fs_test.c
tools/testing/selftests/landlock/net_test.c
tools/testing/selftests/lib.mk
tools/testing/selftests/livepatch/.gitignore [new file with mode: 0644]
tools/testing/selftests/livepatch/Makefile
tools/testing/selftests/livepatch/README
tools/testing/selftests/livepatch/config
tools/testing/selftests/livepatch/functions.sh
tools/testing/selftests/livepatch/test-callbacks.sh
tools/testing/selftests/livepatch/test-ftrace.sh
tools/testing/selftests/livepatch/test-livepatch.sh
tools/testing/selftests/livepatch/test-shadow-vars.sh
tools/testing/selftests/livepatch/test-state.sh
tools/testing/selftests/livepatch/test-syscall.sh [new file with mode: 0755]
tools/testing/selftests/livepatch/test-sysfs.sh
tools/testing/selftests/livepatch/test_klp-call_getpid.c [new file with mode: 0644]
tools/testing/selftests/livepatch/test_modules/Makefile [new file with mode: 0644]
tools/testing/selftests/livepatch/test_modules/test_klp_atomic_replace.c [moved from lib/livepatch/test_klp_atomic_replace.c with 100% similarity]
tools/testing/selftests/livepatch/test_modules/test_klp_callbacks_busy.c [moved from lib/livepatch/test_klp_callbacks_busy.c with 100% similarity]
tools/testing/selftests/livepatch/test_modules/test_klp_callbacks_demo.c [moved from lib/livepatch/test_klp_callbacks_demo.c with 100% similarity]
tools/testing/selftests/livepatch/test_modules/test_klp_callbacks_demo2.c [moved from lib/livepatch/test_klp_callbacks_demo2.c with 100% similarity]
tools/testing/selftests/livepatch/test_modules/test_klp_callbacks_mod.c [moved from lib/livepatch/test_klp_callbacks_mod.c with 100% similarity]
tools/testing/selftests/livepatch/test_modules/test_klp_livepatch.c [moved from lib/livepatch/test_klp_livepatch.c with 100% similarity]
tools/testing/selftests/livepatch/test_modules/test_klp_shadow_vars.c [moved from lib/livepatch/test_klp_shadow_vars.c with 100% similarity]
tools/testing/selftests/livepatch/test_modules/test_klp_state.c [moved from lib/livepatch/test_klp_state.c with 100% similarity]
tools/testing/selftests/livepatch/test_modules/test_klp_state2.c [moved from lib/livepatch/test_klp_state2.c with 100% similarity]
tools/testing/selftests/livepatch/test_modules/test_klp_state3.c [moved from lib/livepatch/test_klp_state3.c with 100% similarity]
tools/testing/selftests/livepatch/test_modules/test_klp_syscall.c [new file with mode: 0644]
tools/testing/selftests/mm/charge_reserved_hugetlb.sh
tools/testing/selftests/mm/ksm_tests.c
tools/testing/selftests/mm/map_hugetlb.c
tools/testing/selftests/mm/mremap_test.c
tools/testing/selftests/mm/uffd-unit-tests.c
tools/testing/selftests/mm/va_high_addr_switch.sh
tools/testing/selftests/mm/write_hugetlb_memory.sh
tools/testing/selftests/move_mount_set_group/move_mount_set_group_test.c
tools/testing/selftests/mqueue/setting [new file with mode: 0644]
tools/testing/selftests/net/Makefile
tools/testing/selftests/net/big_tcp.sh
tools/testing/selftests/net/cmsg_ipv6.sh
tools/testing/selftests/net/config
tools/testing/selftests/net/forwarding/Makefile
tools/testing/selftests/net/forwarding/bridge_locked_port.sh
tools/testing/selftests/net/forwarding/bridge_mdb.sh
tools/testing/selftests/net/forwarding/tc_actions.sh
tools/testing/selftests/net/forwarding/tc_flower_l2_miss.sh
tools/testing/selftests/net/gro.sh
tools/testing/selftests/net/ioam6.sh
tools/testing/selftests/net/ioam6_parser.c
tools/testing/selftests/net/ip_local_port_range.c
tools/testing/selftests/net/lib.sh
tools/testing/selftests/net/mptcp/config
tools/testing/selftests/net/mptcp/diag.sh
tools/testing/selftests/net/mptcp/mptcp_join.sh
tools/testing/selftests/net/mptcp/mptcp_lib.sh
tools/testing/selftests/net/mptcp/pm_netlink.sh
tools/testing/selftests/net/mptcp/settings
tools/testing/selftests/net/mptcp/simult_flows.sh
tools/testing/selftests/net/mptcp/userspace_pm.sh
tools/testing/selftests/net/net_helper.sh [changed mode: 0755->0644]
tools/testing/selftests/net/openvswitch/openvswitch.sh
tools/testing/selftests/net/openvswitch/ovs-dpctl.py
tools/testing/selftests/net/pmtu.sh
tools/testing/selftests/net/rps_default_mask.sh
tools/testing/selftests/net/rtnetlink.sh
tools/testing/selftests/net/setup_loopback.sh [changed mode: 0755->0644]
tools/testing/selftests/net/setup_veth.sh
tools/testing/selftests/net/so_incoming_cpu.c
tools/testing/selftests/net/so_txtime.sh
tools/testing/selftests/net/tcp_ao/config [new file with mode: 0644]
tools/testing/selftests/net/tcp_ao/key-management.c
tools/testing/selftests/net/tcp_ao/lib/sock.c
tools/testing/selftests/net/tcp_ao/rst.c
tools/testing/selftests/net/tcp_ao/settings [new file with mode: 0644]
tools/testing/selftests/net/tcp_ao/unsigned-md5.c
tools/testing/selftests/net/test_bridge_backup_port.sh
tools/testing/selftests/net/tls.c
tools/testing/selftests/net/udpgro.sh
tools/testing/selftests/net/udpgro_bench.sh
tools/testing/selftests/net/udpgro_frglist.sh
tools/testing/selftests/net/udpgro_fwd.sh
tools/testing/selftests/net/udpgso_bench_rx.c
tools/testing/selftests/net/veth.sh
tools/testing/selftests/net/xdp_dummy.c [new file with mode: 0644]
tools/testing/selftests/netfilter/Makefile
tools/testing/selftests/netfilter/bridge_netfilter.sh [new file with mode: 0644]
tools/testing/selftests/netfilter/conntrack_dump_flush.c
tools/testing/selftests/pidfd/pidfd_getfd_test.c
tools/testing/selftests/power_supply/Makefile [new file with mode: 0644]
tools/testing/selftests/power_supply/helpers.sh [new file with mode: 0644]
tools/testing/selftests/power_supply/test_power_supply_properties.sh [new file with mode: 0755]
tools/testing/selftests/powerpc/math/fpu_signal.c
tools/testing/selftests/powerpc/papr_vpd/papr_vpd.c
tools/testing/selftests/resctrl/cache.c
tools/testing/selftests/resctrl/cat_test.c
tools/testing/selftests/resctrl/cmt_test.c
tools/testing/selftests/resctrl/fill_buf.c
tools/testing/selftests/resctrl/mba_test.c
tools/testing/selftests/resctrl/mbm_test.c
tools/testing/selftests/resctrl/resctrl.h
tools/testing/selftests/resctrl/resctrl_tests.c
tools/testing/selftests/resctrl/resctrl_val.c
tools/testing/selftests/resctrl/resctrlfs.c
tools/testing/selftests/rseq/basic_percpu_ops_test.c
tools/testing/selftests/rseq/param_test.c
tools/testing/selftests/rust/Makefile [new file with mode: 0644]
tools/testing/selftests/rust/config [new file with mode: 0644]
tools/testing/selftests/rust/test_probe_samples.sh [new file with mode: 0755]
tools/testing/selftests/sched/cs_prctl_test.c
tools/testing/selftests/seccomp/seccomp_benchmark.c
tools/testing/selftests/thermal/intel/power_floor/.gitignore [new file with mode: 0644]
tools/testing/selftests/thermal/intel/workload_hint/.gitignore [new file with mode: 0644]
tools/testing/selftests/uevent/.gitignore [new file with mode: 0644]
tools/tracing/rtla/Makefile
tools/tracing/rtla/src/osnoise_hist.c
tools/tracing/rtla/src/osnoise_top.c
tools/tracing/rtla/src/timerlat_hist.c
tools/tracing/rtla/src/timerlat_top.c
tools/tracing/rtla/src/utils.c
tools/tracing/rtla/src/utils.h
tools/verification/rv/Makefile
tools/verification/rv/src/in_kernel.c
virt/kvm/kvm_main.c

index 04998f7bda81816b4bb1dd321f80393a80ae9a67..bd9f1025ac44e0e289a6843de2c4497be2b76118 100644 (file)
--- a/.mailmap
+++ b/.mailmap
@@ -191,10 +191,11 @@ Gao Xiang <xiang@kernel.org> <gaoxiang25@huawei.com>
 Gao Xiang <xiang@kernel.org> <hsiangkao@aol.com>
 Gao Xiang <xiang@kernel.org> <hsiangkao@linux.alibaba.com>
 Gao Xiang <xiang@kernel.org> <hsiangkao@redhat.com>
-Geliang Tang <geliang.tang@linux.dev> <geliang.tang@suse.com>
-Geliang Tang <geliang.tang@linux.dev> <geliangtang@xiaomi.com>
-Geliang Tang <geliang.tang@linux.dev> <geliangtang@gmail.com>
-Geliang Tang <geliang.tang@linux.dev> <geliangtang@163.com>
+Geliang Tang <geliang@kernel.org> <geliang.tang@linux.dev>
+Geliang Tang <geliang@kernel.org> <geliang.tang@suse.com>
+Geliang Tang <geliang@kernel.org> <geliangtang@xiaomi.com>
+Geliang Tang <geliang@kernel.org> <geliangtang@gmail.com>
+Geliang Tang <geliang@kernel.org> <geliangtang@163.com>
 Georgi Djakov <djakov@kernel.org> <georgi.djakov@linaro.org>
 Gerald Schaefer <gerald.schaefer@linux.ibm.com> <geraldsc@de.ibm.com>
 Gerald Schaefer <gerald.schaefer@linux.ibm.com> <gerald.schaefer@de.ibm.com>
@@ -289,6 +290,7 @@ Johan Hovold <johan@kernel.org> <johan@hovoldconsulting.com>
 John Crispin <john@phrozen.org> <blogic@openwrt.org>
 John Fastabend <john.fastabend@gmail.com> <john.r.fastabend@intel.com>
 John Keeping <john@keeping.me.uk> <john@metanate.com>
+John Moon <john@jmoon.dev> <quic_johmoo@quicinc.com>
 John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
 John Stultz <johnstul@us.ibm.com>
 <jon.toppins+linux@gmail.com> <jtoppins@cumulusnetworks.com>
@@ -323,6 +325,7 @@ Kenneth W Chen <kenneth.w.chen@intel.com>
 Kenneth Westfield <quic_kwestfie@quicinc.com> <kwestfie@codeaurora.org>
 Kiran Gunda <quic_kgunda@quicinc.com> <kgunda@codeaurora.org>
 Kirill Tkhai <tkhai@ya.ru> <ktkhai@virtuozzo.com>
+Kishon Vijay Abraham I <kishon@kernel.org> <kishon@ti.com>
 Konstantin Khlebnikov <koct9i@gmail.com> <khlebnikov@yandex-team.ru>
 Konstantin Khlebnikov <koct9i@gmail.com> <k.khlebnikov@samsung.com>
 Koushik <raghavendra.koushik@neterion.com>
@@ -344,6 +347,7 @@ Leonid I Ananiev <leonid.i.ananiev@intel.com>
 Leon Romanovsky <leon@kernel.org> <leon@leon.nu>
 Leon Romanovsky <leon@kernel.org> <leonro@mellanox.com>
 Leon Romanovsky <leon@kernel.org> <leonro@nvidia.com>
+Leo Yan <leo.yan@linux.dev> <leo.yan@linaro.org>
 Liam Mark <quic_lmark@quicinc.com> <lmark@codeaurora.org>
 Linas Vepstas <linas@austin.ibm.com>
 Linus Lüssing <linus.luessing@c0d3.blue> <linus.luessing@ascom.ch>
@@ -550,6 +554,7 @@ Senthilkumar N L <quic_snlakshm@quicinc.com> <snlakshm@codeaurora.org>
 Serge Hallyn <sergeh@kernel.org> <serge.hallyn@canonical.com>
 Serge Hallyn <sergeh@kernel.org> <serue@us.ibm.com>
 Seth Forshee <sforshee@kernel.org> <seth.forshee@canonical.com>
+Shakeel Butt <shakeel.butt@linux.dev> <shakeelb@google.com>
 Shannon Nelson <shannon.nelson@amd.com> <snelson@pensando.io>
 Shannon Nelson <shannon.nelson@amd.com> <shannon.nelson@intel.com>
 Shannon Nelson <shannon.nelson@amd.com> <shannon.nelson@oracle.com>
@@ -605,6 +610,11 @@ TripleX Chung <xxx.phy@gmail.com> <triplex@zh-kernel.org>
 TripleX Chung <xxx.phy@gmail.com> <zhongyu@18mail.cn>
 Tsuneo Yoshioka <Tsuneo.Yoshioka@f-secure.com>
 Tudor Ambarus <tudor.ambarus@linaro.org> <tudor.ambarus@microchip.com>
+Tvrtko Ursulin <tursulin@ursulin.net> <tvrtko.ursulin@intel.com>
+Tvrtko Ursulin <tursulin@ursulin.net> <tvrtko.ursulin@linux.intel.com>
+Tvrtko Ursulin <tursulin@ursulin.net> <tvrtko.ursulin@sophos.com>
+Tvrtko Ursulin <tursulin@ursulin.net> <tvrtko.ursulin@onelan.co.uk>
+Tvrtko Ursulin <tursulin@ursulin.net> <tvrtko@ursulin.net>
 Tycho Andersen <tycho@tycho.pizza> <tycho@tycho.ws>
 Tzung-Bi Shih <tzungbi@kernel.org> <tzungbi@google.com>
 Uwe Kleine-König <ukleinek@informatik.uni-freiburg.de>
diff --git a/CREDITS b/CREDITS
index 5797e8f7e92b06f8736c01c6c191815c4802b6fd..3c2bb55847c607f027ddc6c50c259545327c0fe8 100644 (file)
--- a/CREDITS
+++ b/CREDITS
@@ -63,6 +63,11 @@ D: dosfs, LILO, some fd features, ATM, various other hacks here and there
 S: Buenos Aires
 S: Argentina
 
+NTFS FILESYSTEM
+N: Anton Altaparmakov
+E: anton@tuxera.com
+D: NTFS filesystem
+
 N: Tim Alpaerts
 E: tim_alpaerts@toyota-motor-europe.com
 D: 802.2 class II logical link control layer,
@@ -2161,6 +2166,19 @@ N: Mike Kravetz
 E: mike.kravetz@oracle.com
 D: Maintenance and development of the hugetlb subsystem
 
+N: Seth Jennings
+E: sjenning@redhat.com
+D: Creation and maintenance of zswap
+
+N: Dan Streetman
+E: ddstreet@ieee.org
+D: Maintenance and development of zswap
+D: Creation and maintenance of the zpool API
+
+N: Vitaly Wool
+E: vitaly.wool@konsulko.com
+D: Maintenance and development of zswap
+
 N: Andreas S. Krebs
 E: akrebs@altavista.net
 D: CYPRESS CY82C693 chipset IDE, Digital's PC-Alpha 164SX boards
diff --git a/Documentation/ABI/testing/sysfs-class-net-queues b/Documentation/ABI/testing/sysfs-class-net-queues
index 906ff3ca928ac1389567a5f02bdc4e06c3980b38..5bff64d256c207c8a7d2c915e0e8affac191913c 100644 (file)
@@ -1,4 +1,4 @@
-What:          /sys/class/<iface>/queues/rx-<queue>/rps_cpus
+What:          /sys/class/net/<iface>/queues/rx-<queue>/rps_cpus
 Date:          March 2010
 KernelVersion: 2.6.35
 Contact:       netdev@vger.kernel.org
@@ -8,7 +8,7 @@ Description:
                network device queue. Possible values depend on the number
                of available CPU(s) in the system.
 
-What:          /sys/class/<iface>/queues/rx-<queue>/rps_flow_cnt
+What:          /sys/class/net/<iface>/queues/rx-<queue>/rps_flow_cnt
 Date:          April 2010
 KernelVersion: 2.6.35
 Contact:       netdev@vger.kernel.org
@@ -16,7 +16,7 @@ Description:
                Number of Receive Packet Steering flows being currently
                processed by this particular network device receive queue.
 
-What:          /sys/class/<iface>/queues/tx-<queue>/tx_timeout
+What:          /sys/class/net/<iface>/queues/tx-<queue>/tx_timeout
 Date:          November 2011
 KernelVersion: 3.3
 Contact:       netdev@vger.kernel.org
@@ -24,7 +24,7 @@ Description:
                Indicates the number of transmit timeout events seen by this
                network interface transmit queue.
 
-What:          /sys/class/<iface>/queues/tx-<queue>/tx_maxrate
+What:          /sys/class/net/<iface>/queues/tx-<queue>/tx_maxrate
 Date:          March 2015
 KernelVersion: 4.1
 Contact:       netdev@vger.kernel.org
@@ -32,7 +32,7 @@ Description:
               A maximum transmit rate, in Mbps, set for the queue; a value of
               zero means disabled, which is the default.
 
-What:          /sys/class/<iface>/queues/tx-<queue>/xps_cpus
+What:          /sys/class/net/<iface>/queues/tx-<queue>/xps_cpus
 Date:          November 2010
 KernelVersion: 2.6.38
 Contact:       netdev@vger.kernel.org
@@ -42,7 +42,7 @@ Description:
                network device transmit queue. Possible values depend on the
                number of available CPU(s) in the system.
 
-What:          /sys/class/<iface>/queues/tx-<queue>/xps_rxqs
+What:          /sys/class/net/<iface>/queues/tx-<queue>/xps_rxqs
 Date:          June 2018
 KernelVersion: 4.18.0
 Contact:       netdev@vger.kernel.org
@@ -53,7 +53,7 @@ Description:
                number of available receive queue(s) in the network device.
                Default is disabled.
 
-What:          /sys/class/<iface>/queues/tx-<queue>/byte_queue_limits/hold_time
+What:          /sys/class/net/<iface>/queues/tx-<queue>/byte_queue_limits/hold_time
 Date:          November 2011
 KernelVersion: 3.3
 Contact:       netdev@vger.kernel.org
@@ -62,7 +62,7 @@ Description:
                of this particular network device transmit queue.
                Default value is 1000.
 
-What:          /sys/class/<iface>/queues/tx-<queue>/byte_queue_limits/inflight
+What:          /sys/class/net/<iface>/queues/tx-<queue>/byte_queue_limits/inflight
 Date:          November 2011
 KernelVersion: 3.3
 Contact:       netdev@vger.kernel.org
@@ -70,7 +70,7 @@ Description:
                Indicates the number of bytes (objects) in flight on this
                network device transmit queue.
 
-What:          /sys/class/<iface>/queues/tx-<queue>/byte_queue_limits/limit
+What:          /sys/class/net/<iface>/queues/tx-<queue>/byte_queue_limits/limit
 Date:          November 2011
 KernelVersion: 3.3
 Contact:       netdev@vger.kernel.org
@@ -79,7 +79,7 @@ Description:
                on this network device transmit queue. This value is clamped
                to be within the bounds defined by limit_max and limit_min.
 
-What:          /sys/class/<iface>/queues/tx-<queue>/byte_queue_limits/limit_max
+What:          /sys/class/net/<iface>/queues/tx-<queue>/byte_queue_limits/limit_max
 Date:          November 2011
 KernelVersion: 3.3
 Contact:       netdev@vger.kernel.org
@@ -88,7 +88,7 @@ Description:
                queued on this network device transmit queue. See
                include/linux/dynamic_queue_limits.h for the default value.
 
-What:          /sys/class/<iface>/queues/tx-<queue>/byte_queue_limits/limit_min
+What:          /sys/class/net/<iface>/queues/tx-<queue>/byte_queue_limits/limit_min
 Date:          November 2011
 KernelVersion: 3.3
 Contact:       netdev@vger.kernel.org
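
The per-queue attributes renamed above are ordinary text files under
/sys/class/net/<iface>/queues/, so they can be exercised with plain file
I/O. A minimal userspace sketch, assuming an illustrative interface named
eth0 with at least one RX and one TX queue (the name and queue indices are
assumptions, not part of the patch)::

  /*
   * Sketch only: read the RPS CPU mask of RX queue 0, then cap TX
   * queue 0 at 1000 Mbps.
   */
  #include <stdio.h>

  int main(void)
  {
          char buf[128];
          FILE *f;

          f = fopen("/sys/class/net/eth0/queues/rx-0/rps_cpus", "r");
          if (f) {
                  if (fgets(buf, sizeof(buf), f))
                          printf("rps_cpus: %s", buf);
                  fclose(f);
          }

          /* Writing 0 would disable the limit, which is the default. */
          f = fopen("/sys/class/net/eth0/queues/tx-0/tx_maxrate", "w");
          if (f) {
                  fputs("1000\n", f);
                  fclose(f);
          }
          return 0;
  }
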
diff --git a/Documentation/ABI/testing/sysfs-class-net-statistics b/Documentation/ABI/testing/sysfs-class-net-statistics
index 55db27815361b2d9ad511e026bee60a615ae03d5..53e508c6936a515216ad3af96ddfb170f6e50cf7 100644 (file)
@@ -1,4 +1,4 @@
-What:          /sys/class/<iface>/statistics/collisions
+What:          /sys/class/net/<iface>/statistics/collisions
 Date:          April 2005
 KernelVersion: 2.6.12
 Contact:       netdev@vger.kernel.org
@@ -6,7 +6,7 @@ Description:
                Indicates the number of collisions seen by this network device.
                This value might not be relevant with all MAC layers.
 
-What:          /sys/class/<iface>/statistics/multicast
+What:          /sys/class/net/<iface>/statistics/multicast
 Date:          April 2005
 KernelVersion: 2.6.12
 Contact:       netdev@vger.kernel.org
@@ -14,7 +14,7 @@ Description:
                Indicates the number of multicast packets received by this
                network device.
 
-What:          /sys/class/<iface>/statistics/rx_bytes
+What:          /sys/class/net/<iface>/statistics/rx_bytes
 Date:          April 2005
 KernelVersion: 2.6.12
 Contact:       netdev@vger.kernel.org
@@ -23,7 +23,7 @@ Description:
                See the network driver for the exact meaning of when this
                value is incremented.
 
-What:          /sys/class/<iface>/statistics/rx_compressed
+What:          /sys/class/net/<iface>/statistics/rx_compressed
 Date:          April 2005
 KernelVersion: 2.6.12
 Contact:       netdev@vger.kernel.org
@@ -32,7 +32,7 @@ Description:
                network device. This value might only be relevant for interfaces
                that support packet compression (e.g: PPP).
 
-What:          /sys/class/<iface>/statistics/rx_crc_errors
+What:          /sys/class/net/<iface>/statistics/rx_crc_errors
 Date:          April 2005
 KernelVersion: 2.6.12
 Contact:       netdev@vger.kernel.org
@@ -41,7 +41,7 @@ Description:
                by this network device. Note that the specific meaning might
                depend on the MAC layer used by the interface.
 
-What:          /sys/class/<iface>/statistics/rx_dropped
+What:          /sys/class/net/<iface>/statistics/rx_dropped
 Date:          April 2005
 KernelVersion: 2.6.12
 Contact:       netdev@vger.kernel.org
@@ -51,7 +51,7 @@ Description:
                packet processing. See the network driver for the exact
                meaning of this value.
 
-What:          /sys/class/<iface>/statistics/rx_errors
+What:          /sys/class/net/<iface>/statistics/rx_errors
 Date:          April 2005
 KernelVersion: 2.6.12
 Contact:       netdev@vger.kernel.org
@@ -59,7 +59,7 @@ Description:
                Indicates the number of receive errors on this network device.
                See the network driver for the exact meaning of this value.
 
-What:          /sys/class/<iface>/statistics/rx_fifo_errors
+What:          /sys/class/net/<iface>/statistics/rx_fifo_errors
 Date:          April 2005
 KernelVersion: 2.6.12
 Contact:       netdev@vger.kernel.org
@@ -68,7 +68,7 @@ Description:
                network device. See the network driver for the exact
                meaning of this value.
 
-What:          /sys/class/<iface>/statistics/rx_frame_errors
+What:          /sys/class/net/<iface>/statistics/rx_frame_errors
 Date:          April 2005
 KernelVersion: 2.6.12
 Contact:       netdev@vger.kernel.org
@@ -78,7 +78,7 @@ Description:
                on the MAC layer protocol used. See the network driver for
                the exact meaning of this value.
 
-What:          /sys/class/<iface>/statistics/rx_length_errors
+What:          /sys/class/net/<iface>/statistics/rx_length_errors
 Date:          April 2005
 KernelVersion: 2.6.12
 Contact:       netdev@vger.kernel.org
@@ -87,7 +87,7 @@ Description:
                error, oversized or undersized. See the network driver for the
                exact meaning of this value.
 
-What:          /sys/class/<iface>/statistics/rx_missed_errors
+What:          /sys/class/net/<iface>/statistics/rx_missed_errors
 Date:          April 2005
 KernelVersion: 2.6.12
 Contact:       netdev@vger.kernel.org
@@ -96,7 +96,7 @@ Description:
                due to lack of capacity in the receive side. See the network
                driver for the exact meaning of this value.
 
-What:          /sys/class/<iface>/statistics/rx_nohandler
+What:          /sys/class/net/<iface>/statistics/rx_nohandler
 Date:          February 2016
 KernelVersion: 4.6
 Contact:       netdev@vger.kernel.org
@@ -104,7 +104,7 @@ Description:
                Indicates the number of received packets that were dropped on
                an inactive device by the network core.
 
-What:          /sys/class/<iface>/statistics/rx_over_errors
+What:          /sys/class/net/<iface>/statistics/rx_over_errors
 Date:          April 2005
 KernelVersion: 2.6.12
 Contact:       netdev@vger.kernel.org
@@ -114,7 +114,7 @@ Description:
                (e.g: larger than MTU). See the network driver for the exact
                meaning of this value.
 
-What:          /sys/class/<iface>/statistics/rx_packets
+What:          /sys/class/net/<iface>/statistics/rx_packets
 Date:          April 2005
 KernelVersion: 2.6.12
 Contact:       netdev@vger.kernel.org
@@ -122,7 +122,7 @@ Description:
                Indicates the total number of good packets received by this
                network device.
 
-What:          /sys/class/<iface>/statistics/tx_aborted_errors
+What:          /sys/class/net/<iface>/statistics/tx_aborted_errors
 Date:          April 2005
 KernelVersion: 2.6.12
 Contact:       netdev@vger.kernel.org
@@ -132,7 +132,7 @@ Description:
                a medium collision). See the network driver for the exact
                meaning of this value.
 
-What:          /sys/class/<iface>/statistics/tx_bytes
+What:          /sys/class/net/<iface>/statistics/tx_bytes
 Date:          April 2005
 KernelVersion: 2.6.12
 Contact:       netdev@vger.kernel.org
@@ -143,7 +143,7 @@ Description:
                transmitted packets or all packets that have been queued for
                transmission.
 
-What:          /sys/class/<iface>/statistics/tx_carrier_errors
+What:          /sys/class/net/<iface>/statistics/tx_carrier_errors
 Date:          April 2005
 KernelVersion: 2.6.12
 Contact:       netdev@vger.kernel.org
@@ -152,7 +152,7 @@ Description:
                because of carrier errors (e.g: physical link down). See the
                network driver for the exact meaning of this value.
 
-What:          /sys/class/<iface>/statistics/tx_compressed
+What:          /sys/class/net/<iface>/statistics/tx_compressed
 Date:          April 2005
 KernelVersion: 2.6.12
 Contact:       netdev@vger.kernel.org
@@ -161,7 +161,7 @@ Description:
                this might only be relevant for devices that support
                compression (e.g: PPP).
 
-What:          /sys/class/<iface>/statistics/tx_dropped
+What:          /sys/class/net/<iface>/statistics/tx_dropped
 Date:          April 2005
 KernelVersion: 2.6.12
 Contact:       netdev@vger.kernel.org
@@ -170,7 +170,7 @@ Description:
                See the driver for the exact reasons as to why the packets were
                dropped.
 
-What:          /sys/class/<iface>/statistics/tx_errors
+What:          /sys/class/net/<iface>/statistics/tx_errors
 Date:          April 2005
 KernelVersion: 2.6.12
 Contact:       netdev@vger.kernel.org
@@ -179,7 +179,7 @@ Description:
                a network device. See the driver for the exact reasons as to
                why the packets were dropped.
 
-What:          /sys/class/<iface>/statistics/tx_fifo_errors
+What:          /sys/class/net/<iface>/statistics/tx_fifo_errors
 Date:          April 2005
 KernelVersion: 2.6.12
 Contact:       netdev@vger.kernel.org
@@ -188,7 +188,7 @@ Description:
                FIFO error. See the driver for the exact reasons as to why the
                packets were dropped.
 
-What:          /sys/class/<iface>/statistics/tx_heartbeat_errors
+What:          /sys/class/net/<iface>/statistics/tx_heartbeat_errors
 Date:          April 2005
 KernelVersion: 2.6.12
 Contact:       netdev@vger.kernel.org
@@ -197,7 +197,7 @@ Description:
                reported as heartbeat errors. See the driver for the exact
                reasons as to why the packets were dropped.
 
-What:          /sys/class/<iface>/statistics/tx_packets
+What:          /sys/class/net/<iface>/statistics/tx_packets
 Date:          April 2005
 KernelVersion: 2.6.12
 Contact:       netdev@vger.kernel.org
@@ -206,7 +206,7 @@ Description:
                device. See the driver for whether this reports the number of all
                attempted or successful transmissions.
 
-What:          /sys/class/<iface>/statistics/tx_window_errors
+What:          /sys/class/net/<iface>/statistics/tx_window_errors
 Date:          April 2005
 KernelVersion: 2.6.12
 Contact:       netdev@vger.kernel.org
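
The statistics attributes above are monotonically increasing counters, so
a rate can be derived by sampling a counter twice. A sketch, again using
an illustrative eth0::

  /*
   * Sketch only: estimate RX throughput by sampling rx_bytes one
   * second apart. "eth0" is an illustrative interface name.
   */
  #include <stdio.h>
  #include <unistd.h>

  static unsigned long long read_counter(const char *path)
  {
          unsigned long long v = 0;
          FILE *f = fopen(path, "r");

          if (f) {
                  if (fscanf(f, "%llu", &v) != 1)
                          v = 0;
                  fclose(f);
          }
          return v;
  }

  int main(void)
  {
          const char *p = "/sys/class/net/eth0/statistics/rx_bytes";
          unsigned long long before = read_counter(p);

          sleep(1);
          printf("~%llu bytes/s received\n", read_counter(p) - before);
          return 0;
  }
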
diff --git a/Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon b/Documentation/ABI/testing/sysfs-driver-intel-i915-hwmon
index 8d7d8f05f6cd0a3d7bb9fa06f0a40730f5bae99b..92fe7c5c5ac1d1d981562d1b441f32a6bafdc1aa 100644 (file)
@@ -1,4 +1,4 @@
-What:          /sys/devices/.../hwmon/hwmon<i>/in0_input
+What:          /sys/bus/pci/drivers/i915/.../hwmon/hwmon<i>/in0_input
 Date:          February 2023
 KernelVersion: 6.2
 Contact:       intel-gfx@lists.freedesktop.org
@@ -6,7 +6,7 @@ Description:    RO. Current Voltage in millivolt.
 
                Only supported for particular Intel i915 graphics platforms.
 
-What:          /sys/devices/.../hwmon/hwmon<i>/power1_max
+What:          /sys/bus/pci/drivers/i915/.../hwmon/hwmon<i>/power1_max
 Date:          February 2023
 KernelVersion: 6.2
 Contact:       intel-gfx@lists.freedesktop.org
@@ -20,7 +20,7 @@ Description:  RW. Card reactive sustained  (PL1/Tau) power limit in microwatts.
 
                Only supported for particular Intel i915 graphics platforms.
 
-What:          /sys/devices/.../hwmon/hwmon<i>/power1_rated_max
+What:          /sys/bus/pci/drivers/i915/.../hwmon/hwmon<i>/power1_rated_max
 Date:          February 2023
 KernelVersion: 6.2
 Contact:       intel-gfx@lists.freedesktop.org
@@ -28,7 +28,7 @@ Description:  RO. Card default power limit (default TDP setting).
 
                Only supported for particular Intel i915 graphics platforms.
 
-What:          /sys/devices/.../hwmon/hwmon<i>/power1_max_interval
+What:          /sys/bus/pci/drivers/i915/.../hwmon/hwmon<i>/power1_max_interval
 Date:          February 2023
 KernelVersion: 6.2
 Contact:       intel-gfx@lists.freedesktop.org
@@ -37,7 +37,7 @@ Description:  RW. Sustained power limit interval (Tau in PL1/Tau) in
 
                Only supported for particular Intel i915 graphics platforms.
 
-What:          /sys/devices/.../hwmon/hwmon<i>/power1_crit
+What:          /sys/bus/pci/drivers/i915/.../hwmon/hwmon<i>/power1_crit
 Date:          February 2023
 KernelVersion: 6.2
 Contact:       intel-gfx@lists.freedesktop.org
@@ -50,7 +50,7 @@ Description:  RW. Card reactive critical (I1) power limit in microwatts.
 
                Only supported for particular Intel i915 graphics platforms.
 
-What:          /sys/devices/.../hwmon/hwmon<i>/curr1_crit
+What:          /sys/bus/pci/drivers/i915/.../hwmon/hwmon<i>/curr1_crit
 Date:          February 2023
 KernelVersion: 6.2
 Contact:       intel-gfx@lists.freedesktop.org
@@ -63,7 +63,7 @@ Description:  RW. Card reactive critical (I1) power limit in milliamperes.
 
                Only supported for particular Intel i915 graphics platforms.
 
-What:          /sys/devices/.../hwmon/hwmon<i>/energy1_input
+What:          /sys/bus/pci/drivers/i915/.../hwmon/hwmon<i>/energy1_input
 Date:          February 2023
 KernelVersion: 6.2
 Contact:       intel-gfx@lists.freedesktop.org
diff --git a/Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon b/Documentation/ABI/testing/sysfs-driver-intel-xe-hwmon
index 8c321bc9dc04401e5b25fb2e4c2e509f0d2eba14..023fd82de3f70a61fb9c58c973690bc0fff38e12 100644 (file)
@@ -1,4 +1,4 @@
-What:          /sys/devices/.../hwmon/hwmon<i>/power1_max
+What:          /sys/bus/pci/drivers/xe/.../hwmon/hwmon<i>/power1_max
 Date:          September 2023
 KernelVersion: 6.5
 Contact:       intel-xe@lists.freedesktop.org
@@ -12,7 +12,7 @@ Description:  RW. Card reactive sustained  (PL1) power limit in microwatts.
 
                Only supported for particular Intel xe graphics platforms.
 
-What:          /sys/devices/.../hwmon/hwmon<i>/power1_rated_max
+What:          /sys/bus/pci/drivers/xe/.../hwmon/hwmon<i>/power1_rated_max
 Date:          September 2023
 KernelVersion: 6.5
 Contact:       intel-xe@lists.freedesktop.org
@@ -20,7 +20,7 @@ Description:  RO. Card default power limit (default TDP setting).
 
                Only supported for particular Intel xe graphics platforms.
 
-What:          /sys/devices/.../hwmon/hwmon<i>/power1_crit
+What:          /sys/bus/pci/drivers/xe/.../hwmon/hwmon<i>/power1_crit
 Date:          September 2023
 KernelVersion: 6.5
 Contact:       intel-xe@lists.freedesktop.org
@@ -33,7 +33,7 @@ Description:  RW. Card reactive critical (I1) power limit in microwatts.
 
                Only supported for particular Intel xe graphics platforms.
 
-What:          /sys/devices/.../hwmon/hwmon<i>/curr1_crit
+What:          /sys/bus/pci/drivers/xe/.../hwmon/hwmon<i>/curr1_crit
 Date:          September 2023
 KernelVersion: 6.5
 Contact:       intel-xe@lists.freedesktop.org
@@ -44,7 +44,7 @@ Description:  RW. Card reactive critical (I1) power limit in milliamperes.
                the operating frequency if the power averaged over a window
                exceeds this limit.
 
-What:          /sys/devices/.../hwmon/hwmon<i>/in0_input
+What:          /sys/bus/pci/drivers/xe/.../hwmon/hwmon<i>/in0_input
 Date:          September 2023
 KernelVersion: 6.5
 Contact:       intel-xe@lists.freedesktop.org
@@ -52,7 +52,7 @@ Description:  RO. Current Voltage in millivolt.
 
                Only supported for particular Intel xe graphics platforms.
 
-What:          /sys/devices/.../hwmon/hwmon<i>/energy1_input
+What:          /sys/bus/pci/drivers/xe/.../hwmon/hwmon<i>/energy1_input
 Date:          September 2023
 KernelVersion: 6.5
 Contact:       intel-xe@lists.freedesktop.org
@@ -60,7 +60,7 @@ Description:  RO. Energy input of device in microjoules.
 
                Only supported for particular Intel xe graphics platforms.
 
-What:          /sys/devices/.../hwmon/hwmon<i>/power1_max_interval
+What:          /sys/bus/pci/drivers/xe/.../hwmon/hwmon<i>/power1_max_interval
 Date:          October 2023
 KernelVersion: 6.6
 Contact:       intel-xe@lists.freedesktop.org
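
The i915 and xe hwmon entries expose the same attribute names; after the
renames above only the driver directory differs. A sketch of setting the
sustained (PL1) power limit, keeping in mind that power1_max is expressed
in microwatts; the PCI address and hwmon index are illustrative::

  /*
   * Sketch only: set PL1 to 25 W on an i915 device. The PCI address
   * and hwmon index are illustrative assumptions; the xe variant
   * differs only in the driver directory.
   */
  #include <stdio.h>

  int main(void)
  {
          const char *path =
                  "/sys/bus/pci/drivers/i915/0000:03:00.0/hwmon/hwmon2/power1_max";
          FILE *f = fopen(path, "w");

          if (!f) {
                  perror("fopen");
                  return 1;
          }
          fputs("25000000\n", f);         /* 25 W in microwatts */
          fclose(f);
          return 0;
  }
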
diff --git a/Documentation/ABI/testing/sysfs-nvmem-cells b/Documentation/ABI/testing/sysfs-nvmem-cells
index 7af70adf3690e3b0b3a3c148a087ecb3f788d54a..c7c9444f92a880ff3f9971fc52f71a0e2756d5f1 100644 (file)
@@ -4,18 +4,18 @@ KernelVersion:        6.5
 Contact:       Miquel Raynal <miquel.raynal@bootlin.com>
 Description:
                The "cells" folder contains one file per cell exposed by the
-               NVMEM device. The name of the file is: <name>@<where>, with
-               <name> being the cell name and <where> its location in the NVMEM
-               device, in hexadecimal (without the '0x' prefix, to mimic device
-               tree node names). The length of the file is the size of the cell
-               (when known). The content of the file is the binary content of
-               the cell (may sometimes be ASCII, likely without trailing
-               character).
+               NVMEM device. The name of the file is: "<name>@<byte>,<bit>",
+               with <name> being the cell name and <byte>,<bit> its location
+               in the NVMEM device, in hexadecimal bytes and bits (without the
+               '0x' prefix, to mimic device tree node names). The length of
+               the file is the size of the cell (when known). The content of
+               the file is the binary content of the cell (may sometimes be
+               ASCII, likely without a trailing character).
                Note: This file is only present if CONFIG_NVMEM_SYSFS
                is enabled.
 
                Example::
 
-                 hexdump -C /sys/bus/nvmem/devices/1-00563/cells/product-name@d
+                 hexdump -C /sys/bus/nvmem/devices/1-00563/cells/product-name@d,0
                  00000000  54 4e 34 38 4d 2d 50 2d  44 4e         |TN48M-P-DN|
                  0000000a
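
The same naming scheme can be exercised from userspace; a hedged C sketch (device 1-00563 and cell product-name@d,0 mirror the example above and are otherwise hypothetical)::

  /* Illustrative only: dump the raw content of one NVMEM cell. */
  #include <stdio.h>

  int main(void)
  {
          const char *path =
                  "/sys/bus/nvmem/devices/1-00563/cells/product-name@d,0";
          unsigned char buf[256];
          size_t n;
          FILE *f = fopen(path, "rb");

          if (!f)
                  return 1;
          n = fread(buf, 1, sizeof(buf), f);
          fclose(f);
          fwrite(buf, 1, n, stdout);  /* may be ASCII, no trailing NUL */
          return 0;
  }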
index 2288b3665d160a87b352def3e122461256a21761..4d1cc5bdbcc5f9f945dd825aed99cad89982fdd4 100644 (file)
@@ -10,6 +10,7 @@ What:         /sys/devices/platform/silicom-platform/power_cycle
 Date:          November 2023
 KernelVersion: 6.7
 Contact:       Henry Shi <henrys@silicom-usa.com>
+Description:
                This file allows the user to power cycle the platform.
                Default value is 0; when set to 1, it powers down
                the platform, waits 5 seconds, then powers on the
index 2d42998a89a6378a94521d49785c4f1632b25a34..3e6407de231c998ac7f0539c3d856458097d8971 100644 (file)
@@ -68,7 +68,8 @@ over a rather long period of time, but improvements are always welcome!
        rcu_read_lock_sched(), or by the appropriate update-side lock.
        Explicit disabling of preemption (preempt_disable(), for example)
        can serve as rcu_read_lock_sched(), but is less readable and
-       prevents lockdep from detecting locking issues.
+       prevents lockdep from detecting locking issues.  Acquiring a
+       spinlock also enters an RCU read-side critical section.
 
        Please note that you *cannot* rely on code known to be built
        only in non-preemptible kernels.  Such code can and will break,
@@ -382,16 +383,17 @@ over a rather long period of time, but improvements are always welcome!
        must use whatever locking or other synchronization is required
        to safely access and/or modify that data structure.
 
-       Do not assume that RCU callbacks will be executed on the same
-       CPU that executed the corresponding call_rcu() or call_srcu().
-       For example, if a given CPU goes offline while having an RCU
-       callback pending, then that RCU callback will execute on some
-       surviving CPU.  (If this was not the case, a self-spawning RCU
-       callback would prevent the victim CPU from ever going offline.)
-       Furthermore, CPUs designated by rcu_nocbs= might well *always*
-       have their RCU callbacks executed on some other CPUs, in fact,
-       for some  real-time workloads, this is the whole point of using
-       the rcu_nocbs= kernel boot parameter.
+       Do not assume that RCU callbacks will be executed on
+       the same CPU that executed the corresponding call_rcu(),
+       call_srcu(), call_rcu_tasks(), call_rcu_tasks_rude(), or
+       call_rcu_tasks_trace().  For example, if a given CPU goes offline
+       while having an RCU callback pending, then that RCU callback
+       will execute on some surviving CPU.  (If this was not the case,
+       a self-spawning RCU callback would prevent the victim CPU from
+       ever going offline.)  Furthermore, CPUs designated by rcu_nocbs=
+       might well *always* have their RCU callbacks executed on some
+       other CPUs; in fact, for some real-time workloads, this is the
+       whole point of using the rcu_nocbs= kernel boot parameter.
 
        In addition, do not assume that callbacks queued in a given order
        will be invoked in that order, even if they all are queued on the
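
To make the callback rule concrete, a minimal kernel-side sketch (struct foo and both functions are hypothetical): the reclaim callback must be safe to run on any surviving CPU::

  #include <linux/kernel.h>
  #include <linux/rcupdate.h>
  #include <linux/slab.h>

  struct foo {
          int data;
          struct rcu_head rcu;
  };

  /* May run on any CPU, e.g. after the queueing CPU has gone offline. */
  static void foo_reclaim(struct rcu_head *head)
  {
          kfree(container_of(head, struct foo, rcu));
  }

  static void foo_remove(struct foo *fp)
  {
          /* ... unlink fp from the enclosing data structure here ... */
          call_rcu(&fp->rcu, foo_reclaim);
  }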
@@ -444,7 +446,7 @@ over a rather long period of time, but improvements are always welcome!
        real-time workloads than is synchronize_rcu_expedited().
 
        It is also permissible to sleep in RCU Tasks Trace read-side
-       critical, which are delimited by rcu_read_lock_trace() and
+       critical sections, which are delimited by rcu_read_lock_trace() and
        rcu_read_unlock_trace().  However, this is a specialized flavor
        of RCU, and you should not use it without first checking with
        its current users.  In most cases, you should instead use SRCU.
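
A hedged sketch of such a sleepable reader (my_mutex is hypothetical); note that blocking like this is legal only for the RCU Tasks Trace flavor::

  #include <linux/rcupdate_trace.h>
  #include <linux/mutex.h>

  static DEFINE_MUTEX(my_mutex);

  static void tasks_trace_reader(void)
  {
          rcu_read_lock_trace();
          mutex_lock(&my_mutex);          /* sleeping is legal here */
          /* ... access data protected by RCU Tasks Trace ... */
          mutex_unlock(&my_mutex);
          rcu_read_unlock_trace();
  }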
@@ -490,6 +492,12 @@ over a rather long period of time, but improvements are always welcome!
                since the last time that you passed that same object to
                call_rcu() (or friends).
 
+       CONFIG_RCU_STRICT_GRACE_PERIOD:
+               combine with KASAN to check for pointers leaked out
+               of RCU read-side critical sections.  This Kconfig
+               option is tough on both performance and scalability,
+               and so is limited to four-CPU systems.
+
        __rcu sparse checks:
                tag the pointer to the RCU-protected data structure
                with __rcu, and sparse will warn you if you access that
index 659d5913784d0d9e2d196a04bdfb7fadb57d5eed..2524dcdadde2b801b33a4ce0a93f31948ea7aefb 100644 (file)
@@ -408,7 +408,10 @@ member of the rcu_dereference() to use in various situations:
        RCU flavors, an RCU read-side critical section is entered
        using rcu_read_lock(), anything that disables bottom halves,
        anything that disables interrupts, or anything that disables
-       preemption.
+       preemption.  Please note that spinlock critical sections
+       are also implied RCU read-side critical sections, even when
+       they are preemptible, as they are in kernels built with
+       CONFIG_PREEMPT_RT=y.
 
 2.     If the access might be within an RCU read-side critical section
        on the one hand, or protected by (say) my_lock on the other,
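
Item 2 leads to the rcu_dereference_check() pattern; a minimal sketch under stated assumptions (gp, my_lock, and struct foo are hypothetical)::

  #include <linux/rcupdate.h>
  #include <linux/spinlock.h>
  #include <linux/lockdep.h>

  struct foo { int data; };

  static DEFINE_SPINLOCK(my_lock);
  static struct foo __rcu *gp;

  /* Legal either inside rcu_read_lock() or while holding my_lock;
   * the lockdep expression documents the second case. */
  static int get_data(void)
  {
          struct foo *p = rcu_dereference_check(gp,
                                  lockdep_is_held(&my_lock));
          return p ? p->data : -1;
  }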
index 60ce02475142d881b0238f67eee5b9306830745e..872ac665223fbd51f7e06fd6fcf9eddd0de5a65f 100644 (file)
@@ -172,14 +172,25 @@ rcu_read_lock()
        critical section.  Reference counts may be used in conjunction
        with RCU to maintain longer-term references to data structures.
 
+       Note that anything that disables bottom halves, preemption,
+       or interrupts also enters an RCU read-side critical section.
+       Acquiring a spinlock also enters an RCU read-side critical
+       section, even for spinlocks that do not disable preemption,
+       as is the case in kernels built with CONFIG_PREEMPT_RT=y.
+       Sleeplocks do *not* enter RCU read-side critical sections.
+
 rcu_read_unlock()
 ^^^^^^^^^^^^^^^^^
        void rcu_read_unlock(void);
 
        This temporal primitive is used by a reader to inform the
        reclaimer that the reader is exiting an RCU read-side critical
-       section.  Note that RCU read-side critical sections may be nested
-       and/or overlapping.
+       section.  Anything that enables bottom halves, preemption,
+       or interrupts also exits an RCU read-side critical section.
+       Releasing a spinlock also exits an RCU read-side critical section.
+
+       Note that RCU read-side critical sections may be nested and/or
+       overlapping.
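
As a concrete pairing of the two primitives above, a hedged sketch (gp, struct foo, and the updater are hypothetical; error handling is elided)::

  #include <linux/rcupdate.h>
  #include <linux/slab.h>

  struct foo { int a; };
  static struct foo __rcu *gp;

  static int reader(void)
  {
          struct foo *p;
          int ret = 0;

          rcu_read_lock();        /* enter read-side critical section */
          p = rcu_dereference(gp);
          if (p)
                  ret = p->a;
          rcu_read_unlock();      /* exit; no blocking in between */
          return ret;
  }

  static void updater(struct foo *newp)
  {
          struct foo *old = rcu_replace_pointer(gp, newp, true);

          synchronize_rcu();      /* wait for pre-existing readers */
          kfree(old);
  }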
 
 synchronize_rcu()
 ^^^^^^^^^^^^^^^^^
@@ -952,8 +963,8 @@ unfortunately any spinlock in a ``SLAB_TYPESAFE_BY_RCU`` object must be
 initialized after each and every call to kmem_cache_alloc(), which renders
 reference-free spinlock acquisition completely unsafe.  Therefore, when
 using ``SLAB_TYPESAFE_BY_RCU``, make proper use of a reference counter.
-(Those willing to use a kmem_cache constructor may also use locking,
-including cache-friendly sequence locking.)
+(Those willing to initialize their locks in a kmem_cache constructor
+may also use locking, including cache-friendly sequence locking.)
 
 With traditional reference counting -- such as that implemented by the
 kref library in Linux -- there is typically code that runs when the last
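
A hedged sketch of the constructor-initialized-lock idiom mentioned above (all names are hypothetical): the lock is set up once per object slot, so it stays valid across type-safe reuse of the memory::

  #include <linux/slab.h>
  #include <linux/spinlock.h>
  #include <linux/init.h>
  #include <linux/errno.h>

  struct obj {
          spinlock_t lock;        /* valid for the slab's whole lifetime */
          int data;
  };

  /* Runs once per object slot, not once per kmem_cache_alloc(). */
  static void obj_ctor(void *addr)
  {
          struct obj *o = addr;

          spin_lock_init(&o->lock);
  }

  static struct kmem_cache *obj_cache;

  static int __init obj_cache_init(void)
  {
          obj_cache = kmem_cache_create("obj_cache", sizeof(struct obj),
                                        0, SLAB_TYPESAFE_BY_RCU, obj_ctor);
          return obj_cache ? 0 : -ENOMEM;
  }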
index 89984dfececf0b0b07a937179808d27b8268cf4b..ae30301366379d067e5cb71b4fcb534bc53c4d40 100644 (file)
@@ -101,8 +101,8 @@ External References
 email threads
 -------------
 
-* `Initial discussion on the New subsystem for acceleration devices <https://lkml.org/lkml/2022/7/31/83>`_ - Oded Gabbay (2022)
-* `patch-set to add the new subsystem <https://lkml.org/lkml/2022/10/22/544>`_ - Oded Gabbay (2022)
+* `Initial discussion on the New subsystem for acceleration devices <https://lore.kernel.org/lkml/CAFCwf11=9qpNAepL7NL+YAV_QO=Wv6pnWPhKHKAepK3fNn+2Dg@mail.gmail.com/>`_ - Oded Gabbay (2022)
+* `patch-set to add the new subsystem <https://lore.kernel.org/lkml/20221022214622.18042-1-ogabbay@kernel.org/>`_ - Oded Gabbay (2022)
 
 Conference talks
 ----------------
index 102937bc8443a23d88b952b4d7278e5e6cd25c21..e8bdf5e86a9ba15b9d52858e66b7478307b6660f 100644 (file)
@@ -108,6 +108,7 @@ is applicable::
        CMA     Contiguous Memory Area support is enabled.
        DRM     Direct Rendering Management support is enabled.
        DYNAMIC_DEBUG Build in debug messages and enable them at runtime
+       EARLY   Parameter processed too early to be set via bootconfig embedded in initrd.
        EDD     BIOS Enhanced Disk Drive Services (EDD) is enabled
        EFI     EFI Partitioning (GPT) is enabled
        EVM     Extended Verification Module
@@ -218,8 +219,3 @@ bytes respectively. Such letter suffixes can also be entirely omitted:
 
 .. include:: kernel-parameters.txt
    :literal:
-
-Todo
-----
-
-       Add more DRM drivers.
index 31b3a25680d08cfac3603d58b3d3783bbf1e34bb..94314d0eb3019b760d9bb9548067235db24f5793 100644 (file)
@@ -9,7 +9,7 @@
                        accept_memory=eager can be used to accept all memory
                        at once during boot.
 
-       acpi=           [HW,ACPI,X86,ARM64,RISCV64]
+       acpi=           [HW,ACPI,X86,ARM64,RISCV64,EARLY]
                        Advanced Configuration and Power Interface
                        Format: { force | on | off | strict | noirq | rsdt |
                                  copy_dsdt }
@@ -26,7 +26,7 @@
 
                        See also Documentation/power/runtime_pm.rst, pci=noacpi
 
-       acpi_apic_instance=     [ACPI, IOAPIC]
+       acpi_apic_instance=     [ACPI,IOAPIC,EARLY]
                        Format: <int>
                        2: use 2nd APIC table, if available
                        1,0: use 1st APIC table
@@ -41,7 +41,7 @@
                        If set to native, use the device's native backlight mode.
                        If set to none, disable the ACPI backlight interface.
 
-       acpi_force_32bit_fadt_addr
+       acpi_force_32bit_fadt_addr [ACPI,EARLY]
                        force FADT to use 32 bit addresses rather than the
                        64 bit X_* addresses. Some firmware have broken 64
                        bit addresses; force ACPI to ignore these and use
@@ -97,7 +97,7 @@
                        no: ACPI OperationRegions are not marked as reserved,
                        no further checks are performed.
 
-       acpi_force_table_verification   [HW,ACPI]
+       acpi_force_table_verification   [HW,ACPI,EARLY]
                        Enable table checksum verification during early stage.
                        By default, this is disabled due to x86 early mapping
                        size limitation.
        acpi_no_memhotplug [ACPI] Disable memory hotplug.  Useful for kdump
                           kernels.
 
-       acpi_no_static_ssdt     [HW,ACPI]
+       acpi_no_static_ssdt     [HW,ACPI,EARLY]
                        Disable installation of static SSDTs at early boot time.
                        By default, SSDTs contained in the RSDT/XSDT will be
                        installed automatically and they will appear under
                        Ignore the ACPI-based watchdog interface (WDAT) and let
                        a native driver control the watchdog device instead.
 
-       acpi_rsdp=      [ACPI,EFI,KEXEC]
+       acpi_rsdp=      [ACPI,EFI,KEXEC,EARLY]
                        Pass the RSDP address to the kernel, mostly used
                        on machines running EFI runtime service to boot the
                        second kernel for kdump.
                        to assume that this machine's pmtimer latches its value
                        and always returns good values.
 
-       acpi_sci=       [HW,ACPI] ACPI System Control Interrupt trigger mode
+       acpi_sci=       [HW,ACPI,EARLY] ACPI System Control Interrupt trigger mode
                        Format: { level | edge | high | low }
 
-       acpi_skip_timer_override [HW,ACPI]
+       acpi_skip_timer_override [HW,ACPI,EARLY]
                        Recognize and ignore IRQ0/pin2 Interrupt Override.
                        For broken nForce2 BIOS resulting in XT-PIC timer.
 
                        behave incorrectly in some ways with respect to system
                        suspend and resume to be ignored (use wisely).
 
-       acpi_use_timer_override [HW,ACPI]
+       acpi_use_timer_override [HW,ACPI,EARLY]
                        Use timer override. For some broken Nvidia NF5 boards
                        that require a timer override, but don't have HPET
 
-       add_efi_memmap  [EFI; X86] Include EFI memory map in
+       add_efi_memmap  [EFI,X86,EARLY] Include EFI memory map in
                        kernel's map of available physical RAM.
 
        agp=            [AGP]
                        do not want to use tracing_snapshot_alloc() as it needs
                        to be done where GFP_KERNEL allocations are allowed.
 
-       allow_mismatched_32bit_el0 [ARM64]
+       allow_mismatched_32bit_el0 [ARM64,EARLY]
                        Allow execve() of 32-bit applications and setting of the
                        PER_LINUX32 personality on systems where only a strict
                        subset of the CPUs support 32-bit EL0. When this
                                     This mode requires kvm-amd.avic=1.
                                     (Default when IOMMU HW support is present.)
 
-       amd_pstate=     [X86]
+       amd_pstate=     [X86,EARLY]
                        disable
                          Do not enable amd_pstate as the default
                          scaling driver for the supported processors
                        not play well with APC CPU idle - disable it if you have
                        APC and your system crashes randomly.
 
-       apic=           [APIC,X86] Advanced Programmable Interrupt Controller
+       apic=           [APIC,X86,EARLY] Advanced Programmable Interrupt Controller
                        Change the output verbosity while booting
                        Format: { quiet (default) | verbose | debug }
                        Change the amount of debugging information output
                        Format: apic=driver_name
                        Examples: apic=bigsmp
 
-       apic_extnmi=    [APIC,X86] External NMI delivery setting
+       apic_extnmi=    [APIC,X86,EARLY] External NMI delivery setting
                        Format: { bsp (default) | all | none }
                        bsp:  External NMI is delivered only to CPU 0
                        all:  External NMIs are broadcast to all CPUs as a
        bert_disable    [ACPI]
                        Disable BERT OS support on buggy BIOSes.
 
-       bgrt_disable    [ACPI][X86]
+       bgrt_disable    [ACPI,X86,EARLY]
                        Disable BGRT to avoid flickering OEM logo.
 
        blkdevparts=    Manual partition parsing of block device(s) for
                        embedded devices based on command line input.
                        See Documentation/block/cmdline-partition.rst
 
-       boot_delay=     Milliseconds to delay each printk during boot.
+       boot_delay=     [KNL,EARLY]
+                       Milliseconds to delay each printk during boot.
                        Only works if CONFIG_BOOT_PRINTK_DELAY is enabled,
                        and you may also have to specify "lpj=".  Boot_delay
                        values larger than 10 seconds (10000) are assumed
                        erroneous and ignored.
                        Format: integer
 
-       bootconfig      [KNL]
+       bootconfig      [KNL,EARLY]
                        Extended command line options can be added to an initrd
                        and this option causes the kernel to look for them there.
 
                        trust validation.
                        format: { id:<keyid> | builtin }
 
-       cca=            [MIPS] Override the kernel pages' cache coherency
+       cca=            [MIPS,EARLY] Override the kernel pages' cache coherency
                        algorithm.  Accepted values range from 0 to 7
                        inclusive. See arch/mips/include/asm/pgtable-bits.h
                        for platform specific values (SB1, Loongson3 and
                        [X86-64] hpet,tsc
 
        clocksource.arm_arch_timer.evtstrm=
-                       [ARM,ARM64]
+                       [ARM,ARM64,EARLY]
                        Format: <bool>
                        Enable/disable the eventstream feature of the ARM
                        architected timer so that code using WFE-based polling
                        10 seconds when built into the kernel.
 
        cma=nn[MG]@[start[MG][-end[MG]]]
-                       [KNL,CMA]
+                       [KNL,CMA,EARLY]
                        Sets the size of kernel global memory area for
                        contiguous memory allocations and optionally the
                        placement constraint by the physical address range of
                        kernel/dma/contiguous.c
 
        cma_pernuma=nn[MG]
-                       [KNL,CMA]
+                       [KNL,CMA,EARLY]
                        Sets the size of kernel per-numa memory area for
                        contiguous memory allocations. A value of 0 disables
                        per-numa CMA altogether. And if this option is not
                        they will fall back to the global default memory area.
 
        numa_cma=<node>:nn[MG][,<node>:nn[MG]]
-                       [KNL,CMA]
+                       [KNL,CMA,EARLY]
                        Sets the size of kernel numa memory area for
                        contiguous memory allocations. It will reserve CMA
                        area for the specified node.
                        a hypervisor.
                        Default: yes
 
-       coherent_pool=nn[KMG]   [ARM,KNL]
+       coherent_pool=nn[KMG]   [ARM,KNL,EARLY]
                        Sets the size of memory pool for coherent, atomic dma
                        allocations, by default set to 256K.
 
        condev=         [HW,S390] console device
        conmode=
 
-       con3215_drop=   [S390] 3215 console drop mode.
+       con3215_drop=   [S390,EARLY] 3215 console drop mode.
                        Format: y|n|Y|N|1|0
                        When set to true, drop data on the 3215 console when
                        the console buffer is full. In this case the
                        kernel before the cpufreq driver probes.
 
        cpu_init_udelay=N
-                       [X86] Delay for N microsec between assert and de-assert
+                       [X86,EARLY] Delay for N microsec between assert and de-assert
                        of APIC INIT to start processors.  This delay occurs
                        on every CPU online, such as boot, and resume from suspend.
                        Default: 10000
                        kernel more unstable.
 
        crashkernel=size[KMG][@offset[KMG]]
-                       [KNL] Using kexec, Linux can switch to a 'crash kernel'
+                       [KNL,EARLY] Using kexec, Linux can switch to a 'crash kernel'
                        upon panic. This parameter reserves the physical
                        memory region [offset, offset + size] for that kernel
                        image. If '@offset' is omitted, then a suitable offset
                        Format: <port#>,<type>
                        See also Documentation/input/devices/joystick-parport.rst
 
-       debug           [KNL] Enable kernel debugging (events log level).
+       debug           [KNL,EARLY] Enable kernel debugging (events log level).
 
        debug_boot_weak_hash
-                       [KNL] Enable printing [hashed] pointers early in the
+                       [KNL,EARLY] Enable printing [hashed] pointers early in the
                        boot sequence.  If enabled, we use a weak hash instead
                        of siphash to hash pointers.  Use this option if you are
                        seeing instances of '(___ptrval___)' and need to see a
                        will print _a_lot_ more information - normally only
                        useful to lockdep developers.
 
-       debug_objects   [KNL] Enable object debugging
+       debug_objects   [KNL,EARLY] Enable object debugging
 
        debug_guardpage_minorder=
-                       [KNL] When CONFIG_DEBUG_PAGEALLOC is set, this
+                       [KNL,EARLY] When CONFIG_DEBUG_PAGEALLOC is set, this
                        parameter allows control of the order of pages that will
                        be intentionally kept free (and hence protected) by the
                        buddy allocator. A bigger value increases the probability
                        help tracking down these problems.
 
        debug_pagealloc=
-                       [KNL] When CONFIG_DEBUG_PAGEALLOC is set, this parameter
+                       [KNL,EARLY] When CONFIG_DEBUG_PAGEALLOC is set, this parameter
                        enables the feature at boot time. By default, it is
                        disabled and the system will work mostly the same as a
                        kernel built without CONFIG_DEBUG_PAGEALLOC.
                        useful to also enable the page_owner functionality.
                        on: enable the feature
 
-       debugfs=        [KNL] This parameter enables what is exposed to userspace
-                       and debugfs internal clients.
+       debugfs=        [KNL,EARLY] This parameter enables what is exposed to
+                       userspace and debugfs internal clients.
                        Format: { on, no-mount, off }
                        on:     All functions are enabled.
                        no-mount:
        dhash_entries=  [KNL]
                        Set number of hash buckets for dentry cache.
 
-       disable_1tb_segments [PPC]
+       disable_1tb_segments [PPC,EARLY]
                        Disables the use of 1TB hash page table segments. This
                        causes the kernel to fall back to 256MB segments which
                        can be useful when debugging issues that require an SLB
        disable=        [IPV6]
                        See Documentation/networking/ipv6.rst.
 
-       disable_radix   [PPC]
+       disable_radix   [PPC,EARLY]
                        Disable RADIX MMU mode on POWER9
 
        disable_tlbie   [PPC]
                        causing system reset or hang due to sending
                        INIT from AP to BSP.
 
-       disable_ddw     [PPC/PSERIES]
+       disable_ddw     [PPC/PSERIES,EARLY]
                        Disable Dynamic DMA Window support. Use this
                        to workaround buggy firmware.
 
        disable_ipv6=   [IPV6]
                        See Documentation/networking/ipv6.rst.
 
-       disable_mtrr_cleanup [X86]
+       disable_mtrr_cleanup [X86,EARLY]
                        The kernel tries to adjust MTRR layout from continuous
                        to discrete, to make X server driver able to add WB
                        entry later. This parameter disables that.
 
-       disable_mtrr_trim [X86, Intel and AMD only]
+       disable_mtrr_trim [X86, Intel and AMD only,EARLY]
                        By default the kernel will trim any uncacheable
                        memory out of your available memory pool based on
                        MTRR settings.  This parameter disables that behavior,
                        possibly causing your machine to run very slowly.
 
-       disable_timer_pin_1 [X86]
+       disable_timer_pin_1 [X86,EARLY]
                        Disable PIN 1 of APIC timer
                        Can be useful to work around chipset bugs.
 
 
        dscc4.setup=    [NET]
 
-       dt_cpu_ftrs=    [PPC]
+       dt_cpu_ftrs=    [PPC,EARLY]
                        Format: {"off" | "known"}
                        Control how the dt_cpu_ftrs device-tree binding is
                        used for CPU feature discovery and setup (if it
                        Documentation/admin-guide/dynamic-debug-howto.rst
                        for details.
 
-       early_ioremap_debug [KNL]
+       early_ioremap_debug [KNL,EARLY]
                        Enable debug messages in early_ioremap support. This
                        is useful for tracking down temporary early mappings
                        which are not unmapped.
 
-       earlycon=       [KNL] Output early console device and options.
+       earlycon=       [KNL,EARLY] Output early console device and options.
 
                        When used with no options, the early console is
                        determined by stdout-path property in device tree's
                        address must be provided, and the serial port must
                        already be setup and configured.
 
-       earlyprintk=    [X86,SH,ARM,M68k,S390]
+       earlyprintk=    [X86,SH,ARM,M68k,S390,UM,EARLY]
                        earlyprintk=vga
                        earlyprintk=sclp
                        earlyprintk=xen
        edd=            [EDD]
                        Format: {"off" | "on" | "skip[mbr]"}
 
-       efi=            [EFI]
+       efi=            [EFI,EARLY]
                        Format: { "debug", "disable_early_pci_dma",
                                  "nochunk", "noruntime", "nosoftreserve",
                                  "novamap", "no_disable_early_pci_dma" }
                        no_disable_early_pci_dma: Leave the busmaster bit set
                        on all PCI bridges while in the EFI boot stub
 
-       efi_no_storage_paranoia [EFI; X86]
+       efi_no_storage_paranoia [EFI,X86,EARLY]
                        Using this parameter you can use more than 50% of
                        your efi variable storage. Use this parameter only if
                        you are really sure that your UEFI does sane gc and
                        fulfills the spec, otherwise your board may brick.
 
-       efi_fake_mem=   nn[KMG]@ss[KMG]:aa[,nn[KMG]@ss[KMG]:aa,..] [EFI; X86]
+       efi_fake_mem=   nn[KMG]@ss[KMG]:aa[,nn[KMG]@ss[KMG]:aa,..] [EFI,X86,EARLY]
                        Add arbitrary attribute to specific memory range by
                        updating original EFI memory map.
                        Region of memory to which the aa attribute is added is
        eisa_irq_edge=  [PARISC,HW]
                        See header of drivers/parisc/eisa.c.
 
-       ekgdboc=        [X86,KGDB] Allow early kernel console debugging
+       ekgdboc=        [X86,KGDB,EARLY] Allow early kernel console debugging
                        Format: ekgdboc=kbd
 
                        This is designed to be used in conjunction with
                        See comment before function elanfreq_setup() in
                        arch/x86/kernel/cpu/cpufreq/elanfreq.c.
 
-       elfcorehdr=[size[KMG]@]offset[KMG] [PPC,SH,X86,S390]
+       elfcorehdr=[size[KMG]@]offset[KMG] [PPC,SH,X86,S390,EARLY]
                        Specifies physical address of start of kernel core
                        image elf header and optionally the size. Generally
                        kexec loader will pass this option to capture kernel.
                        See Documentation/admin-guide/kdump/kdump.rst for details.
 
-       enable_mtrr_cleanup [X86]
+       enable_mtrr_cleanup [X86,EARLY]
                        The kernel tries to adjust MTRR layout from continuous
                        to discrete, to make X server driver able to add WB
                        entry later. This parameter enables that.
                        Permit 'security.evm' to be updated regardless of
                        current integrity status.
 
-       early_page_ext [KNL] Enforces page_ext initialization to earlier
+       early_page_ext [KNL,EARLY] Enforces page_ext initialization to earlier
                        stages so as to cover more early boot allocations.
                        Please note that as a side effect some optimizations
                        might be disabled to achieve that (e.g. parallelized
                        can be changed at run time by the max_graph_depth file
                        in the tracefs tracing directory. default: 0 (no limit)
 
-       fw_devlink=     [KNL] Create device links between consumer and supplier
+       fw_devlink=     [KNL,EARLY] Create device links between consumer and supplier
                        devices by scanning the firmware to infer the
                        consumer/supplier relationships. This feature is
                        especially useful when drivers are loaded as modules as
                        rpm --  Like "on", but also use to order runtime PM.
 
        fw_devlink.strict=<bool>
-                       [KNL] Treat all inferred dependencies as mandatory
+                       [KNL,EARLY] Treat all inferred dependencies as mandatory
                        dependencies. This only applies for fw_devlink=on|rpm.
                        Format: <bool>
 
        fw_devlink.sync_state=
-                       [KNL] When all devices that could probe have finished
+                       [KNL,EARLY] When all devices that could probe have finished
                        probing, this parameter controls what to do with
                        devices that haven't yet received their sync_state()
                        calls.
 
        gamma=          [HW,DRM]
 
-       gart_fix_e820=  [X86-64] disable the fix e820 for K8 GART
+       gart_fix_e820=  [X86-64,EARLY] disable the e820 fix for K8 GART
                        Format: off | on
                        default: on
 
        gather_data_sampling=
-                       [X86,INTEL] Control the Gather Data Sampling (GDS)
+                       [X86,INTEL,EARLY] Control the Gather Data Sampling (GDS)
                        mitigation.
 
                        Gather Data Sampling is a hardware vulnerability which
                                (that will set all pages holding image data
                                during restoration read-only).
 
-       highmem=nn[KMG] [KNL,BOOT] forces the highmem zone to have an exact
+       highmem=nn[KMG] [KNL,BOOT,EARLY] forces the highmem zone to have an exact
                        size of <nn>. This works even on boxes that have no
                        highmem otherwise. This also works to reduce highmem
                        size on bigger boxes.
 
        hlt             [BUGS=ARM,SH]
 
-       hostname=       [KNL] Set the hostname (aka UTS nodename).
+       hostname=       [KNL,EARLY] Set the hostname (aka UTS nodename).
                        Format: <string>
                        This allows setting the system's hostname during early
                        startup. This sets the name returned by gethostname.
                        Documentation/admin-guide/mm/hugetlbpage.rst.
                        Format: size[KMG]
 
-       hugetlb_cma=    [HW,CMA] The size of a CMA area used for allocation
+       hugetlb_cma=    [HW,CMA,EARLY] The size of a CMA area used for allocation
                        of gigantic hugepages. Or using node format, the size
                        of a CMA area per node can be specified.
                        Format: nn[KMGTPE] or (node format)
                                If specified, z/VM IUCV HVC accepts connections
                                from listed z/VM user IDs only.
 
-       hv_nopvspin     [X86,HYPER_V] Disables the paravirt spinlock optimizations
-                                     which allow the hypervisor to 'idle' the
-                                     guest on lock contention.
+       hv_nopvspin     [X86,HYPER_V,EARLY]
+                       Disables the paravirt spinlock optimizations
+                       which allow the hypervisor to 'idle' the guest
+                       on lock contention.
 
        i2c_bus=        [HW]    Override the default board specific I2C bus speed
                                or register an additional I2C bus that is not
                        Format: <io>[,<membase>[,<icn_id>[,<icn_id2>]]]
 
 
-       idle=           [X86]
+       idle=           [X86,EARLY]
                        Format: idle=poll, idle=halt, idle=nomwait
                        Poll forces a polling idle loop that can slightly
                        improve the performance of waking up an idle CPU, but
                        mode generally follows that for the NaN encoding,
                        except where unsupported by hardware.
 
-       ignore_loglevel [KNL]
+       ignore_loglevel [KNL,EARLY]
                        Ignore loglevel setting - this will print /all/
                        kernel messages to the console. Useful for debugging.
                        We also add it as printk module parameter, so users
                        unpacking being completed before device_ and
                        late_ initcalls.
 
-       initrd=         [BOOT] Specify the location of the initial ramdisk
+       initrd=         [BOOT,EARLY] Specify the location of the initial ramdisk
 
-       initrdmem=      [KNL] Specify a physical address and size from which to
+       initrdmem=      [KNL,EARLY] Specify a physical address and size from which to
                        load the initrd. If an initrd is compiled in or
                        specified in the bootparams, it takes priority over this
                        setting.
                        Format: ss[KMG],nn[KMG]
                        Default is 0, 0
 
-       init_on_alloc=  [MM] Fill newly allocated pages and heap objects with
+       init_on_alloc=  [MM,EARLY] Fill newly allocated pages and heap objects with
                        zeroes.
                        Format: 0 | 1
                        Default set by CONFIG_INIT_ON_ALLOC_DEFAULT_ON.
 
-       init_on_free=   [MM] Fill freed pages and heap objects with zeroes.
+       init_on_free=   [MM,EARLY] Fill freed pages and heap objects with zeroes.
                        Format: 0 | 1
                        Default set by CONFIG_INIT_ON_FREE_DEFAULT_ON.
 
                        0       disables intel_idle and fall back on acpi_idle.
                        1 to 9  specify maximum depth of C-state.
 
-       intel_pstate=   [X86]
+       intel_pstate=   [X86,EARLY]
                        disable
                          Do not enable intel_pstate as the default
                          scaling driver for the supported processors
                          Allow per-logical-CPU P-State performance control limits using
                          cpufreq sysfs interface
 
-       intremap=       [X86-64, Intel-IOMMU]
+       intremap=       [X86-64,Intel-IOMMU,EARLY]
                        on      enable Interrupt Remapping (default)
                        off     disable Interrupt Remapping
                        nosid   disable Source ID checking
                strict  regions from userspace.
                relaxed
 
-       iommu=          [X86]
+       iommu=          [X86,EARLY]
                off
                force
                noforce
                nobypass        [PPC/POWERNV]
                        Disable IOMMU bypass, using IOMMU for PCI devices.
 
-       iommu.forcedac= [ARM64, X86] Control IOVA allocation for PCI devices.
+       iommu.forcedac= [ARM64,X86,EARLY] Control IOVA allocation for PCI devices.
                        Format: { "0" | "1" }
                        0 - Try to allocate a 32-bit DMA address first, before
                          falling back to the full range if needed.
                          forcing Dual Address Cycle for PCI cards supporting
                          greater than 32-bit addressing.
 
-       iommu.strict=   [ARM64, X86, S390] Configure TLB invalidation behaviour
+       iommu.strict=   [ARM64,X86,S390,EARLY] Configure TLB invalidation behaviour
                        Format: { "0" | "1" }
                        0 - Lazy mode.
                          Request that DMA unmap operations use deferred
                        legacy driver-specific options takes precedence.
 
        iommu.passthrough=
-                       [ARM64, X86] Configure DMA to bypass the IOMMU by default.
+                       [ARM64,X86,EARLY] Configure DMA to bypass the IOMMU by default.
                        Format: { "0" | "1" }
                        0 - Use IOMMU translation for DMA.
                        1 - Bypass the IOMMU for DMA.
                        See comment before marvel_specify_io7 in
                        arch/alpha/kernel/core_marvel.c.
 
-       io_delay=       [X86] I/O delay method
+       io_delay=       [X86,EARLY] I/O delay method
                0x80
                        Standard port 0x80 based delay
                0xed
        ip=             [IP_PNP]
                        See Documentation/admin-guide/nfs/nfsroot.rst.
 
-       ipcmni_extend   [KNL] Extend the maximum number of unique System V
+       ipcmni_extend   [KNL,EARLY] Extend the maximum number of unique System V
                        IPC identifiers from 32,768 to 16,777,216.
 
        irqaffinity=    [SMP] Set the default irq affinity mask
                        The argument is a cpu list, as described above.
 
        irqchip.gicv2_force_probe=
-                       [ARM, ARM64]
+                       [ARM,ARM64,EARLY]
                        Format: <bool>
                        Force the kernel to look for the second 4kB page
                        of a GICv2 controller even if the memory range
                        exposed by the device tree is too small.
 
        irqchip.gicv3_nolpi=
-                       [ARM, ARM64]
+                       [ARM,ARM64,EARLY]
                        Force the kernel to ignore the availability of
                        LPIs (and by consequence ITSs). Intended for systems
                        that use the kernel as a bootloader, and thus want
                        to leave secondary kernels in charge of setting up
                        LPIs.
 
-       irqchip.gicv3_pseudo_nmi= [ARM64]
+       irqchip.gicv3_pseudo_nmi= [ARM64,EARLY]
                        Enables support for pseudo-NMIs in the kernel. This
                        requires the kernel to be built with
                        CONFIG_ARM64_PSEUDO_NMI.
                        parameter KASAN will print report only for the first
                        invalid access.
 
-       keep_bootcon    [KNL]
+       keep_bootcon    [KNL,EARLY]
                        Do not unregister boot console at start. This is only
                        useful for debugging when something happens in the window
                        between unregistering the boot console and initializing
 
        keepinitrd      [HW,ARM] See retain_initrd.
 
-       kernelcore=     [KNL,X86,IA-64,PPC]
+       kernelcore=     [KNL,X86,IA-64,PPC,EARLY]
                        Format: nn[KMGTPE] | nn% | "mirror"
                        This parameter specifies the amount of memory usable by
                        the kernel for non-movable allocations.  The requested
                        for Movable pages.  "nn[KMGTPE]", "nn%", and "mirror"
                        are exclusive, so you cannot specify multiple forms.
 
-       kgdbdbgp=       [KGDB,HW] kgdb over EHCI usb debug port.
+       kgdbdbgp=       [KGDB,HW,EARLY] kgdb over EHCI usb debug port.
                        Format: <Controller#>[,poll interval]
                        The controller # is the number of the ehci usb debug
                        port as it is probed via PCI.  The poll interval is
                         kms, kbd format: kms,kbd
                         kms, kbd and serial format: kms,kbd,<ser_dev>[,baud]
 
-       kgdboc_earlycon=        [KGDB,HW]
+       kgdboc_earlycon=        [KGDB,HW,EARLY]
                        If the boot console provides the ability to read
                        characters and can work in polling mode, you can use
                        this parameter to tell kgdb to use it as a backend
                        blank and the first boot console that implements
                        read() will be picked.
 
-       kgdbwait        [KGDB] Stop kernel execution and enter the
+       kgdbwait        [KGDB,EARLY] Stop kernel execution and enter the
                        kernel debugger at the earliest opportunity.
 
        kmac=           [MIPS] Korina ethernet MAC address.
                        Configure the RouterBoard 532 series on-chip
                        Ethernet adapter MAC address.
 
-       kmemleak=       [KNL] Boot-time kmemleak enable/disable
+       kmemleak=       [KNL,EARLY] Boot-time kmemleak enable/disable
                        Valid arguments: on, off
                        Default: on
                        Built with CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=y,
                        See also Documentation/trace/kprobetrace.rst "Kernel
                        Boot Parameter" section.
 
-       kpti=           [ARM64] Control page table isolation of user
-                       and kernel address spaces.
+       kpti=           [ARM64,EARLY] Control page table isolation of
+                       user and kernel address spaces.
                        Default: enabled on cores which need mitigation.
                        0: force disabled
                        1: force enabled
                        for NPT.
 
        kvm-arm.mode=
-                       [KVM,ARM] Select one of KVM/arm64's modes of operation.
+                       [KVM,ARM,EARLY] Select one of KVM/arm64's modes of
+                       operation.
 
                        none: Forcefully disable KVM.
 
                        used with extreme caution.
 
        kvm-arm.vgic_v3_group0_trap=
-                       [KVM,ARM] Trap guest accesses to GICv3 group-0
+                       [KVM,ARM,EARLY] Trap guest accesses to GICv3 group-0
                        system registers
 
        kvm-arm.vgic_v3_group1_trap=
-                       [KVM,ARM] Trap guest accesses to GICv3 group-1
+                       [KVM,ARM,EARLY] Trap guest accesses to GICv3 group-1
                        system registers
 
        kvm-arm.vgic_v3_common_trap=
-                       [KVM,ARM] Trap guest accesses to GICv3 common
+                       [KVM,ARM,EARLY] Trap guest accesses to GICv3 common
                        system registers
 
        kvm-arm.vgic_v4_enable=
-                       [KVM,ARM] Allow use of GICv4 for direct injection of
-                       LPIs.
+                       [KVM,ARM,EARLY] Allow use of GICv4 for direct
+                       injection of LPIs.
 
-       kvm_cma_resv_ratio=n [PPC]
+       kvm_cma_resv_ratio=n [PPC,EARLY]
                        Reserves given percentage from system memory area for
                        contiguous memory allocation for KVM hash pagetable
                        allocation.
                        (enabled). Disable by KVM if hardware lacks support
                        for it.
 
-       l1d_flush=      [X86,INTEL]
+       l1d_flush=      [X86,INTEL,EARLY]
                        Control mitigation for L1D based snooping vulnerability.
 
                        Certain CPUs are vulnerable to an exploit against CPU
 
                        on         - enable the interface for the mitigation
 
-       l1tf=           [X86] Control mitigation of the L1TF vulnerability on
+       l1tf=           [X86,EARLY] Control mitigation of the L1TF vulnerability on
                              affected CPUs
 
                        The kernel PTE inversion protection is unconditionally
 
        l3cr=           [PPC]
 
-       lapic           [X86-32,APIC] Enable the local APIC even if BIOS
+       lapic           [X86-32,APIC,EARLY] Enable the local APIC even if BIOS
                        disabled it.
 
        lapic=          [X86,APIC] Do not use TSC deadline
                        back to the programmable timer unit in the LAPIC.
                        Format: notscdeadline
 
-       lapic_timer_c2_ok       [X86,APIC] trust the local apic timer
+       lapic_timer_c2_ok       [X86,APIC,EARLY] trust the local apic timer
                        in C2 power state.
 
        libata.dma=     [LIBATA] DMA control
        lockd.nlm_udpport=M     [NFS] Assign UDP port.
                        Format: <integer>
 
-       lockdown=       [SECURITY]
+       lockdown=       [SECURITY,EARLY]
                        { integrity | confidentiality }
                        Enable the kernel lockdown feature. If set to
                        integrity, kernel features that allow userland to
        logibm.irq=     [HW,MOUSE] Logitech Bus Mouse Driver
                        Format: <irq>
 
-       loglevel=       All Kernel Messages with a loglevel smaller than the
+       loglevel=       [KNL,EARLY]
+                       All Kernel Messages with a loglevel smaller than the
                        console loglevel will be printed to the console. It can
                        also be changed with klogd or other programs. The
                        loglevels are defined as follows:
                        6 (KERN_INFO)           informational
                        7 (KERN_DEBUG)          debug-level messages
 
-       log_buf_len=n[KMG]      Sets the size of the printk ring buffer,
-                       in bytes.  n must be a power of two and greater
-                       than the minimal size. The minimal size is defined
-                       by LOG_BUF_SHIFT kernel config parameter. There is
-                       also CONFIG_LOG_CPU_MAX_BUF_SHIFT config parameter
-                       that allows to increase the default size depending on
-                       the number of CPUs. See init/Kconfig for more details.
+       log_buf_len=n[KMG] [KNL,EARLY]
+                       Sets the size of the printk ring buffer, in bytes.
+                       n must be a power of two and greater than the
+                       minimal size. The minimal size is defined by the
+                       LOG_BUF_SHIFT kernel config parameter. There is
+                       also the CONFIG_LOG_CPU_MAX_BUF_SHIFT config
+                       parameter, which allows increasing the default size
+                       depending on the number of CPUs. See init/Kconfig
+                       for more details.
 
        logo.nologo     [FB] Disables display of the built-in Linux logo.
                        This may be used to provide more screen space for
        max_addr=nn[KMG]        [KNL,BOOT,IA-64] All physical memory greater
                        than or equal to this physical address is ignored.
 
-       maxcpus=        [SMP] Maximum number of processors that an SMP kernel
+       maxcpus=        [SMP,EARLY] Maximum number of processors that an SMP kernel
                        will bring up during bootup.  maxcpus=n : n >= 0 limits
                        the kernel to bring up 'n' processors. After bootup
                        you can bring up the other plugged CPUs by executing
                        Format: <first>,<last>
                        Specifies range of consoles to be captured by the MDA.
 
-       mds=            [X86,INTEL]
+       mds=            [X86,INTEL,EARLY]
                        Control mitigation for the Micro-architectural Data
                        Sampling (MDS) vulnerability.
 
 
                        For details see: Documentation/admin-guide/hw-vuln/mds.rst
 
-       mem=nn[KMG]     [HEXAGON] Set the memory size.
+       mem=nn[KMG]     [HEXAGON,EARLY] Set the memory size.
                        Must be specified, otherwise memory size will be 0.
 
-       mem=nn[KMG]     [KNL,BOOT] Force usage of a specific amount of memory
-                       Amount of memory to be used in cases as follows:
+       mem=nn[KMG]     [KNL,BOOT,EARLY] Force usage of a specific amount
+                       of memory. Amount of memory to be used in cases
+                       as follows:
 
                        1 for test;
                        2 when the kernel is not able to see the whole system memory;
                        if system memory of hypervisor is not sufficient.
 
        mem=nn[KMG]@ss[KMG]
-                       [ARM,MIPS] - override the memory layout reported by
-                       firmware.
+                       [ARM,MIPS,EARLY] - override the memory layout
+                       reported by firmware.
                        Define a memory region of size nn[KMG] starting at
                        ss[KMG].
                        Multiple different regions can be specified with
        mem=nopentium   [BUGS=X86-32] Disable usage of 4MB pages for kernel
                        memory.
 
-       memblock=debug  [KNL] Enable memblock debug messages.
+       memblock=debug  [KNL,EARLY] Enable memblock debug messages.
 
        memchunk=nn[KMG]
                        [KNL,SH] Allow user to override the default size for
                        option.
                        See Documentation/admin-guide/mm/memory-hotplug.rst.
 
-       memmap=exactmap [KNL,X86] Enable setting of an exact
+       memmap=exactmap [KNL,X86,EARLY] Enable setting of an exact
                        E820 memory map, as specified by the user.
                        Such memmap=exactmap lines can be constructed based on
                        BIOS output or other requirements. See the memmap=nn@ss
                        option description.
 
        memmap=nn[KMG]@ss[KMG]
-                       [KNL, X86, MIPS, XTENSA] Force usage of a specific region of memory.
+                       [KNL,X86,MIPS,XTENSA,EARLY] Force usage of a specific region of memory.
                        Region of memory to be used is from ss to ss+nn.
                        If @ss[KMG] is omitted, it is equivalent to mem=nn[KMG],
                        which limits max address to nn[KMG].
                                memmap=100M@2G,100M#3G,1G!1024G
 
        memmap=nn[KMG]#ss[KMG]
-                       [KNL,ACPI] Mark specific memory as ACPI data.
+                       [KNL,ACPI,EARLY] Mark specific memory as ACPI data.
                        Region of memory to be marked is from ss to ss+nn.
 
        memmap=nn[KMG]$ss[KMG]
-                       [KNL,ACPI] Mark specific memory as reserved.
+                       [KNL,ACPI,EARLY] Mark specific memory as reserved.
                        Region of memory to be reserved is from ss to ss+nn.
                        Example: Exclude memory from 0x18690000-0x1869ffff
                                 memmap=64K$0x18690000
                        like Grub2, otherwise '$' and the following number
                        will be eaten.
 
        memmap=nn[KMG]!ss[KMG]
-                       [KNL,X86] Mark specific memory as protected.
+                       [KNL,X86,EARLY] Mark specific memory as protected.
                        Region of memory to be used, from ss to ss+nn.
                        The memory region may be marked as e820 type 12 (0xc)
                        and is NVDIMM or ADR memory.
 
        memmap=<size>%<offset>-<oldtype>+<newtype>
-                       [KNL,ACPI] Convert memory within the specified region
+                       [KNL,ACPI,EARLY] Convert memory within the specified region
                        from <oldtype> to <newtype>. If "-<oldtype>" is left
                        out, the whole region will be marked as <newtype>,
                        even if previously unavailable. If "+<newtype>" is left
                        specified as e820 types, e.g., 1 = RAM, 2 = reserved,
                        3 = ACPI, 12 = PRAM.
 
-       memory_corruption_check=0/1 [X86]
+       memory_corruption_check=0/1 [X86,EARLY]
                        Some BIOSes seem to corrupt the first 64k of
                        memory when doing things like suspend/resume.
                        Setting this option will scan the memory
                        affects the same memory, you can use memmap=
                        to prevent the kernel from using that memory.
 
-       memory_corruption_check_size=size [X86]
+       memory_corruption_check_size=size [X86,EARLY]
                        By default it checks for corruption in the low
                        64k, making this memory unavailable for normal
                        use.  Use this parameter to scan for
                        corruption in more or less memory.
 
-       memory_corruption_check_period=seconds [X86]
+       memory_corruption_check_period=seconds [X86,EARLY]
                        By default it checks for corruption every 60
                        seconds.  Use this parameter to check at some
                        other rate.  0 disables periodic checking.
                        Note that even when enabled, there are a few cases where
                        the feature is not effective.
 
-       memtest=        [KNL,X86,ARM,M68K,PPC,RISCV] Enable memtest
+       memtest=        [KNL,X86,ARM,M68K,PPC,RISCV,EARLY] Enable memtest
                        Format: <integer>
                        default : 0 <disable>
                        Specifies the number of memtest passes to be
                        https://repo.or.cz/w/linux-2.6/mini2440.git
 
        mitigations=
-                       [X86,PPC,S390,ARM64] Control optional mitigations for
+                       [X86,PPC,S390,ARM64,EARLY] Control optional mitigations for
                        CPU vulnerabilities.  This is a set of curated,
                        arch-independent options, each of which is an
                        aggregation of existing arch-specific options.
                                               retbleed=auto,nosmt [X86]
 
        mminit_loglevel=
-                       [KNL] When CONFIG_DEBUG_MEMORY_INIT is set, this
+                       [KNL,EARLY] When CONFIG_DEBUG_MEMORY_INIT is set, this
                        parameter allows control of the logging verbosity for
                        the additional memory initialisation checks. A value
                        of 0 disables mminit logging and a level of 4 will
                        so loglevel=8 may also need to be specified.
 
        mmio_stale_data=
-                       [X86,INTEL] Control mitigation for the Processor
+                       [X86,INTEL,EARLY] Control mitigation for the Processor
                        MMIO Stale Data vulnerabilities.
 
                        Processor MMIO Stale Data is a class of
        mousedev.yres=  [MOUSE] Vertical screen resolution, used for devices
                        reporting absolute coordinates, such as tablets
 
-       movablecore=    [KNL,X86,IA-64,PPC]
+       movablecore=    [KNL,X86,IA-64,PPC,EARLY]
                        Format: nn[KMGTPE] | nn%
                        This parameter is the complement to kernelcore=, it
                        specifies the amount of memory used for migratable
                        that the amount of memory usable for all allocations
                        is not too small.
 
-       movable_node    [KNL] Boot-time switch to make hotplugable memory
+       movable_node    [KNL,EARLY] Boot-time switch to make hotpluggable memory
                        NUMA nodes movable. This means that the memory
                        of such nodes will be usable only for movable
                        allocations which rules out almost all kernel
                        [HW] Make the MicroTouch USB driver use raw coordinates
                        ('y', default) or cooked coordinates ('n')
 
-       mtrr=debug      [X86]
+       mtrr=debug      [X86,EARLY]
                        Enable printing debug information related to MTRR
                        registers at boot time.
 
-       mtrr_chunk_size=nn[KMG] [X86]
+       mtrr_chunk_size=nn[KMG] [X86,EARLY]
                        Used for mtrr cleanup. It is the largest continuous
                        chunk that could hold holes, aka UC entries.
 
-       mtrr_gran_size=nn[KMG] [X86]
+       mtrr_gran_size=nn[KMG] [X86,EARLY]
                        Used for mtrr cleanup. It is the granularity of an
                        mtrr block. Default is 1.
                        A large value could prevent small alignment from
                        using up MTRRs.
 
-       mtrr_spare_reg_nr=n [X86]
+       mtrr_spare_reg_nr=n [X86,EARLY]
                        Format: <integer>
                        Range: 0,7 : spare reg number
                        Default : 1
                        emulation library even if a 387 maths coprocessor
                        is present.
 
-       no4lvl          [RISCV] Disable 4-level and 5-level paging modes. Forces
-                       kernel to use 3-level paging instead.
+       no4lvl          [RISCV,EARLY] Disable 4-level and 5-level paging modes.
+                       Forces kernel to use 3-level paging instead.
 
-       no5lvl          [X86-64,RISCV] Disable 5-level paging mode. Forces
+       no5lvl          [X86-64,RISCV,EARLY] Disable 5-level paging mode. Forces
                        kernel to use 4-level paging instead.
 
        noaliencache    [MM, NUMA, SLAB] Disables the allocation of alien
 
        noalign         [KNL,ARM]
 
-       noaltinstr      [S390] Disables alternative instructions patching
-                       (CPU alternatives feature).
+       noaltinstr      [S390,EARLY] Disables alternative instructions
+                       patching (CPU alternatives feature).
 
-       noapic          [SMP,APIC] Tells the kernel to not make use of any
+       noapic          [SMP,APIC,EARLY] Tells the kernel to not make use of any
                        IOAPICs that may be present in the system.
 
        noautogroup     Disable scheduler automatic task group creation.
 
-       nocache         [ARM]
+       nocache         [ARM,EARLY]
 
        no_console_suspend
                        [HW] Never suspend the console
                        turn on/off it dynamically.
 
        no_debug_objects
-                       [KNL] Disable object debugging
+                       [KNL,EARLY] Disable object debugging
 
        nodsp           [SH] Disable hardware DSP at boot time.
 
-       noefi           Disable EFI runtime services support.
+       noefi           [EFI,EARLY] Disable EFI runtime services support.
 
-       no_entry_flush  [PPC] Don't flush the L1-D cache when entering the kernel.
+       no_entry_flush  [PPC,EARLY] Don't flush the L1-D cache when entering the kernel.
 
        noexec          [IA-64]
 
                        real-time systems.
 
        no_hash_pointers
+                       [KNL,EARLY]
                        Force pointers printed to the console or buffers to be
                        unhashed.  By default, when a pointer is printed via %p
                        format string, that pointer is "hashed", i.e. obscured
                        the impact of the sleep instructions. This is also
                        useful when using JTAG debugger.
 
-       nohugeiomap     [KNL,X86,PPC,ARM64] Disable kernel huge I/O mappings.
+       nohugeiomap     [KNL,X86,PPC,ARM64,EARLY] Disable kernel huge I/O mappings.
 
-       nohugevmalloc   [KNL,X86,PPC,ARM64] Disable kernel huge vmalloc mappings.
+       nohugevmalloc   [KNL,X86,PPC,ARM64,EARLY] Disable kernel huge vmalloc mappings.
 
        nohz=           [KNL] Boottime enable/disable dynamic ticks
                        Valid arguments: on, off
        noinitrd        [RAM] Tells the kernel not to load any configured
                        initial RAM disk.
 
-       nointremap      [X86-64, Intel-IOMMU] Do not enable interrupt
+       nointremap      [X86-64,Intel-IOMMU,EARLY] Do not enable interrupt
                        remapping.
                        [Deprecated - use intremap=off]
 
        nointroute      [IA-64]
 
-       noinvpcid       [X86] Disable the INVPCID cpu feature.
+       noinvpcid       [X86,EARLY] Disable the INVPCID cpu feature.
 
        noiotrap        [SH] Disables trapped I/O port accesses.
 
 
        nojitter        [IA-64] Disables jitter checking for ITC timers.
 
-       nokaslr         [KNL]
+       nokaslr         [KNL,EARLY]
                        When CONFIG_RANDOMIZE_BASE is set, this disables
                        kernel and module base offset ASLR (Address Space
                        Layout Randomization).
 
-       no-kvmapf       [X86,KVM] Disable paravirtualized asynchronous page
+       no-kvmapf       [X86,KVM,EARLY] Disable paravirtualized asynchronous page
                        fault handling.
 
-       no-kvmclock     [X86,KVM] Disable paravirtualized KVM clock driver
+       no-kvmclock     [X86,KVM,EARLY] Disable paravirtualized KVM clock driver
 
-       nolapic         [X86-32,APIC] Do not enable or use the local APIC.
+       nolapic         [X86-32,APIC,EARLY] Do not enable or use the local APIC.
 
-       nolapic_timer   [X86-32,APIC] Do not use the local APIC timer.
+       nolapic_timer   [X86-32,APIC,EARLY] Do not use the local APIC timer.
 
        nomca           [IA-64] Disable machine check abort handling
 
                        shutdown the other cpus.  Instead use the REBOOT_VECTOR
                        irq.
 
-       nopat           [X86] Disable PAT (page attribute table extension of
+       nopat           [X86,EARLY] Disable PAT (page attribute table extension of
                        pagetables) support.
 
-       nopcid          [X86-64] Disable the PCID cpu feature.
+       nopcid          [X86-64,EARLY] Disable the PCID cpu feature.
 
        nopku           [X86] Disable Memory Protection Keys CPU feature found
                        in some Intel CPUs.
 
-       nopti           [X86-64]
+       nopti           [X86-64,EARLY]
                        Equivalent to pti=off
 
-       nopv=           [X86,XEN,KVM,HYPER_V,VMWARE]
+       nopv=           [X86,XEN,KVM,HYPER_V,VMWARE,EARLY]
                        Disables the PV optimizations, forcing the guest to run
                        as a generic guest with no PV drivers. Currently supports
                        XEN HVM, KVM, HYPER_V and VMWARE guests.
 
-       nopvspin        [X86,XEN,KVM]
+       nopvspin        [X86,XEN,KVM,EARLY]
                        Disables the qspinlock slow path using PV optimizations
                        which allow the hypervisor to 'idle' the guest on lock
                        contention.
                        This is required for the Braillex ib80-piezo Braille
                        reader made by F.H. Papenmeier (Germany).
 
-       nosgx           [X86-64,SGX] Disables Intel SGX kernel support.
+       nosgx           [X86-64,SGX,EARLY] Disables Intel SGX kernel support.
 
-       nosmap          [PPC]
+       nosmap          [PPC,EARLY]
                        Disable SMAP (Supervisor Mode Access Prevention)
                        even if it is supported by processor.
 
-       nosmep          [PPC64s]
+       nosmep          [PPC64s,EARLY]
                        Disable SMEP (Supervisor Mode Execution Prevention)
                        even if it is supported by processor.
 
-       nosmp           [SMP] Tells an SMP kernel to act as a UP kernel,
+       nosmp           [SMP,EARLY] Tells an SMP kernel to act as a UP kernel,
                        and disable the IO APIC.  Legacy for "maxcpus=0".
 
-       nosmt           [KNL,MIPS,PPC,S390] Disable symmetric multithreading (SMT).
+       nosmt           [KNL,MIPS,PPC,S390,EARLY] Disable symmetric multithreading (SMT).
                        Equivalent to smt=1.
 
                        [KNL,X86,PPC] Disable symmetric multithreading (SMT).
        nosoftlockup    [KNL] Disable the soft-lockup detector.
 
        nospec_store_bypass_disable
-                       [HW] Disable all mitigations for the Speculative Store Bypass vulnerability
+                       [HW,EARLY] Disable all mitigations for the Speculative
+                       Store Bypass vulnerability
 
-       nospectre_bhb   [ARM64] Disable all mitigations for Spectre-BHB (branch
+       nospectre_bhb   [ARM64,EARLY] Disable all mitigations for Spectre-BHB (branch
                        history injection) vulnerability. System may allow data leaks
                        with this option.
 
-       nospectre_v1    [X86,PPC] Disable mitigations for Spectre Variant 1
+       nospectre_v1    [X86,PPC,EARLY] Disable mitigations for Spectre Variant 1
                        (bounds check bypass). With this option data leaks are
                        possible in the system.
 
-       nospectre_v2    [X86,PPC_E500,ARM64] Disable all mitigations for
-                       the Spectre variant 2 (indirect branch prediction)
-                       vulnerability. System may allow data leaks with this
-                       option.
+       nospectre_v2    [X86,PPC_E500,ARM64,EARLY] Disable all mitigations
+                       for the Spectre variant 2 (indirect branch
+                       prediction) vulnerability. System may allow data
+                       leaks with this option.
 
-       no-steal-acc    [X86,PV_OPS,ARM64,PPC/PSERIES,RISCV] Disable
+       no-steal-acc    [X86,PV_OPS,ARM64,PPC/PSERIES,RISCV,EARLY] Disable
                        paravirtualized steal time accounting. Steal time is
                        computed, but won't influence scheduler behaviour.
 
                        broken timer IRQ sources.
 
        no_uaccess_flush
-                       [PPC] Don't flush the L1-D cache after accessing user data.
+                       [PPC,EARLY] Don't flush the L1-D cache after accessing user data.
 
        novmcoredd      [KNL,KDUMP]
                        Disable device dump. Device dump allows drivers to
                        is set.
 
        no-vmw-sched-clock
-                       [X86,PV_OPS] Disable paravirtualized VMware scheduler
-                       clock and use the default one.
+                       [X86,PV_OPS,EARLY] Disable paravirtualized VMware
+                       scheduler clock and use the default one.
 
        nowatchdog      [KNL] Disable both lockup detectors, i.e.
                        soft-lockup and NMI watchdog (hard-lockup).
 
-       nowb            [ARM]
+       nowb            [ARM,EARLY]
 
-       nox2apic        [X86-64,APIC] Do not enable x2APIC mode.
+       nox2apic        [X86-64,APIC,EARLY] Do not enable x2APIC mode.
 
                        NOTE: this parameter will be ignored on systems with the
                        LEGACY_XAPIC_DISABLED bit set in the
                        purges which is reported from either PAL_VM_SUMMARY or
                        SAL PALO.
 
-       nr_cpus=        [SMP] Maximum number of processors that an SMP kernel
+       nr_cpus=        [SMP,EARLY] Maximum number of processors that an SMP kernel
                        could support.  nr_cpus=n : n >= 1 limits the kernel to
                        support 'n' processors. It could be larger than the
                        number of already plugged CPU during bootup, later in
 
        nr_uarts=       [SERIAL] maximum number of UARTs to be registered.
 
-       numa=off        [KNL, ARM64, PPC, RISCV, SPARC, X86] Disable NUMA, Only
-                       set up a single NUMA node spanning all memory.
+       numa=off        [KNL, ARM64, PPC, RISCV, SPARC, X86, EARLY]
+                       Disable NUMA; only set up a single NUMA node
+                       spanning all memory.
 
        numa_balancing= [KNL,ARM64,PPC,RISCV,S390,X86] Enable or disable automatic
                        NUMA balancing.
                        This can be set from sysctl after boot.
                        See Documentation/admin-guide/sysctl/vm.rst for details.
 
-       ohci1394_dma=early      [HW] enable debugging via the ohci1394 driver.
+       ohci1394_dma=early      [HW,EARLY] enable debugging via the ohci1394 driver.
                        See Documentation/core-api/debugging-via-ohci1394.rst for more
                        info.
 
                                   Once locked, the boundary cannot be changed.
                                   1 indicates lock status, 0 indicates unlock status.
 
-       oops=panic      Always panic on oopses. Default is to just kill the
+       oops=panic      [KNL,EARLY]
+                       Always panic on oopses. Default is to just kill the
                        process, but there is a small probability of
                        deadlocking the machine.
                        This will also cause panics on machine check exceptions.
                        can be read from sysfs at:
                        /sys/module/page_alloc/parameters/shuffle.
 
-       page_owner=     [KNL] Boot-time page_owner enabling option.
+       page_owner=     [KNL,EARLY] Boot-time page_owner enabling option.
                        Storage of the information about who allocated
                        each page is disabled in default. With this switch,
                        we can turn it on.
                        on: enable the feature
 
-       page_poison=    [KNL] Boot-time parameter changing the state of
+       page_poison=    [KNL,EARLY] Boot-time parameter changing the state of
                        poisoning on the buddy allocator, available with
                        CONFIG_PAGE_POISONING=y.
                        off: turn off poisoning (default)
                        timeout < 0: reboot immediately
                        Format: <timeout>
 
-       panic_on_taint= Bitmask for conditionally calling panic() in add_taint()
+       panic_on_taint= [KNL,EARLY]
+                       Bitmask for conditionally calling panic() in add_taint()
                        Format: <hex>[,nousertaint]
                        Hexadecimal bitmask representing the set of TAINT flags
                        that will cause the kernel to panic when add_taint() is
 
        pcbit=          [HW,ISDN]
 
-       pci=option[,option...]  [PCI] various PCI subsystem options.
+       pci=option[,option...]  [PCI,EARLY] various PCI subsystem options.
 
                                Some options herein operate on a specific device
                                or a set of devices (<pci_dev>). These are
                        Format: { 0 | 1 }
                        See arch/parisc/kernel/pdc_chassis.c
 
-       percpu_alloc=   Select which percpu first chunk allocator to use.
+       percpu_alloc=   [MM,EARLY]
+                       Select which percpu first chunk allocator to use.
                        Currently supported values are "embed" and "page".
                        Archs may support a subset or none of the selections.
                        See comments in mm/percpu.c for details on each
                        execution priority.
 
        ppc_strict_facility_enable
-                       [PPC] This option catches any kernel floating point,
+                       [PPC,EARLY] This option catches any kernel floating point,
                        Altivec, VSX and SPE outside of regions specifically
                        allowed (eg kernel_enable_fpu()/kernel_disable_fpu()).
                        There is some performance impact when enabling this.
 
-       ppc_tm=         [PPC]
+       ppc_tm=         [PPC,EARLY]
                        Format: {"off"}
                        Disable Hardware Transactional Memory
 
                        [KNL] Number of legacy pty's. Overwrites compiled-in
                        default number.
 
-       quiet           [KNL] Disable most log messages
+       quiet           [KNL,EARLY] Disable most log messages
 
        r128=           [HW,DRM]
 
        ramdisk_start=  [RAM] RAM disk image start address
 
        random.trust_cpu=off
-                       [KNL] Disable trusting the use of the CPU's
+                       [KNL,EARLY] Disable trusting the use of the CPU's
                        random number generator (if available) to
                        initialize the kernel's RNG.
 
        random.trust_bootloader=off
-                       [KNL] Disable trusting the use of the a seed
+                       [KNL,EARLY] Disable trusting the use of a seed
                        passed by the bootloader (if available) to
                        initialize the kernel's RNG.
 
        randomize_kstack_offset=
-                       [KNL] Enable or disable kernel stack offset
+                       [KNL,EARLY] Enable or disable kernel stack offset
                        randomization, which provides roughly 5 bits of
                        entropy, frustrating memory corruption attacks
                        that depend on stack address determinism or
                        this kernel boot parameter, forcibly setting it
                        to zero.
 
+       rcutree.enable_rcu_lazy= [KNL]
+                       To save power, batch RCU callbacks and flush them
+                       after a delay, under memory pressure, or when the
+                       callback list grows too big.
+
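
(Editorial aside, not part of the patch: the callbacks this knob batches are
ordinary call_rcu() callbacks. A minimal hypothetical sketch, with all names
invented:)

.. code-block:: c

        #include <linux/rculist.h>
        #include <linux/rcupdate.h>
        #include <linux/slab.h>

        struct foo {
                struct list_head list;
                struct rcu_head rcu;
        };

        /* Runs after a grace period; with rcutree.enable_rcu_lazy=1 it
         * may additionally be batched and deferred to save power. */
        static void free_foo_cb(struct rcu_head *head)
        {
                kfree(container_of(head, struct foo, rcu));
        }

        static void release_foo(struct foo *f)
        {
                list_del_rcu(&f->list);
                call_rcu(&f->rcu, free_foo_cb);
        }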
        rcuscale.gp_async= [KNL]
                        Measure performance of asynchronous
                        grace-period primitives such as call_rcu().
                        Run specified binary instead of /init from the ramdisk,
                        used for early userspace startup. See initrd.
 
-       rdrand=         [X86]
+       rdrand=         [X86,EARLY]
                        force - Override the decision by the kernel to hide the
                                advertisement of RDRAND support (this affects
                                certain AMD processors because of buggy BIOS
                        them.  If <base> is less than 0x10000, the region
                        is assumed to be I/O ports; otherwise it is memory.
 
-       reservetop=     [X86-32]
+       reservetop=     [X86-32,EARLY]
                        Format: nn[KMG]
                        Reserves a hole at the top of the kernel virtual
                        address space.
                        [KNL] Disable ring 3 MONITOR/MWAIT feature on supported
                        CPUs.
 
-       riscv_isa_fallback [RISCV]
+       riscv_isa_fallback [RISCV,EARLY]
                        When CONFIG_RISCV_ISA_FALLBACK is not enabled, permit
                        falling back to detecting extension support by parsing
                        "riscv,isa" property on devicetree systems when the
 
        ro              [KNL] Mount root device read-only on boot
 
-       rodata=         [KNL]
+       rodata=         [KNL,EARLY]
                on      Mark read-only kernel memory as read-only (default).
                off     Leave read-only kernel memory writable for debugging.
                full    Mark read-only kernel memory and aliases as read-only
                        [arm64]
 
        rockchip.usb_uart
+                       [EARLY]
                        Enable the uart passthrough on the designated usb port
                        on Rockchip SoCs. When active, the signals of the
                        debug-uart get routed to the D+ and D- pins of the usb
        sa1100ir        [NET]
                        See drivers/net/irda/sa1100_ir.c.
 
-       sched_verbose   [KNL] Enables verbose scheduler debug messages.
+       sched_verbose   [KNL,EARLY] Enables verbose scheduler debug messages.
 
        schedstats=     [KNL,X86] Enable or disable scheduled statistics.
                        Allowed values are enable and disable. This feature
                        non-zero "wait" parameter.  See weight_single
                        and weight_many.
 
-       skew_tick=      [KNL] Offset the periodic timer tick per cpu to mitigate
+       skew_tick=      [KNL,EARLY] Offset the periodic timer tick per cpu to mitigate
                        xtime_lock contention on larger systems, and/or RCU lock
                        contention on all systems with CONFIG_MAXSMP set.
                        Format: { "0" | "1" }
                                1: Fast pin select (default)
                                2: ATC IRMode
 
-       smt=            [KNL,MIPS,S390] Set the maximum number of threads (logical
-                       CPUs) to use per physical CPU on systems capable of
-                       symmetric multithreading (SMT). Will be capped to the
-                       actual hardware limit.
+       smt=            [KNL,MIPS,S390,EARLY] Set the maximum number of threads
+                       (logical CPUs) to use per physical CPU on systems
+                       capable of symmetric multithreading (SMT). Will
+                       be capped to the actual hardware limit.
                        Format: <integer>
                        Default: -1 (no limit)
 
        sonypi.*=       [HW] Sony Programmable I/O Control Device driver
                        See Documentation/admin-guide/laptops/sonypi.rst
 
-       spectre_v2=     [X86] Control mitigation of Spectre variant 2
+       spectre_v2=     [X86,EARLY] Control mitigation of Spectre variant 2
                        (indirect branch speculation) vulnerability.
                        The default operation protects the kernel from
                        user space attacks.
                        spectre_v2_user=auto.
 
        spec_rstack_overflow=
-                       [X86] Control RAS overflow mitigation on AMD Zen CPUs
+                       [X86,EARLY] Control RAS overflow mitigation on AMD Zen CPUs
 
                        off             - Disable mitigation
                        microcode       - Enable microcode mitigation only
                                          (cloud-specific mitigation)
 
        spec_store_bypass_disable=
-                       [HW] Control Speculative Store Bypass (SSB) Disable mitigation
+                       [HW,EARLY] Control Speculative Store Bypass (SSB) Disable mitigation
                        (Speculative Store Bypass vulnerability)
 
                        Certain CPUs are vulnerable to an exploit against a
                        #DB exception for bus lock is triggered only when
                        CPL > 0.
 
-       srbds=          [X86,INTEL]
+       srbds=          [X86,INTEL,EARLY]
                        Control the Special Register Buffer Data Sampling
                        (SRBDS) mitigation.
 
                        srcutree.convert_to_big must have the 0x10 bit
                        set for contention-based conversions to occur.
 
-       ssbd=           [ARM64,HW]
+       ssbd=           [ARM64,HW,EARLY]
                        Speculative Store Bypass Disable control
 
                        On CPUs that are vulnerable to the Speculative
                        growing up) the main stack are reserved for no other
                        mapping. Default value is 256 pages.
 
-       stack_depot_disable= [KNL]
+       stack_depot_disable= [KNL,EARLY]
                        Setting this to true through kernel command line will
                        disable the stack depot thereby saving the static memory
                        consumed by the stack hash table. By default this is set
                        be used to filter out binaries which have
                        not yet been made aware of AT_MINSIGSTKSZ.
 
-       stress_hpt      [PPC]
+       stress_hpt      [PPC,EARLY]
                        Limits the number of kernel HPT entries in the hash
                        page table to increase the rate of hash page table
                        faults on kernel addresses.
 
-       stress_slb      [PPC]
+       stress_slb      [PPC,EARLY]
                        Limits the number of kernel SLB entries, and flushes
                        them frequently to increase the rate of SLB faults
                        on kernel addresses.
                        This parameter controls use of the Protected
                        Execution Facility on pSeries.
 
-       swiotlb=        [ARM,IA-64,PPC,MIPS,X86]
+       swiotlb=        [ARM,IA-64,PPC,MIPS,X86,EARLY]
                        Format: { <int> [,<int>] | force | noforce }
                        <int> -- Number of I/O TLB slabs
                        <int> -- Second integer after comma. Number of swiotlb
                                 wouldn't be automatically used by the kernel
                        noforce -- Never use bounce buffers (for debugging)
 
-       switches=       [HW,M68k]
+       switches=       [HW,M68k,EARLY]
 
        sysctl.*=       [KNL]
                        Set a sysctl parameter, right before loading the init
                        <deci-seconds>: poll all this frequency
                        0: no polling (default)
 
-       threadirqs      [KNL]
+       threadirqs      [KNL,EARLY]
                        Force threading of all interrupt handlers except those
                        marked explicitly IRQF_NO_THREAD.
 
-       topology=       [S390]
+       topology=       [S390,EARLY]
                        Format: {off | on}
                        Specify if the kernel should make use of the cpu
                        topology information if the hardware supports this.
                        can be overridden by a later tsc=nowatchdog.  A console
                        message will flag any such suppression or overriding.
 
-       tsc_early_khz=  [X86] Skip early TSC calibration and use the given
+       tsc_early_khz=  [X86,EARLY] Skip early TSC calibration and use the given
                        value instead. Useful when the early TSC frequency discovery
                        procedure is not reliable, such as on overclocked systems
                        with CPUID.16h support and partial CPUID.15h support.
                        See Documentation/admin-guide/hw-vuln/tsx_async_abort.rst
                        for more details.
 
-       tsx_async_abort= [X86,INTEL] Control mitigation for the TSX Async
+       tsx_async_abort= [X86,INTEL,EARLY] Control mitigation for the TSX Async
                        Abort (TAA) vulnerability.
 
                        Similar to Micro-architectural Data Sampling (MDS)
        unknown_nmi_panic
                        [X86] Cause panic on unknown NMI.
 
-       unwind_debug    [X86-64]
+       unwind_debug    [X86-64,EARLY]
                        Enable unwinder debug output.  This can be
                        useful for debugging certain unwinder error
                        conditions, including corrupt stacks and
                        Example: user_debug=31
 
        userpte=
-                       [X86] Flags controlling user PTE allocations.
+                       [X86,EARLY] Flags controlling user PTE allocations.
 
                                nohigh = do not allocate PTE pages in
                                        HIGHMEM regardless of setting
        vector=         [IA-64,SMP]
                        vector=percpu: enable percpu vector domain
 
-       video=          [FB] Frame buffer configuration
+       video=          [FB,EARLY] Frame buffer configuration
                        See Documentation/fb/modedb.rst.
 
        video.brightness_switch_enabled= [ACPI]
                          P     Enable page structure init time poisoning
                          -     Disable all of the above options
 
-       vmalloc=nn[KMG] [KNL,BOOT] Forces the vmalloc area to have an exact
-                       size of <nn>. This can be used to increase the
-                       minimum size (128MB on x86). It can also be used to
-                       decrease the size and leave more room for directly
-                       mapped kernel RAM.
+       vmalloc=nn[KMG] [KNL,BOOT,EARLY] Forces the vmalloc area to have an
+                       exact size of <nn>. This can be used to increase
+                       the minimum size (128MB on x86). It can also be
+                       used to decrease the size and leave more room
+                       for directly mapped kernel RAM.
 
-       vmcp_cma=nn[MG] [KNL,S390]
+       vmcp_cma=nn[MG] [KNL,S390,EARLY]
                        Sets the memory size reserved for contiguous memory
                        allocations for the vmcp device driver.
 
        vmpoff=         [KNL,S390] Perform z/VM CP command after power off.
                        Format: <command>
 
-       vsyscall=       [X86-64]
+       vsyscall=       [X86-64,EARLY]
                        Controls the behavior of vsyscalls (i.e. calls to
                        fixed addresses of 0xffffffffff600x00 from legacy
                        code).  Most statically-linked binaries and older
                        When enabled, memory and cache locality will be
                        impacted.
 
-       writecombine=   [LOONGARCH] Control the MAT (Memory Access Type) of
-                       ioremap_wc().
+       writecombine=   [LOONGARCH,EARLY] Control the MAT (Memory Access
+                       Type) of ioremap_wc().
 
                        on   - Enable writecombine, use WUC for ioremap_wc()
                        off  - Disable writecombine, use SUC for ioremap_wc()
 
-       x2apic_phys     [X86-64,APIC] Use x2apic physical mode instead of
+       x2apic_phys     [X86-64,APIC,EARLY] Use x2apic physical mode instead of
                        default x2apic cluster mode on platforms
                        supporting x2apic.
 
                        save/restore/migration must be enabled to handle larger
                        domains.
 
-       xen_emul_unplug=                [HW,X86,XEN]
+       xen_emul_unplug=                [HW,X86,XEN,EARLY]
                        Unplug Xen emulated devices
                        Format: [unplug0,][unplug1]
                        ide-disks -- unplug primary master IDE devices
                                the unplug protocol
                        never -- do not unplug even if version check succeeds
 
-       xen_legacy_crash        [X86,XEN]
+       xen_legacy_crash        [X86,XEN,EARLY]
                        Crash from Xen panic notifier, without executing late
                        panic() code such as dumping handler.
 
-       xen_msr_safe=   [X86,XEN]
+       xen_msr_safe=   [X86,XEN,EARLY]
                        Format: <bool>
                        Select whether to always use non-faulting (safe) MSR
                        access functions when running as Xen PV guest. The
                        default value is controlled by CONFIG_XEN_PV_MSR_SAFE.
 
-       xen_nopvspin    [X86,XEN]
+       xen_nopvspin    [X86,XEN,EARLY]
                        Disables the qspinlock slowpath using Xen PV optimizations.
                        This parameter is obsoleted by the "nopvspin" parameter,
                        which has an equivalent effect for the XEN platform.
 
        xen_no_vector_callback
-                       [KNL,X86,XEN] Disable the vector callback for Xen
+                       [KNL,X86,XEN,EARLY] Disable the vector callback for Xen
                        event channel interrupts.
 
        xen_scrub_pages=        [XEN]
                        with /sys/devices/system/xen_memory/xen_memory0/scrub_pages.
                        Default value controlled with CONFIG_XEN_SCRUB_PAGES_DEFAULT.
 
-       xen_timer_slop= [X86-64,XEN]
+       xen_timer_slop= [X86-64,XEN,EARLY]
                        Set the timer slop (in nanoseconds) for the virtual Xen
                        timers (default is 100000). This adjusts the minimum
                        delta of virtualized Xen timers, where lower values
                        host controller quirks. Meaning of each bit can be
                        consulted in header drivers/usb/host/xhci.h.
 
-       xmon            [PPC]
+       xmon            [PPC,EARLY]
                        Format: { early | on | rw | ro | off }
                        Controls if xmon debugger is enabled. Default is off.
                        Passing only "xmon" is equivalent to "xmon=early".
index 993c2a05f5eeab65f9e3d3a5464ac26513452472..b6aeae3327ceb537b78fdbd86961ae670614395b 100644 (file)
@@ -243,13 +243,9 @@ To reduce its OS jitter, do any of the following:
 3.     Do any of the following needed to avoid jitter that your
        application cannot tolerate:
 
-       a.      Build your kernel with CONFIG_SLUB=y rather than
-               CONFIG_SLAB=y, thus avoiding the slab allocator's periodic
-               use of each CPU's workqueues to run its cache_reap()
-               function.
-       b.      Avoid using oprofile, thus avoiding OS jitter from
+       a.      Avoid using oprofile, thus avoiding OS jitter from
                wq_sync_buffer().
-       c.      Limit your CPU frequency so that a CPU-frequency
+       b.      Limit your CPU frequency so that a CPU-frequency
                governor is not required, possibly enlisting the aid of
                special heatsinks or other cooling technologies.  If done
                correctly, and if your CPU architecture permits, you should
@@ -259,7 +255,7 @@ To reduce its OS jitter, do any of the following:
 
                WARNING:  Please check your CPU specifications to
                make sure that this is safe on your particular system.
-       d.      As of v3.18, Christoph Lameter's on-demand vmstat workers
+       c.      As of v3.18, Christoph Lameter's on-demand vmstat workers
                commit prevents OS jitter due to vmstat_update() on
                CONFIG_SMP=y systems.  Before v3.18, it is not possible
                to entirely get rid of the OS jitter, but you can
@@ -274,7 +270,7 @@ To reduce its OS jitter, do any of the following:
                (based on an earlier one from Gilad Ben-Yossef) that
                reduces or even eliminates vmstat overhead for some
                workloads at https://lore.kernel.org/r/00000140e9dfd6bd-40db3d4f-c1be-434f-8132-7820f81bb586-000000@email.amazonses.com.
-       e.      If running on high-end powerpc servers, build with
+       d.      If running on high-end powerpc servers, build with
                CONFIG_PPC_RTAS_DAEMON=n.  This prevents the RTAS
                daemon from running on each CPU every second or so.
                (This will require editing Kconfig files and will defeat
@@ -282,12 +278,12 @@ To reduce its OS jitter, do any of the following:
                due to the rtas_event_scan() function.
                WARNING:  Please check your CPU specifications to
                make sure that this is safe on your particular system.
-       f.      If running on Cell Processor, build your kernel with
+       e.      If running on Cell Processor, build your kernel with
                CBE_CPUFREQ_SPU_GOVERNOR=n to avoid OS jitter from
                spu_gov_work().
                WARNING:  Please check your CPU specifications to
                make sure that this is safe on your particular system.
-       g.      If running on PowerMAC, build your kernel with
+       f.      If running on PowerMAC, build your kernel with
                CONFIG_PMAC_RACKMETER=n to disable the CPU-meter,
                avoiding OS jitter from rackmeter_do_timer().
 
index e8c2ce1f9df68df5976b7cc536d3f48c0501ba4b..45a7f4932fe07f295cd452313bfcb64c59809218 100644 (file)
@@ -243,3 +243,10 @@ stable kernels.
 +----------------+-----------------+-----------------+-----------------------------+
 | ASR            | ASR8601         | #8601001        | N/A                         |
 +----------------+-----------------+-----------------+-----------------------------+
++----------------+-----------------+-----------------+-----------------------------+
+| Microsoft      | Azure Cobalt 100| #2139208        | ARM64_ERRATUM_2139208       |
++----------------+-----------------+-----------------+-----------------------------+
+| Microsoft      | Azure Cobalt 100| #2067961        | ARM64_ERRATUM_2067961       |
++----------------+-----------------+-----------------+-----------------------------+
+| Microsoft      | Azure Cobalt 100| #2253138        | ARM64_ERRATUM_2253138       |
++----------------+-----------------+-----------------+-----------------------------+
index e73fdff62c0aa10a0d6de89e6a62f8b2185920a7..c58c72362911cd0a10be8e96eba4cb9940d3b576 100644 (file)
@@ -95,6 +95,9 @@ The kernel provides a function to invoke the buffer clearing:
 
     mds_clear_cpu_buffers()
 
+The CLEAR_CPU_BUFFERS macro can also be used in assembly late in the
+exit-to-user path. Other than EFLAGS.ZF, this macro doesn't clobber any
+registers.
+
 The mitigation is invoked on kernel/userspace, hypervisor/guest and C-state
 (idle) transitions.
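
(Editorial aside: a minimal sketch of the VERW clearing idiom, kernel context
assumed; the in-tree helper is mds_clear_cpu_buffers(), and the sketch below
is illustrative rather than the exact implementation.)

.. code-block:: c

        /* VERW with a valid, writable segment selector as its memory
         * operand flushes the affected CPU buffers as a microcode side
         * effect on mitigated parts. */
        static __always_inline void clear_cpu_buffers_sketch(void)
        {
                static const u16 ds = __KERNEL_DS;

                /* VERW only updates EFLAGS.ZF, hence the lone "cc" clobber. */
                asm volatile("verw %[ds]" : : [ds] "m" (ds) : "cc");
        }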
 
@@ -138,17 +141,30 @@ Mitigation points
 
    When transitioning from kernel to user space the CPU buffers are flushed
    on affected CPUs when the mitigation is not disabled on the kernel
-   command line. The migitation is enabled through the static key
-   mds_user_clear.
-
-   The mitigation is invoked in prepare_exit_to_usermode() which covers
-   all but one of the kernel to user space transitions.  The exception
-   is when we return from a Non Maskable Interrupt (NMI), which is
-   handled directly in do_nmi().
-
-   (The reason that NMI is special is that prepare_exit_to_usermode() can
-    enable IRQs.  In NMI context, NMIs are blocked, and we don't want to
-    enable IRQs with NMIs blocked.)
+   command line. The mitigation is enabled through the feature flag
+   X86_FEATURE_CLEAR_CPU_BUF.
+
+   The mitigation is invoked just before transitioning to userspace after
+   user registers are restored. This is done to minimize the window in
+   which kernel data could be accessed after VERW, e.g. via an NMI that
+   arrives after the buffers have been cleared.
+
+   **Corner case not handled**
+   Interrupts returning to the kernel don't clear CPU buffers, since the
+   exit-to-user path is expected to do that anyway. But there could be
+   a case where an NMI is generated in the kernel after the exit-to-user
+   path has cleared the buffers. This case is not handled, and NMIs
+   returning to the kernel don't clear CPU buffers, because:
+
+   1. It is rare to get an NMI after VERW, but before returning to userspace.
+   2. For an unprivileged user, there is no known way to make that NMI
+      less rare or target it.
+   3. It would take a large number of these precisely-timed NMIs to mount
+      an actual attack.  There's presumably not enough bandwidth.
+   4. The NMI in question occurs after a VERW, i.e. when user state is
+      restored and most interesting data is already scrubbed. What's left
+      is only the data that the NMI touches, and that may or may not be of
+      any interest.
 
 
 2. C-State transition
index 5830b01c56429d38f18e12778ebce543605b3296..da64c9fb7e072378c53423b1f7b575ef124b6834 100644 (file)
@@ -388,6 +388,12 @@ latex_elements = {
         verbatimhintsturnover=false,
     ''',
 
+    #
+    # Some of our authors are fond of deep nesting; tell latex to
+    # cope.
+    #
+    'maxlistdepth': '10',
+
     # For CJK One-half spacing, need to be in front of hyperref
     'extrapackages': r'\usepackage{setspace}',
 
index ab376b316c36d6e3bebd0fe050ff28314d5b60b2..7f3582a67318bebe42dffd5a666a8babdca0fb49 100644 (file)
@@ -245,6 +245,10 @@ Contributing new tests (details)
    TEST_PROGS, TEST_GEN_PROGS mean it is the executable tested by
    default.
 
+   TEST_GEN_MODS_DIR should be used by tests that require modules to be built
+   before the test starts. The variable will contain the name of the directory
+   containing the modules.
+
    TEST_CUSTOM_PROGS should be used by tests that require custom build
    rules and prevent common build rule use.
 
index a9efab50eed83e06a89549aeb1fb4da1b2eba1d9..22955d56b3799bfc3f3b92874b638aa24c1edaa6 100644 (file)
@@ -671,8 +671,23 @@ Testing Static Functions
 ------------------------
 
 If we do not want to expose functions or variables for testing, one option is to
-conditionally ``#include`` the test file at the end of your .c file. For
-example:
+conditionally export the used symbol. For example:
+
+.. code-block:: c
+
+       /* In my_file.c */
+
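+       /*
+        * Note: VISIBLE_IF_KUNIT expands to nothing when CONFIG_KUNIT is
+        * enabled and to 'static' otherwise, so the symbol gains external
+        * linkage only in test builds.
+        */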
+       VISIBLE_IF_KUNIT int do_interesting_thing();
+       EXPORT_SYMBOL_IF_KUNIT(do_interesting_thing);
+
+       /* In my_file.h */
+
+       #if IS_ENABLED(CONFIG_KUNIT)
+               int do_interesting_thing(void);
+       #endif
+
+Alternatively, you could conditionally ``#include`` the test file at the end of
+your .c file. For example:
 
 .. code-block:: c
 
index 2323fd5b7cdae1ebe440275d8f67649354a6f448..129cf698fa8a66fd2be5111074319da545f4cc98 100644 (file)
@@ -28,7 +28,10 @@ $(obj)/%.example.dts: $(src)/%.yaml check_dtschema_version FORCE
 find_all_cmd = find $(srctree)/$(src) \( -name '*.yaml' ! \
                -name 'processed-schema*' \)
 
-find_cmd = $(find_all_cmd) | sed 's|^$(srctree)/$(src)/||' | grep -F -e "$(subst :," -e ",$(DT_SCHEMA_FILES))" | sed 's|^|$(srctree)/$(src)/|'
+find_cmd = $(find_all_cmd) | \
+               sed 's|^$(srctree)/||' | \
+               grep -F -e "$(subst :," -e ",$(DT_SCHEMA_FILES))" | \
+               sed 's|^|$(srctree)/|'
 CHK_DT_DOCS := $(shell $(find_cmd))
 
 quiet_cmd_yamllint = LINT    $(src)
index b29ce598f9aaea327bcd177dc6bf143ee8693ebf..9952e0ef77674c11d115dab50a904841410e148a 100644 (file)
@@ -7,7 +7,8 @@ $schema: http://devicetree.org/meta-schemas/core.yaml#
 title: Ceva AHCI SATA Controller
 
 maintainers:
-  - Piyush Mehta <piyush.mehta@amd.com>
+  - Mubin Sayyed <mubin.sayyed@amd.com>
+  - Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
 
 description: |
   The Ceva SATA controller mostly conforms to the AHCI interface with some
index 3eebc03a309be24fca83f928ffeab18fed09b13b..ca7fdada3ff2487c3c678bc3aa8b40381d04d12e 100644 (file)
@@ -85,8 +85,8 @@ allOf:
 
         clock-names:
           items:
-            - const: dout_cmu_misc_bus
-            - const: dout_cmu_misc_sss
+            - const: bus
+            - const: sss
 
 additionalProperties: false
 
index 21d995f29a1e3068be328506cf01d8f0f5d3d383..b8e9cf6ce4e61145bb6a30d90396b982449b2f08 100644 (file)
@@ -29,19 +29,22 @@ properties:
 
   audio-ports:
     description:
-      Array of 8-bit values, 2 values per DAI (Documentation/sound/soc/dai.rst).
+      Array of 2 values per DAI (Documentation/sound/soc/dai.rst).
       The implementation allows one or two DAIs.
       If two DAIs are defined, they must be of different type.
     $ref: /schemas/types.yaml#/definitions/uint32-matrix
+    minItems: 1
+    maxItems: 2
     items:
-      minItems: 1
       items:
         - description: |
             The first value defines the DAI type: TDA998x_SPDIF or TDA998x_I2S
             (see include/dt-bindings/display/tda998x.h).
+          enum: [ 1, 2 ]
         - description:
             The second value defines the tda998x AP_ENA reg content when the
             DAI in question is used.
+          maximum: 0xff
 
   '#sound-dai-cells':
     enum: [ 0, 1 ]
index 25d53fde92e1104490e3f8e604184b9449150be3..597c9cc6a312acb66b0355f84f9dd8977dbb2197 100644 (file)
@@ -85,7 +85,7 @@ allOf:
         clocks:
           minItems: 6
           maxItems: 6
-        regs:
+        reg:
           minItems: 2
           maxItems: 2
 
@@ -99,7 +99,7 @@ allOf:
         clocks:
           minItems: 4
           maxItems: 4
-        regs:
+        reg:
           minItems: 2
           maxItems: 2
 
@@ -116,7 +116,7 @@ allOf:
         clocks:
           minItems: 3
           maxItems: 3
-        regs:
+        reg:
           minItems: 1
           maxItems: 1
 
index b1fd632718d49659483fd7e6773377adeb66d938..bb93baa888794b83d1613cecca79a383b528914a 100644 (file)
@@ -12,7 +12,8 @@ description:
   PS_MODE). Every pin can be configured as input/output.
 
 maintainers:
-  - Piyush Mehta <piyush.mehta@amd.com>
+  - Mubin Sayyed <mubin.sayyed@amd.com>
+  - Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
 
 properties:
   compatible:
index 6d5569e77b7a1239219c13ef2a163849ce5bfd86..6a11c1d11fb5f9a9ccd343c2cb461bd7f3411121 100644 (file)
@@ -17,7 +17,7 @@ properties:
   compatible:
     items:
       - enum:
-          - ti,k3-j721s2-wave521c
+          - ti,j721s2-wave521c
       - const: cnm,wave521c
 
   reg:
@@ -53,7 +53,7 @@ additionalProperties: false
 examples:
   - |
     vpu: video-codec@12345678 {
-        compatible = "ti,k3-j721s2-wave521c", "cnm,wave521c";
+        compatible = "ti,j721s2-wave521c", "cnm,wave521c";
         reg = <0x12345678 0x1000>;
         clocks = <&clks 42>;
         interrupts = <42>;
index 5ea8b73663a50c3f55999fb8cc911af491d46086..16ff892f7bbd0aa8f965d602e5ccbd3b18ec9253 100644 (file)
@@ -78,8 +78,8 @@ examples:
     pcie@0 {
         #address-cells = <3>;
         #size-cells = <2>;
-        ranges = <0x0 0x0 0x0 0x0 0x0 0x0>;
-        reg = <0x0 0x0 0x0 0x0 0x0 0x0>;
+        ranges = <0x02000000 0x0 0x100000 0x10000000 0x0 0x0>;
+        reg = <0x0 0x1000>;
         device_type = "pci";
 
         switch@0,0 {
index 475aff7714d6419a9cb7266c65bffffce733b29d..ea35d19be829a37a657f6a3fb45153981ea16fb9 100644 (file)
@@ -65,9 +65,11 @@ properties:
 
   rx-internal-delay-ps:
     enum: [0, 1800]
+    default: 0
 
   tx-internal-delay-ps:
     enum: [0, 2000]
+    default: 0
 
   '#address-cells':
     const: 1
index 49db668014297040f85b628137769951663992ed..1f1b42dde94d5086020f0a89d183eafa1ea17589 100644 (file)
@@ -7,7 +7,8 @@ $schema: http://devicetree.org/meta-schemas/core.yaml#
 title: Zynq UltraScale+ MPSoC and Versal reset
 
 maintainers:
-  - Piyush Mehta <piyush.mehta@amd.com>
+  - Mubin Sayyed <mubin.sayyed@amd.com>
+  - Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
 
 description: |
   The Zynq UltraScale+ MPSoC and Versal has several different resets.
index 8108c564dd78a84a1d869a60b975dcb51e6480ff..aa32dc950e72ccdaf7fb1ac7f759d57a855fc9b6 100644 (file)
@@ -22,6 +22,7 @@ properties:
       - const: allwinner,sun6i-a31-spdif
       - const: allwinner,sun8i-h3-spdif
       - const: allwinner,sun50i-h6-spdif
+      - const: allwinner,sun50i-h616-spdif
       - items:
           - const: allwinner,sun8i-a83t-spdif
           - const: allwinner,sun8i-h3-spdif
@@ -62,6 +63,8 @@ allOf:
             enum:
               - allwinner,sun6i-a31-spdif
               - allwinner,sun8i-h3-spdif
+              - allwinner,sun50i-h6-spdif
+              - allwinner,sun50i-h616-spdif
 
     then:
       required:
@@ -73,7 +76,7 @@ allOf:
           contains:
             enum:
               - allwinner,sun8i-h3-spdif
-              - allwinner,sun50i-h6-spdif
+              - allwinner,sun50i-h616-spdif
 
     then:
       properties:
index ec4b6e547ca6efad4b77697c567e30da74707261..cdcd7c6f21eb241663c44fd8d066dc1700a8994b 100644 (file)
@@ -7,7 +7,6 @@ $schema: http://devicetree.org/meta-schemas/core.yaml#
 title: Google SC7280-Herobrine ASoC sound card driver
 
 maintainers:
-  - Srinivasa Rao Mandadapu <srivasam@codeaurora.org>
   - Judy Hsiao <judyhsiao@chromium.org>
 
 description:
index c29d7942915cccbaf58679f702c642632f5495b8..241d20f3aad08a845c57e3dead8fba25c4856e6a 100644 (file)
@@ -64,7 +64,7 @@ examples:
     #include <dt-bindings/clock/tegra30-car.h>
     #include <dt-bindings/soc/tegra-pmc.h>
     sound {
-        compatible = "lge,tegra-audio-max98089-p895",
+        compatible = "lg,tegra-audio-max98089-p895",
                      "nvidia,tegra-audio-max98089";
         nvidia,model = "LG Optimus Vu MAX98089";
 
index 90390624a8be5e7abc0b18374f19705db999a97d..3c1241b2a43f99d361e0af89aed61f4318c9b914 100644 (file)
@@ -42,7 +42,7 @@ properties:
 
   resets:
     description: Reset controller to reset the TPM
-    $ref: /schemas/types.yaml#/definitions/phandle
+    maxItems: 1
 
   reset-gpios:
     description: Output GPIO pin to reset the TPM
index 88cc1e3a0c887c367c7ed83ff2a0835398a20b93..b2b509b3944d85714316c8f91b042054373416a4 100644 (file)
@@ -55,9 +55,12 @@ properties:
 
   samsung,sysreg:
     $ref: /schemas/types.yaml#/definitions/phandle-array
-    description: Should be phandle/offset pair. The phandle to the syscon node
-                 which indicates the FSYSx sysreg interface and the offset of
-                 the control register for UFS io coherency setting.
+    items:
+      - items:
+          - description: phandle to FSYSx sysreg node
+          - description: offset of the control register for UFS io coherency setting
+    description:
+      Phandle and offset to the FSYSx sysreg for UFS io coherency setting.
 
   dma-coherent: true
 
index bb373eb025a5f92b085d62b21354d94eaa002e65..00f87a558c7dd3b8af7392f87448ac8a00fbcd95 100644 (file)
@@ -7,7 +7,8 @@ $schema: http://devicetree.org/meta-schemas/core.yaml#
 title: Xilinx SuperSpeed DWC3 USB SoC controller
 
 maintainers:
-  - Piyush Mehta <piyush.mehta@amd.com>
+  - Mubin Sayyed <mubin.sayyed@amd.com>
+  - Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
 
 properties:
   compatible:
index 6d4cfd943f5847ff43cbccd13e5f210a95448c1c..445183d9d6db1adaa1ab9d04cb4271eadbe22ffc 100644 (file)
@@ -16,8 +16,9 @@ description:
   USB 2.0 traffic.
 
 maintainers:
-  - Piyush Mehta <piyush.mehta@amd.com>
   - Michal Simek <michal.simek@amd.com>
+  - Mubin Sayyed <mubin.sayyed@amd.com>
+  - Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
 
 properties:
   compatible:
index 868dffe314bcba9123a4e99b9966de738b0ea8f3..a7f75fe366652bb2dcec6bf6e87c5879d31f1fce 100644 (file)
@@ -7,7 +7,8 @@ $schema: http://devicetree.org/meta-schemas/core.yaml#
 title: Xilinx udc controller
 
 maintainers:
-  - Piyush Mehta <piyush.mehta@amd.com>
+  - Mubin Sayyed <mubin.sayyed@amd.com>
+  - Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
 
 properties:
   compatible:
index e3d593841aa7ddd96237283b27470a9fba97ec89..ea8d16600e16a8530b7e633368bb53b75e878c15 100644 (file)
@@ -545,7 +545,7 @@ In such scenario, dpll device input signal shall be also configurable
 to drive the dpll with a signal recovered from the PHY netdevice.
 This is done by exposing a pin to the netdevice, i.e. attaching the pin to
 the netdevice itself with
-``netdev_dpll_pin_set(struct net_device *dev, struct dpll_pin *dpll_pin)``.
+``dpll_netdev_pin_set(struct net_device *dev, struct dpll_pin *dpll_pin)``.
 The exposed pin id handle ``DPLL_A_PIN_ID`` is then identifiable by the
 user, as it is attached to the rtnetlink response to the ``RTM_NEWLINK``
 command in the nested attribute ``IFLA_DPLL_PIN``.
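
(Editorial aside: a hedged sketch of a driver using this call; the driver
structure and function names are hypothetical, and only dpll_netdev_pin_set()
is taken from the text above.)

.. code-block:: c

        struct my_phy_priv {
                struct dpll_pin *pin;   /* registered via the dpll pin API */
        };

        /* Expose the recovered-clock pin through the netdevice so that
         * userspace can discover it via IFLA_DPLL_PIN in RTM_NEWLINK. */
        static void my_phy_expose_dpll_pin(struct my_phy_priv *priv,
                                           struct net_device *dev)
        {
                dpll_netdev_pin_set(dev, priv->pin);
        }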
index 9e38e4c221ca5dc1be598bc1ab45bdd890f1419e..eb770f891b275f3e8b905f1bb794494a61dde54d 100644 (file)
@@ -116,7 +116,7 @@ before and after the reference count increment. This pattern can be seen
 in get_file_rcu() and __files_get_rcu().
 
 In addition, it isn't possible to access or check fields in struct file
-without first aqcuiring a reference on it under rcu lookup. Not doing
+without first acquiring a reference on it under rcu lookup. Not doing
 that was always very dodgy and it was only usable for non-pointer data
 in struct file. With SLAB_TYPESAFE_BY_RCU it is necessary that callers
 either first acquire a reference or they must hold the files_lock of the
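
(Editorial aside: a hedged sketch of the acquire-under-rcu pattern this file
describes; the in-tree version is get_file_rcu(), and the names below are
invented.)

.. code-block:: c

        static struct file *file_ref_sketch(struct file __rcu **slot)
        {
                struct file *file;

                rcu_read_lock();
                file = rcu_dereference(*slot);
                /*
                 * With SLAB_TYPESAFE_BY_RCU the object may be recycled
                 * rather than freed, so only a successful non-zero
                 * increment pins it, and the caller must then re-check
                 * that @slot still points at the same file.
                 */
                if (file && !atomic_long_inc_not_zero(&file->f_count))
                        file = NULL;
                rcu_read_unlock();
                return file;
        }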
index e18bc5ae3b35f89ecc61fc6266d5151c53dbd34d..0ea1e44fa02823ffd51f4739a3a9aab635a35bbe 100644 (file)
@@ -98,7 +98,6 @@ Documentation for filesystem implementations.
    isofs
    nilfs2
    nfs/index
-   ntfs
    ntfs3
    ocfs2
    ocfs2-online-filecheck
index d5bf4b6b7509b01c9a2d5225a6bb5b2e1ef327b2..e664061ed55dc1bdc6d7d16c086f3050c32909d6 100644 (file)
@@ -29,7 +29,7 @@ prototypes::
        char *(*d_dname)(struct dentry *dentry, char *buffer, int buflen);
        struct vfsmount *(*d_automount)(struct path *path);
        int (*d_manage)(const struct path *, bool);
-       struct dentry *(*d_real)(struct dentry *, const struct inode *);
+       struct dentry *(*d_real)(struct dentry *, enum d_real_type type);
 
 locking rules:
 
diff --git a/Documentation/filesystems/ntfs.rst b/Documentation/filesystems/ntfs.rst
deleted file mode 100644 (file)
index 5bb093a..0000000
+++ /dev/null
@@ -1,466 +0,0 @@
-.. SPDX-License-Identifier: GPL-2.0
-
-================================
-The Linux NTFS filesystem driver
-================================
-
-
-.. Table of contents
-
-   - Overview
-   - Web site
-   - Features
-   - Supported mount options
-   - Known bugs and (mis-)features
-   - Using NTFS volume and stripe sets
-     - The Device-Mapper driver
-     - The Software RAID / MD driver
-     - Limitations when using the MD driver
-
-
-Overview
-========
-
-Linux-NTFS comes with a number of user-space programs known as ntfsprogs.
-These include mkntfs, a full-featured ntfs filesystem format utility,
-ntfsundelete used for recovering files that were unintentionally deleted
-from an NTFS volume and ntfsresize which is used to resize an NTFS partition.
-See the web site for more information.
-
-To mount an NTFS 1.2/3.x (Windows NT4/2000/XP/2003) volume, use the file
-system type 'ntfs'.  The driver currently supports read-only mode (with no
-fault-tolerance, encryption or journalling) and very limited, but safe, write
-support.
-
-For fault tolerance and raid support (i.e. volume and stripe sets), you can
-use the kernel's Software RAID / MD driver.  See section "Using Software RAID
-with NTFS" for details.
-
-
-Web site
-========
-
-There is plenty of additional information on the linux-ntfs web site
-at http://www.linux-ntfs.org/
-
-The web site has a lot of additional information, such as a comprehensive
-FAQ, documentation on the NTFS on-disk format, information on the Linux-NTFS
-userspace utilities, etc.
-
-
-Features
-========
-
-- This is a complete rewrite of the NTFS driver that used to be in the 2.4 and
-  earlier kernels.  This new driver implements NTFS read support and is
-  functionally equivalent to the old ntfs driver and it also implements limited
-  write support.  The biggest limitation at present is that files/directories
-  cannot be created or deleted.  See below for the list of write features that
-  are so far supported.  Another limitation is that writing to compressed files
-  is not implemented at all.  Also, neither read nor write access to encrypted
-  files is so far implemented.
-- The new driver has full support for sparse files on NTFS 3.x volumes which
-  the old driver isn't happy with.
-- The new driver supports execution of binaries due to mmap() now being
-  supported.
-- The new driver supports loopback mounting of files on NTFS which is used by
-  some Linux distributions to enable the user to run Linux from an NTFS
-  partition by creating a large file while in Windows and then loopback
-  mounting the file while in Linux and creating a Linux filesystem on it that
-  is used to install Linux on it.
-- A comparison of the two drivers using::
-
-       time find . -type f -exec md5sum "{}" \;
-
-  run three times in sequence with each driver (after a reboot) on a 1.4GiB
-  NTFS partition, showed the new driver to be 20% faster in total time elapsed
-  (from 9:43 minutes on average down to 7:53).  The time spent in user space
-  was unchanged but the time spent in the kernel was decreased by a factor of
-  2.5 (from 85 CPU seconds down to 33).
-- The driver does not support short file names in general.  For backwards
-  compatibility, we implement access to files using their short file names if
-  they exist.  The driver will not create short file names however, and a
-  rename will discard any existing short file name.
-- The new driver supports exporting of mounted NTFS volumes via NFS.
-- The new driver supports async io (aio).
-- The new driver supports fsync(2), fdatasync(2), and msync(2).
-- The new driver supports readv(2) and writev(2).
-- The new driver supports access time updates (including mtime and ctime).
-- The new driver supports truncate(2) and open(2) with O_TRUNC.  But at present
-  only very limited support for highly fragmented files, i.e. ones which have
-  their data attribute split across multiple extents, is included.  Another
-  limitation is that at present truncate(2) will never create sparse files,
-  since to mark a file sparse we need to modify the directory entry for the
-  file and we do not implement directory modifications yet.
-- The new driver supports write(2) which can both overwrite existing data and
-  extend the file size so that you can write beyond the existing data.  Also,
-  writing into sparse regions is supported and the holes are filled in with
-  clusters.  But at present only limited support for highly fragmented files,
-  i.e. ones which have their data attribute split across multiple extents, is
-  included.  Another limitation is that write(2) will never create sparse
-  files, since to mark a file sparse we need to modify the directory entry for
-  the file and we do not implement directory modifications yet.
-
-Supported mount options
-=======================
-
-In addition to the generic mount options described by the manual page for the
-mount command (man 8 mount, also see man 5 fstab), the NTFS driver supports the
-following mount options:
-
-======================= =======================================================
-iocharset=name         Deprecated option.  Still supported but please use
-                       nls=name in the future.  See description for nls=name.
-
-nls=name               Character set to use when returning file names.
-                       Unlike VFAT, NTFS suppresses names that contain
-                       unconvertible characters.  Note that most character
-                       sets contain insufficient characters to represent all
-                       possible Unicode characters that can exist on NTFS.
-                       To be sure you are not missing any files, you are
-                       advised to use nls=utf8 which is capable of
-                       representing all Unicode characters.
-
-utf8=<bool>            Option no longer supported.  Currently mapped to
-                       nls=utf8 but please use nls=utf8 in the future and
-                       make sure utf8 is compiled either as module or into
-                       the kernel.  See description for nls=name.
-
-uid=
-gid=
-umask=                 Provide default owner, group, and access mode mask.
-                       These options work as documented in mount(8).  By
-                       default, the files/directories are owned by root, who
-                       has read and write permissions, as well as
-                       browse permission for directories.  No one else has any
-                       access permissions.  I.e. the mode on all files is by
-                       default rw------- and for directories rwx------, a
-                       consequence of the default fmask=0177 and dmask=0077.
-                       Using a umask of zero will grant all permissions to
-                       everyone, i.e. all files and directories will have mode
-                       rwxrwxrwx.
-
-fmask=
-dmask=                 Instead of specifying umask which applies both to
-                       files and directories, fmask applies only to files and
-                       dmask only to directories.
-
-sloppy=<BOOL>          If sloppy is specified, ignore unknown mount options.
-                       Otherwise the default behaviour is to abort mount if
-                       any unknown options are found.
-
-show_sys_files=<BOOL>  If show_sys_files is specified, show the system files
-                       in directory listings.  Otherwise the default behaviour
-                       is to hide the system files.
-                       Note that even when show_sys_files is specified, "$MFT"
-                       will not be visible due to bugs/mis-features in glibc.
-                       Further, note that irrespective of show_sys_files, all
-                       files are accessible by name, i.e. you can always do
-                       "ls -l \$UpCase" for example to specifically show the
-                       system file containing the Unicode upcase table.
-
-case_sensitive=<BOOL>  If case_sensitive is specified, treat all file names as
-                       case sensitive and create file names in the POSIX
-                       namespace.  Otherwise the default behaviour is to treat
-                       file names as case insensitive and to create file names
-                       in the WIN32/LONG name space.  Note, the Linux NTFS
-                       driver will never create short file names and will
-                       remove them on rename/delete of the corresponding long
-                       file name.
-                       Note that files remain accessible via their short file
-                       name, if it exists.  If case_sensitive, you will need
-                       to provide the correct case of the short file name.
-
-disable_sparse=<BOOL>  If disable_sparse is specified, creation of sparse
-                       regions, i.e. holes, inside files is disabled for the
-                       volume (for the duration of this mount only).  By
-                       default, creation of sparse regions is enabled, which
-                       is consistent with the behaviour of traditional Unix
-                       filesystems.
-
-errors=opt             What to do when critical filesystem errors are found.
-                       Following values can be used for "opt":
-
-                         ========  =========================================
-                         continue  DEFAULT, try to clean-up as much as
-                                   possible, e.g. marking a corrupt inode as
-                                   bad so it is no longer accessed, and then
-                                   continue.
-                         recover   At present only supported is recovery of
-                                   the boot sector from the backup copy.
-                                   If read-only mount, the recovery is done
-                                   in memory only and not written to disk.
-                         ========  =========================================
-
-                       Note that the options are additive, i.e. specifying::
-
-                          errors=continue,errors=recover
-
-                       means the driver will attempt to recover and if that
-                       fails it will clean-up as much as possible and
-                       continue.
-
-mft_zone_multiplier=   Set the MFT zone multiplier for the volume (this
-                       setting is not persistent across mounts and can be
-                       changed from mount to mount but cannot be changed on
-                       remount).  Values of 1 to 4 are allowed, 1 being the
-                       default.  The MFT zone multiplier determines how much
-                       space is reserved for the MFT on the volume.  If all
-                       other space is used up, then the MFT zone will be
-                       shrunk dynamically, so this has no impact on the
-                       amount of free space.  However, it can have an impact
-                       on performance by affecting fragmentation of the MFT.
-                       In general use the default.  If you have a lot of small
-                       files then use a higher value.  The values have the
-                       following meaning:
-
-                             =====         =================================
-                             Value          MFT zone size (% of volume size)
-                             =====         =================================
-                               1               12.5%
-                               2               25%
-                               3               37.5%
-                               4               50%
-                             =====         =================================
-
-                       Note this option is irrelevant for read-only mounts.
-======================= =======================================================
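-
-As an illustration combining several of these options (the device, mount
-point, and ids are only examples), a mount command could look like::
-
-    $ mount -t ntfs -o nls=utf8,uid=1000,gid=1000,umask=0022 /dev/sda1 /mnt/win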
-
-
-Known bugs and (mis-)features
-=============================
-
-- The link count on each directory inode entry is set to 1, due to Linux not
-  supporting directory hard links.  This may well confuse some user space
-  applications, since the directory names will have the same inode numbers.
-  This also speeds up ntfs_read_inode() immensely.  And we haven't found any
-  problems with this approach so far.  If you find a problem with this, please
-  let us know.
-
-
-Please send bug reports/comments/feedback/abuse to the Linux-NTFS development
-list at sourceforge: linux-ntfs-dev@lists.sourceforge.net
-
-
-Using NTFS volume and stripe sets
-=================================
-
-For support of volume and stripe sets, you can either use the kernel's
-Device-Mapper driver or the kernel's Software RAID / MD driver.  The former is
-the recommended one to use for linear raid.  But the latter is required for
-raid level 5.  For striping and mirroring, either driver should work fine.
-
-
-The Device-Mapper driver
-------------------------
-
-You will need to create a table of the components of the volume/stripe set and
-how they fit together and load this into the kernel using the dmsetup utility
-(see man 8 dmsetup).
-
-Linear volume sets, i.e. linear raid, have been tested and work fine.  Even
-though untested, there is no reason why stripe sets, i.e. raid level 0, and
-mirrors, i.e. raid level 1, should not work, too.  Stripes with parity, i.e.
-raid level 5, unfortunately cannot work yet because the current version of the
-Device-Mapper driver does not support raid level 5.  You may be able to use the
-Software RAID / MD driver for raid level 5; see the next section for details.
-
-To create the table describing your volume you will need to know each of its
-components and their sizes in sectors, i.e. multiples of 512-byte blocks.
-
-For NT4 fault tolerant volumes you can obtain the sizes using fdisk.  So for
-example if one of your partitions is /dev/hda2 you would do::
-
-    $ fdisk -ul /dev/hda
-
-    Disk /dev/hda: 81.9 GB, 81964302336 bytes
-    255 heads, 63 sectors/track, 9964 cylinders, total 160086528 sectors
-    Units = sectors of 1 * 512 = 512 bytes
-
-       Device Boot      Start         End      Blocks   Id  System
-       /dev/hda1   *          63     4209029     2104483+  83  Linux
-       /dev/hda2         4209030    37768814    16779892+  86  NTFS
-       /dev/hda3        37768815    46170809     4200997+  83  Linux
-
-And you would know that /dev/hda2 has a size of 37768814 - 4209030 + 1 =
-33559785 sectors.
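-
-As a cross-check (assuming the partition device node exists), blockdev(8)
-reports the size of a device directly in 512-byte sectors::
-
-    $ blockdev --getsz /dev/hda2
-    33559785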
-
-For Win2k and later dynamic disks, you can for example use the ldminfo utility
-which is part of the Linux LDM tools (the latest version at the time of
-writing is linux-ldm-0.0.8.tar.bz2).  You can download it from:
-
-       http://www.linux-ntfs.org/
-
-Simply extract the downloaded archive (tar xvjf linux-ldm-0.0.8.tar.bz2), go
-into it (cd linux-ldm-0.0.8) and change to the test directory (cd test).  You
-will find the precompiled (i386) ldminfo utility there.  NOTE: You will not be
-able to compile this yourself easily so use the binary version!
-
-Then you would use ldminfo in dump mode to obtain the necessary information::
-
-    $ ./ldminfo --dump /dev/hda
-
-This would dump the LDM database found on /dev/hda which describes all of your
-dynamic disks and all the volumes on them.  At the bottom you will see the
-VOLUME DEFINITIONS section which is all you really need.  You may need to look
-further above to determine which of the disks in the volume definitions is
-which device in Linux.  Hint: Run ldminfo on each of your dynamic disks and
-look at the Disk Id close to the top of the output for each (the PRIVATE HEADER
-section).  You can then find these Disk Ids in the VBLK DATABASE section in the
-<Disk> components where you will get the LDM Name for the disk that is found in
-the VOLUME DEFINITIONS section.
-
-Note you will also need to enable the LDM driver in the Linux kernel.  If your
-distribution did not enable it, you will need to recompile the kernel with it
-enabled.  This will create the LDM partitions on each device at boot time.  You
-would then use those devices (for /dev/hda they would be /dev/hda1, 2, 3, etc)
-in the Device-Mapper table.
-
-You can also bypass using the LDM driver by using the main device (e.g.
-/dev/hda) and then using the offsets of the LDM partitions into this device as
-the "Start sector of device" when creating the table.  Once again ldminfo would
-give you the correct information to do this.
-
-Assuming you know all your devices and their sizes, things are easy.
-
-For a linear raid the table would look like this (note all values are in
-512-byte sectors)::
-
-    # Offset into   Size of this   Raid type   Device      Start sector
-    # volume        device                                 of device
-    0               1028161        linear      /dev/hda1   0
-    1028161         3903762        linear      /dev/hdb2   0
-    4931923         2103211        linear      /dev/hdc1   0
-
-For a striped volume, i.e. raid level 0, you will need to know the chunk size
-you used when creating the volume.  Windows uses 64kiB as the default, so it
-will probably be this unless you changed the defaults when creating the array.
-(64kiB corresponds to 128 sectors of 512 bytes, which is the chunk size value
-used in the example below.)
-
-For a raid level 0 the table would look like this (note all values are in
-512-byte sectors)::
-
-    # Offset  Size     Raid     Number of  Chunk  1st        Start in  2nd        Start in
-    # into    of the   type     stripes    size   Device     device    Device     device
-    # volume  volume
-    0         2056320  striped  2          128    /dev/hda1  0         /dev/hdb1  0
-
-If there are more than two devices, just add each of them to the end of the
-line.
-
-Finally, for a mirrored volume, i.e. raid level 1, the table would look like
-this (note all values are in 512-byte sectors)::
-
-    # Ofs    Size     Raid    Log   Number of   Region  Should  Number of  Source     Start in  Target     Start in
-    # in     of the   type    type  log params  size    sync?   mirrors    Device     Device    Device     Device
-    # vol    volume
-    0        2056320  mirror  core  2           16      nosync  2          /dev/hda1  0         /dev/hdb1  0
-
-If you are mirroring to multiple devices you can specify further targets at the
-end of the line.
-
-Note the "Should sync?" parameter "nosync" means that the two mirrors are
-already in sync which will be the case on a clean shutdown of Windows.  If the
-mirrors are not clean, you can specify the "sync" option instead of "nosync"
-and the Device-Mapper driver will then copy the entirety of the "Source Device"
-to the "Target Device" or if you specified multiple target devices to all of
-them.
-
-Once you have your table, save it in a file somewhere (e.g. /etc/ntfsvolume1),
-and hand it over to dmsetup to work with, like so::
-
-    $ dmsetup create myvolume1 /etc/ntfsvolume1
-
-You can obviously replace "myvolume1" with whatever name you like.
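-
-To verify that the mapping was created as intended, you can dump the loaded
-table back out of the kernel::
-
-    $ dmsetup table myvolume1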
-
-If it all worked, you will now have the device /dev/mapper/myvolume1
-which you can then just use as an argument to the mount command as usual to
-mount the ntfs volume.  For example::
-
-    $ mount -t ntfs -o ro /dev/mapper/myvolume1 /mnt/myvol1
-
-(You need to create the directory /mnt/myvol1 first and of course you can use
-anything you like instead of /mnt/myvol1 as long as it is an existing
-directory.)
-
-It is advisable to do the mount read-only to see if the volume has been setup
-correctly to avoid the possibility of causing damage to the data on the ntfs
-volume.
-
-
-The Software RAID / MD driver
------------------------------
-
-An alternative to using the Device-Mapper driver is to use the kernel's
-Software RAID / MD driver, for which you need to set up your /etc/raidtab
-appropriately (see man 5 raidtab).
-
-Linear volume sets, i.e. linear raid, as well as stripe sets, i.e. raid level
-0, have been tested and work fine (though see section "Limitations when using
-the MD driver with NTFS volumes" especially if you want to use linear raid).
-Even though untested, there is no reason why mirrors, i.e. raid level 1, and
-stripes with parity, i.e. raid level 5, should not work, too.
-
-You have to use the "persistent-superblock 0" option for each raid-disk in the
-NTFS volume/stripe you are configuring in /etc/raidtab as the persistent
-superblock used by the MD driver would damage the NTFS volume.
-
-Windows by default uses a stripe chunk size of 64k, so you probably want the
-"chunk-size 64k" option for each raid-disk, too.
-
-For example, if you have a stripe set consisting of two partitions /dev/hda5
-and /dev/hdb1 your /etc/raidtab would look like this::
-
-    raiddev /dev/md0
-           raid-level  0
-           nr-raid-disks       2
-           nr-spare-disks      0
-           persistent-superblock       0
-           chunk-size  64k
-           device              /dev/hda5
-           raid-disk   0
-           device              /dev/hdb1
-           raid-disk   1
-
-For linear raid, just change the raid-level above to "raid-level linear", for
-mirrors, change it to "raid-level 1", and for stripe sets with parity, change
-it to "raid-level 5".
-
-Note for stripe sets with parity you will also need to tell the MD driver
-which parity algorithm to use by specifying the option "parity-algorithm
-which", where you need to replace "which" with the name of the algorithm to
-use (see man 5 raidtab for available algorithms) and you will have to try the
-different available algorithms until you find one that works.  Make sure you
-are working read-only when playing with this as you may damage your data
-otherwise.  If you find which algorithm works please let us know (email the
-linux-ntfs developers list linux-ntfs-dev@lists.sourceforge.net or drop in on
-IRC in channel #ntfs on the irc.freenode.net network) so we can update this
-documentation.
-
-Once the raidtab is setup, run for example raid0run -a to start all devices or
-raid0run /dev/md0 to start a particular md device, in this case /dev/md0.
-
-Then just use the mount command as usual to mount the ntfs volume using for
-example::
-
-    mount -t ntfs -o ro /dev/md0 /mnt/myntfsvolume
-
-It is advisable to do the mount read-only to see if the md volume has been
-setup correctly to avoid the possibility of causing damage to the data on the
-ntfs volume.
-
-
-Limitations when using the Software RAID / MD driver
------------------------------------------------------
-
-Using the md driver will not work properly if any of your NTFS partitions have
-an odd number of sectors.  This is especially important for linear raid, as all
-data after the first partition with an odd number of sectors will be offset by
-one or more sectors.  If you mount such a partition with write support, you
-will cause massive damage to the data on the volume, which will only become
-apparent when you try to use the volume again under Windows.
-
-So when using linear raid, make sure that all your partitions have an even
-number of sectors BEFORE attempting to use it.  You have been warned!
-
-Better still, simply use the Device-Mapper driver for linear raid; then you do
-not have this problem with odd numbers of sectors.
index 1c244866041a3cb985568ce5f19e67d947eed5e6..16551440144183c58517fe27e750233619ebf89d 100644 (file)
@@ -145,7 +145,9 @@ filesystem, an overlay filesystem needs to record in the upper filesystem
 that files have been removed.  This is done using whiteouts and opaque
 directories (non-directories are always opaque).
 
-A whiteout is created as a character device with 0/0 device number.
+A whiteout is created as a character device with 0/0 device number or
+as a zero-size regular file with the xattr "trusted.overlay.whiteout".
+
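+For example (the paths are hypothetical, and the trusted.* xattr namespace
+requires root), a tool preparing a layer by hand could create either form of
+whiteout like this::
+
+    # classic whiteout: a 0/0 character device with the entry's name
+    mknod upper/removed-file c 0 0
+
+    # alternative whiteout: a zero-size regular file marked with the
+    # xattr (the text above specifies only that the xattr is set)
+    touch upper/removed-file2
+    setfattr -n trusted.overlay.whiteout -v y upper/removed-file2
+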
 When a whiteout is found in the upper level of a merged directory, any
 matching name in the lower level is ignored, and the whiteout itself
 is also hidden.
@@ -154,6 +156,13 @@ A directory is made opaque by setting the xattr "trusted.overlay.opaque"
 to "y".  Where the upper filesystem contains an opaque directory, any
 directory in the lower filesystem with the same name is ignored.
 
+An opaque directory should not contain any whiteouts, because they do not
+serve any purpose.  A merge directory containing regular files with the xattr
+"trusted.overlay.whiteout" should be additionally marked by setting the xattr
+"trusted.overlay.opaque" to "x" on the merge directory itself.
+This is needed to avoid the overhead of checking the "trusted.overlay.whiteout"
+xattr on all entries during readdir in the common case.
+
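+For example (the directory names are hypothetical)::
+
+    # make an upper directory opaque, hiding lower entries of the same name
+    setfattr -n trusted.overlay.opaque -v y upper/dir
+
+    # mark a merge directory that contains xattr-based whiteout files
+    setfattr -n trusted.overlay.opaque -v x upper/merged-dir
+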
 readdir
 -------
 
@@ -534,8 +543,9 @@ A lower dir with a regular whiteout will always be handled by the overlayfs
 mount, so to support storing an effective whiteout file in an overlayfs mount an
 alternative form of whiteout is supported. This form is a regular, zero-size
 file with the "overlay.whiteout" xattr set, inside a directory with the
-"overlay.whiteouts" xattr set. Such whiteouts are never created by overlayfs,
-but can be used by userspace tools (like containers) that generate lower layers.
+"overlay.opaque" xattr set to "x" (see `whiteouts and opaque directories`_).
+These alternative whiteouts are never created by overlayfs, but can be used by
+userspace tools (like containers) that generate lower layers.
 These alternative whiteouts can be escaped using the standard xattr escape
 mechanism in order to properly nest to any depth.
 
index eebcc0f9e2bcd1f3eecc99c00ec656fe45cbbd8e..6e903a903f8f691d55af7f3ae200f959fc7978a2 100644 (file)
@@ -1264,7 +1264,7 @@ defined:
                char *(*d_dname)(struct dentry *, char *, int);
                struct vfsmount *(*d_automount)(struct path *);
                int (*d_manage)(const struct path *, bool);
-               struct dentry *(*d_real)(struct dentry *, const struct inode *);
+               struct dentry *(*d_real)(struct dentry *, enum d_real_type type);
        };
 
 ``d_revalidate``
@@ -1419,16 +1419,14 @@ defined:
        the dentry being transited from.
 
 ``d_real``
-       overlay/union type filesystems implement this method to return
-       one of the underlying dentries hidden by the overlay.  It is
-       used in two different modes:
+       overlay/union type filesystems implement this method to return one
+       of the underlying dentries of a regular file hidden by the overlay.
 
-       Called from file_dentry() it returns the real dentry matching
-       the inode argument.  The real dentry may be from a lower layer
-       already copied up, but still referenced from the file.  This
-       mode is selected with a non-NULL inode argument.
+       The 'type' argument takes the values D_REAL_DATA or D_REAL_METADATA
+       for returning the real underlying dentry that refers to the inode
+       hosting the file's data or metadata respectively.
 
-       With NULL inode the topmost real underlying dentry is returned.
+       For non-regular files, the 'dentry' argument is returned.
 
 Each dentry has a pointer to its parent dentry, as well as a hash list
 of child dentries.  Child dentries are basically like files in a
index e8877db0461fb45ed939bd859df9d7bc0ef7077e..ac49836d8ecf8909ef75e8e04fbab45d89d1f888 100644 (file)
 # that are possible for CORE. So for example if CORE_BELL_A_ADVANCED is 'y',
 # CORE must be 'y' too.
 #
-#  * What influences CORE_BELL_A_ADVANCED ?
+#  * What influences CORE_BELL_A_ADVANCED?
 #
 # As the name implies CORE_BELL_A_ADVANCED is an advanced feature of
 # CORE_BELL_A so naturally it depends on CORE_BELL_A. So if CORE_BELL_A is 'y'
 # we know CORE_BELL_A_ADVANCED can be 'y' too.
 #
-#   * What influences CORE_BELL_A ?
+#   * What influences CORE_BELL_A?
 #
 # CORE_BELL_A depends on CORE, so CORE influences CORE_BELL_A.
 #
@@ -34,7 +34,7 @@
 # the "recursive dependency detected" error.
 #
 # Reading the Documentation/kbuild/Kconfig.recursion-issue-01 file it may be
-# obvious that an easy to solution to this problem should just be the removal
+# obvious that an easy solution to this problem should just be the removal
 # of the "select CORE" from CORE_BELL_A_ADVANCED as that is implicit already
 # since CORE_BELL_A depends on CORE. Recursive dependency issues are not always
 # so trivial to resolve, we provide another example below of practical
index b14aed18065f43ce24e9217eefd455814c953efc..3dcc9ece272aad6842a6297c6d5bf2cca2c2acc3 100644 (file)
@@ -384,8 +384,6 @@ operations:
             - type
 
       dump:
-        pre: dpll-lock-dumpit
-        post: dpll-unlock-dumpit
         reply: *dev-attrs
 
     -
@@ -473,8 +471,6 @@ operations:
             - fractional-frequency-offset
 
       dump:
-        pre: dpll-lock-dumpit
-        post: dpll-unlock-dumpit
         request:
           attributes:
             - id
index 1ad01d52a8638dcf6ee8a1c6c3d58698abd0d8e4..8e4d19adee8cd17eae831db73692c9237b5e0ad1 100644 (file)
@@ -942,6 +942,10 @@ attribute-sets:
       -
         name: gro-ipv4-max-size
         type: u32
+      -
+        name: dpll-pin
+        type: nest
+        nested-attributes: link-dpll-pin-attrs
   -
     name: af-spec-attrs
     attributes:
@@ -1627,6 +1631,12 @@ attribute-sets:
       -
         name: used
         type: u8
+  -
+    name: link-dpll-pin-attrs
+    attributes:
+      -
+        name: id
+        type: u32
 
 sub-messages:
   -
index e33ad2401ad70c8a678fec93ebfda3ca4909f9db..562f46b41274493c77176a1e563269fc9e4b2b85 100644 (file)
@@ -126,7 +126,7 @@ Users may also set the RoCE capability of the function using
 `devlink port function set roce` command.
 
 Users may also set the function as migratable using
-'devlink port function set migratable' command.
+`devlink port function set migratable` command.
 
 Users may also set the IPsec crypto capability of the function using
 `devlink port function set ipsec_crypto` command.
index a2babd0d7954e6729ed8533518dbef039f5fdeac..595d7ef5fc8b090788e7a3439843c060951d1098 100644 (file)
@@ -1,9 +1,9 @@
 .. SPDX-License-Identifier: GPL-2.0
 .. Copyright (C) 2023 Google LLC
 
-=====================================================
-inet_connection_sock struct fast path usage breakdown
-=====================================================
+==========================================
+inet_sock struct fast path usage breakdown
+==========================================
 
 Type                    Name                  fastpath_tx_access  fastpath_rx_access  comment
 ..struct                ..inet_sock                                                     
index e75a53593bb9606f1c0595d8f7227881ec932b9c..dceb49d56a91158232543e920c7ed23bed74106e 100644 (file)
@@ -136,8 +136,8 @@ struct_netpoll_info*                npinfo                  -
 possible_net_t                      nd_net                  -                   read_mostly         (dev_net)napi_busy_loop,tcp_v(4/6)_rcv,ip(v6)_rcv,ip(6)_input,ip(6)_input_finish
 void*                               ml_priv                                                         
 enum_netdev_ml_priv_type            ml_priv_type                                                    
-struct_pcpu_lstats__percpu*         lstats                                                          
-struct_pcpu_sw_netstats__percpu*    tstats                                                          
+struct_pcpu_lstats__percpu*         lstats                  read_mostly                             dev_lstats_add()
+struct_pcpu_sw_netstats__percpu*    tstats                  read_mostly                             dev_sw_netstats_tx_add()
 struct_pcpu_dstats__percpu*         dstats                                                          
 struct_garp_port*                   garp_port                                                       
 struct_mrp_port*                    mrp_port                                                        
index 97d7a5c8e01c02658c7f445ed92a2d1f7cc61d31..1c154cbd18487e385c8ae7a1e39d3b5f5ab086a2 100644 (file)
@@ -38,13 +38,13 @@ u32                           max_window              read_mostly         -
 u32                           mss_cache               read_mostly         read_mostly         tcp_rate_check_app_limited,tcp_current_mss,tcp_sync_mss,tcp_sndbuf_expand,tcp_tso_should_defer(tx);tcp_update_pacing_rate,tcp_clean_rtx_queue(rx)
 u32                           window_clamp            read_mostly         read_write          tcp_rcv_space_adjust,__tcp_select_window
 u32                           rcv_ssthresh            read_mostly         -                   __tcp_select_window
-u82                           scaling_ratio                                                   
+u8                            scaling_ratio           read_mostly         read_mostly         tcp_win_from_space
 struct                        tcp_rack                                                        
 u16                           advmss                  -                   read_mostly         tcp_rcv_space_adjust
 u8                            compressed_ack                                                  
 u8:2                          dup_ack_counter                                                 
 u8:1                          tlp_retrans                                                     
-u8:1                          tcp_usec_ts                                                     
+u8:1                          tcp_usec_ts             read_mostly         read_mostly
 u32                           chrono_start            read_write          -                   tcp_chrono_start/stop(tcp_write_xmit,tcp_cwnd_validate,tcp_send_syn_data)
 u32[3]                        chrono_stat             read_write          -                   tcp_chrono_start/stop(tcp_write_xmit,tcp_cwnd_validate,tcp_send_syn_data)
 u8:2                          chrono_type             read_write          -                   tcp_chrono_start/stop(tcp_write_xmit,tcp_cwnd_validate,tcp_send_syn_data)
diff --git a/Documentation/process/cve.rst b/Documentation/process/cve.rst
new file mode 100644 (file)
index 0000000..5e2753e
--- /dev/null
@@ -0,0 +1,121 @@
+====
+CVEs
+====
+
+Common Vulnerabilities and Exposures (CVE®) numbers were developed as an
+unambiguous way to identify, define, and catalog publicly disclosed
+security vulnerabilities.  Over time, their usefulness has declined with
+regard to the kernel project, and CVE numbers were very often assigned
+in inappropriate ways and for inappropriate reasons.  Because of this,
+the kernel development community has tended to avoid them.  However, the
+combination of continuing pressure to assign CVEs and other forms of
+security identifiers, and ongoing abuses by individuals and companies
+outside of the kernel community, has made it clear that the kernel
+community should have control over those assignments.
+
+The Linux kernel developer team does have the ability to assign CVEs for
+potential Linux kernel security issues.  This assignment is independent
+of the :doc:`normal Linux kernel security bug reporting
+process<../process/security-bugs>`.
+
+A list of all assigned CVEs for the Linux kernel can be found in the
+archives of the linux-cve mailing list, as seen on
+https://lore.kernel.org/linux-cve-announce/.  To get notice of the
+assigned CVEs, please `subscribe
+<https://subspace.kernel.org/subscribing.html>`_ to that mailing list.
+
+Process
+=======
+
+As part of the normal stable release process, kernel changes that are
+potentially security issues are identified by the developers responsible
+for CVE number assignments and have CVE numbers automatically assigned
+to them.  These assignments are announced on the linux-cve-announce
+mailing list on a frequent basis.
+
+Note that, because of the level in the system at which the Linux kernel
+operates, almost any bug might be exploitable to compromise the security
+of the kernel, but the possibility of exploitation is often not evident
+when the bug is fixed.  Because of this, the CVE assignment team is
+overly cautious and assigns CVE numbers to any bugfix that they
+identify.  This explains the seemingly large number of CVEs that are
+issued by the Linux kernel team.
+
+If the CVE assignment team misses a specific fix that any user feels
+should have a CVE assigned to it, please email them at <cve@kernel.org>
+and the team there will work with you on it.  Note that no potential
+security issues should be sent to this alias; it is ONLY for assignment
+of CVEs for fixes that are already in released kernel trees.  If you
+feel you have found an unfixed security issue, please follow the
+:doc:`normal Linux kernel security bug reporting
+process<../process/security-bugs>`.
+
+No CVEs will be automatically assigned for unfixed security issues in
+the Linux kernel; assignment will only automatically happen after a fix
+is available and applied to a stable kernel tree, and it will be tracked
+that way by the git commit id of the original fix.  If anyone wishes to
+have a CVE assigned before an issue is resolved with a commit, please
+contact the kernel CVE assignment team at <cve@kernel.org> to get an
+identifier assigned from their batch of reserved identifiers.
+
+No CVEs will be assigned for any issue found in a version of the kernel
+that is not currently being actively supported by the Stable/LTS kernel
+team.  A list of the currently supported kernel branches can be found at
+https://kernel.org/releases.html
+
+Disputes of assigned CVEs
+=========================
+
+The authority to dispute or modify an assigned CVE for a specific kernel
+change lies solely with the maintainers of the relevant subsystem
+affected.  This principle ensures a high degree of accuracy and
+accountability in vulnerability reporting.  Only those individuals with
+deep expertise and intimate knowledge of the subsystem can effectively
+assess the validity and scope of a reported vulnerability and determine
+its appropriate CVE designation.  Any attempt to modify or dispute a CVE
+outside of this designated authority could lead to confusion, inaccurate
+reporting, and ultimately, compromised systems.
+
+Invalid CVEs
+============
+
+If a security issue is found in a Linux kernel that is only supported by
+a Linux distribution, due to changes made by that distribution or because
+the distribution supports a kernel version that is no longer one of the
+kernel.org supported releases, then a CVE cannot be assigned by the Linux
+kernel CVE team and must instead be requested from that Linux
+distribution itself.
+
+Any CVE that is assigned against the Linux kernel for an actively
+supported kernel version by any group other than the kernel CVE
+assignment team should not be treated as a valid CVE.  Please notify the
+kernel CVE assignment team at <cve@kernel.org> so that they can work to
+invalidate such entries through the CNA remediation process.
+
+Applicability of specific CVEs
+==============================
+
+As the Linux kernel can be used in many different ways, with many
+different ways of accessing it by external users, or no access at all,
+the applicability of any specific CVE is up to the user of Linux to
+determine; it is not up to the CVE assignment team.  Please do not
+contact us to attempt to determine the applicability of any specific
+CVE.
+
+Also, as the source tree is so large, and any one system only uses a
+small subset of the source tree, any users of Linux should be aware that
+large numbers of assigned CVEs are not relevant for their systems.
+
+In short, we do not know your use case, and we do not know what portions
+of the kernel that you use, so there is no way for us to determine if a
+specific CVE is relevant for your system.
+
+As always, it is best to take all released kernel changes, as they are
+tested together in a unified whole by many community members, and not as
+individual cherry-picked changes.  Also note that for many bugs, the
+solution to the overall problem is not found in a single change, but by
+the sum of many fixes on top of each other.  Ideally, CVEs will be
+assigned to all fixes for all issues, but sometimes we will fail to
+notice fixes; therefore, assume that some changes without a CVE assigned
+might be relevant to take.
+
index 6cb732dfcc72245639e93638083c60fdb7141195..de9cbb7bd7eb2b3a064a93ca2b025bdaf63a42c7 100644 (file)
@@ -81,6 +81,7 @@ of special classes of bugs: regressions and security problems.
 
    handling-regressions
    security-bugs
+   cve
    embargoed-hardware-issues
 
 Maintainer information
index 84ee60fceef24cbf1ba9e090ac91c94abd4064b5..fd96e4a3cef9c09382e34419ec3f8ac1c5514cf4 100644 (file)
@@ -431,7 +431,7 @@ patchwork checks
 Checks in patchwork are mostly simple wrappers around existing kernel
 scripts, the sources are available at:
 
-https://github.com/kuba-moo/nipa/tree/master/tests
+https://github.com/linux-netdev/nipa/tree/master/tests
 
 **Do not** post your patches just to run them through the checks.
 You must ensure that your patches are ready by testing them locally
index 692a3ba56cca83742f77edc5161167060d76f414..56c560a00b37a6a3e99a7d9edaa45a103d7398bb 100644 (file)
@@ -99,9 +99,8 @@ CVE assignment
 The security team does not assign CVEs, nor do we require them for
 reports or fixes, as this can needlessly complicate the process and may
 delay the bug handling.  If a reporter wishes to have a CVE identifier
-assigned, they should find one by themselves, for example by contacting
-MITRE directly.  However under no circumstances will a patch inclusion
-be delayed to wait for a CVE identifier to arrive.
+assigned for a confirmed issue, they can contact the :doc:`kernel CVE
+assignment team<../process/cve>` to obtain one.
 
 Non-disclosure agreements
 -------------------------
index b9df61eb45013872ca82b463048494e041b3f127..03ace5f01b5c021e12adba23b83b8cb074c949ba 100644 (file)
@@ -109,7 +109,7 @@ class KernelFeat(Directive):
             else:
                 out_lines += line + "\n"
 
-        nodeList = self.nestedParse(out_lines, fname)
+        nodeList = self.nestedParse(out_lines, self.arguments[0])
         return nodeList
 
     def nestedParse(self, lines, fname):
index b58efa99df527d3d870d9572e6ee7f18912fe99f..41f1efbe64bb2898f1770deb128630b316a68a08 100644 (file)
@@ -12,5 +12,7 @@
 <script type="text/javascript"> <!--
   var sbar = document.getElementsByClassName("sphinxsidebar")[0];
   let currents = document.getElementsByClassName("current")
-  sbar.scrollTop = currents[currents.length - 1].offsetTop;
+  if (currents.length) {
+    sbar.scrollTop = currents[currents.length - 1].offsetTop;
+  }
   --> </script>
index 47161e6eba9976fa8e67a14905f284ea05d82f21..32c2b32b2b5ee91a27abacfa0332620e208b8723 100644 (file)
@@ -29,10 +29,7 @@ all_languages = {
 }
 
 class LanguagesNode(nodes.Element):
-    def __init__(self, current_language, *args, **kwargs):
-        super().__init__(*args, **kwargs)
-
-        self.current_language = current_language
+    pass
 
 class TranslationsTransform(Transform):
     default_priority = 900
@@ -49,7 +46,8 @@ class TranslationsTransform(Transform):
             # normalize docname to be the untranslated one
             docname = os.path.join(*components[2:])
 
-        new_nodes = LanguagesNode(all_languages[this_lang_code])
+        new_nodes = LanguagesNode()
+        new_nodes['current_language'] = all_languages[this_lang_code]
 
         for lang_code, lang_name in all_languages.items():
             if lang_code == this_lang_code:
@@ -84,7 +82,7 @@ def process_languages(app, doctree, docname):
 
         html_content = app.builder.templates.render('translations.html',
             context={
-                'current_language': node.current_language,
+                'current_language': node['current_language'],
                 'languages': languages,
             })
 
index 8cd62c466d20aac597fa5aa15ecfb2930c15c252..077dfac7ed98f7911d731312eb09631e41c63772 100644 (file)
@@ -448,17 +448,17 @@ Function-specific configfs interface
 The function name to use when creating the function directory is "ncm".
 The NCM function provides these attributes in its function directory:
 
-       ===============   ==================================================
-       ifname            network device interface name associated with this
-                         function instance
-       qmult             queue length multiplier for high and super speed
-       host_addr         MAC address of host's end of this
-                         Ethernet over USB link
-       dev_addr          MAC address of device's end of this
-                         Ethernet over USB link
-       max_segment_size  Segment size required for P2P connections. This
-                         will set MTU to (max_segment_size - 14 bytes)
-       ===============   ==================================================
+       ======================= ==================================================
+       ifname                  network device interface name associated with this
+                               function instance
+       qmult                   queue length multiplier for high and super speed
+       host_addr               MAC address of host's end of this
+                               Ethernet over USB link
+       dev_addr                MAC address of device's end of this
+                               Ethernet over USB link
+       max_segment_size        Segment size required for P2P connections. This
+                               will set MTU to (max_segment_size - 14 bytes)
+       ======================= ==================================================
 
 and after creating the functions/ncm.<instance name> they contain default
 values: qmult is 5, dev_addr and host_addr are randomly selected.
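 
 For example (the gadget and instance names are hypothetical, and the gadget
 must already have been created under configfs), the attributes can be
 accessed as plain files::
 
     cd /sys/kernel/config/usb_gadget/g1
     mkdir functions/ncm.usb0
     echo 8192 > functions/ncm.usb0/max_segment_size
     cat functions/ncm.usb0/ifname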
index 457e16f06e04defe9f998ba74ff6577b7ff13490..3731ecf1e4370df533430bb2debdc046acdffb88 100644 (file)
@@ -82,8 +82,9 @@ Code  Seq#    Include File                                           Comments
 0x10  00-0F  drivers/char/s390/vmcp.h
 0x10  10-1F  arch/s390/include/uapi/sclp_ctl.h
 0x10  20-2F  arch/s390/include/uapi/asm/hypfs.h
-0x12  all    linux/fs.h
+0x12  all    linux/fs.h                                              BLK* ioctls
              linux/blkpg.h
+0x15  all    linux/fs.h                                              FS_IOC_* ioctls
 0x1b  all                                                            InfiniBand Subsystem
                                                                      <http://infiniband.sourceforge.net/>
 0x20  all    drivers/cdrom/cm206.h
index 4a7a1b738bbead3563cbc70a67c038725d0aadfd..de447e11b4a5c3b9a0948712e59d1d065130a1f7 100644 (file)
@@ -10,3 +10,4 @@ Hyper-V Enlightenments
    overview
    vmbus
    clocks
+   vpci
diff --git a/Documentation/virt/hyperv/vpci.rst b/Documentation/virt/hyperv/vpci.rst
new file mode 100644 (file)
index 0000000..b65b212
--- /dev/null
@@ -0,0 +1,316 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+PCI pass-thru devices
+=====================
+In a Hyper-V guest VM, PCI pass-thru devices (also called
+virtual PCI devices, or vPCI devices) are physical PCI devices
+that are mapped directly into the VM's physical address space.
+Guest device drivers can interact directly with the hardware
+without intermediation by the host hypervisor.  This approach
+provides higher bandwidth access to the device with lower
+latency, compared with devices that are virtualized by the
+hypervisor.  The device should appear to the guest just as it
+would when running on bare metal, so no changes are required
+to the Linux device drivers for the device.
+
+Hyper-V terminology for vPCI devices is "Discrete Device
+Assignment" (DDA).  Public documentation for Hyper-V DDA is
+available here: `DDA`_
+
+.. _DDA: https://learn.microsoft.com/en-us/windows-server/virtualization/hyper-v/plan/plan-for-deploying-devices-using-discrete-device-assignment
+
+DDA is typically used for storage controllers, such as NVMe,
+and for GPUs.  A similar mechanism for NICs is called SR-IOV
+and produces the same benefits by allowing a guest device
+driver to interact directly with the hardware.  See Hyper-V
+public documentation here: `SR-IOV`_
+
+.. _SR-IOV: https://learn.microsoft.com/en-us/windows-hardware/drivers/network/overview-of-single-root-i-o-virtualization--sr-iov-
+
+This discussion of vPCI devices includes DDA and SR-IOV
+devices.
+
+Device Presentation
+-------------------
+Hyper-V provides full PCI functionality for a vPCI device when
+it is operating, so the Linux device driver for the device can
+be used unchanged, provided it uses the correct Linux kernel
+APIs for accessing PCI config space and for other integration
+with Linux.  But the initial detection of the PCI device and
+its integration with the Linux PCI subsystem must use Hyper-V
+specific mechanisms.  Consequently, vPCI devices on Hyper-V
+have a dual identity.  They are initially presented to Linux
+guests as VMBus devices via the standard VMBus "offer"
+mechanism, so they have a VMBus identity and appear under
+/sys/bus/vmbus/devices.  The VMBus vPCI driver in Linux at
+drivers/pci/controller/pci-hyperv.c handles a newly introduced
+vPCI device by fabricating a PCI bus topology and creating all
+the normal PCI device data structures in Linux that would
+exist if the PCI device were discovered via ACPI on a
+bare-metal system.  Once those data structures are set up, the
+device also has a normal PCI identity in Linux, and the normal
+Linux device driver for the vPCI device can function as if it
+were running in Linux on bare-metal.  Because vPCI devices are
+presented dynamically through the VMBus offer mechanism, they
+do not appear in the Linux guest's ACPI tables.  vPCI devices
+may be added to a VM or removed from a VM at any time during
+the life of the VM, and not just during initial boot.
+
+With this approach, the vPCI device is a VMBus device and a
+PCI device at the same time.  In response to the VMBus offer
+message, the hv_pci_probe() function runs and establishes a
+VMBus connection to the vPCI VSP on the Hyper-V host.  That
+connection has a single VMBus channel.  The channel is used to
+exchange messages with the vPCI VSP for the purpose of setting
+up and configuring the vPCI device in Linux.  Once the device
+is fully configured in Linux as a PCI device, the VMBus
+channel is used only if Linux changes which guest vCPU is to be
+interrupted, or if the vPCI device is removed from
+the VM while the VM is running.  The ongoing operation of the
+device happens directly between the Linux device driver for
+the device and the hardware, with VMBus and the VMBus channel
+playing no role.
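+
+As an illustration of this dual identity (the names below are only
+examples), the same vPCI device can be observed from both angles in a
+running guest::
+
+    # the VMBus identity, keyed by the instance GUID
+    ls /sys/bus/vmbus/devices
+
+    # the PCI identity, in its own PCI domain
+    lspci -D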
+
+PCI Device Setup
+----------------
+PCI device setup follows a sequence that Hyper-V originally
+created for Windows guests, and that can be ill-suited for
+Linux guests due to differences in the overall structure of
+the Linux PCI subsystem compared with Windows.  Nonetheless,
+with a bit of hackery in the Hyper-V virtual PCI driver for
+Linux, the virtual PCI device is set up in Linux so that
+generic Linux PCI subsystem code and the Linux driver for the
+device "just work".
+
+Each vPCI device is set up in Linux to be in its own PCI
+domain with a host bridge.  The PCI domainID is derived from
+bytes 4 and 5 of the instance GUID assigned to the VMBus vPCI
+device.  The Hyper-V host does not guarantee that these bytes
+are unique, so hv_pci_probe() has an algorithm to resolve
+collisions.  The collision resolution is intended to be stable
+across reboots of the same VM so that the PCI domainIDs don't
+change, as the domainID appears in the user space
+configuration of some devices.
+
+hv_pci_probe() allocates a guest MMIO range to be used as PCI
+config space for the device.  This MMIO range is communicated
+to the Hyper-V host over the VMBus channel as part of telling
+the host that the device is ready to enter d0.  See
+hv_pci_enter_d0().  When the guest subsequently accesses this
+MMIO range, the Hyper-V host intercepts the accesses and maps
+them to the physical device PCI config space.
+
+hv_pci_probe() also gets BAR information for the device from
+the Hyper-V host, and uses this information to allocate MMIO
+space for the BARs.  That MMIO space is then set up to be
+associated with the host bridge so that it works when generic
+PCI subsystem code in Linux processes the BARs.
+
+Finally, hv_pci_probe() creates the root PCI bus.  At this
+point the Hyper-V virtual PCI driver hackery is done, and the
+normal Linux PCI machinery for scanning the root bus works to
+detect the device, to perform driver matching, and to
+initialize the driver and device.
+
+PCI Device Removal
+------------------
+A Hyper-V host may initiate removal of a vPCI device from a
+guest VM at any time during the life of the VM.  The removal
+is instigated by an admin action taken on the Hyper-V host and
+is not under the control of the guest OS.
+
+A guest VM is notified of the removal by an unsolicited
+"Eject" message sent from the host to the guest over the VMBus
+channel associated with the vPCI device.  Upon receipt of such
+a message, the Hyper-V virtual PCI driver in Linux
+asynchronously invokes Linux kernel PCI subsystem calls to
+shutdown and remove the device.  When those calls are
+complete, an "Ejection Complete" message is sent back to
+Hyper-V over the VMBus channel indicating that the device has
+been removed.  At this point, Hyper-V sends a VMBus rescind
+message to the Linux guest, which the VMBus driver in Linux
+processes by removing the VMBus identity for the device.  Once
+that processing is complete, all vestiges of the device having
+been present are gone from the Linux kernel.  The rescind
+message also indicates to the guest that Hyper-V has stopped
+providing support for the vPCI device in the guest.  If the
+guest were to attempt to access that device's MMIO space, it
+would be an invalid reference. Hypercalls affecting the device
+return errors, and any further messages sent in the VMBus
+channel are ignored.
+
+After sending the Eject message, Hyper-V allows the guest VM
+60 seconds to cleanly shut down the device and respond with
+Ejection Complete before sending the VMBus rescind
+message.  If for any reason the Eject steps don't complete
+within the allowed 60 seconds, the Hyper-V host forcibly
+performs the rescind steps, which will likely result in
+cascading errors in the guest because the device is now no
+longer present from the guest standpoint and accessing the
+device MMIO space will fail.
+
+Because ejection is asynchronous and can happen at any point
+during the guest VM lifecycle, proper synchronization in the
+Hyper-V virtual PCI driver is very tricky.  Ejection has been
+observed even before a newly offered vPCI device has been
+fully set up.  The Hyper-V virtual PCI driver has been updated
+several times over the years to fix race conditions when
+ejections happen at inopportune times. Care must be taken when
+modifying this code to prevent re-introducing such problems.
+See comments in the code.
+
+Interrupt Assignment
+--------------------
+The Hyper-V virtual PCI driver supports vPCI devices using
+MSI, multi-MSI, or MSI-X.  Assigning the guest vCPU that will
+receive the interrupt for a particular MSI or MSI-X message is
+complex because of the way the Linux setup of IRQs maps onto
+the Hyper-V interfaces.  For the single-MSI and MSI-X cases,
+Linux calls hv_compose_msi_msg() twice, with the first call
+containing a dummy vCPU and the second call containing the
+real vCPU.  Furthermore, hv_irq_unmask() is finally called
+(on x86) or the GICD registers are set (on arm64) to specify
+the real vCPU again.  Each of these three calls interacts
+with Hyper-V, which must decide which physical CPU should
+receive the interrupt before it is forwarded to the guest VM.
+Unfortunately, the Hyper-V decision-making process is a bit
+limited, and can result in concentrating the physical
+interrupts on a single CPU, causing a performance bottleneck.
+See details about how this is resolved in the extensive
+comment above the function hv_compose_msi_req_get_cpu().
+
+The Hyper-V virtual PCI driver implements the
+irq_chip.irq_compose_msi_msg function as hv_compose_msi_msg().
+Unfortunately, on Hyper-V the implementation requires sending
+a VMBus message to the Hyper-V host and awaiting an interrupt
+indicating receipt of a reply message.  Since
+irq_chip.irq_compose_msi_msg can be called with IRQ locks
+held, it doesn't work to do the normal sleep until awakened by
+the interrupt.  Instead, hv_compose_msi_msg() must send the
+VMBus message, and then poll for the completion message.  As a
+further complexity, the vPCI device could be ejected/rescinded
+while the polling is in progress, so this scenario must be
+detected as well.  See comments in the code regarding this
+very tricky area.
+
+Most of the code in the Hyper-V virtual PCI driver
+(pci-hyperv.c) applies to Hyper-V and Linux guests running on x86
+and on arm64 architectures.  But there are differences in how
+interrupt assignments are managed.  On x86, the Hyper-V
+virtual PCI driver in the guest must make a hypercall to tell
+Hyper-V which guest vCPU should be interrupted by each
+MSI/MSI-X interrupt, and the x86 interrupt vector number that
+the x86_vector IRQ domain has picked for the interrupt.  This
+hypercall is made by hv_arch_irq_unmask().  On arm64, the
+Hyper-V virtual PCI driver manages the allocation of an SPI
+for each MSI/MSI-X interrupt.  The Hyper-V virtual PCI driver
+stores the allocated SPI in the architectural GICD registers,
+which Hyper-V emulates, so no hypercall is necessary as with
+x86.  Hyper-V does not support using LPIs for vPCI devices in
+arm64 guest VMs because it does not emulate a GICv3 ITS.
+
+The Hyper-V virtual PCI driver in Linux supports vPCI devices
+whose drivers create managed or unmanaged Linux IRQs.  If the
+smp_affinity for an unmanaged IRQ is updated via the /proc/irq
+interface, the Hyper-V virtual PCI driver is called to tell
+the Hyper-V host to change the interrupt targeting and
+everything works properly.  However, on x86 if the x86_vector
+IRQ domain needs to reassign an interrupt vector due to
+running out of vectors on a CPU, there's no path to inform the
+Hyper-V host of the change, and things break.  Fortunately,
+guest VMs operate in a constrained device environment where
+using all the vectors on a CPU doesn't happen. Since such a
+problem is only a theoretical concern rather than a practical
+concern, it has been left unaddressed.
+
+DMA
+---
+By default, Hyper-V pins all guest VM memory in the host
+when the VM is created, and programs the physical IOMMU to
+allow the VM to have DMA access to all its memory.  Hence
+it is safe to assign PCI devices to the VM, and allow the
+guest operating system to program the DMA transfers.  The
+physical IOMMU prevents a malicious guest from initiating
+DMA to memory belonging to the host or to other VMs on the
+host. From the Linux guest standpoint, such DMA transfers
+are in "direct" mode since Hyper-V does not provide a virtual
+IOMMU in the guest.
+
+Hyper-V assumes that physical PCI devices always perform
+cache-coherent DMA.  When running on x86, this behavior is
+required by the architecture.  When running on arm64, the
+architecture allows for both cache-coherent and
+non-cache-coherent devices, with the behavior of each device
+specified in the ACPI DSDT.  But when a PCI device is assigned
+to a guest VM, that device does not appear in the DSDT, so the
+Hyper-V VMBus driver propagates cache-coherency information
+from the VMBus node in the ACPI DSDT to all VMBus devices,
+including vPCI devices (since they have a dual identity as a VMBus
+device and as a PCI device).  See vmbus_dma_configure().
+Current Hyper-V versions always indicate that the VMBus is
+cache coherent, so vPCI devices on arm64 always get marked as
+cache coherent and the CPU does not perform any sync
+operations as part of dma_map/unmap_*() calls.
+
+vPCI protocol versions
+----------------------
+As previously described, during vPCI device setup and teardown,
+messages are passed over a VMBus channel between the Hyper-V
+host and the Hyper-V vPCI driver in the Linux guest.  Some
+messages have been revised in newer versions of Hyper-V, so
+the guest and host must agree on the vPCI protocol version to
+be used.  The version is negotiated when communication over
+the VMBus channel is first established.  See
+hv_pci_protocol_negotiation(). Newer versions of the protocol
+extend support to VMs with more than 64 vCPUs, and provide
+additional information about the vPCI device, such as the
+guest virtual NUMA node to which it is most closely affined in
+the underlying hardware.
+
+Guest NUMA node affinity
+------------------------
+When the vPCI protocol version provides it, the guest NUMA
+node affinity of the vPCI device is stored as part of the Linux
+device information for subsequent use by the Linux driver. See
+hv_pci_assign_numa_node().  If the negotiated protocol version
+does not support the host providing NUMA affinity information,
+the Linux guest defaults the device NUMA node to 0.  But even
+when the negotiated protocol version includes NUMA affinity
+information, the ability of the host to provide such
+information depends on certain host configuration options.  If
+the guest receives NUMA node value "0", it could mean NUMA
+node 0, or it could mean "no information is available".
+Unfortunately it is not possible to distinguish the two cases
+from the guest side.
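+
+A hedged sketch of the resulting assignment, where
+protocol_has_numa and desc are illustrative stand-ins for the
+negotiated state and the host-provided device description (the
+real logic is in hv_pci_assign_numa_node())::
+
+  int node = 0;  /* default when the host provides nothing */
+
+  if (protocol_has_numa && desc->virtual_numa_node != 0)
+          node = numa_map_to_online_node(desc->virtual_numa_node);
+
+  /* node 0 stays ambiguous: real node 0 vs. "no information" */
+  set_dev_node(&hdev->device, node);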
+
+PCI config space access in a CoCo VM
+------------------------------------
+Linux PCI device drivers access PCI config space using a
+standard set of functions provided by the Linux PCI subsystem.
+In Hyper-V guests these standard functions map to functions
+hv_pcifront_read_config() and hv_pcifront_write_config()
+in the Hyper-V virtual PCI driver.  In normal VMs,
+these hv_pcifront_*() functions directly access the PCI config
+space, and the accesses trap to Hyper-V to be handled.
+But in CoCo VMs, memory encryption prevents Hyper-V
+from reading the guest instruction stream to emulate the
+access, so the hv_pcifront_*() functions must invoke
+hypercalls with explicit arguments describing the access to be
+made.
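+
+Drivers are unaffected by this distinction; the usual accessors
+work unchanged and simply take the hypercall path underneath.
+For example, with pdev being any bound struct pci_dev::
+
+  u16 vendor;
+
+  /* Routed through hv_pcifront_read_config(); in a CoCo VM
+   * this issues an explicit hypercall rather than relying on
+   * Hyper-V trapping and emulating the access.
+   */
+  pci_read_config_word(pdev, PCI_VENDOR_ID, &vendor);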
+
+Config Block back-channel
+-------------------------
+The Hyper-V host and Hyper-V virtual PCI driver in Linux
+together implement a non-standard back-channel communication
+path between the host and guest.  The back-channel path uses
+messages sent over the VMBus channel associated with the vPCI
+device.  The functions hyperv_read_cfg_blk() and
+hyperv_write_cfg_blk() are the primary interfaces provided to
+other parts of the Linux kernel.  As of this writing, these
+interfaces are used only by the Mellanox mlx5 driver to pass
+diagnostic data to a Hyper-V host running in the Azure public
+cloud.  The functions hyperv_read_cfg_blk() and
+hyperv_write_cfg_blk() are implemented in a separate module
+(pci-hyperv-intf.c, under CONFIG_PCI_HYPERV_INTERFACE) that
+effectively stubs them out when running in non-Hyper-V
+environments.
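+
+A hedged sketch of a read through this back-channel, modeled
+on the mlx5 usage; block_id and the buffer size are
+illustrative, and the exact signature should be checked
+against include/linux/hyperv.h::
+
+  u8 buf[128];
+  unsigned int bytes = 0;
+  int ret;
+
+  /* block_id selects the host-defined config block to read */
+  ret = hyperv_read_cfg_blk(pdev, buf, sizeof(buf),
+                            block_id, &bytes);
+  if (ret)
+          return ret;
+  /* on success, 'bytes' bytes of diagnostic/config data from
+     the host are in 'buf' */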
index 3ec0b7a455a0cf489b93683a49b5362cded0b570..09c7e585ff5800da5a72a1f9dbd8b719f0b6d595 100644 (file)
@@ -8791,6 +8791,11 @@ means the VM type with value @n is supported.  Possible values of @n are::
   #define KVM_X86_DEFAULT_VM   0
   #define KVM_X86_SW_PROTECTED_VM      1
 
+Note, KVM_X86_SW_PROTECTED_VM is currently only for development and testing.
+Do not use KVM_X86_SW_PROTECTED_VM for "real" VMs, and especially not in
+production.  The behavior and effective ABI for software-protected
+VMs are unstable.
+
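+The VM type is passed as the argument to KVM_CREATE_VM.  A
+minimal sketch, assuming /dev/kvm is accessible::
+
+  int kvm = open("/dev/kvm", O_RDWR);
+  /* use the default type; per the warning above, avoid
+   * KVM_X86_SW_PROTECTED_VM outside development/testing
+   */
+  int vm = ioctl(kvm, KVM_CREATE_VM, KVM_X86_DEFAULT_VM);
+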
 9. Known KVM API problems
 =========================
 
index 8d1052fa6a6924d17a4d2681fa7907c544e35186..7f0b462d85a414a7fe3babd94ab6999e4051813b 100644 (file)
@@ -1395,6 +1395,7 @@ F:        drivers/hwmon/max31760.c
 
 ANALOGBITS PLL LIBRARIES
 M:     Paul Walmsley <paul.walmsley@sifive.com>
+M:     Samuel Holland <samuel.holland@sifive.com>
 S:     Supported
 F:     drivers/clk/analogbits/*
 F:     include/linux/clk/analogbits*
@@ -2156,7 +2157,7 @@ M:        Shawn Guo <shawnguo@kernel.org>
 M:     Sascha Hauer <s.hauer@pengutronix.de>
 R:     Pengutronix Kernel Team <kernel@pengutronix.de>
 R:     Fabio Estevam <festevam@gmail.com>
-R:     NXP Linux Team <linux-imx@nxp.com>
+L:     imx@lists.linux.dev
 L:     linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
 S:     Maintained
 T:     git git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux.git
@@ -3168,10 +3169,10 @@ F:      drivers/hwmon/asus-ec-sensors.c
 
 ASUS NOTEBOOKS AND EEEPC ACPI/WMI EXTRAS DRIVERS
 M:     Corentin Chary <corentin.chary@gmail.com>
-L:     acpi4asus-user@lists.sourceforge.net
+M:     Luke D. Jones <luke@ljones.dev>
 L:     platform-driver-x86@vger.kernel.org
 S:     Maintained
-W:     http://acpi4asus.sf.net
+W:     https://asus-linux.org/
 F:     drivers/platform/x86/asus*.c
 F:     drivers/platform/x86/eeepc*.c
 
@@ -4169,14 +4170,14 @@ F:      drivers/firmware/broadcom/tee_bnxt_fw.c
 F:     drivers/net/ethernet/broadcom/bnxt/
 F:     include/linux/firmware/broadcom/tee_bnxt_fw.h
 
-BROADCOM BRCM80211 IEEE802.11n WIRELESS DRIVER
-M:     Arend van Spriel <aspriel@gmail.com>
-M:     Franky Lin <franky.lin@broadcom.com>
-M:     Hante Meuleman <hante.meuleman@broadcom.com>
+BROADCOM BRCM80211 IEEE802.11 WIRELESS DRIVERS
+M:     Arend van Spriel <arend.vanspriel@broadcom.com>
 L:     linux-wireless@vger.kernel.org
+L:     brcm80211@lists.linux.dev
 L:     brcm80211-dev-list.pdl@broadcom.com
 S:     Supported
 F:     drivers/net/wireless/broadcom/brcm80211/
+F:     include/linux/platform_data/brcmfmac.h
 
 BROADCOM BRCMSTB GPIO DRIVER
 M:     Doug Berger <opendmb@gmail.com>
@@ -4547,7 +4548,7 @@ F:        drivers/net/ieee802154/ca8210.c
 
 CACHEFILES: FS-CACHE BACKEND FOR CACHING ON MOUNTED FILESYSTEMS
 M:     David Howells <dhowells@redhat.com>
-L:     linux-cachefs@redhat.com (moderated for non-subscribers)
+L:     netfs@lists.linux.dev
 S:     Supported
 F:     Documentation/filesystems/caching/cachefiles.rst
 F:     fs/cachefiles/
@@ -5378,7 +5379,7 @@ CONTROL GROUP - MEMORY RESOURCE CONTROLLER (MEMCG)
 M:     Johannes Weiner <hannes@cmpxchg.org>
 M:     Michal Hocko <mhocko@kernel.org>
 M:     Roman Gushchin <roman.gushchin@linux.dev>
-M:     Shakeel Butt <shakeelb@google.com>
+M:     Shakeel Butt <shakeel.butt@linux.dev>
 R:     Muchun Song <muchun.song@linux.dev>
 L:     cgroups@vger.kernel.org
 L:     linux-mm@kvack.org
@@ -5610,6 +5611,11 @@ S:       Maintained
 F:     Documentation/devicetree/bindings/net/can/ctu,ctucanfd.yaml
 F:     drivers/net/can/ctucanfd/
 
+CVE ASSIGNMENT CONTACT
+M:     CVE Assignment Team <cve@kernel.org>
+S:     Maintained
+F:     Documentation/process/cve.rst
+
 CW1200 WLAN driver
 S:     Orphan
 F:     drivers/net/wireless/st/cw1200/
@@ -5958,7 +5964,6 @@ S:        Maintained
 F:     drivers/platform/x86/dell/dell-wmi-descriptor.c
 
 DELL WMI HARDWARE PRIVACY SUPPORT
-M:     Perry Yuan <Perry.Yuan@dell.com>
 L:     Dell.Client.Kernel@dell.com
 L:     platform-driver-x86@vger.kernel.org
 S:     Maintained
@@ -7955,12 +7960,13 @@ L:      rust-for-linux@vger.kernel.org
 S:     Maintained
 F:     rust/kernel/net/phy.rs
 
-EXEC & BINFMT API
+EXEC & BINFMT API, ELF
 R:     Eric Biederman <ebiederm@xmission.com>
 R:     Kees Cook <keescook@chromium.org>
 L:     linux-mm@kvack.org
 S:     Supported
 T:     git git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-next/execve
+F:     Documentation/userspace-api/ELF.rst
 F:     fs/*binfmt_*.c
 F:     fs/exec.c
 F:     include/linux/binfmts.h
@@ -8223,7 +8229,8 @@ F:        include/linux/iomap.h
 
 FILESYSTEMS [NETFS LIBRARY]
 M:     David Howells <dhowells@redhat.com>
-L:     linux-cachefs@redhat.com (moderated for non-subscribers)
+R:     Jeff Layton <jlayton@kernel.org>
+L:     netfs@lists.linux.dev
 L:     linux-fsdevel@vger.kernel.org
 S:     Supported
 F:     Documentation/filesystems/caching/
@@ -8489,7 +8496,7 @@ FREESCALE IMX / MXC FEC DRIVER
 M:     Wei Fang <wei.fang@nxp.com>
 R:     Shenwei Wang <shenwei.wang@nxp.com>
 R:     Clark Wang <xiaoning.wang@nxp.com>
-R:     NXP Linux Team <linux-imx@nxp.com>
+L:     imx@lists.linux.dev
 L:     netdev@vger.kernel.org
 S:     Maintained
 F:     Documentation/devicetree/bindings/net/fsl,fec.yaml
@@ -8524,7 +8531,7 @@ F:        drivers/i2c/busses/i2c-imx.c
 FREESCALE IMX LPI2C DRIVER
 M:     Dong Aisheng <aisheng.dong@nxp.com>
 L:     linux-i2c@vger.kernel.org
-L:     linux-imx@nxp.com
+L:     imx@lists.linux.dev
 S:     Maintained
 F:     Documentation/devicetree/bindings/i2c/i2c-imx-lpi2c.yaml
 F:     drivers/i2c/busses/i2c-imx-lpi2c.c
@@ -10090,7 +10097,7 @@ L:      linux-i2c@vger.kernel.org
 S:     Maintained
 W:     https://i2c.wiki.kernel.org/
 Q:     https://patchwork.ozlabs.org/project/linux-i2c/list/
-T:     git git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux.git
+T:     git git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux.git
 F:     Documentation/devicetree/bindings/i2c/
 F:     drivers/i2c/algos/
 F:     drivers/i2c/busses/
@@ -10282,7 +10289,7 @@ F:      drivers/scsi/ibmvscsi/ibmvscsi*
 F:     include/scsi/viosrp.h
 
 IBM Power Virtual SCSI Device Target Driver
-M:     Michael Cyr <mikecyr@linux.ibm.com>
+M:     Tyrel Datwyler <tyreld@linux.ibm.com>
 L:     linux-scsi@vger.kernel.org
 L:     target-devel@vger.kernel.org
 S:     Supported
@@ -10728,7 +10735,7 @@ INTEL DRM I915 DRIVER (Meteor Lake, DG2 and older excluding Poulsbo, Moorestown
 M:     Jani Nikula <jani.nikula@linux.intel.com>
 M:     Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
 M:     Rodrigo Vivi <rodrigo.vivi@intel.com>
-M:     Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
+M:     Tvrtko Ursulin <tursulin@ursulin.net>
 L:     intel-gfx@lists.freedesktop.org
 S:     Supported
 W:     https://drm.pages.freedesktop.org/intel-docs/
@@ -10800,11 +10807,11 @@ F:    drivers/gpio/gpio-tangier.h
 
 INTEL GVT-g DRIVERS (Intel GPU Virtualization)
 M:     Zhenyu Wang <zhenyuw@linux.intel.com>
-M:     Zhi Wang <zhi.a.wang@intel.com>
+M:     Zhi Wang <zhi.wang.linux@gmail.com>
 L:     intel-gvt-dev@lists.freedesktop.org
 L:     intel-gfx@lists.freedesktop.org
 S:     Supported
-W:     https://01.org/igvt-g
+W:     https://github.com/intel/gvt-linux/wiki
 T:     git https://github.com/intel/gvt-linux.git
 F:     drivers/gpu/drm/i915/gvt/
 
@@ -11126,7 +11133,6 @@ S:      Supported
 F:     drivers/net/wireless/intel/iwlegacy/
 
 INTEL WIRELESS WIFI LINK (iwlwifi)
-M:     Gregory Greenman <gregory.greenman@intel.com>
 M:     Miri Korenblit <miriam.rachel.korenblit@intel.com>
 L:     linux-wireless@vger.kernel.org
 S:     Supported
@@ -11724,6 +11730,7 @@ F:      fs/smb/server/
 KERNEL UNIT TESTING FRAMEWORK (KUnit)
 M:     Brendan Higgins <brendanhiggins@google.com>
 M:     David Gow <davidgow@google.com>
+R:     Rae Moar <rmoar@google.com>
 L:     linux-kselftest@vger.kernel.org
 L:     kunit-dev@googlegroups.com
 S:     Maintained
@@ -12510,7 +12517,6 @@ F:      arch/powerpc/include/asm/livepatch.h
 F:     include/linux/livepatch.h
 F:     kernel/livepatch/
 F:     kernel/module/livepatch.c
-F:     lib/livepatch/
 F:     samples/livepatch/
 F:     tools/testing/selftests/livepatch/
 
@@ -12902,6 +12908,8 @@ M:      Alejandro Colomar <alx@kernel.org>
 L:     linux-man@vger.kernel.org
 S:     Maintained
 W:     http://www.kernel.org/doc/man-pages
+T:     git git://git.kernel.org/pub/scm/docs/man-pages/man-pages.git
+T:     git git://www.alejandro-colomar.es/src/alx/linux/man-pages/man-pages.git
 
 MANAGEMENT COMPONENT TRANSPORT PROTOCOL (MCTP)
 M:     Jeremy Kerr <jk@codeconstruct.com.au>
@@ -14103,6 +14111,17 @@ F:     mm/
 F:     tools/mm/
 F:     tools/testing/selftests/mm/
 
+MEMORY MAPPING
+M:     Andrew Morton <akpm@linux-foundation.org>
+R:     Liam R. Howlett <Liam.Howlett@oracle.com>
+R:     Vlastimil Babka <vbabka@suse.cz>
+R:     Lorenzo Stoakes <lstoakes@gmail.com>
+L:     linux-mm@kvack.org
+S:     Maintained
+W:     http://www.linux-mm.org
+T:     git git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
+F:     mm/mmap.c
+
 MEMORY TECHNOLOGY DEVICES (MTD)
 M:     Miquel Raynal <miquel.raynal@bootlin.com>
 M:     Richard Weinberger <richard@nod.at>
@@ -14361,7 +14380,7 @@ MICROCHIP MCP16502 PMIC DRIVER
 M:     Claudiu Beznea <claudiu.beznea@tuxon.dev>
 L:     linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
 S:     Supported
-F:     Documentation/devicetree/bindings/regulator/mcp16502-regulator.txt
+F:     Documentation/devicetree/bindings/regulator/microchip,mcp16502.yaml
 F:     drivers/regulator/mcp16502.c
 
 MICROCHIP MCP3564 ADC DRIVER
@@ -15177,6 +15196,7 @@ F:      Documentation/networking/net_cachelines/net_device.rst
 F:     drivers/connector/
 F:     drivers/net/
 F:     include/dt-bindings/net/
+F:     include/linux/cn_proc.h
 F:     include/linux/etherdevice.h
 F:     include/linux/fcdevice.h
 F:     include/linux/fddidevice.h
@@ -15184,6 +15204,7 @@ F:      include/linux/hippidevice.h
 F:     include/linux/if_*
 F:     include/linux/inetdevice.h
 F:     include/linux/netdevice.h
+F:     include/uapi/linux/cn_proc.h
 F:     include/uapi/linux/if_*
 F:     include/uapi/linux/netdevice.h
 X:     drivers/net/wireless/
@@ -15232,6 +15253,8 @@ F:      Documentation/networking/
 F:     Documentation/networking/net_cachelines/
 F:     Documentation/process/maintainer-netdev.rst
 F:     Documentation/userspace-api/netlink/
+F:     include/linux/framer/framer-provider.h
+F:     include/linux/framer/framer.h
 F:     include/linux/in.h
 F:     include/linux/indirect_call_wrapper.h
 F:     include/linux/net.h
@@ -15319,7 +15342,7 @@ K:      \bmdo_
 NETWORKING [MPTCP]
 M:     Matthieu Baerts <matttbe@kernel.org>
 M:     Mat Martineau <martineau@kernel.org>
-R:     Geliang Tang <geliang.tang@linux.dev>
+R:     Geliang Tang <geliang@kernel.org>
 L:     netdev@vger.kernel.org
 L:     mptcp@lists.linux.dev
 S:     Maintained
@@ -15566,16 +15589,6 @@ W:     https://github.com/davejiang/linux/wiki
 T:     git https://github.com/davejiang/linux.git
 F:     drivers/ntb/hw/intel/
 
-NTFS FILESYSTEM
-M:     Anton Altaparmakov <anton@tuxera.com>
-R:     Namjae Jeon <linkinjeon@kernel.org>
-L:     linux-ntfs-dev@lists.sourceforge.net
-S:     Supported
-W:     http://www.tuxera.com/
-T:     git git://git.kernel.org/pub/scm/linux/kernel/git/aia21/ntfs.git
-F:     Documentation/filesystems/ntfs.rst
-F:     fs/ntfs/
-
 NTFS3 FILESYSTEM
 M:     Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
 L:     ntfs3@lists.linux.dev
@@ -15704,7 +15717,7 @@ F:      drivers/iio/gyro/fxas21002c_spi.c
 NXP i.MX 7D/6SX/6UL/93 AND VF610 ADC DRIVER
 M:     Haibo Chen <haibo.chen@nxp.com>
 L:     linux-iio@vger.kernel.org
-L:     linux-imx@nxp.com
+L:     imx@lists.linux.dev
 S:     Maintained
 F:     Documentation/devicetree/bindings/iio/adc/fsl,imx7d-adc.yaml
 F:     Documentation/devicetree/bindings/iio/adc/fsl,vf610-adc.yaml
@@ -15741,7 +15754,7 @@ F:      drivers/gpu/drm/imx/dcss/
 NXP i.MX 8QXP ADC DRIVER
 M:     Cai Huoqing <cai.huoqing@linux.dev>
 M:     Haibo Chen <haibo.chen@nxp.com>
-L:     linux-imx@nxp.com
+L:     imx@lists.linux.dev
 L:     linux-iio@vger.kernel.org
 S:     Maintained
 F:     Documentation/devicetree/bindings/iio/adc/nxp,imx8qxp-adc.yaml
@@ -15749,7 +15762,7 @@ F:      drivers/iio/adc/imx8qxp-adc.c
 
 NXP i.MX 8QXP/8QM JPEG V4L2 DRIVER
 M:     Mirela Rabulea <mirela.rabulea@nxp.com>
-R:     NXP Linux Team <linux-imx@nxp.com>
+L:     imx@lists.linux.dev
 L:     linux-media@vger.kernel.org
 S:     Maintained
 F:     Documentation/devicetree/bindings/media/nxp,imx8-jpeg.yaml
@@ -15759,7 +15772,7 @@ NXP i.MX CLOCK DRIVERS
 M:     Abel Vesa <abelvesa@kernel.org>
 R:     Peng Fan <peng.fan@nxp.com>
 L:     linux-clk@vger.kernel.org
-L:     linux-imx@nxp.com
+L:     imx@lists.linux.dev
 S:     Maintained
 T:     git git://git.kernel.org/pub/scm/linux/kernel/git/abelvesa/linux.git clk/imx
 F:     Documentation/devicetree/bindings/clock/imx*
@@ -16720,6 +16733,7 @@ F:      drivers/pci/controller/dwc/*layerscape*
 PCI DRIVER FOR FU740
 M:     Paul Walmsley <paul.walmsley@sifive.com>
 M:     Greentime Hu <greentime.hu@sifive.com>
+M:     Samuel Holland <samuel.holland@sifive.com>
 L:     linux-pci@vger.kernel.org
 S:     Maintained
 F:     Documentation/devicetree/bindings/pci/sifive,fu740-pcie.yaml
@@ -16832,6 +16846,7 @@ F:      drivers/pci/controller/dwc/*designware*
 
 PCI DRIVER FOR TI DRA7XX/J721E
 M:     Vignesh Raghavendra <vigneshr@ti.com>
+R:     Siddharth Vadapalli <s-vadapalli@ti.com>
 L:     linux-omap@vger.kernel.org
 L:     linux-pci@vger.kernel.org
 L:     linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
@@ -16856,9 +16871,8 @@ F:      Documentation/devicetree/bindings/pci/xilinx-versal-cpm.yaml
 F:     drivers/pci/controller/pcie-xilinx-cpm.c
 
 PCI ENDPOINT SUBSYSTEM
-M:     Lorenzo Pieralisi <lpieralisi@kernel.org>
+M:     Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
 M:     Krzysztof Wilczyński <kw@linux.com>
-R:     Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
 R:     Kishon Vijay Abraham I <kishon@kernel.org>
 L:     linux-pci@vger.kernel.org
 S:     Supported
@@ -17178,7 +17192,7 @@ R:      John Garry <john.g.garry@oracle.com>
 R:     Will Deacon <will@kernel.org>
 R:     James Clark <james.clark@arm.com>
 R:     Mike Leach <mike.leach@linaro.org>
-R:     Leo Yan <leo.yan@linaro.org>
+R:     Leo Yan <leo.yan@linux.dev>
 L:     linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
 S:     Supported
 F:     tools/build/feature/test-libopencsd.c
@@ -17525,6 +17539,7 @@ F:      Documentation/devicetree/bindings/power/supply/
 F:     drivers/power/supply/
 F:     include/linux/power/
 F:     include/linux/power_supply.h
+F:     tools/testing/selftests/power_supply/
 
 POWERNV OPERATOR PANEL LCD DISPLAY DRIVER
 M:     Suraj Jitindar Singh <sjitindarsingh@gmail.com>
@@ -17972,33 +17987,34 @@ F:    drivers/media/tuners/qt1010*
 
 QUALCOMM ATH12K WIRELESS DRIVER
 M:     Kalle Valo <kvalo@kernel.org>
-M:     Jeff Johnson <quic_jjohnson@quicinc.com>
+M:     Jeff Johnson <jjohnson@kernel.org>
 L:     ath12k@lists.infradead.org
 S:     Supported
 W:     https://wireless.wiki.kernel.org/en/users/Drivers/ath12k
 T:     git git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git
 F:     drivers/net/wireless/ath/ath12k/
+N:     ath12k
 
 QUALCOMM ATHEROS ATH10K WIRELESS DRIVER
 M:     Kalle Valo <kvalo@kernel.org>
-M:     Jeff Johnson <quic_jjohnson@quicinc.com>
+M:     Jeff Johnson <jjohnson@kernel.org>
 L:     ath10k@lists.infradead.org
 S:     Supported
 W:     https://wireless.wiki.kernel.org/en/users/Drivers/ath10k
 T:     git git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git
-F:     Documentation/devicetree/bindings/net/wireless/qcom,ath10k.yaml
 F:     drivers/net/wireless/ath/ath10k/
+N:     ath10k
 
 QUALCOMM ATHEROS ATH11K WIRELESS DRIVER
 M:     Kalle Valo <kvalo@kernel.org>
-M:     Jeff Johnson <quic_jjohnson@quicinc.com>
+M:     Jeff Johnson <jjohnson@kernel.org>
 L:     ath11k@lists.infradead.org
 S:     Supported
 W:     https://wireless.wiki.kernel.org/en/users/Drivers/ath11k
 B:     https://wireless.wiki.kernel.org/en/users/Drivers/ath11k/bugreport
 T:     git git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git
-F:     Documentation/devicetree/bindings/net/wireless/qcom,ath11k.yaml
 F:     drivers/net/wireless/ath/ath11k/
+N:     ath11k
 
 QUALCOMM ATHEROS ATH9K WIRELESS DRIVER
 M:     Toke Høiland-Jørgensen <toke@toke.dk>
@@ -18081,7 +18097,6 @@ F:      drivers/net/ethernet/qualcomm/emac/
 
 QUALCOMM ETHQOS ETHERNET DRIVER
 M:     Vinod Koul <vkoul@kernel.org>
-R:     Bhupesh Sharma <bhupesh.sharma@linaro.org>
 L:     netdev@vger.kernel.org
 L:     linux-arm-msm@vger.kernel.org
 S:     Maintained
@@ -18428,7 +18443,7 @@ S:      Supported
 F:     drivers/infiniband/sw/rdmavt
 
 RDS - RELIABLE DATAGRAM SOCKETS
-M:     Santosh Shilimkar <santosh.shilimkar@oracle.com>
+M:     Allison Henderson <allison.henderson@oracle.com>
 L:     netdev@vger.kernel.org
 L:     linux-rdma@vger.kernel.org
 L:     rds-devel@oss.oracle.com (moderated for non-subscribers)
@@ -19095,6 +19110,7 @@ F:      Documentation/rust/
 F:     rust/
 F:     samples/rust/
 F:     scripts/*rust*
+F:     tools/testing/selftests/rust/
 K:     \b(?i:rust)\b
 
 RXRPC SOCKETS (AF_RXRPC)
@@ -19630,7 +19646,7 @@ F:      drivers/mmc/host/sdhci-of-at91.c
 
 SECURE DIGITAL HOST CONTROLLER INTERFACE (SDHCI) NXP i.MX DRIVER
 M:     Haibo Chen <haibo.chen@nxp.com>
-L:     linux-imx@nxp.com
+L:     imx@lists.linux.dev
 L:     linux-mmc@vger.kernel.org
 S:     Maintained
 F:     drivers/mmc/host/sdhci-esdhc-imx.c
@@ -19965,36 +19981,15 @@ S:    Maintained
 F:     drivers/watchdog/simatic-ipc-wdt.c
 
 SIFIVE DRIVERS
-M:     Palmer Dabbelt <palmer@dabbelt.com>
 M:     Paul Walmsley <paul.walmsley@sifive.com>
+M:     Samuel Holland <samuel.holland@sifive.com>
 L:     linux-riscv@lists.infradead.org
 S:     Supported
+F:     drivers/dma/sf-pdma/
 N:     sifive
+K:     fu[57]40
 K:     [^@]sifive
 
-SIFIVE CACHE DRIVER
-M:     Conor Dooley <conor@kernel.org>
-L:     linux-riscv@lists.infradead.org
-S:     Maintained
-F:     Documentation/devicetree/bindings/cache/sifive,ccache0.yaml
-F:     drivers/cache/sifive_ccache.c
-
-SIFIVE FU540 SYSTEM-ON-CHIP
-M:     Paul Walmsley <paul.walmsley@sifive.com>
-M:     Palmer Dabbelt <palmer@dabbelt.com>
-L:     linux-riscv@lists.infradead.org
-S:     Supported
-T:     git git://git.kernel.org/pub/scm/linux/kernel/git/pjw/sifive.git
-N:     fu540
-K:     fu540
-
-SIFIVE PDMA DRIVER
-M:     Green Wan <green.wan@sifive.com>
-S:     Maintained
-F:     Documentation/devicetree/bindings/dma/sifive,fu540-c000-pdma.yaml
-F:     drivers/dma/sf-pdma/
-
-
 SILEAD TOUCHSCREEN DRIVER
 M:     Hans de Goede <hdegoede@redhat.com>
 L:     linux-input@vger.kernel.org
@@ -20203,8 +20198,8 @@ F:      Documentation/devicetree/bindings/net/socionext,uniphier-ave4.yaml
 F:     drivers/net/ethernet/socionext/sni_ave.c
 
 SOCIONEXT (SNI) NETSEC NETWORK DRIVER
-M:     Jassi Brar <jaswinder.singh@linaro.org>
 M:     Ilias Apalodimas <ilias.apalodimas@linaro.org>
+M:     Masahisa Kojima <kojima.masahisa@socionext.com>
 L:     netdev@vger.kernel.org
 S:     Maintained
 F:     Documentation/devicetree/bindings/net/socionext,synquacer-netsec.yaml
@@ -20549,6 +20544,7 @@ F:      Documentation/translations/sp_SP/
 
 SPARC + UltraSPARC (sparc/sparc64)
 M:     "David S. Miller" <davem@davemloft.net>
+M:     Andreas Larsson <andreas@gaisler.com>
 L:     sparclinux@vger.kernel.org
 S:     Maintained
 Q:     http://patchwork.ozlabs.org/project/sparclinux/list/
@@ -22005,6 +22001,14 @@ F:     Documentation/devicetree/bindings/media/i2c/ti,ds90*
 F:     drivers/media/i2c/ds90*
 F:     include/media/i2c/ds90*
 
+TI HDC302X HUMIDITY DRIVER
+M:     Javier Carrasco <javier.carrasco.cruz@gmail.com>
+M:     Li peiyu <579lpy@gmail.com>
+L:     linux-iio@vger.kernel.org
+S:     Maintained
+F:     Documentation/devicetree/bindings/iio/humidity/ti,hdc3020.yaml
+F:     drivers/iio/humidity/hdc3020.c
+
 TI ICSSG ETHERNET DRIVER (ICSSG)
 R:     MD Danish Anwar <danishanwar@ti.com>
 R:     Roger Quadros <rogerq@kernel.org>
@@ -22860,9 +22864,8 @@ S:      Maintained
 F:     drivers/usb/typec/mux/pi3usb30532.c
 
 USB TYPEC PORT CONTROLLER DRIVERS
-M:     Guenter Roeck <linux@roeck-us.net>
 L:     linux-usb@vger.kernel.org
-S:     Maintained
+S:     Orphan
 F:     drivers/usb/typec/tcpm/
 
 USB UHCI DRIVER
@@ -24339,13 +24342,6 @@ T:     git git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs.git
 F:     Documentation/filesystems/zonefs.rst
 F:     fs/zonefs/
 
-ZPOOL COMPRESSED PAGE STORAGE API
-M:     Dan Streetman <ddstreet@ieee.org>
-L:     linux-mm@kvack.org
-S:     Maintained
-F:     include/linux/zpool.h
-F:     mm/zpool.c
-
 ZR36067 VIDEO FOR LINUX DRIVER
 M:     Corentin Labbe <clabbe@baylibre.com>
 L:     mjpeg-users@lists.sourceforge.net
@@ -24397,7 +24393,9 @@ M:      Nhat Pham <nphamcs@gmail.com>
 L:     linux-mm@kvack.org
 S:     Maintained
 F:     Documentation/admin-guide/mm/zswap.rst
+F:     include/linux/zpool.h
 F:     include/linux/zswap.h
+F:     mm/zpool.c
 F:     mm/zswap.c
 
 THE REST
index cbcdd8d9d0e3d094f966311caf5a7d13fc375dee..d18fa2a6240ddeee632215061d10ba3c7fb6ce24 100644 (file)
--- a/Makefile
+++ b/Makefile
@@ -2,7 +2,7 @@
 VERSION = 6
 PATCHLEVEL = 8
 SUBLEVEL = 0
-EXTRAVERSION = -rc1
+EXTRAVERSION =
 NAME = Hurr durr I'ma ninja sloth
 
 # *DOCUMENTATION*
@@ -294,15 +294,15 @@ may-sync-config   := 1
 single-build   :=
 
 ifneq ($(filter $(no-dot-config-targets), $(MAKECMDGOALS)),)
-       ifeq ($(filter-out $(no-dot-config-targets), $(MAKECMDGOALS)),)
+    ifeq ($(filter-out $(no-dot-config-targets), $(MAKECMDGOALS)),)
                need-config :=
-       endif
+    endif
 endif
 
 ifneq ($(filter $(no-sync-config-targets), $(MAKECMDGOALS)),)
-       ifeq ($(filter-out $(no-sync-config-targets), $(MAKECMDGOALS)),)
+    ifeq ($(filter-out $(no-sync-config-targets), $(MAKECMDGOALS)),)
                may-sync-config :=
-       endif
+    endif
 endif
 
 need-compiler := $(may-sync-config)
@@ -323,9 +323,9 @@ endif
 # We cannot build single targets and the others at the same time
 ifneq ($(filter $(single-targets), $(MAKECMDGOALS)),)
        single-build := 1
-       ifneq ($(filter-out $(single-targets), $(MAKECMDGOALS)),)
+    ifneq ($(filter-out $(single-targets), $(MAKECMDGOALS)),)
                mixed-build := 1
-       endif
+    endif
 endif
 
 # For "make -j clean all", "make -j mrproper defconfig all", etc.
@@ -986,6 +986,10 @@ NOSTDINC_FLAGS += -nostdinc
 # perform bounds checking.
 KBUILD_CFLAGS += $(call cc-option, -fstrict-flex-arrays=3)
 
+# Currently, disable -Wstringop-overflow for GCC 11, globally.
+KBUILD_CFLAGS-$(CONFIG_CC_NO_STRINGOP_OVERFLOW) += $(call cc-option, -Wno-stringop-overflow)
+KBUILD_CFLAGS-$(CONFIG_CC_STRINGOP_OVERFLOW) += $(call cc-option, -Wstringop-overflow)
+
 # disable invalid "can't wrap" optimizations for signed / pointers
 KBUILD_CFLAGS  += -fno-strict-overflow
 
@@ -1662,7 +1666,7 @@ help:
        @echo  '                       (sparse by default)'
        @echo  '  make C=2   [targets] Force check of all c source with $$CHECK'
        @echo  '  make RECORDMCOUNT_WARN=1 [targets] Warn about ignored mcount sections'
-       @echo  '  make W=n   [targets] Enable extra build checks, n=1,2,3 where'
+       @echo  '  make W=n   [targets] Enable extra build checks, n=1,2,3,c,e where'
        @echo  '                1: warnings which may be relevant and do not occur too often'
        @echo  '                2: warnings which occur quite often but may still be relevant'
        @echo  '                3: more obscure warnings, can most likely be ignored'
index c91917b508736d1fa0d37d5bf3b1e4bf5550e211..a5af0edd3eb8f3b64e6e51bffb2ac491cb31bc26 100644 (file)
@@ -673,6 +673,7 @@ config SHADOW_CALL_STACK
        bool "Shadow Call Stack"
        depends on ARCH_SUPPORTS_SHADOW_CALL_STACK
        depends on DYNAMIC_FTRACE_WITH_ARGS || DYNAMIC_FTRACE_WITH_REGS || !FUNCTION_GRAPH_TRACER
+       depends on MMU
        help
          This option enables the compiler's Shadow Call Stack, which
          uses a shadow stack to protect function return addresses from
index 9d96180797396bba26ace54f047f7a47bf82dd5f..a339223d9e052b35ea678d6a3e60faf6e5673671 100644 (file)
@@ -31,7 +31,7 @@
 static __always_inline bool arch_static_branch(struct static_key *key,
                                               bool branch)
 {
-       asm_volatile_goto(".balign "__stringify(JUMP_LABEL_NOP_SIZE)"   \n"
+       asm goto(".balign "__stringify(JUMP_LABEL_NOP_SIZE)"            \n"
                 "1:                                                    \n"
                 "nop                                                   \n"
                 ".pushsection __jump_table, \"aw\"                     \n"
@@ -47,7 +47,7 @@ l_yes:
 static __always_inline bool arch_static_branch_jump(struct static_key *key,
                                                    bool branch)
 {
-       asm_volatile_goto(".balign "__stringify(JUMP_LABEL_NOP_SIZE)"   \n"
+       asm goto(".balign "__stringify(JUMP_LABEL_NOP_SIZE)"            \n"
                 "1:                                                    \n"
                 "b %l[l_yes]                                           \n"
                 ".pushsection __jump_table, \"aw\"                     \n"
index ff68dfb4eb7874a00d398bf7dfc2d242385c5620..90bd12feac010108def3f68756edf4e2d76c2e84 100644 (file)
                msix: msix@fbe00000 {
                        compatible = "al,alpine-msix";
                        reg = <0x0 0xfbe00000 0x0 0x100000>;
-                       interrupt-controller;
                        msi-controller;
                        al,msi-base-spi = <96>;
                        al,msi-num-spis = <64>;
index e899de681f4752d4077b55a0cd4f8858c6e23df0..5be0e8fd2633c20e2d87abc843b53fca437942be 100644 (file)
@@ -45,8 +45,8 @@
                num-chipselects = <1>;
                cs-gpios = <&gpio0 ASPEED_GPIO(Z, 0) GPIO_ACTIVE_LOW>;
 
-               tpmdev@0 {
-                       compatible = "tcg,tpm_tis-spi";
+               tpm@0 {
+                       compatible = "infineon,slb9670", "tcg,tpm_tis-spi";
                        spi-max-frequency = <33000000>;
                        reg = <0>;
                };
index a677c827e758fe2042fcf14a192832668e3ffbd0..5a8169bbda8792c76c1da960508c8a0c6bdd4b86 100644 (file)
@@ -80,8 +80,8 @@
                gpio-miso = <&gpio ASPEED_GPIO(R, 5) GPIO_ACTIVE_HIGH>;
                num-chipselects = <1>;
 
-               tpmdev@0 {
-                       compatible = "tcg,tpm_tis-spi";
+               tpm@0 {
+                       compatible = "infineon,slb9670", "tcg,tpm_tis-spi";
                        spi-max-frequency = <33000000>;
                        reg = <0>;
                };
index 3f6010ef2b86f264fe88935a737b3ce9c60d762b..213023bc5aec4144751c9e7bc8e3e05c156386c8 100644 (file)
        status = "okay";
 
        tpm: tpm@2e {
-               compatible = "tcg,tpm-tis-i2c";
+               compatible = "nuvoton,npct75x", "tcg,tpm-tis-i2c";
                reg = <0x2e>;
        };
 };
index 530491ae5eb26060f68802cf3318914f7fb2d361..857cb26ed6d7e8acd13c5695daa9fb3b8699c3c1 100644 (file)
        i2c0: i2c-bus@40 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
 
                reg = <0x40 0x40>;
                compatible = "aspeed,ast2400-i2c-bus";
        i2c1: i2c-bus@80 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
 
                reg = <0x80 0x40>;
                compatible = "aspeed,ast2400-i2c-bus";
        i2c2: i2c-bus@c0 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
 
                reg = <0xc0 0x40>;
                compatible = "aspeed,ast2400-i2c-bus";
        i2c3: i2c-bus@100 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
 
                reg = <0x100 0x40>;
                compatible = "aspeed,ast2400-i2c-bus";
        i2c4: i2c-bus@140 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
 
                reg = <0x140 0x40>;
                compatible = "aspeed,ast2400-i2c-bus";
        i2c5: i2c-bus@180 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
 
                reg = <0x180 0x40>;
                compatible = "aspeed,ast2400-i2c-bus";
        i2c6: i2c-bus@1c0 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
 
                reg = <0x1c0 0x40>;
                compatible = "aspeed,ast2400-i2c-bus";
        i2c7: i2c-bus@300 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
 
                reg = <0x300 0x40>;
                compatible = "aspeed,ast2400-i2c-bus";
        i2c8: i2c-bus@340 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
 
                reg = <0x340 0x40>;
                compatible = "aspeed,ast2400-i2c-bus";
        i2c9: i2c-bus@380 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
 
                reg = <0x380 0x40>;
                compatible = "aspeed,ast2400-i2c-bus";
        i2c10: i2c-bus@3c0 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
 
                reg = <0x3c0 0x40>;
                compatible = "aspeed,ast2400-i2c-bus";
        i2c11: i2c-bus@400 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
 
                reg = <0x400 0x40>;
                compatible = "aspeed,ast2400-i2c-bus";
        i2c12: i2c-bus@440 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
 
                reg = <0x440 0x40>;
                compatible = "aspeed,ast2400-i2c-bus";
        i2c13: i2c-bus@480 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
 
                reg = <0x480 0x40>;
                compatible = "aspeed,ast2400-i2c-bus";
index 04f98d1dbb97c84c318c7e6a133fbf4572237c47..e6f3cf3c721e574f8b9975254cdcc79e3ce3b725 100644 (file)
                                interrupts = <40>;
                                reg = <0x1e780200 0x0100>;
                                clocks = <&syscon ASPEED_CLK_APB>;
+                               #interrupt-cells = <2>;
                                interrupt-controller;
                                bus-frequency = <12000000>;
                                pinctrl-names = "default";
        i2c0: i2c-bus@40 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
 
                reg = <0x40 0x40>;
                compatible = "aspeed,ast2500-i2c-bus";
        i2c1: i2c-bus@80 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
 
                reg = <0x80 0x40>;
                compatible = "aspeed,ast2500-i2c-bus";
        i2c2: i2c-bus@c0 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
 
                reg = <0xc0 0x40>;
                compatible = "aspeed,ast2500-i2c-bus";
        i2c3: i2c-bus@100 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
 
                reg = <0x100 0x40>;
                compatible = "aspeed,ast2500-i2c-bus";
        i2c4: i2c-bus@140 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
 
                reg = <0x140 0x40>;
                compatible = "aspeed,ast2500-i2c-bus";
        i2c5: i2c-bus@180 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
 
                reg = <0x180 0x40>;
                compatible = "aspeed,ast2500-i2c-bus";
        i2c6: i2c-bus@1c0 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
 
                reg = <0x1c0 0x40>;
                compatible = "aspeed,ast2500-i2c-bus";
        i2c7: i2c-bus@300 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
 
                reg = <0x300 0x40>;
                compatible = "aspeed,ast2500-i2c-bus";
        i2c8: i2c-bus@340 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
 
                reg = <0x340 0x40>;
                compatible = "aspeed,ast2500-i2c-bus";
        i2c9: i2c-bus@380 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
 
                reg = <0x380 0x40>;
                compatible = "aspeed,ast2500-i2c-bus";
        i2c10: i2c-bus@3c0 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
 
                reg = <0x3c0 0x40>;
                compatible = "aspeed,ast2500-i2c-bus";
        i2c11: i2c-bus@400 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
 
                reg = <0x400 0x40>;
                compatible = "aspeed,ast2500-i2c-bus";
        i2c12: i2c-bus@440 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
 
                reg = <0x440 0x40>;
                compatible = "aspeed,ast2500-i2c-bus";
        i2c13: i2c-bus@480 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
 
                reg = <0x480 0x40>;
                compatible = "aspeed,ast2500-i2c-bus";
index c4d1faade8be33d52c91f797f3fedaa0b22566a2..29f94696d8b189cba0113e7a65bbb25611358710 100644 (file)
                                reg = <0x1e780500 0x100>;
                                interrupts = <GIC_SPI 51 IRQ_TYPE_LEVEL_HIGH>;
                                clocks = <&syscon ASPEED_CLK_APB2>;
+                               #interrupt-cells = <2>;
                                interrupt-controller;
                                bus-frequency = <12000000>;
                                pinctrl-names = "default";
                                reg = <0x1e780600 0x100>;
                                interrupts = <GIC_SPI 70 IRQ_TYPE_LEVEL_HIGH>;
                                clocks = <&syscon ASPEED_CLK_APB2>;
+                               #interrupt-cells = <2>;
                                interrupt-controller;
                                bus-frequency = <12000000>;
                                pinctrl-names = "default";
        i2c0: i2c-bus@80 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
                reg = <0x80 0x80>;
                compatible = "aspeed,ast2600-i2c-bus";
                clocks = <&syscon ASPEED_CLK_APB2>;
        i2c1: i2c-bus@100 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
                reg = <0x100 0x80>;
                compatible = "aspeed,ast2600-i2c-bus";
                clocks = <&syscon ASPEED_CLK_APB2>;
        i2c2: i2c-bus@180 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
                reg = <0x180 0x80>;
                compatible = "aspeed,ast2600-i2c-bus";
                clocks = <&syscon ASPEED_CLK_APB2>;
        i2c3: i2c-bus@200 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
                reg = <0x200 0x80>;
                compatible = "aspeed,ast2600-i2c-bus";
                clocks = <&syscon ASPEED_CLK_APB2>;
        i2c4: i2c-bus@280 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
                reg = <0x280 0x80>;
                compatible = "aspeed,ast2600-i2c-bus";
                clocks = <&syscon ASPEED_CLK_APB2>;
        i2c5: i2c-bus@300 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
                reg = <0x300 0x80>;
                compatible = "aspeed,ast2600-i2c-bus";
                clocks = <&syscon ASPEED_CLK_APB2>;
        i2c6: i2c-bus@380 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
                reg = <0x380 0x80>;
                compatible = "aspeed,ast2600-i2c-bus";
                clocks = <&syscon ASPEED_CLK_APB2>;
        i2c7: i2c-bus@400 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
                reg = <0x400 0x80>;
                compatible = "aspeed,ast2600-i2c-bus";
                clocks = <&syscon ASPEED_CLK_APB2>;
        i2c8: i2c-bus@480 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
                reg = <0x480 0x80>;
                compatible = "aspeed,ast2600-i2c-bus";
                clocks = <&syscon ASPEED_CLK_APB2>;
        i2c9: i2c-bus@500 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
                reg = <0x500 0x80>;
                compatible = "aspeed,ast2600-i2c-bus";
                clocks = <&syscon ASPEED_CLK_APB2>;
        i2c10: i2c-bus@580 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
                reg = <0x580 0x80>;
                compatible = "aspeed,ast2600-i2c-bus";
                clocks = <&syscon ASPEED_CLK_APB2>;
        i2c11: i2c-bus@600 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
                reg = <0x600 0x80>;
                compatible = "aspeed,ast2600-i2c-bus";
                clocks = <&syscon ASPEED_CLK_APB2>;
        i2c12: i2c-bus@680 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
                reg = <0x680 0x80>;
                compatible = "aspeed,ast2600-i2c-bus";
                clocks = <&syscon ASPEED_CLK_APB2>;
        i2c13: i2c-bus@700 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
                reg = <0x700 0x80>;
                compatible = "aspeed,ast2600-i2c-bus";
                clocks = <&syscon ASPEED_CLK_APB2>;
        i2c14: i2c-bus@780 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
                reg = <0x780 0x80>;
                compatible = "aspeed,ast2600-i2c-bus";
                clocks = <&syscon ASPEED_CLK_APB2>;
        i2c15: i2c-bus@800 {
                #address-cells = <1>;
                #size-cells = <0>;
-               #interrupt-cells = <1>;
                reg = <0x800 0x80>;
                compatible = "aspeed,ast2600-i2c-bus";
                clocks = <&syscon ASPEED_CLK_APB2>;
index 31590d3186a2e099e44c663c46a87975b60aae27..00e5887c926f181d57bebe6b0b781ad2f2e8a514 100644 (file)
@@ -35,8 +35,8 @@
                gpio-mosi = <&gpio0 ASPEED_GPIO(X, 4) GPIO_ACTIVE_HIGH>;
                gpio-miso = <&gpio0 ASPEED_GPIO(X, 5) GPIO_ACTIVE_HIGH>;
 
-               tpmdev@0 {
-                       compatible = "tcg,tpm_tis-spi";
+               tpm@0 {
+                       compatible = "infineon,slb9670", "tcg,tpm_tis-spi";
                        spi-max-frequency = <33000000>;
                        reg = <0>;
                };
index f9f79ed825181b7e71b12f87d7ba21ade0fd6d4d..07ca0d993c9fdb27ef50e3c450f3472ebe67f858 100644 (file)
                        #gpio-cells = <2>;
                        gpio-controller;
                        interrupt-controller;
+                       #interrupt-cells = <2>;
                        interrupt-parent = <&mailbox>;
                        interrupts = <0>;
                };
                        gpio-controller;
                        interrupts = <GIC_SPI 84 IRQ_TYPE_LEVEL_HIGH>;
                        interrupt-controller;
+                       #interrupt-cells = <2>;
                };
 
                i2c1: i2c@1800b000 {
                        gpio-controller;
 
                        interrupt-controller;
+                       #interrupt-cells = <2>;
                        interrupts = <GIC_SPI 174 IRQ_TYPE_LEVEL_HIGH>;
                        gpio-ranges = <&pinctrl 0 42 1>,
                                        <&pinctrl 1 44 3>,
index 788a6806191a33a04aa326a0645d5af06365571d..75545b10ef2fa69570f42422e15a2341d4cfaf92 100644 (file)
                        gpio-controller;
                        ngpios = <4>;
                        interrupt-controller;
+                       #interrupt-cells = <2>;
                        interrupts = <GIC_SPI 93 IRQ_TYPE_LEVEL_HIGH>;
                };
 
index 9d20ba3b1ffb13d4983f28e66de7ae140af528be..6a4482c9316741d89eb67371ac13a3670783b8fc 100644 (file)
                        gpio-controller;
                        ngpios = <32>;
                        interrupt-controller;
+                       #interrupt-cells = <2>;
                        interrupts = <GIC_SPI 85 IRQ_TYPE_LEVEL_HIGH>;
                        gpio-ranges = <&pinctrl 0 0 32>;
                };
                        gpio-controller;
                        ngpios = <4>;
                        interrupt-controller;
+                       #interrupt-cells = <2>;
                        interrupts = <GIC_SPI 87 IRQ_TYPE_LEVEL_HIGH>;
                };
 
index 4d70f6afd13ab5ee5df7ea621b56f81f4e642d41..6d5e69035f94dcaa3f323c833c1edd064d4f7dfd 100644 (file)
@@ -60,6 +60,8 @@
                         * We have slots (IDSEL) 1 and 2 with one assigned IRQ
                         * each handling all IRQs.
                         */
+                       #interrupt-cells = <1>;
+                       interrupt-map-mask = <0xf800 0 0 7>;
                        interrupt-map =
                        /* IDSEL 1 */
                        <0x0800 0 0 1 &gpio0 11 IRQ_TYPE_LEVEL_LOW>, /* INT A on slot 1 is irq 11 */
index 9ec0169bacf8c2098814ec6c1399e41c910df464..5f4c849915db71390ab3050b7277b7893b075307 100644 (file)
@@ -89,6 +89,8 @@
                         * The slots have Ethernet, Ethernet, NEC and MPCI.
                         * The IDSELs are 11, 12, 13, 14.
                         */
+                       #interrupt-cells = <1>;
+                       interrupt-map-mask = <0xf800 0 0 7>;
                        interrupt-map =
                        /* IDSEL 11 - Ethernet A */
                        <0x5800 0 0 1 &gpio0 4 IRQ_TYPE_LEVEL_LOW>, /* INT A on slot 11 is irq 4 */
index dffb9f84e67c50c63ba5268a9975c62b93e75157..c841eb8e7fb1d0404301f4f8b21899fb60b77a25 100644 (file)
@@ -65,6 +65,7 @@
                        gpio2: gpio-expander@20 {
                                #gpio-cells = <2>;
                                #interrupt-cells = <2>;
+                               interrupt-controller;
                                compatible = "semtech,sx1505q";
                                reg = <0x20>;
 
@@ -79,6 +80,7 @@
                        gpio3: gpio-expander@21 {
                                #gpio-cells = <2>;
                                #interrupt-cells = <2>;
+                               interrupt-controller;
                                compatible = "semtech,sx1505q";
                                reg = <0x21>;
 
index fd671c7a1e5d64c6eafb0a7434c7d14b19f4d1b6..6e1f0f164cb4f511d19774a8c39a9a3090d85b9d 100644 (file)
                                interrupts = <2 IRQ_TYPE_LEVEL_HIGH>,
                                             <3 IRQ_TYPE_LEVEL_HIGH>,
                                             <4 IRQ_TYPE_LEVEL_HIGH>;
+                               #interrupt-cells = <2>;
                                interrupt-controller;
                        };
 
                                gpio-controller;
                                #gpio-cells = <2>;
                                interrupts = <5 IRQ_TYPE_LEVEL_HIGH>;
+                               #interrupt-cells = <2>;
                                interrupt-controller;
                        };
 
index 1640763fd4af2216c225b95e60e954afd5255fb5..ff0d684622f74d13eb1b4b2c7178c38e93ab4293 100644 (file)
                        compatible = "st,stmpe811";
                        reg = <0x41>;
                        irq-gpio = <&gpio TEGRA_GPIO(V, 0) GPIO_ACTIVE_LOW>;
-                       interrupt-controller;
                        id = <0>;
                        blocks = <0x5>;
                        irq-trigger = <0x1>;
index 3b6fad273cabf17a6ddff7ede1d72de13079ed1f..d38f1dd38a9068371c25ddf82f4c284a555ffb03 100644 (file)
                        compatible = "st,stmpe811";
                        reg = <0x41>;
                        irq-gpio = <&gpio TEGRA_GPIO(V, 0) GPIO_ACTIVE_LOW>;
-                       interrupt-controller;
                        id = <0>;
                        blocks = <0x5>;
                        irq-trigger = <0x1>;
index 4eb526fe9c55888d6a595d68d3a95616bb913404..81c8a5fd92ccea33b3673d61302d39397e8fa72f 100644 (file)
                        compatible = "st,stmpe811";
                        reg = <0x41>;
                        irq-gpio = <&gpio TEGRA_GPIO(V, 0) GPIO_ACTIVE_LOW>;
-                       interrupt-controller;
                        id = <0>;
                        blocks = <0x5>;
                        irq-trigger = <0x1>;
index db8c332df6a1d53f1b3eff6572a9f080ac10fe0a..cad112e054758f7ce364f2346eb4e1e291086a61 100644 (file)
 
                #address-cells = <3>;
                #size-cells = <2>;
-               #interrupt-cells = <1>;
 
                bridge@2,1 {
                        compatible = "pci10b5,8605";
 
                        #address-cells = <3>;
                        #size-cells = <2>;
-                       #interrupt-cells = <1>;
 
                        /* Intel Corporation I210 Gigabit Network Connection */
                        ethernet@3,0 {
 
                        #address-cells = <3>;
                        #size-cells = <2>;
-                       #interrupt-cells = <1>;
 
                        /* Intel Corporation I210 Gigabit Network Connection */
                        switch_nic: ethernet@4,0 {
index 99f4f6ac71d4a18f6f6eb2f0476c47280ba844b7..c1ae7c47b44227c2438d4e7c73fbafd6eaa269b9 100644 (file)
                                reg = <0x74>;
                                gpio-controller;
                                #gpio-cells = <2>;
+                               #interrupt-cells = <2>;
                                interrupt-controller;
                                interrupt-parent = <&gpio2>;
                                interrupts = <3 IRQ_TYPE_LEVEL_LOW>;
 
                #address-cells = <3>;
                #size-cells = <2>;
-               #interrupt-cells = <1>;
        };
 };
 
index 2ae93f57fe5acac1f3f437b082e258ed81a391e0..ea40623d12e5fddc11b2af150ca6a80af93510a3 100644 (file)
                blocks = <0x5>;
                id = <0>;
                interrupts = <10 IRQ_TYPE_LEVEL_LOW>;
-               interrupt-controller;
                interrupt-parent = <&gpio4>;
                irq-trigger = <0x1>;
                pinctrl-names = "default";
index 55c90f6393ad5e1176b5f8af6ca94bcf9c368477..d3a7a6eeb8e09edff6963de86527e13899e3c956 100644 (file)
                blocks = <0x5>;
                interrupts = <20 IRQ_TYPE_LEVEL_LOW>;
                interrupt-parent = <&gpio6>;
-               interrupt-controller;
                id = <0>;
                irq-trigger = <0x1>;
                pinctrl-names = "default";
index a63e73adc1fc532175d8cd1baca8ede060f4d2f8..42b2ba23aefc9e26ddb3a8e0317013e30602fdbe 100644 (file)
                pinctrl-0 = <&pinctrl_pmic>;
                interrupt-parent = <&gpio2>;
                interrupts = <8 IRQ_TYPE_LEVEL_LOW>;
-               interrupt-controller;
 
                onkey {
                        compatible = "dlg,da9063-onkey";
index 113974520d544b72ff3397629935037c1d1cae53..c0c47adc5866e3ea157b499f15d8edf8b2d1fcde 100644 (file)
                reg = <0x58>;
                interrupt-parent = <&gpio2>;
                interrupts = <9 IRQ_TYPE_LEVEL_LOW>; /* active-low GPIO2_9 */
+               #interrupt-cells = <2>;
                interrupt-controller;
 
                regulators {
index 86b4269e0e0117b3906b625537444533c28510fb..85e278eb201610a1c851c4093025bb205e02a3b3 100644 (file)
                interrupt-parent = <&gpio1>;
                interrupts = <2 IRQ_TYPE_LEVEL_LOW>;
                interrupt-controller;
+               #interrupt-cells = <2>;
                gpio-controller;
                #gpio-cells = <2>;
 
index 44cc4ff1d0df358ab66bb036d127175da1be74b6..d12fb44aeb140cfacf05a5b257d2106c79392279 100644 (file)
        tpm_tis: tpm@1 {
                pinctrl-names = "default";
                pinctrl-0 = <&pinctrl_tpm>;
-               compatible = "tcg,tpm_tis-spi";
+               compatible = "infineon,slb9670", "tcg,tpm_tis-spi";
                reg = <1>;
                spi-max-frequency = <20000000>;
                interrupt-parent = <&gpio5>;
index 3a723843d5626f6cc4b9ee2750968c01e46306db..9984b343cdf0cad1abd9e0d4d142ded838c47980 100644 (file)
         * TCG specification - Section 6.4.1 Clocking:
         * TPM shall support a SPI clock frequency range of 10-24 MHz.
         */
-       st33htph: tpm-tis@0 {
+       st33htph: tpm@0 {
                compatible = "st,st33htpm-spi", "tcg,tpm_tis-spi";
                reg = <0>;
                spi-max-frequency = <24000000>;
index 12361fcbe24aff98a70482f2a7885c6ce28cb3b2..1b965652291bfaf5d6bad76ac3eaf10974eac6ea 100644 (file)
@@ -63,6 +63,7 @@
                gpio-controller;
                #gpio-cells = <2>;
                #interrupt-cells = <2>;
+               interrupt-controller;
                reg = <0x25>;
        };
 
index ebf7befcc11e3e8cd5985d72c384ae2248635bcc..9c81c6baa2d39ae7cd73a34144598d513423c343 100644 (file)
                                        <&clks IMX7D_LCDIF_PIXEL_ROOT_CLK>;
                                clock-names = "pix", "axi";
                                status = "disabled";
-
-                               port {
-                                       #address-cells = <1>;
-                                       #size-cells = <0>;
-
-                                       lcdif_out_mipi_dsi: endpoint@0 {
-                                               reg = <0>;
-                                               remote-endpoint = <&mipi_dsi_in_lcdif>;
-                                       };
-                               };
                        };
 
                        mipi_csi: mipi-csi@30750000 {
                                samsung,esc-clock-frequency = <20000000>;
                                samsung,pll-clock-frequency = <24000000>;
                                status = "disabled";
-
-                               ports {
-                                       #address-cells = <1>;
-                                       #size-cells = <0>;
-
-                                       port@0 {
-                                               reg = <0>;
-                                               #address-cells = <1>;
-                                               #size-cells = <0>;
-
-                                               mipi_dsi_in_lcdif: endpoint@0 {
-                                                       reg = <0>;
-                                                       remote-endpoint = <&lcdif_out_mipi_dsi>;
-                                               };
-                                       };
-                               };
                        };
                };
 
index b0ed68af0546702d9413c492da6796194208c347..029f49be40e373f706f7f67c34358ba9272ea0af 100644 (file)
                reg = <0x22>;
                gpio-controller;
                #gpio-cells = <2>;
+               #interrupt-cells = <2>;
                interrupt-controller;
                interrupt-parent = <&gpio3>;
                interrupts = <2 IRQ_TYPE_LEVEL_LOW>;
index 2045fc779f887030735f9310982bdef228f8a481..27429d0fedfba8ac6f144c55dbd49d295f5cec29 100644 (file)
                                          "msi8";
                        #interrupt-cells = <1>;
                        interrupt-map-mask = <0 0 0 0x7>;
-                       interrupt-map = <0 0 0 1 &intc 0 0 0 141 IRQ_TYPE_LEVEL_HIGH>, /* int_a */
-                                       <0 0 0 2 &intc 0 0 0 142 IRQ_TYPE_LEVEL_HIGH>, /* int_b */
-                                       <0 0 0 3 &intc 0 0 0 143 IRQ_TYPE_LEVEL_HIGH>, /* int_c */
-                                       <0 0 0 4 &intc 0 0 0 144 IRQ_TYPE_LEVEL_HIGH>; /* int_d */
+                       interrupt-map = <0 0 0 1 &intc 0 141 IRQ_TYPE_LEVEL_HIGH>, /* int_a */
+                                       <0 0 0 2 &intc 0 142 IRQ_TYPE_LEVEL_HIGH>, /* int_b */
+                                       <0 0 0 3 &intc 0 143 IRQ_TYPE_LEVEL_HIGH>, /* int_c */
+                                       <0 0 0 4 &intc 0 144 IRQ_TYPE_LEVEL_HIGH>; /* int_d */
 
                        clocks = <&gcc GCC_PCIE_PIPE_CLK>,
                                 <&gcc GCC_PCIE_AUX_CLK>,
index 2fba4d084001b9646ee012eb967e96a27695bfa6..8590981245a62057c2b61370e57a7627f36496e8 100644 (file)
                        interrupt-parent = <&irqc0>;
                        interrupts = <2 IRQ_TYPE_LEVEL_LOW>;
                        interrupt-controller;
+                       #interrupt-cells = <2>;
 
                        rtc {
                                compatible = "dlg,da9063-rtc";
index f9bc5b4f019d02136aa99631c1b2e8c67e9651de..683f7395fab0b6961e5f00a3985fc9b690469237 100644 (file)
                interrupt-parent = <&irqc0>;
                interrupts = <2 IRQ_TYPE_LEVEL_LOW>;
                interrupt-controller;
+               #interrupt-cells = <2>;
 
                onkey {
                        compatible = "dlg,da9063-onkey";
index e9c13bb03772af44eada731a13b5ee88a2e3de7c..0efd9f98c75aced03009396d1c6e6ac023d84c4a 100644 (file)
                interrupt-parent = <&irqc0>;
                interrupts = <2 IRQ_TYPE_LEVEL_LOW>;
                interrupt-controller;
+               #interrupt-cells = <2>;
 
                rtc {
                        compatible = "dlg,da9063-rtc";
index 7e8bc06715f6564badf502267a33c3737c206cf9..93c86e9216455577271652dcbeb8623faba69885 100644 (file)
                interrupt-parent = <&irqc0>;
                interrupts = <2 IRQ_TYPE_LEVEL_LOW>;
                interrupt-controller;
+               #interrupt-cells = <2>;
 
                watchdog {
                        compatible = "dlg,da9063-watchdog";
index 4f9838cf97ee4fb608b27bfc3d637edee39f3c95..540a9ad28f28ac1a08c7b4f5d3e6a23bcfc262e0 100644 (file)
                interrupt-parent = <&irqc>;
                interrupts = <2 IRQ_TYPE_LEVEL_LOW>;
                interrupt-controller;
+               #interrupt-cells = <2>;
 
                rtc {
                        compatible = "dlg,da9063-rtc";
index 1744fdbf9e0ce08d2a30180e1462dd46a18152f9..1ea6c757893bc0bf5ae4d7c6a6c91854939f9b3f 100644 (file)
                interrupt-parent = <&irqc0>;
                interrupts = <2 IRQ_TYPE_LEVEL_LOW>;
                interrupt-controller;
+               #interrupt-cells = <2>;
 
                rtc {
                        compatible = "dlg,da9063-rtc";
index c0d067df22a03d4e2590333965c7c8d7a6f539d6..b5ecafbb2e4de582e4449e7abba6217d4e35dcdb 100644 (file)
                interrupt-parent = <&gpio3>;
                interrupts = <31 IRQ_TYPE_LEVEL_LOW>;
                interrupt-controller;
+               #interrupt-cells = <2>;
 
                rtc {
                        compatible = "dlg,da9063-rtc";
index 43d480a7f3eacc21636788f15e2b27ce3d4dec43..595e074085eb4cd3cf9ad84d59b138051302ef5e 100644 (file)
                interrupt-parent = <&gpio3>;
                interrupts = <31 IRQ_TYPE_LEVEL_LOW>;
                interrupt-controller;
+               #interrupt-cells = <2>;
 
                onkey {
                        compatible = "dlg,da9063-onkey";
index abf3006f0a842435b9d56750e805fe93261649c6..f3291f3bbc6fd2b480e975632847f9310c082225 100644 (file)
        pwm4: pwm@10280000 {
                compatible = "rockchip,rv1108-pwm", "rockchip,rk3288-pwm";
                reg = <0x10280000 0x10>;
-               interrupts = <GIC_SPI 38 IRQ_TYPE_LEVEL_HIGH>;
                clocks = <&cru SCLK_PWM>, <&cru PCLK_PWM>;
                clock-names = "pwm", "pclk";
                pinctrl-names = "default";
        pwm5: pwm@10280010 {
                compatible = "rockchip,rv1108-pwm", "rockchip,rk3288-pwm";
                reg = <0x10280010 0x10>;
-               interrupts = <GIC_SPI 38 IRQ_TYPE_LEVEL_HIGH>;
                clocks = <&cru SCLK_PWM>, <&cru PCLK_PWM>;
                clock-names = "pwm", "pclk";
                pinctrl-names = "default";
        pwm6: pwm@10280020 {
                compatible = "rockchip,rv1108-pwm", "rockchip,rk3288-pwm";
                reg = <0x10280020 0x10>;
-               interrupts = <GIC_SPI 38 IRQ_TYPE_LEVEL_HIGH>;
                clocks = <&cru SCLK_PWM>, <&cru PCLK_PWM>;
                clock-names = "pwm", "pclk";
                pinctrl-names = "default";
        pwm7: pwm@10280030 {
                compatible = "rockchip,rv1108-pwm", "rockchip,rk3288-pwm";
                reg = <0x10280030 0x10>;
-               interrupts = <GIC_SPI 38 IRQ_TYPE_LEVEL_HIGH>;
                clocks = <&cru SCLK_PWM>, <&cru PCLK_PWM>;
                clock-names = "pwm", "pclk";
                pinctrl-names = "default";
        pwm0: pwm@20040000 {
                compatible = "rockchip,rv1108-pwm", "rockchip,rk3288-pwm";
                reg = <0x20040000 0x10>;
-               interrupts = <GIC_SPI 39 IRQ_TYPE_LEVEL_HIGH>;
                clocks = <&cru SCLK_PWM0_PMU>, <&cru PCLK_PWM0_PMU>;
                clock-names = "pwm", "pclk";
                pinctrl-names = "default";
        pwm1: pwm@20040010 {
                compatible = "rockchip,rv1108-pwm", "rockchip,rk3288-pwm";
                reg = <0x20040010 0x10>;
-               interrupts = <GIC_SPI 39 IRQ_TYPE_LEVEL_HIGH>;
                clocks = <&cru SCLK_PWM0_PMU>, <&cru PCLK_PWM0_PMU>;
                clock-names = "pwm", "pclk";
                pinctrl-names = "default";
        pwm2: pwm@20040020 {
                compatible = "rockchip,rv1108-pwm", "rockchip,rk3288-pwm";
                reg = <0x20040020 0x10>;
-               interrupts = <GIC_SPI 39 IRQ_TYPE_LEVEL_HIGH>;
                clocks = <&cru SCLK_PWM0_PMU>, <&cru PCLK_PWM0_PMU>;
                clock-names = "pwm", "pclk";
                pinctrl-names = "default";
        pwm3: pwm@20040030 {
                compatible = "rockchip,rv1108-pwm", "rockchip,rk3288-pwm";
                reg = <0x20040030 0x10>;
-               interrupts = <GIC_SPI 39 IRQ_TYPE_LEVEL_HIGH>;
                clocks = <&cru SCLK_PWM0_PMU>, <&cru PCLK_PWM0_PMU>;
                clock-names = "pwm", "pclk";
                pinctrl-names = "default";
index d7954ff466b491b32acf6962ab5d64f4843f8157..e5254e32aa8fc326dfcabce33705a9b25e272052 100644 (file)
 };
 
 &fimd {
+       samsung,invert-vclk;
        status = "okay";
 };
 
index 576235ec3c516ee2136dd2b4a9c95a2ded61a3b3..afa417b34b25ffd7351885071e72989dd635b382 100644 (file)
                reg = <0x42>;
                interrupts = <8 3>;
                interrupt-parent = <&gpioi>;
-               interrupt-controller;
                wakeup-source;
 
                stmpegpio: stmpe_gpio {
index 510cca5acb79ca449dc11ba043475cfc43becc4c..7a701f7ef0c70467181e71719f17712ca4341562 100644 (file)
@@ -64,7 +64,6 @@
                reg = <0x38>;
                interrupts = <2 2>;
                interrupt-parent = <&gpiof>;
-               interrupt-controller;
                touchscreen-size-x = <480>;
                touchscreen-size-y = <800>;
                status = "okay";
index b8730aa52ce6fe521a1b531be42c4ef891c969b5..a59331aa58e55e3ef514fc06b5a36472c901dcd3 100644 (file)
        pinctrl-names = "default";
        pinctrl-0 = <&spi1_pins>;
 
-       tpm_spi_tis@0 {
+       tpm@0 {
                compatible = "tcg,tpm_tis-spi";
                reg = <0>;
                spi-max-frequency = <500000>;
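
The TPM hunks in this series follow two devicetree rules: node names are generic ("tpm@0", describing the class of device rather than a driver), and the compatible list leads with the exact part while keeping the "tcg,tpm_tis-spi" fallback. OF matching accepts any entry in the list, so drivers that only know the generic string keep binding. Driver-side shape, as a sketch (table contents illustrative; the in-tree driver may rely on the generic fallback alone):

    #include <linux/mod_devicetable.h>

    static const struct of_device_id tpm_tis_spi_sketch_match[] = {
            { .compatible = "infineon,slb9670" },
            { .compatible = "atmel,attpm20p" },
            { .compatible = "tcg,tpm_tis-spi" },  /* generic fallback */
            { /* sentinel */ },
    };
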
index c8e55642f9c6e5acc43a741a769f798be6cccb37..3e834fc7e3707d4573b75cbfd89a49423c3ec6a5 100644 (file)
                reg = <0x41>;
                interrupts = <30 IRQ_TYPE_LEVEL_LOW>;
                interrupt-parent = <&gpio2>;
-               interrupt-controller;
                id = <0>;
                blocks = <0x5>;
                irq-trigger = <0x1>;
index 0a90583f9f017ed2f88cd20cb6f731440909e830..8f9dbe8d90291ef33f42498d29f477cf54337b2a 100644 (file)
@@ -297,6 +297,7 @@ CONFIG_FB_MODE_HELPERS=y
 CONFIG_LCD_CLASS_DEVICE=y
 CONFIG_LCD_L4F00242T03=y
 CONFIG_LCD_PLATFORM=y
+CONFIG_BACKLIGHT_CLASS_DEVICE=y
 CONFIG_BACKLIGHT_PWM=y
 CONFIG_BACKLIGHT_GPIO=y
 CONFIG_FRAMEBUFFER_CONSOLE=y
index e12d7d096fc034058bfaa094bf9b314a2a7a983d..e4eb54f6cd9fef41fecad56e25c4136e75455756 100644 (file)
@@ -11,7 +11,7 @@
 
 static __always_inline bool arch_static_branch(struct static_key *key, bool branch)
 {
-       asm_volatile_goto("1:\n\t"
+       asm goto("1:\n\t"
                 WASM(nop) "\n\t"
                 ".pushsection __jump_table,  \"aw\"\n\t"
                 ".word 1b, %l[l_yes], %c0\n\t"
@@ -25,7 +25,7 @@ l_yes:
 
 static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch)
 {
-       asm_volatile_goto("1:\n\t"
+       asm goto("1:\n\t"
                 WASM(b) " %l[l_yes]\n\t"
                 ".pushsection __jump_table,  \"aw\"\n\t"
                 ".word 1b, %l[l_yes], %c0\n\t"
index 71b1139764204c506bae31fc31c23f7a51bf61a3..8b1ec60a9a467abcc3bc80c07be12bf734d7c236 100644 (file)
@@ -339,6 +339,7 @@ static struct gpiod_lookup_table ep93xx_i2c_gpiod_table = {
                                GPIO_ACTIVE_HIGH | GPIO_OPEN_DRAIN),
                GPIO_LOOKUP_IDX("G", 0, NULL, 1,
                                GPIO_ACTIVE_HIGH | GPIO_OPEN_DRAIN),
+               { }
        },
 };
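
The added "{ }" is a sentinel: gpiod lookup tables carry no length, and consumers walk the array until they hit an all-zero entry, so omitting the terminator sends the walk past the end of the table. A correctly terminated table, sketched with illustrative labels:

    #include <linux/gpio/machine.h>

    static struct gpiod_lookup_table sketch_i2c_gpiod_table = {
            .dev_id = "i2c-gpio.0",
            .table = {
                    GPIO_LOOKUP_IDX("G", 1, NULL, 0,
                                    GPIO_ACTIVE_HIGH | GPIO_OPEN_DRAIN),
                    { },    /* terminator; never omit */
            },
    };
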
 
index e96fb40b9cc32a64b5a0379e5ee42bd9e38f1aa5..07565b593ed681b0f1675f8ef7a934c1ae53dc51 100644 (file)
@@ -298,6 +298,8 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
                goto done;
        }
        count_vm_vma_lock_event(VMA_LOCK_RETRY);
+       if (fault & VM_FAULT_MAJOR)
+               flags |= FAULT_FLAG_TRIED;
 
        /* Quick path to respond to signals */
        if (fault_signal_pending(fault, regs)) {
index 47ecc4cff9d25b7752c94df9ab574ec52cbabd28..a88cdf91068713ebefc031f438b3b22a0247f943 100644 (file)
@@ -195,7 +195,7 @@ vdso_prepare: prepare0
        include/generated/vdso-offsets.h arch/arm64/kernel/vdso/vdso.so
 ifdef CONFIG_COMPAT_VDSO
        $(Q)$(MAKE) $(build)=arch/arm64/kernel/vdso32 \
-       include/generated/vdso32-offsets.h arch/arm64/kernel/vdso32/vdso.so
+       arch/arm64/kernel/vdso32/vdso.so
 endif
 endif
 
index 91d505b385de5a55f66b125586158c75720672a6..1f1f8d865d0e52a2a872d677504a125e06f57746 100644 (file)
@@ -42,5 +42,6 @@ dtb-$(CONFIG_ARCH_SUNXI) += sun50i-h616-bigtreetech-cb1-manta.dtb
 dtb-$(CONFIG_ARCH_SUNXI) += sun50i-h616-bigtreetech-pi.dtb
 dtb-$(CONFIG_ARCH_SUNXI) += sun50i-h616-orangepi-zero2.dtb
 dtb-$(CONFIG_ARCH_SUNXI) += sun50i-h616-x96-mate.dtb
+dtb-$(CONFIG_ARCH_SUNXI) += sun50i-h618-orangepi-zero2w.dtb
 dtb-$(CONFIG_ARCH_SUNXI) += sun50i-h618-orangepi-zero3.dtb
 dtb-$(CONFIG_ARCH_SUNXI) += sun50i-h618-transpeed-8k618-t.dtb
index dccbba6e7f98e49f572b57c86415dced108fee2d..dbf2dce8d1d68a5225311bf330704e9f6d1ead40 100644 (file)
                msix: msix@fbe00000 {
                        compatible = "al,alpine-msix";
                        reg = <0x0 0xfbe00000 0x0 0x100000>;
-                       interrupt-controller;
                        msi-controller;
                        al,msi-base-spi = <160>;
                        al,msi-num-spis = <160>;
index 39481d7fd7d4da806fe1ab1e4b2320cc732f37d5..3ea178acdddfe2072352283f47318f0f75808c4f 100644 (file)
                msix: msix@fbe00000 {
                        compatible = "al,alpine-msix";
                        reg = <0x0 0xfbe00000 0x0 0x100000>;
-                       interrupt-controller;
                        msi-controller;
                        al,msi-base-spi = <336>;
                        al,msi-num-spis = <959>;
index 9dcd25ec2c04183fb90f160452142c2f5a790136..896d1f33b5b6173e3b4b701d4e08f4ad277856e0 100644 (file)
                        #gpio-cells = <2>;
                        gpio-controller;
                        interrupt-controller;
+                       #interrupt-cells = <2>;
                        interrupts = <GIC_SPI 400 IRQ_TYPE_LEVEL_HIGH>;
                };
 
index f049687d6b96d23fb0383401ef9c19e50af34148..d8516ec0dae7450e2c5e81f0bddf8ffdeba2bb5e 100644 (file)
                        #gpio-cells = <2>;
                        gpio-controller;
                        interrupt-controller;
+                       #interrupt-cells = <2>;
                        interrupts = <GIC_SPI 183 IRQ_TYPE_LEVEL_HIGH>;
                        gpio-ranges = <&pinmux 0 0 16>,
                                        <&pinmux 16 71 2>,
index 9747cb3fa03ac5c141b9bf660da3531ca2082def..d838e3a7af6e5ddda3751cc6f0bf4c73bccacc03 100644 (file)
                        #clock-cells = <1>;
                        clocks = <&cmu_top CLK_DOUT_CMU_MISC_BUS>,
                                 <&cmu_top CLK_DOUT_CMU_MISC_SSS>;
-                       clock-names = "dout_cmu_misc_bus", "dout_cmu_misc_sss";
+                       clock-names = "bus", "sss";
                };
 
                watchdog_cl0: watchdog@10060000 {
index 2e027675d7bbe16300b91be4b6f5522b245dea12..2cb0212b63c6eda77567f90d7960ce89825bd114 100644 (file)
@@ -20,23 +20,41 @@ dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls1046a-frwy.dtb
 dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls1046a-qds.dtb
 dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls1046a-rdb.dtb
 dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls1046a-tqmls1046a-mbls10xxa.dtb
+DTC_FLAGS_fsl-ls1088a-qds := -Wno-interrupt_map
 dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls1088a-qds.dtb
+DTC_FLAGS_fsl-ls1088a-rdb := -Wno-interrupt_map
 dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls1088a-rdb.dtb
+DTC_FLAGS_fsl-ls1088a-ten64 := -Wno-interrupt_map
 dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls1088a-ten64.dtb
+DTC_FLAGS_fsl-ls1088a-tqmls1088a-mbls10xxa := -Wno-interrupt_map
 dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls1088a-tqmls1088a-mbls10xxa.dtb
+DTC_FLAGS_fsl-ls2080a-qds := -Wno-interrupt_map
 dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2080a-qds.dtb
+DTC_FLAGS_fsl-ls2080a-rdb := -Wno-interrupt_map
 dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2080a-rdb.dtb
+DTC_FLAGS_fsl-ls2081a-rdb := -Wno-interrupt_map
 dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2081a-rdb.dtb
+DTC_FLAGS_fsl-ls2080a-simu := -Wno-interrupt_map
 dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2080a-simu.dtb
+DTC_FLAGS_fsl-ls2088a-qds := -Wno-interrupt_map
 dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2088a-qds.dtb
+DTC_FLAGS_fsl-ls2088a-rdb := -Wno-interrupt_map
 dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls2088a-rdb.dtb
+DTC_FLAGS_fsl-lx2160a-bluebox3 := -Wno-interrupt_map
 dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-lx2160a-bluebox3.dtb
+DTC_FLAGS_fsl-lx2160a-bluebox3-rev-a := -Wno-interrupt_map
 dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-lx2160a-bluebox3-rev-a.dtb
+DTC_FLAGS_fsl-lx2160a-clearfog-cx := -Wno-interrupt_map
 dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-lx2160a-clearfog-cx.dtb
+DTC_FLAGS_fsl-lx2160a-honeycomb := -Wno-interrupt_map
 dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-lx2160a-honeycomb.dtb
+DTC_FLAGS_fsl-lx2160a-qds := -Wno-interrupt_map
 dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-lx2160a-qds.dtb
+DTC_FLAGS_fsl-lx2160a-rdb := -Wno-interrupt_map
 dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-lx2160a-rdb.dtb
+DTC_FLAGS_fsl-lx2162a-clearfog := -Wno-interrupt_map
 dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-lx2162a-clearfog.dtb
+DTC_FLAGS_fsl-lx2162a-qds := -Wno-interrupt_map
 dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-lx2162a-qds.dtb
 
 fsl-ls1028a-qds-13bb-dtbs := fsl-ls1028a-qds.dtb fsl-ls1028a-qds-13bb.dtbo
@@ -53,6 +71,7 @@ dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls1028a-qds-85bb.dtb
 dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls1028a-qds-899b.dtb
 dtb-$(CONFIG_ARCH_LAYERSCAPE) += fsl-ls1028a-qds-9999.dtb
 
+DTC_FLAGS_fsl-lx2160a-tqmlx2160a-mblx2160a := -Wno-interrupt_map
 fsl-lx2160a-tqmlx2160a-mblx2160a-12-11-x-dtbs := fsl-lx2160a-tqmlx2160a-mblx2160a.dtb \
        fsl-lx2160a-tqmlx2160a-mblx2160a_12_x_x.dtbo \
        fsl-lx2160a-tqmlx2160a-mblx2160a_x_11_x.dtbo
index 968f475b9a96c3c7334d670fd004ddcde08eed6f..27a902569e2a28434af3b6b15dcdb3a43f7a9606 100644 (file)
        };
 
        tpm: tpm@1 {
-               compatible = "tcg,tpm_tis-spi";
+               compatible = "infineon,slb9670", "tcg,tpm_tis-spi";
                interrupts = <11 IRQ_TYPE_LEVEL_LOW>;
                interrupt-parent = <&gpio2>;
                pinctrl-names = "default";
index 3f3f2a2c89cd504f22548178b0d718ed61d122fa..752caa38eb03bfd6831e61f857b517beb5bfe1a1 100644 (file)
@@ -89,7 +89,7 @@
        status = "okay";
 
        tpm@1 {
-               compatible = "tcg,tpm_tis-spi";
+               compatible = "atmel,attpm20p", "tcg,tpm_tis-spi";
                reg = <0x1>;
                spi-max-frequency = <36000000>;
        };
index 06fed93769966367b02c0a3d5f44f8264c080617..2aa6c1090fc7d7b81f7774354286c13a5463c06b 100644 (file)
        status = "okay";
 
        tpm@1 {
-               compatible = "tcg,tpm_tis-spi";
+               compatible = "atmel,attpm20p", "tcg,tpm_tis-spi";
                reg = <0x1>;
                spi-max-frequency = <36000000>;
        };
index f38ee2266b25dd811e1a8f29c7380aed337a1337..a6b94d1957c92ac6bcc18667b477ca05eda8b1bc 100644 (file)
                pinctrl-0 = <&pinctrl_ptn5150>;
                status = "okay";
 
-               connector {
-                       compatible = "usb-c-connector";
-                       label = "USB-C";
-
-                       port {
-                               typec1_dr_sw: endpoint {
-                                       remote-endpoint = <&usb1_drd_sw>;
-                               };
+               port {
+                       typec1_dr_sw: endpoint {
+                               remote-endpoint = <&usb1_drd_sw>;
                        };
                };
        };
index feae77e038354c687d69904fdb5b577f32cfe26d..a08057410bdef5b3a2572cb5c5e2fe6ea35b5522 100644 (file)
        status = "okay";
 
        tpm: tpm@0 {
-               compatible = "infineon,slb9670";
+               compatible = "infineon,slb9670", "tcg,tpm_tis-spi";
                reg = <0>;
                pinctrl-names = "default";
                pinctrl-0 = <&pinctrl_tpm>;
index d98a040860a48a3ff2c6592420853a0dacc9b48a..5828c9d7821de1eab50967972cf406f8f6359da5 100644 (file)
 &uart4 {
        pinctrl-names = "default";
        pinctrl-0 = <&pinctrl_uart4>;
-       status = "okay";
+       status = "disabled";
 };
 
 &usb3_phy0 {
index fea67a9282f033121323ef2c86e200deac9463d4..b749e28e5ede5cf85f309f2f7903ebee44b41f98 100644 (file)
                                pinctrl-names = "default";
                                pinctrl-0 = <&pinctrl_ptn5150>;
 
-                               connector {
-                                       compatible = "usb-c-connector";
-                                       label = "USB-C";
-
-                                       port {
-                                               ptn5150_out_ep: endpoint {
-                                                       remote-endpoint = <&dwc3_0_ep>;
-                                               };
+                               port {
+
+                                       ptn5150_out_ep: endpoint {
+                                               remote-endpoint = <&dwc3_0_ep>;
                                        };
                                };
                        };
index 4ae4fdab461e008d4816816eedb90f91e7d32561..43f1d45ccc96f01686534d228de9b69630db3ebb 100644 (file)
                                  <&clk IMX8MP_AUDIO_PLL2_OUT>;
                assigned-clock-parents = <&clk IMX8MP_AUDIO_PLL2_OUT>;
                assigned-clock-rates = <13000000>, <13000000>, <156000000>;
-               reset-gpios = <&gpio3 21 GPIO_ACTIVE_HIGH>;
+               reset-gpios = <&gpio1 GPIO_ACTIVE_HIGH>;
                status = "disabled";
 
                ports {
index a2d5d19b2de0cb8b69a8ce55fbaeb0c6ba410907..86d3da36e4f3eecf64c0168c825baee86dfdab3f 100644 (file)
                enable-active-high;
        };
 
+       reg_vcc_1v8: regulator-1v8 {
+               compatible = "regulator-fixed";
+               regulator-name = "VCC_1V8";
+               regulator-min-microvolt = <1800000>;
+               regulator-max-microvolt = <1800000>;
+       };
+
        reg_vcc_3v3: regulator-3v3 {
                compatible = "regulator-fixed";
                regulator-name = "VCC_3V3";
                clock-names = "mclk";
                clocks = <&audio_blk_ctrl IMX8MP_CLK_AUDIOMIX_SAI3_MCLK1>;
                reset-gpios = <&gpio4 29 GPIO_ACTIVE_LOW>;
-               iov-supply = <&reg_vcc_3v3>;
+               iov-supply = <&reg_vcc_1v8>;
                ldoin-supply = <&reg_vcc_3v3>;
        };
 
index c24587c895e1f9734da4c4f4cf7becb697825f59..41c79d2ebdd6201dc10278204c064a4c01c71709 100644 (file)
        status = "okay";
 
        tpm@1 {
-               compatible = "tcg,tpm_tis-spi";
+               compatible = "atmel,attpm20p", "tcg,tpm_tis-spi";
                reg = <0x1>;
                spi-max-frequency = <36000000>;
        };
index 628ffba69862ad51f2072e88fc812b3a84e1b71c..d5c400b355af564123497cd1805e0b0ad56ded21 100644 (file)
        status = "okay";
 
        tpm@1 {
-               compatible = "tcg,tpm_tis-spi";
+               compatible = "atmel,attpm20p", "tcg,tpm_tis-spi";
                reg = <0x1>;
                spi-max-frequency = <36000000>;
        };
index 9caf7ca25444600a4a7979b3749d5175e32b0bbe..cae586cd45bdd59aa479e70bb290fc50b0392a3c 100644 (file)
        status = "okay";
 
        tpm@0 {
-               compatible = "tcg,tpm_tis-spi";
+               compatible = "atmel,attpm20p", "tcg,tpm_tis-spi";
                reg = <0x0>;
                spi-max-frequency = <36000000>;
        };
index 76c73daf546bd0f64bc22e5e1176d814ad677e18..39a550c1cd261dd516da26757bfa8eccd908b92a 100644 (file)
                                        compatible = "fsl,imx8mp-ldb";
                                        reg = <0x5c 0x4>, <0x128 0x4>;
                                        reg-names = "ldb", "lvds";
-                                       clocks = <&clk IMX8MP_CLK_MEDIA_LDB>;
+                                       clocks = <&clk IMX8MP_CLK_MEDIA_LDB_ROOT>;
                                        clock-names = "ldb";
                                        assigned-clocks = <&clk IMX8MP_CLK_MEDIA_LDB>;
                                        assigned-clock-parents = <&clk IMX8MP_VIDEO_PLL1_OUT>;
index 6376417e918c2083bb67c2f978d53602153d3cb9..d8cf1f27c3ec8a33b7ad527c1fc2b489747a2d84 100644 (file)
@@ -65,7 +65,7 @@
        status = "okay";
 
        tpm@0 {
-               compatible = "infineon,slb9670";
+               compatible = "infineon,slb9670", "tcg,tpm_tis-spi";
                reg = <0>;
                spi-max-frequency = <43000000>;
        };
index 48ec4ebec0a83e65bb4978e2f2ffa9cb7aba873c..b864ffa74ea8b6ff72afbd698eab4d30ad990a37 100644 (file)
        amba {
                #address-cells = <2>;
                #size-cells = <1>;
-               #interrupt-cells = <3>;
 
                compatible = "simple-bus";
                interrupt-parent = <&gic>;
index 3869460aa5dcb5da3a3fb32f8e0df6903b88862c..996fb39bb50c1f2074ddd5ac03f191091920c96b 100644 (file)
        amba {
                #address-cells = <2>;
                #size-cells = <1>;
-               #interrupt-cells = <3>;
 
                compatible = "simple-bus";
                interrupt-parent = <&gic>;
index 2c920e22cec2b52dd983f2d20812e7fe80a0c379..7ec7c789d87eff436c4f7362e417c71e2033a5b1 100644 (file)
 
                        odmi: odmi@300000 {
                                compatible = "marvell,odmi-controller";
-                               interrupt-controller;
                                msi-controller;
                                marvell,odmi-frames = <4>;
                                reg = <0x300000 0x4000>,
index 5506de83f61d423634511fba3f783f67a8987792..1b3396b1cee394659d0a77c104f05e1e7762569f 100644 (file)
        status = "okay";
        cs-gpios = <&pio 86 GPIO_ACTIVE_LOW>;
 
-       cr50@0 {
+       tpm@0 {
                compatible = "google,cr50";
                reg = <0>;
                spi-max-frequency = <1000000>;
index f2281250ac35da2514d73191cbcdb2e195afcbcb..d87aab8d7a79ed4ac8365b951f16c370b2efcc91 100644 (file)
        pinctrl-names = "default";
        pinctrl-0 = <&spi5_pins>;
 
-       cr50@0 {
+       tpm@0 {
                compatible = "google,cr50";
                reg = <0>;
                interrupts-extended = <&pio 171 IRQ_TYPE_EDGE_RISING>;
index 69c7f3954ae59a8008a257807d31f227ba1cd2a8..4127cb84eba41a39f0fbff423a43de827dbea695 100644 (file)
                compatible = "mediatek,mt6360";
                reg = <0x34>;
                interrupt-controller;
+               #interrupt-cells = <1>;
                interrupts-extended = <&pio 101 IRQ_TYPE_EDGE_FALLING>;
                interrupt-names = "IRQB";
 
index ea13c4a7027c46ba5f5151947537b5376bcbad20..81a82933e35004e7df51383ed22d291e40874dd9 100644 (file)
                        status = "okay";
 
                        phy-handle = <&mgbe0_phy>;
-                       phy-mode = "usxgmii";
+                       phy-mode = "10gbase-r";
 
                        mdio {
                                #address-cells = <1>;
index 3f16595d099c5620b0d2dde77f0e2c6491c4a576..d1bd328892afa2c319750b20c5b8b979283e6481 100644 (file)
                                        <&mc TEGRA234_MEMORY_CLIENT_MGBEAWR &emc>;
                        interconnect-names = "dma-mem", "write";
                        iommus = <&smmu_niso0 TEGRA234_SID_MGBE>;
-                       power-domains = <&bpmp TEGRA234_POWER_DOMAIN_MGBEA>;
+                       power-domains = <&bpmp TEGRA234_POWER_DOMAIN_MGBEB>;
                        status = "disabled";
                };
 
                                        <&mc TEGRA234_MEMORY_CLIENT_MGBEBWR &emc>;
                        interconnect-names = "dma-mem", "write";
                        iommus = <&smmu_niso0 TEGRA234_SID_MGBE_VF1>;
-                       power-domains = <&bpmp TEGRA234_POWER_DOMAIN_MGBEB>;
+                       power-domains = <&bpmp TEGRA234_POWER_DOMAIN_MGBEC>;
                        status = "disabled";
                };
 
                                        <&mc TEGRA234_MEMORY_CLIENT_MGBECWR &emc>;
                        interconnect-names = "dma-mem", "write";
                        iommus = <&smmu_niso0 TEGRA234_SID_MGBE_VF2>;
-                       power-domains = <&bpmp TEGRA234_POWER_DOMAIN_MGBEC>;
+                       power-domains = <&bpmp TEGRA234_POWER_DOMAIN_MGBED>;
                        status = "disabled";
                };
 
index 5e1277fea7250b4132039efb18f1cfaafdc5257e..61c8fd49c96678740684696397eb15118d83e1b9 100644 (file)
 
                        #interrupt-cells = <1>;
                        interrupt-map-mask = <0 0 0 0x7>;
-                       interrupt-map = <0 0 0 1 &intc 0 75 IRQ_TYPE_LEVEL_HIGH>, /* int_a */
-                                       <0 0 0 2 &intc 0 78 IRQ_TYPE_LEVEL_HIGH>, /* int_b */
-                                       <0 0 0 3 &intc 0 79 IRQ_TYPE_LEVEL_HIGH>, /* int_c */
-                                       <0 0 0 4 &intc 0 83 IRQ_TYPE_LEVEL_HIGH>; /* int_d */
+                       interrupt-map = <0 0 0 1 &intc 0 0 0 75 IRQ_TYPE_LEVEL_HIGH>, /* int_a */
+                                       <0 0 0 2 &intc 0 0 0 78 IRQ_TYPE_LEVEL_HIGH>, /* int_b */
+                                       <0 0 0 3 &intc 0 0 0 79 IRQ_TYPE_LEVEL_HIGH>, /* int_c */
+                                       <0 0 0 4 &intc 0 0 0 83 IRQ_TYPE_LEVEL_HIGH>; /* int_d */
 
                        clocks = <&gcc GCC_SYS_NOC_PCIE0_AXI_CLK>,
                                 <&gcc GCC_PCIE0_AXI_M_CLK>,
index cf295bed32998087cee60bd0ce61d0cf587d2c0a..26441447c866f6095aa26d48bb15c79f73bdd6c8 100644 (file)
                        interrupt-names = "msi";
                        #interrupt-cells = <1>;
                        interrupt-map-mask = <0 0 0 0x7>;
-                       interrupt-map = <0 0 0 1 &intc 0 142
+                       interrupt-map = <0 0 0 1 &intc 0 142
                                         IRQ_TYPE_LEVEL_HIGH>, /* int_a */
-                                       <0 0 0 2 &intc 0 143
+                                       <0 0 0 2 &intc 0 143
                                         IRQ_TYPE_LEVEL_HIGH>, /* int_b */
-                                       <0 0 0 3 &intc 0 144
+                                       <0 0 0 3 &intc 0 144
                                         IRQ_TYPE_LEVEL_HIGH>, /* int_c */
-                                       <0 0 0 4 &intc 0 145
+                                       <0 0 0 4 &intc 0 145
                                         IRQ_TYPE_LEVEL_HIGH>; /* int_d */
 
                        clocks = <&gcc GCC_SYS_NOC_PCIE1_AXI_CLK>,
                        interrupt-names = "msi";
                        #interrupt-cells = <1>;
                        interrupt-map-mask = <0 0 0 0x7>;
-                       interrupt-map = <0 0 0 1 &intc 0 75
+                       interrupt-map = <0 0 0 1 &intc 0 75
                                         IRQ_TYPE_LEVEL_HIGH>, /* int_a */
-                                       <0 0 0 2 &intc 0 78
+                                       <0 0 0 2 &intc 0 78
                                         IRQ_TYPE_LEVEL_HIGH>, /* int_b */
-                                       <0 0 0 3 &intc 0 79
+                                       <0 0 0 3 &intc 0 79
                                         IRQ_TYPE_LEVEL_HIGH>, /* int_c */
-                                       <0 0 0 4 &intc 0 83
+                                       <0 0 0 4 &intc 0 83
                                         IRQ_TYPE_LEVEL_HIGH>; /* int_d */
 
                        clocks = <&gcc GCC_SYS_NOC_PCIE0_AXI_CLK>,
index 8d41ed261adfbfc99e15c07755f54d8f4cf5cc80..ee6f87c828aefab76ff58c1ba1f59ae023068381 100644 (file)
                };
        };
 
-       mpm: interrupt-controller {
-               compatible = "qcom,mpm";
-               qcom,rpm-msg-ram = <&apss_mpm>;
-               interrupts = <GIC_SPI 171 IRQ_TYPE_EDGE_RISING>;
-               mboxes = <&apcs_glb 1>;
-               interrupt-controller;
-               #interrupt-cells = <2>;
-               #power-domain-cells = <0>;
-               interrupt-parent = <&intc>;
-               qcom,mpm-pin-count = <96>;
-               qcom,mpm-pin-map = <2 184>,  /* TSENS1 upper_lower_int */
-                                  <52 243>, /* DWC3_PRI ss_phy_irq */
-                                  <79 347>, /* DWC3_PRI hs_phy_irq */
-                                  <80 352>, /* DWC3_SEC hs_phy_irq */
-                                  <81 347>, /* QUSB2_PHY_PRI DP+DM */
-                                  <82 352>, /* QUSB2_PHY_SEC DP+DM */
-                                  <87 326>; /* SPMI */
-       };
-
        psci {
                compatible = "arm,psci-1.0";
                method = "smc";
                };
 
                rpm_msg_ram: sram@68000 {
-                       compatible = "qcom,rpm-msg-ram", "mmio-sram";
+                       compatible = "qcom,rpm-msg-ram";
                        reg = <0x00068000 0x6000>;
-                       #address-cells = <1>;
-                       #size-cells = <1>;
-                       ranges = <0 0x00068000 0x7000>;
-
-                       apss_mpm: sram@1b8 {
-                               reg = <0x1b8 0x48>;
-                       };
                };
 
                qfprom@74000 {
                        reg = <0x004ad000 0x1000>, /* TM */
                              <0x004ac000 0x1000>; /* SROT */
                        #qcom,sensors = <8>;
-                       interrupts-extended = <&mpm 2 IRQ_TYPE_LEVEL_HIGH>,
-                                             <&intc GIC_SPI 430 IRQ_TYPE_LEVEL_HIGH>;
+                       interrupts = <GIC_SPI 184 IRQ_TYPE_LEVEL_HIGH>,
+                                    <GIC_SPI 430 IRQ_TYPE_LEVEL_HIGH>;
                        interrupt-names = "uplow", "critical";
                        #thermal-sensor-cells = <1>;
                };
                        interrupts = <GIC_SPI 208 IRQ_TYPE_LEVEL_HIGH>;
                        gpio-controller;
                        gpio-ranges = <&tlmm 0 0 150>;
-                       wakeup-parent = <&mpm>;
                        #gpio-cells = <2>;
                        interrupt-controller;
                        #interrupt-cells = <2>;
                              <0x0400a000 0x002100>;
                        reg-names = "core", "chnls", "obsrvr", "intr", "cnfg";
                        interrupt-names = "periph_irq";
-                       interrupts-extended = <&mpm 87 IRQ_TYPE_LEVEL_HIGH>;
+                       interrupts = <GIC_SPI 326 IRQ_TYPE_LEVEL_HIGH>;
                        qcom,ee = <0>;
                        qcom,channel = <0>;
                        #address-cells = <2>;
                        #size-cells = <1>;
                        ranges;
 
-                       interrupts-extended = <&mpm 79 IRQ_TYPE_LEVEL_HIGH>,
-                                             <&mpm 52 IRQ_TYPE_LEVEL_HIGH>;
+                       interrupts = <GIC_SPI 347 IRQ_TYPE_LEVEL_HIGH>,
+                                    <GIC_SPI 243 IRQ_TYPE_LEVEL_HIGH>;
                        interrupt-names = "hs_phy_irq", "ss_phy_irq";
 
                        clocks = <&gcc GCC_SYS_NOC_USB3_AXI_CLK>,
index ffc4406422ae2f82c9636e0fb521f34a1d28c1eb..41215567b3aed7d4211a8a4c5ab94042d205b422 100644 (file)
 };
 
 &pcie4 {
+       max-link-speed = <2>;
+
        perst-gpios = <&tlmm 141 GPIO_ACTIVE_LOW>;
        wake-gpios = <&tlmm 139 GPIO_ACTIVE_LOW>;
 
index def3976bd5bb154d27228831de14e9463239bdf8..eb657e544961d7c2ac60e0f505767c1427893a14 100644 (file)
 };
 
 &pcie4 {
+       max-link-speed = <2>;
+
        perst-gpios = <&tlmm 141 GPIO_ACTIVE_LOW>;
        wake-gpios = <&tlmm 139 GPIO_ACTIVE_LOW>;
 
index 160e098f10757e5f4e9c68e82ecc45f1ce27aa14..f9849b8befbf24b54992d49af812eaa94288c3fb 100644 (file)
                                                 &config_noc SLAVE_QUP_0 RPM_ALWAYS_TAG>,
                                                <&system_noc MASTER_QUP_0 RPM_ALWAYS_TAG
                                                 &bimc SLAVE_EBI_CH0 RPM_ALWAYS_TAG>;
+                               interconnect-names = "qup-core",
+                                                    "qup-config",
+                                                    "qup-memory";
                                #address-cells = <1>;
                                #size-cells = <0>;
                                status = "disabled";
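
The restored interconnect-names pair off one-to-one with the entries in interconnects: a named lookup resolves the string to an index into that list, so dropping the names silently breaks every by-name request. Consumer-side sketch (illustrative, not the actual GENI wrapper code):

    #include <linux/device.h>
    #include <linux/interconnect.h>

    /* Fails with an ERR_PTR if "qup-core" is absent from
     * interconnect-names on the device's node. */
    static struct icc_path *sketch_get_core_path(struct device *dev)
    {
            return of_icc_get(dev, "qup-core");
    }
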
index 9d916edb1c73c10ef5e4fde52e33b75ff902a957..be133a3d5cbe0cb073c0fe8d4f253740da584992 100644 (file)
 
 &tlmm {
        /* Reserved I/Os for NFC */
-       gpio-reserved-ranges = <32 8>;
+       gpio-reserved-ranges = <32 8>, <74 1>;
 
        disp0_reset_n_active: disp0-reset-n-active-state {
                pins = "gpio133";
index 592a67a47c782f667cd48d8d8bcad7d457a84ee4..b9151c2ddf2e5ce7944bed07aa6864ccdc75f2a5 100644 (file)
 
 &tlmm {
        /* Reserved I/Os for NFC */
-       gpio-reserved-ranges = <32 8>;
+       gpio-reserved-ranges = <32 8>, <74 1>;
 
        bt_default: bt-default-state {
                bt-en-pins {
index 3885ef3454ff6e92d8f0d00509d0f935e7e40fa6..50de17e4fb3f25ed0ad490d9b4e593cab2b2cc5a 100644 (file)
                gpio-controller;
                #gpio-cells = <2>;
                interrupt-controller;
+               #interrupt-cells = <2>;
                interrupt-parent = <&gpio6>;
                interrupts = <8 IRQ_TYPE_EDGE_FALLING>;
 
                gpio-controller;
                #gpio-cells = <2>;
                interrupt-controller;
+               #interrupt-cells = <2>;
                interrupt-parent = <&gpio6>;
                interrupts = <4 IRQ_TYPE_EDGE_FALLING>;
        };
                gpio-controller;
                #gpio-cells = <2>;
                interrupt-controller;
+               #interrupt-cells = <2>;
                interrupt-parent = <&gpio7>;
                interrupts = <3 IRQ_TYPE_EDGE_FALLING>;
        };
                gpio-controller;
                #gpio-cells = <2>;
                interrupt-controller;
+               #interrupt-cells = <2>;
                interrupt-parent = <&gpio5>;
                interrupts = <9 IRQ_TYPE_EDGE_FALLING>;
        };
index d0905515399bb00b8562305b7fa5cf8a0eee65b2..9137dd76e72cedb0cfbf1995032e5852cab80f96 100644 (file)
                clock-names = "spiclk", "apb_pclk";
                dmas = <&dmac 12>, <&dmac 13>;
                dma-names = "tx", "rx";
+               num-cs = <2>;
                pinctrl-names = "default";
                pinctrl-0 = <&spi0_clk &spi0_csn &spi0_miso &spi0_mosi>;
                #address-cells = <1>;
                clock-names = "spiclk", "apb_pclk";
                dmas = <&dmac 14>, <&dmac 15>;
                dma-names = "tx", "rx";
+               num-cs = <2>;
                pinctrl-names = "default";
                pinctrl-0 = <&spi1_clk &spi1_csn0 &spi1_csn1 &spi1_miso &spi1_mosi>;
                #address-cells = <1>;
index fb5dcf6e93272180bfd60b8e251a61e61f0e9155..7b4c15c4a9c319da2e92a19ca902884a788e9514 100644 (file)
        pwm3: pwm@ff1b0030 {
                compatible = "rockchip,rk3328-pwm";
                reg = <0x0 0xff1b0030 0x0 0x10>;
-               interrupts = <GIC_SPI 50 IRQ_TYPE_LEVEL_HIGH>;
                clocks = <&cru SCLK_PWM>, <&cru PCLK_PWM>;
                clock-names = "pwm", "pclk";
                pinctrl-names = "default";
index 0f9cc042d9bf06b3445c2cb125435c823f3b26b4..1cba1d857c96ba06e3f257b8a15f20a99a9250ee 100644 (file)
@@ -70,7 +70,7 @@
 &spi0 {
        status = "okay";
 
-       cr50@0 {
+       tpm@0 {
                compatible = "google,cr50";
                reg = <0>;
                interrupt-parent = <&gpio0>;
index c5e7de60c12140c0dae9789cc338ef5f1b9fac3c..5846a11f0e848fc059446a47b57ff732b45e9f4c 100644 (file)
@@ -706,7 +706,7 @@ camera: &i2c7 {
 &spi2 {
        status = "okay";
 
-       cr50@0 {
+       tpm@0 {
                compatible = "google,cr50";
                reg = <0>;
                interrupt-parent = <&gpio1>;
index d4c70835e0fe28639548ed4bf4579439a5f92308..a4946cdc3bb34ef7bc084f74ae0a4ac8424994df 100644 (file)
@@ -72,7 +72,7 @@
                vin-supply = <&vcc3v3_sys>;
        };
 
-       vcc5v0_usb30_host: vcc5v0-usb30-host-regulator {
+       vcc5v0_usb_host1: vcc5v0_usb_host2: vcc5v0-usb-host-regulator {
                compatible = "regulator-fixed";
                regulator-name = "vcc5v0_host";
                regulator-boot-on;
        status = "okay";
 };
 
+/* Standard PCIe */
 &pcie3x2 {
        reset-gpios = <&gpio3 RK_PB0 GPIO_ACTIVE_HIGH>;
        vpcie3v3-supply = <&vcc3v3_sys>;
 
 /* M.2 M-Key ssd */
 &pcie3x4 {
+       num-lanes = <2>;
        reset-gpios = <&gpio4 RK_PB6 GPIO_ACTIVE_HIGH>;
        vpcie3v3-supply = <&vcc3v3_sys>;
        status = "okay";
 };
 
 &u2phy2_host {
-       phy-supply = <&vcc5v0_usb30_host>;
+       phy-supply = <&vcc5v0_usb_host1>;
        status = "okay";
 };
 
 &u2phy3_host {
-       phy-supply = <&vcc5v0_usb30_host>;
+       phy-supply = <&vcc5v0_usb_host2>;
        status = "okay";
 };
 
index 0b02f4d6e00331d4731e60251240748b5415b660..cce1c8e835877c4341d90f2fe80da7c57dde8d0c 100644 (file)
@@ -16,8 +16,8 @@
 
        aliases {
                mmc0 = &sdhci;
-               mmc1 = &sdio;
-               mmc2 = &sdmmc;
+               mmc1 = &sdmmc;
+               mmc2 = &sdio;
                serial2 = &uart2;
        };
 
index ac7c677b0fb9c3d6af9e7b8bcd399d9d24ef0b84..de30c2632b8e5fc8cc6d89272269353676b1e1a3 100644 (file)
                            <&rk806_dvs2_null>, <&rk806_dvs3_null>;
                pinctrl-names = "default";
                spi-max-frequency = <1000000>;
+               system-power-controller;
 
                vcc1-supply = <&vcc5v0_sys>;
                vcc2-supply = <&vcc5v0_sys>;
index 4ce70fb75a307ba34fdd8ad5a72d56401de0118e..39d65002add1e11e81bb0d660fd7f5ff90e4cdf7 100644 (file)
@@ -62,7 +62,6 @@
                compatible = "gpio-leds";
                pinctrl-names = "default";
                pinctrl-0 = <&led1_pin>;
-               status = "okay";
 
                /* LED1 on PCB */
                led-1 {
index d7722772ecd8a0afb7e844ffb168fa9a7462cb03..997b516c2533c1d1fe2db05f2b9df2ad5588e278 100644 (file)
        cpu-supply = <&vdd_cpu_lit_s0>;
 };
 
-&cpu_b0{
+&cpu_b0 {
        cpu-supply = <&vdd_cpu_big0_s0>;
 };
 
-&cpu_b1{
+&cpu_b1 {
        cpu-supply = <&vdd_cpu_big0_s0>;
 };
 
-&cpu_b2{
+&cpu_b2 {
        cpu-supply = <&vdd_cpu_big1_s0>;
 };
 
-&cpu_b3{
+&cpu_b3 {
        cpu-supply = <&vdd_cpu_big1_s0>;
 };
 
index ef4f058c20ff1565cb67e5c2c495f0f337ab2a1c..e037bf9db75af0402dccd26b82b50922823fe9f7 100644 (file)
@@ -19,8 +19,8 @@
 
        aliases {
                mmc0 = &sdhci;
-               mmc1 = &sdio;
-               mmc2 = &sdmmc;
+               mmc1 = &sdmmc;
+               mmc2 = &sdio;
        };
 
        analog-sound {
index dc677f29a9c7fca2359cf0d28b3ec3c9e97dda30..3c227888685192456ec7b4e9d348f187f3259063 100644 (file)
 
 &gpio1 {
        gpio-line-names = /* GPIO1 A0-A7 */
-                         "HEADER_27_3v3", "HEADER_28_3v3", "", "",
+                         "HEADER_27_3v3", "", "", "",
                          "HEADER_29_1v8", "", "HEADER_7_1v8", "",
                          /* GPIO1 B0-B7 */
                          "", "HEADER_31_1v8", "HEADER_33_1v8", "",
                          "HEADER_11_1v8", "HEADER_13_1v8", "", "",
                          /* GPIO1 C0-C7 */
-                         "", "", "", "",
+                         "", "HEADER_28_3v3", "", "",
                          "", "", "", "",
                          /* GPIO1 D0-D7 */
                          "", "", "", "",
 
 &gpio4 {
        gpio-line-names = /* GPIO4 A0-A7 */
-                         "", "", "HEADER_37_3v3", "HEADER_32_3v3",
-                         "HEADER_36_3v3", "", "HEADER_35_3v3", "HEADER_38_3v3",
+                         "", "", "HEADER_37_3v3", "HEADER_8_3v3",
+                         "HEADER_10_3v3", "", "HEADER_32_3v3", "HEADER_35_3v3",
                          /* GPIO4 B0-B7 */
                          "", "", "", "HEADER_40_3v3",
-                         "HEADER_8_3v3", "HEADER_10_3v3", "", "",
+                         "HEADER_38_3v3", "HEADER_36_3v3", "", "",
                          /* GPIO4 C0-C7 */
                          "", "", "", "",
                          "", "", "", "",
index bac4cabef6073e5b0c652d0ed031ea7cce97c72f..467ac2f768ac2bb423b92eb797dce8bde697f259 100644 (file)
@@ -227,8 +227,19 @@ static int ctr_encrypt(struct skcipher_request *req)
                        src += blocks * AES_BLOCK_SIZE;
                }
                if (nbytes && walk.nbytes == walk.total) {
+                       u8 buf[AES_BLOCK_SIZE];
+                       u8 *d = dst;
+
+                       if (unlikely(nbytes < AES_BLOCK_SIZE))
+                               src = dst = memcpy(buf + sizeof(buf) - nbytes,
+                                                  src, nbytes);
+
                        neon_aes_ctr_encrypt(dst, src, ctx->enc, ctx->key.rounds,
                                             nbytes, walk.iv);
+
+                       if (unlikely(nbytes < AES_BLOCK_SIZE))
+                               memcpy(d, dst, nbytes);
+
                        nbytes = 0;
                }
                kernel_neon_end();
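
The change above closes an out-of-bounds access for a final CTR chunk shorter than one AES block: the tail is copied to the end of a stack block, the NEON helper (which addresses data in whole 16-byte blocks) runs on that buffer, and only the real bytes are copied back. A user-space model of the bounce pattern, assuming a helper with the same whole-block access behaviour (a sketch, not the kernel routine):

    #include <string.h>

    enum { BLK = 16 };

    static void bounce_tail(unsigned char *dst, const unsigned char *src,
                            size_t nbytes,
                            void (*process)(unsigned char *p, size_t n))
    {
            unsigned char buf[BLK];
            unsigned char *p = buf + BLK - nbytes;  /* right-align tail */

            memcpy(p, src, nbytes);
            process(p, nbytes);      /* may touch all of buf, but no more */
            memcpy(dst, p, nbytes);  /* emit only the real bytes */
    }
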
index 210bb43cff2c7d020fdf9dc1c8f2a99ca49415a0..d328f549b1a60a26bff884bccebe448fc0936f76 100644 (file)
@@ -229,7 +229,7 @@ alternative_has_cap_likely(const unsigned long cpucap)
        if (!cpucap_is_possible(cpucap))
                return false;
 
-       asm_volatile_goto(
+       asm goto(
        ALTERNATIVE_CB("b       %l[l_no]", %[cpucap], alt_cb_patch_nops)
        :
        : [cpucap] "i" (cpucap)
@@ -247,7 +247,7 @@ alternative_has_cap_unlikely(const unsigned long cpucap)
        if (!cpucap_is_possible(cpucap))
                return false;
 
-       asm_volatile_goto(
+       asm goto(
        ALTERNATIVE("nop", "b   %l[l_yes]", %[cpucap])
        :
        : [cpucap] "i" (cpucap)
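
The asm_volatile_goto() wrapper being removed here (and in the arm, csky and other per-architecture jump-label code in this series) only existed to work around an old GCC bug; with the kernel's current minimum compiler versions the workaround is unnecessary, and asm goto, which is implicitly volatile, can be spelled directly. A freestanding sketch of the construct itself:

    /* The branch target is a C label; in the real implementations in
     * these hunks, the asm body emits a nop or branch at a numbered
     * label that static-key code can later live-patch. */
    static inline int sketch_static_branch(void)
    {
            asm goto("" : : : : l_yes);
            return 0;
    l_yes:
            return 1;
    }
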
index 21c824edf8ce4a6fa1c208b7a882df4b4e31eeb7..bd8d4ca81a48c9c3bfb012b1830eed185b7de679 100644 (file)
@@ -83,7 +83,7 @@ struct arm64_ftr_bits {
  * to full-0 denotes that this field has no override
  *
  * A @mask field set to full-0 with the corresponding @val field set
- * to full-1 denotes thath this field has an invalid override.
+ * to full-1 denotes that this field has an invalid override.
  */
 struct arm64_ftr_override {
        u64             val;
index 7c7493cb571f97bf98b0b4841aeb756d43990718..52f076afeb96006c42dfee6edefcf348048af96b 100644 (file)
@@ -61,6 +61,7 @@
 #define ARM_CPU_IMP_HISI               0x48
 #define ARM_CPU_IMP_APPLE              0x61
 #define ARM_CPU_IMP_AMPERE             0xC0
+#define ARM_CPU_IMP_MICROSOFT          0x6D
 
 #define ARM_CPU_PART_AEM_V8            0xD0F
 #define ARM_CPU_PART_FOUNDATION                0xD00
 
 #define AMPERE_CPU_PART_AMPERE1                0xAC3
 
+#define MICROSOFT_CPU_PART_AZURE_COBALT_100    0xD49 /* Based on r0p0 of ARM Neoverse N2 */
+
 #define MIDR_CORTEX_A53 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A53)
 #define MIDR_CORTEX_A57 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A57)
 #define MIDR_CORTEX_A72 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A72)
 #define MIDR_APPLE_M2_BLIZZARD_MAX MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, APPLE_CPU_PART_M2_BLIZZARD_MAX)
 #define MIDR_APPLE_M2_AVALANCHE_MAX MIDR_CPU_MODEL(ARM_CPU_IMP_APPLE, APPLE_CPU_PART_M2_AVALANCHE_MAX)
 #define MIDR_AMPERE1 MIDR_CPU_MODEL(ARM_CPU_IMP_AMPERE, AMPERE_CPU_PART_AMPERE1)
+#define MIDR_MICROSOFT_AZURE_COBALT_100 MIDR_CPU_MODEL(ARM_CPU_IMP_MICROSOFT, MICROSOFT_CPU_PART_AZURE_COBALT_100)
 
 /* Fujitsu Erratum 010001 affects A64FX 1.0 and 1.1, (v0r0 and v1r0) */
 #define MIDR_FUJITSU_ERRATUM_010001            MIDR_FUJITSU_A64FX
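
The new constants compose through the standard MIDR_EL1 layout: implementer in bits [31:24] (0x6D is ASCII 'm', for Microsoft) and part number in bits [15:4] (0xD49, the same part number as the Neoverse N2 the core derives from, which is why the erratum lists further down gain the Cobalt 100 next to the existing N2 entries). Worked composition as a sketch:

    /* MIDR_EL1: Implementer[31:24] Variant[23:20] Architecture[19:16]
     * PartNum[15:4] Revision[3:0] */
    #define SKETCH_MIDR(imp, part)  (((imp) << 24) | ((part) << 4))

    /* (0x6D << 24) | (0xD49 << 4) == 0x6d00d490 */
    static const unsigned int midr_azure_cobalt_100 =
            SKETCH_MIDR(0x6DU, 0xD49U);
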
index 50e5f25d3024ced8b3c107de352a69e65b6ab7f6..b67b89c54e1c83644cfdd7c63a4807dd24b8d06d 100644 (file)
@@ -62,13 +62,13 @@ static inline void cpacr_restore(unsigned long cpacr)
  * When we defined the maximum SVE vector length we defined the ABI so
  * that the maximum vector length included all the reserved for future
  * expansion bits in ZCR rather than those just currently defined by
- * the architecture. While SME follows a similar pattern the fact that
- * it includes a square matrix means that any allocations that attempt
- * to cover the maximum potential vector length (such as happen with
- * the regset used for ptrace) end up being extremely large. Define
- * the much lower actual limit for use in such situations.
+ * the architecture.  Using this length to allocate worst-case buffers
+ * results in excessively large allocations, and this effect is even
+ * more pronounced for SME due to ZA.  Define more suitable VLs for
+ * these situations.
  */
-#define SME_VQ_MAX     16
+#define ARCH_SVE_VQ_MAX ((ZCR_ELx_LEN_MASK >> ZCR_ELx_LEN_SHIFT) + 1)
+#define SME_VQ_MAX     ((SMCR_ELx_LEN_MASK >> SMCR_ELx_LEN_SHIFT) + 1)
 
 struct task_struct;
 
@@ -386,6 +386,7 @@ extern void sme_alloc(struct task_struct *task, bool flush);
 extern unsigned int sme_get_vl(void);
 extern int sme_set_current_vl(unsigned long arg);
 extern int sme_get_current_vl(void);
+extern void sme_suspend_exit(void);
 
 /*
  * Return how many bytes of memory are required to store the full SME
@@ -421,6 +422,7 @@ static inline int sme_max_vl(void) { return 0; }
 static inline int sme_max_virtualisable_vl(void) { return 0; }
 static inline int sme_set_current_vl(unsigned long arg) { return -EINVAL; }
 static inline int sme_get_current_vl(void) { return -EINVAL; }
+static inline void sme_suspend_exit(void) { }
 
 static inline size_t sme_state_size(struct task_struct const *task)
 {
index 48ddc0f45d2283f2c5ba0de62f5d3231999a91f3..6aafbb7899916e631eab9241c39c1313a7c93707 100644 (file)
@@ -18,7 +18,7 @@
 static __always_inline bool arch_static_branch(struct static_key * const key,
                                               const bool branch)
 {
-       asm_volatile_goto(
+       asm goto(
                "1:     nop                                     \n\t"
                 "      .pushsection    __jump_table, \"aw\"    \n\t"
                 "      .align          3                       \n\t"
@@ -35,7 +35,7 @@ l_yes:
 static __always_inline bool arch_static_branch_jump(struct static_key * const key,
                                                    const bool branch)
 {
-       asm_volatile_goto(
+       asm goto(
                "1:     b               %l[l_yes]               \n\t"
                 "      .pushsection    __jump_table, \"aw\"    \n\t"
                 "      .align          3                       \n\t"
index b4ae3210993273e8fd709b8f4d17a081bf39ff3d..4305995c8f82f416e6ce11280ac1dd19fbe25eec 100644 (file)
@@ -17,9 +17,6 @@
 #ifndef __ASSEMBLY__
 
 #include <generated/vdso-offsets.h>
-#ifdef CONFIG_COMPAT_VDSO
-#include <generated/vdso32-offsets.h>
-#endif
 
 #define VDSO_SYMBOL(base, name)                                                   \
 ({                                                                        \
index e5d03a7039b4bf9cce893b1ea39712eef3e2f4ad..467cb711727309eb991df38ece1af46b858e6178 100644 (file)
@@ -77,9 +77,9 @@ obj-$(CONFIG_UNWIND_PATCH_PAC_INTO_SCS)       += patch-scs.o
 # We need to prevent the SCS patching code from patching itself. Using
 # -mbranch-protection=none here to avoid the patchable PAC opcodes from being
 # generated triggers an issue with full LTO on Clang, which stops emitting PAC
-# instructions altogether. So instead, omit the unwind tables used by the
-# patching code, so it will not be able to locate its own PAC instructions.
-CFLAGS_patch-scs.o                     += -fno-asynchronous-unwind-tables -fno-unwind-tables
+# instructions altogether. So disable LTO as well for the compilation unit.
+CFLAGS_patch-scs.o                     += -mbranch-protection=none
+CFLAGS_REMOVE_patch-scs.o              += $(CC_FLAGS_LTO)
 
 # Force dependency (vdso*-wrap.S includes vdso.so through incbin)
 $(obj)/vdso-wrap.o: $(obj)/vdso/vdso.so
index 967c7c7a4e7db3db7e3d05a7637e8e7d13e0d273..76b8dd37092ad2a9dd6e59a92d1c1fab887589da 100644 (file)
@@ -374,6 +374,7 @@ static const struct midr_range erratum_1463225[] = {
 static const struct midr_range trbe_overwrite_fill_mode_cpus[] = {
 #ifdef CONFIG_ARM64_ERRATUM_2139208
        MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2),
+       MIDR_ALL_VERSIONS(MIDR_MICROSOFT_AZURE_COBALT_100),
 #endif
 #ifdef CONFIG_ARM64_ERRATUM_2119858
        MIDR_ALL_VERSIONS(MIDR_CORTEX_A710),
@@ -387,6 +388,7 @@ static const struct midr_range trbe_overwrite_fill_mode_cpus[] = {
 static const struct midr_range tsb_flush_fail_cpus[] = {
 #ifdef CONFIG_ARM64_ERRATUM_2067961
        MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2),
+       MIDR_ALL_VERSIONS(MIDR_MICROSOFT_AZURE_COBALT_100),
 #endif
 #ifdef CONFIG_ARM64_ERRATUM_2054223
        MIDR_ALL_VERSIONS(MIDR_CORTEX_A710),
@@ -399,6 +401,7 @@ static const struct midr_range tsb_flush_fail_cpus[] = {
 static struct midr_range trbe_write_out_of_range_cpus[] = {
 #ifdef CONFIG_ARM64_ERRATUM_2253138
        MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2),
+       MIDR_ALL_VERSIONS(MIDR_MICROSOFT_AZURE_COBALT_100),
 #endif
 #ifdef CONFIG_ARM64_ERRATUM_2224489
        MIDR_ALL_VERSIONS(MIDR_CORTEX_A710),
index a5dc6f764195847251dc25c196304cbef44d8850..f27acca550d5539d00d958d441ca8631c8dba8d4 100644 (file)
@@ -1311,6 +1311,22 @@ void __init sme_setup(void)
                get_sme_default_vl());
 }
 
+void sme_suspend_exit(void)
+{
+       u64 smcr = 0;
+
+       if (!system_supports_sme())
+               return;
+
+       if (system_supports_fa64())
+               smcr |= SMCR_ELx_FA64;
+       if (system_supports_sme2())
+               smcr |= SMCR_ELx_EZT0;
+
+       write_sysreg_s(smcr, SYS_SMCR_EL1);
+       write_sysreg_s(0, SYS_SMPRI_EL1);
+}
+
 #endif /* CONFIG_ARM64_SME */
 
 static void sve_init_regs(void)
@@ -1635,7 +1651,7 @@ void fpsimd_preserve_current_state(void)
 void fpsimd_signal_preserve_current_state(void)
 {
        fpsimd_preserve_current_state();
-       if (test_thread_flag(TIF_SVE))
+       if (current->thread.fp_type == FP_STATE_SVE)
                sve_to_fpsimd(current);
 }
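
Two distinct fixes land in this file: sme_suspend_exit() reprograms SMCR_EL1 and SMPRI_EL1 on resume, since a core can leave suspend with those registers at reset values, and the signal path now keys off fp_type rather than TIF_SVE, because TIF_SVE only says the task may use SVE while fp_type records which layout the registers were actually saved in last. A sketch of that second distinction (names illustrative, mirroring the real enum):

    enum fp_type_sketch { FP_STATE_FPSIMD_S, FP_STATE_SVE_S };

    struct fp_thread_sketch {
            bool tif_sve;                /* task allowed to use SVE */
            enum fp_type_sketch fp_type; /* format of the saved state */
    };

    /* A task with tif_sve set may still have saved plain FPSIMD
     * state, so conversions must follow fp_type. */
    static bool saved_in_sve_format(const struct fp_thread_sketch *t)
    {
            return t->fp_type == FP_STATE_SVE_S;
    }
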
 
index dc6cf0e37194e428519d7d58524ad0f624f4bebb..e3bef38fc2e2d36b85a9a4069729e276f9368846 100644 (file)
@@ -1500,7 +1500,8 @@ static const struct user_regset aarch64_regsets[] = {
 #ifdef CONFIG_ARM64_SVE
        [REGSET_SVE] = { /* Scalable Vector Extension */
                .core_note_type = NT_ARM_SVE,
-               .n = DIV_ROUND_UP(SVE_PT_SIZE(SVE_VQ_MAX, SVE_PT_REGS_SVE),
+               .n = DIV_ROUND_UP(SVE_PT_SIZE(ARCH_SVE_VQ_MAX,
+                                             SVE_PT_REGS_SVE),
                                  SVE_VQ_BYTES),
                .size = SVE_VQ_BYTES,
                .align = SVE_VQ_BYTES,
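
The ARCH_SVE_VQ_MAX used here is pure field arithmetic: ZCR_ELx.LEN and SMCR_ELx.LEN are 4-bit fields holding the vector length in quadwords minus one, so both maxima evaluate to 16 VQs (a 2048-bit vector), versus the ABI's reserved-for-growth SVE_VQ_MAX of 512 that previously sized this regset. The numbers, worked through under the currently architected field widths:

    #define LEN_MASK  0xfUL  /* 4-bit LEN field, shift 0 */
    #define VQ_BYTES  16     /* one vector quadword = 128 bits */

    static const unsigned int vq_max = LEN_MASK + 1;             /* 16 */
    static const unsigned int max_vl = (LEN_MASK + 1) * VQ_BYTES; /* 256 bytes */
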
index 0e8beb3349ea2a1aae6340f879f78dd568e04f51..425b1bc17a3f6dc3237e81fabcb35d774f1aa9d1 100644 (file)
@@ -242,7 +242,7 @@ static int preserve_sve_context(struct sve_context __user *ctx)
                vl = task_get_sme_vl(current);
                vq = sve_vq_from_vl(vl);
                flags |= SVE_SIG_FLAG_SM;
-       } else if (test_thread_flag(TIF_SVE)) {
+       } else if (current->thread.fp_type == FP_STATE_SVE) {
                vq = sve_vq_from_vl(vl);
        }
 
@@ -878,7 +878,7 @@ static int setup_sigframe_layout(struct rt_sigframe_user_layout *user,
        if (system_supports_sve() || system_supports_sme()) {
                unsigned int vq = 0;
 
-               if (add_all || test_thread_flag(TIF_SVE) ||
+               if (add_all || current->thread.fp_type == FP_STATE_SVE ||
                    thread_sm_enabled(&current->thread)) {
                        int vl = max(sve_max_vl(), sme_max_vl());
 
index 7f88028a00c02c0e176af5ae7674ae606f5afd3a..b2a60e0bcfd21d28a60db750590732bf1115551d 100644 (file)
@@ -247,7 +247,7 @@ struct kunwind_consume_entry_data {
        void *cookie;
 };
 
-static bool
+static __always_inline bool
 arch_kunwind_consume_entry(const struct kunwind_state *state, void *cookie)
 {
        struct kunwind_consume_entry_data *data = cookie;
index eca4d043521183adc7263da95cf656323a4cc73a..eaaff94329cddb8d1fb8d1523395453f3501c9a5 100644 (file)
@@ -12,6 +12,7 @@
 #include <asm/daifflags.h>
 #include <asm/debug-monitors.h>
 #include <asm/exec.h>
+#include <asm/fpsimd.h>
 #include <asm/mte.h>
 #include <asm/memory.h>
 #include <asm/mmu_context.h>
@@ -80,6 +81,8 @@ void notrace __cpu_suspend_exit(void)
         */
        spectre_v4_enable_mitigation(NULL);
 
+       sme_suspend_exit();
+
        /* Restore additional feature-specific configuration */
        ptrauth_suspend_exit();
 }
index 2266fcdff78a0740fcd72a5c8125d17938d88df4..f5f80fdce0fe7aa2ab3b14ce931999b954312162 100644 (file)
@@ -127,9 +127,6 @@ obj-vdso := $(c-obj-vdso) $(c-obj-vdso-gettimeofday) $(asm-obj-vdso)
 targets += vdso.lds
 CPPFLAGS_vdso.lds += -P -C -U$(ARCH)
 
-include/generated/vdso32-offsets.h: $(obj)/vdso32.so.dbg FORCE
-       $(call if_changed,vdsosym)
-
 # Strip rule for vdso.so
 $(obj)/vdso.so: OBJCOPYFLAGS := -S
 $(obj)/vdso.so: $(obj)/vdso32.so.dbg FORCE
@@ -166,9 +163,3 @@ quiet_cmd_vdsoas = AS32    $@
 
 quiet_cmd_vdsomunge = MUNGE   $@
       cmd_vdsomunge = $(obj)/$(munge) $< $@
-
-# Generate vDSO offsets using helper script (borrowed from the 64-bit vDSO)
-gen-vdsosym := $(srctree)/$(src)/../vdso/gen_vdso_offsets.sh
-quiet_cmd_vdsosym = VDSOSYM $@
-# The AArch64 nm should be able to read an AArch32 binary
-      cmd_vdsosym = $(NM) $< | $(gen-vdsosym) | LC_ALL=C sort > $@
index 6c3c8ca73e7fda8bb29792218bb11d031e7527ff..27ca89b628a02499d18505953ad1cee73ccf7f88 100644 (file)
@@ -3,7 +3,6 @@
 # KVM configuration
 #
 
-source "virt/lib/Kconfig"
 source "virt/kvm/Kconfig"
 
 menuconfig VIRTUALIZATION
index c651df904fe3eb940e07785aac1ac76079743666..ab9d05fcf98b23b992343d6a11a1daf9de806b3a 100644 (file)
@@ -1419,7 +1419,6 @@ kvm_pte_t *kvm_pgtable_stage2_create_unlinked(struct kvm_pgtable *pgt,
                                 level + 1);
        if (ret) {
                kvm_pgtable_stage2_free_unlinked(mm_ops, pgtable, level);
-               mm_ops->put_page(pgtable);
                return ERR_PTR(ret);
        }
 
@@ -1502,7 +1501,6 @@ static int stage2_split_walker(const struct kvm_pgtable_visit_ctx *ctx,
 
        if (!stage2_try_break_pte(ctx, mmu)) {
                kvm_pgtable_stage2_free_unlinked(mm_ops, childp, level);
-               mm_ops->put_page(childp);
                return -EAGAIN;
        }
 
index 8350fb8fee0b998ccf27dca4b7bf2e858846ccd3..b7be96a5359737d41576af46eee1f68852632846 100644 (file)
@@ -101,6 +101,17 @@ void __init kvm_hyp_reserve(void)
                 hyp_mem_base);
 }
 
+static void __pkvm_destroy_hyp_vm(struct kvm *host_kvm)
+{
+       if (host_kvm->arch.pkvm.handle) {
+               WARN_ON(kvm_call_hyp_nvhe(__pkvm_teardown_vm,
+                                         host_kvm->arch.pkvm.handle));
+       }
+
+       host_kvm->arch.pkvm.handle = 0;
+       free_hyp_memcache(&host_kvm->arch.pkvm.teardown_mc);
+}
+
 /*
  * Allocates and donates memory for hypervisor VM structs at EL2.
  *
@@ -181,7 +192,7 @@ static int __pkvm_create_hyp_vm(struct kvm *host_kvm)
        return 0;
 
 destroy_vm:
-       pkvm_destroy_hyp_vm(host_kvm);
+       __pkvm_destroy_hyp_vm(host_kvm);
        return ret;
 free_vm:
        free_pages_exact(hyp_vm, hyp_vm_sz);
@@ -194,23 +205,19 @@ int pkvm_create_hyp_vm(struct kvm *host_kvm)
 {
        int ret = 0;
 
-       mutex_lock(&host_kvm->lock);
+       mutex_lock(&host_kvm->arch.config_lock);
        if (!host_kvm->arch.pkvm.handle)
                ret = __pkvm_create_hyp_vm(host_kvm);
-       mutex_unlock(&host_kvm->lock);
+       mutex_unlock(&host_kvm->arch.config_lock);
 
        return ret;
 }
 
 void pkvm_destroy_hyp_vm(struct kvm *host_kvm)
 {
-       if (host_kvm->arch.pkvm.handle) {
-               WARN_ON(kvm_call_hyp_nvhe(__pkvm_teardown_vm,
-                                         host_kvm->arch.pkvm.handle));
-       }
-
-       host_kvm->arch.pkvm.handle = 0;
-       free_hyp_memcache(&host_kvm->arch.pkvm.teardown_mc);
+       mutex_lock(&host_kvm->arch.config_lock);
+       __pkvm_destroy_hyp_vm(host_kvm);
+       mutex_unlock(&host_kvm->arch.config_lock);
 }
 
 int pkvm_init_host_vm(struct kvm *host_kvm)
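The refactor above extracts an unlocked __pkvm_destroy_hyp_vm() so that the creation error path (which already holds arch.config_lock) and the public pkvm_destroy_hyp_vm() share one teardown, and it moves both entry points from kvm->lock to the narrower config_lock. The shape of that double-underscore convention, sketched with hypothetical names:

    #include <linux/mutex.h>

    struct res {
            struct mutex config_lock;
            unsigned long handle;
    };

    static void teardown_handle(unsigned long handle) { /* hypothetical */ }

    /* Caller must hold res->config_lock (e.g. a creator's error path). */
    static void __destroy_res(struct res *r)
    {
            if (r->handle)
                    teardown_handle(r->handle);
            r->handle = 0;
    }

    /* The public entry point takes the lock itself. */
    void destroy_res(struct res *r)
    {
            mutex_lock(&r->config_lock);
            __destroy_res(r);
            mutex_unlock(&r->config_lock);
    }
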
index e2764d0ffa9f32094c57580ed5d987f99b5d2ade..28a93074eca17dbb10c7c75e23baf72edf126391 100644 (file)
@@ -468,6 +468,9 @@ static int its_sync_lpi_pending_table(struct kvm_vcpu *vcpu)
                }
 
                irq = vgic_get_irq(vcpu->kvm, NULL, intids[i]);
+               if (!irq)
+                       continue;
+
                raw_spin_lock_irqsave(&irq->irq_lock, flags);
                irq->pending_latch = pendmask & (1U << bit_nr);
                vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
@@ -1432,6 +1435,8 @@ static int vgic_its_cmd_handle_movall(struct kvm *kvm, struct vgic_its *its,
 
        for (i = 0; i < irq_count; i++) {
                irq = vgic_get_irq(kvm, NULL, intids[i]);
+               if (!irq)
+                       continue;
 
                update_affinity(irq, vcpu2);
 
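Both hunks above guard loops over collected interrupt IDs against vgic_get_irq() returning NULL, which can happen if an LPI is freed while the ID list is being processed. The defensive shape, with hypothetical helpers:

    struct irq_obj;
    struct irq_obj *lookup_irq(int id);         /* may legitimately return NULL */
    void process_irq(struct irq_obj *irq);

    static void process_ids(const int *ids, int n)
    {
            for (int i = 0; i < n; i++) {
                    struct irq_obj *irq = lookup_irq(ids[i]);

                    if (!irq)
                            continue;   /* skip rather than dereference NULL */
                    process_irq(irq);
            }
    }
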
index 98a3f4b168bd2687f3e4828aa681d29e0c13b97e..ef2e37a10a0feb9af3543481cffddd75c5b3a8ef 100644 (file)
@@ -12,7 +12,7 @@
 static __always_inline bool arch_static_branch(struct static_key *key,
                                               bool branch)
 {
-       asm_volatile_goto(
+       asm goto(
                "1:     nop32                                   \n"
                "       .pushsection    __jump_table, \"aw\"    \n"
                "       .align          2                       \n"
@@ -29,7 +29,7 @@ label:
 static __always_inline bool arch_static_branch_jump(struct static_key *key,
                                                    bool branch)
 {
-       asm_volatile_goto(
+       asm goto(
                "1:     bsr32           %l[label]               \n"
                "       .pushsection    __jump_table, \"aw\"    \n"
                "       .align          2                       \n"
index 10959e6c3583255264aef0ea7de6e6477a003418..929f68926b3432e52a6d7b84a92d402a5d8bc11a 100644 (file)
@@ -12,6 +12,7 @@ config LOONGARCH
        select ARCH_DISABLE_KASAN_INLINE
        select ARCH_ENABLE_MEMORY_HOTPLUG
        select ARCH_ENABLE_MEMORY_HOTREMOVE
+       select ARCH_ENABLE_THP_MIGRATION if TRANSPARENT_HUGEPAGE
        select ARCH_HAS_ACPI_TABLE_UPGRADE      if ACPI
        select ARCH_HAS_CPU_FINALIZE_INIT
        select ARCH_HAS_FORTIFY_SOURCE
@@ -99,6 +100,7 @@ config LOONGARCH
        select HAVE_ARCH_KFENCE
        select HAVE_ARCH_KGDB if PERF_EVENTS
        select HAVE_ARCH_MMAP_RND_BITS if MMU
+       select HAVE_ARCH_SECCOMP
        select HAVE_ARCH_SECCOMP_FILTER
        select HAVE_ARCH_TRACEHOOK
        select HAVE_ARCH_TRANSPARENT_HUGEPAGE
@@ -632,23 +634,6 @@ config RANDOMIZE_BASE_MAX_OFFSET
 
          This is limited by the size of the lower address memory, 256MB.
 
-config SECCOMP
-       bool "Enable seccomp to safely compute untrusted bytecode"
-       depends on PROC_FS
-       default y
-       help
-         This kernel feature is useful for number crunching applications
-         that may need to compute untrusted bytecode during their
-         execution. By using pipes or other transports made available to
-         the process as file descriptors supporting the read/write
-         syscalls, it's possible to isolate those applications in
-         their own address space using seccomp. Once seccomp is
-         enabled via /proc/<pid>/seccomp, it cannot be disabled
-         and the task is only allowed to execute a few safe syscalls
-         defined by each seccomp mode.
-
-         If unsure, say Y. Only embedded should say N here.
-
 endmenu
 
 config ARCH_SELECT_MEMORY_MODEL
@@ -667,10 +652,6 @@ config ARCH_SPARSEMEM_ENABLE
          or have huge holes in the physical address space for other reasons.
          See <file:Documentation/mm/numa.rst> for more.
 
-config ARCH_ENABLE_THP_MIGRATION
-       def_bool y
-       depends on TRANSPARENT_HUGEPAGE
-
 config ARCH_MEMORY_PROBE
        def_bool y
        depends on MEMORY_HOTPLUG
index b38071a4d0b023c7faf29935d3bb6d5e0c65ca76..8aefb0c126722980a345062cae02a6127c02b52e 100644 (file)
@@ -60,7 +60,7 @@
 
        #address-cells = <1>;
        #size-cells = <0>;
-       eeprom@57{
+       eeprom@57 {
                compatible = "atmel,24c16";
                reg = <0x57>;
                pagesize = <16>;
index 132a2d1ea8bce1ac95222875b6ad74d5ebf06b14..ed4d324340411dee9b88e52720329cf839307ad0 100644 (file)
@@ -78,7 +78,7 @@
 
        #address-cells = <1>;
        #size-cells = <0>;
-       eeprom@57{
+       eeprom@57 {
                compatible = "atmel,24c16";
                reg = <0x57>;
                pagesize = <16>;
index 8de6c4b83a61a8088903abc67ddc50beb35233d5..49e29b29996f0f4473c5d628c936c7528630ad52 100644 (file)
@@ -32,8 +32,10 @@ static inline bool acpi_has_cpu_in_madt(void)
        return true;
 }
 
+#define MAX_CORE_PIC 256
+
 extern struct list_head acpi_wakeup_device_list;
-extern struct acpi_madt_core_pic acpi_core_pic[NR_CPUS];
+extern struct acpi_madt_core_pic acpi_core_pic[MAX_CORE_PIC];
 
 extern int __init parse_acpi_topology(void);
 
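Sizing acpi_core_pic[] by MAX_CORE_PIC instead of NR_CPUS decouples the table from the kernel's configured CPU count, since firmware may describe more core PICs in the MADT than the kernel has CPUs configured. Any parser filling such a table then bounds-checks against the table's real capacity; a hypothetical sketch:

    #include <errno.h>      /* <linux/errno.h> in-kernel */

    #define MAX_CORE_PIC 256        /* firmware-side limit, not NR_CPUS */

    struct core_pic_sketch { int id; };

    static struct core_pic_sketch table[MAX_CORE_PIC];
    static int nr_entries;

    static int record_core_pic(const struct core_pic_sketch *e)
    {
            if (nr_entries >= MAX_CORE_PIC) /* bound by the real capacity */
                    return -ENOSPC;
            table[nr_entries++] = *e;
            return 0;
    }
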
index 3cea299a5ef58313a305f7d5a086ae9a32e8aa95..29acfe3de3faae797beca198e58f9e5a5e570bbc 100644 (file)
@@ -22,7 +22,7 @@
 
 static __always_inline bool arch_static_branch(struct static_key * const key, const bool branch)
 {
-       asm_volatile_goto(
+       asm goto(
                "1:     nop                     \n\t"
                JUMP_TABLE_ENTRY
                :  :  "i"(&((char *)key)[branch]) :  : l_yes);
@@ -35,7 +35,7 @@ l_yes:
 
 static __always_inline bool arch_static_branch_jump(struct static_key * const key, const bool branch)
 {
-       asm_volatile_goto(
+       asm goto(
                "1:     b       %l[l_yes]       \n\t"
                JUMP_TABLE_ENTRY
                :  :  "i"(&((char *)key)[branch]) :  : l_yes);
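The asm_volatile_goto() conversions repeated across csky, LoongArch, MIPS and parisc in this merge are mechanical: the macro existed to work around an old GCC issue, and all supported compilers now accept plain `asm goto`. A minimal standalone sketch (using a bare nop; real static branches patch the nop into a jump at runtime):

    #include <stdbool.h>

    static inline bool branch_sketch(void)
    {
            /* asm goto may transfer control to a C label; the label list
             * comes after the clobber list. */
            asm goto("nop"
                     : /* no outputs */
                     : /* no inputs */
                     : /* no clobbers */
                     : l_yes);
            return false;
    l_yes:
            return true;
    }
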
index e71ceb88f29eecdbebe40c407c1a93e4cd433337..0cb4fdb8a9b5970dfefb24a34c82d1451ff27fa9 100644 (file)
@@ -60,7 +60,7 @@ int kvm_own_lsx(struct kvm_vcpu *vcpu);
 void kvm_save_lsx(struct loongarch_fpu *fpu);
 void kvm_restore_lsx(struct loongarch_fpu *fpu);
 #else
-static inline int kvm_own_lsx(struct kvm_vcpu *vcpu) { }
+static inline int kvm_own_lsx(struct kvm_vcpu *vcpu) { return -EINVAL; }
 static inline void kvm_save_lsx(struct loongarch_fpu *fpu) { }
 static inline void kvm_restore_lsx(struct loongarch_fpu *fpu) { }
 #endif
@@ -70,7 +70,7 @@ int kvm_own_lasx(struct kvm_vcpu *vcpu);
 void kvm_save_lasx(struct loongarch_fpu *fpu);
 void kvm_restore_lasx(struct loongarch_fpu *fpu);
 #else
-static inline int kvm_own_lasx(struct kvm_vcpu *vcpu) { }
+static inline int kvm_own_lasx(struct kvm_vcpu *vcpu) { return -EINVAL; }
 static inline void kvm_save_lasx(struct loongarch_fpu *fpu) { }
 static inline void kvm_restore_lasx(struct loongarch_fpu *fpu) { }
 #endif
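The previous stubs were declared to return int but had empty bodies, which is undefined behaviour if they are ever called; returning -EINVAL fixes the warning and gives callers a meaningful "feature unavailable" answer when LSX/LASX support is compiled out. The general pattern, with hypothetical names:

    struct ctx;

    #ifdef CONFIG_FEATURE_X
    int feature_x_enable(struct ctx *c);
    #else
    /* Built without the feature: the stub must still honour the declared
     * return type, and should report the feature as unavailable. */
    static inline int feature_x_enable(struct ctx *c) { return -EINVAL; }
    #endif
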
index b6b097bbf8668a400105dd408c696e4135c824d8..5cf59c617126b7d00f65b3310ef39ea7bfb98e96 100644 (file)
@@ -29,11 +29,9 @@ int disabled_cpus;
 
 u64 acpi_saved_sp;
 
-#define MAX_CORE_PIC 256
-
 #define PREFIX                 "ACPI: "
 
-struct acpi_madt_core_pic acpi_core_pic[NR_CPUS];
+struct acpi_madt_core_pic acpi_core_pic[MAX_CORE_PIC];
 
 void __init __iomem * __acpi_map_table(unsigned long phys, unsigned long size)
 {
index edf2bba80130670364e144ad301868a7dfd3bf93..634ef17fd38bf10d8bd9deef8a6693f0f4777c1e 100644 (file)
@@ -357,6 +357,8 @@ void __init platform_init(void)
        acpi_gbl_use_default_register_widths = false;
        acpi_boot_table_init();
 #endif
+
+       early_init_fdt_scan_reserved_mem();
        unflatten_and_copy_device_tree();
 
 #ifdef CONFIG_NUMA
@@ -390,8 +392,6 @@ static void __init arch_mem_init(char **cmdline_p)
 
        check_kernel_sections_mem();
 
-       early_init_fdt_scan_reserved_mem();
-
        /*
         * In order to reduce the possibility of kernel panic when failed to
         * get IO TLB memory under CONFIG_SWIOTLB, it is better to allocate
index a16e3dbe9f09eb2fbf1b239b982b727330f7c233..aabee0b280fe5f43a70d8a1091e6268a37d701e9 100644 (file)
@@ -88,6 +88,73 @@ void show_ipi_list(struct seq_file *p, int prec)
        }
 }
 
+static inline void set_cpu_core_map(int cpu)
+{
+       int i;
+
+       cpumask_set_cpu(cpu, &cpu_core_setup_map);
+
+       for_each_cpu(i, &cpu_core_setup_map) {
+               if (cpu_data[cpu].package == cpu_data[i].package) {
+                       cpumask_set_cpu(i, &cpu_core_map[cpu]);
+                       cpumask_set_cpu(cpu, &cpu_core_map[i]);
+               }
+       }
+}
+
+static inline void set_cpu_sibling_map(int cpu)
+{
+       int i;
+
+       cpumask_set_cpu(cpu, &cpu_sibling_setup_map);
+
+       for_each_cpu(i, &cpu_sibling_setup_map) {
+               if (cpus_are_siblings(cpu, i)) {
+                       cpumask_set_cpu(i, &cpu_sibling_map[cpu]);
+                       cpumask_set_cpu(cpu, &cpu_sibling_map[i]);
+               }
+       }
+}
+
+static inline void clear_cpu_sibling_map(int cpu)
+{
+       int i;
+
+       for_each_cpu(i, &cpu_sibling_setup_map) {
+               if (cpus_are_siblings(cpu, i)) {
+                       cpumask_clear_cpu(i, &cpu_sibling_map[cpu]);
+                       cpumask_clear_cpu(cpu, &cpu_sibling_map[i]);
+               }
+       }
+
+       cpumask_clear_cpu(cpu, &cpu_sibling_setup_map);
+}
+
+/*
+ * Calculate a new cpu_foreign_map mask whenever a
+ * new cpu appears or disappears.
+ */
+void calculate_cpu_foreign_map(void)
+{
+       int i, k, core_present;
+       cpumask_t temp_foreign_map;
+
+       /* Re-calculate the mask */
+       cpumask_clear(&temp_foreign_map);
+       for_each_online_cpu(i) {
+               core_present = 0;
+               for_each_cpu(k, &temp_foreign_map)
+                       if (cpus_are_siblings(i, k))
+                               core_present = 1;
+               if (!core_present)
+                       cpumask_set_cpu(i, &temp_foreign_map);
+       }
+
+       for_each_online_cpu(i)
+               cpumask_andnot(&cpu_foreign_map[i],
+                              &temp_foreign_map, &cpu_sibling_map[i]);
+}
+
 /* Send mailbox buffer via Mail_Send */
 static void csr_mail_send(uint64_t data, int cpu, int mailbox)
 {
@@ -303,6 +370,7 @@ int loongson_cpu_disable(void)
        numa_remove_cpu(cpu);
 #endif
        set_cpu_online(cpu, false);
+       clear_cpu_sibling_map(cpu);
        calculate_cpu_foreign_map();
        local_irq_save(flags);
        irq_migrate_all_off_this_cpu();
@@ -337,6 +405,7 @@ void __noreturn arch_cpu_idle_dead(void)
                addr = iocsr_read64(LOONGARCH_IOCSR_MBUF0);
        } while (addr == 0);
 
+       local_irq_disable();
        init_fn = (void *)TO_CACHE(addr);
        iocsr_write32(0xffffffff, LOONGARCH_IOCSR_IPI_CLEAR);
 
@@ -379,59 +448,6 @@ static int __init ipi_pm_init(void)
 core_initcall(ipi_pm_init);
 #endif
 
-static inline void set_cpu_sibling_map(int cpu)
-{
-       int i;
-
-       cpumask_set_cpu(cpu, &cpu_sibling_setup_map);
-
-       for_each_cpu(i, &cpu_sibling_setup_map) {
-               if (cpus_are_siblings(cpu, i)) {
-                       cpumask_set_cpu(i, &cpu_sibling_map[cpu]);
-                       cpumask_set_cpu(cpu, &cpu_sibling_map[i]);
-               }
-       }
-}
-
-static inline void set_cpu_core_map(int cpu)
-{
-       int i;
-
-       cpumask_set_cpu(cpu, &cpu_core_setup_map);
-
-       for_each_cpu(i, &cpu_core_setup_map) {
-               if (cpu_data[cpu].package == cpu_data[i].package) {
-                       cpumask_set_cpu(i, &cpu_core_map[cpu]);
-                       cpumask_set_cpu(cpu, &cpu_core_map[i]);
-               }
-       }
-}
-
-/*
- * Calculate a new cpu_foreign_map mask whenever a
- * new cpu appears or disappears.
- */
-void calculate_cpu_foreign_map(void)
-{
-       int i, k, core_present;
-       cpumask_t temp_foreign_map;
-
-       /* Re-calculate the mask */
-       cpumask_clear(&temp_foreign_map);
-       for_each_online_cpu(i) {
-               core_present = 0;
-               for_each_cpu(k, &temp_foreign_map)
-                       if (cpus_are_siblings(i, k))
-                               core_present = 1;
-               if (!core_present)
-                       cpumask_set_cpu(i, &temp_foreign_map);
-       }
-
-       for_each_online_cpu(i)
-               cpumask_andnot(&cpu_foreign_map[i],
-                              &temp_foreign_map, &cpu_sibling_map[i]);
-}
-
 /* Preload SMP state for boot cpu */
 void smp_prepare_boot_cpu(void)
 {
@@ -509,7 +525,6 @@ asmlinkage void start_secondary(void)
        sync_counter();
        cpu = raw_smp_processor_id();
        set_my_cpu_offset(per_cpu_offset(cpu));
-       rcutree_report_cpu_starting(cpu);
 
        cpu_probe();
        constant_clockevent_init();
index 915f175278931f26164c1b970663542cf0661a12..50a6acd7ffe4c94b986c5f7a9802420f090a7d79 100644 (file)
@@ -675,7 +675,7 @@ static bool fault_supports_huge_mapping(struct kvm_memory_slot *memslot,
  *
  * There are several ways to safely use this helper:
  *
- * - Check mmu_invalidate_retry_hva() after grabbing the mapping level, before
+ * - Check mmu_invalidate_retry_gfn() after grabbing the mapping level, before
  *   consuming it.  In this case, mmu_lock doesn't need to be held during the
  *   lookup, but it does need to be held while checking the MMU notifier.
  *
@@ -855,7 +855,7 @@ retry:
 
        /* Check if an invalidation has taken place since we got pfn */
        spin_lock(&kvm->mmu_lock);
-       if (mmu_invalidate_retry_hva(kvm, mmu_seq, hva)) {
+       if (mmu_invalidate_retry_gfn(kvm, mmu_seq, gfn)) {
                /*
                 * This can happen when mappings are changed asynchronously, but
                 * also synchronously if a COW is triggered by
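The switch from mmu_invalidate_retry_hva() to mmu_invalidate_retry_gfn() follows a cross-architecture KVM rename that narrows the check to guest-frame granularity. The retry protocol around it, as a hedged sketch (helper names hypothetical, not verbatim kernel code):

    static void map_gfn_sketch(struct kvm *kvm, unsigned long gfn)
    {
            unsigned long mmu_seq, pfn;

    again:
            mmu_seq = kvm->mmu_invalidate_seq;      /* sample before the lookup */
            smp_rmb();                              /* pairs with the invalidator's barrier */

            pfn = lookup_pfn(gfn);                  /* lockless, may race */

            spin_lock(&kvm->mmu_lock);
            if (mmu_invalidate_retry_gfn(kvm, mmu_seq, gfn)) {
                    spin_unlock(&kvm->mmu_lock);
                    goto again;                     /* an invalidation hit this gfn */
            }
            install_mapping(gfn, pfn);              /* hypothetical: safe, no race */
            spin_unlock(&kvm->mmu_lock);
    }
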
index 27701991886dda7e3a6f75bd8a7f71a86995735b..36106922b5d75b7f7de70df5df0d72a697440f0f 100644 (file)
@@ -298,74 +298,73 @@ static int _kvm_setcsr(struct kvm_vcpu *vcpu, unsigned int id, u64 val)
        return ret;
 }
 
-static int _kvm_get_cpucfg(int id, u64 *v)
+static int _kvm_get_cpucfg_mask(int id, u64 *v)
 {
-       int ret = 0;
-
-       if (id < 0 && id >= KVM_MAX_CPUCFG_REGS)
+       if (id < 0 || id >= KVM_MAX_CPUCFG_REGS)
                return -EINVAL;
 
        switch (id) {
        case 2:
-               /* Return CPUCFG2 features which have been supported by KVM */
+               /* CPUCFG2 features unconditionally supported by KVM */
                *v = CPUCFG2_FP     | CPUCFG2_FPSP  | CPUCFG2_FPDP     |
                     CPUCFG2_FPVERS | CPUCFG2_LLFTP | CPUCFG2_LLFTPREV |
                     CPUCFG2_LAM;
                /*
-                * If LSX is supported by CPU, it is also supported by KVM,
-                * as we implement it.
+                * For the ISA extensions listed below, if one is supported
+                * by the host, then it is also supported by KVM.
                 */
                if (cpu_has_lsx)
                        *v |= CPUCFG2_LSX;
-               /*
-                * if LASX is supported by CPU, it is also supported by KVM,
-                * as we implement it.
-                */
                if (cpu_has_lasx)
                        *v |= CPUCFG2_LASX;
 
-               break;
+               return 0;
        default:
-               ret = -EINVAL;
-               break;
+               /*
+                * No restrictions on other valid CPUCFG IDs' values, but
+                * CPUCFG data is limited to 32 bits as the LoongArch ISA
+                * manual says (Volume 1, Section 2.2.10.5 "CPUCFG").
+                */
+               *v = U32_MAX;
+               return 0;
        }
-       return ret;
 }
 
 static int kvm_check_cpucfg(int id, u64 val)
 {
-       u64 mask;
-       int ret = 0;
-
-       if (id < 0 && id >= KVM_MAX_CPUCFG_REGS)
-               return -EINVAL;
+       int ret;
+       u64 mask = 0;
 
-       if (_kvm_get_cpucfg(id, &mask))
+       ret = _kvm_get_cpucfg_mask(id, &mask);
+       if (ret)
                return ret;
 
+       if (val & ~mask)
+               /* Unsupported features and/or the higher 32 bits should not be set */
+               return -EINVAL;
+
        switch (id) {
        case 2:
-               /* CPUCFG2 features checking */
-               if (val & ~mask)
-                       /* The unsupported features should not be set */
-                       ret = -EINVAL;
-               else if (!(val & CPUCFG2_LLFTP))
-                       /* The LLFTP must be set, as guest must has a constant timer */
-                       ret = -EINVAL;
-               else if ((val & CPUCFG2_FP) && (!(val & CPUCFG2_FPSP) || !(val & CPUCFG2_FPDP)))
-                       /* Single and double float point must both be set when enable FP */
-                       ret = -EINVAL;
-               else if ((val & CPUCFG2_LSX) && !(val & CPUCFG2_FP))
-                       /* FP should be set when enable LSX */
-                       ret = -EINVAL;
-               else if ((val & CPUCFG2_LASX) && !(val & CPUCFG2_LSX))
-                       /* LSX, FP should be set when enable LASX, and FP has been checked before. */
-                       ret = -EINVAL;
-               break;
+               if (!(val & CPUCFG2_LLFTP))
+                       /* Guests must have a constant timer */
+                       return -EINVAL;
+               if ((val & CPUCFG2_FP) && (!(val & CPUCFG2_FPSP) || !(val & CPUCFG2_FPDP)))
+                       /* Single and double floating point must both be set when FP is enabled */
+                       return -EINVAL;
+               if ((val & CPUCFG2_LSX) && !(val & CPUCFG2_FP))
+                       /* LSX architecturally implies FP but val does not satisfy that */
+                       return -EINVAL;
+               if ((val & CPUCFG2_LASX) && !(val & CPUCFG2_LSX))
+                       /* LASX architecturally implies LSX and FP but val does not satisfy that */
+                       return -EINVAL;
+               return 0;
        default:
-               break;
+               /*
+                * Values for the other CPUCFG IDs are not being further validated
+                * besides the mask check above.
+                */
+               return 0;
        }
-       return ret;
 }
 
 static int kvm_get_one_reg(struct kvm_vcpu *vcpu,
@@ -566,7 +565,7 @@ static int kvm_loongarch_get_cpucfg_attr(struct kvm_vcpu *vcpu,
        uint64_t val;
        uint64_t __user *uaddr = (uint64_t __user *)attr->addr;
 
-       ret = _kvm_get_cpucfg(attr->attr, &val);
+       ret = _kvm_get_cpucfg_mask(attr->attr, &val);
        if (ret)
                return ret;
 
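The rework concentrates validation in two steps: _kvm_get_cpucfg_mask() yields the writable-bit mask for a register ID (U32_MAX for IDs with no extra restrictions, since CPUCFG data is 32-bit), and kvm_check_cpucfg() first rejects any bit outside that mask before applying per-ID dependency rules. The same shape in a self-contained sketch (feature bits hypothetical):

    #include <errno.h>
    #include <stdint.h>

    #define FEAT_A (1u << 0)
    #define FEAT_B (1u << 1)        /* architecturally requires FEAT_A */

    static int get_mask_sketch(int id, uint64_t *mask)
    {
            if (id < 0)
                    return -EINVAL;
            *mask = FEAT_A | FEAT_B;        /* writable bits for this id */
            return 0;
    }

    static int check_reg_sketch(int id, uint64_t val)
    {
            uint64_t mask;
            int ret = get_mask_sketch(id, &mask);

            if (ret)
                    return ret;
            if (val & ~mask)                /* generic unsupported-bit check */
                    return -EINVAL;
            if ((val & FEAT_B) && !(val & FEAT_A))  /* per-ID dependency rule */
                    return -EINVAL;
            return 0;
    }
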
index cc3e81fe0186f4f0fa8de9cedfc75138583ce23f..c608adc9984581d0419594a8eb87ae18a3e9ec63 100644 (file)
@@ -44,6 +44,9 @@ void *kasan_mem_to_shadow(const void *addr)
                unsigned long xrange = (maddr >> XRANGE_SHIFT) & 0xffff;
                unsigned long offset = 0;
 
+               if (maddr >= FIXADDR_START)
+                       return (void *)(kasan_early_shadow_page);
+
                maddr &= XRANGE_SHADOW_MASK;
                switch (xrange) {
                case XKPRANGE_CC_SEG:
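Addresses from FIXADDR_START upward fall outside the xrange-based shadow layout handled by the switch below, so the fix routes them to kasan_early_shadow_page, a shared all-zero shadow page; a zero shadow byte means the whole granule is addressable. Sketch of the dispatch, names hypothetical:

    extern char early_shadow_page[];                /* all-zero shadow page */
    extern void *shadow_for_covered(unsigned long maddr);
    #define UNCOVERED_START 0xfffffffffe000000UL    /* hypothetical boundary */

    void *mem_to_shadow_sketch(unsigned long maddr)
    {
            if (maddr >= UNCOVERED_START)           /* e.g. the fixmap region */
                    return early_shadow_page;       /* zero shadow: accesses pass */
            return shadow_for_covered(maddr);       /* the xrange switch above */
    }
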
index 2c0a411f23aa778bb62160bd511252736fc987be..0b95d32b30c94704a0108fdffcae68c148403ce7 100644 (file)
@@ -284,12 +284,16 @@ static void setup_tlb_handler(int cpu)
                set_handler(EXCCODE_TLBNR * VECSIZE, handle_tlb_protect, VECSIZE);
                set_handler(EXCCODE_TLBNX * VECSIZE, handle_tlb_protect, VECSIZE);
                set_handler(EXCCODE_TLBPE * VECSIZE, handle_tlb_protect, VECSIZE);
-       }
+       } else {
+               int vec_sz __maybe_unused;
+               void *addr __maybe_unused;
+               struct page *page __maybe_unused;
+
+               /* Avoid lockdep warning */
+               rcutree_report_cpu_starting(cpu);
+
 #ifdef CONFIG_NUMA
-       else {
-               void *addr;
-               struct page *page;
-               const int vec_sz = sizeof(exception_handlers);
+               vec_sz = sizeof(exception_handlers);
 
                if (pcpu_handlers[cpu])
                        return;
@@ -305,8 +309,8 @@ static void setup_tlb_handler(int cpu)
                csr_write64(pcpu_handlers[cpu], LOONGARCH_CSR_EENTRY);
                csr_write64(pcpu_handlers[cpu], LOONGARCH_CSR_MERRENTRY);
                csr_write64(pcpu_handlers[cpu] + 80*VECSIZE, LOONGARCH_CSR_TLBRENTRY);
-       }
 #endif
+       }
 }
 
 void tlb_init(int cpu)
index c74c9921304f2273fea31278cfafce7b143a75ea..f597cd08a96be0a19084884bd175678a6a83d6ab 100644 (file)
@@ -2,6 +2,7 @@
 # Objects to go into the VDSO.
 
 KASAN_SANITIZE := n
+UBSAN_SANITIZE := n
 KCOV_INSTRUMENT := n
 
 # Include the generic Makefile to check the built vdso.
index 43e39040d3ac6cd38a4bd4fc3dc04e03d5c71bf5..0abcf994ce5503e3e713f0ee2f1d563f978786a4 100644 (file)
 KBUILD_DEFCONFIG := multi_defconfig
 
 ifdef cross_compiling
-       ifeq ($(CROSS_COMPILE),)
+    ifeq ($(CROSS_COMPILE),)
                CROSS_COMPILE := $(call cc-cross-prefix, \
                        m68k-linux-gnu- m68k-linux- m68k-unknown-linux-gnu-)
-       endif
+    endif
 endif
 
 #
index a708fbd5a844f8a2c6a60cea6eb2e5a126e8dbdb..642fb80c5c4e31f6c595e1663a19d7760c68e1e1 100644 (file)
@@ -96,6 +96,9 @@ static const struct block_device_operations nfhd_ops = {
 
 static int __init nfhd_init_one(int id, u32 blocks, u32 bsize)
 {
+       struct queue_limits lim = {
+               .logical_block_size     = bsize,
+       };
        struct nfhd_device *dev;
        int dev_id = id - NFHD_DEV_OFFSET;
        int err = -ENOMEM;
@@ -117,9 +120,11 @@ static int __init nfhd_init_one(int id, u32 blocks, u32 bsize)
        dev->bsize = bsize;
        dev->bshift = ffs(bsize) - 10;
 
-       dev->disk = blk_alloc_disk(NUMA_NO_NODE);
-       if (!dev->disk)
+       dev->disk = blk_alloc_disk(&lim, NUMA_NO_NODE);
+       if (IS_ERR(dev->disk)) {
+               err = PTR_ERR(dev->disk);
                goto free_dev;
+       }
 
        dev->disk->major = major_num;
        dev->disk->first_minor = dev_id * 16;
@@ -128,7 +133,6 @@ static int __init nfhd_init_one(int id, u32 blocks, u32 bsize)
        dev->disk->private_data = dev;
        sprintf(dev->disk->disk_name, "nfhd%u", dev_id);
        set_capacity(dev->disk, (sector_t)blocks * (bsize / 512));
-       blk_queue_logical_block_size(dev->disk->queue, bsize);
        err = add_disk(dev->disk);
        if (err)
                goto out_cleanup_disk;
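This hunk tracks a block-layer API change visible throughout this merge window: blk_alloc_disk() now takes a struct queue_limits, so limits such as the logical block size are fixed at allocation time instead of set afterwards via blk_queue_logical_block_size(), and it signals failure through ERR_PTR() rather than NULL. The new calling convention, as used above (fragment):

    struct queue_limits lim = {
            .logical_block_size = bsize,
    };

    disk = blk_alloc_disk(&lim, NUMA_NO_NODE);
    if (IS_ERR(disk))
            return PTR_ERR(disk);   /* failure is ERR_PTR-encoded, not NULL */
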
index b13d8adf3be47dbfd6f65e1e63ee3217feafe04b..20d30f6265cdce2a915ddffc52d0bb67e6e0edac 100644 (file)
@@ -40,6 +40,7 @@
 #include <linux/string.h>
 
 #include <asm/bootinfo.h>
+#include <prom.h>
 
 int prom_argc;
 char **prom_argv;
index 2388d68786f4a7c40dcadfed78fd8ecfc91f4896..a7a6d31a7a4148ada6ad340d0723ef8c7a73f0be 100644 (file)
 #include <linux/mm.h>
 #include <linux/dma-map-ops.h> /* for dma_default_coherent */
 
+#include <asm/bootinfo.h>
 #include <asm/mipsregs.h>
 
 #include <au1000.h>
 
-extern void __init board_setup(void);
-extern void __init alchemy_set_lpj(void);
-
 static bool alchemy_dma_coherent(void)
 {
        switch (alchemy_get_cputype()) {
index 01aff80a59672dee1b675c3625aecb6f70eb52b9..99f321b6e417bd4250ab7cec31ae74ad2d396ec3 100644 (file)
@@ -702,7 +702,7 @@ static struct ssb_sprom bcm63xx_sprom = {
        .boardflags_hi          = 0x0000,
 };
 
-int bcm63xx_get_fallback_sprom(struct ssb_bus *bus, struct ssb_sprom *out)
+static int bcm63xx_get_fallback_sprom(struct ssb_bus *bus, struct ssb_sprom *out)
 {
        if (bus->bustype == SSB_BUSTYPE_PCI) {
                memcpy(out, &bcm63xx_sprom, sizeof(struct ssb_sprom));
index d277b4dc6c688eb394544b556e0941a54654c1b9..f94151f7c96fe1d988cd3d88f8451bbdf012955c 100644 (file)
@@ -26,7 +26,7 @@ static struct platform_device bcm63xx_rng_device = {
        .resource       = rng_resources,
 };
 
-int __init bcm63xx_rng_register(void)
+static int __init bcm63xx_rng_register(void)
 {
        if (!BCMCPU_IS_6368())
                return -ENODEV;
index 3bc7f3bfc9ad5c5e45737bcf1510bfcd5b5483e7..5d6bf0445b299cf0e91a4f7992f134e5648ca1c2 100644 (file)
@@ -10,6 +10,7 @@
 #include <linux/kernel.h>
 #include <linux/platform_device.h>
 #include <bcm63xx_cpu.h>
+#include <bcm63xx_dev_uart.h>
 
 static struct resource uart0_resources[] = {
        {
index 42130914a3c210993c07d971449a424d40775060..302bf7ed5ad5abfaa6cb94e4a4e0dcdf1ffbceb1 100644 (file)
@@ -34,7 +34,7 @@ static struct platform_device bcm63xx_wdt_device = {
        },
 };
 
-int __init bcm63xx_wdt_register(void)
+static int __init bcm63xx_wdt_register(void)
 {
        wdt_resources[0].start = bcm63xx_regset_address(RSET_WDT);
        wdt_resources[0].end = wdt_resources[0].start;
index 2548013442f6d95bdda071f89cc112d97d8a0d0a..6240a8f88ea366b5d440f6de3416191dead812b8 100644 (file)
@@ -72,7 +72,7 @@ static inline int enable_irq_for_cpu(int cpu, struct irq_data *d,
  */
 
 #define BUILD_IPIC_INTERNAL(width)                                     \
-void __dispatch_internal_##width(int cpu)                              \
+static void __dispatch_internal_##width(int cpu)                       \
 {                                                                      \
        u32 pending[width / 32];                                        \
        unsigned int src, tgt;                                          \
index d811e3e03f819a5005a480d56d5aee5a090fcc3c..c13ddb544a23bf0ebfd6bd627c9ed022a44cda0e 100644 (file)
@@ -159,7 +159,7 @@ void __init plat_mem_setup(void)
        board_setup();
 }
 
-int __init bcm63xx_register_devices(void)
+static int __init bcm63xx_register_devices(void)
 {
        /* register gpiochip */
        bcm63xx_gpio_init();
index a86065854c0c8c6c92254c4d7746fda8e6801250..74b83807df30a7be13f1f9466753b2560ce9b50b 100644 (file)
@@ -178,7 +178,7 @@ int bcm63xx_timer_set(int id, int monotonic, unsigned int countdown_us)
 
 EXPORT_SYMBOL(bcm63xx_timer_set);
 
-int bcm63xx_timer_init(void)
+static int bcm63xx_timer_init(void)
 {
        int ret, irq;
        u32 reg;
index 2e099d55a564a6ecf3dc347ace84ad25e4278dd9..9a266bf7833993b5facbdb63c97e555ad4d9ce27 100644 (file)
@@ -23,9 +23,6 @@
 
 #include <cobalt.h>
 
-extern void cobalt_machine_restart(char *command);
-extern void cobalt_machine_halt(void);
-
 const char *get_system_type(void)
 {
        switch (cobalt_board_id) {
index 66188739f54d20a41ce18acb0a88a4fdf16e8718..fb78e6fd5de4804e221fba63bceeb4dcd4a492a9 100644 (file)
@@ -37,7 +37,7 @@ static unsigned int nr_prom_mem __initdata;
  */
 #define ARC_PAGE_SHIFT 12
 
-struct linux_mdesc * __init ArcGetMemoryDescriptor(struct linux_mdesc *Current)
+static struct linux_mdesc * __init ArcGetMemoryDescriptor(struct linux_mdesc *Current)
 {
        return (struct linux_mdesc *) ARC_CALL1(get_mdesc, Current);
 }
index 4044eaf989ac7dad0f2094c5d4cfab05ac9fb5c3..0921ddda11a4b353c1c4d754417d3de4d003c12f 100644 (file)
@@ -241,7 +241,8 @@ static __inline__ __sum16 csum_ipv6_magic(const struct in6_addr *saddr,
        "       .set    pop"
        : "=&r" (sum), "=&r" (tmp)
        : "r" (saddr), "r" (daddr),
-         "0" (htonl(len)), "r" (htonl(proto)), "r" (sum));
+         "0" (htonl(len)), "r" (htonl(proto)), "r" (sum)
+       : "memory");
 
        return csum_fold(sum);
 }
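The added "memory" clobber is needed because the asm dereferences saddr and daddr: the pointers are register operands, but the loads through them are invisible to the compiler, which could otherwise reorder or drop stores to those buffers across the asm. The rule in miniature (MIPS-flavoured sketch):

    /* Sketch: inline asm that loads through a pointer operand. */
    static inline unsigned int load_word(const unsigned int *p)
    {
            unsigned int v;

            asm("lw %0, 0(%1)"      /* MIPS load; GCC cannot see that *p is read */
                : "=r" (v)
                : "r" (p)
                : "memory");        /* keep prior stores to *p ordered before the asm */
            return v;
    }
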
index 081be98c71ef48c698f4aa6ba14945239a666a9d..ff5d388502d4ab56ec28d71ad4126d542bb65977 100644 (file)
@@ -39,7 +39,7 @@ extern void jump_label_apply_nops(struct module *mod);
 
 static __always_inline bool arch_static_branch(struct static_key *key, bool branch)
 {
-       asm_volatile_goto("1:\t" B_INSN " 2f\n\t"
+       asm goto("1:\t" B_INSN " 2f\n\t"
                "2:\t.insn\n\t"
                ".pushsection __jump_table,  \"aw\"\n\t"
                WORD_INSN " 1b, %l[l_yes], %0\n\t"
@@ -53,7 +53,7 @@ l_yes:
 
 static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch)
 {
-       asm_volatile_goto("1:\t" J_INSN " %l[l_yes]\n\t"
+       asm goto("1:\t" J_INSN " %l[l_yes]\n\t"
                ".pushsection __jump_table,  \"aw\"\n\t"
                WORD_INSN " 1b, %l[l_yes], %0\n\t"
                ".popsection\n\t"
index a7eec3364a64abb60f1dae67ad26c80738878533..41546777902ba0fe25af0f442c688169f9220b48 100644 (file)
 
 #include <asm/cpu.h>
 
+void alchemy_set_lpj(void);
+void board_setup(void);
+
 /* helpers to access the SYS_* registers */
 static inline unsigned long alchemy_rdsys(int regofs)
 {
index 5b9fce73f11d1301fa5724049bfd9f8625ea7061..97f9d5e9446d22e1371b1c9f6fe09d59610d27ed 100644 (file)
@@ -19,4 +19,7 @@ extern int cobalt_board_id;
 #define COBALT_BRD_ID_QUBE2    0x5
 #define COBALT_BRD_ID_RAQ2     0x6
 
+void cobalt_machine_halt(void);
+void cobalt_machine_restart(char *command);
+
 #endif /* __ASM_COBALT_H */
index daf3cf244ea972c9a8bf134a09fa081931645425..d14d0e37ad02ddf10b42cfed590c65f97f8de424 100644 (file)
@@ -60,6 +60,7 @@ static inline void instruction_pointer_set(struct pt_regs *regs,
                                            unsigned long val)
 {
        regs->cp0_epc = val;
+       regs->cp0_cause &= ~CAUSEF_BD;
 }
 
 /* Query offset/name of register from its name/offset */
@@ -154,6 +155,8 @@ static inline long regs_return_value(struct pt_regs *regs)
 }
 
 #define instruction_pointer(regs) ((regs)->cp0_epc)
+extern unsigned long exception_ip(struct pt_regs *regs);
+#define exception_ip(regs) exception_ip(regs)
 #define profile_pc(regs) instruction_pointer(regs)
 
 extern asmlinkage long syscall_trace_enter(struct pt_regs *regs, long syscall);
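The seemingly circular `#define exception_ip(regs) exception_ip(regs)` is a standard kernel idiom: defining a macro with the same name as the function lets generic code detect the arch override with #ifdef and fall back otherwise. The generic side would look roughly like this (sketch of the counterpart, not quoted from this diff):

    /* generic header side (sketch) */
    #ifndef exception_ip
    #define exception_ip(regs) instruction_pointer(regs)
    #endif
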
index 5582a4ca1e9e36ad5dac4d23caa4d6c4bfb11a5d..7aa2c2360ff60219bb8fb9f03a8a528edf7f53a1 100644 (file)
@@ -11,6 +11,7 @@
 
 #include <asm/cpu-features.h>
 #include <asm/cpu-info.h>
+#include <asm/fpu.h>
 
 #ifdef CONFIG_MIPS_FP_SUPPORT
 
@@ -309,6 +310,11 @@ void mips_set_personality_nan(struct arch_elf_state *state)
        struct cpuinfo_mips *c = &boot_cpu_data;
        struct task_struct *t = current;
 
+       /*
+        * Do this early so t->thread.fpu.fcr31 won't be clobbered in case
+        * we are preempted before the lose_fpu(0) in start_thread.
+        */
+       lose_fpu(0);
+
        t->thread.fpu.fcr31 = c->fpu_csr31;
        switch (state->nan_2008) {
        case 0:
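The ordering in the hunk above matters: lose_fpu(0) first makes the task stop owning the live FPU, so a later preemption cannot save stale hardware FCSR contents over the fcr31 value written next. As an annotated fragment of the code above (not standalone):

    lose_fpu(0);                        /* 1: relinquish the live FPU state */
    t->thread.fpu.fcr31 = c->fpu_csr31; /* 2: safe; a context switch now has
                                              nothing live to save over this */
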
index d9df543f7e2c4cd17b29522840154f1e323cacb0..59288c13b581b89ccb46214c7be02126a017dab2 100644 (file)
@@ -31,6 +31,7 @@
 #include <linux/seccomp.h>
 #include <linux/ftrace.h>
 
+#include <asm/branch.h>
 #include <asm/byteorder.h>
 #include <asm/cpu.h>
 #include <asm/cpu-info.h>
 #define CREATE_TRACE_POINTS
 #include <trace/events/syscalls.h>
 
+unsigned long exception_ip(struct pt_regs *regs)
+{
+       return exception_epc(regs);
+}
+EXPORT_SYMBOL(exception_ip);
+
 /*
  * Called by kernel/ptrace.c when detaching..
  *
index dec6878b35f627089226618ff4dc4628855c8eb4..a1c1cb5de91321468f338d41a01df2f40efaf293 100644 (file)
@@ -2007,7 +2007,13 @@ unsigned long vi_handlers[64];
 
 void reserve_exception_space(phys_addr_t addr, unsigned long size)
 {
-       memblock_reserve(addr, size);
+       /*
+        * Reserving exception space on CPUs other than CPU0 is too late,
+        * since memblock is no longer available by the time the APs are
+        * brought up.
+        */
+       if (smp_processor_id() == 0)
+               memblock_reserve(addr, size);
 }
 
 void __init *set_except_vector(int n, void *addr)
index a3cf293658581ed6a599da2b870f3c10c67a6be1..0c45767eacf67429ea3910628a2f44c219a4da34 100644 (file)
@@ -108,10 +108,9 @@ void __init prom_init(void)
        prom_init_cmdline();
 
 #if defined(CONFIG_MIPS_MT_SMP)
-       if (cpu_has_mipsmt) {
-               lantiq_smp_ops = vsmp_smp_ops;
+       lantiq_smp_ops = vsmp_smp_ops;
+       if (cpu_has_mipsmt)
                lantiq_smp_ops.init_secondary = lantiq_init_secondary;
-               register_smp_ops(&lantiq_smp_ops);
-       }
+       register_smp_ops(&lantiq_smp_ops);
 #endif
 }
index f25caa6aa9d306e84d719e97ea54f7b8faa449c1..553142c1f14fe2261d963b3784f3ed9e6c086cd2 100644 (file)
@@ -103,6 +103,9 @@ void __init szmem(unsigned int node)
        if (loongson_sysconf.vgabios_addr)
                memblock_reserve(virt_to_phys((void *)loongson_sysconf.vgabios_addr),
                                SZ_256K);
+       /* set nid for reserved memory */
+       memblock_set_node((u64)node << 44, (u64)(node + 1) << 44,
+                       &memblock.reserved, node);
 }
 
 #ifndef CONFIG_NUMA
index 8f61e93c0c5bcf07134cc22a06913c57e5140af4..68dafd6d3e2571f615e9c9e7d9b2c895de80468a 100644 (file)
@@ -132,6 +132,8 @@ static void __init node_mem_init(unsigned int node)
 
                /* Reserve pfn range 0~node[0]->node_start_pfn */
                memblock_reserve(0, PAGE_SIZE * start_pfn);
+               /* set nid for reserved memory on node 0 */
+               memblock_set_node(0, 1ULL << 44, &memblock.reserved, 0);
        }
 }
 
index 27c14ede191eb7b1353e3a2cedd6d9d80bc2b385..9877fcc512b1578731fb6235a35256a61b172afb 100644 (file)
@@ -5,7 +5,7 @@
 
 obj-y  := ip27-berr.o ip27-irq.o ip27-init.o ip27-klconfig.o \
           ip27-klnuma.o ip27-memory.o ip27-nmi.o ip27-reset.o ip27-timer.o \
-          ip27-hubio.o ip27-xtalk.o
+          ip27-xtalk.o
 
 obj-$(CONFIG_EARLY_PRINTK)     += ip27-console.o
 obj-$(CONFIG_SMP)              += ip27-smp.o
index 923a63a51cda39482c227936c17f828ceae3227b..9eb497cb5d525c74e775ca741bd4ec664209280b 100644 (file)
@@ -22,6 +22,8 @@
 #include <asm/traps.h>
 #include <linux/uaccess.h>
 
+#include "ip27-common.h"
+
 static void dump_hub_information(unsigned long errst0, unsigned long errst1)
 {
        static char *err_type[2][8] = {
@@ -57,7 +59,7 @@ static void dump_hub_information(unsigned long errst0, unsigned long errst1)
               [st0.pi_stat0_fmt.s0_err_type] ? : "invalid");
 }
 
-int ip27_be_handler(struct pt_regs *regs, int is_fixup)
+static int ip27_be_handler(struct pt_regs *regs, int is_fixup)
 {
        unsigned long errst0, errst1;
        int data = regs->cp0_cause & 4;
index ed008a08464c208cc1944cfbd6fe5de31e14fee4..a0059fa13934539af5fb616120f66b77054a2219 100644 (file)
@@ -10,6 +10,7 @@ extern void hub_rt_clock_event_init(void);
 extern void hub_rtc_init(nasid_t nasid);
 extern void install_cpu_nmi_handler(int slice);
 extern void install_ipi(void);
+extern void ip27_be_init(void);
 extern void ip27_reboot_setup(void);
 extern const struct plat_smp_ops ip27_smp_ops;
 extern unsigned long node_getfirstfree(nasid_t nasid);
@@ -17,4 +18,5 @@ extern void per_cpu_init(void);
 extern void replicate_kernel_text(void);
 extern void setup_replication_mask(void);
 
+
 #endif /* __IP27_COMMON_H */
diff --git a/arch/mips/sgi-ip27/ip27-hubio.c b/arch/mips/sgi-ip27/ip27-hubio.c
deleted file mode 100644 (file)
index c57f0d8..0000000
+++ /dev/null
@@ -1,185 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * Copyright (C) 1992-1997, 2000-2003 Silicon Graphics, Inc.
- * Copyright (C) 2004 Christoph Hellwig.
- *
- * Support functions for the HUB ASIC - mostly PIO mapping related.
- */
-
-#include <linux/bitops.h>
-#include <linux/string.h>
-#include <linux/mmzone.h>
-#include <asm/sn/addrs.h>
-#include <asm/sn/arch.h>
-#include <asm/sn/agent.h>
-#include <asm/sn/io.h>
-#include <asm/xtalk/xtalk.h>
-
-
-static int force_fire_and_forget = 1;
-
-/**
- * hub_pio_map -  establish a HUB PIO mapping
- *
- * @nasid:     nasid to perform PIO mapping on
- * @widget:    widget ID to perform PIO mapping for
- * @xtalk_addr: xtalk_address that needs to be mapped
- * @size:      size of the PIO mapping
- *
- **/
-unsigned long hub_pio_map(nasid_t nasid, xwidgetnum_t widget,
-                         unsigned long xtalk_addr, size_t size)
-{
-       unsigned i;
-
-       /* use small-window mapping if possible */
-       if ((xtalk_addr % SWIN_SIZE) + size <= SWIN_SIZE)
-               return NODE_SWIN_BASE(nasid, widget) + (xtalk_addr % SWIN_SIZE);
-
-       if ((xtalk_addr % BWIN_SIZE) + size > BWIN_SIZE) {
-               printk(KERN_WARNING "PIO mapping at hub %d widget %d addr 0x%lx"
-                               " too big (%ld)\n",
-                               nasid, widget, xtalk_addr, size);
-               return 0;
-       }
-
-       xtalk_addr &= ~(BWIN_SIZE-1);
-       for (i = 0; i < HUB_NUM_BIG_WINDOW; i++) {
-               if (test_and_set_bit(i, hub_data(nasid)->h_bigwin_used))
-                       continue;
-
-               /*
-                * The code below does a PIO write to setup an ITTE entry.
-                *
-                * We need to prevent other CPUs from seeing our updated
-                * memory shadow of the ITTE (in the piomap) until the ITTE
-                * entry is actually set up; otherwise, another CPU might
-                * attempt a PIO prematurely.
-                *
-                * Also, the only way we can know that an entry has been
-                * received  by the hub and can be used by future PIO reads/
-                * writes is by reading back the ITTE entry after writing it.
-                *
-                * For these two reasons, we PIO read back the ITTE entry
-                * after we write it.
-                */
-               IIO_ITTE_PUT(nasid, i, HUB_PIO_MAP_TO_MEM, widget, xtalk_addr);
-               __raw_readq(IIO_ITTE_GET(nasid, i));
-
-               return NODE_BWIN_BASE(nasid, widget) + (xtalk_addr % BWIN_SIZE);
-       }
-
-       printk(KERN_WARNING "unable to establish PIO mapping for at"
-                       " hub %d widget %d addr 0x%lx\n",
-                       nasid, widget, xtalk_addr);
-       return 0;
-}
-
-
-/*
- * hub_setup_prb(nasid, prbnum, credits, conveyor)
- *
- *     Put a PRB into fire-and-forget mode if conveyor isn't set.  Otherwise,
- *     put it into conveyor belt mode with the specified number of credits.
- */
-static void hub_setup_prb(nasid_t nasid, int prbnum, int credits)
-{
-       union iprb_u prb;
-       int prb_offset;
-
-       /*
-        * Get the current register value.
-        */
-       prb_offset = IIO_IOPRB(prbnum);
-       prb.iprb_regval = REMOTE_HUB_L(nasid, prb_offset);
-
-       /*
-        * Clear out some fields.
-        */
-       prb.iprb_ovflow = 1;
-       prb.iprb_bnakctr = 0;
-       prb.iprb_anakctr = 0;
-
-       /*
-        * Enable or disable fire-and-forget mode.
-        */
-       prb.iprb_ff = force_fire_and_forget ? 1 : 0;
-
-       /*
-        * Set the appropriate number of PIO credits for the widget.
-        */
-       prb.iprb_xtalkctr = credits;
-
-       /*
-        * Store the new value to the register.
-        */
-       REMOTE_HUB_S(nasid, prb_offset, prb.iprb_regval);
-}
-
-/**
- * hub_set_piomode  -  set pio mode for a given hub
- *
- * @nasid:     physical node ID for the hub in question
- *
- * Put the hub into either "PIO conveyor belt" mode or "fire-and-forget" mode.
- * To do this, we have to make absolutely sure that no PIOs are in progress
- * so we turn off access to all widgets for the duration of the function.
- *
- * XXX - This code should really check what kind of widget we're talking
- * to. Bridges can only handle three requests, but XG will do more.
- * How many can crossbow handle to widget 0?  We're assuming 1.
- *
- * XXX - There is a bug in the crossbow that link reset PIOs do not
- * return write responses.  The easiest solution to this problem is to
- * leave widget 0 (xbow) in fire-and-forget mode at all times. This
- * only affects pio's to xbow registers, which should be rare.
- **/
-static void hub_set_piomode(nasid_t nasid)
-{
-       u64 ii_iowa;
-       union hubii_wcr_u ii_wcr;
-       unsigned i;
-
-       ii_iowa = REMOTE_HUB_L(nasid, IIO_OUTWIDGET_ACCESS);
-       REMOTE_HUB_S(nasid, IIO_OUTWIDGET_ACCESS, 0);
-
-       ii_wcr.wcr_reg_value = REMOTE_HUB_L(nasid, IIO_WCR);
-
-       if (ii_wcr.iwcr_dir_con) {
-               /*
-                * Assume a bridge here.
-                */
-               hub_setup_prb(nasid, 0, 3);
-       } else {
-               /*
-                * Assume a crossbow here.
-                */
-               hub_setup_prb(nasid, 0, 1);
-       }
-
-       /*
-        * XXX - Here's where we should take the widget type into
-        * when account assigning credits.
-        */
-       for (i = HUB_WIDGET_ID_MIN; i <= HUB_WIDGET_ID_MAX; i++)
-               hub_setup_prb(nasid, i, 3);
-
-       REMOTE_HUB_S(nasid, IIO_OUTWIDGET_ACCESS, ii_iowa);
-}
-
-/*
- * hub_pio_init         -  PIO-related hub initialization
- *
- * @hub:       hubinfo structure for our hub
- */
-void hub_pio_init(nasid_t nasid)
-{
-       unsigned i;
-
-       /* initialize big window piomaps for this hub */
-       bitmap_zero(hub_data(nasid)->h_bigwin_used, HUB_NUM_BIG_WINDOW);
-       for (i = 0; i < HUB_NUM_BIG_WINDOW; i++)
-               IIO_ITTE_DISABLE(nasid, i);
-
-       hub_set_piomode(nasid);
-}
index a0dd3bd2b81b359491b447917486890ebc18fd4b..8f5299b269e7e7d1b104d6fa4616de4f7fdfc34d 100644 (file)
@@ -23,6 +23,8 @@
 #include <asm/sn/intr.h>
 #include <asm/sn/irq_alloc.h>
 
+#include "ip27-common.h"
+
 struct hub_irq_data {
        u64     *irq_mask[2];
        cpuid_t cpu;
index f79c4839371661237141b866d89743a101411c53..b8ca94cfb4fef34b42f9e5307e7dcfc09ef8a6d2 100644 (file)
@@ -23,6 +23,7 @@
 #include <asm/page.h>
 #include <asm/pgalloc.h>
 #include <asm/sections.h>
+#include <asm/sgialib.h>
 
 #include <asm/sn/arch.h>
 #include <asm/sn/agent.h>
index 84889b57d5ff684e32bc2a1897583a0f4770853e..fc2816398d0cf04a48c1f704ade54a65b97e15f8 100644 (file)
@@ -11,6 +11,8 @@
 #include <asm/sn/arch.h>
 #include <asm/sn/agent.h>
 
+#include "ip27-common.h"
+
 #if 0
 #define NODE_NUM_CPUS(n)       CNODE_NUM_CPUS(n)
 #else
 typedef unsigned long machreg_t;
 
 static arch_spinlock_t nmi_lock = __ARCH_SPIN_LOCK_UNLOCKED;
-
-/*
- * Let's see what else we need to do here. Set up sp, gp?
- */
-void nmi_dump(void)
-{
-       void cont_nmi_dump(void);
-
-       cont_nmi_dump();
-}
+static void nmi_dump(void);
 
 void install_cpu_nmi_handler(int slice)
 {
@@ -53,7 +46,7 @@ void install_cpu_nmi_handler(int slice)
  * into the eframe format for the node under consideration.
  */
 
-void nmi_cpu_eframe_save(nasid_t nasid, int slice)
+static void nmi_cpu_eframe_save(nasid_t nasid, int slice)
 {
        struct reg_struct *nr;
        int             i;
@@ -129,7 +122,7 @@ void nmi_cpu_eframe_save(nasid_t nasid, int slice)
        pr_emerg("\n");
 }
 
-void nmi_dump_hub_irq(nasid_t nasid, int slice)
+static void nmi_dump_hub_irq(nasid_t nasid, int slice)
 {
        u64 mask0, mask1, pend0, pend1;
 
@@ -153,7 +146,7 @@ void nmi_dump_hub_irq(nasid_t nasid, int slice)
  * Copy the cpu registers which have been saved in the IP27prom format
  * into the eframe format for the node under consideration.
  */
-void nmi_node_eframe_save(nasid_t nasid)
+static void nmi_node_eframe_save(nasid_t nasid)
 {
        int slice;
 
@@ -170,8 +163,7 @@ void nmi_node_eframe_save(nasid_t nasid)
 /*
  * Save the nmi cpu registers for all cpus in the system.
  */
-void
-nmi_eframes_save(void)
+static void nmi_eframes_save(void)
 {
        nasid_t nasid;
 
@@ -179,8 +171,7 @@ nmi_eframes_save(void)
                nmi_node_eframe_save(nasid);
 }
 
-void
-cont_nmi_dump(void)
+static void nmi_dump(void)
 {
 #ifndef REAL_NMI_SIGNAL
        static atomic_t nmied_cpus = ATOMIC_INIT(0);
index b91f8c4fdc786011172f8111e7e0dfc3e04705e1..7c6dcf6e73f701c68595bd3b26677ff8d667b56a 100644 (file)
@@ -3,6 +3,7 @@
 #include <linux/io.h>
 
 #include <asm/sn/ioc3.h>
+#include <asm/setup.h>
 
 static inline struct ioc3_uartregs *console_uart(void)
 {
index 75a34684e7045977a89faa54b1ec740eb13af5ff..e8547636a7482a4a4c08738bccf7f246b8061d26 100644 (file)
@@ -14,6 +14,7 @@
 #include <linux/percpu.h>
 #include <linux/memblock.h>
 
+#include <asm/bootinfo.h>
 #include <asm/smp-ops.h>
 #include <asm/sgialib.h>
 #include <asm/time.h>
index a8e0c776ca6c628faa0b0ef4828de3fb4e9f51a2..b8a0e4cfa9ce882dcba3c0dc4e911716d47a457b 100644 (file)
@@ -18,6 +18,8 @@
 #include <asm/ip32/crime.h>
 #include <asm/ip32/mace.h>
 
+#include "ip32-common.h"
+
 struct sgi_crime __iomem *crime;
 struct sgi_mace __iomem *mace;
 
@@ -39,7 +41,7 @@ void __init crime_init(void)
               id, rev, field, (unsigned long) CRIME_BASE);
 }
 
-irqreturn_t crime_memerr_intr(unsigned int irq, void *dev_id)
+irqreturn_t crime_memerr_intr(int irq, void *dev_id)
 {
        unsigned long stat, addr;
        int fatal = 0;
@@ -90,7 +92,7 @@ irqreturn_t crime_memerr_intr(unsigned int irq, void *dev_id)
        return IRQ_HANDLED;
 }
 
-irqreturn_t crime_cpuerr_intr(unsigned int irq, void *dev_id)
+irqreturn_t crime_cpuerr_intr(int irq, void *dev_id)
 {
        unsigned long stat = crime->cpu_error_stat & CRIME_CPU_ERROR_MASK;
        unsigned long addr = crime->cpu_error_addr & CRIME_CPU_ERROR_ADDR_MASK;
index 478b63b4c808f35456bb0b4ba69de4450edb7404..7cbc27941f928399c3cd5166741f5495c55b7eaa 100644 (file)
@@ -18,6 +18,8 @@
 #include <asm/ptrace.h>
 #include <asm/tlbdebug.h>
 
+#include "ip32-common.h"
+
 static int ip32_be_handler(struct pt_regs *regs, int is_fixup)
 {
        int data = regs->cp0_cause & 4;
diff --git a/arch/mips/sgi-ip32/ip32-common.h b/arch/mips/sgi-ip32/ip32-common.h
new file mode 100644 (file)
index 0000000..cfc0225
--- /dev/null
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef __IP32_COMMON_H
+#define __IP32_COMMON_H
+
+#include <linux/init.h>
+#include <linux/interrupt.h>
+
+void __init crime_init(void);
+irqreturn_t crime_memerr_intr(int irq, void *dev_id);
+irqreturn_t crime_cpuerr_intr(int irq, void *dev_id);
+void __init ip32_be_init(void);
+void ip32_prepare_poweroff(void);
+
+#endif /* __IP32_COMMON_H */
index e21ea1de05e31953ce51f04122512cd27b2d9c46..29d04468a06b8f5c4004a25a18ad94dcb08013f9 100644 (file)
@@ -28,6 +28,8 @@
 #include <asm/ip32/mace.h>
 #include <asm/ip32/ip32_ints.h>
 
+#include "ip32-common.h"
+
 /* issue a PIO read to make sure no PIO writes are pending */
 static inline void flush_crime_bus(void)
 {
@@ -107,10 +109,6 @@ static inline void flush_mace_bus(void)
  * is quite different anyway.
  */
 
-/* Some initial interrupts to set up */
-extern irqreturn_t crime_memerr_intr(int irq, void *dev_id);
-extern irqreturn_t crime_cpuerr_intr(int irq, void *dev_id);
-
 /*
  * This is for pure CRIME interrupts - ie not MACE.  The advantage?
  * We get to split the register in half and do faster lookups.
index 3fc8d0a0bdfa45cc8b3aead0bd31144a874e17bb..5fee33744f674bdbdd777ba63b7d15f92d661a99 100644 (file)
@@ -15,6 +15,7 @@
 #include <asm/ip32/crime.h>
 #include <asm/bootinfo.h>
 #include <asm/page.h>
+#include <asm/sgialib.h>
 
 extern void crime_init(void);
 
index 18d1c115cd534a2d78a1ee5f8b53681e46fc021f..6bdc1421cda46cad28b5b253bf53703005ed09bf 100644 (file)
@@ -29,6 +29,8 @@
 #include <asm/ip32/crime.h>
 #include <asm/ip32/ip32_ints.h>
 
+#include "ip32-common.h"
+
 #define POWERDOWN_TIMEOUT      120
 /*
  * Blink frequency during reboot grace period and when panicked.
index 8019dae1721a811cef26fb75430a2b3ca151d6dd..aeb0805aae57bacfef7b95877042a6dc476a14a5 100644 (file)
@@ -26,8 +26,7 @@
 #include <asm/ip32/mace.h>
 #include <asm/ip32/ip32_ints.h>
 
-extern void ip32_be_init(void);
-extern void crime_init(void);
+#include "ip32-common.h"
 
 #ifdef CONFIG_SGI_O2MACE_ETH
 /*
index d14ccc948a29b920854b6c750febffac625619fd..5c845e8d59d92f8cd3594fccf1476503d8957149 100644 (file)
@@ -25,7 +25,6 @@ config PARISC
        select RTC_DRV_GENERIC
        select INIT_ALL_POSSIBLE
        select BUG
-       select BUILDTIME_TABLE_SORT
        select HAVE_KERNEL_UNCOMPRESSED
        select HAVE_PCI
        select HAVE_PERF_EVENTS
index 920db57b6b4cc866018c05dd00ca49142c7f949c..316f84f1d15c8f8c6e65dd3862dc5db1144f95bb 100644 (file)
@@ -50,12 +50,12 @@ export CROSS32CC
 
 # Set default cross compiler for kernel build
 ifdef cross_compiling
-       ifeq ($(CROSS_COMPILE),)
+    ifeq ($(CROSS_COMPILE),)
                CC_SUFFIXES = linux linux-gnu unknown-linux-gnu suse-linux
                CROSS_COMPILE := $(call cc-cross-prefix, \
                        $(foreach a,$(CC_ARCHES), \
                        $(foreach s,$(CC_SUFFIXES),$(a)-$(s)-)))
-       endif
+    endif
 endif
 
 ifdef CONFIG_DYNAMIC_FTRACE
index 74d17d7e759da9dfa89aa1a504b94de4554db16d..5937d5edaba1eac5a0c4e4b055c3e77fcbe3bf62 100644 (file)
        .section __ex_table,"aw"                        !       \
        .align 4                                        !       \
        .word (fault_addr - .), (except_addr - .)       !       \
+       or %r0,%r0,%r0                                  !       \
        .previous
 
 
diff --git a/arch/parisc/include/asm/extable.h b/arch/parisc/include/asm/extable.h
new file mode 100644 (file)
index 0000000..4ea23e3
--- /dev/null
@@ -0,0 +1,64 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __PARISC_EXTABLE_H
+#define __PARISC_EXTABLE_H
+
+#include <asm/ptrace.h>
+#include <linux/compiler.h>
+
+/*
+ * The exception table consists of three addresses:
+ *
+ * - A relative address to the instruction that is allowed to fault.
+ * - A relative address at which the program should continue (fixup routine)
+ * - An asm statement which specifies which CPU register will
+ *   receive -EFAULT when an exception happens if the lowest bit in
+ *   the fixup address is set.
+ *
+ * Note: The register specified in the err_opcode instruction will be
+ * modified at runtime if a fault happens. Register %r0 will be ignored.
+ *
+ * Since relative addresses are used, 32bit values are sufficient even on
+ * 64bit kernel.
+ */
+
+struct pt_regs;
+int fixup_exception(struct pt_regs *regs);
+
+#define ARCH_HAS_RELATIVE_EXTABLE
+struct exception_table_entry {
+       int insn;       /* relative address of insn that is allowed to fault. */
+       int fixup;      /* relative address of fixup routine */
+       int err_opcode; /* sample opcode with register which holds error code */
+};
+
+#define ASM_EXCEPTIONTABLE_ENTRY( fault_addr, except_addr, opcode )\
+       ".section __ex_table,\"aw\"\n"                     \
+       ".align 4\n"                                       \
+       ".word (" #fault_addr " - .), (" #except_addr " - .)\n" \
+       opcode "\n"                                        \
+       ".previous\n"
+
+/*
+ * ASM_EXCEPTIONTABLE_ENTRY_EFAULT() creates a special exception table entry
+ * (with lowest bit set) for which the fault handler in fixup_exception() will
+ * load -EFAULT on fault into the register specified by the err_opcode
+ * instruction, and zero the target register in case of a read fault in get_user().
+ */
+#define ASM_EXCEPTIONTABLE_VAR(__err_var)              \
+       int __err_var = 0
+#define ASM_EXCEPTIONTABLE_ENTRY_EFAULT( fault_addr, except_addr, register )\
+       ASM_EXCEPTIONTABLE_ENTRY( fault_addr, except_addr + 1, "or %%r0,%%r0," register)
+
+static inline void swap_ex_entry_fixup(struct exception_table_entry *a,
+                                      struct exception_table_entry *b,
+                                      struct exception_table_entry tmp,
+                                      int delta)
+{
+       a->fixup = b->fixup + delta;
+       b->fixup = tmp.fixup - delta;
+       a->err_opcode = b->err_opcode;
+       b->err_opcode = tmp.err_opcode;
+}
+#define swap_ex_entry_fixup swap_ex_entry_fixup
+
+#endif
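Because the entries hold 32-bit pc-relative offsets (hence ARCH_HAS_RELATIVE_EXTABLE), sorting the table moves entries and must therefore rebias each fixup by the distance moved, which is what swap_ex_entry_fixup() above does while also carrying the new err_opcode field. Resolving an entry back to absolute addresses follows the usual relative-extable pattern, sketched:

    /* Sketch: relative offsets are taken from the field's own address. */
    static inline unsigned long ex_insn_addr(const struct exception_table_entry *e)
    {
            return (unsigned long)&e->insn + e->insn;
    }

    static inline unsigned long ex_fixup_addr(const struct exception_table_entry *e)
    {
            return (unsigned long)&e->fixup + e->fixup;
    }
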
index 94428798b6aa63e8d4b0878cc7555826cf080e47..317ebc5edc9fe99950f4efe55d989db453f46d0d 100644 (file)
@@ -12,7 +12,7 @@
 
 static __always_inline bool arch_static_branch(struct static_key *key, bool branch)
 {
-       asm_volatile_goto("1:\n\t"
+       asm goto("1:\n\t"
                 "nop\n\t"
                 ".pushsection __jump_table,  \"aw\"\n\t"
                 ".align %1\n\t"
@@ -29,7 +29,7 @@ l_yes:
 
 static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch)
 {
-       asm_volatile_goto("1:\n\t"
+       asm goto("1:\n\t"
                 "b,n %l[l_yes]\n\t"
                 ".pushsection __jump_table,  \"aw\"\n\t"
                 ".align %1\n\t"
index 0a175ac876980c7c90b747bd8f8f34658499997a..0f42f5c8e3b66a8cbcf6f95a3312cb22f456cca8 100644 (file)
 #ifndef _PARISC_KPROBES_H
 #define _PARISC_KPROBES_H
 
+#include <asm-generic/kprobes.h>
+
 #ifdef CONFIG_KPROBES
 
-#include <asm-generic/kprobes.h>
 #include <linux/types.h>
 #include <linux/ptrace.h>
 #include <linux/notifier.h>
index c822bd0c0e3c6ccb86b4190d15500589c70f353a..51f40eaf7780659263f37b7c10fa7bd4ecf4ced7 100644 (file)
@@ -8,7 +8,8 @@
                "copy %%r0,%0\n"                        \
                "8:\tlpa %%r0(%1),%0\n"                 \
                "9:\n"                                  \
-               ASM_EXCEPTIONTABLE_ENTRY(8b, 9b)        \
+               ASM_EXCEPTIONTABLE_ENTRY(8b, 9b,        \
+                               "or %%r0,%%r0,%%r0")    \
                : "=&r" (pa)                            \
                : "r" (va)                              \
                : "memory"                              \
@@ -22,7 +23,8 @@
                "copy %%r0,%0\n"                        \
                "8:\tlpa %%r0(%%sr3,%1),%0\n"           \
                "9:\n"                                  \
-               ASM_EXCEPTIONTABLE_ENTRY(8b, 9b)        \
+               ASM_EXCEPTIONTABLE_ENTRY(8b, 9b,        \
+                               "or %%r0,%%r0,%%r0")    \
                : "=&r" (pa)                            \
                : "r" (va)                              \
                : "memory"                              \
index 4165079898d9e7af239a31a1bc77821e6081706a..88d0ae5769dde54e29176e286da359eb6a54e7bf 100644 (file)
@@ -7,6 +7,7 @@
  */
 #include <asm/page.h>
 #include <asm/cache.h>
+#include <asm/extable.h>
 
 #include <linux/bug.h>
 #include <linux/string.h>
 #define STD_USER(sr, x, ptr)   __put_user_asm(sr, "std", x, ptr)
 #endif
 
-/*
- * The exception table contains two values: the first is the relative offset to
- * the address of the instruction that is allowed to fault, and the second is
- * the relative offset to the address of the fixup routine. Since relative
- * addresses are used, 32bit values are sufficient even on 64bit kernel.
- */
-
-#define ARCH_HAS_RELATIVE_EXTABLE
-struct exception_table_entry {
-       int insn;       /* relative address of insn that is allowed to fault. */
-       int fixup;      /* relative address of fixup routine */
-};
-
-#define ASM_EXCEPTIONTABLE_ENTRY( fault_addr, except_addr )\
-       ".section __ex_table,\"aw\"\n"                     \
-       ".align 4\n"                                       \
-       ".word (" #fault_addr " - .), (" #except_addr " - .)\n\t" \
-       ".previous\n"
-
-/*
- * ASM_EXCEPTIONTABLE_ENTRY_EFAULT() creates a special exception table entry
- * (with lowest bit set) for which the fault handler in fixup_exception() will
- * load -EFAULT into %r29 for a read or write fault, and zeroes the target
- * register in case of a read fault in get_user().
- */
-#define ASM_EXCEPTIONTABLE_REG 29
-#define ASM_EXCEPTIONTABLE_VAR(__variable)             \
-       register long __variable __asm__ ("r29") = 0
-#define ASM_EXCEPTIONTABLE_ENTRY_EFAULT( fault_addr, except_addr )\
-       ASM_EXCEPTIONTABLE_ENTRY( fault_addr, except_addr + 1)
-
 #define __get_user_internal(sr, val, ptr)              \
 ({                                                     \
        ASM_EXCEPTIONTABLE_VAR(__gu_err);               \
@@ -83,7 +53,7 @@ struct exception_table_entry {
                                                        \
        __asm__("1: " ldx " 0(%%sr%2,%3),%0\n"          \
                "9:\n"                                  \
-               ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 9b) \
+               ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 9b, "%1")   \
                : "=r"(__gu_val), "+r"(__gu_err)        \
                : "i"(sr), "r"(ptr));                   \
                                                        \
@@ -115,8 +85,8 @@ struct exception_table_entry {
                "1: ldw 0(%%sr%2,%3),%0\n"              \
                "2: ldw 4(%%sr%2,%3),%R0\n"             \
                "9:\n"                                  \
-               ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 9b) \
-               ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 9b) \
+               ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 9b, "%1")   \
+               ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 9b, "%1")   \
                : "=&r"(__gu_tmp.l), "+r"(__gu_err)     \
                : "i"(sr), "r"(ptr));                   \
                                                        \
@@ -174,7 +144,7 @@ struct exception_table_entry {
        __asm__ __volatile__ (                                  \
                "1: " stx " %1,0(%%sr%2,%3)\n"                  \
                "9:\n"                                          \
-               ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 9b)         \
+               ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 9b, "%0")   \
                : "+r"(__pu_err)                                \
                : "r"(x), "i"(sr), "r"(ptr))
 
@@ -186,15 +156,14 @@ struct exception_table_entry {
                "1: stw %1,0(%%sr%2,%3)\n"                      \
                "2: stw %R1,4(%%sr%2,%3)\n"                     \
                "9:\n"                                          \
-               ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 9b)         \
-               ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 9b)         \
+               ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 9b, "%0")   \
+               ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 9b, "%0")   \
                : "+r"(__pu_err)                                \
                : "r"(__val), "i"(sr), "r"(ptr));               \
 } while (0)
 
 #endif /* !defined(CONFIG_64BIT) */
 
-
 /*
  * Complex access routines -- external declarations
  */
@@ -216,7 +185,4 @@ unsigned long __must_check raw_copy_from_user(void *dst, const void __user *src,
 #define INLINE_COPY_TO_USER
 #define INLINE_COPY_FROM_USER
 
-struct pt_regs;
-int fixup_exception(struct pt_regs *regs);
-
 #endif /* __PARISC_UACCESS_H */
index 268d90a9325b468603b634b86b48980a31b4fba7..422f3e1e6d9cad718c264c7d7c9bd30872846555 100644 (file)
@@ -58,7 +58,7 @@ int pa_serialize_tlb_flushes __ro_after_init;
 
 struct pdc_cache_info cache_info __ro_after_init;
 #ifndef CONFIG_PA20
-struct pdc_btlb_info btlb_info __ro_after_init;
+struct pdc_btlb_info btlb_info;
 #endif
 
 DEFINE_STATIC_KEY_TRUE(parisc_has_cache);
@@ -264,6 +264,10 @@ parisc_cache_init(void)
        icache_stride = CAFL_STRIDE(cache_info.ic_conf);
 #undef CAFL_STRIDE
 
+       /* stride needs to be non-zero, otherwise cache flushes will not work */
+       WARN_ON(cache_info.dc_size && dcache_stride == 0);
+       WARN_ON(cache_info.ic_size && icache_stride == 0);
+
        if ((boot_cpu_data.pdc.capabilities & PDC_MODEL_NVA_MASK) ==
                                                PDC_MODEL_NVA_UNSUPPORTED) {
                printk(KERN_WARNING "parisc_cache_init: Only equivalent aliasing supported!\n");
@@ -850,7 +854,7 @@ SYSCALL_DEFINE3(cacheflush, unsigned long, addr, unsigned long, bytes,
 #endif
                        "   fic,m       %3(%4,%0)\n"
                        "2: sync\n"
-                       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 2b)
+                       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 2b, "%1")
                        : "+r" (start), "+r" (error)
                        : "r" (end), "r" (dcache_stride), "i" (SR_USER));
        }
@@ -865,7 +869,7 @@ SYSCALL_DEFINE3(cacheflush, unsigned long, addr, unsigned long, bytes,
 #endif
                        "   fdc,m       %3(%4,%0)\n"
                        "2: sync\n"
-                       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 2b)
+                       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 2b, "%1")
                        : "+r" (start), "+r" (error)
                        : "r" (end), "r" (icache_stride), "i" (SR_USER));
        }
index 25f9b9e9d6dfbc70f21787e29a170334cb102dc7..c7ff339732ba5a762eac90e1b3072aef45c58318 100644 (file)
@@ -742,7 +742,7 @@ parse_tree_node(struct device *parent, int index, struct hardware_path *modpath)
        };
 
        if (device_for_each_child(parent, &recurse_data, descend_children))
-               { /* nothing */ };
+               { /* nothing */ }
 
        return d.dev;
 }
@@ -1004,6 +1004,9 @@ static __init int qemu_print_iodc_data(struct device *lin_dev, void *data)
 
        pr_info("\n");
 
+       /* Prevent hung task messages when printing on serial console */
+       cond_resched();
+
        pr_info("#define HPA_%08lx_DESCRIPTION \"%s\"\n",
                hpa, parisc_hardware_description(&dev->id));
 
index d1defb9ede70c0ae73e46363e850fc28ef91cebd..621a4b386ae4fcc90fa5e2ad9b7ac6b947fd903d 100644 (file)
@@ -78,7 +78,7 @@ asmlinkage void notrace __hot ftrace_function_trampoline(unsigned long parent,
 #endif
 }
 
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+#if defined(CONFIG_DYNAMIC_FTRACE) && defined(CONFIG_FUNCTION_GRAPH_TRACER)
 int ftrace_enable_ftrace_graph_caller(void)
 {
        static_key_enable(&ftrace_graph_enable.key);
index e95a977ba5f376eb813d4c7806d205a92f539880..bf73562706b2e8ec337bc8cde4b6fd9e5cd7f43e 100644 (file)
@@ -172,7 +172,6 @@ static int __init processor_probe(struct parisc_device *dev)
        p->cpu_num = cpu_info.cpu_num;
        p->cpu_loc = cpu_info.cpu_loc;
 
-       set_cpu_possible(cpuid, true);
        store_cpu_topology(cpuid);
 
 #ifdef CONFIG_SMP
@@ -474,13 +473,6 @@ static struct parisc_driver cpu_driver __refdata = {
  */
 void __init processor_init(void)
 {
-       unsigned int cpu;
-
        reset_cpu_topology();
-
-       /* reset possible mask. We will mark those which are possible. */
-       for_each_possible_cpu(cpu)
-               set_cpu_possible(cpu, false);
-
        register_parisc_driver(&cpu_driver);
 }
index ce25acfe4889d0df8048e448a16d76e414ee1262..c520e551a165258609cba5e068037493bd7e57a8 100644 (file)
@@ -120,8 +120,8 @@ static int emulate_ldh(struct pt_regs *regs, int toreg)
 "2:    ldbs    1(%%sr1,%3), %0\n"
 "      depw    %2, 23, 24, %0\n"
 "3:    \n"
-       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 3b)
-       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 3b)
+       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 3b, "%1")
+       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 3b, "%1")
        : "+r" (val), "+r" (ret), "=&r" (temp1)
        : "r" (saddr), "r" (regs->isr) );
 
@@ -152,8 +152,8 @@ static int emulate_ldw(struct pt_regs *regs, int toreg, int flop)
 "      mtctl   %2,11\n"
 "      vshd    %0,%3,%0\n"
 "3:    \n"
-       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 3b)
-       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 3b)
+       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 3b, "%1")
+       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 3b, "%1")
        : "+r" (val), "+r" (ret), "=&r" (temp1), "=&r" (temp2)
        : "r" (saddr), "r" (regs->isr) );
 
@@ -189,8 +189,8 @@ static int emulate_ldd(struct pt_regs *regs, int toreg, int flop)
 "      mtsar   %%r19\n"
 "      shrpd   %0,%%r20,%%sar,%0\n"
 "3:    \n"
-       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 3b)
-       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 3b)
+       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 3b, "%1")
+       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 3b, "%1")
        : "=r" (val), "+r" (ret)
        : "0" (val), "r" (saddr), "r" (regs->isr)
        : "r19", "r20" );
@@ -209,9 +209,9 @@ static int emulate_ldd(struct pt_regs *regs, int toreg, int flop)
 "      vshd    %0,%R0,%0\n"
 "      vshd    %R0,%4,%R0\n"
 "4:    \n"
-       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 4b)
-       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 4b)
-       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(3b, 4b)
+       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 4b, "%1")
+       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 4b, "%1")
+       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(3b, 4b, "%1")
        : "+r" (val), "+r" (ret), "+r" (saddr), "=&r" (shift), "=&r" (temp1)
        : "r" (regs->isr) );
     }
@@ -244,8 +244,8 @@ static int emulate_sth(struct pt_regs *regs, int frreg)
 "1:    stb %1, 0(%%sr1, %3)\n"
 "2:    stb %2, 1(%%sr1, %3)\n"
 "3:    \n"
-       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 3b)
-       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 3b)
+       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 3b, "%0")
+       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 3b, "%0")
        : "+r" (ret), "=&r" (temp1)
        : "r" (val), "r" (regs->ior), "r" (regs->isr) );
 
@@ -285,8 +285,8 @@ static int emulate_stw(struct pt_regs *regs, int frreg, int flop)
 "      stw     %%r20,0(%%sr1,%2)\n"
 "      stw     %%r21,4(%%sr1,%2)\n"
 "3:    \n"
-       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 3b)
-       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 3b)
+       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 3b, "%0")
+       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 3b, "%0")
        : "+r" (ret)
        : "r" (val), "r" (regs->ior), "r" (regs->isr)
        : "r19", "r20", "r21", "r22", "r1" );
@@ -329,10 +329,10 @@ static int emulate_std(struct pt_regs *regs, int frreg, int flop)
 "3:    std     %%r20,0(%%sr1,%2)\n"
 "4:    std     %%r21,8(%%sr1,%2)\n"
 "5:    \n"
-       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 5b)
-       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 5b)
-       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(3b, 5b)
-       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(4b, 5b)
+       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 5b, "%0")
+       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 5b, "%0")
+       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(3b, 5b, "%0")
+       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(4b, 5b, "%0")
        : "+r" (ret)
        : "r" (val), "r" (regs->ior), "r" (regs->isr)
        : "r19", "r20", "r21", "r22", "r1" );
@@ -357,11 +357,11 @@ static int emulate_std(struct pt_regs *regs, int frreg, int flop)
 "4:    stw     %%r1,4(%%sr1,%2)\n"
 "5:    stw     %R1,8(%%sr1,%2)\n"
 "6:    \n"
-       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 6b)
-       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 6b)
-       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(3b, 6b)
-       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(4b, 6b)
-       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(5b, 6b)
+       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(1b, 6b, "%0")
+       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(2b, 6b, "%0")
+       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(3b, 6b, "%0")
+       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(4b, 6b, "%0")
+       ASM_EXCEPTIONTABLE_ENTRY_EFAULT(5b, 6b, "%0")
        : "+r" (ret)
        : "r" (val), "r" (regs->ior), "r" (regs->isr)
        : "r19", "r20", "r21", "r1" );
index 27ae40a443b80c5fa575e8579bca7f08ef6d36ab..f7e0fee5ee55a3e055679e75b06c280679b603ad 100644 (file)
@@ -228,10 +228,8 @@ static int unwind_special(struct unwind_frame_info *info, unsigned long pc, int
 #ifdef CONFIG_IRQSTACKS
        extern void * const _call_on_stack;
 #endif /* CONFIG_IRQSTACKS */
-       void *ptr;
 
-       ptr = dereference_kernel_function_descriptor(&handle_interruption);
-       if (pc_is_kernel_fn(pc, ptr)) {
+       if (pc_is_kernel_fn(pc, handle_interruption)) {
                struct pt_regs *regs = (struct pt_regs *)(info->sp - frame_size - PT_SZ_ALGN);
                dbg("Unwinding through handle_interruption()\n");
                info->prev_sp = regs->gr[30];
@@ -239,13 +237,13 @@ static int unwind_special(struct unwind_frame_info *info, unsigned long pc, int
                return 1;
        }
 
-       if (pc_is_kernel_fn(pc, ret_from_kernel_thread) ||
-           pc_is_kernel_fn(pc, syscall_exit)) {
+       if (pc == (unsigned long)&ret_from_kernel_thread ||
+           pc == (unsigned long)&syscall_exit) {
                info->prev_sp = info->prev_ip = 0;
                return 1;
        }
 
-       if (pc_is_kernel_fn(pc, intr_return)) {
+       if (pc == (unsigned long)&intr_return) {
                struct pt_regs *regs;
 
                dbg("Found intr_return()\n");
@@ -257,14 +255,14 @@ static int unwind_special(struct unwind_frame_info *info, unsigned long pc, int
        }
 
        if (pc_is_kernel_fn(pc, _switch_to) ||
-           pc_is_kernel_fn(pc, _switch_to_ret)) {
+           pc == (unsigned long)&_switch_to_ret) {
                info->prev_sp = info->sp - CALLEE_SAVE_FRAME_SIZE;
                info->prev_ip = *(unsigned long *)(info->prev_sp - RP_OFFSET);
                return 1;
        }
 
 #ifdef CONFIG_IRQSTACKS
-       if (pc_is_kernel_fn(pc, _call_on_stack)) {
+       if (pc == (unsigned long)&_call_on_stack) {
                info->prev_sp = *(unsigned long *)(info->sp - FRAME_SIZE - REG_SZ);
                info->prev_ip = *(unsigned long *)(info->sp - FRAME_SIZE - RP_OFFSET);
                return 1;
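
The remaining pc_is_kernel_fn() calls are the ones where the symbol really is a C function: on 64-bit parisc such a symbol resolves to a function descriptor that has to be dereferenced before it can be compared against a code address, whereas assembly entry points like ret_from_kernel_thread are bare labels and compare directly. A hedged sketch of the helper this distinction implies (the real one lives in unwind.c and is not shown here):

/* sketch under the above assumption, illustration only */
static bool pc_is_kernel_fn(unsigned long pc, void *fn)
{
	return (unsigned long)dereference_kernel_function_descriptor(fn) == pc;
}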
index 548051b0b4aff692741847a04b09208d1e68d279..b445e47903cfd0b813035c2056f11a4f818cf6d2 100644 (file)
@@ -127,7 +127,7 @@ SECTIONS
        }
 #endif
 
-       RO_DATA(8)
+       RO_DATA(PAGE_SIZE)
 
        /* unwind info */
        . = ALIGN(4);
index 2fe5b44986e0924e3981ebc1edb9d074c08e6fda..c39de84e98b05172bdec0f474261ccde4a06cf00 100644 (file)
@@ -150,11 +150,16 @@ int fixup_exception(struct pt_regs *regs)
                 * Fix up get_user() and put_user().
                 * ASM_EXCEPTIONTABLE_ENTRY_EFAULT() sets the least-significant
                 * bit in the relative address of the fixup routine to indicate
-                * that gr[ASM_EXCEPTIONTABLE_REG] should be loaded with
-                * -EFAULT to report a userspace access error.
+                * that the register encoded in the "or %r0,%r0,register"
+                * opcode should be loaded with -EFAULT to report a userspace
+                * access error.
                 */
                if (fix->fixup & 1) {
-                       regs->gr[ASM_EXCEPTIONTABLE_REG] = -EFAULT;
+                       int fault_error_reg = fix->err_opcode & 0x1f;
+                       if (!WARN_ON(!fault_error_reg))
+                               regs->gr[fault_error_reg] = -EFAULT;
+                       pr_debug("Unalignment fixup of register %d at %pS\n",
+                               fault_error_reg, (void *)regs->iaoq[0]);
 
                        /* zero target register for get_user() */
                        if (parisc_acctyp(0, regs->iir) == VM_READ) {
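
The 0x1f mask works because PA-RISC three-register logical instructions such as "or r1,r2,t" carry the target register t in the instruction's lowest five bits, and "or %r0,%r0,%r0" (target 0) is the canonical no-op. Restated as a hedged, hypothetical helper (the kernel open-codes this above):

static inline int extable_err_reg(int err_opcode)
{
	return err_opcode & 0x1f;	/* 0 => no error register encoded */
}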
index 1ebd2ca97f1201f1e760fcb46fe4c5e043727333..107fc5a484569673af80bd58e2e5980ae4082fab 100644 (file)
 #ifndef __ASSEMBLY__
 extern void _mcount(void);
 
-static inline unsigned long ftrace_call_adjust(unsigned long addr)
-{
-       if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY))
-               addr += MCOUNT_INSN_SIZE;
-
-       return addr;
-}
-
 unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip,
                                    unsigned long sp);
 
@@ -142,8 +134,10 @@ static inline u8 this_cpu_get_ftrace_enabled(void) { return 1; }
 #ifdef CONFIG_FUNCTION_TRACER
 extern unsigned int ftrace_tramp_text[], ftrace_tramp_init[];
 void ftrace_free_init_tramp(void);
+unsigned long ftrace_call_adjust(unsigned long addr);
 #else
 static inline void ftrace_free_init_tramp(void) { }
+static inline unsigned long ftrace_call_adjust(unsigned long addr) { return addr; }
 #endif
 #endif /* !__ASSEMBLY__ */
 
index 93ce3ec253877d38da5e3f9c3ac76205354d3496..2f2a86ed2280aac66df0535d7938cf4a673446f7 100644 (file)
@@ -17,7 +17,7 @@
 
 static __always_inline bool arch_static_branch(struct static_key *key, bool branch)
 {
-       asm_volatile_goto("1:\n\t"
+       asm goto("1:\n\t"
                 "nop # arch_static_branch\n\t"
                 ".pushsection __jump_table,  \"aw\"\n\t"
                 ".long 1b - ., %l[l_yes] - .\n\t"
@@ -32,7 +32,7 @@ l_yes:
 
 static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch)
 {
-       asm_volatile_goto("1:\n\t"
+       asm goto("1:\n\t"
                 "b %l[l_yes] # arch_static_branch_jump\n\t"
                 ".pushsection __jump_table,  \"aw\"\n\t"
                 ".long 1b - ., %l[l_yes] - .\n\t"
index 0dbbff59101d6f31d4ac9f59c2fa74a7d4e90cae..c3cd5b131033eb3ef5c8fc1e99f0a8da0b78157a 100644 (file)
@@ -32,7 +32,7 @@ typedef struct {
  */
 struct papr_sysparm_buf {
        __be16 len;
-       char val[PAPR_SYSPARM_MAX_OUTPUT];
+       u8 val[PAPR_SYSPARM_MAX_OUTPUT];
 };
 
 struct papr_sysparm_buf *papr_sysparm_buf_alloc(void);
index ce2b1b5eebddcf5eb2e84b5e8853f89cf06501a6..a8b7e8682f5bd6c58ff9faa31a152a31e1b5280d 100644 (file)
@@ -30,6 +30,16 @@ void *pci_traverse_device_nodes(struct device_node *start,
                                void *data);
 extern void pci_devs_phb_init_dynamic(struct pci_controller *phb);
 
+#if defined(CONFIG_IOMMU_API) && (defined(CONFIG_PPC_PSERIES) || \
+                                 defined(CONFIG_PPC_POWERNV))
+extern void ppc_iommu_register_device(struct pci_controller *phb);
+extern void ppc_iommu_unregister_device(struct pci_controller *phb);
+#else
+static inline void ppc_iommu_register_device(struct pci_controller *phb) { }
+static inline void ppc_iommu_unregister_device(struct pci_controller *phb) { }
+#endif
+
 /* From rtas_pci.h */
 extern void init_pci_config_tokens (void);
 extern unsigned long get_phb_buid (struct device_node *);
index 7fd09f25452d4f697728f7958a9ad5b277f73a50..bb47af9054a9545f54bf5080a14bde235ef29a30 100644 (file)
 #endif
 #define SPRN_HID2      0x3F8           /* Hardware Implementation Register 2 */
 #define SPRN_HID2_GEKKO        0x398           /* Gekko HID2 Register */
+#define SPRN_HID2_G2_LE        0x3F3           /* G2_LE HID2 Register */
+#define  HID2_G2_LE_HBE        (1<<18)         /* High BAT Enable (G2_LE) */
 #define SPRN_IABR      0x3F2   /* Instruction Address Breakpoint Register */
 #define SPRN_IABR2     0x3FA           /* 83xx */
 #define SPRN_IBCR      0x135           /* 83xx Insn Breakpoint Control Reg */
index 9bb2210c8d4417a4262aab81d68d851e175b77b4..065ffd1b2f8adaef8369846531bf4e6f78159b57 100644 (file)
@@ -69,7 +69,7 @@ enum rtas_function_index {
        RTAS_FNIDX__IBM_READ_SLOT_RESET_STATE,
        RTAS_FNIDX__IBM_READ_SLOT_RESET_STATE2,
        RTAS_FNIDX__IBM_REMOVE_PE_DMA_WINDOW,
-       RTAS_FNIDX__IBM_RESET_PE_DMA_WINDOWS,
+       RTAS_FNIDX__IBM_RESET_PE_DMA_WINDOW,
        RTAS_FNIDX__IBM_SCAN_LOG_DUMP,
        RTAS_FNIDX__IBM_SET_DYNAMIC_INDICATOR,
        RTAS_FNIDX__IBM_SET_EEH_OPTION,
@@ -164,7 +164,7 @@ typedef struct {
 #define RTAS_FN_IBM_READ_SLOT_RESET_STATE         rtas_fn_handle(RTAS_FNIDX__IBM_READ_SLOT_RESET_STATE)
 #define RTAS_FN_IBM_READ_SLOT_RESET_STATE2        rtas_fn_handle(RTAS_FNIDX__IBM_READ_SLOT_RESET_STATE2)
 #define RTAS_FN_IBM_REMOVE_PE_DMA_WINDOW          rtas_fn_handle(RTAS_FNIDX__IBM_REMOVE_PE_DMA_WINDOW)
-#define RTAS_FN_IBM_RESET_PE_DMA_WINDOWS          rtas_fn_handle(RTAS_FNIDX__IBM_RESET_PE_DMA_WINDOWS)
+#define RTAS_FN_IBM_RESET_PE_DMA_WINDOW           rtas_fn_handle(RTAS_FNIDX__IBM_RESET_PE_DMA_WINDOW)
 #define RTAS_FN_IBM_SCAN_LOG_DUMP                 rtas_fn_handle(RTAS_FNIDX__IBM_SCAN_LOG_DUMP)
 #define RTAS_FN_IBM_SET_DYNAMIC_INDICATOR         rtas_fn_handle(RTAS_FNIDX__IBM_SET_DYNAMIC_INDICATOR)
 #define RTAS_FN_IBM_SET_EEH_OPTION                rtas_fn_handle(RTAS_FNIDX__IBM_SET_EEH_OPTION)
index ea26665f82cfc833a93b87f23ae2b196eb5a4180..f43f3a6b0051cf24bd76428987366d1fcdf5de5d 100644 (file)
@@ -14,6 +14,7 @@ typedef struct func_desc func_desc_t;
 
 extern char __head_end[];
 extern char __srwx_boundary[];
+extern char __exittext_begin[], __exittext_end[];
 
 /* Patch sites */
 extern s32 patch__call_flush_branch_caches1;
index bf5dde1a411471fcc95d4503dfb41d3881aad9fe..15c5691dd218440d32142779a2a1e2ce5058d60c 100644 (file)
@@ -14,7 +14,7 @@
 
 #ifdef __KERNEL__
 
-#ifdef CONFIG_KASAN
+#if defined(CONFIG_KASAN) && CONFIG_THREAD_SHIFT < 15
 #define MIN_THREAD_SHIFT       (CONFIG_THREAD_SHIFT + 1)
 #else
 #define MIN_THREAD_SHIFT       CONFIG_THREAD_SHIFT
index f1f9890f50d3ef84dfd62b5d66db68315f0698b6..de10437fd20652ee63a6d214638bded13cdbc6c3 100644 (file)
@@ -74,7 +74,7 @@ __pu_failed:                                                  \
 /* -mprefixed can generate offsets beyond range, fall back hack */
 #ifdef CONFIG_PPC_KERNEL_PREFIXED
 #define __put_user_asm_goto(x, addr, label, op)                        \
-       asm_volatile_goto(                                      \
+       asm goto(                                       \
                "1:     " op " %0,0(%1) # put_user\n"           \
                EX_TABLE(1b, %l2)                               \
                :                                               \
@@ -83,7 +83,7 @@ __pu_failed:                                                  \
                : label)
 #else
 #define __put_user_asm_goto(x, addr, label, op)                        \
-       asm_volatile_goto(                                      \
+       asm goto(                                       \
                "1:     " op "%U1%X1 %0,%1      # put_user\n"   \
                EX_TABLE(1b, %l2)                               \
                :                                               \
@@ -97,7 +97,7 @@ __pu_failed:                                                  \
        __put_user_asm_goto(x, ptr, label, "std")
 #else /* __powerpc64__ */
 #define __put_user_asm2_goto(x, addr, label)                   \
-       asm_volatile_goto(                                      \
+       asm goto(                                       \
                "1:     stw%X1 %0, %1\n"                        \
                "2:     stw%X1 %L0, %L1\n"                      \
                EX_TABLE(1b, %l2)                               \
@@ -146,7 +146,7 @@ do {                                                                \
 /* -mprefixed can generate offsets beyond range, fall back hack */
 #ifdef CONFIG_PPC_KERNEL_PREFIXED
 #define __get_user_asm_goto(x, addr, label, op)                        \
-       asm_volatile_goto(                                      \
+       asm_goto_output(                                        \
                "1:     "op" %0,0(%1)   # get_user\n"           \
                EX_TABLE(1b, %l2)                               \
                : "=r" (x)                                      \
@@ -155,7 +155,7 @@ do {                                                                \
                : label)
 #else
 #define __get_user_asm_goto(x, addr, label, op)                        \
-       asm_volatile_goto(                                      \
+       asm_goto_output(                                        \
                "1:     "op"%U1%X1 %0, %1       # get_user\n"   \
                EX_TABLE(1b, %l2)                               \
                : "=r" (x)                                      \
@@ -169,7 +169,7 @@ do {                                                                \
        __get_user_asm_goto(x, addr, label, "ld")
 #else /* __powerpc64__ */
 #define __get_user_asm2_goto(x, addr, label)                   \
-       asm_volatile_goto(                                      \
+       asm_goto_output(                                        \
                "1:     lwz%X1 %0, %1\n"                        \
                "2:     lwz%X1 %L0, %L1\n"                      \
                EX_TABLE(1b, %l2)                               \
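
Note the asymmetry in this file: the put_user() paths (no output operands) move to the bare keyword, while the get_user() paths switch to asm_goto_output() because the loaded value is an output operand, and "asm goto" with outputs is a newer compiler feature. A hedged sketch of the wrapper's shape in the generic compiler headers:

/* sketch; the real definition and its fallback live in
 * include/linux/compiler*.h */
#ifdef CONFIG_CC_HAS_ASM_GOTO_OUTPUT
# define asm_goto_output(x...)	asm goto(x)
#endif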
index 9f9a0f267ea57c2593448bcfbd0af4f4f0582f08..f733467b1534eb9bf3dad20042b06afd85ab8f41 100644 (file)
@@ -14,7 +14,7 @@ enum {
 struct papr_sysparm_io_block {
        __u32 parameter;
        __u16 length;
-       char data[PAPR_SYSPARM_MAX_OUTPUT];
+       __u8 data[PAPR_SYSPARM_MAX_OUTPUT];
 };
 
 /**
index f29ce3dd6140f40c026a0d8f67e79858c86dabd9..bfd3f442e5eb9dfcb851f1b1c5b68e690f1702ae 100644 (file)
@@ -26,6 +26,15 @@ BEGIN_FTR_SECTION
        bl      __init_fpu_registers
 END_FTR_SECTION_IFCLR(CPU_FTR_FPU_UNAVAILABLE)
        bl      setup_common_caches
+
+       /*
+        * This assumes that all cores using __setup_cpu_603 with
+        * MMU_FTR_USE_HIGH_BATS are G2_LE compatible
+        */
+BEGIN_MMU_FTR_SECTION
+       bl      setup_g2_le_hid2
+END_MMU_FTR_SECTION_IFSET(MMU_FTR_USE_HIGH_BATS)
+
        mtlr    r5
        blr
 _GLOBAL(__setup_cpu_604)
@@ -115,6 +124,16 @@ SYM_FUNC_START_LOCAL(setup_604_hid0)
        blr
 SYM_FUNC_END(setup_604_hid0)
 
+/* Enable high BATs for G2_LE and derivatives like e300cX */
+SYM_FUNC_START_LOCAL(setup_g2_le_hid2)
+       mfspr   r11,SPRN_HID2_G2_LE
+       oris    r11,r11,HID2_G2_LE_HBE@h
+       mtspr   SPRN_HID2_G2_LE,r11
+       sync
+       isync
+       blr
+SYM_FUNC_END(setup_g2_le_hid2)
+
 /* 7400 <= rev 2.7 and 7410 rev = 1.0 suffer from some
 * errata we work around here.
 * Moto MPC710CE.pdf describes them; those are errata
@@ -495,4 +514,3 @@ _GLOBAL(__restore_cpu_setup)
        mtcr    r7
        blr
 _ASM_NOKPROBE_SYMBOL(__restore_cpu_setup)
-
index ceb06b109f831355a833a0e929ef68d86ccbc321..2ae8e9a7b461c8c35755bd4ac0ab48e29a4daf79 100644 (file)
@@ -8,7 +8,8 @@
 
 #ifdef CONFIG_PPC64
 #define COMMON_USER_BOOKE      (PPC_FEATURE_32 | PPC_FEATURE_HAS_MMU | \
-                                PPC_FEATURE_HAS_FPU | PPC_FEATURE_64)
+                                PPC_FEATURE_HAS_FPU | PPC_FEATURE_64 | \
+                                PPC_FEATURE_BOOKE)
 #else
 #define COMMON_USER_BOOKE      (PPC_FEATURE_32 | PPC_FEATURE_HAS_MMU | \
                                 PPC_FEATURE_BOOKE)
index bd863702d81218d80e61c73e469d448f963eb265..1ad059a9e2fef3da806514bc35158966d626072b 100644 (file)
@@ -52,7 +52,8 @@ _ASM_NOKPROBE_SYMBOL(system_call_vectored_\name)
        mr      r10,r1
        ld      r1,PACAKSAVE(r13)
        std     r10,0(r1)
-       std     r11,_NIP(r1)
+       std     r11,_LINK(r1)
+       std     r11,_NIP(r1)    /* Saved LR is also the next instruction */
        std     r12,_MSR(r1)
        std     r0,GPR0(r1)
        std     r10,GPR1(r1)
@@ -70,7 +71,6 @@ _ASM_NOKPROBE_SYMBOL(system_call_vectored_\name)
        std     r9,GPR13(r1)
        SAVE_NVGPRS(r1)
        std     r11,_XER(r1)
-       std     r11,_LINK(r1)
        std     r11,_CTR(r1)
 
        li      r11,\trapnr
index ebe259bdd46298e0654fb681b0cf8853c8381079..1185efebf032b6e7d2cf08db4c953938948a44b1 100644 (file)
@@ -1287,20 +1287,22 @@ spapr_tce_platform_iommu_attach_dev(struct iommu_domain *platform_domain,
        struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
        struct iommu_group *grp = iommu_group_get(dev);
        struct iommu_table_group *table_group;
-       int ret = -EINVAL;
 
        /* At first attach the ownership is already set */
-       if (!domain)
+       if (!domain) {
+               iommu_group_put(grp);
                return 0;
-
-       if (!grp)
-               return -ENODEV;
+       }
 
        table_group = iommu_group_get_iommudata(grp);
-       ret = table_group->ops->take_ownership(table_group);
+       /*
+        * The domain is being switched to PLATFORM from the earlier
+        * BLOCKED state, so the table_group ownership has to be released.
+        */
+       table_group->ops->release_ownership(table_group);
        iommu_group_put(grp);
 
-       return ret;
+       return 0;
 }
 
 static const struct iommu_domain_ops spapr_tce_platform_domain_ops = {
@@ -1312,13 +1314,32 @@ static struct iommu_domain spapr_tce_platform_domain = {
        .ops = &spapr_tce_platform_domain_ops,
 };
 
-static struct iommu_domain spapr_tce_blocked_domain = {
-       .type = IOMMU_DOMAIN_BLOCKED,
+static int
+spapr_tce_blocked_iommu_attach_dev(struct iommu_domain *platform_domain,
+                                    struct device *dev)
+{
+       struct iommu_group *grp = iommu_group_get(dev);
+       struct iommu_table_group *table_group;
+       int ret = -EINVAL;
+
        /*
         * FIXME: SPAPR mixes blocked and platform behaviors, the blocked domain
         * also sets the dma_api ops
         */
-       .ops = &spapr_tce_platform_domain_ops,
+       table_group = iommu_group_get_iommudata(grp);
+       ret = table_group->ops->take_ownership(table_group);
+       iommu_group_put(grp);
+
+       return ret;
+}
+
+static const struct iommu_domain_ops spapr_tce_blocked_domain_ops = {
+       .attach_dev = spapr_tce_blocked_iommu_attach_dev,
+};
+
+static struct iommu_domain spapr_tce_blocked_domain = {
+       .type = IOMMU_DOMAIN_BLOCKED,
+       .ops = &spapr_tce_blocked_domain_ops,
 };
 
 static bool spapr_tce_iommu_capable(struct device *dev, enum iommu_cap cap)
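
With this split each special domain type carries its own attach hook, making the ownership handshake symmetric. A hedged summary of the resulting transitions:

/*
 * attach(blocked domain)  -> table_group->ops->take_ownership()
 * attach(platform domain) -> table_group->ops->release_ownership()
 *
 * Hedged reading of the IOMMU core's contract: one of these domains is
 * attached on every group claim/return, so ownership is taken exactly
 * once and handed back exactly once.
 */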
@@ -1339,7 +1360,7 @@ static struct iommu_device *spapr_tce_iommu_probe_device(struct device *dev)
        struct pci_controller *hose;
 
        if (!dev_is_pci(dev))
-               return ERR_PTR(-EPERM);
+               return ERR_PTR(-ENODEV);
 
        pdev = to_pci_dev(dev);
        hose = pdev->bus->sysdata;
@@ -1388,6 +1409,21 @@ static const struct attribute_group *spapr_tce_iommu_groups[] = {
        NULL,
 };
 
+void ppc_iommu_register_device(struct pci_controller *phb)
+{
+       iommu_device_sysfs_add(&phb->iommu, phb->parent,
+                               spapr_tce_iommu_groups, "iommu-phb%04x",
+                               phb->global_number);
+       iommu_device_register(&phb->iommu, &spapr_tce_iommu_ops,
+                               phb->parent);
+}
+
+void ppc_iommu_unregister_device(struct pci_controller *phb)
+{
+       iommu_device_unregister(&phb->iommu);
+       iommu_device_sysfs_remove(&phb->iommu);
+}
+
 /*
  * This registers IOMMU devices of PHBs. This needs to happen
  * after core_initcall(iommu_init) + postcore_initcall(pci_driver_init) and
@@ -1398,11 +1434,7 @@ static int __init spapr_tce_setup_phb_iommus_initcall(void)
        struct pci_controller *hose;
 
        list_for_each_entry(hose, &hose_list, list_node) {
-               iommu_device_sysfs_add(&hose->iommu, hose->parent,
-                                      spapr_tce_iommu_groups, "iommu-phb%04x",
-                                      hose->global_number);
-               iommu_device_register(&hose->iommu, &spapr_tce_iommu_ops,
-                                     hose->parent);
+               ppc_iommu_register_device(hose);
        }
        return 0;
 }
index 938e66829eae65cc52d170f7753ee0685cdaa4e3..d5c48d1b0a31ea533281934e414320bbf77368d2 100644 (file)
@@ -230,7 +230,7 @@ again:
         * This allows interrupts to be unmasked without hard disabling, and
         * also without new hard interrupts coming in ahead of pending ones.
         */
-       asm_volatile_goto(
+       asm goto(
 "1:                                    \n"
 "              lbz     9,%0(13)        \n"
 "              cmpwi   9,0             \n"
index 7e793b503e29f1ff878e7289c8703e7c4cf20edc..8064d9c3de8620d27d9c87f829676ef048aeed40 100644 (file)
@@ -375,8 +375,13 @@ static struct rtas_function rtas_function_table[] __ro_after_init = {
        [RTAS_FNIDX__IBM_REMOVE_PE_DMA_WINDOW] = {
                .name = "ibm,remove-pe-dma-window",
        },
-       [RTAS_FNIDX__IBM_RESET_PE_DMA_WINDOWS] = {
-               .name = "ibm,reset-pe-dma-windows",
+       [RTAS_FNIDX__IBM_RESET_PE_DMA_WINDOW] = {
+               /*
+                * Note: PAPR+ v2.13 7.3.31.4.1 spells this as
+                * "ibm,reset-pe-dma-windows" (plural), but RTAS
+                * implementations use the singular form in practice.
+                */
+               .name = "ibm,reset-pe-dma-window",
        },
        [RTAS_FNIDX__IBM_SCAN_LOG_DUMP] = {
                .name = "ibm,scan-log-dump",
index 82010629cf887ca1753d4f64bfa5916cdc1d7b48..d8d6b4fd9a14cbf8f8f93e499500eed11190be71 100644 (file)
 #include <asm/ftrace.h>
 #include <asm/syscall.h>
 #include <asm/inst.h>
+#include <asm/sections.h>
 
 #define        NUM_FTRACE_TRAMPS       2
 static unsigned long ftrace_tramps[NUM_FTRACE_TRAMPS];
 
+unsigned long ftrace_call_adjust(unsigned long addr)
+{
+       if (addr >= (unsigned long)__exittext_begin && addr < (unsigned long)__exittext_end)
+               return 0;
+
+       if (IS_ENABLED(CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY))
+               addr += MCOUNT_INSN_SIZE;
+
+       return addr;
+}
+
 static ppc_inst_t ftrace_create_branch_inst(unsigned long ip, unsigned long addr, int link)
 {
        ppc_inst_t op;
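
This pairs with the __exittext_begin/__exittext_end markers added to the 64-bit linker script below: patch sites inside .exit.text report an adjusted address of 0. On a hedged reading of the ftrace core, a zero return makes ftrace_process_locs() skip recording the site, so code that is freed after init is never patched:

/* hypothetical helper, for illustration of the contract only */
static bool ftrace_site_is_recorded(unsigned long addr)
{
	return ftrace_call_adjust(addr) != 0;	/* 0 => .exit.text, skipped */
}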
index 7b85c3b460a3c048ec31cce44e9b21066b96c5a8..12fab1803bcf45cafb3fd230c1f7871e2c539f1d 100644 (file)
 #define        NUM_FTRACE_TRAMPS       8
 static unsigned long ftrace_tramps[NUM_FTRACE_TRAMPS];
 
+unsigned long ftrace_call_adjust(unsigned long addr)
+{
+       return addr;
+}
+
 static ppc_inst_t
 ftrace_call_replace(unsigned long ip, unsigned long addr, int link)
 {
index 1c5970df32336655703888b1ffafb8180e79c446..f420df7888a75c5f515a3457708c3188661fa331 100644 (file)
@@ -281,7 +281,9 @@ SECTIONS
         * to deal with references from __bug_table
         */
        .exit.text : AT(ADDR(.exit.text) - LOAD_OFFSET) {
+               __exittext_begin = .;
                EXIT_TEXT
+               __exittext_end = .;
        }
 
        . = ALIGN(PAGE_SIZE);
index 52427fc2a33fa4ad7032bcc6323bc6364918d98f..0b921704da45eb6b718cac8f031c5d0c45176746 100644 (file)
@@ -391,6 +391,24 @@ static void kvmppc_set_pvr_hv(struct kvm_vcpu *vcpu, u32 pvr)
 /* Dummy value used in computing PCR value below */
 #define PCR_ARCH_31    (PCR_ARCH_300 << 1)
 
+static inline unsigned long map_pcr_to_cap(unsigned long pcr)
+{
+       unsigned long cap = 0;
+
+       switch (pcr) {
+       case PCR_ARCH_300:
+               cap = H_GUEST_CAP_POWER9;
+               break;
+       case PCR_ARCH_31:
+               cap = H_GUEST_CAP_POWER10;
+               break;
+       default:
+               break;
+       }
+
+       return cap;
+}
+
 static int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat)
 {
        unsigned long host_pcr_bit = 0, guest_pcr_bit = 0, cap = 0;
@@ -424,11 +442,9 @@ static int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat)
                        break;
                case PVR_ARCH_300:
                        guest_pcr_bit = PCR_ARCH_300;
-                       cap = H_GUEST_CAP_POWER9;
                        break;
                case PVR_ARCH_31:
                        guest_pcr_bit = PCR_ARCH_31;
-                       cap = H_GUEST_CAP_POWER10;
                        break;
                default:
                        return -EINVAL;
@@ -440,6 +456,12 @@ static int kvmppc_set_arch_compat(struct kvm_vcpu *vcpu, u32 arch_compat)
                return -EINVAL;
 
        if (kvmhv_on_pseries() && kvmhv_is_nestedv2()) {
+               /*
+                * 'arch_compat == 0' would mean the guest should default to
+                * L1's compatibility. In this case, the guest would pick the
+                * host's PCR and evaluate the corresponding capabilities.
+                */
+               cap = map_pcr_to_cap(guest_pcr_bit);
                if (!(cap & nested_capabilities))
                        return -EINVAL;
        }
index 5378eb40b162f2690879f43fbaeb3f0b003536a7..8e6f5355f08b5d925c54606db4a70cbe24d74e61 100644 (file)
@@ -138,6 +138,7 @@ static int gs_msg_ops_vcpu_fill_info(struct kvmppc_gs_buff *gsb,
        vector128 v;
        int rc, i;
        u16 iden;
+       u32 arch_compat = 0;
 
        vcpu = gsm->data;
 
@@ -347,8 +348,23 @@ static int gs_msg_ops_vcpu_fill_info(struct kvmppc_gs_buff *gsb,
                        break;
                }
                case KVMPPC_GSID_LOGICAL_PVR:
-                       rc = kvmppc_gse_put_u32(gsb, iden,
-                                               vcpu->arch.vcore->arch_compat);
+                       /*
+                        * Though 'arch_compat == 0' would mean the default
+                        * compatibility, arch_compat, being a Guest Wide
+                        * Element, cannot be filled with a value of 0 in the
+                        * GSB, as this would result in a kernel trap.
+                        * Hence, when `arch_compat == 0`, arch_compat should
+                        * default to L1's PVR.
+                        */
+                       if (!vcpu->arch.vcore->arch_compat) {
+                               if (cpu_has_feature(CPU_FTR_ARCH_31))
+                                       arch_compat = PVR_ARCH_31;
+                               else if (cpu_has_feature(CPU_FTR_ARCH_300))
+                                       arch_compat = PVR_ARCH_300;
+                       } else {
+                               arch_compat = vcpu->arch.vcore->arch_compat;
+                       }
+                       rc = kvmppc_gse_put_u32(gsb, iden, arch_compat);
                        break;
                }
 
index a70828a6d9357d5fb399375f9624a7fb06d4a129..aa9aa11927b2f842718d98e7818fcd499efd0e54 100644 (file)
@@ -64,6 +64,7 @@ int __init __weak kasan_init_region(void *start, size_t size)
        if (ret)
                return ret;
 
+       k_start = k_start & PAGE_MASK;
        block = memblock_alloc(k_end - k_start, PAGE_SIZE);
        if (!block)
                return -ENOMEM;
index e966b2ad8ecd42a97d8eb9fc7d6a70922165b2d9..b3327a358eb434dfe4c3357b534ea3f8102425e1 100644 (file)
@@ -27,7 +27,7 @@
 
 #include "mpc85xx.h"
 
-void __init mpc8536_ds_pic_init(void)
+static void __init mpc8536_ds_pic_init(void)
 {
        struct mpic *mpic = mpic_alloc(NULL, 0, MPIC_BIG_ENDIAN,
                        0, 256, " OpenPIC  ");
index 1b59e45a0c64f1bd4dc257b52df072b07a555abf..19122daadb55a64842c4d5e56ad6e08ad762ab28 100644 (file)
@@ -21,7 +21,7 @@
 
 #include "mpc85xx.h"
 
-void __init mvme2500_pic_init(void)
+static void __init mvme2500_pic_init(void)
 {
        struct mpic *mpic = mpic_alloc(NULL, 0,
                  MPIC_BIG_ENDIAN | MPIC_SINGLE_DEST_CPU,
index 10d6f1fa33275a8b5b4153226c827e6f1d2816fa..491895ac8bcfe2121f99138e352255cd15085ccd 100644 (file)
@@ -24,7 +24,7 @@
 
 #include "mpc85xx.h"
 
-void __init p1010_rdb_pic_init(void)
+static void __init p1010_rdb_pic_init(void)
 {
        struct mpic *mpic = mpic_alloc(NULL, 0, MPIC_BIG_ENDIAN |
          MPIC_SINGLE_DEST_CPU,
index 0dd786a061a6a2195e51a1ecf2d6be65d6d38221..adc3a2ee141509f0ca06ccb1d648efd17292f7be 100644 (file)
@@ -370,7 +370,7 @@ exit:
  *
  * @pixclock: the period, in picoseconds, of the clock
  */
-void p1022ds_set_pixel_clock(unsigned int pixclock)
+static void p1022ds_set_pixel_clock(unsigned int pixclock)
 {
        struct device_node *guts_np = NULL;
        struct ccsr_guts __iomem *guts;
@@ -418,7 +418,7 @@ void p1022ds_set_pixel_clock(unsigned int pixclock)
 /**
  * p1022ds_valid_monitor_port: set the monitor port for sysfs
  */
-enum fsl_diu_monitor_port
+static enum fsl_diu_monitor_port
 p1022ds_valid_monitor_port(enum fsl_diu_monitor_port port)
 {
        switch (port) {
@@ -432,7 +432,7 @@ p1022ds_valid_monitor_port(enum fsl_diu_monitor_port port)
 
 #endif
 
-void __init p1022_ds_pic_init(void)
+static void __init p1022_ds_pic_init(void)
 {
        struct mpic *mpic = mpic_alloc(NULL, 0, MPIC_BIG_ENDIAN |
                MPIC_SINGLE_DEST_CPU,
index 25ab6e9c14703a66fd9b702324f36574166ab114..6198299d95b1b88806f83da5743961034dc35b8f 100644 (file)
@@ -40,7 +40,7 @@
  *
  * @pixclock: the wavelength, in picoseconds, of the clock
  */
-void p1022rdk_set_pixel_clock(unsigned int pixclock)
+static void p1022rdk_set_pixel_clock(unsigned int pixclock)
 {
        struct device_node *guts_np = NULL;
        struct ccsr_guts __iomem *guts;
@@ -88,7 +88,7 @@ void p1022rdk_set_pixel_clock(unsigned int pixclock)
 /**
  * p1022rdk_valid_monitor_port: set the monitor port for sysfs
  */
-enum fsl_diu_monitor_port
+static enum fsl_diu_monitor_port
 p1022rdk_valid_monitor_port(enum fsl_diu_monitor_port port)
 {
        return FSL_DIU_PORT_DVI;
@@ -96,7 +96,7 @@ p1022rdk_valid_monitor_port(enum fsl_diu_monitor_port port)
 
 #endif
 
-void __init p1022_rdk_pic_init(void)
+static void __init p1022_rdk_pic_init(void)
 {
        struct mpic *mpic = mpic_alloc(NULL, 0, MPIC_BIG_ENDIAN |
                MPIC_SINGLE_DEST_CPU,
index baa12eff6d5de460f56832a60bd3116e3748fa32..60e0b8947ce6106873dcc1abb22b51450524c3b2 100644 (file)
@@ -8,6 +8,8 @@
 #include <linux/of_irq.h>
 #include <linux/io.h>
 
+#include "socrates_fpga_pic.h"
+
 /*
  * The FPGA supports 9 interrupt sources, which can be routed to 3
  * interrupt request lines of the MPIC. The line to be used can be
index 45f257fc1ade055a7fdcc6a9142e0b5404f77f0b..2582427d8d0182fffdb79d70ed584889c6ff0d5f 100644 (file)
@@ -37,7 +37,7 @@
 #define MPC85xx_L2CTL_L2I              0x40000000 /* L2 flash invalidate */
 #define MPC85xx_L2CTL_L2SIZ_MASK       0x30000000 /* L2 SRAM size (R/O) */
 
-void __init xes_mpc85xx_pic_init(void)
+static void __init xes_mpc85xx_pic_init(void)
 {
        struct mpic *mpic = mpic_alloc(NULL, 0, MPIC_BIG_ENDIAN,
                        0, 256, " OpenPIC  ");
index 496e16c588aaa8edcd0294825862312471928506..e8c4129697b142ba48490481ee38793086e8425a 100644 (file)
@@ -574,29 +574,6 @@ static void iommu_table_setparms(struct pci_controller *phb,
 
 struct iommu_table_ops iommu_table_lpar_multi_ops;
 
-/*
- * iommu_table_setparms_lpar
- *
- * Function: On pSeries LPAR systems, return TCE table info, given a pci bus.
- */
-static void iommu_table_setparms_lpar(struct pci_controller *phb,
-                                     struct device_node *dn,
-                                     struct iommu_table *tbl,
-                                     struct iommu_table_group *table_group,
-                                     const __be32 *dma_window)
-{
-       unsigned long offset, size, liobn;
-
-       of_parse_dma_window(dn, dma_window, &liobn, &offset, &size);
-
-       iommu_table_setparms_common(tbl, phb->bus->number, liobn, offset, size, IOMMU_PAGE_SHIFT_4K, NULL,
-                                   &iommu_table_lpar_multi_ops);
-
-
-       table_group->tce32_start = offset;
-       table_group->tce32_size = size;
-}
-
 struct iommu_table_ops iommu_table_pseries_ops = {
        .set = tce_build_pSeries,
        .clear = tce_free_pSeries,
@@ -724,26 +701,71 @@ struct iommu_table_ops iommu_table_lpar_multi_ops = {
  * dynamic 64bit DMA window, walking up the device tree.
  */
 static struct device_node *pci_dma_find(struct device_node *dn,
-                                       const __be32 **dma_window)
+                                       struct dynamic_dma_window_prop *prop)
 {
-       const __be32 *dw = NULL;
+       const __be32 *default_prop = NULL;
+       const __be32 *ddw_prop = NULL;
+       struct device_node *rdn = NULL;
+       bool default_win = false, ddw_win = false;
 
        for ( ; dn && PCI_DN(dn); dn = dn->parent) {
-               dw = of_get_property(dn, "ibm,dma-window", NULL);
-               if (dw) {
-                       if (dma_window)
-                               *dma_window = dw;
-                       return dn;
+               default_prop = of_get_property(dn, "ibm,dma-window", NULL);
+               if (default_prop) {
+                       rdn = dn;
+                       default_win = true;
+               }
+               ddw_prop = of_get_property(dn, DIRECT64_PROPNAME, NULL);
+               if (ddw_prop) {
+                       rdn = dn;
+                       ddw_win = true;
+                       break;
+               }
+               ddw_prop = of_get_property(dn, DMA64_PROPNAME, NULL);
+               if (ddw_prop) {
+                       rdn = dn;
+                       ddw_win = true;
+                       break;
                }
-               dw = of_get_property(dn, DIRECT64_PROPNAME, NULL);
-               if (dw)
-                       return dn;
-               dw = of_get_property(dn, DMA64_PROPNAME, NULL);
-               if (dw)
-                       return dn;
+
+               /* At least found default window, which is the case for normal boot */
+               if (default_win)
+                       break;
        }
 
-       return NULL;
+       /* For PCI devices there will always be a DMA window, either on the
+        * device or on a parent bus.
+        */
+       WARN_ON(!(default_win | ddw_win));
+
+       /* the caller doesn't want the DMA window property */
+       if (!prop)
+               return rdn;
+
+       /* Parse the DMA window property. During a normal system boot, only the
+        * default DMA window is passed in the OF. But for kdump, a dedicated
+        * adapter might have both the default window and a DDW in the FDT; in
+        * that scenario, the DDW takes precedence over the default window.
+        */
+       if (ddw_win) {
+               struct dynamic_dma_window_prop *p;
+
+               p = (struct dynamic_dma_window_prop *)ddw_prop;
+               prop->liobn = p->liobn;
+               prop->dma_base = p->dma_base;
+               prop->tce_shift = p->tce_shift;
+               prop->window_shift = p->window_shift;
+       } else if (default_win) {
+               unsigned long offset, size, liobn;
+
+               of_parse_dma_window(rdn, default_prop, &liobn, &offset, &size);
+
+               prop->liobn = cpu_to_be32((u32)liobn);
+               prop->dma_base = cpu_to_be64(offset);
+               prop->tce_shift = cpu_to_be32(IOMMU_PAGE_SHIFT_4K);
+               prop->window_shift = cpu_to_be32(order_base_2(size));
+       }
+
+       return rdn;
 }
 
 static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
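
pci_dma_find() now hands back the window parameters through a dynamic_dma_window_prop instead of a raw property pointer. For orientation, a hedged sketch of that struct (the authoritative definition sits earlier in iommu.c, outside this excerpt); it mirrors the device-tree encoding, which is why every field is big-endian and the callers below wrap each access in be32_to_cpu()/be64_to_cpu():

struct dynamic_dma_window_prop {
	__be32	liobn;		/* TCE table number */
	__be64	dma_base;	/* DMA window base address */
	__be32	tce_shift;	/* ilog2(TCE page size) */
	__be32	window_shift;	/* ilog2(DMA window size) */
};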
@@ -751,17 +773,20 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
        struct iommu_table *tbl;
        struct device_node *dn, *pdn;
        struct pci_dn *ppci;
-       const __be32 *dma_window = NULL;
+       struct dynamic_dma_window_prop prop;
 
        dn = pci_bus_to_OF_node(bus);
 
        pr_debug("pci_dma_bus_setup_pSeriesLP: setting up bus %pOF\n",
                 dn);
 
-       pdn = pci_dma_find(dn, &dma_window);
+       pdn = pci_dma_find(dn, &prop);
 
-       if (dma_window == NULL)
-               pr_debug("  no ibm,dma-window property !\n");
+       /* On PPC there will always be a DMA window on the bus or on one of its
+        * parent buses. During a normal boot there will be an ibm,dma-window
+        * property defining the DMA window. For kdump there will be at least
+        * the default window, a DDW, or both.
+        */
 
        ppci = PCI_DN(pdn);
 
@@ -771,13 +796,24 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
        if (!ppci->table_group) {
                ppci->table_group = iommu_pseries_alloc_group(ppci->phb->node);
                tbl = ppci->table_group->tables[0];
-               if (dma_window) {
-                       iommu_table_setparms_lpar(ppci->phb, pdn, tbl,
-                                                 ppci->table_group, dma_window);
 
-                       if (!iommu_init_table(tbl, ppci->phb->node, 0, 0))
-                               panic("Failed to initialize iommu table");
-               }
+               iommu_table_setparms_common(tbl, ppci->phb->bus->number,
+                               be32_to_cpu(prop.liobn),
+                               be64_to_cpu(prop.dma_base),
+                               1ULL << be32_to_cpu(prop.window_shift),
+                               be32_to_cpu(prop.tce_shift), NULL,
+                               &iommu_table_lpar_multi_ops);
+
+               /* Only meaningful for a normal boot with the default window.
+                * It doesn't matter if we set these from a 64-bit DDW during
+                * kdump, since they will not be used in that case.
+                */
+               ppci->table_group->tce32_start = be64_to_cpu(prop.dma_base);
+               ppci->table_group->tce32_size = 1 << be32_to_cpu(prop.window_shift);
+
+               if (!iommu_init_table(tbl, ppci->phb->node, 0, 0))
+                       panic("Failed to initialize iommu table");
+
                iommu_register_group(ppci->table_group,
                                pci_domain_nr(bus), 0);
                pr_debug("  created table: %p\n", ppci->table_group);
@@ -968,6 +1004,12 @@ static void find_existing_ddw_windows_named(const char *name)
                        continue;
                }
 
+               /* If there are DDWs in the OF at system initialization time,
+                * this is a kexec boot. A DDW could be direct or dynamic. We
+                * just mark DDWs as "dynamic" since this is the kdump path and
+                * there is no need to worry about performance;
+                * ddw_list_new_entry() will set window->direct = false.
+                */
                window = ddw_list_new_entry(pdn, dma64);
                if (!window) {
                        of_node_put(pdn);
@@ -1524,8 +1566,8 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
 {
        struct device_node *pdn, *dn;
        struct iommu_table *tbl;
-       const __be32 *dma_window = NULL;
        struct pci_dn *pci;
+       struct dynamic_dma_window_prop prop;
 
        pr_debug("pci_dma_dev_setup_pSeriesLP: %s\n", pci_name(dev));
 
@@ -1538,7 +1580,7 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
        dn = pci_device_to_OF_node(dev);
        pr_debug("  node is %pOF\n", dn);
 
-       pdn = pci_dma_find(dn, &dma_window);
+       pdn = pci_dma_find(dn, &prop);
        if (!pdn || !PCI_DN(pdn)) {
                printk(KERN_WARNING "pci_dma_dev_setup_pSeriesLP: "
                       "no DMA window found for pci dev=%s dn=%pOF\n",
@@ -1551,8 +1593,20 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
        if (!pci->table_group) {
                pci->table_group = iommu_pseries_alloc_group(pci->phb->node);
                tbl = pci->table_group->tables[0];
-               iommu_table_setparms_lpar(pci->phb, pdn, tbl,
-                               pci->table_group, dma_window);
+
+               iommu_table_setparms_common(tbl, pci->phb->bus->number,
+                               be32_to_cpu(prop.liobn),
+                               be64_to_cpu(prop.dma_base),
+                               1ULL << be32_to_cpu(prop.window_shift),
+                               be32_to_cpu(prop.tce_shift), NULL,
+                               &iommu_table_lpar_multi_ops);
+
+               /* Only meaningful for a normal boot with the default window.
+                * It doesn't matter if we set these from a 64-bit DDW during
+                * kdump, since they will not be used in that case.
+                */
+               pci->table_group->tce32_start = be64_to_cpu(prop.dma_base);
+               pci->table_group->tce32_size = 1 << be32_to_cpu(prop.window_shift);
 
                iommu_init_table(tbl, pci->phb->node, 0, 0);
                iommu_register_group(pci->table_group,
index 4561667832ed403e2a4ce1847bda97d844a12485..4e9916bb03d71fb687f49c989b7f65cce08bd066 100644 (file)
@@ -662,8 +662,12 @@ u64 pseries_paravirt_steal_clock(int cpu)
 {
        struct lppaca *lppaca = &lppaca_of(cpu);
 
-       return be64_to_cpu(READ_ONCE(lppaca->enqueue_dispatch_tb)) +
-               be64_to_cpu(READ_ONCE(lppaca->ready_enqueue_tb));
+       /*
+        * VPA steal time counters are reported at TB frequency. Hence do a
+        * conversion to ns before returning.
+        */
+       return tb_to_ns(be64_to_cpu(READ_ONCE(lppaca->enqueue_dispatch_tb)) +
+                       be64_to_cpu(READ_ONCE(lppaca->ready_enqueue_tb)));
 }
 #endif
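
tb_to_ns() scales raw timebase ticks to nanoseconds so the value matches what the generic steal-time accounting consumes. A hedged sketch of its shape (the real helper lives in arch/powerpc/include/asm/time.h and uses a precomputed scale/shift pair):

static inline u64 tb_to_ns(u64 tb_ticks)
{
	/* multiply-high by the precomputed scale, then shift */
	return mulhdu(tb_ticks, tb_to_ns_scale) << tb_to_ns_shift;
}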
 
index 4ba8245681192120860ad1278a1b7ec7110a4bfc..4448386268d99155657fe6179ad8fd0132676f13 100644 (file)
@@ -35,6 +35,8 @@ struct pci_controller *init_phb_dynamic(struct device_node *dn)
 
        pseries_msi_allocate_domains(phb);
 
+       ppc_iommu_register_device(phb);
+
        /* Create EEH devices for the PHB */
        eeh_phb_pe_create(phb);
 
@@ -76,6 +78,8 @@ int remove_phb_dynamic(struct pci_controller *phb)
                }
        }
 
+       ppc_iommu_unregister_device(phb);
+
        pseries_msi_free_domains(phb);
 
        /* Keep a reference so phb isn't freed yet */
index 5020044400dcb3caa60d5efebe366249dc7ff65e..4de57ba52236513f5c2f92e42ae4d7059d6e4f2f 100644 (file)
@@ -41,7 +41,7 @@ struct memcons memcons = {
        .input_end = &memcons_input[CONFIG_PPC_MEMCONS_INPUT_SIZE],
 };
 
-void memcons_putc(char c)
+static void memcons_putc(char c)
 {
        char *new_output_pos;
 
@@ -54,7 +54,7 @@ void memcons_putc(char c)
        memcons.output_pos = new_output_pos;
 }
 
-int memcons_getc_poll(void)
+static int memcons_getc_poll(void)
 {
        char c;
        char *new_input_pos;
@@ -77,7 +77,7 @@ int memcons_getc_poll(void)
        return -1;
 }
 
-int memcons_getc(void)
+static int memcons_getc(void)
 {
        int c;
 
index bffbd869a0682842883591788da784648acf1626..e3142ce531a097b8cf0e39251ba88ae143d6594c 100644 (file)
@@ -315,7 +315,6 @@ config AS_HAS_OPTION_ARCH
        # https://reviews.llvm.org/D123515
        def_bool y
        depends on $(as-instr, .option arch$(comma) +m)
-       depends on !$(as-instr, .option arch$(comma) -i)
 
 source "arch/riscv/Kconfig.socs"
 source "arch/riscv/Kconfig.errata"
index 07387f9c135ca7e8ddf7d45de10ccdb933a2e4d4..72b87b08ab444ef1dc1ed200a6e8b3cbb9bfc73f 100644 (file)
                interrupt-parent = <&gpio>;
                interrupts = <1 IRQ_TYPE_LEVEL_LOW>;
                interrupt-controller;
+               #interrupt-cells = <2>;
 
                onkey {
                        compatible = "dlg,da9063-onkey";
index 93256540d07882af2b12a6faf0642d93ddc4970b..ead1cc35d88b2f13bfecf935a6e66e6049a24a75 100644 (file)
                                              <&cpu63_intc 3>;
                };
 
-               clint_mtimer0: timer@70ac000000 {
+               clint_mtimer0: timer@70ac004000 {
                        compatible = "sophgo,sg2042-aclint-mtimer", "thead,c900-aclint-mtimer";
-                       reg = <0x00000070 0xac000000 0x00000000 0x00007ff8>;
+                       reg = <0x00000070 0xac004000 0x00000000 0x0000c000>;
+                       reg-names = "mtimecmp";
                        interrupts-extended = <&cpu0_intc 7>,
                                              <&cpu1_intc 7>,
                                              <&cpu2_intc 7>,
                                              <&cpu3_intc 7>;
                };
 
-               clint_mtimer1: timer@70ac010000 {
+               clint_mtimer1: timer@70ac014000 {
                        compatible = "sophgo,sg2042-aclint-mtimer", "thead,c900-aclint-mtimer";
-                       reg = <0x00000070 0xac010000 0x00000000 0x00007ff8>;
+                       reg = <0x00000070 0xac014000 0x00000000 0x0000c000>;
+                       reg-names = "mtimecmp";
                        interrupts-extended = <&cpu4_intc 7>,
                                              <&cpu5_intc 7>,
                                              <&cpu6_intc 7>,
                                              <&cpu7_intc 7>;
                };
 
-               clint_mtimer2: timer@70ac020000 {
+               clint_mtimer2: timer@70ac024000 {
                        compatible = "sophgo,sg2042-aclint-mtimer", "thead,c900-aclint-mtimer";
-                       reg = <0x00000070 0xac020000 0x00000000 0x00007ff8>;
+                       reg = <0x00000070 0xac024000 0x00000000 0x0000c000>;
+                       reg-names = "mtimecmp";
                        interrupts-extended = <&cpu8_intc 7>,
                                              <&cpu9_intc 7>,
                                              <&cpu10_intc 7>,
                                              <&cpu11_intc 7>;
                };
 
-               clint_mtimer3: timer@70ac030000 {
+               clint_mtimer3: timer@70ac034000 {
                        compatible = "sophgo,sg2042-aclint-mtimer", "thead,c900-aclint-mtimer";
-                       reg = <0x00000070 0xac030000 0x00000000 0x00007ff8>;
+                       reg = <0x00000070 0xac034000 0x00000000 0x0000c000>;
+                       reg-names = "mtimecmp";
                        interrupts-extended = <&cpu12_intc 7>,
                                              <&cpu13_intc 7>,
                                              <&cpu14_intc 7>,
                                              <&cpu15_intc 7>;
                };
 
-               clint_mtimer4: timer@70ac040000 {
+               clint_mtimer4: timer@70ac044000 {
                        compatible = "sophgo,sg2042-aclint-mtimer", "thead,c900-aclint-mtimer";
-                       reg = <0x00000070 0xac040000 0x00000000 0x00007ff8>;
+                       reg = <0x00000070 0xac044000 0x00000000 0x0000c000>;
+                       reg-names = "mtimecmp";
                        interrupts-extended = <&cpu16_intc 7>,
                                              <&cpu17_intc 7>,
                                              <&cpu18_intc 7>,
                                              <&cpu19_intc 7>;
                };
 
-               clint_mtimer5: timer@70ac050000 {
+               clint_mtimer5: timer@70ac054000 {
                        compatible = "sophgo,sg2042-aclint-mtimer", "thead,c900-aclint-mtimer";
-                       reg = <0x00000070 0xac050000 0x00000000 0x00007ff8>;
+                       reg = <0x00000070 0xac054000 0x00000000 0x0000c000>;
+                       reg-names = "mtimecmp";
                        interrupts-extended = <&cpu20_intc 7>,
                                              <&cpu21_intc 7>,
                                              <&cpu22_intc 7>,
                                              <&cpu23_intc 7>;
                };
 
-               clint_mtimer6: timer@70ac060000 {
+               clint_mtimer6: timer@70ac064000 {
                        compatible = "sophgo,sg2042-aclint-mtimer", "thead,c900-aclint-mtimer";
-                       reg = <0x00000070 0xac060000 0x00000000 0x00007ff8>;
+                       reg = <0x00000070 0xac064000 0x00000000 0x0000c000>;
+                       reg-names = "mtimecmp";
                        interrupts-extended = <&cpu24_intc 7>,
                                              <&cpu25_intc 7>,
                                              <&cpu26_intc 7>,
                                              <&cpu27_intc 7>;
                };
 
-               clint_mtimer7: timer@70ac070000 {
+               clint_mtimer7: timer@70ac074000 {
                        compatible = "sophgo,sg2042-aclint-mtimer", "thead,c900-aclint-mtimer";
-                       reg = <0x00000070 0xac070000 0x00000000 0x00007ff8>;
+                       reg = <0x00000070 0xac074000 0x00000000 0x0000c000>;
+                       reg-names = "mtimecmp";
                        interrupts-extended = <&cpu28_intc 7>,
                                              <&cpu29_intc 7>,
                                              <&cpu30_intc 7>,
                                              <&cpu31_intc 7>;
                };
 
-               clint_mtimer8: timer@70ac080000 {
+               clint_mtimer8: timer@70ac084000 {
                        compatible = "sophgo,sg2042-aclint-mtimer", "thead,c900-aclint-mtimer";
-                       reg = <0x00000070 0xac080000 0x00000000 0x00007ff8>;
+                       reg = <0x00000070 0xac084000 0x00000000 0x0000c000>;
+                       reg-names = "mtimecmp";
                        interrupts-extended = <&cpu32_intc 7>,
                                              <&cpu33_intc 7>,
                                              <&cpu34_intc 7>,
                                              <&cpu35_intc 7>;
                };
 
-               clint_mtimer9: timer@70ac090000 {
+               clint_mtimer9: timer@70ac094000 {
                        compatible = "sophgo,sg2042-aclint-mtimer", "thead,c900-aclint-mtimer";
-                       reg = <0x00000070 0xac090000 0x00000000 0x00007ff8>;
+                       reg = <0x00000070 0xac094000 0x00000000 0x0000c000>;
+                       reg-names = "mtimecmp";
                        interrupts-extended = <&cpu36_intc 7>,
                                              <&cpu37_intc 7>,
                                              <&cpu38_intc 7>,
                                              <&cpu39_intc 7>;
                };
 
-               clint_mtimer10: timer@70ac0a0000 {
+               clint_mtimer10: timer@70ac0a4000 {
                        compatible = "sophgo,sg2042-aclint-mtimer", "thead,c900-aclint-mtimer";
-                       reg = <0x00000070 0xac0a0000 0x00000000 0x00007ff8>;
+                       reg = <0x00000070 0xac0a4000 0x00000000 0x0000c000>;
+                       reg-names = "mtimecmp";
                        interrupts-extended = <&cpu40_intc 7>,
                                              <&cpu41_intc 7>,
                                              <&cpu42_intc 7>,
                                              <&cpu43_intc 7>;
                };
 
-               clint_mtimer11: timer@70ac0b0000 {
+               clint_mtimer11: timer@70ac0b4000 {
                        compatible = "sophgo,sg2042-aclint-mtimer", "thead,c900-aclint-mtimer";
-                       reg = <0x00000070 0xac0b0000 0x00000000 0x00007ff8>;
+                       reg = <0x00000070 0xac0b4000 0x00000000 0x0000c000>;
+                       reg-names = "mtimecmp";
                        interrupts-extended = <&cpu44_intc 7>,
                                              <&cpu45_intc 7>,
                                              <&cpu46_intc 7>,
                                              <&cpu47_intc 7>;
                };
 
-               clint_mtimer12: timer@70ac0c0000 {
+               clint_mtimer12: timer@70ac0c4000 {
                        compatible = "sophgo,sg2042-aclint-mtimer", "thead,c900-aclint-mtimer";
-                       reg = <0x00000070 0xac0c0000 0x00000000 0x00007ff8>;
+                       reg = <0x00000070 0xac0c4000 0x00000000 0x0000c000>;
+                       reg-names = "mtimecmp";
                        interrupts-extended = <&cpu48_intc 7>,
                                              <&cpu49_intc 7>,
                                              <&cpu50_intc 7>,
                                              <&cpu51_intc 7>;
                };
 
-               clint_mtimer13: timer@70ac0d0000 {
+               clint_mtimer13: timer@70ac0d4000 {
                        compatible = "sophgo,sg2042-aclint-mtimer", "thead,c900-aclint-mtimer";
-                       reg = <0x00000070 0xac0d0000 0x00000000 0x00007ff8>;
+                       reg = <0x00000070 0xac0d4000 0x00000000 0x0000c000>;
+                       reg-names = "mtimecmp";
                        interrupts-extended = <&cpu52_intc 7>,
                                              <&cpu53_intc 7>,
                                              <&cpu54_intc 7>,
                                              <&cpu55_intc 7>;
                };
 
-               clint_mtimer14: timer@70ac0e0000 {
+               clint_mtimer14: timer@70ac0e4000 {
                        compatible = "sophgo,sg2042-aclint-mtimer", "thead,c900-aclint-mtimer";
-                       reg = <0x00000070 0xac0e0000 0x00000000 0x00007ff8>;
+                       reg = <0x00000070 0xac0e4000 0x00000000 0x0000c000>;
+                       reg-names = "mtimecmp";
                        interrupts-extended = <&cpu56_intc 7>,
                                              <&cpu57_intc 7>,
                                              <&cpu58_intc 7>,
                                              <&cpu59_intc 7>;
                };
 
-               clint_mtimer15: timer@70ac0f0000 {
+               clint_mtimer15: timer@70ac0f4000 {
                        compatible = "sophgo,sg2042-aclint-mtimer", "thead,c900-aclint-mtimer";
-                       reg = <0x00000070 0xac0f0000 0x00000000 0x00007ff8>;
+                       reg = <0x00000070 0xac0f4000 0x00000000 0x0000c000>;
+                       reg-names = "mtimecmp";
                        interrupts-extended = <&cpu60_intc 7>,
                                              <&cpu61_intc 7>,
                                              <&cpu62_intc 7>,
index c216aaecac53f2d7d1ec47b4f250ea5ae08e11cb..8bcf36d07f3f7c38a164a5864974bc60ad11e8b1 100644 (file)
                        thermal-sensors = <&sfctemp>;
 
                        trips {
-                               cpu_alert0 {
+                               cpu-alert0 {
                                        /* milliCelsius */
                                        temperature = <75000>;
                                        hysteresis = <2000>;
                                        type = "passive";
                                };
 
-                               cpu_crit {
+                               cpu-crit {
                                        /* milliCelsius */
                                        temperature = <90000>;
                                        hysteresis = <2000>;
                };
        };
 
-       osc_sys: osc_sys {
+       osc_sys: osc-sys {
                compatible = "fixed-clock";
                #clock-cells = <0>;
                /* This value must be overridden by the board */
                clock-frequency = <0>;
        };
 
-       osc_aud: osc_aud {
+       osc_aud: osc-aud {
                compatible = "fixed-clock";
                #clock-cells = <0>;
                /* This value must be overridden by the board */
                clock-frequency = <0>;
        };
 
-       gmac_rmii_ref: gmac_rmii_ref {
+       gmac_rmii_ref: gmac-rmii-ref {
                compatible = "fixed-clock";
                #clock-cells = <0>;
                /* Should be overridden by the board when needed */
                clock-frequency = <0>;
        };
 
-       gmac_gr_mii_rxclk: gmac_gr_mii_rxclk {
+       gmac_gr_mii_rxclk: gmac-gr-mii-rxclk {
                compatible = "fixed-clock";
                #clock-cells = <0>;
                /* Should be overridden by the board when needed */
index 45213cdf50dc75a9fa6610710a4d0cbe58b44c51..74ed3b9264d8f15ee10400b4bf5fcf855b7cecd0 100644 (file)
                        };
 
                        trips {
-                               cpu_alert0: cpu_alert0 {
+                               cpu_alert0: cpu-alert0 {
                                        /* milliCelsius */
                                        temperature = <85000>;
                                        hysteresis = <2000>;
                                        type = "passive";
                                };
 
-                               cpu_crit {
+                               cpu-crit {
                                        /* milliCelsius */
                                        temperature = <100000>;
                                        hysteresis = <2000>;
index c20236a0725b9e27a31d28b580db1bd1ad2c945a..85b2c443823e8ab8ce55aa5b32953eabbc5e1108 100644 (file)
@@ -20,7 +20,7 @@
 static __always_inline unsigned int __arch_hweight32(unsigned int w)
 {
 #ifdef CONFIG_RISCV_ISA_ZBB
-       asm_volatile_goto(ALTERNATIVE("j %l[legacy]", "nop", 0,
+       asm goto(ALTERNATIVE("j %l[legacy]", "nop", 0,
                                      RISCV_ISA_EXT_ZBB, 1)
                          : : : : legacy);
 
@@ -51,7 +51,7 @@ static inline unsigned int __arch_hweight8(unsigned int w)
 static __always_inline unsigned long __arch_hweight64(__u64 w)
 {
 # ifdef CONFIG_RISCV_ISA_ZBB
-       asm_volatile_goto(ALTERNATIVE("j %l[legacy]", "nop", 0,
+       asm goto(ALTERNATIVE("j %l[legacy]", "nop", 0,
                                      RISCV_ISA_EXT_ZBB, 1)
                          : : : : legacy);
 
index 9ffc355370248aed22dbe690ba1cde8e682a3588..329d8244a9b3fd516104808db5a959acfb469b22 100644 (file)
@@ -39,7 +39,7 @@ static __always_inline unsigned long variable__ffs(unsigned long word)
 {
        int num;
 
-       asm_volatile_goto(ALTERNATIVE("j %l[legacy]", "nop", 0,
+       asm goto(ALTERNATIVE("j %l[legacy]", "nop", 0,
                                      RISCV_ISA_EXT_ZBB, 1)
                          : : : : legacy);
 
@@ -95,7 +95,7 @@ static __always_inline unsigned long variable__fls(unsigned long word)
 {
        int num;
 
-       asm_volatile_goto(ALTERNATIVE("j %l[legacy]", "nop", 0,
+       asm goto(ALTERNATIVE("j %l[legacy]", "nop", 0,
                                      RISCV_ISA_EXT_ZBB, 1)
                          : : : : legacy);
 
@@ -154,7 +154,7 @@ static __always_inline int variable_ffs(int x)
        if (!x)
                return 0;
 
-       asm_volatile_goto(ALTERNATIVE("j %l[legacy]", "nop", 0,
+       asm goto(ALTERNATIVE("j %l[legacy]", "nop", 0,
                                      RISCV_ISA_EXT_ZBB, 1)
                          : : : : legacy);
 
@@ -209,7 +209,7 @@ static __always_inline int variable_fls(unsigned int x)
        if (!x)
                return 0;
 
-       asm_volatile_goto(ALTERNATIVE("j %l[legacy]", "nop", 0,
+       asm goto(ALTERNATIVE("j %l[legacy]", "nop", 0,
                                      RISCV_ISA_EXT_ZBB, 1)
                          : : : : legacy);
 
index a5b60b54b101c3ba1e550b3e16e7096a6bfc8357..88e6f1499e889951b2871fec052330b3c92f2eb7 100644 (file)
@@ -53,7 +53,7 @@ static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
            IS_ENABLED(CONFIG_RISCV_ALTERNATIVE)) {
                unsigned long fold_temp;
 
-               asm_volatile_goto(ALTERNATIVE("j %l[no_zbb]", "nop", 0,
+               asm goto(ALTERNATIVE("j %l[no_zbb]", "nop", 0,
                                              RISCV_ISA_EXT_ZBB, 1)
                    :
                    :
index 5a626ed2c47a8915b3848df2e7f4a7ea0601bd71..0bd11862b7607b9ffebf8460ea6cc00cc1e4ff62 100644 (file)
@@ -80,7 +80,7 @@ riscv_has_extension_likely(const unsigned long ext)
                           "ext must be < RISCV_ISA_EXT_MAX");
 
        if (IS_ENABLED(CONFIG_RISCV_ALTERNATIVE)) {
-               asm_volatile_goto(
+               asm goto(
                ALTERNATIVE("j  %l[l_no]", "nop", 0, %[ext], 1)
                :
                : [ext] "i" (ext)
@@ -103,7 +103,7 @@ riscv_has_extension_unlikely(const unsigned long ext)
                           "ext must be < RISCV_ISA_EXT_MAX");
 
        if (IS_ENABLED(CONFIG_RISCV_ALTERNATIVE)) {
-               asm_volatile_goto(
+               asm goto(
                ALTERNATIVE("nop", "j   %l[l_yes]", 0, %[ext], 1)
                :
                : [ext] "i" (ext)
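
These riscv_has_extension_*() hunks reflect a tree-wide rename: now that the kernel's supported compilers handle the standard spelling correctly without the old volatile workaround, the asm_volatile_goto() wrapper is retired in favor of plain `asm goto` (sites that also produce outputs use asm_goto_output() instead, as seen in the checksum code further down). A minimal sketch of the construct, not taken from any file here:

    /* Sketch: "asm goto" may branch to a C label instead of falling
     * through; the label list follows the clobber list.
     */
    static inline int has_flag(unsigned long flag)
    {
            asm goto("beqz %0, %l[no]" : /* no outputs */ : "r" (flag) : : no);
            return 1;
    no:
            return 0;
    }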
index 510014051f5dbb1aa61098e4974e7e7ac02145ee..2468c55933cd0d5d55d71d83a52226172bd5121c 100644 (file)
 # define CSR_STATUS    CSR_MSTATUS
 # define CSR_IE                CSR_MIE
 # define CSR_TVEC      CSR_MTVEC
+# define CSR_ENVCFG    CSR_MENVCFG
 # define CSR_SCRATCH   CSR_MSCRATCH
 # define CSR_EPC       CSR_MEPC
 # define CSR_CAUSE     CSR_MCAUSE
 # define CSR_STATUS    CSR_SSTATUS
 # define CSR_IE                CSR_SIE
 # define CSR_TVEC      CSR_STVEC
+# define CSR_ENVCFG    CSR_SENVCFG
 # define CSR_SCRATCH   CSR_SSCRATCH
 # define CSR_EPC       CSR_SEPC
 # define CSR_CAUSE     CSR_SCAUSE
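
The new CSR_ENVCFG alias follows the pattern of CSR_STATUS/CSR_IE/CSR_TVEC above: an M-mode build resolves it to CSR_MENVCFG and an S-mode kernel to CSR_SENVCFG, so envcfg users need no #ifdefs. A sketch of the call site this enables (csr_set() is the standard helper from asm/csr.h; the concrete use appears in riscv_user_isa_enable() further down):

    /* Sketch: let user space execute Zicboz cache-block-zero;
     * CSR_ENVCFG picks the right CSR for the privilege mode.
     */
    csr_set(CSR_ENVCFG, ENVCFG_CBZE);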
index 3291721229523456247532009bc2ed2ddc444540..15055f9df4daa1e4250c8a37c64193bf5c943ee3 100644 (file)
 
 #define ARCH_SUPPORTS_FTRACE_OPS 1
 #ifndef __ASSEMBLY__
+
+extern void *return_address(unsigned int level);
+
+#define ftrace_return_address(n) return_address(n)
+
 void MCOUNT_NAME(void);
 static inline unsigned long ftrace_call_adjust(unsigned long addr)
 {
index 4c5b0e929890fadcebb3caace0afe97dfa46d8bf..22deb7a2a6ec4e4daba8322c7c6c28137b49f5f8 100644 (file)
@@ -11,6 +11,11 @@ static inline void arch_clear_hugepage_flags(struct page *page)
 }
 #define arch_clear_hugepage_flags arch_clear_hugepage_flags
 
+#ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
+bool arch_hugetlb_migration_supported(struct hstate *h);
+#define arch_hugetlb_migration_supported arch_hugetlb_migration_supported
+#endif
+
 #ifdef CONFIG_RISCV_ISA_SVNAPOT
 #define __HAVE_ARCH_HUGE_PTE_CLEAR
 void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
index 5340f818746b71a805319eb6f941fa311c9b36a2..1f2d2599c655d20be6df7516382e20a7e3956301 100644 (file)
@@ -81,6 +81,8 @@
 #define RISCV_ISA_EXT_ZTSO             72
 #define RISCV_ISA_EXT_ZACAS            73
 
+#define RISCV_ISA_EXT_XLINUXENVCFG     127
+
 #define RISCV_ISA_EXT_MAX              128
 #define RISCV_ISA_EXT_INVALID          U32_MAX
 
index 14a5ea8d8ef0f4a2f4477fb65778e4f8ea449e2a..4a35d787c0191475b3a5d8dc7452e448541dc8e9 100644 (file)
@@ -17,7 +17,7 @@
 static __always_inline bool arch_static_branch(struct static_key * const key,
                                               const bool branch)
 {
-       asm_volatile_goto(
+       asm goto(
                "       .align          2                       \n\t"
                "       .option push                            \n\t"
                "       .option norelax                         \n\t"
@@ -39,7 +39,7 @@ label:
 static __always_inline bool arch_static_branch_jump(struct static_key * const key,
                                                    const bool branch)
 {
-       asm_volatile_goto(
+       asm goto(
                "       .align          2                       \n\t"
                "       .option push                            \n\t"
                "       .option norelax                         \n\t"
index d169a4f41a2e728276a97898e1270c7b4763f9ed..c80bb9990d32ef706452d7d4fcc1c049cd7436d9 100644 (file)
@@ -95,7 +95,13 @@ static inline void pud_free(struct mm_struct *mm, pud_t *pud)
                __pud_free(mm, pud);
 }
 
-#define __pud_free_tlb(tlb, pud, addr)  pud_free((tlb)->mm, pud)
+#define __pud_free_tlb(tlb, pud, addr)                                 \
+do {                                                                   \
+       if (pgtable_l4_enabled) {                                       \
+               pagetable_pud_dtor(virt_to_ptdesc(pud));                \
+               tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(pud));     \
+       }                                                               \
+} while (0)
 
 #define p4d_alloc_one p4d_alloc_one
 static inline p4d_t *p4d_alloc_one(struct mm_struct *mm, unsigned long addr)
@@ -124,7 +130,11 @@ static inline void p4d_free(struct mm_struct *mm, p4d_t *p4d)
                __p4d_free(mm, p4d);
 }
 
-#define __p4d_free_tlb(tlb, p4d, addr)  p4d_free((tlb)->mm, p4d)
+#define __p4d_free_tlb(tlb, p4d, addr)                                 \
+do {                                                                   \
+       if (pgtable_l5_enabled)                                         \
+               tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(p4d));     \
+} while (0)
 #endif /* __PAGETABLE_PMD_FOLDED */
 
 static inline void sync_kernel_mappings(pgd_t *pgd)
@@ -149,7 +159,11 @@ static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 
 #ifndef __PAGETABLE_PMD_FOLDED
 
-#define __pmd_free_tlb(tlb, pmd, addr)  pmd_free((tlb)->mm, pmd)
+#define __pmd_free_tlb(tlb, pmd, addr)                         \
+do {                                                           \
+       pagetable_pmd_dtor(virt_to_ptdesc(pmd));                \
+       tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(pmd));     \
+} while (0)
 
 #endif /* __PAGETABLE_PMD_FOLDED */
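
These macros used to hand the pages straight back to the allocator; routing them through the mmu_gather instead (with the matching pagetable_*_dtor() calls) keeps page-table pages alive until the TLB has been flushed, and the tlb_flush() change below makes that flush a full flush_tlb_mm() whenever tables were freed. A sketch of the ordering this establishes, assuming the generic mmu_gather flow:

    /* Sketch of the teardown order for a freed PUD-level table:
     *
     *   __pud_free_tlb()  ->  pagetable_pud_dtor() + page queued on gather
     *   tlb_flush()       ->  flush_tlb_mm(), since tlb->freed_tables is set
     *   tlb_finish_mmu()  ->  queued page-table pages actually freed
     *
     * so no CPU can still be walking a table page once it is reused.
     */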
 
index b42017d76924f74386bc712719280af21781bb5d..b99bd66107a69038c835ead6b77725aaeaf882c3 100644 (file)
@@ -136,7 +136,7 @@ enum napot_cont_order {
  * 10010 - IO   Strongly-ordered, Non-cacheable, Non-bufferable, Shareable, Non-trustable
  */
 #define _PAGE_PMA_THEAD                ((1UL << 62) | (1UL << 61) | (1UL << 60))
-#define _PAGE_NOCACHE_THEAD    ((1UL < 61) | (1UL << 60))
+#define _PAGE_NOCACHE_THEAD    ((1UL << 61) | (1UL << 60))
 #define _PAGE_IO_THEAD         ((1UL << 63) | (1UL << 60))
 #define _PAGE_MTMASK_THEAD     (_PAGE_PMA_THEAD | _PAGE_IO_THEAD | (1UL << 59))
 
index 0c94260b5d0c126f6302f39a59507f19eed48dac..6066822e7396fa5078a546356a3a6f6605470712 100644 (file)
@@ -84,7 +84,7 @@
  * Define vmemmap for pfn_to_page & page_to_pfn calls. Needed if the
  * kernel is configured with CONFIG_SPARSEMEM_VMEMMAP enabled.
  */
-#define vmemmap                ((struct page *)VMEMMAP_START)
+#define vmemmap                ((struct page *)VMEMMAP_START - (phys_ram_base >> PAGE_SHIFT))
 
 #define PCI_IO_SIZE      SZ_16M
 #define PCI_IO_END       VMEMMAP_START
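
The vmemmap rebasing matters because the generic pfn_to_page() is just `vmemmap + pfn` with a raw PFN. A worked example with assumed values (PAGE_SHIFT = 12, phys_ram_base = 0x80200000):

    /* base PFN = 0x80200000 >> 12 = 0x80200
     *
     * pfn_to_page(0x80200)
     *     = vmemmap + 0x80200
     *     = ((struct page *)VMEMMAP_START - 0x80200) + 0x80200
     *     = (struct page *)VMEMMAP_START
     *
     * The struct pages for the first RAM page now start at the bottom of
     * the vmemmap window instead of 0x80200 entries into it, so systems
     * whose RAM starts high no longer run past VMEMMAP_END.
     */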
@@ -439,6 +439,10 @@ static inline pte_t pte_mkhuge(pte_t pte)
        return pte;
 }
 
+#define pte_leaf_size(pte)     (pte_napot(pte) ?                               \
+                                       napot_cont_size(napot_cont_order(pte)) :\
+                                       PAGE_SIZE)
+
 #ifdef CONFIG_NUMA_BALANCING
 /*
  * See the comment in include/asm-generic/pgtable.h
index f7e8ef2418b99fc98362a1a977f8038f7592fbac..b1495a7e06ce693b4fc698ee4d62549bd0614700 100644 (file)
@@ -21,4 +21,9 @@ static inline bool on_thread_stack(void)
        return !(((unsigned long)(current->stack) ^ current_stack_pointer) & ~(THREAD_SIZE - 1));
 }
 
+
+#ifdef CONFIG_VMAP_STACK
+DECLARE_PER_CPU(unsigned long [OVERFLOW_STACK_SIZE/sizeof(long)], overflow_stack);
+#endif /* CONFIG_VMAP_STACK */
+
 #endif /* _ASM_RISCV_STACKTRACE_H */
index 02f87867389a9e660f91b64c7ca818a6b61637dc..491296a335d0ce6cd9c8f242646c3c60c762bc87 100644 (file)
@@ -14,6 +14,7 @@ struct suspend_context {
        struct pt_regs regs;
        /* Saved and restored by high-level functions */
        unsigned long scratch;
+       unsigned long envcfg;
        unsigned long tvec;
        unsigned long ie;
 #ifdef CONFIG_MMU
index 1eb5682b2af6065c9019e398df729f5b97a573c6..50b63b5c15bd8b19dac37176ef98c3489c837e05 100644 (file)
@@ -16,7 +16,7 @@ static void tlb_flush(struct mmu_gather *tlb);
 static inline void tlb_flush(struct mmu_gather *tlb)
 {
 #ifdef CONFIG_MMU
-       if (tlb->fullmm || tlb->need_flush_all)
+       if (tlb->fullmm || tlb->need_flush_all || tlb->freed_tables)
                flush_tlb_mm(tlb->mm);
        else
                flush_tlb_mm_range(tlb->mm, tlb->start, tlb->end,
index 928f096dca21b4e6cbafc009595cd34bb9917109..4112cc8d1d69f9fbde77a524820a5de1e7931acf 100644 (file)
@@ -75,6 +75,7 @@ static inline void flush_tlb_kernel_range(unsigned long start,
 
 #define flush_tlb_mm(mm) flush_tlb_all()
 #define flush_tlb_mm_range(mm, start, end, page_size) flush_tlb_all()
+#define local_flush_tlb_kernel_range(start, end) flush_tlb_all()
 #endif /* !CONFIG_SMP || !CONFIG_MMU */
 
 #endif /* _ASM_RISCV_TLBFLUSH_H */
index 924d01b56c9a1eb1eacd53a923fc55591cda654f..51f6dfe19745aa486bd73d7de472faa538cf0486 100644 (file)
@@ -19,65 +19,6 @@ static inline bool arch_vmap_pmd_supported(pgprot_t prot)
        return true;
 }
 
-#ifdef CONFIG_RISCV_ISA_SVNAPOT
-#include <linux/pgtable.h>
+#endif
 
-#define arch_vmap_pte_range_map_size arch_vmap_pte_range_map_size
-static inline unsigned long arch_vmap_pte_range_map_size(unsigned long addr, unsigned long end,
-                                                        u64 pfn, unsigned int max_page_shift)
-{
-       unsigned long map_size = PAGE_SIZE;
-       unsigned long size, order;
-
-       if (!has_svnapot())
-               return map_size;
-
-       for_each_napot_order_rev(order) {
-               if (napot_cont_shift(order) > max_page_shift)
-                       continue;
-
-               size = napot_cont_size(order);
-               if (end - addr < size)
-                       continue;
-
-               if (!IS_ALIGNED(addr, size))
-                       continue;
-
-               if (!IS_ALIGNED(PFN_PHYS(pfn), size))
-                       continue;
-
-               map_size = size;
-               break;
-       }
-
-       return map_size;
-}
-
-#define arch_vmap_pte_supported_shift arch_vmap_pte_supported_shift
-static inline int arch_vmap_pte_supported_shift(unsigned long size)
-{
-       int shift = PAGE_SHIFT;
-       unsigned long order;
-
-       if (!has_svnapot())
-               return shift;
-
-       WARN_ON_ONCE(size >= PMD_SIZE);
-
-       for_each_napot_order_rev(order) {
-               if (napot_cont_size(order) > size)
-                       continue;
-
-               if (!IS_ALIGNED(size, napot_cont_size(order)))
-                       continue;
-
-               shift = napot_cont_shift(order);
-               break;
-       }
-
-       return shift;
-}
-
-#endif /* CONFIG_RISCV_ISA_SVNAPOT */
-#endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */
 #endif /* _ASM_RISCV_VMALLOC_H */
index d6b7a5b958742c443bce93e067434128c1cff7e4..7499e88a947c086c5f569e98a899f50d098a8335 100644 (file)
@@ -139,6 +139,33 @@ enum KVM_RISCV_ISA_EXT_ID {
        KVM_RISCV_ISA_EXT_ZIHPM,
        KVM_RISCV_ISA_EXT_SMSTATEEN,
        KVM_RISCV_ISA_EXT_ZICOND,
+       KVM_RISCV_ISA_EXT_ZBC,
+       KVM_RISCV_ISA_EXT_ZBKB,
+       KVM_RISCV_ISA_EXT_ZBKC,
+       KVM_RISCV_ISA_EXT_ZBKX,
+       KVM_RISCV_ISA_EXT_ZKND,
+       KVM_RISCV_ISA_EXT_ZKNE,
+       KVM_RISCV_ISA_EXT_ZKNH,
+       KVM_RISCV_ISA_EXT_ZKR,
+       KVM_RISCV_ISA_EXT_ZKSED,
+       KVM_RISCV_ISA_EXT_ZKSH,
+       KVM_RISCV_ISA_EXT_ZKT,
+       KVM_RISCV_ISA_EXT_ZVBB,
+       KVM_RISCV_ISA_EXT_ZVBC,
+       KVM_RISCV_ISA_EXT_ZVKB,
+       KVM_RISCV_ISA_EXT_ZVKG,
+       KVM_RISCV_ISA_EXT_ZVKNED,
+       KVM_RISCV_ISA_EXT_ZVKNHA,
+       KVM_RISCV_ISA_EXT_ZVKNHB,
+       KVM_RISCV_ISA_EXT_ZVKSED,
+       KVM_RISCV_ISA_EXT_ZVKSH,
+       KVM_RISCV_ISA_EXT_ZVKT,
+       KVM_RISCV_ISA_EXT_ZFH,
+       KVM_RISCV_ISA_EXT_ZFHMIN,
+       KVM_RISCV_ISA_EXT_ZIHINTNTL,
+       KVM_RISCV_ISA_EXT_ZVFH,
+       KVM_RISCV_ISA_EXT_ZVFHMIN,
+       KVM_RISCV_ISA_EXT_ZFA,
        KVM_RISCV_ISA_EXT_MAX,
 };
 
index f71910718053d841a361fd97e7d62da4f86bebcf..604d6bf7e47672e9b01902f6fa497aeb4e102ee5 100644 (file)
@@ -7,6 +7,7 @@ ifdef CONFIG_FTRACE
 CFLAGS_REMOVE_ftrace.o = $(CC_FLAGS_FTRACE)
 CFLAGS_REMOVE_patch.o  = $(CC_FLAGS_FTRACE)
 CFLAGS_REMOVE_sbi.o    = $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_return_address.o = $(CC_FLAGS_FTRACE)
 endif
 CFLAGS_syscall_table.o += $(call cc-option,-Wno-override-init,)
 CFLAGS_compat_syscall_table.o += $(call cc-option,-Wno-override-init,)
@@ -46,6 +47,7 @@ obj-y += irq.o
 obj-y  += process.o
 obj-y  += ptrace.o
 obj-y  += reset.o
+obj-y  += return_address.o
 obj-y  += setup.o
 obj-y  += signal.o
 obj-y  += syscall_table.o
index 89920f84d0a34385471e9afbf9c26d287cbbd838..79a5a35fab964d3b54db97b5504f45f68dface11 100644 (file)
@@ -24,6 +24,7 @@
 #include <asm/hwprobe.h>
 #include <asm/patch.h>
 #include <asm/processor.h>
+#include <asm/sbi.h>
 #include <asm/vector.h>
 
 #include "copy-unaligned.h"
@@ -201,6 +202,16 @@ static const unsigned int riscv_zvbb_exts[] = {
        RISCV_ISA_EXT_ZVKB
 };
 
+/*
+ * While the [ms]envcfg CSRs were not defined until version 1.12 of the RISC-V
+ * privileged ISA, their existence is implied by any extension that specifies
+ * [ms]envcfg bit(s). Hence, we define a custom ISA extension for the existence
+ * of the CSR and treat it as a subset of those other extensions.
+ */
+static const unsigned int riscv_xlinuxenvcfg_exts[] = {
+       RISCV_ISA_EXT_XLINUXENVCFG
+};
+
 /*
  * The canonical order of ISA extension names in the ISA string is defined in
  * chapter 27 of the unprivileged specification.
@@ -250,8 +261,8 @@ const struct riscv_isa_ext_data riscv_isa_ext[] = {
        __RISCV_ISA_EXT_DATA(c, RISCV_ISA_EXT_c),
        __RISCV_ISA_EXT_DATA(v, RISCV_ISA_EXT_v),
        __RISCV_ISA_EXT_DATA(h, RISCV_ISA_EXT_h),
-       __RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM),
-       __RISCV_ISA_EXT_DATA(zicboz, RISCV_ISA_EXT_ZICBOZ),
+       __RISCV_ISA_EXT_SUPERSET(zicbom, RISCV_ISA_EXT_ZICBOM, riscv_xlinuxenvcfg_exts),
+       __RISCV_ISA_EXT_SUPERSET(zicboz, RISCV_ISA_EXT_ZICBOZ, riscv_xlinuxenvcfg_exts),
        __RISCV_ISA_EXT_DATA(zicntr, RISCV_ISA_EXT_ZICNTR),
        __RISCV_ISA_EXT_DATA(zicond, RISCV_ISA_EXT_ZICOND),
        __RISCV_ISA_EXT_DATA(zicsr, RISCV_ISA_EXT_ZICSR),
@@ -538,6 +549,20 @@ static void __init riscv_fill_hwcap_from_isa_string(unsigned long *isa2hwcap)
                        set_bit(RISCV_ISA_EXT_ZIHPM, isainfo->isa);
                }
 
+               /*
+                * "V" in ISA strings is ambiguous in practice: it should mean
+                * just the standard V-1.0 but vendors aren't well behaved.
+                * Many vendors of T-Head CPU cores that implement the 0.7.1
+                * version of the vector specification put "v" into their DTs,
+                * whereas cores implementing the ratified spec report a
+                * non-zero marchid.
+                */
+               if (acpi_disabled && riscv_cached_mvendorid(cpu) == THEAD_VENDOR_ID &&
+                   riscv_cached_marchid(cpu) == 0x0) {
+                       this_hwcap &= ~isa2hwcap[RISCV_ISA_EXT_v];
+                       clear_bit(RISCV_ISA_EXT_v, isainfo->isa);
+               }
+
                /*
                 * All "okay" harts should have the same ISA. Set HWCAP based on
                 * the common capabilities of every "okay" hart, in case they don't
@@ -950,7 +975,7 @@ arch_initcall(check_unaligned_access_all_cpus);
 void riscv_user_isa_enable(void)
 {
        if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_ZICBOZ))
-               csr_set(CSR_SENVCFG, ENVCFG_CBZE);
+               csr_set(CSR_ENVCFG, ENVCFG_CBZE);
 }
 
 #ifdef CONFIG_RISCV_ALTERNATIVE
index 8e114f5930cec6148b98f7f81abd72a798caf8e8..0d6225fd3194e14ed71ac9afc716b2e81168e9a5 100644 (file)
@@ -41,7 +41,7 @@ static int __init parse_no_stealacc(char *arg)
 
 early_param("no-steal-acc", parse_no_stealacc);
 
-DEFINE_PER_CPU(struct sbi_sta_struct, steal_time) __aligned(64);
+static DEFINE_PER_CPU(struct sbi_sta_struct, steal_time) __aligned(64);
 
 static bool __init has_pv_steal_clock(void)
 {
@@ -91,8 +91,8 @@ static int pv_time_cpu_down_prepare(unsigned int cpu)
 static u64 pv_time_steal_clock(int cpu)
 {
        struct sbi_sta_struct *st = per_cpu_ptr(&steal_time, cpu);
-       u32 sequence;
-       u64 steal;
+       __le32 sequence;
+       __le64 steal;
 
        /*
         * Check the sequence field before and after reading the steal
diff --git a/arch/riscv/kernel/return_address.c b/arch/riscv/kernel/return_address.c
new file mode 100644 (file)
index 0000000..c8115ec
--- /dev/null
@@ -0,0 +1,48 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * This code comes from arch/arm64/kernel/return_address.c
+ *
+ * Copyright (C) 2023 SiFive.
+ */
+
+#include <linux/export.h>
+#include <linux/kprobes.h>
+#include <linux/stacktrace.h>
+
+struct return_address_data {
+       unsigned int level;
+       void *addr;
+};
+
+static bool save_return_addr(void *d, unsigned long pc)
+{
+       struct return_address_data *data = d;
+
+       if (!data->level) {
+               data->addr = (void *)pc;
+               return false;
+       }
+
+       --data->level;
+
+       return true;
+}
+NOKPROBE_SYMBOL(save_return_addr);
+
+noinline void *return_address(unsigned int level)
+{
+       struct return_address_data data;
+
+       data.level = level + 3;
+       data.addr = NULL;
+
+       arch_stack_walk(save_return_addr, &data, current, NULL);
+
+       if (!data.level)
+               return data.addr;
+
+       return NULL;
+}
+EXPORT_SYMBOL_GPL(return_address);
+NOKPROBE_SYMBOL(return_address);
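
This helper is what backs the ftrace_return_address() define added to asm/ftrace.h above, and through it the CALLER_ADDRn macros in <linux/ftrace.h>; the `level + 3` compensates for the extra frames the unwinder itself contributes, mirroring the arm64 original. A hypothetical call site:

    /* Sketch: report the caller one frame up from the current function. */
    pr_info("%s called from %pS\n", __func__, (void *)CALLER_ADDR1);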
index 239509367e4233336806c19da964a06537d5a9b5..299795341e8a2207dc922373511e31118bbd0f8b 100644 (file)
@@ -15,6 +15,8 @@
 void suspend_save_csrs(struct suspend_context *context)
 {
        context->scratch = csr_read(CSR_SCRATCH);
+       if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_XLINUXENVCFG))
+               context->envcfg = csr_read(CSR_ENVCFG);
        context->tvec = csr_read(CSR_TVEC);
        context->ie = csr_read(CSR_IE);
 
@@ -36,6 +38,8 @@ void suspend_save_csrs(struct suspend_context *context)
 void suspend_restore_csrs(struct suspend_context *context)
 {
        csr_write(CSR_SCRATCH, context->scratch);
+       if (riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_XLINUXENVCFG))
+               csr_write(CSR_ENVCFG, context->envcfg);
        csr_write(CSR_TVEC, context->tvec);
        csr_write(CSR_IE, context->ie);
 
index fc34557f5356e27902a2f83c27eb37f1237c9b95..5f7355e960084b4a4a17ea294e92352f7a70da60 100644 (file)
@@ -42,15 +42,42 @@ static const unsigned long kvm_isa_ext_arr[] = {
        KVM_ISA_EXT_ARR(SVPBMT),
        KVM_ISA_EXT_ARR(ZBA),
        KVM_ISA_EXT_ARR(ZBB),
+       KVM_ISA_EXT_ARR(ZBC),
+       KVM_ISA_EXT_ARR(ZBKB),
+       KVM_ISA_EXT_ARR(ZBKC),
+       KVM_ISA_EXT_ARR(ZBKX),
        KVM_ISA_EXT_ARR(ZBS),
+       KVM_ISA_EXT_ARR(ZFA),
+       KVM_ISA_EXT_ARR(ZFH),
+       KVM_ISA_EXT_ARR(ZFHMIN),
        KVM_ISA_EXT_ARR(ZICBOM),
        KVM_ISA_EXT_ARR(ZICBOZ),
        KVM_ISA_EXT_ARR(ZICNTR),
        KVM_ISA_EXT_ARR(ZICOND),
        KVM_ISA_EXT_ARR(ZICSR),
        KVM_ISA_EXT_ARR(ZIFENCEI),
+       KVM_ISA_EXT_ARR(ZIHINTNTL),
        KVM_ISA_EXT_ARR(ZIHINTPAUSE),
        KVM_ISA_EXT_ARR(ZIHPM),
+       KVM_ISA_EXT_ARR(ZKND),
+       KVM_ISA_EXT_ARR(ZKNE),
+       KVM_ISA_EXT_ARR(ZKNH),
+       KVM_ISA_EXT_ARR(ZKR),
+       KVM_ISA_EXT_ARR(ZKSED),
+       KVM_ISA_EXT_ARR(ZKSH),
+       KVM_ISA_EXT_ARR(ZKT),
+       KVM_ISA_EXT_ARR(ZVBB),
+       KVM_ISA_EXT_ARR(ZVBC),
+       KVM_ISA_EXT_ARR(ZVFH),
+       KVM_ISA_EXT_ARR(ZVFHMIN),
+       KVM_ISA_EXT_ARR(ZVKB),
+       KVM_ISA_EXT_ARR(ZVKG),
+       KVM_ISA_EXT_ARR(ZVKNED),
+       KVM_ISA_EXT_ARR(ZVKNHA),
+       KVM_ISA_EXT_ARR(ZVKNHB),
+       KVM_ISA_EXT_ARR(ZVKSED),
+       KVM_ISA_EXT_ARR(ZVKSH),
+       KVM_ISA_EXT_ARR(ZVKT),
 };
 
 static unsigned long kvm_riscv_vcpu_base2isa_ext(unsigned long base_ext)
@@ -92,13 +119,40 @@ static bool kvm_riscv_vcpu_isa_disable_allowed(unsigned long ext)
        case KVM_RISCV_ISA_EXT_SVNAPOT:
        case KVM_RISCV_ISA_EXT_ZBA:
        case KVM_RISCV_ISA_EXT_ZBB:
+       case KVM_RISCV_ISA_EXT_ZBC:
+       case KVM_RISCV_ISA_EXT_ZBKB:
+       case KVM_RISCV_ISA_EXT_ZBKC:
+       case KVM_RISCV_ISA_EXT_ZBKX:
        case KVM_RISCV_ISA_EXT_ZBS:
+       case KVM_RISCV_ISA_EXT_ZFA:
+       case KVM_RISCV_ISA_EXT_ZFH:
+       case KVM_RISCV_ISA_EXT_ZFHMIN:
        case KVM_RISCV_ISA_EXT_ZICNTR:
        case KVM_RISCV_ISA_EXT_ZICOND:
        case KVM_RISCV_ISA_EXT_ZICSR:
        case KVM_RISCV_ISA_EXT_ZIFENCEI:
+       case KVM_RISCV_ISA_EXT_ZIHINTNTL:
        case KVM_RISCV_ISA_EXT_ZIHINTPAUSE:
        case KVM_RISCV_ISA_EXT_ZIHPM:
+       case KVM_RISCV_ISA_EXT_ZKND:
+       case KVM_RISCV_ISA_EXT_ZKNE:
+       case KVM_RISCV_ISA_EXT_ZKNH:
+       case KVM_RISCV_ISA_EXT_ZKR:
+       case KVM_RISCV_ISA_EXT_ZKSED:
+       case KVM_RISCV_ISA_EXT_ZKSH:
+       case KVM_RISCV_ISA_EXT_ZKT:
+       case KVM_RISCV_ISA_EXT_ZVBB:
+       case KVM_RISCV_ISA_EXT_ZVBC:
+       case KVM_RISCV_ISA_EXT_ZVFH:
+       case KVM_RISCV_ISA_EXT_ZVFHMIN:
+       case KVM_RISCV_ISA_EXT_ZVKB:
+       case KVM_RISCV_ISA_EXT_ZVKG:
+       case KVM_RISCV_ISA_EXT_ZVKNED:
+       case KVM_RISCV_ISA_EXT_ZVKNHA:
+       case KVM_RISCV_ISA_EXT_ZVKNHB:
+       case KVM_RISCV_ISA_EXT_ZVKSED:
+       case KVM_RISCV_ISA_EXT_ZVKSH:
+       case KVM_RISCV_ISA_EXT_ZVKT:
                return false;
        /* Extensions which can be disabled using Smstateen */
        case KVM_RISCV_ISA_EXT_SSAIA:
index 01f09fe8c3b020968be3f623097c9a48ab958087..d8cf9ca28c616e9d4073465a71dcc7479be3d35a 100644 (file)
@@ -26,8 +26,12 @@ void kvm_riscv_vcpu_record_steal_time(struct kvm_vcpu *vcpu)
 {
        gpa_t shmem = vcpu->arch.sta.shmem;
        u64 last_steal = vcpu->arch.sta.last_steal;
-       u32 *sequence_ptr, sequence;
-       u64 *steal_ptr, steal;
+       __le32 __user *sequence_ptr;
+       __le64 __user *steal_ptr;
+       __le32 sequence_le;
+       __le64 steal_le;
+       u32 sequence;
+       u64 steal;
        unsigned long hva;
        gfn_t gfn;
 
@@ -47,22 +51,22 @@ void kvm_riscv_vcpu_record_steal_time(struct kvm_vcpu *vcpu)
                return;
        }
 
-       sequence_ptr = (u32 *)(hva + offset_in_page(shmem) +
+       sequence_ptr = (__le32 __user *)(hva + offset_in_page(shmem) +
                               offsetof(struct sbi_sta_struct, sequence));
-       steal_ptr = (u64 *)(hva + offset_in_page(shmem) +
+       steal_ptr = (__le64 __user *)(hva + offset_in_page(shmem) +
                            offsetof(struct sbi_sta_struct, steal));
 
-       if (WARN_ON(get_user(sequence, sequence_ptr)))
+       if (WARN_ON(get_user(sequence_le, sequence_ptr)))
                return;
 
-       sequence = le32_to_cpu(sequence);
+       sequence = le32_to_cpu(sequence_le);
        sequence += 1;
 
        if (WARN_ON(put_user(cpu_to_le32(sequence), sequence_ptr)))
                return;
 
-       if (!WARN_ON(get_user(steal, steal_ptr))) {
-               steal = le64_to_cpu(steal);
+       if (!WARN_ON(get_user(steal_le, steal_ptr))) {
+               steal = le64_to_cpu(steal_le);
                vcpu->arch.sta.last_steal = READ_ONCE(current->sched_info.run_delay);
                steal += vcpu->arch.sta.last_steal - last_steal;
                WARN_ON(put_user(cpu_to_le64(steal), steal_ptr));
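
The shared steal-time record is defined little-endian by the SBI STA extension, hence the explicit __le32/__le64 types on both the guest side (paravirt.c above) and the host side here; the sequence counter exists so the guest can detect a record caught mid-update. A minimal sketch of the guest-side read loop, assuming the usual even/odd convention (odd means an update is in flight):

    /* Sketch: retry until a stable, even sequence number is observed. */
    static u64 read_steal_time(struct sbi_sta_struct *st)
    {
            u32 seq;
            u64 steal;

            do {
                    seq = le32_to_cpu(READ_ONCE(st->sequence));
                    steal = le64_to_cpu(READ_ONCE(st->steal));
            } while ((seq & 1) || seq != le32_to_cpu(READ_ONCE(st->sequence)));

            return steal;
    }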
index af3df5274ccbae0118488080040f45881a3e025a..74af3ab520b6d433836930937dd90ffa2e672339 100644 (file)
@@ -53,7 +53,7 @@ __sum16 csum_ipv6_magic(const struct in6_addr *saddr,
                 * support, so nop when Zbb is available and jump when Zbb is
                 * not available.
                 */
-               asm_volatile_goto(ALTERNATIVE("j %l[no_zbb]", "nop", 0,
+               asm goto(ALTERNATIVE("j %l[no_zbb]", "nop", 0,
                                              RISCV_ISA_EXT_ZBB, 1)
                                  :
                                  :
@@ -170,7 +170,7 @@ do_csum_with_alignment(const unsigned char *buff, int len)
                 * support, so nop when Zbb is available and jump when Zbb is
                 * not available.
                 */
-               asm_volatile_goto(ALTERNATIVE("j %l[no_zbb]", "nop", 0,
+               asm goto(ALTERNATIVE("j %l[no_zbb]", "nop", 0,
                                              RISCV_ISA_EXT_ZBB, 1)
                                  :
                                  :
@@ -178,7 +178,7 @@ do_csum_with_alignment(const unsigned char *buff, int len)
                                  : no_zbb);
 
 #ifdef CONFIG_32BIT
-               asm_volatile_goto(".option push                 \n\
+               asm_goto_output(".option push                   \n\
                .option arch,+zbb                               \n\
                        rori    %[fold_temp], %[csum], 16       \n\
                        andi    %[offset], %[offset], 1         \n\
@@ -193,7 +193,7 @@ do_csum_with_alignment(const unsigned char *buff, int len)
 
                return (unsigned short)csum;
 #else /* !CONFIG_32BIT */
-               asm_volatile_goto(".option push                 \n\
+               asm_goto_output(".option push                   \n\
                .option arch,+zbb                               \n\
                        rori    %[fold_temp], %[csum], 32       \n\
                        add     %[csum], %[fold_temp], %[csum]  \n\
@@ -257,7 +257,7 @@ do_csum_no_alignment(const unsigned char *buff, int len)
                 * support, so nop when Zbb is available and jump when Zbb is
                 * not available.
                 */
-               asm_volatile_goto(ALTERNATIVE("j %l[no_zbb]", "nop", 0,
+               asm goto(ALTERNATIVE("j %l[no_zbb]", "nop", 0,
                                              RISCV_ISA_EXT_ZBB, 1)
                                  :
                                  :
index 431596c0e20e04e1ad5a2422fa989f27cdfdc7c4..5ef2a6891158a6d59de8f36b4f4d98cf3ad6eb2a 100644 (file)
@@ -125,6 +125,26 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
        return pte;
 }
 
+unsigned long hugetlb_mask_last_page(struct hstate *h)
+{
+       unsigned long hp_size = huge_page_size(h);
+
+       switch (hp_size) {
+#ifndef __PAGETABLE_PMD_FOLDED
+       case PUD_SIZE:
+               return P4D_SIZE - PUD_SIZE;
+#endif
+       case PMD_SIZE:
+               return PUD_SIZE - PMD_SIZE;
+       case napot_cont_size(NAPOT_CONT64KB_ORDER):
+               return PMD_SIZE - napot_cont_size(NAPOT_CONT64KB_ORDER);
+       default:
+               break;
+       }
+
+       return 0UL;
+}
+
 static pte_t get_clear_contig(struct mm_struct *mm,
                              unsigned long addr,
                              pte_t *ptep,
@@ -177,13 +197,36 @@ pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags)
        return entry;
 }
 
+static void clear_flush(struct mm_struct *mm,
+                       unsigned long addr,
+                       pte_t *ptep,
+                       unsigned long pgsize,
+                       unsigned long ncontig)
+{
+       struct vm_area_struct vma = TLB_FLUSH_VMA(mm, 0);
+       unsigned long i, saddr = addr;
+
+       for (i = 0; i < ncontig; i++, addr += pgsize, ptep++)
+               ptep_get_and_clear(mm, addr, ptep);
+
+       flush_tlb_range(&vma, saddr, addr);
+}
+
+/*
+ * When dealing with NAPOT mappings, the privileged specification indicates that
+ * "if an update needs to be made, the OS generally should first mark all of the
+ * PTEs invalid, then issue SFENCE.VMA instruction(s) covering all 4 KiB regions
+ * within the range, [...] then update the PTE(s), as described in Section
+ * 4.2.1.". That's the equivalent of the Break-Before-Make approach used by
+ * arm64.
+ */
 void set_huge_pte_at(struct mm_struct *mm,
                     unsigned long addr,
                     pte_t *ptep,
                     pte_t pte,
                     unsigned long sz)
 {
-       unsigned long hugepage_shift;
+       unsigned long hugepage_shift, pgsize;
        int i, pte_num;
 
        if (sz >= PGDIR_SIZE)
@@ -198,7 +241,22 @@ void set_huge_pte_at(struct mm_struct *mm,
                hugepage_shift = PAGE_SHIFT;
 
        pte_num = sz >> hugepage_shift;
-       for (i = 0; i < pte_num; i++, ptep++, addr += (1 << hugepage_shift))
+       pgsize = 1 << hugepage_shift;
+
+       if (!pte_present(pte)) {
+               for (i = 0; i < pte_num; i++, ptep++, addr += pgsize)
+                       set_ptes(mm, addr, ptep, pte, 1);
+               return;
+       }
+
+       if (!pte_napot(pte)) {
+               set_ptes(mm, addr, ptep, pte, 1);
+               return;
+       }
+
+       clear_flush(mm, addr, ptep, pgsize, pte_num);
+
+       for (i = 0; i < pte_num; i++, ptep++, addr += pgsize)
                set_pte_at(mm, addr, ptep, pte);
 }
 
@@ -306,7 +364,7 @@ void huge_pte_clear(struct mm_struct *mm,
                pte_clear(mm, addr, ptep);
 }
 
-static __init bool is_napot_size(unsigned long size)
+static bool is_napot_size(unsigned long size)
 {
        unsigned long order;
 
@@ -334,7 +392,7 @@ arch_initcall(napot_hugetlbpages_init);
 
 #else
 
-static __init bool is_napot_size(unsigned long size)
+static bool is_napot_size(unsigned long size)
 {
        return false;
 }
@@ -351,7 +409,7 @@ int pmd_huge(pmd_t pmd)
        return pmd_leaf(pmd);
 }
 
-bool __init arch_hugetlb_valid_size(unsigned long size)
+static bool __hugetlb_valid_size(unsigned long size)
 {
        if (size == HPAGE_SIZE)
                return true;
@@ -363,6 +421,18 @@ bool __init arch_hugetlb_valid_size(unsigned long size)
                return false;
 }
 
+bool __init arch_hugetlb_valid_size(unsigned long size)
+{
+       return __hugetlb_valid_size(size);
+}
+
+#ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
+bool arch_hugetlb_migration_supported(struct hstate *h)
+{
+       return __hugetlb_valid_size(huge_page_size(h));
+}
+#endif
+
 #ifdef CONFIG_CONTIG_ALLOC
 static __init int gigantic_pages_init(void)
 {
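
hugetlb_mask_last_page(), added earlier in this file's hunks, tells the generic hugetlb walker how far it may advance after handling the last entry at a given level. Worked out for the sizes above (Sv39 values assumed):

    /* For 2 MiB (PMD_SIZE) pages:
     *     PUD_SIZE - PMD_SIZE = 0x40000000 - 0x200000 = 0x3fe00000
     * so after the final PMD slot the walker skips to the next 1 GiB
     * boundary; 64 KiB NAPOT mappings likewise skip to the next 2 MiB.
     */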
index 32cad6a65ccd23431d63097a0906ca5b8de485f8..fa34cf55037bd37ad0b8d3bb3b67f6f91d243f58 100644 (file)
@@ -1385,6 +1385,10 @@ void __init misc_mem_init(void)
        early_memtest(min_low_pfn << PAGE_SHIFT, max_low_pfn << PAGE_SHIFT);
        arch_numa_init();
        sparse_init();
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+       /* The entire VMEMMAP region has been populated. Flush TLB for this region */
+       local_flush_tlb_kernel_range(VMEMMAP_START, VMEMMAP_END);
+#endif
        zone_sizes_init();
        arch_reserve_crashkernel();
        memblock_dump_all();
index 8d12b26f5ac37b659687981c2046f3d5b590753c..893566e004b73fcf9a8dbc94f766e59cd00f1bb1 100644 (file)
@@ -66,9 +66,10 @@ static inline void local_flush_tlb_range_asid(unsigned long start,
                local_flush_tlb_range_threshold_asid(start, size, stride, asid);
 }
 
+/* Flush a range of kernel pages without broadcasting */
 void local_flush_tlb_kernel_range(unsigned long start, unsigned long end)
 {
-       local_flush_tlb_range_asid(start, end, PAGE_SIZE, FLUSH_TLB_NO_ASID);
+       local_flush_tlb_range_asid(start, end - start, PAGE_SIZE, FLUSH_TLB_NO_ASID);
 }
 
 static void __ipi_flush_tlb_all(void *info)
@@ -233,4 +234,5 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 {
        __flush_tlb_range(&batch->cpumask, FLUSH_TLB_NO_ASID, 0,
                          FLUSH_TLB_MAX_SIZE, PAGE_SIZE);
+       cpumask_clear(&batch->cpumask);
 }
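
The local_flush_tlb_kernel_range() change above is a units fix: local_flush_tlb_range_asid() takes (start, size), and passing `end` as the size made every request look enormous. With assumed values:

    /* start = 0xffffffd000000000, end = start + 2 * PAGE_SIZE (4 KiB pages)
     *
     * before: size = end          -> far past the flush threshold, so the
     *                                whole TLB was flushed every time
     * after:  size = end - start  -> 8 KiB, i.e. two page-sized flushes
     */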
index 58dc64dd94a82c8d8cc42a71ec69954dc548934a..719a97e7edb2c12277a8e08dd214e0eb03be094a 100644 (file)
@@ -795,6 +795,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
        struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY];
        struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT];
        struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN];
+       bool is_struct_ops = flags & BPF_TRAMP_F_INDIRECT;
        void *orig_call = func_addr;
        bool save_ret;
        u32 insn;
@@ -878,7 +879,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
 
        stack_size = round_up(stack_size, 16);
 
-       if (func_addr) {
+       if (!is_struct_ops) {
                /* For the trampoline called from function entry,
                 * the frames of both the traced function and the
                 * trampoline need to be considered.
@@ -998,7 +999,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im,
 
        emit_ld(RV_REG_S1, -sreg_off, RV_REG_FP, ctx);
 
-       if (func_addr) {
+       if (!is_struct_ops) {
                /* trampoline called from function entry */
                emit_ld(RV_REG_T0, stack_size - 8, RV_REG_SP, ctx);
                emit_ld(RV_REG_FP, stack_size - 16, RV_REG_SP, ctx);
diff --git a/arch/s390/configs/compat.config b/arch/s390/configs/compat.config
new file mode 100644 (file)
index 0000000..6fd0514
--- /dev/null
@@ -0,0 +1,3 @@
+# Help: Enable compat support
+CONFIG_COMPAT=y
+CONFIG_COMPAT_32BIT_TIME=y
index cae2dd34fbb49d16ee020e72fb669010dca832f8..06756bad5e30ffbe8e4d0153517f180cbd82a1cd 100644 (file)
@@ -118,7 +118,6 @@ CONFIG_UNIX=y
 CONFIG_UNIX_DIAG=m
 CONFIG_XFRM_USER=m
 CONFIG_NET_KEY=m
-CONFIG_SMC=m
 CONFIG_SMC_DIAG=m
 CONFIG_INET=y
 CONFIG_IP_MULTICAST=y
@@ -374,6 +373,7 @@ CONFIG_NET_ACT_POLICE=m
 CONFIG_NET_ACT_GACT=m
 CONFIG_GACT_PROB=y
 CONFIG_NET_ACT_MIRRED=m
+CONFIG_NET_ACT_IPT=m
 CONFIG_NET_ACT_NAT=m
 CONFIG_NET_ACT_PEDIT=m
 CONFIG_NET_ACT_SIMP=m
@@ -436,9 +436,6 @@ CONFIG_SCSI_DH_ALUA=m
 CONFIG_MD=y
 CONFIG_BLK_DEV_MD=y
 # CONFIG_MD_BITMAP_FILE is not set
-CONFIG_MD_LINEAR=m
-CONFIG_MD_MULTIPATH=m
-CONFIG_MD_FAULTY=m
 CONFIG_MD_CLUSTER=m
 CONFIG_BCACHE=m
 CONFIG_BLK_DEV_DM=y
@@ -637,7 +634,6 @@ CONFIG_FUSE_FS=y
 CONFIG_CUSE=m
 CONFIG_VIRTIO_FS=m
 CONFIG_OVERLAY_FS=m
-CONFIG_NETFS_SUPPORT=m
 CONFIG_NETFS_STATS=y
 CONFIG_FSCACHE=y
 CONFIG_CACHEFILES=m
@@ -709,7 +705,6 @@ CONFIG_IMA_DEFAULT_HASH_SHA256=y
 CONFIG_IMA_WRITE_POLICY=y
 CONFIG_IMA_APPRAISE=y
 CONFIG_LSM="yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor"
-CONFIG_INIT_STACK_NONE=y
 CONFIG_BUG_ON_DATA_CORRUPTION=y
 CONFIG_CRYPTO_USER=m
 # CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set
@@ -739,7 +734,6 @@ CONFIG_CRYPTO_TEA=m
 CONFIG_CRYPTO_TWOFISH=m
 CONFIG_CRYPTO_ADIANTUM=m
 CONFIG_CRYPTO_ARC4=m
-CONFIG_CRYPTO_CFB=m
 CONFIG_CRYPTO_HCTR2=m
 CONFIG_CRYPTO_KEYWRAP=m
 CONFIG_CRYPTO_LRW=m
@@ -886,4 +880,3 @@ CONFIG_ATOMIC64_SELFTEST=y
 CONFIG_STRING_SELFTEST=y
 CONFIG_TEST_BITOPS=m
 CONFIG_TEST_BPF=m
-CONFIG_TEST_LIVEPATCH=m
index 42b988873e5443df15b054d78610697fdf769293..d33f814f78b2c115f31bdbb9bfbe6417258501f0 100644 (file)
@@ -109,7 +109,6 @@ CONFIG_UNIX=y
 CONFIG_UNIX_DIAG=m
 CONFIG_XFRM_USER=m
 CONFIG_NET_KEY=m
-CONFIG_SMC=m
 CONFIG_SMC_DIAG=m
 CONFIG_INET=y
 CONFIG_IP_MULTICAST=y
@@ -364,6 +363,7 @@ CONFIG_NET_ACT_POLICE=m
 CONFIG_NET_ACT_GACT=m
 CONFIG_GACT_PROB=y
 CONFIG_NET_ACT_MIRRED=m
+CONFIG_NET_ACT_IPT=m
 CONFIG_NET_ACT_NAT=m
 CONFIG_NET_ACT_PEDIT=m
 CONFIG_NET_ACT_SIMP=m
@@ -426,9 +426,6 @@ CONFIG_SCSI_DH_ALUA=m
 CONFIG_MD=y
 CONFIG_BLK_DEV_MD=y
 # CONFIG_MD_BITMAP_FILE is not set
-CONFIG_MD_LINEAR=m
-CONFIG_MD_MULTIPATH=m
-CONFIG_MD_FAULTY=m
 CONFIG_MD_CLUSTER=m
 CONFIG_BCACHE=m
 CONFIG_BLK_DEV_DM=y
@@ -622,7 +619,6 @@ CONFIG_FUSE_FS=y
 CONFIG_CUSE=m
 CONFIG_VIRTIO_FS=m
 CONFIG_OVERLAY_FS=m
-CONFIG_NETFS_SUPPORT=m
 CONFIG_NETFS_STATS=y
 CONFIG_FSCACHE=y
 CONFIG_CACHEFILES=m
@@ -693,7 +689,6 @@ CONFIG_IMA_DEFAULT_HASH_SHA256=y
 CONFIG_IMA_WRITE_POLICY=y
 CONFIG_IMA_APPRAISE=y
 CONFIG_LSM="yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor"
-CONFIG_INIT_STACK_NONE=y
 CONFIG_BUG_ON_DATA_CORRUPTION=y
 CONFIG_CRYPTO_FIPS=y
 CONFIG_CRYPTO_USER=m
@@ -724,11 +719,9 @@ CONFIG_CRYPTO_TEA=m
 CONFIG_CRYPTO_TWOFISH=m
 CONFIG_CRYPTO_ADIANTUM=m
 CONFIG_CRYPTO_ARC4=m
-CONFIG_CRYPTO_CFB=m
 CONFIG_CRYPTO_HCTR2=m
 CONFIG_CRYPTO_KEYWRAP=m
 CONFIG_CRYPTO_LRW=m
-CONFIG_CRYPTO_OFB=m
 CONFIG_CRYPTO_PCBC=m
 CONFIG_CRYPTO_AEGIS128=m
 CONFIG_CRYPTO_CHACHA20POLY1305=m
@@ -815,4 +808,3 @@ CONFIG_KPROBES_SANITY_TEST=m
 CONFIG_PERCPU_TEST=m
 CONFIG_ATOMIC64_SELFTEST=y
 CONFIG_TEST_BPF=m
-CONFIG_TEST_LIVEPATCH=m
index 30d2a16876650e9c3ea32997f771131e6372e2fc..c51f3ec4eb28ab189b7d27d12ca28b98261178e2 100644 (file)
@@ -8,6 +8,7 @@ CONFIG_BPF_SYSCALL=y
 # CONFIG_NET_NS is not set
 CONFIG_BLK_DEV_INITRD=y
 CONFIG_CC_OPTIMIZE_FOR_SIZE=y
+CONFIG_KEXEC=y
 CONFIG_CRASH_DUMP=y
 CONFIG_MARCH_Z13=y
 CONFIG_NR_CPUS=2
@@ -64,7 +65,6 @@ CONFIG_ZFCP=y
 # CONFIG_MISC_FILESYSTEMS is not set
 # CONFIG_NETWORK_FILESYSTEMS is not set
 CONFIG_LSM="yama,loadpin,safesetid,integrity"
-CONFIG_INIT_STACK_NONE=y
 # CONFIG_ZLIB_DFLTCC is not set
 CONFIG_XZ_DEC_MICROLZMA=y
 CONFIG_PRINTK_TIME=y
index 895f774bbcc55353cc7a3d302b009796a799446f..bf78cf381dfcdac92a170b754328acd16846eb2e 100644 (file)
@@ -25,7 +25,7 @@
  */
 static __always_inline bool arch_static_branch(struct static_key *key, bool branch)
 {
-       asm_volatile_goto("0:   brcl 0,%l[label]\n"
+       asm goto("0:    brcl 0,%l[label]\n"
                          ".pushsection __jump_table,\"aw\"\n"
                          ".balign      8\n"
                          ".long        0b-.,%l[label]-.\n"
@@ -39,7 +39,7 @@ label:
 
 static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch)
 {
-       asm_volatile_goto("0:   brcl 15,%l[label]\n"
+       asm goto("0:    brcl 15,%l[label]\n"
                          ".pushsection __jump_table,\"aw\"\n"
                          ".balign      8\n"
                          ".long        0b-.,%l[label]-.\n"
index 621a17fd1a1bb52fd7875a134a1acac25f004209..f875a404a0a02555d5875128fafedcfe54d5b4d6 100644 (file)
@@ -676,8 +676,12 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
        if (vcpu->kvm->arch.crypto.pqap_hook) {
                pqap_hook = *vcpu->kvm->arch.crypto.pqap_hook;
                ret = pqap_hook(vcpu);
-               if (!ret && vcpu->run->s.regs.gprs[1] & 0x00ff0000)
-                       kvm_s390_set_psw_cc(vcpu, 3);
+               if (!ret) {
+                       if (vcpu->run->s.regs.gprs[1] & 0x00ff0000)
+                               kvm_s390_set_psw_cc(vcpu, 3);
+                       else
+                               kvm_s390_set_psw_cc(vcpu, 0);
+               }
                up_read(&vcpu->kvm->arch.crypto.pqap_hook_rwsem);
                return ret;
        }
index fef42e2a80a2ae5bc47eae89f2b4e38293a2586b..3af3bd20ac7b8f075e08b85b34f7e257f4687eeb 100644 (file)
@@ -1235,7 +1235,6 @@ static int acquire_gmap_shadow(struct kvm_vcpu *vcpu,
        gmap = gmap_shadow(vcpu->arch.gmap, asce, edat);
        if (IS_ERR(gmap))
                return PTR_ERR(gmap);
-       gmap->private = vcpu->kvm;
        vcpu->kvm->stat.gmap_shadow_create++;
        WRITE_ONCE(vsie_page->gmap, gmap);
        return 0;
index 6f96b5a71c6383d07eb447cb80df70214bdd1910..8da39deb56ca4952a6f8e436d153ec6f54292932 100644 (file)
@@ -1691,6 +1691,7 @@ struct gmap *gmap_shadow(struct gmap *parent, unsigned long asce,
                return ERR_PTR(-ENOMEM);
        new->mm = parent->mm;
        new->parent = gmap_get(parent);
+       new->private = parent->private;
        new->orig_asce = asce;
        new->edat_level = edat_level;
        new->initialized = false;
index 676ac74026a82b578f857e2426a501abdec014c7..52a44e353796c001a31e9a8242f39982203fb8be 100644 (file)
@@ -252,7 +252,7 @@ resource_size_t pcibios_align_resource(void *data, const struct resource *res,
 /* combine single writes by using store-block insn */
 void __iowrite64_copy(void __iomem *to, const void *from, size_t count)
 {
-       zpci_memcpy_toio(to, from, count);
+       zpci_memcpy_toio(to, from, count * 8);
 }
 
 void __iomem *ioremap_prot(phys_addr_t phys_addr, size_t size,
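
The fix here is a units mismatch: __iowrite64_copy() takes its count in 64-bit quantities, while zpci_memcpy_toio() expects bytes, so s390 previously copied only an eighth of the requested data. A hedged usage sketch (mmio_base is hypothetical):

    u64 db[4] = { 0, 1, 2, 3 };

    /* count is in 8-byte units: this writes 32 bytes to MMIO space. */
    __iowrite64_copy(mmio_base, db, ARRAY_SIZE(db));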
index 5f60359361312e4159b45d769c1afe81533ace1b..2a03daa68f2857df85df97b5b632d6154e76496f 100644 (file)
@@ -60,7 +60,7 @@ libs-y                 += arch/sparc/prom/
 libs-y                 += arch/sparc/lib/
 
 drivers-$(CONFIG_PM) += arch/sparc/power/
-drivers-$(CONFIG_FB) += arch/sparc/video/
+drivers-$(CONFIG_FB_CORE) += arch/sparc/video/
 
 boot := arch/sparc/boot
 
index 94eb529dcb77623caf637387e694d8e5ddc049a8..2718cbea826a7d13aefacd26fee3719b69856746 100644 (file)
@@ -10,7 +10,7 @@
 
 static __always_inline bool arch_static_branch(struct static_key *key, bool branch)
 {
-       asm_volatile_goto("1:\n\t"
+       asm goto("1:\n\t"
                 "nop\n\t"
                 "nop\n\t"
                 ".pushsection __jump_table,  \"aw\"\n\t"
@@ -26,7 +26,7 @@ l_yes:
 
 static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch)
 {
-       asm_volatile_goto("1:\n\t"
+       asm goto("1:\n\t"
                 "b %l[l_yes]\n\t"
                 "nop\n\t"
                 ".pushsection __jump_table,  \"aw\"\n\t"
index 6baddbd58e4db3fa82c9ba76fd5e0d571a7c4f48..d4d83f1702c61f09e3dceac24c494ecd1632f3e5 100644 (file)
@@ -1,3 +1,3 @@
 # SPDX-License-Identifier: GPL-2.0-only
 
-obj-$(CONFIG_FB) += fbdev.o
+obj-$(CONFIG_FB_CORE) += fbdev.o
index 82f05f250634807c9f78774bca9213dfd5de2038..34957dcb88b9c31befa3c2b7e08809de73b23a3e 100644 (file)
@@ -115,7 +115,9 @@ archprepare:
        $(Q)$(MAKE) $(build)=$(HOST_DIR)/um include/generated/user_constants.h
 
 LINK-$(CONFIG_LD_SCRIPT_STATIC) += -static
-LINK-$(CONFIG_LD_SCRIPT_DYN) += $(call cc-option, -no-pie)
+ifdef CONFIG_LD_SCRIPT_DYN
+LINK-$(call gcc-min-version, 60100)$(CONFIG_CC_IS_CLANG) += -no-pie
+endif
 LINK-$(CONFIG_LD_SCRIPT_DYN_RPATH) += -Wl,-rpath,/lib
 
 CFLAGS_NO_HARDENING := $(call cc-option, -fno-PIC,) $(call cc-option, -fno-pic,) \
index 92ee2697ff398458c004ce4bc7515abfc0def5d4..63fc062add708cf8e09f5f23185478ef8b85a88a 100644 (file)
@@ -108,8 +108,6 @@ static inline void ubd_set_bit(__u64 bit, unsigned char *data)
 static DEFINE_MUTEX(ubd_lock);
 static DEFINE_MUTEX(ubd_mutex); /* replaces BKL, might not be needed */
 
-static int ubd_open(struct gendisk *disk, blk_mode_t mode);
-static void ubd_release(struct gendisk *disk);
 static int ubd_ioctl(struct block_device *bdev, blk_mode_t mode,
                     unsigned int cmd, unsigned long arg);
 static int ubd_getgeo(struct block_device *bdev, struct hd_geometry *geo);
@@ -118,16 +116,11 @@ static int ubd_getgeo(struct block_device *bdev, struct hd_geometry *geo);
 
 static const struct block_device_operations ubd_blops = {
         .owner         = THIS_MODULE,
-        .open          = ubd_open,
-        .release       = ubd_release,
         .ioctl         = ubd_ioctl,
         .compat_ioctl  = blkdev_compat_ptr_ioctl,
        .getgeo         = ubd_getgeo,
 };
 
-/* Protected by ubd_lock */
-static struct gendisk *ubd_gendisk[MAX_DEV];
-
 #ifdef CONFIG_BLK_DEV_UBD_SYNC
 #define OPEN_FLAGS ((struct openflags) { .r = 1, .w = 1, .s = 1, .c = 0, \
                                         .cl = 1 })
@@ -155,7 +148,6 @@ struct ubd {
         * backing or the cow file. */
        char *file;
        char *serial;
-       int count;
        int fd;
        __u64 size;
        struct openflags boot_openflags;
@@ -165,7 +157,7 @@ struct ubd {
        unsigned no_trim:1;
        struct cow cow;
        struct platform_device pdev;
-       struct request_queue *queue;
+       struct gendisk *disk;
        struct blk_mq_tag_set tag_set;
        spinlock_t lock;
 };
@@ -181,7 +173,6 @@ struct ubd {
 #define DEFAULT_UBD { \
        .file =                 NULL, \
        .serial =               NULL, \
-       .count =                0, \
        .fd =                   -1, \
        .size =                 -1, \
        .boot_openflags =       OPEN_FLAGS, \
@@ -774,8 +765,6 @@ static int ubd_open_dev(struct ubd *ubd_dev)
        ubd_dev->fd = fd;
 
        if(ubd_dev->cow.file != NULL){
-               blk_queue_max_hw_sectors(ubd_dev->queue, 8 * sizeof(long));
-
                err = -ENOMEM;
                ubd_dev->cow.bitmap = vmalloc(ubd_dev->cow.bitmap_len);
                if(ubd_dev->cow.bitmap == NULL){
@@ -797,11 +786,6 @@ static int ubd_open_dev(struct ubd *ubd_dev)
                if(err < 0) goto error;
                ubd_dev->cow.fd = err;
        }
-       if (ubd_dev->no_trim == 0) {
-               blk_queue_max_discard_sectors(ubd_dev->queue, UBD_MAX_REQUEST);
-               blk_queue_max_write_zeroes_sectors(ubd_dev->queue, UBD_MAX_REQUEST);
-       }
-       blk_queue_flag_set(QUEUE_FLAG_NONROT, ubd_dev->queue);
        return 0;
  error:
        os_close_file(ubd_dev->fd);
@@ -851,27 +835,6 @@ static const struct attribute_group *ubd_attr_groups[] = {
        NULL,
 };
 
-static int ubd_disk_register(int major, u64 size, int unit,
-                            struct gendisk *disk)
-{
-       disk->major = major;
-       disk->first_minor = unit << UBD_SHIFT;
-       disk->minors = 1 << UBD_SHIFT;
-       disk->fops = &ubd_blops;
-       set_capacity(disk, size / 512);
-       sprintf(disk->disk_name, "ubd%c", 'a' + unit);
-
-       ubd_devs[unit].pdev.id   = unit;
-       ubd_devs[unit].pdev.name = DRIVER_NAME;
-       ubd_devs[unit].pdev.dev.release = ubd_device_release;
-       dev_set_drvdata(&ubd_devs[unit].pdev.dev, &ubd_devs[unit]);
-       platform_device_register(&ubd_devs[unit].pdev);
-
-       disk->private_data = &ubd_devs[unit];
-       disk->queue = ubd_devs[unit].queue;
-       return device_add_disk(&ubd_devs[unit].pdev.dev, disk, ubd_attr_groups);
-}
-
 #define ROUND_BLOCK(n) ((n + (SECTOR_SIZE - 1)) & (-SECTOR_SIZE))
 
 static const struct blk_mq_ops ubd_mq_ops = {
@@ -881,18 +844,36 @@ static const struct blk_mq_ops ubd_mq_ops = {
 static int ubd_add(int n, char **error_out)
 {
        struct ubd *ubd_dev = &ubd_devs[n];
+       struct queue_limits lim = {
+               .max_segments           = MAX_SG,
+               .seg_boundary_mask      = PAGE_SIZE - 1,
+       };
        struct gendisk *disk;
        int err = 0;
 
        if(ubd_dev->file == NULL)
                goto out;
 
+       if (ubd_dev->cow.file)
+               lim.max_hw_sectors = 8 * sizeof(long);
+       if (!ubd_dev->no_trim) {
+               lim.max_hw_discard_sectors = UBD_MAX_REQUEST;
+               lim.max_write_zeroes_sectors = UBD_MAX_REQUEST;
+       }
+
        err = ubd_file_size(ubd_dev, &ubd_dev->size);
        if(err < 0){
                *error_out = "Couldn't determine size of device's file";
                goto out;
        }
 
+       err = ubd_open_dev(ubd_dev);
+       if (err) {
+               pr_err("ubd%c: Can't open \"%s\": errno = %d\n",
+                       'a' + n, ubd_dev->file, -err);
+               goto out;
+       }
+
        ubd_dev->size = ROUND_BLOCK(ubd_dev->size);
 
        ubd_dev->tag_set.ops = &ubd_mq_ops;
@@ -904,29 +885,43 @@ static int ubd_add(int n, char **error_out)
 
        err = blk_mq_alloc_tag_set(&ubd_dev->tag_set);
        if (err)
-               goto out;
+               goto out_close;
 
-       disk = blk_mq_alloc_disk(&ubd_dev->tag_set, ubd_dev);
+       disk = blk_mq_alloc_disk(&ubd_dev->tag_set, &lim, ubd_dev);
        if (IS_ERR(disk)) {
                err = PTR_ERR(disk);
                goto out_cleanup_tags;
        }
-       ubd_dev->queue = disk->queue;
 
-       blk_queue_write_cache(ubd_dev->queue, true, false);
-       blk_queue_max_segments(ubd_dev->queue, MAX_SG);
-       blk_queue_segment_boundary(ubd_dev->queue, PAGE_SIZE - 1);
-       err = ubd_disk_register(UBD_MAJOR, ubd_dev->size, n, disk);
+       blk_queue_flag_set(QUEUE_FLAG_NONROT, disk->queue);
+       blk_queue_write_cache(disk->queue, true, false);
+       disk->major = UBD_MAJOR;
+       disk->first_minor = n << UBD_SHIFT;
+       disk->minors = 1 << UBD_SHIFT;
+       disk->fops = &ubd_blops;
+       set_capacity(disk, ubd_dev->size / 512);
+       sprintf(disk->disk_name, "ubd%c", 'a' + n);
+       disk->private_data = ubd_dev;
+       set_disk_ro(disk, !ubd_dev->openflags.w);
+
+       ubd_dev->pdev.id = n;
+       ubd_dev->pdev.name = DRIVER_NAME;
+       ubd_dev->pdev.dev.release = ubd_device_release;
+       dev_set_drvdata(&ubd_dev->pdev.dev, ubd_dev);
+       platform_device_register(&ubd_dev->pdev);
+
+       err = device_add_disk(&ubd_dev->pdev.dev, disk, ubd_attr_groups);
        if (err)
                goto out_cleanup_disk;
 
-       ubd_gendisk[n] = disk;
        return 0;
 
 out_cleanup_disk:
        put_disk(disk);
 out_cleanup_tags:
        blk_mq_free_tag_set(&ubd_dev->tag_set);
+out_close:
+       ubd_close_dev(ubd_dev);
 out:
        return err;
 }
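
For context, the conversion above replaces post-allocation blk_queue_*() tuning with a struct queue_limits handed to blk_mq_alloc_disk(), so the limits take effect atomically when the disk is created. A condensed sketch of the new-style pattern; everything except the block-layer calls already shown in the diff is hypothetical:

	struct queue_limits lim = {
		.max_segments      = MAX_SG,		/* fixed, driver-wide limits */
		.seg_boundary_mask = PAGE_SIZE - 1,
	};
	struct gendisk *disk;

	if (dev_supports_trim)				/* per-device tweaks, set before allocation */
		lim.max_hw_discard_sectors = DEV_MAX_REQUEST;

	disk = blk_mq_alloc_disk(&tag_set, &lim, dev);	/* limits applied here */
	if (IS_ERR(disk))
		return PTR_ERR(disk);
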
@@ -1012,7 +1007,6 @@ static int ubd_id(char **str, int *start_out, int *end_out)
 
 static int ubd_remove(int n, char **error_out)
 {
-       struct gendisk *disk = ubd_gendisk[n];
        struct ubd *ubd_dev;
        int err = -ENODEV;
 
@@ -1023,15 +1017,15 @@ static int ubd_remove(int n, char **error_out)
        if(ubd_dev->file == NULL)
                goto out;
 
-       /* you cannot remove a open disk */
-       err = -EBUSY;
-       if(ubd_dev->count > 0)
-               goto out;
+       if (ubd_dev->disk) {
+               /* you cannot remove an open disk */
+               err = -EBUSY;
+               if (disk_openers(ubd_dev->disk))
+                       goto out;
 
-       ubd_gendisk[n] = NULL;
-       if(disk != NULL){
-               del_gendisk(disk);
-               put_disk(disk);
+               del_gendisk(ubd_dev->disk);
+               ubd_close_dev(ubd_dev);
+               put_disk(ubd_dev->disk);
        }
 
        err = 0;
@@ -1153,37 +1147,6 @@ static int __init ubd_driver_init(void){
 
 device_initcall(ubd_driver_init);
 
-static int ubd_open(struct gendisk *disk, blk_mode_t mode)
-{
-       struct ubd *ubd_dev = disk->private_data;
-       int err = 0;
-
-       mutex_lock(&ubd_mutex);
-       if(ubd_dev->count == 0){
-               err = ubd_open_dev(ubd_dev);
-               if(err){
-                       printk(KERN_ERR "%s: Can't open \"%s\": errno = %d\n",
-                              disk->disk_name, ubd_dev->file, -err);
-                       goto out;
-               }
-       }
-       ubd_dev->count++;
-       set_disk_ro(disk, !ubd_dev->openflags.w);
-out:
-       mutex_unlock(&ubd_mutex);
-       return err;
-}
-
-static void ubd_release(struct gendisk *disk)
-{
-       struct ubd *ubd_dev = disk->private_data;
-
-       mutex_lock(&ubd_mutex);
-       if(--ubd_dev->count == 0)
-               ubd_close_dev(ubd_dev);
-       mutex_unlock(&ubd_mutex);
-}
-
 static void cowify_bitmap(__u64 io_offset, int length, unsigned long *cow_mask,
                          __u64 *cow_offset, unsigned long *bitmap,
                          __u64 bitmap_offset, unsigned long *bitmap_words,
index 4b6d1b526bc1217e2e89d4670f9c4385e68dacc7..66fe06db872f05bb775f0089a4f134f77563efe4 100644 (file)
@@ -75,7 +75,7 @@ extern void setup_clear_cpu_cap(unsigned int bit);
  */
 static __always_inline bool _static_cpu_has(u16 bit)
 {
-       asm_volatile_goto("1: jmp 6f\n"
+       asm goto("1: jmp 6f\n"
                 "2:\n"
                 ".skip -(((5f-4f) - (2b-1b)) > 0) * "
                         "((5f-4f) - (2b-1b)),0x90\n"
index b9224cf2ee4d6fcb234be76e072d37fa1cc7ad53..2a7279d80460a8adf0218a954646d9d8343ddf3e 100644 (file)
@@ -379,7 +379,7 @@ config X86_CMOV
 config X86_MINIMUM_CPU_FAMILY
        int
        default "64" if X86_64
-       default "6" if X86_32 && (MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MEFFICEON || MATOM || MCRUSOE || MCORE2 || MK7 || MK8)
+       default "6" if X86_32 && (MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MEFFICEON || MATOM || MCORE2 || MK7 || MK8)
        default "5" if X86_32 && X86_CMPXCHG64
        default "4"
 
index 1a068de12a564fe452cd5c003feb907fd3de42fd..da8f3caf27815e39592443c7c8c09674fe9e2362 100644 (file)
@@ -112,13 +112,13 @@ ifeq ($(CONFIG_X86_32),y)
         # temporary until string.h is fixed
         KBUILD_CFLAGS += -ffreestanding
 
-       ifeq ($(CONFIG_STACKPROTECTOR),y)
-               ifeq ($(CONFIG_SMP),y)
+    ifeq ($(CONFIG_STACKPROTECTOR),y)
+        ifeq ($(CONFIG_SMP),y)
                        KBUILD_CFLAGS += -mstack-protector-guard-reg=fs -mstack-protector-guard-symbol=__stack_chk_guard
-               else
+        else
                        KBUILD_CFLAGS += -mstack-protector-guard=global
-               endif
-       endif
+        endif
+    endif
 else
         BITS := 64
         UTS_MACHINE := x86_64
index b2771710ed989cc805e6310bd88a1743350a9dc5..a1bbedd989e42ed5f9e433f556613613094ae74d 100644 (file)
@@ -106,8 +106,7 @@ extra_header_fields:
        .word   0                               # MinorSubsystemVersion
        .long   0                               # Win32VersionValue
 
-       .long   setup_size + ZO__end + pecompat_vsize
-                                               # SizeOfImage
+       .long   setup_size + ZO__end            # SizeOfImage
 
        .long   salign                          # SizeOfHeaders
        .long   0                               # CheckSum
@@ -143,7 +142,7 @@ section_table:
        .ascii  ".setup"
        .byte   0
        .byte   0
-       .long   setup_size - salign             # VirtualSize
+       .long   pecompat_fstart - salign        # VirtualSize
        .long   salign                          # VirtualAddress
        .long   pecompat_fstart - salign        # SizeOfRawData
        .long   salign                          # PointerToRawData
@@ -156,8 +155,8 @@ section_table:
 #ifdef CONFIG_EFI_MIXED
        .asciz  ".compat"
 
-       .long   8                               # VirtualSize
-       .long   setup_size + ZO__end            # VirtualAddress
+       .long   pecompat_fsize                  # VirtualSize
+       .long   pecompat_fstart                 # VirtualAddress
        .long   pecompat_fsize                  # SizeOfRawData
        .long   pecompat_fstart                 # PointerToRawData
 
@@ -172,17 +171,16 @@ section_table:
         * modes this image supports.
         */
        .pushsection ".pecompat", "a", @progbits
-       .balign falign
-       .set    pecompat_vsize, salign
+       .balign salign
        .globl  pecompat_fstart
 pecompat_fstart:
        .byte   0x1                             # Version
        .byte   8                               # Size
        .word   IMAGE_FILE_MACHINE_I386         # PE machine type
        .long   setup_size + ZO_efi32_pe_entry  # Entrypoint
+       .byte   0x0                             # Sentinel
        .popsection
 #else
-       .set    pecompat_vsize, 0
        .set    pecompat_fstart, setup_size
 #endif
        .ascii  ".text"
index 83bb7efad8ae7139ca66f850d7bb21b4859bd3e0..3a2d1360abb016902495f5879632335d883b8c03 100644 (file)
@@ -24,6 +24,9 @@ SECTIONS
        .text           : { *(.text .text.*) }
        .text32         : { *(.text32) }
 
+       .pecompat       : { *(.pecompat) }
+       PROVIDE(pecompat_fsize = setup_size - pecompat_fstart);
+
        . = ALIGN(16);
        .rodata         : { *(.rodata*) }
 
@@ -36,9 +39,6 @@ SECTIONS
        . = ALIGN(16);
        .data           : { *(.data*) }
 
-       .pecompat       : { *(.pecompat) }
-       PROVIDE(pecompat_fsize = setup_size - pecompat_fstart);
-
        .signature      : {
                setup_sig = .;
                LONG(0x5a5aaa55)
index 8c8d38f0cb1df0ee959e09c9f912ec1ab2afce40..0033790499245e3df5f10496986badbe0150aac2 100644 (file)
@@ -6,6 +6,9 @@
 #include <linux/export.h>
 #include <linux/linkage.h>
 #include <asm/msr-index.h>
+#include <asm/unwind_hints.h>
+#include <asm/segment.h>
+#include <asm/cache.h>
 
 .pushsection .noinstr.text, "ax"
 
@@ -20,3 +23,23 @@ SYM_FUNC_END(entry_ibpb)
 EXPORT_SYMBOL_GPL(entry_ibpb);
 
 .popsection
+
+/*
+ * Define the VERW operand that is disguised as entry code so that
+ * it can be referenced with KPTI enabled. This ensures VERW can be
+ * used late in the exit-to-user path after page tables are switched.
+ */
+.pushsection .entry.text, "ax"
+
+.align L1_CACHE_BYTES, 0xcc
+SYM_CODE_START_NOALIGN(mds_verw_sel)
+       UNWIND_HINT_UNDEFINED
+       ANNOTATE_NOENDBR
+       .word __KERNEL_DS
+.align L1_CACHE_BYTES, 0xcc
+SYM_CODE_END(mds_verw_sel);
+/* For KVM */
+EXPORT_SYMBOL_GPL(mds_verw_sel);
+
+.popsection
+
index c73047bf9f4bff9c4631c0eab383cedceda41918..fba427646805d55221664538be2285c3ae188ca1 100644 (file)
@@ -885,6 +885,7 @@ SYM_FUNC_START(entry_SYSENTER_32)
        BUG_IF_WRONG_CR3 no_user_check=1
        popfl
        popl    %eax
+       CLEAR_CPU_BUFFERS
 
        /*
         * Return back to the vDSO, which will pop ecx and edx.
@@ -954,6 +955,7 @@ restore_all_switch_stack:
 
        /* Restore user state */
        RESTORE_REGS pop=4                      # skip orig_eax/error_code
+       CLEAR_CPU_BUFFERS
 .Lirq_return:
        /*
         * ARCH_HAS_MEMBARRIER_SYNC_CORE rely on IRET core serialization
@@ -1146,6 +1148,7 @@ SYM_CODE_START(asm_exc_nmi)
 
        /* Not on SYSENTER stack. */
        call    exc_nmi
+       CLEAR_CPU_BUFFERS
        jmp     .Lnmi_return
 
 .Lnmi_from_sysenter_stack:
index c40f89ab1b4c70a18b632a50c1e659e3fd83cfa9..9bb4859776291593249b9998416505aeec505011 100644 (file)
@@ -161,6 +161,7 @@ syscall_return_via_sysret:
 SYM_INNER_LABEL(entry_SYSRETQ_unsafe_stack, SYM_L_GLOBAL)
        ANNOTATE_NOENDBR
        swapgs
+       CLEAR_CPU_BUFFERS
        sysretq
 SYM_INNER_LABEL(entry_SYSRETQ_end, SYM_L_GLOBAL)
        ANNOTATE_NOENDBR
@@ -573,6 +574,7 @@ SYM_INNER_LABEL(swapgs_restore_regs_and_return_to_usermode, SYM_L_GLOBAL)
 
 .Lswapgs_and_iret:
        swapgs
+       CLEAR_CPU_BUFFERS
        /* Assert that the IRET frame indicates user mode. */
        testb   $3, 8(%rsp)
        jnz     .Lnative_iret
@@ -723,6 +725,8 @@ native_irq_return_ldt:
         */
        popq    %rax                            /* Restore user RAX */
 
+       CLEAR_CPU_BUFFERS
+
        /*
         * RSP now points to an ordinary IRET frame, except that the page
         * is read-only and RSP[31:16] are preloaded with the userspace
@@ -1449,6 +1453,12 @@ nmi_restore:
        std
        movq    $0, 5*8(%rsp)           /* clear "NMI executing" */
 
+       /*
+        * Skip CLEAR_CPU_BUFFERS here, since it only helps in rare cases like
+        * an NMI in the kernel after user state is restored. For an
+        * unprivileged user these conditions are hard to meet.
+        */
+
        /*
         * iretq reads the "iret" frame and exits the NMI stack in a
         * single instruction.  We are returning to kernel mode, so this
@@ -1466,6 +1476,7 @@ SYM_CODE_START(entry_SYSCALL32_ignore)
        UNWIND_HINT_END_OF_STACK
        ENDBR
        mov     $-ENOSYS, %eax
+       CLEAR_CPU_BUFFERS
        sysretl
 SYM_CODE_END(entry_SYSCALL32_ignore)
 
index de94e2e84ecca927d9aa0e1ab99466466c163d44..eabf48c4d4b4c30367792f5d9a0b158a9ecf8a04 100644 (file)
@@ -270,6 +270,7 @@ SYM_INNER_LABEL(entry_SYSRETL_compat_unsafe_stack, SYM_L_GLOBAL)
        xorl    %r9d, %r9d
        xorl    %r10d, %r10d
        swapgs
+       CLEAR_CPU_BUFFERS
        sysretl
 SYM_INNER_LABEL(entry_SYSRETL_compat_end, SYM_L_GLOBAL)
        ANNOTATE_NOENDBR
index 96e6c51515f50467efbf7cb77082c6b9d18cb8f6..cf1b78cb2d0431ae7095d1c8769c1e140c4357a7 100644 (file)
 extern struct boot_params boot_params;
 static struct real_mode_header hv_vtl_real_mode_header;
 
+static bool __init hv_vtl_msi_ext_dest_id(void)
+{
+       return true;
+}
+
 void __init hv_vtl_init_platform(void)
 {
        pr_info("Linux runs in Hyper-V Virtual Trust Level\n");
@@ -38,6 +43,8 @@ void __init hv_vtl_init_platform(void)
        x86_platform.legacy.warm_reset = 0;
        x86_platform.legacy.reserve_bios_regions = 0;
        x86_platform.legacy.devices.pnpbios = 0;
+
+       x86_init.hyper.msi_ext_dest_id = hv_vtl_msi_ext_dest_id;
 }
 
 static inline u64 hv_vtl_system_desc_base(struct ldttss_desc *desc)
index 7dcbf153ad7257c3fda712df91b3195efe522ab2..768d73de0d098afc5d7a9b6ea1e7c2747aab4feb 100644 (file)
@@ -15,6 +15,7 @@
 #include <asm/io.h>
 #include <asm/coco.h>
 #include <asm/mem_encrypt.h>
+#include <asm/set_memory.h>
 #include <asm/mshyperv.h>
 #include <asm/hypervisor.h>
 #include <asm/mtrr.h>
@@ -502,6 +503,31 @@ static int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
                return -EFAULT;
 }
 
+/*
+ * When transitioning memory between encrypted and decrypted, the caller
+ * of set_memory_encrypted() or set_memory_decrypted() is responsible for
+ * ensuring that the memory isn't in use and isn't referenced while the
+ * transition is in progress.  The transition has multiple steps, and the
+ * memory is in an inconsistent state until all steps are complete. A
+ * reference while the state is inconsistent could result in an exception
+ * that can't be cleanly fixed up.
+ *
+ * But the Linux kernel load_unaligned_zeropad() mechanism could cause a
+ * stray reference that can't be prevented by the caller, so Linux has
+ * specific code to handle this case. But when the #VC and #VE exceptions
+ * are routed to a paravisor, the specific code doesn't work. To avoid this
+ * problem, mark the pages as "not present" while the transition is in
+ * progress. If load_unaligned_zeropad() causes a stray reference, a normal
+ * page fault is generated instead of #VC or #VE, and the page-fault-based
+ * handlers for load_unaligned_zeropad() resolve the reference.  When the
+ * transition is complete, hv_vtom_set_host_visibility() marks the pages
+ * as "present" again.
+ */
+static bool hv_vtom_clear_present(unsigned long kbuffer, int pagecount, bool enc)
+{
+       return !set_memory_np(kbuffer, pagecount);
+}
+
 /*
  * hv_vtom_set_host_visibility - Set specified memory visible to host.
  *
@@ -515,16 +541,28 @@ static bool hv_vtom_set_host_visibility(unsigned long kbuffer, int pagecount, bo
        enum hv_mem_host_visibility visibility = enc ?
                        VMBUS_PAGE_NOT_VISIBLE : VMBUS_PAGE_VISIBLE_READ_WRITE;
        u64 *pfn_array;
+       phys_addr_t paddr;
+       void *vaddr;
        int ret = 0;
        bool result = true;
        int i, pfn;
 
        pfn_array = kmalloc(HV_HYP_PAGE_SIZE, GFP_KERNEL);
-       if (!pfn_array)
-               return false;
+       if (!pfn_array) {
+               result = false;
+               goto err_set_memory_p;
+       }
 
        for (i = 0, pfn = 0; i < pagecount; i++) {
-               pfn_array[pfn] = virt_to_hvpfn((void *)kbuffer + i * HV_HYP_PAGE_SIZE);
+               /*
+                * Use slow_virt_to_phys() because the PRESENT bit has been
+                * temporarily cleared in the PTEs.  slow_virt_to_phys() works
+                * without the PRESENT bit while virt_to_hvpfn() or similar
+                * does not.
+                */
+               vaddr = (void *)kbuffer + (i * HV_HYP_PAGE_SIZE);
+               paddr = slow_virt_to_phys(vaddr);
+               pfn_array[pfn] = paddr >> HV_HYP_PAGE_SHIFT;
                pfn++;
 
                if (pfn == HV_MAX_MODIFY_GPA_REP_COUNT || i == pagecount - 1) {
@@ -538,14 +576,30 @@ static bool hv_vtom_set_host_visibility(unsigned long kbuffer, int pagecount, bo
                }
        }
 
- err_free_pfn_array:
+err_free_pfn_array:
        kfree(pfn_array);
+
+err_set_memory_p:
+       /*
+        * Set the PTE PRESENT bits again to revert what hv_vtom_clear_present()
+        * did. Do this even if there is an error earlier in this function in
+        * order to avoid leaving the memory range in a "broken" state. Setting
+        * the PRESENT bits shouldn't fail, but return an error if it does.
+        */
+       if (set_memory_p(kbuffer, pagecount))
+               result = false;
+
        return result;
 }
 
 static bool hv_vtom_tlb_flush_required(bool private)
 {
-       return true;
+       /*
+        * Since hv_vtom_clear_present() marks the PTEs as "not present"
+        * and flushes the TLB, they can't be in the TLB. That makes the
+        * flush controlled by this function redundant, so return "false".
+        */
+       return false;
 }
 
 static bool hv_vtom_cache_flush_required(void)
@@ -608,6 +662,7 @@ void __init hv_vtom_init(void)
        x86_platform.hyper.is_private_mmio = hv_is_private_mmio;
        x86_platform.guest.enc_cache_flush_required = hv_vtom_cache_flush_required;
        x86_platform.guest.enc_tlb_flush_required = hv_vtom_tlb_flush_required;
+       x86_platform.guest.enc_status_change_prepare = hv_vtom_clear_present;
        x86_platform.guest.enc_status_change_finish = hv_vtom_set_host_visibility;
 
        /* Set WB as the default cache mode. */
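
To make the ordering of the two hooks concrete: in the set_memory_encrypted()/set_memory_decrypted() path, the prepare hook runs before the page tables are rewritten and the finish hook after. A simplified sketch (error handling and TLB/cache flushing elided; only the hook names wired up above are real):

static int example_enc_status_change(unsigned long addr, int numpages, bool enc)
{
	/* hv_vtom_clear_present(): PTEs go not-present, stray loads now fault */
	if (!x86_platform.guest.enc_status_change_prepare(addr, numpages, enc))
		return -EIO;

	/* ... the encrypted/decrypted attribute is changed in the PTEs ... */

	/* hv_vtom_set_host_visibility(): visibility updated, PTEs present again */
	if (!x86_platform.guest.enc_status_change_finish(addr, numpages, enc))
		return -EIO;

	return 0;
}
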
index 6ae2d16a7613b714cb58283dafa600db5829ba6f..76c310b19b11d898db11cf498d7c82449bbf7dc2 100644 (file)
@@ -10,13 +10,14 @@ enum cc_vendor {
        CC_VENDOR_INTEL,
 };
 
-extern enum cc_vendor cc_vendor;
-
 #ifdef CONFIG_ARCH_HAS_CC_PLATFORM
+extern enum cc_vendor cc_vendor;
 void cc_set_mask(u64 mask);
 u64 cc_mkenc(u64 val);
 u64 cc_mkdec(u64 val);
 #else
+#define cc_vendor (CC_VENDOR_NONE)
+
 static inline u64 cc_mkenc(u64 val)
 {
        return val;
index a26bebbdff87ed20c45bdb98dcc4a8873f5c30f5..a1273698fc430b41951c241b6b76dfa9b7887692 100644 (file)
@@ -168,7 +168,7 @@ extern void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int bit);
  */
 static __always_inline bool _static_cpu_has(u16 bit)
 {
-       asm_volatile_goto(
+       asm goto(
                ALTERNATIVE_TERNARY("jmp 6f", %P[feature], "", "jmp %l[t_no]")
                ".pushsection .altinstr_aux,\"ax\"\n"
                "6:\n"
index 29cb275a219d7fb38fa0d16e6ba48e91c9d032b4..2b62cdd8dd1227f2425e698525b97639a4124f75 100644 (file)
 #define X86_FEATURE_K6_MTRR            ( 3*32+ 1) /* AMD K6 nonstandard MTRRs */
 #define X86_FEATURE_CYRIX_ARR          ( 3*32+ 2) /* Cyrix ARRs (= MTRRs) */
 #define X86_FEATURE_CENTAUR_MCR                ( 3*32+ 3) /* Centaur MCRs (= MTRRs) */
-
-/* CPU types for specific tunings: */
 #define X86_FEATURE_K8                 ( 3*32+ 4) /* "" Opteron, Athlon64 */
-/* FREE, was #define X86_FEATURE_K7                    ( 3*32+ 5) "" Athlon */
+#define X86_FEATURE_ZEN5               ( 3*32+ 5) /* "" CPU based on Zen5 microarchitecture */
 #define X86_FEATURE_P3                 ( 3*32+ 6) /* "" P3 */
 #define X86_FEATURE_P4                 ( 3*32+ 7) /* "" P4 */
 #define X86_FEATURE_CONSTANT_TSC       ( 3*32+ 8) /* TSC ticks at a constant rate */
@@ -97,7 +95,7 @@
 #define X86_FEATURE_SYSENTER32         ( 3*32+15) /* "" sysenter in IA32 userspace */
 #define X86_FEATURE_REP_GOOD           ( 3*32+16) /* REP microcode works well */
 #define X86_FEATURE_AMD_LBR_V2         ( 3*32+17) /* AMD Last Branch Record Extension Version 2 */
-/* FREE, was #define X86_FEATURE_LFENCE_RDTSC          ( 3*32+18) "" LFENCE synchronizes RDTSC */
+#define X86_FEATURE_CLEAR_CPU_BUF      ( 3*32+18) /* "" Clear CPU buffers using VERW */
 #define X86_FEATURE_ACC_POWER          ( 3*32+19) /* AMD Accumulated Power Mechanism */
 #define X86_FEATURE_NOPL               ( 3*32+20) /* The NOPL (0F 1F) instructions */
 #define X86_FEATURE_ALWAYS             ( 3*32+21) /* "" Always-present feature */
index ce8f50192ae3e46da87fe3a24fc736b3b2fc3b21..7e523bb3d2d31a9a8ab9d32ca65a41b5b765c4c4 100644 (file)
@@ -91,7 +91,6 @@ static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
 
 static __always_inline void arch_exit_to_user_mode(void)
 {
-       mds_user_clear_cpu_buffers();
        amd_clear_divider();
 }
 #define arch_exit_to_user_mode arch_exit_to_user_mode
index 197316121f04e154dad9ba4d9a7169674c623dd5..b65e9c46b92210293d767ab01434593c2aad27a0 100644 (file)
 #define INTEL_FAM6_ATOM_CRESTMONT_X    0xAF /* Sierra Forest */
 #define INTEL_FAM6_ATOM_CRESTMONT      0xB6 /* Grand Ridge */
 
+#define INTEL_FAM6_ATOM_DARKMONT_X     0xDD /* Clearwater Forest */
+
 /* Xeon Phi */
 
 #define INTEL_FAM6_XEON_PHI_KNL                0x57 /* Knights Landing */
index 071572e23d3a06783e3a1f63e11bb47e99af9daa..cbbef32517f0049a3df51842162032ff1946e901 100644 (file)
@@ -24,7 +24,7 @@
 
 static __always_inline bool arch_static_branch(struct static_key *key, bool branch)
 {
-       asm_volatile_goto("1:"
+       asm goto("1:"
                "jmp %l[l_yes] # objtool NOPs this \n\t"
                JUMP_TABLE_ENTRY
                : :  "i" (key), "i" (2 | branch) : : l_yes);
@@ -38,7 +38,7 @@ l_yes:
 
 static __always_inline bool arch_static_branch(struct static_key * const key, const bool branch)
 {
-       asm_volatile_goto("1:"
+       asm goto("1:"
                ".byte " __stringify(BYTES_NOP5) "\n\t"
                JUMP_TABLE_ENTRY
                : :  "i" (key), "i" (branch) : : l_yes);
@@ -52,7 +52,7 @@ l_yes:
 
 static __always_inline bool arch_static_branch_jump(struct static_key * const key, const bool branch)
 {
-       asm_volatile_goto("1:"
+       asm goto("1:"
                "jmp %l[l_yes]\n\t"
                JUMP_TABLE_ENTRY
                : :  "i" (key), "i" (branch) : : l_yes);
index 8fa6ac0e2d7665f936756748c0e1b4ab08a2c5a7..d91b37f5b4bb45106ee927fcd98b66f1b82a54c1 100644 (file)
@@ -64,6 +64,7 @@ static inline bool kmsan_virt_addr_valid(void *addr)
 {
        unsigned long x = (unsigned long)addr;
        unsigned long y = x - __START_KERNEL_map;
+       bool ret;
 
        /* use the carry flag to determine if x was < __START_KERNEL_map */
        if (unlikely(x > y)) {
@@ -79,7 +80,21 @@ static inline bool kmsan_virt_addr_valid(void *addr)
                        return false;
        }
 
-       return pfn_valid(x >> PAGE_SHIFT);
+       /*
+        * pfn_valid() relies on RCU, and may call into the scheduler on exiting
+        * the critical section. However, this would result in recursion with
+        * KMSAN. Therefore, disable preemption here, and re-enable preemption
+        * below while suppressing reschedules to avoid recursion.
+        *
+        * Note, this occasionally sacrifices scheduling guarantees.
+        * However, a kernel compiled with KMSAN has already given up on any
+        * performance guarantees due to being heavily instrumented.
+        */
+       preempt_disable();
+       ret = pfn_valid(x >> PAGE_SHIFT);
+       preempt_enable_no_resched();
+
+       return ret;
 }
 
 #endif /* !MODULE */
index b5b2d0fde5796894534f0cb98f96d3076b6bf27d..d271ba20a0b214104a1f11832a1007f5bb35190e 100644 (file)
@@ -1145,6 +1145,8 @@ struct kvm_hv {
        unsigned int synic_auto_eoi_used;
 
        struct kvm_hv_syndbg hv_syndbg;
+
+       bool xsaves_xsavec_checked;
 };
 #endif
 
index 262e65539f83c86d140552305c8a9d330b313c20..2aa52cab1e463af6f4105e2f887acf185dec9f31 100644 (file)
 #endif
 .endm
 
+/*
+ * Macro to execute VERW instruction that mitigate transient data sampling
+ * attacks such as MDS. On affected systems a microcode update overloaded VERW
+ * instruction to also clear the CPU buffers. VERW clobbers CFLAGS.ZF.
+ *
+ * Note: Only the memory operand variant of VERW clears the CPU buffers.
+ */
+.macro CLEAR_CPU_BUFFERS
+       ALTERNATIVE "", __stringify(verw _ASM_RIP(mds_verw_sel)), X86_FEATURE_CLEAR_CPU_BUF
+.endm
+
 #else /* __ASSEMBLY__ */
 
 #define ANNOTATE_RETPOLINE_SAFE                                        \
@@ -529,13 +540,14 @@ DECLARE_STATIC_KEY_FALSE(switch_to_cond_stibp);
 DECLARE_STATIC_KEY_FALSE(switch_mm_cond_ibpb);
 DECLARE_STATIC_KEY_FALSE(switch_mm_always_ibpb);
 
-DECLARE_STATIC_KEY_FALSE(mds_user_clear);
 DECLARE_STATIC_KEY_FALSE(mds_idle_clear);
 
 DECLARE_STATIC_KEY_FALSE(switch_mm_cond_l1d_flush);
 
 DECLARE_STATIC_KEY_FALSE(mmio_stale_data_clear);
 
+extern u16 mds_verw_sel;
+
 #include <asm/segment.h>
 
 /**
@@ -561,17 +573,6 @@ static __always_inline void mds_clear_cpu_buffers(void)
        asm volatile("verw %[ds]" : : [ds] "m" (ds) : "cc");
 }
 
-/**
- * mds_user_clear_cpu_buffers - Mitigation for MDS and TAA vulnerability
- *
- * Clear CPU buffers if the corresponding static key is enabled
- */
-static __always_inline void mds_user_clear_cpu_buffers(void)
-{
-       if (static_branch_likely(&mds_user_clear))
-               mds_clear_cpu_buffers();
-}
-
 /**
  * mds_idle_clear_cpu_buffers - Mitigation for MDS vulnerability
  *
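
The CLEAR_CPU_BUFFERS macro added above is the assembly-side counterpart of the existing mds_clear_cpu_buffers() helper: on CPUs where X86_FEATURE_CLEAR_CPU_BUF is set, the ALTERNATIVE patches in a single VERW with mds_verw_sel as its memory operand. A C-level sketch of what the patched-in instruction does:

static __always_inline void example_clear_cpu_buffers(void)
{
	/*
	 * Only the memory-operand form of VERW flushes the CPU buffers,
	 * and it clobbers CFLAGS.ZF, hence the "cc" clobber.
	 */
	asm volatile("verw %[sel]" : : [sel] "m" (mds_verw_sel) : "cc");
}
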
index 4b081e0d3306b79cca3dc222bb7406cde371d517..363266cbcadaf29e5bdeba4b0bfd5ab0ccb7355f 100644 (file)
@@ -13,7 +13,7 @@
 #define __GEN_RMWcc(fullop, _var, cc, clobbers, ...)                   \
 ({                                                                     \
        bool c = false;                                                 \
-       asm_volatile_goto (fullop "; j" #cc " %l[cc_label]"             \
+       asm goto (fullop "; j" #cc " %l[cc_label]"              \
                        : : [var] "m" (_var), ## __VA_ARGS__            \
                        : clobbers : cc_label);                         \
        if (0) {                                                        \
index a5e89641bd2dac7e9fa5e1ab548369836640908a..9aee31862b4a8b8cbf2242db991a5cbeb3d41e21 100644 (file)
@@ -47,6 +47,7 @@ int set_memory_uc(unsigned long addr, int numpages);
 int set_memory_wc(unsigned long addr, int numpages);
 int set_memory_wb(unsigned long addr, int numpages);
 int set_memory_np(unsigned long addr, int numpages);
+int set_memory_p(unsigned long addr, int numpages);
 int set_memory_4k(unsigned long addr, int numpages);
 int set_memory_encrypted(unsigned long addr, int numpages);
 int set_memory_decrypted(unsigned long addr, int numpages);
index d6cd9344f6c78e5555486e5d9f231fd27de9da6a..48f8dd47cf6882ac9e3920d6e7105c0eff430528 100644 (file)
@@ -205,7 +205,7 @@ static inline void clwb(volatile void *__p)
 #ifdef CONFIG_X86_USER_SHADOW_STACK
 static inline int write_user_shstk_64(u64 __user *addr, u64 val)
 {
-       asm_volatile_goto("1: wrussq %[val], (%[addr])\n"
+       asm goto("1: wrussq %[val], (%[addr])\n"
                          _ASM_EXTABLE(1b, %l[fail])
                          :: [addr] "r" (addr), [val] "r" (val)
                          :: fail);
index 21f9407be5d357a8f4204addc66841dc50d9bf51..7e88705e907f411b416d25e533e06623997555ea 100644 (file)
@@ -58,12 +58,29 @@ extern long __ia32_sys_ni_syscall(const struct pt_regs *regs);
                ,,regs->di,,regs->si,,regs->dx                          \
                ,,regs->r10,,regs->r8,,regs->r9)                        \
 
+
+/* SYSCALL_PT_ARGS is adapted from s390x */
+#define SYSCALL_PT_ARG6(m, t1, t2, t3, t4, t5, t6)                     \
+       SYSCALL_PT_ARG5(m, t1, t2, t3, t4, t5), m(t6, (regs->bp))
+#define SYSCALL_PT_ARG5(m, t1, t2, t3, t4, t5)                         \
+       SYSCALL_PT_ARG4(m, t1, t2, t3, t4),  m(t5, (regs->di))
+#define SYSCALL_PT_ARG4(m, t1, t2, t3, t4)                             \
+       SYSCALL_PT_ARG3(m, t1, t2, t3),  m(t4, (regs->si))
+#define SYSCALL_PT_ARG3(m, t1, t2, t3)                                 \
+       SYSCALL_PT_ARG2(m, t1, t2), m(t3, (regs->dx))
+#define SYSCALL_PT_ARG2(m, t1, t2)                                     \
+       SYSCALL_PT_ARG1(m, t1), m(t2, (regs->cx))
+#define SYSCALL_PT_ARG1(m, t1) m(t1, (regs->bx))
+#define SYSCALL_PT_ARGS(x, ...) SYSCALL_PT_ARG##x(__VA_ARGS__)
+
+#define __SC_COMPAT_CAST(t, a)                                         \
+       (__typeof(__builtin_choose_expr(__TYPE_IS_L(t), 0, 0U)))        \
+       (unsigned int)a
+
 /* Mapping of registers to parameters for syscalls on i386 */
 #define SC_IA32_REGS_TO_ARGS(x, ...)                                   \
-       __MAP(x,__SC_ARGS                                               \
-             ,,(unsigned int)regs->bx,,(unsigned int)regs->cx          \
-             ,,(unsigned int)regs->dx,,(unsigned int)regs->si          \
-             ,,(unsigned int)regs->di,,(unsigned int)regs->bp)
+       SYSCALL_PT_ARGS(x, __SC_COMPAT_CAST,                            \
+                       __MAP(x, __SC_TYPE, __VA_ARGS__))               \
 
 #define __SYS_STUB0(abi, name)                                         \
        long __##abi##_##name(const struct pt_regs *regs);              \
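
A worked expansion may help here. For a three-argument i386 syscall taking (int fd, unsigned int buf, unsigned int len) -- an illustrative signature, not one from this patch:

/*
 *   SYSCALL_PT_ARGS(3, __SC_COMPAT_CAST, int, unsigned int, unsigned int)
 * -> SYSCALL_PT_ARG3(__SC_COMPAT_CAST, int, unsigned int, unsigned int)
 * -> __SC_COMPAT_CAST(int, (regs->bx)),
 *    __SC_COMPAT_CAST(unsigned int, (regs->cx)),
 *    __SC_COMPAT_CAST(unsigned int, (regs->dx))
 *
 * Each cast truncates the register to 32 bits first; __TYPE_IS_L() then
 * selects a signed or unsigned intermediate type, so long arguments are
 * sign-extended while everything else is zero-extended.
 */
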
index 5c367c1290c355fb3849800a38c88c3553175903..237dc8cdd12b9482f38f8543b85dbde88fb98d65 100644 (file)
@@ -133,7 +133,7 @@ extern int __get_user_bad(void);
 
 #ifdef CONFIG_X86_32
 #define __put_user_goto_u64(x, addr, label)                    \
-       asm_volatile_goto("\n"                                  \
+       asm goto("\n"                                   \
                     "1:        movl %%eax,0(%1)\n"             \
                     "2:        movl %%edx,4(%1)\n"             \
                     _ASM_EXTABLE_UA(1b, %l2)                   \
@@ -295,7 +295,7 @@ do {                                                                        \
 } while (0)
 
 #define __get_user_asm(x, addr, itype, ltype, label)                   \
-       asm_volatile_goto("\n"                                          \
+       asm_goto_output("\n"                                            \
                     "1:        mov"itype" %[umem],%[output]\n"         \
                     _ASM_EXTABLE_UA(1b, %l2)                           \
                     : [output] ltype(x)                                \
@@ -375,7 +375,7 @@ do {                                                                        \
        __typeof__(_ptr) _old = (__typeof__(_ptr))(_pold);              \
        __typeof__(*(_ptr)) __old = *_old;                              \
        __typeof__(*(_ptr)) __new = (_new);                             \
-       asm_volatile_goto("\n"                                          \
+       asm_goto_output("\n"                                            \
                     "1: " LOCK_PREFIX "cmpxchg"itype" %[new], %[ptr]\n"\
                     _ASM_EXTABLE_UA(1b, %l[label])                     \
                     : CC_OUT(z) (success),                             \
@@ -394,7 +394,7 @@ do {                                                                        \
        __typeof__(_ptr) _old = (__typeof__(_ptr))(_pold);              \
        __typeof__(*(_ptr)) __old = *_old;                              \
        __typeof__(*(_ptr)) __new = (_new);                             \
-       asm_volatile_goto("\n"                                          \
+       asm_goto_output("\n"                                            \
                     "1: " LOCK_PREFIX "cmpxchg8b %[ptr]\n"             \
                     _ASM_EXTABLE_UA(1b, %l[label])                     \
                     : CC_OUT(z) (success),                             \
@@ -477,7 +477,7 @@ struct __large_struct { unsigned long buf[100]; };
  * aliasing issues.
  */
 #define __put_user_goto(x, addr, itype, ltype, label)                  \
-       asm_volatile_goto("\n"                                          \
+       asm goto("\n"                                                   \
                "1:     mov"itype" %0,%1\n"                             \
                _ASM_EXTABLE_UA(1b, %l2)                                \
                : : ltype(x), "m" (__m(addr))                           \
index ab60a71a8dcb98e62bccf3c045066df8f42f30f4..472f0263dbc6129c30636f149b94d5828ac300c6 100644 (file)
@@ -4,6 +4,7 @@
 
 #include <linux/seqlock.h>
 #include <uapi/asm/vsyscall.h>
+#include <asm/page_types.h>
 
 #ifdef CONFIG_X86_VSYSCALL_EMULATION
 extern void map_vsyscall(void);
@@ -24,4 +25,13 @@ static inline bool emulate_vsyscall(unsigned long error_code,
 }
 #endif
 
+/*
+ * The (legacy) vsyscall page is the lone page in the kernel portion
+ * of the address space that has user-accessible permissions.
+ */
+static inline bool is_vsyscall_vaddr(unsigned long vaddr)
+{
+       return unlikely((vaddr & PAGE_MASK) == VSYSCALL_ADDR);
+}
+
 #endif /* _ASM_X86_VSYSCALL_H */
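
Typical use of the new helper, with handle_vsyscall_fault() standing in as a hypothetical caller-side function:

	if (is_vsyscall_vaddr(address))
		return handle_vsyscall_fault(regs, address);

The unlikely() hint is baked into the helper because, by construction, only one page in the kernel half of the address space can match.
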
index cc130b57542ac4033c1c653809f429306eb3e460..1d85cb7071cb21c84899477ec4a150d2fcc4da43 100644 (file)
@@ -403,7 +403,7 @@ noinstr void BUG_func(void)
 {
        BUG();
 }
-EXPORT_SYMBOL_GPL(BUG_func);
+EXPORT_SYMBOL(BUG_func);
 
 #define CALL_RIP_REL_OPCODE    0xff
 #define CALL_RIP_REL_MODRM     0x15
index 9f42d1c59e095ee6923a78cb2ecb04fbe375a438..f3abca334199d8eae235f1560f99448eb9675a27 100644 (file)
@@ -538,7 +538,7 @@ static void bsp_init_amd(struct cpuinfo_x86 *c)
 
        /* Figure out Zen generations: */
        switch (c->x86) {
-       case 0x17: {
+       case 0x17:
                switch (c->x86_model) {
                case 0x00 ... 0x2f:
                case 0x50 ... 0x5f:
@@ -554,8 +554,8 @@ static void bsp_init_amd(struct cpuinfo_x86 *c)
                        goto warn;
                }
                break;
-       }
-       case 0x19: {
+
+       case 0x19:
                switch (c->x86_model) {
                case 0x00 ... 0x0f:
                case 0x20 ... 0x5f:
@@ -569,7 +569,20 @@ static void bsp_init_amd(struct cpuinfo_x86 *c)
                        goto warn;
                }
                break;
-       }
+
+       case 0x1a:
+               switch (c->x86_model) {
+               case 0x00 ... 0x0f:
+               case 0x20 ... 0x2f:
+               case 0x40 ... 0x4f:
+               case 0x70 ... 0x7f:
+                       setup_force_cpu_cap(X86_FEATURE_ZEN5);
+                       break;
+               default:
+                       goto warn;
+               }
+               break;
+
        default:
                break;
        }
@@ -1039,6 +1052,11 @@ static void init_amd_zen4(struct cpuinfo_x86 *c)
                msr_set_bit(MSR_ZEN4_BP_CFG, MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT);
 }
 
+static void init_amd_zen5(struct cpuinfo_x86 *c)
+{
+       init_amd_zen_common();
+}
+
 static void init_amd(struct cpuinfo_x86 *c)
 {
        u64 vm_cr;
@@ -1084,6 +1102,8 @@ static void init_amd(struct cpuinfo_x86 *c)
                init_amd_zen3(c);
        else if (boot_cpu_has(X86_FEATURE_ZEN4))
                init_amd_zen4(c);
+       else if (boot_cpu_has(X86_FEATURE_ZEN5))
+               init_amd_zen5(c);
 
        /*
         * Enable workaround for FXSAVE leak on CPUs
index bb0ab8466b919809a861d7a2f979e132ad863289..48d049cd74e7123a178564ba6fc8ef1dc0212e2d 100644 (file)
@@ -111,9 +111,6 @@ DEFINE_STATIC_KEY_FALSE(switch_mm_cond_ibpb);
 /* Control unconditional IBPB in switch_mm() */
 DEFINE_STATIC_KEY_FALSE(switch_mm_always_ibpb);
 
-/* Control MDS CPU buffer clear before returning to user space */
-DEFINE_STATIC_KEY_FALSE(mds_user_clear);
-EXPORT_SYMBOL_GPL(mds_user_clear);
 /* Control MDS CPU buffer clear before idling (halt, mwait) */
 DEFINE_STATIC_KEY_FALSE(mds_idle_clear);
 EXPORT_SYMBOL_GPL(mds_idle_clear);
@@ -252,7 +249,7 @@ static void __init mds_select_mitigation(void)
                if (!boot_cpu_has(X86_FEATURE_MD_CLEAR))
                        mds_mitigation = MDS_MITIGATION_VMWERV;
 
-               static_branch_enable(&mds_user_clear);
+               setup_force_cpu_cap(X86_FEATURE_CLEAR_CPU_BUF);
 
                if (!boot_cpu_has(X86_BUG_MSBDS_ONLY) &&
                    (mds_nosmt || cpu_mitigations_auto_nosmt()))
@@ -356,7 +353,7 @@ static void __init taa_select_mitigation(void)
         * For guests that can't determine whether the correct microcode is
         * present on host, enable the mitigation for UCODE_NEEDED as well.
         */
-       static_branch_enable(&mds_user_clear);
+       setup_force_cpu_cap(X86_FEATURE_CLEAR_CPU_BUF);
 
        if (taa_nosmt || cpu_mitigations_auto_nosmt())
                cpu_smt_disable(false);
@@ -424,7 +421,7 @@ static void __init mmio_select_mitigation(void)
         */
        if (boot_cpu_has_bug(X86_BUG_MDS) || (boot_cpu_has_bug(X86_BUG_TAA) &&
                                              boot_cpu_has(X86_FEATURE_RTM)))
-               static_branch_enable(&mds_user_clear);
+               setup_force_cpu_cap(X86_FEATURE_CLEAR_CPU_BUF);
        else
                static_branch_enable(&mmio_stale_data_clear);
 
@@ -484,12 +481,12 @@ static void __init md_clear_update_mitigation(void)
        if (cpu_mitigations_off())
                return;
 
-       if (!static_key_enabled(&mds_user_clear))
+       if (!boot_cpu_has(X86_FEATURE_CLEAR_CPU_BUF))
                goto out;
 
        /*
-        * mds_user_clear is now enabled. Update MDS, TAA and MMIO Stale Data
-        * mitigation, if necessary.
+        * X86_FEATURE_CLEAR_CPU_BUF is now enabled. Update MDS, TAA and MMIO
+        * Stale Data mitigation, if necessary.
         */
        if (mds_mitigation == MDS_MITIGATION_OFF &&
            boot_cpu_has_bug(X86_BUG_MDS)) {
index 0b97bcde70c6102a4b82b561c3256ec53b614770..fbc4e60d027cbff23b91e0d8cf2720cabb64803c 100644 (file)
@@ -1589,6 +1589,7 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
                get_cpu_vendor(c);
                get_cpu_cap(c);
                setup_force_cpu_cap(X86_FEATURE_CPUID);
+               get_cpu_address_sizes(c);
                cpu_parse_early_param();
 
                if (this_cpu->c_early_init)
@@ -1601,10 +1602,9 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
                        this_cpu->c_bsp_init(c);
        } else {
                setup_clear_cpu_cap(X86_FEATURE_CPUID);
+               get_cpu_address_sizes(c);
        }
 
-       get_cpu_address_sizes(c);
-
        setup_force_cpu_cap(X86_FEATURE_ALWAYS);
 
        cpu_set_bug_bits(c);
index a927a8fc962448035f041c8b17f45ffb6bb9e079..40dec9b56f87db8348c1a242330f243c22c5199d 100644 (file)
@@ -184,6 +184,90 @@ static bool bad_spectre_microcode(struct cpuinfo_x86 *c)
        return false;
 }
 
+#define MSR_IA32_TME_ACTIVATE          0x982
+
+/* Helpers to access TME_ACTIVATE MSR */
+#define TME_ACTIVATE_LOCKED(x)         (x & 0x1)
+#define TME_ACTIVATE_ENABLED(x)                (x & 0x2)
+
+#define TME_ACTIVATE_POLICY(x)         ((x >> 4) & 0xf)        /* Bits 7:4 */
+#define TME_ACTIVATE_POLICY_AES_XTS_128        0
+
+#define TME_ACTIVATE_KEYID_BITS(x)     ((x >> 32) & 0xf)       /* Bits 35:32 */
+
+#define TME_ACTIVATE_CRYPTO_ALGS(x)    ((x >> 48) & 0xffff)    /* Bits 63:48 */
+#define TME_ACTIVATE_CRYPTO_AES_XTS_128        1
+
+/* Values for mktme_status (SW only construct) */
+#define MKTME_ENABLED                  0
+#define MKTME_DISABLED                 1
+#define MKTME_UNINITIALIZED            2
+static int mktme_status = MKTME_UNINITIALIZED;
+
+static void detect_tme_early(struct cpuinfo_x86 *c)
+{
+       u64 tme_activate, tme_policy, tme_crypto_algs;
+       int keyid_bits = 0, nr_keyids = 0;
+       static u64 tme_activate_cpu0 = 0;
+
+       rdmsrl(MSR_IA32_TME_ACTIVATE, tme_activate);
+
+       if (mktme_status != MKTME_UNINITIALIZED) {
+               if (tme_activate != tme_activate_cpu0) {
+                       /* Broken BIOS? */
+                       pr_err_once("x86/tme: configuration is inconsistent between CPUs\n");
+                       pr_err_once("x86/tme: MKTME is not usable\n");
+                       mktme_status = MKTME_DISABLED;
+
+                       /* Proceed. We may need to exclude bits from x86_phys_bits. */
+               }
+       } else {
+               tme_activate_cpu0 = tme_activate;
+       }
+
+       if (!TME_ACTIVATE_LOCKED(tme_activate) || !TME_ACTIVATE_ENABLED(tme_activate)) {
+               pr_info_once("x86/tme: not enabled by BIOS\n");
+               mktme_status = MKTME_DISABLED;
+               return;
+       }
+
+       if (mktme_status != MKTME_UNINITIALIZED)
+               goto detect_keyid_bits;
+
+       pr_info("x86/tme: enabled by BIOS\n");
+
+       tme_policy = TME_ACTIVATE_POLICY(tme_activate);
+       if (tme_policy != TME_ACTIVATE_POLICY_AES_XTS_128)
+               pr_warn("x86/tme: Unknown policy is active: %#llx\n", tme_policy);
+
+       tme_crypto_algs = TME_ACTIVATE_CRYPTO_ALGS(tme_activate);
+       if (!(tme_crypto_algs & TME_ACTIVATE_CRYPTO_AES_XTS_128)) {
+               pr_err("x86/mktme: No known encryption algorithm is supported: %#llx\n",
+                               tme_crypto_algs);
+               mktme_status = MKTME_DISABLED;
+       }
+detect_keyid_bits:
+       keyid_bits = TME_ACTIVATE_KEYID_BITS(tme_activate);
+       nr_keyids = (1UL << keyid_bits) - 1;
+       if (nr_keyids) {
+               pr_info_once("x86/mktme: enabled by BIOS\n");
+               pr_info_once("x86/mktme: %d KeyIDs available\n", nr_keyids);
+       } else {
+               pr_info_once("x86/mktme: disabled by BIOS\n");
+       }
+
+       if (mktme_status == MKTME_UNINITIALIZED) {
+               /* MKTME is usable */
+               mktme_status = MKTME_ENABLED;
+       }
+
+       /*
+        * KeyID bits effectively lower the number of physical address
+        * bits.  Update cpuinfo_x86::x86_phys_bits accordingly.
+        */
+       c->x86_phys_bits -= keyid_bits;
+}
+
 static void early_init_intel(struct cpuinfo_x86 *c)
 {
        u64 misc_enable;
@@ -322,6 +406,13 @@ static void early_init_intel(struct cpuinfo_x86 *c)
         */
        if (detect_extended_topology_early(c) < 0)
                detect_ht_early(c);
+
+       /*
+        * Adjust the number of physical bits early because it affects the
+        * valid bits of the MTRR mask registers.
+        */
+       if (cpu_has(c, X86_FEATURE_TME))
+               detect_tme_early(c);
 }
 
 static void bsp_init_intel(struct cpuinfo_x86 *c)
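
A quick worked example of why the adjustment has to happen this early: on a CPU reporting 46 physical address bits with 4 KeyID bits active, only 46 - 4 = 42 bits address real memory, and the MTRR mask registers programmed during early setup must be validated against 42 bits, not 46. The numbers are illustrative; the mechanism is the x86_phys_bits subtraction in detect_tme_early() above.
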
@@ -482,90 +573,6 @@ static void srat_detect_node(struct cpuinfo_x86 *c)
 #endif
 }
 
-#define MSR_IA32_TME_ACTIVATE          0x982
-
-/* Helpers to access TME_ACTIVATE MSR */
-#define TME_ACTIVATE_LOCKED(x)         (x & 0x1)
-#define TME_ACTIVATE_ENABLED(x)                (x & 0x2)
-
-#define TME_ACTIVATE_POLICY(x)         ((x >> 4) & 0xf)        /* Bits 7:4 */
-#define TME_ACTIVATE_POLICY_AES_XTS_128        0
-
-#define TME_ACTIVATE_KEYID_BITS(x)     ((x >> 32) & 0xf)       /* Bits 35:32 */
-
-#define TME_ACTIVATE_CRYPTO_ALGS(x)    ((x >> 48) & 0xffff)    /* Bits 63:48 */
-#define TME_ACTIVATE_CRYPTO_AES_XTS_128        1
-
-/* Values for mktme_status (SW only construct) */
-#define MKTME_ENABLED                  0
-#define MKTME_DISABLED                 1
-#define MKTME_UNINITIALIZED            2
-static int mktme_status = MKTME_UNINITIALIZED;
-
-static void detect_tme(struct cpuinfo_x86 *c)
-{
-       u64 tme_activate, tme_policy, tme_crypto_algs;
-       int keyid_bits = 0, nr_keyids = 0;
-       static u64 tme_activate_cpu0 = 0;
-
-       rdmsrl(MSR_IA32_TME_ACTIVATE, tme_activate);
-
-       if (mktme_status != MKTME_UNINITIALIZED) {
-               if (tme_activate != tme_activate_cpu0) {
-                       /* Broken BIOS? */
-                       pr_err_once("x86/tme: configuration is inconsistent between CPUs\n");
-                       pr_err_once("x86/tme: MKTME is not usable\n");
-                       mktme_status = MKTME_DISABLED;
-
-                       /* Proceed. We may need to exclude bits from x86_phys_bits. */
-               }
-       } else {
-               tme_activate_cpu0 = tme_activate;
-       }
-
-       if (!TME_ACTIVATE_LOCKED(tme_activate) || !TME_ACTIVATE_ENABLED(tme_activate)) {
-               pr_info_once("x86/tme: not enabled by BIOS\n");
-               mktme_status = MKTME_DISABLED;
-               return;
-       }
-
-       if (mktme_status != MKTME_UNINITIALIZED)
-               goto detect_keyid_bits;
-
-       pr_info("x86/tme: enabled by BIOS\n");
-
-       tme_policy = TME_ACTIVATE_POLICY(tme_activate);
-       if (tme_policy != TME_ACTIVATE_POLICY_AES_XTS_128)
-               pr_warn("x86/tme: Unknown policy is active: %#llx\n", tme_policy);
-
-       tme_crypto_algs = TME_ACTIVATE_CRYPTO_ALGS(tme_activate);
-       if (!(tme_crypto_algs & TME_ACTIVATE_CRYPTO_AES_XTS_128)) {
-               pr_err("x86/mktme: No known encryption algorithm is supported: %#llx\n",
-                               tme_crypto_algs);
-               mktme_status = MKTME_DISABLED;
-       }
-detect_keyid_bits:
-       keyid_bits = TME_ACTIVATE_KEYID_BITS(tme_activate);
-       nr_keyids = (1UL << keyid_bits) - 1;
-       if (nr_keyids) {
-               pr_info_once("x86/mktme: enabled by BIOS\n");
-               pr_info_once("x86/mktme: %d KeyIDs available\n", nr_keyids);
-       } else {
-               pr_info_once("x86/mktme: disabled by BIOS\n");
-       }
-
-       if (mktme_status == MKTME_UNINITIALIZED) {
-               /* MKTME is usable */
-               mktme_status = MKTME_ENABLED;
-       }
-
-       /*
-        * KeyID bits effectively lower the number of physical address
-        * bits.  Update cpuinfo_x86::x86_phys_bits accordingly.
-        */
-       c->x86_phys_bits -= keyid_bits;
-}
-
 static void init_cpuid_fault(struct cpuinfo_x86 *c)
 {
        u64 msr;
@@ -702,9 +709,6 @@ static void init_intel(struct cpuinfo_x86 *c)
 
        init_ia32_feat_ctl(c);
 
-       if (cpu_has(c, X86_FEATURE_TME))
-               detect_tme(c);
-
        init_intel_misc_features(c);
 
        split_lock_init();
index fb8cf953380dab44a5426f78733a25452ade3b87..b66f540de054a72403dbe3b4a837d6b1e280610d 100644 (file)
@@ -1017,10 +1017,12 @@ void __init e820__reserve_setup_data(void)
                e820__range_update(pa_data, sizeof(*data)+data->len, E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
 
                /*
-                * SETUP_EFI and SETUP_IMA are supplied by kexec and do not need
-                * to be reserved.
+                * SETUP_EFI, SETUP_IMA and SETUP_RNG_SEED are supplied by
+                * kexec and do not need to be reserved.
                 */
-               if (data->type != SETUP_EFI && data->type != SETUP_IMA)
+               if (data->type != SETUP_EFI &&
+                   data->type != SETUP_IMA &&
+                   data->type != SETUP_RNG_SEED)
                        e820__range_update_kexec(pa_data,
                                                 sizeof(*data) + data->len,
                                                 E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
index 558076dbde5bfca582139f8de63bd9ffa1050d6f..247f2225aa9f36f0a0fef0a22ed921b4748a7de5 100644 (file)
@@ -274,12 +274,13 @@ static int __restore_fpregs_from_user(void __user *buf, u64 ufeatures,
  * Attempt to restore the FPU registers directly from user memory.
  * Pagefaults are handled and any errors returned are fatal.
  */
-static bool restore_fpregs_from_user(void __user *buf, u64 xrestore,
-                                    bool fx_only, unsigned int size)
+static bool restore_fpregs_from_user(void __user *buf, u64 xrestore, bool fx_only)
 {
        struct fpu *fpu = &current->thread.fpu;
        int ret;
 
+       /* Restore enabled features only. */
+       xrestore &= fpu->fpstate->user_xfeatures;
 retry:
        fpregs_lock();
        /* Ensure that XFD is up to date */
@@ -309,7 +310,7 @@ retry:
                if (ret != X86_TRAP_PF)
                        return false;
 
-               if (!fault_in_readable(buf, size))
+               if (!fault_in_readable(buf, fpu->fpstate->user_size))
                        goto retry;
                return false;
        }
@@ -339,7 +340,6 @@ static bool __fpu_restore_sig(void __user *buf, void __user *buf_fx,
        struct user_i387_ia32_struct env;
        bool success, fx_only = false;
        union fpregs_state *fpregs;
-       unsigned int state_size;
        u64 user_xfeatures = 0;
 
        if (use_xsave()) {
@@ -349,17 +349,14 @@ static bool __fpu_restore_sig(void __user *buf, void __user *buf_fx,
                        return false;
 
                fx_only = !fx_sw_user.magic1;
-               state_size = fx_sw_user.xstate_size;
                user_xfeatures = fx_sw_user.xfeatures;
        } else {
                user_xfeatures = XFEATURE_MASK_FPSSE;
-               state_size = fpu->fpstate->user_size;
        }
 
        if (likely(!ia32_fxstate)) {
                /* Restore the FPU registers directly from user memory. */
-               return restore_fpregs_from_user(buf_fx, user_xfeatures, fx_only,
-                                               state_size);
+               return restore_fpregs_from_user(buf_fx, user_xfeatures, fx_only);
        }
 
        /*
index dfe9945b9becee7f6d0ca89d00fd3c1eb4e496c5..428ee74002e1eac63d0e269510f536ba2644c3f7 100644 (file)
@@ -434,7 +434,8 @@ static void __init sev_map_percpu_data(void)
 {
        int cpu;
 
-       if (!cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
+       if (cc_vendor != CC_VENDOR_AMD ||
+           !cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
                return;
 
        for_each_possible_cpu(cpu) {
index 17e955ab69feda933cca3708822f6f9f598e31bf..3082cf24b69e34a3a0ca09a50a72ee1aaec8ebc8 100644 (file)
@@ -563,9 +563,6 @@ nmi_restart:
        }
        if (this_cpu_dec_return(nmi_state))
                goto nmi_restart;
-
-       if (user_mode(regs))
-               mds_user_clear_cpu_buffers();
 }
 
 #if IS_ENABLED(CONFIG_KVM_INTEL)
index 87e3da7b0439790dac6b35aa4f95e8e7573284d7..65ed14b6540bbebfb91e1d20d0c7627277da3f26 100644 (file)
@@ -80,9 +80,10 @@ config KVM_SW_PROTECTED_VM
        depends on KVM && X86_64
        select KVM_GENERIC_PRIVATE_MEM
        help
-         Enable support for KVM software-protected VMs.  Currently "protected"
-         means the VM can be backed with memory provided by
-         KVM_CREATE_GUEST_MEMFD.
+         Enable support for KVM software-protected VMs.  Currently, software-
+         protected VMs are purely a development and testing vehicle for
+         KVM_CREATE_GUEST_MEMFD.  Attempting to run a "real" VM workload as a
+         software-protected VM will fail miserably.
 
          If unsure, say "N".
 
index 4943f6b2bbee491651bdacf288e4cdbda2e49dec..8a47f8541eab7098c991837c2b4e03c4822c445a 100644 (file)
@@ -1322,6 +1322,56 @@ static bool hv_check_msr_access(struct kvm_vcpu_hv *hv_vcpu, u32 msr)
        return false;
 }
 
+#define KVM_HV_WIN2016_GUEST_ID 0x1040a00003839
+#define KVM_HV_WIN2016_GUEST_ID_MASK (~GENMASK_ULL(23, 16)) /* mask out the service version */
+
+/*
+ * Hyper-V enabled Windows Server 2016 SMP VMs fail to boot in a !XSAVES &&
+ * XSAVEC configuration.
+ * Such a configuration can result from, for example, the AMD Erratum 1386 workaround.
+ *
+ * Print a notice so users aren't left wondering what's suddenly gone wrong.
+ */
+static void __kvm_hv_xsaves_xsavec_maybe_warn(struct kvm_vcpu *vcpu)
+{
+       struct kvm *kvm = vcpu->kvm;
+       struct kvm_hv *hv = to_kvm_hv(kvm);
+
+       /* Check again under the hv_lock.  */
+       if (hv->xsaves_xsavec_checked)
+               return;
+
+       if ((hv->hv_guest_os_id & KVM_HV_WIN2016_GUEST_ID_MASK) !=
+           KVM_HV_WIN2016_GUEST_ID)
+               return;
+
+       hv->xsaves_xsavec_checked = true;
+
+       /* UP configurations aren't affected */
+       if (atomic_read(&kvm->online_vcpus) < 2)
+               return;
+
+       if (guest_cpuid_has(vcpu, X86_FEATURE_XSAVES) ||
+           !guest_cpuid_has(vcpu, X86_FEATURE_XSAVEC))
+               return;
+
+       pr_notice_ratelimited("Booting SMP Windows KVM VM with !XSAVES && XSAVEC. "
+                             "If it fails to boot try disabling XSAVEC in the VM config.\n");
+}
+
+void kvm_hv_xsaves_xsavec_maybe_warn(struct kvm_vcpu *vcpu)
+{
+       struct kvm_hv *hv = to_kvm_hv(vcpu->kvm);
+
+       if (!vcpu->arch.hyperv_enabled ||
+           hv->xsaves_xsavec_checked)
+               return;
+
+       mutex_lock(&hv->hv_lock);
+       __kvm_hv_xsaves_xsavec_maybe_warn(vcpu);
+       mutex_unlock(&hv->hv_lock);
+}
+
 static int kvm_hv_set_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data,
                             bool host)
 {
index 1dc0b6604526a1c629b3709d54c674f1ae1ce080..923e64903da9afeeff80f76062c45bd5ab076717 100644 (file)
@@ -182,6 +182,8 @@ void kvm_hv_setup_tsc_page(struct kvm *kvm,
                           struct pvclock_vcpu_time_info *hv_clock);
 void kvm_hv_request_tsc_page_update(struct kvm *kvm);
 
+void kvm_hv_xsaves_xsavec_maybe_warn(struct kvm_vcpu *vcpu);
+
 void kvm_hv_init_vm(struct kvm *kvm);
 void kvm_hv_destroy_vm(struct kvm *kvm);
 int kvm_hv_vcpu_init(struct kvm_vcpu *vcpu);
@@ -267,6 +269,7 @@ int kvm_hv_vcpu_flush_tlb(struct kvm_vcpu *vcpu);
 static inline void kvm_hv_setup_tsc_page(struct kvm *kvm,
                                         struct pvclock_vcpu_time_info *hv_clock) {}
 static inline void kvm_hv_request_tsc_page_update(struct kvm *kvm) {}
+static inline void kvm_hv_xsaves_xsavec_maybe_warn(struct kvm_vcpu *vcpu) {}
 static inline void kvm_hv_init_vm(struct kvm *kvm) {}
 static inline void kvm_hv_destroy_vm(struct kvm *kvm) {}
 static inline int kvm_hv_vcpu_init(struct kvm_vcpu *vcpu)
index 2d6cdeab1f8a3e78306148d44a4665a1d51d8b1e..0544700ca50b8458ad97020bde53ec24432a21c2 100644 (file)
@@ -4405,6 +4405,31 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
        fault->mmu_seq = vcpu->kvm->mmu_invalidate_seq;
        smp_rmb();
 
+       /*
+        * Check for a relevant mmu_notifier invalidation event before getting
+        * the pfn from the primary MMU, and before acquiring mmu_lock.
+        *
+        * For mmu_lock, if there is an in-progress invalidation and the kernel
+        * allows preemption, the invalidation task may drop mmu_lock and yield
+        * in response to mmu_lock being contended, which is *very* counter-
+        * productive as this vCPU can't actually make forward progress until
+        * the invalidation completes.
+        *
+        * Retrying now can also avoid unnecessary lock contention in the primary
+        * MMU, as the primary MMU doesn't necessarily hold a single lock for
+        * the duration of the invalidation, i.e. faulting in a conflicting pfn
+        * can cause the invalidation to take longer by holding locks that are
+        * needed to complete the invalidation.
+        *
+        * Do the pre-check even for non-preemptible kernels, i.e. even if KVM
+        * will never yield mmu_lock in response to contention, as this vCPU is
+        * *guaranteed* to need to retry, i.e. waiting until mmu_lock is held
+        * to detect retry guarantees the worst case latency for the vCPU.
+        */
+       if (fault->slot &&
+           mmu_invalidate_retry_gfn_unsafe(vcpu->kvm, fault->mmu_seq, fault->gfn))
+               return RET_PF_RETRY;
+
        ret = __kvm_faultin_pfn(vcpu, fault);
        if (ret != RET_PF_CONTINUE)
                return ret;
@@ -4415,6 +4440,18 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
        if (unlikely(!fault->slot))
                return kvm_handle_noslot_fault(vcpu, fault, access);
 
+       /*
+        * Check again for a relevant mmu_notifier invalidation event purely to
+        * avoid contending mmu_lock.  Most invalidations will be detected by
+        * the previous check, but checking is extremely cheap relative to the
+        * overall cost of failing to detect the invalidation until after
+        * mmu_lock is acquired.
+        */
+       if (mmu_invalidate_retry_gfn_unsafe(vcpu->kvm, fault->mmu_seq, fault->gfn)) {
+               kvm_release_pfn_clean(fault->pfn);
+               return RET_PF_RETRY;
+       }
+
        return RET_PF_CONTINUE;
 }
 
@@ -4442,6 +4479,11 @@ static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
        if (!sp && kvm_test_request(KVM_REQ_MMU_FREE_OBSOLETE_ROOTS, vcpu))
                return true;
 
+       /*
+        * Check for a relevant mmu_notifier invalidation event one last time
+        * now that mmu_lock is held, as the "unsafe" checks performed without
+        * holding mmu_lock can get false negatives.
+        */
        return fault->slot &&
               mmu_invalidate_retry_gfn(vcpu->kvm, fault->mmu_seq, fault->gfn);
 }
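
The three checks above form a sample/work/re-validate pattern: snapshot mmu_invalidate_seq, do the expensive fault-in, re-check cheaply without mmu_lock, and only then validate authoritatively under the lock. A minimal userspace sketch of the idiom (stand-in names, not KVM code):

#include <stdatomic.h>
#include <stdbool.h>

static atomic_ulong mmu_invalidate_seq;
static atomic_bool  invalidation_in_progress;

/* Cheap, lockless check: false negatives are fine, the final check
 * under the lock is authoritative. */
static bool invalidate_retry_unsafe(unsigned long snapshot)
{
	return atomic_load(&invalidation_in_progress) ||
	       atomic_load(&mmu_invalidate_seq) != snapshot;
}

bool try_fault(void)
{
	unsigned long snapshot = atomic_load(&mmu_invalidate_seq);

	if (invalidate_retry_unsafe(snapshot))	/* before faulting in the pfn */
		return false;			/* retry from the top */
	/* ... expensive: fault the pfn in from the primary MMU ... */
	if (invalidate_retry_unsafe(snapshot))	/* before contending the lock */
		return false;
	/* ... take mmu_lock, re-check authoritatively, install mapping ... */
	return true;
}
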
index f760106c31f8a58d2941dbabd82531b9779089fa..a8ce5226b3b5785b0b73741148076dc978761fa0 100644 (file)
@@ -57,7 +57,7 @@ static bool sev_es_enabled = true;
 module_param_named(sev_es, sev_es_enabled, bool, 0444);
 
 /* enable/disable SEV-ES DebugSwap support */
-static bool sev_es_debug_swap_enabled = true;
+static bool sev_es_debug_swap_enabled = false;
 module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444);
 #else
 #define sev_enabled false
@@ -612,8 +612,11 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
        save->xss  = svm->vcpu.arch.ia32_xss;
        save->dr6  = svm->vcpu.arch.dr6;
 
-       if (sev_es_debug_swap_enabled)
+       if (sev_es_debug_swap_enabled) {
                save->sev_features |= SVM_SEV_FEAT_DEBUG_SWAP;
+               pr_warn_once("Enabling DebugSwap with KVM_SEV_ES_INIT. "
+                            "This will not work starting with Linux 6.10\n");
+       }
 
        pr_debug("Virtual Machine Save Area (VMSA):\n");
        print_hex_dump_debug("", DUMP_PREFIX_NONE, 16, 1, save, sizeof(*save), false);
@@ -1975,20 +1978,22 @@ int sev_mem_enc_register_region(struct kvm *kvm,
                goto e_free;
        }
 
-       region->uaddr = range->addr;
-       region->size = range->size;
-
-       list_add_tail(&region->list, &sev->regions_list);
-       mutex_unlock(&kvm->lock);
-
        /*
         * The guest may change the memory encryption attribute from C=0 -> C=1
         * or vice versa for this memory range. Let's make sure caches are
         * flushed to ensure that guest data gets written into memory with
-        * correct C-bit.
+        * correct C-bit.  Note, this must be done before dropping kvm->lock,
+        * as the region and its array of pages can be freed by a different task
+        * once kvm->lock is released.
         */
        sev_clflush_pages(region->pages, region->npages);
 
+       region->uaddr = range->addr;
+       region->size = range->size;
+
+       list_add_tail(&region->list, &sev->regions_list);
+       mutex_unlock(&kvm->lock);
+
        return ret;
 
 e_free:
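
The reordering above is an instance of a general rule: finish every side effect on an object, here the cache flush, before publishing it on a shared list, because once kvm->lock is dropped another task can unlink and free it. Sketched generically (flush_pages() is a stand-in for sev_clflush_pages()):

#include <pthread.h>

struct region {
	struct region *next;
	void *pages;
};

void flush_pages(void *pages);	/* stand-in for sev_clflush_pages() */

static pthread_mutex_t kvm_lock = PTHREAD_MUTEX_INITIALIZER;
static struct region *regions;

void register_region(struct region *r)
{
	pthread_mutex_lock(&kvm_lock);

	/* Flush BEFORE publishing: after the unlock below, another task
	 * may unregister and free 'r' and its page array. */
	flush_pages(r->pages);

	r->next = regions;
	regions = r;

	pthread_mutex_unlock(&kvm_lock);
}
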
index 36c8af87a707ac0556fb1e50157e70c6305df798..4e725854c63a10c8645fa3b875a7a718020e96fe 100644 (file)
@@ -8,7 +8,7 @@
 
 #define svm_asm(insn, clobber...)                              \
 do {                                                           \
-       asm_volatile_goto("1: " __stringify(insn) "\n\t"        \
+       asm goto("1: " __stringify(insn) "\n\t" \
                          _ASM_EXTABLE(1b, %l[fault])           \
                          ::: clobber : fault);                 \
        return;                                                 \
@@ -18,7 +18,7 @@ fault:                                                                \
 
 #define svm_asm1(insn, op1, clobber...)                                \
 do {                                                           \
-       asm_volatile_goto("1: "  __stringify(insn) " %0\n\t"    \
+       asm goto("1: "  __stringify(insn) " %0\n\t"     \
                          _ASM_EXTABLE(1b, %l[fault])           \
                          :: op1 : clobber : fault);            \
        return;                                                 \
@@ -28,7 +28,7 @@ fault:                                                                \
 
 #define svm_asm2(insn, op1, op2, clobber...)                           \
 do {                                                                   \
-       asm_volatile_goto("1: "  __stringify(insn) " %1, %0\n\t"        \
+       asm goto("1: "  __stringify(insn) " %1, %0\n\t" \
                          _ASM_EXTABLE(1b, %l[fault])                   \
                          :: op1, op2 : clobber : fault);               \
        return;                                                         \
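
These conversions (repeated in the VMX, Xtensa, and other hunks below) follow the removal of the asm_volatile_goto() macro, which existed only to paper over an old GCC bug with asm goto; with the kernel's current minimum compiler versions, plain asm goto is safe. Standalone, the construct looks like this (GCC/Clang, x86; illustrative, not kernel code):

#include <stdio.h>

static int checked_op(int x)
{
	asm goto("test %0, %0\n\t"
		 "js %l[negative]"	/* jump to the C label if sign set */
		 : /* no outputs may be combined with labels here */
		 : "r" (x)
		 : "cc"
		 : negative);
	return 0;
negative:
	return -1;
}

int main(void)
{
	printf("%d %d\n", checked_op(5), checked_op(-5));	/* 0 -1 */
	return 0;
}
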
index a6216c8747291f4c8aeed534117fad1f3808acb8..315c7c2ba89b13437fe4c3cbb93d92f75bd8f3f1 100644 (file)
@@ -71,7 +71,7 @@ static int fixed_pmc_events[] = {
 static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data)
 {
        struct kvm_pmc *pmc;
-       u8 old_fixed_ctr_ctrl = pmu->fixed_ctr_ctrl;
+       u64 old_fixed_ctr_ctrl = pmu->fixed_ctr_ctrl;
        int i;
 
        pmu->fixed_ctr_ctrl = data;
index edc3f16cc1896f29e4eef46da685d22b4c31c668..6a9bfdfbb6e59b2e613385cd2ad46cc651a0eb28 100644 (file)
@@ -2,7 +2,10 @@
 #ifndef __KVM_X86_VMX_RUN_FLAGS_H
 #define __KVM_X86_VMX_RUN_FLAGS_H
 
-#define VMX_RUN_VMRESUME       (1 << 0)
-#define VMX_RUN_SAVE_SPEC_CTRL (1 << 1)
+#define VMX_RUN_VMRESUME_SHIFT         0
+#define VMX_RUN_SAVE_SPEC_CTRL_SHIFT   1
+
+#define VMX_RUN_VMRESUME               BIT(VMX_RUN_VMRESUME_SHIFT)
+#define VMX_RUN_SAVE_SPEC_CTRL         BIT(VMX_RUN_SAVE_SPEC_CTRL_SHIFT)
 
 #endif /* __KVM_X86_VMX_RUN_FLAGS_H */
index 906ecd001511355d0939e4e90a3994a7bd9809e3..2bfbf758d06110f49c71a22c1f54da9d9499669a 100644 (file)
@@ -139,7 +139,7 @@ SYM_FUNC_START(__vmx_vcpu_run)
        mov (%_ASM_SP), %_ASM_AX
 
        /* Check if vmlaunch or vmresume is needed */
-       test $VMX_RUN_VMRESUME, %ebx
+       bt   $VMX_RUN_VMRESUME_SHIFT, %ebx
 
        /* Load guest registers.  Don't clobber flags. */
        mov VCPU_RCX(%_ASM_AX), %_ASM_CX
@@ -161,8 +161,11 @@ SYM_FUNC_START(__vmx_vcpu_run)
        /* Load guest RAX.  This kills the @regs pointer! */
        mov VCPU_RAX(%_ASM_AX), %_ASM_AX
 
-       /* Check EFLAGS.ZF from 'test VMX_RUN_VMRESUME' above */
-       jz .Lvmlaunch
+       /* Clobbers EFLAGS.ZF */
+       CLEAR_CPU_BUFFERS
+
+       /* Check EFLAGS.CF from the VMX_RUN_VMRESUME bit test above. */
+       jnc .Lvmlaunch
 
        /*
         * After a successful VMRESUME/VMLAUNCH, control flow "magically"
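
The test to bt change is required by the CLEAR_CPU_BUFFERS insertion: VERW (which CLEAR_CPU_BUFFERS expands to on affected CPUs) overwrites EFLAGS.ZF, which test/jz depends on, whereas bt latches the tested bit into EFLAGS.CF, which VERW leaves alone. In isolation, the CF-based form of the check (userspace illustration):

#include <stdio.h>

static int vmresume_bit_set(unsigned int flags)
{
	int set;

	asm("bt $0, %1\n\t"	/* CF = bit 0 (VMX_RUN_VMRESUME_SHIFT) */
	    "sbb %0, %0"	/* set = CF ? -1 : 0, read via CF not ZF */
	    : "=r" (set)
	    : "r" (flags)
	    : "cc");
	return !!set;
}

int main(void)
{
	printf("%d %d\n", vmresume_bit_set(1), vmresume_bit_set(2));	/* 1 0 */
	return 0;
}
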
index e262bc2ba4e569983a94c3db5c13e1f0dabd9951..88a4ff200d04bf2ae6c9c2953608c3d34b2ac592 100644 (file)
@@ -388,7 +388,16 @@ static __always_inline void vmx_enable_fb_clear(struct vcpu_vmx *vmx)
 
 static void vmx_update_fb_clear_dis(struct kvm_vcpu *vcpu, struct vcpu_vmx *vmx)
 {
-       vmx->disable_fb_clear = (host_arch_capabilities & ARCH_CAP_FB_CLEAR_CTRL) &&
+       /*
+        * Disable VERW's behavior of clearing CPU buffers for the guest if the
+        * CPU isn't affected by MDS/TAA, and the host hasn't forcefully enabled
+        * the mitigation. Disabling the clearing behavior provides a
+        * performance boost for guests that aren't aware that manually clearing
+        * CPU buffers is unnecessary, at the cost of MSR accesses on VM-Entry
+        * and VM-Exit.
+        */
+       vmx->disable_fb_clear = !cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF) &&
+                               (host_arch_capabilities & ARCH_CAP_FB_CLEAR_CTRL) &&
                                !boot_cpu_has_bug(X86_BUG_MDS) &&
                                !boot_cpu_has_bug(X86_BUG_TAA);
 
@@ -738,7 +747,7 @@ static int vmx_set_guest_uret_msr(struct vcpu_vmx *vmx,
  */
 static int kvm_cpu_vmxoff(void)
 {
-       asm_volatile_goto("1: vmxoff\n\t"
+       asm goto("1: vmxoff\n\t"
                          _ASM_EXTABLE(1b, %l[fault])
                          ::: "cc", "memory" : fault);
 
@@ -2784,7 +2793,7 @@ static int kvm_cpu_vmxon(u64 vmxon_pointer)
 
        cr4_set_bits(X86_CR4_VMXE);
 
-       asm_volatile_goto("1: vmxon %[vmxon_pointer]\n\t"
+       asm goto("1: vmxon %[vmxon_pointer]\n\t"
                          _ASM_EXTABLE(1b, %l[fault])
                          : : [vmxon_pointer] "m"(vmxon_pointer)
                          : : fault);
@@ -7224,11 +7233,14 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 
        guest_state_enter_irqoff();
 
-       /* L1D Flush includes CPU buffer clear to mitigate MDS */
+       /*
+        * L1D Flush includes CPU buffer clear to mitigate MDS, but VERW
+        * mitigation for MDS is done late in VM-Entry and is still
+        * executed in spite of L1D Flush. This is because an extra VERW
+        * should not matter much after the big hammer L1D Flush.
+        */
        if (static_branch_unlikely(&vmx_l1d_should_flush))
                vmx_l1d_flush(vcpu);
-       else if (static_branch_unlikely(&mds_user_clear))
-               mds_clear_cpu_buffers();
        else if (static_branch_unlikely(&mmio_stale_data_clear) &&
                 kvm_arch_has_assigned_device(vcpu->kvm))
                mds_clear_cpu_buffers();
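
For context on what disable_fb_clear gates: it is consumed by helpers (not in this hunk) that set FB_CLEAR_DIS in MSR_IA32_MCU_OPT_CTRL around VM-entry so that a guest's gratuitous VERW stops flushing fill buffers. A hedged sketch, modeled on vmx_disable_fb_clear() and simplified:

static __always_inline void vmx_disable_fb_clear(struct vcpu_vmx *vmx)
{
	u64 msr;

	if (!vmx->disable_fb_clear)
		return;

	msr = __rdmsr(MSR_IA32_MCU_OPT_CTRL);
	msr |= FB_CLEAR_DIS;	/* guest VERW no longer clears CPU buffers */
	native_wrmsrl(MSR_IA32_MCU_OPT_CTRL, msr);
	vmx->msr_ia32_mcu_opt_ctrl = msr;
}
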
index f41ce3c24123a93e08a996d4daa8e033077caf6d..8060e5fc6dbd83e145f6c08fca370e4a42d9861b 100644 (file)
@@ -94,7 +94,7 @@ static __always_inline unsigned long __vmcs_readl(unsigned long field)
 
 #ifdef CONFIG_CC_HAS_ASM_GOTO_OUTPUT
 
-       asm_volatile_goto("1: vmread %[field], %[output]\n\t"
+       asm_goto_output("1: vmread %[field], %[output]\n\t"
                          "jna %l[do_fail]\n\t"
 
                          _ASM_EXTABLE(1b, %l[do_exception])
@@ -188,7 +188,7 @@ static __always_inline unsigned long vmcs_readl(unsigned long field)
 
 #define vmx_asm1(insn, op1, error_args...)                             \
 do {                                                                   \
-       asm_volatile_goto("1: " __stringify(insn) " %0\n\t"             \
+       asm goto("1: " __stringify(insn) " %0\n\t"                      \
                          ".byte 0x2e\n\t" /* branch not taken hint */  \
                          "jna %l[error]\n\t"                           \
                          _ASM_EXTABLE(1b, %l[fault])                   \
@@ -205,7 +205,7 @@ fault:                                                                      \
 
 #define vmx_asm2(insn, op1, op2, error_args...)                                \
 do {                                                                   \
-       asm_volatile_goto("1: "  __stringify(insn) " %1, %0\n\t"        \
+       asm goto("1: "  __stringify(insn) " %1, %0\n\t"                 \
                          ".byte 0x2e\n\t" /* branch not taken hint */  \
                          "jna %l[error]\n\t"                           \
                          _ASM_EXTABLE(1b, %l[fault])                   \
index 363b1c08020578b090b53d74482d9c8912629ec9..e02cc710f56de285fc080a0c0a4ecca4addaff4f 100644 (file)
@@ -1704,22 +1704,17 @@ static int do_get_msr_feature(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
        struct kvm_msr_entry msr;
        int r;
 
+       /* Unconditionally clear the output for simplicity */
+       msr.data = 0;
        msr.index = index;
        r = kvm_get_msr_feature(&msr);
 
-       if (r == KVM_MSR_RET_INVALID) {
-               /* Unconditionally clear the output for simplicity */
-               *data = 0;
-               if (kvm_msr_ignored_check(index, 0, false))
-                       r = 0;
-       }
-
-       if (r)
-               return r;
+       if (r == KVM_MSR_RET_INVALID && kvm_msr_ignored_check(index, 0, false))
+               r = 0;
 
        *data = msr.data;
 
-       return 0;
+       return r;
 }
 
 static bool __kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer)
@@ -1782,6 +1777,10 @@ static int set_efer(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
        if ((efer ^ old_efer) & KVM_MMU_EFER_ROLE_BITS)
                kvm_mmu_reset_context(vcpu);
 
+       if (!static_cpu_has(X86_FEATURE_XSAVES) &&
+           (efer & EFER_SVME))
+               kvm_hv_xsaves_xsavec_maybe_warn(vcpu);
+
        return 0;
 }
 
@@ -2507,7 +2506,7 @@ static u64 compute_guest_tsc(struct kvm_vcpu *vcpu, s64 kernel_ns)
 }
 
 #ifdef CONFIG_X86_64
-static inline int gtod_is_based_on_tsc(int mode)
+static inline bool gtod_is_based_on_tsc(int mode)
 {
        return mode == VDSO_CLOCKMODE_TSC || mode == VDSO_CLOCKMODE_HVCLOCK;
 }
@@ -4581,7 +4580,7 @@ static bool kvm_is_vm_type_supported(unsigned long type)
 {
        return type == KVM_X86_DEFAULT_VM ||
               (type == KVM_X86_SW_PROTECTED_VM &&
-               IS_ENABLED(CONFIG_KVM_SW_PROTECTED_VM) && tdp_enabled);
+               IS_ENABLED(CONFIG_KVM_SW_PROTECTED_VM) && tdp_mmu_enabled);
 }
 
 int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
@@ -5454,7 +5453,8 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
        if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
                vcpu->arch.nmi_pending = 0;
                atomic_set(&vcpu->arch.nmi_queued, events->nmi.pending);
-               kvm_make_request(KVM_REQ_NMI, vcpu);
+               if (events->nmi.pending)
+                       kvm_make_request(KVM_REQ_NMI, vcpu);
        }
        static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
 
@@ -7016,6 +7016,9 @@ set_identity_unlock:
                r = -EEXIST;
                if (kvm->arch.vpit)
                        goto create_pit_unlock;
+               r = -ENOENT;
+               if (!pic_in_kernel(kvm))
+                       goto create_pit_unlock;
                r = -ENOMEM;
                kvm->arch.vpit = kvm_create_pit(kvm, u.pit_config.flags);
                if (kvm->arch.vpit)
@@ -8004,6 +8007,16 @@ static int emulator_cmpxchg_emulated(struct x86_emulate_ctxt *ctxt,
 
        if (r < 0)
                return X86EMUL_UNHANDLEABLE;
+
+       /*
+        * Mark the page dirty _before_ checking whether or not the CMPXCHG was
+        * successful, as the old value is written back on failure.  Note, for
+        * live migration, this is unnecessarily conservative as CMPXCHG writes
+        * back the original value and the access is atomic, but KVM's ABI is
+        * that all writes are dirty logged, regardless of the value written.
+        */
+       kvm_vcpu_mark_page_dirty(vcpu, gpa_to_gfn(gpa));
+
        if (r)
                return X86EMUL_CMPXCHG_FAILED;
 
index 20ef350a60fbb59a4b183bc3e54b1d517e6bba9b..10d5ed8b5990f4d2f64436b71905a9d817df11a1 100644 (file)
@@ -163,23 +163,23 @@ SYM_CODE_END(__get_user_8_handle_exception)
 #endif
 
 /* get_user */
-       _ASM_EXTABLE(1b, __get_user_handle_exception)
-       _ASM_EXTABLE(2b, __get_user_handle_exception)
-       _ASM_EXTABLE(3b, __get_user_handle_exception)
+       _ASM_EXTABLE_UA(1b, __get_user_handle_exception)
+       _ASM_EXTABLE_UA(2b, __get_user_handle_exception)
+       _ASM_EXTABLE_UA(3b, __get_user_handle_exception)
 #ifdef CONFIG_X86_64
-       _ASM_EXTABLE(4b, __get_user_handle_exception)
+       _ASM_EXTABLE_UA(4b, __get_user_handle_exception)
 #else
-       _ASM_EXTABLE(4b, __get_user_8_handle_exception)
-       _ASM_EXTABLE(5b, __get_user_8_handle_exception)
+       _ASM_EXTABLE_UA(4b, __get_user_8_handle_exception)
+       _ASM_EXTABLE_UA(5b, __get_user_8_handle_exception)
 #endif
 
 /* __get_user */
-       _ASM_EXTABLE(6b, __get_user_handle_exception)
-       _ASM_EXTABLE(7b, __get_user_handle_exception)
-       _ASM_EXTABLE(8b, __get_user_handle_exception)
+       _ASM_EXTABLE_UA(6b, __get_user_handle_exception)
+       _ASM_EXTABLE_UA(7b, __get_user_handle_exception)
+       _ASM_EXTABLE_UA(8b, __get_user_handle_exception)
 #ifdef CONFIG_X86_64
-       _ASM_EXTABLE(9b, __get_user_handle_exception)
+       _ASM_EXTABLE_UA(9b, __get_user_handle_exception)
 #else
-       _ASM_EXTABLE(9b, __get_user_8_handle_exception)
-       _ASM_EXTABLE(10b, __get_user_8_handle_exception)
+       _ASM_EXTABLE_UA(9b, __get_user_8_handle_exception)
+       _ASM_EXTABLE_UA(10b, __get_user_8_handle_exception)
 #endif
index 2877f59341775aa38a68d152f72e0d55606c1cac..975c9c18263d2afd926c12a8bfffe0c2d72d43cd 100644 (file)
@@ -133,15 +133,15 @@ SYM_CODE_START_LOCAL(__put_user_handle_exception)
        RET
 SYM_CODE_END(__put_user_handle_exception)
 
-       _ASM_EXTABLE(1b, __put_user_handle_exception)
-       _ASM_EXTABLE(2b, __put_user_handle_exception)
-       _ASM_EXTABLE(3b, __put_user_handle_exception)
-       _ASM_EXTABLE(4b, __put_user_handle_exception)
-       _ASM_EXTABLE(5b, __put_user_handle_exception)
-       _ASM_EXTABLE(6b, __put_user_handle_exception)
-       _ASM_EXTABLE(7b, __put_user_handle_exception)
-       _ASM_EXTABLE(9b, __put_user_handle_exception)
+       _ASM_EXTABLE_UA(1b, __put_user_handle_exception)
+       _ASM_EXTABLE_UA(2b, __put_user_handle_exception)
+       _ASM_EXTABLE_UA(3b, __put_user_handle_exception)
+       _ASM_EXTABLE_UA(4b, __put_user_handle_exception)
+       _ASM_EXTABLE_UA(5b, __put_user_handle_exception)
+       _ASM_EXTABLE_UA(6b, __put_user_handle_exception)
+       _ASM_EXTABLE_UA(7b, __put_user_handle_exception)
+       _ASM_EXTABLE_UA(9b, __put_user_handle_exception)
 #ifdef CONFIG_X86_32
-       _ASM_EXTABLE(8b, __put_user_handle_exception)
-       _ASM_EXTABLE(10b, __put_user_handle_exception)
+       _ASM_EXTABLE_UA(8b, __put_user_handle_exception)
+       _ASM_EXTABLE_UA(10b, __put_user_handle_exception)
 #endif
index 679b09cfe241c72e7f85bd7bbd406d59a259bf2a..d6375b3c633bc45474bbb2d6460512863ff14a51 100644 (file)
@@ -798,15 +798,6 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code,
        show_opcodes(regs, loglvl);
 }
 
-/*
- * The (legacy) vsyscall page is the long page in the kernel portion
- * of the address space that has user-accessible permissions.
- */
-static bool is_vsyscall_vaddr(unsigned long vaddr)
-{
-       return unlikely((vaddr & PAGE_MASK) == VSYSCALL_ADDR);
-}
-
 static void
 __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
                       unsigned long address, u32 pkey, int si_code)
index 968d7005f4a72454ccf8678967f040fe06f36ad6..f50cc210a981886e7d3a265b4d43ca16f47f6825 100644 (file)
@@ -26,18 +26,31 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
        for (; addr < end; addr = next) {
                pud_t *pud = pud_page + pud_index(addr);
                pmd_t *pmd;
+               bool use_gbpage;
 
                next = (addr & PUD_MASK) + PUD_SIZE;
                if (next > end)
                        next = end;
 
-               if (info->direct_gbpages) {
-                       pud_t pudval;
+               /* if this is already a gbpage, this portion is already mapped */
+               if (pud_large(*pud))
+                       continue;
+
+               /* Is using a gbpage allowed? */
+               use_gbpage = info->direct_gbpages;
 
-                       if (pud_present(*pud))
-                               continue;
+               /* Don't use gbpage if it maps more than the requested region. */
+               /* at the beginning: */
+               use_gbpage &= ((addr & ~PUD_MASK) == 0);
+               /* ... or at the end: */
+               use_gbpage &= ((next & ~PUD_MASK) == 0);
+
+               /* Never overwrite existing mappings */
+               use_gbpage &= !pud_present(*pud);
+
+               if (use_gbpage) {
+                       pud_t pudval;
 
-                       addr &= PUD_MASK;
                        pudval = __pud((addr - info->offset) | info->page_flag);
                        set_pud(pud, pudval);
                        continue;
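
The new use_gbpage conditions compose a simple predicate: a gbpage may be used only when both edges of the [addr, next) chunk are 1 GiB aligned and the PUD slot is empty. Standalone, with x86-64 constants:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PUD_SHIFT 30			/* 1 GiB pages on x86-64 */
#define PUD_SIZE  (1ULL << PUD_SHIFT)
#define PUD_MASK  (~(PUD_SIZE - 1))

/* Mirror of the use_gbpage logic: the gbpage must not map more than
 * the requested region at either edge, and must not clobber a mapping. */
static bool can_use_gbpage(uint64_t addr, uint64_t next, bool allowed,
			   bool pud_present)
{
	bool use = allowed;

	use &= (addr & ~PUD_MASK) == 0;	/* aligned at the beginning */
	use &= (next & ~PUD_MASK) == 0;	/* ... and at the end */
	use &= !pud_present;		/* never overwrite mappings */
	return use;
}

int main(void)
{
	printf("%d\n", can_use_gbpage(0x40000000, 0x80000000, true, false)); /* 1 */
	printf("%d\n", can_use_gbpage(0x40001000, 0x80000000, true, false)); /* 0 */
	return 0;
}
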
index 6993f026adec9d12a68cdbf3af3314336882f36f..42115ac079cfe617b76199a167c61e5b3c7de10f 100644 (file)
@@ -3,6 +3,8 @@
 #include <linux/uaccess.h>
 #include <linux/kernel.h>
 
+#include <asm/vsyscall.h>
+
 #ifdef CONFIG_X86_64
 bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
 {
@@ -15,6 +17,14 @@ bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
        if (vaddr < TASK_SIZE_MAX + PAGE_SIZE)
                return false;
 
+       /*
+        * Reading from the vsyscall page may cause an unhandled fault in
+        * certain cases.  Though it is at an address above TASK_SIZE_MAX, it is
+        * usually considered a user-space address.
+        */
+       if (is_vsyscall_vaddr(vaddr))
+               return false;
+
        /*
         * Allow everything during early boot before 'x86_virt_bits'
         * is initialized.  Needed for instruction decoding in early
index adc497b93f03746aca087a71233b806a5790bf96..65e9a6e391c046d1c18c32ffa0049082461a82dd 100644 (file)
@@ -934,7 +934,7 @@ static int __init cmp_memblk(const void *a, const void *b)
        const struct numa_memblk *ma = *(const struct numa_memblk **)a;
        const struct numa_memblk *mb = *(const struct numa_memblk **)b;
 
-       return ma->start - mb->start;
+       return (ma->start > mb->start) - (ma->start < mb->start);
 }
 
 static struct numa_memblk *numa_memblk_list[NR_NODE_MEMBLKS] __initdata;
@@ -944,14 +944,12 @@ static struct numa_memblk *numa_memblk_list[NR_NODE_MEMBLKS] __initdata;
  * @start: address to begin fill
  * @end: address to end fill
  *
- * Find and extend numa_meminfo memblks to cover the @start-@end
- * physical address range, such that the first memblk includes
- * @start, the last memblk includes @end, and any gaps in between
- * are filled.
+ * Find and extend numa_meminfo memblks to cover the physical
+ * address range @start-@end.
  *
  * RETURNS:
  * 0             : Success
- * NUMA_NO_MEMBLK : No memblk exists in @start-@end range
+ * NUMA_NO_MEMBLK : No memblks exist in address range @start-@end
  */
 
 int __init numa_fill_memblks(u64 start, u64 end)
@@ -963,17 +961,14 @@ int __init numa_fill_memblks(u64 start, u64 end)
 
        /*
         * Create a list of pointers to numa_meminfo memblks that
-        * overlap start, end. Exclude (start == bi->end) since
-        * end addresses in both a CFMWS range and a memblk range
-        * are exclusive.
-        *
-        * This list of pointers is used to make in-place changes
-        * that fill out the numa_meminfo memblks.
+        * overlap start, end. The list is used to make in-place
+        * changes that fill out the numa_meminfo memblks.
         */
        for (int i = 0; i < mi->nr_blks; i++) {
                struct numa_memblk *bi = &mi->blk[i];
 
-               if (start < bi->end && end >= bi->start) {
+               if (memblock_addrs_overlap(start, end - start, bi->start,
+                                          bi->end - bi->start)) {
                        blk[count] = &mi->blk[i];
                        count++;
                }
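
The cmp_memblk() change fixes a classic sort-comparator bug: ma->start - mb->start computes a u64 difference and truncates it to int, so addresses that differ only above bit 31 can compare as equal or with the wrong sign. The (a > b) - (a < b) idiom sidesteps both overflow and truncation. Demonstrated standalone:

#include <stdint.h>
#include <stdio.h>

static int cmp_buggy(uint64_t a, uint64_t b)
{
	return (int)(a - b);	/* 64-bit difference truncated to int */
}

static int cmp_fixed(uint64_t a, uint64_t b)
{
	return (a > b) - (a < b);
}

int main(void)
{
	uint64_t a = 0x100000000ULL, b = 0;	/* differ only above bit 31 */

	printf("buggy=%d fixed=%d\n", cmp_buggy(a, b), cmp_fixed(a, b));
	/* buggy=0 ("equal", wrong), fixed=1 */
	return 0;
}
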
index e9b448d1b1b70f08dae6216250f02e783091a83a..10288040404635743bac78ee8bb0fd721b370b88 100644 (file)
@@ -755,10 +755,14 @@ pmd_t *lookup_pmd_address(unsigned long address)
  * areas on 32-bit NUMA systems.  The percpu areas can
  * end up in this kind of memory, for instance.
  *
- * This could be optimized, but it is only intended to be
- * used at initialization time, and keeping it
- * unoptimized should increase the testing coverage for
- * the more obscure platforms.
+ * Note that as long as the PTEs are well-formed with correct PFNs, this
+ * works without checking the PRESENT bit in the leaf PTE.  This is unlike
+ * the similar vmalloc_to_page() and derivatives.  Callers may depend on
+ * this behavior.
+ *
+ * This could be optimized, but it is only used in paths that are not perf
+ * sensitive, and keeping it unoptimized should increase the testing coverage
+ * for the more obscure platforms.
  */
 phys_addr_t slow_virt_to_phys(void *__virt_addr)
 {
@@ -2041,17 +2045,12 @@ int set_mce_nospec(unsigned long pfn)
        return rc;
 }
 
-static int set_memory_p(unsigned long *addr, int numpages)
-{
-       return change_page_attr_set(addr, numpages, __pgprot(_PAGE_PRESENT), 0);
-}
-
 /* Restore full speculative operation to the pfn. */
 int clear_mce_nospec(unsigned long pfn)
 {
        unsigned long addr = (unsigned long) pfn_to_kaddr(pfn);
 
-       return set_memory_p(&addr, 1);
+       return set_memory_p(addr, 1);
 }
 EXPORT_SYMBOL_GPL(clear_mce_nospec);
 #endif /* CONFIG_X86_64 */
@@ -2104,6 +2103,11 @@ int set_memory_np_noalias(unsigned long addr, int numpages)
                                        CPA_NO_CHECK_ALIAS, NULL);
 }
 
+int set_memory_p(unsigned long addr, int numpages)
+{
+       return change_page_attr_set(&addr, numpages, __pgprot(_PAGE_PRESENT), 0);
+}
+
 int set_memory_4k(unsigned long addr, int numpages)
 {
        return change_page_attr_set_clr(&addr, numpages, __pgprot(0),
index 4b0d6fff88de5a544e2ef91a8c3f7c5fa1339aa5..1fb9a1644d944b825a7eb5735b68f48b4c8df9ce 100644 (file)
@@ -65,6 +65,8 @@ int xen_smp_intr_init(unsigned int cpu)
        char *resched_name, *callfunc_name, *debug_name;
 
        resched_name = kasprintf(GFP_KERNEL, "resched%d", cpu);
+       if (!resched_name)
+               goto fail_mem;
        per_cpu(xen_resched_irq, cpu).name = resched_name;
        rc = bind_ipi_to_irqhandler(XEN_RESCHEDULE_VECTOR,
                                    cpu,
@@ -77,6 +79,8 @@ int xen_smp_intr_init(unsigned int cpu)
        per_cpu(xen_resched_irq, cpu).irq = rc;
 
        callfunc_name = kasprintf(GFP_KERNEL, "callfunc%d", cpu);
+       if (!callfunc_name)
+               goto fail_mem;
        per_cpu(xen_callfunc_irq, cpu).name = callfunc_name;
        rc = bind_ipi_to_irqhandler(XEN_CALL_FUNCTION_VECTOR,
                                    cpu,
@@ -90,6 +94,9 @@ int xen_smp_intr_init(unsigned int cpu)
 
        if (!xen_fifo_events) {
                debug_name = kasprintf(GFP_KERNEL, "debug%d", cpu);
+               if (!debug_name)
+                       goto fail_mem;
+
                per_cpu(xen_debug_irq, cpu).name = debug_name;
                rc = bind_virq_to_irqhandler(VIRQ_DEBUG, cpu,
                                             xen_debug_interrupt,
@@ -101,6 +108,9 @@ int xen_smp_intr_init(unsigned int cpu)
        }
 
        callfunc_name = kasprintf(GFP_KERNEL, "callfuncsingle%d", cpu);
+       if (!callfunc_name)
+               goto fail_mem;
+
        per_cpu(xen_callfuncsingle_irq, cpu).name = callfunc_name;
        rc = bind_ipi_to_irqhandler(XEN_CALL_FUNCTION_SINGLE_VECTOR,
                                    cpu,
@@ -114,6 +124,8 @@ int xen_smp_intr_init(unsigned int cpu)
 
        return 0;
 
+ fail_mem:
+       rc = -ENOMEM;
  fail:
        xen_smp_intr_free(cpu);
        return rc;
index c812bf85021c02db59d107fda752dba2872e3b60..46c8596259d2d921fc0395a9eaf8dd444032029c 100644 (file)
@@ -13,7 +13,7 @@
 static __always_inline bool arch_static_branch(struct static_key *key,
                                               bool branch)
 {
-       asm_volatile_goto("1:\n\t"
+       asm goto("1:\n\t"
                          "_nop\n\t"
                          ".pushsection __jump_table,  \"aw\"\n\t"
                          ".word 1b, %l[l_yes], %c0\n\t"
@@ -38,7 +38,7 @@ static __always_inline bool arch_static_branch_jump(struct static_key *key,
         * make it reachable and wrap both into a no-transform block
         * to avoid any assembler interference with this.
         */
-       asm_volatile_goto("1:\n\t"
+       asm goto("1:\n\t"
                          ".begin no-transform\n\t"
                          "_j %l[l_yes]\n\t"
                          "2:\n\t"
index 178cf96ca10acb4cd70f8cd052e58503a9ee1494..defc67909a9c745a9794847a3b0bfbeb9ca74536 100644 (file)
@@ -264,16 +264,18 @@ static int __init simdisk_setup(struct simdisk *dev, int which,
                struct proc_dir_entry *procdir)
 {
        char tmp[2] = { '0' + which, 0 };
-       int err = -ENOMEM;
+       int err;
 
        dev->fd = -1;
        dev->filename = NULL;
        spin_lock_init(&dev->lock);
        dev->users = 0;
 
-       dev->gd = blk_alloc_disk(NUMA_NO_NODE);
-       if (!dev->gd)
+       dev->gd = blk_alloc_disk(NULL, NUMA_NO_NODE);
+       if (IS_ERR(dev->gd)) {
+               err = PTR_ERR(dev->gd);
                goto out;
+       }
        dev->gd->major = simdisk_major;
        dev->gd->first_minor = which;
        dev->gd->minors = SIMDISK_MINORS;
index e9f1b12bd75c7b0d4b2964995e8fbf70ac3c5c8e..e7adaaf1c21927a71d93e64817e22732fc72f2a3 100644 (file)
@@ -49,6 +49,12 @@ struct block_device *I_BDEV(struct inode *inode)
 }
 EXPORT_SYMBOL(I_BDEV);
 
+struct block_device *file_bdev(struct file *bdev_file)
+{
+       return I_BDEV(bdev_file->f_mapping->host);
+}
+EXPORT_SYMBOL(file_bdev);
+
 static void bdev_write_inode(struct block_device *bdev)
 {
        struct inode *inode = bdev->bd_inode;
@@ -368,24 +374,24 @@ static struct file_system_type bd_type = {
 };
 
 struct super_block *blockdev_superblock __ro_after_init;
+struct vfsmount *blockdev_mnt __ro_after_init;
 EXPORT_SYMBOL_GPL(blockdev_superblock);
 
 void __init bdev_cache_init(void)
 {
        int err;
-       static struct vfsmount *bd_mnt __ro_after_init;
 
        bdev_cachep = kmem_cache_create("bdev_cache", sizeof(struct bdev_inode),
                        0, (SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
-                               SLAB_MEM_SPREAD|SLAB_ACCOUNT|SLAB_PANIC),
+                               SLAB_ACCOUNT|SLAB_PANIC),
                        init_once);
        err = register_filesystem(&bd_type);
        if (err)
                panic("Cannot register bdev pseudo-fs");
-       bd_mnt = kern_mount(&bd_type);
-       if (IS_ERR(bd_mnt))
+       blockdev_mnt = kern_mount(&bd_type);
+       if (IS_ERR(blockdev_mnt))
                panic("Cannot create bdev pseudo-fs");
-       blockdev_superblock = bd_mnt->mnt_sb;   /* For writeback */
+       blockdev_superblock = blockdev_mnt->mnt_sb;   /* For writeback */
 }
 
 struct block_device *bdev_alloc(struct gendisk *disk, u8 partno)
@@ -696,6 +702,31 @@ out_blkdev_put:
        return ret;
 }
 
+int bdev_permission(dev_t dev, blk_mode_t mode, void *holder)
+{
+       int ret;
+
+       ret = devcgroup_check_permission(DEVCG_DEV_BLOCK,
+                       MAJOR(dev), MINOR(dev),
+                       ((mode & BLK_OPEN_READ) ? DEVCG_ACC_READ : 0) |
+                       ((mode & BLK_OPEN_WRITE) ? DEVCG_ACC_WRITE : 0));
+       if (ret)
+               return ret;
+
+       /* Blocking writes requires an exclusive opener */
+       if (mode & BLK_OPEN_RESTRICT_WRITES && !holder)
+               return -EINVAL;
+
+       /*
+        * We're using error pointers to indicate to ->release() when we
+        * failed to open that block device, so a holder that is itself an
+        * error pointer makes no sense.
+        */
+       if (WARN_ON_ONCE(IS_ERR(holder)))
+               return -EINVAL;
+
+       return 0;
+}
+
 static void blkdev_put_part(struct block_device *part)
 {
        struct block_device *whole = bdev_whole(part);
@@ -775,83 +806,55 @@ static void bdev_claim_write_access(struct block_device *bdev, blk_mode_t mode)
                bdev->bd_writers++;
 }
 
-static void bdev_yield_write_access(struct block_device *bdev, blk_mode_t mode)
+static void bdev_yield_write_access(struct file *bdev_file)
 {
+       struct block_device *bdev;
+
        if (bdev_allow_write_mounted)
                return;
 
+       bdev = file_bdev(bdev_file);
        /* Yield exclusive or shared write access. */
-       if (mode & BLK_OPEN_RESTRICT_WRITES)
-               bdev_unblock_writes(bdev);
-       else if (mode & BLK_OPEN_WRITE)
-               bdev->bd_writers--;
+       if (bdev_file->f_mode & FMODE_WRITE) {
+               if (bdev_writes_blocked(bdev))
+                       bdev_unblock_writes(bdev);
+               else
+                       bdev->bd_writers--;
+       }
 }
 
 /**
- * bdev_open_by_dev - open a block device by device number
- * @dev: device number of block device to open
+ * bdev_open - open a block device
+ * @bdev: block device to open
  * @mode: open mode (BLK_OPEN_*)
  * @holder: exclusive holder identifier
  * @hops: holder operations
+ * @bdev_file: file for the block device
  *
- * Open the block device described by device number @dev. If @holder is not
- * %NULL, the block device is opened with exclusive access.  Exclusive opens may
- * nest for the same @holder.
- *
- * Use this interface ONLY if you really do not have anything better - i.e. when
- * you are behind a truly sucky interface and all you are given is a device
- * number.  Everything else should use bdev_open_by_path().
+ * Open the block device. If @holder is not %NULL, the block device is opened
+ * with exclusive access.  Exclusive opens may nest for the same @holder.
  *
  * CONTEXT:
  * Might sleep.
  *
  * RETURNS:
- * Handle with a reference to the block_device on success, ERR_PTR(-errno) on
- * failure.
+ * zero on success, -errno on failure.
  */
-struct bdev_handle *bdev_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
-                                    const struct blk_holder_ops *hops)
+int bdev_open(struct block_device *bdev, blk_mode_t mode, void *holder,
+             const struct blk_holder_ops *hops, struct file *bdev_file)
 {
-       struct bdev_handle *handle = kmalloc(sizeof(struct bdev_handle),
-                                            GFP_KERNEL);
-       struct block_device *bdev;
        bool unblock_events = true;
-       struct gendisk *disk;
+       struct gendisk *disk = bdev->bd_disk;
        int ret;
 
-       if (!handle)
-               return ERR_PTR(-ENOMEM);
-
-       ret = devcgroup_check_permission(DEVCG_DEV_BLOCK,
-                       MAJOR(dev), MINOR(dev),
-                       ((mode & BLK_OPEN_READ) ? DEVCG_ACC_READ : 0) |
-                       ((mode & BLK_OPEN_WRITE) ? DEVCG_ACC_WRITE : 0));
-       if (ret)
-               goto free_handle;
-
-       /* Blocking writes requires exclusive opener */
-       if (mode & BLK_OPEN_RESTRICT_WRITES && !holder) {
-               ret = -EINVAL;
-               goto free_handle;
-       }
-
-       bdev = blkdev_get_no_open(dev);
-       if (!bdev) {
-               ret = -ENXIO;
-               goto free_handle;
-       }
-       disk = bdev->bd_disk;
-
        if (holder) {
                mode |= BLK_OPEN_EXCL;
                ret = bd_prepare_to_claim(bdev, holder, hops);
                if (ret)
-                       goto put_blkdev;
+                       return ret;
        } else {
-               if (WARN_ON_ONCE(mode & BLK_OPEN_EXCL)) {
-                       ret = -EIO;
-                       goto put_blkdev;
-               }
+               if (WARN_ON_ONCE(mode & BLK_OPEN_EXCL))
+                       return -EIO;
        }
 
        disk_block_events(disk);
@@ -892,10 +895,16 @@ struct bdev_handle *bdev_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
 
        if (unblock_events)
                disk_unblock_events(disk);
-       handle->bdev = bdev;
-       handle->holder = holder;
-       handle->mode = mode;
-       return handle;
+
+       bdev_file->f_flags |= O_LARGEFILE;
+       bdev_file->f_mode |= FMODE_BUF_RASYNC | FMODE_CAN_ODIRECT;
+       if (bdev_nowait(bdev))
+               bdev_file->f_mode |= FMODE_NOWAIT;
+       bdev_file->f_mapping = bdev->bd_inode->i_mapping;
+       bdev_file->f_wb_err = filemap_sample_wb_err(bdev_file->f_mapping);
+       bdev_file->private_data = holder;
+
+       return 0;
 put_module:
        module_put(disk->fops->owner);
 abort_claiming:
@@ -903,36 +912,80 @@ abort_claiming:
                bd_abort_claiming(bdev, holder);
        mutex_unlock(&disk->open_mutex);
        disk_unblock_events(disk);
-put_blkdev:
-       blkdev_put_no_open(bdev);
-free_handle:
-       kfree(handle);
-       return ERR_PTR(ret);
+       return ret;
 }
-EXPORT_SYMBOL(bdev_open_by_dev);
 
-/**
- * bdev_open_by_path - open a block device by name
- * @path: path to the block device to open
- * @mode: open mode (BLK_OPEN_*)
- * @holder: exclusive holder identifier
- * @hops: holder operations
- *
- * Open the block device described by the device file at @path.  If @holder is
- * not %NULL, the block device is opened with exclusive access.  Exclusive opens
- * may nest for the same @holder.
- *
- * CONTEXT:
- * Might sleep.
+/*
+ * BLK_OPEN_WRITE_IOCTL is a historical quirk from the floppy driver: it
+ * allows ioctls if the file was opened for writing, while permitting
+ * neither reads nor writes.
+ * Make sure that this quirk is reflected in @f_flags.
  *
- * RETURNS:
- * Handle with a reference to the block_device on success, ERR_PTR(-errno) on
- * failure.
+ * It can also happen if a block device is opened as O_RDWR | O_WRONLY.
  */
-struct bdev_handle *bdev_open_by_path(const char *path, blk_mode_t mode,
-               void *holder, const struct blk_holder_ops *hops)
+static unsigned blk_to_file_flags(blk_mode_t mode)
+{
+       unsigned int flags = 0;
+
+       if ((mode & (BLK_OPEN_READ | BLK_OPEN_WRITE)) ==
+           (BLK_OPEN_READ | BLK_OPEN_WRITE))
+               flags |= O_RDWR;
+       else if (mode & BLK_OPEN_WRITE_IOCTL)
+               flags |= O_RDWR | O_WRONLY;
+       else if (mode & BLK_OPEN_WRITE)
+               flags |= O_WRONLY;
+       else if (mode & BLK_OPEN_READ)
+               flags |= O_RDONLY; /* homeopathic, because O_RDONLY is 0 */
+       else
+               WARN_ON_ONCE(true);
+
+       if (mode & BLK_OPEN_NDELAY)
+               flags |= O_NDELAY;
+
+       return flags;
+}
+
+struct file *bdev_file_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
+                                  const struct blk_holder_ops *hops)
 {
-       struct bdev_handle *handle;
+       struct file *bdev_file;
+       struct block_device *bdev;
+       unsigned int flags;
+       int ret;
+
+       ret = bdev_permission(dev, mode, holder);
+       if (ret)
+               return ERR_PTR(ret);
+
+       bdev = blkdev_get_no_open(dev);
+       if (!bdev)
+               return ERR_PTR(-ENXIO);
+
+       flags = blk_to_file_flags(mode);
+       bdev_file = alloc_file_pseudo_noaccount(bdev->bd_inode,
+                       blockdev_mnt, "", flags | O_LARGEFILE, &def_blk_fops);
+       if (IS_ERR(bdev_file)) {
+               blkdev_put_no_open(bdev);
+               return bdev_file;
+       }
+       ihold(bdev->bd_inode);
+
+       ret = bdev_open(bdev, mode, holder, hops, bdev_file);
+       if (ret) {
+               /* We failed to open the block device. Let ->release() know. */
+               bdev_file->private_data = ERR_PTR(ret);
+               fput(bdev_file);
+               return ERR_PTR(ret);
+       }
+       return bdev_file;
+}
+EXPORT_SYMBOL(bdev_file_open_by_dev);
+
+struct file *bdev_file_open_by_path(const char *path, blk_mode_t mode,
+                                   void *holder,
+                                   const struct blk_holder_ops *hops)
+{
+       struct file *file;
        dev_t dev;
        int error;
 
@@ -940,22 +993,28 @@ struct bdev_handle *bdev_open_by_path(const char *path, blk_mode_t mode,
        if (error)
                return ERR_PTR(error);
 
-       handle = bdev_open_by_dev(dev, mode, holder, hops);
-       if (!IS_ERR(handle) && (mode & BLK_OPEN_WRITE) &&
-           bdev_read_only(handle->bdev)) {
-               bdev_release(handle);
-               return ERR_PTR(-EACCES);
+       file = bdev_file_open_by_dev(dev, mode, holder, hops);
+       if (!IS_ERR(file) && (mode & BLK_OPEN_WRITE)) {
+               if (bdev_read_only(file_bdev(file))) {
+                       fput(file);
+                       file = ERR_PTR(-EACCES);
+               }
        }
 
-       return handle;
+       return file;
 }
-EXPORT_SYMBOL(bdev_open_by_path);
+EXPORT_SYMBOL(bdev_file_open_by_path);
 
-void bdev_release(struct bdev_handle *handle)
+void bdev_release(struct file *bdev_file)
 {
-       struct block_device *bdev = handle->bdev;
+       struct block_device *bdev = file_bdev(bdev_file);
+       void *holder = bdev_file->private_data;
        struct gendisk *disk = bdev->bd_disk;
 
+       /* We failed to open that block device. */
+       if (IS_ERR(holder))
+               goto put_no_open;
+
        /*
         * Sync early if it looks like we're the last one.  If someone else
         * opens the block device between now and the decrement of bd_openers
@@ -967,10 +1026,10 @@ void bdev_release(struct bdev_handle *handle)
                sync_blockdev(bdev);
 
        mutex_lock(&disk->open_mutex);
-       bdev_yield_write_access(bdev, handle->mode);
+       bdev_yield_write_access(bdev_file);
 
-       if (handle->holder)
-               bd_end_claim(bdev, handle->holder);
+       if (holder)
+               bd_end_claim(bdev, holder);
 
        /*
         * Trigger event checking and tell drivers to flush MEDIA_CHANGE
@@ -986,10 +1045,9 @@ void bdev_release(struct bdev_handle *handle)
        mutex_unlock(&disk->open_mutex);
 
        module_put(disk->fops->owner);
+put_no_open:
        blkdev_put_no_open(bdev);
-       kfree(handle);
 }
-EXPORT_SYMBOL(bdev_release);
 
 /**
  * lookup_bdev() - Look up a struct block_device by name.
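
Taken together, the bdev changes retire struct bdev_handle in favor of plain struct file: callers open with bdev_file_open_by_dev()/bdev_file_open_by_path(), reach the device via file_bdev(), and drop the reference with fput(). A hedged before/after sketch of a caller (error handling abbreviated):

/* Before (6.8): */
struct bdev_handle *h = bdev_open_by_path(path, BLK_OPEN_READ, holder, NULL);
if (IS_ERR(h))
	return PTR_ERR(h);
nr = bdev_nr_sectors(h->bdev);
bdev_release(h);

/* After (6.9): */
struct file *f = bdev_file_open_by_path(path, BLK_OPEN_READ, holder, NULL);
if (IS_ERR(f))
	return PTR_ERR(f);
nr = bdev_nr_sectors(file_bdev(f));
fput(f);	/* release happens via ->release() on the final fput */
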
index 2c90e5de0acd94f9ba3f1920cede68856c8f66df..d442ee358fc2573e6873dec037e90998ddda7fdf 100644 (file)
@@ -127,7 +127,7 @@ static void bfqg_stats_update_group_wait_time(struct bfqg_stats *stats)
        if (!bfqg_stats_waiting(stats))
                return;
 
-       now = ktime_get_ns();
+       now = blk_time_get_ns();
        if (now > stats->start_group_wait_time)
                bfq_stat_add(&stats->group_wait_time,
                              now - stats->start_group_wait_time);
@@ -144,7 +144,7 @@ static void bfqg_stats_set_start_group_wait_time(struct bfq_group *bfqg,
                return;
        if (bfqg == curr_bfqg)
                return;
-       stats->start_group_wait_time = ktime_get_ns();
+       stats->start_group_wait_time = blk_time_get_ns();
        bfqg_stats_mark_waiting(stats);
 }
 
@@ -156,7 +156,7 @@ static void bfqg_stats_end_empty_time(struct bfqg_stats *stats)
        if (!bfqg_stats_empty(stats))
                return;
 
-       now = ktime_get_ns();
+       now = blk_time_get_ns();
        if (now > stats->start_empty_time)
                bfq_stat_add(&stats->empty_time,
                              now - stats->start_empty_time);
@@ -183,7 +183,7 @@ void bfqg_stats_set_start_empty_time(struct bfq_group *bfqg)
        if (bfqg_stats_empty(stats))
                return;
 
-       stats->start_empty_time = ktime_get_ns();
+       stats->start_empty_time = blk_time_get_ns();
        bfqg_stats_mark_empty(stats);
 }
 
@@ -192,7 +192,7 @@ void bfqg_stats_update_idle_time(struct bfq_group *bfqg)
        struct bfqg_stats *stats = &bfqg->stats;
 
        if (bfqg_stats_idling(stats)) {
-               u64 now = ktime_get_ns();
+               u64 now = blk_time_get_ns();
 
                if (now > stats->start_idle_time)
                        bfq_stat_add(&stats->idle_time,
@@ -205,7 +205,7 @@ void bfqg_stats_set_start_idle_time(struct bfq_group *bfqg)
 {
        struct bfqg_stats *stats = &bfqg->stats;
 
-       stats->start_idle_time = ktime_get_ns();
+       stats->start_idle_time = blk_time_get_ns();
        bfqg_stats_mark_idling(stats);
 }
 
@@ -242,7 +242,7 @@ void bfqg_stats_update_completion(struct bfq_group *bfqg, u64 start_time_ns,
                                  u64 io_start_time_ns, blk_opf_t opf)
 {
        struct bfqg_stats *stats = &bfqg->stats;
-       u64 now = ktime_get_ns();
+       u64 now = blk_time_get_ns();
 
        if (now > io_start_time_ns)
                blkg_rwstat_add(&stats->service_time, opf,
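
The blanket ktime_get_ns() to blk_time_get_ns() conversions in these BFQ hunks (and the blkcg one below) enable per-plug timestamp caching; note plug->cur_ktime being reset in blk_start_plug_nr_ios() and PF_BLOCK_TS cleared in __blk_flush_plug() further down. A hedged sketch of what such a helper looks like, modeled on block/blk.h in 6.9:

/* Sketch, abbreviated: reuse one clock sample per plugged window. */
static inline u64 blk_time_get_ns(void)
{
	struct blk_plug *plug = current->plug;

	if (!plug || !in_task())
		return ktime_get_ns();

	/* First read in this plug window samples the clock; PF_BLOCK_TS
	 * tells the flush/schedule paths to invalidate the cache. */
	if (!plug->cur_ktime) {
		plug->cur_ktime = ktime_get_ns();
		current->flags |= PF_BLOCK_TS;
	}
	return plug->cur_ktime;
}
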
index 3cce6de464a7b7c1b506158d044b537510a3e6f8..4b88a54a9b76cba3bca954e50f509ff6028251ee 100644 (file)
@@ -1005,7 +1005,7 @@ static struct request *bfq_check_fifo(struct bfq_queue *bfqq,
 
        rq = rq_entry_fifo(bfqq->fifo.next);
 
-       if (rq == last || ktime_get_ns() < rq->fifo_time)
+       if (rq == last || blk_time_get_ns() < rq->fifo_time)
                return NULL;
 
        bfq_log_bfqq(bfqq->bfqd, bfqq, "check_fifo: returned %p", rq);
@@ -1829,7 +1829,7 @@ static void bfq_bfqq_handle_idle_busy_switch(struct bfq_data *bfqd,
                 * bfq_bfqq_update_budg_for_activation for
                 * details on the usage of the next variable.
                 */
-               arrived_in_time =  ktime_get_ns() <=
+               arrived_in_time =  blk_time_get_ns() <=
                        bfqq->ttime.last_end_request +
                        bfqd->bfq_slice_idle * 3;
        unsigned int act_idx = bfq_actuator_index(bfqd, rq->bio);
@@ -2208,7 +2208,7 @@ static void bfq_add_request(struct request *rq)
        struct request *next_rq, *prev;
        unsigned int old_wr_coeff = bfqq->wr_coeff;
        bool interactive = false;
-       u64 now_ns = ktime_get_ns();
+       u64 now_ns = blk_time_get_ns();
 
        bfq_log_bfqq(bfqd, bfqq, "add_request %d", rq_is_sync(rq));
        bfqq->queued[rq_is_sync(rq)]++;
@@ -2262,7 +2262,7 @@ static void bfq_add_request(struct request *rq)
                      bfqd->rqs_injected && bfqd->tot_rq_in_driver > 0)) &&
                    time_is_before_eq_jiffies(bfqq->decrease_time_jif +
                                              msecs_to_jiffies(10))) {
-                       bfqd->last_empty_occupied_ns = ktime_get_ns();
+                       bfqd->last_empty_occupied_ns = blk_time_get_ns();
                        /*
                         * Start the state machine for measuring the
                         * total service time of rq: setting
@@ -3294,7 +3294,7 @@ static void bfq_set_budget_timeout(struct bfq_data *bfqd,
        else
                timeout_coeff = bfqq->entity.weight / bfqq->entity.orig_weight;
 
-       bfqd->last_budget_start = ktime_get();
+       bfqd->last_budget_start = blk_time_get();
 
        bfqq->budget_timeout = jiffies +
                bfqd->bfq_timeout * timeout_coeff;
@@ -3394,7 +3394,7 @@ static void bfq_arm_slice_timer(struct bfq_data *bfqd)
        else if (bfqq->wr_coeff > 1)
                sl = max_t(u32, sl, 20ULL * NSEC_PER_MSEC);
 
-       bfqd->last_idling_start = ktime_get();
+       bfqd->last_idling_start = blk_time_get();
        bfqd->last_idling_start_jiffies = jiffies;
 
        hrtimer_start(&bfqd->idle_slice_timer, ns_to_ktime(sl),
@@ -3433,7 +3433,7 @@ static void bfq_reset_rate_computation(struct bfq_data *bfqd,
                                       struct request *rq)
 {
        if (rq != NULL) { /* new rq dispatch now, reset accordingly */
-               bfqd->last_dispatch = bfqd->first_dispatch = ktime_get_ns();
+               bfqd->last_dispatch = bfqd->first_dispatch = blk_time_get_ns();
                bfqd->peak_rate_samples = 1;
                bfqd->sequential_samples = 0;
                bfqd->tot_sectors_dispatched = bfqd->last_rq_max_size =
@@ -3590,7 +3590,7 @@ reset_computation:
  */
 static void bfq_update_peak_rate(struct bfq_data *bfqd, struct request *rq)
 {
-       u64 now_ns = ktime_get_ns();
+       u64 now_ns = blk_time_get_ns();
 
        if (bfqd->peak_rate_samples == 0) { /* first dispatch */
                bfq_log(bfqd, "update_peak_rate: goto reset, samples %d",
@@ -4162,7 +4162,7 @@ static bool bfq_bfqq_is_slow(struct bfq_data *bfqd, struct bfq_queue *bfqq,
        if (compensate)
                delta_ktime = bfqd->last_idling_start;
        else
-               delta_ktime = ktime_get();
+               delta_ktime = blk_time_get();
        delta_ktime = ktime_sub(delta_ktime, bfqd->last_budget_start);
        delta_usecs = ktime_to_us(delta_ktime);
 
@@ -5591,7 +5591,7 @@ static void bfq_init_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq,
                          struct bfq_io_cq *bic, pid_t pid, int is_sync,
                          unsigned int act_idx)
 {
-       u64 now_ns = ktime_get_ns();
+       u64 now_ns = blk_time_get_ns();
 
        bfqq->actuator_idx = act_idx;
        RB_CLEAR_NODE(&bfqq->entity.rb_node);
@@ -5903,7 +5903,7 @@ static void bfq_update_io_thinktime(struct bfq_data *bfqd,
         */
        if (bfqq->dispatched || bfq_bfqq_busy(bfqq))
                return;
-       elapsed = ktime_get_ns() - bfqq->ttime.last_end_request;
+       elapsed = blk_time_get_ns() - bfqq->ttime.last_end_request;
        elapsed = min_t(u64, elapsed, 2ULL * bfqd->bfq_slice_idle);
 
        ttime->ttime_samples = (7*ttime->ttime_samples + 256) / 8;
@@ -6194,7 +6194,7 @@ static bool __bfq_insert_request(struct bfq_data *bfqd, struct request *rq)
        bfq_add_request(rq);
        idle_timer_disabled = waiting && !bfq_bfqq_wait_request(bfqq);
 
-       rq->fifo_time = ktime_get_ns() + bfqd->bfq_fifo_expire[rq_is_sync(rq)];
+       rq->fifo_time = blk_time_get_ns() + bfqd->bfq_fifo_expire[rq_is_sync(rq)];
        list_add_tail(&rq->queuelist, &bfqq->fifo);
 
        bfq_rq_enqueued(bfqd, bfqq, rq);
@@ -6370,7 +6370,7 @@ static void bfq_completed_request(struct bfq_queue *bfqq, struct bfq_data *bfqd)
                bfq_weights_tree_remove(bfqq);
        }
 
-       now_ns = ktime_get_ns();
+       now_ns = blk_time_get_ns();
 
        bfqq->ttime.last_end_request = now_ns;
 
@@ -6585,7 +6585,7 @@ static void bfq_completed_request(struct bfq_queue *bfqq, struct bfq_data *bfqd)
 static void bfq_update_inject_limit(struct bfq_data *bfqd,
                                    struct bfq_queue *bfqq)
 {
-       u64 tot_time_ns = ktime_get_ns() - bfqd->last_empty_occupied_ns;
+       u64 tot_time_ns = blk_time_get_ns() - bfqd->last_empty_occupied_ns;
        unsigned int old_limit = bfqq->inject_limit;
 
        if (bfqq->last_serv_time_ns > 0 && bfqd->rqs_injected) {
index c9a16fba58b9c47f5424be9a8c7c6681d176b986..2e3e8e04961eaeaa04f9ec0470d15bbf64867191 100644 (file)
@@ -395,6 +395,7 @@ static blk_status_t bio_integrity_process(struct bio *bio,
        iter.tuple_size = bi->tuple_size;
        iter.seed = proc_iter->bi_sector;
        iter.prot_buf = bvec_virt(bip->bip_vec);
+       iter.pi_offset = bi->pi_offset;
 
        __bio_for_each_segment(bv, bio, bviter, *proc_iter) {
                void *kaddr = bvec_kmap_local(&bv);
index b9642a41f286e5bb52d841255aa9286c1449e83d..d24420ed1c4c6f20b80bc043cdf0378afcb43668 100644 (file)
@@ -16,7 +16,6 @@
 #include <linux/workqueue.h>
 #include <linux/cgroup.h>
 #include <linux/highmem.h>
-#include <linux/sched/sysctl.h>
 #include <linux/blk-crypto.h>
 #include <linux/xarray.h>
 
@@ -251,6 +250,7 @@ void bio_init(struct bio *bio, struct block_device *bdev, struct bio_vec *table,
        bio->bi_opf = opf;
        bio->bi_flags = 0;
        bio->bi_ioprio = 0;
+       bio->bi_write_hint = 0;
        bio->bi_status = 0;
        bio->bi_iter.bi_sector = 0;
        bio->bi_iter.bi_size = 0;
@@ -762,29 +762,31 @@ static inline void bio_put_percpu_cache(struct bio *bio)
        struct bio_alloc_cache *cache;
 
        cache = per_cpu_ptr(bio->bi_pool->cache, get_cpu());
-       if (READ_ONCE(cache->nr_irq) + cache->nr > ALLOC_CACHE_MAX) {
-               put_cpu();
-               bio_free(bio);
-               return;
-       }
+       if (READ_ONCE(cache->nr_irq) + cache->nr > ALLOC_CACHE_MAX)
+               goto out_free;
 
-       bio_uninit(bio);
-
-       if ((bio->bi_opf & REQ_POLLED) && !WARN_ON_ONCE(in_interrupt())) {
+       if (in_task()) {
+               bio_uninit(bio);
                bio->bi_next = cache->free_list;
+               /* Not necessary, but helps avoid iopolling already-freed bios */
                bio->bi_bdev = NULL;
                cache->free_list = bio;
                cache->nr++;
-       } else {
-               unsigned long flags;
+       } else if (in_hardirq()) {
+               lockdep_assert_irqs_disabled();
 
-               local_irq_save(flags);
+               bio_uninit(bio);
                bio->bi_next = cache->free_list_irq;
                cache->free_list_irq = bio;
                cache->nr_irq++;
-               local_irq_restore(flags);
+       } else {
+               goto out_free;
        }
        put_cpu();
+       return;
+out_free:
+       put_cpu();
+       bio_free(bio);
 }
 
 /**
@@ -813,6 +815,7 @@ static int __bio_clone(struct bio *bio, struct bio *bio_src, gfp_t gfp)
 {
        bio_set_flag(bio, BIO_CLONED);
        bio->bi_ioprio = bio_src->bi_ioprio;
+       bio->bi_write_hint = bio_src->bi_write_hint;
        bio->bi_iter = bio_src->bi_iter;
 
        if (bio->bi_bdev) {
@@ -1152,7 +1155,7 @@ void __bio_release_pages(struct bio *bio, bool mark_dirty)
 
        bio_for_each_folio_all(fi, bio) {
                struct page *page;
-               size_t done = 0;
+               size_t nr_pages;
 
                if (mark_dirty) {
                        folio_lock(fi.folio);
@@ -1160,10 +1163,11 @@ void __bio_release_pages(struct bio *bio, bool mark_dirty)
                        folio_unlock(fi.folio);
                }
                page = folio_page(fi.folio, fi.offset / PAGE_SIZE);
+               nr_pages = (fi.offset + fi.length - 1) / PAGE_SIZE -
+                          fi.offset / PAGE_SIZE + 1;
                do {
                        bio_release_page(bio, page++);
-                       done += PAGE_SIZE;
-               } while (done < fi.length);
+               } while (--nr_pages != 0);
        }
 }
 EXPORT_SYMBOL_GPL(__bio_release_pages);
@@ -1369,21 +1373,12 @@ int submit_bio_wait(struct bio *bio)
 {
        DECLARE_COMPLETION_ONSTACK_MAP(done,
                        bio->bi_bdev->bd_disk->lockdep_map);
-       unsigned long hang_check;
 
        bio->bi_private = &done;
        bio->bi_end_io = submit_bio_wait_endio;
        bio->bi_opf |= REQ_SYNC;
        submit_bio(bio);
-
-       /* Prevent hang_check timer from firing at us during very long I/O */
-       hang_check = sysctl_hung_task_timeout_secs;
-       if (hang_check)
-               while (!wait_for_completion_io_timeout(&done,
-                                       hang_check * (HZ/2)))
-                       ;
-       else
-               wait_for_completion_io(&done);
+       blk_wait_io(&done);
 
        return blk_status_to_errno(bio->bi_status);
 }
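
The new page count in __bio_release_pages() is last-page-index minus first-page-index plus one, which, unlike the old done < fi.length loop, stays correct when fi.offset is not page aligned. Worked standalone:

#include <stddef.h>
#include <stdio.h>

#define PAGE_SIZE 4096UL

/* Pages spanned by [offset, offset + length) within a folio */
static size_t span_pages(size_t offset, size_t length)
{
	return (offset + length - 1) / PAGE_SIZE - offset / PAGE_SIZE + 1;
}

int main(void)
{
	printf("%zu\n", span_pages(4000, 96));   /* 1: inside one page */
	printf("%zu\n", span_pages(4000, 200));  /* 2: crosses a boundary */
	printf("%zu\n", span_pages(4096, 4096)); /* 1: one aligned page */
	return 0;
}
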
index ff93c385ba5afb6920b53fdbcf96bd5d3970d17a..bdbb557feb5a0ec949e7ac8cde0e87b6d4055f5b 100644 (file)
@@ -1846,7 +1846,7 @@ static void blkcg_maybe_throttle_blkg(struct blkcg_gq *blkg, bool use_memdelay)
 {
        unsigned long pflags;
        bool clamp;
-       u64 now = ktime_to_ns(ktime_get());
+       u64 now = blk_time_get_ns();
        u64 exp;
        u64 delay_nsec = 0;
        int tok;
index b927a4a0ad0301db43a6aad657c49c67f4cb880b..78b74106bf10c5cbadd655e2da6b2f21416c0622 100644 (file)
@@ -19,6 +19,7 @@
 #include <linux/kthread.h>
 #include <linux/blk-mq.h>
 #include <linux/llist.h>
+#include "blk.h"
 
 struct blkcg_gq;
 struct blkg_policy_data;
index 11342af420d0c41d1c98a729471dcd6bfb46da05..a16b5abdbbf56f44611d34fd238c0ee3a00d72f5 100644 (file)
@@ -49,6 +49,7 @@
 #include "blk-pm.h"
 #include "blk-cgroup.h"
 #include "blk-throttle.h"
+#include "blk-ioprio.h"
 
 struct dentry *blk_debugfs_root;
 
@@ -393,24 +394,34 @@ static void blk_timeout_work(struct work_struct *work)
 {
 }
 
-struct request_queue *blk_alloc_queue(int node_id)
+struct request_queue *blk_alloc_queue(struct queue_limits *lim, int node_id)
 {
        struct request_queue *q;
+       int error;
 
        q = kmem_cache_alloc_node(blk_requestq_cachep, GFP_KERNEL | __GFP_ZERO,
                                  node_id);
        if (!q)
-               return NULL;
+               return ERR_PTR(-ENOMEM);
 
        q->last_merge = NULL;
 
        q->id = ida_alloc(&blk_queue_ida, GFP_KERNEL);
-       if (q->id < 0)
+       if (q->id < 0) {
+               error = q->id;
                goto fail_q;
+       }
 
        q->stats = blk_alloc_queue_stats();
-       if (!q->stats)
+       if (!q->stats) {
+               error = -ENOMEM;
                goto fail_id;
+       }
+
+       error = blk_set_default_limits(lim);
+       if (error)
+               goto fail_stats;
+       q->limits = *lim;
 
        q->node = node_id;
 
@@ -424,6 +435,7 @@ struct request_queue *blk_alloc_queue(int node_id)
        mutex_init(&q->debugfs_mutex);
        mutex_init(&q->sysfs_lock);
        mutex_init(&q->sysfs_dir_lock);
+       mutex_init(&q->limits_lock);
        mutex_init(&q->rq_qos_mutex);
        spin_lock_init(&q->queue_lock);
 
@@ -434,12 +446,12 @@ struct request_queue *blk_alloc_queue(int node_id)
         * Init percpu_ref in atomic mode so that it's faster to shutdown.
         * See blk_register_queue() for details.
         */
-       if (percpu_ref_init(&q->q_usage_counter,
+       error = percpu_ref_init(&q->q_usage_counter,
                                blk_queue_usage_counter_release,
-                               PERCPU_REF_INIT_ATOMIC, GFP_KERNEL))
+                               PERCPU_REF_INIT_ATOMIC, GFP_KERNEL);
+       if (error)
                goto fail_stats;
 
-       blk_set_default_limits(&q->limits);
        q->nr_requests = BLKDEV_DEFAULT_RQ;
 
        return q;
@@ -450,7 +462,7 @@ fail_id:
        ida_free(&blk_queue_ida, q->id);
 fail_q:
        kmem_cache_free(blk_requestq_cachep, q);
-       return NULL;
+       return ERR_PTR(error);
 }
 
 /**
@@ -833,6 +845,14 @@ end_io:
 }
 EXPORT_SYMBOL(submit_bio_noacct);
 
+static void bio_set_ioprio(struct bio *bio)
+{
+       /* Nobody set ioprio so far? Initialize it based on task's nice value */
+       if (IOPRIO_PRIO_CLASS(bio->bi_ioprio) == IOPRIO_CLASS_NONE)
+               bio->bi_ioprio = get_current_ioprio();
+       blkcg_set_ioprio(bio);
+}
+
 /**
  * submit_bio - submit a bio to the block device layer for I/O
  * @bio: The &struct bio which describes the I/O
@@ -855,6 +875,7 @@ void submit_bio(struct bio *bio)
                count_vm_events(PGPGOUT, bio_sectors(bio));
        }
 
+       bio_set_ioprio(bio);
        submit_bio_noacct(bio);
 }
 EXPORT_SYMBOL(submit_bio);
@@ -1073,6 +1094,7 @@ void blk_start_plug_nr_ios(struct blk_plug *plug, unsigned short nr_ios)
        if (tsk->plug)
                return;
 
+       plug->cur_ktime = 0;
        plug->mq_list = NULL;
        plug->cached_rq = NULL;
        plug->nr_ios = min_t(unsigned short, nr_ios, BLK_MAX_REQUEST_COUNT);
@@ -1172,6 +1194,8 @@ void __blk_flush_plug(struct blk_plug *plug, bool from_schedule)
         */
        if (unlikely(!rq_list_empty(plug->cached_rq)))
                blk_mq_free_plug_rqs(plug);
+
+       current->flags &= ~PF_BLOCK_TS;
 }
 
 /**
@@ -1219,8 +1243,7 @@ int __init blk_dev_init(void)
        if (!kblockd_workqueue)
                panic("Failed to create kblockd\n");
 
-       blk_requestq_cachep = kmem_cache_create("request_queue",
-                       sizeof(struct request_queue), 0, SLAB_PANIC, NULL);
+       blk_requestq_cachep = KMEM_CACHE(request_queue, SLAB_PANIC);
 
        blk_debugfs_root = debugfs_create_dir("block", NULL);
 
index e6468eab2681e9f827e9b86b3b21d24f6a6fe0a5..b1e7415f8439c45d04c49f7fbeb48891e0392027 100644
@@ -172,6 +172,7 @@ static struct bio *blk_crypto_fallback_clone_bio(struct bio *bio_src)
        if (bio_flagged(bio_src, BIO_REMAPPED))
                bio_set_flag(bio, BIO_REMAPPED);
        bio->bi_ioprio          = bio_src->bi_ioprio;
+       bio->bi_write_hint      = bio_src->bi_write_hint;
        bio->bi_iter.bi_sector  = bio_src->bi_iter.bi_sector;
        bio->bi_iter.bi_size    = bio_src->bi_iter.bi_size;
 
index 3f4d41952ef210929091ca29661b5da2be280d9c..b0f314f4bc1493db379e6a234c0b5381569827b2 100644
@@ -143,7 +143,7 @@ static void blk_account_io_flush(struct request *rq)
        part_stat_lock();
        part_stat_inc(part, ios[STAT_FLUSH]);
        part_stat_add(part, nsecs[STAT_FLUSH],
-                     ktime_get_ns() - rq->start_time_ns);
+                     blk_time_get_ns() - rq->start_time_ns);
        part_stat_unlock();
 }
 
index d4e9b4556d14b2ca9931b3123d5e1ba0884720ca..ccbeb6dfa87a4dc5f63a485cd0bec9e800704ff6 100644
@@ -370,6 +370,7 @@ void blk_integrity_register(struct gendisk *disk, struct blk_integrity *template
        bi->profile = template->profile ? template->profile : &nop_profile;
        bi->tuple_size = template->tuple_size;
        bi->tag_size = template->tag_size;
+       bi->pi_offset = template->pi_offset;
 
        blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, disk->queue);
 
index c8beec6d7df0863bb4811c12f5ff576e7a5121c7..9a85bfbbc45a018e941cd0b778ab612a54cdea09 100644
@@ -829,7 +829,7 @@ static int ioc_autop_idx(struct ioc *ioc, struct gendisk *disk)
 
        /* step up/down based on the vrate */
        vrate_pct = div64_u64(ioc->vtime_base_rate * 100, VTIME_PER_USEC);
-       now_ns = ktime_get_ns();
+       now_ns = blk_time_get_ns();
 
        if (p->too_fast_vrate_pct && p->too_fast_vrate_pct <= vrate_pct) {
                if (!ioc->autop_too_fast_at)
@@ -1044,7 +1044,7 @@ static void ioc_now(struct ioc *ioc, struct ioc_now *now)
        unsigned seq;
        u64 vrate;
 
-       now->now_ns = ktime_get();
+       now->now_ns = blk_time_get_ns();
        now->now = ktime_to_us(now->now_ns);
        vrate = atomic64_read(&ioc->vtime_rate);
 
@@ -1353,6 +1353,13 @@ static bool iocg_kick_delay(struct ioc_gq *iocg, struct ioc_now *now)
 
        lockdep_assert_held(&iocg->waitq.lock);
 
+       /*
+        * If the delay is set by another CPU, we may be in the past. No need to
+        * change anything if so. This avoids decay calculation underflow.
+        */
+       if (time_before64(now->now, iocg->delay_at))
+               return false;
+
        /* calculate the current delay in effect - 1/2 every second */
        tdelta = now->now - iocg->delay_at;
        if (iocg->delay)
@@ -2810,7 +2817,7 @@ static void ioc_rqos_done(struct rq_qos *rqos, struct request *rq)
                return;
        }
 
-       on_q_ns = ktime_get_ns() - rq->alloc_time_ns;
+       on_q_ns = blk_time_get_ns() - rq->alloc_time_ns;
        rq_wait_ns = rq->start_time_ns - rq->alloc_time_ns;
        size_nsec = div64_u64(calc_size_vtime_cost(rq, ioc), VTIME_PER_NSEC);
 
@@ -2893,7 +2900,7 @@ static int blk_iocost_init(struct gendisk *disk)
        ioc->vtime_base_rate = VTIME_PER_USEC;
        atomic64_set(&ioc->vtime_rate, VTIME_PER_USEC);
        seqcount_spinlock_init(&ioc->period_seqcount, &ioc->lock);
-       ioc->period_at = ktime_to_us(ktime_get());
+       ioc->period_at = ktime_to_us(blk_time_get());
        atomic64_set(&ioc->cur_period, 0);
        atomic_set(&ioc->hweight_gen, 0);
 
index c1a6aba1d59e4db829079071b591670859018217..ebb522788d9780f6d4b452b826f113957be02772 100644
@@ -609,7 +609,7 @@ static void blkcg_iolatency_done_bio(struct rq_qos *rqos, struct bio *bio)
        if (!iolat->blkiolat->enabled)
                return;
 
-       now = ktime_to_ns(ktime_get());
+       now = blk_time_get_ns();
        while (blkg && blkg->parent) {
                iolat = blkg_to_lat(blkg);
                if (!iolat) {
@@ -661,7 +661,7 @@ static void blkiolatency_timer_fn(struct timer_list *t)
        struct blk_iolatency *blkiolat = from_timer(blkiolat, t, timer);
        struct blkcg_gq *blkg;
        struct cgroup_subsys_state *pos_css;
-       u64 now = ktime_to_ns(ktime_get());
+       u64 now = blk_time_get_ns();
 
        rcu_read_lock();
        blkg_for_each_descendant_pre(blkg, pos_css,
@@ -985,7 +985,7 @@ static void iolatency_pd_init(struct blkg_policy_data *pd)
        struct blkcg_gq *blkg = lat_to_blkg(iolat);
        struct rq_qos *rqos = iolat_rq_qos(blkg->q);
        struct blk_iolatency *blkiolat = BLKIOLATENCY(rqos);
-       u64 now = ktime_to_ns(ktime_get());
+       u64 now = blk_time_get_ns();
        int cpu;
 
        if (blk_queue_nonrot(blkg->q))
index e59c3069e8351f7edf0d82c6a3b376a3029a994c..dc8e35d0a51d6de0d4c7bfb2d4ce2f8cbb91a5d1 100644
@@ -35,6 +35,26 @@ static sector_t bio_discard_limit(struct block_device *bdev, sector_t sector)
        return round_down(UINT_MAX, discard_granularity) >> SECTOR_SHIFT;
 }
 
+static void await_bio_endio(struct bio *bio)
+{
+       complete(bio->bi_private);
+       bio_put(bio);
+}
+
+/*
+ * await_bio_chain - ends @bio and waits for every chained bio to complete
+ */
+static void await_bio_chain(struct bio *bio)
+{
+       DECLARE_COMPLETION_ONSTACK_MAP(done,
+                       bio->bi_bdev->bd_disk->lockdep_map);
+
+       bio->bi_private = &done;
+       bio->bi_end_io = await_bio_endio;
+       bio_endio(bio);
+       blk_wait_io(&done);
+}
+
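The helper underpins the fatal-signal bailout added to the discard and zeroout loops below; the recurring call pattern looks like this (abridged from those loops):

/* Abridged from the loops below: on a fatal signal, stop building the
 * chain, wait for every already-submitted chained bio to complete so
 * none of them can run after our caller unwinds, then give up. */
cond_resched();
if (fatal_signal_pending(current)) {
	await_bio_chain(bio);
	return -EINTR;
}
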
 int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
                sector_t nr_sects, gfp_t gfp_mask, struct bio **biop)
 {
@@ -77,6 +97,10 @@ int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
                 * is disabled.
                 */
                cond_resched();
+               if (fatal_signal_pending(current)) {
+                       await_bio_chain(bio);
+                       return -EINTR;
+               }
        }
 
        *biop = bio;
@@ -120,32 +144,33 @@ static int __blkdev_issue_write_zeroes(struct block_device *bdev,
                struct bio **biop, unsigned flags)
 {
        struct bio *bio = *biop;
-       unsigned int max_write_zeroes_sectors;
+       unsigned int max_sectors;
 
        if (bdev_read_only(bdev))
                return -EPERM;
 
-       /* Ensure that max_write_zeroes_sectors doesn't overflow bi_size */
-       max_write_zeroes_sectors = bdev_write_zeroes_sectors(bdev);
+       /* Ensure that max_sectors doesn't overflow bi_size */
+       max_sectors = bdev_write_zeroes_sectors(bdev);
 
-       if (max_write_zeroes_sectors == 0)
+       if (max_sectors == 0)
                return -EOPNOTSUPP;
 
        while (nr_sects) {
+               unsigned int len = min_t(sector_t, nr_sects, max_sectors);
+
                bio = blk_next_bio(bio, bdev, 0, REQ_OP_WRITE_ZEROES, gfp_mask);
                bio->bi_iter.bi_sector = sector;
                if (flags & BLKDEV_ZERO_NOUNMAP)
                        bio->bi_opf |= REQ_NOUNMAP;
 
-               if (nr_sects > max_write_zeroes_sectors) {
-                       bio->bi_iter.bi_size = max_write_zeroes_sectors << 9;
-                       nr_sects -= max_write_zeroes_sectors;
-                       sector += max_write_zeroes_sectors;
-               } else {
-                       bio->bi_iter.bi_size = nr_sects << 9;
-                       nr_sects = 0;
-               }
+               bio->bi_iter.bi_size = len << SECTOR_SHIFT;
+               nr_sects -= len;
+               sector += len;
                cond_resched();
+               if (fatal_signal_pending(current)) {
+                       await_bio_chain(bio);
+                       return -EINTR;
+               }
        }
 
        *biop = bio;
@@ -190,6 +215,10 @@ static int __blkdev_issue_zero_pages(struct block_device *bdev,
                                break;
                }
                cond_resched();
+               if (fatal_signal_pending(current)) {
+                       await_bio_chain(bio);
+                       return -EINTR;
+               }
        }
 
        *biop = bio;
@@ -280,7 +309,7 @@ retry:
                bio_put(bio);
        }
        blk_finish_plug(&plug);
-       if (ret && try_write_zeroes) {
+       if (ret && ret != -EINTR && try_write_zeroes) {
                if (!(flags & BLKDEV_ZERO_NOFALLBACK)) {
                        try_write_zeroes = false;
                        goto retry;
@@ -322,7 +351,7 @@ int blkdev_issue_secure_erase(struct block_device *bdev, sector_t sector,
                return -EPERM;
 
        blk_start_plug(&plug);
-       for (;;) {
+       while (nr_sects) {
                unsigned int len = min_t(sector_t, nr_sects, max_sectors);
 
                bio = blk_next_bio(bio, bdev, 0, REQ_OP_SECURE_ERASE, gfp);
@@ -331,12 +360,17 @@ int blkdev_issue_secure_erase(struct block_device *bdev, sector_t sector,
 
                sector += len;
                nr_sects -= len;
-               if (!nr_sects) {
-                       ret = submit_bio_wait(bio);
-                       bio_put(bio);
+               cond_resched();
+               if (fatal_signal_pending(current)) {
+                       await_bio_chain(bio);
+                       ret = -EINTR;
+                       bio = NULL;
                        break;
                }
-               cond_resched();
+       }
+       if (bio) {
+               ret = submit_bio_wait(bio);
+               bio_put(bio);
        }
        blk_finish_plug(&plug);
 
index 8584babf3ea0ca2590f30383b9594231266e9437..71210cdb34426d967b5632667cb7579b11e97a2d 100644
@@ -205,12 +205,19 @@ static int bio_copy_user_iov(struct request *rq, struct rq_map_data *map_data,
        /*
         * success
         */
-       if ((iov_iter_rw(iter) == WRITE &&
-            (!map_data || !map_data->null_mapped)) ||
-           (map_data && map_data->from_user)) {
+       if (iov_iter_rw(iter) == WRITE &&
+            (!map_data || !map_data->null_mapped)) {
                ret = bio_copy_from_iter(bio, iter);
                if (ret)
                        goto cleanup;
+       } else if (map_data && map_data->from_user) {
+               struct iov_iter iter2 = *iter;
+
+               /* This is the copy-in part of SG_DXFER_TO_FROM_DEV. */
+               iter2.data_source = ITER_SOURCE;
+               ret = bio_copy_from_iter(bio, &iter2);
+               if (ret)
+                       goto cleanup;
        } else {
                if (bmd->is_our_pages)
                        zero_fill_bio(bio);
index 2d470cf2173e29c540161401bd566942f049f27c..2a06fd33039da6ad1cbb2b6d212c8662184788e0 100644
@@ -810,6 +810,10 @@ static struct request *attempt_merge(struct request_queue *q,
        if (rq_data_dir(req) != rq_data_dir(next))
                return NULL;
 
+       /* Don't merge requests with different write hints. */
+       if (req->write_hint != next->write_hint)
+               return NULL;
+
        if (req->ioprio != next->ioprio)
                return NULL;
 
@@ -937,6 +941,10 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
        if (!bio_crypt_rq_ctx_compatible(rq, bio))
                return false;
 
+       /* Don't merge requests with different write hints. */
+       if (rq->write_hint != bio->bi_write_hint)
+               return false;
+
        if (rq->ioprio != bio_prio(bio))
                return false;
 
index aa87fcfda1ecfc875c86a0258fe16e707ce3f167..555ada922cf06021124eb3170983fc308e8d2a38 100644
@@ -21,7 +21,6 @@
 #include <linux/llist.h>
 #include <linux/cpu.h>
 #include <linux/cache.h>
-#include <linux/sched/sysctl.h>
 #include <linux/sched/topology.h>
 #include <linux/sched/signal.h>
 #include <linux/delay.h>
@@ -40,7 +39,6 @@
 #include "blk-stat.h"
 #include "blk-mq-sched.h"
 #include "blk-rq-qos.h"
-#include "blk-ioprio.h"
 
 static DEFINE_PER_CPU(struct llist_head, blk_cpu_done);
 static DEFINE_PER_CPU(call_single_data_t, blk_cpu_csd);
@@ -323,7 +321,7 @@ void blk_rq_init(struct request_queue *q, struct request *rq)
        RB_CLEAR_NODE(&rq->rb_node);
        rq->tag = BLK_MQ_NO_TAG;
        rq->internal_tag = BLK_MQ_NO_TAG;
-       rq->start_time_ns = ktime_get_ns();
+       rq->start_time_ns = blk_time_get_ns();
        rq->part = NULL;
        blk_crypto_rq_set_defaults(rq);
 }
@@ -333,7 +331,7 @@ EXPORT_SYMBOL(blk_rq_init);
 static inline void blk_mq_rq_time_init(struct request *rq, u64 alloc_time_ns)
 {
        if (blk_mq_need_time_stamp(rq))
-               rq->start_time_ns = ktime_get_ns();
+               rq->start_time_ns = blk_time_get_ns();
        else
                rq->start_time_ns = 0;
 
@@ -444,7 +442,7 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
 
        /* alloc_time includes depth and tag waits */
        if (blk_queue_rq_alloc_time(q))
-               alloc_time_ns = ktime_get_ns();
+               alloc_time_ns = blk_time_get_ns();
 
        if (data->cmd_flags & REQ_NOWAIT)
                data->flags |= BLK_MQ_REQ_NOWAIT;
@@ -629,7 +627,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
 
        /* alloc_time includes depth and tag waits */
        if (blk_queue_rq_alloc_time(q))
-               alloc_time_ns = ktime_get_ns();
+               alloc_time_ns = blk_time_get_ns();
 
        /*
         * If the tag allocator sleeps we could get an allocation for a
@@ -1042,7 +1040,7 @@ static inline void __blk_mq_end_request_acct(struct request *rq, u64 now)
 inline void __blk_mq_end_request(struct request *rq, blk_status_t error)
 {
        if (blk_mq_need_time_stamp(rq))
-               __blk_mq_end_request_acct(rq, ktime_get_ns());
+               __blk_mq_end_request_acct(rq, blk_time_get_ns());
 
        blk_mq_finish_request(rq);
 
@@ -1085,7 +1083,7 @@ void blk_mq_end_request_batch(struct io_comp_batch *iob)
        u64 now = 0;
 
        if (iob->need_ts)
-               now = ktime_get_ns();
+               now = blk_time_get_ns();
 
        while ((rq = rq_list_pop(&iob->req_list)) != NULL) {
                prefetch(rq->bio);
@@ -1168,10 +1166,11 @@ static inline bool blk_mq_complete_need_ipi(struct request *rq)
        if (force_irqthreads())
                return false;
 
-       /* same CPU or cache domain?  Complete locally */
+       /* same CPU or cache domain and capacity?  Complete locally */
        if (cpu == rq->mq_ctx->cpu ||
            (!test_bit(QUEUE_FLAG_SAME_FORCE, &rq->q->queue_flags) &&
-            cpus_share_cache(cpu, rq->mq_ctx->cpu)))
+            cpus_share_cache(cpu, rq->mq_ctx->cpu) &&
+            cpus_equal_capacity(cpu, rq->mq_ctx->cpu)))
                return false;
 
        /* don't try to IPI to an offline CPU */
@@ -1255,7 +1254,7 @@ void blk_mq_start_request(struct request *rq)
 
        if (test_bit(QUEUE_FLAG_STATS, &q->queue_flags) &&
            !blk_rq_is_passthrough(rq)) {
-               rq->io_start_time_ns = ktime_get_ns();
+               rq->io_start_time_ns = blk_time_get_ns();
                rq->stats_sectors = blk_rq_sectors(rq);
                rq->rq_flags |= RQF_STATS;
                rq_qos_issue(q, rq);
@@ -1410,22 +1409,10 @@ blk_status_t blk_execute_rq(struct request *rq, bool at_head)
        blk_mq_insert_request(rq, at_head ? BLK_MQ_INSERT_AT_HEAD : 0);
        blk_mq_run_hw_queue(hctx, false);
 
-       if (blk_rq_is_poll(rq)) {
+       if (blk_rq_is_poll(rq))
                blk_rq_poll_completion(rq, &wait.done);
-       } else {
-               /*
-                * Prevent hang_check timer from firing at us during very long
-                * I/O
-                */
-               unsigned long hang_check = sysctl_hung_task_timeout_secs;
-
-               if (hang_check)
-                       while (!wait_for_completion_io_timeout(&wait.done,
-                                       hang_check * (HZ/2)))
-                               ;
-               else
-                       wait_for_completion_io(&wait.done);
-       }
+       else
+               blk_wait_io(&wait.done);
 
        return wait.ret;
 }
@@ -2585,6 +2572,7 @@ static void blk_mq_bio_to_request(struct request *rq, struct bio *bio,
                rq->cmd_flags |= REQ_FAILFAST_MASK;
 
        rq->__sector = bio->bi_iter.bi_sector;
+       rq->write_hint = bio->bi_write_hint;
        blk_rq_bio_prep(rq, bio, nr_segs);
 
        /* This can't fail, since GFP_NOIO includes __GFP_DIRECT_RECLAIM. */
@@ -2892,9 +2880,6 @@ static struct request *blk_mq_get_new_requests(struct request_queue *q,
        };
        struct request *rq;
 
-       if (blk_mq_attempt_bio_merge(q, bio, nsegs))
-               return NULL;
-
        rq_qos_throttle(q, bio);
 
        if (plug) {
@@ -2913,22 +2898,31 @@ static struct request *blk_mq_get_new_requests(struct request_queue *q,
 }
 
 /*
- * Check if we can use the passed on request for submitting the passed in bio,
- * and remove it from the request list if it can be used.
+ * Check if there is a suitable cached request and return it.
  */
-static bool blk_mq_use_cached_rq(struct request *rq, struct blk_plug *plug,
-               struct bio *bio)
+static struct request *blk_mq_peek_cached_request(struct blk_plug *plug,
+               struct request_queue *q, blk_opf_t opf)
 {
-       enum hctx_type type = blk_mq_get_hctx_type(bio->bi_opf);
-       enum hctx_type hctx_type = rq->mq_hctx->type;
+       enum hctx_type type = blk_mq_get_hctx_type(opf);
+       struct request *rq;
 
-       WARN_ON_ONCE(rq_list_peek(&plug->cached_rq) != rq);
+       if (!plug)
+               return NULL;
+       rq = rq_list_peek(&plug->cached_rq);
+       if (!rq || rq->q != q)
+               return NULL;
+       if (type != rq->mq_hctx->type &&
+           (type != HCTX_TYPE_READ || rq->mq_hctx->type != HCTX_TYPE_DEFAULT))
+               return NULL;
+       if (op_is_flush(rq->cmd_flags) != op_is_flush(opf))
+               return NULL;
+       return rq;
+}
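
The hctx-type test above is deliberately asymmetric; spelled out as a comment (illustrative restatement, mirroring the condition):

/*
 * A cached request is reusable when its hctx type matches the bio's
 * exactly, or when the bio wants HCTX_TYPE_READ and the cached request
 * sits on a HCTX_TYPE_DEFAULT queue (reads may use the default queue).
 * Any other type mismatch, or a flush-flag mismatch, disqualifies it.
 */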
 
-       if (type != hctx_type &&
-           !(type == HCTX_TYPE_READ && hctx_type == HCTX_TYPE_DEFAULT))
-               return false;
-       if (op_is_flush(rq->cmd_flags) != op_is_flush(bio->bi_opf))
-               return false;
+static void blk_mq_use_cached_rq(struct request *rq, struct blk_plug *plug,
+               struct bio *bio)
+{
+       WARN_ON_ONCE(rq_list_peek(&plug->cached_rq) != rq);
 
        /*
         * If any qos ->throttle() end up blocking, we will have flushed the
@@ -2941,15 +2935,6 @@ static bool blk_mq_use_cached_rq(struct request *rq, struct blk_plug *plug,
        blk_mq_rq_time_init(rq, 0);
        rq->cmd_flags = bio->bi_opf;
        INIT_LIST_HEAD(&rq->queuelist);
-       return true;
-}
-
-static void bio_set_ioprio(struct bio *bio)
-{
-       /* Nobody set ioprio so far? Initialize it based on task's nice value */
-       if (IOPRIO_PRIO_CLASS(bio->bi_ioprio) == IOPRIO_CLASS_NONE)
-               bio->bi_ioprio = get_current_ioprio();
-       blkcg_set_ioprio(bio);
 }
 
 /**
@@ -2971,51 +2956,43 @@ void blk_mq_submit_bio(struct bio *bio)
        struct blk_plug *plug = blk_mq_plug(bio);
        const int is_sync = op_is_sync(bio->bi_opf);
        struct blk_mq_hw_ctx *hctx;
-       struct request *rq = NULL;
        unsigned int nr_segs = 1;
+       struct request *rq;
        blk_status_t ret;
 
        bio = blk_queue_bounce(bio, q);
-       bio_set_ioprio(bio);
 
-       if (plug) {
-               rq = rq_list_peek(&plug->cached_rq);
-               if (rq && rq->q != q)
-                       rq = NULL;
-       }
-       if (rq) {
-               if (unlikely(bio_may_exceed_limits(bio, &q->limits))) {
-                       bio = __bio_split_to_limits(bio, &q->limits, &nr_segs);
-                       if (!bio)
-                               return;
-               }
-               if (!bio_integrity_prep(bio))
-                       return;
-               if (blk_mq_attempt_bio_merge(q, bio, nr_segs))
-                       return;
-               if (blk_mq_use_cached_rq(rq, plug, bio))
-                       goto done;
-               percpu_ref_get(&q->q_usage_counter);
-       } else {
+       /*
+        * If the plug has a cached request for this queue, try to use it.
+        *
+        * The cached request already holds a q_usage_counter reference and we
+        * don't have to acquire a new one if we use it.
+        */
+       rq = blk_mq_peek_cached_request(plug, q, bio->bi_opf);
+       if (!rq) {
                if (unlikely(bio_queue_enter(bio)))
                        return;
-               if (unlikely(bio_may_exceed_limits(bio, &q->limits))) {
-                       bio = __bio_split_to_limits(bio, &q->limits, &nr_segs);
-                       if (!bio)
-                               goto fail;
-               }
-               if (!bio_integrity_prep(bio))
-                       goto fail;
        }
 
-       rq = blk_mq_get_new_requests(q, plug, bio, nr_segs);
-       if (unlikely(!rq)) {
-fail:
-               blk_queue_exit(q);
-               return;
+       if (unlikely(bio_may_exceed_limits(bio, &q->limits))) {
+               bio = __bio_split_to_limits(bio, &q->limits, &nr_segs);
+               if (!bio)
+                       goto queue_exit;
+       }
+       if (!bio_integrity_prep(bio))
+               goto queue_exit;
+
+       if (blk_mq_attempt_bio_merge(q, bio, nr_segs))
+               goto queue_exit;
+
+       if (!rq) {
+               rq = blk_mq_get_new_requests(q, plug, bio, nr_segs);
+               if (unlikely(!rq))
+                       goto queue_exit;
+       } else {
+               blk_mq_use_cached_rq(rq, plug, bio);
        }
 
-done:
        trace_block_getrq(bio);
 
        rq_qos_track(q, rq, bio);
@@ -3046,6 +3023,15 @@ done:
        } else {
                blk_mq_run_dispatch_ops(q, blk_mq_try_issue_directly(hctx, rq));
        }
+       return;
+
+queue_exit:
+       /*
+        * Don't drop the queue reference if we were trying to use a cached
+        * request and thus didn't acquire one.
+        */
+       if (!rq)
+               blk_queue_exit(q);
 }
 
 #ifdef CONFIG_BLK_MQ_STACKING
@@ -3107,7 +3093,7 @@ blk_status_t blk_insert_cloned_request(struct request *rq)
        blk_mq_run_dispatch_ops(q,
                        ret = blk_mq_request_issue_directly(rq, true));
        if (ret)
-               blk_account_io_done(rq, ktime_get_ns());
+               blk_account_io_done(rq, blk_time_get_ns());
        return ret;
 }
 EXPORT_SYMBOL_GPL(blk_insert_cloned_request);
@@ -3185,6 +3171,7 @@ int blk_rq_prep_clone(struct request *rq, struct request *rq_src,
        }
        rq->nr_phys_segments = rq_src->nr_phys_segments;
        rq->ioprio = rq_src->ioprio;
+       rq->write_hint = rq_src->write_hint;
 
        if (rq->bio && blk_crypto_rq_bio_prep(rq, rq->bio, gfp_mask) < 0)
                goto free_and_out;
@@ -4086,15 +4073,16 @@ void blk_mq_release(struct request_queue *q)
        blk_mq_sysfs_deinit(q);
 }
 
-static struct request_queue *blk_mq_init_queue_data(struct blk_mq_tag_set *set,
-               void *queuedata)
+struct request_queue *blk_mq_alloc_queue(struct blk_mq_tag_set *set,
+               struct queue_limits *lim, void *queuedata)
 {
+       struct queue_limits default_lim = { };
        struct request_queue *q;
        int ret;
 
-       q = blk_alloc_queue(set->numa_node);
-       if (!q)
-               return ERR_PTR(-ENOMEM);
+       q = blk_alloc_queue(lim ? lim : &default_lim, set->numa_node);
+       if (IS_ERR(q))
+               return q;
        q->queuedata = queuedata;
        ret = blk_mq_init_allocated_queue(set, q);
        if (ret) {
@@ -4103,20 +4091,15 @@ static struct request_queue *blk_mq_init_queue_data(struct blk_mq_tag_set *set,
        }
        return q;
 }
-
-struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
-{
-       return blk_mq_init_queue_data(set, NULL);
-}
-EXPORT_SYMBOL(blk_mq_init_queue);
+EXPORT_SYMBOL(blk_mq_alloc_queue);
 
 /**
  * blk_mq_destroy_queue - shutdown a request queue
  * @q: request queue to shutdown
  *
- * This shuts down a request queue allocated by blk_mq_init_queue(). All future
+ * This shuts down a request queue allocated by blk_mq_alloc_queue(). All future
  * requests will be failed with -ENODEV. The caller is responsible for dropping
- * the reference from blk_mq_init_queue() by calling blk_put_queue().
+ * the reference from blk_mq_alloc_queue() by calling blk_put_queue().
  *
  * Context: can sleep
  */
@@ -4137,13 +4120,14 @@ void blk_mq_destroy_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL(blk_mq_destroy_queue);
 
-struct gendisk *__blk_mq_alloc_disk(struct blk_mq_tag_set *set, void *queuedata,
+struct gendisk *__blk_mq_alloc_disk(struct blk_mq_tag_set *set,
+               struct queue_limits *lim, void *queuedata,
                struct lock_class_key *lkclass)
 {
        struct request_queue *q;
        struct gendisk *disk;
 
-       q = blk_mq_init_queue_data(set, queuedata);
+       q = blk_mq_alloc_queue(set, lim, queuedata);
        if (IS_ERR(q))
                return ERR_CAST(q);
 
@@ -4397,7 +4381,7 @@ static void blk_mq_update_queue_map(struct blk_mq_tag_set *set)
        if (set->nr_maps == 1)
                set->map[HCTX_TYPE_DEFAULT].nr_queues = set->nr_hw_queues;
 
-       if (set->ops->map_queues && !is_kdump_kernel()) {
+       if (set->ops->map_queues) {
                int i;
 
                /*
@@ -4496,14 +4480,12 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
 
        /*
         * If a crashdump is active, then we are potentially in a very
-        * memory constrained environment. Limit us to 1 queue and
-        * 64 tags to prevent using too much memory.
+        * memory constrained environment. Limit us to 64 tags to prevent
+        * using too much memory.
         */
-       if (is_kdump_kernel()) {
-               set->nr_hw_queues = 1;
-               set->nr_maps = 1;
+       if (is_kdump_kernel())
                set->queue_depth = min(64U, set->queue_depth);
-       }
+
        /*
         * There is no use for more h/w queues than cpus if we just have
         * a single map
@@ -4533,7 +4515,7 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
                                                  GFP_KERNEL, set->numa_node);
                if (!set->map[i].mq_map)
                        goto out_free_mq_map;
-               set->map[i].nr_queues = is_kdump_kernel() ? 1 : set->nr_hw_queues;
+               set->map[i].nr_queues = set->nr_hw_queues;
        }
 
        blk_mq_update_queue_map(set);
index 06ea91e51b8b2e554e5ead222abf3ad1b3ded4ef..3c7d8d638ab59dc9704aa01217c9b940b5941e4b 100644
@@ -25,53 +25,22 @@ void blk_queue_rq_timeout(struct request_queue *q, unsigned int timeout)
 }
 EXPORT_SYMBOL_GPL(blk_queue_rq_timeout);
 
-/**
- * blk_set_default_limits - reset limits to default values
- * @lim:  the queue_limits structure to reset
- *
- * Description:
- *   Returns a queue_limit struct to its default state.
- */
-void blk_set_default_limits(struct queue_limits *lim)
-{
-       lim->max_segments = BLK_MAX_SEGMENTS;
-       lim->max_discard_segments = 1;
-       lim->max_integrity_segments = 0;
-       lim->seg_boundary_mask = BLK_SEG_BOUNDARY_MASK;
-       lim->virt_boundary_mask = 0;
-       lim->max_segment_size = BLK_MAX_SEGMENT_SIZE;
-       lim->max_sectors = lim->max_hw_sectors = BLK_SAFE_MAX_SECTORS;
-       lim->max_user_sectors = lim->max_dev_sectors = 0;
-       lim->chunk_sectors = 0;
-       lim->max_write_zeroes_sectors = 0;
-       lim->max_zone_append_sectors = 0;
-       lim->max_discard_sectors = 0;
-       lim->max_hw_discard_sectors = 0;
-       lim->max_secure_erase_sectors = 0;
-       lim->discard_granularity = 512;
-       lim->discard_alignment = 0;
-       lim->discard_misaligned = 0;
-       lim->logical_block_size = lim->physical_block_size = lim->io_min = 512;
-       lim->bounce = BLK_BOUNCE_NONE;
-       lim->alignment_offset = 0;
-       lim->io_opt = 0;
-       lim->misaligned = 0;
-       lim->zoned = false;
-       lim->zone_write_granularity = 0;
-       lim->dma_alignment = 511;
-}
-
 /**
  * blk_set_stacking_limits - set default limits for stacking devices
  * @lim:  the queue_limits structure to reset
  *
- * Description:
- *   Returns a queue_limit struct to its default state. Should be used
- *   by stacking drivers like DM that have no internal limits.
+ * Prepare queue limits for applying limits from underlying devices using
+ * blk_stack_limits().
  */
 void blk_set_stacking_limits(struct queue_limits *lim)
 {
-       blk_set_default_limits(lim);
+       memset(lim, 0, sizeof(*lim));
+       lim->logical_block_size = SECTOR_SIZE;
+       lim->physical_block_size = SECTOR_SIZE;
+       lim->io_min = SECTOR_SIZE;
+       lim->discard_granularity = SECTOR_SIZE;
+       lim->dma_alignment = SECTOR_SIZE - 1;
+       lim->seg_boundary_mask = BLK_SEG_BOUNDARY_MASK;
 
        /* Inherit limits from component devices */
        lim->max_segments = USHRT_MAX;
@@ -82,9 +51,239 @@ void blk_set_stacking_limits(struct queue_limits *lim)
        lim->max_dev_sectors = UINT_MAX;
        lim->max_write_zeroes_sectors = UINT_MAX;
        lim->max_zone_append_sectors = UINT_MAX;
+       lim->max_user_discard_sectors = UINT_MAX;
 }
 EXPORT_SYMBOL(blk_set_stacking_limits);
 
+static void blk_apply_bdi_limits(struct backing_dev_info *bdi,
+               struct queue_limits *lim)
+{
+       /*
+        * For read-ahead of large files to be effective, we need to read ahead
+        * at least twice the optimal I/O size.
+        */
+       bdi->ra_pages = max(lim->io_opt * 2 / PAGE_SIZE, VM_READAHEAD_PAGES);
+       bdi->io_pages = lim->max_sectors >> PAGE_SECTORS_SHIFT;
+}
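
To make the bdi sizing concrete, a hypothetical example assuming PAGE_SIZE == 4096, where VM_READAHEAD_PAGES is 128 KiB worth of pages (32):

/*
 * io_opt = 512 KiB, max_sectors = 1280:
 *
 *   ra_pages = max(2 * 524288 / 4096, 32) = 256 pages  (1 MiB readahead)
 *   io_pages = 1280 >> PAGE_SECTORS_SHIFT = 1280 >> 3  = 160 pages
 */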
+
+static int blk_validate_zoned_limits(struct queue_limits *lim)
+{
+       if (!lim->zoned) {
+               if (WARN_ON_ONCE(lim->max_open_zones) ||
+                   WARN_ON_ONCE(lim->max_active_zones) ||
+                   WARN_ON_ONCE(lim->zone_write_granularity) ||
+                   WARN_ON_ONCE(lim->max_zone_append_sectors))
+                       return -EINVAL;
+               return 0;
+       }
+
+       if (WARN_ON_ONCE(!IS_ENABLED(CONFIG_BLK_DEV_ZONED)))
+               return -EINVAL;
+
+       if (lim->zone_write_granularity < lim->logical_block_size)
+               lim->zone_write_granularity = lim->logical_block_size;
+
+       if (lim->max_zone_append_sectors) {
+               /*
+                * The Zone Append size is limited by the maximum I/O size
+                * and the zone size given that it can't span zones.
+                */
+               lim->max_zone_append_sectors =
+                       min3(lim->max_hw_sectors,
+                            lim->max_zone_append_sectors,
+                            lim->chunk_sectors);
+       }
+
+       return 0;
+}
+
+/*
+ * Check that the limits in lim are valid, initialize defaults for unset
+ * values, and cap values based on others where needed.
+ */
+static int blk_validate_limits(struct queue_limits *lim)
+{
+       unsigned int max_hw_sectors;
+
+       /*
+        * Unless otherwise specified, default to 512 byte logical blocks and a
+        * physical block size equal to the logical block size.
+        */
+       if (!lim->logical_block_size)
+               lim->logical_block_size = SECTOR_SIZE;
+       if (lim->physical_block_size < lim->logical_block_size)
+               lim->physical_block_size = lim->logical_block_size;
+
+       /*
+        * The minimum I/O size defaults to the physical block size unless
+        * explicitly overridden.
+        */
+       if (lim->io_min < lim->physical_block_size)
+               lim->io_min = lim->physical_block_size;
+
+       /*
+        * max_hw_sectors has a somewhat weird default for historical
+        * reasons, but drivers really should set their own instead of
+        * relying on this value.
+        *
+        * The block layer relies on the fact that every driver can
+        * handle at least a page's worth of data per I/O, and needs the value
+        * aligned to the logical block size.
+        */
+       if (!lim->max_hw_sectors)
+               lim->max_hw_sectors = BLK_SAFE_MAX_SECTORS;
+       if (WARN_ON_ONCE(lim->max_hw_sectors < PAGE_SECTORS))
+               return -EINVAL;
+       lim->max_hw_sectors = round_down(lim->max_hw_sectors,
+                       lim->logical_block_size >> SECTOR_SHIFT);
+
+       /*
+        * The actual max_sectors value is a complex beast and also takes the
+        * max_dev_sectors value (set by SCSI ULPs) and a user configurable
+        * value into account.  The ->max_sectors value is always calculated
+        * from these, so directly setting it won't have any effect.
+        */
+       max_hw_sectors = min_not_zero(lim->max_hw_sectors,
+                               lim->max_dev_sectors);
+       if (lim->max_user_sectors) {
+               if (lim->max_user_sectors > max_hw_sectors ||
+                   lim->max_user_sectors < PAGE_SIZE / SECTOR_SIZE)
+                       return -EINVAL;
+               lim->max_sectors = min(max_hw_sectors, lim->max_user_sectors);
+       } else {
+               lim->max_sectors = min(max_hw_sectors, BLK_DEF_MAX_SECTORS_CAP);
+       }
+       lim->max_sectors = round_down(lim->max_sectors,
+                       lim->logical_block_size >> SECTOR_SHIFT);
+
+       /*
+        * Arbitrary default for the maximum number of segments.  Drivers
+        * should not rely on this and should set their own.
+        */
+       if (!lim->max_segments)
+               lim->max_segments = BLK_MAX_SEGMENTS;
+
+       lim->max_discard_sectors =
+               min(lim->max_hw_discard_sectors, lim->max_user_discard_sectors);
+
+       if (!lim->max_discard_segments)
+               lim->max_discard_segments = 1;
+
+       if (lim->discard_granularity < lim->physical_block_size)
+               lim->discard_granularity = lim->physical_block_size;
+
+       /*
+        * By default there is no limit on the segment boundary alignment,
+        * but if there is one it can't be smaller than the page size as
+        * that would break all the normal I/O patterns.
+        */
+       if (!lim->seg_boundary_mask)
+               lim->seg_boundary_mask = BLK_SEG_BOUNDARY_MASK;
+       if (WARN_ON_ONCE(lim->seg_boundary_mask < PAGE_SIZE - 1))
+               return -EINVAL;
+
+       /*
+        * Devices that require a virtual boundary do not support scatter/gather
+        * I/O natively, but instead require a descriptor list entry for each
+        * page (which might not be identical to the Linux PAGE_SIZE).  Because
+        * of that they are not limited by our notion of "segment size".
+        */
+       if (lim->virt_boundary_mask) {
+               if (WARN_ON_ONCE(lim->max_segment_size &&
+                                lim->max_segment_size != UINT_MAX))
+                       return -EINVAL;
+               lim->max_segment_size = UINT_MAX;
+       } else {
+               /*
+                * The maximum segment size has an odd historical 64k default that
+                * drivers probably should override.  Just like the I/O size we
+                * require drivers to at least handle a full page per segment.
+                */
+               if (!lim->max_segment_size)
+                       lim->max_segment_size = BLK_MAX_SEGMENT_SIZE;
+               if (WARN_ON_ONCE(lim->max_segment_size < PAGE_SIZE))
+                       return -EINVAL;
+       }
+
+       /*
+        * We require drivers to at least do logical block aligned I/O, but
+        * historically could not check for that due to the separate calls
+        * to set the limits.  Once the transition is finished the check
+        * below should be narrowed down to check the logical block size.
+        */
+       if (!lim->dma_alignment)
+               lim->dma_alignment = SECTOR_SIZE - 1;
+       if (WARN_ON_ONCE(lim->dma_alignment > PAGE_SIZE))
+               return -EINVAL;
+
+       if (lim->alignment_offset) {
+               lim->alignment_offset &= (lim->physical_block_size - 1);
+               lim->misaligned = 0;
+       }
+
+       return blk_validate_zoned_limits(lim);
+}
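
As a rough illustration of what validation fills in, consider a fully zero-initialized queue_limits run through blk_validate_limits() (constants as named above; illustrative, not exhaustive):

/*
 * logical_block_size   = SECTOR_SIZE            (512)
 * physical_block_size  = logical_block_size
 * io_min               = physical_block_size
 * max_hw_sectors       = BLK_SAFE_MAX_SECTORS, max_sectors capped from it
 * max_segments         = BLK_MAX_SEGMENTS
 * max_discard_segments = 1
 * max_segment_size     = BLK_MAX_SEGMENT_SIZE
 * seg_boundary_mask    = BLK_SEG_BOUNDARY_MASK
 * dma_alignment        = SECTOR_SIZE - 1        (511)
 *
 * i.e. every unset field receives a checked default rather than
 * silently staying zero, and invalid combinations now fail with
 * -EINVAL instead of being papered over.
 */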
+
+/*
+ * Set the default limits for a newly allocated queue.  @lim contains the
+ * initial limits set by the driver, which could be no limit in which case
+ * all fields are cleared to zero.
+ */
+int blk_set_default_limits(struct queue_limits *lim)
+{
+       /*
+        * Most defaults are set by capping the bounds in blk_validate_limits(),
+        * but max_user_discard_sectors is special and needs an explicit
+        * initialization to the max value here.
+        */
+       lim->max_user_discard_sectors = UINT_MAX;
+       return blk_validate_limits(lim);
+}
+
+/**
+ * queue_limits_commit_update - commit an atomic update of queue limits
+ * @q:         queue to update
+ * @lim:       limits to apply
+ *
+ * Apply to @q the limits in @lim that were obtained from
+ * queue_limits_start_update() and then updated by the caller.
+ *
+ * Returns 0 if successful, else a negative error code.
+ */
+int queue_limits_commit_update(struct request_queue *q,
+               struct queue_limits *lim)
+       __releases(q->limits_lock)
+{
+       int error = blk_validate_limits(lim);
+
+       if (!error) {
+               q->limits = *lim;
+               if (q->disk)
+                       blk_apply_bdi_limits(q->disk->bdi, lim);
+       }
+       mutex_unlock(&q->limits_lock);
+       return error;
+}
+EXPORT_SYMBOL_GPL(queue_limits_commit_update);
+
+/**
+ * queue_limits_set - apply queue limits to queue
+ * @q:         queue to update
+ * @lim:       limits to apply
+ *
+ * Apply the limits in @lim that were freshly initialized to @q.
+ * To update existing limits use queue_limits_start_update() and
+ * queue_limits_commit_update() instead.
+ *
+ * Returns 0 if successful, else a negative error code.
+ */
+int queue_limits_set(struct request_queue *q, struct queue_limits *lim)
+{
+       mutex_lock(&q->limits_lock);
+       return queue_limits_commit_update(q, lim);
+}
+EXPORT_SYMBOL_GPL(queue_limits_set);
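
A condensed usage sketch of the new API pair, mirroring the sysfs store paths further down (error handling trimmed, the stored value hypothetical):

/* Atomically update existing limits; queue_limits_start_update() takes
 * q->limits_lock, queue_limits_commit_update() validates, applies and
 * unlocks.  Freezing the queue keeps in-flight I/O out of the window. */
struct queue_limits lim;
int err;

blk_mq_freeze_queue(q);
lim = queue_limits_start_update(q);
lim.max_user_sectors = max_sectors_kb << 1;
err = queue_limits_commit_update(q, &lim);
blk_mq_unfreeze_queue(q);

For freshly initialized limits, queue_limits_set() wraps the same commit under the lock.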
+
 /**
  * blk_queue_bounce_limit - set bounce buffer limit for queue
  * @q: the request queue for the device
@@ -177,8 +376,11 @@ EXPORT_SYMBOL(blk_queue_chunk_sectors);
 void blk_queue_max_discard_sectors(struct request_queue *q,
                unsigned int max_discard_sectors)
 {
-       q->limits.max_hw_discard_sectors = max_discard_sectors;
-       q->limits.max_discard_sectors = max_discard_sectors;
+       struct queue_limits *lim = &q->limits;
+
+       lim->max_hw_discard_sectors = max_discard_sectors;
+       lim->max_discard_sectors =
+               min(max_discard_sectors, lim->max_user_discard_sectors);
 }
 EXPORT_SYMBOL(blk_queue_max_discard_sectors);
 
@@ -393,15 +595,7 @@ EXPORT_SYMBOL(blk_queue_alignment_offset);
 
 void disk_update_readahead(struct gendisk *disk)
 {
-       struct request_queue *q = disk->queue;
-
-       /*
-        * For read-ahead of large files to be effective, we need to read ahead
-        * at least twice the optimal I/O size.
-        */
-       disk->bdi->ra_pages =
-               max(queue_io_opt(q) * 2 / PAGE_SIZE, VM_READAHEAD_PAGES);
-       disk->bdi->io_pages = queue_max_sectors(q) >> (PAGE_SHIFT - 9);
+       blk_apply_bdi_limits(disk->bdi, &disk->queue->limits);
 }
 EXPORT_SYMBOL_GPL(disk_update_readahead);
 
@@ -689,33 +883,38 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
        t->zone_write_granularity = max(t->zone_write_granularity,
                                        b->zone_write_granularity);
        t->zoned = max(t->zoned, b->zoned);
+       if (!t->zoned) {
+               t->zone_write_granularity = 0;
+               t->max_zone_append_sectors = 0;
+       }
        return ret;
 }
 EXPORT_SYMBOL(blk_stack_limits);
 
 /**
- * disk_stack_limits - adjust queue limits for stacked drivers
- * @disk:  MD/DM gendisk (top)
+ * queue_limits_stack_bdev - adjust queue_limits for stacked devices
+ * @t: the stacking driver limits (top device)
  * @bdev:  the underlying block device (bottom)
  * @offset:  offset to beginning of data within component device
+ * @pfx: prefix to use for warnings logged
  *
  * Description:
- *    Merges the limits for a top level gendisk and a bottom level
- *    block_device.
+ *    This function is used by stacking drivers like MD and DM to ensure
+ *    that all component devices have compatible block sizes and
+ *    alignments.  The stacking driver must provide a queue_limits
+ *    struct (top) and then iteratively call the stacking function for
+ *    all component (bottom) devices.  The stacking function will
+ *    attempt to combine the values and ensure proper alignment.
  */
-void disk_stack_limits(struct gendisk *disk, struct block_device *bdev,
-                      sector_t offset)
+void queue_limits_stack_bdev(struct queue_limits *t, struct block_device *bdev,
+               sector_t offset, const char *pfx)
 {
-       struct request_queue *t = disk->queue;
-
-       if (blk_stack_limits(&t->limits, &bdev_get_queue(bdev)->limits,
-                       get_start_sect(bdev) + (offset >> 9)) < 0)
+       if (blk_stack_limits(t, &bdev_get_queue(bdev)->limits,
+                       get_start_sect(bdev) + offset))
                pr_notice("%s: Warning: Device %pg is misaligned\n",
-                       disk->disk_name, bdev);
-
-       disk_update_readahead(disk);
+                       pfx, bdev);
 }
-EXPORT_SYMBOL(disk_stack_limits);
+EXPORT_SYMBOL_GPL(queue_limits_stack_bdev);
 
 /**
  * blk_queue_update_dma_pad - update pad mask
index 7ff76ae6c76a9531050af5e7a70c14af755eb403..e42c263e53fb995cd7c0b0d6d3de021cae05aae1 100644
@@ -27,7 +27,7 @@ void blk_rq_stat_init(struct blk_rq_stat *stat)
 /* src is a per-cpu stat, mean isn't initialized */
 void blk_rq_stat_sum(struct blk_rq_stat *dst, struct blk_rq_stat *src)
 {
-       if (!src->nr_samples)
+       if (dst->nr_samples + src->nr_samples <= dst->nr_samples)
                return;
 
        dst->min = min(dst->min, src->min);
index 6b2429cad81af1d9ea4c717a5885616d22b0f68b..8c8f69d8ba48ee7ca553f9a72cc90d5367291666 100644
@@ -174,23 +174,29 @@ static ssize_t queue_discard_max_show(struct request_queue *q, char *page)
 static ssize_t queue_discard_max_store(struct request_queue *q,
                                       const char *page, size_t count)
 {
-       unsigned long max_discard;
-       ssize_t ret = queue_var_store(&max_discard, page, count);
+       unsigned long max_discard_bytes;
+       struct queue_limits lim;
+       ssize_t ret;
+       int err;
 
+       ret = queue_var_store(&max_discard_bytes, page, count);
        if (ret < 0)
                return ret;
 
-       if (max_discard & (q->limits.discard_granularity - 1))
+       if (max_discard_bytes & (q->limits.discard_granularity - 1))
                return -EINVAL;
 
-       max_discard >>= 9;
-       if (max_discard > UINT_MAX)
+       if ((max_discard_bytes >> SECTOR_SHIFT) > UINT_MAX)
                return -EINVAL;
 
-       if (max_discard > q->limits.max_hw_discard_sectors)
-               max_discard = q->limits.max_hw_discard_sectors;
+       blk_mq_freeze_queue(q);
+       lim = queue_limits_start_update(q);
+       lim.max_user_discard_sectors = max_discard_bytes >> SECTOR_SHIFT;
+       err = queue_limits_commit_update(q, &lim);
+       blk_mq_unfreeze_queue(q);
 
-       q->limits.max_discard_sectors = max_discard;
+       if (err)
+               return err;
        return ret;
 }
 
@@ -226,35 +232,22 @@ static ssize_t queue_zone_append_max_show(struct request_queue *q, char *page)
 static ssize_t
 queue_max_sectors_store(struct request_queue *q, const char *page, size_t count)
 {
-       unsigned long var;
-       unsigned int max_sectors_kb,
-               max_hw_sectors_kb = queue_max_hw_sectors(q) >> 1,
-                       page_kb = 1 << (PAGE_SHIFT - 10);
-       ssize_t ret = queue_var_store(&var, page, count);
+       unsigned long max_sectors_kb;
+       struct queue_limits lim;
+       ssize_t ret;
+       int err;
 
+       ret = queue_var_store(&max_sectors_kb, page, count);
        if (ret < 0)
                return ret;
 
-       max_sectors_kb = (unsigned int)var;
-       max_hw_sectors_kb = min_not_zero(max_hw_sectors_kb,
-                                        q->limits.max_dev_sectors >> 1);
-       if (max_sectors_kb == 0) {
-               q->limits.max_user_sectors = 0;
-               max_sectors_kb = min(max_hw_sectors_kb,
-                                    BLK_DEF_MAX_SECTORS_CAP >> 1);
-       } else {
-               if (max_sectors_kb > max_hw_sectors_kb ||
-                   max_sectors_kb < page_kb)
-                       return -EINVAL;
-               q->limits.max_user_sectors = max_sectors_kb << 1;
-       }
-
-       spin_lock_irq(&q->queue_lock);
-       q->limits.max_sectors = max_sectors_kb << 1;
-       if (q->disk)
-               q->disk->bdi->io_pages = max_sectors_kb >> (PAGE_SHIFT - 10);
-       spin_unlock_irq(&q->queue_lock);
-
+       blk_mq_freeze_queue(q);
+       lim = queue_limits_start_update(q);
+       lim.max_user_sectors = max_sectors_kb << 1;
+       err = queue_limits_commit_update(q, &lim);
+       blk_mq_unfreeze_queue(q);
+       if (err)
+               return err;
        return ret;
 }
 
index 16f5766620a41043645756c51d441f4488af9edf..f4850a6f860bbac8aba17aba9d217d83ec12f6a7 100644
@@ -1098,7 +1098,7 @@ static int throtl_dispatch_tg(struct throtl_grp *tg)
        while ((bio = throtl_peek_queued(&sq->queued[READ])) &&
               tg_may_dispatch(tg, bio, NULL)) {
 
-               tg_dispatch_one_bio(tg, bio_data_dir(bio));
+               tg_dispatch_one_bio(tg, READ);
                nr_reads++;
 
                if (nr_reads >= max_nr_reads)
@@ -1108,7 +1108,7 @@ static int throtl_dispatch_tg(struct throtl_grp *tg)
        while ((bio = throtl_peek_queued(&sq->queued[WRITE])) &&
               tg_may_dispatch(tg, bio, NULL)) {
 
-               tg_dispatch_one_bio(tg, bio_data_dir(bio));
+               tg_dispatch_one_bio(tg, WRITE);
                nr_writes++;
 
                if (nr_writes >= max_nr_writes)
@@ -1815,7 +1815,7 @@ static bool throtl_tg_is_idle(struct throtl_grp *tg)
        time = min_t(unsigned long, MAX_IDLE_TIME, 4 * tg->idletime_threshold);
        ret = tg->latency_target == DFL_LATENCY_TARGET ||
              tg->idletime_threshold == DFL_IDLE_THRESHOLD ||
-             (ktime_get_ns() >> 10) - tg->last_finish_time > time ||
+             (blk_time_get_ns() >> 10) - tg->last_finish_time > time ||
              tg->avg_idletime > tg->idletime_threshold ||
              (tg->latency_target && tg->bio_cnt &&
                tg->bad_bio_cnt * 5 < tg->bio_cnt);
@@ -2060,7 +2060,7 @@ static void blk_throtl_update_idletime(struct throtl_grp *tg)
        if (last_finish_time == 0)
                return;
 
-       now = ktime_get_ns() >> 10;
+       now = blk_time_get_ns() >> 10;
        if (now <= last_finish_time ||
            last_finish_time == tg->checked_last_finish_time)
                return;
@@ -2327,7 +2327,7 @@ void blk_throtl_bio_endio(struct bio *bio)
        if (!tg->td->limit_valid[LIMIT_LOW])
                return;
 
-       finish_time_ns = ktime_get_ns();
+       finish_time_ns = blk_time_get_ns();
        tg->last_finish_time = finish_time_ns >> 10;
 
        start_time = bio_issue_time(&bio->bi_issue) >> 10;
index 5ba3cd574eacbddc1b92bbaec3d79d81fb66ae7a..64472134dd26df23eede4fcd493543e445c1230e 100644
@@ -29,6 +29,7 @@
 #include "blk-wbt.h"
 #include "blk-rq-qos.h"
 #include "elevator.h"
+#include "blk.h"
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/wbt.h>
@@ -163,9 +164,9 @@ static void wb_timestamp(struct rq_wb *rwb, unsigned long *var)
  */
 static bool wb_recent_wait(struct rq_wb *rwb)
 {
-       struct bdi_writeback *wb = &rwb->rqos.disk->bdi->wb;
+       struct backing_dev_info *bdi = rwb->rqos.disk->bdi;
 
-       return time_before(jiffies, wb->dirty_sleep + HZ);
+       return time_before(jiffies, bdi->last_bdp_sleep + HZ);
 }
 
 static inline struct rq_wait *get_rq_wait(struct rq_wb *rwb,
@@ -274,13 +275,12 @@ static inline bool stat_sample_valid(struct blk_rq_stat *stat)
 
 static u64 rwb_sync_issue_lat(struct rq_wb *rwb)
 {
-       u64 now, issue = READ_ONCE(rwb->sync_issue);
+       u64 issue = READ_ONCE(rwb->sync_issue);
 
        if (!issue || !rwb->sync_cookie)
                return 0;
 
-       now = ktime_to_ns(ktime_get());
-       return now - issue;
+       return blk_time_get_ns() - issue;
 }
 
 static inline unsigned int wbt_inflight(struct rq_wb *rwb)
index d343e5756a9c8048374edcce8000230eed10480b..da0f4b2a8fa09330bdc71bf545ed2dc688326392 100644
@@ -11,7 +11,6 @@
 
 #include <linux/kernel.h>
 #include <linux/module.h>
-#include <linux/rbtree.h>
 #include <linux/blkdev.h>
 #include <linux/blk-mq.h>
 #include <linux/mm.h>
@@ -177,8 +176,7 @@ static int blk_zone_need_reset_cb(struct blk_zone *zone, unsigned int idx,
        }
 }
 
-static int blkdev_zone_reset_all_emulated(struct block_device *bdev,
-                                         gfp_t gfp_mask)
+static int blkdev_zone_reset_all_emulated(struct block_device *bdev)
 {
        struct gendisk *disk = bdev->bd_disk;
        sector_t capacity = bdev_nr_sectors(bdev);
@@ -205,7 +203,7 @@ static int blkdev_zone_reset_all_emulated(struct block_device *bdev,
                }
 
                bio = blk_next_bio(bio, bdev, 0, REQ_OP_ZONE_RESET | REQ_SYNC,
-                                  gfp_mask);
+                                  GFP_KERNEL);
                bio->bi_iter.bi_sector = sector;
                sector += zone_sectors;
 
@@ -223,7 +221,7 @@ out_free_need_reset:
        return ret;
 }
 
-static int blkdev_zone_reset_all(struct block_device *bdev, gfp_t gfp_mask)
+static int blkdev_zone_reset_all(struct block_device *bdev)
 {
        struct bio bio;
 
@@ -238,7 +236,6 @@ static int blkdev_zone_reset_all(struct block_device *bdev, gfp_t gfp_mask)
  * @sector:    Start sector of the first zone to operate on
  * @nr_sectors:        Number of sectors, should be at least the length of one zone and
  *             must be zone size aligned.
- * @gfp_mask:  Memory allocation flags (for bio_alloc)
  *
  * Description:
  *    Perform the specified operation on the range of zones specified by
@@ -248,7 +245,7 @@ static int blkdev_zone_reset_all(struct block_device *bdev, gfp_t gfp_mask)
  *    or finish request.
  */
 int blkdev_zone_mgmt(struct block_device *bdev, enum req_op op,
-                    sector_t sector, sector_t nr_sectors, gfp_t gfp_mask)
+                    sector_t sector, sector_t nr_sectors)
 {
        struct request_queue *q = bdev_get_queue(bdev);
        sector_t zone_sectors = bdev_zone_sectors(bdev);
@@ -285,12 +282,12 @@ int blkdev_zone_mgmt(struct block_device *bdev, enum req_op op,
         */
        if (op == REQ_OP_ZONE_RESET && sector == 0 && nr_sectors == capacity) {
                if (!blk_queue_zone_resetall(q))
-                       return blkdev_zone_reset_all_emulated(bdev, gfp_mask);
-               return blkdev_zone_reset_all(bdev, gfp_mask);
+                       return blkdev_zone_reset_all_emulated(bdev);
+               return blkdev_zone_reset_all(bdev);
        }
 
        while (sector < end_sector) {
-               bio = blk_next_bio(bio, bdev, 0, op | REQ_SYNC, gfp_mask);
+               bio = blk_next_bio(bio, bdev, 0, op | REQ_SYNC, GFP_KERNEL);
                bio->bi_iter.bi_sector = sector;
                sector += zone_sectors;
 
@@ -419,8 +416,7 @@ int blkdev_zone_mgmt_ioctl(struct block_device *bdev, blk_mode_t mode,
                return -ENOTTY;
        }
 
-       ret = blkdev_zone_mgmt(bdev, op, zrange.sector, zrange.nr_sectors,
-                              GFP_KERNEL);
+       ret = blkdev_zone_mgmt(bdev, op, zrange.sector, zrange.nr_sectors);
 
 fail:
        if (cmd == BLKRESETZONE)
index 1ef920f72e0f87172227778cbf1fa4b78cdea295..a19b7b42e6503cd5ca5e03aba41b894a891929b8 100644
@@ -4,6 +4,8 @@
 
 #include <linux/blk-crypto.h>
 #include <linux/memblock.h>    /* for max_pfn/max_low_pfn */
+#include <linux/sched/sysctl.h>
+#include <linux/timekeeping.h>
 #include <xen/xen.h>
 #include "blk-crypto-internal.h"
 
@@ -70,6 +72,18 @@ static inline int bio_queue_enter(struct bio *bio)
        return __bio_queue_enter(q, bio);
 }
 
+static inline void blk_wait_io(struct completion *done)
+{
+       /* Prevent hang_check timer from firing at us during very long I/O */
+       unsigned long timeout = sysctl_hung_task_timeout_secs * HZ / 2;
+
+       if (timeout)
+               while (!wait_for_completion_io_timeout(done, timeout))
+                       ;
+       else
+               wait_for_completion_io(done);
+}
+
 #define BIO_INLINE_VECS 4
 struct bio_vec *bvec_alloc(mempool_t *pool, unsigned short *nr_vecs,
                gfp_t gfp_mask);
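
blk_wait_io() above exists to keep very long I/O waits from tripping the hung-task watchdog: instead of one unbounded wait it re-arms a timed wait of half the hung-task period, so the task is always seen making progress. A runnable userspace analogue of the same slicing pattern, assuming a POSIX semaphore stands in for the completion:

    #include <semaphore.h>
    #include <time.h>

    /* Wait on `done` in fixed slices so a watchdog that samples progress
     * every `slice` seconds never observes one unbounded sleep.
     */
    static void wait_sliced(sem_t *done, time_t slice)
    {
            struct timespec ts;

            if (!slice) {                   /* no watchdog configured */
                    sem_wait(done);
                    return;
            }
            for (;;) {
                    clock_gettime(CLOCK_REALTIME, &ts);
                    ts.tv_sec += slice;
                    if (sem_timedwait(done, &ts) == 0)
                            return;         /* completion arrived */
                    /* timed out: loop and re-arm another slice */
            }
    }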
@@ -329,7 +343,7 @@ void blk_rq_set_mixed_merge(struct request *rq);
 bool blk_rq_merge_ok(struct request *rq, struct bio *bio);
 enum elv_merge blk_try_merge(struct request *rq, struct bio *bio);
 
-void blk_set_default_limits(struct queue_limits *lim);
+int blk_set_default_limits(struct queue_limits *lim);
 int blk_dev_init(void);
 
 /*
@@ -447,7 +461,7 @@ static inline void bio_release_page(struct bio *bio, struct page *page)
                unpin_user_page(page);
 }
 
-struct request_queue *blk_alloc_queue(int node_id);
+struct request_queue *blk_alloc_queue(struct queue_limits *lim, int node_id);
 
 int disk_scan_partitions(struct gendisk *disk, blk_mode_t mode);
 
@@ -516,4 +530,75 @@ static inline int req_ref_read(struct request *req)
        return atomic_read(&req->ref);
 }
 
+static inline u64 blk_time_get_ns(void)
+{
+       struct blk_plug *plug = current->plug;
+
+       if (!plug)
+               return ktime_get_ns();
+
+       /*
+        * 0 could very well be a valid time, but rather than flag "this is
+        * a valid timestamp" separately, just accept that we'll do an extra
+        * ktime_get_ns() if we happen to get 0 as the current time.
+        */
+       if (!plug->cur_ktime) {
+               plug->cur_ktime = ktime_get_ns();
+               current->flags |= PF_BLOCK_TS;
+       }
+       return plug->cur_ktime;
+}
+
+static inline ktime_t blk_time_get(void)
+{
+       return ns_to_ktime(blk_time_get_ns());
+}
+
+/*
+ * From most significant bit:
+ * 1 bit: reserved for other usage, see below
+ * 12 bits: original size of bio
+ * 51 bits: issue time of bio
+ */
+#define BIO_ISSUE_RES_BITS      1
+#define BIO_ISSUE_SIZE_BITS     12
+#define BIO_ISSUE_RES_SHIFT     (64 - BIO_ISSUE_RES_BITS)
+#define BIO_ISSUE_SIZE_SHIFT    (BIO_ISSUE_RES_SHIFT - BIO_ISSUE_SIZE_BITS)
+#define BIO_ISSUE_TIME_MASK     ((1ULL << BIO_ISSUE_SIZE_SHIFT) - 1)
+#define BIO_ISSUE_SIZE_MASK     \
+       (((1ULL << BIO_ISSUE_SIZE_BITS) - 1) << BIO_ISSUE_SIZE_SHIFT)
+#define BIO_ISSUE_RES_MASK      (~((1ULL << BIO_ISSUE_RES_SHIFT) - 1))
+
+/* Reserved bit for blk-throtl */
+#define BIO_ISSUE_THROTL_SKIP_LATENCY (1ULL << 63)
+
+static inline u64 __bio_issue_time(u64 time)
+{
+       return time & BIO_ISSUE_TIME_MASK;
+}
+
+static inline u64 bio_issue_time(struct bio_issue *issue)
+{
+       return __bio_issue_time(issue->value);
+}
+
+static inline sector_t bio_issue_size(struct bio_issue *issue)
+{
+       return ((issue->value & BIO_ISSUE_SIZE_MASK) >> BIO_ISSUE_SIZE_SHIFT);
+}
+
+static inline void bio_issue_init(struct bio_issue *issue,
+                                      sector_t size)
+{
+       size &= (1ULL << BIO_ISSUE_SIZE_BITS) - 1;
+       issue->value = ((issue->value & BIO_ISSUE_RES_MASK) |
+                       (blk_time_get_ns() & BIO_ISSUE_TIME_MASK) |
+                       ((u64)size << BIO_ISSUE_SIZE_SHIFT));
+}
+
+void bdev_release(struct file *bdev_file);
+int bdev_open(struct block_device *bdev, blk_mode_t mode, void *holder,
+             const struct blk_holder_ops *hops, struct file *bdev_file);
+int bdev_permission(dev_t dev, blk_mode_t mode, void *holder);
+
 #endif /* BLK_INTERNAL_H */
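
The bio_issue helpers above pack three fields into a single u64: bit 63 is reserved for blk-throtl, bits 62..51 carry the original size, and bits 50..0 the issue time. A standalone, runnable demonstration of the same masks, with the constants copied from the macros above:

    #include <stdio.h>
    #include <stdint.h>

    #define RES_BITS   1
    #define SIZE_BITS  12
    #define RES_SHIFT  (64 - RES_BITS)              /* 63 */
    #define SIZE_SHIFT (RES_SHIFT - SIZE_BITS)      /* 51 */
    #define TIME_MASK  ((1ULL << SIZE_SHIFT) - 1)
    #define SIZE_MASK  (((1ULL << SIZE_BITS) - 1) << SIZE_SHIFT)

    int main(void)
    {
            uint64_t time_ns = 123456789012345ULL;  /* sample issue time */
            uint64_t size    = 0x7ffULL;            /* sectors, fits in 12 bits */
            uint64_t v = (time_ns & TIME_MASK) | (size << SIZE_SHIFT);

            printf("time=%llu size=%llu\n",
                   (unsigned long long)(v & TIME_MASK),
                   (unsigned long long)((v & SIZE_MASK) >> SIZE_SHIFT));
            return 0;                               /* both fields round-trip */
    }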
index 7cfcb242f9a112cc8352121c5b6f73ceb467d4c9..d6a5219f29dd53b5fa7aa379d69a71aa5a087ebc 100644 (file)
@@ -169,6 +169,7 @@ static struct bio *bounce_clone_bio(struct bio *bio_src)
        if (bio_flagged(bio_src, BIO_REMAPPED))
                bio_set_flag(bio, BIO_REMAPPED);
        bio->bi_ioprio          = bio_src->bi_ioprio;
+       bio->bi_write_hint      = bio_src->bi_write_hint;
        bio->bi_iter.bi_sector  = bio_src->bi_iter.bi_sector;
        bio->bi_iter.bi_size    = bio_src->bi_iter.bi_size;
 
index b3acdbdb6e7ea8f62c8d9f76d7ae83a4a35bf0e2..bcc7dee6abced61e33c55950774cfac418375b09 100644 (file)
@@ -383,7 +383,7 @@ struct request_queue *bsg_setup_queue(struct device *dev, const char *name,
        if (blk_mq_alloc_tag_set(set))
                goto out_tag_set;
 
-       q = blk_mq_init_queue(set);
+       q = blk_mq_alloc_queue(set, NULL, NULL);
        if (IS_ERR(q)) {
                ret = PTR_ERR(q);
                goto out_queue;
index 0cf8cf72cdfa108926ae8fc7f53bce05a3225058..679d9b752fe828eb64b67d17e7492469c71e35d3 100644 (file)
@@ -73,6 +73,7 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
                bio_init(&bio, bdev, vecs, nr_pages, dio_bio_write_op(iocb));
        }
        bio.bi_iter.bi_sector = pos >> SECTOR_SHIFT;
+       bio.bi_write_hint = file_inode(iocb->ki_filp)->i_write_hint;
        bio.bi_ioprio = iocb->ki_ioprio;
 
        ret = bio_iov_iter_get_pages(&bio, iter);
@@ -203,6 +204,7 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
 
        for (;;) {
                bio->bi_iter.bi_sector = pos >> SECTOR_SHIFT;
+               bio->bi_write_hint = file_inode(iocb->ki_filp)->i_write_hint;
                bio->bi_private = dio;
                bio->bi_end_io = blkdev_bio_end_io;
                bio->bi_ioprio = iocb->ki_ioprio;
@@ -321,6 +323,7 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
        dio->flags = 0;
        dio->iocb = iocb;
        bio->bi_iter.bi_sector = pos >> SECTOR_SHIFT;
+       bio->bi_write_hint = file_inode(iocb->ki_filp)->i_write_hint;
        bio->bi_end_io = blkdev_bio_end_io_async;
        bio->bi_ioprio = iocb->ki_ioprio;
 
@@ -482,7 +485,7 @@ static void blkdev_readahead(struct readahead_control *rac)
 }
 
 static int blkdev_map_blocks(struct iomap_writepage_ctx *wpc,
-               struct inode *inode, loff_t offset)
+               struct inode *inode, loff_t offset, unsigned int len)
 {
        loff_t isize = i_size_read(inode);
 
@@ -569,18 +572,17 @@ static int blkdev_fsync(struct file *filp, loff_t start, loff_t end,
 blk_mode_t file_to_blk_mode(struct file *file)
 {
        blk_mode_t mode = 0;
-       struct bdev_handle *handle = file->private_data;
 
        if (file->f_mode & FMODE_READ)
                mode |= BLK_OPEN_READ;
        if (file->f_mode & FMODE_WRITE)
                mode |= BLK_OPEN_WRITE;
        /*
-        * do_dentry_open() clears O_EXCL from f_flags, use handle->mode to
-        * determine whether the open was exclusive for already open files.
+        * do_dentry_open() clears O_EXCL from f_flags, use file->private_data
+        * to determine whether the open was exclusive for already open files.
         */
-       if (handle)
-               mode |= handle->mode & BLK_OPEN_EXCL;
+       if (file->private_data)
+               mode |= BLK_OPEN_EXCL;
        else if (file->f_flags & O_EXCL)
                mode |= BLK_OPEN_EXCL;
        if (file->f_flags & O_NDELAY)
@@ -599,36 +601,31 @@ blk_mode_t file_to_blk_mode(struct file *file)
 
 static int blkdev_open(struct inode *inode, struct file *filp)
 {
-       struct bdev_handle *handle;
+       struct block_device *bdev;
        blk_mode_t mode;
-
-       /*
-        * Preserve backwards compatibility and allow large file access
-        * even if userspace doesn't ask for it explicitly. Some mkfs
-        * binary needs it. We might want to drop this workaround
-        * during an unstable branch.
-        */
-       filp->f_flags |= O_LARGEFILE;
-       filp->f_mode |= FMODE_BUF_RASYNC | FMODE_CAN_ODIRECT;
+       int ret;
 
        mode = file_to_blk_mode(filp);
-       handle = bdev_open_by_dev(inode->i_rdev, mode,
-                       mode & BLK_OPEN_EXCL ? filp : NULL, NULL);
-       if (IS_ERR(handle))
-               return PTR_ERR(handle);
+       /* Use the file as the holder. */
+       if (mode & BLK_OPEN_EXCL)
+               filp->private_data = filp;
+       ret = bdev_permission(inode->i_rdev, mode, filp->private_data);
+       if (ret)
+               return ret;
 
-       if (bdev_nowait(handle->bdev))
-               filp->f_mode |= FMODE_NOWAIT;
+       bdev = blkdev_get_no_open(inode->i_rdev);
+       if (!bdev)
+               return -ENXIO;
 
-       filp->f_mapping = handle->bdev->bd_inode->i_mapping;
-       filp->f_wb_err = filemap_sample_wb_err(filp->f_mapping);
-       filp->private_data = handle;
-       return 0;
+       ret = bdev_open(bdev, mode, filp->private_data, NULL, filp);
+       if (ret)
+               blkdev_put_no_open(bdev);
+       return ret;
 }
 
 static int blkdev_release(struct inode *inode, struct file *filp)
 {
-       bdev_release(filp->private_data);
+       bdev_release(filp);
        return 0;
 }
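
With bdev_handle gone, the struct file itself doubles as the exclusive-open holder: blkdev_open() stores filp in filp->private_data only for BLK_OPEN_EXCL opens, so the later check in file_to_blk_mode() reduces to a NULL test. The idea in miniature (helper name hypothetical, assuming private_data has no other use on block-device files):

    /* Non-NULL private_data on a block-device file means the open was
     * exclusive; mirrors the file_to_blk_mode() hunk above.
     */
    static bool blkdev_open_was_exclusive(const struct file *filp)
    {
            return filp->private_data != NULL;
    }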
 
index d74fb5b4ae68188a9b1bd7886212e8fa2566a7a7..bb29a68e1d67662533653d8e74742554ae09b39a 100644 (file)
@@ -342,7 +342,7 @@ EXPORT_SYMBOL_GPL(disk_uevent);
 
 int disk_scan_partitions(struct gendisk *disk, blk_mode_t mode)
 {
-       struct bdev_handle *handle;
+       struct file *file;
        int ret = 0;
 
        if (disk->flags & (GENHD_FL_NO_PART | GENHD_FL_HIDDEN))
@@ -366,12 +366,12 @@ int disk_scan_partitions(struct gendisk *disk, blk_mode_t mode)
        }
 
        set_bit(GD_NEED_PART_SCAN, &disk->state);
-       handle = bdev_open_by_dev(disk_devt(disk), mode & ~BLK_OPEN_EXCL, NULL,
-                                 NULL);
-       if (IS_ERR(handle))
-               ret = PTR_ERR(handle);
+       file = bdev_file_open_by_dev(disk_devt(disk), mode & ~BLK_OPEN_EXCL,
+                                    NULL, NULL);
+       if (IS_ERR(file))
+               ret = PTR_ERR(file);
        else
-               bdev_release(handle);
+               fput(file);
 
        /*
         * If blkdev_get_by_dev() failed early, GD_NEED_PART_SCAN is still set,
@@ -1201,7 +1201,7 @@ static int block_uevent(const struct device *dev, struct kobj_uevent_env *env)
        return add_uevent_var(env, "DISKSEQ=%llu", disk->diskseq);
 }
 
-struct class block_class = {
+const struct class block_class = {
        .name           = "block",
        .dev_uevent     = block_uevent,
 };
@@ -1391,19 +1391,21 @@ out_free_disk:
        return NULL;
 }
 
-struct gendisk *__blk_alloc_disk(int node, struct lock_class_key *lkclass)
+struct gendisk *__blk_alloc_disk(struct queue_limits *lim, int node,
+               struct lock_class_key *lkclass)
 {
+       struct queue_limits default_lim = { };
        struct request_queue *q;
        struct gendisk *disk;
 
-       q = blk_alloc_queue(node);
-       if (!q)
-               return NULL;
+       q = blk_alloc_queue(lim ? lim : &default_lim, node);
+       if (IS_ERR(q))
+               return ERR_CAST(q);
 
        disk = __alloc_disk_node(q, node, lkclass);
        if (!disk) {
                blk_put_queue(q);
-               return NULL;
+               return ERR_PTR(-ENOMEM);
        }
        set_bit(GD_OWNS_QUEUE, &disk->state);
        return disk;
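
__blk_alloc_disk() now reports failure with ERR_PTR() instead of NULL, so the queue-limits validation error can propagate instead of collapsing everything into -ENOMEM. Callers change shape accordingly; a sketch assuming direct use of the helper:

    /* Hypothetical probe path written against the new error contract. */
    static int probe_disk(struct lock_class_key *key)
    {
            struct gendisk *disk = __blk_alloc_disk(NULL, NUMA_NO_NODE, key);

            if (IS_ERR(disk))               /* was: if (!disk) return -ENOMEM */
                    return PTR_ERR(disk);
            /* ... register and use disk ... */
            return 0;
    }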
index 37d18c13d958188ba35214fc842db985d0e908b3..791091a7eac234d79970aca9b32a5d317b085483 100644 (file)
@@ -8,6 +8,8 @@ struct bd_holder_disk {
        int                     refcnt;
 };
 
+static DEFINE_MUTEX(blk_holder_mutex);
+
 static struct bd_holder_disk *bd_find_holder_disk(struct block_device *bdev,
                                                  struct gendisk *disk)
 {
@@ -80,7 +82,7 @@ int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk)
        kobject_get(bdev->bd_holder_dir);
        mutex_unlock(&bdev->bd_disk->open_mutex);
 
-       mutex_lock(&disk->open_mutex);
+       mutex_lock(&blk_holder_mutex);
        WARN_ON_ONCE(!bdev->bd_holder);
 
        holder = bd_find_holder_disk(bdev, disk);
@@ -108,7 +110,7 @@ int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk)
                goto out_del_symlink;
        list_add(&holder->list, &disk->slave_bdevs);
 
-       mutex_unlock(&disk->open_mutex);
+       mutex_unlock(&blk_holder_mutex);
        return 0;
 
 out_del_symlink:
@@ -116,7 +118,7 @@ out_del_symlink:
 out_free_holder:
        kfree(holder);
 out_unlock:
-       mutex_unlock(&disk->open_mutex);
+       mutex_unlock(&blk_holder_mutex);
        if (ret)
                kobject_put(bdev->bd_holder_dir);
        return ret;
@@ -140,7 +142,7 @@ void bd_unlink_disk_holder(struct block_device *bdev, struct gendisk *disk)
        if (WARN_ON_ONCE(!disk->slave_dir))
                return;
 
-       mutex_lock(&disk->open_mutex);
+       mutex_lock(&blk_holder_mutex);
        holder = bd_find_holder_disk(bdev, disk);
        if (!WARN_ON_ONCE(holder == NULL) && !--holder->refcnt) {
                del_symlink(disk->slave_dir, bdev_kobj(bdev));
@@ -149,6 +151,6 @@ void bd_unlink_disk_holder(struct block_device *bdev, struct gendisk *disk)
                list_del_init(&holder->list);
                kfree(holder);
        }
-       mutex_unlock(&disk->open_mutex);
+       mutex_unlock(&blk_holder_mutex);
 }
 EXPORT_SYMBOL_GPL(bd_unlink_disk_holder);
index 9c73a763ef8838953bd1050b505621c39b8d4cdb..0c76137adcaaa5b9d212d789291d681c23c064f6 100644 (file)
@@ -18,10 +18,8 @@ static int blkpg_do_ioctl(struct block_device *bdev,
 {
        struct gendisk *disk = bdev->bd_disk;
        struct blkpg_partition p;
-       sector_t start, length;
+       sector_t start, length, capacity, end;
 
-       if (disk->flags & GENHD_FL_NO_PART)
-               return -EINVAL;
        if (!capable(CAP_SYS_ADMIN))
                return -EACCES;
        if (copy_from_user(&p, upart, sizeof(struct blkpg_partition)))
@@ -43,6 +41,13 @@ static int blkpg_do_ioctl(struct block_device *bdev,
 
        start = p.start >> SECTOR_SHIFT;
        length = p.length >> SECTOR_SHIFT;
+       capacity = get_capacity(disk);
+
+       if (check_add_overflow(start, length, &end))
+               return -EINVAL;
+
+       if (start >= capacity || end > capacity)
+               return -EINVAL;
 
        switch (op) {
        case BLKPG_ADD_PARTITION:
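
The validation added here leans on check_add_overflow() so that a huge start + length cannot wrap around and slip past the capacity test. The kernel macro wraps the compiler builtin used in this runnable demo (GCC/Clang):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
            uint64_t start = UINT64_MAX - 4, length = 8, end;

            if (__builtin_add_overflow(start, length, &end))
                    puts("rejected: start + length wraps");  /* this branch */
            else
                    printf("end = %llu\n", (unsigned long long)end);
            return 0;
    }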
@@ -471,7 +476,7 @@ static int blkdev_bszset(struct block_device *bdev, blk_mode_t mode,
                int __user *argp)
 {
        int ret, n;
-       struct bdev_handle *handle;
+       struct file *file;
 
        if (!capable(CAP_SYS_ADMIN))
                return -EACCES;
@@ -483,12 +488,11 @@ static int blkdev_bszset(struct block_device *bdev, blk_mode_t mode,
        if (mode & BLK_OPEN_EXCL)
                return set_blocksize(bdev, n);
 
-       handle = bdev_open_by_dev(bdev->bd_dev, mode, &bdev, NULL);
-       if (IS_ERR(handle))
+       file = bdev_file_open_by_dev(bdev->bd_dev, mode, &bdev, NULL);
+       if (IS_ERR(file))
                return -EBUSY;
        ret = set_blocksize(bdev, n);
-       bdev_release(handle);
-
+       fput(file);
        return ret;
 }
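
blkdev_bszset() here, like disk_scan_partitions() earlier, now pairs bdev_file_open_by_dev() with a plain fput(): a block device opened this way is just a struct file. A usage sketch; file_bdev() as the accessor is an assumption of this series, not shown in these hunks:

    /* Open by dev_t, use, release through the file-based API (sketch). */
    static int peek_bdev(dev_t dev)
    {
            struct file *file;

            file = bdev_file_open_by_dev(dev, BLK_OPEN_READ, NULL, NULL);
            if (IS_ERR(file))
                    return PTR_ERR(file);
            /* ... operate on file_bdev(file) ... */
            fput(file);
            return 0;
    }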
 
index dec7ce3a3edb7027b971232269846390e3baa834..d247a457bf6e3fd03c0e0f496988c6a5e8999ff1 100644 (file)
@@ -71,6 +71,7 @@ enum opal_response_token {
 #define SHORT_ATOM_BYTE  0xBF
 #define MEDIUM_ATOM_BYTE 0xDF
 #define LONG_ATOM_BYTE   0xE3
+#define EMPTY_ATOM_BYTE  0xFF
 
 #define OPAL_INVAL_PARAM 12
 #define OPAL_MANUFACTURED_INACTIVE 0x08
index cab0d76a828e37eb90e38d91b4e92a61e703717e..b11e88c82c8cfa9e179b05a6633f0b1294d818da 100644 (file)
@@ -419,26 +419,20 @@ static bool partition_overlaps(struct gendisk *disk, sector_t start,
 int bdev_add_partition(struct gendisk *disk, int partno, sector_t start,
                sector_t length)
 {
-       sector_t capacity = get_capacity(disk), end;
        struct block_device *part;
        int ret;
 
        mutex_lock(&disk->open_mutex);
-       if (check_add_overflow(start, length, &end)) {
-               ret = -EINVAL;
+       if (!disk_live(disk)) {
+               ret = -ENXIO;
                goto out;
        }
 
-       if (start >= capacity || end > capacity) {
+       if (disk->flags & GENHD_FL_NO_PART) {
                ret = -EINVAL;
                goto out;
        }
 
-       if (!disk_live(disk)) {
-               ret = -ENXIO;
-               goto out;
-       }
-
        if (partition_overlaps(disk, start, length, -1)) {
                ret = -EBUSY;
                goto out;
index 7b521df00a39f4fc10239b71a1be86263d93beb1..c80183156d68020e0e14974308ac751b3df84421 100644 (file)
@@ -20,6 +20,7 @@ extern void note_bootable_part(dev_t dev, int part, int goodness);
  * Code to understand MacOS partition tables.
  */
 
+#ifdef CONFIG_PPC_PMAC
 static inline void mac_fix_string(char *stg, int len)
 {
        int i;
@@ -27,6 +28,7 @@ static inline void mac_fix_string(char *stg, int len)
        for (i = len - 1; i >= 0 && stg[i] == ' '; i--)
                stg[i] = 0;
 }
+#endif
 
 int mac_partition(struct parsed_partitions *state)
 {
index 3d9e9cd250bd541f3166932bde9e43b35c13f13a..14fe0fef811cfc2fb5e84047a592fa7a309addb4 100644 (file)
@@ -1056,16 +1056,20 @@ static int response_parse(const u8 *buf, size_t length,
                        token_length = response_parse_medium(iter, pos);
                else if (pos[0] <= LONG_ATOM_BYTE) /* long atom */
                        token_length = response_parse_long(iter, pos);
+               else if (pos[0] == EMPTY_ATOM_BYTE) /* empty atom */
+                       token_length = 1;
                else /* TOKEN */
                        token_length = response_parse_token(iter, pos);
 
                if (token_length < 0)
                        return token_length;
 
+               if (pos[0] != EMPTY_ATOM_BYTE)
+                       num_entries++;
+
                pos += token_length;
                total -= token_length;
                iter++;
-               num_entries++;
        }
 
        resp->num = num_entries;
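
Empty atoms (0xFF) are pure padding in the TCG Opal response stream: they must advance the parse cursor by exactly one byte but must not be counted as parsed entries, or later token indexing shifts. The loop shape after the change, with parse_token() as a hypothetical stand-in for the three real atom parsers:

    while (total > 0) {
            size_t tok = (pos[0] == EMPTY_ATOM_BYTE)
                         ? 1 : parse_token(iter, pos);

            if (pos[0] != EMPTY_ATOM_BYTE)
                    num_entries++;          /* padding is never counted */
            pos += tok;
            total -= tok;
            iter++;
    }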
@@ -1208,7 +1212,7 @@ static int cmd_start(struct opal_dev *dev, const u8 *uid, const u8 *method)
 static int start_opal_session_cont(struct opal_dev *dev)
 {
        u32 hsn, tsn;
-       int error = 0;
+       int error;
 
        error = parse_and_check_status(dev);
        if (error)
@@ -1350,7 +1354,7 @@ static int get_active_key_cont(struct opal_dev *dev)
 {
        const char *activekey;
        size_t keylen;
-       int error = 0;
+       int error;
 
        error = parse_and_check_status(dev);
        if (error)
@@ -2153,7 +2157,7 @@ static int lock_unlock_locking_range(struct opal_dev *dev, void *data)
        u8 lr_buffer[OPAL_UID_LENGTH];
        struct opal_lock_unlock *lkul = data;
        u8 read_locked = 1, write_locked = 1;
-       int err = 0;
+       int err;
 
        if (build_locking_range(lr_buffer, sizeof(lr_buffer),
                                lkul->session.opal_key.lr) < 0)
@@ -2576,7 +2580,7 @@ static int opal_get_discv(struct opal_dev *dev, struct opal_discovery *discv)
        const struct opal_step discovery0_step = {
                opal_discovery0, discv
        };
-       int ret = 0;
+       int ret;
 
        mutex_lock(&dev->dev_lock);
        setup_opal_dev(dev);
@@ -3065,7 +3069,7 @@ bool opal_unlock_from_suspend(struct opal_dev *dev)
 {
        struct opal_suspend_data *suspend;
        bool was_failure = false;
-       int ret = 0;
+       int ret;
 
        if (!dev)
                return false;
@@ -3108,10 +3112,9 @@ static int opal_read_table(struct opal_dev *dev,
                { read_table_data, rw_tbl },
                { end_opal_session, }
        };
-       int ret = 0;
 
        if (!rw_tbl->size)
-               return ret;
+               return 0;
 
        return execute_steps(dev, read_table_steps,
                             ARRAY_SIZE(read_table_steps));
@@ -3125,10 +3128,9 @@ static int opal_write_table(struct opal_dev *dev,
                { write_table_data, rw_tbl },
                { end_opal_session, }
        };
-       int ret = 0;
 
        if (!rw_tbl->size)
-               return ret;
+               return 0;
 
        return execute_steps(dev, write_table_steps,
                             ARRAY_SIZE(write_table_steps));
index 914d8cddd43a92ebc65c02c0b92d7e7825cdc3d0..d90892fd6f2ad5faf295b7c91f3bcf1e90a5a713 100644 (file)
 #include <net/checksum.h>
 #include <asm/unaligned.h>
 
-typedef __be16 (csum_fn) (void *, unsigned int);
+typedef __be16 (csum_fn) (__be16, void *, unsigned int);
 
-static __be16 t10_pi_crc_fn(void *data, unsigned int len)
+static __be16 t10_pi_crc_fn(__be16 crc, void *data, unsigned int len)
 {
-       return cpu_to_be16(crc_t10dif(data, len));
+       return cpu_to_be16(crc_t10dif_update(be16_to_cpu(crc), data, len));
 }
 
-static __be16 t10_pi_ip_fn(void *data, unsigned int len)
+static __be16 t10_pi_ip_fn(__be16 csum, void *data, unsigned int len)
 {
        return (__force __be16)ip_compute_csum(data, len);
 }
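
Switching the checksum callbacks to an update style, with the previous value threaded through, is what lets the guard tag later cover the PI bytes placed in front of the tuple: for these CRCs, f(f(0, a), b) equals f(0, a || b). A runnable toy with the same call shape as crc_t10dif_update(), where only the chaining property matters:

    #include <stdio.h>
    #include <stdint.h>

    /* Toy update-style checksum; any left fold over the bytes chains. */
    static uint16_t csum_update(uint16_t prev, const void *buf, size_t len)
    {
            const uint8_t *p = buf;

            while (len--)
                    prev = (uint16_t)(prev * 31 + *p++);
            return prev;
    }

    int main(void)
    {
            uint8_t data[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
            uint16_t whole = csum_update(0, data, 8);
            uint16_t split = csum_update(csum_update(0, data, 5), data + 5, 3);

            printf("%s\n", whole == split ? "chained == whole" : "mismatch");
            return 0;
    }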
@@ -32,12 +32,16 @@ static __be16 t10_pi_ip_fn(void *data, unsigned int len)
 static blk_status_t t10_pi_generate(struct blk_integrity_iter *iter,
                csum_fn *fn, enum t10_dif_type type)
 {
+       u8 offset = iter->pi_offset;
        unsigned int i;
 
        for (i = 0 ; i < iter->data_size ; i += iter->interval) {
-               struct t10_pi_tuple *pi = iter->prot_buf;
+               struct t10_pi_tuple *pi = iter->prot_buf + offset;
 
-               pi->guard_tag = fn(iter->data_buf, iter->interval);
+               pi->guard_tag = fn(0, iter->data_buf, iter->interval);
+               if (offset)
+                       pi->guard_tag = fn(pi->guard_tag, iter->prot_buf,
+                                          offset);
                pi->app_tag = 0;
 
                if (type == T10_PI_TYPE1_PROTECTION)
@@ -56,12 +60,13 @@ static blk_status_t t10_pi_generate(struct blk_integrity_iter *iter,
 static blk_status_t t10_pi_verify(struct blk_integrity_iter *iter,
                csum_fn *fn, enum t10_dif_type type)
 {
+       u8 offset = iter->pi_offset;
        unsigned int i;
 
        BUG_ON(type == T10_PI_TYPE0_PROTECTION);
 
        for (i = 0 ; i < iter->data_size ; i += iter->interval) {
-               struct t10_pi_tuple *pi = iter->prot_buf;
+               struct t10_pi_tuple *pi = iter->prot_buf + offset;
                __be16 csum;
 
                if (type == T10_PI_TYPE1_PROTECTION ||
@@ -83,7 +88,9 @@ static blk_status_t t10_pi_verify(struct blk_integrity_iter *iter,
                                goto next;
                }
 
-               csum = fn(iter->data_buf, iter->interval);
+               csum = fn(0, iter->data_buf, iter->interval);
+               if (offset)
+                       csum = fn(csum, iter->prot_buf, offset);
 
                if (pi->guard_tag != csum) {
                        pr_err("%s: guard tag error at sector %llu " \
@@ -134,8 +141,10 @@ static blk_status_t t10_pi_type1_verify_ip(struct blk_integrity_iter *iter)
  */
 static void t10_pi_type1_prepare(struct request *rq)
 {
-       const int tuple_sz = rq->q->integrity.tuple_size;
+       struct blk_integrity *bi = &rq->q->integrity;
+       const int tuple_sz = bi->tuple_size;
        u32 ref_tag = t10_pi_ref_tag(rq);
+       u8 offset = bi->pi_offset;
        struct bio *bio;
 
        __rq_for_each_bio(bio, rq) {
@@ -154,7 +163,7 @@ static void t10_pi_type1_prepare(struct request *rq)
 
                        p = bvec_kmap_local(&iv);
                        for (j = 0; j < iv.bv_len; j += tuple_sz) {
-                               struct t10_pi_tuple *pi = p;
+                               struct t10_pi_tuple *pi = p + offset;
 
                                if (be32_to_cpu(pi->ref_tag) == virt)
                                        pi->ref_tag = cpu_to_be32(ref_tag);
@@ -183,9 +192,11 @@ static void t10_pi_type1_prepare(struct request *rq)
  */
 static void t10_pi_type1_complete(struct request *rq, unsigned int nr_bytes)
 {
-       unsigned intervals = nr_bytes >> rq->q->integrity.interval_exp;
-       const int tuple_sz = rq->q->integrity.tuple_size;
+       struct blk_integrity *bi = &rq->q->integrity;
+       unsigned intervals = nr_bytes >> bi->interval_exp;
+       const int tuple_sz = bi->tuple_size;
        u32 ref_tag = t10_pi_ref_tag(rq);
+       u8 offset = bi->pi_offset;
        struct bio *bio;
 
        __rq_for_each_bio(bio, rq) {
@@ -200,7 +211,7 @@ static void t10_pi_type1_complete(struct request *rq, unsigned int nr_bytes)
 
                        p = bvec_kmap_local(&iv);
                        for (j = 0; j < iv.bv_len && intervals; j += tuple_sz) {
-                               struct t10_pi_tuple *pi = p;
+                               struct t10_pi_tuple *pi = p + offset;
 
                                if (be32_to_cpu(pi->ref_tag) == ref_tag)
                                        pi->ref_tag = cpu_to_be32(virt);
@@ -280,20 +291,24 @@ const struct blk_integrity_profile t10_pi_type3_ip = {
 };
 EXPORT_SYMBOL(t10_pi_type3_ip);
 
-static __be64 ext_pi_crc64(void *data, unsigned int len)
+static __be64 ext_pi_crc64(u64 crc, void *data, unsigned int len)
 {
-       return cpu_to_be64(crc64_rocksoft(data, len));
+       return cpu_to_be64(crc64_rocksoft_update(crc, data, len));
 }
 
 static blk_status_t ext_pi_crc64_generate(struct blk_integrity_iter *iter,
                                        enum t10_dif_type type)
 {
+       u8 offset = iter->pi_offset;
        unsigned int i;
 
        for (i = 0 ; i < iter->data_size ; i += iter->interval) {
-               struct crc64_pi_tuple *pi = iter->prot_buf;
+               struct crc64_pi_tuple *pi = iter->prot_buf + offset;
 
-               pi->guard_tag = ext_pi_crc64(iter->data_buf, iter->interval);
+               pi->guard_tag = ext_pi_crc64(0, iter->data_buf, iter->interval);
+               if (offset)
+                       pi->guard_tag = ext_pi_crc64(be64_to_cpu(pi->guard_tag),
+                                       iter->prot_buf, offset);
                pi->app_tag = 0;
 
                if (type == T10_PI_TYPE1_PROTECTION)
@@ -319,10 +334,11 @@ static bool ext_pi_ref_escape(u8 *ref_tag)
 static blk_status_t ext_pi_crc64_verify(struct blk_integrity_iter *iter,
                                      enum t10_dif_type type)
 {
+       u8 offset = iter->pi_offset;
        unsigned int i;
 
        for (i = 0; i < iter->data_size; i += iter->interval) {
-               struct crc64_pi_tuple *pi = iter->prot_buf;
+               struct crc64_pi_tuple *pi = iter->prot_buf + offset;
                u64 ref, seed;
                __be64 csum;
 
@@ -343,7 +359,11 @@ static blk_status_t ext_pi_crc64_verify(struct blk_integrity_iter *iter,
                                goto next;
                }
 
-               csum = ext_pi_crc64(iter->data_buf, iter->interval);
+               csum = ext_pi_crc64(0, iter->data_buf, iter->interval);
+               if (offset)
+                       csum = ext_pi_crc64(be64_to_cpu(csum), iter->prot_buf,
+                                           offset);
+
                if (pi->guard_tag != csum) {
                        pr_err("%s: guard tag error at sector %llu " \
                               "(rcvd %016llx, want %016llx)\n",
@@ -373,8 +393,10 @@ static blk_status_t ext_pi_type1_generate_crc64(struct blk_integrity_iter *iter)
 
 static void ext_pi_type1_prepare(struct request *rq)
 {
-       const int tuple_sz = rq->q->integrity.tuple_size;
+       struct blk_integrity *bi = &rq->q->integrity;
+       const int tuple_sz = bi->tuple_size;
        u64 ref_tag = ext_pi_ref_tag(rq);
+       u8 offset = bi->pi_offset;
        struct bio *bio;
 
        __rq_for_each_bio(bio, rq) {
@@ -393,7 +415,7 @@ static void ext_pi_type1_prepare(struct request *rq)
 
                        p = bvec_kmap_local(&iv);
                        for (j = 0; j < iv.bv_len; j += tuple_sz) {
-                               struct crc64_pi_tuple *pi = p;
+                               struct crc64_pi_tuple *pi = p + offset;
                                u64 ref = get_unaligned_be48(pi->ref_tag);
 
                                if (ref == virt)
@@ -411,9 +433,11 @@ static void ext_pi_type1_prepare(struct request *rq)
 
 static void ext_pi_type1_complete(struct request *rq, unsigned int nr_bytes)
 {
-       unsigned intervals = nr_bytes >> rq->q->integrity.interval_exp;
-       const int tuple_sz = rq->q->integrity.tuple_size;
+       struct blk_integrity *bi = &rq->q->integrity;
+       unsigned intervals = nr_bytes >> bi->interval_exp;
+       const int tuple_sz = bi->tuple_size;
        u64 ref_tag = ext_pi_ref_tag(rq);
+       u8 offset = bi->pi_offset;
        struct bio *bio;
 
        __rq_for_each_bio(bio, rq) {
@@ -428,7 +452,7 @@ static void ext_pi_type1_complete(struct request *rq, unsigned int nr_bytes)
 
                        p = bvec_kmap_local(&iv);
                        for (j = 0; j < iv.bv_len && intervals; j += tuple_sz) {
-                               struct crc64_pi_tuple *pi = p;
+                               struct crc64_pi_tuple *pi = p + offset;
                                u64 ref = get_unaligned_be48(pi->ref_tag);
 
                                if (ref == ref_tag)
index 82c44d4899b9676d4d43c2f2af7fd9f95758b894..e24c829d7a0154f0ff016152e6913bff105cd93f 100644 (file)
@@ -91,13 +91,13 @@ static int hash_sendmsg(struct socket *sock, struct msghdr *msg,
                if (!(msg->msg_flags & MSG_MORE)) {
                        err = hash_alloc_result(sk, ctx);
                        if (err)
-                               goto unlock_free;
+                               goto unlock_free_result;
                        ahash_request_set_crypt(&ctx->req, NULL,
                                                ctx->result, 0);
                        err = crypto_wait_req(crypto_ahash_final(&ctx->req),
                                              &ctx->wait);
                        if (err)
-                               goto unlock_free;
+                               goto unlock_free_result;
                }
                goto done_more;
        }
@@ -170,6 +170,7 @@ unlock:
 
 unlock_free:
        af_alg_free_sg(&ctx->sgl);
+unlock_free_result:
        hash_free_result(sk, ctx);
        ctx->more = false;
        goto unlock;
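
The relabeled jump matters because these early error paths run before the scatterlist that af_alg_free_sg() releases was ever populated; landing on the old label could free state that was never set up on this path. The layered-unwind rule in a runnable miniature:

    #include <stdio.h>
    #include <stdlib.h>

    /* An error taken before a resource exists must jump past its free. */
    static int work(int fail_early)
    {
            char *result, *sgl = NULL;
            int err = 0;

            result = malloc(16);            /* step 1 */
            if (!result)
                    return -1;
            if (fail_early) {
                    err = -1;
                    goto free_result;       /* step 2 never ran: skip its free */
            }
            sgl = malloc(32);               /* step 2 */
            if (!sgl) {
                    err = -1;
                    goto free_result;
            }
            /* ... use both ... */
            free(sgl);
    free_result:
            free(result);
            return err;
    }

    int main(void)
    {
            printf("%d %d\n", work(1), work(0));    /* -1 0, no bad free */
            return 0;
    }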
index eedddef9ce40cc40fa7a3c2cd3bcca7607be491b..e81918ca68b782c881bf6f868b441281e249e7f4 100644 (file)
@@ -148,6 +148,9 @@ static int crypto_cbc_create(struct crypto_template *tmpl, struct rtattr **tb)
        if (!is_power_of_2(inst->alg.co.base.cra_blocksize))
                goto out_free_inst;
 
+       if (inst->alg.co.statesize)
+               goto out_free_inst;
+
        inst->alg.encrypt = crypto_cbc_encrypt;
        inst->alg.decrypt = crypto_cbc_decrypt;
 
index 0b6dd8aa21f2edace686fb5531705698e7acc18d..0f1bd7dcde245988bb7d01dc9d0e32655669bdf8 100644 (file)
@@ -212,13 +212,12 @@ static int crypto_lskcipher_crypt_sg(struct skcipher_request *req,
 
        ivsize = crypto_lskcipher_ivsize(tfm);
        ivs = PTR_ALIGN(ivs, crypto_skcipher_alignmask(skcipher) + 1);
+       memcpy(ivs, req->iv, ivsize);
 
        flags = req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP;
 
        if (req->base.flags & CRYPTO_SKCIPHER_REQ_CONT)
                flags |= CRYPTO_LSKCIPHER_FLAG_CONT;
-       else
-               memcpy(ivs, req->iv, ivsize);
 
        if (!(req->base.flags & CRYPTO_SKCIPHER_REQ_NOTFINAL))
                flags |= CRYPTO_LSKCIPHER_FLAG_FINAL;
@@ -234,8 +233,7 @@ static int crypto_lskcipher_crypt_sg(struct skcipher_request *req,
                flags |= CRYPTO_LSKCIPHER_FLAG_CONT;
        }
 
-       if (flags & CRYPTO_LSKCIPHER_FLAG_FINAL)
-               memcpy(req->iv, ivs, ivsize);
+       memcpy(req->iv, ivs, ivsize);
 
        return err;
 }
index 19035230563d7bf8ca2625a06c241d0eb010c3b7..7cb962e2145349670e4a506c879822a6bb3c6c23 100644 (file)
@@ -102,7 +102,7 @@ static int reset_pending_show(struct seq_file *s, void *v)
 {
        struct ivpu_device *vdev = seq_to_ivpu(s);
 
-       seq_printf(s, "%d\n", atomic_read(&vdev->pm->in_reset));
+       seq_printf(s, "%d\n", atomic_read(&vdev->pm->reset_pending));
        return 0;
 }
 
@@ -130,7 +130,9 @@ dvfs_mode_fops_write(struct file *file, const char __user *user_buf, size_t size
 
        fw->dvfs_mode = dvfs_mode;
 
-       ivpu_pm_schedule_recovery(vdev);
+       ret = pci_try_reset_function(to_pci_dev(vdev->drm.dev));
+       if (ret)
+               return ret;
 
        return size;
 }
@@ -190,7 +192,10 @@ fw_profiling_freq_fops_write(struct file *file, const char __user *user_buf,
                return ret;
 
        ivpu_hw_profiling_freq_drive(vdev, enable);
-       ivpu_pm_schedule_recovery(vdev);
+
+       ret = pci_try_reset_function(to_pci_dev(vdev->drm.dev));
+       if (ret)
+               return ret;
 
        return size;
 }
@@ -301,11 +306,18 @@ static ssize_t
 ivpu_force_recovery_fn(struct file *file, const char __user *user_buf, size_t size, loff_t *pos)
 {
        struct ivpu_device *vdev = file->private_data;
+       int ret;
 
        if (!size)
                return -EINVAL;
 
-       ivpu_pm_schedule_recovery(vdev);
+       ret = ivpu_rpm_get(vdev);
+       if (ret)
+               return ret;
+
+       ivpu_pm_trigger_recovery(vdev, "debugfs");
+       flush_work(&vdev->pm->recovery_work);
+       ivpu_rpm_put(vdev);
        return size;
 }
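
Forcing recovery from debugfs now takes a runtime-PM reference, triggers recovery synchronously, and flushes the worker before dropping the reference, so the device cannot autosuspend mid-recovery. The generic shape as a hedged kernel-style sketch (names hypothetical; ivpu_rpm_get()/ivpu_rpm_put() are the driver's wrappers around this):

    /* Pin the device awake across queuing and flushing a work item. */
    static int trigger_and_wait(struct device *dev, struct work_struct *work)
    {
            int ret = pm_runtime_resume_and_get(dev);

            if (ret < 0)
                    return ret;
            schedule_work(work);
            flush_work(work);
            pm_runtime_put_autosuspend(dev);
            return 0;
    }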
 
index 64927682161b282e739ef024a0ccff29c49de2cd..4b06402269869335c324770fc572a57bea7316f7 100644 (file)
@@ -6,6 +6,7 @@
 #include <linux/firmware.h>
 #include <linux/module.h>
 #include <linux/pci.h>
+#include <linux/pm_runtime.h>
 
 #include <drm/drm_accel.h>
 #include <drm/drm_file.h>
@@ -17,6 +18,7 @@
 #include "ivpu_debugfs.h"
 #include "ivpu_drv.h"
 #include "ivpu_fw.h"
+#include "ivpu_fw_log.h"
 #include "ivpu_gem.h"
 #include "ivpu_hw.h"
 #include "ivpu_ipc.h"
@@ -65,22 +67,20 @@ struct ivpu_file_priv *ivpu_file_priv_get(struct ivpu_file_priv *file_priv)
        return file_priv;
 }
 
-struct ivpu_file_priv *ivpu_file_priv_get_by_ctx_id(struct ivpu_device *vdev, unsigned long id)
+static void file_priv_unbind(struct ivpu_device *vdev, struct ivpu_file_priv *file_priv)
 {
-       struct ivpu_file_priv *file_priv;
-
-       xa_lock_irq(&vdev->context_xa);
-       file_priv = xa_load(&vdev->context_xa, id);
-       /* file_priv may still be in context_xa during file_priv_release() */
-       if (file_priv && !kref_get_unless_zero(&file_priv->ref))
-               file_priv = NULL;
-       xa_unlock_irq(&vdev->context_xa);
-
-       if (file_priv)
-               ivpu_dbg(vdev, KREF, "file_priv get by id: ctx %u refcount %u\n",
-                        file_priv->ctx.id, kref_read(&file_priv->ref));
-
-       return file_priv;
+       mutex_lock(&file_priv->lock);
+       if (file_priv->bound) {
+               ivpu_dbg(vdev, FILE, "file_priv unbind: ctx %u\n", file_priv->ctx.id);
+
+               ivpu_cmdq_release_all_locked(file_priv);
+               ivpu_jsm_context_release(vdev, file_priv->ctx.id);
+               ivpu_bo_unbind_all_bos_from_context(vdev, &file_priv->ctx);
+               ivpu_mmu_user_context_fini(vdev, &file_priv->ctx);
+               file_priv->bound = false;
+               drm_WARN_ON(&vdev->drm, !xa_erase_irq(&vdev->context_xa, file_priv->ctx.id));
+       }
+       mutex_unlock(&file_priv->lock);
 }
 
 static void file_priv_release(struct kref *ref)
@@ -88,13 +88,15 @@ static void file_priv_release(struct kref *ref)
        struct ivpu_file_priv *file_priv = container_of(ref, struct ivpu_file_priv, ref);
        struct ivpu_device *vdev = file_priv->vdev;
 
-       ivpu_dbg(vdev, FILE, "file_priv release: ctx %u\n", file_priv->ctx.id);
+       ivpu_dbg(vdev, FILE, "file_priv release: ctx %u bound %d\n",
+                file_priv->ctx.id, (bool)file_priv->bound);
+
+       pm_runtime_get_sync(vdev->drm.dev);
+       mutex_lock(&vdev->context_list_lock);
+       file_priv_unbind(vdev, file_priv);
+       mutex_unlock(&vdev->context_list_lock);
+       pm_runtime_put_autosuspend(vdev->drm.dev);
 
-       ivpu_cmdq_release_all(file_priv);
-       ivpu_jsm_context_release(vdev, file_priv->ctx.id);
-       ivpu_bo_remove_all_bos_from_context(vdev, &file_priv->ctx);
-       ivpu_mmu_user_context_fini(vdev, &file_priv->ctx);
-       drm_WARN_ON(&vdev->drm, xa_erase_irq(&vdev->context_xa, file_priv->ctx.id) != file_priv);
        mutex_destroy(&file_priv->lock);
        kfree(file_priv);
 }
@@ -176,9 +178,6 @@ static int ivpu_get_param_ioctl(struct drm_device *dev, void *data, struct drm_f
        case DRM_IVPU_PARAM_CONTEXT_BASE_ADDRESS:
                args->value = vdev->hw->ranges.user.start;
                break;
-       case DRM_IVPU_PARAM_CONTEXT_PRIORITY:
-               args->value = file_priv->priority;
-               break;
        case DRM_IVPU_PARAM_CONTEXT_ID:
                args->value = file_priv->ctx.id;
                break;
@@ -218,17 +217,10 @@ static int ivpu_get_param_ioctl(struct drm_device *dev, void *data, struct drm_f
 
 static int ivpu_set_param_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 {
-       struct ivpu_file_priv *file_priv = file->driver_priv;
        struct drm_ivpu_param *args = data;
        int ret = 0;
 
        switch (args->param) {
-       case DRM_IVPU_PARAM_CONTEXT_PRIORITY:
-               if (args->value <= DRM_IVPU_CONTEXT_PRIORITY_REALTIME)
-                       file_priv->priority = args->value;
-               else
-                       ret = -EINVAL;
-               break;
        default:
                ret = -EINVAL;
        }
@@ -241,50 +233,53 @@ static int ivpu_open(struct drm_device *dev, struct drm_file *file)
        struct ivpu_device *vdev = to_ivpu_device(dev);
        struct ivpu_file_priv *file_priv;
        u32 ctx_id;
-       void *old;
-       int ret;
+       int idx, ret;
 
-       ret = xa_alloc_irq(&vdev->context_xa, &ctx_id, NULL, vdev->context_xa_limit, GFP_KERNEL);
-       if (ret) {
-               ivpu_err(vdev, "Failed to allocate context id: %d\n", ret);
-               return ret;
-       }
+       if (!drm_dev_enter(dev, &idx))
+               return -ENODEV;
 
        file_priv = kzalloc(sizeof(*file_priv), GFP_KERNEL);
        if (!file_priv) {
                ret = -ENOMEM;
-               goto err_xa_erase;
+               goto err_dev_exit;
        }
 
        file_priv->vdev = vdev;
-       file_priv->priority = DRM_IVPU_CONTEXT_PRIORITY_NORMAL;
+       file_priv->bound = true;
        kref_init(&file_priv->ref);
        mutex_init(&file_priv->lock);
 
+       mutex_lock(&vdev->context_list_lock);
+
+       ret = xa_alloc_irq(&vdev->context_xa, &ctx_id, file_priv,
+                          vdev->context_xa_limit, GFP_KERNEL);
+       if (ret) {
+               ivpu_err(vdev, "Failed to allocate context id: %d\n", ret);
+               goto err_unlock;
+       }
+
        ret = ivpu_mmu_user_context_init(vdev, &file_priv->ctx, ctx_id);
        if (ret)
-               goto err_mutex_destroy;
+               goto err_xa_erase;
 
-       old = xa_store_irq(&vdev->context_xa, ctx_id, file_priv, GFP_KERNEL);
-       if (xa_is_err(old)) {
-               ret = xa_err(old);
-               ivpu_err(vdev, "Failed to store context %u: %d\n", ctx_id, ret);
-               goto err_ctx_fini;
-       }
+       mutex_unlock(&vdev->context_list_lock);
+       drm_dev_exit(idx);
+
+       file->driver_priv = file_priv;
 
        ivpu_dbg(vdev, FILE, "file_priv create: ctx %u process %s pid %d\n",
                 ctx_id, current->comm, task_pid_nr(current));
 
-       file->driver_priv = file_priv;
        return 0;
 
-err_ctx_fini:
-       ivpu_mmu_user_context_fini(vdev, &file_priv->ctx);
-err_mutex_destroy:
-       mutex_destroy(&file_priv->lock);
-       kfree(file_priv);
 err_xa_erase:
        xa_erase_irq(&vdev->context_xa, ctx_id);
+err_unlock:
+       mutex_unlock(&vdev->context_list_lock);
+       mutex_destroy(&file_priv->lock);
+       kfree(file_priv);
+err_dev_exit:
+       drm_dev_exit(idx);
        return ret;
 }
 
@@ -340,8 +335,6 @@ static int ivpu_wait_for_ready(struct ivpu_device *vdev)
 
        if (!ret)
                ivpu_dbg(vdev, PM, "VPU ready message received successfully\n");
-       else
-               ivpu_hw_diagnose_failure(vdev);
 
        return ret;
 }
@@ -369,6 +362,9 @@ int ivpu_boot(struct ivpu_device *vdev)
        ret = ivpu_wait_for_ready(vdev);
        if (ret) {
                ivpu_err(vdev, "Failed to boot the firmware: %d\n", ret);
+               ivpu_hw_diagnose_failure(vdev);
+               ivpu_mmu_evtq_dump(vdev);
+               ivpu_fw_log_dump(vdev);
                return ret;
        }
 
@@ -484,9 +480,8 @@ static int ivpu_pci_init(struct ivpu_device *vdev)
        /* Clear any pending errors */
        pcie_capability_clear_word(pdev, PCI_EXP_DEVSTA, 0x3f);
 
-       /* VPU 37XX does not require 10m D3hot delay */
-       if (ivpu_hw_gen(vdev) == IVPU_HW_37XX)
-               pdev->d3hot_delay = 0;
+       /* NPU does not require 10ms D3hot delay */
+       pdev->d3hot_delay = 0;
 
        ret = pcim_enable_device(pdev);
        if (ret) {
@@ -540,6 +535,10 @@ static int ivpu_dev_init(struct ivpu_device *vdev)
        lockdep_set_class(&vdev->submitted_jobs_xa.xa_lock, &submitted_jobs_xa_lock_class_key);
        INIT_LIST_HEAD(&vdev->bo_list);
 
+       ret = drmm_mutex_init(&vdev->drm, &vdev->context_list_lock);
+       if (ret)
+               goto err_xa_destroy;
+
        ret = drmm_mutex_init(&vdev->drm, &vdev->bo_list_lock);
        if (ret)
                goto err_xa_destroy;
@@ -611,14 +610,30 @@ err_xa_destroy:
        return ret;
 }
 
+static void ivpu_bo_unbind_all_user_contexts(struct ivpu_device *vdev)
+{
+       struct ivpu_file_priv *file_priv;
+       unsigned long ctx_id;
+
+       mutex_lock(&vdev->context_list_lock);
+
+       xa_for_each(&vdev->context_xa, ctx_id, file_priv)
+               file_priv_unbind(vdev, file_priv);
+
+       mutex_unlock(&vdev->context_list_lock);
+}
+
 static void ivpu_dev_fini(struct ivpu_device *vdev)
 {
        ivpu_pm_disable(vdev);
        ivpu_shutdown(vdev);
        if (IVPU_WA(d3hot_after_power_off))
                pci_set_power_state(to_pci_dev(vdev->drm.dev), PCI_D3hot);
+
+       ivpu_jobs_abort_all(vdev);
        ivpu_job_done_consumer_fini(vdev);
        ivpu_pm_cancel_recovery(vdev);
+       ivpu_bo_unbind_all_user_contexts(vdev);
 
        ivpu_ipc_fini(vdev);
        ivpu_fw_fini(vdev);
index ebc4b84f27b209df9d653747772e756702cf4601..069ace4adb2d19c1a0544333d0da65632c524ea7 100644 (file)
@@ -56,6 +56,7 @@
 #define IVPU_DBG_JSM    BIT(10)
 #define IVPU_DBG_KREF   BIT(11)
 #define IVPU_DBG_RPM    BIT(12)
+#define IVPU_DBG_MMU_MAP BIT(13)
 
 #define ivpu_err(vdev, fmt, ...) \
        drm_err(&(vdev)->drm, "%s(): " fmt, __func__, ##__VA_ARGS__)
@@ -114,6 +115,7 @@ struct ivpu_device {
 
        struct ivpu_mmu_context gctx;
        struct ivpu_mmu_context rctx;
+       struct mutex context_list_lock; /* Protects user context addition/removal */
        struct xarray context_xa;
        struct xa_limit context_xa_limit;
 
@@ -145,8 +147,8 @@ struct ivpu_file_priv {
        struct mutex lock; /* Protects cmdq */
        struct ivpu_cmdq *cmdq[IVPU_NUM_ENGINES];
        struct ivpu_mmu_context ctx;
-       u32 priority;
        bool has_mmu_faults;
+       bool bound;
 };
 
 extern int ivpu_dbg_mask;
@@ -162,7 +164,6 @@ extern bool ivpu_disable_mmu_cont_pages;
 extern int ivpu_test_mode;
 
 struct ivpu_file_priv *ivpu_file_priv_get(struct ivpu_file_priv *file_priv);
-struct ivpu_file_priv *ivpu_file_priv_get_by_ctx_id(struct ivpu_device *vdev, unsigned long id);
 void ivpu_file_priv_put(struct ivpu_file_priv **link);
 
 int ivpu_boot(struct ivpu_device *vdev);
index 6576232f3e678ee7c2532b07c830b74733c06960..5fa8bd4603d5be6f1fba8c43ba058e6a9b4f3676 100644 (file)
@@ -222,7 +222,6 @@ ivpu_fw_init_wa(struct ivpu_device *vdev)
        const struct vpu_firmware_header *fw_hdr = (const void *)vdev->fw->file->data;
 
        if (IVPU_FW_CHECK_API_VER_LT(vdev, fw_hdr, BOOT, 3, 17) ||
-           (ivpu_hw_gen(vdev) > IVPU_HW_37XX) ||
            (ivpu_test_mode & IVPU_TEST_MODE_D0I3_MSG_DISABLE))
                vdev->wa.disable_d0i3_msg = true;
 
index 1dda4f38ea25cd356cc9efadcaa8d35394c6b19f..e9ddbe9f50ebeffaa3a1617431b864ceec73e2e7 100644 (file)
@@ -24,14 +24,11 @@ static const struct drm_gem_object_funcs ivpu_gem_funcs;
 
 static inline void ivpu_dbg_bo(struct ivpu_device *vdev, struct ivpu_bo *bo, const char *action)
 {
-       if (bo->ctx)
-               ivpu_dbg(vdev, BO, "%6s: size %zu has_pages %d dma_mapped %d handle %u ctx %d vpu_addr 0x%llx mmu_mapped %d\n",
-                        action, ivpu_bo_size(bo), (bool)bo->base.pages, (bool)bo->base.sgt,
-                        bo->handle, bo->ctx->id, bo->vpu_addr, bo->mmu_mapped);
-       else
-               ivpu_dbg(vdev, BO, "%6s: size %zu has_pages %d dma_mapped %d handle %u (not added to context)\n",
-                        action, ivpu_bo_size(bo), (bool)bo->base.pages, (bool)bo->base.sgt,
-                        bo->handle);
+       ivpu_dbg(vdev, BO,
+                "%6s: bo %8p vpu_addr %9llx size %8zu ctx %d has_pages %d dma_mapped %d mmu_mapped %d wc %d imported %d\n",
+                action, bo, bo->vpu_addr, ivpu_bo_size(bo), bo->ctx ? bo->ctx->id : 0,
+                (bool)bo->base.pages, (bool)bo->base.sgt, bo->mmu_mapped, bo->base.map_wc,
+                (bool)bo->base.base.import_attach);
 }
 
 /*
@@ -49,12 +46,7 @@ int __must_check ivpu_bo_pin(struct ivpu_bo *bo)
        mutex_lock(&bo->lock);
 
        ivpu_dbg_bo(vdev, bo, "pin");
-
-       if (!bo->ctx) {
-               ivpu_err(vdev, "vpu_addr not allocated for BO %d\n", bo->handle);
-               ret = -EINVAL;
-               goto unlock;
-       }
+       drm_WARN_ON(&vdev->drm, !bo->ctx);
 
        if (!bo->mmu_mapped) {
                struct sg_table *sgt = drm_gem_shmem_get_pages_sgt(&bo->base);
@@ -85,7 +77,10 @@ ivpu_bo_alloc_vpu_addr(struct ivpu_bo *bo, struct ivpu_mmu_context *ctx,
                       const struct ivpu_addr_range *range)
 {
        struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
-       int ret;
+       int idx, ret;
+
+       if (!drm_dev_enter(&vdev->drm, &idx))
+               return -ENODEV;
 
        mutex_lock(&bo->lock);
 
@@ -101,6 +96,8 @@ ivpu_bo_alloc_vpu_addr(struct ivpu_bo *bo, struct ivpu_mmu_context *ctx,
 
        mutex_unlock(&bo->lock);
 
+       drm_dev_exit(idx);
+
        return ret;
 }
 
@@ -108,11 +105,7 @@ static void ivpu_bo_unbind_locked(struct ivpu_bo *bo)
 {
        struct ivpu_device *vdev = ivpu_bo_to_vdev(bo);
 
-       lockdep_assert_held(&bo->lock);
-
-       ivpu_dbg_bo(vdev, bo, "unbind");
-
-       /* TODO: dma_unmap */
+       lockdep_assert(lockdep_is_held(&bo->lock) || !kref_read(&bo->base.base.refcount));
 
        if (bo->mmu_mapped) {
                drm_WARN_ON(&vdev->drm, !bo->ctx);
@@ -124,19 +117,23 @@ static void ivpu_bo_unbind_locked(struct ivpu_bo *bo)
 
        if (bo->ctx) {
                ivpu_mmu_context_remove_node(bo->ctx, &bo->mm_node);
-               bo->vpu_addr = 0;
                bo->ctx = NULL;
        }
-}
 
-static void ivpu_bo_unbind(struct ivpu_bo *bo)
-{
-       mutex_lock(&bo->lock);
-       ivpu_bo_unbind_locked(bo);
-       mutex_unlock(&bo->lock);
+       if (bo->base.base.import_attach)
+               return;
+
+       dma_resv_lock(bo->base.base.resv, NULL);
+       if (bo->base.sgt) {
+               dma_unmap_sgtable(vdev->drm.dev, bo->base.sgt, DMA_BIDIRECTIONAL, 0);
+               sg_free_table(bo->base.sgt);
+               kfree(bo->base.sgt);
+               bo->base.sgt = NULL;
+       }
+       dma_resv_unlock(bo->base.base.resv);
 }
 
-void ivpu_bo_remove_all_bos_from_context(struct ivpu_device *vdev, struct ivpu_mmu_context *ctx)
+void ivpu_bo_unbind_all_bos_from_context(struct ivpu_device *vdev, struct ivpu_mmu_context *ctx)
 {
        struct ivpu_bo *bo;
 
@@ -146,8 +143,10 @@ void ivpu_bo_remove_all_bos_from_context(struct ivpu_device *vdev, struct ivpu_m
        mutex_lock(&vdev->bo_list_lock);
        list_for_each_entry(bo, &vdev->bo_list, bo_list_node) {
                mutex_lock(&bo->lock);
-               if (bo->ctx == ctx)
+               if (bo->ctx == ctx) {
+                       ivpu_dbg_bo(vdev, bo, "unbind");
                        ivpu_bo_unbind_locked(bo);
+               }
                mutex_unlock(&bo->lock);
        }
        mutex_unlock(&vdev->bo_list_lock);
@@ -199,9 +198,6 @@ ivpu_bo_create(struct ivpu_device *vdev, u64 size, u32 flags)
        list_add_tail(&bo->bo_list_node, &vdev->bo_list);
        mutex_unlock(&vdev->bo_list_lock);
 
-       ivpu_dbg(vdev, BO, "create: vpu_addr 0x%llx size %zu flags 0x%x\n",
-                bo->vpu_addr, bo->base.base.size, flags);
-
        return bo;
 }
 
@@ -212,6 +208,12 @@ static int ivpu_bo_open(struct drm_gem_object *obj, struct drm_file *file)
        struct ivpu_bo *bo = to_ivpu_bo(obj);
        struct ivpu_addr_range *range;
 
+       if (bo->ctx) {
+               ivpu_warn(vdev, "Can't add BO to ctx %u: already in ctx %u\n",
+                         file_priv->ctx.id, bo->ctx->id);
+               return -EALREADY;
+       }
+
        if (bo->flags & DRM_IVPU_BO_SHAVE_MEM)
                range = &vdev->hw->ranges.shave;
        else if (bo->flags & DRM_IVPU_BO_DMA_MEM)
@@ -227,62 +229,24 @@ static void ivpu_bo_free(struct drm_gem_object *obj)
        struct ivpu_device *vdev = to_ivpu_device(obj->dev);
        struct ivpu_bo *bo = to_ivpu_bo(obj);
 
+       ivpu_dbg_bo(vdev, bo, "free");
+
        mutex_lock(&vdev->bo_list_lock);
        list_del(&bo->bo_list_node);
        mutex_unlock(&vdev->bo_list_lock);
 
        drm_WARN_ON(&vdev->drm, !dma_resv_test_signaled(obj->resv, DMA_RESV_USAGE_READ));
 
-       ivpu_dbg_bo(vdev, bo, "free");
-
-       ivpu_bo_unbind(bo);
+       ivpu_bo_unbind_locked(bo);
        mutex_destroy(&bo->lock);
 
        drm_WARN_ON(obj->dev, bo->base.pages_use_count > 1);
        drm_gem_shmem_free(&bo->base);
 }
 
-static const struct dma_buf_ops ivpu_bo_dmabuf_ops =  {
-       .cache_sgt_mapping = true,
-       .attach = drm_gem_map_attach,
-       .detach = drm_gem_map_detach,
-       .map_dma_buf = drm_gem_map_dma_buf,
-       .unmap_dma_buf = drm_gem_unmap_dma_buf,
-       .release = drm_gem_dmabuf_release,
-       .mmap = drm_gem_dmabuf_mmap,
-       .vmap = drm_gem_dmabuf_vmap,
-       .vunmap = drm_gem_dmabuf_vunmap,
-};
-
-static struct dma_buf *ivpu_bo_export(struct drm_gem_object *obj, int flags)
-{
-       struct drm_device *dev = obj->dev;
-       struct dma_buf_export_info exp_info = {
-               .exp_name = KBUILD_MODNAME,
-               .owner = dev->driver->fops->owner,
-               .ops = &ivpu_bo_dmabuf_ops,
-               .size = obj->size,
-               .flags = flags,
-               .priv = obj,
-               .resv = obj->resv,
-       };
-       void *sgt;
-
-       /*
-        * Make sure that pages are allocated and dma-mapped before exporting the bo.
-        * DMA-mapping is required if the bo will be imported to the same device.
-        */
-       sgt = drm_gem_shmem_get_pages_sgt(to_drm_gem_shmem_obj(obj));
-       if (IS_ERR(sgt))
-               return sgt;
-
-       return drm_gem_dmabuf_export(dev, &exp_info);
-}
-
 static const struct drm_gem_object_funcs ivpu_gem_funcs = {
        .free = ivpu_bo_free,
        .open = ivpu_bo_open,
-       .export = ivpu_bo_export,
        .print_info = drm_gem_shmem_object_print_info,
        .pin = drm_gem_shmem_object_pin,
        .unpin = drm_gem_shmem_object_unpin,
@@ -315,11 +279,9 @@ int ivpu_bo_create_ioctl(struct drm_device *dev, void *data, struct drm_file *fi
                return PTR_ERR(bo);
        }
 
-       ret = drm_gem_handle_create(file, &bo->base.base, &bo->handle);
-       if (!ret) {
+       ret = drm_gem_handle_create(file, &bo->base.base, &args->handle);
+       if (!ret)
                args->vpu_addr = bo->vpu_addr;
-               args->handle = bo->handle;
-       }
 
        drm_gem_object_put(&bo->base.base);
 
@@ -361,7 +323,9 @@ ivpu_bo_alloc_internal(struct ivpu_device *vdev, u64 vpu_addr, u64 size, u32 fla
        if (ret)
                goto err_put;
 
+       dma_resv_lock(bo->base.base.resv, NULL);
        ret = drm_gem_shmem_vmap(&bo->base, &map);
+       dma_resv_unlock(bo->base.base.resv);
        if (ret)
                goto err_put;
 
@@ -376,7 +340,10 @@ void ivpu_bo_free_internal(struct ivpu_bo *bo)
 {
        struct iosys_map map = IOSYS_MAP_INIT_VADDR(bo->base.vaddr);
 
+       dma_resv_lock(bo->base.base.resv, NULL);
        drm_gem_shmem_vunmap(&bo->base, &map);
+       dma_resv_unlock(bo->base.base.resv);
+
        drm_gem_object_put(&bo->base.base);
 }
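
Both hunks wrap drm_gem_shmem_vmap()/vunmap() in the object's dma_resv lock, matching the locking contract these helpers now expect (an assumption drawn from the hunks above rather than stated in them). Distilled into a sketch:

    /* Map a shmem GEM object with the reservation lock held. */
    static int map_bo(struct drm_gem_shmem_object *shmem, struct iosys_map *map)
    {
            struct drm_gem_object *obj = &shmem->base;
            int ret;

            dma_resv_lock(obj->resv, NULL);
            ret = drm_gem_shmem_vmap(shmem, map);
            dma_resv_unlock(obj->resv);
            return ret;
    }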
 
@@ -432,19 +399,11 @@ int ivpu_bo_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file
 
 static void ivpu_bo_print_info(struct ivpu_bo *bo, struct drm_printer *p)
 {
-       unsigned long dma_refcount = 0;
-
        mutex_lock(&bo->lock);
 
-       if (bo->base.base.dma_buf && bo->base.base.dma_buf->file)
-               dma_refcount = atomic_long_read(&bo->base.base.dma_buf->file->f_count);
-
-       drm_printf(p, "%-3u %-6d 0x%-12llx %-10lu 0x%-8x %-4u %-8lu",
-                  bo->ctx->id, bo->handle, bo->vpu_addr, bo->base.base.size,
-                  bo->flags, kref_read(&bo->base.base.refcount), dma_refcount);
-
-       if (bo->base.base.import_attach)
-               drm_printf(p, " imported");
+       drm_printf(p, "%-9p %-3u 0x%-12llx %-10lu 0x%-8x %-4u",
+                  bo, bo->ctx->id, bo->vpu_addr, bo->base.base.size,
+                  bo->flags, kref_read(&bo->base.base.refcount));
 
        if (bo->base.pages)
                drm_printf(p, " has_pages");
@@ -452,6 +411,9 @@ static void ivpu_bo_print_info(struct ivpu_bo *bo, struct drm_printer *p)
        if (bo->mmu_mapped)
                drm_printf(p, " mmu_mapped");
 
+       if (bo->base.base.import_attach)
+               drm_printf(p, " imported");
+
        drm_printf(p, "\n");
 
        mutex_unlock(&bo->lock);
@@ -462,8 +424,8 @@ void ivpu_bo_list(struct drm_device *dev, struct drm_printer *p)
        struct ivpu_device *vdev = to_ivpu_device(dev);
        struct ivpu_bo *bo;
 
-       drm_printf(p, "%-3s %-6s %-14s %-10s %-10s %-4s %-8s %s\n",
-                  "ctx", "handle", "vpu_addr", "size", "flags", "refs", "dma_refs", "attribs");
+       drm_printf(p, "%-9s %-3s %-14s %-10s %-10s %-4s %s\n",
+                  "bo", "ctx", "vpu_addr", "size", "flags", "refs", "attribs");
 
        mutex_lock(&vdev->bo_list_lock);
        list_for_each_entry(bo, &vdev->bo_list, bo_list_node)
index d75cad0d3c742db703dbe0812a0df6eaaba24d53..a8559211c70d41ac20bae1da57d846ffd4d7e3b3 100644 (file)
@@ -19,14 +19,13 @@ struct ivpu_bo {
 
        struct mutex lock; /* Protects: ctx, mmu_mapped, vpu_addr */
        u64 vpu_addr;
-       u32 handle;
        u32 flags;
        u32 job_status; /* Valid only for command buffer */
        bool mmu_mapped;
 };
 
 int ivpu_bo_pin(struct ivpu_bo *bo);
-void ivpu_bo_remove_all_bos_from_context(struct ivpu_device *vdev, struct ivpu_mmu_context *ctx);
+void ivpu_bo_unbind_all_bos_from_context(struct ivpu_device *vdev, struct ivpu_mmu_context *ctx);
 
 struct drm_gem_object *ivpu_gem_create_object(struct drm_device *dev, size_t size);
 struct ivpu_bo *ivpu_bo_alloc_internal(struct ivpu_device *vdev, u64 vpu_addr, u64 size, u32 flags);
index 574cdeefb66b39af45beda6534a6ac3eb0e57c7b..89af1006df5587ba560415a7d669251648f3c3e8 100644 (file)
@@ -510,22 +510,12 @@ static int ivpu_boot_pwr_domain_enable(struct ivpu_device *vdev)
        return ret;
 }
 
-static int ivpu_boot_pwr_domain_disable(struct ivpu_device *vdev)
-{
-       ivpu_boot_dpu_active_drive(vdev, false);
-       ivpu_boot_pwr_island_isolation_drive(vdev, true);
-       ivpu_boot_pwr_island_trickle_drive(vdev, false);
-       ivpu_boot_pwr_island_drive(vdev, false);
-
-       return ivpu_boot_wait_for_pwr_island_status(vdev, 0x0);
-}
-
 static void ivpu_boot_no_snoop_enable(struct ivpu_device *vdev)
 {
        u32 val = REGV_RD32(VPU_37XX_HOST_IF_TCU_PTW_OVERRIDES);
 
        val = REG_SET_FLD(VPU_37XX_HOST_IF_TCU_PTW_OVERRIDES, NOSNOOP_OVERRIDE_EN, val);
-       val = REG_SET_FLD(VPU_37XX_HOST_IF_TCU_PTW_OVERRIDES, AW_NOSNOOP_OVERRIDE, val);
+       val = REG_CLR_FLD(VPU_37XX_HOST_IF_TCU_PTW_OVERRIDES, AW_NOSNOOP_OVERRIDE, val);
        val = REG_SET_FLD(VPU_37XX_HOST_IF_TCU_PTW_OVERRIDES, AR_NOSNOOP_OVERRIDE, val);
 
        REGV_WR32(VPU_37XX_HOST_IF_TCU_PTW_OVERRIDES, val);
@@ -616,12 +606,37 @@ static int ivpu_hw_37xx_info_init(struct ivpu_device *vdev)
        return 0;
 }
 
+static int ivpu_hw_37xx_ip_reset(struct ivpu_device *vdev)
+{
+       int ret;
+       u32 val;
+
+       if (IVPU_WA(punit_disabled))
+               return 0;
+
+       ret = REGB_POLL_FLD(VPU_37XX_BUTTRESS_VPU_IP_RESET, TRIGGER, 0, TIMEOUT_US);
+       if (ret) {
+               ivpu_err(vdev, "Timed out waiting for TRIGGER bit\n");
+               return ret;
+       }
+
+       val = REGB_RD32(VPU_37XX_BUTTRESS_VPU_IP_RESET);
+       val = REG_SET_FLD(VPU_37XX_BUTTRESS_VPU_IP_RESET, TRIGGER, val);
+       REGB_WR32(VPU_37XX_BUTTRESS_VPU_IP_RESET, val);
+
+       ret = REGB_POLL_FLD(VPU_37XX_BUTTRESS_VPU_IP_RESET, TRIGGER, 0, TIMEOUT_US);
+       if (ret)
+               ivpu_err(vdev, "Timed out waiting for RESET completion\n");
+
+       return ret;
+}
+
 static int ivpu_hw_37xx_reset(struct ivpu_device *vdev)
 {
        int ret = 0;
 
-       if (ivpu_boot_pwr_domain_disable(vdev)) {
-               ivpu_err(vdev, "Failed to disable power domain\n");
+       if (ivpu_hw_37xx_ip_reset(vdev)) {
+               ivpu_err(vdev, "Failed to reset NPU\n");
                ret = -EIO;
        }
 
@@ -661,6 +676,11 @@ static int ivpu_hw_37xx_power_up(struct ivpu_device *vdev)
 {
        int ret;
 
+       /* PLL requests may fail when powering down, so issue WP 0 here */
+       ret = ivpu_pll_disable(vdev);
+       if (ret)
+               ivpu_warn(vdev, "Failed to disable PLL: %d\n", ret);
+
        ret = ivpu_hw_37xx_d0i3_disable(vdev);
        if (ret)
                ivpu_warn(vdev, "Failed to disable D0I3: %d\n", ret);
@@ -875,24 +895,18 @@ static void ivpu_hw_37xx_irq_disable(struct ivpu_device *vdev)
 
 static void ivpu_hw_37xx_irq_wdt_nce_handler(struct ivpu_device *vdev)
 {
-       ivpu_err_ratelimited(vdev, "WDT NCE irq\n");
-
-       ivpu_pm_schedule_recovery(vdev);
+       ivpu_pm_trigger_recovery(vdev, "WDT NCE IRQ");
 }
 
 static void ivpu_hw_37xx_irq_wdt_mss_handler(struct ivpu_device *vdev)
 {
-       ivpu_err_ratelimited(vdev, "WDT MSS irq\n");
-
        ivpu_hw_wdt_disable(vdev);
-       ivpu_pm_schedule_recovery(vdev);
+       ivpu_pm_trigger_recovery(vdev, "WDT MSS IRQ");
 }
 
 static void ivpu_hw_37xx_irq_noc_firewall_handler(struct ivpu_device *vdev)
 {
-       ivpu_err_ratelimited(vdev, "NOC Firewall irq\n");
-
-       ivpu_pm_schedule_recovery(vdev);
+       ivpu_pm_trigger_recovery(vdev, "NOC Firewall IRQ");
 }
 
 /* Handler for IRQs from VPU core (irqV) */
@@ -970,7 +984,7 @@ static bool ivpu_hw_37xx_irqb_handler(struct ivpu_device *vdev, int irq)
                REGB_WR32(VPU_37XX_BUTTRESS_INTERRUPT_STAT, status);
 
        if (schedule_recovery)
-               ivpu_pm_schedule_recovery(vdev);
+               ivpu_pm_trigger_recovery(vdev, "Buttress IRQ");
 
        return true;
 }
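The 37xx rework above replaces the power-domain disable sequence with an IP reset built on a self-clearing TRIGGER bit: wait until no reset is in flight, set the bit, then wait for hardware to clear it again. A minimal standalone sketch of that trigger-and-poll pattern, where reg_read()/reg_write() and the fake register are hypothetical stand-ins for the REGB_* accessors and the real hardware:

#include <stdio.h>

#define RESET_TRIGGER 0x1u

static unsigned int fake_reset_reg;   /* stands in for ..._VPU_IP_RESET */
static int reads_until_clear;         /* simulates hardware finishing the reset */

static unsigned int reg_read(void)
{
	if (reads_until_clear > 0 && --reads_until_clear == 0)
		fake_reset_reg &= ~RESET_TRIGGER;   /* hardware clears TRIGGER */
	return fake_reset_reg;
}

static void reg_write(unsigned int val)
{
	fake_reset_reg = val;
}

static int poll_trigger_clear(int attempts)
{
	while (attempts--)
		if (!(reg_read() & RESET_TRIGGER))
			return 0;
	return -1;   /* the driver returns -ETIMEDOUT here */
}

static int ip_reset(void)
{
	/* 1. Make sure no earlier reset is still in flight. */
	if (poll_trigger_clear(16))
		return -1;

	/* 2. Kick the reset by setting the self-clearing TRIGGER bit. */
	reg_write(reg_read() | RESET_TRIGGER);
	reads_until_clear = 4;

	/* 3. Completion is signalled by hardware clearing TRIGGER again. */
	return poll_trigger_clear(16);
}

int main(void)
{
	printf("ip_reset -> %d (expect 0)\n", ip_reset());
	return 0;
}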
index eba2fdef2ace1384c93c1cbb30a8e3d9633abba8..a1523d0b1ef3660709ae087003a703fb4f8237bd 100644 (file)
@@ -24,7 +24,7 @@
 #define SKU_HW_ID_SHIFT              16u
 #define SKU_HW_ID_MASK               0xffff0000u
 
-#define PLL_CONFIG_DEFAULT           0x1
+#define PLL_CONFIG_DEFAULT           0x0
 #define PLL_CDYN_DEFAULT             0x80
 #define PLL_EPP_DEFAULT              0x80
 #define PLL_REF_CLK_FREQ            (50 * 1000000)
@@ -530,7 +530,7 @@ static void ivpu_boot_no_snoop_enable(struct ivpu_device *vdev)
        u32 val = REGV_RD32(VPU_40XX_HOST_IF_TCU_PTW_OVERRIDES);
 
        val = REG_SET_FLD(VPU_40XX_HOST_IF_TCU_PTW_OVERRIDES, SNOOP_OVERRIDE_EN, val);
-       val = REG_CLR_FLD(VPU_40XX_HOST_IF_TCU_PTW_OVERRIDES, AW_SNOOP_OVERRIDE, val);
+       val = REG_SET_FLD(VPU_40XX_HOST_IF_TCU_PTW_OVERRIDES, AW_SNOOP_OVERRIDE, val);
        val = REG_CLR_FLD(VPU_40XX_HOST_IF_TCU_PTW_OVERRIDES, AR_SNOOP_OVERRIDE, val);
 
        REGV_WR32(VPU_40XX_HOST_IF_TCU_PTW_OVERRIDES, val);
@@ -704,7 +704,6 @@ static int ivpu_hw_40xx_info_init(struct ivpu_device *vdev)
 {
        struct ivpu_hw_info *hw = vdev->hw;
        u32 tile_disable;
-       u32 tile_enable;
        u32 fuse;
 
        fuse = REGB_RD32(VPU_40XX_BUTTRESS_TILE_FUSE);
@@ -725,10 +724,6 @@ static int ivpu_hw_40xx_info_init(struct ivpu_device *vdev)
        else
                ivpu_dbg(vdev, MISC, "Fuse: All %d tiles enabled\n", TILE_MAX_NUM);
 
-       tile_enable = (~tile_disable) & TILE_MAX_MASK;
-
-       hw->sku = REG_SET_FLD_NUM(SKU, HW_ID, LNL_HW_ID, hw->sku);
-       hw->sku = REG_SET_FLD_NUM(SKU, TILE, tile_enable, hw->sku);
        hw->tile_fuse = tile_disable;
        hw->pll.profiling_freq = PLL_PROFILING_FREQ_DEFAULT;
 
@@ -746,7 +741,7 @@ static int ivpu_hw_40xx_info_init(struct ivpu_device *vdev)
        return 0;
 }
 
-static int ivpu_hw_40xx_reset(struct ivpu_device *vdev)
+static int ivpu_hw_40xx_ip_reset(struct ivpu_device *vdev)
 {
        int ret;
        u32 val;
@@ -768,6 +763,23 @@ static int ivpu_hw_40xx_reset(struct ivpu_device *vdev)
        return ret;
 }
 
+static int ivpu_hw_40xx_reset(struct ivpu_device *vdev)
+{
+       int ret = 0;
+
+       if (ivpu_hw_40xx_ip_reset(vdev)) {
+               ivpu_err(vdev, "Failed to reset VPU IP\n");
+               ret = -EIO;
+       }
+
+       if (ivpu_pll_disable(vdev)) {
+               ivpu_err(vdev, "Failed to disable PLL\n");
+               ret = -EIO;
+       }
+
+       return ret;
+}
+
 static int ivpu_hw_40xx_d0i3_enable(struct ivpu_device *vdev)
 {
        int ret;
@@ -913,7 +925,7 @@ static int ivpu_hw_40xx_power_down(struct ivpu_device *vdev)
 
        ivpu_hw_40xx_save_d0i3_entry_timestamp(vdev);
 
-       if (!ivpu_hw_40xx_is_idle(vdev) && ivpu_hw_40xx_reset(vdev))
+       if (!ivpu_hw_40xx_is_idle(vdev) && ivpu_hw_40xx_ip_reset(vdev))
                ivpu_warn(vdev, "Failed to reset the VPU\n");
 
        if (ivpu_pll_disable(vdev)) {
@@ -1032,18 +1044,18 @@ static void ivpu_hw_40xx_irq_disable(struct ivpu_device *vdev)
 static void ivpu_hw_40xx_irq_wdt_nce_handler(struct ivpu_device *vdev)
 {
        /* TODO: For LNN hang consider engine reset instead of full recovery */
-       ivpu_pm_schedule_recovery(vdev);
+       ivpu_pm_trigger_recovery(vdev, "WDT NCE IRQ");
 }
 
 static void ivpu_hw_40xx_irq_wdt_mss_handler(struct ivpu_device *vdev)
 {
        ivpu_hw_wdt_disable(vdev);
-       ivpu_pm_schedule_recovery(vdev);
+       ivpu_pm_trigger_recovery(vdev, "WDT MSS IRQ");
 }
 
 static void ivpu_hw_40xx_irq_noc_firewall_handler(struct ivpu_device *vdev)
 {
-       ivpu_pm_schedule_recovery(vdev);
+       ivpu_pm_trigger_recovery(vdev, "NOC Firewall IRQ");
 }
 
 /* Handler for IRQs from VPU core (irqV) */
@@ -1137,7 +1149,7 @@ static bool ivpu_hw_40xx_irqb_handler(struct ivpu_device *vdev, int irq)
        REGB_WR32(VPU_40XX_BUTTRESS_INTERRUPT_STAT, status);
 
        if (schedule_recovery)
-               ivpu_pm_schedule_recovery(vdev);
+               ivpu_pm_trigger_recovery(vdev, "Buttress IRQ");
 
        return true;
 }
index e86621f16f85a8d5d41f0d5dcf1ef99d1e45541d..fa66c39b57ecaaecae036d17f79c3e7683fc5279 100644 (file)
@@ -343,10 +343,8 @@ int ivpu_ipc_send_receive_active(struct ivpu_device *vdev, struct vpu_jsm_msg *r
        hb_ret = ivpu_ipc_send_receive_internal(vdev, &hb_req, VPU_JSM_MSG_QUERY_ENGINE_HB_DONE,
                                                &hb_resp, VPU_IPC_CHAN_ASYNC_CMD,
                                                vdev->timeout.jsm);
-       if (hb_ret == -ETIMEDOUT) {
-               ivpu_hw_diagnose_failure(vdev);
-               ivpu_pm_schedule_recovery(vdev);
-       }
+       if (hb_ret == -ETIMEDOUT)
+               ivpu_pm_trigger_recovery(vdev, "IPC timeout");
 
        return ret;
 }
index 7206cf9cdb4a45335b220796621fcd2c55a8ddf0..e70cfb8593390e489e9f9868fb6c2420733ae241 100644 (file)
@@ -112,22 +112,20 @@ static void ivpu_cmdq_release_locked(struct ivpu_file_priv *file_priv, u16 engin
        }
 }
 
-void ivpu_cmdq_release_all(struct ivpu_file_priv *file_priv)
+void ivpu_cmdq_release_all_locked(struct ivpu_file_priv *file_priv)
 {
        int i;
 
-       mutex_lock(&file_priv->lock);
+       lockdep_assert_held(&file_priv->lock);
 
        for (i = 0; i < IVPU_NUM_ENGINES; i++)
                ivpu_cmdq_release_locked(file_priv, i);
-
-       mutex_unlock(&file_priv->lock);
 }
 
 /*
  * Mark the doorbell as unregistered and reset job queue pointers.
  * This function needs to be called when the VPU hardware is restarted
- * and FW looses job queue state. The next time job queue is used it
+ * and FW loses job queue state. The next time job queue is used it
  * will be registered again.
  */
 static void ivpu_cmdq_reset_locked(struct ivpu_file_priv *file_priv, u16 engine)
@@ -161,15 +159,13 @@ void ivpu_cmdq_reset_all_contexts(struct ivpu_device *vdev)
        struct ivpu_file_priv *file_priv;
        unsigned long ctx_id;
 
-       xa_for_each(&vdev->context_xa, ctx_id, file_priv) {
-               file_priv = ivpu_file_priv_get_by_ctx_id(vdev, ctx_id);
-               if (!file_priv)
-                       continue;
+       mutex_lock(&vdev->context_list_lock);
 
+       xa_for_each(&vdev->context_xa, ctx_id, file_priv)
                ivpu_cmdq_reset_all(file_priv);
 
-               ivpu_file_priv_put(&file_priv);
-       }
+       mutex_unlock(&vdev->context_list_lock);
+
 }
 
 static int ivpu_cmdq_push_job(struct ivpu_cmdq *cmdq, struct ivpu_job *job)
@@ -243,60 +239,32 @@ static struct dma_fence *ivpu_fence_create(struct ivpu_device *vdev)
        return &fence->base;
 }
 
-static void job_get(struct ivpu_job *job, struct ivpu_job **link)
+static void ivpu_job_destroy(struct ivpu_job *job)
 {
        struct ivpu_device *vdev = job->vdev;
-
-       kref_get(&job->ref);
-       *link = job;
-
-       ivpu_dbg(vdev, KREF, "Job get: id %u refcount %u\n", job->job_id, kref_read(&job->ref));
-}
-
-static void job_release(struct kref *ref)
-{
-       struct ivpu_job *job = container_of(ref, struct ivpu_job, ref);
-       struct ivpu_device *vdev = job->vdev;
        u32 i;
 
+       ivpu_dbg(vdev, JOB, "Job destroyed: id %3u ctx %2d engine %d",
+                job->job_id, job->file_priv->ctx.id, job->engine_idx);
+
        for (i = 0; i < job->bo_count; i++)
                if (job->bos[i])
                        drm_gem_object_put(&job->bos[i]->base.base);
 
        dma_fence_put(job->done_fence);
        ivpu_file_priv_put(&job->file_priv);
-
-       ivpu_dbg(vdev, KREF, "Job released: id %u\n", job->job_id);
        kfree(job);
-
-       /* Allow the VPU to get suspended, must be called after ivpu_file_priv_put() */
-       ivpu_rpm_put(vdev);
-}
-
-static void job_put(struct ivpu_job *job)
-{
-       struct ivpu_device *vdev = job->vdev;
-
-       ivpu_dbg(vdev, KREF, "Job put: id %u refcount %u\n", job->job_id, kref_read(&job->ref));
-       kref_put(&job->ref, job_release);
 }
 
 static struct ivpu_job *
-ivpu_create_job(struct ivpu_file_priv *file_priv, u32 engine_idx, u32 bo_count)
+ivpu_job_create(struct ivpu_file_priv *file_priv, u32 engine_idx, u32 bo_count)
 {
        struct ivpu_device *vdev = file_priv->vdev;
        struct ivpu_job *job;
-       int ret;
-
-       ret = ivpu_rpm_get(vdev);
-       if (ret < 0)
-               return NULL;
 
        job = kzalloc(struct_size(job, bos, bo_count), GFP_KERNEL);
        if (!job)
-               goto err_rpm_put;
-
-       kref_init(&job->ref);
+               return NULL;
 
        job->vdev = vdev;
        job->engine_idx = engine_idx;
@@ -310,17 +278,14 @@ ivpu_create_job(struct ivpu_file_priv *file_priv, u32 engine_idx, u32 bo_count)
        job->file_priv = ivpu_file_priv_get(file_priv);
 
        ivpu_dbg(vdev, JOB, "Job created: ctx %2d engine %d", file_priv->ctx.id, job->engine_idx);
-
        return job;
 
 err_free_job:
        kfree(job);
-err_rpm_put:
-       ivpu_rpm_put(vdev);
        return NULL;
 }
 
-static int ivpu_job_done(struct ivpu_device *vdev, u32 job_id, u32 job_status)
+static int ivpu_job_signal_and_destroy(struct ivpu_device *vdev, u32 job_id, u32 job_status)
 {
        struct ivpu_job *job;
 
@@ -329,7 +294,7 @@ static int ivpu_job_done(struct ivpu_device *vdev, u32 job_id, u32 job_status)
                return -ENOENT;
 
        if (job->file_priv->has_mmu_faults)
-               job_status = VPU_JSM_STATUS_ABORTED;
+               job_status = DRM_IVPU_JOB_STATUS_ABORTED;
 
        job->bos[CMD_BUF_IDX]->job_status = job_status;
        dma_fence_signal(job->done_fence);
@@ -337,9 +302,10 @@ static int ivpu_job_done(struct ivpu_device *vdev, u32 job_id, u32 job_status)
        ivpu_dbg(vdev, JOB, "Job complete:  id %3u ctx %2d engine %d status 0x%x\n",
                 job->job_id, job->file_priv->ctx.id, job->engine_idx, job_status);
 
+       ivpu_job_destroy(job);
        ivpu_stop_job_timeout_detection(vdev);
 
-       job_put(job);
+       ivpu_rpm_put(vdev);
        return 0;
 }
 
@@ -349,10 +315,10 @@ void ivpu_jobs_abort_all(struct ivpu_device *vdev)
        unsigned long id;
 
        xa_for_each(&vdev->submitted_jobs_xa, id, job)
-               ivpu_job_done(vdev, id, VPU_JSM_STATUS_ABORTED);
+               ivpu_job_signal_and_destroy(vdev, id, DRM_IVPU_JOB_STATUS_ABORTED);
 }
 
-static int ivpu_direct_job_submission(struct ivpu_job *job)
+static int ivpu_job_submit(struct ivpu_job *job)
 {
        struct ivpu_file_priv *file_priv = job->file_priv;
        struct ivpu_device *vdev = job->vdev;
@@ -360,53 +326,65 @@ static int ivpu_direct_job_submission(struct ivpu_job *job)
        struct ivpu_cmdq *cmdq;
        int ret;
 
+       ret = ivpu_rpm_get(vdev);
+       if (ret < 0)
+               return ret;
+
        mutex_lock(&file_priv->lock);
 
        cmdq = ivpu_cmdq_acquire(job->file_priv, job->engine_idx);
        if (!cmdq) {
-               ivpu_warn(vdev, "Failed get job queue, ctx %d engine %d\n",
-                         file_priv->ctx.id, job->engine_idx);
+               ivpu_warn_ratelimited(vdev, "Failed to get job queue, ctx %d engine %d\n",
+                                     file_priv->ctx.id, job->engine_idx);
                ret = -EINVAL;
-               goto err_unlock;
+               goto err_unlock_file_priv;
        }
 
        job_id_range.min = FIELD_PREP(JOB_ID_CONTEXT_MASK, (file_priv->ctx.id - 1));
        job_id_range.max = job_id_range.min | JOB_ID_JOB_MASK;
 
-       job_get(job, &job);
-       ret = xa_alloc(&vdev->submitted_jobs_xa, &job->job_id, job, job_id_range, GFP_KERNEL);
+       xa_lock(&vdev->submitted_jobs_xa);
+       ret = __xa_alloc(&vdev->submitted_jobs_xa, &job->job_id, job, job_id_range, GFP_KERNEL);
        if (ret) {
-               ivpu_warn_ratelimited(vdev, "Failed to allocate job id: %d\n", ret);
-               goto err_job_put;
+               ivpu_dbg(vdev, JOB, "Too many active jobs in ctx %d\n",
+                        file_priv->ctx.id);
+               ret = -EBUSY;
+               goto err_unlock_submitted_jobs_xa;
        }
 
        ret = ivpu_cmdq_push_job(cmdq, job);
        if (ret)
-               goto err_xa_erase;
+               goto err_erase_xa;
 
        ivpu_start_job_timeout_detection(vdev);
 
-       ivpu_dbg(vdev, JOB, "Job submitted: id %3u addr 0x%llx ctx %2d engine %d next %d\n",
-                job->job_id, job->cmd_buf_vpu_addr, file_priv->ctx.id,
-                job->engine_idx, cmdq->jobq->header.tail);
-
-       if (ivpu_test_mode & IVPU_TEST_MODE_NULL_HW) {
-               ivpu_job_done(vdev, job->job_id, VPU_JSM_STATUS_SUCCESS);
+       if (unlikely(ivpu_test_mode & IVPU_TEST_MODE_NULL_HW)) {
                cmdq->jobq->header.head = cmdq->jobq->header.tail;
                wmb(); /* Flush WC buffer for jobq header */
        } else {
                ivpu_cmdq_ring_db(vdev, cmdq);
        }
 
+       ivpu_dbg(vdev, JOB, "Job submitted: id %3u ctx %2d engine %d addr 0x%llx next %d\n",
+                job->job_id, file_priv->ctx.id, job->engine_idx,
+                job->cmd_buf_vpu_addr, cmdq->jobq->header.tail);
+
+       xa_unlock(&vdev->submitted_jobs_xa);
+
        mutex_unlock(&file_priv->lock);
+
+       if (unlikely(ivpu_test_mode & IVPU_TEST_MODE_NULL_HW))
+               ivpu_job_signal_and_destroy(vdev, job->job_id, VPU_JSM_STATUS_SUCCESS);
+
        return 0;
 
-err_xa_erase:
-       xa_erase(&vdev->submitted_jobs_xa, job->job_id);
-err_job_put:
-       job_put(job);
-err_unlock:
+err_erase_xa:
+       __xa_erase(&vdev->submitted_jobs_xa, job->job_id);
+err_unlock_submitted_jobs_xa:
+       xa_unlock(&vdev->submitted_jobs_xa);
+err_unlock_file_priv:
        mutex_unlock(&file_priv->lock);
+       ivpu_rpm_put(vdev);
        return ret;
 }
 
@@ -488,6 +466,9 @@ int ivpu_submit_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
        if (params->engine > DRM_IVPU_ENGINE_COPY)
                return -EINVAL;
 
+       if (params->priority > DRM_IVPU_JOB_PRIORITY_REALTIME)
+               return -EINVAL;
+
        if (params->buffer_count == 0 || params->buffer_count > JOB_MAX_BUFFER_COUNT)
                return -EINVAL;
 
@@ -509,44 +490,49 @@ int ivpu_submit_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
                             params->buffer_count * sizeof(u32));
        if (ret) {
                ret = -EFAULT;
-               goto free_handles;
+               goto err_free_handles;
        }
 
        if (!drm_dev_enter(&vdev->drm, &idx)) {
                ret = -ENODEV;
-               goto free_handles;
+               goto err_free_handles;
        }
 
        ivpu_dbg(vdev, JOB, "Submit ioctl: ctx %u buf_count %u\n",
                 file_priv->ctx.id, params->buffer_count);
 
-       job = ivpu_create_job(file_priv, params->engine, params->buffer_count);
+       job = ivpu_job_create(file_priv, params->engine, params->buffer_count);
        if (!job) {
                ivpu_err(vdev, "Failed to create job\n");
                ret = -ENOMEM;
-               goto dev_exit;
+               goto err_exit_dev;
        }
 
        ret = ivpu_job_prepare_bos_for_submit(file, job, buf_handles, params->buffer_count,
                                              params->commands_offset);
        if (ret) {
-               ivpu_err(vdev, "Failed to prepare job, ret %d\n", ret);
-               goto job_put;
+               ivpu_err(vdev, "Failed to prepare job: %d\n", ret);
+               goto err_destroy_job;
        }
 
-       ret = ivpu_direct_job_submission(job);
-       if (ret) {
-               dma_fence_signal(job->done_fence);
-               ivpu_err(vdev, "Failed to submit job to the HW, ret %d\n", ret);
-       }
+       down_read(&vdev->pm->reset_lock);
+       ret = ivpu_job_submit(job);
+       up_read(&vdev->pm->reset_lock);
+       if (ret)
+               goto err_signal_fence;
 
-job_put:
-       job_put(job);
-dev_exit:
        drm_dev_exit(idx);
-free_handles:
        kfree(buf_handles);
+       return ret;
 
+err_signal_fence:
+       dma_fence_signal(job->done_fence);
+err_destroy_job:
+       ivpu_job_destroy(job);
+err_exit_dev:
+       drm_dev_exit(idx);
+err_free_handles:
+       kfree(buf_handles);
        return ret;
 }
 
@@ -568,7 +554,7 @@ ivpu_job_done_callback(struct ivpu_device *vdev, struct ivpu_ipc_hdr *ipc_hdr,
        }
 
        payload = (struct vpu_ipc_msg_payload_job_done *)&jsm_msg->payload;
-       ret = ivpu_job_done(vdev, payload->job_id, payload->job_status);
+       ret = ivpu_job_signal_and_destroy(vdev, payload->job_id, payload->job_status);
        if (!ret && !xa_empty(&vdev->submitted_jobs_xa))
                ivpu_start_job_timeout_detection(vdev);
 }
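The job rework above drops the kref in favour of single ownership: ivpu_job_create() hands the job to the submitter, ivpu_job_submit() transfers it to the submitted-jobs xarray (and now takes the runtime-PM reference itself), and ivpu_job_signal_and_destroy() is the one place that frees it and drops that reference. A condensed, hypothetical userspace model of the ownership flow, with a small fixed table standing in for submitted_jobs_xa:

#include <stdio.h>
#include <stdlib.h>

struct job { int id; };

#define MAX_JOBS 4
static struct job *submitted[MAX_JOBS]; /* stands in for submitted_jobs_xa */
static int rpm_refs;                    /* stands in for the runtime-PM count */

static int job_submit(struct job *job)
{
	rpm_refs++;                     /* ivpu_rpm_get() now lives in submit */
	for (int i = 0; i < MAX_JOBS; i++) {
		if (!submitted[i]) {
			job->id = i;
			submitted[i] = job;   /* ownership moves to the table */
			return 0;
		}
	}
	rpm_refs--;                     /* error path undoes the PM reference */
	return -1;                      /* -EBUSY: too many active jobs */
}

static void job_signal_and_destroy(int id)
{
	struct job *job = submitted[id];

	if (!job)
		return;
	submitted[id] = NULL;           /* the table hands ownership back */
	free(job);                      /* ivpu_job_destroy() */
	rpm_refs--;                     /* ivpu_rpm_put() after destruction */
}

int main(void)
{
	struct job *job = calloc(1, sizeof(*job));

	if (job && job_submit(job) == 0)
		job_signal_and_destroy(job->id);
	else
		free(job);              /* submit failed: caller still owns it */
	printf("rpm_refs = %d (expect 0)\n", rpm_refs);
	return 0;
}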
index 45a2f2ec82e5ba69110d737e154c38bf6765174a..ca4984071cc76b17d858ae86929a48a9cea39c88 100644 (file)
@@ -43,7 +43,6 @@ struct ivpu_cmdq {
                          will update the job status
  */
 struct ivpu_job {
-       struct kref ref;
        struct ivpu_device *vdev;
        struct ivpu_file_priv *file_priv;
        struct dma_fence *done_fence;
@@ -56,7 +55,7 @@ struct ivpu_job {
 
 int ivpu_submit_ioctl(struct drm_device *dev, void *data, struct drm_file *file);
 
-void ivpu_cmdq_release_all(struct ivpu_file_priv *file_priv);
+void ivpu_cmdq_release_all_locked(struct ivpu_file_priv *file_priv);
 void ivpu_cmdq_reset_all_contexts(struct ivpu_device *vdev);
 
 void ivpu_job_done_consumer_init(struct ivpu_device *vdev);
index 2228c44b115fa0e4d48f36c115e2fdc7b434a8c0..91bd640655ab363b51df17a25cb9589293adc804 100644 (file)
@@ -7,6 +7,7 @@
 #include <linux/highmem.h>
 
 #include "ivpu_drv.h"
+#include "ivpu_hw.h"
 #include "ivpu_hw_reg_io.h"
 #include "ivpu_mmu.h"
 #include "ivpu_mmu_context.h"
 
 #define IVPU_MMU_Q_COUNT_LOG2          4 /* 16 entries */
 #define IVPU_MMU_Q_COUNT               ((u32)1 << IVPU_MMU_Q_COUNT_LOG2)
-#define IVPU_MMU_Q_WRAP_BIT            (IVPU_MMU_Q_COUNT << 1)
-#define IVPU_MMU_Q_WRAP_MASK           (IVPU_MMU_Q_WRAP_BIT - 1)
-#define IVPU_MMU_Q_IDX_MASK            (IVPU_MMU_Q_COUNT - 1)
+#define IVPU_MMU_Q_WRAP_MASK            GENMASK(IVPU_MMU_Q_COUNT_LOG2, 0)
+#define IVPU_MMU_Q_IDX_MASK             (IVPU_MMU_Q_COUNT - 1)
 #define IVPU_MMU_Q_IDX(val)            ((val) & IVPU_MMU_Q_IDX_MASK)
+#define IVPU_MMU_Q_WRP(val)             ((val) & IVPU_MMU_Q_COUNT)
 
 #define IVPU_MMU_CMDQ_CMD_SIZE         16
 #define IVPU_MMU_CMDQ_SIZE             (IVPU_MMU_Q_COUNT * IVPU_MMU_CMDQ_CMD_SIZE)
@@ -474,20 +475,32 @@ static int ivpu_mmu_cmdq_wait_for_cons(struct ivpu_device *vdev)
        return 0;
 }
 
+static bool ivpu_mmu_queue_is_full(struct ivpu_mmu_queue *q)
+{
+       return ((IVPU_MMU_Q_IDX(q->prod) == IVPU_MMU_Q_IDX(q->cons)) &&
+               (IVPU_MMU_Q_WRP(q->prod) != IVPU_MMU_Q_WRP(q->cons)));
+}
+
+static bool ivpu_mmu_queue_is_empty(struct ivpu_mmu_queue *q)
+{
+       return ((IVPU_MMU_Q_IDX(q->prod) == IVPU_MMU_Q_IDX(q->cons)) &&
+               (IVPU_MMU_Q_WRP(q->prod) == IVPU_MMU_Q_WRP(q->cons)));
+}
+
 static int ivpu_mmu_cmdq_cmd_write(struct ivpu_device *vdev, const char *name, u64 data0, u64 data1)
 {
-       struct ivpu_mmu_queue *q = &vdev->mmu->cmdq;
-       u64 *queue_buffer = q->base;
-       int idx = IVPU_MMU_Q_IDX(q->prod) * (IVPU_MMU_CMDQ_CMD_SIZE / sizeof(*queue_buffer));
+       struct ivpu_mmu_queue *cmdq = &vdev->mmu->cmdq;
+       u64 *queue_buffer = cmdq->base;
+       int idx = IVPU_MMU_Q_IDX(cmdq->prod) * (IVPU_MMU_CMDQ_CMD_SIZE / sizeof(*queue_buffer));
 
-       if (!CIRC_SPACE(IVPU_MMU_Q_IDX(q->prod), IVPU_MMU_Q_IDX(q->cons), IVPU_MMU_Q_COUNT)) {
+       if (ivpu_mmu_queue_is_full(cmdq)) {
                ivpu_err(vdev, "Failed to write MMU CMD %s\n", name);
                return -EBUSY;
        }
 
        queue_buffer[idx] = data0;
        queue_buffer[idx + 1] = data1;
-       q->prod = (q->prod + 1) & IVPU_MMU_Q_WRAP_MASK;
+       cmdq->prod = (cmdq->prod + 1) & IVPU_MMU_Q_WRAP_MASK;
 
        ivpu_dbg(vdev, MMU, "CMD write: %s data: 0x%llx 0x%llx\n", name, data0, data1);
 
@@ -518,6 +531,7 @@ static int ivpu_mmu_cmdq_sync(struct ivpu_device *vdev)
 
                ivpu_err(vdev, "Timed out waiting for MMU consumer: %d, error: %s\n", ret,
                         ivpu_mmu_cmdq_err_to_str(err));
+               ivpu_hw_diagnose_failure(vdev);
        }
 
        return ret;
@@ -558,7 +572,6 @@ static int ivpu_mmu_reset(struct ivpu_device *vdev)
        mmu->cmdq.cons = 0;
 
        memset(mmu->evtq.base, 0, IVPU_MMU_EVTQ_SIZE);
-       clflush_cache_range(mmu->evtq.base, IVPU_MMU_EVTQ_SIZE);
        mmu->evtq.prod = 0;
        mmu->evtq.cons = 0;
 
@@ -872,20 +885,15 @@ static u32 *ivpu_mmu_get_event(struct ivpu_device *vdev)
        u32 *evt = evtq->base + (idx * IVPU_MMU_EVTQ_CMD_SIZE);
 
        evtq->prod = REGV_RD32(IVPU_MMU_REG_EVTQ_PROD_SEC);
-       if (!CIRC_CNT(IVPU_MMU_Q_IDX(evtq->prod), IVPU_MMU_Q_IDX(evtq->cons), IVPU_MMU_Q_COUNT))
+       if (ivpu_mmu_queue_is_empty(evtq))
                return NULL;
 
-       clflush_cache_range(evt, IVPU_MMU_EVTQ_CMD_SIZE);
-
        evtq->cons = (evtq->cons + 1) & IVPU_MMU_Q_WRAP_MASK;
-       REGV_WR32(IVPU_MMU_REG_EVTQ_CONS_SEC, evtq->cons);
-
        return evt;
 }
 
 void ivpu_mmu_irq_evtq_handler(struct ivpu_device *vdev)
 {
-       bool schedule_recovery = false;
        u32 *event;
        u32 ssid;
 
@@ -895,14 +903,22 @@ void ivpu_mmu_irq_evtq_handler(struct ivpu_device *vdev)
                ivpu_mmu_dump_event(vdev, event);
 
                ssid = FIELD_GET(IVPU_MMU_EVT_SSID_MASK, event[0]);
-               if (ssid == IVPU_GLOBAL_CONTEXT_MMU_SSID)
-                       schedule_recovery = true;
-               else
-                       ivpu_mmu_user_context_mark_invalid(vdev, ssid);
+               if (ssid == IVPU_GLOBAL_CONTEXT_MMU_SSID) {
+                       ivpu_pm_trigger_recovery(vdev, "MMU event");
+                       return;
+               }
+
+               ivpu_mmu_user_context_mark_invalid(vdev, ssid);
+               REGV_WR32(IVPU_MMU_REG_EVTQ_CONS_SEC, vdev->mmu->evtq.cons);
        }
+}
 
-       if (schedule_recovery)
-               ivpu_pm_schedule_recovery(vdev);
+void ivpu_mmu_evtq_dump(struct ivpu_device *vdev)
+{
+       u32 *event;
+
+       while ((event = ivpu_mmu_get_event(vdev)) != NULL)
+               ivpu_mmu_dump_event(vdev, event);
 }
 
 void ivpu_mmu_irq_gerr_handler(struct ivpu_device *vdev)
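The CIRC_SPACE()/CIRC_CNT() tests above are replaced by explicit wrap-bit checks: prod and cons each carry Q_COUNT_LOG2 index bits plus one extra lap bit, so equal indices mean full when the lap bits differ and empty when they match. A standalone sketch of the same arithmetic (names shortened from the IVPU_MMU_Q_* macros):

#include <assert.h>
#include <stdio.h>

#define Q_COUNT_LOG2 4
#define Q_COUNT      (1u << Q_COUNT_LOG2)
#define Q_WRAP_MASK  ((Q_COUNT << 1) - 1)  /* index bits plus the lap bit */
#define Q_IDX(v)     ((v) & (Q_COUNT - 1))
#define Q_WRP(v)     ((v) & Q_COUNT)

static int queue_is_full(unsigned int prod, unsigned int cons)
{
	return Q_IDX(prod) == Q_IDX(cons) && Q_WRP(prod) != Q_WRP(cons);
}

static int queue_is_empty(unsigned int prod, unsigned int cons)
{
	return Q_IDX(prod) == Q_IDX(cons) && Q_WRP(prod) == Q_WRP(cons);
}

int main(void)
{
	unsigned int prod = 0, cons = 0;

	assert(queue_is_empty(prod, cons));
	for (unsigned int i = 0; i < Q_COUNT; i++)  /* produce one full lap */
		prod = (prod + 1) & Q_WRAP_MASK;
	assert(queue_is_full(prod, cons));  /* same index, different lap bit */
	cons = (cons + 1) & Q_WRAP_MASK;    /* consume one entry */
	assert(!queue_is_full(prod, cons));
	printf("wrap-bit full/empty checks passed\n");
	return 0;
}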
index cb551126806baa9bb47a967c7bff916b444c2427..6fa35c240710625670b6879098833c6cd680fb40 100644 (file)
@@ -46,5 +46,6 @@ int ivpu_mmu_invalidate_tlb(struct ivpu_device *vdev, u16 ssid);
 
 void ivpu_mmu_irq_evtq_handler(struct ivpu_device *vdev);
 void ivpu_mmu_irq_gerr_handler(struct ivpu_device *vdev);
+void ivpu_mmu_evtq_dump(struct ivpu_device *vdev);
 
 #endif /* __IVPU_MMU_H__ */
index 12a8c09d4547d7d9b81cd91d93307e59e648e14f..fe61612992364c65d184eef9c3a3ad3ebb60ce6a 100644 (file)
@@ -355,6 +355,9 @@ ivpu_mmu_context_map_sgt(struct ivpu_device *vdev, struct ivpu_mmu_context *ctx,
                dma_addr_t dma_addr = sg_dma_address(sg) - sg->offset;
                size_t size = sg_dma_len(sg) + sg->offset;
 
+               ivpu_dbg(vdev, MMU_MAP, "Map ctx: %u dma_addr: 0x%llx vpu_addr: 0x%llx size: %lu\n",
+                        ctx->id, dma_addr, vpu_addr, size);
+
                ret = ivpu_mmu_context_map_pages(vdev, ctx, vpu_addr, dma_addr, size, prot);
                if (ret) {
                        ivpu_err(vdev, "Failed to map context pages\n");
@@ -366,6 +369,7 @@ ivpu_mmu_context_map_sgt(struct ivpu_device *vdev, struct ivpu_mmu_context *ctx,
 
        /* Ensure page table modifications are flushed from wc buffers to memory */
        wmb();
+
        mutex_unlock(&ctx->lock);
 
        ret = ivpu_mmu_invalidate_tlb(vdev, ctx->id);
@@ -388,14 +392,19 @@ ivpu_mmu_context_unmap_sgt(struct ivpu_device *vdev, struct ivpu_mmu_context *ct
        mutex_lock(&ctx->lock);
 
        for_each_sgtable_dma_sg(sgt, sg, i) {
+               dma_addr_t dma_addr = sg_dma_address(sg) - sg->offset;
                size_t size = sg_dma_len(sg) + sg->offset;
 
+               ivpu_dbg(vdev, MMU_MAP, "Unmap ctx: %u dma_addr: 0x%llx vpu_addr: 0x%llx size: %lu\n",
+                        ctx->id, dma_addr, vpu_addr, size);
+
                ivpu_mmu_context_unmap_pages(ctx, vpu_addr, size);
                vpu_addr += size;
        }
 
        /* Ensure page table modifications are flushed from wc buffers to memory */
        wmb();
+
        mutex_unlock(&ctx->lock);
 
        ret = ivpu_mmu_invalidate_tlb(vdev, ctx->id);
index 0af8864cb3b55f636bc7418a03dc3c07360e7ac8..5f73854234ba93da22b00113376c296df1ebd35a 100644 (file)
@@ -13,6 +13,7 @@
 #include "ivpu_drv.h"
 #include "ivpu_hw.h"
 #include "ivpu_fw.h"
+#include "ivpu_fw_log.h"
 #include "ivpu_ipc.h"
 #include "ivpu_job.h"
 #include "ivpu_jsm_msg.h"
@@ -57,11 +58,14 @@ static int ivpu_suspend(struct ivpu_device *vdev)
 {
        int ret;
 
+       /* Save PCI state before powering down as it sometimes gets corrupted if NPU hangs */
+       pci_save_state(to_pci_dev(vdev->drm.dev));
+
        ret = ivpu_shutdown(vdev);
-       if (ret) {
+       if (ret)
                ivpu_err(vdev, "Failed to shutdown VPU: %d\n", ret);
-               return ret;
-       }
+
+       pci_set_power_state(to_pci_dev(vdev->drm.dev), PCI_D3hot);
 
        return ret;
 }
@@ -70,6 +74,9 @@ static int ivpu_resume(struct ivpu_device *vdev)
 {
        int ret;
 
+       pci_set_power_state(to_pci_dev(vdev->drm.dev), PCI_D0);
+       pci_restore_state(to_pci_dev(vdev->drm.dev));
+
 retry:
        ret = ivpu_hw_power_up(vdev);
        if (ret) {
@@ -111,22 +118,37 @@ static void ivpu_pm_recovery_work(struct work_struct *work)
        char *evt[2] = {"IVPU_PM_EVENT=IVPU_RECOVER", NULL};
        int ret;
 
-retry:
-       ret = pci_try_reset_function(to_pci_dev(vdev->drm.dev));
-       if (ret == -EAGAIN && !drm_dev_is_unplugged(&vdev->drm)) {
-               cond_resched();
-               goto retry;
-       }
+       ivpu_err(vdev, "Recovering the VPU (reset #%d)\n", atomic_read(&vdev->pm->reset_counter));
 
-       if (ret && ret != -EAGAIN)
-               ivpu_err(vdev, "Failed to reset VPU: %d\n", ret);
+       ret = pm_runtime_resume_and_get(vdev->drm.dev);
+       if (ret)
+               ivpu_err(vdev, "Failed to resume VPU: %d\n", ret);
+
+       ivpu_fw_log_dump(vdev);
+
+       atomic_inc(&vdev->pm->reset_counter);
+       atomic_set(&vdev->pm->reset_pending, 1);
+       down_write(&vdev->pm->reset_lock);
+
+       ivpu_suspend(vdev);
+       ivpu_pm_prepare_cold_boot(vdev);
+       ivpu_jobs_abort_all(vdev);
+
+       ret = ivpu_resume(vdev);
+       if (ret)
+               ivpu_err(vdev, "Failed to resume NPU: %d\n", ret);
+
+       up_write(&vdev->pm->reset_lock);
+       atomic_set(&vdev->pm->reset_pending, 0);
 
        kobject_uevent_env(&vdev->drm.dev->kobj, KOBJ_CHANGE, evt);
+       pm_runtime_mark_last_busy(vdev->drm.dev);
+       pm_runtime_put_autosuspend(vdev->drm.dev);
 }
 
-void ivpu_pm_schedule_recovery(struct ivpu_device *vdev)
+void ivpu_pm_trigger_recovery(struct ivpu_device *vdev, const char *reason)
 {
-       struct ivpu_pm_info *pm = vdev->pm;
+       ivpu_err(vdev, "Recovery triggered by %s\n", reason);
 
        if (ivpu_disable_recovery) {
                ivpu_err(vdev, "Recovery not available when disable_recovery param is set\n");
@@ -138,10 +160,11 @@ void ivpu_pm_schedule_recovery(struct ivpu_device *vdev)
                return;
        }
 
-       /* Schedule recovery if it's not in progress */
-       if (atomic_cmpxchg(&pm->in_reset, 0, 1) == 0) {
-               ivpu_hw_irq_disable(vdev);
-               queue_work(system_long_wq, &pm->recovery_work);
+       /* Trigger recovery if it's not in progress */
+       if (atomic_cmpxchg(&vdev->pm->reset_pending, 0, 1) == 0) {
+               ivpu_hw_diagnose_failure(vdev);
+               ivpu_hw_irq_disable(vdev); /* Disable IRQ early to protect from IRQ storm */
+               queue_work(system_long_wq, &vdev->pm->recovery_work);
        }
 }
 
@@ -149,12 +172,8 @@ static void ivpu_job_timeout_work(struct work_struct *work)
 {
        struct ivpu_pm_info *pm = container_of(work, struct ivpu_pm_info, job_timeout_work.work);
        struct ivpu_device *vdev = pm->vdev;
-       unsigned long timeout_ms = ivpu_tdr_timeout_ms ? ivpu_tdr_timeout_ms : vdev->timeout.tdr;
-
-       ivpu_err(vdev, "TDR detected, timeout %lu ms", timeout_ms);
-       ivpu_hw_diagnose_failure(vdev);
 
-       ivpu_pm_schedule_recovery(vdev);
+       ivpu_pm_trigger_recovery(vdev, "TDR");
 }
 
 void ivpu_start_job_timeout_detection(struct ivpu_device *vdev)
@@ -192,9 +211,6 @@ int ivpu_pm_suspend_cb(struct device *dev)
        ivpu_suspend(vdev);
        ivpu_pm_prepare_warm_boot(vdev);
 
-       pci_save_state(to_pci_dev(dev));
-       pci_set_power_state(to_pci_dev(dev), PCI_D3hot);
-
        ivpu_dbg(vdev, PM, "Suspend done.\n");
 
        return 0;
@@ -208,9 +224,6 @@ int ivpu_pm_resume_cb(struct device *dev)
 
        ivpu_dbg(vdev, PM, "Resume..\n");
 
-       pci_set_power_state(to_pci_dev(dev), PCI_D0);
-       pci_restore_state(to_pci_dev(dev));
-
        ret = ivpu_resume(vdev);
        if (ret)
                ivpu_err(vdev, "Failed to resume: %d\n", ret);
@@ -227,6 +240,9 @@ int ivpu_pm_runtime_suspend_cb(struct device *dev)
        bool hw_is_idle = true;
        int ret;
 
+       drm_WARN_ON(&vdev->drm, !xa_empty(&vdev->submitted_jobs_xa));
+       drm_WARN_ON(&vdev->drm, work_pending(&vdev->pm->recovery_work));
+
        ivpu_dbg(vdev, PM, "Runtime suspend..\n");
 
        if (!ivpu_hw_is_idle(vdev) && vdev->pm->suspend_reschedule_counter) {
@@ -247,7 +263,8 @@ int ivpu_pm_runtime_suspend_cb(struct device *dev)
                ivpu_err(vdev, "Failed to set suspend VPU: %d\n", ret);
 
        if (!hw_is_idle) {
-               ivpu_warn(vdev, "VPU failed to enter idle, force suspended.\n");
+               ivpu_err(vdev, "VPU failed to enter idle, force suspended.\n");
+               ivpu_fw_log_dump(vdev);
                ivpu_pm_prepare_cold_boot(vdev);
        } else {
                ivpu_pm_prepare_warm_boot(vdev);
@@ -308,11 +325,12 @@ void ivpu_pm_reset_prepare_cb(struct pci_dev *pdev)
 {
        struct ivpu_device *vdev = pci_get_drvdata(pdev);
 
-       pm_runtime_get_sync(vdev->drm.dev);
-
        ivpu_dbg(vdev, PM, "Pre-reset..\n");
        atomic_inc(&vdev->pm->reset_counter);
-       atomic_set(&vdev->pm->in_reset, 1);
+       atomic_set(&vdev->pm->reset_pending, 1);
+
+       pm_runtime_get_sync(vdev->drm.dev);
+       down_write(&vdev->pm->reset_lock);
        ivpu_prepare_for_reset(vdev);
        ivpu_hw_reset(vdev);
        ivpu_pm_prepare_cold_boot(vdev);
@@ -329,9 +347,11 @@ void ivpu_pm_reset_done_cb(struct pci_dev *pdev)
        ret = ivpu_resume(vdev);
        if (ret)
                ivpu_err(vdev, "Failed to set RESUME state: %d\n", ret);
-       atomic_set(&vdev->pm->in_reset, 0);
+       up_write(&vdev->pm->reset_lock);
+       atomic_set(&vdev->pm->reset_pending, 0);
        ivpu_dbg(vdev, PM, "Post-reset done.\n");
 
+       pm_runtime_mark_last_busy(vdev->drm.dev);
        pm_runtime_put_autosuspend(vdev->drm.dev);
 }
 
@@ -344,7 +364,10 @@ void ivpu_pm_init(struct ivpu_device *vdev)
        pm->vdev = vdev;
        pm->suspend_reschedule_counter = PM_RESCHEDULE_LIMIT;
 
-       atomic_set(&pm->in_reset, 0);
+       init_rwsem(&pm->reset_lock);
+       atomic_set(&pm->reset_pending, 0);
+       atomic_set(&pm->reset_counter, 0);
+
        INIT_WORK(&pm->recovery_work, ivpu_pm_recovery_work);
        INIT_DELAYED_WORK(&pm->job_timeout_work, ivpu_job_timeout_work);
 
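The in_reset flag becomes a read-write semaphore plus a reset_pending flag: job submission takes reset_lock for reading (ivpu_submit_ioctl() above wraps ivpu_job_submit() in down_read()/up_read()), while the recovery worker and the PCI reset callbacks take it for writing, so a reset can only start once every in-flight submission has drained. A minimal kernel-style sketch of that pattern (names are illustrative):

#include <linux/rwsem.h>

static DECLARE_RWSEM(reset_lock);    /* plays the role of pm->reset_lock */

static int submit_one_job(void)
{
	int ret = 0;

	down_read(&reset_lock);      /* many submitters may hold this at once */
	/* ... allocate an id and ring the doorbell ... */
	up_read(&reset_lock);
	return ret;
}

static void recover_device(void)
{
	down_write(&reset_lock);     /* waits until all readers have drained */
	/* ... suspend, prepare cold boot, abort jobs, resume ... */
	up_write(&reset_lock);
}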
index 97c6e0b0aa42d0a5a071940c5b54a052f99a748c..ec60fbeefefc65bbca4ed619d7265aabffd1bb61 100644 (file)
@@ -6,6 +6,7 @@
 #ifndef __IVPU_PM_H__
 #define __IVPU_PM_H__
 
+#include <linux/rwsem.h>
 #include <linux/types.h>
 
 struct ivpu_device;
@@ -14,8 +15,9 @@ struct ivpu_pm_info {
        struct ivpu_device *vdev;
        struct delayed_work job_timeout_work;
        struct work_struct recovery_work;
-       atomic_t in_reset;
+       struct rw_semaphore reset_lock;
        atomic_t reset_counter;
+       atomic_t reset_pending;
        bool is_warmboot;
        u32 suspend_reschedule_counter;
 };
@@ -37,7 +39,7 @@ int __must_check ivpu_rpm_get(struct ivpu_device *vdev);
 int __must_check ivpu_rpm_get_if_active(struct ivpu_device *vdev);
 void ivpu_rpm_put(struct ivpu_device *vdev);
 
-void ivpu_pm_schedule_recovery(struct ivpu_device *vdev);
+void ivpu_pm_trigger_recovery(struct ivpu_device *vdev, const char *reason);
 void ivpu_start_job_timeout_detection(struct ivpu_device *vdev);
 void ivpu_stop_job_timeout_detection(struct ivpu_device *vdev);
 
index 7b7c605166e0c1c7d2a4c9e1f1bce1f05799d4f6..ab2a82cb1b0b48ab21682bdb87c052707f19d282 100644 (file)
@@ -26,7 +26,6 @@
 #include <linux/interrupt.h>
 #include <linux/timer.h>
 #include <linux/cper.h>
-#include <linux/cxl-event.h>
 #include <linux/platform_device.h>
 #include <linux/mutex.h>
 #include <linux/ratelimit.h>
@@ -674,78 +673,6 @@ static void ghes_defer_non_standard_event(struct acpi_hest_generic_data *gdata,
        schedule_work(&entry->work);
 }
 
-/*
- * Only a single callback can be registered for CXL CPER events.
- */
-static DECLARE_RWSEM(cxl_cper_rw_sem);
-static cxl_cper_callback cper_callback;
-
-/* CXL Event record UUIDs are formatted as GUIDs and reported in section type */
-
-/*
- * General Media Event Record
- * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
- */
-#define CPER_SEC_CXL_GEN_MEDIA_GUID                                    \
-       GUID_INIT(0xfbcd0a77, 0xc260, 0x417f,                           \
-                 0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6)
-
-/*
- * DRAM Event Record
- * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
- */
-#define CPER_SEC_CXL_DRAM_GUID                                         \
-       GUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,                           \
-                 0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24)
-
-/*
- * Memory Module Event Record
- * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
- */
-#define CPER_SEC_CXL_MEM_MODULE_GUID                                   \
-       GUID_INIT(0xfe927475, 0xdd59, 0x4339,                           \
-                 0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74)
-
-static void cxl_cper_post_event(enum cxl_event_type event_type,
-                               struct cxl_cper_event_rec *rec)
-{
-       if (rec->hdr.length <= sizeof(rec->hdr) ||
-           rec->hdr.length > sizeof(*rec)) {
-               pr_err(FW_WARN "CXL CPER Invalid section length (%u)\n",
-                      rec->hdr.length);
-               return;
-       }
-
-       if (!(rec->hdr.validation_bits & CPER_CXL_COMP_EVENT_LOG_VALID)) {
-               pr_err(FW_WARN "CXL CPER invalid event\n");
-               return;
-       }
-
-       guard(rwsem_read)(&cxl_cper_rw_sem);
-       if (cper_callback)
-               cper_callback(event_type, rec);
-}
-
-int cxl_cper_register_callback(cxl_cper_callback callback)
-{
-       guard(rwsem_write)(&cxl_cper_rw_sem);
-       if (cper_callback)
-               return -EINVAL;
-       cper_callback = callback;
-       return 0;
-}
-EXPORT_SYMBOL_NS_GPL(cxl_cper_register_callback, CXL);
-
-int cxl_cper_unregister_callback(cxl_cper_callback callback)
-{
-       guard(rwsem_write)(&cxl_cper_rw_sem);
-       if (callback != cper_callback)
-               return -EINVAL;
-       cper_callback = NULL;
-       return 0;
-}
-EXPORT_SYMBOL_NS_GPL(cxl_cper_unregister_callback, CXL);
-
 static bool ghes_do_proc(struct ghes *ghes,
                         const struct acpi_hest_generic_status *estatus)
 {
@@ -780,22 +707,6 @@ static bool ghes_do_proc(struct ghes *ghes,
                }
                else if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) {
                        queued = ghes_handle_arm_hw_error(gdata, sev, sync);
-               } else if (guid_equal(sec_type, &CPER_SEC_CXL_GEN_MEDIA_GUID)) {
-                       struct cxl_cper_event_rec *rec =
-                               acpi_hest_get_payload(gdata);
-
-                       cxl_cper_post_event(CXL_CPER_EVENT_GEN_MEDIA, rec);
-               } else if (guid_equal(sec_type, &CPER_SEC_CXL_DRAM_GUID)) {
-                       struct cxl_cper_event_rec *rec =
-                               acpi_hest_get_payload(gdata);
-
-                       cxl_cper_post_event(CXL_CPER_EVENT_DRAM, rec);
-               } else if (guid_equal(sec_type,
-                                     &CPER_SEC_CXL_MEM_MODULE_GUID)) {
-                       struct cxl_cper_event_rec *rec =
-                               acpi_hest_get_payload(gdata);
-
-                       cxl_cper_post_event(CXL_CPER_EVENT_MEM_MODULE, rec);
                } else {
                        void *err = acpi_hest_get_payload(gdata);
 
index dbdee2924594a921f27fead574fcf1855c4e471b..02255795b800d1a42ceb7694216d2b6c92594b6b 100644 (file)
@@ -525,10 +525,12 @@ static void acpi_ec_clear(struct acpi_ec *ec)
 
 static void acpi_ec_enable_event(struct acpi_ec *ec)
 {
-       spin_lock(&ec->lock);
+       unsigned long flags;
+
+       spin_lock_irqsave(&ec->lock, flags);
        if (acpi_ec_started(ec))
                __acpi_ec_enable_event(ec);
-       spin_unlock(&ec->lock);
+       spin_unlock_irqrestore(&ec->lock, flags);
 
        /* Drain additional events if hardware requires that */
        if (EC_FLAGS_CLEAR_ON_RESUME)
@@ -544,9 +546,11 @@ static void __acpi_ec_flush_work(void)
 
 static void acpi_ec_disable_event(struct acpi_ec *ec)
 {
-       spin_lock(&ec->lock);
+       unsigned long flags;
+
+       spin_lock_irqsave(&ec->lock, flags);
        __acpi_ec_disable_event(ec);
-       spin_unlock(&ec->lock);
+       spin_unlock_irqrestore(&ec->lock, flags);
 
        /*
         * When ec_freeze_events is true, we need to flush events in
@@ -567,9 +571,10 @@ void acpi_ec_flush_work(void)
 
 static bool acpi_ec_guard_event(struct acpi_ec *ec)
 {
+       unsigned long flags;
        bool guarded;
 
-       spin_lock(&ec->lock);
+       spin_lock_irqsave(&ec->lock, flags);
        /*
         * If firmware SCI_EVT clearing timing is "event", we actually
         * don't know when the SCI_EVT will be cleared by firmware after
@@ -585,29 +590,31 @@ static bool acpi_ec_guard_event(struct acpi_ec *ec)
        guarded = ec_event_clearing == ACPI_EC_EVT_TIMING_EVENT &&
                ec->event_state != EC_EVENT_READY &&
                (!ec->curr || ec->curr->command != ACPI_EC_COMMAND_QUERY);
-       spin_unlock(&ec->lock);
+       spin_unlock_irqrestore(&ec->lock, flags);
        return guarded;
 }
 
 static int ec_transaction_polled(struct acpi_ec *ec)
 {
+       unsigned long flags;
        int ret = 0;
 
-       spin_lock(&ec->lock);
+       spin_lock_irqsave(&ec->lock, flags);
        if (ec->curr && (ec->curr->flags & ACPI_EC_COMMAND_POLL))
                ret = 1;
-       spin_unlock(&ec->lock);
+       spin_unlock_irqrestore(&ec->lock, flags);
        return ret;
 }
 
 static int ec_transaction_completed(struct acpi_ec *ec)
 {
+       unsigned long flags;
        int ret = 0;
 
-       spin_lock(&ec->lock);
+       spin_lock_irqsave(&ec->lock, flags);
        if (ec->curr && (ec->curr->flags & ACPI_EC_COMMAND_COMPLETE))
                ret = 1;
-       spin_unlock(&ec->lock);
+       spin_unlock_irqrestore(&ec->lock, flags);
        return ret;
 }
 
@@ -749,6 +756,7 @@ static int ec_guard(struct acpi_ec *ec)
 
 static int ec_poll(struct acpi_ec *ec)
 {
+       unsigned long flags;
        int repeat = 5; /* number of command restarts */
 
        while (repeat--) {
@@ -757,14 +765,14 @@ static int ec_poll(struct acpi_ec *ec)
                do {
                        if (!ec_guard(ec))
                                return 0;
-                       spin_lock(&ec->lock);
+                       spin_lock_irqsave(&ec->lock, flags);
                        advance_transaction(ec, false);
-                       spin_unlock(&ec->lock);
+                       spin_unlock_irqrestore(&ec->lock, flags);
                } while (time_before(jiffies, delay));
                pr_debug("controller reset, restart transaction\n");
-               spin_lock(&ec->lock);
+               spin_lock_irqsave(&ec->lock, flags);
                start_transaction(ec);
-               spin_unlock(&ec->lock);
+               spin_unlock_irqrestore(&ec->lock, flags);
        }
        return -ETIME;
 }
@@ -772,10 +780,11 @@ static int ec_poll(struct acpi_ec *ec)
 static int acpi_ec_transaction_unlocked(struct acpi_ec *ec,
                                        struct transaction *t)
 {
+       unsigned long tmp;
        int ret = 0;
 
        /* start transaction */
-       spin_lock(&ec->lock);
+       spin_lock_irqsave(&ec->lock, tmp);
        /* Enable GPE for command processing (IBF=0/OBF=1) */
        if (!acpi_ec_submit_flushable_request(ec)) {
                ret = -EINVAL;
@@ -786,11 +795,11 @@ static int acpi_ec_transaction_unlocked(struct acpi_ec *ec,
        ec->curr = t;
        ec_dbg_req("Command(%s) started", acpi_ec_cmd_string(t->command));
        start_transaction(ec);
-       spin_unlock(&ec->lock);
+       spin_unlock_irqrestore(&ec->lock, tmp);
 
        ret = ec_poll(ec);
 
-       spin_lock(&ec->lock);
+       spin_lock_irqsave(&ec->lock, tmp);
        if (t->irq_count == ec_storm_threshold)
                acpi_ec_unmask_events(ec);
        ec_dbg_req("Command(%s) stopped", acpi_ec_cmd_string(t->command));
@@ -799,7 +808,7 @@ static int acpi_ec_transaction_unlocked(struct acpi_ec *ec,
        acpi_ec_complete_request(ec);
        ec_dbg_ref(ec, "Decrease command");
 unlock:
-       spin_unlock(&ec->lock);
+       spin_unlock_irqrestore(&ec->lock, tmp);
        return ret;
 }
 
@@ -927,7 +936,9 @@ EXPORT_SYMBOL(ec_get_handle);
 
 static void acpi_ec_start(struct acpi_ec *ec, bool resuming)
 {
-       spin_lock(&ec->lock);
+       unsigned long flags;
+
+       spin_lock_irqsave(&ec->lock, flags);
        if (!test_and_set_bit(EC_FLAGS_STARTED, &ec->flags)) {
                ec_dbg_drv("Starting EC");
                /* Enable GPE for event processing (SCI_EVT=1) */
@@ -937,28 +948,31 @@ static void acpi_ec_start(struct acpi_ec *ec, bool resuming)
                }
                ec_log_drv("EC started");
        }
-       spin_unlock(&ec->lock);
+       spin_unlock_irqrestore(&ec->lock, flags);
 }
 
 static bool acpi_ec_stopped(struct acpi_ec *ec)
 {
+       unsigned long flags;
        bool flushed;
 
-       spin_lock(&ec->lock);
+       spin_lock_irqsave(&ec->lock, flags);
        flushed = acpi_ec_flushed(ec);
-       spin_unlock(&ec->lock);
+       spin_unlock_irqrestore(&ec->lock, flags);
        return flushed;
 }
 
 static void acpi_ec_stop(struct acpi_ec *ec, bool suspending)
 {
-       spin_lock(&ec->lock);
+       unsigned long flags;
+
+       spin_lock_irqsave(&ec->lock, flags);
        if (acpi_ec_started(ec)) {
                ec_dbg_drv("Stopping EC");
                set_bit(EC_FLAGS_STOPPED, &ec->flags);
-               spin_unlock(&ec->lock);
+               spin_unlock_irqrestore(&ec->lock, flags);
                wait_event(ec->wait, acpi_ec_stopped(ec));
-               spin_lock(&ec->lock);
+               spin_lock_irqsave(&ec->lock, flags);
                /* Disable GPE for event processing (SCI_EVT=1) */
                if (!suspending) {
                        acpi_ec_complete_request(ec);
@@ -969,25 +983,29 @@ static void acpi_ec_stop(struct acpi_ec *ec, bool suspending)
                clear_bit(EC_FLAGS_STOPPED, &ec->flags);
                ec_log_drv("EC stopped");
        }
-       spin_unlock(&ec->lock);
+       spin_unlock_irqrestore(&ec->lock, flags);
 }
 
 static void acpi_ec_enter_noirq(struct acpi_ec *ec)
 {
-       spin_lock(&ec->lock);
+       unsigned long flags;
+
+       spin_lock_irqsave(&ec->lock, flags);
        ec->busy_polling = true;
        ec->polling_guard = 0;
        ec_log_drv("interrupt blocked");
-       spin_unlock(&ec->lock);
+       spin_unlock_irqrestore(&ec->lock, flags);
 }
 
 static void acpi_ec_leave_noirq(struct acpi_ec *ec)
 {
-       spin_lock(&ec->lock);
+       unsigned long flags;
+
+       spin_lock_irqsave(&ec->lock, flags);
        ec->busy_polling = ec_busy_polling;
        ec->polling_guard = ec_polling_guard;
        ec_log_drv("interrupt unblocked");
-       spin_unlock(&ec->lock);
+       spin_unlock_irqrestore(&ec->lock, flags);
 }
 
 void acpi_ec_block_transactions(void)
@@ -1119,9 +1137,9 @@ static void acpi_ec_event_processor(struct work_struct *work)
 
        ec_dbg_evt("Query(0x%02x) stopped", handler->query_bit);
 
-       spin_lock(&ec->lock);
+       spin_lock_irq(&ec->lock);
        ec->queries_in_progress--;
-       spin_unlock(&ec->lock);
+       spin_unlock_irq(&ec->lock);
 
        acpi_ec_put_query_handler(handler);
        kfree(q);
@@ -1184,12 +1202,12 @@ static int acpi_ec_submit_query(struct acpi_ec *ec)
         */
        ec_dbg_evt("Query(0x%02x) scheduled", value);
 
-       spin_lock(&ec->lock);
+       spin_lock_irq(&ec->lock);
 
        ec->queries_in_progress++;
        queue_work(ec_query_wq, &q->work);
 
-       spin_unlock(&ec->lock);
+       spin_unlock_irq(&ec->lock);
 
        return 0;
 
@@ -1205,14 +1223,14 @@ static void acpi_ec_event_handler(struct work_struct *work)
 
        ec_dbg_evt("Event started");
 
-       spin_lock(&ec->lock);
+       spin_lock_irq(&ec->lock);
 
        while (ec->events_to_process) {
-               spin_unlock(&ec->lock);
+               spin_unlock_irq(&ec->lock);
 
                acpi_ec_submit_query(ec);
 
-               spin_lock(&ec->lock);
+               spin_lock_irq(&ec->lock);
 
                ec->events_to_process--;
        }
@@ -1229,11 +1247,11 @@ static void acpi_ec_event_handler(struct work_struct *work)
 
                ec_dbg_evt("Event stopped");
 
-               spin_unlock(&ec->lock);
+               spin_unlock_irq(&ec->lock);
 
                guard_timeout = !!ec_guard(ec);
 
-               spin_lock(&ec->lock);
+               spin_lock_irq(&ec->lock);
 
                /* Take care of SCI_EVT unless someone else is doing that. */
                if (guard_timeout && !ec->curr)
@@ -1246,7 +1264,7 @@ static void acpi_ec_event_handler(struct work_struct *work)
 
        ec->events_in_progress--;
 
-       spin_unlock(&ec->lock);
+       spin_unlock_irq(&ec->lock);
 }
 
 static void clear_gpe_and_advance_transaction(struct acpi_ec *ec, bool interrupt)
@@ -1271,11 +1289,13 @@ static void clear_gpe_and_advance_transaction(struct acpi_ec *ec, bool interrupt
 
 static void acpi_ec_handle_interrupt(struct acpi_ec *ec)
 {
-       spin_lock(&ec->lock);
+       unsigned long flags;
+
+       spin_lock_irqsave(&ec->lock, flags);
 
        clear_gpe_and_advance_transaction(ec, true);
 
-       spin_unlock(&ec->lock);
+       spin_unlock_irqrestore(&ec->lock, flags);
 }
 
 static u32 acpi_ec_gpe_handler(acpi_handle gpe_device,
@@ -2085,7 +2105,7 @@ bool acpi_ec_dispatch_gpe(void)
         * Dispatch the EC GPE in-band, but do not report wakeup in any case
         * to allow the caller to process events properly after that.
         */
-       spin_lock(&first_ec->lock);
+       spin_lock_irq(&first_ec->lock);
 
        if (acpi_ec_gpe_status_set(first_ec)) {
                pm_pr_dbg("ACPI EC GPE status set\n");
@@ -2094,7 +2114,7 @@ bool acpi_ec_dispatch_gpe(void)
                work_in_progress = acpi_ec_work_in_progress(first_ec);
        }
 
-       spin_unlock(&first_ec->lock);
+       spin_unlock_irq(&first_ec->lock);
 
        if (!work_in_progress)
                return false;
@@ -2107,11 +2127,11 @@ bool acpi_ec_dispatch_gpe(void)
 
                pm_pr_dbg("ACPI EC work flushed\n");
 
-               spin_lock(&first_ec->lock);
+               spin_lock_irq(&first_ec->lock);
 
                work_in_progress = acpi_ec_work_in_progress(first_ec);
 
-               spin_unlock(&first_ec->lock);
+               spin_unlock_irq(&first_ec->lock);
        } while (work_in_progress && !pm_wakeup_pending());
 
        return false;
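The EC conversion above restores the irqsave/irqrestore spinlock variants throughout. The usual motivation for this pattern: the same lock is also taken from the EC interrupt (GPE) handler, so a process-context holder must keep local interrupts disabled, or the handler could fire on the same CPU and spin on the lock forever. A minimal kernel-style sketch, with hypothetical my_ec_* names:

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(my_ec_lock);
static int my_ec_pending;

/* Process context: the interrupt handler may preempt us on this CPU, so
 * local interrupts must be off while the lock is held; irqsave/irqrestore
 * preserve whatever interrupt state the caller already had. */
static int my_ec_take_event(void)
{
	unsigned long flags;
	int pending;

	spin_lock_irqsave(&my_ec_lock, flags);
	pending = my_ec_pending;
	my_ec_pending = 0;
	spin_unlock_irqrestore(&my_ec_lock, flags);

	return pending;
}

/* Interrupt context: interrupts are already disabled here, so the plain
 * lock/unlock pair is sufficient. */
static void my_ec_irq_handler(void)
{
	spin_lock(&my_ec_lock);
	my_ec_pending = 1;
	spin_unlock(&my_ec_lock);
}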
index 8dd23b19e99731ce1ce5f5d6062366aece0dd415..eca24f41556df04ac61747e05aace9622fbcc580 100644 (file)
@@ -478,6 +478,16 @@ binder_enqueue_thread_work_ilocked(struct binder_thread *thread,
 {
        WARN_ON(!list_empty(&thread->waiting_thread_node));
        binder_enqueue_work_ilocked(work, &thread->todo);
+
+       /* (e)poll-based threads require an explicit wakeup signal when
+        * queuing their own work; they rely on these events to consume
+        * messages without blocking on I/O. Without it, threads risk waiting
+        * indefinitely without handling the work.
+        */
+       if (thread->looper & BINDER_LOOPER_STATE_POLL &&
+           thread->pid == current->pid && !thread->process_todo)
+               wake_up_interruptible_sync(&thread->wait);
+
        thread->process_todo = true;
 }
 
index 3a5f3255f51b39cc4a5b65554e7d55eed8ea2c57..682ff550ccfb98381515b4821594176f5561f869 100644 (file)
@@ -48,6 +48,7 @@ enum {
 enum board_ids {
        /* board IDs by feature in alphabetical order */
        board_ahci,
+       board_ahci_43bit_dma,
        board_ahci_ign_iferr,
        board_ahci_low_power,
        board_ahci_no_debounce_delay,
@@ -128,6 +129,13 @@ static const struct ata_port_info ahci_port_info[] = {
                .udma_mask      = ATA_UDMA6,
                .port_ops       = &ahci_ops,
        },
+       [board_ahci_43bit_dma] = {
+               AHCI_HFLAGS     (AHCI_HFLAG_43BIT_ONLY),
+               .flags          = AHCI_FLAG_COMMON,
+               .pio_mask       = ATA_PIO4,
+               .udma_mask      = ATA_UDMA6,
+               .port_ops       = &ahci_ops,
+       },
        [board_ahci_ign_iferr] = {
                AHCI_HFLAGS     (AHCI_HFLAG_IGN_IRQ_IF_ERR),
                .flags          = AHCI_FLAG_COMMON,
@@ -597,14 +605,14 @@ static const struct pci_device_id ahci_pci_tbl[] = {
        { PCI_VDEVICE(PROMISE, 0x3f20), board_ahci },   /* PDC42819 */
        { PCI_VDEVICE(PROMISE, 0x3781), board_ahci },   /* FastTrak TX8660 ahci-mode */
 
-       /* Asmedia */
-       { PCI_VDEVICE(ASMEDIA, 0x0601), board_ahci },   /* ASM1060 */
-       { PCI_VDEVICE(ASMEDIA, 0x0602), board_ahci },   /* ASM1060 */
-       { PCI_VDEVICE(ASMEDIA, 0x0611), board_ahci },   /* ASM1061 */
-       { PCI_VDEVICE(ASMEDIA, 0x0612), board_ahci },   /* ASM1062 */
-       { PCI_VDEVICE(ASMEDIA, 0x0621), board_ahci },   /* ASM1061R */
-       { PCI_VDEVICE(ASMEDIA, 0x0622), board_ahci },   /* ASM1062R */
-       { PCI_VDEVICE(ASMEDIA, 0x0624), board_ahci },   /* ASM1062+JMB575 */
+       /* ASMedia */
+       { PCI_VDEVICE(ASMEDIA, 0x0601), board_ahci_43bit_dma }, /* ASM1060 */
+       { PCI_VDEVICE(ASMEDIA, 0x0602), board_ahci_43bit_dma }, /* ASM1060 */
+       { PCI_VDEVICE(ASMEDIA, 0x0611), board_ahci_43bit_dma }, /* ASM1061 */
+       { PCI_VDEVICE(ASMEDIA, 0x0612), board_ahci_43bit_dma }, /* ASM1061/1062 */
+       { PCI_VDEVICE(ASMEDIA, 0x0621), board_ahci_43bit_dma }, /* ASM1061R */
+       { PCI_VDEVICE(ASMEDIA, 0x0622), board_ahci_43bit_dma }, /* ASM1062R */
+       { PCI_VDEVICE(ASMEDIA, 0x0624), board_ahci_43bit_dma }, /* ASM1062+JMB575 */
        { PCI_VDEVICE(ASMEDIA, 0x1062), board_ahci },   /* ASM1062A */
        { PCI_VDEVICE(ASMEDIA, 0x1064), board_ahci },   /* ASM1064 */
        { PCI_VDEVICE(ASMEDIA, 0x1164), board_ahci },   /* ASM1164 */
@@ -663,6 +671,19 @@ MODULE_PARM_DESC(mobile_lpm_policy, "Default LPM policy for mobile chipsets");
 static void ahci_pci_save_initial_config(struct pci_dev *pdev,
                                         struct ahci_host_priv *hpriv)
 {
+       if (pdev->vendor == PCI_VENDOR_ID_ASMEDIA) {
+               switch (pdev->device) {
+               case 0x1166:
+                       dev_info(&pdev->dev, "ASM1166 has only six ports\n");
+                       hpriv->saved_port_map = 0x3f;
+                       break;
+               case 0x1064:
+                       dev_info(&pdev->dev, "ASM1064 has only four ports\n");
+                       hpriv->saved_port_map = 0xf;
+                       break;
+               }
+       }
+
        if (pdev->vendor == PCI_VENDOR_ID_JMICRON && pdev->device == 0x2361) {
                dev_info(&pdev->dev, "JMB361 has only one port\n");
                hpriv->saved_port_map = 1;
@@ -949,11 +970,20 @@ static int ahci_pci_device_resume(struct device *dev)
 
 #endif /* CONFIG_PM */
 
-static int ahci_configure_dma_masks(struct pci_dev *pdev, int using_dac)
+static int ahci_configure_dma_masks(struct pci_dev *pdev,
+                                   struct ahci_host_priv *hpriv)
 {
-       const int dma_bits = using_dac ? 64 : 32;
+       int dma_bits;
        int rc;
 
+       if (hpriv->cap & HOST_CAP_64) {
+               dma_bits = 64;
+               if (hpriv->flags & AHCI_HFLAG_43BIT_ONLY)
+                       dma_bits = 43;
+       } else {
+               dma_bits = 32;
+       }
+
        /*
         * If the device fixup already set the dma_mask to some non-standard
         * value, don't extend it here. This happens on STA2X11, for example.
@@ -1926,7 +1956,7 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
        ahci_gtf_filter_workaround(host);
 
        /* initialize adapter */
-       rc = ahci_configure_dma_masks(pdev, hpriv->cap & HOST_CAP_64);
+       rc = ahci_configure_dma_masks(pdev, hpriv);
        if (rc)
                return rc;
 
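ahci_configure_dma_masks() now derives the addressing width from the controller itself: 64 bits when HOST_CAP_64 is set, capped to 43 bits when the new AHCI_HFLAG_43BIT_ONLY quirk is present, and 32 bits otherwise. A small standalone sketch of that selection (mirroring how the kernel's DMA_BIT_MASK() special-cases 64 to avoid an undefined 64-bit shift):

#include <stdint.h>
#include <stdio.h>

#define HOST_CAP_64       (1u << 31)  /* 64-bit addressing capability bit */
#define HFLAG_43BIT_ONLY  (1u << 29)  /* AHCI_HFLAG_43BIT_ONLY in ahci.h */

static uint64_t dma_mask_for(unsigned int cap, unsigned int hflags)
{
	int bits = 32;

	if (cap & HOST_CAP_64)
		bits = (hflags & HFLAG_43BIT_ONLY) ? 43 : 64;

	/* special-case 64 so we never shift by the full type width */
	return bits == 64 ? ~0ULL : (1ULL << bits) - 1;
}

int main(void)
{
	printf("43-bit mask: 0x%llx\n",
	       (unsigned long long)dma_mask_for(HOST_CAP_64, HFLAG_43BIT_ONLY));
	return 0;
}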
index 4bae95b06ae3c953de7a567c35f9b46dd9e3083f..df8f8a1a3a34c3ee26d0d2b899522a82d220b6c2 100644 (file)
@@ -247,6 +247,7 @@ enum {
        AHCI_HFLAG_SUSPEND_PHYS         = BIT(26), /* handle PHYs during
                                                      suspend/resume */
        AHCI_HFLAG_NO_SXS               = BIT(28), /* SXS not supported */
+       AHCI_HFLAG_43BIT_ONLY           = BIT(29), /* 43bit DMA addr limit */
 
        /* ap->flags bits */
 
index 64f7f7d6ba84e07c2f2db2fbbdfb3d315f821ec2..11a2c199a7c24628e858f2fc8e88e69a60c8b94b 100644 (file)
@@ -88,7 +88,6 @@ struct ceva_ahci_priv {
        u32 axicc;
        bool is_cci_enabled;
        int flags;
-       struct reset_control *rst;
 };
 
 static unsigned int ceva_ahci_read_id(struct ata_device *dev,
@@ -189,6 +188,60 @@ static const struct scsi_host_template ahci_platform_sht = {
        AHCI_SHT(DRV_NAME),
 };
 
+static int ceva_ahci_platform_enable_resources(struct ahci_host_priv *hpriv)
+{
+       int rc, i;
+
+       rc = ahci_platform_enable_regulators(hpriv);
+       if (rc)
+               return rc;
+
+       rc = ahci_platform_enable_clks(hpriv);
+       if (rc)
+               goto disable_regulator;
+
+       /* Assert the controller reset */
+       rc = ahci_platform_assert_rsts(hpriv);
+       if (rc)
+               goto disable_clks;
+
+       for (i = 0; i < hpriv->nports; i++) {
+               rc = phy_init(hpriv->phys[i]);
+               if (rc)
+                       goto disable_rsts;
+       }
+
+       /* De-assert the controller reset */
+       ahci_platform_deassert_rsts(hpriv);
+
+       for (i = 0; i < hpriv->nports; i++) {
+               rc = phy_power_on(hpriv->phys[i]);
+               if (rc) {
+                       phy_exit(hpriv->phys[i]);
+                       goto disable_phys;
+               }
+       }
+
+       return 0;
+
+disable_rsts:
+       ahci_platform_deassert_rsts(hpriv);
+
+disable_phys:
+       while (--i >= 0) {
+               phy_power_off(hpriv->phys[i]);
+               phy_exit(hpriv->phys[i]);
+       }
+
+disable_clks:
+       ahci_platform_disable_clks(hpriv);
+
+disable_regulator:
+       ahci_platform_disable_regulators(hpriv);
+
+       return rc;
+}
+
 static int ceva_ahci_probe(struct platform_device *pdev)
 {
        struct device_node *np = pdev->dev.of_node;
@@ -203,47 +256,19 @@ static int ceva_ahci_probe(struct platform_device *pdev)
                return -ENOMEM;
 
        cevapriv->ahci_pdev = pdev;
-
-       cevapriv->rst = devm_reset_control_get_optional_exclusive(&pdev->dev,
-                                                                 NULL);
-       if (IS_ERR(cevapriv->rst))
-               dev_err_probe(&pdev->dev, PTR_ERR(cevapriv->rst),
-                             "failed to get reset\n");
-
        hpriv = ahci_platform_get_resources(pdev, 0);
        if (IS_ERR(hpriv))
                return PTR_ERR(hpriv);
 
-       if (!cevapriv->rst) {
-               rc = ahci_platform_enable_resources(hpriv);
-               if (rc)
-                       return rc;
-       } else {
-               int i;
+       hpriv->rsts = devm_reset_control_get_optional_exclusive(&pdev->dev,
+                                                               NULL);
+       if (IS_ERR(hpriv->rsts))
+               return dev_err_probe(&pdev->dev, PTR_ERR(hpriv->rsts),
+                                    "failed to get reset\n");
 
-               rc = ahci_platform_enable_clks(hpriv);
-               if (rc)
-                       return rc;
-               /* Assert the controller reset */
-               reset_control_assert(cevapriv->rst);
-
-               for (i = 0; i < hpriv->nports; i++) {
-                       rc = phy_init(hpriv->phys[i]);
-                       if (rc)
-                               return rc;
-               }
-
-               /* De-assert the controller reset */
-               reset_control_deassert(cevapriv->rst);
-
-               for (i = 0; i < hpriv->nports; i++) {
-                       rc = phy_power_on(hpriv->phys[i]);
-                       if (rc) {
-                               phy_exit(hpriv->phys[i]);
-                               return rc;
-                       }
-               }
-       }
+       rc = ceva_ahci_platform_enable_resources(hpriv);
+       if (rc)
+               return rc;
 
        if (of_property_read_bool(np, "ceva,broken-gen2"))
                cevapriv->flags = CEVA_FLAG_BROKEN_GEN2;
@@ -252,52 +277,60 @@ static int ceva_ahci_probe(struct platform_device *pdev)
        if (of_property_read_u8_array(np, "ceva,p0-cominit-params",
                                        (u8 *)&cevapriv->pp2c[0], 4) < 0) {
                dev_warn(dev, "ceva,p0-cominit-params property not defined\n");
-               return -EINVAL;
+               rc = -EINVAL;
+               goto disable_resources;
        }
 
        if (of_property_read_u8_array(np, "ceva,p1-cominit-params",
                                        (u8 *)&cevapriv->pp2c[1], 4) < 0) {
                dev_warn(dev, "ceva,p1-cominit-params property not defined\n");
-               return -EINVAL;
+               rc = -EINVAL;
+               goto disable_resources;
        }
 
        /* Read OOB timing value for COMWAKE from device-tree */
        if (of_property_read_u8_array(np, "ceva,p0-comwake-params",
                                        (u8 *)&cevapriv->pp3c[0], 4) < 0) {
                dev_warn(dev, "ceva,p0-comwake-params property not defined\n");
-               return -EINVAL;
+               rc = -EINVAL;
+               goto disable_resources;
        }
 
        if (of_property_read_u8_array(np, "ceva,p1-comwake-params",
                                        (u8 *)&cevapriv->pp3c[1], 4) < 0) {
                dev_warn(dev, "ceva,p1-comwake-params property not defined\n");
-               return -EINVAL;
+               rc = -EINVAL;
+               goto disable_resources;
        }
 
        /* Read phy BURST timing value from device-tree */
        if (of_property_read_u8_array(np, "ceva,p0-burst-params",
                                        (u8 *)&cevapriv->pp4c[0], 4) < 0) {
                dev_warn(dev, "ceva,p0-burst-params property not defined\n");
-               return -EINVAL;
+               rc = -EINVAL;
+               goto disable_resources;
        }
 
        if (of_property_read_u8_array(np, "ceva,p1-burst-params",
                                        (u8 *)&cevapriv->pp4c[1], 4) < 0) {
                dev_warn(dev, "ceva,p1-burst-params property not defined\n");
-               return -EINVAL;
+               rc = -EINVAL;
+               goto disable_resources;
        }
 
        /* Read phy RETRY interval timing value from device-tree */
        if (of_property_read_u16_array(np, "ceva,p0-retry-params",
                                        (u16 *)&cevapriv->pp5c[0], 2) < 0) {
                dev_warn(dev, "ceva,p0-retry-params property not defined\n");
-               return -EINVAL;
+               rc = -EINVAL;
+               goto disable_resources;
        }
 
        if (of_property_read_u16_array(np, "ceva,p1-retry-params",
                                        (u16 *)&cevapriv->pp5c[1], 2) < 0) {
                dev_warn(dev, "ceva,p1-retry-params property not defined\n");
-               return -EINVAL;
+               rc = -EINVAL;
+               goto disable_resources;
        }
 
        /*
@@ -335,7 +368,7 @@ static int __maybe_unused ceva_ahci_resume(struct device *dev)
        struct ahci_host_priv *hpriv = host->private_data;
        int rc;
 
-       rc = ahci_platform_enable_resources(hpriv);
+       rc = ceva_ahci_platform_enable_resources(hpriv);
        if (rc)
                return rc;
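
The unwind labels in ceva_ahci_platform_enable_resources() follow the
usual reverse-order teardown idiom: on a failure at index i, only the
instances that were successfully set up are torn down. Reduced to its
shape, with hypothetical init_one()/teardown_one() helpers:

	for (i = 0; i < n; i++) {
		rc = init_one(i);
		if (rc)
			goto unwind;
	}
	return 0;

unwind:
	while (--i >= 0)		/* skips the entry that failed */
		teardown_one(i);
	return rc;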
 
index 09ed67772fae492323361ab7e94f8a8d4345d2e8..be3412cdb22e78a1d663337698f07b07c66727e4 100644 (file)
@@ -2001,6 +2001,33 @@ bool ata_dev_power_init_tf(struct ata_device *dev, struct ata_taskfile *tf,
        return true;
 }
 
+static bool ata_dev_power_is_active(struct ata_device *dev)
+{
+       struct ata_taskfile tf;
+       unsigned int err_mask;
+
+       ata_tf_init(dev, &tf);
+       tf.flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR;
+       tf.protocol = ATA_PROT_NODATA;
+       tf.command = ATA_CMD_CHK_POWER;
+
+       err_mask = ata_exec_internal(dev, &tf, NULL, DMA_NONE, NULL, 0, 0);
+       if (err_mask) {
+               ata_dev_err(dev, "Check power mode failed (err_mask=0x%x)\n",
+                           err_mask);
+               /*
+                * Assume we are in standby mode so that we always force a
+                * spinup in ata_dev_power_set_active().
+                */
+               return false;
+       }
+
+       ata_dev_dbg(dev, "Power mode: 0x%02x\n", tf.nsect);
+
+       /* Active or idle */
+       return tf.nsect == 0xff;
+}
+
 /**
  *     ata_dev_power_set_standby - Set a device power mode to standby
  *     @dev: target device
@@ -2017,6 +2044,11 @@ void ata_dev_power_set_standby(struct ata_device *dev)
        struct ata_taskfile tf;
        unsigned int err_mask;
 
+       /* If the device is already sleeping or in standby, do nothing. */
+       if ((dev->flags & ATA_DFLAG_SLEEPING) ||
+           !ata_dev_power_is_active(dev))
+               return;
+
        /*
         * Some odd clown BIOSes issue spindown on power off (ACPI S4 or S5)
         * causing some drives to spin up and down again. For these, do nothing
@@ -2042,33 +2074,6 @@ void ata_dev_power_set_standby(struct ata_device *dev)
                            err_mask);
 }
 
-static bool ata_dev_power_is_active(struct ata_device *dev)
-{
-       struct ata_taskfile tf;
-       unsigned int err_mask;
-
-       ata_tf_init(dev, &tf);
-       tf.flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR;
-       tf.protocol = ATA_PROT_NODATA;
-       tf.command = ATA_CMD_CHK_POWER;
-
-       err_mask = ata_exec_internal(dev, &tf, NULL, DMA_NONE, NULL, 0, 0);
-       if (err_mask) {
-               ata_dev_err(dev, "Check power mode failed (err_mask=0x%x)\n",
-                           err_mask);
-               /*
-                * Assume we are in standby mode so that we always force a
-                * spinup in ata_dev_power_set_active().
-                */
-               return false;
-       }
-
-       ata_dev_dbg(dev, "Power mode: 0x%02x\n", tf.nsect);
-
-       /* Active or idle */
-       return tf.nsect == 0xff;
-}
-
 /**
  *     ata_dev_power_set_active -  Set a device power mode to active
  *     @dev: target device
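
ata_dev_power_is_active() relies on CHECK POWER MODE reporting the
current power condition in the sector count field. A hedged summary of
the ACS-defined values the helper cares about:

	/*
	 * tf.nsect after CHECK POWER MODE (per ACS):
	 *   0x00  device is in Standby mode
	 *   0x80  device is in Idle mode
	 *   0xff  device is in Active mode or Idle mode
	 * Only 0xff means "definitely spun up", hence the comparison.
	 */
	return tf.nsect == 0xff;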
index b6656c287175c7653324758ccb5422dc36168e3d..0fb1934875f2084a753216cf54ff443aa601361b 100644 (file)
@@ -784,7 +784,7 @@ bool sata_lpm_ignore_phy_events(struct ata_link *link)
 EXPORT_SYMBOL_GPL(sata_lpm_ignore_phy_events);
 
 static const char *ata_lpm_policy_names[] = {
-       [ATA_LPM_UNKNOWN]               = "max_performance",
+       [ATA_LPM_UNKNOWN]               = "keep_firmware_settings",
        [ATA_LPM_MAX_POWER]             = "max_performance",
        [ATA_LPM_MED_POWER]             = "medium_power",
        [ATA_LPM_MED_POWER_WITH_DIPM]   = "med_power_with_dipm",
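
The rename only changes what sysfs reports for ATA_LPM_UNKNOWN: with the
duplicated "max_performance" string, the policy file read back as if a
policy had been applied even though the firmware settings were left
untouched. The show path is essentially a table lookup, roughly
(illustrative, not the exact handler):

	return sysfs_emit(buf, "%s\n",
			  ata_lpm_policy_names[ap->target_lpm_policy]);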
index e327a0229dc173442b2789a402a8ea0adb931cdd..e7f713cd70d3fd7a413c2568b8dca8f8cc8ba2c4 100644 (file)
@@ -2930,6 +2930,8 @@ open_card_ubr0(struct idt77252_dev *card)
        vc->scq = alloc_scq(card, vc->class);
        if (!vc->scq) {
                printk("%s: can't get SCQ.\n", card->name);
+               kfree(card->vcs[0]);
+               card->vcs[0] = NULL;
                return -ENOMEM;
        }
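
The added kfree() pairs with the vc allocation earlier in
open_card_ubr0(); previously the early return leaked the just-allocated
vc and left card->vcs[0] pointing at a half-initialized entry. The
shape of the fix, with the surrounding details elided:

	vc = kzalloc(sizeof(*vc), GFP_KERNEL);
	card->vcs[0] = vc;
	/* ... */
	vc->scq = alloc_scq(card, vc->class);
	if (!vc->scq) {
		kfree(card->vcs[0]);	/* undo the partial setup */
		card->vcs[0] = NULL;	/* no dangling pointer at teardown */
		return -ENOMEM;
	}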
 
index 018ac202de345e9a97bc7198385c1d95d460eb28..024b78a0cfc11bbba2f0bf3c32f21d55aa101d3d 100644 (file)
@@ -431,9 +431,6 @@ init_cpu_capacity_callback(struct notifier_block *nb,
        struct cpufreq_policy *policy = data;
        int cpu;
 
-       if (!raw_capacity)
-               return 0;
-
        if (val != CPUFREQ_CREATE_POLICY)
                return 0;
 
@@ -450,9 +447,11 @@ init_cpu_capacity_callback(struct notifier_block *nb,
        }
 
        if (cpumask_empty(cpus_to_visit)) {
-               topology_normalize_cpu_scale();
-               schedule_work(&update_topology_flags_work);
-               free_raw_capacity();
+               if (raw_capacity) {
+                       topology_normalize_cpu_scale();
+                       schedule_work(&update_topology_flags_work);
+                       free_raw_capacity();
+               }
                pr_debug("cpu_capacity: parsing done\n");
                schedule_work(&parsing_done_work);
        }
@@ -472,7 +471,7 @@ static int __init register_cpufreq_notifier(void)
         * On ACPI-based systems skip registering cpufreq notifier as cpufreq
         * information is not needed for cpu capacity initialization.
         */
-       if (!acpi_disabled || !raw_capacity)
+       if (!acpi_disabled)
                return -EINVAL;
 
        if (!alloc_cpumask_var(&cpus_to_visit, GFP_KERNEL))
index eb4c0ace924201dfbdef7dc410c344640333f04b..0738ccad08b2e03c31696b9208200909b9de0171 100644 (file)
@@ -207,7 +207,7 @@ static inline int devtmpfs_init(void) { return 0; }
 #endif
 
 #ifdef CONFIG_BLOCK
-extern struct class block_class;
+extern const struct class block_class;
 static inline bool is_blockdev(struct device *dev)
 {
        return dev->class == &block_class;
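
Constifying block_class matches the driver core, which has accepted
const class pointers since the 6.4 constification work. A minimal
sketch of declaring and registering a const class (names here are
illustrative):

	static const struct class example_class = {
		.name = "example",
	};

	static int __init example_init(void)
	{
		/* class_register() accepts a const pointer */
		return class_register(&example_class);
	}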
index 14d46af40f9a15e185230eecf3bbac6ec94728ef..9828da9b933cb7511756d15ec8be2ebbd14f9e44 100644 (file)
@@ -125,7 +125,7 @@ static void __fwnode_link_del(struct fwnode_link *link)
  */
 static void __fwnode_link_cycle(struct fwnode_link *link)
 {
-       pr_debug("%pfwf: Relaxing link with %pfwf\n",
+       pr_debug("%pfwf: cycle: depends on %pfwf\n",
                 link->consumer, link->supplier);
        link->flags |= FWLINK_FLAG_CYCLE;
 }
@@ -284,10 +284,12 @@ static bool device_is_ancestor(struct device *dev, struct device *target)
        return false;
 }
 
+#define DL_MARKER_FLAGS                (DL_FLAG_INFERRED | \
+                                DL_FLAG_CYCLE | \
+                                DL_FLAG_MANAGED)
 static inline bool device_link_flag_is_sync_state_only(u32 flags)
 {
-       return (flags & ~(DL_FLAG_INFERRED | DL_FLAG_CYCLE)) ==
-               (DL_FLAG_SYNC_STATE_ONLY | DL_FLAG_MANAGED);
+       return (flags & ~DL_MARKER_FLAGS) == DL_FLAG_SYNC_STATE_ONLY;
 }
 
 /**
@@ -1943,6 +1945,7 @@ static bool __fw_devlink_relax_cycles(struct device *con,
 
        /* Termination condition. */
        if (sup_dev == con) {
+               pr_debug("----- cycle: start -----\n");
                ret = true;
                goto out;
        }
@@ -1974,8 +1977,11 @@ static bool __fw_devlink_relax_cycles(struct device *con,
        else
                par_dev = fwnode_get_next_parent_dev(sup_handle);
 
-       if (par_dev && __fw_devlink_relax_cycles(con, par_dev->fwnode))
+       if (par_dev && __fw_devlink_relax_cycles(con, par_dev->fwnode)) {
+               pr_debug("%pfwf: cycle: child of %pfwf\n", sup_handle,
+                        par_dev->fwnode);
                ret = true;
+       }
 
        if (!sup_dev)
                goto out;
@@ -1991,6 +1997,8 @@ static bool __fw_devlink_relax_cycles(struct device *con,
 
                if (__fw_devlink_relax_cycles(con,
                                              dev_link->supplier->fwnode)) {
+                       pr_debug("%pfwf: cycle: depends on %pfwf\n", sup_handle,
+                                dev_link->supplier->fwnode);
                        fw_devlink_relax_link(dev_link);
                        dev_link->flags |= DL_FLAG_CYCLE;
                        ret = true;
@@ -2058,13 +2066,19 @@ static int fw_devlink_create_devlink(struct device *con,
 
        /*
         * SYNC_STATE_ONLY device links don't block probing and support cycles.
-        * So cycle detection isn't necessary and shouldn't be done.
+        * So, one might expect that cycle detection isn't necessary for them.
+        * However, if the device link was marked as SYNC_STATE_ONLY because
+        * it's part of a cycle, then we still need to do cycle detection. This
+        * is because the consumer and supplier might be part of multiple cycles
+        * and we need to detect all those cycles.
         */
-       if (!(flags & DL_FLAG_SYNC_STATE_ONLY)) {
+       if (!device_link_flag_is_sync_state_only(flags) ||
+           flags & DL_FLAG_CYCLE) {
                device_links_write_lock();
                if (__fw_devlink_relax_cycles(con, sup_handle)) {
                        __fwnode_link_cycle(link);
                        flags = fw_devlink_get_flags(link->flags);
+                       pr_debug("----- cycle: end -----\n");
                        dev_info(con, "Fixed dependency cycle(s) with %pfwf\n",
                                 sup_handle);
                }
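
With DL_FLAG_MANAGED folded into DL_MARKER_FLAGS, all three marker bits
are now masked off uniformly before the comparison. An illustrative
evaluation of the predicate:

	u32 flags = DL_FLAG_SYNC_STATE_ONLY | DL_FLAG_MANAGED |
		    DL_FLAG_CYCLE;

	/* marker bits are ignored; only SYNC_STATE_ONLY must remain */
	bool r = (flags & ~DL_MARKER_FLAGS) == DL_FLAG_SYNC_STATE_ONLY;
	/* r == true */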
index 026bdcb45127f530093cb4041f734d222e2fb005..0d957c5f1bcc987a585abaad9ed53623c33b4189 100644 (file)
@@ -9,6 +9,23 @@
 
 #define BLOCK_TEST_SIZE 12
 
+static void get_changed_bytes(void *orig, void *new, size_t size)
+{
+       char *o = orig;
+       char *n = new;
+       int i;
+
+       get_random_bytes(new, size);
+
+       /*
+        * This could be nicer and more efficient, but this is test-only
+        * code, so we need not care.
+        */
+       for (i = 0; i < size; i++)
+               while (n[i] == o[i])
+                       get_random_bytes(&n[i], 1);
+}
+
 static const struct regmap_config test_regmap_config = {
        .max_register = BLOCK_TEST_SIZE,
        .reg_stride = 1,
@@ -1202,7 +1219,8 @@ static void raw_noinc_write(struct kunit *test)
        struct regmap *map;
        struct regmap_config config;
        struct regmap_ram_data *data;
-       unsigned int val, val_test, val_last;
+       unsigned int val;
+       u16 val_test, val_last;
        u16 val_array[BLOCK_TEST_SIZE];
 
        config = raw_regmap_config;
@@ -1251,7 +1269,7 @@ static void raw_sync(struct kunit *test)
        struct regmap *map;
        struct regmap_config config;
        struct regmap_ram_data *data;
-       u16 val[2];
+       u16 val[3];
        u16 *hw_buf;
        unsigned int rval;
        int i;
@@ -1265,17 +1283,13 @@ static void raw_sync(struct kunit *test)
 
        hw_buf = (u16 *)data->vals;
 
-       get_random_bytes(&val, sizeof(val));
+       get_changed_bytes(&hw_buf[2], &val[0], sizeof(val));
 
        /* Do a regular write and a raw write in cache only mode */
        regcache_cache_only(map, true);
-       KUNIT_EXPECT_EQ(test, 0, regmap_raw_write(map, 2, val, sizeof(val)));
-       if (config.val_format_endian == REGMAP_ENDIAN_BIG)
-               KUNIT_EXPECT_EQ(test, 0, regmap_write(map, 6,
-                                                     be16_to_cpu(val[0])));
-       else
-               KUNIT_EXPECT_EQ(test, 0, regmap_write(map, 6,
-                                                     le16_to_cpu(val[0])));
+       KUNIT_EXPECT_EQ(test, 0, regmap_raw_write(map, 2, val,
+                                                 sizeof(u16) * 2));
+       KUNIT_EXPECT_EQ(test, 0, regmap_write(map, 4, val[2]));
 
        /* We should read back the new values, and defaults for the rest */
        for (i = 0; i < config.max_register + 1; i++) {
@@ -1284,24 +1298,34 @@ static void raw_sync(struct kunit *test)
                switch (i) {
                case 2:
                case 3:
-               case 6:
                        if (config.val_format_endian == REGMAP_ENDIAN_BIG) {
                                KUNIT_EXPECT_EQ(test, rval,
-                                               be16_to_cpu(val[i % 2]));
+                                               be16_to_cpu(val[i - 2]));
                        } else {
                                KUNIT_EXPECT_EQ(test, rval,
-                                               le16_to_cpu(val[i % 2]));
+                                               le16_to_cpu(val[i - 2]));
                        }
                        break;
+               case 4:
+                       KUNIT_EXPECT_EQ(test, rval, val[i - 2]);
+                       break;
                default:
                        KUNIT_EXPECT_EQ(test, config.reg_defaults[i].def, rval);
                        break;
                }
        }
+
+       /*
+        * The value written via _write() was translated by the core,
+        * translate the original copy for comparison purposes.
+        */
+       if (config.val_format_endian == REGMAP_ENDIAN_BIG)
+               val[2] = cpu_to_be16(val[2]);
+       else
+               val[2] = cpu_to_le16(val[2]);
        
        /* The values should not appear in the "hardware" */
-       KUNIT_EXPECT_MEMNEQ(test, &hw_buf[2], val, sizeof(val));
-       KUNIT_EXPECT_MEMNEQ(test, &hw_buf[6], val, sizeof(u16));
+       KUNIT_EXPECT_MEMNEQ(test, &hw_buf[2], &val[0], sizeof(val));
 
        for (i = 0; i < config.max_register + 1; i++)
                data->written[i] = false;
@@ -1312,8 +1336,7 @@ static void raw_sync(struct kunit *test)
        KUNIT_EXPECT_EQ(test, 0, regcache_sync(map));
 
        /* The values should now appear in the "hardware" */
-       KUNIT_EXPECT_MEMEQ(test, &hw_buf[2], val, sizeof(val));
-       KUNIT_EXPECT_MEMEQ(test, &hw_buf[6], val, sizeof(u16));
+       KUNIT_EXPECT_MEMEQ(test, &hw_buf[2], &val[0], sizeof(val));
 
        regmap_exit(map);
 }
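
get_changed_bytes() exists because plain get_random_bytes() can, with
low probability, reproduce a byte already present in the hardware
buffer, which would make the KUNIT_EXPECT_MEMNEQ() assertions flake. A
sketch of the guarantee it provides (cur and fresh are illustrative
locals):

	u16 cur[2], fresh[2];

	memcpy(cur, hw_buf, sizeof(cur));
	get_changed_bytes(cur, fresh, sizeof(fresh));
	/* every byte of fresh now differs from cur, so this cannot flake */
	KUNIT_EXPECT_MEMNEQ(test, cur, fresh, sizeof(fresh));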
index 2b98114a9fe0926d8b79021ff60e5d92dd2b8e37..a25414228e47410fe7756e5cc434fe2b0f53115d 100644 (file)
@@ -1779,7 +1779,7 @@ static int fd_alloc_disk(int drive, int system)
        struct gendisk *disk;
        int err;
 
-       disk = blk_mq_alloc_disk(&unit[drive].tag_set, NULL);
+       disk = blk_mq_alloc_disk(&unit[drive].tag_set, NULL, NULL);
        if (IS_ERR(disk))
                return PTR_ERR(disk);
 
index d2dbf8aaccb5b1320909964b3092ea47df90cc4a..b6dac8cee70fe13a49a7699727e47f13a27be91c 100644 (file)
@@ -24,8 +24,8 @@ static DEFINE_MUTEX(aoeblk_mutex);
 static struct kmem_cache *buf_pool_cache;
 static struct dentry *aoe_debugfs_dir;
 
-/* GPFS needs a larger value than the default. */
-static int aoe_maxsectors;
+/* random default picked from the historic block max_sectors cap */
+static int aoe_maxsectors = 2560;
 module_param(aoe_maxsectors, int, 0644);
 MODULE_PARM_DESC(aoe_maxsectors,
        "When nonzero, set the maximum number of sectors per I/O request");
@@ -333,6 +333,11 @@ aoeblk_gdalloc(void *vp)
        struct gendisk *gd;
        mempool_t *mp;
        struct blk_mq_tag_set *set;
+       sector_t ssize;
+       struct queue_limits lim = {
+               .max_hw_sectors         = aoe_maxsectors,
+               .io_opt                 = SZ_2M,
+       };
        ulong flags;
        int late = 0;
        int err;
@@ -370,7 +375,7 @@ aoeblk_gdalloc(void *vp)
                goto err_mempool;
        }
 
-       gd = blk_mq_alloc_disk(set, d);
+       gd = blk_mq_alloc_disk(set, &lim, d);
        if (IS_ERR(gd)) {
                pr_err("aoe: cannot allocate block queue for %ld.%d\n",
                        d->aoemajor, d->aoeminor);
@@ -383,20 +388,15 @@ aoeblk_gdalloc(void *vp)
        WARN_ON(d->flags & DEVFL_TKILL);
        WARN_ON(d->gd);
        WARN_ON(d->flags & DEVFL_UP);
-       /* random number picked from the history block max_sectors cap */
-       blk_queue_max_hw_sectors(gd->queue, 2560u);
-       blk_queue_io_opt(gd->queue, SZ_2M);
        d->bufpool = mp;
        d->blkq = gd->queue;
        d->gd = gd;
-       if (aoe_maxsectors)
-               blk_queue_max_hw_sectors(gd->queue, aoe_maxsectors);
        gd->major = AOE_MAJOR;
        gd->first_minor = d->sysminor;
        gd->minors = AOE_PARTITIONS;
        gd->fops = &aoe_bdops;
        gd->private_data = d;
-       set_capacity(gd, d->ssize);
+       ssize = d->ssize;
        snprintf(gd->disk_name, sizeof gd->disk_name, "etherd/e%ld.%d",
                d->aoemajor, d->aoeminor);
 
@@ -405,6 +405,8 @@ aoeblk_gdalloc(void *vp)
 
        spin_unlock_irqrestore(&d->lock, flags);
 
+       set_capacity(gd, ssize);
+
        err = device_add_disk(NULL, gd, aoe_attr_groups);
        if (err)
                goto out_disk_cleanup;
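
This is the recurring conversion in this series: rather than allocating
the disk and then calling blk_queue_* setters one by one, drivers
describe their limits up front and pass them to blk_mq_alloc_disk(),
which applies them when the queue is created. Reduced to its essentials
(set and drv_data are hypothetical driver names):

	struct queue_limits lim = {
		.max_hw_sectors	= 2560,
		.io_opt		= SZ_2M,
	};
	struct gendisk *gd;

	gd = blk_mq_alloc_disk(&set, &lim, drv_data);
	if (IS_ERR(gd))
		return PTR_ERR(gd);	/* ERR_PTR style, never NULL */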
index d7317425be510d1c3d4bbac5a18fba2ea8e76c1d..cc9077b588d7e7af30401ab03c826a358b2a232c 100644 (file)
@@ -419,13 +419,16 @@ aoecmd_cfg_pkts(ushort aoemajor, unsigned char aoeminor, struct sk_buff_head *qu
        rcu_read_lock();
        for_each_netdev_rcu(&init_net, ifp) {
                dev_hold(ifp);
-               if (!is_aoe_netif(ifp))
-                       goto cont;
+               if (!is_aoe_netif(ifp)) {
+                       dev_put(ifp);
+                       continue;
+               }
 
                skb = new_skb(sizeof *h + sizeof *ch);
                if (skb == NULL) {
                        printk(KERN_INFO "aoe: skb alloc failure\n");
-                       goto cont;
+                       dev_put(ifp);
+                       continue;
                }
                skb_put(skb, sizeof *h + sizeof *ch);
                skb->dev = ifp;
@@ -440,9 +443,6 @@ aoecmd_cfg_pkts(ushort aoemajor, unsigned char aoeminor, struct sk_buff_head *qu
                h->major = cpu_to_be16(aoemajor);
                h->minor = aoeminor;
                h->cmd = AOECMD_CFG;
-
-cont:
-               dev_put(ifp);
        }
        rcu_read_unlock();
 }
index c51ea95bc2ce41f6260302f5efe914d3e12e1d98..923a134fd766562fcf9d33a993739040517258c1 100644 (file)
@@ -63,6 +63,7 @@ tx(int id) __must_hold(&txlock)
                        pr_warn("aoe: packet could not be sent on %s.  %s\n",
                                ifp ? ifp->name : "netif",
                                "consider increasing tx_queue_len");
+               dev_put(ifp);
                spin_lock_irq(&txlock);
        }
        return 0;
index 50949207798d2a27a3fb6c86e94c5afd490ff5d6..cacc4ba942a814015fa82e592c8f946340a5a577 100644 (file)
@@ -1994,7 +1994,7 @@ static int ataflop_alloc_disk(unsigned int drive, unsigned int type)
 {
        struct gendisk *disk;
 
-       disk = blk_mq_alloc_disk(&unit[drive].tag_set, NULL);
+       disk = blk_mq_alloc_disk(&unit[drive].tag_set, NULL, NULL);
        if (IS_ERR(disk))
                return PTR_ERR(disk);
 
index 970bd6ff38c491a610f65026817f5c076aed8285..e322cef6596bfaa2f1cbf6de1f275a804670d49b 100644 (file)
@@ -318,6 +318,16 @@ static int brd_alloc(int i)
        struct gendisk *disk;
        char buf[DISK_NAME_LEN];
        int err = -ENOMEM;
+       struct queue_limits lim = {
+               /*
+                * This is so fdisk will align partitions on 4k, because of
+                * direct_access API needing 4k alignment, returning a PFN
+                * (This is only a problem on very small devices <= 4M,
+                *  otherwise fdisk will align on 1M. Regardless this call
+                *  is harmless)
+                */
+               .physical_block_size    = PAGE_SIZE,
+       };
 
        list_for_each_entry(brd, &brd_devices, brd_list)
                if (brd->brd_number == i)
@@ -335,10 +345,11 @@ static int brd_alloc(int i)
                debugfs_create_u64(buf, 0444, brd_debugfs_dir,
                                &brd->brd_nr_pages);
 
-       disk = brd->brd_disk = blk_alloc_disk(NUMA_NO_NODE);
-       if (!disk)
+       disk = brd->brd_disk = blk_alloc_disk(&lim, NUMA_NO_NODE);
+       if (IS_ERR(disk)) {
+               err = PTR_ERR(disk);
                goto out_free_dev;
-
+       }
        disk->major             = RAMDISK_MAJOR;
        disk->first_minor       = i * max_part;
        disk->minors            = max_part;
@@ -347,15 +358,6 @@ static int brd_alloc(int i)
        strscpy(disk->disk_name, buf, DISK_NAME_LEN);
        set_capacity(disk, rd_size * 2);
        
-       /*
-        * This is so fdisk will align partitions on 4k, because of
-        * direct_access API needing 4k alignment, returning a PFN
-        * (This is only a problem on very small devices <= 4M,
-        *  otherwise fdisk will align on 1M. Regardless this call
-        *  is harmless)
-        */
-       blk_queue_physical_block_size(disk->queue, PAGE_SIZE);
-
        /* Tell the block layer that this is not a rotational device */
        blk_queue_flag_set(QUEUE_FLAG_NONROT, disk->queue);
        blk_queue_flag_set(QUEUE_FLAG_SYNCHRONOUS, disk->queue);
index c21e3732759ec21069939f43ca944e044ea95095..94dc0a235919d755a788f758150ba80c66635910 100644 (file)
@@ -524,9 +524,9 @@ struct drbd_md {
 
 struct drbd_backing_dev {
        struct block_device *backing_bdev;
-       struct bdev_handle *backing_bdev_handle;
+       struct file *backing_bdev_file;
        struct block_device *md_bdev;
-       struct bdev_handle *md_bdev_handle;
+       struct file *f_md_bdev;
        struct drbd_md md;
        struct disk_conf *disk_conf; /* RCU, for updates: resource->conf_update */
        sector_t known_size; /* last known size of that backing device */
index 6bc86106c7b2ab5d59b6ab66e452976d235a7d97..113b441d4d3670c15f2b10c85b0ec6f82c7ab002 100644 (file)
@@ -2690,6 +2690,14 @@ enum drbd_ret_code drbd_create_device(struct drbd_config_context *adm_ctx, unsig
        int id;
        int vnr = adm_ctx->volume;
        enum drbd_ret_code err = ERR_NOMEM;
+       struct queue_limits lim = {
+               /*
+                * Set max_hw_sectors to an odd value of 8 KiB here. This
+                * triggers a max_bio_size message upon first attach or
+                * connect.
+                */
+               .max_hw_sectors         = DRBD_MAX_BIO_SIZE_SAFE >> 8,
+       };
 
        device = minor_to_device(minor);
        if (device)
@@ -2708,9 +2716,11 @@ enum drbd_ret_code drbd_create_device(struct drbd_config_context *adm_ctx, unsig
 
        drbd_init_set_defaults(device);
 
-       disk = blk_alloc_disk(NUMA_NO_NODE);
-       if (!disk)
+       disk = blk_alloc_disk(&lim, NUMA_NO_NODE);
+       if (IS_ERR(disk)) {
+               err = PTR_ERR(disk);
                goto out_no_disk;
+       }
 
        device->vdisk = disk;
        device->rq_queue = disk->queue;
@@ -2727,9 +2737,6 @@ enum drbd_ret_code drbd_create_device(struct drbd_config_context *adm_ctx, unsig
 
        blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, disk->queue);
        blk_queue_write_cache(disk->queue, true, true);
-       /* Setting the max_hw_sectors to an odd value of 8kibyte here
-          This triggers a max_bio_size message upon first attach or connect */
-       blk_queue_max_hw_sectors(disk->queue, DRBD_MAX_BIO_SIZE_SAFE >> 8);
 
        device->md_io.page = alloc_page(GFP_KERNEL);
        if (!device->md_io.page)
index 43747a1aae43537a1e578ce41949af3711a3bd3c..5d65c9754d8377e60f59228f7abb2eac9669a439 100644 (file)
@@ -1189,9 +1189,31 @@ static int drbd_check_al_size(struct drbd_device *device, struct disk_conf *dc)
        return 0;
 }
 
-static void blk_queue_discard_granularity(struct request_queue *q, unsigned int granularity)
+static unsigned int drbd_max_peer_bio_size(struct drbd_device *device)
 {
-       q->limits.discard_granularity = granularity;
+       /*
+        * We may ignore peer limits if the peer is modern enough.  From 8.3.8
+        * onwards the peer can use multiple BIOs for a single peer_request.
+        */
+       if (device->state.conn < C_WF_REPORT_PARAMS)
+               return device->peer_max_bio_size;
+
+       if (first_peer_device(device)->connection->agreed_pro_version < 94)
+               return min(device->peer_max_bio_size, DRBD_MAX_SIZE_H80_PACKET);
+
+       /*
+        * Correct old drbd (up to 8.3.7) if it believes it can do more than
+        * 32KiB.
+        */
+       if (first_peer_device(device)->connection->agreed_pro_version == 94)
+               return DRBD_MAX_SIZE_H80_PACKET;
+
+       /*
+        * drbd 8.3.8 onwards, before 8.4.0
+        */
+       if (first_peer_device(device)->connection->agreed_pro_version < 100)
+               return DRBD_MAX_BIO_SIZE_P95;
+       return DRBD_MAX_BIO_SIZE;
 }
 
 static unsigned int drbd_max_discard_sectors(struct drbd_connection *connection)
@@ -1204,149 +1226,119 @@ static unsigned int drbd_max_discard_sectors(struct drbd_connection *connection)
        return AL_EXTENT_SIZE >> 9;
 }
 
-static void decide_on_discard_support(struct drbd_device *device,
+static bool drbd_discard_supported(struct drbd_connection *connection,
                struct drbd_backing_dev *bdev)
 {
-       struct drbd_connection *connection =
-               first_peer_device(device)->connection;
-       struct request_queue *q = device->rq_queue;
-       unsigned int max_discard_sectors;
-
        if (bdev && !bdev_max_discard_sectors(bdev->backing_bdev))
-               goto not_supported;
+               return false;
 
        if (connection->cstate >= C_CONNECTED &&
            !(connection->agreed_features & DRBD_FF_TRIM)) {
                drbd_info(connection,
                        "peer DRBD too old, does not support TRIM: disabling discards\n");
-               goto not_supported;
+               return false;
        }
 
-       /*
-        * We don't care for the granularity, really.
-        *
-        * Stacking limits below should fix it for the local device.  Whether or
-        * not it is a suitable granularity on the remote device is not our
-        * problem, really. If you care, you need to use devices with similar
-        * topology on all peers.
-        */
-       blk_queue_discard_granularity(q, 512);
-       max_discard_sectors = drbd_max_discard_sectors(connection);
-       blk_queue_max_discard_sectors(q, max_discard_sectors);
-       blk_queue_max_write_zeroes_sectors(q, max_discard_sectors);
-       return;
-
-not_supported:
-       blk_queue_discard_granularity(q, 0);
-       blk_queue_max_discard_sectors(q, 0);
+       return true;
 }
 
-static void fixup_write_zeroes(struct drbd_device *device, struct request_queue *q)
+/* This is the workaround for "bio would need to, but cannot, be split" */
+static unsigned int drbd_backing_dev_max_segments(struct drbd_device *device)
 {
-       /* Fixup max_write_zeroes_sectors after blk_stack_limits():
-        * if we can handle "zeroes" efficiently on the protocol,
-        * we want to do that, even if our backend does not announce
-        * max_write_zeroes_sectors itself. */
-       struct drbd_connection *connection = first_peer_device(device)->connection;
-       /* If the peer announces WZEROES support, use it.  Otherwise, rather
-        * send explicit zeroes than rely on some discard-zeroes-data magic. */
-       if (connection->agreed_features & DRBD_FF_WZEROES)
-               q->limits.max_write_zeroes_sectors = DRBD_MAX_BBIO_SECTORS;
-       else
-               q->limits.max_write_zeroes_sectors = 0;
-}
+       unsigned int max_segments;
 
-static void fixup_discard_support(struct drbd_device *device, struct request_queue *q)
-{
-       unsigned int max_discard = device->rq_queue->limits.max_discard_sectors;
-       unsigned int discard_granularity =
-               device->rq_queue->limits.discard_granularity >> SECTOR_SHIFT;
+       rcu_read_lock();
+       max_segments = rcu_dereference(device->ldev->disk_conf)->max_bio_bvecs;
+       rcu_read_unlock();
 
-       if (discard_granularity > max_discard) {
-               blk_queue_discard_granularity(q, 0);
-               blk_queue_max_discard_sectors(q, 0);
-       }
+       if (!max_segments)
+               return BLK_MAX_SEGMENTS;
+       return max_segments;
 }
 
-static void drbd_setup_queue_param(struct drbd_device *device, struct drbd_backing_dev *bdev,
-                                  unsigned int max_bio_size, struct o_qlim *o)
+void drbd_reconsider_queue_parameters(struct drbd_device *device,
+               struct drbd_backing_dev *bdev, struct o_qlim *o)
 {
+       struct drbd_connection *connection =
+               first_peer_device(device)->connection;
        struct request_queue * const q = device->rq_queue;
-       unsigned int max_hw_sectors = max_bio_size >> 9;
-       unsigned int max_segments = 0;
+       unsigned int now = queue_max_hw_sectors(q) << 9;
+       struct queue_limits lim;
        struct request_queue *b = NULL;
-       struct disk_conf *dc;
+       unsigned int new;
 
        if (bdev) {
                b = bdev->backing_bdev->bd_disk->queue;
 
-               max_hw_sectors = min(queue_max_hw_sectors(b), max_bio_size >> 9);
-               rcu_read_lock();
-               dc = rcu_dereference(device->ldev->disk_conf);
-               max_segments = dc->max_bio_bvecs;
-               rcu_read_unlock();
-
-               blk_set_stacking_limits(&q->limits);
+               device->local_max_bio_size =
+                       queue_max_hw_sectors(b) << SECTOR_SHIFT;
        }
 
-       blk_queue_max_hw_sectors(q, max_hw_sectors);
-       /* This is the workaround for "bio would need to, but cannot, be split" */
-       blk_queue_max_segments(q, max_segments ? max_segments : BLK_MAX_SEGMENTS);
-       blk_queue_segment_boundary(q, PAGE_SIZE-1);
-       decide_on_discard_support(device, bdev);
-
-       if (b) {
-               blk_stack_limits(&q->limits, &b->limits, 0);
-               disk_update_readahead(device->vdisk);
+       /*
+        * We may later detach and re-attach on a disconnected Primary.  Avoid
+        * decreasing the value in this case.
+        *
+        * We want to store what we know the peer DRBD can handle, not what the
+        * peer IO backend can handle.
+        */
+       new = min3(DRBD_MAX_BIO_SIZE, device->local_max_bio_size,
+               max(drbd_max_peer_bio_size(device), device->peer_max_bio_size));
+       if (new != now) {
+               if (device->state.role == R_PRIMARY && new < now)
+                       drbd_err(device, "ASSERT FAILED new < now; (%u < %u)\n",
+                                       new, now);
+               drbd_info(device, "max BIO size = %u\n", new);
        }
-       fixup_write_zeroes(device, q);
-       fixup_discard_support(device, q);
-}
-
-void drbd_reconsider_queue_parameters(struct drbd_device *device, struct drbd_backing_dev *bdev, struct o_qlim *o)
-{
-       unsigned int now, new, local, peer;
-
-       now = queue_max_hw_sectors(device->rq_queue) << 9;
-       local = device->local_max_bio_size; /* Eventually last known value, from volatile memory */
-       peer = device->peer_max_bio_size; /* Eventually last known value, from meta data */
 
+       lim = queue_limits_start_update(q);
        if (bdev) {
-               local = queue_max_hw_sectors(bdev->backing_bdev->bd_disk->queue) << 9;
-               device->local_max_bio_size = local;
+               blk_set_stacking_limits(&lim);
+               lim.max_segments = drbd_backing_dev_max_segments(device);
+       } else {
+               lim.max_segments = BLK_MAX_SEGMENTS;
        }
-       local = min(local, DRBD_MAX_BIO_SIZE);
 
-       /* We may ignore peer limits if the peer is modern enough.
-          Because new from 8.3.8 onwards the peer can use multiple
-          BIOs for a single peer_request */
-       if (device->state.conn >= C_WF_REPORT_PARAMS) {
-               if (first_peer_device(device)->connection->agreed_pro_version < 94)
-                       peer = min(device->peer_max_bio_size, DRBD_MAX_SIZE_H80_PACKET);
-                       /* Correct old drbd (up to 8.3.7) if it believes it can do more than 32KiB */
-               else if (first_peer_device(device)->connection->agreed_pro_version == 94)
-                       peer = DRBD_MAX_SIZE_H80_PACKET;
-               else if (first_peer_device(device)->connection->agreed_pro_version < 100)
-                       peer = DRBD_MAX_BIO_SIZE_P95;  /* drbd 8.3.8 onwards, before 8.4.0 */
-               else
-                       peer = DRBD_MAX_BIO_SIZE;
+       lim.max_hw_sectors = new >> SECTOR_SHIFT;
+       lim.seg_boundary_mask = PAGE_SIZE - 1;
 
-               /* We may later detach and re-attach on a disconnected Primary.
-                * Avoid this setting to jump back in that case.
-                * We want to store what we know the peer DRBD can handle,
-                * not what the peer IO backend can handle. */
-               if (peer > device->peer_max_bio_size)
-                       device->peer_max_bio_size = peer;
+       /*
+        * We don't care for the granularity, really.
+        *
+        * Stacking limits below should fix it for the local device.  Whether or
+        * not it is a suitable granularity on the remote device is not our
+        * problem, really. If you care, you need to use devices with similar
+        * topology on all peers.
+        */
+       if (drbd_discard_supported(connection, bdev)) {
+               lim.discard_granularity = 512;
+               lim.max_hw_discard_sectors =
+                       drbd_max_discard_sectors(connection);
+       } else {
+               lim.discard_granularity = 0;
+               lim.max_hw_discard_sectors = 0;
        }
-       new = min(local, peer);
 
-       if (device->state.role == R_PRIMARY && new < now)
-               drbd_err(device, "ASSERT FAILED new < now; (%u < %u)\n", new, now);
+       if (bdev)
+               blk_stack_limits(&lim, &b->limits, 0);
 
-       if (new != now)
-               drbd_info(device, "max BIO size = %u\n", new);
+       /*
+        * If we can handle "zeroes" efficiently on the protocol, we want to do
+        * that, even if our backend does not announce max_write_zeroes_sectors
+        * itself.
+        */
+       if (connection->agreed_features & DRBD_FF_WZEROES)
+               lim.max_write_zeroes_sectors = DRBD_MAX_BBIO_SECTORS;
+       else
+               lim.max_write_zeroes_sectors = 0;
+
+       if ((lim.discard_granularity >> SECTOR_SHIFT) >
+           lim.max_hw_discard_sectors) {
+               lim.discard_granularity = 0;
+               lim.max_hw_discard_sectors = 0;
+       }
 
-       drbd_setup_queue_param(device, bdev, new, o);
+       if (queue_limits_commit_update(q, &lim))
+               drbd_err(device, "setting new queue limits failed\n");
 }
 
 /* Starts the worker thread */
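
For a queue that already exists, the counterpart API is the
start/commit pair used above: snapshot the limits, edit the copy, then
commit, so readers never observe a half-updated set. Stripped of the
DRBD specifics:

	struct queue_limits lim;
	int err;

	lim = queue_limits_start_update(q);	/* takes q->limits_lock */
	lim.max_hw_sectors = new >> SECTOR_SHIFT;
	lim.seg_boundary_mask = PAGE_SIZE - 1;
	/* validates, applies and drops the lock in one step */
	err = queue_limits_commit_update(q, &lim);
	if (err)
		pr_err("setting new queue limits failed\n");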
@@ -1635,45 +1627,45 @@ success:
        return 0;
 }
 
-static struct bdev_handle *open_backing_dev(struct drbd_device *device,
+static struct file *open_backing_dev(struct drbd_device *device,
                const char *bdev_path, void *claim_ptr, bool do_bd_link)
 {
-       struct bdev_handle *handle;
+       struct file *file;
        int err = 0;
 
-       handle = bdev_open_by_path(bdev_path, BLK_OPEN_READ | BLK_OPEN_WRITE,
-                                  claim_ptr, NULL);
-       if (IS_ERR(handle)) {
+       file = bdev_file_open_by_path(bdev_path, BLK_OPEN_READ | BLK_OPEN_WRITE,
+                                     claim_ptr, NULL);
+       if (IS_ERR(file)) {
                drbd_err(device, "open(\"%s\") failed with %ld\n",
-                               bdev_path, PTR_ERR(handle));
-               return handle;
+                               bdev_path, PTR_ERR(file));
+               return file;
        }
 
        if (!do_bd_link)
-               return handle;
+               return file;
 
-       err = bd_link_disk_holder(handle->bdev, device->vdisk);
+       err = bd_link_disk_holder(file_bdev(file), device->vdisk);
        if (err) {
-               bdev_release(handle);
+               fput(file);
                drbd_err(device, "bd_link_disk_holder(\"%s\", ...) failed with %d\n",
                                bdev_path, err);
-               handle = ERR_PTR(err);
+               file = ERR_PTR(err);
        }
-       return handle;
+       return file;
 }
 
 static int open_backing_devices(struct drbd_device *device,
                struct disk_conf *new_disk_conf,
                struct drbd_backing_dev *nbc)
 {
-       struct bdev_handle *handle;
+       struct file *file;
 
-       handle = open_backing_dev(device, new_disk_conf->backing_dev, device,
+       file = open_backing_dev(device, new_disk_conf->backing_dev, device,
                                  true);
-       if (IS_ERR(handle))
+       if (IS_ERR(file))
                return ERR_OPEN_DISK;
-       nbc->backing_bdev = handle->bdev;
-       nbc->backing_bdev_handle = handle;
+       nbc->backing_bdev = file_bdev(file);
+       nbc->backing_bdev_file = file;
 
        /*
         * meta_dev_idx >= 0: external fixed size, possibly multiple
@@ -1683,7 +1675,7 @@ static int open_backing_devices(struct drbd_device *device,
         * should check it for you already; but if you don't, or
         * someone fooled it, we need to double check here)
         */
-       handle = open_backing_dev(device, new_disk_conf->meta_dev,
+       file = open_backing_dev(device, new_disk_conf->meta_dev,
                /* claim ptr: device, if claimed exclusively; shared drbd_m_holder,
                 * if potentially shared with other drbd minors */
                        (new_disk_conf->meta_dev_idx < 0) ? (void*)device : (void*)drbd_m_holder,
@@ -1691,21 +1683,21 @@ static int open_backing_devices(struct drbd_device *device,
                 * as would happen with internal metadata. */
                        (new_disk_conf->meta_dev_idx != DRBD_MD_INDEX_FLEX_INT &&
                         new_disk_conf->meta_dev_idx != DRBD_MD_INDEX_INTERNAL));
-       if (IS_ERR(handle))
+       if (IS_ERR(file))
                return ERR_OPEN_MD_DISK;
-       nbc->md_bdev = handle->bdev;
-       nbc->md_bdev_handle = handle;
+       nbc->md_bdev = file_bdev(file);
+       nbc->f_md_bdev = file;
        return NO_ERROR;
 }
 
 static void close_backing_dev(struct drbd_device *device,
-               struct bdev_handle *handle, bool do_bd_unlink)
+               struct file *bdev_file, bool do_bd_unlink)
 {
-       if (!handle)
+       if (!bdev_file)
                return;
        if (do_bd_unlink)
-               bd_unlink_disk_holder(handle->bdev, device->vdisk);
-       bdev_release(handle);
+               bd_unlink_disk_holder(file_bdev(bdev_file), device->vdisk);
+       fput(bdev_file);
 }
 
 void drbd_backing_dev_free(struct drbd_device *device, struct drbd_backing_dev *ldev)
@@ -1713,9 +1705,9 @@ void drbd_backing_dev_free(struct drbd_device *device, struct drbd_backing_dev *
        if (ldev == NULL)
                return;
 
-       close_backing_dev(device, ldev->md_bdev_handle,
+       close_backing_dev(device, ldev->f_md_bdev,
                          ldev->md_bdev != ldev->backing_bdev);
-       close_backing_dev(device, ldev->backing_bdev_handle, true);
+       close_backing_dev(device, ldev->backing_bdev_file, true);
 
        kfree(ldev->disk_conf);
        kfree(ldev);
@@ -2131,9 +2123,9 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
  fail:
        conn_reconfig_done(connection);
        if (nbc) {
-               close_backing_dev(device, nbc->md_bdev_handle,
+               close_backing_dev(device, nbc->f_md_bdev,
                          nbc->md_bdev != nbc->backing_bdev);
-               close_backing_dev(device, nbc->backing_bdev_handle, true);
+               close_backing_dev(device, nbc->backing_bdev_file, true);
                kfree(nbc);
        }
        kfree(new_disk_conf);
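
The bdev_handle-to-file conversion follows the tree-wide switch this
cycle: block devices are opened as ordinary struct file references, the
block_device is recovered with file_bdev(), and the reference
(including any exclusive claim) is dropped with a plain fput(). The
minimal open/close pairing, where path and holder stand in for the
caller's values:

	struct file *bdev_file;
	struct block_device *bdev;

	bdev_file = bdev_file_open_by_path(path,
					   BLK_OPEN_READ | BLK_OPEN_WRITE,
					   holder, NULL);
	if (IS_ERR(bdev_file))
		return PTR_ERR(bdev_file);
	bdev = file_bdev(bdev_file);

	/* ... use bdev ... */

	fput(bdev_file);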
index 287a8d1d3f707f217c2e715c62decb4e83e27d1f..e858e7e0383f262fa0bf412c5867ee6832ffcb64 100644 (file)
@@ -1542,9 +1542,10 @@ int drbd_bitmap_io_from_worker(struct drbd_device *device,
 
 int notify_resource_state_change(struct sk_buff *skb,
                                  unsigned int seq,
-                                 struct drbd_resource_state_change *resource_state_change,
+                                 void *state_change,
                                  enum drbd_notification_type type)
 {
+       struct drbd_resource_state_change *resource_state_change = state_change;
        struct drbd_resource *resource = resource_state_change->resource;
        struct resource_info resource_info = {
                .res_role = resource_state_change->role[NEW],
@@ -1558,13 +1559,14 @@ int notify_resource_state_change(struct sk_buff *skb,
 
 int notify_connection_state_change(struct sk_buff *skb,
                                    unsigned int seq,
-                                   struct drbd_connection_state_change *connection_state_change,
+                                   void *state_change,
                                    enum drbd_notification_type type)
 {
-       struct drbd_connection *connection = connection_state_change->connection;
+       struct drbd_connection_state_change *p = state_change;
+       struct drbd_connection *connection = p->connection;
        struct connection_info connection_info = {
-               .conn_connection_state = connection_state_change->cstate[NEW],
-               .conn_role = connection_state_change->peer_role[NEW],
+               .conn_connection_state = p->cstate[NEW],
+               .conn_role = p->peer_role[NEW],
        };
 
        return notify_connection_state(skb, seq, connection, &connection_info, type);
@@ -1572,9 +1574,10 @@ int notify_connection_state_change(struct sk_buff *skb,
 
 int notify_device_state_change(struct sk_buff *skb,
                                unsigned int seq,
-                               struct drbd_device_state_change *device_state_change,
+                               void *state_change,
                                enum drbd_notification_type type)
 {
+       struct drbd_device_state_change *device_state_change = state_change;
        struct drbd_device *device = device_state_change->device;
        struct device_info device_info = {
                .dev_disk_state = device_state_change->disk_state[NEW],
@@ -1585,9 +1588,10 @@ int notify_device_state_change(struct sk_buff *skb,
 
 int notify_peer_device_state_change(struct sk_buff *skb,
                                     unsigned int seq,
-                                    struct drbd_peer_device_state_change *p,
+                                    void *state_change,
                                     enum drbd_notification_type type)
 {
+       struct drbd_peer_device_state_change *p = state_change;
        struct drbd_peer_device *peer_device = p->peer_device;
        struct peer_device_info peer_device_info = {
                .peer_repl_state = p->repl_state[NEW],
@@ -1605,8 +1609,8 @@ static void broadcast_state_change(struct drbd_state_change *state_change)
        struct drbd_resource_state_change *resource_state_change = &state_change->resource[0];
        bool resource_state_has_changed;
        unsigned int n_device, n_connection, n_peer_device, n_peer_devices;
-       int (*last_func)(struct sk_buff *, unsigned int, void *,
-                         enum drbd_notification_type) = NULL;
+       int (*last_func)(struct sk_buff *, unsigned int,
+               void *, enum drbd_notification_type) = NULL;
        void *last_arg = NULL;
 
 #define HAS_CHANGED(state) ((state)[OLD] != (state)[NEW])
@@ -1616,7 +1620,7 @@ static void broadcast_state_change(struct drbd_state_change *state_change)
        })
 #define REMEMBER_STATE_CHANGE(func, arg, type) \
        ({ FINAL_STATE_CHANGE(type | NOTIFY_CONTINUES); \
-          last_func = (typeof(last_func))func; \
+          last_func = func; \
           last_arg = arg; \
         })
 
index 9d78d8e3912eee6d581a2a70dc683665a2b91e97..a56a57d67686291a8fd6401b28a1a70473d336c2 100644 (file)
@@ -46,19 +46,19 @@ extern void forget_state_change(struct drbd_state_change *);
 
 extern int notify_resource_state_change(struct sk_buff *,
                                         unsigned int,
-                                        struct drbd_resource_state_change *,
+                                        void *,
                                         enum drbd_notification_type type);
 extern int notify_connection_state_change(struct sk_buff *,
                                           unsigned int,
-                                          struct drbd_connection_state_change *,
+                                          void *,
                                           enum drbd_notification_type type);
 extern int notify_device_state_change(struct sk_buff *,
                                       unsigned int,
-                                      struct drbd_device_state_change *,
+                                      void *,
                                       enum drbd_notification_type type);
 extern int notify_peer_device_state_change(struct sk_buff *,
                                            unsigned int,
-                                           struct drbd_peer_device_state_change *,
+                                           void *,
                                            enum drbd_notification_type type);
 
 #endif  /* DRBD_STATE_CHANGE_H */
index d0e41d52d6a9b58474c52edd7f5dc23f9a0c19c1..1b399ec8c07d1e7027a759051b18187dde83ab1e 100644 (file)
@@ -530,14 +530,13 @@ static struct format_descr format_req;
 static char *floppy_track_buffer;
 static int max_buffer_sectors;
 
-typedef void (*done_f)(int);
 static const struct cont_t {
        void (*interrupt)(void);
                                /* this is called after the interrupt of the
                                 * main command */
        void (*redo)(void);     /* this is called to retry the operation */
        void (*error)(void);    /* this is called to tally an error */
-       done_f done;            /* this is called to say if the operation has
+       void (*done)(int);      /* this is called to say if the operation has
                                 * succeeded/failed */
 } *cont;
 
@@ -985,6 +984,10 @@ static void empty(void)
 {
 }
 
+static void empty_done(int result)
+{
+}
+
 static void (*floppy_work_fn)(void);
 
 static void floppy_work_workfn(struct work_struct *work)
@@ -1998,14 +2001,14 @@ static const struct cont_t wakeup_cont = {
        .interrupt      = empty,
        .redo           = do_wakeup,
        .error          = empty,
-       .done           = (done_f)empty
+       .done           = empty_done,
 };
 
 static const struct cont_t intr_cont = {
        .interrupt      = empty,
        .redo           = process_fd_request,
        .error          = empty,
-       .done           = (done_f)empty
+       .done           = empty_done,
 };
 
 /* schedules handler, waiting for completion. May be interrupted, will then
@@ -4513,13 +4516,15 @@ static bool floppy_available(int drive)
 
 static int floppy_alloc_disk(unsigned int drive, unsigned int type)
 {
+       struct queue_limits lim = {
+               .max_hw_sectors = 64,
+       };
        struct gendisk *disk;
 
-       disk = blk_mq_alloc_disk(&tag_sets[drive], NULL);
+       disk = blk_mq_alloc_disk(&tag_sets[drive], &lim, NULL);
        if (IS_ERR(disk))
                return PTR_ERR(disk);
 
-       blk_queue_max_hw_sectors(disk->queue, 64);
        disk->major = FLOPPY_MAJOR;
        disk->first_minor = TOMINOR(drive) | (type << 2);
        disk->minors = 1;
index f8145499da38c834225b8f2d2ee0448d19adc8e1..28a95fd366fea5741db9d72b34afc6ddc6d0890a 100644 (file)
@@ -750,12 +750,13 @@ static void loop_sysfs_exit(struct loop_device *lo)
                                   &loop_attribute_group);
 }
 
-static void loop_config_discard(struct loop_device *lo)
+static void loop_config_discard(struct loop_device *lo,
+               struct queue_limits *lim)
 {
        struct file *file = lo->lo_backing_file;
        struct inode *inode = file->f_mapping->host;
-       struct request_queue *q = lo->lo_queue;
-       u32 granularity, max_discard_sectors;
+       u32 granularity = 0, max_discard_sectors = 0;
+       struct kstatfs sbuf;
 
        /*
         * If the backing device is a block device, mirror its zeroing
@@ -775,29 +776,17 @@ static void loop_config_discard(struct loop_device *lo)
         * We use punch hole to reclaim the free space used by the
         * image a.k.a. discard.
         */
-       } else if (!file->f_op->fallocate) {
-               max_discard_sectors = 0;
-               granularity = 0;
-
-       } else {
-               struct kstatfs sbuf;
-
+       } else if (file->f_op->fallocate && !vfs_statfs(&file->f_path, &sbuf)) {
                max_discard_sectors = UINT_MAX >> 9;
-               if (!vfs_statfs(&file->f_path, &sbuf))
-                       granularity = sbuf.f_bsize;
-               else
-                       max_discard_sectors = 0;
+               granularity = sbuf.f_bsize;
        }
 
-       if (max_discard_sectors) {
-               q->limits.discard_granularity = granularity;
-               blk_queue_max_discard_sectors(q, max_discard_sectors);
-               blk_queue_max_write_zeroes_sectors(q, max_discard_sectors);
-       } else {
-               q->limits.discard_granularity = 0;
-               blk_queue_max_discard_sectors(q, 0);
-               blk_queue_max_write_zeroes_sectors(q, 0);
-       }
+       lim->max_hw_discard_sectors = max_discard_sectors;
+       lim->max_write_zeroes_sectors = max_discard_sectors;
+       if (max_discard_sectors)
+               lim->discard_granularity = granularity;
+       else
+               lim->discard_granularity = 0;
 }
 
 struct loop_worker {
@@ -986,6 +975,20 @@ loop_set_status_from_info(struct loop_device *lo,
        return 0;
 }
 
+static int loop_reconfigure_limits(struct loop_device *lo, unsigned short bsize,
+               bool update_discard_settings)
+{
+       struct queue_limits lim;
+
+       lim = queue_limits_start_update(lo->lo_queue);
+       lim.logical_block_size = bsize;
+       lim.physical_block_size = bsize;
+       lim.io_min = bsize;
+       if (update_discard_settings)
+               loop_config_discard(lo, &lim);
+       return queue_limits_commit_update(lo->lo_queue, &lim);
+}
+
 static int loop_configure(struct loop_device *lo, blk_mode_t mode,
                          struct block_device *bdev,
                          const struct loop_config *config)
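
loop_reconfigure_limits() above is the canonical use of the new update API: queue_limits_start_update() returns a snapshot of the current limits (taking the queue's limits lock), the caller edits the copy, and queue_limits_commit_update() validates and publishes it atomically, returning an error for invalid combinations. A minimal sketch, hypothetical names:

#include <linux/blkdev.h>

static int foo_set_block_size(struct request_queue *q, unsigned int bsize)
{
	struct queue_limits lim;

	lim = queue_limits_start_update(q);
	lim.logical_block_size = bsize;
	lim.physical_block_size = bsize;
	lim.io_min = bsize;
	return queue_limits_commit_update(q, &lim);
}

This is also why loop_configure() now has an error to check (and WARN on) where the old blk_queue_*() calls were void.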
@@ -1083,11 +1086,10 @@ static int loop_configure(struct loop_device *lo, blk_mode_t mode,
        else
                bsize = 512;
 
-       blk_queue_logical_block_size(lo->lo_queue, bsize);
-       blk_queue_physical_block_size(lo->lo_queue, bsize);
-       blk_queue_io_min(lo->lo_queue, bsize);
+       error = loop_reconfigure_limits(lo, bsize, true);
+       if (WARN_ON_ONCE(error))
+               goto out_unlock;
 
-       loop_config_discard(lo);
        loop_update_rotational(lo);
        loop_update_dio(lo);
        loop_sysfs_init(lo);
@@ -1154,9 +1156,7 @@ static void __loop_clr_fd(struct loop_device *lo, bool release)
        lo->lo_offset = 0;
        lo->lo_sizelimit = 0;
        memset(lo->lo_file_name, 0, LO_NAME_SIZE);
-       blk_queue_logical_block_size(lo->lo_queue, 512);
-       blk_queue_physical_block_size(lo->lo_queue, 512);
-       blk_queue_io_min(lo->lo_queue, 512);
+       loop_reconfigure_limits(lo, 512, false);
        invalidate_disk(lo->lo_disk);
        loop_sysfs_exit(lo);
        /* let user-space know about this change */
@@ -1488,9 +1488,7 @@ static int loop_set_block_size(struct loop_device *lo, unsigned long arg)
        invalidate_bdev(lo->lo_device);
 
        blk_mq_freeze_queue(lo->lo_queue);
-       blk_queue_logical_block_size(lo->lo_queue, arg);
-       blk_queue_physical_block_size(lo->lo_queue, arg);
-       blk_queue_io_min(lo->lo_queue, arg);
+       err = loop_reconfigure_limits(lo, arg, false);
        loop_update_dio(lo);
        blk_mq_unfreeze_queue(lo->lo_queue);
 
@@ -1982,6 +1980,12 @@ static const struct blk_mq_ops loop_mq_ops = {
 
 static int loop_add(int i)
 {
+       struct queue_limits lim = {
+               /*
+                * Random number picked from the historic block max_sectors cap.
+                */
+               .max_hw_sectors         = 2560u,
+       };
        struct loop_device *lo;
        struct gendisk *disk;
        int err;
@@ -2025,16 +2029,13 @@ static int loop_add(int i)
        if (err)
                goto out_free_idr;
 
-       disk = lo->lo_disk = blk_mq_alloc_disk(&lo->tag_set, lo);
+       disk = lo->lo_disk = blk_mq_alloc_disk(&lo->tag_set, &lim, lo);
        if (IS_ERR(disk)) {
                err = PTR_ERR(disk);
                goto out_cleanup_tags;
        }
        lo->lo_queue = lo->lo_disk->queue;
 
-       /* random number picked from the history block max_sectors cap */
-       blk_queue_max_hw_sectors(lo->lo_queue, 2560u);
-
        /*
         * By default, we do buffer IO, so it doesn't make sense to enable
         * merge because the I/O submitted to backing file is handled page by
index b200950e8fb5f9b7c5efcb044dddf52087765e2e..43a187609ef794a82a57ea49fe1b34a40593e828 100644 (file)
@@ -3401,6 +3401,12 @@ static const struct blk_mq_ops mtip_mq_ops = {
  */
 static int mtip_block_initialize(struct driver_data *dd)
 {
+       struct queue_limits lim = {
+               .physical_block_size    = 4096,
+               .max_hw_sectors         = 0xffff,
+               .max_segments           = MTIP_MAX_SG,
+               .max_segment_size       = 0x400000,
+       };
        int rv = 0, wait_for_rebuild = 0;
        sector_t capacity;
        unsigned int index = 0;
@@ -3431,7 +3437,7 @@ static int mtip_block_initialize(struct driver_data *dd)
                goto block_queue_alloc_tag_error;
        }
 
-       dd->disk = blk_mq_alloc_disk(&dd->tags, dd);
+       dd->disk = blk_mq_alloc_disk(&dd->tags, &lim, dd);
        if (IS_ERR(dd->disk)) {
                dev_err(&dd->pdev->dev,
                        "Unable to allocate request queue\n");
@@ -3481,12 +3487,7 @@ skip_create_disk:
        /* Set device limits. */
        blk_queue_flag_set(QUEUE_FLAG_NONROT, dd->queue);
        blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, dd->queue);
-       blk_queue_max_segments(dd->queue, MTIP_MAX_SG);
-       blk_queue_physical_block_size(dd->queue, 4096);
-       blk_queue_max_hw_sectors(dd->queue, 0xffff);
-       blk_queue_max_segment_size(dd->queue, 0x400000);
        dma_set_max_seg_size(&dd->pdev->dev, 0x400000);
-       blk_queue_io_min(dd->queue, 4096);
 
        /* Set the capacity of the device in 512 byte sectors. */
        if (!(mtip_hw_get_capacity(dd, &capacity))) {
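
Note that only the block-layer half of the segment-size cap moved into the limits; dma_set_max_seg_size() still applies the same bound on the struct device for the DMA mapping layer, and the two values have to be kept in agreement by hand. A sketch of the pairing, values as in the mtip change, names hypothetical:

#include <linux/blk-mq.h>
#include <linux/dma-mapping.h>
#include <linux/sizes.h>

static int foo_init(struct device *dev, struct blk_mq_tag_set *set)
{
	struct queue_limits lim = {
		.max_segment_size	= SZ_4M,	/* 0x400000, as above */
	};
	struct gendisk *disk;

	/* keep the DMA layer's segment cap in sync with the queue limit */
	dma_set_max_seg_size(dev, SZ_4M);

	disk = blk_mq_alloc_disk(set, &lim, NULL);
	return PTR_ERR_OR_ZERO(disk);
}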
index d914156db2d8b2d698432001ecc6534f619b7394..27b2187e7a6d55cad41bec8ee664ca7cc5026c28 100644 (file)
@@ -114,6 +114,10 @@ static const struct block_device_operations n64cart_fops = {
  */
 static int __init n64cart_probe(struct platform_device *pdev)
 {
+       struct queue_limits lim = {
+               .physical_block_size    = 4096,
+               .logical_block_size     = 4096,
+       };
        struct gendisk *disk;
        int err = -ENOMEM;
 
@@ -131,9 +135,11 @@ static int __init n64cart_probe(struct platform_device *pdev)
        if (IS_ERR(reg_base))
                return PTR_ERR(reg_base);
 
-       disk = blk_alloc_disk(NUMA_NO_NODE);
-       if (!disk)
+       disk = blk_alloc_disk(&lim, NUMA_NO_NODE);
+       if (IS_ERR(disk)) {
+               err = PTR_ERR(disk);
                goto out;
+       }
 
        disk->first_minor = 0;
        disk->flags = GENHD_FL_NO_PART;
@@ -145,8 +151,6 @@ static int __init n64cart_probe(struct platform_device *pdev)
        set_disk_ro(disk, 1);
 
        blk_queue_flag_set(QUEUE_FLAG_NONROT, disk->queue);
-       blk_queue_physical_block_size(disk->queue, 4096);
-       blk_queue_logical_block_size(disk->queue, 4096);
 
        err = add_disk(disk);
        if (err)
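
The bio-based allocator changed signature too: blk_alloc_disk() now takes the limits and reports failure as an ERR_PTR() instead of NULL, which is why the n64cart error path switches from a bare -ENOMEM to PTR_ERR(). A sketch with illustrative values:

#include <linux/blkdev.h>

static int foo_probe(void)
{
	struct queue_limits lim = {
		.logical_block_size	= 4096,
		.physical_block_size	= 4096,
	};
	struct gendisk *disk;

	disk = blk_alloc_disk(&lim, NUMA_NO_NODE);
	if (IS_ERR(disk))
		return PTR_ERR(disk);	/* was: if (!disk) return -ENOMEM; */

	/* first_minor, fops, capacity and add_disk() elided */
	return 0;
}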
index 33a8f37bb6a1f504060f783c6d727e4c76026a2e..9d4ec9273bf9545b715aa2aefdb3d8cefaaba9ca 100644 (file)
@@ -316,9 +316,12 @@ static void nbd_mark_nsock_dead(struct nbd_device *nbd, struct nbd_sock *nsock,
        nsock->sent = 0;
 }
 
-static int nbd_set_size(struct nbd_device *nbd, loff_t bytesize,
+static int __nbd_set_size(struct nbd_device *nbd, loff_t bytesize,
                loff_t blksize)
 {
+       struct queue_limits lim;
+       int error;
+
        if (!blksize)
                blksize = 1u << NBD_DEF_BLKSIZE_BITS;
 
@@ -334,10 +337,16 @@ static int nbd_set_size(struct nbd_device *nbd, loff_t bytesize,
        if (!nbd->pid)
                return 0;
 
+       lim = queue_limits_start_update(nbd->disk->queue);
        if (nbd->config->flags & NBD_FLAG_SEND_TRIM)
-               blk_queue_max_discard_sectors(nbd->disk->queue, UINT_MAX);
-       blk_queue_logical_block_size(nbd->disk->queue, blksize);
-       blk_queue_physical_block_size(nbd->disk->queue, blksize);
+               lim.max_hw_discard_sectors = UINT_MAX;
+       else
+               lim.max_hw_discard_sectors = 0;
+       lim.logical_block_size = blksize;
+       lim.physical_block_size = blksize;
+       error = queue_limits_commit_update(nbd->disk->queue, &lim);
+       if (error)
+               return error;
 
        if (max_part)
                set_bit(GD_NEED_PART_SCAN, &nbd->disk->state);
@@ -346,6 +355,18 @@ static int nbd_set_size(struct nbd_device *nbd, loff_t bytesize,
        return 0;
 }
 
+static int nbd_set_size(struct nbd_device *nbd, loff_t bytesize,
+               loff_t blksize)
+{
+       int error;
+
+       blk_mq_freeze_queue(nbd->disk->queue);
+       error = __nbd_set_size(nbd, bytesize, blksize);
+       blk_mq_unfreeze_queue(nbd->disk->queue);
+
+       return error;
+}
+
 static void nbd_complete_rq(struct request *req)
 {
        struct nbd_cmd *cmd = blk_mq_rq_to_pdu(req);
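
The new nbd_set_size() wrapper freezes the queue around the limits commit so in-flight requests never observe a half-applied block size change. The combined pattern, sketched with hypothetical names:

#include <linux/blk-mq.h>

static int foo_update_blksize(struct request_queue *q, unsigned int blksize)
{
	struct queue_limits lim;
	int error;

	blk_mq_freeze_queue(q);
	lim = queue_limits_start_update(q);
	lim.logical_block_size = blksize;
	lim.physical_block_size = blksize;
	error = queue_limits_commit_update(q, &lim);
	blk_mq_unfreeze_queue(q);
	return error;
}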
@@ -1351,7 +1372,6 @@ static void nbd_config_put(struct nbd_device *nbd)
                nbd->config = NULL;
 
                nbd->tag_set.timeout = 0;
-               blk_queue_max_discard_sectors(nbd->disk->queue, 0);
 
                mutex_unlock(&nbd->config_lock);
                nbd_put(nbd);
@@ -1783,6 +1803,12 @@ static const struct blk_mq_ops nbd_mq_ops = {
 
 static struct nbd_device *nbd_dev_add(int index, unsigned int refs)
 {
+       struct queue_limits lim = {
+               .max_hw_sectors         = 65536,
+               .max_user_sectors       = 256,
+               .max_segments           = USHRT_MAX,
+               .max_segment_size       = UINT_MAX,
+       };
        struct nbd_device *nbd;
        struct gendisk *disk;
        int err = -ENOMEM;
@@ -1823,7 +1849,7 @@ static struct nbd_device *nbd_dev_add(int index, unsigned int refs)
        if (err < 0)
                goto out_free_tags;
 
-       disk = blk_mq_alloc_disk(&nbd->tag_set, NULL);
+       disk = blk_mq_alloc_disk(&nbd->tag_set, &lim, NULL);
        if (IS_ERR(disk)) {
                err = PTR_ERR(disk);
                goto out_free_idr;
@@ -1843,11 +1869,6 @@ static struct nbd_device *nbd_dev_add(int index, unsigned int refs)
         * Tell the block layer that we are not a rotational device
         */
        blk_queue_flag_set(QUEUE_FLAG_NONROT, disk->queue);
-       blk_queue_max_discard_sectors(disk->queue, 0);
-       blk_queue_max_segment_size(disk->queue, UINT_MAX);
-       blk_queue_max_segments(disk->queue, USHRT_MAX);
-       blk_queue_max_hw_sectors(disk->queue, 65536);
-       disk->queue->limits.max_sectors = 256;
 
        mutex_init(&nbd->config_lock);
        refcount_set(&nbd->config_refs, 0);
@@ -2433,6 +2454,12 @@ static int nbd_genl_status(struct sk_buff *skb, struct genl_info *info)
        }
 
        dev_list = nla_nest_start_noflag(reply, NBD_ATTR_DEVICE_LIST);
+       if (!dev_list) {
+               nlmsg_free(reply);
+               ret = -EMSGSIZE;
+               goto out;
+       }
+
        if (index == -1) {
                ret = idr_for_each(&nbd_index_idr, &status_cb, reply);
                if (ret) {
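
The nbd_genl_status() hunk adds a previously missing check: nla_nest_start_noflag() returns NULL when the reply skb has no room for the container attribute, and the caller has to turn that into -EMSGSIZE instead of handing a NULL nest along. The shape of the fix, with a made-up attribute id:

#include <net/netlink.h>

#define FOO_ATTR_DEVICE_LIST 1	/* illustrative attribute type */

static int foo_fill_device_list(struct sk_buff *reply)
{
	struct nlattr *dev_list;

	dev_list = nla_nest_start_noflag(reply, FOO_ATTR_DEVICE_LIST);
	if (!dev_list)
		return -EMSGSIZE;

	/* one nested attribute per device would be emitted here */

	nla_nest_end(reply, dev_list);
	return 0;
}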
index 36755f263e8ec03b1828bf44a05cc3b54bb6a03f..71c39bcd872c7ecaabc67e91f35aa2fb267d6826 100644 (file)
@@ -115,6 +115,18 @@ module_param_string(init_hctx, g_init_hctx_str, sizeof(g_init_hctx_str), 0444);
 MODULE_PARM_DESC(init_hctx, "Fault injection to fail hctx init. init_hctx=<interval>,<probability>,<space>,<times>");
 #endif
 
+/*
+ * Historic queue modes.
+ *
+ * These days nothing but NULL_Q_MQ is actually supported, but we keep the
+ * enum around for error reporting.
+ */
+enum {
+       NULL_Q_BIO      = 0,
+       NULL_Q_RQ       = 1,
+       NULL_Q_MQ       = 2,
+};
+
 static int g_queue_mode = NULL_Q_MQ;
 
 static int null_param_store_val(const char *str, int *val, int min, int max)
@@ -165,8 +177,8 @@ static bool g_blocking;
 module_param_named(blocking, g_blocking, bool, 0444);
 MODULE_PARM_DESC(blocking, "Register as a blocking blk-mq driver device");
 
-static bool shared_tags;
-module_param(shared_tags, bool, 0444);
+static bool g_shared_tags;
+module_param_named(shared_tags, g_shared_tags, bool, 0444);
 MODULE_PARM_DESC(shared_tags, "Share tag set between devices for blk-mq");
 
 static bool g_shared_tag_bitmap;
@@ -426,6 +438,7 @@ NULLB_DEVICE_ATTR(zone_max_open, uint, NULL);
 NULLB_DEVICE_ATTR(zone_max_active, uint, NULL);
 NULLB_DEVICE_ATTR(virt_boundary, bool, NULL);
 NULLB_DEVICE_ATTR(no_sched, bool, NULL);
+NULLB_DEVICE_ATTR(shared_tags, bool, NULL);
 NULLB_DEVICE_ATTR(shared_tag_bitmap, bool, NULL);
 
 static ssize_t nullb_device_power_show(struct config_item *item, char *page)
@@ -571,6 +584,7 @@ static struct configfs_attribute *nullb_device_attrs[] = {
        &nullb_device_attr_zone_offline,
        &nullb_device_attr_virt_boundary,
        &nullb_device_attr_no_sched,
+       &nullb_device_attr_shared_tags,
        &nullb_device_attr_shared_tag_bitmap,
        NULL,
 };
@@ -653,10 +667,11 @@ static ssize_t memb_group_features_show(struct config_item *item, char *page)
                        "badblocks,blocking,blocksize,cache_size,"
                        "completion_nsec,discard,home_node,hw_queue_depth,"
                        "irqmode,max_sectors,mbps,memory_backed,no_sched,"
-                       "poll_queues,power,queue_mode,shared_tag_bitmap,size,"
-                       "submit_queues,use_per_node_hctx,virt_boundary,zoned,"
-                       "zone_capacity,zone_max_active,zone_max_open,"
-                       "zone_nr_conv,zone_offline,zone_readonly,zone_size\n");
+                       "poll_queues,power,queue_mode,shared_tag_bitmap,"
+                       "shared_tags,size,submit_queues,use_per_node_hctx,"
+                       "virt_boundary,zoned,zone_capacity,zone_max_active,"
+                       "zone_max_open,zone_nr_conv,zone_offline,zone_readonly,"
+                       "zone_size\n");
 }
 
 CONFIGFS_ATTR_RO(memb_group_, features);
@@ -738,6 +753,7 @@ static struct nullb_device *null_alloc_dev(void)
        dev->zone_max_active = g_zone_max_active;
        dev->virt_boundary = g_virt_boundary;
        dev->no_sched = g_no_sched;
+       dev->shared_tags = g_shared_tags;
        dev->shared_tag_bitmap = g_shared_tag_bitmap;
        return dev;
 }
@@ -752,98 +768,11 @@ static void null_free_dev(struct nullb_device *dev)
        kfree(dev);
 }
 
-static void put_tag(struct nullb_queue *nq, unsigned int tag)
-{
-       clear_bit_unlock(tag, nq->tag_map);
-
-       if (waitqueue_active(&nq->wait))
-               wake_up(&nq->wait);
-}
-
-static unsigned int get_tag(struct nullb_queue *nq)
-{
-       unsigned int tag;
-
-       do {
-               tag = find_first_zero_bit(nq->tag_map, nq->queue_depth);
-               if (tag >= nq->queue_depth)
-                       return -1U;
-       } while (test_and_set_bit_lock(tag, nq->tag_map));
-
-       return tag;
-}
-
-static void free_cmd(struct nullb_cmd *cmd)
-{
-       put_tag(cmd->nq, cmd->tag);
-}
-
-static enum hrtimer_restart null_cmd_timer_expired(struct hrtimer *timer);
-
-static struct nullb_cmd *__alloc_cmd(struct nullb_queue *nq)
-{
-       struct nullb_cmd *cmd;
-       unsigned int tag;
-
-       tag = get_tag(nq);
-       if (tag != -1U) {
-               cmd = &nq->cmds[tag];
-               cmd->tag = tag;
-               cmd->error = BLK_STS_OK;
-               cmd->nq = nq;
-               if (nq->dev->irqmode == NULL_IRQ_TIMER) {
-                       hrtimer_init(&cmd->timer, CLOCK_MONOTONIC,
-                                    HRTIMER_MODE_REL);
-                       cmd->timer.function = null_cmd_timer_expired;
-               }
-               return cmd;
-       }
-
-       return NULL;
-}
-
-static struct nullb_cmd *alloc_cmd(struct nullb_queue *nq, struct bio *bio)
-{
-       struct nullb_cmd *cmd;
-       DEFINE_WAIT(wait);
-
-       do {
-               /*
-                * This avoids multiple return statements, multiple calls to
-                * __alloc_cmd() and a fast path call to prepare_to_wait().
-                */
-               cmd = __alloc_cmd(nq);
-               if (cmd) {
-                       cmd->bio = bio;
-                       return cmd;
-               }
-               prepare_to_wait(&nq->wait, &wait, TASK_UNINTERRUPTIBLE);
-               io_schedule();
-               finish_wait(&nq->wait, &wait);
-       } while (1);
-}
-
-static void end_cmd(struct nullb_cmd *cmd)
-{
-       int queue_mode = cmd->nq->dev->queue_mode;
-
-       switch (queue_mode)  {
-       case NULL_Q_MQ:
-               blk_mq_end_request(cmd->rq, cmd->error);
-               return;
-       case NULL_Q_BIO:
-               cmd->bio->bi_status = cmd->error;
-               bio_endio(cmd->bio);
-               break;
-       }
-
-       free_cmd(cmd);
-}
-
 static enum hrtimer_restart null_cmd_timer_expired(struct hrtimer *timer)
 {
-       end_cmd(container_of(timer, struct nullb_cmd, timer));
+       struct nullb_cmd *cmd = container_of(timer, struct nullb_cmd, timer);
 
+       blk_mq_end_request(blk_mq_rq_from_pdu(cmd), cmd->error);
        return HRTIMER_NORESTART;
 }
 
@@ -856,7 +785,9 @@ static void null_cmd_end_timer(struct nullb_cmd *cmd)
 
 static void null_complete_rq(struct request *rq)
 {
-       end_cmd(blk_mq_rq_to_pdu(rq));
+       struct nullb_cmd *cmd = blk_mq_rq_to_pdu(rq);
+
+       blk_mq_end_request(rq, cmd->error);
 }
 
 static struct nullb_page *null_alloc_page(void)
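
With the bio path gone, null_blk keeps its per-command state purely in the blk-mq payload: the pdu is carved out of the request allocation (set->cmd_size), so blk_mq_rq_to_pdu() and blk_mq_rq_from_pdu() convert between the two and the cached cmd->rq pointer becomes redundant. A minimal sketch of the pairing, with a hypothetical "foo_cmd":

#include <linux/blk-mq.h>

struct foo_cmd {
	blk_status_t error;	/* lives in the request pdu */
};

static void foo_complete_rq(struct request *rq)
{
	struct foo_cmd *cmd = blk_mq_rq_to_pdu(rq);	/* request -> pdu */

	blk_mq_end_request(rq, cmd->error);
}

static void foo_timer_fired(struct foo_cmd *cmd)
{
	struct request *rq = blk_mq_rq_from_pdu(cmd);	/* pdu -> request */

	blk_mq_end_request(rq, cmd->error);
}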
@@ -1273,7 +1204,7 @@ static int null_transfer(struct nullb *nullb, struct page *page,
 
 static int null_handle_rq(struct nullb_cmd *cmd)
 {
-       struct request *rq = cmd->rq;
+       struct request *rq = blk_mq_rq_from_pdu(cmd);
        struct nullb *nullb = cmd->nq->dev->nullb;
        int err;
        unsigned int len;
@@ -1298,63 +1229,21 @@ static int null_handle_rq(struct nullb_cmd *cmd)
        return 0;
 }
 
-static int null_handle_bio(struct nullb_cmd *cmd)
-{
-       struct bio *bio = cmd->bio;
-       struct nullb *nullb = cmd->nq->dev->nullb;
-       int err;
-       unsigned int len;
-       sector_t sector = bio->bi_iter.bi_sector;
-       struct bio_vec bvec;
-       struct bvec_iter iter;
-
-       spin_lock_irq(&nullb->lock);
-       bio_for_each_segment(bvec, bio, iter) {
-               len = bvec.bv_len;
-               err = null_transfer(nullb, bvec.bv_page, len, bvec.bv_offset,
-                                    op_is_write(bio_op(bio)), sector,
-                                    bio->bi_opf & REQ_FUA);
-               if (err) {
-                       spin_unlock_irq(&nullb->lock);
-                       return err;
-               }
-               sector += len >> SECTOR_SHIFT;
-       }
-       spin_unlock_irq(&nullb->lock);
-       return 0;
-}
-
-static void null_stop_queue(struct nullb *nullb)
-{
-       struct request_queue *q = nullb->q;
-
-       if (nullb->dev->queue_mode == NULL_Q_MQ)
-               blk_mq_stop_hw_queues(q);
-}
-
-static void null_restart_queue_async(struct nullb *nullb)
-{
-       struct request_queue *q = nullb->q;
-
-       if (nullb->dev->queue_mode == NULL_Q_MQ)
-               blk_mq_start_stopped_hw_queues(q, true);
-}
-
 static inline blk_status_t null_handle_throttled(struct nullb_cmd *cmd)
 {
        struct nullb_device *dev = cmd->nq->dev;
        struct nullb *nullb = dev->nullb;
        blk_status_t sts = BLK_STS_OK;
-       struct request *rq = cmd->rq;
+       struct request *rq = blk_mq_rq_from_pdu(cmd);
 
        if (!hrtimer_active(&nullb->bw_timer))
                hrtimer_restart(&nullb->bw_timer);
 
        if (atomic_long_sub_return(blk_rq_bytes(rq), &nullb->cur_bytes) < 0) {
-               null_stop_queue(nullb);
+               blk_mq_stop_hw_queues(nullb->q);
                /* race with timer */
                if (atomic_long_read(&nullb->cur_bytes) > 0)
-                       null_restart_queue_async(nullb);
+                       blk_mq_start_stopped_hw_queues(nullb->q, true);
                /* requeue request */
                sts = BLK_STS_DEV_RESOURCE;
        }
@@ -1381,37 +1270,29 @@ static inline blk_status_t null_handle_memory_backed(struct nullb_cmd *cmd,
                                                     sector_t nr_sectors)
 {
        struct nullb_device *dev = cmd->nq->dev;
-       int err;
 
        if (op == REQ_OP_DISCARD)
                return null_handle_discard(dev, sector, nr_sectors);
+       return errno_to_blk_status(null_handle_rq(cmd));
 
-       if (dev->queue_mode == NULL_Q_BIO)
-               err = null_handle_bio(cmd);
-       else
-               err = null_handle_rq(cmd);
-
-       return errno_to_blk_status(err);
 }
 
 static void nullb_zero_read_cmd_buffer(struct nullb_cmd *cmd)
 {
+       struct request *rq = blk_mq_rq_from_pdu(cmd);
        struct nullb_device *dev = cmd->nq->dev;
        struct bio *bio;
 
-       if (dev->memory_backed)
-               return;
-
-       if (dev->queue_mode == NULL_Q_BIO && bio_op(cmd->bio) == REQ_OP_READ) {
-               zero_fill_bio(cmd->bio);
-       } else if (req_op(cmd->rq) == REQ_OP_READ) {
-               __rq_for_each_bio(bio, cmd->rq)
+       if (!dev->memory_backed && req_op(rq) == REQ_OP_READ) {
+               __rq_for_each_bio(bio, rq)
                        zero_fill_bio(bio);
        }
 }
 
 static inline void nullb_complete_cmd(struct nullb_cmd *cmd)
 {
+       struct request *rq = blk_mq_rq_from_pdu(cmd);
+
        /*
         * Since root privileges are required to configure the null_blk
         * driver, it is fine that this driver does not initialize the
@@ -1425,20 +1306,10 @@ static inline void nullb_complete_cmd(struct nullb_cmd *cmd)
        /* Complete IO by inline, softirq or timer */
        switch (cmd->nq->dev->irqmode) {
        case NULL_IRQ_SOFTIRQ:
-               switch (cmd->nq->dev->queue_mode) {
-               case NULL_Q_MQ:
-                       blk_mq_complete_request(cmd->rq);
-                       break;
-               case NULL_Q_BIO:
-                       /*
-                        * XXX: no proper submitting cpu information available.
-                        */
-                       end_cmd(cmd);
-                       break;
-               }
+               blk_mq_complete_request(rq);
                break;
        case NULL_IRQ_NONE:
-               end_cmd(cmd);
+               blk_mq_end_request(rq, cmd->error);
                break;
        case NULL_IRQ_TIMER:
                null_cmd_end_timer(cmd);
@@ -1499,7 +1370,7 @@ static enum hrtimer_restart nullb_bwtimer_fn(struct hrtimer *timer)
                return HRTIMER_NORESTART;
 
        atomic_long_set(&nullb->cur_bytes, mb_per_tick(mbps));
-       null_restart_queue_async(nullb);
+       blk_mq_start_stopped_hw_queues(nullb->q, true);
 
        hrtimer_forward_now(&nullb->bw_timer, timer_interval);
 
@@ -1516,26 +1387,6 @@ static void nullb_setup_bwtimer(struct nullb *nullb)
        hrtimer_start(&nullb->bw_timer, timer_interval, HRTIMER_MODE_REL);
 }
 
-static struct nullb_queue *nullb_to_queue(struct nullb *nullb)
-{
-       int index = 0;
-
-       if (nullb->nr_queues != 1)
-               index = raw_smp_processor_id() / ((nr_cpu_ids + nullb->nr_queues - 1) / nullb->nr_queues);
-
-       return &nullb->queues[index];
-}
-
-static void null_submit_bio(struct bio *bio)
-{
-       sector_t sector = bio->bi_iter.bi_sector;
-       sector_t nr_sectors = bio_sectors(bio);
-       struct nullb *nullb = bio->bi_bdev->bd_disk->private_data;
-       struct nullb_queue *nq = nullb_to_queue(nullb);
-
-       null_handle_cmd(alloc_cmd(nq, bio), sector, nr_sectors, bio_op(bio));
-}
-
 #ifdef CONFIG_BLK_DEV_NULL_BLK_FAULT_INJECTION
 
 static bool should_timeout_request(struct request *rq)
@@ -1655,7 +1506,7 @@ static int null_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob)
                                                blk_rq_sectors(req));
                if (!blk_mq_add_to_batch(req, iob, (__force int) cmd->error,
                                        blk_mq_end_request_batch))
-                       end_cmd(cmd);
+                       blk_mq_end_request(req, cmd->error);
                nr++;
        }
 
@@ -1711,7 +1562,6 @@ static blk_status_t null_queue_rq(struct blk_mq_hw_ctx *hctx,
                hrtimer_init(&cmd->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
                cmd->timer.function = null_cmd_timer_expired;
        }
-       cmd->rq = rq;
        cmd->error = BLK_STS_OK;
        cmd->nq = nq;
        cmd->fake_timeout = should_timeout_request(rq) ||
@@ -1770,34 +1620,8 @@ static void null_queue_rqs(struct request **rqlist)
        *rqlist = requeue_list;
 }
 
-static void cleanup_queue(struct nullb_queue *nq)
-{
-       bitmap_free(nq->tag_map);
-       kfree(nq->cmds);
-}
-
-static void cleanup_queues(struct nullb *nullb)
-{
-       int i;
-
-       for (i = 0; i < nullb->nr_queues; i++)
-               cleanup_queue(&nullb->queues[i]);
-
-       kfree(nullb->queues);
-}
-
-static void null_exit_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
-{
-       struct nullb_queue *nq = hctx->driver_data;
-       struct nullb *nullb = nq->dev->nullb;
-
-       nullb->nr_queues--;
-}
-
 static void null_init_queue(struct nullb *nullb, struct nullb_queue *nq)
 {
-       init_waitqueue_head(&nq->wait);
-       nq->queue_depth = nullb->queue_depth;
        nq->dev = nullb->dev;
        INIT_LIST_HEAD(&nq->poll_list);
        spin_lock_init(&nq->poll_lock);
@@ -1815,7 +1639,6 @@ static int null_init_hctx(struct blk_mq_hw_ctx *hctx, void *driver_data,
        nq = &nullb->queues[hctx_idx];
        hctx->driver_data = nq;
        null_init_queue(nullb, nq);
-       nullb->nr_queues++;
 
        return 0;
 }
@@ -1828,7 +1651,6 @@ static const struct blk_mq_ops null_mq_ops = {
        .poll           = null_poll,
        .map_queues     = null_map_queues,
        .init_hctx      = null_init_hctx,
-       .exit_hctx      = null_exit_hctx,
 };
 
 static void null_del_dev(struct nullb *nullb)
@@ -1849,21 +1671,20 @@ static void null_del_dev(struct nullb *nullb)
        if (test_bit(NULLB_DEV_FL_THROTTLED, &nullb->dev->flags)) {
                hrtimer_cancel(&nullb->bw_timer);
                atomic_long_set(&nullb->cur_bytes, LONG_MAX);
-               null_restart_queue_async(nullb);
+               blk_mq_start_stopped_hw_queues(nullb->q, true);
        }
 
        put_disk(nullb->disk);
-       if (dev->queue_mode == NULL_Q_MQ &&
-           nullb->tag_set == &nullb->__tag_set)
+       if (nullb->tag_set == &nullb->__tag_set)
                blk_mq_free_tag_set(nullb->tag_set);
-       cleanup_queues(nullb);
+       kfree(nullb->queues);
        if (null_cache_active(nullb))
                null_free_device_storage(nullb->dev, true);
        kfree(nullb);
        dev->nullb = NULL;
 }
 
-static void null_config_discard(struct nullb *nullb)
+static void null_config_discard(struct nullb *nullb, struct queue_limits *lim)
 {
        if (nullb->dev->discard == false)
                return;
@@ -1880,43 +1701,14 @@ static void null_config_discard(struct nullb *nullb)
                return;
        }
 
-       blk_queue_max_discard_sectors(nullb->q, UINT_MAX >> 9);
+       lim->max_hw_discard_sectors = UINT_MAX >> 9;
 }
 
-static const struct block_device_operations null_bio_ops = {
-       .owner          = THIS_MODULE,
-       .submit_bio     = null_submit_bio,
-       .report_zones   = null_report_zones,
-};
-
-static const struct block_device_operations null_rq_ops = {
+static const struct block_device_operations null_ops = {
        .owner          = THIS_MODULE,
        .report_zones   = null_report_zones,
 };
 
-static int setup_commands(struct nullb_queue *nq)
-{
-       struct nullb_cmd *cmd;
-       int i;
-
-       nq->cmds = kcalloc(nq->queue_depth, sizeof(*cmd), GFP_KERNEL);
-       if (!nq->cmds)
-               return -ENOMEM;
-
-       nq->tag_map = bitmap_zalloc(nq->queue_depth, GFP_KERNEL);
-       if (!nq->tag_map) {
-               kfree(nq->cmds);
-               return -ENOMEM;
-       }
-
-       for (i = 0; i < nq->queue_depth; i++) {
-               cmd = &nq->cmds[i];
-               cmd->tag = -1U;
-       }
-
-       return 0;
-}
-
 static int setup_queues(struct nullb *nullb)
 {
        int nqueues = nr_cpu_ids;
@@ -1929,101 +1721,66 @@ static int setup_queues(struct nullb *nullb)
        if (!nullb->queues)
                return -ENOMEM;
 
-       nullb->queue_depth = nullb->dev->hw_queue_depth;
        return 0;
 }
 
-static int init_driver_queues(struct nullb *nullb)
+static int null_init_tag_set(struct blk_mq_tag_set *set, int poll_queues)
 {
-       struct nullb_queue *nq;
-       int i, ret = 0;
-
-       for (i = 0; i < nullb->dev->submit_queues; i++) {
-               nq = &nullb->queues[i];
-
-               null_init_queue(nullb, nq);
-
-               ret = setup_commands(nq);
-               if (ret)
-                       return ret;
-               nullb->nr_queues++;
+       set->ops = &null_mq_ops;
+       set->cmd_size = sizeof(struct nullb_cmd);
+       set->timeout = 5 * HZ;
+       set->nr_maps = 1;
+       if (poll_queues) {
+               set->nr_hw_queues += poll_queues;
+               set->nr_maps += 2;
        }
-       return 0;
+       return blk_mq_alloc_tag_set(set);
 }
 
-static int null_gendisk_register(struct nullb *nullb)
+static int null_init_global_tag_set(void)
 {
-       sector_t size = ((sector_t)nullb->dev->size * SZ_1M) >> SECTOR_SHIFT;
-       struct gendisk *disk = nullb->disk;
+       int error;
 
-       set_capacity(disk, size);
-
-       disk->major             = null_major;
-       disk->first_minor       = nullb->index;
-       disk->minors            = 1;
-       if (queue_is_mq(nullb->q))
-               disk->fops              = &null_rq_ops;
-       else
-               disk->fops              = &null_bio_ops;
-       disk->private_data      = nullb;
-       strscpy_pad(disk->disk_name, nullb->disk_name, DISK_NAME_LEN);
+       if (tag_set.ops)
+               return 0;
 
-       if (nullb->dev->zoned) {
-               int ret = null_register_zoned_dev(nullb);
+       tag_set.nr_hw_queues = g_submit_queues;
+       tag_set.queue_depth = g_hw_queue_depth;
+       tag_set.numa_node = g_home_node;
+       tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
+       if (g_no_sched)
+               tag_set.flags |= BLK_MQ_F_NO_SCHED;
+       if (g_shared_tag_bitmap)
+               tag_set.flags |= BLK_MQ_F_TAG_HCTX_SHARED;
+       if (g_blocking)
+               tag_set.flags |= BLK_MQ_F_BLOCKING;
 
-               if (ret)
-                       return ret;
-       }
-
-       return add_disk(disk);
+       error = null_init_tag_set(&tag_set, g_poll_queues);
+       if (error)
+               tag_set.ops = NULL;
+       return error;
 }
 
-static int null_init_tag_set(struct nullb *nullb, struct blk_mq_tag_set *set)
+static int null_setup_tagset(struct nullb *nullb)
 {
-       unsigned int flags = BLK_MQ_F_SHOULD_MERGE;
-       int hw_queues, numa_node;
-       unsigned int queue_depth;
-       int poll_queues;
-
-       if (nullb) {
-               hw_queues = nullb->dev->submit_queues;
-               poll_queues = nullb->dev->poll_queues;
-               queue_depth = nullb->dev->hw_queue_depth;
-               numa_node = nullb->dev->home_node;
-               if (nullb->dev->no_sched)
-                       flags |= BLK_MQ_F_NO_SCHED;
-               if (nullb->dev->shared_tag_bitmap)
-                       flags |= BLK_MQ_F_TAG_HCTX_SHARED;
-               if (nullb->dev->blocking)
-                       flags |= BLK_MQ_F_BLOCKING;
-       } else {
-               hw_queues = g_submit_queues;
-               poll_queues = g_poll_queues;
-               queue_depth = g_hw_queue_depth;
-               numa_node = g_home_node;
-               if (g_no_sched)
-                       flags |= BLK_MQ_F_NO_SCHED;
-               if (g_shared_tag_bitmap)
-                       flags |= BLK_MQ_F_TAG_HCTX_SHARED;
-               if (g_blocking)
-                       flags |= BLK_MQ_F_BLOCKING;
-       }
-
-       set->ops = &null_mq_ops;
-       set->cmd_size   = sizeof(struct nullb_cmd);
-       set->flags = flags;
-       set->driver_data = nullb;
-       set->nr_hw_queues = hw_queues;
-       set->queue_depth = queue_depth;
-       set->numa_node = numa_node;
-       if (poll_queues) {
-               set->nr_hw_queues += poll_queues;
-               set->nr_maps = 3;
-       } else {
-               set->nr_maps = 1;
+       if (nullb->dev->shared_tags) {
+               nullb->tag_set = &tag_set;
+               return null_init_global_tag_set();
        }
 
-       return blk_mq_alloc_tag_set(set);
+       nullb->tag_set = &nullb->__tag_set;
+       nullb->tag_set->driver_data = nullb;
+       nullb->tag_set->nr_hw_queues = nullb->dev->submit_queues;
+       nullb->tag_set->queue_depth = nullb->dev->hw_queue_depth;
+       nullb->tag_set->numa_node = nullb->dev->home_node;
+       nullb->tag_set->flags = BLK_MQ_F_SHOULD_MERGE;
+       if (nullb->dev->no_sched)
+               nullb->tag_set->flags |= BLK_MQ_F_NO_SCHED;
+       if (nullb->dev->shared_tag_bitmap)
+               nullb->tag_set->flags |= BLK_MQ_F_TAG_HCTX_SHARED;
+       if (nullb->dev->blocking)
+               nullb->tag_set->flags |= BLK_MQ_F_BLOCKING;
+       return null_init_tag_set(nullb->tag_set, nullb->dev->poll_queues);
 }
 
 static int null_validate_conf(struct nullb_device *dev)
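
null_init_global_tag_set() turns the old boot-time shared tag set into a lazily initialized one, using tag_set.ops as the "already initialized" marker and clearing it again on failure so a later device can retry (null_exit() likewise frees the set only when .ops is non-NULL). Sketched in isolation, with illustrative sizing:

#include <linux/blk-mq.h>

static struct blk_mq_tag_set foo_shared_set;

static int foo_init_shared_set(const struct blk_mq_ops *ops)
{
	int error;

	if (foo_shared_set.ops)
		return 0;	/* a previous device already set it up */

	foo_shared_set.ops = ops;
	foo_shared_set.nr_hw_queues = 1;
	foo_shared_set.queue_depth = 64;
	foo_shared_set.numa_node = NUMA_NO_NODE;
	error = blk_mq_alloc_tag_set(&foo_shared_set);
	if (error)
		foo_shared_set.ops = NULL;	/* allow a retry later */
	return error;
}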
@@ -2032,11 +1789,15 @@ static int null_validate_conf(struct nullb_device *dev)
                pr_err("legacy IO path is no longer available\n");
                return -EINVAL;
        }
+       if (dev->queue_mode == NULL_Q_BIO) {
+               pr_err("BIO-based IO path is no longer available, using blk-mq instead.\n");
+               dev->queue_mode = NULL_Q_MQ;
+       }
 
        dev->blocksize = round_down(dev->blocksize, 512);
        dev->blocksize = clamp_t(unsigned int, dev->blocksize, 512, 4096);
 
-       if (dev->queue_mode == NULL_Q_MQ && dev->use_per_node_hctx) {
+       if (dev->use_per_node_hctx) {
                if (dev->submit_queues != nr_online_nodes)
                        dev->submit_queues = nr_online_nodes;
        } else if (dev->submit_queues > nr_cpu_ids)
@@ -2048,8 +1809,6 @@ static int null_validate_conf(struct nullb_device *dev)
        if (dev->poll_queues > g_poll_queues)
                dev->poll_queues = g_poll_queues;
        dev->prev_poll_queues = dev->poll_queues;
-
-       dev->queue_mode = min_t(unsigned int, dev->queue_mode, NULL_Q_MQ);
        dev->irqmode = min_t(unsigned int, dev->irqmode, NULL_IRQ_TIMER);
 
        /* Do memory allocation, so set blocking */
@@ -2060,9 +1819,6 @@ static int null_validate_conf(struct nullb_device *dev)
        dev->cache_size = min_t(unsigned long, ULONG_MAX / 1024 / 1024,
                                                dev->cache_size);
        dev->mbps = min_t(unsigned int, 1024 * 40, dev->mbps);
-       /* can not stop a queue */
-       if (dev->queue_mode == NULL_Q_BIO)
-               dev->mbps = 0;
 
        if (dev->zoned &&
            (!dev->zone_size || !is_power_of_2(dev->zone_size))) {
@@ -2102,6 +1858,12 @@ static bool null_setup_fault(void)
 
 static int null_add_dev(struct nullb_device *dev)
 {
+       struct queue_limits lim = {
+               .logical_block_size     = dev->blocksize,
+               .physical_block_size    = dev->blocksize,
+               .max_hw_sectors         = dev->max_sectors,
+       };
+
        struct nullb *nullb;
        int rv;
 
@@ -2123,36 +1885,25 @@ static int null_add_dev(struct nullb_device *dev)
        if (rv)
                goto out_free_nullb;
 
-       if (dev->queue_mode == NULL_Q_MQ) {
-               if (shared_tags) {
-                       nullb->tag_set = &tag_set;
-                       rv = 0;
-               } else {
-                       nullb->tag_set = &nullb->__tag_set;
-                       rv = null_init_tag_set(nullb, nullb->tag_set);
-               }
+       rv = null_setup_tagset(nullb);
+       if (rv)
+               goto out_cleanup_queues;
 
+       if (dev->virt_boundary)
+               lim.virt_boundary_mask = PAGE_SIZE - 1;
+       null_config_discard(nullb, &lim);
+       if (dev->zoned) {
+               rv = null_init_zoned_dev(dev, &lim);
                if (rv)
-                       goto out_cleanup_queues;
-
-               nullb->tag_set->timeout = 5 * HZ;
-               nullb->disk = blk_mq_alloc_disk(nullb->tag_set, nullb);
-               if (IS_ERR(nullb->disk)) {
-                       rv = PTR_ERR(nullb->disk);
                        goto out_cleanup_tags;
-               }
-               nullb->q = nullb->disk->queue;
-       } else if (dev->queue_mode == NULL_Q_BIO) {
-               rv = -ENOMEM;
-               nullb->disk = blk_alloc_disk(nullb->dev->home_node);
-               if (!nullb->disk)
-                       goto out_cleanup_queues;
+       }
 
-               nullb->q = nullb->disk->queue;
-               rv = init_driver_queues(nullb);
-               if (rv)
-                       goto out_cleanup_disk;
+       nullb->disk = blk_mq_alloc_disk(nullb->tag_set, &lim, nullb);
+       if (IS_ERR(nullb->disk)) {
+               rv = PTR_ERR(nullb->disk);
+               goto out_cleanup_zone;
        }
+       nullb->q = nullb->disk->queue;
 
        if (dev->mbps) {
                set_bit(NULLB_DEV_FL_THROTTLED, &dev->flags);
@@ -2164,12 +1915,6 @@ static int null_add_dev(struct nullb_device *dev)
                blk_queue_write_cache(nullb->q, true, true);
        }
 
-       if (dev->zoned) {
-               rv = null_init_zoned_dev(dev, nullb->q);
-               if (rv)
-                       goto out_cleanup_disk;
-       }
-
        nullb->q->queuedata = nullb;
        blk_queue_flag_set(QUEUE_FLAG_NONROT, nullb->q);
 
@@ -2177,22 +1922,12 @@ static int null_add_dev(struct nullb_device *dev)
        rv = ida_alloc(&nullb_indexes, GFP_KERNEL);
        if (rv < 0) {
                mutex_unlock(&lock);
-               goto out_cleanup_zone;
+               goto out_cleanup_disk;
        }
        nullb->index = rv;
        dev->index = rv;
        mutex_unlock(&lock);
 
-       blk_queue_logical_block_size(nullb->q, dev->blocksize);
-       blk_queue_physical_block_size(nullb->q, dev->blocksize);
-       if (dev->max_sectors)
-               blk_queue_max_hw_sectors(nullb->q, dev->max_sectors);
-
-       if (dev->virt_boundary)
-               blk_queue_virt_boundary(nullb->q, PAGE_SIZE - 1);
-
-       null_config_discard(nullb);
-
        if (config_item_name(&dev->group.cg_item)) {
                /* Use configfs dir name as the device name */
                snprintf(nullb->disk_name, sizeof(nullb->disk_name),
@@ -2201,7 +1936,22 @@ static int null_add_dev(struct nullb_device *dev)
                sprintf(nullb->disk_name, "nullb%d", nullb->index);
        }
 
-       rv = null_gendisk_register(nullb);
+       set_capacity(nullb->disk,
+               ((sector_t)nullb->dev->size * SZ_1M) >> SECTOR_SHIFT);
+       nullb->disk->major = null_major;
+       nullb->disk->first_minor = nullb->index;
+       nullb->disk->minors = 1;
+       nullb->disk->fops = &null_ops;
+       nullb->disk->private_data = nullb;
+       strscpy_pad(nullb->disk->disk_name, nullb->disk_name, DISK_NAME_LEN);
+
+       if (nullb->dev->zoned) {
+               rv = null_register_zoned_dev(nullb);
+               if (rv)
+                       goto out_ida_free;
+       }
+
+       rv = add_disk(nullb->disk);
        if (rv)
                goto out_ida_free;
 
@@ -2220,10 +1970,10 @@ out_cleanup_zone:
 out_cleanup_disk:
        put_disk(nullb->disk);
 out_cleanup_tags:
-       if (dev->queue_mode == NULL_Q_MQ && nullb->tag_set == &nullb->__tag_set)
+       if (nullb->tag_set == &nullb->__tag_set)
                blk_mq_free_tag_set(nullb->tag_set);
 out_cleanup_queues:
-       cleanup_queues(nullb);
+       kfree(nullb->queues);
 out_free_nullb:
        kfree(nullb);
        dev->nullb = NULL;
@@ -2299,7 +2049,7 @@ static int __init null_init(void)
                return -EINVAL;
        }
 
-       if (g_queue_mode == NULL_Q_MQ && g_use_per_node_hctx) {
+       if (g_use_per_node_hctx) {
                if (g_submit_queues != nr_online_nodes) {
                        pr_warn("submit_queues param is set to %u.\n",
                                nr_online_nodes);
@@ -2311,18 +2061,12 @@ static int __init null_init(void)
                g_submit_queues = 1;
        }
 
-       if (g_queue_mode == NULL_Q_MQ && shared_tags) {
-               ret = null_init_tag_set(NULL, &tag_set);
-               if (ret)
-                       return ret;
-       }
-
        config_group_init(&nullb_subsys.su_group);
        mutex_init(&nullb_subsys.su_mutex);
 
        ret = configfs_register_subsystem(&nullb_subsys);
        if (ret)
-               goto err_tagset;
+               return ret;
 
        mutex_init(&lock);
 
@@ -2349,9 +2093,6 @@ err_dev:
        unregister_blkdev(null_major, "nullb");
 err_conf:
        configfs_unregister_subsystem(&nullb_subsys);
-err_tagset:
-       if (g_queue_mode == NULL_Q_MQ && shared_tags)
-               blk_mq_free_tag_set(&tag_set);
        return ret;
 }
 
@@ -2370,7 +2111,7 @@ static void __exit null_exit(void)
        }
        mutex_unlock(&lock);
 
-       if (g_queue_mode == NULL_Q_MQ && shared_tags)
+       if (tag_set.ops)
                blk_mq_free_tag_set(&tag_set);
 }
 
index 929f659dd255b7068b74453099d46f965df7b30f..477b9774682346b95becee2afd9a7a1a5559af1b 100644 (file)
 #include <linux/mutex.h>
 
 struct nullb_cmd {
-       union {
-               struct request *rq;
-               struct bio *bio;
-       };
-       unsigned int tag;
        blk_status_t error;
        bool fake_timeout;
        struct nullb_queue *nq;
@@ -28,16 +23,11 @@ struct nullb_cmd {
 };
 
 struct nullb_queue {
-       unsigned long *tag_map;
-       wait_queue_head_t wait;
-       unsigned int queue_depth;
        struct nullb_device *dev;
        unsigned int requeue_selection;
 
        struct list_head poll_list;
        spinlock_t poll_lock;
-
-       struct nullb_cmd *cmds;
 };
 
 struct nullb_zone {
@@ -60,13 +50,6 @@ struct nullb_zone {
        unsigned int capacity;
 };
 
-/* Queue modes */
-enum {
-       NULL_Q_BIO      = 0,
-       NULL_Q_RQ       = 1,
-       NULL_Q_MQ       = 2,
-};
-
 struct nullb_device {
        struct nullb *nullb;
        struct config_group group;
@@ -119,6 +102,7 @@ struct nullb_device {
        bool zoned; /* if device is zoned */
        bool virt_boundary; /* virtual boundary on/off for the device */
        bool no_sched; /* no IO scheduler for the device */
+       bool shared_tags; /* share tag set between devices for blk-mq */
        bool shared_tag_bitmap; /* use hostwide shared tags */
 };
 
@@ -130,14 +114,12 @@ struct nullb {
        struct gendisk *disk;
        struct blk_mq_tag_set *tag_set;
        struct blk_mq_tag_set __tag_set;
-       unsigned int queue_depth;
        atomic_long_t cur_bytes;
        struct hrtimer bw_timer;
        unsigned long cache_flush_pos;
        spinlock_t lock;
 
        struct nullb_queue *queues;
-       unsigned int nr_queues;
        char disk_name[DISK_NAME_LEN];
 };
 
@@ -147,7 +129,7 @@ blk_status_t null_process_cmd(struct nullb_cmd *cmd, enum req_op op,
                              sector_t sector, unsigned int nr_sectors);
 
 #ifdef CONFIG_BLK_DEV_ZONED
-int null_init_zoned_dev(struct nullb_device *dev, struct request_queue *q);
+int null_init_zoned_dev(struct nullb_device *dev, struct queue_limits *lim);
 int null_register_zoned_dev(struct nullb *nullb);
 void null_free_zoned_dev(struct nullb_device *dev);
 int null_report_zones(struct gendisk *disk, sector_t sector,
@@ -160,7 +142,7 @@ ssize_t zone_cond_store(struct nullb_device *dev, const char *page,
                        size_t count, enum blk_zone_cond cond);
 #else
 static inline int null_init_zoned_dev(struct nullb_device *dev,
-                                     struct request_queue *q)
+               struct queue_limits *lim)
 {
        pr_err("CONFIG_BLK_DEV_ZONED not enabled\n");
        return -EINVAL;
index 6b2b370e786f5f6c6a4fc4eeb9b75f4c96fd3be6..ef2d05d5f0df7ea0784a35ffc566fc8f686efee4 100644 (file)
@@ -41,10 +41,11 @@ TRACE_EVENT(nullb_zone_op,
                __field(unsigned int, zone_cond)
            ),
            TP_fast_assign(
-               __entry->op = req_op(cmd->rq);
+               __entry->op = req_op(blk_mq_rq_from_pdu(cmd));
                __entry->zone_no = zone_no;
                __entry->zone_cond = zone_cond;
-               __assign_disk_name(__entry->disk, cmd->rq->q->disk);
+               __assign_disk_name(__entry->disk,
+                       blk_mq_rq_from_pdu(cmd)->q->disk);
            ),
            TP_printk("%s req=%-15s zone_no=%u zone_cond=%-10s",
                      __print_disk_name(__entry->disk),
index 6f5e0994862eaedb65273513be975affd4e30fba..1689e25841048355f5f3babf4f9c9e8bbca5cd1d 100644 (file)
@@ -58,7 +58,8 @@ static inline void null_unlock_zone(struct nullb_device *dev,
                mutex_unlock(&zone->mutex);
 }
 
-int null_init_zoned_dev(struct nullb_device *dev, struct request_queue *q)
+int null_init_zoned_dev(struct nullb_device *dev,
+                       struct queue_limits *lim)
 {
        sector_t dev_capacity_sects, zone_capacity_sects;
        struct nullb_zone *zone;
@@ -151,27 +152,22 @@ int null_init_zoned_dev(struct nullb_device *dev, struct request_queue *q)
                sector += dev->zone_size_sects;
        }
 
+       lim->zoned = true;
+       lim->chunk_sectors = dev->zone_size_sects;
+       lim->max_zone_append_sectors = dev->zone_size_sects;
+       lim->max_open_zones = dev->zone_max_open;
+       lim->max_active_zones = dev->zone_max_active;
        return 0;
 }
 
 int null_register_zoned_dev(struct nullb *nullb)
 {
-       struct nullb_device *dev = nullb->dev;
        struct request_queue *q = nullb->q;
 
-       disk_set_zoned(nullb->disk);
        blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, q);
        blk_queue_required_elevator_features(q, ELEVATOR_F_ZBD_SEQ_WRITE);
-       blk_queue_chunk_sectors(q, dev->zone_size_sects);
        nullb->disk->nr_zones = bdev_nr_zones(nullb->disk->part0);
-       blk_queue_max_zone_append_sectors(q, dev->zone_size_sects);
-       disk_set_max_open_zones(nullb->disk, dev->zone_max_open);
-       disk_set_max_active_zones(nullb->disk, dev->zone_max_active);
-
-       if (queue_is_mq(q))
-               return blk_revalidate_disk_zones(nullb->disk, NULL);
-
-       return 0;
+       return blk_revalidate_disk_zones(nullb->disk, NULL);
 }
 
 void null_free_zoned_dev(struct nullb_device *dev)
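
null_init_zoned_dev() now deposits the zone geometry in the caller's queue_limits, and null_register_zoned_dev() shrinks to the steps that still need a live disk. The limits half, sketched with illustrative parameters:

#include <linux/blkdev.h>

static void foo_fill_zoned_limits(struct queue_limits *lim,
				  sector_t zone_size_sects)
{
	lim->zoned = true;
	lim->chunk_sectors = zone_size_sects;		/* zone size */
	lim->max_zone_append_sectors = zone_size_sects;
	lim->max_open_zones = 0;	/* 0: no driver-imposed cap */
	lim->max_active_zones = 0;
}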
@@ -394,10 +390,7 @@ static blk_status_t null_zone_write(struct nullb_cmd *cmd, sector_t sector,
         */
        if (append) {
                sector = zone->wp;
-               if (dev->queue_mode == NULL_Q_MQ)
-                       cmd->rq->__sector = sector;
-               else
-                       cmd->bio->bi_iter.bi_sector = sector;
+               blk_mq_rq_from_pdu(cmd)->__sector = sector;
        } else if (sector != zone->wp) {
                ret = BLK_STS_IOERR;
                goto unlock;
index d56d972aadb36fa2a3b85282eef1ea058a9bf73e..21728e9ea5c374603b50b758282d9091329e2e82 100644 (file)
@@ -340,8 +340,8 @@ static ssize_t device_map_show(const struct class *c, const struct class_attribu
                n += sysfs_emit_at(data, n, "%s %u:%u %u:%u\n",
                        pd->disk->disk_name,
                        MAJOR(pd->pkt_dev), MINOR(pd->pkt_dev),
-                       MAJOR(pd->bdev_handle->bdev->bd_dev),
-                       MINOR(pd->bdev_handle->bdev->bd_dev));
+                       MAJOR(file_bdev(pd->bdev_file)->bd_dev),
+                       MINOR(file_bdev(pd->bdev_file)->bd_dev));
        }
        mutex_unlock(&ctl_mutex);
        return n;
@@ -438,7 +438,7 @@ static int pkt_seq_show(struct seq_file *m, void *p)
        int states[PACKET_NUM_STATES];
 
        seq_printf(m, "Writer %s mapped to %pg:\n", pd->disk->disk_name,
-                  pd->bdev_handle->bdev);
+                  file_bdev(pd->bdev_file));
 
        seq_printf(m, "\nSettings:\n");
        seq_printf(m, "\tpacket size:\t\t%dkB\n", pd->settings.size / 2);
@@ -715,7 +715,7 @@ static void pkt_rbtree_insert(struct pktcdvd_device *pd, struct pkt_rb_node *nod
  */
 static int pkt_generic_packet(struct pktcdvd_device *pd, struct packet_command *cgc)
 {
-       struct request_queue *q = bdev_get_queue(pd->bdev_handle->bdev);
+       struct request_queue *q = bdev_get_queue(file_bdev(pd->bdev_file));
        struct scsi_cmnd *scmd;
        struct request *rq;
        int ret = 0;
@@ -828,6 +828,12 @@ static noinline_for_stack int pkt_set_speed(struct pktcdvd_device *pd,
  */
 static void pkt_queue_bio(struct pktcdvd_device *pd, struct bio *bio)
 {
+       /*
+        * Some CDRW drives cannot handle writes larger than one packet,
+        * even if the size is a multiple of the packet size.
+        */
+       bio->bi_opf |= REQ_NOMERGE;
+
        spin_lock(&pd->iosched.lock);
        if (bio_data_dir(bio) == READ)
                bio_list_add(&pd->iosched.read_queue, bio);
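
Rather than clamping the CD drive's queue to one packet with blk_queue_max_hw_sectors() at open time, pktcdvd now tags every bio it queues with REQ_NOMERGE, so the block layer never grows a write past a single packet and the underlying device's limits are left alone. The flag in isolation, on a hypothetical submit path:

#include <linux/bio.h>

static void foo_submit_packet(struct bio *bio)
{
	bio->bi_opf |= REQ_NOMERGE;	/* never merge this bio with others */
	submit_bio(bio);
}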
@@ -1048,7 +1054,7 @@ static void pkt_gather_data(struct pktcdvd_device *pd, struct packet_data *pkt)
                        continue;
 
                bio = pkt->r_bios[f];
-               bio_init(bio, pd->bdev_handle->bdev, bio->bi_inline_vecs, 1,
+               bio_init(bio, file_bdev(pd->bdev_file), bio->bi_inline_vecs, 1,
                         REQ_OP_READ);
                bio->bi_iter.bi_sector = pkt->sector + f * (CD_FRAMESIZE >> 9);
                bio->bi_end_io = pkt_end_io_read;
@@ -1264,7 +1270,7 @@ static void pkt_start_write(struct pktcdvd_device *pd, struct packet_data *pkt)
        struct device *ddev = disk_to_dev(pd->disk);
        int f;
 
-       bio_init(pkt->w_bio, pd->bdev_handle->bdev, pkt->w_bio->bi_inline_vecs,
+       bio_init(pkt->w_bio, file_bdev(pd->bdev_file), pkt->w_bio->bi_inline_vecs,
                 pkt->frames, REQ_OP_WRITE);
        pkt->w_bio->bi_iter.bi_sector = pkt->sector;
        pkt->w_bio->bi_end_io = pkt_end_io_packet_write;
@@ -2162,20 +2168,20 @@ static int pkt_open_dev(struct pktcdvd_device *pd, bool write)
        int ret;
        long lba;
        struct request_queue *q;
-       struct bdev_handle *bdev_handle;
+       struct file *bdev_file;
 
        /*
         * We need to re-open the cdrom device without O_NONBLOCK to be able
         * to read/write from/to it. It is already opened in O_NONBLOCK mode
         * so open should not fail.
         */
-       bdev_handle = bdev_open_by_dev(pd->bdev_handle->bdev->bd_dev,
+       bdev_file = bdev_file_open_by_dev(file_bdev(pd->bdev_file)->bd_dev,
                                       BLK_OPEN_READ, pd, NULL);
-       if (IS_ERR(bdev_handle)) {
-               ret = PTR_ERR(bdev_handle);
+       if (IS_ERR(bdev_file)) {
+               ret = PTR_ERR(bdev_file);
                goto out;
        }
-       pd->open_bdev_handle = bdev_handle;
+       pd->f_open_bdev = bdev_file;
 
        ret = pkt_get_last_written(pd, &lba);
        if (ret) {
@@ -2184,18 +2190,13 @@ static int pkt_open_dev(struct pktcdvd_device *pd, bool write)
        }
 
        set_capacity(pd->disk, lba << 2);
-       set_capacity_and_notify(pd->bdev_handle->bdev->bd_disk, lba << 2);
+       set_capacity_and_notify(file_bdev(pd->bdev_file)->bd_disk, lba << 2);
 
-       q = bdev_get_queue(pd->bdev_handle->bdev);
+       q = bdev_get_queue(file_bdev(pd->bdev_file));
        if (write) {
                ret = pkt_open_write(pd);
                if (ret)
                        goto out_putdev;
-               /*
-                * Some CDRW drives can not handle writes larger than one packet,
-                * even if the size is a multiple of the packet size.
-                */
-               blk_queue_max_hw_sectors(q, pd->settings.size);
                set_bit(PACKET_WRITABLE, &pd->flags);
        } else {
                pkt_set_speed(pd, MAX_SPEED, MAX_SPEED);
@@ -2218,7 +2219,7 @@ static int pkt_open_dev(struct pktcdvd_device *pd, bool write)
        return 0;
 
 out_putdev:
-       bdev_release(bdev_handle);
+       fput(bdev_file);
 out:
        return ret;
 }
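
pkt_open_dev() shows the whole bdev_handle to bdev_file conversion in one place: block devices are now opened as a struct file via bdev_file_open_by_dev(), the struct block_device is recovered with file_bdev(), and the reference is dropped with a plain fput(). A compact sketch, hypothetical names:

#include <linux/blkdev.h>
#include <linux/file.h>

static int foo_peek_bdev(dev_t devt, void *holder)
{
	struct file *bdev_file;

	bdev_file = bdev_file_open_by_dev(devt, BLK_OPEN_READ, holder, NULL);
	if (IS_ERR(bdev_file))
		return PTR_ERR(bdev_file);

	pr_info("opened %pg\n", file_bdev(bdev_file));

	fput(bdev_file);	/* replaces bdev_release() */
	return 0;
}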
@@ -2237,8 +2238,8 @@ static void pkt_release_dev(struct pktcdvd_device *pd, int flush)
        pkt_lock_door(pd, 0);
 
        pkt_set_speed(pd, MAX_SPEED, MAX_SPEED);
-       bdev_release(pd->open_bdev_handle);
-       pd->open_bdev_handle = NULL;
+       fput(pd->f_open_bdev);
+       pd->f_open_bdev = NULL;
 
        pkt_shrink_pktlist(pd);
 }
@@ -2326,7 +2327,7 @@ static void pkt_end_io_read_cloned(struct bio *bio)
 
 static void pkt_make_request_read(struct pktcdvd_device *pd, struct bio *bio)
 {
-       struct bio *cloned_bio = bio_alloc_clone(pd->bdev_handle->bdev, bio,
+       struct bio *cloned_bio = bio_alloc_clone(file_bdev(pd->bdev_file), bio,
                GFP_NOIO, &pkt_bio_set);
        struct packet_stacked_data *psd = mempool_alloc(&psd_pool, GFP_NOIO);
 
@@ -2338,9 +2339,9 @@ static void pkt_make_request_read(struct pktcdvd_device *pd, struct bio *bio)
        pkt_queue_bio(pd, cloned_bio);
 }
 
-static void pkt_make_request_write(struct request_queue *q, struct bio *bio)
+static void pkt_make_request_write(struct bio *bio)
 {
-       struct pktcdvd_device *pd = q->queuedata;
+       struct pktcdvd_device *pd = bio->bi_bdev->bd_disk->private_data;
        sector_t zone;
        struct packet_data *pkt;
        int was_empty, blocked_bio;
@@ -2432,7 +2433,7 @@ static void pkt_make_request_write(struct request_queue *q, struct bio *bio)
 
 static void pkt_submit_bio(struct bio *bio)
 {
-       struct pktcdvd_device *pd = bio->bi_bdev->bd_disk->queue->queuedata;
+       struct pktcdvd_device *pd = bio->bi_bdev->bd_disk->private_data;
        struct device *ddev = disk_to_dev(pd->disk);
        struct bio *split;
 
@@ -2476,7 +2477,7 @@ static void pkt_submit_bio(struct bio *bio)
                        split = bio;
                }
 
-               pkt_make_request_write(bio->bi_bdev->bd_disk->queue, split);
+               pkt_make_request_write(split);
        } while (split != bio);
 
        return;
@@ -2484,20 +2485,11 @@ end_io:
        bio_io_error(bio);
 }
 
-static void pkt_init_queue(struct pktcdvd_device *pd)
-{
-       struct request_queue *q = pd->disk->queue;
-
-       blk_queue_logical_block_size(q, CD_FRAMESIZE);
-       blk_queue_max_hw_sectors(q, PACKET_MAX_SECTORS);
-       q->queuedata = pd;
-}
-
 static int pkt_new_dev(struct pktcdvd_device *pd, dev_t dev)
 {
        struct device *ddev = disk_to_dev(pd->disk);
        int i;
-       struct bdev_handle *bdev_handle;
+       struct file *bdev_file;
        struct scsi_device *sdev;
 
        if (pd->pkt_dev == dev) {
@@ -2508,9 +2500,9 @@ static int pkt_new_dev(struct pktcdvd_device *pd, dev_t dev)
                struct pktcdvd_device *pd2 = pkt_devs[i];
                if (!pd2)
                        continue;
-               if (pd2->bdev_handle->bdev->bd_dev == dev) {
+               if (file_bdev(pd2->bdev_file)->bd_dev == dev) {
                        dev_err(ddev, "%pg already setup\n",
-                               pd2->bdev_handle->bdev);
+                               file_bdev(pd2->bdev_file));
                        return -EBUSY;
                }
                if (pd2->pkt_dev == dev) {
@@ -2519,13 +2511,13 @@ static int pkt_new_dev(struct pktcdvd_device *pd, dev_t dev)
                }
        }
 
-       bdev_handle = bdev_open_by_dev(dev, BLK_OPEN_READ | BLK_OPEN_NDELAY,
+       bdev_file = bdev_file_open_by_dev(dev, BLK_OPEN_READ | BLK_OPEN_NDELAY,
                                       NULL, NULL);
-       if (IS_ERR(bdev_handle))
-               return PTR_ERR(bdev_handle);
-       sdev = scsi_device_from_queue(bdev_handle->bdev->bd_disk->queue);
+       if (IS_ERR(bdev_file))
+               return PTR_ERR(bdev_file);
+       sdev = scsi_device_from_queue(file_bdev(bdev_file)->bd_disk->queue);
        if (!sdev) {
-               bdev_release(bdev_handle);
+               fput(bdev_file);
                return -EINVAL;
        }
        put_device(&sdev->sdev_gendev);
@@ -2533,10 +2525,8 @@ static int pkt_new_dev(struct pktcdvd_device *pd, dev_t dev)
        /* This is safe, since we have a reference from open(). */
        __module_get(THIS_MODULE);
 
-       pd->bdev_handle = bdev_handle;
-       set_blocksize(bdev_handle->bdev, CD_FRAMESIZE);
-
-       pkt_init_queue(pd);
+       pd->bdev_file = bdev_file;
+       set_blocksize(file_bdev(bdev_file), CD_FRAMESIZE);
 
        atomic_set(&pd->cdrw.pending_bios, 0);
        pd->cdrw.thread = kthread_run(kcdrwd, pd, "%s", pd->disk->disk_name);
@@ -2546,11 +2536,11 @@ static int pkt_new_dev(struct pktcdvd_device *pd, dev_t dev)
        }
 
        proc_create_single_data(pd->disk->disk_name, 0, pkt_proc, pkt_seq_show, pd);
-       dev_notice(ddev, "writer mapped to %pg\n", bdev_handle->bdev);
+       dev_notice(ddev, "writer mapped to %pg\n", file_bdev(bdev_file));
        return 0;
 
 out_mem:
-       bdev_release(bdev_handle);
+       fput(bdev_file);
        /* This is safe: open() is still holding a reference. */
        module_put(THIS_MODULE);
        return -ENOMEM;
@@ -2605,9 +2595,9 @@ static unsigned int pkt_check_events(struct gendisk *disk,
 
        if (!pd)
                return 0;
-       if (!pd->bdev_handle)
+       if (!pd->bdev_file)
                return 0;
-       attached_disk = pd->bdev_handle->bdev->bd_disk;
+       attached_disk = file_bdev(pd->bdev_file)->bd_disk;
        if (!attached_disk || !attached_disk->fops->check_events)
                return 0;
        return attached_disk->fops->check_events(attached_disk, clearing);
@@ -2634,6 +2624,10 @@ static const struct block_device_operations pktcdvd_ops = {
  */
 static int pkt_setup_dev(dev_t dev, dev_t* pkt_dev)
 {
+       struct queue_limits lim = {
+               .max_hw_sectors         = PACKET_MAX_SECTORS,
+               .logical_block_size     = CD_FRAMESIZE,
+       };
        int idx;
        int ret = -ENOMEM;
        struct pktcdvd_device *pd;
@@ -2673,10 +2667,11 @@ static int pkt_setup_dev(dev_t dev, dev_t* pkt_dev)
        pd->write_congestion_on  = write_congestion_on;
        pd->write_congestion_off = write_congestion_off;
 
-       ret = -ENOMEM;
-       disk = blk_alloc_disk(NUMA_NO_NODE);
-       if (!disk)
+       disk = blk_alloc_disk(&lim, NUMA_NO_NODE);
+       if (IS_ERR(disk)) {
+               ret = PTR_ERR(disk);
                goto out_mem;
+       }
        pd->disk = disk;
        disk->major = pktdev_major;
        disk->first_minor = idx;
@@ -2692,7 +2687,7 @@ static int pkt_setup_dev(dev_t dev, dev_t* pkt_dev)
                goto out_mem2;
 
        /* inherit events of the host device */
-       disk->events = pd->bdev_handle->bdev->bd_disk->events;
+       disk->events = file_bdev(pd->bdev_file)->bd_disk->events;
 
        ret = add_disk(disk);
        if (ret)
@@ -2757,7 +2752,7 @@ static int pkt_remove_dev(dev_t pkt_dev)
        pkt_debugfs_dev_remove(pd);
        pkt_sysfs_dev_remove(pd);
 
-       bdev_release(pd->bdev_handle);
+       fput(pd->bdev_file);
 
        remove_proc_entry(pd->disk->disk_name, pkt_proc);
        dev_notice(ddev, "writer unmapped\n");
@@ -2784,7 +2779,7 @@ static void pkt_get_status(struct pkt_ctrl_command *ctrl_cmd)
 
        pd = pkt_find_dev_from_minor(ctrl_cmd->dev_index);
        if (pd) {
-               ctrl_cmd->dev = new_encode_dev(pd->bdev_handle->bdev->bd_dev);
+               ctrl_cmd->dev = new_encode_dev(file_bdev(pd->bdev_file)->bd_dev);
                ctrl_cmd->pkt_dev = new_encode_dev(pd->pkt_dev);
        } else {
                ctrl_cmd->dev = 0;
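
The pktcdvd hunks above are part of the tree-wide move from struct bdev_handle to a plain struct file returned by bdev_file_open_by_dev(): the underlying block_device is borrowed via file_bdev(), and the handle is dropped with an ordinary fput() instead of bdev_release(). A minimal sketch of the new open/use/release pattern, assuming a tree where these helpers exist (pkt_example_open() is a hypothetical name, not part of the driver):

static int pkt_example_open(dev_t dev)
{
        struct file *bdev_file;
        struct block_device *bdev;

        bdev_file = bdev_file_open_by_dev(dev, BLK_OPEN_READ, NULL, NULL);
        if (IS_ERR(bdev_file))
                return PTR_ERR(bdev_file);

        /* file_bdev() borrows the block_device; it takes no extra reference */
        bdev = file_bdev(bdev_file);
        pr_info("opened %pg\n", bdev);

        fput(bdev_file);        /* releases the open, replacing bdev_release() */
        return 0;
}
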
index 36d7b36c60c76bd705115f4ce61bf1c50bafab11..b810ac0a5c4b97ed92b6a8f3075fee8ca415b6b6 100644 (file)
@@ -382,6 +382,14 @@ static int ps3disk_probe(struct ps3_system_bus_device *_dev)
        struct ps3disk_private *priv;
        int error;
        unsigned int devidx;
+       struct queue_limits lim = {
+               .logical_block_size     = dev->blk_size,
+               .max_hw_sectors         = dev->bounce_size >> 9,
+               .max_segments           = -1,
+               .max_segment_size       = dev->bounce_size,
+               .dma_alignment          = dev->blk_size - 1,
+       };
+
        struct request_queue *queue;
        struct gendisk *gendisk;
 
@@ -431,7 +439,7 @@ static int ps3disk_probe(struct ps3_system_bus_device *_dev)
        if (error)
                goto fail_teardown;
 
-       gendisk = blk_mq_alloc_disk(&priv->tag_set, dev);
+       gendisk = blk_mq_alloc_disk(&priv->tag_set, &lim, dev);
        if (IS_ERR(gendisk)) {
                dev_err(&dev->sbd.core, "%s:%u: blk_mq_alloc_disk failed\n",
                        __func__, __LINE__);
@@ -441,15 +449,8 @@ static int ps3disk_probe(struct ps3_system_bus_device *_dev)
 
        queue = gendisk->queue;
 
-       blk_queue_max_hw_sectors(queue, dev->bounce_size >> 9);
-       blk_queue_dma_alignment(queue, dev->blk_size-1);
-       blk_queue_logical_block_size(queue, dev->blk_size);
-
        blk_queue_write_cache(queue, true, false);
 
-       blk_queue_max_segments(queue, -1);
-       blk_queue_max_segment_size(queue, dev->bounce_size);
-
        priv->gendisk = gendisk;
        gendisk->major = ps3disk_major;
        gendisk->first_minor = devidx * PS3DISK_MINORS;
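
As in ps3disk above, the series replaces post-allocation blk_queue_*() setter calls with a queue_limits structure that is validated and applied when the disk is allocated. A hedged sketch of the shape of the conversion (my_tag_set and my_driver_data are placeholders):

        struct queue_limits lim = {
                .logical_block_size     = 512,
                .max_hw_sectors         = 128,
        };
        struct gendisk *disk;

        disk = blk_mq_alloc_disk(&my_tag_set, &lim, my_driver_data);
        if (IS_ERR(disk))               /* now ERR_PTR-valued on failure */
                return PTR_ERR(disk);

Unset fields keep the block layer defaults, so drivers only spell out the limits they actually constrain; passing a NULL limits pointer (as ps3vram and swim do below) requests the defaults outright. Note that both blk_alloc_disk() and blk_mq_alloc_disk() now return ERR_PTR() rather than NULL on failure, which is why the ps3vram error path switches from -ENOMEM to PTR_ERR().
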
index 38d42af01b253517f0a185fe4d582c2006f09a29..bdcf083b45e2349fc9d92854ca3433ddf78fa7fb 100644 (file)
@@ -730,10 +730,10 @@ static int ps3vram_probe(struct ps3_system_bus_device *dev)
 
        ps3vram_proc_init(dev);
 
-       gendisk = blk_alloc_disk(NUMA_NO_NODE);
-       if (!gendisk) {
+       gendisk = blk_alloc_disk(NULL, NUMA_NO_NODE);
+       if (IS_ERR(gendisk)) {
                dev_err(&dev->core, "blk_alloc_disk failed\n");
-               error = -ENOMEM;
+               error = PTR_ERR(gendisk);
                goto out_cache_cleanup;
        }
 
index a999b698b131f7763916c3bd0de5c87478fd0df4..26ff5cd2bf0abc118d5c83cdf733554a3be97e0c 100644 (file)
@@ -575,7 +575,7 @@ static const struct attribute_group rbd_bus_group = {
 };
 __ATTRIBUTE_GROUPS(rbd_bus);
 
-static struct bus_type rbd_bus_type = {
+static const struct bus_type rbd_bus_type = {
        .name           = "rbd",
        .bus_groups     = rbd_bus_groups,
 };
@@ -3452,14 +3452,15 @@ static bool rbd_lock_add_request(struct rbd_img_request *img_req)
 static void rbd_lock_del_request(struct rbd_img_request *img_req)
 {
        struct rbd_device *rbd_dev = img_req->rbd_dev;
-       bool need_wakeup;
+       bool need_wakeup = false;
 
        lockdep_assert_held(&rbd_dev->lock_rwsem);
        spin_lock(&rbd_dev->lock_lists_lock);
-       rbd_assert(!list_empty(&img_req->lock_item));
-       list_del_init(&img_req->lock_item);
-       need_wakeup = (rbd_dev->lock_state == RBD_LOCK_STATE_RELEASING &&
-                      list_empty(&rbd_dev->running_list));
+       if (!list_empty(&img_req->lock_item)) {
+               list_del_init(&img_req->lock_item);
+               need_wakeup = (rbd_dev->lock_state == RBD_LOCK_STATE_RELEASING &&
+                              list_empty(&rbd_dev->running_list));
+       }
        spin_unlock(&rbd_dev->lock_lists_lock);
        if (need_wakeup)
                complete(&rbd_dev->releasing_wait);
@@ -3842,14 +3843,19 @@ static void wake_lock_waiters(struct rbd_device *rbd_dev, int result)
                return;
        }
 
-       list_for_each_entry(img_req, &rbd_dev->acquiring_list, lock_item) {
+       while (!list_empty(&rbd_dev->acquiring_list)) {
+               img_req = list_first_entry(&rbd_dev->acquiring_list,
+                                          struct rbd_img_request, lock_item);
                mutex_lock(&img_req->state_mutex);
                rbd_assert(img_req->state == RBD_IMG_EXCLUSIVE_LOCK);
+               if (!result)
+                       list_move_tail(&img_req->lock_item,
+                                      &rbd_dev->running_list);
+               else
+                       list_del_init(&img_req->lock_item);
                rbd_img_schedule(img_req, result);
                mutex_unlock(&img_req->state_mutex);
        }
-
-       list_splice_tail_init(&rbd_dev->acquiring_list, &rbd_dev->running_list);
 }
 
 static bool locker_equal(const struct ceph_locker *lhs,
@@ -4946,6 +4952,14 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
        struct request_queue *q;
        unsigned int objset_bytes =
            rbd_dev->layout.object_size * rbd_dev->layout.stripe_count;
+       struct queue_limits lim = {
+               .max_hw_sectors         = objset_bytes >> SECTOR_SHIFT,
+               .max_user_sectors       = objset_bytes >> SECTOR_SHIFT,
+               .io_min                 = rbd_dev->opts->alloc_size,
+               .io_opt                 = rbd_dev->opts->alloc_size,
+               .max_segments           = USHRT_MAX,
+               .max_segment_size       = UINT_MAX,
+       };
        int err;
 
        memset(&rbd_dev->tag_set, 0, sizeof(rbd_dev->tag_set));
@@ -4960,7 +4974,13 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
        if (err)
                return err;
 
-       disk = blk_mq_alloc_disk(&rbd_dev->tag_set, rbd_dev);
+       if (rbd_dev->opts->trim) {
+               lim.discard_granularity = rbd_dev->opts->alloc_size;
+               lim.max_hw_discard_sectors = objset_bytes >> SECTOR_SHIFT;
+               lim.max_write_zeroes_sectors = objset_bytes >> SECTOR_SHIFT;
+       }
+
+       disk = blk_mq_alloc_disk(&rbd_dev->tag_set, &lim, rbd_dev);
        if (IS_ERR(disk)) {
                err = PTR_ERR(disk);
                goto out_tag_set;
@@ -4981,19 +5001,6 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
        blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
        /* QUEUE_FLAG_ADD_RANDOM is off by default for blk-mq */
 
-       blk_queue_max_hw_sectors(q, objset_bytes >> SECTOR_SHIFT);
-       q->limits.max_sectors = queue_max_hw_sectors(q);
-       blk_queue_max_segments(q, USHRT_MAX);
-       blk_queue_max_segment_size(q, UINT_MAX);
-       blk_queue_io_min(q, rbd_dev->opts->alloc_size);
-       blk_queue_io_opt(q, rbd_dev->opts->alloc_size);
-
-       if (rbd_dev->opts->trim) {
-               q->limits.discard_granularity = rbd_dev->opts->alloc_size;
-               blk_queue_max_discard_sectors(q, objset_bytes >> SECTOR_SHIFT);
-               blk_queue_max_write_zeroes_sectors(q, objset_bytes >> SECTOR_SHIFT);
-       }
-
        if (!ceph_test_opt(rbd_dev->rbd_client->client, NOCRC))
                blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, q);
 
@@ -5326,7 +5333,7 @@ static void rbd_dev_release(struct device *dev)
 
        if (need_put) {
                destroy_workqueue(rbd_dev->task_wq);
-               ida_simple_remove(&rbd_dev_id_ida, rbd_dev->dev_id);
+               ida_free(&rbd_dev_id_ida, rbd_dev->dev_id);
        }
 
        rbd_dev_free(rbd_dev);
@@ -5402,9 +5409,9 @@ static struct rbd_device *rbd_dev_create(struct rbd_client *rbdc,
                return NULL;
 
        /* get an id and fill in device name */
-       rbd_dev->dev_id = ida_simple_get(&rbd_dev_id_ida, 0,
-                                        minor_to_rbd_dev_id(1 << MINORBITS),
-                                        GFP_KERNEL);
+       rbd_dev->dev_id = ida_alloc_max(&rbd_dev_id_ida,
+                                       minor_to_rbd_dev_id(1 << MINORBITS) - 1,
+                                       GFP_KERNEL);
        if (rbd_dev->dev_id < 0)
                goto fail_rbd_dev;
 
@@ -5425,7 +5432,7 @@ static struct rbd_device *rbd_dev_create(struct rbd_client *rbdc,
        return rbd_dev;
 
 fail_dev_id:
-       ida_simple_remove(&rbd_dev_id_ida, rbd_dev->dev_id);
+       ida_free(&rbd_dev_id_ida, rbd_dev->dev_id);
 fail_rbd_dev:
        rbd_dev_free(rbd_dev);
        return NULL;
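
The rbd id allocation switches from the deprecated ida_simple_*() wrappers to ida_alloc_max()/ida_free(). The bounds convention changes with it: ida_simple_get() took an exclusive end, while ida_alloc_max() takes an inclusive maximum, which is where the "- 1" above comes from. A small sketch (example_ida and the helpers are illustrative):

static DEFINE_IDA(example_ida);

static int example_get_id(void)
{
        /* old: ida_simple_get(&example_ida, 0, 16, GFP_KERNEL) => ids 0..15 */
        return ida_alloc_max(&example_ida, 16 - 1, GFP_KERNEL);
}

static void example_put_id(int id)
{
        ida_free(&example_ida, id);     /* replaces ida_simple_remove() */
}
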
index 4044c369d22a5f6229d2907acacbba497c03b8f4..b7ffe03c61606d205558169193d61e632ce26b35 100644 (file)
@@ -1329,43 +1329,6 @@ static void rnbd_init_mq_hw_queues(struct rnbd_clt_dev *dev)
        }
 }
 
-static void setup_request_queue(struct rnbd_clt_dev *dev,
-                               struct rnbd_msg_open_rsp *rsp)
-{
-       blk_queue_logical_block_size(dev->queue,
-                                    le16_to_cpu(rsp->logical_block_size));
-       blk_queue_physical_block_size(dev->queue,
-                                     le16_to_cpu(rsp->physical_block_size));
-       blk_queue_max_hw_sectors(dev->queue,
-                                dev->sess->max_io_size / SECTOR_SIZE);
-
-       /*
-        * we don't support discards to "discontiguous" segments
-        * in on request
-        */
-       blk_queue_max_discard_segments(dev->queue, 1);
-
-       blk_queue_max_discard_sectors(dev->queue,
-                                     le32_to_cpu(rsp->max_discard_sectors));
-       dev->queue->limits.discard_granularity =
-                                       le32_to_cpu(rsp->discard_granularity);
-       dev->queue->limits.discard_alignment =
-                                       le32_to_cpu(rsp->discard_alignment);
-       if (le16_to_cpu(rsp->secure_discard))
-               blk_queue_max_secure_erase_sectors(dev->queue,
-                                       le32_to_cpu(rsp->max_discard_sectors));
-       blk_queue_flag_set(QUEUE_FLAG_SAME_COMP, dev->queue);
-       blk_queue_flag_set(QUEUE_FLAG_SAME_FORCE, dev->queue);
-       blk_queue_max_segments(dev->queue, dev->sess->max_segments);
-       blk_queue_io_opt(dev->queue, dev->sess->max_io_size);
-       blk_queue_virt_boundary(dev->queue, SZ_4K - 1);
-       blk_queue_write_cache(dev->queue,
-                             !!(rsp->cache_policy & RNBD_WRITEBACK),
-                             !!(rsp->cache_policy & RNBD_FUA));
-       blk_queue_max_write_zeroes_sectors(dev->queue,
-                                          le32_to_cpu(rsp->max_write_zeroes_sectors));
-}
-
 static int rnbd_clt_setup_gen_disk(struct rnbd_clt_dev *dev,
                                   struct rnbd_msg_open_rsp *rsp, int idx)
 {
@@ -1403,18 +1366,41 @@ static int rnbd_clt_setup_gen_disk(struct rnbd_clt_dev *dev,
 static int rnbd_client_setup_device(struct rnbd_clt_dev *dev,
                                    struct rnbd_msg_open_rsp *rsp)
 {
+       struct queue_limits lim = {
+               .logical_block_size     = le16_to_cpu(rsp->logical_block_size),
+               .physical_block_size    = le16_to_cpu(rsp->physical_block_size),
+               .io_opt                 = dev->sess->max_io_size,
+               .max_hw_sectors         = dev->sess->max_io_size / SECTOR_SIZE,
+               .max_hw_discard_sectors = le32_to_cpu(rsp->max_discard_sectors),
+               .discard_granularity    = le32_to_cpu(rsp->discard_granularity),
+               .discard_alignment      = le32_to_cpu(rsp->discard_alignment),
+               .max_segments           = dev->sess->max_segments,
+               .virt_boundary_mask     = SZ_4K - 1,
+               .max_write_zeroes_sectors =
+                       le32_to_cpu(rsp->max_write_zeroes_sectors),
+       };
        int idx = dev->clt_device_id;
 
        dev->size = le64_to_cpu(rsp->nsectors) *
                        le16_to_cpu(rsp->logical_block_size);
 
-       dev->gd = blk_mq_alloc_disk(&dev->sess->tag_set, dev);
+       if (rsp->secure_discard) {
+               lim.max_secure_erase_sectors =
+                       le32_to_cpu(rsp->max_discard_sectors);
+       }
+
+       dev->gd = blk_mq_alloc_disk(&dev->sess->tag_set, &lim, dev);
        if (IS_ERR(dev->gd))
                return PTR_ERR(dev->gd);
        dev->queue = dev->gd->queue;
        rnbd_init_mq_hw_queues(dev);
 
-       setup_request_queue(dev, rsp);
+       blk_queue_flag_set(QUEUE_FLAG_SAME_COMP, dev->queue);
+       blk_queue_flag_set(QUEUE_FLAG_SAME_FORCE, dev->queue);
+       blk_queue_write_cache(dev->queue,
+                             !!(rsp->cache_policy & RNBD_WRITEBACK),
+                             !!(rsp->cache_policy & RNBD_FUA));
+
        return rnbd_clt_setup_gen_disk(dev, rsp, idx);
 }
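
Two details of the rnbd-clt conversion are easy to miss. First, the old setup_request_queue() gated secure erase on le16_to_cpu(rsp->secure_discard) while the new code tests the raw field; for a zero/non-zero check the byte order is irrelevant, so behaviour is unchanged. Second, only genuine queue_limits move into the structure: the write-cache/FUA policy and the SAME_COMP/SAME_FORCE completion flags are still applied to the live queue via blk_queue_write_cache() and blk_queue_flag_set() after allocation, since they are not limits fields at this point in the series.
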
 
index 3a0d5dcec6f2559f85b13c432248533651c27c44..f6e3a3c4b76cc48b9b1fbe92c5cdca361598f08f 100644 (file)
@@ -145,7 +145,7 @@ static int process_rdma(struct rnbd_srv_session *srv_sess,
        priv->sess_dev = sess_dev;
        priv->id = id;
 
-       bio = bio_alloc(sess_dev->bdev_handle->bdev, 1,
+       bio = bio_alloc(file_bdev(sess_dev->bdev_file), 1,
                        rnbd_to_bio_flags(le32_to_cpu(msg->rw)), GFP_KERNEL);
        if (bio_add_page(bio, virt_to_page(data), datalen,
                        offset_in_page(data)) != datalen) {
@@ -219,7 +219,7 @@ void rnbd_destroy_sess_dev(struct rnbd_srv_sess_dev *sess_dev, bool keep_id)
        rnbd_put_sess_dev(sess_dev);
        wait_for_completion(&dc); /* wait for inflights to drop to zero */
 
-       bdev_release(sess_dev->bdev_handle);
+       fput(sess_dev->bdev_file);
        mutex_lock(&sess_dev->dev->lock);
        list_del(&sess_dev->dev_list);
        if (!sess_dev->readonly)
@@ -534,7 +534,7 @@ rnbd_srv_get_or_create_srv_dev(struct block_device *bdev,
 static void rnbd_srv_fill_msg_open_rsp(struct rnbd_msg_open_rsp *rsp,
                                        struct rnbd_srv_sess_dev *sess_dev)
 {
-       struct block_device *bdev = sess_dev->bdev_handle->bdev;
+       struct block_device *bdev = file_bdev(sess_dev->bdev_file);
 
        rsp->hdr.type = cpu_to_le16(RNBD_MSG_OPEN_RSP);
        rsp->device_id = cpu_to_le32(sess_dev->device_id);
@@ -560,7 +560,7 @@ static void rnbd_srv_fill_msg_open_rsp(struct rnbd_msg_open_rsp *rsp,
 static struct rnbd_srv_sess_dev *
 rnbd_srv_create_set_sess_dev(struct rnbd_srv_session *srv_sess,
                              const struct rnbd_msg_open *open_msg,
-                             struct bdev_handle *handle, bool readonly,
+                             struct file *bdev_file, bool readonly,
                              struct rnbd_srv_dev *srv_dev)
 {
        struct rnbd_srv_sess_dev *sdev = rnbd_sess_dev_alloc(srv_sess);
@@ -572,7 +572,7 @@ rnbd_srv_create_set_sess_dev(struct rnbd_srv_session *srv_sess,
 
        strscpy(sdev->pathname, open_msg->dev_name, sizeof(sdev->pathname));
 
-       sdev->bdev_handle       = handle;
+       sdev->bdev_file         = bdev_file;
        sdev->sess              = srv_sess;
        sdev->dev               = srv_dev;
        sdev->readonly          = readonly;
@@ -678,7 +678,7 @@ static int process_msg_open(struct rnbd_srv_session *srv_sess,
        struct rnbd_srv_dev *srv_dev;
        struct rnbd_srv_sess_dev *srv_sess_dev;
        const struct rnbd_msg_open *open_msg = msg;
-       struct bdev_handle *bdev_handle;
+       struct file *bdev_file;
        blk_mode_t open_flags = BLK_OPEN_READ;
        char *full_path;
        struct rnbd_msg_open_rsp *rsp = data;
@@ -716,15 +716,15 @@ static int process_msg_open(struct rnbd_srv_session *srv_sess,
                goto reject;
        }
 
-       bdev_handle = bdev_open_by_path(full_path, open_flags, NULL, NULL);
-       if (IS_ERR(bdev_handle)) {
-               ret = PTR_ERR(bdev_handle);
+       bdev_file = bdev_file_open_by_path(full_path, open_flags, NULL, NULL);
+       if (IS_ERR(bdev_file)) {
+               ret = PTR_ERR(bdev_file);
                pr_err("Opening device '%s' on session %s failed, failed to open the block device, err: %pe\n",
-                      full_path, srv_sess->sessname, bdev_handle);
+                      full_path, srv_sess->sessname, bdev_file);
                goto free_path;
        }
 
-       srv_dev = rnbd_srv_get_or_create_srv_dev(bdev_handle->bdev, srv_sess,
+       srv_dev = rnbd_srv_get_or_create_srv_dev(file_bdev(bdev_file), srv_sess,
                                                  open_msg->access_mode);
        if (IS_ERR(srv_dev)) {
                pr_err("Opening device '%s' on session %s failed, creating srv_dev failed, err: %pe\n",
@@ -734,7 +734,7 @@ static int process_msg_open(struct rnbd_srv_session *srv_sess,
        }
 
        srv_sess_dev = rnbd_srv_create_set_sess_dev(srv_sess, open_msg,
-                               bdev_handle,
+                               bdev_file,
                                open_msg->access_mode == RNBD_ACCESS_RO,
                                srv_dev);
        if (IS_ERR(srv_sess_dev)) {
@@ -750,7 +750,7 @@ static int process_msg_open(struct rnbd_srv_session *srv_sess,
         */
        mutex_lock(&srv_dev->lock);
        if (!srv_dev->dev_kobj.state_in_sysfs) {
-               ret = rnbd_srv_create_dev_sysfs(srv_dev, bdev_handle->bdev);
+               ret = rnbd_srv_create_dev_sysfs(srv_dev, file_bdev(bdev_file));
                if (ret) {
                        mutex_unlock(&srv_dev->lock);
                        rnbd_srv_err(srv_sess_dev,
@@ -793,7 +793,7 @@ srv_dev_put:
        }
        rnbd_put_srv_dev(srv_dev);
 blkdev_put:
-       bdev_release(bdev_handle);
+       fput(bdev_file);
 free_path:
        kfree(full_path);
 reject:
index 343cc682b617b4447e813e0027a7ed009133fe23..18d873808b8d835d61017124939b880dfc2d26d8 100644 (file)
@@ -46,7 +46,7 @@ struct rnbd_srv_dev {
 struct rnbd_srv_sess_dev {
        /* Entry inside rnbd_srv_dev struct */
        struct list_head                dev_list;
-       struct bdev_handle              *bdev_handle;
+       struct file                     *bdev_file;
        struct rnbd_srv_session         *sess;
        struct rnbd_srv_dev             *dev;
        struct kobject                  kobj;
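
The rnbd server side is the same mechanical substitution seen in pktcdvd above: the struct bdev_handle pointer embedded in rnbd_srv_sess_dev becomes a struct file *bdev_file, every ->bdev dereference becomes a file_bdev() call, and bdev_release() becomes fput().
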
index 7bf4b48e2282e72247d3db519110d4de04b80f14..c99dd6698977ea61992aa0cb087109ef1c380c3f 100644 (file)
@@ -784,6 +784,14 @@ static const struct blk_mq_ops vdc_mq_ops = {
 
 static int probe_disk(struct vdc_port *port)
 {
+       struct queue_limits lim = {
+               .physical_block_size            = port->vdisk_phys_blksz,
+               .max_hw_sectors                 = port->max_xfer_size,
+               /* Each segment in a request is up to an aligned page in size. */
+               .seg_boundary_mask              = PAGE_SIZE - 1,
+               .max_segment_size               = PAGE_SIZE,
+               .max_segments                   = port->ring_cookies,
+       };
        struct request_queue *q;
        struct gendisk *g;
        int err;
@@ -824,7 +832,7 @@ static int probe_disk(struct vdc_port *port)
        if (err)
                return err;
 
-       g = blk_mq_alloc_disk(&port->tag_set, port);
+       g = blk_mq_alloc_disk(&port->tag_set, &lim, port);
        if (IS_ERR(g)) {
                printk(KERN_ERR PFX "%s: Could not allocate gendisk.\n",
                       port->vio.name);
@@ -835,12 +843,6 @@ static int probe_disk(struct vdc_port *port)
        port->disk = g;
        q = g->queue;
 
-       /* Each segment in a request is up to an aligned page in size. */
-       blk_queue_segment_boundary(q, PAGE_SIZE - 1);
-       blk_queue_max_segment_size(q, PAGE_SIZE);
-
-       blk_queue_max_segments(q, port->ring_cookies);
-       blk_queue_max_hw_sectors(q, port->max_xfer_size);
        g->major = vdc_major;
        g->first_minor = port->vio.vdev->dev_no << PARTITION_SHIFT;
        g->minors = 1 << PARTITION_SHIFT;
@@ -872,8 +874,6 @@ static int probe_disk(struct vdc_port *port)
                }
        }
 
-       blk_queue_physical_block_size(q, port->vdisk_phys_blksz);
-
        pr_info(PFX "%s: %u sectors (%u MB) protocol %d.%d\n",
               g->disk_name,
               port->vdisk_size, (port->vdisk_size >> (20 - 9)),
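
The sunvdc limits use the usual mask convention: seg_boundary_mask is the boundary size minus one, so PAGE_SIZE - 1 together with max_segment_size = PAGE_SIZE keeps every segment inside one aligned page, as the moved comment says. An illustrative form of the check the block layer performs with such a mask (not the literal kernel helper):

static inline bool crosses_boundary(u64 addr, u32 len, u64 mask)
{
        /* two bytes share a window iff their addresses agree above the mask */
        return (addr & ~mask) != ((addr + len - 1) & ~mask);
}
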
index f85b6af414b4318b394665bec1a679f174b31755..6731678f3a41db753c306a3f90a84b6aa17bb0dc 100644 (file)
@@ -820,7 +820,7 @@ static int swim_floppy_init(struct swim_priv *swd)
                        goto exit_put_disks;
 
                swd->unit[drive].disk =
-                       blk_mq_alloc_disk(&swd->unit[drive].tag_set,
+                       blk_mq_alloc_disk(&swd->unit[drive].tag_set, NULL,
                                          &swd->unit[drive]);
                if (IS_ERR(swd->unit[drive].disk)) {
                        blk_mq_free_tag_set(&swd->unit[drive].tag_set);
@@ -916,7 +916,7 @@ out:
        return ret;
 }
 
-static int swim_remove(struct platform_device *dev)
+static void swim_remove(struct platform_device *dev)
 {
        struct swim_priv *swd = platform_get_drvdata(dev);
        int drive;
@@ -937,13 +937,11 @@ static int swim_remove(struct platform_device *dev)
                release_mem_region(res->start, resource_size(res));
 
        kfree(swd);
-
-       return 0;
 }
 
 static struct platform_driver swim_driver = {
        .probe  = swim_probe,
-       .remove = swim_remove,
+       .remove_new = swim_remove,
        .driver   = {
                .name   = CARDNAME,
        },
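
swim is also converted to the transitional .remove_new callback: the driver core ignored the int returned by .remove, so the replacement prototype simply returns void. A sketch of the shape, with hypothetical names:

static void example_remove(struct platform_device *pdev)
{
        /* free resources; there is nothing useful to return to the core */
}

static struct platform_driver example_driver = {
        /* .probe omitted for brevity */
        .remove_new = example_remove,
        .driver = {
                .name = "example",
        },
};
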
index c2bc85826358e93df0896821529ac8a5e81caa19..a04756ac778ee803ddbe287d7c770ceceb247694 100644 (file)
@@ -1210,7 +1210,7 @@ static int swim3_attach(struct macio_dev *mdev,
        if (rc)
                goto out_unregister;
 
-       disk = blk_mq_alloc_disk(&fs->tag_set, fs);
+       disk = blk_mq_alloc_disk(&fs->tag_set, NULL, fs);
        if (IS_ERR(disk)) {
                rc = PTR_ERR(disk);
                goto out_free_tag_set;
index 1dfb2e77898ba64215c8da9a9f3219f91f3616a6..bea3d5cf8a83487909270d5f2398267250507a31 100644 (file)
@@ -246,21 +246,12 @@ static int ublk_dev_param_zoned_validate(const struct ublk_device *ub)
        return 0;
 }
 
-static int ublk_dev_param_zoned_apply(struct ublk_device *ub)
+static void ublk_dev_param_zoned_apply(struct ublk_device *ub)
 {
-       const struct ublk_param_zoned *p = &ub->params.zoned;
-
-       disk_set_zoned(ub->ub_disk);
        blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, ub->ub_disk->queue);
        blk_queue_required_elevator_features(ub->ub_disk->queue,
                                             ELEVATOR_F_ZBD_SEQ_WRITE);
-       disk_set_max_active_zones(ub->ub_disk, p->max_active_zones);
-       disk_set_max_open_zones(ub->ub_disk, p->max_open_zones);
-       blk_queue_max_zone_append_sectors(ub->ub_disk->queue, p->max_zone_append_sectors);
-
        ub->ub_disk->nr_zones = ublk_get_nr_zones(ub);
-
-       return 0;
 }
 
 /* Based on virtblk_alloc_report_buffer */
@@ -432,9 +423,8 @@ static int ublk_dev_param_zoned_validate(const struct ublk_device *ub)
        return -EOPNOTSUPP;
 }
 
-static int ublk_dev_param_zoned_apply(struct ublk_device *ub)
+static void ublk_dev_param_zoned_apply(struct ublk_device *ub)
 {
-       return -EOPNOTSUPP;
 }
 
 static int ublk_revalidate_disk_zones(struct ublk_device *ub)
@@ -498,11 +488,6 @@ static void ublk_dev_param_basic_apply(struct ublk_device *ub)
        struct request_queue *q = ub->ub_disk->queue;
        const struct ublk_param_basic *p = &ub->params.basic;
 
-       blk_queue_logical_block_size(q, 1 << p->logical_bs_shift);
-       blk_queue_physical_block_size(q, 1 << p->physical_bs_shift);
-       blk_queue_io_min(q, 1 << p->io_min_shift);
-       blk_queue_io_opt(q, 1 << p->io_opt_shift);
-
        blk_queue_write_cache(q, p->attrs & UBLK_ATTR_VOLATILE_CACHE,
                        p->attrs & UBLK_ATTR_FUA);
        if (p->attrs & UBLK_ATTR_ROTATIONAL)
@@ -510,29 +495,12 @@ static void ublk_dev_param_basic_apply(struct ublk_device *ub)
        else
                blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
 
-       blk_queue_max_hw_sectors(q, p->max_sectors);
-       blk_queue_chunk_sectors(q, p->chunk_sectors);
-       blk_queue_virt_boundary(q, p->virt_boundary_mask);
-
        if (p->attrs & UBLK_ATTR_READ_ONLY)
                set_disk_ro(ub->ub_disk, true);
 
        set_capacity(ub->ub_disk, p->dev_sectors);
 }
 
-static void ublk_dev_param_discard_apply(struct ublk_device *ub)
-{
-       struct request_queue *q = ub->ub_disk->queue;
-       const struct ublk_param_discard *p = &ub->params.discard;
-
-       q->limits.discard_alignment = p->discard_alignment;
-       q->limits.discard_granularity = p->discard_granularity;
-       blk_queue_max_discard_sectors(q, p->max_discard_sectors);
-       blk_queue_max_write_zeroes_sectors(q,
-                       p->max_write_zeroes_sectors);
-       blk_queue_max_discard_segments(q, p->max_discard_segments);
-}
-
 static int ublk_validate_params(const struct ublk_device *ub)
 {
        /* basic param is the only one which must be set */
@@ -576,20 +544,12 @@ static int ublk_validate_params(const struct ublk_device *ub)
        return 0;
 }
 
-static int ublk_apply_params(struct ublk_device *ub)
+static void ublk_apply_params(struct ublk_device *ub)
 {
-       if (!(ub->params.types & UBLK_PARAM_TYPE_BASIC))
-               return -EINVAL;
-
        ublk_dev_param_basic_apply(ub);
 
-       if (ub->params.types & UBLK_PARAM_TYPE_DISCARD)
-               ublk_dev_param_discard_apply(ub);
-
        if (ub->params.types & UBLK_PARAM_TYPE_ZONED)
-               return ublk_dev_param_zoned_apply(ub);
-
-       return 0;
+               ublk_dev_param_zoned_apply(ub);
 }
 
 static inline bool ublk_support_user_copy(const struct ublk_queue *ubq)
@@ -645,14 +605,16 @@ static inline bool ublk_need_get_data(const struct ublk_queue *ubq)
        return ubq->flags & UBLK_F_NEED_GET_DATA;
 }
 
-static struct ublk_device *ublk_get_device(struct ublk_device *ub)
+/* Called in slow path only, keep it noinline for trace purpose */
+static noinline struct ublk_device *ublk_get_device(struct ublk_device *ub)
 {
        if (kobject_get_unless_zero(&ub->cdev_dev.kobj))
                return ub;
        return NULL;
 }
 
-static void ublk_put_device(struct ublk_device *ub)
+/* Called in slow path only, keep it noinline for trace purpose */
+static noinline void ublk_put_device(struct ublk_device *ub)
 {
        put_device(&ub->cdev_dev);
 }
@@ -711,7 +673,7 @@ static void ublk_free_disk(struct gendisk *disk)
        struct ublk_device *ub = disk->private_data;
 
        clear_bit(UB_STATE_USED, &ub->state);
-       put_device(&ub->cdev_dev);
+       ublk_put_device(ub);
 }
 
 static void ublk_store_owner_uid_gid(unsigned int *owner_uid,
@@ -2182,7 +2144,7 @@ static void ublk_remove(struct ublk_device *ub)
        cancel_work_sync(&ub->stop_work);
        cancel_work_sync(&ub->quiesce_work);
        cdev_device_del(&ub->cdev, &ub->cdev_dev);
-       put_device(&ub->cdev_dev);
+       ublk_put_device(ub);
        ublks_added--;
 }
 
@@ -2205,12 +2167,47 @@ static struct ublk_device *ublk_get_device_from_id(int idx)
 static int ublk_ctrl_start_dev(struct ublk_device *ub, struct io_uring_cmd *cmd)
 {
        const struct ublksrv_ctrl_cmd *header = io_uring_sqe_cmd(cmd->sqe);
+       const struct ublk_param_basic *p = &ub->params.basic;
        int ublksrv_pid = (int)header->data[0];
+       struct queue_limits lim = {
+               .logical_block_size     = 1 << p->logical_bs_shift,
+               .physical_block_size    = 1 << p->physical_bs_shift,
+               .io_min                 = 1 << p->io_min_shift,
+               .io_opt                 = 1 << p->io_opt_shift,
+               .max_hw_sectors         = p->max_sectors,
+               .chunk_sectors          = p->chunk_sectors,
+               .virt_boundary_mask     = p->virt_boundary_mask,
+
+       };
        struct gendisk *disk;
        int ret = -EINVAL;
 
        if (ublksrv_pid <= 0)
                return -EINVAL;
+       if (!(ub->params.types & UBLK_PARAM_TYPE_BASIC))
+               return -EINVAL;
+
+       if (ub->params.types & UBLK_PARAM_TYPE_DISCARD) {
+               const struct ublk_param_discard *pd = &ub->params.discard;
+
+               lim.discard_alignment = pd->discard_alignment;
+               lim.discard_granularity = pd->discard_granularity;
+               lim.max_hw_discard_sectors = pd->max_discard_sectors;
+               lim.max_write_zeroes_sectors = pd->max_write_zeroes_sectors;
+               lim.max_discard_segments = pd->max_discard_segments;
+       }
+
+       if (ub->params.types & UBLK_PARAM_TYPE_ZONED) {
+               const struct ublk_param_zoned *p = &ub->params.zoned;
+
+               if (!IS_ENABLED(CONFIG_BLK_DEV_ZONED))
+                       return -EOPNOTSUPP;
+
+               lim.zoned = true;
+               lim.max_active_zones = p->max_active_zones;
+               lim.max_open_zones =  p->max_open_zones;
+               lim.max_zone_append_sectors = p->max_zone_append_sectors;
+       }
 
        if (wait_for_completion_interruptible(&ub->completion) != 0)
                return -EINTR;
@@ -2222,7 +2219,7 @@ static int ublk_ctrl_start_dev(struct ublk_device *ub, struct io_uring_cmd *cmd)
                goto out_unlock;
        }
 
-       disk = blk_mq_alloc_disk(&ub->tag_set, NULL);
+       disk = blk_mq_alloc_disk(&ub->tag_set, &lim, NULL);
        if (IS_ERR(disk)) {
                ret = PTR_ERR(disk);
                goto out_unlock;
@@ -2234,15 +2231,13 @@ static int ublk_ctrl_start_dev(struct ublk_device *ub, struct io_uring_cmd *cmd)
        ub->dev_info.ublksrv_pid = ublksrv_pid;
        ub->ub_disk = disk;
 
-       ret = ublk_apply_params(ub);
-       if (ret)
-               goto out_put_disk;
+       ublk_apply_params(ub);
 
        /* don't probe partitions if any one ubq daemon is un-trusted */
        if (ub->nr_privileged_daemon != ub->nr_queues_ready)
                set_bit(GD_SUPPRESS_PART_SCAN, &disk->state);
 
-       get_device(&ub->cdev_dev);
+       ublk_get_device(ub);
        ub->dev_info.state = UBLK_S_DEV_LIVE;
 
        if (ublk_dev_is_zoned(ub)) {
@@ -2262,7 +2257,6 @@ out_put_cdev:
                ub->dev_info.state = UBLK_S_DEV_DEAD;
                ublk_put_device(ub);
        }
-out_put_disk:
        if (ret)
                put_disk(disk);
 out_unlock:
@@ -2474,7 +2468,7 @@ static inline bool ublk_idr_freed(int id)
        return ptr == NULL;
 }
 
-static int ublk_ctrl_del_dev(struct ublk_device **p_ub)
+static int ublk_ctrl_del_dev(struct ublk_device **p_ub, bool wait)
 {
        struct ublk_device *ub = *p_ub;
        int idx = ub->ub_number;
@@ -2508,7 +2502,7 @@ static int ublk_ctrl_del_dev(struct ublk_device **p_ub)
         * - the device number is freed already, we will not find this
         *   device via ublk_get_device_from_id()
         */
-       if (wait_event_interruptible(ublk_idr_wq, ublk_idr_freed(idx)))
+       if (wait && wait_event_interruptible(ublk_idr_wq, ublk_idr_freed(idx)))
                return -EINTR;
        return 0;
 }
@@ -2907,7 +2901,10 @@ static int ublk_ctrl_uring_cmd(struct io_uring_cmd *cmd,
                ret = ublk_ctrl_add_dev(cmd);
                break;
        case UBLK_CMD_DEL_DEV:
-               ret = ublk_ctrl_del_dev(&ub);
+               ret = ublk_ctrl_del_dev(&ub, true);
+               break;
+       case UBLK_U_CMD_DEL_DEV_ASYNC:
+               ret = ublk_ctrl_del_dev(&ub, false);
                break;
        case UBLK_CMD_GET_QUEUE_AFFINITY:
                ret = ublk_ctrl_get_queue_affinity(ub, cmd);
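
Two ublk changes ride along here. Parameter application collapses into ublk_ctrl_start_dev(): the mandatory basic parameters seed a queue_limits on the stack, and the optional DISCARD and ZONED parameter types are layered onto the same structure ahead of the single blk_mq_alloc_disk() call, so ublk_apply_params() no longer needs a return value. Separately, the new UBLK_U_CMD_DEL_DEV_ASYNC returns without waiting for the device number to be freed, so callers that immediately re-add a device with the same id should be prepared for that id to still be busy. A condensed, illustrative sketch of the conditional-limits pattern:

static int example_build_limits(struct ublk_device *ub,
                                struct queue_limits *lim)
{
        const struct ublk_param_basic *p = &ub->params.basic;

        lim->logical_block_size = 1 << p->logical_bs_shift;
        lim->max_hw_sectors = p->max_sectors;

        if (ub->params.types & UBLK_PARAM_TYPE_DISCARD)
                lim->max_hw_discard_sectors =
                        ub->params.discard.max_discard_sectors;

        if (ub->params.types & UBLK_PARAM_TYPE_ZONED) {
                if (!IS_ENABLED(CONFIG_BLK_DEV_ZONED))
                        return -EOPNOTSUPP;
                lim->zoned = true;
        }
        return 0;
}
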
index 5bf98fd6a651a506ff294545d6241f608af34568..42dea7601d8799279b17aaa4929b93503f721e3a 100644 (file)
@@ -720,25 +720,24 @@ fail_report:
        return ret;
 }
 
-static int virtblk_probe_zoned_device(struct virtio_device *vdev,
-                                      struct virtio_blk *vblk,
-                                      struct request_queue *q)
+static int virtblk_read_zoned_limits(struct virtio_blk *vblk,
+               struct queue_limits *lim)
 {
+       struct virtio_device *vdev = vblk->vdev;
        u32 v, wg;
 
        dev_dbg(&vdev->dev, "probing host-managed zoned device\n");
 
-       disk_set_zoned(vblk->disk);
-       blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, q);
+       lim->zoned = true;
 
        virtio_cread(vdev, struct virtio_blk_config,
                     zoned.max_open_zones, &v);
-       disk_set_max_open_zones(vblk->disk, v);
+       lim->max_open_zones = v;
        dev_dbg(&vdev->dev, "max open zones = %u\n", v);
 
        virtio_cread(vdev, struct virtio_blk_config,
                     zoned.max_active_zones, &v);
-       disk_set_max_active_zones(vblk->disk, v);
+       lim->max_active_zones = v;
        dev_dbg(&vdev->dev, "max active zones = %u\n", v);
 
        virtio_cread(vdev, struct virtio_blk_config,
@@ -747,8 +746,8 @@ static int virtblk_probe_zoned_device(struct virtio_device *vdev,
                dev_warn(&vdev->dev, "zero write granularity reported\n");
                return -ENODEV;
        }
-       blk_queue_physical_block_size(q, wg);
-       blk_queue_io_min(q, wg);
+       lim->physical_block_size = wg;
+       lim->io_min = wg;
 
        dev_dbg(&vdev->dev, "write granularity = %u\n", wg);
 
@@ -764,13 +763,13 @@ static int virtblk_probe_zoned_device(struct virtio_device *vdev,
                        vblk->zone_sectors);
                return -ENODEV;
        }
-       blk_queue_chunk_sectors(q, vblk->zone_sectors);
+       lim->chunk_sectors = vblk->zone_sectors;
        dev_dbg(&vdev->dev, "zone sectors = %u\n", vblk->zone_sectors);
 
        if (virtio_has_feature(vdev, VIRTIO_BLK_F_DISCARD)) {
                dev_warn(&vblk->vdev->dev,
                         "ignoring negotiated F_DISCARD for zoned device\n");
-               blk_queue_max_discard_sectors(q, 0);
+               lim->max_hw_discard_sectors = 0;
        }
 
        virtio_cread(vdev, struct virtio_blk_config,
@@ -785,25 +784,21 @@ static int virtblk_probe_zoned_device(struct virtio_device *vdev,
                        wg, v);
                return -ENODEV;
        }
-       blk_queue_max_zone_append_sectors(q, v);
+       lim->max_zone_append_sectors = v;
        dev_dbg(&vdev->dev, "max append sectors = %u\n", v);
 
-       return blk_revalidate_disk_zones(vblk->disk, NULL);
+       return 0;
 }
-
 #else
-
 /*
- * Zoned block device support is not configured in this kernel.
- * Host-managed zoned devices can't be supported, but others are
- * good to go as regular block devices.
+ * Zoned block device support is not configured in this kernel, host-managed
+ * zoned devices can't be supported.
  */
 #define virtblk_report_zones       NULL
-
-static inline int virtblk_probe_zoned_device(struct virtio_device *vdev,
-                       struct virtio_blk *vblk, struct request_queue *q)
+static inline int virtblk_read_zoned_limits(struct virtio_blk *vblk,
+               struct queue_limits *lim)
 {
-       dev_err(&vdev->dev,
+       dev_err(&vblk->vdev->dev,
                "virtio_blk: zoned devices are not supported");
        return -EOPNOTSUPP;
 }
@@ -1248,31 +1243,17 @@ static const struct blk_mq_ops virtio_mq_ops = {
 static unsigned int virtblk_queue_depth;
 module_param_named(queue_depth, virtblk_queue_depth, uint, 0444);
 
-static int virtblk_probe(struct virtio_device *vdev)
+static int virtblk_read_limits(struct virtio_blk *vblk,
+               struct queue_limits *lim)
 {
-       struct virtio_blk *vblk;
-       struct request_queue *q;
-       int err, index;
-
+       struct virtio_device *vdev = vblk->vdev;
        u32 v, blk_size, max_size, sg_elems, opt_io_size;
        u32 max_discard_segs = 0;
        u32 discard_granularity = 0;
        u16 min_io_size;
        u8 physical_block_exp, alignment_offset;
-       unsigned int queue_depth;
        size_t max_dma_size;
-
-       if (!vdev->config->get) {
-               dev_err(&vdev->dev, "%s failure: config access disabled\n",
-                       __func__);
-               return -EINVAL;
-       }
-
-       err = ida_alloc_range(&vd_index_ida, 0,
-                             minor_to_index(1 << MINORBITS) - 1, GFP_KERNEL);
-       if (err < 0)
-               goto out;
-       index = err;
+       int err;
 
        /* We need to know how many segments before we allocate. */
        err = virtio_cread_feature(vdev, VIRTIO_BLK_F_SEG_MAX,
@@ -1286,78 +1267,11 @@ static int virtblk_probe(struct virtio_device *vdev)
        /* Prevent integer overflows and honor max vq size */
        sg_elems = min_t(u32, sg_elems, VIRTIO_BLK_MAX_SG_ELEMS - 2);
 
-       vdev->priv = vblk = kmalloc(sizeof(*vblk), GFP_KERNEL);
-       if (!vblk) {
-               err = -ENOMEM;
-               goto out_free_index;
-       }
-
-       mutex_init(&vblk->vdev_mutex);
-
-       vblk->vdev = vdev;
-
-       INIT_WORK(&vblk->config_work, virtblk_config_changed_work);
-
-       err = init_vq(vblk);
-       if (err)
-               goto out_free_vblk;
-
-       /* Default queue sizing is to fill the ring. */
-       if (!virtblk_queue_depth) {
-               queue_depth = vblk->vqs[0].vq->num_free;
-               /* ... but without indirect descs, we use 2 descs per req */
-               if (!virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC))
-                       queue_depth /= 2;
-       } else {
-               queue_depth = virtblk_queue_depth;
-       }
-
-       memset(&vblk->tag_set, 0, sizeof(vblk->tag_set));
-       vblk->tag_set.ops = &virtio_mq_ops;
-       vblk->tag_set.queue_depth = queue_depth;
-       vblk->tag_set.numa_node = NUMA_NO_NODE;
-       vblk->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
-       vblk->tag_set.cmd_size =
-               sizeof(struct virtblk_req) +
-               sizeof(struct scatterlist) * VIRTIO_BLK_INLINE_SG_CNT;
-       vblk->tag_set.driver_data = vblk;
-       vblk->tag_set.nr_hw_queues = vblk->num_vqs;
-       vblk->tag_set.nr_maps = 1;
-       if (vblk->io_queues[HCTX_TYPE_POLL])
-               vblk->tag_set.nr_maps = 3;
-
-       err = blk_mq_alloc_tag_set(&vblk->tag_set);
-       if (err)
-               goto out_free_vq;
-
-       vblk->disk = blk_mq_alloc_disk(&vblk->tag_set, vblk);
-       if (IS_ERR(vblk->disk)) {
-               err = PTR_ERR(vblk->disk);
-               goto out_free_tags;
-       }
-       q = vblk->disk->queue;
-
-       virtblk_name_format("vd", index, vblk->disk->disk_name, DISK_NAME_LEN);
-
-       vblk->disk->major = major;
-       vblk->disk->first_minor = index_to_minor(index);
-       vblk->disk->minors = 1 << PART_BITS;
-       vblk->disk->private_data = vblk;
-       vblk->disk->fops = &virtblk_fops;
-       vblk->index = index;
-
-       /* configure queue flush support */
-       virtblk_update_cache_mode(vdev);
-
-       /* If disk is read-only in the host, the guest should obey */
-       if (virtio_has_feature(vdev, VIRTIO_BLK_F_RO))
-               set_disk_ro(vblk->disk, 1);
-
        /* We can handle whatever the host told us to handle. */
-       blk_queue_max_segments(q, sg_elems);
+       lim->max_segments = sg_elems;
 
        /* No real sector limit. */
-       blk_queue_max_hw_sectors(q, UINT_MAX);
+       lim->max_hw_sectors = UINT_MAX;
 
        max_dma_size = virtio_max_dma_size(vdev);
        max_size = max_dma_size > U32_MAX ? U32_MAX : max_dma_size;
@@ -1369,7 +1283,7 @@ static int virtblk_probe(struct virtio_device *vdev)
        if (!err)
                max_size = min(max_size, v);
 
-       blk_queue_max_segment_size(q, max_size);
+       lim->max_segment_size = max_size;
 
        /* Host can optionally specify the block size of the device */
        err = virtio_cread_feature(vdev, VIRTIO_BLK_F_BLK_SIZE,
@@ -1381,38 +1295,37 @@ static int virtblk_probe(struct virtio_device *vdev)
                        dev_err(&vdev->dev,
                                "virtio_blk: invalid block size: 0x%x\n",
                                blk_size);
-                       goto out_cleanup_disk;
+                       return err;
                }
 
-               blk_queue_logical_block_size(q, blk_size);
+               lim->logical_block_size = blk_size;
        } else
-               blk_size = queue_logical_block_size(q);
+               blk_size = lim->logical_block_size;
 
        /* Use topology information if available */
        err = virtio_cread_feature(vdev, VIRTIO_BLK_F_TOPOLOGY,
                                   struct virtio_blk_config, physical_block_exp,
                                   &physical_block_exp);
        if (!err && physical_block_exp)
-               blk_queue_physical_block_size(q,
-                               blk_size * (1 << physical_block_exp));
+               lim->physical_block_size = blk_size * (1 << physical_block_exp);
 
        err = virtio_cread_feature(vdev, VIRTIO_BLK_F_TOPOLOGY,
                                   struct virtio_blk_config, alignment_offset,
                                   &alignment_offset);
        if (!err && alignment_offset)
-               blk_queue_alignment_offset(q, blk_size * alignment_offset);
+               lim->alignment_offset = blk_size * alignment_offset;
 
        err = virtio_cread_feature(vdev, VIRTIO_BLK_F_TOPOLOGY,
                                   struct virtio_blk_config, min_io_size,
                                   &min_io_size);
        if (!err && min_io_size)
-               blk_queue_io_min(q, blk_size * min_io_size);
+               lim->io_min = blk_size * min_io_size;
 
        err = virtio_cread_feature(vdev, VIRTIO_BLK_F_TOPOLOGY,
                                   struct virtio_blk_config, opt_io_size,
                                   &opt_io_size);
        if (!err && opt_io_size)
-               blk_queue_io_opt(q, blk_size * opt_io_size);
+               lim->io_opt = blk_size * opt_io_size;
 
        if (virtio_has_feature(vdev, VIRTIO_BLK_F_DISCARD)) {
                virtio_cread(vdev, struct virtio_blk_config,
@@ -1420,7 +1333,7 @@ static int virtblk_probe(struct virtio_device *vdev)
 
                virtio_cread(vdev, struct virtio_blk_config,
                             max_discard_sectors, &v);
-               blk_queue_max_discard_sectors(q, v ? v : UINT_MAX);
+               lim->max_hw_discard_sectors = v ? v : UINT_MAX;
 
                virtio_cread(vdev, struct virtio_blk_config, max_discard_seg,
                             &max_discard_segs);
@@ -1429,7 +1342,7 @@ static int virtblk_probe(struct virtio_device *vdev)
        if (virtio_has_feature(vdev, VIRTIO_BLK_F_WRITE_ZEROES)) {
                virtio_cread(vdev, struct virtio_blk_config,
                             max_write_zeroes_sectors, &v);
-               blk_queue_max_write_zeroes_sectors(q, v ? v : UINT_MAX);
+               lim->max_write_zeroes_sectors = v ? v : UINT_MAX;
        }
 
        /* The discard and secure erase limits are combined since the Linux
@@ -1455,8 +1368,7 @@ static int virtblk_probe(struct virtio_device *vdev)
                if (!v) {
                        dev_err(&vdev->dev,
                                "virtio_blk: secure_erase_sector_alignment can't be 0\n");
-                       err = -EINVAL;
-                       goto out_cleanup_disk;
+                       return -EINVAL;
                }
 
                discard_granularity = min_not_zero(discard_granularity, v);
@@ -1470,11 +1382,10 @@ static int virtblk_probe(struct virtio_device *vdev)
                if (!v) {
                        dev_err(&vdev->dev,
                                "virtio_blk: max_secure_erase_sectors can't be 0\n");
-                       err = -EINVAL;
-                       goto out_cleanup_disk;
+                       return -EINVAL;
                }
 
-               blk_queue_max_secure_erase_sectors(q, v);
+               lim->max_secure_erase_sectors = v;
 
                virtio_cread(vdev, struct virtio_blk_config,
                             max_secure_erase_seg, &v);
@@ -1485,8 +1396,7 @@ static int virtblk_probe(struct virtio_device *vdev)
                if (!v) {
                        dev_err(&vdev->dev,
                                "virtio_blk: max_secure_erase_seg can't be 0\n");
-                       err = -EINVAL;
-                       goto out_cleanup_disk;
+                       return -EINVAL;
                }
 
                max_discard_segs = min_not_zero(max_discard_segs, v);
@@ -1502,45 +1412,142 @@ static int virtblk_probe(struct virtio_device *vdev)
                if (!max_discard_segs)
                        max_discard_segs = sg_elems;
 
-               blk_queue_max_discard_segments(q,
-                                              min(max_discard_segs, MAX_DISCARD_SEGMENTS));
+               lim->max_discard_segments =
+                       min(max_discard_segs, MAX_DISCARD_SEGMENTS);
 
                if (discard_granularity)
-                       q->limits.discard_granularity = discard_granularity << SECTOR_SHIFT;
+                       lim->discard_granularity =
+                               discard_granularity << SECTOR_SHIFT;
                else
-                       q->limits.discard_granularity = blk_size;
+                       lim->discard_granularity = blk_size;
        }
 
-       virtblk_update_capacity(vblk, false);
-       virtio_device_ready(vdev);
-
-       /*
-        * All steps that follow use the VQs therefore they need to be
-        * placed after the virtio_device_ready() call above.
-        */
        if (virtio_has_feature(vdev, VIRTIO_BLK_F_ZONED)) {
                u8 model;
 
-               virtio_cread(vdev, struct virtio_blk_config, zoned.model,
-                               &model);
+               virtio_cread(vdev, struct virtio_blk_config, zoned.model, &model);
                switch (model) {
                case VIRTIO_BLK_Z_NONE:
                case VIRTIO_BLK_Z_HA:
-                       /* Present the host-aware device as non-zoned */
-                       break;
+                       /* treat host-aware devices as non-zoned */
+                       return 0;
                case VIRTIO_BLK_Z_HM:
-                       err = virtblk_probe_zoned_device(vdev, vblk, q);
+                       err = virtblk_read_zoned_limits(vblk, lim);
                        if (err)
-                               goto out_cleanup_disk;
+                               return err;
                        break;
                default:
-                       dev_err(&vdev->dev, "unsupported zone model %d\n",
-                               model);
-                       err = -EINVAL;
-                       goto out_cleanup_disk;
+                       dev_err(&vdev->dev, "unsupported zone model %d\n", model);
+                       return -EINVAL;
                }
        }
 
+       return 0;
+}
+
+static int virtblk_probe(struct virtio_device *vdev)
+{
+       struct virtio_blk *vblk;
+       struct queue_limits lim = { };
+       int err, index;
+       unsigned int queue_depth;
+
+       if (!vdev->config->get) {
+               dev_err(&vdev->dev, "%s failure: config access disabled\n",
+                       __func__);
+               return -EINVAL;
+       }
+
+       err = ida_alloc_range(&vd_index_ida, 0,
+                             minor_to_index(1 << MINORBITS) - 1, GFP_KERNEL);
+       if (err < 0)
+               goto out;
+       index = err;
+
+       vdev->priv = vblk = kmalloc(sizeof(*vblk), GFP_KERNEL);
+       if (!vblk) {
+               err = -ENOMEM;
+               goto out_free_index;
+       }
+
+       mutex_init(&vblk->vdev_mutex);
+
+       vblk->vdev = vdev;
+
+       INIT_WORK(&vblk->config_work, virtblk_config_changed_work);
+
+       err = init_vq(vblk);
+       if (err)
+               goto out_free_vblk;
+
+       /* Default queue sizing is to fill the ring. */
+       if (!virtblk_queue_depth) {
+               queue_depth = vblk->vqs[0].vq->num_free;
+               /* ... but without indirect descs, we use 2 descs per req */
+               if (!virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC))
+                       queue_depth /= 2;
+       } else {
+               queue_depth = virtblk_queue_depth;
+       }
+
+       memset(&vblk->tag_set, 0, sizeof(vblk->tag_set));
+       vblk->tag_set.ops = &virtio_mq_ops;
+       vblk->tag_set.queue_depth = queue_depth;
+       vblk->tag_set.numa_node = NUMA_NO_NODE;
+       vblk->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
+       vblk->tag_set.cmd_size =
+               sizeof(struct virtblk_req) +
+               sizeof(struct scatterlist) * VIRTIO_BLK_INLINE_SG_CNT;
+       vblk->tag_set.driver_data = vblk;
+       vblk->tag_set.nr_hw_queues = vblk->num_vqs;
+       vblk->tag_set.nr_maps = 1;
+       if (vblk->io_queues[HCTX_TYPE_POLL])
+               vblk->tag_set.nr_maps = 3;
+
+       err = blk_mq_alloc_tag_set(&vblk->tag_set);
+       if (err)
+               goto out_free_vq;
+
+       err = virtblk_read_limits(vblk, &lim);
+       if (err)
+               goto out_free_tags;
+
+       vblk->disk = blk_mq_alloc_disk(&vblk->tag_set, &lim, vblk);
+       if (IS_ERR(vblk->disk)) {
+               err = PTR_ERR(vblk->disk);
+               goto out_free_tags;
+       }
+
+       virtblk_name_format("vd", index, vblk->disk->disk_name, DISK_NAME_LEN);
+
+       vblk->disk->major = major;
+       vblk->disk->first_minor = index_to_minor(index);
+       vblk->disk->minors = 1 << PART_BITS;
+       vblk->disk->private_data = vblk;
+       vblk->disk->fops = &virtblk_fops;
+       vblk->index = index;
+
+       /* configure queue flush support */
+       virtblk_update_cache_mode(vdev);
+
+       /* If disk is read-only in the host, the guest should obey */
+       if (virtio_has_feature(vdev, VIRTIO_BLK_F_RO))
+               set_disk_ro(vblk->disk, 1);
+
+       virtblk_update_capacity(vblk, false);
+       virtio_device_ready(vdev);
+
+       /*
+        * All steps that follow use the VQs therefore they need to be
+        * placed after the virtio_device_ready() call above.
+        */
+       if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) && lim.zoned) {
+               blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, vblk->disk->queue);
+               err = blk_revalidate_disk_zones(vblk->disk, NULL);
+               if (err)
+                       goto out_cleanup_disk;
+       }
+
        err = device_add_disk(&vdev->dev, vblk->disk, virtblk_attr_groups);
        if (err)
                goto out_cleanup_disk;
@@ -1593,14 +1600,15 @@ static int virtblk_freeze(struct virtio_device *vdev)
 {
        struct virtio_blk *vblk = vdev->priv;
 
+       /* Ensure no requests in virtqueues before deleting vqs. */
+       blk_mq_freeze_queue(vblk->disk->queue);
+
        /* Ensure we don't receive any more interrupts */
        virtio_reset_device(vdev);
 
        /* Make sure no work handler is accessing the device. */
        flush_work(&vblk->config_work);
 
-       blk_mq_quiesce_queue(vblk->disk->queue);
-
        vdev->config->del_vqs(vdev);
        kfree(vblk->vqs);
 
@@ -1618,7 +1626,7 @@ static int virtblk_restore(struct virtio_device *vdev)
 
        virtio_device_ready(vdev);
 
-       blk_mq_unquiesce_queue(vblk->disk->queue);
+       blk_mq_unfreeze_queue(vblk->disk->queue);
        return 0;
 }
 #endif
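
Beyond restructuring probe so that virtblk_read_limits() gathers all limits before the disk is allocated, the virtio-blk suspend fix is about ordering: blk_mq_quiesce_queue() only stops dispatch, so requests could still sit in the virtqueues when they were deleted. Freezing the queue instead waits for every in-flight request to complete and blocks new submissions, which makes the subsequent device reset and del_vqs() safe; restore pairs it with blk_mq_unfreeze_queue(). A minimal sketch of the pairing (example_* names are hypothetical; the real handlers also flush the config work and free vblk->vqs):

static int example_freeze(struct virtio_device *vdev)
{
        struct virtio_blk *vblk = vdev->priv;

        blk_mq_freeze_queue(vblk->disk->queue); /* drain and block I/O */
        virtio_reset_device(vdev);
        vdev->config->del_vqs(vdev);
        return 0;
}

static int example_restore(struct virtio_device *vdev)
{
        struct virtio_blk *vblk = vdev->priv;
        int err = init_vq(vblk);        /* re-create the virtqueues */

        if (err)
                return err;
        virtio_device_ready(vdev);
        blk_mq_unfreeze_queue(vblk->disk->queue);
        return 0;
}
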
index 4defd7f387c786b937b445e3f7b8615fc733075c..944576d582fb14145e64822118aaebfc0df343fe 100644 (file)
@@ -465,7 +465,7 @@ static int xen_vbd_translate(struct phys_req *req, struct xen_blkif *blkif,
        }
 
        req->dev  = vbd->pdevice;
-       req->bdev = vbd->bdev_handle->bdev;
+       req->bdev = file_bdev(vbd->bdev_file);
        rc = 0;
 
  out:
@@ -969,7 +969,7 @@ static int dispatch_discard_io(struct xen_blkif_ring *ring,
        int err = 0;
        int status = BLKIF_RSP_OKAY;
        struct xen_blkif *blkif = ring->blkif;
-       struct block_device *bdev = blkif->vbd.bdev_handle->bdev;
+       struct block_device *bdev = file_bdev(blkif->vbd.bdev_file);
        struct phys_req preq;
 
        xen_blkif_get(blkif);
index 1432c83183d098eab8865a8f9186dcd194170493..b427d54bc1205ec2903a01d51bd87511e619041e 100644 (file)
@@ -221,7 +221,7 @@ struct xen_vbd {
        unsigned char           type;
        /* phys device that this vbd maps to. */
        u32                     pdevice;
-       struct bdev_handle      *bdev_handle;
+       struct file             *bdev_file;
        /* Cached size parameter. */
        sector_t                size;
        unsigned int            flush_support:1;
@@ -360,7 +360,7 @@ struct pending_req {
 };
 
 
-#define vbd_sz(_v)     bdev_nr_sectors((_v)->bdev_handle->bdev)
+#define vbd_sz(_v)     bdev_nr_sectors(file_bdev((_v)->bdev_file))
 
 #define xen_blkif_get(_b) (atomic_inc(&(_b)->refcnt))
 #define xen_blkif_put(_b)                              \
index e34219ea2b058c47d7464be4ce64d18c81d0e3f2..0621878940ae57c5fcd3425cfc80dadd757e82c3 100644 (file)
@@ -81,7 +81,7 @@ static void xen_update_blkif_status(struct xen_blkif *blkif)
        int i;
 
        /* Not ready to connect? */
-       if (!blkif->rings || !blkif->rings[0].irq || !blkif->vbd.bdev_handle)
+       if (!blkif->rings || !blkif->rings[0].irq || !blkif->vbd.bdev_file)
                return;
 
        /* Already connected? */
@@ -99,13 +99,12 @@ static void xen_update_blkif_status(struct xen_blkif *blkif)
                return;
        }
 
-       err = sync_blockdev(blkif->vbd.bdev_handle->bdev);
+       err = sync_blockdev(file_bdev(blkif->vbd.bdev_file));
        if (err) {
                xenbus_dev_error(blkif->be->dev, err, "block flush");
                return;
        }
-       invalidate_inode_pages2(
-                       blkif->vbd.bdev_handle->bdev->bd_inode->i_mapping);
+       invalidate_inode_pages2(blkif->vbd.bdev_file->f_mapping);
 
        for (i = 0; i < blkif->nr_rings; i++) {
                ring = &blkif->rings[i];
@@ -473,9 +472,9 @@ static void xenvbd_sysfs_delif(struct xenbus_device *dev)
 
 static void xen_vbd_free(struct xen_vbd *vbd)
 {
-       if (vbd->bdev_handle)
-               bdev_release(vbd->bdev_handle);
-       vbd->bdev_handle = NULL;
+       if (vbd->bdev_file)
+               fput(vbd->bdev_file);
+       vbd->bdev_file = NULL;
 }
 
 static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
@@ -483,7 +482,7 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
                          int cdrom)
 {
        struct xen_vbd *vbd;
-       struct bdev_handle *bdev_handle;
+       struct file *bdev_file;
 
        vbd = &blkif->vbd;
        vbd->handle   = handle;
@@ -492,17 +491,17 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
 
        vbd->pdevice  = MKDEV(major, minor);
 
-       bdev_handle = bdev_open_by_dev(vbd->pdevice, vbd->readonly ?
+       bdev_file = bdev_file_open_by_dev(vbd->pdevice, vbd->readonly ?
                                 BLK_OPEN_READ : BLK_OPEN_WRITE, NULL, NULL);
 
-       if (IS_ERR(bdev_handle)) {
+       if (IS_ERR(bdev_file)) {
                pr_warn("xen_vbd_create: device %08x could not be opened\n",
                        vbd->pdevice);
                return -ENOENT;
        }
 
-       vbd->bdev_handle = bdev_handle;
-       if (vbd->bdev_handle->bdev->bd_disk == NULL) {
+       vbd->bdev_file = bdev_file;
+       if (file_bdev(vbd->bdev_file)->bd_disk == NULL) {
                pr_warn("xen_vbd_create: device %08x doesn't exist\n",
                        vbd->pdevice);
                xen_vbd_free(vbd);
@@ -510,14 +509,14 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
        }
        vbd->size = vbd_sz(vbd);
 
-       if (cdrom || disk_to_cdi(vbd->bdev_handle->bdev->bd_disk))
+       if (cdrom || disk_to_cdi(file_bdev(vbd->bdev_file)->bd_disk))
                vbd->type |= VDISK_CDROM;
-       if (vbd->bdev_handle->bdev->bd_disk->flags & GENHD_FL_REMOVABLE)
+       if (file_bdev(vbd->bdev_file)->bd_disk->flags & GENHD_FL_REMOVABLE)
                vbd->type |= VDISK_REMOVABLE;
 
-       if (bdev_write_cache(bdev_handle->bdev))
+       if (bdev_write_cache(file_bdev(bdev_file)))
                vbd->flush_support = true;
-       if (bdev_max_secure_erase_sectors(bdev_handle->bdev))
+       if (bdev_max_secure_erase_sectors(file_bdev(bdev_file)))
                vbd->discard_secure = true;
 
        pr_debug("Successful creation of handle=%04x (dom=%u)\n",
@@ -570,7 +569,7 @@ static void xen_blkbk_discard(struct xenbus_transaction xbt, struct backend_info
        struct xen_blkif *blkif = be->blkif;
        int err;
        int state = 0;
-       struct block_device *bdev = be->blkif->vbd.bdev_handle->bdev;
+       struct block_device *bdev = file_bdev(be->blkif->vbd.bdev_file);
 
        if (!xenbus_read_unsigned(dev->nodename, "discard-enable", 1))
                return;
@@ -932,7 +931,7 @@ again:
        }
        err = xenbus_printf(xbt, dev->nodename, "sector-size", "%lu",
                            (unsigned long)bdev_logical_block_size(
-                                       be->blkif->vbd.bdev_handle->bdev));
+                                       file_bdev(be->blkif->vbd.bdev_file)));
        if (err) {
                xenbus_dev_fatal(dev, err, "writing %s/sector-size",
                                 dev->nodename);
@@ -940,7 +939,7 @@ again:
        }
        err = xenbus_printf(xbt, dev->nodename, "physical-sector-size", "%u",
                            bdev_physical_block_size(
-                                       be->blkif->vbd.bdev_handle->bdev));
+                                       file_bdev(be->blkif->vbd.bdev_file)));
        if (err)
                xenbus_dev_error(dev, err, "writing %s/physical-sector-size",
                                 dev->nodename);
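
Note: the xen-blkback hunks above migrate from struct bdev_handle to the file-based block device API: bdev_file_open_by_dev() returns a struct file, file_bdev() recovers the struct block_device behind it, and fput() replaces bdev_release(). A minimal sketch of the lifecycle — the devt, holder and mode below are illustrative, not taken from this patch:

	#include <linux/blkdev.h>
	#include <linux/file.h>

	static struct file *example_open(dev_t devt, void *holder)
	{
		struct file *bdev_file;

		bdev_file = bdev_file_open_by_dev(devt, BLK_OPEN_READ, holder, NULL);
		if (IS_ERR(bdev_file))
			return bdev_file;	/* ERR_PTR, as in xen_vbd_create() */

		/* the block_device is reached through the file */
		pr_info("%llu sectors\n",
			(unsigned long long)bdev_nr_sectors(file_bdev(bdev_file)));
		return bdev_file;
	}

	static void example_close(struct file *bdev_file)
	{
		fput(bdev_file);	/* replaces bdev_release() */
	}
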
index 434fab306777439754a0f9feed417f848b2442a4..fd7c0ff2139cee128ece1011791b2db03283e66c 100644 (file)
@@ -941,39 +941,35 @@ static const struct blk_mq_ops blkfront_mq_ops = {
        .complete = blkif_complete_rq,
 };
 
-static void blkif_set_queue_limits(struct blkfront_info *info)
+static void blkif_set_queue_limits(const struct blkfront_info *info,
+               struct queue_limits *lim)
 {
-       struct request_queue *rq = info->rq;
-       struct gendisk *gd = info->gd;
        unsigned int segments = info->max_indirect_segments ? :
                                BLKIF_MAX_SEGMENTS_PER_REQUEST;
 
-       blk_queue_flag_set(QUEUE_FLAG_VIRT, rq);
-
        if (info->feature_discard) {
-               blk_queue_max_discard_sectors(rq, get_capacity(gd));
-               rq->limits.discard_granularity = info->discard_granularity ?:
-                                                info->physical_sector_size;
-               rq->limits.discard_alignment = info->discard_alignment;
+               lim->max_hw_discard_sectors = UINT_MAX;
+               if (info->discard_granularity)
+                       lim->discard_granularity = info->discard_granularity;
+               lim->discard_alignment = info->discard_alignment;
                if (info->feature_secdiscard)
-                       blk_queue_max_secure_erase_sectors(rq,
-                                                          get_capacity(gd));
+                       lim->max_secure_erase_sectors = UINT_MAX;
        }
 
        /* Hard sector size and max sectors impersonate the equiv. hardware. */
-       blk_queue_logical_block_size(rq, info->sector_size);
-       blk_queue_physical_block_size(rq, info->physical_sector_size);
-       blk_queue_max_hw_sectors(rq, (segments * XEN_PAGE_SIZE) / 512);
+       lim->logical_block_size = info->sector_size;
+       lim->physical_block_size = info->physical_sector_size;
+       lim->max_hw_sectors = (segments * XEN_PAGE_SIZE) / 512;
 
        /* Each segment in a request is up to an aligned page in size. */
-       blk_queue_segment_boundary(rq, PAGE_SIZE - 1);
-       blk_queue_max_segment_size(rq, PAGE_SIZE);
+       lim->seg_boundary_mask = PAGE_SIZE - 1;
+       lim->max_segment_size = PAGE_SIZE;
 
        /* Ensure a merged request will fit in a single I/O ring slot. */
-       blk_queue_max_segments(rq, segments / GRANTS_PER_PSEG);
+       lim->max_segments = segments / GRANTS_PER_PSEG;
 
        /* Make sure buffer addresses are sector-aligned. */
-       blk_queue_dma_alignment(rq, 511);
+       lim->dma_alignment = 511;
 }
 
 static const char *flush_info(struct blkfront_info *info)
@@ -1070,6 +1066,7 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
                struct blkfront_info *info, u16 sector_size,
                unsigned int physical_sector_size)
 {
+       struct queue_limits lim = {};
        struct gendisk *gd;
        int nr_minors = 1;
        int err;
@@ -1136,11 +1133,13 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
        if (err)
                goto out_release_minors;
 
-       gd = blk_mq_alloc_disk(&info->tag_set, info);
+       blkif_set_queue_limits(info, &lim);
+       gd = blk_mq_alloc_disk(&info->tag_set, &lim, info);
        if (IS_ERR(gd)) {
                err = PTR_ERR(gd);
                goto out_free_tag_set;
        }
+       blk_queue_flag_set(QUEUE_FLAG_VIRT, gd->queue);
 
        strcpy(gd->disk_name, DEV_NAME);
        ptr = encode_disk_name(gd->disk_name + sizeof(DEV_NAME) - 1, offset);
@@ -1162,7 +1161,6 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
        info->gd = gd;
        info->sector_size = sector_size;
        info->physical_sector_size = physical_sector_size;
-       blkif_set_queue_limits(info);
 
        xlvbd_flush(info);
 
@@ -2006,18 +2004,19 @@ static int blkfront_probe(struct xenbus_device *dev,
 
 static int blkif_recover(struct blkfront_info *info)
 {
+       struct queue_limits lim;
        unsigned int r_index;
        struct request *req, *n;
        int rc;
        struct bio *bio;
-       unsigned int segs;
        struct blkfront_ring_info *rinfo;
 
+       lim = queue_limits_start_update(info->rq);
        blkfront_gather_backend_features(info);
-       /* Reset limits changed by blk_mq_update_nr_hw_queues(). */
-       blkif_set_queue_limits(info);
-       segs = info->max_indirect_segments ? : BLKIF_MAX_SEGMENTS_PER_REQUEST;
-       blk_queue_max_segments(info->rq, segs / GRANTS_PER_PSEG);
+       blkif_set_queue_limits(info, &lim);
+       rc = queue_limits_commit_update(info->rq, &lim);
+       if (rc)
+               return rc;
 
        for_each_rinfo(info, rinfo, r_index) {
                rc = blkfront_setup_indirect(rinfo);
@@ -2037,7 +2036,9 @@ static int blkif_recover(struct blkfront_info *info)
        list_for_each_entry_safe(req, n, &info->requests, queuelist) {
                /* Requeue pending requests (flush or discard) */
                list_del_init(&req->queuelist);
-               BUG_ON(req->nr_phys_segments > segs);
+               BUG_ON(req->nr_phys_segments >
+                      (info->max_indirect_segments ? :
+                       BLKIF_MAX_SEGMENTS_PER_REQUEST));
                blk_mq_requeue_request(req, false);
        }
        blk_mq_start_stopped_hw_queues(info->rq, true);
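
Note: for a queue that already exists, blkif_recover() above uses the start/commit pair: queue_limits_start_update() snapshots the limits under the queue's limits lock, the caller edits the copy, and queue_limits_commit_update() validates and publishes it atomically. Sketch, with q and new_segments assumed:

	struct queue_limits lim = queue_limits_start_update(q);
	int err;

	lim.max_segments = new_segments;
	err = queue_limits_commit_update(q, &lim);
	if (err)
		return err;
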
index 11493167b0a848a255b5db442f97cb71e3f97e6a..7c5f4e4d9b50374cb96f5b1699cbeb0a60cc54d6 100644 (file)
@@ -318,7 +318,7 @@ static int z2ram_register_disk(int minor)
        struct gendisk *disk;
        int err;
 
-       disk = blk_mq_alloc_disk(&tag_set, NULL);
+       disk = blk_mq_alloc_disk(&tag_set, NULL, NULL);
        if (IS_ERR(disk))
                return PTR_ERR(disk);
 
index 6772e0c654fa7f885192caa273fc1220e5736682..da7a20fa6152a97462dbeeefc8fbb7a09409a91c 100644 (file)
@@ -426,11 +426,11 @@ static void reset_bdev(struct zram *zram)
        if (!zram->backing_dev)
                return;
 
-       bdev_release(zram->bdev_handle);
+       fput(zram->bdev_file);
        /* hope filp_close flush all of IO */
        filp_close(zram->backing_dev, NULL);
        zram->backing_dev = NULL;
-       zram->bdev_handle = NULL;
+       zram->bdev_file = NULL;
        zram->disk->fops = &zram_devops;
        kvfree(zram->bitmap);
        zram->bitmap = NULL;
@@ -476,7 +476,7 @@ static ssize_t backing_dev_store(struct device *dev,
        struct address_space *mapping;
        unsigned int bitmap_sz;
        unsigned long nr_pages, *bitmap = NULL;
-       struct bdev_handle *bdev_handle = NULL;
+       struct file *bdev_file = NULL;
        int err;
        struct zram *zram = dev_to_zram(dev);
 
@@ -513,11 +513,11 @@ static ssize_t backing_dev_store(struct device *dev,
                goto out;
        }
 
-       bdev_handle = bdev_open_by_dev(inode->i_rdev,
+       bdev_file = bdev_file_open_by_dev(inode->i_rdev,
                                BLK_OPEN_READ | BLK_OPEN_WRITE, zram, NULL);
-       if (IS_ERR(bdev_handle)) {
-               err = PTR_ERR(bdev_handle);
-               bdev_handle = NULL;
+       if (IS_ERR(bdev_file)) {
+               err = PTR_ERR(bdev_file);
+               bdev_file = NULL;
                goto out;
        }
 
@@ -531,7 +531,7 @@ static ssize_t backing_dev_store(struct device *dev,
 
        reset_bdev(zram);
 
-       zram->bdev_handle = bdev_handle;
+       zram->bdev_file = bdev_file;
        zram->backing_dev = backing_dev;
        zram->bitmap = bitmap;
        zram->nr_pages = nr_pages;
@@ -544,8 +544,8 @@ static ssize_t backing_dev_store(struct device *dev,
 out:
        kvfree(bitmap);
 
-       if (bdev_handle)
-               bdev_release(bdev_handle);
+       if (bdev_file)
+               fput(bdev_file);
 
        if (backing_dev)
                filp_close(backing_dev, NULL);
@@ -587,7 +587,7 @@ static void read_from_bdev_async(struct zram *zram, struct page *page,
 {
        struct bio *bio;
 
-       bio = bio_alloc(zram->bdev_handle->bdev, 1, parent->bi_opf, GFP_NOIO);
+       bio = bio_alloc(file_bdev(zram->bdev_file), 1, parent->bi_opf, GFP_NOIO);
        bio->bi_iter.bi_sector = entry * (PAGE_SIZE >> 9);
        __bio_add_page(bio, page, PAGE_SIZE, 0);
        bio_chain(bio, parent);
@@ -703,7 +703,7 @@ static ssize_t writeback_store(struct device *dev,
                        continue;
                }
 
-               bio_init(&bio, zram->bdev_handle->bdev, &bio_vec, 1,
+               bio_init(&bio, file_bdev(zram->bdev_file), &bio_vec, 1,
                         REQ_OP_WRITE | REQ_SYNC);
                bio.bi_iter.bi_sector = blk_idx * (PAGE_SIZE >> 9);
                __bio_add_page(&bio, page, PAGE_SIZE, 0);
@@ -785,7 +785,7 @@ static void zram_sync_read(struct work_struct *work)
        struct bio_vec bv;
        struct bio bio;
 
-       bio_init(&bio, zw->zram->bdev_handle->bdev, &bv, 1, REQ_OP_READ);
+       bio_init(&bio, file_bdev(zw->zram->bdev_file), &bv, 1, REQ_OP_READ);
        bio.bi_iter.bi_sector = zw->entry * (PAGE_SIZE >> 9);
        __bio_add_page(&bio, zw->page, PAGE_SIZE, 0);
        zw->error = submit_bio_wait(&bio);
@@ -2177,6 +2177,28 @@ ATTRIBUTE_GROUPS(zram_disk);
  */
 static int zram_add(void)
 {
+       struct queue_limits lim = {
+               .logical_block_size             = ZRAM_LOGICAL_BLOCK_SIZE,
+               /*
+                * To ensure that we always get PAGE_SIZE-aligned and
+                * n*PAGE_SIZE-sized I/O requests.
+                */
+               .physical_block_size            = PAGE_SIZE,
+               .io_min                         = PAGE_SIZE,
+               .io_opt                         = PAGE_SIZE,
+               .max_hw_discard_sectors         = UINT_MAX,
+               /*
+                * zram_bio_discard() will clear all logical blocks if logical
+                * block size is identical to the physical block size (PAGE_SIZE).
+                * But if it is different, we will skip discarding some parts of
+                * logical blocks in the part of the request range which isn't
+                * aligned to physical block size.  So we can't ensure that all
+                * discarded logical blocks are zeroed.
+                */
+#if ZRAM_LOGICAL_BLOCK_SIZE == PAGE_SIZE
+               .max_write_zeroes_sectors       = UINT_MAX,
+#endif
+       };
        struct zram *zram;
        int ret, device_id;
 
@@ -2195,11 +2217,11 @@ static int zram_add(void)
 #endif
 
        /* gendisk structure */
-       zram->disk = blk_alloc_disk(NUMA_NO_NODE);
-       if (!zram->disk) {
+       zram->disk = blk_alloc_disk(&lim, NUMA_NO_NODE);
+       if (IS_ERR(zram->disk)) {
                pr_err("Error allocating disk structure for device %d\n",
                        device_id);
-               ret = -ENOMEM;
+               ret = PTR_ERR(zram->disk);
                goto out_free_idr;
        }
 
@@ -2216,29 +2238,6 @@ static int zram_add(void)
        /* zram devices sort of resembles non-rotational disks */
        blk_queue_flag_set(QUEUE_FLAG_NONROT, zram->disk->queue);
        blk_queue_flag_set(QUEUE_FLAG_SYNCHRONOUS, zram->disk->queue);
-
-       /*
-        * To ensure that we always get PAGE_SIZE aligned
-        * and n*PAGE_SIZED sized I/O requests.
-        */
-       blk_queue_physical_block_size(zram->disk->queue, PAGE_SIZE);
-       blk_queue_logical_block_size(zram->disk->queue,
-                                       ZRAM_LOGICAL_BLOCK_SIZE);
-       blk_queue_io_min(zram->disk->queue, PAGE_SIZE);
-       blk_queue_io_opt(zram->disk->queue, PAGE_SIZE);
-       blk_queue_max_discard_sectors(zram->disk->queue, UINT_MAX);
-
-       /*
-        * zram_bio_discard() will clear all logical blocks if logical block
-        * size is identical with physical block size(PAGE_SIZE). But if it is
-        * different, we will skip discarding some parts of logical blocks in
-        * the part of the request range which isn't aligned to physical block
-        * size.  So we can't ensure that all discarded logical blocks are
-        * zeroed.
-        */
-       if (ZRAM_LOGICAL_BLOCK_SIZE == PAGE_SIZE)
-               blk_queue_max_write_zeroes_sectors(zram->disk->queue, UINT_MAX);
-
        blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, zram->disk->queue);
        ret = device_add_disk(NULL, zram->disk, zram_disk_groups);
        if (ret)
index 3b94d12f41b40644b112b9362d371dd1108ece24..37bf29f34d26f0c068bf29f7084bc87534416d8f 100644 (file)
@@ -132,7 +132,7 @@ struct zram {
        spinlock_t wb_limit_lock;
        bool wb_limit_enable;
        u64 bd_wb_limit;
-       struct bdev_handle *bdev_handle;
+       struct file *bdev_file;
        unsigned long *bitmap;
        unsigned long nr_pages;
 #endif
index fdb0fae88d1c584e94bdc3b206999203779cd755..b40b32fa7f1c38c5d12931ee7b06e5b8ab144d77 100644 (file)
@@ -152,7 +152,7 @@ static int qca_send_patch_config_cmd(struct hci_dev *hdev)
        bt_dev_dbg(hdev, "QCA Patch config");
 
        skb = __hci_cmd_sync_ev(hdev, EDL_PATCH_CMD_OPCODE, sizeof(cmd),
-                               cmd, HCI_EV_VENDOR, HCI_INIT_TIMEOUT);
+                               cmd, 0, HCI_INIT_TIMEOUT);
        if (IS_ERR(skb)) {
                err = PTR_ERR(skb);
                bt_dev_err(hdev, "Sending QCA Patch config failed (%d)", err);
index a617578356953c30a4a882f7928d16d464a4a04d..9a7243d5db71ff35697cf26cf7a744910f2741fd 100644 (file)
@@ -1417,7 +1417,7 @@ static int bcm4377_check_bdaddr(struct bcm4377_data *bcm4377)
 
        bda = (struct hci_rp_read_bd_addr *)skb->data;
        if (!bcm4377_is_valid_bdaddr(bcm4377, &bda->bdaddr))
-               set_bit(HCI_QUIRK_INVALID_BDADDR, &bcm4377->hdev->quirks);
+               set_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &bcm4377->hdev->quirks);
 
        kfree_skb(skb);
        return 0;
@@ -2368,7 +2368,6 @@ static int bcm4377_probe(struct pci_dev *pdev, const struct pci_device_id *id)
        hdev->set_bdaddr = bcm4377_hci_set_bdaddr;
        hdev->setup = bcm4377_hci_setup;
 
-       set_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &hdev->quirks);
        if (bcm4377->hw->broken_mws_transport_config)
                set_bit(HCI_QUIRK_BROKEN_MWS_TRANSPORT_CONFIG, &hdev->quirks);
        if (bcm4377->hw->broken_ext_scan)
index 94b8c406f0c0edf0245064bd994ea6b84637b7b1..edd2a81b4d5ed7f5f9f36058ffe9131877ddde56 100644 (file)
@@ -7,6 +7,7 @@
  *
  *  Copyright (C) 2007 Texas Instruments, Inc.
  *  Copyright (c) 2010, 2012, 2018 The Linux Foundation. All rights reserved.
+ *  Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved.
  *
  *  Acknowledgements:
  *  This file is based on hci_ll.c, which was...
@@ -1806,13 +1807,12 @@ static int qca_power_on(struct hci_dev *hdev)
 
 static void hci_coredump_qca(struct hci_dev *hdev)
 {
+       int err;
        static const u8 param[] = { 0x26 };
-       struct sk_buff *skb;
 
-       skb = __hci_cmd_sync(hdev, 0xfc0c, 1, param, HCI_CMD_TIMEOUT);
-       if (IS_ERR(skb))
-               bt_dev_err(hdev, "%s: trigger crash failed (%ld)", __func__, PTR_ERR(skb));
-       kfree_skb(skb);
+       err = __hci_cmd_send(hdev, 0xfc0c, 1, param);
+       if (err < 0)
+               bt_dev_err(hdev, "%s: trigger crash failed (%d)", __func__, err);
 }
 
 static int qca_get_data_path_id(struct hci_dev *hdev, __u8 *data_path_id)
@@ -1904,7 +1904,17 @@ retry:
        case QCA_WCN6750:
        case QCA_WCN6855:
        case QCA_WCN7850:
-               set_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &hdev->quirks);
+
+               /* Set BDA quirk bit for reading BDA value from fwnode property
+                * only if that property exists in DT.
+                */
+               if (fwnode_property_present(dev_fwnode(hdev->dev.parent), "local-bd-address")) {
+                       set_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &hdev->quirks);
+                       bt_dev_info(hdev, "setting quirk bit to read BDA from fwnode later");
+               } else {
+                       bt_dev_dbg(hdev, "local-bd-address is not present in the devicetree so not setting quirk bit for BDA");
+               }
+
                hci_set_aosp_capable(hdev);
 
                ret = qca_read_soc_version(hdev, &ver, soc_type);
index 6b5da73c85417644b5885e534c39917e4e5496a3..837bf9d51c6ec93888cec97ecde0eb2a792339e2 100644 (file)
@@ -120,7 +120,7 @@ static int imx_weim_gpr_setup(struct platform_device *pdev)
                i++;
        }
 
-       if (i == 0 || i % 4)
+       if (i == 0)
                goto err;
 
        for (i = 0; i < ARRAY_SIZE(gprvals); i++) {
index 57186c58dc849c15db2f9c25ad8c816398f29986..1d7dd3d2c101cd4412876d62162fb733c800c02c 100644 (file)
@@ -129,8 +129,12 @@ static void ax45mp_dma_cache_wback(phys_addr_t paddr, size_t size)
        unsigned long line_size;
        unsigned long flags;
 
+       if (unlikely(start == end))
+               return;
+
        line_size = ax45mp_priv.ax45mp_cache_line_size;
        start = start & (~(line_size - 1));
+       end = ((end + line_size - 1) & (~(line_size - 1)));
        local_irq_save(flags);
        ax45mp_cpu_dcache_wb_range(start, end);
        local_irq_restore(flags);
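
Note: the fix rounds the start of the range down and the end up to the cache line size, so partially covered lines are written back too. Worked example assuming a 64-byte line:

	unsigned long line_size = 64;
	unsigned long start = 0x1005 & ~(line_size - 1);		  /* 0x1000 */
	unsigned long end = (0x1043 + line_size - 1) & ~(line_size - 1); /* 0x1080 */
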
index d668b174ace92fbd7e6a8635034f01249afb67e2..eefdd422ad8e9f8f9b8c47c0b2a62b84cc4dc749 100644 (file)
@@ -724,11 +724,6 @@ static void probe_gdrom_setupdisk(void)
 
 static int probe_gdrom_setupqueue(void)
 {
-       blk_queue_logical_block_size(gd.gdrom_rq, GDROM_HARD_SECTOR);
-       /* using DMA so memory will need to be contiguous */
-       blk_queue_max_segments(gd.gdrom_rq, 1);
-       /* set a large max size to get most from DMA */
-       blk_queue_max_segment_size(gd.gdrom_rq, 0x40000);
        gd.disk->queue = gd.gdrom_rq;
        return gdrom_init_dma_mode();
 }
@@ -743,6 +738,13 @@ static const struct blk_mq_ops gdrom_mq_ops = {
  */
 static int probe_gdrom(struct platform_device *devptr)
 {
+       struct queue_limits lim = {
+               .logical_block_size             = GDROM_HARD_SECTOR,
+               /* using DMA so memory will need to be contiguous */
+               .max_segments                   = 1,
+               /* set a large max size to get most from DMA */
+               .max_segment_size               = 0x40000,
+       };
        int err;
 
        /*
@@ -778,7 +780,7 @@ static int probe_gdrom(struct platform_device *devptr)
        if (err)
                goto probe_fail_free_cd_info;
 
-       gd.disk = blk_mq_alloc_disk(&gd.tag_set, NULL);
+       gd.disk = blk_mq_alloc_disk(&gd.tag_set, &lim, NULL);
        if (IS_ERR(gd.disk)) {
                err = PTR_ERR(gd.disk);
                goto probe_fail_free_tag_set;
@@ -829,7 +831,7 @@ probe_fail_no_mem:
        return err;
 }
 
-static int remove_gdrom(struct platform_device *devptr)
+static void remove_gdrom(struct platform_device *devptr)
 {
        blk_mq_free_tag_set(&gd.tag_set);
        free_irq(HW_EVENT_GDROM_CMD, &gd);
@@ -840,13 +842,11 @@ static int remove_gdrom(struct platform_device *devptr)
        unregister_cdrom(gd.cd_info);
        kfree(gd.cd_info);
        kfree(gd.toc);
-
-       return 0;
 }
 
 static struct platform_driver gdrom_driver = {
        .probe = probe_gdrom,
-       .remove = remove_gdrom,
+       .remove_new = remove_gdrom,
        .driver = {
                        .name = GDROM_DEV_NAME,
        },
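
Note: .remove_new is the void-returning variant of the platform-driver remove callback; the error code the old callback returned was ignored by the core anyway. Sketch of the converted shape (probe callback and driver name invented):

	static void example_remove(struct platform_device *pdev)
	{
		/* tear down; nothing to return */
	}

	static struct platform_driver example_driver = {
		.probe		= example_probe,	/* assumed elsewhere */
		.remove_new	= example_remove,
		.driver		= { .name = "example" },
	};
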
index 0964bb11657f100916b85b7f00074a9bdb365c62..782993951fff8f7cc209329fc84af7f825fee143 100644 (file)
@@ -2475,7 +2475,7 @@ static const struct samsung_cmu_info misc_cmu_info __initconst = {
        .nr_clk_ids             = CLKS_NR_MISC,
        .clk_regs               = misc_clk_regs,
        .nr_clk_regs            = ARRAY_SIZE(misc_clk_regs),
-       .clk_name               = "dout_cmu_misc_bus",
+       .clk_name               = "bus",
 };
 
 /* ---- platform_driver ----------------------------------------------------- */
index e4974b508328d1ae50e839602c572581f15ada56..a933ef53845a5b4ad24b3adfb1a303d44e9b2517 100644 (file)
@@ -159,6 +159,7 @@ static int __subdev_8255_init(struct comedi_device *dev,
                return -ENOMEM;
 
        spriv->context = context;
+       spriv->io      = io;
 
        s->type         = COMEDI_SUBD_DIO;
        s->subdev_flags = SDF_READABLE | SDF_WRITABLE;
index 30ea8b53ebf8191db808b928041b0ac9a6e1512a..05ae9122823f8032bf62d8e7cd1f7570115a04f8 100644 (file)
@@ -87,6 +87,8 @@ struct waveform_private {
        struct comedi_device *dev;      /* parent comedi device */
        u64 ao_last_scan_time;          /* time of previous AO scan in usec */
        unsigned int ao_scan_period;    /* AO scan period in usec */
+       bool ai_timer_enable:1;         /* should AI timer be running? */
+       bool ao_timer_enable:1;         /* should AO timer be running? */
        unsigned short ao_loopbacks[N_CHANS];
 };
 
@@ -236,8 +238,12 @@ static void waveform_ai_timer(struct timer_list *t)
                        time_increment = devpriv->ai_convert_time - now;
                else
                        time_increment = 1;
-               mod_timer(&devpriv->ai_timer,
-                         jiffies + usecs_to_jiffies(time_increment));
+               spin_lock(&dev->spinlock);
+               if (devpriv->ai_timer_enable) {
+                       mod_timer(&devpriv->ai_timer,
+                                 jiffies + usecs_to_jiffies(time_increment));
+               }
+               spin_unlock(&dev->spinlock);
        }
 
 overrun:
@@ -393,9 +399,12 @@ static int waveform_ai_cmd(struct comedi_device *dev,
         * Seem to need an extra jiffy here, otherwise timer expires slightly
         * early!
         */
+       spin_lock_bh(&dev->spinlock);
+       devpriv->ai_timer_enable = true;
        devpriv->ai_timer.expires =
                jiffies + usecs_to_jiffies(devpriv->ai_convert_period) + 1;
        add_timer(&devpriv->ai_timer);
+       spin_unlock_bh(&dev->spinlock);
        return 0;
 }
 
@@ -404,6 +413,9 @@ static int waveform_ai_cancel(struct comedi_device *dev,
 {
        struct waveform_private *devpriv = dev->private;
 
+       spin_lock_bh(&dev->spinlock);
+       devpriv->ai_timer_enable = false;
+       spin_unlock_bh(&dev->spinlock);
        if (in_softirq()) {
                /* Assume we were called from the timer routine itself. */
                del_timer(&devpriv->ai_timer);
@@ -495,8 +507,12 @@ static void waveform_ao_timer(struct timer_list *t)
                unsigned int time_inc = devpriv->ao_last_scan_time +
                                        devpriv->ao_scan_period - now;
 
-               mod_timer(&devpriv->ao_timer,
-                         jiffies + usecs_to_jiffies(time_inc));
+               spin_lock(&dev->spinlock);
+               if (devpriv->ao_timer_enable) {
+                       mod_timer(&devpriv->ao_timer,
+                                 jiffies + usecs_to_jiffies(time_inc));
+               }
+               spin_unlock(&dev->spinlock);
        }
 
 underrun:
@@ -517,9 +533,12 @@ static int waveform_ao_inttrig_start(struct comedi_device *dev,
        async->inttrig = NULL;
 
        devpriv->ao_last_scan_time = ktime_to_us(ktime_get());
+       spin_lock_bh(&dev->spinlock);
+       devpriv->ao_timer_enable = true;
        devpriv->ao_timer.expires =
                jiffies + usecs_to_jiffies(devpriv->ao_scan_period);
        add_timer(&devpriv->ao_timer);
+       spin_unlock_bh(&dev->spinlock);
 
        return 1;
 }
@@ -604,6 +623,9 @@ static int waveform_ao_cancel(struct comedi_device *dev,
        struct waveform_private *devpriv = dev->private;
 
        s->async->inttrig = NULL;
+       spin_lock_bh(&dev->spinlock);
+       devpriv->ao_timer_enable = false;
+       spin_unlock_bh(&dev->spinlock);
        if (in_softirq()) {
                /* Assume we were called from the timer routine itself. */
                del_timer(&devpriv->ao_timer);
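
Note: the comedi_test hunks close a rearm race — the timer callback could otherwise call mod_timer() after cancel had already run del_timer(). The pattern is a spinlock-protected enable flag, cleared under the lock before del_timer(). A compact sketch with invented names:

	static DEFINE_SPINLOCK(example_lock);
	static bool timer_enable;
	static struct timer_list example_timer;

	static void example_timer_fn(struct timer_list *t)
	{
		/* ... do the periodic work ... */
		spin_lock(&example_lock);
		if (timer_enable)
			mod_timer(&example_timer, jiffies + 1);
		spin_unlock(&example_lock);
	}

	static void example_cancel(void)
	{
		spin_lock_bh(&example_lock);
		timer_enable = false;
		spin_unlock_bh(&example_lock);
		del_timer(&example_timer);	/* cannot be rearmed any more */
	}
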
index 3d5e6d705fc6ee3a0224a0b1fa57c32076fe306f..44b19e69617632bf4951d8da1e514f9c0c689d4b 100644 (file)
@@ -108,9 +108,8 @@ static inline void send_msg(struct cn_msg *msg)
                filter_data[1] = 0;
        }
 
-       if (cn_netlink_send_mult(msg, msg->len, 0, CN_IDX_PROC, GFP_NOWAIT,
-                            cn_filter, (void *)filter_data) == -ESRCH)
-               atomic_set(&proc_event_num_listeners, 0);
+       cn_netlink_send_mult(msg, msg->len, 0, CN_IDX_PROC, GFP_NOWAIT,
+                            cn_filter, (void *)filter_data);
 
        local_unlock(&local_event.lock);
 }
index 09c77afb33ca84e79c077c87659252c64840929a..3f24481fc04a1258624020a9a919f2d10640057a 100644 (file)
@@ -31,10 +31,11 @@ struct counter_device_allochelper {
        struct counter_device counter;
 
        /*
-        * This is cache line aligned to ensure private data behaves like if it
-        * were kmalloced separately.
+        * This ensures private data behaves as if it were kmalloced
+        * separately. Also ensures the minimum alignment for safe DMA
+        * operations (which may or may not mean cache alignment).
         */
-       unsigned long privdata[] ____cacheline_aligned;
+       unsigned long privdata[] __aligned(ARCH_DMA_MINALIGN);
 };
 
 static void counter_device_release(struct device *dev)
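
Note: the allocation helper embeds driver-private data as a trailing flexible array; aligning it to ARCH_DMA_MINALIGN (rather than to the cache line size) matches the guarantee a separate kmalloc() would give, which is what callers rely on for DMA safety. Sketch of the layout, with an invented core struct:

	struct example_allochelper {
		struct example_core core;	/* assumed core object */
		unsigned long privdata[] __aligned(ARCH_DMA_MINALIGN);
	};
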
index 1f6186475715e0592df1028ade0a336703338b15..1791d37fbc53c57e0f13469934eee357c0de87cc 100644 (file)
@@ -1232,14 +1232,13 @@ static void amd_pstate_epp_update_limit(struct cpufreq_policy *policy)
        max_limit_perf = div_u64(policy->max * cpudata->highest_perf, cpudata->max_freq);
        min_limit_perf = div_u64(policy->min * cpudata->highest_perf, cpudata->max_freq);
 
+       WRITE_ONCE(cpudata->max_limit_perf, max_limit_perf);
+       WRITE_ONCE(cpudata->min_limit_perf, min_limit_perf);
+
        max_perf = clamp_t(unsigned long, max_perf, cpudata->min_limit_perf,
                        cpudata->max_limit_perf);
        min_perf = clamp_t(unsigned long, min_perf, cpudata->min_limit_perf,
                        cpudata->max_limit_perf);
-
-       WRITE_ONCE(cpudata->max_limit_perf, max_limit_perf);
-       WRITE_ONCE(cpudata->min_limit_perf, min_limit_perf);
-
        value = READ_ONCE(cpudata->cppc_req_cached);
 
        if (cpudata->policy == CPUFREQ_POLICY_PERFORMANCE)
index 2ca70b0b5fdc5d39990bb88ba4dcd1f7e7131d31..79619227ea511b5247ca7941400ae821b1030f73 100644 (file)
@@ -529,6 +529,30 @@ static int intel_pstate_cppc_get_scaling(int cpu)
 }
 #endif /* CONFIG_ACPI_CPPC_LIB */
 
+static int intel_pstate_freq_to_hwp_rel(struct cpudata *cpu, int freq,
+                                       unsigned int relation)
+{
+       if (freq == cpu->pstate.turbo_freq)
+               return cpu->pstate.turbo_pstate;
+
+       if (freq == cpu->pstate.max_freq)
+               return cpu->pstate.max_pstate;
+
+       switch (relation) {
+       case CPUFREQ_RELATION_H:
+               return freq / cpu->pstate.scaling;
+       case CPUFREQ_RELATION_C:
+               return DIV_ROUND_CLOSEST(freq, cpu->pstate.scaling);
+       }
+
+       return DIV_ROUND_UP(freq, cpu->pstate.scaling);
+}
+
+static int intel_pstate_freq_to_hwp(struct cpudata *cpu, int freq)
+{
+       return intel_pstate_freq_to_hwp_rel(cpu, freq, CPUFREQ_RELATION_L);
+}
+
 /**
  * intel_pstate_hybrid_hwp_adjust - Calibrate HWP performance levels.
  * @cpu: Target CPU.
@@ -546,6 +570,7 @@ static void intel_pstate_hybrid_hwp_adjust(struct cpudata *cpu)
        int perf_ctl_scaling = cpu->pstate.perf_ctl_scaling;
        int perf_ctl_turbo = pstate_funcs.get_turbo(cpu->cpu);
        int scaling = cpu->pstate.scaling;
+       int freq;
 
        pr_debug("CPU%d: perf_ctl_max_phys = %d\n", cpu->cpu, perf_ctl_max_phys);
        pr_debug("CPU%d: perf_ctl_turbo = %d\n", cpu->cpu, perf_ctl_turbo);
@@ -559,16 +584,16 @@ static void intel_pstate_hybrid_hwp_adjust(struct cpudata *cpu)
        cpu->pstate.max_freq = rounddown(cpu->pstate.max_pstate * scaling,
                                         perf_ctl_scaling);
 
-       cpu->pstate.max_pstate_physical =
-                       DIV_ROUND_UP(perf_ctl_max_phys * perf_ctl_scaling,
-                                    scaling);
+       freq = perf_ctl_max_phys * perf_ctl_scaling;
+       cpu->pstate.max_pstate_physical = intel_pstate_freq_to_hwp(cpu, freq);
 
-       cpu->pstate.min_freq = cpu->pstate.min_pstate * perf_ctl_scaling;
+       freq = cpu->pstate.min_pstate * perf_ctl_scaling;
+       cpu->pstate.min_freq = freq;
        /*
         * Cast the min P-state value retrieved via pstate_funcs.get_min() to
         * the effective range of HWP performance levels.
         */
-       cpu->pstate.min_pstate = DIV_ROUND_UP(cpu->pstate.min_freq, scaling);
+       cpu->pstate.min_pstate = intel_pstate_freq_to_hwp(cpu, freq);
 }
 
 static inline void update_turbo_state(void)
@@ -2528,13 +2553,12 @@ static void intel_pstate_update_perf_limits(struct cpudata *cpu,
         * abstract values to represent performance rather than pure ratios.
         */
        if (hwp_active && cpu->pstate.scaling != perf_ctl_scaling) {
-               int scaling = cpu->pstate.scaling;
                int freq;
 
                freq = max_policy_perf * perf_ctl_scaling;
-               max_policy_perf = DIV_ROUND_UP(freq, scaling);
+               max_policy_perf = intel_pstate_freq_to_hwp(cpu, freq);
                freq = min_policy_perf * perf_ctl_scaling;
-               min_policy_perf = DIV_ROUND_UP(freq, scaling);
+               min_policy_perf = intel_pstate_freq_to_hwp(cpu, freq);
        }
 
        pr_debug("cpu:%d min_policy_perf:%d max_policy_perf:%d\n",
@@ -2908,18 +2932,7 @@ static int intel_cpufreq_target(struct cpufreq_policy *policy,
 
        cpufreq_freq_transition_begin(policy, &freqs);
 
-       switch (relation) {
-       case CPUFREQ_RELATION_L:
-               target_pstate = DIV_ROUND_UP(freqs.new, cpu->pstate.scaling);
-               break;
-       case CPUFREQ_RELATION_H:
-               target_pstate = freqs.new / cpu->pstate.scaling;
-               break;
-       default:
-               target_pstate = DIV_ROUND_CLOSEST(freqs.new, cpu->pstate.scaling);
-               break;
-       }
-
+       target_pstate = intel_pstate_freq_to_hwp_rel(cpu, freqs.new, relation);
        target_pstate = intel_cpufreq_update_pstate(policy, target_pstate, false);
 
        freqs.new = target_pstate * cpu->pstate.scaling;
@@ -2937,7 +2950,7 @@ static unsigned int intel_cpufreq_fast_switch(struct cpufreq_policy *policy,
 
        update_turbo_state();
 
-       target_pstate = DIV_ROUND_UP(target_freq, cpu->pstate.scaling);
+       target_pstate = intel_pstate_freq_to_hwp(cpu, target_freq);
 
        target_pstate = intel_cpufreq_update_pstate(policy, target_pstate, true);
 
@@ -2974,6 +2987,9 @@ static void intel_cpufreq_adjust_perf(unsigned int cpunum,
        if (min_pstate < cpu->min_perf_ratio)
                min_pstate = cpu->min_perf_ratio;
 
+       if (min_pstate > cpu->max_perf_ratio)
+               min_pstate = cpu->max_perf_ratio;
+
        max_pstate = min(cap_pstate, cpu->max_perf_ratio);
        if (max_pstate < min_pstate)
                max_pstate = min_pstate;
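
Note: the new intel_pstate_freq_to_hwp_rel() helper centralizes the three rounding rules. As a worked example, assume a scaling of 100000 kHz per P-state step and a target of 1250000 kHz:

	/*
	 * CPUFREQ_RELATION_L (lowest freq >= target):
	 *	DIV_ROUND_UP(1250000, 100000)      -> 13
	 * CPUFREQ_RELATION_H (highest freq <= target):
	 *	1250000 / 100000                   -> 12
	 * CPUFREQ_RELATION_C (closest freq):
	 *	DIV_ROUND_CLOSEST(1250000, 100000) -> 13
	 */
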
index 1262a7773ef304d184799771166ca5700fb7871a..de50c00ba218fb19302438b6df29f24a38a9c591 100644 (file)
@@ -299,22 +299,6 @@ theend:
        return err;
 }
 
-static void sun8i_ce_cipher_run(struct crypto_engine *engine, void *areq)
-{
-       struct skcipher_request *breq = container_of(areq, struct skcipher_request, base);
-       struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(breq);
-       struct sun8i_cipher_tfm_ctx *op = crypto_skcipher_ctx(tfm);
-       struct sun8i_ce_dev *ce = op->ce;
-       struct sun8i_cipher_req_ctx *rctx = skcipher_request_ctx(breq);
-       int flow, err;
-
-       flow = rctx->flow;
-       err = sun8i_ce_run_task(ce, flow, crypto_tfm_alg_name(breq->base.tfm));
-       local_bh_disable();
-       crypto_finalize_skcipher_request(engine, breq, err);
-       local_bh_enable();
-}
-
 static void sun8i_ce_cipher_unprepare(struct crypto_engine *engine,
                                      void *async_req)
 {
@@ -360,6 +344,23 @@ static void sun8i_ce_cipher_unprepare(struct crypto_engine *engine,
        dma_unmap_single(ce->dev, rctx->addr_key, op->keylen, DMA_TO_DEVICE);
 }
 
+static void sun8i_ce_cipher_run(struct crypto_engine *engine, void *areq)
+{
+       struct skcipher_request *breq = container_of(areq, struct skcipher_request, base);
+       struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(breq);
+       struct sun8i_cipher_tfm_ctx *op = crypto_skcipher_ctx(tfm);
+       struct sun8i_ce_dev *ce = op->ce;
+       struct sun8i_cipher_req_ctx *rctx = skcipher_request_ctx(breq);
+       int flow, err;
+
+       flow = rctx->flow;
+       err = sun8i_ce_run_task(ce, flow, crypto_tfm_alg_name(breq->base.tfm));
+       sun8i_ce_cipher_unprepare(engine, areq);
+       local_bh_disable();
+       crypto_finalize_skcipher_request(engine, breq, err);
+       local_bh_enable();
+}
+
 int sun8i_ce_cipher_do_one(struct crypto_engine *engine, void *areq)
 {
        int err = sun8i_ce_cipher_prepare(engine, areq);
@@ -368,7 +369,6 @@ int sun8i_ce_cipher_do_one(struct crypto_engine *engine, void *areq)
                return err;
 
        sun8i_ce_cipher_run(engine, areq);
-       sun8i_ce_cipher_unprepare(engine, areq);
        return 0;
 }
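
Note: this and the rk3288 hunk below apply the same ordering rule: DMA buffers must be unmapped (the "unprepare" step) before the request is finalized, because finalization may complete and free the request. Sketch mirroring the shape above, with invented hardware/unprepare helpers:

	static void example_run(struct crypto_engine *engine, void *areq)
	{
		struct skcipher_request *breq =
			container_of(areq, struct skcipher_request, base);
		int err = example_do_hw(breq);		/* assumed hardware step */

		example_unprepare(engine, areq);	/* unmap DMA first */
		local_bh_disable();
		crypto_finalize_skcipher_request(engine, breq, err);
		local_bh_enable();
	}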
 
index a148ff1f0872c419fc2198f64174d26e45342289..a4f6884416a0486181426c8a22d885f4f0534ea0 100644 (file)
@@ -4545,6 +4545,7 @@ struct caam_hash_alg {
        struct list_head entry;
        struct device *dev;
        int alg_type;
+       bool is_hmac;
        struct ahash_alg ahash_alg;
 };
 
@@ -4571,7 +4572,7 @@ static int caam_hash_cra_init(struct crypto_tfm *tfm)
 
        ctx->dev = caam_hash->dev;
 
-       if (alg->setkey) {
+       if (caam_hash->is_hmac) {
                ctx->adata.key_dma = dma_map_single_attrs(ctx->dev, ctx->key,
                                                          ARRAY_SIZE(ctx->key),
                                                          DMA_TO_DEVICE,
@@ -4611,7 +4612,7 @@ static int caam_hash_cra_init(struct crypto_tfm *tfm)
         * For keyed hash algorithms shared descriptors
         * will be created later in setkey() callback
         */
-       return alg->setkey ? 0 : ahash_set_sh_desc(ahash);
+       return caam_hash->is_hmac ? 0 : ahash_set_sh_desc(ahash);
 }
 
 static void caam_hash_cra_exit(struct crypto_tfm *tfm)
@@ -4646,12 +4647,14 @@ static struct caam_hash_alg *caam_hash_alloc(struct device *dev,
                         template->hmac_name);
                snprintf(alg->cra_driver_name, CRYPTO_MAX_ALG_NAME, "%s",
                         template->hmac_driver_name);
+               t_alg->is_hmac = true;
        } else {
                snprintf(alg->cra_name, CRYPTO_MAX_ALG_NAME, "%s",
                         template->name);
                snprintf(alg->cra_driver_name, CRYPTO_MAX_ALG_NAME, "%s",
                         template->driver_name);
                t_alg->ahash_alg.setkey = NULL;
+               t_alg->is_hmac = false;
        }
        alg->cra_module = THIS_MODULE;
        alg->cra_init = caam_hash_cra_init;
index 290c8500c247f9cbf20fb055e3715400a5f30646..fdd724228c2fa8accc7c7ebc1244c5ee92423247 100644 (file)
@@ -1753,6 +1753,7 @@ static struct caam_hash_template driver_hash[] = {
 struct caam_hash_alg {
        struct list_head entry;
        int alg_type;
+       bool is_hmac;
        struct ahash_engine_alg ahash_alg;
 };
 
@@ -1804,7 +1805,7 @@ static int caam_hash_cra_init(struct crypto_tfm *tfm)
        } else {
                if (priv->era >= 6) {
                        ctx->dir = DMA_BIDIRECTIONAL;
-                       ctx->key_dir = alg->setkey ? DMA_TO_DEVICE : DMA_NONE;
+                       ctx->key_dir = caam_hash->is_hmac ? DMA_TO_DEVICE : DMA_NONE;
                } else {
                        ctx->dir = DMA_TO_DEVICE;
                        ctx->key_dir = DMA_NONE;
@@ -1862,7 +1863,7 @@ static int caam_hash_cra_init(struct crypto_tfm *tfm)
         * For keyed hash algorithms shared descriptors
         * will be created later in setkey() callback
         */
-       return alg->setkey ? 0 : ahash_set_sh_desc(ahash);
+       return caam_hash->is_hmac ? 0 : ahash_set_sh_desc(ahash);
 }
 
 static void caam_hash_cra_exit(struct crypto_tfm *tfm)
@@ -1915,12 +1916,14 @@ caam_hash_alloc(struct caam_hash_template *template,
                         template->hmac_name);
                snprintf(alg->cra_driver_name, CRYPTO_MAX_ALG_NAME, "%s",
                         template->hmac_driver_name);
+               t_alg->is_hmac = true;
        } else {
                snprintf(alg->cra_name, CRYPTO_MAX_ALG_NAME, "%s",
                         template->name);
                snprintf(alg->cra_driver_name, CRYPTO_MAX_ALG_NAME, "%s",
                         template->driver_name);
                halg->setkey = NULL;
+               t_alg->is_hmac = false;
        }
        alg->cra_module = THIS_MODULE;
        alg->cra_init = caam_hash_cra_init;
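
Note: both caam hunks switch the "is this a keyed hash?" test from checking whether ->setkey is non-NULL to an explicit is_hmac flag recorded at registration — presumably because the ahash core can install a default setkey, so the NULL test no longer distinguishes HMAC variants (stated here as an assumption, not from this diff). The shape of the pattern:

	struct example_hash_alg {
		bool is_hmac;		/* decided once, at template expansion */
		struct ahash_alg ahash_alg;
	};

	/* in cra_init: branch on the flag, not on ->setkey */
	if (t_alg->is_hmac)
		example_map_key();	/* assumed keyed-path setup */
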
index e4d3f45242f63258ea0efc9f0a0a7ca9b411333c..b04bc1d3d627d447c2cfc10b9078b040800c8406 100644 (file)
@@ -534,10 +534,16 @@ EXPORT_SYMBOL_GPL(sev_platform_init);
 
 static int __sev_platform_shutdown_locked(int *error)
 {
-       struct sev_device *sev = psp_master->sev_data;
+       struct psp_device *psp = psp_master;
+       struct sev_device *sev;
        int ret;
 
-       if (!sev || sev->state == SEV_STATE_UNINIT)
+       if (!psp || !psp->sev_data)
+               return 0;
+
+       sev = psp->sev_data;
+
+       if (sev->state == SEV_STATE_UNINIT)
                return 0;
 
        ret = __sev_do_cmd_locked(SEV_CMD_SHUTDOWN, NULL, error);
index 479062aa5e6b61c2706ff8b4f4fe912f52ded3dc..94a0ebb03d8c96804b455f73a8d8b3155baab866 100644 (file)
@@ -463,6 +463,7 @@ void adf_init_hw_data_4xxx(struct adf_hw_device_data *hw_data, u32 dev_id)
                hw_data->fw_name = ADF_402XX_FW;
                hw_data->fw_mmp_name = ADF_402XX_MMP;
                hw_data->uof_get_name = uof_get_name_402xx;
+               hw_data->get_ena_thd_mask = get_ena_thd_mask;
                break;
        case ADF_401XX_PCI_DEVICE_ID:
                hw_data->fw_name = ADF_4XXX_FW;
index 1b13b4aa16ecc441a37266996f1b4aca6863a436..a235e6c300f1e5419eb06945757946ced70f12e2 100644 (file)
@@ -332,12 +332,12 @@ static int rk_hash_run(struct crypto_engine *engine, void *breq)
 theend:
        pm_runtime_put_autosuspend(rkc->dev);
 
+       rk_hash_unprepare(engine, breq);
+
        local_bh_disable();
        crypto_finalize_hash_request(engine, breq, err);
        local_bh_enable();
 
-       rk_hash_unprepare(engine, breq);
-
        return 0;
 }
 
index 2621ff8a93764d4ad905bcfe7e52331f45bb2c71..de53eddf6796b6c6ac6eafdeaee9a7ee03c979d3 100644 (file)
@@ -104,7 +104,8 @@ static void virtio_crypto_dataq_akcipher_callback(struct virtio_crypto_request *
 }
 
 static int virtio_crypto_alg_akcipher_init_session(struct virtio_crypto_akcipher_ctx *ctx,
-               struct virtio_crypto_ctrl_header *header, void *para,
+               struct virtio_crypto_ctrl_header *header,
+               struct virtio_crypto_akcipher_session_para *para,
                const uint8_t *key, unsigned int keylen)
 {
        struct scatterlist outhdr_sg, key_sg, inhdr_sg, *sgs[3];
@@ -128,7 +129,7 @@ static int virtio_crypto_alg_akcipher_init_session(struct virtio_crypto_akcipher
 
        ctrl = &vc_ctrl_req->ctrl;
        memcpy(&ctrl->header, header, sizeof(ctrl->header));
-       memcpy(&ctrl->u, para, sizeof(ctrl->u));
+       memcpy(&ctrl->u.akcipher_create_session.para, para, sizeof(*para));
        input = &vc_ctrl_req->input;
        input->status = cpu_to_le32(VIRTIO_CRYPTO_ERR);
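
Note: the virtio-crypto fix is an instance of a general memcpy sizing bug — copying sizeof() of the (larger) destination union out of a smaller source overreads it; typing the parameter pins the copy to the member actually passed. Generic sketch:

	struct small { u32 a; };
	union big { struct small s; u8 pad[64]; };

	static void fill(union big *dst, const struct small *src)
	{
		/* memcpy(dst, src, sizeof(*dst)) would read 64 bytes from a
		 * 4-byte object; size the copy by the source instead: */
		memcpy(&dst->s, src, sizeof(*src));
	}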
 
index dcf2b39e1048822ca90324667d85f68225c05fa4..1a3e6aafbdcc33dd2aae8731be8a5ad52cc0891e 100644 (file)
@@ -316,31 +316,27 @@ static const struct cxl_root_ops acpi_root_ops = {
        .qos_class = cxl_acpi_qos_class,
 };
 
-static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
-                          const unsigned long end)
+static int __cxl_parse_cfmws(struct acpi_cedt_cfmws *cfmws,
+                            struct cxl_cfmws_context *ctx)
 {
        int target_map[CXL_DECODER_MAX_INTERLEAVE];
-       struct cxl_cfmws_context *ctx = arg;
        struct cxl_port *root_port = ctx->root_port;
        struct resource *cxl_res = ctx->cxl_res;
        struct cxl_cxims_context cxims_ctx;
        struct cxl_root_decoder *cxlrd;
        struct device *dev = ctx->dev;
-       struct acpi_cedt_cfmws *cfmws;
        cxl_calc_hb_fn cxl_calc_hb;
        struct cxl_decoder *cxld;
        unsigned int ways, i, ig;
        struct resource *res;
        int rc;
 
-       cfmws = (struct acpi_cedt_cfmws *) header;
-
        rc = cxl_acpi_cfmws_verify(dev, cfmws);
        if (rc) {
                dev_err(dev, "CFMWS range %#llx-%#llx not registered\n",
                        cfmws->base_hpa,
                        cfmws->base_hpa + cfmws->window_size - 1);
-               return 0;
+               return rc;
        }
 
        rc = eiw_to_ways(cfmws->interleave_ways, &ways);
@@ -376,7 +372,7 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
 
        cxlrd = cxl_root_decoder_alloc(root_port, ways, cxl_calc_hb);
        if (IS_ERR(cxlrd))
-               return 0;
+               return PTR_ERR(cxlrd);
 
        cxld = &cxlrd->cxlsd.cxld;
        cxld->flags = cfmws_to_decoder_flags(cfmws->restrictions);
@@ -420,16 +416,7 @@ err_xormap:
                put_device(&cxld->dev);
        else
                rc = cxl_decoder_autoremove(dev, cxld);
-       if (rc) {
-               dev_err(dev, "Failed to add decode range: %pr", res);
-               return rc;
-       }
-       dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n",
-               dev_name(&cxld->dev),
-               phys_to_target_node(cxld->hpa_range.start),
-               cxld->hpa_range.start, cxld->hpa_range.end);
-
-       return 0;
+       return rc;
 
 err_insert:
        kfree(res->name);
@@ -438,6 +425,29 @@ err_name:
        return -ENOMEM;
 }
 
+static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
+                          const unsigned long end)
+{
+       struct acpi_cedt_cfmws *cfmws = (struct acpi_cedt_cfmws *)header;
+       struct cxl_cfmws_context *ctx = arg;
+       struct device *dev = ctx->dev;
+       int rc;
+
+       rc = __cxl_parse_cfmws(cfmws, ctx);
+       if (rc)
+               dev_err(dev,
+                       "Failed to add decode range: [%#llx - %#llx] (%d)\n",
+                       cfmws->base_hpa,
+                       cfmws->base_hpa + cfmws->window_size - 1, rc);
+       else
+               dev_dbg(dev, "decode range: node: %d range [%#llx - %#llx]\n",
+                       phys_to_target_node(cfmws->base_hpa), cfmws->base_hpa,
+                       cfmws->base_hpa + cfmws->window_size - 1);
+
+       /* never fail cxl_acpi load for a single window failure */
+       return 0;
+}
+
 __mock struct acpi_device *to_cxl_host_bridge(struct device *host,
                                              struct device *dev)
 {
index 6fe11546889fabb48e997fda83e1f184a64179c6..08fd0baea7a0eb0f1c1442e9f454e3c32736d19c 100644 (file)
@@ -210,19 +210,12 @@ static int cxl_port_perf_data_calculate(struct cxl_port *port,
        return 0;
 }
 
-static void add_perf_entry(struct device *dev, struct dsmas_entry *dent,
-                          struct list_head *list)
+static void update_perf_entry(struct device *dev, struct dsmas_entry *dent,
+                             struct cxl_dpa_perf *dpa_perf)
 {
-       struct cxl_dpa_perf *dpa_perf;
-
-       dpa_perf = kzalloc(sizeof(*dpa_perf), GFP_KERNEL);
-       if (!dpa_perf)
-               return;
-
        dpa_perf->dpa_range = dent->dpa_range;
        dpa_perf->coord = dent->coord;
        dpa_perf->qos_class = dent->qos_class;
-       list_add_tail(&dpa_perf->list, list);
        dev_dbg(dev,
                "DSMAS: dpa: %#llx qos: %d read_bw: %d write_bw %d read_lat: %d write_lat: %d\n",
                dent->dpa_range.start, dpa_perf->qos_class,
@@ -230,20 +223,6 @@ static void add_perf_entry(struct device *dev, struct dsmas_entry *dent,
                dent->coord.read_latency, dent->coord.write_latency);
 }
 
-static void free_perf_ents(void *data)
-{
-       struct cxl_memdev_state *mds = data;
-       struct cxl_dpa_perf *dpa_perf, *n;
-       LIST_HEAD(discard);
-
-       list_splice_tail_init(&mds->ram_perf_list, &discard);
-       list_splice_tail_init(&mds->pmem_perf_list, &discard);
-       list_for_each_entry_safe(dpa_perf, n, &discard, list) {
-               list_del(&dpa_perf->list);
-               kfree(dpa_perf);
-       }
-}
-
 static void cxl_memdev_set_qos_class(struct cxl_dev_state *cxlds,
                                     struct xarray *dsmas_xa)
 {
@@ -263,16 +242,14 @@ static void cxl_memdev_set_qos_class(struct cxl_dev_state *cxlds,
        xa_for_each(dsmas_xa, index, dent) {
                if (resource_size(&cxlds->ram_res) &&
                    range_contains(&ram_range, &dent->dpa_range))
-                       add_perf_entry(dev, dent, &mds->ram_perf_list);
+                       update_perf_entry(dev, dent, &mds->ram_perf);
                else if (resource_size(&cxlds->pmem_res) &&
                         range_contains(&pmem_range, &dent->dpa_range))
-                       add_perf_entry(dev, dent, &mds->pmem_perf_list);
+                       update_perf_entry(dev, dent, &mds->pmem_perf);
                else
                        dev_dbg(dev, "no partition for dsmas dpa: %#llx\n",
                                dent->dpa_range.start);
        }
-
-       devm_add_action_or_reset(&cxlds->cxlmd->dev, free_perf_ents, mds);
 }
 
 static int match_cxlrd_qos_class(struct device *dev, void *data)
@@ -293,24 +270,24 @@ static int match_cxlrd_qos_class(struct device *dev, void *data)
        return 0;
 }
 
-static void cxl_qos_match(struct cxl_port *root_port,
-                         struct list_head *work_list,
-                         struct list_head *discard_list)
+static void reset_dpa_perf(struct cxl_dpa_perf *dpa_perf)
 {
-       struct cxl_dpa_perf *dpa_perf, *n;
+       *dpa_perf = (struct cxl_dpa_perf) {
+               .qos_class = CXL_QOS_CLASS_INVALID,
+       };
+}
 
-       list_for_each_entry_safe(dpa_perf, n, work_list, list) {
-               int rc;
+static bool cxl_qos_match(struct cxl_port *root_port,
+                         struct cxl_dpa_perf *dpa_perf)
+{
+       if (dpa_perf->qos_class == CXL_QOS_CLASS_INVALID)
+               return false;
 
-               if (dpa_perf->qos_class == CXL_QOS_CLASS_INVALID)
-                       return;
+       if (!device_for_each_child(&root_port->dev, &dpa_perf->qos_class,
+                                  match_cxlrd_qos_class))
+               return false;
 
-               rc = device_for_each_child(&root_port->dev,
-                                          (void *)&dpa_perf->qos_class,
-                                          match_cxlrd_qos_class);
-               if (!rc)
-                       list_move_tail(&dpa_perf->list, discard_list);
-       }
+       return true;
 }
 
 static int match_cxlrd_hb(struct device *dev, void *data)
@@ -334,23 +311,10 @@ static int match_cxlrd_hb(struct device *dev, void *data)
        return 0;
 }
 
-static void discard_dpa_perf(struct list_head *list)
-{
-       struct cxl_dpa_perf *dpa_perf, *n;
-
-       list_for_each_entry_safe(dpa_perf, n, list, list) {
-               list_del(&dpa_perf->list);
-               kfree(dpa_perf);
-       }
-}
-DEFINE_FREE(dpa_perf, struct list_head *, if (!list_empty(_T)) discard_dpa_perf(_T))
-
 static int cxl_qos_class_verify(struct cxl_memdev *cxlmd)
 {
        struct cxl_dev_state *cxlds = cxlmd->cxlds;
        struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
-       LIST_HEAD(__discard);
-       struct list_head *discard __free(dpa_perf) = &__discard;
        struct cxl_port *root_port;
        int rc;
 
@@ -363,16 +327,17 @@ static int cxl_qos_class_verify(struct cxl_memdev *cxlmd)
        root_port = &cxl_root->port;
 
        /* Check that the QTG IDs are all sane between end device and root decoders */
-       cxl_qos_match(root_port, &mds->ram_perf_list, discard);
-       cxl_qos_match(root_port, &mds->pmem_perf_list, discard);
+       if (!cxl_qos_match(root_port, &mds->ram_perf))
+               reset_dpa_perf(&mds->ram_perf);
+       if (!cxl_qos_match(root_port, &mds->pmem_perf))
+               reset_dpa_perf(&mds->pmem_perf);
 
        /* Check to make sure that the device's host bridge is under a root decoder */
        rc = device_for_each_child(&root_port->dev,
-                                  (void *)cxlmd->endpoint->host_bridge,
-                                  match_cxlrd_hb);
+                                  cxlmd->endpoint->host_bridge, match_cxlrd_hb);
        if (!rc) {
-               list_splice_tail_init(&mds->ram_perf_list, discard);
-               list_splice_tail_init(&mds->pmem_perf_list, discard);
+               reset_dpa_perf(&mds->ram_perf);
+               reset_dpa_perf(&mds->pmem_perf);
        }
 
        return rc;
@@ -417,6 +382,7 @@ void cxl_endpoint_parse_cdat(struct cxl_port *port)
 
        cxl_memdev_set_qos_class(cxlds, dsmas_xa);
        cxl_qos_class_verify(cxlmd);
+       cxl_memdev_update_perf(cxlmd);
 }
 EXPORT_SYMBOL_NS_GPL(cxl_endpoint_parse_cdat, CXL);
 
index 27166a41170579a9441a2f9bf3e2a915ed85d893..9adda4795eb786b8658b573dd1e79befbad52255 100644 (file)
@@ -1391,8 +1391,8 @@ struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
        mds->cxlds.reg_map.host = dev;
        mds->cxlds.reg_map.resource = CXL_RESOURCE_NONE;
        mds->cxlds.type = CXL_DEVTYPE_CLASSMEM;
-       INIT_LIST_HEAD(&mds->ram_perf_list);
-       INIT_LIST_HEAD(&mds->pmem_perf_list);
+       mds->ram_perf.qos_class = CXL_QOS_CLASS_INVALID;
+       mds->pmem_perf.qos_class = CXL_QOS_CLASS_INVALID;
 
        return mds;
 }
index dae8802ecdb01ee748e3891120bc0011e9e8894e..d4e259f3a7e914b9e3f17330cbc57f691d1976c2 100644 (file)
@@ -447,13 +447,41 @@ static struct attribute *cxl_memdev_attributes[] = {
        NULL,
 };
 
+static ssize_t pmem_qos_class_show(struct device *dev,
+                                  struct device_attribute *attr, char *buf)
+{
+       struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
+       struct cxl_dev_state *cxlds = cxlmd->cxlds;
+       struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
+
+       return sysfs_emit(buf, "%d\n", mds->pmem_perf.qos_class);
+}
+
+static struct device_attribute dev_attr_pmem_qos_class =
+       __ATTR(qos_class, 0444, pmem_qos_class_show, NULL);
+
 static struct attribute *cxl_memdev_pmem_attributes[] = {
        &dev_attr_pmem_size.attr,
+       &dev_attr_pmem_qos_class.attr,
        NULL,
 };
 
+static ssize_t ram_qos_class_show(struct device *dev,
+                                 struct device_attribute *attr, char *buf)
+{
+       struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
+       struct cxl_dev_state *cxlds = cxlmd->cxlds;
+       struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
+
+       return sysfs_emit(buf, "%d\n", mds->ram_perf.qos_class);
+}
+
+static struct device_attribute dev_attr_ram_qos_class =
+       __ATTR(qos_class, 0444, ram_qos_class_show, NULL);
+
 static struct attribute *cxl_memdev_ram_attributes[] = {
        &dev_attr_ram_size.attr,
+       &dev_attr_ram_qos_class.attr,
        NULL,
 };
 
@@ -477,14 +505,42 @@ static struct attribute_group cxl_memdev_attribute_group = {
        .is_visible = cxl_memdev_visible,
 };
 
+static umode_t cxl_ram_visible(struct kobject *kobj, struct attribute *a, int n)
+{
+       struct device *dev = kobj_to_dev(kobj);
+       struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
+       struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
+
+       if (a == &dev_attr_ram_qos_class.attr)
+               if (mds->ram_perf.qos_class == CXL_QOS_CLASS_INVALID)
+                       return 0;
+
+       return a->mode;
+}
+
 static struct attribute_group cxl_memdev_ram_attribute_group = {
        .name = "ram",
        .attrs = cxl_memdev_ram_attributes,
+       .is_visible = cxl_ram_visible,
 };
 
+static umode_t cxl_pmem_visible(struct kobject *kobj, struct attribute *a, int n)
+{
+       struct device *dev = kobj_to_dev(kobj);
+       struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
+       struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
+
+       if (a == &dev_attr_pmem_qos_class.attr)
+               if (mds->pmem_perf.qos_class == CXL_QOS_CLASS_INVALID)
+                       return 0;
+
+       return a->mode;
+}
+
 static struct attribute_group cxl_memdev_pmem_attribute_group = {
        .name = "pmem",
        .attrs = cxl_memdev_pmem_attributes,
+       .is_visible = cxl_pmem_visible,
 };
 
 static umode_t cxl_memdev_security_visible(struct kobject *kobj,
@@ -519,6 +575,13 @@ static const struct attribute_group *cxl_memdev_attribute_groups[] = {
        NULL,
 };
 
+void cxl_memdev_update_perf(struct cxl_memdev *cxlmd)
+{
+       sysfs_update_group(&cxlmd->dev.kobj, &cxl_memdev_ram_attribute_group);
+       sysfs_update_group(&cxlmd->dev.kobj, &cxl_memdev_pmem_attribute_group);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_memdev_update_perf, CXL);
+
 static const struct device_type cxl_memdev_type = {
        .name = "cxl_memdev",
        .release = cxl_memdev_release,
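
Note: the qos_class attributes follow the late-visibility pattern — an is_visible() hook hides the file while the value is invalid, and sysfs_update_group() re-evaluates the hook once CDAT parsing has filled it in. Sketch, with invented group and state names:

	static umode_t example_visible(struct kobject *kobj,
				       struct attribute *a, int n)
	{
		return data_ready ? a->mode : 0;	/* data_ready: assumed */
	}

	/* later, once the data exists: */
	sysfs_update_group(&dev->kobj, &example_group);
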
index 6c9c8d92f8f71401af70fec26be60e0339c18c64..e9e6c81ce034a8ffaba105132d5b9ecc59d51880 100644 (file)
@@ -477,9 +477,9 @@ int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm,
                allowed++;
        }
 
-       if (!allowed) {
-               cxl_set_mem_enable(cxlds, 0);
-               info->mem_enabled = 0;
+       if (!allowed && info->mem_enabled) {
+               dev_err(dev, "Range register decodes outside platform defined CXL ranges.\n");
+               return -ENXIO;
        }
 
        /*
@@ -932,11 +932,21 @@ static void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds) { }
 void cxl_cor_error_detected(struct pci_dev *pdev)
 {
        struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
+       struct device *dev = &cxlds->cxlmd->dev;
+
+       scoped_guard(device, dev) {
+               if (!dev->driver) {
+                       dev_warn(&pdev->dev,
+                                "%s: memdev disabled, abort error handling\n",
+                                dev_name(dev));
+                       return;
+               }
 
-       if (cxlds->rcd)
-               cxl_handle_rdport_errors(cxlds);
+               if (cxlds->rcd)
+                       cxl_handle_rdport_errors(cxlds);
 
-       cxl_handle_endpoint_cor_ras(cxlds);
+               cxl_handle_endpoint_cor_ras(cxlds);
+       }
 }
 EXPORT_SYMBOL_NS_GPL(cxl_cor_error_detected, CXL);
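
Both error handlers now take the device lock through the <linux/cleanup.h>
scoped_guard() helper: the "device" guard class pairs device_lock() with
device_unlock() on every exit from the scope, including the early return
when no driver is bound. The construct in isolation (sketch only):

	scoped_guard(device, dev) {
		if (!dev->driver)
			return;		/* device_unlock(dev) still runs */
		do_handling(dev);	/* hypothetical work under the lock */
	}				/* ...and runs here on normal exit */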
 
@@ -948,16 +958,25 @@ pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
        struct device *dev = &cxlmd->dev;
        bool ue;
 
-       if (cxlds->rcd)
-               cxl_handle_rdport_errors(cxlds);
+       scoped_guard(device, dev) {
+               if (!dev->driver) {
+                       dev_warn(&pdev->dev,
+                                "%s: memdev disabled, abort error handling\n",
+                                dev_name(dev));
+                       return PCI_ERS_RESULT_DISCONNECT;
+               }
+
+               if (cxlds->rcd)
+                       cxl_handle_rdport_errors(cxlds);
+               /*
+                * A frozen channel indicates an impending reset, which is fatal
+                * to CXL.mem operation and will likely crash the system. On the
+                * off chance the situation is recoverable, dump the status of
+                * the RAS capability registers and bounce the active state of
+                * the memdev.
+                */
+               ue = cxl_handle_endpoint_ras(cxlds);
+       }
 
-       /*
-        * A frozen channel indicates an impending reset which is fatal to
-        * CXL.mem operation, and will likely crash the system. On the off
-        * chance the situation is recoverable dump the status of the RAS
-        * capability registers and bounce the active state of the memdev.
-        */
-       ue = cxl_handle_endpoint_ras(cxlds);
 
        switch (state) {
        case pci_channel_io_normal:
index 0f05692bfec3946841a766c8583bc1a3b526073e..4c7fd2d5cccb2965eb528cbc26bb261ef01dcdce 100644 (file)
@@ -525,7 +525,7 @@ static int alloc_hpa(struct cxl_region *cxlr, resource_size_t size)
        struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
        struct cxl_region_params *p = &cxlr->params;
        struct resource *res;
-       u32 remainder = 0;
+       u64 remainder = 0;
 
        lockdep_assert_held_write(&cxl_region_rwsem);
 
@@ -545,7 +545,7 @@ static int alloc_hpa(struct cxl_region *cxlr, resource_size_t size)
            (cxlr->mode == CXL_DECODER_PMEM && uuid_is_null(&p->uuid)))
                return -ENXIO;
 
-       div_u64_rem(size, SZ_256M * p->interleave_ways, &remainder);
+       div64_u64_rem(size, (u64)SZ_256M * p->interleave_ways, &remainder);
        if (remainder)
                return -EINVAL;
 
@@ -730,12 +730,17 @@ static int match_auto_decoder(struct device *dev, void *data)
        return 0;
 }
 
-static struct cxl_decoder *cxl_region_find_decoder(struct cxl_port *port,
-                                                  struct cxl_region *cxlr)
+static struct cxl_decoder *
+cxl_region_find_decoder(struct cxl_port *port,
+                       struct cxl_endpoint_decoder *cxled,
+                       struct cxl_region *cxlr)
 {
        struct device *dev;
        int id = 0;
 
+       if (port == cxled_to_port(cxled))
+               return &cxled->cxld;
+
        if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags))
                dev = device_find_child(&port->dev, &cxlr->params,
                                        match_auto_decoder);
@@ -753,8 +758,31 @@ static struct cxl_decoder *cxl_region_find_decoder(struct cxl_port *port,
        return to_cxl_decoder(dev);
 }
 
-static struct cxl_region_ref *alloc_region_ref(struct cxl_port *port,
-                                              struct cxl_region *cxlr)
+static bool auto_order_ok(struct cxl_port *port, struct cxl_region *cxlr_iter,
+                         struct cxl_decoder *cxld)
+{
+       struct cxl_region_ref *rr = cxl_rr_load(port, cxlr_iter);
+       struct cxl_decoder *cxld_iter = rr->decoder;
+
+       /*
+        * Allow the out of order assembly of auto-discovered regions.
+        * Per CXL Spec 3.1 8.2.4.20.12 software must commit decoders
+        * in HPA order. Confirm that the decoder with the lesser HPA
+        * starting address has the lesser id.
+        */
+       dev_dbg(&cxld->dev, "check for HPA violation %s:%d < %s:%d\n",
+               dev_name(&cxld->dev), cxld->id,
+               dev_name(&cxld_iter->dev), cxld_iter->id);
+
+       if (cxld_iter->id > cxld->id)
+               return true;
+
+       return false;
+}
+
+static struct cxl_region_ref *
+alloc_region_ref(struct cxl_port *port, struct cxl_region *cxlr,
+                struct cxl_endpoint_decoder *cxled)
 {
        struct cxl_region_params *p = &cxlr->params;
        struct cxl_region_ref *cxl_rr, *iter;
@@ -764,16 +792,21 @@ static struct cxl_region_ref *alloc_region_ref(struct cxl_port *port,
        xa_for_each(&port->regions, index, iter) {
                struct cxl_region_params *ip = &iter->region->params;
 
-               if (!ip->res)
+               if (!ip->res || ip->res->start < p->res->start)
                        continue;
 
-               if (ip->res->start > p->res->start) {
-                       dev_dbg(&cxlr->dev,
-                               "%s: HPA order violation %s:%pr vs %pr\n",
-                               dev_name(&port->dev),
-                               dev_name(&iter->region->dev), ip->res, p->res);
-                       return ERR_PTR(-EBUSY);
+               if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) {
+                       struct cxl_decoder *cxld;
+
+                       cxld = cxl_region_find_decoder(port, cxled, cxlr);
+                       if (auto_order_ok(port, iter->region, cxld))
+                               continue;
                }
+               dev_dbg(&cxlr->dev, "%s: HPA order violation %s:%pr vs %pr\n",
+                       dev_name(&port->dev),
+                       dev_name(&iter->region->dev), ip->res, p->res);
+
+               return ERR_PTR(-EBUSY);
        }
 
        cxl_rr = kzalloc(sizeof(*cxl_rr), GFP_KERNEL);
@@ -853,10 +886,7 @@ static int cxl_rr_alloc_decoder(struct cxl_port *port, struct cxl_region *cxlr,
 {
        struct cxl_decoder *cxld;
 
-       if (port == cxled_to_port(cxled))
-               cxld = &cxled->cxld;
-       else
-               cxld = cxl_region_find_decoder(port, cxlr);
+       cxld = cxl_region_find_decoder(port, cxled, cxlr);
        if (!cxld) {
                dev_dbg(&cxlr->dev, "%s: no decoder available\n",
                        dev_name(&port->dev));
@@ -953,7 +983,7 @@ static int cxl_port_attach_region(struct cxl_port *port,
                        nr_targets_inc = true;
                }
        } else {
-               cxl_rr = alloc_region_ref(port, cxlr);
+               cxl_rr = alloc_region_ref(port, cxlr, cxled);
                if (IS_ERR(cxl_rr)) {
                        dev_dbg(&cxlr->dev,
                                "%s: failed to allocate region reference\n",
index 89445435303aac4d043c964a0ada866548889917..bdf117a33744be2db0468e869226ac8d45ef7a16 100644 (file)
@@ -338,7 +338,7 @@ TRACE_EVENT(cxl_general_media,
 
        TP_fast_assign(
                CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr);
-               memcpy(&__entry->hdr_uuid, &CXL_EVENT_GEN_MEDIA_UUID, sizeof(uuid_t));
+               __entry->hdr_uuid = CXL_EVENT_GEN_MEDIA_UUID;
 
                /* General Media */
                __entry->dpa = le64_to_cpu(rec->phys_addr);
@@ -425,7 +425,7 @@ TRACE_EVENT(cxl_dram,
 
        TP_fast_assign(
                CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr);
-               memcpy(&__entry->hdr_uuid, &CXL_EVENT_DRAM_UUID, sizeof(uuid_t));
+               __entry->hdr_uuid = CXL_EVENT_DRAM_UUID;
 
                /* DRAM */
                __entry->dpa = le64_to_cpu(rec->phys_addr);
@@ -573,7 +573,7 @@ TRACE_EVENT(cxl_memory_module,
 
        TP_fast_assign(
                CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr);
-               memcpy(&__entry->hdr_uuid, &CXL_EVENT_MEM_MODULE_UUID, sizeof(uuid_t));
+               __entry->hdr_uuid = CXL_EVENT_MEM_MODULE_UUID;
 
                /* Memory Module Event */
                __entry->event_type = rec->event_type;
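
The memcpy() of the event UUID constants becomes a plain struct assignment;
uuid_t is a 16-byte struct, so the assignment copies the same bytes while
letting the compiler type-check both operands. Equivalent sketch:

	#include <linux/uuid.h>

	/* same bytes as memcpy(dst, src, sizeof(uuid_t)), but type-checked */
	static void demo_copy_uuid(uuid_t *dst, const uuid_t *src)
	{
		*dst = *src;
	}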
index b6017c0c57b4d5e69dfe45011b7a8b3f5bf0b913..003feebab79b5f8e7563ba2e32665b4377871a55 100644 (file)
@@ -880,6 +880,8 @@ void cxl_switch_parse_cdat(struct cxl_port *port);
 int cxl_endpoint_get_perf_coordinates(struct cxl_port *port,
                                      struct access_coordinate *coord);
 
+void cxl_memdev_update_perf(struct cxl_memdev *cxlmd);
+
 /*
  * Unit test builds overrides this to __weak, find the 'strong' version
  * of these symbols in tools/testing/cxl/.
index 5303d6942b880af65dcf8e77b02d26626c2bb94d..20fb3b35e89e0473ee8ad42dcd17407086fb8cdb 100644 (file)
@@ -395,13 +395,11 @@ enum cxl_devtype {
 
 /**
  * struct cxl_dpa_perf - DPA performance property entry
- * @list - list entry
  * @dpa_range - range for DPA address
  * @coord - QoS performance data (i.e. latency, bandwidth)
  * @qos_class - QoS Class cookies
  */
 struct cxl_dpa_perf {
-       struct list_head list;
        struct range dpa_range;
        struct access_coordinate coord;
        int qos_class;
@@ -471,8 +469,8 @@ struct cxl_dev_state {
  * @security: security driver state info
  * @fw: firmware upload / activation state
  * @mbox_send: @dev specific transport for transmitting mailbox commands
- * @ram_perf_list: performance data entries matched to RAM
- * @pmem_perf_list: performance data entries matched to PMEM
+ * @ram_perf: performance data entry matched to RAM partition
+ * @pmem_perf: performance data entry matched to PMEM partition
  *
  * See CXL 3.0 8.2.9.8.2 Capacity Configuration and Label Storage for
  * details on capacity parameters.
@@ -494,8 +492,8 @@ struct cxl_memdev_state {
        u64 next_volatile_bytes;
        u64 next_persistent_bytes;
 
-       struct list_head ram_perf_list;
-       struct list_head pmem_perf_list;
+       struct cxl_dpa_perf ram_perf;
+       struct cxl_dpa_perf pmem_perf;
 
        struct cxl_event_state event;
        struct cxl_poison_state poison;
index c5c9d8e0d88d69fcc9f031e1bd46ba7c44de4fd4..0c79d9ce877ccaef9895a9885801d4fff69c5093 100644 (file)
@@ -215,52 +215,6 @@ static ssize_t trigger_poison_list_store(struct device *dev,
 }
 static DEVICE_ATTR_WO(trigger_poison_list);
 
-static ssize_t ram_qos_class_show(struct device *dev,
-                                 struct device_attribute *attr, char *buf)
-{
-       struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
-       struct cxl_dev_state *cxlds = cxlmd->cxlds;
-       struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
-       struct cxl_dpa_perf *dpa_perf;
-
-       if (!dev->driver)
-               return -ENOENT;
-
-       if (list_empty(&mds->ram_perf_list))
-               return -ENOENT;
-
-       dpa_perf = list_first_entry(&mds->ram_perf_list, struct cxl_dpa_perf,
-                                   list);
-
-       return sysfs_emit(buf, "%d\n", dpa_perf->qos_class);
-}
-
-static struct device_attribute dev_attr_ram_qos_class =
-       __ATTR(qos_class, 0444, ram_qos_class_show, NULL);
-
-static ssize_t pmem_qos_class_show(struct device *dev,
-                                  struct device_attribute *attr, char *buf)
-{
-       struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
-       struct cxl_dev_state *cxlds = cxlmd->cxlds;
-       struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
-       struct cxl_dpa_perf *dpa_perf;
-
-       if (!dev->driver)
-               return -ENOENT;
-
-       if (list_empty(&mds->pmem_perf_list))
-               return -ENOENT;
-
-       dpa_perf = list_first_entry(&mds->pmem_perf_list, struct cxl_dpa_perf,
-                                   list);
-
-       return sysfs_emit(buf, "%d\n", dpa_perf->qos_class);
-}
-
-static struct device_attribute dev_attr_pmem_qos_class =
-       __ATTR(qos_class, 0444, pmem_qos_class_show, NULL);
-
 static umode_t cxl_mem_visible(struct kobject *kobj, struct attribute *a, int n)
 {
        struct device *dev = kobj_to_dev(kobj);
@@ -272,21 +226,11 @@ static umode_t cxl_mem_visible(struct kobject *kobj, struct attribute *a, int n)
                              mds->poison.enabled_cmds))
                        return 0;
 
-       if (a == &dev_attr_pmem_qos_class.attr)
-               if (list_empty(&mds->pmem_perf_list))
-                       return 0;
-
-       if (a == &dev_attr_ram_qos_class.attr)
-               if (list_empty(&mds->ram_perf_list))
-                       return 0;
-
        return a->mode;
 }
 
 static struct attribute *cxl_mem_attrs[] = {
        &dev_attr_trigger_poison_list.attr,
-       &dev_attr_ram_qos_class.attr,
-       &dev_attr_pmem_qos_class.attr,
        NULL
 };
 
index 4fd1f207c84ee53a857e417a5fc2260fb43b9733..2ff361e756d66147d8d20969c376730ae2bcc90e 100644 (file)
@@ -382,7 +382,7 @@ static int cxl_pci_mbox_send(struct cxl_memdev_state *mds,
        return rc;
 }
 
-static int cxl_pci_setup_mailbox(struct cxl_memdev_state *mds)
+static int cxl_pci_setup_mailbox(struct cxl_memdev_state *mds, bool irq_avail)
 {
        struct cxl_dev_state *cxlds = &mds->cxlds;
        const int cap = readl(cxlds->regs.mbox + CXLDEV_MBOX_CAPS_OFFSET);
@@ -441,7 +441,7 @@ static int cxl_pci_setup_mailbox(struct cxl_memdev_state *mds)
        INIT_DELAYED_WORK(&mds->security.poll_dwork, cxl_mbox_sanitize_work);
 
        /* background command interrupts are optional */
-       if (!(cap & CXLDEV_MBOX_CAP_BG_CMD_IRQ))
+       if (!(cap & CXLDEV_MBOX_CAP_BG_CMD_IRQ) || !irq_avail)
                return 0;
 
        msgnum = FIELD_GET(CXLDEV_MBOX_CAP_IRQ_MSGNUM_MASK, cap);
@@ -588,7 +588,7 @@ static int cxl_mem_alloc_event_buf(struct cxl_memdev_state *mds)
        return devm_add_action_or_reset(mds->cxlds.dev, free_event_buf, buf);
 }
 
-static int cxl_alloc_irq_vectors(struct pci_dev *pdev)
+static bool cxl_alloc_irq_vectors(struct pci_dev *pdev)
 {
        int nvecs;
 
@@ -605,9 +605,9 @@ static int cxl_alloc_irq_vectors(struct pci_dev *pdev)
                                      PCI_IRQ_MSIX | PCI_IRQ_MSI);
        if (nvecs < 1) {
                dev_dbg(&pdev->dev, "Failed to alloc irq vectors: %d\n", nvecs);
-               return -ENXIO;
+               return false;
        }
-       return 0;
+       return true;
 }
 
 static irqreturn_t cxl_event_thread(int irq, void *id)
@@ -743,7 +743,7 @@ static bool cxl_event_int_is_fw(u8 setting)
 }
 
 static int cxl_event_config(struct pci_host_bridge *host_bridge,
-                           struct cxl_memdev_state *mds)
+                           struct cxl_memdev_state *mds, bool irq_avail)
 {
        struct cxl_event_interrupt_policy policy;
        int rc;
@@ -755,6 +755,11 @@ static int cxl_event_config(struct pci_host_bridge *host_bridge,
        if (!host_bridge->native_cxl_error)
                return 0;
 
+       if (!irq_avail) {
+               dev_info(mds->cxlds.dev, "No interrupt support, disabling event processing.\n");
+               return 0;
+       }
+
        rc = cxl_mem_alloc_event_buf(mds);
        if (rc)
                return rc;
@@ -789,6 +794,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
        struct cxl_register_map map;
        struct cxl_memdev *cxlmd;
        int i, rc, pmu_count;
+       bool irq_avail;
 
        /*
         * Double check the anonymous union trickery in struct cxl_regs
@@ -846,11 +852,9 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
        else
                dev_warn(&pdev->dev, "Media not active (%d)\n", rc);
 
-       rc = cxl_alloc_irq_vectors(pdev);
-       if (rc)
-               return rc;
+       irq_avail = cxl_alloc_irq_vectors(pdev);
 
-       rc = cxl_pci_setup_mailbox(mds);
+       rc = cxl_pci_setup_mailbox(mds, irq_avail);
        if (rc)
                return rc;
 
@@ -909,7 +913,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
                }
        }
 
-       rc = cxl_event_config(host_bridge, mds);
+       rc = cxl_event_config(host_bridge, mds, irq_avail);
        if (rc)
                return rc;
 
@@ -970,61 +974,6 @@ static struct pci_driver cxl_pci_driver = {
        },
 };
 
-#define CXL_EVENT_HDR_FLAGS_REC_SEVERITY GENMASK(1, 0)
-static void cxl_cper_event_call(enum cxl_event_type ev_type,
-                               struct cxl_cper_event_rec *rec)
-{
-       struct cper_cxl_event_devid *device_id = &rec->hdr.device_id;
-       struct pci_dev *pdev __free(pci_dev_put) = NULL;
-       enum cxl_event_log_type log_type;
-       struct cxl_dev_state *cxlds;
-       unsigned int devfn;
-       u32 hdr_flags;
-
-       devfn = PCI_DEVFN(device_id->device_num, device_id->func_num);
-       pdev = pci_get_domain_bus_and_slot(device_id->segment_num,
-                                          device_id->bus_num, devfn);
-       if (!pdev)
-               return;
-
-       guard(pci_dev)(pdev);
-       if (pdev->driver != &cxl_pci_driver)
-               return;
-
-       cxlds = pci_get_drvdata(pdev);
-       if (!cxlds)
-               return;
-
-       /* Fabricate a log type */
-       hdr_flags = get_unaligned_le24(rec->event.generic.hdr.flags);
-       log_type = FIELD_GET(CXL_EVENT_HDR_FLAGS_REC_SEVERITY, hdr_flags);
-
-       cxl_event_trace_record(cxlds->cxlmd, log_type, ev_type,
-                              &uuid_null, &rec->event);
-}
-
-static int __init cxl_pci_driver_init(void)
-{
-       int rc;
-
-       rc = cxl_cper_register_callback(cxl_cper_event_call);
-       if (rc)
-               return rc;
-
-       rc = pci_register_driver(&cxl_pci_driver);
-       if (rc)
-               cxl_cper_unregister_callback(cxl_cper_event_call);
-
-       return rc;
-}
-
-static void __exit cxl_pci_driver_exit(void)
-{
-       pci_unregister_driver(&cxl_pci_driver);
-       cxl_cper_unregister_callback(cxl_cper_event_call);
-}
-
-module_init(cxl_pci_driver_init);
-module_exit(cxl_pci_driver_exit);
+module_pci_driver(cxl_pci_driver);
 MODULE_LICENSE("GPL v2");
 MODULE_IMPORT_NS(CXL);
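
With the open-coded init/exit pair gone, the module boilerplate collapses
into module_pci_driver(). Roughly what that macro expands to here
(simplified sketch):

	static int __init cxl_pci_driver_init(void)
	{
		return pci_register_driver(&cxl_pci_driver);
	}
	module_init(cxl_pci_driver_init);

	static void __exit cxl_pci_driver_exit(void)
	{
		pci_unregister_driver(&cxl_pci_driver);
	}
	module_exit(cxl_pci_driver_exit);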
index ee899f8e67215f6036734795cb5b90ab77a293a3..4a63567e93bae3dd2d5affabeedfd713aaa51460 100644 (file)
@@ -168,10 +168,7 @@ static vm_fault_t cma_heap_vm_fault(struct vm_fault *vmf)
        if (vmf->pgoff > buffer->pagecount)
                return VM_FAULT_SIGBUS;
 
-       vmf->page = buffer->pages[vmf->pgoff];
-       get_page(vmf->page);
-
-       return 0;
+       return vmf_insert_pfn(vma, vmf->address, page_to_pfn(buffer->pages[vmf->pgoff]));
 }
 
 static const struct vm_operations_struct dma_heap_vm_ops = {
@@ -185,6 +182,8 @@ static int cma_heap_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
        if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) == 0)
                return -EINVAL;
 
+       vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
+
        vma->vm_ops = &dma_heap_vm_ops;
        vma->vm_private_data = buffer;
 
index fb89ecbf0cc5be8ca566eaac6e499f2e2336b625..40052d1bd0b5c161180eec477bedb0f871e91a55 100644 (file)
@@ -222,8 +222,14 @@ struct atdma_sg {
  * @vd: pointer to the virtual dma descriptor.
  * @atchan: pointer to the atmel dma channel.
  * @total_len: total transaction byte count
- * @sg_len: number of sg entries.
+ * @sglen: number of sg entries.
  * @sg: array of sgs.
+ * @boundary: number of transfers to perform before the automatic address increment operation
+ * @dst_hole: value to add to the destination address when the boundary has been reached
+ * @src_hole: value to add to the source address when the boundary has been reached
+ * @memset_buffer: buffer used for the memset operation
+ * @memset_paddr: physical address of the buffer used for the memset operation
+ * @memset_vaddr: virtual address of the buffer used for the memset operation
  */
 struct at_desc {
        struct                          virt_dma_desc vd;
@@ -245,7 +251,10 @@ struct at_desc {
 /*--  Channels  --------------------------------------------------------*/
 
 /**
- * atc_status - information bits stored in channel status flag
+ * enum atc_status - information bits stored in channel status flag
+ *
+ * @ATC_IS_PAUSED: If channel is paused
+ * @ATC_IS_CYCLIC: If channel is cyclic
  *
  * Manipulated with atomic operations.
  */
@@ -282,7 +291,6 @@ struct at_dma_chan {
        u32                     save_cfg;
        u32                     save_dscr;
        struct dma_slave_config dma_sconfig;
-       bool                    cyclic;
        struct at_desc          *desc;
 };
 
@@ -328,12 +336,12 @@ static inline u8 convert_buswidth(enum dma_slave_buswidth addr_width)
 /**
  * struct at_dma - internal representation of an Atmel HDMA Controller
  * @dma_device: dmaengine dma_device object members
- * @atdma_devtype: identifier of DMA controller compatibility
- * @ch_regs: memory mapped register base
+ * @regs: memory mapped register base
  * @clk: dma controller clock
  * @save_imr: interrupt mask register that is saved on suspend/resume cycle
  * @all_chan_mask: all channels available in a mask
  * @lli_pool: hw lli table
+ * @memset_pool: hw memset pool
  * @chan: channels table to store at_dma_chan structures
  */
 struct at_dma {
@@ -626,6 +634,9 @@ static inline u32 atc_calc_bytes_left(u32 current_len, u32 ctrla)
 
 /**
  * atc_get_llis_residue - Get residue for a hardware linked list transfer
+ * @atchan: pointer to an atmel hdmac channel.
+ * @desc: pointer to the descriptor for which the residue is calculated.
+ * @residue: residue to be set to dma_tx_state.
  *
  * Calculate the residue by removing the length of the Linked List Item (LLI)
  * already transferred from the total length. To get the current LLI we can use
@@ -661,10 +672,8 @@ static inline u32 atc_calc_bytes_left(u32 current_len, u32 ctrla)
  * two DSCR values are different, we read again the CTRLA then the DSCR till two
  * consecutive read values from DSCR are equal or till the maximum number
  * of trials is reached. This algorithm is very unlikely not to find a
  * stable value for DSCR.
- * @atchan: pointer to an atmel hdmac channel.
- * @desc: pointer to the descriptor for which the residue is calculated.
- * @residue: residue to be set to dma_tx_state.
- * Returns 0 on success, -errno otherwise.
+ *
+ * Returns: %0 on success, -errno otherwise.
  */
 static int atc_get_llis_residue(struct at_dma_chan *atchan,
                                struct at_desc *desc, u32 *residue)
@@ -731,7 +740,8 @@ static int atc_get_llis_residue(struct at_dma_chan *atchan,
  * @chan: DMA channel
  * @cookie: transaction identifier to check status of
  * @residue: residue to be updated.
- * Return 0 on success, -errono otherwise.
+ *
+ * Return: %0 on success, -errno otherwise.
  */
 static int atc_get_residue(struct dma_chan *chan, dma_cookie_t cookie,
                           u32 *residue)
@@ -1710,7 +1720,7 @@ static void atc_issue_pending(struct dma_chan *chan)
  * atc_alloc_chan_resources - allocate resources for DMA channel
  * @chan: allocate descriptor resources for this channel
  *
- * return - the number of allocated descriptors
+ * Return: the number of allocated descriptors
  */
 static int atc_alloc_chan_resources(struct dma_chan *chan)
 {
index b38786f0ad7995d9b0d22aa18fdd6d2407320c26..b75fdaffad9a4ea6cd8d15e8f43bea550848b46c 100644 (file)
@@ -346,6 +346,20 @@ static void dw_edma_v0_core_write_chunk(struct dw_edma_chunk *chunk)
        dw_edma_v0_write_ll_link(chunk, i, control, chunk->ll_region.paddr);
 }
 
+static void dw_edma_v0_sync_ll_data(struct dw_edma_chunk *chunk)
+{
+       /*
+        * In case of remote eDMA engine setup, the DW PCIe RP/EP internal
+        * configuration registers and application memory are normally accessed
+        * over different buses. Ensure the LL data reaches memory before the
+        * doorbell register is toggled, by issuing a dummy read from the
+        * remote LL memory, in the hope that the MRd TLP will return only
+        * after the last MWr TLP has completed.
+        */
+       if (!(chunk->chan->dw->chip->flags & DW_EDMA_CHIP_LOCAL))
+               readl(chunk->ll_region.vaddr.io);
+}
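
The dummy readl() leans on PCIe ordering: a non-posted MRd to the remote
linked-list memory cannot complete until the posted MWr TLPs ahead of it
have landed, so the descriptors are guaranteed to be in place before the
doorbell write that follows. The generic read-back-to-flush idiom, with
hypothetical register names:

	writel(last_desc, ll_base + DEMO_LAST_DESC);	/* posted write */
	(void)readl(ll_base);		/* non-posted read drains the
					 * posted writes ahead of it */
	writel(DEMO_DB_START, db_reg);	/* doorbell is now safe to ring */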
+
 static void dw_edma_v0_core_start(struct dw_edma_chunk *chunk, bool first)
 {
        struct dw_edma_chan *chan = chunk->chan;
@@ -412,6 +426,9 @@ static void dw_edma_v0_core_start(struct dw_edma_chunk *chunk, bool first)
                SET_CH_32(dw, chan->dir, chan->id, llp.msb,
                          upper_32_bits(chunk->ll_region.paddr));
        }
+
+       dw_edma_v0_sync_ll_data(chunk);
+
        /* Doorbell */
        SET_RW_32(dw, chan->dir, doorbell,
                  FIELD_PREP(EDMA_V0_DOORBELL_CH_MASK, chan->id));
index 00b735a0202ab2e8e030910db2747c02be8bf75e..10e8f0715114fb5f08f135b4f2d592ce6c53f10c 100644 (file)
@@ -65,18 +65,12 @@ static void dw_hdma_v0_core_off(struct dw_edma *dw)
 
 static u16 dw_hdma_v0_core_ch_count(struct dw_edma *dw, enum dw_edma_dir dir)
 {
-       u32 num_ch = 0;
-       int id;
-
-       for (id = 0; id < HDMA_V0_MAX_NR_CH; id++) {
-               if (GET_CH_32(dw, id, dir, ch_en) & BIT(0))
-                       num_ch++;
-       }
-
-       if (num_ch > HDMA_V0_MAX_NR_CH)
-               num_ch = HDMA_V0_MAX_NR_CH;
-
-       return (u16)num_ch;
+       /*
+        * The HDMA IP has no way to report the number of hardware channels
+        * available, so set it to the maximum and let the platform code
+        * set the right number of channels.
+        */
+       return HDMA_V0_MAX_NR_CH;
 }
 
 static enum dma_status dw_hdma_v0_core_ch_status(struct dw_edma_chan *chan)
@@ -228,6 +222,20 @@ static void dw_hdma_v0_core_write_chunk(struct dw_edma_chunk *chunk)
        dw_hdma_v0_write_ll_link(chunk, i, control, chunk->ll_region.paddr);
 }
 
+static void dw_hdma_v0_sync_ll_data(struct dw_edma_chunk *chunk)
+{
+       /*
+        * In case of remote HDMA engine setup, the DW PCIe RP/EP internal
+        * configuration registers and application memory are normally accessed
+        * over different buses. Ensure the LL data reaches memory before the
+        * doorbell register is toggled, by issuing a dummy read from the
+        * remote LL memory, in the hope that the MRd TLP will return only
+        * after the last MWr TLP has completed.
+        */
+       if (!(chunk->chan->dw->chip->flags & DW_EDMA_CHIP_LOCAL))
+               readl(chunk->ll_region.vaddr.io);
+}
+
 static void dw_hdma_v0_core_start(struct dw_edma_chunk *chunk, bool first)
 {
        struct dw_edma_chan *chan = chunk->chan;
@@ -242,7 +250,9 @@ static void dw_hdma_v0_core_start(struct dw_edma_chunk *chunk, bool first)
                /* Interrupt enable&unmask - done, abort */
                tmp = GET_CH_32(dw, chan->dir, chan->id, int_setup) |
                      HDMA_V0_STOP_INT_MASK | HDMA_V0_ABORT_INT_MASK |
-                     HDMA_V0_LOCAL_STOP_INT_EN | HDMA_V0_LOCAL_STOP_INT_EN;
+                     HDMA_V0_LOCAL_STOP_INT_EN | HDMA_V0_LOCAL_ABORT_INT_EN;
+               if (!(dw->chip->flags & DW_EDMA_CHIP_LOCAL))
+                       tmp |= HDMA_V0_REMOTE_STOP_INT_EN | HDMA_V0_REMOTE_ABORT_INT_EN;
                SET_CH_32(dw, chan->dir, chan->id, int_setup, tmp);
                /* Channel control */
                SET_CH_32(dw, chan->dir, chan->id, control1, HDMA_V0_LINKLIST_EN);
@@ -256,6 +266,9 @@ static void dw_hdma_v0_core_start(struct dw_edma_chunk *chunk, bool first)
        /* Set consumer cycle */
        SET_CH_32(dw, chan->dir, chan->id, cycle_sync,
                  HDMA_V0_CONSUMER_CYCLE_STAT | HDMA_V0_CONSUMER_CYCLE_BIT);
+
+       dw_hdma_v0_sync_ll_data(chunk);
+
        /* Doorbell */
        SET_CH_32(dw, chan->dir, chan->id, doorbell, HDMA_V0_DOORBELL_START);
 }
index a974abdf8aaf5ecd83eadd56f191a313ec37e9ff..eab5fd7177e545cab3f2217bd1a8add0d8dbb435 100644 (file)
@@ -15,7 +15,7 @@
 #define HDMA_V0_LOCAL_ABORT_INT_EN             BIT(6)
 #define HDMA_V0_REMOTE_ABORT_INT_EN            BIT(5)
 #define HDMA_V0_LOCAL_STOP_INT_EN              BIT(4)
-#define HDMA_V0_REMOTEL_STOP_INT_EN            BIT(3)
+#define HDMA_V0_REMOTE_STOP_INT_EN             BIT(3)
 #define HDMA_V0_ABORT_INT_MASK                 BIT(2)
 #define HDMA_V0_STOP_INT_MASK                  BIT(0)
 #define HDMA_V0_LINKLIST_EN                    BIT(0)
index 7958ac33e36ce3fab462d33161fea8dabe0ee215..5a8061a307cdafeb3a4db5ddd104ae6d7ec8d190 100644 (file)
@@ -38,15 +38,17 @@ static int dpaa2_qdma_alloc_chan_resources(struct dma_chan *chan)
        if (!dpaa2_chan->fd_pool)
                goto err;
 
-       dpaa2_chan->fl_pool = dma_pool_create("fl_pool", dev,
-                                             sizeof(struct dpaa2_fl_entry),
-                                             sizeof(struct dpaa2_fl_entry), 0);
+       dpaa2_chan->fl_pool =
+               dma_pool_create("fl_pool", dev,
+                                sizeof(struct dpaa2_fl_entry) * 3,
+                                sizeof(struct dpaa2_fl_entry), 0);
+
        if (!dpaa2_chan->fl_pool)
                goto err_fd;
 
        dpaa2_chan->sdd_pool =
                dma_pool_create("sdd_pool", dev,
-                               sizeof(struct dpaa2_qdma_sd_d),
+                               sizeof(struct dpaa2_qdma_sd_d) * 2,
                                sizeof(struct dpaa2_qdma_sd_d), 0);
        if (!dpaa2_chan->sdd_pool)
                goto err_fl;
index b53f46245c377f05520c8275c95bf10c59be34d7..793f1a7ad5e343bbfe403c9e0ad28e891bd0d556 100644 (file)
@@ -503,7 +503,7 @@ void fsl_edma_fill_tcd(struct fsl_edma_chan *fsl_chan,
        if (fsl_chan->is_multi_fifo) {
                /* set mloff to support multiple fifo */
                burst = cfg->direction == DMA_DEV_TO_MEM ?
-                               cfg->src_addr_width : cfg->dst_addr_width;
+                               cfg->src_maxburst : cfg->dst_maxburst;
                nbytes |= EDMA_V3_TCD_NBYTES_MLOFF(-(burst * 4));
                /* enable DMLOE/SMLOE */
                if (cfg->direction == DMA_MEM_TO_DEV) {
index bb5221158a7702379322392a46a1ebfb4de0f476..f5e216b157c75ff2215d7c74cd1d9febad47031c 100644 (file)
@@ -30,8 +30,9 @@
 #define EDMA_TCD_ATTR_SSIZE(x)         (((x) & GENMASK(2, 0)) << 8)
 #define EDMA_TCD_ATTR_SMOD(x)          (((x) & GENMASK(4, 0)) << 11)
 
-#define EDMA_TCD_CITER_CITER(x)                ((x) & GENMASK(14, 0))
-#define EDMA_TCD_BITER_BITER(x)                ((x) & GENMASK(14, 0))
+#define EDMA_TCD_ITER_MASK             GENMASK(14, 0)
+#define EDMA_TCD_CITER_CITER(x)                ((x) & EDMA_TCD_ITER_MASK)
+#define EDMA_TCD_BITER_BITER(x)                ((x) & EDMA_TCD_ITER_MASK)
 
 #define EDMA_TCD_CSR_START             BIT(0)
 #define EDMA_TCD_CSR_INT_MAJOR         BIT(1)
index 45cc419b1b4acbe87c12c3daaccafce73f8de1ba..d36e28b9c767ae7ebb44bc9e87de7bbc0363f926 100644 (file)
@@ -10,6 +10,7 @@
  */
 
 #include <dt-bindings/dma/fsl-edma.h>
+#include <linux/bitfield.h>
 #include <linux/module.h>
 #include <linux/interrupt.h>
 #include <linux/clk.h>
@@ -582,7 +583,8 @@ static int fsl_edma_probe(struct platform_device *pdev)
                                        DMAENGINE_ALIGN_32_BYTES;
 
        /* Per worst case 'nbytes = 1' take CITER as the max_seg_size */
-       dma_set_max_seg_size(fsl_edma->dma_dev.dev, 0x3fff);
+       dma_set_max_seg_size(fsl_edma->dma_dev.dev,
+                            FIELD_GET(EDMA_TCD_ITER_MASK, EDMA_TCD_ITER_MASK));
 
        fsl_edma->dma_dev.residue_granularity = DMA_RESIDUE_GRANULARITY_SEGMENT;
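
Rather than the bare 0x3fff magic number, the maximum segment size is now
derived from the CITER field mask itself: FIELD_GET(m, m) shifts the mask
down to its all-ones field value, 0x7fff for GENMASK(14, 0). The idiom in
isolation (illustrative names):

	#include <linux/bitfield.h>
	#include <linux/bits.h>

	#define DEMO_ITER_MASK	GENMASK(14, 0)

	static u32 demo_max_seg_size(void)
	{
		/* FIELD_GET(mask, mask) == the field's maximum value */
		return FIELD_GET(DEMO_ITER_MASK, DEMO_ITER_MASK);
	}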
 
index a1d0aa63142a981bb59fcde5663e53ee7355947c..5005e138fc239bf23a8a888c90e5ad720f697d3d 100644 (file)
 #define FSL_QDMA_CMD_WTHROTL_OFFSET    20
 #define FSL_QDMA_CMD_DSEN_OFFSET       19
 #define FSL_QDMA_CMD_LWC_OFFSET                16
+#define FSL_QDMA_CMD_PF                        BIT(17)
 
 /* Field definition for Descriptor status */
 #define QDMA_CCDF_STATUS_RTE           BIT(5)
@@ -160,6 +161,10 @@ struct fsl_qdma_format {
                        u8 __reserved1[2];
                        u8 cfg8b_w1;
                } __packed;
+               struct {
+                       __le32 __reserved2;
+                       __le32 cmd;
+               } __packed;
                __le64 data;
        };
 } __packed;
@@ -354,7 +359,6 @@ static void fsl_qdma_free_chan_resources(struct dma_chan *chan)
 static void fsl_qdma_comp_fill_memcpy(struct fsl_qdma_comp *fsl_comp,
                                      dma_addr_t dst, dma_addr_t src, u32 len)
 {
-       u32 cmd;
        struct fsl_qdma_format *sdf, *ddf;
        struct fsl_qdma_format *ccdf, *csgf_desc, *csgf_src, *csgf_dest;
 
@@ -383,14 +387,11 @@ static void fsl_qdma_comp_fill_memcpy(struct fsl_qdma_comp *fsl_comp,
        /* This entry is the last entry. */
        qdma_csgf_set_f(csgf_dest, len);
        /* Descriptor Buffer */
-       cmd = cpu_to_le32(FSL_QDMA_CMD_RWTTYPE <<
-                         FSL_QDMA_CMD_RWTTYPE_OFFSET);
-       sdf->data = QDMA_SDDF_CMD(cmd);
-
-       cmd = cpu_to_le32(FSL_QDMA_CMD_RWTTYPE <<
-                         FSL_QDMA_CMD_RWTTYPE_OFFSET);
-       cmd |= cpu_to_le32(FSL_QDMA_CMD_LWC << FSL_QDMA_CMD_LWC_OFFSET);
-       ddf->data = QDMA_SDDF_CMD(cmd);
+       sdf->cmd = cpu_to_le32((FSL_QDMA_CMD_RWTTYPE << FSL_QDMA_CMD_RWTTYPE_OFFSET) |
+                              FSL_QDMA_CMD_PF);
+
+       ddf->cmd = cpu_to_le32((FSL_QDMA_CMD_RWTTYPE << FSL_QDMA_CMD_RWTTYPE_OFFSET) |
+                              (FSL_QDMA_CMD_LWC << FSL_QDMA_CMD_LWC_OFFSET));
 }
 
 /*
@@ -514,11 +515,11 @@ static struct fsl_qdma_queue
                        queue_temp = queue_head + i + (j * queue_num);
 
                        queue_temp->cq =
-                       dma_alloc_coherent(&pdev->dev,
-                                          sizeof(struct fsl_qdma_format) *
-                                          queue_size[i],
-                                          &queue_temp->bus_addr,
-                                          GFP_KERNEL);
+                       dmam_alloc_coherent(&pdev->dev,
+                                           sizeof(struct fsl_qdma_format) *
+                                           queue_size[i],
+                                           &queue_temp->bus_addr,
+                                           GFP_KERNEL);
                        if (!queue_temp->cq)
                                return NULL;
                        queue_temp->block_base = fsl_qdma->block_base +
@@ -563,15 +564,14 @@ static struct fsl_qdma_queue
        /*
         * Buffer for queue command
         */
-       status_head->cq = dma_alloc_coherent(&pdev->dev,
-                                            sizeof(struct fsl_qdma_format) *
-                                            status_size,
-                                            &status_head->bus_addr,
-                                            GFP_KERNEL);
-       if (!status_head->cq) {
-               devm_kfree(&pdev->dev, status_head);
+       status_head->cq = dmam_alloc_coherent(&pdev->dev,
+                                             sizeof(struct fsl_qdma_format) *
+                                             status_size,
+                                             &status_head->bus_addr,
+                                             GFP_KERNEL);
+       if (!status_head->cq)
                return NULL;
-       }
+
        status_head->n_cq = status_size;
        status_head->virt_head = status_head->cq;
        status_head->virt_tail = status_head->cq;
@@ -625,7 +625,7 @@ static int fsl_qdma_halt(struct fsl_qdma_engine *fsl_qdma)
 
 static int
 fsl_qdma_queue_transfer_complete(struct fsl_qdma_engine *fsl_qdma,
-                                void *block,
+                                void __iomem *block,
                                 int id)
 {
        bool duplicate;
@@ -1197,10 +1197,6 @@ static int fsl_qdma_probe(struct platform_device *pdev)
        if (!fsl_qdma->queue)
                return -ENOMEM;
 
-       ret = fsl_qdma_irq_init(pdev, fsl_qdma);
-       if (ret)
-               return ret;
-
        fsl_qdma->irq_base = platform_get_irq_byname(pdev, "qdma-queue0");
        if (fsl_qdma->irq_base < 0)
                return fsl_qdma->irq_base;
@@ -1239,16 +1235,19 @@ static int fsl_qdma_probe(struct platform_device *pdev)
 
        platform_set_drvdata(pdev, fsl_qdma);
 
-       ret = dma_async_device_register(&fsl_qdma->dma_dev);
+       ret = fsl_qdma_reg_init(fsl_qdma);
        if (ret) {
-               dev_err(&pdev->dev,
-                       "Can't register NXP Layerscape qDMA engine.\n");
+               dev_err(&pdev->dev, "Can't initialize the qDMA engine.\n");
                return ret;
        }
 
-       ret = fsl_qdma_reg_init(fsl_qdma);
+       ret = fsl_qdma_irq_init(pdev, fsl_qdma);
+       if (ret)
+               return ret;
+
+       ret = dma_async_device_register(&fsl_qdma->dma_dev);
        if (ret) {
-               dev_err(&pdev->dev, "Can't Initialize the qDMA engine.\n");
+               dev_err(&pdev->dev, "Can't register NXP Layerscape qDMA engine.\n");
                return ret;
        }
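
The probe path is reordered so the qDMA registers are programmed before any
IRQ can fire, and the DMA device is exposed to the framework only once the
hardware is ready. The general shape, with hypothetical helper names:

	static int demo_probe(struct platform_device *pdev)
	{
		struct demo_priv *priv = platform_get_drvdata(pdev);
		int ret;

		ret = demo_hw_reg_init(priv);	/* quiesce/program registers */
		if (ret)
			return ret;
		ret = demo_request_irqs(pdev, priv); /* handlers see valid state */
		if (ret)
			return ret;
		/* register with the framework last */
		return dma_async_device_register(&priv->dma_dev);
	}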
 
@@ -1268,8 +1267,6 @@ static void fsl_qdma_cleanup_vchan(struct dma_device *dmadev)
 
 static void fsl_qdma_remove(struct platform_device *pdev)
 {
-       int i;
-       struct fsl_qdma_queue *status;
        struct device_node *np = pdev->dev.of_node;
        struct fsl_qdma_engine *fsl_qdma = platform_get_drvdata(pdev);
 
@@ -1277,12 +1274,6 @@ static void fsl_qdma_remove(struct platform_device *pdev)
        fsl_qdma_cleanup_vchan(&fsl_qdma->dma_dev);
        of_dma_controller_free(np);
        dma_async_device_unregister(&fsl_qdma->dma_dev);
-
-       for (i = 0; i < fsl_qdma->block_number; i++) {
-               status = fsl_qdma->status[i];
-               dma_free_coherent(&pdev->dev, sizeof(struct fsl_qdma_format) *
-                               status->n_cq, status->cq, status->bus_addr);
-       }
 }
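
Switching the queue allocations to dmam_alloc_coherent() ties their
lifetime to the device through devres, which is what lets both the
error-path devm_kfree() and the explicit dma_free_coherent() loop in
fsl_qdma_remove() go away. The managed-allocation shape, sketched:

	/* freed automatically on probe failure or device removal */
	cq = dmam_alloc_coherent(&pdev->dev, size, &bus_addr, GFP_KERNEL);
	if (!cq)
		return NULL;	/* earlier dmam allocations unwind via devres */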
 
 static const struct of_device_id fsl_qdma_dt_ids[] = {
index 77f8885cf4075acfd3ff535b7e09519a8df41c70..e5a94a93a3cc4e6da66aca64cc2174b20d80a7bb 100644 (file)
@@ -345,7 +345,7 @@ static void idxd_cdev_evl_drain_pasid(struct idxd_wq *wq, u32 pasid)
        spin_lock(&evl->lock);
        status.bits = ioread64(idxd->reg_base + IDXD_EVLSTATUS_OFFSET);
        t = status.tail;
-       h = evl->head;
+       h = status.head;
        size = evl->size;
 
        while (h != t) {
index 9cfbd9b14c4c43306326e857b8b3d982c612314f..f3f25ee676f30eb283989586d458a5c8b8c01f9f 100644 (file)
@@ -68,9 +68,9 @@ static int debugfs_evl_show(struct seq_file *s, void *d)
 
        spin_lock(&evl->lock);
 
-       h = evl->head;
        evl_status.bits = ioread64(idxd->reg_base + IDXD_EVLSTATUS_OFFSET);
        t = evl_status.tail;
+       h = evl_status.head;
        evl_size = evl->size;
 
        seq_printf(s, "Event Log head %u tail %u interrupt pending %u\n\n",
index 47de3f93ff1e9a72eb718b07c05213d19ec1d23b..d0f5db6cf1eda103db09c31449cf3a58d58b7971 100644 (file)
@@ -300,7 +300,6 @@ struct idxd_evl {
        unsigned int log_size;
        /* The number of entries in the event log. */
        u16 size;
-       u16 head;
        unsigned long *bmap;
        bool batch_fail[IDXD_MAX_BATCH_IDENT];
 };
index 14df1f1347a8dd83b82263438acf3fe613513564..4954adc6bb609e508c510daf630f1077191fd2c7 100644 (file)
@@ -343,7 +343,9 @@ static void idxd_cleanup_internals(struct idxd_device *idxd)
 static int idxd_init_evl(struct idxd_device *idxd)
 {
        struct device *dev = &idxd->pdev->dev;
+       unsigned int evl_cache_size;
        struct idxd_evl *evl;
+       const char *idxd_name;
 
        if (idxd->hw.gen_cap.evl_support == 0)
                return 0;
@@ -355,9 +357,16 @@ static int idxd_init_evl(struct idxd_device *idxd)
        spin_lock_init(&evl->lock);
        evl->size = IDXD_EVL_SIZE_MIN;
 
-       idxd->evl_cache = kmem_cache_create(dev_name(idxd_confdev(idxd)),
-                                           sizeof(struct idxd_evl_fault) + evl_ent_size(idxd),
-                                           0, 0, NULL);
+       idxd_name = dev_name(idxd_confdev(idxd));
+       evl_cache_size = sizeof(struct idxd_evl_fault) + evl_ent_size(idxd);
+       /*
+        * Since the completion record in evl_cache is copied to userspace
+        * when handling a completion record page fault, the cache must be
+        * created with user-copy support.
+        */
+       idxd->evl_cache = kmem_cache_create_usercopy(idxd_name, evl_cache_size,
+                                                    0, 0, 0, evl_cache_size,
+                                                    NULL);
        if (!idxd->evl_cache) {
                kfree(evl);
                return -ENOMEM;
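
kmem_cache_create_usercopy() declares which byte range of each object may
legitimately be copied to or from userspace when CONFIG_HARDENED_USERCOPY
is enabled; here the whole object is whitelisted because complete records
are handed to the user. Signature sketch:

	cache = kmem_cache_create_usercopy(name, size,
					   0,		/* align */
					   0,		/* slab flags */
					   0,		/* useroffset */
					   size,	/* usersize: whole object */
					   NULL);	/* ctor */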
index c8a0aa874b1153f845278e03e9e5153cc487c0fb..348aa21389a9fceb4cd522579c8f8a9963e72ef3 100644 (file)
@@ -367,9 +367,9 @@ static void process_evl_entries(struct idxd_device *idxd)
        /* Clear interrupt pending bit */
        iowrite32(evl_status.bits_upper32,
                  idxd->reg_base + IDXD_EVLSTATUS_OFFSET + sizeof(u32));
-       h = evl->head;
        evl_status.bits = ioread64(idxd->reg_base + IDXD_EVLSTATUS_OFFSET);
        t = evl_status.tail;
+       h = evl_status.head;
        size = idxd->evl->size;
 
        while (h != t) {
@@ -378,7 +378,6 @@ static void process_evl_entries(struct idxd_device *idxd)
                h = (h + 1) % size;
        }
 
-       evl->head = h;
        evl_status.head = h;
        iowrite32(evl_status.bits_lower32, idxd->reg_base + IDXD_EVLSTATUS_OFFSET);
        spin_unlock(&evl->lock);
index 1aa65e5de0f3ad9bc0fa0907ebda8e8c0fe6d0ab..f792407348077dd9fe481cfdec6577a701493487 100644 (file)
@@ -385,8 +385,6 @@ int pt_dmaengine_register(struct pt_device *pt)
        chan->vc.desc_free = pt_do_cleanup;
        vchan_init(&chan->vc, dma_dev);
 
-       dma_set_mask_and_coherent(pt->dev, DMA_BIT_MASK(64));
-
        ret = dma_async_device_register(dma_dev);
        if (ret)
                goto err_reg;
index f1f920861fa9d8937a34bc29906778720a8dbfd8..5f8d2e93ff3fb516ea6374e007110b057377f11e 100644 (file)
@@ -2404,6 +2404,11 @@ static int edma_probe(struct platform_device *pdev)
        if (irq > 0) {
                irq_name = devm_kasprintf(dev, GFP_KERNEL, "%s_ccint",
                                          dev_name(dev));
+               if (!irq_name) {
+                       ret = -ENOMEM;
+                       goto err_disable_pm;
+               }
+
                ret = devm_request_irq(dev, irq, dma_irq_handler, 0, irq_name,
                                       ecc);
                if (ret) {
@@ -2420,6 +2425,11 @@ static int edma_probe(struct platform_device *pdev)
        if (irq > 0) {
                irq_name = devm_kasprintf(dev, GFP_KERNEL, "%s_ccerrint",
                                          dev_name(dev));
+               if (!irq_name) {
+                       ret = -ENOMEM;
+                       goto err_disable_pm;
+               }
+
                ret = devm_request_irq(dev, irq, dma_ccerr_handler, 0, irq_name,
                                       ecc);
                if (ret) {
index 2841a539c264891cde8b8166d51d5bbf885a5d85..6400d06588a24d1aa54ceefc3e79ce7661e43f3a 100644 (file)
@@ -3968,6 +3968,7 @@ static void udma_desc_pre_callback(struct virt_dma_chan *vc,
 {
        struct udma_chan *uc = to_udma_chan(&vc->chan);
        struct udma_desc *d;
+       u8 status;
 
        if (!vd)
                return;
@@ -3977,12 +3978,12 @@ static void udma_desc_pre_callback(struct virt_dma_chan *vc,
        if (d->metadata_size)
                udma_fetch_epib(uc, d);
 
-       /* Provide residue information for the client */
        if (result) {
                void *desc_vaddr = udma_curr_cppi5_desc_vaddr(d, d->desc_idx);
 
                if (cppi5_desc_get_type(desc_vaddr) ==
                    CPPI5_INFO0_DESC_TYPE_VAL_HOST) {
+                       /* Provide residue information for the client */
                        result->residue = d->residue -
                                          cppi5_hdesc_get_pktlen(desc_vaddr);
                        if (result->residue)
@@ -3991,7 +3992,12 @@ static void udma_desc_pre_callback(struct virt_dma_chan *vc,
                                result->result = DMA_TRANS_NOERROR;
                } else {
                        result->residue = 0;
-                       result->result = DMA_TRANS_NOERROR;
+                       /* Propagate TR Response errors to the client */
+                       status = d->hwdesc[0].tr_resp_base->status;
+                       if (status)
+                               result->result = DMA_TRANS_ABORTED;
+                       else
+                               result->result = DMA_TRANS_NOERROR;
                }
        }
 }
index 1eca8cc271f841e7b15967b2c33394169065b4ab..7f686d179fc93c85f684d051595a1d4c1934bdbb 100644 (file)
@@ -29,8 +29,6 @@ static u32 dpll_pin_xa_id;
        WARN_ON_ONCE(!xa_get_mark(&dpll_device_xa, (d)->id, DPLL_REGISTERED))
 #define ASSERT_DPLL_NOT_REGISTERED(d)  \
        WARN_ON_ONCE(xa_get_mark(&dpll_device_xa, (d)->id, DPLL_REGISTERED))
-#define ASSERT_PIN_REGISTERED(p)       \
-       WARN_ON_ONCE(!xa_get_mark(&dpll_pin_xa, (p)->id, DPLL_REGISTERED))
 
 struct dpll_device_registration {
        struct list_head list;
@@ -425,6 +423,53 @@ void dpll_device_unregister(struct dpll_device *dpll,
 }
 EXPORT_SYMBOL_GPL(dpll_device_unregister);
 
+static void dpll_pin_prop_free(struct dpll_pin_properties *prop)
+{
+       kfree(prop->package_label);
+       kfree(prop->panel_label);
+       kfree(prop->board_label);
+       kfree(prop->freq_supported);
+}
+
+static int dpll_pin_prop_dup(const struct dpll_pin_properties *src,
+                            struct dpll_pin_properties *dst)
+{
+       memcpy(dst, src, sizeof(*dst));
+       if (src->freq_supported && src->freq_supported_num) {
+               size_t freq_size = src->freq_supported_num *
+                                  sizeof(*src->freq_supported);
+               dst->freq_supported = kmemdup(src->freq_supported,
+                                             freq_size, GFP_KERNEL);
+               if (!dst->freq_supported)
+                       return -ENOMEM;
+       }
+       if (src->board_label) {
+               dst->board_label = kstrdup(src->board_label, GFP_KERNEL);
+               if (!dst->board_label)
+                       goto err_board_label;
+       }
+       if (src->panel_label) {
+               dst->panel_label = kstrdup(src->panel_label, GFP_KERNEL);
+               if (!dst->panel_label)
+                       goto err_panel_label;
+       }
+       if (src->package_label) {
+               dst->package_label = kstrdup(src->package_label, GFP_KERNEL);
+               if (!dst->package_label)
+                       goto err_package_label;
+       }
+
+       return 0;
+
+err_package_label:
+       kfree(dst->panel_label);
+err_panel_label:
+       kfree(dst->board_label);
+err_board_label:
+       kfree(dst->freq_supported);
+       return -ENOMEM;
+}
+
 static struct dpll_pin *
 dpll_pin_alloc(u64 clock_id, u32 pin_idx, struct module *module,
               const struct dpll_pin_properties *prop)
@@ -441,24 +486,48 @@ dpll_pin_alloc(u64 clock_id, u32 pin_idx, struct module *module,
        if (WARN_ON(prop->type < DPLL_PIN_TYPE_MUX ||
                    prop->type > DPLL_PIN_TYPE_MAX)) {
                ret = -EINVAL;
-               goto err;
+               goto err_pin_prop;
        }
-       pin->prop = prop;
+       ret = dpll_pin_prop_dup(prop, &pin->prop);
+       if (ret)
+               goto err_pin_prop;
        refcount_set(&pin->refcount, 1);
        xa_init_flags(&pin->dpll_refs, XA_FLAGS_ALLOC);
        xa_init_flags(&pin->parent_refs, XA_FLAGS_ALLOC);
        ret = xa_alloc_cyclic(&dpll_pin_xa, &pin->id, pin, xa_limit_32b,
                              &dpll_pin_xa_id, GFP_KERNEL);
        if (ret)
-               goto err;
+               goto err_xa_alloc;
        return pin;
-err:
+err_xa_alloc:
        xa_destroy(&pin->dpll_refs);
        xa_destroy(&pin->parent_refs);
+       dpll_pin_prop_free(&pin->prop);
+err_pin_prop:
        kfree(pin);
        return ERR_PTR(ret);
 }
 
+static void dpll_netdev_pin_assign(struct net_device *dev, struct dpll_pin *dpll_pin)
+{
+       rtnl_lock();
+       rcu_assign_pointer(dev->dpll_pin, dpll_pin);
+       rtnl_unlock();
+}
+
+void dpll_netdev_pin_set(struct net_device *dev, struct dpll_pin *dpll_pin)
+{
+       WARN_ON(!dpll_pin);
+       dpll_netdev_pin_assign(dev, dpll_pin);
+}
+EXPORT_SYMBOL(dpll_netdev_pin_set);
+
+void dpll_netdev_pin_clear(struct net_device *dev)
+{
+       dpll_netdev_pin_assign(dev, NULL);
+}
+EXPORT_SYMBOL(dpll_netdev_pin_clear);
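
dev->dpll_pin becomes an RCU-managed pointer: writers publish under RTNL
with rcu_assign_pointer(), readers use rcu_dereference_rtnl() (see
dpll_netdev_pin() in dpll_netlink.c below), and dpll_pin_put() retires the
object with kfree_rcu() so in-flight readers never touch freed memory. The
three sides of the pattern, condensed:

	/* publish (writer, under RTNL) */
	rtnl_lock();
	rcu_assign_pointer(dev->dpll_pin, pin);
	rtnl_unlock();

	/* read (RTNL held, or inside rcu_read_lock()) */
	pin = rcu_dereference_rtnl(dev->dpll_pin);

	/* retire: defer the free past any ongoing RCU readers */
	kfree_rcu(pin, rcu);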
+
 /**
  * dpll_pin_get - find existing or create new dpll pin
  * @clock_id: clock_id of creator
@@ -514,7 +583,8 @@ void dpll_pin_put(struct dpll_pin *pin)
                xa_destroy(&pin->dpll_refs);
                xa_destroy(&pin->parent_refs);
                xa_erase(&dpll_pin_xa, pin->id);
-               kfree(pin);
+               dpll_pin_prop_free(&pin->prop);
+               kfree_rcu(pin, rcu);
        }
        mutex_unlock(&dpll_lock);
 }
@@ -564,8 +634,6 @@ dpll_pin_register(struct dpll_device *dpll, struct dpll_pin *pin,
            WARN_ON(!ops->state_on_dpll_get) ||
            WARN_ON(!ops->direction_get))
                return -EINVAL;
-       if (ASSERT_DPLL_REGISTERED(dpll))
-               return -EINVAL;
 
        mutex_lock(&dpll_lock);
        if (WARN_ON(!(dpll->module == pin->module &&
@@ -636,15 +704,13 @@ int dpll_pin_on_pin_register(struct dpll_pin *parent, struct dpll_pin *pin,
        unsigned long i, stop;
        int ret;
 
-       if (WARN_ON(parent->prop->type != DPLL_PIN_TYPE_MUX))
+       if (WARN_ON(parent->prop.type != DPLL_PIN_TYPE_MUX))
                return -EINVAL;
 
        if (WARN_ON(!ops) ||
            WARN_ON(!ops->state_on_pin_get) ||
            WARN_ON(!ops->direction_get))
                return -EINVAL;
-       if (ASSERT_PIN_REGISTERED(parent))
-               return -EINVAL;
 
        mutex_lock(&dpll_lock);
        ret = dpll_xa_ref_pin_add(&pin->parent_refs, parent, ops, priv);
index 5585873c5c1b020e5618896aaa43ff1656ff53dd..2b6d8ef1cdf36cff24328e497c49d667659dd0e6 100644 (file)
@@ -44,9 +44,10 @@ struct dpll_device {
  * @module:            module of creator
  * @dpll_refs:         hold references to dplls pin was registered with
  * @parent_refs:       hold references to parent pins pin was registered with
- * @prop:              pointer to pin properties given by registerer
+ * @prop:              pin properties copied from the registerer
  * @rclk_dev_name:     holds name of device when pin can recover clock from it
  * @refcount:          refcount
+ * @rcu:               rcu_head for kfree_rcu()
  **/
 struct dpll_pin {
        u32 id;
@@ -55,8 +56,9 @@ struct dpll_pin {
        struct module *module;
        struct xarray dpll_refs;
        struct xarray parent_refs;
-       const struct dpll_pin_properties *prop;
+       struct dpll_pin_properties prop;
        refcount_t refcount;
+       struct rcu_head rcu;
 };
 
 /**
index 3370dbddb86bdeb6b627fdf741357eeb15ee3676..b57355e0c214bb3badca414c7d127d79a772bcdb 100644 (file)
@@ -8,6 +8,7 @@
  */
 #include <linux/module.h>
 #include <linux/kernel.h>
+#include <linux/netdevice.h>
 #include <net/genetlink.h>
 #include "dpll_core.h"
 #include "dpll_netlink.h"
@@ -47,18 +48,6 @@ dpll_msg_add_dev_parent_handle(struct sk_buff *msg, u32 id)
        return 0;
 }
 
-/**
- * dpll_msg_pin_handle_size - get size of pin handle attribute for given pin
- * @pin: pin pointer
- *
- * Return: byte size of pin handle attribute for given pin.
- */
-size_t dpll_msg_pin_handle_size(struct dpll_pin *pin)
-{
-       return pin ? nla_total_size(4) : 0; /* DPLL_A_PIN_ID */
-}
-EXPORT_SYMBOL_GPL(dpll_msg_pin_handle_size);
-
 /**
  * dpll_msg_add_pin_handle - attach pin handle attribute to a given message
  * @msg: pointer to sk_buff message to attach a pin handle
@@ -68,7 +57,7 @@ EXPORT_SYMBOL_GPL(dpll_msg_pin_handle_size);
  * * 0 - success
  * * -EMSGSIZE - no space in message to attach pin handle
  */
-int dpll_msg_add_pin_handle(struct sk_buff *msg, struct dpll_pin *pin)
+static int dpll_msg_add_pin_handle(struct sk_buff *msg, struct dpll_pin *pin)
 {
        if (!pin)
                return 0;
@@ -76,7 +65,28 @@ int dpll_msg_add_pin_handle(struct sk_buff *msg, struct dpll_pin *pin)
                return -EMSGSIZE;
        return 0;
 }
-EXPORT_SYMBOL_GPL(dpll_msg_add_pin_handle);
+
+static struct dpll_pin *dpll_netdev_pin(const struct net_device *dev)
+{
+       return rcu_dereference_rtnl(dev->dpll_pin);
+}
+
+/**
+ * dpll_netdev_pin_handle_size - get size of pin handle attribute of a netdev
+ * @dev: netdev from which to get the pin
+ *
+ * Return: byte size of pin handle attribute, or 0 if @dev has no pin.
+ */
+size_t dpll_netdev_pin_handle_size(const struct net_device *dev)
+{
+       return dpll_netdev_pin(dev) ? nla_total_size(4) : 0; /* DPLL_A_PIN_ID */
+}
+
+int dpll_netdev_add_pin_handle(struct sk_buff *msg,
+                              const struct net_device *dev)
+{
+       return dpll_msg_add_pin_handle(msg, dpll_netdev_pin(dev));
+}
 
 static int
 dpll_msg_add_mode(struct sk_buff *msg, struct dpll_device *dpll,
@@ -303,17 +313,17 @@ dpll_msg_add_pin_freq(struct sk_buff *msg, struct dpll_pin *pin,
        if (nla_put_64bit(msg, DPLL_A_PIN_FREQUENCY, sizeof(freq), &freq,
                          DPLL_A_PIN_PAD))
                return -EMSGSIZE;
-       for (fs = 0; fs < pin->prop->freq_supported_num; fs++) {
+       for (fs = 0; fs < pin->prop.freq_supported_num; fs++) {
                nest = nla_nest_start(msg, DPLL_A_PIN_FREQUENCY_SUPPORTED);
                if (!nest)
                        return -EMSGSIZE;
-               freq = pin->prop->freq_supported[fs].min;
+               freq = pin->prop.freq_supported[fs].min;
                if (nla_put_64bit(msg, DPLL_A_PIN_FREQUENCY_MIN, sizeof(freq),
                                  &freq, DPLL_A_PIN_PAD)) {
                        nla_nest_cancel(msg, nest);
                        return -EMSGSIZE;
                }
-               freq = pin->prop->freq_supported[fs].max;
+               freq = pin->prop.freq_supported[fs].max;
                if (nla_put_64bit(msg, DPLL_A_PIN_FREQUENCY_MAX, sizeof(freq),
                                  &freq, DPLL_A_PIN_PAD)) {
                        nla_nest_cancel(msg, nest);
@@ -329,9 +339,9 @@ static bool dpll_pin_is_freq_supported(struct dpll_pin *pin, u32 freq)
 {
        int fs;
 
-       for (fs = 0; fs < pin->prop->freq_supported_num; fs++)
-               if (freq >= pin->prop->freq_supported[fs].min &&
-                   freq <= pin->prop->freq_supported[fs].max)
+       for (fs = 0; fs < pin->prop.freq_supported_num; fs++)
+               if (freq >= pin->prop.freq_supported[fs].min &&
+                   freq <= pin->prop.freq_supported[fs].max)
                        return true;
        return false;
 }
@@ -421,7 +431,7 @@ static int
 dpll_cmd_pin_get_one(struct sk_buff *msg, struct dpll_pin *pin,
                     struct netlink_ext_ack *extack)
 {
-       const struct dpll_pin_properties *prop = pin->prop;
+       const struct dpll_pin_properties *prop = &pin->prop;
        struct dpll_pin_ref *ref;
        int ret;
 
@@ -553,6 +563,24 @@ __dpll_device_change_ntf(struct dpll_device *dpll)
        return dpll_device_event_send(DPLL_CMD_DEVICE_CHANGE_NTF, dpll);
 }
 
+static bool dpll_pin_available(struct dpll_pin *pin)
+{
+       struct dpll_pin_ref *par_ref;
+       unsigned long i;
+
+       if (!xa_get_mark(&dpll_pin_xa, pin->id, DPLL_REGISTERED))
+               return false;
+       xa_for_each(&pin->parent_refs, i, par_ref)
+               if (xa_get_mark(&dpll_pin_xa, par_ref->pin->id,
+                               DPLL_REGISTERED))
+                       return true;
+       xa_for_each(&pin->dpll_refs, i, par_ref)
+               if (xa_get_mark(&dpll_device_xa, par_ref->dpll->id,
+                               DPLL_REGISTERED))
+                       return true;
+       return false;
+}
+
 /**
  * dpll_device_change_ntf - notify that the dpll device has been changed
  * @dpll: registered dpll pointer
@@ -579,7 +607,7 @@ dpll_pin_event_send(enum dpll_cmd event, struct dpll_pin *pin)
        int ret = -ENOMEM;
        void *hdr;
 
-       if (WARN_ON(!xa_get_mark(&dpll_pin_xa, pin->id, DPLL_REGISTERED)))
+       if (!dpll_pin_available(pin))
                return -ENODEV;
 
        msg = genlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
@@ -717,7 +745,7 @@ dpll_pin_on_pin_state_set(struct dpll_pin *pin, u32 parent_idx,
        int ret;
 
        if (!(DPLL_PIN_CAPABILITIES_STATE_CAN_CHANGE &
-             pin->prop->capabilities)) {
+             pin->prop.capabilities)) {
                NL_SET_ERR_MSG(extack, "state changing is not allowed");
                return -EOPNOTSUPP;
        }
@@ -753,7 +781,7 @@ dpll_pin_state_set(struct dpll_device *dpll, struct dpll_pin *pin,
        int ret;
 
        if (!(DPLL_PIN_CAPABILITIES_STATE_CAN_CHANGE &
-             pin->prop->capabilities)) {
+             pin->prop.capabilities)) {
                NL_SET_ERR_MSG(extack, "state changing is not allowed");
                return -EOPNOTSUPP;
        }
@@ -780,7 +808,7 @@ dpll_pin_prio_set(struct dpll_device *dpll, struct dpll_pin *pin,
        int ret;
 
        if (!(DPLL_PIN_CAPABILITIES_PRIORITY_CAN_CHANGE &
-             pin->prop->capabilities)) {
+             pin->prop.capabilities)) {
                NL_SET_ERR_MSG(extack, "prio changing is not allowed");
                return -EOPNOTSUPP;
        }
@@ -808,7 +836,7 @@ dpll_pin_direction_set(struct dpll_pin *pin, struct dpll_device *dpll,
        int ret;
 
        if (!(DPLL_PIN_CAPABILITIES_DIRECTION_CAN_CHANGE &
-             pin->prop->capabilities)) {
+             pin->prop.capabilities)) {
                NL_SET_ERR_MSG(extack, "direction changing is not allowed");
                return -EOPNOTSUPP;
        }
@@ -838,8 +866,8 @@ dpll_pin_phase_adj_set(struct dpll_pin *pin, struct nlattr *phase_adj_attr,
        int ret;
 
        phase_adj = nla_get_s32(phase_adj_attr);
-       if (phase_adj > pin->prop->phase_range.max ||
-           phase_adj < pin->prop->phase_range.min) {
+       if (phase_adj > pin->prop.phase_range.max ||
+           phase_adj < pin->prop.phase_range.min) {
                NL_SET_ERR_MSG_ATTR(extack, phase_adj_attr,
                                    "phase adjust value not supported");
                return -EINVAL;
@@ -1023,7 +1051,7 @@ dpll_pin_find(u64 clock_id, struct nlattr *mod_name_attr,
        unsigned long i;
 
        xa_for_each_marked(&dpll_pin_xa, i, pin, DPLL_REGISTERED) {
-               prop = pin->prop;
+               prop = &pin->prop;
                cid_match = clock_id ? pin->clock_id == clock_id : true;
                mod_match = mod_name_attr && module_name(pin->module) ?
                        !nla_strcmp(mod_name_attr,
@@ -1130,6 +1158,10 @@ int dpll_nl_pin_id_get_doit(struct sk_buff *skb, struct genl_info *info)
        }
        pin = dpll_pin_find_from_nlattr(info);
        if (!IS_ERR(pin)) {
+               if (!dpll_pin_available(pin)) {
+                       nlmsg_free(msg);
+                       return -ENODEV;
+               }
                ret = dpll_msg_add_pin_handle(msg, pin);
                if (ret) {
                        nlmsg_free(msg);
@@ -1177,8 +1209,11 @@ int dpll_nl_pin_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
        unsigned long i;
        int ret = 0;
 
+       mutex_lock(&dpll_lock);
        xa_for_each_marked_start(&dpll_pin_xa, i, pin, DPLL_REGISTERED,
                                 ctx->idx) {
+               if (!dpll_pin_available(pin))
+                       continue;
                hdr = genlmsg_put(skb, NETLINK_CB(cb->skb).portid,
                                  cb->nlh->nlmsg_seq,
                                  &dpll_nl_family, NLM_F_MULTI,
@@ -1194,6 +1229,8 @@ int dpll_nl_pin_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
                }
                genlmsg_end(skb, hdr);
        }
+       mutex_unlock(&dpll_lock);
+
        if (ret == -EMSGSIZE) {
                ctx->idx = i;
                return skb->len;
@@ -1349,6 +1386,7 @@ int dpll_nl_device_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
        unsigned long i;
        int ret = 0;
 
+       mutex_lock(&dpll_lock);
        xa_for_each_marked_start(&dpll_device_xa, i, dpll, DPLL_REGISTERED,
                                 ctx->idx) {
                hdr = genlmsg_put(skb, NETLINK_CB(cb->skb).portid,
@@ -1365,6 +1403,8 @@ int dpll_nl_device_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
                }
                genlmsg_end(skb, hdr);
        }
+       mutex_unlock(&dpll_lock);
+
        if (ret == -EMSGSIZE) {
                ctx->idx = i;
                return skb->len;
@@ -1415,20 +1455,6 @@ dpll_unlock_doit(const struct genl_split_ops *ops, struct sk_buff *skb,
        mutex_unlock(&dpll_lock);
 }
 
-int dpll_lock_dumpit(struct netlink_callback *cb)
-{
-       mutex_lock(&dpll_lock);
-
-       return 0;
-}
-
-int dpll_unlock_dumpit(struct netlink_callback *cb)
-{
-       mutex_unlock(&dpll_lock);
-
-       return 0;
-}
-
 int dpll_pin_pre_doit(const struct genl_split_ops *ops, struct sk_buff *skb,
                      struct genl_info *info)
 {
@@ -1441,7 +1467,8 @@ int dpll_pin_pre_doit(const struct genl_split_ops *ops, struct sk_buff *skb,
        }
        info->user_ptr[0] = xa_load(&dpll_pin_xa,
                                    nla_get_u32(info->attrs[DPLL_A_PIN_ID]));
-       if (!info->user_ptr[0]) {
+       if (!info->user_ptr[0] ||
+           !dpll_pin_available(info->user_ptr[0])) {
                NL_SET_ERR_MSG(info->extack, "pin not found");
                ret = -ENODEV;
                goto unlock_dev;
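
The dpll hunks above move dump-side locking out of the generic netlink .start/.done callbacks and into the dumpit handlers themselves, so dpll_lock is taken and released within each dump pass instead of being held across the whole multi-message dump. A minimal sketch of the resulting pattern, with hypothetical example_* names standing in for the dpll ones:

    static int example_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
    {
            struct example_dump_ctx *ctx = example_dump_context(cb);   /* hypothetical */
            struct example_obj *obj;
            unsigned long i;
            int ret = 0;

            mutex_lock(&example_lock);      /* held for one dump pass only */
            xa_for_each_marked_start(&example_xa, i, obj, EXAMPLE_REGISTERED,
                                     ctx->idx) {
                    ret = example_fill_msg(skb, cb, obj);   /* hypothetical */
                    if (ret)
                            break;
            }
            mutex_unlock(&example_lock);

            if (ret == -EMSGSIZE) {
                    ctx->idx = i;           /* resume from here on the next pass */
                    return skb->len;
            }
            return ret;
    }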
index eaee5be7aa642a9359c0b438e7157eec830b6519..1e95f5397cfce65270fbc88d8916a24386258047 100644 (file)
@@ -95,9 +95,7 @@ static const struct genl_split_ops dpll_nl_ops[] = {
        },
        {
                .cmd    = DPLL_CMD_DEVICE_GET,
-               .start  = dpll_lock_dumpit,
                .dumpit = dpll_nl_device_get_dumpit,
-               .done   = dpll_unlock_dumpit,
                .flags  = GENL_ADMIN_PERM | GENL_CMD_CAP_DUMP,
        },
        {
@@ -129,9 +127,7 @@ static const struct genl_split_ops dpll_nl_ops[] = {
        },
        {
                .cmd            = DPLL_CMD_PIN_GET,
-               .start          = dpll_lock_dumpit,
                .dumpit         = dpll_nl_pin_get_dumpit,
-               .done           = dpll_unlock_dumpit,
                .policy         = dpll_pin_get_dump_nl_policy,
                .maxattr        = DPLL_A_PIN_ID,
                .flags          = GENL_ADMIN_PERM | GENL_CMD_CAP_DUMP,
index 92d4c9c4f788dc1b36c7b076a980afd2903aba68..f491262bee4f0c16e97624353bef6a4938b761aa 100644 (file)
@@ -30,8 +30,6 @@ dpll_post_doit(const struct genl_split_ops *ops, struct sk_buff *skb,
 void
 dpll_pin_post_doit(const struct genl_split_ops *ops, struct sk_buff *skb,
                   struct genl_info *info);
-int dpll_lock_dumpit(struct netlink_callback *cb);
-int dpll_unlock_dumpit(struct netlink_callback *cb);
 
 int dpll_nl_device_id_get_doit(struct sk_buff *skb, struct genl_info *info);
 int dpll_nl_device_get_doit(struct sk_buff *skb, struct genl_info *info);
index 6ac5ff20a2fe22f1c3c5a7010ca0af87caf4c446..401a77e3b5fa8ed9e9b834c4a55cde98d2b2a8db 100644 (file)
@@ -429,7 +429,23 @@ static void bm_work(struct work_struct *work)
         */
        card->bm_generation = generation;
 
-       if (root_device == NULL) {
+       if (card->gap_count == 0) {
+               /*
+                * If self IDs have inconsistent gap counts, do a
+                * bus reset ASAP. The config rom read might never
+                * complete, so don't wait for it. However, still
+                * send a PHY configuration packet prior to the
+                * bus reset. The PHY configuration packet might
+                * fail, but 1394-2008 8.4.5.2 explicitly permits
+                * it in this case, so it should be safe to try.
+                */
+               new_root_id = local_id;
+               /*
+                * We must always send a bus reset if the gap count
+                * is inconsistent, so bypass the 5-reset limit.
+                */
+               card->bm_retries = 0;
+       } else if (root_device == NULL) {
                /*
                 * Either link_on is false, or we failed to read the
                 * config rom.  In either case, pick another root.
@@ -484,7 +500,19 @@ static void bm_work(struct work_struct *work)
                fw_notice(card, "phy config: new root=%x, gap_count=%d\n",
                          new_root_id, gap_count);
                fw_send_phy_config(card, new_root_id, generation, gap_count);
-               reset_bus(card, true);
+               /*
+                * Where possible, use a short bus reset to minimize
+                * disruption to isochronous transfers. But in the event
+                * of a gap count inconsistency, use a long bus reset.
+                *
+                * As noted in 1394a 8.4.6.2, nodes on a mixed 1394/1394a bus
+                * may set different gap counts after a bus reset. On a mixed
+                * 1394/1394a bus, a short bus reset can get doubled. Some
+                * nodes may treat the double reset as one bus reset and others
+                * may treat it as two, causing a gap count inconsistency
+                * again. Using a long bus reset prevents this.
+                */
+               reset_bus(card, card->gap_count != 0);
                /* Will allocate broadcast channel after the reset. */
                goto out;
        }
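
For context, card->gap_count == 0 serves here as a sentinel meaning the self-ID packets gathered after the previous bus reset carried inconsistent gap counts. A hedged sketch of how such an inconsistency could be detected while walking the self-ID stream (SELF_ID_GAP_COUNT() is assumed here to extract the gap-count field of a self-ID quadlet):

    /* Sketch: return the common gap count, or 0 if the self IDs disagree
     * (0 then forces bm_work() down the bus-reset path above).
     */
    static u32 common_gap_count(const u32 *self_ids, int count)
    {
            u32 gap = SELF_ID_GAP_COUNT(self_ids[0]);
            int i;

            for (i = 1; i < count; i++)
                    if (SELF_ID_GAP_COUNT(self_ids[i]) != gap)
                            return 0;       /* inconsistent gap counts */
            return gap;
    }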
index 0547253d16fe5dc5f606a34035473de7f271acf6..7d3346b3a2bf320910c72783ab857415d331ed14 100644 (file)
@@ -118,10 +118,9 @@ static int textual_leaf_to_string(const u32 *block, char *buf, size_t size)
  * @buf:       where to put the string
  * @size:      size of @buf, in bytes
  *
- * The string is taken from a minimal ASCII text descriptor leaf after
- * the immediate entry with @key.  The string is zero-terminated.
- * An overlong string is silently truncated such that it and the
- * zero byte fit into @size.
+ * The string is taken from a minimal ASCII text descriptor leaf just after the entry with
+ * @key. The string is zero-terminated. An overlong string is silently truncated such that it
+ * and the zero byte fit into @size.
  *
  * Returns strlen(buf) or a negative error code.
  */
@@ -368,8 +367,17 @@ static ssize_t show_text_leaf(struct device *dev,
        for (i = 0; i < ARRAY_SIZE(directories) && !!directories[i]; ++i) {
                int result = fw_csr_string(directories[i], attr->key, buf, bufsize);
                // Detected.
-               if (result >= 0)
+               if (result >= 0) {
                        ret = result;
+               } else if (i == 0 && attr->key == CSR_VENDOR) {
+                       // The Sony DVMC-DA1 has a configuration ROM in which the descriptor leaf
+                       // entry in the root directory follows the directory entry for vendor ID
+                       // instead of the immediate value for vendor ID.
+                       result = fw_csr_string(directories[i], CSR_DIRECTORY | attr->key, buf,
+                                              bufsize);
+                       if (result >= 0)
+                               ret = result;
+               }
        }
 
        if (ret >= 0) {
index 9db9290c326930d7ac903382f234f9435876b39b..7bc71f4be64a07510507e1c9b7d0f1a61de30e3b 100644 (file)
@@ -3773,6 +3773,7 @@ static int pci_probe(struct pci_dev *dev,
        return 0;
 
  fail_msi:
+       devm_free_irq(&dev->dev, dev->irq, ohci);
        pci_disable_msi(dev);
 
        return err;
@@ -3800,6 +3801,7 @@ static void pci_remove(struct pci_dev *dev)
 
        software_reset(ohci);
 
+       devm_free_irq(&dev->dev, dev->irq, ohci);
        pci_disable_msi(dev);
 
        dev_notice(&dev->dev, "removing fw-ohci device\n");
index 6146b2927d5c56af6bc3b9722c1789f29a4498fe..f2556a8e940156bc4f9d34ae5dc92aac837b688a 100644 (file)
@@ -107,12 +107,12 @@ struct ffa_drv_info {
        struct work_struct notif_pcpu_work;
        struct work_struct irq_work;
        struct xarray partition_info;
-       unsigned int partition_count;
        DECLARE_HASHTABLE(notifier_hash, ilog2(FFA_MAX_NOTIFICATIONS));
        struct mutex notify_lock; /* lock to protect notifier hashtable  */
 };
 
 static struct ffa_drv_info *drv_info;
+static void ffa_partitions_cleanup(void);
 
 /*
  * The driver must be able to support all the versions from the earliest
@@ -733,6 +733,11 @@ static void __do_sched_recv_cb(u16 part_id, u16 vcpu, bool is_per_vcpu)
        void *cb_data;
 
        partition = xa_load(&drv_info->partition_info, part_id);
+       if (!partition) {
+               pr_err("%s: Invalid partition ID 0x%x\n", __func__, part_id);
+               return;
+       }
+
        read_lock(&partition->rw_lock);
        callback = partition->callback;
        cb_data = partition->cb_data;
@@ -915,6 +920,11 @@ static int ffa_sched_recv_cb_update(u16 part_id, ffa_sched_recv_cb callback,
                return -EOPNOTSUPP;
 
        partition = xa_load(&drv_info->partition_info, part_id);
+       if (!partition) {
+               pr_err("%s: Invalid partition ID 0x%x\n", __func__, part_id);
+               return -EINVAL;
+       }
+
        write_lock(&partition->rw_lock);
 
        cb_valid = !!partition->callback;
@@ -1186,9 +1196,9 @@ void ffa_device_match_uuid(struct ffa_device *ffa_dev, const uuid_t *uuid)
        kfree(pbuf);
 }
 
-static void ffa_setup_partitions(void)
+static int ffa_setup_partitions(void)
 {
-       int count, idx;
+       int count, idx, ret;
        uuid_t uuid;
        struct ffa_device *ffa_dev;
        struct ffa_dev_part_info *info;
@@ -1197,7 +1207,7 @@ static void ffa_setup_partitions(void)
        count = ffa_partition_probe(&uuid_null, &pbuf);
        if (count <= 0) {
                pr_info("%s: No partitions found, error %d\n", __func__, count);
-               return;
+               return -EINVAL;
        }
 
        xa_init(&drv_info->partition_info);
@@ -1226,40 +1236,53 @@ static void ffa_setup_partitions(void)
                        ffa_device_unregister(ffa_dev);
                        continue;
                }
-               xa_store(&drv_info->partition_info, tpbuf->id, info, GFP_KERNEL);
+               rwlock_init(&info->rw_lock);
+               ret = xa_insert(&drv_info->partition_info, tpbuf->id,
+                               info, GFP_KERNEL);
+               if (ret) {
+                       pr_err("%s: failed to save partition ID 0x%x - ret:%d\n",
+                              __func__, tpbuf->id, ret);
+                       ffa_device_unregister(ffa_dev);
+                       kfree(info);
+               }
        }
-       drv_info->partition_count = count;
 
        kfree(pbuf);
 
        /* Allocate for the host */
        info = kzalloc(sizeof(*info), GFP_KERNEL);
-       if (!info)
-               return;
-       xa_store(&drv_info->partition_info, drv_info->vm_id, info, GFP_KERNEL);
-       drv_info->partition_count++;
+       if (!info) {
+               pr_err("%s: failed to alloc Host partition ID 0x%x. Abort.\n",
+                      __func__, drv_info->vm_id);
+               /* Already registered devices are freed on bus_exit */
+               ffa_partitions_cleanup();
+               return -ENOMEM;
+       }
+
+       rwlock_init(&info->rw_lock);
+       ret = xa_insert(&drv_info->partition_info, drv_info->vm_id,
+                       info, GFP_KERNEL);
+       if (ret) {
+               pr_err("%s: failed to save Host partition ID 0x%x - ret:%d. Abort.\n",
+                      __func__, drv_info->vm_id, ret);
+               kfree(info);
+               /* Already registered devices are freed on bus_exit */
+               ffa_partitions_cleanup();
+       }
+
+       return ret;
 }
 
 static void ffa_partitions_cleanup(void)
 {
-       struct ffa_dev_part_info **info;
-       int idx, count = drv_info->partition_count;
-
-       if (!count)
-               return;
-
-       info = kcalloc(count, sizeof(*info), GFP_KERNEL);
-       if (!info)
-               return;
-
-       xa_extract(&drv_info->partition_info, (void **)info, 0, VM_ID_MASK,
-                  count, XA_PRESENT);
+       struct ffa_dev_part_info *info;
+       unsigned long idx;
 
-       for (idx = 0; idx < count; idx++)
-               kfree(info[idx]);
-       kfree(info);
+       xa_for_each(&drv_info->partition_info, idx, info) {
+               xa_erase(&drv_info->partition_info, idx);
+               kfree(info);
+       }
 
-       drv_info->partition_count = 0;
        xa_destroy(&drv_info->partition_info);
 }
 
@@ -1508,7 +1531,11 @@ static int __init ffa_init(void)
 
        ffa_notifications_setup();
 
-       ffa_setup_partitions();
+       ret = ffa_setup_partitions();
+       if (ret) {
+               pr_err("failed to setup partitions\n");
+               goto cleanup_notifs;
+       }
 
        ret = ffa_sched_recv_cb_update(drv_info->vm_id, ffa_self_notif_handle,
                                       drv_info, true);
@@ -1516,6 +1543,9 @@ static int __init ffa_init(void)
                pr_info("Failed to register driver sched callback %d\n", ret);
 
        return 0;
+
+cleanup_notifs:
+       ffa_notifications_cleanup();
 free_pages:
        if (drv_info->tx_buffer)
                free_pages_exact(drv_info->tx_buffer, RXTX_BUFFER_SIZE);
@@ -1535,7 +1565,6 @@ static void __exit ffa_exit(void)
        ffa_rxtx_unmap(drv_info->vm_id);
        free_pages_exact(drv_info->tx_buffer, RXTX_BUFFER_SIZE);
        free_pages_exact(drv_info->rx_buffer, RXTX_BUFFER_SIZE);
-       xa_destroy(&drv_info->partition_info);
        kfree(drv_info);
        arm_ffa_bus_exit();
 }
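
The xa_store() to xa_insert() conversions above are what make this error handling possible: xa_store() silently replaces any existing entry (leaking the old partition info here), while xa_insert() refuses to overwrite and returns -EBUSY when the index is already populated. A minimal illustration of the difference:

    #include <linux/xarray.h>

    static DEFINE_XARRAY(example_xa);

    static int example_register(unsigned long id, void *info)
    {
            /* Returns 0 on success, -EBUSY if @id is already occupied,
             * or -ENOMEM; xa_store() would have overwritten the old
             * entry without any indication.
             */
            return xa_insert(&example_xa, id, info, GFP_KERNEL);
    }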
index c0644558042a06dc1f0c087af4f22264abeee52a..e2050adbf85c6a125fc5ba241fb0c6b133466bfe 100644 (file)
@@ -13,7 +13,7 @@
 #include "notify.h"
 
 /* Updated only after ALL the mandatory features for that version are merged */
-#define SCMI_PROTOCOL_SUPPORTED_VERSION                0x20001
+#define SCMI_PROTOCOL_SUPPORTED_VERSION                0x20000
 
 enum scmi_clock_protocol_cmd {
        CLOCK_ATTRIBUTES = 0x3,
@@ -954,8 +954,7 @@ static int scmi_clock_protocol_init(const struct scmi_protocol_handle *ph)
                        scmi_clock_describe_rates_get(ph, clkid, clk);
        }
 
-       if (PROTOCOL_REV_MAJOR(version) >= 0x2 &&
-           PROTOCOL_REV_MINOR(version) >= 0x1) {
+       if (PROTOCOL_REV_MAJOR(version) >= 0x3) {
                cinfo->clock_config_set = scmi_clock_config_set_v2;
                cinfo->clock_config_get = scmi_clock_config_get_v2;
        } else {
index c46dc5215af7a7c8a78e0fe26c12fac51c8080b7..00b165d1f502df7816527298996f196585d10f5a 100644 (file)
@@ -314,6 +314,7 @@ void shmem_fetch_notification(struct scmi_shared_mem __iomem *shmem,
 void shmem_clear_channel(struct scmi_shared_mem __iomem *shmem);
 bool shmem_poll_done(struct scmi_shared_mem __iomem *shmem,
                     struct scmi_xfer *xfer);
+bool shmem_channel_free(struct scmi_shared_mem __iomem *shmem);
 
 /* declarations for message passing transports */
 struct scmi_msg_payld;
index 19246ed1f01ff7cc3ea7346402c32e02b57b336a..b8d470417e8f99bb6408aba541bc4b89541ddf7c 100644 (file)
@@ -45,6 +45,20 @@ static void rx_callback(struct mbox_client *cl, void *m)
 {
        struct scmi_mailbox *smbox = client_to_scmi_mailbox(cl);
 
+       /*
+        * An A2P IRQ is NOT valid when received while the platform still owns
+        * the channel, because the platform first releases the SMT channel and
+        * then sends the completion interrupt.
+        *
+        * This addresses a possible race condition in which a spurious IRQ from
+        * a previous timed-out reply which arrived late could be wrongly
+        * associated with the next pending transaction.
+        */
+       if (cl->knows_txdone && !shmem_channel_free(smbox->shmem)) {
+               dev_warn(smbox->cinfo->dev, "Ignoring spurious A2P IRQ !\n");
+               return;
+       }
+
        scmi_rx_callback(smbox->cinfo, shmem_read_header(smbox->shmem), NULL);
 }
 
index 8ea2a7b3d35d2029f9731ef3031d575609093d6e..211e8e0aef2c2b4fade048990249c2444afb946a 100644 (file)
@@ -350,8 +350,8 @@ process_response_opp(struct scmi_opp *opp, unsigned int loop_idx,
 }
 
 static inline void
-process_response_opp_v4(struct perf_dom_info *dom, struct scmi_opp *opp,
-                       unsigned int loop_idx,
+process_response_opp_v4(struct device *dev, struct perf_dom_info *dom,
+                       struct scmi_opp *opp, unsigned int loop_idx,
                        const struct scmi_msg_resp_perf_describe_levels_v4 *r)
 {
        opp->perf = le32_to_cpu(r->opp[loop_idx].perf_val);
@@ -362,10 +362,23 @@ process_response_opp_v4(struct perf_dom_info *dom, struct scmi_opp *opp,
        /* Note that PERF v4 reports always five 32-bit words */
        opp->indicative_freq = le32_to_cpu(r->opp[loop_idx].indicative_freq);
        if (dom->level_indexing_mode) {
+               int ret;
+
                opp->level_index = le32_to_cpu(r->opp[loop_idx].level_index);
 
-               xa_store(&dom->opps_by_idx, opp->level_index, opp, GFP_KERNEL);
-               xa_store(&dom->opps_by_lvl, opp->perf, opp, GFP_KERNEL);
+               ret = xa_insert(&dom->opps_by_idx, opp->level_index, opp,
+                               GFP_KERNEL);
+               if (ret)
+                       dev_warn(dev,
+                                "Failed to add opps_by_idx at %d - ret:%d\n",
+                                opp->level_index, ret);
+
+               ret = xa_insert(&dom->opps_by_lvl, opp->perf, opp, GFP_KERNEL);
+               if (ret)
+                       dev_warn(dev,
+                                "Failed to add opps_by_lvl at %d - ret:%d\n",
+                                opp->perf, ret);
+
                hash_add(dom->opps_by_freq, &opp->hash, opp->indicative_freq);
        }
 }
@@ -382,7 +395,7 @@ iter_perf_levels_process_response(const struct scmi_protocol_handle *ph,
        if (PROTOCOL_REV_MAJOR(p->version) <= 0x3)
                process_response_opp(opp, st->loop_idx, response);
        else
-               process_response_opp_v4(p->perf_dom, opp, st->loop_idx,
+               process_response_opp_v4(ph->dev, p->perf_dom, opp, st->loop_idx,
                                        response);
        p->perf_dom->opp_count++;
 
index 0493aa3c12bf5363e02c1ecc9b2520d1bd8b3d67..350573518503355f6abaa4d24cbcac6368e8930c 100644 (file)
@@ -1111,7 +1111,6 @@ static int scmi_raw_mode_setup(struct scmi_raw_mode_info *raw,
                int i;
 
                for (i = 0; i < num_chans; i++) {
-                       void *xret;
                        struct scmi_raw_queue *q;
 
                        q = scmi_raw_queue_init(raw);
@@ -1120,13 +1119,12 @@ static int scmi_raw_mode_setup(struct scmi_raw_mode_info *raw,
                                goto err_xa;
                        }
 
-                       xret = xa_store(&raw->chans_q, channels[i], q,
+                       ret = xa_insert(&raw->chans_q, channels[i], q,
                                        GFP_KERNEL);
-                       if (xa_err(xret)) {
+                       if (ret) {
                                dev_err(dev,
                                        "Fail to allocate Raw queue 0x%02X\n",
                                        channels[i]);
-                               ret = xa_err(xret);
                                goto err_xa;
                        }
                }
@@ -1322,6 +1320,12 @@ void scmi_raw_message_report(void *r, struct scmi_xfer *xfer,
        dev = raw->handle->dev;
        q = scmi_raw_queue_select(raw, idx,
                                  SCMI_XFER_IS_CHAN_SET(xfer) ? chan_id : 0);
+       if (!q) {
+               dev_warn(dev,
+                        "RAW[%d] - NO queue for chan 0x%X. Dropping report.\n",
+                        idx, chan_id);
+               return;
+       }
 
        /*
         * Grab the msg_q_lock upfront to avoid a possible race between
index 87b4f4d35f06230bc161fc4205c7b199e03c0015..8bf495bcad09b7ba8246c05b4e76086fa1bdaf90 100644 (file)
@@ -10,7 +10,7 @@
 #include <linux/processor.h>
 #include <linux/types.h>
 
-#include <asm-generic/bug.h>
+#include <linux/bug.h>
 
 #include "common.h"
 
@@ -122,3 +122,9 @@ bool shmem_poll_done(struct scmi_shared_mem __iomem *shmem,
                (SCMI_SHMEM_CHAN_STAT_CHANNEL_ERROR |
                 SCMI_SHMEM_CHAN_STAT_CHANNEL_FREE);
 }
+
+bool shmem_channel_free(struct scmi_shared_mem __iomem *shmem)
+{
+       return (ioread32(&shmem->channel_status) &
+                       SCMI_SHMEM_CHAN_STAT_CHANNEL_FREE);
+}
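
Together with the rx_callback() check added earlier in this series, the helper encodes the SMT ordering contract: the platform must mark the channel free before raising the completion interrupt, so an A2P IRQ that arrives while the free bit is still clear can only be stale or spurious. A rough sketch of the platform-side ordering this relies on (write_response_payload() and raise_a2p_interrupt() are hypothetical firmware helpers):

    static void platform_complete_message(struct scmi_shared_mem __iomem *shmem)
    {
            write_response_payload(shmem);          /* hypothetical */
            /* The free bit must be visible before the interrupt fires. */
            iowrite32(ioread32(&shmem->channel_status) |
                      SCMI_SHMEM_CHAN_STAT_CHANNEL_FREE,
                      &shmem->channel_status);
            raise_a2p_interrupt();                  /* hypothetical */
    }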
index 83f5bb57fa4c466334a90c2195c06ce7443d1b6a..83092d93f36a63087ffbd8b6460d38a824e9cbb1 100644 (file)
@@ -107,7 +107,7 @@ static int __init arm_enable_runtime_services(void)
                efi_memory_desc_t *md;
 
                for_each_efi_memory_desc(md) {
-                       int md_size = md->num_pages << EFI_PAGE_SHIFT;
+                       u64 md_size = md->num_pages << EFI_PAGE_SHIFT;
                        struct resource *res;
 
                        if (!(md->attribute & EFI_MEMORY_SP))
index 3e8d4b51a8140c16720eef8f08d311b024b1a830..97bafb5f7038924fb99eea6f5679b18b2d459e5a 100644 (file)
@@ -292,7 +292,7 @@ static int efi_capsule_open(struct inode *inode, struct file *file)
                return -ENOMEM;
        }
 
-       cap_info->phys = kzalloc(sizeof(void *), GFP_KERNEL);
+       cap_info->phys = kzalloc(sizeof(phys_addr_t), GFP_KERNEL);
        if (!cap_info->phys) {
                kfree(cap_info->pages);
                kfree(cap_info);
index 35c37f667781c7071c714aef274e68dbddca026b..9b3884ff81e699f2308a3cf618e774ad9a67e6a3 100644 (file)
@@ -523,6 +523,17 @@ static void cper_print_tstamp(const char *pfx,
        }
 }
 
+struct ignore_section {
+       guid_t guid;
+       const char *name;
+};
+
+static const struct ignore_section ignore_sections[] = {
+       { .guid = CPER_SEC_CXL_GEN_MEDIA_GUID, .name = "CXL General Media Event" },
+       { .guid = CPER_SEC_CXL_DRAM_GUID, .name = "CXL DRAM Event" },
+       { .guid = CPER_SEC_CXL_MEM_MODULE_GUID, .name = "CXL Memory Module Event" },
+};
+
 static void
 cper_estatus_print_section(const char *pfx, struct acpi_hest_generic_data *gdata,
                           int sec_no)
@@ -543,6 +554,14 @@ cper_estatus_print_section(const char *pfx, struct acpi_hest_generic_data *gdata
                printk("%s""fru_text: %.20s\n", pfx, gdata->fru_text);
 
        snprintf(newpfx, sizeof(newpfx), "%s ", pfx);
+
+       for (int i = 0; i < ARRAY_SIZE(ignore_sections); i++) {
+               if (guid_equal(sec_type, &ignore_sections[i].guid)) {
+                       printk("%ssection_type: %s\n", newpfx, ignore_sections[i].name);
+                       return;
+               }
+       }
+
        if (guid_equal(sec_type, &CPER_SEC_PROC_GENERIC)) {
                struct cper_sec_proc_generic *proc_err = acpi_hest_get_payload(gdata);
 
index d4987d013080174bda0e462f029c03192897bebb..a00e07b853f221721e1bcd2f801cadcc5bcb67cf 100644 (file)
@@ -143,15 +143,6 @@ static __init int is_usable_memory(efi_memory_desc_t *md)
        case EFI_BOOT_SERVICES_DATA:
        case EFI_CONVENTIONAL_MEMORY:
        case EFI_PERSISTENT_MEMORY:
-               /*
-                * Special purpose memory is 'soft reserved', which means it
-                * is set aside initially, but can be hotplugged back in or
-                * be assigned to the dax driver after boot.
-                */
-               if (efi_soft_reserve_enabled() &&
-                   (md->attribute & EFI_MEMORY_SP))
-                       return false;
-
                /*
                 * According to the spec, these regions are no longer reserved
                 * after calling ExitBootServices(). However, we can only use
@@ -196,6 +187,16 @@ static __init void reserve_regions(void)
                size = npages << PAGE_SHIFT;
 
                if (is_memory(md)) {
+                       /*
+                        * Special purpose memory is 'soft reserved', which
+                        * means it is set aside initially. Don't add a memblock
+                        * for it now so that it can be hotplugged back in or
+                        * be assigned to the dax driver after boot.
+                        */
+                       if (efi_soft_reserve_enabled() &&
+                           (md->attribute & EFI_MEMORY_SP))
+                               continue;
+
                        early_init_dt_add_memory_arch(paddr, size);
 
                        if (!is_usable_memory(md))
index 06964a3c130f6addeed20eca1ed26153a2260854..73f4810f6db38ecc933f9a6ac2bed5ae57709148 100644 (file)
@@ -28,7 +28,7 @@ cflags-$(CONFIG_ARM)          += -DEFI_HAVE_STRLEN -DEFI_HAVE_STRNLEN \
                                   -DEFI_HAVE_MEMCHR -DEFI_HAVE_STRRCHR \
                                   -DEFI_HAVE_STRCMP -fno-builtin -fpic \
                                   $(call cc-option,-mno-single-pic-base)
-cflags-$(CONFIG_RISCV)         += -fpic -DNO_ALTERNATIVE
+cflags-$(CONFIG_RISCV)         += -fpic -DNO_ALTERNATIVE -mno-relax
 cflags-$(CONFIG_LOONGARCH)     += -fpie
 
 cflags-$(CONFIG_EFI_PARAMS_FROM_FDT)   += -I$(srctree)/scripts/dtc/libfdt
@@ -143,7 +143,7 @@ STUBCOPY_RELOC-$(CONFIG_ARM64)      := R_AARCH64_ABS
 # exist.
 STUBCOPY_FLAGS-$(CONFIG_RISCV) += --prefix-alloc-sections=.init \
                                   --prefix-symbols=__efistub_
-STUBCOPY_RELOC-$(CONFIG_RISCV) := R_RISCV_HI20
+STUBCOPY_RELOC-$(CONFIG_RISCV) := -E R_RISCV_HI20\|R_RISCV_$(BITS)\|R_RISCV_RELAX
 
 # For LoongArch, keep all the symbols in .init section and make sure that no
 # absolute symbol references exist.
index 6b83c492c3b8260d52e16bb73a1d5abaa4cb943d..31928bd87e0fff5a0666234ef8328cf5d4f564df 100644 (file)
@@ -14,6 +14,7 @@
  * @max:       the address that the last allocated memory page shall not
  *             exceed
  * @align:     minimum alignment of the base of the allocation
+ * @memory_type: the type of memory to allocate
  *
  * Allocate pages as EFI_LOADER_DATA. The allocated pages are aligned according
  * to @align, which should be >= EFI_ALLOC_ALIGN. The last allocated page will
index 212687c30d79c4b0b307af0b8d3c7b52502e6a95..c04b82ea40f2169b6764ff69a14ff3acc5a8795d 100644 (file)
@@ -956,7 +956,8 @@ efi_status_t efi_get_random_bytes(unsigned long size, u8 *out);
 
 efi_status_t efi_random_alloc(unsigned long size, unsigned long align,
                              unsigned long *addr, unsigned long random_seed,
-                             int memory_type, unsigned long alloc_limit);
+                             int memory_type, unsigned long alloc_min,
+                             unsigned long alloc_max);
 
 efi_status_t efi_random_get_seed(void);
 
index 62d63f7a2645bf82525d79b5d8825e9bea023404..1a9808012abd36ee7f58ad0baf818cbae6df1b0b 100644 (file)
@@ -119,7 +119,7 @@ efi_status_t efi_kaslr_relocate_kernel(unsigned long *image_addr,
                 */
                status = efi_random_alloc(*reserve_size, min_kimg_align,
                                          reserve_addr, phys_seed,
-                                         EFI_LOADER_CODE, EFI_ALLOC_LIMIT);
+                                         EFI_LOADER_CODE, 0, EFI_ALLOC_LIMIT);
                if (status != EFI_SUCCESS)
                        efi_warn("efi_random_alloc() failed: 0x%lx\n", status);
        } else {
index 674a064b8f7adc68edf2412bb8e012250077c717..4e96a855fdf47b5b064b63b729d7dc989cd2b949 100644 (file)
@@ -17,7 +17,7 @@
 static unsigned long get_entry_num_slots(efi_memory_desc_t *md,
                                         unsigned long size,
                                         unsigned long align_shift,
-                                        u64 alloc_limit)
+                                        u64 alloc_min, u64 alloc_max)
 {
        unsigned long align = 1UL << align_shift;
        u64 first_slot, last_slot, region_end;
@@ -30,11 +30,11 @@ static unsigned long get_entry_num_slots(efi_memory_desc_t *md,
                return 0;
 
        region_end = min(md->phys_addr + md->num_pages * EFI_PAGE_SIZE - 1,
-                        alloc_limit);
+                        alloc_max);
        if (region_end < size)
                return 0;
 
-       first_slot = round_up(md->phys_addr, align);
+       first_slot = round_up(max(md->phys_addr, alloc_min), align);
        last_slot = round_down(region_end - size + 1, align);
 
        if (first_slot > last_slot)
@@ -56,7 +56,8 @@ efi_status_t efi_random_alloc(unsigned long size,
                              unsigned long *addr,
                              unsigned long random_seed,
                              int memory_type,
-                             unsigned long alloc_limit)
+                             unsigned long alloc_min,
+                             unsigned long alloc_max)
 {
        unsigned long total_slots = 0, target_slot;
        unsigned long total_mirrored_slots = 0;
@@ -78,7 +79,8 @@ efi_status_t efi_random_alloc(unsigned long size,
                efi_memory_desc_t *md = (void *)map->map + map_offset;
                unsigned long slots;
 
-               slots = get_entry_num_slots(md, size, ilog2(align), alloc_limit);
+               slots = get_entry_num_slots(md, size, ilog2(align), alloc_min,
+                                           alloc_max);
                MD_NUM_SLOTS(md) = slots;
                total_slots += slots;
                if (md->attribute & EFI_MEMORY_MORE_RELIABLE)
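
A worked example of the slot computation under the new alloc_min clamp, with invented values (the final count, ((last_slot - first_slot) >> align_shift) + 1, is computed by the existing code just past the end of this hunk):

    /* md->phys_addr = 0x1000000, region of 16 MiB, size = 4 MiB,
     * align = 2 MiB (align_shift = 21), alloc_min = 0x1800000,
     * alloc_max beyond the region end.
     */
    region_end = 0x1000000 + 16 MiB - 1                  = 0x1ffffff
    first_slot = round_up(max(0x1000000, 0x1800000), 2M) = 0x1800000
    last_slot  = round_down(0x1ffffff - 4M + 1, 2M)      = 0x1c00000
    slots      = ((0x1c00000 - 0x1800000) >> 21) + 1     = 3

Without the clamp, first_slot would have been 0x1000000 and the region would have contributed 7 slots, some of them below the requested minimum address.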
index 0d510c9a06a45925922595f1e44c7ee3b2a170a6..99429bc4b0c7eb0c639b84934fe614f8f8cb5721 100644 (file)
@@ -223,8 +223,8 @@ static void retrieve_apple_device_properties(struct boot_params *boot_params)
        }
 }
 
-void efi_adjust_memory_range_protection(unsigned long start,
-                                       unsigned long size)
+efi_status_t efi_adjust_memory_range_protection(unsigned long start,
+                                               unsigned long size)
 {
        efi_status_t status;
        efi_gcd_memory_space_desc_t desc;
@@ -236,13 +236,17 @@ void efi_adjust_memory_range_protection(unsigned long start,
        rounded_end = roundup(start + size, EFI_PAGE_SIZE);
 
        if (memattr != NULL) {
-               efi_call_proto(memattr, clear_memory_attributes, rounded_start,
-                              rounded_end - rounded_start, EFI_MEMORY_XP);
-               return;
+               status = efi_call_proto(memattr, clear_memory_attributes,
+                                       rounded_start,
+                                       rounded_end - rounded_start,
+                                       EFI_MEMORY_XP);
+               if (status != EFI_SUCCESS)
+                       efi_warn("Failed to clear EFI_MEMORY_XP attribute\n");
+               return status;
        }
 
        if (efi_dxe_table == NULL)
-               return;
+               return EFI_SUCCESS;
 
        /*
         * Don't modify memory region attributes, they are
@@ -255,7 +259,7 @@ void efi_adjust_memory_range_protection(unsigned long start,
                status = efi_dxe_call(get_memory_space_descriptor, start, &desc);
 
                if (status != EFI_SUCCESS)
-                       return;
+                       break;
 
                next = desc.base_address + desc.length;
 
@@ -280,8 +284,10 @@ void efi_adjust_memory_range_protection(unsigned long start,
                                 unprotect_start,
                                 unprotect_start + unprotect_size,
                                 status);
+                       break;
                }
        }
+       return EFI_SUCCESS;
 }
 
 static void setup_unaccepted_memory(void)
@@ -793,6 +799,7 @@ static efi_status_t efi_decompress_kernel(unsigned long *kernel_entry)
 
        status = efi_random_alloc(alloc_size, CONFIG_PHYSICAL_ALIGN, &addr,
                                  seed[0], EFI_LOADER_CODE,
+                                 LOAD_PHYSICAL_ADDR,
                                  EFI_X86_KERNEL_ALLOC_LIMIT);
        if (status != EFI_SUCCESS)
                return status;
@@ -805,9 +812,7 @@ static efi_status_t efi_decompress_kernel(unsigned long *kernel_entry)
 
        *kernel_entry = addr + entry;
 
-       efi_adjust_memory_range_protection(addr, kernel_total_size);
-
-       return EFI_SUCCESS;
+       return efi_adjust_memory_range_protection(addr, kernel_total_size);
 }
 
 static void __noreturn enter_kernel(unsigned long kernel_addr,
index 37c5a36b9d8cf9b2cad93f228502fd336d142908..1c20e99a6494423787ef1dd091739ed9cbc89a24 100644 (file)
@@ -5,8 +5,8 @@
 extern void trampoline_32bit_src(void *, bool);
 extern const u16 trampoline_ljmp_imm_offset;
 
-void efi_adjust_memory_range_protection(unsigned long start,
-                                       unsigned long size);
+efi_status_t efi_adjust_memory_range_protection(unsigned long start,
+                                               unsigned long size);
 
 #ifdef CONFIG_X86_64
 efi_status_t efi_setup_5level_paging(void);
index bdb17eac0cb401befbcc8b13820f9a3b416b6f19..1ceace956758682f592f6fe3f280b7260f7ca562 100644 (file)
@@ -119,7 +119,7 @@ efi_zboot_entry(efi_handle_t handle, efi_system_table_t *systab)
                }
 
                status = efi_random_alloc(alloc_size, min_kimg_align, &image_base,
-                                         seed, EFI_LOADER_CODE, EFI_ALLOC_LIMIT);
+                                         seed, EFI_LOADER_CODE, 0, EFI_ALLOC_LIMIT);
                if (status != EFI_SUCCESS) {
                        efi_err("Failed to allocate memory\n");
                        goto free_cmdline;
index 09525fb5c240e6686ff5588c55998d5815e20ff7..01f0f90ea4183119b0a4eedf82a3fe81f1b2f480 100644 (file)
@@ -85,7 +85,7 @@ static int __init riscv_enable_runtime_services(void)
                efi_memory_desc_t *md;
 
                for_each_efi_memory_desc(md) {
-                       int md_size = md->num_pages << EFI_PAGE_SHIFT;
+                       u64 md_size = md->num_pages << EFI_PAGE_SHIFT;
                        struct resource *res;
 
                        if (!(md->attribute & EFI_MEMORY_SP))
index 81f5f62e34fce04fb6db2db11294f8281c58f5b7..fbeeaee4ac85603783412b2afddd9c5ec6fafd49 100644 (file)
@@ -167,7 +167,7 @@ static int mpfs_auto_update_verify_image(struct fw_upload *fw_uploader)
        u32 *response_msg;
        int ret;
 
-       response_msg = devm_kzalloc(priv->dev, AUTO_UPDATE_FEATURE_RESP_SIZE * sizeof(response_msg),
+       response_msg = devm_kzalloc(priv->dev, AUTO_UPDATE_FEATURE_RESP_SIZE * sizeof(*response_msg),
                                    GFP_KERNEL);
        if (!response_msg)
                return -ENOMEM;
@@ -384,7 +384,8 @@ static int mpfs_auto_update_available(struct mpfs_auto_update_priv *priv)
        u32 *response_msg;
        int ret;
 
-       response_msg = devm_kzalloc(priv->dev, AUTO_UPDATE_FEATURE_RESP_SIZE * sizeof(response_msg),
+       response_msg = devm_kzalloc(priv->dev,
+                                   AUTO_UPDATE_FEATURE_RESP_SIZE * sizeof(*response_msg),
                                    GFP_KERNEL);
        if (!response_msg)
                return -ENOMEM;
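
The two hunks above fix the classic sizeof(ptr)-versus-sizeof(*ptr) slip: with a u32 *response_msg, sizeof(response_msg) is the width of the pointer (8 bytes on 64-bit), not of the element. In this driver the mistake merely over-allocated, since a pointer is wider than a u32, but the same slip under-allocates whenever the element type is wider than a pointer. A minimal illustration (N stands in for the driver's AUTO_UPDATE_FEATURE_RESP_SIZE):

    u32 *response_msg;

    /* Wrong: sized by the pointer type, N * sizeof(u32 *) = N * 8 on 64-bit. */
    response_msg = devm_kzalloc(dev, N * sizeof(response_msg), GFP_KERNEL);

    /* Right: sized by the pointed-to type, N * sizeof(u32) = N * 4. */
    response_msg = devm_kzalloc(dev, N * sizeof(*response_msg), GFP_KERNEL);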
index 82fcfd29bc4d29116b051c946edb9b6535fd78ac..3c197db42c9d936866f9ff68cf7561e4735cfe1e 100644 (file)
@@ -128,4 +128,4 @@ unlock_mutex:
 }
 
 /* must execute after PCI subsystem for EFI quirks */
-subsys_initcall_sync(sysfb_init);
+device_initcall(sysfb_init);
index e00c333105170f5a2a702593feab340ddc4a7d8e..753e7be039e4d9cd830190d75d8b62ca1219ec96 100644 (file)
@@ -127,8 +127,6 @@ static int gen_74x164_probe(struct spi_device *spi)
        if (IS_ERR(chip->gpiod_oe))
                return PTR_ERR(chip->gpiod_oe);
 
-       gpiod_set_value_cansleep(chip->gpiod_oe, 1);
-
        spi_set_drvdata(spi, chip);
 
        chip->gpio_chip.label = spi->modalias;
@@ -153,6 +151,8 @@ static int gen_74x164_probe(struct spi_device *spi)
                goto exit_destroy;
        }
 
+       gpiod_set_value_cansleep(chip->gpiod_oe, 1);
+
        ret = gpiochip_add_data(&chip->gpio_chip, chip);
        if (!ret)
                return 0;
index be7f2fa5aa7b600a2605084d832e23f24d501c84..806b88d8dfb7bda7d23cae021eb08c3bbf383ab1 100644 (file)
@@ -330,20 +330,27 @@ static int sprd_eic_irq_set_type(struct irq_data *data, unsigned int flow_type)
                switch (flow_type) {
                case IRQ_TYPE_LEVEL_HIGH:
                        sprd_eic_update(chip, offset, SPRD_EIC_DBNC_IEV, 1);
+                       sprd_eic_update(chip, offset, SPRD_EIC_DBNC_IC, 1);
                        break;
                case IRQ_TYPE_LEVEL_LOW:
                        sprd_eic_update(chip, offset, SPRD_EIC_DBNC_IEV, 0);
+                       sprd_eic_update(chip, offset, SPRD_EIC_DBNC_IC, 1);
                        break;
                case IRQ_TYPE_EDGE_RISING:
                case IRQ_TYPE_EDGE_FALLING:
                case IRQ_TYPE_EDGE_BOTH:
                        state = sprd_eic_get(chip, offset);
-                       if (state)
+                       if (state) {
                                sprd_eic_update(chip, offset,
                                                SPRD_EIC_DBNC_IEV, 0);
-                       else
+                               sprd_eic_update(chip, offset,
+                                               SPRD_EIC_DBNC_IC, 1);
+                       } else {
                                sprd_eic_update(chip, offset,
                                                SPRD_EIC_DBNC_IEV, 1);
+                               sprd_eic_update(chip, offset,
+                                               SPRD_EIC_DBNC_IC, 1);
+                       }
                        break;
                default:
                        return -ENOTSUPP;
@@ -355,20 +362,27 @@ static int sprd_eic_irq_set_type(struct irq_data *data, unsigned int flow_type)
                switch (flow_type) {
                case IRQ_TYPE_LEVEL_HIGH:
                        sprd_eic_update(chip, offset, SPRD_EIC_LATCH_INTPOL, 0);
+                       sprd_eic_update(chip, offset, SPRD_EIC_LATCH_INTCLR, 1);
                        break;
                case IRQ_TYPE_LEVEL_LOW:
                        sprd_eic_update(chip, offset, SPRD_EIC_LATCH_INTPOL, 1);
+                       sprd_eic_update(chip, offset, SPRD_EIC_LATCH_INTCLR, 1);
                        break;
                case IRQ_TYPE_EDGE_RISING:
                case IRQ_TYPE_EDGE_FALLING:
                case IRQ_TYPE_EDGE_BOTH:
                        state = sprd_eic_get(chip, offset);
-                       if (state)
+                       if (state) {
                                sprd_eic_update(chip, offset,
                                                SPRD_EIC_LATCH_INTPOL, 0);
-                       else
+                               sprd_eic_update(chip, offset,
+                                               SPRD_EIC_LATCH_INTCLR, 1);
+                       } else {
                                sprd_eic_update(chip, offset,
                                                SPRD_EIC_LATCH_INTPOL, 1);
+                               sprd_eic_update(chip, offset,
+                                               SPRD_EIC_LATCH_INTCLR, 1);
+                       }
                        break;
                default:
                        return -ENOTSUPP;
@@ -382,29 +396,34 @@ static int sprd_eic_irq_set_type(struct irq_data *data, unsigned int flow_type)
                        sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTBOTH, 0);
                        sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTMODE, 0);
                        sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTPOL, 1);
+                       sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTCLR, 1);
                        irq_set_handler_locked(data, handle_edge_irq);
                        break;
                case IRQ_TYPE_EDGE_FALLING:
                        sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTBOTH, 0);
                        sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTMODE, 0);
                        sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTPOL, 0);
+                       sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTCLR, 1);
                        irq_set_handler_locked(data, handle_edge_irq);
                        break;
                case IRQ_TYPE_EDGE_BOTH:
                        sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTMODE, 0);
                        sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTBOTH, 1);
+                       sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTCLR, 1);
                        irq_set_handler_locked(data, handle_edge_irq);
                        break;
                case IRQ_TYPE_LEVEL_HIGH:
                        sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTBOTH, 0);
                        sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTMODE, 1);
                        sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTPOL, 1);
+                       sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTCLR, 1);
                        irq_set_handler_locked(data, handle_level_irq);
                        break;
                case IRQ_TYPE_LEVEL_LOW:
                        sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTBOTH, 0);
                        sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTMODE, 1);
                        sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTPOL, 0);
+                       sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTCLR, 1);
                        irq_set_handler_locked(data, handle_level_irq);
                        break;
                default:
@@ -417,29 +436,34 @@ static int sprd_eic_irq_set_type(struct irq_data *data, unsigned int flow_type)
                        sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTBOTH, 0);
                        sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTMODE, 0);
                        sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTPOL, 1);
+                       sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTCLR, 1);
                        irq_set_handler_locked(data, handle_edge_irq);
                        break;
                case IRQ_TYPE_EDGE_FALLING:
                        sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTBOTH, 0);
                        sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTMODE, 0);
                        sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTPOL, 0);
+                       sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTCLR, 1);
                        irq_set_handler_locked(data, handle_edge_irq);
                        break;
                case IRQ_TYPE_EDGE_BOTH:
                        sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTMODE, 0);
                        sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTBOTH, 1);
+                       sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTCLR, 1);
                        irq_set_handler_locked(data, handle_edge_irq);
                        break;
                case IRQ_TYPE_LEVEL_HIGH:
                        sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTBOTH, 0);
                        sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTMODE, 1);
                        sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTPOL, 1);
+                       sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTCLR, 1);
                        irq_set_handler_locked(data, handle_level_irq);
                        break;
                case IRQ_TYPE_LEVEL_LOW:
                        sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTBOTH, 0);
                        sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTMODE, 1);
                        sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTPOL, 0);
+                       sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTCLR, 1);
                        irq_set_handler_locked(data, handle_level_irq);
                        break;
                default:
index 88066826d8e5b629697136b8bb2431b543544977..cd3e9657cc36df59123a571a0ed2ed5332272a5d 100644 (file)
@@ -1651,6 +1651,20 @@ static const struct dmi_system_id gpiolib_acpi_quirks[] __initconst = {
                        .ignore_interrupt = "INT33FC:00@3",
                },
        },
+       {
+               /*
+                * Spurious wakeups from TP_ATTN# pin
+                * Found in BIOS 0.35
+                * https://gitlab.freedesktop.org/drm/amd/-/issues/3073
+                */
+               .matches = {
+                       DMI_MATCH(DMI_SYS_VENDOR, "GPD"),
+                       DMI_MATCH(DMI_PRODUCT_NAME, "G1619-04"),
+               },
+               .driver_data = &(struct acpi_gpiolib_dmi_quirk) {
+                       .ignore_wake = "PNP0C50:00@8",
+               },
+       },
        {} /* Terminating entry */
 };
 
index 44c8f5743a2416087b523e973967e993e8a192a1..75be4a3ca7f8443f55a68aff5185044bbbdaa367 100644 (file)
@@ -968,11 +968,11 @@ int gpiochip_add_data_with_key(struct gpio_chip *gc, void *data,
 
        ret = gpiochip_irqchip_init_valid_mask(gc);
        if (ret)
-               goto err_remove_acpi_chip;
+               goto err_free_hogs;
 
        ret = gpiochip_irqchip_init_hw(gc);
        if (ret)
-               goto err_remove_acpi_chip;
+               goto err_remove_irqchip_mask;
 
        ret = gpiochip_add_irqchip(gc, lock_key, request_key);
        if (ret)
@@ -997,23 +997,23 @@ err_remove_irqchip:
        gpiochip_irqchip_remove(gc);
 err_remove_irqchip_mask:
        gpiochip_irqchip_free_valid_mask(gc);
-err_remove_acpi_chip:
+err_free_hogs:
+       gpiochip_free_hogs(gc);
        acpi_gpiochip_remove(gc);
+       gpiochip_remove_pin_ranges(gc);
 err_remove_of_chip:
-       gpiochip_free_hogs(gc);
        of_gpiochip_remove(gc);
 err_free_gpiochip_mask:
-       gpiochip_remove_pin_ranges(gc);
        gpiochip_free_valid_mask(gc);
+err_remove_from_list:
+       spin_lock_irqsave(&gpio_lock, flags);
+       list_del(&gdev->list);
+       spin_unlock_irqrestore(&gpio_lock, flags);
        if (gdev->dev.release) {
                /* release() has been registered by gpiochip_setup_dev() */
                gpio_device_put(gdev);
                goto err_print_message;
        }
-err_remove_from_list:
-       spin_lock_irqsave(&gpio_lock, flags);
-       list_del(&gdev->list);
-       spin_unlock_irqrestore(&gpio_lock, flags);
 err_free_label:
        kfree_const(gdev->label);
 err_free_descs:
@@ -2042,6 +2042,11 @@ EXPORT_SYMBOL_GPL(gpiochip_generic_free);
 int gpiochip_generic_config(struct gpio_chip *gc, unsigned int offset,
                            unsigned long config)
 {
+#ifdef CONFIG_PINCTRL
+       if (list_empty(&gc->gpiodev->pin_ranges))
+               return -ENOTSUPP;
+#endif
+
        return pinctrl_gpio_set_config(gc, offset, config);
 }
 EXPORT_SYMBOL_GPL(gpiochip_generic_config);
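
The relabelled unwind path above restores the usual kernel rule for goto-based error handling: cleanup labels appear in reverse order of acquisition, so a failure at step N jumps to the label that undoes exactly steps N-1 down to 1. A generic sketch of the idiom (all names hypothetical):

    static int example_setup(struct example *ex)
    {
            int ret;

            ret = step_a(ex);
            if (ret)
                    return ret;

            ret = step_b(ex);
            if (ret)
                    goto err_undo_a;

            ret = step_c(ex);
            if (ret)
                    goto err_undo_b;

            return 0;

    err_undo_b:
            undo_b(ex);     /* teardown mirrors setup, in reverse */
    err_undo_a:
            undo_a(ex);
            return ret;
    }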
index 2520db0b776e1bccf213fd541baf6275dbb192eb..c7edba18a6f09c4d3c75af737d94737a0e6f2890 100644 (file)
@@ -199,7 +199,7 @@ config DRM_TTM
 config DRM_TTM_KUNIT_TEST
         tristate "KUnit tests for TTM" if !KUNIT_ALL_TESTS
         default n
-        depends on DRM && KUNIT && MMU
+        depends on DRM && KUNIT && MMU && (UML || COMPILE_TEST)
         select DRM_TTM
         select DRM_EXPORT_FOR_TESTS if m
         select DRM_KUNIT_TEST_HELPERS
@@ -207,7 +207,8 @@ config DRM_TTM_KUNIT_TEST
         help
           Enables unit tests for TTM, a GPU memory manager subsystem used
           to manage memory buffers. This option is mostly useful for kernel
-          developers.
+          developers. It depends on (UML || COMPILE_TEST) since no other driver
+          which uses TTM can be loaded while running the tests.
 
           If in doubt, say "N".
 
index 3d8a48f46b015613dc44517ebd20d5250df5a3b1..79827a6dcd7f5cbbf30d61f6701ecee8ae6614fa 100644 (file)
@@ -200,6 +200,7 @@ extern uint amdgpu_dc_debug_mask;
 extern uint amdgpu_dc_visual_confirm;
 extern uint amdgpu_dm_abm_level;
 extern int amdgpu_backlight;
+extern int amdgpu_damage_clips;
 extern struct amdgpu_mgpu_info mgpu_info;
 extern int amdgpu_ras_enable;
 extern uint amdgpu_ras_mask;
@@ -1078,6 +1079,8 @@ struct amdgpu_device {
        bool                            in_s3;
        bool                            in_s4;
        bool                            in_s0ix;
+       /* indicate whether amdgpu suspend has completed */
+       bool                            suspend_complete;
 
        enum pp_mp1_state               mp1_state;
        struct amdgpu_doorbell_index doorbell_index;
@@ -1547,9 +1550,11 @@ static inline int amdgpu_acpi_smart_shift_update(struct drm_device *dev,
 #if defined(CONFIG_ACPI) && defined(CONFIG_SUSPEND)
 bool amdgpu_acpi_is_s3_active(struct amdgpu_device *adev);
 bool amdgpu_acpi_is_s0ix_active(struct amdgpu_device *adev);
+void amdgpu_choose_low_power_state(struct amdgpu_device *adev);
 #else
 static inline bool amdgpu_acpi_is_s0ix_active(struct amdgpu_device *adev) { return false; }
 static inline bool amdgpu_acpi_is_s3_active(struct amdgpu_device *adev) { return false; }
+static inline void amdgpu_choose_low_power_state(struct amdgpu_device *adev) { }
 #endif
 
 #if defined(CONFIG_DRM_AMD_DC)
index 2deebece810e78a7ce039772a839684f570bceca..7099ff9cf8c50d7b7ea96149bcef235368fae165 100644 (file)
@@ -1519,4 +1519,22 @@ bool amdgpu_acpi_is_s0ix_active(struct amdgpu_device *adev)
 #endif /* CONFIG_AMD_PMC */
 }
 
+/**
+ * amdgpu_choose_low_power_state
+ *
+ * @adev: amdgpu device pointer
+ *
+ * Choose the target low power state for the GPU
+ */
+void amdgpu_choose_low_power_state(struct amdgpu_device *adev)
+{
+       if (adev->in_runpm)
+               return;
+
+       if (amdgpu_acpi_is_s0ix_active(adev))
+               adev->in_s0ix = true;
+       else if (amdgpu_acpi_is_s3_active(adev))
+               adev->in_s3 = true;
+}
+
 #endif /* CONFIG_SUSPEND */
index 77e2636602887034c188ec695591d20e5b087b60..41db030ddc4ee9c98ba952b4b91d6292f7c457d6 100644 (file)
@@ -141,11 +141,31 @@ static void amdgpu_amdkfd_reset_work(struct work_struct *work)
 static const struct drm_client_funcs kfd_client_funcs = {
        .unregister     = drm_client_release,
 };
+
+int amdgpu_amdkfd_drm_client_create(struct amdgpu_device *adev)
+{
+       int ret;
+
+       if (!adev->kfd.init_complete)
+               return 0;
+
+       ret = drm_client_init(&adev->ddev, &adev->kfd.client, "kfd",
+                             &kfd_client_funcs);
+       if (ret) {
+               dev_err(adev->dev, "Failed to init DRM client: %d\n",
+                       ret);
+               return ret;
+       }
+
+       drm_client_register(&adev->kfd.client);
+
+       return 0;
+}
+
 void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
 {
        int i;
        int last_valid_bit;
-       int ret;
 
        amdgpu_amdkfd_gpuvm_init_mem_limits();
 
@@ -164,12 +184,6 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
                        .enable_mes = adev->enable_mes,
                };
 
-               ret = drm_client_init(&adev->ddev, &adev->kfd.client, "kfd", &kfd_client_funcs);
-               if (ret) {
-                       dev_err(adev->dev, "Failed to init DRM client: %d\n", ret);
-                       return;
-               }
-
                /* this is going to have a few of the MSBs set that we need to
                 * clear
                 */
@@ -208,10 +222,6 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
 
                adev->kfd.init_complete = kgd2kfd_device_init(adev->kfd.dev,
                                                        &gpu_resources);
-               if (adev->kfd.init_complete)
-                       drm_client_register(&adev->kfd.client);
-               else
-                       drm_client_release(&adev->kfd.client);
 
                amdgpu_amdkfd_total_mem_size += adev->gmc.real_vram_size;
 
index f262b9d89541a8a971a394b5f0da0f6a1368ba65..27c61c535e297931892902f1abb9e56ca6feea5c 100644 (file)
@@ -182,6 +182,8 @@ int amdgpu_queue_mask_bit_to_set_resource_bit(struct amdgpu_device *adev,
 struct amdgpu_amdkfd_fence *amdgpu_amdkfd_fence_create(u64 context,
                                struct mm_struct *mm,
                                struct svm_range_bo *svm_bo);
+
+int amdgpu_amdkfd_drm_client_create(struct amdgpu_device *adev);
 #if defined(CONFIG_DEBUG_FS)
 int kfd_debugfs_kfd_mem_limits(struct seq_file *m, void *data);
 #endif
@@ -301,7 +303,7 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(struct amdgpu_device *adev,
                                          struct kgd_mem *mem, void *drm_priv);
 int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
                struct amdgpu_device *adev, struct kgd_mem *mem, void *drm_priv);
-void amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void *drm_priv);
+int amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void *drm_priv);
 int amdgpu_amdkfd_gpuvm_sync_memory(
                struct amdgpu_device *adev, struct kgd_mem *mem, bool intr);
 int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_mem *mem,
index 899e31e3a5e81d2be343a668e295a564efee10af..3a3f3ce09f00dbe77f61455f24fed7bd0db0dec5 100644 (file)
@@ -290,7 +290,7 @@ static int suspend_resume_compute_scheduler(struct amdgpu_device *adev, bool sus
        for (i = 0; i < adev->gfx.num_compute_rings; i++) {
                struct amdgpu_ring *ring = &adev->gfx.compute_ring[i];
 
-               if (!(ring && drm_sched_wqueue_ready(&ring->sched)))
+               if (!amdgpu_ring_sched_ready(ring))
                        continue;
 
                /* stop scheduler and drain ring. */
index f183d7faeeece16cfc7c211f5a6a0232dce37c36..231fd927dcfbee0db07e3a5d28eed2b24ff82b9c 100644 (file)
@@ -2085,21 +2085,35 @@ out:
        return ret;
 }
 
-void amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void *drm_priv)
+int amdgpu_amdkfd_gpuvm_dmaunmap_mem(struct kgd_mem *mem, void *drm_priv)
 {
        struct kfd_mem_attachment *entry;
        struct amdgpu_vm *vm;
+       int ret;
 
        vm = drm_priv_to_vm(drm_priv);
 
        mutex_lock(&mem->lock);
 
+       ret = amdgpu_bo_reserve(mem->bo, true);
+       if (ret)
+               goto out;
+
        list_for_each_entry(entry, &mem->attachments, list) {
-               if (entry->bo_va->base.vm == vm)
-                       kfd_mem_dmaunmap_attachment(mem, entry);
+               if (entry->bo_va->base.vm != vm)
+                       continue;
+               if (entry->bo_va->base.bo->tbo.ttm &&
+                   !entry->bo_va->base.bo->tbo.ttm->sg)
+                       continue;
+
+               kfd_mem_dmaunmap_attachment(mem, entry);
        }
 
+       amdgpu_bo_unreserve(mem->bo);
+out:
        mutex_unlock(&mem->lock);
+
+       return ret;
 }
 
 int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
index e485dd3357c63fd225b3fb7e3847675749f018da..1afbb2e932c6b58a9e26cbabe61370151373a4af 100644 (file)
@@ -1678,7 +1678,7 @@ static int amdgpu_debugfs_test_ib_show(struct seq_file *m, void *unused)
        for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
                struct amdgpu_ring *ring = adev->rings[i];
 
-               if (!ring || !drm_sched_wqueue_ready(&ring->sched))
+               if (!amdgpu_ring_sched_ready(ring))
                        continue;
                drm_sched_wqueue_stop(&ring->sched);
        }
@@ -1694,7 +1694,7 @@ static int amdgpu_debugfs_test_ib_show(struct seq_file *m, void *unused)
        for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
                struct amdgpu_ring *ring = adev->rings[i];
 
-               if (!ring || !drm_sched_wqueue_ready(&ring->sched))
+               if (!amdgpu_ring_sched_ready(ring))
                        continue;
                drm_sched_wqueue_start(&ring->sched);
        }
@@ -1916,8 +1916,8 @@ static int amdgpu_debugfs_ib_preempt(void *data, u64 val)
 
        ring = adev->rings[val];
 
-       if (!ring || !ring->funcs->preempt_ib ||
-           !drm_sched_wqueue_ready(&ring->sched))
+       if (!amdgpu_ring_sched_ready(ring) ||
+           !ring->funcs->preempt_ib)
                return -EINVAL;
 
        /* the last preemption failed */
index b158d27d0a71cbbafb55f0d58657c1ec178fa6c2..94bdb5fa6ebc6ac7715a64191b5050a9670bf673 100644 (file)
@@ -4121,23 +4121,13 @@ int amdgpu_device_init(struct amdgpu_device *adev,
                                }
                        }
                } else {
-                       switch (amdgpu_ip_version(adev, MP1_HWIP, 0)) {
-                       case IP_VERSION(13, 0, 0):
-                       case IP_VERSION(13, 0, 7):
-                       case IP_VERSION(13, 0, 10):
-                               r = psp_gpu_reset(adev);
-                               break;
-                       default:
-                               tmp = amdgpu_reset_method;
-                               /* It should do a default reset when loading or reloading the driver,
-                                * regardless of the module parameter reset_method.
-                                */
-                               amdgpu_reset_method = AMD_RESET_METHOD_NONE;
-                               r = amdgpu_asic_reset(adev);
-                               amdgpu_reset_method = tmp;
-                               break;
-                       }
-
+                       tmp = amdgpu_reset_method;
+                       /* It should do a default reset when loading or reloading the driver,
+                        * regardless of the module parameter reset_method.
+                        */
+                       amdgpu_reset_method = AMD_RESET_METHOD_NONE;
+                       r = amdgpu_asic_reset(adev);
+                       amdgpu_reset_method = tmp;
                        if (r) {
                                dev_err(adev->dev, "asic reset on init failed\n");
                                goto failed;
@@ -4524,13 +4514,15 @@ int amdgpu_device_prepare(struct drm_device *dev)
        struct amdgpu_device *adev = drm_to_adev(dev);
        int i, r;
 
+       amdgpu_choose_low_power_state(adev);
+
        if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
                return 0;
 
        /* Evict the majority of BOs before starting suspend sequence */
        r = amdgpu_device_evict_resources(adev);
        if (r)
-               return r;
+               goto unprepare;
 
        for (i = 0; i < adev->num_ip_blocks; i++) {
                if (!adev->ip_blocks[i].status.valid)
@@ -4539,10 +4531,15 @@ int amdgpu_device_prepare(struct drm_device *dev)
                        continue;
                r = adev->ip_blocks[i].version->funcs->prepare_suspend((void *)adev);
                if (r)
-                       return r;
+                       goto unprepare;
        }
 
        return 0;
+
+unprepare:
+       adev->in_s0ix = adev->in_s3 = false;
+
+       return r;
 }
 
 /**
@@ -4579,7 +4576,6 @@ int amdgpu_device_suspend(struct drm_device *dev, bool fbcon)
                drm_fb_helper_set_suspend_unlocked(adev_to_drm(adev)->fb_helper, true);
 
        cancel_delayed_work_sync(&adev->delayed_init_work);
-       flush_delayed_work(&adev->gfx.gfx_off_delay_work);
 
        amdgpu_ras_suspend(adev);
 
@@ -5031,7 +5027,7 @@ bool amdgpu_device_has_job_running(struct amdgpu_device *adev)
        for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
                struct amdgpu_ring *ring = adev->rings[i];
 
-               if (!ring || !drm_sched_wqueue_ready(&ring->sched))
+               if (!amdgpu_ring_sched_ready(ring))
                        continue;
 
                spin_lock(&ring->sched.job_list_lock);
@@ -5170,7 +5166,7 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
        for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
                struct amdgpu_ring *ring = adev->rings[i];
 
-               if (!ring || !drm_sched_wqueue_ready(&ring->sched))
+               if (!amdgpu_ring_sched_ready(ring))
                        continue;
 
                /* Clear job fence from fence drv to avoid force_completion
@@ -5637,7 +5633,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
                for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
                        struct amdgpu_ring *ring = tmp_adev->rings[i];
 
-                       if (!ring || !drm_sched_wqueue_ready(&ring->sched))
+                       if (!amdgpu_ring_sched_ready(ring))
                                continue;
 
                        drm_sched_stop(&ring->sched, job ? &job->base : NULL);
@@ -5706,7 +5702,7 @@ skip_hw_reset:
                for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
                        struct amdgpu_ring *ring = tmp_adev->rings[i];
 
-                       if (!ring || !drm_sched_wqueue_ready(&ring->sched))
+                       if (!amdgpu_ring_sched_ready(ring))
                                continue;
 
                        drm_sched_start(&ring->sched, true);
@@ -6061,7 +6057,7 @@ pci_ers_result_t amdgpu_pci_error_detected(struct pci_dev *pdev, pci_channel_sta
                for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
                        struct amdgpu_ring *ring = adev->rings[i];
 
-                       if (!ring || !drm_sched_wqueue_ready(&ring->sched))
+                       if (!amdgpu_ring_sched_ready(ring))
                                continue;
 
                        drm_sched_stop(&ring->sched, NULL);
@@ -6189,7 +6185,7 @@ void amdgpu_pci_resume(struct pci_dev *pdev)
        for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
                struct amdgpu_ring *ring = adev->rings[i];
 
-               if (!ring || !drm_sched_wqueue_ready(&ring->sched))
+               if (!amdgpu_ring_sched_ready(ring))
                        continue;
 
                drm_sched_start(&ring->sched, true);
index cc69005f5b46e7b9f06d65db13287a617cc384e2..586f4d03039dfb5177a27fce81fbdbead88e0235 100644 (file)
@@ -211,6 +211,7 @@ int amdgpu_seamless = -1; /* auto */
 uint amdgpu_debug_mask;
 int amdgpu_agp = -1; /* auto */
 int amdgpu_wbrf = -1;
+int amdgpu_damage_clips = -1; /* auto */
 
 static void amdgpu_drv_delayed_reset_work_handler(struct work_struct *work);
 
@@ -859,6 +860,18 @@ int amdgpu_backlight = -1;
 MODULE_PARM_DESC(backlight, "Backlight control (0 = pwm, 1 = aux, -1 auto (default))");
 module_param_named(backlight, amdgpu_backlight, bint, 0444);
 
+/**
+ * DOC: damageclips (int)
+ * Enable or disable damage clips support. If damage clips support is disabled,
+ * we will force full frame updates, irrespective of what user space sends to
+ * us.
+ *
+ * Defaults to -1 (where it is enabled unless a PSR-SU display is detected).
+ */
+MODULE_PARM_DESC(damageclips,
+                "Damage clips support (0 = disable, 1 = enable, -1 auto (default))");
+module_param_named(damageclips, amdgpu_damage_clips, int, 0444);
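
How the tri-state is consumed (this is the check added to fill_dc_dirty_rects() in the amdgpu_dm.c hunk further down): 0 forces full-frame updates unconditionally, the -1 default forces them only for PSR-SU panels, and 1 always honors the user-space damage clips.

	if (num_clips && (!amdgpu_damage_clips || (amdgpu_damage_clips < 0 &&
						   is_psr_su)))
		goto ffu;	/* fall back to a full frame update */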
+
 /**
  * DOC: tmz (int)
  * Trusted Memory Zone (TMZ) is a method to protect data being written
@@ -2255,6 +2268,10 @@ retry_init:
        if (ret)
                goto err_pci;
 
+       ret = amdgpu_amdkfd_drm_client_create(adev);
+       if (ret)
+               goto err_pci;
+
        /*
         * 1. don't init fbdev on hw without DCE
         * 2. don't init fbdev if there are no connectors
@@ -2472,6 +2489,7 @@ static int amdgpu_pmops_suspend(struct device *dev)
        struct drm_device *drm_dev = dev_get_drvdata(dev);
        struct amdgpu_device *adev = drm_to_adev(drm_dev);
 
+       adev->suspend_complete = false;
        if (amdgpu_acpi_is_s0ix_active(adev))
                adev->in_s0ix = true;
        else if (amdgpu_acpi_is_s3_active(adev))
@@ -2486,6 +2504,7 @@ static int amdgpu_pmops_suspend_noirq(struct device *dev)
        struct drm_device *drm_dev = dev_get_drvdata(dev);
        struct amdgpu_device *adev = drm_to_adev(drm_dev);
 
+       adev->suspend_complete = true;
        if (amdgpu_acpi_should_gpu_reset(adev))
                return amdgpu_asic_reset(adev);
 
index 73b8cca35bab8780d1938a45d035d19648bdd081..c623e23049d1d4bde50991fcddc8b542df0099b7 100644 (file)
@@ -121,6 +121,7 @@ int amdgpu_gart_table_ram_alloc(struct amdgpu_device *adev)
        struct amdgpu_bo_param bp;
        dma_addr_t dma_addr;
        struct page *p;
+       unsigned long x;
        int ret;
 
        if (adev->gart.bo != NULL)
@@ -130,6 +131,10 @@ int amdgpu_gart_table_ram_alloc(struct amdgpu_device *adev)
        if (!p)
                return -ENOMEM;
 
+       /* assign pages to this device */
+       for (x = 0; x < (1UL << order); x++)
+               p[x].mapping = adev->mman.bdev.dev_mapping;
+
        /* If the hardware does not support UTCL2 snooping of the CPU caches
         * then set_memory_wc() could be used as a workaround to mark the pages
         * as write combine memory.
@@ -223,6 +228,7 @@ void amdgpu_gart_table_ram_free(struct amdgpu_device *adev)
        unsigned int order = get_order(adev->gart.table_size);
        struct sg_table *sg = adev->gart.bo->tbo.sg;
        struct page *p;
+       unsigned long x;
        int ret;
 
        ret = amdgpu_bo_reserve(adev->gart.bo, false);
@@ -234,6 +240,8 @@ void amdgpu_gart_table_ram_free(struct amdgpu_device *adev)
        sg_free_table(sg);
        kfree(sg);
        p = virt_to_page(adev->gart.ptr);
+       for (x = 0; x < (1UL << order); x++)
+               p[x].mapping = NULL;
        __free_pages(p, order);
 
        adev->gart.ptr = NULL;
index b9674c57c4365fb5ebdf9644fc4ac0a31b955da8..6ddc8e3360e220644618b26059d735e6bbda10e4 100644 (file)
@@ -723,8 +723,15 @@ void amdgpu_gfx_off_ctrl(struct amdgpu_device *adev, bool enable)
 
                if (adev->gfx.gfx_off_req_count == 0 &&
                    !adev->gfx.gfx_off_state) {
-                       schedule_delayed_work(&adev->gfx.gfx_off_delay_work,
+                       /* If going to s2idle, no need to wait */
+                       if (adev->in_s0ix) {
+                               if (!amdgpu_dpm_set_powergating_by_smu(adev,
+                                               AMD_IP_BLOCK_TYPE_GFX, true))
+                                       adev->gfx.gfx_off_state = true;
+                       } else {
+                               schedule_delayed_work(&adev->gfx.gfx_off_delay_work,
                                              delay);
+                       }
                }
        } else {
                if (adev->gfx.gfx_off_req_count == 0) {
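
This pairs with the amdgpu_device.c hunk above that drops flush_delayed_work(&adev->gfx.gfx_off_delay_work) from amdgpu_device_suspend(): on the s2idle path, GFXOFF is now requested synchronously through the SMU instead of via the delayed work item, so there is no pending work left to flush at suspend time.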
index 468a67b302d4c140c9d7cf09bc92566404180e75..ca5c86e5f7cd671a651d61357ab52d3c53a1e7f3 100644 (file)
@@ -362,7 +362,7 @@ static ssize_t ta_if_invoke_debugfs_write(struct file *fp, const char *buf, size
                }
        }
 
-       if (copy_to_user((char *)buf, context->mem_context.shared_buf, shared_buf_len))
+       if (copy_to_user((char *)&buf[copy_pos], context->mem_context.shared_buf, shared_buf_len))
                ret = -EFAULT;
 
 err_free_shared_buf:
index 45424ebf9681430fefc21bdc33d6aa2c6e5f6c91..5505d646f43aa8f963d8d8732846b00fc612a3a7 100644 (file)
@@ -635,6 +635,7 @@ int amdgpu_ring_test_helper(struct amdgpu_ring *ring)
                              ring->name);
 
        ring->sched.ready = !r;
+
        return r;
 }
 
@@ -717,3 +718,14 @@ void amdgpu_ring_ib_on_emit_de(struct amdgpu_ring *ring)
        if (ring->is_sw_ring)
                amdgpu_sw_ring_ib_mark_offset(ring, AMDGPU_MUX_OFFSET_TYPE_DE);
 }
+
+bool amdgpu_ring_sched_ready(struct amdgpu_ring *ring)
+{
+       if (!ring)
+               return false;
+
+       if (ring->no_scheduler || !drm_sched_wqueue_ready(&ring->sched))
+               return false;
+
+       return true;
+}
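
Every open-coded "!ring || !drm_sched_wqueue_ready(&ring->sched)" test in this series collapses into this helper, which additionally rejects rings that have no scheduler at all. The typical caller pattern, as seen throughout the hunks above:

	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
		struct amdgpu_ring *ring = adev->rings[i];

		if (!amdgpu_ring_sched_ready(ring))	/* NULL, no_scheduler, or wqueue not ready */
			continue;
		/* ... operate on ring->sched ... */
	}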
index bbb53720a0181d93cf9fdfd6f7721ee006699004..fe1a61eb6e4c0809c1bccd41bc89f32bcd8304f2 100644 (file)
@@ -450,5 +450,5 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
 int amdgpu_ib_pool_init(struct amdgpu_device *adev);
 void amdgpu_ib_pool_fini(struct amdgpu_device *adev);
 int amdgpu_ib_ring_tests(struct amdgpu_device *adev);
-
+bool amdgpu_ring_sched_ready(struct amdgpu_ring *ring);
 #endif
index 08916538a615ff3d072eb5241a97495795c7e32a..8db880244324ff1077ff3d87c20b7387ecd8b74b 100644 (file)
@@ -221,8 +221,23 @@ static struct attribute *amdgpu_vram_mgr_attributes[] = {
        NULL
 };
 
+static umode_t amdgpu_vram_attrs_is_visible(struct kobject *kobj,
+                                           struct attribute *attr, int i)
+{
+       struct device *dev = kobj_to_dev(kobj);
+       struct drm_device *ddev = dev_get_drvdata(dev);
+       struct amdgpu_device *adev = drm_to_adev(ddev);
+
+       if (attr == &dev_attr_mem_info_vram_vendor.attr &&
+           !adev->gmc.vram_vendor)
+               return 0;
+
+       return attr->mode;
+}
+
 const struct attribute_group amdgpu_vram_mgr_attr_group = {
-       .attrs = amdgpu_vram_mgr_attributes
+       .attrs = amdgpu_vram_mgr_attributes,
+       .is_visible = amdgpu_vram_attrs_is_visible
 };
 
 /**
index 6f7c031dd197a22e388ddcfaed56ec75e37cafe5..f24e34dc33d1defcd70cab67f1423dffd31e8f08 100644 (file)
@@ -204,6 +204,12 @@ static u32 cik_ih_get_wptr(struct amdgpu_device *adev,
                tmp = RREG32(mmIH_RB_CNTL);
                tmp |= IH_RB_CNTL__WPTR_OVERFLOW_CLEAR_MASK;
                WREG32(mmIH_RB_CNTL, tmp);
+
+               /* Unset the CLEAR_OVERFLOW bit immediately so new overflows
+                * can be detected.
+                */
+               tmp &= ~IH_RB_CNTL__WPTR_OVERFLOW_CLEAR_MASK;
+               WREG32(mmIH_RB_CNTL, tmp);
        }
        return (wptr & ih->ptr_mask);
 }
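
The same read-modify-write sequence is repeated for every IH block touched in this series (cz_ih, iceland_ih, ih_v6_0, ih_v6_1, navi10_ih, si_ih, tonga_ih, vega10_ih and vega20_ih); in condensed form:

	tmp = RREG32(mmIH_RB_CNTL);
	tmp |= IH_RB_CNTL__WPTR_OVERFLOW_CLEAR_MASK;	/* acknowledge the overflow */
	WREG32(mmIH_RB_CNTL, tmp);
	tmp &= ~IH_RB_CNTL__WPTR_OVERFLOW_CLEAR_MASK;	/* re-arm so the next overflow latches */
	WREG32(mmIH_RB_CNTL, tmp);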
index b8c47e0cf37ad53bcb3f1afe161e6356b91789e3..c19681492efa748bf7b5d92864dbdc61c0351520 100644 (file)
@@ -216,6 +216,11 @@ static u32 cz_ih_get_wptr(struct amdgpu_device *adev,
        tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
        WREG32(mmIH_RB_CNTL, tmp);
 
+       /* Unset the CLEAR_OVERFLOW bit immediately so new overflows
+        * can be detected.
+        */
+       tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 0);
+       WREG32(mmIH_RB_CNTL, tmp);
 
 out:
        return (wptr & ih->ptr_mask);
index d63cab294883b8b44caa908d5bafaeaf19750ef6..dcdecb18b2306b84ca1b18852837409776707c69 100644 (file)
@@ -4027,8 +4027,6 @@ static int gfx_v10_0_init_microcode(struct amdgpu_device *adev)
                err = 0;
                adev->gfx.mec2_fw = NULL;
        }
-       amdgpu_gfx_cp_init_microcode(adev, AMDGPU_UCODE_ID_CP_MEC2);
-       amdgpu_gfx_cp_init_microcode(adev, AMDGPU_UCODE_ID_CP_MEC2_JT);
 
        gfx_v10_0_check_fw_write_wait(adev);
 out:
@@ -6589,7 +6587,7 @@ static int gfx_v10_0_compute_mqd_init(struct amdgpu_device *adev, void *m,
 #ifdef __BIG_ENDIAN
        tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, ENDIAN_SWAP, 1);
 #endif
-       tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, UNORD_DISPATCH, 0);
+       tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, UNORD_DISPATCH, 1);
        tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, TUNNEL_DISPATCH,
                            prop->allow_tunneling);
        tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, PRIV_STATE, 1);
index 0ea0866c261f84e24e8494755387b3d22482a0a2..4f3bfdc75b37d66cbc5d78a5525a8a905eb1e733 100644 (file)
@@ -107,23 +107,6 @@ static const struct soc15_reg_golden golden_settings_gc_11_0_1[] =
        SOC15_REG_GOLDEN_VALUE(GC, 0, regTCP_CNTL2, 0xfcffffff, 0x0000000a)
 };
 
-static const struct soc15_reg_golden golden_settings_gc_11_5_0[] = {
-       SOC15_REG_GOLDEN_VALUE(GC, 0, regDB_DEBUG5, 0xffffffff, 0x00000800),
-       SOC15_REG_GOLDEN_VALUE(GC, 0, regGB_ADDR_CONFIG, 0x0c1807ff, 0x00000242),
-       SOC15_REG_GOLDEN_VALUE(GC, 0, regGCR_GENERAL_CNTL, 0x1ff1ffff, 0x00000500),
-       SOC15_REG_GOLDEN_VALUE(GC, 0, regGL2A_ADDR_MATCH_MASK, 0xffffffff, 0xfffffff3),
-       SOC15_REG_GOLDEN_VALUE(GC, 0, regGL2C_ADDR_MATCH_MASK, 0xffffffff, 0xfffffff3),
-       SOC15_REG_GOLDEN_VALUE(GC, 0, regGL2C_CTRL, 0xffffffff, 0xf37fff3f),
-       SOC15_REG_GOLDEN_VALUE(GC, 0, regGL2C_CTRL3, 0xfffffffb, 0x00f40188),
-       SOC15_REG_GOLDEN_VALUE(GC, 0, regGL2C_CTRL4, 0xf0ffffff, 0x80009007),
-       SOC15_REG_GOLDEN_VALUE(GC, 0, regPA_CL_ENHANCE, 0xf1ffffff, 0x00880007),
-       SOC15_REG_GOLDEN_VALUE(GC, 0, regPC_CONFIG_CNTL_1, 0xffffffff, 0x00010000),
-       SOC15_REG_GOLDEN_VALUE(GC, 0, regTA_CNTL_AUX, 0xf7f7ffff, 0x01030000),
-       SOC15_REG_GOLDEN_VALUE(GC, 0, regTA_CNTL2, 0x007f0000, 0x00000000),
-       SOC15_REG_GOLDEN_VALUE(GC, 0, regTCP_CNTL2, 0xffcfffff, 0x0000200a),
-       SOC15_REG_GOLDEN_VALUE(GC, 0, regUTCL1_CTRL_2, 0xffffffff, 0x0000048f)
-};
-
 #define DEFAULT_SH_MEM_CONFIG \
        ((SH_MEM_ADDRESS_MODE_64 << SH_MEM_CONFIG__ADDRESS_MODE__SHIFT) | \
         (SH_MEM_ALIGNMENT_MODE_UNALIGNED << SH_MEM_CONFIG__ALIGNMENT_MODE__SHIFT) | \
@@ -304,11 +287,6 @@ static void gfx_v11_0_init_golden_registers(struct amdgpu_device *adev)
                                                golden_settings_gc_11_0_1,
                                                (const u32)ARRAY_SIZE(golden_settings_gc_11_0_1));
                break;
-       case IP_VERSION(11, 5, 0):
-               soc15_program_register_sequence(adev,
-                                               golden_settings_gc_11_5_0,
-                                               (const u32)ARRAY_SIZE(golden_settings_gc_11_5_0));
-               break;
        default:
                break;
        }
@@ -3846,7 +3824,7 @@ static int gfx_v11_0_compute_mqd_init(struct amdgpu_device *adev, void *m,
                            (order_base_2(prop->queue_size / 4) - 1));
        tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, RPTR_BLOCK_SIZE,
                            (order_base_2(AMDGPU_GPU_PAGE_SIZE / 4) - 1));
-       tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, UNORD_DISPATCH, 0);
+       tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, UNORD_DISPATCH, 1);
        tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, TUNNEL_DISPATCH,
                            prop->allow_tunneling);
        tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, PRIV_STATE, 1);
index 69c500910746018281471ad6d27350aaf2461702..3bc6943365a4ff36a32827ae2d477aac6883631d 100644 (file)
@@ -3034,6 +3034,14 @@ static int gfx_v9_0_cp_gfx_start(struct amdgpu_device *adev)
 
        gfx_v9_0_cp_gfx_enable(adev, true);
 
+       /* For now, only apply this quirk to the gfx9 APU series; it has
+        * already been confirmed that gfx10/gfx11 APUs do not need this
+        * update.
+        */
+       if (adev->flags & AMD_IS_APU &&
+                       adev->in_s3 && !adev->suspend_complete) {
+               DRM_INFO("Will skip the CSB packet resubmit\n");
+               return 0;
+       }
        r = amdgpu_ring_alloc(ring, gfx_v9_0_get_csb_size(adev) + 4 + 3);
        if (r) {
                DRM_ERROR("amdgpu: cp failed to lock ring (%d).\n", r);
index 42e103d7077d52d5bbe556f70f2b03bb0d5ae8db..59d9215e555629577b43afcba38e945f5ce90bcd 100644 (file)
@@ -915,8 +915,8 @@ static int gmc_v6_0_hw_init(void *handle)
 
        if (amdgpu_emu_mode == 1)
                return amdgpu_gmc_vram_checking(adev);
-       else
-               return r;
+
+       return 0;
 }
 
 static int gmc_v6_0_hw_fini(void *handle)
index efc16e580f1e27e384b7c80323c72d0e59fba473..45a2f8e031a2c9920f3a68ae690731357f33da0c 100644 (file)
@@ -1099,8 +1099,8 @@ static int gmc_v7_0_hw_init(void *handle)
 
        if (amdgpu_emu_mode == 1)
                return amdgpu_gmc_vram_checking(adev);
-       else
-               return r;
+
+       return 0;
 }
 
 static int gmc_v7_0_hw_fini(void *handle)
index ff4ae73d27ecd26aaf399bdfe158e22c1de3009f..4422b27a3cc2fc069a6ecb3e6d8b9630e9c173cc 100644 (file)
@@ -1219,8 +1219,8 @@ static int gmc_v8_0_hw_init(void *handle)
 
        if (amdgpu_emu_mode == 1)
                return amdgpu_gmc_vram_checking(adev);
-       else
-               return r;
+
+       return 0;
 }
 
 static int gmc_v8_0_hw_fini(void *handle)
index f9039d64ff2d72804556daa16b8ed9632b08b307..e67a62db9e12629b40c92f322922cc763ce53ce7 100644 (file)
@@ -1947,13 +1947,6 @@ static int gmc_v9_0_init_mem_ranges(struct amdgpu_device *adev)
 
 static void gmc_v9_4_3_init_vram_info(struct amdgpu_device *adev)
 {
-       static const u32 regBIF_BIOS_SCRATCH_4 = 0x50;
-       u32 vram_info;
-
-       if (!amdgpu_sriov_vf(adev)) {
-               vram_info = RREG32(regBIF_BIOS_SCRATCH_4);
-               adev->gmc.vram_vendor = vram_info & 0xF;
-       }
        adev->gmc.vram_type = AMDGPU_VRAM_TYPE_HBM;
        adev->gmc.vram_width = 128 * 64;
 }
@@ -2340,8 +2333,8 @@ static int gmc_v9_0_hw_init(void *handle)
 
        if (amdgpu_emu_mode == 1)
                return amdgpu_gmc_vram_checking(adev);
-       else
-               return r;
+
+       return 0;
 }
 
 /**
index aecad530b10a61289f9e2413612bbf58a33cec22..2c02ae69883d2bb86bec8e1d1fb521f8481d7ebb 100644 (file)
@@ -215,6 +215,11 @@ static u32 iceland_ih_get_wptr(struct amdgpu_device *adev,
        tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
        WREG32(mmIH_RB_CNTL, tmp);
 
+       /* Unset the CLEAR_OVERFLOW bit immediately so new overflows
+        * can be detected.
+        */
+       tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 0);
+       WREG32(mmIH_RB_CNTL, tmp);
 
 out:
        return (wptr & ih->ptr_mask);
index d9ed7332d805d3fca1bd0343ebc804e69dc44595..ad4ad39f128f7d7f788a866d36cc7c8175743b5d 100644 (file)
@@ -418,6 +418,12 @@ static u32 ih_v6_0_get_wptr(struct amdgpu_device *adev,
        tmp = RREG32_NO_KIQ(ih_regs->ih_rb_cntl);
        tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
        WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp);
+
+       /* Unset the CLEAR_OVERFLOW bit immediately so new overflows
+        * can be detected.
+        */
+       tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 0);
+       WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp);
 out:
        return (wptr & ih->ptr_mask);
 }
index 8fb05eae340ad298653afaca4edccfce86741c84..b8da0fc29378c496ba0392e10105d1c58d53bf5a 100644 (file)
@@ -418,6 +418,13 @@ static u32 ih_v6_1_get_wptr(struct amdgpu_device *adev,
        tmp = RREG32_NO_KIQ(ih_regs->ih_rb_cntl);
        tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
        WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp);
+
+       /* Unset the CLEAR_OVERFLOW bit immediately so new overflows
+        * can be detected.
+        */
+       tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 0);
+       WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp);
+
 out:
        return (wptr & ih->ptr_mask);
 }
index bc38b90f8cf88e8fee393e8e52214ac72f0aa8a6..88ea58d5c4abf5b0f20abff28f9833f402e4b016 100644 (file)
@@ -674,14 +674,6 @@ static int jpeg_v4_0_set_powergating_state(void *handle,
        return ret;
 }
 
-static int jpeg_v4_0_set_interrupt_state(struct amdgpu_device *adev,
-                                       struct amdgpu_irq_src *source,
-                                       unsigned type,
-                                       enum amdgpu_interrupt_state state)
-{
-       return 0;
-}
-
 static int jpeg_v4_0_set_ras_interrupt_state(struct amdgpu_device *adev,
                                        struct amdgpu_irq_src *source,
                                        unsigned int type,
@@ -765,7 +757,6 @@ static void jpeg_v4_0_set_dec_ring_funcs(struct amdgpu_device *adev)
 }
 
 static const struct amdgpu_irq_src_funcs jpeg_v4_0_irq_funcs = {
-       .set = jpeg_v4_0_set_interrupt_state,
        .process = jpeg_v4_0_process_interrupt,
 };
 
index 6ede85b28cc8c0bbfd6a7e94c6a3d1a677e958bf..78b74daf4eebfc30f04ee4aaf6d0ff92891ff30f 100644 (file)
@@ -181,7 +181,6 @@ static int jpeg_v4_0_5_hw_fini(void *handle)
                        RREG32_SOC15(JPEG, 0, regUVD_JRBC_STATUS))
                        jpeg_v4_0_5_set_powergating_state(adev, AMD_PG_STATE_GATE);
        }
-       amdgpu_irq_put(adev, &adev->jpeg.inst->irq, 0);
 
        return 0;
 }
@@ -516,14 +515,6 @@ static int jpeg_v4_0_5_set_powergating_state(void *handle,
        return ret;
 }
 
-static int jpeg_v4_0_5_set_interrupt_state(struct amdgpu_device *adev,
-                                       struct amdgpu_irq_src *source,
-                                       unsigned type,
-                                       enum amdgpu_interrupt_state state)
-{
-       return 0;
-}
-
 static int jpeg_v4_0_5_process_interrupt(struct amdgpu_device *adev,
                                      struct amdgpu_irq_src *source,
                                      struct amdgpu_iv_entry *entry)
@@ -603,7 +594,6 @@ static void jpeg_v4_0_5_set_dec_ring_funcs(struct amdgpu_device *adev)
 }
 
 static const struct amdgpu_irq_src_funcs jpeg_v4_0_5_irq_funcs = {
-       .set = jpeg_v4_0_5_set_interrupt_state,
        .process = jpeg_v4_0_5_process_interrupt,
 };
 
index e64b33115848d204a4d81eb9530df5bf95fdf796..de93614726c9a48ccd398c6ac5570a8844fb7618 100644 (file)
@@ -442,6 +442,12 @@ static u32 navi10_ih_get_wptr(struct amdgpu_device *adev,
        tmp = RREG32_NO_KIQ(ih_regs->ih_rb_cntl);
        tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
        WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp);
+
+       /* Unset the CLEAR_OVERFLOW bit immediately so new overflows
+        * can be detected.
+        */
+       tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 0);
+       WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp);
 out:
        return (wptr & ih->ptr_mask);
 }
index e90f33780803458c32843f2599c07e4f598ca659..b4723d68eab0f939ba057b67cf7712ddb512c8c8 100644 (file)
@@ -431,6 +431,12 @@ static void nbio_v7_9_init_registers(struct amdgpu_device *adev)
        u32 inst_mask;
        int i;
 
+       if (amdgpu_sriov_vf(adev))
+               adev->rmmio_remap.reg_offset =
+                       SOC15_REG_OFFSET(
+                               NBIO, 0,
+                               regBIF_BX_DEV0_EPF0_VF0_HDP_MEM_COHERENCY_FLUSH_CNTL)
+                       << 2;
        WREG32_SOC15(NBIO, 0, regXCC_DOORBELL_FENCE,
                0xff & ~(adev->gfx.xcc_mask));
 
index 9a24f17a57502edaa744451bd312dfcd8b3d678c..cada9f300a7f510a3f025c3ed17c87aedcbbaeb5 100644 (file)
@@ -119,6 +119,12 @@ static u32 si_ih_get_wptr(struct amdgpu_device *adev,
                tmp = RREG32(IH_RB_CNTL);
                tmp |= IH_RB_CNTL__WPTR_OVERFLOW_CLEAR_MASK;
                WREG32(IH_RB_CNTL, tmp);
+
+               /* Unset the CLEAR_OVERFLOW bit immediately so new overflows
+                * can be detected.
+                */
+               tmp &= ~IH_RB_CNTL__WPTR_OVERFLOW_CLEAR_MASK;
+               WREG32(IH_RB_CNTL, tmp);
        }
        return (wptr & ih->ptr_mask);
 }
index 15033efec2bac0148e5d9381027a6ee3e70334b7..1c614451deadd10d5dfb29a591fbeb394505ac91 100644 (file)
@@ -574,11 +574,34 @@ soc15_asic_reset_method(struct amdgpu_device *adev)
                return AMD_RESET_METHOD_MODE1;
 }
 
+static bool soc15_need_reset_on_resume(struct amdgpu_device *adev)
+{
+       u32 sol_reg;
+
+       sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81);
+
+       /* Will reset for the following suspend abort cases.
+        * 1) For now the reset is limited to the APU side; the dGPU
+        *    side has not been checked yet.
+        * 2) The S3 suspend was aborted and the TOS is already launched.
+        */
+       if (adev->flags & AMD_IS_APU && adev->in_s3 &&
+                       !adev->suspend_complete &&
+                       sol_reg)
+               return true;
+
+       return false;
+}
+
 static int soc15_asic_reset(struct amdgpu_device *adev)
 {
        /* original raven doesn't have full asic reset */
-       if ((adev->apu_flags & AMD_APU_IS_RAVEN) ||
-           (adev->apu_flags & AMD_APU_IS_RAVEN2))
+       /* On the latest Raven, the GPU reset can be performed
+        * successfully, so temporarily enable it for the S3
+        * suspend abort case.
+        */
+       if (((adev->apu_flags & AMD_APU_IS_RAVEN) ||
+           (adev->apu_flags & AMD_APU_IS_RAVEN2)) &&
+               !soc15_need_reset_on_resume(adev))
                return 0;
 
        switch (soc15_asic_reset_method(adev)) {
@@ -1302,6 +1325,10 @@ static int soc15_common_resume(void *handle)
 {
        struct amdgpu_device *adev = (struct amdgpu_device *)handle;
 
+       if (soc15_need_reset_on_resume(adev)) {
+               dev_info(adev->dev, "S3 suspend was aborted, resetting the ASIC.\n");
+               soc15_asic_reset(adev);
+       }
        return soc15_common_hw_init(adev);
 }
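
A condensed view of the suspend-abort detection introduced here, tying together the suspend_complete handling from the amdgpu_drv.c hunks above:

	/* amdgpu_pmops_suspend() clears adev->suspend_complete and
	 * amdgpu_pmops_suspend_noirq() sets it, so a resume that finds
	 * it still false means the S3 entry was aborted mid-way. On
	 * APUs with the TOS already launched (MP0_SMN_C2PMSG_81
	 * nonzero), soc15_common_resume() forces an ASIC reset before
	 * re-running hw_init.
	 */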
 
index 48c6efcdeac974ba109224510442b0488e1875d0..4d7188912edfee820dca2ac854b55314dc2f1b27 100644 (file)
@@ -50,13 +50,13 @@ static const struct amd_ip_funcs soc21_common_ip_funcs;
 /* SOC21 */
 static const struct amdgpu_video_codec_info vcn_4_0_0_video_codecs_encode_array_vcn0[] = {
        {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_MPEG4_AVC, 4096, 2304, 0)},
-       {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC, 4096, 2304, 0)},
+       {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC, 8192, 4352, 0)},
        {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_AV1, 8192, 4352, 0)},
 };
 
 static const struct amdgpu_video_codec_info vcn_4_0_0_video_codecs_encode_array_vcn1[] = {
        {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_MPEG4_AVC, 4096, 2304, 0)},
-       {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC, 4096, 2304, 0)},
+       {codec_info_build(AMDGPU_INFO_VIDEO_CAPS_CODEC_IDX_HEVC, 8192, 4352, 0)},
 };
 
 static const struct amdgpu_video_codecs vcn_4_0_0_video_codecs_encode_vcn0 = {
index 917707bba7f3624e37b0525d3ec72bf563c1307a..450b6e8315091448c24e2d90dcd4edccc9d4423c 100644 (file)
@@ -219,6 +219,12 @@ static u32 tonga_ih_get_wptr(struct amdgpu_device *adev,
        tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
        WREG32(mmIH_RB_CNTL, tmp);
 
+       /* Unset the CLEAR_OVERFLOW bit immediately so new overflows
+        * can be detected.
+        */
+       tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 0);
+       WREG32(mmIH_RB_CNTL, tmp);
+
 out:
        return (wptr & ih->ptr_mask);
 }
index 169ed400ee7b7413263ab48a2de1e75aa3ed00f7..8ab01ae919d2e36c8ff1c2226227c173223247be 100644 (file)
@@ -2017,22 +2017,6 @@ static int vcn_v4_0_set_powergating_state(void *handle, enum amd_powergating_sta
        return ret;
 }
 
-/**
- * vcn_v4_0_set_interrupt_state - set VCN block interrupt state
- *
- * @adev: amdgpu_device pointer
- * @source: interrupt sources
- * @type: interrupt types
- * @state: interrupt states
- *
- * Set VCN block interrupt state
- */
-static int vcn_v4_0_set_interrupt_state(struct amdgpu_device *adev, struct amdgpu_irq_src *source,
-      unsigned type, enum amdgpu_interrupt_state state)
-{
-       return 0;
-}
-
 /**
  * vcn_v4_0_set_ras_interrupt_state - set VCN block RAS interrupt state
  *
@@ -2097,7 +2081,6 @@ static int vcn_v4_0_process_interrupt(struct amdgpu_device *adev, struct amdgpu_
 }
 
 static const struct amdgpu_irq_src_funcs vcn_v4_0_irq_funcs = {
-       .set = vcn_v4_0_set_interrupt_state,
        .process = vcn_v4_0_process_interrupt,
 };
 
index 2eda30e78f61d928984cf57b94337abc7b9cfc0a..49e4c3c09acab8eab12770325f4cf48c8c491b7c 100644 (file)
@@ -269,8 +269,6 @@ static int vcn_v4_0_5_hw_fini(void *handle)
                                vcn_v4_0_5_set_powergating_state(adev, AMD_PG_STATE_GATE);
                        }
                }
-
-               amdgpu_irq_put(adev, &adev->vcn.inst[i].irq, 0);
        }
 
        return 0;
@@ -1668,22 +1666,6 @@ static int vcn_v4_0_5_set_powergating_state(void *handle, enum amd_powergating_s
        return ret;
 }
 
-/**
- * vcn_v4_0_5_set_interrupt_state - set VCN block interrupt state
- *
- * @adev: amdgpu_device pointer
- * @source: interrupt sources
- * @type: interrupt types
- * @state: interrupt states
- *
- * Set VCN block interrupt state
- */
-static int vcn_v4_0_5_set_interrupt_state(struct amdgpu_device *adev, struct amdgpu_irq_src *source,
-               unsigned type, enum amdgpu_interrupt_state state)
-{
-       return 0;
-}
-
 /**
  * vcn_v4_0_5_process_interrupt - process VCN block interrupt
  *
@@ -1726,7 +1708,6 @@ static int vcn_v4_0_5_process_interrupt(struct amdgpu_device *adev, struct amdgp
 }
 
 static const struct amdgpu_irq_src_funcs vcn_v4_0_5_irq_funcs = {
-       .set = vcn_v4_0_5_set_interrupt_state,
        .process = vcn_v4_0_5_process_interrupt,
 };
 
index d364c6dd152c33b7fc1fbc614668b2dd4ffe223a..bf68e18e3824b8e492c2451b655bfcf5068910f6 100644 (file)
@@ -373,6 +373,12 @@ static u32 vega10_ih_get_wptr(struct amdgpu_device *adev,
        tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
        WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp);
 
+       /* Unset the CLEAR_OVERFLOW bit immediately so new overflows
+        * can be detected.
+        */
+       tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 0);
+       WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp);
+
 out:
        return (wptr & ih->ptr_mask);
 }
index ddfc6941f9d559c916fe2cdb66b4e27394f1d618..db66e6cccaf2aa4e596a8f377eed8030c55159b7 100644 (file)
@@ -421,6 +421,12 @@ static u32 vega20_ih_get_wptr(struct amdgpu_device *adev,
        tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 1);
        WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp);
 
+       /* Unset the CLEAR_OVERFLOW bit immediately so new overflows
+        * can be detected.
+        */
+       tmp = REG_SET_FIELD(tmp, IH_RB_CNTL, WPTR_OVERFLOW_CLEAR, 0);
+       WREG32_NO_KIQ(ih_regs->ih_rb_cntl, tmp);
+
 out:
        return (wptr & ih->ptr_mask);
 }
index df75863393fcb887613fb4dc054977fb46a49b1e..d1caaf0e6a7c4eaed98fc8f390781719bf28b846 100644 (file)
@@ -674,7 +674,7 @@ static const uint32_t cwsr_trap_gfx9_hex[] = {
        0x86ea6a6a, 0x8f6e837a,
        0xb96ee0c2, 0xbf800002,
        0xb97a0002, 0xbf8a0000,
-       0xbe801f6c, 0xbf810000,
+       0xbe801f6c, 0xbf9b0000,
 };
 
 static const uint32_t cwsr_trap_nv1x_hex[] = {
@@ -1091,7 +1091,7 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
        0xb9eef807, 0x876dff6d,
        0x0000ffff, 0x87fe7e7e,
        0x87ea6a6a, 0xb9faf802,
-       0xbe80226c, 0xbf810000,
+       0xbe80226c, 0xbf9b0000,
        0xbf9f0000, 0xbf9f0000,
        0xbf9f0000, 0xbf9f0000,
        0xbf9f0000, 0x00000000,
@@ -1574,7 +1574,7 @@ static const uint32_t cwsr_trap_arcturus_hex[] = {
        0x86ea6a6a, 0x8f6e837a,
        0xb96ee0c2, 0xbf800002,
        0xb97a0002, 0xbf8a0000,
-       0xbe801f6c, 0xbf810000,
+       0xbe801f6c, 0xbf9b0000,
 };
 
 static const uint32_t cwsr_trap_aldebaran_hex[] = {
@@ -2065,7 +2065,7 @@ static const uint32_t cwsr_trap_aldebaran_hex[] = {
        0x86ea6a6a, 0x8f6e837a,
        0xb96ee0c2, 0xbf800002,
        0xb97a0002, 0xbf8a0000,
-       0xbe801f6c, 0xbf810000,
+       0xbe801f6c, 0xbf9b0000,
 };
 
 static const uint32_t cwsr_trap_gfx10_hex[] = {
@@ -2500,7 +2500,7 @@ static const uint32_t cwsr_trap_gfx10_hex[] = {
        0x876dff6d, 0x0000ffff,
        0x87fe7e7e, 0x87ea6a6a,
        0xb9faf802, 0xbe80226c,
-       0xbf810000, 0xbf9f0000,
+       0xbf9b0000, 0xbf9f0000,
        0xbf9f0000, 0xbf9f0000,
        0xbf9f0000, 0xbf9f0000,
 };
@@ -2944,7 +2944,7 @@ static const uint32_t cwsr_trap_gfx11_hex[] = {
        0xb8eef802, 0xbf0d866e,
        0xbfa20002, 0xb97af802,
        0xbe80486c, 0xb97af802,
-       0xbe804a6c, 0xbfb00000,
+       0xbe804a6c, 0xbfb10000,
        0xbf9f0000, 0xbf9f0000,
        0xbf9f0000, 0xbf9f0000,
        0xbf9f0000, 0x00000000,
@@ -3436,5 +3436,5 @@ static const uint32_t cwsr_trap_gfx9_4_3_hex[] = {
        0x86ea6a6a, 0x8f6e837a,
        0xb96ee0c2, 0xbf800002,
        0xb97a0002, 0xbf8a0000,
-       0xbe801f6c, 0xbf810000,
+       0xbe801f6c, 0xbf9b0000,
 };
index e0140df0b0ec8086433048adb31a06ca6aca740d..71b3dc0c73634aef86846be3669723590ca55db9 100644 (file)
@@ -1104,7 +1104,7 @@ L_RETURN_WITHOUT_PRIV:
        s_rfe_b64       s_restore_pc_lo                                         //Return to the main shader program and resume execution
 
 L_END_PGM:
-       s_endpgm
+       s_endpgm_saved
 end
 
 function write_hwreg_to_mem(s, s_rsrc, s_mem_offset)
index e506411ad28ab99f474eca96ff37254fb43078de..bb26338204f4ba84b5ae41a781e1becdf9ad72bb 100644 (file)
@@ -921,7 +921,7 @@ L_RESTORE:
 /*                     the END                                           */
 /**************************************************************************/
 L_END_PGM:
-    s_endpgm
+    s_endpgm_saved
 
 end
 
index ce4c52ec34d80eabb7f7664051ccebcd2f0ec64e..80e90fdef291d5b8cdcf7d08c6e319150fcf631b 100644 (file)
@@ -1442,7 +1442,9 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file *filep,
                        kfd_flush_tlb(peer_pdd, TLB_FLUSH_HEAVYWEIGHT);
 
                /* Remove dma mapping after tlb flush to avoid IO_PAGE_FAULT */
-               amdgpu_amdkfd_gpuvm_dmaunmap_mem(mem, peer_pdd->drm_priv);
+               err = amdgpu_amdkfd_gpuvm_dmaunmap_mem(mem, peer_pdd->drm_priv);
+               if (err)
+                       goto sync_memory_failed;
        }
 
        mutex_unlock(&p->mutex);
index f856901055d34e605cd4ec51fbdfc3be18e2abeb..bdc01ca9609a7e57fac05ee60d6866a5950e2b07 100644 (file)
@@ -574,7 +574,7 @@ svm_migrate_copy_to_ram(struct amdgpu_device *adev, struct svm_range *prange,
        pr_debug("svms 0x%p [0x%lx 0x%lx]\n", prange->svms, prange->start,
                 prange->last);
 
-       addr = prange->start << PAGE_SHIFT;
+       addr = migrate->start;
 
        src = (uint64_t *)(scratch + npages);
        dst = scratch;
index 8b7fed91352696cf2b5cafab0680ad0737fa95ee..22cbfa1bdaddb9a764053421b16159391c1ba56d 100644 (file)
@@ -170,6 +170,7 @@ static void update_mqd(struct mqd_manager *mm, void *mqd,
        m->cp_hqd_pq_control = 5 << CP_HQD_PQ_CONTROL__RPTR_BLOCK_SIZE__SHIFT;
        m->cp_hqd_pq_control |=
                        ffs(q->queue_size / sizeof(unsigned int)) - 1 - 1;
+       m->cp_hqd_pq_control |= CP_HQD_PQ_CONTROL__UNORD_DISPATCH_MASK;
        pr_debug("cp_hqd_pq_control 0x%x\n", m->cp_hqd_pq_control);
 
        m->cp_hqd_pq_base_lo = lower_32_bits((uint64_t)q->queue_address >> 8);
index 15277f1d5cf0a9d9eb694ccaeec540e467ab774a..826bc4f6c8a7043853d0b8e21bad73660c6a8a8c 100644 (file)
@@ -55,8 +55,8 @@ static void update_cu_mask(struct mqd_manager *mm, void *mqd,
        m = get_mqd(mqd);
 
        if (has_wa_flag) {
-               uint32_t wa_mask = minfo->update_flag == UPDATE_FLAG_DBG_WA_ENABLE ?
-                                               0xffff : 0xffffffff;
+               uint32_t wa_mask =
+                       (minfo->update_flag & UPDATE_FLAG_DBG_WA_ENABLE) ? 0xffff : 0xffffffff;
 
                m->compute_static_thread_mgmt_se0 = wa_mask;
                m->compute_static_thread_mgmt_se1 = wa_mask;
@@ -224,6 +224,7 @@ static void update_mqd(struct mqd_manager *mm, void *mqd,
        m->cp_hqd_pq_control = 5 << CP_HQD_PQ_CONTROL__RPTR_BLOCK_SIZE__SHIFT;
        m->cp_hqd_pq_control |=
                        ffs(q->queue_size / sizeof(unsigned int)) - 1 - 1;
+       m->cp_hqd_pq_control |= CP_HQD_PQ_CONTROL__UNORD_DISPATCH_MASK;
        pr_debug("cp_hqd_pq_control 0x%x\n", m->cp_hqd_pq_control);
 
        m->cp_hqd_pq_base_lo = lower_32_bits((uint64_t)q->queue_address >> 8);
index 42d881809dc70e230133674e4b12f6f68567837a..697b6d530d12ef30ed06a22d3cf5c15fa740b62a 100644 (file)
@@ -303,6 +303,15 @@ static void update_mqd(struct mqd_manager *mm, void *mqd,
                update_cu_mask(mm, mqd, minfo, 0);
        set_priority(m, q);
 
+       if (minfo && KFD_GC_VERSION(mm->dev) >= IP_VERSION(9, 4, 2)) {
+               if (minfo->update_flag & UPDATE_FLAG_IS_GWS)
+                       m->compute_resource_limits |=
+                               COMPUTE_RESOURCE_LIMITS__FORCE_SIMD_DIST_MASK;
+               else
+                       m->compute_resource_limits &=
+                               ~COMPUTE_RESOURCE_LIMITS__FORCE_SIMD_DIST_MASK;
+       }
+
        q->is_active = QUEUE_IS_ACTIVE(*q);
 }
 
index 17fbedbf3651388edfcd0109a22d0fe9dfcd331f..80320b8603fc6692cc5f10426d24f33b5ce0acfa 100644 (file)
@@ -532,6 +532,7 @@ struct queue_properties {
 enum mqd_update_flag {
        UPDATE_FLAG_DBG_WA_ENABLE = 1,
        UPDATE_FLAG_DBG_WA_DISABLE = 2,
+       UPDATE_FLAG_IS_GWS = 4, /* quirk for gfx9 IP */
 };
 
 struct mqd_update_info {
@@ -1488,10 +1489,15 @@ void kfd_dec_compute_active(struct kfd_node *dev);
 
 /* Cgroup Support */
 /* Check with device cgroup if @kfd device is accessible */
-static inline int kfd_devcgroup_check_permission(struct kfd_node *kfd)
+static inline int kfd_devcgroup_check_permission(struct kfd_node *node)
 {
 #if defined(CONFIG_CGROUP_DEVICE) || defined(CONFIG_CGROUP_BPF)
-       struct drm_device *ddev = adev_to_drm(kfd->adev);
+       struct drm_device *ddev;
+
+       if (node->xcp)
+               ddev = node->xcp->ddev;
+       else
+               ddev = adev_to_drm(node->adev);
 
        return devcgroup_check_permission(DEVCG_DEV_CHAR, DRM_MAJOR,
                                          ddev->render->index,
index 43eff221eae58ca008e2e2e92aec09eb749157d7..4858112f9a53b7e491186e0efa0e70dbb92ee47a 100644 (file)
@@ -95,6 +95,7 @@ void kfd_process_dequeue_from_device(struct kfd_process_device *pdd)
 int pqm_set_gws(struct process_queue_manager *pqm, unsigned int qid,
                        void *gws)
 {
+       struct mqd_update_info minfo = {0};
        struct kfd_node *dev = NULL;
        struct process_queue_node *pqn;
        struct kfd_process_device *pdd;
@@ -146,9 +147,10 @@ int pqm_set_gws(struct process_queue_manager *pqm, unsigned int qid,
        }
 
        pdd->qpd.num_gws = gws ? dev->adev->gds.gws_size : 0;
+       minfo.update_flag = gws ? UPDATE_FLAG_IS_GWS : 0;
 
        return pqn->q->device->dqm->ops.update_queue(pqn->q->device->dqm,
-                                                       pqn->q, NULL);
+                                                       pqn->q, &minfo);
 }
 
 void kfd_process_dequeue_from_all_devices(struct kfd_process *p)
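
The new flag threads through to the v9 MQD manager hunk above: when GWS is attached, update_mqd() sets COMPUTE_RESOURCE_LIMITS__FORCE_SIMD_DIST_MASK on GC 9.4.2 and newer, and clears it again on detach (gws == NULL leaves update_flag at 0).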
index e5f7c92eebcbbfa6a1fda115ca2b599cab48e4e8..6ed2ec381aaa320ed1514038a1b6b10c44843019 100644 (file)
@@ -1638,12 +1638,10 @@ static int fill_in_l2_l3_pcache(struct kfd_cache_properties **props_ext,
                else
                        mode = UNKNOWN_MEMORY_PARTITION_MODE;
 
-               if (pcache->cache_level == 2)
-                       pcache->cache_size = pcache_info[cache_type].cache_size * num_xcc;
-               else if (mode)
-                       pcache->cache_size = pcache_info[cache_type].cache_size / mode;
-               else
-                       pcache->cache_size = pcache_info[cache_type].cache_size;
+               pcache->cache_size = pcache_info[cache_type].cache_size;
+               /* Partition mode only affects L3 cache size */
+               if (mode && pcache->cache_level == 3)
+                       pcache->cache_size /= mode;
 
                if (pcache_info[cache_type].flags & CRAT_CACHE_FLAGS_DATA_CACHE)
                        pcache->cache_type |= HSA_CACHE_TYPE_DATA;
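
Net effect: only the L3 size is scaled by the memory partition mode now (the old code also multiplied the L2 size by num_xcc), so with hypothetical numbers a 64 MB L3 under a 4-way partition mode is reported as 64 / 4 = 16 MB per node.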
index d4f525b66a09055909e163b815beee357f28d19d..1a9bbb04bd5e2c7fb9d29b5c7f2e1d0cd92d978c 100644 (file)
@@ -272,6 +272,7 @@ static int dm_crtc_get_scanoutpos(struct amdgpu_device *adev, int crtc,
 {
        u32 v_blank_start, v_blank_end, h_position, v_position;
        struct amdgpu_crtc *acrtc = NULL;
+       struct dc *dc = adev->dm.dc;
 
        if ((crtc < 0) || (crtc >= adev->mode_info.num_crtc))
                return -EINVAL;
@@ -284,6 +285,9 @@ static int dm_crtc_get_scanoutpos(struct amdgpu_device *adev, int crtc,
                return 0;
        }
 
+       if (dc && dc->caps.ips_support && dc->idle_optimizations_allowed)
+               dc_allow_idle_optimizations(dc, false);
+
        /*
         * TODO rework base driver to use values directly.
         * for now parse it back into reg-format
@@ -1715,7 +1719,10 @@ static int amdgpu_dm_init(struct amdgpu_device *adev)
        init_data.nbio_reg_offsets = adev->reg_offset[NBIO_HWIP][0];
        init_data.clk_reg_offsets = adev->reg_offset[CLK_HWIP][0];
 
-       init_data.flags.disable_ips = DMUB_IPS_DISABLE_ALL;
+       if (amdgpu_dc_debug_mask & DC_DISABLE_IPS)
+               init_data.flags.disable_ips = DMUB_IPS_DISABLE_ALL;
+
+       init_data.flags.disable_ips_in_vpb = 1;
 
        /* Enable DWB for tested platforms only */
        if (amdgpu_ip_version(adev, DCE_HWIP, 0) >= IP_VERSION(3, 0, 0))
@@ -1836,21 +1843,12 @@ static int amdgpu_dm_init(struct amdgpu_device *adev)
                        DRM_ERROR("amdgpu: fail to register dmub aux callback");
                        goto error;
                }
-               if (!register_dmub_notify_callback(adev, DMUB_NOTIFICATION_HPD, dmub_hpd_callback, true)) {
-                       DRM_ERROR("amdgpu: fail to register dmub hpd callback");
-                       goto error;
-               }
-               if (!register_dmub_notify_callback(adev, DMUB_NOTIFICATION_HPD_IRQ, dmub_hpd_callback, true)) {
-                       DRM_ERROR("amdgpu: fail to register dmub hpd callback");
-                       goto error;
-               }
-       }
-
-       /* Enable outbox notification only after IRQ handlers are registered and DMUB is alive.
-        * It is expected that DMUB will resend any pending notifications at this point, for
-        * example HPD from DPIA.
-        */
-       if (dc_is_dmub_outbox_supported(adev->dm.dc)) {
+               /* Enable outbox notification only after IRQ handlers are registered and DMUB is alive.
+                * It is expected that DMUB will resend any pending notifications at this point. Note
+                * that hpd and hpd_irq handler registration are deferred to register_hpd_handlers() to
+                * align with the legacy interface initialization sequence. Connection status will be
+                * proactively detected once in amdgpu_dm_initialize_drm_device().
+                */
                dc_enable_dmub_outbox(adev->dm.dc);
 
                /* DPIA trace goes to dmesg logs only if outbox is enabled */
@@ -1949,7 +1947,7 @@ static void amdgpu_dm_fini(struct amdgpu_device *adev)
                                      &adev->dm.dmub_bo_gpu_addr,
                                      &adev->dm.dmub_bo_cpu_addr);
 
-       if (adev->dm.hpd_rx_offload_wq) {
+       if (adev->dm.hpd_rx_offload_wq && adev->dm.dc) {
                for (i = 0; i < adev->dm.dc->caps.max_links; i++) {
                        if (adev->dm.hpd_rx_offload_wq[i].wq) {
                                destroy_workqueue(adev->dm.hpd_rx_offload_wq[i].wq);
@@ -2280,6 +2278,7 @@ static int dm_sw_fini(void *handle)
 
        if (adev->dm.dmub_srv) {
                dmub_srv_destroy(adev->dm.dmub_srv);
+               kfree(adev->dm.dmub_srv);
                adev->dm.dmub_srv = NULL;
        }
 
@@ -3529,6 +3528,14 @@ static void register_hpd_handlers(struct amdgpu_device *adev)
        int_params.requested_polarity = INTERRUPT_POLARITY_DEFAULT;
        int_params.current_polarity = INTERRUPT_POLARITY_DEFAULT;
 
+       if (dc_is_dmub_outbox_supported(adev->dm.dc)) {
+               if (!register_dmub_notify_callback(adev, DMUB_NOTIFICATION_HPD, dmub_hpd_callback, true))
+                       DRM_ERROR("amdgpu: fail to register dmub hpd callback");
+
+               if (!register_dmub_notify_callback(adev, DMUB_NOTIFICATION_HPD_IRQ, dmub_hpd_callback, true))
+                       DRM_ERROR("amdgpu: fail to register dmub hpd callback");
+       }
+
        list_for_each_entry(connector,
                        &dev->mode_config.connector_list, head) {
 
@@ -3557,10 +3564,6 @@ static void register_hpd_handlers(struct amdgpu_device *adev)
                                        handle_hpd_rx_irq,
                                        (void *) aconnector);
                }
-
-               if (adev->dm.hpd_rx_offload_wq)
-                       adev->dm.hpd_rx_offload_wq[connector->index].aconnector =
-                               aconnector;
        }
 }
 
@@ -4554,6 +4557,10 @@ static int amdgpu_dm_initialize_drm_device(struct amdgpu_device *adev)
                        goto fail;
                }
 
+               if (dm->hpd_rx_offload_wq)
+                       dm->hpd_rx_offload_wq[aconnector->base.index].aconnector =
+                               aconnector;
+
                if (!dc_link_detect_connection_type(link, &new_connection_type))
                        DRM_ERROR("KMS: Failed to detect connector\n");
 
@@ -5212,6 +5219,7 @@ static void fill_dc_dirty_rects(struct drm_plane *plane,
                                struct drm_plane_state *new_plane_state,
                                struct drm_crtc_state *crtc_state,
                                struct dc_flip_addrs *flip_addrs,
+                               bool is_psr_su,
                                bool *dirty_regions_changed)
 {
        struct dm_crtc_state *dm_crtc_state = to_dm_crtc_state(crtc_state);
@@ -5236,6 +5244,10 @@ static void fill_dc_dirty_rects(struct drm_plane *plane,
        num_clips = drm_plane_get_damage_clips_count(new_plane_state);
        clips = drm_plane_get_damage_clips(new_plane_state);
 
+       if (num_clips && (!amdgpu_damage_clips || (amdgpu_damage_clips < 0 &&
+                                                  is_psr_su)))
+               goto ffu;
+
        if (!dm_crtc_state->mpo_requested) {
                if (!num_clips || num_clips > DC_MAX_DIRTY_RECTS)
                        goto ffu;
@@ -6187,7 +6199,9 @@ create_stream_for_sink(struct drm_connector *connector,
                if (recalculate_timing) {
                        freesync_mode = get_highest_refresh_rate_mode(aconnector, false);
                        drm_mode_copy(&saved_mode, &mode);
+                       saved_mode.picture_aspect_ratio = mode.picture_aspect_ratio;
                        drm_mode_copy(&mode, freesync_mode);
+                       mode.picture_aspect_ratio = saved_mode.picture_aspect_ratio;
                } else {
                        decide_crtc_timing_for_drm_display_mode(
                                        &mode, preferred_mode, scale);
@@ -6520,10 +6534,15 @@ amdgpu_dm_connector_late_register(struct drm_connector *connector)
 static void amdgpu_dm_connector_funcs_force(struct drm_connector *connector)
 {
        struct amdgpu_dm_connector *aconnector = to_amdgpu_dm_connector(connector);
-       struct amdgpu_connector *amdgpu_connector = to_amdgpu_connector(connector);
        struct dc_link *dc_link = aconnector->dc_link;
        struct dc_sink *dc_em_sink = aconnector->dc_em_sink;
        struct edid *edid;
+       struct i2c_adapter *ddc;
+
+       if (dc_link && dc_link->aux_mode)
+               ddc = &aconnector->dm_dp_aux.aux.ddc;
+       else
+               ddc = &aconnector->i2c->base;
 
        /*
         * Note: drm_get_edid gets edid in the following order:
@@ -6531,7 +6550,7 @@ static void amdgpu_dm_connector_funcs_force(struct drm_connector *connector)
         * 2) firmware EDID if set via edid_firmware module parameter
         * 3) regular DDC read.
         */
-       edid = drm_get_edid(connector, &amdgpu_connector->ddc_bus->aux.ddc);
+       edid = drm_get_edid(connector, ddc);
        if (!edid) {
                DRM_ERROR("No EDID found on connector: %s.\n", connector->name);
                return;
@@ -6572,12 +6591,18 @@ static int get_modes(struct drm_connector *connector)
 static void create_eml_sink(struct amdgpu_dm_connector *aconnector)
 {
        struct drm_connector *connector = &aconnector->base;
-       struct amdgpu_connector *amdgpu_connector = to_amdgpu_connector(&aconnector->base);
+       struct dc_link *dc_link = aconnector->dc_link;
        struct dc_sink_init_data init_params = {
                        .link = aconnector->dc_link,
                        .sink_signal = SIGNAL_TYPE_VIRTUAL
        };
        struct edid *edid;
+       struct i2c_adapter *ddc;
+
+       if (dc_link->aux_mode)
+               ddc = &aconnector->dm_dp_aux.aux.ddc;
+       else
+               ddc = &aconnector->i2c->base;
 
        /*
         * Note: drm_get_edid gets edid in the following order:
@@ -6585,7 +6610,7 @@ static void create_eml_sink(struct amdgpu_dm_connector *aconnector)
         * 2) firmware EDID if set via edid_firmware module parameter
         * 3) regular DDC read.
         */
-       edid = drm_get_edid(connector, &amdgpu_connector->ddc_bus->aux.ddc);
+       edid = drm_get_edid(connector, ddc);
        if (!edid) {
                DRM_ERROR("No EDID found on connector: %s.\n", connector->name);
                return;
@@ -8291,6 +8316,8 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
                        fill_dc_dirty_rects(plane, old_plane_state,
                                            new_plane_state, new_crtc_state,
                                            &bundle->flip_addrs[planes_count],
+                                           acrtc_state->stream->link->psr_settings.psr_version ==
+                                           DC_PSR_VERSION_SU_1,
                                            &dirty_rects_changed);
 
                        /*
@@ -8976,16 +9003,8 @@ static void amdgpu_dm_atomic_commit_tail(struct drm_atomic_state *state)
 
        trace_amdgpu_dm_atomic_commit_tail_begin(state);
 
-       if (dm->dc->caps.ips_support) {
-               for_each_oldnew_connector_in_state(state, connector, old_con_state, new_con_state, i) {
-                       if (new_con_state->crtc &&
-                               new_con_state->crtc->state->active &&
-                               drm_atomic_crtc_needs_modeset(new_con_state->crtc->state)) {
-                               dc_dmub_srv_apply_idle_power_optimizations(dm->dc, false);
-                               break;
-                       }
-               }
-       }
+       if (dm->dc->caps.ips_support && dm->dc->idle_optimizations_allowed)
+               dc_allow_idle_optimizations(dm->dc, false);
 
        drm_atomic_helper_update_legacy_modeset_state(dev, state);
        drm_dp_mst_atomic_wait_for_dependencies(state);
@@ -9188,6 +9207,10 @@ static void amdgpu_dm_atomic_commit_tail(struct drm_atomic_state *state)
                 * To fix this, DC should permit updating only stream properties.
                 */
                dummy_updates = kzalloc(sizeof(struct dc_surface_update) * MAX_SURFACES, GFP_ATOMIC);
+               if (!dummy_updates) {
+                       DRM_ERROR("Failed to allocate memory for dummy_updates.\n");
+                       continue;
+               }
                for (j = 0; j < status->plane_count; j++)
                        dummy_updates[j].surface = status->plane_states[0];
 
@@ -10728,11 +10751,13 @@ static int amdgpu_dm_atomic_check(struct drm_device *dev,
                        goto fail;
                }
 
-               ret = compute_mst_dsc_configs_for_state(state, dm_state->context, vars);
-               if (ret) {
-                       DRM_DEBUG_DRIVER("compute_mst_dsc_configs_for_state() failed\n");
-                       ret = -EINVAL;
-                       goto fail;
+               if (dc_resource_is_dsc_encoding_supported(dc)) {
+                       ret = compute_mst_dsc_configs_for_state(state, dm_state->context, vars);
+                       if (ret) {
+                               DRM_DEBUG_DRIVER("compute_mst_dsc_configs_for_state() failed\n");
+                               ret = -EINVAL;
+                               goto fail;
+                       }
                }
 
                ret = dm_update_mst_vcpi_slots_for_dsc(state, dm_state->context, vars);
@@ -11144,14 +11169,23 @@ void amdgpu_dm_update_freesync_caps(struct drm_connector *connector,
                                if (range->flags != 1)
                                        continue;
 
-                               amdgpu_dm_connector->min_vfreq = range->min_vfreq;
-                               amdgpu_dm_connector->max_vfreq = range->max_vfreq;
-                               amdgpu_dm_connector->pixel_clock_mhz =
-                                       range->pixel_clock_mhz * 10;
-
                                connector->display_info.monitor_range.min_vfreq = range->min_vfreq;
                                connector->display_info.monitor_range.max_vfreq = range->max_vfreq;
 
+                               if (edid->revision >= 4) {
+                                       if (data->pad2 & DRM_EDID_RANGE_OFFSET_MIN_VFREQ)
+                                               connector->display_info.monitor_range.min_vfreq += 255;
+                                       if (data->pad2 & DRM_EDID_RANGE_OFFSET_MAX_VFREQ)
+                                               connector->display_info.monitor_range.max_vfreq += 255;
+                               }
+
+                               amdgpu_dm_connector->min_vfreq =
+                                       connector->display_info.monitor_range.min_vfreq;
+                               amdgpu_dm_connector->max_vfreq =
+                                       connector->display_info.monitor_range.max_vfreq;
+                               amdgpu_dm_connector->pixel_clock_mhz =
+                                       range->pixel_clock_mhz * 10;
+
                                break;
                        }
 
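
EDID 1.4 stores the monitor range limits as single bytes; offset flag bits extend the encodable range by adding 255 to the stored value, which is what the hunk above applies for revision >= 4. A small sketch of the arithmetic (the flag bit positions here are illustrative, the real definitions live in include/drm/drm_edid.h):

#include <stdint.h>
#include <stdio.h>

#define RANGE_OFFSET_MIN_VFREQ (1 << 0) /* illustrative bit position */
#define RANGE_OFFSET_MAX_VFREQ (1 << 1) /* illustrative bit position */

static unsigned int effective_vfreq(uint8_t stored, uint8_t flags, uint8_t bit)
{
        /* one stored byte (0..255) plus an optional +255 offset */
        return stored + ((flags & bit) ? 255 : 0);
}

int main(void)
{
        uint8_t flags = RANGE_OFFSET_MAX_VFREQ;

        printf("min_vfreq = %u\n", effective_vfreq(48, flags, RANGE_OFFSET_MIN_VFREQ));
        printf("max_vfreq = %u\n", effective_vfreq(144, flags, RANGE_OFFSET_MAX_VFREQ));
        return 0;
}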
index 85b7f58a7f35a478f551ec097b1613b504ced535..c27063305a1341c677c95e91dd49eb4fca1ea94a 100644 (file)
@@ -67,6 +67,8 @@ static void apply_edid_quirks(struct edid *edid, struct dc_edid_caps *edid_caps)
        /* Workaround for some monitors that do not clear DPCD 0x317 if FreeSync is unsupported */
        case drm_edid_encode_panel_id('A', 'U', 'O', 0xA7AB):
        case drm_edid_encode_panel_id('A', 'U', 'O', 0xE69B):
+       case drm_edid_encode_panel_id('B', 'O', 'E', 0x092A):
+       case drm_edid_encode_panel_id('L', 'G', 'D', 0x06D1):
                DRM_DEBUG_DRIVER("Clearing DPCD 0x317 on monitor with panel id %X\n", panel_id);
                edid_caps->panel_patch.remove_sink_ext_caps = true;
                break;
@@ -120,6 +122,8 @@ enum dc_edid_status dm_helpers_parse_edid_caps(
 
        edid_caps->edid_hdmi = connector->display_info.is_hdmi;
 
+       apply_edid_quirks(edid_buf, edid_caps);
+
        sad_count = drm_edid_to_sad((struct edid *) edid->raw_edid, &sads);
        if (sad_count <= 0)
                return result;
@@ -146,8 +150,6 @@ enum dc_edid_status dm_helpers_parse_edid_caps(
        else
                edid_caps->speaker_flags = DEFAULT_SPEAKER_LOCATION;
 
-       apply_edid_quirks(edid_buf, edid_caps);
-
        kfree(sads);
        kfree(sadb);
 
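
The two hunks above move apply_edid_quirks() ahead of the SAD parsing, so the quirks still take effect when drm_edid_to_sad() triggers the early return. A hedged sketch of why the ordering matters, with hypothetical helper names:

#include <stdio.h>

struct caps { int remove_sink_ext_caps; };

static void apply_quirks(struct caps *c)
{
        c->remove_sink_ext_caps = 1; /* panel-specific workaround */
}

static int parse_caps(struct caps *c, int sad_count)
{
        apply_quirks(c); /* run before any early return below */

        if (sad_count <= 0)
                return 0; /* this early exit no longer skips the quirks */

        /* ... speaker allocation parsing ... */
        return 1;
}

int main(void)
{
        struct caps c = { 0 };

        parse_caps(&c, 0);
        printf("quirk applied: %d\n", c.remove_sink_ext_caps);
        return 0;
}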
index 58b880acb087ae73352e9ac487d5ccd033f03f4d..3390f0d8420a05dc6d9daae1f3d8c4f53de57aed 100644 (file)
@@ -711,7 +711,7 @@ static inline int dm_irq_state(struct amdgpu_device *adev,
 {
        bool st;
        enum dc_irq_source irq_source;
-
+       struct dc *dc = adev->dm.dc;
        struct amdgpu_crtc *acrtc = adev->mode_info.crtcs[crtc_id];
 
        if (!acrtc) {
@@ -729,6 +729,9 @@ static inline int dm_irq_state(struct amdgpu_device *adev,
 
        st = (state == AMDGPU_IRQ_STATE_ENABLE);
 
+       if (dc && dc->caps.ips_support && dc->idle_optimizations_allowed)
+               dc_allow_idle_optimizations(dc, false);
+
        dc_interrupt_set(adev->dm.dc, irq_source, st);
        return 0;
 }
index f2dfa96f9ef5d9e4805fdbf592cac078efa391a5..39530b2ea4957cc0a6718f322158e101f15431d0 100644 (file)
@@ -94,7 +94,7 @@ static void calculate_bandwidth(
        const uint32_t s_high = 7;
        const uint32_t dmif_chunk_buff_margin = 1;
 
-       uint32_t max_chunks_fbc_mode;
+       uint32_t max_chunks_fbc_mode = 0;
        int32_t num_cursor_lines;
 
        int32_t i, j, k;
index 960c4b4f6ddf3670156abd99cc0a02aeb176c7dc..05f392501c0ae3572250061b31defef7cde51fb5 100644 (file)
@@ -1850,19 +1850,21 @@ static enum bp_result get_firmware_info_v3_2(
                /* Vega12 */
                smu_info_v3_2 = GET_IMAGE(struct atom_smu_info_v3_2,
                                                        DATA_TABLES(smu_info));
-               DC_LOG_BIOS("gpuclk_ss_percentage (unit of 0.001 percent): %d\n", smu_info_v3_2->gpuclk_ss_percentage);
                if (!smu_info_v3_2)
                        return BP_RESULT_BADBIOSTABLE;
 
+               DC_LOG_BIOS("gpuclk_ss_percentage (unit of 0.001 percent): %d\n", smu_info_v3_2->gpuclk_ss_percentage);
+
                info->default_engine_clk = smu_info_v3_2->bootup_dcefclk_10khz * 10;
        } else if (revision.minor == 3) {
                /* Vega20 */
                smu_info_v3_3 = GET_IMAGE(struct atom_smu_info_v3_3,
                                                        DATA_TABLES(smu_info));
-               DC_LOG_BIOS("gpuclk_ss_percentage (unit of 0.001 percent): %d\n", smu_info_v3_3->gpuclk_ss_percentage);
                if (!smu_info_v3_3)
                        return BP_RESULT_BADBIOSTABLE;
 
+               DC_LOG_BIOS("gpuclk_ss_percentage (unit of 0.001 percent): %d\n", smu_info_v3_3->gpuclk_ss_percentage);
+
                info->default_engine_clk = smu_info_v3_3->bootup_dcefclk_10khz * 10;
        }
 
@@ -2422,10 +2424,11 @@ static enum bp_result get_integrated_info_v11(
        info_v11 = GET_IMAGE(struct atom_integrated_system_info_v1_11,
                                        DATA_TABLES(integratedsysteminfo));
 
-       DC_LOG_BIOS("gpuclk_ss_percentage (unit of 0.001 percent): %d\n", info_v11->gpuclk_ss_percentage);
        if (info_v11 == NULL)
                return BP_RESULT_BADBIOSTABLE;
 
+       DC_LOG_BIOS("gpuclk_ss_percentage (unit of 0.001 percent): %d\n", info_v11->gpuclk_ss_percentage);
+
        info->gpu_cap_info =
        le32_to_cpu(info_v11->gpucapinfo);
        /*
@@ -2637,11 +2640,12 @@ static enum bp_result get_integrated_info_v2_1(
 
        info_v2_1 = GET_IMAGE(struct atom_integrated_system_info_v2_1,
                                        DATA_TABLES(integratedsysteminfo));
-       DC_LOG_BIOS("gpuclk_ss_percentage (unit of 0.001 percent): %d\n", info_v2_1->gpuclk_ss_percentage);
 
        if (info_v2_1 == NULL)
                return BP_RESULT_BADBIOSTABLE;
 
+       DC_LOG_BIOS("gpuclk_ss_percentage (unit of 0.001 percent): %d\n", info_v2_1->gpuclk_ss_percentage);
+
        info->gpu_cap_info =
        le32_to_cpu(info_v2_1->gpucapinfo);
        /*
@@ -2799,11 +2803,11 @@ static enum bp_result get_integrated_info_v2_2(
        info_v2_2 = GET_IMAGE(struct atom_integrated_system_info_v2_2,
                                        DATA_TABLES(integratedsysteminfo));
 
-       DC_LOG_BIOS("gpuclk_ss_percentage (unit of 0.001 percent): %d\n", info_v2_2->gpuclk_ss_percentage);
-
        if (info_v2_2 == NULL)
                return BP_RESULT_BADBIOSTABLE;
 
+       DC_LOG_BIOS("gpuclk_ss_percentage (unit of 0.001 percent): %d\n", info_v2_2->gpuclk_ss_percentage);
+
        info->gpu_cap_info =
        le32_to_cpu(info_v2_2->gpucapinfo);
        /*
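
All four bios_parser hunks above fix the same use-before-NULL-check ordering: the GET_IMAGE() result was dereferenced for logging before being validated. A minimal self-contained reproduction of the corrected order (get_image() is a stand-in for the driver's GET_IMAGE() macro):

#include <stdio.h>

struct smu_info { int gpuclk_ss_percentage; };

/* Stands in for GET_IMAGE(); may return NULL on a bad BIOS table. */
static struct smu_info *get_image(int ok)
{
        static struct smu_info info = { .gpuclk_ss_percentage = 42 };
        return ok ? &info : NULL;
}

int main(void)
{
        struct smu_info *info = get_image(1);

        if (!info)          /* validate first ... */
                return 1;   /* BP_RESULT_BADBIOSTABLE in the driver */

        /* ... and only then dereference for logging */
        printf("gpuclk_ss_percentage: %d\n", info->gpuclk_ss_percentage);
        return 0;
}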
index a5489fe6875f453149d622d59e9b6417b4db616c..aa9fd1dc550a5e8b2142cfb10db96f4779bdc788 100644 (file)
@@ -546,6 +546,8 @@ static unsigned int find_dcfclk_for_voltage(const struct vg_dpm_clocks *clock_ta
        int i;
 
        for (i = 0; i < VG_NUM_SOC_VOLTAGE_LEVELS; i++) {
+               if (i >= VG_NUM_DCFCLK_DPM_LEVELS)
+                       break;
                if (clock_table->SocVoltage[i] == voltage)
                        return clock_table->DcfClocks[i];
        }
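
The voltage table and the DCF clock table have different capacities, so the hunk above caps the loop at the smaller one before indexing. A minimal sketch with hypothetical sizes:

#include <stdio.h>

enum { NUM_VOLTAGE_LEVELS = 8, NUM_DCFCLK_LEVELS = 4 }; /* hypothetical */

static unsigned int find_dcfclk(const unsigned int volt[NUM_VOLTAGE_LEVELS],
                                const unsigned int clk[NUM_DCFCLK_LEVELS],
                                unsigned int voltage)
{
        int i;

        for (i = 0; i < NUM_VOLTAGE_LEVELS; i++) {
                if (i >= NUM_DCFCLK_LEVELS)
                        break; /* never index past the smaller table */
                if (volt[i] == voltage)
                        return clk[i];
        }
        return 0;
}

int main(void)
{
        unsigned int volt[NUM_VOLTAGE_LEVELS] = { 700, 750, 800, 850, 900, 950, 1000, 1050 };
        unsigned int clk[NUM_DCFCLK_LEVELS] = { 400, 600, 800, 1000 };

        printf("dcfclk = %u\n", find_dcfclk(volt, clk, 800));
        return 0;
}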
index 9c660d1facc7699d7a1b3f90292ae31d985fd259..e648902592358ff08ca3a536d9f0abe56bfe5e34 100644 (file)
@@ -437,32 +437,32 @@ static struct wm_table ddr5_wm_table = {
                        .wm_inst = WM_A,
                        .wm_type = WM_TYPE_PSTATE_CHG,
                        .pstate_latency_us = 11.72,
-                       .sr_exit_time_us = 14.0,
-                       .sr_enter_plus_exit_time_us = 16.0,
+                       .sr_exit_time_us = 28.0,
+                       .sr_enter_plus_exit_time_us = 30.0,
                        .valid = true,
                },
                {
                        .wm_inst = WM_B,
                        .wm_type = WM_TYPE_PSTATE_CHG,
                        .pstate_latency_us = 11.72,
-                       .sr_exit_time_us = 14.0,
-                       .sr_enter_plus_exit_time_us = 16.0,
+                       .sr_exit_time_us = 28.0,
+                       .sr_enter_plus_exit_time_us = 30.0,
                        .valid = true,
                },
                {
                        .wm_inst = WM_C,
                        .wm_type = WM_TYPE_PSTATE_CHG,
                        .pstate_latency_us = 11.72,
-                       .sr_exit_time_us = 14.0,
-                       .sr_enter_plus_exit_time_us = 16.0,
+                       .sr_exit_time_us = 28.0,
+                       .sr_enter_plus_exit_time_us = 30.0,
                        .valid = true,
                },
                {
                        .wm_inst = WM_D,
                        .wm_type = WM_TYPE_PSTATE_CHG,
                        .pstate_latency_us = 11.72,
-                       .sr_exit_time_us = 14.0,
-                       .sr_enter_plus_exit_time_us = 16.0,
+                       .sr_exit_time_us = 28.0,
+                       .sr_enter_plus_exit_time_us = 30.0,
                        .valid = true,
                },
        }
@@ -474,32 +474,32 @@ static struct wm_table lpddr5_wm_table = {
                        .wm_inst = WM_A,
                        .wm_type = WM_TYPE_PSTATE_CHG,
                        .pstate_latency_us = 11.65333,
-                       .sr_exit_time_us = 14.0,
-                       .sr_enter_plus_exit_time_us = 16.0,
+                       .sr_exit_time_us = 28.0,
+                       .sr_enter_plus_exit_time_us = 30.0,
                        .valid = true,
                },
                {
                        .wm_inst = WM_B,
                        .wm_type = WM_TYPE_PSTATE_CHG,
                        .pstate_latency_us = 11.65333,
-                       .sr_exit_time_us = 14.0,
-                       .sr_enter_plus_exit_time_us = 16.0,
+                       .sr_exit_time_us = 28.0,
+                       .sr_enter_plus_exit_time_us = 30.0,
                        .valid = true,
                },
                {
                        .wm_inst = WM_C,
                        .wm_type = WM_TYPE_PSTATE_CHG,
                        .pstate_latency_us = 11.65333,
-                       .sr_exit_time_us = 14.0,
-                       .sr_enter_plus_exit_time_us = 16.0,
+                       .sr_exit_time_us = 28.0,
+                       .sr_enter_plus_exit_time_us = 30.0,
                        .valid = true,
                },
                {
                        .wm_inst = WM_D,
                        .wm_type = WM_TYPE_PSTATE_CHG,
                        .pstate_latency_us = 11.65333,
-                       .sr_exit_time_us = 14.0,
-                       .sr_enter_plus_exit_time_us = 16.0,
+                       .sr_exit_time_us = 28.0,
+                       .sr_enter_plus_exit_time_us = 30.0,
                        .valid = true,
                },
        }
@@ -655,10 +655,13 @@ static void dcn35_clk_mgr_helper_populate_bw_params(struct clk_mgr_internal *clk
        struct clk_limit_table_entry def_max = bw_params->clk_table.entries[bw_params->clk_table.num_entries - 1];
        uint32_t max_fclk = 0, min_pstate = 0, max_dispclk = 0, max_dppclk = 0;
        uint32_t max_pstate = 0, max_dram_speed_mts = 0, min_dram_speed_mts = 0;
+       uint32_t num_memps, num_fclk, num_dcfclk;
        int i;
 
        /* Determine min/max p-state values. */
-       for (i = 0; i < clock_table->NumMemPstatesEnabled; i++) {
+       num_memps = (clock_table->NumMemPstatesEnabled > NUM_MEM_PSTATE_LEVELS) ? NUM_MEM_PSTATE_LEVELS :
+               clock_table->NumMemPstatesEnabled;
+       for (i = 0; i < num_memps; i++) {
                uint32_t dram_speed_mts = calc_dram_speed_mts(&clock_table->MemPstateTable[i]);
 
                if (is_valid_clock_value(dram_speed_mts) && dram_speed_mts > max_dram_speed_mts) {
@@ -670,7 +673,7 @@ static void dcn35_clk_mgr_helper_populate_bw_params(struct clk_mgr_internal *clk
        min_dram_speed_mts = max_dram_speed_mts;
        min_pstate = max_pstate;
 
-       for (i = 0; i < clock_table->NumMemPstatesEnabled; i++) {
+       for (i = 0; i < num_memps; i++) {
                uint32_t dram_speed_mts = calc_dram_speed_mts(&clock_table->MemPstateTable[i]);
 
                if (is_valid_clock_value(dram_speed_mts) && dram_speed_mts < min_dram_speed_mts) {
@@ -699,9 +702,13 @@ static void dcn35_clk_mgr_helper_populate_bw_params(struct clk_mgr_internal *clk
        /* Base the clock table on dcfclk, need at least one entry regardless of pmfw table */
        ASSERT(clock_table->NumDcfClkLevelsEnabled > 0);
 
-       max_fclk = find_max_clk_value(clock_table->FclkClocks_Freq, clock_table->NumFclkLevelsEnabled);
+       num_fclk = (clock_table->NumFclkLevelsEnabled > NUM_FCLK_DPM_LEVELS) ? NUM_FCLK_DPM_LEVELS :
+               clock_table->NumFclkLevelsEnabled;
+       max_fclk = find_max_clk_value(clock_table->FclkClocks_Freq, num_fclk);
 
-       for (i = 0; i < clock_table->NumDcfClkLevelsEnabled; i++) {
+       num_dcfclk = (clock_table->NumDcfClkLevelsEnabled > NUM_DCFCLK_DPM_LEVELS) ? NUM_DCFCLK_DPM_LEVELS :
+               clock_table->NumDcfClkLevelsEnabled;
+       for (i = 0; i < num_dcfclk; i++) {
                int j;
 
                /* First search defaults for the clocks we don't read using closest lower or equal default dcfclk */
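
The hunks above clamp each firmware-reported level count to the capacity of the corresponding driver-side table before looping over it. The shape of the guard, sketched with a hypothetical capacity and an untrusted PMFW value:

#include <stdio.h>

/* Clamp a firmware-reported count to the driver table capacity. */
static unsigned int clamp_levels(unsigned int reported, unsigned int capacity)
{
        return reported > capacity ? capacity : reported;
}

int main(void)
{
        enum { NUM_MEM_PSTATE_LEVELS = 4 }; /* hypothetical capacity */
        unsigned int reported = 7;          /* untrusted PMFW value */
        unsigned int n = clamp_levels(reported, NUM_MEM_PSTATE_LEVELS);
        unsigned int i;

        for (i = 0; i < n; i++)
                printf("pstate %u\n", i);
        return 0;
}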
index aa7c02ba948e9ce63aa84eb7518f9c73c80d107a..2c424e435962d4ddd73648aeb3b531ad1bd7aa92 100644 (file)
@@ -3817,7 +3817,9 @@ static void commit_planes_for_stream(struct dc *dc,
                 * programming has completed (we turn on phantom OTG in order
                 * to complete the plane disable for phantom pipes).
                 */
-               dc->hwss.apply_ctx_to_hw(dc, context);
+
+               if (dc->hwss.disable_phantom_streams)
+                       dc->hwss.disable_phantom_streams(dc, context);
        }
 
        if (update_type != UPDATE_TYPE_FAST)
index 88c6436b28b69ca7f4791bdc47404cd5f73a5f83..180ac47868c22a68c1af47096db95ecf6b11994c 100644 (file)
@@ -291,11 +291,14 @@ void dc_state_destruct(struct dc_state *state)
                dc_stream_release(state->phantom_streams[i]);
                state->phantom_streams[i] = NULL;
        }
+       state->phantom_stream_count = 0;
 
        for (i = 0; i < state->phantom_plane_count; i++) {
                dc_plane_state_release(state->phantom_planes[i]);
                state->phantom_planes[i] = NULL;
        }
+       state->phantom_plane_count = 0;
+
        state->stream_mask = 0;
        memset(&state->res_ctx, 0, sizeof(state->res_ctx));
        memset(&state->pp_display_cfg, 0, sizeof(state->pp_display_cfg));
index 5d7aa882416b3435a5dcfbaf502a9f326981bc81..c9317ea0258ea1cb2f686830fddc7469158966cc 100644 (file)
@@ -434,6 +434,7 @@ struct dc_config {
        bool EnableMinDispClkODM;
        bool enable_auto_dpm_test_logs;
        unsigned int disable_ips;
+       unsigned int disable_ips_in_vpb;
 };
 
 enum visual_confirm {
index 2b79a0e5638e1b757ea3d3527add517db139552e..363d522603a21744c02e3e3497a2907862b02fd1 100644 (file)
@@ -125,7 +125,7 @@ bool dc_dmub_srv_cmd_list_queue_execute(struct dc_dmub_srv *dc_dmub_srv,
                unsigned int count,
                union dmub_rb_cmd *cmd_list)
 {
-       struct dc_context *dc_ctx = dc_dmub_srv->ctx;
+       struct dc_context *dc_ctx;
        struct dmub_srv *dmub;
        enum dmub_status status;
        int i;
@@ -133,6 +133,7 @@ bool dc_dmub_srv_cmd_list_queue_execute(struct dc_dmub_srv *dc_dmub_srv,
        if (!dc_dmub_srv || !dc_dmub_srv->dmub)
                return false;
 
+       dc_ctx = dc_dmub_srv->ctx;
        dmub = dc_dmub_srv->dmub;
 
        for (i = 0 ; i < count; i++) {
@@ -1161,7 +1162,7 @@ void dc_dmub_srv_subvp_save_surf_addr(const struct dc_dmub_srv *dc_dmub_srv, con
 
 bool dc_dmub_srv_is_hw_pwr_up(struct dc_dmub_srv *dc_dmub_srv, bool wait)
 {
-       struct dc_context *dc_ctx = dc_dmub_srv->ctx;
+       struct dc_context *dc_ctx;
        enum dmub_status status;
 
        if (!dc_dmub_srv || !dc_dmub_srv->dmub)
@@ -1170,6 +1171,8 @@ bool dc_dmub_srv_is_hw_pwr_up(struct dc_dmub_srv *dc_dmub_srv, bool wait)
        if (dc_dmub_srv->ctx->dc->debug.dmcub_emulation)
                return true;
 
+       dc_ctx = dc_dmub_srv->ctx;
+
        if (wait) {
                if (dc_dmub_srv->ctx->dc->debug.disable_timeout) {
                        do {
index b08ccb8c68bc366386e82a566c452459da0aabdc..9900dda2eef5cd2e44e6dbd008cd411194d107af 100644 (file)
@@ -1034,6 +1034,7 @@ enum replay_FW_Message_type {
        Replay_Msg_Not_Support = -1,
        Replay_Set_Timing_Sync_Supported,
        Replay_Set_Residency_Frameupdate_Timer,
+       Replay_Set_Pseudo_VTotal,
 };
 
 union replay_error_status {
@@ -1089,6 +1090,10 @@ struct replay_settings {
        uint16_t coasting_vtotal_table[PR_COASTING_TYPE_NUM];
        /* Maximum link off frame count */
        enum replay_link_off_frame_count_level link_off_frame_count_level;
+       /* Replay pseudo vtotal for ABM + IPS on full-screen video, which can improve IPS residency */
+       uint16_t abm_with_ips_on_full_screen_video_pseudo_vtotal;
+       /* Replay last pseudo vtotal set to DMUB */
+       uint16_t last_pseudo_vtotal;
 };
 
 /* To split out "global" and "per-panel" config settings.
index e8570060d007ba5bab0db3b3395aca2b9c487573..5bca67407c5b16b682ed669ef2b6382be7965b1b 100644 (file)
@@ -290,4 +290,5 @@ void dce_panel_cntl_construct(
        dce_panel_cntl->base.funcs = &dce_link_panel_cntl_funcs;
        dce_panel_cntl->base.ctx = init_data->ctx;
        dce_panel_cntl->base.inst = init_data->inst;
+       dce_panel_cntl->base.pwrseq_inst = 0;
 }
index e43f77c11c00825aad64ada6ddfb4b0bdce23aff..5f97a868ada34734d99a6a35a329d9c3cd3c5ac2 100644 (file)
@@ -56,16 +56,13 @@ static void dpp3_enable_cm_block(
 
 static enum dc_lut_mode dpp30_get_gamcor_current(struct dpp *dpp_base)
 {
-       enum dc_lut_mode mode;
+       enum dc_lut_mode mode = LUT_BYPASS;
        uint32_t state_mode;
        uint32_t lut_mode;
        struct dcn3_dpp *dpp = TO_DCN30_DPP(dpp_base);
 
        REG_GET(CM_GAMCOR_CONTROL, CM_GAMCOR_MODE_CURRENT, &state_mode);
 
-       if (state_mode == 0)
-               mode = LUT_BYPASS;
-
        if (state_mode == 2) {//Programmable RAM LUT
                REG_GET(CM_GAMCOR_CONTROL, CM_GAMCOR_SELECT_CURRENT, &lut_mode);
                if (lut_mode == 0)
index ad0df1a72a90ab4ff13b267f1c69392e68703884..9e96a3ace2077cb53bff30f5984a5391a017d239 100644 (file)
@@ -215,4 +215,5 @@ void dcn301_panel_cntl_construct(
        dcn301_panel_cntl->base.funcs = &dcn301_link_panel_cntl_funcs;
        dcn301_panel_cntl->base.ctx = init_data->ctx;
        dcn301_panel_cntl->base.inst = init_data->inst;
+       dcn301_panel_cntl->base.pwrseq_inst = 0;
 }
index 03248422d6ffde2d6923fb33185bf8dd12607787..281be20b1a1071576a4ca9037ee105333268801e 100644 (file)
@@ -154,8 +154,24 @@ void dcn31_panel_cntl_construct(
        struct dcn31_panel_cntl *dcn31_panel_cntl,
        const struct panel_cntl_init_data *init_data)
 {
+       uint8_t pwrseq_inst = 0xF;
+
        dcn31_panel_cntl->base.funcs = &dcn31_link_panel_cntl_funcs;
        dcn31_panel_cntl->base.ctx = init_data->ctx;
        dcn31_panel_cntl->base.inst = init_data->inst;
-       dcn31_panel_cntl->base.pwrseq_inst = init_data->pwrseq_inst;
+
+       switch (init_data->eng_id) {
+       case ENGINE_ID_DIGA:
+               pwrseq_inst = 0;
+               break;
+       case ENGINE_ID_DIGB:
+               pwrseq_inst = 1;
+               break;
+       default:
+               DC_LOG_WARNING("Unsupported pwrseq engine id: %d!\n", init_data->eng_id);
+               ASSERT(false);
+               break;
+       }
+
+       dcn31_panel_cntl->base.pwrseq_inst = pwrseq_inst;
 }
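
This hunk moves the DIG-engine-to-power-sequencer mapping into the panel control constructor. A reduced sketch of that mapping, with the driver's logging and ASSERT() replaced by a stderr warning:

#include <stdio.h>

enum engine_id { ENGINE_ID_DIGA, ENGINE_ID_DIGB, ENGINE_ID_DIGC };

static unsigned int pwrseq_inst_from_eng_id(enum engine_id eng_id)
{
        switch (eng_id) {
        case ENGINE_ID_DIGA:
                return 0;
        case ENGINE_ID_DIGB:
                return 1;
        default:
                fprintf(stderr, "Unsupported pwrseq engine id: %d!\n", eng_id);
                return 0xF; /* sentinel for "no valid instance" */
        }
}

int main(void)
{
        printf("DIGB -> pwrseq %u\n", pwrseq_inst_from_eng_id(ENGINE_ID_DIGB));
        printf("DIGC -> pwrseq %u\n", pwrseq_inst_from_eng_id(ENGINE_ID_DIGC));
        return 0;
}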
index 501388014855c5a1f830b6a830d9f6eed9bf3224..d761b0df28784afd5d81dfef193dfc11657ddff2 100644 (file)
@@ -203,12 +203,12 @@ void dcn32_link_encoder_construct(
        enc10->base.hpd_source = init_data->hpd_source;
        enc10->base.connector = init_data->connector;
 
-       if (enc10->base.connector.id == CONNECTOR_ID_USBC)
-               enc10->base.features.flags.bits.DP_IS_USB_C = 1;
 
        enc10->base.preferred_engine = ENGINE_ID_UNKNOWN;
 
        enc10->base.features = *enc_features;
+       if (enc10->base.connector.id == CONNECTOR_ID_USBC)
+               enc10->base.features.flags.bits.DP_IS_USB_C = 1;
 
        enc10->base.transmitter = init_data->transmitter;
 
index da94e5309fbaf0f8e06a4a1aad4ce431a8d9f2cc..81e349d5835bbed499f03ef6eb33e5210c83d64b 100644 (file)
@@ -184,8 +184,6 @@ void dcn35_link_encoder_construct(
        enc10->base.hpd_source = init_data->hpd_source;
        enc10->base.connector = init_data->connector;
 
-       if (enc10->base.connector.id == CONNECTOR_ID_USBC)
-               enc10->base.features.flags.bits.DP_IS_USB_C = 1;
 
        enc10->base.preferred_engine = ENGINE_ID_UNKNOWN;
 
@@ -240,6 +238,8 @@ void dcn35_link_encoder_construct(
        }
 
        enc10->base.features.flags.bits.HDMI_6GB_EN = 1;
+       if (enc10->base.connector.id == CONNECTOR_ID_USBC)
+               enc10->base.features.flags.bits.DP_IS_USB_C = 1;
 
        if (bp_funcs->get_connector_speed_cap_info)
                result = bp_funcs->get_connector_speed_cap_info(enc10->base.ctx->dc_bios,
index 6042a5a6a44f8c32187b2bea702892572f08ec57..59ade76ffb18d56f26a6b329b850462150214c04 100644 (file)
@@ -72,11 +72,11 @@ CFLAGS_$(AMDDALPATH)/dc/dml/display_mode_lib.o := $(dml_ccflags)
 CFLAGS_$(AMDDALPATH)/dc/dml/display_mode_vba.o := $(dml_ccflags)
 CFLAGS_$(AMDDALPATH)/dc/dml/dcn10/dcn10_fpu.o := $(dml_ccflags)
 CFLAGS_$(AMDDALPATH)/dc/dml/dcn20/dcn20_fpu.o := $(dml_ccflags)
-CFLAGS_$(AMDDALPATH)/dc/dml/dcn20/display_mode_vba_20.o := $(dml_ccflags)
+CFLAGS_$(AMDDALPATH)/dc/dml/dcn20/display_mode_vba_20.o := $(dml_ccflags) $(frame_warn_flag)
 CFLAGS_$(AMDDALPATH)/dc/dml/dcn20/display_rq_dlg_calc_20.o := $(dml_ccflags)
-CFLAGS_$(AMDDALPATH)/dc/dml/dcn20/display_mode_vba_20v2.o := $(dml_ccflags)
+CFLAGS_$(AMDDALPATH)/dc/dml/dcn20/display_mode_vba_20v2.o := $(dml_ccflags) $(frame_warn_flag)
 CFLAGS_$(AMDDALPATH)/dc/dml/dcn20/display_rq_dlg_calc_20v2.o := $(dml_ccflags)
-CFLAGS_$(AMDDALPATH)/dc/dml/dcn21/display_mode_vba_21.o := $(dml_ccflags)
+CFLAGS_$(AMDDALPATH)/dc/dml/dcn21/display_mode_vba_21.o := $(dml_ccflags) $(frame_warn_flag)
 CFLAGS_$(AMDDALPATH)/dc/dml/dcn21/display_rq_dlg_calc_21.o := $(dml_ccflags)
 CFLAGS_$(AMDDALPATH)/dc/dml/dcn30/display_mode_vba_30.o := $(dml_ccflags) $(frame_warn_flag)
 CFLAGS_$(AMDDALPATH)/dc/dml/dcn30/display_rq_dlg_calc_30.o := $(dml_ccflags)
index 9f37f717a1f86f88c5fa41bc30f477406d70f3b8..a0a65e0991041d90904c516c7279c5b8aa76967c 100644 (file)
@@ -1112,7 +1112,7 @@ struct pipe_slice_table {
                struct pipe_ctx *pri_pipe;
                struct dc_plane_state *plane;
                int slice_count;
-       } mpc_combines[MAX_SURFACES];
+       } mpc_combines[MAX_PLANES];
        int mpc_combine_count;
 };
 
@@ -1288,7 +1288,7 @@ static bool update_pipes_with_split_flags(struct dc *dc, struct dc_state *contex
        return updated;
 }
 
-static bool should_allow_odm_power_optimization(struct dc *dc,
+static bool should_apply_odm_power_optimization(struct dc *dc,
                struct dc_state *context, struct vba_vars_st *v, int *split,
                bool *merge)
 {
@@ -1392,9 +1392,12 @@ static void try_odm_power_optimization_and_revalidate(
 {
        int i;
        unsigned int new_vlevel;
+       unsigned int cur_policy[MAX_PIPES];
 
-       for (i = 0; i < pipe_cnt; i++)
+       for (i = 0; i < pipe_cnt; i++) {
+               cur_policy[i] = pipes[i].pipe.dest.odm_combine_policy;
                pipes[i].pipe.dest.odm_combine_policy = dm_odm_combine_policy_2to1;
+       }
 
        new_vlevel = dml_get_voltage_level(&context->bw_ctx.dml, pipes, pipe_cnt);
 
@@ -1403,6 +1406,9 @@ static void try_odm_power_optimization_and_revalidate(
                memset(merge, 0, MAX_PIPES * sizeof(bool));
                *vlevel = dcn20_validate_apply_pipe_split_flags(dc, context, new_vlevel, split, merge);
                context->bw_ctx.dml.vba.VoltageLevel = *vlevel;
+       } else {
+               for (i = 0; i < pipe_cnt; i++)
+                       pipes[i].pipe.dest.odm_combine_policy = cur_policy[i];
        }
 }
 
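
The hunk above snapshots each pipe's ODM combine policy before the speculative revalidation and restores it when the new voltage level is rejected. The save-try-restore shape, sketched with plain ints and a revalidation stub that always fails:

#include <stdbool.h>
#include <stdio.h>

enum { MAX_PIPES = 6 };

static bool revalidate(const int policy[MAX_PIPES])
{
        (void)policy;
        return false; /* pretend the new configuration was rejected */
}

int main(void)
{
        int policy[MAX_PIPES] = { 0 };
        int saved[MAX_PIPES];
        int i;

        for (i = 0; i < MAX_PIPES; i++) {
                saved[i] = policy[i]; /* snapshot the current policy */
                policy[i] = 2;        /* speculatively force 2:1 combine */
        }

        if (!revalidate(policy)) {
                /* validation failed: roll back to the saved policies */
                for (i = 0; i < MAX_PIPES; i++)
                        policy[i] = saved[i];
        }

        printf("policy[0] after rollback: %d\n", policy[0]);
        return 0;
}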
@@ -1580,7 +1586,7 @@ static void dcn32_full_validate_bw_helper(struct dc *dc,
                }
        }
 
-       if (should_allow_odm_power_optimization(dc, context, vba, split, merge))
+       if (should_apply_odm_power_optimization(dc, context, vba, split, merge))
                try_odm_power_optimization_and_revalidate(
                                dc, context, pipes, split, merge, vlevel, *pipe_cnt);
 
@@ -2209,7 +2215,8 @@ bool dcn32_internal_validate_bw(struct dc *dc,
                int i;
 
                pipe_cnt = dc->res_pool->funcs->populate_dml_pipes(dc, context, pipes, fast_validate);
-               dcn32_update_dml_pipes_odm_policy_based_on_context(dc, context, pipes);
+               if (!dc->config.enable_windowed_mpo_odm)
+                       dcn32_update_dml_pipes_odm_policy_based_on_context(dc, context, pipes);
 
                /* repopulate_pipes = 1 means the pipes were either split or merged. In this case
                 * we have to re-calculate the DET allocation and run through DML once more to
index 475c4ec43c013f481a71ad5668a8aef82ac7ba0a..7ea2bd5374d51b138d13179ab7444d0d8d2ef3a7 100644 (file)
@@ -164,8 +164,8 @@ struct _vcs_dpi_soc_bounding_box_st dcn3_5_soc = {
                },
        },
        .num_states = 5,
-       .sr_exit_time_us = 14.0,
-       .sr_enter_plus_exit_time_us = 16.0,
+       .sr_exit_time_us = 28.0,
+       .sr_enter_plus_exit_time_us = 30.0,
        .sr_exit_z8_time_us = 210.0,
        .sr_enter_plus_exit_z8_time_us = 320.0,
        .fclk_change_latency_us = 24.0,
index 64d01a9cd68c859db9bcffbc478ef09090b07fbf..1ba6933d2b3617aa6d275647d17320dd0755ae69 100644 (file)
@@ -341,9 +341,6 @@ void dml2_init_soc_states(struct dml2_context *dml2, const struct dc *in_dc,
                break;
        }
 
-       if (dml2->config.bbox_overrides.clks_table.num_states)
-                       p->in_states->num_states = dml2->config.bbox_overrides.clks_table.num_states;
-
        /* Override from passed values, if available */
        for (i = 0; i < p->in_states->num_states; i++) {
                if (dml2->config.bbox_overrides.sr_exit_latency_us) {
@@ -400,7 +397,7 @@ void dml2_init_soc_states(struct dml2_context *dml2, const struct dc *in_dc,
        }
        /* Copy clocks tables entries, if available */
        if (dml2->config.bbox_overrides.clks_table.num_states) {
-
+               p->in_states->num_states = dml2->config.bbox_overrides.clks_table.num_states;
                for (i = 0; i < dml2->config.bbox_overrides.clks_table.num_entries_per_clk.num_dcfclk_levels; i++) {
                        p->in_states->state_array[i].dcfclk_mhz = dml2->config.bbox_overrides.clks_table.clk_entries[i].dcfclk_mhz;
                }
@@ -439,6 +436,14 @@ void dml2_init_soc_states(struct dml2_context *dml2, const struct dc *in_dc,
        }
 
        dml2_policy_build_synthetic_soc_states(s, p);
+       if (dml2->v20.dml_core_ctx.project == dml_project_dcn35 ||
+               dml2->v20.dml_core_ctx.project == dml_project_dcn351) {
+               // Override last out_state with data from last in_state
+               // This will ensure that out_state contains max fclk
+               memcpy(&p->out_states->state_array[p->out_states->num_states - 1],
+                               &p->in_states->state_array[p->in_states->num_states - 1],
+                               sizeof(struct soc_state_bounding_box_st));
+       }
 }
 
 void dml2_translate_ip_params(const struct dc *in, struct ip_params_st *out)
@@ -793,35 +798,28 @@ static void populate_dml_surface_cfg_from_plane_state(enum dml_project_id dml2_p
        }
 }
 
-/*TODO no support for mpc combine, need rework - should calculate scaling params based on plane+stream*/
-static struct scaler_data get_scaler_data_for_plane(const struct dc_plane_state *in, const struct dc_state *context)
+static struct scaler_data get_scaler_data_for_plane(const struct dc_plane_state *in, struct dc_state *context)
 {
        int i;
-       struct scaler_data data = { 0 };
+       struct pipe_ctx *temp_pipe = &context->res_ctx.temp_pipe;
+
+       memset(temp_pipe, 0, sizeof(struct pipe_ctx));
 
        for (i = 0; i < MAX_PIPES; i++) {
                const struct pipe_ctx *pipe = &context->res_ctx.pipe_ctx[i];
 
                if (pipe->plane_state == in && !pipe->prev_odm_pipe) {
-                       const struct pipe_ctx *next_pipe = pipe->next_odm_pipe;
-
-                       data = context->res_ctx.pipe_ctx[i].plane_res.scl_data;
-                       while (next_pipe) {
-                               data.h_active += next_pipe->plane_res.scl_data.h_active;
-                               data.recout.width += next_pipe->plane_res.scl_data.recout.width;
-                               if (in->rotation == ROTATION_ANGLE_0 || in->rotation == ROTATION_ANGLE_180) {
-                                       data.viewport.width += next_pipe->plane_res.scl_data.viewport.width;
-                               } else {
-                                       data.viewport.height += next_pipe->plane_res.scl_data.viewport.height;
-                               }
-                               next_pipe = next_pipe->next_odm_pipe;
-                       }
+                       temp_pipe->stream = pipe->stream;
+                       temp_pipe->plane_state = pipe->plane_state;
+                       temp_pipe->plane_res.scl_data.taps = pipe->plane_res.scl_data.taps;
+
+                       resource_build_scaling_params(temp_pipe);
                        break;
                }
        }
 
        ASSERT(i < MAX_PIPES);
-       return data;
+       return temp_pipe->plane_res.scl_data;
 }
 
 static void populate_dummy_dml_plane_cfg(struct dml_plane_cfg_st *out, unsigned int location, const struct dc_stream_state *in)
@@ -866,7 +864,7 @@ static void populate_dummy_dml_plane_cfg(struct dml_plane_cfg_st *out, unsigned
        out->ScalerEnabled[location] = false;
 }
 
-static void populate_dml_plane_cfg_from_plane_state(struct dml_plane_cfg_st *out, unsigned int location, const struct dc_plane_state *in, const struct dc_state *context)
+static void populate_dml_plane_cfg_from_plane_state(struct dml_plane_cfg_st *out, unsigned int location, const struct dc_plane_state *in, struct dc_state *context)
 {
        const struct scaler_data scaler_data = get_scaler_data_for_plane(in, context);
 
index 26307e599614c6e1212c53184ba02849ae6e1dbb..2a58a7687bdb5779db6c639d3cbf2277aaf231ae 100644 (file)
@@ -76,6 +76,11 @@ static void map_hw_resources(struct dml2_context *dml2,
                        in_out_display_cfg->hw.DLGRefClkFreqMHz = 50;
                }
                for (j = 0; j < mode_support_info->DPPPerSurface[i]; j++) {
+                       if (i >= __DML2_WRAPPER_MAX_STREAMS_PLANES__) {
+                               dml_print("DML::%s: Index out of bounds: i=%d, __DML2_WRAPPER_MAX_STREAMS_PLANES__=%d\n",
+                                         __func__, i, __DML2_WRAPPER_MAX_STREAMS_PLANES__);
+                               break;
+                       }
                        dml2->v20.scratch.dml_to_dc_pipe_mapping.dml_pipe_idx_to_stream_id[num_pipes] = dml2->v20.scratch.dml_to_dc_pipe_mapping.disp_cfg_to_stream_id[i];
                        dml2->v20.scratch.dml_to_dc_pipe_mapping.dml_pipe_idx_to_stream_id_valid[num_pipes] = true;
                        dml2->v20.scratch.dml_to_dc_pipe_mapping.dml_pipe_idx_to_plane_id[num_pipes] = dml2->v20.scratch.dml_to_dc_pipe_mapping.disp_cfg_to_plane_id[i];
index 5660f15da291e9de58637c115e315b07f1cee7a3..01493c49bd7a084b1748bb786c56106858709dcc 100644 (file)
@@ -1183,9 +1183,9 @@ void dce110_disable_stream(struct pipe_ctx *pipe_ctx)
                dto_params.timing = &pipe_ctx->stream->timing;
                dp_hpo_inst = pipe_ctx->stream_res.hpo_dp_stream_enc->inst;
                if (dccg) {
-                       dccg->funcs->set_dtbclk_dto(dccg, &dto_params);
                        dccg->funcs->disable_symclk32_se(dccg, dp_hpo_inst);
                        dccg->funcs->set_dpstreamclk(dccg, REFCLK, tg->inst, dp_hpo_inst);
+                       dccg->funcs->set_dtbclk_dto(dccg, &dto_params);
                }
        } else if (dccg && dccg->funcs->disable_symclk_se) {
                dccg->funcs->disable_symclk_se(dccg, stream_enc->stream_enc_inst,
@@ -1476,7 +1476,7 @@ static enum dc_status dce110_enable_stream_timing(
        return DC_OK;
 }
 
-static enum dc_status apply_single_controller_ctx_to_hw(
+enum dc_status dce110_apply_single_controller_ctx_to_hw(
                struct pipe_ctx *pipe_ctx,
                struct dc_state *context,
                struct dc *dc)
@@ -2302,7 +2302,7 @@ enum dc_status dce110_apply_ctx_to_hw(
                if (pipe_ctx->top_pipe || pipe_ctx->prev_odm_pipe)
                        continue;
 
-               status = apply_single_controller_ctx_to_hw(
+               status = dce110_apply_single_controller_ctx_to_hw(
                                pipe_ctx,
                                context,
                                dc);
index 08028a1779ae819282ab2394de57c4b8f266a9f3..ed3cc3648e8e23f8d076b92e10a23791253f9662 100644 (file)
@@ -39,6 +39,10 @@ enum dc_status dce110_apply_ctx_to_hw(
                struct dc *dc,
                struct dc_state *context);
 
+enum dc_status dce110_apply_single_controller_ctx_to_hw(
+               struct pipe_ctx *pipe_ctx,
+               struct dc_state *context,
+               struct dc *dc);
 
 void dce110_enable_stream(struct pipe_ctx *pipe_ctx);
 
index e931342fcf4cf1d4f4b0cf41628cd9f855fa6dac..931ac8ed7069d7bdcd3ca2f0c35f5e5f04552827 100644 (file)
@@ -2561,7 +2561,7 @@ void dcn20_setup_vupdate_interrupt(struct dc *dc, struct pipe_ctx *pipe_ctx)
                tg->funcs->setup_vertical_interrupt2(tg, start_line);
 }
 
-static void dcn20_reset_back_end_for_pipe(
+void dcn20_reset_back_end_for_pipe(
                struct dc *dc,
                struct pipe_ctx *pipe_ctx,
                struct dc_state *context)
@@ -2790,18 +2790,17 @@ void dcn20_enable_stream(struct pipe_ctx *pipe_ctx)
        }
 
        if (dc->link_srv->dp_is_128b_132b_signal(pipe_ctx)) {
-               dp_hpo_inst = pipe_ctx->stream_res.hpo_dp_stream_enc->inst;
-               dccg->funcs->set_dpstreamclk(dccg, DTBCLK0, tg->inst, dp_hpo_inst);
-
-               phyd32clk = get_phyd32clk_src(link);
-               dccg->funcs->enable_symclk32_se(dccg, dp_hpo_inst, phyd32clk);
-
                dto_params.otg_inst = tg->inst;
                dto_params.pixclk_khz = pipe_ctx->stream->timing.pix_clk_100hz / 10;
                dto_params.num_odm_segments = get_odm_segment_count(pipe_ctx);
                dto_params.timing = &pipe_ctx->stream->timing;
                dto_params.ref_dtbclk_khz = dc->clk_mgr->funcs->get_dtb_ref_clk_frequency(dc->clk_mgr);
                dccg->funcs->set_dtbclk_dto(dccg, &dto_params);
+               dp_hpo_inst = pipe_ctx->stream_res.hpo_dp_stream_enc->inst;
+               dccg->funcs->set_dpstreamclk(dccg, DTBCLK0, tg->inst, dp_hpo_inst);
+
+               phyd32clk = get_phyd32clk_src(link);
+               dccg->funcs->enable_symclk32_se(dccg, dp_hpo_inst, phyd32clk);
        } else {
                if (dccg->funcs->enable_symclk_se)
                        dccg->funcs->enable_symclk_se(dccg, stream_enc->stream_enc_inst,
index b94c85340abff7c02f3ec59025b04c8417d77bd6..d950b3e54ec2c7d35fb1c70a53094f0543c17b97 100644 (file)
@@ -84,6 +84,10 @@ enum dc_status dcn20_enable_stream_timing(
 void dcn20_disable_stream_gating(struct dc *dc, struct pipe_ctx *pipe_ctx);
 void dcn20_enable_stream_gating(struct dc *dc, struct pipe_ctx *pipe_ctx);
 void dcn20_setup_vupdate_interrupt(struct dc *dc, struct pipe_ctx *pipe_ctx);
+void dcn20_reset_back_end_for_pipe(
+               struct dc *dc,
+               struct pipe_ctx *pipe_ctx,
+               struct dc_state *context);
 void dcn20_init_blank(
                struct dc *dc,
                struct timing_generator *tg);
index 8e88dcaf88f5b2b709a95abf9e0673390e27daa5..7252f5f781f0d7869e147846bc1eb44f09e63593 100644 (file)
@@ -206,28 +206,32 @@ void dcn21_set_abm_immediate_disable(struct pipe_ctx *pipe_ctx)
 void dcn21_set_pipe(struct pipe_ctx *pipe_ctx)
 {
        struct abm *abm = pipe_ctx->stream_res.abm;
-       uint32_t otg_inst = pipe_ctx->stream_res.tg->inst;
+       struct timing_generator *tg = pipe_ctx->stream_res.tg;
        struct panel_cntl *panel_cntl = pipe_ctx->stream->link->panel_cntl;
        struct dmcu *dmcu = pipe_ctx->stream->ctx->dc->res_pool->dmcu;
+       uint32_t otg_inst;
+
+       if (!abm || !tg || !panel_cntl)
+               return;
+
+       otg_inst = tg->inst;
 
        if (dmcu) {
                dce110_set_pipe(pipe_ctx);
                return;
        }
 
-       if (abm && panel_cntl) {
-               if (abm->funcs && abm->funcs->set_pipe_ex) {
-                       abm->funcs->set_pipe_ex(abm,
+       if (abm->funcs && abm->funcs->set_pipe_ex) {
+               abm->funcs->set_pipe_ex(abm,
                                        otg_inst,
                                        SET_ABM_PIPE_NORMAL,
                                        panel_cntl->inst,
                                        panel_cntl->pwrseq_inst);
-               } else {
-                               dmub_abm_set_pipe(abm, otg_inst,
-                                               SET_ABM_PIPE_NORMAL,
-                                               panel_cntl->inst,
-                                               panel_cntl->pwrseq_inst);
-               }
+       } else {
+               dmub_abm_set_pipe(abm, otg_inst,
+                                 SET_ABM_PIPE_NORMAL,
+                                 panel_cntl->inst,
+                                 panel_cntl->pwrseq_inst);
        }
 }
 
@@ -237,34 +241,35 @@ bool dcn21_set_backlight_level(struct pipe_ctx *pipe_ctx,
 {
        struct dc_context *dc = pipe_ctx->stream->ctx;
        struct abm *abm = pipe_ctx->stream_res.abm;
+       struct timing_generator *tg = pipe_ctx->stream_res.tg;
        struct panel_cntl *panel_cntl = pipe_ctx->stream->link->panel_cntl;
+       uint32_t otg_inst;
+
+       if (!abm || !tg || !panel_cntl)
+               return false;
+
+       otg_inst = tg->inst;
 
        if (dc->dc->res_pool->dmcu) {
                dce110_set_backlight_level(pipe_ctx, backlight_pwm_u16_16, frame_ramp);
                return true;
        }
 
-       if (abm != NULL) {
-               uint32_t otg_inst = pipe_ctx->stream_res.tg->inst;
-
-               if (abm && panel_cntl) {
-                       if (abm->funcs && abm->funcs->set_pipe_ex) {
-                               abm->funcs->set_pipe_ex(abm,
-                                               otg_inst,
-                                               SET_ABM_PIPE_NORMAL,
-                                               panel_cntl->inst,
-                                               panel_cntl->pwrseq_inst);
-                       } else {
-                                       dmub_abm_set_pipe(abm,
-                                                       otg_inst,
-                                                       SET_ABM_PIPE_NORMAL,
-                                                       panel_cntl->inst,
-                                                       panel_cntl->pwrseq_inst);
-                       }
-               }
+       if (abm->funcs && abm->funcs->set_pipe_ex) {
+               abm->funcs->set_pipe_ex(abm,
+                                       otg_inst,
+                                       SET_ABM_PIPE_NORMAL,
+                                       panel_cntl->inst,
+                                       panel_cntl->pwrseq_inst);
+       } else {
+               dmub_abm_set_pipe(abm,
+                                 otg_inst,
+                                 SET_ABM_PIPE_NORMAL,
+                                 panel_cntl->inst,
+                                 panel_cntl->pwrseq_inst);
        }
 
-       if (abm && abm->funcs && abm->funcs->set_backlight_level_pwm)
+       if (abm->funcs && abm->funcs->set_backlight_level_pwm)
                abm->funcs->set_backlight_level_pwm(abm, backlight_pwm_u16_16,
                        frame_ramp, 0, panel_cntl->inst);
        else
index 6c9299c7683df19b3c444b865d297182d91ae7b3..aa36d7a56ca8c3b6f3cd47e67455ba67549bf73b 100644 (file)
@@ -1474,9 +1474,44 @@ void dcn32_update_dsc_pg(struct dc *dc,
        }
 }
 
+void dcn32_disable_phantom_streams(struct dc *dc, struct dc_state *context)
+{
+       struct dce_hwseq *hws = dc->hwseq;
+       int i;
+
+       for (i = dc->res_pool->pipe_count - 1; i >= 0 ; i--) {
+               struct pipe_ctx *pipe_ctx_old =
+                       &dc->current_state->res_ctx.pipe_ctx[i];
+               struct pipe_ctx *pipe_ctx = &context->res_ctx.pipe_ctx[i];
+
+               if (!pipe_ctx_old->stream)
+                       continue;
+
+               if (dc_state_get_pipe_subvp_type(dc->current_state, pipe_ctx_old) != SUBVP_PHANTOM)
+                       continue;
+
+               if (pipe_ctx_old->top_pipe || pipe_ctx_old->prev_odm_pipe)
+                       continue;
+
+               if (!pipe_ctx->stream || pipe_need_reprogram(pipe_ctx_old, pipe_ctx) ||
+                               (pipe_ctx->stream && dc_state_get_pipe_subvp_type(context, pipe_ctx) != SUBVP_PHANTOM)) {
+                       struct clock_source *old_clk = pipe_ctx_old->clock_source;
+
+                       if (hws->funcs.reset_back_end_for_pipe)
+                               hws->funcs.reset_back_end_for_pipe(dc, pipe_ctx_old, dc->current_state);
+                       if (hws->funcs.enable_stream_gating)
+                               hws->funcs.enable_stream_gating(dc, pipe_ctx_old);
+                       if (old_clk)
+                               old_clk->funcs->cs_power_down(old_clk);
+               }
+       }
+}
+
 void dcn32_enable_phantom_streams(struct dc *dc, struct dc_state *context)
 {
        unsigned int i;
+       enum dc_status status = DC_OK;
+       struct dce_hwseq *hws = dc->hwseq;
 
        for (i = 0; i < dc->res_pool->pipe_count; i++) {
                struct pipe_ctx *pipe = &context->res_ctx.pipe_ctx[i];
@@ -1497,16 +1532,39 @@ void dcn32_enable_phantom_streams(struct dc *dc, struct dc_state *context)
                }
        }
        for (i = 0; i < dc->res_pool->pipe_count; i++) {
-               struct pipe_ctx *new_pipe = &context->res_ctx.pipe_ctx[i];
-
-               if (new_pipe->stream && dc_state_get_pipe_subvp_type(context, new_pipe) == SUBVP_PHANTOM) {
-                       // If old context or new context has phantom pipes, apply
-                       // the phantom timings now. We can't change the phantom
-                       // pipe configuration safely without driver acquiring
-                       // the DMCUB lock first.
-                       dc->hwss.apply_ctx_to_hw(dc, context);
-                       break;
+               struct pipe_ctx *pipe_ctx_old =
+                                       &dc->current_state->res_ctx.pipe_ctx[i];
+               struct pipe_ctx *pipe_ctx = &context->res_ctx.pipe_ctx[i];
+
+               if (pipe_ctx->stream == NULL)
+                       continue;
+
+               if (dc_state_get_pipe_subvp_type(context, pipe_ctx) != SUBVP_PHANTOM)
+                       continue;
+
+               if (pipe_ctx->stream == pipe_ctx_old->stream &&
+                       pipe_ctx->stream->link->link_state_valid) {
+                       continue;
                }
+
+               if (pipe_ctx_old->stream && !pipe_need_reprogram(pipe_ctx_old, pipe_ctx))
+                       continue;
+
+               if (pipe_ctx->top_pipe || pipe_ctx->prev_odm_pipe)
+                       continue;
+
+               if (hws->funcs.apply_single_controller_ctx_to_hw)
+                       status = hws->funcs.apply_single_controller_ctx_to_hw(
+                                       pipe_ctx,
+                                       context,
+                                       dc);
+
+               ASSERT(status == DC_OK);
+
+#ifdef CONFIG_DRM_AMD_DC_FP
+               if (hws->funcs.resync_fifo_dccg_dio)
+                       hws->funcs.resync_fifo_dccg_dio(hws, dc, context);
+#endif
        }
 }
 
index cecf7f0f567190b257cf81e5f756b5a916eba09c..069e20bc87c0a75af028168253219fc9343b1af3 100644 (file)
@@ -111,6 +111,8 @@ void dcn32_update_dsc_pg(struct dc *dc,
 
 void dcn32_enable_phantom_streams(struct dc *dc, struct dc_state *context);
 
+void dcn32_disable_phantom_streams(struct dc *dc, struct dc_state *context);
+
 void dcn32_init_blank(
                struct dc *dc,
                struct timing_generator *tg);
index 427cfc8c24a4b7ed4cee1f0b6955cbe371797219..e8ac94a005b83a78533646aae0a36ca132eb8a75 100644 (file)
@@ -109,6 +109,7 @@ static const struct hw_sequencer_funcs dcn32_funcs = {
        .get_dcc_en_bits = dcn10_get_dcc_en_bits,
        .commit_subvp_config = dcn32_commit_subvp_config,
        .enable_phantom_streams = dcn32_enable_phantom_streams,
+       .disable_phantom_streams = dcn32_disable_phantom_streams,
        .subvp_pipe_control_lock = dcn32_subvp_pipe_control_lock,
        .update_visual_confirm_color = dcn10_update_visual_confirm_color,
        .subvp_pipe_control_lock_fast = dcn32_subvp_pipe_control_lock_fast,
@@ -159,6 +160,8 @@ static const struct hwseq_private_funcs dcn32_private_funcs = {
        .set_pixels_per_cycle = dcn32_set_pixels_per_cycle,
        .resync_fifo_dccg_dio = dcn32_resync_fifo_dccg_dio,
        .is_dp_dig_pixel_rate_div_policy = dcn32_is_dp_dig_pixel_rate_div_policy,
+       .apply_single_controller_ctx_to_hw = dce110_apply_single_controller_ctx_to_hw,
+       .reset_back_end_for_pipe = dcn20_reset_back_end_for_pipe,
 };
 
 void dcn32_hw_sequencer_init_functions(struct dc *dc)
index 9c806385ecbdcce6c0d14f949ea41879758969f7..8b6c49622f3b63c8e6dae68c507e1e45c5a736a2 100644 (file)
@@ -680,7 +680,7 @@ void dcn35_power_down_on_boot(struct dc *dc)
 bool dcn35_apply_idle_power_optimizations(struct dc *dc, bool enable)
 {
        struct dc_link *edp_links[MAX_NUM_EDP];
-       int edp_num;
+       int i, edp_num;
        if (dc->debug.dmcub_emulation)
                return true;
 
@@ -688,6 +688,13 @@ bool dcn35_apply_idle_power_optimizations(struct dc *dc, bool enable)
                dc_get_edp_links(dc, edp_links, &edp_num);
                if (edp_num == 0 || edp_num > 1)
                        return false;
+
+               for (i = 0; i < dc->current_state->stream_count; ++i) {
+                       struct dc_stream_state *stream = dc->current_state->streams[i];
+
+                       if (!stream->dpms_off && !dc_is_embedded_signal(stream->signal))
+                               return false;
+               }
        }
 
        // TODO: review other cases when idle optimization is allowed
index a54399383318145b8bc72fc85e646bf546588609..64ca7c66509b79bc2cfe50806cc37e8953468239 100644 (file)
@@ -379,6 +379,7 @@ struct hw_sequencer_funcs {
                        struct dc_cursor_attributes *cursor_attr);
        void (*commit_subvp_config)(struct dc *dc, struct dc_state *context);
        void (*enable_phantom_streams)(struct dc *dc, struct dc_state *context);
+       void (*disable_phantom_streams)(struct dc *dc, struct dc_state *context);
        void (*subvp_pipe_control_lock)(struct dc *dc,
                        struct dc_state *context,
                        bool lock,
index 6137cf09aa54d25750246e86583c5938e557501b..b3c62a82cb1cf10fddad52dcf85b7e02de87ee35 100644 (file)
@@ -165,8 +165,15 @@ struct hwseq_private_funcs {
        void (*set_pixels_per_cycle)(struct pipe_ctx *pipe_ctx);
        void (*resync_fifo_dccg_dio)(struct dce_hwseq *hws, struct dc *dc,
                        struct dc_state *context);
+       enum dc_status (*apply_single_controller_ctx_to_hw)(
+                       struct pipe_ctx *pipe_ctx,
+                       struct dc_state *context,
+                       struct dc *dc);
        bool (*is_dp_dig_pixel_rate_div_policy)(struct pipe_ctx *pipe_ctx);
 #endif
+       void (*reset_back_end_for_pipe)(struct dc *dc,
+                       struct pipe_ctx *pipe_ctx,
+                       struct dc_state *context);
 };
 
 struct dce_hwseq {
index f74ae0d41d3c49cf215d615f336339b773cbbcbc..3a6bf77a68732166d320dbea642929c3201d3e01 100644 (file)
@@ -469,6 +469,8 @@ struct resource_context {
        unsigned int hpo_dp_link_enc_to_link_idx[MAX_HPO_DP2_LINK_ENCODERS];
        int hpo_dp_link_enc_ref_cnts[MAX_HPO_DP2_LINK_ENCODERS];
        bool is_mpc_3dlut_acquired[MAX_PIPES];
+       /* solely used for building scaler data in dml2 */
+       struct pipe_ctx temp_pipe;
 };
 
 struct dce_bw_output {
index 5dcbaa2db964aee7de17c2e9306606cac1817b08..e97d964a1791cefb2eb47c91780a41e3682baed0 100644 (file)
@@ -57,7 +57,7 @@ struct panel_cntl_funcs {
 struct panel_cntl_init_data {
        struct dc_context *ctx;
        uint32_t inst;
-       uint32_t pwrseq_inst;
+       uint32_t eng_id;
 };
 
 struct panel_cntl {
index c958ef37b78a667b1bb9bfb26827ae3e45053715..77a60aa9f27bbfdfa8a652306e2366dc0eca4345 100644 (file)
@@ -427,22 +427,18 @@ struct pipe_ctx *resource_get_primary_dpp_pipe(const struct pipe_ctx *dpp_pipe);
 int resource_get_mpc_slice_index(const struct pipe_ctx *dpp_pipe);
 
 /*
- * Get number of MPC "cuts" of the plane associated with the pipe. MPC slice
- * count is equal to MPC splits + 1. For example if a plane is cut 3 times, it
- * will have 4 pieces of slice.
- * return - 0 if pipe is not used for a plane with MPCC combine. otherwise
- * the number of MPC "cuts" for the plane.
+ * Get the number of MPC slices associated with the pipe.
+ * The function returns 0 if the pipe is not associated with an MPC combine
+ * pipe topology.
  */
-int resource_get_mpc_slice_count(const struct pipe_ctx *opp_head);
+int resource_get_mpc_slice_count(const struct pipe_ctx *pipe);
 
 /*
- * Get number of ODM "cuts" of the timing associated with the pipe. ODM slice
- * count is equal to ODM splits + 1. For example if a timing is cut 3 times, it
- * will have 4 pieces of slice.
- * return - 0 if pipe is not used for ODM combine. otherwise
- * the number of ODM "cuts" for the timing.
+ * Get the number of ODM slices associated with the pipe.
+ * The function returns 0 if the pipe is not associated with an ODM combine
+ * pipe topology.
  */
-int resource_get_odm_slice_count(const struct pipe_ctx *otg_master);
+int resource_get_odm_slice_count(const struct pipe_ctx *pipe);
 
 /* Get the ODM slice index counting from 0 from left most slice */
 int resource_get_odm_slice_index(const struct pipe_ctx *opp_head);
index 37d3027c32dcb1007dbb90e209f7f459be81617e..cf22b8f28ba6c65394a536465143d1c2f81bd2b6 100644 (file)
@@ -370,30 +370,6 @@ static enum transmitter translate_encoder_to_transmitter(
        }
 }
 
-static uint8_t translate_dig_inst_to_pwrseq_inst(struct dc_link *link)
-{
-       uint8_t pwrseq_inst = 0xF;
-       struct dc_context *dc_ctx = link->dc->ctx;
-
-       DC_LOGGER_INIT(dc_ctx->logger);
-
-       switch (link->eng_id) {
-       case ENGINE_ID_DIGA:
-               pwrseq_inst = 0;
-               break;
-       case ENGINE_ID_DIGB:
-               pwrseq_inst = 1;
-               break;
-       default:
-               DC_LOG_WARNING("Unsupported pwrseq engine id: %d!\n", link->eng_id);
-               ASSERT(false);
-               break;
-       }
-
-       return pwrseq_inst;
-}
-
-
 static void link_destruct(struct dc_link *link)
 {
        int i;
@@ -657,7 +633,7 @@ static bool construct_phy(struct dc_link *link,
                        link->link_id.id == CONNECTOR_ID_LVDS)) {
                panel_cntl_init_data.ctx = dc_ctx;
                panel_cntl_init_data.inst = panel_cntl_init_data.ctx->dc_edp_id_count;
-               panel_cntl_init_data.pwrseq_inst = translate_dig_inst_to_pwrseq_inst(link);
+               panel_cntl_init_data.eng_id = link->eng_id;
                link->panel_cntl =
                        link->dc->res_pool->funcs->panel_cntl_create(
                                                                &panel_cntl_init_data);
index 8fe66c3678508d9aee6779fa25cd6128e1f30832..5b0bc7f6a188ccd6b304a369be0bdfe43b91f76a 100644 (file)
@@ -361,7 +361,7 @@ bool link_validate_dpia_bandwidth(const struct dc_stream_state *stream, const un
        struct dc_link *dpia_link[MAX_DPIA_NUM] = {0};
        int num_dpias = 0;
 
-       for (uint8_t i = 0; i < num_streams; ++i) {
+       for (unsigned int i = 0; i < num_streams; ++i) {
                if (stream[i].signal == SIGNAL_TYPE_DISPLAY_PORT) {
                        /* new dpia sst stream, check whether it exceeds max dpia */
                        if (num_dpias >= MAX_DPIA_NUM)
index dd0d2b206462c927c5f68b355498e71250c154b9..5491b707cec881b9854ab96834503c1e88053380 100644 (file)
@@ -196,7 +196,7 @@ static int get_host_router_total_dp_tunnel_bw(const struct dc *dc, uint8_t hr_in
        struct dc_link *link_dpia_primary, *link_dpia_secondary;
        int total_bw = 0;
 
-       for (uint8_t i = 0; i < MAX_PIPES * 2; ++i) {
+       for (uint8_t i = 0; i < (MAX_PIPES * 2) - 1; ++i) {
 
                if (!dc->links[i] || dc->links[i]->ep_type != DISPLAY_ENDPOINT_USB4_DPIA)
                        continue;
index 5a0b0451895690d184ec00c56873f0d1acad6864..16a62e01871224495cd771c4042f04d3be85e04d 100644 (file)
@@ -517,6 +517,7 @@ enum link_training_result dp_check_link_loss_status(
 {
        enum link_training_result status = LINK_TRAINING_SUCCESS;
        union lane_status lane_status;
+       union lane_align_status_updated dpcd_lane_status_updated;
        uint8_t dpcd_buf[6] = {0};
        uint32_t lane;
 
@@ -532,10 +533,12 @@ enum link_training_result dp_check_link_loss_status(
                 * check lanes status
                 */
                lane_status.raw = dp_get_nibble_at_index(&dpcd_buf[2], lane);
+               dpcd_lane_status_updated.raw = dpcd_buf[4];
 
                if (!lane_status.bits.CHANNEL_EQ_DONE_0 ||
                        !lane_status.bits.CR_DONE_0 ||
-                       !lane_status.bits.SYMBOL_LOCKED_0) {
+                       !lane_status.bits.SYMBOL_LOCKED_0 ||
+                       !dp_is_interlane_aligned(dpcd_lane_status_updated)) {
                        /* if one of the channel equalization, clock
                         * recovery or symbol lock is dropped
                         * consider it as (link has been
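For reference, the new dpcd_buf[4] index lines up with the DPCD receiver status block, assuming the 6-byte read above starts at DP_SINK_COUNT (0x200); the offsets below are the standard DPCD addresses, not taken from this hunk:

	/* Assumed layout of the 6-byte DPCD status read (base DP_SINK_COUNT, 0x200):
	 *   dpcd_buf[0]  DP_SINK_COUNT                   (0x200)
	 *   dpcd_buf[1]  DP_DEVICE_SERVICE_IRQ_VECTOR    (0x201)
	 *   dpcd_buf[2]  DP_LANE0_1_STATUS               (0x202)
	 *   dpcd_buf[3]  DP_LANE2_3_STATUS               (0x203)
	 *   dpcd_buf[4]  DP_LANE_ALIGN_STATUS_UPDATED    (0x204) <- interlane align
	 *   dpcd_buf[5]  DP_SINK_STATUS                  (0x205)
	 */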
index e8dda44b23cb29aa3ec2686b6656bd044c194606..5d36bab0029ca54a03aaef4fc83ff99e59550e5a 100644 (file)
@@ -619,7 +619,7 @@ static enum link_training_result dpia_training_eq_non_transparent(
        uint32_t retries_eq = 0;
        enum dc_status status;
        enum dc_dp_training_pattern tr_pattern;
-       uint32_t wait_time_microsec;
+       uint32_t wait_time_microsec = 0;
        enum dc_lane_count lane_count = lt_settings->link_settings.lane_count;
        union lane_align_status_updated dpcd_lane_status_updated = {0};
        union lane_status dpcd_lane_status[LANE_COUNT_DP_MAX] = {0};
index 5c9a30211c109f749ab7e1cceb402bd7a0dcb786..fc50931c2aecbb53d74a2d48913e608134510940 100644 (file)
@@ -205,7 +205,7 @@ enum dc_status core_link_read_dpcd(
        uint32_t extended_size;
        /* size of the remaining partitioned address space */
        uint32_t size_left_to_read;
-       enum dc_status status;
+       enum dc_status status = DC_ERROR_UNEXPECTED;
        /* size of the next partition to be read from */
        uint32_t partition_size;
        uint32_t data_index = 0;
@@ -234,7 +234,7 @@ enum dc_status core_link_write_dpcd(
 {
        uint32_t partition_size;
        uint32_t data_index = 0;
-       enum dc_status status;
+       enum dc_status status = DC_ERROR_UNEXPECTED;
 
        while (size) {
                partition_size = dpcd_get_next_partition_size(address, size);
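Both hunks exist for the same reason: if the partition loop body never runs (a zero-length request), the function previously returned an uninitialized status. A minimal sketch of the failure mode, with a hypothetical helper name:

	enum dc_status read_all_partitions(uint32_t size)
	{
		enum dc_status status = DC_ERROR_UNEXPECTED;	/* safe default */

		while (size) {
			/* read one partition, update status, shrink size */
		}

		return status;	/* indeterminate without the initializer when size == 0 */
	}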
index 511ff6b5b9856776ea834393e4a7bfcaa90ca49f..7538b548c5725177b12e2d169acc681c31174797 100644 (file)
@@ -999,7 +999,7 @@ static struct stream_encoder *dcn301_stream_encoder_create(enum engine_id eng_id
        vpg = dcn301_vpg_create(ctx, vpg_inst);
        afmt = dcn301_afmt_create(ctx, afmt_inst);
 
-       if (!enc1 || !vpg || !afmt) {
+       if (!enc1 || !vpg || !afmt || eng_id >= ARRAY_SIZE(stream_enc_regs)) {
                kfree(enc1);
                kfree(vpg);
                kfree(afmt);
index c4d71e7f18af47ba47dbc89e1a9098a0a4eade04..6f10052caeef02c3448307c4c81aef805e68e95b 100644 (file)
@@ -1829,7 +1829,21 @@ int dcn32_populate_dml_pipes_from_context(
                dcn32_zero_pipe_dcc_fraction(pipes, pipe_cnt);
                DC_FP_END();
                pipes[pipe_cnt].pipe.dest.vfront_porch = timing->v_front_porch;
-               pipes[pipe_cnt].pipe.dest.odm_combine_policy = dm_odm_combine_policy_dal;
+               if (dc->config.enable_windowed_mpo_odm &&
+                               dc->debug.enable_single_display_2to1_odm_policy) {
+                       switch (resource_get_odm_slice_count(pipe)) {
+                       case 2:
+                               pipes[pipe_cnt].pipe.dest.odm_combine_policy = dm_odm_combine_policy_2to1;
+                               break;
+                       case 4:
+                               pipes[pipe_cnt].pipe.dest.odm_combine_policy = dm_odm_combine_policy_4to1;
+                               break;
+                       default:
+                               pipes[pipe_cnt].pipe.dest.odm_combine_policy = dm_odm_combine_policy_dal;
+                       }
+               } else {
+                       pipes[pipe_cnt].pipe.dest.odm_combine_policy = dm_odm_combine_policy_dal;
+               }
                pipes[pipe_cnt].pipe.src.gpuvm_min_page_size_kbytes = 256; // according to spreadsheet
                pipes[pipe_cnt].pipe.src.unbounded_req_mode = false;
                pipes[pipe_cnt].pipe.scale_ratio_depth.lb_depth = dm_lb_19;
index 761ec989187568730fdd8cd51cd1802fa657be9c..5fdcda8f86026d94697a069ed53a83752a0ebdee 100644 (file)
@@ -780,8 +780,8 @@ static const struct dc_debug_options debug_defaults_drv = {
        .disable_z10 = false,
        .ignore_pg = true,
        .psp_disabled_wa = true,
-       .ips2_eval_delay_us = 200,
-       .ips2_entry_delay_us = 400,
+       .ips2_eval_delay_us = 2000,
+       .ips2_entry_delay_us = 800,
        .static_screen_wait_frames = 2,
 };
 
@@ -2130,6 +2130,7 @@ static bool dcn35_resource_construct(
        dc->dml2_options.dcn_pipe_count = pool->base.pipe_count;
        dc->dml2_options.use_native_pstate_optimization = true;
        dc->dml2_options.use_native_soc_bb_construction = true;
+       dc->dml2_options.minimize_dispclk_using_odm = false;
        if (dc->config.EnableMinDispClkODM)
                dc->dml2_options.minimize_dispclk_using_odm = true;
        dc->dml2_options.enable_windowed_mpo_odm = dc->config.enable_windowed_mpo_odm;
index c64b6c848ef7219e3ddc44da8d4e56763a9bf7f4..e699731ee68e96388c52ed55c17b34cc8710aaab 100644 (file)
@@ -2832,6 +2832,7 @@ struct dmub_rb_cmd_psr_set_power_opt {
 #define REPLAY_RESIDENCY_MODE_MASK             (0x1 << REPLAY_RESIDENCY_MODE_SHIFT)
 # define REPLAY_RESIDENCY_MODE_PHY             (0x0 << REPLAY_RESIDENCY_MODE_SHIFT)
 # define REPLAY_RESIDENCY_MODE_ALPM            (0x1 << REPLAY_RESIDENCY_MODE_SHIFT)
+# define REPLAY_RESIDENCY_MODE_IPS             0x10
 
 #define REPLAY_RESIDENCY_ENABLE_MASK           (0x1 << REPLAY_RESIDENCY_ENABLE_SHIFT)
 # define REPLAY_RESIDENCY_DISABLE              (0x0 << REPLAY_RESIDENCY_ENABLE_SHIFT)
@@ -2894,6 +2895,10 @@ enum dmub_cmd_replay_type {
         * Set Residency Frameupdate Timer.
         */
        DMUB_CMD__REPLAY_SET_RESIDENCY_FRAMEUPDATE_TIMER = 6,
+       /**
+        * Set pseudo vtotal
+        */
+       DMUB_CMD__REPLAY_SET_PSEUDO_VTOTAL = 7,
 };
 
 /**
@@ -3076,6 +3081,26 @@ struct dmub_cmd_replay_set_timing_sync_data {
        uint8_t pad[2];
 };
 
+/**
+ * Data passed from driver to FW in a DMUB_CMD__REPLAY_SET_PSEUDO_VTOTAL command.
+ */
+struct dmub_cmd_replay_set_pseudo_vtotal {
+       /**
+        * Panel Instance.
+        * Panel instance to identify which replay_state to use.
+        * Currently only panel instances 0 and 1 are supported.
+        */
+       uint8_t panel_inst;
+       /**
+        * Source vtotal to use when Replay + IPS + ABM full screen video is active
+        */
+       uint16_t vtotal;
+       /**
+        * Explicit padding to 4 byte boundary.
+        */
+       uint8_t pad;
+};
+
 /**
  * Definition of a DMUB_CMD__SET_REPLAY_POWER_OPT command.
  */
@@ -3156,6 +3181,20 @@ struct dmub_rb_cmd_replay_set_timing_sync {
        struct dmub_cmd_replay_set_timing_sync_data replay_set_timing_sync_data;
 };
 
+/**
+ * Definition of a DMUB_CMD__REPLAY_SET_PSEUDO_VTOTAL command.
+ */
+struct dmub_rb_cmd_replay_set_pseudo_vtotal {
+       /**
+        * Command header.
+        */
+       struct dmub_cmd_header header;
+       /**
+        * Definition of DMUB_CMD__REPLAY_SET_PSEUDO_VTOTAL command.
+        */
+       struct dmub_cmd_replay_set_pseudo_vtotal data;
+};
+
 /**
  * Data passed from driver to FW in a DMUB_CMD__REPLAY_SET_RESIDENCY_FRAMEUPDATE_TIMER command.
  */
@@ -3207,6 +3246,10 @@ union dmub_replay_cmd_set {
         * Definition of DMUB_CMD__REPLAY_SET_RESIDENCY_FRAMEUPDATE_TIMER command data.
         */
        struct dmub_cmd_replay_frameupdate_timer_data timer_data;
+       /**
+        * Definition of DMUB_CMD__REPLAY_SET_PSEUDO_VTOTAL command data.
+        */
+       struct dmub_cmd_replay_set_pseudo_vtotal pseudo_vtotal_data;
 };
 
 /**
@@ -4358,6 +4401,10 @@ union dmub_rb_cmd {
         * Definition of a DMUB_CMD__REPLAY_SET_RESIDENCY_FRAMEUPDATE_TIMER command.
         */
        struct dmub_rb_cmd_replay_set_frameupdate_timer replay_set_frameupdate_timer;
+       /**
+        * Definition of a DMUB_CMD__REPLAY_SET_PSEUDO_VTOTAL command.
+        */
+       struct dmub_rb_cmd_replay_set_pseudo_vtotal replay_set_pseudo_vtotal;
 };
 
 /**
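A sketch of how a driver-side helper might issue the new command, following the usual DMUB fill-and-execute pattern; the execute helper and everything not declared in the hunks above (panel_inst, vtotal, dc) are assumptions:

	union dmub_rb_cmd cmd;

	memset(&cmd, 0, sizeof(cmd));
	cmd.replay_set_pseudo_vtotal.header.type = DMUB_CMD__REPLAY;
	cmd.replay_set_pseudo_vtotal.header.sub_type = DMUB_CMD__REPLAY_SET_PSEUDO_VTOTAL;
	cmd.replay_set_pseudo_vtotal.header.payload_bytes =
		sizeof(struct dmub_cmd_replay_set_pseudo_vtotal);
	cmd.replay_set_pseudo_vtotal.data.panel_inst = panel_inst;	/* 0 or 1 */
	cmd.replay_set_pseudo_vtotal.data.vtotal = vtotal;

	dm_execute_dmub_cmd(dc->ctx, &cmd, DM_DMUB_WAIT_TYPE_WAIT);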
index ad98e504c00de5908ca94a38392ef818e91b2152..e304e8435fb8f1c5e29428f72c20a6097fb57697 100644 (file)
@@ -980,6 +980,11 @@ void set_replay_coasting_vtotal(struct dc_link *link,
        link->replay_settings.coasting_vtotal_table[type] = vtotal;
 }
 
+void set_replay_ips_full_screen_video_src_vtotal(struct dc_link *link, uint16_t vtotal)
+{
+       link->replay_settings.abm_with_ips_on_full_screen_video_pseudo_vtotal = vtotal;
+}
+
 void calculate_replay_link_off_frame_count(struct dc_link *link,
        uint16_t vtotal, uint16_t htotal)
 {
index c17bbc6fb38cafb518777b16c96a99b2116c36eb..bef4815e1703d78cdebc6f49bc160932d08c5272 100644 (file)
@@ -57,6 +57,7 @@ void init_replay_config(struct dc_link *link, struct replay_config *pr_config);
 void set_replay_coasting_vtotal(struct dc_link *link,
        enum replay_coasting_vtotal_type type,
        uint16_t vtotal);
+void set_replay_ips_full_screen_video_src_vtotal(struct dc_link *link, uint16_t vtotal);
 void calculate_replay_link_off_frame_count(struct dc_link *link,
        uint16_t vtotal, uint16_t htotal);
 
index 1dc5dd9b7bf70b10641a76e4c731e3e735aeaeef..df2c7ffe190f4db36050901dce5af89180646f3b 100644 (file)
@@ -258,6 +258,7 @@ enum DC_DEBUG_MASK {
        DC_ENABLE_DML2 = 0x100,
        DC_DISABLE_PSR_SU = 0x200,
        DC_DISABLE_REPLAY = 0x400,
+       DC_DISABLE_IPS = 0x800,
 };
 
 enum amd_dpm_forced_level;
index be519c8edf496fda93f393a077f174454e635a05..335980e2afbfb8e6eae89e7f28fdcc3391d39cde 100644 (file)
@@ -138,7 +138,7 @@ static inline size_t amdgpu_reginst_size(uint16_t num_inst, size_t inst_size,
 }
 
 #define amdgpu_asic_get_reg_state_supported(adev) \
-       ((adev)->asic_funcs->get_reg_state ? 1 : 0)
+       (((adev)->asic_funcs && (adev)->asic_funcs->get_reg_state) ? 1 : 0)
 
 #define amdgpu_asic_get_reg_state(adev, state, buf, size)                  \
        ((adev)->asic_funcs->get_reg_state ?                               \
index 087d57850304c45193a7f5de336953c1dec9cbba..39c5e1dfa275a64f32fa358d875efc6d0bd99682 100644 (file)
@@ -2558,6 +2558,7 @@ static ssize_t amdgpu_hwmon_set_pwm1_enable(struct device *dev,
 {
        struct amdgpu_device *adev = dev_get_drvdata(dev);
        int err, ret;
+       u32 pwm_mode;
        int value;
 
        if (amdgpu_in_reset(adev))
@@ -2569,13 +2570,22 @@ static ssize_t amdgpu_hwmon_set_pwm1_enable(struct device *dev,
        if (err)
                return err;
 
+       if (value == 0)
+               pwm_mode = AMD_FAN_CTRL_NONE;
+       else if (value == 1)
+               pwm_mode = AMD_FAN_CTRL_MANUAL;
+       else if (value == 2)
+               pwm_mode = AMD_FAN_CTRL_AUTO;
+       else
+               return -EINVAL;
+
        ret = pm_runtime_get_sync(adev_to_drm(adev)->dev);
        if (ret < 0) {
                pm_runtime_put_autosuspend(adev_to_drm(adev)->dev);
                return ret;
        }
 
-       ret = amdgpu_dpm_set_fan_control_mode(adev, value);
+       ret = amdgpu_dpm_set_fan_control_mode(adev, pwm_mode);
 
        pm_runtime_mark_last_busy(adev_to_drm(adev)->dev);
        pm_runtime_put_autosuspend(adev_to_drm(adev)->dev);
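The new validation maps the hwmon sysfs value onto the driver's fan-control enum before it reaches the DPM layer, instead of forwarding the raw integer:

	/* hwmon pwm1_enable ABI -> amdgpu fan control mode:
	 *   0 -> AMD_FAN_CTRL_NONE    (no fan speed control)
	 *   1 -> AMD_FAN_CTRL_MANUAL
	 *   2 -> AMD_FAN_CTRL_AUTO
	 *   anything else -> -EINVAL   (previously passed through unchecked)
	 */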
index df4f20293c16a368748cd4138c0912906f80acc7..eb4da3666e05d6d145a927258d7ea247425dad93 100644 (file)
@@ -6925,6 +6925,23 @@ static int si_dpm_enable(struct amdgpu_device *adev)
        return 0;
 }
 
+static int si_set_temperature_range(struct amdgpu_device *adev)
+{
+       int ret;
+
+       ret = si_thermal_enable_alert(adev, false);
+       if (ret)
+               return ret;
+       ret = si_thermal_set_temperature_range(adev, R600_TEMP_RANGE_MIN, R600_TEMP_RANGE_MAX);
+       if (ret)
+               return ret;
+       ret = si_thermal_enable_alert(adev, true);
+       if (ret)
+               return ret;
+
+       return ret;
+}
+
 static void si_dpm_disable(struct amdgpu_device *adev)
 {
        struct rv7xx_power_info *pi = rv770_get_pi(adev);
@@ -7608,6 +7625,18 @@ static int si_dpm_process_interrupt(struct amdgpu_device *adev,
 
 static int si_dpm_late_init(void *handle)
 {
+       int ret;
+       struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+
+       if (!adev->pm.dpm_enabled)
+               return 0;
+
+       ret = si_set_temperature_range(adev);
+       if (ret)
+               return ret;
+#if 0 //TODO ?
+       si_dpm_powergate_uvd(adev, true);
+#endif
        return 0;
 }
 
index c16703868e5ca2a3f0a7f6c7e3757b8e6ba036d0..0ad947df777ab2665a8f0de986a5d39737dd9ded 100644 (file)
@@ -24,6 +24,7 @@
 
 #include <linux/firmware.h>
 #include <linux/pci.h>
+#include <linux/power_supply.h>
 #include <linux/reboot.h>
 
 #include "amdgpu.h"
@@ -733,7 +734,7 @@ static int smu_early_init(void *handle)
        smu->adev = adev;
        smu->pm_enabled = !!amdgpu_dpm;
        smu->is_apu = false;
-       smu->smu_baco.state = SMU_BACO_STATE_NONE;
+       smu->smu_baco.state = SMU_BACO_STATE_EXIT;
        smu->smu_baco.platform_support = false;
        smu->user_dpm_profile.fan_mode = -1;
 
@@ -817,16 +818,8 @@ static int smu_late_init(void *handle)
         * handle the switch automatically. Driver involvement
         * is unnecessary.
         */
-       if (!smu->dc_controlled_by_gpio) {
-               ret = smu_set_power_source(smu,
-                                          adev->pm.ac_power ? SMU_POWER_SOURCE_AC :
-                                          SMU_POWER_SOURCE_DC);
-               if (ret) {
-                       dev_err(adev->dev, "Failed to switch to %s mode!\n",
-                               adev->pm.ac_power ? "AC" : "DC");
-                       return ret;
-               }
-       }
+       adev->pm.ac_power = power_supply_is_system_supplied() > 0;
+       smu_set_ac_dc(smu);
 
        if ((amdgpu_ip_version(adev, MP1_HWIP, 0) == IP_VERSION(13, 0, 1)) ||
            (amdgpu_ip_version(adev, MP1_HWIP, 0) == IP_VERSION(13, 0, 3)))
@@ -1961,31 +1954,10 @@ static int smu_smc_hw_cleanup(struct smu_context *smu)
        return 0;
 }
 
-static int smu_reset_mp1_state(struct smu_context *smu)
-{
-       struct amdgpu_device *adev = smu->adev;
-       int ret = 0;
-
-       if ((!adev->in_runpm) && (!adev->in_suspend) &&
-               (!amdgpu_in_reset(adev)))
-               switch (amdgpu_ip_version(adev, MP1_HWIP, 0)) {
-               case IP_VERSION(13, 0, 0):
-               case IP_VERSION(13, 0, 7):
-               case IP_VERSION(13, 0, 10):
-                       ret = smu_set_mp1_state(smu, PP_MP1_STATE_UNLOAD);
-                       break;
-               default:
-                       break;
-               }
-
-       return ret;
-}
-
 static int smu_hw_fini(void *handle)
 {
        struct amdgpu_device *adev = (struct amdgpu_device *)handle;
        struct smu_context *smu = adev->powerplay.pp_handle;
-       int ret;
 
        if (amdgpu_sriov_vf(adev) && !amdgpu_sriov_is_pp_one_vf(adev))
                return 0;
@@ -2003,15 +1975,7 @@ static int smu_hw_fini(void *handle)
 
        adev->pm.dpm_enabled = false;
 
-       ret = smu_smc_hw_cleanup(smu);
-       if (ret)
-               return ret;
-
-       ret = smu_reset_mp1_state(smu);
-       if (ret)
-               return ret;
-
-       return 0;
+       return smu_smc_hw_cleanup(smu);
 }
 
 static void smu_late_fini(void *handle)
@@ -2710,6 +2674,7 @@ int smu_get_power_limit(void *handle,
                case SMU_PPT_LIMIT_CURRENT:
                        switch (amdgpu_ip_version(adev, MP1_HWIP, 0)) {
                        case IP_VERSION(13, 0, 2):
+                       case IP_VERSION(13, 0, 6):
                        case IP_VERSION(11, 0, 7):
                        case IP_VERSION(11, 0, 11):
                        case IP_VERSION(11, 0, 12):
index 2aa4fea873147516c23fb2fc568a94d907ee1c8a..66e84defd0b6ec2521c230262c34215a14251dfb 100644 (file)
@@ -424,7 +424,6 @@ enum smu_reset_mode {
 enum smu_baco_state {
        SMU_BACO_STATE_ENTER = 0,
        SMU_BACO_STATE_EXIT,
-       SMU_BACO_STATE_NONE,
 };
 
 struct smu_baco_context {
index 4cd43bbec910e351eb27a79b4c39308d6462d196..bcad42534da46d780423d636953c40993e7001ac 100644 (file)
@@ -1303,13 +1303,12 @@ static int arcturus_get_power_limit(struct smu_context *smu,
        if (default_power_limit)
                *default_power_limit = power_limit;
 
-       if (smu->od_enabled) {
+       if (smu->od_enabled)
                od_percent_upper = le32_to_cpu(powerplay_table->overdrive_table.max[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
-               od_percent_lower = le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
-       } else {
+       else
                od_percent_upper = 0;
-               od_percent_lower = 100;
-       }
+
+       od_percent_lower = le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
 
        dev_dbg(smu->adev->dev, "od percent upper:%d, od percent lower:%d (default power: %d)\n",
                                                        od_percent_upper, od_percent_lower, power_limit);
index 8d1d29ffb0f1c54a781c2508447454f9bb7aa5ee..ed189a3878ebe7199833e495f45417461897a93a 100644 (file)
@@ -2357,13 +2357,12 @@ static int navi10_get_power_limit(struct smu_context *smu,
                *default_power_limit = power_limit;
 
        if (smu->od_enabled &&
-                   navi10_od_feature_is_supported(od_settings, SMU_11_0_ODCAP_POWER_LIMIT)) {
+                   navi10_od_feature_is_supported(od_settings, SMU_11_0_ODCAP_POWER_LIMIT))
                od_percent_upper = le32_to_cpu(powerplay_table->overdrive_table.max[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
-               od_percent_lower = le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
-       } else {
+       else
                od_percent_upper = 0;
-               od_percent_lower = 100;
-       }
+
+       od_percent_lower = le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_ODSETTING_POWERPERCENTAGE]);
 
        dev_dbg(smu->adev->dev, "od percent upper:%d, od percent lower:%d (default power: %d)\n",
                                        od_percent_upper, od_percent_lower, power_limit);
index 21fc033528fa9d1a57ea2699a2780501e2902b3c..e2ad2b972ab0b3550d7aceb66e632eb372a0ffc5 100644 (file)
@@ -640,13 +640,12 @@ static int sienna_cichlid_get_power_limit(struct smu_context *smu,
        if (default_power_limit)
                *default_power_limit = power_limit;
 
-       if (smu->od_enabled) {
+       if (smu->od_enabled)
                od_percent_upper = le32_to_cpu(powerplay_table->overdrive_table.max[SMU_11_0_7_ODSETTING_POWERPERCENTAGE]);
-               od_percent_lower = le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_7_ODSETTING_POWERPERCENTAGE]);
-       } else {
+       else
                od_percent_upper = 0;
-               od_percent_lower = 100;
-       }
+
+       od_percent_lower = le32_to_cpu(powerplay_table->overdrive_table.min[SMU_11_0_7_ODSETTING_POWERPERCENTAGE]);
 
        dev_dbg(smu->adev->dev, "od percent upper:%d, od percent lower:%d (default power: %d)\n",
                                        od_percent_upper, od_percent_lower, power_limit);
index 5a314d0316c1c8410d1f44281e5cc487e4947e81..c7bfa68bf00f400f3396c9853d2c08c6bf971659 100644 (file)
@@ -1442,10 +1442,12 @@ static int smu_v11_0_irq_process(struct amdgpu_device *adev,
                        case 0x3:
                                dev_dbg(adev->dev, "Switched to AC mode!\n");
                                schedule_work(&smu->interrupt_work);
+                               adev->pm.ac_power = true;
                                break;
                        case 0x4:
                                dev_dbg(adev->dev, "Switched to DC mode!\n");
                                schedule_work(&smu->interrupt_work);
+                               adev->pm.ac_power = false;
                                break;
                        case 0x7:
                                /*
index 771a3d457c335e2cc08582a3e4e3a3ba853d2928..c486182ff275222fedfaa1e27c417f9be80d19d0 100644 (file)
@@ -1379,10 +1379,12 @@ static int smu_v13_0_irq_process(struct amdgpu_device *adev,
                        case 0x3:
                                dev_dbg(adev->dev, "Switched to AC mode!\n");
                                smu_v13_0_ack_ac_dc_interrupt(smu);
+                               adev->pm.ac_power = true;
                                break;
                        case 0x4:
                                dev_dbg(adev->dev, "Switched to DC mode!\n");
                                smu_v13_0_ack_ac_dc_interrupt(smu);
+                               adev->pm.ac_power = false;
                                break;
                        case 0x7:
                                /*
index a9b25faa63e468d0069ea08acfd7b90b1b36f056..9b80f18ea6c359f279f050ee9f645b92dd43d057 100644 (file)
@@ -2357,6 +2357,7 @@ static int smu_v13_0_0_get_power_limit(struct smu_context *smu,
        PPTable_t *pptable = table_context->driver_pptable;
        SkuTable_t *skutable = &pptable->SkuTable;
        uint32_t power_limit, od_percent_upper, od_percent_lower;
+       uint32_t msg_limit = skutable->MsgLimits.Power[PPT_THROTTLER_PPT0][POWER_SOURCE_AC];
 
        if (smu_v13_0_get_current_power_limit(smu, &power_limit))
                power_limit = smu->adev->pm.ac_power ?
@@ -2368,19 +2369,18 @@ static int smu_v13_0_0_get_power_limit(struct smu_context *smu,
        if (default_power_limit)
                *default_power_limit = power_limit;
 
-       if (smu->od_enabled) {
+       if (smu->od_enabled)
                od_percent_upper = le32_to_cpu(powerplay_table->overdrive_table.max[SMU_13_0_0_ODSETTING_POWERPERCENTAGE]);
-               od_percent_lower = le32_to_cpu(powerplay_table->overdrive_table.min[SMU_13_0_0_ODSETTING_POWERPERCENTAGE]);
-       } else {
+       else
                od_percent_upper = 0;
-               od_percent_lower = 100;
-       }
+
+       od_percent_lower = le32_to_cpu(powerplay_table->overdrive_table.min[SMU_13_0_0_ODSETTING_POWERPERCENTAGE]);
 
        dev_dbg(smu->adev->dev, "od percent upper:%d, od percent lower:%d (default power: %d)\n",
                                        od_percent_upper, od_percent_lower, power_limit);
 
        if (max_power_limit) {
-               *max_power_limit = power_limit * (100 + od_percent_upper);
+               *max_power_limit = msg_limit * (100 + od_percent_upper);
                *max_power_limit /= 100;
        }
 
@@ -2747,13 +2747,7 @@ static int smu_v13_0_0_set_mp1_state(struct smu_context *smu,
 
        switch (mp1_state) {
        case PP_MP1_STATE_UNLOAD:
-               ret = smu_cmn_send_smc_msg_with_param(smu,
-                                                                                         SMU_MSG_PrepareMp1ForUnload,
-                                                                                         0x55, NULL);
-
-               if (!ret && smu->smu_baco.state == SMU_BACO_STATE_EXIT)
-                       ret = smu_v13_0_disable_pmfw_state(smu);
-
+               ret = smu_cmn_set_mp1_state(smu, mp1_state);
                break;
        default:
                /* Ignore others */
@@ -2949,7 +2943,7 @@ static bool smu_v13_0_0_wbrf_support_check(struct smu_context *smu)
 {
        struct amdgpu_device *adev = smu->adev;
 
-       switch (adev->ip_versions[MP1_HWIP][0]) {
+       switch (amdgpu_ip_version(adev, MP1_HWIP, 0)) {
        case IP_VERSION(13, 0, 0):
                return smu->smc_fw_version >= 0x004e6300;
        case IP_VERSION(13, 0, 10):
@@ -2959,6 +2953,55 @@ static bool smu_v13_0_0_wbrf_support_check(struct smu_context *smu)
        }
 }
 
+static int smu_v13_0_0_set_power_limit(struct smu_context *smu,
+                                      enum smu_ppt_limit_type limit_type,
+                                      uint32_t limit)
+{
+       PPTable_t *pptable = smu->smu_table.driver_pptable;
+       SkuTable_t *skutable = &pptable->SkuTable;
+       uint32_t msg_limit = skutable->MsgLimits.Power[PPT_THROTTLER_PPT0][POWER_SOURCE_AC];
+       struct smu_table_context *table_context = &smu->smu_table;
+       OverDriveTableExternal_t *od_table =
+               (OverDriveTableExternal_t *)table_context->overdrive_table;
+       int ret = 0;
+
+       if (limit_type != SMU_DEFAULT_PPT_LIMIT)
+               return -EINVAL;
+
+       if (limit <= msg_limit) {
+               if (smu->current_power_limit > msg_limit) {
+                       od_table->OverDriveTable.Ppt = 0;
+                       od_table->OverDriveTable.FeatureCtrlMask |= 1U << PP_OD_FEATURE_PPT_BIT;
+
+                       ret = smu_v13_0_0_upload_overdrive_table(smu, od_table);
+                       if (ret) {
+                               dev_err(smu->adev->dev, "Failed to upload overdrive table!\n");
+                               return ret;
+                       }
+               }
+               return smu_v13_0_set_power_limit(smu, limit_type, limit);
+       } else if (smu->od_enabled) {
+               ret = smu_v13_0_set_power_limit(smu, limit_type, msg_limit);
+               if (ret)
+                       return ret;
+
+               od_table->OverDriveTable.Ppt = (limit * 100) / msg_limit - 100;
+               od_table->OverDriveTable.FeatureCtrlMask |= 1U << PP_OD_FEATURE_PPT_BIT;
+
+               ret = smu_v13_0_0_upload_overdrive_table(smu, od_table);
+               if (ret) {
+                               dev_err(smu->adev->dev, "Failed to upload overdrive table!\n");
+                               return ret;
+               }
+
+               smu->current_power_limit = limit;
+       } else {
+               return -EINVAL;
+       }
+
+       return 0;
+}
+
 static const struct pptable_funcs smu_v13_0_0_ppt_funcs = {
        .get_allowed_feature_mask = smu_v13_0_0_get_allowed_feature_mask,
        .set_default_dpm_table = smu_v13_0_0_set_default_dpm_table,
@@ -3013,7 +3056,7 @@ static const struct pptable_funcs smu_v13_0_0_ppt_funcs = {
        .set_fan_control_mode = smu_v13_0_set_fan_control_mode,
        .enable_mgpu_fan_boost = smu_v13_0_0_enable_mgpu_fan_boost,
        .get_power_limit = smu_v13_0_0_get_power_limit,
-       .set_power_limit = smu_v13_0_set_power_limit,
+       .set_power_limit = smu_v13_0_0_set_power_limit,
        .set_power_source = smu_v13_0_set_power_source,
        .get_power_profile_mode = smu_v13_0_0_get_power_profile_mode,
        .set_power_profile_mode = smu_v13_0_0_set_power_profile_mode,
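The new set_power_limit path (repeated for smu_v13_0_7 further down) encodes any request above the stock message limit as an overdrive percentage, and the matching get_power_limit change reports the maximum from msg_limit rather than the current limit. A worked example with hypothetical numbers:

	/* Assume msg_limit = 300 (stock limit) and od_percent_upper = 15. */
	uint32_t limit = 330;						/* user request */
	uint32_t ppt = (limit * 100) / msg_limit - 100;			/* 10 -> +10% OD */
	uint32_t max_limit = msg_limit * (100 + od_percent_upper) / 100;	/* 345 */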
index 3c98a8a0386a2612d0470dd6dbd767a6ecc308b0..7e1941cf17964c594dc8821c3fcda75f64e9f145 100644 (file)
@@ -160,8 +160,8 @@ static const struct cmn2asic_msg_mapping smu_v13_0_6_message_map[SMU_MSG_MAX_COU
        MSG_MAP(GfxDriverResetRecovery,              PPSMC_MSG_GfxDriverResetRecovery,          0),
        MSG_MAP(GetMinGfxclkFrequency,               PPSMC_MSG_GetMinGfxDpmFreq,                1),
        MSG_MAP(GetMaxGfxclkFrequency,               PPSMC_MSG_GetMaxGfxDpmFreq,                1),
-       MSG_MAP(SetSoftMinGfxclk,                    PPSMC_MSG_SetSoftMinGfxClk,                0),
-       MSG_MAP(SetSoftMaxGfxClk,                    PPSMC_MSG_SetSoftMaxGfxClk,                0),
+       MSG_MAP(SetSoftMinGfxclk,                    PPSMC_MSG_SetSoftMinGfxClk,                1),
+       MSG_MAP(SetSoftMaxGfxClk,                    PPSMC_MSG_SetSoftMaxGfxClk,                1),
        MSG_MAP(PrepareMp1ForUnload,                 PPSMC_MSG_PrepareForDriverUnload,          0),
        MSG_MAP(GetCTFLimit,                         PPSMC_MSG_GetCTFLimit,                     0),
        MSG_MAP(GetThermalLimit,                     PPSMC_MSG_ReadThrottlerLimit,              0),
index 59606a19e3d2b4494885b72f36f6524ec25b5139..3dc7b60cb0754d0f62fd3cead74f1553071b8597 100644 (file)
@@ -2321,6 +2321,7 @@ static int smu_v13_0_7_get_power_limit(struct smu_context *smu,
        PPTable_t *pptable = table_context->driver_pptable;
        SkuTable_t *skutable = &pptable->SkuTable;
        uint32_t power_limit, od_percent_upper, od_percent_lower;
+       uint32_t msg_limit = skutable->MsgLimits.Power[PPT_THROTTLER_PPT0][POWER_SOURCE_AC];
 
        if (smu_v13_0_get_current_power_limit(smu, &power_limit))
                power_limit = smu->adev->pm.ac_power ?
@@ -2332,19 +2333,18 @@ static int smu_v13_0_7_get_power_limit(struct smu_context *smu,
        if (default_power_limit)
                *default_power_limit = power_limit;
 
-       if (smu->od_enabled) {
+       if (smu->od_enabled)
                od_percent_upper = le32_to_cpu(powerplay_table->overdrive_table.max[SMU_13_0_7_ODSETTING_POWERPERCENTAGE]);
-               od_percent_lower = le32_to_cpu(powerplay_table->overdrive_table.min[SMU_13_0_7_ODSETTING_POWERPERCENTAGE]);
-       } else {
+       else
                od_percent_upper = 0;
-               od_percent_lower = 100;
-       }
+
+       od_percent_lower = le32_to_cpu(powerplay_table->overdrive_table.min[SMU_13_0_7_ODSETTING_POWERPERCENTAGE]);
 
        dev_dbg(smu->adev->dev, "od percent upper:%d, od percent lower:%d (default power: %d)\n",
                                        od_percent_upper, od_percent_lower, power_limit);
 
        if (max_power_limit) {
-               *max_power_limit = power_limit * (100 + od_percent_upper);
+               *max_power_limit = msg_limit * (100 + od_percent_upper);
                *max_power_limit /= 100;
        }
 
@@ -2504,13 +2504,7 @@ static int smu_v13_0_7_set_mp1_state(struct smu_context *smu,
 
        switch (mp1_state) {
        case PP_MP1_STATE_UNLOAD:
-               ret = smu_cmn_send_smc_msg_with_param(smu,
-                                                                                         SMU_MSG_PrepareMp1ForUnload,
-                                                                                         0x55, NULL);
-
-               if (!ret && smu->smu_baco.state == SMU_BACO_STATE_EXIT)
-                       ret = smu_v13_0_disable_pmfw_state(smu);
-
+               ret = smu_cmn_set_mp1_state(smu, mp1_state);
                break;
        default:
                /* Ignore others */
@@ -2545,6 +2539,55 @@ static bool smu_v13_0_7_wbrf_support_check(struct smu_context *smu)
        return smu->smc_fw_version > 0x00524600;
 }
 
+static int smu_v13_0_7_set_power_limit(struct smu_context *smu,
+                                      enum smu_ppt_limit_type limit_type,
+                                      uint32_t limit)
+{
+       PPTable_t *pptable = smu->smu_table.driver_pptable;
+       SkuTable_t *skutable = &pptable->SkuTable;
+       uint32_t msg_limit = skutable->MsgLimits.Power[PPT_THROTTLER_PPT0][POWER_SOURCE_AC];
+       struct smu_table_context *table_context = &smu->smu_table;
+       OverDriveTableExternal_t *od_table =
+               (OverDriveTableExternal_t *)table_context->overdrive_table;
+       int ret = 0;
+
+       if (limit_type != SMU_DEFAULT_PPT_LIMIT)
+               return -EINVAL;
+
+       if (limit <= msg_limit) {
+               if (smu->current_power_limit > msg_limit) {
+                       od_table->OverDriveTable.Ppt = 0;
+                       od_table->OverDriveTable.FeatureCtrlMask |= 1U << PP_OD_FEATURE_PPT_BIT;
+
+                       ret = smu_v13_0_7_upload_overdrive_table(smu, od_table);
+                       if (ret) {
+                               dev_err(smu->adev->dev, "Failed to upload overdrive table!\n");
+                               return ret;
+                       }
+               }
+               return smu_v13_0_set_power_limit(smu, limit_type, limit);
+       } else if (smu->od_enabled) {
+               ret = smu_v13_0_set_power_limit(smu, limit_type, msg_limit);
+               if (ret)
+                       return ret;
+
+               od_table->OverDriveTable.Ppt = (limit * 100) / msg_limit - 100;
+               od_table->OverDriveTable.FeatureCtrlMask |= 1U << PP_OD_FEATURE_PPT_BIT;
+
+               ret = smu_v13_0_7_upload_overdrive_table(smu, od_table);
+               if (ret) {
+                               dev_err(smu->adev->dev, "Failed to upload overdrive table!\n");
+                               return ret;
+               }
+
+               smu->current_power_limit = limit;
+       } else {
+               return -EINVAL;
+       }
+
+       return 0;
+}
+
 static const struct pptable_funcs smu_v13_0_7_ppt_funcs = {
        .get_allowed_feature_mask = smu_v13_0_7_get_allowed_feature_mask,
        .set_default_dpm_table = smu_v13_0_7_set_default_dpm_table,
@@ -2596,7 +2639,7 @@ static const struct pptable_funcs smu_v13_0_7_ppt_funcs = {
        .set_fan_control_mode = smu_v13_0_set_fan_control_mode,
        .enable_mgpu_fan_boost = smu_v13_0_7_enable_mgpu_fan_boost,
        .get_power_limit = smu_v13_0_7_get_power_limit,
-       .set_power_limit = smu_v13_0_set_power_limit,
+       .set_power_limit = smu_v13_0_7_set_power_limit,
        .set_power_source = smu_v13_0_set_power_source,
        .get_power_profile_mode = smu_v13_0_7_get_power_profile_mode,
        .set_power_profile_mode = smu_v13_0_7_set_power_profile_mode,
index 4894f7ee737b41dd0e81503b5cb7f3fc1182a6e6..6dae5ad74ff081c4616304ada3a6af302cc21f87 100644 (file)
@@ -229,8 +229,6 @@ int smu_v14_0_check_fw_version(struct smu_context *smu)
                smu->smc_driver_if_version = SMU14_DRIVER_IF_VERSION_SMU_V14_0_2;
                break;
        case IP_VERSION(14, 0, 0):
-               if ((smu->smc_fw_version < 0x5d3a00))
-                       dev_warn(smu->adev->dev, "The PMFW version(%x) is behind in this BIOS!\n", smu->smc_fw_version);
                smu->smc_driver_if_version = SMU14_DRIVER_IF_VERSION_SMU_V14_0_0;
                break;
        default:
index 47fdbae4adfc0207f628c8e2e156703b960e5829..9310c4758e38ce9791ba8d61ce61a2face051fe8 100644 (file)
@@ -261,7 +261,10 @@ static int smu_v14_0_0_get_smu_metrics_data(struct smu_context *smu,
                *value = metrics->MpipuclkFrequency;
                break;
        case METRICS_AVERAGE_GFXACTIVITY:
-               *value = metrics->GfxActivity / 100;
+               if (smu->smc_fw_version > 0x5d4600)
+                       *value = metrics->GfxActivity;
+               else
+                       *value = metrics->GfxActivity / 100;
                break;
        case METRICS_AVERAGE_VCNACTIVITY:
                *value = metrics->VcnActivity / 100;
index ef31033439bc15a896ed8748b7a62a8b46336c13..29d91493b101acb5234c9a2fe76441925b346f55 100644 (file)
@@ -1762,6 +1762,7 @@ static ssize_t anx7625_aux_transfer(struct drm_dp_aux *aux,
        u8 request = msg->request & ~DP_AUX_I2C_MOT;
        int ret = 0;
 
+       mutex_lock(&ctx->aux_lock);
        pm_runtime_get_sync(dev);
        msg->reply = 0;
        switch (request) {
@@ -1778,6 +1779,7 @@ static ssize_t anx7625_aux_transfer(struct drm_dp_aux *aux,
                                        msg->size, msg->buffer);
        pm_runtime_mark_last_busy(dev);
        pm_runtime_put_autosuspend(dev);
+       mutex_unlock(&ctx->aux_lock);
 
        return ret;
 }
@@ -2474,7 +2476,9 @@ static void anx7625_bridge_atomic_disable(struct drm_bridge *bridge,
        ctx->connector = NULL;
        anx7625_dp_stop(ctx);
 
-       pm_runtime_put_sync(dev);
+       mutex_lock(&ctx->aux_lock);
+       pm_runtime_put_sync_suspend(dev);
+       mutex_unlock(&ctx->aux_lock);
 }
 
 static enum drm_connector_status
@@ -2668,6 +2672,7 @@ static int anx7625_i2c_probe(struct i2c_client *client)
 
        mutex_init(&platform->lock);
        mutex_init(&platform->hdcp_wq_lock);
+       mutex_init(&platform->aux_lock);
 
        INIT_DELAYED_WORK(&platform->hdcp_work, hdcp_check_work_func);
        platform->hdcp_workqueue = create_workqueue("hdcp workqueue");
index 66ebee7f3d832534ec64b780bdfa985bbfcfc896..39ed35d338363390d2fe37b765d4e0e48dc0118e 100644 (file)
@@ -475,6 +475,8 @@ struct anx7625_data {
        struct workqueue_struct *hdcp_workqueue;
        /* Lock for hdcp work queue */
        struct mutex hdcp_wq_lock;
+       /* Lock for aux transfer and disable */
+       struct mutex aux_lock;
        char edid_block;
        struct display_timing dt;
        u8 display_timing_valid;
index bb55f697a1819264e1320f6118d28dd776236f4f..6886db2d9e00c4544ee3d81e29e779f806c8a9b7 100644 (file)
@@ -25,20 +25,18 @@ static void drm_aux_hpd_bridge_release(struct device *dev)
        ida_free(&drm_aux_hpd_bridge_ida, adev->id);
 
        of_node_put(adev->dev.platform_data);
+       of_node_put(adev->dev.of_node);
 
        kfree(adev);
 }
 
-static void drm_aux_hpd_bridge_unregister_adev(void *_adev)
+static void drm_aux_hpd_bridge_free_adev(void *_adev)
 {
-       struct auxiliary_device *adev = _adev;
-
-       auxiliary_device_delete(adev);
-       auxiliary_device_uninit(adev);
+       auxiliary_device_uninit(_adev);
 }
 
 /**
- * drm_dp_hpd_bridge_register - Create a simple HPD DisplayPort bridge
+ * devm_drm_dp_hpd_bridge_alloc - allocate an HPD DisplayPort bridge
  * @parent: device instance providing this bridge
  * @np: device node pointer corresponding to this bridge instance
  *
@@ -46,11 +44,9 @@ static void drm_aux_hpd_bridge_unregister_adev(void *_adev)
  * DRM_MODE_CONNECTOR_DisplayPort, which terminates the bridge chain and is
  * able to send the HPD events.
  *
- * Return: device instance that will handle created bridge or an error code
- * encoded into the pointer.
+ * Return: bridge auxiliary device pointer or an error pointer
  */
-struct device *drm_dp_hpd_bridge_register(struct device *parent,
-                                         struct device_node *np)
+struct auxiliary_device *devm_drm_dp_hpd_bridge_alloc(struct device *parent, struct device_node *np)
 {
        struct auxiliary_device *adev;
        int ret;
@@ -74,18 +70,62 @@ struct device *drm_dp_hpd_bridge_register(struct device *parent,
 
        ret = auxiliary_device_init(adev);
        if (ret) {
+               of_node_put(adev->dev.platform_data);
+               of_node_put(adev->dev.of_node);
                ida_free(&drm_aux_hpd_bridge_ida, adev->id);
                kfree(adev);
                return ERR_PTR(ret);
        }
 
-       ret = auxiliary_device_add(adev);
-       if (ret) {
-               auxiliary_device_uninit(adev);
+       ret = devm_add_action_or_reset(parent, drm_aux_hpd_bridge_free_adev, adev);
+       if (ret)
                return ERR_PTR(ret);
-       }
 
-       ret = devm_add_action_or_reset(parent, drm_aux_hpd_bridge_unregister_adev, adev);
+       return adev;
+}
+EXPORT_SYMBOL_GPL(devm_drm_dp_hpd_bridge_alloc);
+
+static void drm_aux_hpd_bridge_del_adev(void *_adev)
+{
+       auxiliary_device_delete(_adev);
+}
+
+/**
+ * devm_drm_dp_hpd_bridge_add - register an HPD DisplayPort bridge
+ * @dev: struct device to tie registration lifetime to
+ * @adev: bridge auxiliary device to be registered
+ *
+ * Returns: zero on success or a negative errno
+ */
+int devm_drm_dp_hpd_bridge_add(struct device *dev, struct auxiliary_device *adev)
+{
+       int ret;
+
+       ret = auxiliary_device_add(adev);
+       if (ret)
+               return ret;
+
+       return devm_add_action_or_reset(dev, drm_aux_hpd_bridge_del_adev, adev);
+}
+EXPORT_SYMBOL_GPL(devm_drm_dp_hpd_bridge_add);
+
+/**
+ * drm_dp_hpd_bridge_register - allocate and register an HPD DisplayPort bridge
+ * @parent: device instance providing this bridge
+ * @np: device node pointer corresponding to this bridge instance
+ *
+ * Return: device instance that will handle created bridge or an error pointer
+ */
+struct device *drm_dp_hpd_bridge_register(struct device *parent, struct device_node *np)
+{
+       struct auxiliary_device *adev;
+       int ret;
+
+       adev = devm_drm_dp_hpd_bridge_alloc(parent, np);
+       if (IS_ERR(adev))
+               return ERR_CAST(adev);
+
+       ret = devm_drm_dp_hpd_bridge_add(parent, adev);
        if (ret)
                return ERR_PTR(ret);
 
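A minimal consumer sketch of the split API (hypothetical driver code): allocate first so driver state can be wired up before the device becomes visible, then publish it:

	static int hypothetical_probe(struct device *dev, struct device_node *np)
	{
		struct auxiliary_device *adev;

		adev = devm_drm_dp_hpd_bridge_alloc(dev, np);
		if (IS_ERR(adev))
			return PTR_ERR(adev);

		/* stash adev, finish any setup that must precede registration */

		return devm_drm_dp_hpd_bridge_add(dev, adev);
	}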
index 541e4f5afc4c86a4e87b74a016885d6231afb892..14d4dcf239da835955f1d594579dd165288bd63f 100644 (file)
@@ -107,6 +107,7 @@ struct ps8640 {
        struct device_link *link;
        bool pre_enabled;
        bool need_post_hpd_delay;
+       struct mutex aux_lock;
 };
 
 static const struct regmap_config ps8640_regmap_config[] = {
@@ -345,11 +346,20 @@ static ssize_t ps8640_aux_transfer(struct drm_dp_aux *aux,
        struct device *dev = &ps_bridge->page[PAGE0_DP_CNTL]->dev;
        int ret;
 
+       mutex_lock(&ps_bridge->aux_lock);
        pm_runtime_get_sync(dev);
+       ret = _ps8640_wait_hpd_asserted(ps_bridge, 200 * 1000);
+       if (ret) {
+               pm_runtime_put_sync_suspend(dev);
+               goto exit;
+       }
        ret = ps8640_aux_transfer_msg(aux, msg);
        pm_runtime_mark_last_busy(dev);
        pm_runtime_put_autosuspend(dev);
 
+exit:
+       mutex_unlock(&ps_bridge->aux_lock);
+
        return ret;
 }
 
@@ -470,7 +480,18 @@ static void ps8640_atomic_post_disable(struct drm_bridge *bridge,
        ps_bridge->pre_enabled = false;
 
        ps8640_bridge_vdo_control(ps_bridge, DISABLE);
+
+       /*
+        * The bridge seems to expect everything to be power cycled at the
+        * disable process, so grab a lock here to make sure
+        * ps8640_aux_transfer() is not holding a runtime PM reference and
+        * preventing the bridge from suspending.
+        */
+       mutex_lock(&ps_bridge->aux_lock);
+
        pm_runtime_put_sync_suspend(&ps_bridge->page[PAGE0_DP_CNTL]->dev);
+
+       mutex_unlock(&ps_bridge->aux_lock);
 }
 
 static int ps8640_bridge_attach(struct drm_bridge *bridge,
@@ -619,6 +640,8 @@ static int ps8640_probe(struct i2c_client *client)
        if (!ps_bridge)
                return -ENOMEM;
 
+       mutex_init(&ps_bridge->aux_lock);
+
        ps_bridge->supplies[0].supply = "vdd12";
        ps_bridge->supplies[1].supply = "vdd33";
        ret = devm_regulator_bulk_get(dev, ARRAY_SIZE(ps_bridge->supplies),
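This is the same serialization pattern just added to anx7625 above: every aux transfer holds aux_lock across its runtime-PM get/put, and the disable path takes the same lock around pm_runtime_put_sync_suspend(), so a transfer can never pin the device awake across the power cycle. Condensed:

	/* transfer path */
	mutex_lock(&bridge->aux_lock);
	pm_runtime_get_sync(dev);
	/* ... aux transaction ... */
	pm_runtime_mark_last_busy(dev);
	pm_runtime_put_autosuspend(dev);
	mutex_unlock(&bridge->aux_lock);

	/* disable path */
	mutex_lock(&bridge->aux_lock);
	pm_runtime_put_sync_suspend(dev);	/* cannot race an in-flight transfer */
	mutex_unlock(&bridge->aux_lock);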
index be5914caa17d546601d11719976161624c1a420f..63a1a0c88be4d98d169996d341de5d0d1b6cae91 100644 (file)
@@ -969,10 +969,6 @@ static int samsung_dsim_init_link(struct samsung_dsim *dsi)
        reg = samsung_dsim_read(dsi, DSIM_ESCMODE_REG);
        reg &= ~DSIM_STOP_STATE_CNT_MASK;
        reg |= DSIM_STOP_STATE_CNT(driver_data->reg_values[STOP_STATE_CNT]);
-
-       if (!samsung_dsim_hw_is_exynos(dsi->plat_data->hw_type))
-               reg |= DSIM_FORCE_STOP_STATE;
-
        samsung_dsim_write(dsi, DSIM_ESCMODE_REG, reg);
 
        reg = DSIM_BTA_TIMEOUT(0xff) | DSIM_LPDR_TIMEOUT(0xffff);
@@ -1431,18 +1427,6 @@ static void samsung_dsim_disable_irq(struct samsung_dsim *dsi)
        disable_irq(dsi->irq);
 }
 
-static void samsung_dsim_set_stop_state(struct samsung_dsim *dsi, bool enable)
-{
-       u32 reg = samsung_dsim_read(dsi, DSIM_ESCMODE_REG);
-
-       if (enable)
-               reg |= DSIM_FORCE_STOP_STATE;
-       else
-               reg &= ~DSIM_FORCE_STOP_STATE;
-
-       samsung_dsim_write(dsi, DSIM_ESCMODE_REG, reg);
-}
-
 static int samsung_dsim_init(struct samsung_dsim *dsi)
 {
        const struct samsung_dsim_driver_data *driver_data = dsi->driver_data;
@@ -1492,9 +1476,6 @@ static void samsung_dsim_atomic_pre_enable(struct drm_bridge *bridge,
                ret = samsung_dsim_init(dsi);
                if (ret)
                        return;
-
-               samsung_dsim_set_display_mode(dsi);
-               samsung_dsim_set_display_enable(dsi, true);
        }
 }
 
@@ -1503,12 +1484,8 @@ static void samsung_dsim_atomic_enable(struct drm_bridge *bridge,
 {
        struct samsung_dsim *dsi = bridge_to_dsi(bridge);
 
-       if (samsung_dsim_hw_is_exynos(dsi->plat_data->hw_type)) {
-               samsung_dsim_set_display_mode(dsi);
-               samsung_dsim_set_display_enable(dsi, true);
-       } else {
-               samsung_dsim_set_stop_state(dsi, false);
-       }
+       samsung_dsim_set_display_mode(dsi);
+       samsung_dsim_set_display_enable(dsi, true);
 
        dsi->state |= DSIM_STATE_VIDOUT_AVAILABLE;
 }
@@ -1521,9 +1498,6 @@ static void samsung_dsim_atomic_disable(struct drm_bridge *bridge,
        if (!(dsi->state & DSIM_STATE_ENABLED))
                return;
 
-       if (!samsung_dsim_hw_is_exynos(dsi->plat_data->hw_type))
-               samsung_dsim_set_stop_state(dsi, true);
-
        dsi->state &= ~DSIM_STATE_VIDOUT_AVAILABLE;
 }
 
@@ -1828,8 +1802,6 @@ static ssize_t samsung_dsim_host_transfer(struct mipi_dsi_host *host,
        if (ret)
                return ret;
 
-       samsung_dsim_set_stop_state(dsi, false);
-
        ret = mipi_dsi_create_packet(&xfer.packet, msg);
        if (ret < 0)
                return ret;
index 2bdc5b439bebd56407af3b5b04892b3ac90678d4..4560ae9cbce15095eddaf6296396960a7887ab06 100644 (file)
@@ -1080,6 +1080,26 @@ static int sii902x_init(struct sii902x *sii902x)
                        return ret;
        }
 
+       ret = sii902x_audio_codec_init(sii902x, dev);
+       if (ret)
+               return ret;
+
+       i2c_set_clientdata(sii902x->i2c, sii902x);
+
+       sii902x->i2cmux = i2c_mux_alloc(sii902x->i2c->adapter, dev,
+                                       1, 0, I2C_MUX_GATE,
+                                       sii902x_i2c_bypass_select,
+                                       sii902x_i2c_bypass_deselect);
+       if (!sii902x->i2cmux) {
+               ret = -ENOMEM;
+               goto err_unreg_audio;
+       }
+
+       sii902x->i2cmux->priv = sii902x;
+       ret = i2c_mux_add_adapter(sii902x->i2cmux, 0, 0, 0);
+       if (ret)
+               goto err_unreg_audio;
+
        sii902x->bridge.funcs = &sii902x_bridge_funcs;
        sii902x->bridge.of_node = dev->of_node;
        sii902x->bridge.timings = &default_sii902x_timings;
@@ -1090,19 +1110,13 @@ static int sii902x_init(struct sii902x *sii902x)
 
        drm_bridge_add(&sii902x->bridge);
 
-       sii902x_audio_codec_init(sii902x, dev);
-
-       i2c_set_clientdata(sii902x->i2c, sii902x);
+       return 0;
 
-       sii902x->i2cmux = i2c_mux_alloc(sii902x->i2c->adapter, dev,
-                                       1, 0, I2C_MUX_GATE,
-                                       sii902x_i2c_bypass_select,
-                                       sii902x_i2c_bypass_deselect);
-       if (!sii902x->i2cmux)
-               return -ENOMEM;
+err_unreg_audio:
+       if (!PTR_ERR_OR_ZERO(sii902x->audio.pdev))
+               platform_device_unregister(sii902x->audio.pdev);
 
-       sii902x->i2cmux->priv = sii902x;
-       return i2c_mux_add_adapter(sii902x->i2cmux, 0, 0, 0);
+       return ret;
 }
 
 static int sii902x_probe(struct i2c_client *client)
@@ -1170,12 +1184,14 @@ static int sii902x_probe(struct i2c_client *client)
 }
 
 static void sii902x_remove(struct i2c_client *client)
-
 {
        struct sii902x *sii902x = i2c_get_clientdata(client);
 
-       i2c_mux_del_adapters(sii902x->i2cmux);
        drm_bridge_remove(&sii902x->bridge);
+       i2c_mux_del_adapters(sii902x->i2cmux);
+
+       if (!PTR_ERR_OR_ZERO(sii902x->audio.pdev))
+               platform_device_unregister(sii902x->audio.pdev);
 }
 
 static const struct of_device_id sii902x_dt_ids[] = {
index bd6c24d4213cdf2f6bcb132848330f43e4546efd..f7c6b60629c2ba5b178145977d8490a6e094ce71 100644 (file)
@@ -5491,6 +5491,7 @@ EXPORT_SYMBOL(drm_dp_mst_atomic_enable_dsc);
  *   - 0 if the new state is valid
  *   - %-ENOSPC, if the new state is invalid, because of BW limitation
  *         @failing_port is set to:
+ *
  *         - The non-root port where a BW limit check failed
  *           with all the ports downstream of @failing_port passing
  *           the BW limit check.
@@ -5499,6 +5500,7 @@ EXPORT_SYMBOL(drm_dp_mst_atomic_enable_dsc);
  *         - %NULL if the BW limit check failed at the root port
  *           with all the ports downstream of the root port passing
  *           the BW limit check.
+ *
  *   - %-EINVAL, if the new state is invalid, because the root port has
  *     too many payloads.
  */
index f57e6d74fb0e039a710b9bd8161a8e8e25d5888b..5ebdd6f8f36e6bc8d67e99a54bac3856d45ac9eb 100644 (file)
@@ -332,6 +332,7 @@ alloc_range_bias(struct drm_buddy *mm,
                 u64 start, u64 end,
                 unsigned int order)
 {
+       u64 req_size = mm->chunk_size << order;
        struct drm_buddy_block *block;
        struct drm_buddy_block *buddy;
        LIST_HEAD(dfs);
@@ -367,6 +368,15 @@ alloc_range_bias(struct drm_buddy *mm,
                if (drm_buddy_block_is_allocated(block))
                        continue;
 
+               if (block_start < start || block_end > end) {
+                       u64 adjusted_start = max(block_start, start);
+                       u64 adjusted_end = min(block_end, end);
+
+                       if (round_down(adjusted_end + 1, req_size) <=
+                           round_up(adjusted_start, req_size))
+                               continue;
+               }
+
                if (contains(start, end, block_start, block_end) &&
                    order == drm_buddy_block_order(block)) {
                        /*
@@ -538,7 +548,13 @@ static int __alloc_range(struct drm_buddy *mm,
                list_add(&block->left->tmp_link, dfs);
        } while (1);
 
+       if (total_allocated < size) {
+               err = -ENOSPC;
+               goto err_free;
+       }
+
        list_splice_tail(&allocated, blocks);
+
        return 0;
 
 err_undo:
@@ -755,8 +771,12 @@ int drm_buddy_alloc_blocks(struct drm_buddy *mm,
                return -EINVAL;
 
        /* Actual range allocation */
-       if (start + size == end)
+       if (start + size == end) {
+               if (!IS_ALIGNED(start | end, min_block_size))
+                       return -EINVAL;
+
                return __drm_buddy_alloc_range(mm, start, size, NULL, blocks);
+       }
 
        original_size = size;
        original_min_size = min_block_size;
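The new rejection test in alloc_range_bias() skips any block whose overlap with [start, end] cannot hold one aligned block of the requested order. A worked example with hypothetical sizes (chunk_size = 4K, order = 2, so req_size = 16K):

	u64 req_size       = 4096ULL << 2;	/* 16K */
	u64 adjusted_start = 60 * 1024;		/* max(block_start, start) */
	u64 adjusted_end   = 64 * 1024 - 1;	/* min(block_end, end), inclusive */

	/* round_up(60K, 16K) = 64K and round_down(64K, 16K) = 64K: 64K <= 64K,
	 * so no aligned 16K chunk fits in [60K, 64K) and the block is skipped. */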
index cb90e70d85e862a495f2e8691813161a93b7a030..65f9f66933bba2785fc3b64f7040e676c2afd352 100644 (file)
@@ -904,6 +904,7 @@ out:
        connector_set = NULL;
        fb = NULL;
        mode = NULL;
+       num_connectors = 0;
 
        DRM_MODESET_LOCK_ALL_END(dev, ctx, ret);
 
index 834a5e28abbe5959cc6da2933904c4feb4c7b00d..7352bde299d54767fecb34232cb5941a01d6ea88 100644 (file)
@@ -820,7 +820,7 @@ struct sg_table *drm_prime_pages_to_sg(struct drm_device *dev,
        if (max_segment == 0)
                max_segment = UINT_MAX;
        err = sg_alloc_table_from_pages_segment(sg, pages, nr_pages, 0,
-                                               nr_pages << PAGE_SHIFT,
+                                               (unsigned long)nr_pages << PAGE_SHIFT,
                                                max_segment, GFP_KERNEL);
        if (err) {
                kfree(sg);
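The cast matters because nr_pages is a 32-bit quantity: the shift was previously performed in 32 bits and wrapped for mappings of 4 GiB or more (assuming 4K pages). Illustration:

	unsigned int nr_pages = 1u << 20;			/* 4 GiB of 4K pages */
	size_t bad  = nr_pages << PAGE_SHIFT;			/* 32-bit shift wraps to 0 */
	size_t good = (unsigned long)nr_pages << PAGE_SHIFT;	/* 0x100000000 */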
index 3f479483d7d80f21febcda087570bcc6af2fd34c..23b4e9a3361d82e0d5bc6a1daf121a0645011d96 100644 (file)
@@ -760,9 +760,11 @@ static void output_poll_execute(struct work_struct *work)
        changed = dev->mode_config.delayed_event;
        dev->mode_config.delayed_event = false;
 
-       if (!drm_kms_helper_poll && dev->mode_config.poll_running) {
-               drm_kms_helper_disable_hpd(dev);
-               dev->mode_config.poll_running = false;
+       if (!drm_kms_helper_poll) {
+               if (dev->mode_config.poll_running) {
+                       drm_kms_helper_disable_hpd(dev);
+                       dev->mode_config.poll_running = false;
+               }
                goto out;
        }
 
index 84101baeecc6e67d562e15e7d5ded57df39ffdc1..a6c19de462928ed70da033a60b10f08061bd1dc8 100644 (file)
@@ -1040,7 +1040,8 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
        uint64_t *points;
        uint32_t signaled_count, i;
 
-       if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT)
+       if (flags & (DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT |
+                    DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE))
                lockdep_assert_none_held_once();
 
        points = kmalloc_array(count, sizeof(*points), GFP_KERNEL);
@@ -1109,7 +1110,8 @@ static signed long drm_syncobj_array_wait_timeout(struct drm_syncobj **syncobjs,
         * fallthough and try a 0 timeout wait!
         */
 
-       if (flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT) {
+       if (flags & (DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT |
+                    DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE)) {
                for (i = 0; i < count; ++i)
                        drm_syncobj_fence_add_wait(syncobjs[i], &entries[i]);
        }
@@ -1416,10 +1418,21 @@ syncobj_eventfd_entry_func(struct drm_syncobj *syncobj,
 
        /* This happens inside the syncobj lock */
        fence = dma_fence_get(rcu_dereference_protected(syncobj->fence, 1));
+       if (!fence)
+               return;
+
        ret = dma_fence_chain_find_seqno(&fence, entry->point);
-       if (ret != 0 || !fence) {
+       if (ret != 0) {
+               /* The given seqno has not been submitted yet. */
                dma_fence_put(fence);
                return;
+       } else if (!fence) {
+               /* If dma_fence_chain_find_seqno returns 0 but sets the fence
+                * to NULL, it implies that the given seqno is signaled and a
+                * later seqno has already been submitted. Assign a stub fence
+                * so that the eventfd still gets signaled below.
+                */
+               fence = dma_fence_get_stub();
        }
 
        list_del_init(&entry->node);
index 776f2f0b602debb88a6c820add8d737332f2938e..0ef7bc8848b0798b125f7a65ff04cf4586f13d71 100644 (file)
@@ -319,9 +319,9 @@ static void decon_win_set_bldmod(struct decon_context *ctx, unsigned int win,
 static void decon_win_set_pixfmt(struct decon_context *ctx, unsigned int win,
                                 struct drm_framebuffer *fb)
 {
-       struct exynos_drm_plane plane = ctx->planes[win];
+       struct exynos_drm_plane *plane = &ctx->planes[win];
        struct exynos_drm_plane_state *state =
-               to_exynos_plane_state(plane.base.state);
+               to_exynos_plane_state(plane->base.state);
        unsigned int alpha = state->base.alpha;
        unsigned int pixel_alpha;
        unsigned long val;
index a9f1c5c058940178c8318484fcea422f1428de69..f2145227a1e0ce889d2ce0a3926a79e64c832fc9 100644 (file)
@@ -480,7 +480,7 @@ static void fimd_commit(struct exynos_drm_crtc *crtc)
        struct fimd_context *ctx = crtc->ctx;
        struct drm_display_mode *mode = &crtc->base.state->adjusted_mode;
        const struct fimd_driver_data *driver_data = ctx->driver_data;
-       void *timing_base = ctx->regs + driver_data->timing_base;
+       void __iomem *timing_base = ctx->regs + driver_data->timing_base;
        u32 val;
 
        if (ctx->suspended)
@@ -661,9 +661,9 @@ static void fimd_win_set_bldmod(struct fimd_context *ctx, unsigned int win,
 static void fimd_win_set_pixfmt(struct fimd_context *ctx, unsigned int win,
                                struct drm_framebuffer *fb, int width)
 {
-       struct exynos_drm_plane plane = ctx->planes[win];
+       struct exynos_drm_plane *plane = &ctx->planes[win];
        struct exynos_drm_plane_state *state =
-               to_exynos_plane_state(plane.base.state);
+               to_exynos_plane_state(plane->base.state);
        uint32_t pixel_format = fb->format->format;
        unsigned int alpha = state->base.alpha;
        u32 val = WINCONx_ENWIN;
index e9a769590415dcd0d7899df16254d7c20cdea8b1..180507a477009d6e424cc5aede8e18255127b3f1 100644 (file)
@@ -1341,7 +1341,7 @@ static int __maybe_unused gsc_runtime_resume(struct device *dev)
        for (i = 0; i < ctx->num_clocks; i++) {
                ret = clk_prepare_enable(ctx->clocks[i]);
                if (ret) {
-                       while (--i > 0)
+                       while (--i >= 0)
                                clk_disable_unprepare(ctx->clocks[i]);
                        return ret;
                }
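The unwind fix matters only for clocks[0]: on a failure at index i, the cleanup loop must disable indices i-1 down to 0, and the old condition stopped one short. For a failure at i == 2:

	/* old: --i > 0   -> disables clocks[1]            (clocks[0] leaked)
	 * new: --i >= 0  -> disables clocks[1], clocks[0] (fully unwound)
	 */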
index b5d6e3352071f5c3765f074200180bd6dfc7685c..3089029abba481828522070dc0063eaa79251bf9 100644 (file)
@@ -140,7 +140,7 @@ config DRM_I915_GVT_KVMGT
 
           Note that this driver only supports newer devices from Broadwell on.
           For further information and a setup guide, you can visit:
-         http://01.org/igvt-g.
+         https://github.com/intel/gvt-linux/wiki.
 
          If in doubt, say "N".
 
index e777686190ca241f0ed288b2ad32645d41f1a288..c13f14edb50889baa604b044d2324a371e444ed5 100644 (file)
@@ -17,7 +17,6 @@ subdir-ccflags-y += $(call cc-option, -Wunused-const-variable)
 subdir-ccflags-y += $(call cc-option, -Wpacked-not-aligned)
 subdir-ccflags-y += $(call cc-option, -Wformat-overflow)
 subdir-ccflags-y += $(call cc-option, -Wformat-truncation)
-subdir-ccflags-y += $(call cc-option, -Wstringop-overflow)
 subdir-ccflags-y += $(call cc-option, -Wstringop-truncation)
 # The following turn off the warnings enabled by -Wextra
 ifeq ($(findstring 2, $(KBUILD_EXTRA_WARN)),)
index ac456a2275dbad62cb9a4ac7f706333c73dd03aa..eda4a8b885904de71bb6e3bb1998fa1242a1b9a7 100644 (file)
@@ -1155,6 +1155,7 @@ static void gen11_dsi_powerup_panel(struct intel_encoder *encoder)
        }
 
        intel_dsi_vbt_exec_sequence(intel_dsi, MIPI_SEQ_INIT_OTP);
+       intel_dsi_vbt_exec_sequence(intel_dsi, MIPI_SEQ_DISPLAY_ON);
 
        /* ensure all panel commands dispatched before enabling transcoder */
        wait_for_cmds_dispatched_to_panel(encoder);
@@ -1255,8 +1256,6 @@ static void gen11_dsi_enable(struct intel_atomic_state *state,
        /* step6d: enable dsi transcoder */
        gen11_dsi_enable_transcoder(encoder);
 
-       intel_dsi_vbt_exec_sequence(intel_dsi, MIPI_SEQ_DISPLAY_ON);
-
        /* step7: enable backlight */
        intel_backlight_enable(crtc_state, conn_state);
        intel_dsi_vbt_exec_sequence(intel_dsi, MIPI_SEQ_BACKLIGHT_ON);
index 47cd6bb04366f34a798d76df35b3bba3be2cd67e..06900ff307b23a411114394e8d5d0f5e7ba4ad89 100644 (file)
@@ -246,7 +246,14 @@ static enum phy icl_aux_pw_to_phy(struct drm_i915_private *i915,
        enum aux_ch aux_ch = icl_aux_pw_to_ch(power_well);
        struct intel_digital_port *dig_port = aux_ch_to_digital_port(i915, aux_ch);
 
-       return intel_port_to_phy(i915, dig_port->base.port);
+       /*
+        * FIXME should we care about the (VBT defined) dig_port->aux_ch
+        * relationship or should this be purely defined by the hardware layout?
+        * Currently if the port doesn't appear in the VBT, or if it's declared
+        * as HDMI-only and routed to a combo PHY, the encoder either won't be
+        * present at all or it will not have an aux_ch assigned.
+        */
+       return dig_port ? intel_port_to_phy(i915, dig_port->base.port) : PHY_NONE;
 }
 
 static void hsw_wait_for_power_well_enable(struct drm_i915_private *dev_priv,
@@ -414,7 +421,8 @@ icl_combo_phy_aux_power_well_enable(struct drm_i915_private *dev_priv,
 
        intel_de_rmw(dev_priv, regs->driver, 0, HSW_PWR_WELL_CTL_REQ(pw_idx));
 
-       if (DISPLAY_VER(dev_priv) < 12)
+       /* FIXME this is a mess */
+       if (phy != PHY_NONE)
                intel_de_rmw(dev_priv, ICL_PORT_CL_DW12(phy),
                             0, ICL_LANE_ENABLE_AUX);
 
@@ -437,7 +445,10 @@ icl_combo_phy_aux_power_well_disable(struct drm_i915_private *dev_priv,
 
        drm_WARN_ON(&dev_priv->drm, !IS_ICELAKE(dev_priv));
 
-       intel_de_rmw(dev_priv, ICL_PORT_CL_DW12(phy), ICL_LANE_ENABLE_AUX, 0);
+       /* FIXME this is a mess */
+       if (phy != PHY_NONE)
+               intel_de_rmw(dev_priv, ICL_PORT_CL_DW12(phy),
+                            ICL_LANE_ENABLE_AUX, 0);
 
        intel_de_rmw(dev_priv, regs->driver, HSW_PWR_WELL_CTL_REQ(pw_idx), 0);
 
index 3fdd8a5179831288f1e10bc8d9161d8d23a7ba6a..ac7fe6281afe3f52b37c0e75f422045982bb076a 100644 (file)
@@ -609,6 +609,13 @@ struct intel_connector {
         * and active (i.e. dpms ON state). */
        bool (*get_hw_state)(struct intel_connector *);
 
+       /*
+        * Optional hook called during init/resume to sync any state
+        * stored in the connector (e.g. DSC state) w.r.t. the HW state.
+        */
+       void (*sync_state)(struct intel_connector *connector,
+                          const struct intel_crtc_state *crtc_state);
+
        /* Panel info for eDP and LVDS */
        struct intel_panel panel;
 
index f5ef95da55346ff14cc6b102c27e78e8960cec65..94d2a15d8444ad6a9d88029091cf4aafda60f497 100644 (file)
@@ -2355,6 +2355,9 @@ intel_dp_compute_config_limits(struct intel_dp *intel_dp,
        limits->min_rate = intel_dp_common_rate(intel_dp, 0);
        limits->max_rate = intel_dp_max_link_rate(intel_dp);
 
+       /* FIXME 128b/132b SST support missing */
+       limits->max_rate = min(limits->max_rate, 810000);
+
        limits->min_lane_count = 1;
        limits->max_lane_count = intel_dp_max_lane_count(intel_dp);
 
@@ -5696,6 +5699,9 @@ intel_dp_detect(struct drm_connector *connector,
                goto out;
        }
 
+       if (!intel_dp_is_edp(intel_dp))
+               intel_psr_init_dpcd(intel_dp);
+
        intel_dp_detect_dsc_caps(intel_dp, intel_connector);
 
        intel_dp_configure_mst(intel_dp);
@@ -5856,6 +5862,19 @@ intel_dp_connector_unregister(struct drm_connector *connector)
        intel_connector_unregister(connector);
 }
 
+void intel_dp_connector_sync_state(struct intel_connector *connector,
+                                  const struct intel_crtc_state *crtc_state)
+{
+       struct drm_i915_private *i915 = to_i915(connector->base.dev);
+
+       if (crtc_state && crtc_state->dsc.compression_enable) {
+               drm_WARN_ON(&i915->drm, !connector->dp.dsc_decompression_aux);
+               connector->dp.dsc_decompression_enabled = true;
+       } else {
+               connector->dp.dsc_decompression_enabled = false;
+       }
+}
+
 void intel_dp_encoder_flush_work(struct drm_encoder *encoder)
 {
        struct intel_digital_port *dig_port = enc_to_dig_port(to_intel_encoder(encoder));
index 05db46b111f216e150760e0dff76581cc18bbcca..375d0677cd8c516c56ca2cf9ba592b4746677304 100644 (file)
@@ -45,6 +45,8 @@ bool intel_dp_limited_color_range(const struct intel_crtc_state *crtc_state,
 int intel_dp_min_bpp(enum intel_output_format output_format);
 bool intel_dp_init_connector(struct intel_digital_port *dig_port,
                             struct intel_connector *intel_connector);
+void intel_dp_connector_sync_state(struct intel_connector *connector,
+                                  const struct intel_crtc_state *crtc_state);
 void intel_dp_set_link_params(struct intel_dp *intel_dp,
                              int link_rate, int lane_count);
 int intel_dp_get_link_train_fallback_values(struct intel_dp *intel_dp,
index 3a595cd433d4952078ed20227a1d47f4fe11d458..8538d1ce2fcb854bdf5fcc60086992039312702e 100644 (file)
@@ -330,23 +330,13 @@ static const struct hdcp2_dp_msg_data hdcp2_dp_msg_data[] = {
          0, 0 },
 };
 
-static struct drm_dp_aux *
-intel_dp_hdcp_get_aux(struct intel_connector *connector)
-{
-       struct intel_digital_port *dig_port = intel_attached_dig_port(connector);
-
-       if (intel_encoder_is_mst(connector->encoder))
-               return &connector->port->aux;
-       else
-               return &dig_port->dp.aux;
-}
-
 static int
 intel_dp_hdcp2_read_rx_status(struct intel_connector *connector,
                              u8 *rx_status)
 {
        struct drm_i915_private *i915 = to_i915(connector->base.dev);
-       struct drm_dp_aux *aux = intel_dp_hdcp_get_aux(connector);
+       struct intel_digital_port *dig_port = intel_attached_dig_port(connector);
+       struct drm_dp_aux *aux = &dig_port->dp.aux;
        ssize_t ret;
 
        ret = drm_dp_dpcd_read(aux,
@@ -399,7 +389,9 @@ intel_dp_hdcp2_wait_for_msg(struct intel_connector *connector,
                            const struct hdcp2_dp_msg_data *hdcp2_msg_data)
 {
        struct drm_i915_private *i915 = to_i915(connector->base.dev);
-       struct intel_hdcp *hdcp = &connector->hdcp;
+       struct intel_digital_port *dig_port = intel_attached_dig_port(connector);
+       struct intel_dp *dp = &dig_port->dp;
+       struct intel_hdcp *hdcp = &dp->attached_connector->hdcp;
        u8 msg_id = hdcp2_msg_data->msg_id;
        int ret, timeout;
        bool msg_ready = false;
@@ -454,8 +446,9 @@ int intel_dp_hdcp2_write_msg(struct intel_connector *connector,
        unsigned int offset;
        u8 *byte = buf;
        ssize_t ret, bytes_to_write, len;
+       struct intel_digital_port *dig_port = intel_attached_dig_port(connector);
+       struct drm_dp_aux *aux = &dig_port->dp.aux;
        const struct hdcp2_dp_msg_data *hdcp2_msg_data;
-       struct drm_dp_aux *aux;
 
        hdcp2_msg_data = get_hdcp2_dp_msg_data(*byte);
        if (!hdcp2_msg_data)
@@ -463,8 +456,6 @@ int intel_dp_hdcp2_write_msg(struct intel_connector *connector,
 
        offset = hdcp2_msg_data->offset;
 
-       aux = intel_dp_hdcp_get_aux(connector);
-
        /* No msg_id in DP HDCP2.2 msgs */
        bytes_to_write = size - 1;
        byte++;
@@ -490,7 +481,8 @@ static
 ssize_t get_receiver_id_list_rx_info(struct intel_connector *connector,
                                     u32 *dev_cnt, u8 *byte)
 {
-       struct drm_dp_aux *aux = intel_dp_hdcp_get_aux(connector);
+       struct intel_digital_port *dig_port = intel_attached_dig_port(connector);
+       struct drm_dp_aux *aux = &dig_port->dp.aux;
        ssize_t ret;
        u8 *rx_info = byte;
 
@@ -515,8 +507,9 @@ int intel_dp_hdcp2_read_msg(struct intel_connector *connector,
 {
        struct intel_digital_port *dig_port = intel_attached_dig_port(connector);
        struct drm_i915_private *i915 = to_i915(dig_port->base.base.dev);
-       struct intel_hdcp *hdcp = &connector->hdcp;
-       struct drm_dp_aux *aux;
+       struct drm_dp_aux *aux = &dig_port->dp.aux;
+       struct intel_dp *dp = &dig_port->dp;
+       struct intel_hdcp *hdcp = &dp->attached_connector->hdcp;
        unsigned int offset;
        u8 *byte = buf;
        ssize_t ret, bytes_to_recv, len;
@@ -530,8 +523,6 @@ int intel_dp_hdcp2_read_msg(struct intel_connector *connector,
                return -EINVAL;
        offset = hdcp2_msg_data->offset;
 
-       aux = intel_dp_hdcp_get_aux(connector);
-
        ret = intel_dp_hdcp2_wait_for_msg(connector, hdcp2_msg_data);
        if (ret < 0)
                return ret;
@@ -561,13 +552,8 @@ int intel_dp_hdcp2_read_msg(struct intel_connector *connector,
 
                /* Entire msg read timeout since initiate of msg read */
                if (bytes_to_recv == size - 1 && hdcp2_msg_data->msg_read_timeout > 0) {
-                       if (intel_encoder_is_mst(connector->encoder))
-                               msg_end = ktime_add_ms(ktime_get_raw(),
-                                                      hdcp2_msg_data->msg_read_timeout *
-                                                      connector->port->parent->num_ports);
-                       else
-                               msg_end = ktime_add_ms(ktime_get_raw(),
-                                                      hdcp2_msg_data->msg_read_timeout);
+                       msg_end = ktime_add_ms(ktime_get_raw(),
+                                              hdcp2_msg_data->msg_read_timeout);
                }
 
                ret = drm_dp_dpcd_read(aux, offset,
@@ -651,12 +637,11 @@ static
 int intel_dp_hdcp2_capable(struct intel_connector *connector,
                           bool *capable)
 {
-       struct drm_dp_aux *aux;
+       struct intel_digital_port *dig_port = intel_attached_dig_port(connector);
+       struct drm_dp_aux *aux = &dig_port->dp.aux;
        u8 rx_caps[3];
        int ret;
 
-       aux = intel_dp_hdcp_get_aux(connector);
-
        *capable = false;
        ret = drm_dp_dpcd_read(aux,
                               DP_HDCP_2_2_REG_RX_CAPS_OFFSET,
index 8a9432335030346ecf3b7501a4cfb19cd59d5259..a01a59f57ae5525acb65e2725b3b37a9e31a065c 100644 (file)
@@ -1534,6 +1534,7 @@ static struct drm_connector *intel_dp_add_mst_connector(struct drm_dp_mst_topolo
                return NULL;
 
        intel_connector->get_hw_state = intel_dp_mst_get_hw_state;
+       intel_connector->sync_state = intel_dp_connector_sync_state;
        intel_connector->mst_port = intel_dp;
        intel_connector->port = port;
        drm_dp_mst_get_port_malloc(port);
index 94eece7f63be3341fc92807345c0f7b01f862275..caeca3a8442c5d76008525ff56c65356eb6171be 100644 (file)
@@ -318,12 +318,6 @@ static void intel_modeset_update_connector_atomic_state(struct drm_i915_private
                        const struct intel_crtc_state *crtc_state =
                                to_intel_crtc_state(crtc->base.state);
 
-                       if (crtc_state->dsc.compression_enable) {
-                               drm_WARN_ON(&i915->drm, !connector->dp.dsc_decompression_aux);
-                               connector->dp.dsc_decompression_enabled = true;
-                       } else {
-                               connector->dp.dsc_decompression_enabled = false;
-                       }
                        conn_state->max_bpc = (crtc_state->pipe_bpp ?: 24) / 3;
                }
        }
@@ -775,8 +769,9 @@ static void intel_modeset_readout_hw_state(struct drm_i915_private *i915)
 
        drm_connector_list_iter_begin(&i915->drm, &conn_iter);
        for_each_intel_connector_iter(connector, &conn_iter) {
+               struct intel_crtc_state *crtc_state = NULL;
+
                if (connector->get_hw_state(connector)) {
-                       struct intel_crtc_state *crtc_state;
                        struct intel_crtc *crtc;
 
                        connector->base.dpms = DRM_MODE_DPMS_ON;
@@ -802,6 +797,10 @@ static void intel_modeset_readout_hw_state(struct drm_i915_private *i915)
                        connector->base.dpms = DRM_MODE_DPMS_OFF;
                        connector->base.encoder = NULL;
                }
+
+               if (connector->sync_state)
+                       connector->sync_state(connector, crtc_state);
+
                drm_dbg_kms(&i915->drm,
                            "[CONNECTOR:%d:%s] hw state readout: %s\n",
                            connector->base.base.id, connector->base.name,
index 8f702c3fc62d483e6ba92d4d02537576975441ae..4faaf4b3fc53baf048cad365636955c2fce0e921 100644 (file)
@@ -1525,8 +1525,18 @@ static void intel_psr_enable_source(struct intel_dp *intel_dp,
         * can rely on frontbuffer tracking.
         */
        mask = EDP_PSR_DEBUG_MASK_MEMUP |
-              EDP_PSR_DEBUG_MASK_HPD |
-              EDP_PSR_DEBUG_MASK_LPSP;
+              EDP_PSR_DEBUG_MASK_HPD;
+
+       /*
+        * For some unknown reason on HSW non-ULT (or at least on
+        * Dell Latitude E6540) external displays start to flicker
+        * when PSR is enabled on the eDP. SR/PC6 residency is much
+        * higher than should be possible with an external display.
+        * As a workaround leave LPSP unmasked to prevent PSR entry
+        * when external displays are active.
+        */
+       if (DISPLAY_VER(dev_priv) >= 8 || IS_HASWELL_ULT(dev_priv))
+               mask |= EDP_PSR_DEBUG_MASK_LPSP;
 
        if (DISPLAY_VER(dev_priv) < 20)
                mask |= EDP_PSR_DEBUG_MASK_MAX_SLEEP;
@@ -2766,9 +2776,6 @@ void intel_psr_init(struct intel_dp *intel_dp)
        if (!(HAS_PSR(dev_priv) || HAS_DP20(dev_priv)))
                return;
 
-       if (!intel_dp_is_edp(intel_dp))
-               intel_psr_init_dpcd(intel_dp);
-
        /*
         * HSW spec explicitly says PSR is tied to port A.
         * BDW+ platforms have an instance of PSR registers per transcoder but
index acc6b6804105102389dc26c3fefce80444d0adad..2915d7afe5ccc2facdaeaee164e7b9c60796f361 100644 (file)
@@ -1209,7 +1209,7 @@ static bool intel_sdvo_set_tv_format(struct intel_sdvo *intel_sdvo,
        struct intel_sdvo_tv_format format;
        u32 format_map;
 
-       format_map = 1 << conn_state->tv.mode;
+       format_map = 1 << conn_state->tv.legacy_mode;
        memset(&format, 0, sizeof(format));
        memcpy(&format, &format_map, min(sizeof(format), sizeof(format_map)));
 
@@ -2298,7 +2298,7 @@ static int intel_sdvo_get_tv_modes(struct drm_connector *connector)
         * Read the list of supported input resolutions for the selected TV
         * format.
         */
-       format_map = 1 << conn_state->tv.mode;
+       format_map = 1 << conn_state->tv.legacy_mode;
        memcpy(&tv_res, &format_map,
               min(sizeof(format_map), sizeof(struct intel_sdvo_sdtv_resolution_request)));
 
@@ -2363,7 +2363,7 @@ intel_sdvo_connector_atomic_get_property(struct drm_connector *connector,
                int i;
 
                for (i = 0; i < intel_sdvo_connector->format_supported_num; i++)
-                       if (state->tv.mode == intel_sdvo_connector->tv_format_supported[i]) {
+                       if (state->tv.legacy_mode == intel_sdvo_connector->tv_format_supported[i]) {
                                *val = i;
 
                                return 0;
@@ -2419,7 +2419,7 @@ intel_sdvo_connector_atomic_set_property(struct drm_connector *connector,
        struct intel_sdvo_connector_state *sdvo_state = to_intel_sdvo_connector_state(state);
 
        if (property == intel_sdvo_connector->tv_format) {
-               state->tv.mode = intel_sdvo_connector->tv_format_supported[val];
+               state->tv.legacy_mode = intel_sdvo_connector->tv_format_supported[val];
 
                if (state->crtc) {
                        struct drm_crtc_state *crtc_state =
@@ -3076,7 +3076,7 @@ static bool intel_sdvo_tv_create_property(struct intel_sdvo *intel_sdvo,
                drm_property_add_enum(intel_sdvo_connector->tv_format, i,
                                      tv_format_names[intel_sdvo_connector->tv_format_supported[i]]);
 
-       intel_sdvo_connector->base.base.state->tv.mode = intel_sdvo_connector->tv_format_supported[0];
+       intel_sdvo_connector->base.base.state->tv.legacy_mode = intel_sdvo_connector->tv_format_supported[0];
        drm_object_attach_property(&intel_sdvo_connector->base.base.base,
                                   intel_sdvo_connector->tv_format, 0);
        return true;
index d4386cb3569e0991bc3c0c78a4415d77a7bc1998..992a725de751a2d1925c23da8763e5ea7dce4714 100644 (file)
@@ -949,7 +949,7 @@ intel_disable_tv(struct intel_atomic_state *state,
 
 static const struct tv_mode *intel_tv_mode_find(const struct drm_connector_state *conn_state)
 {
-       int format = conn_state->tv.mode;
+       int format = conn_state->tv.legacy_mode;
 
        return &tv_modes[format];
 }
@@ -1704,7 +1704,7 @@ static void intel_tv_find_better_format(struct drm_connector *connector)
                        break;
        }
 
-       connector->state->tv.mode = i;
+       connector->state->tv.legacy_mode = i;
 }
 
 static int
@@ -1859,7 +1859,7 @@ static int intel_tv_atomic_check(struct drm_connector *connector,
        old_state = drm_atomic_get_old_connector_state(state, connector);
        new_crtc_state = drm_atomic_get_new_crtc_state(state, new_state->crtc);
 
-       if (old_state->tv.mode != new_state->tv.mode ||
+       if (old_state->tv.legacy_mode != new_state->tv.legacy_mode ||
            old_state->tv.margins.left != new_state->tv.margins.left ||
            old_state->tv.margins.right != new_state->tv.margins.right ||
            old_state->tv.margins.top != new_state->tv.margins.top ||
@@ -1896,7 +1896,7 @@ static void intel_tv_add_properties(struct drm_connector *connector)
        conn_state->tv.margins.right = 46;
        conn_state->tv.margins.bottom = 37;
 
-       conn_state->tv.mode = 0;
+       conn_state->tv.legacy_mode = 0;
 
        /* Create TV properties then attach current values */
        for (i = 0; i < ARRAY_SIZE(tv_modes); i++) {
@@ -1910,7 +1910,7 @@ static void intel_tv_add_properties(struct drm_connector *connector)
 
        drm_object_attach_property(&connector->base,
                                   i915->drm.mode_config.legacy_tv_mode_property,
-                                  conn_state->tv.mode);
+                                  conn_state->tv.legacy_mode);
        drm_object_attach_property(&connector->base,
                                   i915->drm.mode_config.tv_left_margin_property,
                                   conn_state->tv.margins.left);
index 64f440fdc22b2c832a77ca7ca73cf83ecc5ba625..8b21dc8e26d525f514f74f2a647ea703b2cd2d9a 100644 (file)
@@ -51,8 +51,8 @@
 #define DSCC_PICTURE_PARAMETER_SET_0           _MMIO(0x6BA00)
 #define _DSCA_PPS_0                            0x6B200
 #define _DSCC_PPS_0                            0x6BA00
-#define DSCA_PPS(pps)                          _MMIO(_DSCA_PPS_0 + (pps) * 4)
-#define DSCC_PPS(pps)                          _MMIO(_DSCC_PPS_0 + (pps) * 4)
+#define DSCA_PPS(pps)                          _MMIO(_DSCA_PPS_0 + ((pps) < 12 ? (pps) : (pps) + 12) * 4)
+#define DSCC_PPS(pps)                          _MMIO(_DSCC_PPS_0 + ((pps) < 12 ? (pps) : (pps) + 12) * 4)
 #define _ICL_DSC0_PICTURE_PARAMETER_SET_0_PB   0x78270
 #define _ICL_DSC1_PICTURE_PARAMETER_SET_0_PB   0x78370
 #define _ICL_DSC0_PICTURE_PARAMETER_SET_0_PC   0x78470
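
The new DSCA_PPS()/DSCC_PPS() definitions encode a register map with a
hole: PPS dwords 0-11 are contiguous from the base, while dwords 12 and
up sit 12 dwords (48 bytes) further out, hence the conditional bias
before scaling. The address computation in isolation (illustrative
helper name; the gap size is read straight off the macro):

    #include <linux/types.h>

    static inline u32 dsc_pps_offset(u32 base, u32 pps)
    {
            /* dwords 0-11 contiguous; >= 12 live past a 12-dword hole */
            return base + ((pps < 12) ? pps : pps + 12) * 4;
    }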
index 1d3ebdf4069b5d0fea98aefdb2b1609f82b9650e..c08b67593565c5827d4555e70b88b083e97172d9 100644 (file)
@@ -379,6 +379,9 @@ i915_gem_userptr_release(struct drm_i915_gem_object *obj)
 {
        GEM_WARN_ON(obj->userptr.page_ref);
 
+       if (!obj->userptr.notifier.mm)
+               return;
+
        mmu_interval_notifier_remove(&obj->userptr.notifier);
        obj->userptr.notifier.mm = NULL;
 }
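
The early return added above is the usual guard for a release path that
can run against an object whose notifier was never registered or was
already torn down: registration sets notifier.mm, release clears it, so
the check makes teardown idempotent. A generic sketch of the shape
(my_obj is a placeholder type, not from this driver):

    #include <linux/mmu_notifier.h>

    struct my_obj {
            struct mmu_interval_notifier notifier;
    };

    static void my_obj_release(struct my_obj *obj)
    {
            /* Never registered, or already released: nothing to undo. */
            if (!obj->notifier.mm)
                    return;

            mmu_interval_notifier_remove(&obj->notifier);
            obj->notifier.mm = NULL;        /* second call is a no-op */
    }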
index 90f6c1ece57d4478a30375df1725624b47449298..efcb00472be24779590fcce94753ab83a787f2c4 100644 (file)
@@ -2849,8 +2849,7 @@ static int handle_mmio(struct intel_gvt_mmio_table_iter *iter, u32 offset,
        for (i = start; i < end; i += 4) {
                p = intel_gvt_find_mmio_info(gvt, i);
                if (p) {
-                       WARN(1, "dup mmio definition offset %x\n",
-                               info->offset);
+                       WARN(1, "dup mmio definition offset %x\n", i);
 
                        /* We return -EEXIST here to make GVT-g load fail.
                         * So duplicated MMIO can be found as soon as
index e98b6d69a91ab70c224d67ad337926c1b69936b3..9b6d87c8b5831c14aec9bbc425a374f0721f0273 100644 (file)
@@ -41,7 +41,7 @@
  * To virtualize GPU resources, the GVT-g driver depends on hypervisor technology,
  * e.g. KVM/VFIO/mdev, Xen, etc., to provide resource access trapping capability
  * and be virtualized within the GVT-g device module. More architectural design
- * doc is available on https://01.org/group/2230/documentation-list.
+ * doc is available on https://github.com/intel/gvt-linux/wiki.
  */
 
 static LIST_HEAD(intel_gvt_devices);
index 2990dd4d4a0d8a84ad5794815dbb4661610314a2..e14ac0ab1314d1032a707762a1be444a14b3ca39 100644 (file)
@@ -3,6 +3,8 @@
  * Copyright © 2021 Intel Corporation
  */
 
+#include <linux/jiffies.h>
+
 //#include "gt/intel_engine_user.h"
 #include "gt/intel_gt.h"
 #include "i915_drv.h"
@@ -12,7 +14,7 @@
 
 #define REDUCED_TIMESLICE      5
 #define REDUCED_PREEMPT                10
-#define WAIT_FOR_RESET_TIME    10000
+#define WAIT_FOR_RESET_TIME_MS 10000
 
 struct intel_engine_cs *intel_selftest_find_any_engine(struct intel_gt *gt)
 {
@@ -91,7 +93,7 @@ int intel_selftest_wait_for_rq(struct i915_request *rq)
 {
        long ret;
 
-       ret = i915_request_wait(rq, 0, WAIT_FOR_RESET_TIME);
+       ret = i915_request_wait(rq, 0, msecs_to_jiffies(WAIT_FOR_RESET_TIME_MS));
        if (ret < 0)
                return ret;
 
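
Renaming the constant to *_MS and converting at the call site fixes a
latent unit bug: i915_request_wait() takes its timeout in jiffies, so a
bare 10000 meant 10 seconds only with CONFIG_HZ=1000 (and 100 seconds at
HZ=100). The conversion pattern on its own, with the helper name being
illustrative:

    #include <linux/jiffies.h>

    #define WAIT_FOR_RESET_TIME_MS  10000

    /* Convert at the boundary of any jiffies-based wait API so the
     * constant itself stays HZ-independent. */
    static inline long reset_wait_timeout(void)
    {
            return msecs_to_jiffies(WAIT_FOR_RESET_TIME_MS);
    }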
index 3f73b211fa8e3e3bc4812180883c9685f8377f19..3407450435e2057dd3973441ba6e31485e69ee6d 100644 (file)
@@ -294,6 +294,5 @@ void meson_encoder_cvbs_remove(struct meson_drm *priv)
        if (priv->encoders[MESON_ENC_CVBS]) {
                meson_encoder_cvbs = priv->encoders[MESON_ENC_CVBS];
                drm_bridge_remove(&meson_encoder_cvbs->bridge);
-               drm_bridge_remove(meson_encoder_cvbs->next_bridge);
        }
 }
index 3f93c70488cad1829bbe488d8bf8f7b3833859f1..311b91630fbe536cf724223a1fa71e565ba2c778 100644 (file)
@@ -168,6 +168,5 @@ void meson_encoder_dsi_remove(struct meson_drm *priv)
        if (priv->encoders[MESON_ENC_DSI]) {
                meson_encoder_dsi = priv->encoders[MESON_ENC_DSI];
                drm_bridge_remove(&meson_encoder_dsi->bridge);
-               drm_bridge_remove(meson_encoder_dsi->next_bridge);
        }
 }
index 25ea765586908f14d08715f45ca9def85a6a07f3..c4686568c9ca5d81b4066315681263e0fbd848a2 100644 (file)
@@ -474,6 +474,5 @@ void meson_encoder_hdmi_remove(struct meson_drm *priv)
        if (priv->encoders[MESON_ENC_HDMI]) {
                meson_encoder_hdmi = priv->encoders[MESON_ENC_HDMI];
                drm_bridge_remove(&meson_encoder_hdmi->bridge);
-               drm_bridge_remove(meson_encoder_hdmi->next_bridge);
        }
 }
index c0bc924cd3025dc21939e2e75548f273a90fd620..c9c55e2ea584927ce7b3f8ffc50e7ed807f6671a 100644 (file)
@@ -1287,7 +1287,7 @@ static void a6xx_calc_ubwc_config(struct adreno_gpu *gpu)
        gpu->ubwc_config.highest_bank_bit = 15;
 
        if (adreno_is_a610(gpu)) {
-               gpu->ubwc_config.highest_bank_bit = 14;
+               gpu->ubwc_config.highest_bank_bit = 13;
                gpu->ubwc_config.min_acc_len = 1;
                gpu->ubwc_config.ubwc_mode = 1;
        }
index 83380bc92a00a964479a0cbbb8dbc7a9dcd675ca..6a4b489d44e5173831d73956f1fb7a4e10809052 100644 (file)
@@ -144,10 +144,6 @@ enum dpu_enc_rc_states {
  *                     to track crtc in the disable() hook which is called
  *                     _after_ encoder_mask is cleared.
  * @connector:         If a mode is set, cached pointer to the active connector
- * @crtc_kickoff_cb:           Callback into CRTC that will flush & start
- *                             all CTL paths
- * @crtc_kickoff_cb_data:      Opaque user data given to crtc_kickoff_cb
- * @debugfs_root:              Debug file system root file node
  * @enc_lock:                  Lock around physical encoder
  *                             create/destroy/enable/disable
  * @frame_busy_mask:           Bitmask tracking which phys_enc we are still
@@ -2072,7 +2068,7 @@ void dpu_encoder_helper_phys_cleanup(struct dpu_encoder_phys *phys_enc)
        }
 
        /* reset the merge 3D HW block */
-       if (phys_enc->hw_pp->merge_3d) {
+       if (phys_enc->hw_pp && phys_enc->hw_pp->merge_3d) {
                phys_enc->hw_pp->merge_3d->ops.setup_3d_mode(phys_enc->hw_pp->merge_3d,
                                BLEND_3D_NONE);
                if (phys_enc->hw_ctl->ops.update_pending_flush_merge_3d)
@@ -2103,7 +2099,7 @@ void dpu_encoder_helper_phys_cleanup(struct dpu_encoder_phys *phys_enc)
        if (phys_enc->hw_wb)
                intf_cfg.wb = phys_enc->hw_wb->idx;
 
-       if (phys_enc->hw_pp->merge_3d)
+       if (phys_enc->hw_pp && phys_enc->hw_pp->merge_3d)
                intf_cfg.merge_3d = phys_enc->hw_pp->merge_3d->idx;
 
        if (ctl->ops.reset_intf_cfg)
index b58a9c2ae326cab6c4799a88fe86acbab1c236f8..724537ab776dfde95c6406cf0aef1b795874b171 100644 (file)
@@ -29,7 +29,6 @@ static inline bool reserved_by_other(uint32_t *res_map, int idx,
 /**
  * struct dpu_rm_requirements - Reservation requirements parameter bundle
  * @topology:  selected topology for the display
- * @hw_res:       Hardware resources required as reported by the encoders
  */
 struct dpu_rm_requirements {
        struct msm_display_topology topology;
@@ -204,6 +203,8 @@ static bool _dpu_rm_needs_split_display(const struct msm_display_topology *top)
  * _dpu_rm_get_lm_peer - get the id of a mixer which is a peer of the primary
  * @rm: dpu resource manager handle
  * @primary_idx: index of primary mixer in rm->mixer_blks[]
+ *
+ * Returns: the id of the peer mixer on success or %-EINVAL on error
  */
 static int _dpu_rm_get_lm_peer(struct dpu_rm *rm, int primary_idx)
 {
index 77a8d9366ed7b01d46a01cf602e74eafb15d4937..fb588fde298a2de231ea5fdd8f639da156d47030 100644 (file)
@@ -135,11 +135,6 @@ static void dp_ctrl_config_ctrl(struct dp_ctrl_private *ctrl)
        tbd = dp_link_get_test_bits_depth(ctrl->link,
                        ctrl->panel->dp_mode.bpp);
 
-       if (tbd == DP_TEST_BIT_DEPTH_UNKNOWN) {
-               pr_debug("BIT_DEPTH not set. Configure default\n");
-               tbd = DP_TEST_BIT_DEPTH_8;
-       }
-
        config |= tbd << DP_CONFIGURATION_CTRL_BPC_SHIFT;
 
        /* Num of Lanes */
index d37d599aec273b41b7ec54eb04b55ee86770d1a5..4c72124ffb5d495bdd24eefaf086f4d9401663ce 100644 (file)
@@ -329,10 +329,26 @@ static const struct component_ops dp_display_comp_ops = {
        .unbind = dp_display_unbind,
 };
 
+static void dp_display_send_hpd_event(struct msm_dp *dp_display)
+{
+       struct dp_display_private *dp;
+       struct drm_connector *connector;
+
+       dp = container_of(dp_display, struct dp_display_private, dp_display);
+
+       connector = dp->dp_display.connector;
+       drm_helper_hpd_irq_event(connector->dev);
+}
+
 static int dp_display_send_hpd_notification(struct dp_display_private *dp,
                                            bool hpd)
 {
-       struct drm_bridge *bridge = dp->dp_display.bridge;
+       if ((hpd && dp->dp_display.link_ready) ||
+                       (!hpd && !dp->dp_display.link_ready)) {
+               drm_dbg_dp(dp->drm_dev, "HPD already %s\n",
+                               (hpd ? "on" : "off"));
+               return 0;
+       }
 
        /* reset video pattern flag on disconnect */
        if (!hpd) {
@@ -348,7 +364,7 @@ static int dp_display_send_hpd_notification(struct dp_display_private *dp,
 
        drm_dbg_dp(dp->drm_dev, "type=%d hpd=%d\n",
                        dp->dp_display.connector_type, hpd);
-       drm_bridge_hpd_notify(bridge, dp->dp_display.link_ready);
+       dp_display_send_hpd_event(&dp->dp_display);
 
        return 0;
 }
index 98427d45e9a7e3ac99a47871bbd1e0e893b2bc24..49dfac1fd1ef2158626f4a417b22e810414b76f9 100644 (file)
@@ -7,6 +7,7 @@
 
 #include <drm/drm_print.h>
 
+#include "dp_reg.h"
 #include "dp_link.h"
 #include "dp_panel.h"
 
@@ -1082,7 +1083,7 @@ int dp_link_process_request(struct dp_link *dp_link)
 
 int dp_link_get_colorimetry_config(struct dp_link *dp_link)
 {
-       u32 cc;
+       u32 cc = DP_MISC0_COLORIMETRY_CFG_LEGACY_RGB;
        struct dp_link_private *link;
 
        if (!dp_link) {
@@ -1096,10 +1097,11 @@ int dp_link_get_colorimetry_config(struct dp_link *dp_link)
         * Unless a video pattern CTS test is ongoing, use RGB_VESA
         * Only RGB_VESA and RGB_CEA supported for now
         */
-       if (dp_link_is_video_pattern_requested(link))
-               cc = link->dp_link.test_video.test_dyn_range;
-       else
-               cc = DP_TEST_DYNAMIC_RANGE_VESA;
+       if (dp_link_is_video_pattern_requested(link)) {
+               if (link->dp_link.test_video.test_dyn_range &
+                                       DP_TEST_DYNAMIC_RANGE_CEA)
+                       cc = DP_MISC0_COLORIMETRY_CFG_CEA_RGB;
+       }
 
        return cc;
 }
@@ -1179,6 +1181,9 @@ void dp_link_reset_phy_params_vx_px(struct dp_link *dp_link)
 u32 dp_link_get_test_bits_depth(struct dp_link *dp_link, u32 bpp)
 {
        u32 tbd;
+       struct dp_link_private *link;
+
+       link = container_of(dp_link, struct dp_link_private, dp_link);
 
        /*
         * A few simplistic rules and assumptions are made here:
@@ -1196,12 +1201,13 @@ u32 dp_link_get_test_bits_depth(struct dp_link *dp_link, u32 bpp)
                tbd = DP_TEST_BIT_DEPTH_10;
                break;
        default:
-               tbd = DP_TEST_BIT_DEPTH_UNKNOWN;
+               drm_dbg_dp(link->drm_dev, "bpp=%d not supported, use bpc=8\n",
+                          bpp);
+               tbd = DP_TEST_BIT_DEPTH_8;
                break;
        }
 
-       if (tbd != DP_TEST_BIT_DEPTH_UNKNOWN)
-               tbd = (tbd >> DP_TEST_BIT_DEPTH_SHIFT);
+       tbd = (tbd >> DP_TEST_BIT_DEPTH_SHIFT);
 
        return tbd;
 }
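
With the UNKNOWN sentinel gone, unsupported bpp values now degrade to
8 bpc inside the helper instead of making every caller handle a sentinel;
the dp_ctrl hunk above deletes exactly such a caller-side fallback. A
sketch of the resulting policy, ignoring the final
DP_TEST_BIT_DEPTH_SHIFT normalization the real helper applies; the bpp
cases not visible in the hunk are assumptions from the usual
bpp = 3 * bpc rule:

    #include <drm/display/drm_dp.h>

    /* bpp -> DP test bit depth, defaulting unknown values to 8 bpc. */
    static u32 bpp_to_test_bit_depth(u32 bpp)
    {
            switch (bpp) {
            case 18: return DP_TEST_BIT_DEPTH_6;
            case 24: return DP_TEST_BIT_DEPTH_8;
            case 30: return DP_TEST_BIT_DEPTH_10;
            default: return DP_TEST_BIT_DEPTH_8;    /* safe fallback */
            }
    }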
index ea85a691e72b5ce505822e4fce21f0cbcf0c4319..78785ed4b40c490d83396825d62a65af2fd6c9df 100644 (file)
 #define DP_MISC0_COLORIMETRY_CFG_SHIFT         (0x00000001)
 #define DP_MISC0_TEST_BITS_DEPTH_SHIFT         (0x00000005)
 
+#define DP_MISC0_COLORIMETRY_CFG_LEGACY_RGB    (0)
+#define DP_MISC0_COLORIMETRY_CFG_CEA_RGB       (0x04)
+
 #define REG_DP_VALID_BOUNDARY                  (0x00000030)
 #define REG_DP_VALID_BOUNDARY_2                        (0x00000034)
 
index 5f68e31a3e4e1cbeed95bfde138711c0fc9c9759..0915f3b68752e34702ae8864249d049e3b277ee2 100644 (file)
@@ -26,7 +26,7 @@ int msm_gem_prime_vmap(struct drm_gem_object *obj, struct iosys_map *map)
 {
        void *vaddr;
 
-       vaddr = msm_gem_get_vaddr(obj);
+       vaddr = msm_gem_get_vaddr_locked(obj);
        if (IS_ERR(vaddr))
                return PTR_ERR(vaddr);
        iosys_map_set_vaddr(map, vaddr);
@@ -36,7 +36,7 @@ int msm_gem_prime_vmap(struct drm_gem_object *obj, struct iosys_map *map)
 
 void msm_gem_prime_vunmap(struct drm_gem_object *obj, struct iosys_map *map)
 {
-       msm_gem_put_vaddr(obj);
+       msm_gem_put_vaddr_locked(obj);
 }
 
 struct drm_gem_object *msm_gem_prime_import_sg_table(struct drm_device *dev,
index 095390774f22b547668227ed492a6e9783b055f9..655002b21b0d5dc345283a7699d14b0e88b3e472 100644 (file)
@@ -751,12 +751,14 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
        struct msm_ringbuffer *ring = submit->ring;
        unsigned long flags;
 
-       pm_runtime_get_sync(&gpu->pdev->dev);
+       WARN_ON(!mutex_is_locked(&gpu->lock));
 
-       mutex_lock(&gpu->lock);
+       pm_runtime_get_sync(&gpu->pdev->dev);
 
        msm_gpu_hw_init(gpu);
 
+       submit->seqno = submit->hw_fence->seqno;
+
        update_sw_cntrs(gpu);
 
        /*
@@ -781,11 +783,8 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
        gpu->funcs->submit(gpu, submit);
        gpu->cur_ctx_seqno = submit->queue->ctx->seqno;
 
-       hangcheck_timer_reset(gpu);
-
-       mutex_unlock(&gpu->lock);
-
        pm_runtime_put(&gpu->pdev->dev);
+       hangcheck_timer_reset(gpu);
 }
 
 /*
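
Hoisting gpu->lock out to the caller (see the msm_job_run() hunk below)
turns msm_gpu_submit() into a function with a locking precondition, and
the new WARN_ON states that contract at runtime. A generic sketch of the
idiom with a placeholder type; lockdep_assert_held() is the variant that
compiles away when lockdep is disabled:

    #include <linux/lockdep.h>
    #include <linux/mutex.h>

    struct my_gpu {
            struct mutex lock;
    };

    /* Caller must hold gpu->lock; assert it instead of re-acquiring. */
    static void my_gpu_submit_locked(struct my_gpu *gpu)
    {
            lockdep_assert_held(&gpu->lock);
            /* ... program the ring ... */
    }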
index 5cc8d358cc9759307a444cd62bce83c62b3dcdb7..d5512037c38bcd7ca807aaf281dceaebdd688a4e 100644 (file)
@@ -21,6 +21,8 @@ struct msm_iommu_pagetable {
        struct msm_mmu base;
        struct msm_mmu *parent;
        struct io_pgtable_ops *pgtbl_ops;
+       const struct iommu_flush_ops *tlb;
+       struct device *iommu_dev;
        unsigned long pgsize_bitmap;    /* Bitmap of page sizes in use */
        phys_addr_t ttbr;
        u32 asid;
@@ -201,11 +203,33 @@ static const struct msm_mmu_funcs pagetable_funcs = {
 
 static void msm_iommu_tlb_flush_all(void *cookie)
 {
+       struct msm_iommu_pagetable *pagetable = cookie;
+       struct adreno_smmu_priv *adreno_smmu;
+
+       if (!pm_runtime_get_if_in_use(pagetable->iommu_dev))
+               return;
+
+       adreno_smmu = dev_get_drvdata(pagetable->parent->dev);
+
+       pagetable->tlb->tlb_flush_all((void *)adreno_smmu->cookie);
+
+       pm_runtime_put_autosuspend(pagetable->iommu_dev);
 }
 
 static void msm_iommu_tlb_flush_walk(unsigned long iova, size_t size,
                size_t granule, void *cookie)
 {
+       struct msm_iommu_pagetable *pagetable = cookie;
+       struct adreno_smmu_priv *adreno_smmu;
+
+       if (!pm_runtime_get_if_in_use(pagetable->iommu_dev))
+               return;
+
+       adreno_smmu = dev_get_drvdata(pagetable->parent->dev);
+
+       pagetable->tlb->tlb_flush_walk(iova, size, granule, (void *)adreno_smmu->cookie);
+
+       pm_runtime_put_autosuspend(pagetable->iommu_dev);
 }
 
 static void msm_iommu_tlb_add_page(struct iommu_iotlb_gather *gather,
@@ -213,7 +237,7 @@ static void msm_iommu_tlb_add_page(struct iommu_iotlb_gather *gather,
 {
 }
 
-static const struct iommu_flush_ops null_tlb_ops = {
+static const struct iommu_flush_ops tlb_ops = {
        .tlb_flush_all = msm_iommu_tlb_flush_all,
        .tlb_flush_walk = msm_iommu_tlb_flush_walk,
        .tlb_add_page = msm_iommu_tlb_add_page,
@@ -254,10 +278,10 @@ struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu *parent)
 
        /* The incoming cfg will have the TTBR1 quirk enabled */
        ttbr0_cfg.quirks &= ~IO_PGTABLE_QUIRK_ARM_TTBR1;
-       ttbr0_cfg.tlb = &null_tlb_ops;
+       ttbr0_cfg.tlb = &tlb_ops;
 
        pagetable->pgtbl_ops = alloc_io_pgtable_ops(ARM_64_LPAE_S1,
-               &ttbr0_cfg, iommu->domain);
+               &ttbr0_cfg, pagetable);
 
        if (!pagetable->pgtbl_ops) {
                kfree(pagetable);
@@ -279,6 +303,8 @@ struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu *parent)
 
        /* Needed later for TLB flush */
        pagetable->parent = parent;
+       pagetable->tlb = ttbr1_cfg->tlb;
+       pagetable->iommu_dev = ttbr1_cfg->iommu_dev;
        pagetable->pgsize_bitmap = ttbr0_cfg.pgsize_bitmap;
        pagetable->ttbr = ttbr0_cfg.arm_lpae_s1_cfg.ttbr;
 
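
The tlb_ops added above only touch the SMMU when the GPU is demonstrably
powered: pm_runtime_get_if_in_use() takes a reference only if the device
is already resumed, which suits TLB maintenance since a suspended GPU has
nothing cached and resume re-establishes translations anyway. The guard
in isolation (hypothetical callback, slightly more defensive than the
hunk in treating the disabled-runtime-PM return as a skip):

    #include <linux/device.h>
    #include <linux/pm_runtime.h>

    static void flush_if_powered(struct device *dev,
                                 void (*hw_flush)(struct device *))
    {
            /* > 0 only when a usage count was actually taken. */
            if (pm_runtime_get_if_in_use(dev) <= 0)
                    return;

            hw_flush(dev);
            pm_runtime_put_autosuspend(dev);
    }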
index 455b2e3a0cdd4811c67fda8efccd9dd3dcf77a16..35423d10aafa90b98bb6f92c3da405940c8938ea 100644 (file)
@@ -562,6 +562,7 @@ static const struct msm_mdss_data sdm670_data = {
        .ubwc_enc_version = UBWC_2_0,
        .ubwc_dec_version = UBWC_2_0,
        .highest_bank_bit = 1,
+       .reg_bus_bw = 76800,
 };
 
 static const struct msm_mdss_data sdm845_data = {
index 4bc13f7d005ab7c643f78206d8d41d72cd779045..9d6655f96f0cebcc0c03e5b9bef6900c299f2f0d 100644 (file)
@@ -21,8 +21,6 @@ static struct dma_fence *msm_job_run(struct drm_sched_job *job)
 
        msm_fence_init(submit->hw_fence, fctx);
 
-       submit->seqno = submit->hw_fence->seqno;
-
        mutex_lock(&priv->lru.lock);
 
        for (i = 0; i < submit->nr_bos; i++) {
@@ -35,8 +33,13 @@ static struct dma_fence *msm_job_run(struct drm_sched_job *job)
 
        mutex_unlock(&priv->lru.lock);
 
+       /* TODO move submit path over to using a per-ring lock.. */
+       mutex_lock(&gpu->lock);
+
        msm_gpu_submit(gpu, submit);
 
+       mutex_unlock(&gpu->lock);
+
        return dma_fence_get(submit->hw_fence);
 }
 
index 1e6aaf95ff7c79483f7d8bba1ddce897bb7affcf..ceef470c9fbfcfb08be6abd69627b7e7bc66366d 100644 (file)
@@ -100,3 +100,11 @@ config DRM_NOUVEAU_SVM
        help
          Say Y here if you want to enable experimental support for
          Shared Virtual Memory (SVM).
+
+config DRM_NOUVEAU_GSP_DEFAULT
+       bool "Use GSP firmware for Turing/Ampere (needs firmware installed)"
+       depends on DRM_NOUVEAU
+       default n
+       help
+         Say Y here if you want to use the GSP codepaths by default on
+         Turing and Ampere GPUs.
index 0d9fc741a719328722f2c1873bb07bf4b120e890..932c9fd0b2d89ce8c3ec04165bbefa0cec8b25ce 100644 (file)
@@ -11,6 +11,7 @@ struct nvkm_client {
        u32 debug;
 
        struct rb_root objroot;
+       spinlock_t obj_lock;
 
        void *data;
        int (*event)(u64 token, void *argv, u32 argc);
index d1437c08645f90d9c745ee77405d3fa1d8d51f9d..6f5d376d8fcc1ecb6d9faa80b4b06ba4cd1b21e4 100644 (file)
@@ -9,7 +9,7 @@
 #define GSP_PAGE_SIZE  BIT(GSP_PAGE_SHIFT)
 
 struct nvkm_gsp_mem {
-       u32 size;
+       size_t size;
        void *data;
        dma_addr_t addr;
 };
index a04156ca8390ba6fea6a21e07e9eb5bba3ec7605..80f74ee0fc78677f8f890e8cc5daf8f363817b34 100644 (file)
@@ -128,12 +128,14 @@ nouveau_abi16_chan_fini(struct nouveau_abi16 *abi16,
        struct nouveau_abi16_ntfy *ntfy, *temp;
 
        /* Cancel all jobs from the entity's queue. */
-       drm_sched_entity_fini(&chan->sched.entity);
+       if (chan->sched)
+               drm_sched_entity_fini(&chan->sched->entity);
 
        if (chan->chan)
                nouveau_channel_idle(chan->chan);
 
-       nouveau_sched_fini(&chan->sched);
+       if (chan->sched)
+               nouveau_sched_destroy(&chan->sched);
 
        /* cleanup notifier state */
        list_for_each_entry_safe(ntfy, temp, &chan->notifiers, head) {
@@ -197,6 +199,7 @@ nouveau_abi16_ioctl_getparam(ABI16_IOCTL_ARGS)
        struct nouveau_cli *cli = nouveau_cli(file_priv);
        struct nouveau_drm *drm = nouveau_drm(dev);
        struct nvif_device *device = &drm->client.device;
+       struct nvkm_device *nvkm_device = nvxx_device(&drm->client.device);
        struct nvkm_gr *gr = nvxx_gr(device);
        struct drm_nouveau_getparam *getparam = data;
        struct pci_dev *pdev = to_pci_dev(dev->dev);
@@ -261,6 +264,14 @@ nouveau_abi16_ioctl_getparam(ABI16_IOCTL_ARGS)
                getparam->value = nouveau_exec_push_max_from_ib_max(ib_max);
                break;
        }
+       case NOUVEAU_GETPARAM_VRAM_BAR_SIZE:
+               getparam->value = nvkm_device->func->resource_size(nvkm_device, 1);
+               break;
+       case NOUVEAU_GETPARAM_VRAM_USED: {
+               struct ttm_resource_manager *vram_mgr = ttm_manager_type(&drm->ttm.bdev, TTM_PL_VRAM);
+               getparam->value = (u64)ttm_resource_manager_usage(vram_mgr);
+               break;
+       }
        default:
                NV_PRINTK(dbg, cli, "unknown parameter %lld\n", getparam->param);
                return -EINVAL;
@@ -337,10 +348,16 @@ nouveau_abi16_ioctl_channel_alloc(ABI16_IOCTL_ARGS)
        if (ret)
                goto done;
 
-       ret = nouveau_sched_init(&chan->sched, drm, drm->sched_wq,
-                                chan->chan->dma.ib_max);
-       if (ret)
-               goto done;
+       /* If we're not using the VM_BIND uAPI, we don't need a scheduler.
+        *
+        * The client lock is already acquired by nouveau_abi16_get().
+        */
+       if (nouveau_cli_uvmm(cli)) {
+               ret = nouveau_sched_create(&chan->sched, drm, drm->sched_wq,
+                                          chan->chan->dma.ib_max);
+               if (ret)
+                       goto done;
+       }
 
        init->channel = chan->chan->chid;
 
index 1f5e243c0c759ef759dbba7d4f89279c90bce5d4..11c8c4a80079bbb2b658816dd42d05f68a5eaab6 100644 (file)
@@ -26,7 +26,7 @@ struct nouveau_abi16_chan {
        struct nouveau_bo *ntfy;
        struct nouveau_vma *ntfy_vma;
        struct nvkm_mm  heap;
-       struct nouveau_sched sched;
+       struct nouveau_sched *sched;
 };
 
 struct nouveau_abi16 {
index 6f6c31a9937b2fe751c6cffe429cc21a6b47a385..a947e1d5f309ae525e8087d13899f1efd1e8e73b 100644 (file)
@@ -201,7 +201,8 @@ nouveau_cli_fini(struct nouveau_cli *cli)
        WARN_ON(!list_empty(&cli->worker));
 
        usif_client_fini(cli);
-       nouveau_sched_fini(&cli->sched);
+       if (cli->sched)
+               nouveau_sched_destroy(&cli->sched);
        if (uvmm)
                nouveau_uvmm_fini(uvmm);
        nouveau_vmm_fini(&cli->svm);
@@ -311,7 +312,7 @@ nouveau_cli_init(struct nouveau_drm *drm, const char *sname,
        cli->mem = &mems[ret];
 
        /* Don't pass in the (shared) sched_wq in order to let
-        * nouveau_sched_init() create a dedicated one for VM_BIND jobs.
+        * nouveau_sched_create() create a dedicated one for VM_BIND jobs.
         *
         * This is required to ensure that for VM_BIND jobs free_job() work and
         * run_job() work can always run concurrently and hence, free_job() work
@@ -320,7 +321,7 @@ nouveau_cli_init(struct nouveau_drm *drm, const char *sname,
         * locks which indirectly or directly are held for allocations
         * elsewhere.
         */
-       ret = nouveau_sched_init(&cli->sched, drm, NULL, 1);
+       ret = nouveau_sched_create(&cli->sched, drm, NULL, 1);
        if (ret)
                goto done;
 
index 8a6d94c8b1631fd7ab8bbc193f35b064057a0185..e239c6bf4afa4f75d4ca30c63583af82f2ab9621 100644 (file)
@@ -98,7 +98,7 @@ struct nouveau_cli {
                bool disabled;
        } uvmm;
 
-       struct nouveau_sched sched;
+       struct nouveau_sched *sched;
 
        const struct nvif_mclass *mem;
 
index bc5d71b79ab203ff7e874c612f3ea1e7c36323de..e65c0ef23bc73d59f3066ff02ae9360253b93e6d 100644 (file)
@@ -389,7 +389,7 @@ nouveau_exec_ioctl_exec(struct drm_device *dev,
        if (ret)
                goto out;
 
-       args.sched = &chan16->sched;
+       args.sched = chan16->sched;
        args.file_priv = file_priv;
        args.chan = chan;
 
index 5057d976fa578cebe2e9e847c6e78634d2b08968..93f08f9479d89bfda87fbeef246c9dd702f047a1 100644 (file)
@@ -62,7 +62,7 @@ nouveau_fence_signal(struct nouveau_fence *fence)
        if (test_bit(DMA_FENCE_FLAG_USER_BITS, &fence->base.flags)) {
                struct nouveau_fence_chan *fctx = nouveau_fctx(fence);
 
-               if (atomic_dec_and_test(&fctx->notify_ref))
+               if (!--fctx->notify_ref)
                        drop = 1;
        }
 
@@ -103,7 +103,7 @@ nouveau_fence_context_kill(struct nouveau_fence_chan *fctx, int error)
 void
 nouveau_fence_context_del(struct nouveau_fence_chan *fctx)
 {
-       cancel_work_sync(&fctx->allow_block_work);
+       cancel_work_sync(&fctx->uevent_work);
        nouveau_fence_context_kill(fctx, 0);
        nvif_event_dtor(&fctx->event);
        fctx->dead = 1;
@@ -146,12 +146,13 @@ nouveau_fence_update(struct nouveau_channel *chan, struct nouveau_fence_chan *fc
        return drop;
 }
 
-static int
-nouveau_fence_wait_uevent_handler(struct nvif_event *event, void *repv, u32 repc)
+static void
+nouveau_fence_uevent_work(struct work_struct *work)
 {
-       struct nouveau_fence_chan *fctx = container_of(event, typeof(*fctx), event);
+       struct nouveau_fence_chan *fctx = container_of(work, struct nouveau_fence_chan,
+                                                      uevent_work);
        unsigned long flags;
-       int ret = NVIF_EVENT_KEEP;
+       int drop = 0;
 
        spin_lock_irqsave(&fctx->lock, flags);
        if (!list_empty(&fctx->pending)) {
@@ -161,23 +162,20 @@ nouveau_fence_wait_uevent_handler(struct nvif_event *event, void *repv, u32 repc
                fence = list_entry(fctx->pending.next, typeof(*fence), head);
                chan = rcu_dereference_protected(fence->channel, lockdep_is_held(&fctx->lock));
                if (nouveau_fence_update(chan, fctx))
-                       ret = NVIF_EVENT_DROP;
+                       drop = 1;
        }
-       spin_unlock_irqrestore(&fctx->lock, flags);
+       if (drop)
+               nvif_event_block(&fctx->event);
 
-       return ret;
+       spin_unlock_irqrestore(&fctx->lock, flags);
 }
 
-static void
-nouveau_fence_work_allow_block(struct work_struct *work)
+static int
+nouveau_fence_wait_uevent_handler(struct nvif_event *event, void *repv, u32 repc)
 {
-       struct nouveau_fence_chan *fctx = container_of(work, struct nouveau_fence_chan,
-                                                      allow_block_work);
-
-       if (atomic_read(&fctx->notify_ref) == 0)
-               nvif_event_block(&fctx->event);
-       else
-               nvif_event_allow(&fctx->event);
+       struct nouveau_fence_chan *fctx = container_of(event, typeof(*fctx), event);
+       schedule_work(&fctx->uevent_work);
+       return NVIF_EVENT_KEEP;
 }
 
 void
@@ -191,7 +189,7 @@ nouveau_fence_context_new(struct nouveau_channel *chan, struct nouveau_fence_cha
        } args;
        int ret;
 
-       INIT_WORK(&fctx->allow_block_work, nouveau_fence_work_allow_block);
+       INIT_WORK(&fctx->uevent_work, nouveau_fence_uevent_work);
        INIT_LIST_HEAD(&fctx->flip);
        INIT_LIST_HEAD(&fctx->pending);
        spin_lock_init(&fctx->lock);
@@ -535,19 +533,15 @@ static bool nouveau_fence_enable_signaling(struct dma_fence *f)
        struct nouveau_fence *fence = from_fence(f);
        struct nouveau_fence_chan *fctx = nouveau_fctx(fence);
        bool ret;
-       bool do_work;
 
-       if (atomic_inc_return(&fctx->notify_ref) == 0)
-               do_work = true;
+       if (!fctx->notify_ref++)
+               nvif_event_allow(&fctx->event);
 
        ret = nouveau_fence_no_signaling(f);
        if (ret)
                set_bit(DMA_FENCE_FLAG_USER_BITS, &fence->base.flags);
-       else if (atomic_dec_and_test(&fctx->notify_ref))
-               do_work = true;
-
-       if (do_work)
-               schedule_work(&fctx->allow_block_work);
+       else if (!--fctx->notify_ref)
+               nvif_event_block(&fctx->event);
 
        return ret;
 }
index 28f5cf013b8983240204d028c8367249a63912e0..8bc065acfe35870f62bd0f2e37df47a35eb8ae38 100644 (file)
@@ -3,7 +3,6 @@
 #define __NOUVEAU_FENCE_H__
 
 #include <linux/dma-fence.h>
-#include <linux/workqueue.h>
 #include <nvif/event.h>
 
 struct nouveau_drm;
@@ -45,10 +44,9 @@ struct nouveau_fence_chan {
        u32 context;
        char name[32];
 
+       struct work_struct uevent_work;
        struct nvif_event event;
-       struct work_struct allow_block_work;
-       atomic_t notify_ref;
-       int dead, killed;
+       int notify_ref, dead, killed;
 };
 
 struct nouveau_fence_priv {
index 49c2bcbef1299de1f556353423300b345e5cc538..5a887d67dc0e8c71cf1987acb68f48f4bbf05d70 100644 (file)
@@ -764,7 +764,7 @@ nouveau_gem_ioctl_pushbuf(struct drm_device *dev, void *data,
                return -ENOMEM;
 
        if (unlikely(nouveau_cli_uvmm(cli)))
-               return -ENOSYS;
+               return nouveau_abi16_put(abi16, -ENOSYS);
 
        list_for_each_entry(temp, &abi16->channels, head) {
                if (temp->chan->chid == req->channel) {
index dd98f6910f9cab7b19117186339a138277e77b78..32fa2e273965bf140a4cb2e05262b638c312cd6e 100644 (file)
@@ -398,7 +398,7 @@ static const struct drm_sched_backend_ops nouveau_sched_ops = {
        .free_job = nouveau_sched_free_job,
 };
 
-int
+static int
 nouveau_sched_init(struct nouveau_sched *sched, struct nouveau_drm *drm,
                   struct workqueue_struct *wq, u32 credit_limit)
 {
@@ -453,7 +453,30 @@ fail_wq:
        return ret;
 }
 
-void
+int
+nouveau_sched_create(struct nouveau_sched **psched, struct nouveau_drm *drm,
+                    struct workqueue_struct *wq, u32 credit_limit)
+{
+       struct nouveau_sched *sched;
+       int ret;
+
+       sched = kzalloc(sizeof(*sched), GFP_KERNEL);
+       if (!sched)
+               return -ENOMEM;
+
+       ret = nouveau_sched_init(sched, drm, wq, credit_limit);
+       if (ret) {
+               kfree(sched);
+               return ret;
+       }
+
+       *psched = sched;
+
+       return 0;
+}
+
+
+static void
 nouveau_sched_fini(struct nouveau_sched *sched)
 {
        struct drm_gpu_scheduler *drm_sched = &sched->base;
@@ -471,3 +494,14 @@ nouveau_sched_fini(struct nouveau_sched *sched)
        if (sched->wq)
                destroy_workqueue(sched->wq);
 }
+
+void
+nouveau_sched_destroy(struct nouveau_sched **psched)
+{
+       struct nouveau_sched *sched = *psched;
+
+       nouveau_sched_fini(sched);
+       kfree(sched);
+
+       *psched = NULL;
+}
index a6528f5981e6a6e8182a44e0ec3c0336302e6154..e1f01a23e6f6e84cf2700bde86e4fb5e3e013df1 100644 (file)
@@ -111,8 +111,8 @@ struct nouveau_sched {
        } job;
 };
 
-int nouveau_sched_init(struct nouveau_sched *sched, struct nouveau_drm *drm,
-                      struct workqueue_struct *wq, u32 credit_limit);
-void nouveau_sched_fini(struct nouveau_sched *sched);
+int nouveau_sched_create(struct nouveau_sched **psched, struct nouveau_drm *drm,
+                        struct workqueue_struct *wq, u32 credit_limit);
+void nouveau_sched_destroy(struct nouveau_sched **psched);
 
 #endif
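
This header change completes the init/fini to create/destroy conversion:
callers now hold a pointer that is legitimately NULL when no scheduler
exists (the abi16 hunk above only allocates one for VM_BIND clients),
and allocation moves behind the API. The generic shape of such a
conversion, sketched with a placeholder type:

    #include <linux/errno.h>
    #include <linux/slab.h>

    struct thing {
            int state;
    };

    static int thing_init(struct thing *t)
    {
            t->state = 1;
            return 0;
    }

    static void thing_fini(struct thing *t)
    {
            t->state = 0;
    }

    /* Public API: allocate + init; *pt stays untouched on failure. */
    int thing_create(struct thing **pt)
    {
            struct thing *t = kzalloc(sizeof(*t), GFP_KERNEL);
            int ret;

            if (!t)
                    return -ENOMEM;

            ret = thing_init(t);
            if (ret) {
                    kfree(t);
                    return ret;
            }

            *pt = t;
            return 0;
    }

    /* fini + free, and NULL the caller's pointer for cheap existence checks. */
    void thing_destroy(struct thing **pt)
    {
            thing_fini(*pt);
            kfree(*pt);
            *pt = NULL;
    }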
index cc03e0c22ff3fec65cf6a40ae34db2af20bb349e..5e4565c5011a976d1c8057e9366d9e1da03de97a 100644 (file)
@@ -1011,7 +1011,7 @@ nouveau_svm_fault_buffer_ctor(struct nouveau_svm *svm, s32 oclass, int id)
        if (ret)
                return ret;
 
-       buffer->fault = kvcalloc(sizeof(*buffer->fault), buffer->entries, GFP_KERNEL);
+       buffer->fault = kvcalloc(buffer->entries, sizeof(*buffer->fault), GFP_KERNEL);
        if (!buffer->fault)
                return -ENOMEM;
 
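
The one-line fix above restores kvcalloc()'s documented argument order.
Because both parameters are size_t, the swapped call compiled silently
and even allocated the right number of bytes (the product commutes), so
this is a correctness-of-intent cleanup; the convention throughout the
calloc family is count first, element size second:

    /* From <linux/slab.h>: */
    void *kvcalloc(size_t n, size_t size, gfp_t flags);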
index 4f223c972c6a8cb3bab7873dfa2a1c38756648b2..0a0a11dc9ec03eeba855f47ca57c1ad1c5669f54 100644 (file)
@@ -1740,7 +1740,7 @@ nouveau_uvmm_ioctl_vm_bind(struct drm_device *dev,
        if (ret)
                return ret;
 
-       args.sched = &cli->sched;
+       args.sched = cli->sched;
        args.file_priv = file_priv;
 
        ret = nouveau_uvmm_vm_bind(&args);
index ebdeb8eb9e774186707d17508975311c14f3fabf..c55662937ab22caa54cd0d8f3df44b0d3e170197 100644 (file)
@@ -180,6 +180,7 @@ nvkm_client_new(const char *name, u64 device, const char *cfg, const char *dbg,
        client->device = device;
        client->debug = nvkm_dbgopt(dbg, "CLIENT");
        client->objroot = RB_ROOT;
+       spin_lock_init(&client->obj_lock);
        client->event = event;
        INIT_LIST_HEAD(&client->umem);
        spin_lock_init(&client->lock);
index 7c554c14e8841da1bb0374f25d9a47512c6f3765..aea3ba72027abfbdf0456f065bce416eac26b348 100644 (file)
@@ -30,8 +30,10 @@ nvkm_object_search(struct nvkm_client *client, u64 handle,
                   const struct nvkm_object_func *func)
 {
        struct nvkm_object *object;
+       unsigned long flags;
 
        if (handle) {
+               spin_lock_irqsave(&client->obj_lock, flags);
                struct rb_node *node = client->objroot.rb_node;
                while (node) {
                        object = rb_entry(node, typeof(*object), node);
@@ -40,9 +42,12 @@ nvkm_object_search(struct nvkm_client *client, u64 handle,
                        else
                        if (handle > object->object)
                                node = node->rb_right;
-                       else
+                       else {
+                               spin_unlock_irqrestore(&client->obj_lock, flags);
                                goto done;
+                       }
                }
+               spin_unlock_irqrestore(&client->obj_lock, flags);
                return ERR_PTR(-ENOENT);
        } else {
                object = &client->object;
@@ -57,30 +62,39 @@ done:
 void
 nvkm_object_remove(struct nvkm_object *object)
 {
+       unsigned long flags;
+
+       spin_lock_irqsave(&object->client->obj_lock, flags);
        if (!RB_EMPTY_NODE(&object->node))
                rb_erase(&object->node, &object->client->objroot);
+       spin_unlock_irqrestore(&object->client->obj_lock, flags);
 }
 
 bool
 nvkm_object_insert(struct nvkm_object *object)
 {
-       struct rb_node **ptr = &object->client->objroot.rb_node;
+       struct rb_node **ptr;
        struct rb_node *parent = NULL;
+       unsigned long flags;
 
+       spin_lock_irqsave(&object->client->obj_lock, flags);
+       ptr = &object->client->objroot.rb_node;
        while (*ptr) {
                struct nvkm_object *this = rb_entry(*ptr, typeof(*this), node);
                parent = *ptr;
-               if (object->object < this->object)
+               if (object->object < this->object) {
                        ptr = &parent->rb_left;
-               else
-               if (object->object > this->object)
+               } else if (object->object > this->object) {
                        ptr = &parent->rb_right;
-               else
+               } else {
+                       spin_unlock_irqrestore(&object->client->obj_lock, flags);
                        return false;
+               }
        }
 
        rb_link_node(&object->node, parent, ptr);
        rb_insert_color(&object->node, &object->client->objroot);
+       spin_unlock_irqrestore(&object->client->obj_lock, flags);
        return true;
 }
 
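
Wrapping every objroot traversal in the new obj_lock takes the client's
handle tree from implicitly single-threaded to safe under concurrent
lookup, insert, and remove; the irqsave variants keep it callable from
any context. The locked-lookup idiom, condensed to a standalone sketch
with placeholder types:

    #include <linux/rbtree.h>
    #include <linux/spinlock.h>

    struct obj {
            struct rb_node node;
            u64 handle;
    };

    static DEFINE_SPINLOCK(obj_lock);
    static struct rb_root obj_root = RB_ROOT;

    static struct obj *obj_find(u64 handle)
    {
            struct obj *found = NULL;
            struct rb_node *n;
            unsigned long flags;

            spin_lock_irqsave(&obj_lock, flags);
            for (n = obj_root.rb_node; n; ) {
                    struct obj *o = rb_entry(n, struct obj, node);

                    if (handle < o->handle)
                            n = n->rb_left;
                    else if (handle > o->handle)
                            n = n->rb_right;
                    else {
                            found = o;
                            break;
                    }
            }
            spin_unlock_irqrestore(&obj_lock, flags);

            return found;
    }

Note the lock covers the whole walk, mirroring the patch: an rbtree gives
no RCU-style guarantees, so readers and writers must fully exclude each
other.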
index 4135690326f44789535e8cb375ccfe1ee5fa68c3..3a30bea30e366f47ecda0bbabac5441aed285565 100644 (file)
@@ -168,12 +168,11 @@ r535_bar_new_(const struct nvkm_bar_func *hw, struct nvkm_device *device,
        rm->flush = r535_bar_flush;
 
        ret = gf100_bar_new_(rm, device, type, inst, &bar);
-       *pbar = bar;
        if (ret) {
-               if (!bar)
-                       kfree(rm);
+               kfree(rm);
                return ret;
        }
+       *pbar = bar;
 
        bar->flushBAR2PhysMode = ioremap(device->func->resource_addr(device, 3), PAGE_SIZE);
        if (!bar->flushBAR2PhysMode)
index 19188683c8fca90a7656b53ab15a8ee58d8575e0..8c2bf1c16f2a9568a8d434838d0c7691d9d70ff7 100644 (file)
@@ -154,11 +154,17 @@ shadow_fw_init(struct nvkm_bios *bios, const char *name)
        return (void *)fw;
 }
 
+static void
+shadow_fw_release(void *fw)
+{
+       release_firmware(fw);
+}
+
 static const struct nvbios_source
 shadow_fw = {
        .name = "firmware",
        .init = shadow_fw_init,
-       .fini = (void(*)(void *))release_firmware,
+       .fini = shadow_fw_release,
        .read = shadow_fw_read,
        .rw = false,
 };
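
Replacing the function-pointer cast with a real wrapper matters on
control-flow-integrity kernels, which is presumably the motivation here
(the hunk itself does not say): with kCFI an indirect call through the
.fini slot is checked against the callee's true prototype, and
release_firmware() takes const struct firmware *, not void *. The
anti-pattern and its fix in miniature:

    #include <linux/firmware.h>

    /* Anti-pattern: compiles, but the call through a void (*)(void *)
     * slot no longer matches release_firmware()'s real signature:
     *
     *     .fini = (void (*)(void *))release_firmware,
     *
     * Fix: a trivially typed trampoline. */
    static void fw_release(void *fw)
    {
            release_firmware(fw);
    }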
index 9ee58e2a0eb2ad99c198ea7a58e6e1cf02a667d0..a73a5b58979045b07468c1443940f87e1b151f67 100644 (file)
@@ -997,6 +997,32 @@ r535_gsp_rpc_get_gsp_static_info(struct nvkm_gsp *gsp)
        return 0;
 }
 
+static void
+nvkm_gsp_mem_dtor(struct nvkm_gsp *gsp, struct nvkm_gsp_mem *mem)
+{
+       if (mem->data) {
+               /*
+                * Poison the buffer to catch any unexpected access from
+                * GSP-RM if the buffer was prematurely freed.
+                */
+               memset(mem->data, 0xFF, mem->size);
+
+               dma_free_coherent(gsp->subdev.device->dev, mem->size, mem->data, mem->addr);
+               memset(mem, 0, sizeof(*mem));
+       }
+}
+
+static int
+nvkm_gsp_mem_ctor(struct nvkm_gsp *gsp, size_t size, struct nvkm_gsp_mem *mem)
+{
+       mem->size = size;
+       mem->data = dma_alloc_coherent(gsp->subdev.device->dev, size, &mem->addr, GFP_KERNEL);
+       if (WARN_ON(!mem->data))
+               return -ENOMEM;
+
+       return 0;
+}
+
 static int
 r535_gsp_postinit(struct nvkm_gsp *gsp)
 {
@@ -1024,6 +1050,11 @@ r535_gsp_postinit(struct nvkm_gsp *gsp)
 
        nvkm_inth_allow(&gsp->subdev.inth);
        nvkm_wr32(device, 0x110004, 0x00000040);
+
+       /* Release the DMA buffers that were needed only for boot and init */
+       nvkm_gsp_mem_dtor(gsp, &gsp->boot.fw);
+       nvkm_gsp_mem_dtor(gsp, &gsp->libos);
+
        return ret;
 }
 
@@ -1078,7 +1109,6 @@ r535_gsp_rpc_set_registry(struct nvkm_gsp *gsp)
        if (IS_ERR(rpc))
                return PTR_ERR(rpc);
 
-       rpc->size = sizeof(*rpc);
        rpc->numEntries = NV_GSP_REG_NUM_ENTRIES;
 
        str_offset = offsetof(typeof(*rpc), entries[NV_GSP_REG_NUM_ENTRIES]);
@@ -1094,6 +1124,7 @@ r535_gsp_rpc_set_registry(struct nvkm_gsp *gsp)
                strings += name_len;
                str_offset += name_len;
        }
+       rpc->size = str_offset;
 
        return nvkm_gsp_rpc_wr(gsp, rpc, false);
 }
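The registry RPC is a fixed header plus entry array followed by a variable-length table of NUL-terminated key strings, so setting rpc->size to sizeof(*rpc) under-reported the message and truncated the string table; the running str_offset after the copy loop is the true total. Schematically, under the layout visible in this hunk (the key-name helper is hypothetical):

        /* Sketch: size = header + entry array + concatenated key strings. */
        str_offset = offsetof(typeof(*rpc), entries[NV_GSP_REG_NUM_ENTRIES]);
        for (i = 0; i < NV_GSP_REG_NUM_ENTRIES; i++)
                str_offset += strlen(registry_key_name(i)) + 1;   /* + NUL */
        rpc->size = str_offset;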
@@ -1532,27 +1563,6 @@ r535_gsp_msg_run_cpu_sequencer(void *priv, u32 fn, void *repv, u32 repc)
        return 0;
 }
 
-static void
-nvkm_gsp_mem_dtor(struct nvkm_gsp *gsp, struct nvkm_gsp_mem *mem)
-{
-       if (mem->data) {
-               dma_free_coherent(gsp->subdev.device->dev, mem->size, mem->data, mem->addr);
-               mem->data = NULL;
-       }
-}
-
-static int
-nvkm_gsp_mem_ctor(struct nvkm_gsp *gsp, u32 size, struct nvkm_gsp_mem *mem)
-{
-       mem->size = size;
-       mem->data = dma_alloc_coherent(gsp->subdev.device->dev, size, &mem->addr, GFP_KERNEL);
-       if (WARN_ON(!mem->data))
-               return -ENOMEM;
-
-       return 0;
-}
-
-
 static int
 r535_gsp_booter_unload(struct nvkm_gsp *gsp, u32 mbox0, u32 mbox1)
 {
@@ -1938,20 +1948,20 @@ nvkm_gsp_radix3_dtor(struct nvkm_gsp *gsp, struct nvkm_gsp_radix3 *rx3)
  * See kgspCreateRadix3_IMPL
  */
 static int
-nvkm_gsp_radix3_sg(struct nvkm_device *device, struct sg_table *sgt, u64 size,
+nvkm_gsp_radix3_sg(struct nvkm_gsp *gsp, struct sg_table *sgt, u64 size,
                   struct nvkm_gsp_radix3 *rx3)
 {
        u64 addr;
 
        for (int i = ARRAY_SIZE(rx3->mem) - 1; i >= 0; i--) {
                u64 *ptes;
-               int idx;
+               size_t bufsize;
+               int ret, idx;
 
-               rx3->mem[i].size = ALIGN((size / GSP_PAGE_SIZE) * sizeof(u64), GSP_PAGE_SIZE);
-               rx3->mem[i].data = dma_alloc_coherent(device->dev, rx3->mem[i].size,
-                                                     &rx3->mem[i].addr, GFP_KERNEL);
-               if (WARN_ON(!rx3->mem[i].data))
-                       return -ENOMEM;
+               bufsize = ALIGN((size / GSP_PAGE_SIZE) * sizeof(u64), GSP_PAGE_SIZE);
+               ret = nvkm_gsp_mem_ctor(gsp, bufsize, &rx3->mem[i]);
+               if (ret)
+                       return ret;
 
                ptes = rx3->mem[i].data;
                if (i == 2) {
@@ -1991,7 +2001,7 @@ r535_gsp_fini(struct nvkm_gsp *gsp, bool suspend)
                if (ret)
                        return ret;
 
-               ret = nvkm_gsp_radix3_sg(gsp->subdev.device, &gsp->sr.sgt, len, &gsp->sr.radix3);
+               ret = nvkm_gsp_radix3_sg(gsp, &gsp->sr.sgt, len, &gsp->sr.radix3);
                if (ret)
                        return ret;
 
@@ -2150,6 +2160,13 @@ r535_gsp_dtor(struct nvkm_gsp *gsp)
        mutex_destroy(&gsp->cmdq.mutex);
 
        r535_gsp_dtor_fws(gsp);
+
+       nvkm_gsp_mem_dtor(gsp, &gsp->rmargs);
+       nvkm_gsp_mem_dtor(gsp, &gsp->wpr_meta);
+       nvkm_gsp_mem_dtor(gsp, &gsp->shm.mem);
+       nvkm_gsp_mem_dtor(gsp, &gsp->loginit);
+       nvkm_gsp_mem_dtor(gsp, &gsp->logintr);
+       nvkm_gsp_mem_dtor(gsp, &gsp->logrm);
 }
 
 int
@@ -2194,7 +2211,7 @@ r535_gsp_oneinit(struct nvkm_gsp *gsp)
        memcpy(gsp->sig.data, data, size);
 
        /* Build radix3 page table for ELF image. */
-       ret = nvkm_gsp_radix3_sg(device, &gsp->fw.mem.sgt, gsp->fw.len, &gsp->radix3);
+       ret = nvkm_gsp_radix3_sg(gsp, &gsp->fw.mem.sgt, gsp->fw.len, &gsp->radix3);
        if (ret)
                return ret;
 
@@ -2295,8 +2312,12 @@ r535_gsp_load(struct nvkm_gsp *gsp, int ver, const struct nvkm_gsp_fwif *fwif)
 {
        struct nvkm_subdev *subdev = &gsp->subdev;
        int ret;
+       bool enable_gsp = fwif->enable;
 
-       if (!nvkm_boolopt(subdev->device->cfgopt, "NvGspRm", fwif->enable))
+#if IS_ENABLED(CONFIG_DRM_NOUVEAU_GSP_DEFAULT)
+       enable_gsp = true;
+#endif
+       if (!nvkm_boolopt(subdev->device->cfgopt, "NvGspRm", enable_gsp))
                return -EINVAL;
 
        if ((ret = r535_gsp_load_fw(gsp, "gsp", fwif->ver, &gsp->fws.rm)) ||
index dad938cf6decfb0658a73439a6ca602c78fce2fb..8f3783742208b60d8b5b9ad7c6e2ceab4e9fc9e4 100644 (file)
@@ -539,6 +539,8 @@ config DRM_PANEL_RAYDIUM_RM692E5
        depends on OF
        depends on DRM_MIPI_DSI
        depends on BACKLIGHT_CLASS_DEVICE
+       select DRM_DISPLAY_DP_HELPER
+       select DRM_DISPLAY_HELPER
        help
          Say Y here if you want to enable support for Raydium RM692E5-based
          display panels, such as the one found in the Fairphone 5 smartphone.
index c4c0f08e92026d80824a6932a696144da65e0311..4945a1e787eb3efc8bb9617bef1a75fba13bb656 100644 (file)
@@ -1768,11 +1768,11 @@ static const struct panel_desc starry_qfh032011_53g_desc = {
 };
 
 static const struct drm_display_mode starry_himax83102_j02_default_mode = {
-       .clock = 162850,
+       .clock = 162680,
        .hdisplay = 1200,
-       .hsync_start = 1200 + 50,
-       .hsync_end = 1200 + 50 + 20,
-       .htotal = 1200 + 50 + 20 + 50,
+       .hsync_start = 1200 + 60,
+       .hsync_end = 1200 + 60 + 20,
+       .htotal = 1200 + 60 + 20 + 40,
        .vdisplay = 1920,
        .vsync_start = 1920 + 116,
        .vsync_end = 1920 + 116 + 8,
index ea5a857793827af1a0bfe90d88bf2a3a71065f11..f23d8832a1ad055483b1f513557cb3d2807e3692 100644 (file)
@@ -309,7 +309,7 @@ static const struct s6d7aa0_panel_desc s6d7aa0_lsl080al02_desc = {
        .off_func = s6d7aa0_lsl080al02_off,
        .drm_mode = &s6d7aa0_lsl080al02_mode,
        .mode_flags = MIPI_DSI_MODE_VSYNC_FLUSH | MIPI_DSI_MODE_VIDEO_NO_HFP,
-       .bus_flags = DRM_BUS_FLAG_DE_HIGH,
+       .bus_flags = 0,
 
        .has_backlight = false,
        .use_passwd3 = false,
index 2214cb09678cd6a234359c2cb7972c9beb3f5851..d493ee735c7349b2ae1a21abff870859c0ea2af4 100644 (file)
@@ -3948,6 +3948,7 @@ static const struct panel_desc tianma_tm070jdhg30 = {
        },
        .bus_format = MEDIA_BUS_FMT_RGB888_1X7X4_SPWG,
        .connector_type = DRM_MODE_CONNECTOR_LVDS,
+       .bus_flags = DRM_BUS_FLAG_DE_HIGH,
 };
 
 static const struct panel_desc tianma_tm070jvhg33 = {
@@ -3960,6 +3961,7 @@ static const struct panel_desc tianma_tm070jvhg33 = {
        },
        .bus_format = MEDIA_BUS_FMT_RGB888_1X7X4_SPWG,
        .connector_type = DRM_MODE_CONNECTOR_LVDS,
+       .bus_flags = DRM_BUS_FLAG_DE_HIGH,
 };
 
 static const struct display_timing tianma_tm070rvhg71_timing = {
index 85b3b4871a1d63bf5a8cb2315a25dfd5ef2b8b70..fdd768bbd487c24b545da7aba2d0d45f63784293 100644 (file)
@@ -1985,8 +1985,10 @@ static void vop2_crtc_atomic_enable(struct drm_crtc *crtc,
                clock = vop2_set_intf_mux(vp, rkencoder->crtc_endpoint_id, polflags);
        }
 
-       if (!clock)
+       if (!clock) {
+               vop2_unlock(vop2);
                return;
+       }
 
        if (vcstate->output_mode == ROCKCHIP_OUT_MODE_AAAA &&
            !(vp_data->feature & VOP2_VP_FEATURE_OUTPUT_10BIT))
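This is a classic early-return lock leak: the `!clock` path returned from vop2_crtc_atomic_enable() with the vop2 lock still held, stalling every later modeset. One structure that makes such paths harder to miss is a single unlock label; a sketch with hypothetical helper names:

        static void crtc_enable(struct vop2 *vop2)
        {
                vop2_lock(vop2);

                if (!setup_clock(vop2))
                        goto out_unlock;        /* every early exit funnels here */

                program_hardware(vop2);

        out_unlock:
                vop2_unlock(vop2);
        }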
index 550492a7a031d7827b2e167098c495908bee82aa..d442b893275b971a53adc42b3a06973eebd8bdbb 100644 (file)
@@ -1184,14 +1184,16 @@ static void drm_sched_run_job_work(struct work_struct *w)
        if (READ_ONCE(sched->pause_submit))
                return;
 
+       /* Find entity with a ready job */
        entity = drm_sched_select_entity(sched);
        if (!entity)
-               return;
+               return; /* No more work */
 
        sched_job = drm_sched_entity_pop_job(entity);
        if (!sched_job) {
                complete_all(&entity->entity_idle);
-               return; /* No more work */
+               drm_sched_run_job_queue(sched);
+               return;
        }
 
        s_fence = sched_job->s_fence;
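The scheduler fix closes a stall: if an entity is selected but drm_sched_entity_pop_job() then returns NULL (for example because the job was consumed or became unready in between), the old code simply returned and nothing re-armed the work item. Reduced to its essentials, the rule after this change is to never leave the worker without either running a job or re-queuing:

        /* Sketch of the control flow after the fix. */
        entity = drm_sched_select_entity(sched);
        if (!entity)
                return;                         /* genuinely idle */

        job = drm_sched_entity_pop_job(entity);
        if (!job) {
                complete_all(&entity->entity_idle);
                drm_sched_run_job_queue(sched); /* lost a race: try again */
                return;
        }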
index ff36171c8fb700bae9967961220ea7cbb262d193..03d1c76aec2d3f7aca6a52acbb1a42455b37faa8 100644 (file)
@@ -960,7 +960,8 @@ int host1x_client_iommu_attach(struct host1x_client *client)
         * not the shared IOMMU domain, don't try to attach it to a different
         * domain. This allows using the IOMMU-backed DMA API.
         */
-       if (domain && domain != tegra->domain)
+       if (domain && domain->type != IOMMU_DOMAIN_IDENTITY &&
+           domain != tegra->domain)
                return 0;
 
        if (tegra->domain) {
@@ -1242,9 +1243,26 @@ static int host1x_drm_probe(struct host1x_device *dev)
 
        drm_mode_config_reset(drm);
 
-       err = drm_aperture_remove_framebuffers(&tegra_drm_driver);
-       if (err < 0)
-               goto hub;
+       /*
+        * Only take over from a potential firmware framebuffer if any CRTCs
+        * have been registered. This must not be a fatal error because there
+        * are other accelerators that are exposed via this driver.
+        *
+        * Another case where this happens is on Tegra234 where the display
+        * hardware is no longer part of the host1x complex, so this driver
+        * will not expose any modesetting features.
+        */
+       if (drm->mode_config.num_crtc > 0) {
+               err = drm_aperture_remove_framebuffers(&tegra_drm_driver);
+               if (err < 0)
+                       goto hub;
+       } else {
+               /*
+                * Indicate to userspace that this doesn't expose any display
+                * capabilities.
+                */
+               drm->driver_features &= ~(DRIVER_MODESET | DRIVER_ATOMIC);
+       }
 
        err = drm_dev_register(drm, 0);
        if (err < 0)
index ea2af6bd9abebcf381cc6a1a245e8cc1f044a656..e48863a445564d9200c75223d0ab3df8090a4bc5 100644 (file)
 
 #include <linux/prime_numbers.h>
 #include <linux/sched/signal.h>
+#include <linux/sizes.h>
 
 #include <drm/drm_buddy.h>
 
 #include "../lib/drm_random.h"
 
+static unsigned int random_seed;
+
 static inline u64 get_size(int order, u64 chunk_size)
 {
        return (1 << order) * chunk_size;
 }
 
+static void drm_test_buddy_alloc_range_bias(struct kunit *test)
+{
+       u32 mm_size, ps, bias_size, bias_start, bias_end, bias_rem;
+       DRM_RND_STATE(prng, random_seed);
+       unsigned int i, count, *order;
+       struct drm_buddy mm;
+       LIST_HEAD(allocated);
+
+       bias_size = SZ_1M;
+       ps = roundup_pow_of_two(prandom_u32_state(&prng) % bias_size);
+       ps = max(SZ_4K, ps);
+       mm_size = (SZ_8M-1) & ~(ps-1); /* Multiple roots */
+
+       kunit_info(test, "mm_size=%u, ps=%u\n", mm_size, ps);
+
+       KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_init(&mm, mm_size, ps),
+                              "buddy_init failed\n");
+
+       count = mm_size / bias_size;
+       order = drm_random_order(count, &prng);
+       KUNIT_EXPECT_TRUE(test, order);
+
+       /*
+        * Idea is to split the address space into uniform bias ranges, and then
+        * in some random order allocate within each bias, using various
+        * patterns within. This should detect if allocations leak out from a
+        * given bias, for example.
+        */
+
+       for (i = 0; i < count; i++) {
+               LIST_HEAD(tmp);
+               u32 size;
+
+               bias_start = order[i] * bias_size;
+               bias_end = bias_start + bias_size;
+               bias_rem = bias_size;
+
+               /* internal round_up too big */
+               KUNIT_ASSERT_TRUE_MSG(test,
+                                     drm_buddy_alloc_blocks(&mm, bias_start,
+                                                            bias_end, bias_size + ps, bias_size,
+                                                            &allocated,
+                                                            DRM_BUDDY_RANGE_ALLOCATION),
+                                     "buddy_alloc didn't fail with bias(%x-%x), size=%u, ps=%u\n",
+                                     bias_start, bias_end, bias_size + ps, bias_size);
+
+               /* size too big */
+               KUNIT_ASSERT_TRUE_MSG(test,
+                                     drm_buddy_alloc_blocks(&mm, bias_start,
+                                                            bias_end, bias_size + ps, ps,
+                                                            &allocated,
+                                                            DRM_BUDDY_RANGE_ALLOCATION),
+                                     "buddy_alloc didn't fail with bias(%x-%x), size=%u, ps=%u\n",
+                                     bias_start, bias_end, bias_size + ps, ps);
+
+               /* bias range too small for size */
+               KUNIT_ASSERT_TRUE_MSG(test,
+                                     drm_buddy_alloc_blocks(&mm, bias_start + ps,
+                                                            bias_end, bias_size, ps,
+                                                            &allocated,
+                                                            DRM_BUDDY_RANGE_ALLOCATION),
+                                     "buddy_alloc didn't fail with bias(%x-%x), size=%u, ps=%u\n",
+                                     bias_start + ps, bias_end, bias_size, ps);
+
+               /* bias misaligned */
+               KUNIT_ASSERT_TRUE_MSG(test,
+                                     drm_buddy_alloc_blocks(&mm, bias_start + ps,
+                                                            bias_end - ps,
+                                                            bias_size >> 1, bias_size >> 1,
+                                                            &allocated,
+                                                            DRM_BUDDY_RANGE_ALLOCATION),
+                                     "buddy_alloc didn't fail with bias(%x-%x), size=%u, ps=%u\n",
+                                     bias_start + ps, bias_end - ps, bias_size >> 1, bias_size >> 1);
+
+               /* single big page */
+               KUNIT_ASSERT_FALSE_MSG(test,
+                                      drm_buddy_alloc_blocks(&mm, bias_start,
+                                                             bias_end, bias_size, bias_size,
+                                                             &tmp,
+                                                             DRM_BUDDY_RANGE_ALLOCATION),
+                                      "buddy_alloc failed with bias(%x-%x), size=%u, ps=%u\n",
+                                      bias_start, bias_end, bias_size, bias_size);
+               drm_buddy_free_list(&mm, &tmp);
+
+               /* single page with internal round_up */
+               KUNIT_ASSERT_FALSE_MSG(test,
+                                      drm_buddy_alloc_blocks(&mm, bias_start,
+                                                             bias_end, ps, bias_size,
+                                                             &tmp,
+                                                             DRM_BUDDY_RANGE_ALLOCATION),
+                                      "buddy_alloc failed with bias(%x-%x), size=%u, ps=%u\n",
+                                      bias_start, bias_end, ps, bias_size);
+               drm_buddy_free_list(&mm, &tmp);
+
+               /* random size within */
+               size = max(round_up(prandom_u32_state(&prng) % bias_rem, ps), ps);
+               if (size)
+                       KUNIT_ASSERT_FALSE_MSG(test,
+                                              drm_buddy_alloc_blocks(&mm, bias_start,
+                                                                     bias_end, size, ps,
+                                                                     &tmp,
+                                                                     DRM_BUDDY_RANGE_ALLOCATION),
+                                              "buddy_alloc failed with bias(%x-%x), size=%u, ps=%u\n",
+                                              bias_start, bias_end, size, ps);
+
+               bias_rem -= size;
+               /* too big for current avail */
+               KUNIT_ASSERT_TRUE_MSG(test,
+                                     drm_buddy_alloc_blocks(&mm, bias_start,
+                                                            bias_end, bias_rem + ps, ps,
+                                                            &allocated,
+                                                            DRM_BUDDY_RANGE_ALLOCATION),
+                                     "buddy_alloc didn't fail with bias(%x-%x), size=%u, ps=%u\n",
+                                     bias_start, bias_end, bias_rem + ps, ps);
+
+               if (bias_rem) {
+                       /* random fill of the remainder */
+                       size = max(round_up(prandom_u32_state(&prng) % bias_rem, ps), ps);
+
+                       KUNIT_ASSERT_FALSE_MSG(test,
+                                              drm_buddy_alloc_blocks(&mm, bias_start,
+                                                                     bias_end, size, ps,
+                                                                     &allocated,
+                                                                     DRM_BUDDY_RANGE_ALLOCATION),
+                                              "buddy_alloc failed with bias(%x-%x), size=%u, ps=%u\n",
+                                              bias_start, bias_end, size, ps);
+                       /*
+                        * Intentionally allow some space to be left
+                        * unallocated, and ideally not always on the bias
+                        * boundaries.
+                        */
+                       drm_buddy_free_list(&mm, &tmp);
+               } else {
+                       list_splice_tail(&tmp, &allocated);
+               }
+       }
+
+       kfree(order);
+       drm_buddy_free_list(&mm, &allocated);
+       drm_buddy_fini(&mm);
+
+       /*
+        * Something more free-form. Idea is to pick a random starting bias
+        * range within the address space and then start filling it up. Also
+        * randomly grow the bias range in both directions as we go along. This
+        * should give us bias start/end which is not always uniform like above,
+        * and in some cases will require the allocator to jump over already
+        * allocated nodes in the middle of the address space.
+        */
+
+       KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_init(&mm, mm_size, ps),
+                              "buddy_init failed\n");
+
+       bias_start = round_up(prandom_u32_state(&prng) % (mm_size - ps), ps);
+       bias_end = round_up(bias_start + prandom_u32_state(&prng) % (mm_size - bias_start), ps);
+       bias_end = max(bias_end, bias_start + ps);
+       bias_rem = bias_end - bias_start;
+
+       do {
+               u32 size = max(round_up(prandom_u32_state(&prng) % bias_rem, ps), ps);
+
+               KUNIT_ASSERT_FALSE_MSG(test,
+                                      drm_buddy_alloc_blocks(&mm, bias_start,
+                                                             bias_end, size, ps,
+                                                             &allocated,
+                                                             DRM_BUDDY_RANGE_ALLOCATION),
+                                      "buddy_alloc failed with bias(%x-%x), size=%u, ps=%u\n",
+                                      bias_start, bias_end, size, ps);
+               bias_rem -= size;
+
+               /*
+                * Try to randomly grow the bias range in both directions, or
+                * only one, or perhaps don't grow at all.
+                */
+               do {
+                       u32 old_bias_start = bias_start;
+                       u32 old_bias_end = bias_end;
+
+                       if (bias_start)
+                               bias_start -= round_up(prandom_u32_state(&prng) % bias_start, ps);
+                       if (bias_end != mm_size)
+                               bias_end += round_up(prandom_u32_state(&prng) % (mm_size - bias_end), ps);
+
+                       bias_rem += old_bias_start - bias_start;
+                       bias_rem += bias_end - old_bias_end;
+               } while (!bias_rem && (bias_start || bias_end != mm_size));
+       } while (bias_rem);
+
+       KUNIT_ASSERT_EQ(test, bias_start, 0);
+       KUNIT_ASSERT_EQ(test, bias_end, mm_size);
+       KUNIT_ASSERT_TRUE_MSG(test,
+                             drm_buddy_alloc_blocks(&mm, bias_start, bias_end,
+                                                    ps, ps,
+                                                    &allocated,
+                                                    DRM_BUDDY_RANGE_ALLOCATION),
+                             "buddy_alloc passed with bias(%x-%x), size=%u\n",
+                             bias_start, bias_end, ps);
+
+       drm_buddy_free_list(&mm, &allocated);
+       drm_buddy_fini(&mm);
+}
+
+static void drm_test_buddy_alloc_contiguous(struct kunit *test)
+{
+       const unsigned long ps = SZ_4K, mm_size = 16 * 3 * SZ_4K;
+       unsigned long i, n_pages, total;
+       struct drm_buddy_block *block;
+       struct drm_buddy mm;
+       LIST_HEAD(left);
+       LIST_HEAD(middle);
+       LIST_HEAD(right);
+       LIST_HEAD(allocated);
+
+       KUNIT_EXPECT_FALSE(test, drm_buddy_init(&mm, mm_size, ps));
+
+       /*
+        * Idea is to fragment the address space by alternating block
+        * allocations between three different lists; one for left, middle and
+        * right. We can then free a list to simulate fragmentation. In
+        * particular we want to exercise the DRM_BUDDY_CONTIGUOUS_ALLOCATION,
+        * including the try_harder path.
+        */
+
+       i = 0;
+       n_pages = mm_size / ps;
+       do {
+               struct list_head *list;
+               int slot = i % 3;
+
+               if (slot == 0)
+                       list = &left;
+               else if (slot == 1)
+                       list = &middle;
+               else
+                       list = &right;
+               KUNIT_ASSERT_FALSE_MSG(test,
+                                      drm_buddy_alloc_blocks(&mm, 0, mm_size,
+                                                             ps, ps, list, 0),
+                                      "buddy_alloc hit an error size=%lu\n",
+                                      ps);
+       } while (++i < n_pages);
+
+       KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size,
+                                                          3 * ps, ps, &allocated,
+                                                          DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+                              "buddy_alloc didn't error size=%lu\n", 3 * ps);
+
+       drm_buddy_free_list(&mm, &middle);
+       KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size,
+                                                          3 * ps, ps, &allocated,
+                                                          DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+                              "buddy_alloc didn't error size=%lu\n", 3 * ps);
+       KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size,
+                                                          2 * ps, ps, &allocated,
+                                                          DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+                              "buddy_alloc didn't error size=%lu\n", 2 * ps);
+
+       drm_buddy_free_list(&mm, &right);
+       KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size,
+                                                          3 * ps, ps, &allocated,
+                                                          DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+                              "buddy_alloc didn't error size=%lu\n", 3 * ps);
+       /*
+        * At this point we should have enough contiguous space for 2 blocks;
+        * however, they are never buddies (since we freed middle and right),
+        * so the allocator will need the try_harder logic to find them.
+        */
+       KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size,
+                                                           2 * ps, ps, &allocated,
+                                                           DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+                              "buddy_alloc hit an error size=%lu\n", 2 * ps);
+
+       drm_buddy_free_list(&mm, &left);
+       KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size,
+                                                           3 * ps, ps, &allocated,
+                                                           DRM_BUDDY_CONTIGUOUS_ALLOCATION),
+                              "buddy_alloc hit an error size=%lu\n", 3 * ps);
+
+       total = 0;
+       list_for_each_entry(block, &allocated, link)
+               total += drm_buddy_block_size(&mm, block);
+
+       KUNIT_ASSERT_EQ(test, total, ps * 2 + ps * 3);
+
+       drm_buddy_free_list(&mm, &allocated);
+       drm_buddy_fini(&mm);
+}
+
 static void drm_test_buddy_alloc_pathological(struct kunit *test)
 {
        u64 mm_size, size, start = 0;
@@ -275,16 +567,30 @@ static void drm_test_buddy_alloc_limit(struct kunit *test)
        drm_buddy_fini(&mm);
 }
 
+static int drm_buddy_suite_init(struct kunit_suite *suite)
+{
+       while (!random_seed)
+               random_seed = get_random_u32();
+
+       kunit_info(suite, "Testing DRM buddy manager, with random_seed=0x%x\n",
+                  random_seed);
+
+       return 0;
+}
+
 static struct kunit_case drm_buddy_tests[] = {
        KUNIT_CASE(drm_test_buddy_alloc_limit),
        KUNIT_CASE(drm_test_buddy_alloc_optimistic),
        KUNIT_CASE(drm_test_buddy_alloc_pessimistic),
        KUNIT_CASE(drm_test_buddy_alloc_pathological),
+       KUNIT_CASE(drm_test_buddy_alloc_contiguous),
+       KUNIT_CASE(drm_test_buddy_alloc_range_bias),
        {}
 };
 
 static struct kunit_suite drm_buddy_test_suite = {
        .name = "drm_buddy",
+       .suite_init = drm_buddy_suite_init,
        .test_cases = drm_buddy_tests,
 };
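Because both new buddy tests draw sizes and orderings from a pseudo-random stream, the suite picks a nonzero seed once in suite_init and logs it; a failure should then be reproducible by pinning random_seed to the logged value and re-running. The seeding idiom, reduced:

        static unsigned int random_seed;        /* 0 means "not yet chosen" */

        static int suite_init(struct kunit_suite *suite)
        {
                while (!random_seed)
                        random_seed = get_random_u32();
                kunit_info(suite, "random_seed=%#x\n", random_seed);
                return 0;
        }

Each test then derives its values deterministically via DRM_RND_STATE(prng, random_seed) and prandom_u32_state(&prng).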
 
index 4e9247cf9977f5677126ffbdcf56c97446c769b4..f37c0d76586568ce645b8fc42be6d1042849cbd9 100644 (file)
@@ -157,7 +157,7 @@ static void drm_test_mm_init(struct kunit *test)
 
        /* After creation, it should all be one massive hole */
        if (!assert_one_hole(test, &mm, 0, size)) {
-               KUNIT_FAIL(test, "");
+               KUNIT_FAIL(test, "mm not one hole on creation");
                goto out;
        }
 
@@ -171,14 +171,14 @@ static void drm_test_mm_init(struct kunit *test)
 
        /* After filling the range entirely, there should be no holes */
        if (!assert_no_holes(test, &mm)) {
-               KUNIT_FAIL(test, "");
+               KUNIT_FAIL(test, "mm has holes when filled");
                goto out;
        }
 
        /* And then after emptying it again, the massive hole should be back */
        drm_mm_remove_node(&tmp);
        if (!assert_one_hole(test, &mm, 0, size)) {
-               KUNIT_FAIL(test, "");
+               KUNIT_FAIL(test, "mm does not have single hole after emptying");
                goto out;
        }
 
@@ -188,13 +188,13 @@ out:
 
 static void drm_test_mm_debug(struct kunit *test)
 {
+       struct drm_printer p = drm_debug_printer(test->name);
        struct drm_mm mm;
        struct drm_mm_node nodes[2];
 
        /* Create a small drm_mm with a couple of nodes and a few holes, and
         * check that the debug iterator doesn't explode over a trivial drm_mm.
         */
-
        drm_mm_init(&mm, 0, 4096);
 
        memset(nodes, 0, sizeof(nodes));
@@ -209,6 +209,9 @@ static void drm_test_mm_debug(struct kunit *test)
        KUNIT_ASSERT_FALSE_MSG(test, drm_mm_reserve_node(&mm, &nodes[1]),
                               "failed to reserve node[0] {start=%lld, size=%lld)\n",
                               nodes[0].start, nodes[0].size);
+
+       drm_mm_print(&mm, &p);
+       KUNIT_SUCCEED(test);
 }
 
 static bool expect_insert(struct kunit *test, struct drm_mm *mm,
index f5187b384ae9ac8eedede8e6a0d4d56eb8af1670..76027960054f1140e768ae21b30e5a3015437d02 100644 (file)
@@ -95,11 +95,17 @@ static int ttm_global_init(void)
        ttm_pool_mgr_init(num_pages);
        ttm_tt_mgr_init(num_pages, num_dma32);
 
-       glob->dummy_read_page = alloc_page(__GFP_ZERO | GFP_DMA32);
+       glob->dummy_read_page = alloc_page(__GFP_ZERO | GFP_DMA32 |
+                                          __GFP_NOWARN);
 
+       /* Retry without GFP_DMA32 on platforms where DMA32 is not available */
        if (unlikely(glob->dummy_read_page == NULL)) {
-               ret = -ENOMEM;
-               goto out;
+               glob->dummy_read_page = alloc_page(__GFP_ZERO);
+               if (unlikely(glob->dummy_read_page == NULL)) {
+                       ret = -ENOMEM;
+                       goto out;
+               }
+               pr_warn("Using fallback for dummy_read_page without GFP_DMA32\n");
        }
 
        INIT_LIST_HEAD(&glob->device_list);
@@ -195,7 +201,7 @@ int ttm_device_init(struct ttm_device *bdev, const struct ttm_device_funcs *func
                    bool use_dma_alloc, bool use_dma32)
 {
        struct ttm_global *glob = &ttm_glob;
-       int ret;
+       int ret, nid;
 
        if (WARN_ON(vma_manager == NULL))
                return -EINVAL;
@@ -215,7 +221,12 @@ int ttm_device_init(struct ttm_device *bdev, const struct ttm_device_funcs *func
 
        ttm_sys_man_init(bdev);
 
-       ttm_pool_init(&bdev->pool, dev, dev_to_node(dev), use_dma_alloc, use_dma32);
+       if (dev)
+               nid = dev_to_node(dev);
+       else
+               nid = NUMA_NO_NODE;
+
+       ttm_pool_init(&bdev->pool, dev, nid, use_dma_alloc, use_dma32);
 
        bdev->vma_manager = vma_manager;
        spin_lock_init(&bdev->lru_lock);
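ttm_device_init() may be handed a NULL struct device, and dev_to_node(NULL) would dereference it; the fix degrades gracefully to NUMA_NO_NODE, i.e. no node affinity for the pool. As a standalone guard:

        #include <linux/numa.h>

        /* Sketch: derive a NUMA node only when a device is actually present. */
        static int pool_nid(struct device *dev)
        {
                return dev ? dev_to_node(dev) : NUMA_NO_NODE;
        }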
index b62f420a9f969d61e09e1cde82f2e49b9415d8aa..112438d965ffbefd4fa2cce5f246cc03a63759f9 100644 (file)
@@ -387,7 +387,7 @@ static void ttm_pool_free_range(struct ttm_pool *pool, struct ttm_tt *tt,
                                enum ttm_caching caching,
                                pgoff_t start_page, pgoff_t end_page)
 {
-       struct page **pages = tt->pages;
+       struct page **pages = &tt->pages[start_page];
        unsigned int order;
        pgoff_t i, nr;
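Before this one-liner, ttm_pool_free_range() always started walking at tt->pages[0] even when asked to free the window [start_page, end_page), so partial frees released the wrong pages. Anchoring the cursor at &tt->pages[start_page] scopes the loop to the requested range. In miniature (the free call is a stand-in for the pool path):

        struct page **pages = &tt->pages[start_page];
        pgoff_t nr = end_page - start_page;

        for (pgoff_t i = 0; i < nr; i++)
                __free_page(pages[i]);          /* stand-in for the pool free */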
 
index fcff41dd2315b710dc9de6ccdb361922c61d2602..88f63d526b22365b42b90e90d5b451a56e3fda52 100644 (file)
@@ -147,6 +147,13 @@ v3d_job_allocate(void **container, size_t size)
        return 0;
 }
 
+static void
+v3d_job_deallocate(void **container)
+{
+       kfree(*container);
+       *container = NULL;
+}
+
 static int
 v3d_job_init(struct v3d_dev *v3d, struct drm_file *file_priv,
             struct v3d_job *job, void (*free)(struct kref *ref),
@@ -273,8 +280,10 @@ v3d_setup_csd_jobs_and_bos(struct drm_file *file_priv,
 
        ret = v3d_job_init(v3d, file_priv, &(*job)->base,
                           v3d_job_free, args->in_sync, se, V3D_CSD);
-       if (ret)
+       if (ret) {
+               v3d_job_deallocate((void *)job);
                return ret;
+       }
 
        ret = v3d_job_allocate((void *)clean_job, sizeof(**clean_job));
        if (ret)
@@ -282,8 +291,10 @@ v3d_setup_csd_jobs_and_bos(struct drm_file *file_priv,
 
        ret = v3d_job_init(v3d, file_priv, *clean_job,
                           v3d_job_free, 0, NULL, V3D_CACHE_CLEAN);
-       if (ret)
+       if (ret) {
+               v3d_job_deallocate((void *)clean_job);
                return ret;
+       }
 
        (*job)->args = *args;
 
@@ -860,8 +871,10 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
 
        ret = v3d_job_init(v3d, file_priv, &render->base,
                           v3d_render_job_free, args->in_sync_rcl, &se, V3D_RENDER);
-       if (ret)
+       if (ret) {
+               v3d_job_deallocate((void *)&render);
                goto fail;
+       }
 
        render->start = args->rcl_start;
        render->end = args->rcl_end;
@@ -874,8 +887,10 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
 
                ret = v3d_job_init(v3d, file_priv, &bin->base,
                                   v3d_job_free, args->in_sync_bcl, &se, V3D_BIN);
-               if (ret)
+               if (ret) {
+                       v3d_job_deallocate((void *)&bin);
                        goto fail;
+               }
 
                bin->start = args->bcl_start;
                bin->end = args->bcl_end;
@@ -892,8 +907,10 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
 
                ret = v3d_job_init(v3d, file_priv, clean_job,
                                   v3d_job_free, 0, NULL, V3D_CACHE_CLEAN);
-               if (ret)
+               if (ret) {
+                       v3d_job_deallocate((void *)&clean_job);
                        goto fail;
+               }
 
                last_job = clean_job;
        } else {
@@ -1015,8 +1032,10 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
 
        ret = v3d_job_init(v3d, file_priv, &job->base,
                           v3d_job_free, args->in_sync, &se, V3D_TFU);
-       if (ret)
+       if (ret) {
+               v3d_job_deallocate((void *)&job);
                goto fail;
+       }
 
        job->base.bo = kcalloc(ARRAY_SIZE(args->bo_handles),
                               sizeof(*job->base.bo), GFP_KERNEL);
@@ -1233,8 +1252,10 @@ v3d_submit_cpu_ioctl(struct drm_device *dev, void *data,
 
        ret = v3d_job_init(v3d, file_priv, &cpu_job->base,
                           v3d_job_free, 0, &se, V3D_CPU);
-       if (ret)
+       if (ret) {
+               v3d_job_deallocate((void *)&cpu_job);
                goto fail;
+       }
 
        clean_job = cpu_job->indirect_csd.clean_job;
        csd_job = cpu_job->indirect_csd.job;
index f8e9abe647b927b211abb4bbc0751ea318d80369..9539aa28937fa4cf71fbcd8e252749607617d966 100644 (file)
@@ -94,6 +94,7 @@ static int virtio_gpu_probe(struct virtio_device *vdev)
                        goto err_free;
        }
 
+       dma_set_max_seg_size(dev->dev, dma_max_mapping_size(dev->dev) ?: UINT_MAX);
        ret = virtio_gpu_init(vdev, dev);
        if (ret)
                goto err_free;
index 3062e0e0d467ee0737f0fbf63d3826c9f4013778..79ba98a169f907cc18dcd63c07e9570c623c1608 100644 (file)
@@ -50,8 +50,8 @@
 
 #define HOST2GUC_SELF_CFG_REQUEST_MSG_LEN              (GUC_HXG_REQUEST_MSG_MIN_LEN + 3u)
 #define HOST2GUC_SELF_CFG_REQUEST_MSG_0_MBZ            GUC_HXG_REQUEST_MSG_0_DATA0
-#define HOST2GUC_SELF_CFG_REQUEST_MSG_1_KLV_KEY                (0xffff << 16)
-#define HOST2GUC_SELF_CFG_REQUEST_MSG_1_KLV_LEN                (0xffff << 0)
+#define HOST2GUC_SELF_CFG_REQUEST_MSG_1_KLV_KEY                (0xffffu << 16)
+#define HOST2GUC_SELF_CFG_REQUEST_MSG_1_KLV_LEN                (0xffffu << 0)
 #define HOST2GUC_SELF_CFG_REQUEST_MSG_2_VALUE32                GUC_HXG_REQUEST_MSG_n_DATAn
 #define HOST2GUC_SELF_CFG_REQUEST_MSG_3_VALUE64                GUC_HXG_REQUEST_MSG_n_DATAn
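This and the following GuC ABI header hunks append a `u` suffix to every shifted mask constant. Without it the shifts happen in signed int: `0xffff << 16` overflows into the sign bit (formally undefined; in practice it yields a negative value that sign-extends when widened to u64), and `0x1 << 31` is undefined behavior outright. A small userspace demonstration of the widening difference:

        #include <stdint.h>
        #include <stdio.h>

        int main(void)
        {
                uint64_t bad  = (uint64_t)(0xffff << 16);  /* typically 0xffffffffffff0000 */
                uint64_t good = (uint64_t)(0xffffu << 16); /* always    0x00000000ffff0000 */

                printf("%llx vs %llx\n",
                       (unsigned long long)bad, (unsigned long long)good);
                return 0;
        }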
 
index 811add10c30dc21a357841dccd10e6583468978c..c165e26c097669b72e6cfa7f97a7ee5bdada90ff 100644 (file)
@@ -242,8 +242,8 @@ struct slpc_shared_data {
                (HOST2GUC_PC_SLPC_REQUEST_REQUEST_MSG_MIN_LEN + \
                        HOST2GUC_PC_SLPC_EVENT_MAX_INPUT_ARGS)
 #define HOST2GUC_PC_SLPC_REQUEST_MSG_0_MBZ             GUC_HXG_REQUEST_MSG_0_DATA0
-#define HOST2GUC_PC_SLPC_REQUEST_MSG_1_EVENT_ID                (0xff << 8)
-#define HOST2GUC_PC_SLPC_REQUEST_MSG_1_EVENT_ARGC      (0xff << 0)
+#define HOST2GUC_PC_SLPC_REQUEST_MSG_1_EVENT_ID                (0xffu << 8)
+#define HOST2GUC_PC_SLPC_REQUEST_MSG_1_EVENT_ARGC      (0xffu << 0)
 #define HOST2GUC_PC_SLPC_REQUEST_MSG_N_EVENT_DATA_N    GUC_HXG_REQUEST_MSG_n_DATAn
 
 #endif
index 3b83f907ece46165c5bc11f93a3077e6dafd2edf..0b1146d0c997a216c589bb21d86d91f4d0f6841c 100644 (file)
@@ -82,11 +82,11 @@ static_assert(sizeof(struct guc_ct_buffer_desc) == 64);
 #define GUC_CTB_HDR_LEN                                1u
 #define GUC_CTB_MSG_MIN_LEN                    GUC_CTB_HDR_LEN
 #define GUC_CTB_MSG_MAX_LEN                    256u
-#define GUC_CTB_MSG_0_FENCE                    (0xffff << 16)
-#define GUC_CTB_MSG_0_FORMAT                   (0xf << 12)
+#define GUC_CTB_MSG_0_FENCE                    (0xffffu << 16)
+#define GUC_CTB_MSG_0_FORMAT                   (0xfu << 12)
 #define   GUC_CTB_FORMAT_HXG                   0u
-#define GUC_CTB_MSG_0_RESERVED                 (0xf << 8)
-#define GUC_CTB_MSG_0_NUM_DWORDS               (0xff << 0)
+#define GUC_CTB_MSG_0_RESERVED                 (0xfu << 8)
+#define GUC_CTB_MSG_0_NUM_DWORDS               (0xffu << 0)
 
 /**
  * DOC: CTB HXG Message
index 47094b9b044cbbcdd68f51b3cacd6f15e4d97b3c..0400bc0fccdc9b5d5605dafd5f5480ff3983319c 100644 (file)
@@ -31,9 +31,9 @@
  */
 
 #define GUC_KLV_LEN_MIN                                1u
-#define GUC_KLV_0_KEY                          (0xffff << 16)
-#define GUC_KLV_0_LEN                          (0xffff << 0)
-#define GUC_KLV_n_VALUE                                (0xffffffff << 0)
+#define GUC_KLV_0_KEY                          (0xffffu << 16)
+#define GUC_KLV_0_LEN                          (0xffffu << 0)
+#define GUC_KLV_n_VALUE                                (0xffffffffu << 0)
 
 /**
  * DOC: GuC Self Config KLVs
index 3d199016cf881cea10668a010fce5e8b4ea234c1..29e414c82d56cb5a318686e18d3d774c7b9c3d0c 100644 (file)
  */
 
 #define GUC_HXG_MSG_MIN_LEN                    1u
-#define GUC_HXG_MSG_0_ORIGIN                   (0x1 << 31)
+#define GUC_HXG_MSG_0_ORIGIN                   (0x1u << 31)
 #define   GUC_HXG_ORIGIN_HOST                  0u
 #define   GUC_HXG_ORIGIN_GUC                   1u
-#define GUC_HXG_MSG_0_TYPE                     (0x7 << 28)
+#define GUC_HXG_MSG_0_TYPE                     (0x7u << 28)
 #define   GUC_HXG_TYPE_REQUEST                 0u
 #define   GUC_HXG_TYPE_EVENT                   1u
 #define   GUC_HXG_TYPE_NO_RESPONSE_BUSY                3u
 #define   GUC_HXG_TYPE_NO_RESPONSE_RETRY       5u
 #define   GUC_HXG_TYPE_RESPONSE_FAILURE                6u
 #define   GUC_HXG_TYPE_RESPONSE_SUCCESS                7u
-#define GUC_HXG_MSG_0_AUX                      (0xfffffff << 0)
-#define GUC_HXG_MSG_n_PAYLOAD                  (0xffffffff << 0)
+#define GUC_HXG_MSG_0_AUX                      (0xfffffffu << 0)
+#define GUC_HXG_MSG_n_PAYLOAD                  (0xffffffffu << 0)
 
 /**
  * DOC: HXG Request
@@ -85,8 +85,8 @@
  */
 
 #define GUC_HXG_REQUEST_MSG_MIN_LEN            GUC_HXG_MSG_MIN_LEN
-#define GUC_HXG_REQUEST_MSG_0_DATA0            (0xfff << 16)
-#define GUC_HXG_REQUEST_MSG_0_ACTION           (0xffff << 0)
+#define GUC_HXG_REQUEST_MSG_0_DATA0            (0xfffu << 16)
+#define GUC_HXG_REQUEST_MSG_0_ACTION           (0xffffu << 0)
 #define GUC_HXG_REQUEST_MSG_n_DATAn            GUC_HXG_MSG_n_PAYLOAD
 
 /**
  */
 
 #define GUC_HXG_EVENT_MSG_MIN_LEN              GUC_HXG_MSG_MIN_LEN
-#define GUC_HXG_EVENT_MSG_0_DATA0              (0xfff << 16)
-#define GUC_HXG_EVENT_MSG_0_ACTION             (0xffff << 0)
+#define GUC_HXG_EVENT_MSG_0_DATA0              (0xfffu << 16)
+#define GUC_HXG_EVENT_MSG_0_ACTION             (0xffffu << 0)
 #define GUC_HXG_EVENT_MSG_n_DATAn              GUC_HXG_MSG_n_PAYLOAD
 
 /**
  */
 
 #define GUC_HXG_FAILURE_MSG_LEN                        GUC_HXG_MSG_MIN_LEN
-#define GUC_HXG_FAILURE_MSG_0_HINT             (0xfff << 16)
-#define GUC_HXG_FAILURE_MSG_0_ERROR            (0xffff << 0)
+#define GUC_HXG_FAILURE_MSG_0_HINT             (0xfffu << 16)
+#define GUC_HXG_FAILURE_MSG_0_ERROR            (0xffffu << 0)
 
 /**
  * DOC: HXG Response
index 5f19550cc845360ada430477180405fe66bf84b9..777c20ceabab12f04f3f2062df3524ef1c2c0923 100644 (file)
@@ -10,7 +10,7 @@
 
 #include "xe_bo.h"
 
-#define i915_gem_object_is_shmem(obj) ((obj)->flags & XE_BO_CREATE_SYSTEM_BIT)
+#define i915_gem_object_is_shmem(obj) (0) /* We don't use shmem */
 
 static inline dma_addr_t i915_gem_object_get_dma_address(const struct xe_bo *bo, pgoff_t n)
 {
@@ -35,12 +35,10 @@ static inline int i915_gem_object_read_from_page(struct xe_bo *bo,
                                          u32 ofs, u64 *ptr, u32 size)
 {
        struct ttm_bo_kmap_obj map;
-       void *virtual;
+       void *src;
        bool is_iomem;
        int ret;
 
-       XE_WARN_ON(size != 8);
-
        ret = xe_bo_lock(bo, true);
        if (ret)
                return ret;
@@ -50,11 +48,12 @@ static inline int i915_gem_object_read_from_page(struct xe_bo *bo,
                goto out_unlock;
 
        ofs &= ~PAGE_MASK;
-       virtual = ttm_kmap_obj_virtual(&map, &is_iomem);
+       src = ttm_kmap_obj_virtual(&map, &is_iomem);
+       src += ofs;
        if (is_iomem)
-               *ptr = readq((void __iomem *)(virtual + ofs));
+               memcpy_fromio(ptr, (void __iomem *)src, size);
        else
-               *ptr = *(u64 *)(virtual + ofs);
+               memcpy(ptr, src, size);
 
        ttm_bo_kunmap(&map);
 out_unlock:
index a6523df0f1d39fbe7f0354d404f95886a3d56424..c347e2c29f81f133c7766a5089811edb046b0085 100644 (file)
@@ -114,21 +114,21 @@ static void test_copy(struct xe_migrate *m, struct xe_bo *bo,
                                                   region |
                                                   XE_BO_NEEDS_CPU_ACCESS);
        if (IS_ERR(remote)) {
-               KUNIT_FAIL(test, "Failed to allocate remote bo for %s: %li\n",
-                          str, PTR_ERR(remote));
+               KUNIT_FAIL(test, "Failed to allocate remote bo for %s: %pe\n",
+                          str, remote);
                return;
        }
 
        err = xe_bo_validate(remote, NULL, false);
        if (err) {
-               KUNIT_FAIL(test, "Failed to validate system bo for %s: %li\n",
+               KUNIT_FAIL(test, "Failed to validate system bo for %s: %i\n",
                           str, err);
                goto out_unlock;
        }
 
        err = xe_bo_vmap(remote);
        if (err) {
-               KUNIT_FAIL(test, "Failed to vmap system bo for %s: %li\n",
+               KUNIT_FAIL(test, "Failed to vmap system bo for %s: %i\n",
                           str, err);
                goto out_unlock;
        }
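The KUnit message fixes above pair each value with the right specifier: an ERR_PTR-encoded pointer prints symbolically via the kernel's %pe extension (e.g. "-ENOMEM"), and a plain int error code uses %i rather than %li. Typical %pe usage:

        #include <linux/err.h>
        #include <linux/printk.h>

        /* Sketch: symbolic error logging with the kernel's %pe extension. */
        static void report(void *obj)
        {
                if (IS_ERR(obj))
                        pr_err("allocation failed: %pe\n", obj);  /* logs "-ENOMEM" */
        }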
index ef56bd517b28c2604b1ed8c5f494a839217d25da..421b819fd4ba9a182d1dcbb7b364ad9a144477cb 100644 (file)
@@ -21,4 +21,5 @@ kunit_test_suite(xe_mocs_test_suite);
 
 MODULE_AUTHOR("Intel Corporation");
 MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("xe_mocs kunit test");
 MODULE_IMPORT_NS(EXPORTED_FOR_KUNIT_TESTING);
index a53c22a1958247cbd703264aeb49195620b6754a..b4715b78ef3bf952bacd5ed7e1739c2fe0cfa813 100644 (file)
@@ -74,9 +74,6 @@ static const struct platform_test_case cases[] = {
        SUBPLATFORM_CASE(DG2, G11, B1),
        SUBPLATFORM_CASE(DG2, G12, A0),
        SUBPLATFORM_CASE(DG2, G12, A1),
-       PLATFORM_CASE(PVC, B0),
-       PLATFORM_CASE(PVC, B1),
-       PLATFORM_CASE(PVC, C0),
        GMDID_CASE(METEORLAKE, 1270, A0, 1300, A0),
        GMDID_CASE(METEORLAKE, 1271, A0, 1300, A0),
        GMDID_CASE(LUNARLAKE, 2004, A0, 2000, A0),
index 0b0e262e2166d69da1063915fa4c6eeedfd38bd6..4d3b80ec906d0a6f44793df496ef776a90d84596 100644 (file)
 #include "xe_ttm_stolen_mgr.h"
 #include "xe_vm.h"
 
+const char *const xe_mem_type_to_name[TTM_NUM_MEM_TYPES]  = {
+       [XE_PL_SYSTEM] = "system",
+       [XE_PL_TT] = "gtt",
+       [XE_PL_VRAM0] = "vram0",
+       [XE_PL_VRAM1] = "vram1",
+       [XE_PL_STOLEN] = "stolen"
+};
+
 static const struct ttm_place sys_placement_flags = {
        .fpfn = 0,
        .lpfn = 0,
@@ -713,8 +721,7 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
                migrate = xe->tiles[0].migrate;
 
        xe_assert(xe, migrate);
-
-       trace_xe_bo_move(bo);
+       trace_xe_bo_move(bo, new_mem->mem_type, old_mem_type, move_lacks_source);
        xe_device_mem_access_get(xe);
 
        if (xe_bo_is_pinned(bo) && !xe_bo_is_user(bo)) {
index 9b1279aca1272cd69eab6d1121ed651b83210166..8be42ac6cd07023c520988cfff2cf3599de4859f 100644 (file)
@@ -243,6 +243,7 @@ int xe_bo_evict_pinned(struct xe_bo *bo);
 int xe_bo_restore_pinned(struct xe_bo *bo);
 
 extern struct ttm_device_funcs xe_ttm_funcs;
+extern const char *const xe_mem_type_to_name[];
 
 int xe_gem_create_ioctl(struct drm_device *dev, void *data,
                        struct drm_file *file);
index b8d8da5466708c6903ccb3ade3852a7ac9911235..5176c27e4b6a4c59739f5e456f79ca7d8a77ce94 100644 (file)
@@ -83,9 +83,6 @@ static int xe_file_open(struct drm_device *dev, struct drm_file *file)
        return 0;
 }
 
-static void device_kill_persistent_exec_queues(struct xe_device *xe,
-                                              struct xe_file *xef);
-
 static void xe_file_close(struct drm_device *dev, struct drm_file *file)
 {
        struct xe_device *xe = to_xe_device(dev);
@@ -102,8 +99,6 @@ static void xe_file_close(struct drm_device *dev, struct drm_file *file)
        mutex_unlock(&xef->exec_queue.lock);
        xa_destroy(&xef->exec_queue.xa);
        mutex_destroy(&xef->exec_queue.lock);
-       device_kill_persistent_exec_queues(xe, xef);
-
        mutex_lock(&xef->vm.lock);
        xa_for_each(&xef->vm.xa, idx, vm)
                xe_vm_close_and_put(vm);
@@ -255,9 +250,6 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
                        xa_erase(&xe->usm.asid_to_vm, asid);
        }
 
-       drmm_mutex_init(&xe->drm, &xe->persistent_engines.lock);
-       INIT_LIST_HEAD(&xe->persistent_engines.list);
-
        spin_lock_init(&xe->pinned.lock);
        INIT_LIST_HEAD(&xe->pinned.kernel_bo_present);
        INIT_LIST_HEAD(&xe->pinned.external_vram);
@@ -570,37 +562,6 @@ void xe_device_shutdown(struct xe_device *xe)
 {
 }
 
-void xe_device_add_persistent_exec_queues(struct xe_device *xe, struct xe_exec_queue *q)
-{
-       mutex_lock(&xe->persistent_engines.lock);
-       list_add_tail(&q->persistent.link, &xe->persistent_engines.list);
-       mutex_unlock(&xe->persistent_engines.lock);
-}
-
-void xe_device_remove_persistent_exec_queues(struct xe_device *xe,
-                                            struct xe_exec_queue *q)
-{
-       mutex_lock(&xe->persistent_engines.lock);
-       if (!list_empty(&q->persistent.link))
-               list_del(&q->persistent.link);
-       mutex_unlock(&xe->persistent_engines.lock);
-}
-
-static void device_kill_persistent_exec_queues(struct xe_device *xe,
-                                              struct xe_file *xef)
-{
-       struct xe_exec_queue *q, *next;
-
-       mutex_lock(&xe->persistent_engines.lock);
-       list_for_each_entry_safe(q, next, &xe->persistent_engines.list,
-                                persistent.link)
-               if (q->persistent.xef == xef) {
-                       xe_exec_queue_kill(q);
-                       list_del_init(&q->persistent.link);
-               }
-       mutex_unlock(&xe->persistent_engines.lock);
-}
-
 void xe_device_wmb(struct xe_device *xe)
 {
        struct xe_gt *gt = xe_root_mmio_gt(xe);
@@ -613,7 +574,7 @@ void xe_device_wmb(struct xe_device *xe)
 u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size)
 {
        return xe_device_has_flat_ccs(xe) ?
-               DIV_ROUND_UP(size, NUM_BYTES_PER_CCS_BYTE(xe)) : 0;
+               DIV_ROUND_UP_ULL(size, NUM_BYTES_PER_CCS_BYTE(xe)) : 0;
 }
 
 bool xe_device_mem_access_ongoing(struct xe_device *xe)
index 3da83b2332063882afcaffb3f204410fa848de9d..08d8b72c77319a74bc34562c92ec0aab0195be42 100644 (file)
@@ -42,10 +42,6 @@ int xe_device_probe(struct xe_device *xe);
 void xe_device_remove(struct xe_device *xe);
 void xe_device_shutdown(struct xe_device *xe);
 
-void xe_device_add_persistent_exec_queues(struct xe_device *xe, struct xe_exec_queue *q);
-void xe_device_remove_persistent_exec_queues(struct xe_device *xe,
-                                            struct xe_exec_queue *q);
-
 void xe_device_wmb(struct xe_device *xe);
 
 static inline struct xe_file *to_xe_file(const struct drm_file *file)
index 5dc9127a20293e1ebb56c3684e2fdb7e6f425b43..e8491979a6f21810cf4c480af08e9b2b6abfd4ee 100644 (file)
@@ -341,14 +341,6 @@ struct xe_device {
                struct mutex lock;
        } usm;
 
-       /** @persistent_engines: engines that are closed but still running */
-       struct {
-               /** @lock: protects persistent engines */
-               struct mutex lock;
-               /** @list: list of persistent engines */
-               struct list_head list;
-       } persistent_engines;
-
        /** @pinned: pinned BO state */
        struct {
                /** @lock: protected pinned BO list state */
index 74391d9b11ae0e4cc77ecf9d0f5f47264e7adec6..e4db069f0db3f1fd27ed80eb84fc4544ea0831df 100644 (file)
@@ -134,8 +134,6 @@ static void xe_display_fini_nommio(struct drm_device *dev, void *dummy)
 
 int xe_display_init_nommio(struct xe_device *xe)
 {
-       int err;
-
        if (!xe->info.enable_display)
                return 0;
 
@@ -145,10 +143,6 @@ int xe_display_init_nommio(struct xe_device *xe)
        /* This must be called before any calls to HAS_PCH_* */
        intel_detect_pch(xe);
 
-       err = intel_power_domains_init(xe);
-       if (err)
-               return err;
-
        return drmm_add_action_or_reset(&xe->drm, xe_display_fini_nommio, xe);
 }
 
index 64ed303728fda98d2c4edb2a4a0e1f0b810812d2..da2627ed6ae7a94114ec4e1d0aa6f04495103540 100644 (file)
@@ -175,7 +175,7 @@ static int xe_dma_buf_begin_cpu_access(struct dma_buf *dma_buf,
        return 0;
 }
 
-const struct dma_buf_ops xe_dmabuf_ops = {
+static const struct dma_buf_ops xe_dmabuf_ops = {
        .attach = xe_dma_buf_attach,
        .detach = xe_dma_buf_detach,
        .pin = xe_dma_buf_pin,
index 82d1305e831f298f013338e4f7ee9e6e2ea67168..6040e4d22b2809c10385fadfbd6f4d8b6fdd0b28 100644 (file)
@@ -131,14 +131,6 @@ static void bo_meminfo(struct xe_bo *bo,
 
 static void show_meminfo(struct drm_printer *p, struct drm_file *file)
 {
-       static const char *const mem_type_to_name[TTM_NUM_MEM_TYPES]  = {
-               [XE_PL_SYSTEM] = "system",
-               [XE_PL_TT] = "gtt",
-               [XE_PL_VRAM0] = "vram0",
-               [XE_PL_VRAM1] = "vram1",
-               [4 ... 6] = NULL,
-               [XE_PL_STOLEN] = "stolen"
-       };
        struct drm_memory_stats stats[TTM_NUM_MEM_TYPES] = {};
        struct xe_file *xef = file->driver_priv;
        struct ttm_device *bdev = &xef->xe->ttm;
@@ -171,7 +163,7 @@ static void show_meminfo(struct drm_printer *p, struct drm_file *file)
        spin_unlock(&client->bos_lock);
 
        for (mem_type = XE_PL_SYSTEM; mem_type < TTM_NUM_MEM_TYPES; ++mem_type) {
-               if (!mem_type_to_name[mem_type])
+               if (!xe_mem_type_to_name[mem_type])
                        continue;
 
                man = ttm_manager_type(bdev, mem_type);
@@ -182,7 +174,7 @@ static void show_meminfo(struct drm_printer *p, struct drm_file *file)
                                               DRM_GEM_OBJECT_RESIDENT |
                                               (mem_type != XE_PL_SYSTEM ? 0 :
                                               DRM_GEM_OBJECT_PURGEABLE),
-                                              mem_type_to_name[mem_type]);
+                                              xe_mem_type_to_name[mem_type]);
                }
        }
 }
index b853feed9ccc15eefab7f0ccdf070096521e6015..17f26952e6656b8a077eb51161acbfd96638db2c 100644 (file)
@@ -111,7 +111,7 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
        u64 addresses[XE_HW_ENGINE_MAX_INSTANCE];
        struct drm_gpuvm_exec vm_exec = {.extra.fn = xe_exec_fn};
        struct drm_exec *exec = &vm_exec.exec;
-       u32 i, num_syncs = 0;
+       u32 i, num_syncs = 0, num_ufence = 0;
        struct xe_sched_job *job;
        struct dma_fence *rebind_fence;
        struct xe_vm *vm;
@@ -157,6 +157,14 @@ int xe_exec_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
                                           SYNC_PARSE_FLAG_LR_MODE : 0));
                if (err)
                        goto err_syncs;
+
+               if (xe_sync_is_ufence(&syncs[i]))
+                       num_ufence++;
+       }
+
+       if (XE_IOCTL_DBG(xe, num_ufence > 1)) {
+               err = -EINVAL;
+               goto err_syncs;
        }
 
        if (xe_exec_queue_is_parallel(q)) {
index bcfc4127c7c59f0fffc8e40df70a5b1c8222495f..49223026c89fd5e3626be84a9687774d29b6bcb2 100644 (file)
@@ -60,7 +60,6 @@ static struct xe_exec_queue *__xe_exec_queue_create(struct xe_device *xe,
        q->fence_irq = &gt->fence_irq[hwe->class];
        q->ring_ops = gt->ring_ops[hwe->class];
        q->ops = gt->exec_queue_ops;
-       INIT_LIST_HEAD(&q->persistent.link);
        INIT_LIST_HEAD(&q->compute.link);
        INIT_LIST_HEAD(&q->multi_gt_link);
 
@@ -310,102 +309,6 @@ static int exec_queue_set_timeslice(struct xe_device *xe, struct xe_exec_queue *
        return q->ops->set_timeslice(q, value);
 }
 
-static int exec_queue_set_preemption_timeout(struct xe_device *xe,
-                                            struct xe_exec_queue *q, u64 value,
-                                            bool create)
-{
-       u32 min = 0, max = 0;
-
-       xe_exec_queue_get_prop_minmax(q->hwe->eclass,
-                                     XE_EXEC_QUEUE_PREEMPT_TIMEOUT, &min, &max);
-
-       if (xe_exec_queue_enforce_schedule_limit() &&
-           !xe_hw_engine_timeout_in_range(value, min, max))
-               return -EINVAL;
-
-       return q->ops->set_preempt_timeout(q, value);
-}
-
-static int exec_queue_set_persistence(struct xe_device *xe, struct xe_exec_queue *q,
-                                     u64 value, bool create)
-{
-       if (XE_IOCTL_DBG(xe, !create))
-               return -EINVAL;
-
-       if (XE_IOCTL_DBG(xe, xe_vm_in_preempt_fence_mode(q->vm)))
-               return -EINVAL;
-
-       if (value)
-               q->flags |= EXEC_QUEUE_FLAG_PERSISTENT;
-       else
-               q->flags &= ~EXEC_QUEUE_FLAG_PERSISTENT;
-
-       return 0;
-}
-
-static int exec_queue_set_job_timeout(struct xe_device *xe, struct xe_exec_queue *q,
-                                     u64 value, bool create)
-{
-       u32 min = 0, max = 0;
-
-       if (XE_IOCTL_DBG(xe, !create))
-               return -EINVAL;
-
-       xe_exec_queue_get_prop_minmax(q->hwe->eclass,
-                                     XE_EXEC_QUEUE_JOB_TIMEOUT, &min, &max);
-
-       if (xe_exec_queue_enforce_schedule_limit() &&
-           !xe_hw_engine_timeout_in_range(value, min, max))
-               return -EINVAL;
-
-       return q->ops->set_job_timeout(q, value);
-}
-
-static int exec_queue_set_acc_trigger(struct xe_device *xe, struct xe_exec_queue *q,
-                                     u64 value, bool create)
-{
-       if (XE_IOCTL_DBG(xe, !create))
-               return -EINVAL;
-
-       if (XE_IOCTL_DBG(xe, !xe->info.has_usm))
-               return -EINVAL;
-
-       q->usm.acc_trigger = value;
-
-       return 0;
-}
-
-static int exec_queue_set_acc_notify(struct xe_device *xe, struct xe_exec_queue *q,
-                                    u64 value, bool create)
-{
-       if (XE_IOCTL_DBG(xe, !create))
-               return -EINVAL;
-
-       if (XE_IOCTL_DBG(xe, !xe->info.has_usm))
-               return -EINVAL;
-
-       q->usm.acc_notify = value;
-
-       return 0;
-}
-
-static int exec_queue_set_acc_granularity(struct xe_device *xe, struct xe_exec_queue *q,
-                                         u64 value, bool create)
-{
-       if (XE_IOCTL_DBG(xe, !create))
-               return -EINVAL;
-
-       if (XE_IOCTL_DBG(xe, !xe->info.has_usm))
-               return -EINVAL;
-
-       if (value > DRM_XE_ACC_GRANULARITY_64M)
-               return -EINVAL;
-
-       q->usm.acc_granularity = value;
-
-       return 0;
-}
-
 typedef int (*xe_exec_queue_set_property_fn)(struct xe_device *xe,
                                             struct xe_exec_queue *q,
                                             u64 value, bool create);
@@ -413,12 +316,6 @@ typedef int (*xe_exec_queue_set_property_fn)(struct xe_device *xe,
 static const xe_exec_queue_set_property_fn exec_queue_set_property_funcs[] = {
        [DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY] = exec_queue_set_priority,
        [DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE] = exec_queue_set_timeslice,
-       [DRM_XE_EXEC_QUEUE_SET_PROPERTY_PREEMPTION_TIMEOUT] = exec_queue_set_preemption_timeout,
-       [DRM_XE_EXEC_QUEUE_SET_PROPERTY_PERSISTENCE] = exec_queue_set_persistence,
-       [DRM_XE_EXEC_QUEUE_SET_PROPERTY_JOB_TIMEOUT] = exec_queue_set_job_timeout,
-       [DRM_XE_EXEC_QUEUE_SET_PROPERTY_ACC_TRIGGER] = exec_queue_set_acc_trigger,
-       [DRM_XE_EXEC_QUEUE_SET_PROPERTY_ACC_NOTIFY] = exec_queue_set_acc_notify,
-       [DRM_XE_EXEC_QUEUE_SET_PROPERTY_ACC_GRANULARITY] = exec_queue_set_acc_granularity,
 };
 
 static int exec_queue_user_ext_set_property(struct xe_device *xe,
@@ -437,10 +334,15 @@ static int exec_queue_user_ext_set_property(struct xe_device *xe,
 
        if (XE_IOCTL_DBG(xe, ext.property >=
                         ARRAY_SIZE(exec_queue_set_property_funcs)) ||
-           XE_IOCTL_DBG(xe, ext.pad))
+           XE_IOCTL_DBG(xe, ext.pad) ||
+           XE_IOCTL_DBG(xe, ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY &&
+                        ext.property != DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE))
                return -EINVAL;
 
        idx = array_index_nospec(ext.property, ARRAY_SIZE(exec_queue_set_property_funcs));
+       if (!exec_queue_set_property_funcs[idx])
+               return -EINVAL;
+
        return exec_queue_set_property_funcs[idx](xe, q, ext.value, create);
 }
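
The hunk above guards the now-sparse property table three ways: a bounds
check, array_index_nospec() to clamp the index even under branch
misprediction (Spectre v1), and a NULL check for the removed entries. A
minimal sketch of the same pattern, with hypothetical names (PROP_*,
set_priority() and set_timeslice() are illustrative only, not from this
patch):

	#include <linux/kernel.h>
	#include <linux/nospec.h>

	enum { PROP_PRIORITY, PROP_TIMESLICE, PROP_COUNT };

	typedef int (*prop_fn)(u64 value);

	static const prop_fn prop_funcs[] = {
		[PROP_PRIORITY]  = set_priority,
		[PROP_TIMESLICE] = set_timeslice,
		/* removed properties leave NULL holes in the table */
	};

	static int set_property(u32 property, u64 value)
	{
		if (property >= ARRAY_SIZE(prop_funcs))
			return -EINVAL;

		/* Clamp the index so speculation cannot read past the table. */
		property = array_index_nospec(property, ARRAY_SIZE(prop_funcs));
		if (!prop_funcs[property])
			return -EINVAL;

		return prop_funcs[property](value);
	}
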
 
@@ -704,9 +606,7 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
                }
 
                q = xe_exec_queue_create(xe, vm, logical_mask,
-                                        args->width, hwe,
-                                        xe_vm_in_lr_mode(vm) ? 0 :
-                                        EXEC_QUEUE_FLAG_PERSISTENT);
+                                        args->width, hwe, 0);
                up_read(&vm->lock);
                xe_vm_put(vm);
                if (IS_ERR(q))
@@ -728,8 +628,6 @@ int xe_exec_queue_create_ioctl(struct drm_device *dev, void *data,
                        goto kill_exec_queue;
        }
 
-       q->persistent.xef = xef;
-
        mutex_lock(&xef->exec_queue.lock);
        err = xa_alloc(&xef->exec_queue.xa, &id, q, xa_limit_32b, GFP_KERNEL);
        mutex_unlock(&xef->exec_queue.lock);
@@ -872,10 +770,7 @@ int xe_exec_queue_destroy_ioctl(struct drm_device *dev, void *data,
        if (XE_IOCTL_DBG(xe, !q))
                return -ENOENT;
 
-       if (!(q->flags & EXEC_QUEUE_FLAG_PERSISTENT))
-               xe_exec_queue_kill(q);
-       else
-               xe_device_add_persistent_exec_queues(xe, q);
+       xe_exec_queue_kill(q);
 
        trace_xe_exec_queue_close(q);
        xe_exec_queue_put(q);
@@ -926,20 +821,24 @@ void xe_exec_queue_last_fence_put_unlocked(struct xe_exec_queue *q)
  * @q: The exec queue
  * @vm: The VM the engine does a bind or exec for
  *
- * Get last fence, does not take a ref
+ * Get last fence, takes a ref
  *
  * Returns: last fence if not signaled, dma fence stub if signaled
  */
 struct dma_fence *xe_exec_queue_last_fence_get(struct xe_exec_queue *q,
                                               struct xe_vm *vm)
 {
+       struct dma_fence *fence;
+
        xe_exec_queue_last_fence_lockdep_assert(q, vm);
 
        if (q->last_fence &&
            test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &q->last_fence->flags))
                xe_exec_queue_last_fence_put(q, vm);
 
-       return q->last_fence ? q->last_fence : dma_fence_get_stub();
+       fence = q->last_fence ? q->last_fence : dma_fence_get_stub();
+       dma_fence_get(fence);
+       return fence;
 }
 
 /**
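
With the change above, xe_exec_queue_last_fence_get() returns a referenced
fence, so every caller owns a reference and must drop it. That is why the
no_in_syncs() hunk below adds dma_fence_put() on both exit paths, and why
the xe_sched_job and xe_sync hunks later in this patch delete their extra
dma_fence_get() calls (drm_sched_job_add_dependency() and the dma-fence
array consume the reference they are handed). A query-style caller now
looks like:

	struct dma_fence *fence;
	bool signaled;

	fence = xe_exec_queue_last_fence_get(q, vm);
	signaled = test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags);
	dma_fence_put(fence);	/* drop the reference on every path */
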
index 8d4b7feb8c306b8a406a46f74c5cad2a430bdef3..36f4901d8d7ee917215d745da900ea49b7616a78 100644 (file)
@@ -105,16 +105,6 @@ struct xe_exec_queue {
                struct xe_guc_exec_queue *guc;
        };
 
-       /**
-        * @persistent: persistent exec queue state
-        */
-       struct {
-               /** @xef: file which this exec queue belongs to */
-               struct xe_file *xef;
-               /** @link: link in list of persistent exec queues */
-               struct list_head link;
-       } persistent;
-
        union {
                /**
                 * @parallel: parallel submission state
@@ -160,16 +150,6 @@ struct xe_exec_queue {
                spinlock_t lock;
        } compute;
 
-       /** @usm: unified shared memory state */
-       struct {
-               /** @acc_trigger: access counter trigger */
-               u32 acc_trigger;
-               /** @acc_notify: access counter notify */
-               u32 acc_notify;
-               /** @acc_granularity: access counter granularity */
-               u32 acc_granularity;
-       } usm;
-
        /** @ops: submission backend exec queue operations */
        const struct xe_exec_queue_ops *ops;
 
index 96b5224eb4787d4c7abd2b65b56d0559724bd2c8..acb4d9f38fd738dd5a0e66607cb1bbdbe91311c2 100644 (file)
@@ -212,7 +212,7 @@ static void xe_execlist_port_wake_locked(struct xe_execlist_port *port,
 static void xe_execlist_make_active(struct xe_execlist_exec_queue *exl)
 {
        struct xe_execlist_port *port = exl->port;
-       enum xe_exec_queue_priority priority = exl->active_priority;
+       enum xe_exec_queue_priority priority = exl->q->sched_props.priority;
 
        XE_WARN_ON(priority == XE_EXEC_QUEUE_PRIORITY_UNSET);
        XE_WARN_ON(priority < 0);
@@ -378,8 +378,6 @@ static void execlist_exec_queue_fini_async(struct work_struct *w)
                list_del(&exl->active_link);
        spin_unlock_irqrestore(&exl->port->lock, flags);
 
-       if (q->flags & EXEC_QUEUE_FLAG_PERSISTENT)
-               xe_device_remove_persistent_exec_queues(xe, q);
        drm_sched_entity_fini(&exl->entity);
        drm_sched_fini(&exl->sched);
        kfree(exl);
index 3af2adec129561850bfb378c04ca2d7caacdf325..35474ddbaf97ecc974a6b55643e578dbcfe135f9 100644 (file)
@@ -437,7 +437,10 @@ static int all_fw_domain_init(struct xe_gt *gt)
                 * USM has its own SA pool so it does not block behind user operations
                 */
                if (gt_to_xe(gt)->info.has_usm) {
-                       gt->usm.bb_pool = xe_sa_bo_manager_init(gt_to_tile(gt), SZ_1M, 16);
+                       struct xe_device *xe = gt_to_xe(gt);
+
+                       gt->usm.bb_pool = xe_sa_bo_manager_init(gt_to_tile(gt),
+                                                               IS_DGFX(xe) ? SZ_1M : SZ_512K, 16);
                        if (IS_ERR(gt->usm.bb_pool)) {
                                err = PTR_ERR(gt->usm.bb_pool);
                                goto err_force_wake;
index 9358f733688969391e68f22a2658b08c993d296a..9fcae65b64699eadb80a82b06386588a8af07f86 100644 (file)
@@ -145,10 +145,10 @@ void xe_gt_idle_sysfs_init(struct xe_gt_idle *gtidle)
        }
 
        if (xe_gt_is_media_type(gt)) {
-               sprintf(gtidle->name, "gt%d-mc\n", gt->info.id);
+               sprintf(gtidle->name, "gt%d-mc", gt->info.id);
                gtidle->idle_residency = xe_guc_pc_mc6_residency;
        } else {
-               sprintf(gtidle->name, "gt%d-rc\n", gt->info.id);
+               sprintf(gtidle->name, "gt%d-rc", gt->info.id);
                gtidle->idle_residency = xe_guc_pc_rc6_residency;
        }
 
index 77925b35cf8dcb0ee1d62ba7c579767796c8d807..8546cd3cc50d1f8c4146b2f69c4758bac05aa240 100644 (file)
@@ -480,7 +480,7 @@ static bool xe_gt_mcr_get_nonterminated_steering(struct xe_gt *gt,
  * to synchronize with external clients (e.g., firmware), so a semaphore
  * register will also need to be taken.
  */
-static void mcr_lock(struct xe_gt *gt)
+static void mcr_lock(struct xe_gt *gt) __acquires(&gt->mcr_lock)
 {
        struct xe_device *xe = gt_to_xe(gt);
        int ret = 0;
@@ -500,7 +500,7 @@ static void mcr_lock(struct xe_gt *gt)
        drm_WARN_ON_ONCE(&xe->drm, ret == -ETIMEDOUT);
 }
 
-static void mcr_unlock(struct xe_gt *gt)
+static void mcr_unlock(struct xe_gt *gt) __releases(&gt->mcr_lock)
 {
        /* Release hardware semaphore - this is done by writing 1 to the register */
        if (GRAPHICS_VERx100(gt_to_xe(gt)) >= 1270)
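
__acquires() and __releases() are sparse context annotations: they expand
to nothing in a normal build and only tell sparse (make C=1) that these
asymmetric lock/unlock helpers change lock context, silencing "context
imbalance" warnings. Abridged from include/linux/compiler_types.h:

	#ifdef __CHECKER__
	# define __acquires(x)	__attribute__((context(x, 0, 1)))
	# define __releases(x)	__attribute__((context(x, 1, 0)))
	#else
	# define __acquires(x)
	# define __releases(x)
	#endif
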
index 59a70d2e0a7a33386fdcfca9cc158919aab1e32c..73f08f1924df2ea8d4aaabb87eceaa13eff81d78 100644 (file)
@@ -165,7 +165,8 @@ retry_userptr:
                goto unlock_vm;
        }
 
-       if (!xe_vma_is_userptr(vma) || !xe_vma_userptr_check_repin(vma)) {
+       if (!xe_vma_is_userptr(vma) ||
+           !xe_vma_userptr_check_repin(to_userptr_vma(vma))) {
                downgrade_write(&vm->lock);
                write_locked = false;
        }
@@ -181,11 +182,13 @@ retry_userptr:
        /* TODO: Validate fault */
 
        if (xe_vma_is_userptr(vma) && write_locked) {
+               struct xe_userptr_vma *uvma = to_userptr_vma(vma);
+
                spin_lock(&vm->userptr.invalidated_lock);
-               list_del_init(&vma->userptr.invalidate_link);
+               list_del_init(&uvma->userptr.invalidate_link);
                spin_unlock(&vm->userptr.invalidated_lock);
 
-               ret = xe_vma_userptr_pin_pages(vma);
+               ret = xe_vma_userptr_pin_pages(uvma);
                if (ret)
                        goto unlock_vm;
 
@@ -220,7 +223,7 @@ retry_userptr:
        dma_fence_put(fence);
 
        if (xe_vma_is_userptr(vma))
-               ret = xe_vma_userptr_check_repin(vma);
+               ret = xe_vma_userptr_check_repin(to_userptr_vma(vma));
        vma->usm.tile_invalidated &= ~BIT(tile->id);
 
 unlock_dma_resv:
@@ -332,7 +335,7 @@ int xe_guc_pagefault_handler(struct xe_guc *guc, u32 *msg, u32 len)
                return -EPROTO;
 
        asid = FIELD_GET(PFD_ASID, msg[1]);
-       pf_queue = &gt->usm.pf_queue[asid % NUM_PF_QUEUE];
+       pf_queue = gt->usm.pf_queue + (asid % NUM_PF_QUEUE);
 
        spin_lock_irqsave(&pf_queue->lock, flags);
        full = pf_queue_full(pf_queue);
index 7eef23a00d77ee679b011d8e4a0dc2b3ed1bb360..f4c485289dbe4d606e9022c5b58eec8e8123fdca 100644 (file)
@@ -247,6 +247,14 @@ int xe_gt_tlb_invalidation_vma(struct xe_gt *gt,
 
        xe_gt_assert(gt, vma);
 
+       /* Execlists not supported */
+       if (gt_to_xe(gt)->info.force_execlist) {
+               if (fence)
+                       __invalidation_fence_signal(fence);
+
+               return 0;
+       }
+
        action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
        action[len++] = 0; /* seqno, replaced in send_tlb_invalidation */
        if (!xe->info.has_range_tlb_invalidation) {
@@ -317,6 +325,10 @@ int xe_gt_tlb_invalidation_wait(struct xe_gt *gt, int seqno)
        struct drm_printer p = drm_err_printer(__func__);
        int ret;
 
+       /* Execlists not supported */
+       if (gt_to_xe(gt)->info.force_execlist)
+               return 0;
+
        /*
         * XXX: See above, this algorithm only works if seqnos are always in
         * order
index f71085228cb33992940622dca2992f4e1ae9fa62..d91702592520af54eea5f8ca4bd56b67719531be 100644 (file)
@@ -963,7 +963,9 @@ void xe_guc_pc_fini(struct xe_guc_pc *pc)
        struct xe_device *xe = pc_to_xe(pc);
 
        if (xe->info.skip_guc_pc) {
+               xe_device_mem_access_get(xe);
                xe_gt_idle_disable_c6(pc_to_gt(pc));
+               xe_device_mem_access_put(xe);
                return;
        }
 
index 54ffcfcdd41f9ce3c590f5814fcbe3d3535946ac..f22ae717b0b2d3d8ff938d83f9ea954b4d5746e4 100644 (file)
@@ -1028,8 +1028,6 @@ static void __guc_exec_queue_fini_async(struct work_struct *w)
 
        if (xe_exec_queue_is_lr(q))
                cancel_work_sync(&ge->lr_tdr);
-       if (q->flags & EXEC_QUEUE_FLAG_PERSISTENT)
-               xe_device_remove_persistent_exec_queues(gt_to_xe(q->gt), q);
        release_guc_id(guc, q);
        xe_sched_entity_fini(&ge->entity);
        xe_sched_fini(&ge->sched);
index a6094c81f2ad0fa8a3f1cf1001ceb897a044d3cb..a5de3e7b0bd6ab134557fdfb52a406d4bf199016 100644 (file)
@@ -217,13 +217,13 @@ struct xe_hw_fence *xe_hw_fence_create(struct xe_hw_fence_ctx *ctx,
        if (!fence)
                return ERR_PTR(-ENOMEM);
 
-       dma_fence_init(&fence->dma, &xe_hw_fence_ops, &ctx->irq->lock,
-                      ctx->dma_fence_ctx, ctx->next_seqno++);
-
        fence->ctx = ctx;
        fence->seqno_map = seqno_map;
        INIT_LIST_HEAD(&fence->irq_link);
 
+       dma_fence_init(&fence->dma, &xe_hw_fence_ops, &ctx->irq->lock,
+                      ctx->dma_fence_ctx, ctx->next_seqno++);
+
        trace_xe_hw_fence_create(fence);
 
        return fence;
index 6ef2aa1eae8b095e958e74fda4b5c42e205436bf..174ed2185481e32d568551e62f1839181a1771d4 100644 (file)
@@ -419,7 +419,7 @@ static int xe_hwmon_pcode_read_i1(struct xe_gt *gt, u32 *uval)
 
        return xe_pcode_read(gt, PCODE_MBOX(PCODE_POWER_SETUP,
                             POWER_SETUP_SUBCOMMAND_READ_I1, 0),
-                            uval, 0);
+                            uval, NULL);
 }
 
 static int xe_hwmon_pcode_write_i1(struct xe_gt *gt, u32 uval)
index b7fa3831b68451cb74ae557ca3e7a66d5d4fa6fd..b38319d2801e008f14fa4b5089cd1b9dc204f547 100644 (file)
 #include "xe_map.h"
 #include "xe_vm.h"
 
-#define CTX_VALID                              (1 << 0)
-#define CTX_PRIVILEGE                          (1 << 8)
-#define CTX_ADDRESSING_MODE_SHIFT              3
-#define LEGACY_64B_CONTEXT                     3
+#define LRC_VALID                              (1 << 0)
+#define LRC_PRIVILEGE                          (1 << 8)
+#define LRC_ADDRESSING_MODE_SHIFT              3
+#define LRC_LEGACY_64B_CONTEXT                 3
 
 #define ENGINE_CLASS_SHIFT                     61
 #define ENGINE_INSTANCE_SHIFT                  48
@@ -682,8 +682,6 @@ static void xe_lrc_set_ppgtt(struct xe_lrc *lrc, struct xe_vm *vm)
 
 #define PVC_CTX_ASID           (0x2e + 1)
 #define PVC_CTX_ACC_CTR_THOLD  (0x2a + 1)
-#define ACC_GRANULARITY_S       20
-#define ACC_NOTIFY_S            16
 
 int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
                struct xe_exec_queue *q, struct xe_vm *vm, u32 ring_size)
@@ -754,23 +752,17 @@ int xe_lrc_init(struct xe_lrc *lrc, struct xe_hw_engine *hwe,
        xe_lrc_write_ctx_reg(lrc, CTX_RING_CTL,
                             RING_CTL_SIZE(lrc->ring.size) | RING_VALID);
        if (xe->info.has_asid && vm)
-               xe_lrc_write_ctx_reg(lrc, PVC_CTX_ASID,
-                                    (q->usm.acc_granularity <<
-                                     ACC_GRANULARITY_S) | vm->usm.asid);
-       if (xe->info.has_usm && vm)
-               xe_lrc_write_ctx_reg(lrc, PVC_CTX_ACC_CTR_THOLD,
-                                    (q->usm.acc_notify << ACC_NOTIFY_S) |
-                                    q->usm.acc_trigger);
-
-       lrc->desc = CTX_VALID;
-       lrc->desc |= LEGACY_64B_CONTEXT << CTX_ADDRESSING_MODE_SHIFT;
+               xe_lrc_write_ctx_reg(lrc, PVC_CTX_ASID, vm->usm.asid);
+
+       lrc->desc = LRC_VALID;
+       lrc->desc |= LRC_LEGACY_64B_CONTEXT << LRC_ADDRESSING_MODE_SHIFT;
        /* TODO: Priority */
 
        /* While this appears to have something about privileged batches or
         * some such, it really just means PPGTT mode.
         */
        if (vm)
-               lrc->desc |= CTX_PRIVILEGE;
+               lrc->desc |= LRC_PRIVILEGE;
 
        if (GRAPHICS_VERx100(xe) < 1250) {
                lrc->desc |= (u64)hwe->instance << ENGINE_INSTANCE_SHIFT;
index e05e9e7282b68abdcab839a9134efd09e60750f2..70480c30560215ff7fece9a824fd01c92008562d 100644 (file)
@@ -170,11 +170,6 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
        if (!IS_DGFX(xe)) {
                /* Write out batch too */
                m->batch_base_ofs = NUM_PT_SLOTS * XE_PAGE_SIZE;
-               if (xe->info.has_usm) {
-                       batch = tile->primary_gt->usm.bb_pool->bo;
-                       m->usm_batch_base_ofs = m->batch_base_ofs;
-               }
-
                for (i = 0; i < batch->size;
                     i += vm->flags & XE_VM_FLAG_64K ? XE_64K_PAGE_SIZE :
                     XE_PAGE_SIZE) {
@@ -185,6 +180,24 @@ static int xe_migrate_prepare_vm(struct xe_tile *tile, struct xe_migrate *m,
                                  entry);
                        level++;
                }
+               if (xe->info.has_usm) {
+                       xe_tile_assert(tile, batch->size == SZ_1M);
+
+                       batch = tile->primary_gt->usm.bb_pool->bo;
+                       m->usm_batch_base_ofs = m->batch_base_ofs + SZ_1M;
+                       xe_tile_assert(tile, batch->size == SZ_512K);
+
+                       for (i = 0; i < batch->size;
+                            i += vm->flags & XE_VM_FLAG_64K ? XE_64K_PAGE_SIZE :
+                            XE_PAGE_SIZE) {
+                               entry = vm->pt_ops->pte_encode_bo(batch, i,
+                                                                 pat_index, 0);
+
+                               xe_map_wr(xe, &bo->vmap, map_ofs + level * 8, u64,
+                                         entry);
+                               level++;
+                       }
+               }
        } else {
                u64 batch_addr = xe_bo_addr(batch, 0, XE_PAGE_SIZE);
 
@@ -472,7 +485,7 @@ static void emit_pte(struct xe_migrate *m,
        /* Indirect access needs compression enabled uncached PAT index */
        if (GRAPHICS_VERx100(xe) >= 2000)
                pat_index = is_comp_pte ? xe->pat.idx[XE_CACHE_NONE_COMPRESSION] :
-                                         xe->pat.idx[XE_CACHE_NONE];
+                                         xe->pat.idx[XE_CACHE_WB];
        else
                pat_index = xe->pat.idx[XE_CACHE_WB];
 
@@ -760,14 +773,14 @@ struct dma_fence *xe_migrate_copy(struct xe_migrate *m,
                if (src_is_vram && xe_migrate_allow_identity(src_L0, &src_it))
                        xe_res_next(&src_it, src_L0);
                else
-                       emit_pte(m, bb, src_L0_pt, src_is_vram, true, &src_it, src_L0,
-                                src);
+                       emit_pte(m, bb, src_L0_pt, src_is_vram, copy_system_ccs,
+                                &src_it, src_L0, src);
 
                if (dst_is_vram && xe_migrate_allow_identity(src_L0, &dst_it))
                        xe_res_next(&dst_it, src_L0);
                else
-                       emit_pte(m, bb, dst_L0_pt, dst_is_vram, true, &dst_it, src_L0,
-                                dst);
+                       emit_pte(m, bb, dst_L0_pt, dst_is_vram, copy_system_ccs,
+                                &dst_it, src_L0, dst);
 
                if (copy_system_ccs)
                        emit_pte(m, bb, ccs_pt, false, false, &ccs_it, ccs_size, src);
@@ -1009,8 +1022,8 @@ struct dma_fence *xe_migrate_clear(struct xe_migrate *m,
                if (clear_vram && xe_migrate_allow_identity(clear_L0, &src_it))
                        xe_res_next(&src_it, clear_L0);
                else
-                       emit_pte(m, bb, clear_L0_pt, clear_vram, true, &src_it, clear_L0,
-                                dst);
+                       emit_pte(m, bb, clear_L0_pt, clear_vram, clear_system_ccs,
+                                &src_it, clear_L0, dst);
 
                bb->cs[bb->len++] = MI_BATCH_BUFFER_END;
                update_idx = bb->len;
@@ -1204,8 +1217,11 @@ static bool no_in_syncs(struct xe_vm *vm, struct xe_exec_queue *q,
        }
        if (q) {
                fence = xe_exec_queue_last_fence_get(q, vm);
-               if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
+               if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) {
+                       dma_fence_put(fence);
                        return false;
+               }
+               dma_fence_put(fence);
        }
 
        return true;
index c8c5d74b6e9041ec53c38ba81d184b83037427ab..02f7808f28cabd5533e634b41d1780769bdcbb10 100644 (file)
@@ -105,7 +105,7 @@ static void xe_resize_vram_bar(struct xe_device *xe)
 
        pci_bus_for_each_resource(root, root_res, i) {
                if (root_res && root_res->flags & (IORESOURCE_MEM | IORESOURCE_MEM_64) &&
-                   root_res->start > 0x100000000ull)
+                   (u64)root_res->start > 0x100000000ul)
                        break;
        }
 
@@ -272,8 +272,8 @@ int xe_mmio_probe_vram(struct xe_device *xe)
                drm_info(&xe->drm, "VRAM[%u, %u]: Actual physical size %pa, usable size exclude stolen %pa, CPU accessible size %pa\n", id,
                         tile->id, &tile->mem.vram.actual_physical_size, &tile->mem.vram.usable_size, &tile->mem.vram.io_size);
                drm_info(&xe->drm, "VRAM[%u, %u]: DPA range: [%pa-%llx], io range: [%pa-%llx]\n", id, tile->id,
-                        &tile->mem.vram.dpa_base, tile->mem.vram.dpa_base + tile->mem.vram.actual_physical_size,
-                        &tile->mem.vram.io_start, tile->mem.vram.io_start + tile->mem.vram.io_size);
+                        &tile->mem.vram.dpa_base, tile->mem.vram.dpa_base + (u64)tile->mem.vram.actual_physical_size,
+                        &tile->mem.vram.io_start, tile->mem.vram.io_start + (u64)tile->mem.vram.io_size);
 
                /* calculate total size using tile size to get the correct HW sizing */
                total_size += tile_size;
index de1030a47588371b0cc71f5b69bee8f0257e2625..6653c045f3c927f21e9d73dacb591ad363e01c47 100644 (file)
@@ -20,8 +20,8 @@
 
 struct xe_pt_dir {
        struct xe_pt pt;
-       /** @dir: Directory structure for the xe_pt_walk functionality */
-       struct xe_ptw_dir dir;
+       /** @children: Array of page-table child nodes */
+       struct xe_ptw *children[XE_PDES];
 };
 
 #if IS_ENABLED(CONFIG_DRM_XE_DEBUG_VM)
@@ -44,7 +44,7 @@ static struct xe_pt_dir *as_xe_pt_dir(struct xe_pt *pt)
 
 static struct xe_pt *xe_pt_entry(struct xe_pt_dir *pt_dir, unsigned int index)
 {
-       return container_of(pt_dir->dir.entries[index], struct xe_pt, base);
+       return container_of(pt_dir->children[index], struct xe_pt, base);
 }
 
 static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm,
@@ -65,6 +65,14 @@ static u64 __xe_pt_empty_pte(struct xe_tile *tile, struct xe_vm *vm,
                XE_PTE_NULL;
 }
 
+static void xe_pt_free(struct xe_pt *pt)
+{
+       if (pt->level)
+               kfree(as_xe_pt_dir(pt));
+       else
+               kfree(pt);
+}
+
 /**
  * xe_pt_create() - Create a page-table.
  * @vm: The vm to create for.
@@ -85,15 +93,19 @@ struct xe_pt *xe_pt_create(struct xe_vm *vm, struct xe_tile *tile,
 {
        struct xe_pt *pt;
        struct xe_bo *bo;
-       size_t size;
        int err;
 
-       size = !level ?  sizeof(struct xe_pt) : sizeof(struct xe_pt_dir) +
-               XE_PDES * sizeof(struct xe_ptw *);
-       pt = kzalloc(size, GFP_KERNEL);
+       if (level) {
+               struct xe_pt_dir *dir = kzalloc(sizeof(*dir), GFP_KERNEL);
+
+               pt = (dir) ? &dir->pt : NULL;
+       } else {
+               pt = kzalloc(sizeof(*pt), GFP_KERNEL);
+       }
        if (!pt)
                return ERR_PTR(-ENOMEM);
 
+       pt->level = level;
        bo = xe_bo_create_pin_map(vm->xe, tile, vm, SZ_4K,
                                  ttm_bo_type_kernel,
                                  XE_BO_CREATE_VRAM_IF_DGFX(tile) |
@@ -106,8 +118,7 @@ struct xe_pt *xe_pt_create(struct xe_vm *vm, struct xe_tile *tile,
                goto err_kfree;
        }
        pt->bo = bo;
-       pt->level = level;
-       pt->base.dir = level ? &as_xe_pt_dir(pt)->dir : NULL;
+       pt->base.children = level ? as_xe_pt_dir(pt)->children : NULL;
 
        if (vm->xef)
                xe_drm_client_add_bo(vm->xef->client, pt->bo);
@@ -116,7 +127,7 @@ struct xe_pt *xe_pt_create(struct xe_vm *vm, struct xe_tile *tile,
        return pt;
 
 err_kfree:
-       kfree(pt);
+       xe_pt_free(pt);
        return ERR_PTR(err);
 }
 
@@ -193,7 +204,7 @@ void xe_pt_destroy(struct xe_pt *pt, u32 flags, struct llist_head *deferred)
                                              deferred);
                }
        }
-       kfree(pt);
+       xe_pt_free(pt);
 }
 
 /**
@@ -358,7 +369,7 @@ xe_pt_insert_entry(struct xe_pt_stage_bind_walk *xe_walk, struct xe_pt *parent,
                struct iosys_map *map = &parent->bo->vmap;
 
                if (unlikely(xe_child))
-                       parent->base.dir->entries[offset] = &xe_child->base;
+                       parent->base.children[offset] = &xe_child->base;
 
                xe_pt_write(xe_walk->vm->xe, map, offset, pte);
                parent->num_live++;
@@ -488,10 +499,12 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
                 * this device *requires* 64K PTE size for VRAM, fail.
                 */
                if (level == 0 && !xe_parent->is_compact) {
-                       if (xe_pt_is_pte_ps64K(addr, next, xe_walk))
+                       if (xe_pt_is_pte_ps64K(addr, next, xe_walk)) {
+                               xe_walk->vma->gpuva.flags |= XE_VMA_PTE_64K;
                                pte |= XE_PTE_PS64;
-                       else if (XE_WARN_ON(xe_walk->needs_64K))
+                       } else if (XE_WARN_ON(xe_walk->needs_64K)) {
                                return -EINVAL;
+                       }
                }
 
                ret = xe_pt_insert_entry(xe_walk, xe_parent, offset, NULL, pte);
@@ -534,13 +547,16 @@ xe_pt_stage_bind_entry(struct xe_ptw *parent, pgoff_t offset,
                *child = &xe_child->base;
 
                /*
-                * Prefer the compact pagetable layout for L0 if possible.
+                * Prefer the compact pagetable layout for L0 if possible. Only
+                * possible if the VMA covers the entire 2MB region, as compact
+                * 64k and 4k pages cannot be mixed within a 2MB region.
                 * TODO: Suballocate the pt bo to avoid wasting a lot of
                 * memory.
                 */
                if (GRAPHICS_VERx100(tile_to_xe(xe_walk->tile)) >= 1250 && level == 1 &&
                    covers && xe_pt_scan_64K(addr, next, xe_walk)) {
                        walk->shifts = xe_compact_pt_shifts;
+                       xe_walk->vma->gpuva.flags |= XE_VMA_PTE_COMPACT;
                        flags |= XE_PDE_64K;
                        xe_child->is_compact = true;
                }
@@ -618,8 +634,8 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
 
        if (!xe_vma_is_null(vma)) {
                if (xe_vma_is_userptr(vma))
-                       xe_res_first_sg(vma->userptr.sg, 0, xe_vma_size(vma),
-                                       &curs);
+                       xe_res_first_sg(to_userptr_vma(vma)->userptr.sg, 0,
+                                       xe_vma_size(vma), &curs);
                else if (xe_bo_is_vram(bo) || xe_bo_is_stolen(bo))
                        xe_res_first(bo->ttm.resource, xe_vma_bo_offset(vma),
                                     xe_vma_size(vma), &curs);
@@ -853,7 +869,7 @@ static void xe_pt_commit_bind(struct xe_vma *vma,
                                xe_pt_destroy(xe_pt_entry(pt_dir, j_),
                                              xe_vma_vm(vma)->flags, deferred);
 
-                       pt_dir->dir.entries[j_] = &newpte->base;
+                       pt_dir->children[j_] = &newpte->base;
                }
                kfree(entries[i].pt_entries);
        }
@@ -906,17 +922,17 @@ static void xe_vm_dbg_print_entries(struct xe_device *xe,
 
 #ifdef CONFIG_DRM_XE_USERPTR_INVAL_INJECT
 
-static int xe_pt_userptr_inject_eagain(struct xe_vma *vma)
+static int xe_pt_userptr_inject_eagain(struct xe_userptr_vma *uvma)
 {
-       u32 divisor = vma->userptr.divisor ? vma->userptr.divisor : 2;
+       u32 divisor = uvma->userptr.divisor ? uvma->userptr.divisor : 2;
        static u32 count;
 
        if (count++ % divisor == divisor - 1) {
-               struct xe_vm *vm = xe_vma_vm(vma);
+               struct xe_vm *vm = xe_vma_vm(&uvma->vma);
 
-               vma->userptr.divisor = divisor << 1;
+               uvma->userptr.divisor = divisor << 1;
                spin_lock(&vm->userptr.invalidated_lock);
-               list_move_tail(&vma->userptr.invalidate_link,
+               list_move_tail(&uvma->userptr.invalidate_link,
                               &vm->userptr.invalidated);
                spin_unlock(&vm->userptr.invalidated_lock);
                return true;
@@ -927,7 +943,7 @@ static int xe_pt_userptr_inject_eagain(struct xe_vma *vma)
 
 #else
 
-static bool xe_pt_userptr_inject_eagain(struct xe_vma *vma)
+static bool xe_pt_userptr_inject_eagain(struct xe_userptr_vma *uvma)
 {
        return false;
 }
@@ -1000,9 +1016,9 @@ static int xe_pt_userptr_pre_commit(struct xe_migrate_pt_update *pt_update)
 {
        struct xe_pt_migrate_pt_update *userptr_update =
                container_of(pt_update, typeof(*userptr_update), base);
-       struct xe_vma *vma = pt_update->vma;
-       unsigned long notifier_seq = vma->userptr.notifier_seq;
-       struct xe_vm *vm = xe_vma_vm(vma);
+       struct xe_userptr_vma *uvma = to_userptr_vma(pt_update->vma);
+       unsigned long notifier_seq = uvma->userptr.notifier_seq;
+       struct xe_vm *vm = xe_vma_vm(&uvma->vma);
        int err = xe_pt_vm_dependencies(pt_update->job,
                                        &vm->rftree[pt_update->tile_id],
                                        pt_update->start,
@@ -1023,7 +1039,7 @@ static int xe_pt_userptr_pre_commit(struct xe_migrate_pt_update *pt_update)
         */
        do {
                down_read(&vm->userptr.notifier_lock);
-               if (!mmu_interval_read_retry(&vma->userptr.notifier,
+               if (!mmu_interval_read_retry(&uvma->userptr.notifier,
                                             notifier_seq))
                        break;
 
@@ -1032,11 +1048,11 @@ static int xe_pt_userptr_pre_commit(struct xe_migrate_pt_update *pt_update)
                if (userptr_update->bind)
                        return -EAGAIN;
 
-               notifier_seq = mmu_interval_read_begin(&vma->userptr.notifier);
+               notifier_seq = mmu_interval_read_begin(&uvma->userptr.notifier);
        } while (true);
 
        /* Inject errors to test whether they are handled correctly */
-       if (userptr_update->bind && xe_pt_userptr_inject_eagain(vma)) {
+       if (userptr_update->bind && xe_pt_userptr_inject_eagain(uvma)) {
                up_read(&vm->userptr.notifier_lock);
                return -EAGAIN;
        }
@@ -1297,7 +1313,7 @@ __xe_pt_bind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queue
                vma->tile_present |= BIT(tile->id);
 
                if (bind_pt_update.locked) {
-                       vma->userptr.initial_bind = true;
+                       to_userptr_vma(vma)->userptr.initial_bind = true;
                        up_read(&vm->userptr.notifier_lock);
                        xe_bo_put_commit(&deferred);
                }
@@ -1507,7 +1523,7 @@ xe_pt_commit_unbind(struct xe_vma *vma,
                                        xe_pt_destroy(xe_pt_entry(pt_dir, i),
                                                      xe_vma_vm(vma)->flags, deferred);
 
-                               pt_dir->dir.entries[i] = NULL;
+                               pt_dir->children[i] = NULL;
                        }
                }
        }
@@ -1642,7 +1658,7 @@ __xe_pt_unbind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queu
 
                if (!vma->tile_present) {
                        spin_lock(&vm->userptr.invalidated_lock);
-                       list_del_init(&vma->userptr.invalidate_link);
+                       list_del_init(&to_userptr_vma(vma)->userptr.invalidate_link);
                        spin_unlock(&vm->userptr.invalidated_lock);
                }
                up_read(&vm->userptr.notifier_lock);
index 8f6c8d063f39f0293a6c4d966c009dfc24ca9045..b8b3d2aea4923d0ac087f6a2c972652aba8efc6f 100644 (file)
@@ -74,7 +74,7 @@ int xe_pt_walk_range(struct xe_ptw *parent, unsigned int level,
                     u64 addr, u64 end, struct xe_pt_walk *walk)
 {
        pgoff_t offset = xe_pt_offset(addr, level, walk);
-       struct xe_ptw **entries = parent->dir ? parent->dir->entries : NULL;
+       struct xe_ptw **entries = parent->children ? parent->children : NULL;
        const struct xe_pt_walk_ops *ops = walk->ops;
        enum page_walk_action action;
        struct xe_ptw *child;
index ec3d1e9efa6d514ae21bb4b4a1b35a0bc2baf59c..5ecc4d2f0f6536b7ec79033f80f556ce1f00edc5 100644 (file)
@@ -8,28 +8,15 @@
 #include <linux/pagewalk.h>
 #include <linux/types.h>
 
-struct xe_ptw_dir;
-
 /**
  * struct xe_ptw - base class for driver pagetable subclassing.
- * @dir: Pointer to an array of children if any.
+ * @children: Pointer to an array of children if any.
  *
  * Drivers could subclass this, and if it's a page-directory, typically
- * embed the xe_ptw_dir::entries array in the same allocation.
+ * embed an array of xe_ptw pointers.
  */
 struct xe_ptw {
-       struct xe_ptw_dir *dir;
-};
-
-/**
- * struct xe_ptw_dir - page directory structure
- * @entries: Array holding page directory children.
- *
- * It is the responsibility of the user to ensure @entries is
- * correctly sized.
- */
-struct xe_ptw_dir {
-       struct xe_ptw *entries[0];
+       struct xe_ptw **children;
 };
 
 /**
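
The rewrite above drops the zero-length xe_ptw_dir array in favour of a
plain children pointer; a directory level now embeds a fixed-size array
and wires it up at creation time, exactly as the xe_pt_dir hunks earlier
in this patch do. A minimal sketch (my_pt_dir is a hypothetical subclass;
XE_PDES is the per-directory entry count):

	struct my_pt_dir {
		struct xe_ptw base;
		struct xe_ptw *children[XE_PDES];
	};

	static struct my_pt_dir *my_pt_dir_create(void)
	{
		struct my_pt_dir *dir = kzalloc(sizeof(*dir), GFP_KERNEL);

		if (!dir)
			return NULL;

		/* Leaf tables leave base.children NULL, as xe_pt_create() does. */
		dir->base.children = dir->children;
		return dir;
	}
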
index 9b35673b286c80c1c2332d6ece9add9f754f3ca8..7e924faeeea0b0f8ebc2f7fe89ade231412a3892 100644 (file)
@@ -459,21 +459,21 @@ static size_t calc_topo_query_size(struct xe_device *xe)
                 sizeof_field(struct xe_gt, fuse_topo.eu_mask_per_dss));
 }
 
-static void __user *copy_mask(void __user *ptr,
-                             struct drm_xe_query_topology_mask *topo,
-                             void *mask, size_t mask_size)
+static int copy_mask(void __user **ptr,
+                    struct drm_xe_query_topology_mask *topo,
+                    void *mask, size_t mask_size)
 {
        topo->num_bytes = mask_size;
 
-       if (copy_to_user(ptr, topo, sizeof(*topo)))
-               return ERR_PTR(-EFAULT);
-       ptr += sizeof(topo);
+       if (copy_to_user(*ptr, topo, sizeof(*topo)))
+               return -EFAULT;
+       *ptr += sizeof(topo);
 
-       if (copy_to_user(ptr, mask, mask_size))
-               return ERR_PTR(-EFAULT);
-       ptr += mask_size;
+       if (copy_to_user(*ptr, mask, mask_size))
+               return -EFAULT;
+       *ptr += mask_size;
 
-       return ptr;
+       return 0;
 }
 
 static int query_gt_topology(struct xe_device *xe,
@@ -493,28 +493,28 @@ static int query_gt_topology(struct xe_device *xe,
        }
 
        for_each_gt(gt, xe, id) {
+               int err;
+
                topo.gt_id = id;
 
                topo.type = DRM_XE_TOPO_DSS_GEOMETRY;
-               query_ptr = copy_mask(query_ptr, &topo,
-                                     gt->fuse_topo.g_dss_mask,
-                                     sizeof(gt->fuse_topo.g_dss_mask));
-               if (IS_ERR(query_ptr))
-                       return PTR_ERR(query_ptr);
+               err = copy_mask(&query_ptr, &topo, gt->fuse_topo.g_dss_mask,
+                               sizeof(gt->fuse_topo.g_dss_mask));
+               if (err)
+                       return err;
 
                topo.type = DRM_XE_TOPO_DSS_COMPUTE;
-               query_ptr = copy_mask(query_ptr, &topo,
-                                     gt->fuse_topo.c_dss_mask,
-                                     sizeof(gt->fuse_topo.c_dss_mask));
-               if (IS_ERR(query_ptr))
-                       return PTR_ERR(query_ptr);
+               err = copy_mask(&query_ptr, &topo, gt->fuse_topo.c_dss_mask,
+                               sizeof(gt->fuse_topo.c_dss_mask));
+               if (err)
+                       return err;
 
                topo.type = DRM_XE_TOPO_EU_PER_DSS;
-               query_ptr = copy_mask(query_ptr, &topo,
-                                     gt->fuse_topo.eu_mask_per_dss,
-                                     sizeof(gt->fuse_topo.eu_mask_per_dss));
-               if (IS_ERR(query_ptr))
-                       return PTR_ERR(query_ptr);
+               err = copy_mask(&query_ptr, &topo,
+                               gt->fuse_topo.eu_mask_per_dss,
+                               sizeof(gt->fuse_topo.eu_mask_per_dss));
+               if (err)
+                       return err;
        }
 
        return 0;
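
The copy_mask() rewrite above replaces the old convention of smuggling an
errno through an ERR_PTR-encoded __user pointer with a plain int return,
advancing the caller's output cursor through a double pointer only on
success. The idiom in isolation (emit() is a hypothetical name):

	static int emit(void __user **ptr, const void *src, size_t size)
	{
		if (copy_to_user(*ptr, src, size))
			return -EFAULT;

		*ptr += size;	/* advance the cursor only on success */
		return 0;
	}
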
index d35d9ec58e86f95c8244fc02e6a9709b63ccf93a..372378e89e989239833879e23d2e62a2fd573b54 100644 (file)
@@ -151,6 +151,11 @@ xe_range_fence_tree_next(struct xe_range_fence *rfence, u64 start, u64 last)
        return xe_range_fence_tree_iter_next(rfence, start, last);
 }
 
+static void xe_range_fence_free(struct xe_range_fence *rfence)
+{
+       kfree(rfence);
+}
+
 const struct xe_range_fence_ops xe_range_fence_kfree_ops = {
-       .free = (void (*)(struct xe_range_fence *rfence)) kfree,
+       .free = xe_range_fence_free,
 };
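
The wrapper above replaces a cast of kfree() to an incompatible function
pointer type, which is undefined behaviour in C and trips kernel CFI
(kCFI) at the indirect call site, since the callee's signature no longer
matches the call. The general shape, with hypothetical foo types:

	struct foo;

	struct foo_ops {
		void (*free)(struct foo *foo);
	};

	static void foo_free(struct foo *foo)
	{
		kfree(foo);	/* prototype matches the ops slot exactly */
	}

	static const struct foo_ops foo_kfree_ops = {
		.free = foo_free,	/* not: (void (*)(struct foo *))kfree */
	};
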
index 01106a1156ad82ab30378b29abf3f18d55b64fe3..4e2ccad0e52fabaf43ea26ddc1dd86f2294662a1 100644 (file)
@@ -274,7 +274,6 @@ int xe_sched_job_last_fence_add_dep(struct xe_sched_job *job, struct xe_vm *vm)
        struct dma_fence *fence;
 
        fence = xe_exec_queue_last_fence_get(job->q, vm);
-       dma_fence_get(fence);
 
        return drm_sched_job_add_dependency(&job->drm, fence);
 }
index e4c220cf9115e9d52fc7b1e9440e0e44ba247c46..02c9577fe418516bcb891174b9599b6c0b2903bf 100644 (file)
@@ -19,7 +19,7 @@
 #include "xe_macros.h"
 #include "xe_sched_job_types.h"
 
-struct user_fence {
+struct xe_user_fence {
        struct xe_device *xe;
        struct kref refcount;
        struct dma_fence_cb cb;
@@ -27,31 +27,32 @@ struct user_fence {
        struct mm_struct *mm;
        u64 __user *addr;
        u64 value;
+       int signalled;
 };
 
 static void user_fence_destroy(struct kref *kref)
 {
-       struct user_fence *ufence = container_of(kref, struct user_fence,
+       struct xe_user_fence *ufence = container_of(kref, struct xe_user_fence,
                                                 refcount);
 
        mmdrop(ufence->mm);
        kfree(ufence);
 }
 
-static void user_fence_get(struct user_fence *ufence)
+static void user_fence_get(struct xe_user_fence *ufence)
 {
        kref_get(&ufence->refcount);
 }
 
-static void user_fence_put(struct user_fence *ufence)
+static void user_fence_put(struct xe_user_fence *ufence)
 {
        kref_put(&ufence->refcount, user_fence_destroy);
 }
 
-static struct user_fence *user_fence_create(struct xe_device *xe, u64 addr,
-                                           u64 value)
+static struct xe_user_fence *user_fence_create(struct xe_device *xe, u64 addr,
+                                              u64 value)
 {
-       struct user_fence *ufence;
+       struct xe_user_fence *ufence;
 
        ufence = kmalloc(sizeof(*ufence), GFP_KERNEL);
        if (!ufence)
@@ -69,7 +70,7 @@ static struct user_fence *user_fence_create(struct xe_device *xe, u64 addr,
 
 static void user_fence_worker(struct work_struct *w)
 {
-       struct user_fence *ufence = container_of(w, struct user_fence, worker);
+       struct xe_user_fence *ufence = container_of(w, struct xe_user_fence, worker);
 
        if (mmget_not_zero(ufence->mm)) {
                kthread_use_mm(ufence->mm);
@@ -80,10 +81,11 @@ static void user_fence_worker(struct work_struct *w)
        }
 
        wake_up_all(&ufence->xe->ufence_wq);
+       WRITE_ONCE(ufence->signalled, 1);
        user_fence_put(ufence);
 }
 
-static void kick_ufence(struct user_fence *ufence, struct dma_fence *fence)
+static void kick_ufence(struct xe_user_fence *ufence, struct dma_fence *fence)
 {
        INIT_WORK(&ufence->worker, user_fence_worker);
        queue_work(ufence->xe->ordered_wq, &ufence->worker);
@@ -92,7 +94,7 @@ static void kick_ufence(struct user_fence *ufence, struct dma_fence *fence)
 
 static void user_fence_cb(struct dma_fence *fence, struct dma_fence_cb *cb)
 {
-       struct user_fence *ufence = container_of(cb, struct user_fence, cb);
+       struct xe_user_fence *ufence = container_of(cb, struct xe_user_fence, cb);
 
        kick_ufence(ufence, fence);
 }
@@ -307,7 +309,6 @@ xe_sync_in_fence_get(struct xe_sync_entry *sync, int num_sync,
        /* Easy case... */
        if (!num_in_fence) {
                fence = xe_exec_queue_last_fence_get(q, vm);
-               dma_fence_get(fence);
                return fence;
        }
 
@@ -322,7 +323,6 @@ xe_sync_in_fence_get(struct xe_sync_entry *sync, int num_sync,
                }
        }
        fences[current_fence++] = xe_exec_queue_last_fence_get(q, vm);
-       dma_fence_get(fences[current_fence - 1]);
        cf = dma_fence_array_create(num_in_fence, fences,
                                    vm->composite_fence_ctx,
                                    vm->composite_fence_seqno++,
@@ -342,3 +342,39 @@ err_out:
 
        return ERR_PTR(-ENOMEM);
 }
+
+/**
+ * xe_sync_ufence_get() - Get user fence from sync
+ * @sync: input sync
+ *
+ * Get a user fence reference from sync.
+ *
+ * Return: xe_user_fence pointer with reference
+ */
+struct xe_user_fence *xe_sync_ufence_get(struct xe_sync_entry *sync)
+{
+       user_fence_get(sync->ufence);
+
+       return sync->ufence;
+}
+
+/**
+ * xe_sync_ufence_put() - Put user fence reference
+ * @ufence: user fence reference
+ *
+ */
+void xe_sync_ufence_put(struct xe_user_fence *ufence)
+{
+       user_fence_put(ufence);
+}
+
+/**
+ * xe_sync_ufence_get_status() - Get user fence status
+ * @ufence: user fence
+ *
+ * Return: 1 if signalled, 0 if not signalled, <0 on error
+ */
+int xe_sync_ufence_get_status(struct xe_user_fence *ufence)
+{
+       return READ_ONCE(ufence->signalled);
+}
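
The new signalled flag is written with WRITE_ONCE() in the worker and read
with READ_ONCE() in the status query: the _ONCE accessors keep the
compiler from tearing, fusing or re-reading the access, which is enough
for an advisory status bit polled without a lock (they add no memory
barrier; waiters are still ordered via the wake_up_all() path). The pair
in isolation:

	/* worker, after the value has landed in user memory */
	WRITE_ONCE(ufence->signalled, 1);

	/* lockless query path */
	return READ_ONCE(ufence->signalled);
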
index d284afbe917c19203473b30d0abc38ca88ffbfa2..0fd0d51208e627c9be72eef661c160458db6f5a4 100644 (file)
@@ -33,4 +33,13 @@ struct dma_fence *
 xe_sync_in_fence_get(struct xe_sync_entry *sync, int num_sync,
                     struct xe_exec_queue *q, struct xe_vm *vm);
 
+static inline bool xe_sync_is_ufence(struct xe_sync_entry *sync)
+{
+       return !!sync->ufence;
+}
+
+struct xe_user_fence *xe_sync_ufence_get(struct xe_sync_entry *sync);
+void xe_sync_ufence_put(struct xe_user_fence *ufence);
+int xe_sync_ufence_get_status(struct xe_user_fence *ufence);
+
 #endif
index 852db5e7884fcde668f6f85b6e4049fa5290f8a9..30ac3f51993b944e3dd86ccb059c75441f87f5e1 100644 (file)
@@ -18,7 +18,7 @@ struct xe_sync_entry {
        struct drm_syncobj *syncobj;
        struct dma_fence *fence;
        struct dma_fence_chain *chain_fence;
-       struct user_fence *ufence;
+       struct xe_user_fence *ufence;
        u64 addr;
        u64 timeline_value;
        u32 type;
index 044c20881de7ef0ede17f4dcfcdf34863817d8de..0650b2fa75efba85aea8d2a98e7d076ebabd607a 100644 (file)
@@ -167,9 +167,10 @@ int xe_tile_init_noalloc(struct xe_tile *tile)
                goto err_mem_access;
 
        tile->mem.kernel_bb_pool = xe_sa_bo_manager_init(tile, SZ_1M, 16);
-       if (IS_ERR(tile->mem.kernel_bb_pool))
+       if (IS_ERR(tile->mem.kernel_bb_pool)) {
                err = PTR_ERR(tile->mem.kernel_bb_pool);
-
+               goto err_mem_access;
+       }
        xe_wa_apply_tile_workarounds(tile);
 
        xe_tile_sysfs_init(tile);
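
The braces added above close a classic error-path bug: err was assigned
but control fell through to the success path, so the function reported
success while kernel_bb_pool held an ERR_PTR. The kernel's goto-unwind
convention, sketched with hypothetical names:

	thing = create_thing(dev);
	if (IS_ERR(thing)) {
		err = PTR_ERR(thing);
		goto err_unwind;	/* always leave via the unwind ladder */
	}
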
index 95163c303f3e11694bdc1bafd18eb6386740eb01..4ddc55527f9ab3e632635c5f920d4f4420df1255 100644 (file)
@@ -12,6 +12,7 @@
 #include <linux/tracepoint.h>
 #include <linux/types.h>
 
+#include "xe_bo.h"
 #include "xe_bo_types.h"
 #include "xe_exec_queue_types.h"
 #include "xe_gpu_scheduler_types.h"
@@ -26,16 +27,16 @@ DECLARE_EVENT_CLASS(xe_gt_tlb_invalidation_fence,
                    TP_ARGS(fence),
 
                    TP_STRUCT__entry(
-                            __field(u64, fence)
+                            __field(struct xe_gt_tlb_invalidation_fence *, fence)
                             __field(int, seqno)
                             ),
 
                    TP_fast_assign(
-                          __entry->fence = (u64)fence;
+                          __entry->fence = fence;
                           __entry->seqno = fence->seqno;
                           ),
 
-                   TP_printk("fence=0x%016llx, seqno=%d",
+                   TP_printk("fence=%p, seqno=%d",
                              __entry->fence, __entry->seqno)
 );
 
@@ -82,16 +83,16 @@ DECLARE_EVENT_CLASS(xe_bo,
                    TP_STRUCT__entry(
                             __field(size_t, size)
                             __field(u32, flags)
-                            __field(u64, vm)
+                            __field(struct xe_vm *, vm)
                             ),
 
                    TP_fast_assign(
                           __entry->size = bo->size;
                           __entry->flags = bo->flags;
-                          __entry->vm = (unsigned long)bo->vm;
+                          __entry->vm = bo->vm;
                           ),
 
-                   TP_printk("size=%zu, flags=0x%02x, vm=0x%016llx",
+                   TP_printk("size=%zu, flags=0x%02x, vm=%p",
                              __entry->size, __entry->flags, __entry->vm)
 );
 
@@ -100,9 +101,31 @@ DEFINE_EVENT(xe_bo, xe_bo_cpu_fault,
             TP_ARGS(bo)
 );
 
-DEFINE_EVENT(xe_bo, xe_bo_move,
-            TP_PROTO(struct xe_bo *bo),
-            TP_ARGS(bo)
+TRACE_EVENT(xe_bo_move,
+           TP_PROTO(struct xe_bo *bo, uint32_t new_placement, uint32_t old_placement,
+                    bool move_lacks_source),
+           TP_ARGS(bo, new_placement, old_placement, move_lacks_source),
+           TP_STRUCT__entry(
+                    __field(struct xe_bo *, bo)
+                    __field(size_t, size)
+                    __field(u32, new_placement)
+                    __field(u32, old_placement)
+                    __array(char, device_id, 12)
+                    __field(bool, move_lacks_source)
+                       ),
+
+           TP_fast_assign(
+                  __entry->bo      = bo;
+                  __entry->size = bo->size;
+                  __entry->new_placement = new_placement;
+                  __entry->old_placement = old_placement;
+                  strscpy(__entry->device_id, dev_name(xe_bo_device(__entry->bo)->drm.dev), 12);
+                  __entry->move_lacks_source = move_lacks_source;
+                  ),
+           TP_printk("move_lacks_source:%s, migrate object %p [size %zu] from %s to %s device_id:%s",
+                     __entry->move_lacks_source ? "yes" : "no", __entry->bo, __entry->size,
+                     xe_mem_type_to_name[__entry->old_placement],
+                     xe_mem_type_to_name[__entry->new_placement], __entry->device_id)
 );
 
 DECLARE_EVENT_CLASS(xe_exec_queue,
@@ -327,16 +350,16 @@ DECLARE_EVENT_CLASS(xe_hw_fence,
                    TP_STRUCT__entry(
                             __field(u64, ctx)
                             __field(u32, seqno)
-                            __field(u64, fence)
+                            __field(struct xe_hw_fence *, fence)
                             ),
 
                    TP_fast_assign(
                           __entry->ctx = fence->dma.context;
                           __entry->seqno = fence->dma.seqno;
-                          __entry->fence = (unsigned long)fence;
+                          __entry->fence = fence;
                           ),
 
-                   TP_printk("ctx=0x%016llx, fence=0x%016llx, seqno=%u",
+                   TP_printk("ctx=0x%016llx, fence=%p, seqno=%u",
                              __entry->ctx, __entry->fence, __entry->seqno)
 );
 
@@ -365,7 +388,7 @@ DECLARE_EVENT_CLASS(xe_vma,
                    TP_ARGS(vma),
 
                    TP_STRUCT__entry(
-                            __field(u64, vma)
+                            __field(struct xe_vma *, vma)
                             __field(u32, asid)
                             __field(u64, start)
                             __field(u64, end)
@@ -373,14 +396,14 @@ DECLARE_EVENT_CLASS(xe_vma,
                             ),
 
                    TP_fast_assign(
-                          __entry->vma = (unsigned long)vma;
+                          __entry->vma = vma;
                           __entry->asid = xe_vma_vm(vma)->usm.asid;
                           __entry->start = xe_vma_start(vma);
                           __entry->end = xe_vma_end(vma) - 1;
                           __entry->ptr = xe_vma_userptr(vma);
                           ),
 
-                   TP_printk("vma=0x%016llx, asid=0x%05x, start=0x%012llx, end=0x%012llx, ptr=0x%012llx,",
+                   TP_printk("vma=%p, asid=0x%05x, start=0x%012llx, end=0x%012llx, userptr=0x%012llx,",
                              __entry->vma, __entry->asid, __entry->start,
                              __entry->end, __entry->ptr)
 )
@@ -465,16 +488,16 @@ DECLARE_EVENT_CLASS(xe_vm,
                    TP_ARGS(vm),
 
                    TP_STRUCT__entry(
-                            __field(u64, vm)
+                            __field(struct xe_vm *, vm)
                             __field(u32, asid)
                             ),
 
                    TP_fast_assign(
-                          __entry->vm = (unsigned long)vm;
+                          __entry->vm = vm;
                           __entry->asid = vm->usm.asid;
                           ),
 
-                   TP_printk("vm=0x%016llx, asid=0x%05x",  __entry->vm,
+                   TP_printk("vm=%p, asid=0x%05x",  __entry->vm,
                              __entry->asid)
 );
 
index 10b6995fbf294690a36234dc3c36bf741245b7a9..3b21afe5b4883fa64aeb92c6d2174b014be96c59 100644 (file)
@@ -37,8 +37,6 @@
 #include "generated/xe_wa_oob.h"
 #include "xe_wa.h"
 
-#define TEST_VM_ASYNC_OPS_ERROR
-
 static struct drm_gem_object *xe_vm_obj(struct xe_vm *vm)
 {
        return vm->gpuvm.r_obj;
@@ -46,7 +44,7 @@ static struct drm_gem_object *xe_vm_obj(struct xe_vm *vm)
 
 /**
  * xe_vma_userptr_check_repin() - Advisory check for repin needed
- * @vma: The userptr vma
+ * @uvma: The userptr vma
  *
  * Check if the userptr vma has been invalidated since last successful
  * repin. The check is advisory only and the function can be called
@@ -56,15 +54,17 @@ static struct drm_gem_object *xe_vm_obj(struct xe_vm *vm)
  *
  * Return: 0 if userptr vma is valid, -EAGAIN otherwise; repin recommended.
  */
-int xe_vma_userptr_check_repin(struct xe_vma *vma)
+int xe_vma_userptr_check_repin(struct xe_userptr_vma *uvma)
 {
-       return mmu_interval_check_retry(&vma->userptr.notifier,
-                                       vma->userptr.notifier_seq) ?
+       return mmu_interval_check_retry(&uvma->userptr.notifier,
+                                       uvma->userptr.notifier_seq) ?
                -EAGAIN : 0;
 }
 
-int xe_vma_userptr_pin_pages(struct xe_vma *vma)
+int xe_vma_userptr_pin_pages(struct xe_userptr_vma *uvma)
 {
+       struct xe_userptr *userptr = &uvma->userptr;
+       struct xe_vma *vma = &uvma->vma;
        struct xe_vm *vm = xe_vma_vm(vma);
        struct xe_device *xe = vm->xe;
        const unsigned long num_pages = xe_vma_size(vma) >> PAGE_SHIFT;
@@ -80,30 +80,30 @@ retry:
        if (vma->gpuva.flags & XE_VMA_DESTROYED)
                return 0;
 
-       notifier_seq = mmu_interval_read_begin(&vma->userptr.notifier);
-       if (notifier_seq == vma->userptr.notifier_seq)
+       notifier_seq = mmu_interval_read_begin(&userptr->notifier);
+       if (notifier_seq == userptr->notifier_seq)
                return 0;
 
        pages = kvmalloc_array(num_pages, sizeof(*pages), GFP_KERNEL);
        if (!pages)
                return -ENOMEM;
 
-       if (vma->userptr.sg) {
+       if (userptr->sg) {
                dma_unmap_sgtable(xe->drm.dev,
-                                 vma->userptr.sg,
+                                 userptr->sg,
                                  read_only ? DMA_TO_DEVICE :
                                  DMA_BIDIRECTIONAL, 0);
-               sg_free_table(vma->userptr.sg);
-               vma->userptr.sg = NULL;
+               sg_free_table(userptr->sg);
+               userptr->sg = NULL;
        }
 
        pinned = ret = 0;
        if (in_kthread) {
-               if (!mmget_not_zero(vma->userptr.notifier.mm)) {
+               if (!mmget_not_zero(userptr->notifier.mm)) {
                        ret = -EFAULT;
                        goto mm_closed;
                }
-               kthread_use_mm(vma->userptr.notifier.mm);
+               kthread_use_mm(userptr->notifier.mm);
        }
 
        while (pinned < num_pages) {
@@ -112,43 +112,40 @@ retry:
                                          num_pages - pinned,
                                          read_only ? 0 : FOLL_WRITE,
                                          &pages[pinned]);
-               if (ret < 0) {
-                       if (in_kthread)
-                               ret = 0;
+               if (ret < 0)
                        break;
-               }
 
                pinned += ret;
                ret = 0;
        }
 
        if (in_kthread) {
-               kthread_unuse_mm(vma->userptr.notifier.mm);
-               mmput(vma->userptr.notifier.mm);
+               kthread_unuse_mm(userptr->notifier.mm);
+               mmput(userptr->notifier.mm);
        }
 mm_closed:
        if (ret)
                goto out;
 
-       ret = sg_alloc_table_from_pages_segment(&vma->userptr.sgt, pages,
+       ret = sg_alloc_table_from_pages_segment(&userptr->sgt, pages,
                                                pinned, 0,
                                                (u64)pinned << PAGE_SHIFT,
                                                xe_sg_segment_size(xe->drm.dev),
                                                GFP_KERNEL);
        if (ret) {
-               vma->userptr.sg = NULL;
+               userptr->sg = NULL;
                goto out;
        }
-       vma->userptr.sg = &vma->userptr.sgt;
+       userptr->sg = &userptr->sgt;
 
-       ret = dma_map_sgtable(xe->drm.dev, vma->userptr.sg,
+       ret = dma_map_sgtable(xe->drm.dev, userptr->sg,
                              read_only ? DMA_TO_DEVICE :
                              DMA_BIDIRECTIONAL,
                              DMA_ATTR_SKIP_CPU_SYNC |
                              DMA_ATTR_NO_KERNEL_MAPPING);
        if (ret) {
-               sg_free_table(vma->userptr.sg);
-               vma->userptr.sg = NULL;
+               sg_free_table(userptr->sg);
+               userptr->sg = NULL;
                goto out;
        }
 
@@ -167,8 +164,8 @@ out:
        kvfree(pages);
 
        if (!(ret < 0)) {
-               vma->userptr.notifier_seq = notifier_seq;
-               if (xe_vma_userptr_check_repin(vma) == -EAGAIN)
+               userptr->notifier_seq = notifier_seq;
+               if (xe_vma_userptr_check_repin(uvma) == -EAGAIN)
                        goto retry;
        }
 
@@ -635,7 +632,9 @@ static bool vma_userptr_invalidate(struct mmu_interval_notifier *mni,
                                   const struct mmu_notifier_range *range,
                                   unsigned long cur_seq)
 {
-       struct xe_vma *vma = container_of(mni, struct xe_vma, userptr.notifier);
+       struct xe_userptr *userptr = container_of(mni, typeof(*userptr), notifier);
+       struct xe_userptr_vma *uvma = container_of(userptr, typeof(*uvma), userptr);
+       struct xe_vma *vma = &uvma->vma;
        struct xe_vm *vm = xe_vma_vm(vma);
        struct dma_resv_iter cursor;
        struct dma_fence *fence;
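
vma_userptr_invalidate() above recovers the subclass in two container_of()
steps: from the embedded notifier to the xe_userptr, then from the
xe_userptr to the xe_userptr_vma. Spelled out with explicit types (layout
abridged to the two members this patch relies on):

	struct xe_userptr_vma {
		struct xe_vma vma;		/* base class */
		struct xe_userptr userptr;	/* embeds the mmu notifier */
	};

	struct xe_userptr *userptr =
		container_of(mni, struct xe_userptr, notifier);
	struct xe_userptr_vma *uvma =
		container_of(userptr, struct xe_userptr_vma, userptr);
	struct xe_vma *vma = &uvma->vma;	/* to_userptr_vma() inverts this */
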
@@ -651,7 +650,7 @@ static bool vma_userptr_invalidate(struct mmu_interval_notifier *mni,
        mmu_interval_set_seq(mni, cur_seq);
 
        /* No need to stop gpu access if the userptr is not yet bound. */
-       if (!vma->userptr.initial_bind) {
+       if (!userptr->initial_bind) {
                up_write(&vm->userptr.notifier_lock);
                return true;
        }
@@ -663,7 +662,7 @@ static bool vma_userptr_invalidate(struct mmu_interval_notifier *mni,
        if (!xe_vm_in_fault_mode(vm) &&
            !(vma->gpuva.flags & XE_VMA_DESTROYED) && vma->tile_present) {
                spin_lock(&vm->userptr.invalidated_lock);
-               list_move_tail(&vma->userptr.invalidate_link,
+               list_move_tail(&userptr->invalidate_link,
                               &vm->userptr.invalidated);
                spin_unlock(&vm->userptr.invalidated_lock);
        }
@@ -703,7 +702,7 @@ static const struct mmu_interval_notifier_ops vma_userptr_notifier_ops = {
 
 int xe_vm_userptr_pin(struct xe_vm *vm)
 {
-       struct xe_vma *vma, *next;
+       struct xe_userptr_vma *uvma, *next;
        int err = 0;
        LIST_HEAD(tmp_evict);
 
@@ -711,22 +710,23 @@ int xe_vm_userptr_pin(struct xe_vm *vm)
 
        /* Collect invalidated userptrs */
        spin_lock(&vm->userptr.invalidated_lock);
-       list_for_each_entry_safe(vma, next, &vm->userptr.invalidated,
+       list_for_each_entry_safe(uvma, next, &vm->userptr.invalidated,
                                 userptr.invalidate_link) {
-               list_del_init(&vma->userptr.invalidate_link);
-               list_move_tail(&vma->combined_links.userptr,
+               list_del_init(&uvma->userptr.invalidate_link);
+               list_move_tail(&uvma->userptr.repin_link,
                               &vm->userptr.repin_list);
        }
        spin_unlock(&vm->userptr.invalidated_lock);
 
        /* Pin and move to temporary list */
-       list_for_each_entry_safe(vma, next, &vm->userptr.repin_list,
-                                combined_links.userptr) {
-               err = xe_vma_userptr_pin_pages(vma);
+       list_for_each_entry_safe(uvma, next, &vm->userptr.repin_list,
+                                userptr.repin_link) {
+               err = xe_vma_userptr_pin_pages(uvma);
                if (err < 0)
                        return err;
 
-               list_move_tail(&vma->combined_links.userptr, &vm->rebind_list);
+               list_del_init(&uvma->userptr.repin_link);
+               list_move_tail(&uvma->vma.combined_links.rebind, &vm->rebind_list);
        }
 
        return 0;
@@ -782,6 +782,14 @@ struct dma_fence *xe_vm_rebind(struct xe_vm *vm, bool rebind_worker)
        return fence;
 }
 
+static void xe_vma_free(struct xe_vma *vma)
+{
+       if (xe_vma_is_userptr(vma))
+               kfree(to_userptr_vma(vma));
+       else
+               kfree(vma);
+}
+
 #define VMA_CREATE_FLAG_READ_ONLY      BIT(0)
 #define VMA_CREATE_FLAG_IS_NULL                BIT(1)
 
@@ -800,14 +808,26 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
        xe_assert(vm->xe, start < end);
        xe_assert(vm->xe, end < vm->size);
 
-       if (!bo && !is_null)    /* userptr */
+       /*
+        * Allocate and ensure that the xe_vma_is_userptr() return value
+        * matches what was allocated.
+        */
+       if (!bo && !is_null) {
+               struct xe_userptr_vma *uvma = kzalloc(sizeof(*uvma), GFP_KERNEL);
+
+               if (!uvma)
+                       return ERR_PTR(-ENOMEM);
+
+               vma = &uvma->vma;
+       } else {
                vma = kzalloc(sizeof(*vma), GFP_KERNEL);
-       else
-               vma = kzalloc(sizeof(*vma) - sizeof(struct xe_userptr),
-                             GFP_KERNEL);
-       if (!vma) {
-               vma = ERR_PTR(-ENOMEM);
-               return vma;
+               if (!vma)
+                       return ERR_PTR(-ENOMEM);
+
+               if (is_null)
+                       vma->gpuva.flags |= DRM_GPUVA_SPARSE;
+               if (bo)
+                       vma->gpuva.gem.obj = &bo->ttm.base;
        }
 
        INIT_LIST_HEAD(&vma->combined_links.rebind);
@@ -818,8 +838,6 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
        vma->gpuva.va.range = end - start + 1;
        if (read_only)
                vma->gpuva.flags |= XE_VMA_READ_ONLY;
-       if (is_null)
-               vma->gpuva.flags |= DRM_GPUVA_SPARSE;
 
        for_each_tile(tile, vm->xe, id)
                vma->tile_mask |= 0x1 << id;
@@ -836,35 +854,35 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
 
                vm_bo = drm_gpuvm_bo_obtain(vma->gpuva.vm, &bo->ttm.base);
                if (IS_ERR(vm_bo)) {
-                       kfree(vma);
+                       xe_vma_free(vma);
                        return ERR_CAST(vm_bo);
                }
 
                drm_gpuvm_bo_extobj_add(vm_bo);
                drm_gem_object_get(&bo->ttm.base);
-               vma->gpuva.gem.obj = &bo->ttm.base;
                vma->gpuva.gem.offset = bo_offset_or_userptr;
                drm_gpuva_link(&vma->gpuva, vm_bo);
                drm_gpuvm_bo_put(vm_bo);
        } else /* userptr or null */ {
                if (!is_null) {
+                       struct xe_userptr *userptr = &to_userptr_vma(vma)->userptr;
                        u64 size = end - start + 1;
                        int err;
 
-                       INIT_LIST_HEAD(&vma->userptr.invalidate_link);
+                       INIT_LIST_HEAD(&userptr->invalidate_link);
+                       INIT_LIST_HEAD(&userptr->repin_link);
                        vma->gpuva.gem.offset = bo_offset_or_userptr;
 
-                       err = mmu_interval_notifier_insert(&vma->userptr.notifier,
+                       err = mmu_interval_notifier_insert(&userptr->notifier,
                                                           current->mm,
                                                           xe_vma_userptr(vma), size,
                                                           &vma_userptr_notifier_ops);
                        if (err) {
-                               kfree(vma);
-                               vma = ERR_PTR(err);
-                               return vma;
+                               xe_vma_free(vma);
+                               return ERR_PTR(err);
                        }
 
-                       vma->userptr.notifier_seq = LONG_MAX;
+                       userptr->notifier_seq = LONG_MAX;
                }
 
                xe_vm_get(vm);
@@ -879,14 +897,21 @@ static void xe_vma_destroy_late(struct xe_vma *vma)
        struct xe_device *xe = vm->xe;
        bool read_only = xe_vma_read_only(vma);
 
+       if (vma->ufence) {
+               xe_sync_ufence_put(vma->ufence);
+               vma->ufence = NULL;
+       }
+
        if (xe_vma_is_userptr(vma)) {
-               if (vma->userptr.sg) {
+               struct xe_userptr *userptr = &to_userptr_vma(vma)->userptr;
+
+               if (userptr->sg) {
                        dma_unmap_sgtable(xe->drm.dev,
-                                         vma->userptr.sg,
+                                         userptr->sg,
                                          read_only ? DMA_TO_DEVICE :
                                          DMA_BIDIRECTIONAL, 0);
-                       sg_free_table(vma->userptr.sg);
-                       vma->userptr.sg = NULL;
+                       sg_free_table(userptr->sg);
+                       userptr->sg = NULL;
                }
 
                /*
@@ -894,7 +919,7 @@ static void xe_vma_destroy_late(struct xe_vma *vma)
                 * the notifier until we're sure the GPU is not accessing
                 * them anymore
                 */
-               mmu_interval_notifier_remove(&vma->userptr.notifier);
+               mmu_interval_notifier_remove(&userptr->notifier);
                xe_vm_put(vm);
        } else if (xe_vma_is_null(vma)) {
                xe_vm_put(vm);
@@ -902,7 +927,7 @@ static void xe_vma_destroy_late(struct xe_vma *vma)
                xe_bo_put(xe_vma_bo(vma));
        }
 
-       kfree(vma);
+       xe_vma_free(vma);
 }
 
 static void vma_destroy_work_func(struct work_struct *w)
@@ -933,7 +958,7 @@ static void xe_vma_destroy(struct xe_vma *vma, struct dma_fence *fence)
                xe_assert(vm->xe, vma->gpuva.flags & XE_VMA_DESTROYED);
 
                spin_lock(&vm->userptr.invalidated_lock);
-               list_del(&vma->userptr.invalidate_link);
+               list_del(&to_userptr_vma(vma)->userptr.invalidate_link);
                spin_unlock(&vm->userptr.invalidated_lock);
        } else if (!xe_vma_is_null(vma)) {
                xe_bo_assert_held(xe_vma_bo(vma));
@@ -975,9 +1000,16 @@ int xe_vm_prepare_vma(struct drm_exec *exec, struct xe_vma *vma,
        int err;
 
        XE_WARN_ON(!vm);
-       err = drm_exec_prepare_obj(exec, xe_vm_obj(vm), num_shared);
-       if (!err && bo && !bo->vm)
-               err = drm_exec_prepare_obj(exec, &bo->ttm.base, num_shared);
+       if (num_shared)
+               err = drm_exec_prepare_obj(exec, xe_vm_obj(vm), num_shared);
+       else
+               err = drm_exec_lock_obj(exec, xe_vm_obj(vm));
+       if (!err && bo && !bo->vm) {
+               if (num_shared)
+                       err = drm_exec_prepare_obj(exec, &bo->ttm.base, num_shared);
+               else
+                       err = drm_exec_lock_obj(exec, &bo->ttm.base);
+       }
 
        return err;
 }
@@ -1581,6 +1613,16 @@ xe_vm_unbind_vma(struct xe_vma *vma, struct xe_exec_queue *q,
 
        trace_xe_vma_unbind(vma);
 
+       if (vma->ufence) {
+               struct xe_user_fence * const f = vma->ufence;
+
+               if (!xe_sync_ufence_get_status(f))
+                       return ERR_PTR(-EBUSY);
+
+               vma->ufence = NULL;
+               xe_sync_ufence_put(f);
+       }
+
        if (number_tiles > 1) {
                fences = kmalloc_array(number_tiles, sizeof(*fences),
                                       GFP_KERNEL);
@@ -1714,6 +1756,21 @@ err_fences:
        return ERR_PTR(err);
 }
 
+static struct xe_user_fence *
+find_ufence_get(struct xe_sync_entry *syncs, u32 num_syncs)
+{
+       unsigned int i;
+
+       for (i = 0; i < num_syncs; i++) {
+               struct xe_sync_entry *e = &syncs[i];
+
+               if (xe_sync_is_ufence(e))
+                       return xe_sync_ufence_get(e);
+       }
+
+       return NULL;
+}
+
 static int __xe_vm_bind(struct xe_vm *vm, struct xe_vma *vma,
                        struct xe_exec_queue *q, struct xe_sync_entry *syncs,
                        u32 num_syncs, bool immediate, bool first_op,
@@ -1721,9 +1778,16 @@ static int __xe_vm_bind(struct xe_vm *vm, struct xe_vma *vma,
 {
        struct dma_fence *fence;
        struct xe_exec_queue *wait_exec_queue = to_wait_exec_queue(vm, q);
+       struct xe_user_fence *ufence;
 
        xe_vm_assert_held(vm);
 
+       ufence = find_ufence_get(syncs, num_syncs);
+       if (vma->ufence && ufence)
+               xe_sync_ufence_put(vma->ufence);
+
+       vma->ufence = ufence ?: vma->ufence;
+
        if (immediate) {
                fence = xe_vm_bind_vma(vma, q, syncs, num_syncs, first_op,
                                       last_op);
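
The "vma->ufence = ufence ?: vma->ufence" assignment above uses GCC's
binary ?: extension (a ?: b evaluates to a ? a : b): keep the newly
supplied user fence if there is one, otherwise keep the existing one.
A minimal standalone sketch of the same idiom:

#include <stdio.h>

int main(void)
{
        const char *existing = "old-fence";
        const char *incoming = NULL;

        /* GNU extension: incoming if non-NULL, otherwise existing */
        const char *kept = incoming ?: existing;

        printf("%s\n", kept);   /* prints "old-fence" */
        return 0;
}
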
@@ -1855,10 +1919,8 @@ int xe_vm_create_ioctl(struct drm_device *dev, void *data,
        mutex_lock(&xef->vm.lock);
        err = xa_alloc(&xef->vm.xa, &id, vm, xa_limit_32b, GFP_KERNEL);
        mutex_unlock(&xef->vm.lock);
-       if (err) {
-               xe_vm_close_and_put(vm);
-               return err;
-       }
+       if (err)
+               goto err_close_and_put;
 
        if (xe->info.has_asid) {
                mutex_lock(&xe->usm.lock);
@@ -1866,11 +1928,9 @@ int xe_vm_create_ioctl(struct drm_device *dev, void *data,
                                      XA_LIMIT(1, XE_MAX_ASID - 1),
                                      &xe->usm.next_asid, GFP_KERNEL);
                mutex_unlock(&xe->usm.lock);
-               if (err < 0) {
-                       xe_vm_close_and_put(vm);
-                       return err;
-               }
-               err = 0;
+               if (err < 0)
+                       goto err_free_id;
+
                vm->usm.asid = asid;
        }
 
@@ -1888,6 +1948,15 @@ int xe_vm_create_ioctl(struct drm_device *dev, void *data,
 #endif
 
        return 0;
+
+err_free_id:
+       mutex_lock(&xef->vm.lock);
+       xa_erase(&xef->vm.xa, id);
+       mutex_unlock(&xef->vm.lock);
+err_close_and_put:
+       xe_vm_close_and_put(vm);
+
+       return err;
 }
 
 int xe_vm_destroy_ioctl(struct drm_device *dev, void *data,
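
The reworked ioctl above unwinds through labels in reverse order of
setup, the usual kernel error-handling shape. A hedged userspace sketch
of that shape, with hypothetical resources standing in for the xarray
and ASID allocations:

#include <stdlib.h>

static int setup_two(void **a, void **b)
{
        int err = -1;   /* stand-in error code */

        *a = malloc(16);
        if (!*a)
                return err;

        *b = malloc(16);
        if (!*b)
                goto err_free_a;        /* undo only what succeeded */

        return 0;

err_free_a:
        free(*a);
        *a = NULL;
        return err;
}

int main(void)
{
        void *a, *b;

        if (!setup_two(&a, &b)) {
                free(b);
                free(a);
        }
        return 0;
}
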
@@ -1954,6 +2023,7 @@ static int xe_vm_prefetch(struct xe_vm *vm, struct xe_vma *vma,
                                        xe_exec_queue_last_fence_get(wait_exec_queue, vm);
 
                                xe_sync_entry_signal(&syncs[i], NULL, fence);
+                               dma_fence_put(fence);
                        }
                }
 
@@ -2034,7 +2104,6 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
        struct drm_gem_object *obj = bo ? &bo->ttm.base : NULL;
        struct drm_gpuva_ops *ops;
        struct drm_gpuva_op *__op;
-       struct xe_vma_op *op;
        struct drm_gpuvm_bo *vm_bo;
        int err;
 
@@ -2081,23 +2150,10 @@ vm_bind_ioctl_ops_create(struct xe_vm *vm, struct xe_bo *bo,
        if (IS_ERR(ops))
                return ops;
 
-#ifdef TEST_VM_ASYNC_OPS_ERROR
-       if (operation & FORCE_ASYNC_OP_ERROR) {
-               op = list_first_entry_or_null(&ops->list, struct xe_vma_op,
-                                             base.entry);
-               if (op)
-                       op->inject_error = true;
-       }
-#endif
-
        drm_gpuva_for_each_op(__op, ops) {
                struct xe_vma_op *op = gpuva_op_to_vma_op(__op);
 
                if (__op->op == DRM_GPUVA_OP_MAP) {
-                       op->map.immediate =
-                               flags & DRM_XE_VM_BIND_FLAG_IMMEDIATE;
-                       op->map.read_only =
-                               flags & DRM_XE_VM_BIND_FLAG_READONLY;
                        op->map.is_null = flags & DRM_XE_VM_BIND_FLAG_NULL;
                        op->map.pat_index = pat_index;
                } else if (__op->op == DRM_GPUVA_OP_PREFETCH) {
@@ -2145,7 +2201,7 @@ static struct xe_vma *new_vma(struct xe_vm *vm, struct drm_gpuva_op_map *op,
                drm_exec_fini(&exec);
 
        if (xe_vma_is_userptr(vma)) {
-               err = xe_vma_userptr_pin_pages(vma);
+               err = xe_vma_userptr_pin_pages(to_userptr_vma(vma));
                if (err) {
                        prep_vma_destroy(vm, vma, false);
                        xe_vma_destroy_unlocked(vma);
@@ -2167,13 +2223,17 @@ static u64 xe_vma_max_pte_size(struct xe_vma *vma)
 {
        if (vma->gpuva.flags & XE_VMA_PTE_1G)
                return SZ_1G;
-       else if (vma->gpuva.flags & XE_VMA_PTE_2M)
+       else if (vma->gpuva.flags & (XE_VMA_PTE_2M | XE_VMA_PTE_COMPACT))
                return SZ_2M;
+       else if (vma->gpuva.flags & XE_VMA_PTE_64K)
+               return SZ_64K;
+       else if (vma->gpuva.flags & XE_VMA_PTE_4K)
+               return SZ_4K;
 
-       return SZ_4K;
+       return SZ_1G;   /* Uninitialized, use max size */
 }
 
-static u64 xe_vma_set_pte_size(struct xe_vma *vma, u64 size)
+static void xe_vma_set_pte_size(struct xe_vma *vma, u64 size)
 {
        switch (size) {
        case SZ_1G:
@@ -2182,9 +2242,13 @@ static u64 xe_vma_set_pte_size(struct xe_vma *vma, u64 size)
        case SZ_2M:
                vma->gpuva.flags |= XE_VMA_PTE_2M;
                break;
+       case SZ_64K:
+               vma->gpuva.flags |= XE_VMA_PTE_64K;
+               break;
+       case SZ_4K:
+               vma->gpuva.flags |= XE_VMA_PTE_4K;
+               break;
        }
-
-       return SZ_4K;
 }
 
 static int xe_vma_op_commit(struct xe_vm *vm, struct xe_vma_op *op)
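
xe_vma_max_pte_size() and xe_vma_set_pte_size() above round-trip a page
size through the XE_VMA_PTE_* bits, falling back to the largest size
when no bit is set. A standalone sketch with illustrative constants
(the real flags live in the gpuva user bits):

#include <stdint.h>
#include <stdio.h>

#define PTE_4K  (1u << 0)
#define PTE_64K (1u << 1)
#define PTE_2M  (1u << 2)
#define PTE_1G  (1u << 3)

static uint64_t max_pte_size(uint32_t flags)
{
        if (flags & PTE_1G)
                return 1ull << 30;
        if (flags & PTE_2M)
                return 2ull << 20;
        if (flags & PTE_64K)
                return 64ull << 10;
        if (flags & PTE_4K)
                return 4ull << 10;
        return 1ull << 30;      /* uninitialized: assume the max */
}

int main(void)
{
        /* the largest set bit wins: 64K here, not 4K */
        printf("%llu\n", (unsigned long long)max_pte_size(PTE_64K | PTE_4K));
        return 0;
}
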
@@ -2282,8 +2346,6 @@ static int vm_bind_ioctl_ops_parse(struct xe_vm *vm, struct xe_exec_queue *q,
                switch (op->base.op) {
                case DRM_GPUVA_OP_MAP:
                {
-                       flags |= op->map.read_only ?
-                               VMA_CREATE_FLAG_READ_ONLY : 0;
                        flags |= op->map.is_null ?
                                VMA_CREATE_FLAG_IS_NULL : 0;
 
@@ -2414,7 +2476,7 @@ static int op_execute(struct drm_exec *exec, struct xe_vm *vm,
        case DRM_GPUVA_OP_MAP:
                err = xe_vm_bind(vm, vma, op->q, xe_vma_bo(vma),
                                 op->syncs, op->num_syncs,
-                                op->map.immediate || !xe_vm_in_fault_mode(vm),
+                                !xe_vm_in_fault_mode(vm),
                                 op->flags & XE_VMA_OP_FIRST,
                                 op->flags & XE_VMA_OP_LAST);
                break;
@@ -2500,13 +2562,25 @@ retry_userptr:
        }
        drm_exec_fini(&exec);
 
-       if (err == -EAGAIN && xe_vma_is_userptr(vma)) {
+       if (err == -EAGAIN) {
                lockdep_assert_held_write(&vm->lock);
-               err = xe_vma_userptr_pin_pages(vma);
-               if (!err)
-                       goto retry_userptr;
 
-               trace_xe_vma_fail(vma);
+               if (op->base.op == DRM_GPUVA_OP_REMAP) {
+                       if (!op->remap.unmap_done)
+                               vma = gpuva_to_vma(op->base.remap.unmap->va);
+                       else if (op->remap.prev)
+                               vma = op->remap.prev;
+                       else
+                               vma = op->remap.next;
+               }
+
+               if (xe_vma_is_userptr(vma)) {
+                       err = xe_vma_userptr_pin_pages(to_userptr_vma(vma));
+                       if (!err)
+                               goto retry_userptr;
+
+                       trace_xe_vma_fail(vma);
+               }
        }
 
        return err;
@@ -2518,13 +2592,6 @@ static int xe_vma_op_execute(struct xe_vm *vm, struct xe_vma_op *op)
 
        lockdep_assert_held_write(&vm->lock);
 
-#ifdef TEST_VM_ASYNC_OPS_ERROR
-       if (op->inject_error) {
-               op->inject_error = false;
-               return -ENOMEM;
-       }
-#endif
-
        switch (op->base.op) {
        case DRM_GPUVA_OP_MAP:
                ret = __xe_vma_op_execute(vm, op->map.vma, op);
@@ -2639,7 +2706,7 @@ static void vm_bind_ioctl_ops_unwind(struct xe_vm *vm,
 {
        int i;
 
-       for (i = num_ops_list - 1; i; ++i) {
+       for (i = num_ops_list - 1; i >= 0; --i) {
                struct drm_gpuva_ops *__ops = ops[i];
                struct drm_gpuva_op *__op;
 
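
The loop fixed above previously read "for (i = num_ops_list - 1; i; ++i)":
it increments while testing for non-zero, so it never visits index 0 and
walks upward past the end of the array. The corrected reverse walk, in
isolation:

#include <stdio.h>

int main(void)
{
        int n = 3;

        for (int i = n - 1; i >= 0; --i)        /* visits 2, 1, 0 */
                printf("%d ", i);
        printf("\n");
        return 0;
}
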
@@ -2684,21 +2751,11 @@ static int vm_bind_ioctl_ops_execute(struct xe_vm *vm,
        return 0;
 }
 
-#ifdef TEST_VM_ASYNC_OPS_ERROR
-#define SUPPORTED_FLAGS        \
-       (FORCE_ASYNC_OP_ERROR | DRM_XE_VM_BIND_FLAG_READONLY | \
-        DRM_XE_VM_BIND_FLAG_IMMEDIATE | DRM_XE_VM_BIND_FLAG_NULL | 0xffff)
-#else
-#define SUPPORTED_FLAGS        \
-       (DRM_XE_VM_BIND_FLAG_READONLY | \
-        DRM_XE_VM_BIND_FLAG_IMMEDIATE | DRM_XE_VM_BIND_FLAG_NULL | \
-        0xffff)
-#endif
+#define SUPPORTED_FLAGS        (DRM_XE_VM_BIND_FLAG_NULL | \
+        DRM_XE_VM_BIND_FLAG_DUMPABLE)
 #define XE_64K_PAGE_MASK 0xffffull
 #define ALL_DRM_XE_SYNCS_FLAGS (DRM_XE_SYNCS_FLAG_WAIT_FOR_OP)
 
-#define MAX_BINDS      512     /* FIXME: Picking random upper limit */
-
 static int vm_bind_ioctl_check_args(struct xe_device *xe,
                                    struct drm_xe_vm_bind *args,
                                    struct drm_xe_vm_bind_op **bind_ops)
@@ -2710,16 +2767,16 @@ static int vm_bind_ioctl_check_args(struct xe_device *xe,
            XE_IOCTL_DBG(xe, args->reserved[0] || args->reserved[1]))
                return -EINVAL;
 
-       if (XE_IOCTL_DBG(xe, args->extensions) ||
-           XE_IOCTL_DBG(xe, args->num_binds > MAX_BINDS))
+       if (XE_IOCTL_DBG(xe, args->extensions))
                return -EINVAL;
 
        if (args->num_binds > 1) {
                u64 __user *bind_user =
                        u64_to_user_ptr(args->vector_of_binds);
 
-               *bind_ops = kmalloc(sizeof(struct drm_xe_vm_bind_op) *
-                                   args->num_binds, GFP_KERNEL);
+               *bind_ops = kvmalloc_array(args->num_binds,
+                                          sizeof(struct drm_xe_vm_bind_op),
+                                          GFP_KERNEL | __GFP_ACCOUNT);
                if (!*bind_ops)
                        return -ENOMEM;
 
@@ -2809,7 +2866,7 @@ static int vm_bind_ioctl_check_args(struct xe_device *xe,
 
 free_bind_ops:
        if (args->num_binds > 1)
-               kfree(*bind_ops);
+               kvfree(*bind_ops);
        return err;
 }
 
@@ -2846,7 +2903,7 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
        struct drm_gpuva_ops **ops = NULL;
        struct xe_vm *vm;
        struct xe_exec_queue *q = NULL;
-       u32 num_syncs;
+       u32 num_syncs, num_ufence = 0;
        struct xe_sync_entry *syncs = NULL;
        struct drm_xe_vm_bind_op *bind_ops;
        LIST_HEAD(ops_list);
@@ -2897,13 +2954,15 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
        }
 
        if (args->num_binds) {
-               bos = kcalloc(args->num_binds, sizeof(*bos), GFP_KERNEL);
+               bos = kvcalloc(args->num_binds, sizeof(*bos),
+                              GFP_KERNEL | __GFP_ACCOUNT);
                if (!bos) {
                        err = -ENOMEM;
                        goto release_vm_lock;
                }
 
-               ops = kcalloc(args->num_binds, sizeof(*ops), GFP_KERNEL);
+               ops = kvcalloc(args->num_binds, sizeof(*ops),
+                              GFP_KERNEL | __GFP_ACCOUNT);
                if (!ops) {
                        err = -ENOMEM;
                        goto release_vm_lock;
@@ -2983,6 +3042,14 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
                                           SYNC_PARSE_FLAG_DISALLOW_USER_FENCE : 0));
                if (err)
                        goto free_syncs;
+
+               if (xe_sync_is_ufence(&syncs[num_syncs]))
+                       num_ufence++;
+       }
+
+       if (XE_IOCTL_DBG(xe, num_ufence > 1)) {
+               err = -EINVAL;
+               goto free_syncs;
        }
 
        if (!args->num_binds) {
@@ -3036,10 +3103,10 @@ int xe_vm_bind_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
        for (i = 0; bos && i < args->num_binds; ++i)
                xe_bo_put(bos[i]);
 
-       kfree(bos);
-       kfree(ops);
+       kvfree(bos);
+       kvfree(ops);
        if (args->num_binds > 1)
-               kfree(bind_ops);
+               kvfree(bind_ops);
 
        return err;
 
@@ -3063,10 +3130,10 @@ put_exec_queue:
        if (q)
                xe_exec_queue_put(q);
 free_objs:
-       kfree(bos);
-       kfree(ops);
+       kvfree(bos);
+       kvfree(ops);
        if (args->num_binds > 1)
-               kfree(bind_ops);
+               kvfree(bind_ops);
        return err;
 }
 
@@ -3125,8 +3192,8 @@ int xe_vm_invalidate_vma(struct xe_vma *vma)
        if (IS_ENABLED(CONFIG_PROVE_LOCKING)) {
                if (xe_vma_is_userptr(vma)) {
                        WARN_ON_ONCE(!mmu_interval_check_retry
-                                    (&vma->userptr.notifier,
-                                     vma->userptr.notifier_seq));
+                                    (&to_userptr_vma(vma)->userptr.notifier,
+                                     to_userptr_vma(vma)->userptr.notifier_seq));
                        WARN_ON_ONCE(!dma_resv_test_signaled(xe_vm_resv(xe_vma_vm(vma)),
                                                             DMA_RESV_USAGE_BOOKKEEP));
 
@@ -3187,11 +3254,11 @@ int xe_analyze_vm(struct drm_printer *p, struct xe_vm *vm, int gt_id)
                if (is_null) {
                        addr = 0;
                } else if (is_userptr) {
+                       struct sg_table *sg = to_userptr_vma(vma)->userptr.sg;
                        struct xe_res_cursor cur;
 
-                       if (vma->userptr.sg) {
-                               xe_res_first_sg(vma->userptr.sg, 0, XE_PAGE_SIZE,
-                                               &cur);
+                       if (sg) {
+                               xe_res_first_sg(sg, 0, XE_PAGE_SIZE, &cur);
                                addr = xe_res_dma(&cur);
                        } else {
                                addr = 0;
index cf2f96e8c1ab92245b69dd8853c90d5e128262fd..9654a0612fc258d0ba7395ba7c7fd87899caf904 100644
@@ -160,6 +160,18 @@ static inline bool xe_vma_is_userptr(struct xe_vma *vma)
        return xe_vma_has_no_bo(vma) && !xe_vma_is_null(vma);
 }
 
+/**
+ * to_userptr_vma() - Return a pointer to an embedding userptr vma
+ * @vma: Pointer to the embedded struct xe_vma
+ *
+ * Return: Pointer to the embedding userptr vma
+ */
+static inline struct xe_userptr_vma *to_userptr_vma(struct xe_vma *vma)
+{
+       xe_assert(xe_vma_vm(vma)->xe, xe_vma_is_userptr(vma));
+       return container_of(vma, struct xe_userptr_vma, vma);
+}
+
 u64 xe_vm_pdp4_descriptor(struct xe_vm *vm, struct xe_tile *tile);
 
 int xe_vm_create_ioctl(struct drm_device *dev, void *data,
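
to_userptr_vma() above recovers the enclosing subclass from the embedded
base struct via container_of(), matching the split alloc/free paths in
xe_vma_create() and xe_vma_free(). A self-contained userspace sketch of
the pattern, with illustrative types rather than the driver's:

#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

struct base { int is_user; };

struct user_sub {
        struct base base;       /* must stay the embedded member */
        int extra;
};

static struct base *obj_alloc(int is_user)
{
        if (is_user) {
                struct user_sub *s = calloc(1, sizeof(*s));

                if (!s)
                        return NULL;
                s->base.is_user = 1;
                return &s->base;        /* hand out the base pointer */
        }
        return calloc(1, sizeof(struct base));
}

static void obj_free(struct base *b)
{
        /* free() must see the pointer that calloc() returned */
        if (b->is_user)
                free(container_of(b, struct user_sub, base));
        else
                free(b);
}

int main(void)
{
        struct base *b = obj_alloc(1);

        if (!b)
                return 1;
        container_of(b, struct user_sub, base)->extra = 42;
        printf("%d\n", container_of(b, struct user_sub, base)->extra);
        obj_free(b);
        return 0;
}
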
@@ -224,9 +236,9 @@ static inline void xe_vm_reactivate_rebind(struct xe_vm *vm)
        }
 }
 
-int xe_vma_userptr_pin_pages(struct xe_vma *vma);
+int xe_vma_userptr_pin_pages(struct xe_userptr_vma *uvma);
 
-int xe_vma_userptr_check_repin(struct xe_vma *vma);
+int xe_vma_userptr_check_repin(struct xe_userptr_vma *uvma);
 
 bool xe_vm_validate_should_retry(struct drm_exec *exec, int err, ktime_t *end);
 
index 63e8a50b88e94980d65a0800817235c518adcd69..7300eea5394ba8c1ece10dba63314bf733ee5157 100644
 
 struct xe_bo;
 struct xe_sync_entry;
+struct xe_user_fence;
 struct xe_vm;
 
-#define TEST_VM_ASYNC_OPS_ERROR
-#define FORCE_ASYNC_OP_ERROR   BIT(31)
-
 #define XE_VMA_READ_ONLY       DRM_GPUVA_USERBITS
 #define XE_VMA_DESTROYED       (DRM_GPUVA_USERBITS << 1)
 #define XE_VMA_ATOMIC_PTE_BIT  (DRM_GPUVA_USERBITS << 2)
@@ -32,11 +30,15 @@ struct xe_vm;
 #define XE_VMA_PTE_4K          (DRM_GPUVA_USERBITS << 5)
 #define XE_VMA_PTE_2M          (DRM_GPUVA_USERBITS << 6)
 #define XE_VMA_PTE_1G          (DRM_GPUVA_USERBITS << 7)
+#define XE_VMA_PTE_64K         (DRM_GPUVA_USERBITS << 8)
+#define XE_VMA_PTE_COMPACT     (DRM_GPUVA_USERBITS << 9)
 
 /** struct xe_userptr - User pointer */
 struct xe_userptr {
        /** @invalidate_link: Link for the vm::userptr.invalidated list */
        struct list_head invalidate_link;
+       /** @repin_link: link into VM repin list if userptr. */
+       struct list_head repin_link;
        /**
         * @notifier: MMU notifier for user pointer (invalidation call back)
         */
@@ -68,8 +70,6 @@ struct xe_vma {
         * resv.
         */
        union {
-               /** @userptr: link into VM repin list if userptr. */
-               struct list_head userptr;
                /** @rebind: link into VM if this VMA needs rebinding. */
                struct list_head rebind;
                /** @destroy: link to contested list when VM is being closed. */
@@ -107,9 +107,19 @@ struct xe_vma {
        u16 pat_index;
 
        /**
-        * @userptr: user pointer state, only allocated for VMAs that are
-        * user pointers
+        * @ufence: The user fence that was provided with MAP.
+        * Needs to be signalled before UNMAP can be processed.
         */
+       struct xe_user_fence *ufence;
+};
+
+/**
+ * struct xe_userptr_vma - A userptr vma subclass
+ * @vma: The vma.
+ * @userptr: Additional userptr information.
+ */
+struct xe_userptr_vma {
+       struct xe_vma vma;
        struct xe_userptr userptr;
 };
 
@@ -285,10 +295,6 @@ struct xe_vm {
 struct xe_vma_op_map {
        /** @vma: VMA to map */
        struct xe_vma *vma;
-       /** @immediate: Immediate bind */
-       bool immediate;
-       /** @read_only: Read only */
-       bool read_only;
        /** @is_null: is NULL binding */
        bool is_null;
        /** @pat_index: The pat index to use for this operation. */
@@ -356,11 +362,6 @@ struct xe_vma_op {
        /** @flags: operation flags */
        enum xe_vma_op_flags flags;
 
-#ifdef TEST_VM_ASYNC_OPS_ERROR
-       /** @inject_error: inject error to test async op error handling */
-       bool inject_error;
-#endif
-
        union {
                /** @map: VMA map operation specific data */
                struct xe_vma_op_map map;
index 42fd504abbcda248e67fd84a64e2f96a2609b4cb..89983d7d73ca1539c19ff4a511c0c179bd07ed91 100644
@@ -169,6 +169,7 @@ static const struct host1x_info host1x06_info = {
        .num_sid_entries = ARRAY_SIZE(tegra186_sid_table),
        .sid_table = tegra186_sid_table,
        .reserve_vblank_syncpts = false,
+       .skip_reset_assert = true,
 };
 
 static const struct host1x_sid_entry tegra194_sid_table[] = {
@@ -680,13 +681,15 @@ static int __maybe_unused host1x_runtime_suspend(struct device *dev)
        host1x_intr_stop(host);
        host1x_syncpt_save(host);
 
-       err = reset_control_bulk_assert(host->nresets, host->resets);
-       if (err) {
-               dev_err(dev, "failed to assert reset: %d\n", err);
-               goto resume_host1x;
-       }
+       if (!host->info->skip_reset_assert) {
+               err = reset_control_bulk_assert(host->nresets, host->resets);
+               if (err) {
+                       dev_err(dev, "failed to assert reset: %d\n", err);
+                       goto resume_host1x;
+               }
 
-       usleep_range(1000, 2000);
+               usleep_range(1000, 2000);
+       }
 
        clk_disable_unprepare(host->clk);
        reset_control_bulk_release(host->nresets, host->resets);
index c8e302de76257008aa3fb172da0c3acd4412c572..925a118db23f5751cbbe50db317e98fd4c543414 100644
@@ -116,6 +116,12 @@ struct host1x_info {
         * the display driver disables VBLANK increments.
         */
        bool reserve_vblank_syncpts;
+       /*
+        * On Tegra186, secure world applications may require access to
+        * host1x during suspend/resume. To allow this, host1x must be
+        * left out of reset.
+        */
+       bool skip_reset_assert;
 };
 
 struct host1x {
index d9ef45fcaeab1380967fe2fa2357411d2bc913d4..470ae2c29c94f25b66127827b725da24b41e101b 100644
@@ -143,6 +143,9 @@ u8 *call_hid_bpf_rdesc_fixup(struct hid_device *hdev, u8 *rdesc, unsigned int *s
 }
 EXPORT_SYMBOL_GPL(call_hid_bpf_rdesc_fixup);
 
+/* Disables missing prototype warnings */
+__bpf_kfunc_start_defs();
+
 /**
  * hid_bpf_get_data - Get the kernel memory pointer associated with the context @ctx
  *
@@ -152,7 +155,7 @@ EXPORT_SYMBOL_GPL(call_hid_bpf_rdesc_fixup);
  *
  * @returns %NULL on error, an %__u8 memory pointer on success
  */
-noinline __u8 *
+__bpf_kfunc __u8 *
 hid_bpf_get_data(struct hid_bpf_ctx *ctx, unsigned int offset, const size_t rdwr_buf_size)
 {
        struct hid_bpf_ctx_kern *ctx_kern;
@@ -167,6 +170,7 @@ hid_bpf_get_data(struct hid_bpf_ctx *ctx, unsigned int offset, const size_t rdwr
 
        return ctx_kern->data + offset;
 }
+__bpf_kfunc_end_defs();
 
 /*
  * The following set contains all functions we agree BPF programs
@@ -241,6 +245,42 @@ int hid_bpf_reconnect(struct hid_device *hdev)
        return 0;
 }
 
+static int do_hid_bpf_attach_prog(struct hid_device *hdev, int prog_fd, struct bpf_prog *prog,
+                                 __u32 flags)
+{
+       int fd, err, prog_type;
+
+       prog_type = hid_bpf_get_prog_attach_type(prog);
+       if (prog_type < 0)
+               return prog_type;
+
+       if (prog_type >= HID_BPF_PROG_TYPE_MAX)
+               return -EINVAL;
+
+       if (prog_type == HID_BPF_PROG_TYPE_DEVICE_EVENT) {
+               err = hid_bpf_allocate_event_data(hdev);
+               if (err)
+                       return err;
+       }
+
+       fd = __hid_bpf_attach_prog(hdev, prog_type, prog_fd, prog, flags);
+       if (fd < 0)
+               return fd;
+
+       if (prog_type == HID_BPF_PROG_TYPE_RDESC_FIXUP) {
+               err = hid_bpf_reconnect(hdev);
+               if (err) {
+                       close_fd(fd);
+                       return err;
+               }
+       }
+
+       return fd;
+}
+
+/* Disables missing prototype warnings */
+__bpf_kfunc_start_defs();
+
 /**
  * hid_bpf_attach_prog - Attach the given @prog_fd to the given HID device
  *
@@ -253,22 +293,17 @@ int hid_bpf_reconnect(struct hid_device *hdev)
  * is pinned to the BPF file system).
  */
 /* called from syscall */
-noinline int
+__bpf_kfunc int
 hid_bpf_attach_prog(unsigned int hid_id, int prog_fd, __u32 flags)
 {
        struct hid_device *hdev;
+       struct bpf_prog *prog;
        struct device *dev;
-       int fd, err, prog_type = hid_bpf_get_prog_attach_type(prog_fd);
+       int err, fd;
 
        if (!hid_bpf_ops)
                return -EINVAL;
 
-       if (prog_type < 0)
-               return prog_type;
-
-       if (prog_type >= HID_BPF_PROG_TYPE_MAX)
-               return -EINVAL;
-
        if ((flags & ~HID_BPF_FLAG_MASK))
                return -EINVAL;
 
@@ -278,25 +313,29 @@ hid_bpf_attach_prog(unsigned int hid_id, int prog_fd, __u32 flags)
 
        hdev = to_hid_device(dev);
 
-       if (prog_type == HID_BPF_PROG_TYPE_DEVICE_EVENT) {
-               err = hid_bpf_allocate_event_data(hdev);
-               if (err)
-                       return err;
+       /*
+        * take a ref on the prog itself, it will be released
+        * on an error or when the prog is detached
+        */
+       prog = bpf_prog_get(prog_fd);
+       if (IS_ERR(prog)) {
+               err = PTR_ERR(prog);
+               goto out_dev_put;
        }
 
-       fd = __hid_bpf_attach_prog(hdev, prog_type, prog_fd, flags);
-       if (fd < 0)
-               return fd;
-
-       if (prog_type == HID_BPF_PROG_TYPE_RDESC_FIXUP) {
-               err = hid_bpf_reconnect(hdev);
-               if (err) {
-                       close_fd(fd);
-                       return err;
-               }
+       fd = do_hid_bpf_attach_prog(hdev, prog_fd, prog, flags);
+       if (fd < 0) {
+               err = fd;
+               goto out_prog_put;
        }
 
        return fd;
+
+ out_prog_put:
+       bpf_prog_put(prog);
+ out_dev_put:
+       put_device(dev);
+       return err;
 }
 
 /**
@@ -306,7 +345,7 @@ hid_bpf_attach_prog(unsigned int hid_id, int prog_fd, __u32 flags)
  *
  * @returns A pointer to &struct hid_bpf_ctx on success, %NULL on error.
  */
-noinline struct hid_bpf_ctx *
+__bpf_kfunc struct hid_bpf_ctx *
 hid_bpf_allocate_context(unsigned int hid_id)
 {
        struct hid_device *hdev;
@@ -323,8 +362,10 @@ hid_bpf_allocate_context(unsigned int hid_id)
        hdev = to_hid_device(dev);
 
        ctx_kern = kzalloc(sizeof(*ctx_kern), GFP_KERNEL);
-       if (!ctx_kern)
+       if (!ctx_kern) {
+               put_device(dev);
                return NULL;
+       }
 
        ctx_kern->ctx.hid = hdev;
 
@@ -337,14 +378,19 @@ hid_bpf_allocate_context(unsigned int hid_id)
  * @ctx: the HID-BPF context to release
  *
  */
-noinline void
+__bpf_kfunc void
 hid_bpf_release_context(struct hid_bpf_ctx *ctx)
 {
        struct hid_bpf_ctx_kern *ctx_kern;
+       struct hid_device *hid;
 
        ctx_kern = container_of(ctx, struct hid_bpf_ctx_kern, ctx);
+       hid = (struct hid_device *)ctx_kern->ctx.hid; /* ignore const */
 
        kfree(ctx_kern);
+
+       /* get_device() is called by bus_find_device() */
+       put_device(&hid->dev);
 }
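
hid_bpf_release_context() above now drops the device reference that
bus_find_device() took in hid_bpf_allocate_context(); every lookup that
grabs a reference must be paired with a put on every exit path,
including the allocation-failure path added there. A userspace sketch
of that discipline (illustrative names):

#include <stdio.h>
#include <stdlib.h>

struct dev { int refs; };

static struct dev *dev_get(struct dev *d)
{
        d->refs++;
        return d;
}

static void dev_put(struct dev *d)
{
        if (--d->refs == 0)
                printf("last reference dropped\n");
}

static void *ctx_alloc(struct dev *d)
{
        void *ctx = calloc(1, 32);

        if (!ctx) {
                dev_put(d);     /* failure path must undo the lookup ref */
                return NULL;
        }
        return ctx;
}

static void ctx_release(struct dev *d, void *ctx)
{
        free(ctx);
        dev_put(d);     /* pairs with the ref the lookup took */
}

int main(void)
{
        struct dev d = { .refs = 1 };   /* owner's reference */
        void *ctx = ctx_alloc(dev_get(&d));

        if (ctx)
                ctx_release(&d, ctx);
        dev_put(&d);    /* drop the owner's reference */
        return 0;
}
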
 
 /**
@@ -358,7 +404,7 @@ hid_bpf_release_context(struct hid_bpf_ctx *ctx)
  *
  * @returns %0 on success, a negative error code otherwise.
  */
-noinline int
+__bpf_kfunc int
 hid_bpf_hw_request(struct hid_bpf_ctx *ctx, __u8 *buf, size_t buf__sz,
                   enum hid_report_type rtype, enum hid_class_request reqtype)
 {
@@ -426,6 +472,7 @@ hid_bpf_hw_request(struct hid_bpf_ctx *ctx, __u8 *buf, size_t buf__sz,
        kfree(dma_data);
        return ret;
 }
+__bpf_kfunc_end_defs();
 
 /* our HID-BPF entrypoints */
 BTF_SET8_START(hid_bpf_fmodret_ids)
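
Taken together, the hunks above replace noinline with the __bpf_kfunc
annotations and keep the entrypoints in a BTF set. A hedged kernel-side
sketch of that overall shape for a hypothetical kfunc (the helper and
set names below are invented for illustration, not HID-BPF's):

#include <linux/errno.h>
#include <linux/btf.h>
#include <linux/btf_ids.h>

/* Disables missing prototype warnings */
__bpf_kfunc_start_defs();

/* hypothetical kfunc: reject negative input, echo it back otherwise */
__bpf_kfunc int my_echo_kfunc(int value)
{
        if (value < 0)
                return -EINVAL;
        return value;
}

__bpf_kfunc_end_defs();

BTF_SET8_START(my_kfunc_ids)
BTF_ID_FLAGS(func, my_echo_kfunc)
BTF_SET8_END(my_kfunc_ids)
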
index 63dfc8605cd21efbc5f0bdc1844e4e79d73ab3cb..fbe0639d09f2604d6a8e11833eba82480640e289 100644
@@ -12,9 +12,9 @@ struct hid_bpf_ctx_kern {
 
 int hid_bpf_preload_skel(void);
 void hid_bpf_free_links_and_skel(void);
-int hid_bpf_get_prog_attach_type(int prog_fd);
+int hid_bpf_get_prog_attach_type(struct bpf_prog *prog);
 int __hid_bpf_attach_prog(struct hid_device *hdev, enum hid_bpf_prog_type prog_type, int prog_fd,
-                         __u32 flags);
+                         struct bpf_prog *prog, __u32 flags);
 void __hid_bpf_destroy_device(struct hid_device *hdev);
 int hid_bpf_prog_run(struct hid_device *hdev, enum hid_bpf_prog_type type,
                     struct hid_bpf_ctx_kern *ctx_kern);
index eca34b7372f951fc17e156ec2cc3761282ea61e8..aa8e1c79cdf5518301e73e44038f75e6fb1173e0 100644
@@ -196,6 +196,7 @@ static void __hid_bpf_do_release_prog(int map_fd, unsigned int idx)
 static void hid_bpf_release_progs(struct work_struct *work)
 {
        int i, j, n, map_fd = -1;
+       bool hdev_destroyed;
 
        if (!jmp_table.map)
                return;
@@ -220,6 +221,12 @@ static void hid_bpf_release_progs(struct work_struct *work)
                if (entry->hdev) {
                        hdev = entry->hdev;
                        type = entry->type;
+                       /*
+                        * hdev is still valid, even if we are called after hid_destroy_device():
+                        * when hid_bpf_attach_prog() gets called, it takes a ref on the dev through
+                        * bus_find_device()
+                        */
+                       hdev_destroyed = hdev->bpf.destroyed;
 
                        hid_bpf_populate_hdev(hdev, type);
 
@@ -232,12 +239,19 @@ static void hid_bpf_release_progs(struct work_struct *work)
                                if (test_bit(next->idx, jmp_table.enabled))
                                        continue;
 
-                               if (next->hdev == hdev && next->type == type)
+                               if (next->hdev == hdev && next->type == type) {
+                                       /*
+                                        * clear the hdev reference and decrement the device ref
+                                        * that was taken during bus_find_device() while calling
+                                        * hid_bpf_attach_prog()
+                                        */
                                        next->hdev = NULL;
+                                       put_device(&hdev->dev);
+                               }
                        }
 
-                       /* if type was rdesc fixup, reconnect device */
-                       if (type == HID_BPF_PROG_TYPE_RDESC_FIXUP)
+                       /* if type was rdesc fixup and the device is not gone, reconnect device */
+                       if (type == HID_BPF_PROG_TYPE_RDESC_FIXUP && !hdev_destroyed)
                                hid_bpf_reconnect(hdev);
                }
        }
@@ -333,15 +347,10 @@ static int hid_bpf_insert_prog(int prog_fd, struct bpf_prog *prog)
        return err;
 }
 
-int hid_bpf_get_prog_attach_type(int prog_fd)
+int hid_bpf_get_prog_attach_type(struct bpf_prog *prog)
 {
-       struct bpf_prog *prog = NULL;
-       int i;
        int prog_type = HID_BPF_PROG_TYPE_UNDEF;
-
-       prog = bpf_prog_get(prog_fd);
-       if (IS_ERR(prog))
-               return PTR_ERR(prog);
+       int i;
 
        for (i = 0; i < HID_BPF_PROG_TYPE_MAX; i++) {
                if (hid_bpf_btf_ids[i] == prog->aux->attach_btf_id) {
@@ -350,8 +359,6 @@ int hid_bpf_get_prog_attach_type(int prog_fd)
                }
        }
 
-       bpf_prog_put(prog);
-
        return prog_type;
 }
 
@@ -388,19 +395,13 @@ static const struct bpf_link_ops hid_bpf_link_lops = {
 /* called from syscall */
 noinline int
 __hid_bpf_attach_prog(struct hid_device *hdev, enum hid_bpf_prog_type prog_type,
-                     int prog_fd, __u32 flags)
+                     int prog_fd, struct bpf_prog *prog, __u32 flags)
 {
        struct bpf_link_primer link_primer;
        struct hid_bpf_link *link;
-       struct bpf_prog *prog = NULL;
        struct hid_bpf_prog_entry *prog_entry;
        int cnt, err = -EINVAL, prog_table_idx = -1;
 
-       /* take a ref on the prog itself */
-       prog = bpf_prog_get(prog_fd);
-       if (IS_ERR(prog))
-               return PTR_ERR(prog);
-
        mutex_lock(&hid_bpf_attach_lock);
 
        link = kzalloc(sizeof(*link), GFP_USER);
@@ -467,7 +468,6 @@ __hid_bpf_attach_prog(struct hid_device *hdev, enum hid_bpf_prog_type prog_type,
  err_unlock:
        mutex_unlock(&hid_bpf_attach_lock);
 
-       bpf_prog_put(prog);
        kfree(link);
 
        return err;
index fb30e228d35f9a91b6bb395845584d9f073be26c..828a5c022c6407add84c44122248e0e1ea9aaa64 100644
 
 #define USB_VENDOR_ID_CIDC             0x1677
 
+#define I2C_VENDOR_ID_CIRQUE           0x0488
+#define I2C_PRODUCT_ID_CIRQUE_1063     0x1063
+
 #define USB_VENDOR_ID_CJTOUCH          0x24b8
 #define USB_DEVICE_ID_CJTOUCH_MULTI_TOUCH_0020 0x0020
 #define USB_DEVICE_ID_CJTOUCH_MULTI_TOUCH_0040 0x0040
index fd6d8f1d9b8f61992a69ce651dd379d121c2da49..d2f3f234f29dea35b2bfb37ef693ba9d6a9b8bf6 100644
@@ -203,6 +203,8 @@ struct hidpp_device {
        struct hidpp_scroll_counter vertical_wheel_counter;
 
        u8 wireless_feature_index;
+
+       bool connected_once;
 };
 
 /* HID++ 1.0 error codes */
@@ -988,8 +990,13 @@ static int hidpp_root_get_protocol_version(struct hidpp_device *hidpp)
        hidpp->protocol_minor = response.rap.params[1];
 
 print_version:
-       hid_info(hidpp->hid_dev, "HID++ %u.%u device connected.\n",
-                hidpp->protocol_major, hidpp->protocol_minor);
+       if (!hidpp->connected_once) {
+               hid_info(hidpp->hid_dev, "HID++ %u.%u device connected.\n",
+                        hidpp->protocol_major, hidpp->protocol_minor);
+               hidpp->connected_once = true;
+       } else {
+               hid_dbg(hidpp->hid_dev, "HID++ %u.%u device connected.\n",
+                       hidpp->protocol_major, hidpp->protocol_minor);
+       }
        return 0;
 }
 
@@ -4184,7 +4191,7 @@ static void hidpp_connect_event(struct work_struct *work)
        /* Get device version to check if it is connected */
        ret = hidpp_root_get_protocol_version(hidpp);
        if (ret) {
-               hid_info(hidpp->hid_dev, "Disconnected\n");
+               hid_dbg(hidpp->hid_dev, "Disconnected\n");
                if (hidpp->battery.ps) {
                        hidpp->battery.online = false;
                        hidpp->battery.status = POWER_SUPPLY_STATUS_UNKNOWN;
@@ -4610,6 +4617,8 @@ static const struct hid_device_id hidpp_devices[] = {
          HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, 0xC088) },
        { /* Logitech G Pro X Superlight Gaming Mouse over USB */
          HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, 0xC094) },
+       { /* Logitech G Pro X Superlight 2 Gaming Mouse over USB */
+         HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, 0xC09b) },
 
        { /* G935 Gaming Headset */
          HID_USB_DEVICE(USB_VENDOR_ID_LOGITECH, 0x0a87),
index fd5b0637dad683e7b20c929974c958e79936880c..3e91e4d6ba6fa335c7f5988638791d3df8d1773a 100644
@@ -2151,6 +2151,10 @@ static const struct hid_device_id mt_devices[] = {
                HID_DEVICE(BUS_I2C, HID_GROUP_MULTITOUCH_WIN_8,
                        USB_VENDOR_ID_SYNAPTICS, 0xcd7e) },
 
+       { .driver_data = MT_CLS_WIN_8_FORCE_MULTI_INPUT,
+               HID_DEVICE(BUS_I2C, HID_GROUP_MULTITOUCH_WIN_8,
+                       USB_VENDOR_ID_SYNAPTICS, 0xcddc) },
+
        { .driver_data = MT_CLS_WIN_8_FORCE_MULTI_INPUT,
                HID_DEVICE(BUS_I2C, HID_GROUP_MULTITOUCH_WIN_8,
                        USB_VENDOR_ID_SYNAPTICS, 0xce08) },
index 82d0a77359c460c9bad772038e1a129e9983e0c0..58b15750dbb0ac2cb2ad333b616dc39cdea8c779 100644
@@ -800,6 +800,8 @@ static inline int thunderstrike_led_create(struct thunderstrike *ts)
 
        led->name = devm_kasprintf(&ts->base.hdev->dev, GFP_KERNEL,
                                   "thunderstrike%d:blue:led", ts->id);
+       if (!led->name)
+               return -ENOMEM;
        led->max_brightness = 1;
        led->flags = LED_CORE_SUSPENDRESUME | LED_RETAIN_AT_SHUTDOWN;
        led->brightness_get = &thunderstrike_led_get_brightness;
@@ -831,6 +833,8 @@ static inline int thunderstrike_psy_create(struct shield_device *shield_dev)
        shield_dev->battery_dev.desc.name =
                devm_kasprintf(&ts->base.hdev->dev, GFP_KERNEL,
                               "thunderstrike_%d", ts->id);
+       if (!shield_dev->battery_dev.desc.name)
+               return -ENOMEM;
 
        shield_dev->battery_dev.psy = power_supply_register(
                &hdev->dev, &shield_dev->battery_dev.desc, &psy_cfg);
index b3c4e50e248aa7eda08a356187ecea54cc803834..b08a5ab5852884219654ac449f255d9a3e3f1585 100644
@@ -1109,10 +1109,9 @@ static int steam_probe(struct hid_device *hdev,
                return hid_hw_start(hdev, HID_CONNECT_DEFAULT);
 
        steam = devm_kzalloc(&hdev->dev, sizeof(*steam), GFP_KERNEL);
-       if (!steam) {
-               ret = -ENOMEM;
-               goto steam_alloc_fail;
-       }
+       if (!steam)
+               return -ENOMEM;
+
        steam->hdev = hdev;
        hid_set_drvdata(hdev, steam);
        spin_lock_init(&steam->lock);
@@ -1129,14 +1128,14 @@ static int steam_probe(struct hid_device *hdev,
         */
        ret = hid_hw_start(hdev, HID_CONNECT_DEFAULT & ~HID_CONNECT_HIDRAW);
        if (ret)
-               goto hid_hw_start_fail;
+               goto err_cancel_work;
 
        ret = hid_hw_open(hdev);
        if (ret) {
                hid_err(hdev,
                        "%s:hid_hw_open\n",
                        __func__);
-               goto hid_hw_open_fail;
+               goto err_hw_stop;
        }
 
        if (steam->quirks & STEAM_QUIRK_WIRELESS) {
@@ -1152,36 +1151,37 @@ static int steam_probe(struct hid_device *hdev,
                        hid_err(hdev,
                                "%s:steam_register failed with error %d\n",
                                __func__, ret);
-                       goto input_register_fail;
+                       goto err_hw_close;
                }
        }
 
        steam->client_hdev = steam_create_client_hid(hdev);
        if (IS_ERR(steam->client_hdev)) {
                ret = PTR_ERR(steam->client_hdev);
-               goto client_hdev_fail;
+               goto err_stream_unregister;
        }
        steam->client_hdev->driver_data = steam;
 
        ret = hid_add_device(steam->client_hdev);
        if (ret)
-               goto client_hdev_add_fail;
+               goto err_destroy;
 
        return 0;
 
-client_hdev_add_fail:
-       hid_hw_stop(hdev);
-client_hdev_fail:
+err_destroy:
        hid_destroy_device(steam->client_hdev);
-input_register_fail:
-hid_hw_open_fail:
-hid_hw_start_fail:
+err_stream_unregister:
+       if (steam->connected)
+               steam_unregister(steam);
+err_hw_close:
+       hid_hw_close(hdev);
+err_hw_stop:
+       hid_hw_stop(hdev);
+err_cancel_work:
        cancel_work_sync(&steam->work_connect);
        cancel_delayed_work_sync(&steam->mode_switch);
        cancel_work_sync(&steam->rumble_work);
-steam_alloc_fail:
-       hid_err(hdev, "%s: failed with error %d\n",
-                       __func__, ret);
+
        return ret;
 }
 
index 13c8dd8cd35060731165cd2018f96c6e7bfef512..2bc762d31ac70de9724df166422f31ab1c8687f4 100644
@@ -357,8 +357,11 @@ static int hidraw_release(struct inode * inode, struct file * file)
        down_write(&minors_rwsem);
 
        spin_lock_irqsave(&hidraw_table[minor]->list_lock, flags);
-       for (int i = list->tail; i < list->head; i++)
-               kfree(list->buffer[i].value);
+       while (list->tail != list->head) {
+               kfree(list->buffer[list->tail].value);
+               list->buffer[list->tail].value = NULL;
+               list->tail = (list->tail + 1) & (HIDRAW_BUFFER_SIZE - 1);
+       }
        list_del(&list->node);
        spin_unlock_irqrestore(&hidraw_table[minor]->list_lock, flags);
        kfree(list);
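
The hidraw fix above matters because head and tail index a power-of-two
ring: the old "for (i = list->tail; i < list->head; i++)" frees nothing
once the buffer has wrapped (tail > head). A standalone sketch of the
masked drain:

#include <stdio.h>
#include <stdlib.h>

#define BUF_SIZE 8      /* must be a power of two for the mask trick */

struct ring {
        void *slot[BUF_SIZE];
        int head, tail; /* head == tail means empty */
};

static void drain(struct ring *r)
{
        while (r->tail != r->head) {
                free(r->slot[r->tail]);
                r->slot[r->tail] = NULL;
                r->tail = (r->tail + 1) & (BUF_SIZE - 1);
        }
}

int main(void)
{
        struct ring r = { .head = 2, .tail = 6 };       /* wrapped state */

        for (int i = 0; i < BUF_SIZE; i++)
                if (i >= r.tail || i < r.head)
                        r.slot[i] = malloc(4);

        drain(&r);      /* frees slots 6, 7, 0, 1 and stops at head */
        printf("tail=%d head=%d\n", r.tail, r.head);
        return 0;
}
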
index 90f316ae9819af4759720aad86136721f78f5abe..2df1ab3c31cc54da812ee653face224f32e69fc2 100644
@@ -49,6 +49,7 @@
 #define I2C_HID_QUIRK_RESET_ON_RESUME          BIT(2)
 #define I2C_HID_QUIRK_BAD_INPUT_SIZE           BIT(3)
 #define I2C_HID_QUIRK_NO_WAKEUP_AFTER_RESET    BIT(4)
+#define I2C_HID_QUIRK_NO_SLEEP_ON_SUSPEND      BIT(5)
 
 /* Command opcodes */
 #define I2C_HID_OPCODE_RESET                   0x01
@@ -131,6 +132,8 @@ static const struct i2c_hid_quirks {
                 I2C_HID_QUIRK_RESET_ON_RESUME },
        { USB_VENDOR_ID_ITE, I2C_DEVICE_ID_ITE_LENOVO_LEGION_Y720,
                I2C_HID_QUIRK_BAD_INPUT_SIZE },
+       { I2C_VENDOR_ID_CIRQUE, I2C_PRODUCT_ID_CIRQUE_1063,
+               I2C_HID_QUIRK_NO_SLEEP_ON_SUSPEND },
        /*
         * Sending the wakeup after reset actually break ELAN touchscreen controller
         */
@@ -956,7 +959,8 @@ static int i2c_hid_core_suspend(struct i2c_hid *ihid, bool force_poweroff)
                return ret;
 
        /* Save some power */
-       i2c_hid_set_power(ihid, I2C_HID_PWR_SLEEP);
+       if (!(ihid->quirks & I2C_HID_QUIRK_NO_SLEEP_ON_SUSPEND))
+               i2c_hid_set_power(ihid, I2C_HID_PWR_SLEEP);
 
        disable_irq(client->irq);
 
index c4e1fa0273c84c3b2e3b438e04673727b05e6f6e..8be4d576da7733d28b8e4a1a07e86a0d11584ae6 100644
@@ -87,6 +87,7 @@ static int i2c_hid_of_probe(struct i2c_client *client)
        if (!ihid_of)
                return -ENOMEM;
 
+       ihid_of->client = client;
        ihid_of->ops.power_up = i2c_hid_of_power_up;
        ihid_of->ops.power_down = i2c_hid_of_power_down;
 
index aa6cb033bb06b77f182e6df441a04e1b016aaef5..03d5601ce807b3b1d49ed88bc923774d71ace572 100644
@@ -722,6 +722,8 @@ void ishtp_bus_remove_all_clients(struct ishtp_device *ishtp_dev,
        spin_lock_irqsave(&ishtp_dev->cl_list_lock, flags);
        list_for_each_entry(cl, &ishtp_dev->cl_list, link) {
                cl->state = ISHTP_CL_DISCONNECTED;
+               if (warm_reset && cl->device->reference_count)
+                       continue;
 
                /*
                 * Wake any pending process. The waiter would check dev->state
index 82c907f01bd3b66af02efa1d313f3bb2f7cb7209..8a7f2f6a4f86864cd5783ed51852f56cef614d5f 100644
@@ -49,7 +49,9 @@ static void ishtp_read_list_flush(struct ishtp_cl *cl)
        list_for_each_entry_safe(rb, next, &cl->dev->read_list.list, list)
                if (rb->cl && ishtp_cl_cmp_id(cl, rb->cl)) {
                        list_del(&rb->list);
-                       ishtp_io_rb_free(rb);
+                       spin_lock(&cl->free_list_spinlock);
+                       list_add_tail(&rb->list, &cl->free_rb_list.list);
+                       spin_unlock(&cl->free_list_spinlock);
                }
        spin_unlock_irqrestore(&cl->dev->read_list_spinlock, flags);
 }
index b613f11ed9498d7045f8649496049dc1b0b91839..2bc45b24075c3fe4b70ef222bbd21a4ee11eeb21 100644
@@ -2087,7 +2087,7 @@ static int wacom_allocate_inputs(struct wacom *wacom)
        return 0;
 }
 
-static int wacom_register_inputs(struct wacom *wacom)
+static int wacom_setup_inputs(struct wacom *wacom)
 {
        struct input_dev *pen_input_dev, *touch_input_dev, *pad_input_dev;
        struct wacom_wac *wacom_wac = &(wacom->wacom_wac);
@@ -2106,10 +2106,6 @@ static int wacom_register_inputs(struct wacom *wacom)
                input_free_device(pen_input_dev);
                wacom_wac->pen_input = NULL;
                pen_input_dev = NULL;
-       } else {
-               error = input_register_device(pen_input_dev);
-               if (error)
-                       goto fail;
        }
 
        error = wacom_setup_touch_input_capabilities(touch_input_dev, wacom_wac);
@@ -2118,10 +2114,6 @@ static int wacom_register_inputs(struct wacom *wacom)
                input_free_device(touch_input_dev);
                wacom_wac->touch_input = NULL;
                touch_input_dev = NULL;
-       } else {
-               error = input_register_device(touch_input_dev);
-               if (error)
-                       goto fail;
        }
 
        error = wacom_setup_pad_input_capabilities(pad_input_dev, wacom_wac);
@@ -2130,7 +2122,34 @@ static int wacom_register_inputs(struct wacom *wacom)
                input_free_device(pad_input_dev);
                wacom_wac->pad_input = NULL;
                pad_input_dev = NULL;
-       } else {
+       }
+
+       return 0;
+}
+
+static int wacom_register_inputs(struct wacom *wacom)
+{
+       struct input_dev *pen_input_dev, *touch_input_dev, *pad_input_dev;
+       struct wacom_wac *wacom_wac = &(wacom->wacom_wac);
+       int error = 0;
+
+       pen_input_dev = wacom_wac->pen_input;
+       touch_input_dev = wacom_wac->touch_input;
+       pad_input_dev = wacom_wac->pad_input;
+
+       if (pen_input_dev) {
+               error = input_register_device(pen_input_dev);
+               if (error)
+                       goto fail;
+       }
+
+       if (touch_input_dev) {
+               error = input_register_device(touch_input_dev);
+               if (error)
+                       goto fail;
+       }
+
+       if (pad_input_dev) {
                error = input_register_device(pad_input_dev);
                if (error)
                        goto fail;
@@ -2383,6 +2402,20 @@ static int wacom_parse_and_register(struct wacom *wacom, bool wireless)
        if (error)
                goto fail;
 
+       error = wacom_setup_inputs(wacom);
+       if (error)
+               goto fail;
+
+       if (features->type == HID_GENERIC)
+               connect_mask |= HID_CONNECT_DRIVER;
+
+       /* Regular HID work starts now */
+       error = hid_hw_start(hdev, connect_mask);
+       if (error) {
+               hid_err(hdev, "hw start failed\n");
+               goto fail;
+       }
+
        error = wacom_register_inputs(wacom);
        if (error)
                goto fail;
@@ -2397,16 +2430,6 @@ static int wacom_parse_and_register(struct wacom *wacom, bool wireless)
                        goto fail;
        }
 
-       if (features->type == HID_GENERIC)
-               connect_mask |= HID_CONNECT_DRIVER;
-
-       /* Regular HID work starts now */
-       error = hid_hw_start(hdev, connect_mask);
-       if (error) {
-               hid_err(hdev, "hw start failed\n");
-               goto fail;
-       }
-
        if (!wireless) {
                /* Note that if query fails it is not a hard failure */
                wacom_query_tablet_data(wacom);
index da8a01fedd3944a7588aad5e2a523b44b2b2797c..fbe10fbc5769e53affe44a0826a55853b306c0ee 100644
@@ -2575,7 +2575,14 @@ static void wacom_wac_pen_report(struct hid_device *hdev,
                                wacom_wac->hid_data.tipswitch);
                input_report_key(input, wacom_wac->tool[0], sense);
                if (wacom_wac->serial[0]) {
-                       input_event(input, EV_MSC, MSC_SERIAL, wacom_wac->serial[0]);
+                       /*
+                        * xf86-input-wacom does not accept a serial number
+                        * of '0'. Report the low 32 bits if possible, but
+                        * if they are zero, report the upper ones instead.
+                        */
+                       __u32 serial_lo = wacom_wac->serial[0] & 0xFFFFFFFFu;
+                       __u32 serial_hi = wacom_wac->serial[0] >> 32;
+                       input_event(input, EV_MSC, MSC_SERIAL, (int)(serial_lo ? serial_lo : serial_hi));
                        input_report_abs(input, ABS_MISC, sense ? id : 0);
                }
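
A standalone sketch of the serial selection introduced above:
xf86-input-wacom rejects a serial of 0, so report the low 32 bits when
they are non-zero and the high bits otherwise:

#include <stdint.h>
#include <stdio.h>

static uint32_t pick_serial(uint64_t serial)
{
        uint32_t lo = serial & 0xFFFFFFFFu;
        uint32_t hi = serial >> 32;

        return lo ? lo : hi;
}

int main(void)
{
        printf("0x%x\n", pick_serial(0x1234567800000000ull)); /* high word */
        printf("0x%x\n", pick_serial(0x0000000000000042ull)); /* low word */
        return 0;
}
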
 
index 56f7e06c673e4236ba8d1e01957723a800af63a0..adbf674355b2b8a472c03bd60092960cb0c742cf 100644
@@ -322,125 +322,89 @@ static int create_gpadl_header(enum hv_gpadl_type type, void *kbuffer,
 
        pagecount = hv_gpadl_size(type, size) >> HV_HYP_PAGE_SHIFT;
 
-       /* do we need a gpadl body msg */
        pfnsize = MAX_SIZE_CHANNEL_MESSAGE -
                  sizeof(struct vmbus_channel_gpadl_header) -
                  sizeof(struct gpa_range);
+       pfncount = umin(pagecount, pfnsize / sizeof(u64));
+
+       msgsize = sizeof(struct vmbus_channel_msginfo) +
+                 sizeof(struct vmbus_channel_gpadl_header) +
+                 sizeof(struct gpa_range) + pfncount * sizeof(u64);
+       msgheader =  kzalloc(msgsize, GFP_KERNEL);
+       if (!msgheader)
+               return -ENOMEM;
+
+       INIT_LIST_HEAD(&msgheader->submsglist);
+       msgheader->msgsize = msgsize;
+
+       gpadl_header = (struct vmbus_channel_gpadl_header *)
+               msgheader->msg;
+       gpadl_header->rangecount = 1;
+       gpadl_header->range_buflen = sizeof(struct gpa_range) +
+                                pagecount * sizeof(u64);
+       gpadl_header->range[0].byte_offset = 0;
+       gpadl_header->range[0].byte_count = hv_gpadl_size(type, size);
+       for (i = 0; i < pfncount; i++)
+               gpadl_header->range[0].pfn_array[i] = hv_gpadl_hvpfn(
+                       type, kbuffer, size, send_offset, i);
+       *msginfo = msgheader;
+
+       pfnsum = pfncount;
+       pfnleft = pagecount - pfncount;
+
+       /* how many pfns can we fit in a body message */
+       pfnsize = MAX_SIZE_CHANNEL_MESSAGE -
+                 sizeof(struct vmbus_channel_gpadl_body);
        pfncount = pfnsize / sizeof(u64);
 
-       if (pagecount > pfncount) {
-               /* we need a gpadl body */
-               /* fill in the header */
+       /*
+        * If pfnleft is zero, everything fits in the header and no body
+        * messages are needed
+        */
+       while (pfnleft) {
+               pfncurr = umin(pfncount, pfnleft);
                msgsize = sizeof(struct vmbus_channel_msginfo) +
-                         sizeof(struct vmbus_channel_gpadl_header) +
-                         sizeof(struct gpa_range) + pfncount * sizeof(u64);
-               msgheader =  kzalloc(msgsize, GFP_KERNEL);
-               if (!msgheader)
-                       goto nomem;
-
-               INIT_LIST_HEAD(&msgheader->submsglist);
-               msgheader->msgsize = msgsize;
-
-               gpadl_header = (struct vmbus_channel_gpadl_header *)
-                       msgheader->msg;
-               gpadl_header->rangecount = 1;
-               gpadl_header->range_buflen = sizeof(struct gpa_range) +
-                                        pagecount * sizeof(u64);
-               gpadl_header->range[0].byte_offset = 0;
-               gpadl_header->range[0].byte_count = hv_gpadl_size(type, size);
-               for (i = 0; i < pfncount; i++)
-                       gpadl_header->range[0].pfn_array[i] = hv_gpadl_hvpfn(
-                               type, kbuffer, size, send_offset, i);
-               *msginfo = msgheader;
-
-               pfnsum = pfncount;
-               pfnleft = pagecount - pfncount;
-
-               /* how many pfns can we fit */
-               pfnsize = MAX_SIZE_CHANNEL_MESSAGE -
-                         sizeof(struct vmbus_channel_gpadl_body);
-               pfncount = pfnsize / sizeof(u64);
-
-               /* fill in the body */
-               while (pfnleft) {
-                       if (pfnleft > pfncount)
-                               pfncurr = pfncount;
-                       else
-                               pfncurr = pfnleft;
-
-                       msgsize = sizeof(struct vmbus_channel_msginfo) +
-                                 sizeof(struct vmbus_channel_gpadl_body) +
-                                 pfncurr * sizeof(u64);
-                       msgbody = kzalloc(msgsize, GFP_KERNEL);
-
-                       if (!msgbody) {
-                               struct vmbus_channel_msginfo *pos = NULL;
-                               struct vmbus_channel_msginfo *tmp = NULL;
-                               /*
-                                * Free up all the allocated messages.
-                                */
-                               list_for_each_entry_safe(pos, tmp,
-                                       &msgheader->submsglist,
-                                       msglistentry) {
-
-                                       list_del(&pos->msglistentry);
-                                       kfree(pos);
-                               }
-
-                               goto nomem;
-                       }
-
-                       msgbody->msgsize = msgsize;
-                       gpadl_body =
-                               (struct vmbus_channel_gpadl_body *)msgbody->msg;
+                         sizeof(struct vmbus_channel_gpadl_body) +
+                         pfncurr * sizeof(u64);
+               msgbody = kzalloc(msgsize, GFP_KERNEL);
 
+               if (!msgbody) {
+                       struct vmbus_channel_msginfo *pos = NULL;
+                       struct vmbus_channel_msginfo *tmp = NULL;
                        /*
-                        * Gpadl is u32 and we are using a pointer which could
-                        * be 64-bit
-                        * This is governed by the guest/host protocol and
-                        * so the hypervisor guarantees that this is ok.
+                        * Free up all the allocated messages.
                         */
-                       for (i = 0; i < pfncurr; i++)
-                               gpadl_body->pfn[i] = hv_gpadl_hvpfn(type,
-                                       kbuffer, size, send_offset, pfnsum + i);
-
-                       /* add to msg header */
-                       list_add_tail(&msgbody->msglistentry,
-                                     &msgheader->submsglist);
-                       pfnsum += pfncurr;
-                       pfnleft -= pfncurr;
+                       list_for_each_entry_safe(pos, tmp,
+                               &msgheader->submsglist,
+                               msglistentry) {
+
+                               list_del(&pos->msglistentry);
+                               kfree(pos);
+                       }
+                       kfree(msgheader);
+                       return -ENOMEM;
                }
-       } else {
-               /* everything fits in a header */
-               msgsize = sizeof(struct vmbus_channel_msginfo) +
-                         sizeof(struct vmbus_channel_gpadl_header) +
-                         sizeof(struct gpa_range) + pagecount * sizeof(u64);
-               msgheader = kzalloc(msgsize, GFP_KERNEL);
-               if (msgheader == NULL)
-                       goto nomem;
-
-               INIT_LIST_HEAD(&msgheader->submsglist);
-               msgheader->msgsize = msgsize;
-
-               gpadl_header = (struct vmbus_channel_gpadl_header *)
-                       msgheader->msg;
-               gpadl_header->rangecount = 1;
-               gpadl_header->range_buflen = sizeof(struct gpa_range) +
-                                        pagecount * sizeof(u64);
-               gpadl_header->range[0].byte_offset = 0;
-               gpadl_header->range[0].byte_count = hv_gpadl_size(type, size);
-               for (i = 0; i < pagecount; i++)
-                       gpadl_header->range[0].pfn_array[i] = hv_gpadl_hvpfn(
-                               type, kbuffer, size, send_offset, i);
-
-               *msginfo = msgheader;
+
+               msgbody->msgsize = msgsize;
+               gpadl_body = (struct vmbus_channel_gpadl_body *)msgbody->msg;
+
+               /*
+                * Gpadl is u32, and we are using a pointer which could
+                * be 64-bit. This is governed by the guest/host protocol,
+                * so the hypervisor guarantees that this is ok.
+                */
+               for (i = 0; i < pfncurr; i++)
+                       gpadl_body->pfn[i] = hv_gpadl_hvpfn(type,
+                               kbuffer, size, send_offset, pfnsum + i);
+
+               /* add to msg header */
+               list_add_tail(&msgbody->msglistentry, &msgheader->submsglist);
+               pfnsum += pfncurr;
+               pfnleft -= pfncurr;
        }
 
        return 0;
-nomem:
-       kfree(msgheader);
-       kfree(msgbody);
-       return -ENOMEM;
 }
 
 /*
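
The restructured create_gpadl_header() above always builds the header first and then sizes each optional body message with the same clamp. A minimal sketch of that flow, with made-up capacities standing in for the MAX_SIZE_CHANNEL_MESSAGE arithmetic:

    #include <stddef.h>
    #include <stdio.h>

    /* Placeholder capacities; the real values fall out of the VMBus
     * message-size arithmetic in the driver. */
    #define HDR_PFN_CAPACITY  4
    #define BODY_PFN_CAPACITY 8

    static size_t umin(size_t a, size_t b)
    {
            return a < b ? a : b;
    }

    int main(void)
    {
            size_t pagecount = 21;
            size_t pfncount = umin(pagecount, (size_t)HDR_PFN_CAPACITY);
            size_t pfnleft = pagecount - pfncount;

            printf("header carries %zu pfns\n", pfncount);
            while (pfnleft) {       /* zero iterations when everything fit */
                    size_t pfncurr = umin((size_t)BODY_PFN_CAPACITY, pfnleft);

                    printf("body message carries %zu pfns\n", pfncurr);
                    pfnleft -= pfncurr;
            }
            return 0;
    }
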
index 42aec2c5606af756cc89c88cd50d158a733d784e..9c97c4065fe736e7e076894999447a2def819c24 100644 (file)
@@ -296,6 +296,11 @@ static struct {
        spinlock_t                      lock;
 } host_ts;
 
+static bool timesync_implicit;
+
+module_param(timesync_implicit, bool, 0644);
+MODULE_PARM_DESC(timesync_implicit, "If set, treat SAMPLE as SYNC when the clock is behind");
+
 static inline u64 reftime_to_ns(u64 reftime)
 {
        return (reftime - WLTIMEDELTA) * 100;
@@ -344,6 +349,29 @@ static void hv_set_host_time(struct work_struct *work)
                do_settimeofday64(&ts);
 }
 
+/*
+ * Due to a bug on Hyper-V hosts, the sync flag may not always be sent on resume.
+ * Force a sync if the guest is behind.
+ */
+static inline bool hv_implicit_sync(u64 host_time)
+{
+       struct timespec64 new_ts;
+       struct timespec64 threshold_ts;
+
+       new_ts = ns_to_timespec64(reftime_to_ns(host_time));
+       ktime_get_real_ts64(&threshold_ts);
+
+       threshold_ts.tv_sec += 5;
+
+       /*
+        * Sync if the guest is behind the host by 5 or more seconds.
+        */
+       if (timespec64_compare(&new_ts, &threshold_ts) >= 0)
+               return true;
+
+       return false;
+}
+
 /*
  * Synchronize time with host after reboot, restore, etc.
  *
@@ -384,7 +412,8 @@ static inline void adj_guesttime(u64 hosttime, u64 reftime, u8 adj_flags)
        spin_unlock_irqrestore(&host_ts.lock, flags);
 
        /* Schedule work to do do_settimeofday64() */
-       if (adj_flags & ICTIMESYNCFLAG_SYNC)
+       if ((adj_flags & ICTIMESYNCFLAG_SYNC) ||
+           (timesync_implicit && hv_implicit_sync(host_ts.host_time)))
                schedule_work(&adj_time_work);
 }
 
index b33d5abd9beb234f98fdcb9d50636a3affc93c35..7f7965f3d187884d87a2a822c3479485e17cec62 100644 (file)
@@ -988,7 +988,7 @@ static const struct dev_pm_ops vmbus_pm = {
 };
 
 /* The one and only one */
-static struct bus_type  hv_bus = {
+static const struct bus_type  hv_bus = {
        .name =         "vmbus",
        .match =                vmbus_match,
        .shutdown =             vmbus_shutdown,
index f6e1e55e82922be6f67a98046b4f74c3159625d9..4acc1858d8acf799c20e5c2061431d35adc8db10 100644 (file)
@@ -195,6 +195,8 @@ struct aspeed_pwm_tacho_data {
        u8 fan_tach_ch_source[MAX_ASPEED_FAN_TACH_CHANNELS];
        struct aspeed_cooling_device *cdev[8];
        const struct attribute_group *groups[3];
+       /* protects access to shared ASPEED_PTCR_RESULT */
+       struct mutex tach_lock;
 };
 
 enum type { TYPEM, TYPEN, TYPEO };
@@ -529,6 +531,8 @@ static int aspeed_get_fan_tach_ch_rpm(struct aspeed_pwm_tacho_data *priv,
        u8 fan_tach_ch_source, type, mode, both;
        int ret;
 
+       mutex_lock(&priv->tach_lock);
+
        regmap_write(priv->regmap, ASPEED_PTCR_TRIGGER, 0);
        regmap_write(priv->regmap, ASPEED_PTCR_TRIGGER, 0x1 << fan_tach_ch);
 
@@ -546,6 +550,8 @@ static int aspeed_get_fan_tach_ch_rpm(struct aspeed_pwm_tacho_data *priv,
                ASPEED_RPM_STATUS_SLEEP_USEC,
                usec);
 
+       mutex_unlock(&priv->tach_lock);
+
        /* return -ETIMEDOUT if we didn't get an answer. */
        if (ret)
                return ret;
@@ -915,6 +921,7 @@ static int aspeed_pwm_tacho_probe(struct platform_device *pdev)
        priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
        if (!priv)
                return -ENOMEM;
+       mutex_init(&priv->tach_lock);
        priv->regmap = devm_regmap_init(dev, NULL, (__force void *)regs,
                        &aspeed_pwm_tacho_regmap_config);
        if (IS_ERR(priv->regmap))
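
The new tach_lock serializes the trigger/poll sequence on the shared result register. A pthread-based sketch of that critical section, with the hardware access simulated by a plain variable:

    #include <pthread.h>
    #include <stdio.h>

    /* The shared result register, modeled as a plain variable. */
    static pthread_mutex_t tach_lock = PTHREAD_MUTEX_INITIALIZER;
    static int result_reg;

    /* Trigger a measurement and read it back under the lock, so two
     * readers can never interleave trigger and read-out. */
    static int read_tach(int channel)
    {
            int value;

            pthread_mutex_lock(&tach_lock);
            result_reg = channel * 100;  /* stands in for trigger + poll */
            value = result_reg;
            pthread_mutex_unlock(&tach_lock);

            return value;
    }

    int main(void)
    {
            printf("%d\n", read_tach(3));
            return 0;
    }
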
index ba82d1e79c131678c0c673bd5c0d9d77b09fdf1a..b8fc8d1ef20dfcb6132a168425df2d7e2653afa4 100644 (file)
@@ -41,7 +41,7 @@ MODULE_PARM_DESC(tjmax, "TjMax value in degrees Celsius");
 
 #define PKG_SYSFS_ATTR_NO      1       /* Sysfs attribute for package temp */
 #define BASE_SYSFS_ATTR_NO     2       /* Sysfs Base attr no for coretemp */
-#define NUM_REAL_CORES         128     /* Number of Real cores per cpu */
+#define NUM_REAL_CORES         512     /* Number of Real cores per cpu */
 #define CORETEMP_NAME_LENGTH   28      /* String Length of attrs */
 #define MAX_CORE_ATTRS         4       /* Maximum no of basic attrs */
 #define TOTAL_ATTRS            (MAX_CORE_ATTRS + 1)
@@ -419,7 +419,7 @@ static ssize_t show_temp(struct device *dev,
 }
 
 static int create_core_attrs(struct temp_data *tdata, struct device *dev,
-                            int attr_no)
+                            int index)
 {
        int i;
        static ssize_t (*const rd_ptr[TOTAL_ATTRS]) (struct device *dev,
@@ -431,13 +431,20 @@ static int create_core_attrs(struct temp_data *tdata, struct device *dev,
        };
 
        for (i = 0; i < tdata->attr_size; i++) {
+               /*
+                * We map the attr number to the core id of the CPU.
+                * The attr number is always core id + 2.
+                * The Pkgtemp will always show up as temp1_*, if available.
+                */
+               int attr_no = tdata->is_pkg_data ? 1 : tdata->cpu_core_id + 2;
+
                snprintf(tdata->attr_name[i], CORETEMP_NAME_LENGTH,
                         "temp%d_%s", attr_no, suffixes[i]);
                sysfs_attr_init(&tdata->sd_attrs[i].dev_attr.attr);
                tdata->sd_attrs[i].dev_attr.attr.name = tdata->attr_name[i];
                tdata->sd_attrs[i].dev_attr.attr.mode = 0444;
                tdata->sd_attrs[i].dev_attr.show = rd_ptr[i];
-               tdata->sd_attrs[i].index = attr_no;
+               tdata->sd_attrs[i].index = index;
                tdata->attrs[i] = &tdata->sd_attrs[i].dev_attr.attr;
        }
        tdata->attr_group.attrs = tdata->attrs;
@@ -495,30 +502,25 @@ static int create_core_data(struct platform_device *pdev, unsigned int cpu,
        struct platform_data *pdata = platform_get_drvdata(pdev);
        struct cpuinfo_x86 *c = &cpu_data(cpu);
        u32 eax, edx;
-       int err, index, attr_no;
+       int err, index;
 
        if (!housekeeping_cpu(cpu, HK_TYPE_MISC))
                return 0;
 
        /*
-        * Find attr number for sysfs:
-        * We map the attr number to core id of the CPU
-        * The attr number is always core id + 2
-        * The Pkgtemp will always show up as temp1_*, if available
+        * Get the index of tdata in pdata->core_data[]
+        * tdata for package: pdata->core_data[1]
+        * tdata for core: pdata->core_data[2] .. pdata->core_data[NUM_REAL_CORES + 1]
         */
        if (pkg_flag) {
-               attr_no = PKG_SYSFS_ATTR_NO;
+               index = PKG_SYSFS_ATTR_NO;
        } else {
-               index = ida_alloc(&pdata->ida, GFP_KERNEL);
+               index = ida_alloc_max(&pdata->ida, NUM_REAL_CORES - 1, GFP_KERNEL);
                if (index < 0)
                        return index;
-               pdata->cpu_map[index] = topology_core_id(cpu);
-               attr_no = index + BASE_SYSFS_ATTR_NO;
-       }
 
-       if (attr_no > MAX_CORE_DATA - 1) {
-               err = -ERANGE;
-               goto ida_free;
+               pdata->cpu_map[index] = topology_core_id(cpu);
+               index += BASE_SYSFS_ATTR_NO;
        }
 
        tdata = init_temp_data(cpu, pkg_flag);
@@ -544,20 +546,20 @@ static int create_core_data(struct platform_device *pdev, unsigned int cpu,
                if (get_ttarget(tdata, &pdev->dev) >= 0)
                        tdata->attr_size++;
 
-       pdata->core_data[attr_no] = tdata;
+       pdata->core_data[index] = tdata;
 
        /* Create sysfs interfaces */
-       err = create_core_attrs(tdata, pdata->hwmon_dev, attr_no);
+       err = create_core_attrs(tdata, pdata->hwmon_dev, index);
        if (err)
                goto exit_free;
 
        return 0;
 exit_free:
-       pdata->core_data[attr_no] = NULL;
+       pdata->core_data[index] = NULL;
        kfree(tdata);
 ida_free:
        if (!pkg_flag)
-               ida_free(&pdata->ida, index);
+               ida_free(&pdata->ida, index - BASE_SYSFS_ATTR_NO);
        return err;
 }
 
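A standalone model of the revised coretemp mapping, reusing PKG_SYSFS_ATTR_NO and BASE_SYSFS_ATTR_NO from the hunk; attr_no_for() and slot_for() are hypothetical helper names:

    #include <stdbool.h>
    #include <stdio.h>

    #define PKG_SYSFS_ATTR_NO  1
    #define BASE_SYSFS_ATTR_NO 2

    /* The sysfs attr number now derives from the core id, so it is
     * stable across hotplug; the IDA-allocated index only selects the
     * slot in pdata->core_data[]. */
    static int attr_no_for(bool is_pkg, int cpu_core_id)
    {
            return is_pkg ? PKG_SYSFS_ATTR_NO : cpu_core_id + 2;
    }

    static int slot_for(bool is_pkg, int ida_index)
    {
            return is_pkg ? PKG_SYSFS_ATTR_NO : ida_index + BASE_SYSFS_ATTR_NO;
    }

    int main(void)
    {
            printf("package -> temp%d_*, slot %d\n",
                   attr_no_for(true, 0), slot_for(true, 0));
            printf("core 7  -> temp%d_*, slot %d\n",
                   attr_no_for(false, 7), slot_for(false, 0));
            return 0;
    }
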
index 85e5237757142a05b45fdd7006d4c7b2ca61a3c3..8129d7b3ceaf9ae2e851f39af2db7aa6eaca9ce0 100644 (file)
@@ -146,7 +146,7 @@ static int waterforce_get_status(struct waterforce_data *priv)
        /* Send command for getting status */
        ret = waterforce_write_expanded(priv, get_status_cmd, GET_STATUS_CMD_LENGTH);
        if (ret < 0)
-               return ret;
+               goto unlock_and_return;
 
        ret = wait_for_completion_interruptible_timeout(&priv->status_report_received,
                                                        msecs_to_jiffies(STATUS_VALIDITY));
index 8d2ef3145bca3c71b0aee2d8a1fb466dd3f9cb3e..9fbab8f023340da24cf9623e8da882849358ccea 100644 (file)
@@ -3512,6 +3512,7 @@ int nct6775_probe(struct device *dev, struct nct6775_data *data,
        const u16 *reg_temp_mon, *reg_temp_alternate, *reg_temp_crit;
        const u16 *reg_temp_crit_l = NULL, *reg_temp_crit_h = NULL;
        int num_reg_temp, num_reg_temp_mon, num_reg_tsi_temp;
+       int num_reg_temp_config;
        struct device *hwmon_dev;
        struct sensor_template_group tsi_temp_tg;
 
@@ -3594,6 +3595,7 @@ int nct6775_probe(struct device *dev, struct nct6775_data *data,
                reg_temp_over = NCT6106_REG_TEMP_OVER;
                reg_temp_hyst = NCT6106_REG_TEMP_HYST;
                reg_temp_config = NCT6106_REG_TEMP_CONFIG;
+               num_reg_temp_config = ARRAY_SIZE(NCT6106_REG_TEMP_CONFIG);
                reg_temp_alternate = NCT6106_REG_TEMP_ALTERNATE;
                reg_temp_crit = NCT6106_REG_TEMP_CRIT;
                reg_temp_crit_l = NCT6106_REG_TEMP_CRIT_L;
@@ -3669,6 +3671,7 @@ int nct6775_probe(struct device *dev, struct nct6775_data *data,
                reg_temp_over = NCT6106_REG_TEMP_OVER;
                reg_temp_hyst = NCT6106_REG_TEMP_HYST;
                reg_temp_config = NCT6106_REG_TEMP_CONFIG;
+               num_reg_temp_config = ARRAY_SIZE(NCT6106_REG_TEMP_CONFIG);
                reg_temp_alternate = NCT6106_REG_TEMP_ALTERNATE;
                reg_temp_crit = NCT6106_REG_TEMP_CRIT;
                reg_temp_crit_l = NCT6106_REG_TEMP_CRIT_L;
@@ -3746,6 +3749,7 @@ int nct6775_probe(struct device *dev, struct nct6775_data *data,
                reg_temp_over = NCT6775_REG_TEMP_OVER;
                reg_temp_hyst = NCT6775_REG_TEMP_HYST;
                reg_temp_config = NCT6775_REG_TEMP_CONFIG;
+               num_reg_temp_config = ARRAY_SIZE(NCT6775_REG_TEMP_CONFIG);
                reg_temp_alternate = NCT6775_REG_TEMP_ALTERNATE;
                reg_temp_crit = NCT6775_REG_TEMP_CRIT;
 
@@ -3821,6 +3825,7 @@ int nct6775_probe(struct device *dev, struct nct6775_data *data,
                reg_temp_over = NCT6775_REG_TEMP_OVER;
                reg_temp_hyst = NCT6775_REG_TEMP_HYST;
                reg_temp_config = NCT6776_REG_TEMP_CONFIG;
+               num_reg_temp_config = ARRAY_SIZE(NCT6776_REG_TEMP_CONFIG);
                reg_temp_alternate = NCT6776_REG_TEMP_ALTERNATE;
                reg_temp_crit = NCT6776_REG_TEMP_CRIT;
 
@@ -3900,6 +3905,7 @@ int nct6775_probe(struct device *dev, struct nct6775_data *data,
                reg_temp_over = NCT6779_REG_TEMP_OVER;
                reg_temp_hyst = NCT6779_REG_TEMP_HYST;
                reg_temp_config = NCT6779_REG_TEMP_CONFIG;
+               num_reg_temp_config = ARRAY_SIZE(NCT6779_REG_TEMP_CONFIG);
                reg_temp_alternate = NCT6779_REG_TEMP_ALTERNATE;
                reg_temp_crit = NCT6779_REG_TEMP_CRIT;
 
@@ -4034,6 +4040,7 @@ int nct6775_probe(struct device *dev, struct nct6775_data *data,
                reg_temp_over = NCT6779_REG_TEMP_OVER;
                reg_temp_hyst = NCT6779_REG_TEMP_HYST;
                reg_temp_config = NCT6779_REG_TEMP_CONFIG;
+               num_reg_temp_config = ARRAY_SIZE(NCT6779_REG_TEMP_CONFIG);
                reg_temp_alternate = NCT6779_REG_TEMP_ALTERNATE;
                reg_temp_crit = NCT6779_REG_TEMP_CRIT;
 
@@ -4123,6 +4130,7 @@ int nct6775_probe(struct device *dev, struct nct6775_data *data,
                reg_temp_over = NCT6798_REG_TEMP_OVER;
                reg_temp_hyst = NCT6798_REG_TEMP_HYST;
                reg_temp_config = NCT6779_REG_TEMP_CONFIG;
+               num_reg_temp_config = ARRAY_SIZE(NCT6779_REG_TEMP_CONFIG);
                reg_temp_alternate = NCT6798_REG_TEMP_ALTERNATE;
                reg_temp_crit = NCT6798_REG_TEMP_CRIT;
 
@@ -4204,7 +4212,8 @@ int nct6775_probe(struct device *dev, struct nct6775_data *data,
                                  = reg_temp_crit[src - 1];
                        if (reg_temp_crit_l && reg_temp_crit_l[i])
                                data->reg_temp[4][src - 1] = reg_temp_crit_l[i];
-                       data->reg_temp_config[src - 1] = reg_temp_config[i];
+                       if (i < num_reg_temp_config)
+                               data->reg_temp_config[src - 1] = reg_temp_config[i];
                        data->temp_src[src - 1] = src;
                        continue;
                }
@@ -4217,7 +4226,8 @@ int nct6775_probe(struct device *dev, struct nct6775_data *data,
                data->reg_temp[0][s] = reg_temp[i];
                data->reg_temp[1][s] = reg_temp_over[i];
                data->reg_temp[2][s] = reg_temp_hyst[i];
-               data->reg_temp_config[s] = reg_temp_config[i];
+               if (i < num_reg_temp_config)
+                       data->reg_temp_config[s] = reg_temp_config[i];
                if (reg_temp_crit_h && reg_temp_crit_h[i])
                        data->reg_temp[3][s] = reg_temp_crit_h[i];
                else if (reg_temp_crit[src - 1])
index b9bb469e2d8febe1d056e0b8f7d0a0b743d5ff3e..e5fa10b3b8bc7184e03e6caa0701e34f81bcb7cb 100644 (file)
@@ -126,6 +126,21 @@ static const struct regulator_desc __maybe_unused mp2975_reg_desc[] = {
 
 #define to_mp2975_data(x)  container_of(x, struct mp2975_data, info)
 
+static int mp2975_read_byte_data(struct i2c_client *client, int page, int reg)
+{
+       switch (reg) {
+       case PMBUS_VOUT_MODE:
+               /*
+                * Report direct format as configured by MFR_DC_LOOP_CTRL.
+                * Unlike on the MP2971/MP2973, the reported VOUT_MODE isn't
+                * updated automatically; it always reads as PB_VOUT_MODE_VID.
+                */
+               return PB_VOUT_MODE_DIRECT;
+       default:
+               return -ENODATA;
+       }
+}
+
 static int
 mp2975_read_word_helper(struct i2c_client *client, int page, int phase, u8 reg,
                        u16 mask)
@@ -869,6 +884,7 @@ static struct pmbus_driver_info mp2975_info = {
                PMBUS_HAVE_IIN | PMBUS_HAVE_IOUT | PMBUS_HAVE_STATUS_IOUT |
                PMBUS_HAVE_TEMP | PMBUS_HAVE_STATUS_TEMP | PMBUS_HAVE_POUT |
                PMBUS_HAVE_PIN | PMBUS_HAVE_STATUS_INPUT | PMBUS_PHASE_VIRTUAL,
+       .read_byte_data = mp2975_read_byte_data,
        .read_word_data = mp2975_read_word_data,
 #if IS_ENABLED(CONFIG_SENSORS_MP2975_REGULATOR)
        .num_regulators = 1,
index 3757b9391e60ae9b0e1c2ec5e564e0ae55af0c2a..aa0ee8ecd6f2f53ea109cfeacffe5e2682ae228e 100644 (file)
@@ -90,10 +90,8 @@ obj-$(CONFIG_I2C_NPCM)               += i2c-npcm7xx.o
 obj-$(CONFIG_I2C_OCORES)       += i2c-ocores.o
 obj-$(CONFIG_I2C_OMAP)         += i2c-omap.o
 obj-$(CONFIG_I2C_OWL)          += i2c-owl.o
-i2c-pasemi-objs := i2c-pasemi-core.o i2c-pasemi-pci.o
-obj-$(CONFIG_I2C_PASEMI)       += i2c-pasemi.o
-i2c-apple-objs := i2c-pasemi-core.o i2c-pasemi-platform.o
-obj-$(CONFIG_I2C_APPLE)        += i2c-apple.o
+obj-$(CONFIG_I2C_PASEMI)       += i2c-pasemi-core.o i2c-pasemi-pci.o
+obj-$(CONFIG_I2C_APPLE)                += i2c-pasemi-core.o i2c-pasemi-platform.o
 obj-$(CONFIG_I2C_PCA_PLATFORM) += i2c-pca-platform.o
 obj-$(CONFIG_I2C_PNX)          += i2c-pnx.o
 obj-$(CONFIG_I2C_PXA)          += i2c-pxa.o
index 5511fd46a65eae66b46f3e3385fe15f0c8970a2b..ce8c4846b7fae4548e36ccd78ce59ce1c90532f3 100644 (file)
@@ -445,6 +445,7 @@ static u32 aspeed_i2c_master_irq(struct aspeed_i2c_bus *bus, u32 irq_status)
                        irq_status);
                irq_handled |= (irq_status & ASPEED_I2CD_INTR_MASTER_ERRORS);
                if (bus->master_state != ASPEED_I2C_MASTER_INACTIVE) {
+                       irq_handled = irq_status;
                        bus->cmd_err = ret;
                        bus->master_state = ASPEED_I2C_MASTER_INACTIVE;
                        goto out_complete;
index 3932e8d96a17173fa3b4f7ad90ebcbb786e99370..274e987e4cfa0f9b90a576b83d2a96368b7f50a3 100644 (file)
@@ -498,11 +498,10 @@ static int i801_block_transaction_by_block(struct i801_priv *priv,
        /* Set block buffer mode */
        outb_p(inb_p(SMBAUXCTL(priv)) | SMBAUXCTL_E32B, SMBAUXCTL(priv));
 
-       inb_p(SMBHSTCNT(priv)); /* reset the data buffer index */
-
        if (read_write == I2C_SMBUS_WRITE) {
                len = data->block[0];
                outb_p(len, SMBHSTDAT0(priv));
+               inb_p(SMBHSTCNT(priv)); /* reset the data buffer index */
                for (i = 0; i < len; i++)
                        outb_p(data->block[i+1], SMBBLKDAT(priv));
        }
@@ -520,6 +519,7 @@ static int i801_block_transaction_by_block(struct i801_priv *priv,
                }
 
                data->block[0] = len;
+               inb_p(SMBHSTCNT(priv)); /* reset the data buffer index */
                for (i = 0; i < len; i++)
                        data->block[i + 1] = inb_p(SMBBLKDAT(priv));
        }
@@ -1416,7 +1416,6 @@ static void i801_add_mux(struct i801_priv *priv)
                lookup->table[i] = GPIO_LOOKUP(mux_config->gpio_chip,
                                               mux_config->gpios[i], "mux", 0);
        gpiod_add_lookup_table(lookup);
-       priv->lookup = lookup;
 
        /*
         * Register the mux device, we use PLATFORM_DEVID_NONE here
@@ -1430,7 +1429,10 @@ static void i801_add_mux(struct i801_priv *priv)
                                sizeof(struct i2c_mux_gpio_platform_data));
        if (IS_ERR(priv->mux_pdev)) {
                gpiod_remove_lookup_table(lookup);
+               devm_kfree(dev, lookup);
                dev_err(dev, "Failed to register i2c-mux-gpio device\n");
+       } else {
+               priv->lookup = lookup;
        }
 }
 
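The i801 fix publishes priv->lookup only once registration has succeeded, so the error path never leaves a stale pointer behind. A sketch of that pattern, with malloc/free standing in for devm allocation and platform-device registration:

    #include <stdio.h>
    #include <stdlib.h>

    struct ctx {
            void *lookup;
    };

    /* Publish the lookup pointer only after registration succeeds; on
     * failure, free the table and leave the context untouched. */
    static int add_mux(struct ctx *c, int registration_fails)
    {
            void *lookup = malloc(16);

            if (!lookup)
                    return -1;

            if (registration_fails) {  /* stands in for device registration failing */
                    free(lookup);
                    return -1;
            }

            c->lookup = lookup;  /* only reached on success */
            return 0;
    }

    int main(void)
    {
            struct ctx c = { 0 };

            add_mux(&c, 1);
            printf("lookup after failed add: %p\n", c.lookup); /* (nil) */
            free(c.lookup);  /* free(NULL) is a no-op */
            return 0;
    }
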
@@ -1742,9 +1744,9 @@ static int i801_probe(struct pci_dev *dev, const struct pci_device_id *id)
 
        i801_enable_host_notify(&priv->adapter);
 
-       i801_probe_optional_slaves(priv);
        /* We ignore errors - multiplexing is optional */
        i801_add_mux(priv);
+       i801_probe_optional_slaves(priv);
 
        pci_set_drvdata(dev, priv);
 
index 88a053987403cc6f59c3def73fd52cd11e2b1359..60e813137f8442895b19c6e9d871252cc32c7f24 100644 (file)
@@ -803,6 +803,11 @@ static irqreturn_t i2c_imx_slave_handle(struct imx_i2c_struct *i2c_imx,
                ctl &= ~I2CR_MTX;
                imx_i2c_write_reg(ctl, i2c_imx, IMX_I2C_I2CR);
                imx_i2c_read_reg(i2c_imx, IMX_I2C_I2DR);
+
+               /* flag the last byte as processed */
+               i2c_imx_slave_event(i2c_imx,
+                                   I2C_SLAVE_READ_PROCESSED, &value);
+
                i2c_imx_slave_finish_op(i2c_imx);
                return IRQ_HANDLED;
        }
index 7d54a9f34c74b5a3b074a469dca674eb286dd50d..bd8becbdeeb28f4aa7f094df18fa7d059113dcae 100644 (file)
@@ -369,6 +369,7 @@ int pasemi_i2c_common_probe(struct pasemi_smbus *smbus)
 
        return 0;
 }
+EXPORT_SYMBOL_GPL(pasemi_i2c_common_probe);
 
 irqreturn_t pasemi_irq_handler(int irq, void *dev_id)
 {
@@ -378,3 +379,8 @@ irqreturn_t pasemi_irq_handler(int irq, void *dev_id)
        complete(&smbus->irq_completion);
        return IRQ_HANDLED;
 }
+EXPORT_SYMBOL_GPL(pasemi_irq_handler);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Olof Johansson <olof@lixom.net>");
+MODULE_DESCRIPTION("PA Semi PWRficient SMBus driver");
index 0d2e7171e3a6f94a66d2ff85861723e0f2caea00..da94df466e83c9d34c6212681c087cafc8a6b788 100644 (file)
@@ -613,20 +613,20 @@ static int geni_i2c_gpi_xfer(struct geni_i2c_dev *gi2c, struct i2c_msg msgs[], i
 
                peripheral.addr = msgs[i].addr;
 
+               ret =  geni_i2c_gpi(gi2c, &msgs[i], &config,
+                                   &tx_addr, &tx_buf, I2C_WRITE, gi2c->tx_c);
+               if (ret)
+                       goto err;
+
                if (msgs[i].flags & I2C_M_RD) {
                        ret =  geni_i2c_gpi(gi2c, &msgs[i], &config,
                                            &rx_addr, &rx_buf, I2C_READ, gi2c->rx_c);
                        if (ret)
                                goto err;
-               }
-
-               ret =  geni_i2c_gpi(gi2c, &msgs[i], &config,
-                                   &tx_addr, &tx_buf, I2C_WRITE, gi2c->tx_c);
-               if (ret)
-                       goto err;
 
-               if (msgs[i].flags & I2C_M_RD)
                        dma_async_issue_pending(gi2c->rx_c);
+               }
+
                dma_async_issue_pending(gi2c->tx_c);
 
                timeout = wait_for_completion_timeout(&gi2c->done, XFER_TIMEOUT);
index ec2a8da134e56d01be06588551db26bca47caef4..198afee5233c3d65df7552eb3343fe4d4e5d7488 100644 (file)
@@ -378,11 +378,15 @@ static int wmt_i2c_probe(struct platform_device *pdev)
 
        err = i2c_add_adapter(adap);
        if (err)
-               return err;
+               goto err_disable_clk;
 
        platform_set_drvdata(pdev, i2c_dev);
 
        return 0;
+
+err_disable_clk:
+       clk_disable_unprepare(i2c_dev->clk);
+       return err;
 }
 
 static void wmt_i2c_remove(struct platform_device *pdev)
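
The wmt probe now unwinds the prepared clock when adapter registration fails. A userspace model of that goto-based cleanup, with stubbed clock and adapter calls standing in for the kernel APIs:

    #include <stdio.h>

    static int clk_prepare_enable(void)      { return 0; }
    static void clk_disable_unprepare(void)  { puts("clock released"); }
    static int i2c_add_adapter(int fail)     { return fail ? -1 : 0; }

    /* Probe-style unwinding: any failure after the clock is enabled
     * must flow through the same cleanup label. */
    static int probe(int fail_adapter)
    {
            int err = clk_prepare_enable();

            if (err)
                    return err;

            err = i2c_add_adapter(fail_adapter);
            if (err)
                    goto err_disable_clk;

            return 0;

    err_disable_clk:
            clk_disable_unprepare();
            return err;
    }

    int main(void)
    {
            printf("probe -> %d\n", probe(1));
            return 0;
    }
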
index 91adcac875a4130d75d887b80464b76dd1f422a0..c9d7afe489e832b4a9598ffe9266084dbebd9fd6 100644 (file)
@@ -219,10 +219,12 @@ config BMA400
 
 config BMA400_I2C
        tristate
+       select REGMAP_I2C
        depends on BMA400
 
 config BMA400_SPI
        tristate
+       select REGMAP_SPI
        depends on BMA400
 
 config BMC150_ACCEL
index 90b7ae6d42b7700c9cb0a328b093352a7cec419e..484fe2e9fb1742b9adbde28d737127a4fe1d6413 100644 (file)
@@ -1429,9 +1429,11 @@ static int adxl367_verify_devid(struct adxl367_state *st)
        unsigned int val;
        int ret;
 
-       ret = regmap_read_poll_timeout(st->regmap, ADXL367_REG_DEVID, val,
-                                      val == ADXL367_DEVID_AD, 1000, 10000);
+       ret = regmap_read(st->regmap, ADXL367_REG_DEVID, &val);
        if (ret)
+               return dev_err_probe(st->dev, ret, "Failed to read dev id\n");
+
+       if (val != ADXL367_DEVID_AD)
                return dev_err_probe(st->dev, -ENODEV,
                                     "Invalid dev id 0x%02X, expected 0x%02X\n",
                                     val, ADXL367_DEVID_AD);
@@ -1510,6 +1512,8 @@ int adxl367_probe(struct device *dev, const struct adxl367_ops *ops,
        if (ret)
                return ret;
 
+       fsleep(15000);
+
        ret = adxl367_verify_devid(st);
        if (ret)
                return ret;
index b595fe94f3a321b2d8fc986d64d29dbc4f024ccc..62c74bdc0d77bff87b822d1d6ed2502ffbed6687 100644 (file)
@@ -11,7 +11,7 @@
 
 #include "adxl367.h"
 
-#define ADXL367_I2C_FIFO_DATA  0x42
+#define ADXL367_I2C_FIFO_DATA  0x18
 
 struct adxl367_i2c_state {
        struct regmap *regmap;
index feb86fe6c422df4ad3085b1c4ffea1651ad4cfd1..62490424b6aed44698c376560550af353073b72b 100644 (file)
@@ -1821,7 +1821,7 @@ static int ad4130_setup_int_clk(struct ad4130_state *st)
 {
        struct device *dev = &st->spi->dev;
        struct device_node *of_node = dev_of_node(dev);
-       struct clk_init_data init;
+       struct clk_init_data init = {};
        const char *clk_name;
        int ret;
 
@@ -1891,10 +1891,14 @@ static int ad4130_setup(struct iio_dev *indio_dev)
                return ret;
 
        /*
-        * Configure all GPIOs for output. If configured, the interrupt function
-        * of P2 takes priority over the GPIO out function.
+        * Configure unused GPIOs for output. If configured, the interrupt
+        * function of P2 takes priority over the GPIO out function.
         */
-       val =  AD4130_IO_CONTROL_GPIO_CTRL_MASK;
+       val = 0;
+       for (i = 0; i < AD4130_MAX_GPIOS; i++)
+               if (st->pins_fn[i + AD4130_AIN2_P1] == AD4130_PIN_FN_NONE)
+                       val |= FIELD_PREP(AD4130_IO_CONTROL_GPIO_CTRL_MASK, BIT(i));
+
        val |= FIELD_PREP(AD4130_IO_CONTROL_INT_PIN_SEL_MASK, st->int_pin_sel);
 
        ret = regmap_write(st->regmap, AD4130_IO_CONTROL_REG, val);
index 57700f12480382b82299c5822ae2fb42a8d7f391..70056430505752682f5fd2c3b2e56ef1db48c7b7 100644 (file)
@@ -195,7 +195,7 @@ static int ad7091r8_gpio_setup(struct ad7091r_state *st)
        st->reset_gpio = devm_gpiod_get_optional(st->dev, "reset",
                                                 GPIOD_OUT_HIGH);
        if (IS_ERR(st->reset_gpio))
-               return dev_err_probe(st->dev, PTR_ERR(st->convst_gpio),
+               return dev_err_probe(st->dev, PTR_ERR(st->reset_gpio),
                                     "Error on requesting reset GPIO\n");
 
        if (st->reset_gpio) {
index 2de5494e7c22585aa52f016dc18e30c4f8107f37..b15b7a3b66d5a4d84bf3b48d46d88b29ec7c00d7 100644 (file)
@@ -48,6 +48,18 @@ config HDC2010
          To compile this driver as a module, choose M here: the module
          will be called hdc2010.
 
+config HDC3020
+       tristate "TI HDC3020 relative humidity and temperature sensor"
+       depends on I2C
+       select CRC8
+       help
+         Say yes here to build support for the Texas Instruments
+         HDC3020, HDC3021 and HDC3022 relative humidity and temperature
+         sensors.
+
+         To compile this driver as a module, choose M here: the module
+         will be called hdc3020.
+
 config HID_SENSOR_HUMIDITY
        tristate "HID Environmental humidity sensor"
        depends on HID_SENSOR_HUB
index f19ff3de97c56743f0ac51e2768e11c7a2816846..5fbeef299f61bfff07c6dd1f2215cf147d015b4f 100644 (file)
@@ -7,6 +7,7 @@ obj-$(CONFIG_AM2315) += am2315.o
 obj-$(CONFIG_DHT11) += dht11.o
 obj-$(CONFIG_HDC100X) += hdc100x.o
 obj-$(CONFIG_HDC2010) += hdc2010.o
+obj-$(CONFIG_HDC3020) += hdc3020.o
 obj-$(CONFIG_HID_SENSOR_HUMIDITY) += hid-sensor-humidity.o
 
 hts221-y := hts221_core.o \
index 4e3311170725bc55fa3a5dd1c2a90dccb6c3aa19..ed70415512f687b6333078f9416b9a0fd6edbfdb 100644 (file)
@@ -322,7 +322,7 @@ static int hdc3020_read_raw(struct iio_dev *indio_dev,
                if (chan->type != IIO_TEMP)
                        return -EINVAL;
 
-               *val = 16852;
+               *val = -16852;
                return IIO_VAL_INT;
 
        default:
index 83e53acfbe88011f4306f19438b6f31d4cad5b22..c7f5866a177d90edef7c30bfe54ada71cc17870c 100644 (file)
@@ -8,6 +8,7 @@ config BOSCH_BNO055
 config BOSCH_BNO055_SERIAL
        tristate "Bosch BNO055 attached via UART"
        depends on SERIAL_DEV_BUS
+       select REGMAP
        select BOSCH_BNO055
        help
          Enable this to support Bosch BNO055 IMUs attached via UART.
index 66d4ba088e70ff8c0df12685af77e4372e87985d..d4f9b5d8d28d6d7850f8e5dbf2a4f3c5d9b32d50 100644 (file)
@@ -109,6 +109,8 @@ irqreturn_t inv_mpu6050_read_fifo(int irq, void *p)
        /* compute and process only all complete datum */
        nb = fifo_count / bytes_per_datum;
        fifo_count = nb * bytes_per_datum;
+       if (nb == 0)
+               goto end_session;
        /* Each FIFO data contains all sensors, so same number for FIFO and sensor data */
        fifo_period = NSEC_PER_SEC / INV_MPU6050_DIVIDER_TO_FIFO_RATE(st->chip_config.divider);
        inv_sensors_timestamp_interrupt(&st->timestamp, fifo_period, nb, nb, pf->timestamp);
index 676704f9151fcb4eb111cdd89d486a48fab91f28..e6e6e94452a32801ff7427112b33f0a4bb923d2f 100644 (file)
@@ -111,6 +111,7 @@ int inv_mpu6050_prepare_fifo(struct inv_mpu6050_state *st, bool enable)
        if (enable) {
                /* reset timestamping */
                inv_sensors_timestamp_reset(&st->timestamp);
+               inv_sensors_timestamp_apply_odr(&st->timestamp, 0, 0, 0);
                /* reset FIFO */
                d = st->chip_config.user_ctrl | INV_MPU6050_BIT_FIFO_RST;
                ret = regmap_write(st->map, st->reg->user_ctrl, d);
@@ -184,6 +185,10 @@ static int inv_mpu6050_set_enable(struct iio_dev *indio_dev, bool enable)
                if (result)
                        goto error_power_off;
        } else {
+               st->chip_config.gyro_fifo_enable = 0;
+               st->chip_config.accl_fifo_enable = 0;
+               st->chip_config.temp_fifo_enable = 0;
+               st->chip_config.magn_fifo_enable = 0;
                result = inv_mpu6050_prepare_fifo(st, false);
                if (result)
                        goto error_power_off;
index 9a85752124ddc43b10ecb12ed2c48f605395b170..173dc00762a152e414feac8f1d8d626e01d4bde8 100644 (file)
@@ -1584,10 +1584,13 @@ static int iio_device_register_sysfs(struct iio_dev *indio_dev)
        ret = iio_device_register_sysfs_group(indio_dev,
                                              &iio_dev_opaque->chan_attr_group);
        if (ret)
-               goto error_clear_attrs;
+               goto error_free_chan_attrs;
 
        return 0;
 
+error_free_chan_attrs:
+       kfree(iio_dev_opaque->chan_attr_group.attrs);
+       iio_dev_opaque->chan_attr_group.attrs = NULL;
 error_clear_attrs:
        iio_free_chan_devattr_list(&iio_dev_opaque->channel_attr_list);
 
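The added error label frees the just-installed attribute array and clears the pointer. A minimal model of why the clearing matters:

    #include <stdio.h>
    #include <stdlib.h>

    struct group {
            char **attrs;
    };

    /* Free and clear the pointer, so a later teardown pass (or a
     * repeated error path) cannot double-free it. */
    static void free_group_attrs(struct group *g)
    {
            free(g->attrs);
            g->attrs = NULL;
    }

    int main(void)
    {
            struct group g = { .attrs = malloc(4 * sizeof(char *)) };

            free_group_attrs(&g);
            free_group_attrs(&g);  /* safe: free(NULL) is a no-op */
            printf("attrs: %p\n", (void *)g.attrs);
            return 0;
    }
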
index 5cd27f04b45e6d911ae53e7574a916455149c2a5..b6c4bef2a7bb22bbe42463ffc7d72e934e0a2591 100644 (file)
@@ -226,6 +226,7 @@ static int als_capture_sample(struct hid_sensor_hub_device *hsdev,
        case HID_USAGE_SENSOR_TIME_TIMESTAMP:
                als_state->timestamp = hid_sensor_convert_timestamp(&als_state->common_attributes,
                                                                    *(s64 *)raw_data);
+               ret = 0;
                break;
        default:
                break;
index 69938204456f8bb0c1c4777d93ee7d0b8f2421dd..42b70cd42b39359ddd542e273163cab829128ab4 100644 (file)
@@ -530,6 +530,7 @@ int rm3100_common_probe(struct device *dev, struct regmap *regmap, int irq)
        struct rm3100_data *data;
        unsigned int tmp;
        int ret;
+       int samp_rate_index;
 
        indio_dev = devm_iio_device_alloc(dev, sizeof(*data));
        if (!indio_dev)
@@ -586,9 +587,14 @@ int rm3100_common_probe(struct device *dev, struct regmap *regmap, int irq)
        ret = regmap_read(regmap, RM3100_REG_TMRC, &tmp);
        if (ret < 0)
                return ret;
+
+       samp_rate_index = tmp - RM3100_TMRC_OFFSET;
+       if (samp_rate_index < 0 || samp_rate_index >=  RM3100_SAMP_NUM) {
+               dev_err(dev, "The value read from RM3100_REG_TMRC is invalid!\n");
+               return -EINVAL;
+       }
        /* Initializing max wait time, which is double conversion time. */
-       data->conversion_time = rm3100_samp_rates[tmp - RM3100_TMRC_OFFSET][2]
-                               * 2;
+       data->conversion_time = rm3100_samp_rates[samp_rate_index][2] * 2;
 
        /* Cycle count values may not be what we want. */
        if ((tmp - RM3100_TMRC_OFFSET) == 0)
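
A standalone model of the added rm3100 range check; the two constant values below are assumed for illustration only:

    #include <stdio.h>

    /* Assumed values, for illustration only; the driver's real
     * constants live in rm3100-core.c. */
    #define RM3100_TMRC_OFFSET 0x92
    #define RM3100_SAMP_NUM    13

    /* Validate a register value before using it as a table index. */
    static int samp_rate_index(unsigned int tmrc)
    {
            int idx = (int)tmrc - RM3100_TMRC_OFFSET;

            if (idx < 0 || idx >= RM3100_SAMP_NUM)
                    return -1;  /* the driver returns -EINVAL */
            return idx;
    }

    int main(void)
    {
            printf("%d\n", samp_rate_index(0x95)); /* 3 */
            printf("%d\n", samp_rate_index(0x10)); /* -1: rejected */
            return 0;
    }
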
index 433d6fac83c4cd95f698e1063a78c36dde79b374..a444d4b2978b581ed8f4cd63b6821e23a45a0560 100644 (file)
@@ -4,6 +4,7 @@
  *
  * Inspired by the older BMP085 driver drivers/misc/bmp085-spi.c
  */
+#include <linux/bits.h>
 #include <linux/module.h>
 #include <linux/spi/spi.h>
 #include <linux/err.h>
@@ -35,6 +36,34 @@ static int bmp280_regmap_spi_read(void *context, const void *reg,
        return spi_write_then_read(spi, reg, reg_size, val, val_size);
 }
 
+static int bmp380_regmap_spi_read(void *context, const void *reg,
+                                 size_t reg_size, void *val, size_t val_size)
+{
+       struct spi_device *spi = to_spi_device(context);
+       u8 rx_buf[4];
+       ssize_t status;
+
+       /*
+        * Maximum number of consecutive bytes read for a temperature or
+        * pressure measurement is 3.
+        */
+       if (val_size > 3)
+               return -EINVAL;
+
+       /*
+        * According to the BMP3xx datasheets, for a basic SPI read operation,
+        * the first byte needs to be dropped and the rest are the requested
+        * data.
+        */
+       status = spi_write_then_read(spi, reg, 1, rx_buf, val_size + 1);
+       if (status)
+               return status;
+
+       memcpy(val, rx_buf + 1, val_size);
+
+       return 0;
+}
+
 static struct regmap_bus bmp280_regmap_bus = {
        .write = bmp280_regmap_spi_write,
        .read = bmp280_regmap_spi_read,
@@ -42,10 +71,19 @@ static struct regmap_bus bmp280_regmap_bus = {
        .val_format_endian_default = REGMAP_ENDIAN_BIG,
 };
 
+static struct regmap_bus bmp380_regmap_bus = {
+       .write = bmp280_regmap_spi_write,
+       .read = bmp380_regmap_spi_read,
+       .read_flag_mask = BIT(7),
+       .reg_format_endian_default = REGMAP_ENDIAN_BIG,
+       .val_format_endian_default = REGMAP_ENDIAN_BIG,
+};
+
 static int bmp280_spi_probe(struct spi_device *spi)
 {
        const struct spi_device_id *id = spi_get_device_id(spi);
        const struct bmp280_chip_info *chip_info;
+       struct regmap_bus *bmp_regmap_bus;
        struct regmap *regmap;
        int ret;
 
@@ -58,8 +96,18 @@ static int bmp280_spi_probe(struct spi_device *spi)
 
        chip_info = spi_get_device_match_data(spi);
 
+       switch (chip_info->chip_id[0]) {
+       case BMP380_CHIP_ID:
+       case BMP390_CHIP_ID:
+               bmp_regmap_bus = &bmp380_regmap_bus;
+               break;
+       default:
+               bmp_regmap_bus = &bmp280_regmap_bus;
+               break;
+       }
+
        regmap = devm_regmap_init(&spi->dev,
-                                 &bmp280_regmap_bus,
+                                 bmp_regmap_bus,
                                  &spi->dev,
                                  chip_info->regmap_config);
        if (IS_ERR(regmap)) {
@@ -87,6 +135,7 @@ static const struct of_device_id bmp280_of_spi_match[] = {
 MODULE_DEVICE_TABLE(of, bmp280_of_spi_match);
 
 static const struct spi_device_id bmp280_spi_id[] = {
+       { "bmp085", (kernel_ulong_t)&bmp180_chip_info },
        { "bmp180", (kernel_ulong_t)&bmp180_chip_info },
        { "bmp181", (kernel_ulong_t)&bmp180_chip_info },
        { "bmp280", (kernel_ulong_t)&bmp280_chip_info },
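
The core of the new bmp380 read helper: the chip clocks out one dummy byte before the payload, so the transfer reads val_size + 1 bytes and drops the first. A sketch with a byte array standing in for the SPI transfer:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* `wire` stands in for the bytes returned on MISO: one dummy byte
     * followed by the requested payload. */
    static int bmp380_read(const uint8_t *wire, uint8_t *val, size_t val_size)
    {
            if (val_size > 3)  /* longest burst is a 3-byte measurement */
                    return -1; /* the driver returns -EINVAL */
            memcpy(val, wire + 1, val_size);
            return 0;
    }

    int main(void)
    {
            const uint8_t wire[4] = { 0xFF, 0x11, 0x22, 0x33 }; /* dummy + 3 data bytes */
            uint8_t val[3];

            if (!bmp380_read(wire, val, sizeof(val)))
                    printf("%02x %02x %02x\n", val[0], val[1], val[2]); /* 11 22 33 */
            return 0;
    }
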
index 28c8269ba65d31f1547fc0e35c0fc9465c4b9740..0bba4c5a8d4059ba24eebbc1acb5732ebd1efc31 100644 (file)
@@ -250,18 +250,17 @@ static irqreturn_t dlh_trigger_handler(int irq, void *private)
        struct dlh_state *st = iio_priv(indio_dev);
        int ret;
        unsigned int chn, i = 0;
-       __be32 tmp_buf[2];
+       __be32 tmp_buf[2] = { };
 
        ret = dlh_start_capture_and_read(st);
        if (ret)
                goto out;
 
        for_each_set_bit(chn, indio_dev->active_scan_mask,
-               indio_dev->masklength) {
-               memcpy(tmp_buf + i,
+                        indio_dev->masklength) {
+               memcpy(&tmp_buf[i++],
                        &st->rx_buf[1] + chn * DLH_NUM_DATA_BYTES,
                        DLH_NUM_DATA_BYTES);
-               i++;
        }
 
        iio_push_to_buffers(indio_dev, tmp_buf);
index 824349659d69dc8e9ea9c1b5254d469628a5f933..ce9c5bae83bf1b934338d465ce25c5fba4e6ab2c 100644 (file)
@@ -401,6 +401,10 @@ static void bnxt_re_create_fence_wqe(struct bnxt_re_pd *pd)
        struct bnxt_re_fence_data *fence = &pd->fence;
        struct ib_mr *ib_mr = &fence->mr->ib_mr;
        struct bnxt_qplib_swqe *wqe = &fence->bind_wqe;
+       struct bnxt_re_dev *rdev = pd->rdev;
+
+       if (bnxt_qplib_is_chip_gen_p5_p7(rdev->chip_ctx))
+               return;
 
        memset(wqe, 0, sizeof(*wqe));
        wqe->type = BNXT_QPLIB_SWQE_TYPE_BIND_MW;
@@ -455,6 +459,9 @@ static void bnxt_re_destroy_fence_mr(struct bnxt_re_pd *pd)
        struct device *dev = &rdev->en_dev->pdev->dev;
        struct bnxt_re_mr *mr = fence->mr;
 
+       if (bnxt_qplib_is_chip_gen_p5_p7(rdev->chip_ctx))
+               return;
+
        if (fence->mw) {
                bnxt_re_dealloc_mw(fence->mw);
                fence->mw = NULL;
@@ -486,6 +493,9 @@ static int bnxt_re_create_fence_mr(struct bnxt_re_pd *pd)
        struct ib_mw *mw;
        int rc;
 
+       if (bnxt_qplib_is_chip_gen_p5_p7(rdev->chip_ctx))
+               return 0;
+
        dma_addr = dma_map_single(dev, fence->va, BNXT_RE_FENCE_BYTES,
                                  DMA_BIDIRECTIONAL);
        rc = dma_mapping_error(dev, dma_addr);
@@ -1817,7 +1827,7 @@ int bnxt_re_modify_srq(struct ib_srq *ib_srq, struct ib_srq_attr *srq_attr,
        switch (srq_attr_mask) {
        case IB_SRQ_MAX_WR:
                /* SRQ resize is not supported */
-               break;
+               return -EINVAL;
        case IB_SRQ_LIMIT:
                /* Change the SRQ threshold */
                if (srq_attr->srq_limit > srq->qplib_srq.max_wqe)
@@ -1832,13 +1842,12 @@ int bnxt_re_modify_srq(struct ib_srq *ib_srq, struct ib_srq_attr *srq_attr,
                /* On success, update the shadow */
                srq->srq_limit = srq_attr->srq_limit;
                /* No need to Build and send response back to udata */
-               break;
+               return 0;
        default:
                ibdev_err(&rdev->ibdev,
                          "Unsupported srq_attr_mask 0x%x", srq_attr_mask);
                return -EINVAL;
        }
-       return 0;
 }
 
 int bnxt_re_query_srq(struct ib_srq *ib_srq, struct ib_srq_attr *srq_attr)
@@ -2556,11 +2565,6 @@ static int bnxt_re_build_inv_wqe(const struct ib_send_wr *wr,
        wqe->type = BNXT_QPLIB_SWQE_TYPE_LOCAL_INV;
        wqe->local_inv.inv_l_key = wr->ex.invalidate_rkey;
 
-       /* Need unconditional fence for local invalidate
-        * opcode to work as expected.
-        */
-       wqe->flags |= BNXT_QPLIB_SWQE_FLAGS_UC_FENCE;
-
        if (wr->send_flags & IB_SEND_SIGNALED)
                wqe->flags |= BNXT_QPLIB_SWQE_FLAGS_SIGNAL_COMP;
        if (wr->send_flags & IB_SEND_SOLICITED)
@@ -2583,12 +2587,6 @@ static int bnxt_re_build_reg_wqe(const struct ib_reg_wr *wr,
        wqe->frmr.levels = qplib_frpl->hwq.level;
        wqe->type = BNXT_QPLIB_SWQE_TYPE_REG_MR;
 
-       /* Need unconditional fence for reg_mr
-        * opcode to function as expected.
-        */
-
-       wqe->flags |= BNXT_QPLIB_SWQE_FLAGS_UC_FENCE;
-
        if (wr->wr.send_flags & IB_SEND_SIGNALED)
                wqe->flags |= BNXT_QPLIB_SWQE_FLAGS_SIGNAL_COMP;
 
@@ -2719,6 +2717,18 @@ bad:
        return rc;
 }
 
+static void bnxt_re_legacy_set_uc_fence(struct bnxt_qplib_swqe *wqe)
+{
+       /* Need unconditional fence for non-wire memory opcode
+        * to work as expected.
+        */
+       if (wqe->type == BNXT_QPLIB_SWQE_TYPE_LOCAL_INV ||
+           wqe->type == BNXT_QPLIB_SWQE_TYPE_FAST_REG_MR ||
+           wqe->type == BNXT_QPLIB_SWQE_TYPE_REG_MR ||
+           wqe->type == BNXT_QPLIB_SWQE_TYPE_BIND_MW)
+               wqe->flags |= BNXT_QPLIB_SWQE_FLAGS_UC_FENCE;
+}
+
 int bnxt_re_post_send(struct ib_qp *ib_qp, const struct ib_send_wr *wr,
                      const struct ib_send_wr **bad_wr)
 {
@@ -2798,8 +2808,11 @@ int bnxt_re_post_send(struct ib_qp *ib_qp, const struct ib_send_wr *wr,
                        rc = -EINVAL;
                        goto bad;
                }
-               if (!rc)
+               if (!rc) {
+                       if (!bnxt_qplib_is_chip_gen_p5_p7(qp->rdev->chip_ctx))
+                               bnxt_re_legacy_set_uc_fence(&wqe);
                        rc = bnxt_qplib_post_send(&qp->qplib_qp, &wqe);
+               }
 bad:
                if (rc) {
                        ibdev_err(&qp->rdev->ibdev,
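
A standalone model of the new bnxt_re_legacy_set_uc_fence() decision, using placeholder enum names for the BNXT_QPLIB_SWQE_TYPE_* constants; in the driver the caller additionally gates this on pre-P5 chips:

    #include <stdbool.h>
    #include <stdio.h>

    enum wqe_type {
            WQE_SEND,
            WQE_LOCAL_INV,
            WQE_FAST_REG_MR,
            WQE_REG_MR,
            WQE_BIND_MW,
    };

    /* One helper decides the unconditional fence for all non-wire
     * memory opcodes, instead of scattering the flag across the
     * individual build functions. */
    static bool needs_uc_fence(enum wqe_type type)
    {
            switch (type) {
            case WQE_LOCAL_INV:
            case WQE_FAST_REG_MR:
            case WQE_REG_MR:
            case WQE_BIND_MW:
                    return true;
            default:
                    return false;
            }
    }

    int main(void)
    {
            printf("send: %d, reg_mr: %d\n",
                   needs_uc_fence(WQE_SEND), needs_uc_fence(WQE_REG_MR));
            return 0;
    }
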
index f022c922fae5183cb6860092e5bd0662d22f1764..54b4d2f3a5d885d1f17643a2416420cb6b805b8a 100644 (file)
@@ -280,9 +280,6 @@ static void bnxt_re_set_resource_limits(struct bnxt_re_dev *rdev)
 
 static void bnxt_re_vf_res_config(struct bnxt_re_dev *rdev)
 {
-
-       if (test_bit(BNXT_RE_FLAG_ERR_DEVICE_DETACHED, &rdev->flags))
-               return;
        rdev->num_vfs = pci_sriov_get_totalvfs(rdev->en_dev->pdev);
        if (!bnxt_qplib_is_chip_gen_p5_p7(rdev->chip_ctx)) {
                bnxt_re_set_resource_limits(rdev);
index c98e04fe2ddd477dd8457c09bef64c9339b992f4..439d0c7c5d0cab91e028b380435aaf898f9856c3 100644 (file)
@@ -744,7 +744,8 @@ int bnxt_qplib_query_srq(struct bnxt_qplib_res *res,
        bnxt_qplib_fill_cmdqmsg(&msg, &req, &resp, &sbuf, sizeof(req),
                                sizeof(resp), 0);
        rc = bnxt_qplib_rcfw_send_message(rcfw, &msg);
-       srq->threshold = le16_to_cpu(sb->srq_limit);
+       if (!rc)
+               srq->threshold = le16_to_cpu(sb->srq_limit);
        dma_free_coherent(&rcfw->pdev->dev, sbuf.size,
                          sbuf.sb, sbuf.dma_addr);
 
index 68c621ff59d03fea9340eb50a56363d24c9bc105..5a91cbda4aee6f769385d6a4eab9aa191d0e44d4 100644 (file)
@@ -2086,7 +2086,7 @@ int init_credit_return(struct hfi1_devdata *dd)
                                   "Unable to allocate credit return DMA range for NUMA %d\n",
                                   i);
                        ret = -ENOMEM;
-                       goto done;
+                       goto free_cr_base;
                }
        }
        set_dev_node(&dd->pcidev->dev, dd->node);
@@ -2094,6 +2094,10 @@ int init_credit_return(struct hfi1_devdata *dd)
        ret = 0;
 done:
        return ret;
+
+free_cr_base:
+       free_credit_return(dd);
+       goto done;
 }
 
 void free_credit_return(struct hfi1_devdata *dd)
index 6e5ac2023328a7d59d42f6532113dd9a95641b31..b67d23b1f28625c5ed7a4f15f8a07a32d074199b 100644 (file)
@@ -3158,7 +3158,7 @@ int _pad_sdma_tx_descs(struct hfi1_devdata *dd, struct sdma_txreq *tx)
 {
        int rval = 0;
 
-       if ((unlikely(tx->num_desc + 1 == tx->desc_limit))) {
+       if ((unlikely(tx->num_desc == tx->desc_limit))) {
                rval = _extend_sdma_tx_descs(dd, tx);
                if (rval) {
                        __sdma_txclean(dd, tx);
index 8fb752f2eda2999aed4f61bffcb53e105adde9a5..2cb4b96db7212163f1e207bb87dd0b1326ad26e1 100644 (file)
@@ -346,6 +346,7 @@ enum irdma_cqp_op_type {
 #define IRDMA_AE_LLP_TOO_MANY_KEEPALIVE_RETRIES                                0x050b
 #define IRDMA_AE_LLP_DOUBT_REACHABILITY                                        0x050c
 #define IRDMA_AE_LLP_CONNECTION_ESTABLISHED                            0x050e
+#define IRDMA_AE_LLP_TOO_MANY_RNRS                                     0x050f
 #define IRDMA_AE_RESOURCE_EXHAUSTION                                   0x0520
 #define IRDMA_AE_RESET_SENT                                            0x0601
 #define IRDMA_AE_TERMINATE_SENT                                                0x0602
index bd4b2b89644442341226e6c5716f5ddb221ea1a1..ad50b77282f8a1b5352e390080d208d0086152eb 100644 (file)
@@ -387,6 +387,7 @@ static void irdma_process_aeq(struct irdma_pci_f *rf)
                case IRDMA_AE_LLP_TOO_MANY_RETRIES:
                case IRDMA_AE_LCE_QP_CATASTROPHIC:
                case IRDMA_AE_LCE_FUNCTION_CATASTROPHIC:
+               case IRDMA_AE_LLP_TOO_MANY_RNRS:
                case IRDMA_AE_LCE_CQ_CATASTROPHIC:
                case IRDMA_AE_UDA_XMIT_DGRAM_TOO_LONG:
                default:
@@ -570,6 +571,13 @@ static void irdma_destroy_irq(struct irdma_pci_f *rf,
        dev->irq_ops->irdma_dis_irq(dev, msix_vec->idx);
        irq_update_affinity_hint(msix_vec->irq, NULL);
        free_irq(msix_vec->irq, dev_id);
+       if (rf == dev_id) {
+               tasklet_kill(&rf->dpc_tasklet);
+       } else {
+               struct irdma_ceq *iwceq = (struct irdma_ceq *)dev_id;
+
+               tasklet_kill(&iwceq->dpc_tasklet);
+       }
 }
 
 /**
index b5eb8d421988c1abd73cf4eb3a93adc6f2944089..0b046c061742be140251785f60ac25cff73aa2ba 100644 (file)
@@ -839,7 +839,9 @@ static int irdma_validate_qp_attrs(struct ib_qp_init_attr *init_attr,
 
        if (init_attr->cap.max_inline_data > uk_attrs->max_hw_inline ||
            init_attr->cap.max_send_sge > uk_attrs->max_hw_wq_frags ||
-           init_attr->cap.max_recv_sge > uk_attrs->max_hw_wq_frags)
+           init_attr->cap.max_recv_sge > uk_attrs->max_hw_wq_frags ||
+           init_attr->cap.max_send_wr > uk_attrs->max_hw_wq_quanta ||
+           init_attr->cap.max_recv_wr > uk_attrs->max_hw_rq_quanta)
                return -EINVAL;
 
        if (rdma_protocol_roce(&iwdev->ibdev, 1)) {
@@ -2184,9 +2186,8 @@ static int irdma_create_cq(struct ib_cq *ibcq,
                info.cq_base_pa = iwcq->kmem.pa;
        }
 
-       if (dev->hw_attrs.uk_attrs.hw_rev >= IRDMA_GEN_2)
-               info.shadow_read_threshold = min(info.cq_uk_init_info.cq_size / 2,
-                                                (u32)IRDMA_MAX_CQ_READ_THRESH);
+       info.shadow_read_threshold = min(info.cq_uk_init_info.cq_size / 2,
+                                        (u32)IRDMA_MAX_CQ_READ_THRESH);
 
        if (irdma_sc_cq_init(cq, &info)) {
                ibdev_dbg(&iwdev->ibdev, "VERBS: init cq fail\n");
index f87531318feb807c7c5a216c991e10f197e9f8f4..a78a067e3ce7f3abd260c09f552562050b7b78cc 100644 (file)
@@ -458,6 +458,12 @@ void mlx5_ib_init_cong_debugfs(struct mlx5_ib_dev *dev, u32 port_num)
        dbg_cc_params->root = debugfs_create_dir("cc_params", mlx5_debugfs_get_dev_root(mdev));
 
        for (i = 0; i < MLX5_IB_DBG_CC_MAX; i++) {
+               if ((i == MLX5_IB_DBG_CC_GENERAL_RTT_RESP_DSCP_VALID ||
+                    i == MLX5_IB_DBG_CC_GENERAL_RTT_RESP_DSCP))
+                       if (!MLX5_CAP_GEN(mdev, roce) ||
+                           !MLX5_CAP_ROCE(mdev, roce_cc_general))
+                               continue;
+
                dbg_cc_params->params[i].offset = i;
                dbg_cc_params->params[i].dev = dev;
                dbg_cc_params->params[i].port_num = port_num;
index 869369cb5b5fa4745aaca7bc5eb7032e684bb132..253fea374a72de1d1143b82601da2ce9caf1cf1f 100644 (file)
@@ -2949,7 +2949,7 @@ DECLARE_UVERBS_NAMED_METHOD(
        MLX5_IB_METHOD_DEVX_OBJ_MODIFY,
        UVERBS_ATTR_IDR(MLX5_IB_ATTR_DEVX_OBJ_MODIFY_HANDLE,
                        UVERBS_IDR_ANY_OBJECT,
-                       UVERBS_ACCESS_WRITE,
+                       UVERBS_ACCESS_READ,
                        UA_MANDATORY),
        UVERBS_ATTR_PTR_IN(
                MLX5_IB_ATTR_DEVX_OBJ_MODIFY_CMD_IN,
index df1d1b0a3ef72bfc938c6cb61b5589e5ef7b7ff4..9947feb7fb8a0bcd1ecf9e5d136e9ea7e326e8e7 100644 (file)
@@ -78,7 +78,7 @@ static void set_eth_seg(const struct ib_send_wr *wr, struct mlx5_ib_qp *qp,
                 */
                copysz = min_t(u64, *cur_edge - (void *)eseg->inline_hdr.start,
                               left);
-               memcpy(eseg->inline_hdr.start, pdata, copysz);
+               memcpy(eseg->inline_hdr.data, pdata, copysz);
                stride = ALIGN(sizeof(struct mlx5_wqe_eth_seg) -
                               sizeof(eseg->inline_hdr.start) + copysz, 16);
                *size += stride / 16;
index 7887a6786ed43d6917a97b2dfbd8770c49383fbd..f118ce0a9a617b4226d0195048299827f2a11d37 100644 (file)
@@ -1879,8 +1879,17 @@ static int qedr_create_user_qp(struct qedr_dev *dev,
                /* RQ - read access only (0) */
                rc = qedr_init_user_queue(udata, dev, &qp->urq, ureq.rq_addr,
                                          ureq.rq_len, true, 0, alloc_and_init);
-               if (rc)
+               if (rc) {
+                       ib_umem_release(qp->usq.umem);
+                       qp->usq.umem = NULL;
+                       if (rdma_protocol_roce(&dev->ibdev, 1)) {
+                               qedr_free_pbl(dev, &qp->usq.pbl_info,
+                                             qp->usq.pbl_tbl);
+                       } else {
+                               kfree(qp->usq.pbl_tbl);
+                       }
                        return rc;
+               }
        }
 
        memset(&in_params, 0, sizeof(in_params));
index 58f70cfec45a72abd8df2ba88098a92f7fcacb4a..040234c01be4d5a0cc6fb4a4124af4752f58e181 100644 (file)
@@ -79,12 +79,16 @@ module_param(srpt_srq_size, int, 0444);
 MODULE_PARM_DESC(srpt_srq_size,
                 "Shared receive queue (SRQ) size.");
 
+static int srpt_set_u64_x(const char *buffer, const struct kernel_param *kp)
+{
+       return kstrtou64(buffer, 16, (u64 *)kp->arg);
+}
 static int srpt_get_u64_x(char *buffer, const struct kernel_param *kp)
 {
        return sprintf(buffer, "0x%016llx\n", *(u64 *)kp->arg);
 }
-module_param_call(srpt_service_guid, NULL, srpt_get_u64_x, &srpt_service_guid,
-                 0444);
+module_param_call(srpt_service_guid, srpt_set_u64_x, srpt_get_u64_x,
+                 &srpt_service_guid, 0444);
 MODULE_PARM_DESC(srpt_service_guid,
                 "Using this value for ioc_guid, id_ext, and cm_listen_id instead of using the node_guid of the first HCA.");
 
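A userspace stand-in for the new srpt setter, assuming strtoull in place of kstrtou64 so the parameter parses in the same base-16 format the getter prints:

    #include <errno.h>
    #include <inttypes.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Parse base-16 input, matching the "0x%016llx" format the getter
     * prints. */
    static int set_u64_x(const char *buffer, uint64_t *arg)
    {
            char *end;
            uint64_t v;

            errno = 0;
            v = strtoull(buffer, &end, 16);
            if (errno || end == buffer)
                    return -EINVAL;
            *arg = v;
            return 0;
    }

    int main(void)
    {
            uint64_t guid = 0;

            if (!set_u64_x("0x0002c90300a34f50", &guid)) /* sample input only */
                    printf("0x%016" PRIx64 "\n", guid);
            return 0;
    }
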
@@ -210,10 +214,12 @@ static const char *get_ch_state_name(enum rdma_ch_state s)
 /**
  * srpt_qp_event - QP event callback function
  * @event: Description of the event that occurred.
- * @ch: SRPT RDMA channel.
+ * @ptr: SRPT RDMA channel.
  */
-static void srpt_qp_event(struct ib_event *event, struct srpt_rdma_ch *ch)
+static void srpt_qp_event(struct ib_event *event, void *ptr)
 {
+       struct srpt_rdma_ch *ch = ptr;
+
        pr_debug("QP event %d on ch=%p sess_name=%s-%d state=%s\n",
                 event->event, ch, ch->sess_name, ch->qp->qp_num,
                 get_ch_state_name(ch->state));
@@ -1807,8 +1813,7 @@ retry:
        ch->cq_size = ch->rq_size + sq_size;
 
        qp_init->qp_context = (void *)ch;
-       qp_init->event_handler
-               = (void(*)(struct ib_event *, void*))srpt_qp_event;
+       qp_init->event_handler = srpt_qp_event;
        qp_init->send_cq = ch->cq;
        qp_init->recv_cq = ch->cq;
        qp_init->sq_sig_type = IB_SIGNAL_REQ_WR;
index b1244d7df6cc9e097a11257f9bb530b8954e3c12..14c828adebf7829269b7bace9b1bcbda0c7c506c 100644 (file)
@@ -130,7 +130,12 @@ static const struct xpad_device {
        { 0x0079, 0x18d4, "GPD Win 2 X-Box Controller", 0, XTYPE_XBOX360 },
        { 0x03eb, 0xff01, "Wooting One (Legacy)", 0, XTYPE_XBOX360 },
        { 0x03eb, 0xff02, "Wooting Two (Legacy)", 0, XTYPE_XBOX360 },
+       { 0x03f0, 0x038D, "HyperX Clutch", 0, XTYPE_XBOX360 },                  /* wired */
+       { 0x03f0, 0x048D, "HyperX Clutch", 0, XTYPE_XBOX360 },                  /* wireless */
        { 0x03f0, 0x0495, "HyperX Clutch Gladiate", 0, XTYPE_XBOXONE },
+       { 0x03f0, 0x07A0, "HyperX Clutch Gladiate RGB", 0, XTYPE_XBOXONE },
+       { 0x03f0, 0x08B6, "HyperX Clutch Gladiate", 0, XTYPE_XBOXONE },         /* v2 */
+       { 0x03f0, 0x09B4, "HyperX Clutch Tanto", 0, XTYPE_XBOXONE },
        { 0x044f, 0x0f00, "Thrustmaster Wheel", 0, XTYPE_XBOX },
        { 0x044f, 0x0f03, "Thrustmaster Wheel", 0, XTYPE_XBOX },
        { 0x044f, 0x0f07, "Thrustmaster, Inc. Controller", 0, XTYPE_XBOX },
@@ -294,6 +299,7 @@ static const struct xpad_device {
        { 0x1689, 0xfd00, "Razer Onza Tournament Edition", 0, XTYPE_XBOX360 },
        { 0x1689, 0xfd01, "Razer Onza Classic Edition", 0, XTYPE_XBOX360 },
        { 0x1689, 0xfe00, "Razer Sabertooth", 0, XTYPE_XBOX360 },
+       { 0x17ef, 0x6182, "Lenovo Legion Controller for Windows", 0, XTYPE_XBOX360 },
        { 0x1949, 0x041a, "Amazon Game Controller", 0, XTYPE_XBOX360 },
        { 0x1bad, 0x0002, "Harmonix Rock Band Guitar", 0, XTYPE_XBOX360 },
        { 0x1bad, 0x0003, "Harmonix Rock Band Drumkit", MAP_DPAD_TO_BUTTONS, XTYPE_XBOX360 },
@@ -462,6 +468,7 @@ static const struct usb_device_id xpad_table[] = {
        { USB_INTERFACE_INFO('X', 'B', 0) },    /* Xbox USB-IF not-approved class */
        XPAD_XBOX360_VENDOR(0x0079),            /* GPD Win 2 controller */
        XPAD_XBOX360_VENDOR(0x03eb),            /* Wooting Keyboards (Legacy) */
+       XPAD_XBOX360_VENDOR(0x03f0),            /* HP HyperX Xbox 360 controllers */
        XPAD_XBOXONE_VENDOR(0x03f0),            /* HP HyperX Xbox One controllers */
        XPAD_XBOX360_VENDOR(0x044f),            /* Thrustmaster Xbox 360 controllers */
        XPAD_XBOX360_VENDOR(0x045e),            /* Microsoft Xbox 360 controllers */
@@ -491,6 +498,7 @@ static const struct usb_device_id xpad_table[] = {
        XPAD_XBOX360_VENDOR(0x15e4),            /* Numark Xbox 360 controllers */
        XPAD_XBOX360_VENDOR(0x162e),            /* Joytech Xbox 360 controllers */
        XPAD_XBOX360_VENDOR(0x1689),            /* Razer Onza */
+       XPAD_XBOX360_VENDOR(0x17ef),            /* Lenovo */
        XPAD_XBOX360_VENDOR(0x1949),            /* Amazon controllers */
        XPAD_XBOX360_VENDOR(0x1bad),            /* Harmonix Rock Band guitar and drums */
        XPAD_XBOX360_VENDOR(0x20d6),            /* PowerA controllers */
index 13ef6284223da30940e5a37802d04a104d2692f6..7f67f9f2946b484317575d529ee35a385fc2882e 100644 (file)
@@ -811,7 +811,6 @@ static int atkbd_probe(struct atkbd *atkbd)
 {
        struct ps2dev *ps2dev = &atkbd->ps2dev;
        unsigned char param[2];
-       bool skip_getid;
 
 /*
  * Some systems, where the bit-twiddling when testing the io-lines of the
@@ -825,6 +824,11 @@ static int atkbd_probe(struct atkbd *atkbd)
                                 "keyboard reset failed on %s\n",
                                 ps2dev->serio->phys);
 
+       if (atkbd_skip_getid(atkbd)) {
+               atkbd->id = 0xab83;
+               goto deactivate_kbd;
+       }
+
 /*
  * Then we check the keyboard ID. We should get 0xab83 under normal conditions.
  * Some keyboards report different values, but the first byte is always 0xab or
@@ -833,18 +837,17 @@ static int atkbd_probe(struct atkbd *atkbd)
  */
 
        param[0] = param[1] = 0xa5;     /* initialize with invalid values */
-       skip_getid = atkbd_skip_getid(atkbd);
-       if (skip_getid || ps2_command(ps2dev, param, ATKBD_CMD_GETID)) {
+       if (ps2_command(ps2dev, param, ATKBD_CMD_GETID)) {
 
 /*
- * If the get ID command was skipped or failed, we check if we can at least set
+ * If the get ID command failed, we check if we can at least set
  * the LEDs on the keyboard. This should work on every keyboard out there.
  * It also turns the LEDs off, which we want anyway.
  */
                param[0] = 0;
                if (ps2_command(ps2dev, param, ATKBD_CMD_SETLEDS))
                        return -1;
-               atkbd->id = skip_getid ? 0xab83 : 0xabba;
+               atkbd->id = 0xabba;
                return 0;
        }
 
@@ -860,6 +863,7 @@ static int atkbd_probe(struct atkbd *atkbd)
                return -1;
        }
 
+deactivate_kbd:
 /*
  * Make sure nothing is coming from the keyboard and disturbs our
  * internal state.
index ba00ecfbd343bc796fd48f77db92659a1c36bccc..b41fd1240f4312e06935685d00aded64076c3513 100644 (file)
@@ -315,12 +315,10 @@ static int gpio_keys_polled_probe(struct platform_device *pdev)
 
                        error = devm_gpio_request_one(dev, button->gpio,
                                        flags, button->desc ? : DRV_NAME);
-                       if (error) {
-                               dev_err(dev,
-                                       "unable to claim gpio %u, err=%d\n",
-                                       button->gpio, error);
-                               return error;
-                       }
+                       if (error)
+                               return dev_err_probe(dev, error,
+                                                    "unable to claim gpio %u\n",
+                                                    button->gpio);
 
                        bdata->gpiod = gpio_to_desc(button->gpio);
                        if (!bdata->gpiod) {
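
dev_err_probe() folds the "log and return" pair into one statement and, unlike
dev_err(), stays quiet for -EPROBE_DEFER, recording the reason for the
devices_deferred debugfs file instead. A sketch of the idiom in a probe path,
with illustrative names:

    static int foo_probe(struct platform_device *pdev)
    {
            struct device *dev = &pdev->dev;
            struct gpio_desc *reset;

            reset = devm_gpiod_get(dev, "reset", GPIOD_OUT_LOW);
            if (IS_ERR(reset))
                    /* Logs (unless deferring) and returns the error code. */
                    return dev_err_probe(dev, PTR_ERR(reset),
                                         "failed to get reset GPIO\n");
            return 0;
    }
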
index 258d5fe3d395c4670088aa0d736cac69c7d24550..42eaebb3bf5cc82efabccff777a8ee23b016bf49 100644 (file)
@@ -978,12 +978,12 @@ static int rmi_driver_remove(struct device *dev)
 
        rmi_disable_irq(rmi_dev, false);
 
-       irq_domain_remove(data->irqdomain);
-       data->irqdomain = NULL;
-
        rmi_f34_remove_sysfs(rmi_dev);
        rmi_free_function_list(rmi_dev);
 
+       irq_domain_remove(data->irqdomain);
+       data->irqdomain = NULL;
+
        return 0;
 }
 
index b585b1dab870e0725daa62d7b52d2c9ca406798a..dfc6c581873b7d45da63d88a216295a24fa2c13b 100644 (file)
@@ -634,6 +634,14 @@ static const struct dmi_system_id i8042_dmi_quirk_table[] __initconst = {
                },
                .driver_data = (void *)(SERIO_QUIRK_NOAUX)
        },
+       {
+               /* Fujitsu Lifebook U728 */
+               .matches = {
+                       DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU"),
+                       DMI_MATCH(DMI_PRODUCT_NAME, "LIFEBOOK U728"),
+               },
+               .driver_data = (void *)(SERIO_QUIRK_NOAUX)
+       },
        {
                /* Gigabyte M912 */
                .matches = {
@@ -1208,6 +1216,12 @@ static const struct dmi_system_id i8042_dmi_quirk_table[] __initconst = {
                                        SERIO_QUIRK_RESET_ALWAYS | SERIO_QUIRK_NOLOOP |
                                        SERIO_QUIRK_NOPNP)
        },
+       {
+               .matches = {
+                       DMI_MATCH(DMI_BOARD_NAME, "NS5x_7xPU"),
+               },
+               .driver_data = (void *)(SERIO_QUIRK_NOAUX)
+       },
        {
                .matches = {
                        DMI_MATCH(DMI_BOARD_NAME, "NJ50_70CU"),
index af32fbe57b630373f6fd8b67e3129be1d54de1d1..b068ff8afbc9ad3ba62b70cbbee20feb572c3855 100644 (file)
@@ -884,7 +884,8 @@ static int goodix_add_acpi_gpio_mappings(struct goodix_ts_data *ts)
                }
        }
 
-       if (ts->gpio_count == 2 && ts->gpio_int_idx == 0) {
+       /* Some devices with gpio_int_idx 0 list a third unused GPIO */
+       if ((ts->gpio_count == 2 || ts->gpio_count == 3) && ts->gpio_int_idx == 0) {
                ts->irq_pin_access_method = IRQ_PIN_ACCESS_ACPI_GPIO;
                gpio_mapping = acpi_goodix_int_first_gpios;
        } else if (ts->gpio_count == 2 && ts->gpio_int_idx == 1) {
index 20331e119beb694945b196df9fb2c7efff60feda..03d626776ba17a3ff18c91c1e685a0230e8fcbbb 100644 (file)
@@ -1372,6 +1372,7 @@ static struct qcom_icc_bcm bcm_mm0 = {
 
 static struct qcom_icc_bcm bcm_co0 = {
        .name = "CO0",
+       .keepalive = true,
        .num_nodes = 1,
        .nodes = { &slv_qns_cdsp_mem_noc }
 };
index 629faa4c9aaee280e7514695dcd2c96e9125d1dd..fc22cecf650fc4eedaf3970a6a8f025f7e9d849e 100644 (file)
@@ -2223,6 +2223,7 @@ static struct platform_driver qnoc_driver = {
        .driver = {
                .name = "qnoc-sm8550",
                .of_match_table = qnoc_of_match,
+               .sync_state = icc_sync_state,
        },
 };
 
index b83de54577b6874553624390e0e0145fd966a38a..b962e6c233ef78ed3ed44cf0b0777bb62fbd50a6 100644 (file)
@@ -1160,7 +1160,7 @@ static struct qcom_icc_node qns_gemnoc_sf = {
 
 static struct qcom_icc_bcm bcm_acv = {
        .name = "ACV",
-       .enable_mask = BIT(3),
+       .enable_mask = BIT(0),
        .num_nodes = 1,
        .nodes = { &ebi },
 };
index d19501d913b39c696a337ff4f5d3a54aa07915c4..cbaf4f9c41be656212b50dce683273911e1e1cd6 100644 (file)
@@ -1586,6 +1586,7 @@ static struct qcom_icc_node qns_pcie_south_gem_noc_pcie = {
 
 static struct qcom_icc_bcm bcm_acv = {
        .name = "ACV",
+       .enable_mask = BIT(3),
        .num_nodes = 1,
        .nodes = { &ebi },
 };
index 05722121f00e70689680ce7a45cc5e953f50210b..4a27fbdb2d8446cb6af2b0e287580615c7da47c1 100644 (file)
@@ -292,10 +292,8 @@ arm_smmu_mmu_notifier_get(struct arm_smmu_domain *smmu_domain,
                          struct mm_struct *mm)
 {
        int ret;
-       unsigned long flags;
        struct arm_smmu_ctx_desc *cd;
        struct arm_smmu_mmu_notifier *smmu_mn;
-       struct arm_smmu_master *master;
 
        list_for_each_entry(smmu_mn, &smmu_domain->mmu_notifiers, list) {
                if (smmu_mn->mn.mm == mm) {
@@ -325,28 +323,9 @@ arm_smmu_mmu_notifier_get(struct arm_smmu_domain *smmu_domain,
                goto err_free_cd;
        }
 
-       spin_lock_irqsave(&smmu_domain->devices_lock, flags);
-       list_for_each_entry(master, &smmu_domain->devices, domain_head) {
-               ret = arm_smmu_write_ctx_desc(master, mm_get_enqcmd_pasid(mm),
-                                             cd);
-               if (ret) {
-                       list_for_each_entry_from_reverse(
-                               master, &smmu_domain->devices, domain_head)
-                               arm_smmu_write_ctx_desc(
-                                       master, mm_get_enqcmd_pasid(mm), NULL);
-                       break;
-               }
-       }
-       spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
-       if (ret)
-               goto err_put_notifier;
-
        list_add(&smmu_mn->list, &smmu_domain->mmu_notifiers);
        return smmu_mn;
 
-err_put_notifier:
-       /* Frees smmu_mn */
-       mmu_notifier_put(&smmu_mn->mn);
 err_free_cd:
        arm_smmu_free_shared_cd(cd);
        return ERR_PTR(ret);
@@ -363,9 +342,6 @@ static void arm_smmu_mmu_notifier_put(struct arm_smmu_mmu_notifier *smmu_mn)
 
        list_del(&smmu_mn->list);
 
-       arm_smmu_update_ctx_desc_devices(smmu_domain, mm_get_enqcmd_pasid(mm),
-                                        NULL);
-
        /*
         * If we went through clear(), we've already invalidated, and no
         * new TLB entry can have been formed.
@@ -381,7 +357,8 @@ static void arm_smmu_mmu_notifier_put(struct arm_smmu_mmu_notifier *smmu_mn)
        arm_smmu_free_shared_cd(cd);
 }
 
-static int __arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
+static int __arm_smmu_sva_bind(struct device *dev, ioasid_t pasid,
+                              struct mm_struct *mm)
 {
        int ret;
        struct arm_smmu_bond *bond;
@@ -404,9 +381,15 @@ static int __arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
                goto err_free_bond;
        }
 
+       ret = arm_smmu_write_ctx_desc(master, pasid, bond->smmu_mn->cd);
+       if (ret)
+               goto err_put_notifier;
+
        list_add(&bond->list, &master->bonds);
        return 0;
 
+err_put_notifier:
+       arm_smmu_mmu_notifier_put(bond->smmu_mn);
 err_free_bond:
        kfree(bond);
        return ret;
@@ -568,6 +551,9 @@ void arm_smmu_sva_remove_dev_pasid(struct iommu_domain *domain,
        struct arm_smmu_master *master = dev_iommu_priv_get(dev);
 
        mutex_lock(&sva_lock);
+
+       arm_smmu_write_ctx_desc(master, id, NULL);
+
        list_for_each_entry(t, &master->bonds, list) {
                if (t->mm == mm) {
                        bond = t;
@@ -590,7 +576,7 @@ static int arm_smmu_sva_set_dev_pasid(struct iommu_domain *domain,
        struct mm_struct *mm = domain->mm;
 
        mutex_lock(&sva_lock);
-       ret = __arm_smmu_sva_bind(dev, mm);
+       ret = __arm_smmu_sva_bind(dev, id, mm);
        mutex_unlock(&sva_lock);
 
        return ret;
index 68b6bc5e7c71016b8d58a6a077e921b27fb51447..6317aaf7b3ab1c7bed6f5f33b9a4bdca14cc171e 100644 (file)
@@ -859,10 +859,14 @@ static void arm_smmu_destroy_domain_context(struct arm_smmu_domain *smmu_domain)
        arm_smmu_rpm_put(smmu);
 }
 
-static struct iommu_domain *arm_smmu_domain_alloc_paging(struct device *dev)
+static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
 {
        struct arm_smmu_domain *smmu_domain;
 
+       if (type != IOMMU_DOMAIN_UNMANAGED) {
+               if (using_legacy_binding || type != IOMMU_DOMAIN_DMA)
+                       return NULL;
+       }
        /*
         * Allocate the domain and initialise some of its data structures.
         * We can't really do anything meaningful until we've added a
@@ -875,15 +879,6 @@ static struct iommu_domain *arm_smmu_domain_alloc_paging(struct device *dev)
        mutex_init(&smmu_domain->init_mutex);
        spin_lock_init(&smmu_domain->cb_lock);
 
-       if (dev) {
-               struct arm_smmu_master_cfg *cfg = dev_iommu_priv_get(dev);
-
-               if (arm_smmu_init_domain_context(smmu_domain, cfg->smmu, dev)) {
-                       kfree(smmu_domain);
-                       return NULL;
-               }
-       }
-
        return &smmu_domain->domain;
 }
 
@@ -1600,7 +1595,7 @@ static struct iommu_ops arm_smmu_ops = {
        .identity_domain        = &arm_smmu_identity_domain,
        .blocked_domain         = &arm_smmu_blocked_domain,
        .capable                = arm_smmu_capable,
-       .domain_alloc_paging    = arm_smmu_domain_alloc_paging,
+       .domain_alloc           = arm_smmu_domain_alloc,
        .probe_device           = arm_smmu_probe_device,
        .release_device         = arm_smmu_release_device,
        .probe_finalize         = arm_smmu_probe_finalize,
index 6fb5f6fceea11fb7865d92d8451a5de98a655556..11652e0bcab3a6e3113c70fb80971853df012f57 100644 (file)
@@ -396,8 +396,6 @@ static int domain_update_device_node(struct dmar_domain *domain)
        return nid;
 }
 
-static void domain_update_iotlb(struct dmar_domain *domain);
-
 /* Return the super pagesize bitmap if supported. */
 static unsigned long domain_super_pgsize_bitmap(struct dmar_domain *domain)
 {
@@ -1218,7 +1216,7 @@ domain_lookup_dev_info(struct dmar_domain *domain,
        return NULL;
 }
 
-static void domain_update_iotlb(struct dmar_domain *domain)
+void domain_update_iotlb(struct dmar_domain *domain)
 {
        struct dev_pasid_info *dev_pasid;
        struct device_domain_info *info;
@@ -1368,6 +1366,46 @@ static void domain_flush_pasid_iotlb(struct intel_iommu *iommu,
        spin_unlock_irqrestore(&domain->lock, flags);
 }
 
+static void __iommu_flush_iotlb_psi(struct intel_iommu *iommu, u16 did,
+                                   unsigned long pfn, unsigned int pages,
+                                   int ih)
+{
+       unsigned int aligned_pages = __roundup_pow_of_two(pages);
+       unsigned long bitmask = aligned_pages - 1;
+       unsigned int mask = ilog2(aligned_pages);
+       u64 addr = (u64)pfn << VTD_PAGE_SHIFT;
+
+       /*
+        * PSI masks the low order bits of the base address. If the
+        * address isn't aligned to the mask, then compute a mask value
+        * needed to ensure the target range is flushed.
+        */
+       if (unlikely(bitmask & pfn)) {
+               unsigned long end_pfn = pfn + pages - 1, shared_bits;
+
+               /*
+                * Since end_pfn <= pfn + bitmask, the only way bits
+                * higher than bitmask can differ in pfn and end_pfn is
+                * by carrying. This means after masking out bitmask,
+                * high bits starting with the first set bit in
+                * shared_bits are all equal in both pfn and end_pfn.
+                */
+               shared_bits = ~(pfn ^ end_pfn) & ~bitmask;
+               mask = shared_bits ? __ffs(shared_bits) : BITS_PER_LONG;
+       }
+
+       /*
+        * Fallback to domain selective flush if no PSI support or
+        * the size is too big.
+        */
+       if (!cap_pgsel_inv(iommu->cap) || mask > cap_max_amask_val(iommu->cap))
+               iommu->flush.flush_iotlb(iommu, did, 0, 0,
+                                        DMA_TLB_DSI_FLUSH);
+       else
+               iommu->flush.flush_iotlb(iommu, did, addr | ih, mask,
+                                        DMA_TLB_PSI_FLUSH);
+}
+
 static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
                                  struct dmar_domain *domain,
                                  unsigned long pfn, unsigned int pages,
@@ -1384,42 +1422,10 @@ static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
        if (ih)
                ih = 1 << 6;
 
-       if (domain->use_first_level) {
+       if (domain->use_first_level)
                domain_flush_pasid_iotlb(iommu, domain, addr, pages, ih);
-       } else {
-               unsigned long bitmask = aligned_pages - 1;
-
-               /*
-                * PSI masks the low order bits of the base address. If the
-                * address isn't aligned to the mask, then compute a mask value
-                * needed to ensure the target range is flushed.
-                */
-               if (unlikely(bitmask & pfn)) {
-                       unsigned long end_pfn = pfn + pages - 1, shared_bits;
-
-                       /*
-                        * Since end_pfn <= pfn + bitmask, the only way bits
-                        * higher than bitmask can differ in pfn and end_pfn is
-                        * by carrying. This means after masking out bitmask,
-                        * high bits starting with the first set bit in
-                        * shared_bits are all equal in both pfn and end_pfn.
-                        */
-                       shared_bits = ~(pfn ^ end_pfn) & ~bitmask;
-                       mask = shared_bits ? __ffs(shared_bits) : BITS_PER_LONG;
-               }
-
-               /*
-                * Fallback to domain selective flush if no PSI support or
-                * the size is too big.
-                */
-               if (!cap_pgsel_inv(iommu->cap) ||
-                   mask > cap_max_amask_val(iommu->cap))
-                       iommu->flush.flush_iotlb(iommu, did, 0, 0,
-                                                       DMA_TLB_DSI_FLUSH);
-               else
-                       iommu->flush.flush_iotlb(iommu, did, addr | ih, mask,
-                                                       DMA_TLB_PSI_FLUSH);
-       }
+       else
+               __iommu_flush_iotlb_psi(iommu, did, pfn, pages, ih);
 
        /*
         * In caching mode, changes of pages from non-present to present require
@@ -1443,6 +1449,46 @@ static void __mapping_notify_one(struct intel_iommu *iommu, struct dmar_domain *
                iommu_flush_write_buffer(iommu);
 }
 
+/*
+ * Flush the relevant caches in nested translation if the domain
+ * also serves as a parent
+ */
+static void parent_domain_flush(struct dmar_domain *domain,
+                               unsigned long pfn,
+                               unsigned long pages, int ih)
+{
+       struct dmar_domain *s1_domain;
+
+       spin_lock(&domain->s1_lock);
+       list_for_each_entry(s1_domain, &domain->s1_domains, s2_link) {
+               struct device_domain_info *device_info;
+               struct iommu_domain_info *info;
+               unsigned long flags;
+               unsigned long i;
+
+               xa_for_each(&s1_domain->iommu_array, i, info)
+                       __iommu_flush_iotlb_psi(info->iommu, info->did,
+                                               pfn, pages, ih);
+
+               if (!s1_domain->has_iotlb_device)
+                       continue;
+
+               spin_lock_irqsave(&s1_domain->lock, flags);
+               list_for_each_entry(device_info, &s1_domain->devices, link)
+                       /*
+                        * The device-side address translation cache (ATC)
+                        * caches nested-translation results. There is no easy way
+                        * to identify the exact set of nested translations
+                        * affected by a change in S2. So just flush the entire
+                        * device cache.
+                        */
+                       __iommu_flush_dev_iotlb(device_info, 0,
+                                               MAX_AGAW_PFN_WIDTH);
+               spin_unlock_irqrestore(&s1_domain->lock, flags);
+       }
+       spin_unlock(&domain->s1_lock);
+}
+
 static void intel_flush_iotlb_all(struct iommu_domain *domain)
 {
        struct dmar_domain *dmar_domain = to_dmar_domain(domain);
@@ -1462,6 +1508,9 @@ static void intel_flush_iotlb_all(struct iommu_domain *domain)
                if (!cap_caching_mode(iommu->cap))
                        iommu_flush_dev_iotlb(dmar_domain, 0, MAX_AGAW_PFN_WIDTH);
        }
+
+       if (dmar_domain->nested_parent)
+               parent_domain_flush(dmar_domain, 0, -1, 0);
 }
 
 static void iommu_disable_protect_mem_regions(struct intel_iommu *iommu)
@@ -1985,6 +2034,9 @@ static void switch_to_super_page(struct dmar_domain *domain,
                                iommu_flush_iotlb_psi(info->iommu, domain,
                                                      start_pfn, lvl_pages,
                                                      0, 0);
+                       if (domain->nested_parent)
+                               parent_domain_flush(domain, start_pfn,
+                                                   lvl_pages, 0);
                }
 
                pte++;
@@ -3883,6 +3935,7 @@ intel_iommu_domain_alloc_user(struct device *dev, u32 flags,
        bool dirty_tracking = flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
        bool nested_parent = flags & IOMMU_HWPT_ALLOC_NEST_PARENT;
        struct intel_iommu *iommu = info->iommu;
+       struct dmar_domain *dmar_domain;
        struct iommu_domain *domain;
 
        /* Must be NESTING domain */
@@ -3908,11 +3961,16 @@ intel_iommu_domain_alloc_user(struct device *dev, u32 flags,
        if (!domain)
                return ERR_PTR(-ENOMEM);
 
-       if (nested_parent)
-               to_dmar_domain(domain)->nested_parent = true;
+       dmar_domain = to_dmar_domain(domain);
+
+       if (nested_parent) {
+               dmar_domain->nested_parent = true;
+               INIT_LIST_HEAD(&dmar_domain->s1_domains);
+               spin_lock_init(&dmar_domain->s1_lock);
+       }
 
        if (dirty_tracking) {
-               if (to_dmar_domain(domain)->use_first_level) {
+               if (dmar_domain->use_first_level) {
                        iommu_domain_free(domain);
                        return ERR_PTR(-EOPNOTSUPP);
                }
@@ -3924,8 +3982,12 @@ intel_iommu_domain_alloc_user(struct device *dev, u32 flags,
 
 static void intel_iommu_domain_free(struct iommu_domain *domain)
 {
+       struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+
+       WARN_ON(dmar_domain->nested_parent &&
+               !list_empty(&dmar_domain->s1_domains));
        if (domain != &si_domain->domain)
-               domain_exit(to_dmar_domain(domain));
+               domain_exit(dmar_domain);
 }
 
 int prepare_domain_attach_device(struct iommu_domain *domain,
@@ -4107,6 +4169,9 @@ static void intel_iommu_tlb_sync(struct iommu_domain *domain,
                                      start_pfn, nrpages,
                                      list_empty(&gather->freelist), 0);
 
+       if (dmar_domain->nested_parent)
+               parent_domain_flush(dmar_domain, start_pfn, nrpages,
+                                   list_empty(&gather->freelist));
        put_pages_list(&gather->freelist);
 }
 
@@ -4664,21 +4729,70 @@ static void *intel_iommu_hw_info(struct device *dev, u32 *length, u32 *type)
        return vtd;
 }
 
+/*
+ * Set dirty tracking for the device list of a domain. The caller must
+ * hold the domain->lock when calling it.
+ */
+static int device_set_dirty_tracking(struct list_head *devices, bool enable)
+{
+       struct device_domain_info *info;
+       int ret = 0;
+
+       list_for_each_entry(info, devices, link) {
+               ret = intel_pasid_setup_dirty_tracking(info->iommu, info->dev,
+                                                      IOMMU_NO_PASID, enable);
+               if (ret)
+                       break;
+       }
+
+       return ret;
+}
+
+static int parent_domain_set_dirty_tracking(struct dmar_domain *domain,
+                                           bool enable)
+{
+       struct dmar_domain *s1_domain;
+       unsigned long flags;
+       int ret;
+
+       spin_lock(&domain->s1_lock);
+       list_for_each_entry(s1_domain, &domain->s1_domains, s2_link) {
+               spin_lock_irqsave(&s1_domain->lock, flags);
+               ret = device_set_dirty_tracking(&s1_domain->devices, enable);
+               spin_unlock_irqrestore(&s1_domain->lock, flags);
+               if (ret)
+                       goto err_unwind;
+       }
+       spin_unlock(&domain->s1_lock);
+       return 0;
+
+err_unwind:
+       list_for_each_entry(s1_domain, &domain->s1_domains, s2_link) {
+               spin_lock_irqsave(&s1_domain->lock, flags);
+               device_set_dirty_tracking(&s1_domain->devices,
+                                         domain->dirty_tracking);
+               spin_unlock_irqrestore(&s1_domain->lock, flags);
+       }
+       spin_unlock(&domain->s1_lock);
+       return ret;
+}
+
 static int intel_iommu_set_dirty_tracking(struct iommu_domain *domain,
                                          bool enable)
 {
        struct dmar_domain *dmar_domain = to_dmar_domain(domain);
-       struct device_domain_info *info;
        int ret;
 
        spin_lock(&dmar_domain->lock);
        if (dmar_domain->dirty_tracking == enable)
                goto out_unlock;
 
-       list_for_each_entry(info, &dmar_domain->devices, link) {
-               ret = intel_pasid_setup_dirty_tracking(info->iommu,
-                                                      info->domain, info->dev,
-                                                      IOMMU_NO_PASID, enable);
+       ret = device_set_dirty_tracking(&dmar_domain->devices, enable);
+       if (ret)
+               goto err_unwind;
+
+       if (dmar_domain->nested_parent) {
+               ret = parent_domain_set_dirty_tracking(dmar_domain, enable);
                if (ret)
                        goto err_unwind;
        }
@@ -4690,10 +4804,8 @@ out_unlock:
        return 0;
 
 err_unwind:
-       list_for_each_entry(info, &dmar_domain->devices, link)
-               intel_pasid_setup_dirty_tracking(info->iommu, dmar_domain,
-                                                info->dev, IOMMU_NO_PASID,
-                                                dmar_domain->dirty_tracking);
+       device_set_dirty_tracking(&dmar_domain->devices,
+                                 dmar_domain->dirty_tracking);
        spin_unlock(&dmar_domain->lock);
        return ret;
 }
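
The subtle piece of this refactor is the mask computation hoisted into
__iommu_flush_iotlb_psi(): page-selective invalidation can only target a
naturally aligned power-of-two run of pages, so an unaligned request has to be
widened until it is covered. A worked trace of the shared-bits logic, assuming
a flush of two pages starting at an odd pfn:

    /* Request: flush pfn = 0x1003, pages = 2 (pfns 0x1003..0x1004). */
    unsigned int  aligned_pages = 2;  /* __roundup_pow_of_two(2) */
    unsigned long bitmask = 1;        /* aligned_pages - 1       */
    unsigned int  mask = 1;           /* ilog2(aligned_pages)    */

    /* 0x1003 & bitmask != 0: no naturally aligned two-page region
     * covers the request, so widen to the bits both ends share. */
    unsigned long end_pfn = 0x1004;
    unsigned long shared_bits = ~(0x1003UL ^ 0x1004UL) & ~1UL; /* ...fff8 */

    mask = 3;                         /* __ffs(shared_bits) */

    /* Result: one PSI of 1 << 3 = 8 pages based at pfn 0x1000, which
     * is naturally aligned and covers all of 0x1003..0x1004. */
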
index d02f916d8e59a914d2441fa2b81af9ac31dfbf86..4145c04cb1c6818fea0ce420d31c41acec8836a3 100644 (file)
@@ -627,6 +627,10 @@ struct dmar_domain {
                        int             agaw;
                        /* maximum mapped address */
                        u64             max_addr;
+                       /* Protect the s1_domains list */
+                       spinlock_t      s1_lock;
+                       /* Track s1_domains nested on this domain */
+                       struct list_head s1_domains;
                };
 
                /* Nested user domain */
@@ -637,6 +641,8 @@ struct dmar_domain {
                        unsigned long s1_pgtbl;
                        /* page table attributes */
                        struct iommu_hwpt_vtd_s1 s1_cfg;
+                       /* link to parent domain siblings */
+                       struct list_head s2_link;
                };
        };
 
@@ -1060,6 +1066,7 @@ int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
  */
 #define QI_OPT_WAIT_DRAIN              BIT(0)
 
+void domain_update_iotlb(struct dmar_domain *domain);
 int domain_attach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu);
 void domain_detach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu);
 void device_block_translation(struct device *dev);
index f26c7f1c46ccaf43b0a4db5209b5c85b484277ed..a7d68f3d518acd9fc5af6f03ebbf71c825a4afcc 100644 (file)
@@ -65,12 +65,20 @@ static int intel_nested_attach_dev(struct iommu_domain *domain,
        list_add(&info->link, &dmar_domain->devices);
        spin_unlock_irqrestore(&dmar_domain->lock, flags);
 
+       domain_update_iotlb(dmar_domain);
+
        return 0;
 }
 
 static void intel_nested_domain_free(struct iommu_domain *domain)
 {
-       kfree(to_dmar_domain(domain));
+       struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+       struct dmar_domain *s2_domain = dmar_domain->s2_domain;
+
+       spin_lock(&s2_domain->s1_lock);
+       list_del(&dmar_domain->s2_link);
+       spin_unlock(&s2_domain->s1_lock);
+       kfree(dmar_domain);
 }
 
 static void nested_flush_dev_iotlb(struct dmar_domain *domain, u64 addr,
@@ -95,7 +103,7 @@ static void nested_flush_dev_iotlb(struct dmar_domain *domain, u64 addr,
 }
 
 static void intel_nested_flush_cache(struct dmar_domain *domain, u64 addr,
-                                    unsigned long npages, bool ih)
+                                    u64 npages, bool ih)
 {
        struct iommu_domain_info *info;
        unsigned int mask;
@@ -201,5 +209,9 @@ struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *parent,
        spin_lock_init(&domain->lock);
        xa_init(&domain->iommu_array);
 
+       spin_lock(&s2_domain->s1_lock);
+       list_add(&domain->s2_link, &s2_domain->s1_domains);
+       spin_unlock(&s2_domain->s1_lock);
+
        return &domain->domain;
 }
index 3239cefa4c337897dda048ebec7aeb1fc075a955..108158e2b907d0744467d88e8ec35b419185555b 100644 (file)
@@ -428,7 +428,6 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
  * Set up dirty tracking on a second only or nested translation type.
  */
 int intel_pasid_setup_dirty_tracking(struct intel_iommu *iommu,
-                                    struct dmar_domain *domain,
                                     struct device *dev, u32 pasid,
                                     bool enabled)
 {
@@ -445,7 +444,7 @@ int intel_pasid_setup_dirty_tracking(struct intel_iommu *iommu,
                return -ENODEV;
        }
 
-       did = domain_id_iommu(domain, iommu);
+       did = pasid_get_domain_id(pte);
        pgtt = pasid_pte_get_pgtt(pte);
        if (pgtt != PASID_ENTRY_PGTT_SL_ONLY &&
            pgtt != PASID_ENTRY_PGTT_NESTED) {
@@ -658,6 +657,8 @@ int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
        pasid_set_domain_id(pte, did);
        pasid_set_address_width(pte, s2_domain->agaw);
        pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
+       if (s2_domain->dirty_tracking)
+               pasid_set_ssade(pte);
        pasid_set_translation_type(pte, PASID_ENTRY_PGTT_NESTED);
        pasid_set_present(pte);
        spin_unlock(&iommu->lock);
index 8d40d4c66e3198a7ce90c83168a3f86491d79f71..487ede039bdde5733ec1f6af0905ade24c806200 100644 (file)
@@ -307,7 +307,6 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
                                   struct dmar_domain *domain,
                                   struct device *dev, u32 pasid);
 int intel_pasid_setup_dirty_tracking(struct intel_iommu *iommu,
-                                    struct dmar_domain *domain,
                                     struct device *dev, u32 pasid,
                                     bool enabled);
 int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
index c3fc9201d0be97e59395750cda0fc29940c0b844..65814cbc84020021df67d0b7dab9db2c61351b56 100644 (file)
@@ -41,6 +41,7 @@ static struct iommu_mm_data *iommu_alloc_mm_data(struct mm_struct *mm, struct de
        }
        iommu_mm->pasid = pasid;
        INIT_LIST_HEAD(&iommu_mm->sva_domains);
+       INIT_LIST_HEAD(&iommu_mm->sva_handles);
        /*
         * Make sure the write to mm->iommu_mm is not reordered in front of
         * initialization to iommu_mm fields. If it does, readers may see a
@@ -82,6 +83,14 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
                goto out_unlock;
        }
 
+       list_for_each_entry(handle, &mm->iommu_mm->sva_handles, handle_item) {
+               if (handle->dev == dev) {
+                       refcount_inc(&handle->users);
+                       mutex_unlock(&iommu_sva_lock);
+                       return handle;
+               }
+       }
+
        handle = kzalloc(sizeof(*handle), GFP_KERNEL);
        if (!handle) {
                ret = -ENOMEM;
@@ -111,6 +120,8 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
        list_add(&domain->next, &mm->iommu_mm->sva_domains);
 
 out:
+       refcount_set(&handle->users, 1);
+       list_add(&handle->handle_item, &mm->iommu_mm->sva_handles);
        mutex_unlock(&iommu_sva_lock);
        handle->dev = dev;
        handle->domain = domain;
@@ -141,6 +152,12 @@ void iommu_sva_unbind_device(struct iommu_sva *handle)
        struct device *dev = handle->dev;
 
        mutex_lock(&iommu_sva_lock);
+       if (!refcount_dec_and_test(&handle->users)) {
+               mutex_unlock(&iommu_sva_lock);
+               return;
+       }
+       list_del(&handle->handle_item);
+
        iommu_detach_device_pasid(domain, dev, iommu_mm->pasid);
        if (--domain->users == 0) {
                list_del(&domain->next);
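
Net effect of the iommu-sva.c hunks: iommu_sva_bind_device() becomes idempotent
per (device, mm) pair. A repeat bind finds the existing handle on the new
sva_handles list and bumps its refcount, and only the last
iommu_sva_unbind_device() tears the binding down. The pattern reduced to its
essentials (types and list names hypothetical, everything under one mutex as in
the original):

    /* Bind side: reuse an existing handle for this device if present. */
    list_for_each_entry(h, &mm_data->handles, item) {
            if (h->dev == dev) {
                    refcount_inc(&h->users);
                    return h;
            }
    }
    h = kzalloc(sizeof(*h), GFP_KERNEL);
    if (!h)
            return ERR_PTR(-ENOMEM);
    refcount_set(&h->users, 1);
    h->dev = dev;
    list_add(&h->item, &mm_data->handles);
    return h;

    /* Unbind side: only the final user performs the teardown. */
    if (!refcount_dec_and_test(&h->users))
            return;
    list_del(&h->item);
    kfree(h);
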
index 68e648b55767060204a8f42d1927c09ebacad39a..d14413916f93a01626e850aa72ee0c919c1f72bd 100644 (file)
@@ -1799,7 +1799,7 @@ iommu_group_alloc_default_domain(struct iommu_group *group, int req_type)
         * domain. Do not use in new drivers.
         */
        if (ops->default_domain) {
-               if (req_type)
+               if (req_type != ops->default_domain->type)
                        return ERR_PTR(-EINVAL);
                return ops->default_domain;
        }
@@ -1871,10 +1871,18 @@ static int iommu_get_def_domain_type(struct iommu_group *group,
        const struct iommu_ops *ops = dev_iommu_ops(dev);
        int type;
 
-       if (!ops->def_domain_type)
-               return cur_type;
-
-       type = ops->def_domain_type(dev);
+       if (ops->default_domain) {
+               /*
+                * Drivers that declare a global static default_domain will
+                * always choose that.
+                */
+               type = ops->default_domain->type;
+       } else {
+               if (ops->def_domain_type)
+                       type = ops->def_domain_type(dev);
+               else
+                       return cur_type;
+       }
        if (!type || cur_type == type)
                return cur_type;
        if (!cur_type)
index 3f3f1fa1a0a946a43eb48ee324ab4979683bb566..33d142f8057d70a77f44e842afdd84b1bee0a970 100644 (file)
@@ -263,7 +263,8 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 
        if (cmd->__reserved)
                return -EOPNOTSUPP;
-       if (cmd->data_type == IOMMU_HWPT_DATA_NONE && cmd->data_len)
+       if ((cmd->data_type == IOMMU_HWPT_DATA_NONE && cmd->data_len) ||
+           (cmd->data_type != IOMMU_HWPT_DATA_NONE && !cmd->data_len))
                return -EINVAL;
 
        idev = iommufd_get_device(ucmd, cmd->dev_id);
index 504ac1b01b2d2ab45fbc22fde2bdcf324ce2d973..05fd9d3abf1b809614cced9e9387679797866103 100644 (file)
@@ -1330,20 +1330,23 @@ out_unlock:
 
 int iopt_add_access(struct io_pagetable *iopt, struct iommufd_access *access)
 {
+       u32 new_id;
        int rc;
 
        down_write(&iopt->domains_rwsem);
        down_write(&iopt->iova_rwsem);
-       rc = xa_alloc(&iopt->access_list, &access->iopt_access_list_id, access,
-                     xa_limit_16b, GFP_KERNEL_ACCOUNT);
+       rc = xa_alloc(&iopt->access_list, &new_id, access, xa_limit_16b,
+                     GFP_KERNEL_ACCOUNT);
+
        if (rc)
                goto out_unlock;
 
        rc = iopt_calculate_iova_alignment(iopt);
        if (rc) {
-               xa_erase(&iopt->access_list, access->iopt_access_list_id);
+               xa_erase(&iopt->access_list, new_id);
                goto out_unlock;
        }
+       access->iopt_access_list_id = new_id;
 
 out_unlock:
        up_write(&iopt->iova_rwsem);
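
The iopt_add_access() fix is a stage-then-commit pattern: the freshly allocated
XArray ID stays in a local until every fallible step has succeeded, so the
object's published ID field is only updated once the whole operation is done
and failure paths cannot leave a stale or wrong ID behind. Reduced to a sketch
with hypothetical names:

    u32 new_id;
    int rc;

    rc = xa_alloc(&xa, &new_id, obj, xa_limit_16b, GFP_KERNEL);
    if (rc)
            return rc;

    rc = recalculate_state();
    if (rc) {
            /* Unwind the staged ID; obj->id is untouched and still
             * names whatever entry it named before this call. */
            xa_erase(&xa, new_id);
            return rc;
    }

    obj->id = new_id;   /* commit only after full success */
    return 0;
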
index 482d4059f5db6aed38ee8aa60f25b791f1e7556d..e854d3f672051b5223e0fec8af741abf03bbffbd 100644 (file)
@@ -45,6 +45,7 @@ enum {
 
 enum {
        MOCK_FLAGS_DEVICE_NO_DIRTY = 1 << 0,
+       MOCK_FLAGS_DEVICE_HUGE_IOVA = 1 << 1,
 };
 
 enum {
index 0a92c9eeaf7f50a6fe05c266b9ec39d1021844a9..db8c46bee1559ac46fb148d2474668b5a994ae15 100644 (file)
@@ -100,7 +100,7 @@ struct iova_bitmap {
        struct iova_bitmap_map mapped;
 
        /* userspace address of the bitmap */
-       u64 __user *bitmap;
+       u8 __user *bitmap;
 
        /* u64 index that @mapped points to */
        unsigned long mapped_base_index;
@@ -113,6 +113,9 @@ struct iova_bitmap {
 
        /* length of the IOVA range for the whole bitmap */
        size_t length;
+
+       /* length of the IOVA range set ahead the pinned pages */
+       unsigned long set_ahead_length;
 };
 
 /*
@@ -162,7 +165,7 @@ static int iova_bitmap_get(struct iova_bitmap *bitmap)
 {
        struct iova_bitmap_map *mapped = &bitmap->mapped;
        unsigned long npages;
-       u64 __user *addr;
+       u8 __user *addr;
        long ret;
 
        /*
@@ -175,18 +178,19 @@ static int iova_bitmap_get(struct iova_bitmap *bitmap)
                               bitmap->mapped_base_index) *
                               sizeof(*bitmap->bitmap), PAGE_SIZE);
 
-       /*
-        * We always cap at max number of 'struct page' a base page can fit.
-        * This is, for example, on x86 means 2M of bitmap data max.
-        */
-       npages = min(npages,  PAGE_SIZE / sizeof(struct page *));
-
        /*
         * Bitmap address to be pinned is calculated via pointer arithmetic
         * with bitmap u64 word index.
         */
        addr = bitmap->bitmap + bitmap->mapped_base_index;
 
+       /*
+        * We always cap at max number of 'struct page' a base page can fit.
+        * On x86, for example, this means at most 2M of bitmap data.
+        */
+       npages = min(npages + !!offset_in_page(addr),
+                    PAGE_SIZE / sizeof(struct page *));
+
        ret = pin_user_pages_fast((unsigned long)addr, npages,
                                  FOLL_WRITE, mapped->pages);
        if (ret <= 0)
@@ -247,7 +251,7 @@ struct iova_bitmap *iova_bitmap_alloc(unsigned long iova, size_t length,
 
        mapped = &bitmap->mapped;
        mapped->pgshift = __ffs(page_size);
-       bitmap->bitmap = data;
+       bitmap->bitmap = (u8 __user *)data;
        bitmap->mapped_total_index =
                iova_bitmap_offset_to_index(bitmap, length - 1) + 1;
        bitmap->iova = iova;
@@ -304,7 +308,7 @@ static unsigned long iova_bitmap_mapped_remaining(struct iova_bitmap *bitmap)
 
        remaining = bitmap->mapped_total_index - bitmap->mapped_base_index;
        remaining = min_t(unsigned long, remaining,
-                         bytes / sizeof(*bitmap->bitmap));
+                         DIV_ROUND_UP(bytes, sizeof(*bitmap->bitmap)));
 
        return remaining;
 }
@@ -341,6 +345,32 @@ static bool iova_bitmap_done(struct iova_bitmap *bitmap)
        return bitmap->mapped_base_index >= bitmap->mapped_total_index;
 }
 
+static int iova_bitmap_set_ahead(struct iova_bitmap *bitmap,
+                                size_t set_ahead_length)
+{
+       int ret = 0;
+
+       while (set_ahead_length > 0 && !iova_bitmap_done(bitmap)) {
+               unsigned long length = iova_bitmap_mapped_length(bitmap);
+               unsigned long iova = iova_bitmap_mapped_iova(bitmap);
+
+               ret = iova_bitmap_get(bitmap);
+               if (ret)
+                       break;
+
+               length = min(length, set_ahead_length);
+               iova_bitmap_set(bitmap, iova, length);
+
+               set_ahead_length -= length;
+               bitmap->mapped_base_index +=
+                       iova_bitmap_offset_to_index(bitmap, length - 1) + 1;
+               iova_bitmap_put(bitmap);
+       }
+
+       bitmap->set_ahead_length = 0;
+       return ret;
+}
+
 /*
  * Advances to the next range, releases the current pinned
  * pages and pins the next set of bitmap pages.
@@ -357,6 +387,15 @@ static int iova_bitmap_advance(struct iova_bitmap *bitmap)
        if (iova_bitmap_done(bitmap))
                return 0;
 
+       /* Iterate, set and skip any bits requested for next iteration */
+       if (bitmap->set_ahead_length) {
+               int ret;
+
+               ret = iova_bitmap_set_ahead(bitmap, bitmap->set_ahead_length);
+               if (ret)
+                       return ret;
+       }
+
        /* When advancing the index we pin the next set of bitmap pages */
        return iova_bitmap_get(bitmap);
 }
@@ -409,6 +448,7 @@ void iova_bitmap_set(struct iova_bitmap *bitmap,
                        mapped->pgshift) + mapped->pgoff * BITS_PER_BYTE;
        unsigned long last_bit = (((iova + length - 1) - mapped->iova) >>
                        mapped->pgshift) + mapped->pgoff * BITS_PER_BYTE;
+       unsigned long last_page_idx = mapped->npages - 1;
 
        do {
                unsigned int page_idx = cur_bit / BITS_PER_PAGE;
@@ -417,10 +457,18 @@ void iova_bitmap_set(struct iova_bitmap *bitmap,
                                         last_bit - cur_bit + 1);
                void *kaddr;
 
+               if (unlikely(page_idx > last_page_idx))
+                       break;
+
                kaddr = kmap_local_page(mapped->pages[page_idx]);
                bitmap_set(kaddr, offset, nbits);
                kunmap_local(kaddr);
                cur_bit += nbits;
        } while (cur_bit <= last_bit);
+
+       if (unlikely(cur_bit <= last_bit)) {
+               bitmap->set_ahead_length =
+                       ((last_bit - cur_bit + 1) << bitmap->mapped.pgshift);
+       }
 }
 EXPORT_SYMBOL_NS_GPL(iova_bitmap_set, IOMMUFD);
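
Taken together, the iova_bitmap changes make iova_bitmap_set() safe when a
caller reports a range that runs past the currently pinned window: rather than
indexing beyond mapped->pages[], the overflow is remembered in set_ahead_length
and replayed once the next window is pinned. A hypothetical trace, assuming
each pinned window covers bits for 1 MiB of IOVA:

    /* iova_bitmap_set(bitmap, iova, 2 MiB):
     *   - sets the bits backed by the pinned pages (first 1 MiB)
     *   - page_idx would pass last_page_idx for the rest, so the loop
     *     stops and records set_ahead_length = 1 MiB
     *
     * iova_bitmap_advance():
     *   - pins the next window, then iova_bitmap_set_ahead() replays
     *     the outstanding 1 MiB before normal iteration resumes
     */
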
index d9e9920c7eba413eaf25b7840eefdf36a3999a9e..7a2199470f3121da91e060bca82315a6944e37b8 100644 (file)
@@ -36,11 +36,12 @@ static struct mock_bus_type iommufd_mock_bus_type = {
        },
 };
 
-static atomic_t mock_dev_num;
+static DEFINE_IDA(mock_dev_ida);
 
 enum {
        MOCK_DIRTY_TRACK = 1,
        MOCK_IO_PAGE_SIZE = PAGE_SIZE / 2,
+       MOCK_HUGE_PAGE_SIZE = 512 * MOCK_IO_PAGE_SIZE,
 
        /*
         * Like a real page table alignment requires the low bits of the address
@@ -53,6 +54,7 @@ enum {
        MOCK_PFN_START_IOVA = _MOCK_PFN_START,
        MOCK_PFN_LAST_IOVA = _MOCK_PFN_START,
        MOCK_PFN_DIRTY_IOVA = _MOCK_PFN_START << 1,
+       MOCK_PFN_HUGE_IOVA = _MOCK_PFN_START << 2,
 };
 
 /*
@@ -61,8 +63,8 @@ enum {
  * In syzkaller mode the 64 bit IOVA is converted into an nth area and offset
  * value. This has a much smaller randomization space and syzkaller can hit it.
  */
-static unsigned long iommufd_test_syz_conv_iova(struct io_pagetable *iopt,
-                                               u64 *iova)
+static unsigned long __iommufd_test_syz_conv_iova(struct io_pagetable *iopt,
+                                                 u64 *iova)
 {
        struct syz_layout {
                __u32 nth_area;
@@ -86,6 +88,21 @@ static unsigned long iommufd_test_syz_conv_iova(struct io_pagetable *iopt,
        return 0;
 }
 
+static unsigned long iommufd_test_syz_conv_iova(struct iommufd_access *access,
+                                               u64 *iova)
+{
+       unsigned long ret;
+
+       mutex_lock(&access->ioas_lock);
+       if (!access->ioas) {
+               mutex_unlock(&access->ioas_lock);
+               return 0;
+       }
+       ret = __iommufd_test_syz_conv_iova(&access->ioas->iopt, iova);
+       mutex_unlock(&access->ioas_lock);
+       return ret;
+}
+
 void iommufd_test_syz_conv_iova_id(struct iommufd_ucmd *ucmd,
                                   unsigned int ioas_id, u64 *iova, u32 *flags)
 {
@@ -98,7 +115,7 @@ void iommufd_test_syz_conv_iova_id(struct iommufd_ucmd *ucmd,
        ioas = iommufd_get_ioas(ucmd->ictx, ioas_id);
        if (IS_ERR(ioas))
                return;
-       *iova = iommufd_test_syz_conv_iova(&ioas->iopt, iova);
+       *iova = __iommufd_test_syz_conv_iova(&ioas->iopt, iova);
        iommufd_put_object(ucmd->ictx, &ioas->obj);
 }
 
@@ -121,6 +138,7 @@ enum selftest_obj_type {
 struct mock_dev {
        struct device dev;
        unsigned long flags;
+       int id;
 };
 
 struct selftest_obj {
@@ -191,6 +209,34 @@ static int mock_domain_set_dirty_tracking(struct iommu_domain *domain,
        return 0;
 }
 
+static bool mock_test_and_clear_dirty(struct mock_iommu_domain *mock,
+                                     unsigned long iova, size_t page_size,
+                                     unsigned long flags)
+{
+       unsigned long cur, end = iova + page_size - 1;
+       bool dirty = false;
+       void *ent, *old;
+
+       for (cur = iova; cur < end; cur += MOCK_IO_PAGE_SIZE) {
+               ent = xa_load(&mock->pfns, cur / MOCK_IO_PAGE_SIZE);
+               if (!ent || !(xa_to_value(ent) & MOCK_PFN_DIRTY_IOVA))
+                       continue;
+
+               dirty = true;
+               /* Clear dirty */
+               if (!(flags & IOMMU_DIRTY_NO_CLEAR)) {
+                       unsigned long val;
+
+                       val = xa_to_value(ent) & ~MOCK_PFN_DIRTY_IOVA;
+                       old = xa_store(&mock->pfns, cur / MOCK_IO_PAGE_SIZE,
+                                      xa_mk_value(val), GFP_KERNEL);
+                       WARN_ON_ONCE(ent != old);
+               }
+       }
+
+       return dirty;
+}
+
 static int mock_domain_read_and_clear_dirty(struct iommu_domain *domain,
                                            unsigned long iova, size_t size,
                                            unsigned long flags,
@@ -198,31 +244,31 @@ static int mock_domain_read_and_clear_dirty(struct iommu_domain *domain,
 {
        struct mock_iommu_domain *mock =
                container_of(domain, struct mock_iommu_domain, domain);
-       unsigned long i, max = size / MOCK_IO_PAGE_SIZE;
-       void *ent, *old;
+       unsigned long end = iova + size;
+       void *ent;
 
        if (!(mock->flags & MOCK_DIRTY_TRACK) && dirty->bitmap)
                return -EINVAL;
 
-       for (i = 0; i < max; i++) {
-               unsigned long cur = iova + i * MOCK_IO_PAGE_SIZE;
+       do {
+               unsigned long pgsize = MOCK_IO_PAGE_SIZE;
+               unsigned long head;
 
-               ent = xa_load(&mock->pfns, cur / MOCK_IO_PAGE_SIZE);
-               if (ent && (xa_to_value(ent) & MOCK_PFN_DIRTY_IOVA)) {
-                       /* Clear dirty */
-                       if (!(flags & IOMMU_DIRTY_NO_CLEAR)) {
-                               unsigned long val;
-
-                               val = xa_to_value(ent) & ~MOCK_PFN_DIRTY_IOVA;
-                               old = xa_store(&mock->pfns,
-                                              cur / MOCK_IO_PAGE_SIZE,
-                                              xa_mk_value(val), GFP_KERNEL);
-                               WARN_ON_ONCE(ent != old);
-                       }
-                       iommu_dirty_bitmap_record(dirty, cur,
-                                                 MOCK_IO_PAGE_SIZE);
+               ent = xa_load(&mock->pfns, iova / MOCK_IO_PAGE_SIZE);
+               if (!ent) {
+                       iova += pgsize;
+                       continue;
                }
-       }
+
+               if (xa_to_value(ent) & MOCK_PFN_HUGE_IOVA)
+                       pgsize = MOCK_HUGE_PAGE_SIZE;
+               head = iova & ~(pgsize - 1);
+
+               /* Clear dirty */
+               if (mock_test_and_clear_dirty(mock, head, pgsize, flags))
+                       iommu_dirty_bitmap_record(dirty, head, pgsize);
+               iova = head + pgsize;
+       } while (iova < end);
 
        return 0;
 }
@@ -234,6 +280,7 @@ const struct iommu_dirty_ops dirty_ops = {
 
 static struct iommu_domain *mock_domain_alloc_paging(struct device *dev)
 {
+       struct mock_dev *mdev = container_of(dev, struct mock_dev, dev);
        struct mock_iommu_domain *mock;
 
        mock = kzalloc(sizeof(*mock), GFP_KERNEL);
@@ -242,6 +289,8 @@ static struct iommu_domain *mock_domain_alloc_paging(struct device *dev)
        mock->domain.geometry.aperture_start = MOCK_APERTURE_START;
        mock->domain.geometry.aperture_end = MOCK_APERTURE_LAST;
        mock->domain.pgsize_bitmap = MOCK_IO_PAGE_SIZE;
+       if (dev && mdev->flags & MOCK_FLAGS_DEVICE_HUGE_IOVA)
+               mock->domain.pgsize_bitmap |= MOCK_HUGE_PAGE_SIZE;
        mock->domain.ops = mock_ops.default_domain_ops;
        mock->domain.type = IOMMU_DOMAIN_UNMANAGED;
        xa_init(&mock->pfns);
@@ -287,7 +336,7 @@ mock_domain_alloc_user(struct device *dev, u32 flags,
                        return ERR_PTR(-EOPNOTSUPP);
                if (user_data || (has_dirty_flag && no_dirty_ops))
                        return ERR_PTR(-EOPNOTSUPP);
-               domain = mock_domain_alloc_paging(NULL);
+               domain = mock_domain_alloc_paging(dev);
                if (!domain)
                        return ERR_PTR(-ENOMEM);
                if (has_dirty_flag)
@@ -350,6 +399,9 @@ static int mock_domain_map_pages(struct iommu_domain *domain,
 
                        if (pgcount == 1 && cur + MOCK_IO_PAGE_SIZE == pgsize)
                                flags = MOCK_PFN_LAST_IOVA;
+                       if (pgsize != MOCK_IO_PAGE_SIZE)
+                               flags |= MOCK_PFN_HUGE_IOVA;
                        old = xa_store(&mock->pfns, iova / MOCK_IO_PAGE_SIZE,
                                       xa_mk_value((paddr / MOCK_IO_PAGE_SIZE) |
                                                   flags),
@@ -394,20 +446,27 @@ static size_t mock_domain_unmap_pages(struct iommu_domain *domain,
 
                        /*
                         * iommufd generates unmaps that must be a strict
-                        * superset of the map's performend So every starting
-                        * IOVA should have been an iova passed to map, and the
+                        * superset of the maps performed, so every
+                        * starting/ending IOVA should have been an IOVA passed
+                        * to map.
                         *
-                        * First IOVA must be present and have been a first IOVA
-                        * passed to map_pages
+                        * This simple logic doesn't work when the HUGE_PAGE is
+                        * turned on since the core code will automatically
+                        * switch between the two page sizes creating a break in
+                        * the unmap calls. The break can land in the middle of
+                        * contiguous IOVA.
                         */
-                       if (first) {
-                               WARN_ON(ent && !(xa_to_value(ent) &
-                                                MOCK_PFN_START_IOVA));
-                               first = false;
+                       if (!(domain->pgsize_bitmap & MOCK_HUGE_PAGE_SIZE)) {
+                               if (first) {
+                                       WARN_ON(ent && !(xa_to_value(ent) &
+                                                        MOCK_PFN_START_IOVA));
+                                       first = false;
+                               }
+                               if (pgcount == 1 &&
+                                   cur + MOCK_IO_PAGE_SIZE == pgsize)
+                                       WARN_ON(ent && !(xa_to_value(ent) &
+                                                        MOCK_PFN_LAST_IOVA));
                        }
-                       if (pgcount == 1 && cur + MOCK_IO_PAGE_SIZE == pgsize)
-                               WARN_ON(ent && !(xa_to_value(ent) &
-                                                MOCK_PFN_LAST_IOVA));
 
                        iova += MOCK_IO_PAGE_SIZE;
                        ret += MOCK_IO_PAGE_SIZE;
@@ -595,7 +654,7 @@ static void mock_dev_release(struct device *dev)
 {
        struct mock_dev *mdev = container_of(dev, struct mock_dev, dev);
 
-       atomic_dec(&mock_dev_num);
+       ida_free(&mock_dev_ida, mdev->id);
        kfree(mdev);
 }
 
@@ -604,7 +663,8 @@ static struct mock_dev *mock_dev_create(unsigned long dev_flags)
        struct mock_dev *mdev;
        int rc;
 
-       if (dev_flags & ~(MOCK_FLAGS_DEVICE_NO_DIRTY))
+       if (dev_flags &
+           ~(MOCK_FLAGS_DEVICE_NO_DIRTY | MOCK_FLAGS_DEVICE_HUGE_IOVA))
                return ERR_PTR(-EINVAL);
 
        mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
@@ -616,8 +676,12 @@ static struct mock_dev *mock_dev_create(unsigned long dev_flags)
        mdev->dev.release = mock_dev_release;
        mdev->dev.bus = &iommufd_mock_bus_type.bus;
 
-       rc = dev_set_name(&mdev->dev, "iommufd_mock%u",
-                         atomic_inc_return(&mock_dev_num));
+       rc = ida_alloc(&mock_dev_ida, GFP_KERNEL);
+       if (rc < 0)
+               goto err_put;
+       mdev->id = rc;
+
+       rc = dev_set_name(&mdev->dev, "iommufd_mock%u", mdev->id);
        if (rc)
                goto err_put;
 
@@ -1119,7 +1183,7 @@ static int iommufd_test_access_pages(struct iommufd_ucmd *ucmd,
        }
 
        if (flags & MOCK_FLAGS_ACCESS_SYZ)
-               iova = iommufd_test_syz_conv_iova(&staccess->access->ioas->iopt,
+               iova = iommufd_test_syz_conv_iova(staccess->access,
                                        &cmd->access_pages.iova);
 
        npages = (ALIGN(iova + length, PAGE_SIZE) -
@@ -1221,8 +1285,8 @@ static int iommufd_test_access_rw(struct iommufd_ucmd *ucmd,
        }
 
        if (flags & MOCK_FLAGS_ACCESS_SYZ)
-               iova = iommufd_test_syz_conv_iova(&staccess->access->ioas->iopt,
-                                       &cmd->access_rw.iova);
+               iova = iommufd_test_syz_conv_iova(staccess->access,
+                               &cmd->access_rw.iova);
 
        rc = iommufd_access_rw(staccess->access, iova, tmp, length, flags);
        if (rc)
index 5559c943f03f973137432de01f1972aec58f94b6..2b0b3175cea068eb571d8ec5f82a4d45a01e5719 100644 (file)
@@ -2,7 +2,7 @@
 /*
  * Generic Broadcom Set Top Box Level 2 Interrupt controller driver
  *
- * Copyright (C) 2014-2017 Broadcom
+ * Copyright (C) 2014-2024 Broadcom
  */
 
 #define pr_fmt(fmt)    KBUILD_MODNAME  ": " fmt
@@ -112,6 +112,9 @@ static void brcmstb_l2_intc_irq_handle(struct irq_desc *desc)
                generic_handle_domain_irq(b->domain, irq);
        } while (status);
 out:
+       /* Don't ack parent before all device writes are done */
+       wmb();
+
        chained_irq_exit(chip, desc);
 }
 
index d097001c1e3ee7e1d1380a891660dfc522a37554..b822752c42617055e811f9e89bc2b3455bcc2eb8 100644 (file)
@@ -207,6 +207,11 @@ static bool require_its_list_vmovp(struct its_vm *vm, struct its_node *its)
        return (gic_rdists->has_rvpeid || vm->vlpi_count[its->list_nr]);
 }
 
+static bool rdists_support_shareable(void)
+{
+       return !(gic_rdists->flags & RDIST_FLAGS_FORCE_NON_SHAREABLE);
+}
+
 static u16 get_its_list(struct its_vm *vm)
 {
        struct its_node *its;
@@ -2710,10 +2715,12 @@ static u64 inherit_vpe_l1_table_from_its(void)
                        break;
                }
                val |= FIELD_PREP(GICR_VPROPBASER_4_1_ADDR, addr >> 12);
-               val |= FIELD_PREP(GICR_VPROPBASER_SHAREABILITY_MASK,
-                                 FIELD_GET(GITS_BASER_SHAREABILITY_MASK, baser));
-               val |= FIELD_PREP(GICR_VPROPBASER_INNER_CACHEABILITY_MASK,
-                                 FIELD_GET(GITS_BASER_INNER_CACHEABILITY_MASK, baser));
+               if (rdists_support_shareable()) {
+                       val |= FIELD_PREP(GICR_VPROPBASER_SHAREABILITY_MASK,
+                                         FIELD_GET(GITS_BASER_SHAREABILITY_MASK, baser));
+                       val |= FIELD_PREP(GICR_VPROPBASER_INNER_CACHEABILITY_MASK,
+                                         FIELD_GET(GITS_BASER_INNER_CACHEABILITY_MASK, baser));
+               }
                val |= FIELD_PREP(GICR_VPROPBASER_4_1_SIZE, GITS_BASER_NR_PAGES(baser) - 1);
 
                return val;
@@ -2936,8 +2943,10 @@ static int allocate_vpe_l1_table(void)
        WARN_ON(!IS_ALIGNED(pa, psz));
 
        val |= FIELD_PREP(GICR_VPROPBASER_4_1_ADDR, pa >> 12);
-       val |= GICR_VPROPBASER_RaWb;
-       val |= GICR_VPROPBASER_InnerShareable;
+       if (rdists_support_shareable()) {
+               val |= GICR_VPROPBASER_RaWb;
+               val |= GICR_VPROPBASER_InnerShareable;
+       }
        val |= GICR_VPROPBASER_4_1_Z;
        val |= GICR_VPROPBASER_4_1_VALID;
 
@@ -3126,7 +3135,7 @@ static void its_cpu_init_lpis(void)
        gicr_write_propbaser(val, rbase + GICR_PROPBASER);
        tmp = gicr_read_propbaser(rbase + GICR_PROPBASER);
 
-       if (gic_rdists->flags & RDIST_FLAGS_FORCE_NON_SHAREABLE)
+       if (!rdists_support_shareable())
                tmp &= ~GICR_PROPBASER_SHAREABILITY_MASK;
 
        if ((tmp ^ val) & GICR_PROPBASER_SHAREABILITY_MASK) {
@@ -3153,7 +3162,7 @@ static void its_cpu_init_lpis(void)
        gicr_write_pendbaser(val, rbase + GICR_PENDBASER);
        tmp = gicr_read_pendbaser(rbase + GICR_PENDBASER);
 
-       if (gic_rdists->flags & RDIST_FLAGS_FORCE_NON_SHAREABLE)
+       if (!rdists_support_shareable())
                tmp &= ~GICR_PENDBASER_SHAREABILITY_MASK;
 
        if (!(tmp & GICR_PENDBASER_SHAREABILITY_MASK)) {
@@ -3172,6 +3181,7 @@ static void its_cpu_init_lpis(void)
        val |= GICR_CTLR_ENABLE_LPIS;
        writel_relaxed(val, rbase + GICR_CTLR);
 
+out:
        if (gic_rdists->has_vlpis && !gic_rdists->has_rvpeid) {
                void __iomem *vlpi_base = gic_data_rdist_vlpi_base();
 
@@ -3207,7 +3217,6 @@ static void its_cpu_init_lpis(void)
 
        /* Make sure the GIC has seen the above */
        dsb(sy);
-out:
        gic_data_rdist()->flags |= RD_LOCAL_LPI_ENABLED;
        pr_info("GICv3: CPU%d: using %s LPI pending table @%pa\n",
                smp_processor_id(),
@@ -3817,8 +3826,9 @@ static int its_vpe_set_affinity(struct irq_data *d,
                                bool force)
 {
        struct its_vpe *vpe = irq_data_get_irq_chip_data(d);
-       int from, cpu = cpumask_first(mask_val);
+       struct cpumask common, *table_mask;
        unsigned long flags;
+       int from, cpu;
 
        /*
         * Changing affinity is mega expensive, so let's be as lazy as
@@ -3834,19 +3844,22 @@ static int its_vpe_set_affinity(struct irq_data *d,
         * taken on any vLPI handling path that evaluates vpe->col_idx.
         */
        from = vpe_to_cpuid_lock(vpe, &flags);
-       if (from == cpu)
-               goto out;
-
-       vpe->col_idx = cpu;
+       table_mask = gic_data_rdist_cpu(from)->vpe_table_mask;
 
        /*
-        * GICv4.1 allows us to skip VMOVP if moving to a cpu whose RD
-        * is sharing its VPE table with the current one.
+        * If we are offered another CPU in the same GICv4.1 ITS
+        * affinity, pick this one. Otherwise, any CPU will do.
         */
-       if (gic_data_rdist_cpu(cpu)->vpe_table_mask &&
-           cpumask_test_cpu(from, gic_data_rdist_cpu(cpu)->vpe_table_mask))
+       if (table_mask && cpumask_and(&common, mask_val, table_mask))
+               cpu = cpumask_test_cpu(from, &common) ? from : cpumask_first(&common);
+       else
+               cpu = cpumask_first(mask_val);
+
+       if (from == cpu)
                goto out;
 
+       vpe->col_idx = cpu;
+
        its_send_vmovp(vpe);
        its_vpe_db_proxy_move(vpe, from, cpu);
 
@@ -3880,14 +3893,18 @@ static void its_vpe_schedule(struct its_vpe *vpe)
        val  = virt_to_phys(page_address(vpe->its_vm->vprop_page)) &
                GENMASK_ULL(51, 12);
        val |= (LPI_NRBITS - 1) & GICR_VPROPBASER_IDBITS_MASK;
-       val |= GICR_VPROPBASER_RaWb;
-       val |= GICR_VPROPBASER_InnerShareable;
+       if (rdists_support_shareable()) {
+               val |= GICR_VPROPBASER_RaWb;
+               val |= GICR_VPROPBASER_InnerShareable;
+       }
        gicr_write_vpropbaser(val, vlpi_base + GICR_VPROPBASER);
 
        val  = virt_to_phys(page_address(vpe->vpt_page)) &
                GENMASK_ULL(51, 16);
-       val |= GICR_VPENDBASER_RaWaWb;
-       val |= GICR_VPENDBASER_InnerShareable;
+       if (rdists_support_shareable()) {
+               val |= GICR_VPENDBASER_RaWaWb;
+               val |= GICR_VPENDBASER_InnerShareable;
+       }
        /*
         * There is no good way of finding out if the pending table is
         * empty as we can race against the doorbell interrupt very
@@ -5078,6 +5095,8 @@ static int __init its_probe_one(struct its_node *its)
        u32 ctlr;
        int err;
 
+       its_enable_quirks(its);
+
        if (is_v4(its)) {
                if (!(its->typer & GITS_TYPER_VMOVP)) {
                        err = its_compute_its_list_map(its);
@@ -5429,7 +5448,6 @@ static int __init its_of_probe(struct device_node *node)
                if (!its)
                        return -ENOMEM;
 
-               its_enable_quirks(its);
                err = its_probe_one(its);
                if (err)  {
                        its_node_destroy(its);
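Among the GICv3-ITS changes here: a new rdists_support_shareable() helper so
the FORCE_NON_SHAREABLE quirk is honored at every table-programming site,
moving its_enable_quirks() ahead of its_probe_one() so quirks can influence
probing itself, and a smarter its_vpe_set_affinity() that first intersects
the requested mask with the vpe_table_mask of the CPU the vPE currently
resides on, preferring a CPU that already shares the VPE table (ideally the
current one) so the expensive VMOVP can be skipped. A self-contained model of
that selection logic with plain bitmasks (the kernel uses struct cpumask,
cpumask_and() and cpumask_test_cpu()):

    #include <stdint.h>
    #include <stdio.h>

    /* Prefer staying on 'from', then any CPU sharing the current VPE
     * table, then any CPU in the requested mask (assumed non-empty). */
    static int pick_cpu(uint64_t mask_val, uint64_t table_mask, int from)
    {
            uint64_t common = mask_val & table_mask;

            if (common)
                    return (common >> from) & 1 ? from : __builtin_ctzll(common);
            return __builtin_ctzll(mask_val);
    }

    int main(void)
    {
            /* vPE on CPU1, CPUs 0-3 share its table, affinity asks for {2,3} */
            printf("target cpu = %d\n", pick_cpu(0xc, 0xf, 1)); /* -> 2 */
            return 0;
    }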
index 1623cd77917523f42419cb958ecbc0ce32ba8809..b3736bdd4b9f2ce0ddabd86b777f40c53c488eeb 100644 (file)
@@ -241,7 +241,7 @@ static int eiointc_domain_alloc(struct irq_domain *domain, unsigned int virq,
        int ret;
        unsigned int i, type;
        unsigned long hwirq = 0;
-       struct eiointc *priv = domain->host_data;
+       struct eiointc_priv *priv = domain->host_data;
 
        ret = irq_domain_translate_onecell(domain, arg, &hwirq, &type);
        if (ret)
index 5101a3fb11df5bef53122db9db3c194669d754e7..58881d3139792074bf6ae1430a4de3760d3eb220 100644 (file)
@@ -235,22 +235,17 @@ static const struct irq_domain_ops mbigen_domain_ops = {
 static int mbigen_of_create_domain(struct platform_device *pdev,
                                   struct mbigen_device *mgn_chip)
 {
-       struct device *parent;
        struct platform_device *child;
        struct irq_domain *domain;
        struct device_node *np;
        u32 num_pins;
        int ret = 0;
 
-       parent = bus_get_dev_root(&platform_bus_type);
-       if (!parent)
-               return -ENODEV;
-
        for_each_child_of_node(pdev->dev.of_node, np) {
                if (!of_property_read_bool(np, "interrupt-controller"))
                        continue;
 
-               child = of_platform_device_create(np, NULL, parent);
+               child = of_platform_device_create(np, NULL, NULL);
                if (!child) {
                        ret = -ENOMEM;
                        break;
@@ -273,7 +268,6 @@ static int mbigen_of_create_domain(struct platform_device *pdev,
                }
        }
 
-       put_device(parent);
        if (ret)
                of_node_put(np);
 
index cda5838d2232dc1971369b4c6e872d004263b8de..7942d8eb3d00eae5fa7e5718a05ef889bb8a82f0 100644 (file)
@@ -389,8 +389,8 @@ static int qcom_mpm_init(struct device_node *np, struct device_node *parent)
                /* Don't use devm_ioremap_resource, as we're accessing a shared region. */
                priv->base = devm_ioremap(dev, res.start, resource_size(&res));
                of_node_put(msgram_np);
-               if (IS_ERR(priv->base))
-                       return PTR_ERR(priv->base);
+               if (!priv->base)
+                       return -ENOMEM;
        } else {
                /* Otherwise, fall back to simple MMIO. */
                priv->base = devm_platform_ioremap_resource(pdev, 0);
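The qcom-mpm fix is an API mix-up: devm_ioremap() returns NULL on failure,
while devm_platform_ioremap_resource() returns an ERR_PTR(), so checking the
former with IS_ERR() can never trip and a failed map would be dereferenced
later. The two error conventions side by side (kernel C fragment):

    /* devm_ioremap(): NULL on failure */
    base = devm_ioremap(dev, res.start, resource_size(&res));
    if (!base)
            return -ENOMEM;

    /* devm_platform_ioremap_resource(): ERR_PTR() on failure */
    base = devm_platform_ioremap_resource(pdev, 0);
    if (IS_ERR(base))
            return PTR_ERR(base);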
index 5b7bc4fd9517c8972680ad7a503eebf2ca47a518..bf0b40b0fad4b23d756a22a86c8e206a7155e858 100644 (file)
@@ -148,7 +148,13 @@ static void plic_irq_eoi(struct irq_data *d)
 {
        struct plic_handler *handler = this_cpu_ptr(&plic_handlers);
 
-       writel(d->hwirq, handler->hart_base + CONTEXT_CLAIM);
+       if (unlikely(irqd_irq_disabled(d))) {
+               plic_toggle(handler, d->hwirq, 1);
+               writel(d->hwirq, handler->hart_base + CONTEXT_CLAIM);
+               plic_toggle(handler, d->hwirq, 0);
+       } else {
+               writel(d->hwirq, handler->hart_base + CONTEXT_CLAIM);
+       }
 }
 
 #ifdef CONFIG_SMP
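The PLIC gateway only accepts a claim-complete write for an interrupt source
that is enabled, so an EOI arriving after the source was masked would be
dropped and the source would stay claimed forever, blocking future
deliveries. The fix briefly re-enables the source around the completion
write. A toy model of that gateway behaviour (illustrative only):

    #include <stdbool.h>
    #include <stdio.h>

    struct gateway { bool enabled, claimed; };

    /* A completion is only honoured while the source is enabled. */
    static void claim_complete(struct gateway *g)
    {
            if (g->enabled)
                    g->claimed = false;
            /* else: silently dropped, source stuck in claimed state */
    }

    int main(void)
    {
            struct gateway g = { .enabled = false, .claimed = true };

            claim_complete(&g);     /* dropped while disabled */
            g.enabled = true;       /* plic_toggle(handler, hwirq, 1) */
            claim_complete(&g);     /* accepted */
            g.enabled = false;      /* plic_toggle(handler, hwirq, 0) */
            printf("claimed=%d\n", g.claimed); /* 0 */
            return 0;
    }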
index 6ae2329052c92c3c2724694c11f586bed8c5c15a..4e6afa89921fe0b79c4a760163d66565a2092d53 100644 (file)
@@ -300,7 +300,7 @@ struct cached_dev {
        struct list_head        list;
        struct bcache_device    disk;
        struct block_device     *bdev;
-       struct bdev_handle      *bdev_handle;
+       struct file             *bdev_file;
 
        struct cache_sb         sb;
        struct cache_sb_disk    *sb_disk;
@@ -423,7 +423,7 @@ struct cache {
 
        struct kobject          kobj;
        struct block_device     *bdev;
-       struct bdev_handle      *bdev_handle;
+       struct file             *bdev_file;
 
        struct task_struct      *alloc_thread;
 
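This is part of the tree-wide 6.9 conversion from struct bdev_handle to plain
struct file for block-device opens: bdev_open_by_path() becomes
bdev_file_open_by_path(), the block device is reached through file_bdev(),
and releasing the open is a plain fput(). The open/use/close shape, as a
sketch (kernel C fragment, error handling trimmed):

    struct file *bdev_file;
    struct block_device *bdev;

    bdev_file = bdev_file_open_by_path(path, BLK_OPEN_READ | BLK_OPEN_WRITE,
                                       holder, NULL);
    if (IS_ERR(bdev_file))
            return PTR_ERR(bdev_file);

    bdev = file_bdev(bdev_file);
    /* ... use bdev ... */

    fput(bdev_file);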
index dc3f50f69714174cd0d649b4d003a9ed5b2d4992..330bcd9ea4a9ccd0366e9c5d28d0e7b1fb061fe6 100644 (file)
@@ -900,9 +900,23 @@ static int bcache_device_init(struct bcache_device *d, unsigned int block_size,
        struct request_queue *q;
        const size_t max_stripes = min_t(size_t, INT_MAX,
                                         SIZE_MAX / sizeof(atomic_t));
+       struct queue_limits lim = {
+               .max_hw_sectors         = UINT_MAX,
+               .max_sectors            = UINT_MAX,
+               .max_segment_size       = UINT_MAX,
+               .max_segments           = BIO_MAX_VECS,
+               .max_hw_discard_sectors = UINT_MAX,
+               .io_min                 = block_size,
+               .logical_block_size     = block_size,
+               .physical_block_size    = block_size,
+       };
        uint64_t n;
        int idx;
 
+       if (cached_bdev) {
+               d->stripe_size = bdev_io_opt(cached_bdev) >> SECTOR_SHIFT;
+               lim.io_opt = umax(block_size, bdev_io_opt(cached_bdev));
+       }
        if (!d->stripe_size)
                d->stripe_size = 1 << 31;
        else if (d->stripe_size < BCH_MIN_STRIPE_SZ)
@@ -935,8 +949,21 @@ static int bcache_device_init(struct bcache_device *d, unsigned int block_size,
                        BIOSET_NEED_BVECS|BIOSET_NEED_RESCUER))
                goto out_ida_remove;
 
-       d->disk = blk_alloc_disk(NUMA_NO_NODE);
-       if (!d->disk)
+       if (lim.logical_block_size > PAGE_SIZE && cached_bdev) {
+               /*
+                * This should only happen with BCACHE_SB_VERSION_BDEV.
+                * Block/page size is checked for BCACHE_SB_VERSION_CDEV.
+                */
+               pr_info("bcache%i: sb/logical block size (%u) greater than page size (%lu) falling back to device logical block size (%u)\n",
+                       idx, lim.logical_block_size,
+                       PAGE_SIZE, bdev_logical_block_size(cached_bdev));
+
+               /* This also adjusts physical block size/min io size if needed */
+               lim.logical_block_size = bdev_logical_block_size(cached_bdev);
+       }
+
+       d->disk = blk_alloc_disk(&lim, NUMA_NO_NODE);
+       if (IS_ERR(d->disk))
                goto out_bioset_exit;
 
        set_capacity(d->disk, sectors);
@@ -949,27 +976,6 @@ static int bcache_device_init(struct bcache_device *d, unsigned int block_size,
        d->disk->private_data   = d;
 
        q = d->disk->queue;
-       q->limits.max_hw_sectors        = UINT_MAX;
-       q->limits.max_sectors           = UINT_MAX;
-       q->limits.max_segment_size      = UINT_MAX;
-       q->limits.max_segments          = BIO_MAX_VECS;
-       blk_queue_max_discard_sectors(q, UINT_MAX);
-       q->limits.io_min                = block_size;
-       q->limits.logical_block_size    = block_size;
-       q->limits.physical_block_size   = block_size;
-
-       if (q->limits.logical_block_size > PAGE_SIZE && cached_bdev) {
-               /*
-                * This should only happen with BCACHE_SB_VERSION_BDEV.
-                * Block/page size is checked for BCACHE_SB_VERSION_CDEV.
-                */
-               pr_info("%s: sb/logical block size (%u) greater than page size (%lu) falling back to device logical block size (%u)\n",
-                       d->disk->disk_name, q->limits.logical_block_size,
-                       PAGE_SIZE, bdev_logical_block_size(cached_bdev));
-
-               /* This also adjusts physical block size/min io size if needed */
-               blk_queue_logical_block_size(q, bdev_logical_block_size(cached_bdev));
-       }
 
        blk_queue_flag_set(QUEUE_FLAG_NONROT, d->disk->queue);
 
@@ -1369,8 +1375,8 @@ static CLOSURE_CALLBACK(cached_dev_free)
        if (dc->sb_disk)
                put_page(virt_to_page(dc->sb_disk));
 
-       if (dc->bdev_handle)
-               bdev_release(dc->bdev_handle);
+       if (dc->bdev_file)
+               fput(dc->bdev_file);
 
        wake_up(&unregister_wait);
 
@@ -1416,9 +1422,7 @@ static int cached_dev_init(struct cached_dev *dc, unsigned int block_size)
                hlist_add_head(&io->hash, dc->io_hash + RECENT_IO);
        }
 
-       dc->disk.stripe_size = q->limits.io_opt >> 9;
-
-       if (dc->disk.stripe_size)
+       if (bdev_io_opt(dc->bdev))
                dc->partial_stripes_expensive =
                        q->limits.raid_partial_stripes_expensive;
 
@@ -1428,9 +1432,6 @@ static int cached_dev_init(struct cached_dev *dc, unsigned int block_size)
        if (ret)
                return ret;
 
-       blk_queue_io_opt(dc->disk.disk->queue,
-               max(queue_io_opt(dc->disk.disk->queue), queue_io_opt(q)));
-
        atomic_set(&dc->io_errors, 0);
        dc->io_disable = false;
        dc->error_limit = DEFAULT_CACHED_DEV_ERROR_LIMIT;
@@ -1445,7 +1446,7 @@ static int cached_dev_init(struct cached_dev *dc, unsigned int block_size)
 /* Cached device - bcache superblock */
 
 static int register_bdev(struct cache_sb *sb, struct cache_sb_disk *sb_disk,
-                                struct bdev_handle *bdev_handle,
+                                struct file *bdev_file,
                                 struct cached_dev *dc)
 {
        const char *err = "cannot allocate memory";
@@ -1453,8 +1454,8 @@ static int register_bdev(struct cache_sb *sb, struct cache_sb_disk *sb_disk,
        int ret = -ENOMEM;
 
        memcpy(&dc->sb, sb, sizeof(struct cache_sb));
-       dc->bdev_handle = bdev_handle;
-       dc->bdev = bdev_handle->bdev;
+       dc->bdev_file = bdev_file;
+       dc->bdev = file_bdev(bdev_file);
        dc->sb_disk = sb_disk;
 
        if (cached_dev_init(dc, sb->block_size << 9))
@@ -2218,8 +2219,8 @@ void bch_cache_release(struct kobject *kobj)
        if (ca->sb_disk)
                put_page(virt_to_page(ca->sb_disk));
 
-       if (ca->bdev_handle)
-               bdev_release(ca->bdev_handle);
+       if (ca->bdev_file)
+               fput(ca->bdev_file);
 
        kfree(ca);
        module_put(THIS_MODULE);
@@ -2339,18 +2340,18 @@ err_free:
 }
 
 static int register_cache(struct cache_sb *sb, struct cache_sb_disk *sb_disk,
-                               struct bdev_handle *bdev_handle,
+                               struct file *bdev_file,
                                struct cache *ca)
 {
        const char *err = NULL; /* must be set for any error case */
        int ret = 0;
 
        memcpy(&ca->sb, sb, sizeof(struct cache_sb));
-       ca->bdev_handle = bdev_handle;
-       ca->bdev = bdev_handle->bdev;
+       ca->bdev_file = bdev_file;
+       ca->bdev = file_bdev(bdev_file);
        ca->sb_disk = sb_disk;
 
-       if (bdev_max_discard_sectors((bdev_handle->bdev)))
+       if (bdev_max_discard_sectors(file_bdev(bdev_file)))
                ca->discard = CACHE_DISCARD(&ca->sb);
 
        ret = cache_alloc(ca);
@@ -2361,20 +2362,20 @@ static int register_cache(struct cache_sb *sb, struct cache_sb_disk *sb_disk,
                        err = "cache_alloc(): cache device is too small";
                else
                        err = "cache_alloc(): unknown error";
-               pr_notice("error %pg: %s\n", bdev_handle->bdev, err);
+               pr_notice("error %pg: %s\n", file_bdev(bdev_file), err);
                /*
                 * If we failed here, it means ca->kobj is not initialized yet,
                 * kobject_put() won't be called and there is no chance to
-                * call bdev_release() to bdev in bch_cache_release(). So
-                * we explicitly call bdev_release() here.
+                * call fput() to bdev in bch_cache_release(). So
+                * we explicitly call fput() on the block device here.
                 */
-               bdev_release(bdev_handle);
+               fput(bdev_file);
                return ret;
        }
 
-       if (kobject_add(&ca->kobj, bdev_kobj(bdev_handle->bdev), "bcache")) {
+       if (kobject_add(&ca->kobj, bdev_kobj(file_bdev(bdev_file)), "bcache")) {
                pr_notice("error %pg: error calling kobject_add\n",
-                         bdev_handle->bdev);
+                         file_bdev(bdev_file));
                ret = -ENOMEM;
                goto out;
        }
@@ -2388,7 +2389,7 @@ static int register_cache(struct cache_sb *sb, struct cache_sb_disk *sb_disk,
                goto out;
        }
 
-       pr_info("registered cache device %pg\n", ca->bdev_handle->bdev);
+       pr_info("registered cache device %pg\n", file_bdev(ca->bdev_file));
 
 out:
        kobject_put(&ca->kobj);
@@ -2446,7 +2447,7 @@ struct async_reg_args {
        char *path;
        struct cache_sb *sb;
        struct cache_sb_disk *sb_disk;
-       struct bdev_handle *bdev_handle;
+       struct file *bdev_file;
        void *holder;
 };
 
@@ -2457,7 +2458,7 @@ static void register_bdev_worker(struct work_struct *work)
                container_of(work, struct async_reg_args, reg_work.work);
 
        mutex_lock(&bch_register_lock);
-       if (register_bdev(args->sb, args->sb_disk, args->bdev_handle,
+       if (register_bdev(args->sb, args->sb_disk, args->bdev_file,
                          args->holder) < 0)
                fail = true;
        mutex_unlock(&bch_register_lock);
@@ -2478,7 +2479,7 @@ static void register_cache_worker(struct work_struct *work)
                container_of(work, struct async_reg_args, reg_work.work);
 
        /* blkdev_put() will be called in bch_cache_release() */
-       if (register_cache(args->sb, args->sb_disk, args->bdev_handle,
+       if (register_cache(args->sb, args->sb_disk, args->bdev_file,
                           args->holder))
                fail = true;
 
@@ -2516,7 +2517,7 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
        char *path = NULL;
        struct cache_sb *sb;
        struct cache_sb_disk *sb_disk;
-       struct bdev_handle *bdev_handle, *bdev_handle2;
+       struct file *bdev_file, *bdev_file2;
        void *holder = NULL;
        ssize_t ret;
        bool async_registration = false;
@@ -2549,15 +2550,15 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
 
        ret = -EINVAL;
        err = "failed to open device";
-       bdev_handle = bdev_open_by_path(strim(path), BLK_OPEN_READ, NULL, NULL);
-       if (IS_ERR(bdev_handle))
+       bdev_file = bdev_file_open_by_path(strim(path), BLK_OPEN_READ, NULL, NULL);
+       if (IS_ERR(bdev_file))
                goto out_free_sb;
 
        err = "failed to set blocksize";
-       if (set_blocksize(bdev_handle->bdev, 4096))
+       if (set_blocksize(file_bdev(bdev_file), 4096))
                goto out_blkdev_put;
 
-       err = read_super(sb, bdev_handle->bdev, &sb_disk);
+       err = read_super(sb, file_bdev(bdev_file), &sb_disk);
        if (err)
                goto out_blkdev_put;
 
@@ -2569,13 +2570,13 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
        }
 
        /* Now reopen in exclusive mode with proper holder */
-       bdev_handle2 = bdev_open_by_dev(bdev_handle->bdev->bd_dev,
+       bdev_file2 = bdev_file_open_by_dev(file_bdev(bdev_file)->bd_dev,
                        BLK_OPEN_READ | BLK_OPEN_WRITE, holder, NULL);
-       bdev_release(bdev_handle);
-       bdev_handle = bdev_handle2;
-       if (IS_ERR(bdev_handle)) {
-               ret = PTR_ERR(bdev_handle);
-               bdev_handle = NULL;
+       fput(bdev_file);
+       bdev_file = bdev_file2;
+       if (IS_ERR(bdev_file)) {
+               ret = PTR_ERR(bdev_file);
+               bdev_file = NULL;
                if (ret == -EBUSY) {
                        dev_t dev;
 
@@ -2610,7 +2611,7 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
                args->path      = path;
                args->sb        = sb;
                args->sb_disk   = sb_disk;
-               args->bdev_handle       = bdev_handle;
+               args->bdev_file = bdev_file;
                args->holder    = holder;
                register_device_async(args);
                /* No wait and returns to user space */
@@ -2619,14 +2620,14 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
 
        if (SB_IS_BDEV(sb)) {
                mutex_lock(&bch_register_lock);
-               ret = register_bdev(sb, sb_disk, bdev_handle, holder);
+               ret = register_bdev(sb, sb_disk, bdev_file, holder);
                mutex_unlock(&bch_register_lock);
                /* blkdev_put() will be called in cached_dev_free() */
                if (ret < 0)
                        goto out_free_sb;
        } else {
                /* blkdev_put() will be called in bch_cache_release() */
-               ret = register_cache(sb, sb_disk, bdev_handle, holder);
+               ret = register_cache(sb, sb_disk, bdev_file, holder);
                if (ret)
                        goto out_free_sb;
        }
@@ -2642,8 +2643,8 @@ out_free_holder:
 out_put_sb_page:
        put_page(virt_to_page(sb_disk));
 out_blkdev_put:
-       if (bdev_handle)
-               bdev_release(bdev_handle);
+       if (bdev_file)
+               fput(bdev_file);
 out_free_sb:
        kfree(sb);
 out_free_path:
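Besides the bdev_file conversion, bcache is moved to the 6.9 block-layer
convention of handing a fully populated struct queue_limits to
blk_alloc_disk() instead of poking q->limits fields after allocation; note
that blk_alloc_disk() also changed its failure convention from NULL to
ERR_PTR(), hence the IS_ERR() check above. The new allocation shape (kernel C
fragment, values illustrative):

    struct queue_limits lim = {
            .logical_block_size     = 4096,
            .physical_block_size    = 4096,
            .io_min                 = 4096,
    };
    struct gendisk *disk;

    disk = blk_alloc_disk(&lim, NUMA_NO_NODE);
    if (IS_ERR(disk))
            return PTR_ERR(disk);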
index 095b9b49aa8250a1f56c531883cce5f6e8a24727..e6757a30dccad1fa1a6ae060b33d41a6a120dda3 100644 (file)
@@ -22,6 +22,8 @@
 #include "dm-ima.h"
 
 #define DM_RESERVED_MAX_IOS            1024
+#define DM_MAX_TARGETS                 1048576
+#define DM_MAX_TARGET_PARAMS           1024
 
 struct dm_io;
 
index 855b482cbff1f072912e957e8c1cc1d3b4e1b319..59445763e55a65de49e79cc2436c8a03131a5a15 100644 (file)
 struct convert_context {
        struct completion restart;
        struct bio *bio_in;
-       struct bio *bio_out;
        struct bvec_iter iter_in;
+       struct bio *bio_out;
        struct bvec_iter iter_out;
-       u64 cc_sector;
        atomic_t cc_pending;
+       u64 cc_sector;
        union {
                struct skcipher_request *req;
                struct aead_request *req_aead;
        } r;
+       bool aead_recheck;
+       bool aead_failed;
 
 };
 
@@ -73,10 +75,8 @@ struct dm_crypt_io {
        struct bio *base_bio;
        u8 *integrity_metadata;
        bool integrity_metadata_from_pool:1;
-       bool in_tasklet:1;
 
        struct work_struct work;
-       struct tasklet_struct tasklet;
 
        struct convert_context ctx;
 
@@ -84,6 +84,8 @@ struct dm_crypt_io {
        blk_status_t error;
        sector_t sector;
 
+       struct bvec_iter saved_bi_iter;
+
        struct rb_node rb_node;
 } CRYPTO_MINALIGN_ATTR;
 
@@ -1372,10 +1374,13 @@ static int crypt_convert_block_aead(struct crypt_config *cc,
        if (r == -EBADMSG) {
                sector_t s = le64_to_cpu(*sector);
 
-               DMERR_LIMIT("%pg: INTEGRITY AEAD ERROR, sector %llu",
-                           ctx->bio_in->bi_bdev, s);
-               dm_audit_log_bio(DM_MSG_PREFIX, "integrity-aead",
-                                ctx->bio_in, s, 0);
+               ctx->aead_failed = true;
+               if (ctx->aead_recheck) {
+                       DMERR_LIMIT("%pg: INTEGRITY AEAD ERROR, sector %llu",
+                                   ctx->bio_in->bi_bdev, s);
+                       dm_audit_log_bio(DM_MSG_PREFIX, "integrity-aead",
+                                        ctx->bio_in, s, 0);
+               }
        }
 
        if (!r && cc->iv_gen_ops && cc->iv_gen_ops->post)
@@ -1759,10 +1764,11 @@ static void crypt_io_init(struct dm_crypt_io *io, struct crypt_config *cc,
        io->base_bio = bio;
        io->sector = sector;
        io->error = 0;
+       io->ctx.aead_recheck = false;
+       io->ctx.aead_failed = false;
        io->ctx.r.req = NULL;
        io->integrity_metadata = NULL;
        io->integrity_metadata_from_pool = false;
-       io->in_tasklet = false;
        atomic_set(&io->io_pending, 0);
 }
 
@@ -1771,12 +1777,7 @@ static void crypt_inc_pending(struct dm_crypt_io *io)
        atomic_inc(&io->io_pending);
 }
 
-static void kcryptd_io_bio_endio(struct work_struct *work)
-{
-       struct dm_crypt_io *io = container_of(work, struct dm_crypt_io, work);
-
-       bio_endio(io->base_bio);
-}
+static void kcryptd_queue_read(struct dm_crypt_io *io);
 
 /*
  * One of the bios was finished. Check for completion of
@@ -1791,6 +1792,15 @@ static void crypt_dec_pending(struct dm_crypt_io *io)
        if (!atomic_dec_and_test(&io->io_pending))
                return;
 
+       if (likely(!io->ctx.aead_recheck) && unlikely(io->ctx.aead_failed) &&
+           cc->on_disk_tag_size && bio_data_dir(base_bio) == READ) {
+               io->ctx.aead_recheck = true;
+               io->ctx.aead_failed = false;
+               io->error = 0;
+               kcryptd_queue_read(io);
+               return;
+       }
+
        if (io->ctx.r.req)
                crypt_free_req(cc, io->ctx.r.req, base_bio);
 
@@ -1801,20 +1811,6 @@ static void crypt_dec_pending(struct dm_crypt_io *io)
 
        base_bio->bi_status = error;
 
-       /*
-        * If we are running this function from our tasklet,
-        * we can't call bio_endio() here, because it will call
-        * clone_endio() from dm.c, which in turn will
-        * free the current struct dm_crypt_io structure with
-        * our tasklet. In this case we need to delay bio_endio()
-        * execution to after the tasklet is done and dequeued.
-        */
-       if (io->in_tasklet) {
-               INIT_WORK(&io->work, kcryptd_io_bio_endio);
-               queue_work(cc->io_queue, &io->work);
-               return;
-       }
-
        bio_endio(base_bio);
 }
 
@@ -1840,15 +1836,19 @@ static void crypt_endio(struct bio *clone)
        struct dm_crypt_io *io = clone->bi_private;
        struct crypt_config *cc = io->cc;
        unsigned int rw = bio_data_dir(clone);
-       blk_status_t error;
+       blk_status_t error = clone->bi_status;
+
+       if (io->ctx.aead_recheck && !error) {
+               kcryptd_queue_crypt(io);
+               return;
+       }
 
        /*
         * free the processed pages
         */
-       if (rw == WRITE)
+       if (rw == WRITE || io->ctx.aead_recheck)
                crypt_free_buffer_pages(cc, clone);
 
-       error = clone->bi_status;
        bio_put(clone);
 
        if (rw == READ && !error) {
@@ -1869,6 +1869,22 @@ static int kcryptd_io_read(struct dm_crypt_io *io, gfp_t gfp)
        struct crypt_config *cc = io->cc;
        struct bio *clone;
 
+       if (io->ctx.aead_recheck) {
+               if (!(gfp & __GFP_DIRECT_RECLAIM))
+                       return 1;
+               crypt_inc_pending(io);
+               clone = crypt_alloc_buffer(io, io->base_bio->bi_iter.bi_size);
+               if (unlikely(!clone)) {
+                       crypt_dec_pending(io);
+                       return 1;
+               }
+               clone->bi_iter.bi_sector = cc->start + io->sector;
+               crypt_convert_init(cc, &io->ctx, clone, clone, io->sector);
+               io->saved_bi_iter = clone->bi_iter;
+               dm_submit_bio_remap(io->base_bio, clone);
+               return 0;
+       }
+
        /*
         * We need the original biovec array in order to decrypt the whole bio
         * data *afterwards* -- thanks to immutable biovecs we don't need to
@@ -2095,6 +2111,12 @@ static void kcryptd_crypt_write_convert(struct dm_crypt_io *io)
        io->ctx.bio_out = clone;
        io->ctx.iter_out = clone->bi_iter;
 
+       if (crypt_integrity_aead(cc)) {
+               bio_copy_data(clone, io->base_bio);
+               io->ctx.bio_in = clone;
+               io->ctx.iter_in = clone->bi_iter;
+       }
+
        sector += bio_sectors(clone);
 
        crypt_inc_pending(io);
@@ -2131,6 +2153,14 @@ dec:
 
 static void kcryptd_crypt_read_done(struct dm_crypt_io *io)
 {
+       if (io->ctx.aead_recheck) {
+               if (!io->error) {
+                       io->ctx.bio_in->bi_iter = io->saved_bi_iter;
+                       bio_copy_data(io->base_bio, io->ctx.bio_in);
+               }
+               crypt_free_buffer_pages(io->cc, io->ctx.bio_in);
+               bio_put(io->ctx.bio_in);
+       }
        crypt_dec_pending(io);
 }
 
@@ -2160,11 +2190,17 @@ static void kcryptd_crypt_read_convert(struct dm_crypt_io *io)
 
        crypt_inc_pending(io);
 
-       crypt_convert_init(cc, &io->ctx, io->base_bio, io->base_bio,
-                          io->sector);
+       if (io->ctx.aead_recheck) {
+               io->ctx.cc_sector = io->sector + cc->iv_offset;
+               r = crypt_convert(cc, &io->ctx,
+                                 test_bit(DM_CRYPT_NO_READ_WORKQUEUE, &cc->flags), true);
+       } else {
+               crypt_convert_init(cc, &io->ctx, io->base_bio, io->base_bio,
+                                  io->sector);
 
-       r = crypt_convert(cc, &io->ctx,
-                         test_bit(DM_CRYPT_NO_READ_WORKQUEUE, &cc->flags), true);
+               r = crypt_convert(cc, &io->ctx,
+                                 test_bit(DM_CRYPT_NO_READ_WORKQUEUE, &cc->flags), true);
+       }
        /*
         * Crypto API backlogged the request, because its queue was full
         * and we're in softirq context, so continue from a workqueue
@@ -2206,10 +2242,13 @@ static void kcryptd_async_done(void *data, int error)
        if (error == -EBADMSG) {
                sector_t s = le64_to_cpu(*org_sector_of_dmreq(cc, dmreq));
 
-               DMERR_LIMIT("%pg: INTEGRITY AEAD ERROR, sector %llu",
-                           ctx->bio_in->bi_bdev, s);
-               dm_audit_log_bio(DM_MSG_PREFIX, "integrity-aead",
-                                ctx->bio_in, s, 0);
+               ctx->aead_failed = true;
+               if (ctx->aead_recheck) {
+                       DMERR_LIMIT("%pg: INTEGRITY AEAD ERROR, sector %llu",
+                                   ctx->bio_in->bi_bdev, s);
+                       dm_audit_log_bio(DM_MSG_PREFIX, "integrity-aead",
+                                        ctx->bio_in, s, 0);
+               }
                io->error = BLK_STS_PROTECTION;
        } else if (error < 0)
                io->error = BLK_STS_IOERR;
@@ -2246,11 +2285,6 @@ static void kcryptd_crypt(struct work_struct *work)
                kcryptd_crypt_write_convert(io);
 }
 
-static void kcryptd_crypt_tasklet(unsigned long work)
-{
-       kcryptd_crypt((struct work_struct *)work);
-}
-
 static void kcryptd_queue_crypt(struct dm_crypt_io *io)
 {
        struct crypt_config *cc = io->cc;
@@ -2262,15 +2296,10 @@ static void kcryptd_queue_crypt(struct dm_crypt_io *io)
                 * irqs_disabled(): the kernel may run some IO completion from the idle thread, but
                 * it is being executed with irqs disabled.
                 */
-               if (in_hardirq() || irqs_disabled()) {
-                       io->in_tasklet = true;
-                       tasklet_init(&io->tasklet, kcryptd_crypt_tasklet, (unsigned long)&io->work);
-                       tasklet_schedule(&io->tasklet);
+               if (!(in_hardirq() || irqs_disabled())) {
+                       kcryptd_crypt(&io->work);
                        return;
                }
-
-               kcryptd_crypt(&io->work);
-               return;
        }
 
        INIT_WORK(&io->work, kcryptd_crypt);
@@ -3144,7 +3173,7 @@ static int crypt_ctr_optional(struct dm_target *ti, unsigned int argc, char **ar
                        sval = strchr(opt_string + strlen("integrity:"), ':') + 1;
                        if (!strcasecmp(sval, "aead")) {
                                set_bit(CRYPT_MODE_INTEGRITY_AEAD, &cc->cipher_flags);
-                       } else  if (strcasecmp(sval, "none")) {
+                       } else if (strcasecmp(sval, "none")) {
                                ti->error = "Unknown integrity profile";
                                return -EINVAL;
                        }
@@ -3673,7 +3702,7 @@ static void crypt_io_hints(struct dm_target *ti, struct queue_limits *limits)
 
 static struct target_type crypt_target = {
        .name   = "crypt",
-       .version = {1, 24, 0},
+       .version = {1, 25, 0},
        .module = THIS_MODULE,
        .ctr    = crypt_ctr,
        .dtr    = crypt_dtr,
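Two independent improvements in dm-crypt. First, an AEAD authentication
failure on read is no longer fatal on the first pass: crypt_dec_pending()
notices aead_failed, sets aead_recheck and reissues the read into a private
buffer, and only a failure of that reread is reported as an integrity error;
this tolerates reads that raced with concurrent writes to the same sectors.
Second, the tasklet offload is removed: kcryptd_queue_crypt() now runs the
work inline when not in hard-irq context and falls back to the workqueue
otherwise, which also deletes the in_tasklet bio_endio() deferral. A toy
model of the retry-once shape (illustrative only):

    #include <stdbool.h>
    #include <stdio.h>

    /* Pretend the fast path raced with a writer and fails once. */
    static int decrypt(bool recheck) { return recheck ? 0 : -74 /* -EBADMSG */; }

    static int read_and_decrypt(void)
    {
            if (decrypt(false) == 0)
                    return 0;
            /* first pass failed: reread from disk and try again */
            return decrypt(true) == 0 ? 0 : -84 /* -EILSEQ */;
    }

    int main(void)
    {
            printf("result=%d\n", read_and_decrypt()); /* 0 */
            return 0;
    }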
index c5f03aab455256ff1b0abc606b7728438be347f0..1fc901df84eb163c833e364d21e0a48e65c06239 100644 (file)
@@ -278,6 +278,8 @@ struct dm_integrity_c {
 
        atomic64_t number_of_mismatches;
 
+       mempool_t recheck_pool;
+
        struct notifier_block reboot_notifier;
 };
 
@@ -1689,6 +1691,77 @@ failed:
        get_random_bytes(result, ic->tag_size);
 }
 
+static noinline void integrity_recheck(struct dm_integrity_io *dio, char *checksum)
+{
+       struct bio *bio = dm_bio_from_per_bio_data(dio, sizeof(struct dm_integrity_io));
+       struct dm_integrity_c *ic = dio->ic;
+       struct bvec_iter iter;
+       struct bio_vec bv;
+       sector_t sector, logical_sector, area, offset;
+       struct page *page;
+       void *buffer;
+
+       get_area_and_offset(ic, dio->range.logical_sector, &area, &offset);
+       dio->metadata_block = get_metadata_sector_and_offset(ic, area, offset,
+                                                            &dio->metadata_offset);
+       sector = get_data_sector(ic, area, offset);
+       logical_sector = dio->range.logical_sector;
+
+       page = mempool_alloc(&ic->recheck_pool, GFP_NOIO);
+       buffer = page_to_virt(page);
+
+       __bio_for_each_segment(bv, bio, iter, dio->bio_details.bi_iter) {
+               unsigned pos = 0;
+
+               do {
+                       char *mem;
+                       int r;
+                       struct dm_io_request io_req;
+                       struct dm_io_region io_loc;
+                       io_req.bi_opf = REQ_OP_READ;
+                       io_req.mem.type = DM_IO_KMEM;
+                       io_req.mem.ptr.addr = buffer;
+                       io_req.notify.fn = NULL;
+                       io_req.client = ic->io;
+                       io_loc.bdev = ic->dev->bdev;
+                       io_loc.sector = sector;
+                       io_loc.count = ic->sectors_per_block;
+
+                       r = dm_io(&io_req, 1, &io_loc, NULL);
+                       if (unlikely(r)) {
+                               dio->bi_status = errno_to_blk_status(r);
+                               goto free_ret;
+                       }
+
+                       integrity_sector_checksum(ic, logical_sector, buffer, checksum);
+                       r = dm_integrity_rw_tag(ic, checksum, &dio->metadata_block,
+                                               &dio->metadata_offset, ic->tag_size, TAG_CMP);
+                       if (r) {
+                               if (r > 0) {
+                                       DMERR_LIMIT("%pg: Checksum failed at sector 0x%llx",
+                                                   bio->bi_bdev, logical_sector);
+                                       atomic64_inc(&ic->number_of_mismatches);
+                                       dm_audit_log_bio(DM_MSG_PREFIX, "integrity-checksum",
+                                                        bio, logical_sector, 0);
+                                       r = -EILSEQ;
+                               }
+                               dio->bi_status = errno_to_blk_status(r);
+                               goto free_ret;
+                       }
+
+                       mem = bvec_kmap_local(&bv);
+                       memcpy(mem + pos, buffer, ic->sectors_per_block << SECTOR_SHIFT);
+                       kunmap_local(mem);
+
+                       pos += ic->sectors_per_block << SECTOR_SHIFT;
+                       sector += ic->sectors_per_block;
+                       logical_sector += ic->sectors_per_block;
+               } while (pos < bv.bv_len);
+       }
+free_ret:
+       mempool_free(page, &ic->recheck_pool);
+}
+
 static void integrity_metadata(struct work_struct *w)
 {
        struct dm_integrity_io *dio = container_of(w, struct dm_integrity_io, work);
@@ -1776,15 +1849,8 @@ again:
                                                checksums_ptr - checksums, dio->op == REQ_OP_READ ? TAG_CMP : TAG_WRITE);
                        if (unlikely(r)) {
                                if (r > 0) {
-                                       sector_t s;
-
-                                       s = sector - ((r + ic->tag_size - 1) / ic->tag_size);
-                                       DMERR_LIMIT("%pg: Checksum failed at sector 0x%llx",
-                                                   bio->bi_bdev, s);
-                                       r = -EILSEQ;
-                                       atomic64_inc(&ic->number_of_mismatches);
-                                       dm_audit_log_bio(DM_MSG_PREFIX, "integrity-checksum",
-                                                        bio, s, 0);
+                                       integrity_recheck(dio, checksums);
+                                       goto skip_io;
                                }
                                if (likely(checksums != checksums_onstack))
                                        kfree(checksums);
@@ -4261,6 +4327,12 @@ static int dm_integrity_ctr(struct dm_target *ti, unsigned int argc, char **argv
                goto bad;
        }
 
+       r = mempool_init_page_pool(&ic->recheck_pool, 1, 0);
+       if (r) {
+               ti->error = "Cannot allocate mempool";
+               goto bad;
+       }
+
        ic->metadata_wq = alloc_workqueue("dm-integrity-metadata",
                                          WQ_MEM_RECLAIM, METADATA_WORKQUEUE_MAX_ACTIVE);
        if (!ic->metadata_wq) {
@@ -4609,6 +4681,7 @@ static void dm_integrity_dtr(struct dm_target *ti)
        kvfree(ic->bbs);
        if (ic->bufio)
                dm_bufio_client_destroy(ic->bufio);
+       mempool_exit(&ic->recheck_pool);
        mempool_exit(&ic->journal_io_mempool);
        if (ic->io)
                dm_io_client_destroy(ic->io);
@@ -4661,7 +4734,7 @@ static void dm_integrity_dtr(struct dm_target *ti)
 
 static struct target_type integrity_target = {
        .name                   = "integrity",
-       .version                = {1, 10, 0},
+       .version                = {1, 11, 0},
        .module                 = THIS_MODULE,
        .features               = DM_TARGET_SINGLETON | DM_TARGET_INTEGRITY,
        .ctr                    = dm_integrity_ctr,
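dm-integrity gets the same recheck treatment as dm-crypt and dm-verity this
cycle: when the bulk tag comparison fails, integrity_recheck() rereads the
affected range synchronously, one block at a time, recomputes each checksum,
and only reports the precise failing sector; blocks that verify on reread are
copied into the reader's bvec. The reread buffer comes from a one-page
mempool so the error path can always make forward progress under memory
pressure. The reserve-pool idiom used here (kernel C fragment, as in the
ctr/dtr hunks above):

    mempool_t recheck_pool;

    r = mempool_init_page_pool(&recheck_pool, 1 /* reserved pages */, 0 /* order */);
    if (r)
            return r;

    page = mempool_alloc(&recheck_pool, GFP_NOIO);  /* may sleep, won't fail */
    buffer = page_to_virt(page);
    /* ... synchronous dm_io() read + checksum of one block ... */
    mempool_free(page, &recheck_pool);

    mempool_exit(&recheck_pool);                    /* at destroy time */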
index e65058e0ed06ab73b9d20d26dfbf7aca55829572..3b1ad7127cb846a1b50059921241f2abe63eaf53 100644 (file)
@@ -1941,7 +1941,8 @@ static int copy_params(struct dm_ioctl __user *user, struct dm_ioctl *param_kern
                           minimum_data_size - sizeof(param_kernel->version)))
                return -EFAULT;
 
-       if (param_kernel->data_size < minimum_data_size) {
+       if (unlikely(param_kernel->data_size < minimum_data_size) ||
+           unlikely(param_kernel->data_size > DM_MAX_TARGETS * DM_MAX_TARGET_PARAMS)) {
                DMERR("Invalid data size in the ioctl structure: %u",
                      param_kernel->data_size);
                return -EINVAL;
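copy_params() allocates data_size bytes based directly on a userspace-supplied
header, so before this fix an ioctl could request an arbitrarily large
allocation. The new upper bound, DM_MAX_TARGETS * DM_MAX_TARGET_PARAMS from
dm-core.h above, works out to 1048576 * 1024 = 1 GiB, and dm_table_create()
(further below) now rejects num_targets above DM_MAX_TARGETS with -EOVERFLOW.
The general idiom for a user-controlled size (sketch; MIN_SIZE and MAX_SIZE
stand in for the real bounds):

    if (size < MIN_SIZE || size > MAX_SIZE)
            return -EINVAL;                 /* reject before allocating */
    buf = kvmalloc(size, GFP_KERNEL);
    if (!buf)
            return -ENOMEM;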
index eb009d6bb03a17b72a06b9a932cd15242be10e26..17e9af60bbf7f07a7b4a567c1cb7672d7d6deb67 100644 (file)
@@ -213,6 +213,7 @@ struct raid_dev {
 #define RT_FLAG_RS_IN_SYNC             6
 #define RT_FLAG_RS_RESYNCING           7
 #define RT_FLAG_RS_GROW                        8
+#define RT_FLAG_RS_FROZEN              9
 
 /* Array elements of 64 bit needed for rebuild/failed disk bits */
 #define DISKS_ARRAY_ELEMS ((MAX_RAID_DEVICES + (sizeof(uint64_t) * 8 - 1)) / sizeof(uint64_t) / 8)
@@ -3240,11 +3241,12 @@ size_check:
        rs->md.ro = 1;
        rs->md.in_sync = 1;
 
-       /* Keep array frozen until resume. */
-       set_bit(MD_RECOVERY_FROZEN, &rs->md.recovery);
-
        /* Has to be held on running the array */
        mddev_suspend_and_lock_nointr(&rs->md);
+
+       /* Keep array frozen until resume. */
+       md_frozen_sync_thread(&rs->md);
+
        r = md_run(&rs->md);
        rs->md.in_sync = 0; /* Assume already marked dirty */
        if (r) {
@@ -3339,7 +3341,8 @@ static int raid_map(struct dm_target *ti, struct bio *bio)
        if (unlikely(bio_end_sector(bio) > mddev->array_sectors))
                return DM_MAPIO_REQUEUE;
 
-       md_handle_request(mddev, bio);
+       if (unlikely(!md_handle_request(mddev, bio)))
+               return DM_MAPIO_REQUEUE;
 
        return DM_MAPIO_SUBMITTED;
 }
@@ -3718,21 +3721,33 @@ static int raid_message(struct dm_target *ti, unsigned int argc, char **argv,
 {
        struct raid_set *rs = ti->private;
        struct mddev *mddev = &rs->md;
+       int ret = 0;
 
        if (!mddev->pers || !mddev->pers->sync_request)
                return -EINVAL;
 
-       if (!strcasecmp(argv[0], "frozen"))
-               set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
-       else
-               clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
+       if (test_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags) ||
+           test_bit(RT_FLAG_RS_FROZEN, &rs->runtime_flags))
+               return -EBUSY;
 
-       if (!strcasecmp(argv[0], "idle") || !strcasecmp(argv[0], "frozen")) {
-               if (mddev->sync_thread) {
-                       set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-                       md_reap_sync_thread(mddev);
-               }
-       } else if (decipher_sync_action(mddev, mddev->recovery) != st_idle)
+       if (!strcasecmp(argv[0], "frozen")) {
+               ret = mddev_lock(mddev);
+               if (ret)
+                       return ret;
+
+               md_frozen_sync_thread(mddev);
+               mddev_unlock(mddev);
+       } else if (!strcasecmp(argv[0], "idle")) {
+               ret = mddev_lock(mddev);
+               if (ret)
+                       return ret;
+
+               md_idle_sync_thread(mddev);
+               mddev_unlock(mddev);
+       }
+
+       clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
+       if (decipher_sync_action(mddev, mddev->recovery) != st_idle)
                return -EBUSY;
        else if (!strcasecmp(argv[0], "resync"))
                ; /* MD_RECOVERY_NEEDED set below */
@@ -3791,15 +3806,46 @@ static void raid_io_hints(struct dm_target *ti, struct queue_limits *limits)
        blk_limits_io_opt(limits, chunk_size_bytes * mddev_data_stripes(rs));
 }
 
+static void raid_presuspend(struct dm_target *ti)
+{
+       struct raid_set *rs = ti->private;
+       struct mddev *mddev = &rs->md;
+
+       /*
+        * From now on, disallow raid_message() to change sync_thread until
+        * resume, raid_postsuspend() is too late.
+        */
+       set_bit(RT_FLAG_RS_FROZEN, &rs->runtime_flags);
+
+       if (!reshape_interrupted(mddev))
+               return;
+
+       /*
+        * For raid456, if reshape is interrupted, IO across reshape position
+        * will never make progress, while caller will wait for IO to be done.
+        * Inform raid456 to handle those IO to prevent deadlock.
+        */
+       if (mddev->pers && mddev->pers->prepare_suspend)
+               mddev->pers->prepare_suspend(mddev);
+}
+
+static void raid_presuspend_undo(struct dm_target *ti)
+{
+       struct raid_set *rs = ti->private;
+
+       clear_bit(RT_FLAG_RS_FROZEN, &rs->runtime_flags);
+}
+
 static void raid_postsuspend(struct dm_target *ti)
 {
        struct raid_set *rs = ti->private;
 
        if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {
-               /* Writes have to be stopped before suspending to avoid deadlocks. */
-               if (!test_bit(MD_RECOVERY_FROZEN, &rs->md.recovery))
-                       md_stop_writes(&rs->md);
-
+               /*
+                * sync_thread must be stopped during suspend, and writes have
+                * to be stopped before suspending to avoid deadlocks.
+                */
+               md_stop_writes(&rs->md);
                mddev_suspend(&rs->md, false);
        }
 }
@@ -4012,8 +4058,6 @@ static int raid_preresume(struct dm_target *ti)
        }
 
        /* Check for any resize/reshape on @rs and adjust/initiate */
-       /* Be prepared for mddev_resume() in raid_resume() */
-       set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
        if (mddev->recovery_cp && mddev->recovery_cp < MaxSector) {
                set_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
                mddev->resync_min = mddev->recovery_cp;
@@ -4047,7 +4091,9 @@ static void raid_resume(struct dm_target *ti)
                 * Take this opportunity to check whether any failed
                 * devices are reachable again.
                 */
+               mddev_lock_nointr(mddev);
                attempt_restore_of_faulty_devices(rs);
+               mddev_unlock(mddev);
        }
 
        if (test_and_clear_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {
@@ -4055,10 +4101,13 @@ static void raid_resume(struct dm_target *ti)
                if (mddev->delta_disks < 0)
                        rs_set_capacity(rs);
 
+               WARN_ON_ONCE(!test_bit(MD_RECOVERY_FROZEN, &mddev->recovery));
+               WARN_ON_ONCE(test_bit(MD_RECOVERY_RUNNING, &mddev->recovery));
+               clear_bit(RT_FLAG_RS_FROZEN, &rs->runtime_flags);
                mddev_lock_nointr(mddev);
-               clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
                mddev->ro = 0;
                mddev->in_sync = 0;
+               md_unfrozen_sync_thread(mddev);
                mddev_unlock_and_resume(mddev);
        }
 }
@@ -4074,6 +4123,8 @@ static struct target_type raid_target = {
        .message = raid_message,
        .iterate_devices = raid_iterate_devices,
        .io_hints = raid_io_hints,
+       .presuspend = raid_presuspend,
+       .presuspend_undo = raid_presuspend_undo,
        .postsuspend = raid_postsuspend,
        .preresume = raid_preresume,
        .resume = raid_resume,
index bdc14ec9981414c60e1dee432b97d50e82dbbc88..1e5d988f44da6919da6de094c6744bf1bb2a89be 100644 (file)
@@ -66,6 +66,9 @@ struct dm_stats_last_position {
        unsigned int last_rw;
 };
 
+#define DM_STAT_MAX_ENTRIES            8388608
+#define DM_STAT_MAX_HISTOGRAM_ENTRIES  134217728
+
 /*
  * A typo on the command line could possibly make the kernel run out of memory
  * and crash. To prevent the crash we account all used memory. We fail if we
@@ -285,6 +288,9 @@ static int dm_stats_create(struct dm_stats *stats, sector_t start, sector_t end,
        if (n_entries != (size_t)n_entries || !(size_t)(n_entries + 1))
                return -EOVERFLOW;
 
+       if (n_entries > DM_STAT_MAX_ENTRIES)
+               return -EOVERFLOW;
+
        shared_alloc_size = struct_size(s, stat_shared, n_entries);
        if ((shared_alloc_size - sizeof(struct dm_stat)) / sizeof(struct dm_stat_shared) != n_entries)
                return -EOVERFLOW;
@@ -297,6 +303,9 @@ static int dm_stats_create(struct dm_stats *stats, sector_t start, sector_t end,
        if (histogram_alloc_size / (n_histogram_entries + 1) != (size_t)n_entries * sizeof(unsigned long long))
                return -EOVERFLOW;
 
+       if ((n_histogram_entries + 1) * (size_t)n_entries > DM_STAT_MAX_HISTOGRAM_ENTRIES)
+               return -EOVERFLOW;
+
        if (!check_shared_memory(shared_alloc_size + histogram_alloc_size +
                                 num_possible_cpus() * (percpu_alloc_size + histogram_alloc_size)))
                return -ENOMEM;
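Same hardening pattern as the dm-ioctl change above, applied to dm-stats: a
typo in the region step could previously request a near-unbounded allocation.
The caps are easy to size: DM_STAT_MAX_ENTRIES is 2^23 regions, and
DM_STAT_MAX_HISTOGRAM_ENTRIES is 2^27 counters, which at 8 bytes per
unsigned long long bounds a histogram area at exactly 1 GiB, before the
existing check_shared_memory() accounting even runs.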
index 260b5b8f2b0d7e9352ed9ed9376a91504ee10c9d..88114719fe187ad42905424b2f5e685bf3e21e17 100644 (file)
@@ -129,7 +129,12 @@ static int alloc_targets(struct dm_table *t, unsigned int num)
 int dm_table_create(struct dm_table **result, blk_mode_t mode,
                    unsigned int num_targets, struct mapped_device *md)
 {
-       struct dm_table *t = kzalloc(sizeof(*t), GFP_KERNEL);
+       struct dm_table *t;
+
+       if (num_targets > DM_MAX_TARGETS)
+               return -EOVERFLOW;
+
+       t = kzalloc(sizeof(*t), GFP_KERNEL);
 
        if (!t)
                return -ENOMEM;
@@ -144,7 +149,7 @@ int dm_table_create(struct dm_table **result, blk_mode_t mode,
 
        if (!num_targets) {
                kfree(t);
-               return -ENOMEM;
+               return -EOVERFLOW;
        }
 
        if (alloc_targets(t, num_targets)) {
@@ -1958,26 +1963,27 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
        bool wc = false, fua = false;
        int r;
 
-       /*
-        * Copy table's limits to the DM device's request_queue
-        */
-       q->limits = *limits;
-
        if (dm_table_supports_nowait(t))
                blk_queue_flag_set(QUEUE_FLAG_NOWAIT, q);
        else
                blk_queue_flag_clear(QUEUE_FLAG_NOWAIT, q);
 
        if (!dm_table_supports_discards(t)) {
-               q->limits.max_discard_sectors = 0;
-               q->limits.max_hw_discard_sectors = 0;
-               q->limits.discard_granularity = 0;
-               q->limits.discard_alignment = 0;
-               q->limits.discard_misaligned = 0;
+               limits->max_hw_discard_sectors = 0;
+               limits->discard_granularity = 0;
+               limits->discard_alignment = 0;
+               limits->discard_misaligned = 0;
        }
 
+       if (!dm_table_supports_write_zeroes(t))
+               limits->max_write_zeroes_sectors = 0;
+
        if (!dm_table_supports_secure_erase(t))
-               q->limits.max_secure_erase_sectors = 0;
+               limits->max_secure_erase_sectors = 0;
+
+       r = queue_limits_set(q, limits);
+       if (r)
+               return r;
 
        if (dm_table_supports_flush(t, (1UL << QUEUE_FLAG_WC))) {
                wc = true;
@@ -2002,9 +2008,6 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
        else
                blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
 
-       if (!dm_table_supports_write_zeroes(t))
-               q->limits.max_write_zeroes_sectors = 0;
-
        dm_table_verify_integrity(t);
 
        /*
@@ -2042,7 +2045,6 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
        }
 
        dm_update_crypto_profile(q, t);
-       disk_update_readahead(t->md->disk);
 
        /*
         * Check for request-based device is left to
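Instead of copying *limits into q->limits and then zeroing individual fields,
dm_table_set_restrictions() now edits the staged limits and commits them once
through queue_limits_set(), which validates the result and takes care of
derived state such as readahead (which is why the explicit
disk_update_readahead() call disappears). The commit step, in sketch form
(kernel C fragment; supports_discards is illustrative):

    /* adjust the staged limits first ... */
    if (!supports_discards)
            limits->max_hw_discard_sectors = 0;

    /* ... then validate and apply atomically */
    r = queue_limits_set(q, limits);
    if (r)
            return r;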
index 14e58ae705218f71923b99bdfc1d195e6a45e658..1b591bfa90d5d6463016e22183dbb5f071e94a75 100644 (file)
@@ -482,6 +482,63 @@ int verity_for_bv_block(struct dm_verity *v, struct dm_verity_io *io,
        return 0;
 }
 
+static int verity_recheck_copy(struct dm_verity *v, struct dm_verity_io *io,
+                              u8 *data, size_t len)
+{
+       memcpy(data, io->recheck_buffer, len);
+       io->recheck_buffer += len;
+
+       return 0;
+}
+
+static noinline int verity_recheck(struct dm_verity *v, struct dm_verity_io *io,
+                                  struct bvec_iter start, sector_t cur_block)
+{
+       struct page *page;
+       void *buffer;
+       int r;
+       struct dm_io_request io_req;
+       struct dm_io_region io_loc;
+
+       page = mempool_alloc(&v->recheck_pool, GFP_NOIO);
+       buffer = page_to_virt(page);
+
+       io_req.bi_opf = REQ_OP_READ;
+       io_req.mem.type = DM_IO_KMEM;
+       io_req.mem.ptr.addr = buffer;
+       io_req.notify.fn = NULL;
+       io_req.client = v->io;
+       io_loc.bdev = v->data_dev->bdev;
+       io_loc.sector = cur_block << (v->data_dev_block_bits - SECTOR_SHIFT);
+       io_loc.count = 1 << (v->data_dev_block_bits - SECTOR_SHIFT);
+       r = dm_io(&io_req, 1, &io_loc, NULL);
+       if (unlikely(r))
+               goto free_ret;
+
+       r = verity_hash(v, verity_io_hash_req(v, io), buffer,
+                       1 << v->data_dev_block_bits,
+                       verity_io_real_digest(v, io), true);
+       if (unlikely(r))
+               goto free_ret;
+
+       if (memcmp(verity_io_real_digest(v, io),
+                  verity_io_want_digest(v, io), v->digest_size)) {
+               r = -EIO;
+               goto free_ret;
+       }
+
+       io->recheck_buffer = buffer;
+       r = verity_for_bv_block(v, io, &start, verity_recheck_copy);
+       if (unlikely(r))
+               goto free_ret;
+
+       r = 0;
+free_ret:
+       mempool_free(page, &v->recheck_pool);
+
+       return r;
+}
+
 static int verity_bv_zero(struct dm_verity *v, struct dm_verity_io *io,
                          u8 *data, size_t len)
 {
@@ -508,9 +565,7 @@ static int verity_verify_io(struct dm_verity_io *io)
 {
        bool is_zero;
        struct dm_verity *v = io->v;
-#if defined(CONFIG_DM_VERITY_FEC)
        struct bvec_iter start;
-#endif
        struct bvec_iter iter_copy;
        struct bvec_iter *iter;
        struct crypto_wait wait;
@@ -561,10 +616,7 @@ static int verity_verify_io(struct dm_verity_io *io)
                if (unlikely(r < 0))
                        return r;
 
-#if defined(CONFIG_DM_VERITY_FEC)
-               if (verity_fec_is_enabled(v))
-                       start = *iter;
-#endif
+               start = *iter;
                r = verity_for_io_block(v, io, iter, &wait);
                if (unlikely(r < 0))
                        return r;
@@ -586,6 +638,10 @@ static int verity_verify_io(struct dm_verity_io *io)
                         * tasklet since it may sleep, so fallback to work-queue.
                         */
                        return -EAGAIN;
+               } else if (verity_recheck(v, io, start, cur_block) == 0) {
+                       if (v->validated_blocks)
+                               set_bit(cur_block, v->validated_blocks);
+                       continue;
 #if defined(CONFIG_DM_VERITY_FEC)
                } else if (verity_fec_decode(v, io, DM_VERITY_BLOCK_TYPE_DATA,
                                             cur_block, NULL, &start) == 0) {
@@ -645,23 +701,6 @@ static void verity_work(struct work_struct *w)
        verity_finish_io(io, errno_to_blk_status(verity_verify_io(io)));
 }
 
-static void verity_tasklet(unsigned long data)
-{
-       struct dm_verity_io *io = (struct dm_verity_io *)data;
-       int err;
-
-       io->in_tasklet = true;
-       err = verity_verify_io(io);
-       if (err == -EAGAIN || err == -ENOMEM) {
-               /* fallback to retrying with work-queue */
-               INIT_WORK(&io->work, verity_work);
-               queue_work(io->v->verify_wq, &io->work);
-               return;
-       }
-
-       verity_finish_io(io, errno_to_blk_status(err));
-}
-
 static void verity_end_io(struct bio *bio)
 {
        struct dm_verity_io *io = bio->bi_private;
@@ -674,13 +713,8 @@ static void verity_end_io(struct bio *bio)
                return;
        }
 
-       if (static_branch_unlikely(&use_tasklet_enabled) && io->v->use_tasklet) {
-               tasklet_init(&io->tasklet, verity_tasklet, (unsigned long)io);
-               tasklet_schedule(&io->tasklet);
-       } else {
-               INIT_WORK(&io->work, verity_work);
-               queue_work(io->v->verify_wq, &io->work);
-       }
+       INIT_WORK(&io->work, verity_work);
+       queue_work(io->v->verify_wq, &io->work);
 }
 
 /*
@@ -963,6 +997,10 @@ static void verity_dtr(struct dm_target *ti)
        if (v->verify_wq)
                destroy_workqueue(v->verify_wq);
 
+       mempool_exit(&v->recheck_pool);
+       if (v->io)
+               dm_io_client_destroy(v->io);
+
        if (v->bufio)
                dm_bufio_client_destroy(v->bufio);
 
@@ -1401,6 +1439,20 @@ static int verity_ctr(struct dm_target *ti, unsigned int argc, char **argv)
        }
        v->hash_blocks = hash_position;
 
+       r = mempool_init_page_pool(&v->recheck_pool, 1, 0);
+       if (unlikely(r)) {
+               ti->error = "Cannot allocate mempool";
+               goto bad;
+       }
+
+       v->io = dm_io_client_create();
+       if (IS_ERR(v->io)) {
+               r = PTR_ERR(v->io);
+               v->io = NULL;
+               ti->error = "Cannot allocate dm io";
+               goto bad;
+       }
+
        v->bufio = dm_bufio_client_create(v->hash_dev->bdev,
                1 << v->hash_dev_block_bits, 1, sizeof(struct buffer_aux),
                dm_bufio_alloc_callback, NULL,
@@ -1508,7 +1560,7 @@ int dm_verity_get_root_digest(struct dm_target *ti, u8 **root_digest, unsigned i
 static struct target_type verity_target = {
        .name           = "verity",
        .features       = DM_TARGET_IMMUTABLE,
-       .version        = {1, 9, 0},
+       .version        = {1, 10, 0},
        .module         = THIS_MODULE,
        .ctr            = verity_ctr,
        .dtr            = verity_dtr,
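
The head of verity_recheck() falls above this excerpt: before the re-hash and memcmp() shown at the top of the hunk, the mismatching block is re-read synchronously through the new dm_io client into a page from recheck_pool, so the comparison runs against stable kernel memory rather than the caller's possibly-changing bio pages. A minimal sketch of that setup, approximating (not quoting) the patch, built from the fields the hunks here introduce (v->io, v->recheck_pool):

        struct dm_io_request io_req;
        struct dm_io_region io_loc;
        struct page *page;
        void *buffer;
        int r;

        page = mempool_alloc(&v->recheck_pool, GFP_NOIO);
        buffer = page_address(page);

        io_req.bi_opf = REQ_OP_READ;            /* fresh read, not the original bio pages */
        io_req.mem.type = DM_IO_KMEM;
        io_req.mem.ptr.addr = buffer;
        io_req.notify.fn = NULL;                /* NULL notify fn makes dm_io() synchronous */
        io_req.client = v->io;                  /* the client created in verity_ctr() below */
        io_loc.bdev = v->data_dev->bdev;
        io_loc.sector = cur_block << (v->data_dev_block_bits - SECTOR_SHIFT);
        io_loc.count = 1 << (v->data_dev_block_bits - SECTOR_SHIFT);

        r = dm_io(&io_req, 1, &io_loc, NULL);   /* then verity_hash()/memcmp() as shown above */
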
index f9d522c870e61665d87271f66c690138db42108f..db93a91169d5e6de31d344a6f37589bbc0bdb654 100644 (file)
@@ -11,6 +11,7 @@
 #ifndef DM_VERITY_H
 #define DM_VERITY_H
 
+#include <linux/dm-io.h>
 #include <linux/dm-bufio.h>
 #include <linux/device-mapper.h>
 #include <linux/interrupt.h>
@@ -68,6 +69,9 @@ struct dm_verity {
        unsigned long *validated_blocks; /* bitset blocks validated */
 
        char *signature_key_desc; /* signature keyring reference */
+
+       struct dm_io_client *io;
+       mempool_t recheck_pool;
 };
 
 struct dm_verity_io {
@@ -76,14 +80,15 @@ struct dm_verity_io {
        /* original value of bio->bi_end_io */
        bio_end_io_t *orig_bi_end_io;
 
+       struct bvec_iter iter;
+
        sector_t block;
        unsigned int n_blocks;
        bool in_tasklet;
 
-       struct bvec_iter iter;
-
        struct work_struct work;
-       struct tasklet_struct tasklet;
+
+       char *recheck_buffer;
 
        /*
         * Three variably-size fields follow this struct:
index 074cb785eafc19172b9ebf4b6a6f2ae4591563d6..b463c28c39ad34ca23b3d2433811384901171d80 100644 (file)
@@ -299,7 +299,7 @@ static int persistent_memory_claim(struct dm_writecache *wc)
                long i;
 
                wc->memory_map = NULL;
-               pages = kvmalloc_array(p, sizeof(struct page *), GFP_KERNEL);
+               pages = vmalloc_array(p, sizeof(struct page *));
                if (!pages) {
                        r = -ENOMEM;
                        goto err2;
@@ -330,7 +330,7 @@ static int persistent_memory_claim(struct dm_writecache *wc)
                        r = -ENOMEM;
                        goto err3;
                }
-               kvfree(pages);
+               vfree(pages);
                wc->memory_vmapped = true;
        }
 
@@ -341,7 +341,7 @@ static int persistent_memory_claim(struct dm_writecache *wc)
 
        return 0;
 err3:
-       kvfree(pages);
+       vfree(pages);
 err2:
        dax_read_unlock(id);
 err1:
@@ -962,7 +962,7 @@ static int writecache_alloc_entries(struct dm_writecache *wc)
 
        if (wc->entries)
                return 0;
-       wc->entries = vmalloc(array_size(sizeof(struct wc_entry), wc->n_blocks));
+       wc->entries = vmalloc_array(wc->n_blocks, sizeof(struct wc_entry));
        if (!wc->entries)
                return -ENOMEM;
        for (b = 0; b < wc->n_blocks; b++) {
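
All three hunks in this file apply one conversion: kvmalloc_array() can return either kmalloc- or vmalloc-backed memory and therefore demands kvfree(), while vmalloc_array() is always vmalloc-backed, keeps the overflow-checked count * size multiplication, and pairs unambiguously with vfree(). The rule in isolation, as a sketch:

        struct page **pages;

        pages = vmalloc_array(p, sizeof(struct page *)); /* overflow-checked, vmalloc-backed */
        if (!pages)
                return -ENOMEM;
        /* ... collect the pages and vmap() them, as persistent_memory_claim() does ... */
        vfree(pages);                                    /* the one matching free */
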
index fdfe30f7b6973d76fac4a50f6b41f6b2f4102168..8156881a31de93d4b155051e8085ceb90216aa46 100644 (file)
@@ -1655,10 +1655,13 @@ static int dmz_reset_zone(struct dmz_metadata *zmd, struct dm_zone *zone)
 
        if (!dmz_is_empty(zone) || dmz_seq_write_err(zone)) {
                struct dmz_dev *dev = zone->dev;
+               unsigned int noio_flag;
 
+               noio_flag = memalloc_noio_save();
                ret = blkdev_zone_mgmt(dev->bdev, REQ_OP_ZONE_RESET,
                                       dmz_start_sect(zmd, zone),
-                                      zmd->zone_nr_sectors, GFP_NOIO);
+                                      zmd->zone_nr_sectors);
+               memalloc_noio_restore(noio_flag);
                if (ret) {
                        dmz_dev_err(dev, "Reset zone %u failed %d",
                                    zone->id, ret);
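
blkdev_zone_mgmt() no longer takes a gfp_t argument, so callers that must avoid I/O recursion express the constraint with the scoped-allocation API from <linux/sched/mm.h> instead: between save and restore, every allocation, including any made internally by the callee, implicitly behaves as GFP_NOIO. A sketch of the pattern (bdev, sector and nr_sectors stand in for the real arguments):

        unsigned int noio_flag;
        int ret;

        noio_flag = memalloc_noio_save();       /* enter NOIO scope */
        /* any GFP_KERNEL allocation below this point is degraded to GFP_NOIO */
        ret = blkdev_zone_mgmt(bdev, REQ_OP_ZONE_RESET, sector, nr_sectors);
        memalloc_noio_restore(noio_flag);       /* leave NOIO scope */
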
index 8dcabf84d866e6d2ac863a98790b16b838475d23..447e132d09b53732f9d3eff3dc8aa85644ff3135 100644 (file)
@@ -726,7 +726,8 @@ static struct table_device *open_table_device(struct mapped_device *md,
                dev_t dev, blk_mode_t mode)
 {
        struct table_device *td;
-       struct bdev_handle *bdev_handle;
+       struct file *bdev_file;
+       struct block_device *bdev;
        u64 part_off;
        int r;
 
@@ -735,34 +736,36 @@ static struct table_device *open_table_device(struct mapped_device *md,
                return ERR_PTR(-ENOMEM);
        refcount_set(&td->count, 1);
 
-       bdev_handle = bdev_open_by_dev(dev, mode, _dm_claim_ptr, NULL);
-       if (IS_ERR(bdev_handle)) {
-               r = PTR_ERR(bdev_handle);
+       bdev_file = bdev_file_open_by_dev(dev, mode, _dm_claim_ptr, NULL);
+       if (IS_ERR(bdev_file)) {
+               r = PTR_ERR(bdev_file);
                goto out_free_td;
        }
 
+       bdev = file_bdev(bdev_file);
+
        /*
         * We can be called before the dm disk is added.  In that case we can't
         * register the holder relation here.  It will be done once add_disk
         * has been called.
         */
        if (md->disk->slave_dir) {
-               r = bd_link_disk_holder(bdev_handle->bdev, md->disk);
+               r = bd_link_disk_holder(bdev, md->disk);
                if (r)
                        goto out_blkdev_put;
        }
 
        td->dm_dev.mode = mode;
-       td->dm_dev.bdev = bdev_handle->bdev;
-       td->dm_dev.bdev_handle = bdev_handle;
-       td->dm_dev.dax_dev = fs_dax_get_by_bdev(bdev_handle->bdev, &part_off,
+       td->dm_dev.bdev = bdev;
+       td->dm_dev.bdev_file = bdev_file;
+       td->dm_dev.dax_dev = fs_dax_get_by_bdev(bdev, &part_off,
                                                NULL, NULL);
        format_dev_t(td->dm_dev.name, dev);
        list_add(&td->list, &md->table_devices);
        return td;
 
 out_blkdev_put:
-       bdev_release(bdev_handle);
+       fput(bdev_file);
 out_free_td:
        kfree(td);
        return ERR_PTR(r);
@@ -775,7 +778,7 @@ static void close_table_device(struct table_device *td, struct mapped_device *md
 {
        if (md->disk->slave_dir)
                bd_unlink_disk_holder(td->dm_dev.bdev, md->disk);
-       bdev_release(td->dm_dev.bdev_handle);
+       fput(td->dm_dev.bdev_file);
        put_dax(td->dm_dev.dax_dev);
        list_del(&td->list);
        kfree(td);
@@ -2098,8 +2101,8 @@ static struct mapped_device *alloc_dev(int minor)
         * established. If request-based table is loaded: blk-mq will
         * override accordingly.
         */
-       md->disk = blk_alloc_disk(md->numa_node_id);
-       if (!md->disk)
+       md->disk = blk_alloc_disk(NULL, md->numa_node_id);
+       if (IS_ERR(md->disk))
                goto bad;
        md->queue = md->disk->queue;
 
index 9672f75c30503cefc197ed0c3234d7a254b9fbe4..059afc24c08bec85ba2739002043619053ca0f1d 100644 (file)
@@ -234,7 +234,8 @@ static int __write_sb_page(struct md_rdev *rdev, struct bitmap *bitmap,
        sector_t doff;
 
        bdev = (rdev->meta_bdev) ? rdev->meta_bdev : rdev->bdev;
-       if (pg_index == store->file_pages - 1) {
+       /* compare page counts relative to the superblock, not absolute page offsets. */
+       if ((pg_index - store->sb_index) == store->file_pages - 1) {
                unsigned int last_page_size = store->bytes & (PAGE_SIZE - 1);
 
                if (last_page_size == 0)
@@ -438,8 +439,8 @@ static void filemap_write_page(struct bitmap *bitmap, unsigned long pg_index,
        struct page *page = store->filemap[pg_index];
 
        if (mddev_is_clustered(bitmap->mddev)) {
-               pg_index += bitmap->cluster_slot *
-                       DIV_ROUND_UP(store->bytes, PAGE_SIZE);
+               /* advance to the start of this node's bitmap area */
+               pg_index += store->sb_index;
        }
 
        if (store->file)
@@ -952,6 +953,7 @@ static void md_bitmap_file_set_bit(struct bitmap *bitmap, sector_t block)
        unsigned long index = file_page_index(store, chunk);
        unsigned long node_offset = 0;
 
+       index += store->sb_index;
        if (mddev_is_clustered(bitmap->mddev))
                node_offset = bitmap->cluster_slot * store->file_pages;
 
@@ -982,6 +984,7 @@ static void md_bitmap_file_clear_bit(struct bitmap *bitmap, sector_t block)
        unsigned long index = file_page_index(store, chunk);
        unsigned long node_offset = 0;
 
+       index += store->sb_index;
        if (mddev_is_clustered(bitmap->mddev))
                node_offset = bitmap->cluster_slot * store->file_pages;
 
@@ -1043,9 +1046,8 @@ void md_bitmap_unplug(struct bitmap *bitmap)
                if (dirty || need_write) {
                        if (!writing) {
                                md_bitmap_wait_writes(bitmap);
-                               if (bitmap->mddev->queue)
-                                       blk_add_trace_msg(bitmap->mddev->queue,
-                                                         "md bitmap_unplug");
+                               mddev_add_trace_msg(bitmap->mddev,
+                                       "md bitmap_unplug");
                        }
                        clear_page_attr(bitmap, i, BITMAP_PAGE_PENDING);
                        filemap_write_page(bitmap, i, false);
@@ -1316,9 +1318,7 @@ void md_bitmap_daemon_work(struct mddev *mddev)
        }
        bitmap->allclean = 1;
 
-       if (bitmap->mddev->queue)
-               blk_add_trace_msg(bitmap->mddev->queue,
-                                 "md bitmap_daemon_work");
+       mddev_add_trace_msg(bitmap->mddev, "md bitmap_daemon_work");
 
        /* Any file-page which is PENDING now needs to be written.
         * So set NEEDWRITE now, then after we make any last-minute changes
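
All the md-bitmap hunks apply the same correction: indexes produced by file_page_index() are relative to the bitmap superblock page, so call sites must add store->sb_index before addressing the file, and the last-page test must compare relative indexes (page counts) rather than absolute offsets. With hypothetical numbers, as a sketch:

        /* hypothetical layout: superblock in file page 1, four bitmap pages */
        unsigned long sb_index = 1, file_pages = 4;
        unsigned long pg_index = 4;             /* absolute page within the file */

        /* last-page test: relative index, not absolute offset */
        bool last = (pg_index - sb_index) == file_pages - 1;   /* true */

        /* bit operations go the other way, relative -> absolute:
         *      index = file_page_index(store, chunk) + store->sb_index;
         */
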
diff --git a/drivers/md/md-linear.h b/drivers/md/md-linear.h
deleted file mode 100644 (file)
index 5587eee..0000000
+++ /dev/null
@@ -1,17 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _LINEAR_H
-#define _LINEAR_H
-
-struct dev_info {
-       struct md_rdev  *rdev;
-       sector_t        end_sector;
-};
-
-struct linear_conf
-{
-       struct rcu_head         rcu;
-       sector_t                array_sectors;
-       int                     raid_disks; /* a copy of mddev->raid_disks */
-       struct dev_info         disks[] __counted_by(raid_disks);
-};
-#endif
diff --git a/drivers/md/md-multipath.h b/drivers/md/md-multipath.h
deleted file mode 100644 (file)
index b3099e5..0000000
+++ /dev/null
@@ -1,32 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _MULTIPATH_H
-#define _MULTIPATH_H
-
-struct multipath_info {
-       struct md_rdev  *rdev;
-};
-
-struct mpconf {
-       struct mddev                    *mddev;
-       struct multipath_info   *multipaths;
-       int                     raid_disks;
-       spinlock_t              device_lock;
-       struct list_head        retry_list;
-
-       mempool_t               pool;
-};
-
-/*
- * this is our 'private' 'collective' MULTIPATH buffer head.
- * it contains information about what kind of IO operations were started
- * for this MULTIPATH operation, and about their status:
- */
-
-struct multipath_bh {
-       struct mddev                    *mddev;
-       struct bio              *master_bio;
-       struct bio              bio;
-       int                     path;
-       struct list_head        retry_list;
-};
-#endif
index 2266358d807466f95d02b431d09ee39805dff5e8..e575e74aabf5efc8f6660b73166f59cffcbc4d19 100644 (file)
@@ -65,7 +65,6 @@
 #include <linux/percpu-refcount.h>
 #include <linux/part_stat.h>
 
-#include <trace/events/block.h>
 #include "md.h"
 #include "md-bitmap.h"
 #include "md-cluster.h"
@@ -99,18 +98,6 @@ static void mddev_detach(struct mddev *mddev);
 static void export_rdev(struct md_rdev *rdev, struct mddev *mddev);
 static void md_wakeup_thread_directly(struct md_thread __rcu *thread);
 
-enum md_ro_state {
-       MD_RDWR,
-       MD_RDONLY,
-       MD_AUTO_READ,
-       MD_MAX_STATE
-};
-
-static bool md_is_rdwr(struct mddev *mddev)
-{
-       return (mddev->ro == MD_RDWR);
-}
-
 /*
  * Default number of read corrections we'll attempt on an rdev
  * before ejecting it from the array. We divide the read error
@@ -378,7 +365,7 @@ static bool is_suspended(struct mddev *mddev, struct bio *bio)
        return true;
 }
 
-void md_handle_request(struct mddev *mddev, struct bio *bio)
+bool md_handle_request(struct mddev *mddev, struct bio *bio)
 {
 check_suspended:
        if (is_suspended(mddev, bio)) {
@@ -386,7 +373,7 @@ check_suspended:
                /* Bail out if REQ_NOWAIT is set for the bio */
                if (bio->bi_opf & REQ_NOWAIT) {
                        bio_wouldblock_error(bio);
-                       return;
+                       return true;
                }
                for (;;) {
                        prepare_to_wait(&mddev->sb_wait, &__wait,
@@ -402,10 +389,13 @@ check_suspended:
 
        if (!mddev->pers->make_request(mddev, bio)) {
                percpu_ref_put(&mddev->active_io);
+               if (!mddev->gendisk && mddev->pers->prepare_suspend)
+                       return false;
                goto check_suspended;
        }
 
        percpu_ref_put(&mddev->active_io);
+       return true;
 }
 EXPORT_SYMBOL(md_handle_request);
 
@@ -529,6 +519,24 @@ void mddev_resume(struct mddev *mddev)
 }
 EXPORT_SYMBOL_GPL(mddev_resume);
 
+/* sync the bdev before setting the device read-only or stopping the raid */
+static int mddev_set_closing_and_sync_blockdev(struct mddev *mddev, int opener_num)
+{
+       mutex_lock(&mddev->open_mutex);
+       if (mddev->pers && atomic_read(&mddev->openers) > opener_num) {
+               mutex_unlock(&mddev->open_mutex);
+               return -EBUSY;
+       }
+       if (test_and_set_bit(MD_CLOSING, &mddev->flags)) {
+               mutex_unlock(&mddev->open_mutex);
+               return -EBUSY;
+       }
+       mutex_unlock(&mddev->open_mutex);
+
+       sync_blockdev(mddev->gendisk->part0);
+       return 0;
+}
+
 /*
  * Generic flush handling for md
  */
@@ -579,8 +587,12 @@ static void submit_flushes(struct work_struct *ws)
                        rcu_read_lock();
                }
        rcu_read_unlock();
-       if (atomic_dec_and_test(&mddev->flush_pending))
+       if (atomic_dec_and_test(&mddev->flush_pending)) {
+               /* pairs with the percpu_ref_get() in md_flush_request() */
+               percpu_ref_put(&mddev->active_io);
+
                queue_work(md_wq, &mddev->flush_work);
+       }
 }
 
 static void md_submit_flush_data(struct work_struct *ws)
@@ -2402,7 +2414,7 @@ int md_integrity_register(struct mddev *mddev)
 
        if (list_empty(&mddev->disks))
                return 0; /* nothing to do */
-       if (!mddev->gendisk || blk_get_integrity(mddev->gendisk))
+       if (mddev_is_dm(mddev) || blk_get_integrity(mddev->gendisk))
                return 0; /* shouldn't register, or already is */
        rdev_for_each(rdev, mddev) {
                /* skip spares and non-functional disks */
@@ -2455,7 +2467,7 @@ int md_integrity_add_rdev(struct md_rdev *rdev, struct mddev *mddev)
 {
        struct blk_integrity *bi_mddev;
 
-       if (!mddev->gendisk)
+       if (mddev_is_dm(mddev))
                return 0;
 
        bi_mddev = blk_get_integrity(mddev->gendisk);
@@ -2562,6 +2574,7 @@ static int bind_rdev_to_array(struct md_rdev *rdev, struct mddev *mddev)
  fail:
        pr_warn("md: failed to register dev-%s for %s\n",
                b, mdname(mddev));
+       mddev_destroy_serial_pool(mddev, rdev);
        return err;
 }
 
@@ -2578,7 +2591,7 @@ static void export_rdev(struct md_rdev *rdev, struct mddev *mddev)
        if (test_bit(AutoDetected, &rdev->flags))
                md_autodetect_dev(rdev->bdev->bd_dev);
 #endif
-       bdev_release(rdev->bdev_handle);
+       fput(rdev->bdev_file);
        rdev->bdev = NULL;
        kobject_put(&rdev->kobj);
 }
@@ -2591,7 +2604,7 @@ static void md_kick_rdev_from_array(struct md_rdev *rdev)
        list_del_rcu(&rdev->same_set);
        pr_debug("md: unbind<%pg>\n", rdev->bdev);
        mddev_destroy_serial_pool(rdev->mddev, rdev);
-       rdev->mddev = NULL;
+       WRITE_ONCE(rdev->mddev, NULL);
        sysfs_remove_link(&rdev->kobj, "block");
        sysfs_put(rdev->sysfs_state);
        sysfs_put(rdev->sysfs_unack_badblocks);
@@ -2847,8 +2860,7 @@ repeat:
        pr_debug("md: updating %s RAID superblock on device (in sync %d)\n",
                 mdname(mddev), mddev->in_sync);
 
-       if (mddev->queue)
-               blk_add_trace_msg(mddev->queue, "md md_update_sb");
+       mddev_add_trace_msg(mddev, "md md_update_sb");
 rewrite:
        md_bitmap_update_sb(mddev->bitmap);
        rdev_for_each(rdev, mddev) {
@@ -2929,7 +2941,6 @@ static int add_bound_rdev(struct md_rdev *rdev)
                set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
        set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
        md_new_event();
-       md_wakeup_thread(mddev->thread);
        return 0;
 }
 
@@ -3044,10 +3055,8 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len)
 
                        if (err == 0) {
                                md_kick_rdev_from_array(rdev);
-                               if (mddev->pers) {
+                               if (mddev->pers)
                                        set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags);
-                                       md_wakeup_thread(mddev->thread);
-                               }
                                md_new_event();
                        }
                }
@@ -3077,7 +3086,6 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len)
                clear_bit(BlockedBadBlocks, &rdev->flags);
                wake_up(&rdev->blocked_wait);
                set_bit(MD_RECOVERY_NEEDED, &rdev->mddev->recovery);
-               md_wakeup_thread(rdev->mddev->thread);
 
                err = 0;
        } else if (cmd_match(buf, "insync") && rdev->raid_disk == -1) {
@@ -3115,7 +3123,6 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len)
                    !test_bit(Replacement, &rdev->flags))
                        set_bit(WantReplacement, &rdev->flags);
                set_bit(MD_RECOVERY_NEEDED, &rdev->mddev->recovery);
-               md_wakeup_thread(rdev->mddev->thread);
                err = 0;
        } else if (cmd_match(buf, "-want_replacement")) {
                /* Clearing 'want_replacement' is always allowed.
@@ -3245,7 +3252,6 @@ slot_store(struct md_rdev *rdev, const char *buf, size_t len)
                if (rdev->raid_disk >= 0)
                        return -EBUSY;
                set_bit(MD_RECOVERY_NEEDED, &rdev->mddev->recovery);
-               md_wakeup_thread(rdev->mddev->thread);
        } else if (rdev->mddev->pers) {
                /* Activating a spare .. or possibly reactivating
                 * if we ever get bitmaps working here.
@@ -3339,8 +3345,7 @@ static ssize_t new_offset_store(struct md_rdev *rdev,
        if (kstrtoull(buf, 10, &new_offset) < 0)
                return -EINVAL;
 
-       if (mddev->sync_thread ||
-           test_bit(MD_RECOVERY_RUNNING,&mddev->recovery))
+       if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
                return -EBUSY;
        if (new_offset == rdev->data_offset)
                /* reset is always permitted */
@@ -3671,7 +3676,7 @@ rdev_attr_store(struct kobject *kobj, struct attribute *attr,
        struct kernfs_node *kn = NULL;
        bool suspend = false;
        ssize_t rv;
-       struct mddev *mddev = rdev->mddev;
+       struct mddev *mddev = READ_ONCE(rdev->mddev);
 
        if (!entry->store)
                return -EIO;
@@ -3773,16 +3778,16 @@ static struct md_rdev *md_import_device(dev_t newdev, int super_format, int supe
        if (err)
                goto out_clear_rdev;
 
-       rdev->bdev_handle = bdev_open_by_dev(newdev,
+       rdev->bdev_file = bdev_file_open_by_dev(newdev,
                        BLK_OPEN_READ | BLK_OPEN_WRITE,
                        super_format == -2 ? &claim_rdev : rdev, NULL);
-       if (IS_ERR(rdev->bdev_handle)) {
+       if (IS_ERR(rdev->bdev_file)) {
                pr_warn("md: could not open device unknown-block(%u,%u).\n",
                        MAJOR(newdev), MINOR(newdev));
-               err = PTR_ERR(rdev->bdev_handle);
+               err = PTR_ERR(rdev->bdev_file);
                goto out_clear_rdev;
        }
-       rdev->bdev = rdev->bdev_handle->bdev;
+       rdev->bdev = file_bdev(rdev->bdev_file);
 
        kobject_init(&rdev->kobj, &rdev_ktype);
 
@@ -3813,7 +3818,7 @@ static struct md_rdev *md_import_device(dev_t newdev, int super_format, int supe
        return rdev;
 
 out_blkdev_put:
-       bdev_release(rdev->bdev_handle);
+       fput(rdev->bdev_file);
 out_clear_rdev:
        md_rdev_clear(rdev);
 out_free_rdev:
@@ -4013,8 +4018,7 @@ level_store(struct mddev *mddev, const char *buf, size_t len)
         */
 
        rv = -EBUSY;
-       if (mddev->sync_thread ||
-           test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
+       if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
            mddev->reshape_position != MaxSector ||
            mddev->sysfs_active)
                goto out_unlock;
@@ -4164,7 +4168,6 @@ level_store(struct mddev *mddev, const char *buf, size_t len)
                mddev->in_sync = 1;
                del_timer_sync(&mddev->safemode_timer);
        }
-       blk_set_stacking_limits(&mddev->queue->limits);
        pers->run(mddev);
        set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags);
        if (!mddev->thread)
@@ -4471,8 +4474,8 @@ array_state_show(struct mddev *mddev, char *page)
        return sprintf(page, "%s\n", array_states[st]);
 }
 
-static int do_md_stop(struct mddev *mddev, int ro, struct block_device *bdev);
-static int md_set_readonly(struct mddev *mddev, struct block_device *bdev);
+static int do_md_stop(struct mddev *mddev, int ro);
+static int md_set_readonly(struct mddev *mddev);
 static int restart_array(struct mddev *mddev);
 
 static ssize_t
@@ -4489,6 +4492,17 @@ array_state_store(struct mddev *mddev, const char *buf, size_t len)
        case broken:            /* cannot be set */
        case bad_word:
                return -EINVAL;
+       case clear:
+       case readonly:
+       case inactive:
+       case read_auto:
+               if (!mddev->pers || !md_is_rdwr(mddev))
+                       break;
+               /* a sysfs write does not open mddev, so the opener count should be 0 */
+               err = mddev_set_closing_and_sync_blockdev(mddev, 0);
+               if (err)
+                       return err;
+               break;
        default:
                break;
        }
@@ -4522,14 +4536,14 @@ array_state_store(struct mddev *mddev, const char *buf, size_t len)
        case inactive:
                /* stop an active array, return 0 otherwise */
                if (mddev->pers)
-                       err = do_md_stop(mddev, 2, NULL);
+                       err = do_md_stop(mddev, 2);
                break;
        case clear:
-               err = do_md_stop(mddev, 0, NULL);
+               err = do_md_stop(mddev, 0);
                break;
        case readonly:
                if (mddev->pers)
-                       err = md_set_readonly(mddev, NULL);
+                       err = md_set_readonly(mddev);
                else {
                        mddev->ro = MD_RDONLY;
                        set_disk_ro(mddev->gendisk, 1);
@@ -4539,7 +4553,7 @@ array_state_store(struct mddev *mddev, const char *buf, size_t len)
        case read_auto:
                if (mddev->pers) {
                        if (md_is_rdwr(mddev))
-                               err = md_set_readonly(mddev, NULL);
+                               err = md_set_readonly(mddev);
                        else if (mddev->ro == MD_RDONLY)
                                err = restart_array(mddev);
                        if (err == 0) {
@@ -4588,6 +4602,11 @@ array_state_store(struct mddev *mddev, const char *buf, size_t len)
                sysfs_notify_dirent_safe(mddev->sysfs_state);
        }
        mddev_unlock(mddev);
+
+       if (st == readonly || st == read_auto || st == inactive ||
+           (err && st == clear))
+               clear_bit(MD_CLOSING, &mddev->flags);
+
        return err ?: len;
 }
 static struct md_sysfs_entry md_array_state =
@@ -4915,6 +4934,35 @@ static void stop_sync_thread(struct mddev *mddev, bool locked, bool check_seq)
                mddev_lock_nointr(mddev);
 }
 
+void md_idle_sync_thread(struct mddev *mddev)
+{
+       lockdep_assert_held(&mddev->reconfig_mutex);
+
+       clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
+       stop_sync_thread(mddev, true, true);
+}
+EXPORT_SYMBOL_GPL(md_idle_sync_thread);
+
+void md_frozen_sync_thread(struct mddev *mddev)
+{
+       lockdep_assert_held(&mddev->reconfig_mutex);
+
+       set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
+       stop_sync_thread(mddev, true, false);
+}
+EXPORT_SYMBOL_GPL(md_frozen_sync_thread);
+
+void md_unfrozen_sync_thread(struct mddev *mddev)
+{
+       lockdep_assert_held(&mddev->reconfig_mutex);
+
+       clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
+       set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
+       md_wakeup_thread(mddev->thread);
+       sysfs_notify_dirent_safe(mddev->sysfs_action);
+}
+EXPORT_SYMBOL_GPL(md_unfrozen_sync_thread);
+
 static void idle_sync_thread(struct mddev *mddev)
 {
        mutex_lock(&mddev->sync_mutex);
@@ -5706,6 +5754,51 @@ static const struct kobj_type md_ktype = {
 
 int mdp_major = 0;
 
+/* stack the limits of all rdevs into lim */
+void mddev_stack_rdev_limits(struct mddev *mddev, struct queue_limits *lim)
+{
+       struct md_rdev *rdev;
+
+       rdev_for_each(rdev, mddev) {
+               queue_limits_stack_bdev(lim, rdev->bdev, rdev->data_offset,
+                                       mddev->gendisk->disk_name);
+       }
+}
+EXPORT_SYMBOL_GPL(mddev_stack_rdev_limits);
+
+/* apply the extra stacking limits from a new rdev into mddev */
+int mddev_stack_new_rdev(struct mddev *mddev, struct md_rdev *rdev)
+{
+       struct queue_limits lim;
+
+       if (mddev_is_dm(mddev))
+               return 0;
+
+       lim = queue_limits_start_update(mddev->gendisk->queue);
+       queue_limits_stack_bdev(&lim, rdev->bdev, rdev->data_offset,
+                               mddev->gendisk->disk_name);
+       return queue_limits_commit_update(mddev->gendisk->queue, &lim);
+}
+EXPORT_SYMBOL_GPL(mddev_stack_new_rdev);
+
+/* update the optimal I/O size after a reshape */
+void mddev_update_io_opt(struct mddev *mddev, unsigned int nr_stripes)
+{
+       struct queue_limits lim;
+
+       if (mddev_is_dm(mddev))
+               return;
+
+       /* don't bother updating io_opt if we can't suspend the array */
+       if (mddev_suspend(mddev, false) < 0)
+               return;
+       lim = queue_limits_start_update(mddev->gendisk->queue);
+       lim.io_opt = lim.io_min * nr_stripes;
+       queue_limits_commit_update(mddev->gendisk->queue, &lim);
+       mddev_resume(mddev);
+}
+EXPORT_SYMBOL_GPL(mddev_update_io_opt);
+
 static void mddev_delayed_delete(struct work_struct *ws)
 {
        struct mddev *mddev = container_of(ws, struct mddev, del_work);
@@ -5770,10 +5863,11 @@ struct mddev *md_alloc(dev_t dev, char *name)
                 */
                mddev->hold_active = UNTIL_STOP;
 
-       error = -ENOMEM;
-       disk = blk_alloc_disk(NUMA_NO_NODE);
-       if (!disk)
+       disk = blk_alloc_disk(NULL, NUMA_NO_NODE);
+       if (IS_ERR(disk)) {
+               error = PTR_ERR(disk);
                goto out_free_mddev;
+       }
 
        disk->major = MAJOR(mddev->unit);
        disk->first_minor = unit << shift;
@@ -5787,9 +5881,7 @@ struct mddev *md_alloc(dev_t dev, char *name)
        disk->fops = &md_fops;
        disk->private_data = mddev;
 
-       mddev->queue = disk->queue;
-       blk_set_stacking_limits(&mddev->queue->limits);
-       blk_queue_write_cache(mddev->queue, true, true);
+       blk_queue_write_cache(disk->queue, true, true);
        disk->events |= DISK_EVENT_MEDIA_CHANGE;
        mddev->gendisk = disk;
        error = add_disk(disk);
@@ -5931,7 +6023,7 @@ int md_run(struct mddev *mddev)
                invalidate_bdev(rdev->bdev);
                if (mddev->ro != MD_RDONLY && rdev_read_only(rdev)) {
                        mddev->ro = MD_RDONLY;
-                       if (mddev->gendisk)
+                       if (!mddev_is_dm(mddev))
                                set_disk_ro(mddev->gendisk, 1);
                }
 
@@ -6034,7 +6126,10 @@ int md_run(struct mddev *mddev)
                        pr_warn("True protection against single-disk failure might be compromised.\n");
        }
 
-       mddev->recovery = 0;
+       /* dm-raid expects the sync_thread to stay frozen until resume */
+       if (mddev->gendisk)
+               mddev->recovery = 0;
+
        /* may be over-ridden by personality */
        mddev->resync_max_sectors = mddev->dev_sectors;
 
@@ -6090,7 +6185,8 @@ int md_run(struct mddev *mddev)
                }
        }
 
-       if (mddev->queue) {
+       if (!mddev_is_dm(mddev)) {
+               struct request_queue *q = mddev->gendisk->queue;
                bool nonrot = true;
 
                rdev_for_each(rdev, mddev) {
@@ -6102,14 +6198,14 @@ int md_run(struct mddev *mddev)
                if (mddev->degraded)
                        nonrot = false;
                if (nonrot)
-                       blk_queue_flag_set(QUEUE_FLAG_NONROT, mddev->queue);
+                       blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
                else
-                       blk_queue_flag_clear(QUEUE_FLAG_NONROT, mddev->queue);
-               blk_queue_flag_set(QUEUE_FLAG_IO_STAT, mddev->queue);
+                       blk_queue_flag_clear(QUEUE_FLAG_NONROT, q);
+               blk_queue_flag_set(QUEUE_FLAG_IO_STAT, q);
 
                /* Set the NOWAIT flags if all underlying devices support it */
                if (nowait)
-                       blk_queue_flag_set(QUEUE_FLAG_NOWAIT, mddev->queue);
+                       blk_queue_flag_set(QUEUE_FLAG_NOWAIT, q);
        }
        if (pers->sync_request) {
                if (mddev->kobj.sd &&
@@ -6188,7 +6284,6 @@ int do_md_run(struct mddev *mddev)
        /* run start up tasks that require md_thread */
        md_start(mddev);
 
-       md_wakeup_thread(mddev->thread);
        md_wakeup_thread(mddev->sync_thread); /* possibly kick off a reshape */
 
        set_capacity_and_notify(mddev->gendisk, mddev->array_sectors);
@@ -6209,7 +6304,6 @@ int md_start(struct mddev *mddev)
 
        if (mddev->pers->start) {
                set_bit(MD_RECOVERY_WAIT, &mddev->recovery);
-               md_wakeup_thread(mddev->thread);
                ret = mddev->pers->start(mddev);
                clear_bit(MD_RECOVERY_WAIT, &mddev->recovery);
                md_wakeup_thread(mddev->sync_thread);
@@ -6254,7 +6348,6 @@ static int restart_array(struct mddev *mddev)
        pr_debug("md: %s switched to read-write mode.\n", mdname(mddev));
        /* Kick recovery or resync if necessary */
        set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
-       md_wakeup_thread(mddev->thread);
        md_wakeup_thread(mddev->sync_thread);
        sysfs_notify_dirent_safe(mddev->sysfs_state);
        return 0;
@@ -6274,7 +6367,15 @@ static void md_clean(struct mddev *mddev)
        mddev->persistent = 0;
        mddev->level = LEVEL_NONE;
        mddev->clevel[0] = 0;
-       mddev->flags = 0;
+       /*
+        * Don't clear MD_CLOSING, or mddev can be opened again.
+        * 'hold_active != 0' means mddev is still in the creation
+        * process and will be used later.
+        */
+       if (mddev->hold_active)
+               mddev->flags = 0;
+       else
+               mddev->flags &= BIT_ULL_MASK(MD_CLOSING);
        mddev->sb_flags = 0;
        mddev->ro = MD_RDWR;
        mddev->metadata_type[0] = 0;
@@ -6311,7 +6412,6 @@ static void md_clean(struct mddev *mddev)
 
 static void __md_stop_writes(struct mddev *mddev)
 {
-       stop_sync_thread(mddev, true, false);
        del_timer_sync(&mddev->safemode_timer);
 
        if (mddev->pers && mddev->pers->quiesce) {
@@ -6336,6 +6436,8 @@ static void __md_stop_writes(struct mddev *mddev)
 void md_stop_writes(struct mddev *mddev)
 {
        mddev_lock_nointr(mddev);
+       set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
+       stop_sync_thread(mddev, true, false);
        __md_stop_writes(mddev);
        mddev_unlock(mddev);
 }
@@ -6349,8 +6451,10 @@ static void mddev_detach(struct mddev *mddev)
                mddev->pers->quiesce(mddev, 0);
        }
        md_unregister_thread(mddev, &mddev->thread);
-       if (mddev->queue)
-               blk_sync_queue(mddev->queue); /* the unplug fn references 'conf'*/
+
+       /* the unplug fn references 'conf' */
+       if (!mddev_is_dm(mddev))
+               blk_sync_queue(mddev->gendisk->queue);
 }
 
 static void __md_stop(struct mddev *mddev)
@@ -6387,7 +6491,8 @@ void md_stop(struct mddev *mddev)
 
 EXPORT_SYMBOL_GPL(md_stop);
 
-static int md_set_readonly(struct mddev *mddev, struct block_device *bdev)
+/* ensure 'mddev->pers' exists before calling md_set_readonly() */
+static int md_set_readonly(struct mddev *mddev)
 {
        int err = 0;
        int did_freeze = 0;
@@ -6398,7 +6503,6 @@ static int md_set_readonly(struct mddev *mddev, struct block_device *bdev)
        if (!test_bit(MD_RECOVERY_FROZEN, &mddev->recovery)) {
                did_freeze = 1;
                set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
-               md_wakeup_thread(mddev->thread);
        }
 
        stop_sync_thread(mddev, false, false);
@@ -6406,36 +6510,29 @@ static int md_set_readonly(struct mddev *mddev, struct block_device *bdev)
                   !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags));
        mddev_lock_nointr(mddev);
 
-       mutex_lock(&mddev->open_mutex);
-       if ((mddev->pers && atomic_read(&mddev->openers) > !!bdev) ||
-           mddev->sync_thread ||
-           test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
+       if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
                pr_warn("md: %s still in use.\n",mdname(mddev));
                err = -EBUSY;
                goto out;
        }
 
-       if (mddev->pers) {
-               __md_stop_writes(mddev);
-
-               if (mddev->ro == MD_RDONLY) {
-                       err  = -ENXIO;
-                       goto out;
-               }
+       __md_stop_writes(mddev);
 
-               mddev->ro = MD_RDONLY;
-               set_disk_ro(mddev->gendisk, 1);
+       if (mddev->ro == MD_RDONLY) {
+               err  = -ENXIO;
+               goto out;
        }
 
+       mddev->ro = MD_RDONLY;
+       set_disk_ro(mddev->gendisk, 1);
+
 out:
-       if ((mddev->pers && !err) || did_freeze) {
+       if (!err || did_freeze) {
                clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
                set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
-               md_wakeup_thread(mddev->thread);
                sysfs_notify_dirent_safe(mddev->sysfs_state);
        }
 
-       mutex_unlock(&mddev->open_mutex);
        return err;
 }
 
@@ -6443,8 +6540,7 @@ out:
  *   0 - completely stop and dis-assemble array
  *   2 - stop but do not disassemble array
  */
-static int do_md_stop(struct mddev *mddev, int mode,
-                     struct block_device *bdev)
+static int do_md_stop(struct mddev *mddev, int mode)
 {
        struct gendisk *disk = mddev->gendisk;
        struct md_rdev *rdev;
@@ -6453,22 +6549,16 @@ static int do_md_stop(struct mddev *mddev, int mode,
        if (!test_bit(MD_RECOVERY_FROZEN, &mddev->recovery)) {
                did_freeze = 1;
                set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
-               md_wakeup_thread(mddev->thread);
        }
 
        stop_sync_thread(mddev, true, false);
 
-       mutex_lock(&mddev->open_mutex);
-       if ((mddev->pers && atomic_read(&mddev->openers) > !!bdev) ||
-           mddev->sysfs_active ||
-           mddev->sync_thread ||
+       if (mddev->sysfs_active ||
            test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
                pr_warn("md: %s still in use.\n",mdname(mddev));
-               mutex_unlock(&mddev->open_mutex);
                if (did_freeze) {
                        clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
                        set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
-                       md_wakeup_thread(mddev->thread);
                }
                return -EBUSY;
        }
@@ -6487,13 +6577,11 @@ static int do_md_stop(struct mddev *mddev, int mode,
                                sysfs_unlink_rdev(mddev, rdev);
 
                set_capacity_and_notify(disk, 0);
-               mutex_unlock(&mddev->open_mutex);
                mddev->changed = 1;
 
                if (!md_is_rdwr(mddev))
                        mddev->ro = MD_RDWR;
-       } else
-               mutex_unlock(&mddev->open_mutex);
+       }
        /*
         * Free resources if final stop
         */
@@ -6539,7 +6627,7 @@ static void autorun_array(struct mddev *mddev)
        err = do_md_run(mddev);
        if (err) {
                pr_warn("md: do_md_run() returned %d\n", err);
-               do_md_stop(mddev, 0, NULL);
+               do_md_stop(mddev, 0);
        }
 }
 
@@ -7009,9 +7097,7 @@ kick_rdev:
 
        md_kick_rdev_from_array(rdev);
        set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags);
-       if (mddev->thread)
-               md_wakeup_thread(mddev->thread);
-       else
+       if (!mddev->thread)
                md_update_sb(mddev, 1);
        md_new_event();
 
@@ -7086,14 +7172,13 @@ static int hot_add_disk(struct mddev *mddev, dev_t dev)
        if (!bdev_nowait(rdev->bdev)) {
                pr_info("%s: Disabling nowait because %pg does not support nowait\n",
                        mdname(mddev), rdev->bdev);
-               blk_queue_flag_clear(QUEUE_FLAG_NOWAIT, mddev->queue);
+               blk_queue_flag_clear(QUEUE_FLAG_NOWAIT, mddev->gendisk->queue);
        }
        /*
         * Kick recovery, maybe this spare has to be added to the
         * array immediately.
         */
        set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
-       md_wakeup_thread(mddev->thread);
        md_new_event();
        return 0;
 
@@ -7307,8 +7392,7 @@ static int update_size(struct mddev *mddev, sector_t num_sectors)
         * of each device.  If num_sectors is zero, we find the largest size
         * that fits.
         */
-       if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
-           mddev->sync_thread)
+       if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
                return -EBUSY;
        if (!md_is_rdwr(mddev))
                return -EROFS;
@@ -7325,10 +7409,9 @@ static int update_size(struct mddev *mddev, sector_t num_sectors)
        if (!rv) {
                if (mddev_is_clustered(mddev))
                        md_cluster_ops->update_size(mddev, old_dev_sectors);
-               else if (mddev->queue) {
+               else if (!mddev_is_dm(mddev))
                        set_capacity_and_notify(mddev->gendisk,
                                                mddev->array_sectors);
-               }
        }
        return rv;
 }
@@ -7345,8 +7428,7 @@ static int update_raid_disks(struct mddev *mddev, int raid_disks)
        if (raid_disks <= 0 ||
            (mddev->max_disks && raid_disks >= mddev->max_disks))
                return -EINVAL;
-       if (mddev->sync_thread ||
-           test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
+       if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
            test_bit(MD_RESYNCING_REMOTE, &mddev->recovery) ||
            mddev->reshape_position != MaxSector)
                return -EBUSY;
@@ -7542,16 +7624,17 @@ static int md_getgeo(struct block_device *bdev, struct hd_geometry *geo)
        return 0;
 }
 
-static inline bool md_ioctl_valid(unsigned int cmd)
+static inline int md_ioctl_valid(unsigned int cmd)
 {
        switch (cmd) {
-       case ADD_NEW_DISK:
        case GET_ARRAY_INFO:
-       case GET_BITMAP_FILE:
        case GET_DISK_INFO:
+       case RAID_VERSION:
+               return 0;
+       case ADD_NEW_DISK:
+       case GET_BITMAP_FILE:
        case HOT_ADD_DISK:
        case HOT_REMOVE_DISK:
-       case RAID_VERSION:
        case RESTART_ARRAY_RW:
        case RUN_ARRAY:
        case SET_ARRAY_INFO:
@@ -7560,9 +7643,11 @@ static inline bool md_ioctl_valid(unsigned int cmd)
        case STOP_ARRAY:
        case STOP_ARRAY_RO:
        case CLUSTERED_DISK_NACK:
-               return true;
+               if (!capable(CAP_SYS_ADMIN))
+                       return -EACCES;
+               return 0;
        default:
-               return false;
+               return -ENOTTY;
        }
 }
 
@@ -7620,31 +7705,17 @@ static int md_ioctl(struct block_device *bdev, blk_mode_t mode,
        int err = 0;
        void __user *argp = (void __user *)arg;
        struct mddev *mddev = NULL;
-       bool did_set_md_closing = false;
-
-       if (!md_ioctl_valid(cmd))
-               return -ENOTTY;
 
-       switch (cmd) {
-       case RAID_VERSION:
-       case GET_ARRAY_INFO:
-       case GET_DISK_INFO:
-               break;
-       default:
-               if (!capable(CAP_SYS_ADMIN))
-                       return -EACCES;
-       }
+       err = md_ioctl_valid(cmd);
+       if (err)
+               return err;
 
        /*
         * Commands dealing with the RAID driver but not any
         * particular array:
         */
-       switch (cmd) {
-       case RAID_VERSION:
-               err = get_version(argp);
-               goto out;
-       default:;
-       }
+       if (cmd == RAID_VERSION)
+               return get_version(argp);
 
        /*
         * Commands creating/starting a new array:
@@ -7652,35 +7723,23 @@ static int md_ioctl(struct block_device *bdev, blk_mode_t mode,
 
        mddev = bdev->bd_disk->private_data;
 
-       if (!mddev) {
-               BUG();
-               goto out;
-       }
-
        /* Some actions do not require the mutex */
        switch (cmd) {
        case GET_ARRAY_INFO:
                if (!mddev->raid_disks && !mddev->external)
-                       err = -ENODEV;
-               else
-                       err = get_array_info(mddev, argp);
-               goto out;
+                       return -ENODEV;
+               return get_array_info(mddev, argp);
 
        case GET_DISK_INFO:
                if (!mddev->raid_disks && !mddev->external)
-                       err = -ENODEV;
-               else
-                       err = get_disk_info(mddev, argp);
-               goto out;
+                       return -ENODEV;
+               return get_disk_info(mddev, argp);
 
        case SET_DISK_FAULTY:
-               err = set_disk_faulty(mddev, new_decode_dev(arg));
-               goto out;
+               return set_disk_faulty(mddev, new_decode_dev(arg));
 
        case GET_BITMAP_FILE:
-               err = get_bitmap_file(mddev, argp);
-               goto out;
-
+               return get_bitmap_file(mddev, argp);
        }
 
        if (cmd == HOT_REMOVE_DISK)
@@ -7693,20 +7752,9 @@ static int md_ioctl(struct block_device *bdev, blk_mode_t mode,
                /* Need to flush page cache, and ensure no-one else opens
                 * and writes
                 */
-               mutex_lock(&mddev->open_mutex);
-               if (mddev->pers && atomic_read(&mddev->openers) > 1) {
-                       mutex_unlock(&mddev->open_mutex);
-                       err = -EBUSY;
-                       goto out;
-               }
-               if (test_and_set_bit(MD_CLOSING, &mddev->flags)) {
-                       mutex_unlock(&mddev->open_mutex);
-                       err = -EBUSY;
-                       goto out;
-               }
-               did_set_md_closing = true;
-               mutex_unlock(&mddev->open_mutex);
-               sync_blockdev(bdev);
+               err = mddev_set_closing_and_sync_blockdev(mddev, 1);
+               if (err)
+                       return err;
        }
 
        if (!md_is_rdwr(mddev))
@@ -7747,11 +7795,12 @@ static int md_ioctl(struct block_device *bdev, blk_mode_t mode,
                goto unlock;
 
        case STOP_ARRAY:
-               err = do_md_stop(mddev, 0, bdev);
+               err = do_md_stop(mddev, 0);
                goto unlock;
 
        case STOP_ARRAY_RO:
-               err = md_set_readonly(mddev, bdev);
+               if (mddev->pers)
+                       err = md_set_readonly(mddev);
                goto unlock;
 
        case HOT_REMOVE_DISK:
@@ -7846,7 +7895,7 @@ unlock:
                                     mddev_unlock(mddev);
 
 out:
-       if(did_set_md_closing)
+       if (cmd == STOP_ARRAY_RO || (err && cmd == STOP_ARRAY))
                clear_bit(MD_CLOSING, &mddev->flags);
        return err;
 }
@@ -8683,10 +8732,7 @@ void md_submit_discard_bio(struct mddev *mddev, struct md_rdev *rdev,
 
        bio_chain(discard_bio, bio);
        bio_clone_blkg_association(discard_bio, bio);
-       if (mddev->gendisk)
-               trace_block_bio_remap(discard_bio,
-                               disk_devt(mddev->gendisk),
-                               bio->bi_iter.bi_sector);
+       mddev_trace_remap(mddev, discard_bio, bio->bi_iter.bi_sector);
        submit_bio_noacct(discard_bio);
 }
 EXPORT_SYMBOL_GPL(md_submit_discard_bio);
@@ -8733,6 +8779,23 @@ void md_account_bio(struct mddev *mddev, struct bio **bio)
 }
 EXPORT_SYMBOL_GPL(md_account_bio);
 
+void md_free_cloned_bio(struct bio *bio)
+{
+       struct md_io_clone *md_io_clone = bio->bi_private;
+       struct bio *orig_bio = md_io_clone->orig_bio;
+       struct mddev *mddev = md_io_clone->mddev;
+
+       if (bio->bi_status && !orig_bio->bi_status)
+               orig_bio->bi_status = bio->bi_status;
+
+       if (md_io_clone->start_time)
+               bio_end_io_acct(orig_bio, md_io_clone->start_time);
+
+       bio_put(bio);
+       percpu_ref_put(&mddev->active_io);
+}
+EXPORT_SYMBOL_GPL(md_free_cloned_bio);
+
 /* md_allow_write(mddev)
  * Calling this ensures that the array is marked 'active' so that writes
  * may proceed without blocking.  It is important to call this before
@@ -8788,12 +8851,16 @@ void md_do_sync(struct md_thread *thread)
        int ret;
 
        /* just in case the thread restarts... */
-       if (test_bit(MD_RECOVERY_DONE, &mddev->recovery) ||
-           test_bit(MD_RECOVERY_WAIT, &mddev->recovery))
+       if (test_bit(MD_RECOVERY_DONE, &mddev->recovery))
                return;
-       if (!md_is_rdwr(mddev)) {/* never try to sync a read-only array */
+
+       if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
+               goto skip;
+
+       if (test_bit(MD_RECOVERY_WAIT, &mddev->recovery) ||
+           !md_is_rdwr(mddev)) {/* never try to sync a read-only array */
                set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-               return;
+               goto skip;
        }
 
        if (mddev_is_clustered(mddev)) {
@@ -9162,7 +9229,7 @@ void md_do_sync(struct md_thread *thread)
                        mddev->delta_disks > 0 &&
                        mddev->pers->finish_reshape &&
                        mddev->pers->size &&
-                       mddev->queue) {
+                       !mddev_is_dm(mddev)) {
                mddev_lock_nointr(mddev);
                md_set_array_sectors(mddev, mddev->pers->size(mddev, 0, 0));
                mddev_unlock(mddev);
@@ -9262,9 +9329,14 @@ static bool md_spares_need_change(struct mddev *mddev)
 {
        struct md_rdev *rdev;
 
-       rdev_for_each(rdev, mddev)
-               if (rdev_removeable(rdev) || rdev_addable(rdev))
+       rcu_read_lock();
+       rdev_for_each_rcu(rdev, mddev) {
+               if (rdev_removeable(rdev) || rdev_addable(rdev)) {
+                       rcu_read_unlock();
                        return true;
+               }
+       }
+       rcu_read_unlock();
        return false;
 }
 
@@ -9368,13 +9440,19 @@ static void md_start_sync(struct work_struct *ws)
        struct mddev *mddev = container_of(ws, struct mddev, sync_work);
        int spares = 0;
        bool suspend = false;
+       char *name;
 
-       if (md_spares_need_change(mddev))
+       /*
+        * If reshape is still in progress, spares won't be added or removed
+        * from conf until reshape is done.
+        */
+       if (mddev->reshape_position == MaxSector &&
+           md_spares_need_change(mddev)) {
                suspend = true;
+               mddev_suspend(mddev, false);
+       }
 
-       suspend ? mddev_suspend_and_lock_nointr(mddev) :
-                 mddev_lock_nointr(mddev);
-
+       mddev_lock_nointr(mddev);
        if (!md_is_rdwr(mddev)) {
                /*
                 * On a read-only array we can:
@@ -9400,8 +9478,10 @@ static void md_start_sync(struct work_struct *ws)
        if (spares)
                md_bitmap_write_all(mddev->bitmap);
 
+       name = test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) ?
+                       "reshape" : "resync";
        rcu_assign_pointer(mddev->sync_thread,
-                          md_register_thread(md_do_sync, mddev, "resync"));
+                          md_register_thread(md_do_sync, mddev, name));
        if (!mddev->sync_thread) {
                pr_warn("%s: could not start resync thread...\n",
                        mdname(mddev));
@@ -9445,6 +9525,20 @@ not_running:
                sysfs_notify_dirent_safe(mddev->sysfs_action);
 }
 
+static void unregister_sync_thread(struct mddev *mddev)
+{
+       if (!test_bit(MD_RECOVERY_DONE, &mddev->recovery)) {
+               /* resync/recovery still happening */
+               clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
+               return;
+       }
+
+       if (WARN_ON_ONCE(!mddev->sync_thread))
+               return;
+
+       md_reap_sync_thread(mddev);
+}
+
 /*
  * This routine is regularly called by all per-raid-array threads to
  * deal with generic issues like resync and super-block update.
@@ -9469,9 +9563,6 @@ not_running:
  */
 void md_check_recovery(struct mddev *mddev)
 {
-       if (READ_ONCE(mddev->suspended))
-               return;
-
        if (mddev->bitmap)
                md_bitmap_daemon_work(mddev);
 
@@ -9485,7 +9576,8 @@ void md_check_recovery(struct mddev *mddev)
        }
 
        if (!md_is_rdwr(mddev) &&
-           !test_bit(MD_RECOVERY_NEEDED, &mddev->recovery))
+           !test_bit(MD_RECOVERY_NEEDED, &mddev->recovery) &&
+           !test_bit(MD_RECOVERY_DONE, &mddev->recovery))
                return;
        if ( ! (
                (mddev->sb_flags & ~ (1<<MD_SB_CHANGE_PENDING)) ||
@@ -9507,8 +9599,7 @@ void md_check_recovery(struct mddev *mddev)
                        struct md_rdev *rdev;
 
                        if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
-                               /* sync_work already queued. */
-                               clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
+                               unregister_sync_thread(mddev);
                                goto unlock;
                        }
 
@@ -9571,16 +9662,7 @@ void md_check_recovery(struct mddev *mddev)
                 * still set.
                 */
                if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
-                       if (!test_bit(MD_RECOVERY_DONE, &mddev->recovery)) {
-                               /* resync/recovery still happening */
-                               clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
-                               goto unlock;
-                       }
-
-                       if (WARN_ON_ONCE(!mddev->sync_thread))
-                               goto unlock;
-
-                       md_reap_sync_thread(mddev);
+                       unregister_sync_thread(mddev);
                        goto unlock;
                }
 
index 8d881cc597992f1a2af08cb05d07081182860ebd..097d9dbd69b8363df9ba69ed0d89cab05577151b 100644 (file)
@@ -18,6 +18,7 @@
 #include <linux/timer.h>
 #include <linux/wait.h>
 #include <linux/workqueue.h>
+#include <trace/events/block.h>
 #include "md-cluster.h"
 
 #define MaxSector (~(sector_t)0)
@@ -59,7 +60,7 @@ struct md_rdev {
         */
        struct block_device *meta_bdev;
        struct block_device *bdev;      /* block device handle */
-       struct bdev_handle *bdev_handle;        /* Handle from open for bdev */
+       struct file *bdev_file;         /* file handle from opening the bdev */
 
        struct page     *sb_page, *bb_page;
        int             sb_loaded;
@@ -207,6 +208,7 @@ enum flag_bits {
                                 * check if there is collision between raid1
                                 * serial bios.
                                 */
+       Nonrot,                 /* non-rotational device (SSD) */
 };
 
 static inline int is_badblock(struct md_rdev *rdev, sector_t s, int sectors,
@@ -222,6 +224,16 @@ static inline int is_badblock(struct md_rdev *rdev, sector_t s, int sectors,
        }
        return 0;
 }
+
+static inline int rdev_has_badblock(struct md_rdev *rdev, sector_t s,
+                                   int sectors)
+{
+       sector_t first_bad;
+       int bad_sectors;
+
+       return is_badblock(rdev, s, sectors, &first_bad, &bad_sectors);
+}
+
 extern int rdev_set_badblocks(struct md_rdev *rdev, sector_t s, int sectors,
                              int is_new);
 extern int rdev_clear_badblocks(struct md_rdev *rdev, sector_t s, int sectors,
@@ -468,7 +480,6 @@ struct mddev {
        struct timer_list               safemode_timer;
        struct percpu_ref               writes_pending;
        int                             sync_checkers;  /* # of threads checking writes_pending */
-       struct request_queue            *queue; /* for plugging ... */
 
        struct bitmap                   *bitmap; /* the bitmap for the device */
        struct {
@@ -558,6 +569,37 @@ enum recovery_flags {
        MD_RESYNCING_REMOTE,    /* remote node is running resync thread */
 };
 
+enum md_ro_state {
+       MD_RDWR,
+       MD_RDONLY,
+       MD_AUTO_READ,
+       MD_MAX_STATE
+};
+
+static inline bool md_is_rdwr(struct mddev *mddev)
+{
+       return (mddev->ro == MD_RDWR);
+}
+
+static inline bool reshape_interrupted(struct mddev *mddev)
+{
+       /* reshape never started */
+       if (mddev->reshape_position == MaxSector)
+               return false;
+
+       /* interrupted */
+       if (!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
+               return true;
+
+       /* a running reshape will be interrupted soon. */
+       if (test_bit(MD_RECOVERY_WAIT, &mddev->recovery) ||
+           test_bit(MD_RECOVERY_INTR, &mddev->recovery) ||
+           test_bit(MD_RECOVERY_FROZEN, &mddev->recovery))
+               return true;
+
+       return false;
+}
+
 static inline int __must_check mddev_lock(struct mddev *mddev)
 {
        return mutex_lock_interruptible(&mddev->reconfig_mutex);
@@ -617,6 +659,7 @@ struct md_personality
        int (*start_reshape) (struct mddev *mddev);
        void (*finish_reshape) (struct mddev *mddev);
        void (*update_reshape_pos) (struct mddev *mddev);
+       void (*prepare_suspend) (struct mddev *mddev);
        /* quiesce suspends or resumes internal processing.
         * 1 - stop new actions and wait for action io to complete
         * 0 - return to normal behaviour
@@ -750,6 +793,7 @@ extern void md_finish_reshape(struct mddev *mddev);
 void md_submit_discard_bio(struct mddev *mddev, struct md_rdev *rdev,
                        struct bio *bio, sector_t start, sector_t size);
 void md_account_bio(struct mddev *mddev, struct bio **bio);
+void md_free_cloned_bio(struct bio *bio);
 
 extern bool __must_check md_flush_request(struct mddev *mddev, struct bio *bio);
 extern void md_super_write(struct mddev *mddev, struct md_rdev *rdev,
@@ -778,9 +822,12 @@ extern void md_stop_writes(struct mddev *mddev);
 extern int md_rdev_init(struct md_rdev *rdev);
 extern void md_rdev_clear(struct md_rdev *rdev);
 
-extern void md_handle_request(struct mddev *mddev, struct bio *bio);
+extern bool md_handle_request(struct mddev *mddev, struct bio *bio);
 extern int mddev_suspend(struct mddev *mddev, bool interruptible);
 extern void mddev_resume(struct mddev *mddev);
+extern void md_idle_sync_thread(struct mddev *mddev);
+extern void md_frozen_sync_thread(struct mddev *mddev);
+extern void md_unfrozen_sync_thread(struct mddev *mddev);
 
 extern void md_reload_sb(struct mddev *mddev, int raid_disk);
 extern void md_update_sb(struct mddev *mddev, int force);
@@ -821,7 +868,7 @@ static inline void mddev_check_write_zeroes(struct mddev *mddev, struct bio *bio
 {
        if (bio_op(bio) == REQ_OP_WRITE_ZEROES &&
            !bio->bi_bdev->bd_disk->queue->limits.max_write_zeroes_sectors)
-               mddev->queue->limits.max_write_zeroes_sectors = 0;
+               mddev->gendisk->queue->limits.max_write_zeroes_sectors = 0;
 }
 
 static inline int mddev_suspend_and_lock(struct mddev *mddev)
@@ -860,7 +907,31 @@ void md_autostart_arrays(int part);
 int md_set_array_info(struct mddev *mddev, struct mdu_array_info_s *info);
 int md_add_new_disk(struct mddev *mddev, struct mdu_disk_info_s *info);
 int do_md_run(struct mddev *mddev);
+void mddev_stack_rdev_limits(struct mddev *mddev, struct queue_limits *lim);
+int mddev_stack_new_rdev(struct mddev *mddev, struct md_rdev *rdev);
+void mddev_update_io_opt(struct mddev *mddev, unsigned int nr_stripes);
 
 extern const struct block_device_operations md_fops;
 
+/*
+ * MD devices can be used underneath DM, in which case ->gendisk is NULL.
+ */
+static inline bool mddev_is_dm(struct mddev *mddev)
+{
+       return !mddev->gendisk;
+}
+
+static inline void mddev_trace_remap(struct mddev *mddev, struct bio *bio,
+               sector_t sector)
+{
+       if (!mddev_is_dm(mddev))
+               trace_block_bio_remap(bio, disk_devt(mddev->gendisk), sector);
+}
+
+#define mddev_add_trace_msg(mddev, fmt, args...)                       \
+do {                                                                   \
+       if (!mddev_is_dm(mddev))                                        \
+               blk_add_trace_msg((mddev)->gendisk->queue, fmt, ##args); \
+} while (0)
+
 #endif /* _MD_MD_H */
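Taken together, mddev_is_dm() and the two trace helpers above let the RAID
personalities drop their open-coded ->queue/->gendisk checks. A sketch of
the resulting call pattern, kernel context assumed; example_submit() is a
made-up caller, not part of the patch:

static void example_submit(struct mddev *mddev, struct bio *bio,
                           sector_t sector)
{
        /* both helpers degrade to no-ops when md runs underneath DM */
        mddev_trace_remap(mddev, bio, sector);
        mddev_add_trace_msg(mddev, "md example remap to sector %llu",
                            (unsigned long long)sector);
        submit_bio_noacct(bio);
}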
index c50a7abda744ad13262378a83fec2dbde0e00b9a..c5d4aeb68404c9cbd3f79797672013f965b45bbb 100644 (file)
@@ -379,6 +379,19 @@ static void raid0_free(struct mddev *mddev, void *priv)
        free_conf(mddev, conf);
 }
 
+static int raid0_set_limits(struct mddev *mddev)
+{
+       struct queue_limits lim;
+
+       blk_set_stacking_limits(&lim);
+       lim.max_hw_sectors = mddev->chunk_sectors;
+       lim.max_write_zeroes_sectors = mddev->chunk_sectors;
+       lim.io_min = mddev->chunk_sectors << 9;
+       lim.io_opt = lim.io_min * mddev->raid_disks;
+       mddev_stack_rdev_limits(mddev, &lim);
+       return queue_limits_set(mddev->gendisk->queue, &lim);
+}
+
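The io_min/io_opt arithmetic in raid0_set_limits() is easy to sanity-check
in userspace; the chunk size and disk count below are made up for
illustration:

#include <stdio.h>

int main(void)
{
        unsigned int chunk_sectors = 128;       /* 64 KiB chunks, 512 B sectors */
        unsigned int raid_disks = 4;

        unsigned int io_min = chunk_sectors << 9;       /* 65536 bytes */
        unsigned int io_opt = io_min * raid_disks;      /* 262144: one stripe */

        printf("io_min=%u io_opt=%u\n", io_min, io_opt);
        return 0;
}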
 static int raid0_run(struct mddev *mddev)
 {
        struct r0conf *conf;
@@ -399,20 +412,10 @@ static int raid0_run(struct mddev *mddev)
                mddev->private = conf;
        }
        conf = mddev->private;
-       if (mddev->queue) {
-               struct md_rdev *rdev;
-
-               blk_queue_max_hw_sectors(mddev->queue, mddev->chunk_sectors);
-               blk_queue_max_write_zeroes_sectors(mddev->queue, mddev->chunk_sectors);
-
-               blk_queue_io_min(mddev->queue, mddev->chunk_sectors << 9);
-               blk_queue_io_opt(mddev->queue,
-                                (mddev->chunk_sectors << 9) * mddev->raid_disks);
-
-               rdev_for_each(rdev, mddev) {
-                       disk_stack_limits(mddev->gendisk, rdev->bdev,
-                                         rdev->data_offset << 9);
-               }
+       if (!mddev_is_dm(mddev)) {
+               ret = raid0_set_limits(mddev);
+               if (ret)
+                       goto out_free_conf;
        }
 
        /* calculate array device size */
@@ -426,8 +429,10 @@ static int raid0_run(struct mddev *mddev)
 
        ret = md_integrity_register(mddev);
        if (ret)
-               free_conf(mddev, conf);
-
+               goto out_free_conf;
+       return 0;
+out_free_conf:
+       free_conf(mddev, conf);
        return ret;
 }
 
@@ -578,10 +583,7 @@ static void raid0_map_submit_bio(struct mddev *mddev, struct bio *bio)
        bio_set_dev(bio, tmp_dev->bdev);
        bio->bi_iter.bi_sector = sector + zone->dev_start +
                tmp_dev->data_offset;
-
-       if (mddev->gendisk)
-               trace_block_bio_remap(bio, disk_devt(mddev->gendisk),
-                                     bio_sector);
+       mddev_trace_remap(mddev, bio, bio_sector);
        mddev_check_write_zeroes(mddev, bio);
        submit_bio_noacct(bio);
 }
index 512746551f36a754d9e655f220f260b6896dcd10..2ea1710a3b705e9c68d7ed4a05fcd0bc16e32e2b 100644 (file)
@@ -227,3 +227,72 @@ static inline bool exceed_read_errors(struct mddev *mddev, struct md_rdev *rdev)
 
        return false;
 }
+
+/**
+ * raid1_check_read_range() - check a given read range for bad blocks and
+ * return the length that can actually be read;
+ * @rdev: the rdev to read;
+ * @this_sector: read position;
+ * @len: read length;
+ *
+ * helper function for read_balance()
+ *
+ * 1) If there are no bad blocks in the range, @len is returned;
+ * 2) If the range is entirely bad blocks, 0 is returned;
+ * 3) If there are partial bad blocks:
+ *  - If the bad block range starts after @this_sector, the length of the
+ *  first good region is returned;
+ *  - If the bad block range starts before or at @this_sector, 0 is returned
+ *  and @len is updated to the number of sectors before we reach the good
+ *  blocks;
+ */
+static inline int raid1_check_read_range(struct md_rdev *rdev,
+                                        sector_t this_sector, int *len)
+{
+       sector_t first_bad;
+       int bad_sectors;
+
+       /* no bad block overlap */
+       if (!is_badblock(rdev, this_sector, *len, &first_bad, &bad_sectors))
+               return *len;
+
+       /*
+        * the bad block range starts after the start of our range, so we can
+        * return the number of sectors before the bad blocks start.
+        */
+       if (first_bad > this_sector)
+               return first_bad - this_sector;
+
+       /* read range is fully consumed by bad blocks. */
+       if (this_sector + *len <= first_bad + bad_sectors)
+               return 0;
+
+       /*
+        * final case: the bad block range starts before or at the start of
+        * our range but does not cover it entirely, so we still return 0 and
+        * update the length with the number of sectors before we get to the
+        * good ones.
+        */
+       *len = first_bad + bad_sectors - this_sector;
+       return 0;
+}
+
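The case analysis above can be exercised with a standalone model;
is_badblock_model() and the single synthetic bad-block range (sectors
100-119) are invented stand-ins for the rdev's badblocks list:

#include <stdio.h>

typedef unsigned long long sector_t;

static const sector_t bb_start = 100, bb_len = 20; /* one synthetic bad range */

static int is_badblock_model(sector_t s, int sectors,
                             sector_t *first_bad, int *bad_sectors)
{
        if (s + sectors <= bb_start || s >= bb_start + bb_len)
                return 0;
        *first_bad = bb_start;
        *bad_sectors = (int)bb_len;
        return 1;
}

static int check_read_range_model(sector_t this_sector, int *len)
{
        sector_t first_bad;
        int bad_sectors;

        if (!is_badblock_model(this_sector, *len, &first_bad, &bad_sectors))
                return *len;                            /* case 1: all good */
        if (first_bad > this_sector)
                return (int)(first_bad - this_sector);  /* leading good region */
        if (this_sector + *len <= first_bad + bad_sectors)
                return 0;                               /* case 2: fully bad */
        *len = (int)(first_bad + bad_sectors - this_sector);
        return 0;                                       /* skip to the good tail */
}

int main(void)
{
        int len;

        len = 64; printf("%d\n", check_read_range_model(0, &len));   /* 64 */
        len = 64; printf("%d\n", check_read_range_model(90, &len));  /* 10 */
        len = 10; printf("%d\n", check_read_range_model(105, &len)); /* 0 */
        len = 64; printf("%d\n", check_read_range_model(110, &len)); /* 0, len now 10 */
        return 0;
}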
+/*
+ * Check if read should choose the first rdev.
+ *
+ * Balance on the whole device if no resync is going on (recovery is ok) or
+ * below the resync window. Otherwise, take the first readable disk.
+ */
+static inline bool raid1_should_read_first(struct mddev *mddev,
+                                          sector_t this_sector, int len)
+{
+       if ((mddev->recovery_cp < this_sector + len))
+               return true;
+
+       if (mddev_is_clustered(mddev) &&
+           md_cluster_ops->area_resyncing(mddev, READ, this_sector,
+                                          this_sector + len))
+               return true;
+
+       return false;
+}
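A minimal model of the non-clustered half of this check (the clustered
area_resyncing() case is omitted); should_read_first_model() is an
illustrative stand-in, not the kernel function:

#include <stdbool.h>
#include <stdio.h>

typedef unsigned long long sector_t;
#define MaxSector (~(sector_t)0)

static bool should_read_first_model(sector_t recovery_cp,
                                    sector_t this_sector, int len)
{
        /* read extends past the resync checkpoint: don't balance */
        return recovery_cp < this_sector + len;
}

int main(void)
{
        /* fully resynced array: recovery_cp == MaxSector, always balance */
        printf("%d\n", should_read_first_model(MaxSector, 1000, 8)); /* 0 */
        /* read crosses the resync checkpoint: take the first readable disk */
        printf("%d\n", should_read_first_model(500, 1000, 8));       /* 1 */
        return 0;
}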
index 24f0d799fd98ed318f2f1d2fc7b682d5ebf77e4c..be8ac24f50b6ad651fd107f9af9a448bb1f7780a 100644 (file)
@@ -46,9 +46,6 @@
 static void allow_barrier(struct r1conf *conf, sector_t sector_nr);
 static void lower_barrier(struct r1conf *conf, sector_t sector_nr);
 
-#define raid1_log(md, fmt, args...)                            \
-       do { if ((md)->queue) blk_add_trace_msg((md)->queue, "raid1 " fmt, ##args); } while (0)
-
 #define RAID_1_10_NAME "raid1"
 #include "raid1-10.c"
 
@@ -498,9 +495,6 @@ static void raid1_end_write_request(struct bio *bio)
                 * to user-side. So if something waits for IO, then it
                 * will wait for the 'master' bio.
                 */
-               sector_t first_bad;
-               int bad_sectors;
-
                r1_bio->bios[mirror] = NULL;
                to_put = bio;
                /*
@@ -516,8 +510,8 @@ static void raid1_end_write_request(struct bio *bio)
                        set_bit(R1BIO_Uptodate, &r1_bio->state);
 
                /* Maybe we can clear some bad blocks. */
-               if (is_badblock(rdev, r1_bio->sector, r1_bio->sectors,
-                               &first_bad, &bad_sectors) && !discard_error) {
+               if (rdev_has_badblock(rdev, r1_bio->sector, r1_bio->sectors) &&
+                   !discard_error) {
                        r1_bio->bios[mirror] = IO_MADE_GOOD;
                        set_bit(R1BIO_MadeGood, &r1_bio->state);
                }
@@ -582,211 +576,312 @@ static sector_t align_to_barrier_unit_end(sector_t start_sector,
        return len;
 }
 
-/*
- * This routine returns the disk from which the requested read should
- * be done. There is a per-array 'next expected sequential IO' sector
- * number - if this matches on the next IO then we use the last disk.
- * There is also a per-disk 'last know head position' sector that is
- * maintained from IRQ contexts, both the normal and the resync IO
- * completion handlers update this position correctly. If there is no
- * perfect sequential match then we pick the disk whose head is closest.
- *
- * If there are 2 mirrors in the same 2 devices, performance degrades
- * because position is mirror, not device based.
- *
- * The rdev for the device selected will have nr_pending incremented.
- */
-static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sectors)
+static void update_read_sectors(struct r1conf *conf, int disk,
+                               sector_t this_sector, int len)
 {
-       const sector_t this_sector = r1_bio->sector;
-       int sectors;
-       int best_good_sectors;
-       int best_disk, best_dist_disk, best_pending_disk;
-       int has_nonrot_disk;
+       struct raid1_info *info = &conf->mirrors[disk];
+
+       atomic_inc(&info->rdev->nr_pending);
+       if (info->next_seq_sect != this_sector)
+               info->seq_start = this_sector;
+       info->next_seq_sect = this_sector + len;
+}
+
+static int choose_first_rdev(struct r1conf *conf, struct r1bio *r1_bio,
+                            int *max_sectors)
+{
+       sector_t this_sector = r1_bio->sector;
+       int len = r1_bio->sectors;
        int disk;
-       sector_t best_dist;
-       unsigned int min_pending;
-       struct md_rdev *rdev;
-       int choose_first;
-       int choose_next_idle;
 
-       /*
-        * Check if we can balance. We can balance on the whole
-        * device if no resync is going on, or below the resync window.
-        * We take the first readable disk when above the resync window.
-        */
- retry:
-       sectors = r1_bio->sectors;
-       best_disk = -1;
-       best_dist_disk = -1;
-       best_dist = MaxSector;
-       best_pending_disk = -1;
-       min_pending = UINT_MAX;
-       best_good_sectors = 0;
-       has_nonrot_disk = 0;
-       choose_next_idle = 0;
-       clear_bit(R1BIO_FailFast, &r1_bio->state);
+       for (disk = 0 ; disk < conf->raid_disks * 2 ; disk++) {
+               struct md_rdev *rdev;
+               int read_len;
 
-       if ((conf->mddev->recovery_cp < this_sector + sectors) ||
-           (mddev_is_clustered(conf->mddev) &&
-           md_cluster_ops->area_resyncing(conf->mddev, READ, this_sector,
-                   this_sector + sectors)))
-               choose_first = 1;
-       else
-               choose_first = 0;
+               if (r1_bio->bios[disk] == IO_BLOCKED)
+                       continue;
+
+               rdev = conf->mirrors[disk].rdev;
+               if (!rdev || test_bit(Faulty, &rdev->flags))
+                       continue;
+
+               /* choose the first disk even if it has some bad blocks. */
+               read_len = raid1_check_read_range(rdev, this_sector, &len);
+               if (read_len > 0) {
+                       update_read_sectors(conf, disk, this_sector, read_len);
+                       *max_sectors = read_len;
+                       return disk;
+               }
+       }
+
+       return -1;
+}
+
+static int choose_bb_rdev(struct r1conf *conf, struct r1bio *r1_bio,
+                         int *max_sectors)
+{
+       sector_t this_sector = r1_bio->sector;
+       int best_disk = -1;
+       int best_len = 0;
+       int disk;
 
        for (disk = 0 ; disk < conf->raid_disks * 2 ; disk++) {
-               sector_t dist;
-               sector_t first_bad;
-               int bad_sectors;
-               unsigned int pending;
-               bool nonrot;
+               struct md_rdev *rdev;
+               int len;
+               int read_len;
+
+               if (r1_bio->bios[disk] == IO_BLOCKED)
+                       continue;
 
                rdev = conf->mirrors[disk].rdev;
-               if (r1_bio->bios[disk] == IO_BLOCKED
-                   || rdev == NULL
-                   || test_bit(Faulty, &rdev->flags))
+               if (!rdev || test_bit(Faulty, &rdev->flags) ||
+                   test_bit(WriteMostly, &rdev->flags))
                        continue;
-               if (!test_bit(In_sync, &rdev->flags) &&
-                   rdev->recovery_offset < this_sector + sectors)
+
+               /* keep track of the disk with the most readable sectors. */
+               len = r1_bio->sectors;
+               read_len = raid1_check_read_range(rdev, this_sector, &len);
+               if (read_len > best_len) {
+                       best_disk = disk;
+                       best_len = read_len;
+               }
+       }
+
+       if (best_disk != -1) {
+               *max_sectors = best_len;
+               update_read_sectors(conf, best_disk, this_sector, best_len);
+       }
+
+       return best_disk;
+}
+
+static int choose_slow_rdev(struct r1conf *conf, struct r1bio *r1_bio,
+                           int *max_sectors)
+{
+       sector_t this_sector = r1_bio->sector;
+       int bb_disk = -1;
+       int bb_read_len = 0;
+       int disk;
+
+       for (disk = 0 ; disk < conf->raid_disks * 2 ; disk++) {
+               struct md_rdev *rdev;
+               int len;
+               int read_len;
+
+               if (r1_bio->bios[disk] == IO_BLOCKED)
                        continue;
-               if (test_bit(WriteMostly, &rdev->flags)) {
-                       /* Don't balance among write-mostly, just
-                        * use the first as a last resort */
-                       if (best_dist_disk < 0) {
-                               if (is_badblock(rdev, this_sector, sectors,
-                                               &first_bad, &bad_sectors)) {
-                                       if (first_bad <= this_sector)
-                                               /* Cannot use this */
-                                               continue;
-                                       best_good_sectors = first_bad - this_sector;
-                               } else
-                                       best_good_sectors = sectors;
-                               best_dist_disk = disk;
-                               best_pending_disk = disk;
-                       }
+
+               rdev = conf->mirrors[disk].rdev;
+               if (!rdev || test_bit(Faulty, &rdev->flags) ||
+                   !test_bit(WriteMostly, &rdev->flags))
                        continue;
+
+               /* if there are no bad blocks, we can use this disk */
+               len = r1_bio->sectors;
+               read_len = raid1_check_read_range(rdev, this_sector, &len);
+               if (read_len == r1_bio->sectors) {
+                       update_read_sectors(conf, disk, this_sector, read_len);
+                       return disk;
                }
-               /* This is a reasonable device to use.  It might
-                * even be best.
+
+               /*
+                * there are partial bad blocks, choose the rdev with the
+                * largest read length.
                 */
-               if (is_badblock(rdev, this_sector, sectors,
-                               &first_bad, &bad_sectors)) {
-                       if (best_dist < MaxSector)
-                               /* already have a better device */
-                               continue;
-                       if (first_bad <= this_sector) {
-                               /* cannot read here. If this is the 'primary'
-                                * device, then we must not read beyond
-                                * bad_sectors from another device..
-                                */
-                               bad_sectors -= (this_sector - first_bad);
-                               if (choose_first && sectors > bad_sectors)
-                                       sectors = bad_sectors;
-                               if (best_good_sectors > sectors)
-                                       best_good_sectors = sectors;
-
-                       } else {
-                               sector_t good_sectors = first_bad - this_sector;
-                               if (good_sectors > best_good_sectors) {
-                                       best_good_sectors = good_sectors;
-                                       best_disk = disk;
-                               }
-                               if (choose_first)
-                                       break;
-                       }
-                       continue;
-               } else {
-                       if ((sectors > best_good_sectors) && (best_disk >= 0))
-                               best_disk = -1;
-                       best_good_sectors = sectors;
+               if (read_len > bb_read_len) {
+                       bb_disk = disk;
+                       bb_read_len = read_len;
                }
+       }
+
+       if (bb_disk != -1) {
+               *max_sectors = bb_read_len;
+               update_read_sectors(conf, bb_disk, this_sector, bb_read_len);
+       }
+
+       return bb_disk;
+}
+
+static bool is_sequential(struct r1conf *conf, int disk, struct r1bio *r1_bio)
+{
+       /* TODO: address issues with this check and concurrency. */
+       return conf->mirrors[disk].next_seq_sect == r1_bio->sector ||
+              conf->mirrors[disk].head_position == r1_bio->sector;
+}
+
+/*
+ * If the buffered sequential IO size exceeds the optimal iosize, check if
+ * there is an idle disk. If yes, choose the idle disk.
+ */
+static bool should_choose_next(struct r1conf *conf, int disk)
+{
+       struct raid1_info *mirror = &conf->mirrors[disk];
+       int opt_iosize;
+
+       if (!test_bit(Nonrot, &mirror->rdev->flags))
+               return false;
+
+       opt_iosize = bdev_io_opt(mirror->rdev->bdev) >> 9;
+       return opt_iosize > 0 && mirror->seq_start != MaxSector &&
+              mirror->next_seq_sect > opt_iosize &&
+              mirror->next_seq_sect - opt_iosize >= mirror->seq_start;
+}
+
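The opt_iosize gate above only fires once a sequential run on a
non-rotational member has exceeded the optimal I/O size. A standalone
sketch; the mirror_model struct and the values in main() are invented:

#include <stdbool.h>
#include <stdio.h>

typedef unsigned long long sector_t;
#define MaxSector (~(sector_t)0)

struct mirror_model {
        bool nonrot;            /* Nonrot flag cached at add-disk time */
        sector_t seq_start;     /* start of the current sequential run */
        sector_t next_seq_sect; /* next expected sequential sector */
};

/* mirrors the kernel check: only consider switching away from an SSD
 * once the buffered sequential run exceeds the optimal I/O size */
static bool should_choose_next_model(const struct mirror_model *m,
                                     int opt_iosize)
{
        if (!m->nonrot)
                return false;
        return opt_iosize > 0 && m->seq_start != MaxSector &&
               m->next_seq_sect > (sector_t)opt_iosize &&
               m->next_seq_sect - opt_iosize >= m->seq_start;
}

int main(void)
{
        struct mirror_model m = {
                .nonrot = true,
                .seq_start = 1000,
                .next_seq_sect = 1100,
        };

        printf("%d\n", should_choose_next_model(&m, 256)); /* 0: run < opt */
        m.next_seq_sect = 1300;
        printf("%d\n", should_choose_next_model(&m, 256)); /* 1: run > opt */
        return 0;
}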
+static bool rdev_readable(struct md_rdev *rdev, struct r1bio *r1_bio)
+{
+       if (!rdev || test_bit(Faulty, &rdev->flags))
+               return false;
+
+       /* still in recovery */
+       if (!test_bit(In_sync, &rdev->flags) &&
+           rdev->recovery_offset < r1_bio->sector + r1_bio->sectors)
+               return false;
+
+       /* don't read from a slow disk unless we have to */
+       if (test_bit(WriteMostly, &rdev->flags))
+               return false;
+
+       /* don't split IO for bad blocks unless we have to */
+       if (rdev_has_badblock(rdev, r1_bio->sector, r1_bio->sectors))
+               return false;
+
+       return true;
+}
+
+struct read_balance_ctl {
+       sector_t closest_dist;
+       int closest_dist_disk;
+       int min_pending;
+       int min_pending_disk;
+       int sequential_disk;
+       int readable_disks;
+};
+
+static int choose_best_rdev(struct r1conf *conf, struct r1bio *r1_bio)
+{
+       int disk;
+       struct read_balance_ctl ctl = {
+               .closest_dist_disk      = -1,
+               .closest_dist           = MaxSector,
+               .min_pending_disk       = -1,
+               .min_pending            = UINT_MAX,
+               .sequential_disk        = -1,
+       };
+
+       for (disk = 0 ; disk < conf->raid_disks * 2 ; disk++) {
+               struct md_rdev *rdev;
+               sector_t dist;
+               unsigned int pending;
 
-               if (best_disk >= 0)
-                       /* At least two disks to choose from so failfast is OK */
+               if (r1_bio->bios[disk] == IO_BLOCKED)
+                       continue;
+
+               rdev = conf->mirrors[disk].rdev;
+               if (!rdev_readable(rdev, r1_bio))
+                       continue;
+
+               /* At least two disks to choose from so failfast is OK */
+               if (ctl.readable_disks++ == 1)
                        set_bit(R1BIO_FailFast, &r1_bio->state);
 
-               nonrot = bdev_nonrot(rdev->bdev);
-               has_nonrot_disk |= nonrot;
                pending = atomic_read(&rdev->nr_pending);
-               dist = abs(this_sector - conf->mirrors[disk].head_position);
-               if (choose_first) {
-                       best_disk = disk;
-                       break;
-               }
+               dist = abs(r1_bio->sector - conf->mirrors[disk].head_position);
+
                /* Don't change to another disk for sequential reads */
-               if (conf->mirrors[disk].next_seq_sect == this_sector
-                   || dist == 0) {
-                       int opt_iosize = bdev_io_opt(rdev->bdev) >> 9;
-                       struct raid1_info *mirror = &conf->mirrors[disk];
+               if (is_sequential(conf, disk, r1_bio)) {
+                       if (!should_choose_next(conf, disk))
+                               return disk;
 
-                       best_disk = disk;
                        /*
-                        * If buffered sequential IO size exceeds optimal
-                        * iosize, check if there is idle disk. If yes, choose
-                        * the idle disk. read_balance could already choose an
-                        * idle disk before noticing it's a sequential IO in
-                        * this disk. This doesn't matter because this disk
-                        * will idle, next time it will be utilized after the
-                        * first disk has IO size exceeds optimal iosize. In
-                        * this way, iosize of the first disk will be optimal
-                        * iosize at least. iosize of the second disk might be
-                        * small, but not a big deal since when the second disk
-                        * starts IO, the first disk is likely still busy.
+                        * Add 'pending' to avoid choosing this disk if
+                        * there is another idle disk.
                         */
-                       if (nonrot && opt_iosize > 0 &&
-                           mirror->seq_start != MaxSector &&
-                           mirror->next_seq_sect > opt_iosize &&
-                           mirror->next_seq_sect - opt_iosize >=
-                           mirror->seq_start) {
-                               choose_next_idle = 1;
-                               continue;
-                       }
-                       break;
+                       pending++;
+                       /*
+                        * If there is no other idle disk, this disk
+                        * will be chosen.
+                        */
+                       ctl.sequential_disk = disk;
                }
 
-               if (choose_next_idle)
-                       continue;
-
-               if (min_pending > pending) {
-                       min_pending = pending;
-                       best_pending_disk = disk;
+               if (ctl.min_pending > pending) {
+                       ctl.min_pending = pending;
+                       ctl.min_pending_disk = disk;
                }
 
-               if (dist < best_dist) {
-                       best_dist = dist;
-                       best_dist_disk = disk;
+               if (ctl.closest_dist > dist) {
+                       ctl.closest_dist = dist;
+                       ctl.closest_dist_disk = disk;
                }
        }
 
+       /*
+        * Sequential IO size exceeds the optimal iosize; however, there is no
+        * other idle disk, so choose the sequential disk.
+        */
+       if (ctl.sequential_disk != -1 && ctl.min_pending != 0)
+               return ctl.sequential_disk;
+
        /*
         * If all disks are rotational, choose the closest disk. If any disk is
         * non-rotational, choose the disk with the fewest pending requests even
         * if that disk is rotational, which may or may not be optimal for
         * arrays with mixed rotational/non-rotational disks depending on workload.
         */
-       if (best_disk == -1) {
-               if (has_nonrot_disk || min_pending == 0)
-                       best_disk = best_pending_disk;
-               else
-                       best_disk = best_dist_disk;
-       }
+       if (ctl.min_pending_disk != -1 &&
+           (READ_ONCE(conf->nonrot_disks) || ctl.min_pending == 0))
+               return ctl.min_pending_disk;
+       else
+               return ctl.closest_dist_disk;
+}
 
-       if (best_disk >= 0) {
-               rdev = conf->mirrors[best_disk].rdev;
-               if (!rdev)
-                       goto retry;
-               atomic_inc(&rdev->nr_pending);
-               sectors = best_good_sectors;
+/*
+ * This routine returns the disk from which the requested read should be done.
+ *
+ * 1) If resync is in progress, find the first usable disk and use it even if it
+ * has some bad blocks.
+ *
+ * 2) If there is no resync, loop through all disks, skipping slow disks and
+ * disks with bad blocks for now; only the best-placed disk (distance,
+ * pending I/O, sequentiality) is considered.
+ *
+ * 3) If we've made it this far, look for disks with bad blocks and choose the
+ * one with the most readable sectors.
+ *
+ * 4) If we reach the end with nothing chosen, we have no choice but to use a
+ * disk even if it is write-mostly.
+ *
+ * The rdev for the device selected will have nr_pending incremented.
+ */
+static int read_balance(struct r1conf *conf, struct r1bio *r1_bio,
+                       int *max_sectors)
+{
+       int disk;
 
-               if (conf->mirrors[best_disk].next_seq_sect != this_sector)
-                       conf->mirrors[best_disk].seq_start = this_sector;
+       clear_bit(R1BIO_FailFast, &r1_bio->state);
+
+       if (raid1_should_read_first(conf->mddev, r1_bio->sector,
+                                   r1_bio->sectors))
+               return choose_first_rdev(conf, r1_bio, max_sectors);
 
-               conf->mirrors[best_disk].next_seq_sect = this_sector + sectors;
+       disk = choose_best_rdev(conf, r1_bio);
+       if (disk >= 0) {
+               *max_sectors = r1_bio->sectors;
+               update_read_sectors(conf, disk, r1_bio->sector,
+                                   r1_bio->sectors);
+               return disk;
        }
-       *max_sectors = sectors;
 
-       return best_disk;
+       /*
+        * If we are here, it means we didn't find a perfectly good disk, so
+        * now spend a bit more time trying to find one with the most good
+        * sectors.
+        */
+       disk = choose_bb_rdev(conf, r1_bio, max_sectors);
+       if (disk >= 0)
+               return disk;
+
+       return choose_slow_rdev(conf, r1_bio, max_sectors);
 }
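The fallback order of the rewritten read_balance(), reduced to a standalone
sketch; the three stub pickers and their return values are invented
stand-ins for choose_best_rdev(), choose_bb_rdev() and choose_slow_rdev():

#include <stdio.h>

/* -1 means "no suitable disk in this tier" */
static int pick_best(void) { return -1; } /* all disks slow or with bad blocks */
static int pick_bb(void)   { return  2; } /* disk 2 has the longest good prefix */
static int pick_slow(void) { return  0; }

static int read_balance_model(void)
{
        int disk = pick_best();

        if (disk >= 0)
                return disk;
        disk = pick_bb();
        if (disk >= 0)
                return disk;
        return pick_slow();     /* last resort: write-mostly disks */
}

int main(void)
{
        printf("chosen disk %d\n", read_balance_model()); /* 2 */
        return 0;
}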
 
 static void wake_up_barrier(struct r1conf *conf)
@@ -1098,7 +1193,7 @@ static void freeze_array(struct r1conf *conf, int extra)
         */
        spin_lock_irq(&conf->resync_lock);
        conf->array_frozen = 1;
-       raid1_log(conf->mddev, "wait freeze");
+       mddev_add_trace_msg(conf->mddev, "raid1 wait freeze");
        wait_event_lock_irq_cmd(
                conf->wait_barrier,
                get_unqueued_pending(conf) == extra,
@@ -1287,7 +1382,7 @@ static void raid1_read_request(struct mddev *mddev, struct bio *bio,
                 * Reading from a write-mostly device must take care not to
                 * over-take any writes that are 'behind'
                 */
-               raid1_log(mddev, "wait behind writes");
+               mddev_add_trace_msg(mddev, "raid1 wait behind writes");
                wait_event(bitmap->behind_wait,
                           atomic_read(&bitmap->behind_writes) == 0);
        }
@@ -1320,11 +1415,7 @@ static void raid1_read_request(struct mddev *mddev, struct bio *bio,
            test_bit(R1BIO_FailFast, &r1_bio->state))
                read_bio->bi_opf |= MD_FAILFAST;
        read_bio->bi_private = r1_bio;
-
-       if (mddev->gendisk)
-               trace_block_bio_remap(read_bio, disk_devt(mddev->gendisk),
-                                     r1_bio->sector);
-
+       mddev_trace_remap(mddev, read_bio, r1_bio->sector);
        submit_bio_noacct(read_bio);
 }
 
@@ -1474,7 +1565,8 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
                        bio_wouldblock_error(bio);
                        return;
                }
-               raid1_log(mddev, "wait rdev %d blocked", blocked_rdev->raid_disk);
+               mddev_add_trace_msg(mddev, "raid1 wait rdev %d blocked",
+                               blocked_rdev->raid_disk);
                md_wait_for_blocked_rdev(blocked_rdev, mddev);
                wait_barrier(conf, bio->bi_iter.bi_sector, false);
                goto retry_write;
@@ -1557,10 +1649,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
                mbio->bi_private = r1_bio;
 
                atomic_inc(&r1_bio->remaining);
-
-               if (mddev->gendisk)
-                       trace_block_bio_remap(mbio, disk_devt(mddev->gendisk),
-                                             r1_bio->sector);
+               mddev_trace_remap(mddev, mbio, r1_bio->sector);
                /* flush_pending_writes() needs access to the rdev so...*/
                mbio->bi_bdev = (void *)rdev;
                if (!raid1_add_bio_to_plug(mddev, mbio, raid1_unplug, disks)) {
@@ -1760,6 +1849,52 @@ static int raid1_spare_active(struct mddev *mddev)
        return count;
 }
 
+static bool raid1_add_conf(struct r1conf *conf, struct md_rdev *rdev, int disk,
+                          bool replacement)
+{
+       struct raid1_info *info = conf->mirrors + disk;
+
+       if (replacement)
+               info += conf->raid_disks;
+
+       if (info->rdev)
+               return false;
+
+       if (bdev_nonrot(rdev->bdev)) {
+               set_bit(Nonrot, &rdev->flags);
+               WRITE_ONCE(conf->nonrot_disks, conf->nonrot_disks + 1);
+       }
+
+       rdev->raid_disk = disk;
+       info->head_position = 0;
+       info->seq_start = MaxSector;
+       WRITE_ONCE(info->rdev, rdev);
+
+       return true;
+}
+
+static bool raid1_remove_conf(struct r1conf *conf, int disk)
+{
+       struct raid1_info *info = conf->mirrors + disk;
+       struct md_rdev *rdev = info->rdev;
+
+       if (!rdev || test_bit(In_sync, &rdev->flags) ||
+           atomic_read(&rdev->nr_pending))
+               return false;
+
+       /* Only remove non-faulty devices if recovery is not possible. */
+       if (!test_bit(Faulty, &rdev->flags) &&
+           rdev->mddev->recovery_disabled != conf->recovery_disabled &&
+           rdev->mddev->degraded < conf->raid_disks)
+               return false;
+
+       if (test_and_clear_bit(Nonrot, &rdev->flags))
+               WRITE_ONCE(conf->nonrot_disks, conf->nonrot_disks - 1);
+
+       WRITE_ONCE(info->rdev, NULL);
+       return true;
+}
+
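How the new nonrot_disks counter stays in step with the per-rdev Nonrot flag,
as a standalone model (the WRITE_ONCE()/READ_ONCE() pairing and locking are
elided; names are illustrative). Keeping a counter lets read_balance() test
READ_ONCE(conf->nonrot_disks) instead of calling bdev_nonrot() per I/O:

#include <stdbool.h>
#include <stdio.h>

struct conf_model { int nonrot_disks; };

static void add_disk_model(struct conf_model *c, bool nonrot)
{
        if (nonrot)
                c->nonrot_disks++;      /* kernel: WRITE_ONCE() update */
}

static void remove_disk_model(struct conf_model *c, bool nonrot)
{
        if (nonrot)
                c->nonrot_disks--;
}

int main(void)
{
        struct conf_model c = { 0 };

        add_disk_model(&c, true);       /* SSD member joins */
        add_disk_model(&c, false);      /* rotational member joins */
        printf("nonrot_disks=%d\n", c.nonrot_disks); /* 1 */
        remove_disk_model(&c, true);
        printf("nonrot_disks=%d\n", c.nonrot_disks); /* 0 */
        return 0;
}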
 static int raid1_add_disk(struct mddev *mddev, struct md_rdev *rdev)
 {
        struct r1conf *conf = mddev->private;
@@ -1791,19 +1926,16 @@ static int raid1_add_disk(struct mddev *mddev, struct md_rdev *rdev)
        for (mirror = first; mirror <= last; mirror++) {
                p = conf->mirrors + mirror;
                if (!p->rdev) {
-                       if (mddev->gendisk)
-                               disk_stack_limits(mddev->gendisk, rdev->bdev,
-                                                 rdev->data_offset << 9);
+                       err = mddev_stack_new_rdev(mddev, rdev);
+                       if (err)
+                               return err;
 
-                       p->head_position = 0;
-                       rdev->raid_disk = mirror;
-                       err = 0;
+                       raid1_add_conf(conf, rdev, mirror, false);
                        /* As all devices are equivalent, we don't need a full recovery
                         * if this was recently any drive of the array
                         */
                        if (rdev->saved_raid_disk < 0)
                                conf->fullsync = 1;
-                       WRITE_ONCE(p->rdev, rdev);
                        break;
                }
                if (test_bit(WantReplacement, &p->rdev->flags) &&
@@ -1813,13 +1945,11 @@ static int raid1_add_disk(struct mddev *mddev, struct md_rdev *rdev)
 
        if (err && repl_slot >= 0) {
                /* Add this device as a replacement */
-               p = conf->mirrors + repl_slot;
                clear_bit(In_sync, &rdev->flags);
                set_bit(Replacement, &rdev->flags);
-               rdev->raid_disk = repl_slot;
+               raid1_add_conf(conf, rdev, repl_slot, true);
                err = 0;
                conf->fullsync = 1;
-               WRITE_ONCE(p[conf->raid_disks].rdev, rdev);
        }
 
        print_conf(conf);
@@ -1836,27 +1966,20 @@ static int raid1_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
        if (unlikely(number >= conf->raid_disks))
                goto abort;
 
-       if (rdev != p->rdev)
-               p = conf->mirrors + conf->raid_disks + number;
+       if (rdev != p->rdev) {
+               number += conf->raid_disks;
+               p = conf->mirrors + number;
+       }
 
        print_conf(conf);
        if (rdev == p->rdev) {
-               if (test_bit(In_sync, &rdev->flags) ||
-                   atomic_read(&rdev->nr_pending)) {
+               if (!raid1_remove_conf(conf, number)) {
                        err = -EBUSY;
                        goto abort;
                }
-               /* Only remove non-faulty devices if recovery
-                * is not possible.
-                */
-               if (!test_bit(Faulty, &rdev->flags) &&
-                   mddev->recovery_disabled != conf->recovery_disabled &&
-                   mddev->degraded < conf->raid_disks) {
-                       err = -EBUSY;
-                       goto abort;
-               }
-               WRITE_ONCE(p->rdev, NULL);
-               if (conf->mirrors[conf->raid_disks + number].rdev) {
+
+               if (number < conf->raid_disks &&
+                   conf->mirrors[conf->raid_disks + number].rdev) {
                        /* We just removed a device that is being replaced.
                         * Move down the replacement.  We drain all IO before
                         * doing this to avoid confusion.
@@ -1944,8 +2067,6 @@ static void end_sync_write(struct bio *bio)
        struct r1bio *r1_bio = get_resync_r1bio(bio);
        struct mddev *mddev = r1_bio->mddev;
        struct r1conf *conf = mddev->private;
-       sector_t first_bad;
-       int bad_sectors;
        struct md_rdev *rdev = conf->mirrors[find_bio_disk(r1_bio, bio)].rdev;
 
        if (!uptodate) {
@@ -1955,14 +2076,11 @@ static void end_sync_write(struct bio *bio)
                        set_bit(MD_RECOVERY_NEEDED, &
                                mddev->recovery);
                set_bit(R1BIO_WriteError, &r1_bio->state);
-       } else if (is_badblock(rdev, r1_bio->sector, r1_bio->sectors,
-                              &first_bad, &bad_sectors) &&
-                  !is_badblock(conf->mirrors[r1_bio->read_disk].rdev,
-                               r1_bio->sector,
-                               r1_bio->sectors,
-                               &first_bad, &bad_sectors)
-               )
+       } else if (rdev_has_badblock(rdev, r1_bio->sector, r1_bio->sectors) &&
+                  !rdev_has_badblock(conf->mirrors[r1_bio->read_disk].rdev,
+                                     r1_bio->sector, r1_bio->sectors)) {
                set_bit(R1BIO_MadeGood, &r1_bio->state);
+       }
 
        put_sync_write_buf(r1_bio, uptodate);
 }
@@ -2262,7 +2380,7 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio)
        int sectors = r1_bio->sectors;
        int read_disk = r1_bio->read_disk;
        struct mddev *mddev = conf->mddev;
-       struct md_rdev *rdev = rcu_dereference(conf->mirrors[read_disk].rdev);
+       struct md_rdev *rdev = conf->mirrors[read_disk].rdev;
 
        if (exceed_read_errors(mddev, rdev)) {
                r1_bio->bios[r1_bio->read_disk] = IO_BLOCKED;
@@ -2279,16 +2397,12 @@ static void fix_read_error(struct r1conf *conf, struct r1bio *r1_bio)
                        s = PAGE_SIZE >> 9;
 
                do {
-                       sector_t first_bad;
-                       int bad_sectors;
-
                        rdev = conf->mirrors[d].rdev;
                        if (rdev &&
                            (test_bit(In_sync, &rdev->flags) ||
                             (!test_bit(Faulty, &rdev->flags) &&
                              rdev->recovery_offset >= sect + s)) &&
-                           is_badblock(rdev, sect, s,
-                                       &first_bad, &bad_sectors) == 0) {
+                           rdev_has_badblock(rdev, sect, s) == 0) {
                                atomic_inc(&rdev->nr_pending);
                                if (sync_page_io(rdev, sect, s<<9,
                                         conf->tmppage, REQ_OP_READ, false))
@@ -3006,23 +3120,17 @@ static struct r1conf *setup_conf(struct mddev *mddev)
 
        err = -EINVAL;
        spin_lock_init(&conf->device_lock);
+       conf->raid_disks = mddev->raid_disks;
        rdev_for_each(rdev, mddev) {
                int disk_idx = rdev->raid_disk;
-               if (disk_idx >= mddev->raid_disks
-                   || disk_idx < 0)
+
+               if (disk_idx >= conf->raid_disks || disk_idx < 0)
                        continue;
-               if (test_bit(Replacement, &rdev->flags))
-                       disk = conf->mirrors + mddev->raid_disks + disk_idx;
-               else
-                       disk = conf->mirrors + disk_idx;
 
-               if (disk->rdev)
+               if (!raid1_add_conf(conf, rdev, disk_idx,
+                                   test_bit(Replacement, &rdev->flags)))
                        goto abort;
-               disk->rdev = rdev;
-               disk->head_position = 0;
-               disk->seq_start = MaxSector;
        }
-       conf->raid_disks = mddev->raid_disks;
        conf->mddev = mddev;
        INIT_LIST_HEAD(&conf->retry_list);
        INIT_LIST_HEAD(&conf->bio_end_io_list);
@@ -3086,12 +3194,21 @@ static struct r1conf *setup_conf(struct mddev *mddev)
        return ERR_PTR(err);
 }
 
+static int raid1_set_limits(struct mddev *mddev)
+{
+       struct queue_limits lim;
+
+       blk_set_stacking_limits(&lim);
+       lim.max_write_zeroes_sectors = 0;
+       mddev_stack_rdev_limits(mddev, &lim);
+       return queue_limits_set(mddev->gendisk->queue, &lim);
+}
+
 static void raid1_free(struct mddev *mddev, void *priv);
 static int raid1_run(struct mddev *mddev)
 {
        struct r1conf *conf;
        int i;
-       struct md_rdev *rdev;
        int ret;
 
        if (mddev->level != 1) {
@@ -3118,14 +3235,10 @@ static int raid1_run(struct mddev *mddev)
        if (IS_ERR(conf))
                return PTR_ERR(conf);
 
-       if (mddev->queue)
-               blk_queue_max_write_zeroes_sectors(mddev->queue, 0);
-
-       rdev_for_each(rdev, mddev) {
-               if (!mddev->gendisk)
-                       continue;
-               disk_stack_limits(mddev->gendisk, rdev->bdev,
-                                 rdev->data_offset << 9);
+       if (!mddev_is_dm(mddev)) {
+               ret = raid1_set_limits(mddev);
+               if (ret)
+                       goto abort;
        }
 
        mddev->degraded = 0;
index 14d4211a123a8e4007689919d5f86092c7feb659..5300cbaa58a415a0c6f3f41078d59394418bcb3a 100644 (file)
@@ -71,6 +71,7 @@ struct r1conf {
                                                 * allow for replacements.
                                                 */
        int                     raid_disks;
+       int                     nonrot_disks;
 
        spinlock_t              device_lock;
 
index 7412066ea22c7a525ed3e9ff1cfc1b5db2b2b527..a4556d2e46bf95f3bb8020941a00119c050cf662 100644 (file)
@@ -76,9 +76,6 @@ static void reshape_request_write(struct mddev *mddev, struct r10bio *r10_bio);
 static void end_reshape_write(struct bio *bio);
 static void end_reshape(struct r10conf *conf);
 
-#define raid10_log(md, fmt, args...)                           \
-       do { if ((md)->queue) blk_add_trace_msg((md)->queue, "raid10 " fmt, ##args); } while (0)
-
 #include "raid1-10.c"
 
 #define NULL_CMD
@@ -518,11 +515,7 @@ static void raid10_end_write_request(struct bio *bio)
                 * The 'master' represents the composite IO operation to
                 * user-side. So if something waits for IO, then it will
                 * wait for the 'master' bio.
-                */
-               sector_t first_bad;
-               int bad_sectors;
-
-               /*
+                *
                 * Do not set R10BIO_Uptodate if the current device is
                 * rebuilding or Faulty. This is because we cannot use
                 * such a device for properly reading the data back (we could
@@ -535,10 +528,9 @@ static void raid10_end_write_request(struct bio *bio)
                        set_bit(R10BIO_Uptodate, &r10_bio->state);
 
                /* Maybe we can clear some bad blocks. */
-               if (is_badblock(rdev,
-                               r10_bio->devs[slot].addr,
-                               r10_bio->sectors,
-                               &first_bad, &bad_sectors) && !discard_error) {
+               if (rdev_has_badblock(rdev, r10_bio->devs[slot].addr,
+                                     r10_bio->sectors) &&
+                   !discard_error) {
                        bio_put(bio);
                        if (repl)
                                r10_bio->devs[slot].repl_bio = IO_MADE_GOOD;
@@ -753,17 +745,8 @@ static struct md_rdev *read_balance(struct r10conf *conf,
        best_good_sectors = 0;
        do_balance = 1;
        clear_bit(R10BIO_FailFast, &r10_bio->state);
-       /*
-        * Check if we can balance. We can balance on the whole
-        * device if no resync is going on (recovery is ok), or below
-        * the resync window. We take the first readable disk when
-        * above the resync window.
-        */
-       if ((conf->mddev->recovery_cp < MaxSector
-            && (this_sector + sectors >= conf->next_resync)) ||
-           (mddev_is_clustered(conf->mddev) &&
-            md_cluster_ops->area_resyncing(conf->mddev, READ, this_sector,
-                                           this_sector + sectors)))
+
+       if (raid1_should_read_first(conf->mddev, this_sector, sectors))
                do_balance = 0;
 
        for (slot = 0; slot < conf->copies ; slot++) {
@@ -1033,7 +1016,7 @@ static bool wait_barrier(struct r10conf *conf, bool nowait)
                        ret = false;
                } else {
                        conf->nr_waiting++;
-                       raid10_log(conf->mddev, "wait barrier");
+                       mddev_add_trace_msg(conf->mddev, "raid10 wait barrier");
                        wait_event_barrier(conf, stop_waiting_barrier(conf));
                        conf->nr_waiting--;
                }
@@ -1152,7 +1135,7 @@ static bool regular_request_wait(struct mddev *mddev, struct r10conf *conf,
                        bio_wouldblock_error(bio);
                        return false;
                }
-               raid10_log(conf->mddev, "wait reshape");
+               mddev_add_trace_msg(conf->mddev, "raid10 wait reshape");
                wait_event(conf->wait_barrier,
                           conf->reshape_progress <= bio->bi_iter.bi_sector ||
                           conf->reshape_progress >= bio->bi_iter.bi_sector +
@@ -1249,10 +1232,7 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio,
            test_bit(R10BIO_FailFast, &r10_bio->state))
                read_bio->bi_opf |= MD_FAILFAST;
        read_bio->bi_private = r10_bio;
-
-       if (mddev->gendisk)
-               trace_block_bio_remap(read_bio, disk_devt(mddev->gendisk),
-                                     r10_bio->sector);
+       mddev_trace_remap(mddev, read_bio, r10_bio->sector);
        submit_bio_noacct(read_bio);
        return;
 }
@@ -1288,10 +1268,7 @@ static void raid10_write_one_disk(struct mddev *mddev, struct r10bio *r10_bio,
                         && enough(conf, devnum))
                mbio->bi_opf |= MD_FAILFAST;
        mbio->bi_private = r10_bio;
-
-       if (conf->mddev->gendisk)
-               trace_block_bio_remap(mbio, disk_devt(conf->mddev->gendisk),
-                                     r10_bio->sector);
+       mddev_trace_remap(mddev, mbio, r10_bio->sector);
        /* flush_pending_writes() needs access to the rdev so...*/
        mbio->bi_bdev = (void *)rdev;
 
@@ -1330,10 +1307,7 @@ retry_wait:
                }
 
                if (rdev && test_bit(WriteErrorSeen, &rdev->flags)) {
-                       sector_t first_bad;
                        sector_t dev_sector = r10_bio->devs[i].addr;
-                       int bad_sectors;
-                       int is_bad;
 
                        /*
                         * Discard request doesn't care about the write result
@@ -1342,9 +1316,8 @@ retry_wait:
                        if (!r10_bio->sectors)
                                continue;
 
-                       is_bad = is_badblock(rdev, dev_sector, r10_bio->sectors,
-                                            &first_bad, &bad_sectors);
-                       if (is_bad < 0) {
+                       if (rdev_has_badblock(rdev, dev_sector,
+                                             r10_bio->sectors) < 0) {
                                /*
                                 * Mustn't write here until the bad block
                                 * is acknowledged
@@ -1360,8 +1333,9 @@ retry_wait:
        if (unlikely(blocked_rdev)) {
                /* Have to wait for this device to get unblocked, then retry */
                allow_barrier(conf);
-               raid10_log(conf->mddev, "%s wait rdev %d blocked",
-                               __func__, blocked_rdev->raid_disk);
+               mddev_add_trace_msg(conf->mddev,
+                       "raid10 %s wait rdev %d blocked",
+                       __func__, blocked_rdev->raid_disk);
                md_wait_for_blocked_rdev(blocked_rdev, mddev);
                wait_barrier(conf, false);
                goto retry_wait;
@@ -1416,7 +1390,8 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
                        bio_wouldblock_error(bio);
                        return;
                }
-               raid10_log(conf->mddev, "wait reshape metadata");
+               mddev_add_trace_msg(conf->mddev,
+                       "raid10 wait reshape metadata");
                wait_event(mddev->sb_wait,
                           !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags));
 
@@ -2131,10 +2106,9 @@ static int raid10_add_disk(struct mddev *mddev, struct md_rdev *rdev)
                        continue;
                }
 
-               if (mddev->gendisk)
-                       disk_stack_limits(mddev->gendisk, rdev->bdev,
-                                         rdev->data_offset << 9);
-
+               err = mddev_stack_new_rdev(mddev, rdev);
+               if (err)
+                       return err;
                p->head_position = 0;
                p->recovery_disabled = mddev->recovery_disabled - 1;
                rdev->raid_disk = mirror;
@@ -2150,10 +2124,9 @@ static int raid10_add_disk(struct mddev *mddev, struct md_rdev *rdev)
                clear_bit(In_sync, &rdev->flags);
                set_bit(Replacement, &rdev->flags);
                rdev->raid_disk = repl_slot;
-               err = 0;
-               if (mddev->gendisk)
-                       disk_stack_limits(mddev->gendisk, rdev->bdev,
-                                         rdev->data_offset << 9);
+               err = mddev_stack_new_rdev(mddev, rdev);
+               if (err)
+                       return err;
                conf->fullsync = 1;
                WRITE_ONCE(p->replacement, rdev);
        }
@@ -2290,8 +2263,6 @@ static void end_sync_write(struct bio *bio)
        struct mddev *mddev = r10_bio->mddev;
        struct r10conf *conf = mddev->private;
        int d;
-       sector_t first_bad;
-       int bad_sectors;
        int slot;
        int repl;
        struct md_rdev *rdev = NULL;
@@ -2312,11 +2283,10 @@ static void end_sync_write(struct bio *bio)
                                        &rdev->mddev->recovery);
                        set_bit(R10BIO_WriteError, &r10_bio->state);
                }
-       } else if (is_badblock(rdev,
-                            r10_bio->devs[slot].addr,
-                            r10_bio->sectors,
-                            &first_bad, &bad_sectors))
+       } else if (rdev_has_badblock(rdev, r10_bio->devs[slot].addr,
+                                    r10_bio->sectors)) {
                set_bit(R10BIO_MadeGood, &r10_bio->state);
+       }
 
        rdev_dec_pending(rdev, mddev);
 
@@ -2597,11 +2567,8 @@ static void recovery_request_write(struct mddev *mddev, struct r10bio *r10_bio)
 static int r10_sync_page_io(struct md_rdev *rdev, sector_t sector,
                            int sectors, struct page *page, enum req_op op)
 {
-       sector_t first_bad;
-       int bad_sectors;
-
-       if (is_badblock(rdev, sector, sectors, &first_bad, &bad_sectors)
-           && (op == REQ_OP_READ || test_bit(WriteErrorSeen, &rdev->flags)))
+       if (rdev_has_badblock(rdev, sector, sectors) &&
+           (op == REQ_OP_READ || test_bit(WriteErrorSeen, &rdev->flags)))
                return -1;
        if (sync_page_io(rdev, sector, sectors << 9, page, op, false))
                /* success */
@@ -2658,16 +2625,14 @@ static void fix_read_error(struct r10conf *conf, struct mddev *mddev, struct r10
                        s = PAGE_SIZE >> 9;
 
                do {
-                       sector_t first_bad;
-                       int bad_sectors;
-
                        d = r10_bio->devs[sl].devnum;
                        rdev = conf->mirrors[d].rdev;
                        if (rdev &&
                            test_bit(In_sync, &rdev->flags) &&
                            !test_bit(Faulty, &rdev->flags) &&
-                           is_badblock(rdev, r10_bio->devs[sl].addr + sect, s,
-                                       &first_bad, &bad_sectors) == 0) {
+                           rdev_has_badblock(rdev,
+                                             r10_bio->devs[sl].addr + sect,
+                                             s) == 0) {
                                atomic_inc(&rdev->nr_pending);
                                success = sync_page_io(rdev,
                                                       r10_bio->devs[sl].addr +
@@ -4002,14 +3967,26 @@ static struct r10conf *setup_conf(struct mddev *mddev)
        return ERR_PTR(err);
 }
 
-static void raid10_set_io_opt(struct r10conf *conf)
+static unsigned int raid10_nr_stripes(struct r10conf *conf)
 {
-       int raid_disks = conf->geo.raid_disks;
+       unsigned int raid_disks = conf->geo.raid_disks;
+
+       if (conf->geo.raid_disks % conf->geo.near_copies)
+               return raid_disks;
+       return raid_disks / conf->geo.near_copies;
+}
 
-       if (!(conf->geo.raid_disks % conf->geo.near_copies))
-               raid_disks /= conf->geo.near_copies;
-       blk_queue_io_opt(conf->mddev->queue, (conf->mddev->chunk_sectors << 9) *
-                        raid_disks);
+static int raid10_set_queue_limits(struct mddev *mddev)
+{
+       struct r10conf *conf = mddev->private;
+       struct queue_limits lim;
+
+       blk_set_stacking_limits(&lim);
+       lim.max_write_zeroes_sectors = 0;
+       lim.io_min = mddev->chunk_sectors << 9;
+       lim.io_opt = lim.io_min * raid10_nr_stripes(conf);
+       mddev_stack_rdev_limits(mddev, &lim);
+       return queue_limits_set(mddev->gendisk->queue, &lim);
 }
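raid10_nr_stripes() only divides out near_copies when they tile the disks
evenly; a quick standalone check of that arithmetic with made-up geometries:

#include <stdio.h>

static unsigned int nr_stripes_model(unsigned int raid_disks,
                                     unsigned int near_copies)
{
        /* mirrors raid10_nr_stripes(): only divide when copies tile evenly */
        if (raid_disks % near_copies)
                return raid_disks;
        return raid_disks / near_copies;
}

int main(void)
{
        printf("%u\n", nr_stripes_model(4, 2)); /* 2: near-2 over 4 disks */
        printf("%u\n", nr_stripes_model(5, 2)); /* 5: uneven layout keeps all */
        return 0;
}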
 
 static int raid10_run(struct mddev *mddev)
@@ -4021,6 +3998,7 @@ static int raid10_run(struct mddev *mddev)
        sector_t size;
        sector_t min_offset_diff = 0;
        int first = 1;
+       int ret = -EIO;
 
        if (mddev->private == NULL) {
                conf = setup_conf(mddev);
@@ -4047,12 +4025,6 @@ static int raid10_run(struct mddev *mddev)
                }
        }
 
-       if (mddev->queue) {
-               blk_queue_max_write_zeroes_sectors(mddev->queue, 0);
-               blk_queue_io_min(mddev->queue, mddev->chunk_sectors << 9);
-               raid10_set_io_opt(conf);
-       }
-
        rdev_for_each(rdev, mddev) {
                long long diff;
 
@@ -4081,14 +4053,16 @@ static int raid10_run(struct mddev *mddev)
                if (first || diff < min_offset_diff)
                        min_offset_diff = diff;
 
-               if (mddev->gendisk)
-                       disk_stack_limits(mddev->gendisk, rdev->bdev,
-                                         rdev->data_offset << 9);
-
                disk->head_position = 0;
                first = 0;
        }
 
+       if (!mddev_is_dm(conf->mddev)) {
+               ret = raid10_set_queue_limits(mddev);
+               if (ret)
+                       goto out_free_conf;
+       }
+
        /* need to check that every block has at least one working mirror */
        if (!enough(conf, -1)) {
                pr_err("md/raid10:%s: not enough operational mirrors.\n",
@@ -4175,11 +4149,7 @@ static int raid10_run(struct mddev *mddev)
                clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
                clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
                set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
-               set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
-               rcu_assign_pointer(mddev->sync_thread,
-                       md_register_thread(md_do_sync, mddev, "reshape"));
-               if (!mddev->sync_thread)
-                       goto out_free_conf;
+               set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
        }
 
        return 0;
@@ -4189,7 +4159,7 @@ out_free_conf:
        raid10_free_conf(conf);
        mddev->private = NULL;
 out:
-       return -EIO;
+       return ret;
 }
 
 static void raid10_free(struct mddev *mddev, void *priv)
@@ -4573,16 +4543,8 @@ out:
        clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
        clear_bit(MD_RECOVERY_DONE, &mddev->recovery);
        set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
-       set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
-
-       rcu_assign_pointer(mddev->sync_thread,
-                          md_register_thread(md_do_sync, mddev, "reshape"));
-       if (!mddev->sync_thread) {
-               ret = -EAGAIN;
-               goto abort;
-       }
+       set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
        conf->reshape_checkpoint = jiffies;
-       md_wakeup_thread(mddev->sync_thread);
        md_new_event();
        return 0;
 
@@ -4966,8 +4928,7 @@ static void end_reshape(struct r10conf *conf)
        conf->reshape_safe = MaxSector;
        spin_unlock_irq(&conf->device_lock);
 
-       if (conf->mddev->queue)
-               raid10_set_io_opt(conf);
+       mddev_update_io_opt(conf->mddev, raid10_nr_stripes(conf));
        conf->fullsync = 0;
 }
 
index da4ba736c4f0c942e15fca6c4ce65649e82ffe93..a70cbec12ed01737874659ed2ac3c65987a5dc9f 100644 (file)
@@ -1393,7 +1393,8 @@ int ppl_init_log(struct r5conf *conf)
                ppl_conf->signature = ~crc32c_le(~0, mddev->uuid, sizeof(mddev->uuid));
                ppl_conf->block_size = 512;
        } else {
-               ppl_conf->block_size = queue_logical_block_size(mddev->queue);
+               ppl_conf->block_size =
+                       queue_logical_block_size(mddev->gendisk->queue);
        }
 
        for (i = 0; i < ppl_conf->count; i++) {
index 8497880135ee4269ef329e58a10757870ae2df18..d874abfc18364ec17cd91b4b5d11930259859305 100644 (file)
@@ -36,6 +36,7 @@
  */
 
 #include <linux/blkdev.h>
+#include <linux/delay.h>
 #include <linux/kthread.h>
 #include <linux/raid/pq.h>
 #include <linux/async_tx.h>
@@ -760,6 +761,7 @@ enum stripe_result {
        STRIPE_RETRY,
        STRIPE_SCHEDULE_AND_RETRY,
        STRIPE_FAIL,
+       STRIPE_WAIT_RESHAPE,
 };
 
 struct stripe_request_ctx {
@@ -1210,10 +1212,8 @@ again:
                 */
                while (op_is_write(op) && rdev &&
                       test_bit(WriteErrorSeen, &rdev->flags)) {
-                       sector_t first_bad;
-                       int bad_sectors;
-                       int bad = is_badblock(rdev, sh->sector, RAID5_STRIPE_SECTORS(conf),
-                                             &first_bad, &bad_sectors);
+                       int bad = rdev_has_badblock(rdev, sh->sector,
+                                                   RAID5_STRIPE_SECTORS(conf));
                        if (!bad)
                                break;
 
@@ -1295,10 +1295,7 @@ again:
                        if (rrdev)
                                set_bit(R5_DOUBLE_LOCKED, &sh->dev[i].flags);
 
-                       if (conf->mddev->gendisk)
-                               trace_block_bio_remap(bi,
-                                               disk_devt(conf->mddev->gendisk),
-                                               sh->dev[i].sector);
+                       mddev_trace_remap(conf->mddev, bi, sh->dev[i].sector);
                        if (should_defer && op_is_write(op))
                                bio_list_add(&pending_bios, bi);
                        else
@@ -1342,10 +1339,7 @@ again:
                         */
                        if (op == REQ_OP_DISCARD)
                                rbi->bi_vcnt = 0;
-                       if (conf->mddev->gendisk)
-                               trace_block_bio_remap(rbi,
-                                               disk_devt(conf->mddev->gendisk),
-                                               sh->dev[i].sector);
+                       mddev_trace_remap(conf->mddev, rbi, sh->dev[i].sector);
                        if (should_defer && op_is_write(op))
                                bio_list_add(&pending_bios, rbi);
                        else
@@ -2412,7 +2406,7 @@ static int grow_one_stripe(struct r5conf *conf, gfp_t gfp)
        atomic_inc(&conf->active_stripes);
 
        raid5_release_stripe(sh);
-       conf->max_nr_stripes++;
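+       /* Paired with READ_ONCE() in raid5_cache_count(). */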
+       WRITE_ONCE(conf->max_nr_stripes, conf->max_nr_stripes + 1);
        return 1;
 }
 
@@ -2422,12 +2416,12 @@ static int grow_stripes(struct r5conf *conf, int num)
        size_t namelen = sizeof(conf->cache_name[0]);
        int devs = max(conf->raid_disks, conf->previous_raid_disks);
 
-       if (conf->mddev->gendisk)
+       if (mddev_is_dm(conf->mddev))
                snprintf(conf->cache_name[0], namelen,
-                       "raid%d-%s", conf->level, mdname(conf->mddev));
+                       "raid%d-%p", conf->level, conf->mddev);
        else
                snprintf(conf->cache_name[0], namelen,
-                       "raid%d-%p", conf->level, conf->mddev);
+                       "raid%d-%s", conf->level, mdname(conf->mddev));
        snprintf(conf->cache_name[1], namelen, "%.27s-alt", conf->cache_name[0]);
 
        conf->active_name = 0;
@@ -2707,7 +2701,7 @@ static int drop_one_stripe(struct r5conf *conf)
        shrink_buffers(sh);
        free_stripe(conf->slab_cache, sh);
        atomic_dec(&conf->active_stripes);
-       conf->max_nr_stripes--;
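+       /* Paired with READ_ONCE() in raid5_cache_count(). */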
+       WRITE_ONCE(conf->max_nr_stripes, conf->max_nr_stripes - 1);
        return 1;
 }
 
@@ -2855,8 +2849,6 @@ static void raid5_end_write_request(struct bio *bi)
        struct r5conf *conf = sh->raid_conf;
        int disks = sh->disks, i;
        struct md_rdev *rdev;
-       sector_t first_bad;
-       int bad_sectors;
        int replacement = 0;
 
        for (i = 0 ; i < disks; i++) {
@@ -2888,9 +2880,8 @@ static void raid5_end_write_request(struct bio *bi)
        if (replacement) {
                if (bi->bi_status)
                        md_error(conf->mddev, rdev);
-               else if (is_badblock(rdev, sh->sector,
-                                    RAID5_STRIPE_SECTORS(conf),
-                                    &first_bad, &bad_sectors))
+               else if (rdev_has_badblock(rdev, sh->sector,
+                                          RAID5_STRIPE_SECTORS(conf)))
                        set_bit(R5_MadeGoodRepl, &sh->dev[i].flags);
        } else {
                if (bi->bi_status) {
@@ -2900,9 +2891,8 @@ static void raid5_end_write_request(struct bio *bi)
                        if (!test_and_set_bit(WantReplacement, &rdev->flags))
                                set_bit(MD_RECOVERY_NEEDED,
                                        &rdev->mddev->recovery);
-               } else if (is_badblock(rdev, sh->sector,
-                                      RAID5_STRIPE_SECTORS(conf),
-                                      &first_bad, &bad_sectors)) {
+               } else if (rdev_has_badblock(rdev, sh->sector,
+                                            RAID5_STRIPE_SECTORS(conf))) {
                        set_bit(R5_MadeGood, &sh->dev[i].flags);
                        if (test_bit(R5_ReadError, &sh->dev[i].flags))
                                /* That was a successful write so make
@@ -4205,10 +4195,9 @@ static int handle_stripe_dirtying(struct r5conf *conf,
        set_bit(STRIPE_HANDLE, &sh->state);
        if ((rmw < rcw || (rmw == rcw && conf->rmw_level == PARITY_PREFER_RMW)) && rmw > 0) {
                /* prefer read-modify-write, but need to get some data */
-               if (conf->mddev->queue)
-                       blk_add_trace_msg(conf->mddev->queue,
-                                         "raid5 rmw %llu %d",
-                                         (unsigned long long)sh->sector, rmw);
+               mddev_add_trace_msg(conf->mddev, "raid5 rmw %llu %d",
+                               sh->sector, rmw);
+
                for (i = disks; i--; ) {
                        struct r5dev *dev = &sh->dev[i];
                        if (test_bit(R5_InJournal, &dev->flags) &&
@@ -4285,10 +4274,11 @@ static int handle_stripe_dirtying(struct r5conf *conf,
                                        set_bit(STRIPE_DELAYED, &sh->state);
                        }
                }
-               if (rcw && conf->mddev->queue)
-                       blk_add_trace_msg(conf->mddev->queue, "raid5 rcw %llu %d %d %d",
-                                         (unsigned long long)sh->sector,
-                                         rcw, qread, test_bit(STRIPE_DELAYED, &sh->state));
+               if (rcw && !mddev_is_dm(conf->mddev))
+                       blk_add_trace_msg(conf->mddev->gendisk->queue,
+                               "raid5 rcw %llu %d %d %d",
+                               (unsigned long long)sh->sector, rcw, qread,
+                               test_bit(STRIPE_DELAYED, &sh->state));
        }
 
        if (rcw > disks && rmw > disks &&
@@ -4674,8 +4664,6 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
        /* Now to look around and see what can be done */
        for (i=disks; i--; ) {
                struct md_rdev *rdev;
-               sector_t first_bad;
-               int bad_sectors;
                int is_bad = 0;
 
                dev = &sh->dev[i];
@@ -4719,8 +4707,8 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
                rdev = conf->disks[i].replacement;
                if (rdev && !test_bit(Faulty, &rdev->flags) &&
                    rdev->recovery_offset >= sh->sector + RAID5_STRIPE_SECTORS(conf) &&
-                   !is_badblock(rdev, sh->sector, RAID5_STRIPE_SECTORS(conf),
-                                &first_bad, &bad_sectors))
+                   !rdev_has_badblock(rdev, sh->sector,
+                                      RAID5_STRIPE_SECTORS(conf)))
                        set_bit(R5_ReadRepl, &dev->flags);
                else {
                        if (rdev && !test_bit(Faulty, &rdev->flags))
@@ -4733,8 +4721,8 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
                if (rdev && test_bit(Faulty, &rdev->flags))
                        rdev = NULL;
                if (rdev) {
-                       is_bad = is_badblock(rdev, sh->sector, RAID5_STRIPE_SECTORS(conf),
-                                            &first_bad, &bad_sectors);
+                       is_bad = rdev_has_badblock(rdev, sh->sector,
+                                                  RAID5_STRIPE_SECTORS(conf));
                        if (s->blocked_rdev == NULL
                            && (test_bit(Blocked, &rdev->flags)
                                || is_bad < 0)) {
@@ -5463,8 +5451,8 @@ static int raid5_read_one_chunk(struct mddev *mddev, struct bio *raid_bio)
        struct r5conf *conf = mddev->private;
        struct bio *align_bio;
        struct md_rdev *rdev;
-       sector_t sector, end_sector, first_bad;
-       int bad_sectors, dd_idx;
+       sector_t sector, end_sector;
+       int dd_idx;
        bool did_inc;
 
        if (!in_chunk_boundary(mddev, raid_bio)) {
@@ -5493,8 +5481,7 @@ static int raid5_read_one_chunk(struct mddev *mddev, struct bio *raid_bio)
 
        atomic_inc(&rdev->nr_pending);
 
-       if (is_badblock(rdev, sector, bio_sectors(raid_bio), &first_bad,
-                       &bad_sectors)) {
+       if (rdev_has_badblock(rdev, sector, bio_sectors(raid_bio))) {
                rdev_dec_pending(rdev, mddev);
                return 0;
        }
@@ -5530,9 +5517,7 @@ static int raid5_read_one_chunk(struct mddev *mddev, struct bio *raid_bio)
                spin_unlock_irq(&conf->device_lock);
        }
 
-       if (mddev->gendisk)
-               trace_block_bio_remap(align_bio, disk_devt(mddev->gendisk),
-                                     raid_bio->bi_iter.bi_sector);
+       mddev_trace_remap(mddev, align_bio, raid_bio->bi_iter.bi_sector);
        submit_bio_noacct(align_bio);
        return 1;
 }
@@ -5701,8 +5686,8 @@ static void raid5_unplug(struct blk_plug_cb *blk_cb, bool from_schedule)
        }
        release_inactive_stripe_list(conf, cb->temp_inactive_list,
                                     NR_STRIPE_HASH_LOCKS);
-       if (mddev->queue)
-               trace_block_unplug(mddev->queue, cnt, !from_schedule);
+       if (!mddev_is_dm(mddev))
+               trace_block_unplug(mddev->gendisk->queue, cnt, !from_schedule);
        kfree(cb);
 }
 
@@ -5946,7 +5931,8 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
                        if (ahead_of_reshape(mddev, logical_sector,
                                             conf->reshape_safe)) {
                                spin_unlock_irq(&conf->device_lock);
-                               return STRIPE_SCHEDULE_AND_RETRY;
+                               ret = STRIPE_SCHEDULE_AND_RETRY;
+                               goto out;
                        }
                }
                spin_unlock_irq(&conf->device_lock);
@@ -6025,6 +6011,12 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
 
 out_release:
        raid5_release_stripe(sh);
+out:
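+       /*
+        * If reshape can no longer make progress (for dm-raid456 the
+        * sync_thread may be frozen), IO across the reshape position
+        * cannot be served, so fail the bio with BLK_STS_RESOURCE.
+        */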
+       if (ret == STRIPE_SCHEDULE_AND_RETRY && reshape_interrupted(mddev)) {
+               bi->bi_status = BLK_STS_RESOURCE;
+               ret = STRIPE_WAIT_RESHAPE;
+               pr_err_ratelimited("dm-raid456: io across reshape position while reshape can't make progress\n");
+       }
        return ret;
 }
 
@@ -6146,7 +6138,7 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
        while (1) {
                res = make_stripe_request(mddev, conf, &ctx, logical_sector,
                                          bi);
-               if (res == STRIPE_FAIL)
+               if (res == STRIPE_FAIL || res == STRIPE_WAIT_RESHAPE)
                        break;
 
                if (res == STRIPE_RETRY)
@@ -6184,6 +6176,11 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 
        if (rw == WRITE)
                md_write_end(mddev);
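+       /*
+        * STRIPE_WAIT_RESHAPE is only set for dm-raid456; the bio was failed
+        * with BLK_STS_RESOURCE so that it can be retried later.
+        */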
+       if (res == STRIPE_WAIT_RESHAPE) {
+               md_free_cloned_bio(bi);
+               return false;
+       }
+
        bio_endio(bi);
        return true;
 }
@@ -6773,7 +6770,18 @@ static void raid5d(struct md_thread *thread)
                        spin_unlock_irq(&conf->device_lock);
                        md_check_recovery(mddev);
                        spin_lock_irq(&conf->device_lock);
+
+                       /*
+                        * Waiting on MD_SB_CHANGE_PENDING below may deadlock,
+                        * because md_check_recovery() is needed to clear the
+                        * flag when using mdmon.
+                        */
+                       continue;
                }
+
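+               /* Wait until any pending superblock update has been written. */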
+               wait_event_lock_irq(mddev->sb_wait,
+                       !test_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags),
+                       conf->device_lock);
        }
        pr_debug("%d stripes handled\n", handled);
 
@@ -6820,7 +6828,7 @@ raid5_set_cache_size(struct mddev *mddev, int size)
        if (size <= 16 || size > 32768)
                return -EINVAL;
 
-       conf->min_nr_stripes = size;
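+       /* Paired with READ_ONCE() in raid5_cache_count(). */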
+       WRITE_ONCE(conf->min_nr_stripes, size);
        mutex_lock(&conf->cache_size_mutex);
        while (size < conf->max_nr_stripes &&
               drop_one_stripe(conf))
@@ -6832,7 +6840,7 @@ raid5_set_cache_size(struct mddev *mddev, int size)
        mutex_lock(&conf->cache_size_mutex);
        while (size > conf->max_nr_stripes)
                if (!grow_one_stripe(conf, GFP_KERNEL)) {
-                       conf->min_nr_stripes = conf->max_nr_stripes;
+                       WRITE_ONCE(conf->min_nr_stripes, conf->max_nr_stripes);
                        result = -ENOMEM;
                        break;
                }
@@ -6967,10 +6975,8 @@ raid5_store_stripe_size(struct mddev  *mddev, const char *page, size_t len)
        pr_debug("md/raid: change stripe_size from %lu to %lu\n",
                        conf->stripe_size, new);
 
-       if (mddev->sync_thread ||
-               test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
-               mddev->reshape_position != MaxSector ||
-               mddev->sysfs_active) {
+       if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
+           mddev->reshape_position != MaxSector || mddev->sysfs_active) {
                err = -EBUSY;
                goto out_unlock;
        }
@@ -7084,7 +7090,7 @@ raid5_store_skip_copy(struct mddev *mddev, const char *page, size_t len)
        if (!conf)
                err = -ENODEV;
        else if (new != conf->skip_copy) {
-               struct request_queue *q = mddev->queue;
+               struct request_queue *q = mddev->gendisk->queue;
 
                conf->skip_copy = new;
                if (new)
@@ -7390,11 +7396,13 @@ static unsigned long raid5_cache_count(struct shrinker *shrink,
                                       struct shrink_control *sc)
 {
        struct r5conf *conf = shrink->private_data;
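+       /* Read locklessly; the writers pair these with WRITE_ONCE(). */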
+       int max_stripes = READ_ONCE(conf->max_nr_stripes);
+       int min_stripes = READ_ONCE(conf->min_nr_stripes);
 
-       if (conf->max_nr_stripes < conf->min_nr_stripes)
+       if (max_stripes < min_stripes)
                /* unlikely, but not impossible */
                return 0;
-       return conf->max_nr_stripes - conf->min_nr_stripes;
+       return max_stripes - min_stripes;
 }
 
 static struct r5conf *setup_conf(struct mddev *mddev)
@@ -7684,10 +7692,65 @@ static int only_parity(int raid_disk, int algo, int raid_disks, int max_degraded
        return 0;
 }
 
-static void raid5_set_io_opt(struct r5conf *conf)
+static int raid5_set_limits(struct mddev *mddev)
 {
-       blk_queue_io_opt(conf->mddev->queue, (conf->chunk_sectors << 9) *
-                        (conf->raid_disks - conf->max_degraded));
+       struct r5conf *conf = mddev->private;
+       struct queue_limits lim;
+       int data_disks, stripe;
+       struct md_rdev *rdev;
+
+       /*
+        * The read-ahead size must cover two whole stripes, which is
+        * 2 * (number of data disks) * chunk size.
+        */
+       data_disks = conf->previous_raid_disks - conf->max_degraded;
+
+       /*
+        * We can only discard a whole stripe. It doesn't make sense to
+        * discard the data disks but still write the parity disk.
+        */
+       stripe = roundup_pow_of_two(data_disks * (mddev->chunk_sectors << 9));
+
+       blk_set_stacking_limits(&lim);
+       lim.io_min = mddev->chunk_sectors << 9;
+       lim.io_opt = lim.io_min * (conf->raid_disks - conf->max_degraded);
+       lim.raid_partial_stripes_expensive = 1;
+       lim.discard_granularity = stripe;
+       lim.max_write_zeroes_sectors = 0;
+       mddev_stack_rdev_limits(mddev, &lim);
+       rdev_for_each(rdev, mddev)
+               queue_limits_stack_bdev(&lim, rdev->bdev, rdev->new_data_offset,
+                               mddev->gendisk->disk_name);
+
+       /*
+        * Zeroing is required for discard, otherwise data could be lost.
+        *
+        * Consider a scenario: discard a stripe (the stripe could be
+        * inconsistent if discard_zeroes_data is 0); write one disk of the
+        * stripe (the stripe could be inconsistent again depending on which
+        * disks are used to calculate parity); the disk fails; the stripe
+        * data of this disk is lost.
+        *
+        * We only allow DISCARD if the sysadmin has confirmed that only safe
+        * devices are in use by setting a module parameter.  A better idea
+        * might be to turn DISCARD into WRITE_ZEROES requests, as that is
+        * required to be safe.
+        */
+       if (!devices_handle_discard_safely ||
+           lim.max_discard_sectors < (stripe >> 9) ||
+           lim.discard_granularity < stripe)
+               lim.max_hw_discard_sectors = 0;
+
+       /*
+        * Requests require having a bitmap for each stripe.
+        * Limit the max sectors based on this.
+        */
+       lim.max_hw_sectors = RAID5_MAX_REQ_STRIPES << RAID5_STRIPE_SHIFT(conf);
+
+       /* No restrictions on the number of segments in the request */
+       lim.max_segments = USHRT_MAX;
+
+       return queue_limits_set(mddev->gendisk->queue, &lim);
 }
 
 static int raid5_run(struct mddev *mddev)
@@ -7700,6 +7763,7 @@ static int raid5_run(struct mddev *mddev)
        int i;
        long long min_offset_diff = 0;
        int first = 1;
+       int ret = -EIO;
 
        if (mddev->recovery_cp != MaxSector)
                pr_notice("md/raid:%s: not clean -- starting background reconstruction\n",
@@ -7936,11 +8000,7 @@ static int raid5_run(struct mddev *mddev)
                clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
                clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
                set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
-               set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
-               rcu_assign_pointer(mddev->sync_thread,
-                       md_register_thread(md_do_sync, mddev, "reshape"));
-               if (!mddev->sync_thread)
-                       goto abort;
+               set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
        }
 
        /* Ok, everything is just fine now */
@@ -7952,66 +8012,10 @@ static int raid5_run(struct mddev *mddev)
                        mdname(mddev));
        md_set_array_sectors(mddev, raid5_size(mddev, 0, 0));
 
-       if (mddev->queue) {
-               int chunk_size;
-               /* read-ahead size must cover two whole stripes, which
-                * is 2 * (datadisks) * chunksize where 'n' is the
-                * number of raid devices
-                */
-               int data_disks = conf->previous_raid_disks - conf->max_degraded;
-               int stripe = data_disks *
-                       ((mddev->chunk_sectors << 9) / PAGE_SIZE);
-
-               chunk_size = mddev->chunk_sectors << 9;
-               blk_queue_io_min(mddev->queue, chunk_size);
-               raid5_set_io_opt(conf);
-               mddev->queue->limits.raid_partial_stripes_expensive = 1;
-               /*
-                * We can only discard a whole stripe. It doesn't make sense to
-                * discard data disk but write parity disk
-                */
-               stripe = stripe * PAGE_SIZE;
-               stripe = roundup_pow_of_two(stripe);
-               mddev->queue->limits.discard_granularity = stripe;
-
-               blk_queue_max_write_zeroes_sectors(mddev->queue, 0);
-
-               rdev_for_each(rdev, mddev) {
-                       disk_stack_limits(mddev->gendisk, rdev->bdev,
-                                         rdev->data_offset << 9);
-                       disk_stack_limits(mddev->gendisk, rdev->bdev,
-                                         rdev->new_data_offset << 9);
-               }
-
-               /*
-                * zeroing is required, otherwise data
-                * could be lost. Consider a scenario: discard a stripe
-                * (the stripe could be inconsistent if
-                * discard_zeroes_data is 0); write one disk of the
-                * stripe (the stripe could be inconsistent again
-                * depending on which disks are used to calculate
-                * parity); the disk is broken; The stripe data of this
-                * disk is lost.
-                *
-                * We only allow DISCARD if the sysadmin has confirmed that
-                * only safe devices are in use by setting a module parameter.
-                * A better idea might be to turn DISCARD into WRITE_ZEROES
-                * requests, as that is required to be safe.
-                */
-               if (!devices_handle_discard_safely ||
-                   mddev->queue->limits.max_discard_sectors < (stripe >> 9) ||
-                   mddev->queue->limits.discard_granularity < stripe)
-                       blk_queue_max_discard_sectors(mddev->queue, 0);
-
-               /*
-                * Requests require having a bitmap for each stripe.
-                * Limit the max sectors based on this.
-                */
-               blk_queue_max_hw_sectors(mddev->queue,
-                       RAID5_MAX_REQ_STRIPES << RAID5_STRIPE_SHIFT(conf));
-
-               /* No restrictions on the number of segments in the request */
-               blk_queue_max_segments(mddev->queue, USHRT_MAX);
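+       /* For dm-raid, there is no md gendisk; the dm core sets the limits. */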
+       if (!mddev_is_dm(mddev)) {
+               ret = raid5_set_limits(mddev);
+               if (ret)
+                       goto abort;
        }
 
        if (log_init(conf, journal_dev, raid5_has_ppl(conf)))
@@ -8024,7 +8028,7 @@ abort:
        free_conf(conf);
        mddev->private = NULL;
        pr_warn("md/raid:%s: failed to run raid set.\n", mdname(mddev));
-       return -EIO;
+       return ret;
 }
 
 static void raid5_free(struct mddev *mddev, void *priv)
@@ -8506,29 +8510,8 @@ static int raid5_start_reshape(struct mddev *mddev)
        clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
        clear_bit(MD_RECOVERY_DONE, &mddev->recovery);
        set_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
-       set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
-       rcu_assign_pointer(mddev->sync_thread,
-                          md_register_thread(md_do_sync, mddev, "reshape"));
-       if (!mddev->sync_thread) {
-               mddev->recovery = 0;
-               spin_lock_irq(&conf->device_lock);
-               write_seqcount_begin(&conf->gen_lock);
-               mddev->raid_disks = conf->raid_disks = conf->previous_raid_disks;
-               mddev->new_chunk_sectors =
-                       conf->chunk_sectors = conf->prev_chunk_sectors;
-               mddev->new_layout = conf->algorithm = conf->prev_algo;
-               rdev_for_each(rdev, mddev)
-                       rdev->new_data_offset = rdev->data_offset;
-               smp_wmb();
-               conf->generation --;
-               conf->reshape_progress = MaxSector;
-               mddev->reshape_position = MaxSector;
-               write_seqcount_end(&conf->gen_lock);
-               spin_unlock_irq(&conf->device_lock);
-               return -EAGAIN;
-       }
+       set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
        conf->reshape_checkpoint = jiffies;
-       md_wakeup_thread(mddev->sync_thread);
        md_new_event();
        return 0;
 }
@@ -8556,8 +8539,8 @@ static void end_reshape(struct r5conf *conf)
                spin_unlock_irq(&conf->device_lock);
                wake_up(&conf->wait_for_overlap);
 
-               if (conf->mddev->queue)
-                       raid5_set_io_opt(conf);
+               mddev_update_io_opt(conf->mddev,
+                       conf->raid_disks - conf->max_degraded);
        }
 }
 
@@ -8934,6 +8917,18 @@ static int raid5_start(struct mddev *mddev)
        return r5l_start(conf->log);
 }
 
+/*
+ * This is only used for dm-raid456, where the caller has already frozen
+ * sync_thread. If reshape is still in progress, IO waiting for the reshape
+ * can never complete now, so wake up and handle that IO.
+ */
+static void raid5_prepare_suspend(struct mddev *mddev)
+{
+       struct r5conf *conf = mddev->private;
+
+       wake_up(&conf->wait_for_overlap);
+}
+
 static struct md_personality raid6_personality =
 {
        .name           = "raid6",
@@ -8957,6 +8952,7 @@ static struct md_personality raid6_personality =
        .quiesce        = raid5_quiesce,
        .takeover       = raid6_takeover,
        .change_consistency_policy = raid5_change_consistency_policy,
+       .prepare_suspend = raid5_prepare_suspend,
 };
 static struct md_personality raid5_personality =
 {
@@ -8981,6 +8977,7 @@ static struct md_personality raid5_personality =
        .quiesce        = raid5_quiesce,
        .takeover       = raid5_takeover,
        .change_consistency_policy = raid5_change_consistency_policy,
+       .prepare_suspend = raid5_prepare_suspend,
 };
 
 static struct md_personality raid4_personality =
@@ -9006,6 +9003,7 @@ static struct md_personality raid4_personality =
        .quiesce        = raid5_quiesce,
        .takeover       = raid4_takeover,
        .change_consistency_policy = raid5_change_consistency_policy,
+       .prepare_suspend = raid5_prepare_suspend,
 };
 
 static int __init raid5_init(void)
index 41a832dd1426bae695dc90b385976b2bf4b7a304..b6bf8f232f4880ffcd1f7ff0bd00ddb0bbebccb9 100644 (file)
@@ -989,7 +989,7 @@ int vb2_core_create_bufs(struct vb2_queue *q, enum vb2_memory memory,
        bool no_previous_buffers = !q_num_bufs;
        int ret = 0;
 
-       if (q->num_buffers == q->max_num_buffers) {
+       if (q_num_bufs == q->max_num_buffers) {
                dprintk(q, 1, "maximum number of buffers already allocated\n");
                return -ENOBUFS;
        }
index 54d572c3b515d67722c4dbe7490437bc83c30b96..c575198e83547ab99719eca7e81dc6e7c0e601d4 100644 (file)
@@ -671,8 +671,20 @@ int vb2_querybuf(struct vb2_queue *q, struct v4l2_buffer *b)
 }
 EXPORT_SYMBOL(vb2_querybuf);
 
-static void fill_buf_caps(struct vb2_queue *q, u32 *caps)
+static void vb2_set_flags_and_caps(struct vb2_queue *q, u32 memory,
+                                  u32 *flags, u32 *caps, u32 *max_num_bufs)
 {
+       if (!q->allow_cache_hints || memory != V4L2_MEMORY_MMAP) {
+               /*
+                * This needs to clear V4L2_MEMORY_FLAG_NON_COHERENT only,
+                * but in order to avoid bugs we zero out all bits.
+                */
+               *flags = 0;
+       } else {
+               /* Clear all unknown flags. */
+               *flags &= V4L2_MEMORY_FLAG_NON_COHERENT;
+       }
+
        *caps = V4L2_BUF_CAP_SUPPORTS_ORPHANED_BUFS;
        if (q->io_modes & VB2_MMAP)
                *caps |= V4L2_BUF_CAP_SUPPORTS_MMAP;
@@ -686,21 +698,9 @@ static void fill_buf_caps(struct vb2_queue *q, u32 *caps)
                *caps |= V4L2_BUF_CAP_SUPPORTS_MMAP_CACHE_HINTS;
        if (q->supports_requests)
                *caps |= V4L2_BUF_CAP_SUPPORTS_REQUESTS;
-}
-
-static void validate_memory_flags(struct vb2_queue *q,
-                                 int memory,
-                                 u32 *flags)
-{
-       if (!q->allow_cache_hints || memory != V4L2_MEMORY_MMAP) {
-               /*
-                * This needs to clear V4L2_MEMORY_FLAG_NON_COHERENT only,
-                * but in order to avoid bugs we zero out all bits.
-                */
-               *flags = 0;
-       } else {
-               /* Clear all unknown flags. */
-               *flags &= V4L2_MEMORY_FLAG_NON_COHERENT;
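+       /* max_num_bufs is only passed in on the CREATE_BUFS paths. */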
+       if (max_num_bufs) {
+               *max_num_bufs = q->max_num_buffers;
+               *caps |= V4L2_BUF_CAP_SUPPORTS_MAX_NUM_BUFFERS;
        }
 }
 
@@ -709,8 +709,8 @@ int vb2_reqbufs(struct vb2_queue *q, struct v4l2_requestbuffers *req)
        int ret = vb2_verify_memory_type(q, req->memory, req->type);
        u32 flags = req->flags;
 
-       fill_buf_caps(q, &req->capabilities);
-       validate_memory_flags(q, req->memory, &flags);
+       vb2_set_flags_and_caps(q, req->memory, &flags,
+                              &req->capabilities, NULL);
        req->flags = flags;
        return ret ? ret : vb2_core_reqbufs(q, req->memory,
                                            req->flags, &req->count);
@@ -751,11 +751,9 @@ int vb2_create_bufs(struct vb2_queue *q, struct v4l2_create_buffers *create)
        int ret = vb2_verify_memory_type(q, create->memory, f->type);
        unsigned i;
 
-       fill_buf_caps(q, &create->capabilities);
-       validate_memory_flags(q, create->memory, &create->flags);
        create->index = vb2_get_num_buffers(q);
-       create->max_num_buffers = q->max_num_buffers;
-       create->capabilities |= V4L2_BUF_CAP_SUPPORTS_MAX_NUM_BUFFERS;
+       vb2_set_flags_and_caps(q, create->memory, &create->flags,
+                              &create->capabilities, &create->max_num_buffers);
        if (create->count == 0)
                return ret != -EBUSY ? ret : 0;
 
@@ -1006,8 +1004,8 @@ int vb2_ioctl_reqbufs(struct file *file, void *priv,
        int res = vb2_verify_memory_type(vdev->queue, p->memory, p->type);
        u32 flags = p->flags;
 
-       fill_buf_caps(vdev->queue, &p->capabilities);
-       validate_memory_flags(vdev->queue, p->memory, &flags);
+       vb2_set_flags_and_caps(vdev->queue, p->memory, &flags,
+                              &p->capabilities, NULL);
        p->flags = flags;
        if (res)
                return res;
@@ -1026,12 +1024,11 @@ int vb2_ioctl_create_bufs(struct file *file, void *priv,
                          struct v4l2_create_buffers *p)
 {
        struct video_device *vdev = video_devdata(file);
-       int res = vb2_verify_memory_type(vdev->queue, p->memory,
-                       p->format.type);
+       int res = vb2_verify_memory_type(vdev->queue, p->memory, p->format.type);
 
-       p->index = vdev->queue->num_buffers;
-       fill_buf_caps(vdev->queue, &p->capabilities);
-       validate_memory_flags(vdev->queue, p->memory, &p->flags);
+       p->index = vb2_get_num_buffers(vdev->queue);
+       vb2_set_flags_and_caps(vdev->queue, p->memory, &p->flags,
+                              &p->capabilities, &p->max_num_buffers);
        /*
         * If count == 0, then just check if memory and type are valid.
         * Any -EBUSY result from vb2_verify_memory_type can be mapped to 0.
index bfe4caa79cc9800f7b01839fbbb768c73010a72e..0d90b5820bef7286694129ec0c4ed4f436d399b2 100644 (file)
@@ -272,7 +272,7 @@ static const struct wave5_match_data ti_wave521c_data = {
 };
 
 static const struct of_device_id wave5_dt_ids[] = {
-       { .compatible = "ti,k3-j721s2-wave521c", .data = &ti_wave521c_data },
+       { .compatible = "ti,j721s2-wave521c", .data = &ti_wave521c_data },
        { /* sentinel */ }
 };
 MODULE_DEVICE_TABLE(of, wave5_dt_ids);
index aebd3c12020bfd5d104c089354da0557de93547d..c381c22135a217b71e82f8282699cfbe14749ded 100644 (file)
@@ -725,6 +725,9 @@ irqreturn_t rkisp1_capture_isr(int irq, void *ctx)
        unsigned int i;
        u32 status;
 
+       if (!rkisp1->irqs_enabled)
+               return IRQ_NONE;
+
        status = rkisp1_read(rkisp1, RKISP1_CIF_MI_MIS);
        if (!status)
                return IRQ_NONE;
index 4b6b28c05b8916b7e5e079b6d2791009907c22db..b757f75edecf75256525e378151fe217c88ef2c4 100644 (file)
@@ -450,6 +450,7 @@ struct rkisp1_debug {
  * @debug:        debug params to be exposed on debugfs
  * @info:         version-specific ISP information
  * @irqs:          IRQ line numbers
+ * @irqs_enabled:  the hardware is enabled and can cause interrupts
  */
 struct rkisp1_device {
        void __iomem *base_addr;
@@ -471,6 +472,7 @@ struct rkisp1_device {
        struct rkisp1_debug debug;
        const struct rkisp1_info *info;
        int irqs[RKISP1_NUM_IRQS];
+       bool irqs_enabled;
 };
 
 /*
index b6e47e2f1b94916e51ec86b38d12b37ae186609d..4202642e052392a946761ecb76d3cfa771956e07 100644 (file)
@@ -196,6 +196,9 @@ irqreturn_t rkisp1_csi_isr(int irq, void *ctx)
        struct rkisp1_device *rkisp1 = dev_get_drvdata(dev);
        u32 val, status;
 
+       if (!rkisp1->irqs_enabled)
+               return IRQ_NONE;
+
        status = rkisp1_read(rkisp1, RKISP1_CIF_MIPI_MIS);
        if (!status)
                return IRQ_NONE;
index f96f821a7b50d0f10db51932d2b82986dcb16957..73cf08a740118c05328fdd3f1a9d52a6e935c4e0 100644 (file)
@@ -305,6 +305,24 @@ static int __maybe_unused rkisp1_runtime_suspend(struct device *dev)
 {
        struct rkisp1_device *rkisp1 = dev_get_drvdata(dev);
 
+       rkisp1->irqs_enabled = false;
+       /* Make sure the IRQ handler will see the above */
+       mb();
+
+       /*
+        * Wait until any running IRQ handler has returned. The IRQ handler
+        * may get called even after this (as it's a shared interrupt line)
+        * but the 'irqs_enabled' flag will make the handler return immediately.
+        */
+       for (unsigned int il = 0; il < ARRAY_SIZE(rkisp1->irqs); ++il) {
+               if (rkisp1->irqs[il] == -1)
+                       continue;
+
+               /* Skip if the irq line is the same as previous */
+               if (il == 0 || rkisp1->irqs[il - 1] != rkisp1->irqs[il])
+                       synchronize_irq(rkisp1->irqs[il]);
+       }
+
        clk_bulk_disable_unprepare(rkisp1->clk_size, rkisp1->clks);
        return pinctrl_pm_select_sleep_state(dev);
 }
@@ -321,6 +339,10 @@ static int __maybe_unused rkisp1_runtime_resume(struct device *dev)
        if (ret)
                return ret;
 
+       rkisp1->irqs_enabled = true;
+       /* Make sure the IRQ handler will see the above */
+       mb();
+
        return 0;
 }
 
@@ -559,7 +581,7 @@ static int rkisp1_probe(struct platform_device *pdev)
                                rkisp1->irqs[il] = irq;
                }
 
-               ret = devm_request_irq(dev, irq, info->isrs[i].isr, 0,
+               ret = devm_request_irq(dev, irq, info->isrs[i].isr, IRQF_SHARED,
                                       dev_driver_string(dev), dev);
                if (ret) {
                        dev_err(dev, "request irq failed: %d\n", ret);
index f00873d31c42b702d239e9e9fefcb8eddb599275..78a1f7a1499be84f15b94d75b30266dfb8c720ce 100644 (file)
@@ -976,6 +976,9 @@ irqreturn_t rkisp1_isp_isr(int irq, void *ctx)
        struct rkisp1_device *rkisp1 = dev_get_drvdata(dev);
        u32 status, isp_err;
 
+       if (!rkisp1->irqs_enabled)
+               return IRQ_NONE;
+
        status = rkisp1_read(rkisp1, RKISP1_CIF_ISP_MIS);
        if (!status)
                return IRQ_NONE;
index 2afe67ffa285e3755f35c5842827160b93f573b5..74d69ce22a33e801762bc156c8c40289b2b2d4cb 100644 (file)
@@ -319,6 +319,7 @@ config IR_PWM_TX
        tristate "PWM IR transmitter"
        depends on LIRC
        depends on PWM
+       depends on HIGH_RES_TIMERS
        depends on OF
        help
           Say Y if you want to use a PWM based IR transmitter. This is
index fe17c7f98e8101afdae3d608ab12a2ef7f971d0e..52d82cbe7685f5b5adadf4448a171fcb146612b8 100644 (file)
@@ -253,7 +253,7 @@ int lirc_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
        if (attr->attach_flags)
                return -EINVAL;
 
-       rcdev = rc_dev_get_from_fd(attr->target_fd);
+       rcdev = rc_dev_get_from_fd(attr->target_fd, true);
        if (IS_ERR(rcdev))
                return PTR_ERR(rcdev);
 
@@ -278,7 +278,7 @@ int lirc_prog_detach(const union bpf_attr *attr)
        if (IS_ERR(prog))
                return PTR_ERR(prog);
 
-       rcdev = rc_dev_get_from_fd(attr->target_fd);
+       rcdev = rc_dev_get_from_fd(attr->target_fd, true);
        if (IS_ERR(rcdev)) {
                bpf_prog_put(prog);
                return PTR_ERR(rcdev);
@@ -303,7 +303,7 @@ int lirc_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr)
        if (attr->query.query_flags)
                return -EINVAL;
 
-       rcdev = rc_dev_get_from_fd(attr->query.target_fd);
+       rcdev = rc_dev_get_from_fd(attr->query.target_fd, false);
        if (IS_ERR(rcdev))
                return PTR_ERR(rcdev);
 
index 1968067092594979942f030af29bf4b484b5fd09..69e630d85262f65f413ee8c9d092ea85cee01c91 100644 (file)
@@ -332,6 +332,7 @@ static int irtoy_tx(struct rc_dev *rc, uint *txbuf, uint count)
                            sizeof(COMMAND_SMODE_EXIT), STATE_COMMAND_NO_RESP);
        if (err) {
                dev_err(irtoy->dev, "exit sample mode: %d\n", err);
+               kfree(buf);
                return err;
        }
 
@@ -339,6 +340,7 @@ static int irtoy_tx(struct rc_dev *rc, uint *txbuf, uint count)
                            sizeof(COMMAND_SMODE_ENTER), STATE_COMMAND);
        if (err) {
                dev_err(irtoy->dev, "enter sample mode: %d\n", err);
+               kfree(buf);
                return err;
        }
 
index a537734832c5080498d263428a96d7b1d13dcb88..caad59f76793f750f757c8fe5e58fe569b6b4322 100644 (file)
@@ -814,7 +814,7 @@ void __exit lirc_dev_exit(void)
        unregister_chrdev_region(lirc_base_dev, RC_DEV_MAX);
 }
 
-struct rc_dev *rc_dev_get_from_fd(int fd)
+struct rc_dev *rc_dev_get_from_fd(int fd, bool write)
 {
        struct fd f = fdget(fd);
        struct lirc_fh *fh;
@@ -828,6 +828,9 @@ struct rc_dev *rc_dev_get_from_fd(int fd)
                return ERR_PTR(-EINVAL);
        }
 
+       if (write && !(f.file->f_mode & FMODE_WRITE)) {
+               fdput(f);
+               return ERR_PTR(-EPERM);
+       }
+
        fh = f.file->private_data;
        dev = fh->rc;
 
index ef1e95e1af7fcccda49324e375b940cc92c627f2..7df949fc65e2b68bf88c12643410330fc1ad4635 100644 (file)
@@ -325,7 +325,7 @@ void lirc_raw_event(struct rc_dev *dev, struct ir_raw_event ev);
 void lirc_scancode_event(struct rc_dev *dev, struct lirc_scancode *lsc);
 int lirc_register(struct rc_dev *dev);
 void lirc_unregister(struct rc_dev *dev);
-struct rc_dev *rc_dev_get_from_fd(int fd);
+struct rc_dev *rc_dev_get_from_fd(int fd, bool write);
 #else
 static inline int lirc_dev_init(void) { return 0; }
 static inline void lirc_dev_exit(void) {}
index 04115cd92433bfb9134708f0e3373bb21baa7c19..47a314a4eb6fafc76482ff3010b35298d3423ec7 100644 (file)
@@ -2078,6 +2078,12 @@ static const struct blk_mq_ops msb_mq_ops = {
 static int msb_init_disk(struct memstick_dev *card)
 {
        struct msb_data *msb = memstick_get_drvdata(card);
+       struct queue_limits lim = {
+               .logical_block_size     = msb->page_size,
+               .max_hw_sectors         = MS_BLOCK_MAX_PAGES,
+               .max_segments           = MS_BLOCK_MAX_SEGS,
+               .max_segment_size       = MS_BLOCK_MAX_PAGES * msb->page_size,
+       };
        int rc;
        unsigned long capacity;
 
@@ -2093,19 +2099,13 @@ static int msb_init_disk(struct memstick_dev *card)
        if (rc)
                goto out_release_id;
 
-       msb->disk = blk_mq_alloc_disk(&msb->tag_set, card);
+       msb->disk = blk_mq_alloc_disk(&msb->tag_set, &lim, card);
        if (IS_ERR(msb->disk)) {
                rc = PTR_ERR(msb->disk);
                goto out_free_tag_set;
        }
        msb->queue = msb->disk->queue;
 
-       blk_queue_max_hw_sectors(msb->queue, MS_BLOCK_MAX_PAGES);
-       blk_queue_max_segments(msb->queue, MS_BLOCK_MAX_SEGS);
-       blk_queue_max_segment_size(msb->queue,
-                                  MS_BLOCK_MAX_PAGES * msb->page_size);
-       blk_queue_logical_block_size(msb->queue, msb->page_size);
-
        sprintf(msb->disk->disk_name, "msblk%d", msb->disk_id);
        msb->disk->fops = &msb_bdops;
        msb->disk->private_data = msb;
index 5a69ed33999b4c4c0e5565dc9a3b58f2e8e0871d..49accfdc89d616cea5eb307a56ce85f68b2b5c6e 100644 (file)
@@ -1103,6 +1103,12 @@ static const struct blk_mq_ops mspro_mq_ops = {
 static int mspro_block_init_disk(struct memstick_dev *card)
 {
        struct mspro_block_data *msb = memstick_get_drvdata(card);
+       struct queue_limits lim = {
+               .logical_block_size     = msb->page_size,
+               .max_hw_sectors         = MSPRO_BLOCK_MAX_PAGES,
+               .max_segments           = MSPRO_BLOCK_MAX_SEGS,
+               .max_segment_size       = MSPRO_BLOCK_MAX_PAGES * msb->page_size,
+       };
        struct mspro_devinfo *dev_info = NULL;
        struct mspro_sys_info *sys_info = NULL;
        struct mspro_sys_attr *s_attr = NULL;
@@ -1138,18 +1144,13 @@ static int mspro_block_init_disk(struct memstick_dev *card)
        if (rc)
                goto out_release_id;
 
-       msb->disk = blk_mq_alloc_disk(&msb->tag_set, card);
+       msb->disk = blk_mq_alloc_disk(&msb->tag_set, &lim, card);
        if (IS_ERR(msb->disk)) {
                rc = PTR_ERR(msb->disk);
                goto out_free_tag_set;
        }
        msb->queue = msb->disk->queue;
 
-       blk_queue_max_hw_sectors(msb->queue, MSPRO_BLOCK_MAX_PAGES);
-       blk_queue_max_segments(msb->queue, MSPRO_BLOCK_MAX_SEGS);
-       blk_queue_max_segment_size(msb->queue,
-                                  MSPRO_BLOCK_MAX_PAGES * msb->page_size);
-
        msb->disk->major = major;
        msb->disk->first_minor = disk_id << MSPRO_BLOCK_PART_SHIFT;
        msb->disk->minors = 1 << MSPRO_BLOCK_PART_SHIFT;
@@ -1158,8 +1159,6 @@ static int mspro_block_init_disk(struct memstick_dev *card)
 
        sprintf(msb->disk->disk_name, "mspblk%d", disk_id);
 
-       blk_queue_logical_block_size(msb->queue, msb->page_size);
-
        capacity = be16_to_cpu(sys_info->user_block_count);
        capacity *= be16_to_cpu(sys_info->block_size);
        capacity *= msb->page_size >> 9;
index 1c6c62a7f7f5535f4c1025ee0d957006a4c5deb4..dbd26c3b245bca56adffc8288d5657dd3c61d3b8 100644 (file)
@@ -263,7 +263,6 @@ struct fastrpc_channel_ctx {
        int domain_id;
        int sesscount;
        int vmcount;
-       u64 perms;
        struct qcom_scm_vmperm vmperms[FASTRPC_MAX_VMIDS];
        struct rpmsg_device *rpdev;
        struct fastrpc_session_ctx session[FASTRPC_MAX_SESSIONS];
@@ -1279,9 +1278,11 @@ static int fastrpc_init_create_static_process(struct fastrpc_user *fl,
 
                /* Map if we have any heap VMIDs associated with this ADSP Static Process. */
                if (fl->cctx->vmcount) {
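+                       /* The memory is currently owned by HLOS. */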
+                       u64 src_perms = BIT(QCOM_SCM_VMID_HLOS);
+
                        err = qcom_scm_assign_mem(fl->cctx->remote_heap->phys,
                                                        (u64)fl->cctx->remote_heap->size,
-                                                       &fl->cctx->perms,
+                                                       &src_perms,
                                                        fl->cctx->vmperms, fl->cctx->vmcount);
                        if (err) {
                                dev_err(fl->sctx->dev, "Failed to assign memory with phys 0x%llx size 0x%llx err %d",
@@ -1915,8 +1916,10 @@ static int fastrpc_req_mmap(struct fastrpc_user *fl, char __user *argp)
 
        /* Add memory to static PD pool, protection thru hypervisor */
        if (req.flags == ADSP_MMAP_REMOTE_HEAP_ADDR && fl->cctx->vmcount) {
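+               /* The memory is currently owned by HLOS. */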
+               u64 src_perms = BIT(QCOM_SCM_VMID_HLOS);
+
                err = qcom_scm_assign_mem(buf->phys, (u64)buf->size,
-                       &fl->cctx->perms, fl->cctx->vmperms, fl->cctx->vmcount);
+                       &src_perms, fl->cctx->vmperms, fl->cctx->vmcount);
                if (err) {
                        dev_err(fl->sctx->dev, "Failed to assign memory phys 0x%llx size 0x%llx err %d",
                                        buf->phys, buf->size, err);
@@ -2191,7 +2194,7 @@ static int fastrpc_cb_remove(struct platform_device *pdev)
        int i;
 
        spin_lock_irqsave(&cctx->lock, flags);
-       for (i = 1; i < FASTRPC_MAX_SESSIONS; i++) {
+       for (i = 0; i < FASTRPC_MAX_SESSIONS; i++) {
                if (cctx->session[i].sid == sess->sid) {
                        cctx->session[i].valid = false;
                        cctx->sesscount--;
@@ -2290,7 +2293,6 @@ static int fastrpc_rpmsg_probe(struct rpmsg_device *rpdev)
 
        if (vmcount) {
                data->vmcount = vmcount;
-               data->perms = BIT(QCOM_SCM_VMID_HLOS);
                for (i = 0; i < data->vmcount; i++) {
                        data->vmperms[i].vmid = vmids[i];
                        data->vmperms[i].perm = QCOM_SCM_PERM_RWX;
index c6eb27d46cb06de4ade9c0cdbbdd270ffe8ab474..15119584473cafbaaa96ce18f059571cfc2196bc 100644 (file)
@@ -198,8 +198,14 @@ static int lis3lv02d_i2c_suspend(struct device *dev)
        struct i2c_client *client = to_i2c_client(dev);
        struct lis3lv02d *lis3 = i2c_get_clientdata(client);
 
-       if (!lis3->pdata || !lis3->pdata->wakeup_flags)
+       /* Turn on for wakeup if turned off by runtime suspend */
+       if (lis3->pdata && lis3->pdata->wakeup_flags) {
+               if (pm_runtime_suspended(dev))
+                       lis3lv02d_poweron(lis3);
+       /* For non-wakeup, turn off if not already turned off by runtime suspend */
+       } else if (!pm_runtime_suspended(dev))
                lis3lv02d_poweroff(lis3);
+
        return 0;
 }
 
@@ -208,13 +214,12 @@ static int lis3lv02d_i2c_resume(struct device *dev)
        struct i2c_client *client = to_i2c_client(dev);
        struct lis3lv02d *lis3 = i2c_get_clientdata(client);
 
-       /*
-        * pm_runtime documentation says that devices should always
-        * be powered on at resume. Pm_runtime turns them off after system
-        * wide resume is complete.
-        */
-       if (!lis3->pdata || !lis3->pdata->wakeup_flags ||
-               pm_runtime_suspended(dev))
+       /* Turn back off if turned on for wakeup and runtime suspended */
+       if (lis3->pdata && lis3->pdata->wakeup_flags) {
+               if (pm_runtime_suspended(dev))
+                       lis3lv02d_poweroff(lis3);
+       /* For non-wakeup, turn back on if not runtime suspended */
+       } else if (!pm_runtime_suspended(dev))
                lis3lv02d_poweron(lis3);
 
        return 0;
index be52b113aea937c7c658e06c012815cec8552f28..89364bdbb1290f5726a34945679e341b17289493 100644 (file)
@@ -96,7 +96,8 @@ static const struct component_master_ops mei_component_master_ops = {
  *
  *    The function checks if the device is pci device and
  *    Intel VGA adapter, the subcomponent is SW Proxy
- *    and the parent of MEI PCI and the parent of VGA are the same PCH device.
+ *    and the VGA adapter is on bus 0, which is reserved for built-in
+ *    devices, so that discrete GFX is rejected.
  *
  * @dev: master device
  * @subcomponent: subcomponent to match (I915_COMPONENT_SWPROXY)
@@ -123,7 +124,8 @@ static int mei_gsc_proxy_component_match(struct device *dev, int subcomponent,
        if (subcomponent != I915_COMPONENT_GSC_PROXY)
                return 0;
 
-       return component_compare_dev(dev->parent, ((struct device *)data)->parent);
+       /* Only built-in GFX */
+       return (pdev->bus->number == 0);
 }
 
 static int mei_gsc_proxy_probe(struct mei_cl_device *cldev,
@@ -146,7 +148,7 @@ static int mei_gsc_proxy_probe(struct mei_cl_device *cldev,
        }
 
        component_match_add_typed(&cldev->dev, &master_match,
-                                 mei_gsc_proxy_component_match, cldev->dev.parent);
+                                 mei_gsc_proxy_component_match, NULL);
        if (IS_ERR_OR_NULL(master_match)) {
                ret = -ENOMEM;
                goto err_exit;
index 961e5d53a27a8c4221b4b33c9d4e70f0f0155ee7..aac36750d2c54a658debcca55063d2e2a02bf1ce 100644 (file)
 #define MEI_DEV_ID_RPL_S      0x7A68  /* Raptor Lake Point S */
 
 #define MEI_DEV_ID_MTL_M      0x7E70  /* Meteor Lake Point M */
+#define MEI_DEV_ID_ARL_S      0x7F68  /* Arrow Lake Point S */
+#define MEI_DEV_ID_ARL_H      0x7770  /* Arrow Lake Point H */
 
 /*
  * MEI HW Section
index 676d566f38ddfd2cbb5c167f5691f737b4fcf01c..8cf636c5403225f7588a2428318cec1ff7fd2700 100644 (file)
@@ -119,6 +119,8 @@ static const struct pci_device_id mei_me_pci_tbl[] = {
        {MEI_PCI_DEVICE(MEI_DEV_ID_RPL_S, MEI_ME_PCH15_CFG)},
 
        {MEI_PCI_DEVICE(MEI_DEV_ID_MTL_M, MEI_ME_PCH15_CFG)},
+       {MEI_PCI_DEVICE(MEI_DEV_ID_ARL_S, MEI_ME_PCH15_CFG)},
+       {MEI_PCI_DEVICE(MEI_DEV_ID_ARL_H, MEI_ME_PCH15_CFG)},
 
        /* required last entry */
        {0, }
index 6f4a4be6ccb5508dbc8857ce21b7643e93c1ea63..55f7db490d3bbbabd08b7bd2ba6ff078abb93ad6 100644 (file)
@@ -535,6 +535,7 @@ static const struct acpi_device_id vsc_tp_acpi_ids[] = {
        { "INTC1009" }, /* Raptor Lake */
        { "INTC1058" }, /* Tiger Lake */
        { "INTC1094" }, /* Alder Lake */
+       { "INTC10D0" }, /* Meteor Lake */
        {}
 };
 MODULE_DEVICE_TABLE(acpi, vsc_tp_acpi_ids);
index 8aea2d070a40c23e0a0ed9495d8039f9fa6804ac..d279a4f195e2a343a8332d25c25e85a89b6ac88f 100644 (file)
@@ -140,7 +140,6 @@ static int __init open_dice_probe(struct platform_device *pdev)
                return -ENOMEM;
 
        *drvdata = (struct open_dice_drvdata){
-               .lock = __MUTEX_INITIALIZER(drvdata->lock),
                .rmem = rmem,
                .misc = (struct miscdevice){
                        .parent = dev,
@@ -150,6 +149,7 @@ static int __init open_dice_probe(struct platform_device *pdev)
                        .mode   = 0600,
                },
        };
+       mutex_init(&drvdata->lock);
 
        /* Index overflow check not needed, misc_register() will fail. */
        snprintf(drvdata->name, sizeof(drvdata->name), DRIVER_NAME"%u", dev_idx++);
index f410bee501328f6af96b4f0029d4856e45e06766..58ed7193a3ca460fe58a46427306b385a40a2d3e 100644 (file)
@@ -1015,10 +1015,12 @@ static int mmc_select_bus_width(struct mmc_card *card)
        static unsigned ext_csd_bits[] = {
                EXT_CSD_BUS_WIDTH_8,
                EXT_CSD_BUS_WIDTH_4,
+               EXT_CSD_BUS_WIDTH_1,
        };
        static unsigned bus_widths[] = {
                MMC_BUS_WIDTH_8,
                MMC_BUS_WIDTH_4,
+               MMC_BUS_WIDTH_1,
        };
        struct mmc_host *host = card->host;
        unsigned idx, bus_width = 0;
index a0a2412f62a7304278220a9f72e0ea84d4e2a508..2ae60d208cdf1ee2243aa99a340bd5be06d8bbc3 100644 (file)
@@ -174,8 +174,8 @@ static struct scatterlist *mmc_alloc_sg(unsigned short sg_len, gfp_t gfp)
        return sg;
 }
 
-static void mmc_queue_setup_discard(struct request_queue *q,
-                                   struct mmc_card *card)
+static void mmc_queue_setup_discard(struct mmc_card *card,
+               struct queue_limits *lim)
 {
        unsigned max_discard;
 
@@ -183,15 +183,17 @@ static void mmc_queue_setup_discard(struct request_queue *q,
        if (!max_discard)
                return;
 
-       blk_queue_max_discard_sectors(q, max_discard);
-       q->limits.discard_granularity = card->pref_erase << 9;
-       /* granularity must not be greater than max. discard */
-       if (card->pref_erase > max_discard)
-               q->limits.discard_granularity = SECTOR_SIZE;
+       lim->max_hw_discard_sectors = max_discard;
        if (mmc_can_secure_erase_trim(card))
-               blk_queue_max_secure_erase_sectors(q, max_discard);
+               lim->max_secure_erase_sectors = max_discard;
        if (mmc_can_trim(card) && card->erased_byte == 0)
-               blk_queue_max_write_zeroes_sectors(q, max_discard);
+               lim->max_write_zeroes_sectors = max_discard;
+
+       /* granularity must not be greater than max. discard */
+       if (card->pref_erase > max_discard)
+               lim->discard_granularity = SECTOR_SIZE;
+       else
+               lim->discard_granularity = card->pref_erase << 9;
 }
 
 static unsigned short mmc_get_max_segments(struct mmc_host *host)
@@ -341,40 +343,53 @@ static const struct blk_mq_ops mmc_mq_ops = {
        .timeout        = mmc_mq_timed_out,
 };
 
-static void mmc_setup_queue(struct mmc_queue *mq, struct mmc_card *card)
+static struct gendisk *mmc_alloc_disk(struct mmc_queue *mq,
+               struct mmc_card *card)
 {
        struct mmc_host *host = card->host;
-       unsigned block_size = 512;
+       struct queue_limits lim = { };
+       struct gendisk *disk;
 
-       blk_queue_flag_set(QUEUE_FLAG_NONROT, mq->queue);
-       blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, mq->queue);
        if (mmc_can_erase(card))
-               mmc_queue_setup_discard(mq->queue, card);
+               mmc_queue_setup_discard(card, &lim);
 
        if (!mmc_dev(host)->dma_mask || !*mmc_dev(host)->dma_mask)
-               blk_queue_bounce_limit(mq->queue, BLK_BOUNCE_HIGH);
-       blk_queue_max_hw_sectors(mq->queue,
-               min(host->max_blk_count, host->max_req_size / 512));
-       if (host->can_dma_map_merge)
-               WARN(!blk_queue_can_use_dma_map_merging(mq->queue,
-                                                       mmc_dev(host)),
-                    "merging was advertised but not possible");
-       blk_queue_max_segments(mq->queue, mmc_get_max_segments(host));
-
-       if (mmc_card_mmc(card) && card->ext_csd.data_sector_size) {
-               block_size = card->ext_csd.data_sector_size;
-               WARN_ON(block_size != 512 && block_size != 4096);
-       }
+               lim.bounce = BLK_BOUNCE_HIGH;
+
+       lim.max_hw_sectors = min(host->max_blk_count, host->max_req_size / 512);
+
+       if (mmc_card_mmc(card) && card->ext_csd.data_sector_size)
+               lim.logical_block_size = card->ext_csd.data_sector_size;
+       else
+               lim.logical_block_size = 512;
+
+       WARN_ON_ONCE(lim.logical_block_size != 512 &&
+                    lim.logical_block_size != 4096);
 
-       blk_queue_logical_block_size(mq->queue, block_size);
        /*
-        * After blk_queue_can_use_dma_map_merging() was called with succeed,
-        * since it calls blk_queue_virt_boundary(), the mmc should not call
-        * both blk_queue_max_segment_size().
+        * Setting a virt_boundary implicitly sets a max_segment_size, so try
+        * to set the hardware one here.
         */
-       if (!host->can_dma_map_merge)
-               blk_queue_max_segment_size(mq->queue,
-                       round_down(host->max_seg_size, block_size));
+       if (host->can_dma_map_merge) {
+               lim.virt_boundary_mask = dma_get_merge_boundary(mmc_dev(host));
+               lim.max_segments = MMC_DMA_MAP_MERGE_SEGMENTS;
+       } else {
+               lim.max_segment_size =
+                       round_down(host->max_seg_size, lim.logical_block_size);
+               lim.max_segments = host->max_segs;
+       }
+
+       disk = blk_mq_alloc_disk(&mq->tag_set, &lim, mq);
+       if (IS_ERR(disk))
+               return disk;
+       mq->queue = disk->queue;
+
+       if (mmc_host_is_spi(host) && host->use_spi_crc)
+               blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, mq->queue);
+       blk_queue_rq_timeout(mq->queue, 60 * HZ);
+
+       blk_queue_flag_set(QUEUE_FLAG_NONROT, mq->queue);
+       blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, mq->queue);
 
        dma_set_max_seg_size(mmc_dev(host), queue_max_segment_size(mq->queue));
 
@@ -386,6 +401,7 @@ static void mmc_setup_queue(struct mmc_queue *mq, struct mmc_card *card)
        init_waitqueue_head(&mq->wait);
 
        mmc_crypto_setup_queue(mq->queue, host);
+       return disk;
 }
 
 static inline bool mmc_merge_capable(struct mmc_host *host)
@@ -447,18 +463,9 @@ struct gendisk *mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card)
                return ERR_PTR(ret);
 
-       disk = blk_mq_alloc_disk(&mq->tag_set, mq);
-       if (IS_ERR(disk)) {
+       disk = mmc_alloc_disk(mq, card);
+       if (IS_ERR(disk))
                blk_mq_free_tag_set(&mq->tag_set);
-               return disk;
-       }
-       mq->queue = disk->queue;
-
-       if (mmc_host_is_spi(host) && host->use_spi_crc)
-               blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, mq->queue);
-       blk_queue_rq_timeout(mq->queue, 60 * HZ);
-
-       mmc_setup_queue(mq, card);
        return disk;
 }
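
For reference, this is the shape of the limits-first allocation pattern the mmc conversion above follows: describe all queue limits up front and hand them to blk_mq_alloc_disk() instead of calling blk_queue_* setters afterwards. A minimal sketch, assuming the v6.9 block API; the limit values are illustrative:

#include <linux/blk-mq.h>

static struct gendisk *example_alloc_disk(struct blk_mq_tag_set *set,
					  void *queuedata)
{
	struct queue_limits lim = {
		.logical_block_size	= 512,
		.max_hw_sectors		= 256,	/* illustrative caps */
		.max_segments		= 64,
	};

	/* all limits are validated and applied atomically at allocation */
	return blk_mq_alloc_disk(set, &lim, queuedata);
}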
 
index 2a2d949a9344ea78b540337d471996f2ec77d53e..39f45c2b6de8a885e12af08f302d8bc2dce15d61 100644 (file)
@@ -75,11 +75,15 @@ EXPORT_SYMBOL(mmc_gpio_set_cd_irq);
 int mmc_gpio_get_ro(struct mmc_host *host)
 {
        struct mmc_gpio *ctx = host->slot.handler_priv;
+       int cansleep;
 
        if (!ctx || !ctx->ro_gpio)
                return -ENOSYS;
 
-       return gpiod_get_value_cansleep(ctx->ro_gpio);
+       cansleep = gpiod_cansleep(ctx->ro_gpio);
+       return cansleep ?
+               gpiod_get_value_cansleep(ctx->ro_gpio) :
+               gpiod_get_value(ctx->ro_gpio);
 }
 EXPORT_SYMBOL(mmc_gpio_get_ro);
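
The pattern above generalizes: gpiod_cansleep() reports whether reading a descriptor may sleep (for example a GPIO expander behind I2C), so callers that may run in atomic context pick the accessor accordingly. A minimal sketch of that dispatch:

#include <linux/gpio/consumer.h>

static int example_read_gpio(struct gpio_desc *desc)
{
	if (gpiod_cansleep(desc))
		return gpiod_get_value_cansleep(desc);	/* may sleep */

	return gpiod_get_value(desc);	/* safe in atomic context */
}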
 
index 35067e1e6cd8017b1bb37683f9dda6169af5cbf1..f5da7f9baa52d4b29cd396f0aa88e1ff7891666c 100644 (file)
@@ -225,6 +225,8 @@ static int sdmmc_idma_start(struct mmci_host *host, unsigned int *datactrl)
        struct scatterlist *sg;
        int i;
 
+       host->dma_in_progress = true;
+
        if (!host->variant->dma_lli || data->sg_len == 1 ||
            idma->use_bounce_buffer) {
                u32 dma_addr;
@@ -263,9 +265,30 @@ static int sdmmc_idma_start(struct mmci_host *host, unsigned int *datactrl)
        return 0;
 }
 
+static void sdmmc_idma_error(struct mmci_host *host)
+{
+       struct mmc_data *data = host->data;
+       struct sdmmc_idma *idma = host->dma_priv;
+
+       if (!dma_inprogress(host))
+               return;
+
+       writel_relaxed(0, host->base + MMCI_STM32_IDMACTRLR);
+       host->dma_in_progress = false;
+       data->host_cookie = 0;
+
+       if (!idma->use_bounce_buffer)
+               dma_unmap_sg(mmc_dev(host->mmc), data->sg, data->sg_len,
+                            mmc_get_dma_dir(data));
+}
+
 static void sdmmc_idma_finalize(struct mmci_host *host, struct mmc_data *data)
 {
+       if (!dma_inprogress(host))
+               return;
+
        writel_relaxed(0, host->base + MMCI_STM32_IDMACTRLR);
+       host->dma_in_progress = false;
 
        if (!data->host_cookie)
                sdmmc_idma_unprep_data(host, data, 0);
@@ -676,6 +699,7 @@ static struct mmci_host_ops sdmmc_variant_ops = {
        .dma_setup = sdmmc_idma_setup,
        .dma_start = sdmmc_idma_start,
        .dma_finalize = sdmmc_idma_finalize,
+       .dma_error = sdmmc_idma_error,
        .set_clkreg = mmci_sdmmc_set_clkreg,
        .set_pwrreg = mmci_sdmmc_set_pwrreg,
        .busy_complete = sdmmc_busy_complete,
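
Both new early returns above implement the same idempotent-teardown guard: a flag set when DMA starts and cleared by whichever of the error or finalize paths runs first, so the second caller backs off instead of unmapping twice. A minimal sketch of the pattern, with illustrative names:

struct example_host {
	bool dma_in_progress;
	/* ... DMA channel state ... */
};

static void example_dma_teardown(struct example_host *host)
{
	if (!host->dma_in_progress)
		return;		/* already torn down, or never started */

	host->dma_in_progress = false;
	/* quiesce the engine and unmap buffers exactly once here */
}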
index 7bfee28116af12ebdf08efb7b0e51e68cb602956..d4a02184784a3458b55b601d1fba1216e1ef3149 100644 (file)
@@ -693,6 +693,35 @@ static int sdhci_pci_o2_init_sd_express(struct mmc_host *mmc, struct mmc_ios *io
        return 0;
 }
 
+static void sdhci_pci_o2_set_power(struct sdhci_host *host, unsigned char mode, unsigned short vdd)
+{
+       struct sdhci_pci_chip *chip;
+       struct sdhci_pci_slot *slot = sdhci_priv(host);
+       u32 scratch_32 = 0;
+       u8 scratch_8 = 0;
+
+       chip = slot->chip;
+
+       if (mode == MMC_POWER_OFF) {
+               /* UnLock WP */
+               pci_read_config_byte(chip->pdev, O2_SD_LOCK_WP, &scratch_8);
+               scratch_8 &= 0x7f;
+               pci_write_config_byte(chip->pdev, O2_SD_LOCK_WP, scratch_8);
+
+               /* Set PCR 0x354[16] to switch Clock Source back to OPE Clock */
+               pci_read_config_dword(chip->pdev, O2_SD_OUTPUT_CLK_SOURCE_SWITCH, &scratch_32);
+               scratch_32 &= ~(O2_SD_SEL_DLL);
+               pci_write_config_dword(chip->pdev, O2_SD_OUTPUT_CLK_SOURCE_SWITCH, scratch_32);
+
+               /* Lock WP */
+               pci_read_config_byte(chip->pdev, O2_SD_LOCK_WP, &scratch_8);
+               scratch_8 |= 0x80;
+               pci_write_config_byte(chip->pdev, O2_SD_LOCK_WP, scratch_8);
+       }
+
+       sdhci_set_power(host, mode, vdd);
+}
+
 static int sdhci_pci_o2_probe_slot(struct sdhci_pci_slot *slot)
 {
        struct sdhci_pci_chip *chip;
@@ -1051,6 +1080,7 @@ static const struct sdhci_ops sdhci_pci_o2_ops = {
        .set_bus_width = sdhci_set_bus_width,
        .reset = sdhci_reset,
        .set_uhs_signaling = sdhci_set_uhs_signaling,
+       .set_power = sdhci_pci_o2_set_power,
 };
 
 const struct sdhci_pci_fixes sdhci_o2 = {
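
The power-off path above is a standard unlock/modify/relock sequence over PCI config space. A minimal sketch of that read-modify-write idiom; the register offset and lock bit here are hypothetical, not the O2 ones:

#include <linux/pci.h>

#define EXAMPLE_WP_REG	0xd0	/* hypothetical lock register */
#define EXAMPLE_WP_BIT	0x80

static void example_cfg_rmw(struct pci_dev *pdev)
{
	u8 val;

	pci_read_config_byte(pdev, EXAMPLE_WP_REG, &val);
	pci_write_config_byte(pdev, EXAMPLE_WP_REG, val & ~EXAMPLE_WP_BIT);

	/* ... touch the now-writable registers ... */

	pci_read_config_byte(pdev, EXAMPLE_WP_REG, &val);
	pci_write_config_byte(pdev, EXAMPLE_WP_REG, val | EXAMPLE_WP_BIT);
}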
index 8cf3a375de659a6d98b7dcfc2f4e2be09f7c4a5d..cc9d28b75eb911733d847a1d0c19cf24d9a3f755 100644 (file)
@@ -11,6 +11,7 @@
 #include <linux/slab.h>
 #include <linux/delay.h>
 #include <linux/ktime.h>
+#include <linux/iopoll.h>
 #include <linux/of_address.h>
 
 #include "sdhci-pltfm.h"
 #define XENON_EMMC_PHY_LOGIC_TIMING_ADJUST     (XENON_EMMC_PHY_REG_BASE + 0x18)
 #define XENON_LOGIC_TIMING_VALUE               0x00AA8977
 
+#define XENON_MAX_PHY_TIMEOUT_LOOPS            100
+
 /*
  * List offset of PHY registers and some special register values
  * in eMMC PHY 5.0 or eMMC PHY 5.1
@@ -216,6 +219,19 @@ static int xenon_alloc_emmc_phy(struct sdhci_host *host)
        return 0;
 }
 
+static int xenon_check_stability_internal_clk(struct sdhci_host *host)
+{
+       u32 reg;
+       int err;
+
+       err = read_poll_timeout(sdhci_readw, reg, reg & SDHCI_CLOCK_INT_STABLE,
+                               1100, 20000, false, host, SDHCI_CLOCK_CONTROL);
+       if (err)
+               dev_err(mmc_dev(host->mmc), "phy_init: Internal clock never stabilized.\n");
+
+       return err;
+}
+
 /*
  * eMMC 5.0/5.1 PHY init/re-init.
  * eMMC PHY init should be executed after:
@@ -232,6 +248,11 @@ static int xenon_emmc_phy_init(struct sdhci_host *host)
        struct xenon_priv *priv = sdhci_pltfm_priv(pltfm_host);
        struct xenon_emmc_phy_regs *phy_regs = priv->emmc_phy_regs;
 
+       int ret = xenon_check_stability_internal_clk(host);
+
+       if (ret)
+               return ret;
+
        reg = sdhci_readl(host, phy_regs->timing_adj);
        reg |= XENON_PHY_INITIALIZAION;
        sdhci_writel(host, reg, phy_regs->timing_adj);
@@ -259,18 +280,27 @@ static int xenon_emmc_phy_init(struct sdhci_host *host)
        /* get the wait time */
        wait /= clock;
        wait++;
-       /* wait for host eMMC PHY init completes */
-       udelay(wait);
 
-       reg = sdhci_readl(host, phy_regs->timing_adj);
-       reg &= XENON_PHY_INITIALIZAION;
-       if (reg) {
+       /*
+        * AC5X spec says bit must be polled until zero.
+        * We see cases in which timeout can take longer
+        * than the standard calculation on AC5X, which is
+        * expected following the spec comment above.
+        * According to the spec, we must wait as long as
+        * it takes for that bit to toggle on AC5X.
+        * Cap that with 100 delay loops so we won't get
+        * stuck here forever:
+        */
+
+       ret = read_poll_timeout(sdhci_readl, reg,
+                               !(reg & XENON_PHY_INITIALIZAION),
+                               wait, XENON_MAX_PHY_TIMEOUT_LOOPS * wait,
+                               false, host, phy_regs->timing_adj);
+       if (ret)
                dev_err(mmc_dev(host->mmc), "eMMC PHY init cannot complete after %d us\n",
-                       wait);
-               return -ETIMEDOUT;
-       }
+                       wait * XENON_MAX_PHY_TIMEOUT_LOOPS);
 
-       return 0;
+       return ret;
 }
 
 #define ARMADA_3700_SOC_PAD_1_8V       0x1
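
Both xenon hunks above lean on read_poll_timeout() from <linux/iopoll.h>, which repeatedly evaluates op(args...) into val until cond holds or timeout_us expires, returning 0 or -ETIMEDOUT. A minimal sketch against a plain MMIO register; the address and ready bit are illustrative:

#include <linux/io.h>
#include <linux/iopoll.h>

static int example_wait_ready(void __iomem *addr, u32 ready_bit)
{
	u32 reg;

	/* poll every 10 us, give up after 100 ms */
	return read_poll_timeout(readl, reg, reg & ready_bit,
				 10, 100000, false, addr);
}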
index aa44a23ec0451e70a11da895e64dc5a14d2dcbd7..97a00ec9a4d48944a8233b49c5fa0106493abb47 100644 (file)
@@ -37,7 +37,7 @@
 /* Info for the block device */
 struct block2mtd_dev {
        struct list_head list;
-       struct bdev_handle *bdev_handle;
+       struct file *bdev_file;
        struct mtd_info mtd;
        struct mutex write_mutex;
 };
@@ -55,8 +55,7 @@ static struct page *page_read(struct address_space *mapping, pgoff_t index)
 /* erase a specified part of the device */
 static int _block2mtd_erase(struct block2mtd_dev *dev, loff_t to, size_t len)
 {
-       struct address_space *mapping =
-                               dev->bdev_handle->bdev->bd_inode->i_mapping;
+       struct address_space *mapping = dev->bdev_file->f_mapping;
        struct page *page;
        pgoff_t index = to >> PAGE_SHIFT;       // page index
        int pages = len >> PAGE_SHIFT;
@@ -106,8 +105,7 @@ static int block2mtd_read(struct mtd_info *mtd, loff_t from, size_t len,
                size_t *retlen, u_char *buf)
 {
        struct block2mtd_dev *dev = mtd->priv;
-       struct address_space *mapping =
-                               dev->bdev_handle->bdev->bd_inode->i_mapping;
+       struct address_space *mapping = dev->bdev_file->f_mapping;
        struct page *page;
        pgoff_t index = from >> PAGE_SHIFT;
        int offset = from & (PAGE_SIZE-1);
@@ -142,8 +140,7 @@ static int _block2mtd_write(struct block2mtd_dev *dev, const u_char *buf,
                loff_t to, size_t len, size_t *retlen)
 {
        struct page *page;
-       struct address_space *mapping =
-                               dev->bdev_handle->bdev->bd_inode->i_mapping;
+       struct address_space *mapping = dev->bdev_file->f_mapping;
        pgoff_t index = to >> PAGE_SHIFT;       // page index
        int offset = to & ~PAGE_MASK;   // page offset
        int cpylen;
@@ -198,7 +195,7 @@ static int block2mtd_write(struct mtd_info *mtd, loff_t to, size_t len,
 static void block2mtd_sync(struct mtd_info *mtd)
 {
        struct block2mtd_dev *dev = mtd->priv;
-       sync_blockdev(dev->bdev_handle->bdev);
+       sync_blockdev(file_bdev(dev->bdev_file));
        return;
 }
 
@@ -210,10 +207,9 @@ static void block2mtd_free_device(struct block2mtd_dev *dev)
 
        kfree(dev->mtd.name);
 
-       if (dev->bdev_handle) {
-               invalidate_mapping_pages(
-                       dev->bdev_handle->bdev->bd_inode->i_mapping, 0, -1);
-               bdev_release(dev->bdev_handle);
+       if (dev->bdev_file) {
+               invalidate_mapping_pages(dev->bdev_file->f_mapping, 0, -1);
+               fput(dev->bdev_file);
        }
 
        kfree(dev);
@@ -223,10 +219,10 @@ static void block2mtd_free_device(struct block2mtd_dev *dev)
  * This function is marked __ref because it calls the __init marked
  * early_lookup_bdev when called from the early boot code.
  */
-static struct bdev_handle __ref *mdtblock_early_get_bdev(const char *devname,
+static struct file __ref *mdtblock_early_get_bdev(const char *devname,
                blk_mode_t mode, int timeout, struct block2mtd_dev *dev)
 {
-       struct bdev_handle *bdev_handle = ERR_PTR(-ENODEV);
+       struct file *bdev_file = ERR_PTR(-ENODEV);
 #ifndef MODULE
        int i;
 
@@ -234,7 +230,7 @@ static struct bdev_handle __ref *mdtblock_early_get_bdev(const char *devname,
         * We can't use early_lookup_bdev from a running system.
         */
        if (system_state >= SYSTEM_RUNNING)
-               return bdev_handle;
+               return bdev_file;
 
        /*
         * We might not have the root device mounted at this point.
@@ -253,20 +249,20 @@ static struct bdev_handle __ref *mdtblock_early_get_bdev(const char *devname,
                wait_for_device_probe();
 
                if (!early_lookup_bdev(devname, &devt)) {
-                       bdev_handle = bdev_open_by_dev(devt, mode, dev, NULL);
-                       if (!IS_ERR(bdev_handle))
+                       bdev_file = bdev_file_open_by_dev(devt, mode, dev, NULL);
+                       if (!IS_ERR(bdev_file))
                                break;
                }
        }
 #endif
-       return bdev_handle;
+       return bdev_file;
 }
 
 static struct block2mtd_dev *add_device(char *devname, int erase_size,
                char *label, int timeout)
 {
        const blk_mode_t mode = BLK_OPEN_READ | BLK_OPEN_WRITE;
-       struct bdev_handle *bdev_handle;
+       struct file *bdev_file;
        struct block_device *bdev;
        struct block2mtd_dev *dev;
        char *name;
@@ -279,16 +275,16 @@ static struct block2mtd_dev *add_device(char *devname, int erase_size,
                return NULL;
 
        /* Get a handle on the device */
-       bdev_handle = bdev_open_by_path(devname, mode, dev, NULL);
-       if (IS_ERR(bdev_handle))
-               bdev_handle = mdtblock_early_get_bdev(devname, mode, timeout,
+       bdev_file = bdev_file_open_by_path(devname, mode, dev, NULL);
+       if (IS_ERR(bdev_file))
+               bdev_file = mdtblock_early_get_bdev(devname, mode, timeout,
                                                      dev);
-       if (IS_ERR(bdev_handle)) {
+       if (IS_ERR(bdev_file)) {
                pr_err("error: cannot open device %s\n", devname);
                goto err_free_block2mtd;
        }
-       dev->bdev_handle = bdev_handle;
-       bdev = bdev_handle->bdev;
+       dev->bdev_file = bdev_file;
+       bdev = file_bdev(bdev_file);
 
        if (MAJOR(bdev->bd_dev) == MTD_BLOCK_MAJOR) {
                pr_err("attempting to use an MTD device as a block device\n");
index f0526dcc216276fa212f3cc5d21c8a9bc250d64b..3caa0717d46c012fcf882ee504d743574d072023 100644 (file)
@@ -277,6 +277,7 @@ int add_mtd_blktrans_dev(struct mtd_blktrans_dev *new)
 {
        struct mtd_blktrans_ops *tr = new->tr;
        struct mtd_blktrans_dev *d;
+       struct queue_limits lim = { };
        int last_devnum = -1;
        struct gendisk *gd;
        int ret;
@@ -331,9 +332,13 @@ int add_mtd_blktrans_dev(struct mtd_blktrans_dev *new)
                        BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_BLOCKING);
        if (ret)
                goto out_kfree_tag_set;
 
+       lim.logical_block_size = tr->blksize;
+       if (tr->discard)
+               lim.max_hw_discard_sectors = UINT_MAX;
+
        /* Create gendisk */
-       gd = blk_mq_alloc_disk(new->tag_set, new);
+       gd = blk_mq_alloc_disk(new->tag_set, &lim, new);
        if (IS_ERR(gd)) {
                ret = PTR_ERR(gd);
                goto out_free_tag_set;
@@ -371,14 +376,9 @@ int add_mtd_blktrans_dev(struct mtd_blktrans_dev *new)
        if (tr->flush)
                blk_queue_write_cache(new->rq, true, false);
 
-       blk_queue_logical_block_size(new->rq, tr->blksize);
-
        blk_queue_flag_set(QUEUE_FLAG_NONROT, new->rq);
        blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, new->rq);
 
-       if (tr->discard)
-               blk_queue_max_discard_sectors(new->rq, UINT_MAX);
-
        gd->queue = new->rq;
 
        if (new->readonly)
index e451b28840d58b2b0e6b5fdd4d50fe809dd29de4..5887feb347a4e42aa1dcc779bc7f5b252402b16e 100644 (file)
@@ -621,6 +621,7 @@ static void mtd_check_of_node(struct mtd_info *mtd)
                if (plen == mtd_name_len &&
                    !strncmp(mtd->name, pname + offset, plen)) {
                        mtd_set_of_node(mtd, mtd_dn);
+                       of_node_put(mtd_dn);
                        break;
                }
        }
index a466987448502e0b576b612f42139f878d4014ff..5b0f5a9cef81b5fbc1494cb9f00b123ea06f0d35 100644 (file)
@@ -290,16 +290,13 @@ static const struct marvell_hw_ecc_layout marvell_nfc_layouts[] = {
        MARVELL_LAYOUT( 2048,   512,  4,  1,  1, 2048, 32, 30,  0,  0,  0),
        MARVELL_LAYOUT( 2048,   512,  8,  2,  1, 1024,  0, 30,1024,32, 30),
        MARVELL_LAYOUT( 2048,   512,  8,  2,  1, 1024,  0, 30,1024,64, 30),
-       MARVELL_LAYOUT( 2048,   512,  12, 3,  2, 704,   0, 30,640,  0, 30),
-       MARVELL_LAYOUT( 2048,   512,  16, 5,  4, 512,   0, 30,  0, 32, 30),
+       MARVELL_LAYOUT( 2048,   512,  16, 4,  4, 512,   0, 30,  0, 32, 30),
        MARVELL_LAYOUT( 4096,   512,  4,  2,  2, 2048, 32, 30,  0,  0,  0),
-       MARVELL_LAYOUT( 4096,   512,  8,  5,  4, 1024,  0, 30,  0, 64, 30),
-       MARVELL_LAYOUT( 4096,   512,  12, 6,  5, 704,   0, 30,576, 32, 30),
-       MARVELL_LAYOUT( 4096,   512,  16, 9,  8, 512,   0, 30,  0, 32, 30),
+       MARVELL_LAYOUT( 4096,   512,  8,  4,  4, 1024,  0, 30,  0, 64, 30),
+       MARVELL_LAYOUT( 4096,   512,  16, 8,  8, 512,   0, 30,  0, 32, 30),
        MARVELL_LAYOUT( 8192,   512,  4,  4,  4, 2048,  0, 30,  0,  0,  0),
-       MARVELL_LAYOUT( 8192,   512,  8,  9,  8, 1024,  0, 30,  0, 160, 30),
-       MARVELL_LAYOUT( 8192,   512,  12, 12, 11, 704,  0, 30,448,  64, 30),
-       MARVELL_LAYOUT( 8192,   512,  16, 17, 16, 512,  0, 30,  0,  32, 30),
+       MARVELL_LAYOUT( 8192,   512,  8,  8,  8, 1024,  0, 30,  0, 160, 30),
+       MARVELL_LAYOUT( 8192,   512,  16, 16, 16, 512,  0, 30,  0,  32, 30),
 };
 
 /**
index 987710e09441adefbf238948e759fec4d049b126..6023cba748bb858373a54dd58c5808057d9841fc 100644 (file)
@@ -186,7 +186,7 @@ static int gd5fxgq4uexxg_ecc_get_status(struct spinand_device *spinand,
 {
        u8 status2;
        struct spi_mem_op op = SPINAND_GET_FEATURE_OP(GD5FXGQXXEXXG_REG_STATUS2,
-                                                     &status2);
+                                                     spinand->scratchbuf);
        int ret;
 
        switch (status & STATUS_ECC_MASK) {
@@ -207,6 +207,7 @@ static int gd5fxgq4uexxg_ecc_get_status(struct spinand_device *spinand,
                 * report the maximum of 4 in this case
                 */
                /* bits sorted this way (3...0): ECCS1,ECCS0,ECCSE1,ECCSE0 */
+               status2 = *(spinand->scratchbuf);
                return ((status & STATUS_ECC_MASK) >> 2) |
                        ((status2 & STATUS_ECC_MASK) >> 4);
 
@@ -228,7 +229,7 @@ static int gd5fxgq5xexxg_ecc_get_status(struct spinand_device *spinand,
 {
        u8 status2;
        struct spi_mem_op op = SPINAND_GET_FEATURE_OP(GD5FXGQXXEXXG_REG_STATUS2,
-                                                     &status2);
+                                                     spinand->scratchbuf);
        int ret;
 
        switch (status & STATUS_ECC_MASK) {
@@ -248,6 +249,7 @@ static int gd5fxgq5xexxg_ecc_get_status(struct spinand_device *spinand,
                 * 1 ... 4 bits are flipped (and corrected)
                 */
                /* bits sorted this way (1...0): ECCSE1, ECCSE0 */
+               status2 = *(spinand->scratchbuf);
                return ((status2 & STATUS_ECC_MASK) >> 4) + 1;
 
        case STATUS_ECC_UNCOR_ERROR:
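
The fix above matters because SPI controllers may DMA straight into the supplied buffer, and an on-stack u8 is not DMA-safe; the read therefore lands in the driver's kmalloc'd scratchbuf and is copied out afterwards. A minimal sketch of the bounce-buffer pattern, with illustrative names:

struct example_dev {
	u8 *scratchbuf;		/* kmalloc'd, so DMA-capable */
};

/* stand-in for the real bus transfer (e.g. a spi_mem op) */
static int example_bus_read(struct example_dev *dev, u8 *buf, size_t len);

static int example_read_reg(struct example_dev *dev, u8 *out)
{
	int ret = example_bus_read(dev, dev->scratchbuf, 1);

	if (ret)
		return ret;

	*out = *dev->scratchbuf;	/* copy to caller storage afterwards */
	return 0;
}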
index 654bd7372cd8c09c69bf7c205a728f263dcc3dfc..5c8fdcc088a0df73e4ee4684a4406780fddd978e 100644 (file)
@@ -348,6 +348,9 @@ static int calc_disk_capacity(struct ubi_volume_info *vi, u64 *disk_capacity)
 
 int ubiblock_create(struct ubi_volume_info *vi)
 {
+       struct queue_limits lim = {
+               .max_segments           = UBI_MAX_SG_COUNT,
+       };
        struct ubiblock *dev;
        struct gendisk *gd;
        u64 disk_capacity;
@@ -393,7 +396,7 @@ int ubiblock_create(struct ubi_volume_info *vi)
 
 
        /* Initialize the gendisk of this ubiblock device */
-       gd = blk_mq_alloc_disk(&dev->tag_set, dev);
+       gd = blk_mq_alloc_disk(&dev->tag_set, &lim, dev);
        if (IS_ERR(gd)) {
                ret = PTR_ERR(gd);
                goto out_free_tags;
@@ -416,7 +419,6 @@ int ubiblock_create(struct ubi_volume_info *vi)
        dev->gd = gd;
 
        dev->rq = gd->queue;
-       blk_queue_max_segments(dev->rq, UBI_MAX_SG_COUNT);
 
        list_add_tail(&dev->list, &ubiblock_devices);
 
index 8c651fdee039aab85019b9a574b813c51aa04ef6..57f1729066f28b76de26bd3c5145cbe6bcc2f9ac 100644 (file)
@@ -186,4 +186,5 @@ static void __exit arcnet_raw_exit(void)
 module_init(arcnet_raw_init);
 module_exit(arcnet_raw_exit);
 
+MODULE_DESCRIPTION("ARCnet raw mode packet interface module");
 MODULE_LICENSE("GPL");
index 8c3ccc7c83cd3cd92e73c44ebd09a6fa8f8c0c41..53d10a04d1bd0a765e34dd6e56217a59d5485cc9 100644 (file)
@@ -312,6 +312,7 @@ module_param(node, int, 0);
 module_param(io, int, 0);
 module_param(irq, int, 0);
 module_param_string(device, device, sizeof(device), 0);
+MODULE_DESCRIPTION("ARCnet COM90xx RIM I chipset driver");
 MODULE_LICENSE("GPL");
 
 static struct net_device *my_dev;
index c09b567845e1eeb025eff26c77ad5e22f6db150f..7a0a799737698f8ebb834b592d86c849aaf8f371 100644 (file)
@@ -265,4 +265,5 @@ static void __exit capmode_module_exit(void)
 module_init(capmode_module_init);
 module_exit(capmode_module_exit);
 
+MODULE_DESCRIPTION("ARCnet CAP mode packet interface module");
 MODULE_LICENSE("GPL");
index 7b5c8bb02f11941f6210200c23ee2f74272d49a3..c5e571ec94c990d0a76a19e3ab418730548c0ce1 100644 (file)
@@ -61,6 +61,7 @@ module_param(timeout, int, 0);
 module_param(backplane, int, 0);
 module_param(clockp, int, 0);
 module_param(clockm, int, 0);
+MODULE_DESCRIPTION("ARCnet COM20020 chipset PCI driver");
 MODULE_LICENSE("GPL");
 
 static void led_tx_set(struct led_classdev *led_cdev,
index 06e1651b594ba813fc5a0d75931c12ec299c100a..a0053e3992a364ef3e1d3c4328b4baf8956d248d 100644 (file)
@@ -399,6 +399,7 @@ EXPORT_SYMBOL(com20020_found);
 EXPORT_SYMBOL(com20020_netdev_ops);
 #endif
 
+MODULE_DESCRIPTION("ARCnet COM20020 chipset core driver");
 MODULE_LICENSE("GPL");
 
 #ifdef MODULE
index dc3253b318dafc3e668df035880a8bccabeb0fdc..75f08aa7528b4620180da471e87e225a39fcd06b 100644 (file)
@@ -97,6 +97,7 @@ module_param(backplane, int, 0);
 module_param(clockp, int, 0);
 module_param(clockm, int, 0);
 
+MODULE_DESCRIPTION("ARCnet COM20020 chipset PCMCIA driver");
 MODULE_LICENSE("GPL");
 
 /*====================================================================*/
index 37b47749fc8b4afb24ae60151ac5b316554ab391..3b463fbc6402114322278a6648a422f4a7ea2bff 100644 (file)
@@ -350,6 +350,7 @@ static char device[9];              /* use eg. device=arc1 to change name */
 module_param_hw(io, int, ioport, 0);
 module_param_hw(irq, int, irq, 0);
 module_param_string(device, device, sizeof(device), 0);
+MODULE_DESCRIPTION("ARCnet COM90xx IO mapped chipset driver");
 MODULE_LICENSE("GPL");
 
 #ifndef MODULE
index f49dae1942846d866d11aad4d619202fd7a8cb01..b3b287c1656179b6656d62a53e488ffcd85178ac 100644 (file)
@@ -645,6 +645,7 @@ static void com90xx_copy_from_card(struct net_device *dev, int bufnum,
        TIME(dev, "memcpy_fromio", count, memcpy_fromio(buf, memaddr, count));
 }
 
+MODULE_DESCRIPTION("ARCnet COM90xx normal chipset driver");
 MODULE_LICENSE("GPL");
 
 static int __init com90xx_init(void)
index a7752a5b647fcd4128e3e0c7b59961ad0a83c660..46519ca63a0aa5459732fd140d62ad679bc3a519 100644 (file)
@@ -78,6 +78,7 @@ static void __exit arcnet_rfc1051_exit(void)
 module_init(arcnet_rfc1051_init);
 module_exit(arcnet_rfc1051_exit);
 
+MODULE_DESCRIPTION("ARCNet packet format (RFC 1051) module");
 MODULE_LICENSE("GPL");
 
 /* Determine a packet's protocol ID.
index a4c856282674b3d3d74007f43fb7c7272ea046b8..0edf35d971c56ef15a11814167a2100927975f61 100644 (file)
@@ -35,6 +35,7 @@
 
 #include "arcdevice.h"
 
+MODULE_DESCRIPTION("ARCNet packet format (RFC 1201) module");
 MODULE_LICENSE("GPL");
 
 static __be16 type_trans(struct sk_buff *skb, struct net_device *dev);
index 4e0600c7b050f21c82a8862e224bb055e95d5039..cd0683bcca038d59336ee548b4aaa6acdb2c21a5 100644 (file)
@@ -1811,7 +1811,7 @@ void bond_xdp_set_features(struct net_device *bond_dev)
 
        ASSERT_RTNL();
 
-       if (!bond_xdp_check(bond)) {
+       if (!bond_xdp_check(bond) || !bond_has_slaves(bond)) {
                xdp_clear_features_flag(bond_dev);
                return;
        }
@@ -1819,6 +1819,8 @@ void bond_xdp_set_features(struct net_device *bond_dev)
        bond_for_each_slave(bond, slave, iter)
                val &= slave->dev->xdp_features;
 
+       val &= ~NETDEV_XDP_ACT_XSK_ZEROCOPY;
+
        xdp_set_features_flag(bond_dev, val);
 }
 
@@ -5909,9 +5911,6 @@ void bond_setup(struct net_device *bond_dev)
        if (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP)
                bond_dev->features |= BOND_XFRM_FEATURES;
 #endif /* CONFIG_XFRM_OFFLOAD */
-
-       if (bond_xdp_check(bond))
-               bond_dev->xdp_features = NETDEV_XDP_ACT_MASK;
 }
 
 /* Destroy a bonding device.
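
The reporting logic above reduces to: with no slaves advertise nothing, otherwise advertise only the intersection of every slave's XDP features, minus zero-copy, which bonding cannot pass through. A self-contained sketch of that reduction:

#include <linux/types.h>

static u64 example_common_xdp_features(const u64 *slave_feats, size_t n,
				       u64 unsupported)
{
	u64 val = ~0ULL;
	size_t i;

	if (!n)
		return 0;		/* no slaves: nothing to offer */

	for (i = 0; i < n; i++)
		val &= slave_feats[i];	/* intersection across slaves */

	return val & ~unsupported;	/* e.g. XSK zero-copy */
}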
index 036d85ef07f5ba676611e4a20ea470ab7691ffc1..dfdc039d92a6c114a1d5204c3a42ad170ca1b400 100644 (file)
@@ -346,7 +346,7 @@ static int can_changelink(struct net_device *dev, struct nlattr *tb[],
                        /* Neither of TDC parameters nor TDC flags are
                         * provided: do calculation
                         */
-                       can_calc_tdco(&priv->tdc, priv->tdc_const, &priv->data_bittiming,
+                       can_calc_tdco(&priv->tdc, priv->tdc_const, &dbt,
                                      &priv->ctrlmode, priv->ctrlmode_supported);
                } /* else: both CAN_CTRLMODE_TDC_{AUTO,MANUAL} are explicitly
                   * turned off. TDC is disabled: do nothing
index 237066d307044583167923fcfc54d88a1ff53bc4..14ca42491512c62d71dac84e6e7d4dd91b2eb19b 100644 (file)
@@ -32,4 +32,5 @@ static int __init dsa_loop_bdinfo_init(void)
 }
 arch_initcall(dsa_loop_bdinfo_init)
 
+MODULE_DESCRIPTION("DSA mock-up switch driver");
 MODULE_LICENSE("GPL");
index 61b71bcfe39625b271e0d932f92ee4564d9d87e2..c3da97abce2027a1ed9173a98b4d08fc755dc1d0 100644 (file)
@@ -49,9 +49,9 @@ static int ksz8_ind_write8(struct ksz_device *dev, u8 table, u16 addr, u8 data)
        mutex_lock(&dev->alu_mutex);
 
        ctrl_addr = IND_ACC_TABLE(table) | addr;
-       ret = ksz_write8(dev, regs[REG_IND_BYTE], data);
+       ret = ksz_write16(dev, regs[REG_IND_CTRL_0], ctrl_addr);
        if (!ret)
-               ret = ksz_write16(dev, regs[REG_IND_CTRL_0], ctrl_addr);
+               ret = ksz_write8(dev, regs[REG_IND_BYTE], data);
 
        mutex_unlock(&dev->alu_mutex);
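
The swap above restores the hardware's required ordering for indirect access: program the control/address register first, then the data register, or the byte is written through a stale address. A minimal sketch, with stand-ins for the ksz regmap helpers and hypothetical register offsets:

static int example_write16(void *dev, unsigned int reg, u16 val);
static int example_write8(void *dev, unsigned int reg, u8 val);

static int example_indirect_write8(void *dev, u16 ctrl_addr, u8 data)
{
	int ret;

	/* address/control first ... */
	ret = example_write16(dev, /* hypothetical REG_IND_CTRL_0 */ 0x30,
			      ctrl_addr);
	if (!ret)
		/* ... then the payload byte */
		ret = example_write8(dev, /* hypothetical REG_IND_BYTE */ 0x26,
				     data);

	return ret;
}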
 
index 391c4dbdff4283d0b077608a59e4c95758eb24cf..3c1f657593a8f364e5db9500d06257f34373af8a 100644 (file)
@@ -2838,8 +2838,7 @@ static void mt753x_phylink_mac_link_up(struct dsa_switch *ds, int port,
        /* MT753x MAC works in 1G full duplex mode for all up-clocked
         * variants.
         */
-       if (interface == PHY_INTERFACE_MODE_INTERNAL ||
-           interface == PHY_INTERFACE_MODE_TRGMII ||
+       if (interface == PHY_INTERFACE_MODE_TRGMII ||
            (phy_interface_mode_is_8023z(interface))) {
                speed = SPEED_1000;
                duplex = DUPLEX_FULL;
index 383b3c4d6f599c57358d8970c9c26941231e9898..614cabb5c1b039d8d6df6789589455fe00f09e70 100644 (file)
@@ -3659,7 +3659,7 @@ static int mv88e6xxx_mdio_read_c45(struct mii_bus *bus, int phy, int devad,
        int err;
 
        if (!chip->info->ops->phy_read_c45)
-               return -EOPNOTSUPP;
+               return 0xffff;
 
        mv88e6xxx_reg_lock(chip);
        err = chip->info->ops->phy_read_c45(chip, bus, phy, devad, reg, &val);
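
Returning 0xffff rather than an errno mirrors real bus behavior: a read from an absent MDIO device floats the lines high, so phylib treats all-ones as "nothing here" and keeps scanning instead of aborting the probe. A sketch of that convention, with illustrative helpers:

struct example_bus;

static bool example_has_c45(const struct example_bus *bus);
static int example_c45_read(struct example_bus *bus, int phy, int devad,
			    int reg);

static int example_mdio_read_c45(struct example_bus *bus, int phy,
				 int devad, int reg)
{
	if (!example_has_c45(bus))
		return 0xffff;	/* look like an empty bus address */

	return example_c45_read(bus, phy, devad, reg);
}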
index c51f40960961f2b10a2f4191e4b8f5a50af9fc86..7a864329cb7267a9431a181a183c94b6f791f91e 100644 (file)
@@ -2051,12 +2051,11 @@ qca8k_sw_probe(struct mdio_device *mdiodev)
        priv->info = of_device_get_match_data(priv->dev);
 
        priv->reset_gpio = devm_gpiod_get_optional(priv->dev, "reset",
-                                                  GPIOD_ASIS);
+                                                  GPIOD_OUT_HIGH);
        if (IS_ERR(priv->reset_gpio))
                return PTR_ERR(priv->reset_gpio);
 
        if (priv->reset_gpio) {
-               gpiod_set_value_cansleep(priv->reset_gpio, 1);
                /* The active low duration must be greater than 10 ms
                 * and checkpatch.pl wants 20 ms.
                 */
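
GPIOD_OUT_HIGH requests the line already driven to logical 1, i.e. asserted even for an active-low reset, which is why the separate gpiod_set_value_cansleep(..., 1) call above could be dropped. A minimal sketch of the resulting reset sequence:

#include <linux/delay.h>
#include <linux/gpio/consumer.h>

static int example_hw_reset(struct device *dev)
{
	struct gpio_desc *reset;

	/* the line comes back already asserted (logical 1) */
	reset = devm_gpiod_get_optional(dev, "reset", GPIOD_OUT_HIGH);
	if (IS_ERR(reset))
		return PTR_ERR(reset);

	if (reset) {
		msleep(20);				/* hold reset */
		gpiod_set_value_cansleep(reset, 0);	/* release */
	}

	return 0;
}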
index 0e0aa40168588fff69ff76fc9fa0b3b442319f58..c5636245f1cad225bbd1a638a0bea855db566235 100644 (file)
@@ -100,4 +100,5 @@ static void __exit ns8390_module_exit(void)
 module_init(ns8390_module_init);
 module_exit(ns8390_module_exit);
 #endif /* MODULE */
+MODULE_DESCRIPTION("National Semiconductor 8390 core driver");
 MODULE_LICENSE("GPL");
index 6834742057b3eb041065793057b86d31aba9d2c1..6d429b11e9c6aa5ce0a1ea3a6c3925a3672dd2bc 100644 (file)
@@ -102,4 +102,5 @@ static void __exit NS8390p_cleanup_module(void)
 
 module_init(NS8390p_init_module);
 module_exit(NS8390p_cleanup_module);
+MODULE_DESCRIPTION("National Semiconductor 8390 core for ISA driver");
 MODULE_LICENSE("GPL");
index a09f383dd249f1e1782d20de475bdac76a90c54d..828edca8d30c59dec13c8a764fe0f2ed39ceb8ea 100644 (file)
@@ -610,4 +610,5 @@ static int init_pcmcia(void)
        return 1;
 }
 
+MODULE_DESCRIPTION("National Semiconductor 8390 Amiga PCMCIA ethernet driver");
 MODULE_LICENSE("GPL");
index 24f49a8ff903ff3ae8496074c19df0235b7965cf..fd9dcdc356e681b4eba0698d7f61bbd0196fcb8f 100644 (file)
@@ -270,4 +270,5 @@ static void __exit hydra_cleanup_module(void)
 module_init(hydra_init_module);
 module_exit(hydra_cleanup_module);
 
+MODULE_DESCRIPTION("Zorro-II Hydra 8390 ethernet driver");
 MODULE_LICENSE("GPL");
index 265976e3b64ab227c55924bd8e0581b24b506d42..6cc0e190aa79c129ca65becb4699a5c55e034340 100644 (file)
@@ -296,4 +296,5 @@ static void __exit stnic_cleanup(void)
 
 module_init(stnic_probe);
 module_exit(stnic_cleanup);
+MODULE_DESCRIPTION("National Semiconductor DP83902AV ethernet driver");
 MODULE_LICENSE("GPL");
index d70390e9d03d9bfe421554ef9e50ce78123bddfe..c24dd4fe7a10666a25b53fc5246280c9891b10ac 100644 (file)
@@ -443,4 +443,5 @@ static void __exit zorro8390_cleanup_module(void)
 module_init(zorro8390_init_module);
 module_exit(zorro8390_cleanup_module);
 
+MODULE_DESCRIPTION("Zorro NS8390-based ethernet driver");
 MODULE_LICENSE("GPL");
index da3bdd3025022c3dd7286c3b7873e1a108767025..760a9a60bc15c1849f6b70e7d3f5b99c58667523 100644 (file)
@@ -21,6 +21,7 @@ config ADIN1110
        tristate "Analog Devices ADIN1110 MAC-PHY"
        depends on SPI && NET_SWITCHDEV
        select CRC8
+       select PHYLIB
        help
          Say yes here to build support for Analog Devices ADIN1110
          Low Power 10BASE-T1L Ethernet MAC-PHY.
index 5beadabc213618314ad42da120da259415eaa7b3..ea773cfa0af67bd06d86037bc8208b4111b3bbc5 100644 (file)
@@ -63,6 +63,15 @@ static int pdsc_process_notifyq(struct pdsc_qcq *qcq)
        return nq_work;
 }
 
+static bool pdsc_adminq_inc_if_up(struct pdsc *pdsc)
+{
+       if (pdsc->state & BIT_ULL(PDSC_S_STOPPING_DRIVER) ||
+           pdsc->state & BIT_ULL(PDSC_S_FW_DEAD))
+               return false;
+
+       return refcount_inc_not_zero(&pdsc->adminq_refcnt);
+}
+
 void pdsc_process_adminq(struct pdsc_qcq *qcq)
 {
        union pds_core_adminq_comp *comp;
@@ -75,9 +84,9 @@ void pdsc_process_adminq(struct pdsc_qcq *qcq)
        int aq_work = 0;
        int credits;
 
-       /* Don't process AdminQ when shutting down */
-       if (pdsc->state & BIT_ULL(PDSC_S_STOPPING_DRIVER)) {
-               dev_err(pdsc->dev, "%s: called while PDSC_S_STOPPING_DRIVER\n",
+       /* Don't process AdminQ when it's not up */
+       if (!pdsc_adminq_inc_if_up(pdsc)) {
+               dev_err(pdsc->dev, "%s: called while adminq is unavailable\n",
                        __func__);
                return;
        }
@@ -124,6 +133,7 @@ credits:
                pds_core_intr_credits(&pdsc->intr_ctrl[qcq->intx],
                                      credits,
                                      PDS_CORE_INTR_CRED_REARM);
+       refcount_dec(&pdsc->adminq_refcnt);
 }
 
 void pdsc_work_thread(struct work_struct *work)
@@ -135,18 +145,20 @@ void pdsc_work_thread(struct work_struct *work)
 
 irqreturn_t pdsc_adminq_isr(int irq, void *data)
 {
-       struct pdsc_qcq *qcq = data;
-       struct pdsc *pdsc = qcq->pdsc;
+       struct pdsc *pdsc = data;
+       struct pdsc_qcq *qcq;
 
-       /* Don't process AdminQ when shutting down */
-       if (pdsc->state & BIT_ULL(PDSC_S_STOPPING_DRIVER)) {
-               dev_err(pdsc->dev, "%s: called while PDSC_S_STOPPING_DRIVER\n",
+       /* Don't process AdminQ when it's not up */
+       if (!pdsc_adminq_inc_if_up(pdsc)) {
+               dev_err(pdsc->dev, "%s: called while adminq is unavailable\n",
                        __func__);
                return IRQ_HANDLED;
        }
 
+       qcq = &pdsc->adminqcq;
        queue_work(pdsc->wq, &qcq->work);
        pds_core_intr_mask(&pdsc->intr_ctrl[qcq->intx], PDS_CORE_INTR_MASK_CLEAR);
+       refcount_dec(&pdsc->adminq_refcnt);
 
        return IRQ_HANDLED;
 }
@@ -179,10 +191,16 @@ static int __pdsc_adminq_post(struct pdsc *pdsc,
 
        /* Check that the FW is running */
        if (!pdsc_is_fw_running(pdsc)) {
-               u8 fw_status = ioread8(&pdsc->info_regs->fw_status);
-
-               dev_info(pdsc->dev, "%s: post failed - fw not running %#02x:\n",
-                        __func__, fw_status);
+               if (pdsc->info_regs) {
+                       u8 fw_status =
+                               ioread8(&pdsc->info_regs->fw_status);
+
+                       dev_info(pdsc->dev, "%s: post failed - fw not running %#02x:\n",
+                                __func__, fw_status);
+               } else {
+                       dev_info(pdsc->dev, "%s: post failed - BARs not setup\n",
+                                __func__);
+               }
                ret = -ENXIO;
 
                goto err_out_unlock;
@@ -230,6 +248,12 @@ int pdsc_adminq_post(struct pdsc *pdsc,
        int err = 0;
        int index;
 
+       if (!pdsc_adminq_inc_if_up(pdsc)) {
+               dev_dbg(pdsc->dev, "%s: preventing adminq cmd %u\n",
+                       __func__, cmd->opcode);
+               return -ENXIO;
+       }
+
        wc.qcq = &pdsc->adminqcq;
        index = __pdsc_adminq_post(pdsc, &pdsc->adminqcq, cmd, comp, &wc);
        if (index < 0) {
@@ -248,10 +272,16 @@ int pdsc_adminq_post(struct pdsc *pdsc,
                        break;
 
                if (!pdsc_is_fw_running(pdsc)) {
-                       u8 fw_status = ioread8(&pdsc->info_regs->fw_status);
-
-                       dev_dbg(pdsc->dev, "%s: post wait failed - fw not running %#02x:\n",
-                               __func__, fw_status);
+                       if (pdsc->info_regs) {
+                               u8 fw_status =
+                                       ioread8(&pdsc->info_regs->fw_status);
+
+                               dev_dbg(pdsc->dev, "%s: post wait failed - fw not running %#02x:\n",
+                                       __func__, fw_status);
+                       } else {
+                               dev_dbg(pdsc->dev, "%s: post wait failed - BARs not setup\n",
+                                       __func__);
+                       }
                        err = -ENXIO;
                        break;
                }
@@ -285,6 +315,8 @@ err_out:
                        queue_work(pdsc->wq, &pdsc->health_work);
        }
 
+       refcount_dec(&pdsc->adminq_refcnt);
+
        return err;
 }
 EXPORT_SYMBOL_GPL(pdsc_adminq_post);
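
The refcount discipline added above is the usual live-object guard: the owner holds the initial reference, users take one only while the object is demonstrably alive, and teardown spins until it can drop the count from exactly 1 to 0. A minimal sketch of the two halves:

#include <linux/refcount.h>

/* user side: take a reference only if the object is still live */
static bool example_get_if_up(refcount_t *ref)
{
	return refcount_inc_not_zero(ref);	/* fails once torn down */
}

/* teardown side: succeeds only when all users have dropped out */
static void example_wait_until_unused(refcount_t *ref)
{
	while (!refcount_dec_if_one(ref))	/* 1 -> 0 only when idle */
		cpu_relax();
}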
index 11c23a7f3172d39a200b85baad84c3cb3cc0db07..fd1a5149c00319d3c0c32a784f1d819869c85b44 100644 (file)
@@ -160,23 +160,19 @@ static struct pds_auxiliary_dev *pdsc_auxbus_dev_register(struct pdsc *cf,
        if (err < 0) {
                dev_warn(cf->dev, "auxiliary_device_init of %s failed: %pe\n",
                         name, ERR_PTR(err));
-               goto err_out;
+               kfree(padev);
+               return ERR_PTR(err);
        }
 
        err = auxiliary_device_add(aux_dev);
        if (err) {
                dev_warn(cf->dev, "auxiliary_device_add of %s failed: %pe\n",
                         name, ERR_PTR(err));
-               goto err_out_uninit;
+               auxiliary_device_uninit(aux_dev);
+               return ERR_PTR(err);
        }
 
        return padev;
-
-err_out_uninit:
-       auxiliary_device_uninit(aux_dev);
-err_out:
-       kfree(padev);
-       return ERR_PTR(err);
 }
 
 int pdsc_auxbus_dev_del(struct pdsc *cf, struct pdsc *pf)
index 0d2091e9eb283a375617828c00552cceb82768ca..7658a72867675aad5287c15989155386d3ab9de7 100644 (file)
@@ -125,7 +125,7 @@ static int pdsc_qcq_intr_alloc(struct pdsc *pdsc, struct pdsc_qcq *qcq)
 
        snprintf(name, sizeof(name), "%s-%d-%s",
                 PDS_CORE_DRV_NAME, pdsc->pdev->bus->number, qcq->q.name);
-       index = pdsc_intr_alloc(pdsc, name, pdsc_adminq_isr, qcq);
+       index = pdsc_intr_alloc(pdsc, name, pdsc_adminq_isr, pdsc);
        if (index < 0)
                return index;
        qcq->intx = index;
@@ -404,10 +404,7 @@ int pdsc_setup(struct pdsc *pdsc, bool init)
        int numdescs;
        int err;
 
-       if (init)
-               err = pdsc_dev_init(pdsc);
-       else
-               err = pdsc_dev_reinit(pdsc);
+       err = pdsc_dev_init(pdsc);
        if (err)
                return err;
 
@@ -450,6 +447,7 @@ int pdsc_setup(struct pdsc *pdsc, bool init)
                pdsc_debugfs_add_viftype(pdsc);
        }
 
+       refcount_set(&pdsc->adminq_refcnt, 1);
        clear_bit(PDSC_S_FW_DEAD, &pdsc->state);
        return 0;
 
@@ -464,6 +462,8 @@ void pdsc_teardown(struct pdsc *pdsc, bool removing)
 
        if (!pdsc->pdev->is_virtfn)
                pdsc_devcmd_reset(pdsc);
+       if (pdsc->adminqcq.work.func)
+               cancel_work_sync(&pdsc->adminqcq.work);
        pdsc_qcq_free(pdsc, &pdsc->notifyqcq);
        pdsc_qcq_free(pdsc, &pdsc->adminqcq);
 
@@ -476,10 +476,9 @@ void pdsc_teardown(struct pdsc *pdsc, bool removing)
                for (i = 0; i < pdsc->nintrs; i++)
                        pdsc_intr_free(pdsc, i);
 
-               if (removing) {
-                       kfree(pdsc->intr_info);
-                       pdsc->intr_info = NULL;
-               }
+               kfree(pdsc->intr_info);
+               pdsc->intr_info = NULL;
+               pdsc->nintrs = 0;
        }
 
        if (pdsc->kern_dbpage) {
@@ -487,6 +486,7 @@ void pdsc_teardown(struct pdsc *pdsc, bool removing)
                pdsc->kern_dbpage = NULL;
        }
 
+       pci_free_irq_vectors(pdsc->pdev);
        set_bit(PDSC_S_FW_DEAD, &pdsc->state);
 }
 
@@ -512,6 +512,24 @@ void pdsc_stop(struct pdsc *pdsc)
                                           PDS_CORE_INTR_MASK_SET);
 }
 
+static void pdsc_adminq_wait_and_dec_once_unused(struct pdsc *pdsc)
+{
+       /* The driver initializes the adminq_refcnt to 1 when the adminq is
+        * allocated and ready for use. Other users/requesters will increment
+        * the refcnt while in use. If the refcnt is down to 1 then the adminq
+        * is not in use and the refcnt can be cleared and adminq freed. Before
+        * calling this function the driver will set PDSC_S_FW_DEAD, which
+        * causes subsequent attempts to use the adminq and increment the
+        * refcnt to fail. This guarantees that this function will eventually
+        * exit.
+        */
+       while (!refcount_dec_if_one(&pdsc->adminq_refcnt)) {
+               dev_dbg_ratelimited(pdsc->dev, "%s: adminq in use\n",
+                                   __func__);
+               cpu_relax();
+       }
+}
+
 void pdsc_fw_down(struct pdsc *pdsc)
 {
        union pds_core_notifyq_comp reset_event = {
@@ -527,6 +545,8 @@ void pdsc_fw_down(struct pdsc *pdsc)
        if (pdsc->pdev->is_virtfn)
                return;
 
+       pdsc_adminq_wait_and_dec_once_unused(pdsc);
+
        /* Notify clients of fw_down */
        if (pdsc->fw_reporter)
                devlink_health_report(pdsc->fw_reporter, "FW down reported", pdsc);
@@ -577,7 +597,13 @@ err_out:
 
 static void pdsc_check_pci_health(struct pdsc *pdsc)
 {
-       u8 fw_status = ioread8(&pdsc->info_regs->fw_status);
+       u8 fw_status;
+
+       /* some sort of teardown already in progress */
+       if (!pdsc->info_regs)
+               return;
+
+       fw_status = ioread8(&pdsc->info_regs->fw_status);
 
        /* is PCI broken? */
        if (fw_status != PDS_RC_BAD_PCI)
index e35d3e7006bfc1891a0343643910b915f31ba56a..110c4b826b22d588b33ca5cd2f0f1d38c76cf4b5 100644 (file)
@@ -184,6 +184,7 @@ struct pdsc {
        struct mutex devcmd_lock;       /* lock for dev_cmd operations */
        struct mutex config_lock;       /* lock for configuration operations */
        spinlock_t adminq_lock;         /* lock for adminq operations */
+       refcount_t adminq_refcnt;
        struct pds_core_dev_info_regs __iomem *info_regs;
        struct pds_core_dev_cmd_regs __iomem *cmd_regs;
        struct pds_core_intr __iomem *intr_ctrl;
@@ -280,7 +281,6 @@ int pdsc_devcmd_locked(struct pdsc *pdsc, union pds_core_dev_cmd *cmd,
                       union pds_core_dev_comp *comp, int max_seconds);
 int pdsc_devcmd_init(struct pdsc *pdsc);
 int pdsc_devcmd_reset(struct pdsc *pdsc);
-int pdsc_dev_reinit(struct pdsc *pdsc);
 int pdsc_dev_init(struct pdsc *pdsc);
 
 void pdsc_reset_prepare(struct pci_dev *pdev);
index 8ec392299b7dcff9b74a0b08f45a5ccd25986cf1..4e8579ca1c8c71bd89659f041f3613113af16141 100644 (file)
@@ -64,6 +64,10 @@ DEFINE_SHOW_ATTRIBUTE(identity);
 
 void pdsc_debugfs_add_ident(struct pdsc *pdsc)
 {
+       /* This file will already exist in the reset flow */
+       if (debugfs_lookup("identity", pdsc->dentry))
+               return;
+
        debugfs_create_file("identity", 0400, pdsc->dentry,
                            pdsc, &identity_fops);
 }
index 31940b857e0e501d2d4d220a0ed6a0cfd03098c7..e65a1632df505d55de687ba781166299d865eaae 100644 (file)
@@ -57,6 +57,9 @@ int pdsc_err_to_errno(enum pds_core_status_code code)
 
 bool pdsc_is_fw_running(struct pdsc *pdsc)
 {
+       if (!pdsc->info_regs)
+               return false;
+
        pdsc->fw_status = ioread8(&pdsc->info_regs->fw_status);
        pdsc->last_fw_time = jiffies;
        pdsc->last_hb = ioread32(&pdsc->info_regs->fw_heartbeat);
@@ -182,13 +185,17 @@ int pdsc_devcmd_locked(struct pdsc *pdsc, union pds_core_dev_cmd *cmd,
 {
        int err;
 
+       if (!pdsc->cmd_regs)
+               return -ENXIO;
+
        memcpy_toio(&pdsc->cmd_regs->cmd, cmd, sizeof(*cmd));
        pdsc_devcmd_dbell(pdsc);
        err = pdsc_devcmd_wait(pdsc, cmd->opcode, max_seconds);
-       memcpy_fromio(comp, &pdsc->cmd_regs->comp, sizeof(*comp));
 
        if ((err == -ENXIO || err == -ETIMEDOUT) && pdsc->wq)
                queue_work(pdsc->wq, &pdsc->health_work);
+       else
+               memcpy_fromio(comp, &pdsc->cmd_regs->comp, sizeof(*comp));
 
        return err;
 }
@@ -309,13 +316,6 @@ static int pdsc_identify(struct pdsc *pdsc)
        return 0;
 }
 
-int pdsc_dev_reinit(struct pdsc *pdsc)
-{
-       pdsc_init_devinfo(pdsc);
-
-       return pdsc_identify(pdsc);
-}
-
 int pdsc_dev_init(struct pdsc *pdsc)
 {
        unsigned int nintrs;
index e9948ea5bbcdbaae713390cca46280e55b548956..54864f27c87a9e526524a023e444e318cc2bc0f7 100644 (file)
@@ -111,7 +111,8 @@ int pdsc_dl_info_get(struct devlink *dl, struct devlink_info_req *req,
 
        mutex_lock(&pdsc->devcmd_lock);
        err = pdsc_devcmd_locked(pdsc, &cmd, &comp, pdsc->devcmd_timeout * 2);
-       memcpy_fromio(&fw_list, pdsc->cmd_regs->data, sizeof(fw_list));
+       if (!err)
+               memcpy_fromio(&fw_list, pdsc->cmd_regs->data, sizeof(fw_list));
        mutex_unlock(&pdsc->devcmd_lock);
        if (err && err != -EIO)
                return err;
index 90a811f3878ae974679bc5caba97e18aae04bdfb..fa626719e68d1b206fc9bbe1d038daf51984f19b 100644 (file)
@@ -107,6 +107,9 @@ int pdsc_firmware_update(struct pdsc *pdsc, const struct firmware *fw,
 
        dev_info(pdsc->dev, "Installing firmware\n");
 
+       if (!pdsc->cmd_regs)
+               return -ENXIO;
+
        dl = priv_to_devlink(pdsc);
        devlink_flash_update_status_notify(dl, "Preparing to flash",
                                           NULL, 0, 0);
index 3080898d7b95b0122701cacb8a15796ed2cc2dcb..0050c5894563b8a54c21a6b8933b844d63804098 100644 (file)
@@ -37,6 +37,11 @@ static void pdsc_unmap_bars(struct pdsc *pdsc)
        struct pdsc_dev_bar *bars = pdsc->bars;
        unsigned int i;
 
+       pdsc->info_regs = NULL;
+       pdsc->cmd_regs = NULL;
+       pdsc->intr_status = NULL;
+       pdsc->intr_ctrl = NULL;
+
        for (i = 0; i < PDS_CORE_BARS_MAX; i++) {
                if (bars[i].vaddr)
                        pci_iounmap(pdsc->pdev, bars[i].vaddr);
@@ -293,7 +298,7 @@ err_out_stop:
 err_out_teardown:
        pdsc_teardown(pdsc, PDSC_TEARDOWN_REMOVING);
 err_out_unmap_bars:
-       del_timer_sync(&pdsc->wdtimer);
+       timer_shutdown_sync(&pdsc->wdtimer);
        if (pdsc->wq)
                destroy_workqueue(pdsc->wq);
        mutex_destroy(&pdsc->config_lock);
@@ -420,7 +425,7 @@ static void pdsc_remove(struct pci_dev *pdev)
                 */
                pdsc_sriov_configure(pdev, 0);
 
-               del_timer_sync(&pdsc->wdtimer);
+               timer_shutdown_sync(&pdsc->wdtimer);
                if (pdsc->wq)
                        destroy_workqueue(pdsc->wq);
 
@@ -433,7 +438,6 @@ static void pdsc_remove(struct pci_dev *pdev)
                mutex_destroy(&pdsc->config_lock);
                mutex_destroy(&pdsc->devcmd_lock);
 
-               pci_free_irq_vectors(pdev);
                pdsc_unmap_bars(pdsc);
                pci_release_regions(pdev);
        }
@@ -445,13 +449,32 @@ static void pdsc_remove(struct pci_dev *pdev)
        devlink_free(dl);
 }
 
+static void pdsc_stop_health_thread(struct pdsc *pdsc)
+{
+       if (pdsc->pdev->is_virtfn)
+               return;
+
+       timer_shutdown_sync(&pdsc->wdtimer);
+       if (pdsc->health_work.func)
+               cancel_work_sync(&pdsc->health_work);
+}
+
+static void pdsc_restart_health_thread(struct pdsc *pdsc)
+{
+       if (pdsc->pdev->is_virtfn)
+               return;
+
+       timer_setup(&pdsc->wdtimer, pdsc_wdtimer_cb, 0);
+       mod_timer(&pdsc->wdtimer, jiffies + 1);
+}
+
 void pdsc_reset_prepare(struct pci_dev *pdev)
 {
        struct pdsc *pdsc = pci_get_drvdata(pdev);
 
+       pdsc_stop_health_thread(pdsc);
        pdsc_fw_down(pdsc);
 
-       pci_free_irq_vectors(pdev);
        pdsc_unmap_bars(pdsc);
        pci_release_regions(pdev);
        pci_disable_device(pdev);
@@ -486,6 +509,7 @@ void pdsc_reset_done(struct pci_dev *pdev)
        }
 
        pdsc_fw_up(pdsc);
+       pdsc_restart_health_thread(pdsc);
 }
 
 static const struct pci_error_handlers pdsc_err_handler = {
index abd4832e4ed21f3c2a22aed047a0331675162907..5acb3e16b5677b7826e488942ff6efb2c3cdf400 100644 (file)
@@ -993,7 +993,7 @@ int aq_ptp_ring_alloc(struct aq_nic_s *aq_nic)
        return 0;
 
 err_exit_hwts_rx:
-       aq_ring_free(&aq_ptp->hwts_rx);
+       aq_ring_hwts_rx_free(&aq_ptp->hwts_rx);
 err_exit_ptp_rx:
        aq_ring_free(&aq_ptp->ptp_rx);
 err_exit_ptp_tx:
@@ -1011,7 +1011,7 @@ void aq_ptp_ring_free(struct aq_nic_s *aq_nic)
 
        aq_ring_free(&aq_ptp->ptp_tx);
        aq_ring_free(&aq_ptp->ptp_rx);
-       aq_ring_free(&aq_ptp->hwts_rx);
+       aq_ring_hwts_rx_free(&aq_ptp->hwts_rx);
 
        aq_ptp_skb_ring_release(&aq_ptp->skb_ring);
 }
index cda8597b4e1469d2895f895f982f84cb97ef4506..f7433abd659159203f99fbb6cc9ed394bdedacfc 100644 (file)
@@ -919,6 +919,19 @@ void aq_ring_free(struct aq_ring_s *self)
        }
 }
 
+void aq_ring_hwts_rx_free(struct aq_ring_s *self)
+{
+       if (!self)
+               return;
+
+       if (self->dx_ring) {
+               dma_free_coherent(aq_nic_get_dev(self->aq_nic),
+                                 self->size * self->dx_size + AQ_CFG_RXDS_DEF,
+                                 self->dx_ring, self->dx_ring_pa);
+               self->dx_ring = NULL;
+       }
+}
+
 unsigned int aq_ring_fill_stats_data(struct aq_ring_s *self, u64 *data)
 {
        unsigned int count;
index 52847310740a21097dfc35a395e96dfe5de46321..d627ace850ff54201b760a079416e4d690e73184 100644 (file)
@@ -210,6 +210,7 @@ int aq_ring_rx_fill(struct aq_ring_s *self);
 int aq_ring_hwts_rx_alloc(struct aq_ring_s *self,
                          struct aq_nic_s *aq_nic, unsigned int idx,
                          unsigned int size, unsigned int dx_size);
+void aq_ring_hwts_rx_free(struct aq_ring_s *self);
 void aq_ring_hwts_rx_clean(struct aq_ring_s *self, struct aq_nic_s *aq_nic);
 
 unsigned int aq_ring_fill_stats_data(struct aq_ring_s *self, u64 *data);
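
The dedicated free routine exists because dma_free_coherent() must be passed exactly the size, CPU address, and DMA handle from the matching dma_alloc_coherent(); the hwts ring is allocated with an extra AQ_CFG_RXDS_DEF tail, so freeing it with the generic ring size mismatches the allocation. A sketch of the pairing rule:

#include <linux/dma-mapping.h>

static void *example_alloc_ring(struct device *dev, size_t bytes,
				dma_addr_t *pa)
{
	return dma_alloc_coherent(dev, bytes, pa, GFP_KERNEL);
}

static void example_free_ring(struct device *dev, size_t bytes,
			      void *vaddr, dma_addr_t pa)
{
	/* 'bytes' must equal the size passed at allocation time */
	dma_free_coherent(dev, bytes, vaddr, pa);
}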
index 29b04a274d077375d9658ea91c16df1ccd963969..80245c65cc904defdec4637eb66a9c1edd6eb03f 100644 (file)
@@ -535,9 +535,6 @@ int bcmasp_netfilt_get_all_active(struct bcmasp_intf *intf, u32 *rule_locs,
        int j = 0, i;
 
        for (i = 0; i < NUM_NET_FILTERS; i++) {
-               if (j == *rule_cnt)
-                       return -EMSGSIZE;
-
                if (!priv->net_filters[i].claimed ||
                    priv->net_filters[i].port != intf->port)
                        continue;
@@ -547,6 +544,9 @@ int bcmasp_netfilt_get_all_active(struct bcmasp_intf *intf, u32 *rule_locs,
                    priv->net_filters[i - 1].wake_filter)
                        continue;
 
+               if (j == *rule_cnt)
+                       return -EMSGSIZE;
+
                rule_locs[j++] = priv->net_filters[i].fs.location;
        }
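
The reorder above keeps skipped entries from consuming output slots: filters are rejected first, and the -EMSGSIZE check fires only when an entry is actually about to be emitted. A self-contained sketch of the corrected loop shape:

#include <linux/errno.h>
#include <linux/types.h>

struct example_rule {
	bool claimed;
	u32 location;
};

static int example_collect(const struct example_rule *rules, size_t n,
			   u32 *locs, u32 *cnt)
{
	u32 j = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!rules[i].claimed)
			continue;		/* skip first ... */
		if (j == *cnt)
			return -EMSGSIZE;	/* ... bound-check last */
		locs[j++] = rules[i].location;
	}

	*cnt = j;
	return 0;
}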
 
index 53e5428812552b56de4c7809c5c3f49f7c65379b..6ad1366270f79cba0579bac6088743b1645203ef 100644 (file)
@@ -684,6 +684,8 @@ static int bcmasp_init_rx(struct bcmasp_intf *intf)
 
        intf->rx_buf_order = get_order(RING_BUFFER_SIZE);
        buffer_pg = alloc_pages(GFP_KERNEL, intf->rx_buf_order);
+       if (!buffer_pg)
+               return -ENOMEM;
 
        dma = dma_map_page(kdev, buffer_pg, 0, RING_BUFFER_SIZE,
                           DMA_FROM_DEVICE);
@@ -1048,6 +1050,9 @@ static int bcmasp_netif_init(struct net_device *dev, bool phy_connect)
                        netdev_err(dev, "could not attach to PHY\n");
                        goto err_phy_disable;
                }
+
+               /* Indicate that the MAC is responsible for PHY PM */
+               phydev->mac_managed_pm = true;
        } else if (!intf->wolopts) {
                ret = phy_resume(dev->phydev);
                if (ret)
@@ -1092,6 +1097,7 @@ static int bcmasp_netif_init(struct net_device *dev, bool phy_connect)
        return 0;
 
 err_reclaim_tx:
+       netif_napi_del(&intf->tx_napi);
        bcmasp_reclaim_free_all_tx(intf);
 err_phy_disconnect:
        if (phydev)
index 3e7c8671cd116485252414e3dc3235d80054495d..72df1bb101728872fb8ac4b4905657ce0a67096b 100644 (file)
@@ -793,5 +793,6 @@ static struct platform_driver bcm4908_enet_driver = {
 };
 module_platform_driver(bcm4908_enet_driver);
 
+MODULE_DESCRIPTION("Broadcom BCM4908 Gigabit Ethernet driver");
 MODULE_LICENSE("GPL v2");
 MODULE_DEVICE_TABLE(of, bcm4908_enet_of_match);
index 9b83d536169940ea7f2b861c44fe256faed305aa..50b8e97a811d205fb9ed57c37a74dad701607552 100644 (file)
@@ -260,4 +260,5 @@ void bcma_mdio_mii_unregister(struct mii_bus *mii_bus)
 EXPORT_SYMBOL_GPL(bcma_mdio_mii_unregister);
 
 MODULE_AUTHOR("Rafał Miłecki");
+MODULE_DESCRIPTION("Broadcom iProc GBit BCMA MDIO helpers");
 MODULE_LICENSE("GPL");
index 6e4f36aaf5db6a6f869141e3c87bac4891cc7e50..36f9bad28e6a90da7456a8ec3e40a38038351c84 100644 (file)
@@ -362,4 +362,5 @@ module_init(bgmac_init)
 module_exit(bgmac_exit)
 
 MODULE_AUTHOR("Rafał Miłecki");
+MODULE_DESCRIPTION("Broadcom iProc GBit BCMA interface driver");
 MODULE_LICENSE("GPL");
index 0b21fd5bd4575e7a76d52e704051d3e6d30c0f72..77425c7a32dbf882672c9e664b2b006d9bc1e5f9 100644 (file)
@@ -298,4 +298,5 @@ static struct platform_driver bgmac_enet_driver = {
 };
 
 module_platform_driver(bgmac_enet_driver);
+MODULE_DESCRIPTION("Broadcom iProc GBit platform interface driver");
 MODULE_LICENSE("GPL");
index 448a1b90de5ebcf6de79a6b749f5bf279875d0e3..6ffdc42294074f86b08e92accb705e7e58409967 100644 (file)
@@ -1626,4 +1626,5 @@ int bgmac_enet_resume(struct bgmac *bgmac)
 EXPORT_SYMBOL_GPL(bgmac_enet_resume);
 
 MODULE_AUTHOR("Rafał Miłecki");
+MODULE_DESCRIPTION("Broadcom iProc GBit driver");
 MODULE_LICENSE("GPL");
index 0aacd3c6ed5c0bbf2e02f1ddddc5dc292ae84a07..39845d556bafc949cdf4e599007db26c11b414ac 100644 (file)
@@ -3817,7 +3817,7 @@ static int bnxt_alloc_cp_rings(struct bnxt *bp)
 {
        bool sh = !!(bp->flags & BNXT_FLAG_SHARED_RINGS);
        int i, j, rc, ulp_base_vec, ulp_msix;
-       int tcs = netdev_get_num_tc(bp->dev);
+       int tcs = bp->num_tc;
 
        if (!tcs)
                tcs = 1;
@@ -5935,8 +5935,12 @@ static u16 bnxt_get_max_rss_ring(struct bnxt *bp)
 
 int bnxt_get_nr_rss_ctxs(struct bnxt *bp, int rx_rings)
 {
-       if (bp->flags & BNXT_FLAG_CHIP_P5_PLUS)
-               return DIV_ROUND_UP(rx_rings, BNXT_RSS_TABLE_ENTRIES_P5);
+       if (bp->flags & BNXT_FLAG_CHIP_P5_PLUS) {
+               if (!rx_rings)
+                       return 0;
+               return bnxt_calc_nr_ring_pages(rx_rings - 1,
+                                              BNXT_RSS_TABLE_ENTRIES_P5);
+       }
        if (BNXT_CHIP_TYPE_NITRO_A0(bp))
                return 2;
        return 1;
@@ -6926,7 +6930,7 @@ static int bnxt_hwrm_get_rings(struct bnxt *bp)
                        if (cp < (rx + tx)) {
                                rc = __bnxt_trim_rings(bp, &rx, &tx, cp, false);
                                if (rc)
-                                       return rc;
+                                       goto get_rings_exit;
                                if (bp->flags & BNXT_FLAG_AGG_RINGS)
                                        rx <<= 1;
                                hw_resc->resv_rx_rings = rx;
@@ -6938,8 +6942,9 @@ static int bnxt_hwrm_get_rings(struct bnxt *bp)
                hw_resc->resv_cp_rings = cp;
                hw_resc->resv_stat_ctxs = stats;
        }
+get_rings_exit:
        hwrm_req_drop(bp, req);
-       return 0;
+       return rc;
 }
 
 int __bnxt_hwrm_get_tx_rings(struct bnxt *bp, u16 fid, int *tx_rings)
@@ -7000,10 +7005,11 @@ __bnxt_hwrm_reserve_pf_rings(struct bnxt *bp, int tx_rings, int rx_rings,
 
                req->num_rx_rings = cpu_to_le16(rx_rings);
                if (bp->flags & BNXT_FLAG_CHIP_P5_PLUS) {
+                       u16 rss_ctx = bnxt_get_nr_rss_ctxs(bp, ring_grps);
+
                        req->num_cmpl_rings = cpu_to_le16(tx_rings + ring_grps);
                        req->num_msix = cpu_to_le16(cp_rings);
-                       req->num_rsscos_ctxs =
-                               cpu_to_le16(DIV_ROUND_UP(ring_grps, 64));
+                       req->num_rsscos_ctxs = cpu_to_le16(rss_ctx);
                } else {
                        req->num_cmpl_rings = cpu_to_le16(cp_rings);
                        req->num_hw_ring_grps = cpu_to_le16(ring_grps);
@@ -7050,8 +7056,10 @@ __bnxt_hwrm_reserve_vf_rings(struct bnxt *bp, int tx_rings, int rx_rings,
        req->num_tx_rings = cpu_to_le16(tx_rings);
        req->num_rx_rings = cpu_to_le16(rx_rings);
        if (bp->flags & BNXT_FLAG_CHIP_P5_PLUS) {
+               u16 rss_ctx = bnxt_get_nr_rss_ctxs(bp, ring_grps);
+
                req->num_cmpl_rings = cpu_to_le16(tx_rings + ring_grps);
-               req->num_rsscos_ctxs = cpu_to_le16(DIV_ROUND_UP(ring_grps, 64));
+               req->num_rsscos_ctxs = cpu_to_le16(rss_ctx);
        } else {
                req->num_cmpl_rings = cpu_to_le16(cp_rings);
                req->num_hw_ring_grps = cpu_to_le16(ring_grps);
@@ -9938,7 +9946,7 @@ static int __bnxt_num_tx_to_cp(struct bnxt *bp, int tx, int tx_sets, int tx_xdp)
 
 int bnxt_num_tx_to_cp(struct bnxt *bp, int tx)
 {
-       int tcs = netdev_get_num_tc(bp->dev);
+       int tcs = bp->num_tc;
 
        if (!tcs)
                tcs = 1;
@@ -9947,7 +9955,7 @@ int bnxt_num_tx_to_cp(struct bnxt *bp, int tx)
 
 static int bnxt_num_cp_to_tx(struct bnxt *bp, int tx_cp)
 {
-       int tcs = netdev_get_num_tc(bp->dev);
+       int tcs = bp->num_tc;
 
        return (tx_cp - bp->tx_nr_rings_xdp) * tcs +
               bp->tx_nr_rings_xdp;
@@ -9977,7 +9985,7 @@ static void bnxt_setup_msix(struct bnxt *bp)
        struct net_device *dev = bp->dev;
        int tcs, i;
 
-       tcs = netdev_get_num_tc(dev);
+       tcs = bp->num_tc;
        if (tcs) {
                int i, off, count;
 
@@ -10009,8 +10017,10 @@ static void bnxt_setup_inta(struct bnxt *bp)
 {
        const int len = sizeof(bp->irq_tbl[0].name);
 
-       if (netdev_get_num_tc(bp->dev))
+       if (bp->num_tc) {
                netdev_reset_tc(bp->dev);
+               bp->num_tc = 0;
+       }
 
        snprintf(bp->irq_tbl[0].name, len, "%s-%s-%d", bp->dev->name, "TxRx",
                 0);
@@ -10236,8 +10246,8 @@ static void bnxt_clear_int_mode(struct bnxt *bp)
 
 int bnxt_reserve_rings(struct bnxt *bp, bool irq_re_init)
 {
-       int tcs = netdev_get_num_tc(bp->dev);
        bool irq_cleared = false;
+       int tcs = bp->num_tc;
        int rc;
 
        if (!bnxt_need_reserve_rings(bp))
@@ -10263,6 +10273,7 @@ int bnxt_reserve_rings(struct bnxt *bp, bool irq_re_init)
                    bp->tx_nr_rings - bp->tx_nr_rings_xdp)) {
                netdev_err(bp->dev, "tx ring reservation failure\n");
                netdev_reset_tc(bp->dev);
+               bp->num_tc = 0;
                if (bp->tx_nr_rings_xdp)
                        bp->tx_nr_rings_per_tc = bp->tx_nr_rings_xdp;
                else
@@ -11564,10 +11575,12 @@ int bnxt_half_open_nic(struct bnxt *bp)
                netdev_err(bp->dev, "bnxt_alloc_mem err: %x\n", rc);
                goto half_open_err;
        }
+       bnxt_init_napi(bp);
        set_bit(BNXT_STATE_HALF_OPEN, &bp->state);
        rc = bnxt_init_nic(bp, true);
        if (rc) {
                clear_bit(BNXT_STATE_HALF_OPEN, &bp->state);
+               bnxt_del_napi(bp);
                netdev_err(bp->dev, "bnxt_init_nic err: %x\n", rc);
                goto half_open_err;
        }
@@ -11586,6 +11599,7 @@ half_open_err:
 void bnxt_half_close_nic(struct bnxt *bp)
 {
        bnxt_hwrm_resource_free(bp, false, true);
+       bnxt_del_napi(bp);
        bnxt_free_skbs(bp);
        bnxt_free_mem(bp, true);
        clear_bit(BNXT_STATE_HALF_OPEN, &bp->state);
@@ -13232,6 +13246,11 @@ static int bnxt_fw_init_one_p1(struct bnxt *bp)
 
        bp->fw_cap = 0;
        rc = bnxt_hwrm_ver_get(bp);
+       /* FW may be unresponsive after FLR. FLR must complete within 100 msec,
+        * so wait that long before continuing with recovery.
+        */
+       if (rc)
+               msleep(100);
        bnxt_try_map_fw_health_reg(bp);
        if (rc) {
                rc = bnxt_try_recover_fw(bp);
@@ -13784,7 +13803,7 @@ int bnxt_setup_mq_tc(struct net_device *dev, u8 tc)
                return -EINVAL;
        }
 
-       if (netdev_get_num_tc(dev) == tc)
+       if (bp->num_tc == tc)
                return 0;
 
        if (bp->flags & BNXT_FLAG_SHARED_RINGS)
@@ -13802,9 +13821,11 @@ int bnxt_setup_mq_tc(struct net_device *dev, u8 tc)
        if (tc) {
                bp->tx_nr_rings = bp->tx_nr_rings_per_tc * tc;
                netdev_set_num_tc(dev, tc);
+               bp->num_tc = tc;
        } else {
                bp->tx_nr_rings = bp->tx_nr_rings_per_tc;
                netdev_reset_tc(dev);
+               bp->num_tc = 0;
        }
        bp->tx_nr_rings += bp->tx_nr_rings_xdp;
        tx_cp = bnxt_num_tx_to_cp(bp, bp->tx_nr_rings);
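The bp->num_tc conversion above replaces every netdev_get_num_tc() lookup in bnxt with a cached copy that is written only next to netdev_set_num_tc()/netdev_reset_tc(), so the driver's ring arithmetic can no longer observe TC state that another path has already reset. A minimal userspace sketch of that invariant (names are illustrative, not the driver's):

	#include <stdio.h>

	struct fake_bp {
		int num_tc;		/* cached copy, mirrors netdev TC state */
		int tx_nr_rings_per_tc;
		int tx_nr_rings;
	};

	/* mirrors the set/reset pairing in bnxt_setup_mq_tc(): the cache is
	 * updated in the same place the netdev TC state changes */
	static void fake_set_tc(struct fake_bp *bp, int tc)
	{
		if (tc) {
			bp->tx_nr_rings = bp->tx_nr_rings_per_tc * tc;
			bp->num_tc = tc;	/* alongside netdev_set_num_tc() */
		} else {
			bp->tx_nr_rings = bp->tx_nr_rings_per_tc;
			bp->num_tc = 0;		/* alongside netdev_reset_tc() */
		}
	}

	int main(void)
	{
		struct fake_bp bp = { .tx_nr_rings_per_tc = 4 };

		fake_set_tc(&bp, 3);
		printf("tcs=%d rings=%d\n", bp.num_tc, bp.tx_nr_rings); /* tcs=3 rings=12 */
		fake_set_tc(&bp, 0);
		printf("tcs=%d rings=%d\n", bp.num_tc, bp.tx_nr_rings); /* tcs=0 rings=4 */
		return 0;
	}
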
index b8ef1717cb65fb128b6a60d8148829f3c2b6af36..47338b48ca203d2ebea4c72b433690a0afde3c58 100644 (file)
@@ -2225,6 +2225,7 @@ struct bnxt {
        u8                      tc_to_qidx[BNXT_MAX_QUEUE];
        u8                      q_ids[BNXT_MAX_QUEUE];
        u8                      max_q;
+       u8                      num_tc;
 
        unsigned int            current_interval;
 #define BNXT_TIMER_INTERVAL    HZ
index 63e0670383852af5b6ab4a7c28c9d25e0ea64a1b..0dbb880a7aa0e721f98e42d17f845596a702abc4 100644 (file)
@@ -228,7 +228,7 @@ static int bnxt_queue_remap(struct bnxt *bp, unsigned int lltc_mask)
                }
        }
        if (bp->ieee_ets) {
-               int tc = netdev_get_num_tc(bp->dev);
+               int tc = bp->num_tc;
 
                if (!tc)
                        tc = 1;
index 27b983c0a8a9cdfb3f928fdc026ce7480307ff31..dc4ca706b0e299d9df7da1b2edba240becd68878 100644 (file)
@@ -884,7 +884,7 @@ static void bnxt_get_channels(struct net_device *dev,
        if (max_tx_sch_inputs)
                max_tx_rings = min_t(int, max_tx_rings, max_tx_sch_inputs);
 
-       tcs = netdev_get_num_tc(dev);
+       tcs = bp->num_tc;
        tx_grps = max(tcs, 1);
        if (bp->tx_nr_rings_xdp)
                tx_grps++;
@@ -944,7 +944,7 @@ static int bnxt_set_channels(struct net_device *dev,
        if (channel->combined_count)
                sh = true;
 
-       tcs = netdev_get_num_tc(dev);
+       tcs = bp->num_tc;
 
        req_tx_rings = sh ? channel->combined_count : channel->tx_count;
        req_rx_rings = sh ? channel->combined_count : channel->rx_count;
@@ -1574,7 +1574,8 @@ u32 bnxt_get_rxfh_indir_size(struct net_device *dev)
        struct bnxt *bp = netdev_priv(dev);
 
        if (bp->flags & BNXT_FLAG_CHIP_P5_PLUS)
-               return ALIGN(bp->rx_nr_rings, BNXT_RSS_TABLE_ENTRIES_P5);
+               return bnxt_get_nr_rss_ctxs(bp, bp->rx_nr_rings) *
+                      BNXT_RSS_TABLE_ENTRIES_P5;
        return HW_HASH_INDEX_SIZE;
 }
 
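The new bnxt_get_nr_rss_ctxs() call sites replace the open-coded DIV_ROUND_UP(ring_grps, 64) in both ring-reservation paths, and the ethtool hunk derives the reported indirection-table size from the same helper, so reservation and reporting can no longer drift apart. A sketch of the arithmetic, assuming BNXT_RSS_TABLE_ENTRIES_P5 is the 64-entry per-context table size implied by the replaced code:

	#include <stdio.h>

	#define RSS_TABLE_ENTRIES_P5	64	/* per-context table size (assumed) */
	#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

	/* stand-in for bnxt_get_nr_rss_ctxs() on P5-class chips */
	static unsigned int nr_rss_ctxs(unsigned int rx_rings)
	{
		return DIV_ROUND_UP(rx_rings, RSS_TABLE_ENTRIES_P5);
	}

	int main(void)
	{
		/* 72 RX rings -> 2 contexts -> a 128-entry indirection table */
		printf("%u\n", nr_rss_ctxs(72) * RSS_TABLE_ENTRIES_P5);
		return 0;
	}
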
index adad188e38b8256ef5a1e051310abae2d5bd9b34..cc07660330f533b5e39efcb0f5dff6f865821315 100644 (file)
@@ -684,7 +684,7 @@ static void bnxt_stamp_tx_skb(struct bnxt *bp, struct sk_buff *skb)
                timestamp.hwtstamp = ns_to_ktime(ns);
                skb_tstamp_tx(ptp->tx_skb, &timestamp);
        } else {
-               netdev_WARN_ONCE(bp->dev,
+               netdev_warn_once(bp->dev,
                                 "TS query for TX timer failed rc = %x\n", rc);
        }
 
index c2b25fc623ecc08410e8fc45cb391d5a555cc1d9..4079538bc310eaaeee0ee568f6fcc3fbf2b4e758 100644 (file)
@@ -407,7 +407,7 @@ static int bnxt_xdp_set(struct bnxt *bp, struct bpf_prog *prog)
        if (prog)
                tx_xdp = bp->rx_nr_rings;
 
-       tc = netdev_get_num_tc(dev);
+       tc = bp->num_tc;
        if (!tc)
                tc = 1;
        rc = bnxt_check_rings(bp, bp->tx_nr_rings_per_tc, bp->rx_nr_rings,
index 31191b520b5875a72f08e22ce316a103ccee68ea..c32174484a967ae4cf49f6025bf0892333d1fd89 100644 (file)
@@ -1091,10 +1091,10 @@ bnad_cb_tx_resume(struct bnad *bnad, struct bna_tx *tx)
  * Free all TxQs buffers and then notify TX_E_CLEANUP_DONE to Tx fsm.
  */
 static void
-bnad_tx_cleanup(struct delayed_work *work)
+bnad_tx_cleanup(struct work_struct *work)
 {
        struct bnad_tx_info *tx_info =
-               container_of(work, struct bnad_tx_info, tx_cleanup_work);
+               container_of(work, struct bnad_tx_info, tx_cleanup_work.work);
        struct bnad *bnad = NULL;
        struct bna_tcb *tcb;
        unsigned long flags;
@@ -1170,7 +1170,7 @@ bnad_cb_rx_stall(struct bnad *bnad, struct bna_rx *rx)
  * Free all RxQs buffers and then notify RX_E_CLEANUP_DONE to Rx fsm.
  */
 static void
-bnad_rx_cleanup(void *work)
+bnad_rx_cleanup(struct work_struct *work)
 {
        struct bnad_rx_info *rx_info =
                container_of(work, struct bnad_rx_info, rx_cleanup_work);
@@ -1991,8 +1991,7 @@ bnad_setup_tx(struct bnad *bnad, u32 tx_id)
        }
        tx_info->tx = tx;
 
-       INIT_DELAYED_WORK(&tx_info->tx_cleanup_work,
-                       (work_func_t)bnad_tx_cleanup);
+       INIT_DELAYED_WORK(&tx_info->tx_cleanup_work, bnad_tx_cleanup);
 
        /* Register ISR for the Tx object */
        if (intr_info->intr_type == BNA_INTR_T_MSIX) {
@@ -2248,8 +2247,7 @@ bnad_setup_rx(struct bnad *bnad, u32 rx_id)
        rx_info->rx = rx;
        spin_unlock_irqrestore(&bnad->bna_lock, flags);
 
-       INIT_WORK(&rx_info->rx_cleanup_work,
-                       (work_func_t)(bnad_rx_cleanup));
+       INIT_WORK(&rx_info->rx_cleanup_work, bnad_rx_cleanup);
 
        /*
         * Init NAPI, so that state is set to NAPI_STATE_SCHED,
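The bnad hunks stop casting mismatched handlers through (work_func_t): the handlers now take struct work_struct * directly, which keeps indirect-call type checking (e.g. kCFI) intact, and the delayed-work handler recovers its container through the embedded tx_cleanup_work.work member. A self-contained sketch of that container_of pattern, with toy types standing in for the kernel's:

	#include <stddef.h>
	#include <stdio.h>

	#define container_of(ptr, type, member) \
		((type *)((char *)(ptr) - offsetof(type, member)))

	/* stand-ins for struct work_struct / struct delayed_work */
	struct work_struct { int pending; };
	struct delayed_work { struct work_struct work; long timer; };

	struct tx_info { int id; struct delayed_work tx_cleanup_work; };

	/* correctly typed handler: it receives the work_struct embedded in
	 * the delayed_work, so it must walk back through ".work" to reach
	 * the containing tx_info */
	static void tx_cleanup(struct work_struct *work)
	{
		struct tx_info *tx =
			container_of(work, struct tx_info, tx_cleanup_work.work);

		printf("cleaning tx %d\n", tx->id);
	}

	int main(void)
	{
		struct tx_info tx = { .id = 7 };

		tx_cleanup(&tx.tx_cleanup_work.work);	/* prints "cleaning tx 7" */
		return 0;
	}
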
index 9cc6303c82ffb7f2680d3a01e4ca4c7fb227574c..f38d31bfab1bbcecafacaa4adf2e1ee67bd332b6 100644 (file)
@@ -27,6 +27,7 @@
 #include "octeon_network.h"
 
 MODULE_AUTHOR("Cavium Networks, <support@cavium.com>");
+MODULE_DESCRIPTION("Cavium LiquidIO Intelligent Server Adapter Core");
 MODULE_LICENSE("GPL");
 
 /* OOM task polling interval */
index 1c2a540db13d8a6806c0de8f3e31998133f8b429..1f495cfd7959b045c8186f6e84a2cbea43eeb1f2 100644 (file)
@@ -868,5 +868,6 @@ static struct platform_driver ep93xx_eth_driver = {
 
 module_platform_driver(ep93xx_eth_driver);
 
+MODULE_DESCRIPTION("Cirrus EP93xx Ethernet driver");
 MODULE_LICENSE("GPL");
 MODULE_ALIAS("platform:ep93xx-eth");
index 20fcb20b42edee5129fcf6e5edde9f3a0ecd2764..66b57783533897e6399ec12505441004646b8ebc 100644 (file)
@@ -49,7 +49,8 @@ int vic_provinfo_add_tlv(struct vic_provinfo *vp, u16 type, u16 length,
 
        tlv->type = htons(type);
        tlv->length = htons(length);
-       memcpy(tlv->value, value, length);
+       unsafe_memcpy(tlv->value, value, length,
+                     /* Flexible array of flexible arrays */);
 
        vp->num_tlvs = htonl(ntohl(vp->num_tlvs) + 1);
        vp->length = htonl(ntohl(vp->length) +
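unsafe_memcpy() in the vic.c hunk is the kernel's annotation for copies that FORTIFY_SOURCE cannot bound: tlv->value is a flexible array member, so the destination size is known only at run time from the surrounding allocation, and the justification stays inline as the macro's comment argument. A userspace sketch of why no static bound exists for the checker to verify:

	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>

	/* shape of the TLV above: the payload length lives in the header,
	 * the payload itself is a flexible array member */
	struct tlv {
		unsigned short type;
		unsigned short length;
		char value[];		/* flexible array member, no static size */
	};

	int main(void)
	{
		const char payload[] = "opaque provisioning data";
		struct tlv *t = malloc(sizeof(*t) + sizeof(payload));

		if (!t)
			return 1;
		t->length = sizeof(payload);
		/* in the kernel this is the unsafe_memcpy() call site; the
		 * real bound is t->length, invisible to the compiler */
		memcpy(t->value, payload, t->length);
		printf("%s\n", t->value);
		free(t);
		return 0;
	}
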
index df40c720e7b23d517433743efb883edb8f8d4cf4..64eadd3207983a671332d47cc4fdc966561d8425 100644 (file)
@@ -719,17 +719,25 @@ static void tsnep_xdp_xmit_flush(struct tsnep_tx *tx)
 
 static bool tsnep_xdp_xmit_back(struct tsnep_adapter *adapter,
                                struct xdp_buff *xdp,
-                               struct netdev_queue *tx_nq, struct tsnep_tx *tx)
+                               struct netdev_queue *tx_nq, struct tsnep_tx *tx,
+                               bool zc)
 {
        struct xdp_frame *xdpf = xdp_convert_buff_to_frame(xdp);
        bool xmit;
+       u32 type;
 
        if (unlikely(!xdpf))
                return false;
 
+       /* no page pool for zero copy */
+       if (zc)
+               type = TSNEP_TX_TYPE_XDP_NDO;
+       else
+               type = TSNEP_TX_TYPE_XDP_TX;
+
        __netif_tx_lock(tx_nq, smp_processor_id());
 
-       xmit = tsnep_xdp_xmit_frame_ring(xdpf, tx, TSNEP_TX_TYPE_XDP_TX);
+       xmit = tsnep_xdp_xmit_frame_ring(xdpf, tx, type);
 
        /* Avoid transmit queue timeout since we share it with the slow path */
        if (xmit)
@@ -1273,7 +1281,7 @@ static bool tsnep_xdp_run_prog(struct tsnep_rx *rx, struct bpf_prog *prog,
        case XDP_PASS:
                return false;
        case XDP_TX:
-               if (!tsnep_xdp_xmit_back(rx->adapter, xdp, tx_nq, tx))
+               if (!tsnep_xdp_xmit_back(rx->adapter, xdp, tx_nq, tx, false))
                        goto out_failure;
                *status |= TSNEP_XDP_TX;
                return true;
@@ -1323,7 +1331,7 @@ static bool tsnep_xdp_run_prog_zc(struct tsnep_rx *rx, struct bpf_prog *prog,
        case XDP_PASS:
                return false;
        case XDP_TX:
-               if (!tsnep_xdp_xmit_back(rx->adapter, xdp, tx_nq, tx))
+               if (!tsnep_xdp_xmit_back(rx->adapter, xdp, tx_nq, tx, true))
                        goto out_failure;
                *status |= TSNEP_XDP_TX;
                return true;
@@ -1485,7 +1493,7 @@ static int tsnep_rx_poll(struct tsnep_rx *rx, struct napi_struct *napi,
 
                        xdp_prepare_buff(&xdp, page_address(entry->page),
                                         XDP_PACKET_HEADROOM + TSNEP_RX_INLINE_METADATA_SIZE,
-                                        length, false);
+                                        length - ETH_FCS_LEN, false);
 
                        consume = tsnep_xdp_run_prog(rx, prog, &xdp,
                                                     &xdp_status, tx_nq, tx);
@@ -1568,7 +1576,7 @@ static int tsnep_rx_poll_zc(struct tsnep_rx *rx, struct napi_struct *napi,
                prefetch(entry->xdp->data);
                length = __le32_to_cpu(entry->desc_wb->properties) &
                         TSNEP_DESC_LENGTH_MASK;
-               xsk_buff_set_size(entry->xdp, length);
+               xsk_buff_set_size(entry->xdp, length - ETH_FCS_LEN);
                xsk_buff_dma_sync_for_cpu(entry->xdp, rx->xsk_pool);
 
                /* RX metadata with timestamps is in front of actual data,
@@ -1762,6 +1770,19 @@ static void tsnep_rx_reopen_xsk(struct tsnep_rx *rx)
                        allocated--;
                }
        }
+
+       /* set the need wakeup flag immediately if the ring is not filled
+        * completely; the first poll would be too late, as the need wakeup
+        * signal could then be delayed indefinitely
+        */
+       if (xsk_uses_need_wakeup(rx->xsk_pool)) {
+               int desc_available = tsnep_rx_desc_available(rx);
+
+               if (desc_available)
+                       xsk_set_rx_need_wakeup(rx->xsk_pool);
+               else
+                       xsk_clear_rx_need_wakeup(rx->xsk_pool);
+       }
 }
 
 static bool tsnep_pending(struct tsnep_queue *queue)
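Two fixes in the tsnep hunks are easy to miss: both RX paths now subtract ETH_FCS_LEN, since the descriptor length includes the 4-byte frame check sequence that XDP and AF_XDP consumers must not see, and the XSK reopen path sets the need-wakeup flag immediately when the ring could not be refilled completely, because with no pending descriptors there may be no interrupt to trigger the NAPI poll that would otherwise set it. A toy model of that flag decision:

	#include <stdbool.h>
	#include <stdio.h>

	/* mirrors the reopen logic above: any shortfall in filled RX
	 * descriptors means userspace must be told to kick the kernel */
	static bool rx_need_wakeup(int desc_available)
	{
		return desc_available != 0;
	}

	int main(void)
	{
		printf("fully filled   -> %d\n", rx_need_wakeup(0));	/* clear flag */
		printf("12 slots short -> %d\n", rx_need_wakeup(12));	/* set flag */
		return 0;
	}
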
index 07c2b701b5fa9793d30e27892536eca28ba0504a..9ebe751c1df0758c75ee493ddaa63876807f49be 100644 (file)
@@ -661,4 +661,5 @@ static struct platform_driver nps_enet_driver = {
 module_platform_driver(nps_enet_driver);
 
 MODULE_AUTHOR("EZchip Semiconductor");
+MODULE_DESCRIPTION("EZchip NPS Ethernet driver");
 MODULE_LICENSE("GPL v2");
index cffbf27c4656b27b0694f2a1ac88ad596c6196b4..bfdbdab443ae0ddcf93dca777f134e659b5d4040 100644 (file)
@@ -3216,4 +3216,5 @@ void enetc_pci_remove(struct pci_dev *pdev)
 }
 EXPORT_SYMBOL_GPL(enetc_pci_remove);
 
+MODULE_DESCRIPTION("NXP ENETC Ethernet driver");
 MODULE_LICENSE("Dual BSD/GPL");
index d42594f322750f5ff5d4cf039b1115e5d4424f31..432523b2c789216b21440e4e6576c06ae30b674c 100644 (file)
@@ -2036,6 +2036,7 @@ static void fec_enet_adjust_link(struct net_device *ndev)
 
                /* if any of the above changed restart the FEC */
                if (status_change) {
+                       netif_stop_queue(ndev);
                        napi_disable(&fep->napi);
                        netif_tx_lock_bh(ndev);
                        fec_restart(ndev);
@@ -2045,6 +2046,7 @@ static void fec_enet_adjust_link(struct net_device *ndev)
                }
        } else {
                if (fep->link) {
+                       netif_stop_queue(ndev);
                        napi_disable(&fep->napi);
                        netif_tx_lock_bh(ndev);
                        fec_stop(ndev);
@@ -4769,4 +4771,5 @@ static struct platform_driver fec_driver = {
 
 module_platform_driver(fec_driver);
 
+MODULE_DESCRIPTION("NXP Fast Ethernet Controller (FEC) driver");
 MODULE_LICENSE("GPL");
index 9ba15d3183d75726fd88fa6b27a6efaf1fc30790..758535adc9ff5bb0a043683e1875ff1f3c9c2005 100644 (file)
@@ -1073,6 +1073,14 @@ int memac_initialization(struct mac_device *mac_dev,
        unsigned long            capabilities;
        unsigned long           *supported;
 
+       /* The internal connection to the serdes is XGMII, but this isn't
+        * really correct for the phy mode (which is the external connection).
+        * However, this is how all older device trees say that they want
+        * 10GBASE-R (aka XFI), so just convert it for them.
+        */
+       if (mac_dev->phy_if == PHY_INTERFACE_MODE_XGMII)
+               mac_dev->phy_if = PHY_INTERFACE_MODE_10GBASER;
+
        mac_dev->phylink_ops            = &memac_mac_ops;
        mac_dev->set_promisc            = memac_set_promiscuous;
        mac_dev->change_addr            = memac_modify_mac_address;
@@ -1139,7 +1147,7 @@ int memac_initialization(struct mac_device *mac_dev,
         * (and therefore that xfi_pcs cannot be set). If we are defaulting to
         * XGMII, assume this is for XFI. Otherwise, assume it is for SGMII.
         */
-       if (err && mac_dev->phy_if == PHY_INTERFACE_MODE_XGMII)
+       if (err && mac_dev->phy_if == PHY_INTERFACE_MODE_10GBASER)
                memac->xfi_pcs = pcs;
        else
                memac->sgmii_pcs = pcs;
@@ -1153,14 +1161,6 @@ int memac_initialization(struct mac_device *mac_dev,
                goto _return_fm_mac_free;
        }
 
-       /* The internal connection to the serdes is XGMII, but this isn't
-        * really correct for the phy mode (which is the external connection).
-        * However, this is how all older device trees say that they want
-        * 10GBASE-R (aka XFI), so just convert it for them.
-        */
-       if (mac_dev->phy_if == PHY_INTERFACE_MODE_XGMII)
-               mac_dev->phy_if = PHY_INTERFACE_MODE_10GBASER;
-
        /* TODO: The following interface modes are supported by (some) hardware
         * but not by this driver:
         * - 1000BASE-KX
index 70dd982a5edce68a63a8f3b94d59c9947dd0045a..026f7270a54de8bf398516b4e563f35d4825a3d1 100644 (file)
@@ -531,4 +531,5 @@ static struct platform_driver fsl_pq_mdio_driver = {
 
 module_platform_driver(fsl_pq_mdio_driver);
 
+MODULE_DESCRIPTION("Freescale PQ MDIO helpers");
 MODULE_LICENSE("GPL");
index 7a8dc5386ffff9bd99d94eced337cf276551a88f..76615d47e055aebc9fcea0d365b28b4389337c07 100644 (file)
@@ -356,7 +356,7 @@ static enum pkt_hash_types gve_rss_type(__be16 pkt_flags)
 
 static struct sk_buff *gve_rx_add_frags(struct napi_struct *napi,
                                        struct gve_rx_slot_page_info *page_info,
-                                       u16 packet_buffer_size, u16 len,
+                                       unsigned int truesize, u16 len,
                                        struct gve_rx_ctx *ctx)
 {
        u32 offset = page_info->page_offset + page_info->pad;
@@ -389,10 +389,10 @@ static struct sk_buff *gve_rx_add_frags(struct napi_struct *napi,
        if (skb != ctx->skb_head) {
                ctx->skb_head->len += len;
                ctx->skb_head->data_len += len;
-               ctx->skb_head->truesize += packet_buffer_size;
+               ctx->skb_head->truesize += truesize;
        }
        skb_add_rx_frag(skb, num_frags, page_info->page,
-                       offset, len, packet_buffer_size);
+                       offset, len, truesize);
 
        return ctx->skb_head;
 }
@@ -486,7 +486,7 @@ static struct sk_buff *gve_rx_copy_to_pool(struct gve_rx_ring *rx,
 
                memcpy(alloc_page_info.page_address, src, page_info->pad + len);
                skb = gve_rx_add_frags(napi, &alloc_page_info,
-                                      rx->packet_buffer_size,
+                                      PAGE_SIZE,
                                       len, ctx);
 
                u64_stats_update_begin(&rx->statss);
index a187582d22994c607915f1fe26f5374031444976..ba9c19e6994c9defdf06eada37091e09d10881fa 100644 (file)
@@ -360,23 +360,43 @@ s32 e1000e_get_base_timinca(struct e1000_adapter *adapter, u32 *timinca);
  * As a result, a shift of INCVALUE_SHIFT_n is used to fit a value of
  * INCVALUE_n into the TIMINCA register allowing 32+8+(24-INCVALUE_SHIFT_n)
  * bits to count nanoseconds leaving the rest for fractional nanoseconds.
+ *
+ * Any given INCVALUE also has an associated maximum adjustment value. This
+ * maximum adjustment value is the largest increase (or decrease) which can be
+ * safely applied without overflowing the INCVALUE. Since INCVALUE has
+ * a maximum range of 24 bits, its largest value is 0xFFFFFF.
+ *
+ * To understand where the maximum value comes from, consider the following
+ * equation:
+ *
+ *   new_incval = base_incval + (base_incval * adjustment) / 1billion
+ *
+ * To avoid overflow that means:
+ *   max_incval = base_incval + (base_incval * max_adj) / 1billion
+ *
+ * Re-arranging:
+ *   max_adj = floor(((max_incval - base_incval) * 1billion) / base_incval)
  */
 #define INCVALUE_96MHZ         125
 #define INCVALUE_SHIFT_96MHZ   17
 #define INCPERIOD_SHIFT_96MHZ  2
 #define INCPERIOD_96MHZ                (12 >> INCPERIOD_SHIFT_96MHZ)
+#define MAX_PPB_96MHZ          23999900 /* 23,999,900 ppb */
 
 #define INCVALUE_25MHZ         40
 #define INCVALUE_SHIFT_25MHZ   18
 #define INCPERIOD_25MHZ                1
+#define MAX_PPB_25MHZ          599999900 /* 599,999,900 ppb */
 
 #define INCVALUE_24MHZ         125
 #define INCVALUE_SHIFT_24MHZ   14
 #define INCPERIOD_24MHZ                3
+#define MAX_PPB_24MHZ          999999999 /* 999,999,999 ppb */
 
 #define INCVALUE_38400KHZ      26
 #define INCVALUE_SHIFT_38400KHZ        19
 #define INCPERIOD_38400KHZ     1
+#define MAX_PPB_38400KHZ       230769100 /* 230,769,100 ppb */
 
 /* Another drawback of scaling the incvalue by a large factor is the
  * 64-bit SYSTIM register overflows more quickly.  This is dealt with
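With the re-arrangement above, the MAX_PPB_* values can be checked by hand. A standalone sketch for the 96 MHz case; the header's constant appears to round the exact result down slightly, presumably for safety margin:

	#include <stdint.h>
	#include <stdio.h>

	int main(void)
	{
		/* 96 MHz SYSTIM case from the defines above */
		uint64_t base_incval = 125ULL << 17;	/* INCVALUE_96MHZ << INCVALUE_SHIFT_96MHZ */
		uint64_t max_incval  = 0xFFFFFF;	/* 24-bit INCVALUE field */

		uint64_t max_adj = (max_incval - base_incval) * 1000000000ULL
				   / base_incval;

		/* prints 23999938; MAX_PPB_96MHZ (23,999,900) rounds this
		 * down to a slightly more conservative value */
		printf("%llu\n", (unsigned long long)max_adj);
		return 0;
	}
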
index a2788fd5f8bb857510e6c4b4af362188be996e68..19e450a5bd314ff67676843a767ec1c5c2bd84d4 100644 (file)
@@ -2559,7 +2559,7 @@ void e1000_copy_rx_addrs_to_phy_ich8lan(struct e1000_hw *hw)
                hw->phy.ops.write_reg_page(hw, BM_RAR_H(i),
                                           (u16)(mac_reg & 0xFFFF));
                hw->phy.ops.write_reg_page(hw, BM_RAR_CTRL(i),
-                                          FIELD_GET(E1000_RAH_AV, mac_reg));
+                                          (u16)((mac_reg & E1000_RAH_AV) >> 16));
        }
 
        e1000_disable_phy_wakeup_reg_access_bm(hw, &phy_reg);
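The ich8lan.c change reverts a FIELD_GET() conversion that altered semantics: E1000_RAH_AV is bit 31, and FIELD_GET() shifts a field down to bit 0, so it would hand the PHY-side register 0 or 1, while the BM_RAR_CTRL write needs the upper halfword shifted down by 16 so the valid bit lands in bit 15. A small demonstration of the difference:

	#include <stdint.h>
	#include <stdio.h>

	#define E1000_RAH_AV 0x80000000u	/* receive address valid, bit 31 */

	/* FIELD_GET-style extraction normalizes the field to bit 0... */
	static uint16_t as_field(uint32_t reg)
	{
		return (reg & E1000_RAH_AV) >> 31;		/* 0 or 1 */
	}

	/* ...but the BM_RAR_CTRL write needs the bit kept at position 15 */
	static uint16_t as_shift(uint32_t reg)
	{
		return (uint16_t)((reg & E1000_RAH_AV) >> 16);	/* 0x0000 or 0x8000 */
	}

	int main(void)
	{
		uint32_t rah = 0x8000abcd;

		printf("FIELD_GET-like: 0x%04x\n", as_field(rah));	/* 0x0001 */
		printf("raw shift:      0x%04x\n", as_shift(rah));	/* 0x8000 */
		return 0;
	}
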
index 02d871bc112a739cec1baffba5b63abaf14f4a7d..bbcfd529399b0fa938037858b1e2f7912f8e5a58 100644 (file)
@@ -280,8 +280,17 @@ void e1000e_ptp_init(struct e1000_adapter *adapter)
 
        switch (hw->mac.type) {
        case e1000_pch2lan:
+               adapter->ptp_clock_info.max_adj = MAX_PPB_96MHZ;
+               break;
        case e1000_pch_lpt:
+               if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI)
+                       adapter->ptp_clock_info.max_adj = MAX_PPB_96MHZ;
+               else
+                       adapter->ptp_clock_info.max_adj = MAX_PPB_25MHZ;
+               break;
        case e1000_pch_spt:
+               adapter->ptp_clock_info.max_adj = MAX_PPB_24MHZ;
+               break;
        case e1000_pch_cnp:
        case e1000_pch_tgp:
        case e1000_pch_adp:
@@ -289,15 +298,14 @@ void e1000e_ptp_init(struct e1000_adapter *adapter)
        case e1000_pch_lnp:
        case e1000_pch_ptp:
        case e1000_pch_nvp:
-               if ((hw->mac.type < e1000_pch_lpt) ||
-                   (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI)) {
-                       adapter->ptp_clock_info.max_adj = 24000000 - 1;
-                       break;
-               }
-               fallthrough;
+               if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI)
+                       adapter->ptp_clock_info.max_adj = MAX_PPB_24MHZ;
+               else
+                       adapter->ptp_clock_info.max_adj = MAX_PPB_38400KHZ;
+               break;
        case e1000_82574:
        case e1000_82583:
-               adapter->ptp_clock_info.max_adj = 600000000 - 1;
+               adapter->ptp_clock_info.max_adj = MAX_PPB_25MHZ;
                break;
        default:
                break;
index 9d88ed6105fd8f25ac8724827a9467f5043ee8b5..8db1eb0c1768c9869d7bf09820b4324b91bf788c 100644 (file)
@@ -1523,7 +1523,7 @@ void i40e_dcb_hw_rx_ets_bw_config(struct i40e_hw *hw, u8 *bw_share,
                reg = rd32(hw, I40E_PRTDCB_RETSTCC(i));
                reg &= ~(I40E_PRTDCB_RETSTCC_BWSHARE_MASK     |
                         I40E_PRTDCB_RETSTCC_UPINTC_MODE_MASK |
-                        I40E_PRTDCB_RETSTCC_ETSTC_SHIFT);
+                        I40E_PRTDCB_RETSTCC_ETSTC_MASK);
                reg |= FIELD_PREP(I40E_PRTDCB_RETSTCC_BWSHARE_MASK,
                                  bw_share[i]);
                reg |= FIELD_PREP(I40E_PRTDCB_RETSTCC_UPINTC_MODE_MASK,
index 6b60dc9b77361a2537466c18c8d78eafbc35a01a..d76497566e40e739fd7eba7773fbb84c02ddf93b 100644 (file)
@@ -43,7 +43,7 @@
 #define I40E_LLDP_TLV_SUBTYPE_SHIFT    0
 #define I40E_LLDP_TLV_SUBTYPE_MASK     (0xFF << I40E_LLDP_TLV_SUBTYPE_SHIFT)
 #define I40E_LLDP_TLV_OUI_SHIFT                8
-#define I40E_LLDP_TLV_OUI_MASK         (0xFFFFFF << I40E_LLDP_TLV_OUI_SHIFT)
+#define I40E_LLDP_TLV_OUI_MASK         (0xFFFFFFU << I40E_LLDP_TLV_OUI_SHIFT)
 
 /* Defines for IEEE ETS TLV */
 #define I40E_IEEE_ETS_MAXTC_SHIFT      0
index ae8f9f135725b4de88e9d95858025ce6ef41c650..89a3401d20ab4b1b429802930345c74e823f53e1 100644 (file)
@@ -3588,40 +3588,55 @@ static int i40e_configure_rx_ring(struct i40e_ring *ring)
        struct i40e_hmc_obj_rxq rx_ctx;
        int err = 0;
        bool ok;
-       int ret;
 
        bitmap_zero(ring->state, __I40E_RING_STATE_NBITS);
 
        /* clear the context structure first */
        memset(&rx_ctx, 0, sizeof(rx_ctx));
 
-       if (ring->vsi->type == I40E_VSI_MAIN)
-               xdp_rxq_info_unreg_mem_model(&ring->xdp_rxq);
+       ring->rx_buf_len = vsi->rx_buf_len;
+
+       /* XDP RX-queue info only needed for RX rings exposed to XDP */
+       if (ring->vsi->type != I40E_VSI_MAIN)
+               goto skip;
+
+       if (!xdp_rxq_info_is_reg(&ring->xdp_rxq)) {
+               err = __xdp_rxq_info_reg(&ring->xdp_rxq, ring->netdev,
+                                        ring->queue_index,
+                                        ring->q_vector->napi.napi_id,
+                                        ring->rx_buf_len);
+               if (err)
+                       return err;
+       }
 
        ring->xsk_pool = i40e_xsk_pool(ring);
        if (ring->xsk_pool) {
-               ring->rx_buf_len =
-                 xsk_pool_get_rx_frame_size(ring->xsk_pool);
-               ret = xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
+               xdp_rxq_info_unreg(&ring->xdp_rxq);
+               ring->rx_buf_len = xsk_pool_get_rx_frame_size(ring->xsk_pool);
+               err = __xdp_rxq_info_reg(&ring->xdp_rxq, ring->netdev,
+                                        ring->queue_index,
+                                        ring->q_vector->napi.napi_id,
+                                        ring->rx_buf_len);
+               if (err)
+                       return err;
+               err = xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
                                                 MEM_TYPE_XSK_BUFF_POOL,
                                                 NULL);
-               if (ret)
-                       return ret;
+               if (err)
+                       return err;
                dev_info(&vsi->back->pdev->dev,
                         "Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring %d\n",
                         ring->queue_index);
 
        } else {
-               ring->rx_buf_len = vsi->rx_buf_len;
-               if (ring->vsi->type == I40E_VSI_MAIN) {
-                       ret = xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
-                                                        MEM_TYPE_PAGE_SHARED,
-                                                        NULL);
-                       if (ret)
-                               return ret;
-               }
+               err = xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
+                                                MEM_TYPE_PAGE_SHARED,
+                                                NULL);
+               if (err)
+                       return err;
        }
 
+skip:
        xdp_init_buff(&ring->xdp, i40e_rx_pg_size(ring) / 2, &ring->xdp_rxq);
 
        rx_ctx.dbuff = DIV_ROUND_UP(ring->rx_buf_len,
@@ -4911,27 +4926,23 @@ int i40e_vsi_start_rings(struct i40e_vsi *vsi)
 void i40e_vsi_stop_rings(struct i40e_vsi *vsi)
 {
        struct i40e_pf *pf = vsi->back;
-       int pf_q, err, q_end;
+       u32 pf_q, tx_q_end, rx_q_end;
 
        /* When port TX is suspended, don't wait */
        if (test_bit(__I40E_PORT_SUSPENDED, vsi->back->state))
                return i40e_vsi_stop_rings_no_wait(vsi);
 
-       q_end = vsi->base_queue + vsi->num_queue_pairs;
-       for (pf_q = vsi->base_queue; pf_q < q_end; pf_q++)
-               i40e_pre_tx_queue_cfg(&pf->hw, (u32)pf_q, false);
+       tx_q_end = vsi->base_queue +
+               vsi->alloc_queue_pairs * (i40e_enabled_xdp_vsi(vsi) ? 2 : 1);
+       for (pf_q = vsi->base_queue; pf_q < tx_q_end; pf_q++)
+               i40e_pre_tx_queue_cfg(&pf->hw, pf_q, false);
 
-       for (pf_q = vsi->base_queue; pf_q < q_end; pf_q++) {
-               err = i40e_control_wait_rx_q(pf, pf_q, false);
-               if (err)
-                       dev_info(&pf->pdev->dev,
-                                "VSI seid %d Rx ring %d disable timeout\n",
-                                vsi->seid, pf_q);
-       }
+       rx_q_end = vsi->base_queue + vsi->num_queue_pairs;
+       for (pf_q = vsi->base_queue; pf_q < rx_q_end; pf_q++)
+               i40e_control_rx_q(pf, pf_q, false);
 
        msleep(I40E_DISABLE_TX_GAP_MSEC);
-       pf_q = vsi->base_queue;
-       for (pf_q = vsi->base_queue; pf_q < q_end; pf_q++)
+       for (pf_q = vsi->base_queue; pf_q < tx_q_end; pf_q++)
                wr32(&pf->hw, I40E_QTX_ENA(pf_q), 0);
 
        i40e_vsi_wait_queues_disabled(vsi);
@@ -5345,7 +5356,7 @@ static int i40e_pf_wait_queues_disabled(struct i40e_pf *pf)
 {
        int v, ret = 0;
 
-       for (v = 0; v < pf->hw.func_caps.num_vsis; v++) {
+       for (v = 0; v < pf->num_alloc_vsi; v++) {
                if (pf->vsi[v]) {
                        ret = i40e_vsi_wait_queues_disabled(pf->vsi[v]);
                        if (ret)
@@ -13549,9 +13560,9 @@ int i40e_queue_pair_disable(struct i40e_vsi *vsi, int queue_pair)
                return err;
 
        i40e_queue_pair_disable_irq(vsi, queue_pair);
+       i40e_queue_pair_toggle_napi(vsi, queue_pair, false /* off */);
        err = i40e_queue_pair_toggle_rings(vsi, queue_pair, false /* off */);
        i40e_clean_rx_ring(vsi->rx_rings[queue_pair]);
-       i40e_queue_pair_toggle_napi(vsi, queue_pair, false /* off */);
        i40e_queue_pair_clean_rings(vsi, queue_pair);
        i40e_queue_pair_reset_stats(vsi, queue_pair);
 
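The i40e_main.c rework (mirrored for ice in ice_base.c further down) moves XDP rxq registration out of descriptor setup and into ring configuration, because the frame size passed to __xdp_rxq_info_reg() must match the active buffer scheme: when an AF_XDP pool attaches, the queue info is unregistered and re-registered with xsk_pool_get_rx_frame_size() before the memory model is installed. A toy model of that ordering (sizes are illustrative):

	#include <stdbool.h>
	#include <stdio.h>

	struct rxq { bool reg; unsigned buf_len; const char *mem_model; };

	static void rxq_reg(struct rxq *q, unsigned buf_len)
	{
		q->reg = true;
		q->buf_len = buf_len;
	}

	static void rxq_unreg(struct rxq *q)
	{
		q->reg = false;
		q->mem_model = NULL;
	}

	int main(void)
	{
		struct rxq q = {0};

		rxq_reg(&q, 2048);		/* default path: shared page buffers */
		q.mem_model = "PAGE_SHARED";

		rxq_unreg(&q);			/* AF_XDP pool attached: start over */
		rxq_reg(&q, 3072);		/* e.g. xsk_pool_get_rx_frame_size() */
		q.mem_model = "XSK_BUFF_POOL";

		printf("len=%u model=%s\n", q.buf_len, q.mem_model);
		return 0;
	}
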
index af42693305815ca8711bb80d25b7221ca8981e16..ce1f11b8ad65c213bc0090e64dabe6e7295d736c 100644 (file)
@@ -567,8 +567,7 @@ static inline bool i40e_is_fw_ver_lt(struct i40e_hw *hw, u16 maj, u16 min)
  **/
 static inline bool i40e_is_fw_ver_eq(struct i40e_hw *hw, u16 maj, u16 min)
 {
-       return (hw->aq.fw_maj_ver > maj ||
-               (hw->aq.fw_maj_ver == maj && hw->aq.fw_min_ver == min));
+       return (hw->aq.fw_maj_ver == maj && hw->aq.fw_min_ver == min);
 }
 
 #endif /* _I40E_PROTOTYPE_H_ */
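The one-line i40e_prototype.h fix is worth spelling out: the old expression returned true for any newer major version, so a helper named _eq behaved like >=. A standalone before/after:

	#include <stdbool.h>
	#include <stdio.h>

	struct fw { unsigned maj, min; };

	/* before: true for any newer major, i.e. behaved like >= */
	static bool eq_buggy(struct fw hw, unsigned maj, unsigned min)
	{
		return hw.maj > maj || (hw.maj == maj && hw.min == min);
	}

	/* after: strict equality, matching the helper's name */
	static bool eq_fixed(struct fw hw, unsigned maj, unsigned min)
	{
		return hw.maj == maj && hw.min == min;
	}

	int main(void)
	{
		struct fw hw = { .maj = 10, .min = 0 };

		printf("buggy eq(9, 5) = %d\n", eq_buggy(hw, 9, 5));	/* 1: wrong */
		printf("fixed eq(9, 5) = %d\n", eq_fixed(hw, 9, 5));	/* 0 */
		return 0;
	}
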
index 971ba33220381b799e4900f6f2231b58bcb877e4..0d7177083708f29d3b4deba11d00abdcb017f886 100644 (file)
@@ -1548,7 +1548,6 @@ void i40e_free_rx_resources(struct i40e_ring *rx_ring)
 int i40e_setup_rx_descriptors(struct i40e_ring *rx_ring)
 {
        struct device *dev = rx_ring->dev;
-       int err;
 
        u64_stats_init(&rx_ring->syncp);
 
@@ -1569,14 +1568,6 @@ int i40e_setup_rx_descriptors(struct i40e_ring *rx_ring)
        rx_ring->next_to_process = 0;
        rx_ring->next_to_use = 0;
 
-       /* XDP RX-queue info only needed for RX rings exposed to XDP */
-       if (rx_ring->vsi->type == I40E_VSI_MAIN) {
-               err = xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev,
-                                      rx_ring->queue_index, rx_ring->q_vector->napi.napi_id);
-               if (err < 0)
-                       return err;
-       }
-
        rx_ring->xdp_prog = rx_ring->vsi->xdp_prog;
 
        rx_ring->rx_bi =
@@ -2087,7 +2078,8 @@ static void i40e_put_rx_buffer(struct i40e_ring *rx_ring,
 static void i40e_process_rx_buffs(struct i40e_ring *rx_ring, int xdp_res,
                                  struct xdp_buff *xdp)
 {
-       u32 next = rx_ring->next_to_clean;
+       u32 nr_frags = xdp_get_shared_info_from_buff(xdp)->nr_frags;
+       u32 next = rx_ring->next_to_clean, i = 0;
        struct i40e_rx_buffer *rx_buffer;
 
        xdp->flags = 0;
@@ -2100,10 +2092,10 @@ static void i40e_process_rx_buffs(struct i40e_ring *rx_ring, int xdp_res,
                if (!rx_buffer->page)
                        continue;
 
-               if (xdp_res == I40E_XDP_CONSUMED)
-                       rx_buffer->pagecnt_bias++;
-               else
+               if (xdp_res != I40E_XDP_CONSUMED)
                        i40e_rx_buffer_flip(rx_buffer, xdp->frame_sz);
+               else if (i++ <= nr_frags)
+                       rx_buffer->pagecnt_bias++;
 
                /* EOP buffer will be put in i40e_clean_rx_irq() */
                if (next == rx_ring->next_to_process)
@@ -2117,20 +2109,20 @@ static void i40e_process_rx_buffs(struct i40e_ring *rx_ring, int xdp_res,
  * i40e_construct_skb - Allocate skb and populate it
  * @rx_ring: rx descriptor ring to transact packets on
  * @xdp: xdp_buff pointing to the data
- * @nr_frags: number of buffers for the packet
  *
  * This function allocates an skb.  It then populates it with the page
  * data from the current receive descriptor, taking care to set up the
  * skb correctly.
  */
 static struct sk_buff *i40e_construct_skb(struct i40e_ring *rx_ring,
-                                         struct xdp_buff *xdp,
-                                         u32 nr_frags)
+                                         struct xdp_buff *xdp)
 {
        unsigned int size = xdp->data_end - xdp->data;
        struct i40e_rx_buffer *rx_buffer;
+       struct skb_shared_info *sinfo;
        unsigned int headlen;
        struct sk_buff *skb;
+       u32 nr_frags = 0;
 
        /* prefetch first cache line of first page */
        net_prefetch(xdp->data);
@@ -2168,6 +2160,10 @@ static struct sk_buff *i40e_construct_skb(struct i40e_ring *rx_ring,
        memcpy(__skb_put(skb, headlen), xdp->data,
               ALIGN(headlen, sizeof(long)));
 
+       if (unlikely(xdp_buff_has_frags(xdp))) {
+               sinfo = xdp_get_shared_info_from_buff(xdp);
+               nr_frags = sinfo->nr_frags;
+       }
        rx_buffer = i40e_rx_bi(rx_ring, rx_ring->next_to_clean);
        /* update all of the pointers */
        size -= headlen;
@@ -2187,9 +2183,8 @@ static struct sk_buff *i40e_construct_skb(struct i40e_ring *rx_ring,
        }
 
        if (unlikely(xdp_buff_has_frags(xdp))) {
-               struct skb_shared_info *sinfo, *skinfo = skb_shinfo(skb);
+               struct skb_shared_info *skinfo = skb_shinfo(skb);
 
-               sinfo = xdp_get_shared_info_from_buff(xdp);
                memcpy(&skinfo->frags[skinfo->nr_frags], &sinfo->frags[0],
                       sizeof(skb_frag_t) * nr_frags);
 
@@ -2212,17 +2207,17 @@ static struct sk_buff *i40e_construct_skb(struct i40e_ring *rx_ring,
  * i40e_build_skb - Build skb around an existing buffer
  * @rx_ring: Rx descriptor ring to transact packets on
  * @xdp: xdp_buff pointing to the data
- * @nr_frags: number of buffers for the packet
  *
  * This function builds an skb around an existing Rx buffer, taking care
  * to set up the skb correctly and avoid any memcpy overhead.
  */
 static struct sk_buff *i40e_build_skb(struct i40e_ring *rx_ring,
-                                     struct xdp_buff *xdp,
-                                     u32 nr_frags)
+                                     struct xdp_buff *xdp)
 {
        unsigned int metasize = xdp->data - xdp->data_meta;
+       struct skb_shared_info *sinfo;
        struct sk_buff *skb;
+       u32 nr_frags;
 
        /* Prefetch first cache line of first page. If xdp->data_meta
         * is unused, this points exactly as xdp->data, otherwise we
@@ -2231,6 +2226,11 @@ static struct sk_buff *i40e_build_skb(struct i40e_ring *rx_ring,
         */
        net_prefetch(xdp->data_meta);
 
+       if (unlikely(xdp_buff_has_frags(xdp))) {
+               sinfo = xdp_get_shared_info_from_buff(xdp);
+               nr_frags = sinfo->nr_frags;
+       }
+
        /* build an skb around the page buffer */
        skb = napi_build_skb(xdp->data_hard_start, xdp->frame_sz);
        if (unlikely(!skb))
@@ -2243,9 +2243,6 @@ static struct sk_buff *i40e_build_skb(struct i40e_ring *rx_ring,
                skb_metadata_set(skb, metasize);
 
        if (unlikely(xdp_buff_has_frags(xdp))) {
-               struct skb_shared_info *sinfo;
-
-               sinfo = xdp_get_shared_info_from_buff(xdp);
                xdp_update_skb_shared_info(skb, nr_frags,
                                           sinfo->xdp_frags_size,
                                           nr_frags * xdp->frame_sz,
@@ -2589,9 +2586,9 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget,
                        total_rx_bytes += size;
                } else {
                        if (ring_uses_build_skb(rx_ring))
-                               skb = i40e_build_skb(rx_ring, xdp, nfrags);
+                               skb = i40e_build_skb(rx_ring, xdp);
                        else
-                               skb = i40e_construct_skb(rx_ring, xdp, nfrags);
+                               skb = i40e_construct_skb(rx_ring, xdp);
 
                        /* drop if we failed to retrieve a buffer */
                        if (!skb) {
index 908cdbd3ec5d4fafe26fa03b4a41433f04ef5d31..b34c7177088745468ad33817302a631a5162322a 100644 (file)
@@ -2848,6 +2848,24 @@ error_param:
                                      (u8 *)&stats, sizeof(stats));
 }
 
+/**
+ * i40e_can_vf_change_mac - check if the VF may change its MAC filters
+ * @vf: pointer to the VF info
+ *
+ * Return true if the VF is allowed to change its MAC filters, false otherwise
+ */
+static bool i40e_can_vf_change_mac(struct i40e_vf *vf)
+{
+       /* If the VF MAC address has been set administratively (via the
+        * ndo_set_vf_mac command), then deny permission to the VF to
+        * add/delete unicast MAC addresses, unless the VF is trusted
+        */
+       if (vf->pf_set_mac && !vf->trusted)
+               return false;
+
+       return true;
+}
+
 #define I40E_MAX_MACVLAN_PER_HW 3072
 #define I40E_MAX_MACVLAN_PER_PF(num_ports) (I40E_MAX_MACVLAN_PER_HW /  \
        (num_ports))
@@ -2907,8 +2925,8 @@ static inline int i40e_check_vf_permission(struct i40e_vf *vf,
                 * The VF may request to set the MAC address filter already
                 * assigned to it so do not return an error in that case.
                 */
-               if (!test_bit(I40E_VIRTCHNL_VF_CAP_PRIVILEGE, &vf->vf_caps) &&
-                   !is_multicast_ether_addr(addr) && vf->pf_set_mac &&
+               if (!i40e_can_vf_change_mac(vf) &&
+                   !is_multicast_ether_addr(addr) &&
                    !ether_addr_equal(addr, vf->default_lan_addr.addr)) {
                        dev_err(&pf->pdev->dev,
                                "VF attempting to override administratively set MAC address, bring down and up the VF interface to resume normal operation\n");
@@ -3114,19 +3132,29 @@ static int i40e_vc_del_mac_addr_msg(struct i40e_vf *vf, u8 *msg)
                        ret = -EINVAL;
                        goto error_param;
                }
-               if (ether_addr_equal(al->list[i].addr, vf->default_lan_addr.addr))
-                       was_unimac_deleted = true;
        }
        vsi = pf->vsi[vf->lan_vsi_idx];
 
        spin_lock_bh(&vsi->mac_filter_hash_lock);
        /* delete addresses from the list */
-       for (i = 0; i < al->num_elements; i++)
+       for (i = 0; i < al->num_elements; i++) {
+               const u8 *addr = al->list[i].addr;
+
+               /* Allow to delete VF primary MAC only if it was not set
+                * administratively by PF or if VF is trusted.
+                */
+               if (ether_addr_equal(addr, vf->default_lan_addr.addr) &&
+                   i40e_can_vf_change_mac(vf))
+                       was_unimac_deleted = true;
+               else
+                       continue;
+
                if (i40e_del_mac_filter(vsi, al->list[i].addr)) {
                        ret = -EINVAL;
                        spin_unlock_bh(&vsi->mac_filter_hash_lock);
                        goto error_param;
                }
+       }
 
        spin_unlock_bh(&vsi->mac_filter_hash_lock);
 
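The virtchnl hunks funnel two checks through one predicate: a VF whose MAC was set administratively via ndo_set_vf_mac may neither override nor delete that address unless it is trusted. The policy as a truth table (illustrative, not driver code):

	#include <stdbool.h>
	#include <stdio.h>

	/* policy above: a PF-administered MAC is locked unless the VF is trusted */
	static bool can_vf_change_mac(bool pf_set_mac, bool trusted)
	{
		return !(pf_set_mac && !trusted);
	}

	int main(void)
	{
		printf("%d %d %d %d\n",
		       can_vf_change_mac(false, false),	/* 1 */
		       can_vf_change_mac(false, true),	/* 1 */
		       can_vf_change_mac(true,  false),	/* 0: locked */
		       can_vf_change_mac(true,  true));	/* 1: trusted */
		return 0;
	}
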
index af7d5fa6cdc15552935b03e5beaaaaac856b7d3f..11500003af0d47dbfb203ea51914c2f452b42368 100644 (file)
@@ -414,7 +414,8 @@ i40e_add_xsk_frag(struct i40e_ring *rx_ring, struct xdp_buff *first,
        }
 
        __skb_fill_page_desc_noacc(sinfo, sinfo->nr_frags++,
-                                  virt_to_page(xdp->data_hard_start), 0, size);
+                                  virt_to_page(xdp->data_hard_start),
+                                  XDP_PACKET_HEADROOM, size);
        sinfo->xdp_frags_size += size;
        xsk_buff_add_frag(xdp);
 
@@ -498,7 +499,6 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
                xdp_res = i40e_run_xdp_zc(rx_ring, first, xdp_prog);
                i40e_handle_xdp_result_zc(rx_ring, first, rx_desc, &rx_packets,
                                          &rx_bytes, xdp_res, &failure);
-               first->flags = 0;
                next_to_clean = next_to_process;
                if (failure)
                        break;
index 533b923cae2d078dfecdc902d4605d08b0d7391e..c979192e44d108b370ad132ec900c19d8452db32 100644 (file)
@@ -190,15 +190,13 @@ static void ice_free_q_vector(struct ice_vsi *vsi, int v_idx)
        q_vector = vsi->q_vectors[v_idx];
 
        ice_for_each_tx_ring(tx_ring, q_vector->tx) {
-               if (vsi->netdev)
-                       netif_queue_set_napi(vsi->netdev, tx_ring->q_index,
-                                            NETDEV_QUEUE_TYPE_TX, NULL);
+               ice_queue_set_napi(vsi, tx_ring->q_index, NETDEV_QUEUE_TYPE_TX,
+                                  NULL);
                tx_ring->q_vector = NULL;
        }
        ice_for_each_rx_ring(rx_ring, q_vector->rx) {
-               if (vsi->netdev)
-                       netif_queue_set_napi(vsi->netdev, rx_ring->q_index,
-                                            NETDEV_QUEUE_TYPE_RX, NULL);
+               ice_queue_set_napi(vsi, rx_ring->q_index, NETDEV_QUEUE_TYPE_RX,
+                                  NULL);
                rx_ring->q_vector = NULL;
        }
 
@@ -547,19 +545,27 @@ int ice_vsi_cfg_rxq(struct ice_rx_ring *ring)
        ring->rx_buf_len = ring->vsi->rx_buf_len;
 
        if (ring->vsi->type == ICE_VSI_PF) {
-               if (!xdp_rxq_info_is_reg(&ring->xdp_rxq))
-                       /* coverity[check_return] */
-                       __xdp_rxq_info_reg(&ring->xdp_rxq, ring->netdev,
-                                          ring->q_index,
-                                          ring->q_vector->napi.napi_id,
-                                          ring->vsi->rx_buf_len);
+               if (!xdp_rxq_info_is_reg(&ring->xdp_rxq)) {
+                       err = __xdp_rxq_info_reg(&ring->xdp_rxq, ring->netdev,
+                                                ring->q_index,
+                                                ring->q_vector->napi.napi_id,
+                                                ring->rx_buf_len);
+                       if (err)
+                               return err;
+               }
 
                ring->xsk_pool = ice_xsk_pool(ring);
                if (ring->xsk_pool) {
-                       xdp_rxq_info_unreg_mem_model(&ring->xdp_rxq);
+                       xdp_rxq_info_unreg(&ring->xdp_rxq);
 
                        ring->rx_buf_len =
                                xsk_pool_get_rx_frame_size(ring->xsk_pool);
+                       err = __xdp_rxq_info_reg(&ring->xdp_rxq, ring->netdev,
+                                                ring->q_index,
+                                                ring->q_vector->napi.napi_id,
+                                                ring->rx_buf_len);
+                       if (err)
+                               return err;
                        err = xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
                                                         MEM_TYPE_XSK_BUFF_POOL,
                                                         NULL);
@@ -571,13 +577,14 @@ int ice_vsi_cfg_rxq(struct ice_rx_ring *ring)
                        dev_info(dev, "Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring %d\n",
                                 ring->q_index);
                } else {
-                       if (!xdp_rxq_info_is_reg(&ring->xdp_rxq))
-                               /* coverity[check_return] */
-                               __xdp_rxq_info_reg(&ring->xdp_rxq,
-                                                  ring->netdev,
-                                                  ring->q_index,
-                                                  ring->q_vector->napi.napi_id,
-                                                  ring->vsi->rx_buf_len);
+                       if (!xdp_rxq_info_is_reg(&ring->xdp_rxq)) {
+                               err = __xdp_rxq_info_reg(&ring->xdp_rxq, ring->netdev,
+                                                        ring->q_index,
+                                                        ring->q_vector->napi.napi_id,
+                                                        ring->rx_buf_len);
+                               if (err)
+                                       return err;
+                       }
 
                        err = xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
                                                         MEM_TYPE_PAGE_SHARED,
index b9c5eced6326f8fe3958c446f3e8b8bb0c517f90..bd9b1fed74ab86d3da3d9941f4139bbd46f3115e 100644 (file)
@@ -30,6 +30,26 @@ static const char * const pin_type_name[] = {
        [ICE_DPLL_PIN_TYPE_RCLK_INPUT] = "rclk-input",
 };
 
+/**
+ * ice_dpll_is_reset - check if reset is in progress
+ * @pf: private board structure
+ * @extack: error reporting
+ *
+ * If reset is in progress, fill extack with error.
+ *
+ * Return:
+ * * false - no reset in progress
+ * * true - reset in progress
+ */
+static bool ice_dpll_is_reset(struct ice_pf *pf, struct netlink_ext_ack *extack)
+{
+       if (ice_is_reset_in_progress(pf->state)) {
+               NL_SET_ERR_MSG(extack, "PF reset in progress");
+               return true;
+       }
+       return false;
+}
+
 /**
  * ice_dpll_pin_freq_set - set pin's frequency
  * @pf: private board structure
@@ -109,6 +129,9 @@ ice_dpll_frequency_set(const struct dpll_pin *pin, void *pin_priv,
        struct ice_pf *pf = d->pf;
        int ret;
 
+       if (ice_dpll_is_reset(pf, extack))
+               return -EBUSY;
+
        mutex_lock(&pf->dplls.lock);
        ret = ice_dpll_pin_freq_set(pf, p, pin_type, frequency, extack);
        mutex_unlock(&pf->dplls.lock);
@@ -254,6 +277,7 @@ ice_dpll_output_frequency_get(const struct dpll_pin *pin, void *pin_priv,
  * ice_dpll_pin_enable - enable a pin on dplls
  * @hw: board private hw structure
  * @pin: pointer to a pin
+ * @dpll_idx: dpll index to connect to output pin
  * @pin_type: type of pin being enabled
  * @extack: error reporting
  *
@@ -266,7 +290,7 @@ ice_dpll_output_frequency_get(const struct dpll_pin *pin, void *pin_priv,
  */
 static int
 ice_dpll_pin_enable(struct ice_hw *hw, struct ice_dpll_pin *pin,
-                   enum ice_dpll_pin_type pin_type,
+                   u8 dpll_idx, enum ice_dpll_pin_type pin_type,
                    struct netlink_ext_ack *extack)
 {
        u8 flags = 0;
@@ -280,10 +304,12 @@ ice_dpll_pin_enable(struct ice_hw *hw, struct ice_dpll_pin *pin,
                ret = ice_aq_set_input_pin_cfg(hw, pin->idx, 0, flags, 0, 0);
                break;
        case ICE_DPLL_PIN_TYPE_OUTPUT:
+               flags = ICE_AQC_SET_CGU_OUT_CFG_UPDATE_SRC_SEL;
                if (pin->flags[0] & ICE_AQC_GET_CGU_OUT_CFG_ESYNC_EN)
                        flags |= ICE_AQC_SET_CGU_OUT_CFG_ESYNC_EN;
                flags |= ICE_AQC_SET_CGU_OUT_CFG_OUT_EN;
-               ret = ice_aq_set_output_pin_cfg(hw, pin->idx, flags, 0, 0, 0);
+               ret = ice_aq_set_output_pin_cfg(hw, pin->idx, flags, dpll_idx,
+                                               0, 0);
                break;
        default:
                return -EINVAL;
@@ -370,7 +396,7 @@ ice_dpll_pin_state_update(struct ice_pf *pf, struct ice_dpll_pin *pin,
        case ICE_DPLL_PIN_TYPE_INPUT:
                ret = ice_aq_get_input_pin_cfg(&pf->hw, pin->idx, NULL, NULL,
                                               NULL, &pin->flags[0],
-                                              &pin->freq, NULL);
+                                              &pin->freq, &pin->phase_adjust);
                if (ret)
                        goto err;
                if (ICE_AQC_GET_CGU_IN_CFG_FLG2_INPUT_EN & pin->flags[0]) {
@@ -398,14 +424,27 @@ ice_dpll_pin_state_update(struct ice_pf *pf, struct ice_dpll_pin *pin,
                break;
        case ICE_DPLL_PIN_TYPE_OUTPUT:
                ret = ice_aq_get_output_pin_cfg(&pf->hw, pin->idx,
-                                               &pin->flags[0], NULL,
+                                               &pin->flags[0], &parent,
                                                &pin->freq, NULL);
                if (ret)
                        goto err;
-               if (ICE_AQC_SET_CGU_OUT_CFG_OUT_EN & pin->flags[0])
-                       pin->state[0] = DPLL_PIN_STATE_CONNECTED;
-               else
-                       pin->state[0] = DPLL_PIN_STATE_DISCONNECTED;
+
+               parent &= ICE_AQC_GET_CGU_OUT_CFG_DPLL_SRC_SEL;
+               if (ICE_AQC_SET_CGU_OUT_CFG_OUT_EN & pin->flags[0]) {
+                       pin->state[pf->dplls.eec.dpll_idx] =
+                               parent == pf->dplls.eec.dpll_idx ?
+                               DPLL_PIN_STATE_CONNECTED :
+                               DPLL_PIN_STATE_DISCONNECTED;
+                       pin->state[pf->dplls.pps.dpll_idx] =
+                               parent == pf->dplls.pps.dpll_idx ?
+                               DPLL_PIN_STATE_CONNECTED :
+                               DPLL_PIN_STATE_DISCONNECTED;
+               } else {
+                       pin->state[pf->dplls.eec.dpll_idx] =
+                               DPLL_PIN_STATE_DISCONNECTED;
+                       pin->state[pf->dplls.pps.dpll_idx] =
+                               DPLL_PIN_STATE_DISCONNECTED;
+               }
                break;
        case ICE_DPLL_PIN_TYPE_RCLK_INPUT:
                for (parent = 0; parent < pf->dplls.rclk.num_parents;
@@ -568,9 +607,13 @@ ice_dpll_pin_state_set(const struct dpll_pin *pin, void *pin_priv,
        struct ice_pf *pf = d->pf;
        int ret;
 
+       if (ice_dpll_is_reset(pf, extack))
+               return -EBUSY;
+
        mutex_lock(&pf->dplls.lock);
        if (enable)
-               ret = ice_dpll_pin_enable(&pf->hw, p, pin_type, extack);
+               ret = ice_dpll_pin_enable(&pf->hw, p, d->dpll_idx, pin_type,
+                                         extack);
        else
                ret = ice_dpll_pin_disable(&pf->hw, p, pin_type, extack);
        if (!ret)
@@ -603,6 +646,11 @@ ice_dpll_output_state_set(const struct dpll_pin *pin, void *pin_priv,
                          struct netlink_ext_ack *extack)
 {
        bool enable = state == DPLL_PIN_STATE_CONNECTED;
+       struct ice_dpll_pin *p = pin_priv;
+       struct ice_dpll *d = dpll_priv;
+
+       if (!enable && p->state[d->dpll_idx] == DPLL_PIN_STATE_DISCONNECTED)
+               return 0;
 
        return ice_dpll_pin_state_set(pin, pin_priv, dpll, dpll_priv, enable,
                                      extack, ICE_DPLL_PIN_TYPE_OUTPUT);
@@ -665,14 +713,16 @@ ice_dpll_pin_state_get(const struct dpll_pin *pin, void *pin_priv,
        struct ice_pf *pf = d->pf;
        int ret;
 
+       if (ice_dpll_is_reset(pf, extack))
+               return -EBUSY;
+
        mutex_lock(&pf->dplls.lock);
        ret = ice_dpll_pin_state_update(pf, p, pin_type, extack);
        if (ret)
                goto unlock;
-       if (pin_type == ICE_DPLL_PIN_TYPE_INPUT)
+       if (pin_type == ICE_DPLL_PIN_TYPE_INPUT ||
+           pin_type == ICE_DPLL_PIN_TYPE_OUTPUT)
                *state = p->state[d->dpll_idx];
-       else if (pin_type == ICE_DPLL_PIN_TYPE_OUTPUT)
-               *state = p->state[0];
        ret = 0;
 unlock:
        mutex_unlock(&pf->dplls.lock);
@@ -790,6 +840,9 @@ ice_dpll_input_prio_set(const struct dpll_pin *pin, void *pin_priv,
        struct ice_pf *pf = d->pf;
        int ret;
 
+       if (ice_dpll_is_reset(pf, extack))
+               return -EBUSY;
+
        mutex_lock(&pf->dplls.lock);
        ret = ice_dpll_hw_input_prio_set(pf, d, p, prio, extack);
        mutex_unlock(&pf->dplls.lock);
@@ -910,6 +963,9 @@ ice_dpll_pin_phase_adjust_set(const struct dpll_pin *pin, void *pin_priv,
        u8 flag, flags_en = 0;
        int ret;
 
+       if (ice_dpll_is_reset(pf, extack))
+               return -EBUSY;
+
        mutex_lock(&pf->dplls.lock);
        switch (type) {
        case ICE_DPLL_PIN_TYPE_INPUT:
@@ -1069,6 +1125,9 @@ ice_dpll_rclk_state_on_pin_set(const struct dpll_pin *pin, void *pin_priv,
        int ret = -EINVAL;
        u32 hw_idx;
 
+       if (ice_dpll_is_reset(pf, extack))
+               return -EBUSY;
+
        mutex_lock(&pf->dplls.lock);
        hw_idx = parent->idx - pf->dplls.base_rclk_idx;
        if (hw_idx >= pf->dplls.num_inputs)
@@ -1123,6 +1182,9 @@ ice_dpll_rclk_state_on_pin_get(const struct dpll_pin *pin, void *pin_priv,
        int ret = -EINVAL;
        u32 hw_idx;
 
+       if (ice_dpll_is_reset(pf, extack))
+               return -EBUSY;
+
        mutex_lock(&pf->dplls.lock);
        hw_idx = parent->idx - pf->dplls.base_rclk_idx;
        if (hw_idx >= pf->dplls.num_inputs)
@@ -1305,8 +1367,10 @@ static void ice_dpll_periodic_work(struct kthread_work *work)
        struct ice_pf *pf = container_of(d, struct ice_pf, dplls);
        struct ice_dpll *de = &pf->dplls.eec;
        struct ice_dpll *dp = &pf->dplls.pps;
-       int ret;
+       int ret = 0;
 
+       if (ice_is_reset_in_progress(pf->state))
+               goto resched;
        mutex_lock(&pf->dplls.lock);
        ret = ice_dpll_update_state(pf, de, false);
        if (!ret)
@@ -1326,6 +1390,7 @@ static void ice_dpll_periodic_work(struct kthread_work *work)
        ice_dpll_notify_changes(de);
        ice_dpll_notify_changes(dp);
 
+resched:
        /* Run twice a second or reschedule if update failed */
        kthread_queue_delayed_work(d->kworker, &d->work,
                                   ret ? msecs_to_jiffies(10) :
@@ -1532,7 +1597,7 @@ static void ice_dpll_deinit_rclk_pin(struct ice_pf *pf)
        }
        if (WARN_ON_ONCE(!vsi || !vsi->netdev))
                return;
-       netdev_dpll_pin_clear(vsi->netdev);
+       dpll_netdev_pin_clear(vsi->netdev);
        dpll_pin_put(rclk->pin);
 }
 
@@ -1576,7 +1641,7 @@ ice_dpll_init_rclk_pins(struct ice_pf *pf, struct ice_dpll_pin *pin,
        }
        if (WARN_ON((!vsi || !vsi->netdev)))
                return -EINVAL;
-       netdev_dpll_pin_set(vsi->netdev, pf->dplls.rclk.pin);
+       dpll_netdev_pin_set(vsi->netdev, pf->dplls.rclk.pin);
 
        return 0;
 
@@ -2055,6 +2120,7 @@ void ice_dpll_init(struct ice_pf *pf)
        struct ice_dplls *d = &pf->dplls;
        int err = 0;
 
+       mutex_init(&d->lock);
        err = ice_dpll_init_info(pf, cgu);
        if (err)
                goto err_exit;
@@ -2067,7 +2133,6 @@ void ice_dpll_init(struct ice_pf *pf)
        err = ice_dpll_init_pins(pf, cgu);
        if (err)
                goto deinit_pps;
-       mutex_init(&d->lock);
        if (cgu) {
                err = ice_dpll_init_worker(pf);
                if (err)
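The ice_dpll.c pattern is uniform: every user-facing callback first tests for a PF reset and fails fast with -EBUSY plus an extack message rather than touching firmware mid-reset, the periodic worker simply reschedules, and the dplls lock is now initialized before any path that might take it. A toy of the guard shape:

	#include <errno.h>
	#include <stdbool.h>
	#include <stdio.h>

	static bool reset_in_progress;

	/* every op checks the reset flag before taking the lock or issuing
	 * admin-queue commands, reporting the reason via an extack-style string */
	static int dpll_op(const char **extack)
	{
		if (reset_in_progress) {
			*extack = "PF reset in progress";
			return -EBUSY;
		}
		*extack = "";
		return 0;	/* lock, talk to firmware, unlock */
	}

	int main(void)
	{
		const char *msg;

		reset_in_progress = true;
		printf("ret=%d (%s)\n", dpll_op(&msg), msg);	/* ret=-16 */
		return 0;
	}
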
index 2a25323105e5b9bd5a1dbf072f097ddd90872210..467372d541d21f9c26416275f945b12e684079ef 100644 (file)
@@ -151,6 +151,27 @@ ice_lag_find_hw_by_lport(struct ice_lag *lag, u8 lport)
        return NULL;
 }
 
+/**
+ * ice_pkg_has_lport_extract - check if lport extraction supported
+ * @hw: HW struct
+ */
+static bool ice_pkg_has_lport_extract(struct ice_hw *hw)
+{
+       int i;
+
+       for (i = 0; i < hw->blk[ICE_BLK_SW].es.count; i++) {
+               u16 offset;
+               u8 fv_prot;
+
+               ice_find_prot_off(hw, ICE_BLK_SW, ICE_SW_DEFAULT_PROFILE, i,
+                                 &fv_prot, &offset);
+               if (fv_prot == ICE_FV_PROT_MDID &&
+                   offset == ICE_LP_EXT_BUF_OFFSET)
+                       return true;
+       }
+       return false;
+}
+
 /**
  * ice_lag_find_primary - returns pointer to primary interfaces lag struct
  * @lag: local interfaces lag struct
@@ -1206,7 +1227,7 @@ static void ice_lag_del_prune_list(struct ice_lag *lag, struct ice_pf *event_pf)
 }
 
 /**
- * ice_lag_init_feature_support_flag - Check for NVM support for LAG
+ * ice_lag_init_feature_support_flag - Check for package and NVM support for LAG
  * @pf: PF struct
  */
 static void ice_lag_init_feature_support_flag(struct ice_pf *pf)
@@ -1219,7 +1240,7 @@ static void ice_lag_init_feature_support_flag(struct ice_pf *pf)
        else
                ice_clear_feature_support(pf, ICE_F_ROCE_LAG);
 
-       if (caps->sriov_lag)
+       if (caps->sriov_lag && ice_pkg_has_lport_extract(&pf->hw))
                ice_set_feature_support(pf, ICE_F_SRIOV_LAG);
        else
                ice_clear_feature_support(pf, ICE_F_SRIOV_LAG);
index ede833dfa65866da00d8f4a6d77a90470906f863..183b38792ef22d9ac54daa0ea4a8156039681e7c 100644 (file)
@@ -17,6 +17,9 @@ enum ice_lag_role {
 #define ICE_LAG_INVALID_PORT 0xFF
 
 #define ICE_LAG_RESET_RETRIES          5
+#define ICE_SW_DEFAULT_PROFILE         0
+#define ICE_FV_PROT_MDID               255
+#define ICE_LP_EXT_BUF_OFFSET          32
 
 struct ice_pf;
 struct ice_vf;
index 9be724291ef82ac7e05c198d9febe029b4946a5e..fc23dbe302b46fa35e97ea425add87711abf66ee 100644 (file)
@@ -2426,7 +2426,7 @@ ice_vsi_cfg_def(struct ice_vsi *vsi, struct ice_vsi_cfg_params *params)
                ice_vsi_map_rings_to_vectors(vsi);
 
                /* Associate q_vector rings to napi */
-               ice_vsi_set_napi_queues(vsi, true);
+               ice_vsi_set_napi_queues(vsi);
 
                vsi->stat_offsets_loaded = false;
 
@@ -2904,19 +2904,19 @@ void ice_vsi_dis_irq(struct ice_vsi *vsi)
 }
 
 /**
- * ice_queue_set_napi - Set the napi instance for the queue
+ * __ice_queue_set_napi - Set the napi instance for the queue
  * @dev: device to which NAPI and queue belong
  * @queue_index: Index of queue
  * @type: queue type as RX or TX
  * @napi: NAPI context
  * @locked: is the rtnl_lock already held
  *
- * Set the napi instance for the queue
+ * Set the napi instance for the queue. Caller indicates the lock status.
  */
 static void
-ice_queue_set_napi(struct net_device *dev, unsigned int queue_index,
-                  enum netdev_queue_type type, struct napi_struct *napi,
-                  bool locked)
+__ice_queue_set_napi(struct net_device *dev, unsigned int queue_index,
+                    enum netdev_queue_type type, struct napi_struct *napi,
+                    bool locked)
 {
        if (!locked)
                rtnl_lock();
@@ -2926,26 +2926,79 @@ ice_queue_set_napi(struct net_device *dev, unsigned int queue_index,
 }
 
 /**
- * ice_q_vector_set_napi_queues - Map queue[s] associated with the napi
+ * ice_queue_set_napi - Set the napi instance for the queue
+ * @vsi: VSI being configured
+ * @queue_index: Index of queue
+ * @type: queue type as RX or TX
+ * @napi: NAPI context
+ *
+ * Set the napi instance for the queue. The rtnl lock state is derived from the
+ * execution path.
+ */
+void
+ice_queue_set_napi(struct ice_vsi *vsi, unsigned int queue_index,
+                  enum netdev_queue_type type, struct napi_struct *napi)
+{
+       struct ice_pf *pf = vsi->back;
+
+       if (!vsi->netdev)
+               return;
+
+       if (current_work() == &pf->serv_task ||
+           test_bit(ICE_PREPARED_FOR_RESET, pf->state) ||
+           test_bit(ICE_DOWN, pf->state) ||
+           test_bit(ICE_SUSPENDED, pf->state))
+               __ice_queue_set_napi(vsi->netdev, queue_index, type, napi,
+                                    false);
+       else
+               __ice_queue_set_napi(vsi->netdev, queue_index, type, napi,
+                                    true);
+}
+
+/**
+ * __ice_q_vector_set_napi_queues - Map queue[s] associated with the napi
  * @q_vector: q_vector pointer
  * @locked: is the rtnl_lock already held
  *
+ * Associate the q_vector napi with all the queue[s] on the vector.
+ * Caller indicates the lock status.
+ */
+void __ice_q_vector_set_napi_queues(struct ice_q_vector *q_vector, bool locked)
+{
+       struct ice_rx_ring *rx_ring;
+       struct ice_tx_ring *tx_ring;
+
+       ice_for_each_rx_ring(rx_ring, q_vector->rx)
+               __ice_queue_set_napi(q_vector->vsi->netdev, rx_ring->q_index,
+                                    NETDEV_QUEUE_TYPE_RX, &q_vector->napi,
+                                    locked);
+
+       ice_for_each_tx_ring(tx_ring, q_vector->tx)
+               __ice_queue_set_napi(q_vector->vsi->netdev, tx_ring->q_index,
+                                    NETDEV_QUEUE_TYPE_TX, &q_vector->napi,
+                                    locked);
+       /* Also set the interrupt number for the NAPI */
+       netif_napi_set_irq(&q_vector->napi, q_vector->irq.virq);
+}
+
+/**
+ * ice_q_vector_set_napi_queues - Map queue[s] associated with the napi
+ * @q_vector: q_vector pointer
+ *
  * Associate the q_vector napi with all the queue[s] on the vector
  */
-void ice_q_vector_set_napi_queues(struct ice_q_vector *q_vector, bool locked)
+void ice_q_vector_set_napi_queues(struct ice_q_vector *q_vector)
 {
        struct ice_rx_ring *rx_ring;
        struct ice_tx_ring *tx_ring;
 
        ice_for_each_rx_ring(rx_ring, q_vector->rx)
-               ice_queue_set_napi(q_vector->vsi->netdev, rx_ring->q_index,
-                                  NETDEV_QUEUE_TYPE_RX, &q_vector->napi,
-                                  locked);
+               ice_queue_set_napi(q_vector->vsi, rx_ring->q_index,
+                                  NETDEV_QUEUE_TYPE_RX, &q_vector->napi);
 
        ice_for_each_tx_ring(tx_ring, q_vector->tx)
-               ice_queue_set_napi(q_vector->vsi->netdev, tx_ring->q_index,
-                                  NETDEV_QUEUE_TYPE_TX, &q_vector->napi,
-                                  locked);
+               ice_queue_set_napi(q_vector->vsi, tx_ring->q_index,
+                                  NETDEV_QUEUE_TYPE_TX, &q_vector->napi);
        /* Also set the interrupt number for the NAPI */
        netif_napi_set_irq(&q_vector->napi, q_vector->irq.virq);
 }
@@ -2953,11 +3006,10 @@ void ice_q_vector_set_napi_queues(struct ice_q_vector *q_vector, bool locked)
 /**
  * ice_vsi_set_napi_queues
  * @vsi: VSI pointer
- * @locked: is the rtnl_lock already held
  *
  * Associate queue[s] with napi for all vectors
  */
-void ice_vsi_set_napi_queues(struct ice_vsi *vsi, bool locked)
+void ice_vsi_set_napi_queues(struct ice_vsi *vsi)
 {
        int i;
 
@@ -2965,7 +3017,7 @@ void ice_vsi_set_napi_queues(struct ice_vsi *vsi, bool locked)
                return;
 
        ice_for_each_q_vector(vsi, i)
-               ice_q_vector_set_napi_queues(vsi->q_vectors[i], locked);
+               ice_q_vector_set_napi_queues(vsi->q_vectors[i]);
 }
 
 /**
@@ -3140,7 +3192,7 @@ ice_vsi_realloc_stat_arrays(struct ice_vsi *vsi)
                }
        }
 
-       tx_ring_stats = vsi_stat->rx_ring_stats;
+       tx_ring_stats = vsi_stat->tx_ring_stats;
        vsi_stat->tx_ring_stats =
                krealloc_array(vsi_stat->tx_ring_stats, req_txq,
                               sizeof(*vsi_stat->tx_ring_stats),
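
The napi-queue rework above removes the `locked` flag from the exported helpers: ice_queue_set_napi() now derives the rtnl state from the execution path, since the core netif_queue_set_napi() call it ultimately wraps is expected to run under rtnl, but paths such as the service task, reset, down and suspend do not hold it. A condensed restatement of the derivation (a sketch of the logic above, not additional driver code):

    /* take rtnl inside the helper only on paths known not to hold it */
    bool rtnl_not_held = current_work() == &pf->serv_task ||
                         test_bit(ICE_PREPARED_FOR_RESET, pf->state) ||
                         test_bit(ICE_DOWN, pf->state) ||
                         test_bit(ICE_SUSPENDED, pf->state);

    __ice_queue_set_napi(vsi->netdev, queue_index, type, napi,
                         /* locked = */ !rtnl_not_held);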
index 71bd27244941d549d9253af900629ccb36278072..bfcfc582a4c04ff143390e394d0b65a1d0970391 100644 (file)
@@ -91,9 +91,15 @@ void ice_vsi_cfg_netdev_tc(struct ice_vsi *vsi, u8 ena_tc);
 struct ice_vsi *
 ice_vsi_setup(struct ice_pf *pf, struct ice_vsi_cfg_params *params);
 
-void ice_q_vector_set_napi_queues(struct ice_q_vector *q_vector, bool locked);
+void
+ice_queue_set_napi(struct ice_vsi *vsi, unsigned int queue_index,
+                  enum netdev_queue_type type, struct napi_struct *napi);
+
+void __ice_q_vector_set_napi_queues(struct ice_q_vector *q_vector, bool locked);
+
+void ice_q_vector_set_napi_queues(struct ice_q_vector *q_vector);
 
-void ice_vsi_set_napi_queues(struct ice_vsi *vsi, bool locked);
+void ice_vsi_set_napi_queues(struct ice_vsi *vsi);
 
 int ice_vsi_release(struct ice_vsi *vsi);
 
index dd4a9bc0dfdc661b2d2f3c48a2df5b773e4f75bb..df6a68ab747eeea289595765bc033473cef37165 100644 (file)
@@ -3495,7 +3495,7 @@ static void ice_napi_add(struct ice_vsi *vsi)
        ice_for_each_q_vector(vsi, v_idx) {
                netif_napi_add(vsi->netdev, &vsi->q_vectors[v_idx]->napi,
                               ice_napi_poll);
-               ice_q_vector_set_napi_queues(vsi->q_vectors[v_idx], false);
+               __ice_q_vector_set_napi_queues(vsi->q_vectors[v_idx], false);
        }
 }
 
@@ -5447,6 +5447,7 @@ static int ice_reinit_interrupt_scheme(struct ice_pf *pf)
                if (ret)
                        goto err_reinit;
                ice_vsi_map_rings_to_vectors(pf->vsi[v]);
+               ice_vsi_set_napi_queues(pf->vsi[v]);
        }
 
        ret = ice_req_irq_msix_misc(pf);
@@ -8012,6 +8013,8 @@ ice_bridge_setlink(struct net_device *dev, struct nlmsghdr *nlh,
        pf_sw = pf->first_sw;
        /* find the attribute in the netlink message */
        br_spec = nlmsg_find_attr(nlh, sizeof(struct ifinfomsg), IFLA_AF_SPEC);
+       if (!br_spec)
+               return -EINVAL;
 
        nla_for_each_nested(attr, br_spec, rem) {
                __u16 mode;
index 82bc54fec7f36400a9be1f6603da2770ab5bb2e5..a2562f04267f23695af92be0bca5c1174702bef3 100644 (file)
@@ -24,7 +24,7 @@
 #define rd64(a, reg)           readq((a)->hw_addr + (reg))
 
 #define ice_flush(a)           rd32((a), GLGEN_STAT)
-#define ICE_M(m, s)            ((m) << (s))
+#define ICE_M(m, s)            ((m ## U) << (s))
 
 struct ice_dma_mem {
        void *va;
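
The ICE_M() change above token-pastes a U suffix onto the mask argument so the mask is evaluated as an unsigned literal: shifting a plain int literal into bit 31 (as the ICE_OROM_VER_MASK change later in this series does with 0xff shifted by 24) is undefined behaviour for signed int. A side effect of the paste is that the macro now accepts only literal masks; ICE_M(var, 3) would no longer compile. A minimal stand-alone illustration (names invented):

    #define ICE_M(m, s)	((m ## U) << (s))

    /* 0xff << 24 shifts a 1 into the sign bit of a signed int (UB);
     * 0xffU << 24 is a well-defined unsigned shift
     */
    unsigned int orom_ver_mask = ICE_M(0xff, 24);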
index a94a1c48c3de50db27c4373fef33632c5ae40f70..b0f78c2f2790949c4ed891d2bac789c9d2f2646d 100644 (file)
@@ -1068,6 +1068,7 @@ int ice_sriov_set_msix_vec_count(struct pci_dev *vf_dev, int msix_vec_count)
        struct ice_pf *pf = pci_get_drvdata(pdev);
        u16 prev_msix, prev_queues, queues;
        bool needs_rebuild = false;
+       struct ice_vsi *vsi;
        struct ice_vf *vf;
        int id;
 
@@ -1102,6 +1103,10 @@ int ice_sriov_set_msix_vec_count(struct pci_dev *vf_dev, int msix_vec_count)
        if (!vf)
                return -ENOENT;
 
+       vsi = ice_get_vf_vsi(vf);
+       if (!vsi)
+               return -ENOENT;
+
        prev_msix = vf->num_msix;
        prev_queues = vf->num_vf_qs;
 
@@ -1122,7 +1127,7 @@ int ice_sriov_set_msix_vec_count(struct pci_dev *vf_dev, int msix_vec_count)
        if (vf->first_vector_idx < 0)
                goto unroll;
 
-       if (ice_vf_reconfig_vsi(vf)) {
+       if (ice_vf_reconfig_vsi(vf) || ice_vf_init_host_cfg(vf, vsi)) {
                /* Try to rebuild with previous values */
                needs_rebuild = true;
                goto unroll;
@@ -1148,8 +1153,10 @@ unroll:
        if (vf->first_vector_idx < 0)
                return -EINVAL;
 
-       if (needs_rebuild)
+       if (needs_rebuild) {
                ice_vf_reconfig_vsi(vf);
+               ice_vf_init_host_cfg(vf, vsi);
+       }
 
        ice_ena_vf_mappings(vf);
        ice_put_vf(vf);
index 74d13cc5a3a7f1f62e6657e058548b243e2d438b..97d41d6ebf1fb69419e2cf13dae17db08fd27910 100644 (file)
@@ -513,11 +513,6 @@ int ice_setup_rx_ring(struct ice_rx_ring *rx_ring)
        if (ice_is_xdp_ena_vsi(rx_ring->vsi))
                WRITE_ONCE(rx_ring->xdp_prog, rx_ring->vsi->xdp_prog);
 
-       if (rx_ring->vsi->type == ICE_VSI_PF &&
-           !xdp_rxq_info_is_reg(&rx_ring->xdp_rxq))
-               if (xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev,
-                                    rx_ring->q_index, rx_ring->q_vector->napi.napi_id))
-                       goto err;
        return 0;
 
 err:
@@ -603,9 +598,7 @@ out_failure:
                ret = ICE_XDP_CONSUMED;
        }
 exit:
-       rx_buf->act = ret;
-       if (unlikely(xdp_buff_has_frags(xdp)))
-               ice_set_rx_bufs_act(xdp, rx_ring, ret);
+       ice_set_rx_bufs_act(xdp, rx_ring, ret);
 }
 
 /**
@@ -893,14 +886,17 @@ ice_add_xdp_frag(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
        }
 
        if (unlikely(sinfo->nr_frags == MAX_SKB_FRAGS)) {
-               if (unlikely(xdp_buff_has_frags(xdp)))
-                       ice_set_rx_bufs_act(xdp, rx_ring, ICE_XDP_CONSUMED);
+               ice_set_rx_bufs_act(xdp, rx_ring, ICE_XDP_CONSUMED);
                return -ENOMEM;
        }
 
        __skb_fill_page_desc_noacc(sinfo, sinfo->nr_frags++, rx_buf->page,
                                   rx_buf->page_offset, size);
        sinfo->xdp_frags_size += size;
+       /* remember frag count before XDP prog execution; bpf_xdp_adjust_tail()
+        * can pop off frags but the driver has to handle that on its own
+        */
+       rx_ring->nr_frags = sinfo->nr_frags;
 
        if (page_is_pfmemalloc(rx_buf->page))
                xdp_buff_set_frag_pfmemalloc(xdp);
@@ -1251,6 +1247,7 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
 
                xdp->data = NULL;
                rx_ring->first_desc = ntc;
+               rx_ring->nr_frags = 0;
                continue;
 construct_skb:
                if (likely(ice_ring_uses_build_skb(rx_ring)))
@@ -1266,10 +1263,12 @@ construct_skb:
                                                    ICE_XDP_CONSUMED);
                        xdp->data = NULL;
                        rx_ring->first_desc = ntc;
+                       rx_ring->nr_frags = 0;
                        break;
                }
                xdp->data = NULL;
                rx_ring->first_desc = ntc;
+               rx_ring->nr_frags = 0;
 
                stat_err_bits = BIT(ICE_RX_FLEX_DESC_STATUS0_RXE_S);
                if (unlikely(ice_test_staterr(rx_desc->wb.status_error0,
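
The rx_ring->nr_frags field added above caches the frag count before the XDP program runs, and is reset whenever a frame is consumed, because a program can legally shrink a multi-buffer frame under the driver. A minimal program of the kind that exercises this path, sketched in libbpf style (the -256 delta is arbitrary, for illustration only):

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    SEC("xdp")
    int trim_tail(struct xdp_md *ctx)
    {
    	/* a large negative delta can pop entire trailing frags off a
    	 * multi-buffer frame, shrinking shared_info->nr_frags behind
    	 * the driver's back
    	 */
    	bpf_xdp_adjust_tail(ctx, -256);
    	return XDP_PASS;
    }

    char LICENSE[] SEC("license") = "GPL";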
index b3379ff736747887a7404c5d020b865bc10a5024..af955b0e5dc5caeb3ce6ca3f9671772763270f52 100644 (file)
@@ -358,6 +358,7 @@ struct ice_rx_ring {
        struct ice_tx_ring *xdp_ring;
        struct ice_rx_ring *next;       /* pointer to next ring in q_vector */
        struct xsk_buff_pool *xsk_pool;
+       u32 nr_frags;
        dma_addr_t dma;                 /* physical address of ring */
        u16 rx_buf_len;
        u8 dcb_tc;                      /* Traffic class of ring */
index 762047508619603028cac48e5e091d4a584084c2..afcead4baef4b1552bdd152ee5414c8127b0b992 100644 (file)
  * @act: action to store onto Rx buffers related to XDP buffer parts
  *
  * Set action that should be taken before putting Rx buffer from first frag
- * to one before last. Last one is handled by caller of this function as it
- * is the EOP frag that is currently being processed. This function is
- * supposed to be called only when XDP buffer contains frags.
+ * to the last.
  */
 static inline void
 ice_set_rx_bufs_act(struct xdp_buff *xdp, const struct ice_rx_ring *rx_ring,
                    const unsigned int act)
 {
-       const struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
-       u32 first = rx_ring->first_desc;
-       u32 nr_frags = sinfo->nr_frags;
+       u32 sinfo_frags = xdp_get_shared_info_from_buff(xdp)->nr_frags;
+       u32 nr_frags = rx_ring->nr_frags + 1;
+       u32 idx = rx_ring->first_desc;
        u32 cnt = rx_ring->count;
        struct ice_rx_buf *buf;
 
        for (int i = 0; i < nr_frags; i++) {
-               buf = &rx_ring->rx_buf[first];
+               buf = &rx_ring->rx_buf[idx];
                buf->act = act;
 
-               if (++first == cnt)
-                       first = 0;
+               if (++idx == cnt)
+                       idx = 0;
+       }
+
+       /* adjust pagecnt_bias on frags freed by XDP prog */
+       if (sinfo_frags < rx_ring->nr_frags && act == ICE_XDP_CONSUMED) {
+               u32 delta = rx_ring->nr_frags - sinfo_frags;
+
+               while (delta) {
+                       if (idx == 0)
+                               idx = cnt - 1;
+                       else
+                               idx--;
+                       buf = &rx_ring->rx_buf[idx];
+                       buf->pagecnt_bias--;
+                       delta--;
+               }
        }
 }
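
In the reworked ice_set_rx_bufs_act() above, frags the XDP program already freed via bpf_xdp_adjust_tail() must not be released again; only their pagecnt_bias is decremented so the page-reuse accounting stays balanced. The backwards walk with wrap-around, restated as a stand-alone helper (a sketch, not driver code):

    /* step idx back 'delta' slots on a ring of 'cnt' entries, decrementing
     * pagecnt_bias for each frag the XDP program consumed on its own
     */
    static void ice_bias_fixup_sketch(struct ice_rx_buf *bufs, u32 cnt,
    				      u32 idx, u32 delta)
    {
    	while (delta--) {
    		idx = idx ? idx - 1 : cnt - 1;
    		bufs[idx].pagecnt_bias--;
    	}
    }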
 
index 41ab6d7bbd9ef923fb766555ba48c5533e989f93..a508e917ce5ffab9e092a62337fbe70b27efbc5e 100644 (file)
@@ -1072,7 +1072,7 @@ struct ice_aq_get_set_rss_lut_params {
 #define ICE_OROM_VER_BUILD_SHIFT       8
 #define ICE_OROM_VER_BUILD_MASK                (0xffff << ICE_OROM_VER_BUILD_SHIFT)
 #define ICE_OROM_VER_SHIFT             24
-#define ICE_OROM_VER_MASK              (0xff << ICE_OROM_VER_SHIFT)
+#define ICE_OROM_VER_MASK              (0xffU << ICE_OROM_VER_SHIFT)
 #define ICE_SR_PFA_PTR                 0x40
 #define ICE_SR_1ST_NVM_BANK_PTR                0x42
 #define ICE_SR_NVM_BANK_SIZE           0x43
index c925813ec9caf06d11199ca066a5096e6d034b4e..6f2328a049bf10e7604f3f33a91ce943944f0e37 100644 (file)
@@ -440,7 +440,6 @@ static int ice_vc_get_vf_res_msg(struct ice_vf *vf, u8 *msg)
                vf->driver_caps = *(u32 *)msg;
        else
                vf->driver_caps = VIRTCHNL_VF_OFFLOAD_L2 |
-                                 VIRTCHNL_VF_OFFLOAD_RSS_REG |
                                  VIRTCHNL_VF_OFFLOAD_VLAN;
 
        vfres->vf_cap_flags = VIRTCHNL_VF_OFFLOAD_L2;
@@ -453,14 +452,8 @@ static int ice_vc_get_vf_res_msg(struct ice_vf *vf, u8 *msg)
        vfres->vf_cap_flags |= ice_vc_get_vlan_caps(hw, vf, vsi,
                                                    vf->driver_caps);
 
-       if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_RSS_PF) {
+       if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_RSS_PF)
                vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_RSS_PF;
-       } else {
-               if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_RSS_AQ)
-                       vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_RSS_AQ;
-               else
-                       vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_RSS_REG;
-       }
 
        if (vf->driver_caps & VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC)
                vfres->vf_cap_flags |= VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC;
index 5e19d48a05b45939f3bf72882bb7c9234312a230..d796dbd2a440cd550a9147956c6968286ae9cfa2 100644 (file)
@@ -13,8 +13,6 @@
  * - opcodes needed by VF when caps are activated
  *
  * Caps that don't use new opcodes (no opcodes should be allowed):
- * - VIRTCHNL_VF_OFFLOAD_RSS_AQ
- * - VIRTCHNL_VF_OFFLOAD_RSS_REG
  * - VIRTCHNL_VF_OFFLOAD_WB_ON_ITR
  * - VIRTCHNL_VF_OFFLOAD_CRC
  * - VIRTCHNL_VF_OFFLOAD_RX_POLLING
index 5d1ae8e4058a4ae43bb0fb2be98070cf1f2e9559..2eecd0f39aa696e1c24083f03bf9a1908c121fdd 100644 (file)
@@ -179,6 +179,10 @@ static int ice_qp_dis(struct ice_vsi *vsi, u16 q_idx)
                        return -EBUSY;
                usleep_range(1000, 2000);
        }
+
+       ice_qvec_dis_irq(vsi, rx_ring, q_vector);
+       ice_qvec_toggle_napi(vsi, q_vector, false);
+
        netif_tx_stop_queue(netdev_get_tx_queue(vsi->netdev, q_idx));
 
        ice_fill_txq_meta(vsi, tx_ring, &txq_meta);
@@ -195,13 +199,10 @@ static int ice_qp_dis(struct ice_vsi *vsi, u16 q_idx)
                if (err)
                        return err;
        }
-       ice_qvec_dis_irq(vsi, rx_ring, q_vector);
-
        err = ice_vsi_ctrl_one_rx_ring(vsi, false, q_idx, true);
        if (err)
                return err;
 
-       ice_qvec_toggle_napi(vsi, q_vector, false);
        ice_qp_clean_rings(vsi, q_idx);
        ice_qp_reset_stats(vsi, q_idx);
 
@@ -259,11 +260,11 @@ static int ice_qp_ena(struct ice_vsi *vsi, u16 q_idx)
        if (err)
                return err;
 
-       clear_bit(ICE_CFG_BUSY, vsi->state);
        ice_qvec_toggle_napi(vsi, q_vector, true);
        ice_qvec_ena_irq(vsi, q_vector);
 
        netif_tx_start_queue(netdev_get_tx_queue(vsi->netdev, q_idx));
+       clear_bit(ICE_CFG_BUSY, vsi->state);
 
        return 0;
 }
@@ -825,7 +826,8 @@ ice_add_xsk_frag(struct ice_rx_ring *rx_ring, struct xdp_buff *first,
        }
 
        __skb_fill_page_desc_noacc(sinfo, sinfo->nr_frags++,
-                                  virt_to_page(xdp->data_hard_start), 0, size);
+                                  virt_to_page(xdp->data_hard_start),
+                                  XDP_PACKET_HEADROOM, size);
        sinfo->xdp_frags_size += size;
        xsk_buff_add_frag(xdp);
 
@@ -895,7 +897,6 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget)
 
                if (!first) {
                        first = xdp;
-                       xdp_buff_clear_frags_flag(first);
                } else if (ice_add_xsk_frag(rx_ring, first, xdp, size)) {
                        break;
                }
index 5fea2fd957eb3563ac839b0cc966cde9de40a2ba..58179bd733ff05bf5d31cc0b6e4855d075fcae8d 100644 (file)
@@ -783,6 +783,8 @@ static int idpf_cfg_netdev(struct idpf_vport *vport)
        /* setup watchdog timeout value to be 5 second */
        netdev->watchdog_timeo = 5 * HZ;
 
+       netdev->dev_port = idx;
+
        /* configure default MTU size */
        netdev->min_mtu = ETH_MIN_MTU;
        netdev->max_mtu = vport->max_mtu;
index d0cdd63b3d5b24108ae4832a5e09f4b417e82dfe..390977a76de25a42766c314ab94c42e6528ab4c9 100644 (file)
@@ -2087,8 +2087,10 @@ int idpf_send_disable_queues_msg(struct idpf_vport *vport)
                set_bit(__IDPF_Q_POLL_MODE, vport->txqs[i]->flags);
 
        /* schedule the napi to receive all the marker packets */
+       local_bh_disable();
        for (i = 0; i < vport->num_q_vectors; i++)
                napi_schedule(&vport->q_vectors[i].napi);
+       local_bh_enable();
 
        return idpf_wait_for_marker_event(vport);
 }
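
The local_bh_disable()/local_bh_enable() pair above addresses a quirk of scheduling NAPI from process context: napi_schedule() only raises NET_RX_SOFTIRQ, and outside a bottom-half-disabled section the raised softirq can sit pending instead of running promptly (the motivating failure mode is reportedly a "NOHZ tick-stop" pending-softirq warning). With bottom halves disabled, local_bh_enable() executes pending softirqs on the spot. The pattern, in miniature:

    local_bh_disable();
    napi_schedule(&q_vector->napi);	/* raises NET_RX_SOFTIRQ */
    local_bh_enable();			/* pending softirqs run here, on this CPU */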
index 8dc837889723c8a8976fd537e79e7d6acd49c4a8..4a3c4454d25abad18582ea7b93c74b616ef5cf75 100644 (file)
@@ -978,7 +978,7 @@ struct virtchnl2_ptype {
        u8 proto_id_count;
        __le16 pad;
        __le16 proto_id[];
-};
+} __packed __aligned(2);
 VIRTCHNL2_CHECK_STRUCT_LEN(6, virtchnl2_ptype);
 
 /**
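
Marking virtchnl2_ptype as __packed __aligned(2) pins sizeof() to the 6 bytes the wire format declares; on ABIs that round structure sizes up (the motivating case is reportedly arm's apcs-gnu ABI), the unannotated struct would trip VIRTCHNL2_CHECK_STRUCT_LEN(). A userland sketch of the same guard (struct and field names invented for illustration):

    #include <assert.h>
    #include <stdint.h>

    struct wire_hdr {
    	uint16_t id;
    	uint8_t  count;
    	uint8_t  rsvd;
    	uint16_t pad;
    	uint16_t body[];	/* flexible array, not counted in sizeof */
    } __attribute__((packed, aligned(2)));

    static_assert(sizeof(struct wire_hdr) == 6, "wire format must stay 6 bytes");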
index a2b759531cb7ba44720f000018597d5222bec900..3c2dc7bdebb50eb9f08ec49ce6590f2b35445e53 100644 (file)
@@ -637,7 +637,7 @@ struct igb_adapter {
                struct timespec64 period;
        } perout[IGB_N_PEROUT];
 
-       char fw_version[32];
+       char fw_version[48];
 #ifdef CONFIG_IGB_HWMON
        struct hwmon_buff *igb_hwmon_buff;
        bool ets;
index 4df8d4153aa5f5ce7ac9dd566180d552be9f5b4f..cebb44f51d5f5bbd1177b0caeb1e08f7a2fc30db 100644 (file)
@@ -3069,7 +3069,6 @@ void igb_set_fw_version(struct igb_adapter *adapter)
 {
        struct e1000_hw *hw = &adapter->hw;
        struct e1000_fw_version fw;
-       char *lbuf;
 
        igb_get_fw_version(hw, &fw);
 
@@ -3077,34 +3076,36 @@ void igb_set_fw_version(struct igb_adapter *adapter)
        case e1000_i210:
        case e1000_i211:
                if (!(igb_get_flash_presence_i210(hw))) {
-                       lbuf = kasprintf(GFP_KERNEL, "%2d.%2d-%d",
-                                        fw.invm_major, fw.invm_minor,
-                                        fw.invm_img_type);
+                       snprintf(adapter->fw_version,
+                                sizeof(adapter->fw_version),
+                                "%2d.%2d-%d",
+                                fw.invm_major, fw.invm_minor,
+                                fw.invm_img_type);
                        break;
                }
                fallthrough;
        default:
                /* if option rom is valid, display its version too */
                if (fw.or_valid) {
-                       lbuf = kasprintf(GFP_KERNEL, "%d.%d, 0x%08x, %d.%d.%d",
-                                        fw.eep_major, fw.eep_minor,
-                                        fw.etrack_id, fw.or_major, fw.or_build,
-                                        fw.or_patch);
+                       snprintf(adapter->fw_version,
+                                sizeof(adapter->fw_version),
+                                "%d.%d, 0x%08x, %d.%d.%d",
+                                fw.eep_major, fw.eep_minor, fw.etrack_id,
+                                fw.or_major, fw.or_build, fw.or_patch);
                /* no option rom */
                } else if (fw.etrack_id != 0x0000) {
-                       lbuf = kasprintf(GFP_KERNEL, "%d.%d, 0x%08x",
-                                        fw.eep_major, fw.eep_minor,
-                                        fw.etrack_id);
+                       snprintf(adapter->fw_version,
+                                sizeof(adapter->fw_version),
+                                "%d.%d, 0x%08x",
+                                fw.eep_major, fw.eep_minor, fw.etrack_id);
                } else {
-                       lbuf = kasprintf(GFP_KERNEL, "%d.%d.%d", fw.eep_major,
-                                        fw.eep_minor, fw.eep_build);
+                       snprintf(adapter->fw_version,
+                                sizeof(adapter->fw_version),
+                                "%d.%d.%d",
+                                fw.eep_major, fw.eep_minor, fw.eep_build);
                }
                break;
        }
-
-       /* the truncate happens here if it doesn't fit */
-       strscpy(adapter->fw_version, lbuf, sizeof(adapter->fw_version));
-       kfree(lbuf);
 }
 
 /**
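
The fw_version conversion above drops the kasprintf()/strscpy()/kfree() round trip in favour of snprintf() straight into the (now 48-byte) buffer: there is no GFP_KERNEL allocation left to fail, and truncation is handled by snprintf() itself. The key property, in a minimal sketch (variable names here are illustrative):

    char buf[48];

    /* snprintf() never writes past sizeof(buf) and always NUL-terminates;
     * its return value (the would-be length) can be compared against
     * sizeof(buf) to detect truncation when that matters
     */
    snprintf(buf, sizeof(buf), "%d.%d, 0x%08x", major, minor, etrack_id);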
index 319c544b9f04ce5e9ef6f09a9fa3e3f641583f47..f9457055612004c10f74379122063e8136fe7d76 100644 (file)
@@ -957,7 +957,7 @@ static void igb_ptp_tx_hwtstamp(struct igb_adapter *adapter)
 
        igb_ptp_systim_to_hwtstamp(adapter, &shhwtstamps, regval);
        /* adjust timestamp for the TX latency based on link speed */
-       if (adapter->hw.mac.type == e1000_i210) {
+       if (hw->mac.type == e1000_i210 || hw->mac.type == e1000_i211) {
                switch (adapter->link_speed) {
                case SPEED_10:
                        adjust = IGB_I210_TX_LATENCY_10;
@@ -1003,6 +1003,7 @@ int igb_ptp_rx_pktstamp(struct igb_q_vector *q_vector, void *va,
                        ktime_t *timestamp)
 {
        struct igb_adapter *adapter = q_vector->adapter;
+       struct e1000_hw *hw = &adapter->hw;
        struct skb_shared_hwtstamps ts;
        __le64 *regval = (__le64 *)va;
        int adjust = 0;
@@ -1022,7 +1023,7 @@ int igb_ptp_rx_pktstamp(struct igb_q_vector *q_vector, void *va,
        igb_ptp_systim_to_hwtstamp(adapter, &ts, le64_to_cpu(regval[1]));
 
        /* adjust timestamp for the RX latency based on link speed */
-       if (adapter->hw.mac.type == e1000_i210) {
+       if (hw->mac.type == e1000_i210 || hw->mac.type == e1000_i211) {
                switch (adapter->link_speed) {
                case SPEED_10:
                        adjust = IGB_I210_RX_LATENCY_10;
index ba8d3fe186aedacd5a7959e6fd9da3408fe71843..81c21a893ede9c0c432d26e03f3baecc3293b27f 100644 (file)
@@ -6487,7 +6487,7 @@ static int igc_xdp_xmit(struct net_device *dev, int num_frames,
        int cpu = smp_processor_id();
        struct netdev_queue *nq;
        struct igc_ring *ring;
-       int i, drops;
+       int i, nxmit;
 
        if (unlikely(!netif_carrier_ok(dev)))
                return -ENETDOWN;
@@ -6503,16 +6503,15 @@ static int igc_xdp_xmit(struct net_device *dev, int num_frames,
        /* Avoid transmit queue timeout since we share it with the slow path */
        txq_trans_cond_update(nq);
 
-       drops = 0;
+       nxmit = 0;
        for (i = 0; i < num_frames; i++) {
                int err;
                struct xdp_frame *xdpf = frames[i];
 
                err = igc_xdp_init_tx_descriptor(ring, xdpf);
-               if (err) {
-                       xdp_return_frame_rx_napi(xdpf);
-                       drops++;
-               }
+               if (err)
+                       break;
+               nxmit++;
        }
 
        if (flags & XDP_XMIT_FLUSH)
@@ -6520,7 +6519,7 @@ static int igc_xdp_xmit(struct net_device *dev, int num_frames,
 
        __netif_tx_unlock(nq);
 
-       return num_frames - drops;
+       return nxmit;
 }
 
 static void igc_trigger_rxtxq_interrupt(struct igc_adapter *adapter,
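
The igc_xdp_xmit() change above adopts the current .ndo_xdp_xmit contract: return how many frames were queued, stop at the first failure, and leave unsent frames alone; the core redirect path, not the driver, frees the remainder. Sketched from the caller's side (a simplification of what the XDP redirect code does):

    sent = dev->netdev_ops->ndo_xdp_xmit(dev, n, frames, flags);

    /* the driver no longer calls xdp_return_frame_rx_napi() on failures;
     * the caller releases everything from 'sent' onward
     */
    for (i = sent < 0 ? 0 : sent; i < n; i++)
    	xdp_return_frame_rx_napi(frames[i]);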
index 7cd8716d2ffa3a90b35cea6218922a8cf656b9eb..861f37076861655df235fed77f6e7d0cb4bb3dc4 100644 (file)
@@ -130,11 +130,7 @@ void igc_power_down_phy_copper(struct igc_hw *hw)
        /* The PHY will retain its settings across a power down/up cycle */
        hw->phy.ops.read_reg(hw, PHY_CONTROL, &mii_reg);
        mii_reg |= MII_CR_POWER_DOWN;
-
-       /* Temporary workaround - should be removed when PHY will implement
-        * IEEE registers as properly
-        */
-       /* hw->phy.ops.write_reg(hw, PHY_CONTROL, mii_reg);*/
+       hw->phy.ops.write_reg(hw, PHY_CONTROL, mii_reg);
        usleep_range(1000, 2000);
 }
 
index bd541527c8c74d6922e8683e2f4493d9b361f67b..99876b765b08bc94e9ba1673205f56e9d140a49d 100644 (file)
@@ -2939,8 +2939,8 @@ static void ixgbe_check_lsc(struct ixgbe_adapter *adapter)
 static inline void ixgbe_irq_enable_queues(struct ixgbe_adapter *adapter,
                                           u64 qmask)
 {
-       u32 mask;
        struct ixgbe_hw *hw = &adapter->hw;
+       u32 mask;
 
        switch (hw->mac.type) {
        case ixgbe_mac_82598EB:
@@ -10524,6 +10524,44 @@ static void ixgbe_reset_rxr_stats(struct ixgbe_ring *rx_ring)
        memset(&rx_ring->rx_stats, 0, sizeof(rx_ring->rx_stats));
 }
 
+/**
+ * ixgbe_irq_disable_single - Disable single IRQ vector
+ * @adapter: adapter structure
+ * @ring: ring index
+ **/
+static void ixgbe_irq_disable_single(struct ixgbe_adapter *adapter, u32 ring)
+{
+       struct ixgbe_hw *hw = &adapter->hw;
+       u64 qmask = BIT_ULL(ring);
+       u32 mask;
+
+       switch (adapter->hw.mac.type) {
+       case ixgbe_mac_82598EB:
+               mask = qmask & IXGBE_EIMC_RTX_QUEUE;
+               IXGBE_WRITE_REG(&adapter->hw, IXGBE_EIMC, mask);
+               break;
+       case ixgbe_mac_82599EB:
+       case ixgbe_mac_X540:
+       case ixgbe_mac_X550:
+       case ixgbe_mac_X550EM_x:
+       case ixgbe_mac_x550em_a:
+               mask = (qmask & 0xFFFFFFFF);
+               if (mask)
+                       IXGBE_WRITE_REG(hw, IXGBE_EIMS_EX(0), mask);
+               mask = (qmask >> 32);
+               if (mask)
+                       IXGBE_WRITE_REG(hw, IXGBE_EIMS_EX(1), mask);
+               break;
+       default:
+               break;
+       }
+       IXGBE_WRITE_FLUSH(&adapter->hw);
+       if (adapter->flags & IXGBE_FLAG_MSIX_ENABLED)
+               synchronize_irq(adapter->msix_entries[ring].vector);
+       else
+               synchronize_irq(adapter->pdev->irq);
+}
+
 /**
  * ixgbe_txrx_ring_disable - Disable Rx/Tx/XDP Tx rings
  * @adapter: adapter structure
@@ -10540,6 +10578,11 @@ void ixgbe_txrx_ring_disable(struct ixgbe_adapter *adapter, int ring)
        tx_ring = adapter->tx_ring[ring];
        xdp_ring = adapter->xdp_ring[ring];
 
+       ixgbe_irq_disable_single(adapter, ring);
+
+       /* Rx/Tx/XDP Tx share the same napi context. */
+       napi_disable(&rx_ring->q_vector->napi);
+
        ixgbe_disable_txr(adapter, tx_ring);
        if (xdp_ring)
                ixgbe_disable_txr(adapter, xdp_ring);
@@ -10548,9 +10591,6 @@ void ixgbe_txrx_ring_disable(struct ixgbe_adapter *adapter, int ring)
        if (xdp_ring)
                synchronize_rcu();
 
-       /* Rx/Tx/XDP Tx share the same napi context. */
-       napi_disable(&rx_ring->q_vector->napi);
-
        ixgbe_clean_tx_ring(tx_ring);
        if (xdp_ring)
                ixgbe_clean_tx_ring(xdp_ring);
@@ -10578,9 +10618,6 @@ void ixgbe_txrx_ring_enable(struct ixgbe_adapter *adapter, int ring)
        tx_ring = adapter->tx_ring[ring];
        xdp_ring = adapter->xdp_ring[ring];
 
-       /* Rx/Tx/XDP Tx share the same napi context. */
-       napi_enable(&rx_ring->q_vector->napi);
-
        ixgbe_configure_tx_ring(adapter, tx_ring);
        if (xdp_ring)
                ixgbe_configure_tx_ring(adapter, xdp_ring);
@@ -10589,6 +10626,11 @@ void ixgbe_txrx_ring_enable(struct ixgbe_adapter *adapter, int ring)
        clear_bit(__IXGBE_TX_DISABLED, &tx_ring->state);
        if (xdp_ring)
                clear_bit(__IXGBE_TX_DISABLED, &xdp_ring->state);
+
+       /* Rx/Tx/XDP Tx share the same napi context. */
+       napi_enable(&rx_ring->q_vector->napi);
+       ixgbe_irq_enable_queues(adapter, BIT_ULL(ring));
+       IXGBE_WRITE_FLUSH(&adapter->hw);
 }
 
 /**
index 6208923e29a2b861363317b983b577b383bbeeb1..c1adc94a5a657a6ac432a52016436479020673f3 100644 (file)
@@ -716,7 +716,8 @@ static s32 ixgbe_read_iosf_sb_reg_x550(struct ixgbe_hw *hw, u32 reg_addr,
        if ((command & IXGBE_SB_IOSF_CTRL_RESP_STAT_MASK) != 0) {
                error = FIELD_GET(IXGBE_SB_IOSF_CTRL_CMPL_ERR_MASK, command);
                hw_dbg(hw, "Failed to read, error %x\n", error);
-               return -EIO;
+               ret = -EIO;
+               goto out;
        }
 
        if (!ret)
index 5182fe737c3727629fd4a24d516f2459126ddf2b..ff54fbe41bccc89ce954eadb770ceb5786b719ac 100644 (file)
@@ -318,4 +318,5 @@ static struct platform_driver liteeth_driver = {
 module_platform_driver(liteeth_driver);
 
 MODULE_AUTHOR("Joel Stanley <joel@jms.id.au>");
+MODULE_DESCRIPTION("LiteX Liteeth Ethernet driver");
 MODULE_LICENSE("GPL");
index 820b1fabe297a209dd2620092115a01361c755fd..23adf53c2aa1c08086bff5758a99673e023c7de4 100644 (file)
@@ -614,12 +614,38 @@ static void mvpp23_bm_set_8pool_mode(struct mvpp2 *priv)
        mvpp2_write(priv, MVPP22_BM_POOL_BASE_ADDR_HIGH_REG, val);
 }
 
+/* Clean up a BM pool before the OS initializes it */
+static void mvpp2_bm_pool_cleanup(struct mvpp2 *priv, int pool_id)
+{
+       unsigned int thread = mvpp2_cpu_to_thread(priv, get_cpu());
+       u32 val;
+       int i;
+
+       /* Drain any residual buffers left in the BM by firmware */
+       for (i = 0; i < MVPP2_BM_POOL_SIZE_MAX; i++)
+               mvpp2_thread_read(priv, thread, MVPP2_BM_PHY_ALLOC_REG(pool_id));
+
+       put_cpu();
+
+       /* Stop the BM pool */
+       val = mvpp2_read(priv, MVPP2_BM_POOL_CTRL_REG(pool_id));
+       val |= MVPP2_BM_STOP_MASK;
+       mvpp2_write(priv, MVPP2_BM_POOL_CTRL_REG(pool_id), val);
+}
+
 static int mvpp2_bm_init(struct device *dev, struct mvpp2 *priv)
 {
        enum dma_data_direction dma_dir = DMA_FROM_DEVICE;
        int i, err, poolnum = MVPP2_BM_POOLS_NUM;
        struct mvpp2_port *port;
 
+       if (priv->percpu_pools)
+               poolnum = mvpp2_get_nrxqs(priv) * 2;
+
+       /* Clean up each pool in case it contains stale state */
+       for (i = 0; i < poolnum; i++)
+               mvpp2_bm_pool_cleanup(priv, i);
+
        if (priv->percpu_pools) {
                for (i = 0; i < priv->port_count; i++) {
                        port = priv->port_list[i];
@@ -629,7 +655,6 @@ static int mvpp2_bm_init(struct device *dev, struct mvpp2 *priv)
                        }
                }
 
-               poolnum = mvpp2_get_nrxqs(priv) * 2;
                for (i = 0; i < poolnum; i++) {
                        /* the pool in use */
                        int pn = i / (poolnum / 2);
index 9690ac01f02c8db9b9ed2e524b05aa124b077c7a..b92264d0a77e71075495f5cc0e02c330e3c279bd 100644 (file)
@@ -413,4 +413,5 @@ const char *otx2_mbox_id2name(u16 id)
 EXPORT_SYMBOL(otx2_mbox_id2name);
 
 MODULE_AUTHOR("Marvell.");
+MODULE_DESCRIPTION("Marvell RVU NIC Mbox helpers");
 MODULE_LICENSE("GPL v2");
index 167145bdcb75d3f852134fcaa44fc2f307c42478..516adb50f9f6b2b8d4c43f12b51d83da30aae904 100644 (file)
@@ -61,28 +61,6 @@ int rvu_npc_get_tx_nibble_cfg(struct rvu *rvu, u64 nibble_ena)
        return 0;
 }
 
-static int npc_mcam_verify_pf_func(struct rvu *rvu,
-                                  struct mcam_entry *entry_data, u8 intf,
-                                  u16 pcifunc)
-{
-       u16 pf_func, pf_func_mask;
-
-       if (is_npc_intf_rx(intf))
-               return 0;
-
-       pf_func_mask = (entry_data->kw_mask[0] >> 32) &
-               NPC_KEX_PF_FUNC_MASK;
-       pf_func = (entry_data->kw[0] >> 32) & NPC_KEX_PF_FUNC_MASK;
-
-       pf_func = be16_to_cpu((__force __be16)pf_func);
-       if (pf_func_mask != NPC_KEX_PF_FUNC_MASK ||
-           ((pf_func & ~RVU_PFVF_FUNC_MASK) !=
-            (pcifunc & ~RVU_PFVF_FUNC_MASK)))
-               return -EINVAL;
-
-       return 0;
-}
-
 void rvu_npc_set_pkind(struct rvu *rvu, int pkind, struct rvu_pfvf *pfvf)
 {
        int blkaddr;
@@ -437,6 +415,10 @@ static void npc_fixup_vf_rule(struct rvu *rvu, struct npc_mcam *mcam,
                        return;
        }
 
+       /* AF modifies the given action iff the PF/VF has requested it */
+       if ((entry->action & 0xFULL) != NIX_RX_ACTION_DEFAULT)
+               return;
+
        /* copy VF default entry action to the VF mcam entry */
        rx_action = npc_get_default_entry_action(rvu, mcam, blkaddr,
                                                 target_func);
@@ -1850,8 +1832,8 @@ void npc_mcam_rsrcs_deinit(struct rvu *rvu)
 {
        struct npc_mcam *mcam = &rvu->hw->mcam;
 
-       kfree(mcam->bmap);
-       kfree(mcam->bmap_reverse);
+       bitmap_free(mcam->bmap);
+       bitmap_free(mcam->bmap_reverse);
        kfree(mcam->entry2pfvf_map);
        kfree(mcam->cntr2pfvf_map);
        kfree(mcam->entry2cntr_map);
@@ -1904,21 +1886,20 @@ int npc_mcam_rsrcs_init(struct rvu *rvu, int blkaddr)
        mcam->pf_offset = mcam->nixlf_offset + nixlf_count;
 
        /* Allocate bitmaps for managing MCAM entries */
-       mcam->bmap = kmalloc_array(BITS_TO_LONGS(mcam->bmap_entries),
-                                  sizeof(long), GFP_KERNEL);
+       mcam->bmap = bitmap_zalloc(mcam->bmap_entries, GFP_KERNEL);
        if (!mcam->bmap)
                return -ENOMEM;
 
-       mcam->bmap_reverse = kmalloc_array(BITS_TO_LONGS(mcam->bmap_entries),
-                                          sizeof(long), GFP_KERNEL);
+       mcam->bmap_reverse = bitmap_zalloc(mcam->bmap_entries, GFP_KERNEL);
        if (!mcam->bmap_reverse)
                goto free_bmap;
 
        mcam->bmap_fcnt = mcam->bmap_entries;
 
        /* Alloc memory for saving entry to RVU PFFUNC allocation mapping */
-       mcam->entry2pfvf_map = kmalloc_array(mcam->bmap_entries,
-                                            sizeof(u16), GFP_KERNEL);
+       mcam->entry2pfvf_map = kcalloc(mcam->bmap_entries, sizeof(u16),
+                                      GFP_KERNEL);
+
        if (!mcam->entry2pfvf_map)
                goto free_bmap_reverse;
 
@@ -1941,21 +1922,21 @@ int npc_mcam_rsrcs_init(struct rvu *rvu, int blkaddr)
        if (err)
                goto free_entry_map;
 
-       mcam->cntr2pfvf_map = kmalloc_array(mcam->counters.max,
-                                           sizeof(u16), GFP_KERNEL);
+       mcam->cntr2pfvf_map = kcalloc(mcam->counters.max, sizeof(u16),
+                                     GFP_KERNEL);
        if (!mcam->cntr2pfvf_map)
                goto free_cntr_bmap;
 
        /* Alloc memory for MCAM entry to counter mapping and for tracking
         * counter's reference count.
         */
-       mcam->entry2cntr_map = kmalloc_array(mcam->bmap_entries,
-                                            sizeof(u16), GFP_KERNEL);
+       mcam->entry2cntr_map = kcalloc(mcam->bmap_entries, sizeof(u16),
+                                      GFP_KERNEL);
        if (!mcam->entry2cntr_map)
                goto free_cntr_map;
 
-       mcam->cntr_refcnt = kmalloc_array(mcam->counters.max,
-                                         sizeof(u16), GFP_KERNEL);
+       mcam->cntr_refcnt = kcalloc(mcam->counters.max, sizeof(u16),
+                                   GFP_KERNEL);
        if (!mcam->cntr_refcnt)
                goto free_entry_cntr_map;
 
@@ -1988,9 +1969,9 @@ free_cntr_bmap:
 free_entry_map:
        kfree(mcam->entry2pfvf_map);
 free_bmap_reverse:
-       kfree(mcam->bmap_reverse);
+       bitmap_free(mcam->bmap_reverse);
 free_bmap:
-       kfree(mcam->bmap);
+       bitmap_free(mcam->bmap);
 
        return -ENOMEM;
 }
@@ -2852,12 +2833,6 @@ int rvu_mbox_handler_npc_mcam_write_entry(struct rvu *rvu,
        else
                nix_intf = pfvf->nix_rx_intf;
 
-       if (!is_pffunc_af(pcifunc) &&
-           npc_mcam_verify_pf_func(rvu, &req->entry_data, req->intf, pcifunc)) {
-               rc = NPC_MCAM_INVALID_REQ;
-               goto exit;
-       }
-
        /* For AF installed rules, the nix_intf should be set to target NIX */
        if (is_pffunc_af(req->hdr.pcifunc))
                nix_intf = req->intf;
@@ -3209,10 +3184,6 @@ int rvu_mbox_handler_npc_mcam_alloc_and_write_entry(struct rvu *rvu,
        if (!is_npc_interface_valid(rvu, req->intf))
                return NPC_MCAM_INVALID_REQ;
 
-       if (npc_mcam_verify_pf_func(rvu, &req->entry_data, req->intf,
-                                   req->hdr.pcifunc))
-               return NPC_MCAM_INVALID_REQ;
-
        /* Try to allocate a MCAM entry */
        entry_req.hdr.pcifunc = req->hdr.pcifunc;
        entry_req.contig = true;
index 7ca6941ea0b9b4d684ba45482b88db066a728f98..02d0b707aea5bd6b9dea286180914b5aaba4a51d 100644 (file)
@@ -951,8 +951,11 @@ int otx2_sq_init(struct otx2_nic *pfvf, u16 qidx, u16 sqb_aura)
        if (pfvf->ptp && qidx < pfvf->hw.tx_queues) {
                err = qmem_alloc(pfvf->dev, &sq->timestamps, qset->sqe_cnt,
                                 sizeof(*sq->timestamps));
-               if (err)
+               if (err) {
+                       kfree(sq->sg);
+                       sq->sg = NULL;
                        return err;
+               }
        }
 
        sq->head = 0;
@@ -968,7 +971,14 @@ int otx2_sq_init(struct otx2_nic *pfvf, u16 qidx, u16 sqb_aura)
        sq->stats.bytes = 0;
        sq->stats.pkts = 0;
 
-       return pfvf->hw_ops->sq_aq_init(pfvf, qidx, sqb_aura);
+       err = pfvf->hw_ops->sq_aq_init(pfvf, qidx, sqb_aura);
+       if (err) {
+               kfree(sq->sg);
+               sq->sg = NULL;
+               return err;
+       }
+
+       return 0;
 
 }
 
index 2928898c7f8df89c45092c209f9a3dd25b43ee21..7f786de6101483a775c8aa4f12789631040fdd95 100644 (file)
@@ -314,7 +314,6 @@ static int otx2_set_channels(struct net_device *dev,
        pfvf->hw.tx_queues = channel->tx_count;
        if (pfvf->xdp_prog)
                pfvf->hw.xdp_queues = channel->rx_count;
-       pfvf->hw.non_qos_queues =  pfvf->hw.tx_queues + pfvf->hw.xdp_queues;
 
        if (if_up)
                err = dev->netdev_ops->ndo_open(dev);
index a57455aebff6fc58e24c4a4da2d60d78e59f439f..e5fe67e7386551e321949dc3b42074067eb4b3a9 100644 (file)
@@ -1744,6 +1744,7 @@ int otx2_open(struct net_device *netdev)
        /* RQ and SQs are mapped to different CQs,
         * so find out max CQ IRQs (i.e CINTs) needed.
         */
+       pf->hw.non_qos_queues = pf->hw.tx_queues + pf->hw.xdp_queues;
        pf->hw.cint_cnt = max3(pf->hw.rx_queues, pf->hw.tx_queues,
                               pf->hw.tc_tx_queues);
 
@@ -2643,8 +2644,6 @@ static int otx2_xdp_setup(struct otx2_nic *pf, struct bpf_prog *prog)
                xdp_features_clear_redirect_target(dev);
        }
 
-       pf->hw.non_qos_queues += pf->hw.xdp_queues;
-
        if (if_up)
                otx2_open(pf->netdev);
 
index 4d519ea833b2c7c4fa439ee56fdd07962221030c..f828d32737af02f6a1492e015a1a3d77a732e732 100644 (file)
@@ -1403,7 +1403,7 @@ static bool otx2_xdp_rcv_pkt_handler(struct otx2_nic *pfvf,
                                     struct otx2_cq_queue *cq,
                                     bool *need_xdp_flush)
 {
-       unsigned char *hard_start, *data;
+       unsigned char *hard_start;
        int qidx = cq->cq_idx;
        struct xdp_buff xdp;
        struct page *page;
@@ -1417,9 +1417,8 @@ static bool otx2_xdp_rcv_pkt_handler(struct otx2_nic *pfvf,
 
        xdp_init_buff(&xdp, pfvf->rbsize, &cq->xdp_rxq);
 
-       data = (unsigned char *)phys_to_virt(pa);
-       hard_start = page_address(page);
-       xdp_prepare_buff(&xdp, hard_start, data - hard_start,
+       hard_start = (unsigned char *)phys_to_virt(pa);
+       xdp_prepare_buff(&xdp, hard_start, OTX2_HEAD_ROOM,
                         cqe->sg.seg_size, false);
 
        act = bpf_prog_run_xdp(prog, &xdp);
index a6e91573f8dae8368f7667f5f5caa5636d881a60..de123350bd46b6e55ee5ea83737f79a4bceb6867 100644 (file)
@@ -4761,7 +4761,10 @@ static int mtk_probe(struct platform_device *pdev)
        }
 
        if (MTK_HAS_CAPS(eth->soc->caps, MTK_36BIT_DMA)) {
-               err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(36));
+               err = dma_set_mask(&pdev->dev, DMA_BIT_MASK(36));
+               if (!err)
+                       err = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(32));
+
                if (err) {
                        dev_err(&pdev->dev, "Wrong DMA config\n");
                        return -EINVAL;
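
The split above matters because dma_set_mask_and_coherent() applies one width to both streaming and coherent DMA; this SoC can address packet buffers through 36 bits while its coherent allocations (descriptor rings) apparently must stay below 4 GiB, so the two masks are set separately. For reference, the combined helper is approximately equivalent to:

    /* sketch of what dma_set_mask_and_coherent(dev, mask) amounts to */
    int rc = dma_set_mask(dev, mask);

    if (rc == 0)
    	dma_set_coherent_mask(dev, mask);	/* same width for both */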
index a7b1f9686c09a9a0d6370ee85e3cc33d8b4cd302..4957412ff1f65a8d0621410127d7d58d1cdd175f 100644 (file)
@@ -1923,6 +1923,7 @@ static void cmd_status_log(struct mlx5_core_dev *dev, u16 opcode, u8 status,
 {
        const char *namep = mlx5_command_str(opcode);
        struct mlx5_cmd_stats *stats;
+       unsigned long flags;
 
        if (!err || !(strcmp(namep, "unknown command opcode")))
                return;
@@ -1930,7 +1931,7 @@ static void cmd_status_log(struct mlx5_core_dev *dev, u16 opcode, u8 status,
        stats = xa_load(&dev->cmd.stats, opcode);
        if (!stats)
                return;
-       spin_lock_irq(&stats->lock);
+       spin_lock_irqsave(&stats->lock, flags);
        stats->failed++;
        if (err < 0)
                stats->last_failed_errno = -err;
@@ -1939,7 +1940,7 @@ static void cmd_status_log(struct mlx5_core_dev *dev, u16 opcode, u8 status,
                stats->last_failed_mbox_status = status;
                stats->last_failed_syndrome = syndrome;
        }
-       spin_unlock_irq(&stats->lock);
+       spin_unlock_irqrestore(&stats->lock, flags);
 }
 
 /* preserve -EREMOTEIO for outbox.status != OK, otherwise return err as is */
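
The switch to spin_lock_irqsave() above makes cmd_status_log() safe to call with interrupts already disabled: spin_unlock_irq() unconditionally re-enables IRQs, which would corrupt the interrupt state of such a caller, while the irqsave/irqrestore pair preserves whatever state was in effect. The hazard, in sketch form:

    unsigned long flags;

    spin_lock_irq(&stats->lock);		/* unlock re-enables IRQs, always   */
    spin_unlock_irq(&stats->lock);		/* wrong if caller had IRQs off     */

    spin_lock_irqsave(&stats->lock, flags);	/* remembers the prior IRQ state    */
    spin_unlock_irqrestore(&stats->lock, flags); /* ...and restores it          */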
index 3e064234f6fe950273e16a80f31b365e3cad4865..98d4306929f3edf3782d573b5f207a04d64fecdb 100644 (file)
@@ -157,6 +157,12 @@ static int mlx5_devlink_reload_down(struct devlink *devlink, bool netns_change,
                return -EOPNOTSUPP;
        }
 
+       if (action == DEVLINK_RELOAD_ACTION_FW_ACTIVATE &&
+           !dev->priv.fw_reset) {
+               NL_SET_ERR_MSG_MOD(extack, "FW activate is unsupported for this function");
+               return -EOPNOTSUPP;
+       }
+
        if (mlx5_core_is_pf(dev) && pci_num_vf(pdev))
                NL_SET_ERR_MSG_MOD(extack, "reload while VFs are present is unfavorable");
 
index 18fed2b34fb1cad6319f972ca5b6d604701bbce5..d74a5aaf426863681eac1be35a8ac64054087828 100644 (file)
@@ -261,7 +261,7 @@ static void mlx5_dpll_netdev_dpll_pin_set(struct mlx5_dpll *mdpll,
 {
        if (mdpll->tracking_netdev)
                return;
-       netdev_dpll_pin_set(netdev, mdpll->dpll_pin);
+       dpll_netdev_pin_set(netdev, mdpll->dpll_pin);
        mdpll->tracking_netdev = netdev;
 }
 
@@ -269,7 +269,7 @@ static void mlx5_dpll_netdev_dpll_pin_clear(struct mlx5_dpll *mdpll)
 {
        if (!mdpll->tracking_netdev)
                return;
-       netdev_dpll_pin_clear(mdpll->tracking_netdev);
+       dpll_netdev_pin_clear(mdpll->tracking_netdev);
        mdpll->tracking_netdev = NULL;
 }
 
@@ -389,7 +389,7 @@ static void mlx5_dpll_remove(struct auxiliary_device *adev)
        struct mlx5_dpll *mdpll = auxiliary_get_drvdata(adev);
        struct mlx5_core_dev *mdev = mdpll->mdev;
 
-       cancel_delayed_work(&mdpll->work);
+       cancel_delayed_work_sync(&mdpll->work);
        mlx5_dpll_mdev_netdev_untrack(mdpll, mdev);
        destroy_workqueue(mdpll->wq);
        dpll_pin_unregister(mdpll->dpll, mdpll->dpll_pin,
index 0bfe1ca8a364233a1d6fb92846d5b54d07d2bcdc..55c6ace0acd557b075c3bae6ff0818ca84fc3ae8 100644 (file)
@@ -1124,7 +1124,7 @@ static inline bool mlx5_tx_swp_supported(struct mlx5_core_dev *mdev)
 extern const struct ethtool_ops mlx5e_ethtool_ops;
 
 int mlx5e_create_mkey(struct mlx5_core_dev *mdev, u32 pdn, u32 *mkey);
-int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev);
+int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev, bool create_tises);
 void mlx5e_destroy_mdev_resources(struct mlx5_core_dev *mdev);
 int mlx5e_refresh_tirs(struct mlx5e_priv *priv, bool enable_uc_lb,
                       bool enable_mc_lb);
index e1283531e0b810f78d3b18d20cd6e3ba56c9b84f..671adbad0a40f643bbd1f82e56233f7ae11872ce 100644 (file)
@@ -436,6 +436,7 @@ static int fs_any_create_groups(struct mlx5e_flow_table *ft)
        in = kvzalloc(inlen, GFP_KERNEL);
        if (!in || !ft->g) {
                kfree(ft->g);
+               ft->g = NULL;
                kvfree(in);
                return -ENOMEM;
        }
index 284253b79266b937f4d654361c7b891278e9fda9..5d213a9886f11c4bed6a2b8c5e5bd708ce08bef3 100644 (file)
@@ -1064,8 +1064,8 @@ void mlx5e_build_sq_param(struct mlx5_core_dev *mdev,
        void *wq = MLX5_ADDR_OF(sqc, sqc, wq);
        bool allow_swp;
 
-       allow_swp =
-               mlx5_geneve_tx_allowed(mdev) || !!mlx5_ipsec_device_caps(mdev);
+       allow_swp = mlx5_geneve_tx_allowed(mdev) ||
+                   (mlx5_ipsec_device_caps(mdev) & MLX5_IPSEC_CAP_CRYPTO);
        mlx5e_build_sq_param_common(mdev, param);
        MLX5_SET(wq, wq, log_wq_sz, params->log_sq_size);
        MLX5_SET(sqc, sqc, allow_swp, allow_swp);
index c206cc0a84832e6ebf104cc92990c43a81e94d60..ca05b3252a1b0bd395d07ec37f02edfaa40f477c 100644 (file)
@@ -42,9 +42,9 @@ mlx5e_ptp_port_ts_cqe_list_add(struct mlx5e_ptp_port_ts_cqe_list *list, u8 metad
 
        WARN_ON_ONCE(tracker->inuse);
        tracker->inuse = true;
-       spin_lock(&list->tracker_list_lock);
+       spin_lock_bh(&list->tracker_list_lock);
        list_add_tail(&tracker->entry, &list->tracker_list_head);
-       spin_unlock(&list->tracker_list_lock);
+       spin_unlock_bh(&list->tracker_list_lock);
 }
 
 static void
@@ -54,9 +54,9 @@ mlx5e_ptp_port_ts_cqe_list_remove(struct mlx5e_ptp_port_ts_cqe_list *list, u8 me
 
        WARN_ON_ONCE(!tracker->inuse);
        tracker->inuse = false;
-       spin_lock(&list->tracker_list_lock);
+       spin_lock_bh(&list->tracker_list_lock);
        list_del(&tracker->entry);
-       spin_unlock(&list->tracker_list_lock);
+       spin_unlock_bh(&list->tracker_list_lock);
 }
 
 void mlx5e_ptpsq_track_metadata(struct mlx5e_ptpsq *ptpsq, u8 metadata)
@@ -155,7 +155,7 @@ static void mlx5e_ptpsq_mark_ts_cqes_undelivered(struct mlx5e_ptpsq *ptpsq,
        struct mlx5e_ptp_metadata_map *metadata_map = &ptpsq->metadata_map;
        struct mlx5e_ptp_port_ts_cqe_tracker *pos, *n;
 
-       spin_lock(&cqe_list->tracker_list_lock);
+       spin_lock_bh(&cqe_list->tracker_list_lock);
        list_for_each_entry_safe(pos, n, &cqe_list->tracker_list_head, entry) {
                struct sk_buff *skb =
                        mlx5e_ptp_metadata_map_lookup(metadata_map, pos->metadata_id);
@@ -170,7 +170,7 @@ static void mlx5e_ptpsq_mark_ts_cqes_undelivered(struct mlx5e_ptpsq *ptpsq,
                pos->inuse = false;
                list_del(&pos->entry);
        }
-       spin_unlock(&cqe_list->tracker_list_lock);
+       spin_unlock_bh(&cqe_list->tracker_list_lock);
 }
 
 #define PTP_WQE_CTR2IDX(val) ((val) & ptpsq->ts_cqe_ctr_mask)
@@ -213,7 +213,7 @@ static void mlx5e_ptp_handle_ts_cqe(struct mlx5e_ptpsq *ptpsq,
        mlx5e_ptpsq_mark_ts_cqes_undelivered(ptpsq, hwtstamp);
 out:
        napi_consume_skb(skb, budget);
-       md_buff[*md_buff_sz++] = metadata_id;
+       md_buff[(*md_buff_sz)++] = metadata_id;
        if (unlikely(mlx5e_ptp_metadata_map_unhealthy(&ptpsq->metadata_map)) &&
            !test_and_set_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state))
                queue_work(ptpsq->txqsq.priv->wq, &ptpsq->report_unhealthy_work);
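
Alongside the _bh locking changes, note the one-character fix above: `*md_buff_sz++` parses as `*(md_buff_sz++)`, so the old code read through the pointer and then advanced the pointer itself, never incrementing the element count. The parenthesized form increments the pointed-to value. In miniature (illustrative variables, not driver code):

    u8 md_buff[8], count = 0, id = 42;
    u8 *md_buff_sz = &count;

    /* wrong: postfix ++ binds tighter than unary *, so this reads
     * *md_buff_sz and then advances the pointer, leaving count at 0
     */
    /* md_buff[*md_buff_sz++] = id; */

    /* right: index by the counter, then increment the counter itself */
    md_buff[(*md_buff_sz)++] = id;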
index 86bf007fd05b7327a79918b5de9beea9353b70e1..b500cc2c9689d1973d8736b7fa4b421e92d4c5ea 100644 (file)
@@ -37,7 +37,7 @@ mlx5e_tc_post_act_init(struct mlx5e_priv *priv, struct mlx5_fs_chains *chains,
 
        if (!MLX5_CAP_FLOWTABLE_TYPE(priv->mdev, ignore_flow_level, table_type)) {
                if (priv->mdev->coredev_type == MLX5_COREDEV_PF)
-                       mlx5_core_warn(priv->mdev, "firmware level support is missing\n");
+                       mlx5_core_dbg(priv->mdev, "firmware flow level support is missing\n");
                err = -EOPNOTSUPP;
                goto err_check;
        }
index 161c5190c236a0d8d048bd6a253a36cdeb12b9bc..05612d9c6080c776e9bdded54d9848f8829748fa 100644 (file)
@@ -336,12 +336,17 @@ void mlx5e_ipsec_build_accel_xfrm_attrs(struct mlx5e_ipsec_sa_entry *sa_entry,
        /* iv len */
        aes_gcm->icv_len = x->aead->alg_icv_len;
 
+       attrs->dir = x->xso.dir;
+
        /* esn */
        if (x->props.flags & XFRM_STATE_ESN) {
                attrs->replay_esn.trigger = true;
                attrs->replay_esn.esn = sa_entry->esn_state.esn;
                attrs->replay_esn.esn_msb = sa_entry->esn_state.esn_msb;
                attrs->replay_esn.overlap = sa_entry->esn_state.overlap;
+               if (attrs->dir == XFRM_DEV_OFFLOAD_OUT)
+                       goto skip_replay_window;
+
                switch (x->replay_esn->replay_window) {
                case 32:
                        attrs->replay_esn.replay_window =
@@ -365,7 +370,7 @@ void mlx5e_ipsec_build_accel_xfrm_attrs(struct mlx5e_ipsec_sa_entry *sa_entry,
                }
        }
 
-       attrs->dir = x->xso.dir;
+skip_replay_window:
        /* spi */
        attrs->spi = be32_to_cpu(x->id.spi);
 
@@ -501,7 +506,8 @@ static int mlx5e_xfrm_validate_state(struct mlx5_core_dev *mdev,
                        return -EINVAL;
                }
 
-               if (x->replay_esn && x->replay_esn->replay_window != 32 &&
+               if (x->replay_esn && x->xso.dir == XFRM_DEV_OFFLOAD_IN &&
+                   x->replay_esn->replay_window != 32 &&
                    x->replay_esn->replay_window != 64 &&
                    x->replay_esn->replay_window != 128 &&
                    x->replay_esn->replay_window != 256) {
index d4ebd8743114573e6da44b6eb2418653ab1c4922..b2cabd6ab86cb9044f8d0dc404fa8a052d31938c 100644 (file)
@@ -310,9 +310,9 @@ static void mlx5e_macsec_destroy_object(struct mlx5_core_dev *mdev, u32 macsec_o
        mlx5_cmd_exec(mdev, in, sizeof(in), out, sizeof(out));
 }
 
-static void mlx5e_macsec_cleanup_sa(struct mlx5e_macsec *macsec,
-                                   struct mlx5e_macsec_sa *sa,
-                                   bool is_tx, struct net_device *netdev, u32 fs_id)
+static void mlx5e_macsec_cleanup_sa_fs(struct mlx5e_macsec *macsec,
+                                      struct mlx5e_macsec_sa *sa, bool is_tx,
+                                      struct net_device *netdev, u32 fs_id)
 {
        int action =  (is_tx) ?  MLX5_ACCEL_MACSEC_ACTION_ENCRYPT :
                                 MLX5_ACCEL_MACSEC_ACTION_DECRYPT;
@@ -322,20 +322,49 @@ static void mlx5e_macsec_cleanup_sa(struct mlx5e_macsec *macsec,
 
        mlx5_macsec_fs_del_rule(macsec->mdev->macsec_fs, sa->macsec_rule, action, netdev,
                                fs_id);
-       mlx5e_macsec_destroy_object(macsec->mdev, sa->macsec_obj_id);
        sa->macsec_rule = NULL;
 }
 
+static void mlx5e_macsec_cleanup_sa(struct mlx5e_macsec *macsec,
+                                   struct mlx5e_macsec_sa *sa, bool is_tx,
+                                   struct net_device *netdev, u32 fs_id)
+{
+       mlx5e_macsec_cleanup_sa_fs(macsec, sa, is_tx, netdev, fs_id);
+       mlx5e_macsec_destroy_object(macsec->mdev, sa->macsec_obj_id);
+}
+
+static int mlx5e_macsec_init_sa_fs(struct macsec_context *ctx,
+                                  struct mlx5e_macsec_sa *sa, bool encrypt,
+                                  bool is_tx, u32 *fs_id)
+{
+       struct mlx5e_priv *priv = macsec_netdev_priv(ctx->netdev);
+       struct mlx5_macsec_fs *macsec_fs = priv->mdev->macsec_fs;
+       struct mlx5_macsec_rule_attrs rule_attrs;
+       union mlx5_macsec_rule *macsec_rule;
+
+       rule_attrs.macsec_obj_id = sa->macsec_obj_id;
+       rule_attrs.sci = sa->sci;
+       rule_attrs.assoc_num = sa->assoc_num;
+       rule_attrs.action = (is_tx) ? MLX5_ACCEL_MACSEC_ACTION_ENCRYPT :
+                                     MLX5_ACCEL_MACSEC_ACTION_DECRYPT;
+
+       macsec_rule = mlx5_macsec_fs_add_rule(macsec_fs, ctx, &rule_attrs, fs_id);
+       if (!macsec_rule)
+               return -ENOMEM;
+
+       sa->macsec_rule = macsec_rule;
+
+       return 0;
+}
+
 static int mlx5e_macsec_init_sa(struct macsec_context *ctx,
                                struct mlx5e_macsec_sa *sa,
                                bool encrypt, bool is_tx, u32 *fs_id)
 {
        struct mlx5e_priv *priv = macsec_netdev_priv(ctx->netdev);
        struct mlx5e_macsec *macsec = priv->macsec;
-       struct mlx5_macsec_rule_attrs rule_attrs;
        struct mlx5_core_dev *mdev = priv->mdev;
        struct mlx5_macsec_obj_attrs obj_attrs;
-       union mlx5_macsec_rule *macsec_rule;
        int err;
 
        obj_attrs.next_pn = sa->next_pn;
@@ -357,20 +386,12 @@ static int mlx5e_macsec_init_sa(struct macsec_context *ctx,
        if (err)
                return err;
 
-       rule_attrs.macsec_obj_id = sa->macsec_obj_id;
-       rule_attrs.sci = sa->sci;
-       rule_attrs.assoc_num = sa->assoc_num;
-       rule_attrs.action = (is_tx) ? MLX5_ACCEL_MACSEC_ACTION_ENCRYPT :
-                                     MLX5_ACCEL_MACSEC_ACTION_DECRYPT;
-
-       macsec_rule = mlx5_macsec_fs_add_rule(mdev->macsec_fs, ctx, &rule_attrs, fs_id);
-       if (!macsec_rule) {
-               err = -ENOMEM;
-               goto destroy_macsec_object;
+       if (sa->active) {
+               err = mlx5e_macsec_init_sa_fs(ctx, sa, encrypt, is_tx, fs_id);
+               if (err)
+                       goto destroy_macsec_object;
        }
 
-       sa->macsec_rule = macsec_rule;
-
        return 0;
 
 destroy_macsec_object:
@@ -526,9 +547,7 @@ static int mlx5e_macsec_add_txsa(struct macsec_context *ctx)
                goto destroy_sa;
 
        macsec_device->tx_sa[assoc_num] = tx_sa;
-       if (!secy->operational ||
-           assoc_num != tx_sc->encoding_sa ||
-           !tx_sa->active)
+       if (!secy->operational)
                goto out;
 
        err = mlx5e_macsec_init_sa(ctx, tx_sa, tx_sc->encrypt, true, NULL);
@@ -595,7 +614,7 @@ static int mlx5e_macsec_upd_txsa(struct macsec_context *ctx)
                goto out;
 
        if (ctx_tx_sa->active) {
-               err = mlx5e_macsec_init_sa(ctx, tx_sa, tx_sc->encrypt, true, NULL);
+               err = mlx5e_macsec_init_sa_fs(ctx, tx_sa, tx_sc->encrypt, true, NULL);
                if (err)
                        goto out;
        } else {
@@ -604,7 +623,7 @@ static int mlx5e_macsec_upd_txsa(struct macsec_context *ctx)
                        goto out;
                }
 
-               mlx5e_macsec_cleanup_sa(macsec, tx_sa, true, ctx->secy->netdev, 0);
+               mlx5e_macsec_cleanup_sa_fs(macsec, tx_sa, true, ctx->secy->netdev, 0);
        }
 out:
        mutex_unlock(&macsec->lock);
@@ -1030,8 +1049,9 @@ static int mlx5e_macsec_del_rxsa(struct macsec_context *ctx)
                goto out;
        }
 
-       mlx5e_macsec_cleanup_sa(macsec, rx_sa, false, ctx->secy->netdev,
-                               rx_sc->sc_xarray_element->fs_id);
+       if (rx_sa->active)
+               mlx5e_macsec_cleanup_sa(macsec, rx_sa, false, ctx->secy->netdev,
+                                       rx_sc->sc_xarray_element->fs_id);
        mlx5_destroy_encryption_key(macsec->mdev, rx_sa->enc_key_id);
        kfree(rx_sa);
        rx_sc->rx_sa[assoc_num] = NULL;
@@ -1112,8 +1132,8 @@ static int macsec_upd_secy_hw_address(struct macsec_context *ctx,
                        if (!rx_sa || !rx_sa->macsec_rule)
                                continue;
 
-                       mlx5e_macsec_cleanup_sa(macsec, rx_sa, false, ctx->secy->netdev,
-                                               rx_sc->sc_xarray_element->fs_id);
+                       mlx5e_macsec_cleanup_sa_fs(macsec, rx_sa, false, ctx->secy->netdev,
+                                                  rx_sc->sc_xarray_element->fs_id);
                }
        }
 
@@ -1124,8 +1144,8 @@ static int macsec_upd_secy_hw_address(struct macsec_context *ctx,
                                continue;
 
                        if (rx_sa->active) {
-                               err = mlx5e_macsec_init_sa(ctx, rx_sa, true, false,
-                                                          &rx_sc->sc_xarray_element->fs_id);
+                               err = mlx5e_macsec_init_sa_fs(ctx, rx_sa, true, false,
+                                                             &rx_sc->sc_xarray_element->fs_id);
                                if (err)
                                        goto out;
                        }
@@ -1178,7 +1198,7 @@ static int mlx5e_macsec_upd_secy(struct macsec_context *ctx)
                if (!tx_sa)
                        continue;
 
-               mlx5e_macsec_cleanup_sa(macsec, tx_sa, true, ctx->secy->netdev, 0);
+               mlx5e_macsec_cleanup_sa_fs(macsec, tx_sa, true, ctx->secy->netdev, 0);
        }
 
        for (i = 0; i < MACSEC_NUM_AN; ++i) {
@@ -1187,7 +1207,7 @@ static int mlx5e_macsec_upd_secy(struct macsec_context *ctx)
                        continue;
 
                if (tx_sa->assoc_num == tx_sc->encoding_sa && tx_sa->active) {
-                       err = mlx5e_macsec_init_sa(ctx, tx_sa, tx_sc->encrypt, true, NULL);
+                       err = mlx5e_macsec_init_sa_fs(ctx, tx_sa, tx_sc->encrypt, true, NULL);
                        if (err)
                                goto out;
                }
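
The refactor above splits SA setup and teardown into two layers: the *_fs helpers manage only the flow-steering rule, while the original entry points also own the hardware object, so update paths can toggle rules without recreating the object. A minimal sketch of the resulting pairing; create_hw_object(), attach_rule() and friends are hypothetical stand-ins for the firmware calls, not driver code:

	/* Sketch only: layered lifetime as introduced above. */
	static int sa_setup(struct sa *sa, bool attach_rule_now)
	{
		int err;

		err = create_hw_object(sa);		/* long-lived object */
		if (err)
			return err;

		if (attach_rule_now) {
			err = attach_rule(sa);		/* short-lived steering rule */
			if (err) {
				destroy_hw_object(sa);	/* unwind in reverse order */
				return err;
			}
		}
		return 0;
	}

	static void sa_rule_only_teardown(struct sa *sa)
	{
		detach_rule(sa);		/* e.g. when the SA is deactivated */
	}

	static void sa_full_teardown(struct sa *sa)
	{
		detach_rule(sa);
		destroy_hw_object(sa);		/* full removal */
	}
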
index bb7f86c993e5579735d0310aa70b5c53b3a3ae9e..e66f486faafe1a6b0cfc75f0f11b2e957b040842 100644 (file)
@@ -254,11 +254,13 @@ static int arfs_create_groups(struct mlx5e_flow_table *ft,
 
        ft->g = kcalloc(MLX5E_ARFS_NUM_GROUPS,
                        sizeof(*ft->g), GFP_KERNEL);
-       in = kvzalloc(inlen, GFP_KERNEL);
-       if  (!in || !ft->g) {
-               kfree(ft->g);
-               kvfree(in);
+       if (!ft->g)
                return -ENOMEM;
+
+       in = kvzalloc(inlen, GFP_KERNEL);
+       if (!in) {
+               err = -ENOMEM;
+               goto err_free_g;
        }
 
        mc = MLX5_ADDR_OF(create_flow_group_in, in, match_criteria);
@@ -278,7 +280,7 @@ static int arfs_create_groups(struct mlx5e_flow_table *ft,
                break;
        default:
                err = -EINVAL;
-               goto out;
+               goto err_free_in;
        }
 
        switch (type) {
@@ -300,7 +302,7 @@ static int arfs_create_groups(struct mlx5e_flow_table *ft,
                break;
        default:
                err = -EINVAL;
-               goto out;
+               goto err_free_in;
        }
 
        MLX5_SET_CFG(in, match_criteria_enable, MLX5_MATCH_OUTER_HEADERS);
@@ -309,7 +311,7 @@ static int arfs_create_groups(struct mlx5e_flow_table *ft,
        MLX5_SET_CFG(in, end_flow_index, ix - 1);
        ft->g[ft->num_groups] = mlx5_create_flow_group(ft->t, in);
        if (IS_ERR(ft->g[ft->num_groups]))
-               goto err;
+               goto err_clean_group;
        ft->num_groups++;
 
        memset(in, 0, inlen);
@@ -318,18 +320,20 @@ static int arfs_create_groups(struct mlx5e_flow_table *ft,
        MLX5_SET_CFG(in, end_flow_index, ix - 1);
        ft->g[ft->num_groups] = mlx5_create_flow_group(ft->t, in);
        if (IS_ERR(ft->g[ft->num_groups]))
-               goto err;
+               goto err_clean_group;
        ft->num_groups++;
 
        kvfree(in);
        return 0;
 
-err:
+err_clean_group:
        err = PTR_ERR(ft->g[ft->num_groups]);
        ft->g[ft->num_groups] = NULL;
-out:
+err_free_in:
        kvfree(in);
-
+err_free_g:
+       kfree(ft->g);
+       ft->g = NULL;
        return err;
 }
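
The arfs_create_groups() rework above replaces a combined allocation check with the usual kernel goto-unwind ladder: each failure label releases exactly what was acquired before it, in reverse order, and ft->g is NULLed so a later cleanup path cannot double-free it. The bare shape of the idiom, sketched with hypothetical alloc_a()/alloc_b() resources:

	static int setup_two(struct ctx *c)
	{
		int err;

		c->a = alloc_a();
		if (!c->a)
			return -ENOMEM;		/* nothing to unwind yet */

		c->b = alloc_b();
		if (!c->b) {
			err = -ENOMEM;
			goto err_free_a;	/* unwind only what exists */
		}

		return 0;

	err_free_a:
		free_a(c->a);
		c->a = NULL;			/* guard against a later double free */
		return err;
	}
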
 
index 67f546683e85a3fa0bed05baab33790c66eb9168..6ed3a32b7e226d497234e4fa7b244bf9629b5710 100644 (file)
@@ -95,7 +95,7 @@ static void mlx5e_destroy_tises(struct mlx5_core_dev *mdev, u32 tisn[MLX5_MAX_PO
 {
        int tc, i;
 
-       for (i = 0; i < MLX5_MAX_PORTS; i++)
+       for (i = 0; i < mlx5e_get_num_lag_ports(mdev); i++)
                for (tc = 0; tc < MLX5_MAX_NUM_TC; tc++)
                        mlx5e_destroy_tis(mdev, tisn[i][tc]);
 }
@@ -110,7 +110,7 @@ static int mlx5e_create_tises(struct mlx5_core_dev *mdev, u32 tisn[MLX5_MAX_PORT
        int tc, i;
        int err;
 
-       for (i = 0; i < MLX5_MAX_PORTS; i++) {
+       for (i = 0; i < mlx5e_get_num_lag_ports(mdev); i++) {
                for (tc = 0; tc < MLX5_MAX_NUM_TC; tc++) {
                        u32 in[MLX5_ST_SZ_DW(create_tis_in)] = {};
                        void *tisc;
@@ -140,7 +140,7 @@ err_close_tises:
        return err;
 }
 
-int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev)
+int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev, bool create_tises)
 {
        struct mlx5e_hw_objs *res = &mdev->mlx5e_res.hw_objs;
        int err;
@@ -169,11 +169,15 @@ int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev)
                goto err_destroy_mkey;
        }
 
-       err = mlx5e_create_tises(mdev, res->tisn);
-       if (err) {
-               mlx5_core_err(mdev, "alloc tises failed, %d\n", err);
-               goto err_destroy_bfreg;
+       if (create_tises) {
+               err = mlx5e_create_tises(mdev, res->tisn);
+               if (err) {
+                       mlx5_core_err(mdev, "alloc tises failed, %d\n", err);
+                       goto err_destroy_bfreg;
+               }
+               res->tisn_valid = true;
        }
+
        INIT_LIST_HEAD(&res->td.tirs_list);
        mutex_init(&res->td.list_lock);
 
@@ -203,7 +207,8 @@ void mlx5e_destroy_mdev_resources(struct mlx5_core_dev *mdev)
 
        mlx5_crypto_dek_cleanup(mdev->mlx5e_res.dek_priv);
        mdev->mlx5e_res.dek_priv = NULL;
-       mlx5e_destroy_tises(mdev, res->tisn);
+       if (res->tisn_valid)
+               mlx5e_destroy_tises(mdev, res->tisn);
        mlx5_free_bfreg(mdev, &res->bfreg);
        mlx5_core_destroy_mkey(mdev, res->mkey);
        mlx5_core_dealloc_transport_domain(mdev, res->td.tdn);
index b5f1c4ca38bac97d860ed2bb5f37eac332e6e24c..c8e8f512803efb7aea48e90259e852a55882403c 100644 (file)
@@ -5992,7 +5992,7 @@ static int mlx5e_resume(struct auxiliary_device *adev)
        if (netif_device_present(netdev))
                return 0;
 
-       err = mlx5e_create_mdev_resources(mdev);
+       err = mlx5e_create_mdev_resources(mdev, true);
        if (err)
                return err;
 
index 30932c9c9a8f08bca2c8025f0a5685f79695e54d..9fb2c057bd78723420478d93001e74e7599d646e 100644 (file)
@@ -761,7 +761,7 @@ static int mlx5e_hairpin_create_indirect_rqt(struct mlx5e_hairpin *hp)
 
        err = mlx5e_rss_params_indir_init(&indir, mdev,
                                          mlx5e_rqt_size(mdev, hp->num_channels),
-                                         mlx5e_rqt_size(mdev, priv->max_nch));
+                                         mlx5e_rqt_size(mdev, hp->num_channels));
        if (err)
                return err;
 
@@ -2014,9 +2014,10 @@ static void mlx5e_tc_del_fdb_peer_flow(struct mlx5e_tc_flow *flow,
        list_for_each_entry_safe(peer_flow, tmp, &flow->peer_flows, peer_flows) {
                if (peer_index != mlx5_get_dev_index(peer_flow->priv->mdev))
                        continue;
+
+               list_del(&peer_flow->peer_flows);
                if (refcount_dec_and_test(&peer_flow->refcnt)) {
                        mlx5e_tc_del_fdb_flow(peer_flow->priv, peer_flow);
-                       list_del(&peer_flow->peer_flows);
                        kfree(peer_flow);
                }
        }
index 5c166d9d2dca62a8db671c7cb62476781a8d1b97..2fa076b23fbead06bceb6697e0ebb0238bb5be7e 100644 (file)
@@ -401,6 +401,8 @@ mlx5e_txwqe_complete(struct mlx5e_txqsq *sq, struct sk_buff *skb,
                mlx5e_skb_cb_hwtstamp_init(skb);
                mlx5e_ptp_metadata_map_put(&sq->ptpsq->metadata_map, skb,
                                           metadata_index);
+               /* Ensure the skb is on the metadata_map before the index is tracked */
+               wmb();
                mlx5e_ptpsq_track_metadata(sq->ptpsq, metadata_index);
                if (!netif_tx_queue_stopped(sq->txq) &&
                    mlx5e_ptpsq_metadata_freelist_empty(sq->ptpsq)) {
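
The wmb() added above orders the metadata_map store before the index becomes visible to the consumer; a barrier like this only works as one half of a pair, with the reader issuing the matching barrier in the opposite order. The generic shape of the pairing, as a sketch (the reader side is assumed, it is not part of this hunk):

	/* Producer: publish the payload, then the index that advertises it. */
	map[idx] = skb;
	wmb();				/* payload visible before the index */
	WRITE_ONCE(tail, idx);

	/* Assumed consumer: read the index, then the payload. */
	idx = READ_ONCE(tail);
	rmb();				/* pairs with the producer's wmb() */
	skb = map[idx];
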
index a7ed87e9d8426befdbda753b52732400e003f1b8..22dd30cf8033f93134d08ed77fe32dc73b6bbaf2 100644 (file)
@@ -83,6 +83,7 @@ mlx5_esw_bridge_mdb_flow_create(u16 esw_owner_vhca_id, struct mlx5_esw_bridge_md
                i++;
        }
 
+       rule_spec->flow_context.flags |= FLOW_CONTEXT_UPLINK_HAIRPIN_EN;
        rule_spec->match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
        dmac_v = MLX5_ADDR_OF(fte_match_param, rule_spec->match_value, outer_headers.dmac_47_16);
        ether_addr_copy(dmac_v, entry->key.addr);
@@ -587,6 +588,7 @@ mlx5_esw_bridge_mcast_vlan_flow_create(u16 vlan_proto, struct mlx5_esw_bridge_po
        if (!rule_spec)
                return ERR_PTR(-ENOMEM);
 
+       rule_spec->flow_context.flags |= FLOW_CONTEXT_UPLINK_HAIRPIN_EN;
        rule_spec->match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
 
        flow_act.action |= MLX5_FLOW_CONTEXT_ACTION_PACKET_REFORMAT;
@@ -662,6 +664,7 @@ mlx5_esw_bridge_mcast_fwd_flow_create(struct mlx5_esw_bridge_port *port)
                dest.vport.flags = MLX5_FLOW_DEST_VPORT_VHCA_ID;
                dest.vport.vhca_id = port->esw_owner_vhca_id;
        }
+       rule_spec->flow_context.flags |= FLOW_CONTEXT_UPLINK_HAIRPIN_EN;
        handle = mlx5_add_flow_rules(port->mcast.ft, rule_spec, &flow_act, &dest, 1);
 
        kvfree(rule_spec);
index 190f10aba17028211fc6c34abaa7b35d44310ba2..5a0047bdcb5105ae4992578003007d21dd4fa1b5 100644 (file)
@@ -152,7 +152,7 @@ void mlx5_esw_ipsec_restore_dest_uplink(struct mlx5_core_dev *mdev)
 
        xa_for_each(&esw->offloads.vport_reps, i, rep) {
                rpriv = rep->rep_data[REP_ETH].priv;
-               if (!rpriv || !rpriv->netdev || !atomic_read(&rpriv->tc_ht.nelems))
+               if (!rpriv || !rpriv->netdev)
                        continue;
 
                rhashtable_walk_enter(&rpriv->tc_ht, &iter);
index b0455134c98eff62c82b3d35bc8c600dc059d960..baaae628b0a0f6510e2c350cbab0b6309b32da52 100644 (file)
@@ -535,21 +535,26 @@ esw_src_port_rewrite_supported(struct mlx5_eswitch *esw)
 }
 
 static bool
-esw_dests_to_vf_pf_vports(struct mlx5_flow_destination *dests, int max_dest)
+esw_dests_to_int_external(struct mlx5_flow_destination *dests, int max_dest)
 {
-       bool vf_dest = false, pf_dest = false;
+       bool internal_dest = false, external_dest = false;
        int i;
 
        for (i = 0; i < max_dest; i++) {
-               if (dests[i].type != MLX5_FLOW_DESTINATION_TYPE_VPORT)
+               if (dests[i].type != MLX5_FLOW_DESTINATION_TYPE_VPORT &&
+                   dests[i].type != MLX5_FLOW_DESTINATION_TYPE_UPLINK)
                        continue;
 
-               if (dests[i].vport.num == MLX5_VPORT_UPLINK)
-                       pf_dest = true;
+               /* An uplink dest is external, but is considered internal
+                * if a reformat is present, because firmware uses LB+hairpin
+                * to support it.
+                */
+               if (dests[i].vport.num == MLX5_VPORT_UPLINK &&
+                   !(dests[i].vport.flags & MLX5_FLOW_DEST_VPORT_REFORMAT_ID))
+                       external_dest = true;
                else
-                       vf_dest = true;
+                       internal_dest = true;
 
-               if (vf_dest && pf_dest)
+               if (internal_dest && external_dest)
                        return true;
        }
 
@@ -695,9 +700,9 @@ mlx5_eswitch_add_offloaded_rule(struct mlx5_eswitch *esw,
 
                /* Header rewrite with combined wire+loopback in FDB is not allowed */
                if ((flow_act.action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR) &&
-                   esw_dests_to_vf_pf_vports(dest, i)) {
+                   esw_dests_to_int_external(dest, i)) {
                        esw_warn(esw->dev,
-                                "FDB: Header rewrite with forwarding to both PF and VF is not allowed\n");
+                                "FDB: Header rewrite with forwarding to both internal and external dests is not allowed\n");
                        rule = ERR_PTR(-EINVAL);
                        goto err_esw_get;
                }
@@ -3658,22 +3663,6 @@ static int esw_inline_mode_to_devlink(u8 mlx5_mode, u8 *mode)
        return 0;
 }
 
-static bool esw_offloads_devlink_ns_eq_netdev_ns(struct devlink *devlink)
-{
-       struct mlx5_core_dev *dev = devlink_priv(devlink);
-       struct net *devl_net, *netdev_net;
-       bool ret = false;
-
-       mutex_lock(&dev->mlx5e_res.uplink_netdev_lock);
-       if (dev->mlx5e_res.uplink_netdev) {
-               netdev_net = dev_net(dev->mlx5e_res.uplink_netdev);
-               devl_net = devlink_net(devlink);
-               ret = net_eq(devl_net, netdev_net);
-       }
-       mutex_unlock(&dev->mlx5e_res.uplink_netdev_lock);
-       return ret;
-}
-
 int mlx5_eswitch_block_mode(struct mlx5_core_dev *dev)
 {
        struct mlx5_eswitch *esw = dev->priv.eswitch;
@@ -3718,13 +3707,6 @@ int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode,
        if (esw_mode_from_devlink(mode, &mlx5_mode))
                return -EINVAL;
 
-       if (mode == DEVLINK_ESWITCH_MODE_SWITCHDEV &&
-           !esw_offloads_devlink_ns_eq_netdev_ns(devlink)) {
-               NL_SET_ERR_MSG_MOD(extack,
-                                  "Can't change E-Switch mode to switchdev when netdev net namespace has diverged from the devlink's.");
-               return -EPERM;
-       }
-
        mlx5_lag_disable_change(esw->dev);
        err = mlx5_esw_try_lock(esw);
        if (err < 0) {
index 1616a6144f7b42d4c7415bc02a05dbf63c61c420..9b8599c200e2c0990009162b90b1e368e784cdef 100644 (file)
@@ -566,6 +566,8 @@ static int mlx5_cmd_set_fte(struct mlx5_core_dev *dev,
                 fte->flow_context.flow_tag);
        MLX5_SET(flow_context, in_flow_context, flow_source,
                 fte->flow_context.flow_source);
+       MLX5_SET(flow_context, in_flow_context, uplink_hairpin_en,
+                !!(fte->flow_context.flags & FLOW_CONTEXT_UPLINK_HAIRPIN_EN));
 
        MLX5_SET(flow_context, in_flow_context, extended_destination,
                 extended_dest);
index f27eab6e49299059ed26ed5edc0cf767b997a08f..2911aa34a5be3f9738b07635a421ba996e4f749a 100644 (file)
@@ -703,19 +703,30 @@ void mlx5_fw_reset_events_start(struct mlx5_core_dev *dev)
 {
        struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset;
 
+       if (!fw_reset)
+               return;
+
        MLX5_NB_INIT(&fw_reset->nb, fw_reset_event_notifier, GENERAL_EVENT);
        mlx5_eq_notifier_register(dev, &fw_reset->nb);
 }
 
 void mlx5_fw_reset_events_stop(struct mlx5_core_dev *dev)
 {
-       mlx5_eq_notifier_unregister(dev, &dev->priv.fw_reset->nb);
+       struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset;
+
+       if (!fw_reset)
+               return;
+
+       mlx5_eq_notifier_unregister(dev, &fw_reset->nb);
 }
 
 void mlx5_drain_fw_reset(struct mlx5_core_dev *dev)
 {
        struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset;
 
+       if (!fw_reset)
+               return;
+
        set_bit(MLX5_FW_RESET_FLAGS_DROP_NEW_REQUESTS, &fw_reset->reset_flags);
        cancel_work_sync(&fw_reset->fw_live_patch_work);
        cancel_work_sync(&fw_reset->reset_request_work);
@@ -733,9 +744,13 @@ static const struct devlink_param mlx5_fw_reset_devlink_params[] = {
 
 int mlx5_fw_reset_init(struct mlx5_core_dev *dev)
 {
-       struct mlx5_fw_reset *fw_reset = kzalloc(sizeof(*fw_reset), GFP_KERNEL);
+       struct mlx5_fw_reset *fw_reset;
        int err;
 
+       if (!MLX5_CAP_MCAM_REG(dev, mfrl))
+               return 0;
+
+       fw_reset = kzalloc(sizeof(*fw_reset), GFP_KERNEL);
        if (!fw_reset)
                return -ENOMEM;
        fw_reset->wq = create_singlethread_workqueue("mlx5_fw_reset_events");
@@ -771,6 +786,9 @@ void mlx5_fw_reset_cleanup(struct mlx5_core_dev *dev)
 {
        struct mlx5_fw_reset *fw_reset = dev->priv.fw_reset;
 
+       if (!fw_reset)
+               return;
+
        devl_params_unregister(priv_to_devlink(dev),
                               mlx5_fw_reset_devlink_params,
                               ARRAY_SIZE(mlx5_fw_reset_devlink_params));
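
With the checks above, fw_reset becomes an optional subsystem: when the MFRL register is unsupported, mlx5_fw_reset_init() returns 0 and leaves dev->priv.fw_reset NULL, so every entry point must tolerate the NULL pointer. The pattern in isolation, sketched with hypothetical mysub/feature_supported() names:

	int mysub_init(struct mydev *dev)
	{
		struct mysub *s;

		if (!feature_supported(dev))
			return 0;		/* absent is not an error */

		s = kzalloc(sizeof(*s), GFP_KERNEL);
		if (!s)
			return -ENOMEM;
		dev->mysub = s;
		return 0;
	}

	void mysub_event(struct mydev *dev)
	{
		struct mysub *s = dev->mysub;

		if (!s)				/* subsystem not initialized */
			return;
		/* ... normal handling ... */
	}
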
index 8ff6dc9bc8033e74d20c2c5423e4a87d64d05b78..b5c709bba1553e1811767a82d07470f4648b0ca5 100644 (file)
@@ -452,10 +452,10 @@ mlx5_fw_reporter_diagnose(struct devlink_health_reporter *reporter,
        struct health_buffer __iomem *h = health->health;
        u8 synd = ioread8(&h->synd);
 
+       devlink_fmsg_u8_pair_put(fmsg, "Syndrome", synd);
        if (!synd)
                return 0;
 
-       devlink_fmsg_u8_pair_put(fmsg, "Syndrome", synd);
        devlink_fmsg_string_pair_put(fmsg, "Description", hsynd_str(synd));
 
        return 0;
index 58845121954c19db3bdc454046c161654ef37308..d77be1b4dd9c557b70ba74e3ebb37ac7994a4486 100644 (file)
@@ -783,7 +783,7 @@ static int mlx5_rdma_setup_rn(struct ib_device *ibdev, u32 port_num,
                }
 
                /* This should only be called once per mdev */
-               err = mlx5e_create_mdev_resources(mdev);
+               err = mlx5e_create_mdev_resources(mdev, false);
                if (err)
                        goto destroy_ht;
        }
index 40c7be12404168094e60d0ca5dbedbde77ea1402..58bd749b5e4de07a19320e223a0103b8ae7ded25 100644 (file)
@@ -98,7 +98,7 @@ static int create_aso_cq(struct mlx5_aso_cq *cq, void *cqc_data)
        mlx5_fill_page_frag_array(&cq->wq_ctrl.buf,
                                  (__be64 *)MLX5_ADDR_OF(create_cq_in, in, pas));
 
-       MLX5_SET(cqc,   cqc, cq_period_mode, DIM_CQ_PERIOD_MODE_START_FROM_EQE);
+       MLX5_SET(cqc,   cqc, cq_period_mode, MLX5_CQ_PERIOD_MODE_START_FROM_EQE);
        MLX5_SET(cqc,   cqc, c_eqn_or_apu_element, eqn);
        MLX5_SET(cqc,   cqc, uar_page,      mdev->priv.uar->index);
        MLX5_SET(cqc,   cqc, log_page_size, cq->wq_ctrl.buf.page_shift -
index 6f9790e97fed20821f48392732a90d12e2450a01..2ebb61ef3ea9f6a906601b41c723ba9f7834afda 100644 (file)
@@ -788,6 +788,7 @@ int mlx5dr_actions_build_ste_arr(struct mlx5dr_matcher *matcher,
                switch (action_type) {
                case DR_ACTION_TYP_DROP:
                        attr.final_icm_addr = nic_dmn->drop_icm_addr;
+                       attr.hit_gvmi = nic_dmn->drop_icm_addr >> 48;
                        break;
                case DR_ACTION_TYP_FT:
                        dest_action = action;
@@ -873,11 +874,17 @@ int mlx5dr_actions_build_ste_arr(struct mlx5dr_matcher *matcher,
                                                        action->sampler->tx_icm_addr;
                        break;
                case DR_ACTION_TYP_VPORT:
-                       attr.hit_gvmi = action->vport->caps->vhca_gvmi;
-                       dest_action = action;
-                       attr.final_icm_addr = rx_rule ?
-                               action->vport->caps->icm_address_rx :
-                               action->vport->caps->icm_address_tx;
+                       if (unlikely(rx_rule && action->vport->caps->num == MLX5_VPORT_UPLINK)) {
+                               /* Can't go to uplink on an RX rule - drop instead */
+                               attr.final_icm_addr = nic_dmn->drop_icm_addr;
+                               attr.hit_gvmi = nic_dmn->drop_icm_addr >> 48;
+                       } else {
+                               attr.hit_gvmi = action->vport->caps->vhca_gvmi;
+                               dest_action = action;
+                               attr.final_icm_addr = rx_rule ?
+                                                     action->vport->caps->icm_address_rx :
+                                                     action->vport->caps->icm_address_tx;
+                       }
                        break;
                case DR_ACTION_TYP_POP_VLAN:
                        if (!rx_rule && !(dmn->ste_ctx->actions_caps &
index 21753f32786850bd010bded5a13db6eb83fa3ade..1005bb6935b65c0d6bb2b68f71744c2857085eed 100644 (file)
@@ -440,6 +440,27 @@ out:
 }
 EXPORT_SYMBOL_GPL(mlx5_query_nic_vport_system_image_guid);
 
+int mlx5_query_nic_vport_sd_group(struct mlx5_core_dev *mdev, u8 *sd_group)
+{
+       int outlen = MLX5_ST_SZ_BYTES(query_nic_vport_context_out);
+       u32 *out;
+       int err;
+
+       out = kvzalloc(outlen, GFP_KERNEL);
+       if (!out)
+               return -ENOMEM;
+
+       err = mlx5_query_nic_vport_context(mdev, 0, out);
+       if (err)
+               goto out;
+
+       *sd_group = MLX5_GET(query_nic_vport_context_out, out,
+                            nic_vport_context.sd_group);
+out:
+       kvfree(out);
+       return err;
+}
+
 int mlx5_query_nic_vport_node_guid(struct mlx5_core_dev *mdev, u64 *node_guid)
 {
        u32 *out;
index 41fa2523d91d3bf57479dd7d66c1903786aa98b2..5f2cd9a8cf8fb39cef3d6be0806f3a6f9b7cdac7 100644 (file)
@@ -37,19 +37,24 @@ static void lan966x_lag_set_aggr_pgids(struct lan966x *lan966x)
 
        /* Now, set PGIDs for each active LAG */
        for (lag = 0; lag < lan966x->num_phys_ports; ++lag) {
-               struct net_device *bond = lan966x->ports[lag]->bond;
+               struct lan966x_port *port = lan966x->ports[lag];
                int num_active_ports = 0;
+               struct net_device *bond;
                unsigned long bond_mask;
                u8 aggr_idx[16];
 
-               if (!bond || (visited & BIT(lag)))
+               if (!port || !port->bond || (visited & BIT(lag)))
                        continue;
 
+               bond = port->bond;
                bond_mask = lan966x_lag_get_mask(lan966x, bond);
 
                for_each_set_bit(p, &bond_mask, lan966x->num_phys_ports) {
                        struct lan966x_port *port = lan966x->ports[p];
 
+                       if (!port)
+                               continue;
+
                        lan_wr(ANA_PGID_PGID_SET(bond_mask),
                               lan966x, ANA_PGID(p));
                        if (port->lag_tx_active)
index 92108d354051c31c44c64b207fb11411d0b4295b..2e83bbb9477e0693f236e83be30277d3e92df235 100644 (file)
@@ -168,9 +168,10 @@ static void lan966x_port_link_up(struct lan966x_port *port)
        lan966x_taprio_speed_set(port, config->speed);
 
        /* Also the GIGA_MODE_ENA(1) needs to be set regardless of the
-        * port speed for QSGMII ports.
+        * port speed for QSGMII or SGMII ports.
         */
-       if (phy_interface_num_ports(config->portmode) == 4)
+       if (phy_interface_num_ports(config->portmode) == 4 ||
+           config->portmode == PHY_INTERFACE_MODE_SGMII)
                mode = DEV_MAC_MODE_CFG_GIGA_MODE_ENA_SET(1);
 
        lan_wr(config->duplex | mode,
index 4af285918ea2a45cae16ca01ec1ab8fd3f5370c9..75868b3f548ec40549c02ad1ac2a3c5c388eade4 100644 (file)
@@ -347,10 +347,10 @@ int sparx5_del_mact_entry(struct sparx5 *sparx5,
                                 list) {
                if ((vid == 0 || mact_entry->vid == vid) &&
                    ether_addr_equal(addr, mact_entry->mac)) {
+                       sparx5_mact_forget(sparx5, addr, mact_entry->vid);
+
                        list_del(&mact_entry->list);
                        devm_kfree(sparx5->dev, mact_entry);
-
-                       sparx5_mact_forget(sparx5, addr, mact_entry->vid);
                }
        }
        mutex_unlock(&sparx5->mact_lock);
index d1f7fc8b1b71ab68775f40ad592ebc8aa0e57211..3c066b62e68947cf81fc34c1169cc2edd2991d5a 100644 (file)
@@ -757,6 +757,7 @@ static int mchp_sparx5_probe(struct platform_device *pdev)
        platform_set_drvdata(pdev, sparx5);
        sparx5->pdev = pdev;
        sparx5->dev = &pdev->dev;
+       spin_lock_init(&sparx5->tx_lock);
 
        /* Do switch core reset if available */
        reset = devm_reset_control_get_optional_shared(&pdev->dev, "switch");
index 6f565c0c0c3dcd3d3889abb1bf8eac72899037fc..316fed5f27355207146875ee80b3636420ca4945 100644 (file)
@@ -280,6 +280,7 @@ struct sparx5 {
        int xtr_irq;
        /* Frame DMA */
        int fdma_irq;
+       spinlock_t tx_lock; /* lock for frame transmission */
        struct sparx5_rx rx;
        struct sparx5_tx tx;
        /* PTP */
index 6db6ac6a3bbc26db972e2f611ddd7c72fac29c16..ac7e1cffbcecf0ccc4f89e394730d90ec2ada2f8 100644 (file)
@@ -244,10 +244,12 @@ netdev_tx_t sparx5_port_xmit_impl(struct sk_buff *skb, struct net_device *dev)
        }
 
        skb_tx_timestamp(skb);
+       spin_lock(&sparx5->tx_lock);
        if (sparx5->fdma_irq > 0)
                ret = sparx5_fdma_xmit(sparx5, ifh, skb);
        else
                ret = sparx5_inject(sparx5, ifh, skb, dev);
+       spin_unlock(&sparx5->tx_lock);
 
        if (ret == -EBUSY)
                goto busy;
index 2967bab72505617abcf59f0b16f5a1d5bb9d127c..15180538b80a1535a8646b407bcc1b06b632b43c 100644 (file)
@@ -1424,10 +1424,30 @@ static void nfp_nft_ct_translate_mangle_action(struct flow_action_entry *mangle_
                mangle_action->mangle.mask = (__force u32)cpu_to_be32(mangle_action->mangle.mask);
                return;
 
+       /* Both struct tcphdr and struct udphdr start with
+        *      __be16 source;
+        *      __be16 dest;
+        * so we can use the same code for both.
+        */
        case FLOW_ACT_MANGLE_HDR_TYPE_TCP:
        case FLOW_ACT_MANGLE_HDR_TYPE_UDP:
-               mangle_action->mangle.val = (__force u16)cpu_to_be16(mangle_action->mangle.val);
-               mangle_action->mangle.mask = (__force u16)cpu_to_be16(mangle_action->mangle.mask);
+               if (mangle_action->mangle.offset == offsetof(struct tcphdr, source)) {
+                       mangle_action->mangle.val =
+                               (__force u32)cpu_to_be32(mangle_action->mangle.val << 16);
+                       /* The mangle action's mask is an inverse mask,
+                        * so fill the dest port half with 0xFFFF instead
+                        * of using a rotate-left operation.
+                        */
+                       mangle_action->mangle.mask =
+                               (__force u32)cpu_to_be32(mangle_action->mangle.mask << 16 | 0xFFFF);
+               }
+               if (mangle_action->mangle.offset == offsetof(struct tcphdr, dest)) {
+                       mangle_action->mangle.offset = 0;
+                       mangle_action->mangle.val =
+                               (__force u32)cpu_to_be32(mangle_action->mangle.val);
+                       mangle_action->mangle.mask =
+                               (__force u32)cpu_to_be32(mangle_action->mangle.mask);
+               }
                return;
 
        default:
@@ -1864,10 +1884,30 @@ int nfp_fl_ct_handle_post_ct(struct nfp_flower_priv *priv,
 {
        struct flow_rule *rule = flow_cls_offload_flow_rule(flow);
        struct nfp_fl_ct_flow_entry *ct_entry;
+       struct flow_action_entry *ct_goto;
        struct nfp_fl_ct_zone_entry *zt;
+       struct flow_action_entry *act;
        bool wildcarded = false;
        struct flow_match_ct ct;
-       struct flow_action_entry *ct_goto;
+       int i;
+
+       flow_action_for_each(i, act, &rule->action) {
+               switch (act->id) {
+               case FLOW_ACTION_REDIRECT:
+               case FLOW_ACTION_REDIRECT_INGRESS:
+               case FLOW_ACTION_MIRRED:
+               case FLOW_ACTION_MIRRED_INGRESS:
+                       if (act->dev->rtnl_link_ops &&
+                           !strcmp(act->dev->rtnl_link_ops->kind, "openvswitch")) {
+                               NL_SET_ERR_MSG_MOD(extack,
+                                                  "unsupported offload: out port is openvswitch internal port");
+                               return -EOPNOTSUPP;
+                       }
+                       break;
+               default:
+                       break;
+               }
+       }
 
        flow_rule_match_ct(rule, &ct);
        if (!ct.mask->ct_zone) {
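
In the mangle translation above, a 16-bit L4 port rewrite is widened to the 32-bit big-endian word at offset 0 that covers both ports: a source-port value is shifted into the high half, and because the mangle mask is an inverse mask, the dest-port half is filled with 0xFFFF so it stays untouched. The source-port transformation in isolation, as an illustrative fragment mirroring the hunk (not driver code):

	/* Illustrative only: widen a 16-bit source-port mangle to the
	 * 32-bit word holding { source, dest }.  offsetof(struct tcphdr,
	 * source) is 0, so the offset already points at the word.
	 */
	static void widen_sport_mangle(struct flow_action_entry *m)
	{
		m->mangle.val  = (__force u32)cpu_to_be32(m->mangle.val << 16);
		/* Inverse mask: 0xFFFF in the dest half means "leave it alone". */
		m->mangle.mask = (__force u32)cpu_to_be32(m->mangle.mask << 16 | 0xFFFF);
	}
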
index e522845c7c211619a252bb995dec65160d7a1ae5..0d7d138d6e0d7e4f468f66683707cd22d750b64a 100644 (file)
@@ -1084,7 +1084,7 @@ nfp_tunnel_add_shared_mac(struct nfp_app *app, struct net_device *netdev,
        u16 nfp_mac_idx = 0;
 
        entry = nfp_tunnel_lookup_offloaded_macs(app, netdev->dev_addr);
-       if (entry && nfp_tunnel_is_mac_idx_global(entry->index)) {
+       if (entry && (nfp_tunnel_is_mac_idx_global(entry->index) || netif_is_lag_port(netdev))) {
                if (entry->bridge_count ||
                    !nfp_flower_is_supported_bridge(netdev)) {
                        nfp_tunnel_offloaded_macs_inc_ref_and_link(entry,
index 3b3210d823e8038704391e085b9d7951031dd309..f28e769e6fdadab091d447f3de4cb8df1d2b4d3e 100644 (file)
@@ -2776,6 +2776,7 @@ static void nfp_net_netdev_init(struct nfp_net *nn)
        case NFP_NFD_VER_NFD3:
                netdev->netdev_ops = &nfp_nfd3_netdev_ops;
                netdev->xdp_features |= NETDEV_XDP_ACT_XSK_ZEROCOPY;
+               netdev->xdp_features |= NETDEV_XDP_ACT_REDIRECT;
                break;
        case NFP_NFD_VER_NFDK:
                netdev->netdev_ops = &nfp_nfdk_netdev_ops;
index 33b4c28563162eeab3938da414b32cfd480c13d7..3f10c5365c80ebb2fe079b779fee644a46ed33da 100644 (file)
@@ -537,11 +537,13 @@ static int enable_bars(struct nfp6000_pcie *nfp, u16 interface)
        const u32 barcfg_msix_general =
                NFP_PCIE_BAR_PCIE2CPP_MapType(
                        NFP_PCIE_BAR_PCIE2CPP_MapType_GENERAL) |
-               NFP_PCIE_BAR_PCIE2CPP_LengthSelect_32BIT;
+               NFP_PCIE_BAR_PCIE2CPP_LengthSelect(
+                       NFP_PCIE_BAR_PCIE2CPP_LengthSelect_32BIT);
        const u32 barcfg_msix_xpb =
                NFP_PCIE_BAR_PCIE2CPP_MapType(
                        NFP_PCIE_BAR_PCIE2CPP_MapType_BULK) |
-               NFP_PCIE_BAR_PCIE2CPP_LengthSelect_32BIT |
+               NFP_PCIE_BAR_PCIE2CPP_LengthSelect(
+                       NFP_PCIE_BAR_PCIE2CPP_LengthSelect_32BIT) |
                NFP_PCIE_BAR_PCIE2CPP_Target_BaseAddress(
                        NFP_CPP_TARGET_ISLAND_XPB);
        const u32 barcfg_explicit[4] = {
index c49aa358e42444de33b3a3dd08832bf8f56af394..6ba8d4aca0a038b88e7f3ae3a8299ea0a55bd7e0 100644 (file)
@@ -93,6 +93,7 @@ static void ionic_unmap_bars(struct ionic *ionic)
                        bars[i].len = 0;
                }
        }
+       ionic->num_bars = 0;
 }
 
 void __iomem *ionic_bus_map_dbpage(struct ionic *ionic, int page_num)
@@ -215,15 +216,17 @@ out:
 
 static void ionic_clear_pci(struct ionic *ionic)
 {
-       ionic->idev.dev_info_regs = NULL;
-       ionic->idev.dev_cmd_regs = NULL;
-       ionic->idev.intr_status = NULL;
-       ionic->idev.intr_ctrl = NULL;
-
-       ionic_unmap_bars(ionic);
-       pci_release_regions(ionic->pdev);
+       if (ionic->num_bars) {
+               ionic->idev.dev_info_regs = NULL;
+               ionic->idev.dev_cmd_regs = NULL;
+               ionic->idev.intr_status = NULL;
+               ionic->idev.intr_ctrl = NULL;
+
+               ionic_unmap_bars(ionic);
+               pci_release_regions(ionic->pdev);
+       }
 
-       if (atomic_read(&ionic->pdev->enable_cnt) > 0)
+       if (pci_is_enabled(ionic->pdev))
                pci_disable_device(ionic->pdev);
 }
 
index 1e7c71f7f081b159e83271eeeb47eb35ac401d69..746072b4dbd0e0d37352bc771aa0c7e963eaa26f 100644 (file)
@@ -319,22 +319,32 @@ do_check_time:
 
 u8 ionic_dev_cmd_status(struct ionic_dev *idev)
 {
+       if (!idev->dev_cmd_regs)
+               return (u8)PCI_ERROR_RESPONSE;
        return ioread8(&idev->dev_cmd_regs->comp.comp.status);
 }
 
 bool ionic_dev_cmd_done(struct ionic_dev *idev)
 {
+       if (!idev->dev_cmd_regs)
+               return false;
        return ioread32(&idev->dev_cmd_regs->done) & IONIC_DEV_CMD_DONE;
 }
 
 void ionic_dev_cmd_comp(struct ionic_dev *idev, union ionic_dev_cmd_comp *comp)
 {
+       if (!idev->dev_cmd_regs)
+               return;
        memcpy_fromio(comp, &idev->dev_cmd_regs->comp, sizeof(*comp));
 }
 
 void ionic_dev_cmd_go(struct ionic_dev *idev, union ionic_dev_cmd *cmd)
 {
        idev->opcode = cmd->cmd.opcode;
+
+       if (!idev->dev_cmd_regs)
+               return;
+
        memcpy_toio(&idev->dev_cmd_regs->cmd, cmd, sizeof(*cmd));
        iowrite32(0, &idev->dev_cmd_regs->done);
        iowrite32(1, &idev->dev_cmd_regs->doorbell);
index cd3c0b01402e64360c9104a069f0a9bd5b23b65f..0ffc9c4904ac80320cc9c26f51ea6e52abf60784 100644 (file)
@@ -90,18 +90,23 @@ static void ionic_get_regs(struct net_device *netdev, struct ethtool_regs *regs,
                           void *p)
 {
        struct ionic_lif *lif = netdev_priv(netdev);
+       struct ionic_dev *idev;
        unsigned int offset;
        unsigned int size;
 
        regs->version = IONIC_DEV_CMD_REG_VERSION;
 
+       idev = &lif->ionic->idev;
+       if (!idev->dev_info_regs)
+               return;
+
        offset = 0;
        size = IONIC_DEV_INFO_REG_COUNT * sizeof(u32);
        memcpy_fromio(p + offset, lif->ionic->idev.dev_info_regs->words, size);
 
        offset += size;
        size = IONIC_DEV_CMD_REG_COUNT * sizeof(u32);
-       memcpy_fromio(p + offset, lif->ionic->idev.dev_cmd_regs->words, size);
+       memcpy_fromio(p + offset, idev->dev_cmd_regs->words, size);
 }
 
 static void ionic_get_link_ext_stats(struct net_device *netdev,
index 5f40324cd243fe2f2f79b924920951304d25df45..3c209c1a23373339b8455387105128f2dd9057be 100644 (file)
@@ -109,6 +109,11 @@ int ionic_firmware_update(struct ionic_lif *lif, const struct firmware *fw,
        dl = priv_to_devlink(ionic);
        devlink_flash_update_status_notify(dl, "Preparing to flash", NULL, 0, 0);
 
+       if (!idev->dev_cmd_regs) {
+               err = -ENXIO;
+               goto err_out;
+       }
+
        buf_sz = sizeof(idev->dev_cmd_regs->data);
 
        netdev_dbg(netdev,
index cf2d5ad7b68cc85195e516697d82238c6a7f5924..fcb44ceeb6aa51d944a12b411d904f2715a43be7 100644 (file)
@@ -3559,7 +3559,10 @@ int ionic_lif_init(struct ionic_lif *lif)
                        goto err_out_notifyq_deinit;
        }
 
-       err = ionic_init_nic_features(lif);
+       if (test_bit(IONIC_LIF_F_FW_RESET, lif->state))
+               err = ionic_set_nic_features(lif, lif->netdev->features);
+       else
+               err = ionic_init_nic_features(lif);
        if (err)
                goto err_out_notifyq_deinit;
 
index 165ab08ad2dda8ea15cca7aba88f586b0010c3f2..2f479de329fec5ef039c5e4ebaa3ea79d88a04a5 100644 (file)
@@ -416,6 +416,9 @@ static void ionic_dev_cmd_clean(struct ionic *ionic)
 {
        struct ionic_dev *idev = &ionic->idev;
 
+       if (!idev->dev_cmd_regs)
+               return;
+
        iowrite32(0, &idev->dev_cmd_regs->doorbell);
        memset_io(&idev->dev_cmd_regs->cmd, 0, sizeof(idev->dev_cmd_regs->cmd));
 }
index 54cd96b035d680a61297723b46adc16bf10ab3fa..6f47767598637ed3d2a961f6815cc183f066067b 100644 (file)
@@ -579,6 +579,9 @@ int ionic_tx_napi(struct napi_struct *napi, int budget)
        work_done = ionic_cq_service(cq, budget,
                                     ionic_tx_service, NULL, NULL);
 
+       if (unlikely(!budget))
+               return budget;
+
        if (work_done < budget && napi_complete_done(napi, work_done)) {
                ionic_dim_update(qcq, IONIC_LIF_F_TX_DIM_INTR);
                flags |= IONIC_INTR_CRED_UNMASK;
@@ -607,6 +610,9 @@ int ionic_rx_napi(struct napi_struct *napi, int budget)
        u32 work_done = 0;
        u32 flags = 0;
 
+       if (unlikely(!budget))
+               return budget;
+
        lif = cq->bound_q->lif;
        idev = &lif->ionic->idev;
 
@@ -656,6 +662,9 @@ int ionic_txrx_napi(struct napi_struct *napi, int budget)
        tx_work_done = ionic_cq_service(txcq, IONIC_TX_BUDGET_DEFAULT,
                                        ionic_tx_service, NULL, NULL);
 
+       if (unlikely(!budget))
+               return budget;
+
        rx_work_done = ionic_cq_service(rxcq, budget,
                                        ionic_rx_service, NULL, NULL);
 
index 0e3731f50fc2873dc3c4c06c16ffe1f4a8707e83..f7566cfa45ca37a3cfd02331c24f49bf576393a7 100644 (file)
@@ -772,29 +772,25 @@ static bool ravb_rx_gbeth(struct net_device *ndev, int *quota, int q)
        struct ravb_rx_desc *desc;
        struct sk_buff *skb;
        dma_addr_t dma_addr;
+       int rx_packets = 0;
        u8  desc_status;
-       int boguscnt;
        u16 pkt_len;
        u8  die_dt;
        int entry;
        int limit;
+       int i;
 
        entry = priv->cur_rx[q] % priv->num_rx_ring[q];
-       boguscnt = priv->dirty_rx[q] + priv->num_rx_ring[q] - priv->cur_rx[q];
+       limit = priv->dirty_rx[q] + priv->num_rx_ring[q] - priv->cur_rx[q];
        stats = &priv->stats[q];
 
-       boguscnt = min(boguscnt, *quota);
-       limit = boguscnt;
        desc = &priv->gbeth_rx_ring[entry];
-       while (desc->die_dt != DT_FEMPTY) {
+       for (i = 0; i < limit && rx_packets < *quota && desc->die_dt != DT_FEMPTY; i++) {
                /* Descriptor type must be checked before all other reads */
                dma_rmb();
                desc_status = desc->msc;
                pkt_len = le16_to_cpu(desc->ds_cc) & RX_DS;
 
-               if (--boguscnt < 0)
-                       break;
-
                /* We use 0-byte descriptors to mark the DMA mapping errors */
                if (!pkt_len)
                        continue;
@@ -820,7 +816,7 @@ static bool ravb_rx_gbeth(struct net_device *ndev, int *quota, int q)
                                skb_put(skb, pkt_len);
                                skb->protocol = eth_type_trans(skb, ndev);
                                napi_gro_receive(&priv->napi[q], skb);
-                               stats->rx_packets++;
+                               rx_packets++;
                                stats->rx_bytes += pkt_len;
                                break;
                        case DT_FSTART:
@@ -848,7 +844,7 @@ static bool ravb_rx_gbeth(struct net_device *ndev, int *quota, int q)
                                        eth_type_trans(priv->rx_1st_skb, ndev);
                                napi_gro_receive(&priv->napi[q],
                                                 priv->rx_1st_skb);
-                               stats->rx_packets++;
+                               rx_packets++;
                                stats->rx_bytes += pkt_len;
                                break;
                        }
@@ -887,9 +883,9 @@ static bool ravb_rx_gbeth(struct net_device *ndev, int *quota, int q)
                desc->die_dt = DT_FEMPTY;
        }
 
-       *quota -= limit - (++boguscnt);
-
-       return boguscnt <= 0;
+       stats->rx_packets += rx_packets;
+       *quota -= rx_packets;
+       return *quota == 0;
 }
 
 /* Packet receive function for Ethernet AVB */
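
The ravb_rx_gbeth() rewrite above drops the boguscnt arithmetic in favor of the plain NAPI contract: consume at most *quota packets, account exactly what was consumed, and report whether the budget was exhausted. For reference, the canonical poll shape that contract serves, sketched with a hypothetical rx_one() per-packet helper:

	static int my_poll(struct napi_struct *napi, int budget)
	{
		int work_done = 0;

		while (work_done < budget && rx_one(napi))
			work_done++;		/* one packet per iteration */

		/* Only a poll that stayed under budget may complete NAPI
		 * and re-enable the interrupt.
		 */
		if (work_done < budget && napi_complete_done(napi, work_done))
			my_enable_rx_irq(napi);

		return work_done;
	}
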
index 721c1f8e892fc56ed1e9144619aa32ac676226b1..5ba606a596e779bc17081c55e5bf5c52555ada0b 100644 (file)
 #undef FRAME_FILTER_DEBUG
 /* #define FRAME_FILTER_DEBUG */
 
+struct stmmac_q_tx_stats {
+       u64_stats_t tx_bytes;
+       u64_stats_t tx_set_ic_bit;
+       u64_stats_t tx_tso_frames;
+       u64_stats_t tx_tso_nfrags;
+};
+
+struct stmmac_napi_tx_stats {
+       u64_stats_t tx_packets;
+       u64_stats_t tx_pkt_n;
+       u64_stats_t poll;
+       u64_stats_t tx_clean;
+       u64_stats_t tx_set_ic_bit;
+};
+
 struct stmmac_txq_stats {
-       u64 tx_bytes;
-       u64 tx_packets;
-       u64 tx_pkt_n;
-       u64 tx_normal_irq_n;
-       u64 napi_poll;
-       u64 tx_clean;
-       u64 tx_set_ic_bit;
-       u64 tx_tso_frames;
-       u64 tx_tso_nfrags;
-       struct u64_stats_sync syncp;
+       /* Updates protected by tx queue lock. */
+       struct u64_stats_sync q_syncp;
+       struct stmmac_q_tx_stats q;
+
+       /* Updates protected by NAPI poll logic. */
+       struct u64_stats_sync napi_syncp;
+       struct stmmac_napi_tx_stats napi;
 } ____cacheline_aligned_in_smp;
 
+struct stmmac_napi_rx_stats {
+       u64_stats_t rx_bytes;
+       u64_stats_t rx_packets;
+       u64_stats_t rx_pkt_n;
+       u64_stats_t poll;
+};
+
 struct stmmac_rxq_stats {
-       u64 rx_bytes;
-       u64 rx_packets;
-       u64 rx_pkt_n;
-       u64 rx_normal_irq_n;
-       u64 napi_poll;
-       struct u64_stats_sync syncp;
+       /* Updates protected by NAPI poll logic. */
+       struct u64_stats_sync napi_syncp;
+       struct stmmac_napi_rx_stats napi;
 } ____cacheline_aligned_in_smp;
 
+/* Per-CPU updates; safe because irqs are not allowed to nest. */
+struct stmmac_pcpu_stats {
+       struct u64_stats_sync syncp;
+       u64_stats_t rx_normal_irq_n[MTL_MAX_TX_QUEUES];
+       u64_stats_t tx_normal_irq_n[MTL_MAX_RX_QUEUES];
+};
+
 /* Extra statistic and debug information exposed by ethtool */
 struct stmmac_extra_stats {
        /* Transmit errors */
@@ -205,6 +228,7 @@ struct stmmac_extra_stats {
        /* per queue statistics */
        struct stmmac_txq_stats txq_stats[MTL_MAX_TX_QUEUES];
        struct stmmac_rxq_stats rxq_stats[MTL_MAX_RX_QUEUES];
+       struct stmmac_pcpu_stats __percpu *pcpu_stats;
        unsigned long rx_dropped;
        unsigned long rx_errors;
        unsigned long tx_dropped;
@@ -216,6 +240,7 @@ struct stmmac_safety_stats {
        unsigned long mac_errors[32];
        unsigned long mtl_errors[32];
        unsigned long dma_errors[32];
+       unsigned long dma_dpp_errors[32];
 };
 
 /* Number of fields in Safety Stats */
index 8f730ada71f91d70b5c1d2707601b927f20aeb79..6b65420e11b5c518251565ca94bfb4a849068436 100644 (file)
@@ -353,6 +353,10 @@ static int imx_dwmac_probe(struct platform_device *pdev)
        if (data->flags & STMMAC_FLAG_HWTSTAMP_CORRECT_LATENCY)
                plat_dat->flags |= STMMAC_FLAG_HWTSTAMP_CORRECT_LATENCY;
 
+       /* By default TX Q0 is used for TSO; enable TBS on the remaining TX queues */
+       for (int i = 1; i < plat_dat->tx_queues_to_use; i++)
+               plat_dat->tx_queues_cfg[i].tbs_en = 1;
+
        plat_dat->host_dma_width = dwmac->ops->addr_width;
        plat_dat->init = imx_dwmac_init;
        plat_dat->exit = imx_dwmac_exit;
index 137741b94122e5e99320eea5cad9909e6394dc7d..b21d99faa2d04c985427af61724dd073e3a2fe79 100644 (file)
@@ -441,8 +441,7 @@ static int sun8i_dwmac_dma_interrupt(struct stmmac_priv *priv,
                                     struct stmmac_extra_stats *x, u32 chan,
                                     u32 dir)
 {
-       struct stmmac_rxq_stats *rxq_stats = &priv->xstats.rxq_stats[chan];
-       struct stmmac_txq_stats *txq_stats = &priv->xstats.txq_stats[chan];
+       struct stmmac_pcpu_stats *stats = this_cpu_ptr(priv->xstats.pcpu_stats);
        int ret = 0;
        u32 v;
 
@@ -455,9 +454,9 @@ static int sun8i_dwmac_dma_interrupt(struct stmmac_priv *priv,
 
        if (v & EMAC_TX_INT) {
                ret |= handle_tx;
-               u64_stats_update_begin(&txq_stats->syncp);
-               txq_stats->tx_normal_irq_n++;
-               u64_stats_update_end(&txq_stats->syncp);
+               u64_stats_update_begin(&stats->syncp);
+               u64_stats_inc(&stats->tx_normal_irq_n[chan]);
+               u64_stats_update_end(&stats->syncp);
        }
 
        if (v & EMAC_TX_DMA_STOP_INT)
@@ -479,9 +478,9 @@ static int sun8i_dwmac_dma_interrupt(struct stmmac_priv *priv,
 
        if (v & EMAC_RX_INT) {
                ret |= handle_rx;
-               u64_stats_update_begin(&rxq_stats->syncp);
-               rxq_stats->rx_normal_irq_n++;
-               u64_stats_update_end(&rxq_stats->syncp);
+               u64_stats_update_begin(&stats->syncp);
+               u64_stats_inc(&stats->rx_normal_irq_n[chan]);
+               u64_stats_update_end(&stats->syncp);
        }
 
        if (v & EMAC_RX_BUF_UA_INT)
index 9470d3fd2dede2bb436c05f6a92d87824c2db733..0d185e54eb7e24cfd4ef8de38e976aabd3ee9084 100644 (file)
@@ -171,8 +171,7 @@ int dwmac4_dma_interrupt(struct stmmac_priv *priv, void __iomem *ioaddr,
        const struct dwmac4_addrs *dwmac4_addrs = priv->plat->dwmac4_addrs;
        u32 intr_status = readl(ioaddr + DMA_CHAN_STATUS(dwmac4_addrs, chan));
        u32 intr_en = readl(ioaddr + DMA_CHAN_INTR_ENA(dwmac4_addrs, chan));
-       struct stmmac_rxq_stats *rxq_stats = &priv->xstats.rxq_stats[chan];
-       struct stmmac_txq_stats *txq_stats = &priv->xstats.txq_stats[chan];
+       struct stmmac_pcpu_stats *stats = this_cpu_ptr(priv->xstats.pcpu_stats);
        int ret = 0;
 
        if (dir == DMA_DIR_RX)
@@ -201,15 +200,15 @@ int dwmac4_dma_interrupt(struct stmmac_priv *priv, void __iomem *ioaddr,
        }
        /* TX/RX NORMAL interrupts */
        if (likely(intr_status & DMA_CHAN_STATUS_RI)) {
-               u64_stats_update_begin(&rxq_stats->syncp);
-               rxq_stats->rx_normal_irq_n++;
-               u64_stats_update_end(&rxq_stats->syncp);
+               u64_stats_update_begin(&stats->syncp);
+               u64_stats_inc(&stats->rx_normal_irq_n[chan]);
+               u64_stats_update_end(&stats->syncp);
                ret |= handle_rx;
        }
        if (likely(intr_status & DMA_CHAN_STATUS_TI)) {
-               u64_stats_update_begin(&txq_stats->syncp);
-               txq_stats->tx_normal_irq_n++;
-               u64_stats_update_end(&txq_stats->syncp);
+               u64_stats_update_begin(&stats->syncp);
+               u64_stats_inc(&stats->tx_normal_irq_n[chan]);
+               u64_stats_update_end(&stats->syncp);
                ret |= handle_tx;
        }
 
index 7907d62d343759d661e00452198ef8e6cfef3601..85e18f9a22f92091bb98f1892d7bb1f5f08bcf2a 100644 (file)
@@ -162,8 +162,7 @@ static void show_rx_process_state(unsigned int status)
 int dwmac_dma_interrupt(struct stmmac_priv *priv, void __iomem *ioaddr,
                        struct stmmac_extra_stats *x, u32 chan, u32 dir)
 {
-       struct stmmac_rxq_stats *rxq_stats = &priv->xstats.rxq_stats[chan];
-       struct stmmac_txq_stats *txq_stats = &priv->xstats.txq_stats[chan];
+       struct stmmac_pcpu_stats *stats = this_cpu_ptr(priv->xstats.pcpu_stats);
        int ret = 0;
        /* read the status register (CSR5) */
        u32 intr_status = readl(ioaddr + DMA_STATUS);
@@ -215,16 +214,16 @@ int dwmac_dma_interrupt(struct stmmac_priv *priv, void __iomem *ioaddr,
                        u32 value = readl(ioaddr + DMA_INTR_ENA);
                        /* to schedule NAPI on real RIE event. */
                        if (likely(value & DMA_INTR_ENA_RIE)) {
-                               u64_stats_update_begin(&rxq_stats->syncp);
-                               rxq_stats->rx_normal_irq_n++;
-                               u64_stats_update_end(&rxq_stats->syncp);
+                               u64_stats_update_begin(&stats->syncp);
+                               u64_stats_inc(&stats->rx_normal_irq_n[chan]);
+                               u64_stats_update_end(&stats->syncp);
                                ret |= handle_rx;
                        }
                }
                if (likely(intr_status & DMA_STATUS_TI)) {
-                       u64_stats_update_begin(&txq_stats->syncp);
-                       txq_stats->tx_normal_irq_n++;
-                       u64_stats_update_end(&txq_stats->syncp);
+                       u64_stats_update_begin(&stats->syncp);
+                       u64_stats_inc(&stats->tx_normal_irq_n[chan]);
+                       u64_stats_update_end(&stats->syncp);
                        ret |= handle_tx;
                }
                if (unlikely(intr_status & DMA_STATUS_ERI))
index 207ff1799f2c712fe1e10033fa4d2b32dbd197c6..6a2c7d22df1eb81dd216e00e7525bc8a9092c048 100644 (file)
 #define XGMAC_RXCEIE                   BIT(4)
 #define XGMAC_TXCEIE                   BIT(0)
 #define XGMAC_MTL_ECC_INT_STATUS       0x000010cc
+#define XGMAC_MTL_DPP_CONTROL          0x000010e0
+#define XGMAC_DPP_DISABLE              BIT(0)
 #define XGMAC_MTL_TXQ_OPMODE(x)                (0x00001100 + (0x80 * (x)))
 #define XGMAC_TQS                      GENMASK(25, 16)
 #define XGMAC_TQS_SHIFT                        16
 #define XGMAC_DCEIE                    BIT(1)
 #define XGMAC_TCEIE                    BIT(0)
 #define XGMAC_DMA_ECC_INT_STATUS       0x0000306c
+#define XGMAC_DMA_DPP_INT_STATUS       0x00003074
 #define XGMAC_DMA_CH_CONTROL(x)                (0x00003100 + (0x80 * (x)))
 #define XGMAC_SPH                      BIT(24)
 #define XGMAC_PBLx8                    BIT(16)
index eb48211d9b0eb7013b436b0336c75f512fbf638a..1af2f89a0504ab4c7ad6042e52f5898ba064df6c 100644 (file)
@@ -830,6 +830,44 @@ static const struct dwxgmac3_error_desc dwxgmac3_dma_errors[32]= {
        { false, "UNKNOWN", "Unknown Error" }, /* 31 */
 };
 
+#define DPP_RX_ERR "Read Rx Descriptor Parity checker Error"
+#define DPP_TX_ERR "Read Tx Descriptor Parity checker Error"
+
+static const struct dwxgmac3_error_desc dwxgmac3_dma_dpp_errors[32] = {
+       { true, "TDPES0", DPP_TX_ERR },
+       { true, "TDPES1", DPP_TX_ERR },
+       { true, "TDPES2", DPP_TX_ERR },
+       { true, "TDPES3", DPP_TX_ERR },
+       { true, "TDPES4", DPP_TX_ERR },
+       { true, "TDPES5", DPP_TX_ERR },
+       { true, "TDPES6", DPP_TX_ERR },
+       { true, "TDPES7", DPP_TX_ERR },
+       { true, "TDPES8", DPP_TX_ERR },
+       { true, "TDPES9", DPP_TX_ERR },
+       { true, "TDPES10", DPP_TX_ERR },
+       { true, "TDPES11", DPP_TX_ERR },
+       { true, "TDPES12", DPP_TX_ERR },
+       { true, "TDPES13", DPP_TX_ERR },
+       { true, "TDPES14", DPP_TX_ERR },
+       { true, "TDPES15", DPP_TX_ERR },
+       { true, "RDPES0", DPP_RX_ERR },
+       { true, "RDPES1", DPP_RX_ERR },
+       { true, "RDPES2", DPP_RX_ERR },
+       { true, "RDPES3", DPP_RX_ERR },
+       { true, "RDPES4", DPP_RX_ERR },
+       { true, "RDPES5", DPP_RX_ERR },
+       { true, "RDPES6", DPP_RX_ERR },
+       { true, "RDPES7", DPP_RX_ERR },
+       { true, "RDPES8", DPP_RX_ERR },
+       { true, "RDPES9", DPP_RX_ERR },
+       { true, "RDPES10", DPP_RX_ERR },
+       { true, "RDPES11", DPP_RX_ERR },
+       { true, "RDPES12", DPP_RX_ERR },
+       { true, "RDPES13", DPP_RX_ERR },
+       { true, "RDPES14", DPP_RX_ERR },
+       { true, "RDPES15", DPP_RX_ERR },
+};
+
 static void dwxgmac3_handle_dma_err(struct net_device *ndev,
                                    void __iomem *ioaddr, bool correctable,
                                    struct stmmac_safety_stats *stats)
@@ -841,6 +879,13 @@ static void dwxgmac3_handle_dma_err(struct net_device *ndev,
 
        dwxgmac3_log_error(ndev, value, correctable, "DMA",
                           dwxgmac3_dma_errors, STAT_OFF(dma_errors), stats);
+
+       value = readl(ioaddr + XGMAC_DMA_DPP_INT_STATUS);
+       writel(value, ioaddr + XGMAC_DMA_DPP_INT_STATUS);
+
+       dwxgmac3_log_error(ndev, value, false, "DMA_DPP",
+                          dwxgmac3_dma_dpp_errors,
+                          STAT_OFF(dma_dpp_errors), stats);
 }
 
 static int
@@ -881,6 +926,12 @@ dwxgmac3_safety_feat_config(void __iomem *ioaddr, unsigned int asp,
        value |= XGMAC_TMOUTEN; /* FSM Timeout Feature */
        writel(value, ioaddr + XGMAC_MAC_FSM_CONTROL);
 
+       /* 5. Enable Data Path Parity Protection */
+       value = readl(ioaddr + XGMAC_MTL_DPP_CONTROL);
+       /* Already enabled by default, but explicitly enable it again */
+       value &= ~XGMAC_DPP_DISABLE;
+       writel(value, ioaddr + XGMAC_MTL_DPP_CONTROL);
+
        return 0;
 }
 
@@ -914,7 +965,11 @@ static int dwxgmac3_safety_feat_irq_status(struct net_device *ndev,
                ret |= !corr;
        }
 
-       err = dma & (XGMAC_DEUIS | XGMAC_DECIS);
+       /* DMA_DPP_Interrupt_Status is indicated by the MCSIS bit in
+        * DMA_Safety_Interrupt_Status, so handle DMA Data Path Parity
+        * Errors here.
+        */
+       err = dma & (XGMAC_DEUIS | XGMAC_DECIS | XGMAC_MCSIS);
        corr = dma & XGMAC_DECIS;
        if (err) {
                dwxgmac3_handle_dma_err(ndev, ioaddr, corr, stats);
@@ -930,6 +985,7 @@ static const struct dwxgmac3_error {
        { dwxgmac3_mac_errors },
        { dwxgmac3_mtl_errors },
        { dwxgmac3_dma_errors },
+       { dwxgmac3_dma_dpp_errors },
 };
 
 static int dwxgmac3_safety_feat_dump(struct stmmac_safety_stats *stats,
index 3cde695fec91bd7592e23e725517f0cccee08a42..dd2ab6185c40e813ee4401857875d3e8478303e7 100644 (file)
@@ -337,8 +337,7 @@ static int dwxgmac2_dma_interrupt(struct stmmac_priv *priv,
                                  struct stmmac_extra_stats *x, u32 chan,
                                  u32 dir)
 {
-       struct stmmac_rxq_stats *rxq_stats = &priv->xstats.rxq_stats[chan];
-       struct stmmac_txq_stats *txq_stats = &priv->xstats.txq_stats[chan];
+       struct stmmac_pcpu_stats *stats = this_cpu_ptr(priv->xstats.pcpu_stats);
        u32 intr_status = readl(ioaddr + XGMAC_DMA_CH_STATUS(chan));
        u32 intr_en = readl(ioaddr + XGMAC_DMA_CH_INT_EN(chan));
        int ret = 0;
@@ -367,15 +366,15 @@ static int dwxgmac2_dma_interrupt(struct stmmac_priv *priv,
        /* TX/RX NORMAL interrupts */
        if (likely(intr_status & XGMAC_NIS)) {
                if (likely(intr_status & XGMAC_RI)) {
-                       u64_stats_update_begin(&rxq_stats->syncp);
-                       rxq_stats->rx_normal_irq_n++;
-                       u64_stats_update_end(&rxq_stats->syncp);
+                       u64_stats_update_begin(&stats->syncp);
+                       u64_stats_inc(&stats->rx_normal_irq_n[chan]);
+                       u64_stats_update_end(&stats->syncp);
                        ret |= handle_rx;
                }
                if (likely(intr_status & (XGMAC_TI | XGMAC_TBU))) {
-                       u64_stats_update_begin(&txq_stats->syncp);
-                       txq_stats->tx_normal_irq_n++;
-                       u64_stats_update_end(&txq_stats->syncp);
+                       u64_stats_update_begin(&stats->syncp);
+                       u64_stats_inc(&stats->tx_normal_irq_n[chan]);
+                       u64_stats_update_end(&stats->syncp);
                        ret |= handle_tx;
                }
        }
index 1bd34b2a47e81494eeddf72814b585e47d0b8c60..29367105df548271d3aa22cfad80a40dece256c1 100644 (file)
@@ -224,7 +224,7 @@ static const struct stmmac_hwif_entry {
                .regs = {
                        .ptp_off = PTP_GMAC4_OFFSET,
                        .mmc_off = MMC_GMAC4_OFFSET,
-                       .est_off = EST_XGMAC_OFFSET,
+                       .est_off = EST_GMAC4_OFFSET,
                },
                .desc = &dwmac4_desc_ops,
                .dma = &dwmac410_dma_ops,
index 42d27b97dd1d036e1410131060b65220b0ab2180..ec44becf0e2d289c4f6aeab983c54e93d70faf75 100644 (file)
@@ -549,44 +549,79 @@ stmmac_set_pauseparam(struct net_device *netdev,
        }
 }
 
+static u64 stmmac_get_rx_normal_irq_n(struct stmmac_priv *priv, int q)
+{
+       u64 total;
+       int cpu;
+
+       total = 0;
+       for_each_possible_cpu(cpu) {
+               struct stmmac_pcpu_stats *pcpu;
+               unsigned int start;
+               u64 irq_n;
+
+               pcpu = per_cpu_ptr(priv->xstats.pcpu_stats, cpu);
+               do {
+                       start = u64_stats_fetch_begin(&pcpu->syncp);
+                       irq_n = u64_stats_read(&pcpu->rx_normal_irq_n[q]);
+               } while (u64_stats_fetch_retry(&pcpu->syncp, start));
+               total += irq_n;
+       }
+       return total;
+}
+
+static u64 stmmac_get_tx_normal_irq_n(struct stmmac_priv *priv, int q)
+{
+       u64 total;
+       int cpu;
+
+       total = 0;
+       for_each_possible_cpu(cpu) {
+               struct stmmac_pcpu_stats *pcpu;
+               unsigned int start;
+               u64 irq_n;
+
+               pcpu = per_cpu_ptr(priv->xstats.pcpu_stats, cpu);
+               do {
+                       start = u64_stats_fetch_begin(&pcpu->syncp);
+                       irq_n = u64_stats_read(&pcpu->tx_normal_irq_n[q]);
+               } while (u64_stats_fetch_retry(&pcpu->syncp, start));
+               total += irq_n;
+       }
+       return total;
+}
+
 static void stmmac_get_per_qstats(struct stmmac_priv *priv, u64 *data)
 {
        u32 tx_cnt = priv->plat->tx_queues_to_use;
        u32 rx_cnt = priv->plat->rx_queues_to_use;
        unsigned int start;
-       int q, stat;
-       char *p;
+       int q;
 
        for (q = 0; q < tx_cnt; q++) {
                struct stmmac_txq_stats *txq_stats = &priv->xstats.txq_stats[q];
-               struct stmmac_txq_stats snapshot;
+               u64 pkt_n;
 
                do {
-                       start = u64_stats_fetch_begin(&txq_stats->syncp);
-                       snapshot = *txq_stats;
-               } while (u64_stats_fetch_retry(&txq_stats->syncp, start));
+                       start = u64_stats_fetch_begin(&txq_stats->napi_syncp);
+                       pkt_n = u64_stats_read(&txq_stats->napi.tx_pkt_n);
+               } while (u64_stats_fetch_retry(&txq_stats->napi_syncp, start));
 
-               p = (char *)&snapshot + offsetof(struct stmmac_txq_stats, tx_pkt_n);
-               for (stat = 0; stat < STMMAC_TXQ_STATS; stat++) {
-                       *data++ = (*(u64 *)p);
-                       p += sizeof(u64);
-               }
+               *data++ = pkt_n;
+               *data++ = stmmac_get_tx_normal_irq_n(priv, q);
        }
 
        for (q = 0; q < rx_cnt; q++) {
                struct stmmac_rxq_stats *rxq_stats = &priv->xstats.rxq_stats[q];
-               struct stmmac_rxq_stats snapshot;
+               u64 pkt_n;
 
                do {
-                       start = u64_stats_fetch_begin(&rxq_stats->syncp);
-                       snapshot = *rxq_stats;
-               } while (u64_stats_fetch_retry(&rxq_stats->syncp, start));
+                       start = u64_stats_fetch_begin(&rxq_stats->napi_syncp);
+                       pkt_n = u64_stats_read(&rxq_stats->napi.rx_pkt_n);
+               } while (u64_stats_fetch_retry(&rxq_stats->napi_syncp, start));
 
-               p = (char *)&snapshot + offsetof(struct stmmac_rxq_stats, rx_pkt_n);
-               for (stat = 0; stat < STMMAC_RXQ_STATS; stat++) {
-                       *data++ = (*(u64 *)p);
-                       p += sizeof(u64);
-               }
+               *data++ = pkt_n;
+               *data++ = stmmac_get_rx_normal_irq_n(priv, q);
        }
 }
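 
The stmmac_get_rx_normal_irq_n()/stmmac_get_tx_normal_irq_n() helpers added above are the textbook reader side of per-CPU u64_stats counters: each CPU's value is sampled inside a u64_stats_fetch_begin()/u64_stats_fetch_retry() loop, which protects 64-bit reads against torn updates on 32-bit kernels, and the per-CPU samples are summed. A condensed kernel-side sketch of the same pattern for one hypothetical counter:

        /* Reader-side sketch; the struct and "events" field are hypothetical */
        struct my_pcpu_stats {
                u64_stats_t events;
                struct u64_stats_sync syncp;
        };

        static u64 sum_events(struct my_pcpu_stats __percpu *stats)
        {
                u64 total = 0;
                int cpu;

                for_each_possible_cpu(cpu) {
                        struct my_pcpu_stats *pcpu = per_cpu_ptr(stats, cpu);
                        unsigned int start;
                        u64 v;

                        do {
                                start = u64_stats_fetch_begin(&pcpu->syncp);
                                v = u64_stats_read(&pcpu->events);
                        } while (u64_stats_fetch_retry(&pcpu->syncp, start));
                        total += v;
                }
                return total;
        }

On 64-bit kernels the begin/retry pair compiles away to nothing; the loop still documents the intent and keeps the code correct everywhere.
 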
 
@@ -645,39 +680,49 @@ static void stmmac_get_ethtool_stats(struct net_device *dev,
        pos = j;
        for (i = 0; i < rx_queues_count; i++) {
                struct stmmac_rxq_stats *rxq_stats = &priv->xstats.rxq_stats[i];
-               struct stmmac_rxq_stats snapshot;
+               struct stmmac_napi_rx_stats snapshot;
+               u64 n_irq;
 
                j = pos;
                do {
-                       start = u64_stats_fetch_begin(&rxq_stats->syncp);
-                       snapshot = *rxq_stats;
-               } while (u64_stats_fetch_retry(&rxq_stats->syncp, start));
-
-               data[j++] += snapshot.rx_pkt_n;
-               data[j++] += snapshot.rx_normal_irq_n;
-               normal_irq_n += snapshot.rx_normal_irq_n;
-               napi_poll += snapshot.napi_poll;
+                       start = u64_stats_fetch_begin(&rxq_stats->napi_syncp);
+                       snapshot = rxq_stats->napi;
+               } while (u64_stats_fetch_retry(&rxq_stats->napi_syncp, start));
+
+               data[j++] += u64_stats_read(&snapshot.rx_pkt_n);
+               n_irq = stmmac_get_rx_normal_irq_n(priv, i);
+               data[j++] += n_irq;
+               normal_irq_n += n_irq;
+               napi_poll += u64_stats_read(&snapshot.poll);
        }
 
        pos = j;
        for (i = 0; i < tx_queues_count; i++) {
                struct stmmac_txq_stats *txq_stats = &priv->xstats.txq_stats[i];
-               struct stmmac_txq_stats snapshot;
+               struct stmmac_napi_tx_stats napi_snapshot;
+               struct stmmac_q_tx_stats q_snapshot;
+               u64 n_irq;
 
                j = pos;
                do {
-                       start = u64_stats_fetch_begin(&txq_stats->syncp);
-                       snapshot = *txq_stats;
-               } while (u64_stats_fetch_retry(&txq_stats->syncp, start));
-
-               data[j++] += snapshot.tx_pkt_n;
-               data[j++] += snapshot.tx_normal_irq_n;
-               normal_irq_n += snapshot.tx_normal_irq_n;
-               data[j++] += snapshot.tx_clean;
-               data[j++] += snapshot.tx_set_ic_bit;
-               data[j++] += snapshot.tx_tso_frames;
-               data[j++] += snapshot.tx_tso_nfrags;
-               napi_poll += snapshot.napi_poll;
+                       start = u64_stats_fetch_begin(&txq_stats->q_syncp);
+                       q_snapshot = txq_stats->q;
+               } while (u64_stats_fetch_retry(&txq_stats->q_syncp, start));
+               do {
+                       start = u64_stats_fetch_begin(&txq_stats->napi_syncp);
+                       napi_snapshot = txq_stats->napi;
+               } while (u64_stats_fetch_retry(&txq_stats->napi_syncp, start));
+
+               data[j++] += u64_stats_read(&napi_snapshot.tx_pkt_n);
+               n_irq = stmmac_get_tx_normal_irq_n(priv, i);
+               data[j++] += n_irq;
+               normal_irq_n += n_irq;
+               data[j++] += u64_stats_read(&napi_snapshot.tx_clean);
+               data[j++] += u64_stats_read(&q_snapshot.tx_set_ic_bit) +
+                       u64_stats_read(&napi_snapshot.tx_set_ic_bit);
+               data[j++] += u64_stats_read(&q_snapshot.tx_tso_frames);
+               data[j++] += u64_stats_read(&q_snapshot.tx_tso_nfrags);
+               napi_poll += u64_stats_read(&napi_snapshot.poll);
        }
        normal_irq_n += priv->xstats.rx_early_irq;
        data[j++] = normal_irq_n;
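 
The ethtool path above also shows the point of this refactor: each queue's statistics are split into a group written only from the transmit path (q, guarded by q_syncp) and a group written only from NAPI poll (napi, guarded by napi_syncp), while the per-queue IRQ counters move to per-CPU storage entirely. With exactly one writer context per syncp, every update site can drop u64_stats_update_begin_irqsave() for the plain variant. A sketch of the implied TX-side layout, with field names inferred from the accesses above (the real definitions live in the stmmac headers):

        /* Inferred layout; illustrative only */
        struct stmmac_q_tx_stats {          /* written from ndo_start_xmit */
                u64_stats_t tx_bytes;
                u64_stats_t tx_set_ic_bit;
                u64_stats_t tx_tso_frames;
                u64_stats_t tx_tso_nfrags;
        };

        struct stmmac_napi_tx_stats {       /* written from NAPI poll */
                u64_stats_t tx_packets;
                u64_stats_t tx_pkt_n;
                u64_stats_t tx_clean;
                u64_stats_t tx_set_ic_bit;
                u64_stats_t poll;
        };

        struct stmmac_txq_stats {
                struct u64_stats_sync q_syncp;
                struct stmmac_q_tx_stats q;
                struct u64_stats_sync napi_syncp;
                struct stmmac_napi_tx_stats napi;
        };

Note how tx_set_ic_bit appears in both groups: the bit can be set from either context, so readers (as in the ethtool hunk above) sum the two counters.
 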
index a0e46369ae158bf51ddcbd562414e172d97af201..7c6aef033a456455e4334466bf276755f33dbd47 100644 (file)
@@ -2482,7 +2482,6 @@ static bool stmmac_xdp_xmit_zc(struct stmmac_priv *priv, u32 queue, u32 budget)
        struct xdp_desc xdp_desc;
        bool work_done = true;
        u32 tx_set_ic_bit = 0;
-       unsigned long flags;
 
        /* Avoids TX time-out as we are sharing with slow path */
        txq_trans_cond_update(nq);
@@ -2566,9 +2565,9 @@ static bool stmmac_xdp_xmit_zc(struct stmmac_priv *priv, u32 queue, u32 budget)
                tx_q->cur_tx = STMMAC_GET_ENTRY(tx_q->cur_tx, priv->dma_conf.dma_tx_size);
                entry = tx_q->cur_tx;
        }
-       flags = u64_stats_update_begin_irqsave(&txq_stats->syncp);
-       txq_stats->tx_set_ic_bit += tx_set_ic_bit;
-       u64_stats_update_end_irqrestore(&txq_stats->syncp, flags);
+       u64_stats_update_begin(&txq_stats->napi_syncp);
+       u64_stats_add(&txq_stats->napi.tx_set_ic_bit, tx_set_ic_bit);
+       u64_stats_update_end(&txq_stats->napi_syncp);
 
        if (tx_desc) {
                stmmac_flush_tx_descriptors(priv, queue);
@@ -2616,7 +2615,6 @@ static int stmmac_tx_clean(struct stmmac_priv *priv, int budget, u32 queue,
        unsigned int bytes_compl = 0, pkts_compl = 0;
        unsigned int entry, xmits = 0, count = 0;
        u32 tx_packets = 0, tx_errors = 0;
-       unsigned long flags;
 
        __netif_tx_lock_bh(netdev_get_tx_queue(priv->dev, queue));
 
@@ -2674,7 +2672,8 @@ static int stmmac_tx_clean(struct stmmac_priv *priv, int budget, u32 queue,
                        }
                        if (skb) {
                                stmmac_get_tx_hwtstamp(priv, p, skb);
-                       } else {
+                       } else if (tx_q->xsk_pool &&
+                                  xp_tx_metadata_enabled(tx_q->xsk_pool)) {
                                struct stmmac_xsk_tx_complete tx_compl = {
                                        .priv = priv,
                                        .desc = p,
@@ -2782,11 +2781,11 @@ static int stmmac_tx_clean(struct stmmac_priv *priv, int budget, u32 queue,
        if (tx_q->dirty_tx != tx_q->cur_tx)
                *pending_packets = true;
 
-       flags = u64_stats_update_begin_irqsave(&txq_stats->syncp);
-       txq_stats->tx_packets += tx_packets;
-       txq_stats->tx_pkt_n += tx_packets;
-       txq_stats->tx_clean++;
-       u64_stats_update_end_irqrestore(&txq_stats->syncp, flags);
+       u64_stats_update_begin(&txq_stats->napi_syncp);
+       u64_stats_add(&txq_stats->napi.tx_packets, tx_packets);
+       u64_stats_add(&txq_stats->napi.tx_pkt_n, tx_packets);
+       u64_stats_inc(&txq_stats->napi.tx_clean);
+       u64_stats_update_end(&txq_stats->napi_syncp);
 
        priv->xstats.tx_errors += tx_errors;
 
@@ -3932,6 +3931,9 @@ static int __stmmac_open(struct net_device *dev,
        priv->rx_copybreak = STMMAC_RX_COPYBREAK;
 
        buf_sz = dma_conf->dma_buf_sz;
+       for (int i = 0; i < MTL_MAX_TX_QUEUES; i++)
+               if (priv->dma_conf.tx_queue[i].tbs & STMMAC_TBS_EN)
+                       dma_conf->tx_queue[i].tbs = priv->dma_conf.tx_queue[i].tbs;
        memcpy(&priv->dma_conf, dma_conf, sizeof(*dma_conf));
 
        stmmac_reset_queues_param(priv);
@@ -4004,8 +4006,10 @@ static void stmmac_fpe_stop_wq(struct stmmac_priv *priv)
 {
        set_bit(__FPE_REMOVING, &priv->fpe_task_state);
 
-       if (priv->fpe_wq)
+       if (priv->fpe_wq) {
                destroy_workqueue(priv->fpe_wq);
+               priv->fpe_wq = NULL;
+       }
 
        netdev_info(priv->dev, "FPE workqueue stop");
 }
@@ -4210,7 +4214,6 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
        struct stmmac_tx_queue *tx_q;
        bool has_vlan, set_ic;
        u8 proto_hdr_len, hdr;
-       unsigned long flags;
        u32 pay_len, mss;
        dma_addr_t des;
        int i;
@@ -4375,13 +4378,13 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
                netif_tx_stop_queue(netdev_get_tx_queue(priv->dev, queue));
        }
 
-       flags = u64_stats_update_begin_irqsave(&txq_stats->syncp);
-       txq_stats->tx_bytes += skb->len;
-       txq_stats->tx_tso_frames++;
-       txq_stats->tx_tso_nfrags += nfrags;
+       u64_stats_update_begin(&txq_stats->q_syncp);
+       u64_stats_add(&txq_stats->q.tx_bytes, skb->len);
+       u64_stats_inc(&txq_stats->q.tx_tso_frames);
+       u64_stats_add(&txq_stats->q.tx_tso_nfrags, nfrags);
        if (set_ic)
-               txq_stats->tx_set_ic_bit++;
-       u64_stats_update_end_irqrestore(&txq_stats->syncp, flags);
+               u64_stats_inc(&txq_stats->q.tx_set_ic_bit);
+       u64_stats_update_end(&txq_stats->q_syncp);
 
        if (priv->sarc_type)
                stmmac_set_desc_sarc(priv, first, priv->sarc_type);
@@ -4480,7 +4483,6 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
        struct stmmac_tx_queue *tx_q;
        bool has_vlan, set_ic;
        int entry, first_tx;
-       unsigned long flags;
        dma_addr_t des;
 
        tx_q = &priv->dma_conf.tx_queue[queue];
@@ -4650,11 +4652,11 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
                netif_tx_stop_queue(netdev_get_tx_queue(priv->dev, queue));
        }
 
-       flags = u64_stats_update_begin_irqsave(&txq_stats->syncp);
-       txq_stats->tx_bytes += skb->len;
+       u64_stats_update_begin(&txq_stats->q_syncp);
+       u64_stats_add(&txq_stats->q.tx_bytes, skb->len);
        if (set_ic)
-               txq_stats->tx_set_ic_bit++;
-       u64_stats_update_end_irqrestore(&txq_stats->syncp, flags);
+               u64_stats_inc(&txq_stats->q.tx_set_ic_bit);
+       u64_stats_update_end(&txq_stats->q_syncp);
 
        if (priv->sarc_type)
                stmmac_set_desc_sarc(priv, first, priv->sarc_type);
@@ -4918,12 +4920,11 @@ static int stmmac_xdp_xmit_xdpf(struct stmmac_priv *priv, int queue,
                set_ic = false;
 
        if (set_ic) {
-               unsigned long flags;
                tx_q->tx_count_frames = 0;
                stmmac_set_tx_ic(priv, tx_desc);
-               flags = u64_stats_update_begin_irqsave(&txq_stats->syncp);
-               txq_stats->tx_set_ic_bit++;
-               u64_stats_update_end_irqrestore(&txq_stats->syncp, flags);
+               u64_stats_update_begin(&txq_stats->q_syncp);
+               u64_stats_inc(&txq_stats->q.tx_set_ic_bit);
+               u64_stats_update_end(&txq_stats->q_syncp);
        }
 
        stmmac_enable_dma_transmission(priv, priv->ioaddr);
@@ -5073,7 +5074,6 @@ static void stmmac_dispatch_skb_zc(struct stmmac_priv *priv, u32 queue,
        unsigned int len = xdp->data_end - xdp->data;
        enum pkt_hash_types hash_type;
        int coe = priv->hw->rx_csum;
-       unsigned long flags;
        struct sk_buff *skb;
        u32 hash;
 
@@ -5103,10 +5103,10 @@ static void stmmac_dispatch_skb_zc(struct stmmac_priv *priv, u32 queue,
        skb_record_rx_queue(skb, queue);
        napi_gro_receive(&ch->rxtx_napi, skb);
 
-       flags = u64_stats_update_begin_irqsave(&rxq_stats->syncp);
-       rxq_stats->rx_pkt_n++;
-       rxq_stats->rx_bytes += len;
-       u64_stats_update_end_irqrestore(&rxq_stats->syncp, flags);
+       u64_stats_update_begin(&rxq_stats->napi_syncp);
+       u64_stats_inc(&rxq_stats->napi.rx_pkt_n);
+       u64_stats_add(&rxq_stats->napi.rx_bytes, len);
+       u64_stats_update_end(&rxq_stats->napi_syncp);
 }
 
 static bool stmmac_rx_refill_zc(struct stmmac_priv *priv, u32 queue, u32 budget)
@@ -5188,7 +5188,6 @@ static int stmmac_rx_zc(struct stmmac_priv *priv, int limit, u32 queue)
        unsigned int desc_size;
        struct bpf_prog *prog;
        bool failure = false;
-       unsigned long flags;
        int xdp_status = 0;
        int status = 0;
 
@@ -5343,9 +5342,9 @@ read_again:
 
        stmmac_finalize_xdp_rx(priv, xdp_status);
 
-       flags = u64_stats_update_begin_irqsave(&rxq_stats->syncp);
-       rxq_stats->rx_pkt_n += count;
-       u64_stats_update_end_irqrestore(&rxq_stats->syncp, flags);
+       u64_stats_update_begin(&rxq_stats->napi_syncp);
+       u64_stats_add(&rxq_stats->napi.rx_pkt_n, count);
+       u64_stats_update_end(&rxq_stats->napi_syncp);
 
        priv->xstats.rx_dropped += rx_dropped;
        priv->xstats.rx_errors += rx_errors;
@@ -5383,7 +5382,6 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue)
        unsigned int desc_size;
        struct sk_buff *skb = NULL;
        struct stmmac_xdp_buff ctx;
-       unsigned long flags;
        int xdp_status = 0;
        int buf_sz;
 
@@ -5643,11 +5641,11 @@ drain_data:
 
        stmmac_rx_refill(priv, queue);
 
-       flags = u64_stats_update_begin_irqsave(&rxq_stats->syncp);
-       rxq_stats->rx_packets += rx_packets;
-       rxq_stats->rx_bytes += rx_bytes;
-       rxq_stats->rx_pkt_n += count;
-       u64_stats_update_end_irqrestore(&rxq_stats->syncp, flags);
+       u64_stats_update_begin(&rxq_stats->napi_syncp);
+       u64_stats_add(&rxq_stats->napi.rx_packets, rx_packets);
+       u64_stats_add(&rxq_stats->napi.rx_bytes, rx_bytes);
+       u64_stats_add(&rxq_stats->napi.rx_pkt_n, count);
+       u64_stats_update_end(&rxq_stats->napi_syncp);
 
        priv->xstats.rx_dropped += rx_dropped;
        priv->xstats.rx_errors += rx_errors;
@@ -5662,13 +5660,12 @@ static int stmmac_napi_poll_rx(struct napi_struct *napi, int budget)
        struct stmmac_priv *priv = ch->priv_data;
        struct stmmac_rxq_stats *rxq_stats;
        u32 chan = ch->index;
-       unsigned long flags;
        int work_done;
 
        rxq_stats = &priv->xstats.rxq_stats[chan];
-       flags = u64_stats_update_begin_irqsave(&rxq_stats->syncp);
-       rxq_stats->napi_poll++;
-       u64_stats_update_end_irqrestore(&rxq_stats->syncp, flags);
+       u64_stats_update_begin(&rxq_stats->napi_syncp);
+       u64_stats_inc(&rxq_stats->napi.poll);
+       u64_stats_update_end(&rxq_stats->napi_syncp);
 
        work_done = stmmac_rx(priv, budget, chan);
        if (work_done < budget && napi_complete_done(napi, work_done)) {
@@ -5690,13 +5687,12 @@ static int stmmac_napi_poll_tx(struct napi_struct *napi, int budget)
        struct stmmac_txq_stats *txq_stats;
        bool pending_packets = false;
        u32 chan = ch->index;
-       unsigned long flags;
        int work_done;
 
        txq_stats = &priv->xstats.txq_stats[chan];
-       flags = u64_stats_update_begin_irqsave(&txq_stats->syncp);
-       txq_stats->napi_poll++;
-       u64_stats_update_end_irqrestore(&txq_stats->syncp, flags);
+       u64_stats_update_begin(&txq_stats->napi_syncp);
+       u64_stats_inc(&txq_stats->napi.poll);
+       u64_stats_update_end(&txq_stats->napi_syncp);
 
        work_done = stmmac_tx_clean(priv, budget, chan, &pending_packets);
        work_done = min(work_done, budget);
@@ -5726,17 +5722,16 @@ static int stmmac_napi_poll_rxtx(struct napi_struct *napi, int budget)
        struct stmmac_rxq_stats *rxq_stats;
        struct stmmac_txq_stats *txq_stats;
        u32 chan = ch->index;
-       unsigned long flags;
 
        rxq_stats = &priv->xstats.rxq_stats[chan];
-       flags = u64_stats_update_begin_irqsave(&rxq_stats->syncp);
-       rxq_stats->napi_poll++;
-       u64_stats_update_end_irqrestore(&rxq_stats->syncp, flags);
+       u64_stats_update_begin(&rxq_stats->napi_syncp);
+       u64_stats_inc(&rxq_stats->napi.poll);
+       u64_stats_update_end(&rxq_stats->napi_syncp);
 
        txq_stats = &priv->xstats.txq_stats[chan];
-       flags = u64_stats_update_begin_irqsave(&txq_stats->syncp);
-       txq_stats->napi_poll++;
-       u64_stats_update_end_irqrestore(&txq_stats->syncp, flags);
+       u64_stats_update_begin(&txq_stats->napi_syncp);
+       u64_stats_inc(&txq_stats->napi.poll);
+       u64_stats_update_end(&txq_stats->napi_syncp);
 
        tx_done = stmmac_tx_clean(priv, budget, chan, &tx_pending_packets);
        tx_done = min(tx_done, budget);
@@ -6067,11 +6062,6 @@ static irqreturn_t stmmac_mac_interrupt(int irq, void *dev_id)
        struct net_device *dev = (struct net_device *)dev_id;
        struct stmmac_priv *priv = netdev_priv(dev);
 
-       if (unlikely(!dev)) {
-               netdev_err(priv->dev, "%s: invalid dev pointer\n", __func__);
-               return IRQ_NONE;
-       }
-
        /* Check if adapter is up */
        if (test_bit(STMMAC_DOWN, &priv->state))
                return IRQ_HANDLED;
@@ -6087,11 +6077,6 @@ static irqreturn_t stmmac_safety_interrupt(int irq, void *dev_id)
        struct net_device *dev = (struct net_device *)dev_id;
        struct stmmac_priv *priv = netdev_priv(dev);
 
-       if (unlikely(!dev)) {
-               netdev_err(priv->dev, "%s: invalid dev pointer\n", __func__);
-               return IRQ_NONE;
-       }
-
        /* Check if adapter is up */
        if (test_bit(STMMAC_DOWN, &priv->state))
                return IRQ_HANDLED;
@@ -6113,11 +6098,6 @@ static irqreturn_t stmmac_msi_intr_tx(int irq, void *data)
        dma_conf = container_of(tx_q, struct stmmac_dma_conf, tx_queue[chan]);
        priv = container_of(dma_conf, struct stmmac_priv, dma_conf);
 
-       if (unlikely(!data)) {
-               netdev_err(priv->dev, "%s: invalid dev pointer\n", __func__);
-               return IRQ_NONE;
-       }
-
        /* Check if adapter is up */
        if (test_bit(STMMAC_DOWN, &priv->state))
                return IRQ_HANDLED;
@@ -6144,11 +6124,6 @@ static irqreturn_t stmmac_msi_intr_rx(int irq, void *data)
        dma_conf = container_of(rx_q, struct stmmac_dma_conf, rx_queue[chan]);
        priv = container_of(dma_conf, struct stmmac_priv, dma_conf);
 
-       if (unlikely(!data)) {
-               netdev_err(priv->dev, "%s: invalid dev pointer\n", __func__);
-               return IRQ_NONE;
-       }
-
        /* Check if adapter is up */
        if (test_bit(STMMAC_DOWN, &priv->state))
                return IRQ_HANDLED;
@@ -7062,10 +7037,13 @@ static void stmmac_get_stats64(struct net_device *dev, struct rtnl_link_stats64
                u64 tx_bytes;
 
                do {
-                       start = u64_stats_fetch_begin(&txq_stats->syncp);
-                       tx_packets = txq_stats->tx_packets;
-                       tx_bytes   = txq_stats->tx_bytes;
-               } while (u64_stats_fetch_retry(&txq_stats->syncp, start));
+                       start = u64_stats_fetch_begin(&txq_stats->q_syncp);
+                       tx_bytes   = u64_stats_read(&txq_stats->q.tx_bytes);
+               } while (u64_stats_fetch_retry(&txq_stats->q_syncp, start));
+               do {
+                       start = u64_stats_fetch_begin(&txq_stats->napi_syncp);
+                       tx_packets = u64_stats_read(&txq_stats->napi.tx_packets);
+               } while (u64_stats_fetch_retry(&txq_stats->napi_syncp, start));
 
                stats->tx_packets += tx_packets;
                stats->tx_bytes += tx_bytes;
@@ -7077,10 +7055,10 @@ static void stmmac_get_stats64(struct net_device *dev, struct rtnl_link_stats64
                u64 rx_bytes;
 
                do {
-                       start = u64_stats_fetch_begin(&rxq_stats->syncp);
-                       rx_packets = rxq_stats->rx_packets;
-                       rx_bytes   = rxq_stats->rx_bytes;
-               } while (u64_stats_fetch_retry(&rxq_stats->syncp, start));
+                       start = u64_stats_fetch_begin(&rxq_stats->napi_syncp);
+                       rx_packets = u64_stats_read(&rxq_stats->napi.rx_packets);
+                       rx_bytes   = u64_stats_read(&rxq_stats->napi.rx_bytes);
+               } while (u64_stats_fetch_retry(&rxq_stats->napi_syncp, start));
 
                stats->rx_packets += rx_packets;
                stats->rx_bytes += rx_bytes;
@@ -7474,9 +7452,16 @@ int stmmac_dvr_probe(struct device *device,
        priv->dev = ndev;
 
        for (i = 0; i < MTL_MAX_RX_QUEUES; i++)
-               u64_stats_init(&priv->xstats.rxq_stats[i].syncp);
-       for (i = 0; i < MTL_MAX_TX_QUEUES; i++)
-               u64_stats_init(&priv->xstats.txq_stats[i].syncp);
+               u64_stats_init(&priv->xstats.rxq_stats[i].napi_syncp);
+       for (i = 0; i < MTL_MAX_TX_QUEUES; i++) {
+               u64_stats_init(&priv->xstats.txq_stats[i].q_syncp);
+               u64_stats_init(&priv->xstats.txq_stats[i].napi_syncp);
+       }
+
+       priv->xstats.pcpu_stats =
+               devm_netdev_alloc_pcpu_stats(device, struct stmmac_pcpu_stats);
+       if (!priv->xstats.pcpu_stats)
+               return -ENOMEM;
 
        stmmac_set_ethtool_ops(ndev);
        priv->pause = pause;
@@ -7542,6 +7527,9 @@ int stmmac_dvr_probe(struct device *device,
                dev_err(priv->device, "unable to bring out of ahb reset: %pe\n",
                        ERR_PTR(ret));
 
+       /* Wait a bit for the reset to take effect */
+       udelay(10);
+
        /* Init MAC and get the capabilities */
        ret = stmmac_hw_init(priv);
        if (ret)
index be01450c20dc0199ebc5d1d731eca04a47781539..1530d13984d42606f6e4b4d1d28ca3f8c6461ac0 100644 (file)
@@ -189,6 +189,7 @@ config TI_ICSSG_PRUETH
        select TI_K3_CPPI_DESC_POOL
        depends on PRU_REMOTEPROC
        depends on ARCH_K3 && OF && TI_K3_UDMA_GLUE_LAYER
+       depends on PTP_1588_CLOCK_OPTIONAL
        help
          Support dual Gigabit Ethernet ports over the ICSSG PRU Subsystem.
          This subsystem is available starting with the AM65 platform.
index 9d2f4ac783e43502586b27283a4db73351ca0583..2939a21ca74f3cf0f627981df74a949e9c61011e 100644 (file)
@@ -294,7 +294,7 @@ static void am65_cpsw_nuss_ndo_host_tx_timeout(struct net_device *ndev,
                   txqueue,
                   netif_tx_queue_stopped(netif_txq),
                   jiffies_to_msecs(jiffies - trans_start),
-                  dql_avail(&netif_txq->dql),
+                  netdev_queue_dql_avail(netif_txq),
                   k3_cppi_desc_pool_avail(tx_chn->desc_pool));
 
        if (netif_tx_queue_stopped(netif_txq)) {
index ea85c6dd5484617a038e565312e8ad0ccdce6c75..c0a5abd8d9a8e6e0d113c36a9557a1de1c360993 100644 (file)
@@ -631,6 +631,8 @@ static void cpsw_slave_open(struct cpsw_slave *slave, struct cpsw_priv *priv)
                }
        }
 
+       phy->mac_managed_pm = true;
+
        slave->phy = phy;
 
        phy_attached_info(slave->phy);
index 498c50c6d1a701b86596b9148dbdc4523176cee7..087dcb67505a2da5995963d5d67d36dadb580a47 100644 (file)
@@ -773,6 +773,9 @@ static void cpsw_slave_open(struct cpsw_slave *slave, struct cpsw_priv *priv)
                        slave->slave_num);
                return;
        }
+
+       phy->mac_managed_pm = true;
+
        slave->phy = phy;
 
        phy_attached_info(slave->phy);
index bcccf43d368b7e2a9efc3a6e855a5b2299e9cdc4..dbbea914604057ca97823c5c6e164be50303df08 100644 (file)
@@ -638,6 +638,16 @@ static void cpts_calc_mult_shift(struct cpts *cpts)
                 freq, cpts->cc.mult, cpts->cc.shift, (ns - NSEC_PER_SEC));
 }
 
+static void cpts_clk_unregister(void *clk)
+{
+       clk_hw_unregister_mux(clk);
+}
+
+static void cpts_clk_del_provider(void *np)
+{
+       of_clk_del_provider(np);
+}
+
 static int cpts_of_mux_clk_setup(struct cpts *cpts, struct device_node *node)
 {
        struct device_node *refclk_np;
@@ -687,9 +697,7 @@ static int cpts_of_mux_clk_setup(struct cpts *cpts, struct device_node *node)
                goto mux_fail;
        }
 
-       ret = devm_add_action_or_reset(cpts->dev,
-                                      (void(*)(void *))clk_hw_unregister_mux,
-                                      clk_hw);
+       ret = devm_add_action_or_reset(cpts->dev, cpts_clk_unregister, clk_hw);
        if (ret) {
                dev_err(cpts->dev, "add clkmux unreg action %d", ret);
                goto mux_fail;
@@ -699,8 +707,7 @@ static int cpts_of_mux_clk_setup(struct cpts *cpts, struct device_node *node)
        if (ret)
                goto mux_fail;
 
-       ret = devm_add_action_or_reset(cpts->dev,
-                                      (void(*)(void *))of_clk_del_provider,
+       ret = devm_add_action_or_reset(cpts->dev, cpts_clk_del_provider,
                                       refclk_np);
        if (ret) {
                dev_err(cpts->dev, "add clkmux provider unreg action %d", ret);
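 
The casts being removed here handed devm a function pointer whose type does not match clk_hw_unregister_mux() or of_clk_del_provider(); under kernel control-flow-integrity schemes (CONFIG_CFI_CLANG and friends), an indirect call through a pointer of the wrong prototype traps at runtime. The fix is the standard one-line thunk with exactly the signature devm expects. A sketch of the pattern for an arbitrary resource (my_res_* names are hypothetical):

        /* A correctly typed devm action thunk */
        static void my_res_release(void *data)
        {
                struct my_res *res = data;

                my_res_destroy(res);    /* the real, strongly typed destructor */
        }

        /* in probe: */
        ret = devm_add_action_or_reset(dev, my_res_release, res);
        if (ret)
                return ret;
 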
index d5b75af163d35e6b257e9d3dcb48ada80f8a0f20..c1b0d35c8d05207b351b9313f6ae24b986ff3ca1 100644 (file)
@@ -384,18 +384,18 @@ static int gelic_descr_prepare_rx(struct gelic_card *card,
        if (gelic_descr_get_status(descr) !=  GELIC_DESCR_DMA_NOT_IN_USE)
                dev_info(ctodev(card), "%s: ERROR status\n", __func__);
 
-       descr->skb = netdev_alloc_skb(*card->netdev, rx_skb_size);
-       if (!descr->skb) {
-               descr->hw_regs.payload.dev_addr = 0; /* tell DMAC don't touch memory */
-               return -ENOMEM;
-       }
        descr->hw_regs.dmac_cmd_status = 0;
        descr->hw_regs.result_size = 0;
        descr->hw_regs.valid_size = 0;
        descr->hw_regs.data_error = 0;
        descr->hw_regs.payload.dev_addr = 0;
        descr->hw_regs.payload.size = 0;
-       descr->skb = NULL;
+
+       descr->skb = netdev_alloc_skb(*card->netdev, rx_skb_size);
+       if (!descr->skb) {
+       descr->skb = netdev_alloc_skb(*card->netdev, rx_skb_size);
+       if (!descr->skb) {
+               descr->hw_regs.payload.dev_addr = 0; /* tell the DMAC not to touch memory */
+               return -ENOMEM;
+       }
+               return -ENOMEM;
+       }
 
        offset = ((unsigned long)descr->skb->data) &
                (GELIC_NET_RXBUF_ALIGN - 1);
index 2b6a607ac0b78848d8694af42df349aaa24f16d8..a273362c9e703ce8f807ae86e4705565ab51c605 100644 (file)
@@ -153,6 +153,7 @@ static const struct pci_device_id skfddi_pci_tbl[] = {
        { }                     /* Terminating entry */
 };
 MODULE_DEVICE_TABLE(pci, skfddi_pci_tbl);
+MODULE_DESCRIPTION("SysKonnect FDDI PCI driver");
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Mirko Lindner <mlindner@syskonnect.de>");
 
index 704e949484d0c1684247302fb29f45b7ffa3b3e1..b9b5554ea8620ed7249fbdc8870779c6ad658f1b 100644 (file)
@@ -221,21 +221,25 @@ static int fjes_hw_setup(struct fjes_hw *hw)
 
        mem_size = FJES_DEV_REQ_BUF_SIZE(hw->max_epid);
        hw->hw_info.req_buf = kzalloc(mem_size, GFP_KERNEL);
-       if (!(hw->hw_info.req_buf))
-               return -ENOMEM;
+       if (!(hw->hw_info.req_buf)) {
+               result = -ENOMEM;
+               goto free_ep_info;
+       }
 
        hw->hw_info.req_buf_size = mem_size;
 
        mem_size = FJES_DEV_RES_BUF_SIZE(hw->max_epid);
        hw->hw_info.res_buf = kzalloc(mem_size, GFP_KERNEL);
-       if (!(hw->hw_info.res_buf))
-               return -ENOMEM;
+       if (!(hw->hw_info.res_buf)) {
+               result = -ENOMEM;
+               goto free_req_buf;
+       }
 
        hw->hw_info.res_buf_size = mem_size;
 
        result = fjes_hw_alloc_shared_status_region(hw);
        if (result)
-               return result;
+               goto free_res_buf;
 
        hw->hw_info.buffer_share_bit = 0;
        hw->hw_info.buffer_unshare_reserve_bit = 0;
@@ -246,11 +250,11 @@ static int fjes_hw_setup(struct fjes_hw *hw)
 
                        result = fjes_hw_alloc_epbuf(&buf_pair->tx);
                        if (result)
-                               return result;
+                               goto free_epbuf;
 
                        result = fjes_hw_alloc_epbuf(&buf_pair->rx);
                        if (result)
-                               return result;
+                               goto free_epbuf;
 
                        spin_lock_irqsave(&hw->rx_status_lock, flags);
                        fjes_hw_setup_epbuf(&buf_pair->tx, mac,
@@ -273,6 +277,25 @@ static int fjes_hw_setup(struct fjes_hw *hw)
        fjes_hw_init_command_registers(hw, &param);
 
        return 0;
+
+free_epbuf:
+       for (epidx = 0; epidx < hw->max_epid ; epidx++) {
+               if (epidx == hw->my_epid)
+                       continue;
+               fjes_hw_free_epbuf(&hw->ep_shm_info[epidx].tx);
+               fjes_hw_free_epbuf(&hw->ep_shm_info[epidx].rx);
+       }
+       fjes_hw_free_shared_status_region(hw);
+free_res_buf:
+       kfree(hw->hw_info.res_buf);
+       hw->hw_info.res_buf = NULL;
+free_req_buf:
+       kfree(hw->hw_info.req_buf);
+       hw->hw_info.req_buf = NULL;
+free_ep_info:
+       kfree(hw->ep_shm_info);
+       hw->ep_shm_info = NULL;
+       return result;
 }
 
 static void fjes_hw_cleanup(struct fjes_hw *hw)
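 
Before this change fjes_hw_setup() returned directly on each failure, leaking everything allocated earlier in the function. The fix converts it to the kernel's standard goto ladder: each allocation gains a label that frees it, the labels run in reverse order of setup, and a failing step jumps to the label of the last thing that succeeded. A generic sketch of the idiom (all helpers hypothetical):

        /* Reverse-order error unwinding; setup and teardown mirror each other */
        static int setup(struct ctx *c)
        {
                int ret;

                c->a = kzalloc(sizeof(*c->a), GFP_KERNEL);
                if (!c->a)
                        return -ENOMEM;

                c->b = kzalloc(sizeof(*c->b), GFP_KERNEL);
                if (!c->b) {
                        ret = -ENOMEM;
                        goto free_a;
                }

                ret = start_engine(c);
                if (ret)
                        goto free_b;

                return 0;

        free_b:
                kfree(c->b);
                c->b = NULL;
        free_a:
                kfree(c->a);
                c->a = NULL;
                return ret;
        }

NULLing the freed pointers, as the fjes fix also does, keeps any later cleanup path from double-freeing them.
 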
index 32c51c244153bd760b9f58001906c04c8b0f37ff..c4ed36c71897439fc8f6c11d069c88996e2a2a3c 100644 (file)
@@ -221,7 +221,7 @@ static void geneve_rx(struct geneve_dev *geneve, struct geneve_sock *gs,
        struct genevehdr *gnvh = geneve_hdr(skb);
        struct metadata_dst *tun_dst = NULL;
        unsigned int len;
-       int err = 0;
+       int nh, err = 0;
        void *oiph;
 
        if (ip_tunnel_collect_metadata() || gs->collect_md) {
@@ -272,9 +272,23 @@ static void geneve_rx(struct geneve_dev *geneve, struct geneve_sock *gs,
                skb->pkt_type = PACKET_HOST;
        }
 
-       oiph = skb_network_header(skb);
+       /* Save the offset of the outer header relative to skb->head:
+        * we are about to reset the network header to the inner header,
+        * and pskb_inet_may_pull() below may reallocate skb->head.
+        */
+       nh = skb_network_header(skb) - skb->head;
+
        skb_reset_network_header(skb);
 
+       if (!pskb_inet_may_pull(skb)) {
+               DEV_STATS_INC(geneve->dev, rx_length_errors);
+               DEV_STATS_INC(geneve->dev, rx_errors);
+               goto drop;
+       }
+
+       /* Get the outer header. */
+       oiph = skb->head + nh;
+
        if (geneve_get_sk_family(gs) == AF_INET)
                err = IP_ECN_decapsulate(oiph, skb);
 #if IS_ENABLED(CONFIG_IPV6)
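 
The bug being fixed is a dangling pointer: oiph used to be taken before pskb_inet_may_pull(), which may reallocate skb->head and leave any saved header pointer aimed at freed memory. The general rule is to carry offsets, not pointers, across anything in the pskb_may_pull() family, and rebuild the pointer afterwards. As a sketch:

        /* Keep offsets, not pointers, across pskb_may_pull()-style calls */
        int nh_off = skb_network_header(skb) - skb->head;

        if (!pskb_may_pull(skb, needed_len))    /* may reallocate skb->head */
                goto drop;

        oiph = skb->head + nh_off;              /* now safe to rebuild */
 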
index b1919278e931f4e9fb6b2d2ec2feb2193b2cda61..2b5357d94ff5683049510c71c932be05abe0f211 100644 (file)
@@ -1903,26 +1903,26 @@ static int __init gtp_init(void)
 
        get_random_bytes(&gtp_h_initval, sizeof(gtp_h_initval));
 
-       err = rtnl_link_register(&gtp_link_ops);
+       err = register_pernet_subsys(&gtp_net_ops);
        if (err < 0)
                goto error_out;
 
-       err = genl_register_family(&gtp_genl_family);
+       err = rtnl_link_register(&gtp_link_ops);
        if (err < 0)
-               goto unreg_rtnl_link;
+               goto unreg_pernet_subsys;
 
-       err = register_pernet_subsys(&gtp_net_ops);
+       err = genl_register_family(&gtp_genl_family);
        if (err < 0)
-               goto unreg_genl_family;
+               goto unreg_rtnl_link;
 
        pr_info("GTP module loaded (pdp ctx size %zd bytes)\n",
                sizeof(struct pdp_ctx));
        return 0;
 
-unreg_genl_family:
-       genl_unregister_family(&gtp_genl_family);
 unreg_rtnl_link:
        rtnl_link_unregister(&gtp_link_ops);
+unreg_pernet_subsys:
+       unregister_pernet_subsys(&gtp_net_ops);
 error_out:
        pr_err("error loading GTP module\n");
        return err;
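 
The reordering in gtp_init() follows the usual registration rule: a facility must not become externally visible before everything it depends on exists, and the error path must unwind in exactly the reverse order. Here the per-netns state is set up first, because a newlink request can arrive the moment rtnl_link_register() returns. The shape of the idiom, with generic names:

        /* Registration order and its mirror-image unwinding (sketch) */
        static int __init my_mod_init(void)
        {
                int err;

                err = register_pernet_subsys(&my_net_ops);
                if (err)
                        return err;

                err = rtnl_link_register(&my_link_ops);
                if (err)
                        goto unreg_pernet;

                err = genl_register_family(&my_genl_family);
                if (err)
                        goto unreg_link;

                return 0;

        unreg_link:
                rtnl_link_unregister(&my_link_ops);
        unreg_pernet:
                unregister_pernet_subsys(&my_net_ops);
                return err;
        }
 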
index 1dafa44155d0eb31dfaea9cacdc3954ebee75f4b..a6fcbda64ecc60e5beccf20f2043ab00870cbd5d 100644 (file)
@@ -708,7 +708,10 @@ void netvsc_device_remove(struct hv_device *device)
        /* Disable NAPI and disassociate its context from the device. */
        for (i = 0; i < net_device->num_chn; i++) {
                /* See also vmbus_reset_channel_cb(). */
-               napi_disable(&net_device->chan_table[i].napi);
+               /* Only disable NAPI instances that were actually enabled */
+               if (i < ndev->real_num_rx_queues)
+                       napi_disable(&net_device->chan_table[i].napi);
+
                netif_napi_del(&net_device->chan_table[i].napi);
        }
 
index 4406427d4617d58d300be5c46a368df5223d2219..11831a1c97623985401317e690b66f6985abb750 100644 (file)
 #define LINKCHANGE_INT (2 * HZ)
 #define VF_TAKEOVER_INT (HZ / 10)
 
+/* Macros to define the context of vf registration */
+#define VF_REG_IN_PROBE                1
+#define VF_REG_IN_NOTIFIER     2
+
 static unsigned int ring_size __ro_after_init = 128;
 module_param(ring_size, uint, 0444);
-MODULE_PARM_DESC(ring_size, "Ring buffer size (# of pages)");
+MODULE_PARM_DESC(ring_size, "Ring buffer size (# of 4K pages)");
 unsigned int netvsc_ring_bytes __ro_after_init;
 
 static const u32 default_msg = NETIF_MSG_DRV | NETIF_MSG_PROBE |
@@ -2185,7 +2189,7 @@ static rx_handler_result_t netvsc_vf_handle_frame(struct sk_buff **pskb)
 }
 
 static int netvsc_vf_join(struct net_device *vf_netdev,
-                         struct net_device *ndev)
+                         struct net_device *ndev, int context)
 {
        struct net_device_context *ndev_ctx = netdev_priv(ndev);
        int ret;
@@ -2208,7 +2212,11 @@ static int netvsc_vf_join(struct net_device *vf_netdev,
                goto upper_link_failed;
        }
 
-       schedule_delayed_work(&ndev_ctx->vf_takeover, VF_TAKEOVER_INT);
+       /* If this registration is called from probe context, vf_takeover
+        * is taken care of later in probe itself.
+        */
+       if (context == VF_REG_IN_NOTIFIER)
+               schedule_delayed_work(&ndev_ctx->vf_takeover, VF_TAKEOVER_INT);
 
        call_netdevice_notifiers(NETDEV_JOIN, vf_netdev);
 
@@ -2346,7 +2354,7 @@ static int netvsc_prepare_bonding(struct net_device *vf_netdev)
        return NOTIFY_DONE;
 }
 
-static int netvsc_register_vf(struct net_device *vf_netdev)
+static int netvsc_register_vf(struct net_device *vf_netdev, int context)
 {
        struct net_device_context *net_device_ctx;
        struct netvsc_device *netvsc_dev;
@@ -2386,7 +2394,7 @@ static int netvsc_register_vf(struct net_device *vf_netdev)
 
        netdev_info(ndev, "VF registering: %s\n", vf_netdev->name);
 
-       if (netvsc_vf_join(vf_netdev, ndev) != 0)
+       if (netvsc_vf_join(vf_netdev, ndev, context) != 0)
                return NOTIFY_DONE;
 
        dev_hold(vf_netdev);
@@ -2484,10 +2492,31 @@ static int netvsc_unregister_vf(struct net_device *vf_netdev)
        return NOTIFY_OK;
 }
 
+static int check_dev_is_matching_vf(struct net_device *event_ndev)
+{
+       /* Skip NetVSC interfaces */
+       if (event_ndev->netdev_ops == &device_ops)
+               return -ENODEV;
+
+       /* Avoid non-Ethernet type devices */
+       if (event_ndev->type != ARPHRD_ETHER)
+               return -ENODEV;
+
+       /* Avoid Vlan dev with same MAC registering as VF */
+       if (is_vlan_dev(event_ndev))
+               return -ENODEV;
+
+       /* Avoid Bonding master dev with same MAC registering as VF */
+       if (netif_is_bond_master(event_ndev))
+               return -ENODEV;
+
+       return 0;
+}
+
 static int netvsc_probe(struct hv_device *dev,
                        const struct hv_vmbus_device_id *dev_id)
 {
-       struct net_device *net = NULL;
+       struct net_device *net = NULL, *vf_netdev;
        struct net_device_context *net_device_ctx;
        struct netvsc_device_info *device_info = NULL;
        struct netvsc_device *nvdev;
@@ -2599,6 +2628,30 @@ static int netvsc_probe(struct hv_device *dev,
        }
 
        list_add(&net_device_ctx->list, &netvsc_dev_list);
+
+       /* When the hv_netvsc driver is unloaded and reloaded, the
+        * NETDEV_REGISTER event for the VF device is replayed before probe
+        * is complete. This is because register_netdevice_notifier() is
+        * called before vmbus_driver_register(), so the callback is in
+        * place before probe and we don't miss events like NETDEV_POST_INIT.
+        * So, in this section we try to register the matching VF device
+        * that is already present as a netdevice, knowing that its register
+        * call was not processed in netvsc_netdev_notifier() (as probing
+        * was in progress and get_netvsc_byslot() failed).
+        */
+       for_each_netdev(dev_net(net), vf_netdev) {
+               ret = check_dev_is_matching_vf(vf_netdev);
+               if (ret != 0)
+                       continue;
+
+               if (net != get_netvsc_byslot(vf_netdev))
+                       continue;
+
+               netvsc_prepare_bonding(vf_netdev);
+               netvsc_register_vf(vf_netdev, VF_REG_IN_PROBE);
+               __netvsc_vf_setup(net, vf_netdev);
+               break;
+       }
        rtnl_unlock();
 
        netvsc_devinfo_put(device_info);
@@ -2754,28 +2807,17 @@ static int netvsc_netdev_event(struct notifier_block *this,
                               unsigned long event, void *ptr)
 {
        struct net_device *event_dev = netdev_notifier_info_to_dev(ptr);
+       int ret = 0;
 
-       /* Skip our own events */
-       if (event_dev->netdev_ops == &device_ops)
-               return NOTIFY_DONE;
-
-       /* Avoid non-Ethernet type devices */
-       if (event_dev->type != ARPHRD_ETHER)
-               return NOTIFY_DONE;
-
-       /* Avoid Vlan dev with same MAC registering as VF */
-       if (is_vlan_dev(event_dev))
-               return NOTIFY_DONE;
-
-       /* Avoid Bonding master dev with same MAC registering as VF */
-       if (netif_is_bond_master(event_dev))
+       ret = check_dev_is_matching_vf(event_dev);
+       if (ret != 0)
                return NOTIFY_DONE;
 
        switch (event) {
        case NETDEV_POST_INIT:
                return netvsc_prepare_bonding(event_dev);
        case NETDEV_REGISTER:
-               return netvsc_register_vf(event_dev);
+               return netvsc_register_vf(event_dev, VF_REG_IN_NOTIFIER);
        case NETDEV_UNREGISTER:
                return netvsc_unregister_vf(event_dev);
        case NETDEV_UP:
@@ -2807,7 +2849,7 @@ static int __init netvsc_drv_init(void)
                pr_info("Increased ring_size to %u (min allowed)\n",
                        ring_size);
        }
-       netvsc_ring_bytes = ring_size * PAGE_SIZE;
+       netvsc_ring_bytes = VMBUS_RING_SIZE(ring_size * 4096);
 
        register_netdevice_notifier(&netvsc_netdev_notifier);
 
index 35e55f198e05cea45ddb747dff34963eabfac92e..2930141d7dd2d30201e4bd1d4492cbd681fb7c0a 100644 (file)
@@ -259,4 +259,5 @@ static __exit void fake_remove_module(void)
 
 module_init(fakelb_init_module);
 module_exit(fake_remove_module);
+MODULE_DESCRIPTION("IEEE 802.15.4 loopback driver");
 MODULE_LICENSE("GPL");
index 4bc05948f772d8b009e692a62fec564c7380aae3..a78c692f2d3c5dde24879254cab725bc97072634 100644 (file)
@@ -212,7 +212,7 @@ void ipa_interrupt_suspend_clear_all(struct ipa_interrupt *interrupt)
        u32 unit_count;
        u32 unit;
 
-       unit_count = roundup(ipa->endpoint_count, 32);
+       unit_count = DIV_ROUND_UP(ipa->endpoint_count, 32);
        for (unit = 0; unit < unit_count; unit++) {
                const struct reg *reg;
                u32 val;
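 
This is a units bug: the loop wants the number of 32-endpoint interrupt units, but roundup(n, m) returns n rounded up to a multiple of m, whereas DIV_ROUND_UP(n, m) returns the count of m-sized units covering n. With roundup() the loop iterated 32 times too many. A standalone check (macros simplified from the kernel definitions):

        #include <stdio.h>

        #define DIV_ROUND_UP(n, d)      (((n) + (d) - 1) / (d))
        #define roundup(x, y)           ((((x) + (y) - 1) / (y)) * (y))

        int main(void)
        {
                unsigned int endpoint_count = 40;

                /* want 2 units of 32 endpoints each */
                printf("DIV_ROUND_UP: %u\n", DIV_ROUND_UP(endpoint_count, 32)); /* 2 */
                printf("roundup:      %u\n", roundup(endpoint_count, 32));      /* 64 */
                return 0;
        }
 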
index 60944a4beadae611b2c2dd012683cb30052d0f6b..1afc4c47be73f906f58721aa227dd35fd30eff73 100644 (file)
@@ -237,4 +237,5 @@ static void __exit ipvtap_exit(void)
 module_exit(ipvtap_exit);
 MODULE_ALIAS_RTNL_LINK("ipvtap");
 MODULE_AUTHOR("Sainath Grandhi <sainath.grandhi@intel.com>");
+MODULE_DESCRIPTION("IP-VLAN based tap driver");
 MODULE_LICENSE("GPL");
index e34816638569e4e11d7a554a7f0fdc1fe6cb07b9..7f5426285c61b1e35afd74d4c044f80c77f34e7f 100644 (file)
@@ -607,11 +607,26 @@ static struct sk_buff *macsec_encrypt(struct sk_buff *skb,
                return ERR_PTR(-EINVAL);
        }
 
-       ret = skb_ensure_writable_head_tail(skb, dev);
-       if (unlikely(ret < 0)) {
-               macsec_txsa_put(tx_sa);
-               kfree_skb(skb);
-               return ERR_PTR(ret);
+       if (unlikely(skb_headroom(skb) < MACSEC_NEEDED_HEADROOM ||
+                    skb_tailroom(skb) < MACSEC_NEEDED_TAILROOM)) {
+               struct sk_buff *nskb = skb_copy_expand(skb,
+                                                      MACSEC_NEEDED_HEADROOM,
+                                                      MACSEC_NEEDED_TAILROOM,
+                                                      GFP_ATOMIC);
+               if (likely(nskb)) {
+                       consume_skb(skb);
+                       skb = nskb;
+               } else {
+                       macsec_txsa_put(tx_sa);
+                       kfree_skb(skb);
+                       return ERR_PTR(-ENOMEM);
+               }
+       } else {
+               skb = skb_unshare(skb, GFP_ATOMIC);
+               if (!skb) {
+                       macsec_txsa_put(tx_sa);
+                       return ERR_PTR(-ENOMEM);
+               }
        }
 
        unprotected_len = skb->len;
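 
The replacement block open-codes the expand-or-unshare behavior that skb_ensure_writable_head_tail() had folded away: if the skb lacks headroom for the SecTAG or tailroom for the ICV, take an expanded private copy; otherwise just unshare it so in-place encryption cannot scribble on a clone's shared data. One detail worth noting: the original skb is released with consume_skb() on the success path (it was replaced, not dropped) and kfree_skb() on the error path. A condensed sketch:

        /* Make an skb private with enough room for in-place crypto (sketch) */
        if (skb_headroom(skb) < need_head || skb_tailroom(skb) < need_tail) {
                struct sk_buff *nskb = skb_copy_expand(skb, need_head,
                                                       need_tail, GFP_ATOMIC);
                if (!nskb) {
                        kfree_skb(skb);         /* genuine drop */
                        return ERR_PTR(-ENOMEM);
                }
                consume_skb(skb);               /* replaced, not dropped */
                skb = nskb;
        } else {
                skb = skb_unshare(skb, GFP_ATOMIC);
                if (!skb)
                        return ERR_PTR(-ENOMEM);
        }
 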
index b4d3b9cde8bd685202f135cf9c845d1be76ef428..92a7a36b93ac0cc1b02a551b974fb390254ac484 100644 (file)
@@ -835,14 +835,14 @@ static void nsim_dev_trap_report_work(struct work_struct *work)
                                      trap_report_dw.work);
        nsim_dev = nsim_trap_data->nsim_dev;
 
-       /* For each running port and enabled packet trap, generate a UDP
-        * packet with a random 5-tuple and report it.
-        */
        if (!devl_trylock(priv_to_devlink(nsim_dev))) {
-               schedule_delayed_work(&nsim_dev->trap_data->trap_report_dw, 0);
+               schedule_delayed_work(&nsim_dev->trap_data->trap_report_dw, 1);
                return;
        }
 
+       /* For each running port and enabled packet trap, generate a UDP
+        * packet with a random 5-tuple and report it.
+        */
        list_for_each_entry(nsim_dev_port, &nsim_dev->port_list, list) {
                if (!netif_running(nsim_dev_port->ns->netdev))
                        continue;
index 69b829e6ab35b84a07f0063f3a6f7b48ea1a6de1..7fd3377dbd7960adb2af9f19999b1a940b55fc33 100644 (file)
@@ -131,4 +131,5 @@ int __devm_of_mdiobus_register(struct device *dev, struct mii_bus *mdio,
 EXPORT_SYMBOL(__devm_of_mdiobus_register);
 #endif /* CONFIG_OF_MDIO */
 
+MODULE_DESCRIPTION("Network MDIO bus devres helpers");
 MODULE_LICENSE("GPL");
index 8a20d9889f105bc609f56a2632132e0ef2c08504..0f3a1538a8b8ee045953a3c5ff308dc824ea7c0a 100644 (file)
@@ -489,7 +489,7 @@ static int tx_r50_fill_result(struct phy_device *phydev, u16 tx_r50_cal_val,
        u16 reg, val;
 
        if (phydev->drv->phy_id == MTK_GPHY_ID_MT7988)
-               bias = -2;
+               bias = -1;
 
        val = clamp_val(bias + tx_r50_cal_val, 0, 63);
 
@@ -705,6 +705,11 @@ restore:
 static void mt798x_phy_common_finetune(struct phy_device *phydev)
 {
        phy_select_page(phydev, MTK_PHY_PAGE_EXTENDED_52B5);
+       /* SlvDSPreadyTime = 24, MasDSPreadyTime = 24 */
+       __phy_write(phydev, 0x11, 0xc71);
+       __phy_write(phydev, 0x12, 0xc);
+       __phy_write(phydev, 0x10, 0x8fae);
+
        /* EnabRandUpdTrig = 1 */
        __phy_write(phydev, 0x11, 0x2f00);
        __phy_write(phydev, 0x12, 0xe);
@@ -715,15 +720,56 @@ static void mt798x_phy_common_finetune(struct phy_device *phydev)
        __phy_write(phydev, 0x12, 0x0);
        __phy_write(phydev, 0x10, 0x83aa);
 
-       /* TrFreeze = 0 */
+       /* FfeUpdGainForce = 1 (Enable), FfeUpdGainForceVal = 4 */
+       __phy_write(phydev, 0x11, 0x240);
+       __phy_write(phydev, 0x12, 0x0);
+       __phy_write(phydev, 0x10, 0x9680);
+
+       /* TrFreeze = 0 (mt7988 default) */
        __phy_write(phydev, 0x11, 0x0);
        __phy_write(phydev, 0x12, 0x0);
        __phy_write(phydev, 0x10, 0x9686);
 
+       /* SSTrKp100 = 5 */
+       /* SSTrKf100 = 6 */
+       /* SSTrKp1000Mas = 5 */
+       /* SSTrKf1000Mas = 6 */
        /* SSTrKp1000Slv = 5 */
+       /* SSTrKf1000Slv = 6 */
        __phy_write(phydev, 0x11, 0xbaef);
        __phy_write(phydev, 0x12, 0x2e);
        __phy_write(phydev, 0x10, 0x968c);
+       phy_restore_page(phydev, MTK_PHY_PAGE_STANDARD, 0);
+}
+
+static void mt7981_phy_finetune(struct phy_device *phydev)
+{
+       u16 val[8] = { 0x01ce, 0x01c1,
+                      0x020f, 0x0202,
+                      0x03d0, 0x03c0,
+                      0x0013, 0x0005 };
+       int i, k;
+
+       /* 100M eye finetune:
+        * Keep middle level of TX MLT3 shaper as default.
+        * Only change TX MLT3 overshoot level here.
+        */
+       for (k = 0, i = 1; i < 12; i++) {
+               if (i % 3 == 0)
+                       continue;
+               phy_write_mmd(phydev, MDIO_MMD_VEND1, i, val[k++]);
+       }
+
+       phy_select_page(phydev, MTK_PHY_PAGE_EXTENDED_52B5);
+       /* ResetSyncOffset = 6 */
+       __phy_write(phydev, 0x11, 0x600);
+       __phy_write(phydev, 0x12, 0x0);
+       __phy_write(phydev, 0x10, 0x8fc0);
+
+       /* VgaDecRate = 1 */
+       __phy_write(phydev, 0x11, 0x4c2a);
+       __phy_write(phydev, 0x12, 0x3e);
+       __phy_write(phydev, 0x10, 0x8fa4);
 
        /* MrvlTrFix100Kp = 3, MrvlTrFix100Kf = 2,
         * MrvlTrFix1000Kp = 3, MrvlTrFix1000Kf = 2
@@ -738,7 +784,7 @@ static void mt798x_phy_common_finetune(struct phy_device *phydev)
        __phy_write(phydev, 0x10, 0x8ec0);
        phy_restore_page(phydev, MTK_PHY_PAGE_STANDARD, 0);
 
-       /* TR_OPEN_LOOP_EN = 1, lpf_x_average = 9*/
+       /* TR_OPEN_LOOP_EN = 1, lpf_x_average = 9 */
        phy_modify_mmd(phydev, MDIO_MMD_VEND1, MTK_PHY_RG_DEV1E_REG234,
                       MTK_PHY_TR_OPEN_LOOP_EN_MASK | MTK_PHY_LPF_X_AVERAGE_MASK,
                       BIT(0) | FIELD_PREP(MTK_PHY_LPF_X_AVERAGE_MASK, 0x9));
@@ -771,48 +817,6 @@ static void mt798x_phy_common_finetune(struct phy_device *phydev)
        phy_write_mmd(phydev, MDIO_MMD_VEND1, MTK_PHY_LDO_OUTPUT_V, 0x2222);
 }
 
-static void mt7981_phy_finetune(struct phy_device *phydev)
-{
-       u16 val[8] = { 0x01ce, 0x01c1,
-                      0x020f, 0x0202,
-                      0x03d0, 0x03c0,
-                      0x0013, 0x0005 };
-       int i, k;
-
-       /* 100M eye finetune:
-        * Keep middle level of TX MLT3 shapper as default.
-        * Only change TX MLT3 overshoot level here.
-        */
-       for (k = 0, i = 1; i < 12; i++) {
-               if (i % 3 == 0)
-                       continue;
-               phy_write_mmd(phydev, MDIO_MMD_VEND1, i, val[k++]);
-       }
-
-       phy_select_page(phydev, MTK_PHY_PAGE_EXTENDED_52B5);
-       /* SlvDSPreadyTime = 24, MasDSPreadyTime = 24 */
-       __phy_write(phydev, 0x11, 0xc71);
-       __phy_write(phydev, 0x12, 0xc);
-       __phy_write(phydev, 0x10, 0x8fae);
-
-       /* ResetSyncOffset = 6 */
-       __phy_write(phydev, 0x11, 0x600);
-       __phy_write(phydev, 0x12, 0x0);
-       __phy_write(phydev, 0x10, 0x8fc0);
-
-       /* VgaDecRate = 1 */
-       __phy_write(phydev, 0x11, 0x4c2a);
-       __phy_write(phydev, 0x12, 0x3e);
-       __phy_write(phydev, 0x10, 0x8fa4);
-
-       /* FfeUpdGainForce = 4 */
-       __phy_write(phydev, 0x11, 0x240);
-       __phy_write(phydev, 0x12, 0x0);
-       __phy_write(phydev, 0x10, 0x9680);
-
-       phy_restore_page(phydev, MTK_PHY_PAGE_STANDARD, 0);
-}
-
 static void mt7988_phy_finetune(struct phy_device *phydev)
 {
        u16 val[12] = { 0x0187, 0x01cd, 0x01c8, 0x0182,
@@ -827,17 +831,7 @@ static void mt7988_phy_finetune(struct phy_device *phydev)
        /* TCT finetune */
        phy_write_mmd(phydev, MDIO_MMD_VEND1, MTK_PHY_RG_TX_FILTER, 0x5);
 
-       /* Disable TX power saving */
-       phy_modify_mmd(phydev, MDIO_MMD_VEND1, MTK_PHY_RXADC_CTRL_RG7,
-                      MTK_PHY_DA_AD_BUF_BIAS_LP_MASK, 0x3 << 8);
-
        phy_select_page(phydev, MTK_PHY_PAGE_EXTENDED_52B5);
-
-       /* SlvDSPreadyTime = 24, MasDSPreadyTime = 12 */
-       __phy_write(phydev, 0x11, 0x671);
-       __phy_write(phydev, 0x12, 0xc);
-       __phy_write(phydev, 0x10, 0x8fae);
-
        /* ResetSyncOffset = 5 */
        __phy_write(phydev, 0x11, 0x500);
        __phy_write(phydev, 0x12, 0x0);
@@ -845,13 +839,27 @@ static void mt7988_phy_finetune(struct phy_device *phydev)
 
        /* VgaDecRate is 1 at default on mt7988 */
 
-       phy_restore_page(phydev, MTK_PHY_PAGE_STANDARD, 0);
+       /* MrvlTrFix100Kp = 6, MrvlTrFix100Kf = 7,
+        * MrvlTrFix1000Kp = 6, MrvlTrFix1000Kf = 7
+        */
+       __phy_write(phydev, 0x11, 0xb90a);
+       __phy_write(phydev, 0x12, 0x6f);
+       __phy_write(phydev, 0x10, 0x8f82);
+
+       /* RemAckCntLimitCtrl = 1 */
+       __phy_write(phydev, 0x11, 0xfbba);
+       __phy_write(phydev, 0x12, 0xc3);
+       __phy_write(phydev, 0x10, 0x87f8);
 
-       phy_select_page(phydev, MTK_PHY_PAGE_EXTENDED_2A30);
-       /* TxClkOffset = 2 */
-       __phy_modify(phydev, MTK_PHY_ANARG_RG, MTK_PHY_TCLKOFFSET_MASK,
-                    FIELD_PREP(MTK_PHY_TCLKOFFSET_MASK, 0x2));
        phy_restore_page(phydev, MTK_PHY_PAGE_STANDARD, 0);
+
+       /* TR_OPEN_LOOP_EN = 1, lpf_x_average = 10 */
+       phy_modify_mmd(phydev, MDIO_MMD_VEND1, MTK_PHY_RG_DEV1E_REG234,
+                      MTK_PHY_TR_OPEN_LOOP_EN_MASK | MTK_PHY_LPF_X_AVERAGE_MASK,
+                      BIT(0) | FIELD_PREP(MTK_PHY_LPF_X_AVERAGE_MASK, 0xa));
+
+       /* rg_tr_lpf_cnt_val = 1023 */
+       phy_write_mmd(phydev, MDIO_MMD_VEND1, MTK_PHY_RG_LPF_CNT_VAL, 0x3ff);
 }
 
 static void mt798x_phy_eee(struct phy_device *phydev)
@@ -884,11 +892,11 @@ static void mt798x_phy_eee(struct phy_device *phydev)
                       MTK_PHY_LPI_SLV_SEND_TX_EN,
                       FIELD_PREP(MTK_PHY_LPI_SLV_SEND_TX_TIMER_MASK, 0x120));
 
-       phy_modify_mmd(phydev, MDIO_MMD_VEND1, MTK_PHY_RG_DEV1E_REG239,
-                      MTK_PHY_LPI_SEND_LOC_TIMER_MASK |
-                      MTK_PHY_LPI_TXPCS_LOC_RCV,
-                      FIELD_PREP(MTK_PHY_LPI_SEND_LOC_TIMER_MASK, 0x117));
+       /* Keep MTK_PHY_LPI_SEND_LOC_TIMER as 375 */
+       phy_clear_bits_mmd(phydev, MDIO_MMD_VEND1, MTK_PHY_RG_DEV1E_REG239,
+                          MTK_PHY_LPI_TXPCS_LOC_RCV);
 
+       /* This also fixes some IoT issues, such as CH340 */
        phy_modify_mmd(phydev, MDIO_MMD_VEND1, MTK_PHY_RG_DEV1E_REG2C7,
                       MTK_PHY_MAX_GAIN_MASK | MTK_PHY_MIN_GAIN_MASK,
                       FIELD_PREP(MTK_PHY_MAX_GAIN_MASK, 0x8) |
@@ -922,7 +930,7 @@ static void mt798x_phy_eee(struct phy_device *phydev)
        __phy_write(phydev, 0x12, 0x0);
        __phy_write(phydev, 0x10, 0x9690);
 
-       /* REG_EEE_st2TrKf1000 = 3 */
+       /* REG_EEE_st2TrKf1000 = 2 */
        __phy_write(phydev, 0x11, 0x114f);
        __phy_write(phydev, 0x12, 0x2);
        __phy_write(phydev, 0x10, 0x969a);
@@ -947,7 +955,7 @@ static void mt798x_phy_eee(struct phy_device *phydev)
        __phy_write(phydev, 0x12, 0x0);
        __phy_write(phydev, 0x10, 0x96b8);
 
-       /* REGEEE_wake_slv_tr_wait_dfesigdet_en = 1 */
+       /* REGEEE_wake_slv_tr_wait_dfesigdet_en = 0 */
        __phy_write(phydev, 0x11, 0x1463);
        __phy_write(phydev, 0x12, 0x0);
        __phy_write(phydev, 0x10, 0x96ca);
@@ -1459,6 +1467,13 @@ static int mt7988_phy_probe(struct phy_device *phydev)
        if (err)
                return err;
 
+       /* Disable TX power saving at probe time to:
+        * 1. Meet common mode compliance test criteria
+        * 2. Make sure that TX-VCM calibration works fine
+        */
+       phy_modify_mmd(phydev, MDIO_MMD_VEND1, MTK_PHY_RXADC_CTRL_RG7,
+                      MTK_PHY_DA_AD_BUF_BIAS_LP_MASK, 0x3 << 8);
+
        return mt798x_phy_calibration(phydev);
 }
 
index 81c20eb4b54b918517866a262404e3641988a66d..dad720138baafc57f3b7efb9afd36d82ec5a1b83 100644 (file)
  */
 #define LAN8814_1PPM_FORMAT                    17179
 
+#define PTP_RX_VERSION                         0x0248
+#define PTP_TX_VERSION                         0x0288
+#define PTP_MAX_VERSION(x)                     (((x) & GENMASK(7, 0)) << 8)
+#define PTP_MIN_VERSION(x)                     ((x) & GENMASK(7, 0))
+
 #define PTP_RX_MOD                             0x024F
 #define PTP_RX_MOD_BAD_UDPV4_CHKSUM_FORCE_FCS_DIS_ BIT(3)
 #define PTP_RX_TIMESTAMP_EN                    0x024D
@@ -3150,6 +3155,12 @@ static void lan8814_ptp_init(struct phy_device *phydev)
        lanphy_write_page_reg(phydev, 5, PTP_TX_PARSE_IP_ADDR_EN, 0);
        lanphy_write_page_reg(phydev, 5, PTP_RX_PARSE_IP_ADDR_EN, 0);
 
+       /* Disable checking for minorVersionPTP field */
+       lanphy_write_page_reg(phydev, 5, PTP_RX_VERSION,
+                             PTP_MAX_VERSION(0xff) | PTP_MIN_VERSION(0x0));
+       lanphy_write_page_reg(phydev, 5, PTP_TX_VERSION,
+                             PTP_MAX_VERSION(0xff) | PTP_MIN_VERSION(0x0));
+
        skb_queue_head_init(&ptp_priv->tx_queue);
        skb_queue_head_init(&ptp_priv->rx_queue);
        INIT_LIST_HEAD(&ptp_priv->rx_ts_list);
index 894172a3e15fe8a6a86e38b64246ebefcb65362b..337899c69738ec46c2b585db76e11fa25738560e 100644 (file)
@@ -421,9 +421,11 @@ static int rtl8211f_config_init(struct phy_device *phydev)
                                ERR_PTR(ret));
                        return ret;
                }
+
+               return genphy_soft_reset(phydev);
        }
 
-       return genphy_soft_reset(phydev);
+       return 0;
 }
 
 static int rtl821x_suspend(struct phy_device *phydev)
index 40ce8abe699954d106b216e8351925ec1cd9d3a1..cc7d1113ece0ee7d6cfa0e1830bbbdc664c28514 100644 (file)
@@ -1437,4 +1437,5 @@ static int __init plip_init (void)
 
 module_init(plip_init);
 module_exit(plip_cleanup_module);
+MODULE_DESCRIPTION("PLIP (parallel port) network module");
 MODULE_LICENSE("GPL");
index db0dc36d12e33ed7a481319c2d3a219ff88f3782..55954594e157e2eb6a4331e6da3b2d2ff547a4a3 100644 (file)
@@ -1166,5 +1166,6 @@ static void __exit bsdcomp_cleanup(void)
 
 module_init(bsdcomp_init);
 module_exit(bsdcomp_cleanup);
+MODULE_DESCRIPTION("PPP BSD-Compress compression module");
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_ALIAS("ppp-compress-" __stringify(CI_BSD_COMPRESS));
index 840da924708b393b16a82ab4e07746538214c0f9..c33c3db3cc0896d9b033aa2b188fbf46be8afd68 100644 (file)
@@ -87,6 +87,7 @@ struct asyncppp {
 static int flag_time = HZ;
 module_param(flag_time, int, 0);
 MODULE_PARM_DESC(flag_time, "ppp_async: interval between flagged packets (in clock ticks)");
+MODULE_DESCRIPTION("PPP async serial channel module");
 MODULE_LICENSE("GPL");
 MODULE_ALIAS_LDISC(N_PPP);
 
@@ -460,6 +461,10 @@ ppp_async_ioctl(struct ppp_channel *chan, unsigned int cmd, unsigned long arg)
        case PPPIOCSMRU:
                if (get_user(val, p))
                        break;
+               if (val > U16_MAX) {
+                       err = -EINVAL;
+                       break;
+               }
                if (val < PPP_MRU)
                        val = PPP_MRU;
                ap->mru = val;
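
    The new check rejects values that cannot fit the 16-bit MRU field instead of
    silently truncating them, while still rounding small values up to PPP_MRU. A
    standalone sketch of that validate-then-clamp pattern (PPP_MRU default taken
    from <linux/ppp_defs.h>):

    #include <errno.h>
    #include <stdint.h>
    #include <stdio.h>

    #define PPP_MRU 1500    /* default MRU, as in <linux/ppp_defs.h> */

    static int set_mru(long val, uint16_t *mru)
    {
            if (val > UINT16_MAX)
                    return -EINVAL;    /* cannot be represented on the wire */
            if (val < PPP_MRU)
                    val = PPP_MRU;     /* round small values up */
            *mru = (uint16_t)val;
            return 0;
    }

    int main(void)
    {
            uint16_t mru = 0;

            printf("set_mru(70000) = %d\n", set_mru(70000, &mru));              /* -22 */
            printf("set_mru(100)   = %d, mru = %u\n", set_mru(100, &mru), mru); /* 0, 1500 */
            return 0;
    }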
index e6d48e5c65a3379e12bbbd4679b1d0b326d3e93b..4d2ff63f2ee2f6bb02a07419513549890956d32e 100644 (file)
@@ -630,6 +630,7 @@ static void __exit deflate_cleanup(void)
 
 module_init(deflate_init);
 module_exit(deflate_cleanup);
+MODULE_DESCRIPTION("PPP Deflate compression module");
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_ALIAS("ppp-compress-" __stringify(CI_DEFLATE));
 MODULE_ALIAS("ppp-compress-" __stringify(CI_DEFLATE_DRAFT));
index 0193af2d31c9bcf5dc8864da49ba4e75ba0192fc..3dd52bf28f15bf9f260719f1bf61e45d6d48b3f7 100644 (file)
@@ -3604,6 +3604,7 @@ EXPORT_SYMBOL(ppp_input_error);
 EXPORT_SYMBOL(ppp_output_wakeup);
 EXPORT_SYMBOL(ppp_register_compressor);
 EXPORT_SYMBOL(ppp_unregister_compressor);
+MODULE_DESCRIPTION("Generic PPP layer driver");
 MODULE_LICENSE("GPL");
 MODULE_ALIAS_CHARDEV(PPP_MAJOR, 0);
 MODULE_ALIAS_RTNL_LINK("ppp");
index 52d05ce4a2819815963eebf4df399058835ff350..45bf59ac8f5711867ed1ba433d3f5e7800b769e4 100644 (file)
@@ -724,5 +724,6 @@ ppp_sync_cleanup(void)
 
 module_init(ppp_sync_init);
 module_exit(ppp_sync_cleanup);
+MODULE_DESCRIPTION("PPP synchronous TTY channel module");
 MODULE_LICENSE("GPL");
 MODULE_ALIAS_LDISC(N_SYNC_PPP);
index 8e7238e97d0a71708ebcddda9b1e1a50ab28c17d..2ea4f4890d23b5f1c5229c7f8b303ee85a954037 100644 (file)
@@ -1007,26 +1007,21 @@ static int pppoe_recvmsg(struct socket *sock, struct msghdr *m,
        struct sk_buff *skb;
        int error = 0;
 
-       if (sk->sk_state & PPPOX_BOUND) {
-               error = -EIO;
-               goto end;
-       }
+       if (sk->sk_state & PPPOX_BOUND)
+               return -EIO;
 
        skb = skb_recv_datagram(sk, flags, &error);
-       if (error < 0)
-               goto end;
+       if (!skb)
+               return error;
 
-       if (skb) {
-               total_len = min_t(size_t, total_len, skb->len);
-               error = skb_copy_datagram_msg(skb, 0, m, total_len);
-               if (error == 0) {
-                       consume_skb(skb);
-                       return total_len;
-               }
+       total_len = min_t(size_t, total_len, skb->len);
+       error = skb_copy_datagram_msg(skb, 0, m, total_len);
+       if (error == 0) {
+               consume_skb(skb);
+               return total_len;
        }
 
        kfree_skb(skb);
-end:
        return error;
 }
 
index afa5497f7c35c3ab5682e66440afc8a888d14414..8f95a562b8d0c471c44591629e04809f7faef9b2 100644 (file)
@@ -653,6 +653,7 @@ static void __tun_detach(struct tun_file *tfile, bool clean)
                                   tun->tfiles[tun->numqueues - 1]);
                ntfile = rtnl_dereference(tun->tfiles[index]);
                ntfile->queue_index = index;
+               ntfile->xdp_rxq.queue_index = index;
                rcu_assign_pointer(tun->tfiles[tun->numqueues - 1],
                                   NULL);
 
@@ -1630,13 +1631,19 @@ static int tun_xdp_act(struct tun_struct *tun, struct bpf_prog *xdp_prog,
        switch (act) {
        case XDP_REDIRECT:
                err = xdp_do_redirect(tun->dev, xdp, xdp_prog);
-               if (err)
+               if (err) {
+                       dev_core_stats_rx_dropped_inc(tun->dev);
                        return err;
+               }
+               dev_sw_netstats_rx_add(tun->dev, xdp->data_end - xdp->data);
                break;
        case XDP_TX:
                err = tun_xdp_tx(tun->dev, xdp);
-               if (err < 0)
+               if (err < 0) {
+                       dev_core_stats_rx_dropped_inc(tun->dev);
                        return err;
+               }
+               dev_sw_netstats_rx_add(tun->dev, xdp->data_end - xdp->data);
                break;
        case XDP_PASS:
                break;
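
    The tun hunk adds per-verdict accounting: a core-stats drop counter bump when
    redirect or transmit fails, and a software netstats byte credit when it succeeds.
    A kernel-style sketch of that shape (not a drop-in; assumes a device wired up for
    per-CPU core stats and sw netstats):

    #include <linux/netdevice.h>
    #include <net/xdp.h>

    static int account_xdp_verdict(struct net_device *dev, struct xdp_buff *xdp,
                                   u32 act, int err)
    {
            unsigned int len = xdp->data_end - xdp->data;

            switch (act) {
            case XDP_REDIRECT:
            case XDP_TX:
                    if (err) {
                            dev_core_stats_rx_dropped_inc(dev); /* count the drop */
                            return err;
                    }
                    dev_sw_netstats_rx_add(dev, len);           /* credit the bytes */
                    break;
            case XDP_PASS:
                    break;
            }
            return 0;
    }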
index 99ec1d4a972db8c1232ce8ee8eb8d97385a9b5f0..8b6d6a1b3c2eca086e77915e26428c1110127f4d 100644 (file)
@@ -232,7 +232,7 @@ static int dm9601_mdio_read(struct net_device *netdev, int phy_id, int loc)
        err = dm_read_shared_word(dev, 1, loc, &res);
        if (err < 0) {
                netdev_err(dev->net, "MDIO read error: %d\n", err);
-               return err;
+               return 0;
        }
 
        netdev_dbg(dev->net,
index a6d653ff552a261ca50d331dd7d7aa875ca3c362..d2aa2c5b1989da8a7e099dfdef88c087da3cf37b 100644 (file)
@@ -1501,7 +1501,9 @@ static int lan78xx_link_reset(struct lan78xx_net *dev)
 
                lan78xx_rx_urb_submit_all(dev);
 
+               local_bh_disable();
                napi_schedule(&dev->napi);
+               local_bh_enable();
        }
 
        return 0;
@@ -3033,7 +3035,8 @@ static int lan78xx_reset(struct lan78xx_net *dev)
        if (dev->chipid == ID_REV_CHIP_ID_7801_)
                buf &= ~MAC_CR_GMII_EN_;
 
-       if (dev->chipid == ID_REV_CHIP_ID_7800_) {
+       if (dev->chipid == ID_REV_CHIP_ID_7800_ ||
+           dev->chipid == ID_REV_CHIP_ID_7850_) {
                ret = lan78xx_read_raw_eeprom(dev, 0, 1, &sig);
                if (!ret && sig != EEPROM_INDICATOR) {
                        /* Implies there is no external eeprom. Set mac speed */
@@ -3132,7 +3135,8 @@ static int lan78xx_open(struct net_device *net)
 done:
        mutex_unlock(&dev->dev_mutex);
 
-       usb_autopm_put_interface(dev->intf);
+       if (ret < 0)
+               usb_autopm_put_interface(dev->intf);
 
        return ret;
 }
index a530f20ee257550141e5ec7c17b5fba0087db248..2fa46baa589e5e87e12e145fe46268bdaf9fc219 100644 (file)
@@ -2104,6 +2104,11 @@ static const struct usb_device_id products[] = {
                USB_DEVICE(0x0424, 0x9E08),
                .driver_info = (unsigned long) &smsc95xx_info,
        },
+       {
+               /* SYSTEC USB-SPEmodule1 10BASE-T1L Ethernet Device */
+               USB_DEVICE(0x0878, 0x1400),
+               .driver_info = (unsigned long)&smsc95xx_info,
+       },
        {
                /* Microchip's EVB-LAN8670-USB 10BASE-T1S Ethernet Device */
                USB_DEVICE(0x184F, 0x0051),
index 578e36ea1589c11f1ca26b6e05a84b455d22999e..cd4a6fe458f95d7bbc3c468ae8585d06cf0ac097 100644 (file)
@@ -1208,14 +1208,6 @@ static int veth_enable_xdp(struct net_device *dev)
                                veth_disable_xdp_range(dev, 0, dev->real_num_rx_queues, true);
                                return err;
                        }
-
-                       if (!veth_gro_requested(dev)) {
-                               /* user-space did not require GRO, but adding XDP
-                                * is supposed to get GRO working
-                                */
-                               dev->features |= NETIF_F_GRO;
-                               netdev_features_change(dev);
-                       }
                }
        }
 
@@ -1235,18 +1227,9 @@ static void veth_disable_xdp(struct net_device *dev)
        for (i = 0; i < dev->real_num_rx_queues; i++)
                rcu_assign_pointer(priv->rq[i].xdp_prog, NULL);
 
-       if (!netif_running(dev) || !veth_gro_requested(dev)) {
+       if (!netif_running(dev) || !veth_gro_requested(dev))
                veth_napi_del(dev);
 
-               /* if user-space did not require GRO, since adding XDP
-                * enabled it, clear it now
-                */
-               if (!veth_gro_requested(dev) && netif_running(dev)) {
-                       dev->features &= ~NETIF_F_GRO;
-                       netdev_features_change(dev);
-               }
-       }
-
        veth_disable_xdp_range(dev, 0, dev->real_num_rx_queues, false);
 }
 
@@ -1478,7 +1461,8 @@ static int veth_alloc_queues(struct net_device *dev)
        struct veth_priv *priv = netdev_priv(dev);
        int i;
 
-       priv->rq = kcalloc(dev->num_rx_queues, sizeof(*priv->rq), GFP_KERNEL_ACCOUNT);
+       priv->rq = kvcalloc(dev->num_rx_queues, sizeof(*priv->rq),
+                           GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL);
        if (!priv->rq)
                return -ENOMEM;
 
@@ -1494,7 +1478,7 @@ static void veth_free_queues(struct net_device *dev)
 {
        struct veth_priv *priv = netdev_priv(dev);
 
-       kfree(priv->rq);
+       kvfree(priv->rq);
 }
 
 static int veth_dev_init(struct net_device *dev)
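
    The switch from kcalloc() to kvcalloc() lets a large per-queue array fall back to
    vmalloc'ed memory when contiguous pages are scarce, and __GFP_RETRY_MAYFAIL asks
    the allocator to try hard but return NULL rather than invoke the OOM killer. Since
    the backing may be either kind, kvfree() becomes the only correct way to free. A
    kernel-style sketch of the pairing (struct name is a stand-in):

    #include <linux/mm.h>
    #include <linux/slab.h>

    struct demo_rq { int placeholder; };    /* stand-in for struct veth_rq */

    static struct demo_rq *demo_alloc_queues(unsigned int n)
    {
            /* may be kmalloc- or vmalloc-backed; caller must use kvfree() */
            return kvcalloc(n, sizeof(struct demo_rq),
                            GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL);
    }

    static void demo_free_queues(struct demo_rq *rq)
    {
            kvfree(rq);     /* handles both backings */
    }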
@@ -1654,6 +1638,14 @@ static int veth_xdp_set(struct net_device *dev, struct bpf_prog *prog,
                }
 
                if (!old_prog) {
+                       if (!veth_gro_requested(dev)) {
+                               /* user-space did not request GRO, but adding
+                                * XDP is supposed to get GRO working
+                                */
+                               dev->features |= NETIF_F_GRO;
+                               netdev_features_change(dev);
+                       }
+
                        peer->hw_features &= ~NETIF_F_GSO_SOFTWARE;
                        peer->max_mtu = max_mtu;
                }
@@ -1669,6 +1661,14 @@ static int veth_xdp_set(struct net_device *dev, struct bpf_prog *prog,
                        if (dev->flags & IFF_UP)
                                veth_disable_xdp(dev);
 
+                       /* if user-space did not request GRO, it was only
+                        * enabled because XDP was added, so clear it now
+                        */
+                       if (!veth_gro_requested(dev)) {
+                               dev->features &= ~NETIF_F_GRO;
+                               netdev_features_change(dev);
+                       }
+
                        if (peer) {
                                peer->hw_features |= NETIF_F_GSO_SOFTWARE;
                                peer->max_mtu = ETH_MAX_MTU;
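
    Both veth hunks move the implicit GRO toggling from XDP enable/disable time to the
    actual program attach/detach in veth_xdp_set(), keeping the flag symmetric across
    reattach. The underlying pattern is small; a kernel-style sketch (not a drop-in):

    #include <linux/netdevice.h>

    /* Flip NETIF_F_GRO and tell user space the feature set changed. */
    static void demo_set_gro(struct net_device *dev, bool enable)
    {
            if (enable)
                    dev->features |= NETIF_F_GRO;
            else
                    dev->features &= ~NETIF_F_GRO;
            netdev_features_change(dev);
    }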
index 43e0db78d42beccfc2883050bb2665c191e675f8..a742cec44e3db823ae3fa85d6161e20d10dc64fb 100644 (file)
@@ -1803,5 +1803,6 @@ static struct usb_driver ar5523_driver = {
 
 module_usb_driver(ar5523_driver);
 
+MODULE_DESCRIPTION("Atheros AR5523 wireless driver");
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_FIRMWARE(AR5523_FIRMWARE_FILE);
index 7e3b6779f4e969369a9e6713b9235241efa9ac44..02e160d831bed13f3358034048ce5b03d36dc090 100644 (file)
@@ -368,10 +368,6 @@ struct ath11k_vif {
        struct ieee80211_chanctx_conf chanctx;
        struct ath11k_arp_ns_offload arp_ns_offload;
        struct ath11k_rekey_data rekey_data;
-
-#ifdef CONFIG_ATH11K_DEBUGFS
-       struct dentry *debugfs_twt;
-#endif /* CONFIG_ATH11K_DEBUGFS */
 };
 
 struct ath11k_vif_iter {
index a847bc0d50c0f0b955e93947e49b771d41756ea1..a48e737ef35d661f670373617bef8f0525358543 100644 (file)
@@ -1894,35 +1894,30 @@ static const struct file_operations ath11k_fops_twt_resume_dialog = {
        .open = simple_open
 };
 
-void ath11k_debugfs_add_interface(struct ath11k_vif *arvif)
+void ath11k_debugfs_op_vif_add(struct ieee80211_hw *hw,
+                              struct ieee80211_vif *vif)
 {
+       struct ath11k_vif *arvif = ath11k_vif_to_arvif(vif);
        struct ath11k_base *ab = arvif->ar->ab;
+       struct dentry *debugfs_twt;
 
        if (arvif->vif->type != NL80211_IFTYPE_AP &&
            !(arvif->vif->type == NL80211_IFTYPE_STATION &&
              test_bit(WMI_TLV_SERVICE_STA_TWT, ab->wmi_ab.svc_map)))
                return;
 
-       arvif->debugfs_twt = debugfs_create_dir("twt",
-                                               arvif->vif->debugfs_dir);
-       debugfs_create_file("add_dialog", 0200, arvif->debugfs_twt,
+       debugfs_twt = debugfs_create_dir("twt",
+                                        arvif->vif->debugfs_dir);
+       debugfs_create_file("add_dialog", 0200, debugfs_twt,
                            arvif, &ath11k_fops_twt_add_dialog);
 
-       debugfs_create_file("del_dialog", 0200, arvif->debugfs_twt,
+       debugfs_create_file("del_dialog", 0200, debugfs_twt,
                            arvif, &ath11k_fops_twt_del_dialog);
 
-       debugfs_create_file("pause_dialog", 0200, arvif->debugfs_twt,
+       debugfs_create_file("pause_dialog", 0200, debugfs_twt,
                            arvif, &ath11k_fops_twt_pause_dialog);
 
-       debugfs_create_file("resume_dialog", 0200, arvif->debugfs_twt,
+       debugfs_create_file("resume_dialog", 0200, debugfs_twt,
                            arvif, &ath11k_fops_twt_resume_dialog);
 }
 
-void ath11k_debugfs_remove_interface(struct ath11k_vif *arvif)
-{
-       if (!arvif->debugfs_twt)
-               return;
-
-       debugfs_remove_recursive(arvif->debugfs_twt);
-       arvif->debugfs_twt = NULL;
-}
index 44d15845f39a6735f3ef15224ea12ace13079ef4..a39e458637b01366b430e138bbc53126196b512f 100644 (file)
@@ -307,8 +307,8 @@ static inline int ath11k_debugfs_rx_filter(struct ath11k *ar)
        return ar->debug.rx_filter;
 }
 
-void ath11k_debugfs_add_interface(struct ath11k_vif *arvif);
-void ath11k_debugfs_remove_interface(struct ath11k_vif *arvif);
+void ath11k_debugfs_op_vif_add(struct ieee80211_hw *hw,
+                              struct ieee80211_vif *vif);
 void ath11k_debugfs_add_dbring_entry(struct ath11k *ar,
                                     enum wmi_direct_buffer_module id,
                                     enum ath11k_dbg_dbr_event event,
@@ -387,14 +387,6 @@ static inline int ath11k_debugfs_get_fw_stats(struct ath11k *ar,
        return 0;
 }
 
-static inline void ath11k_debugfs_add_interface(struct ath11k_vif *arvif)
-{
-}
-
-static inline void ath11k_debugfs_remove_interface(struct ath11k_vif *arvif)
-{
-}
-
 static inline void
 ath11k_debugfs_add_dbring_entry(struct ath11k *ar,
                                enum wmi_direct_buffer_module id,
index db241589424d519607429b34ffd9946b32c525a9..b13525bbbb8087acbdc15247a0a428a74fd5f8b9 100644 (file)
@@ -6756,13 +6756,6 @@ static int ath11k_mac_op_add_interface(struct ieee80211_hw *hw,
                goto err;
        }
 
-       /* In the case of hardware recovery, debugfs files are
-        * not deleted since ieee80211_ops.remove_interface() is
-        * not invoked. In such cases, try to delete the files.
-        * These will be re-created later.
-        */
-       ath11k_debugfs_remove_interface(arvif);
-
        memset(arvif, 0, sizeof(*arvif));
 
        arvif->ar = ar;
@@ -6939,8 +6932,6 @@ static int ath11k_mac_op_add_interface(struct ieee80211_hw *hw,
 
        ath11k_dp_vdev_tx_attach(ar, arvif);
 
-       ath11k_debugfs_add_interface(arvif);
-
        if (vif->type != NL80211_IFTYPE_MONITOR &&
            test_bit(ATH11K_FLAG_MONITOR_CONF_ENABLED, &ar->monitor_flags)) {
                ret = ath11k_mac_monitor_vdev_create(ar);
@@ -7056,8 +7047,6 @@ err_vdev_del:
        /* Recalc txpower for remaining vdev */
        ath11k_mac_txpower_recalc(ar);
 
-       ath11k_debugfs_remove_interface(arvif);
-
        /* TODO: recal traffic pause state based on the available vdevs */
 
        mutex_unlock(&ar->conf_mutex);
@@ -9153,6 +9142,7 @@ static const struct ieee80211_ops ath11k_ops = {
 #endif
 
 #ifdef CONFIG_ATH11K_DEBUGFS
+       .vif_add_debugfs                = ath11k_debugfs_op_vif_add,
        .sta_add_debugfs                = ath11k_debugfs_sta_op_add,
 #endif
 
index 41119fb177e306f30280d1a1d83ae5583976668d..4e6b4df8562f632e34089619f7a9b485b5e71595 100644 (file)
@@ -1685,6 +1685,7 @@ static struct platform_driver wcn36xx_driver = {
 
 module_platform_driver(wcn36xx_driver);
 
+MODULE_DESCRIPTION("Qualcomm Atheros WCN3660/3680 wireless driver");
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_AUTHOR("Eugene Krasnikov k.eugene.e@gmail.com");
 MODULE_FIRMWARE(WLAN_NV_FILE);
index d55f3271d6190234220afd12ac8f6eb7a1d78f64..4f0c1e1a8e605daa4bcf907006bbd8e9a07490fa 100644 (file)
@@ -20,6 +20,7 @@ static void __exit brcmf_bca_exit(void)
        brcmf_fwvid_unregister_vendor(BRCMF_FWVENDOR_BCA, THIS_MODULE);
 }
 
+MODULE_DESCRIPTION("Broadcom FullMAC WLAN driver plugin for Broadcom AP chipsets");
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_IMPORT_NS(BRCMFMAC);
 
index 133c5ea6429cd0e17baea181209c4d701e662d0c..28d6a30cc0106d6a38b51e35c1f518accfdbe987 100644 (file)
@@ -3779,8 +3779,10 @@ static int brcmf_internal_escan_add_info(struct cfg80211_scan_request *req,
                if (req->channels[i] == chan)
                        break;
        }
-       if (i == req->n_channels)
-               req->channels[req->n_channels++] = chan;
+       if (i == req->n_channels) {
+               req->n_channels++;
+               req->channels[i] = chan;
+       }
 
        for (i = 0; i < req->n_ssids; i++) {
                if (req->ssids[i].ssid_len == ssid_len &&
index f82fbbe3ecefb7af1019281b3f031f45b9ec30e6..90d06cda03a2f007e9f00c636a22a4a130670dff 100644 (file)
@@ -20,6 +20,7 @@ static void __exit brcmf_cyw_exit(void)
        brcmf_fwvid_unregister_vendor(BRCMF_FWVENDOR_CYW, THIS_MODULE);
 }
 
+MODULE_DESCRIPTION("Broadcom FullMAC WLAN driver plugin for Cypress/Infineon chipsets");
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_IMPORT_NS(BRCMFMAC);
 
index 02918d434556b04d797a4141f3dcaede15a7b494..b66135e3cff476a95c5482e099975fd01849bedf 100644 (file)
@@ -20,6 +20,7 @@ static void __exit brcmf_wcc_exit(void)
        brcmf_fwvid_unregister_vendor(BRCMF_FWVENDOR_WCC, THIS_MODULE);
 }
 
+MODULE_DESCRIPTION("Broadcom FullMAC WLAN driver plugin for Broadcom mobility chipsets");
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_IMPORT_NS(BRCMFMAC);
 
index b96f30d11644e24eb3886e5f2536d6eaad4d01db..dcc4810cb32472dbe8ee374091a2d58241af80c8 100644 (file)
@@ -618,7 +618,7 @@ int iwl_sar_get_wrds_table(struct iwl_fw_runtime *fwrt)
                                         &tbl_rev);
        if (!IS_ERR(wifi_pkg)) {
                if (tbl_rev != 2) {
-                       ret = PTR_ERR(wifi_pkg);
+                       ret = -EINVAL;
                        goto out_free;
                }
 
@@ -634,7 +634,7 @@ int iwl_sar_get_wrds_table(struct iwl_fw_runtime *fwrt)
                                         &tbl_rev);
        if (!IS_ERR(wifi_pkg)) {
                if (tbl_rev != 1) {
-                       ret = PTR_ERR(wifi_pkg);
+                       ret = -EINVAL;
                        goto out_free;
                }
 
@@ -650,7 +650,7 @@ int iwl_sar_get_wrds_table(struct iwl_fw_runtime *fwrt)
                                         &tbl_rev);
        if (!IS_ERR(wifi_pkg)) {
                if (tbl_rev != 0) {
-                       ret = PTR_ERR(wifi_pkg);
+                       ret = -EINVAL;
                        goto out_free;
                }
 
@@ -707,7 +707,7 @@ int iwl_sar_get_ewrd_table(struct iwl_fw_runtime *fwrt)
                                         &tbl_rev);
        if (!IS_ERR(wifi_pkg)) {
                if (tbl_rev != 2) {
-                       ret = PTR_ERR(wifi_pkg);
+                       ret = -EINVAL;
                        goto out_free;
                }
 
@@ -723,7 +723,7 @@ int iwl_sar_get_ewrd_table(struct iwl_fw_runtime *fwrt)
                                         &tbl_rev);
        if (!IS_ERR(wifi_pkg)) {
                if (tbl_rev != 1) {
-                       ret = PTR_ERR(wifi_pkg);
+                       ret = -EINVAL;
                        goto out_free;
                }
 
@@ -739,7 +739,7 @@ int iwl_sar_get_ewrd_table(struct iwl_fw_runtime *fwrt)
                                         &tbl_rev);
        if (!IS_ERR(wifi_pkg)) {
                if (tbl_rev != 0) {
-                       ret = PTR_ERR(wifi_pkg);
+                       ret = -EINVAL;
                        goto out_free;
                }
 
@@ -1116,6 +1116,9 @@ int iwl_acpi_get_ppag_table(struct iwl_fw_runtime *fwrt)
                goto read_table;
        }
 
+       ret = PTR_ERR(wifi_pkg);
+       goto out_free;
+
 read_table:
        fwrt->ppag_ver = tbl_rev;
        flags = &wifi_pkg->package.elements[1];
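
    All of the s/PTR_ERR(wifi_pkg)/-EINVAL/ changes above fix the same misuse:
    PTR_ERR() only decodes an errno from a pointer for which IS_ERR() is true; inside
    an !IS_ERR() branch it just reinterprets the pointer bits and yields a meaningless
    value. A short illustration of the convention (helper name is hypothetical):

    #include <linux/err.h>

    static int demo_check_pkg(void *wifi_pkg, int tbl_rev, int want_rev)
    {
            if (IS_ERR(wifi_pkg))
                    return PTR_ERR(wifi_pkg);    /* valid: decodes the errno */

            /* pointer is fine, so a failure here needs an explicit errno */
            if (tbl_rev != want_rev)
                    return -EINVAL;

            return 0;
    }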
index 798731ecbefde7f625d0cf00ef688f10281727be..b740c65a7dca25807ac648873a0df14796984d1b 100644 (file)
@@ -537,7 +537,7 @@ enum iwl_fw_dbg_config_cmd_type {
 }; /* LDBG_CFG_CMD_TYPE_API_E_VER_1 */
 
 /* this token disables debug asserts in the firmware */
-#define IWL_FW_DBG_CONFIG_TOKEN 0x00011301
+#define IWL_FW_DBG_CONFIG_TOKEN 0x00010001
 
 /**
  * struct iwl_fw_dbg_config_cmd - configure FW debug
index 9c69d3674384609b8a7c376900e07a04441c24b0..e6c0f928a6bbf338ca240214635313c05c4e8751 100644 (file)
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
 /*
- * Copyright (C) 2005-2014, 2019-2021, 2023 Intel Corporation
+ * Copyright (C) 2005-2014, 2019-2021, 2023-2024 Intel Corporation
  * Copyright (C) 2013-2015 Intel Mobile Communications GmbH
  * Copyright (C) 2016-2017 Intel Deutschland GmbH
  */
@@ -66,6 +66,16 @@ enum iwl_gen2_tx_fifo {
        IWL_GEN2_TRIG_TX_FIFO_VO,
 };
 
+enum iwl_bz_tx_fifo {
+       IWL_BZ_EDCA_TX_FIFO_BK,
+       IWL_BZ_EDCA_TX_FIFO_BE,
+       IWL_BZ_EDCA_TX_FIFO_VI,
+       IWL_BZ_EDCA_TX_FIFO_VO,
+       IWL_BZ_TRIG_TX_FIFO_BK,
+       IWL_BZ_TRIG_TX_FIFO_BE,
+       IWL_BZ_TRIG_TX_FIFO_VI,
+       IWL_BZ_TRIG_TX_FIFO_VO,
+};
 /**
  * enum iwl_tx_queue_cfg_actions - TXQ config options
  * @TX_QUEUE_CFG_ENABLE_QUEUE: enable a queue
index e27774e7ed74d82bbbb9821f24a2bc3a1578395b..80fda056e46a698458ee4ecf1230f7ef315e3a2a 100644 (file)
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
 /*
- * Copyright (C) 2005-2014, 2018-2023 Intel Corporation
+ * Copyright (C) 2005-2014, 2018-2024 Intel Corporation
  * Copyright (C) 2013-2015 Intel Mobile Communications GmbH
  * Copyright (C) 2015-2017 Intel Deutschland GmbH
  */
@@ -19,7 +19,6 @@
  * @fwrt_ptr: pointer to the buffer coming from fwrt
  * @trans_ptr: pointer to struct %iwl_trans_dump_data which contains the
  *     transport's data.
- * @trans_len: length of the valid data in trans_ptr
  * @fwrt_len: length of the valid data in fwrt_ptr
  */
 struct iwl_fw_dump_ptrs {
index 3b14f647674350e3fdef138eb880b07df6eb5770..72075720969c06b2378d84d48305e84df467f201 100644 (file)
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
 /*
- * Copyright (C) 2018-2023 Intel Corporation
+ * Copyright (C) 2018-2024 Intel Corporation
  */
 #include <linux/firmware.h>
 #include "iwl-drv.h"
@@ -1096,7 +1096,7 @@ static int iwl_dbg_tlv_override_trig_node(struct iwl_fw_runtime *fwrt,
                node_trig = (void *)node_tlv->data;
        }
 
-       memcpy(node_trig->data + offset, trig->data, trig_data_len);
+       memcpy((u8 *)node_trig->data + offset, trig->data, trig_data_len);
        node_tlv->length = cpu_to_le32(size);
 
        if (policy & IWL_FW_INI_APPLY_POLICY_OVERRIDE_CFG) {
index ffe2670720c9257c30cd86aa3f5edf7386bce602..abf8001bdac179b7e4b40897182f8b54532d6c97 100644 (file)
@@ -128,6 +128,7 @@ static void iwl_dealloc_ucode(struct iwl_drv *drv)
        kfree(drv->fw.ucode_capa.cmd_versions);
        kfree(drv->fw.phy_integration_ver);
        kfree(drv->trans->dbg.pc_data);
+       drv->trans->dbg.pc_data = NULL;
 
        for (i = 0; i < IWL_UCODE_TYPE_MAX; i++)
                iwl_free_fw_img(drv, drv->fw.img + i);
index 402896988686990fdd7ea9410f8e01819a7a1bc4..2f6774ec37b2286f1fb72e120a18435e804fcb58 100644 (file)
@@ -668,7 +668,6 @@ static const struct ieee80211_sband_iftype_data iwl_he_eht_capa[] = {
                        .has_eht = true,
                        .eht_cap_elem = {
                                .mac_cap_info[0] =
-                                       IEEE80211_EHT_MAC_CAP0_EPCS_PRIO_ACCESS |
                                        IEEE80211_EHT_MAC_CAP0_OM_CONTROL |
                                        IEEE80211_EHT_MAC_CAP0_TRIG_TXOP_SHARING_MODE1 |
                                        IEEE80211_EHT_MAC_CAP0_TRIG_TXOP_SHARING_MODE2 |
@@ -793,7 +792,6 @@ static const struct ieee80211_sband_iftype_data iwl_he_eht_capa[] = {
                        .has_eht = true,
                        .eht_cap_elem = {
                                .mac_cap_info[0] =
-                                       IEEE80211_EHT_MAC_CAP0_EPCS_PRIO_ACCESS |
                                        IEEE80211_EHT_MAC_CAP0_OM_CONTROL |
                                        IEEE80211_EHT_MAC_CAP0_TRIG_TXOP_SHARING_MODE1 |
                                        IEEE80211_EHT_MAC_CAP0_TRIG_TXOP_SHARING_MODE2,
@@ -1020,8 +1018,7 @@ iwl_nvm_fixup_sband_iftd(struct iwl_trans *trans,
        if (CSR_HW_REV_TYPE(trans->hw_rev) == IWL_CFG_MAC_TYPE_GL &&
            iftype_data->eht_cap.has_eht) {
                iftype_data->eht_cap.eht_cap_elem.mac_cap_info[0] &=
-                       ~(IEEE80211_EHT_MAC_CAP0_EPCS_PRIO_ACCESS |
-                         IEEE80211_EHT_MAC_CAP0_TRIG_TXOP_SHARING_MODE1 |
+                       ~(IEEE80211_EHT_MAC_CAP0_TRIG_TXOP_SHARING_MODE1 |
                          IEEE80211_EHT_MAC_CAP0_TRIG_TXOP_SHARING_MODE2);
                iftype_data->eht_cap.eht_cap_elem.phy_cap_info[3] &=
                        ~(IEEE80211_EHT_PHY_CAP0_PARTIAL_BW_UL_MU_MIMO |
index 4582afb149d720d077f30c0e7bb1814e5106d453..05b64176859e809986082c002f91eef247e83add 100644 (file)
@@ -1279,7 +1279,9 @@ static int __iwl_mvm_suspend(struct ieee80211_hw *hw,
 
                mvm->net_detect = true;
        } else {
-               struct iwl_wowlan_config_cmd wowlan_config_cmd = {};
+               struct iwl_wowlan_config_cmd wowlan_config_cmd = {
+                       .offloading_tid = 0,
+               };
 
                wowlan_config_cmd.sta_id = mvmvif->deflink.ap_sta_id;
 
@@ -1291,6 +1293,11 @@ static int __iwl_mvm_suspend(struct ieee80211_hw *hw,
                        goto out_noreset;
                }
 
+               ret = iwl_mvm_sta_ensure_queue(
+                       mvm, ap_sta->txq[wowlan_config_cmd.offloading_tid]);
+               if (ret)
+                       goto out_noreset;
+
                ret = iwl_mvm_get_wowlan_config(mvm, wowlan, &wowlan_config_cmd,
                                                vif, mvmvif, ap_sta);
                if (ret)
index c4f96125cf33af0eb066c3950e6dba18d505c4f4..25a5a31e63c2a33a0fc0bbe7317f65df62b9e8de 100644 (file)
@@ -31,6 +31,17 @@ const u8 iwl_mvm_ac_to_gen2_tx_fifo[] = {
        IWL_GEN2_TRIG_TX_FIFO_BK,
 };
 
+const u8 iwl_mvm_ac_to_bz_tx_fifo[] = {
+       IWL_BZ_EDCA_TX_FIFO_VO,
+       IWL_BZ_EDCA_TX_FIFO_VI,
+       IWL_BZ_EDCA_TX_FIFO_BE,
+       IWL_BZ_EDCA_TX_FIFO_BK,
+       IWL_BZ_TRIG_TX_FIFO_VO,
+       IWL_BZ_TRIG_TX_FIFO_VI,
+       IWL_BZ_TRIG_TX_FIFO_BE,
+       IWL_BZ_TRIG_TX_FIFO_BK,
+};
+
 struct iwl_mvm_mac_iface_iterator_data {
        struct iwl_mvm *mvm;
        struct ieee80211_vif *vif;
index 7f13dff04b265caf265f24662d7609f60289120d..53e26c3c3a9af616ac057428503edf6270d53b3d 100644 (file)
@@ -1600,7 +1600,8 @@ static int iwl_mvm_mac_add_interface(struct ieee80211_hw *hw,
         */
        if (vif->type == NL80211_IFTYPE_AP ||
            vif->type == NL80211_IFTYPE_ADHOC) {
-               iwl_mvm_vif_dbgfs_add_link(mvm, vif);
+               if (!test_bit(IWL_MVM_STATUS_IN_HW_RESTART, &mvm->status))
+                       iwl_mvm_vif_dbgfs_add_link(mvm, vif);
                ret = 0;
                goto out;
        }
@@ -1640,7 +1641,8 @@ static int iwl_mvm_mac_add_interface(struct ieee80211_hw *hw,
                        iwl_mvm_chandef_get_primary_80(&vif->bss_conf.chandef);
        }
 
-       iwl_mvm_vif_dbgfs_add_link(mvm, vif);
+       if (!test_bit(IWL_MVM_STATUS_IN_HW_RESTART, &mvm->status))
+               iwl_mvm_vif_dbgfs_add_link(mvm, vif);
 
        if (!test_bit(IWL_MVM_STATUS_IN_HW_RESTART, &mvm->status) &&
            vif->type == NL80211_IFTYPE_STATION && !vif->p2p &&
@@ -3685,6 +3687,9 @@ iwl_mvm_sta_state_notexist_to_none(struct iwl_mvm *mvm,
                                           NL80211_TDLS_SETUP);
        }
 
+       if (ret)
+               return ret;
+
        for_each_sta_active_link(vif, sta, link_sta, i)
                link_sta->agg.max_rc_amsdu_len = 1;
 
index 61170173f917a00707fc63956b8d7f252737c809..893b69fc841b896b234078240631d72c792a4c7e 100644 (file)
@@ -81,7 +81,8 @@ static int iwl_mvm_mld_mac_add_interface(struct ieee80211_hw *hw,
                ieee80211_hw_set(mvm->hw, RX_INCLUDES_FCS);
        }
 
-       iwl_mvm_vif_dbgfs_add_link(mvm, vif);
+       if (!test_bit(IWL_MVM_STATUS_IN_HW_RESTART, &mvm->status))
+               iwl_mvm_vif_dbgfs_add_link(mvm, vif);
 
        if (!test_bit(IWL_MVM_STATUS_IN_HW_RESTART, &mvm->status) &&
            vif->type == NL80211_IFTYPE_STATION && !vif->p2p &&
@@ -437,6 +438,9 @@ __iwl_mvm_mld_unassign_vif_chanctx(struct iwl_mvm *mvm,
                mvmvif->ap_ibss_active = false;
        }
 
+       iwl_mvm_link_changed(mvm, vif, link_conf,
+                            LINK_CONTEXT_MODIFY_ACTIVE, false);
+
        if (iwl_mvm_is_esr_supported(mvm->fwrt.trans) && n_active > 1) {
                int ret = iwl_mvm_esr_mode_inactive(mvm, vif);
 
@@ -448,9 +452,6 @@ __iwl_mvm_mld_unassign_vif_chanctx(struct iwl_mvm *mvm,
        if (vif->type == NL80211_IFTYPE_MONITOR)
                iwl_mvm_mld_rm_snif_sta(mvm, vif);
 
-       iwl_mvm_link_changed(mvm, vif, link_conf,
-                            LINK_CONTEXT_MODIFY_ACTIVE, false);
-
        if (switching_chanctx)
                return;
        mvmvif->link[link_id]->phy_ctxt = NULL;
index 40627961b834a2ee860445b4557cf19498f4e166..81dbef6947f5578dd50e12f157124fa48fcdb728 100644 (file)
@@ -1581,12 +1581,16 @@ static inline int iwl_mvm_max_active_links(struct iwl_mvm *mvm,
 
 extern const u8 iwl_mvm_ac_to_tx_fifo[];
 extern const u8 iwl_mvm_ac_to_gen2_tx_fifo[];
+extern const u8 iwl_mvm_ac_to_bz_tx_fifo[];
 
 static inline u8 iwl_mvm_mac_ac_to_tx_fifo(struct iwl_mvm *mvm,
                                           enum ieee80211_ac_numbers ac)
 {
-       return iwl_mvm_has_new_tx_api(mvm) ?
-               iwl_mvm_ac_to_gen2_tx_fifo[ac] : iwl_mvm_ac_to_tx_fifo[ac];
+       if (mvm->trans->trans_cfg->device_family >= IWL_DEVICE_FAMILY_BZ)
+               return iwl_mvm_ac_to_bz_tx_fifo[ac];
+       if (iwl_mvm_has_new_tx_api(mvm))
+               return iwl_mvm_ac_to_gen2_tx_fifo[ac];
+       return iwl_mvm_ac_to_tx_fifo[ac];
 }
 
 struct iwl_rate_info {
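
    iwl_mvm_mac_ac_to_tx_fifo() now dispatches newest-first: the >=
    IWL_DEVICE_FAMILY_BZ check means later families inherit the BZ mapping by default,
    with the gen2 and legacy tables as fallbacks. A standalone sketch of that
    table-per-generation idiom (enum and table contents are illustrative):

    #include <stdint.h>
    #include <stdio.h>

    enum demo_family { FAMILY_LEGACY, FAMILY_GEN2, FAMILY_BZ, FAMILY_FUTURE };

    static const uint8_t legacy_fifo[4] = { 7, 6, 5, 4 };
    static const uint8_t gen2_fifo[4]   = { 3, 2, 1, 0 };
    static const uint8_t bz_fifo[4]     = { 3, 2, 1, 0 };

    static uint8_t ac_to_fifo(enum demo_family fam, unsigned int ac)
    {
            if (fam >= FAMILY_BZ)          /* newest first: future families too */
                    return bz_fifo[ac];
            if (fam == FAMILY_GEN2)
                    return gen2_fifo[ac];
            return legacy_fifo[ac];
    }

    int main(void)
    {
            printf("FUTURE, AC 0 -> fifo %u\n", ac_to_fifo(FAMILY_FUTURE, 0));
            return 0;
    }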
index 886d0009852872a5fc91a5f2f11476b3fdda78d8..af15d470c69bd60ea3737753b70832b9ffcf7d7f 100644 (file)
@@ -505,6 +505,10 @@ static bool iwl_mvm_is_dup(struct ieee80211_sta *sta, int queue,
                return false;
 
        mvm_sta = iwl_mvm_sta_from_mac80211(sta);
+
+       if (WARN_ON_ONCE(!mvm_sta->dup_data))
+               return false;
+
        dup_data = &mvm_sta->dup_data[queue];
 
        /*
index 2a3ca97859749749fff954e097a9d62ae86f5d24..c2e0cff740e9281ee7f73a2a9db4d0add160fee1 100644 (file)
@@ -1502,6 +1502,34 @@ out_err:
        return ret;
 }
 
+int iwl_mvm_sta_ensure_queue(struct iwl_mvm *mvm,
+                            struct ieee80211_txq *txq)
+{
+       struct iwl_mvm_txq *mvmtxq = iwl_mvm_txq_from_mac80211(txq);
+       int ret = -EINVAL;
+
+       lockdep_assert_held(&mvm->mutex);
+
+       if (likely(test_bit(IWL_MVM_TXQ_STATE_READY, &mvmtxq->state)) ||
+           !txq->sta) {
+               return 0;
+       }
+
+       if (!iwl_mvm_sta_alloc_queue(mvm, txq->sta, txq->ac, txq->tid)) {
+               set_bit(IWL_MVM_TXQ_STATE_READY, &mvmtxq->state);
+               ret = 0;
+       }
+
+       local_bh_disable();
+       spin_lock(&mvm->add_stream_lock);
+       if (!list_empty(&mvmtxq->list))
+               list_del_init(&mvmtxq->list);
+       spin_unlock(&mvm->add_stream_lock);
+       local_bh_enable();
+
+       return ret;
+}
+
 void iwl_mvm_add_new_dqa_stream_wk(struct work_struct *wk)
 {
        struct iwl_mvm *mvm = container_of(wk, struct iwl_mvm,
index b33a0ce096d46c2f92eb127d8942062b42f39345..3cf8a70274ce888833014b4492348c233be0a4c0 100644 (file)
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
 /*
- * Copyright (C) 2012-2014, 2018-2023 Intel Corporation
+ * Copyright (C) 2012-2014, 2018-2024 Intel Corporation
  * Copyright (C) 2013-2014 Intel Mobile Communications GmbH
  * Copyright (C) 2015-2016 Intel Deutschland GmbH
  */
@@ -571,6 +571,7 @@ void iwl_mvm_modify_all_sta_disable_tx(struct iwl_mvm *mvm,
                                       bool disable);
 
 void iwl_mvm_csa_client_absent(struct iwl_mvm *mvm, struct ieee80211_vif *vif);
+int iwl_mvm_sta_ensure_queue(struct iwl_mvm *mvm, struct ieee80211_txq *txq);
 void iwl_mvm_add_new_dqa_stream_wk(struct work_struct *wk);
 int iwl_mvm_add_pasn_sta(struct iwl_mvm *mvm, struct ieee80211_vif *vif,
                         struct iwl_mvm_int_sta *sta, u8 *addr, u32 cipher,
index 218fdf1ed5304f333008c8015ae796ff59583db0..2e653a417d6269b333ec12e7d15f43f3c66be46b 100644 (file)
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
 /*
- * Copyright (C) 2012-2014, 2018-2023 Intel Corporation
+ * Copyright (C) 2012-2014, 2018-2024 Intel Corporation
  * Copyright (C) 2013-2015 Intel Mobile Communications GmbH
  * Copyright (C) 2017 Intel Deutschland GmbH
  */
@@ -972,6 +972,7 @@ void iwl_mvm_rx_session_protect_notif(struct iwl_mvm *mvm,
        if (!le32_to_cpu(notif->status) || !le32_to_cpu(notif->start)) {
                /* End TE, notify mac80211 */
                mvmvif->time_event_data.id = SESSION_PROTECT_CONF_MAX_ID;
+               mvmvif->time_event_data.link_id = -1;
                iwl_mvm_p2p_roc_finished(mvm);
                ieee80211_remain_on_channel_expired(mvm->hw);
        } else if (le32_to_cpu(notif->start)) {
index db986bfc4dc3fe4374e22b071ecbf178fbb796dd..461f26d9214e4ab81e32033c00629b92f7e31700 100644 (file)
@@ -520,13 +520,24 @@ static void iwl_mvm_set_tx_cmd_crypto(struct iwl_mvm *mvm,
        }
 }
 
+static void iwl_mvm_copy_hdr(void *cmd, const void *hdr, int hdrlen,
+                            const u8 *addr3_override)
+{
+       struct ieee80211_hdr *out_hdr = cmd;
+
+       memcpy(cmd, hdr, hdrlen);
+       if (addr3_override)
+               memcpy(out_hdr->addr3, addr3_override, ETH_ALEN);
+}
+
 /*
  * Allocates and sets the Tx cmd the driver data pointers in the skb
  */
 static struct iwl_device_tx_cmd *
 iwl_mvm_set_tx_params(struct iwl_mvm *mvm, struct sk_buff *skb,
                      struct ieee80211_tx_info *info, int hdrlen,
-                     struct ieee80211_sta *sta, u8 sta_id)
+                     struct ieee80211_sta *sta, u8 sta_id,
+                     const u8 *addr3_override)
 {
        struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)skb->data;
        struct iwl_device_tx_cmd *dev_cmd;
@@ -584,7 +595,7 @@ iwl_mvm_set_tx_params(struct iwl_mvm *mvm, struct sk_buff *skb,
                        cmd->len = cpu_to_le16((u16)skb->len);
 
                        /* Copy MAC header from skb into command buffer */
-                       memcpy(cmd->hdr, hdr, hdrlen);
+                       iwl_mvm_copy_hdr(cmd->hdr, hdr, hdrlen, addr3_override);
 
                        cmd->flags = cpu_to_le16(flags);
                        cmd->rate_n_flags = cpu_to_le32(rate_n_flags);
@@ -599,7 +610,7 @@ iwl_mvm_set_tx_params(struct iwl_mvm *mvm, struct sk_buff *skb,
                        cmd->len = cpu_to_le16((u16)skb->len);
 
                        /* Copy MAC header from skb into command buffer */
-                       memcpy(cmd->hdr, hdr, hdrlen);
+                       iwl_mvm_copy_hdr(cmd->hdr, hdr, hdrlen, addr3_override);
 
                        cmd->flags = cpu_to_le32(flags);
                        cmd->rate_n_flags = cpu_to_le32(rate_n_flags);
@@ -617,7 +628,7 @@ iwl_mvm_set_tx_params(struct iwl_mvm *mvm, struct sk_buff *skb,
        iwl_mvm_set_tx_cmd_rate(mvm, tx_cmd, info, sta, hdr->frame_control);
 
        /* Copy MAC header from skb into command buffer */
-       memcpy(tx_cmd->hdr, hdr, hdrlen);
+       iwl_mvm_copy_hdr(tx_cmd->hdr, hdr, hdrlen, addr3_override);
 
 out:
        return dev_cmd;
@@ -820,7 +831,8 @@ int iwl_mvm_tx_skb_non_sta(struct iwl_mvm *mvm, struct sk_buff *skb)
 
        IWL_DEBUG_TX(mvm, "station Id %d, queue=%d\n", sta_id, queue);
 
-       dev_cmd = iwl_mvm_set_tx_params(mvm, skb, &info, hdrlen, NULL, sta_id);
+       dev_cmd = iwl_mvm_set_tx_params(mvm, skb, &info, hdrlen, NULL, sta_id,
+                                       NULL);
        if (!dev_cmd)
                return -1;
 
@@ -1140,7 +1152,8 @@ static int iwl_mvm_tx_pkt_queued(struct iwl_mvm *mvm,
  */
 static int iwl_mvm_tx_mpdu(struct iwl_mvm *mvm, struct sk_buff *skb,
                           struct ieee80211_tx_info *info,
-                          struct ieee80211_sta *sta)
+                          struct ieee80211_sta *sta,
+                          const u8 *addr3_override)
 {
        struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)skb->data;
        struct iwl_mvm_sta *mvmsta;
@@ -1172,7 +1185,8 @@ static int iwl_mvm_tx_mpdu(struct iwl_mvm *mvm, struct sk_buff *skb,
                iwl_mvm_probe_resp_set_noa(mvm, skb);
 
        dev_cmd = iwl_mvm_set_tx_params(mvm, skb, info, hdrlen,
-                                       sta, mvmsta->deflink.sta_id);
+                                       sta, mvmsta->deflink.sta_id,
+                                       addr3_override);
        if (!dev_cmd)
                goto drop;
 
@@ -1294,9 +1308,11 @@ int iwl_mvm_tx_skb_sta(struct iwl_mvm *mvm, struct sk_buff *skb,
        struct iwl_mvm_sta *mvmsta = iwl_mvm_sta_from_mac80211(sta);
        struct ieee80211_tx_info info;
        struct sk_buff_head mpdus_skbs;
+       struct ieee80211_vif *vif;
        unsigned int payload_len;
        int ret;
        struct sk_buff *orig_skb = skb;
+       const u8 *addr3;
 
        if (WARN_ON_ONCE(!mvmsta))
                return -1;
@@ -1307,26 +1323,59 @@ int iwl_mvm_tx_skb_sta(struct iwl_mvm *mvm, struct sk_buff *skb,
        memcpy(&info, skb->cb, sizeof(info));
 
        if (!skb_is_gso(skb))
-               return iwl_mvm_tx_mpdu(mvm, skb, &info, sta);
+               return iwl_mvm_tx_mpdu(mvm, skb, &info, sta, NULL);
 
        payload_len = skb_tail_pointer(skb) - skb_transport_header(skb) -
                tcp_hdrlen(skb) + skb->data_len;
 
        if (payload_len <= skb_shinfo(skb)->gso_size)
-               return iwl_mvm_tx_mpdu(mvm, skb, &info, sta);
+               return iwl_mvm_tx_mpdu(mvm, skb, &info, sta, NULL);
 
        __skb_queue_head_init(&mpdus_skbs);
 
+       vif = info.control.vif;
+       if (!vif)
+               return -1;
+
        ret = iwl_mvm_tx_tso(mvm, skb, &info, sta, &mpdus_skbs);
        if (ret)
                return ret;
 
        WARN_ON(skb_queue_empty(&mpdus_skbs));
 
+       /*
+        * As described in IEEE Std 802.11-2020, table 9-30 (Address
+        * field contents), A-MSDU address 3 should contain the BSSID
+        * address.
+        * Pass address 3 down to iwl_mvm_tx_mpdu() and further to set it
+        * in the command header. We need to preserve the original
+        * address 3 in the skb header to correctly create all the
+        * A-MSDU subframe headers from it.
+        */
+       switch (vif->type) {
+       case NL80211_IFTYPE_STATION:
+               addr3 = vif->cfg.ap_addr;
+               break;
+       case NL80211_IFTYPE_AP:
+               addr3 = vif->addr;
+               break;
+       default:
+               addr3 = NULL;
+               break;
+       }
+
        while (!skb_queue_empty(&mpdus_skbs)) {
+               struct ieee80211_hdr *hdr;
+               bool amsdu;
+
                skb = __skb_dequeue(&mpdus_skbs);
+               hdr = (void *)skb->data;
+               amsdu = ieee80211_is_data_qos(hdr->frame_control) &&
+                       (*ieee80211_get_qos_ctl(hdr) &
+                        IEEE80211_QOS_CTL_A_MSDU_PRESENT);
 
-               ret = iwl_mvm_tx_mpdu(mvm, skb, &info, sta);
+               ret = iwl_mvm_tx_mpdu(mvm, skb, &info, sta,
+                                     amsdu ? addr3 : NULL);
                if (ret) {
                        /* Free skbs created as part of TSO logic that have not yet been dequeued */
                        __skb_queue_purge(&mpdus_skbs);
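
    Two pieces of 802.11 glue are at work in this hunk: detecting an A-MSDU carrier
    (a QoS data frame with the A-MSDU-present bit set in the QoS control field) and
    choosing the BSSID that belongs in address 3 per the standard's table 9-30. A
    kernel-style condensation using mac80211 helpers (the demo_* function names are
    hypothetical):

    #include <net/mac80211.h>

    static bool demo_is_amsdu(struct sk_buff *skb)
    {
            struct ieee80211_hdr *hdr = (void *)skb->data;

            return ieee80211_is_data_qos(hdr->frame_control) &&
                   (*ieee80211_get_qos_ctl(hdr) &
                    IEEE80211_QOS_CTL_A_MSDU_PRESENT);
    }

    static const u8 *demo_amsdu_bssid(struct ieee80211_vif *vif)
    {
            switch (vif->type) {
            case NL80211_IFTYPE_STATION:
                    return vif->cfg.ap_addr;  /* BSSID is the AP we're bound to */
            case NL80211_IFTYPE_AP:
                    return vif->addr;         /* we are the BSS */
            default:
                    return NULL;              /* keep the original address 3 */
            }
    }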
index b52cce38115d0a9e5e7655af324fdc3e266f6011..c4fe70e05b9b87771613d8569b216a1cf91ac550 100644 (file)
@@ -125,7 +125,7 @@ int p54_parse_firmware(struct ieee80211_hw *dev, const struct firmware *fw)
                           "FW rev %s - Softmac protocol %x.%x\n",
                           fw_version, priv->fw_var >> 8, priv->fw_var & 0xff);
                snprintf(dev->wiphy->fw_version, sizeof(dev->wiphy->fw_version),
-                               "%s - %x.%x", fw_version,
+                               "%.19s - %x.%x", fw_version,
                                priv->fw_var >> 8, priv->fw_var & 0xff);
        }
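
    The "%.19s" change caps the contribution of fw_version so the formatted string
    provably fits the fixed-size wiphy->fw_version destination, which is the usual
    cure for GCC's -Wformat-truncation warning. A standalone demonstration of how a
    precision bounds a %s conversion:

    #include <stdio.h>

    int main(void)
    {
            char out[32];
            const char *fw_version = "2.13.24.0-very-long-version-tag";

            /* "%.19s" contributes at most 19 bytes, so the whole result
             * is guaranteed to fit a 32-byte destination.
             */
            int n = snprintf(out, sizeof(out), "%.19s - %x.%x",
                             fw_version, 2, 5);

            printf("\"%s\" (needed %d bytes)\n", out, n + 1);
            return 0;
    }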
 
index ce0179b8ab368fa7138a394afbc32678b05d20fc..0073b5e0f9c90ba473e71f1902724d8979bdc792 100644 (file)
@@ -700,6 +700,7 @@ static struct spi_driver p54spi_driver = {
 
 module_spi_driver(p54spi_driver);
 
+MODULE_DESCRIPTION("Prism54 SPI wireless driver");
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Christian Lamparter <chunkeey@web.de>");
 MODULE_ALIAS("spi:cx3110x");
index 89d738deea62e9ed4d4f9044e1a193548833deff..e2146d30e55363ecdfec2ed26bb7753384f04317 100644 (file)
@@ -728,6 +728,7 @@ const struct ieee80211_ops mt7603_ops = {
        .set_sar_specs = mt7603_set_sar_specs,
 };
 
+MODULE_DESCRIPTION("MediaTek MT7603E and MT76x8 wireless driver");
 MODULE_LICENSE("Dual BSD/GPL");
 
 static int __init mt7603_init(void)
index dab16b5fc3861198f9eccd3ae48c776bb03a66a1..0971c164b57e926d2d22dd1ef4f0559d45420db2 100644 (file)
@@ -1375,4 +1375,5 @@ const struct ieee80211_ops mt7615_ops = {
 };
 EXPORT_SYMBOL_GPL(mt7615_ops);
 
+MODULE_DESCRIPTION("MediaTek MT7615E and MT7663E wireless driver");
 MODULE_LICENSE("Dual BSD/GPL");
index ac036a072439d5d0a2c7eb8aad38048b4098b25f..87a956ea3ad74f6fb62a873eba0f499e04a99c32 100644 (file)
@@ -270,4 +270,5 @@ static void __exit mt7615_exit(void)
 
 module_init(mt7615_init);
 module_exit(mt7615_exit);
+MODULE_DESCRIPTION("MediaTek MT7615E MMIO helpers");
 MODULE_LICENSE("Dual BSD/GPL");
index 67cedd2555f973fc53cd0e37312771c3e0fa1116..9692890ba51b7b61c43991cb20a59d8394188c1e 100644 (file)
@@ -253,4 +253,5 @@ module_sdio_driver(mt7663s_driver);
 
 MODULE_AUTHOR("Sean Wang <sean.wang@mediatek.com>");
 MODULE_AUTHOR("Lorenzo Bianconi <lorenzo@kernel.org>");
+MODULE_DESCRIPTION("MediaTek MT7663S (SDIO) wireless driver");
 MODULE_LICENSE("Dual BSD/GPL");
index 04963b9f749838c41717884ecb818e9d0894667b..df737e1ff27b79a21c1b92ca899272eb961800f0 100644 (file)
@@ -281,4 +281,5 @@ module_usb_driver(mt7663u_driver);
 
 MODULE_AUTHOR("Sean Wang <sean.wang@mediatek.com>");
 MODULE_AUTHOR("Lorenzo Bianconi <lorenzo@kernel.org>");
+MODULE_DESCRIPTION("MediaTek MT7663U (USB) wireless driver");
 MODULE_LICENSE("Dual BSD/GPL");
index 0052d103e276a895e3eddb81edb4397e0149281b..820b395900275a700da43e7139c3a22effa17140 100644 (file)
@@ -349,4 +349,5 @@ EXPORT_SYMBOL_GPL(mt7663_usb_sdio_register_device);
 
 MODULE_AUTHOR("Lorenzo Bianconi <lorenzo@kernel.org>");
 MODULE_AUTHOR("Sean Wang <sean.wang@mediatek.com>");
+MODULE_DESCRIPTION("MediaTek MT7663 SDIO/USB helpers");
 MODULE_LICENSE("Dual BSD/GPL");
index 96494ba2fdf767ba89d25b505ca7ee86b0797c19..3a20ba0d2492840304f9f5b108a8f82643ce7396 100644 (file)
@@ -3160,4 +3160,5 @@ exit:
 EXPORT_SYMBOL_GPL(mt76_connac2_mcu_fill_message);
 
 MODULE_AUTHOR("Lorenzo Bianconi <lorenzo@kernel.org>");
+MODULE_DESCRIPTION("MediaTek MT76x connac layer helpers");
 MODULE_LICENSE("Dual BSD/GPL");
index c3a392a1a659e8a0581809a3a5e03d9847085496..bcd24c9072ec9e52f68c7dac0f124279d525eba0 100644 (file)
@@ -342,4 +342,5 @@ int mt76x0_eeprom_init(struct mt76x02_dev *dev)
        return 0;
 }
 
+MODULE_DESCRIPTION("MediaTek MT76x EEPROM helpers");
 MODULE_LICENSE("Dual BSD/GPL");
index 9277ff38b7a228fd778e8c9909eed0182eb14306..293e66fa83d5d0669d4f26acaff150a41f35debc 100644 (file)
@@ -302,6 +302,7 @@ static const struct pci_device_id mt76x0e_device_table[] = {
 MODULE_DEVICE_TABLE(pci, mt76x0e_device_table);
 MODULE_FIRMWARE(MT7610E_FIRMWARE);
 MODULE_FIRMWARE(MT7650E_FIRMWARE);
+MODULE_DESCRIPTION("MediaTek MT76x0E (PCIe) wireless driver");
 MODULE_LICENSE("Dual BSD/GPL");
 
 static struct pci_driver mt76x0e_driver = {
index 0422c332354a131dab040c72a8961ec6f1b79515..dd042949cf82bc6c87f4aaee8b7c5d912faf2162 100644 (file)
@@ -336,6 +336,7 @@ err:
 MODULE_DEVICE_TABLE(usb, mt76x0_device_table);
 MODULE_FIRMWARE(MT7610E_FIRMWARE);
 MODULE_FIRMWARE(MT7610U_FIRMWARE);
+MODULE_DESCRIPTION("MediaTek MT76x0U (USB) wireless driver");
 MODULE_LICENSE("GPL");
 
 static struct usb_driver mt76x0_driver = {
index 02da543dfc5cf381f6edac1753c0d627504e3e48..b2cc449142945f585a7d50ca0d68da21e35531e7 100644 (file)
@@ -293,4 +293,5 @@ void mt76x02u_init_mcu(struct mt76_dev *dev)
 EXPORT_SYMBOL_GPL(mt76x02u_init_mcu);
 
 MODULE_AUTHOR("Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>");
+MODULE_DESCRIPTION("MediaTek MT76x02 MCU helpers");
 MODULE_LICENSE("Dual BSD/GPL");
index 8a0e8124b894003ed80aad02ff6c59be9f3e457f..8020446be37bd99c15b9410b697060a304ef18af 100644 (file)
@@ -696,4 +696,5 @@ void mt76x02_config_mac_addr_list(struct mt76x02_dev *dev)
 }
 EXPORT_SYMBOL_GPL(mt76x02_config_mac_addr_list);
 
+MODULE_DESCRIPTION("MediaTek MT76x02 helpers");
 MODULE_LICENSE("Dual BSD/GPL");
index 8c01855885ce3949a16e63fe4169db3345a62044..1fe5f5a02f937783c669205e286e917ec0872db1 100644 (file)
@@ -506,4 +506,5 @@ int mt76x2_eeprom_init(struct mt76x02_dev *dev)
 }
 EXPORT_SYMBOL_GPL(mt76x2_eeprom_init);
 
+MODULE_DESCRIPTION("MediaTek MT76x2 EEPROM helpers");
 MODULE_LICENSE("Dual BSD/GPL");
index df85ebc6e1df07a7c2e48c30efa50cdeec9f993f..30959746e9242712e8196724157db8c93caa96f3 100644 (file)
@@ -165,6 +165,7 @@ mt76x2e_resume(struct pci_dev *pdev)
 MODULE_DEVICE_TABLE(pci, mt76x2e_device_table);
 MODULE_FIRMWARE(MT7662_FIRMWARE);
 MODULE_FIRMWARE(MT7662_ROM_PATCH);
+MODULE_DESCRIPTION("MediaTek MT76x2E (PCIe) wireless driver");
 MODULE_LICENSE("Dual BSD/GPL");
 
 static struct pci_driver mt76pci_driver = {
index 55068f3252ef341f4fbdc6d7dd382296e47b17f7..ca78e14251c2f5cda524c046b9c80a96b4481167 100644 (file)
@@ -147,4 +147,5 @@ static struct usb_driver mt76x2u_driver = {
 module_usb_driver(mt76x2u_driver);
 
 MODULE_AUTHOR("Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>");
+MODULE_DESCRIPTION("MediaTek MT76x2U (USB) wireless driver");
 MODULE_LICENSE("Dual BSD/GPL");
index aff4f21e843d29ae24b1ef094b407e344597bf36..3039f53e224546a406a2fc4cc0b2f8e07884d456 100644 (file)
@@ -958,4 +958,5 @@ static void __exit mt7915_exit(void)
 
 module_init(mt7915_init);
 module_exit(mt7915_exit);
+MODULE_DESCRIPTION("MediaTek MT7915E MMIO helpers");
 MODULE_LICENSE("Dual BSD/GPL");
index 0645417e05825f709e19e392e48544d36d2e3534..0d5adc5ddae38283eb618ba00284fb4b527c677c 100644 (file)
@@ -1418,5 +1418,6 @@ const struct ieee80211_ops mt7921_ops = {
 };
 EXPORT_SYMBOL_GPL(mt7921_ops);
 
+MODULE_DESCRIPTION("MediaTek MT7921 core driver");
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_AUTHOR("Sean Wang <sean.wang@mediatek.com>");
index 57903c6e4f11f0735fd80c4a3c54c4299ca48be0..dde26f3274783d9a7dc84da9be5e8fa214fba21e 100644 (file)
@@ -544,4 +544,5 @@ MODULE_FIRMWARE(MT7922_FIRMWARE_WM);
 MODULE_FIRMWARE(MT7922_ROM_PATCH);
 MODULE_AUTHOR("Sean Wang <sean.wang@mediatek.com>");
 MODULE_AUTHOR("Lorenzo Bianconi <lorenzo@kernel.org>");
+MODULE_DESCRIPTION("MediaTek MT7921E (PCIe) wireless driver");
 MODULE_LICENSE("Dual BSD/GPL");
index 7591e54d289733472740a5afe747833070363f2e..a9ce1e746b954bc7c7599f23ec6a6c23031dd384 100644 (file)
@@ -323,5 +323,6 @@ static struct sdio_driver mt7921s_driver = {
        .drv.pm         = pm_sleep_ptr(&mt7921s_pm_ops),
 };
 module_sdio_driver(mt7921s_driver);
+MODULE_DESCRIPTION("MediaTek MT7921S (SDIO) wireless driver");
 MODULE_AUTHOR("Sean Wang <sean.wang@mediatek.com>");
 MODULE_LICENSE("Dual BSD/GPL");
index e5258c74fc077ac69b310b5bab0e56c02ebfcef5..8b7c03c47598de7bf9037ed40cb370607a837af4 100644 (file)
@@ -336,5 +336,6 @@ static struct usb_driver mt7921u_driver = {
 };
 module_usb_driver(mt7921u_driver);
 
+MODULE_DESCRIPTION("MediaTek MT7921U (USB) wireless driver");
 MODULE_AUTHOR("Lorenzo Bianconi <lorenzo@kernel.org>");
 MODULE_LICENSE("Dual BSD/GPL");
index 8f1075da4903908b5f149e12530d21f6735be07f..125a1be3cb64c6a1a14bb6aaac28ec0fbc11e889 100644 (file)
@@ -1450,4 +1450,5 @@ const struct ieee80211_ops mt7925_ops = {
 EXPORT_SYMBOL_GPL(mt7925_ops);
 
 MODULE_AUTHOR("Deren Wu <deren.wu@mediatek.com>");
+MODULE_DESCRIPTION("MediaTek MT7925 core driver");
 MODULE_LICENSE("Dual BSD/GPL");
index 734f31ee40d3f740873dc0c53356a63fa11ff976..1fd99a856541589b1f3859795e8b23bcc0c06cdf 100644 (file)
@@ -583,4 +583,5 @@ MODULE_FIRMWARE(MT7925_FIRMWARE_WM);
 MODULE_FIRMWARE(MT7925_ROM_PATCH);
 MODULE_AUTHOR("Deren Wu <deren.wu@mediatek.com>");
 MODULE_AUTHOR("Lorenzo Bianconi <lorenzo@kernel.org>");
+MODULE_DESCRIPTION("MediaTek MT7925E (PCIe) wireless driver");
 MODULE_LICENSE("Dual BSD/GPL");
index 9b885c5b3ed594ddc9c5b62f47dfd4522430f469..1e0f094fc9059dbb02585bdc168b6b763f198004 100644 (file)
@@ -329,4 +329,5 @@ static struct usb_driver mt7925u_driver = {
 module_usb_driver(mt7925u_driver);
 
 MODULE_AUTHOR("Lorenzo Bianconi <lorenzo@kernel.org>");
+MODULE_DESCRIPTION("MediaTek MT7925U (USB) wireless driver");
 MODULE_LICENSE("Dual BSD/GPL");
index 502be22dbe3677fb475371b7e4c564be074899d8..c42101aa9e45e958f605e37e763bb090c63aeea5 100644 (file)
@@ -862,5 +862,6 @@ int mt792x_load_firmware(struct mt792x_dev *dev)
 }
 EXPORT_SYMBOL_GPL(mt792x_load_firmware);
 
+MODULE_DESCRIPTION("MediaTek MT792x core driver");
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_AUTHOR("Lorenzo Bianconi <lorenzo@kernel.org>");
index 2dd283caed36bf056127d17a6cd3e93e9f6664d4..589a3efb9f8c30bbce14ec288e8ad66ecf0acf99 100644 (file)
@@ -314,5 +314,6 @@ void mt792xu_disconnect(struct usb_interface *usb_intf)
 }
 EXPORT_SYMBOL_GPL(mt792xu_disconnect);
 
+MODULE_DESCRIPTION("MediaTek MT792x USB helpers");
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_AUTHOR("Lorenzo Bianconi <lorenzo@kernel.org>");
index 3c729b563edc5dd6f964e0e72c46f276367c3d65..699be57309c2e4db6d8ff3f456d55fe5a75be291 100644 (file)
@@ -4477,7 +4477,8 @@ int mt7996_mcu_set_txpower_sku(struct mt7996_phy *phy)
 
        skb_put_data(skb, &req, sizeof(req));
        /* cck and ofdm */
-       skb_put_data(skb, &la.cck, sizeof(la.cck) + sizeof(la.ofdm));
+       skb_put_data(skb, &la.cck, sizeof(la.cck));
+       skb_put_data(skb, &la.ofdm, sizeof(la.ofdm));
        /* ht20 */
        skb_put_data(skb, &la.mcs[0], 8);
        /* ht40 */
index c50d89a445e9560672aeab8752de112220c9ab1c..9f2abfa273c9b060a793ae2594963f5c123fc5b0 100644 (file)
@@ -650,4 +650,5 @@ static void __exit mt7996_exit(void)
 
 module_init(mt7996_init);
 module_exit(mt7996_exit);
+MODULE_DESCRIPTION("MediaTek MT7996 MMIO helpers");
 MODULE_LICENSE("Dual BSD/GPL");
index c52d550f0c32aac260e3163f14ec4989efbacc53..3e88798df0178c17cce2e94f588feb43711ad0a0 100644 (file)
@@ -672,4 +672,5 @@ EXPORT_SYMBOL_GPL(mt76s_init);
 
 MODULE_AUTHOR("Sean Wang <sean.wang@mediatek.com>");
 MODULE_AUTHOR("Lorenzo Bianconi <lorenzo@kernel.org>");
+MODULE_DESCRIPTION("MediaTek MT76x SDIO helpers");
 MODULE_LICENSE("Dual BSD/GPL");
index 1584665fe3cb68d890bd7606e3eedcd05a470a2c..5a0bcb5071bd7d5ee8c22ef54ad3e05b3ea96cf4 100644 (file)
@@ -1128,4 +1128,5 @@ int mt76u_init(struct mt76_dev *dev, struct usb_interface *intf)
 EXPORT_SYMBOL_GPL(mt76u_init);
 
 MODULE_AUTHOR("Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>");
+MODULE_DESCRIPTION("MediaTek MT76x USB helpers");
 MODULE_LICENSE("Dual BSD/GPL");
index fc76c66ff1a5a58f1c73e9ff50ceeea4926b8063..d6c01a2dd1988c5a9ef50bc96ceb0e30b453302d 100644 (file)
@@ -138,4 +138,5 @@ int __mt76_worker_fn(void *ptr)
 }
 EXPORT_SYMBOL_GPL(__mt76_worker_fn);
 
+MODULE_DESCRIPTION("MediaTek MT76x helpers");
 MODULE_LICENSE("Dual BSD/GPL");
index 91d71e0f7ef2332354a0b950eea3e8795387ed7b..81e8f25863f5bdc957fa5b4ee53a78c31f81eba6 100644 (file)
@@ -1018,5 +1018,6 @@ unregister_netdev:
        return ERR_PTR(ret);
 }
 
+MODULE_DESCRIPTION("Atmel WILC1000 core wireless driver");
 MODULE_LICENSE("GPL");
 MODULE_FIRMWARE(WILC1000_FW(WILC1000_API_VER));
index 0d13e3e46e98e4b59852811324792793fc27f78e..d6d3946930905275ba021611307d6a16324a0bb8 100644 (file)
@@ -984,4 +984,5 @@ static struct sdio_driver wilc_sdio_driver = {
 module_driver(wilc_sdio_driver,
              sdio_register_driver,
              sdio_unregister_driver);
+MODULE_DESCRIPTION("Atmel WILC1000 SDIO wireless driver");
 MODULE_LICENSE("GPL");
index 77b4cdff73c370bf1bbd1e1ebec77eb0cac318b7..1d8b241ce43cae3329eb9fee84afc63a1027447e 100644 (file)
@@ -273,6 +273,7 @@ static struct spi_driver wilc_spi_driver = {
        .remove = wilc_bus_remove,
 };
 module_spi_driver(wilc_spi_driver);
+MODULE_DESCRIPTION("Atmel WILC1000 SPI wireless driver");
 MODULE_LICENSE("GPL");
 
 static int wilc_spi_tx(struct wilc *wilc, u8 *b, u32 len)
index 301bd0043a4354032ceac6c45768c347b473163b..4e5b351f80f0922cabc3f1ce57fa44960c090e3c 100644 (file)
@@ -343,5 +343,6 @@ static void __exit wl1251_sdio_exit(void)
 module_init(wl1251_sdio_init);
 module_exit(wl1251_sdio_exit);
 
+MODULE_DESCRIPTION("TI WL1251 SDIO helpers");
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Kalle Valo <kvalo@adurom.com>");
index 29292f06bd3dcb191bee70af1b295a6a62abf841..1936bb3af54ab6509edff7a6afdf23f9e9a9b728 100644 (file)
@@ -342,6 +342,7 @@ static struct spi_driver wl1251_spi_driver = {
 
 module_spi_driver(wl1251_spi_driver);
 
+MODULE_DESCRIPTION("TI WL1251 SPI helpers");
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Kalle Valo <kvalo@adurom.com>");
 MODULE_ALIAS("spi:wl1251");
index de045fe4ca1eb982105a4a7a2d502f142efd5d02..b26d42b4e3cc0fbdc21e55b4b9682b0a81e1ec8e 100644 (file)
@@ -1955,6 +1955,7 @@ module_param_named(tcxo, tcxo_param, charp, 0);
 MODULE_PARM_DESC(tcxo,
                 "TCXO clock: 19.2, 26, 38.4, 52, 16.368, 32.736, 16.8, 33.6");
 
+MODULE_DESCRIPTION("TI WL12xx wireless driver");
 MODULE_LICENSE("GPL v2");
 MODULE_AUTHOR("Luciano Coelho <coelho@ti.com>");
 MODULE_FIRMWARE(WL127X_FW_NAME_SINGLE);
index 20d9181b3410c40b555d7f06836469357782fc9f..2ccac1cdec0120c1709d09add0bcdf390f57e7e7 100644 (file)
@@ -2086,6 +2086,7 @@ module_param_named(num_rx_desc, num_rx_desc_param, int, 0400);
 MODULE_PARM_DESC(num_rx_desc_param,
                 "Number of Rx descriptors: u8 (default is 32)");
 
+MODULE_DESCRIPTION("TI WiLink 8 wireless driver");
 MODULE_LICENSE("GPL v2");
 MODULE_AUTHOR("Luciano Coelho <coelho@ti.com>");
 MODULE_FIRMWARE(WL18XX_FW_NAME);
index fb9ed97774c7a29ab1a27689f7e78bf3740c0c3f..5736acb4d2063cbd1ca9f9f23187102d1d30c599 100644 (file)
@@ -6793,6 +6793,7 @@ MODULE_PARM_DESC(bug_on_recovery, "BUG() on fw recovery");
 module_param(no_recovery, int, 0600);
 MODULE_PARM_DESC(no_recovery, "Prevent HW recovery. FW will remain stuck.");
 
+MODULE_DESCRIPTION("TI WLAN core driver");
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Luciano Coelho <coelho@ti.com>");
 MODULE_AUTHOR("Juuso Oikarinen <juuso.oikarinen@nokia.com>");
index f0686635db46e1246f3d06a8814f50d4c93c85ff..eb5482ed76ae48488ef5f55d1731c080c25b9919 100644 (file)
@@ -447,6 +447,7 @@ module_sdio_driver(wl1271_sdio_driver);
 module_param(dump, bool, 0600);
 MODULE_PARM_DESC(dump, "Enable sdio read/write dumps.");
 
+MODULE_DESCRIPTION("TI WLAN SDIO helpers");
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Luciano Coelho <coelho@ti.com>");
 MODULE_AUTHOR("Juuso Oikarinen <juuso.oikarinen@nokia.com>");
index 7d9a139db59e1552e3f4cd6feae526c4a5211e69..0aa2b2f3c5c914160d05198c3fc8c6ed07c6e999 100644 (file)
@@ -562,6 +562,7 @@ static struct spi_driver wl1271_spi_driver = {
 };
 
 module_spi_driver(wl1271_spi_driver);
+MODULE_DESCRIPTION("TI WLAN SPI helpers");
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Luciano Coelho <coelho@ti.com>");
 MODULE_AUTHOR("Juuso Oikarinen <juuso.oikarinen@nokia.com>");
index 88f760a7cbc35469e20be2d09f9b2cfb92b8362a..ef76850d9bcd232e84f00c4576c5a98ace51458f 100644 (file)
@@ -104,13 +104,12 @@ bool provides_xdp_headroom = true;
 module_param(provides_xdp_headroom, bool, 0644);
 
 static void xenvif_idx_release(struct xenvif_queue *queue, u16 pending_idx,
-                              u8 status);
+                              s8 status);
 
 static void make_tx_response(struct xenvif_queue *queue,
-                            struct xen_netif_tx_request *txp,
+                            const struct xen_netif_tx_request *txp,
                             unsigned int extra_count,
-                            s8       st);
-static void push_tx_responses(struct xenvif_queue *queue);
+                            s8 status);
 
 static void xenvif_idx_unmap(struct xenvif_queue *queue, u16 pending_idx);
 
@@ -208,13 +207,9 @@ static void xenvif_tx_err(struct xenvif_queue *queue,
                          unsigned int extra_count, RING_IDX end)
 {
        RING_IDX cons = queue->tx.req_cons;
-       unsigned long flags;
 
        do {
-               spin_lock_irqsave(&queue->response_lock, flags);
                make_tx_response(queue, txp, extra_count, XEN_NETIF_RSP_ERROR);
-               push_tx_responses(queue);
-               spin_unlock_irqrestore(&queue->response_lock, flags);
                if (cons == end)
                        break;
                RING_COPY_REQUEST(&queue->tx, cons++, txp);
@@ -463,12 +458,20 @@ static void xenvif_get_requests(struct xenvif_queue *queue,
        }
 
        for (shinfo->nr_frags = 0; nr_slots > 0 && shinfo->nr_frags < MAX_SKB_FRAGS;
-            shinfo->nr_frags++, gop++, nr_slots--) {
+            nr_slots--) {
+               if (unlikely(!txp->size)) {
+                       make_tx_response(queue, txp, 0, XEN_NETIF_RSP_OKAY);
+                       ++txp;
+                       continue;
+               }
+
                index = pending_index(queue->pending_cons++);
                pending_idx = queue->pending_ring[index];
                xenvif_tx_create_map_op(queue, pending_idx, txp,
                                        txp == first ? extra_count : 0, gop);
                frag_set_pending_idx(&frags[shinfo->nr_frags], pending_idx);
+               ++shinfo->nr_frags;
+               ++gop;
 
                if (txp == first)
                        txp = txfrags;
@@ -481,20 +484,33 @@ static void xenvif_get_requests(struct xenvif_queue *queue,
                shinfo = skb_shinfo(nskb);
                frags = shinfo->frags;
 
-               for (shinfo->nr_frags = 0; shinfo->nr_frags < nr_slots;
-                    shinfo->nr_frags++, txp++, gop++) {
+               for (shinfo->nr_frags = 0; shinfo->nr_frags < nr_slots; ++txp) {
+                       if (unlikely(!txp->size)) {
+                               make_tx_response(queue, txp, 0,
+                                                XEN_NETIF_RSP_OKAY);
+                               continue;
+                       }
+
                        index = pending_index(queue->pending_cons++);
                        pending_idx = queue->pending_ring[index];
                        xenvif_tx_create_map_op(queue, pending_idx, txp, 0,
                                                gop);
                        frag_set_pending_idx(&frags[shinfo->nr_frags],
                                             pending_idx);
+                       ++shinfo->nr_frags;
+                       ++gop;
+               }
+
+               if (shinfo->nr_frags) {
+                       skb_shinfo(skb)->frag_list = nskb;
+                       nskb = NULL;
                }
+       }
 
-               skb_shinfo(skb)->frag_list = nskb;
-       } else if (nskb) {
+       if (nskb) {
                /* A frag_list skb was allocated but it is no longer needed
-                * because enough slots were converted to copy ops above.
+                * because enough slots were converted to copy ops above or some
+                * were empty.
                 */
                kfree_skb(nskb);
        }
@@ -963,7 +979,6 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
                                         (ret == 0) ?
                                         XEN_NETIF_RSP_OKAY :
                                         XEN_NETIF_RSP_ERROR);
-                       push_tx_responses(queue);
                        continue;
                }
 
@@ -975,7 +990,6 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
 
                        make_tx_response(queue, &txreq, extra_count,
                                         XEN_NETIF_RSP_OKAY);
-                       push_tx_responses(queue);
                        continue;
                }
 
@@ -1401,8 +1415,35 @@ int xenvif_tx_action(struct xenvif_queue *queue, int budget)
        return work_done;
 }
 
+static void _make_tx_response(struct xenvif_queue *queue,
+                            const struct xen_netif_tx_request *txp,
+                            unsigned int extra_count,
+                            s8 status)
+{
+       RING_IDX i = queue->tx.rsp_prod_pvt;
+       struct xen_netif_tx_response *resp;
+
+       resp = RING_GET_RESPONSE(&queue->tx, i);
+       resp->id     = txp->id;
+       resp->status = status;
+
+       while (extra_count-- != 0)
+               RING_GET_RESPONSE(&queue->tx, ++i)->status = XEN_NETIF_RSP_NULL;
+
+       queue->tx.rsp_prod_pvt = ++i;
+}
+
+static void push_tx_responses(struct xenvif_queue *queue)
+{
+       int notify;
+
+       RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&queue->tx, notify);
+       if (notify)
+               notify_remote_via_irq(queue->tx_irq);
+}
+
 static void xenvif_idx_release(struct xenvif_queue *queue, u16 pending_idx,
-                              u8 status)
+                              s8 status)
 {
        struct pending_tx_info *pending_tx_info;
        pending_ring_idx_t index;
@@ -1412,8 +1453,8 @@ static void xenvif_idx_release(struct xenvif_queue *queue, u16 pending_idx,
 
        spin_lock_irqsave(&queue->response_lock, flags);
 
-       make_tx_response(queue, &pending_tx_info->req,
-                        pending_tx_info->extra_count, status);
+       _make_tx_response(queue, &pending_tx_info->req,
+                         pending_tx_info->extra_count, status);
 
        /* Release the pending index before pushing the Tx response so
         * it's available before a new Tx request is pushed by the
@@ -1427,32 +1468,19 @@ static void xenvif_idx_release(struct xenvif_queue *queue, u16 pending_idx,
        spin_unlock_irqrestore(&queue->response_lock, flags);
 }
 
-
 static void make_tx_response(struct xenvif_queue *queue,
-                            struct xen_netif_tx_request *txp,
+                            const struct xen_netif_tx_request *txp,
                             unsigned int extra_count,
-                            s8       st)
+                            s8 status)
 {
-       RING_IDX i = queue->tx.rsp_prod_pvt;
-       struct xen_netif_tx_response *resp;
-
-       resp = RING_GET_RESPONSE(&queue->tx, i);
-       resp->id     = txp->id;
-       resp->status = st;
-
-       while (extra_count-- != 0)
-               RING_GET_RESPONSE(&queue->tx, ++i)->status = XEN_NETIF_RSP_NULL;
+       unsigned long flags;
 
-       queue->tx.rsp_prod_pvt = ++i;
-}
+       spin_lock_irqsave(&queue->response_lock, flags);
 
-static void push_tx_responses(struct xenvif_queue *queue)
-{
-       int notify;
+       _make_tx_response(queue, txp, extra_count, status);
+       push_tx_responses(queue);
 
-       RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&queue->tx, notify);
-       if (notify)
-               notify_remote_via_irq(queue->tx_irq);
+       spin_unlock_irqrestore(&queue->response_lock, flags);
 }
 
 static void xenvif_idx_unmap(struct xenvif_queue *queue, u16 pending_idx)
@@ -1750,5 +1778,6 @@ static void __exit netback_fini(void)
 }
 module_exit(netback_fini);
 
+MODULE_DESCRIPTION("Xen backend network device module");
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_ALIAS("xen-backend:vif");
index bb3726b622ad9f0d9bdaeb23c91d7bff1b61d2cb..4d0c527e8576785fbff56eadf9cdf0412290f831 100644 (file)
@@ -1496,19 +1496,21 @@ static int btt_blk_init(struct btt *btt)
 {
        struct nd_btt *nd_btt = btt->nd_btt;
        struct nd_namespace_common *ndns = nd_btt->ndns;
-       int rc = -ENOMEM;
+       struct queue_limits lim = {
+               .logical_block_size     = btt->sector_size,
+               .max_hw_sectors         = UINT_MAX,
+       };
+       int rc;
 
-       btt->btt_disk = blk_alloc_disk(NUMA_NO_NODE);
-       if (!btt->btt_disk)
-               return -ENOMEM;
+       btt->btt_disk = blk_alloc_disk(&lim, NUMA_NO_NODE);
+       if (IS_ERR(btt->btt_disk))
+               return PTR_ERR(btt->btt_disk);
 
        nvdimm_namespace_disk_name(ndns, btt->btt_disk->disk_name);
        btt->btt_disk->first_minor = 0;
        btt->btt_disk->fops = &btt_fops;
        btt->btt_disk->private_data = btt;
 
-       blk_queue_logical_block_size(btt->btt_disk->queue, btt->sector_size);
-       blk_queue_max_hw_sectors(btt->btt_disk->queue, UINT_MAX);
        blk_queue_flag_set(QUEUE_FLAG_NONROT, btt->btt_disk->queue);
        blk_queue_flag_set(QUEUE_FLAG_SYNCHRONOUS, btt->btt_disk->queue);
 
index 4e8fdcb3f1c8276ea6e8b5992867cf976449925b..8dcc10b6db5b12c6994e2a6e1283be3eda73af29 100644 (file)
@@ -451,6 +451,11 @@ static int pmem_attach_disk(struct device *dev,
 {
        struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
        struct nd_region *nd_region = to_nd_region(dev->parent);
+       struct queue_limits lim = {
+               .logical_block_size     = pmem_sector_size(ndns),
+               .physical_block_size    = PAGE_SIZE,
+               .max_hw_sectors         = UINT_MAX,
+       };
        int nid = dev_to_node(dev), fua;
        struct resource *res = &nsio->res;
        struct range bb_range;
@@ -497,9 +502,9 @@ static int pmem_attach_disk(struct device *dev,
                return -EBUSY;
        }
 
-       disk = blk_alloc_disk(nid);
-       if (!disk)
-               return -ENOMEM;
+       disk = blk_alloc_disk(&lim, nid);
+       if (IS_ERR(disk))
+               return PTR_ERR(disk);
        q = disk->queue;
 
        pmem->disk = disk;
@@ -539,9 +544,6 @@ static int pmem_attach_disk(struct device *dev,
        pmem->virt_addr = addr;
 
        blk_queue_write_cache(q, true, fua);
-       blk_queue_physical_block_size(q, PAGE_SIZE);
-       blk_queue_logical_block_size(q, pmem_sector_size(ndns));
-       blk_queue_max_hw_sectors(q, UINT_MAX);
        blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
        blk_queue_flag_set(QUEUE_FLAG_SYNCHRONOUS, q);
        if (pmem->pfn_flags & PFN_MAP)
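Both nvdimm hunks (btt and pmem) follow the same new block-layer pattern: queue limits are handed to blk_alloc_disk() as a struct queue_limits instead of being applied afterwards with blk_queue_*() calls, and the allocator now returns an ERR_PTR() rather than NULL. A minimal sketch under those assumptions:

    struct queue_limits lim = {
            .logical_block_size     = 512,
            .max_hw_sectors         = UINT_MAX,
    };
    struct gendisk *disk;

    disk = blk_alloc_disk(&lim, NUMA_NO_NODE);
    if (IS_ERR(disk))                   /* no longer NULL on failure */
            return PTR_ERR(disk);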
index a23ab5c968b9457bee89f14cc1f158e377ffa084..a3455f1d67fae20268a0e9e02a4c5c34ff054afe 100644 (file)
@@ -471,4 +471,5 @@ int nvme_auth_generate_key(u8 *secret, struct nvme_dhchap_key **ret_key)
 }
 EXPORT_SYMBOL_GPL(nvme_auth_generate_key);
 
+MODULE_DESCRIPTION("NVMe Authentication framework");
 MODULE_LICENSE("GPL v2");
index a5c0431c101cf3775509145e3bc7f12c6b64ccd0..6f7e7a8fa5ae470c463586fb0c638b5dc6f7313e 100644 (file)
@@ -181,5 +181,6 @@ static void __exit nvme_keyring_exit(void)
 
 MODULE_LICENSE("GPL v2");
 MODULE_AUTHOR("Hannes Reinecke <hare@suse.de>");
+MODULE_DESCRIPTION("NVMe Keyring implementation");
 module_init(nvme_keyring_init);
 module_exit(nvme_keyring_exit);
index 596bb11eeba5a9d0a4d1637f2c061775956bb84e..a480cdeac2883c0eb08dbfd32fc4c1adaece7a88 100644 (file)
@@ -797,6 +797,7 @@ static int apple_nvme_init_request(struct blk_mq_tag_set *set,
 
 static void apple_nvme_disable(struct apple_nvme *anv, bool shutdown)
 {
+       enum nvme_ctrl_state state = nvme_ctrl_state(&anv->ctrl);
        u32 csts = readl(anv->mmio_nvme + NVME_REG_CSTS);
        bool dead = false, freeze = false;
        unsigned long flags;
@@ -808,8 +809,8 @@ static void apple_nvme_disable(struct apple_nvme *anv, bool shutdown)
        if (csts & NVME_CSTS_CFS)
                dead = true;
 
-       if (anv->ctrl.state == NVME_CTRL_LIVE ||
-           anv->ctrl.state == NVME_CTRL_RESETTING) {
+       if (state == NVME_CTRL_LIVE ||
+           state == NVME_CTRL_RESETTING) {
                freeze = true;
                nvme_start_freeze(&anv->ctrl);
        }
@@ -881,7 +882,7 @@ static enum blk_eh_timer_return apple_nvme_timeout(struct request *req)
        unsigned long flags;
        u32 csts = readl(anv->mmio_nvme + NVME_REG_CSTS);
 
-       if (anv->ctrl.state != NVME_CTRL_LIVE) {
+       if (nvme_ctrl_state(&anv->ctrl) != NVME_CTRL_LIVE) {
                /*
                 * From rdma.c:
                 * If we are resetting, connecting or deleting we should
@@ -985,10 +986,10 @@ static void apple_nvme_reset_work(struct work_struct *work)
        u32 boot_status, aqa;
        struct apple_nvme *anv =
                container_of(work, struct apple_nvme, ctrl.reset_work);
+       enum nvme_ctrl_state state = nvme_ctrl_state(&anv->ctrl);
 
-       if (anv->ctrl.state != NVME_CTRL_RESETTING) {
-               dev_warn(anv->dev, "ctrl state %d is not RESETTING\n",
-                        anv->ctrl.state);
+       if (state != NVME_CTRL_RESETTING) {
+               dev_warn(anv->dev, "ctrl state %d is not RESETTING\n", state);
                ret = -ENODEV;
                goto out;
        }
@@ -1515,7 +1516,7 @@ static int apple_nvme_probe(struct platform_device *pdev)
                goto put_dev;
        }
 
-       anv->ctrl.admin_q = blk_mq_init_queue(&anv->admin_tagset);
+       anv->ctrl.admin_q = blk_mq_alloc_queue(&anv->admin_tagset, NULL, NULL);
        if (IS_ERR(anv->ctrl.admin_q)) {
                ret = -ENOMEM;
                goto put_dev;
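The apple-nvme changes replace repeated reads of anv->ctrl.state with a single nvme_ctrl_state() snapshot, so every branch in a function tests the same value even if the controller transitions concurrently. The shape of the fix:

    enum nvme_ctrl_state state = nvme_ctrl_state(&anv->ctrl);

    /* All tests below see one consistent snapshot. */
    if (state == NVME_CTRL_LIVE || state == NVME_CTRL_RESETTING)
            freeze = true;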
index 72c0525c75f503bb56c7c246c733f9eea57e44ab..a264b3ae078b8c4c28382c7d8f8757c7e3eec594 100644 (file)
@@ -48,11 +48,6 @@ struct nvme_dhchap_queue_context {
 
 static struct workqueue_struct *nvme_auth_wq;
 
-#define nvme_auth_flags_from_qid(qid) \
-       (qid == 0) ? 0 : BLK_MQ_REQ_NOWAIT | BLK_MQ_REQ_RESERVED
-#define nvme_auth_queue_from_qid(ctrl, qid) \
-       (qid == 0) ? (ctrl)->fabrics_q : (ctrl)->connect_q
-
 static inline int ctrl_max_dhchaps(struct nvme_ctrl *ctrl)
 {
        return ctrl->opts->nr_io_queues + ctrl->opts->nr_write_queues +
@@ -63,10 +58,15 @@ static int nvme_auth_submit(struct nvme_ctrl *ctrl, int qid,
                            void *data, size_t data_len, bool auth_send)
 {
        struct nvme_command cmd = {};
-       blk_mq_req_flags_t flags = nvme_auth_flags_from_qid(qid);
-       struct request_queue *q = nvme_auth_queue_from_qid(ctrl, qid);
+       nvme_submit_flags_t flags = NVME_SUBMIT_RETRY;
+       struct request_queue *q = ctrl->fabrics_q;
        int ret;
 
+       if (qid != 0) {
+               flags |= NVME_SUBMIT_NOWAIT | NVME_SUBMIT_RESERVED;
+               q = ctrl->connect_q;
+       }
+
        cmd.auth_common.opcode = nvme_fabrics_command;
        cmd.auth_common.secp = NVME_AUTH_DHCHAP_PROTOCOL_IDENTIFIER;
        cmd.auth_common.spsp0 = 0x01;
@@ -80,8 +80,7 @@ static int nvme_auth_submit(struct nvme_ctrl *ctrl, int qid,
        }
 
        ret = __nvme_submit_sync_cmd(q, &cmd, NULL, data, data_len,
-                                    qid == 0 ? NVME_QID_ANY : qid,
-                                    0, flags);
+                                    qid == 0 ? NVME_QID_ANY : qid, flags);
        if (ret > 0)
                dev_warn(ctrl->device,
                        "qid %d auth_send failed with status %d\n", qid, ret);
@@ -897,7 +896,7 @@ static void nvme_ctrl_auth_work(struct work_struct *work)
         * If the ctrl is not connected, bail as reconnect will handle
         * authentication.
         */
-       if (ctrl->state != NVME_CTRL_LIVE)
+       if (nvme_ctrl_state(ctrl) != NVME_CTRL_LIVE)
                return;
 
        /* Authenticate admin queue first */
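The auth change is part of a wider conversion: the qid-keyed macros that picked blk-mq flags and a queue are replaced by explicit NVME_SUBMIT_* flags that __nvme_submit_sync_cmd() translates internally (see the core.c hunks below). The caller-side pattern, condensed from this hunk:

    nvme_submit_flags_t flags = NVME_SUBMIT_RETRY;
    struct request_queue *q = ctrl->fabrics_q;

    if (qid != 0) {
            /* I/O queues must not block and use reserved tags. */
            flags |= NVME_SUBMIT_NOWAIT | NVME_SUBMIT_RESERVED;
            q = ctrl->connect_q;
    }

    ret = __nvme_submit_sync_cmd(q, &cmd, NULL, data, data_len,
                                 qid == 0 ? NVME_QID_ANY : qid, flags);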
index 20f46c230885c10f2a82bc87f7091645e2d02db5..6f2ebb5fcdb05e1e65971643c9aff2c3f2271c19 100644 (file)
@@ -171,15 +171,15 @@ static const char * const nvme_statuses[] = {
        [NVME_SC_HOST_ABORTED_CMD] = "Host Aborted Command",
 };
 
-const unsigned char *nvme_get_error_status_str(u16 status)
+const char *nvme_get_error_status_str(u16 status)
 {
        status &= 0x7ff;
        if (status < ARRAY_SIZE(nvme_statuses) && nvme_statuses[status])
-               return nvme_statuses[status & 0x7ff];
+               return nvme_statuses[status];
        return "Unknown";
 }
 
-const unsigned char *nvme_get_opcode_str(u8 opcode)
+const char *nvme_get_opcode_str(u8 opcode)
 {
        if (opcode < ARRAY_SIZE(nvme_ops) && nvme_ops[opcode])
                return nvme_ops[opcode];
@@ -187,7 +187,7 @@ const unsigned char *nvme_get_opcode_str(u8 opcode)
 }
 EXPORT_SYMBOL_GPL(nvme_get_opcode_str);
 
-const unsigned char *nvme_get_admin_opcode_str(u8 opcode)
+const char *nvme_get_admin_opcode_str(u8 opcode)
 {
        if (opcode < ARRAY_SIZE(nvme_admin_ops) && nvme_admin_ops[opcode])
                return nvme_admin_ops[opcode];
@@ -195,7 +195,7 @@ const unsigned char *nvme_get_admin_opcode_str(u8 opcode)
 }
 EXPORT_SYMBOL_GPL(nvme_get_admin_opcode_str);
 
-const unsigned char *nvme_get_fabrics_opcode_str(u8 opcode) {
+const char *nvme_get_fabrics_opcode_str(u8 opcode) {
        if (opcode < ARRAY_SIZE(nvme_fabrics_ops) && nvme_fabrics_ops[opcode])
                return nvme_fabrics_ops[opcode];
        return "Unknown";
index 85ab0fcf9e886451fb070b75dcd53be4a4f88f62..00864a63447099bca59fa45f8f6076933b58f836 100644 (file)
@@ -114,12 +114,21 @@ static DEFINE_MUTEX(nvme_subsystems_lock);
 
 static DEFINE_IDA(nvme_instance_ida);
 static dev_t nvme_ctrl_base_chr_devt;
-static struct class *nvme_class;
-static struct class *nvme_subsys_class;
+static int nvme_class_uevent(const struct device *dev, struct kobj_uevent_env *env);
+static const struct class nvme_class = {
+       .name = "nvme",
+       .dev_uevent = nvme_class_uevent,
+};
+
+static const struct class nvme_subsys_class = {
+       .name = "nvme-subsystem",
+};
 
 static DEFINE_IDA(nvme_ns_chr_minor_ida);
 static dev_t nvme_ns_chr_devt;
-static struct class *nvme_ns_chr_class;
+static const struct class nvme_ns_chr_class = {
+       .name = "nvme-generic",
+};
 
 static void nvme_put_subsystem(struct nvme_subsystem *subsys);
 static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
@@ -338,6 +347,30 @@ static void nvme_log_error(struct request *req)
                           nr->status & NVME_SC_DNR  ? "DNR "  : "");
 }
 
+static void nvme_log_err_passthru(struct request *req)
+{
+       struct nvme_ns *ns = req->q->queuedata;
+       struct nvme_request *nr = nvme_req(req);
+
+       pr_err_ratelimited("%s: %s(0x%x), %s (sct 0x%x / sc 0x%x) %s%s"
+               "cdw10=0x%x cdw11=0x%x cdw12=0x%x cdw13=0x%x cdw14=0x%x cdw15=0x%x\n",
+               ns ? ns->disk->disk_name : dev_name(nr->ctrl->device),
+               ns ? nvme_get_opcode_str(nr->cmd->common.opcode) :
+                    nvme_get_admin_opcode_str(nr->cmd->common.opcode),
+               nr->cmd->common.opcode,
+               nvme_get_error_status_str(nr->status),
+               nr->status >> 8 & 7,    /* Status Code Type */
+               nr->status & 0xff,      /* Status Code */
+               nr->status & NVME_SC_MORE ? "MORE " : "",
+               nr->status & NVME_SC_DNR  ? "DNR "  : "",
+               nr->cmd->common.cdw10,
+               nr->cmd->common.cdw11,
+               nr->cmd->common.cdw12,
+               nr->cmd->common.cdw13,
+               nr->cmd->common.cdw14,
+               nr->cmd->common.cdw15);
+}
+
 enum nvme_disposition {
        COMPLETE,
        RETRY,
@@ -385,8 +418,12 @@ static inline void nvme_end_req(struct request *req)
 {
        blk_status_t status = nvme_error_status(nvme_req(req)->status);
 
-       if (unlikely(nvme_req(req)->status && !(req->rq_flags & RQF_QUIET)))
-               nvme_log_error(req);
+       if (unlikely(nvme_req(req)->status && !(req->rq_flags & RQF_QUIET))) {
+               if (blk_rq_is_passthrough(req))
+                       nvme_log_err_passthru(req);
+               else
+                       nvme_log_error(req);
+       }
        nvme_end_req_zoned(req);
        nvme_trace_bio_complete(req);
        if (req->cmd_flags & REQ_NVME_MPATH)
@@ -679,10 +716,21 @@ static inline void nvme_clear_nvme_request(struct request *req)
 /* initialize a passthrough request */
 void nvme_init_request(struct request *req, struct nvme_command *cmd)
 {
-       if (req->q->queuedata)
+       struct nvme_request *nr = nvme_req(req);
+       bool logging_enabled;
+
+       if (req->q->queuedata) {
+               struct nvme_ns *ns = req->q->disk->private_data;
+
+               logging_enabled = ns->head->passthru_err_log_enabled;
                req->timeout = NVME_IO_TIMEOUT;
-       else /* no queuedata implies admin queue */
+       } else { /* no queuedata implies admin queue */
+               logging_enabled = nr->ctrl->passthru_err_log_enabled;
                req->timeout = NVME_ADMIN_TIMEOUT;
+       }
+
+       if (!logging_enabled)
+               req->rq_flags |= RQF_QUIET;
 
        /* passthru commands should let the driver set the SGL flags */
        cmd->common.flags &= ~NVME_CMD_SGL_ALL;
@@ -691,8 +739,7 @@ void nvme_init_request(struct request *req, struct nvme_command *cmd)
        if (req->mq_hctx->type == HCTX_TYPE_POLL)
                req->cmd_flags |= REQ_POLLED;
        nvme_clear_nvme_request(req);
-       req->rq_flags |= RQF_QUIET;
-       memcpy(nvme_req(req)->cmd, cmd, sizeof(*cmd));
+       memcpy(nr->cmd, cmd, sizeof(*cmd));
 }
 EXPORT_SYMBOL_GPL(nvme_init_request);
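nvme_init_request() used to set RQF_QUIET on every passthrough request; it now consults the new passthru_err_log_enabled toggle (per-namespace head for I/O queues, per-controller for the admin queue) and only silences logging when the toggle is off. Condensed from the hunk above:

    struct nvme_request *nr = nvme_req(req);
    bool logging_enabled;

    if (req->q->queuedata) {            /* I/O queue */
            struct nvme_ns *ns = req->q->disk->private_data;

            logging_enabled = ns->head->passthru_err_log_enabled;
    } else {                            /* admin queue */
            logging_enabled = nr->ctrl->passthru_err_log_enabled;
    }

    if (!logging_enabled)
            req->rq_flags |= RQF_QUIET;

With logging enabled, failed passthrough commands are reported through the new nvme_log_err_passthru() path instead of completing silently.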
 
@@ -721,7 +768,7 @@ blk_status_t nvme_fail_nonready_command(struct nvme_ctrl *ctrl,
 EXPORT_SYMBOL_GPL(nvme_fail_nonready_command);
 
 bool __nvme_check_ready(struct nvme_ctrl *ctrl, struct request *rq,
-               bool queue_live)
+               bool queue_live, enum nvme_ctrl_state state)
 {
        struct nvme_request *req = nvme_req(rq);
 
@@ -742,7 +789,7 @@ bool __nvme_check_ready(struct nvme_ctrl *ctrl, struct request *rq,
                 * command, which is required to set the queue live in the
                 * appropriate states.
                 */
-               switch (nvme_ctrl_state(ctrl)) {
+               switch (state) {
                case NVME_CTRL_CONNECTING:
                        if (blk_rq_is_passthrough(rq) && nvme_is_fabrics(req->cmd) &&
                            (req->cmd->fabrics.fctype == nvme_fabrics_type_connect ||
@@ -1051,20 +1098,27 @@ EXPORT_SYMBOL_NS_GPL(nvme_execute_rq, NVME_TARGET_PASSTHRU);
  */
 int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
                union nvme_result *result, void *buffer, unsigned bufflen,
-               int qid, int at_head, blk_mq_req_flags_t flags)
+               int qid, nvme_submit_flags_t flags)
 {
        struct request *req;
        int ret;
+       blk_mq_req_flags_t blk_flags = 0;
 
+       if (flags & NVME_SUBMIT_NOWAIT)
+               blk_flags |= BLK_MQ_REQ_NOWAIT;
+       if (flags & NVME_SUBMIT_RESERVED)
+               blk_flags |= BLK_MQ_REQ_RESERVED;
        if (qid == NVME_QID_ANY)
-               req = blk_mq_alloc_request(q, nvme_req_op(cmd), flags);
+               req = blk_mq_alloc_request(q, nvme_req_op(cmd), blk_flags);
        else
-               req = blk_mq_alloc_request_hctx(q, nvme_req_op(cmd), flags,
+               req = blk_mq_alloc_request_hctx(q, nvme_req_op(cmd), blk_flags,
                                                qid - 1);
 
        if (IS_ERR(req))
                return PTR_ERR(req);
        nvme_init_request(req, cmd);
+       if (flags & NVME_SUBMIT_RETRY)
+               req->cmd_flags &= ~REQ_FAILFAST_DRIVER;
 
        if (buffer && bufflen) {
                ret = blk_rq_map_kern(q, req, buffer, bufflen, GFP_KERNEL);
@@ -1072,7 +1126,7 @@ int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
                        goto out;
        }
 
-       ret = nvme_execute_rq(req, at_head);
+       ret = nvme_execute_rq(req, flags & NVME_SUBMIT_AT_HEAD);
        if (result && ret >= 0)
                *result = nvme_req(req)->result;
  out:
@@ -1085,7 +1139,7 @@ int nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
                void *buffer, unsigned bufflen)
 {
        return __nvme_submit_sync_cmd(q, cmd, NULL, buffer, bufflen,
-                       NVME_QID_ANY, 0, 0);
+                       NVME_QID_ANY, 0);
 }
 EXPORT_SYMBOL_GPL(nvme_submit_sync_cmd);
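For existing call sites, the trailing "at_head, flags" pair of __nvme_submit_sync_cmd() collapses into one flags word, with head insertion expressed as NVME_SUBMIT_AT_HEAD. The conversion pattern seen throughout the rest of this diff (the nvme_sec_submit() hunk below shows the at-head case):

    /* before */
    __nvme_submit_sync_cmd(q, &c, &res, buf, len, NVME_QID_ANY, 1, 0);
    /* after */
    __nvme_submit_sync_cmd(q, &c, &res, buf, len, NVME_QID_ANY,
                           NVME_SUBMIT_AT_HEAD);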
 
@@ -1108,6 +1162,10 @@ u32 nvme_command_effects(struct nvme_ctrl *ctrl, struct nvme_ns *ns, u8 opcode)
                effects &= ~NVME_CMD_EFFECTS_CSE_MASK;
        } else {
                effects = le32_to_cpu(ctrl->effects->acs[opcode]);
+
+               /* Ignore execution restrictions if any relaxation bits are set */
+               if (effects & NVME_CMD_EFFECTS_CSER_MASK)
+                       effects &= ~NVME_CMD_EFFECTS_CSE_MASK;
        }
 
        return effects;
@@ -1349,8 +1407,10 @@ static int nvme_identify_ctrl(struct nvme_ctrl *dev, struct nvme_id_ctrl **id)
 
        error = nvme_submit_sync_cmd(dev->admin_q, &c, *id,
                        sizeof(struct nvme_id_ctrl));
-       if (error)
+       if (error) {
                kfree(*id);
+               *id = NULL;
+       }
        return error;
 }
 
@@ -1479,6 +1539,7 @@ int nvme_identify_ns(struct nvme_ctrl *ctrl, unsigned nsid,
        if (error) {
                dev_warn(ctrl->device, "Identify namespace failed (%d)\n", error);
                kfree(*id);
+               *id = NULL;
        }
        return error;
 }
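Both identify helpers now clear the out-pointer on failure so a caller cannot free or dereference a stale *id. The generic shape of the fix, as a hypothetical self-contained helper (fetch_report and query_device are illustrative names, not from this diff):

    static int fetch_report(struct device *dev, void **out)
    {
            void *buf = kzalloc(SZ_4K, GFP_KERNEL);
            int ret;

            if (!buf)
                    return -ENOMEM;
            ret = query_device(dev, buf);   /* hypothetical query */
            if (ret) {
                    kfree(buf);
                    buf = NULL;     /* never hand back a dangling pointer */
            }
            *out = buf;
            return ret;
    }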
@@ -1560,7 +1621,7 @@ static int nvme_features(struct nvme_ctrl *dev, u8 op, unsigned int fid,
        c.features.dword11 = cpu_to_le32(dword11);
 
        ret = __nvme_submit_sync_cmd(dev->admin_q, &c, &res,
-                       buffer, buflen, NVME_QID_ANY, 0, 0);
+                       buffer, buflen, NVME_QID_ANY, 0);
        if (ret >= 0 && result)
                *result = le32_to_cpu(res.u32);
        return ret;
@@ -1678,12 +1739,23 @@ int nvme_getgeo(struct block_device *bdev, struct hd_geometry *geo)
        return 0;
 }
 
-#ifdef CONFIG_BLK_DEV_INTEGRITY
-static void nvme_init_integrity(struct gendisk *disk,
-               struct nvme_ns_head *head, u32 max_integrity_segments)
+static bool nvme_init_integrity(struct gendisk *disk, struct nvme_ns_head *head)
 {
        struct blk_integrity integrity = { };
 
+       blk_integrity_unregister(disk);
+
+       if (!head->ms)
+               return true;
+
+       /*
+        * PI can always be supported as we can ask the controller to simply
+        * insert/strip it, which is not possible for other kinds of metadata.
+        */
+       if (!IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) ||
+           !(head->features & NVME_NS_METADATA_SUPPORTED))
+               return nvme_ns_has_pi(head);
+
        switch (head->pi_type) {
        case NVME_NS_DPS_PI_TYPE3:
                switch (head->guard_type) {
@@ -1726,53 +1798,32 @@ static void nvme_init_integrity(struct gendisk *disk,
        }
 
        integrity.tuple_size = head->ms;
+       integrity.pi_offset = head->pi_offset;
        blk_integrity_register(disk, &integrity);
-       blk_queue_max_integrity_segments(disk->queue, max_integrity_segments);
-}
-#else
-static void nvme_init_integrity(struct gendisk *disk,
-               struct nvme_ns_head *head, u32 max_integrity_segments)
-{
+       return true;
 }
-#endif /* CONFIG_BLK_DEV_INTEGRITY */
 
-static void nvme_config_discard(struct nvme_ctrl *ctrl, struct gendisk *disk,
-               struct nvme_ns_head *head)
+static void nvme_config_discard(struct nvme_ns *ns, struct queue_limits *lim)
 {
-       struct request_queue *queue = disk->queue;
-       u32 max_discard_sectors;
-
-       if (ctrl->dmrsl && ctrl->dmrsl <= nvme_sect_to_lba(head, UINT_MAX)) {
-               max_discard_sectors = nvme_lba_to_sect(head, ctrl->dmrsl);
-       } else if (ctrl->oncs & NVME_CTRL_ONCS_DSM) {
-               max_discard_sectors = UINT_MAX;
-       } else {
-               blk_queue_max_discard_sectors(queue, 0);
-               return;
-       }
+       struct nvme_ctrl *ctrl = ns->ctrl;
 
        BUILD_BUG_ON(PAGE_SIZE / sizeof(struct nvme_dsm_range) <
                        NVME_DSM_MAX_RANGES);
 
-       /*
-        * If discard is already enabled, don't reset queue limits.
-        *
-        * This works around the fact that the block layer can't cope well with
-        * updating the hardware limits when overridden through sysfs.  This is
-        * harmless because discard limits in NVMe are purely advisory.
-        */
-       if (queue->limits.max_discard_sectors)
-               return;
+       if (ctrl->dmrsl && ctrl->dmrsl <= nvme_sect_to_lba(ns->head, UINT_MAX))
+               lim->max_hw_discard_sectors =
+                       nvme_lba_to_sect(ns->head, ctrl->dmrsl);
+       else if (ctrl->oncs & NVME_CTRL_ONCS_DSM)
+               lim->max_hw_discard_sectors = UINT_MAX;
+       else
+               lim->max_hw_discard_sectors = 0;
+
+       lim->discard_granularity = lim->logical_block_size;
 
-       blk_queue_max_discard_sectors(queue, max_discard_sectors);
        if (ctrl->dmrl)
-               blk_queue_max_discard_segments(queue, ctrl->dmrl);
+               lim->max_discard_segments = ctrl->dmrl;
        else
-               blk_queue_max_discard_segments(queue, NVME_DSM_MAX_RANGES);
-       queue->limits.discard_granularity = queue_logical_block_size(queue);
-
-       if (ctrl->quirks & NVME_QUIRK_DEALLOCATE_ZEROES)
-               blk_queue_max_write_zeroes_sectors(queue, UINT_MAX);
+               lim->max_discard_segments = NVME_DSM_MAX_RANGES;
 }
 
 static bool nvme_ns_ids_equal(struct nvme_ns_ids *a, struct nvme_ns_ids *b)
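nvme_config_discard() now fills a struct queue_limits instead of poking the live request queue, which also retires the old "don't reset limits if discard is already enabled" workaround: limits are staged and committed atomically. The prepare/commit flow this series converges on, condensed from the hunks below:

    struct queue_limits lim;
    int ret;

    lim = queue_limits_start_update(ns->disk->queue);
    nvme_set_ctrl_limits(ns->ctrl, &lim);
    nvme_config_discard(ns, &lim);  /* fills max_hw_discard_sectors etc. */
    ret = queue_limits_commit_update(ns->disk->queue, &lim);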
@@ -1783,42 +1834,38 @@ static bool nvme_ns_ids_equal(struct nvme_ns_ids *a, struct nvme_ns_ids *b)
                a->csi == b->csi;
 }
 
-static int nvme_init_ms(struct nvme_ctrl *ctrl, struct nvme_ns_head *head,
-               struct nvme_id_ns *id)
+static int nvme_identify_ns_nvm(struct nvme_ctrl *ctrl, unsigned int nsid,
+               struct nvme_id_ns_nvm **nvmp)
 {
-       bool first = id->dps & NVME_NS_DPS_PI_FIRST;
-       unsigned lbaf = nvme_lbaf_index(id->flbas);
-       struct nvme_command c = { };
+       struct nvme_command c = {
+               .identify.opcode        = nvme_admin_identify,
+               .identify.nsid          = cpu_to_le32(nsid),
+               .identify.cns           = NVME_ID_CNS_CS_NS,
+               .identify.csi           = NVME_CSI_NVM,
+       };
        struct nvme_id_ns_nvm *nvm;
-       int ret = 0;
-       u32 elbaf;
-
-       head->pi_size = 0;
-       head->ms = le16_to_cpu(id->lbaf[lbaf].ms);
-       if (!(ctrl->ctratt & NVME_CTRL_ATTR_ELBAS)) {
-               head->pi_size = sizeof(struct t10_pi_tuple);
-               head->guard_type = NVME_NVM_NS_16B_GUARD;
-               goto set_pi;
-       }
+       int ret;
 
        nvm = kzalloc(sizeof(*nvm), GFP_KERNEL);
        if (!nvm)
                return -ENOMEM;
 
-       c.identify.opcode = nvme_admin_identify;
-       c.identify.nsid = cpu_to_le32(head->ns_id);
-       c.identify.cns = NVME_ID_CNS_CS_NS;
-       c.identify.csi = NVME_CSI_NVM;
-
        ret = nvme_submit_sync_cmd(ctrl->admin_q, &c, nvm, sizeof(*nvm));
        if (ret)
-               goto free_data;
+               kfree(nvm);
+       else
+               *nvmp = nvm;
+       return ret;
+}
 
-       elbaf = le32_to_cpu(nvm->elbaf[lbaf]);
+static void nvme_configure_pi_elbas(struct nvme_ns_head *head,
+               struct nvme_id_ns *id, struct nvme_id_ns_nvm *nvm)
+{
+       u32 elbaf = le32_to_cpu(nvm->elbaf[nvme_lbaf_index(id->flbas)]);
 
        /* no support for storage tag formats right now */
        if (nvme_elbaf_sts(elbaf))
-               goto free_data;
+               return;
 
        head->guard_type = nvme_elbaf_guard_type(elbaf);
        switch (head->guard_type) {
@@ -1831,30 +1878,31 @@ static int nvme_init_ms(struct nvme_ctrl *ctrl, struct nvme_ns_head *head,
        default:
                break;
        }
-
-free_data:
-       kfree(nvm);
-set_pi:
-       if (head->pi_size && (first || head->ms == head->pi_size))
-               head->pi_type = id->dps & NVME_NS_DPS_PI_MASK;
-       else
-               head->pi_type = 0;
-
-       return ret;
 }
 
-static int nvme_configure_metadata(struct nvme_ctrl *ctrl,
-               struct nvme_ns_head *head, struct nvme_id_ns *id)
+static void nvme_configure_metadata(struct nvme_ctrl *ctrl,
+               struct nvme_ns_head *head, struct nvme_id_ns *id,
+               struct nvme_id_ns_nvm *nvm)
 {
-       int ret;
-
-       ret = nvme_init_ms(ctrl, head, id);
-       if (ret)
-               return ret;
-
        head->features &= ~(NVME_NS_METADATA_SUPPORTED | NVME_NS_EXT_LBAS);
+       head->pi_type = 0;
+       head->pi_size = 0;
+       head->pi_offset = 0;
+       head->ms = le16_to_cpu(id->lbaf[nvme_lbaf_index(id->flbas)].ms);
        if (!head->ms || !(ctrl->ops->flags & NVME_F_METADATA_SUPPORTED))
-               return 0;
+               return;
+
+       if (nvm && (ctrl->ctratt & NVME_CTRL_ATTR_ELBAS)) {
+               nvme_configure_pi_elbas(head, id, nvm);
+       } else {
+               head->pi_size = sizeof(struct t10_pi_tuple);
+               head->guard_type = NVME_NVM_NS_16B_GUARD;
+       }
+
+       if (head->pi_size && head->ms >= head->pi_size)
+               head->pi_type = id->dps & NVME_NS_DPS_PI_MASK;
+       if (!(id->dps & NVME_NS_DPS_PI_FIRST))
+               head->pi_offset = head->ms - head->pi_size;
 
        if (ctrl->ops->flags & NVME_F_FABRICS) {
                /*
@@ -1863,7 +1911,7 @@ static int nvme_configure_metadata(struct nvme_ctrl *ctrl,
                 * remap the separate metadata buffer from the block layer.
                 */
                if (WARN_ON_ONCE(!(id->flbas & NVME_NS_FLBAS_META_EXT)))
-                       return 0;
+                       return;
 
                head->features |= NVME_NS_EXT_LBAS;
 
@@ -1890,33 +1938,32 @@ static int nvme_configure_metadata(struct nvme_ctrl *ctrl,
                else
                        head->features |= NVME_NS_METADATA_SUPPORTED;
        }
-       return 0;
 }
 
-static void nvme_set_queue_limits(struct nvme_ctrl *ctrl,
-               struct request_queue *q)
+static u32 nvme_max_drv_segments(struct nvme_ctrl *ctrl)
 {
-       bool vwc = ctrl->vwc & NVME_CTRL_VWC_PRESENT;
-
-       if (ctrl->max_hw_sectors) {
-               u32 max_segments =
-                       (ctrl->max_hw_sectors / (NVME_CTRL_PAGE_SIZE >> 9)) + 1;
+       return ctrl->max_hw_sectors / (NVME_CTRL_PAGE_SIZE >> SECTOR_SHIFT) + 1;
+}
 
-               max_segments = min_not_zero(max_segments, ctrl->max_segments);
-               blk_queue_max_hw_sectors(q, ctrl->max_hw_sectors);
-               blk_queue_max_segments(q, min_t(u32, max_segments, USHRT_MAX));
-       }
-       blk_queue_virt_boundary(q, NVME_CTRL_PAGE_SIZE - 1);
-       blk_queue_dma_alignment(q, 3);
-       blk_queue_write_cache(q, vwc, vwc);
+static void nvme_set_ctrl_limits(struct nvme_ctrl *ctrl,
+               struct queue_limits *lim)
+{
+       lim->max_hw_sectors = ctrl->max_hw_sectors;
+       lim->max_segments = min_t(u32, USHRT_MAX,
+               min_not_zero(nvme_max_drv_segments(ctrl), ctrl->max_segments));
+       lim->max_integrity_segments = ctrl->max_integrity_segments;
+       lim->virt_boundary_mask = NVME_CTRL_PAGE_SIZE - 1;
+       lim->max_segment_size = UINT_MAX;
+       lim->dma_alignment = 3;
 }
 
-static void nvme_update_disk_info(struct nvme_ctrl *ctrl, struct gendisk *disk,
-               struct nvme_ns_head *head, struct nvme_id_ns *id)
+static bool nvme_update_disk_info(struct nvme_ns *ns, struct nvme_id_ns *id,
+               struct queue_limits *lim)
 {
-       sector_t capacity = nvme_lba_to_sect(head, le64_to_cpu(id->nsze));
+       struct nvme_ns_head *head = ns->head;
        u32 bs = 1U << head->lba_shift;
        u32 atomic_bs, phys_bs, io_opt = 0;
+       bool valid = true;
 
        /*
         * The block layer can't support LBA sizes larger than the page size
@@ -1924,12 +1971,10 @@ static void nvme_update_disk_info(struct nvme_ctrl *ctrl, struct gendisk *disk,
         * allow block I/O.
         */
        if (head->lba_shift > PAGE_SHIFT || head->lba_shift < SECTOR_SHIFT) {
-               capacity = 0;
                bs = (1 << 9);
+               valid = false;
        }
 
-       blk_integrity_unregister(disk);
-
        atomic_bs = phys_bs = bs;
        if (id->nabo == 0) {
                /*
@@ -1940,7 +1985,7 @@ static void nvme_update_disk_info(struct nvme_ctrl *ctrl, struct gendisk *disk,
                if (id->nsfeat & NVME_NS_FEAT_ATOMICS && id->nawupf)
                        atomic_bs = (1 + le16_to_cpu(id->nawupf)) * bs;
                else
-                       atomic_bs = (1 + ctrl->subsys->awupf) * bs;
+                       atomic_bs = (1 + ns->ctrl->subsys->awupf) * bs;
        }
 
        if (id->nsfeat & NVME_NS_FEAT_IO_OPT) {
@@ -1950,36 +1995,20 @@ static void nvme_update_disk_info(struct nvme_ctrl *ctrl, struct gendisk *disk,
                io_opt = bs * (1 + le16_to_cpu(id->nows));
        }
 
-       blk_queue_logical_block_size(disk->queue, bs);
        /*
         * Linux filesystems assume writing a single physical block is
         * an atomic operation. Hence limit the physical block size to the
         * value of the Atomic Write Unit Power Fail parameter.
         */
-       blk_queue_physical_block_size(disk->queue, min(phys_bs, atomic_bs));
-       blk_queue_io_min(disk->queue, phys_bs);
-       blk_queue_io_opt(disk->queue, io_opt);
-
-       /*
-        * Register a metadata profile for PI, or the plain non-integrity NVMe
-        * metadata masquerading as Type 0 if supported, otherwise reject block
-        * I/O to namespaces with metadata except when the namespace supports
-        * PI, as it can strip/insert in that case.
-        */
-       if (head->ms) {
-               if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) &&
-                   (head->features & NVME_NS_METADATA_SUPPORTED))
-                       nvme_init_integrity(disk, head,
-                                           ctrl->max_integrity_segments);
-               else if (!nvme_ns_has_pi(head))
-                       capacity = 0;
-       }
-
-       set_capacity_and_notify(disk, capacity);
-
-       nvme_config_discard(ctrl, disk, head);
-       blk_queue_max_write_zeroes_sectors(disk->queue,
-                                          ctrl->max_zeroes_sectors);
+       lim->logical_block_size = bs;
+       lim->physical_block_size = min(phys_bs, atomic_bs);
+       lim->io_min = phys_bs;
+       lim->io_opt = io_opt;
+       if (ns->ctrl->quirks & NVME_QUIRK_DEALLOCATE_ZEROES)
+               lim->max_write_zeroes_sectors = UINT_MAX;
+       else
+               lim->max_write_zeroes_sectors = ns->ctrl->max_zeroes_sectors;
+       return valid;
 }
 
 static bool nvme_ns_is_readonly(struct nvme_ns *ns, struct nvme_ns_info *info)
@@ -1993,7 +2022,8 @@ static inline bool nvme_first_scan(struct gendisk *disk)
        return !disk_live(disk);
 }
 
-static void nvme_set_chunk_sectors(struct nvme_ns *ns, struct nvme_id_ns *id)
+static void nvme_set_chunk_sectors(struct nvme_ns *ns, struct nvme_id_ns *id,
+               struct queue_limits *lim)
 {
        struct nvme_ctrl *ctrl = ns->ctrl;
        u32 iob;
@@ -2021,38 +2051,36 @@ static void nvme_set_chunk_sectors(struct nvme_ns *ns, struct nvme_id_ns *id)
                return;
        }
 
-       blk_queue_chunk_sectors(ns->queue, iob);
+       lim->chunk_sectors = iob;
 }
 
 static int nvme_update_ns_info_generic(struct nvme_ns *ns,
                struct nvme_ns_info *info)
 {
+       struct queue_limits lim;
+       int ret;
+
        blk_mq_freeze_queue(ns->disk->queue);
-       nvme_set_queue_limits(ns->ctrl, ns->queue);
+       lim = queue_limits_start_update(ns->disk->queue);
+       nvme_set_ctrl_limits(ns->ctrl, &lim);
+       ret = queue_limits_commit_update(ns->disk->queue, &lim);
        set_disk_ro(ns->disk, nvme_ns_is_readonly(ns, info));
        blk_mq_unfreeze_queue(ns->disk->queue);
 
-       if (nvme_ns_head_multipath(ns->head)) {
-               blk_mq_freeze_queue(ns->head->disk->queue);
-               set_disk_ro(ns->head->disk, nvme_ns_is_readonly(ns, info));
-               nvme_mpath_revalidate_paths(ns);
-               blk_stack_limits(&ns->head->disk->queue->limits,
-                                &ns->queue->limits, 0);
-               ns->head->disk->flags |= GENHD_FL_HIDDEN;
-               blk_mq_unfreeze_queue(ns->head->disk->queue);
-       }
-
        /* Hide the block-interface for these devices */
-       ns->disk->flags |= GENHD_FL_HIDDEN;
-       set_bit(NVME_NS_READY, &ns->flags);
-
-       return 0;
+       if (!ret)
+               ret = -ENODEV;
+       return ret;
 }
 
 static int nvme_update_ns_info_block(struct nvme_ns *ns,
                struct nvme_ns_info *info)
 {
+       bool vwc = ns->ctrl->vwc & NVME_CTRL_VWC_PRESENT;
+       struct queue_limits lim;
+       struct nvme_id_ns_nvm *nvm = NULL;
        struct nvme_id_ns *id;
+       sector_t capacity;
        unsigned lbaf;
        int ret;
 
@@ -2064,30 +2092,52 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
                /* namespace not allocated or attached */
                info->is_removed = true;
                ret = -ENODEV;
-               goto error;
+               goto out;
+       }
+
+       if (ns->ctrl->ctratt & NVME_CTRL_ATTR_ELBAS) {
+               ret = nvme_identify_ns_nvm(ns->ctrl, info->nsid, &nvm);
+               if (ret < 0)
+                       goto out;
        }
 
        blk_mq_freeze_queue(ns->disk->queue);
        lbaf = nvme_lbaf_index(id->flbas);
        ns->head->lba_shift = id->lbaf[lbaf].ds;
        ns->head->nuse = le64_to_cpu(id->nuse);
-       nvme_set_queue_limits(ns->ctrl, ns->queue);
+       capacity = nvme_lba_to_sect(ns->head, le64_to_cpu(id->nsze));
 
-       ret = nvme_configure_metadata(ns->ctrl, ns->head, id);
-       if (ret < 0) {
-               blk_mq_unfreeze_queue(ns->disk->queue);
-               goto out;
-       }
-       nvme_set_chunk_sectors(ns, id);
-       nvme_update_disk_info(ns->ctrl, ns->disk, ns->head, id);
-
-       if (ns->head->ids.csi == NVME_CSI_ZNS) {
-               ret = nvme_update_zone_info(ns, lbaf);
+       lim = queue_limits_start_update(ns->disk->queue);
+       nvme_set_ctrl_limits(ns->ctrl, &lim);
+       nvme_configure_metadata(ns->ctrl, ns->head, id, nvm);
+       nvme_set_chunk_sectors(ns, id, &lim);
+       if (!nvme_update_disk_info(ns, id, &lim))
+               capacity = 0;
+       nvme_config_discard(ns, &lim);
+       if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) &&
+           ns->head->ids.csi == NVME_CSI_ZNS) {
+               ret = nvme_update_zone_info(ns, lbaf, &lim);
                if (ret) {
                        blk_mq_unfreeze_queue(ns->disk->queue);
                        goto out;
                }
        }
+       ret = queue_limits_commit_update(ns->disk->queue, &lim);
+       if (ret) {
+               blk_mq_unfreeze_queue(ns->disk->queue);
+               goto out;
+       }
+
+       /*
+        * Register a metadata profile for PI, or the plain non-integrity NVMe
+        * metadata masquerading as Type 0 if supported, otherwise reject block
+        * I/O to namespaces with metadata except when the namespace supports
+        * PI, as it can strip/insert in that case.
+        */
+       if (!nvme_init_integrity(ns->disk, ns->head))
+               capacity = 0;
+
+       set_capacity_and_notify(ns->disk, capacity);
 
        /*
         * Only set the DEAC bit if the device guarantees that reads from
@@ -2098,62 +2148,81 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
        if ((id->dlfeat & 0x7) == 0x1 && (id->dlfeat & (1 << 3)))
                ns->head->features |= NVME_NS_DEAC;
        set_disk_ro(ns->disk, nvme_ns_is_readonly(ns, info));
+       blk_queue_write_cache(ns->disk->queue, vwc, vwc);
        set_bit(NVME_NS_READY, &ns->flags);
        blk_mq_unfreeze_queue(ns->disk->queue);
 
        if (blk_queue_is_zoned(ns->queue)) {
-               ret = nvme_revalidate_zones(ns);
+               ret = blk_revalidate_disk_zones(ns->disk, NULL);
                if (ret && !nvme_first_scan(ns->disk))
                        goto out;
        }
 
-       if (nvme_ns_head_multipath(ns->head)) {
-               blk_mq_freeze_queue(ns->head->disk->queue);
-               nvme_update_disk_info(ns->ctrl, ns->head->disk, ns->head, id);
-               set_disk_ro(ns->head->disk, nvme_ns_is_readonly(ns, info));
-               nvme_mpath_revalidate_paths(ns);
-               blk_stack_limits(&ns->head->disk->queue->limits,
-                                &ns->queue->limits, 0);
-               disk_update_readahead(ns->head->disk);
-               blk_mq_unfreeze_queue(ns->head->disk->queue);
-       }
-
        ret = 0;
 out:
-       /*
-        * If probing fails due an unsupported feature, hide the block device,
-        * but still allow other access.
-        */
-       if (ret == -ENODEV) {
-               ns->disk->flags |= GENHD_FL_HIDDEN;
-               set_bit(NVME_NS_READY, &ns->flags);
-               ret = 0;
-       }
-
-error:
+       kfree(nvm);
        kfree(id);
        return ret;
 }
 
 static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
 {
+       bool unsupported = false;
+       int ret;
+
        switch (info->ids.csi) {
        case NVME_CSI_ZNS:
                if (!IS_ENABLED(CONFIG_BLK_DEV_ZONED)) {
                        dev_info(ns->ctrl->device,
        "block device for nsid %u not supported without CONFIG_BLK_DEV_ZONED\n",
                                info->nsid);
-                       return nvme_update_ns_info_generic(ns, info);
+                       ret = nvme_update_ns_info_generic(ns, info);
+                       break;
                }
-               return nvme_update_ns_info_block(ns, info);
+               ret = nvme_update_ns_info_block(ns, info);
+               break;
        case NVME_CSI_NVM:
-               return nvme_update_ns_info_block(ns, info);
+               ret = nvme_update_ns_info_block(ns, info);
+               break;
        default:
                dev_info(ns->ctrl->device,
                        "block device for nsid %u not supported (csi %u)\n",
                        info->nsid, info->ids.csi);
-               return nvme_update_ns_info_generic(ns, info);
+               ret = nvme_update_ns_info_generic(ns, info);
+               break;
+       }
+
+       /*
+        * If probing fails due an unsupported feature, hide the block device,
+        * but still allow other access.
+        */
+       if (ret == -ENODEV) {
+               ns->disk->flags |= GENHD_FL_HIDDEN;
+               set_bit(NVME_NS_READY, &ns->flags);
+               unsupported = true;
+               ret = 0;
+       }
+
+       if (!ret && nvme_ns_head_multipath(ns->head)) {
+               struct queue_limits lim;
+
+               blk_mq_freeze_queue(ns->head->disk->queue);
+               if (unsupported)
+                       ns->head->disk->flags |= GENHD_FL_HIDDEN;
+               else
+                       nvme_init_integrity(ns->head->disk, ns->head);
+               set_capacity_and_notify(ns->head->disk, get_capacity(ns->disk));
+               set_disk_ro(ns->head->disk, nvme_ns_is_readonly(ns, info));
+               nvme_mpath_revalidate_paths(ns);
+
+               lim = queue_limits_start_update(ns->head->disk->queue);
+               queue_limits_stack_bdev(&lim, ns->disk->part0, 0,
+                                       ns->head->disk->disk_name);
+               ret = queue_limits_commit_update(ns->head->disk->queue, &lim);
+               blk_mq_unfreeze_queue(ns->head->disk->queue);
        }
+
+       return ret;
 }
 
 #ifdef CONFIG_BLK_SED_OPAL
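The multipath head disk is updated with the same staged-limits discipline, restacking the bottom device's limits through queue_limits_stack_bdev() while the queue is frozen (condensed from the hunk above):

    blk_mq_freeze_queue(ns->head->disk->queue);
    lim = queue_limits_start_update(ns->head->disk->queue);
    queue_limits_stack_bdev(&lim, ns->disk->part0, 0,
                            ns->head->disk->disk_name);
    ret = queue_limits_commit_update(ns->head->disk->queue, &lim);
    blk_mq_unfreeze_queue(ns->head->disk->queue);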
@@ -2172,7 +2241,7 @@ static int nvme_sec_submit(void *data, u16 spsp, u8 secp, void *buffer, size_t l
        cmd.common.cdw11 = cpu_to_le32(len);
 
        return __nvme_submit_sync_cmd(ctrl->admin_q, &cmd, NULL, buffer, len,
-                       NVME_QID_ANY, 1, 0);
+                       NVME_QID_ANY, NVME_SUBMIT_AT_HEAD);
 }
 
 static void nvme_configure_opal(struct nvme_ctrl *ctrl, bool was_suspended)
@@ -2828,7 +2897,7 @@ static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
        subsys->awupf = le16_to_cpu(id->awupf);
        nvme_mpath_default_iopolicy(subsys);
 
-       subsys->dev.class = nvme_subsys_class;
+       subsys->dev.class = &nvme_subsys_class;
        subsys->dev.release = nvme_release_subsystem;
        subsys->dev.groups = nvme_subsys_attrs_groups;
        dev_set_name(&subsys->dev, "nvme-subsys%d", ctrl->instance);
@@ -3068,11 +3137,17 @@ static int nvme_check_ctrl_fabric_info(struct nvme_ctrl *ctrl, struct nvme_id_ct
                return -EINVAL;
        }
 
+       if (!ctrl->maxcmd) {
+               dev_err(ctrl->device, "Maximum outstanding commands is 0\n");
+               return -EINVAL;
+       }
+
        return 0;
 }
 
 static int nvme_init_identify(struct nvme_ctrl *ctrl)
 {
+       struct queue_limits lim;
        struct nvme_id_ctrl *id;
        u32 max_hw_sectors;
        bool prev_apst_enabled;
@@ -3139,7 +3214,12 @@ static int nvme_init_identify(struct nvme_ctrl *ctrl)
        ctrl->max_hw_sectors =
                min_not_zero(ctrl->max_hw_sectors, max_hw_sectors);
 
-       nvme_set_queue_limits(ctrl, ctrl->admin_q);
+       lim = queue_limits_start_update(ctrl->admin_q);
+       nvme_set_ctrl_limits(ctrl, &lim);
+       ret = queue_limits_commit_update(ctrl->admin_q, &lim);
+       if (ret)
+               goto out_free;
+
        ctrl->sgls = le32_to_cpu(id->sgls);
        ctrl->kas = le16_to_cpu(id->kas);
        ctrl->max_namespaces = le32_to_cpu(id->mnan);
@@ -3371,7 +3451,7 @@ int nvme_cdev_add(struct cdev *cdev, struct device *cdev_device,
        if (minor < 0)
                return minor;
        cdev_device->devt = MKDEV(MAJOR(nvme_ns_chr_devt), minor);
-       cdev_device->class = nvme_ns_chr_class;
+       cdev_device->class = &nvme_ns_chr_class;
        cdev_device->release = nvme_cdev_rel;
        device_initialize(cdev_device);
        cdev_init(cdev, fops);
@@ -3643,7 +3723,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, struct nvme_ns_info *info)
        if (!ns)
                return;
 
-       disk = blk_mq_alloc_disk(ctrl->tagset, ns);
+       disk = blk_mq_alloc_disk(ctrl->tagset, NULL, ns);
        if (IS_ERR(disk))
                goto out_free_ns;
        disk->fops = &nvme_bdev_ops;
@@ -3714,6 +3794,13 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, struct nvme_ns_info *info)
        nvme_mpath_add_disk(ns, info->anagrpid);
        nvme_fault_inject_init(&ns->fault_inject, ns->disk->disk_name);
 
+       /*
+        * Set ns->disk->device->driver_data to ns so we can access
+        * ns->head->passthru_err_log_enabled in
+        * nvme_io_passthru_err_log_enabled_[store | show]().
+        */
+       dev_set_drvdata(disk_to_dev(ns->disk), ns);
+
        return;
 
  out_cleanup_ns_from_list:
@@ -4138,6 +4225,7 @@ static bool nvme_ctrl_pp_status(struct nvme_ctrl *ctrl)
 static void nvme_get_fw_slot_info(struct nvme_ctrl *ctrl)
 {
        struct nvme_fw_slot_info_log *log;
+       u8 next_fw_slot, cur_fw_slot;
 
        log = kmalloc(sizeof(*log), GFP_KERNEL);
        if (!log)
@@ -4149,13 +4237,15 @@ static void nvme_get_fw_slot_info(struct nvme_ctrl *ctrl)
                goto out_free_log;
        }
 
-       if (log->afi & 0x70 || !(log->afi & 0x7)) {
+       cur_fw_slot = log->afi & 0x7;
+       next_fw_slot = (log->afi & 0x70) >> 4;
+       if (!cur_fw_slot || (next_fw_slot && (cur_fw_slot != next_fw_slot))) {
                dev_info(ctrl->device,
                         "Firmware is activated after next Controller Level Reset\n");
                goto out_free_log;
        }
 
-       memcpy(ctrl->subsys->firmware_rev, &log->frs[(log->afi & 0x7) - 1],
+       memcpy(ctrl->subsys->firmware_rev, &log->frs[cur_fw_slot - 1],
                sizeof(ctrl->subsys->firmware_rev));
 
 out_free_log:
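The firmware-slot fix makes the AFI bitfield decode explicit: bits 2:0 hold the currently active slot and bits 6:4 the slot to be activated at the next reset, and the revision string is only copied when no different slot is pending. The decode, condensed from the hunk above:

    u8 cur_fw_slot  = log->afi & 0x7;         /* bits 2:0 */
    u8 next_fw_slot = (log->afi & 0x70) >> 4; /* bits 6:4 */

    if (!cur_fw_slot || (next_fw_slot && cur_fw_slot != next_fw_slot))
            return;     /* activation pending; keep the old revision */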
@@ -4294,6 +4384,7 @@ EXPORT_SYMBOL_GPL(nvme_complete_async_event);
 int nvme_alloc_admin_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
                const struct blk_mq_ops *ops, unsigned int cmd_size)
 {
+       struct queue_limits lim = {};
        int ret;
 
        memset(set, 0, sizeof(*set));
@@ -4313,14 +4404,14 @@ int nvme_alloc_admin_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
        if (ret)
                return ret;
 
-       ctrl->admin_q = blk_mq_init_queue(set);
+       ctrl->admin_q = blk_mq_alloc_queue(set, &lim, NULL);
        if (IS_ERR(ctrl->admin_q)) {
                ret = PTR_ERR(ctrl->admin_q);
                goto out_free_tagset;
        }
 
        if (ctrl->ops->flags & NVME_F_FABRICS) {
-               ctrl->fabrics_q = blk_mq_init_queue(set);
+               ctrl->fabrics_q = blk_mq_alloc_queue(set, NULL, NULL);
                if (IS_ERR(ctrl->fabrics_q)) {
                        ret = PTR_ERR(ctrl->fabrics_q);
                        goto out_cleanup_admin_q;
@@ -4384,7 +4475,7 @@ int nvme_alloc_io_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
                return ret;
 
        if (ctrl->ops->flags & NVME_F_FABRICS) {
-               ctrl->connect_q = blk_mq_init_queue(set);
+               ctrl->connect_q = blk_mq_alloc_queue(set, NULL, NULL);
                if (IS_ERR(ctrl->connect_q)) {
                        ret = PTR_ERR(ctrl->connect_q);
                        goto out_free_tag_set;
@@ -4514,6 +4605,7 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
        int ret;
 
        WRITE_ONCE(ctrl->state, NVME_CTRL_NEW);
+       ctrl->passthru_err_log_enabled = false;
        clear_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags);
        spin_lock_init(&ctrl->lock);
        mutex_init(&ctrl->scan_lock);
@@ -4553,7 +4645,7 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
        ctrl->device = &ctrl->ctrl_device;
        ctrl->device->devt = MKDEV(MAJOR(nvme_ctrl_base_chr_devt),
                        ctrl->instance);
-       ctrl->device->class = nvme_class;
+       ctrl->device->class = &nvme_class;
        ctrl->device->parent = ctrl->dev;
        if (ops->dev_attr_groups)
                ctrl->device->groups = ops->dev_attr_groups;
@@ -4786,42 +4878,36 @@ static int __init nvme_core_init(void)
        if (result < 0)
                goto destroy_delete_wq;
 
-       nvme_class = class_create("nvme");
-       if (IS_ERR(nvme_class)) {
-               result = PTR_ERR(nvme_class);
+       result = class_register(&nvme_class);
+       if (result)
                goto unregister_chrdev;
-       }
-       nvme_class->dev_uevent = nvme_class_uevent;
 
-       nvme_subsys_class = class_create("nvme-subsystem");
-       if (IS_ERR(nvme_subsys_class)) {
-               result = PTR_ERR(nvme_subsys_class);
+       result = class_register(&nvme_subsys_class);
+       if (result)
                goto destroy_class;
-       }
 
        result = alloc_chrdev_region(&nvme_ns_chr_devt, 0, NVME_MINORS,
                                     "nvme-generic");
        if (result < 0)
                goto destroy_subsys_class;
 
-       nvme_ns_chr_class = class_create("nvme-generic");
-       if (IS_ERR(nvme_ns_chr_class)) {
-               result = PTR_ERR(nvme_ns_chr_class);
+       result = class_register(&nvme_ns_chr_class);
+       if (result)
                goto unregister_generic_ns;
-       }
+
        result = nvme_init_auth();
        if (result)
                goto destroy_ns_chr;
        return 0;
 
 destroy_ns_chr:
-       class_destroy(nvme_ns_chr_class);
+       class_unregister(&nvme_ns_chr_class);
 unregister_generic_ns:
        unregister_chrdev_region(nvme_ns_chr_devt, NVME_MINORS);
 destroy_subsys_class:
-       class_destroy(nvme_subsys_class);
+       class_unregister(&nvme_subsys_class);
 destroy_class:
-       class_destroy(nvme_class);
+       class_unregister(&nvme_class);
 unregister_chrdev:
        unregister_chrdev_region(nvme_ctrl_base_chr_devt, NVME_MINORS);
 destroy_delete_wq:
@@ -4837,9 +4923,9 @@ out:
 static void __exit nvme_core_exit(void)
 {
        nvme_exit_auth();
-       class_destroy(nvme_ns_chr_class);
-       class_destroy(nvme_subsys_class);
-       class_destroy(nvme_class);
+       class_unregister(&nvme_ns_chr_class);
+       class_unregister(&nvme_subsys_class);
+       class_unregister(&nvme_class);
        unregister_chrdev_region(nvme_ns_chr_devt, NVME_MINORS);
        unregister_chrdev_region(nvme_ctrl_base_chr_devt, NVME_MINORS);
        destroy_workqueue(nvme_delete_wq);
@@ -4851,5 +4937,6 @@ static void __exit nvme_core_exit(void)
 
 MODULE_LICENSE("GPL");
 MODULE_VERSION("1.0");
+MODULE_DESCRIPTION("NVMe host core framework");
 module_init(nvme_core_init);
 module_exit(nvme_core_exit);
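The class conversions at the end of core.c (and in fabrics.c below) follow the tree-wide move away from class_create(): each class becomes a statically defined const object registered with class_register() and torn down with class_unregister(), so the error paths no longer juggle ERR_PTR values. A minimal sketch (demo_class is an illustrative name, not from this diff):

    static const struct class demo_class = {
            .name = "demo",
    };

    static int __init demo_init(void)
    {
            return class_register(&demo_class);
    }

    static void __exit demo_exit(void)
    {
            class_unregister(&demo_class);
    }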
index b5752a77ad989f04a14ef9416f6244cf5255441d..1f0ea1f32d22f5badee3af7b7827172c31c9ecf7 100644 (file)
@@ -180,7 +180,7 @@ int nvmf_reg_read32(struct nvme_ctrl *ctrl, u32 off, u32 *val)
        cmd.prop_get.offset = cpu_to_le32(off);
 
        ret = __nvme_submit_sync_cmd(ctrl->fabrics_q, &cmd, &res, NULL, 0,
-                       NVME_QID_ANY, 0, 0);
+                       NVME_QID_ANY, 0);
 
        if (ret >= 0)
                *val = le64_to_cpu(res.u64);
@@ -226,7 +226,7 @@ int nvmf_reg_read64(struct nvme_ctrl *ctrl, u32 off, u64 *val)
        cmd.prop_get.offset = cpu_to_le32(off);
 
        ret = __nvme_submit_sync_cmd(ctrl->fabrics_q, &cmd, &res, NULL, 0,
-                       NVME_QID_ANY, 0, 0);
+                       NVME_QID_ANY, 0);
 
        if (ret >= 0)
                *val = le64_to_cpu(res.u64);
@@ -271,7 +271,7 @@ int nvmf_reg_write32(struct nvme_ctrl *ctrl, u32 off, u32 val)
        cmd.prop_set.value = cpu_to_le64(val);
 
        ret = __nvme_submit_sync_cmd(ctrl->fabrics_q, &cmd, NULL, NULL, 0,
-                       NVME_QID_ANY, 0, 0);
+                       NVME_QID_ANY, 0);
        if (unlikely(ret))
                dev_err(ctrl->device,
                        "Property Set error: %d, offset %#x\n",
@@ -450,8 +450,10 @@ int nvmf_connect_admin_queue(struct nvme_ctrl *ctrl)
                return -ENOMEM;
 
        ret = __nvme_submit_sync_cmd(ctrl->fabrics_q, &cmd, &res,
-                       data, sizeof(*data), NVME_QID_ANY, 1,
-                       BLK_MQ_REQ_RESERVED | BLK_MQ_REQ_NOWAIT);
+                       data, sizeof(*data), NVME_QID_ANY,
+                       NVME_SUBMIT_AT_HEAD |
+                       NVME_SUBMIT_NOWAIT |
+                       NVME_SUBMIT_RESERVED);
        if (ret) {
                nvmf_log_connect_error(ctrl, ret, le32_to_cpu(res.u32),
                                       &cmd, data);
@@ -525,11 +527,14 @@ int nvmf_connect_io_queue(struct nvme_ctrl *ctrl, u16 qid)
                return -ENOMEM;
 
        ret = __nvme_submit_sync_cmd(ctrl->connect_q, &cmd, &res,
-                       data, sizeof(*data), qid, 1,
-                       BLK_MQ_REQ_RESERVED | BLK_MQ_REQ_NOWAIT);
+                       data, sizeof(*data), qid,
+                       NVME_SUBMIT_AT_HEAD |
+                       NVME_SUBMIT_RESERVED |
+                       NVME_SUBMIT_NOWAIT);
        if (ret) {
                nvmf_log_connect_error(ctrl, ret, le32_to_cpu(res.u32),
                                       &cmd, data);
+               goto out_free_data;
        }
        result = le32_to_cpu(res.u32);
        if (result & (NVME_CONNECT_AUTHREQ_ATR | NVME_CONNECT_AUTHREQ_ASCR)) {
@@ -633,7 +638,7 @@ static struct key *nvmf_parse_key(int key_id)
        }
 
        key = key_lookup(key_id);
-       if (!IS_ERR(key))
+       if (IS_ERR(key))
                pr_err("key id %08x not found\n", key_id);
        else
                pr_debug("Using key id %08x\n", key_id);
@@ -1314,7 +1319,10 @@ out_free_opts:
        return ERR_PTR(ret);
 }
 
-static struct class *nvmf_class;
+static const struct class nvmf_class = {
+       .name = "nvme-fabrics",
+};
+
 static struct device *nvmf_device;
 static DEFINE_MUTEX(nvmf_dev_mutex);
 
@@ -1434,15 +1442,14 @@ static int __init nvmf_init(void)
        if (!nvmf_default_host)
                return -ENOMEM;
 
-       nvmf_class = class_create("nvme-fabrics");
-       if (IS_ERR(nvmf_class)) {
+       ret = class_register(&nvmf_class);
+       if (ret) {
                pr_err("couldn't register class nvme-fabrics\n");
-               ret = PTR_ERR(nvmf_class);
                goto out_free_host;
        }
 
        nvmf_device =
-               device_create(nvmf_class, NULL, MKDEV(0, 0), NULL, "ctl");
+               device_create(&nvmf_class, NULL, MKDEV(0, 0), NULL, "ctl");
        if (IS_ERR(nvmf_device)) {
                pr_err("couldn't create nvme-fabrics device!\n");
                ret = PTR_ERR(nvmf_device);
@@ -1458,9 +1465,9 @@ static int __init nvmf_init(void)
        return 0;
 
 out_destroy_device:
-       device_destroy(nvmf_class, MKDEV(0, 0));
+       device_destroy(&nvmf_class, MKDEV(0, 0));
 out_destroy_class:
-       class_destroy(nvmf_class);
+       class_unregister(&nvmf_class);
 out_free_host:
        nvmf_host_put(nvmf_default_host);
        return ret;
@@ -1469,8 +1476,8 @@ out_free_host:
 static void __exit nvmf_exit(void)
 {
        misc_deregister(&nvmf_misc);
-       device_destroy(nvmf_class, MKDEV(0, 0));
-       class_destroy(nvmf_class);
+       device_destroy(&nvmf_class, MKDEV(0, 0));
+       class_unregister(&nvmf_class);
        nvmf_host_put(nvmf_default_host);
 
        BUILD_BUG_ON(sizeof(struct nvmf_common_command) != 64);
@@ -1488,6 +1495,7 @@ static void __exit nvmf_exit(void)
 }
 
 MODULE_LICENSE("GPL v2");
+MODULE_DESCRIPTION("NVMe host fabrics library");
 
 module_init(nvmf_init);
 module_exit(nvmf_exit);
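Two fixes in this file are worth calling out: nvmf_connect_io_queue() now jumps to out_free_data on connect failure, closing a leak of the connect data buffer, and nvmf_parse_key() had its IS_ERR() test inverted, so the "not found" message used to print on the success path. A sketch of the corrected lookup polarity, with a hypothetical wrapper name:

#include <linux/err.h>
#include <linux/key.h>
#include <linux/printk.h>

/* Hypothetical helper showing the corrected IS_ERR() check. */
static struct key *example_parse_key(key_serial_t key_id)
{
	struct key *key = key_lookup(key_id);

	if (IS_ERR(key))	/* lookup failed: report and propagate */
		pr_err("key id %08x not found\n", key_id);
	else
		pr_debug("Using key id %08x\n", key_id);
	return key;
}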
index fbaee5a7be196c08483a41b6673f00e5032bec10..06cc54851b1be39615cdfa6eed1a935dec472f82 100644 (file)
@@ -185,9 +185,11 @@ static inline bool
 nvmf_ctlr_matches_baseopts(struct nvme_ctrl *ctrl,
                        struct nvmf_ctrl_options *opts)
 {
-       if (ctrl->state == NVME_CTRL_DELETING ||
-           ctrl->state == NVME_CTRL_DELETING_NOIO ||
-           ctrl->state == NVME_CTRL_DEAD ||
+       enum nvme_ctrl_state state = nvme_ctrl_state(ctrl);
+
+       if (state == NVME_CTRL_DELETING ||
+           state == NVME_CTRL_DELETING_NOIO ||
+           state == NVME_CTRL_DEAD ||
            strcmp(opts->subsysnqn, ctrl->opts->subsysnqn) ||
            strcmp(opts->host->nqn, ctrl->opts->host->nqn) ||
            !uuid_equal(&opts->host->id, &ctrl->opts->host->id))
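This hunk repeats a pattern used throughout the series: snapshot the controller state once through nvme_ctrl_state() instead of re-reading ctrl->state for each comparison, so every check in a condition sees a single consistent value. A sketch of what such an accessor presumably looks like (an assumption, not the verified definition):

/* Read the state once; concurrent transitions can't make two tests in
 * the same condition observe different values. */
static inline enum nvme_ctrl_state nvme_ctrl_state(struct nvme_ctrl *ctrl)
{
	return READ_ONCE(ctrl->state);
}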
index 16847a316421f393cfbf410f083cd97c657f062e..68a5d971657bb5080f717f5ae1ec5645830aadd5 100644 (file)
@@ -221,11 +221,6 @@ static LIST_HEAD(nvme_fc_lport_list);
 static DEFINE_IDA(nvme_fc_local_port_cnt);
 static DEFINE_IDA(nvme_fc_ctrl_cnt);
 
-static struct workqueue_struct *nvme_fc_wq;
-
-static bool nvme_fc_waiting_to_unload;
-static DECLARE_COMPLETION(nvme_fc_unload_proceed);
-
 /*
  * These items are short-term. They will eventually be moved into
  * a generic FC class. See comments in module init.
@@ -255,8 +250,6 @@ nvme_fc_free_lport(struct kref *ref)
        /* remove from transport list */
        spin_lock_irqsave(&nvme_fc_lock, flags);
        list_del(&lport->port_list);
-       if (nvme_fc_waiting_to_unload && list_empty(&nvme_fc_lport_list))
-               complete(&nvme_fc_unload_proceed);
        spin_unlock_irqrestore(&nvme_fc_lock, flags);
 
        ida_free(&nvme_fc_local_port_cnt, lport->localport.port_num);
@@ -2574,6 +2567,7 @@ static enum blk_eh_timer_return nvme_fc_timeout(struct request *rq)
 {
        struct nvme_fc_fcp_op *op = blk_mq_rq_to_pdu(rq);
        struct nvme_fc_ctrl *ctrl = op->ctrl;
+       u16 qnum = op->queue->qnum;
        struct nvme_fc_cmd_iu *cmdiu = &op->cmd_iu;
        struct nvme_command *sqe = &cmdiu->sqe;
 
@@ -2582,10 +2576,11 @@ static enum blk_eh_timer_return nvme_fc_timeout(struct request *rq)
         * will detect the aborted io and will fail the connection.
         */
        dev_info(ctrl->ctrl.device,
-               "NVME-FC{%d.%d}: io timeout: opcode %d fctype %d w10/11: "
+               "NVME-FC{%d.%d}: io timeout: opcode %d fctype %d (%s) w10/11: "
                "x%08x/x%08x\n",
-               ctrl->cnum, op->queue->qnum, sqe->common.opcode,
-               sqe->connect.fctype, sqe->common.cdw10, sqe->common.cdw11);
+               ctrl->cnum, qnum, sqe->common.opcode, sqe->fabrics.fctype,
+               nvme_fabrics_opcode_str(qnum, sqe),
+               sqe->common.cdw10, sqe->common.cdw11);
        if (__nvme_fc_abort_op(ctrl, op))
                nvme_fc_error_recovery(ctrl, "io timeout abort failed");
 
@@ -3575,8 +3570,8 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
        flush_delayed_work(&ctrl->connect_work);
 
        dev_info(ctrl->ctrl.device,
-               "NVME-FC{%d}: new ctrl: NQN \"%s\"\n",
-               ctrl->cnum, nvmf_ctrl_subsysnqn(&ctrl->ctrl));
+               "NVME-FC{%d}: new ctrl: NQN \"%s\", hostnqn: %s\n",
+               ctrl->cnum, nvmf_ctrl_subsysnqn(&ctrl->ctrl), opts->host->nqn);
 
        return &ctrl->ctrl;
 
@@ -3894,10 +3889,6 @@ static int __init nvme_fc_init_module(void)
 {
        int ret;
 
-       nvme_fc_wq = alloc_workqueue("nvme_fc_wq", WQ_MEM_RECLAIM, 0);
-       if (!nvme_fc_wq)
-               return -ENOMEM;
-
        /*
         * NOTE:
         * It is expected that in the future the kernel will combine
@@ -3915,7 +3906,7 @@ static int __init nvme_fc_init_module(void)
        ret = class_register(&fc_class);
        if (ret) {
                pr_err("couldn't register class fc\n");
-               goto out_destroy_wq;
+               return ret;
        }
 
        /*
@@ -3939,8 +3930,6 @@ out_destroy_device:
        device_destroy(&fc_class, MKDEV(0, 0));
 out_destroy_class:
        class_unregister(&fc_class);
-out_destroy_wq:
-       destroy_workqueue(nvme_fc_wq);
 
        return ret;
 }
@@ -3960,48 +3949,27 @@ nvme_fc_delete_controllers(struct nvme_fc_rport *rport)
        spin_unlock(&rport->lock);
 }
 
-static void
-nvme_fc_cleanup_for_unload(void)
+static void __exit nvme_fc_exit_module(void)
 {
        struct nvme_fc_lport *lport;
        struct nvme_fc_rport *rport;
-
-       list_for_each_entry(lport, &nvme_fc_lport_list, port_list) {
-               list_for_each_entry(rport, &lport->endp_list, endp_list) {
-                       nvme_fc_delete_controllers(rport);
-               }
-       }
-}
-
-static void __exit nvme_fc_exit_module(void)
-{
        unsigned long flags;
-       bool need_cleanup = false;
 
        spin_lock_irqsave(&nvme_fc_lock, flags);
-       nvme_fc_waiting_to_unload = true;
-       if (!list_empty(&nvme_fc_lport_list)) {
-               need_cleanup = true;
-               nvme_fc_cleanup_for_unload();
-       }
+       list_for_each_entry(lport, &nvme_fc_lport_list, port_list)
+               list_for_each_entry(rport, &lport->endp_list, endp_list)
+                       nvme_fc_delete_controllers(rport);
        spin_unlock_irqrestore(&nvme_fc_lock, flags);
-       if (need_cleanup) {
-               pr_info("%s: waiting for ctlr deletes\n", __func__);
-               wait_for_completion(&nvme_fc_unload_proceed);
-               pr_info("%s: ctrl deletes complete\n", __func__);
-       }
+       flush_workqueue(nvme_delete_wq);
 
        nvmf_unregister_transport(&nvme_fc_transport);
 
-       ida_destroy(&nvme_fc_local_port_cnt);
-       ida_destroy(&nvme_fc_ctrl_cnt);
-
        device_destroy(&fc_class, MKDEV(0, 0));
        class_unregister(&fc_class);
-       destroy_workqueue(nvme_fc_wq);
 }
 
 module_init(nvme_fc_init_module);
 module_exit(nvme_fc_exit_module);
 
+MODULE_DESCRIPTION("NVMe host FC transport driver");
 MODULE_LICENSE("GPL v2");
index 18f5c1be5d67e50ecef131bfe5b223e4e5eda5bd..3dfd5ae99ae05e892eb793cb3b21ba0b75dd6e98 100644 (file)
@@ -228,7 +228,7 @@ static int nvme_submit_io(struct nvme_ns *ns, struct nvme_user_io __user *uio)
        length = (io.nblocks + 1) << ns->head->lba_shift;
 
        if ((io.control & NVME_RW_PRINFO_PRACT) &&
-           ns->head->ms == sizeof(struct t10_pi_tuple)) {
+           (ns->head->ms == ns->head->pi_size)) {
                /*
                 * Protection information is stripped/inserted by the
                 * controller.
index 2dd4137a08b284df64788972a067d4282fa92ac7..5397fb428b242cea36f8c36997e134f6463e9f39 100644 (file)
@@ -156,7 +156,7 @@ void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl)
                if (!ns->head->disk)
                        continue;
                kblockd_schedule_work(&ns->head->requeue_work);
-               if (ctrl->state == NVME_CTRL_LIVE)
+               if (nvme_ctrl_state(ns->ctrl) == NVME_CTRL_LIVE)
                        disk_uevent(ns->head->disk, KOBJ_CHANGE);
        }
        up_read(&ctrl->namespaces_rwsem);
@@ -223,13 +223,14 @@ void nvme_mpath_revalidate_paths(struct nvme_ns *ns)
 
 static bool nvme_path_is_disabled(struct nvme_ns *ns)
 {
+       enum nvme_ctrl_state state = nvme_ctrl_state(ns->ctrl);
+
        /*
         * We don't treat NVME_CTRL_DELETING as a disabled path as I/O should
         * still be able to complete assuming that the controller is connected.
         * Otherwise it will fail immediately and return to the requeue list.
         */
-       if (ns->ctrl->state != NVME_CTRL_LIVE &&
-           ns->ctrl->state != NVME_CTRL_DELETING)
+       if (state != NVME_CTRL_LIVE && state != NVME_CTRL_DELETING)
                return true;
        if (test_bit(NVME_NS_ANA_PENDING, &ns->flags) ||
            !test_bit(NVME_NS_READY, &ns->flags))
@@ -331,7 +332,7 @@ out:
 
 static inline bool nvme_path_is_optimized(struct nvme_ns *ns)
 {
-       return ns->ctrl->state == NVME_CTRL_LIVE &&
+       return nvme_ctrl_state(ns->ctrl) == NVME_CTRL_LIVE &&
                ns->ana_state == NVME_ANA_OPTIMIZED;
 }
 
@@ -358,7 +359,7 @@ static bool nvme_available_path(struct nvme_ns_head *head)
        list_for_each_entry_rcu(ns, &head->list, siblings) {
                if (test_bit(NVME_CTRL_FAILFAST_EXPIRED, &ns->ctrl->flags))
                        continue;
-               switch (ns->ctrl->state) {
+               switch (nvme_ctrl_state(ns->ctrl)) {
                case NVME_CTRL_LIVE:
                case NVME_CTRL_RESETTING:
                case NVME_CTRL_CONNECTING:
@@ -515,6 +516,7 @@ static void nvme_requeue_work(struct work_struct *work)
 
 int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
 {
+       struct queue_limits lim;
        bool vwc = false;
 
        mutex_init(&head->lock);
@@ -531,9 +533,14 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
            !nvme_is_unique_nsid(ctrl, head) || !multipath)
                return 0;
 
-       head->disk = blk_alloc_disk(ctrl->numa_node);
-       if (!head->disk)
-               return -ENOMEM;
+       blk_set_stacking_limits(&lim);
+       lim.dma_alignment = 3;
+       if (head->ids.csi != NVME_CSI_ZNS)
+               lim.max_zone_append_sectors = 0;
+
+       head->disk = blk_alloc_disk(&lim, ctrl->numa_node);
+       if (IS_ERR(head->disk))
+               return PTR_ERR(head->disk);
        head->disk->fops = &nvme_ns_head_ops;
        head->disk->private_data = head;
        sprintf(head->disk->disk_name, "nvme%dn%d",
@@ -552,11 +559,6 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
            ctrl->tagset->map[HCTX_TYPE_POLL].nr_queues)
                blk_queue_flag_set(QUEUE_FLAG_POLL, head->disk->queue);
 
-       /* set to a default value of 512 until the disk is validated */
-       blk_queue_logical_block_size(head->disk->queue, 512);
-       blk_set_stacking_limits(&head->disk->queue->limits);
-       blk_queue_dma_alignment(head->disk->queue, 3);
-
        /* we need to propagate up the VMC settings */
        if (ctrl->vwc & NVME_CTRL_VWC_PRESENT)
                vwc = true;
@@ -667,7 +669,7 @@ static void nvme_update_ns_ana_state(struct nvme_ana_group_desc *desc,
         * controller is ready.
         */
        if (nvme_state_is_live(ns->ana_state) &&
-           ns->ctrl->state == NVME_CTRL_LIVE)
+           nvme_ctrl_state(ns->ctrl) == NVME_CTRL_LIVE)
                nvme_mpath_set_live(ns);
 }
 
@@ -748,7 +750,7 @@ static void nvme_ana_work(struct work_struct *work)
 {
        struct nvme_ctrl *ctrl = container_of(work, struct nvme_ctrl, ana_work);
 
-       if (ctrl->state != NVME_CTRL_LIVE)
+       if (nvme_ctrl_state(ctrl) != NVME_CTRL_LIVE)
                return;
 
        nvme_read_ana_log(ctrl);
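nvme_mpath_alloc_disk() now fills a struct queue_limits before allocation and hands it to blk_alloc_disk(), which in this kernel takes the limits as its first argument and reports failure via ERR_PTR() rather than NULL; the post-allocation blk_queue_*() calls go away. A condensed sketch of the new shape:

#include <linux/blkdev.h>
#include <linux/err.h>

/* Sketch: apply limits atomically at disk creation instead of patching
 * the request_queue afterwards. */
static struct gendisk *example_alloc_stacking_disk(int node)
{
	struct queue_limits lim;

	blk_set_stacking_limits(&lim);
	lim.dma_alignment = 3;		/* replaces blk_queue_dma_alignment(q, 3) */

	return blk_alloc_disk(&lim, node);	/* ERR_PTR() on failure */
}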
index 030c8081824065e7fa3d14e1a4918f1c94080565..24193fcb8bd584de277606d57738c1aea5a9cb49 100644 (file)
@@ -263,6 +263,7 @@ enum nvme_ctrl_flags {
 struct nvme_ctrl {
        bool comp_seen;
        bool identified;
+       bool passthru_err_log_enabled;
        enum nvme_ctrl_state state;
        spinlock_t lock;
        struct mutex scan_lock;
@@ -454,6 +455,7 @@ struct nvme_ns_head {
        struct list_head        entry;
        struct kref             ref;
        bool                    shared;
+       bool                    passthru_err_log_enabled;
        int                     instance;
        struct nvme_effects_log *effects;
        u64                     nuse;
@@ -462,6 +464,7 @@ struct nvme_ns_head {
        u16                     ms;
        u16                     pi_size;
        u8                      pi_type;
+       u8                      pi_offset;
        u8                      guard_type;
        u16                     sgs;
        u32                     sws;
@@ -522,7 +525,6 @@ struct nvme_ns {
        struct device           cdev_device;
 
        struct nvme_fault_inject fault_inject;
-
 };
 
 /* NVMe ns supports metadata actions by the controller (generate/strip) */
@@ -805,17 +807,18 @@ blk_status_t nvme_setup_cmd(struct nvme_ns *ns, struct request *req);
 blk_status_t nvme_fail_nonready_command(struct nvme_ctrl *ctrl,
                struct request *req);
 bool __nvme_check_ready(struct nvme_ctrl *ctrl, struct request *rq,
-               bool queue_live);
+               bool queue_live, enum nvme_ctrl_state state);
 
 static inline bool nvme_check_ready(struct nvme_ctrl *ctrl, struct request *rq,
                bool queue_live)
 {
-       if (likely(ctrl->state == NVME_CTRL_LIVE))
+       enum nvme_ctrl_state state = nvme_ctrl_state(ctrl);
+
+       if (likely(state == NVME_CTRL_LIVE))
                return true;
-       if (ctrl->ops->flags & NVME_F_FABRICS &&
-           ctrl->state == NVME_CTRL_DELETING)
+       if (ctrl->ops->flags & NVME_F_FABRICS && state == NVME_CTRL_DELETING)
                return queue_live;
-       return __nvme_check_ready(ctrl, rq, queue_live);
+       return __nvme_check_ready(ctrl, rq, queue_live, state);
 }
 
 /*
@@ -836,12 +839,27 @@ static inline bool nvme_is_unique_nsid(struct nvme_ctrl *ctrl,
                (ctrl->ctratt & NVME_CTRL_CTRATT_NVM_SETS);
 }
 
+/*
+ * Flags for __nvme_submit_sync_cmd()
+ */
+typedef __u32 __bitwise nvme_submit_flags_t;
+
+enum {
+       /* Insert request at the head of the queue */
+       NVME_SUBMIT_AT_HEAD  = (__force nvme_submit_flags_t)(1 << 0),
+       /* Set BLK_MQ_REQ_NOWAIT when allocating request */
+       NVME_SUBMIT_NOWAIT = (__force nvme_submit_flags_t)(1 << 1),
+       /* Set BLK_MQ_REQ_RESERVED when allocating request */
+       NVME_SUBMIT_RESERVED = (__force nvme_submit_flags_t)(1 << 2),
+       /* Retry command when NVME_SC_DNR is not set in the result */
+       NVME_SUBMIT_RETRY = (__force nvme_submit_flags_t)(1 << 3),
+};
+
 int nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
                void *buf, unsigned bufflen);
 int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
                union nvme_result *result, void *buffer, unsigned bufflen,
-               int qid, int at_head,
-               blk_mq_req_flags_t flags);
+               int qid, nvme_submit_flags_t flags);
 int nvme_set_features(struct nvme_ctrl *dev, unsigned int fid,
                      unsigned int dword11, void *buffer, size_t buflen,
                      u32 *result);
@@ -1018,11 +1036,11 @@ static inline bool nvme_disk_is_ns_head(struct gendisk *disk)
 }
 #endif /* CONFIG_NVME_MULTIPATH */
 
-int nvme_revalidate_zones(struct nvme_ns *ns);
 int nvme_ns_report_zones(struct nvme_ns *ns, sector_t sector,
                unsigned int nr_zones, report_zones_cb cb, void *data);
+int nvme_update_zone_info(struct nvme_ns *ns, unsigned lbaf,
+               struct queue_limits *lim);
 #ifdef CONFIG_BLK_DEV_ZONED
-int nvme_update_zone_info(struct nvme_ns *ns, unsigned lbaf);
 blk_status_t nvme_setup_zone_mgmt_send(struct nvme_ns *ns, struct request *req,
                                       struct nvme_command *cmnd,
                                       enum nvme_zone_mgmt_action action);
@@ -1033,13 +1051,6 @@ static inline blk_status_t nvme_setup_zone_mgmt_send(struct nvme_ns *ns,
 {
        return BLK_STS_NOTSUPP;
 }
-
-static inline int nvme_update_zone_info(struct nvme_ns *ns, unsigned lbaf)
-{
-       dev_warn(ns->ctrl->device,
-                "Please enable CONFIG_BLK_DEV_ZONED to support ZNS devices\n");
-       return -EPROTONOSUPPORT;
-}
 #endif
 
 static inline struct nvme_ns *nvme_get_ns_from_dev(struct device *dev)
@@ -1124,35 +1135,42 @@ static inline bool nvme_multi_css(struct nvme_ctrl *ctrl)
 }
 
 #ifdef CONFIG_NVME_VERBOSE_ERRORS
-const unsigned char *nvme_get_error_status_str(u16 status);
-const unsigned char *nvme_get_opcode_str(u8 opcode);
-const unsigned char *nvme_get_admin_opcode_str(u8 opcode);
-const unsigned char *nvme_get_fabrics_opcode_str(u8 opcode);
+const char *nvme_get_error_status_str(u16 status);
+const char *nvme_get_opcode_str(u8 opcode);
+const char *nvme_get_admin_opcode_str(u8 opcode);
+const char *nvme_get_fabrics_opcode_str(u8 opcode);
 #else /* CONFIG_NVME_VERBOSE_ERRORS */
-static inline const unsigned char *nvme_get_error_status_str(u16 status)
+static inline const char *nvme_get_error_status_str(u16 status)
 {
        return "I/O Error";
 }
-static inline const unsigned char *nvme_get_opcode_str(u8 opcode)
+static inline const char *nvme_get_opcode_str(u8 opcode)
 {
        return "I/O Cmd";
 }
-static inline const unsigned char *nvme_get_admin_opcode_str(u8 opcode)
+static inline const char *nvme_get_admin_opcode_str(u8 opcode)
 {
        return "Admin Cmd";
 }
 
-static inline const unsigned char *nvme_get_fabrics_opcode_str(u8 opcode)
+static inline const char *nvme_get_fabrics_opcode_str(u8 opcode)
 {
        return "Fabrics Cmd";
 }
 #endif /* CONFIG_NVME_VERBOSE_ERRORS */
 
-static inline const unsigned char *nvme_opcode_str(int qid, u8 opcode, u8 fctype)
+static inline const char *nvme_opcode_str(int qid, u8 opcode)
 {
-       if (opcode == nvme_fabrics_command)
-               return nvme_get_fabrics_opcode_str(fctype);
        return qid ? nvme_get_opcode_str(opcode) :
                nvme_get_admin_opcode_str(opcode);
 }
+
+static inline const char *nvme_fabrics_opcode_str(
+               int qid, const struct nvme_command *cmd)
+{
+       if (nvme_is_fabrics(cmd))
+               return nvme_get_fabrics_opcode_str(cmd->fabrics.fctype);
+
+       return nvme_opcode_str(qid, cmd->common.opcode);
+}
 #endif /* _NVME_H */
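The new nvme_submit_flags_t uses the kernel's __bitwise idiom (the same one behind gfp_t and blk_mq_req_flags_t): to sparse the typedef is a distinct restricted type, so passing a bare integer, or a BLK_MQ_REQ_* value where an NVME_SUBMIT_* one is expected, produces a static-analysis warning. A minimal sketch of the idiom with hypothetical flag names:

#include <linux/types.h>

typedef __u32 __bitwise example_flags_t;

enum {
	EXAMPLE_FLAG_A = (__force example_flags_t)(1 << 0),
	EXAMPLE_FLAG_B = (__force example_flags_t)(1 << 1),
};

/* Under sparse, callers must pass example_flags_t; mixing in a flag
 * from another __bitwise family is flagged at build time. */
static inline bool example_has_a(example_flags_t flags)
{
	return (flags & EXAMPLE_FLAG_A) != 0;
}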
index c1d6357ec98a0107acacdae47024c3110b3cfb9f..e6267a6aa3801e5d76e7d1dc4a509ba0e9fc0159 100644 (file)
@@ -1349,7 +1349,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req)
                dev_warn(dev->ctrl.device,
                         "I/O tag %d (%04x) opcode %#x (%s) QID %d timeout, reset controller\n",
                         req->tag, nvme_cid(req), opcode,
-                        nvme_opcode_str(nvmeq->qid, opcode, 0), nvmeq->qid);
+                        nvme_opcode_str(nvmeq->qid, opcode), nvmeq->qid);
                nvme_req(req)->flags |= NVME_REQ_CANCELLED;
                goto disable;
        }
@@ -3543,5 +3543,6 @@ static void __exit nvme_exit(void)
 MODULE_AUTHOR("Matthew Wilcox <willy@linux.intel.com>");
 MODULE_LICENSE("GPL");
 MODULE_VERSION("1.0");
+MODULE_DESCRIPTION("NVMe host PCIe transport driver");
 module_init(nvme_init);
 module_exit(nvme_exit);
index 11dde0d830442df31c74499655e86566ab995a66..366f0bb4ebfc1d9757aad5bdce2287785142403a 100644 (file)
@@ -1006,6 +1006,7 @@ static int nvme_rdma_setup_ctrl(struct nvme_rdma_ctrl *ctrl, bool new)
 {
        int ret;
        bool changed;
+       u16 max_queue_size;
 
        ret = nvme_rdma_configure_admin_queue(ctrl, new);
        if (ret)
@@ -1030,11 +1031,16 @@ static int nvme_rdma_setup_ctrl(struct nvme_rdma_ctrl *ctrl, bool new)
                        ctrl->ctrl.opts->queue_size, ctrl->ctrl.sqsize + 1);
        }
 
-       if (ctrl->ctrl.sqsize + 1 > NVME_RDMA_MAX_QUEUE_SIZE) {
+       if (ctrl->ctrl.max_integrity_segments)
+               max_queue_size = NVME_RDMA_MAX_METADATA_QUEUE_SIZE;
+       else
+               max_queue_size = NVME_RDMA_MAX_QUEUE_SIZE;
+
+       if (ctrl->ctrl.sqsize + 1 > max_queue_size) {
                dev_warn(ctrl->ctrl.device,
-                       "ctrl sqsize %u > max queue size %u, clamping down\n",
-                       ctrl->ctrl.sqsize + 1, NVME_RDMA_MAX_QUEUE_SIZE);
-               ctrl->ctrl.sqsize = NVME_RDMA_MAX_QUEUE_SIZE - 1;
+                        "ctrl sqsize %u > max queue size %u, clamping down\n",
+                        ctrl->ctrl.sqsize + 1, max_queue_size);
+               ctrl->ctrl.sqsize = max_queue_size - 1;
        }
 
        if (ctrl->ctrl.sqsize + 1 > ctrl->ctrl.maxcmd) {
@@ -1410,6 +1416,8 @@ static int nvme_rdma_map_sg_pi(struct nvme_rdma_queue *queue,
        struct nvme_ns *ns = rq->q->queuedata;
        struct bio *bio = rq->bio;
        struct nvme_keyed_sgl_desc *sg = &c->common.dptr.ksgl;
+       struct blk_integrity *bi = blk_get_integrity(bio->bi_bdev->bd_disk);
+       u32 xfer_len;
        int nr;
 
        req->mr = ib_mr_pool_get(queue->qp, &queue->qp->sig_mrs);
@@ -1422,8 +1430,7 @@ static int nvme_rdma_map_sg_pi(struct nvme_rdma_queue *queue,
        if (unlikely(nr))
                goto mr_put;
 
-       nvme_rdma_set_sig_attrs(blk_get_integrity(bio->bi_bdev->bd_disk), c,
-                               req->mr->sig_attrs, ns->head->pi_type);
+       nvme_rdma_set_sig_attrs(bi, c, req->mr->sig_attrs, ns->head->pi_type);
        nvme_rdma_set_prot_checks(c, &req->mr->sig_attrs->check_mask);
 
        ib_update_fast_reg_key(req->mr, ib_inc_rkey(req->mr->rkey));
@@ -1441,7 +1448,11 @@ static int nvme_rdma_map_sg_pi(struct nvme_rdma_queue *queue,
                     IB_ACCESS_REMOTE_WRITE;
 
        sg->addr = cpu_to_le64(req->mr->iova);
-       put_unaligned_le24(req->mr->length, sg->length);
+       xfer_len = req->mr->length;
+       /* Check if PI is added by the HW */
+       if (!pi_count)
+               xfer_len += (xfer_len >> bi->interval_exp) * ns->head->pi_size;
+       put_unaligned_le24(xfer_len, sg->length);
        put_unaligned_le32(req->mr->rkey, sg->key);
        sg->type = NVME_KEY_SGL_FMT_DATA_DESC << 4;
 
@@ -1946,14 +1957,13 @@ static enum blk_eh_timer_return nvme_rdma_timeout(struct request *rq)
        struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq);
        struct nvme_rdma_queue *queue = req->queue;
        struct nvme_rdma_ctrl *ctrl = queue->ctrl;
-       u8 opcode = req->req.cmd->common.opcode;
-       u8 fctype = req->req.cmd->fabrics.fctype;
+       struct nvme_command *cmd = req->req.cmd;
        int qid = nvme_rdma_queue_idx(queue);
 
        dev_warn(ctrl->ctrl.device,
                 "I/O tag %d (%04x) opcode %#x (%s) QID %d timeout\n",
-                rq->tag, nvme_cid(rq), opcode,
-                nvme_opcode_str(qid, opcode, fctype), qid);
+                rq->tag, nvme_cid(rq), cmd->common.opcode,
+                nvme_fabrics_opcode_str(qid, cmd), qid);
 
        if (nvme_ctrl_state(&ctrl->ctrl) != NVME_CTRL_LIVE) {
                /*
@@ -2296,8 +2306,8 @@ static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev,
        if (ret)
                goto out_uninit_ctrl;
 
-       dev_info(ctrl->ctrl.device, "new ctrl: NQN \"%s\", addr %pISpcs\n",
-               nvmf_ctrl_subsysnqn(&ctrl->ctrl), &ctrl->addr);
+       dev_info(ctrl->ctrl.device, "new ctrl: NQN \"%s\", addr %pISpcs, hostnqn: %s\n",
+               nvmf_ctrl_subsysnqn(&ctrl->ctrl), &ctrl->addr, opts->host->nqn);
 
        mutex_lock(&nvme_rdma_ctrl_mutex);
        list_add_tail(&ctrl->list, &nvme_rdma_ctrl_list);
@@ -2400,4 +2410,5 @@ static void __exit nvme_rdma_cleanup_module(void)
 module_init(nvme_rdma_init_module);
 module_exit(nvme_rdma_cleanup_module);
 
+MODULE_DESCRIPTION("NVMe host RDMA transport driver");
 MODULE_LICENSE("GPL v2");
index 754e911110420f5f30074762c7787a88b183830a..09fcaa519e5bc26618eae900a0830a89e6aebbb5 100644 (file)
@@ -35,6 +35,31 @@ static ssize_t nvme_sysfs_rescan(struct device *dev,
 }
 static DEVICE_ATTR(rescan_controller, S_IWUSR, NULL, nvme_sysfs_rescan);
 
+static ssize_t nvme_adm_passthru_err_log_enabled_show(struct device *dev,
+               struct device_attribute *attr, char *buf)
+{
+       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+
+       return sysfs_emit(buf,
+                         ctrl->passthru_err_log_enabled ? "on\n" : "off\n");
+}
+
+static ssize_t nvme_adm_passthru_err_log_enabled_store(struct device *dev,
+               struct device_attribute *attr, const char *buf, size_t count)
+{
+       struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+       bool passthru_err_log_enabled;
+       int err;
+
+       err = kstrtobool(buf, &passthru_err_log_enabled);
+       if (err)
+               return -EINVAL;
+
+       ctrl->passthru_err_log_enabled = passthru_err_log_enabled;
+
+       return count;
+}
+
 static inline struct nvme_ns_head *dev_to_ns_head(struct device *dev)
 {
        struct gendisk *disk = dev_to_disk(dev);
@@ -44,6 +69,37 @@ static inline struct nvme_ns_head *dev_to_ns_head(struct device *dev)
        return nvme_get_ns_from_dev(dev)->head;
 }
 
+static ssize_t nvme_io_passthru_err_log_enabled_show(struct device *dev,
+               struct device_attribute *attr, char *buf)
+{
+       struct nvme_ns_head *head = dev_to_ns_head(dev);
+
+       return sysfs_emit(buf, head->passthru_err_log_enabled ? "on\n" : "off\n");
+}
+
+static ssize_t nvme_io_passthru_err_log_enabled_store(struct device *dev,
+               struct device_attribute *attr, const char *buf, size_t count)
+{
+       struct nvme_ns_head *head = dev_to_ns_head(dev);
+       bool passthru_err_log_enabled;
+       int err;
+
+       err = kstrtobool(buf, &passthru_err_log_enabled);
+       if (err)
+               return -EINVAL;
+       head->passthru_err_log_enabled = passthru_err_log_enabled;
+
+       return count;
+}
+
+static struct device_attribute dev_attr_adm_passthru_err_log_enabled = \
+       __ATTR(passthru_err_log_enabled, S_IRUGO | S_IWUSR, \
+       nvme_adm_passthru_err_log_enabled_show, nvme_adm_passthru_err_log_enabled_store);
+
+static struct device_attribute dev_attr_io_passthru_err_log_enabled = \
+       __ATTR(passthru_err_log_enabled, S_IRUGO | S_IWUSR, \
+       nvme_io_passthru_err_log_enabled_show, nvme_io_passthru_err_log_enabled_store);
+
 static ssize_t wwid_show(struct device *dev, struct device_attribute *attr,
                char *buf)
 {
@@ -165,14 +221,11 @@ static int ns_update_nuse(struct nvme_ns *ns)
 
        ret = nvme_identify_ns(ns->ctrl, ns->head->ns_id, &id);
        if (ret)
-               goto out_free_id;
+               return ret;
 
        ns->head->nuse = le64_to_cpu(id->nuse);
-
-out_free_id:
        kfree(id);
-
-       return ret;
+       return 0;
 }
 
 static ssize_t nuse_show(struct device *dev, struct device_attribute *attr,
@@ -208,6 +261,7 @@ static struct attribute *nvme_ns_attrs[] = {
        &dev_attr_ana_grpid.attr,
        &dev_attr_ana_state.attr,
 #endif
+       &dev_attr_io_passthru_err_log_enabled.attr,
        NULL,
 };
 
@@ -311,6 +365,7 @@ static ssize_t nvme_sysfs_show_state(struct device *dev,
                                     char *buf)
 {
        struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
+       unsigned state = (unsigned)nvme_ctrl_state(ctrl);
        static const char *const state_name[] = {
                [NVME_CTRL_NEW]         = "new",
                [NVME_CTRL_LIVE]        = "live",
@@ -321,9 +376,8 @@ static ssize_t nvme_sysfs_show_state(struct device *dev,
                [NVME_CTRL_DEAD]        = "dead",
        };
 
-       if ((unsigned)ctrl->state < ARRAY_SIZE(state_name) &&
-           state_name[ctrl->state])
-               return sysfs_emit(buf, "%s\n", state_name[ctrl->state]);
+       if (state < ARRAY_SIZE(state_name) && state_name[state])
+               return sysfs_emit(buf, "%s\n", state_name[state]);
 
        return sysfs_emit(buf, "unknown state\n");
 }
@@ -655,6 +709,7 @@ static struct attribute *nvme_dev_attrs[] = {
 #ifdef CONFIG_NVME_TCP_TLS
        &dev_attr_tls_key.attr,
 #endif
+       &dev_attr_adm_passthru_err_log_enabled.attr,
        NULL
 };
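Both new passthru_err_log_enabled attributes follow the standard sysfs boolean pattern: kstrtobool() on store (accepting "1"/"0", "y"/"n", and "on"/"off") and sysfs_emit() on show. A self-contained sketch of such a pair, with a hypothetical backing flag:

#include <linux/device.h>
#include <linux/errno.h>
#include <linux/kstrtox.h>
#include <linux/sysfs.h>

static bool example_enabled;	/* hypothetical backing state */

static ssize_t example_enabled_show(struct device *dev,
		struct device_attribute *attr, char *buf)
{
	return sysfs_emit(buf, example_enabled ? "on\n" : "off\n");
}

static ssize_t example_enabled_store(struct device *dev,
		struct device_attribute *attr, const char *buf, size_t count)
{
	bool val;

	if (kstrtobool(buf, &val))	/* accepts 1/0, y/n, on/off */
		return -EINVAL;
	example_enabled = val;
	return count;
}
static DEVICE_ATTR_RW(example_enabled);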
 
index d058d990532bfcf6dd521cfa51f411f60f5913fd..a6d596e05602117ff9c38fbcb86645bda4016c59 100644 (file)
@@ -2428,13 +2428,13 @@ static enum blk_eh_timer_return nvme_tcp_timeout(struct request *rq)
        struct nvme_tcp_request *req = blk_mq_rq_to_pdu(rq);
        struct nvme_ctrl *ctrl = &req->queue->ctrl->ctrl;
        struct nvme_tcp_cmd_pdu *pdu = nvme_tcp_req_cmd_pdu(req);
-       u8 opc = pdu->cmd.common.opcode, fctype = pdu->cmd.fabrics.fctype;
+       struct nvme_command *cmd = &pdu->cmd;
        int qid = nvme_tcp_queue_id(req->queue);
 
        dev_warn(ctrl->device,
                 "I/O tag %d (%04x) type %d opcode %#x (%s) QID %d timeout\n",
-                rq->tag, nvme_cid(rq), pdu->hdr.type, opc,
-                nvme_opcode_str(qid, opc, fctype), qid);
+                rq->tag, nvme_cid(rq), pdu->hdr.type, cmd->common.opcode,
+                nvme_fabrics_opcode_str(qid, cmd), qid);
 
        if (nvme_ctrl_state(ctrl) != NVME_CTRL_LIVE) {
                /*
@@ -2753,8 +2753,8 @@ static struct nvme_ctrl *nvme_tcp_create_ctrl(struct device *dev,
        if (ret)
                goto out_uninit_ctrl;
 
-       dev_info(ctrl->ctrl.device, "new ctrl: NQN \"%s\", addr %pISp\n",
-               nvmf_ctrl_subsysnqn(&ctrl->ctrl), &ctrl->addr);
+       dev_info(ctrl->ctrl.device, "new ctrl: NQN \"%s\", addr %pISp, hostnqn: %s\n",
+               nvmf_ctrl_subsysnqn(&ctrl->ctrl), &ctrl->addr, opts->host->nqn);
 
        mutex_lock(&nvme_tcp_ctrl_mutex);
        list_add_tail(&ctrl->list, &nvme_tcp_ctrl_list);
@@ -2826,4 +2826,5 @@ static void __exit nvme_tcp_cleanup_module(void)
 module_init(nvme_tcp_init_module);
 module_exit(nvme_tcp_cleanup_module);
 
+MODULE_DESCRIPTION("NVMe host TCP transport driver");
 MODULE_LICENSE("GPL v2");
index 499bbb0eee8d09b9dc12c09337ba796d1832339b..722384bcc765cda778972c8a86345eaaf18a7353 100644 (file)
@@ -7,16 +7,6 @@
 #include <linux/vmalloc.h>
 #include "nvme.h"
 
-int nvme_revalidate_zones(struct nvme_ns *ns)
-{
-       struct request_queue *q = ns->queue;
-
-       blk_queue_chunk_sectors(q, ns->head->zsze);
-       blk_queue_max_zone_append_sectors(q, ns->ctrl->max_zone_append);
-
-       return blk_revalidate_disk_zones(ns->disk, NULL);
-}
-
 static int nvme_set_max_append(struct nvme_ctrl *ctrl)
 {
        struct nvme_command c = { };
@@ -45,10 +35,10 @@ static int nvme_set_max_append(struct nvme_ctrl *ctrl)
        return 0;
 }
 
-int nvme_update_zone_info(struct nvme_ns *ns, unsigned lbaf)
+int nvme_update_zone_info(struct nvme_ns *ns, unsigned lbaf,
+               struct queue_limits *lim)
 {
        struct nvme_effects_log *log = ns->head->effects;
-       struct request_queue *q = ns->queue;
        struct nvme_command c = { };
        struct nvme_id_ns_zns *id;
        int status;
@@ -109,10 +99,12 @@ int nvme_update_zone_info(struct nvme_ns *ns, unsigned lbaf)
                goto free_data;
        }
 
-       disk_set_zoned(ns->disk);
-       blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, q);
-       disk_set_max_open_zones(ns->disk, le32_to_cpu(id->mor) + 1);
-       disk_set_max_active_zones(ns->disk, le32_to_cpu(id->mar) + 1);
+       blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, ns->queue);
+       lim->zoned = 1;
+       lim->max_open_zones = le32_to_cpu(id->mor) + 1;
+       lim->max_active_zones = le32_to_cpu(id->mar) + 1;
+       lim->chunk_sectors = ns->head->zsze;
+       lim->max_zone_append_sectors = ns->ctrl->max_zone_append;
 free_data:
        kfree(id);
        return status;
index 39cb570f833dde9ec57ed406547ffbbf705e546f..f5b7054a4a05e38f25c0a3a0a831458138bebcca 100644 (file)
@@ -428,7 +428,7 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
        id->cqes = (0x4 << 4) | 0x4;
 
        /* no enforcement soft-limit for maxcmd - pick arbitrary high value */
-       id->maxcmd = cpu_to_le16(NVMET_MAX_CMD);
+       id->maxcmd = cpu_to_le16(NVMET_MAX_CMD(ctrl));
 
        id->nn = cpu_to_le32(NVMET_MAX_NAMESPACES);
        id->mnan = cpu_to_le32(NVMET_MAX_NAMESPACES);
index 2482a0db25043c88f2cb3fa3fd0bda3adf8abbbf..77a6e817b31596998e4424aa8205f8cfd9219f1d 100644 (file)
@@ -273,6 +273,32 @@ static ssize_t nvmet_param_inline_data_size_store(struct config_item *item,
 
 CONFIGFS_ATTR(nvmet_, param_inline_data_size);
 
+static ssize_t nvmet_param_max_queue_size_show(struct config_item *item,
+               char *page)
+{
+       struct nvmet_port *port = to_nvmet_port(item);
+
+       return snprintf(page, PAGE_SIZE, "%d\n", port->max_queue_size);
+}
+
+static ssize_t nvmet_param_max_queue_size_store(struct config_item *item,
+               const char *page, size_t count)
+{
+       struct nvmet_port *port = to_nvmet_port(item);
+       int ret;
+
+       if (nvmet_is_port_enabled(port, __func__))
+               return -EACCES;
+       ret = kstrtoint(page, 0, &port->max_queue_size);
+       if (ret) {
+               pr_err("Invalid value '%s' for max_queue_size\n", page);
+               return -EINVAL;
+       }
+       return count;
+}
+
+CONFIGFS_ATTR(nvmet_, param_max_queue_size);
+
 #ifdef CONFIG_BLK_DEV_INTEGRITY
 static ssize_t nvmet_param_pi_enable_show(struct config_item *item,
                char *page)
@@ -1859,6 +1885,7 @@ static struct configfs_attribute *nvmet_port_attrs[] = {
        &nvmet_attr_addr_trtype,
        &nvmet_attr_addr_tsas,
        &nvmet_attr_param_inline_data_size,
+       &nvmet_attr_param_max_queue_size,
 #ifdef CONFIG_BLK_DEV_INTEGRITY
        &nvmet_attr_param_pi_enable,
 #endif
@@ -1917,6 +1944,7 @@ static struct config_group *nvmet_ports_make(struct config_group *group,
        INIT_LIST_HEAD(&port->subsystems);
        INIT_LIST_HEAD(&port->referrals);
        port->inline_data_size = -1;    /* < 0 == let the transport choose */
+       port->max_queue_size = -1;      /* < 0 == let the transport choose */
 
        port->disc_addr.portid = cpu_to_le16(portid);
        port->disc_addr.adrfam = NVMF_ADDR_FAMILY_MAX;
index d26aa30f87026058fb23a1df97d10c1fe7fafbda..6bbe4df0166ca56949a5f5b14ad90f68305d6f36 100644 (file)
@@ -248,7 +248,7 @@ void nvmet_ns_changed(struct nvmet_subsys *subsys, u32 nsid)
                nvmet_add_to_changed_ns_log(ctrl, cpu_to_le32(nsid));
                if (nvmet_aen_bit_disabled(ctrl, NVME_AEN_BIT_NS_ATTR))
                        continue;
-               nvmet_add_async_event(ctrl, NVME_AER_TYPE_NOTICE,
+               nvmet_add_async_event(ctrl, NVME_AER_NOTICE,
                                NVME_AER_NOTICE_NS_CHANGED,
                                NVME_LOG_CHANGED_NS);
        }
@@ -265,7 +265,7 @@ void nvmet_send_ana_event(struct nvmet_subsys *subsys,
                        continue;
                if (nvmet_aen_bit_disabled(ctrl, NVME_AEN_BIT_ANA_CHANGE))
                        continue;
-               nvmet_add_async_event(ctrl, NVME_AER_TYPE_NOTICE,
+               nvmet_add_async_event(ctrl, NVME_AER_NOTICE,
                                NVME_AER_NOTICE_ANA, NVME_LOG_ANA);
        }
        mutex_unlock(&subsys->lock);
@@ -358,6 +358,18 @@ int nvmet_enable_port(struct nvmet_port *port)
        if (port->inline_data_size < 0)
                port->inline_data_size = 0;
 
+       /*
+        * If the transport didn't set max_queue_size properly, clamp it to
+        * the target limits. Also set a default value in case the transport
+        * didn't set it at all.
+        */
+       if (port->max_queue_size < 0)
+               port->max_queue_size = NVMET_MAX_QUEUE_SIZE;
+       else
+               port->max_queue_size = clamp_t(int, port->max_queue_size,
+                                              NVMET_MIN_QUEUE_SIZE,
+                                              NVMET_MAX_QUEUE_SIZE);
+
        port->enabled = true;
        port->tr_ops = ops;
        return 0;
@@ -1223,9 +1235,10 @@ static void nvmet_init_cap(struct nvmet_ctrl *ctrl)
        ctrl->cap |= (15ULL << 24);
        /* maximum queue entries supported: */
        if (ctrl->ops->get_max_queue_size)
-               ctrl->cap |= ctrl->ops->get_max_queue_size(ctrl) - 1;
+               ctrl->cap |= min_t(u16, ctrl->ops->get_max_queue_size(ctrl),
+                                  ctrl->port->max_queue_size) - 1;
        else
-               ctrl->cap |= NVMET_QUEUE_SIZE - 1;
+               ctrl->cap |= ctrl->port->max_queue_size - 1;
 
        if (nvmet_is_passthru_subsys(ctrl->subsys))
                nvmet_passthrough_override_cap(ctrl);
@@ -1411,6 +1424,7 @@ u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn,
 
        kref_init(&ctrl->ref);
        ctrl->subsys = subsys;
+       ctrl->pi_support = ctrl->port->pi_enable && ctrl->subsys->pi_support;
        nvmet_init_cap(ctrl);
        WRITE_ONCE(ctrl->aen_enabled, NVMET_AEN_CFG_OPTIONAL);
 
@@ -1705,4 +1719,5 @@ static void __exit nvmet_exit(void)
 module_init(nvmet_init);
 module_exit(nvmet_exit);
 
+MODULE_DESCRIPTION("NVMe target core framework");
 MODULE_LICENSE("GPL v2");
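nvmet_enable_port() now normalizes the configfs-supplied max_queue_size: negative means "let the transport choose", anything else is clamped into the target's supported window, and nvmet_init_cap() folds the result into CAP.MQES (queue size minus one, per the spec's zeroes-based fields). A hedged sketch of the normalization, with assumed bounds standing in for the NVMET_*_QUEUE_SIZE constants:

#include <linux/minmax.h>

/* Assumed bounds for illustration; the real limits live in nvmet.h. */
#define EX_MIN_QUEUE_SIZE	16
#define EX_MAX_QUEUE_SIZE	1024

static int example_normalize_qsize(int requested)
{
	if (requested < 0)		/* < 0 == let the transport choose */
		return EX_MAX_QUEUE_SIZE;
	return clamp_t(int, requested, EX_MIN_QUEUE_SIZE, EX_MAX_QUEUE_SIZE);
}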
index 668d257fa98636dc1785e7b5f6bb6b35e8188ab9..ce54da8c6b3661e3a83e4cf53ac101f9262b7176 100644 (file)
@@ -21,7 +21,7 @@ static void __nvmet_disc_changed(struct nvmet_port *port,
        if (nvmet_aen_bit_disabled(ctrl, NVME_AEN_BIT_DISC_CHANGE))
                return;
 
-       nvmet_add_async_event(ctrl, NVME_AER_TYPE_NOTICE,
+       nvmet_add_async_event(ctrl, NVME_AER_NOTICE,
                              NVME_AER_NOTICE_DISC_CHANGED, NVME_LOG_DISC);
 }
 
@@ -282,7 +282,7 @@ static void nvmet_execute_disc_identify(struct nvmet_req *req)
        id->lpa = (1 << 2);
 
        /* no enforcement soft-limit for maxcmd - pick arbitrary high value */
-       id->maxcmd = cpu_to_le16(NVMET_MAX_CMD);
+       id->maxcmd = cpu_to_le16(NVMET_MAX_CMD(ctrl));
 
        id->sgls = cpu_to_le32(1 << 0); /* we always support SGLs */
        if (ctrl->ops->flags & NVMF_KEYED_SGLS)
index d8da840a1c0ed1e9c383d59c11227f7fddfe607d..b23f4cf840bd541ca3f0215cda994549273a43b3 100644 (file)
@@ -157,7 +157,8 @@ static u16 nvmet_install_queue(struct nvmet_ctrl *ctrl, struct nvmet_req *req)
                return NVME_SC_CMD_SEQ_ERROR | NVME_SC_DNR;
        }
 
-       if (sqsize > mqes) {
+       /* for fabrics, this value applies only to the I/O Submission Queues */
+       if (qid && sqsize > mqes) {
                pr_warn("sqsize %u is larger than MQES supported %u cntlid %d\n",
                                sqsize, mqes, ctrl->cntlid);
                req->error_loc = offsetof(struct nvmf_connect_command, sqsize);
@@ -209,7 +210,7 @@ static void nvmet_execute_admin_connect(struct nvmet_req *req)
        struct nvmf_connect_command *c = &req->cmd->connect;
        struct nvmf_connect_data *d;
        struct nvmet_ctrl *ctrl = NULL;
-       u16 status = 0;
+       u16 status;
        int ret;
 
        if (!nvmet_check_transfer_len(req, sizeof(struct nvmf_connect_data)))
@@ -251,8 +252,6 @@ static void nvmet_execute_admin_connect(struct nvmet_req *req)
        if (status)
                goto out;
 
-       ctrl->pi_support = ctrl->port->pi_enable && ctrl->subsys->pi_support;
-
        uuid_copy(&ctrl->hostid, &d->hostid);
 
        ret = nvmet_setup_auth(ctrl);
@@ -290,7 +289,7 @@ static void nvmet_execute_io_connect(struct nvmet_req *req)
        struct nvmf_connect_data *d;
        struct nvmet_ctrl *ctrl;
        u16 qid = le16_to_cpu(c->qid);
-       u16 status = 0;
+       u16 status;
 
        if (!nvmet_check_transfer_len(req, sizeof(struct nvmf_connect_data)))
                return;
index bda7a3009e85127ca27f99e107d61fbf1f3995f2..fd229f310c931fbfd6c3132185f2b73c135cd633 100644 (file)
@@ -111,6 +111,8 @@ struct nvmet_fc_tgtport {
        struct nvmet_fc_port_entry      *pe;
        struct kref                     ref;
        u32                             max_sg_cnt;
+
+       struct work_struct              put_work;
 };
 
 struct nvmet_fc_port_entry {
@@ -145,7 +147,6 @@ struct nvmet_fc_tgt_queue {
        struct list_head                avail_defer_list;
        struct workqueue_struct         *work_q;
        struct kref                     ref;
-       struct rcu_head                 rcu;
        /* array of fcp_iods */
        struct nvmet_fc_fcp_iod         fod[] __counted_by(sqsize);
 } __aligned(sizeof(unsigned long long));
@@ -166,10 +167,9 @@ struct nvmet_fc_tgt_assoc {
        struct nvmet_fc_hostport        *hostport;
        struct nvmet_fc_ls_iod          *rcv_disconn;
        struct list_head                a_list;
-       struct nvmet_fc_tgt_queue __rcu *queues[NVMET_NR_QUEUES + 1];
+       struct nvmet_fc_tgt_queue       *queues[NVMET_NR_QUEUES + 1];
        struct kref                     ref;
        struct work_struct              del_work;
-       struct rcu_head                 rcu;
 };
 
 
@@ -249,6 +249,13 @@ static int nvmet_fc_tgt_a_get(struct nvmet_fc_tgt_assoc *assoc);
 static void nvmet_fc_tgt_q_put(struct nvmet_fc_tgt_queue *queue);
 static int nvmet_fc_tgt_q_get(struct nvmet_fc_tgt_queue *queue);
 static void nvmet_fc_tgtport_put(struct nvmet_fc_tgtport *tgtport);
+static void nvmet_fc_put_tgtport_work(struct work_struct *work)
+{
+       struct nvmet_fc_tgtport *tgtport =
+               container_of(work, struct nvmet_fc_tgtport, put_work);
+
+       nvmet_fc_tgtport_put(tgtport);
+}
 static int nvmet_fc_tgtport_get(struct nvmet_fc_tgtport *tgtport);
 static void nvmet_fc_handle_fcp_rqst(struct nvmet_fc_tgtport *tgtport,
                                        struct nvmet_fc_fcp_iod *fod);
@@ -360,7 +367,7 @@ __nvmet_fc_finish_ls_req(struct nvmet_fc_ls_req_op *lsop)
 
        if (!lsop->req_queued) {
                spin_unlock_irqrestore(&tgtport->lock, flags);
-               return;
+               goto out_putwork;
        }
 
        list_del(&lsop->lsreq_list);
@@ -373,7 +380,8 @@ __nvmet_fc_finish_ls_req(struct nvmet_fc_ls_req_op *lsop)
                                  (lsreq->rqstlen + lsreq->rsplen),
                                  DMA_BIDIRECTIONAL);
 
-       nvmet_fc_tgtport_put(tgtport);
+out_putwork:
+       queue_work(nvmet_wq, &tgtport->put_work);
 }
 
 static int
@@ -489,8 +497,7 @@ nvmet_fc_xmt_disconnect_assoc(struct nvmet_fc_tgt_assoc *assoc)
         * message is normal. Otherwise, send unless the hostport has
         * already been invalidated by the lldd.
         */
-       if (!tgtport->ops->ls_req || !assoc->hostport ||
-           assoc->hostport->invalid)
+       if (!tgtport->ops->ls_req || assoc->hostport->invalid)
                return;
 
        lsop = kzalloc((sizeof(*lsop) +
@@ -802,14 +809,11 @@ nvmet_fc_alloc_target_queue(struct nvmet_fc_tgt_assoc *assoc,
        if (!queue)
                return NULL;
 
-       if (!nvmet_fc_tgt_a_get(assoc))
-               goto out_free_queue;
-
        queue->work_q = alloc_workqueue("ntfc%d.%d.%d", 0, 0,
                                assoc->tgtport->fc_target_port.port_num,
                                assoc->a_id, qid);
        if (!queue->work_q)
-               goto out_a_put;
+               goto out_free_queue;
 
        queue->qid = qid;
        queue->sqsize = sqsize;
@@ -831,15 +835,13 @@ nvmet_fc_alloc_target_queue(struct nvmet_fc_tgt_assoc *assoc,
                goto out_fail_iodlist;
 
        WARN_ON(assoc->queues[qid]);
-       rcu_assign_pointer(assoc->queues[qid], queue);
+       assoc->queues[qid] = queue;
 
        return queue;
 
 out_fail_iodlist:
        nvmet_fc_destroy_fcp_iodlist(assoc->tgtport, queue);
        destroy_workqueue(queue->work_q);
-out_a_put:
-       nvmet_fc_tgt_a_put(assoc);
 out_free_queue:
        kfree(queue);
        return NULL;
@@ -852,15 +854,11 @@ nvmet_fc_tgt_queue_free(struct kref *ref)
        struct nvmet_fc_tgt_queue *queue =
                container_of(ref, struct nvmet_fc_tgt_queue, ref);
 
-       rcu_assign_pointer(queue->assoc->queues[queue->qid], NULL);
-
        nvmet_fc_destroy_fcp_iodlist(queue->assoc->tgtport, queue);
 
-       nvmet_fc_tgt_a_put(queue->assoc);
-
        destroy_workqueue(queue->work_q);
 
-       kfree_rcu(queue, rcu);
+       kfree(queue);
 }
 
 static void
@@ -969,7 +967,7 @@ nvmet_fc_find_target_queue(struct nvmet_fc_tgtport *tgtport,
        rcu_read_lock();
        list_for_each_entry_rcu(assoc, &tgtport->assoc_list, a_list) {
                if (association_id == assoc->association_id) {
-                       queue = rcu_dereference(assoc->queues[qid]);
+                       queue = assoc->queues[qid];
                        if (queue &&
                            (!atomic_read(&queue->connected) ||
                             !nvmet_fc_tgt_q_get(queue)))
@@ -1078,8 +1076,6 @@ nvmet_fc_alloc_hostport(struct nvmet_fc_tgtport *tgtport, void *hosthandle)
                /* new allocation not needed */
                kfree(newhost);
                newhost = match;
-               /* no new allocation - release reference */
-               nvmet_fc_tgtport_put(tgtport);
        } else {
                newhost->tgtport = tgtport;
                newhost->hosthandle = hosthandle;
@@ -1094,23 +1090,54 @@ nvmet_fc_alloc_hostport(struct nvmet_fc_tgtport *tgtport, void *hosthandle)
 }
 
 static void
-nvmet_fc_delete_assoc(struct work_struct *work)
+nvmet_fc_delete_assoc(struct nvmet_fc_tgt_assoc *assoc)
+{
+       nvmet_fc_delete_target_assoc(assoc);
+       nvmet_fc_tgt_a_put(assoc);
+}
+
+static void
+nvmet_fc_delete_assoc_work(struct work_struct *work)
 {
        struct nvmet_fc_tgt_assoc *assoc =
                container_of(work, struct nvmet_fc_tgt_assoc, del_work);
+       struct nvmet_fc_tgtport *tgtport = assoc->tgtport;
 
-       nvmet_fc_delete_target_assoc(assoc);
-       nvmet_fc_tgt_a_put(assoc);
+       nvmet_fc_delete_assoc(assoc);
+       nvmet_fc_tgtport_put(tgtport);
+}
+
+static void
+nvmet_fc_schedule_delete_assoc(struct nvmet_fc_tgt_assoc *assoc)
+{
+       nvmet_fc_tgtport_get(assoc->tgtport);
+       queue_work(nvmet_wq, &assoc->del_work);
+}
+
+static bool
+nvmet_fc_assoc_exits(struct nvmet_fc_tgtport *tgtport, u64 association_id)
+{
+       struct nvmet_fc_tgt_assoc *a;
+
+       list_for_each_entry_rcu(a, &tgtport->assoc_list, a_list) {
+               if (association_id == a->association_id)
+                       return true;
+       }
+
+       return false;
 }
 
 static struct nvmet_fc_tgt_assoc *
 nvmet_fc_alloc_target_assoc(struct nvmet_fc_tgtport *tgtport, void *hosthandle)
 {
-       struct nvmet_fc_tgt_assoc *assoc, *tmpassoc;
+       struct nvmet_fc_tgt_assoc *assoc;
        unsigned long flags;
+       bool done;
        u64 ran;
        int idx;
-       bool needrandom = true;
+
+       if (!tgtport->pe)
+               return NULL;
 
        assoc = kzalloc(sizeof(*assoc), GFP_KERNEL);
        if (!assoc)
@@ -1120,43 +1147,35 @@ nvmet_fc_alloc_target_assoc(struct nvmet_fc_tgtport *tgtport, void *hosthandle)
        if (idx < 0)
                goto out_free_assoc;
 
-       if (!nvmet_fc_tgtport_get(tgtport))
-               goto out_ida;
-
        assoc->hostport = nvmet_fc_alloc_hostport(tgtport, hosthandle);
        if (IS_ERR(assoc->hostport))
-               goto out_put;
+               goto out_ida;
 
        assoc->tgtport = tgtport;
        assoc->a_id = idx;
        INIT_LIST_HEAD(&assoc->a_list);
        kref_init(&assoc->ref);
-       INIT_WORK(&assoc->del_work, nvmet_fc_delete_assoc);
+       INIT_WORK(&assoc->del_work, nvmet_fc_delete_assoc_work);
        atomic_set(&assoc->terminating, 0);
 
-       while (needrandom) {
+       done = false;
+       do {
                get_random_bytes(&ran, sizeof(ran) - BYTES_FOR_QID);
                ran = ran << BYTES_FOR_QID_SHIFT;
 
                spin_lock_irqsave(&tgtport->lock, flags);
-               needrandom = false;
-               list_for_each_entry(tmpassoc, &tgtport->assoc_list, a_list) {
-                       if (ran == tmpassoc->association_id) {
-                               needrandom = true;
-                               break;
-                       }
-               }
-               if (!needrandom) {
+               rcu_read_lock();
+               if (!nvmet_fc_assoc_exits(tgtport, ran)) {
                        assoc->association_id = ran;
                        list_add_tail_rcu(&assoc->a_list, &tgtport->assoc_list);
+                       done = true;
                }
+               rcu_read_unlock();
                spin_unlock_irqrestore(&tgtport->lock, flags);
-       }
+       } while (!done);
 
        return assoc;
 
-out_put:
-       nvmet_fc_tgtport_put(tgtport);
 out_ida:
        ida_free(&tgtport->assoc_cnt, idx);
 out_free_assoc:
@@ -1172,13 +1191,18 @@ nvmet_fc_target_assoc_free(struct kref *ref)
        struct nvmet_fc_tgtport *tgtport = assoc->tgtport;
        struct nvmet_fc_ls_iod  *oldls;
        unsigned long flags;
+       int i;
+
+       for (i = NVMET_NR_QUEUES; i >= 0; i--) {
+               if (assoc->queues[i])
+                       nvmet_fc_delete_target_queue(assoc->queues[i]);
+       }
 
        /* Send Disconnect now that all i/o has completed */
        nvmet_fc_xmt_disconnect_assoc(assoc);
 
        nvmet_fc_free_hostport(assoc->hostport);
        spin_lock_irqsave(&tgtport->lock, flags);
-       list_del_rcu(&assoc->a_list);
        oldls = assoc->rcv_disconn;
        spin_unlock_irqrestore(&tgtport->lock, flags);
        /* if pending Rcv Disconnect Association LS, send rsp now */
@@ -1188,8 +1212,7 @@ nvmet_fc_target_assoc_free(struct kref *ref)
        dev_info(tgtport->dev,
                "{%d:%d} Association freed\n",
                tgtport->fc_target_port.port_num, assoc->a_id);
-       kfree_rcu(assoc, rcu);
-       nvmet_fc_tgtport_put(tgtport);
+       kfree(assoc);
 }
 
 static void
@@ -1208,7 +1231,7 @@ static void
 nvmet_fc_delete_target_assoc(struct nvmet_fc_tgt_assoc *assoc)
 {
        struct nvmet_fc_tgtport *tgtport = assoc->tgtport;
-       struct nvmet_fc_tgt_queue *queue;
+       unsigned long flags;
        int i, terminating;
 
        terminating = atomic_xchg(&assoc->terminating, 1);
@@ -1217,29 +1240,21 @@ nvmet_fc_delete_target_assoc(struct nvmet_fc_tgt_assoc *assoc)
        if (terminating)
                return;
 
+       spin_lock_irqsave(&tgtport->lock, flags);
+       list_del_rcu(&assoc->a_list);
+       spin_unlock_irqrestore(&tgtport->lock, flags);
 
-       for (i = NVMET_NR_QUEUES; i >= 0; i--) {
-               rcu_read_lock();
-               queue = rcu_dereference(assoc->queues[i]);
-               if (!queue) {
-                       rcu_read_unlock();
-                       continue;
-               }
+       synchronize_rcu();
 
-               if (!nvmet_fc_tgt_q_get(queue)) {
-                       rcu_read_unlock();
-                       continue;
-               }
-               rcu_read_unlock();
-               nvmet_fc_delete_target_queue(queue);
-               nvmet_fc_tgt_q_put(queue);
+       /* ensure all in-flight I/Os have been processed */
+       for (i = NVMET_NR_QUEUES; i >= 0; i--) {
+               if (assoc->queues[i])
+                       flush_workqueue(assoc->queues[i]->work_q);
        }
 
        dev_info(tgtport->dev,
                "{%d:%d} Association deleted\n",
                tgtport->fc_target_port.port_num, assoc->a_id);
-
-       nvmet_fc_tgt_a_put(assoc);
 }
 
 static struct nvmet_fc_tgt_assoc *
@@ -1415,6 +1430,7 @@ nvmet_fc_register_targetport(struct nvmet_fc_port_info *pinfo,
        kref_init(&newrec->ref);
        ida_init(&newrec->assoc_cnt);
        newrec->max_sg_cnt = template->max_sgl_segments;
+       INIT_WORK(&newrec->put_work, nvmet_fc_put_tgtport_work);
 
        ret = nvmet_fc_alloc_ls_iodlist(newrec);
        if (ret) {
@@ -1492,9 +1508,8 @@ __nvmet_fc_free_assocs(struct nvmet_fc_tgtport *tgtport)
        list_for_each_entry_rcu(assoc, &tgtport->assoc_list, a_list) {
                if (!nvmet_fc_tgt_a_get(assoc))
                        continue;
-               if (!queue_work(nvmet_wq, &assoc->del_work))
-                       /* already deleting - release local reference */
-                       nvmet_fc_tgt_a_put(assoc);
+               nvmet_fc_schedule_delete_assoc(assoc);
+               nvmet_fc_tgt_a_put(assoc);
        }
        rcu_read_unlock();
 }
@@ -1540,16 +1555,14 @@ nvmet_fc_invalidate_host(struct nvmet_fc_target_port *target_port,
        spin_lock_irqsave(&tgtport->lock, flags);
        list_for_each_entry_safe(assoc, next,
                                &tgtport->assoc_list, a_list) {
-               if (!assoc->hostport ||
-                   assoc->hostport->hosthandle != hosthandle)
+               if (assoc->hostport->hosthandle != hosthandle)
                        continue;
                if (!nvmet_fc_tgt_a_get(assoc))
                        continue;
                assoc->hostport->invalid = 1;
                noassoc = false;
-               if (!queue_work(nvmet_wq, &assoc->del_work))
-                       /* already deleting - release local reference */
-                       nvmet_fc_tgt_a_put(assoc);
+               nvmet_fc_schedule_delete_assoc(assoc);
+               nvmet_fc_tgt_a_put(assoc);
        }
        spin_unlock_irqrestore(&tgtport->lock, flags);
 
@@ -1581,7 +1594,7 @@ nvmet_fc_delete_ctrl(struct nvmet_ctrl *ctrl)
 
                rcu_read_lock();
                list_for_each_entry_rcu(assoc, &tgtport->assoc_list, a_list) {
-                       queue = rcu_dereference(assoc->queues[0]);
+                       queue = assoc->queues[0];
                        if (queue && queue->nvme_sq.ctrl == ctrl) {
                                if (nvmet_fc_tgt_a_get(assoc))
                                        found_ctrl = true;
@@ -1593,9 +1606,8 @@ nvmet_fc_delete_ctrl(struct nvmet_ctrl *ctrl)
                nvmet_fc_tgtport_put(tgtport);
 
                if (found_ctrl) {
-                       if (!queue_work(nvmet_wq, &assoc->del_work))
-                               /* already deleting - release local reference */
-                               nvmet_fc_tgt_a_put(assoc);
+                       nvmet_fc_schedule_delete_assoc(assoc);
+                       nvmet_fc_tgt_a_put(assoc);
                        return;
                }
 
@@ -1625,6 +1637,8 @@ nvmet_fc_unregister_targetport(struct nvmet_fc_target_port *target_port)
        /* terminate any outstanding associations */
        __nvmet_fc_free_assocs(tgtport);
 
+       flush_workqueue(nvmet_wq);
+
        /*
         * should terminate LS's as well. However, LS's will be generated
         * at the tail end of association termination, so they likely don't
@@ -1870,9 +1884,6 @@ nvmet_fc_ls_disconnect(struct nvmet_fc_tgtport *tgtport,
                                sizeof(struct fcnvme_ls_disconnect_assoc_acc)),
                        FCNVME_LS_DISCONNECT_ASSOC);
 
-       /* release get taken in nvmet_fc_find_target_assoc */
-       nvmet_fc_tgt_a_put(assoc);
-
        /*
         * The rules for LS responses say the response cannot
         * go back until ABTS's have been sent for all outstanding
@@ -1887,8 +1898,6 @@ nvmet_fc_ls_disconnect(struct nvmet_fc_tgtport *tgtport,
        assoc->rcv_disconn = iod;
        spin_unlock_irqrestore(&tgtport->lock, flags);
 
-       nvmet_fc_delete_target_assoc(assoc);
-
        if (oldls) {
                dev_info(tgtport->dev,
                        "{%d:%d} Multiple Disconnect Association LS's "
@@ -1904,6 +1913,9 @@ nvmet_fc_ls_disconnect(struct nvmet_fc_tgtport *tgtport,
                nvmet_fc_xmt_ls_rsp(tgtport, oldls);
        }
 
+       nvmet_fc_schedule_delete_assoc(assoc);
+       nvmet_fc_tgt_a_put(assoc);
+
        return false;
 }
 
@@ -2540,8 +2552,9 @@ nvmet_fc_handle_fcp_rqst(struct nvmet_fc_tgtport *tgtport,
 
        fod->req.cmd = &fod->cmdiubuf.sqe;
        fod->req.cqe = &fod->rspiubuf.cqe;
-       if (tgtport->pe)
-               fod->req.port = tgtport->pe->port;
+       if (!tgtport->pe)
+               goto transport_error;
+       fod->req.port = tgtport->pe->port;
 
        /* clear any response payload */
        memset(&fod->rspiubuf, 0, sizeof(fod->rspiubuf));
@@ -2902,6 +2915,9 @@ nvmet_fc_remove_port(struct nvmet_port *port)
 
        nvmet_fc_portentry_unbind(pe);
 
+       /* terminate any outstanding associations */
+       __nvmet_fc_free_assocs(pe->tgtport);
+
        kfree(pe);
 }
 
@@ -2933,6 +2949,9 @@ static int __init nvmet_fc_init_module(void)
 
 static void __exit nvmet_fc_exit_module(void)
 {
+       /* ensure any shutdown operations, e.g. delete ctrls, have finished */
+       flush_workqueue(nvmet_wq);
+
        /* sanity check - all lports should be removed */
        if (!list_empty(&nvmet_fc_target_list))
                pr_warn("%s: targetport list not empty\n", __func__);
@@ -2945,4 +2964,5 @@ static void __exit nvmet_fc_exit_module(void)
 module_init(nvmet_fc_init_module);
 module_exit(nvmet_fc_exit_module);
 
+MODULE_DESCRIPTION("NVMe target FC transport driver");
 MODULE_LICENSE("GPL v2");
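
A note on the reference-counting pattern above: every call site that used to
do "if (!queue_work(...)) nvmet_fc_tgt_a_put(assoc)" now calls
nvmet_fc_schedule_delete_assoc() and then drops its traversal reference
unconditionally. A minimal userspace sketch of that shape, assuming (as the
hunks imply) that the deletion work relies on a reference the association
already holds rather than consuming the caller's; all names are illustrative,
not the driver's:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    struct assoc {
        atomic_int ref;          /* base reference, owned by the delete path */
        bool del_queued;
    };

    static bool assoc_get(struct assoc *a)    /* like nvmet_fc_tgt_a_get() */
    {
        int old = atomic_load(&a->ref);

        while (old > 0)
            if (atomic_compare_exchange_weak(&a->ref, &old, old + 1))
                return true;
        return false;            /* object is already dying */
    }

    static void assoc_put(struct assoc *a)
    {
        if (atomic_fetch_sub(&a->ref, 1) == 1)
            printf("association freed\n");
    }

    static void schedule_delete(struct assoc *a)
    {
        a->del_queued = true;    /* queue_work(): no-op if already queued */
    }

    int main(void)
    {
        struct assoc a = { .ref = 1, .del_queued = false };

        /* traversal takes its own reference, schedules deletion, and
         * always drops that reference again: no conditional put */
        if (assoc_get(&a)) {
            schedule_delete(&a);
            assoc_put(&a);
        }

        if (a.del_queued)
            assoc_put(&a);       /* delete work drops the base reference */
        return 0;
    }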
index ead349af30f1e0c87ee0adde980aa98b5fdb0e8a..913cd2ec7a6f618db7a274dd234f335c263bdba0 100644 (file)
@@ -358,7 +358,7 @@ fcloop_h2t_ls_req(struct nvme_fc_local_port *localport,
        if (!rport->targetport) {
                tls_req->status = -ECONNREFUSED;
                spin_lock(&rport->lock);
-               list_add_tail(&rport->ls_list, &tls_req->ls_list);
+               list_add_tail(&tls_req->ls_list, &rport->ls_list);
                spin_unlock(&rport->lock);
                queue_work(nvmet_wq, &rport->ls_work);
                return ret;
@@ -391,7 +391,7 @@ fcloop_h2t_xmt_ls_rsp(struct nvmet_fc_target_port *targetport,
        if (remoteport) {
                rport = remoteport->private;
                spin_lock(&rport->lock);
-               list_add_tail(&rport->ls_list, &tls_req->ls_list);
+               list_add_tail(&tls_req->ls_list, &rport->ls_list);
                spin_unlock(&rport->lock);
                queue_work(nvmet_wq, &rport->ls_work);
        }
@@ -446,7 +446,7 @@ fcloop_t2h_ls_req(struct nvmet_fc_target_port *targetport, void *hosthandle,
        if (!tport->remoteport) {
                tls_req->status = -ECONNREFUSED;
                spin_lock(&tport->lock);
-               list_add_tail(&tport->ls_list, &tls_req->ls_list);
+               list_add_tail(&tls_req->ls_list, &tport->ls_list);
                spin_unlock(&tport->lock);
                queue_work(nvmet_wq, &tport->ls_work);
                return ret;
@@ -1556,7 +1556,9 @@ static const struct attribute_group *fcloop_dev_attr_groups[] = {
        NULL,
 };
 
-static struct class *fcloop_class;
+static const struct class fcloop_class = {
+       .name = "fcloop",
+};
 static struct device *fcloop_device;
 
 
@@ -1564,15 +1566,14 @@ static int __init fcloop_init(void)
 {
        int ret;
 
-       fcloop_class = class_create("fcloop");
-       if (IS_ERR(fcloop_class)) {
+       ret = class_register(&fcloop_class);
+       if (ret) {
                pr_err("couldn't register class fcloop\n");
-               ret = PTR_ERR(fcloop_class);
                return ret;
        }
 
        fcloop_device = device_create_with_groups(
-                               fcloop_class, NULL, MKDEV(0, 0), NULL,
+                               &fcloop_class, NULL, MKDEV(0, 0), NULL,
                                fcloop_dev_attr_groups, "ctl");
        if (IS_ERR(fcloop_device)) {
                pr_err("couldn't create ctl device!\n");
@@ -1585,7 +1586,7 @@ static int __init fcloop_init(void)
        return 0;
 
 out_destroy_class:
-       class_destroy(fcloop_class);
+       class_unregister(&fcloop_class);
        return ret;
 }
 
@@ -1643,11 +1644,12 @@ static void __exit fcloop_exit(void)
 
        put_device(fcloop_device);
 
-       device_destroy(fcloop_class, MKDEV(0, 0));
-       class_destroy(fcloop_class);
+       device_destroy(&fcloop_class, MKDEV(0, 0));
+       class_unregister(&fcloop_class);
 }
 
 module_init(fcloop_init);
 module_exit(fcloop_exit);
 
+MODULE_DESCRIPTION("NVMe target FC loop transport driver");
 MODULE_LICENSE("GPL v2");
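
The three ls_list hunks above fix swapped list_add_tail() arguments: the
kernel's list_add_tail(new, head) inserts "new" just before "head", so
passing the per-port list head as the first argument spliced the head into
the request instead of queueing the request. A standalone model of the
semantics (a stripped-down list.h, not the kernel header):

    #include <stdio.h>

    struct list_head { struct list_head *next, *prev; };

    #define LIST_HEAD_INIT(name) { &(name), &(name) }

    /* insert "new" between "prev" and "next" */
    static void __list_add(struct list_head *new,
                           struct list_head *prev, struct list_head *next)
    {
        next->prev = new;
        new->next = next;
        new->prev = prev;
        prev->next = new;
    }

    /* kernel semantics: add "new" before "head", i.e. at the tail */
    static void list_add_tail(struct list_head *new, struct list_head *head)
    {
        __list_add(new, head->prev, head);
    }

    int main(void)
    {
        struct list_head head = LIST_HEAD_INIT(head);
        struct list_head a, b;

        list_add_tail(&a, &head);    /* right: entry first, head second */
        list_add_tail(&b, &head);

        /* swapping the arguments, as the old fcloop code did, would have
         * chained the list head off the entry and left the real list empty */
        for (struct list_head *p = head.next; p != &head; p = p->next)
            printf("%s\n", p == &a ? "a" : "b");
        return 0;
    }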
index f11400a908f269f74138226079d3000f648f73b3..6426aac2634aeb501c673852d6eb99a79a437de4 100644 (file)
@@ -50,10 +50,10 @@ void nvmet_bdev_set_limits(struct block_device *bdev, struct nvme_id_ns *id)
 
 void nvmet_bdev_ns_disable(struct nvmet_ns *ns)
 {
-       if (ns->bdev_handle) {
-               bdev_release(ns->bdev_handle);
+       if (ns->bdev_file) {
+               fput(ns->bdev_file);
                ns->bdev = NULL;
-               ns->bdev_handle = NULL;
+               ns->bdev_file = NULL;
        }
 }
 
@@ -85,18 +85,18 @@ int nvmet_bdev_ns_enable(struct nvmet_ns *ns)
        if (ns->buffered_io)
                return -ENOTBLK;
 
-       ns->bdev_handle = bdev_open_by_path(ns->device_path,
+       ns->bdev_file = bdev_file_open_by_path(ns->device_path,
                                BLK_OPEN_READ | BLK_OPEN_WRITE, NULL, NULL);
-       if (IS_ERR(ns->bdev_handle)) {
-               ret = PTR_ERR(ns->bdev_handle);
+       if (IS_ERR(ns->bdev_file)) {
+               ret = PTR_ERR(ns->bdev_file);
                if (ret != -ENOTBLK) {
                        pr_err("failed to open block device %s: (%d)\n",
                                        ns->device_path, ret);
                }
-               ns->bdev_handle = NULL;
+               ns->bdev_file = NULL;
                return ret;
        }
-       ns->bdev = ns->bdev_handle->bdev;
+       ns->bdev = file_bdev(ns->bdev_file);
        ns->size = bdev_nr_bytes(ns->bdev);
        ns->blksize_shift = blksize_bits(bdev_logical_block_size(ns->bdev));
 
index 9cb434c5807514813afe91eada69c0a925daf83a..e589915ddef85cf5f67fcba50deed724b37616d3 100644 (file)
@@ -400,7 +400,7 @@ static void nvme_loop_shutdown_ctrl(struct nvme_loop_ctrl *ctrl)
        }
 
        nvme_quiesce_admin_queue(&ctrl->ctrl);
-       if (ctrl->ctrl.state == NVME_CTRL_LIVE)
+       if (nvme_ctrl_state(&ctrl->ctrl) == NVME_CTRL_LIVE)
                nvme_disable_ctrl(&ctrl->ctrl, true);
 
        nvme_cancel_admin_tagset(&ctrl->ctrl);
@@ -434,8 +434,10 @@ static void nvme_loop_reset_ctrl_work(struct work_struct *work)
        nvme_loop_shutdown_ctrl(ctrl);
 
        if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) {
-               if (ctrl->ctrl.state != NVME_CTRL_DELETING &&
-                   ctrl->ctrl.state != NVME_CTRL_DELETING_NOIO)
+               enum nvme_ctrl_state state = nvme_ctrl_state(&ctrl->ctrl);
+
+               if (state != NVME_CTRL_DELETING &&
+                   state != NVME_CTRL_DELETING_NOIO)
                        /* state change failure for non-deleted ctrl? */
                        WARN_ON_ONCE(1);
                return;
@@ -688,5 +690,6 @@ static void __exit nvme_loop_cleanup_module(void)
 module_init(nvme_loop_init_module);
 module_exit(nvme_loop_cleanup_module);
 
+MODULE_DESCRIPTION("NVMe target loop transport driver");
 MODULE_LICENSE("GPL v2");
 MODULE_ALIAS("nvmet-transport-254"); /* 254 == NVMF_TRTYPE_LOOP */
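
The loop.c hunks switch raw ctrl->ctrl.state reads to the nvme_ctrl_state()
accessor and, in the reset path, cache the result in a local so both
comparisons test one snapshot instead of re-reading a field that may change
concurrently. A generic userspace illustration of the pattern (the enum and
helper are stand-ins, not the NVMe definitions):

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    enum ctrl_state { CTRL_LIVE, CTRL_DELETING, CTRL_DELETING_NOIO };

    struct ctrl { _Atomic enum ctrl_state state; };

    static enum ctrl_state ctrl_state(struct ctrl *c)
    {
        return atomic_load(&c->state);    /* one load, like READ_ONCE() */
    }

    static bool unexpected_state_change_failure(struct ctrl *c)
    {
        /* read once; testing c->state twice could straddle a concurrent
         * transition and evaluate two different states */
        enum ctrl_state state = ctrl_state(c);

        return state != CTRL_DELETING && state != CTRL_DELETING_NOIO;
    }

    int main(void)
    {
        struct ctrl c = { CTRL_LIVE };

        printf("warn: %d\n", unexpected_state_change_failure(&c));
        return 0;
    }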
index 6c8acebe1a1a61b8742d892c1ba2f7eb5d1e9364..f460728e1df1fd87e24969e782203c7684a23326 100644 (file)
@@ -58,7 +58,7 @@
 
 struct nvmet_ns {
        struct percpu_ref       ref;
-       struct bdev_handle      *bdev_handle;
+       struct file             *bdev_file;
        struct block_device     *bdev;
        struct file             *file;
        bool                    readonly;
@@ -163,6 +163,7 @@ struct nvmet_port {
        void                            *priv;
        bool                            enabled;
        int                             inline_data_size;
+       int                             max_queue_size;
        const struct nvmet_fabrics_ops  *tr_ops;
        bool                            pi_enable;
 };
@@ -543,9 +544,10 @@ void nvmet_subsys_disc_changed(struct nvmet_subsys *subsys,
 void nvmet_add_async_event(struct nvmet_ctrl *ctrl, u8 event_type,
                u8 event_info, u8 log_page);
 
-#define NVMET_QUEUE_SIZE       1024
+#define NVMET_MIN_QUEUE_SIZE   16
+#define NVMET_MAX_QUEUE_SIZE   1024
 #define NVMET_NR_QUEUES                128
-#define NVMET_MAX_CMD          NVMET_QUEUE_SIZE
+#define NVMET_MAX_CMD(ctrl)    (NVME_CAP_MQES(ctrl->cap) + 1)
 
 /*
  * Nice round number that makes a list of nsids fit into a page.
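
The NVMET_MAX_CMD change bases the advertised maximum on the controller's
CAP register instead of a compile-time constant. CAP.MQES (bits 15:0) is
zero-based, a raw value of n means n + 1 queue entries, which is where the
"+ 1" comes from. A standalone check, with the MQES extraction defined
locally to mirror the kernel's NVME_CAP_MQES():

    #include <stdint.h>
    #include <stdio.h>

    /* CAP.MQES lives in bits 15:0 and is zero-based */
    #define NVME_CAP_MQES(cap)    ((cap) & 0xffff)
    #define NVMET_MAX_CMD(cap)    (NVME_CAP_MQES(cap) + 1)

    int main(void)
    {
        uint64_t cap = 0x3ff;    /* MQES = 1023 -> 1024-entry queues */

        printf("maxcmd = %llu\n", (unsigned long long)NVMET_MAX_CMD(cap));
        return 0;
    }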
index f2d963e1fe94e1a88cc7fa4aa6ef9b4d18c16054..bb4a69d538fd101087b34d55021f71559f302b84 100644 (file)
@@ -132,7 +132,7 @@ static u16 nvmet_passthru_override_id_ctrl(struct nvmet_req *req)
 
        id->sqes = min_t(__u8, ((0x6 << 4) | 0x6), id->sqes);
        id->cqes = min_t(__u8, ((0x4 << 4) | 0x4), id->cqes);
-       id->maxcmd = cpu_to_le16(NVMET_MAX_CMD);
+       id->maxcmd = cpu_to_le16(NVMET_MAX_CMD(ctrl));
 
        /* don't support fuse commands */
        id->fuses = 0;
index 667f9c04f35d538bb361f733e50c62bb7c52d9c3..f2bb9d95ecf4bc6cde907d1f9af3e2d89998f376 100644 (file)
@@ -1956,6 +1956,14 @@ static int nvmet_rdma_add_port(struct nvmet_port *nport)
                nport->inline_data_size = NVMET_RDMA_MAX_INLINE_DATA_SIZE;
        }
 
+       if (nport->max_queue_size < 0) {
+               nport->max_queue_size = NVME_RDMA_DEFAULT_QUEUE_SIZE;
+       } else if (nport->max_queue_size > NVME_RDMA_MAX_QUEUE_SIZE) {
+               pr_warn("max_queue_size %u is too large, reducing to %u\n",
+                       nport->max_queue_size, NVME_RDMA_MAX_QUEUE_SIZE);
+               nport->max_queue_size = NVME_RDMA_MAX_QUEUE_SIZE;
+       }
+
        ret = inet_pton_with_scope(&init_net, af, nport->disc_addr.traddr,
                        nport->disc_addr.trsvcid, &port->addr);
        if (ret) {
@@ -2015,6 +2023,8 @@ static u8 nvmet_rdma_get_mdts(const struct nvmet_ctrl *ctrl)
 
 static u16 nvmet_rdma_get_max_queue_size(const struct nvmet_ctrl *ctrl)
 {
+       if (ctrl->pi_support)
+               return NVME_RDMA_MAX_METADATA_QUEUE_SIZE;
        return NVME_RDMA_MAX_QUEUE_SIZE;
 }
 
@@ -2104,5 +2114,6 @@ static void __exit nvmet_rdma_exit(void)
 module_init(nvmet_rdma_init);
 module_exit(nvmet_rdma_exit);
 
+MODULE_DESCRIPTION("NVMe target RDMA transport driver");
 MODULE_LICENSE("GPL v2");
 MODULE_ALIAS("nvmet-transport-1"); /* 1 == NVMF_TRTYPE_RDMA */
index 6a1e6bb80062d4753501e07cbcba43870fc00eeb..c8655fc5aa5b8aac838cb4c2e4c0a76c8ebbc174 100644 (file)
@@ -2216,10 +2216,12 @@ static void __exit nvmet_tcp_exit(void)
        flush_workqueue(nvmet_wq);
 
        destroy_workqueue(nvmet_tcp_wq);
+       ida_destroy(&nvmet_tcp_queue_ida);
 }
 
 module_init(nvmet_tcp_init);
 module_exit(nvmet_tcp_exit);
 
+MODULE_DESCRIPTION("NVMe target TCP transport driver");
 MODULE_LICENSE("GPL v2");
 MODULE_ALIAS("nvmet-transport-3"); /* 3 == NVMF_TRTYPE_TCP */
index 5b5c1e48172213df20683d8c117a9586b733bfa5..3148d9f1bde66ac44d114b71f45c98ab52a6ed24 100644 (file)
@@ -456,8 +456,7 @@ static u16 nvmet_bdev_execute_zmgmt_send_all(struct nvmet_req *req)
        switch (zsa_req_op(req->cmd->zms.zsa)) {
        case REQ_OP_ZONE_RESET:
                ret = blkdev_zone_mgmt(req->ns->bdev, REQ_OP_ZONE_RESET, 0,
-                                      get_capacity(req->ns->bdev->bd_disk),
-                                      GFP_KERNEL);
+                                      get_capacity(req->ns->bdev->bd_disk));
                if (ret < 0)
                        return blkdev_zone_mgmt_errno_to_nvme_status(ret);
                break;
@@ -508,7 +507,7 @@ static void nvmet_bdev_zmgmt_send_work(struct work_struct *w)
                goto out;
        }
 
-       ret = blkdev_zone_mgmt(bdev, op, sect, zone_sectors, GFP_KERNEL);
+       ret = blkdev_zone_mgmt(bdev, op, sect, zone_sectors);
        if (ret < 0)
                status = blkdev_zone_mgmt_errno_to_nvme_status(ret);
 
index 980123fb4dde05d0e5cd4e0cfe5645b24a8d55dc..eb357ac2e54a2a827ad07b9a073d3e76415a000f 100644 (file)
@@ -460,8 +460,9 @@ static int nvmem_populate_sysfs_cells(struct nvmem_device *nvmem)
        list_for_each_entry(entry, &nvmem->cells, node) {
                sysfs_bin_attr_init(&attrs[i]);
                attrs[i].attr.name = devm_kasprintf(&nvmem->dev, GFP_KERNEL,
-                                                   "%s@%x", entry->name,
-                                                   entry->offset);
+                                                   "%s@%x,%x", entry->name,
+                                                   entry->offset,
+                                                   entry->bit_offset);
                attrs[i].attr.mode = 0444;
                attrs[i].size = entry->bytes;
                attrs[i].read = &nvmem_cell_attr_read;
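
Appending entry->bit_offset to the attribute name keeps cell names unique
when two cells share a byte offset but start at different bits; with the old
"%s@%x" format such cells produced identical sysfs file names, which sysfs
refuses. A quick comparison of the two formats with made-up cell values:

    #include <stdio.h>

    int main(void)
    {
        const char *name = "mac-address";
        unsigned int offset = 0x10;
        unsigned int bit_a = 0, bit_b = 4;    /* two cells, same byte */

        printf("old: %s@%x vs %s@%x\n", name, offset, name, offset);
        printf("new: %s@%x,%x vs %s@%x,%x\n",
               name, offset, bit_a, name, offset, bit_b);
        return 0;
    }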
index 641a40cf5cf34a7d0aa3bf94362a95ae58065fe1..fa8cd33be1312dc57f075cf6557270794dcc2939 100644 (file)
@@ -763,7 +763,9 @@ struct device_node *of_graph_get_port_parent(struct device_node *node)
        /* Walk 3 levels up only if there is 'ports' node. */
        for (depth = 3; depth && node; depth--) {
                node = of_get_next_parent(node);
-               if (depth == 2 && !of_node_name_eq(node, "ports"))
+               if (depth == 2 && !of_node_name_eq(node, "ports") &&
+                   !of_node_name_eq(node, "in-ports") &&
+                   !of_node_name_eq(node, "out-ports"))
                        break;
        }
        return node;
@@ -1063,36 +1065,6 @@ of_fwnode_device_get_match_data(const struct fwnode_handle *fwnode,
        return of_device_get_match_data(dev);
 }
 
-static struct device_node *of_get_compat_node(struct device_node *np)
-{
-       of_node_get(np);
-
-       while (np) {
-               if (!of_device_is_available(np)) {
-                       of_node_put(np);
-                       np = NULL;
-               }
-
-               if (of_property_present(np, "compatible"))
-                       break;
-
-               np = of_get_next_parent(np);
-       }
-
-       return np;
-}
-
-static struct device_node *of_get_compat_node_parent(struct device_node *np)
-{
-       struct device_node *parent, *node;
-
-       parent = of_get_parent(np);
-       node = of_get_compat_node(parent);
-       of_node_put(parent);
-
-       return node;
-}
-
 static void of_link_to_phandle(struct device_node *con_np,
                              struct device_node *sup_np)
 {
@@ -1222,10 +1194,10 @@ static struct device_node *parse_##fname(struct device_node *np,             \
  *  parse_prop.prop_name: Name of property holding a phandle value
  *  parse_prop.index: For properties holding a list of phandles, this is the
  *                   index into the list
+ * @get_con_dev: If the consumer node containing the property is never converted
+ *              to a struct device, implement this callback so fw_devlink
+ *              can use it to find the true consumer.
  * @optional: Describes whether a supplier is mandatory or not
- * @node_not_dev: The consumer node containing the property is never converted
- *               to a struct device. Instead, parse ancestor nodes for the
- *               compatible property to find a node corresponding to a device.
  *
  * Returns:
  * parse_prop() return values are
@@ -1236,15 +1208,15 @@ static struct device_node *parse_##fname(struct device_node *np,             \
 struct supplier_bindings {
        struct device_node *(*parse_prop)(struct device_node *np,
                                          const char *prop_name, int index);
+       struct device_node *(*get_con_dev)(struct device_node *np);
        bool optional;
-       bool node_not_dev;
 };
 
 DEFINE_SIMPLE_PROP(clocks, "clocks", "#clock-cells")
 DEFINE_SIMPLE_PROP(interconnects, "interconnects", "#interconnect-cells")
 DEFINE_SIMPLE_PROP(iommus, "iommus", "#iommu-cells")
 DEFINE_SIMPLE_PROP(mboxes, "mboxes", "#mbox-cells")
-DEFINE_SIMPLE_PROP(io_channels, "io-channel", "#io-channel-cells")
+DEFINE_SIMPLE_PROP(io_channels, "io-channels", "#io-channel-cells")
 DEFINE_SIMPLE_PROP(interrupt_parent, "interrupt-parent", NULL)
 DEFINE_SIMPLE_PROP(dmas, "dmas", "#dma-cells")
 DEFINE_SIMPLE_PROP(power_domains, "power-domains", "#power-domain-cells")
@@ -1262,7 +1234,6 @@ DEFINE_SIMPLE_PROP(pinctrl5, "pinctrl-5", NULL)
 DEFINE_SIMPLE_PROP(pinctrl6, "pinctrl-6", NULL)
 DEFINE_SIMPLE_PROP(pinctrl7, "pinctrl-7", NULL)
 DEFINE_SIMPLE_PROP(pinctrl8, "pinctrl-8", NULL)
-DEFINE_SIMPLE_PROP(remote_endpoint, "remote-endpoint", NULL)
 DEFINE_SIMPLE_PROP(pwms, "pwms", "#pwm-cells")
 DEFINE_SIMPLE_PROP(resets, "resets", "#reset-cells")
 DEFINE_SIMPLE_PROP(leds, "leds", NULL)
@@ -1328,6 +1299,17 @@ static struct device_node *parse_interrupts(struct device_node *np,
        return of_irq_parse_one(np, index, &sup_args) ? NULL : sup_args.np;
 }
 
+static struct device_node *parse_remote_endpoint(struct device_node *np,
+                                                const char *prop_name,
+                                                int index)
+{
+       /* Return NULL for index > 0 to signify end of remote-endpoints. */
+       if (index > 0 || strcmp(prop_name, "remote-endpoint"))
+               return NULL;
+
+       return of_graph_get_remote_port_parent(np);
+}
+
 static const struct supplier_bindings of_supplier_bindings[] = {
        { .parse_prop = parse_clocks, },
        { .parse_prop = parse_interconnects, },
@@ -1352,7 +1334,10 @@ static const struct supplier_bindings of_supplier_bindings[] = {
        { .parse_prop = parse_pinctrl6, },
        { .parse_prop = parse_pinctrl7, },
        { .parse_prop = parse_pinctrl8, },
-       { .parse_prop = parse_remote_endpoint, .node_not_dev = true, },
+       {
+               .parse_prop = parse_remote_endpoint,
+               .get_con_dev = of_graph_get_port_parent,
+       },
        { .parse_prop = parse_pwms, },
        { .parse_prop = parse_resets, },
        { .parse_prop = parse_leds, },
@@ -1403,8 +1388,8 @@ static int of_link_property(struct device_node *con_np, const char *prop_name)
                while ((phandle = s->parse_prop(con_np, prop_name, i))) {
                        struct device_node *con_dev_np;
 
-                       con_dev_np = s->node_not_dev
-                                       ? of_get_compat_node_parent(con_np)
+                       con_dev_np = s->get_con_dev
+                                       ? s->get_con_dev(con_np)
                                        : of_node_get(con_np);
                        matched = true;
                        i++;
index cfd60e35a8992d7d1bf7ee1ea42c10b6f43a7a2e..d7593bde2d02f39c2532ae4d0be41cccaec38526 100644 (file)
@@ -50,6 +50,12 @@ static struct unittest_results {
        failed; \
 })
 
+#ifdef CONFIG_OF_KOBJ
+#define OF_KREF_READ(NODE) kref_read(&(NODE)->kobj.kref)
+#else
+#define OF_KREF_READ(NODE) 1
+#endif
+
 /*
  * Expected message may have a message level other than KERN_INFO.
  * Print the expected message only if the current loglevel will allow
@@ -570,7 +576,7 @@ static void __init of_unittest_parse_phandle_with_args_map(void)
                        pr_err("missing testcase data\n");
                        return;
                }
-               prefs[i] = kref_read(&p[i]->kobj.kref);
+               prefs[i] = OF_KREF_READ(p[i]);
        }
 
        rc = of_count_phandle_with_args(np, "phandle-list", "#phandle-cells");
@@ -693,9 +699,9 @@ static void __init of_unittest_parse_phandle_with_args_map(void)
        unittest(rc == -EINVAL, "expected:%i got:%i\n", -EINVAL, rc);
 
        for (i = 0; i < ARRAY_SIZE(p); ++i) {
-               unittest(prefs[i] == kref_read(&p[i]->kobj.kref),
+               unittest(prefs[i] == OF_KREF_READ(p[i]),
                         "provider%d: expected:%d got:%d\n",
-                        i, prefs[i], kref_read(&p[i]->kobj.kref));
+                        i, prefs[i], OF_KREF_READ(p[i]));
                of_node_put(p[i]);
        }
 }
index 9c2137dae429aa26cd69bfaadb9706193946b2b8..826b5016a101022b990045fa7b68afe85be80c7a 100644 (file)
@@ -386,21 +386,8 @@ void pci_bus_add_devices(const struct pci_bus *bus)
 }
 EXPORT_SYMBOL(pci_bus_add_devices);
 
-/** pci_walk_bus - walk devices on/under bus, calling callback.
- *  @top      bus whose devices should be walked
- *  @cb       callback to be called for each device found
- *  @userdata arbitrary pointer to be passed to callback.
- *
- *  Walk the given bus, including any bridged devices
- *  on buses under this bus.  Call the provided callback
- *  on each device found.
- *
- *  We check the return of @cb each time. If it returns anything
- *  other than 0, we break out.
- *
- */
-void pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void *),
-                 void *userdata)
+static void __pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void *),
+                          void *userdata, bool locked)
 {
        struct pci_dev *dev;
        struct pci_bus *bus;
@@ -408,7 +395,8 @@ void pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void *),
        int retval;
 
        bus = top;
-       down_read(&pci_bus_sem);
+       if (!locked)
+               down_read(&pci_bus_sem);
        next = top->devices.next;
        for (;;) {
                if (next == &bus->devices) {
@@ -431,10 +419,37 @@ void pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void *),
                if (retval)
                        break;
        }
-       up_read(&pci_bus_sem);
+       if (!locked)
+               up_read(&pci_bus_sem);
+}
+
+/**
+ *  pci_walk_bus - walk devices on/under bus, calling callback.
+ *  @top: bus whose devices should be walked
+ *  @cb: callback to be called for each device found
+ *  @userdata: arbitrary pointer to be passed to callback
+ *
+ *  Walk the given bus, including any bridged devices
+ *  on buses under this bus.  Call the provided callback
+ *  on each device found.
+ *
+ *  We check the return of @cb each time. If it returns anything
+ *  other than 0, we break out.
+ */
+void pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void *), void *userdata)
+{
+       __pci_walk_bus(top, cb, userdata, false);
 }
 EXPORT_SYMBOL_GPL(pci_walk_bus);
 
+void pci_walk_bus_locked(struct pci_bus *top, int (*cb)(struct pci_dev *, void *), void *userdata)
+{
+       lockdep_assert_held(&pci_bus_sem);
+
+       __pci_walk_bus(top, cb, userdata, true);
+}
+EXPORT_SYMBOL_GPL(pci_walk_bus_locked);
+
 struct pci_bus *pci_bus_get(struct pci_bus *bus)
 {
        if (bus)
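
The refactor keeps one traversal body and threads a "locked" flag through
it, so the new pci_walk_bus_locked() can run while the caller already holds
pci_bus_sem and pci_walk_bus() keeps taking the lock itself, with
lockdep_assert_held() enforcing the contract. The same shape in a standalone
pthread sketch (illustrative names, not PCI code):

    #include <pthread.h>
    #include <stdbool.h>
    #include <stdio.h>

    static pthread_rwlock_t bus_sem = PTHREAD_RWLOCK_INITIALIZER;

    static void __walk(void (*cb)(int), bool locked)
    {
        if (!locked)
            pthread_rwlock_rdlock(&bus_sem);
        for (int i = 0; i < 3; i++)
            cb(i);                        /* visit each device */
        if (!locked)
            pthread_rwlock_unlock(&bus_sem);
    }

    static void walk(void (*cb)(int))         /* takes the lock itself */
    {
        __walk(cb, false);
    }

    static void walk_locked(void (*cb)(int))  /* caller holds the lock */
    {
        /* the kernel asserts this with lockdep_assert_held() */
        __walk(cb, true);
    }

    static void show(int i) { printf("dev %d\n", i); }

    int main(void)
    {
        walk(show);

        pthread_rwlock_rdlock(&bus_sem);
        walk_locked(show);
        pthread_rwlock_unlock(&bus_sem);
        return 0;
    }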
index 5befed2dc02b70bc5f4593b6e04a173e8e7ded08..9a437cfce073c16996927af40fa0e3816b7c7b32 100644 (file)
@@ -6,6 +6,7 @@
  * Author: Kishon Vijay Abraham I <kishon@ti.com>
  */
 
+#include <linux/align.h>
 #include <linux/bitfield.h>
 #include <linux/of.h>
 #include <linux/platform_device.h>
@@ -482,9 +483,10 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
                reg = ep_func->msi_cap + PCI_MSI_DATA_32;
                msg_data = dw_pcie_ep_readw_dbi(ep, func_no, reg);
        }
-       aligned_offset = msg_addr_lower & (epc->mem->window.page_size - 1);
-       msg_addr = ((u64)msg_addr_upper) << 32 |
-                       (msg_addr_lower & ~aligned_offset);
+       msg_addr = ((u64)msg_addr_upper) << 32 | msg_addr_lower;
+
+       aligned_offset = msg_addr & (epc->mem->window.page_size - 1);
+       msg_addr = ALIGN_DOWN(msg_addr, epc->mem->window.page_size);
        ret = dw_pcie_ep_map_addr(epc, func_no, 0, ep->msi_mem_phys, msg_addr,
                                  epc->mem->window.page_size);
        if (ret)
@@ -551,7 +553,7 @@ int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no,
        }
 
        aligned_offset = msg_addr & (epc->mem->window.page_size - 1);
-       msg_addr &= ~aligned_offset;
+       msg_addr = ALIGN_DOWN(msg_addr, epc->mem->window.page_size);
        ret = dw_pcie_ep_map_addr(epc, func_no, 0, ep->msi_mem_phys, msg_addr,
                                  epc->mem->window.page_size);
        if (ret)
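
Both interrupt paths now round the message address down with ALIGN_DOWN()
over the full 64-bit value instead of masking with the complement of an
extracted offset. Complement-masking only behaves when every operand has the
address's width; the sketch below shows the failure mode when the offset
sits in a 32-bit variable (a hypothetical setup chosen to motivate the
ALIGN_DOWN() form, with ALIGN_DOWN defined locally for power-of-two
alignments):

    #include <stdint.h>
    #include <stdio.h>

    #define ALIGN_DOWN(x, a)    ((x) & ~((uint64_t)(a) - 1))

    int main(void)
    {
        uint64_t page_size = 0x1000;
        uint64_t msg_addr = 0x100001234ULL;    /* target above 4 GiB */

        uint32_t off32 = msg_addr & (page_size - 1);

        /* ~off32 is a 32-bit value; zero-extension wipes the upper half
         * of the address along with the page offset */
        uint64_t masked = msg_addr & ~off32;

        uint64_t aligned = ALIGN_DOWN(msg_addr, page_size);

        printf("masked  = %#llx\n", (unsigned long long)masked);  /* 0x1000 */
        printf("aligned = %#llx\n", (unsigned long long)aligned); /* 0x100001000 */
        return 0;
    }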
index 10f2d0bb86bec008e82e6a86211161c693ae568e..2ce2a3bd932bd7e3824b69cc9450135895b7a89e 100644 (file)
@@ -972,7 +972,7 @@ static int qcom_pcie_enable_aspm(struct pci_dev *pdev, void *userdata)
         * Downstream devices need to be in D0 state before enabling PCI PM
         * substates.
         */
-       pci_set_power_state(pdev, PCI_D0);
+       pci_set_power_state_locked(pdev, PCI_D0);
        pci_enable_link_state_locked(pdev, PCIE_LINK_STATE_ALL);
 
        return 0;
index c8be056c248ded75cae622f1d8cd82bcc81e5500..cfd84a899c82d881f9ed5c446aed0c204bfd3cd4 100644 (file)
@@ -61,7 +61,7 @@ static irq_hw_number_t pci_msi_domain_calc_hwirq(struct msi_desc *desc)
 
        return (irq_hw_number_t)desc->msi_index |
                pci_dev_id(dev) << 11 |
-               (pci_domain_nr(dev->bus) & 0xFFFFFFFF) << 27;
+               ((irq_hw_number_t)(pci_domain_nr(dev->bus) & 0xFFFFFFFF)) << 27;
 }
 
 static void pci_msi_domain_set_desc(msi_alloc_info_t *arg,
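
In pci_msi_domain_calc_hwirq() the domain number is an int, so shifting it
left by 27 happens in 32-bit arithmetic and discards anything shifted past
bit 31; the fix widens to irq_hw_number_t before the shift. A standalone
demonstration:

    #include <stdint.h>
    #include <stdio.h>

    typedef uint64_t irq_hw_number_t;

    int main(void)
    {
        int domain_nr = 0x40;    /* any domain with bits above bit 4 */

        /* old: the shift is evaluated in 32-bit arithmetic, so the
         * domain bits wrap away before the widening assignment */
        irq_hw_number_t old = (domain_nr & 0xFFFFFFFF) << 27;

        /* new: widen first, then shift */
        irq_hw_number_t new =
            ((irq_hw_number_t)(domain_nr & 0xFFFFFFFF)) << 27;

        printf("old=%#llx new=%#llx\n",
               (unsigned long long)old, (unsigned long long)new);
        return 0;
    }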
index d8f11a078924c1336326456b0e3f37f7b0e66df9..c3585229c12a2145401d675ff84c20288b8f158e 100644 (file)
@@ -1354,6 +1354,7 @@ end:
 /**
  * pci_set_full_power_state - Put a PCI device into D0 and update its state
  * @dev: PCI device to power up
+ * @locked: whether pci_bus_sem is held
  *
  * Call pci_power_up() to put @dev into D0, read from its PCI_PM_CTRL register
  * to confirm the state change, restore its BARs if they might be lost and
@@ -1363,7 +1364,7 @@ end:
  * to D0, it is more efficient to use pci_power_up() directly instead of this
  * function.
  */
-static int pci_set_full_power_state(struct pci_dev *dev)
+static int pci_set_full_power_state(struct pci_dev *dev, bool locked)
 {
        u16 pmcsr;
        int ret;
@@ -1399,7 +1400,7 @@ static int pci_set_full_power_state(struct pci_dev *dev)
        }
 
        if (dev->bus->self)
-               pcie_aspm_pm_state_change(dev->bus->self);
+               pcie_aspm_pm_state_change(dev->bus->self, locked);
 
        return 0;
 }
@@ -1428,10 +1429,22 @@ void pci_bus_set_current_state(struct pci_bus *bus, pci_power_t state)
                pci_walk_bus(bus, __pci_dev_set_current_state, &state);
 }
 
+static void __pci_bus_set_current_state(struct pci_bus *bus, pci_power_t state, bool locked)
+{
+       if (!bus)
+               return;
+
+       if (locked)
+               pci_walk_bus_locked(bus, __pci_dev_set_current_state, &state);
+       else
+               pci_walk_bus(bus, __pci_dev_set_current_state, &state);
+}
+
 /**
  * pci_set_low_power_state - Put a PCI device into a low-power state.
  * @dev: PCI device to handle.
  * @state: PCI power state (D1, D2, D3hot) to put the device into.
+ * @locked: whether pci_bus_sem is held
  *
  * Use the device's PCI_PM_CTRL register to put it into a low-power state.
  *
@@ -1442,7 +1455,7 @@ void pci_bus_set_current_state(struct pci_bus *bus, pci_power_t state)
  * 0 if device already is in the requested state.
  * 0 if device's power state has been successfully changed.
  */
-static int pci_set_low_power_state(struct pci_dev *dev, pci_power_t state)
+static int pci_set_low_power_state(struct pci_dev *dev, pci_power_t state, bool locked)
 {
        u16 pmcsr;
 
@@ -1496,29 +1509,12 @@ static int pci_set_low_power_state(struct pci_dev *dev, pci_power_t state)
                                     pci_power_name(state));
 
        if (dev->bus->self)
-               pcie_aspm_pm_state_change(dev->bus->self);
+               pcie_aspm_pm_state_change(dev->bus->self, locked);
 
        return 0;
 }
 
-/**
- * pci_set_power_state - Set the power state of a PCI device
- * @dev: PCI device to handle.
- * @state: PCI power state (D0, D1, D2, D3hot) to put the device into.
- *
- * Transition a device to a new power state, using the platform firmware and/or
- * the device's PCI PM registers.
- *
- * RETURN VALUE:
- * -EINVAL if the requested state is invalid.
- * -EIO if device does not support PCI PM or its PM capabilities register has a
- * wrong version, or device doesn't support the requested state.
- * 0 if the transition is to D1 or D2 but D1 and D2 are not supported.
- * 0 if device already is in the requested state.
- * 0 if the transition is to D3 but D3 is not supported.
- * 0 if device's power state has been successfully changed.
- */
-int pci_set_power_state(struct pci_dev *dev, pci_power_t state)
+static int __pci_set_power_state(struct pci_dev *dev, pci_power_t state, bool locked)
 {
        int error;
 
@@ -1542,7 +1538,7 @@ int pci_set_power_state(struct pci_dev *dev, pci_power_t state)
                return 0;
 
        if (state == PCI_D0)
-               return pci_set_full_power_state(dev);
+               return pci_set_full_power_state(dev, locked);
 
        /*
         * This device is quirked not to be put into D3, so don't put it in
@@ -1556,16 +1552,16 @@ int pci_set_power_state(struct pci_dev *dev, pci_power_t state)
                 * To put the device in D3cold, put it into D3hot in the native
                 * way, then put it into D3cold using platform ops.
                 */
-               error = pci_set_low_power_state(dev, PCI_D3hot);
+               error = pci_set_low_power_state(dev, PCI_D3hot, locked);
 
                if (pci_platform_power_transition(dev, PCI_D3cold))
                        return error;
 
                /* Powering off a bridge may power off the whole hierarchy */
                if (dev->current_state == PCI_D3cold)
-                       pci_bus_set_current_state(dev->subordinate, PCI_D3cold);
+                       __pci_bus_set_current_state(dev->subordinate, PCI_D3cold, locked);
        } else {
-               error = pci_set_low_power_state(dev, state);
+               error = pci_set_low_power_state(dev, state, locked);
 
                if (pci_platform_power_transition(dev, state))
                        return error;
@@ -1573,8 +1569,38 @@ int pci_set_power_state(struct pci_dev *dev, pci_power_t state)
 
        return 0;
 }
+
+/**
+ * pci_set_power_state - Set the power state of a PCI device
+ * @dev: PCI device to handle.
+ * @state: PCI power state (D0, D1, D2, D3hot) to put the device into.
+ *
+ * Transition a device to a new power state, using the platform firmware and/or
+ * the device's PCI PM registers.
+ *
+ * RETURN VALUE:
+ * -EINVAL if the requested state is invalid.
+ * -EIO if device does not support PCI PM or its PM capabilities register has a
+ * wrong version, or device doesn't support the requested state.
+ * 0 if the transition is to D1 or D2 but D1 and D2 are not supported.
+ * 0 if device already is in the requested state.
+ * 0 if the transition is to D3 but D3 is not supported.
+ * 0 if device's power state has been successfully changed.
+ */
+int pci_set_power_state(struct pci_dev *dev, pci_power_t state)
+{
+       return __pci_set_power_state(dev, state, false);
+}
 EXPORT_SYMBOL(pci_set_power_state);
 
+int pci_set_power_state_locked(struct pci_dev *dev, pci_power_t state)
+{
+       lockdep_assert_held(&pci_bus_sem);
+
+       return __pci_set_power_state(dev, state, true);
+}
+EXPORT_SYMBOL(pci_set_power_state_locked);
+
 #define PCI_EXP_SAVE_REGS      7
 
 static struct pci_cap_saved_state *_pci_find_saved_cap(struct pci_dev *pci_dev,
@@ -2496,29 +2522,36 @@ static void pci_pme_list_scan(struct work_struct *work)
                if (pdev->pme_poll) {
                        struct pci_dev *bridge = pdev->bus->self;
                        struct device *dev = &pdev->dev;
-                       int pm_status;
+                       struct device *bdev = bridge ? &bridge->dev : NULL;
+                       int bref = 0;
 
                        /*
-                        * If bridge is in low power state, the
-                        * configuration space of subordinate devices
-                        * may be not accessible
+                        * If we have a bridge, it should be in an active/D0
+                        * state or the configuration space of subordinate
+                        * devices may not be accessible or stable over the
+                        * course of the call.
                         */
-                       if (bridge && bridge->current_state != PCI_D0)
-                               continue;
+                       if (bdev) {
+                               bref = pm_runtime_get_if_active(bdev, true);
+                               if (!bref)
+                                       continue;
+
+                               if (bridge->current_state != PCI_D0)
+                                       goto put_bridge;
+                       }
 
                        /*
-                        * If the device is in a low power state it
-                        * should not be polled either.
+                        * The device itself should be suspended but config
+                        * space must be accessible, therefore it cannot be in
+                        * D3cold.
                         */
-                       pm_status = pm_runtime_get_if_active(dev, true);
-                       if (!pm_status)
-                               continue;
-
-                       if (pdev->current_state != PCI_D3cold)
+                       if (pm_runtime_suspended(dev) &&
+                           pdev->current_state != PCI_D3cold)
                                pci_pme_wakeup(pdev, NULL);
 
-                       if (pm_status > 0)
-                               pm_runtime_put(dev);
+put_bridge:
+                       if (bref > 0)
+                               pm_runtime_put(bdev);
                } else {
                        list_del(&pme_dev->list);
                        kfree(pme_dev);
index 2336a8d1edab27646220794a3a4cdd085ba7b3e9..e9750b1b19bad5bfc500909f390f1d890f5eab73 100644 (file)
@@ -571,12 +571,12 @@ int pcie_retrain_link(struct pci_dev *pdev, bool use_lt);
 #ifdef CONFIG_PCIEASPM
 void pcie_aspm_init_link_state(struct pci_dev *pdev);
 void pcie_aspm_exit_link_state(struct pci_dev *pdev);
-void pcie_aspm_pm_state_change(struct pci_dev *pdev);
+void pcie_aspm_pm_state_change(struct pci_dev *pdev, bool locked);
 void pcie_aspm_powersave_config_link(struct pci_dev *pdev);
 #else
 static inline void pcie_aspm_init_link_state(struct pci_dev *pdev) { }
 static inline void pcie_aspm_exit_link_state(struct pci_dev *pdev) { }
-static inline void pcie_aspm_pm_state_change(struct pci_dev *pdev) { }
+static inline void pcie_aspm_pm_state_change(struct pci_dev *pdev, bool locked) { }
 static inline void pcie_aspm_powersave_config_link(struct pci_dev *pdev) { }
 #endif
 
index 5a0066ecc3c5adcc97e14f08f166c783f254f6e9..bc0bd86695ec62a2d43428b69eb562f771334bb3 100644 (file)
@@ -1003,8 +1003,11 @@ void pcie_aspm_exit_link_state(struct pci_dev *pdev)
        up_read(&pci_bus_sem);
 }
 
-/* @pdev: the root port or switch downstream port */
-void pcie_aspm_pm_state_change(struct pci_dev *pdev)
+/*
+ * @pdev: the root port or switch downstream port
+ * @locked: whether pci_bus_sem is held
+ */
+void pcie_aspm_pm_state_change(struct pci_dev *pdev, bool locked)
 {
        struct pcie_link_state *link = pdev->link_state;
 
@@ -1014,12 +1017,14 @@ void pcie_aspm_pm_state_change(struct pci_dev *pdev)
         * Devices changed PM state, we should recheck if latency
         * meets all functions' requirement
         */
-       down_read(&pci_bus_sem);
+       if (!locked)
+               down_read(&pci_bus_sem);
        mutex_lock(&aspm_lock);
        pcie_update_aspm_capable(link->root);
        pcie_config_aspm_path(link);
        mutex_unlock(&aspm_lock);
-       up_read(&pci_bus_sem);
+       if (!locked)
+               up_read(&pci_bus_sem);
 }
 
 void pcie_aspm_powersave_config_link(struct pci_dev *pdev)
index c584165b13babd946eb7fcd84300bdf68abe31af..7e3aa7e2345fa3a9d7d3b1cb9c5dd40cc15498ff 100644 (file)
@@ -2305,6 +2305,17 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset)
                                dev_dbg(cmn->dev, "ignoring external node %llx\n", reg);
                                continue;
                        }
+                       /*
+                        * AmpereOneX erratum AC04_MESH_1 makes some XPs report a bogus
+                        * child count larger than the number of valid child pointers.
+                        * A child offset of 0 can only occur on CMN-600; otherwise it
+                        * would imply the root node being its own grandchild, which
+                        * we can safely dismiss in general.
+                        */
+                       if (reg == 0 && cmn->part != PART_CMN600) {
+                               dev_dbg(cmn->dev, "bogus child pointer?\n");
+                               continue;
+                       }
 
                        arm_cmn_init_node_info(cmn, reg & CMN_CHILD_NODE_ADDR, dn);
 
index 365d964b0f6a6d7382455f3b07035fafe1de1fa2..308c9969642e1f149cdebd9f8aed7812adbc5f1f 100644 (file)
@@ -59,7 +59,7 @@
 #define   CXL_PMU_COUNTER_CFG_EVENT_GRP_ID_IDX_MSK     GENMASK_ULL(63, 59)
 
 #define CXL_PMU_FILTER_CFG_REG(n, f)   (0x400 + 4 * ((f) + (n) * 8))
-#define   CXL_PMU_FILTER_CFG_VALUE_MSK                 GENMASK(15, 0)
+#define   CXL_PMU_FILTER_CFG_VALUE_MSK                 GENMASK(31, 0)
 
 #define CXL_PMU_COUNTER_REG(n)         (0xc00 + 8 * (n))
 
@@ -314,9 +314,9 @@ static bool cxl_pmu_config1_get_edge(struct perf_event *event)
 }
 
 /*
- * CPMU specification allows for 8 filters, each with a 16 bit value...
- * So we need to find 8x16bits to store it in.
- * As the value used for disable is 0xffff, a separate enable switch
+ * CPMU specification allows for 8 filters, each with a 32 bit value...
+ * So we need 8x32 bits to store them.
+ * As the value used for disable is 0xffff_ffff, a separate enable switch
  * is needed.
  */
 
@@ -419,7 +419,7 @@ static struct attribute *cxl_pmu_event_attrs[] = {
        CXL_PMU_EVENT_CXL_ATTR(s2m_ndr_cmp,                     CXL_PMU_GID_S2M_NDR, BIT(0)),
        CXL_PMU_EVENT_CXL_ATTR(s2m_ndr_cmps,                    CXL_PMU_GID_S2M_NDR, BIT(1)),
        CXL_PMU_EVENT_CXL_ATTR(s2m_ndr_cmpe,                    CXL_PMU_GID_S2M_NDR, BIT(2)),
-       CXL_PMU_EVENT_CXL_ATTR(s2m_ndr_biconflictack,           CXL_PMU_GID_S2M_NDR, BIT(3)),
+       CXL_PMU_EVENT_CXL_ATTR(s2m_ndr_biconflictack,           CXL_PMU_GID_S2M_NDR, BIT(4)),
        /* CXL rev 3.0 Table 3-46 S2M DRS opcodes */
        CXL_PMU_EVENT_CXL_ATTR(s2m_drs_memdata,                 CXL_PMU_GID_S2M_DRS, BIT(0)),
        CXL_PMU_EVENT_CXL_ATTR(s2m_drs_memdatanxm,              CXL_PMU_GID_S2M_DRS, BIT(1)),
@@ -642,7 +642,7 @@ static void cxl_pmu_event_start(struct perf_event *event, int flags)
                if (cxl_pmu_config1_hdm_filter_en(event))
                        cfg = cxl_pmu_config2_get_hdm_decoder(event);
                else
-                       cfg = GENMASK(15, 0); /* No filtering if 0xFFFF_FFFF */
+                       cfg = GENMASK(31, 0); /* No filtering if 0xFFFF_FFFF */
                writeq(cfg, base + CXL_PMU_FILTER_CFG_REG(hwc->idx, 0));
        }
 
index 0dda70e1ef90a19017c902689f970dea684b4f4c..c78a6fd6c57f612221749d44673d47845911231f 100644 (file)
@@ -150,19 +150,11 @@ u64 riscv_pmu_ctr_get_width_mask(struct perf_event *event)
        struct riscv_pmu *rvpmu = to_riscv_pmu(event->pmu);
        struct hw_perf_event *hwc = &event->hw;
 
-       if (!rvpmu->ctr_get_width)
-       /**
-        * If the pmu driver doesn't support counter width, set it to default
-        * maximum allowed by the specification.
-        */
-               cwidth = 63;
-       else {
-               if (hwc->idx == -1)
-                       /* Handle init case where idx is not initialized yet */
-                       cwidth = rvpmu->ctr_get_width(0);
-               else
-                       cwidth = rvpmu->ctr_get_width(hwc->idx);
-       }
+       if (hwc->idx == -1)
+               /* Handle init case where idx is not initialized yet */
+               cwidth = rvpmu->ctr_get_width(0);
+       else
+               cwidth = rvpmu->ctr_get_width(hwc->idx);
 
        return GENMASK_ULL(cwidth, 0);
 }
index 79fdd667922e812612aae1f597714bbefa0d4899..fa0bccf4edf2ea6172c7ee72d577cb0904073ea7 100644 (file)
@@ -37,6 +37,12 @@ static int pmu_legacy_event_map(struct perf_event *event, u64 *config)
        return pmu_legacy_ctr_get_idx(event);
 }
 
+/* cycle & instret are always 64 bit wide; the SBI spec reports one less (63) */
+static int pmu_legacy_ctr_get_width(int idx)
+{
+       return 63;
+}
+
 static u64 pmu_legacy_read_ctr(struct perf_event *event)
 {
        struct hw_perf_event *hwc = &event->hw;
@@ -111,12 +117,14 @@ static void pmu_legacy_init(struct riscv_pmu *pmu)
        pmu->ctr_stop = NULL;
        pmu->event_map = pmu_legacy_event_map;
        pmu->ctr_get_idx = pmu_legacy_ctr_get_idx;
-       pmu->ctr_get_width = NULL;
+       pmu->ctr_get_width = pmu_legacy_ctr_get_width;
        pmu->ctr_clear_idx = NULL;
        pmu->ctr_read = pmu_legacy_read_ctr;
        pmu->event_mapped = pmu_legacy_event_mapped;
        pmu->event_unmapped = pmu_legacy_event_unmapped;
        pmu->csr_index = pmu_legacy_csr_index;
+       pmu->pmu.capabilities |= PERF_PMU_CAP_NO_INTERRUPT;
+       pmu->pmu.capabilities |= PERF_PMU_CAP_NO_EXCLUDE;
 
        perf_pmu_register(&pmu->pmu, "cpu", PERF_TYPE_RAW);
 }
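
With every riscv_pmu now guaranteed to provide ctr_get_width(), the core can
drop its NULL fallback, and the legacy driver reports 63: the SBI convention
is that the width value is one less than the number of bits, and
riscv_pmu_ctr_get_width_mask() expands it with GENMASK_ULL(cwidth, 0). A
quick check of that mask arithmetic (GENMASK_ULL defined locally, equivalent
to the kernel's for 0 <= l <= h <= 63):

    #include <stdio.h>

    #define GENMASK_ULL(h, l) \
        (((~0ULL) >> (63 - (h))) & ((~0ULL) << (l)))

    int main(void)
    {
        int cwidth = 63;    /* cycle/instret: full 64-bit counters */

        printf("mask = %#llx\n", GENMASK_ULL(cwidth, 0));
        return 0;
    }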
index 16acd4dcdb96c75e07b45a3745a71842f2d7d2b8..452aab49db1e8ccc35a6bb0b76661ca7cb6fb71f 100644 (file)
@@ -512,7 +512,7 @@ static void pmu_sbi_set_scounteren(void *arg)
 
        if (event->hw.idx != -1)
                csr_write(CSR_SCOUNTEREN,
-                         csr_read(CSR_SCOUNTEREN) | (1 << pmu_sbi_csr_index(event)));
+                         csr_read(CSR_SCOUNTEREN) | BIT(pmu_sbi_csr_index(event)));
 }
 
 static void pmu_sbi_reset_scounteren(void *arg)
@@ -521,7 +521,7 @@ static void pmu_sbi_reset_scounteren(void *arg)
 
        if (event->hw.idx != -1)
                csr_write(CSR_SCOUNTEREN,
-                         csr_read(CSR_SCOUNTEREN) & ~(1 << pmu_sbi_csr_index(event)));
+                         csr_read(CSR_SCOUNTEREN) & ~BIT(pmu_sbi_csr_index(event)));
 }
 
 static void pmu_sbi_ctr_start(struct perf_event *event, u64 ival)
@@ -731,14 +731,14 @@ static irqreturn_t pmu_sbi_ovf_handler(int irq, void *dev)
                /* compute hardware counter index */
                hidx = info->csr - CSR_CYCLE;
                /* check if the corresponding bit is set in sscountovf */
-               if (!(overflow & (1 << hidx)))
+               if (!(overflow & BIT(hidx)))
                        continue;
 
                /*
                 * Keep a track of overflowed counters so that they can be started
                 * with updated initial value.
                 */
-               overflowed_ctrs |= 1 << lidx;
+               overflowed_ctrs |= BIT(lidx);
                hw_evt = &event->hw;
                riscv_pmu_event_update(event);
                perf_sample_data_init(&data, 0, hw_evt->last_period);
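
The scounteren and overflow hunks swap open-coded (1 << idx) for BIT(idx).
Beyond style, BIT(n) expands to (1UL << (n)), so the shift is performed at
unsigned long width; with a plain int 1, an index of 31 is already undefined
for signed arithmetic and in practice sign-extends when widened. A
standalone comparison (BIT defined locally as the kernel does):

    #include <stdio.h>

    #define BIT(n)    (1UL << (n))

    int main(void)
    {
        int idx = 31;

        /* int shift: UB for idx 31; typically yields INT_MIN, which
         * sign-extends to 0xffffffff80000000 on LP64 */
        unsigned long open_coded = 1 << idx;
        unsigned long with_bit = BIT(idx);

        printf("1 << %d = %#lx\n", idx, open_coded);
        printf("BIT(%d) = %#lx\n", idx, with_bit);
        return 0;
    }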
index e625b32889bfceaef9846db42e594e971cccb54d..0928a526e2ab3692eaeb1e4abaa45e23eee4cf5b 100644 (file)
@@ -706,7 +706,7 @@ static int mixel_dphy_probe(struct platform_device *pdev)
                        return ret;
                }
 
-               priv->id = of_alias_get_id(np, "mipi_dphy");
+               priv->id = of_alias_get_id(np, "mipi-dphy");
                if (priv->id < 0) {
                        dev_err(dev, "Failed to get phy node alias id: %d\n",
                                priv->id);
index c1a41b6cd29b1d8f785134627547d68e38079747..b5ac2b7995e7156b73e348814ce4e29b6f71a874 100644 (file)
@@ -96,6 +96,8 @@ static const struct serdes_mux lan966x_serdes_muxes[] = {
        SERDES_MUX_SGMII(SERDES6G(1), 3, HSIO_HW_CFG_SD6G_1_CFG,
                         HSIO_HW_CFG_SD6G_1_CFG_SET(1)),
 
+       SERDES_MUX_SGMII(SERDES6G(2), 4, 0, 0),
+
        SERDES_MUX_RGMII(RGMII(0), 2, HSIO_HW_CFG_RGMII_0_CFG |
                         HSIO_HW_CFG_RGMII_ENA |
                         HSIO_HW_CFG_GMII_ENA,
index a623f092b11f642bd3d35655e162a94a454bb14f..a43e20abb10d54a2ff2bbe29907f5c4597d6871d 100644 (file)
 #define EUSB2_TUNE_EUSB_EQU            0x5A
 #define EUSB2_TUNE_EUSB_HS_COMP_CUR    0x5B
 
-#define QCOM_EUSB2_REPEATER_INIT_CFG(r, v)     \
-       {                                       \
-               .reg = r,                       \
-               .val = v,                       \
-       }
-
-enum reg_fields {
-       F_TUNE_EUSB_HS_COMP_CUR,
-       F_TUNE_EUSB_EQU,
-       F_TUNE_EUSB_SLEW,
-       F_TUNE_USB2_HS_COMP_CUR,
-       F_TUNE_USB2_PREEM,
-       F_TUNE_USB2_EQU,
-       F_TUNE_USB2_SLEW,
-       F_TUNE_SQUELCH_U,
-       F_TUNE_HSDISC,
-       F_TUNE_RES_FSDIF,
-       F_TUNE_IUSB2,
-       F_TUNE_USB2_CROSSOVER,
-       F_NUM_TUNE_FIELDS,
-
-       F_FORCE_VAL_5 = F_NUM_TUNE_FIELDS,
-       F_FORCE_EN_5,
-
-       F_EN_CTL1,
-
-       F_RPTR_STATUS,
-       F_NUM_FIELDS,
-};
-
-static struct reg_field eusb2_repeater_tune_reg_fields[F_NUM_FIELDS] = {
-       [F_TUNE_EUSB_HS_COMP_CUR] = REG_FIELD(EUSB2_TUNE_EUSB_HS_COMP_CUR, 0, 1),
-       [F_TUNE_EUSB_EQU] = REG_FIELD(EUSB2_TUNE_EUSB_EQU, 0, 1),
-       [F_TUNE_EUSB_SLEW] = REG_FIELD(EUSB2_TUNE_EUSB_SLEW, 0, 1),
-       [F_TUNE_USB2_HS_COMP_CUR] = REG_FIELD(EUSB2_TUNE_USB2_HS_COMP_CUR, 0, 1),
-       [F_TUNE_USB2_PREEM] = REG_FIELD(EUSB2_TUNE_USB2_PREEM, 0, 2),
-       [F_TUNE_USB2_EQU] = REG_FIELD(EUSB2_TUNE_USB2_EQU, 0, 1),
-       [F_TUNE_USB2_SLEW] = REG_FIELD(EUSB2_TUNE_USB2_SLEW, 0, 1),
-       [F_TUNE_SQUELCH_U] = REG_FIELD(EUSB2_TUNE_SQUELCH_U, 0, 2),
-       [F_TUNE_HSDISC] = REG_FIELD(EUSB2_TUNE_HSDISC, 0, 2),
-       [F_TUNE_RES_FSDIF] = REG_FIELD(EUSB2_TUNE_RES_FSDIF, 0, 2),
-       [F_TUNE_IUSB2] = REG_FIELD(EUSB2_TUNE_IUSB2, 0, 3),
-       [F_TUNE_USB2_CROSSOVER] = REG_FIELD(EUSB2_TUNE_USB2_CROSSOVER, 0, 2),
-
-       [F_FORCE_VAL_5] = REG_FIELD(EUSB2_FORCE_VAL_5, 0, 7),
-       [F_FORCE_EN_5] = REG_FIELD(EUSB2_FORCE_EN_5, 0, 7),
-
-       [F_EN_CTL1] = REG_FIELD(EUSB2_EN_CTL1, 0, 7),
-
-       [F_RPTR_STATUS] = REG_FIELD(EUSB2_RPTR_STATUS, 0, 7),
+enum eusb2_reg_layout {
+       TUNE_EUSB_HS_COMP_CUR,
+       TUNE_EUSB_EQU,
+       TUNE_EUSB_SLEW,
+       TUNE_USB2_HS_COMP_CUR,
+       TUNE_USB2_PREEM,
+       TUNE_USB2_EQU,
+       TUNE_USB2_SLEW,
+       TUNE_SQUELCH_U,
+       TUNE_HSDISC,
+       TUNE_RES_FSDIF,
+       TUNE_IUSB2,
+       TUNE_USB2_CROSSOVER,
+       NUM_TUNE_FIELDS,
+
+       FORCE_VAL_5 = NUM_TUNE_FIELDS,
+       FORCE_EN_5,
+
+       EN_CTL1,
+
+       RPTR_STATUS,
+       LAYOUT_SIZE,
 };
 
 struct eusb2_repeater_cfg {
@@ -98,10 +70,11 @@ struct eusb2_repeater_cfg {
 
 struct eusb2_repeater {
        struct device *dev;
-       struct regmap_field *regs[F_NUM_FIELDS];
+       struct regmap *regmap;
        struct phy *phy;
        struct regulator_bulk_data *vregs;
        const struct eusb2_repeater_cfg *cfg;
+       u32 base;
        enum phy_mode mode;
 };
 
@@ -109,10 +82,10 @@ static const char * const pm8550b_vreg_l[] = {
        "vdd18", "vdd3",
 };
 
-static const u32 pm8550b_init_tbl[F_NUM_TUNE_FIELDS] = {
-       [F_TUNE_IUSB2] = 0x8,
-       [F_TUNE_SQUELCH_U] = 0x3,
-       [F_TUNE_USB2_PREEM] = 0x5,
+static const u32 pm8550b_init_tbl[NUM_TUNE_FIELDS] = {
+       [TUNE_IUSB2] = 0x8,
+       [TUNE_SQUELCH_U] = 0x3,
+       [TUNE_USB2_PREEM] = 0x5,
 };
 
 static const struct eusb2_repeater_cfg pm8550b_eusb2_cfg = {
@@ -140,47 +113,42 @@ static int eusb2_repeater_init_vregs(struct eusb2_repeater *rptr)
 
 static int eusb2_repeater_init(struct phy *phy)
 {
-       struct reg_field *regfields = eusb2_repeater_tune_reg_fields;
        struct eusb2_repeater *rptr = phy_get_drvdata(phy);
        struct device_node *np = rptr->dev->of_node;
-       u32 init_tbl[F_NUM_TUNE_FIELDS] = { 0 };
-       u8 override;
+       struct regmap *regmap = rptr->regmap;
+       const u32 *init_tbl = rptr->cfg->init_tbl;
+       u8 tune_usb2_preem = init_tbl[TUNE_USB2_PREEM];
+       u8 tune_hsdisc = init_tbl[TUNE_HSDISC];
+       u8 tune_iusb2 = init_tbl[TUNE_IUSB2];
+       u32 base = rptr->base;
        u32 val;
        int ret;
-       int i;
+
+       of_property_read_u8(np, "qcom,tune-usb2-amplitude", &tune_iusb2);
+       of_property_read_u8(np, "qcom,tune-usb2-disc-thres", &tune_hsdisc);
+       of_property_read_u8(np, "qcom,tune-usb2-preem", &tune_usb2_preem);
 
        ret = regulator_bulk_enable(rptr->cfg->num_vregs, rptr->vregs);
        if (ret)
                return ret;
 
-       regmap_field_update_bits(rptr->regs[F_EN_CTL1], EUSB2_RPTR_EN, EUSB2_RPTR_EN);
+       regmap_write(regmap, base + EUSB2_EN_CTL1, EUSB2_RPTR_EN);
 
-       for (i = 0; i < F_NUM_TUNE_FIELDS; i++) {
-               if (init_tbl[i]) {
-                       regmap_field_update_bits(rptr->regs[i], init_tbl[i], init_tbl[i]);
-               } else {
-                       /* Write 0 if there's no value set */
-                       u32 mask = GENMASK(regfields[i].msb, regfields[i].lsb);
-
-                       regmap_field_update_bits(rptr->regs[i], mask, 0);
-               }
-       }
-       memcpy(init_tbl, rptr->cfg->init_tbl, sizeof(init_tbl));
+       regmap_write(regmap, base + EUSB2_TUNE_EUSB_HS_COMP_CUR, init_tbl[TUNE_EUSB_HS_COMP_CUR]);
+       regmap_write(regmap, base + EUSB2_TUNE_EUSB_EQU, init_tbl[TUNE_EUSB_EQU]);
+       regmap_write(regmap, base + EUSB2_TUNE_EUSB_SLEW, init_tbl[TUNE_EUSB_SLEW]);
+       regmap_write(regmap, base + EUSB2_TUNE_USB2_HS_COMP_CUR, init_tbl[TUNE_USB2_HS_COMP_CUR]);
+       regmap_write(regmap, base + EUSB2_TUNE_USB2_EQU, init_tbl[TUNE_USB2_EQU]);
+       regmap_write(regmap, base + EUSB2_TUNE_USB2_SLEW, init_tbl[TUNE_USB2_SLEW]);
+       regmap_write(regmap, base + EUSB2_TUNE_SQUELCH_U, init_tbl[TUNE_SQUELCH_U]);
+       regmap_write(regmap, base + EUSB2_TUNE_RES_FSDIF, init_tbl[TUNE_RES_FSDIF]);
+       regmap_write(regmap, base + EUSB2_TUNE_USB2_CROSSOVER, init_tbl[TUNE_USB2_CROSSOVER]);
 
-       if (!of_property_read_u8(np, "qcom,tune-usb2-amplitude", &override))
-               init_tbl[F_TUNE_IUSB2] = override;
+       regmap_write(regmap, base + EUSB2_TUNE_USB2_PREEM, tune_usb2_preem);
+       regmap_write(regmap, base + EUSB2_TUNE_HSDISC, tune_hsdisc);
+       regmap_write(regmap, base + EUSB2_TUNE_IUSB2, tune_iusb2);
 
-       if (!of_property_read_u8(np, "qcom,tune-usb2-disc-thres", &override))
-               init_tbl[F_TUNE_HSDISC] = override;
-
-       if (!of_property_read_u8(np, "qcom,tune-usb2-preem", &override))
-               init_tbl[F_TUNE_USB2_PREEM] = override;
-
-       for (i = 0; i < F_NUM_TUNE_FIELDS; i++)
-               regmap_field_update_bits(rptr->regs[i], init_tbl[i], init_tbl[i]);
-
-       ret = regmap_field_read_poll_timeout(rptr->regs[F_RPTR_STATUS],
-                                            val, val & RPTR_OK, 10, 5);
+       ret = regmap_read_poll_timeout(regmap, base + EUSB2_RPTR_STATUS, val, val & RPTR_OK, 10, 5);
        if (ret)
                dev_err(rptr->dev, "initialization timed-out\n");
 
@@ -191,6 +159,8 @@ static int eusb2_repeater_set_mode(struct phy *phy,
                                   enum phy_mode mode, int submode)
 {
        struct eusb2_repeater *rptr = phy_get_drvdata(phy);
+       struct regmap *regmap = rptr->regmap;
+       u32 base = rptr->base;
 
        switch (mode) {
        case PHY_MODE_USB_HOST:
@@ -199,10 +169,8 @@ static int eusb2_repeater_set_mode(struct phy *phy,
                 * per eUSB 1.2 Spec. Implement a software workaround below
                 * until the PHY and controller fix the observed behaviour.
                 */
-               regmap_field_update_bits(rptr->regs[F_FORCE_EN_5],
-                                        F_CLK_19P2M_EN, F_CLK_19P2M_EN);
-               regmap_field_update_bits(rptr->regs[F_FORCE_VAL_5],
-                                        V_CLK_19P2M_EN, V_CLK_19P2M_EN);
+               regmap_write(regmap, base + EUSB2_FORCE_EN_5, F_CLK_19P2M_EN);
+               regmap_write(regmap, base + EUSB2_FORCE_VAL_5, V_CLK_19P2M_EN);
                break;
        case PHY_MODE_USB_DEVICE:
                /*
@@ -211,10 +179,8 @@ static int eusb2_repeater_set_mode(struct phy *phy,
                 * repeater doesn't clear previous value due to shared
                 * regulators (say host <-> device mode switch).
                 */
-               regmap_field_update_bits(rptr->regs[F_FORCE_EN_5],
-                                        F_CLK_19P2M_EN, 0);
-               regmap_field_update_bits(rptr->regs[F_FORCE_VAL_5],
-                                        V_CLK_19P2M_EN, 0);
+               regmap_write(regmap, base + EUSB2_FORCE_EN_5, 0);
+               regmap_write(regmap, base + EUSB2_FORCE_VAL_5, 0);
                break;
        default:
                return -EINVAL;
@@ -243,9 +209,8 @@ static int eusb2_repeater_probe(struct platform_device *pdev)
        struct device *dev = &pdev->dev;
        struct phy_provider *phy_provider;
        struct device_node *np = dev->of_node;
-       struct regmap *regmap;
-       int i, ret;
        u32 res;
+       int ret;
 
        rptr = devm_kzalloc(dev, sizeof(*rptr), GFP_KERNEL);
        if (!rptr)
@@ -258,22 +223,15 @@ static int eusb2_repeater_probe(struct platform_device *pdev)
        if (!rptr->cfg)
                return -EINVAL;
 
-       regmap = dev_get_regmap(dev->parent, NULL);
-       if (!regmap)
+       rptr->regmap = dev_get_regmap(dev->parent, NULL);
+       if (!rptr->regmap)
                return -ENODEV;
 
        ret = of_property_read_u32(np, "reg", &res);
        if (ret < 0)
                return ret;
 
-       for (i = 0; i < F_NUM_FIELDS; i++)
-               eusb2_repeater_tune_reg_fields[i].reg += res;
-
-       ret = devm_regmap_field_bulk_alloc(dev, regmap, rptr->regs,
-                                          eusb2_repeater_tune_reg_fields,
-                                          F_NUM_FIELDS);
-       if (ret)
-               return ret;
+       rptr->base = res;
 
        ret = eusb2_repeater_init_vregs(rptr);
        if (ret < 0) {
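
The converted init path above boils the tuning sequence down to direct regmap writes followed by a single polled status read. A minimal sketch of the regmap_read_poll_timeout() idiom, reusing the EUSB2_RPTR_STATUS/RPTR_OK names from the hunk (rptr_wait_ready and the 5 ms budget are invented; the last two arguments are the poll interval and the total timeout, both in microseconds):

	/* Sketch only; assumes <linux/regmap.h> and the EUSB2_* constants above. */
	static int rptr_wait_ready(struct regmap *regmap, u32 base)
	{
		u32 val;

		/* Read every 10 us until RPTR_OK is set; give up after 5 ms. */
		return regmap_read_poll_timeout(regmap, base + EUSB2_RPTR_STATUS,
						val, val & RPTR_OK, 10, 5000);
	}

Worth noting: the call in the hunk passes (10, 5), a timeout shorter than the sleep interval, which is worth double-checking against the macro's (sleep_us, timeout_us) argument order.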
index c2590579190a935d76abc9cde99964c9958d3d07..03fb0d4b75d744492e4646af65287f61e7927f1b 100644 (file)
@@ -299,7 +299,7 @@ static int m31usb_phy_probe(struct platform_device *pdev)
 
        qphy->vreg = devm_regulator_get(dev, "vdda-phy");
        if (IS_ERR(qphy->vreg))
-               return dev_err_probe(dev, PTR_ERR(qphy->phy),
+               return dev_err_probe(dev, PTR_ERR(qphy->vreg),
                                     "failed to get vreg\n");
 
        phy_set_drvdata(qphy->phy, qphy);
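
The one-line fix above corrects a copy-paste error: the error code was taken from qphy->phy instead of the regulator pointer that actually failed. For reference, the dev_err_probe() pattern it relies on, sketched: dev_err_probe() returns the error it is given, and demotes -EPROBE_DEFER to a debug message while recording the deferral reason.

	vreg = devm_regulator_get(dev, "vdda-phy");
	if (IS_ERR(vreg))
		return dev_err_probe(dev, PTR_ERR(vreg),
				     "failed to get vreg\n");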
index 1ad10110dd2544b77ae38a1459497ae6e2905b84..17c4ad7553a5edd0960e8ff7e64e97dd7f22b5f5 100644 (file)
@@ -3562,14 +3562,6 @@ static int qmp_combo_probe(struct platform_device *pdev)
        if (ret)
                return ret;
 
-       ret = qmp_combo_typec_switch_register(qmp);
-       if (ret)
-               return ret;
-
-       ret = drm_aux_bridge_register(dev);
-       if (ret)
-               return ret;
-
        /* Check for legacy binding with child nodes. */
        usb_np = of_get_child_by_name(dev->of_node, "usb3-phy");
        if (usb_np) {
@@ -3589,6 +3581,14 @@ static int qmp_combo_probe(struct platform_device *pdev)
        if (ret)
                goto err_node_put;
 
+       ret = qmp_combo_typec_switch_register(qmp);
+       if (ret)
+               goto err_node_put;
+
+       ret = drm_aux_bridge_register(dev);
+       if (ret)
+               goto err_node_put;
+
        pm_runtime_set_active(dev);
        ret = devm_pm_runtime_enable(dev);
        if (ret)
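
The reordering above moves the type-C switch and AUX bridge registration after the point where the child node reference is taken, so every later failure can unwind through the same err_node_put label. A stripped-down sketch of that single-exit pattern (register_something() is a hypothetical stand-in for the registration steps):

	static int sketch_probe(struct platform_device *pdev)
	{
		struct device_node *np;
		int ret;

		np = of_get_child_by_name(pdev->dev.of_node, "usb3-phy");

		ret = register_something(pdev);		/* hypothetical step */
		if (ret)
			goto err_node_put;

		return 0;

	err_node_put:
		of_node_put(np);	/* of_node_put(NULL) is a no-op */
		return ret;
	}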
index 243cc2b9a0fb6d1fadc7384a9e93f453efad6351..5c003988c35d38cead7cc6b3e1e2af04a07bdb28 100644 (file)
@@ -1556,6 +1556,14 @@ static const char * const qmp_phy_vreg_l[] = {
        "vdda-phy", "vdda-pll",
 };
 
+static const struct qmp_usb_offsets qmp_usb_offsets_v3 = {
+       .serdes         = 0,
+       .pcs            = 0x800,
+       .pcs_misc       = 0x600,
+       .tx             = 0x200,
+       .rx             = 0x400,
+};
+
 static const struct qmp_usb_offsets qmp_usb_offsets_ipq9574 = {
        .serdes         = 0,
        .pcs            = 0x800,
@@ -1564,7 +1572,7 @@ static const struct qmp_usb_offsets qmp_usb_offsets_ipq9574 = {
        .rx             = 0x400,
 };
 
-static const struct qmp_usb_offsets qmp_usb_offsets_v3 = {
+static const struct qmp_usb_offsets qmp_usb_offsets_v3_msm8996 = {
        .serdes         = 0,
        .pcs            = 0x600,
        .tx             = 0x200,
@@ -1613,6 +1621,24 @@ static const struct qmp_usb_offsets qmp_usb_offsets_v7 = {
        .rx             = 0x1000,
 };
 
+static const struct qmp_phy_cfg ipq6018_usb3phy_cfg = {
+       .lanes                  = 1,
+
+       .offsets                = &qmp_usb_offsets_v3,
+
+       .serdes_tbl             = ipq9574_usb3_serdes_tbl,
+       .serdes_tbl_num         = ARRAY_SIZE(ipq9574_usb3_serdes_tbl),
+       .tx_tbl                 = msm8996_usb3_tx_tbl,
+       .tx_tbl_num             = ARRAY_SIZE(msm8996_usb3_tx_tbl),
+       .rx_tbl                 = ipq8074_usb3_rx_tbl,
+       .rx_tbl_num             = ARRAY_SIZE(ipq8074_usb3_rx_tbl),
+       .pcs_tbl                = ipq8074_usb3_pcs_tbl,
+       .pcs_tbl_num            = ARRAY_SIZE(ipq8074_usb3_pcs_tbl),
+       .vreg_list              = qmp_phy_vreg_l,
+       .num_vregs              = ARRAY_SIZE(qmp_phy_vreg_l),
+       .regs                   = qmp_v3_usb3phy_regs_layout,
+};
+
 static const struct qmp_phy_cfg ipq8074_usb3phy_cfg = {
        .lanes                  = 1,
 
@@ -1652,7 +1678,7 @@ static const struct qmp_phy_cfg ipq9574_usb3phy_cfg = {
 static const struct qmp_phy_cfg msm8996_usb3phy_cfg = {
        .lanes                  = 1,
 
-       .offsets                = &qmp_usb_offsets_v3,
+       .offsets                = &qmp_usb_offsets_v3_msm8996,
 
        .serdes_tbl             = msm8996_usb3_serdes_tbl,
        .serdes_tbl_num         = ARRAY_SIZE(msm8996_usb3_serdes_tbl),
@@ -2563,7 +2589,7 @@ err_node_put:
 static const struct of_device_id qmp_usb_of_match_table[] = {
        {
                .compatible = "qcom,ipq6018-qmp-usb3-phy",
-               .data = &ipq8074_usb3phy_cfg,
+               .data = &ipq6018_usb3phy_cfg,
        }, {
                .compatible = "qcom,ipq8074-qmp-usb3-phy",
                .data = &ipq8074_usb3phy_cfg,
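
The new ipq6018_usb3phy_cfg deliberately mixes register tables from related SoCs (ipq9574 serdes, msm8996 tx, ipq8074 rx/pcs) rather than reusing the ipq8074 config wholesale, and the match table now binds it to its own compatible string. A generic sketch of how such a table is consumed at probe time (the vendor/model names are made up):

	static const struct of_device_id sketch_of_match[] = {
		{ .compatible = "vendor,model-a", .data = &model_a_cfg },
		{ .compatible = "vendor,model-b", .data = &model_b_cfg },
		{ /* sentinel */ }
	};
	MODULE_DEVICE_TABLE(of, sketch_of_match);

	/* In probe: */
	cfg = of_device_get_match_data(&pdev->dev);
	if (!cfg)
		return -EINVAL;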
index e53eace7c91e372e60d0fcbb6032e2e8fd510595..6387c0d34c551c0e4e28e09af0792cee69eb2952 100644 (file)
@@ -673,8 +673,6 @@ static int rcar_gen3_phy_usb2_probe(struct platform_device *pdev)
        channel->irq = platform_get_irq_optional(pdev, 0);
        channel->dr_mode = rcar_gen3_get_dr_mode(dev->of_node);
        if (channel->dr_mode != USB_DR_MODE_UNKNOWN) {
-               int ret;
-
                channel->is_otg_channel = true;
                channel->uses_otg_pins = !of_property_read_bool(dev->of_node,
                                                        "renesas,no-otg-pins");
@@ -738,8 +736,6 @@ static int rcar_gen3_phy_usb2_probe(struct platform_device *pdev)
                ret = PTR_ERR(provider);
                goto error;
        } else if (channel->is_otg_channel) {
-               int ret;
-
                ret = device_create_file(dev, &dev_attr_role);
                if (ret < 0)
                        goto error;
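
Both hunks above delete an inner `int ret` that shadowed the function-scope variable, so error codes set inside the blocks now reach the function-level checks and the shared error label. The bug class in miniature (cond and do_thing() are placeholders; -Wshadow flags this):

	int sketch(void)
	{
		int ret = 0;

		if (cond) {
			int ret;		/* shadows the outer ret */

			ret = do_thing();	/* outer ret stays 0 */
		}

		return ret;			/* any error from do_thing() is lost */
	}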
index dd2913ac0fa28cea0cabf82c491e2ba49dfcb80e..78e19b128962a9a504986c7d0e8135da50527aa3 100644 (file)
@@ -117,7 +117,7 @@ static int omap_usb_set_vbus(struct usb_otg *otg, bool enabled)
 {
        struct omap_usb *phy = phy_to_omapusb(otg->usb_phy);
 
-       if (!phy->comparator)
+       if (!phy->comparator || !phy->comparator->set_vbus)
                return -ENODEV;
 
        return phy->comparator->set_vbus(phy->comparator, enabled);
@@ -127,7 +127,7 @@ static int omap_usb_start_srp(struct usb_otg *otg)
 {
        struct omap_usb *phy = phy_to_omapusb(otg->usb_phy);
 
-       if (!phy->comparator)
+       if (!phy->comparator || !phy->comparator->start_srp)
                return -ENODEV;
 
        return phy->comparator->start_srp(phy->comparator);
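
Both call sites now validate the individual callback as well as the comparator pointer, since a registered comparator may implement only a subset of the ops. The guarded optional-callback pattern, sketched with hypothetical names:

	/* Sketch: an ops table whose members are individually optional. */
	struct comparator_ops {
		int (*set_vbus)(void *ctx, bool enabled);	/* may be NULL */
		int (*start_srp)(void *ctx);			/* may be NULL */
	};

	static int call_set_vbus(const struct comparator_ops *ops, void *ctx,
				 bool enabled)
	{
		if (!ops || !ops->set_vbus)
			return -ENODEV;

		return ops->set_vbus(ctx, enabled);
	}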
index ee56856cb80c33e4733f2b7f2a43fb681c89fc61..bbcdece83bf422948983a814b7a6221b79cad6a3 100644 (file)
@@ -1644,7 +1644,7 @@ static int pinctrl_pins_show(struct seq_file *s, void *what)
        const struct pinctrl_ops *ops = pctldev->desc->pctlops;
        unsigned int i, pin;
 #ifdef CONFIG_GPIOLIB
-       struct gpio_device *gdev __free(gpio_device_put) = NULL;
+       struct gpio_device *gdev = NULL;
        struct pinctrl_gpio_range *range;
        int gpio_num;
 #endif
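
The hunk drops the __free(gpio_device_put) annotation, presumably because the pointer obtained here is a borrowed reference that must not be put automatically when it goes out of scope. For contrast, a sketch of what the scope-based cleanup from <linux/cleanup.h> does when the reference really is owned:

	#include <linux/cleanup.h>
	#include <linux/slab.h>

	void demo(size_t len)
	{
		/* kfree() runs automatically when buf leaves scope,
		 * on every exit path. */
		char *buf __free(kfree) = kmalloc(len, GFP_KERNEL);

		if (!buf)
			return;
		/* ... use buf; no explicit kfree() needed ... */
	}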
index 03ecb3d1aaf60da974f32bb344203b418969064f..49f89b70dcecb4a4465b62aecded05aa3e0b19f7 100644 (file)
@@ -1159,7 +1159,7 @@ static int amd_gpio_probe(struct platform_device *pdev)
        }
 
        ret = devm_request_irq(&pdev->dev, gpio_dev->irq, amd_gpio_irq_handler,
-                              IRQF_SHARED, KBUILD_MODNAME, gpio_dev);
+                              IRQF_SHARED | IRQF_ONESHOT, KBUILD_MODNAME, gpio_dev);
        if (ret)
                goto out2;
 
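Adding IRQF_ONESHOT keeps the interrupt line masked until handling completes, preventing the same event from re-firing before it has been acknowledged. The flag is mandatory when the hard handler is NULL and only a thread function is supplied; a sketch (sketch_thread_fn and ctx are hypothetical):

	/* The core rejects a NULL hard handler without IRQF_ONESHOT. */
	ret = devm_request_threaded_irq(dev, irq, NULL, sketch_thread_fn,
					IRQF_ONESHOT, "sketch", ctx);
	if (ret)
		return ret;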
index 73f091cd827e69edae7e2c0f4743e54c6db14b40..23aebd4695e99fbebffaf308724484f14aeb7984 100644 (file)
@@ -2562,7 +2562,7 @@ static const struct of_device_id stm32mp257_pctrl_match[] = {
 };
 
 static const struct dev_pm_ops stm32_pinctrl_dev_pm_ops = {
-        SET_LATE_SYSTEM_SLEEP_PM_OPS(NULL, stm32_pinctrl_resume)
+        SET_LATE_SYSTEM_SLEEP_PM_OPS(stm32_pinctrl_suspend, stm32_pinctrl_resume)
 };
 
 static struct platform_driver stm32mp257_pinctrl_driver = {
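
Restoring the suspend hook matters because SET_LATE_SYSTEM_SLEEP_PM_OPS() wires both directions of the transition; with NULL in the first slot the pin state was never saved before sleep. Under CONFIG_PM_SLEEP the macro expands roughly to:

	/* .suspend_late  = stm32_pinctrl_suspend,
	 * .resume_early  = stm32_pinctrl_resume,
	 * plus the matching freeze/thaw/poweroff/restore callbacks.
	 */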
index 1dd84c7a79de97f44b25c48fd241db068492b4f9..b1995ac268d77a9c56c81ce6e65048e2c465c449 100644 (file)
@@ -1170,7 +1170,7 @@ static int mlxbf_pmc_program_crspace_counter(int blk_num, uint32_t cnt_num,
        int ret;
 
        addr = pmc->block[blk_num].mmio_base +
-               (rounddown(cnt_num, 2) * MLXBF_PMC_CRSPACE_PERFSEL_SZ);
+               ((cnt_num / 2) * MLXBF_PMC_CRSPACE_PERFSEL_SZ);
        ret = mlxbf_pmc_readl(addr, &word);
        if (ret)
                return ret;
@@ -1413,7 +1413,7 @@ static int mlxbf_pmc_read_crspace_event(int blk_num, uint32_t cnt_num,
        int ret;
 
        addr = pmc->block[blk_num].mmio_base +
-               (rounddown(cnt_num, 2) * MLXBF_PMC_CRSPACE_PERFSEL_SZ);
+               ((cnt_num / 2) * MLXBF_PMC_CRSPACE_PERFSEL_SZ);
        ret = mlxbf_pmc_readl(addr, &word);
        if (ret)
                return ret;
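
Both counter-programming paths switch from rounddown() to integer division, and the two are not interchangeable here: each CRSPACE perfsel word appears to cover a pair of counters, so the offset needs the pair index, not the even-aligned counter number. Worked out for cnt_num = 5:

	/* rounddown(5, 2) * MLXBF_PMC_CRSPACE_PERFSEL_SZ = 4 * SZ  (wrong word)
	 * (5 / 2)         * MLXBF_PMC_CRSPACE_PERFSEL_SZ = 2 * SZ  (pair {4, 5})
	 */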
index ed16ec422a7b33e9b529bfe1912d0ff687e4db5d..b8d1e32e97ebafaa1d0091d32d110b9854022ff9 100644 (file)
@@ -47,6 +47,9 @@
 /* Message with data needs at least two words (for header & data). */
 #define MLXBF_TMFIFO_DATA_MIN_WORDS            2
 
+/* Tx timeout in milliseconds. */
+#define TMFIFO_TX_TIMEOUT                      2000
+
 /* ACPI UID for BlueField-3. */
 #define TMFIFO_BF3_UID                         1
 
@@ -62,12 +65,14 @@ struct mlxbf_tmfifo;
  * @drop_desc: dummy desc for packet dropping
  * @cur_len: processed length of the current descriptor
  * @rem_len: remaining length of the pending packet
+ * @rem_padding: remaining bytes to send as padding
  * @pkt_len: total length of the pending packet
  * @next_avail: next avail descriptor id
  * @num: vring size (number of descriptors)
  * @align: vring alignment size
  * @index: vring index
  * @vdev_id: vring virtio id (VIRTIO_ID_xxx)
+ * @tx_timeout: expiry time of the last Tx packet
  * @fifo: pointer to the tmfifo structure
  */
 struct mlxbf_tmfifo_vring {
@@ -79,12 +84,14 @@ struct mlxbf_tmfifo_vring {
        struct vring_desc drop_desc;
        int cur_len;
        int rem_len;
+       int rem_padding;
        u32 pkt_len;
        u16 next_avail;
        int num;
        int align;
        int index;
        int vdev_id;
+       unsigned long tx_timeout;
        struct mlxbf_tmfifo *fifo;
 };
 
@@ -819,6 +826,50 @@ mlxbf_tmfifo_desc_done:
        return true;
 }
 
+static void mlxbf_tmfifo_check_tx_timeout(struct mlxbf_tmfifo_vring *vring)
+{
+       unsigned long flags;
+
+       /* Only handle Tx timeout for network vdev. */
+       if (vring->vdev_id != VIRTIO_ID_NET)
+               return;
+
+       /* Initialize the timeout or return if not expired. */
+       if (!vring->tx_timeout) {
+               /* Initialize the timeout. */
+               vring->tx_timeout = jiffies +
+                       msecs_to_jiffies(TMFIFO_TX_TIMEOUT);
+               return;
+       } else if (time_before(jiffies, vring->tx_timeout)) {
+               /* Return if not timed out yet. */
+               return;
+       }
+
+       /*
+        * Drop the packet after the timeout. The outstanding packet is
+        * released and the remaining bytes are sent as 0x00 padding for
+        * recovery. On the peer (host) side, the 0x00 padding bytes are
+        * either dropped directly or appended to an existing outstanding
+        * packet and thus discarded as a corrupted network packet.
+        */
+       vring->rem_padding = round_up(vring->rem_len, sizeof(u64));
+       mlxbf_tmfifo_release_pkt(vring);
+       vring->cur_len = 0;
+       vring->rem_len = 0;
+       vring->fifo->vring[0] = NULL;
+
+       /*
+        * Make sure the loads/stores are ordered before
+        * returning to virtio.
+        */
+       virtio_mb(false);
+
+       /* Notify upper layer. */
+       spin_lock_irqsave(&vring->fifo->spin_lock[0], flags);
+       vring_interrupt(0, vring->vq);
+       spin_unlock_irqrestore(&vring->fifo->spin_lock[0], flags);
+}
+
 /* Rx & Tx processing of a queue. */
 static void mlxbf_tmfifo_rxtx(struct mlxbf_tmfifo_vring *vring, bool is_rx)
 {
@@ -841,6 +892,7 @@ static void mlxbf_tmfifo_rxtx(struct mlxbf_tmfifo_vring *vring, bool is_rx)
                return;
 
        do {
+retry:
                /* Get available FIFO space. */
                if (avail == 0) {
                        if (is_rx)
@@ -851,6 +903,17 @@ static void mlxbf_tmfifo_rxtx(struct mlxbf_tmfifo_vring *vring, bool is_rx)
                                break;
                }
 
+               /* Insert paddings for discarded Tx packet. */
+               if (!is_rx) {
+                       vring->tx_timeout = 0;
+                       while (vring->rem_padding >= sizeof(u64)) {
+                               writeq(0, vring->fifo->tx.data);
+                               vring->rem_padding -= sizeof(u64);
+                               if (--avail == 0)
+                                       goto retry;
+                       }
+               }
+
                /* Console output always comes from the Tx buffer. */
                if (!is_rx && devid == VIRTIO_ID_CONSOLE) {
                        mlxbf_tmfifo_console_tx(fifo, avail);
@@ -860,6 +923,10 @@ static void mlxbf_tmfifo_rxtx(struct mlxbf_tmfifo_vring *vring, bool is_rx)
                /* Handle one descriptor. */
                more = mlxbf_tmfifo_rxtx_one_desc(vring, is_rx, &avail);
        } while (more);
+
+       /* Check Tx timeout. */
+       if (avail <= 0 && !is_rx)
+               mlxbf_tmfifo_check_tx_timeout(vring);
 }
 
 /* Handle Rx or Tx queues. */
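
The timeout logic above arms a deadline in jiffies the first time a Tx stall is seen and drops the packet once the deadline passes. The wrap-safe idiom, distilled into a sketch (0 doubles as "not armed", as in the driver; stall_expired is hypothetical):

	#include <linux/jiffies.h>

	static bool stall_expired(unsigned long *deadline, unsigned int ms)
	{
		if (!*deadline) {
			*deadline = jiffies + msecs_to_jiffies(ms);
			return false;		/* just armed */
		}

		/* time_before() handles jiffies wraparound correctly. */
		return !time_before(jiffies, *deadline);
	}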
index f246252bddd85d97f3232617a53cb437b75a92c0..f4fa8bd8bda832a622078d84e2a7181d5f65cb88 100644 (file)
@@ -10,6 +10,7 @@ config AMD_PMF
        depends on AMD_NB
        select ACPI_PLATFORM_PROFILE
        depends on TEE && AMDTEE
+       depends on AMD_SFH_HID
        help
          This driver provides support for the AMD Platform Management Framework.
          The goal is to enhance end user experience by making AMD PCs smarter,
index feaa09f5b35a125c9c704a432f82c7672b7bc139..4f734e049f4a46b60b139cf38ec7c7a2e193a4f6 100644 (file)
@@ -296,7 +296,8 @@ static int amd_pmf_suspend_handler(struct device *dev)
 {
        struct amd_pmf_dev *pdev = dev_get_drvdata(dev);
 
-       kfree(pdev->buf);
+       if (pdev->smart_pc_enabled)
+               cancel_delayed_work_sync(&pdev->pb_work);
 
        return 0;
 }
@@ -312,6 +313,9 @@ static int amd_pmf_resume_handler(struct device *dev)
                        return ret;
        }
 
+       if (pdev->smart_pc_enabled)
+               schedule_delayed_work(&pdev->pb_work, msecs_to_jiffies(2000));
+
        return 0;
 }
 
@@ -330,9 +334,14 @@ static void amd_pmf_init_features(struct amd_pmf_dev *dev)
                dev_dbg(dev->dev, "SPS enabled and Platform Profiles registered\n");
        }
 
-       if (!amd_pmf_init_smart_pc(dev)) {
+       amd_pmf_init_smart_pc(dev);
+       if (dev->smart_pc_enabled) {
                dev_dbg(dev->dev, "Smart PC Solution Enabled\n");
-       } else if (is_apmf_func_supported(dev, APMF_FUNC_AUTO_MODE)) {
+               /* If Smart PC is enabled, no need to check for other features */
+               return;
+       }
+
+       if (is_apmf_func_supported(dev, APMF_FUNC_AUTO_MODE)) {
                amd_pmf_init_auto_mode(dev);
                dev_dbg(dev->dev, "Auto Mode Init done\n");
        } else if (is_apmf_func_supported(dev, APMF_FUNC_DYN_SLIDER_AC) ||
@@ -351,7 +360,7 @@ static void amd_pmf_deinit_features(struct amd_pmf_dev *dev)
                amd_pmf_deinit_sps(dev);
        }
 
-       if (!dev->smart_pc_enabled) {
+       if (dev->smart_pc_enabled) {
                amd_pmf_deinit_smart_pc(dev);
        } else if (is_apmf_func_supported(dev, APMF_FUNC_AUTO_MODE)) {
                amd_pmf_deinit_auto_mode(dev);
index 16999c5b334fd44537404c56ab325aff00ede667..66cae1cca73cc16b73210e49af1c836c3da4d260 100644 (file)
@@ -441,11 +441,6 @@ struct apmf_dyn_slider_output {
        struct apmf_cnqf_power_set ps[APMF_CNQF_MAX];
 } __packed;
 
-enum smart_pc_status {
-       PMF_SMART_PC_ENABLED,
-       PMF_SMART_PC_DISABLED,
-};
-
 /* Smart PC - TA internals */
 enum system_state {
        SYSTEM_STATE_S0i3,
index a0423942f771e457457cc88cb604913eeb5b2657..a3dec14c30043ecc9c1d109247452d6d19949976 100644 (file)
@@ -10,6 +10,7 @@
  */
 
 #include <acpi/button.h>
+#include <linux/amd-pmf-io.h>
 #include <linux/power_supply.h>
 #include <linux/units.h>
 #include "pmf.h"
@@ -44,6 +45,8 @@ void amd_pmf_dump_ta_inputs(struct amd_pmf_dev *dev, struct ta_pmf_enact_table *
        dev_dbg(dev->dev, "Max C0 Residency: %u\n", in->ev_info.max_c0residency);
        dev_dbg(dev->dev, "GFX Busy: %u\n", in->ev_info.gfx_busy);
        dev_dbg(dev->dev, "LID State: %s\n", in->ev_info.lid_state ? "close" : "open");
+       dev_dbg(dev->dev, "User Presence: %s\n", in->ev_info.user_present ? "Present" : "Away");
+       dev_dbg(dev->dev, "Ambient Light: %d\n", in->ev_info.ambient_light);
        dev_dbg(dev->dev, "==== TA inputs END ====\n");
 }
 #else
@@ -147,6 +150,38 @@ static int amd_pmf_get_slider_info(struct amd_pmf_dev *dev, struct ta_pmf_enact_
        return 0;
 }
 
+static int amd_pmf_get_sensor_info(struct amd_pmf_dev *dev, struct ta_pmf_enact_table *in)
+{
+       struct amd_sfh_info sfh_info;
+       int ret;
+
+       /* Get ALS data */
+       ret = amd_get_sfh_info(&sfh_info, MT_ALS);
+       if (!ret)
+               in->ev_info.ambient_light = sfh_info.ambient_light;
+       else
+               return ret;
+
+       /* get HPD data */
+       ret = amd_get_sfh_info(&sfh_info, MT_HPD);
+       if (ret)
+               return ret;
+
+       switch (sfh_info.user_present) {
+       case SFH_NOT_DETECTED:
+               in->ev_info.user_present = 0xff; /* assume no sensors connected */
+               break;
+       case SFH_USER_PRESENT:
+               in->ev_info.user_present = 1;
+               break;
+       case SFH_USER_AWAY:
+               in->ev_info.user_present = 0;
+               break;
+       }
+
+       return 0;
+}
+
 void amd_pmf_populate_ta_inputs(struct amd_pmf_dev *dev, struct ta_pmf_enact_table *in)
 {
        /* TA side lid open is 1 and close is 0, hence the ! here */
@@ -155,4 +190,5 @@ void amd_pmf_populate_ta_inputs(struct amd_pmf_dev *dev, struct ta_pmf_enact_tab
        amd_pmf_get_smu_info(dev, in);
        amd_pmf_get_battery_info(dev, in);
        amd_pmf_get_slider_info(dev, in);
+       amd_pmf_get_sensor_info(dev, in);
 }
index 502ce93d5cddac57f2f482080ea8a0800ea5123f..dcbe8f85e122947be014778c14478a7374698b49 100644 (file)
@@ -252,15 +252,17 @@ static int amd_pmf_start_policy_engine(struct amd_pmf_dev *dev)
        cookie = readl(dev->policy_buf + POLICY_COOKIE_OFFSET);
        length = readl(dev->policy_buf + POLICY_COOKIE_LEN);
 
-       if (cookie != POLICY_SIGN_COOKIE || !length)
+       if (cookie != POLICY_SIGN_COOKIE || !length) {
+               dev_dbg(dev->dev, "cookie doesn't match\n");
                return -EINVAL;
+       }
 
        /* Update the actual length */
        dev->policy_sz = length + 512;
        res = amd_pmf_invoke_cmd_init(dev);
        if (res == TA_PMF_TYPE_SUCCESS) {
                /* Now it's safe to announce that Smart PC is enabled */
-               dev->smart_pc_enabled = PMF_SMART_PC_ENABLED;
+               dev->smart_pc_enabled = true;
                /*
                 * Start collecting the data from TA FW after a small delay
                 * or else, we might end up getting stale values.
@@ -268,7 +270,7 @@ static int amd_pmf_start_policy_engine(struct amd_pmf_dev *dev)
                schedule_delayed_work(&dev->pb_work, msecs_to_jiffies(pb_actions_ms * 3));
        } else {
                dev_err(dev->dev, "ta invoke cmd init failed err: %x\n", res);
-               dev->smart_pc_enabled = PMF_SMART_PC_DISABLED;
+               dev->smart_pc_enabled = false;
                return res;
        }
 
@@ -298,8 +300,10 @@ static ssize_t amd_pmf_get_pb_data(struct file *filp, const char __user *buf,
        if (!new_policy_buf)
                return -ENOMEM;
 
-       if (copy_from_user(new_policy_buf, buf, length))
+       if (copy_from_user(new_policy_buf, buf, length)) {
+               kfree(new_policy_buf);
                return -EFAULT;
+       }
 
        kfree(dev->policy_buf);
        dev->policy_buf = new_policy_buf;
@@ -334,25 +338,6 @@ static void amd_pmf_remove_pb(struct amd_pmf_dev *dev) {}
 static void amd_pmf_hex_dump_pb(struct amd_pmf_dev *dev) {}
 #endif
 
-static int amd_pmf_get_bios_buffer(struct amd_pmf_dev *dev)
-{
-       dev->policy_buf = kzalloc(dev->policy_sz, GFP_KERNEL);
-       if (!dev->policy_buf)
-               return -ENOMEM;
-
-       dev->policy_base = devm_ioremap(dev->dev, dev->policy_addr, dev->policy_sz);
-       if (!dev->policy_base)
-               return -ENOMEM;
-
-       memcpy(dev->policy_buf, dev->policy_base, dev->policy_sz);
-
-       amd_pmf_hex_dump_pb(dev);
-       if (pb_side_load)
-               amd_pmf_open_pb(dev, dev->dbgfs_dir);
-
-       return amd_pmf_start_policy_engine(dev);
-}
-
 static int amd_pmf_amdtee_ta_match(struct tee_ioctl_version_data *ver, const void *data)
 {
        return ver->impl_id == TEE_IMPL_ID_AMDTEE;
@@ -451,22 +436,59 @@ int amd_pmf_init_smart_pc(struct amd_pmf_dev *dev)
                return ret;
 
        INIT_DELAYED_WORK(&dev->pb_work, amd_pmf_invoke_cmd);
-       amd_pmf_set_dram_addr(dev, true);
-       amd_pmf_get_bios_buffer(dev);
+
+       ret = amd_pmf_set_dram_addr(dev, true);
+       if (ret)
+               goto error;
+
+       dev->policy_base = devm_ioremap(dev->dev, dev->policy_addr, dev->policy_sz);
+       if (!dev->policy_base) {
+               ret = -ENOMEM;
+               goto error;
+       }
+
+       dev->policy_buf = kzalloc(dev->policy_sz, GFP_KERNEL);
+       if (!dev->policy_buf) {
+               ret = -ENOMEM;
+               goto error;
+       }
+
+       memcpy(dev->policy_buf, dev->policy_base, dev->policy_sz);
+
+       amd_pmf_hex_dump_pb(dev);
+
        dev->prev_data = kzalloc(sizeof(*dev->prev_data), GFP_KERNEL);
-       if (!dev->prev_data)
-               return -ENOMEM;
+       if (!dev->prev_data) {
+               ret = -ENOMEM;
+               goto error;
+       }
+
+       ret = amd_pmf_start_policy_engine(dev);
+       if (ret)
+               goto error;
+
+       if (pb_side_load)
+               amd_pmf_open_pb(dev, dev->dbgfs_dir);
+
+       return 0;
 
-       return dev->smart_pc_enabled;
+error:
+       amd_pmf_deinit_smart_pc(dev);
+
+       return ret;
 }
 
 void amd_pmf_deinit_smart_pc(struct amd_pmf_dev *dev)
 {
-       if (pb_side_load)
+       if (pb_side_load && dev->esbin)
                amd_pmf_remove_pb(dev);
 
+       cancel_delayed_work_sync(&dev->pb_work);
        kfree(dev->prev_data);
+       dev->prev_data = NULL;
        kfree(dev->policy_buf);
-       cancel_delayed_work_sync(&dev->pb_work);
+       dev->policy_buf = NULL;
+       kfree(dev->buf);
+       dev->buf = NULL;
        amd_pmf_tee_deinit(dev);
 }
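
The reshaped init path funnels every failure to one label that simply calls the full deinit routine; that only works because deinit now NULLs each pointer after freeing it, making it safe to run on a half-initialized device. The shape of the pattern (sketch_ctx, a, b, and the sizes are placeholders; kfree(NULL) is a no-op):

	static void sketch_deinit(struct sketch_ctx *c)
	{
		kfree(c->a);
		c->a = NULL;
		kfree(c->b);
		c->b = NULL;
	}

	static int sketch_init(struct sketch_ctx *c)
	{
		int ret;

		c->a = kzalloc(A_SIZE, GFP_KERNEL);
		if (!c->a) {
			ret = -ENOMEM;
			goto error;
		}

		c->b = kzalloc(B_SIZE, GFP_KERNEL);
		if (!c->b) {
			ret = -ENOMEM;
			goto error;
		}

		return 0;

	error:
		sketch_deinit(c);	/* safe: frees only what was set */
		return ret;
	}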
index a1ee1a74fc3c4cb7e7bc62cda0297acdbe942d54..2cf3b4a8813f9b30cb5a79aaf2ee6acee2474c68 100644 (file)
@@ -399,7 +399,8 @@ int ifs_load_firmware(struct device *dev)
        if (fw->size != expected_size) {
                dev_err(dev, "File size mismatch (expected %u, actual %zu). Corrupted IFS image.\n",
                        expected_size, fw->size);
-               return -EINVAL;
+               ret = -EINVAL;
+               goto release;
        }
 
        ret = image_sanity_check(dev, (struct microcode_header_intel *)fw->data);
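
Returning directly on the size mismatch leaked the firmware image; routing through the existing release label frees it on every path. The canonical shape, sketched (fw_path and expected_size are placeholders):

	const struct firmware *fw;
	int ret;

	ret = request_firmware(&fw, fw_path, dev);
	if (ret)
		return ret;

	if (fw->size != expected_size) {
		ret = -EINVAL;
		goto release;		/* was: return -EINVAL, leaking fw */
	}

	/* ... further validation and copying ... */

	release:
		release_firmware(fw);
		return ret;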
index b6708bab7c53d5afae8b4cd5d4fb07450a1c92ed..527d8fbc7cc1108da998e86d0d8dd970d9c5b179 100644 (file)
@@ -196,7 +196,7 @@ static int int0002_probe(struct platform_device *pdev)
         * IRQs into gpiolib.
         */
        ret = devm_request_irq(dev, irq, int0002_irq,
-                              IRQF_SHARED, "INT0002", chip);
+                              IRQF_ONESHOT | IRQF_SHARED, "INT0002", chip);
        if (ret) {
                dev_err(dev, "Error requesting IRQ %d: %d\n", irq, ret);
                return ret;
index 33ab207493e3e62946dc5d76c2eda98805725888..33bb58dc3f78c30a304a7a35595666152c34e908 100644 (file)
@@ -23,23 +23,23 @@ static int (*uncore_read)(struct uncore_data *data, unsigned int *min, unsigned
 static int (*uncore_write)(struct uncore_data *data, unsigned int input, unsigned int min_max);
 static int (*uncore_read_freq)(struct uncore_data *data, unsigned int *freq);
 
-static ssize_t show_domain_id(struct device *dev, struct device_attribute *attr, char *buf)
+static ssize_t show_domain_id(struct kobject *kobj, struct kobj_attribute *attr, char *buf)
 {
-       struct uncore_data *data = container_of(attr, struct uncore_data, domain_id_dev_attr);
+       struct uncore_data *data = container_of(attr, struct uncore_data, domain_id_kobj_attr);
 
        return sprintf(buf, "%u\n", data->domain_id);
 }
 
-static ssize_t show_fabric_cluster_id(struct device *dev, struct device_attribute *attr, char *buf)
+static ssize_t show_fabric_cluster_id(struct kobject *kobj, struct kobj_attribute *attr, char *buf)
 {
-       struct uncore_data *data = container_of(attr, struct uncore_data, fabric_cluster_id_dev_attr);
+       struct uncore_data *data = container_of(attr, struct uncore_data, fabric_cluster_id_kobj_attr);
 
        return sprintf(buf, "%u\n", data->cluster_id);
 }
 
-static ssize_t show_package_id(struct device *dev, struct device_attribute *attr, char *buf)
+static ssize_t show_package_id(struct kobject *kobj, struct kobj_attribute *attr, char *buf)
 {
-       struct uncore_data *data = container_of(attr, struct uncore_data, package_id_dev_attr);
+       struct uncore_data *data = container_of(attr, struct uncore_data, package_id_kobj_attr);
 
        return sprintf(buf, "%u\n", data->package_id);
 }
@@ -97,30 +97,30 @@ static ssize_t show_perf_status_freq_khz(struct uncore_data *data, char *buf)
 }
 
 #define store_uncore_min_max(name, min_max)                            \
-       static ssize_t store_##name(struct device *dev,         \
-                                    struct device_attribute *attr,     \
+       static ssize_t store_##name(struct kobject *kobj,               \
+                                    struct kobj_attribute *attr,       \
                                     const char *buf, size_t count)     \
        {                                                               \
-               struct uncore_data *data = container_of(attr, struct uncore_data, name##_dev_attr);\
+               struct uncore_data *data = container_of(attr, struct uncore_data, name##_kobj_attr);\
                                                                        \
                return store_min_max_freq_khz(data, buf, count, \
                                              min_max);         \
        }
 
 #define show_uncore_min_max(name, min_max)                             \
-       static ssize_t show_##name(struct device *dev,          \
-                                   struct device_attribute *attr, char *buf)\
+       static ssize_t show_##name(struct kobject *kobj,                \
+                                   struct kobj_attribute *attr, char *buf)\
        {                                                               \
-               struct uncore_data *data = container_of(attr, struct uncore_data, name##_dev_attr);\
+               struct uncore_data *data = container_of(attr, struct uncore_data, name##_kobj_attr);\
                                                                        \
                return show_min_max_freq_khz(data, buf, min_max);       \
        }
 
 #define show_uncore_perf_status(name)                                  \
-       static ssize_t show_##name(struct device *dev,          \
-                                  struct device_attribute *attr, char *buf)\
+       static ssize_t show_##name(struct kobject *kobj,                \
+                                  struct kobj_attribute *attr, char *buf)\
        {                                                               \
-               struct uncore_data *data = container_of(attr, struct uncore_data, name##_dev_attr);\
+               struct uncore_data *data = container_of(attr, struct uncore_data, name##_kobj_attr);\
                                                                        \
                return show_perf_status_freq_khz(data, buf); \
        }
@@ -134,11 +134,11 @@ show_uncore_min_max(max_freq_khz, 1);
 show_uncore_perf_status(current_freq_khz);
 
 #define show_uncore_data(member_name)                                  \
-       static ssize_t show_##member_name(struct device *dev,   \
-                                          struct device_attribute *attr, char *buf)\
+       static ssize_t show_##member_name(struct kobject *kobj, \
+                                          struct kobj_attribute *attr, char *buf)\
        {                                                               \
                struct uncore_data *data = container_of(attr, struct uncore_data,\
-                                                         member_name##_dev_attr);\
+                                                         member_name##_kobj_attr);\
                                                                        \
                return sysfs_emit(buf, "%u\n",                          \
                                 data->member_name);                    \
@@ -149,29 +149,29 @@ show_uncore_data(initial_max_freq_khz);
 
 #define init_attribute_rw(_name)                                       \
        do {                                                            \
-               sysfs_attr_init(&data->_name##_dev_attr.attr);  \
-               data->_name##_dev_attr.show = show_##_name;             \
-               data->_name##_dev_attr.store = store_##_name;           \
-               data->_name##_dev_attr.attr.name = #_name;              \
-               data->_name##_dev_attr.attr.mode = 0644;                \
+               sysfs_attr_init(&data->_name##_kobj_attr.attr); \
+               data->_name##_kobj_attr.show = show_##_name;            \
+               data->_name##_kobj_attr.store = store_##_name;          \
+               data->_name##_kobj_attr.attr.name = #_name;             \
+               data->_name##_kobj_attr.attr.mode = 0644;               \
        } while (0)
 
 #define init_attribute_ro(_name)                                       \
        do {                                                            \
-               sysfs_attr_init(&data->_name##_dev_attr.attr);  \
-               data->_name##_dev_attr.show = show_##_name;             \
-               data->_name##_dev_attr.store = NULL;                    \
-               data->_name##_dev_attr.attr.name = #_name;              \
-               data->_name##_dev_attr.attr.mode = 0444;                \
+               sysfs_attr_init(&data->_name##_kobj_attr.attr); \
+               data->_name##_kobj_attr.show = show_##_name;            \
+               data->_name##_kobj_attr.store = NULL;                   \
+               data->_name##_kobj_attr.attr.name = #_name;             \
+               data->_name##_kobj_attr.attr.mode = 0444;               \
        } while (0)
 
 #define init_attribute_root_ro(_name)                                  \
        do {                                                            \
-               sysfs_attr_init(&data->_name##_dev_attr.attr);  \
-               data->_name##_dev_attr.show = show_##_name;             \
-               data->_name##_dev_attr.store = NULL;                    \
-               data->_name##_dev_attr.attr.name = #_name;              \
-               data->_name##_dev_attr.attr.mode = 0400;                \
+               sysfs_attr_init(&data->_name##_kobj_attr.attr); \
+               data->_name##_kobj_attr.show = show_##_name;            \
+               data->_name##_kobj_attr.store = NULL;                   \
+               data->_name##_kobj_attr.attr.name = #_name;             \
+               data->_name##_kobj_attr.attr.mode = 0400;               \
        } while (0)
 
 static int create_attr_group(struct uncore_data *data, char *name)
@@ -186,21 +186,21 @@ static int create_attr_group(struct uncore_data *data, char *name)
 
        if (data->domain_id != UNCORE_DOMAIN_ID_INVALID) {
                init_attribute_root_ro(domain_id);
-               data->uncore_attrs[index++] = &data->domain_id_dev_attr.attr;
+               data->uncore_attrs[index++] = &data->domain_id_kobj_attr.attr;
                init_attribute_root_ro(fabric_cluster_id);
-               data->uncore_attrs[index++] = &data->fabric_cluster_id_dev_attr.attr;
+               data->uncore_attrs[index++] = &data->fabric_cluster_id_kobj_attr.attr;
                init_attribute_root_ro(package_id);
-               data->uncore_attrs[index++] = &data->package_id_dev_attr.attr;
+               data->uncore_attrs[index++] = &data->package_id_kobj_attr.attr;
        }
 
-       data->uncore_attrs[index++] = &data->max_freq_khz_dev_attr.attr;
-       data->uncore_attrs[index++] = &data->min_freq_khz_dev_attr.attr;
-       data->uncore_attrs[index++] = &data->initial_min_freq_khz_dev_attr.attr;
-       data->uncore_attrs[index++] = &data->initial_max_freq_khz_dev_attr.attr;
+       data->uncore_attrs[index++] = &data->max_freq_khz_kobj_attr.attr;
+       data->uncore_attrs[index++] = &data->min_freq_khz_kobj_attr.attr;
+       data->uncore_attrs[index++] = &data->initial_min_freq_khz_kobj_attr.attr;
+       data->uncore_attrs[index++] = &data->initial_max_freq_khz_kobj_attr.attr;
 
        ret = uncore_read_freq(data, &freq);
        if (!ret)
-               data->uncore_attrs[index++] = &data->current_freq_khz_dev_attr.attr;
+               data->uncore_attrs[index++] = &data->current_freq_khz_kobj_attr.attr;
 
        data->uncore_attrs[index] = NULL;
 
index 7afb69977c7e8c80b0db3ba819799434e41ff60a..0e5bf507e555209a69ba61e8e8eaaf7392209bfa 100644 (file)
  * @instance_id:       Unique instance id to append to directory name
  * @name:              Sysfs entry name for this instance
  * @uncore_attr_group: Attribute group storage
- * @max_freq_khz_dev_attr: Storage for device attribute max_freq_khz
- * @mix_freq_khz_dev_attr: Storage for device attribute min_freq_khz
- * @initial_max_freq_khz_dev_attr: Storage for device attribute initial_max_freq_khz
- * @initial_min_freq_khz_dev_attr: Storage for device attribute initial_min_freq_khz
- * @current_freq_khz_dev_attr: Storage for device attribute current_freq_khz
- * @domain_id_dev_attr: Storage for device attribute domain_id
- * @fabric_cluster_id_dev_attr: Storage for device attribute fabric_cluster_id
- * @package_id_dev_attr: Storage for device attribute package_id
+ * @max_freq_khz_kobj_attr: Storage for kobject attribute max_freq_khz
+ * @min_freq_khz_kobj_attr: Storage for kobject attribute min_freq_khz
+ * @initial_max_freq_khz_kobj_attr: Storage for kobject attribute initial_max_freq_khz
+ * @initial_min_freq_khz_kobj_attr: Storage for kobject attribute initial_min_freq_khz
+ * @current_freq_khz_kobj_attr: Storage for kobject attribute current_freq_khz
+ * @domain_id_kobj_attr: Storage for kobject attribute domain_id
+ * @fabric_cluster_id_kobj_attr: Storage for kobject attribute fabric_cluster_id
+ * @package_id_kobj_attr: Storage for kobject attribute package_id
  * @uncore_attrs:      Attribute storage for group creation
  *
  * This structure is used to encapsulate all data related to uncore sysfs
@@ -53,14 +53,14 @@ struct uncore_data {
        char name[32];
 
        struct attribute_group uncore_attr_group;
-       struct device_attribute max_freq_khz_dev_attr;
-       struct device_attribute min_freq_khz_dev_attr;
-       struct device_attribute initial_max_freq_khz_dev_attr;
-       struct device_attribute initial_min_freq_khz_dev_attr;
-       struct device_attribute current_freq_khz_dev_attr;
-       struct device_attribute domain_id_dev_attr;
-       struct device_attribute fabric_cluster_id_dev_attr;
-       struct device_attribute package_id_dev_attr;
+       struct kobj_attribute max_freq_khz_kobj_attr;
+       struct kobj_attribute min_freq_khz_kobj_attr;
+       struct kobj_attribute initial_max_freq_khz_kobj_attr;
+       struct kobj_attribute initial_min_freq_khz_kobj_attr;
+       struct kobj_attribute current_freq_khz_kobj_attr;
+       struct kobj_attribute domain_id_kobj_attr;
+       struct kobj_attribute fabric_cluster_id_kobj_attr;
+       struct kobj_attribute package_id_kobj_attr;
        struct attribute *uncore_attrs[9];
 };
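
The wholesale device_attribute to kobj_attribute conversion is needed because these attributes hang off a raw kobject rather than a struct device, so the show/store prototypes must match what sysfs will actually call. The container_of() recovery trick is unchanged; in miniature:

	struct uncore_like {
		unsigned int domain_id;
		struct kobj_attribute domain_id_kobj_attr;
	};

	static ssize_t show_domain_id(struct kobject *kobj,
				      struct kobj_attribute *attr, char *buf)
	{
		/* Map the embedded attribute back to its parent struct. */
		struct uncore_like *d =
			container_of(attr, struct uncore_like, domain_id_kobj_attr);

		return sysfs_emit(buf, "%u\n", d->domain_id);
	}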
 
index 210b0a81b7ecbe3ec28499c3c8dbd52cbbf1c3fb..084c355c86f5fa9050ccb881a7efa6682b538773 100644 (file)
@@ -200,9 +200,6 @@ static void notify_handler(acpi_handle handle, u32 event, void *context)
        autorelease = val && (!ke_rel || ke_rel->type == KE_IGNORE);
 
        sparse_keymap_report_event(input_dev, event, val, autorelease);
-
-       /* Some devices need this to report further events */
-       acpi_evaluate_object(handle, "VBDL", NULL, NULL);
 }
 
 /*
index 9cf5ed0f8dc2848b9f85f59dafd3cd43373bf960..040153ad67c1cb7c36fe85616cb97d10f7f99c46 100644 (file)
@@ -32,7 +32,7 @@ static int get_fwu_request(struct device *dev, u32 *out)
                return -ENODEV;
 
        if (obj->type != ACPI_TYPE_INTEGER) {
-               dev_warn(dev, "wmi_query_block returned invalid value\n");
+               dev_warn(dev, "wmidev_block_query returned invalid value\n");
                kfree(obj);
                return -EINVAL;
        }
@@ -55,7 +55,7 @@ static int set_fwu_request(struct device *dev, u32 in)
 
        status = wmidev_block_set(to_wmi_device(dev), 0, &input);
        if (ACPI_FAILURE(status)) {
-               dev_err(dev, "wmi_set_block failed\n");
+               dev_err(dev, "wmidev_block_set failed\n");
                return -ENODEV;
        }
 
index 1cf2471d54ddef765b017fd079864b3ffb868fdc..3d66e1d4eb1f52dad69c8b2f94d5089ccd0b0c25 100644 (file)
 #define P2SBC_HIDE             BIT(8)
 
 #define P2SB_DEVFN_DEFAULT     PCI_DEVFN(31, 1)
+#define P2SB_DEVFN_GOLDMONT    PCI_DEVFN(13, 0)
+#define SPI_DEVFN_GOLDMONT     PCI_DEVFN(13, 2)
 
 static const struct x86_cpu_id p2sb_cpu_ids[] = {
-       X86_MATCH_INTEL_FAM6_MODEL(ATOM_GOLDMONT,       PCI_DEVFN(13, 0)),
+       X86_MATCH_INTEL_FAM6_MODEL(ATOM_GOLDMONT, P2SB_DEVFN_GOLDMONT),
        {}
 };
 
+/*
+ * Cache BAR0 of P2SB device functions 0 to 7.
+ * TODO: The constant 8 is the number of functions that PCI specification
+ *       defines. Same definitions exist tree-wide. Unify this definition and
+ *       the other definitions then move to include/uapi/linux/pci.h.
+ */
+#define NR_P2SB_RES_CACHE 8
+
+struct p2sb_res_cache {
+       u32 bus_dev_id;
+       struct resource res;
+};
+
+static struct p2sb_res_cache p2sb_resources[NR_P2SB_RES_CACHE];
+
 static int p2sb_get_devfn(unsigned int *devfn)
 {
        unsigned int fn = P2SB_DEVFN_DEFAULT;
@@ -39,10 +56,18 @@ static int p2sb_get_devfn(unsigned int *devfn)
        return 0;
 }
 
+static bool p2sb_valid_resource(struct resource *res)
+{
+       if (res->flags)
+               return true;
+
+       return false;
+}
+
 /* Copy resource from the first BAR of the device in question */
-static int p2sb_read_bar0(struct pci_dev *pdev, struct resource *mem)
+static void p2sb_read_bar0(struct pci_dev *pdev, struct resource *mem)
 {
-       struct resource *bar0 = &pdev->resource[0];
+       struct resource *bar0 = pci_resource_n(pdev, 0);
 
        /* Make sure we have no dangling pointers in the output */
        memset(mem, 0, sizeof(*mem));
@@ -56,49 +81,57 @@ static int p2sb_read_bar0(struct pci_dev *pdev, struct resource *mem)
        mem->end = bar0->end;
        mem->flags = bar0->flags;
        mem->desc = bar0->desc;
-
-       return 0;
 }
 
-static int p2sb_scan_and_read(struct pci_bus *bus, unsigned int devfn, struct resource *mem)
+static void p2sb_scan_and_cache_devfn(struct pci_bus *bus, unsigned int devfn)
 {
+       struct p2sb_res_cache *cache = &p2sb_resources[PCI_FUNC(devfn)];
        struct pci_dev *pdev;
-       int ret;
 
        pdev = pci_scan_single_device(bus, devfn);
        if (!pdev)
-               return -ENODEV;
+               return;
 
-       ret = p2sb_read_bar0(pdev, mem);
+       p2sb_read_bar0(pdev, &cache->res);
+       cache->bus_dev_id = bus->dev.id;
 
        pci_stop_and_remove_bus_device(pdev);
-       return ret;
 }
 
-/**
- * p2sb_bar - Get Primary to Sideband (P2SB) bridge device BAR
- * @bus: PCI bus to communicate with
- * @devfn: PCI slot and function to communicate with
- * @mem: memory resource to be filled in
- *
- * The BIOS prevents the P2SB device from being enumerated by the PCI
- * subsystem, so we need to unhide and hide it back to lookup the BAR.
- *
- * if @bus is NULL, the bus 0 in domain 0 will be used.
- * If @devfn is 0, it will be replaced by devfn of the P2SB device.
- *
- * Caller must provide a valid pointer to @mem.
- *
- * Locking is handled by pci_rescan_remove_lock mutex.
- *
- * Return:
- * 0 on success or appropriate errno value on error.
- */
-int p2sb_bar(struct pci_bus *bus, unsigned int devfn, struct resource *mem)
+static int p2sb_scan_and_cache(struct pci_bus *bus, unsigned int devfn)
+{
+       /* Scan the P2SB device and cache its BAR0 */
+       p2sb_scan_and_cache_devfn(bus, devfn);
+
+       /* On Goldmont p2sb_bar() also gets called for the SPI controller */
+       if (devfn == P2SB_DEVFN_GOLDMONT)
+               p2sb_scan_and_cache_devfn(bus, SPI_DEVFN_GOLDMONT);
+
+       if (!p2sb_valid_resource(&p2sb_resources[PCI_FUNC(devfn)].res))
+               return -ENOENT;
+
+       return 0;
+}
+
+static struct pci_bus *p2sb_get_bus(struct pci_bus *bus)
+{
+       static struct pci_bus *p2sb_bus;
+
+       bus = bus ?: p2sb_bus;
+       if (bus)
+               return bus;
+
+       /* Assume P2SB is on bus 0 in domain 0 */
+       p2sb_bus = pci_find_bus(0, 0);
+       return p2sb_bus;
+}
+
+static int p2sb_cache_resources(void)
 {
-       struct pci_dev *pdev_p2sb;
        unsigned int devfn_p2sb;
        u32 value = P2SBC_HIDE;
+       struct pci_bus *bus;
+       u16 class;
        int ret;
 
        /* Get devfn for P2SB device itself */
@@ -106,8 +139,17 @@ int p2sb_bar(struct pci_bus *bus, unsigned int devfn, struct resource *mem)
        if (ret)
                return ret;
 
-       /* if @bus is NULL, use bus 0 in domain 0 */
-       bus = bus ?: pci_find_bus(0, 0);
+       bus = p2sb_get_bus(NULL);
+       if (!bus)
+               return -ENODEV;
+
+       /*
+        * If a device with the same devfn exists but its device class is not
+        * PCI_CLASS_MEMORY_OTHER (as expected for the P2SB), do not touch it.
+        */
+       pci_bus_read_config_word(bus, devfn_p2sb, PCI_CLASS_DEVICE, &class);
+       if (!PCI_POSSIBLE_ERROR(class) && class != PCI_CLASS_MEMORY_OTHER)
+               return -ENODEV;
 
        /*
         * Prevent concurrent PCI bus scan from seeing the P2SB device and
@@ -115,17 +157,16 @@ int p2sb_bar(struct pci_bus *bus, unsigned int devfn, struct resource *mem)
         */
        pci_lock_rescan_remove();
 
-       /* Unhide the P2SB device, if needed */
+       /*
+        * The BIOS prevents the P2SB device from being enumerated by the PCI
+        * subsystem, so we need to unhide and hide it back to lookup the BAR.
+        * Unhide the P2SB device here, if needed.
+        */
        pci_bus_read_config_dword(bus, devfn_p2sb, P2SBC, &value);
        if (value & P2SBC_HIDE)
                pci_bus_write_config_dword(bus, devfn_p2sb, P2SBC, 0);
 
-       pdev_p2sb = pci_scan_single_device(bus, devfn_p2sb);
-       if (devfn)
-               ret = p2sb_scan_and_read(bus, devfn, mem);
-       else
-               ret = p2sb_read_bar0(pdev_p2sb, mem);
-       pci_stop_and_remove_bus_device(pdev_p2sb);
+       ret = p2sb_scan_and_cache(bus, devfn_p2sb);
 
        /* Hide the P2SB device, if it was hidden */
        if (value & P2SBC_HIDE)
@@ -133,12 +174,62 @@ int p2sb_bar(struct pci_bus *bus, unsigned int devfn, struct resource *mem)
 
        pci_unlock_rescan_remove();
 
-       if (ret)
-               return ret;
+       return ret;
+}
 
-       if (mem->flags == 0)
+/**
+ * p2sb_bar - Get Primary to Sideband (P2SB) bridge device BAR
+ * @bus: PCI bus to communicate with
+ * @devfn: PCI slot and function to communicate with
+ * @mem: memory resource to be filled in
+ *
+ * If @bus is NULL, bus 0 in domain 0 will be used.
+ * If @devfn is 0, it will be replaced by devfn of the P2SB device.
+ *
+ * Caller must provide a valid pointer to @mem.
+ *
+ * Return:
+ * 0 on success or appropriate errno value on error.
+ */
+int p2sb_bar(struct pci_bus *bus, unsigned int devfn, struct resource *mem)
+{
+       struct p2sb_res_cache *cache;
+       int ret;
+
+       bus = p2sb_get_bus(bus);
+       if (!bus)
+               return -ENODEV;
+
+       if (!devfn) {
+               ret = p2sb_get_devfn(&devfn);
+               if (ret)
+                       return ret;
+       }
+
+       cache = &p2sb_resources[PCI_FUNC(devfn)];
+       if (cache->bus_dev_id != bus->dev.id)
                return -ENODEV;
 
+       if (!p2sb_valid_resource(&cache->res))
+               return -ENOENT;
+
+       memcpy(mem, &cache->res, sizeof(*mem));
        return 0;
 }
 EXPORT_SYMBOL_GPL(p2sb_bar);
+
+static int __init p2sb_fs_init(void)
+{
+       p2sb_cache_resources();
+       return 0;
+}
+
+/*
+ * pci_rescan_remove_lock, which guards access to the unhidden P2SB
+ * devices, cannot be taken in the sysfs PCI bus rescan path without
+ * deadlocking. To avoid the deadlock, access the P2SB devices with the
+ * lock held at an early step in kernel initialization and cache the
+ * required resources. This should happen after subsys_initcall, which
+ * initializes the PCI subsystem, and before device_initcall, which
+ * requires the P2SB resources.
+ */
+fs_initcall(p2sb_fs_init);
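
The choice of fs_initcall() is load-bearing here: initcall levels run in a fixed order, and the cache must be filled after the PCI core exists but before any consumer probes.

	/* Relevant initcall ordering (earlier runs first):
	 *   subsys_initcall()  - PCI subsystem initialization
	 *   fs_initcall()      - p2sb_fs_init() caches BAR0 under the rescan lock
	 *   device_initcall()  - drivers that call p2sb_bar()
	 */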
diff --git a/drivers/platform/x86/serdev_helpers.h b/drivers/platform/x86/serdev_helpers.h
new file mode 100644 (file)
index 0000000..bcf3a0c
--- /dev/null
@@ -0,0 +1,80 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * In some cases, UART-attached devices which require an in-kernel driver,
+ * e.g. UART-attached Bluetooth HCIs, are described in the ACPI tables
+ * by an ACPI device with a broken or missing UartSerialBusV2() resource.
+ *
+ * This causes the kernel to create a /dev/ttyS# char-device for the UART
+ * instead of creating an in kernel serdev-controller + serdev-device pair
+ * for the in kernel driver.
+ *
+ * The quirk handling in acpi_quirk_skip_serdev_enumeration() makes the kernel
+ * create a serdev-controller device for these UARTs instead of a /dev/ttyS#.
+ *
+ * Instantiating the actual serdev-device to bind to is up to pdx86 code,
+ * this header provides a helper for getting the serdev-controller device.
+ */
+#include <linux/acpi.h>
+#include <linux/device.h>
+#include <linux/err.h>
+#include <linux/printk.h>
+#include <linux/sprintf.h>
+#include <linux/string.h>
+
+static inline struct device *
+get_serdev_controller(const char *serial_ctrl_hid,
+                     const char *serial_ctrl_uid,
+                     int serial_ctrl_port,
+                     const char *serdev_ctrl_name)
+{
+       struct device *ctrl_dev, *child;
+       struct acpi_device *ctrl_adev;
+       char name[32];
+       int i;
+
+       ctrl_adev = acpi_dev_get_first_match_dev(serial_ctrl_hid, serial_ctrl_uid, -1);
+       if (!ctrl_adev) {
+               pr_err("error could not get %s/%s serial-ctrl adev\n",
+                      serial_ctrl_hid, serial_ctrl_uid);
+               return ERR_PTR(-ENODEV);
+       }
+
+       /* get_first_physical_node() returns a weak ref */
+       ctrl_dev = get_device(acpi_get_first_physical_node(ctrl_adev));
+       if (!ctrl_dev) {
+               pr_err("error could not get %s/%s serial-ctrl physical node\n",
+                      serial_ctrl_hid, serial_ctrl_uid);
+               ctrl_dev = ERR_PTR(-ENODEV);
+               goto put_ctrl_adev;
+       }
+
+       /* Walk host -> uart-ctrl -> port -> serdev-ctrl */
+       for (i = 0; i < 3; i++) {
+               switch (i) {
+               case 0:
+                       snprintf(name, sizeof(name), "%s:0", dev_name(ctrl_dev));
+                       break;
+               case 1:
+                       snprintf(name, sizeof(name), "%s.%d",
+                                dev_name(ctrl_dev), serial_ctrl_port);
+                       break;
+               case 2:
+                       strscpy(name, serdev_ctrl_name, sizeof(name));
+                       break;
+               }
+
+               child = device_find_child_by_name(ctrl_dev, name);
+               put_device(ctrl_dev);
+               if (!child) {
+                       pr_err("error could not find '%s' device\n", name);
+                       ctrl_dev = ERR_PTR(-ENODEV);
+                       goto put_ctrl_adev;
+               }
+
+               ctrl_dev = child;
+       }
+
+put_ctrl_adev:
+       acpi_dev_put(ctrl_adev);
+       return ctrl_dev;
+}
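
A hypothetical caller, to show the contract: the helper returns a referenced struct device (device_find_child_by_name() takes a reference) or an ERR_PTR, so the caller must put_device() when done. The HID/UID below are invented:

	struct device *ctrl;

	ctrl = get_serdev_controller("EXMP0001", "1", 0, "serial0");
	if (IS_ERR(ctrl))
		return PTR_ERR(ctrl);

	/* ... instantiate the serdev device on ctrl ... */

	put_device(ctrl);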
index 3a396b763c4963d1f965e1d635967bd3f3d60f18..ce3e08815a8e647f2bf5578d0383dd4621d8526f 100644 (file)
@@ -1009,7 +1009,16 @@ static ssize_t current_value_store(struct kobject *kobj,
                 * Note - this sets the variable and then the password as separate
                 * WMI calls. Function tlmi_save_bios_settings will error if the
                 * password is incorrect.
+                * Workstations require the opcode to be set before changing the
+                * attribute.
                 */
+               if (tlmi_priv.pwd_admin->valid && tlmi_priv.pwd_admin->password[0]) {
+                       ret = tlmi_opcode_setting("WmiOpcodePasswordAdmin",
+                                                 tlmi_priv.pwd_admin->password);
+                       if (ret)
+                               goto out;
+               }
+
                set_str = kasprintf(GFP_KERNEL, "%s,%s;", setting->display_name,
                                    new_setting);
                if (!set_str) {
@@ -1021,17 +1030,10 @@ static ssize_t current_value_store(struct kobject *kobj,
                if (ret)
                        goto out;
 
-               if (tlmi_priv.save_mode == TLMI_SAVE_BULK) {
+               if (tlmi_priv.save_mode == TLMI_SAVE_BULK)
                        tlmi_priv.save_required = true;
-               } else {
-                       if (tlmi_priv.pwd_admin->valid && tlmi_priv.pwd_admin->password[0]) {
-                               ret = tlmi_opcode_setting("WmiOpcodePasswordAdmin",
-                                                         tlmi_priv.pwd_admin->password);
-                               if (ret)
-                                       goto out;
-                       }
+               else
                        ret = tlmi_save_bios_settings("");
-               }
        } else { /* old non-opcode based authentication method (deprecated) */
                if (tlmi_priv.pwd_admin->valid && tlmi_priv.pwd_admin->password[0]) {
                        auth_str = kasprintf(GFP_KERNEL, "%s,%s,%s;",
index c4895e9bc7148ae991a541508a0672a9ae0345bf..5ecd9d33250d78f3a38c7f99b8b6c5c903cdf25d 100644 (file)
@@ -10308,6 +10308,7 @@ static int convert_dytc_to_profile(int funcmode, int dytcmode,
                return 0;
        default:
                /* Unknown function */
+               pr_debug("unknown function 0x%x\n", funcmode);
                return -EOPNOTSUPP;
        }
        return 0;
@@ -10493,8 +10494,8 @@ static void dytc_profile_refresh(void)
                return;
 
        perfmode = (output >> DYTC_GET_MODE_BIT) & 0xF;
-       convert_dytc_to_profile(funcmode, perfmode, &profile);
-       if (profile != dytc_current_profile) {
+       err = convert_dytc_to_profile(funcmode, perfmode, &profile);
+       if (!err && profile != dytc_current_profile) {
                dytc_current_profile = profile;
                platform_profile_notify();
        }
index 0c6733772698408ef1a23b977d1a0698a19347d5..975cf24ae359a882974f35762894108d4a117fb8 100644 (file)
@@ -81,7 +81,7 @@ static const struct property_entry chuwi_hi8_air_props[] = {
 };
 
 static const struct ts_dmi_data chuwi_hi8_air_data = {
-       .acpi_name      = "MSSL1680:00",
+       .acpi_name      = "MSSL1680",
        .properties     = chuwi_hi8_air_props,
 };
 
@@ -415,18 +415,13 @@ static const struct property_entry gdix1001_upside_down_props[] = {
        { }
 };
 
-static const struct ts_dmi_data gdix1001_00_upside_down_data = {
-       .acpi_name      = "GDIX1001:00",
-       .properties     = gdix1001_upside_down_props,
-};
-
-static const struct ts_dmi_data gdix1001_01_upside_down_data = {
-       .acpi_name      = "GDIX1001:01",
+static const struct ts_dmi_data gdix1001_upside_down_data = {
+       .acpi_name      = "GDIX1001",
        .properties     = gdix1001_upside_down_props,
 };
 
-static const struct ts_dmi_data gdix1002_00_upside_down_data = {
-       .acpi_name      = "GDIX1002:00",
+static const struct ts_dmi_data gdix1002_upside_down_data = {
+       .acpi_name      = "GDIX1002",
        .properties     = gdix1001_upside_down_props,
 };
 
@@ -944,6 +939,32 @@ static const struct ts_dmi_data teclast_tbook11_data = {
        .properties     = teclast_tbook11_props,
 };
 
+static const struct property_entry teclast_x16_plus_props[] = {
+       PROPERTY_ENTRY_U32("touchscreen-min-x", 8),
+       PROPERTY_ENTRY_U32("touchscreen-min-y", 14),
+       PROPERTY_ENTRY_U32("touchscreen-size-x", 1916),
+       PROPERTY_ENTRY_U32("touchscreen-size-y", 1264),
+       PROPERTY_ENTRY_BOOL("touchscreen-inverted-y"),
+       PROPERTY_ENTRY_STRING("firmware-name", "gsl3692-teclast-x16-plus.fw"),
+       PROPERTY_ENTRY_U32("silead,max-fingers", 10),
+       PROPERTY_ENTRY_BOOL("silead,home-button"),
+       { }
+};
+
+static const struct ts_dmi_data teclast_x16_plus_data = {
+       .embedded_fw = {
+               .name   = "silead/gsl3692-teclast-x16-plus.fw",
+               .prefix = { 0xf0, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00 },
+               .length = 43560,
+               .sha256 = { 0x9d, 0xb0, 0x3d, 0xf1, 0x00, 0x3c, 0xb5, 0x25,
+                           0x62, 0x8a, 0xa0, 0x93, 0x4b, 0xe0, 0x4e, 0x75,
+                           0xd1, 0x27, 0xb1, 0x65, 0x3c, 0xba, 0xa5, 0x0f,
+                           0xcd, 0xb4, 0xbe, 0x00, 0xbb, 0xf6, 0x43, 0x29 },
+       },
+       .acpi_name      = "MSSL1680:00",
+       .properties     = teclast_x16_plus_props,
+};
+
 static const struct property_entry teclast_x3_plus_props[] = {
        PROPERTY_ENTRY_U32("touchscreen-size-x", 1980),
        PROPERTY_ENTRY_U32("touchscreen-size-y", 1500),
@@ -1386,7 +1407,7 @@ const struct dmi_system_id touchscreen_dmi_table[] = {
        },
        {
                /* Juno Tablet */
-               .driver_data = (void *)&gdix1002_00_upside_down_data,
+               .driver_data = (void *)&gdix1002_upside_down_data,
                .matches = {
                        DMI_MATCH(DMI_SYS_VENDOR, "Default string"),
                        /* Both product- and board-name being "Default string" is somewhat rare */
@@ -1612,6 +1633,15 @@ const struct dmi_system_id touchscreen_dmi_table[] = {
                        DMI_MATCH(DMI_PRODUCT_SKU, "E5A6_A1"),
                },
        },
+       {
+               /* Teclast X16 Plus */
+               .driver_data = (void *)&teclast_x16_plus_data,
+               .matches = {
+                       DMI_MATCH(DMI_SYS_VENDOR, "TECLAST"),
+                       DMI_MATCH(DMI_PRODUCT_NAME, "Default string"),
+                       DMI_MATCH(DMI_PRODUCT_SKU, "D3A5_A1"),
+               },
+       },
        {
                /* Teclast X3 Plus */
                .driver_data = (void *)&teclast_x3_plus_data,
@@ -1623,7 +1653,7 @@ const struct dmi_system_id touchscreen_dmi_table[] = {
        },
        {
                /* Teclast X89 (Android version / BIOS) */
-               .driver_data = (void *)&gdix1001_00_upside_down_data,
+               .driver_data = (void *)&gdix1001_upside_down_data,
                .matches = {
                        DMI_MATCH(DMI_BOARD_VENDOR, "WISKY"),
                        DMI_MATCH(DMI_BOARD_NAME, "3G062i"),
@@ -1631,7 +1661,7 @@ const struct dmi_system_id touchscreen_dmi_table[] = {
        },
        {
                /* Teclast X89 (Windows version / BIOS) */
-               .driver_data = (void *)&gdix1001_01_upside_down_data,
+               .driver_data = (void *)&gdix1001_upside_down_data,
                .matches = {
                        /* tPAD is too generic, also match on bios date */
                        DMI_MATCH(DMI_BOARD_VENDOR, "TECLAST"),
@@ -1649,7 +1679,7 @@ const struct dmi_system_id touchscreen_dmi_table[] = {
        },
        {
                /* Teclast X98 Pro */
-               .driver_data = (void *)&gdix1001_00_upside_down_data,
+               .driver_data = (void *)&gdix1001_upside_down_data,
                .matches = {
                        /*
                         * Only match BIOS date, because the manufacturers
@@ -1753,7 +1783,7 @@ const struct dmi_system_id touchscreen_dmi_table[] = {
        },
        {
                /* "WinBook TW100" */
-               .driver_data = (void *)&gdix1001_00_upside_down_data,
+               .driver_data = (void *)&gdix1001_upside_down_data,
                .matches = {
                        DMI_MATCH(DMI_SYS_VENDOR, "WinBook"),
                        DMI_MATCH(DMI_PRODUCT_NAME, "TW100")
@@ -1761,7 +1791,7 @@ const struct dmi_system_id touchscreen_dmi_table[] = {
        },
        {
                /* WinBook TW700 */
-               .driver_data = (void *)&gdix1001_00_upside_down_data,
+               .driver_data = (void *)&gdix1001_upside_down_data,
                .matches = {
                        DMI_MATCH(DMI_SYS_VENDOR, "WinBook"),
                        DMI_MATCH(DMI_PRODUCT_NAME, "TW700")
@@ -1786,7 +1816,7 @@ static void ts_dmi_add_props(struct i2c_client *client)
        int error;
 
        if (has_acpi_companion(dev) &&
-           !strncmp(ts_data->acpi_name, client->name, I2C_NAME_SIZE)) {
+           strstarts(client->name, ts_data->acpi_name)) {
                error = device_create_managed_software_node(dev, ts_data->properties, NULL);
                if (error)
                        dev_err(dev, "failed to add properties: %d\n", error);
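
The match above is loosened from a fixed-length strncmp() to a prefix test: strstarts() succeeds whenever the I2C client name merely begins with the stored ACPI name, so a bare HID such as "MSSL1680" (an illustrative assumption) would cover any ":00"/":01" instance. A minimal sketch:

    #include <linux/string.h>

    /* strstarts(str, prefix) == !strncmp(str, prefix, strlen(prefix)) */
    static bool ts_matches(const char *client_name, const char *acpi_name)
    {
            return strstarts(client_name, acpi_name);
    }

    /* ts_matches("MSSL1680:00", "MSSL1680")    -> true  (prefix) */
    /* ts_matches("MSSL1680:00", "MSSL1680:00") -> true  (exact)  */
    /* ts_matches("MSSL1680",    "MSSL1680:00") -> false          */
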
index bd271a5730aa51f1c1e6286e2b481e865799b79a..3c288e8f404beb5d4887235c85654e9ac77cd425 100644
@@ -25,6 +25,7 @@
 #include <linux/list.h>
 #include <linux/module.h>
 #include <linux/platform_device.h>
+#include <linux/rwsem.h>
 #include <linux/slab.h>
 #include <linux/sysfs.h>
 #include <linux/types.h>
@@ -56,7 +57,6 @@ static_assert(__alignof__(struct guid_block) == 1);
 
 enum { /* wmi_block flags */
        WMI_READ_TAKES_NO_ARGS,
-       WMI_PROBED,
 };
 
 struct wmi_block {
@@ -64,8 +64,10 @@ struct wmi_block {
        struct list_head list;
        struct guid_block gblock;
        struct acpi_device *acpi_device;
+       struct rw_semaphore notify_lock;        /* Protects notify callback add/remove */
        wmi_notify_handler handler;
        void *handler_data;
+       bool driver_ready;
        unsigned long flags;
 };
 
@@ -219,6 +221,17 @@ static int wmidev_match_guid(struct device *dev, const void *data)
        return 0;
 }
 
+static int wmidev_match_notify_id(struct device *dev, const void *data)
+{
+       struct wmi_block *wblock = dev_to_wblock(dev);
+       const u32 *notify_id = data;
+
+       if (wblock->gblock.flags & ACPI_WMI_EVENT && wblock->gblock.notify_id == *notify_id)
+               return 1;
+
+       return 0;
+}
+
 static struct bus_type wmi_bus_type;
 
 static struct wmi_device *wmi_find_device_by_guid(const char *guid_string)
@@ -238,6 +251,17 @@ static struct wmi_device *wmi_find_device_by_guid(const char *guid_string)
        return dev_to_wdev(dev);
 }
 
+static struct wmi_device *wmi_find_event_by_notify_id(const u32 notify_id)
+{
+       struct device *dev;
+
+       dev = bus_find_device(&wmi_bus_type, NULL, &notify_id, wmidev_match_notify_id);
+       if (!dev)
+               return ERR_PTR(-ENODEV);
+
+       return to_wmi_device(dev);
+}
+
 static void wmi_device_put(struct wmi_device *wdev)
 {
        put_device(&wdev->dev);
@@ -572,32 +596,31 @@ acpi_status wmi_install_notify_handler(const char *guid,
                                       wmi_notify_handler handler,
                                       void *data)
 {
-       struct wmi_block *block;
-       acpi_status status = AE_NOT_EXIST;
-       guid_t guid_input;
-
-       if (!guid || !handler)
-               return AE_BAD_PARAMETER;
+       struct wmi_block *wblock;
+       struct wmi_device *wdev;
+       acpi_status status;
 
-       if (guid_parse(guid, &guid_input))
-               return AE_BAD_PARAMETER;
+       wdev = wmi_find_device_by_guid(guid);
+       if (IS_ERR(wdev))
+               return AE_ERROR;
 
-       list_for_each_entry(block, &wmi_block_list, list) {
-               acpi_status wmi_status;
+       wblock = container_of(wdev, struct wmi_block, dev);
 
-               if (guid_equal(&block->gblock.guid, &guid_input)) {
-                       if (block->handler)
-                               return AE_ALREADY_ACQUIRED;
+       down_write(&wblock->notify_lock);
+       if (wblock->handler) {
+               status = AE_ALREADY_ACQUIRED;
+       } else {
+               wblock->handler = handler;
+               wblock->handler_data = data;
 
-                       block->handler = handler;
-                       block->handler_data = data;
+               if (ACPI_FAILURE(wmi_method_enable(wblock, true)))
+                       dev_warn(&wblock->dev.dev, "Failed to enable device\n");
 
-                       wmi_status = wmi_method_enable(block, true);
-                       if ((wmi_status != AE_OK) ||
-                           ((wmi_status == AE_OK) && (status == AE_NOT_EXIST)))
-                               status = wmi_status;
-               }
+               status = AE_OK;
        }
+       up_write(&wblock->notify_lock);
+
+       wmi_device_put(wdev);
 
        return status;
 }
@@ -613,30 +636,31 @@ EXPORT_SYMBOL_GPL(wmi_install_notify_handler);
  */
 acpi_status wmi_remove_notify_handler(const char *guid)
 {
-       struct wmi_block *block;
-       acpi_status status = AE_NOT_EXIST;
-       guid_t guid_input;
+       struct wmi_block *wblock;
+       struct wmi_device *wdev;
+       acpi_status status;
 
-       if (!guid)
-               return AE_BAD_PARAMETER;
+       wdev = wmi_find_device_by_guid(guid);
+       if (IS_ERR(wdev))
+               return AE_ERROR;
 
-       if (guid_parse(guid, &guid_input))
-               return AE_BAD_PARAMETER;
+       wblock = container_of(wdev, struct wmi_block, dev);
 
-       list_for_each_entry(block, &wmi_block_list, list) {
-               acpi_status wmi_status;
+       down_write(&wblock->notify_lock);
+       if (!wblock->handler) {
+               status = AE_NULL_ENTRY;
+       } else {
+               if (ACPI_FAILURE(wmi_method_enable(wblock, false)))
+                       dev_warn(&wblock->dev.dev, "Failed to disable device\n");
 
-               if (guid_equal(&block->gblock.guid, &guid_input)) {
-                       if (!block->handler)
-                               return AE_NULL_ENTRY;
+               wblock->handler = NULL;
+               wblock->handler_data = NULL;
 
-                       wmi_status = wmi_method_enable(block, false);
-                       block->handler = NULL;
-                       block->handler_data = NULL;
-                       if (wmi_status != AE_OK || (wmi_status == AE_OK && status == AE_NOT_EXIST))
-                               status = wmi_status;
-               }
+               status = AE_OK;
        }
+       up_write(&wblock->notify_lock);
+
+       wmi_device_put(wdev);
 
        return status;
 }
@@ -655,15 +679,19 @@ EXPORT_SYMBOL_GPL(wmi_remove_notify_handler);
 acpi_status wmi_get_event_data(u32 event, struct acpi_buffer *out)
 {
        struct wmi_block *wblock;
+       struct wmi_device *wdev;
+       acpi_status status;
 
-       list_for_each_entry(wblock, &wmi_block_list, list) {
-               struct guid_block *gblock = &wblock->gblock;
+       wdev = wmi_find_event_by_notify_id(event);
+       if (IS_ERR(wdev))
+               return AE_NOT_FOUND;
 
-               if ((gblock->flags & ACPI_WMI_EVENT) && gblock->notify_id == event)
-                       return get_event_data(wblock, out);
-       }
+       wblock = container_of(wdev, struct wmi_block, dev);
+       status = get_event_data(wblock, out);
 
-       return AE_NOT_FOUND;
+       wmi_device_put(wdev);
+
+       return status;
 }
 EXPORT_SYMBOL_GPL(wmi_get_event_data);
 
@@ -868,7 +896,7 @@ static int wmi_dev_probe(struct device *dev)
        if (wdriver->probe) {
                ret = wdriver->probe(dev_to_wdev(dev),
                                find_guid_context(wblock, wdriver));
-               if (!ret) {
+               if (ret) {
                        if (ACPI_FAILURE(wmi_method_enable(wblock, false)))
                                dev_warn(dev, "Failed to disable device\n");
 
@@ -876,7 +904,9 @@ static int wmi_dev_probe(struct device *dev)
                }
        }
 
-       set_bit(WMI_PROBED, &wblock->flags);
+       down_write(&wblock->notify_lock);
+       wblock->driver_ready = true;
+       up_write(&wblock->notify_lock);
 
        return 0;
 }
@@ -886,7 +916,9 @@ static void wmi_dev_remove(struct device *dev)
        struct wmi_block *wblock = dev_to_wblock(dev);
        struct wmi_driver *wdriver = drv_to_wdrv(dev->driver);
 
-       clear_bit(WMI_PROBED, &wblock->flags);
+       down_write(&wblock->notify_lock);
+       wblock->driver_ready = false;
+       up_write(&wblock->notify_lock);
 
        if (wdriver->remove)
                wdriver->remove(dev_to_wdev(dev));
@@ -999,6 +1031,8 @@ static int wmi_create_device(struct device *wmi_bus_dev,
                wblock->dev.setable = true;
 
  out_init:
+       init_rwsem(&wblock->notify_lock);
+       wblock->driver_ready = false;
        wblock->dev.dev.bus = &wmi_bus_type;
        wblock->dev.dev.parent = wmi_bus_dev;
 
@@ -1171,6 +1205,26 @@ acpi_wmi_ec_space_handler(u32 function, acpi_physical_address address,
        }
 }
 
+static void wmi_notify_driver(struct wmi_block *wblock)
+{
+       struct wmi_driver *driver = drv_to_wdrv(wblock->dev.dev.driver);
+       struct acpi_buffer data = { ACPI_ALLOCATE_BUFFER, NULL };
+       acpi_status status;
+
+       if (!driver->no_notify_data) {
+               status = get_event_data(wblock, &data);
+               if (ACPI_FAILURE(status)) {
+                       dev_warn(&wblock->dev.dev, "Failed to get event data\n");
+                       return;
+               }
+       }
+
+       if (driver->notify)
+               driver->notify(&wblock->dev, data.pointer);
+
+       kfree(data.pointer);
+}
+
 static int wmi_notify_device(struct device *dev, void *data)
 {
        struct wmi_block *wblock = dev_to_wblock(dev);
@@ -1179,28 +1233,17 @@ static int wmi_notify_device(struct device *dev, void *data)
        if (!(wblock->gblock.flags & ACPI_WMI_EVENT && wblock->gblock.notify_id == *event))
                return 0;
 
-       /* If a driver is bound, then notify the driver. */
-       if (test_bit(WMI_PROBED, &wblock->flags) && wblock->dev.dev.driver) {
-               struct wmi_driver *driver = drv_to_wdrv(wblock->dev.dev.driver);
-               struct acpi_buffer evdata = { ACPI_ALLOCATE_BUFFER, NULL };
-               acpi_status status;
-
-               if (!driver->no_notify_data) {
-                       status = get_event_data(wblock, &evdata);
-                       if (ACPI_FAILURE(status)) {
-                               dev_warn(&wblock->dev.dev, "failed to get event data\n");
-                               return -EIO;
-                       }
-               }
-
-               if (driver->notify)
-                       driver->notify(&wblock->dev, evdata.pointer);
-
-               kfree(evdata.pointer);
-       } else if (wblock->handler) {
-               /* Legacy handler */
-               wblock->handler(*event, wblock->handler_data);
+       down_read(&wblock->notify_lock);
+       /*
+        * The WMI driver notify handler conflicts with the legacy WMI handler.
+        * Because of this the WMI driver notify handler takes precedence.
+        */
+       if (wblock->dev.dev.driver && wblock->driver_ready) {
+               wmi_notify_driver(wblock);
+       } else {
+               if (wblock->handler)
+                       wblock->handler(*event, wblock->handler_data);
        }
+       up_read(&wblock->notify_lock);
 
        acpi_bus_generate_netlink_event(wblock->acpi_device->pnp.device_class,
                                        dev_name(&wblock->dev.dev), *event, 0);
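
The new notify_lock is a straightforward reader/writer split: the potentially frequent event path only needs a stable view of handler and driver_ready and takes the semaphore shared, while handler registration and probe/remove take it exclusively. The bare shape of the pattern, as a stand-alone sketch with illustrative names:

    #include <linux/rwsem.h>

    struct demo {
            struct rw_semaphore lock;
            void (*handler)(void *data);
            void *data;
    };

    static void demo_notify(struct demo *d)        /* many concurrent readers */
    {
            down_read(&d->lock);
            if (d->handler)
                    d->handler(d->data);
            up_read(&d->lock);
    }

    static void demo_set_handler(struct demo *d, void (*fn)(void *), void *data)
    {
            down_write(&d->lock);                  /* waits for readers to drain */
            d->handler = fn;
            d->data = data;
            up_write(&d->lock);
    }
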
index f8221a15575b327c78df4edb191fdc28c52fe2c1..a3415f1c0b5f82a3f54d4a6eb6fed54d98d5e366 100644
@@ -21,6 +21,7 @@
 #include <linux/string.h>
 
 #include "x86-android-tablets.h"
+#include "../serdev_helpers.h"
 
 static struct platform_device *x86_android_tablet_device;
 
@@ -113,6 +114,9 @@ int x86_acpi_irq_helper_get(const struct x86_acpi_irq_data *data)
                if (irq_type != IRQ_TYPE_NONE && irq_type != irq_get_trigger_type(irq))
                        irq_set_irq_type(irq, irq_type);
 
+               if (data->free_gpio)
+                       devm_gpiod_put(&x86_android_tablet_device->dev, gpiod);
+
                return irq;
        case X86_ACPI_IRQ_TYPE_PMIC:
                status = acpi_get_handle(NULL, data->chip, &handle);
@@ -229,38 +233,20 @@ static __init int x86_instantiate_spi_dev(const struct x86_dev_info *dev_info, i
 
 static __init int x86_instantiate_serdev(const struct x86_serdev_info *info, int idx)
 {
-       struct acpi_device *ctrl_adev, *serdev_adev;
+       struct acpi_device *serdev_adev;
        struct serdev_device *serdev;
        struct device *ctrl_dev;
        int ret = -ENODEV;
 
-       ctrl_adev = acpi_dev_get_first_match_dev(info->ctrl_hid, info->ctrl_uid, -1);
-       if (!ctrl_adev) {
-               pr_err("error could not get %s/%s ctrl adev\n",
-                      info->ctrl_hid, info->ctrl_uid);
-               return -ENODEV;
-       }
+       ctrl_dev = get_serdev_controller(info->ctrl_hid, info->ctrl_uid, 0,
+                                        info->ctrl_devname);
+       if (IS_ERR(ctrl_dev))
+               return PTR_ERR(ctrl_dev);
 
        serdev_adev = acpi_dev_get_first_match_dev(info->serdev_hid, NULL, -1);
        if (!serdev_adev) {
                pr_err("error could not get %s serdev adev\n", info->serdev_hid);
-               goto put_ctrl_adev;
-       }
-
-       /* get_first_physical_node() returns a weak ref, no need to put() it */
-       ctrl_dev = acpi_get_first_physical_node(ctrl_adev);
-       if (!ctrl_dev)  {
-               pr_err("error could not get %s/%s ctrl physical dev\n",
-                      info->ctrl_hid, info->ctrl_uid);
-               goto put_serdev_adev;
-       }
-
-       /* ctrl_dev now points to the controller's parent, get the controller */
-       ctrl_dev = device_find_child_by_name(ctrl_dev, info->ctrl_devname);
-       if (!ctrl_dev) {
-               pr_err("error could not get %s/%s %s ctrl dev\n",
-                      info->ctrl_hid, info->ctrl_uid, info->ctrl_devname);
-               goto put_serdev_adev;
+               goto put_ctrl_dev;
        }
 
        serdev = serdev_device_alloc(to_serdev_controller(ctrl_dev));
@@ -283,8 +269,8 @@ static __init int x86_instantiate_serdev(const struct x86_serdev_info *info, int
 
 put_serdev_adev:
        acpi_dev_put(serdev_adev);
-put_ctrl_adev:
-       acpi_dev_put(ctrl_adev);
+put_ctrl_dev:
+       put_device(ctrl_dev);
        return ret;
 }
 
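The open-coded ACPI device to physical node to named child walk is now hidden behind get_serdev_controller() from ../serdev_helpers.h, which returns the controller's struct device with a reference held; hence the put_device() in the reworked error path. Sketch of a caller, with a made-up HID and controller name:

    struct device *ctrl_dev;

    ctrl_dev = get_serdev_controller("HYPO0001", "1", 0, "serial0-0");
    if (IS_ERR(ctrl_dev))
            return PTR_ERR(ctrl_dev);

    /* ... serdev_device_alloc(to_serdev_controller(ctrl_dev)) ... */

    put_device(ctrl_dev);   /* drop the reference the helper took */
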
index f1c66a61bfc52786f1a6cd49da8ee88c423adbea..c297391955adbcb9a6b076dfb8f009ae4bce2bcb 100644
@@ -116,6 +116,7 @@ static const struct x86_i2c_client_info lenovo_yb1_x90_i2c_clients[] __initconst
                        .trigger = ACPI_EDGE_SENSITIVE,
                        .polarity = ACPI_ACTIVE_LOW,
                        .con_id = "goodix_ts_irq",
+                       .free_gpio = true,
                },
        }, {
                /* Wacom Digitizer in keyboard half */
index bc6bbf7ec6ea137101394b59d38fb7471675b00c..278402dcb808c5f2b7e25a894c117177867250d0 100644
@@ -68,7 +68,7 @@ static const struct x86_i2c_client_info acer_b1_750_i2c_clients[] __initconst =
        },
 };
 
-static struct gpiod_lookup_table acer_b1_750_goodix_gpios = {
+static struct gpiod_lookup_table acer_b1_750_nvt_ts_gpios = {
        .dev_id = "i2c-NVT-ts",
        .table = {
                GPIO_LOOKUP("INT33FC:01", 26, "reset", GPIO_ACTIVE_LOW),
@@ -77,7 +77,7 @@ static struct gpiod_lookup_table acer_b1_750_goodix_gpios = {
 };
 
 static struct gpiod_lookup_table * const acer_b1_750_gpios[] = {
-       &acer_b1_750_goodix_gpios,
+       &acer_b1_750_nvt_ts_gpios,
        &int3496_reference_gpios,
        NULL
 };
index 49fed9410adbadad39d397a7b541f52b13c03564..468993edfeee25bcb541daedbe6006ccc7fc44bb 100644
@@ -39,6 +39,7 @@ struct x86_acpi_irq_data {
        int index;
        int trigger;  /* ACPI_EDGE_SENSITIVE / ACPI_LEVEL_SENSITIVE */
        int polarity; /* ACPI_ACTIVE_HIGH / ACPI_ACTIVE_LOW / ACPI_ACTIVE_BOTH */
+       bool free_gpio; /* Release GPIO after getting IRQ (for TYPE_GPIOINT) */
        const char *con_id;
 };
 
index 709bbc448fad431d894479146982664002578584..d7ef46ccd9b8a414f8066f7fe2718f9867c89003 100644
@@ -159,6 +159,9 @@ static void scmi_perf_domain_remove(struct scmi_device *sdev)
        struct genpd_onecell_data *scmi_pd_data = dev_get_drvdata(dev);
        int i;
 
+       if (!scmi_pd_data)
+               return;
+
        of_genpd_del_provider(dev->of_node);
 
        for (i = 0; i < scmi_pd_data->num_domains; i++)
index a1f6cba3ae6c86a386ab68bf46866ee95663eb51..18e232b5ed53d73ab24bd4fe3dab94c69235436d 100644
@@ -1109,7 +1109,7 @@ static int __init genpd_power_off_unused(void)
 
        return 0;
 }
-late_initcall(genpd_power_off_unused);
+late_initcall_sync(genpd_power_off_unused);
 
 #ifdef CONFIG_PM_SLEEP
 
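late_initcall_sync() registers at level 7s, which runs only after every plain late_initcall() (level 7) has completed, so consumers that claim their power domain from a regular late initcall are, presumably, still accounted for before unused domains get switched off. The ordering, with a hypothetical second initcall:

    late_initcall(some_driver_late_init);        /* level 7: runs first  */
    late_initcall_sync(genpd_power_off_unused);  /* level 7s: runs after */
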
index e26dc17d07ad71d8398044670227c93a6bdd4427..e274e3315fe7a60887bec6a1fa85db69156e7fd6 100644
@@ -561,6 +561,11 @@ static int scpsys_add_subdomain(struct scpsys *scpsys, struct device_node *paren
                        goto err_put_node;
                }
 
+               /* recursive call to add all subdomains */
+               ret = scpsys_add_subdomain(scpsys, child);
+               if (ret)
+                       goto err_put_node;
+
                ret = pm_genpd_add_subdomain(parent_pd, child_pd);
                if (ret) {
                        dev_err(scpsys->dev, "failed to add %s subdomain to parent %s\n",
@@ -570,11 +575,6 @@ static int scpsys_add_subdomain(struct scpsys *scpsys, struct device_node *paren
                        dev_dbg(scpsys->dev, "%s add subdomain: %s\n", parent_pd->name,
                                child_pd->name);
                }
-
-               /* recursive call to add all subdomains */
-               ret = scpsys_add_subdomain(scpsys, child);
-               if (ret)
-                       goto err_put_node;
        }
 
        return 0;
@@ -588,9 +588,6 @@ static void scpsys_remove_one_domain(struct scpsys_domain *pd)
 {
        int ret;
 
-       if (scpsys_domain_is_on(pd))
-               scpsys_power_off(&pd->genpd);
-
        /*
         * We're in the error cleanup already, so we only complain,
         * but won't emit another error on top of the original one.
@@ -600,6 +597,8 @@ static void scpsys_remove_one_domain(struct scpsys_domain *pd)
                dev_err(pd->scpsys->dev,
                        "failed to remove domain '%s' : %d - state may be inconsistent\n",
                        pd->genpd.name, ret);
+       if (scpsys_domain_is_on(pd))
+               scpsys_power_off(&pd->genpd);
 
        clk_bulk_put(pd->num_clks, pd->clks);
        clk_bulk_put(pd->num_subsys_clks, pd->subsys_clks);
index 3078896b13008865816edc575fe0d769b44c9453..47df910645f6680ab4a17948700f426904007b86 100644
@@ -692,6 +692,7 @@ static int rpmhpd_aggregate_corner(struct rpmhpd *pd, unsigned int corner)
        unsigned int active_corner, sleep_corner;
        unsigned int this_active_corner = 0, this_sleep_corner = 0;
        unsigned int peer_active_corner = 0, peer_sleep_corner = 0;
+       unsigned int peer_enabled_corner;
 
        if (pd->state_synced) {
                to_active_sleep(pd, corner, &this_active_corner, &this_sleep_corner);
@@ -701,9 +702,11 @@ static int rpmhpd_aggregate_corner(struct rpmhpd *pd, unsigned int corner)
                this_sleep_corner = pd->level_count - 1;
        }
 
-       if (peer && peer->enabled)
-               to_active_sleep(peer, peer->corner, &peer_active_corner,
+       if (peer && peer->enabled) {
+               peer_enabled_corner = max(peer->corner, peer->enable_corner);
+               to_active_sleep(peer, peer_enabled_corner, &peer_active_corner,
                                &peer_sleep_corner);
+       }
 
        active_corner = max(this_active_corner, peer_active_corner);
 
index 39ca84a67daadd21202e1ba80f13ec6cbc671a7a..621e411fc9991a4050cd6da699695912f18a46b0 100644
@@ -25,7 +25,8 @@ static const struct rcar_sysc_area r8a77980_areas[] __initconst = {
          PD_CPU_NOCR },
        { "ca53-cpu3",  0x200, 3, R8A77980_PD_CA53_CPU3, R8A77980_PD_CA53_SCU,
          PD_CPU_NOCR },
-       { "cr7",        0x240, 0, R8A77980_PD_CR7,      R8A77980_PD_ALWAYS_ON },
+       { "cr7",        0x240, 0, R8A77980_PD_CR7,      R8A77980_PD_ALWAYS_ON,
+         PD_CPU_NOCR },
        { "a3ir",       0x180, 0, R8A77980_PD_A3IR,     R8A77980_PD_ALWAYS_ON },
        { "a2ir0",      0x400, 0, R8A77980_PD_A2IR0,    R8A77980_PD_A3IR },
        { "a2ir1",      0x400, 1, R8A77980_PD_A2IR1,    R8A77980_PD_A3IR },
index f21cb05815ec6391cc5e11c7edc5190b7163aa94..3e31375491d58055b19f1b61b57dcac3d849b363 100644
@@ -978,6 +978,7 @@ config CHARGER_QCOM_SMB2
 config FUEL_GAUGE_MM8013
        tristate "Mitsumi MM8013 fuel gauge driver"
        depends on I2C
+       select REGMAP_I2C
        help
          Say Y here to enable the Mitsumi MM8013 fuel gauge driver.
          It enables the monitoring of many battery parameters, including
index 3a1798b0c1a79f3ed3a3fd0be4d84f6df390b3b4..9910c600743ebd9b9e01a1cb393c0378ae837807 100644
@@ -209,7 +209,9 @@ static void bq27xxx_battery_i2c_remove(struct i2c_client *client)
 {
        struct bq27xxx_device_info *di = i2c_get_clientdata(client);
 
-       free_irq(client->irq, di);
+       if (client->irq)
+               free_irq(client->irq, di);
+
        bq27xxx_battery_teardown(di);
 
        mutex_lock(&battery_mutex);
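
free_irq() must only undo a matching request_irq(): when client->irq is 0 the probe path never requests an interrupt, and freeing IRQ 0 unconditionally triggers a WARN. The symmetric shape, as a stand-alone sketch with illustrative names:

    #include <linux/i2c.h>
    #include <linux/interrupt.h>

    static irqreturn_t demo_irq(int irq, void *data);

    static int demo_setup(struct i2c_client *client, void *priv)
    {
            if (!client->irq)
                    return 0;       /* poll-only device, nothing to request */
            return request_irq(client->irq, demo_irq, 0, "demo", priv);
    }

    static void demo_teardown(struct i2c_client *client, void *priv)
    {
            if (client->irq)        /* mirror the request side exactly */
                    free_irq(client->irq, priv);
    }
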
index a12e2a66d516f9de6e4b7ccc3f8048861322624a..ec163d1bcd189192abcecbcb4e29e0e4251b2e38 100644
@@ -282,7 +282,6 @@ struct qcom_battmgr_wireless {
 
 struct qcom_battmgr {
        struct device *dev;
-       struct auxiliary_device *adev;
        struct pmic_glink_client *client;
 
        enum qcom_battmgr_variant variant;
@@ -1294,69 +1293,11 @@ static void qcom_battmgr_enable_worker(struct work_struct *work)
                dev_err(battmgr->dev, "failed to request power notifications\n");
 }
 
-static char *qcom_battmgr_battery[] = { "battery" };
-
-static void qcom_battmgr_register_psy(struct qcom_battmgr *battmgr)
-{
-       struct power_supply_config psy_cfg_supply = {};
-       struct auxiliary_device *adev = battmgr->adev;
-       struct power_supply_config psy_cfg = {};
-       struct device *dev = &adev->dev;
-
-       psy_cfg.drv_data = battmgr;
-       psy_cfg.of_node = adev->dev.of_node;
-
-       psy_cfg_supply.drv_data = battmgr;
-       psy_cfg_supply.of_node = adev->dev.of_node;
-       psy_cfg_supply.supplied_to = qcom_battmgr_battery;
-       psy_cfg_supply.num_supplicants = 1;
-
-       if (battmgr->variant == QCOM_BATTMGR_SC8280XP) {
-               battmgr->bat_psy = devm_power_supply_register(dev, &sc8280xp_bat_psy_desc, &psy_cfg);
-               if (IS_ERR(battmgr->bat_psy))
-                       dev_err(dev, "failed to register battery power supply (%ld)\n",
-                               PTR_ERR(battmgr->bat_psy));
-
-               battmgr->ac_psy = devm_power_supply_register(dev, &sc8280xp_ac_psy_desc, &psy_cfg_supply);
-               if (IS_ERR(battmgr->ac_psy))
-                       dev_err(dev, "failed to register AC power supply (%ld)\n",
-                               PTR_ERR(battmgr->ac_psy));
-
-               battmgr->usb_psy = devm_power_supply_register(dev, &sc8280xp_usb_psy_desc, &psy_cfg_supply);
-               if (IS_ERR(battmgr->usb_psy))
-                       dev_err(dev, "failed to register USB power supply (%ld)\n",
-                               PTR_ERR(battmgr->usb_psy));
-
-               battmgr->wls_psy = devm_power_supply_register(dev, &sc8280xp_wls_psy_desc, &psy_cfg_supply);
-               if (IS_ERR(battmgr->wls_psy))
-                       dev_err(dev, "failed to register wireless charing power supply (%ld)\n",
-                               PTR_ERR(battmgr->wls_psy));
-       } else {
-               battmgr->bat_psy = devm_power_supply_register(dev, &sm8350_bat_psy_desc, &psy_cfg);
-               if (IS_ERR(battmgr->bat_psy))
-                       dev_err(dev, "failed to register battery power supply (%ld)\n",
-                               PTR_ERR(battmgr->bat_psy));
-
-               battmgr->usb_psy = devm_power_supply_register(dev, &sm8350_usb_psy_desc, &psy_cfg_supply);
-               if (IS_ERR(battmgr->usb_psy))
-                       dev_err(dev, "failed to register USB power supply (%ld)\n",
-                               PTR_ERR(battmgr->usb_psy));
-
-               battmgr->wls_psy = devm_power_supply_register(dev, &sm8350_wls_psy_desc, &psy_cfg_supply);
-               if (IS_ERR(battmgr->wls_psy))
-                       dev_err(dev, "failed to register wireless charing power supply (%ld)\n",
-                               PTR_ERR(battmgr->wls_psy));
-       }
-}
-
 static void qcom_battmgr_pdr_notify(void *priv, int state)
 {
        struct qcom_battmgr *battmgr = priv;
 
        if (state == SERVREG_SERVICE_STATE_UP) {
-               if (!battmgr->bat_psy)
-                       qcom_battmgr_register_psy(battmgr);
-
                battmgr->service_up = true;
                schedule_work(&battmgr->enable_work);
        } else {
@@ -1371,9 +1312,13 @@ static const struct of_device_id qcom_battmgr_of_variants[] = {
        {}
 };
 
+static char *qcom_battmgr_battery[] = { "battery" };
+
 static int qcom_battmgr_probe(struct auxiliary_device *adev,
                              const struct auxiliary_device_id *id)
 {
+       struct power_supply_config psy_cfg_supply = {};
+       struct power_supply_config psy_cfg = {};
        const struct of_device_id *match;
        struct qcom_battmgr *battmgr;
        struct device *dev = &adev->dev;
@@ -1383,7 +1328,14 @@ static int qcom_battmgr_probe(struct auxiliary_device *adev,
                return -ENOMEM;
 
        battmgr->dev = dev;
-       battmgr->adev = adev;
+
+       psy_cfg.drv_data = battmgr;
+       psy_cfg.of_node = adev->dev.of_node;
+
+       psy_cfg_supply.drv_data = battmgr;
+       psy_cfg_supply.of_node = adev->dev.of_node;
+       psy_cfg_supply.supplied_to = qcom_battmgr_battery;
+       psy_cfg_supply.num_supplicants = 1;
 
        INIT_WORK(&battmgr->enable_work, qcom_battmgr_enable_worker);
        mutex_init(&battmgr->lock);
@@ -1395,6 +1347,43 @@ static int qcom_battmgr_probe(struct auxiliary_device *adev,
        else
                battmgr->variant = QCOM_BATTMGR_SM8350;
 
+       if (battmgr->variant == QCOM_BATTMGR_SC8280XP) {
+               battmgr->bat_psy = devm_power_supply_register(dev, &sc8280xp_bat_psy_desc, &psy_cfg);
+               if (IS_ERR(battmgr->bat_psy))
+                       return dev_err_probe(dev, PTR_ERR(battmgr->bat_psy),
+                                            "failed to register battery power supply\n");
+
+               battmgr->ac_psy = devm_power_supply_register(dev, &sc8280xp_ac_psy_desc, &psy_cfg_supply);
+               if (IS_ERR(battmgr->ac_psy))
+                       return dev_err_probe(dev, PTR_ERR(battmgr->ac_psy),
+                                            "failed to register AC power supply\n");
+
+               battmgr->usb_psy = devm_power_supply_register(dev, &sc8280xp_usb_psy_desc, &psy_cfg_supply);
+               if (IS_ERR(battmgr->usb_psy))
+                       return dev_err_probe(dev, PTR_ERR(battmgr->usb_psy),
+                                            "failed to register USB power supply\n");
+
+               battmgr->wls_psy = devm_power_supply_register(dev, &sc8280xp_wls_psy_desc, &psy_cfg_supply);
+               if (IS_ERR(battmgr->wls_psy))
+                       return dev_err_probe(dev, PTR_ERR(battmgr->wls_psy),
+                                            "failed to register wireless charing power supply\n");
+       } else {
+               battmgr->bat_psy = devm_power_supply_register(dev, &sm8350_bat_psy_desc, &psy_cfg);
+               if (IS_ERR(battmgr->bat_psy))
+                       return dev_err_probe(dev, PTR_ERR(battmgr->bat_psy),
+                                            "failed to register battery power supply\n");
+
+               battmgr->usb_psy = devm_power_supply_register(dev, &sm8350_usb_psy_desc, &psy_cfg_supply);
+               if (IS_ERR(battmgr->usb_psy))
+                       return dev_err_probe(dev, PTR_ERR(battmgr->usb_psy),
+                                            "failed to register USB power supply\n");
+
+               battmgr->wls_psy = devm_power_supply_register(dev, &sm8350_wls_psy_desc, &psy_cfg_supply);
+               if (IS_ERR(battmgr->wls_psy))
+                       return dev_err_probe(dev, PTR_ERR(battmgr->wls_psy),
+                                            "failed to register wireless charing power supply\n");
+       }
+
        battmgr->client = devm_pmic_glink_register_client(dev,
                                                          PMIC_GLINK_OWNER_BATTMGR,
                                                          qcom_battmgr_callback,
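
Folding the registration into probe() also enables dev_err_probe(), which logs and returns the error in one statement and records -EPROBE_DEFER quietly instead of flooding the log. The idiom, with desc and cfg standing in for the descriptors above:

    psy = devm_power_supply_register(dev, &desc, &cfg);
    if (IS_ERR(psy))
            return dev_err_probe(dev, PTR_ERR(psy),
                                 "failed to register power supply\n");
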
index bc88a40a88d4cac0bbe61efc4725703b1136654e..8bbcd983a74aa8d8e5db6ae9e9cb7480a9220575 100644
@@ -29,8 +29,8 @@ struct max5970_regulator {
 };
 
 enum max597x_regulator_id {
-       MAX597X_SW0,
-       MAX597X_SW1,
+       MAX597X_sw0,
+       MAX597X_sw1,
 };
 
 static int max5970_read_adc(struct regmap *regmap, int reg, long *val)
@@ -378,8 +378,8 @@ static int max597x_dt_parse(struct device_node *np,
 }
 
 static const struct regulator_desc regulators[] = {
-       MAX597X_SWITCH(SW0, MAX5970_REG_CHXEN, 0, "vss1"),
-       MAX597X_SWITCH(SW1, MAX5970_REG_CHXEN, 1, "vss2"),
+       MAX597X_SWITCH(sw0, MAX5970_REG_CHXEN, 0, "vss1"),
+       MAX597X_SWITCH(sw1, MAX5970_REG_CHXEN, 1, "vss2"),
 };
 
 static int max597x_regmap_read_clear(struct regmap *map, unsigned int reg,
@@ -392,7 +392,7 @@ static int max597x_regmap_read_clear(struct regmap *map, unsigned int reg,
                return ret;
 
        if (*val)
-               return regmap_write(map, reg, *val);
+               return regmap_write(map, reg, 0);
 
        return 0;
 }
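
The lower-casing matters because the descriptors are produced by stringizing the enumerator through MAX597X_SWITCH(), and .of_match is compared case-sensitively against the devicetree subnode name. A reduced sketch, assuming bindings with lower-case nodes:

    #include <linux/of.h>
    #include <linux/regulator/driver.h>

    /* Devicetree:
     *   regulators {
     *           sw0 { regulator-name = "vss1"; };
     *   };
     */
    #define DEMO_SWITCH(_ID) \
            { .name = #_ID, .of_match = of_match_ptr(#_ID) }

    static const struct regulator_desc demo_regs[] = {
            DEMO_SWITCH(sw0),   /* .of_match = "sw0": binds              */
            /* DEMO_SWITCH(SW0) would yield "SW0" and never match "sw0" */
    };
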
index 698c420e0869bd464f1368305b16d26a18360fa4..60cfcd741c2af31ce7e351cdf8cfb35f996264cf 100644
@@ -157,7 +157,17 @@ static int pwm_regulator_get_voltage(struct regulator_dev *rdev)
 
        pwm_get_state(drvdata->pwm, &pstate);
 
+       if (!pstate.enabled) {
+               if (pstate.polarity == PWM_POLARITY_INVERSED)
+                       pstate.duty_cycle = pstate.period;
+               else
+                       pstate.duty_cycle = 0;
+       }
+
        voltage = pwm_get_relative_duty_cycle(&pstate, duty_unit);
+       if (voltage < min(max_uV_duty, min_uV_duty) ||
+           voltage > max(max_uV_duty, min_uV_duty))
+               return -ENOTRECOVERABLE;
 
        /*
         * The dutycycle for min_uV might be greater than the one for max_uV.
@@ -313,6 +323,32 @@ static int pwm_regulator_init_continuous(struct platform_device *pdev,
        return 0;
 }
 
+static int pwm_regulator_init_boot_on(struct platform_device *pdev,
+                                     struct pwm_regulator_data *drvdata,
+                                     const struct regulator_init_data *init_data)
+{
+       struct pwm_state pstate;
+
+       if (!init_data->constraints.boot_on || drvdata->enb_gpio)
+               return 0;
+
+       pwm_get_state(drvdata->pwm, &pstate);
+       if (pstate.enabled)
+               return 0;
+
+       /*
+        * Update the duty cycle so the output does not change
+        * when the regulator core enables the regulator (and
+        * thus the PWM channel).
+        */
+       if (pstate.polarity == PWM_POLARITY_INVERSED)
+               pstate.duty_cycle = pstate.period;
+       else
+               pstate.duty_cycle = 0;
+
+       return pwm_apply_might_sleep(drvdata->pwm, &pstate);
+}
+
 static int pwm_regulator_probe(struct platform_device *pdev)
 {
        const struct regulator_init_data *init_data;
@@ -372,6 +408,13 @@ static int pwm_regulator_probe(struct platform_device *pdev)
        if (ret)
                return ret;
 
+       ret = pwm_regulator_init_boot_on(pdev, drvdata, init_data);
+       if (ret) {
+               dev_err(&pdev->dev, "Failed to apply boot_on settings: %d\n",
+                       ret);
+               return ret;
+       }
+
        regulator = devm_regulator_register(&pdev->dev,
                                            &drvdata->desc, &config);
        if (IS_ERR(regulator)) {
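
Both new hunks lean on the same observation: a disabled PWM pin is assumed to rest at the low level, and what "low" means in relative-duty terms depends on the configured polarity. Normalizing the state first lets pwm_get_relative_duty_cycle() report the real output; worked numbers:

    /* Disabled channel, pin assumed to rest low:
     *   PWM_POLARITY_NORMAL:   low == 0% duty   -> duty_cycle = 0
     *   PWM_POLARITY_INVERSED: low == 100% duty -> duty_cycle = period
     *
     * e.g. period = 5000 ns, inverted polarity, channel disabled:
     */
    pstate.duty_cycle = pstate.period;              /* 5000 */
    pwm_get_relative_duty_cycle(&pstate, 100);      /* returns 100 */
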
index e374fa6e5f2841e4a96fc2ad03907f020284cfc6..d89ae7f16d7a0e1f8d8c0ec19e7ad7565e816858 100644
@@ -1017,14 +1017,14 @@ static const struct regulator_desc rk805_reg[] = {
 };
 
 static const struct linear_range rk806_buck_voltage_ranges[] = {
-       REGULATOR_LINEAR_RANGE(500000, 0, 160, 6250), /* 500mV ~ 1500mV */
-       REGULATOR_LINEAR_RANGE(1500000, 161, 237, 25000), /* 1500mV ~ 3400mV */
-       REGULATOR_LINEAR_RANGE(3400000, 238, 255, 0),
+       REGULATOR_LINEAR_RANGE(500000, 0, 159, 6250), /* 500mV ~ 1500mV */
+       REGULATOR_LINEAR_RANGE(1500000, 160, 235, 25000), /* 1500mV ~ 3400mV */
+       REGULATOR_LINEAR_RANGE(3400000, 236, 255, 0),
 };
 
 static const struct linear_range rk806_ldo_voltage_ranges[] = {
-       REGULATOR_LINEAR_RANGE(500000, 0, 232, 12500), /* 500mV ~ 3400mV */
-       REGULATOR_LINEAR_RANGE(3400000, 233, 255, 0), /* 500mV ~ 3400mV */
+       REGULATOR_LINEAR_RANGE(500000, 0, 231, 12500), /* 500mV ~ 3400mV */
+       REGULATOR_LINEAR_RANGE(3400000, 232, 255, 0),
 };
 
 static const struct regulator_desc rk806_reg[] = {
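
The off-by-one is easiest to check by evaluating REGULATOR_LINEAR_RANGE(min_uV, min_sel, max_sel, step_uV), where V(sel) = min_uV + (sel - min_sel) * step_uV:

    /* New buck ranges:
     *   sel 159: 500000  + 159 * 6250  = 1493750  (last 6.25 mV step)
     *   sel 160: 1500000 +   0 * 25000 = 1500000  (first 25 mV step)
     *   sel 235: 1500000 +  75 * 25000 = 3375000
     *   sel 236..255:                    3400000  (fixed)
     *
     * With the old split (0..160 / 161..237), sel 160 and sel 161 both
     * produced 1500000 and every 25 mV step sat one selector too high.
     */
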
index f48214e2c3b46000eb2e833b422be4474a08f920..04133510e5af7dee68f7d4cb8f10f7af02ff44ab 100644
@@ -726,9 +726,25 @@ static int ti_abb_probe(struct platform_device *pdev)
                        return PTR_ERR(abb->setup_reg);
        }
 
-       abb->int_base = devm_platform_ioremap_resource_byname(pdev, "int-address");
-       if (IS_ERR(abb->int_base))
-               return PTR_ERR(abb->int_base);
+       pname = "int-address";
+       res = platform_get_resource_byname(pdev, IORESOURCE_MEM, pname);
+       if (!res) {
+               dev_err(dev, "Missing '%s' IO resource\n", pname);
+               return -ENODEV;
+       }
+       /*
+        * The MPU interrupt status register (PRM_IRQSTATUS_MPU) is
+        * shared between regulator-abb-{ivahd,dspeve,gpu} driver
+        * instances. Therefore use devm_ioremap() rather than
+        * devm_platform_ioremap_resource_byname() to avoid busy
+        * resource region conflicts.
+        */
+       abb->int_base = devm_ioremap(dev, res->start, resource_size(res));
+       if (!abb->int_base) {
+               dev_err(dev, "Unable to map '%s'\n", pname);
+               return -ENOMEM;
+       }
 
        /* Map Optional resources */
        pname = "efuse-address";
index d5caf36c56cdcfdfdc22537de70cd5ab44b1644c..225c859d6da550a8cb44c3d2446f3d433c9716e3 100644
@@ -54,7 +54,7 @@ static void rtc_time64_to_tm_test_date_range(struct kunit *test)
 
                days = div_s64(secs, 86400);
 
-               #define FAIL_MSG "%d/%02d/%02d (%2d) : %ld", \
+               #define FAIL_MSG "%d/%02d/%02d (%2d) : %lld", \
                        year, month, mday, yday, days
 
                KUNIT_ASSERT_EQ_MSG(test, year - 1900, result.tm_year, FAIL_MSG);
index 7327e81352e9c7c74248adbd95b7c1c62aa05643..cead018c3f06a5917396561321ef354b8db6acc3 100644
@@ -8,9 +8,6 @@
  * Copyright IBM Corp. 1999, 2009
  */
 
-#define KMSG_COMPONENT "dasd"
-#define pr_fmt(fmt) KMSG_COMPONENT ": " fmt
-
 #include <linux/kmod.h>
 #include <linux/init.h>
 #include <linux/interrupt.h>
@@ -30,9 +27,6 @@
 #include <asm/itcw.h>
 #include <asm/diag.h>
 
-/* This is ugly... */
-#define PRINTK_HEADER "dasd:"
-
 #include "dasd_int.h"
 /*
  * SECTION: Constant definitions to be used within this file
@@ -313,39 +307,57 @@ static int dasd_state_basic_to_known(struct dasd_device *device)
  */
 static int dasd_state_basic_to_ready(struct dasd_device *device)
 {
-       int rc;
-       struct dasd_block *block;
-       struct gendisk *disk;
+       struct dasd_block *block = device->block;
+       struct queue_limits lim;
+       int rc = 0;
 
-       rc = 0;
-       block = device->block;
        /* make disk known with correct capacity */
-       if (block) {
-               if (block->base->discipline->do_analysis != NULL)
-                       rc = block->base->discipline->do_analysis(block);
-               if (rc) {
-                       if (rc != -EAGAIN) {
-                               device->state = DASD_STATE_UNFMT;
-                               disk = device->block->gdp;
-                               kobject_uevent(&disk_to_dev(disk)->kobj,
-                                              KOBJ_CHANGE);
-                               goto out;
-                       }
-                       return rc;
-               }
-               if (device->discipline->setup_blk_queue)
-                       device->discipline->setup_blk_queue(block);
-               set_capacity(block->gdp,
-                            block->blocks << block->s2b_shift);
+       if (!block) {
                device->state = DASD_STATE_READY;
-               rc = dasd_scan_partitions(block);
-               if (rc) {
-                       device->state = DASD_STATE_BASIC;
+               goto out;
+       }
+
+       if (block->base->discipline->do_analysis != NULL)
+               rc = block->base->discipline->do_analysis(block);
+       if (rc) {
+               if (rc == -EAGAIN)
                        return rc;
-               }
-       } else {
-               device->state = DASD_STATE_READY;
+               device->state = DASD_STATE_UNFMT;
+               kobject_uevent(&disk_to_dev(device->block->gdp)->kobj,
+                              KOBJ_CHANGE);
+               goto out;
+       }
+
+       lim = queue_limits_start_update(block->gdp->queue);
+       lim.max_dev_sectors = device->discipline->max_sectors(block);
+       lim.max_hw_sectors = lim.max_dev_sectors;
+       lim.logical_block_size = block->bp_block;
+
+       if (device->discipline->has_discard) {
+               unsigned int max_bytes;
+
+               lim.discard_granularity = block->bp_block;
+
+               /* Calculate max_discard_sectors and make it PAGE aligned */
+               max_bytes = USHRT_MAX * block->bp_block;
+               max_bytes = ALIGN_DOWN(max_bytes, PAGE_SIZE);
+
+               lim.max_hw_discard_sectors = max_bytes / block->bp_block;
+               lim.max_write_zeroes_sectors = lim.max_hw_discard_sectors;
        }
+       rc = queue_limits_commit_update(block->gdp->queue, &lim);
+       if (rc)
+               return rc;
+
+       set_capacity(block->gdp, block->blocks << block->s2b_shift);
+       device->state = DASD_STATE_READY;
+
+       rc = dasd_scan_partitions(block);
+       if (rc) {
+               device->state = DASD_STATE_BASIC;
+               return rc;
+       }
+
 out:
        if (device->discipline->basic_to_ready)
                rc = device->discipline->basic_to_ready(device);
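
The block-size, max-sectors and discard setup that used to live behind setup_blk_queue() now goes through the atomic queue-limits API new in this cycle: queue_limits_start_update() hands back a snapshot under the queue's limits lock, and queue_limits_commit_update() validates and publishes it in one step. The bare pattern, with placeholder values:

    struct queue_limits lim;
    int rc;

    lim = queue_limits_start_update(disk->queue);
    lim.logical_block_size = 4096;          /* placeholder values */
    lim.max_hw_sectors = 2048;
    rc = queue_limits_commit_update(disk->queue, &lim);
    if (rc)
            return rc;
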
@@ -412,7 +424,7 @@ dasd_state_ready_to_online(struct dasd_device * device)
                                        KOBJ_CHANGE);
                        return 0;
                }
-               disk_uevent(device->block->bdev_handle->bdev->bd_disk,
+               disk_uevent(file_bdev(device->block->bdev_file)->bd_disk,
                            KOBJ_CHANGE);
        }
        return 0;
@@ -433,7 +445,7 @@ static int dasd_state_online_to_ready(struct dasd_device *device)
 
        device->state = DASD_STATE_READY;
        if (device->block && !(device->features & DASD_FEATURE_USERAW))
-               disk_uevent(device->block->bdev_handle->bdev->bd_disk,
+               disk_uevent(file_bdev(device->block->bdev_file)->bd_disk,
                            KOBJ_CHANGE);
        return 0;
 }
@@ -1301,7 +1313,6 @@ int dasd_term_IO(struct dasd_ccw_req *cqr)
 {
        struct dasd_device *device;
        int retries, rc;
-       char errorstring[ERRORLENGTH];
 
        /* Check the cqr */
        rc = dasd_check_cqr(cqr);
@@ -1340,10 +1351,8 @@ int dasd_term_IO(struct dasd_ccw_req *cqr)
                        rc = 0;
                        break;
                default:
-                       /* internal error 10 - unknown rc*/
-                       snprintf(errorstring, ERRORLENGTH, "10 %d", rc);
-                       dev_err(&device->cdev->dev, "An error occurred in the "
-                               "DASD device driver, reason=%s\n", errorstring);
+                       dev_err(&device->cdev->dev,
+                               "Unexpected error during request termination %d\n", rc);
                        BUG();
                        break;
                }
@@ -1362,7 +1371,6 @@ int dasd_start_IO(struct dasd_ccw_req *cqr)
 {
        struct dasd_device *device;
        int rc;
-       char errorstring[ERRORLENGTH];
 
        /* Check the cqr */
        rc = dasd_check_cqr(cqr);
@@ -1382,10 +1390,8 @@ int dasd_start_IO(struct dasd_ccw_req *cqr)
                return -EPERM;
        }
        if (cqr->retries < 0) {
-               /* internal error 14 - start_IO run out of retries */
-               sprintf(errorstring, "14 %p", cqr);
-               dev_err(&device->cdev->dev, "An error occurred in the DASD "
-                       "device driver, reason=%s\n", errorstring);
+               dev_err(&device->cdev->dev,
+                       "Start I/O ran out of retries\n");
                cqr->status = DASD_CQR_ERROR;
                return -EIO;
        }
@@ -1463,11 +1469,8 @@ int dasd_start_IO(struct dasd_ccw_req *cqr)
                              "not accessible");
                break;
        default:
-               /* internal error 11 - unknown rc */
-               snprintf(errorstring, ERRORLENGTH, "11 %d", rc);
                dev_err(&device->cdev->dev,
-                       "An error occurred in the DASD device driver, "
-                       "reason=%s\n", errorstring);
+                       "Unexpected error during request start %d", rc);
                BUG();
                break;
        }
@@ -1904,8 +1907,6 @@ static void __dasd_device_process_ccw_queue(struct dasd_device *device,
 static void __dasd_process_cqr(struct dasd_device *device,
                               struct dasd_ccw_req *cqr)
 {
-       char errorstring[ERRORLENGTH];
-
        switch (cqr->status) {
        case DASD_CQR_SUCCESS:
                cqr->status = DASD_CQR_DONE;
@@ -1917,11 +1918,8 @@ static void __dasd_process_cqr(struct dasd_device *device,
                cqr->status = DASD_CQR_TERMINATED;
                break;
        default:
-               /* internal error 12 - wrong cqr status*/
-               snprintf(errorstring, ERRORLENGTH, "12 %p %x02", cqr, cqr->status);
                dev_err(&device->cdev->dev,
-                       "An error occurred in the DASD device driver, "
-                       "reason=%s\n", errorstring);
+                       "Unexpected CQR status %02x", cqr->status);
                BUG();
        }
        if (cqr->callback)
@@ -1986,16 +1984,14 @@ static void __dasd_device_check_expire(struct dasd_device *device)
                if (device->discipline->term_IO(cqr) != 0) {
                        /* Hmpf, try again in 5 sec */
                        dev_err(&device->cdev->dev,
-                               "cqr %p timed out (%lus) but cannot be "
-                               "ended, retrying in 5 s\n",
-                               cqr, (cqr->expires/HZ));
+                               "CQR timed out (%lus) but cannot be ended, retrying in 5s\n",
+                               (cqr->expires / HZ));
                        cqr->expires += 5*HZ;
                        dasd_device_set_timer(device, 5*HZ);
                } else {
                        dev_err(&device->cdev->dev,
-                               "cqr %p timed out (%lus), %i retries "
-                               "remaining\n", cqr, (cqr->expires/HZ),
-                               cqr->retries);
+                               "CQR timed out (%lus), %i retries remaining\n",
+                               (cqr->expires / HZ), cqr->retries);
                }
                __dasd_device_check_autoquiesce_timeout(device, cqr);
        }
@@ -2116,8 +2112,7 @@ int dasd_flush_device_queue(struct dasd_device *device)
                        if (rc) {
                                /* unable to terminate request */
                                dev_err(&device->cdev->dev,
-                                       "Flushing the DASD request queue "
-                                       "failed for request %p\n", cqr);
+                                       "Flushing the DASD request queue failed\n");
                                /* stop flush processing */
                                goto finished;
                        }
@@ -2633,8 +2628,7 @@ static int __dasd_cancel_req(struct dasd_ccw_req *cqr)
                rc = device->discipline->term_IO(cqr);
                if (rc) {
                        dev_err(&device->cdev->dev,
-                               "Cancelling request %p failed with rc=%d\n",
-                               cqr, rc);
+                               "Cancelling request failed with rc=%d\n", rc);
                } else {
                        cqr->stopclk = get_tod_clock();
                }
@@ -3402,8 +3396,7 @@ static void dasd_generic_auto_online(void *data, async_cookie_t cookie)
 
        ret = ccw_device_set_online(cdev);
        if (ret)
-               pr_warn("%s: Setting the DASD online failed with rc=%d\n",
-                       dev_name(&cdev->dev), ret);
+               dev_warn(&cdev->dev, "Setting the DASD online failed with rc=%d\n", ret);
 }
 
 /*
@@ -3490,8 +3483,11 @@ int dasd_generic_set_online(struct ccw_device *cdev,
 {
        struct dasd_discipline *discipline;
        struct dasd_device *device;
+       struct device *dev;
        int rc;
 
+       dev = &cdev->dev;
+
        /* first online clears initial online feature flag */
        dasd_set_feature(cdev, DASD_FEATURE_INITIAL_ONLINE, 0);
        device = dasd_create_device(cdev);
@@ -3504,11 +3500,10 @@ int dasd_generic_set_online(struct ccw_device *cdev,
                        /* Try to load the required module. */
                        rc = request_module(DASD_DIAG_MOD);
                        if (rc) {
-                               pr_warn("%s Setting the DASD online failed "
-                                       "because the required module %s "
-                                       "could not be loaded (rc=%d)\n",
-                                       dev_name(&cdev->dev), DASD_DIAG_MOD,
-                                       rc);
+                               dev_warn(dev, "Setting the DASD online failed "
+                                        "because the required module %s "
+                                        "could not be loaded (rc=%d)\n",
+                                        DASD_DIAG_MOD, rc);
                                dasd_delete_device(device);
                                return -ENODEV;
                        }
@@ -3516,8 +3511,7 @@ int dasd_generic_set_online(struct ccw_device *cdev,
                /* Module init could have failed, so check again here after
                 * request_module(). */
                if (!dasd_diag_discipline_pointer) {
-                       pr_warn("%s Setting the DASD online failed because of missing DIAG discipline\n",
-                               dev_name(&cdev->dev));
+                       dev_warn(dev, "Setting the DASD online failed because of missing DIAG discipline\n");
                        dasd_delete_device(device);
                        return -ENODEV;
                }
@@ -3527,37 +3521,33 @@ int dasd_generic_set_online(struct ccw_device *cdev,
                dasd_delete_device(device);
                return -EINVAL;
        }
+       device->base_discipline = base_discipline;
        if (!try_module_get(discipline->owner)) {
-               module_put(base_discipline->owner);
                dasd_delete_device(device);
                return -EINVAL;
        }
-       device->base_discipline = base_discipline;
        device->discipline = discipline;
 
        /* check_device will allocate block device if necessary */
        rc = discipline->check_device(device);
        if (rc) {
-               pr_warn("%s Setting the DASD online with discipline %s failed with rc=%i\n",
-                       dev_name(&cdev->dev), discipline->name, rc);
-               module_put(discipline->owner);
-               module_put(base_discipline->owner);
+               dev_warn(dev, "Setting the DASD online with discipline %s failed with rc=%i\n",
+                        discipline->name, rc);
                dasd_delete_device(device);
                return rc;
        }
 
        dasd_set_target_state(device, DASD_STATE_ONLINE);
        if (device->state <= DASD_STATE_KNOWN) {
-               pr_warn("%s Setting the DASD online failed because of a missing discipline\n",
-                       dev_name(&cdev->dev));
+               dev_warn(dev, "Setting the DASD online failed because of a missing discipline\n");
                rc = -ENODEV;
                dasd_set_target_state(device, DASD_STATE_NEW);
                if (device->block)
                        dasd_free_block(device->block);
                dasd_delete_device(device);
-       } else
-               pr_debug("dasd_generic device %s found\n",
-                               dev_name(&cdev->dev));
+       } else {
+               dev_dbg(dev, "dasd_generic device found\n");
+       }
 
        wait_event(dasd_init_waitq, _wait_for_device(device));
 
@@ -3568,10 +3558,13 @@ EXPORT_SYMBOL_GPL(dasd_generic_set_online);
 
 int dasd_generic_set_offline(struct ccw_device *cdev)
 {
+       int max_count, open_count, rc;
        struct dasd_device *device;
        struct dasd_block *block;
-       int max_count, open_count, rc;
        unsigned long flags;
+       struct device *dev;
+
+       dev = &cdev->dev;
 
        rc = 0;
        spin_lock_irqsave(get_ccwdev_lock(cdev), flags);
@@ -3588,15 +3581,14 @@ int dasd_generic_set_offline(struct ccw_device *cdev)
         * in the other openers.
         */
        if (device->block) {
-               max_count = device->block->bdev_handle ? 0 : -1;
+               max_count = device->block->bdev_file ? 0 : -1;
                open_count = atomic_read(&device->block->open_count);
                if (open_count > max_count) {
                        if (open_count > 0)
-                               pr_warn("%s: The DASD cannot be set offline with open count %i\n",
-                                       dev_name(&cdev->dev), open_count);
+                               dev_warn(dev, "The DASD cannot be set offline with open count %i\n",
+                                        open_count);
                        else
-                               pr_warn("%s: The DASD cannot be set offline while it is in use\n",
-                                       dev_name(&cdev->dev));
+                               dev_warn(dev, "The DASD cannot be set offline while it is in use\n");
                        rc = -EBUSY;
                        goto out_err;
                }
@@ -3634,8 +3626,8 @@ int dasd_generic_set_offline(struct ccw_device *cdev)
                 * so sync bdev first and then wait for our queues to become
                 * empty
                 */
-               if (device->block && device->block->bdev_handle)
-                       bdev_mark_dead(device->block->bdev_handle->bdev, false);
+               if (device->block && device->block->bdev_file)
+                       bdev_mark_dead(file_bdev(device->block->bdev_file), false);
                dasd_schedule_device_bh(device);
                rc = wait_event_interruptible(shutdown_waitq,
                                              _wait_for_empty_queues(device));
@@ -3956,8 +3948,8 @@ static int dasd_handle_autoquiesce(struct dasd_device *device,
        if (dasd_eer_enabled(device))
                dasd_eer_write(device, NULL, DASD_EER_AUTOQUIESCE);
 
-       pr_info("%s: The DASD has been put in the quiesce state\n",
-               dev_name(&device->cdev->dev));
+       dev_info(&device->cdev->dev,
+                "The DASD has been put in the quiesce state\n");
        dasd_device_set_stop_bits(device, DASD_STOPPED_QUIESCE);
 
        if (device->features & DASD_FEATURE_REQUEUEQUIESCE)
@@ -3977,10 +3969,8 @@ static struct dasd_ccw_req *dasd_generic_build_rdc(struct dasd_device *device,
                                   NULL);
 
        if (IS_ERR(cqr)) {
-               /* internal error 13 - Allocating the RDC request failed*/
-               dev_err(&device->cdev->dev,
-                        "An error occurred in the DASD device driver, "
-                        "reason=%s\n", "13");
+               DBF_EVENT_DEVID(DBF_WARNING, device->cdev, "%s",
+                               "Could not allocate RDC request");
                return cqr;
        }
 
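A recurring change in the dasd message overhaul is replacing pr_*() plus a hand-rolled dev_name() prefix with the dev_*() helpers, which derive the prefix (driver plus device name) from the struct device itself:

    pr_warn("%s: Setting the DASD online failed with rc=%d\n",
            dev_name(&cdev->dev), ret);     /* manual prefix, no driver name */
    dev_warn(&cdev->dev,
             "Setting the DASD online failed with rc=%d\n", ret);
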
index 89957bb7244d2655b5a932bfb9fbd845a7e3d48c..459b7f8ac8837283fc07e6257bd871f407008d08 100644
@@ -7,13 +7,9 @@
  *
  */
 
-#define KMSG_COMPONENT "dasd-eckd"
-
 #include <linux/timer.h>
 #include <asm/idals.h>
 
-#define PRINTK_HEADER "dasd_erp(3990): "
-
 #include "dasd_int.h"
 #include "dasd_eckd.h"
 
@@ -398,7 +394,6 @@ dasd_3990_handle_env_data(struct dasd_ccw_req * erp, char *sense)
        struct dasd_device *device = erp->startdev;
        char msg_format = (sense[7] & 0xF0);
        char msg_no = (sense[7] & 0x0F);
-       char errorstring[ERRORLENGTH];
 
        switch (msg_format) {
        case 0x00:              /* Format 0 - Program or System Checks */
@@ -1004,12 +999,9 @@ dasd_3990_handle_env_data(struct dasd_ccw_req * erp, char *sense)
                }
                break;
 
-       default:        /* unknown message format - should not happen
-                          internal error 03 - unknown message format */
-               snprintf(errorstring, ERRORLENGTH, "03 %x02", msg_format);
+       default:
                dev_err(&device->cdev->dev,
-                        "An error occurred in the DASD device driver, "
-                        "reason=%s\n", errorstring);
+                       "Unknown message format %02x", msg_format);
                break;
        }                       /* end switch message format */
 
@@ -1056,11 +1048,9 @@ dasd_3990_erp_com_rej(struct dasd_ccw_req * erp, char *sense)
                set_bit(DASD_CQR_SUPPRESS_CR, &erp->refers->flags);
                erp = dasd_3990_erp_cleanup(erp, DASD_CQR_FAILED);
        } else {
-               /* fatal error -  set status to FAILED
-                  internal error 09 - Command Reject */
                if (!test_bit(DASD_CQR_SUPPRESS_CR, &erp->flags))
                        dev_err(&device->cdev->dev,
-                               "An error occurred in the DASD device driver, reason=09\n");
+                               "An I/O command request was rejected\n");
 
                erp = dasd_3990_erp_cleanup(erp, DASD_CQR_FAILED);
        }
@@ -1128,13 +1118,7 @@ dasd_3990_erp_equip_check(struct dasd_ccw_req * erp, char *sense)
        erp->function = dasd_3990_erp_equip_check;
 
        if (sense[1] & SNS1_WRITE_INHIBITED) {
-               dev_info(&device->cdev->dev,
-                           "Write inhibited path encountered\n");
-
-               /* vary path offline
-                  internal error 04 - Path should be varied off-line.*/
-               dev_err(&device->cdev->dev, "An error occurred in the DASD "
-                       "device driver, reason=%s\n", "04");
+               dev_err(&device->cdev->dev, "Write inhibited path encountered\n");
 
                erp = dasd_3990_erp_action_1(erp);
 
@@ -1285,11 +1269,7 @@ dasd_3990_erp_inv_format(struct dasd_ccw_req * erp, char *sense)
                erp = dasd_3990_erp_action_4(erp, sense);
 
        } else {
-               /* internal error 06 - The track format is not valid*/
-               dev_err(&device->cdev->dev,
-                       "An error occurred in the DASD device driver, "
-                       "reason=%s\n", "06");
-
+               dev_err(&device->cdev->dev, "Track format is not valid\n");
                erp = dasd_3990_erp_cleanup(erp, DASD_CQR_FAILED);
        }
 
@@ -1663,9 +1643,8 @@ dasd_3990_erp_action_1B_32(struct dasd_ccw_req * default_erp, char *sense)
                                     sizeof(struct LO_eckd_data), device);
 
        if (IS_ERR(erp)) {
-               /* internal error 01 - Unable to allocate ERP */
-               dev_err(&device->cdev->dev, "An error occurred in the DASD "
-                       "device driver, reason=%s\n", "01");
+               DBF_DEV_EVENT(DBF_ERR, device, "%s",
+                             "Unable to allocate ERP request (1B 32)");
                return dasd_3990_erp_cleanup(default_erp, DASD_CQR_FAILED);
        }
 
@@ -1807,10 +1786,8 @@ dasd_3990_update_1B(struct dasd_ccw_req * previous_erp, char *sense)
        cpa = previous_erp->irb.scsw.cmd.cpa;
 
        if (cpa == 0) {
-               /* internal error 02 -
-                  Unable to determine address of the CCW to be restarted */
-               dev_err(&device->cdev->dev, "An error occurred in the DASD "
-                       "device driver, reason=%s\n", "02");
+               dev_err(&device->cdev->dev,
+                       "Unable to determine address of the CCW to be restarted\n");
 
                previous_erp->status = DASD_CQR_FAILED;
 
@@ -2009,15 +1986,9 @@ dasd_3990_erp_compound_config(struct dasd_ccw_req * erp, char *sense)
 {
 
        if ((sense[25] & DASD_SENSE_BIT_1) && (sense[26] & DASD_SENSE_BIT_2)) {
-
-               /* set to suspended duplex state then restart
-                  internal error 05 - Set device to suspended duplex state
-                  should be done */
                struct dasd_device *device = erp->startdev;
                dev_err(&device->cdev->dev,
-                       "An error occurred in the DASD device driver, "
-                       "reason=%s\n", "05");
-
+                       "Compound configuration error occurred\n");
        }
 
        erp->function = dasd_3990_erp_compound_config;
@@ -2153,10 +2124,9 @@ dasd_3990_erp_inspect_32(struct dasd_ccw_req * erp, char *sense)
                        erp = dasd_3990_erp_int_req(erp);
                        break;
 
-               case 0x0F:  /* length mismatch during update write command
-                              internal error 08 - update write command error*/
-                       dev_err(&device->cdev->dev, "An error occurred in the "
-                               "DASD device driver, reason=%s\n", "08");
+               case 0x0F:
+                       dev_err(&device->cdev->dev,
+                               "Update write command error occurred\n");
 
                        erp = dasd_3990_erp_cleanup(erp, DASD_CQR_FAILED);
                        break;
@@ -2165,12 +2135,9 @@ dasd_3990_erp_inspect_32(struct dasd_ccw_req * erp, char *sense)
                        erp = dasd_3990_erp_action_10_32(erp, sense);
                        break;
 
-               case 0x15:      /* next track outside defined extend
-                                  internal error 07 - The next track is not
-                                  within the defined storage extent */
+               case 0x15:
                        dev_err(&device->cdev->dev,
-                               "An error occurred in the DASD device driver, "
-                               "reason=%s\n", "07");
+                               "Track outside defined extent error occurred\n");
 
                        erp = dasd_3990_erp_cleanup(erp, DASD_CQR_FAILED);
                        break;
@@ -2663,7 +2630,7 @@ dasd_3990_erp_further_erp(struct dasd_ccw_req *erp)
                 * necessary
                 */
                dev_err(&device->cdev->dev,
-                       "ERP %p has run out of retries and failed\n", erp);
+                       "ERP %px has run out of retries and failed\n", erp);
 
                erp->status = DASD_CQR_FAILED;
        }
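
printk hashes %p output to avoid leaking kernel addresses, so the old dumps
printed stable but meaningless tokens; these ERP diagnostics exist precisely
to correlate requests by address, hence the switch to %px, which prints the
raw pointer. A minimal illustration (helper name hypothetical):

	#include <linux/dev_printk.h>

	static void show_ptr_sketch(struct device *dev, void *p)
	{
		dev_info(dev, "hashed:   %p\n", p);  /* per-boot hashed value */
		dev_info(dev, "unhashed: %px\n", p); /* true address */
	}

%px is acceptable only in privileged debug output like these dumps, where the
real address is what the reader needs.
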
@@ -2704,8 +2671,7 @@ dasd_3990_erp_handle_match_erp(struct dasd_ccw_req *erp_head,
        while (erp_done != erp) {
 
                if (erp_done == NULL)   /* end of chain reached */
-                       panic(PRINTK_HEADER "Programming error in ERP! The "
-                             "original request was lost\n");
+                       panic("Programming error in ERP! The original request was lost\n");
 
                /* remove the request from the device queue */
                list_del(&erp_done->blocklist);
@@ -2786,11 +2752,9 @@ dasd_3990_erp_action(struct dasd_ccw_req * cqr)
                            "ERP chain at BEGINNING of ERP-ACTION\n");
                for (temp_erp = cqr;
                     temp_erp != NULL; temp_erp = temp_erp->refers) {
-
                        dev_err(&device->cdev->dev,
-                                   "ERP %p (%02x) refers to %p\n",
-                                   temp_erp, temp_erp->status,
-                                   temp_erp->refers);
+                               "ERP %px (%02x) refers to %px\n",
+                               temp_erp, temp_erp->status, temp_erp->refers);
                }
        }
 
@@ -2837,11 +2801,9 @@ dasd_3990_erp_action(struct dasd_ccw_req * cqr)
                            "ERP chain at END of ERP-ACTION\n");
                for (temp_erp = erp;
                     temp_erp != NULL; temp_erp = temp_erp->refers) {
-
                        dev_err(&device->cdev->dev,
-                                   "ERP %p (%02x) refers to %p\n",
-                                   temp_erp, temp_erp->status,
-                                   temp_erp->refers);
+                               "ERP %px (%02x) refers to %px\n",
+                               temp_erp, temp_erp->status, temp_erp->refers);
                }
        }
 
index c9740ae88d1a633c92e246912a0f57959f5a0f9a..e84cd5436556392de9c63d78470cc3c48f578d16 100644 (file)
@@ -6,20 +6,12 @@
  * Author(s): Stefan Weinhuber <wein@de.ibm.com>
  */
 
-#define KMSG_COMPONENT "dasd-eckd"
-
 #include <linux/list.h>
 #include <linux/slab.h>
 #include <asm/ebcdic.h>
 #include "dasd_int.h"
 #include "dasd_eckd.h"
 
-#ifdef PRINTK_HEADER
-#undef PRINTK_HEADER
-#endif                         /* PRINTK_HEADER */
-#define PRINTK_HEADER "dasd(eckd):"
-
-
 /*
  * General concept of alias management:
  * - PAV and DASD alias management is specific to the eckd discipline.
index c4e36650c42649ff28150ca2d24f93c659758794..0316c20823eecf0024b54f1ae7d54e443a5aaf7f 100644 (file)
@@ -13,8 +13,6 @@
  *
  */
 
-#define KMSG_COMPONENT "dasd"
-
 #include <linux/ctype.h>
 #include <linux/init.h>
 #include <linux/module.h>
@@ -24,8 +22,6 @@
 #include <linux/uaccess.h>
 #include <asm/ipl.h>
 
-/* This is ugly... */
-#define PRINTK_HEADER "dasd_devmap:"
 #define DASD_MAX_PARAMS 256
 
 #include "dasd_int.h"
@@ -1114,7 +1110,7 @@ dasd_use_diag_show(struct device *dev, struct device_attribute *attr, char *buf)
                use_diag = (devmap->features & DASD_FEATURE_USEDIAG) != 0;
        else
                use_diag = (DASD_FEATURE_DEFAULT & DASD_FEATURE_USEDIAG) != 0;
-       return sprintf(buf, use_diag ? "1\n" : "0\n");
+       return sysfs_emit(buf, use_diag ? "1\n" : "0\n");
 }
 
 static ssize_t
@@ -1163,7 +1159,7 @@ dasd_use_raw_show(struct device *dev, struct device_attribute *attr, char *buf)
                use_raw = (devmap->features & DASD_FEATURE_USERAW) != 0;
        else
                use_raw = (DASD_FEATURE_DEFAULT & DASD_FEATURE_USERAW) != 0;
-       return sprintf(buf, use_raw ? "1\n" : "0\n");
+       return sysfs_emit(buf, use_raw ? "1\n" : "0\n");
 }
 
 static ssize_t
@@ -1259,7 +1255,7 @@ dasd_access_show(struct device *dev, struct device_attribute *attr,
        if (count < 0)
                return count;
 
-       return sprintf(buf, "%d\n", count);
+       return sysfs_emit(buf, "%d\n", count);
 }
 
 static DEVICE_ATTR(host_access_count, 0444, dasd_access_show, NULL);
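
sysfs_emit() is the preferred replacement for raw sprintf() in sysfs show()
callbacks: it knows the callback is handed a full page, clamps output to
PAGE_SIZE, and warns if buf is not the page-aligned buffer sysfs passed in.
A minimal read-only attribute as a sketch (names hypothetical):

	#include <linux/device.h>
	#include <linux/sysfs.h>

	static ssize_t example_show(struct device *dev,
				    struct device_attribute *attr, char *buf)
	{
		/* Bounded by PAGE_SIZE; returns bytes written. */
		return sysfs_emit(buf, "%d\n", 42);
	}
	static DEVICE_ATTR_RO(example);
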
@@ -1338,19 +1334,19 @@ static ssize_t dasd_alias_show(struct device *dev,
 
        device = dasd_device_from_cdev(to_ccwdev(dev));
        if (IS_ERR(device))
-               return sprintf(buf, "0\n");
+               return sysfs_emit(buf, "0\n");
 
        if (device->discipline && device->discipline->get_uid &&
            !device->discipline->get_uid(device, &uid)) {
                if (uid.type == UA_BASE_PAV_ALIAS ||
                    uid.type == UA_HYPER_PAV_ALIAS) {
                        dasd_put_device(device);
-                       return sprintf(buf, "1\n");
+                       return sysfs_emit(buf, "1\n");
                }
        }
        dasd_put_device(device);
 
-       return sprintf(buf, "0\n");
+       return sysfs_emit(buf, "0\n");
 }
 
 static DEVICE_ATTR(alias, 0444, dasd_alias_show, NULL);
@@ -1412,15 +1408,9 @@ dasd_uid_show(struct device *dev, struct device_attribute *attr, char *buf)
                        break;
                }
 
-               if (strlen(uid.vduit) > 0)
-                       snprintf(uid_string, sizeof(uid_string),
-                                "%s.%s.%04x.%s.%s",
-                                uid.vendor, uid.serial, uid.ssid, ua_string,
-                                uid.vduit);
-               else
-                       snprintf(uid_string, sizeof(uid_string),
-                                "%s.%s.%04x.%s",
-                                uid.vendor, uid.serial, uid.ssid, ua_string);
+               snprintf(uid_string, sizeof(uid_string), "%s.%s.%04x.%s%s%s",
+                        uid.vendor, uid.serial, uid.ssid, ua_string,
+                        uid.vduit[0] ? "." : "", uid.vduit);
        }
        dasd_put_device(device);
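
The duplicated snprintf() branches collapse into one format string by making
the separator itself conditional: when vduit is empty, the two trailing %s
operands both expand to the empty string and nothing is appended. The same
idiom as a tiny standalone helper (name hypothetical):

	#include <linux/kernel.h>

	static void format_id_sketch(char *out, size_t len,
				     const char *base, const char *suffix)
	{
		/* ".suffix" is appended only when suffix is non-empty. */
		snprintf(out, len, "%s%s%s",
			 base, suffix[0] ? "." : "", suffix);
	}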
 
@@ -1862,7 +1852,7 @@ static ssize_t dasd_pm_show(struct device *dev,
 
        device = dasd_device_from_cdev(to_ccwdev(dev));
        if (IS_ERR(device))
-               return sprintf(buf, "0\n");
+               return sysfs_emit(buf, "0\n");
 
        opm = dasd_path_get_opm(device);
        nppm = dasd_path_get_nppm(device);
@@ -1872,8 +1862,8 @@ static ssize_t dasd_pm_show(struct device *dev,
        ifccpm = dasd_path_get_ifccpm(device);
        dasd_put_device(device);
 
-       return sprintf(buf, "%02x %02x %02x %02x %02x %02x\n", opm, nppm,
-                      cablepm, cuirpm, hpfpm, ifccpm);
+       return sysfs_emit(buf, "%02x %02x %02x %02x %02x %02x\n", opm, nppm,
+                         cablepm, cuirpm, hpfpm, ifccpm);
 }
 
 static DEVICE_ATTR(path_masks, 0444, dasd_pm_show, NULL);
index 2e4e555b37c33293c7b4dde87d29500748cbd2c3..ea4b1d01bb767ec712e8e3696d91eccbcefa4e8a 100644 (file)
@@ -8,8 +8,6 @@
  *
  */
 
-#define KMSG_COMPONENT "dasd"
-
 #include <linux/kernel_stat.h>
 #include <linux/stddef.h>
 #include <linux/kernel.h>
@@ -31,8 +29,6 @@
 #include "dasd_int.h"
 #include "dasd_diag.h"
 
-#define PRINTK_HEADER "dasd(diag):"
-
 MODULE_LICENSE("GPL");
 
 /* The maximum number of blocks per request (max_blocks) is dependent on the
@@ -621,25 +617,9 @@ dasd_diag_dump_sense(struct dasd_device *device, struct dasd_ccw_req * req,
                    "dump sense not available for DIAG data");
 }
 
-/*
- * Initialize block layer request queue.
- */
-static void dasd_diag_setup_blk_queue(struct dasd_block *block)
+static unsigned int dasd_diag_max_sectors(struct dasd_block *block)
 {
-       unsigned int logical_block_size = block->bp_block;
-       struct request_queue *q = block->gdp->queue;
-       int max;
-
-       max = DIAG_MAX_BLOCKS << block->s2b_shift;
-       blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
-       q->limits.max_dev_sectors = max;
-       blk_queue_logical_block_size(q, logical_block_size);
-       blk_queue_max_hw_sectors(q, max);
-       blk_queue_max_segments(q, USHRT_MAX);
-       /* With page sized segments each segment can be translated into one idaw/tidaw */
-       blk_queue_max_segment_size(q, PAGE_SIZE);
-       blk_queue_segment_boundary(q, PAGE_SIZE - 1);
-       blk_queue_dma_alignment(q, PAGE_SIZE - 1);
+       return DIAG_MAX_BLOCKS << block->s2b_shift;
 }
 
 static int dasd_diag_pe_handler(struct dasd_device *device,
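
The DIAG, ECKD and FBA setup_blk_queue() hooks repeated the same
blk_queue_*() boilerplate and differed only in the request size limit, so the
discipline interface is narrowed to a single max_sectors() query and the
queue limits are built once in the common gendisk-allocation path. A hedged
sketch of the consumer side (the corresponding dasd.c hunk is not part of
this excerpt):

	#include <linux/blkdev.h>
	#include "dasd_int.h"

	static void dasd_fill_limits_sketch(struct dasd_block *block,
					    struct queue_limits *lim)
	{
		/* Only the discipline-specific number crosses the
		 * interface; everything else is common policy. */
		unsigned int max = block->base->discipline->max_sectors(block);

		lim->max_dev_sectors = max;
		lim->max_hw_sectors = max;
	}
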
@@ -652,10 +632,10 @@ static struct dasd_discipline dasd_diag_discipline = {
        .owner = THIS_MODULE,
        .name = "DIAG",
        .ebcname = "DIAG",
+       .max_sectors = dasd_diag_max_sectors,
        .check_device = dasd_diag_check_device,
        .pe_handler = dasd_diag_pe_handler,
        .fill_geometry = dasd_diag_fill_geometry,
-       .setup_blk_queue = dasd_diag_setup_blk_queue,
        .start_IO = dasd_start_diag,
        .term_IO = dasd_diag_term_IO,
        .handle_terminated_request = dasd_diag_handle_terminated_request,
index bd89b032968a4b747c64f5e1e9c95b799fab8d22..373c1a86c33ed5d7b760087a6669315bfa35182d 100644 (file)
@@ -10,8 +10,6 @@
  * Author.........: Nigel Hislop <hislop_nigel@emc.com>
  */
 
-#define KMSG_COMPONENT "dasd-eckd"
-
 #include <linux/stddef.h>
 #include <linux/kernel.h>
 #include <linux/slab.h>
 #include "dasd_int.h"
 #include "dasd_eckd.h"
 
-#ifdef PRINTK_HEADER
-#undef PRINTK_HEADER
-#endif                         /* PRINTK_HEADER */
-#define PRINTK_HEADER "dasd(eckd):"
-
 /*
  * raw track access always map to 64k in memory
  * so it maps to 16 blocks of 4k per track
@@ -1072,22 +1065,14 @@ static void dasd_eckd_read_fc_security(struct dasd_device *device)
        }
 }
 
-static void dasd_eckd_get_uid_string(struct dasd_conf *conf,
-                                    char *print_uid)
+static void dasd_eckd_get_uid_string(struct dasd_conf *conf, char *print_uid)
 {
        struct dasd_uid uid;
 
        create_uid(conf, &uid);
-       if (strlen(uid.vduit) > 0)
-               snprintf(print_uid, DASD_UID_STRLEN,
-                        "%s.%s.%04x.%02x.%s",
-                        uid.vendor, uid.serial, uid.ssid,
-                        uid.real_unit_addr, uid.vduit);
-       else
-               snprintf(print_uid, DASD_UID_STRLEN,
-                        "%s.%s.%04x.%02x",
-                        uid.vendor, uid.serial, uid.ssid,
-                        uid.real_unit_addr);
+       snprintf(print_uid, DASD_UID_STRLEN, "%s.%s.%04x.%02x%s%s",
+                uid.vendor, uid.serial, uid.ssid, uid.real_unit_addr,
+                uid.vduit[0] ? "." : "", uid.vduit);
 }
 
 static int dasd_eckd_check_cabling(struct dasd_device *device,
@@ -5529,15 +5514,15 @@ dasd_eckd_ioctl(struct dasd_block *block, unsigned int cmd, void __user *argp)
  * and return number of printed chars.
  */
 static void
-dasd_eckd_dump_ccw_range(struct ccw1 *from, struct ccw1 *to, char *page)
+dasd_eckd_dump_ccw_range(struct dasd_device *device, struct ccw1 *from,
+                        struct ccw1 *to, char *page)
 {
        int len, count;
        char *datap;
 
        len = 0;
        while (from <= to) {
-               len += sprintf(page + len, PRINTK_HEADER
-                              " CCW %p: %08X %08X DAT:",
+               len += sprintf(page + len, "CCW %px: %08X %08X DAT:",
                               from, ((int *) from)[0], ((int *) from)[1]);
 
                /* get pointer to data (consider IDALs) */
@@ -5560,7 +5545,7 @@ dasd_eckd_dump_ccw_range(struct ccw1 *from, struct ccw1 *to, char *page)
                from++;
        }
        if (len > 0)
-               printk(KERN_ERR "%s", page);
+               dev_err(&device->cdev->dev, "%s", page);
 }
 
 static void
@@ -5591,9 +5576,12 @@ dasd_eckd_dump_sense_dbf(struct dasd_device *device, struct irb *irb,
 static void dasd_eckd_dump_sense_ccw(struct dasd_device *device,
                                 struct dasd_ccw_req *req, struct irb *irb)
 {
-       char *page;
        struct ccw1 *first, *last, *fail, *from, *to;
+       struct device *dev;
        int len, sl, sct;
+       char *page;
+
+       dev = &device->cdev->dev;
 
        page = (char *) get_zeroed_page(GFP_ATOMIC);
        if (page == NULL) {
@@ -5602,24 +5590,18 @@ static void dasd_eckd_dump_sense_ccw(struct dasd_device *device,
                return;
        }
        /* dump the sense data */
-       len = sprintf(page, PRINTK_HEADER
-                     " I/O status report for device %s:\n",
-                     dev_name(&device->cdev->dev));
-       len += sprintf(page + len, PRINTK_HEADER
-                      " in req: %p CC:%02X FC:%02X AC:%02X SC:%02X DS:%02X "
-                      "CS:%02X RC:%d\n",
+       len = sprintf(page, "I/O status report:\n");
+       len += sprintf(page + len,
+                      "in req: %px CC:%02X FC:%02X AC:%02X SC:%02X DS:%02X CS:%02X RC:%d\n",
                       req, scsw_cc(&irb->scsw), scsw_fctl(&irb->scsw),
                       scsw_actl(&irb->scsw), scsw_stctl(&irb->scsw),
                       scsw_dstat(&irb->scsw), scsw_cstat(&irb->scsw),
                       req ? req->intrc : 0);
-       len += sprintf(page + len, PRINTK_HEADER
-                      " device %s: Failing CCW: %p\n",
-                      dev_name(&device->cdev->dev),
+       len += sprintf(page + len, "Failing CCW: %px\n",
                       phys_to_virt(irb->scsw.cmd.cpa));
        if (irb->esw.esw0.erw.cons) {
                for (sl = 0; sl < 4; sl++) {
-                       len += sprintf(page + len, PRINTK_HEADER
-                                      " Sense(hex) %2d-%2d:",
+                       len += sprintf(page + len, "Sense(hex) %2d-%2d:",
                                       (8 * sl), ((8 * sl) + 7));
 
                        for (sct = 0; sct < 8; sct++) {
@@ -5631,23 +5613,20 @@ static void dasd_eckd_dump_sense_ccw(struct dasd_device *device,
 
                if (irb->ecw[27] & DASD_SENSE_BIT_0) {
                        /* 24 Byte Sense Data */
-                       sprintf(page + len, PRINTK_HEADER
-                               " 24 Byte: %x MSG %x, "
-                               "%s MSGb to SYSOP\n",
+                       sprintf(page + len,
+                               "24 Byte: %x MSG %x, %s MSGb to SYSOP\n",
                                irb->ecw[7] >> 4, irb->ecw[7] & 0x0f,
                                irb->ecw[1] & 0x10 ? "" : "no");
                } else {
                        /* 32 Byte Sense Data */
-                       sprintf(page + len, PRINTK_HEADER
-                               " 32 Byte: Format: %x "
-                               "Exception class %x\n",
+                       sprintf(page + len,
+                               "32 Byte: Format: %x Exception class %x\n",
                                irb->ecw[6] & 0x0f, irb->ecw[22] >> 4);
                }
        } else {
-               sprintf(page + len, PRINTK_HEADER
-                       " SORRY - NO VALID SENSE AVAILABLE\n");
+               sprintf(page + len, "SORRY - NO VALID SENSE AVAILABLE\n");
        }
-       printk(KERN_ERR "%s", page);
+       dev_err(dev, "%s", page);
 
        if (req) {
                /* req == NULL for unsolicited interrupts */
@@ -5656,8 +5635,8 @@ static void dasd_eckd_dump_sense_ccw(struct dasd_device *device,
                first = req->cpaddr;
                for (last = first; last->flags & (CCW_FLAG_CC | CCW_FLAG_DC); last++);
                to = min(first + 6, last);
-               printk(KERN_ERR PRINTK_HEADER " Related CP in req: %p\n", req);
-               dasd_eckd_dump_ccw_range(first, to, page);
+               dev_err(dev, "Related CP in req: %px\n", req);
+               dasd_eckd_dump_ccw_range(device, first, to, page);
 
                /* print failing CCW area (maximum 4) */
                /* scsw->cda is either valid or zero  */
@@ -5665,19 +5644,19 @@ static void dasd_eckd_dump_sense_ccw(struct dasd_device *device,
                fail = phys_to_virt(irb->scsw.cmd.cpa); /* failing CCW */
                if (from <  fail - 2) {
                        from = fail - 2;     /* there is a gap - print header */
-                       printk(KERN_ERR PRINTK_HEADER "......\n");
+                       dev_err(dev, "......\n");
                }
                to = min(fail + 1, last);
-               dasd_eckd_dump_ccw_range(from, to, page + len);
+               dasd_eckd_dump_ccw_range(device, from, to, page + len);
 
                /* print last CCWs (maximum 2) */
                len = 0;
                from = max(from, ++to);
                if (from < last - 1) {
                        from = last - 1;     /* there is a gap - print header */
-                       printk(KERN_ERR PRINTK_HEADER "......\n");
+                       dev_err(dev, "......\n");
                }
-               dasd_eckd_dump_ccw_range(from, last, page + len);
+               dasd_eckd_dump_ccw_range(device, from, last, page + len);
        }
        free_page((unsigned long) page);
 }
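
The dump keeps its long-standing shape -- format the whole report into a
scratch page, then emit it in a single call so the lines cannot interleave
with other CPUs' output -- and only the sink changes, from printk(KERN_ERR)
with a hand-built prefix to dev_err(). The skeleton of that pattern, heavily
abridged (names hypothetical):

	#include <linux/device.h>
	#include <linux/gfp.h>

	static void dump_sketch(struct device *dev, void *cpa)
	{
		char *page;
		int len;

		/* GFP_ATOMIC: dump paths may run in interrupt context. */
		page = (char *)get_zeroed_page(GFP_ATOMIC);
		if (!page)
			return;

		len = sprintf(page, "I/O status report:\n");
		len += sprintf(page + len, "Failing CCW: %px\n", cpa);
		dev_err(dev, "%s", page);	/* one call, lines stay together */

		free_page((unsigned long)page);
	}
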
@@ -5701,11 +5680,9 @@ static void dasd_eckd_dump_sense_tcw(struct dasd_device *device,
                return;
        }
        /* dump the sense data */
-       len = sprintf(page, PRINTK_HEADER
-                     " I/O status report for device %s:\n",
-                     dev_name(&device->cdev->dev));
-       len += sprintf(page + len, PRINTK_HEADER
-                      " in req: %p CC:%02X FC:%02X AC:%02X SC:%02X DS:%02X "
+       len = sprintf(page, "I/O status report:\n");
+       len += sprintf(page + len,
+                      "in req: %px CC:%02X FC:%02X AC:%02X SC:%02X DS:%02X "
                       "CS:%02X fcxs:%02X schxs:%02X RC:%d\n",
                       req, scsw_cc(&irb->scsw), scsw_fctl(&irb->scsw),
                       scsw_actl(&irb->scsw), scsw_stctl(&irb->scsw),
@@ -5713,9 +5690,7 @@ static void dasd_eckd_dump_sense_tcw(struct dasd_device *device,
                       irb->scsw.tm.fcxs,
                       (irb->scsw.tm.ifob << 7) | irb->scsw.tm.sesq,
                       req ? req->intrc : 0);
-       len += sprintf(page + len, PRINTK_HEADER
-                      " device %s: Failing TCW: %p\n",
-                      dev_name(&device->cdev->dev),
+       len += sprintf(page + len, "Failing TCW: %px\n",
                       phys_to_virt(irb->scsw.tm.tcw));
 
        tsb = NULL;
@@ -5724,47 +5699,37 @@ static void dasd_eckd_dump_sense_tcw(struct dasd_device *device,
                tsb = tcw_get_tsb(phys_to_virt(irb->scsw.tm.tcw));
 
        if (tsb) {
-               len += sprintf(page + len, PRINTK_HEADER
-                              " tsb->length %d\n", tsb->length);
-               len += sprintf(page + len, PRINTK_HEADER
-                              " tsb->flags %x\n", tsb->flags);
-               len += sprintf(page + len, PRINTK_HEADER
-                              " tsb->dcw_offset %d\n", tsb->dcw_offset);
-               len += sprintf(page + len, PRINTK_HEADER
-                              " tsb->count %d\n", tsb->count);
+               len += sprintf(page + len, "tsb->length %d\n", tsb->length);
+               len += sprintf(page + len, "tsb->flags %x\n", tsb->flags);
+               len += sprintf(page + len, "tsb->dcw_offset %d\n", tsb->dcw_offset);
+               len += sprintf(page + len, "tsb->count %d\n", tsb->count);
                residual = tsb->count - 28;
-               len += sprintf(page + len, PRINTK_HEADER
-                              " residual %d\n", residual);
+               len += sprintf(page + len, "residual %d\n", residual);
 
                switch (tsb->flags & 0x07) {
                case 1: /* tsa_iostat */
-                       len += sprintf(page + len, PRINTK_HEADER
-                              " tsb->tsa.iostat.dev_time %d\n",
+                       len += sprintf(page + len, "tsb->tsa.iostat.dev_time %d\n",
                                       tsb->tsa.iostat.dev_time);
-                       len += sprintf(page + len, PRINTK_HEADER
-                              " tsb->tsa.iostat.def_time %d\n",
+                       len += sprintf(page + len, "tsb->tsa.iostat.def_time %d\n",
                                       tsb->tsa.iostat.def_time);
-                       len += sprintf(page + len, PRINTK_HEADER
-                              " tsb->tsa.iostat.queue_time %d\n",
+                       len += sprintf(page + len, "tsb->tsa.iostat.queue_time %d\n",
                                       tsb->tsa.iostat.queue_time);
-                       len += sprintf(page + len, PRINTK_HEADER
-                              " tsb->tsa.iostat.dev_busy_time %d\n",
+                       len += sprintf(page + len, "tsb->tsa.iostat.dev_busy_time %d\n",
                                       tsb->tsa.iostat.dev_busy_time);
-                       len += sprintf(page + len, PRINTK_HEADER
-                              " tsb->tsa.iostat.dev_act_time %d\n",
+                       len += sprintf(page + len, "tsb->tsa.iostat.dev_act_time %d\n",
                                       tsb->tsa.iostat.dev_act_time);
                        sense = tsb->tsa.iostat.sense;
                        break;
                case 2: /* ts_ddpc */
-                       len += sprintf(page + len, PRINTK_HEADER
-                              " tsb->tsa.ddpc.rc %d\n", tsb->tsa.ddpc.rc);
+                       len += sprintf(page + len, "tsb->tsa.ddpc.rc %d\n",
+                                      tsb->tsa.ddpc.rc);
                        for (sl = 0; sl < 2; sl++) {
-                               len += sprintf(page + len, PRINTK_HEADER
-                                              " tsb->tsa.ddpc.rcq %2d-%2d: ",
+                               len += sprintf(page + len,
+                                              "tsb->tsa.ddpc.rcq %2d-%2d: ",
                                               (8 * sl), ((8 * sl) + 7));
                                rcq = tsb->tsa.ddpc.rcq;
                                for (sct = 0; sct < 8; sct++) {
-                                       len += sprintf(page + len, " %02x",
+                                       len += sprintf(page + len, "%02x",
                                                       rcq[8 * sl + sct]);
                                }
                                len += sprintf(page + len, "\n");
@@ -5772,15 +5737,15 @@ static void dasd_eckd_dump_sense_tcw(struct dasd_device *device,
                        sense = tsb->tsa.ddpc.sense;
                        break;
                case 3: /* tsa_intrg */
-                       len += sprintf(page + len, PRINTK_HEADER
-                                     " tsb->tsa.intrg.: not supported yet\n");
+                       len += sprintf(page + len,
+                                     "tsb->tsa.intrg.: not supported yet\n");
                        break;
                }
 
                if (sense) {
                        for (sl = 0; sl < 4; sl++) {
-                               len += sprintf(page + len, PRINTK_HEADER
-                                              " Sense(hex) %2d-%2d:",
+                               len += sprintf(page + len,
+                                              "Sense(hex) %2d-%2d:",
                                               (8 * sl), ((8 * sl) + 7));
                                for (sct = 0; sct < 8; sct++) {
                                        len += sprintf(page + len, " %02x",
@@ -5791,27 +5756,23 @@ static void dasd_eckd_dump_sense_tcw(struct dasd_device *device,
 
                        if (sense[27] & DASD_SENSE_BIT_0) {
                                /* 24 Byte Sense Data */
-                               sprintf(page + len, PRINTK_HEADER
-                                       " 24 Byte: %x MSG %x, "
-                                       "%s MSGb to SYSOP\n",
+                               sprintf(page + len,
+                                       "24 Byte: %x MSG %x, %s MSGb to SYSOP\n",
                                        sense[7] >> 4, sense[7] & 0x0f,
                                        sense[1] & 0x10 ? "" : "no");
                        } else {
                                /* 32 Byte Sense Data */
-                               sprintf(page + len, PRINTK_HEADER
-                                       " 32 Byte: Format: %x "
-                                       "Exception class %x\n",
+                               sprintf(page + len,
+                                       "32 Byte: Format: %x Exception class %x\n",
                                        sense[6] & 0x0f, sense[22] >> 4);
                        }
                } else {
-                       sprintf(page + len, PRINTK_HEADER
-                               " SORRY - NO VALID SENSE AVAILABLE\n");
+                       sprintf(page + len, "SORRY - NO VALID SENSE AVAILABLE\n");
                }
        } else {
-               sprintf(page + len, PRINTK_HEADER
-                       " SORRY - NO TSB DATA AVAILABLE\n");
+               sprintf(page + len, "SORRY - NO TSB DATA AVAILABLE\n");
        }
-       printk(KERN_ERR "%s", page);
+       dev_err(&device->cdev->dev, "%s", page);
        free_page((unsigned long) page);
 }
 
@@ -6865,17 +6826,9 @@ static void dasd_eckd_handle_hpf_error(struct dasd_device *device,
        dasd_schedule_requeue(device);
 }
 
-/*
- * Initialize block layer request queue.
- */
-static void dasd_eckd_setup_blk_queue(struct dasd_block *block)
+static unsigned int dasd_eckd_max_sectors(struct dasd_block *block)
 {
-       unsigned int logical_block_size = block->bp_block;
-       struct request_queue *q = block->gdp->queue;
-       struct dasd_device *device = block->base;
-       int max;
-
-       if (device->features & DASD_FEATURE_USERAW) {
+       if (block->base->features & DASD_FEATURE_USERAW) {
                /*
                 * the max_blocks value for raw_track access is 256
                 * it is higher than the native ECKD value because we
@@ -6883,19 +6836,10 @@ static void dasd_eckd_setup_blk_queue(struct dasd_block *block)
                 * so the max_hw_sectors are
                 * 2048 x 512B = 1024kB = 16 tracks
                 */
-               max = DASD_ECKD_MAX_BLOCKS_RAW << block->s2b_shift;
-       } else {
-               max = DASD_ECKD_MAX_BLOCKS << block->s2b_shift;
+               return DASD_ECKD_MAX_BLOCKS_RAW << block->s2b_shift;
        }
-       blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
-       q->limits.max_dev_sectors = max;
-       blk_queue_logical_block_size(q, logical_block_size);
-       blk_queue_max_hw_sectors(q, max);
-       blk_queue_max_segments(q, USHRT_MAX);
-       /* With page sized segments each segment can be translated into one idaw/tidaw */
-       blk_queue_max_segment_size(q, PAGE_SIZE);
-       blk_queue_segment_boundary(q, PAGE_SIZE - 1);
-       blk_queue_dma_alignment(q, PAGE_SIZE - 1);
+
+       return DASD_ECKD_MAX_BLOCKS << block->s2b_shift;
 }
 
 static struct ccw_driver dasd_eckd_driver = {
@@ -6927,7 +6871,7 @@ static struct dasd_discipline dasd_eckd_discipline = {
        .basic_to_ready = dasd_eckd_basic_to_ready,
        .online_to_ready = dasd_eckd_online_to_ready,
        .basic_to_known = dasd_eckd_basic_to_known,
-       .setup_blk_queue = dasd_eckd_setup_blk_queue,
+       .max_sectors = dasd_eckd_max_sectors,
        .fill_geometry = dasd_eckd_fill_geometry,
        .start_IO = dasd_start_IO,
        .term_IO = dasd_term_IO,
index c956de711cf78cb15d4e218b224244ae4b793147..5064a616e041a310c1db35c0cbeacdd7980f0b86 100644 (file)
@@ -7,8 +7,6 @@
  *  Author(s): Stefan Weinhuber <wein@de.ibm.com>
  */
 
-#define KMSG_COMPONENT "dasd-eckd"
-
 #include <linux/init.h>
 #include <linux/fs.h>
 #include <linux/kernel.h>
 #include "dasd_int.h"
 #include "dasd_eckd.h"
 
-#ifdef PRINTK_HEADER
-#undef PRINTK_HEADER
-#endif                         /* PRINTK_HEADER */
-#define PRINTK_HEADER "dasd(eer):"
-
 /*
  * SECTION: the internal buffer
  */
index c07e6e71351835eaf298cd30ba9281a1ef01004d..4c0d3a704513cc9f602317ed899e961d2a842701 100644 (file)
@@ -9,8 +9,6 @@
  *
  */
 
-#define KMSG_COMPONENT "dasd"
-
 #include <linux/ctype.h>
 #include <linux/init.h>
 
@@ -18,9 +16,6 @@
 #include <asm/ebcdic.h>
 #include <linux/uaccess.h>
 
-/* This is ugly... */
-#define PRINTK_HEADER "dasd_erp:"
-
 #include "dasd_int.h"
 
 struct dasd_ccw_req *
@@ -170,12 +165,12 @@ dasd_log_sense(struct dasd_ccw_req *cqr, struct irb *irb)
        device = cqr->startdev;
        if (cqr->intrc == -ETIMEDOUT) {
                dev_err(&device->cdev->dev,
-                       "A timeout error occurred for cqr %p\n", cqr);
+                       "A timeout error occurred for cqr %px\n", cqr);
                return;
        }
        if (cqr->intrc == -ENOLINK) {
                dev_err(&device->cdev->dev,
-                       "A transport error occurred for cqr %p\n", cqr);
+                       "A transport error occurred for cqr %px\n", cqr);
                return;
        }
        /* dump sense data */
index c06fa2b27120572bca1695c10690ed8fa8f4a9b3..bcbb2f8e91feb6ca1a674742188083d5a0cdb49d 100644 (file)
 #include "dasd_int.h"
 #include "dasd_fba.h"
 
-#ifdef PRINTK_HEADER
-#undef PRINTK_HEADER
-#endif                         /* PRINTK_HEADER */
-#define PRINTK_HEADER "dasd(fba):"
-
 #define FBA_DEFAULT_RETRIES 32
 
 #define DASD_FBA_CCW_WRITE 0x41
@@ -660,30 +655,27 @@ static void
 dasd_fba_dump_sense(struct dasd_device *device, struct dasd_ccw_req * req,
                    struct irb *irb)
 {
-       char *page;
        struct ccw1 *act, *end, *last;
        int len, sl, sct, count;
+       struct device *dev;
+       char *page;
+
+       dev = &device->cdev->dev;
 
        page = (char *) get_zeroed_page(GFP_ATOMIC);
        if (page == NULL) {
                DBF_DEV_EVENT(DBF_WARNING, device, "%s",
-                           "No memory to dump sense data");
+                             "No memory to dump sense data");
                return;
        }
-       len = sprintf(page, PRINTK_HEADER
-                     " I/O status report for device %s:\n",
-                     dev_name(&device->cdev->dev));
-       len += sprintf(page + len, PRINTK_HEADER
-                      " in req: %p CS: 0x%02X DS: 0x%02X\n", req,
-                      irb->scsw.cmd.cstat, irb->scsw.cmd.dstat);
-       len += sprintf(page + len, PRINTK_HEADER
-                      " device %s: Failing CCW: %p\n",
-                      dev_name(&device->cdev->dev),
+       len = sprintf(page, "I/O status report:\n");
+       len += sprintf(page + len, "in req: %px CS: 0x%02X DS: 0x%02X\n",
+                      req, irb->scsw.cmd.cstat, irb->scsw.cmd.dstat);
+       len += sprintf(page + len, "Failing CCW: %px\n",
                       (void *) (addr_t) irb->scsw.cmd.cpa);
        if (irb->esw.esw0.erw.cons) {
                for (sl = 0; sl < 4; sl++) {
-                       len += sprintf(page + len, PRINTK_HEADER
-                                      " Sense(hex) %2d-%2d:",
+                       len += sprintf(page + len, "Sense(hex) %2d-%2d:",
                                       (8 * sl), ((8 * sl) + 7));
 
                        for (sct = 0; sct < 8; sct++) {
@@ -693,20 +685,18 @@ dasd_fba_dump_sense(struct dasd_device *device, struct dasd_ccw_req * req,
                        len += sprintf(page + len, "\n");
                }
        } else {
-               len += sprintf(page + len, PRINTK_HEADER
-                              " SORRY - NO VALID SENSE AVAILABLE\n");
+               len += sprintf(page + len, "SORRY - NO VALID SENSE AVAILABLE\n");
        }
-       printk(KERN_ERR "%s", page);
+       dev_err(dev, "%s", page);
 
        /* dump the Channel Program */
        /* print first CCWs (maximum 8) */
        act = req->cpaddr;
-        for (last = act; last->flags & (CCW_FLAG_CC | CCW_FLAG_DC); last++);
+       for (last = act; last->flags & (CCW_FLAG_CC | CCW_FLAG_DC); last++);
        end = min(act + 8, last);
-       len = sprintf(page, PRINTK_HEADER " Related CP in req: %p\n", req);
+       len = sprintf(page, "Related CP in req: %px\n", req);
        while (act <= end) {
-               len += sprintf(page + len, PRINTK_HEADER
-                              " CCW %p: %08X %08X DAT:",
+               len += sprintf(page + len, "CCW %px: %08X %08X DAT:",
                               act, ((int *) act)[0], ((int *) act)[1]);
                for (count = 0; count < 32 && count < act->count;
                     count += sizeof(int))
@@ -716,19 +706,17 @@ dasd_fba_dump_sense(struct dasd_device *device, struct dasd_ccw_req * req,
                len += sprintf(page + len, "\n");
                act++;
        }
-       printk(KERN_ERR "%s", page);
-
+       dev_err(dev, "%s", page);
 
        /* print failing CCW area */
        len = 0;
        if (act <  ((struct ccw1 *)(addr_t) irb->scsw.cmd.cpa) - 2) {
                act = ((struct ccw1 *)(addr_t) irb->scsw.cmd.cpa) - 2;
-               len += sprintf(page + len, PRINTK_HEADER "......\n");
+               len += sprintf(page + len, "......\n");
        }
        end = min((struct ccw1 *)(addr_t) irb->scsw.cmd.cpa + 2, last);
        while (act <= end) {
-               len += sprintf(page + len, PRINTK_HEADER
-                              " CCW %p: %08X %08X DAT:",
+               len += sprintf(page + len, "CCW %px: %08X %08X DAT:",
                               act, ((int *) act)[0], ((int *) act)[1]);
                for (count = 0; count < 32 && count < act->count;
                     count += sizeof(int))
@@ -742,11 +730,10 @@ dasd_fba_dump_sense(struct dasd_device *device, struct dasd_ccw_req * req,
        /* print last CCWs */
        if (act <  last - 2) {
                act = last - 2;
-               len += sprintf(page + len, PRINTK_HEADER "......\n");
+               len += sprintf(page + len, "......\n");
        }
        while (act <= last) {
-               len += sprintf(page + len, PRINTK_HEADER
-                              " CCW %p: %08X %08X DAT:",
+               len += sprintf(page + len, "CCW %px: %08X %08X DAT:",
                               act, ((int *) act)[0], ((int *) act)[1]);
                for (count = 0; count < 32 && count < act->count;
                     count += sizeof(int))
@@ -757,39 +744,13 @@ dasd_fba_dump_sense(struct dasd_device *device, struct dasd_ccw_req * req,
                act++;
        }
        if (len > 0)
-               printk(KERN_ERR "%s", page);
+               dev_err(dev, "%s", page);
        free_page((unsigned long) page);
 }
 
-/*
- * Initialize block layer request queue.
- */
-static void dasd_fba_setup_blk_queue(struct dasd_block *block)
+static unsigned int dasd_fba_max_sectors(struct dasd_block *block)
 {
-       unsigned int logical_block_size = block->bp_block;
-       struct request_queue *q = block->gdp->queue;
-       unsigned int max_bytes, max_discard_sectors;
-       int max;
-
-       max = DASD_FBA_MAX_BLOCKS << block->s2b_shift;
-       blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
-       q->limits.max_dev_sectors = max;
-       blk_queue_logical_block_size(q, logical_block_size);
-       blk_queue_max_hw_sectors(q, max);
-       blk_queue_max_segments(q, USHRT_MAX);
-       /* With page sized segments each segment can be translated into one idaw/tidaw */
-       blk_queue_max_segment_size(q, PAGE_SIZE);
-       blk_queue_segment_boundary(q, PAGE_SIZE - 1);
-
-       q->limits.discard_granularity = logical_block_size;
-
-       /* Calculate max_discard_sectors and make it PAGE aligned */
-       max_bytes = USHRT_MAX * logical_block_size;
-       max_bytes = ALIGN_DOWN(max_bytes, PAGE_SIZE);
-       max_discard_sectors = max_bytes / logical_block_size;
-
-       blk_queue_max_discard_sectors(q, max_discard_sectors);
-       blk_queue_max_write_zeroes_sectors(q, max_discard_sectors);
+       return DASD_FBA_MAX_BLOCKS << block->s2b_shift;
 }
 
 static int dasd_fba_pe_handler(struct dasd_device *device,
@@ -802,10 +763,11 @@ static struct dasd_discipline dasd_fba_discipline = {
        .owner = THIS_MODULE,
        .name = "FBA ",
        .ebcname = "FBA ",
+       .has_discard = true,
        .check_device = dasd_fba_check_characteristics,
        .do_analysis = dasd_fba_do_analysis,
        .pe_handler = dasd_fba_pe_handler,
-       .setup_blk_queue = dasd_fba_setup_blk_queue,
+       .max_sectors = dasd_fba_max_sectors,
        .fill_geometry = dasd_fba_fill_geometry,
        .start_IO = dasd_start_IO,
        .term_IO = dasd_term_IO,
index 55e3abe94cde2f617f34afc1c5ad4c98234c34f9..4533dd055ca8e31d7fc8bf711cf0ad0dd19653cf 100644 (file)
@@ -11,8 +11,6 @@
  *
  */
 
-#define KMSG_COMPONENT "dasd"
-
 #include <linux/interrupt.h>
 #include <linux/major.h>
 #include <linux/fs.h>
@@ -20,9 +18,6 @@
 
 #include <linux/uaccess.h>
 
-/* This is ugly... */
-#define PRINTK_HEADER "dasd_gendisk:"
-
 #include "dasd_int.h"
 
 static unsigned int queue_depth = 32;
@@ -39,6 +34,16 @@ MODULE_PARM_DESC(nr_hw_queues, "Default number of hardware queues for new DASD d
  */
 int dasd_gendisk_alloc(struct dasd_block *block)
 {
+       struct queue_limits lim = {
+               /*
+                * With page sized segments, each segment can be translated into
+                * one idaw/tidaw.
+                */
+               .max_segment_size = PAGE_SIZE,
+               .seg_boundary_mask = PAGE_SIZE - 1,
+               .dma_alignment = PAGE_SIZE - 1,
+               .max_segments = USHRT_MAX,
+       };
        struct gendisk *gdp;
        struct dasd_device *base;
        int len, rc;
@@ -58,11 +63,12 @@ int dasd_gendisk_alloc(struct dasd_block *block)
        if (rc)
                return rc;
 
-       gdp = blk_mq_alloc_disk(&block->tag_set, block);
+       gdp = blk_mq_alloc_disk(&block->tag_set, &lim, block);
        if (IS_ERR(gdp)) {
                blk_mq_free_tag_set(&block->tag_set);
                return PTR_ERR(gdp);
        }
+       blk_queue_flag_set(QUEUE_FLAG_NONROT, gdp->queue);
 
        /* Initialize gendisk structure. */
        gdp->major = DASD_MAJOR;
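
This is the other half of the setup_blk_queue() removal: since the 6.9 block
API, limits are handed to blk_mq_alloc_disk() at allocation time as a struct
queue_limits rather than set afterwards with individual blk_queue_*() calls.
A condensed sketch of the new calling convention (assuming an already
initialized tag set):

	#include <linux/blk-mq.h>

	static struct gendisk *alloc_disk_sketch(struct blk_mq_tag_set *set,
						 void *queuedata)
	{
		struct queue_limits lim = {
			/* one page-sized segment == one idaw/tidaw */
			.max_segment_size	= PAGE_SIZE,
			.seg_boundary_mask	= PAGE_SIZE - 1,
			.dma_alignment		= PAGE_SIZE - 1,
			.max_segments		= USHRT_MAX,
		};

		/* Limits are applied atomically at allocation; returns
		 * ERR_PTR() on failure, never NULL. */
		return blk_mq_alloc_disk(set, &lim, queuedata);
	}
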
@@ -127,15 +133,15 @@ void dasd_gendisk_free(struct dasd_block *block)
  */
 int dasd_scan_partitions(struct dasd_block *block)
 {
-       struct bdev_handle *bdev_handle;
+       struct file *bdev_file;
        int rc;
 
-       bdev_handle = bdev_open_by_dev(disk_devt(block->gdp), BLK_OPEN_READ,
+       bdev_file = bdev_file_open_by_dev(disk_devt(block->gdp), BLK_OPEN_READ,
                                       NULL, NULL);
-       if (IS_ERR(bdev_handle)) {
+       if (IS_ERR(bdev_file)) {
                DBF_DEV_EVENT(DBF_ERR, block->base,
                              "scan partitions error, blkdev_get returned %ld",
-                             PTR_ERR(bdev_handle));
+                             PTR_ERR(bdev_file));
                return -ENODEV;
        }
 
@@ -147,15 +153,15 @@ int dasd_scan_partitions(struct dasd_block *block)
                                "scan partitions error, rc %d", rc);
 
        /*
-        * Since the matching bdev_release() call to the
-        * bdev_open_by_path() in this function is not called before
+        * Since the matching fput() call to the
+        * bdev_file_open_by_path() in this function is not called before
         * dasd_destroy_partitions the offline open_count limit needs to be
-        * increased from 0 to 1. This is done by setting device->bdev_handle
+        * increased from 0 to 1. This is done by setting device->bdev_file
         * (see dasd_generic_set_offline). As long as the partition detection
         * is running no offline should be allowed. That is why the assignment
-        * to block->bdev_handle is done AFTER the BLKRRPART ioctl.
+        * to block->bdev_file is done AFTER the BLKRRPART ioctl.
         */
-       block->bdev_handle = bdev_handle;
+       block->bdev_file = bdev_file;
        return 0;
 }
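
Part of the tree-wide move from the short-lived struct bdev_handle to a plain
struct file for block device opens: bdev_file_open_by_dev() returns a file,
the block_device behind it is reached via file_bdev(), and the matching
release is an ordinary fput(). The open/close pairing in isolation (error
handling trimmed, function name hypothetical):

	#include <linux/blkdev.h>
	#include <linux/file.h>

	static int peek_disk_sketch(dev_t devt)
	{
		struct file *bdev_file;
		struct block_device *bdev;

		bdev_file = bdev_file_open_by_dev(devt, BLK_OPEN_READ,
						  NULL, NULL);
		if (IS_ERR(bdev_file))
			return PTR_ERR(bdev_file);

		bdev = file_bdev(bdev_file);	/* the underlying device */
		/* ... inspect bdev ... */

		fput(bdev_file);		/* replaces bdev_release() */
		return 0;
	}
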
 
@@ -165,21 +171,21 @@ int dasd_scan_partitions(struct dasd_block *block)
  */
 void dasd_destroy_partitions(struct dasd_block *block)
 {
-       struct bdev_handle *bdev_handle;
+       struct file *bdev_file;
 
        /*
-        * Get the bdev_handle pointer from the device structure and clear
-        * device->bdev_handle to lower the offline open_count limit again.
+        * Get the bdev_file pointer from the device structure and clear
+        * device->bdev_file to lower the offline open_count limit again.
         */
-       bdev_handle = block->bdev_handle;
-       block->bdev_handle = NULL;
+       bdev_file = block->bdev_file;
+       block->bdev_file = NULL;
 
-       mutex_lock(&bdev_handle->bdev->bd_disk->open_mutex);
-       bdev_disk_changed(bdev_handle->bdev->bd_disk, true);
-       mutex_unlock(&bdev_handle->bdev->bd_disk->open_mutex);
+       mutex_lock(&file_bdev(bdev_file)->bd_disk->open_mutex);
+       bdev_disk_changed(file_bdev(bdev_file)->bd_disk, true);
+       mutex_unlock(&file_bdev(bdev_file)->bd_disk->open_mutex);
 
        /* Matching blkdev_put to the blkdev_get in dasd_scan_partitions. */
-       bdev_release(bdev_handle);
+       fput(bdev_file);
 }
 
 int dasd_gendisk_init(void)
index 1b1b8a41c4d42e6145be51537785b609e37c73be..e5f40536b4254027c3c181a6efb29e318fa72f7a 100644 (file)
@@ -113,9 +113,6 @@ do { \
                            __dev_id.ssid, __dev_id.devno, d_data);     \
 } while (0)
 
-/* limit size for an errorstring */
-#define ERRORLENGTH 30
-
 /* definition of dbf debug levels */
 #define        DBF_EMERG       0       /* system is unusable                   */
 #define        DBF_ALERT       1       /* action must be taken immediately     */
@@ -126,32 +123,6 @@ do { \
 #define        DBF_INFO        6       /* informational                        */
 #define        DBF_DEBUG       6       /* debug-level messages                 */
 
-/* messages to be written via klogd and dbf */
-#define DEV_MESSAGE(d_loglevel,d_device,d_string,d_args...)\
-do { \
-       printk(d_loglevel PRINTK_HEADER " %s: " d_string "\n", \
-              dev_name(&d_device->cdev->dev), d_args); \
-       DBF_DEV_EVENT(DBF_ALERT, d_device, d_string, d_args); \
-} while(0)
-
-#define MESSAGE(d_loglevel,d_string,d_args...)\
-do { \
-       printk(d_loglevel PRINTK_HEADER " " d_string "\n", d_args); \
-       DBF_EVENT(DBF_ALERT, d_string, d_args); \
-} while(0)
-
-/* messages to be written via klogd only */
-#define DEV_MESSAGE_LOG(d_loglevel,d_device,d_string,d_args...)\
-do { \
-       printk(d_loglevel PRINTK_HEADER " %s: " d_string "\n", \
-              dev_name(&d_device->cdev->dev), d_args); \
-} while(0)
-
-#define MESSAGE_LOG(d_loglevel,d_string,d_args...)\
-do { \
-       printk(d_loglevel PRINTK_HEADER " " d_string "\n", d_args); \
-} while(0)
-
 /* Macro to calculate number of blocks per page */
 #define BLOCKS_PER_PAGE(blksize) (PAGE_SIZE / blksize)
 
@@ -322,6 +293,7 @@ struct dasd_discipline {
        struct module *owner;
        char ebcname[8];        /* a name used for tagging and printks */
        char name[8];           /* a name used for tagging and printks */
+       bool has_discard;
 
        struct list_head list;  /* used for list of disciplines */
 
@@ -360,10 +332,7 @@ struct dasd_discipline {
        int (*online_to_ready) (struct dasd_device *);
        int (*basic_to_known)(struct dasd_device *);
 
-       /*
-        * Initialize block layer request queue.
-        */
-       void (*setup_blk_queue)(struct dasd_block *);
+       unsigned int (*max_sectors)(struct dasd_block *);
        /* (struct dasd_device *);
         * Device operation functions. build_cp creates a ccw chain for
         * a block device request, start_io starts the request and
@@ -650,7 +619,7 @@ struct dasd_block {
        struct gendisk *gdp;
        spinlock_t request_queue_lock;
        struct blk_mq_tag_set tag_set;
-       struct bdev_handle *bdev_handle;
+       struct file *bdev_file;
        atomic_t open_count;
 
        unsigned long blocks;      /* size of volume in blocks */
index 61b9675e2a675e9dc1ec69101d8a4bc368d461c8..7e0ed7032f76a80ce3079d21537bbae6f5126455 100644 (file)
@@ -10,8 +10,6 @@
  * i/o controls for the dasd driver.
  */
 
-#define KMSG_COMPONENT "dasd"
-
 #include <linux/interrupt.h>
 #include <linux/compat.h>
 #include <linux/major.h>
 #include <linux/uaccess.h>
 #include <linux/dasd_mod.h>
 
-/* This is ugly... */
-#define PRINTK_HEADER "dasd_ioctl:"
-
 #include "dasd_int.h"
 
-
 static int
 dasd_ioctl_api_version(void __user *argp)
 {
@@ -537,7 +531,7 @@ static int __dasd_ioctl_information(struct dasd_block *block,
         * This must be hidden from user-space.
         */
        dasd_info->open_count = atomic_read(&block->open_count);
-       if (!block->bdev_handle)
+       if (!block->bdev_file)
                dasd_info->open_count++;
 
        /*
index 62a859ea67f8936f5b170cbc590b8b7e37a0fb37..0faaa437d9be8535a2cfe8c4aefa9b42463d4fc0 100644 (file)
@@ -11,8 +11,6 @@
  *
  */
 
-#define KMSG_COMPONENT "dasd"
-
 #include <linux/ctype.h>
 #include <linux/slab.h>
 #include <linux/string.h>
@@ -23,9 +21,6 @@
 #include <asm/debug.h>
 #include <linux/uaccess.h>
 
-/* This is ugly... */
-#define PRINTK_HEADER "dasd_proc:"
-
 #include "dasd_int.h"
 
 static struct proc_dir_entry *dasd_proc_root_entry = NULL;
index 4b7ecd4fd4319c000d2a4f1101022eafae0f2291..9c8f529b827cb3556e07bb0c3ed7ce51e5873cff 100644 (file)
@@ -546,6 +546,9 @@ static const struct attribute_group *dcssblk_dev_attr_groups[] = {
 static ssize_t
 dcssblk_add_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count)
 {
+       struct queue_limits lim = {
+               .logical_block_size     = 4096,
+       };
        int rc, i, j, num_of_segments;
        struct dcssblk_dev_info *dev_info;
        struct segment_info *seg_info, *temp;
@@ -629,9 +632,9 @@ dcssblk_add_store(struct device *dev, struct device_attribute *attr, const char
        dev_info->dev.release = dcssblk_release_segment;
        dev_info->dev.groups = dcssblk_dev_attr_groups;
        INIT_LIST_HEAD(&dev_info->lh);
-       dev_info->gd = blk_alloc_disk(NUMA_NO_NODE);
-       if (dev_info->gd == NULL) {
-               rc = -ENOMEM;
+       dev_info->gd = blk_alloc_disk(&lim, NUMA_NO_NODE);
+       if (IS_ERR(dev_info->gd)) {
+               rc = PTR_ERR(dev_info->gd);
                goto seg_list_del;
        }
        dev_info->gd->major = dcssblk_major;
@@ -639,7 +642,6 @@ dcssblk_add_store(struct device *dev, struct device_attribute *attr, const char
        dev_info->gd->fops = &dcssblk_devops;
        dev_info->gd->private_data = dev_info;
        dev_info->gd->flags |= GENHD_FL_NO_PART;
-       blk_queue_logical_block_size(dev_info->gd->queue, 4096);
        blk_queue_flag_set(QUEUE_FLAG_DAX, dev_info->gd->queue);
 
        seg_byte_size = (dev_info->end - dev_info->start + 1);
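
blk_alloc_disk() now also takes a queue_limits argument and, importantly,
reports failure as ERR_PTR() rather than NULL, hence the switch from a NULL
test to IS_ERR()/PTR_ERR() above. The updated calling convention for a
BIO-based disk, as a sketch:

	#include <linux/blkdev.h>

	static struct gendisk *alloc_bio_disk_sketch(void)
	{
		struct queue_limits lim = {
			.logical_block_size = 4096,
		};
		struct gendisk *gd;

		gd = blk_alloc_disk(&lim, NUMA_NO_NODE);
		if (IS_ERR(gd))
			pr_err("disk allocation failed: %ld\n", PTR_ERR(gd));
		return gd;	/* caller checks IS_ERR(), not NULL */
	}
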
index ade95e91b3c8db6f657fdcb7bcec2c8589175e43..9f6fdd0daa74eb33bbf4fd7f570df7ce192d8b4b 100644 (file)
@@ -435,10 +435,17 @@ static const struct blk_mq_ops scm_mq_ops = {
 
 int scm_blk_dev_setup(struct scm_blk_dev *bdev, struct scm_device *scmdev)
 {
-       unsigned int devindex, nr_max_blk;
+       struct queue_limits lim = {
+               .logical_block_size     = 1 << 12,
+       };
+       unsigned int devindex;
        struct request_queue *rq;
        int len, ret;
 
+       lim.max_segments = min(scmdev->nr_max_block,
+               (unsigned int) (PAGE_SIZE / sizeof(struct aidaw)));
+       lim.max_hw_sectors = lim.max_segments << 3; /* 8 * 512 = blk_size */
+
        devindex = atomic_inc_return(&nr_devices) - 1;
        /* scma..scmz + scmaa..scmzz */
        if (devindex > 701) {
@@ -462,18 +469,12 @@ int scm_blk_dev_setup(struct scm_blk_dev *bdev, struct scm_device *scmdev)
        if (ret)
                goto out;
 
-       bdev->gendisk = blk_mq_alloc_disk(&bdev->tag_set, scmdev);
+       bdev->gendisk = blk_mq_alloc_disk(&bdev->tag_set, &lim, scmdev);
        if (IS_ERR(bdev->gendisk)) {
                ret = PTR_ERR(bdev->gendisk);
                goto out_tag;
        }
        rq = bdev->rq = bdev->gendisk->queue;
-       nr_max_blk = min(scmdev->nr_max_block,
-                        (unsigned int) (PAGE_SIZE / sizeof(struct aidaw)));
-
-       blk_queue_logical_block_size(rq, 1 << 12);
-       blk_queue_max_hw_sectors(rq, nr_max_blk << 3); /* 8 * 512 = blk_size */
-       blk_queue_max_segments(rq, nr_max_blk);
        blk_queue_flag_set(QUEUE_FLAG_NONROT, rq);
        blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, rq);
 
index c533d1dadc6bbb0f3f388ac62b01049ac99a72c5..a5dba3829769c7954ed2d3ba38800bc768fb0019 100644 (file)
@@ -202,7 +202,8 @@ int ccw_device_start_timeout_key(struct ccw_device *cdev, struct ccw1 *cpa,
                return -EINVAL;
        if (cdev->private->state == DEV_STATE_NOT_OPER)
                return -ENODEV;
-       if (cdev->private->state == DEV_STATE_VERIFY) {
+       if (cdev->private->state == DEV_STATE_VERIFY ||
+           cdev->private->flags.doverify) {
                /* Remember to fake irb when finished. */
                if (!cdev->private->flags.fake_irb) {
                        cdev->private->flags.fake_irb = FAKE_CMD_IRB;
@@ -214,8 +215,7 @@ int ccw_device_start_timeout_key(struct ccw_device *cdev, struct ccw1 *cpa,
        }
        if (cdev->private->state != DEV_STATE_ONLINE ||
            ((sch->schib.scsw.cmd.stctl & SCSW_STCTL_PRIM_STATUS) &&
-            !(sch->schib.scsw.cmd.stctl & SCSW_STCTL_SEC_STATUS)) ||
-           cdev->private->flags.doverify)
+            !(sch->schib.scsw.cmd.stctl & SCSW_STCTL_SEC_STATUS)))
                return -EBUSY;
        ret = cio_set_options (sch, flags);
        if (ret)
index b92a32b4b1141670cc2f3c2e8b91a8e1d526b26a..04c64ce0a1ca1a2006d31ca5c7ee819598f155c4 100644 (file)
@@ -255,9 +255,10 @@ static void qeth_l3_clear_ip_htable(struct qeth_card *card, int recover)
                if (!recover) {
                        hash_del(&addr->hnode);
                        kfree(addr);
-                       continue;
+               } else {
+                       /* prepare for recovery */
+                       addr->disp_flag = QETH_DISP_ADDR_ADD;
                }
-               addr->disp_flag = QETH_DISP_ADDR_ADD;
        }
 
        mutex_unlock(&card->ip_lock);
@@ -278,9 +279,11 @@ static void qeth_l3_recover_ip(struct qeth_card *card)
                if (addr->disp_flag == QETH_DISP_ADDR_ADD) {
                        rc = qeth_l3_register_addr_entry(card, addr);
 
-                       if (!rc) {
+                       if (!rc || rc == -EADDRINUSE || rc == -ENETDOWN) {
+                               /* keep it in the records */
                                addr->disp_flag = QETH_DISP_ADDR_DO_NOTHING;
                        } else {
+                               /* bad address */
                                hash_del(&addr->hnode);
                                kfree(addr);
                        }
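
Two behavioural fixes in one: entries that survive a clear-with-recover are
now explicitly re-marked QETH_DISP_ADDR_ADD, and re-registration tolerates
-EADDRINUSE and -ENETDOWN, keeping such addresses on record for the next
recovery pass instead of discarding them. The keep-or-drop rule in isolation
(helper name hypothetical):

	#include <linux/errno.h>
	#include <linux/types.h>

	/* Keep an address on success or on transient errors; anything
	 * else marks it as a bad address to be dropped. */
	static bool keep_after_register_sketch(int rc)
	{
		return !rc || rc == -EADDRINUSE || rc == -ENETDOWN;
	}
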
index addac7fbe37b9870380cc715acf923344071e6e6..9ce27092729c30a2791b329c117fa9314b268352 100644 (file)
@@ -1270,7 +1270,7 @@ source "drivers/scsi/arm/Kconfig"
 
 config JAZZ_ESP
        bool "MIPS JAZZ FAS216 SCSI support"
-       depends on MACH_JAZZ && SCSI
+       depends on MACH_JAZZ && SCSI=y
        select SCSI_SPI_ATTRS
        help
          This is the driver for the onboard SCSI host adapter of MIPS Magnum
index 19eee108db02145e55d6bebc33a03e4fffba1ef5..5c8d1ba3f8f3c9c2de41e7f111db57f2b4c3e63a 100644 (file)
@@ -319,17 +319,16 @@ static void fcoe_ctlr_announce(struct fcoe_ctlr *fip)
 {
        struct fcoe_fcf *sel;
        struct fcoe_fcf *fcf;
-       unsigned long flags;
 
        mutex_lock(&fip->ctlr_mutex);
-       spin_lock_irqsave(&fip->ctlr_lock, flags);
+       spin_lock_bh(&fip->ctlr_lock);
 
        kfree_skb(fip->flogi_req);
        fip->flogi_req = NULL;
        list_for_each_entry(fcf, &fip->fcfs, list)
                fcf->flogi_sent = 0;
 
-       spin_unlock_irqrestore(&fip->ctlr_lock, flags);
+       spin_unlock_bh(&fip->ctlr_lock);
        sel = fip->sel_fcf;
 
        if (sel && ether_addr_equal(sel->fcf_mac, fip->dest_addr))
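
These hunks put fcoe's ctlr_lock back on the _bh variants: the lock is taken
from process context and from the FIP timer/workqueue (softirq) side, and
evidently never from hard interrupt context, so saving and restoring the
interrupt flags at every site bought nothing. The locking pattern in
isolation (lock name hypothetical):

	#include <linux/spinlock.h>

	static DEFINE_SPINLOCK(demo_lock);	/* shared with softirq code */

	static void process_side_sketch(void)
	{
		/* Sufficient only while no hardirq path takes demo_lock;
		 * otherwise spin_lock_irqsave() stays mandatory. */
		spin_lock_bh(&demo_lock);
		/* ... touch data also used by a timer/tasklet ... */
		spin_unlock_bh(&demo_lock);
	}
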
@@ -700,7 +699,6 @@ int fcoe_ctlr_els_send(struct fcoe_ctlr *fip, struct fc_lport *lport,
 {
        struct fc_frame *fp;
        struct fc_frame_header *fh;
-       unsigned long flags;
        u16 old_xid;
        u8 op;
        u8 mac[ETH_ALEN];
@@ -734,11 +732,11 @@ int fcoe_ctlr_els_send(struct fcoe_ctlr *fip, struct fc_lport *lport,
                op = FIP_DT_FLOGI;
                if (fip->mode == FIP_MODE_VN2VN)
                        break;
-               spin_lock_irqsave(&fip->ctlr_lock, flags);
+               spin_lock_bh(&fip->ctlr_lock);
                kfree_skb(fip->flogi_req);
                fip->flogi_req = skb;
                fip->flogi_req_send = 1;
-               spin_unlock_irqrestore(&fip->ctlr_lock, flags);
+               spin_unlock_bh(&fip->ctlr_lock);
                schedule_work(&fip->timer_work);
                return -EINPROGRESS;
        case ELS_FDISC:
@@ -1707,11 +1705,10 @@ static int fcoe_ctlr_flogi_send_locked(struct fcoe_ctlr *fip)
 static int fcoe_ctlr_flogi_retry(struct fcoe_ctlr *fip)
 {
        struct fcoe_fcf *fcf;
-       unsigned long flags;
        int error;
 
        mutex_lock(&fip->ctlr_mutex);
-       spin_lock_irqsave(&fip->ctlr_lock, flags);
+       spin_lock_bh(&fip->ctlr_lock);
        LIBFCOE_FIP_DBG(fip, "re-sending FLOGI - reselect\n");
        fcf = fcoe_ctlr_select(fip);
        if (!fcf || fcf->flogi_sent) {
@@ -1722,7 +1719,7 @@ static int fcoe_ctlr_flogi_retry(struct fcoe_ctlr *fip)
                fcoe_ctlr_solicit(fip, NULL);
                error = fcoe_ctlr_flogi_send_locked(fip);
        }
-       spin_unlock_irqrestore(&fip->ctlr_lock, flags);
+       spin_unlock_bh(&fip->ctlr_lock);
        mutex_unlock(&fip->ctlr_mutex);
        return error;
 }
@@ -1739,9 +1736,8 @@ static int fcoe_ctlr_flogi_retry(struct fcoe_ctlr *fip)
 static void fcoe_ctlr_flogi_send(struct fcoe_ctlr *fip)
 {
        struct fcoe_fcf *fcf;
-       unsigned long flags;
 
-       spin_lock_irqsave(&fip->ctlr_lock, flags);
+       spin_lock_bh(&fip->ctlr_lock);
        fcf = fip->sel_fcf;
        if (!fcf || !fip->flogi_req_send)
                goto unlock;
@@ -1768,7 +1764,7 @@ static void fcoe_ctlr_flogi_send(struct fcoe_ctlr *fip)
        } else /* XXX */
                LIBFCOE_FIP_DBG(fip, "No FCF selected - defer send\n");
 unlock:
-       spin_unlock_irqrestore(&fip->ctlr_lock, flags);
+       spin_unlock_bh(&fip->ctlr_lock);
 }
 
 /**
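The fcoe_ctlr hunks above replace spin_lock_irqsave() with spin_lock_bh() on ctlr_lock, which suffices when a lock is contended only from process and softirq context. A minimal sketch of the two forms, using a hypothetical lock and counter:

    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(demo_lock);
    static unsigned long demo_events;

    /* Enough when only process and softirq context take the lock:
     * disables bottom halves, leaves hardirqs enabled. */
    static void demo_count_bh(void)
    {
            spin_lock_bh(&demo_lock);
            demo_events++;
            spin_unlock_bh(&demo_lock);
    }

    /* Required only if the lock is also taken from hardirq context. */
    static void demo_count_irq(void)
    {
            unsigned long flags;

            spin_lock_irqsave(&demo_lock, flags);
            demo_events++;
            spin_unlock_irqrestore(&demo_lock, flags);
    }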
index 2074937c05bc855dea5a580079b84fd677460fb5..ce73f08ee889f1409c43583d959baeae6f0fb895 100644
@@ -305,6 +305,7 @@ struct fnic {
        unsigned int copy_wq_base;
        struct work_struct link_work;
        struct work_struct frame_work;
+       struct work_struct flush_work;
        struct sk_buff_head frame_queue;
        struct sk_buff_head tx_queue;
 
@@ -363,7 +364,7 @@ void fnic_handle_event(struct work_struct *work);
 int fnic_rq_cmpl_handler(struct fnic *fnic, int);
 int fnic_alloc_rq_frame(struct vnic_rq *rq);
 void fnic_free_rq_buf(struct vnic_rq *rq, struct vnic_rq_buf *buf);
-void fnic_flush_tx(struct fnic *);
+void fnic_flush_tx(struct work_struct *work);
 void fnic_eth_send(struct fcoe_ctlr *, struct sk_buff *skb);
 void fnic_set_port_id(struct fc_lport *, u32, struct fc_frame *);
 void fnic_update_mac(struct fc_lport *, u8 *new);
index 5e312a55cc7da0c73811b0fc26aed17d8c6a034c..a08293b2ad9f59031d5220aba3480a84461a0e8a 100644
@@ -1182,7 +1182,7 @@ int fnic_send(struct fc_lport *lp, struct fc_frame *fp)
 
 /**
  * fnic_flush_tx() - send queued frames.
- * @fnic: fnic device
+ * @work: pointer to work element
  *
  * Send frames that were waiting to go out in FC or Ethernet mode.
  * Whenever changing modes we purge queued frames, so these frames should
@@ -1190,8 +1190,9 @@ int fnic_send(struct fc_lport *lp, struct fc_frame *fp)
  *
  * Called without fnic_lock held.
  */
-void fnic_flush_tx(struct fnic *fnic)
+void fnic_flush_tx(struct work_struct *work)
 {
+       struct fnic *fnic = container_of(work, struct fnic, flush_work);
        struct sk_buff *skb;
        struct fc_frame *fp;
 
index 5ed1d897311a88c0d1194bff7b36e9677a45166f..29eead383eb9a478bb71643eaac1a4e302418f0f 100644
@@ -830,6 +830,7 @@ static int fnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
                spin_lock_init(&fnic->vlans_lock);
                INIT_WORK(&fnic->fip_frame_work, fnic_handle_fip_frame);
                INIT_WORK(&fnic->event_work, fnic_handle_event);
+               INIT_WORK(&fnic->flush_work, fnic_flush_tx);
                skb_queue_head_init(&fnic->fip_frame_queue);
                INIT_LIST_HEAD(&fnic->evlist);
                INIT_LIST_HEAD(&fnic->vlans);
index 8d7fc5284293b5283523b049ba38387857ebb09e..fc4cee91b175c14d0950337ac8928b659cbc8cef 100644
@@ -680,7 +680,7 @@ static int fnic_fcpio_fw_reset_cmpl_handler(struct fnic *fnic,
 
        spin_unlock_irqrestore(&fnic->fnic_lock, flags);
 
-       fnic_flush_tx(fnic);
+       queue_work(fnic_event_queue, &fnic->flush_work);
 
  reset_cmpl_handler_end:
        fnic_clear_state_flags(fnic, FNIC_FLAGS_FWRESET);
@@ -736,7 +736,7 @@ static int fnic_fcpio_flogi_reg_cmpl_handler(struct fnic *fnic,
                }
                spin_unlock_irqrestore(&fnic->fnic_lock, flags);
 
-               fnic_flush_tx(fnic);
+               queue_work(fnic_event_queue, &fnic->flush_work);
                queue_work(fnic_event_queue, &fnic->frame_work);
        } else {
                spin_unlock_irqrestore(&fnic->fnic_lock, flags);
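The fnic change above converts a direct fnic_flush_tx() call into a queued work item; container_of() recovers the owning structure from the work pointer. A minimal sketch of that pattern, with a hypothetical demo_dev:

    #include <linux/workqueue.h>

    struct demo_dev {
            struct work_struct flush_work;
            /* queues, locks, ... */
    };

    static void demo_flush(struct work_struct *work)
    {
            struct demo_dev *dd = container_of(work, struct demo_dev,
                                               flush_work);
            /* drain dd's transmit queue here, in process context */
    }

    static void demo_setup(struct demo_dev *dd)
    {
            INIT_WORK(&dd->flush_work, demo_flush);
    }

    /* callers then replace a direct demo_flush() invocation with: */
    static void demo_kick(struct demo_dev *dd)
    {
            queue_work(system_wq, &dd->flush_work);
    }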
index 2a50fda3a628c3fdc9daa79d73437de565bc5891..625fd547ee60a79c3a8bae9e985cb74d06b9073f 100644
@@ -371,7 +371,6 @@ static u16 initio_se2_rd(unsigned long base, u8 addr)
  */
 static void initio_se2_wr(unsigned long base, u8 addr, u16 val)
 {
-       u8 rb;
        u8 instr;
        int i;
 
@@ -400,7 +399,7 @@ static void initio_se2_wr(unsigned long base, u8 addr, u16 val)
                udelay(30);
                outb(SE2CS, base + TUL_NVRAM);                  /* -CLK */
                udelay(30);
-               if ((rb = inb(base + TUL_NVRAM)) & SE2DI)
+               if (inb(base + TUL_NVRAM) & SE2DI)
                        break;  /* write complete */
        }
        outb(0, base + TUL_NVRAM);                              /* -CS */
index 71f711cb0628a70d40efc99520ef2dc807494e75..355a0bc0828e749a45513309b942cfdae4878a7c 100644
@@ -3387,7 +3387,7 @@ static enum sci_status isci_io_request_build(struct isci_host *ihost,
                return SCI_FAILURE;
        }
 
-       return SCI_SUCCESS;
+       return status;
 }
 
 static struct isci_request *isci_request_from_tag(struct isci_host *ihost, u16 tag)
index d26941b131fdb81e6bc9fe48ccc57b75a0055af5..bf879d81846b69379f34b91759a45ef8d5af89fb 100644
@@ -1918,7 +1918,7 @@ out:
  *
  * Returns the number of SGEs added to the SGL.
  **/
-static int
+static uint32_t
 lpfc_bg_setup_sgl(struct lpfc_hba *phba, struct scsi_cmnd *sc,
                struct sli4_sge *sgl, int datasegcnt,
                struct lpfc_io_buf *lpfc_cmd)
@@ -1926,8 +1926,8 @@ lpfc_bg_setup_sgl(struct lpfc_hba *phba, struct scsi_cmnd *sc,
        struct scatterlist *sgde = NULL; /* s/g data entry */
        struct sli4_sge_diseed *diseed = NULL;
        dma_addr_t physaddr;
-       int i = 0, num_sge = 0, status;
-       uint32_t reftag;
+       int i = 0, status;
+       uint32_t reftag, num_sge = 0;
        uint8_t txop, rxop;
 #ifdef CONFIG_SCSI_LPFC_DEBUG_FS
        uint32_t rc;
@@ -2099,7 +2099,7 @@ out:
  *
  * Returns the number of SGEs added to the SGL.
  **/
-static int
+static uint32_t
 lpfc_bg_setup_sgl_prot(struct lpfc_hba *phba, struct scsi_cmnd *sc,
                struct sli4_sge *sgl, int datacnt, int protcnt,
                struct lpfc_io_buf *lpfc_cmd)
@@ -2123,8 +2123,8 @@ lpfc_bg_setup_sgl_prot(struct lpfc_hba *phba, struct scsi_cmnd *sc,
        uint32_t rc;
 #endif
        uint32_t checking = 1;
-       uint32_t dma_offset = 0;
-       int num_sge = 0, j = 2;
+       uint32_t dma_offset = 0, num_sge = 0;
+       int j = 2;
        struct sli4_hybrid_sgl *sgl_xtra = NULL;
 
        sgpe = scsi_prot_sglist(sc);
index c0c8ab5869572f77fa11f1c2154e85802ff8a4e5..d32ad46318cb09af970085b3ab00fc376a934e4f 100644
@@ -1671,7 +1671,7 @@ mpi3mr_update_mr_sas_port(struct mpi3mr_ioc *mrioc, struct host_port *h_port,
 void
 mpi3mr_refresh_sas_ports(struct mpi3mr_ioc *mrioc)
 {
-       struct host_port h_port[64];
+       struct host_port *h_port = NULL;
        int i, j, found, host_port_count = 0, port_idx;
        u16 sz, attached_handle, ioc_status;
        struct mpi3_sas_io_unit_page0 *sas_io_unit_pg0 = NULL;
@@ -1685,6 +1685,10 @@ mpi3mr_refresh_sas_ports(struct mpi3mr_ioc *mrioc)
        sas_io_unit_pg0 = kzalloc(sz, GFP_KERNEL);
        if (!sas_io_unit_pg0)
                return;
+       h_port = kcalloc(64, sizeof(struct host_port), GFP_KERNEL);
+       if (!h_port)
+               goto out;
+
        if (mpi3mr_cfg_get_sas_io_unit_pg0(mrioc, sas_io_unit_pg0, sz)) {
                ioc_err(mrioc, "failure at %s:%d/%s()!\n",
                    __FILE__, __LINE__, __func__);
@@ -1814,6 +1818,7 @@ mpi3mr_refresh_sas_ports(struct mpi3mr_ioc *mrioc)
                }
        }
 out:
+       kfree(h_port);
        kfree(sas_io_unit_pg0);
 }
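The mpi3mr hunk swaps a 64-entry on-stack array for kcalloc(), since an array of struct host_port is large enough to overrun the kernel stack. A sketch of the pattern, with a hypothetical structure:

    #include <linux/slab.h>
    #include <linux/types.h>

    struct demo_port {
            u64 sas_address;
            u32 phy_mask;
            /* ... tens of bytes more in the real structure ... */
    };

    static void demo_refresh_ports(void)
    {
            struct demo_port *ports;

            /* 64 * sizeof(*ports) on the stack would risk overflow;
             * allocate from the heap and free on every exit path. */
            ports = kcalloc(64, sizeof(*ports), GFP_KERNEL);
            if (!ports)
                    return;

            /* ... populate and use ports[0..63] ... */

            kfree(ports);
    }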
 
index 8761bc58d965f0f6eb6776a4272ca856e8724463..b8120ca93c79740d7827ebff1652b4b22b296421 100644
@@ -7378,7 +7378,9 @@ _base_wait_for_iocstate(struct MPT3SAS_ADAPTER *ioc, int timeout)
                return -EFAULT;
        }
 
- issue_diag_reset:
+       return 0;
+
+issue_diag_reset:
        rc = _base_diag_reset(ioc);
        return rc;
 }
index 76d369343c7a9c2457e7d7bc16aa518815cbfc8f..8cad9792a56275b38f70595baf7fb095882a6c1f 100644
@@ -328,21 +328,39 @@ static int scsi_vpd_inquiry(struct scsi_device *sdev, unsigned char *buffer,
        return result + 4;
 }
 
+enum scsi_vpd_parameters {
+       SCSI_VPD_HEADER_SIZE = 4,
+       SCSI_VPD_LIST_SIZE = 36,
+};
+
 static int scsi_get_vpd_size(struct scsi_device *sdev, u8 page)
 {
-       unsigned char vpd_header[SCSI_VPD_HEADER_SIZE] __aligned(4);
+       unsigned char vpd[SCSI_VPD_LIST_SIZE] __aligned(4);
        int result;
 
        if (sdev->no_vpd_size)
                return SCSI_DEFAULT_VPD_LEN;
 
+       /*
+        * Fetch the supported pages VPD and validate that the requested page
+        * number is present.
+        */
+       if (page != 0) {
+               result = scsi_vpd_inquiry(sdev, vpd, 0, sizeof(vpd));
+               if (result < SCSI_VPD_HEADER_SIZE)
+                       return 0;
+
+               result -= SCSI_VPD_HEADER_SIZE;
+               if (!memchr(&vpd[SCSI_VPD_HEADER_SIZE], page, result))
+                       return 0;
+       }
        /*
         * Fetch the VPD page header to find out how big the page
         * is. This is done to prevent problems on legacy devices
         * which can not handle allocation lengths as large as
         * potentially requested by the caller.
         */
-       result = scsi_vpd_inquiry(sdev, vpd_header, page, sizeof(vpd_header));
+       result = scsi_vpd_inquiry(sdev, vpd, page, SCSI_VPD_HEADER_SIZE);
        if (result < 0)
                return 0;
 
index 79da4b1c1df0adc649954a45f2d630989f12a6d6..612489afe8d2467965759c80562562e26919f704 100644
@@ -61,11 +61,11 @@ static int scsi_eh_try_stu(struct scsi_cmnd *scmd);
 static enum scsi_disposition scsi_try_to_abort_cmd(const struct scsi_host_template *,
                                                   struct scsi_cmnd *);
 
-void scsi_eh_wakeup(struct Scsi_Host *shost)
+void scsi_eh_wakeup(struct Scsi_Host *shost, unsigned int busy)
 {
        lockdep_assert_held(shost->host_lock);
 
-       if (scsi_host_busy(shost) == shost->host_failed) {
+       if (busy == shost->host_failed) {
                trace_scsi_eh_wakeup(shost);
                wake_up_process(shost->ehandler);
                SCSI_LOG_ERROR_RECOVERY(5, shost_printk(KERN_INFO, shost,
@@ -88,7 +88,7 @@ void scsi_schedule_eh(struct Scsi_Host *shost)
        if (scsi_host_set_state(shost, SHOST_RECOVERY) == 0 ||
            scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY) == 0) {
                shost->host_eh_scheduled++;
-               scsi_eh_wakeup(shost);
+               scsi_eh_wakeup(shost, scsi_host_busy(shost));
        }
 
        spin_unlock_irqrestore(shost->host_lock, flags);
@@ -282,11 +282,12 @@ static void scsi_eh_inc_host_failed(struct rcu_head *head)
 {
        struct scsi_cmnd *scmd = container_of(head, typeof(*scmd), rcu);
        struct Scsi_Host *shost = scmd->device->host;
+       unsigned int busy = scsi_host_busy(shost);
        unsigned long flags;
 
        spin_lock_irqsave(shost->host_lock, flags);
        shost->host_failed++;
-       scsi_eh_wakeup(shost);
+       scsi_eh_wakeup(shost, busy);
        spin_unlock_irqrestore(shost->host_lock, flags);
 }
 
index cf3864f720930988fbadc77b3c91c77fe2d3bb62..df5ac03d5d6c2eb5233ad7fcfdad37a1e487b4e6 100644
@@ -278,9 +278,11 @@ static void scsi_dec_host_busy(struct Scsi_Host *shost, struct scsi_cmnd *cmd)
        rcu_read_lock();
        __clear_bit(SCMD_STATE_INFLIGHT, &cmd->state);
        if (unlikely(scsi_host_in_recovery(shost))) {
+               unsigned int busy = scsi_host_busy(shost);
+
                spin_lock_irqsave(shost->host_lock, flags);
                if (shost->host_failed || shost->host_eh_scheduled)
-                       scsi_eh_wakeup(shost);
+                       scsi_eh_wakeup(shost, busy);
                spin_unlock_irqrestore(shost->host_lock, flags);
        }
        rcu_read_unlock();
index 3f0dfb97db6bd1b88755db1fb50dd6e968e385c6..1fbfe1b52c9f1a906ea6b0da7a6b273e2972a903 100644
@@ -92,7 +92,7 @@ extern void scmd_eh_abort_handler(struct work_struct *work);
 extern enum blk_eh_timer_return scsi_timeout(struct request *req);
 extern int scsi_error_handler(void *host);
 extern enum scsi_disposition scsi_decide_disposition(struct scsi_cmnd *cmd);
-extern void scsi_eh_wakeup(struct Scsi_Host *shost);
+extern void scsi_eh_wakeup(struct Scsi_Host *shost, unsigned int busy);
 extern void scsi_eh_scmd_add(struct scsi_cmnd *);
 void scsi_eh_ready_devs(struct Scsi_Host *shost,
                        struct list_head *work_q,
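The error-handler hunks compute scsi_host_busy() before taking host_lock and hand the snapshot to scsi_eh_wakeup(), since the busy count is derived by iterating the tag set and should not be evaluated while the lock is held. A sketch of the snapshot-then-lock shape, with hypothetical helpers:

    #include <linux/spinlock.h>

    struct demo_host {
            spinlock_t lock;
            unsigned int failed;
    };

    /* Stands in for scsi_host_busy(): walks shared state, so it is
     * deliberately called before demo_host.lock is taken. */
    static unsigned int demo_count_busy(struct demo_host *h)
    {
            return 0;   /* placeholder */
    }

    static void demo_maybe_wake(struct demo_host *h)
    {
            unsigned int busy = demo_count_busy(h);   /* outside the lock */
            unsigned long flags;

            spin_lock_irqsave(&h->lock, flags);
            if (busy == h->failed)
                    ;   /* wake the handler thread here */
            spin_unlock_irqrestore(&h->lock, flags);
    }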
index 44680f65ea1455daec91d4c6dd98d2cb2431f4e2..9969f4e2f1c3d9c656076e3e540bd17c8352c2af 100644
@@ -332,7 +332,7 @@ static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget,
 
        sdev->sg_reserved_size = INT_MAX;
 
-       q = blk_mq_init_queue(&sdev->host->tag_set);
+       q = blk_mq_alloc_queue(&sdev->host->tag_set, NULL, NULL);
        if (IS_ERR(q)) {
                /* release fn is set up in scsi_sysfs_device_initialise, so
                 * have to free and put manually here */
index 0833b3e6aa6e8f35b791d3f75fe208fb0f888914..bdd0acf7fa3cb130e64fac2aacf684aa5a91da8b 100644
@@ -3407,6 +3407,24 @@ static bool sd_validate_opt_xfer_size(struct scsi_disk *sdkp,
        return true;
 }
 
+static void sd_read_block_zero(struct scsi_disk *sdkp)
+{
+       unsigned int buf_len = sdkp->device->sector_size;
+       char *buffer, cmd[10] = { };
+
+       buffer = kmalloc(buf_len, GFP_KERNEL);
+       if (!buffer)
+               return;
+
+       cmd[0] = READ_10;
+       put_unaligned_be32(0, &cmd[2]); /* Logical block address 0 */
+       put_unaligned_be16(1, &cmd[7]); /* Transfer 1 logical block */
+
+       scsi_execute_cmd(sdkp->device, cmd, REQ_OP_DRV_IN, buffer, buf_len,
+                        SD_TIMEOUT, sdkp->max_retries, NULL);
+       kfree(buffer);
+}
+
 /**
  *     sd_revalidate_disk - called the first time a new disk is seen,
  *     performs disk spin up, read_capacity, etc.
@@ -3446,7 +3464,13 @@ static int sd_revalidate_disk(struct gendisk *disk)
         */
        if (sdkp->media_present) {
                sd_read_capacity(sdkp, buffer);
-
+               /*
+                * Some USB/UAS devices return generic values for mode pages
+                * until the media has been accessed. Trigger a READ operation
+                * to force the device to populate mode pages.
+                */
+               if (sdp->read_before_ms)
+                       sd_read_block_zero(sdkp);
                /*
                 * set the default to rotational.  All non-rotational devices
                 * support the block characteristics VPD page, which will
index ceff1ec13f9ea9ea056da947d3939c51f4797522..385180c98be496989dbf469926f52c974609a013 100644
@@ -6533,8 +6533,11 @@ static void pqi_map_queues(struct Scsi_Host *shost)
 {
        struct pqi_ctrl_info *ctrl_info = shost_to_hba(shost);
 
-       blk_mq_pci_map_queues(&shost->tag_set.map[HCTX_TYPE_DEFAULT],
+       if (!ctrl_info->disable_managed_interrupts)
+               return blk_mq_pci_map_queues(&shost->tag_set.map[HCTX_TYPE_DEFAULT],
                              ctrl_info->pci_dev, 0);
+       else
+               return blk_mq_map_queues(&shost->tag_set.map[HCTX_TYPE_DEFAULT]);
 }
 
 static inline bool pqi_is_tape_changer_device(struct pqi_scsi_dev *device)
index a95936b18f695e3ef796098866ea07101e9e346d..7ceb982040a5dfe5d490f9a4bd306e99e5140a53 100644
@@ -330,6 +330,7 @@ enum storvsc_request_type {
  */
 
 static int storvsc_ringbuffer_size = (128 * 1024);
+static int aligned_ringbuffer_size;
 static u32 max_outstanding_req_per_channel;
 static int storvsc_change_queue_depth(struct scsi_device *sdev, int queue_depth);
 
@@ -687,8 +688,8 @@ static void handle_sc_creation(struct vmbus_channel *new_sc)
        new_sc->next_request_id_callback = storvsc_next_request_id;
 
        ret = vmbus_open(new_sc,
-                        storvsc_ringbuffer_size,
-                        storvsc_ringbuffer_size,
+                        aligned_ringbuffer_size,
+                        aligned_ringbuffer_size,
                         (void *)&props,
                         sizeof(struct vmstorage_channel_properties),
                         storvsc_on_channel_callback, new_sc);
@@ -1973,7 +1974,7 @@ static int storvsc_probe(struct hv_device *device,
        dma_set_min_align_mask(&device->device, HV_HYP_PAGE_SIZE - 1);
 
        stor_device->port_number = host->host_no;
-       ret = storvsc_connect_to_vsp(device, storvsc_ringbuffer_size, is_fc);
+       ret = storvsc_connect_to_vsp(device, aligned_ringbuffer_size, is_fc);
        if (ret)
                goto err_out1;
 
@@ -2164,7 +2165,7 @@ static int storvsc_resume(struct hv_device *hv_dev)
 {
        int ret;
 
-       ret = storvsc_connect_to_vsp(hv_dev, storvsc_ringbuffer_size,
+       ret = storvsc_connect_to_vsp(hv_dev, aligned_ringbuffer_size,
                                     hv_dev_is_fc(hv_dev));
        return ret;
 }
@@ -2198,8 +2199,9 @@ static int __init storvsc_drv_init(void)
         * the ring buffer indices) by the max request size (which is
         * vmbus_channel_packet_multipage_buffer + struct vstor_packet + u64)
         */
+       aligned_ringbuffer_size = VMBUS_RING_SIZE(storvsc_ringbuffer_size);
        max_outstanding_req_per_channel =
-               ((storvsc_ringbuffer_size - PAGE_SIZE) /
+               ((aligned_ringbuffer_size - PAGE_SIZE) /
                ALIGN(MAX_MULTIPAGE_BUFFER_PACKET +
                sizeof(struct vstor_packet) + sizeof(u64),
                sizeof(u64)));
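The storvsc hunks compute the ring size handed to vmbus_open() once at module init using VMBUS_RING_SIZE(), which accounts for the ring-buffer header and rounds up to a page multiple. A sketch of that one-time computation:

    #include <linux/hyperv.h>
    #include <linux/module.h>

    static int demo_ring_size = 128 * 1024;   /* requested payload bytes */
    static int demo_aligned_ring_size;

    static int __init demo_init(void)
    {
            /* header + payload, rounded up to a PAGE_SIZE multiple */
            demo_aligned_ring_size = VMBUS_RING_SIZE(demo_ring_size);
            return 0;
    }
    module_init(demo_init);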
index 4cf20be668a6021c6acfae56c19f0914586a7bf6..617eb892f4ad457feb5d4de3d9c1ceb88a010c61 100644
@@ -188,8 +188,6 @@ static void virtscsi_vq_done(struct virtio_scsi *vscsi,
                while ((buf = virtqueue_get_buf(vq, &len)) != NULL)
                        fn(vscsi, buf);
 
-               if (unlikely(virtqueue_is_broken(vq)))
-                       break;
        } while (!virtqueue_enable_cb(vq));
        spin_unlock_irqrestore(&virtscsi_vq->vq_lock, flags);
 }
index 780199bf351efbfb24422880ef39510c77def68f..49a0955e82d6cf5eef83e5f63ba8d31194c65324 100644
@@ -296,14 +296,14 @@ struct apple_mbox *apple_mbox_get(struct device *dev, int index)
        of_node_put(args.np);
 
        if (!pdev)
-               return ERR_PTR(EPROBE_DEFER);
+               return ERR_PTR(-EPROBE_DEFER);
 
        mbox = platform_get_drvdata(pdev);
        if (!mbox)
-               return ERR_PTR(EPROBE_DEFER);
+               return ERR_PTR(-EPROBE_DEFER);
 
        if (!device_link_add(dev, &pdev->dev, DL_FLAG_AUTOREMOVE_CONSUMER))
-               return ERR_PTR(ENODEV);
+               return ERR_PTR(-ENODEV);
 
        return mbox;
 }
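The apple mailbox fix adds the missing minus signs: ERR_PTR() encodes a negative errno into a pointer, and IS_ERR()/PTR_ERR() only recognize that encoding. A sketch of why the sign matters:

    #include <linux/err.h>

    static void *demo_get(bool ready)
    {
            if (!ready)
                    return ERR_PTR(-EPROBE_DEFER);   /* negative errno */
            return NULL;
    }

    /*
     * For p = demo_get(false): IS_ERR(p) is true and PTR_ERR(p) yields
     * -EPROBE_DEFER. With the positive ERR_PTR(EPROBE_DEFER), IS_ERR()
     * is false and callers would dereference the bogus pointer.
     */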
index 9b0fdd95276e4e017d32012d1f2de3107556e242..19f4b576f822b2e57309308f5294914af27df570 100644
@@ -1,5 +1,5 @@
 config POLARFIRE_SOC_SYS_CTRL
-       tristate "POLARFIRE_SOC_SYS_CTRL"
+       tristate "Microchip PolarFire SoC (MPFS) system controller support"
        depends on POLARFIRE_SOC_MAILBOX
        depends on MTD
        help
index f4bfd24386f1b5d2defe9aad6ffcd7123035158d..f913e9bd57ed4a7aa6d1b99d27a40552713b2536 100644
@@ -265,10 +265,17 @@ static int pmic_glink_probe(struct platform_device *pdev)
 
        pg->client_mask = *match_data;
 
+       pg->pdr = pdr_handle_alloc(pmic_glink_pdr_callback, pg);
+       if (IS_ERR(pg->pdr)) {
+               ret = dev_err_probe(&pdev->dev, PTR_ERR(pg->pdr),
+                                   "failed to initialize pdr\n");
+               return ret;
+       }
+
        if (pg->client_mask & BIT(PMIC_GLINK_CLIENT_UCSI)) {
                ret = pmic_glink_add_aux_device(pg, &pg->ucsi_aux, "ucsi");
                if (ret)
-                       return ret;
+                       goto out_release_pdr_handle;
        }
        if (pg->client_mask & BIT(PMIC_GLINK_CLIENT_ALTMODE)) {
                ret = pmic_glink_add_aux_device(pg, &pg->altmode_aux, "altmode");
@@ -281,17 +288,11 @@ static int pmic_glink_probe(struct platform_device *pdev)
                        goto out_release_altmode_aux;
        }
 
-       pg->pdr = pdr_handle_alloc(pmic_glink_pdr_callback, pg);
-       if (IS_ERR(pg->pdr)) {
-               ret = dev_err_probe(&pdev->dev, PTR_ERR(pg->pdr), "failed to initialize pdr\n");
-               goto out_release_aux_devices;
-       }
-
        service = pdr_add_lookup(pg->pdr, "tms/servreg", "msm/adsp/charger_pd");
        if (IS_ERR(service)) {
                ret = dev_err_probe(&pdev->dev, PTR_ERR(service),
                                    "failed adding pdr lookup for charger_pd\n");
-               goto out_release_pdr_handle;
+               goto out_release_aux_devices;
        }
 
        mutex_lock(&__pmic_glink_lock);
@@ -300,8 +301,6 @@ static int pmic_glink_probe(struct platform_device *pdev)
 
        return 0;
 
-out_release_pdr_handle:
-       pdr_handle_release(pg->pdr);
 out_release_aux_devices:
        if (pg->client_mask & BIT(PMIC_GLINK_CLIENT_BATT))
                pmic_glink_del_aux_device(pg, &pg->ps_aux);
@@ -311,6 +310,8 @@ out_release_altmode_aux:
 out_release_ucsi_aux:
        if (pg->client_mask & BIT(PMIC_GLINK_CLIENT_UCSI))
                pmic_glink_del_aux_device(pg, &pg->ucsi_aux);
+out_release_pdr_handle:
+       pdr_handle_release(pg->pdr);
 
        return ret;
 }
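The pmic_glink reorder keeps probe-time cleanup strictly in reverse acquisition order: pdr_handle_alloc() now happens first, so pdr_handle_release() moves to the last unwind label. A minimal sketch of the goto-unwind convention, with hypothetical resources:

    static int demo_acquire_a(void) { return 0; }
    static int demo_acquire_b(void) { return 0; }
    static void demo_release_a(void) { }

    static int demo_probe(void)
    {
            int ret;

            ret = demo_acquire_a();          /* first acquired ...  */
            if (ret)
                    return ret;

            ret = demo_acquire_b();
            if (ret)
                    goto err_a;

            return 0;

    err_a:
            demo_release_a();                /* ... last released   */
            return ret;
    }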
index 5fcd0fdd2faa2d087fc03e001dffe4f5016c80a9..b3808fc24c695e89fa10f46b93e0fcfabc3b4d61 100644
@@ -76,7 +76,7 @@ struct pmic_glink_altmode_port {
 
        struct work_struct work;
 
-       struct device *bridge;
+       struct auxiliary_device *bridge;
 
        enum typec_orientation orientation;
        u16 svid;
@@ -230,7 +230,7 @@ static void pmic_glink_altmode_worker(struct work_struct *work)
        else
                pmic_glink_altmode_enable_usb(altmode, alt_port);
 
-       drm_aux_hpd_bridge_notify(alt_port->bridge,
+       drm_aux_hpd_bridge_notify(&alt_port->bridge->dev,
                                  alt_port->hpd_state ?
                                  connector_status_connected :
                                  connector_status_disconnected);
@@ -454,7 +454,7 @@ static int pmic_glink_altmode_probe(struct auxiliary_device *adev,
                alt_port->index = port;
                INIT_WORK(&alt_port->work, pmic_glink_altmode_worker);
 
-               alt_port->bridge = drm_dp_hpd_bridge_register(dev, to_of_node(fwnode));
+               alt_port->bridge = devm_drm_dp_hpd_bridge_alloc(dev, to_of_node(fwnode));
                if (IS_ERR(alt_port->bridge)) {
                        fwnode_handle_put(fwnode);
                        return PTR_ERR(alt_port->bridge);
@@ -510,6 +510,16 @@ static int pmic_glink_altmode_probe(struct auxiliary_device *adev,
                }
        }
 
+       for (port = 0; port < ARRAY_SIZE(altmode->ports); port++) {
+               alt_port = &altmode->ports[port];
+               if (!alt_port->bridge)
+                       continue;
+
+               ret = devm_drm_dp_hpd_bridge_add(dev, alt_port->bridge);
+               if (ret)
+                       return ret;
+       }
+
        altmode->client = devm_pmic_glink_register_client(dev,
                                                          altmode->owner_id,
                                                          pmic_glink_altmode_callback,
index d96222e6d7d2d4022753d3120b4c36ea759dad75..cfdaa5eaec76db9b322272b54d4fdfdcff3db697 100644
@@ -19,7 +19,7 @@
 #include <linux/platform_device.h>
 #include <linux/slab.h>
 #include <linux/spi/spi.h>
-#include <linux/spi/spi-mem.h>
+#include <linux/mtd/spi-nor.h>
 #include <linux/sysfs.h>
 #include <linux/types.h>
 #include "spi-bcm-qspi.h"
@@ -1221,7 +1221,7 @@ static int bcm_qspi_exec_mem_op(struct spi_mem *mem,
 
        /* non-aligned and very short transfers are handled by MSPI */
        if (!IS_ALIGNED((uintptr_t)addr, 4) || !IS_ALIGNED((uintptr_t)buf, 4) ||
-           len < 4)
+           len < 4 || op->cmd.opcode == SPINOR_OP_RDSFDP)
                mspi_read = true;
 
        if (!has_bspi(qspi) || mspi_read)
index f94e0d370d466e9742261a84a567593b8073f169..1a8d03958dffbfb77a4cd183d8a18fbd3ed53d63 100644
@@ -1927,24 +1927,18 @@ static void cqspi_remove(struct platform_device *pdev)
        pm_runtime_disable(&pdev->dev);
 }
 
-static int cqspi_suspend(struct device *dev)
+static int cqspi_runtime_suspend(struct device *dev)
 {
        struct cqspi_st *cqspi = dev_get_drvdata(dev);
-       struct spi_controller *host = dev_get_drvdata(dev);
-       int ret;
 
-       ret = spi_controller_suspend(host);
        cqspi_controller_enable(cqspi, 0);
-
        clk_disable_unprepare(cqspi->clk);
-
-       return ret;
+       return 0;
 }
 
-static int cqspi_resume(struct device *dev)
+static int cqspi_runtime_resume(struct device *dev)
 {
        struct cqspi_st *cqspi = dev_get_drvdata(dev);
-       struct spi_controller *host = dev_get_drvdata(dev);
 
        clk_prepare_enable(cqspi->clk);
        cqspi_wait_idle(cqspi);
@@ -1952,12 +1946,27 @@ static int cqspi_resume(struct device *dev)
 
        cqspi->current_cs = -1;
        cqspi->sclk = 0;
+       return 0;
+}
+
+static int cqspi_suspend(struct device *dev)
+{
+       struct cqspi_st *cqspi = dev_get_drvdata(dev);
+
+       return spi_controller_suspend(cqspi->host);
+}
 
-       return spi_controller_resume(host);
+static int cqspi_resume(struct device *dev)
+{
+       struct cqspi_st *cqspi = dev_get_drvdata(dev);
+
+       return spi_controller_resume(cqspi->host);
 }
 
-static DEFINE_RUNTIME_DEV_PM_OPS(cqspi_dev_pm_ops, cqspi_suspend,
-                                cqspi_resume, NULL);
+static const struct dev_pm_ops cqspi_dev_pm_ops = {
+       RUNTIME_PM_OPS(cqspi_runtime_suspend, cqspi_runtime_resume, NULL)
+       SYSTEM_SLEEP_PM_OPS(cqspi_suspend, cqspi_resume)
+};
 
 static const struct cqspi_driver_platdata cdns_qspi = {
        .quirks = CQSPI_DISABLE_DAC_MODE,
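The cadence-quadspi rework separates clock gating (runtime PM) from controller quiescing (system sleep), composing both into a single dev_pm_ops. A skeleton of that composition, with stubbed callbacks:

    #include <linux/pm.h>

    static int demo_runtime_suspend(struct device *dev) { return 0; }
    static int demo_runtime_resume(struct device *dev)  { return 0; }
    static int demo_sys_suspend(struct device *dev)     { return 0; }
    static int demo_sys_resume(struct device *dev)      { return 0; }

    static const struct dev_pm_ops demo_pm_ops = {
            RUNTIME_PM_OPS(demo_runtime_suspend, demo_runtime_resume, NULL)
            SYSTEM_SLEEP_PM_OPS(demo_sys_suspend, demo_sys_resume)
    };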
index a50eb4db79de8e93cb61a9ea50bc8913ed3e4f1f..e5140532071d2b647ab77fa561f27630a334971a 100644
@@ -317,6 +317,15 @@ static void cdns_spi_process_fifo(struct cdns_spi *xspi, int ntx, int nrx)
        xspi->rx_bytes -= nrx;
 
        while (ntx || nrx) {
+               if (nrx) {
+                       u8 data = cdns_spi_read(xspi, CDNS_SPI_RXD);
+
+                       if (xspi->rxbuf)
+                               *xspi->rxbuf++ = data;
+
+                       nrx--;
+               }
+
                if (ntx) {
                        if (xspi->txbuf)
                                cdns_spi_write(xspi, CDNS_SPI_TXD, *xspi->txbuf++);
@@ -326,14 +335,6 @@ static void cdns_spi_process_fifo(struct cdns_spi *xspi, int ntx, int nrx)
                        ntx--;
                }
 
-               if (nrx) {
-                       u8 data = cdns_spi_read(xspi, CDNS_SPI_RXD);
-
-                       if (xspi->rxbuf)
-                               *xspi->rxbuf++ = data;
-
-                       nrx--;
-               }
        }
 }
 
index f13073e1259364640b16b57323e5d027c10bdb0f..adf19e8c4c8a0d1ef9ede14374e8ab6638073a82 100644
@@ -148,8 +148,7 @@ static void cs42l43_set_cs(struct spi_device *spi, bool is_high)
 {
        struct cs42l43_spi *priv = spi_controller_get_devdata(spi->controller);
 
-       if (spi_get_chipselect(spi, 0) == 0)
-               regmap_write(priv->regmap, CS42L43_SPI_CONFIG2, !is_high);
+       regmap_write(priv->regmap, CS42L43_SPI_CONFIG2, !is_high);
 }
 
 static int cs42l43_prepare_message(struct spi_controller *ctlr, struct spi_message *msg)
@@ -244,7 +243,10 @@ static int cs42l43_spi_probe(struct platform_device *pdev)
        priv->ctlr->use_gpio_descriptors = true;
        priv->ctlr->auto_runtime_pm = true;
 
-       devm_pm_runtime_enable(priv->dev);
+       ret = devm_pm_runtime_enable(priv->dev);
+       if (ret)
+               return ret;
+
        pm_runtime_idle(priv->dev);
 
        regmap_write(priv->regmap, CS42L43_TRAN_CONFIG6, CS42L43_FIFO_SIZE - 1);
index 9d22018f7985f11956fae5e06ffb0dbd180914f9..1301d14483d482dcaf05250a563a414db73c9dd4 100644
@@ -377,6 +377,11 @@ static const struct spi_controller_mem_ops hisi_sfc_v3xx_mem_ops = {
 static irqreturn_t hisi_sfc_v3xx_isr(int irq, void *data)
 {
        struct hisi_sfc_v3xx_host *host = data;
+       u32 reg;
+
+       reg = readl(host->regbase + HISI_SFC_V3XX_INT_STAT);
+       if (!reg)
+               return IRQ_NONE;
 
        hisi_sfc_v3xx_disable_int(host);
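The hisi-sfc-v3xx fix makes the handler safe on a shared interrupt line: read the status register first and return IRQ_NONE when this device raised nothing. A sketch with hypothetical register offsets:

    #include <linux/interrupt.h>
    #include <linux/io.h>

    #define DEMO_INT_STAT 0x120   /* hypothetical offsets */
    #define DEMO_INT_CLR  0x124

    static irqreturn_t demo_isr(int irq, void *data)
    {
            void __iomem *regs = data;
            u32 stat = readl(regs + DEMO_INT_STAT);

            if (!stat)
                    return IRQ_NONE;   /* not ours; let other sharers run */

            writel(stat, regs + DEMO_INT_CLR);
            return IRQ_HANDLED;
    }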
 
index 272bc871a848b833e6e673740f4be5f8f3a16294..833a1bb7a91438e02c2d5a176e1afdaba4159552 100644
@@ -2,6 +2,7 @@
 // Copyright 2004-2007 Freescale Semiconductor, Inc. All Rights Reserved.
 // Copyright (C) 2008 Juergen Beisert
 
+#include <linux/bits.h>
 #include <linux/clk.h>
 #include <linux/completion.h>
 #include <linux/delay.h>
@@ -660,15 +661,15 @@ static int mx51_ecspi_prepare_transfer(struct spi_imx_data *spi_imx,
                        << MX51_ECSPI_CTRL_BL_OFFSET;
        else {
                if (spi_imx->usedma) {
-                       ctrl |= (spi_imx->bits_per_word *
-                               spi_imx_bytes_per_word(spi_imx->bits_per_word) - 1)
+                       ctrl |= (spi_imx->bits_per_word - 1)
                                << MX51_ECSPI_CTRL_BL_OFFSET;
                } else {
                        if (spi_imx->count >= MX51_ECSPI_CTRL_MAX_BURST)
-                               ctrl |= (MX51_ECSPI_CTRL_MAX_BURST - 1)
+                               ctrl |= (MX51_ECSPI_CTRL_MAX_BURST * BITS_PER_BYTE - 1)
                                                << MX51_ECSPI_CTRL_BL_OFFSET;
                        else
-                               ctrl |= (spi_imx->count * spi_imx->bits_per_word - 1)
+                               ctrl |= spi_imx->count / DIV_ROUND_UP(spi_imx->bits_per_word,
+                                               BITS_PER_BYTE) * spi_imx->bits_per_word
                                                << MX51_ECSPI_CTRL_BL_OFFSET;
                }
        }
@@ -1344,7 +1345,7 @@ static int spi_imx_sdma_init(struct device *dev, struct spi_imx_data *spi_imx,
        controller->dma_tx = dma_request_chan(dev, "tx");
        if (IS_ERR(controller->dma_tx)) {
                ret = PTR_ERR(controller->dma_tx);
-               dev_dbg(dev, "can't get the TX DMA channel, error %d!\n", ret);
+               dev_err_probe(dev, ret, "can't get the TX DMA channel!\n");
                controller->dma_tx = NULL;
                goto err;
        }
@@ -1353,7 +1354,7 @@ static int spi_imx_sdma_init(struct device *dev, struct spi_imx_data *spi_imx,
        controller->dma_rx = dma_request_chan(dev, "rx");
        if (IS_ERR(controller->dma_rx)) {
                ret = PTR_ERR(controller->dma_rx);
-               dev_dbg(dev, "can't get the RX DMA channel, error %d\n", ret);
+               dev_err_probe(dev, ret, "can't get the RX DMA channel!\n");
                controller->dma_rx = NULL;
                goto err;
        }
index 57d767a68e7b2766dcea5510809cf2f09e0bef63..4337ca51d7aa21555684f62295a39a52772cce3d 100644
@@ -76,6 +76,7 @@ static const struct pci_device_id intel_spi_pci_ids[] = {
        { PCI_VDEVICE(INTEL, 0x7a24), (unsigned long)&cnl_info },
        { PCI_VDEVICE(INTEL, 0x7aa4), (unsigned long)&cnl_info },
        { PCI_VDEVICE(INTEL, 0x7e23), (unsigned long)&cnl_info },
+       { PCI_VDEVICE(INTEL, 0x7f24), (unsigned long)&cnl_info },
        { PCI_VDEVICE(INTEL, 0x9d24), (unsigned long)&cnl_info },
        { PCI_VDEVICE(INTEL, 0x9da4), (unsigned long)&cnl_info },
        { PCI_VDEVICE(INTEL, 0xa0a4), (unsigned long)&cnl_info },
@@ -84,7 +85,7 @@ static const struct pci_device_id intel_spi_pci_ids[] = {
        { PCI_VDEVICE(INTEL, 0xa2a4), (unsigned long)&cnl_info },
        { PCI_VDEVICE(INTEL, 0xa324), (unsigned long)&cnl_info },
        { PCI_VDEVICE(INTEL, 0xa3a4), (unsigned long)&cnl_info },
-       { PCI_VDEVICE(INTEL, 0xae23), (unsigned long)&cnl_info },
+       { PCI_VDEVICE(INTEL, 0xa823), (unsigned long)&cnl_info },
        { },
 };
 MODULE_DEVICE_TABLE(pci, intel_spi_pci_ids);
index 1bf080339b5a722b8ec6abb356f8bad231d19d9f..88cbe4f00cc3b11e5fef822a60114a90a88149f8 100644
@@ -39,6 +39,7 @@
 #include <linux/spi/spi.h>
 #include <linux/spi/mxs-spi.h>
 #include <trace/events/spi.h>
+#include <linux/dma/mxs-dma.h>
 
 #define DRIVER_NAME            "mxs-spi"
 
@@ -252,7 +253,7 @@ static int mxs_spi_txrx_dma(struct mxs_spi *spi,
                desc = dmaengine_prep_slave_sg(ssp->dmach,
                                &dma_xfer[sg_count].sg, 1,
                                (flags & TXRX_WRITE) ? DMA_MEM_TO_DEV : DMA_DEV_TO_MEM,
-                               DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
+                               DMA_PREP_INTERRUPT | MXS_DMA_CTRL_WAIT4END);
 
                if (!desc) {
                        dev_err(ssp->dev,
index a0c9fea908f553e0bf007a7a78a35746f60a25f4..ddf1c684bcc7d863ede340aa196506a3a3505654 100644
@@ -53,8 +53,6 @@
 
 /* per-register bitmasks: */
 #define OMAP2_MCSPI_IRQSTATUS_EOW      BIT(17)
-#define OMAP2_MCSPI_IRQSTATUS_TX0_EMPTY    BIT(0)
-#define OMAP2_MCSPI_IRQSTATUS_RX0_FULL    BIT(2)
 
 #define OMAP2_MCSPI_MODULCTRL_SINGLE   BIT(0)
 #define OMAP2_MCSPI_MODULCTRL_MS       BIT(2)
@@ -293,7 +291,7 @@ static void omap2_mcspi_set_mode(struct spi_controller *ctlr)
 }
 
 static void omap2_mcspi_set_fifo(const struct spi_device *spi,
-                               struct spi_transfer *t, int enable, int dma_enabled)
+                               struct spi_transfer *t, int enable)
 {
        struct spi_controller *ctlr = spi->controller;
        struct omap2_mcspi_cs *cs = spi->controller_state;
@@ -314,28 +312,20 @@ static void omap2_mcspi_set_fifo(const struct spi_device *spi,
                        max_fifo_depth = OMAP2_MCSPI_MAX_FIFODEPTH / 2;
                else
                        max_fifo_depth = OMAP2_MCSPI_MAX_FIFODEPTH;
-               if (dma_enabled)
-                       wcnt = t->len / bytes_per_word;
-               else
-                       wcnt = 0;
+
+               wcnt = t->len / bytes_per_word;
                if (wcnt > OMAP2_MCSPI_MAX_FIFOWCNT)
                        goto disable_fifo;
 
                xferlevel = wcnt << 16;
                if (t->rx_buf != NULL) {
                        chconf |= OMAP2_MCSPI_CHCONF_FFER;
-                       if (dma_enabled)
-                               xferlevel |= (bytes_per_word - 1) << 8;
-                       else
-                               xferlevel |= (max_fifo_depth - 1) << 8;
+                       xferlevel |= (bytes_per_word - 1) << 8;
                }
 
                if (t->tx_buf != NULL) {
                        chconf |= OMAP2_MCSPI_CHCONF_FFET;
-                       if (dma_enabled)
-                               xferlevel |= bytes_per_word - 1;
-                       else
-                               xferlevel |= (max_fifo_depth - 1);
+                       xferlevel |= bytes_per_word - 1;
                }
 
                mcspi_write_reg(ctlr, OMAP2_MCSPI_XFERLEVEL, xferlevel);
@@ -892,113 +882,6 @@ out:
        return count - c;
 }
 
-static unsigned
-omap2_mcspi_txrx_piofifo(struct spi_device *spi, struct spi_transfer *xfer)
-{
-       struct omap2_mcspi_cs   *cs = spi->controller_state;
-       struct omap2_mcspi    *mcspi;
-       unsigned int            count, c;
-       unsigned int            iter, cwc;
-       int last_request;
-       void __iomem            *base = cs->base;
-       void __iomem            *tx_reg;
-       void __iomem            *rx_reg;
-       void __iomem            *chstat_reg;
-       void __iomem        *irqstat_reg;
-       int                     word_len, bytes_per_word;
-       u8              *rx;
-       const u8        *tx;
-
-       mcspi = spi_controller_get_devdata(spi->controller);
-       count = xfer->len;
-       c = count;
-       word_len = cs->word_len;
-       bytes_per_word = mcspi_bytes_per_word(word_len);
-
-       /*
-        * We store the pre-calculated register addresses on stack to speed
-        * up the transfer loop.
-        */
-       tx_reg          = base + OMAP2_MCSPI_TX0;
-       rx_reg          = base + OMAP2_MCSPI_RX0;
-       chstat_reg      = base + OMAP2_MCSPI_CHSTAT0;
-       irqstat_reg    = base + OMAP2_MCSPI_IRQSTATUS;
-
-       if (c < (word_len >> 3))
-               return 0;
-
-       rx = xfer->rx_buf;
-       tx = xfer->tx_buf;
-
-       do {
-               /* calculate number of words in current iteration */
-               cwc = min((unsigned int)mcspi->fifo_depth / bytes_per_word,
-                         c / bytes_per_word);
-               last_request = cwc != (mcspi->fifo_depth / bytes_per_word);
-               if (tx) {
-                       if (mcspi_wait_for_reg_bit(irqstat_reg,
-                                                  OMAP2_MCSPI_IRQSTATUS_TX0_EMPTY) < 0) {
-                               dev_err(&spi->dev, "TX Empty timed out\n");
-                               goto out;
-                       }
-                       writel_relaxed(OMAP2_MCSPI_IRQSTATUS_TX0_EMPTY, irqstat_reg);
-
-                       for (iter = 0; iter < cwc; iter++, tx += bytes_per_word) {
-                               if (bytes_per_word == 1)
-                                       writel_relaxed(*tx, tx_reg);
-                               else if (bytes_per_word == 2)
-                                       writel_relaxed(*((u16 *)tx), tx_reg);
-                               else if (bytes_per_word == 4)
-                                       writel_relaxed(*((u32 *)tx), tx_reg);
-                       }
-               }
-
-               if (rx) {
-                       if (!last_request &&
-                           mcspi_wait_for_reg_bit(irqstat_reg,
-                                                  OMAP2_MCSPI_IRQSTATUS_RX0_FULL) < 0) {
-                               dev_err(&spi->dev, "RX_FULL timed out\n");
-                               goto out;
-                       }
-                       writel_relaxed(OMAP2_MCSPI_IRQSTATUS_RX0_FULL, irqstat_reg);
-
-                       for (iter = 0; iter < cwc; iter++, rx += bytes_per_word) {
-                               if (last_request &&
-                                   mcspi_wait_for_reg_bit(chstat_reg,
-                                                          OMAP2_MCSPI_CHSTAT_RXS) < 0) {
-                                       dev_err(&spi->dev, "RXS timed out\n");
-                                       goto out;
-                               }
-                               if (bytes_per_word == 1)
-                                       *rx = readl_relaxed(rx_reg);
-                               else if (bytes_per_word == 2)
-                                       *((u16 *)rx) = readl_relaxed(rx_reg);
-                               else if (bytes_per_word == 4)
-                                       *((u32 *)rx) = readl_relaxed(rx_reg);
-                       }
-               }
-
-               if (last_request) {
-                       if (mcspi_wait_for_reg_bit(chstat_reg,
-                                                  OMAP2_MCSPI_CHSTAT_EOT) < 0) {
-                               dev_err(&spi->dev, "EOT timed out\n");
-                               goto out;
-                       }
-                       if (mcspi_wait_for_reg_bit(chstat_reg,
-                                                  OMAP2_MCSPI_CHSTAT_TXFFE) < 0) {
-                               dev_err(&spi->dev, "TXFFE timed out\n");
-                               goto out;
-                       }
-                       omap2_mcspi_set_enable(spi, 0);
-               }
-               c -= cwc * bytes_per_word;
-       } while (c >= bytes_per_word);
-
-out:
-       omap2_mcspi_set_enable(spi, 1);
-       return count - c;
-}
-
 static u32 omap2_mcspi_calc_divisor(u32 speed_hz, u32 ref_clk_hz)
 {
        u32 div;
@@ -1323,9 +1206,7 @@ static int omap2_mcspi_transfer_one(struct spi_controller *ctlr,
                if ((mcspi_dma->dma_rx && mcspi_dma->dma_tx) &&
                    ctlr->cur_msg_mapped &&
                    ctlr->can_dma(ctlr, spi, t))
-                       omap2_mcspi_set_fifo(spi, t, 1, 1);
-               else if (t->len > OMAP2_MCSPI_MAX_FIFODEPTH)
-                       omap2_mcspi_set_fifo(spi, t, 1, 0);
+                       omap2_mcspi_set_fifo(spi, t, 1);
 
                omap2_mcspi_set_enable(spi, 1);
 
@@ -1338,8 +1219,6 @@ static int omap2_mcspi_transfer_one(struct spi_controller *ctlr,
                    ctlr->cur_msg_mapped &&
                    ctlr->can_dma(ctlr, spi, t))
                        count = omap2_mcspi_txrx_dma(spi, t);
-               else if (mcspi->fifo_depth > 0)
-                       count = omap2_mcspi_txrx_piofifo(spi, t);
                else
                        count = omap2_mcspi_txrx_pio(spi, t);
 
@@ -1352,7 +1231,7 @@ static int omap2_mcspi_transfer_one(struct spi_controller *ctlr,
        omap2_mcspi_set_enable(spi, 0);
 
        if (mcspi->fifo_depth > 0)
-               omap2_mcspi_set_fifo(spi, t, 0, 0);
+               omap2_mcspi_set_fifo(spi, t, 0);
 
 out:
        /* Restore defaults if they were overriden */
@@ -1375,7 +1254,7 @@ out:
                omap2_mcspi_set_cs(spi, !(spi->mode & SPI_CS_HIGH));
 
        if (mcspi->fifo_depth > 0 && t)
-               omap2_mcspi_set_fifo(spi, t, 0, 0);
+               omap2_mcspi_set_fifo(spi, t, 0);
 
        return status;
 }
index 03aab661be9d33af1ff6ad93052171f206402b83..82d6264841fc7f090a5541235569e40023330483 100644
 #include <linux/slab.h>
 #include <linux/errno.h>
 #include <linux/wait.h>
+#include <linux/platform_device.h>
 #include <linux/of_address.h>
 #include <linux/of_irq.h>
 #include <linux/of_platform.h>
 #include <linux/interrupt.h>
 #include <linux/delay.h>
+#include <linux/platform_device.h>
 
 #include <linux/spi/spi.h>
 #include <linux/spi/spi_bitbang.h>
@@ -166,10 +168,8 @@ static int spi_ppc4xx_setupxfer(struct spi_device *spi, struct spi_transfer *t)
        int scr;
        u8 cdm = 0;
        u32 speed;
-       u8 bits_per_word;
 
        /* Start with the generic configuration for this device. */
-       bits_per_word = spi->bits_per_word;
        speed = spi->max_speed_hz;
 
        /*
@@ -177,9 +177,6 @@ static int spi_ppc4xx_setupxfer(struct spi_device *spi, struct spi_transfer *t)
         * the transfer to overwrite the generic configuration with zeros.
         */
        if (t) {
-               if (t->bits_per_word)
-                       bits_per_word = t->bits_per_word;
-
                if (t->speed_hz)
                        speed = min(t->speed_hz, spi->max_speed_hz);
        }
index cfc3b1ddbd229f04885db1b610298e63b623132f..6f12e4fb2e2e184f1bb4cf9fe12e5437384fc4ac 100644
@@ -136,14 +136,14 @@ struct sh_msiof_spi_priv {
 
 /* SIFCTR */
 #define SIFCTR_TFWM_MASK       GENMASK(31, 29) /* Transmit FIFO Watermark */
-#define SIFCTR_TFWM_64         (0 << 29)       /*  Transfer Request when 64 empty stages */
-#define SIFCTR_TFWM_32         (1 << 29)       /*  Transfer Request when 32 empty stages */
-#define SIFCTR_TFWM_24         (2 << 29)       /*  Transfer Request when 24 empty stages */
-#define SIFCTR_TFWM_16         (3 << 29)       /*  Transfer Request when 16 empty stages */
-#define SIFCTR_TFWM_12         (4 << 29)       /*  Transfer Request when 12 empty stages */
-#define SIFCTR_TFWM_8          (5 << 29)       /*  Transfer Request when 8 empty stages */
-#define SIFCTR_TFWM_4          (6 << 29)       /*  Transfer Request when 4 empty stages */
-#define SIFCTR_TFWM_1          (7 << 29)       /*  Transfer Request when 1 empty stage */
+#define SIFCTR_TFWM_64         (0UL << 29)     /*  Transfer Request when 64 empty stages */
+#define SIFCTR_TFWM_32         (1UL << 29)     /*  Transfer Request when 32 empty stages */
+#define SIFCTR_TFWM_24         (2UL << 29)     /*  Transfer Request when 24 empty stages */
+#define SIFCTR_TFWM_16         (3UL << 29)     /*  Transfer Request when 16 empty stages */
+#define SIFCTR_TFWM_12         (4UL << 29)     /*  Transfer Request when 12 empty stages */
+#define SIFCTR_TFWM_8          (5UL << 29)     /*  Transfer Request when 8 empty stages */
+#define SIFCTR_TFWM_4          (6UL << 29)     /*  Transfer Request when 4 empty stages */
+#define SIFCTR_TFWM_1          (7UL << 29)     /*  Transfer Request when 1 empty stage */
 #define SIFCTR_TFUA_MASK       GENMASK(26, 20) /* Transmit FIFO Usable Area */
 #define SIFCTR_TFUA_SHIFT      20
 #define SIFCTR_TFUA(i)         ((i) << SIFCTR_TFUA_SHIFT)
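The sh_msiof change adds UL suffixes because 7 << 29 is evaluated in 32-bit int arithmetic, setting the sign bit; the negative value then sign-extends when it meets a wider unsigned type. A userspace sketch of the difference on a typical LP64 build:

    #include <stdio.h>

    int main(void)
    {
            unsigned long long bad  = 7 << 29;     /* int arithmetic: sign
                                                      bit set, sign-extends */
            unsigned long long good = 7UL << 29;   /* unsigned: stays put */

            printf("%llx\n%llx\n", bad, good);
            /* typically prints ffffffffe0000000 then e0000000 */
            return 0;
    }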
index 7477a11e12be0e2bf47006ce9e579fdd9f1fda30..f2170f4b50775ea175c3d0c1e4a7ef0f809e6a52 100644
@@ -1717,6 +1717,10 @@ static int __spi_pump_transfer_message(struct spi_controller *ctlr,
                        pm_runtime_put_noidle(ctlr->dev.parent);
                        dev_err(&ctlr->dev, "Failed to power device: %d\n",
                                ret);
+
+                       msg->status = ret;
+                       spi_finalize_current_message(ctlr);
+
                        return ret;
                }
        }
index e748a5d04e970598c4bee3b52c19a57fcbbd1c12..9149d41fe65b7ed48785f80bc712902278eccec3 100644
@@ -608,7 +608,7 @@ static void ad5933_work(struct work_struct *work)
                struct ad5933_state, work.work);
        struct iio_dev *indio_dev = i2c_get_clientdata(st->client);
        __be16 buf[2];
-       int val[2];
+       u16 val[2];
        unsigned char status;
        int ret;
 
index f44e6412f4e31a4ee3ab22cc650eee5a49ee31a1..d0db2efe004525e9d88531e0115c8edd2fa9c4df 100644
@@ -3723,12 +3723,10 @@ apply_min_padding:
 
 static int atomisp_set_crop(struct atomisp_device *isp,
                            const struct v4l2_mbus_framefmt *format,
+                           struct v4l2_subdev_state *sd_state,
                            int which)
 {
        struct atomisp_input_subdev *input = &isp->inputs[isp->asd.input_curr];
-       struct v4l2_subdev_state pad_state = {
-               .pads = &input->pad_cfg,
-       };
        struct v4l2_subdev_selection sel = {
                .which = which,
                .target = V4L2_SEL_TGT_CROP,
@@ -3754,7 +3752,7 @@ static int atomisp_set_crop(struct atomisp_device *isp,
        sel.r.left = ((input->native_rect.width - sel.r.width) / 2) & ~1;
        sel.r.top = ((input->native_rect.height - sel.r.height) / 2) & ~1;
 
-       ret = v4l2_subdev_call(input->camera, pad, set_selection, &pad_state, &sel);
+       ret = v4l2_subdev_call(input->camera, pad, set_selection, sd_state, &sel);
        if (ret)
                dev_err(isp->dev, "Error setting crop to %ux%u @%ux%u: %d\n",
                        sel.r.width, sel.r.height, sel.r.left, sel.r.top, ret);
@@ -3770,9 +3768,6 @@ int atomisp_try_fmt(struct atomisp_device *isp, struct v4l2_pix_format *f,
        const struct atomisp_format_bridge *fmt, *snr_fmt;
        struct atomisp_sub_device *asd = &isp->asd;
        struct atomisp_input_subdev *input = &isp->inputs[asd->input_curr];
-       struct v4l2_subdev_state pad_state = {
-               .pads = &input->pad_cfg,
-       };
        struct v4l2_subdev_format format = {
                .which = V4L2_SUBDEV_FORMAT_TRY,
        };
@@ -3809,11 +3804,16 @@ int atomisp_try_fmt(struct atomisp_device *isp, struct v4l2_pix_format *f,
        dev_dbg(isp->dev, "try_mbus_fmt: asking for %ux%u\n",
                format.format.width, format.format.height);
 
-       ret = atomisp_set_crop(isp, &format.format, V4L2_SUBDEV_FORMAT_TRY);
-       if (ret)
-               return ret;
+       v4l2_subdev_lock_state(input->try_sd_state);
+
+       ret = atomisp_set_crop(isp, &format.format, input->try_sd_state,
+                              V4L2_SUBDEV_FORMAT_TRY);
+       if (ret == 0)
+               ret = v4l2_subdev_call(input->camera, pad, set_fmt,
+                                      input->try_sd_state, &format);
+
+       v4l2_subdev_unlock_state(input->try_sd_state);
 
-       ret = v4l2_subdev_call(input->camera, pad, set_fmt, &pad_state, &format);
        if (ret)
                return ret;
 
@@ -4238,9 +4238,7 @@ static int atomisp_set_fmt_to_snr(struct video_device *vdev, const struct v4l2_p
        struct atomisp_device *isp = asd->isp;
        struct atomisp_input_subdev *input = &isp->inputs[asd->input_curr];
        const struct atomisp_format_bridge *format;
-       struct v4l2_subdev_state pad_state = {
-               .pads = &input->pad_cfg,
-       };
+       struct v4l2_subdev_state *act_sd_state;
        struct v4l2_subdev_format vformat = {
                .which = V4L2_SUBDEV_FORMAT_TRY,
        };
@@ -4268,12 +4266,18 @@ static int atomisp_set_fmt_to_snr(struct video_device *vdev, const struct v4l2_p
 
        /* Disable dvs if resolution can't be supported by sensor */
        if (asd->params.video_dis_en && asd->run_mode->val == ATOMISP_RUN_MODE_VIDEO) {
-               ret = atomisp_set_crop(isp, &vformat.format, V4L2_SUBDEV_FORMAT_TRY);
-               if (ret)
-                       return ret;
+               v4l2_subdev_lock_state(input->try_sd_state);
+
+               ret = atomisp_set_crop(isp, &vformat.format, input->try_sd_state,
+                                      V4L2_SUBDEV_FORMAT_TRY);
+               if (ret == 0) {
+                       vformat.which = V4L2_SUBDEV_FORMAT_TRY;
+                       ret = v4l2_subdev_call(input->camera, pad, set_fmt,
+                                              input->try_sd_state, &vformat);
+               }
+
+               v4l2_subdev_unlock_state(input->try_sd_state);
 
-               vformat.which = V4L2_SUBDEV_FORMAT_TRY;
-               ret = v4l2_subdev_call(input->camera, pad, set_fmt, &pad_state, &vformat);
                if (ret)
                        return ret;
 
@@ -4291,12 +4295,18 @@ static int atomisp_set_fmt_to_snr(struct video_device *vdev, const struct v4l2_p
                }
        }
 
-       ret = atomisp_set_crop(isp, &vformat.format, V4L2_SUBDEV_FORMAT_ACTIVE);
-       if (ret)
-               return ret;
+       act_sd_state = v4l2_subdev_lock_and_get_active_state(input->camera);
+
+       ret = atomisp_set_crop(isp, &vformat.format, act_sd_state,
+                              V4L2_SUBDEV_FORMAT_ACTIVE);
+       if (ret == 0) {
+               vformat.which = V4L2_SUBDEV_FORMAT_ACTIVE;
+               ret = v4l2_subdev_call(input->camera, pad, set_fmt, act_sd_state, &vformat);
+       }
+
+       if (act_sd_state)
+               v4l2_subdev_unlock_state(act_sd_state);
 
-       vformat.which = V4L2_SUBDEV_FORMAT_ACTIVE;
-       ret = v4l2_subdev_call(input->camera, pad, set_fmt, NULL, &vformat);
        if (ret)
                return ret;
 
index f7b4bee9574bdb8ea330bec4e04074515e71f5a5..d5b077e602caec6ac2863780f660f7aac751ff02 100644
@@ -132,8 +132,8 @@ struct atomisp_input_subdev {
        /* Sensor rects for sensors which support crop */
        struct v4l2_rect native_rect;
        struct v4l2_rect active_rect;
-       /* Sensor pad_cfg for which == V4L2_SUBDEV_FORMAT_TRY calls */
-       struct v4l2_subdev_pad_config pad_cfg;
+       /* Sensor state for which == V4L2_SUBDEV_FORMAT_TRY calls */
+       struct v4l2_subdev_state *try_sd_state;
 
        struct v4l2_subdev *motor;
 
index 01b7fa9b56a21378459f3aa4101eab6195558546..5b2d88c02d36a083376ee21660923635e9ff70c4 100644
@@ -781,12 +781,20 @@ static int atomisp_enum_framesizes(struct file *file, void *priv,
                .which = V4L2_SUBDEV_FORMAT_ACTIVE,
                .code = input->code,
        };
+       struct v4l2_subdev_state *act_sd_state;
        int ret;
 
+       if (!input->camera)
+               return -EINVAL;
+
        if (input->crop_support)
                return atomisp_enum_framesizes_crop(isp, fsize);
 
-       ret = v4l2_subdev_call(input->camera, pad, enum_frame_size, NULL, &fse);
+       act_sd_state = v4l2_subdev_lock_and_get_active_state(input->camera);
+       ret = v4l2_subdev_call(input->camera, pad, enum_frame_size,
+                              act_sd_state, &fse);
+       if (act_sd_state)
+               v4l2_subdev_unlock_state(act_sd_state);
        if (ret)
                return ret;
 
@@ -803,18 +811,25 @@ static int atomisp_enum_frameintervals(struct file *file, void *priv,
        struct video_device *vdev = video_devdata(file);
        struct atomisp_device *isp = video_get_drvdata(vdev);
        struct atomisp_sub_device *asd = atomisp_to_video_pipe(vdev)->asd;
+       struct atomisp_input_subdev *input = &isp->inputs[asd->input_curr];
        struct v4l2_subdev_frame_interval_enum fie = {
-               .code   = atomisp_in_fmt_conv[0].code,
+               .code = atomisp_in_fmt_conv[0].code,
                .index = fival->index,
                .width = fival->width,
                .height = fival->height,
                .which = V4L2_SUBDEV_FORMAT_ACTIVE,
        };
+       struct v4l2_subdev_state *act_sd_state;
        int ret;
 
-       ret = v4l2_subdev_call(isp->inputs[asd->input_curr].camera,
-                              pad, enum_frame_interval, NULL,
-                              &fie);
+       if (!input->camera)
+               return -EINVAL;
+
+       act_sd_state = v4l2_subdev_lock_and_get_active_state(input->camera);
+       ret = v4l2_subdev_call(input->camera, pad, enum_frame_interval,
+                              act_sd_state, &fie);
+       if (act_sd_state)
+               v4l2_subdev_unlock_state(act_sd_state);
        if (ret)
                return ret;
 
@@ -830,30 +845,25 @@ static int atomisp_enum_fmt_cap(struct file *file, void *fh,
        struct video_device *vdev = video_devdata(file);
        struct atomisp_device *isp = video_get_drvdata(vdev);
        struct atomisp_sub_device *asd = atomisp_to_video_pipe(vdev)->asd;
+       struct atomisp_input_subdev *input = &isp->inputs[asd->input_curr];
        struct v4l2_subdev_mbus_code_enum code = {
                .which = V4L2_SUBDEV_FORMAT_ACTIVE,
        };
        const struct atomisp_format_bridge *format;
-       struct v4l2_subdev *camera;
+       struct v4l2_subdev_state *act_sd_state;
        unsigned int i, fi = 0;
-       int rval;
+       int ret;
 
-       camera = isp->inputs[asd->input_curr].camera;
-       if(!camera) {
-               dev_err(isp->dev, "%s(): camera is NULL, device is %s\n",
-                       __func__, vdev->name);
+       if (!input->camera)
                return -EINVAL;
-       }
 
-       rval = v4l2_subdev_call(camera, pad, enum_mbus_code, NULL, &code);
-       if (rval == -ENOIOCTLCMD) {
-               dev_warn(isp->dev,
-                        "enum_mbus_code pad op not supported by %s. Please fix your sensor driver!\n",
-                        camera->name);
-       }
-
-       if (rval)
-               return rval;
+       act_sd_state = v4l2_subdev_lock_and_get_active_state(input->camera);
+       ret = v4l2_subdev_call(input->camera, pad, enum_mbus_code,
+                              act_sd_state, &code);
+       if (act_sd_state)
+               v4l2_subdev_unlock_state(act_sd_state);
+       if (ret)
+               return ret;
 
        for (i = 0; i < ARRAY_SIZE(atomisp_output_fmts); i++) {
                format = &atomisp_output_fmts[i];
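The atomisp conversion above replaces NULL pad-op state arguments with the subdev's active state, taken and released around each call. The recurring shape, sketched for one pad operation:

    #include <media/v4l2-subdev.h>

    static int demo_enum_code(struct v4l2_subdev *sd,
                              struct v4l2_subdev_mbus_code_enum *code)
    {
            struct v4l2_subdev_state *state;
            int ret;

            state = v4l2_subdev_lock_and_get_active_state(sd);
            ret = v4l2_subdev_call(sd, pad, enum_mbus_code, state, code);
            if (state)
                    v4l2_subdev_unlock_state(state);

            return ret;
    }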
index c1c8501ec61f57046af5027c8f9a4170597210d5..547e1444ad9733569816c1e43c74e542089f82fa 100644
@@ -862,6 +862,9 @@ static void atomisp_unregister_entities(struct atomisp_device *isp)
        v4l2_device_unregister(&isp->v4l2_dev);
        media_device_unregister(&isp->media_dev);
        media_device_cleanup(&isp->media_dev);
+
+       for (i = 0; i < isp->input_cnt; i++)
+               __v4l2_subdev_state_free(isp->inputs[i].try_sd_state);
 }
 
 static int atomisp_register_entities(struct atomisp_device *isp)
@@ -933,32 +936,49 @@ v4l2_device_failed:
 
 static void atomisp_init_sensor(struct atomisp_input_subdev *input)
 {
+       static struct lock_class_key try_sd_state_key;
        struct v4l2_subdev_mbus_code_enum mbus_code_enum = { };
        struct v4l2_subdev_frame_size_enum fse = { };
-       struct v4l2_subdev_state sd_state = {
-               .pads = &input->pad_cfg,
-       };
        struct v4l2_subdev_selection sel = { };
+       struct v4l2_subdev_state *try_sd_state, *act_sd_state;
        int i, err;
 
+       /*
+        * FIXME: Drivers are not supposed to use __v4l2_subdev_state_alloc()
+        * but atomisp needs this for try_fmt on its /dev/video# node since
+        * it emulates a normal v4l2 device there, passing through try_fmt /
+        * set_fmt to the sensor.
+        */
+       try_sd_state = __v4l2_subdev_state_alloc(input->camera,
+                               "atomisp:try_sd_state->lock", &try_sd_state_key);
+       if (IS_ERR(try_sd_state))
+               return;
+
+       input->try_sd_state = try_sd_state;
+
+       act_sd_state = v4l2_subdev_lock_and_get_active_state(input->camera);
+
        mbus_code_enum.which = V4L2_SUBDEV_FORMAT_ACTIVE;
-       err = v4l2_subdev_call(input->camera, pad, enum_mbus_code, NULL, &mbus_code_enum);
+       err = v4l2_subdev_call(input->camera, pad, enum_mbus_code,
+                              act_sd_state, &mbus_code_enum);
        if (!err)
                input->code = mbus_code_enum.code;
 
        sel.which = V4L2_SUBDEV_FORMAT_ACTIVE;
        sel.target = V4L2_SEL_TGT_NATIVE_SIZE;
-       err = v4l2_subdev_call(input->camera, pad, get_selection, NULL, &sel);
+       err = v4l2_subdev_call(input->camera, pad, get_selection,
+                              act_sd_state, &sel);
        if (err)
-               return;
+               goto unlock_act_sd_state;
 
        input->native_rect = sel.r;
 
        sel.which = V4L2_SUBDEV_FORMAT_ACTIVE;
        sel.target = V4L2_SEL_TGT_CROP_DEFAULT;
-       err = v4l2_subdev_call(input->camera, pad, get_selection, NULL, &sel);
+       err = v4l2_subdev_call(input->camera, pad, get_selection,
+                              act_sd_state, &sel);
        if (err)
-               return;
+               goto unlock_act_sd_state;
 
        input->active_rect = sel.r;
 
@@ -973,7 +993,8 @@ static void atomisp_init_sensor(struct atomisp_input_subdev *input)
                fse.code = input->code;
                fse.which = V4L2_SUBDEV_FORMAT_ACTIVE;
 
-               err = v4l2_subdev_call(input->camera, pad, enum_frame_size, NULL, &fse);
+               err = v4l2_subdev_call(input->camera, pad, enum_frame_size,
+                                      act_sd_state, &fse);
                if (err)
                        break;
 
@@ -989,22 +1010,26 @@ static void atomisp_init_sensor(struct atomisp_input_subdev *input)
         * for padding, set the crop rect to cover the entire sensor instead
         * of only the default active area.
         *
-        * Do this for both try and active formats since the try_crop rect in
-        * pad_cfg may influence (clamp) future try_fmt calls with which == try.
+        * Do this for both try and active formats since the crop rect in
+        * try_sd_state may influence (clamp) the size in future calls with which == try.
         */
        sel.which = V4L2_SUBDEV_FORMAT_TRY;
        sel.target = V4L2_SEL_TGT_CROP;
        sel.r = input->native_rect;
-       err = v4l2_subdev_call(input->camera, pad, set_selection, &sd_state, &sel);
+       v4l2_subdev_lock_state(input->try_sd_state);
+       err = v4l2_subdev_call(input->camera, pad, set_selection,
+                              input->try_sd_state, &sel);
+       v4l2_subdev_unlock_state(input->try_sd_state);
        if (err)
-               return;
+               goto unlock_act_sd_state;
 
        sel.which = V4L2_SUBDEV_FORMAT_ACTIVE;
        sel.target = V4L2_SEL_TGT_CROP;
        sel.r = input->native_rect;
-       err = v4l2_subdev_call(input->camera, pad, set_selection, NULL, &sel);
+       err = v4l2_subdev_call(input->camera, pad, set_selection,
+                              act_sd_state, &sel);
        if (err)
-               return;
+               goto unlock_act_sd_state;
 
        dev_info(input->camera->dev, "Supports crop native %dx%d active %dx%d binning %d\n",
                 input->native_rect.width, input->native_rect.height,
@@ -1012,6 +1037,10 @@ static void atomisp_init_sensor(struct atomisp_input_subdev *input)
                 input->binning_support);
 
        input->crop_support = true;
+
+unlock_act_sd_state:
+       if (act_sd_state)
+               v4l2_subdev_unlock_state(act_sd_state);
 }
 
 int atomisp_register_device_nodes(struct atomisp_device *isp)
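Note how the rewritten init path funnels every failure through the single unlock_act_sd_state label instead of returning early, the usual kernel goto-unwind idiom. A minimal standalone illustration of that control flow (invented names, no kernel APIs):

#include <stdio.h>
#include <stdlib.h>

static int step(int n)
{
	return n == 2 ? -1 : 0;         /* pretend step 2 fails */
}

static int init_with_unwind(void)
{
	char *res = malloc(32);
	int err = 0, i;

	if (!res)
		return -1;
	for (i = 0; i < 4; i++) {
		err = step(i);
		if (err)
			goto out;       /* one exit path, one cleanup */
	}
out:
	free(res);                      /* runs on success and failure */
	return err;
}

int main(void)
{
	printf("init returned %d\n", init_with_unwind());
	return 0;
}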
index a5f58988130a15c921e45570298edfbe212273ba..c1fbcdd1618264f0cd09f5e4078ac600ad6dc22a 100644 (file)
@@ -759,6 +759,29 @@ static ssize_t emulate_tas_store(struct config_item *item,
        return count;
 }
 
+static int target_try_configure_unmap(struct se_device *dev,
+                                     const char *config_opt)
+{
+       if (!dev->transport->configure_unmap) {
+               pr_err("Generic Block Discard not supported\n");
+               return -ENOSYS;
+       }
+
+       if (!target_dev_configured(dev)) {
+               pr_err("Generic Block Discard setup for %s requires device to be configured\n",
+                      config_opt);
+               return -ENODEV;
+       }
+
+       if (!dev->transport->configure_unmap(dev)) {
+               pr_err("Generic Block Discard setup for %s failed\n",
+                      config_opt);
+               return -ENOSYS;
+       }
+
+       return 0;
+}
+
 static ssize_t emulate_tpu_store(struct config_item *item,
                const char *page, size_t count)
 {
@@ -776,11 +799,9 @@ static ssize_t emulate_tpu_store(struct config_item *item,
         * Discard support is detected in iblock_create_virtdevice().
         */
        if (flag && !da->max_unmap_block_desc_count) {
-               if (!dev->transport->configure_unmap ||
-                   !dev->transport->configure_unmap(dev)) {
-                       pr_err("Generic Block Discard not supported\n");
-                       return -ENOSYS;
-               }
+               ret = target_try_configure_unmap(dev, "emulate_tpu");
+               if (ret)
+                       return ret;
        }
 
        da->emulate_tpu = flag;
@@ -806,11 +827,9 @@ static ssize_t emulate_tpws_store(struct config_item *item,
         * Discard supported is detected iblock_create_virtdevice().
         */
        if (flag && !da->max_unmap_block_desc_count) {
-               if (!dev->transport->configure_unmap ||
-                   !dev->transport->configure_unmap(dev)) {
-                       pr_err("Generic Block Discard not supported\n");
-                       return -ENOSYS;
-               }
+               ret = target_try_configure_unmap(dev, "emulate_tpws");
+               if (ret)
+                       return ret;
        }
 
        da->emulate_tpws = flag;
@@ -1022,12 +1041,9 @@ static ssize_t unmap_zeroes_data_store(struct config_item *item,
         * Discard support is detected in iblock_configure_device().
         */
        if (flag && !da->max_unmap_block_desc_count) {
-               if (!dev->transport->configure_unmap ||
-                   !dev->transport->configure_unmap(dev)) {
-                       pr_err("dev[%p]: Thin Provisioning LBPRZ will not be set because max_unmap_block_desc_count is zero\n",
-                              da->da_dev);
-                       return -ENOSYS;
-               }
+               ret = target_try_configure_unmap(dev, "unmap_zeroes_data");
+               if (ret)
+                       return ret;
        }
        da->unmap_zeroes_data = flag;
        pr_debug("dev[%p]: SE Device Thin Provisioning LBPRZ bit: %d\n",
index 8eb9eb7ce5df522b9ba734ef602f5ff22c4716e0..7f6ca81778453b0817f0d9f4c32e6b2b44ea6069 100644 (file)
@@ -91,7 +91,7 @@ static int iblock_configure_device(struct se_device *dev)
 {
        struct iblock_dev *ib_dev = IBLOCK_DEV(dev);
        struct request_queue *q;
-       struct bdev_handle *bdev_handle;
+       struct file *bdev_file;
        struct block_device *bd;
        struct blk_integrity *bi;
        blk_mode_t mode = BLK_OPEN_READ;
@@ -117,14 +117,14 @@ static int iblock_configure_device(struct se_device *dev)
        else
                dev->dev_flags |= DF_READ_ONLY;
 
-       bdev_handle = bdev_open_by_path(ib_dev->ibd_udev_path, mode, ib_dev,
+       bdev_file = bdev_file_open_by_path(ib_dev->ibd_udev_path, mode, ib_dev,
                                        NULL);
-       if (IS_ERR(bdev_handle)) {
-               ret = PTR_ERR(bdev_handle);
+       if (IS_ERR(bdev_file)) {
+               ret = PTR_ERR(bdev_file);
                goto out_free_bioset;
        }
-       ib_dev->ibd_bdev_handle = bdev_handle;
-       ib_dev->ibd_bd = bd = bdev_handle->bdev;
+       ib_dev->ibd_bdev_file = bdev_file;
+       ib_dev->ibd_bd = bd = file_bdev(bdev_file);
 
        q = bdev_get_queue(bd);
 
@@ -180,7 +180,7 @@ static int iblock_configure_device(struct se_device *dev)
        return 0;
 
 out_blkdev_put:
-       bdev_release(ib_dev->ibd_bdev_handle);
+       fput(ib_dev->ibd_bdev_file);
 out_free_bioset:
        bioset_exit(&ib_dev->ibd_bio_set);
 out:
@@ -205,8 +205,8 @@ static void iblock_destroy_device(struct se_device *dev)
 {
        struct iblock_dev *ib_dev = IBLOCK_DEV(dev);
 
-       if (ib_dev->ibd_bdev_handle)
-               bdev_release(ib_dev->ibd_bdev_handle);
+       if (ib_dev->ibd_bdev_file)
+               fput(ib_dev->ibd_bdev_file);
        bioset_exit(&ib_dev->ibd_bio_set);
 }
 
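The iblock conversion above replaces the dedicated struct bdev_handle wrapper with a plain struct file reference: the block device is reached through the file_bdev() accessor and the reference is dropped with fput(). A toy standalone analogue of that opaque-handle-plus-accessor shape (all names invented):

#include <stdio.h>
#include <stdlib.h>

struct blockdev {
	long nr_sectors;
};

struct file_ref {
	struct blockdev *bd;
	int refcnt;
};

static struct blockdev *ref_to_bdev(struct file_ref *f)
{
	return f->bd;                   /* accessor, like file_bdev() */
}

static void put_ref(struct file_ref *f)
{
	if (--f->refcnt == 0) {         /* like fput() on the last ref */
		free(f->bd);
		free(f);
	}
}

int main(void)
{
	struct file_ref *f = malloc(sizeof(*f));

	f->bd = malloc(sizeof(*f->bd));
	f->bd->nr_sectors = 2048;
	f->refcnt = 1;

	printf("sectors: %ld\n", ref_to_bdev(f)->nr_sectors);
	put_ref(f);
	return 0;
}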
index 683f9a55945bb2676c992707bbeb7edaae806e36..91f6f4280666cb08d694d458bb53fb96c8719eae 100644 (file)
@@ -32,7 +32,7 @@ struct iblock_dev {
        u32     ibd_flags;
        struct bio_set  ibd_bio_set;
        struct block_device *ibd_bd;
-       struct bdev_handle *ibd_bdev_handle;
+       struct file *ibd_bdev_file;
        bool ibd_readonly;
        struct iblock_dev_plug *ibd_plug;
 } ____cacheline_aligned;
index 41b7489d37ce95e059ec4849ae7039949c6e6ff1..f98ebb18666bf09354daa36a97c04a8755f3233b 100644 (file)
@@ -352,7 +352,7 @@ static int pscsi_create_type_disk(struct se_device *dev, struct scsi_device *sd)
        struct pscsi_hba_virt *phv = dev->se_hba->hba_ptr;
        struct pscsi_dev_virt *pdv = PSCSI_DEV(dev);
        struct Scsi_Host *sh = sd->host;
-       struct bdev_handle *bdev_handle;
+       struct file *bdev_file;
        int ret;
 
        if (scsi_device_get(sd)) {
@@ -366,18 +366,18 @@ static int pscsi_create_type_disk(struct se_device *dev, struct scsi_device *sd)
         * Claim exclusive struct block_device access to struct scsi_device
         * for TYPE_DISK and TYPE_ZBC using supplied udev_path
         */
-       bdev_handle = bdev_open_by_path(dev->udev_path,
+       bdev_file = bdev_file_open_by_path(dev->udev_path,
                                BLK_OPEN_WRITE | BLK_OPEN_READ, pdv, NULL);
-       if (IS_ERR(bdev_handle)) {
+       if (IS_ERR(bdev_file)) {
                pr_err("pSCSI: bdev_file_open_by_path() failed\n");
                scsi_device_put(sd);
-               return PTR_ERR(bdev_handle);
+               return PTR_ERR(bdev_file);
        }
-       pdv->pdv_bdev_handle = bdev_handle;
+       pdv->pdv_bdev_file = bdev_file;
 
        ret = pscsi_add_device_to_list(dev, sd);
        if (ret) {
-               bdev_release(bdev_handle);
+               fput(bdev_file);
                scsi_device_put(sd);
                return ret;
        }
@@ -564,9 +564,9 @@ static void pscsi_destroy_device(struct se_device *dev)
                 * from pscsi_create_type_disk()
                 */
                if ((sd->type == TYPE_DISK || sd->type == TYPE_ZBC) &&
-                   pdv->pdv_bdev_handle) {
-                       bdev_release(pdv->pdv_bdev_handle);
-                       pdv->pdv_bdev_handle = NULL;
+                   pdv->pdv_bdev_file) {
+                       fput(pdv->pdv_bdev_file);
+                       pdv->pdv_bdev_file = NULL;
                }
                /*
                 * For HBA mode PHV_LLD_SCSI_HOST_NO, release the reference
@@ -907,12 +907,15 @@ new_bio:
 
        return 0;
 fail:
-       if (bio)
-               bio_put(bio);
+       if (bio) {
+               bio_uninit(bio);
+               kfree(bio);
+       }
        while (req->bio) {
                bio = req->bio;
                req->bio = bio->bi_next;
-               bio_put(bio);
+               bio_uninit(bio);
+               kfree(bio);
        }
        req->biotail = NULL;
        return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
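The pscsi failure path now tears bios down with bio_uninit() + kfree(), matching how they were built (bio_init() on kmalloc'ed memory), rather than bio_put(), which pairs with bio_set/slab allocation. The general rule is that tear-down must mirror construction; a trivial standalone analogue:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct obj {
	char *payload;
};

static void obj_init(struct obj *o)     /* like bio_init() */
{
	o->payload = strdup("data");
}

static void obj_uninit(struct obj *o)   /* like bio_uninit() */
{
	free(o->payload);
}

int main(void)
{
	struct obj *o = malloc(sizeof(*o)); /* caller owns the memory */

	obj_init(o);
	printf("payload: %s\n", o->payload);
	obj_uninit(o);  /* release what init acquired ... */
	free(o);        /* ... then free the raw allocation itself */
	return 0;
}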
@@ -994,8 +997,8 @@ static sector_t pscsi_get_blocks(struct se_device *dev)
 {
        struct pscsi_dev_virt *pdv = PSCSI_DEV(dev);
 
-       if (pdv->pdv_bdev_handle)
-               return bdev_nr_sectors(pdv->pdv_bdev_handle->bdev);
+       if (pdv->pdv_bdev_file)
+               return bdev_nr_sectors(file_bdev(pdv->pdv_bdev_file));
        return 0;
 }
 
index b0a3ef136592a9b00653a14a525f1568af976cd1..9acaa21e4c78a431a4f165b27faeb5198fea706c 100644 (file)
@@ -37,7 +37,7 @@ struct pscsi_dev_virt {
        int     pdv_channel_id;
        int     pdv_target_id;
        int     pdv_lun_id;
-       struct bdev_handle *pdv_bdev_handle;
+       struct file *pdv_bdev_file;
        struct scsi_device *pdv_sd;
        struct Scsi_Host *pdv_lld_host;
 } ____cacheline_aligned;
index 4b10921276942ed13a43e531b723e746b76dd6fa..1892e49a8e6a68b5c0f8e719042b34b4d2856b42 100644 (file)
@@ -90,13 +90,14 @@ static int optee_register_device(const uuid_t *device_uuid, u32 func)
        if (rc) {
                pr_err("device registration failed, err: %d\n", rc);
                put_device(&optee_device->dev);
+               return rc;
        }
 
        if (func == PTA_CMD_GET_DEVICES_SUPP)
                device_create_file(&optee_device->dev,
                                   &dev_attr_need_supplicant);
 
-       return rc;
+       return 0;
 }
 
 static int __optee_enumerate_devices(u32 func)
index 5ac5cb60bae67b8caa54d47e0ebb740d6a4505ab..bc6eb0dd66a495f04ae5d0c398611895351c9b2f 100644 (file)
@@ -49,7 +49,6 @@
  */
 #define DEFAULT_DURATION_JIFFIES (6)
 
-static unsigned int target_mwait;
 static struct dentry *debug_dir;
 static bool poll_pkg_cstate_enable;
 
@@ -312,34 +311,6 @@ MODULE_PARM_DESC(window_size, "sliding window in number of clamping cycles\n"
        "\twindow size results in slower response time but more smooth\n"
        "\tclamping results. default to 2.");
 
-static void find_target_mwait(void)
-{
-       unsigned int eax, ebx, ecx, edx;
-       unsigned int highest_cstate = 0;
-       unsigned int highest_subcstate = 0;
-       int i;
-
-       if (boot_cpu_data.cpuid_level < CPUID_MWAIT_LEAF)
-               return;
-
-       cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &edx);
-
-       if (!(ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED) ||
-           !(ecx & CPUID5_ECX_INTERRUPT_BREAK))
-               return;
-
-       edx >>= MWAIT_SUBSTATE_SIZE;
-       for (i = 0; i < 7 && edx; i++, edx >>= MWAIT_SUBSTATE_SIZE) {
-               if (edx & MWAIT_SUBSTATE_MASK) {
-                       highest_cstate = i;
-                       highest_subcstate = edx & MWAIT_SUBSTATE_MASK;
-               }
-       }
-       target_mwait = (highest_cstate << MWAIT_SUBSTATE_SIZE) |
-               (highest_subcstate - 1);
-
-}
-
 struct pkg_cstate_info {
        bool skip;
        int msr_index;
@@ -759,9 +730,6 @@ static int __init powerclamp_probe(void)
                return -ENODEV;
        }
 
-       /* find the deepest mwait value */
-       find_target_mwait();
-
        return 0;
 }
 
index 900114ba4371b10fd941e0add6ade8210c74268f..fad40c4bc710341f27b7f98f6d4317aef6627ae0 100644 (file)
@@ -1249,6 +1249,9 @@ int tb_port_update_credits(struct tb_port *port)
        ret = tb_port_do_update_credits(port);
        if (ret)
                return ret;
+
+       if (!port->dual_link_port)
+               return 0;
        return tb_port_do_update_credits(port->dual_link_port);
 }
 
index 87e4795275fe6772e0497e8c50650d4a1835b1af..6f798f6a2b8488ca5011fe813733ffc8b5942f48 100644 (file)
@@ -203,7 +203,7 @@ struct tb_regs_switch_header {
 #define ROUTER_CS_5_WOP                                BIT(1)
 #define ROUTER_CS_5_WOU                                BIT(2)
 #define ROUTER_CS_5_WOD                                BIT(3)
-#define ROUTER_CS_5_C3S                                BIT(23)
+#define ROUTER_CS_5_CNS                                BIT(23)
 #define ROUTER_CS_5_PTO                                BIT(24)
 #define ROUTER_CS_5_UTO                                BIT(25)
 #define ROUTER_CS_5_HCO                                BIT(26)
index f8f0d24ff6e4629856ea8a59b3941b79633d1781..1515eff8cc3e23434202fead2a9aa038080111d6 100644 (file)
@@ -290,7 +290,7 @@ int usb4_switch_setup(struct tb_switch *sw)
        }
 
        /* TBT3 supported by the CM */
-       val |= ROUTER_CS_5_C3S;
+       val &= ~ROUTER_CS_5_CNS;
 
        return tb_sw_write(sw, &val, TB_CFG_SWITCH, ROUTER_CS_5, 1);
 }
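The rename reflects an inverted bit sense: ROUTER_CS_5 bit 23 means "CNS: TBT3 Not Supported", so advertising TBT3 support clears the bit where the old code set it. A short standalone check of the corrected operation (mask value taken from the hunk above):

#include <stdint.h>
#include <stdio.h>

#define ROUTER_CS_5_CNS (UINT32_C(1) << 23)    /* "TBT3 Not Supported" */

int main(void)
{
	uint32_t val = 0xffffffff;      /* pretend register content */

	val &= ~ROUTER_CS_5_CNS;        /* claim TBT3 support */
	printf("CNS bit now %u\n", !!(val & ROUTER_CS_5_CNS));
	return 0;
}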
index 6e05c5c7bca1ad258502eaf158b94534c1cdd23d..c2a4e88b328f35888cb44c0fe1ca5f57f2040e66 100644 (file)
@@ -108,13 +108,15 @@ config HVC_DCC_SERIALIZE_SMP
 
 config HVC_RISCV_SBI
        bool "RISC-V SBI console support"
-       depends on RISCV_SBI
+       depends on RISCV_SBI && NONPORTABLE
        select HVC_DRIVER
        help
          This enables support for console output via RISC-V SBI calls, which
-         is normally used only during boot to output printk.
+         is normally used only during boot to output printk.  This driver
+         conflicts with real console drivers and should not be enabled on
+         systems that directly access the console.
 
-         If you don't know what do to here, say Y.
+         If you don't know what to do here, say N.
 
 config HVCS
        tristate "IBM Hypervisor Virtual Console Server support"
index 2d1f350a4bea2a86103d707cc322ded0f5941abb..c1d43f040c43abc517c4ef6b48a5423e936e97f2 100644 (file)
@@ -357,9 +357,9 @@ static void dw8250_set_termios(struct uart_port *p, struct ktermios *termios,
        long rate;
        int ret;
 
-       clk_disable_unprepare(d->clk);
        rate = clk_round_rate(d->clk, newrate);
-       if (rate > 0) {
+       if (rate > 0 && p->uartclk != rate) {
+               clk_disable_unprepare(d->clk);
                /*
                 * Note that any clock-notifier worker will block in
                 * serial8250_update_uartclk() until we are done.
@@ -367,8 +367,8 @@ static void dw8250_set_termios(struct uart_port *p, struct ktermios *termios,
                ret = clk_set_rate(d->clk, newrate);
                if (!ret)
                        p->uartclk = rate;
+               clk_prepare_enable(d->clk);
        }
-       clk_prepare_enable(d->clk);
 
        dw8250_do_set_termios(p, termios, old);
 }
index 558c4c7f3104ead7e7fe420c2634428651c88dd2..2dda737b1660bd7bd6f576d2146440d97e945047 100644 (file)
@@ -311,7 +311,7 @@ static void pci1xxxx_process_read_data(struct uart_port *port,
        }
 
        while (*valid_byte_count) {
-               if (*buff_index > RX_BUF_SIZE)
+               if (*buff_index >= RX_BUF_SIZE)
                        break;
                rx_buff[*buff_index] = readb(port->membase +
                                             UART_RX_BYTE_FIFO);
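The comparison flips from > to >= because rx_buff holds exactly RX_BUF_SIZE bytes, making RX_BUF_SIZE itself one past the last valid index. A standalone illustration of the corrected bound (the buffer size here is invented):

#include <stdio.h>

#define RX_BUF_SIZE 8

int main(void)
{
	unsigned char rx_buff[RX_BUF_SIZE];
	unsigned int idx;

	for (idx = 0; ; idx++) {
		if (idx >= RX_BUF_SIZE) /* idx == RX_BUF_SIZE would overflow */
			break;
		rx_buff[idx] = (unsigned char)idx;
	}
	printf("stored %u bytes\n", idx);
	return 0;
}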
index fccec1698a54104c1487ea65536dce7729123c61..cf2c890a560f05204e249b931668deca04b3cb27 100644 (file)
@@ -1339,11 +1339,41 @@ static void pl011_start_tx_pio(struct uart_amba_port *uap)
        }
 }
 
+static void pl011_rs485_tx_start(struct uart_amba_port *uap)
+{
+       struct uart_port *port = &uap->port;
+       u32 cr;
+
+       /* Enable transmitter */
+       cr = pl011_read(uap, REG_CR);
+       cr |= UART011_CR_TXE;
+
+       /* Disable receiver if half-duplex */
+       if (!(port->rs485.flags & SER_RS485_RX_DURING_TX))
+               cr &= ~UART011_CR_RXE;
+
+       if (port->rs485.flags & SER_RS485_RTS_ON_SEND)
+               cr &= ~UART011_CR_RTS;
+       else
+               cr |= UART011_CR_RTS;
+
+       pl011_write(cr, uap, REG_CR);
+
+       if (port->rs485.delay_rts_before_send)
+               mdelay(port->rs485.delay_rts_before_send);
+
+       uap->rs485_tx_started = true;
+}
+
 static void pl011_start_tx(struct uart_port *port)
 {
        struct uart_amba_port *uap =
            container_of(port, struct uart_amba_port, port);
 
+       if ((uap->port.rs485.flags & SER_RS485_ENABLED) &&
+           !uap->rs485_tx_started)
+               pl011_rs485_tx_start(uap);
+
        if (!pl011_dma_tx_start(uap))
                pl011_start_tx_pio(uap);
 }
@@ -1424,42 +1454,12 @@ static bool pl011_tx_char(struct uart_amba_port *uap, unsigned char c,
        return true;
 }
 
-static void pl011_rs485_tx_start(struct uart_amba_port *uap)
-{
-       struct uart_port *port = &uap->port;
-       u32 cr;
-
-       /* Enable transmitter */
-       cr = pl011_read(uap, REG_CR);
-       cr |= UART011_CR_TXE;
-
-       /* Disable receiver if half-duplex */
-       if (!(port->rs485.flags & SER_RS485_RX_DURING_TX))
-               cr &= ~UART011_CR_RXE;
-
-       if (port->rs485.flags & SER_RS485_RTS_ON_SEND)
-               cr &= ~UART011_CR_RTS;
-       else
-               cr |= UART011_CR_RTS;
-
-       pl011_write(cr, uap, REG_CR);
-
-       if (port->rs485.delay_rts_before_send)
-               mdelay(port->rs485.delay_rts_before_send);
-
-       uap->rs485_tx_started = true;
-}
-
 /* Returns true if tx interrupts have to be (kept) enabled  */
 static bool pl011_tx_chars(struct uart_amba_port *uap, bool from_irq)
 {
        struct circ_buf *xmit = &uap->port.state->xmit;
        int count = uap->fifosize >> 1;
 
-       if ((uap->port.rs485.flags & SER_RS485_ENABLED) &&
-           !uap->rs485_tx_started)
-               pl011_rs485_tx_start(uap);
-
        if (uap->port.x_char) {
                if (!pl011_tx_char(uap, uap->port.x_char, from_irq))
                        return true;
index 5ddf110aedbe513b522d10e691cada5563fec4df..bbcbc91482af0bbd04db16242b4d955d21a9753a 100644 (file)
@@ -2345,9 +2345,12 @@ lpuart32_set_termios(struct uart_port *port, struct ktermios *termios,
 
        lpuart32_write(&sport->port, bd, UARTBAUD);
        lpuart32_serial_setbrg(sport, baud);
-       lpuart32_write(&sport->port, modem, UARTMODIR);
-       lpuart32_write(&sport->port, ctrl, UARTCTRL);
+       /* disable CTS before enabling UARTCTRL_TE to avoid pending idle preamble */
+       lpuart32_write(&sport->port, modem & ~UARTMODIR_TXCTSE, UARTMODIR);
        /* restore control register */
+       lpuart32_write(&sport->port, ctrl, UARTCTRL);
+       /* re-enable the CTS if needed */
+       lpuart32_write(&sport->port, modem, UARTMODIR);
 
        if ((ctrl & (UARTCTRL_PE | UARTCTRL_M)) == UARTCTRL_PE)
                sport->is_cs7 = true;
index 4aa72d5aeafbf081ac37241853c49cdb18e46ae5..e14813250616118e5ecfcfdafc1a4c1033f2bf79 100644 (file)
@@ -462,8 +462,7 @@ static void imx_uart_stop_tx(struct uart_port *port)
        }
 }
 
-/* called with port.lock taken and irqs off */
-static void imx_uart_stop_rx(struct uart_port *port)
+static void imx_uart_stop_rx_with_loopback_ctrl(struct uart_port *port, bool loopback)
 {
        struct imx_port *sport = (struct imx_port *)port;
        u32 ucr1, ucr2, ucr4, uts;
@@ -485,7 +484,7 @@ static void imx_uart_stop_rx(struct uart_port *port)
        /* See SER_RS485_ENABLED/UTS_LOOP comment in imx_uart_probe() */
        if (port->rs485.flags & SER_RS485_ENABLED &&
            port->rs485.flags & SER_RS485_RTS_ON_SEND &&
-           sport->have_rtscts && !sport->have_rtsgpio) {
+           sport->have_rtscts && !sport->have_rtsgpio && loopback) {
                uts = imx_uart_readl(sport, imx_uart_uts_reg(sport));
                uts |= UTS_LOOP;
                imx_uart_writel(sport, uts, imx_uart_uts_reg(sport));
@@ -497,6 +496,16 @@ static void imx_uart_stop_rx(struct uart_port *port)
        imx_uart_writel(sport, ucr2, UCR2);
 }
 
+/* called with port.lock taken and irqs off */
+static void imx_uart_stop_rx(struct uart_port *port)
+{
+       /*
+        * Stop RX and enable loopback in order to make sure RS485 bus
+        * is not blocked. See comment in imx_uart_probe().
+        */
+       imx_uart_stop_rx_with_loopback_ctrl(port, true);
+}
+
 /* called with port.lock taken and irqs off */
 static void imx_uart_enable_ms(struct uart_port *port)
 {
@@ -682,9 +691,14 @@ static void imx_uart_start_tx(struct uart_port *port)
                                imx_uart_rts_inactive(sport, &ucr2);
                        imx_uart_writel(sport, ucr2, UCR2);
 
+                       /*
+                        * Since we are about to transmit we can not stop RX
+                        * with loopback enabled because that would just loop
+                        * our transmitted data back to RX.
+                        */
                        if (!(port->rs485.flags & SER_RS485_RX_DURING_TX) &&
                            !port->rs485_rx_during_tx_gpio)
-                               imx_uart_stop_rx(port);
+                               imx_uart_stop_rx_with_loopback_ctrl(port, false);
 
                        sport->tx_state = WAIT_AFTER_RTS;
 
index f3a99daebdaa0e59d0d81211fad5a1f85947b011..10bf6d75bf9ee7f9ee13a36796a8af5439ac5eb9 100644 (file)
 #define MAX310x_REV_MASK               (0xf8)
 #define MAX310X_WRITE_BIT              0x80
 
+/* Port startup definitions */
+#define MAX310X_PORT_STARTUP_WAIT_RETRIES      20 /* Number of retries */
+#define MAX310X_PORT_STARTUP_WAIT_DELAY_MS     10 /* Delay between retries */
+
+/* Crystal-related definitions */
+#define MAX310X_XTAL_WAIT_RETRIES      20 /* Number of retries */
+#define MAX310X_XTAL_WAIT_DELAY_MS     10 /* Delay between retries */
+
 /* MAX3107 specific */
 #define MAX3107_REV_ID                 (0xa0)
 
@@ -583,7 +591,7 @@ static int max310x_update_best_err(unsigned long f, long *besterr)
        return 1;
 }
 
-static u32 max310x_set_ref_clk(struct device *dev, struct max310x_port *s,
+static s32 max310x_set_ref_clk(struct device *dev, struct max310x_port *s,
                               unsigned long freq, bool xtal)
 {
        unsigned int div, clksrc, pllcfg = 0;
@@ -641,12 +649,20 @@ static u32 max310x_set_ref_clk(struct device *dev, struct max310x_port *s,
 
        /* Wait for crystal */
        if (xtal) {
-               unsigned int val;
-               msleep(10);
-               regmap_read(s->regmap, MAX310X_STS_IRQSTS_REG, &val);
-               if (!(val & MAX310X_STS_CLKREADY_BIT)) {
-                       dev_warn(dev, "clock is not stable yet\n");
-               }
+               bool stable = false;
+               unsigned int try = 0, val = 0;
+
+               do {
+                       msleep(MAX310X_XTAL_WAIT_DELAY_MS);
+                       regmap_read(s->regmap, MAX310X_STS_IRQSTS_REG, &val);
+
+                       if (val & MAX310X_STS_CLKREADY_BIT)
+                               stable = true;
+               } while (!stable && (++try < MAX310X_XTAL_WAIT_RETRIES));
+
+               if (!stable)
+                       return dev_err_probe(dev, -EAGAIN,
+                                            "clock is not stable\n");
        }
 
        return bestfreq;
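Both new wait loops replace a single fixed sleep (or, in the port-startup case, an unbounded spin) with a bounded poll: sleep, re-read the status, and fail with an error after a fixed number of tries. A standalone sketch of the idiom, with a fake status read that settles after a few polls:

#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

#define WAIT_RETRIES  20
#define WAIT_DELAY_US 10000

static bool read_ready(void)
{
	static int polls;

	return ++polls >= 3;    /* pretend hardware settles on poll 3 */
}

int main(void)
{
	bool stable = false;
	unsigned int try = 0;

	do {
		usleep(WAIT_DELAY_US);
		if (read_ready())
			stable = true;
	} while (!stable && ++try < WAIT_RETRIES);

	printf(stable ? "clock stable after %u retries\n"
		      : "gave up after %u retries\n", try);
	return 0;
}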
@@ -1271,7 +1287,7 @@ static int max310x_probe(struct device *dev, const struct max310x_devtype *devty
 {
        int i, ret, fmin, fmax, freq;
        struct max310x_port *s;
-       u32 uartclk = 0;
+       s32 uartclk = 0;
        bool xtal;
 
        for (i = 0; i < devtype->nr; i++)
@@ -1334,6 +1350,9 @@ static int max310x_probe(struct device *dev, const struct max310x_devtype *devty
                goto out_clk;
 
        for (i = 0; i < devtype->nr; i++) {
+               bool started = false;
+               unsigned int try = 0, val = 0;
+
                /* Reset port */
                regmap_write(regmaps[i], MAX310X_MODE2_REG,
                             MAX310X_MODE2_RST_BIT);
@@ -1342,13 +1361,27 @@ static int max310x_probe(struct device *dev, const struct max310x_devtype *devty
 
                /* Wait for port startup */
                do {
-                       regmap_read(regmaps[i], MAX310X_BRGDIVLSB_REG, &ret);
-               } while (ret != 0x01);
+                       msleep(MAX310X_PORT_STARTUP_WAIT_DELAY_MS);
+                       regmap_read(regmaps[i], MAX310X_BRGDIVLSB_REG, &val);
+
+                       if (val == 0x01)
+                               started = true;
+               } while (!started && (++try < MAX310X_PORT_STARTUP_WAIT_RETRIES));
+
+               if (!started) {
+                       ret = dev_err_probe(dev, -EAGAIN, "port reset failed\n");
+                       goto out_uart;
+               }
 
                regmap_write(regmaps[i], MAX310X_MODE1_REG, devtype->mode1);
        }
 
        uartclk = max310x_set_ref_clk(dev, s, freq, xtal);
+       if (uartclk < 0) {
+               ret = uartclk;
+               goto out_uart;
+       }
+
        dev_dbg(dev, "Reference clock set to %i Hz\n", uartclk);
 
        for (i = 0; i < devtype->nr; i++) {
index 3ec725555bcc1e6eb2843a186a5c49cba1ab0b42..4749331fe618cad7c0af98630f90021b8244bd07 100644 (file)
@@ -605,13 +605,16 @@ static void mxs_auart_tx_chars(struct mxs_auart_port *s)
                return;
        }
 
-       pending = uart_port_tx(&s->port, ch,
+       pending = uart_port_tx_flags(&s->port, ch, UART_TX_NOSTOP,
                !(mxs_read(s, REG_STAT) & AUART_STAT_TXFF),
                mxs_write(ch, s, REG_DATA));
        if (pending)
                mxs_set(AUART_INTR_TXIEN, s, REG_INTR);
        else
                mxs_clr(AUART_INTR_TXIEN, s, REG_INTR);
+
+       if (uart_tx_stopped(&s->port))
+               mxs_auart_stop_tx(&s->port);
 }
 
 static void mxs_auart_rx_char(struct mxs_auart_port *s)
index e63a8fbe63bdb22b70fd9362fae5d557c4c59b75..99e08737f293c6868e56d2de80f31bf0e3345ca3 100644 (file)
@@ -851,19 +851,21 @@ static void qcom_geni_serial_stop_tx(struct uart_port *uport)
 }
 
 static void qcom_geni_serial_send_chunk_fifo(struct uart_port *uport,
-                                            unsigned int remaining)
+                                            unsigned int chunk)
 {
        struct qcom_geni_serial_port *port = to_dev_port(uport);
        struct circ_buf *xmit = &uport->state->xmit;
-       unsigned int tx_bytes;
+       unsigned int tx_bytes, c, remaining = chunk;
        u8 buf[BYTES_PER_FIFO_WORD];
 
        while (remaining) {
                memset(buf, 0, sizeof(buf));
                tx_bytes = min(remaining, BYTES_PER_FIFO_WORD);
 
-               memcpy(buf, &xmit->buf[xmit->tail], tx_bytes);
-               uart_xmit_advance(uport, tx_bytes);
+               for (c = 0; c < tx_bytes ; c++) {
+                       buf[c] = xmit->buf[xmit->tail];
+                       uart_xmit_advance(uport, 1);
+               }
 
                iowrite32_rep(uport->membase + SE_GENI_TX_FIFOn, buf, 1);
 
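The old chunked memcpy from xmit->buf[xmit->tail] could read past the end of the circular transmit buffer when the tail sat near the wrap point; copying one byte at a time and advancing the tail between bytes lets it wrap safely. A standalone model of the corrected copy (ring size invented, mask-based wrap as in the kernel's circ_buf):

#include <stdio.h>

#define RING_SIZE 8     /* power of two, like UART_XMIT_SIZE */

int main(void)
{
	char ring[RING_SIZE] = "ABCDEFGH";
	unsigned int tail = 6;  /* 3 pending bytes: G, H, then wrap to A */
	char out[3];
	unsigned int c;

	for (c = 0; c < sizeof(out); c++) {
		out[c] = ring[tail];
		tail = (tail + 1) & (RING_SIZE - 1);    /* advance + wrap */
	}
	printf("sent: %.3s\n", out);    /* "GHA", no out-of-bounds read */
	return 0;
}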
index b56ed8c376b22fc5ec8c0e833a39dd976a7a58da..d6a58a9e072a1dad7938fbb53627f4d5e5374adc 100644 (file)
@@ -1084,8 +1084,8 @@ static int uart_tiocmget(struct tty_struct *tty)
                goto out;
 
        if (!tty_io_error(tty)) {
-               result = uport->mctrl;
                uart_port_lock_irq(uport);
+               result = uport->mctrl;
                result |= uport->ops->get_mctrl(uport);
                uart_port_unlock_irq(uport);
        }
index 88975a4df3060599b233abb49fe07ca53ba15b3a..72b6f4f326e2b04953875062f8ebc6e56f324e45 100644 (file)
@@ -46,8 +46,31 @@ out:
        return 0;
 }
 
+static int serial_port_runtime_suspend(struct device *dev)
+{
+       struct serial_port_device *port_dev = to_serial_base_port_device(dev);
+       struct uart_port *port = port_dev->port;
+       unsigned long flags;
+       bool busy;
+
+       if (port->flags & UPF_DEAD)
+               return 0;
+
+       uart_port_lock_irqsave(port, &flags);
+       busy = __serial_port_busy(port);
+       if (busy)
+               port->ops->start_tx(port);
+       uart_port_unlock_irqrestore(port, flags);
+
+       if (busy)
+               pm_runtime_mark_last_busy(dev);
+
+       return busy ? -EBUSY : 0;
+}
+
 static DEFINE_RUNTIME_DEV_PM_OPS(serial_port_pm,
-                                NULL, serial_port_runtime_resume, NULL);
+                                serial_port_runtime_suspend,
+                                serial_port_runtime_resume, NULL);
 
 static int serial_port_probe(struct device *dev)
 {
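The new runtime-suspend hook vetoes suspend while the port still has queued data and kicks the transmitter so the backlog drains before the next attempt. The control flow, reduced to standalone C (the busy flag and callbacks are invented stand-ins):

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

static bool port_busy = true;   /* pretend bytes are still queued */

static void start_tx(void)
{
	port_busy = false;      /* pretend kicking TX drains the FIFO */
}

static int runtime_suspend(void)
{
	if (port_busy) {
		start_tx();     /* restart TX, then refuse to suspend */
		return -EBUSY;
	}
	return 0;
}

int main(void)
{
	printf("first attempt:  %d\n", runtime_suspend());  /* -EBUSY */
	printf("second attempt: %d\n", runtime_suspend());  /* 0 */
	return 0;
}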
index 794b7751274034848c65a7e3374b694bcf61c42d..693e932d6feb5842467d1408e04c8d574342cb1f 100644 (file)
@@ -251,7 +251,9 @@ static int stm32_usart_config_rs485(struct uart_port *port, struct ktermios *ter
                writel_relaxed(cr3, port->membase + ofs->cr3);
                writel_relaxed(cr1, port->membase + ofs->cr1);
 
-               rs485conf->flags |= SER_RS485_RX_DURING_TX;
+               if (!port->rs485_rx_during_tx_gpio)
+                       rs485conf->flags |= SER_RS485_RX_DURING_TX;
+
        } else {
                stm32_usart_clr_bits(port, ofs->cr3,
                                     USART_CR3_DEM | USART_CR3_DEP);
index 156efda7c80d64b3c512d8cc84f228521aefec11..38a765eadbe2bc81494f1fbd7a63b50b87101f08 100644 (file)
@@ -381,7 +381,7 @@ static void vc_uniscr_delete(struct vc_data *vc, unsigned int nr)
                u32 *ln = vc->vc_uni_lines[vc->state.y];
                unsigned int x = vc->state.x, cols = vc->vc_cols;
 
-               memcpy(&ln[x], &ln[x + nr], (cols - x - nr) * sizeof(*ln));
+               memmove(&ln[x], &ln[x + nr], (cols - x - nr) * sizeof(*ln));
                memset32(&ln[cols - nr], ' ', nr);
        }
 }
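memcpy is undefined for overlapping source and destination, which is exactly what happens when the remainder of a line is shifted left in place; memmove is specified to handle the overlap. A standalone demonstration, deleting 2 entries at index 1 of a 6-entry line:

#include <stdio.h>
#include <string.h>

int main(void)
{
	unsigned int ln[6] = { 1, 2, 3, 4, 5, 6 };
	unsigned int x = 1, nr = 2, cols = 6, i;

	/* Overlapping shift: entries slide left within the same array. */
	memmove(&ln[x], &ln[x + nr], (cols - x - nr) * sizeof(*ln));

	for (i = 0; i < cols; i++)
		printf("%u ", ln[i]);
	printf("\n");   /* 1 4 5 6 5 6 (tail not yet cleared) */
	return 0;
}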
index 029d017fc1b66b5c6695096016b54983e26b3e5f..eac7fff6992d0a863ec674c4c3c33f50b0d1f51e 100644 (file)
@@ -1469,7 +1469,7 @@ static int ufshcd_devfreq_target(struct device *dev,
        int ret = 0;
        struct ufs_hba *hba = dev_get_drvdata(dev);
        ktime_t start;
-       bool scale_up, sched_clk_scaling_suspend_work = false;
+       bool scale_up = false, sched_clk_scaling_suspend_work = false;
        struct list_head *clk_list = &hba->clk_list_head;
        struct ufs_clk_info *clki;
        unsigned long irq_flags;
@@ -3057,7 +3057,7 @@ bool ufshcd_cmd_inflight(struct scsi_cmnd *cmd)
  */
 static int ufshcd_clear_cmd(struct ufs_hba *hba, u32 task_tag)
 {
-       u32 mask = 1U << task_tag;
+       u32 mask;
        unsigned long flags;
        int err;
 
@@ -3075,6 +3075,8 @@ static int ufshcd_clear_cmd(struct ufs_hba *hba, u32 task_tag)
                return 0;
        }
 
+       mask = 1U << task_tag;
+
        /* clear outstanding transaction before retry */
        spin_lock_irqsave(hba->host->host_lock, flags);
        ufshcd_utrl_clear(hba, mask);
@@ -6352,7 +6354,6 @@ static void ufshcd_err_handling_prepare(struct ufs_hba *hba)
                ufshcd_hold(hba);
                if (!ufshcd_is_clkgating_allowed(hba))
                        ufshcd_setup_clocks(hba, true);
-               ufshcd_release(hba);
                pm_op = hba->is_sys_suspended ? UFS_SYSTEM_PM : UFS_RUNTIME_PM;
                ufshcd_vops_resume(hba, pm_op);
        } else {
@@ -10592,7 +10593,7 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq)
        err = blk_mq_alloc_tag_set(&hba->tmf_tag_set);
        if (err < 0)
                goto out_remove_scsi_host;
-       hba->tmf_queue = blk_mq_init_queue(&hba->tmf_tag_set);
+       hba->tmf_queue = blk_mq_alloc_queue(&hba->tmf_tag_set, NULL, NULL);
        if (IS_ERR(hba->tmf_queue)) {
                err = PTR_ERR(hba->tmf_queue);
                goto free_tmf_tag_set;
index aeca902ab6cc427b0946cf13ea9b8c725eb3f287..fd1beb10bba726cef258e7438d642f31d6567dfe 100644 (file)
@@ -828,7 +828,11 @@ void cdns3_gadget_giveback(struct cdns3_endpoint *priv_ep,
                        return;
        }
 
-       if (request->complete) {
+       /*
+        * The zlp request is appended by the driver itself, so there is no
+        * need to call usb_gadget_giveback_request() to notify the gadget
+        * composite driver.
+        */
+       if (request->complete && request->buf != priv_dev->zlp_buf) {
                spin_unlock(&priv_dev->lock);
                usb_gadget_giveback_request(&priv_ep->endpoint,
                                            request);
@@ -2540,11 +2544,11 @@ static int cdns3_gadget_ep_disable(struct usb_ep *ep)
 
        while (!list_empty(&priv_ep->wa2_descmiss_req_list)) {
                priv_req = cdns3_next_priv_request(&priv_ep->wa2_descmiss_req_list);
+               list_del_init(&priv_req->list);
 
                kfree(priv_req->request.buf);
                cdns3_gadget_ep_free_request(&priv_ep->endpoint,
                                             &priv_req->request);
-               list_del_init(&priv_req->list);
                --priv_ep->wa2_counter;
        }
 
index 33548771a0d3a7212781ff39814fedb7d01f0ab4..465e9267b49c12768ac72ecb818f731fc8787641 100644 (file)
@@ -395,7 +395,6 @@ pm_put:
        return ret;
 }
 
-
 /**
  * cdns_wakeup_irq - interrupt handler for wakeup events
  * @irq: irq number for cdns3/cdnsp core device
index 04b6d12f2b9a39b9bfad76fe1909b22f7c010990..ee917f1b091c893ebccad19bd5a62aea9e65c721 100644 (file)
@@ -156,7 +156,8 @@ bool cdns_is_device(struct cdns *cdns)
  */
 static void cdns_otg_disable_irq(struct cdns *cdns)
 {
-       writel(0, &cdns->otg_irq_regs->ien);
+       if (cdns->version)
+               writel(0, &cdns->otg_irq_regs->ien);
 }
 
 /**
@@ -422,15 +423,20 @@ int cdns_drd_init(struct cdns *cdns)
 
                cdns->otg_regs = (void __iomem *)&cdns->otg_v1_regs->cmd;
 
-               if (readl(&cdns->otg_cdnsp_regs->did) == OTG_CDNSP_DID) {
+               state = readl(&cdns->otg_cdnsp_regs->did);
+
+               if (OTG_CDNSP_CHECK_DID(state)) {
                        cdns->otg_irq_regs = (struct cdns_otg_irq_regs __iomem *)
                                              &cdns->otg_cdnsp_regs->ien;
                        cdns->version  = CDNSP_CONTROLLER_V2;
-               } else {
+               } else if (OTG_CDNS3_CHECK_DID(state)) {
                        cdns->otg_irq_regs = (struct cdns_otg_irq_regs __iomem *)
                                              &cdns->otg_v1_regs->ien;
                        writel(1, &cdns->otg_v1_regs->simulate);
                        cdns->version  = CDNS3_CONTROLLER_V1;
+               } else {
+                       dev_err(cdns->dev, "unsupported DID=0x%08x\n", state);
+                       return -EINVAL;
                }
 
                dev_dbg(cdns->dev, "DRD version v1 (ID: %08x, rev: %08x)\n",
@@ -483,7 +489,6 @@ int cdns_drd_exit(struct cdns *cdns)
        return 0;
 }
 
-
 /* Indicate the cdns3 core was power lost before */
 bool cdns_power_is_lost(struct cdns *cdns)
 {
index cbdf94f73ed917bb14baf23a9087b10aca2f7015..d72370c321d3929fc477854585d9e46be6848fef 100644 (file)
@@ -79,7 +79,11 @@ struct cdnsp_otg_regs {
        __le32 susp_timing_ctrl;
 };
 
-#define OTG_CDNSP_DID  0x0004034E
+/* CDNSP driver supports 0x000403xx Cadence USB controller family. */
+#define OTG_CDNSP_CHECK_DID(did) (((did) & GENMASK(31, 8)) == 0x00040300)
+
+/* CDNS3 driver supports 0x000402xx Cadence USB controller family. */
+#define OTG_CDNS3_CHECK_DID(did) (((did) & GENMASK(31, 8)) == 0x00040200)
 
 /*
  * Common registers interface for both CDNS3 and CDNSP version of DRD.
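The exact-match on one DID becomes a masked family match: bits 31:8 select the controller family and bits 7:0 carry the revision. A standalone check with the mask open-coded (GENMASK(31, 8) is 0xffffff00) and one sample ID per family (the CDNS3 sample revision is invented):

#include <stdint.h>
#include <stdio.h>

#define DID_FAMILY_MASK 0xffffff00u     /* GENMASK(31, 8) */
#define CDNSP_FAMILY    0x00040300u
#define CDNS3_FAMILY    0x00040200u

static const char *classify(uint32_t did)
{
	if ((did & DID_FAMILY_MASK) == CDNSP_FAMILY)
		return "CDNSP (v2)";
	if ((did & DID_FAMILY_MASK) == CDNS3_FAMILY)
		return "CDNS3 (v1)";
	return "unsupported";
}

int main(void)
{
	printf("0x0004034e -> %s\n", classify(0x0004034e));
	printf("0x00040210 -> %s\n", classify(0x00040210));
	return 0;
}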
index 6164fc4c96a49b60b73f772bdc92b8acf383269c..ceca4d839dfd42b87167f4de3019ab63776fa6c2 100644 (file)
 #include "../host/xhci.h"
 #include "../host/xhci-plat.h"
 
+/*
+ * The XECP_PORT_CAP_REG and XECP_AUX_CTRL_REG1 registers exist only
+ * in the Cadence USB3 dual-role controller, so they can't be used
+ * with the Cadence CDNSP dual-role controller.
+ */
 #define XECP_PORT_CAP_REG      0x8000
 #define XECP_AUX_CTRL_REG1     0x8120
 
@@ -57,6 +62,8 @@ static const struct xhci_plat_priv xhci_plat_cdns3_xhci = {
        .resume_quirk = xhci_cdns3_resume_quirk,
 };
 
+static const struct xhci_plat_priv xhci_plat_cdnsp_xhci;
+
 static int __cdns_host_init(struct cdns *cdns)
 {
        struct platform_device *xhci;
@@ -81,8 +88,13 @@ static int __cdns_host_init(struct cdns *cdns)
                goto err1;
        }
 
-       cdns->xhci_plat_data = kmemdup(&xhci_plat_cdns3_xhci,
-                       sizeof(struct xhci_plat_priv), GFP_KERNEL);
+       if (cdns->version < CDNSP_CONTROLLER_V2)
+               cdns->xhci_plat_data = kmemdup(&xhci_plat_cdns3_xhci,
+                               sizeof(struct xhci_plat_priv), GFP_KERNEL);
+       else
+               cdns->xhci_plat_data = kmemdup(&xhci_plat_cdnsp_xhci,
+                               sizeof(struct xhci_plat_priv), GFP_KERNEL);
+
        if (!cdns->xhci_plat_data) {
                ret = -ENOMEM;
                goto err1;
index d9bb3d3f026e68cae40de5dee4fa9d81ed391f10..2a38e1eb65466c82a6eb9e4f2feba8fc59ee7dfc 100644 (file)
@@ -176,6 +176,7 @@ struct hw_bank {
  * @enabled_otg_timer_bits: bits of enabled otg timers
  * @next_otg_timer: next nearest enabled timer to be expired
  * @work: work for role changing
+ * @power_lost_work: work for power lost handling
  * @wq: workqueue thread
  * @qh_pool: allocation pool for queue heads
  * @td_pool: allocation pool for transfer descriptors
@@ -226,6 +227,7 @@ struct ci_hdrc {
        enum otg_fsm_timer              next_otg_timer;
        struct usb_role_switch          *role_switch;
        struct work_struct              work;
+       struct work_struct              power_lost_work;
        struct workqueue_struct         *wq;
 
        struct dma_pool                 *qh_pool;
index 41014f93cfdf35ee42e859244e995a43ede7e777..835bf2428dc6eccee263b05024d42885884cd94d 100644 (file)
@@ -856,6 +856,27 @@ static int ci_extcon_register(struct ci_hdrc *ci)
        return 0;
 }
 
+static void ci_power_lost_work(struct work_struct *work)
+{
+       struct ci_hdrc *ci = container_of(work, struct ci_hdrc, power_lost_work);
+       enum ci_role role;
+
+       disable_irq_nosync(ci->irq);
+       pm_runtime_get_sync(ci->dev);
+       if (!ci_otg_is_fsm_mode(ci)) {
+               role = ci_get_role(ci);
+
+               if (ci->role != role) {
+                       ci_handle_id_switch(ci);
+               } else if (role == CI_ROLE_GADGET) {
+                       if (ci->is_otg && hw_read_otgsc(ci, OTGSC_BSV))
+                               usb_gadget_vbus_connect(&ci->gadget);
+               }
+       }
+       pm_runtime_put_sync(ci->dev);
+       enable_irq(ci->irq);
+}
+
 static DEFINE_IDA(ci_ida);
 
 struct platform_device *ci_hdrc_add_device(struct device *dev,
@@ -1045,6 +1066,8 @@ static int ci_hdrc_probe(struct platform_device *pdev)
 
        spin_lock_init(&ci->lock);
        mutex_init(&ci->mutex);
+       INIT_WORK(&ci->power_lost_work, ci_power_lost_work);
+
        ci->dev = dev;
        ci->platdata = dev_get_platdata(dev);
        ci->imx28_write_fix = !!(ci->platdata->flags &
@@ -1396,25 +1419,6 @@ static int ci_suspend(struct device *dev)
        return 0;
 }
 
-static void ci_handle_power_lost(struct ci_hdrc *ci)
-{
-       enum ci_role role;
-
-       disable_irq_nosync(ci->irq);
-       if (!ci_otg_is_fsm_mode(ci)) {
-               role = ci_get_role(ci);
-
-               if (ci->role != role) {
-                       ci_handle_id_switch(ci);
-               } else if (role == CI_ROLE_GADGET) {
-                       if (ci->is_otg && hw_read_otgsc(ci, OTGSC_BSV))
-                               usb_gadget_vbus_connect(&ci->gadget);
-               }
-       }
-
-       enable_irq(ci->irq);
-}
-
 static int ci_resume(struct device *dev)
 {
        struct ci_hdrc *ci = dev_get_drvdata(dev);
@@ -1446,7 +1450,7 @@ static int ci_resume(struct device *dev)
                ci_role(ci)->resume(ci, power_lost);
 
        if (power_lost)
-               ci_handle_power_lost(ci);
+               queue_work(system_freezable_wq, &ci->power_lost_work);
 
        if (ci->supports_runtime_pm) {
                pm_runtime_disable(dev);
index 84d91b1c1eed53e11539b69ccaa80080870ab043..0886b19d2e1c8f2b1c0f4e8bf85d6240f7cf19d1 100644 (file)
@@ -301,7 +301,7 @@ static int ulpi_register(struct device *dev, struct ulpi *ulpi)
                return ret;
        }
 
-       root = debugfs_create_dir(dev_name(dev), ulpi_root);
+       root = debugfs_create_dir(dev_name(&ulpi->dev), ulpi_root);
        debugfs_create_file("regs", 0444, root, ulpi, &ulpi_regs_fops);
 
        dev_dbg(&ulpi->dev, "registered ULPI PHY: vendor %04x, product %04x\n",
index ffd7c99e24a3624fba07e8277866a5526c6695eb..e38a4124f6102a5ff2a47107a8286815cfc5c8e2 100644 (file)
@@ -2053,9 +2053,19 @@ static void update_port_device_state(struct usb_device *udev)
 
        if (udev->parent) {
                hub = usb_hub_to_struct_hub(udev->parent);
-               port_dev = hub->ports[udev->portnum - 1];
-               WRITE_ONCE(port_dev->state, udev->state);
-               sysfs_notify_dirent(port_dev->state_kn);
+
+               /*
+                * The Link Layer Validation System Driver (lvstest)
+                * has a test step to unbind the hub before running the
+                * rest of the procedure. This triggers hub_disconnect
+                * which will set the hub's maxchild to 0, further
+                * resulting in usb_hub_to_struct_hub returning NULL.
+                */
+               if (hub) {
+                       port_dev = hub->ports[udev->portnum - 1];
+                       WRITE_ONCE(port_dev->state, udev->state);
+                       sysfs_notify_dirent(port_dev->state_kn);
+               }
        }
 }
 
@@ -2388,17 +2398,25 @@ static int usb_enumerate_device_otg(struct usb_device *udev)
                        }
                } else if (desc->bLength == sizeof
                                (struct usb_otg_descriptor)) {
-                       /* Set a_alt_hnp_support for legacy otg device */
-                       err = usb_control_msg(udev,
-                               usb_sndctrlpipe(udev, 0),
-                               USB_REQ_SET_FEATURE, 0,
-                               USB_DEVICE_A_ALT_HNP_SUPPORT,
-                               0, NULL, 0,
-                               USB_CTRL_SET_TIMEOUT);
-                       if (err < 0)
-                               dev_err(&udev->dev,
-                                       "set a_alt_hnp_support failed: %d\n",
-                                       err);
+                       /*
+                        * We are operating on a legacy OTG device.
+                        * Such devices should be told that they are
+                        * operating on the wrong port if we have another
+                        * port that does support HNP.
+                        */
+                       if (bus->otg_port != 0) {
+                               /* Set a_alt_hnp_support for legacy otg device */
+                               err = usb_control_msg(udev,
+                                       usb_sndctrlpipe(udev, 0),
+                                       USB_REQ_SET_FEATURE, 0,
+                                       USB_DEVICE_A_ALT_HNP_SUPPORT,
+                                       0, NULL, 0,
+                                       USB_CTRL_SET_TIMEOUT);
+                               if (err < 0)
+                                       dev_err(&udev->dev,
+                                               "set a_alt_hnp_support failed: %d\n",
+                                               err);
+                       }
                }
        }
 #endif
index c628c1abc90711cb9b8e652a0d903a6359c968bc..4d63496f98b6c45074eee270db26c53b4019dac6 100644 (file)
@@ -573,7 +573,7 @@ static int match_location(struct usb_device *peer_hdev, void *p)
        struct usb_hub *peer_hub = usb_hub_to_struct_hub(peer_hdev);
        struct usb_device *hdev = to_usb_device(port_dev->dev.parent->parent);
 
-       if (!peer_hub)
+       if (!peer_hub || port_dev->connect_type == USB_PORT_NOT_USED)
                return 0;
 
        hcd = bus_to_hcd(hdev->bus);
@@ -584,7 +584,8 @@ static int match_location(struct usb_device *peer_hdev, void *p)
 
        for (port1 = 1; port1 <= peer_hdev->maxchild; port1++) {
                peer = peer_hub->ports[port1 - 1];
-               if (peer && peer->location == port_dev->location) {
+               if (peer && peer->connect_type != USB_PORT_NOT_USED &&
+                   peer->location == port_dev->location) {
                        link_peers_report(port_dev, peer);
                        return 1; /* done */
                }
index e3eea965e57bfd3d32fa6b1cb52fd4072734a30d..e120611a5174f7589ac124641a7b279654babff6 100644 (file)
 /* Global HWPARAMS4 Register */
 #define DWC3_GHWPARAMS4_HIBER_SCRATCHBUFS(n)   (((n) & (0x0f << 13)) >> 13)
 #define DWC3_MAX_HIBER_SCRATCHBUFS             15
-#define DWC3_EXT_BUFF_CONTROL          BIT(21)
 
 /* Global HWPARAMS6 Register */
 #define DWC3_GHWPARAMS6_BCSUPPORT              BIT(14)
index 6604845c397cd2171ee55966fc3ba80f3f2538d1..39564e17f3b07a228d54e503f0926c7b9bb810cf 100644 (file)
@@ -51,6 +51,8 @@
 #define PCI_DEVICE_ID_INTEL_MTLP               0x7ec1
 #define PCI_DEVICE_ID_INTEL_MTLS               0x7f6f
 #define PCI_DEVICE_ID_INTEL_MTL                        0x7e7e
+#define PCI_DEVICE_ID_INTEL_ARLH               0x7ec1
+#define PCI_DEVICE_ID_INTEL_ARLH_PCH           0x777e
 #define PCI_DEVICE_ID_INTEL_TGL                        0x9a15
 #define PCI_DEVICE_ID_AMD_MR                   0x163a
 
@@ -421,6 +423,8 @@ static const struct pci_device_id dwc3_pci_id_table[] = {
        { PCI_DEVICE_DATA(INTEL, MTLP, &dwc3_pci_intel_swnode) },
        { PCI_DEVICE_DATA(INTEL, MTL, &dwc3_pci_intel_swnode) },
        { PCI_DEVICE_DATA(INTEL, MTLS, &dwc3_pci_intel_swnode) },
+       { PCI_DEVICE_DATA(INTEL, ARLH, &dwc3_pci_intel_swnode) },
+       { PCI_DEVICE_DATA(INTEL, ARLH_PCH, &dwc3_pci_intel_swnode) },
        { PCI_DEVICE_DATA(INTEL, TGL, &dwc3_pci_intel_swnode) },
 
        { PCI_DEVICE_DATA(AMD, NL_USB, &dwc3_pci_amd_swnode) },
index 019368f8e9c4c3b2c26778eecc39fb23c6d614e6..28f49400f3e8b178e23c881120577da461178c35 100644 (file)
@@ -673,12 +673,6 @@ static int dwc3_gadget_set_ep_config(struct dwc3_ep *dep, unsigned int action)
                params.param1 |= DWC3_DEPCFG_BINTERVAL_M1(bInterval_m1);
        }
 
-       if (dep->endpoint.fifo_mode) {
-               if (!(dwc->hwparams.hwparams4 & DWC3_EXT_BUFF_CONTROL))
-                       return -EINVAL;
-               params.param1 |= DWC3_DEPCFG_EBC_HWO_NOWB | DWC3_DEPCFG_USE_EBC;
-       }
-
        return dwc3_send_gadget_ep_cmd(dep, DWC3_DEPCMD_SETEPCONFIG, &params);
 }
 
@@ -2656,6 +2650,11 @@ static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc)
        int ret;
 
        spin_lock_irqsave(&dwc->lock, flags);
+       if (!dwc->pullups_connected) {
+               spin_unlock_irqrestore(&dwc->lock, flags);
+               return 0;
+       }
+
        dwc->connected = false;
 
        /*
@@ -4709,15 +4708,13 @@ int dwc3_gadget_suspend(struct dwc3 *dwc)
        unsigned long flags;
        int ret;
 
-       if (!dwc->gadget_driver)
-               return 0;
-
        ret = dwc3_gadget_soft_disconnect(dwc);
        if (ret)
                goto err;
 
        spin_lock_irqsave(&dwc->lock, flags);
-       dwc3_disconnect_gadget(dwc);
+       if (dwc->gadget_driver)
+               dwc3_disconnect_gadget(dwc);
        spin_unlock_irqrestore(&dwc->lock, flags);
 
        return 0;
index fd7a4e94397e64ccc74e362e5d73319918fdbae6..55a56cf67d7364998f9f4a42fd95e5d856cd105c 100644 (file)
@@ -26,8 +26,6 @@ struct dwc3;
 #define DWC3_DEPCFG_XFER_NOT_READY_EN  BIT(10)
 #define DWC3_DEPCFG_FIFO_ERROR_EN      BIT(11)
 #define DWC3_DEPCFG_STREAM_EVENT_EN    BIT(13)
-#define DWC3_DEPCFG_EBC_HWO_NOWB       BIT(14)
-#define DWC3_DEPCFG_USE_EBC            BIT(15)
 #define DWC3_DEPCFG_BINTERVAL_M1(n)    (((n) & 0xff) << 16)
 #define DWC3_DEPCFG_STREAM_CAPABLE     BIT(24)
 #define DWC3_DEPCFG_EP_NUMBER(n)       (((n) & 0x1f) << 25)
index 61f57fe5bb783bcf676cdb47177c66bb2a2e81be..43230915323c7dfa6625bfbbe67b1f8df238dcd4 100644 (file)
@@ -61,7 +61,7 @@ out:
 
 int dwc3_host_init(struct dwc3 *dwc)
 {
-       struct property_entry   props[4];
+       struct property_entry   props[5];
        struct platform_device  *xhci;
        int                     ret, irq;
        int                     prop_idx = 0;
@@ -89,6 +89,8 @@ int dwc3_host_init(struct dwc3 *dwc)
 
        memset(props, 0, sizeof(struct property_entry) * ARRAY_SIZE(props));
 
+       props[prop_idx++] = PROPERTY_ENTRY_BOOL("xhci-sg-trb-cache-size-quirk");
+
        if (dwc->usb3_lpm_capable)
                props[prop_idx++] = PROPERTY_ENTRY_BOOL("usb3-lpm-capable");
 
index 722a3ab2b337935e546806e21d1eab357f8f54e7..c265a1f62fc1451dacba18723e0ff75dbbebfc6d 100644 (file)
@@ -545,21 +545,37 @@ static int start_transfer(struct fsg_dev *fsg, struct usb_ep *ep,
 
 static bool start_in_transfer(struct fsg_common *common, struct fsg_buffhd *bh)
 {
+       int rc;
+
        if (!fsg_is_set(common))
                return false;
        bh->state = BUF_STATE_SENDING;
-       if (start_transfer(common->fsg, common->fsg->bulk_in, bh->inreq))
+       rc = start_transfer(common->fsg, common->fsg->bulk_in, bh->inreq);
+       if (rc) {
                bh->state = BUF_STATE_EMPTY;
+               if (rc == -ESHUTDOWN) {
+                       common->running = 0;
+                       return false;
+               }
+       }
        return true;
 }
 
 static bool start_out_transfer(struct fsg_common *common, struct fsg_buffhd *bh)
 {
+       int rc;
+
        if (!fsg_is_set(common))
                return false;
        bh->state = BUF_STATE_RECEIVING;
-       if (start_transfer(common->fsg, common->fsg->bulk_out, bh->outreq))
+       rc = start_transfer(common->fsg, common->fsg->bulk_out, bh->outreq);
+       if (rc) {
                bh->state = BUF_STATE_FULL;
+               if (rc == -ESHUTDOWN) {
+                       common->running = 0;
+                       return false;
+               }
+       }
        return true;
 }
 
index a1575a0ca568d7c46bc23bd605404096db14e6e5..28f4e6552e84592566d261ec3174773650c5d444 100644 (file)
@@ -105,8 +105,8 @@ static inline struct f_ncm *func_to_ncm(struct usb_function *f)
 
 /*
  * Although max mtu as dictated by u_ether is 15412 bytes, setting
- * max_segment_sizeto 15426 would not be efficient. If user chooses segment
- * size to be (>= 8192), then we can't aggregate more than one  buffer in each
+ * max_segment_size to 15426 would not be efficient. If user chooses segment
+ * size to be (>= 8192), then we can't aggregate more than one buffer in each
  * NTB (assuming each packet coming from network layer is >= 8192 bytes) as ep
  * maxpacket limit is 16384. So let max_segment_size be limited to 8000 to allow
  * at least 2 packets to be aggregated reducing wastage of NTB buffer space
@@ -1338,7 +1338,15 @@ parse_ntb:
             "Parsed NTB with %d frames\n", dgram_counter);
 
        to_process -= block_len;
-       if (to_process != 0) {
+
+       /*
+        * Windows NCM driver avoids USB ZLPs by adding a 1-byte
+        * zero pad as needed.
+        */
+       if (to_process == 1 &&
+           (*(unsigned char *)(ntb_ptr + block_len) == 0x00)) {
+               to_process--;
+       } else if ((to_process > 0) && (block_len != 0)) {
                ntb_ptr = (unsigned char *)(ntb_ptr + block_len);
                goto parse_ntb;
        }
@@ -1489,7 +1497,7 @@ static int ncm_bind(struct usb_configuration *c, struct usb_function *f)
        ncm_data_intf.bInterfaceNumber = status;
        ncm_union_desc.bSlaveInterface0 = status;
 
-       ecm_desc.wMaxSegmentSize = ncm_opts->max_segment_size;
+       ecm_desc.wMaxSegmentSize = cpu_to_le16(ncm_opts->max_segment_size);
 
        status = -ENODEV;
 
@@ -1685,7 +1693,7 @@ static struct usb_function_instance *ncm_alloc_inst(void)
                kfree(opts);
                return ERR_CAST(net);
        }
-       opts->max_segment_size = cpu_to_le16(ETH_FRAME_LEN);
+       opts->max_segment_size = ETH_FRAME_LEN;
        INIT_LIST_HEAD(&opts->ncm_os_desc.ext_prop);
 
        descs[0] = &opts->ncm_os_desc;
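The two cpu_to_le16() changes move byte-swapping to the single point where the value crosses from CPU order into the little-endian USB descriptor; the configfs option itself now stays in native order. A standalone analogue using glibc's htole16() (assuming <endian.h> is available):

#include <endian.h>
#include <stdint.h>
#include <stdio.h>

struct opts {
	unsigned int max_segment_size;  /* kept in CPU byte order */
};

struct desc {
	uint16_t wMaxSegmentSize;       /* wire (little-endian) order */
};

int main(void)
{
	struct opts o = { .max_segment_size = 1514 };   /* ETH_FRAME_LEN */
	struct desc d;

	/* Convert once, at the CPU-order -> wire-format boundary. */
	d.wMaxSegmentSize = htole16((uint16_t)o.max_segment_size);
	printf("stored 0x%04x on the wire\n", (unsigned)d.wMaxSegmentSize);
	return 0;
}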
index 10c5d7f726a1fdd967d058bcc60302db8d839009..f90eeecf27de110ee4abc9d4cebef8cf73306193 100644 (file)
@@ -2036,7 +2036,8 @@ static irqreturn_t omap_udc_iso_irq(int irq, void *_dev)
 
 static inline int machine_without_vbus_sense(void)
 {
-       return  machine_is_omap_osk() || machine_is_sx1();
+       return  machine_is_omap_osk() || machine_is_omap_palmte() ||
+               machine_is_sx1();
 }
 
 static int omap_udc_start(struct usb_gadget *g,
index 4f8617210d852643e55a5b8463d0f03b6d9e9c78..169f72665739feca100e5a65786ddd6af2cbd675 100644 (file)
@@ -274,7 +274,6 @@ struct pch_udc_cfg_data {
  * @td_data:           for data request
  * @dev:               reference to device struct
  * @offset_addr:       offset address of ep register
- * @desc:              for this ep
  * @queue:             queue for requests
  * @num:               endpoint number
  * @in:                        endpoint is IN
index ac3fc597031573199a141e60e2b54432d2a2782e..cfebb833668e4b014633d0919be1aa1777c25140 100644 (file)
@@ -22,6 +22,7 @@
 #include <linux/of_irq.h>
 #include <linux/of_address.h>
 #include <linux/of_platform.h>
+#include <linux/platform_device.h>
 
 static int uhci_grlib_init(struct usb_hcd *hcd)
 {
index 4460fa7e9fab9e2e8412e9ed2844da3ad8c7c866..a7716202a8dd58d74f3d31fd48d0833de89db2be 100644 (file)
@@ -1861,14 +1861,14 @@ void xhci_remove_secondary_interrupter(struct usb_hcd *hcd, struct xhci_interrup
        struct xhci_hcd *xhci = hcd_to_xhci(hcd);
        unsigned int intr_num;
 
+       spin_lock_irq(&xhci->lock);
+
        /* interrupter 0 is primary interrupter, don't touch it */
-       if (!ir || !ir->intr_num || ir->intr_num >= xhci->max_interrupters)
+       if (!ir || !ir->intr_num || ir->intr_num >= xhci->max_interrupters) {
                xhci_dbg(xhci, "Invalid secondary interrupter, can't remove\n");
-
-       /* fixme, should we check xhci->interrupter[intr_num] == ir */
-       /* fixme locking */
-
-       spin_lock_irq(&xhci->lock);
+               spin_unlock_irq(&xhci->lock);
+               return;
+       }
 
        intr_num = ir->intr_num;
 
@@ -2322,7 +2322,7 @@ xhci_add_interrupter(struct xhci_hcd *xhci, struct xhci_interrupter *ir,
        u64 erst_base;
        u32 erst_size;
 
-       if (intr_num > xhci->max_interrupters) {
+       if (intr_num >= xhci->max_interrupters) {
                xhci_warn(xhci, "Can't add interrupter %d, max interrupters %d\n",
                          intr_num, xhci->max_interrupters);
                return -EINVAL;
index f04fde19f5514bed72d25aca8e6f07582b563ebb..3d071b8753088a5437c2e9f82031a6db4ad91208 100644 (file)
@@ -253,6 +253,9 @@ int xhci_plat_probe(struct platform_device *pdev, struct device *sysdev, const s
                if (device_property_read_bool(tmpdev, "quirk-broken-port-ped"))
                        xhci->quirks |= XHCI_BROKEN_PORT_PED;
 
+               if (device_property_read_bool(tmpdev, "xhci-sg-trb-cache-size-quirk"))
+                       xhci->quirks |= XHCI_SG_TRB_CACHE_SIZE_QUIRK;
+
                device_property_read_u32(tmpdev, "imod-interval-ns",
                                         &xhci->imod_interval);
        }
index 33806ae966f90c2167c967c38b17571fca0b179d..4f64b814d4aa20fdd08e349e5c07324b3e105c5e 100644 (file)
@@ -326,7 +326,13 @@ static unsigned int xhci_ring_expansion_needed(struct xhci_hcd *xhci, struct xhc
        /* how many trbs will be queued past the enqueue segment? */
        trbs_past_seg = enq_used + num_trbs - (TRBS_PER_SEGMENT - 1);
 
-       if (trbs_past_seg <= 0)
+       /*
+        * Consider expanding the ring already when num_trbs exactly fills the
+        * current segment (i.e. trbs_past_seg == 0), not only when num_trbs
+        * goes into the next segment. This avoids confusing a full ring with
+        * the special empty ring case below.
+        */
+       if (trbs_past_seg < 0)
                return 0;
 
        /* Empty ring special case, enqueue stuck on link trb while dequeue advanced */
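
Illustrative arithmetic for the boundary the new comment describes, using a deliberately small segment size (the real TRBS_PER_SEGMENT differs):

#include <stdio.h>

#define TRBS_PER_SEGMENT 16     /* illustrative value only */

int main(void)
{
        /* One slot per segment is reserved for the link TRB, hence the -1. */
        int enq_used = 10, num_trbs = 5;
        int trbs_past_seg = enq_used + num_trbs - (TRBS_PER_SEGMENT - 1);

        /*
         * 10 + 5 TRBs exactly fill the usable slots, so trbs_past_seg == 0;
         * after the fix this asks for expansion instead of reporting "fits".
         */
        printf("trbs_past_seg = %d\n", trbs_past_seg);  /* prints 0 */
        return 0;
}
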
@@ -2376,6 +2382,9 @@ static int process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
        /* handle completion code */
        switch (trb_comp_code) {
        case COMP_SUCCESS:
+               /* Don't overwrite status if TD had an error, see xHCI 4.9.1 */
+               if (td->error_mid_td)
+                       break;
                if (remaining) {
                        frame->status = short_framestatus;
                        if (xhci->quirks & XHCI_TRUST_TX_LENGTH)
@@ -2391,9 +2400,13 @@ static int process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
        case COMP_BANDWIDTH_OVERRUN_ERROR:
                frame->status = -ECOMM;
                break;
-       case COMP_ISOCH_BUFFER_OVERRUN:
        case COMP_BABBLE_DETECTED_ERROR:
+               sum_trbs_for_length = true;
+               fallthrough;
+       case COMP_ISOCH_BUFFER_OVERRUN:
                frame->status = -EOVERFLOW;
+               if (ep_trb != td->last_trb)
+                       td->error_mid_td = true;
                break;
        case COMP_INCOMPATIBLE_DEVICE_ERROR:
        case COMP_STALL_ERROR:
@@ -2401,8 +2414,9 @@ static int process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
                break;
        case COMP_USB_TRANSACTION_ERROR:
                frame->status = -EPROTO;
+               sum_trbs_for_length = true;
                if (ep_trb != td->last_trb)
-                       return 0;
+                       td->error_mid_td = true;
                break;
        case COMP_STOPPED:
                sum_trbs_for_length = true;
@@ -2422,6 +2436,9 @@ static int process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
                break;
        }
 
+       if (td->urb_length_set)
+               goto finish_td;
+
        if (sum_trbs_for_length)
                frame->actual_length = sum_trb_lengths(xhci, ep->ring, ep_trb) +
                        ep_trb_len - remaining;
@@ -2430,6 +2447,14 @@ static int process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
 
        td->urb->actual_length += frame->actual_length;
 
+finish_td:
+       /* Don't give back TD yet if we encountered an error mid TD */
+       if (td->error_mid_td && ep_trb != td->last_trb) {
+               xhci_dbg(xhci, "Error mid isoc TD, wait for final completion event\n");
+               td->urb_length_set = true;
+               return 0;
+       }
+
        return finish_td(xhci, ep, ep_ring, td, trb_comp_code);
 }
 
@@ -2808,17 +2833,51 @@ static int handle_tx_event(struct xhci_hcd *xhci,
                }
 
                if (!ep_seg) {
-                       if (!ep->skip ||
-                           !usb_endpoint_xfer_isoc(&td->urb->ep->desc)) {
-                               /* Some host controllers give a spurious
-                                * successful event after a short transfer.
-                                * Ignore it.
-                                */
-                               if ((xhci->quirks & XHCI_SPURIOUS_SUCCESS) &&
-                                               ep_ring->last_td_was_short) {
-                                       ep_ring->last_td_was_short = false;
-                                       goto cleanup;
+
+                       if (ep->skip && usb_endpoint_xfer_isoc(&td->urb->ep->desc)) {
+                               skip_isoc_td(xhci, td, ep, status);
+                               goto cleanup;
+                       }
+
+                       /*
+                        * Some hosts give a spurious success event after a short
+                        * transfer. Ignore it.
+                        */
+                       if ((xhci->quirks & XHCI_SPURIOUS_SUCCESS) &&
+                           ep_ring->last_td_was_short) {
+                               ep_ring->last_td_was_short = false;
+                               goto cleanup;
+                       }
+
+                       /*
+                        * xhci 4.10.2 states isoc endpoints should continue
+                        * processing the next TD if there was an error mid TD.
+                        * So hosts like NEC don't generate an event for the last
+                        * isoc TRB even if the IOC flag is set.
+                        * xhci 4.9.1 states that if there are errors in multi-TRB
+                        * TDs the xHC should generate an error for that TRB, and if it
+                        * proceeds to the next TD it should generate an event for
+                        * any TRB with the IOC flag on the way. Other hosts follow this.
+                        * So this event might be for the next TD.
+                        */
+                       if (td->error_mid_td &&
+                           !list_is_last(&td->td_list, &ep_ring->td_list)) {
+                               struct xhci_td *td_next = list_next_entry(td, td_list);
+
+                               ep_seg = trb_in_td(xhci, td_next->start_seg, td_next->first_trb,
+                                                  td_next->last_trb, ep_trb_dma, false);
+                               if (ep_seg) {
+                                       /* give back previous TD, start handling new */
+                                       xhci_dbg(xhci, "Missing TD completion event after mid TD error\n");
+                                       ep_ring->dequeue = td->last_trb;
+                                       ep_ring->deq_seg = td->last_trb_seg;
+                                       inc_deq(xhci, ep_ring);
+                                       xhci_td_cleanup(xhci, td, ep_ring, td->status);
+                                       td = td_next;
                                }
+                       }
+
+                       if (!ep_seg) {
                                /* HC is busted, give up! */
                                xhci_err(xhci,
                                        "ERROR Transfer event TRB DMA ptr not "
@@ -2830,9 +2889,6 @@ static int handle_tx_event(struct xhci_hcd *xhci,
                                          ep_trb_dma, true);
                                return -ESHUTDOWN;
                        }
-
-                       skip_isoc_td(xhci, td, ep, status);
-                       goto cleanup;
                }
                if (trb_comp_code == COMP_SHORT_PACKET)
                        ep_ring->last_td_was_short = true;
index a5c72a634e6a91a262dbbaf316daf58baa0f8dcd..6f82d404883f9accf627c057a96702a7d8d65a80 100644 (file)
@@ -1549,6 +1549,7 @@ struct xhci_td {
        struct xhci_segment     *bounce_seg;
        /* actual_length of the URB has already been set */
        bool                    urb_length_set;
+       bool                    error_mid_td;
        unsigned int            num_trbs;
 };
 
index ae41578bd0149900b0a867f71a0cf6080e238566..70165dd86b5de958ab4f5fe0d1573988977be425 100644 (file)
@@ -21,7 +21,9 @@ static const struct class role_class = {
 struct usb_role_switch {
        struct device dev;
        struct mutex lock; /* device lock */
+       struct module *module; /* the module this device depends on */
        enum usb_role role;
+       bool registered;
 
        /* From descriptor */
        struct device *usb2_port;
@@ -48,6 +50,9 @@ int usb_role_switch_set_role(struct usb_role_switch *sw, enum usb_role role)
        if (IS_ERR_OR_NULL(sw))
                return 0;
 
+       if (!sw->registered)
+               return -EOPNOTSUPP;
+
        mutex_lock(&sw->lock);
 
        ret = sw->set(sw, role);
@@ -73,7 +78,7 @@ enum usb_role usb_role_switch_get_role(struct usb_role_switch *sw)
 {
        enum usb_role role;
 
-       if (IS_ERR_OR_NULL(sw))
+       if (IS_ERR_OR_NULL(sw) || !sw->registered)
                return USB_ROLE_NONE;
 
        mutex_lock(&sw->lock);
@@ -135,7 +140,7 @@ struct usb_role_switch *usb_role_switch_get(struct device *dev)
                                                  usb_role_switch_match);
 
        if (!IS_ERR_OR_NULL(sw))
-               WARN_ON(!try_module_get(sw->dev.parent->driver->owner));
+               WARN_ON(!try_module_get(sw->module));
 
        return sw;
 }
@@ -157,7 +162,7 @@ struct usb_role_switch *fwnode_usb_role_switch_get(struct fwnode_handle *fwnode)
                sw = fwnode_connection_find_match(fwnode, "usb-role-switch",
                                                  NULL, usb_role_switch_match);
        if (!IS_ERR_OR_NULL(sw))
-               WARN_ON(!try_module_get(sw->dev.parent->driver->owner));
+               WARN_ON(!try_module_get(sw->module));
 
        return sw;
 }
@@ -172,7 +177,7 @@ EXPORT_SYMBOL_GPL(fwnode_usb_role_switch_get);
 void usb_role_switch_put(struct usb_role_switch *sw)
 {
        if (!IS_ERR_OR_NULL(sw)) {
-               module_put(sw->dev.parent->driver->owner);
+               module_put(sw->module);
                put_device(&sw->dev);
        }
 }
@@ -189,15 +194,18 @@ struct usb_role_switch *
 usb_role_switch_find_by_fwnode(const struct fwnode_handle *fwnode)
 {
        struct device *dev;
+       struct usb_role_switch *sw = NULL;
 
        if (!fwnode)
                return NULL;
 
        dev = class_find_device_by_fwnode(&role_class, fwnode);
-       if (dev)
-               WARN_ON(!try_module_get(dev->parent->driver->owner));
+       if (dev) {
+               sw = to_role_switch(dev);
+               WARN_ON(!try_module_get(sw->module));
+       }
 
-       return dev ? to_role_switch(dev) : NULL;
+       return sw;
 }
 EXPORT_SYMBOL_GPL(usb_role_switch_find_by_fwnode);
 
@@ -338,6 +346,7 @@ usb_role_switch_register(struct device *parent,
        sw->set = desc->set;
        sw->get = desc->get;
 
+       sw->module = parent->driver->owner;
        sw->dev.parent = parent;
        sw->dev.fwnode = desc->fwnode;
        sw->dev.class = &role_class;
@@ -352,6 +361,8 @@ usb_role_switch_register(struct device *parent,
                return ERR_PTR(ret);
        }
 
+       sw->registered = true;
+
        /* TODO: Symlinks for the host port and the device controller. */
 
        return sw;
@@ -366,8 +377,10 @@ EXPORT_SYMBOL_GPL(usb_role_switch_register);
  */
 void usb_role_switch_unregister(struct usb_role_switch *sw)
 {
-       if (!IS_ERR_OR_NULL(sw))
+       if (!IS_ERR_OR_NULL(sw)) {
+               sw->registered = false;
                device_unregister(&sw->dev);
+       }
 }
 EXPORT_SYMBOL_GPL(usb_role_switch_unregister);
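
The series above stops dereferencing sw->dev.parent->driver->owner at lookup time, which can be stale once the parent device is unbound, and instead captures the owning module when the switch is registered and pins it on every get. A pared-down sketch of that get/put pattern; example_switch is invented, the module APIs are the kernel's:

#include <linux/module.h>

struct example_switch {
        struct module *module;  /* captured from parent->driver->owner */
        bool registered;
};

static struct example_switch *example_get(struct example_switch *sw)
{
        if (sw && !try_module_get(sw->module))  /* pin the provider */
                return NULL;
        return sw;
}

static void example_put(struct example_switch *sw)
{
        if (sw)
                module_put(sw->module);
}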
 
index 1e61fe04317158c3a5e877bfb0d89cb5572ce4ef..923e0ed85444be9fde31e0b0d965813fc99c5acf 100644 (file)
@@ -146,6 +146,7 @@ static const struct usb_device_id id_table[] = {
        { USB_DEVICE(0x10C4, 0x85F8) }, /* Virtenio Preon32 */
        { USB_DEVICE(0x10C4, 0x8664) }, /* AC-Services CAN-IF */
        { USB_DEVICE(0x10C4, 0x8665) }, /* AC-Services OBD-IF */
+       { USB_DEVICE(0x10C4, 0x87ED) }, /* IMST USB-Stick for Smart Meter */
        { USB_DEVICE(0x10C4, 0x8856) }, /* CEL EM357 ZigBee USB Stick - LR */
        { USB_DEVICE(0x10C4, 0x8857) }, /* CEL EM357 ZigBee USB Stick */
        { USB_DEVICE(0x10C4, 0x88A4) }, /* MMB Networks ZigBee USB Device */
index 72390dbf0769282e8efb289023ca2b6915494160..2ae124c49d448f63b6d6a3078ad08fffee3ad2d0 100644 (file)
@@ -2269,6 +2269,7 @@ static const struct usb_device_id option_ids[] = {
        { USB_DEVICE_INTERFACE_CLASS(0x2cb7, 0x0111, 0xff) },                   /* Fibocom FM160 (MBIM mode) */
        { USB_DEVICE_INTERFACE_CLASS(0x2cb7, 0x01a0, 0xff) },                   /* Fibocom NL668-AM/NL652-EU (laptop MBIM) */
        { USB_DEVICE_INTERFACE_CLASS(0x2cb7, 0x01a2, 0xff) },                   /* Fibocom FM101-GL (laptop MBIM) */
+       { USB_DEVICE_INTERFACE_CLASS(0x2cb7, 0x01a3, 0xff) },                   /* Fibocom FM101-GL (laptop MBIM) */
        { USB_DEVICE_INTERFACE_CLASS(0x2cb7, 0x01a4, 0xff),                     /* Fibocom FM101-GL (laptop MBIM) */
          .driver_info = RSVD(4) },
        { USB_DEVICE_INTERFACE_CLASS(0x2df3, 0x9d03, 0xff) },                   /* LongSung M5710 */
index b1e844bf31f81f7984a976bf4ca5dcd8f01b3a97..703a9c56355731c158801f89996937f7ea760d35 100644 (file)
@@ -184,6 +184,8 @@ static const struct usb_device_id id_table[] = {
        {DEVICE_SWI(0x413c, 0x81d0)},   /* Dell Wireless 5819 */
        {DEVICE_SWI(0x413c, 0x81d1)},   /* Dell Wireless 5818 */
        {DEVICE_SWI(0x413c, 0x81d2)},   /* Dell Wireless 5818 */
+       {DEVICE_SWI(0x413c, 0x8217)},   /* Dell Wireless DW5826e */
+       {DEVICE_SWI(0x413c, 0x8218)},   /* Dell Wireless DW5826e QDL */
 
        /* Huawei devices */
        {DEVICE_HWI(0x03f0, 0x581d)},   /* HP lt4112 LTE/HSPA+ Gobi 4G Modem (Huawei me906e) */
index 4e0eef1440b7fd7407cf2158adad796b516aeda5..300aeef160e75c9d84fbd4b69b3c3ad35a774f5b 100644 (file)
@@ -1105,7 +1105,7 @@ static void isd200_dump_driveid(struct us_data *us, u16 *id)
 static int isd200_get_inquiry_data( struct us_data *us )
 {
        struct isd200_info *info = (struct isd200_info *)us->extra;
-       int retStatus = ISD200_GOOD;
+       int retStatus;
        u16 *id = info->id;
 
        usb_stor_dbg(us, "Entering isd200_get_inquiry_data\n");
@@ -1137,6 +1137,13 @@ static int isd200_get_inquiry_data( struct us_data *us )
                                isd200_fix_driveid(id);
                                isd200_dump_driveid(us, id);
 
+                               /* Prevent division by 0 in isd200_scsi_to_ata() */
+                               if (id[ATA_ID_HEADS] == 0 || id[ATA_ID_SECTORS] == 0) {
+                                       usb_stor_dbg(us, "   Invalid ATA Identify data\n");
+                                       retStatus = ISD200_ERROR;
+                                       goto Done;
+                               }
+
                                memset(&info->InquiryData, 0, sizeof(info->InquiryData));
 
                                /* Standard IDE interface only supports disks */
@@ -1202,6 +1209,7 @@ static int isd200_get_inquiry_data( struct us_data *us )
                }
        }
 
+ Done:
        usb_stor_dbg(us, "Leaving isd200_get_inquiry_data %08X\n", retStatus);
 
        return(retStatus);
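
The new guard protects the CHS arithmetic in isd200_scsi_to_ata(), which divides by the head and sector counts taken from the IDENTIFY data. An illustrative LBA-to-CHS split of the kind being protected, not the driver's exact code:

#include <stdio.h>

int main(void)
{
        unsigned int heads = 16, sectors = 63;  /* from ATA IDENTIFY data */
        unsigned int lba = 123456;
        unsigned int cyl, head, sect;

        /* With heads == 0 or sectors == 0 these divisions would fault. */
        sect = lba % sectors + 1;
        head = (lba / sectors) % heads;
        cyl  = lba / (sectors * heads);

        printf("LBA %u -> C/H/S %u/%u/%u\n", lba, cyl, head, sect);
        return 0;
}
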
@@ -1481,22 +1489,27 @@ static int isd200_init_info(struct us_data *us)
 
 static int isd200_Initialization(struct us_data *us)
 {
+       int rc = 0;
+
        usb_stor_dbg(us, "ISD200 Initialization...\n");
 
        /* Initialize ISD200 info struct */
 
-       if (isd200_init_info(us) == ISD200_ERROR) {
+       if (isd200_init_info(us) < 0) {
                usb_stor_dbg(us, "ERROR Initializing ISD200 Info struct\n");
+               rc = -ENOMEM;
        } else {
                /* Get device specific data */
 
-               if (isd200_get_inquiry_data(us) != ISD200_GOOD)
+               if (isd200_get_inquiry_data(us) != ISD200_GOOD) {
                        usb_stor_dbg(us, "ISD200 Initialization Failure\n");
-               else
+                       rc = -EINVAL;
+               } else {
                        usb_stor_dbg(us, "ISD200 Initialization complete\n");
+               }
        }
 
-       return 0;
+       return rc;
 }
 
 
index c54e9805da536a0ec139ad789017b79131c88561..12cf9940e5b6759167f9ae7450df8af92a85c63a 100644 (file)
@@ -179,6 +179,13 @@ static int slave_configure(struct scsi_device *sdev)
                 */
                sdev->use_192_bytes_for_3f = 1;
 
+               /*
+                * Some devices report generic values until the media has been
+                * accessed. Force a READ(10) prior to querying device
+                * characteristics.
+                */
+               sdev->read_before_ms = 1;
+
                /*
                 * Some devices don't like MODE SENSE with page=0x3f,
                 * which is the command used for checking if a device
index 9707f53cfda9c08507082ac33b69b5d146c6927f..71ace274761f182f0cbb942676e74d7e2c26d7a1 100644 (file)
@@ -878,6 +878,13 @@ static int uas_slave_configure(struct scsi_device *sdev)
        if (devinfo->flags & US_FL_CAPACITY_HEURISTICS)
                sdev->guess_capacity = 1;
 
+       /*
+        * Some devices report generic values until the media has been
+        * accessed. Force a READ(10) prior to querying device
+        * characteristics.
+        */
+       sdev->read_before_ms = 1;
+
        /*
         * Some devices don't like MODE SENSE with page=0x3f,
         * which is the command used for checking if a device
index f81bec0c7b864dc605143078ec9f1cd3e2706379..f8ea3054be54245c4233b48facaabe91f52868ed 100644 (file)
@@ -559,16 +559,21 @@ static ssize_t hpd_show(struct device *dev, struct device_attribute *attr, char
 }
 static DEVICE_ATTR_RO(hpd);
 
-static struct attribute *dp_altmode_attrs[] = {
+static struct attribute *displayport_attrs[] = {
        &dev_attr_configuration.attr,
        &dev_attr_pin_assignment.attr,
        &dev_attr_hpd.attr,
        NULL
 };
 
-static const struct attribute_group dp_altmode_group = {
+static const struct attribute_group displayport_group = {
        .name = "displayport",
-       .attrs = dp_altmode_attrs,
+       .attrs = displayport_attrs,
+};
+
+static const struct attribute_group *displayport_groups[] = {
+       &displayport_group,
+       NULL,
 };
 
 int dp_altmode_probe(struct typec_altmode *alt)
@@ -576,7 +581,6 @@ int dp_altmode_probe(struct typec_altmode *alt)
        const struct typec_altmode *port = typec_altmode_get_partner(alt);
        struct fwnode_handle *fwnode;
        struct dp_altmode *dp;
-       int ret;
 
        /* FIXME: Port can only be DFP_U. */
 
@@ -587,10 +591,6 @@ int dp_altmode_probe(struct typec_altmode *alt)
              DP_CAP_PIN_ASSIGN_DFP_D(alt->vdo)))
                return -ENODEV;
 
-       ret = sysfs_create_group(&alt->dev.kobj, &dp_altmode_group);
-       if (ret)
-               return ret;
-
        dp = devm_kzalloc(&alt->dev, sizeof(*dp), GFP_KERNEL);
        if (!dp)
                return -ENOMEM;
@@ -624,7 +624,6 @@ void dp_altmode_remove(struct typec_altmode *alt)
 {
        struct dp_altmode *dp = typec_altmode_get_drvdata(alt);
 
-       sysfs_remove_group(&alt->dev.kobj, &dp_altmode_group);
        cancel_work_sync(&dp->work);
 
        if (dp->connector_fwnode) {
@@ -649,6 +648,7 @@ static struct typec_altmode_driver dp_altmode_driver = {
        .driver = {
                .name = "typec_displayport",
                .owner = THIS_MODULE,
+               .dev_groups = displayport_groups,
        },
 };
 module_typec_altmode_driver(dp_altmode_driver);
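
The conversion above hands the "displayport" attribute group to the driver core via .dev_groups, which creates and removes it around probe/remove; the manual sysfs_create_group()/sysfs_remove_group() calls and their error path disappear. The general shape, sketched with invented names:

#include <linux/device.h>
#include <linux/sysfs.h>

static struct attribute *example_attrs[] = {
        /* &dev_attr_foo.attr, ... */
        NULL
};

static const struct attribute_group example_group = {
        .name  = "example",
        .attrs = example_attrs,
};

static const struct attribute_group *example_groups[] = {
        &example_group,
        NULL
};

/* Wired up as:  .driver = { .dev_groups = example_groups, }  */
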
index 5945e3a2b0f78f30c526a2a31ff1932f7e96531d..0965972310275e1c4d82be94051573648175a4fe 100644 (file)
@@ -3743,9 +3743,6 @@ static void tcpm_detach(struct tcpm_port *port)
        if (tcpm_port_is_disconnected(port))
                port->hard_reset_count = 0;
 
-       port->try_src_count = 0;
-       port->try_snk_count = 0;
-
        if (!port->attached)
                return;
 
@@ -4876,8 +4873,11 @@ static void run_state_machine(struct tcpm_port *port)
                break;
        case PORT_RESET:
                tcpm_reset_port(port);
-               tcpm_set_cc(port, tcpm_default_state(port) == SNK_UNATTACHED ?
-                           TYPEC_CC_RD : tcpm_rp_cc(port));
+               if (port->self_powered)
+                       tcpm_set_cc(port, TYPEC_CC_OPEN);
+               else
+                       tcpm_set_cc(port, tcpm_default_state(port) == SNK_UNATTACHED ?
+                                   TYPEC_CC_RD : tcpm_rp_cc(port));
                tcpm_set_state(port, PORT_RESET_WAIT_OFF,
                               PD_T_ERROR_RECOVERY);
                break;
@@ -6848,7 +6848,8 @@ struct tcpm_port *tcpm_register_port(struct device *dev, struct tcpc_dev *tcpc)
        if (err)
                goto out_role_sw_put;
 
-       port->typec_caps.pd = port->pds[0];
+       if (port->pds)
+               port->typec_caps.pd = port->pds[0];
 
        port->typec_port = typec_register_port(port->dev, &port->typec_caps);
        if (IS_ERR(port->typec_port)) {
index 5392ec6989592041f87b96a7af1479621be799a0..14f5a7bfae2e92873e405b369ca8ce5620d856c0 100644 (file)
@@ -938,7 +938,9 @@ static void ucsi_handle_connector_change(struct work_struct *work)
 
        clear_bit(EVENT_PENDING, &con->ucsi->flags);
 
+       mutex_lock(&ucsi->ppm_lock);
        ret = ucsi_acknowledge_connector_change(ucsi);
+       mutex_unlock(&ucsi->ppm_lock);
        if (ret)
                dev_err(ucsi->dev, "%s: ACK failed (%d)", __func__, ret);
 
index 6bbf490ac4010e9ad31a140bd484ec40077f0af6..928eacbeb21ac4cc5b8857644969bff7aba7a8a1 100644 (file)
@@ -25,6 +25,8 @@ struct ucsi_acpi {
        unsigned long flags;
        guid_t guid;
        u64 cmd;
+       bool dell_quirk_probed;
+       bool dell_quirk_active;
 };
 
 static int ucsi_acpi_dsm(struct ucsi_acpi *ua, int func)
@@ -73,9 +75,13 @@ static int ucsi_acpi_sync_write(struct ucsi *ucsi, unsigned int offset,
                                const void *val, size_t val_len)
 {
        struct ucsi_acpi *ua = ucsi_get_drvdata(ucsi);
+       bool ack = UCSI_COMMAND(*(u64 *)val) == UCSI_ACK_CC_CI;
        int ret;
 
-       set_bit(COMMAND_PENDING, &ua->flags);
+       if (ack)
+               set_bit(ACK_PENDING, &ua->flags);
+       else
+               set_bit(COMMAND_PENDING, &ua->flags);
 
        ret = ucsi_acpi_async_write(ucsi, offset, val, val_len);
        if (ret)
@@ -85,7 +91,10 @@ static int ucsi_acpi_sync_write(struct ucsi *ucsi, unsigned int offset,
                ret = -ETIMEDOUT;
 
 out_clear_bit:
-       clear_bit(COMMAND_PENDING, &ua->flags);
+       if (ack)
+               clear_bit(ACK_PENDING, &ua->flags);
+       else
+               clear_bit(COMMAND_PENDING, &ua->flags);
 
        return ret;
 }
@@ -119,12 +128,73 @@ static const struct ucsi_operations ucsi_zenbook_ops = {
        .async_write = ucsi_acpi_async_write
 };
 
-static const struct dmi_system_id zenbook_dmi_id[] = {
+/*
+ * Some Dell laptops expect that an ACK command with the
+ * UCSI_ACK_CONNECTOR_CHANGE bit set is followed by a (separate)
+ * ACK command that only has the UCSI_ACK_COMMAND_COMPLETE bit set.
+ * If this is not done, events are not delivered to OSPM and
+ * subsequent commands will time out.
+ */
+static int
+ucsi_dell_sync_write(struct ucsi *ucsi, unsigned int offset,
+                    const void *val, size_t val_len)
+{
+       struct ucsi_acpi *ua = ucsi_get_drvdata(ucsi);
+       u64 cmd = *(u64 *)val, ack = 0;
+       int ret;
+
+       if (UCSI_COMMAND(cmd) == UCSI_ACK_CC_CI &&
+           cmd & UCSI_ACK_CONNECTOR_CHANGE)
+               ack = UCSI_ACK_CC_CI | UCSI_ACK_COMMAND_COMPLETE;
+
+       ret = ucsi_acpi_sync_write(ucsi, offset, val, val_len);
+       if (ret != 0)
+               return ret;
+       if (ack == 0)
+               return ret;
+
+       if (!ua->dell_quirk_probed) {
+               ua->dell_quirk_probed = true;
+
+               cmd = UCSI_GET_CAPABILITY;
+               ret = ucsi_acpi_sync_write(ucsi, UCSI_CONTROL, &cmd,
+                                          sizeof(cmd));
+               if (ret == 0)
+                       return ucsi_acpi_sync_write(ucsi, UCSI_CONTROL,
+                                                   &ack, sizeof(ack));
+               if (ret != -ETIMEDOUT)
+                       return ret;
+
+               ua->dell_quirk_active = true;
+               dev_err(ua->dev, "Firmware bug: Additional ACK required after ACKing a connector change.\n");
+               dev_err(ua->dev, "Firmware bug: Enabling workaround\n");
+       }
+
+       if (!ua->dell_quirk_active)
+               return ret;
+
+       return ucsi_acpi_sync_write(ucsi, UCSI_CONTROL, &ack, sizeof(ack));
+}
+
+static const struct ucsi_operations ucsi_dell_ops = {
+       .read = ucsi_acpi_read,
+       .sync_write = ucsi_dell_sync_write,
+       .async_write = ucsi_acpi_async_write
+};
+
+static const struct dmi_system_id ucsi_acpi_quirks[] = {
        {
                .matches = {
                        DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."),
                        DMI_MATCH(DMI_PRODUCT_NAME, "ZenBook UX325UA_UM325UA"),
                },
+               .driver_data = (void *)&ucsi_zenbook_ops,
+       },
+       {
+               .matches = {
+                       DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
+               },
+               .driver_data = (void *)&ucsi_dell_ops,
        },
        { }
 };
@@ -142,8 +212,10 @@ static void ucsi_acpi_notify(acpi_handle handle, u32 event, void *data)
        if (UCSI_CCI_CONNECTOR(cci))
                ucsi_connector_change(ua->ucsi, UCSI_CCI_CONNECTOR(cci));
 
-       if (test_bit(COMMAND_PENDING, &ua->flags) &&
-           cci & (UCSI_CCI_ACK_COMPLETE | UCSI_CCI_COMMAND_COMPLETE))
+       if (cci & UCSI_CCI_ACK_COMPLETE && test_bit(ACK_PENDING, &ua->flags))
+               complete(&ua->complete);
+       if (cci & UCSI_CCI_COMMAND_COMPLETE &&
+           test_bit(COMMAND_PENDING, &ua->flags))
                complete(&ua->complete);
 }
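
The split above gives ACK commands their own pending bit, so a UCSI_CCI_ACK_COMPLETE notification can no longer wake a waiter that was actually waiting on a regular command, and vice versa. A trimmed sketch of the dispatch; the bit numbers are invented for illustration:

#include <linux/bitops.h>
#include <linux/completion.h>
#include <linux/types.h>

#define EXAMPLE_COMMAND_PENDING 0       /* illustrative bit numbers */
#define EXAMPLE_ACK_PENDING     1

static void example_handle_cci(unsigned long *flags, struct completion *done,
                               bool ack_complete, bool cmd_complete)
{
        if (ack_complete && test_bit(EXAMPLE_ACK_PENDING, flags))
                complete(done);
        if (cmd_complete && test_bit(EXAMPLE_COMMAND_PENDING, flags))
                complete(done);
}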
 
@@ -151,6 +223,7 @@ static int ucsi_acpi_probe(struct platform_device *pdev)
 {
        struct acpi_device *adev = ACPI_COMPANION(&pdev->dev);
        const struct ucsi_operations *ops = &ucsi_acpi_ops;
+       const struct dmi_system_id *id;
        struct ucsi_acpi *ua;
        struct resource *res;
        acpi_status status;
@@ -180,8 +253,9 @@ static int ucsi_acpi_probe(struct platform_device *pdev)
        init_completion(&ua->complete);
        ua->dev = &pdev->dev;
 
-       if (dmi_check_system(zenbook_dmi_id))
-               ops = &ucsi_zenbook_ops;
+       id = dmi_first_match(ucsi_acpi_quirks);
+       if (id)
+               ops = id->driver_data;
 
        ua->ucsi = ucsi_create(&pdev->dev, ops);
        if (IS_ERR(ua->ucsi))
index 53a7ede8556df5688abb2631ce78a7c13dbc8308..faccc942b381be43700f0de95401bef80af500ec 100644 (file)
@@ -301,6 +301,7 @@ static const struct of_device_id pmic_glink_ucsi_of_quirks[] = {
        { .compatible = "qcom,sc8180x-pmic-glink", .data = (void *)UCSI_NO_PARTNER_PDOS, },
        { .compatible = "qcom,sc8280xp-pmic-glink", .data = (void *)UCSI_NO_PARTNER_PDOS, },
        { .compatible = "qcom,sm8350-pmic-glink", .data = (void *)UCSI_NO_PARTNER_PDOS, },
+       { .compatible = "qcom,sm8550-pmic-glink", .data = (void *)UCSI_NO_PARTNER_PDOS, },
        {}
 };
 
index 63af6ab034b5f1bb45992a4074f8862d528b38d3..46823c2e2ba1207e327607fa0ca0c757bc0968aa 100644 (file)
@@ -631,8 +631,7 @@ static void fbcon_prepare_logo(struct vc_data *vc, struct fb_info *info,
 
        if (logo_lines > vc->vc_bottom) {
                logo_shown = FBCON_LOGO_CANSHOW;
-               printk(KERN_INFO
-                      "fbcon_init: disable boot-logo (boot-logo bigger than screen).\n");
+               pr_info("fbcon: disable boot-logo (boot-logo bigger than screen).\n");
        } else {
                logo_shown = FBCON_LOGO_DRAW;
                vc->vc_top = logo_lines;
@@ -2400,11 +2399,9 @@ static int fbcon_do_set_font(struct vc_data *vc, int w, int h, int charcount,
        struct fbcon_ops *ops = info->fbcon_par;
        struct fbcon_display *p = &fb_display[vc->vc_num];
        int resize, ret, old_userfont, old_width, old_height, old_charcount;
-       char *old_data = NULL;
+       u8 *old_data = vc->vc_font.data;
 
        resize = (w != vc->vc_font.width) || (h != vc->vc_font.height);
-       if (p->userfont)
-               old_data = vc->vc_font.data;
        vc->vc_font.data = (void *)(p->fontdata = data);
        old_userfont = p->userfont;
        if ((p->userfont = userfont))
@@ -2438,13 +2435,13 @@ static int fbcon_do_set_font(struct vc_data *vc, int w, int h, int charcount,
                update_screen(vc);
        }
 
-       if (old_data && (--REFCOUNT(old_data) == 0))
+       if (old_userfont && (--REFCOUNT(old_data) == 0))
                kfree(old_data - FONT_EXTRA_WORDS * sizeof(int));
        return 0;
 
 err_out:
        p->fontdata = old_data;
-       vc->vc_font.data = (void *)old_data;
+       vc->vc_font.data = old_data;
 
        if (userfont) {
                p->userfont = old_userfont;
index c26ee6fd73c9bb0cff77793e2af4ca5407c59aa1..8fdccf033b2d9bf9a05e967fb166bbabe82bfe19 100644 (file)
@@ -1010,8 +1010,6 @@ static int hvfb_getmem(struct hv_device *hdev, struct fb_info *info)
                        goto getmem_done;
                }
                pr_info("Unable to allocate enough contiguous physical memory on Gen 1 VM. Using MMIO instead.\n");
-       } else {
-               goto err1;
        }
 
        /*
index dddd6afcb972a5c23a5969c2ced0638ccf0b5b34..ebc9aeffdde7c54321b19499715e128d594c0e61 100644 (file)
@@ -869,6 +869,9 @@ static int savagefb_check_var(struct fb_var_screeninfo   *var,
 
        DBG("savagefb_check_var");
 
+       if (!var->pixclock)
+               return -EINVAL;
+
        var->transp.offset = 0;
        var->transp.length = 0;
        switch (var->bits_per_pixel) {
index 803ccb6aa479703bc1cb88237b4c3adc594a75a2..009bf1d926448011292c182e7eee29c25930ed6d 100644 (file)
@@ -1444,6 +1444,8 @@ sisfb_check_var(struct fb_var_screeninfo *var, struct fb_info *info)
 
        vtotal = var->upper_margin + var->lower_margin + var->vsync_len;
 
+       if (!var->pixclock)
+               return -EINVAL;
        pixclock = var->pixclock;
 
        if((var->vmode & FB_VMODE_MASK) == FB_VMODE_NONINTERLACED) {
index 2de0e675fd1504da67b7110ee81152934ad2cbad..8e5bac27542d915534c3071ec5f64e89727c2c11 100644 (file)
@@ -1158,7 +1158,7 @@ stifb_init_display(struct stifb_info *fb)
            }
            break;
        }
-       stifb_blank(0, (struct fb_info *)fb);   /* 0=enable screen */
+       stifb_blank(0, fb->info);       /* 0=enable screen */
 
        SETUP_FB(fb);
 }
index 42c25dc851976c5fa823b89fc4f72e5826d17459..ac73937073a76f7d22df39a503ac59bda2d4a7da 100644 (file)
@@ -374,7 +374,6 @@ static int vt8500lcd_probe(struct platform_device *pdev)
 
        irq = platform_get_irq(pdev, 0);
        if (irq < 0) {
-               dev_err(&pdev->dev, "no IRQ defined\n");
                ret = -ENODEV;
                goto failed_free_palette;
        }
index b8cfea7812d6b61110cc5e42fe4249d4578dc721..3b9f080109d7e46da11e4efb73a46554d7ff416f 100644 (file)
@@ -923,8 +923,8 @@ static void shutdown_pirq(struct irq_data *data)
                return;
 
        do_mask(info, EVT_MASK_REASON_EXPLICIT);
-       xen_evtchn_close(evtchn);
        xen_irq_info_cleanup(info);
+       xen_evtchn_close(evtchn);
 }
 
 static void enable_pirq(struct irq_data *data)
@@ -956,6 +956,7 @@ EXPORT_SYMBOL_GPL(xen_irq_from_gsi);
 static void __unbind_from_irq(struct irq_info *info, unsigned int irq)
 {
        evtchn_port_t evtchn;
+       bool close_evtchn = false;
 
        if (!info) {
                xen_irq_free_desc(irq);
@@ -975,7 +976,7 @@ static void __unbind_from_irq(struct irq_info *info, unsigned int irq)
                struct xenbus_device *dev;
 
                if (!info->is_static)
-                       xen_evtchn_close(evtchn);
+                       close_evtchn = true;
 
                switch (info->type) {
                case IRQT_VIRQ:
@@ -995,6 +996,9 @@ static void __unbind_from_irq(struct irq_info *info, unsigned int irq)
                }
 
                xen_irq_info_cleanup(info);
+
+               if (close_evtchn)
+                       xen_evtchn_close(evtchn);
        }
 
        xen_free_irq(info);
index 26ffb8755ffb5da27bd1eb38ceeb59ca06c473ed..f93f73ecefeee4b2b052a3ac758a8fbf9fdc11ae 100644 (file)
@@ -317,7 +317,7 @@ static long gntalloc_ioctl_alloc(struct gntalloc_file_private_data *priv,
                rc = -EFAULT;
                goto out_free;
        }
-       if (copy_to_user(arg->gref_ids, gref_ids,
+       if (copy_to_user(arg->gref_ids_flex, gref_ids,
                        sizeof(gref_ids[0]) * op.count)) {
                rc = -EFAULT;
                goto out_free;
index 50865527314538a8bedbde0f2590fbbb4afce3ce..c63f317e3df3de111b63a6f1b58feefca7de998b 100644 (file)
@@ -65,7 +65,7 @@ struct pcpu {
        uint32_t flags;
 };
 
-static struct bus_type xen_pcpu_subsys = {
+static const struct bus_type xen_pcpu_subsys = {
        .name = "xen_cpu",
        .dev_name = "xen_cpu",
 };
index 35b6e306026a4bfa1829f39f9b63ee8568f332d3..67dfa47788649328f6ad5783902f7dbcee9efa48 100644 (file)
@@ -1223,18 +1223,13 @@ struct privcmd_kernel_ioreq *alloc_ioreq(struct privcmd_ioeventfd *ioeventfd)
        kioreq->ioreq = (struct ioreq *)(page_to_virt(pages[0]));
        mmap_write_unlock(mm);
 
-       size = sizeof(*ports) * kioreq->vcpus;
-       ports = kzalloc(size, GFP_KERNEL);
-       if (!ports) {
-               ret = -ENOMEM;
+       ports = memdup_array_user(u64_to_user_ptr(ioeventfd->ports),
+                                 kioreq->vcpus, sizeof(*ports));
+       if (IS_ERR(ports)) {
+               ret = PTR_ERR(ports);
                goto error_kfree;
        }
 
-       if (copy_from_user(ports, u64_to_user_ptr(ioeventfd->ports), size)) {
-               ret = -EFAULT;
-               goto error_kfree_ports;
-       }
-
        for (i = 0; i < kioreq->vcpus; i++) {
                kioreq->ports[i].vcpu = i;
                kioreq->ports[i].port = ports[i];
@@ -1256,7 +1251,7 @@ struct privcmd_kernel_ioreq *alloc_ioreq(struct privcmd_ioeventfd *ioeventfd)
 error_unbind:
        while (--i >= 0)
                unbind_from_irqhandler(irq_from_evtchn(ports[i]), &kioreq->ports[i]);
-error_kfree_ports:
+
        kfree(ports);
 error_kfree:
        kfree(kioreq);
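
memdup_array_user() folds the kzalloc() + copy_from_user() pair above into a single call that also checks the element-count multiplication for overflow and returns an ERR_PTR on failure. A minimal usage sketch; copy_port_list() and its surrounding ioctl context are hypothetical:

#include <linux/err.h>
#include <linux/slab.h>
#include <linux/string.h>
#include <linux/types.h>

static int copy_port_list(u64 __user *uports, unsigned int nr, u64 **out)
{
        u64 *ports;

        /* One call: overflow-checked allocation plus copy_from_user(). */
        ports = memdup_array_user(uports, nr, sizeof(*ports));
        if (IS_ERR(ports))
                return PTR_ERR(ports);

        *out = ports;   /* caller kfree()s when done */
        return 0;
}
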
index 8cd583db20b1737144dbfb562f83a505d72c1f3d..b293d7652f15593532b9483422bcc56cbaae8cd9 100644 (file)
@@ -237,7 +237,7 @@ static const struct attribute_group *balloon_groups[] = {
        NULL
 };
 
-static struct bus_type balloon_subsys = {
+static const struct bus_type balloon_subsys = {
        .name = BALLOON_CLASS_NAME,
        .dev_name = BALLOON_CLASS_NAME,
 };
index 32835b4b9bc5030ad3e81ae411d56635f8d6a696..51b3124b0d56c98c316c58fa73d92c07e59b726a 100644 (file)
@@ -116,14 +116,15 @@ EXPORT_SYMBOL_GPL(xenbus_strstate);
  * @dev: xenbus device
  * @path: path to watch
  * @watch: watch to register
+ * @will_handle: callback that decides whether an event should be queued
  * @callback: callback to register
  *
  * Register a @watch on the given path, using the given xenbus_watch structure
- * for storage, and the given @callback function as the callback.  On success,
- * the given @path will be saved as @watch->node, and remains the
- * caller's to free.  On error, @watch->node will
- * be NULL, the device will switch to %XenbusStateClosing, and the error will
- * be saved in the store.
+ * for storage, the @will_handle function as the callback to determine if each
+ * event needs to be queued, and the given @callback function as the callback.
+ * On success, the given @path will be saved as @watch->node, and remains the
+ * caller's to free.  On error, @watch->node will be NULL, the device will
+ * switch to %XenbusStateClosing, and the error will be saved in the store.
  *
  * Returns: %0 on success or -errno on error
  */
@@ -158,11 +159,13 @@ EXPORT_SYMBOL_GPL(xenbus_watch_path);
  * xenbus_watch_pathfmt - register a watch on a sprintf-formatted path
  * @dev: xenbus device
  * @watch: watch to register
+ * @will_handle: callback that decides whether an event should be queued
  * @callback: callback to register
  * @pathfmt: format of path to watch
  *
  * Register a watch on the given @path, using the given xenbus_watch
- * structure for storage, and the given @callback function as the
+ * structure for storage, the @will_handle function as the callback to determine
+ * if each event needs to be queued, and the given @callback function as the
  * callback.  On success, the watched path (@path/@path2) will be saved
  * as @watch->node, and becomes the caller's to kfree().
  * On error, watch->node will be NULL, so the caller has nothing to
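
A hedged usage sketch for the kernel-doc above; the will_handle and callback signatures follow include/xen/xenbus.h as used by this series, but treat the snippet as illustrative rather than canonical:

#include <linux/printk.h>
#include <xen/xenbus.h>

static bool example_will_handle(struct xenbus_watch *watch,
                                const char *path, const char *token)
{
        /* Returning false drops the event instead of queuing it. */
        return true;
}

static void example_changed(struct xenbus_watch *watch,
                            const char *path, const char *token)
{
        pr_info("xenstore node %s changed\n", path);
}

static int example_setup_watch(struct xenbus_device *dev,
                               struct xenbus_watch *watch)
{
        /* On success "example/node" is saved as watch->node. */
        return xenbus_watch_path(dev, "example/node", watch,
                                 example_will_handle, example_changed);
}
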
index bae330c2f0cf07d207af8c193dad15a78703793a..abdbbaee51846218d807033a30c98394a0c213b6 100644 (file)
@@ -107,7 +107,7 @@ static int v9fs_file_lock(struct file *filp, int cmd, struct file_lock *fl)
 
        p9_debug(P9_DEBUG_VFS, "filp: %p lock: %p\n", filp, fl);
 
-       if ((IS_SETLK(cmd) || IS_SETLKW(cmd)) && fl->fl_type != F_UNLCK) {
+       if ((IS_SETLK(cmd) || IS_SETLKW(cmd)) && fl->c.flc_type != F_UNLCK) {
                filemap_write_and_wait(inode->i_mapping);
                invalidate_mapping_pages(&inode->i_data, 0, -1);
        }
@@ -121,13 +121,12 @@ static int v9fs_file_do_lock(struct file *filp, int cmd, struct file_lock *fl)
        struct p9_fid *fid;
        uint8_t status = P9_LOCK_ERROR;
        int res = 0;
-       unsigned char fl_type;
        struct v9fs_session_info *v9ses;
 
        fid = filp->private_data;
        BUG_ON(fid == NULL);
 
-       BUG_ON((fl->fl_flags & FL_POSIX) != FL_POSIX);
+       BUG_ON((fl->c.flc_flags & FL_POSIX) != FL_POSIX);
 
        res = locks_lock_file_wait(filp, fl);
        if (res < 0)
@@ -136,7 +135,7 @@ static int v9fs_file_do_lock(struct file *filp, int cmd, struct file_lock *fl)
        /* convert posix lock to p9 tlock args */
        memset(&flock, 0, sizeof(flock));
        /* map the lock type */
-       switch (fl->fl_type) {
+       switch (fl->c.flc_type) {
        case F_RDLCK:
                flock.type = P9_LOCK_TYPE_RDLCK;
                break;
@@ -152,7 +151,7 @@ static int v9fs_file_do_lock(struct file *filp, int cmd, struct file_lock *fl)
                flock.length = 0;
        else
                flock.length = fl->fl_end - fl->fl_start + 1;
-       flock.proc_id = fl->fl_pid;
+       flock.proc_id = fl->c.flc_pid;
        flock.client_id = fid->clnt->name;
        if (IS_SETLKW(cmd))
                flock.flags = P9_LOCK_FLAGS_BLOCK;
@@ -207,12 +206,13 @@ out_unlock:
         * incase server returned error for lock request, revert
         * it locally
         */
-       if (res < 0 && fl->fl_type != F_UNLCK) {
-               fl_type = fl->fl_type;
-               fl->fl_type = F_UNLCK;
+       if (res < 0 && fl->c.flc_type != F_UNLCK) {
+               unsigned char type = fl->c.flc_type;
+
+               fl->c.flc_type = F_UNLCK;
                /* Even if this fails we want to return the remote error */
                locks_lock_file_wait(filp, fl);
-               fl->fl_type = fl_type;
+               fl->c.flc_type = type;
        }
        if (flock.client_id != fid->clnt->name)
                kfree(flock.client_id);
@@ -234,7 +234,7 @@ static int v9fs_file_getlock(struct file *filp, struct file_lock *fl)
         * if we have a conflicting lock locally, no need to validate
         * with server
         */
-       if (fl->fl_type != F_UNLCK)
+       if (fl->c.flc_type != F_UNLCK)
                return res;
 
        /* convert posix lock to p9 tgetlock args */
@@ -245,7 +245,7 @@ static int v9fs_file_getlock(struct file *filp, struct file_lock *fl)
                glock.length = 0;
        else
                glock.length = fl->fl_end - fl->fl_start + 1;
-       glock.proc_id = fl->fl_pid;
+       glock.proc_id = fl->c.flc_pid;
        glock.client_id = fid->clnt->name;
 
        res = p9_client_getlock_dotl(fid, &glock);
@@ -254,13 +254,13 @@ static int v9fs_file_getlock(struct file *filp, struct file_lock *fl)
        /* map 9p lock type to os lock type */
        switch (glock.type) {
        case P9_LOCK_TYPE_RDLCK:
-               fl->fl_type = F_RDLCK;
+               fl->c.flc_type = F_RDLCK;
                break;
        case P9_LOCK_TYPE_WRLCK:
-               fl->fl_type = F_WRLCK;
+               fl->c.flc_type = F_WRLCK;
                break;
        case P9_LOCK_TYPE_UNLCK:
-               fl->fl_type = F_UNLCK;
+               fl->c.flc_type = F_UNLCK;
                break;
        }
        if (glock.type != P9_LOCK_TYPE_UNLCK) {
@@ -269,7 +269,7 @@ static int v9fs_file_getlock(struct file *filp, struct file_lock *fl)
                        fl->fl_end = OFFSET_MAX;
                else
                        fl->fl_end = glock.start + glock.length - 1;
-               fl->fl_pid = -glock.proc_id;
+               fl->c.flc_pid = -glock.proc_id;
        }
 out:
        if (glock.client_id != fid->clnt->name)
@@ -293,7 +293,7 @@ static int v9fs_file_lock_dotl(struct file *filp, int cmd, struct file_lock *fl)
        p9_debug(P9_DEBUG_VFS, "filp: %p cmd:%d lock: %p name: %pD\n",
                 filp, cmd, fl, filp);
 
-       if ((IS_SETLK(cmd) || IS_SETLKW(cmd)) && fl->fl_type != F_UNLCK) {
+       if ((IS_SETLK(cmd) || IS_SETLKW(cmd)) && fl->c.flc_type != F_UNLCK) {
                filemap_write_and_wait(inode->i_mapping);
                invalidate_mapping_pages(&inode->i_data, 0, -1);
        }
@@ -324,16 +324,16 @@ static int v9fs_file_flock_dotl(struct file *filp, int cmd,
        p9_debug(P9_DEBUG_VFS, "filp: %p cmd:%d lock: %p name: %pD\n",
                 filp, cmd, fl, filp);
 
-       if (!(fl->fl_flags & FL_FLOCK))
+       if (!(fl->c.flc_flags & FL_FLOCK))
                goto out_err;
 
-       if ((IS_SETLK(cmd) || IS_SETLKW(cmd)) && fl->fl_type != F_UNLCK) {
+       if ((IS_SETLK(cmd) || IS_SETLKW(cmd)) && fl->c.flc_type != F_UNLCK) {
                filemap_write_and_wait(inode->i_mapping);
                invalidate_mapping_pages(&inode->i_data, 0, -1);
        }
        /* Convert flock to posix lock */
-       fl->fl_flags |= FL_POSIX;
-       fl->fl_flags ^= FL_FLOCK;
+       fl->c.flc_flags |= FL_POSIX;
+       fl->c.flc_flags ^= FL_FLOCK;
 
        if (IS_SETLK(cmd) | IS_SETLKW(cmd))
                ret = v9fs_file_do_lock(filp, cmd, fl);
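
The 9p (and, further down, afs) hunks are part of a tree-wide move of the generic lock fields into an embedded struct file_lock_core, reached as fl->c with flc_-prefixed names (flc_type, flc_flags, flc_pid, flc_file, flc_wait), while the byte-range fields (fl_start, fl_end) stay on struct file_lock itself. Open-coded tests become the helpers visible in the hunks; a sketch of the old fl_type comparison rewritten with them, assuming <linux/filelock.h>:

#include <linux/filelock.h>

/* Old: compare fl->fl_type.  New: accessor helpers over fl->c.flc_type. */
static const char *describe_lock(struct file_lock *fl)
{
        if (lock_is_read(fl))
                return "read";
        if (lock_is_write(fl))
                return "write";
        if (lock_is_unlock(fl))
                return "unlock";
        return "unknown";
}
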
index 89fdbefd1075f8f5a071987bca3b4a50f687a887..4bc7dd420874aa979d325d5bc620d4425f0e04db 100644 (file)
@@ -162,7 +162,6 @@ menu "DOS/FAT/EXFAT/NT Filesystems"
 
 source "fs/fat/Kconfig"
 source "fs/exfat/Kconfig"
-source "fs/ntfs/Kconfig"
 source "fs/ntfs3/Kconfig"
 
 endmenu
@@ -174,6 +173,13 @@ source "fs/proc/Kconfig"
 source "fs/kernfs/Kconfig"
 source "fs/sysfs/Kconfig"
 
+config FS_PID
+       bool "Pseudo filesystem for process file descriptors"
+       depends on 64BIT
+       default y
+       help
+         Pidfs implements advanced features for process file descriptors.
+
 config TMPFS
        bool "Tmpfs virtual memory file system support (former shm fs)"
        depends on SHMEM
index c09016257f05e82a772da50ab18e2d708ea5a768..6ecc9b0a53f2b0478385fe131c325d377d794c64 100644 (file)
@@ -15,7 +15,7 @@ obj-y :=      open.o read_write.o file_table.o super.o \
                pnode.o splice.o sync.o utimes.o d_path.o \
                stack.o fs_struct.o statfs.o fs_pin.o nsfs.o \
                fs_types.o fs_context.o fs_parser.o fsopen.o init.o \
-               kernel_read_file.o mnt_idmapping.o remap_range.o
+               kernel_read_file.o mnt_idmapping.o remap_range.o pidfs.o
 
 obj-$(CONFIG_BUFFER_HEAD)      += buffer.o mpage.o
 obj-$(CONFIG_PROC_FS)          += proc_namespace.o
@@ -91,7 +91,6 @@ obj-y                         += unicode/
 obj-$(CONFIG_SYSV_FS)          += sysv/
 obj-$(CONFIG_SMBFS)            += smb/
 obj-$(CONFIG_HPFS_FS)          += hpfs/
-obj-$(CONFIG_NTFS_FS)          += ntfs/
 obj-$(CONFIG_NTFS3_FS)         += ntfs3/
 obj-$(CONFIG_UFS_FS)           += ufs/
 obj-$(CONFIG_EFS_FS)           += efs/
index 60685ec76d983523f15da7c7e97b38634b750c07..2e612834329ac127ae6e63d802a7b31b89558b0c 100644 (file)
@@ -105,6 +105,7 @@ struct affs_sb_info {
        int work_queued;                /* non-zero delayed work is queued */
        struct delayed_work sb_work;    /* superblock flush delayed work */
        spinlock_t work_lock;           /* protects sb_work and work_queued */
+       struct rcu_head rcu;
 };
 
 #define AFFS_MOUNT_SF_INTL             0x0001 /* International filesystem. */
index 58b391446ae1fd97e48891c82ec8d88f32314303..b56a95cf414a44277783e7242c33cba5cb818707 100644 (file)
@@ -640,7 +640,7 @@ static void affs_kill_sb(struct super_block *sb)
                affs_brelse(sbi->s_root_bh);
                kfree(sbi->s_prefix);
                mutex_destroy(&sbi->s_bmlock);
-               kfree(sbi);
+               kfree_rcu(sbi, rcu);
        }
 }
 
index c14533ef108f191a7209f4f035e084fb8a41b57a..8a67fc427e748a0840d9e92c1c0e8e4a3d7c4fdc 100644 (file)
@@ -124,7 +124,7 @@ static void afs_dir_read_cleanup(struct afs_read *req)
                if (xas_retry(&xas, folio))
                        continue;
                BUG_ON(xa_is_value(folio));
-               ASSERTCMP(folio_file_mapping(folio), ==, mapping);
+               ASSERTCMP(folio->mapping, ==, mapping);
 
                folio_put(folio);
        }
@@ -202,12 +202,12 @@ static void afs_dir_dump(struct afs_vnode *dvnode, struct afs_read *req)
                if (xas_retry(&xas, folio))
                        continue;
 
-               BUG_ON(folio_file_mapping(folio) != mapping);
+               BUG_ON(folio->mapping != mapping);
 
                size = min_t(loff_t, folio_size(folio), req->actual_len - folio_pos(folio));
                for (offset = 0; offset < size; offset += sizeof(*block)) {
                        block = kmap_local_folio(folio, offset);
-                       pr_warn("[%02lx] %32phN\n", folio_index(folio) + offset, block);
+                       pr_warn("[%02lx] %32phN\n", folio->index + offset, block);
                        kunmap_local(block);
                }
        }
@@ -233,7 +233,7 @@ static int afs_dir_check(struct afs_vnode *dvnode, struct afs_read *req)
                if (xas_retry(&xas, folio))
                        continue;
 
-               BUG_ON(folio_file_mapping(folio) != mapping);
+               BUG_ON(folio->mapping != mapping);
 
                if (!afs_dir_check_folio(dvnode, folio, req->actual_len)) {
                        afs_dir_dump(dvnode, req);
@@ -474,6 +474,16 @@ static int afs_dir_iterate_block(struct afs_vnode *dvnode,
                        continue;
                }
 
+               /* Don't expose silly rename entries to userspace. */
+               if (nlen > 6 &&
+                   dire->u.name[0] == '.' &&
+                   ctx->actor != afs_lookup_filldir &&
+                   ctx->actor != afs_lookup_one_filldir &&
+                   memcmp(dire->u.name, ".__afs", 6) == 0) {
+                       ctx->pos = blkoff + next * sizeof(union afs_xdr_dirent);
+                       continue;
+               }
+
                /* found the next entry */
                if (!dir_emit(ctx, dire->u.name, nlen,
                              ntohl(dire->u.vnode),
@@ -708,6 +718,8 @@ static void afs_do_lookup_success(struct afs_operation *op)
                        break;
                }
 
+               if (vp->scb.status.abort_code)
+                       trace_afs_bulkstat_error(op, &vp->fid, i, vp->scb.status.abort_code);
                if (!vp->scb.have_status && !vp->scb.have_error)
                        continue;
 
@@ -897,12 +909,16 @@ static struct inode *afs_do_lookup(struct inode *dir, struct dentry *dentry,
                afs_begin_vnode_operation(op);
                afs_wait_for_operation(op);
        }
-       inode = ERR_PTR(afs_op_error(op));
 
 out_op:
        if (!afs_op_error(op)) {
-               inode = &op->file[1].vnode->netfs.inode;
-               op->file[1].vnode = NULL;
+               if (op->file[1].scb.status.abort_code) {
+                       afs_op_accumulate_error(op, -ECONNABORTED,
+                                               op->file[1].scb.status.abort_code);
+               } else {
+                       inode = &op->file[1].vnode->netfs.inode;
+                       op->file[1].vnode = NULL;
+               }
        }
 
        if (op->file[0].scb.have_status)
@@ -2022,7 +2038,7 @@ static bool afs_dir_release_folio(struct folio *folio, gfp_t gfp_flags)
 {
        struct afs_vnode *dvnode = AFS_FS_I(folio_inode(folio));
 
-       _enter("{{%llx:%llu}[%lu]}", dvnode->fid.vid, dvnode->fid.vnode, folio_index(folio));
+       _enter("{{%llx:%llu}[%lu]}", dvnode->fid.vid, dvnode->fid.vnode, folio->index);
 
        folio_detach_private(folio);
 
index d3bc4a2d708519624673be4fd0572e9080da732c..c4d2711e20ad4476cabc0d41b4e50e2aad4477e4 100644 (file)
@@ -258,16 +258,7 @@ const struct inode_operations afs_dynroot_inode_operations = {
        .lookup         = afs_dynroot_lookup,
 };
 
-/*
- * Dirs in the dynamic root don't need revalidation.
- */
-static int afs_dynroot_d_revalidate(struct dentry *dentry, unsigned int flags)
-{
-       return 1;
-}
-
 const struct dentry_operations afs_dynroot_dentry_operations = {
-       .d_revalidate   = afs_dynroot_d_revalidate,
        .d_delete       = always_delete_dentry,
        .d_release      = afs_d_release,
        .d_automount    = afs_d_automount,
index 3d33b221d9ca256a3b3d978a835d2db9fff2e284..ef2cc8f565d25b15e086d2fc64c6f565bac7a16b 100644 (file)
@@ -417,13 +417,17 @@ static void afs_add_open_mmap(struct afs_vnode *vnode)
 
 static void afs_drop_open_mmap(struct afs_vnode *vnode)
 {
-       if (!atomic_dec_and_test(&vnode->cb_nr_mmap))
+       if (atomic_add_unless(&vnode->cb_nr_mmap, -1, 1))
                return;
 
        down_write(&vnode->volume->open_mmaps_lock);
 
-       if (atomic_read(&vnode->cb_nr_mmap) == 0)
+       read_seqlock_excl(&vnode->cb_lock);
+       // the only place where ->cb_nr_mmap may hit 0
+       // see __afs_break_callback() for the other side...
+       if (atomic_dec_and_test(&vnode->cb_nr_mmap))
                list_del_init(&vnode->cb_mmap_link);
+       read_sequnlock_excl(&vnode->cb_lock);
 
        up_write(&vnode->volume->open_mmaps_lock);
        flush_work(&vnode->cb_work);
index 9c6dea3139f5a88497fc387190ca9050b182431a..f0e96a35093fa44871985d2edabd3e910a7b7a7d 100644 (file)
@@ -93,13 +93,13 @@ static void afs_grant_locks(struct afs_vnode *vnode)
        bool exclusive = (vnode->lock_type == AFS_LOCK_WRITE);
 
        list_for_each_entry_safe(p, _p, &vnode->pending_locks, fl_u.afs.link) {
-               if (!exclusive && p->fl_type == F_WRLCK)
+               if (!exclusive && lock_is_write(p))
                        continue;
 
                list_move_tail(&p->fl_u.afs.link, &vnode->granted_locks);
                p->fl_u.afs.state = AFS_LOCK_GRANTED;
                trace_afs_flock_op(vnode, p, afs_flock_op_grant);
-               wake_up(&p->fl_wait);
+               locks_wake_up(p);
        }
 }
 
@@ -112,25 +112,24 @@ static void afs_next_locker(struct afs_vnode *vnode, int error)
 {
        struct file_lock *p, *_p, *next = NULL;
        struct key *key = vnode->lock_key;
-       unsigned int fl_type = F_RDLCK;
+       unsigned int type = F_RDLCK;
 
        _enter("");
 
        if (vnode->lock_type == AFS_LOCK_WRITE)
-               fl_type = F_WRLCK;
+               type = F_WRLCK;
 
        list_for_each_entry_safe(p, _p, &vnode->pending_locks, fl_u.afs.link) {
                if (error &&
-                   p->fl_type == fl_type &&
-                   afs_file_key(p->fl_file) == key) {
+                   p->c.flc_type == type &&
+                   afs_file_key(p->c.flc_file) == key) {
                        list_del_init(&p->fl_u.afs.link);
                        p->fl_u.afs.state = error;
-                       wake_up(&p->fl_wait);
+                       locks_wake_up(p);
                }
 
                /* Select the next locker to hand off to. */
-               if (next &&
-                   (next->fl_type == F_WRLCK || p->fl_type == F_RDLCK))
+               if (next && (lock_is_write(next) || lock_is_read(p)))
                        continue;
                next = p;
        }
@@ -142,7 +141,7 @@ static void afs_next_locker(struct afs_vnode *vnode, int error)
                afs_set_lock_state(vnode, AFS_VNODE_LOCK_SETTING);
                next->fl_u.afs.state = AFS_LOCK_YOUR_TRY;
                trace_afs_flock_op(vnode, next, afs_flock_op_wake);
-               wake_up(&next->fl_wait);
+               locks_wake_up(next);
        } else {
                afs_set_lock_state(vnode, AFS_VNODE_LOCK_NONE);
                trace_afs_flock_ev(vnode, NULL, afs_flock_no_lockers, 0);
@@ -166,7 +165,7 @@ static void afs_kill_lockers_enoent(struct afs_vnode *vnode)
                               struct file_lock, fl_u.afs.link);
                list_del_init(&p->fl_u.afs.link);
                p->fl_u.afs.state = -ENOENT;
-               wake_up(&p->fl_wait);
+               locks_wake_up(p);
        }
 
        key_put(vnode->lock_key);
@@ -464,14 +463,14 @@ static int afs_do_setlk(struct file *file, struct file_lock *fl)
 
        _enter("{%llx:%llu},%llu-%llu,%u,%u",
               vnode->fid.vid, vnode->fid.vnode,
-              fl->fl_start, fl->fl_end, fl->fl_type, mode);
+              fl->fl_start, fl->fl_end, fl->c.flc_type, mode);
 
        fl->fl_ops = &afs_lock_ops;
        INIT_LIST_HEAD(&fl->fl_u.afs.link);
        fl->fl_u.afs.state = AFS_LOCK_PENDING;
 
        partial = (fl->fl_start != 0 || fl->fl_end != OFFSET_MAX);
-       type = (fl->fl_type == F_RDLCK) ? AFS_LOCK_READ : AFS_LOCK_WRITE;
+       type = lock_is_read(fl) ? AFS_LOCK_READ : AFS_LOCK_WRITE;
        if (mode == afs_flock_mode_write && partial)
                type = AFS_LOCK_WRITE;
 
@@ -524,7 +523,7 @@ static int afs_do_setlk(struct file *file, struct file_lock *fl)
        }
 
        if (vnode->lock_state == AFS_VNODE_LOCK_NONE &&
-           !(fl->fl_flags & FL_SLEEP)) {
+           !(fl->c.flc_flags & FL_SLEEP)) {
                ret = -EAGAIN;
                if (type == AFS_LOCK_READ) {
                        if (vnode->status.lock_count == -1)
@@ -621,7 +620,7 @@ skip_server_lock:
        return 0;
 
 lock_is_contended:
-       if (!(fl->fl_flags & FL_SLEEP)) {
+       if (!(fl->c.flc_flags & FL_SLEEP)) {
                list_del_init(&fl->fl_u.afs.link);
                afs_next_locker(vnode, 0);
                ret = -EAGAIN;
@@ -641,7 +640,7 @@ need_to_wait:
        spin_unlock(&vnode->lock);
 
        trace_afs_flock_ev(vnode, fl, afs_flock_waiting, 0);
-       ret = wait_event_interruptible(fl->fl_wait,
+       ret = wait_event_interruptible(fl->c.flc_wait,
                                       fl->fl_u.afs.state != AFS_LOCK_PENDING);
        trace_afs_flock_ev(vnode, fl, afs_flock_waited, ret);
 
@@ -704,7 +703,8 @@ static int afs_do_unlk(struct file *file, struct file_lock *fl)
        struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
        int ret;
 
-       _enter("{%llx:%llu},%u", vnode->fid.vid, vnode->fid.vnode, fl->fl_type);
+       _enter("{%llx:%llu},%u", vnode->fid.vid, vnode->fid.vnode,
+              fl->c.flc_type);
 
        trace_afs_flock_op(vnode, fl, afs_flock_op_unlock);
 
@@ -730,11 +730,11 @@ static int afs_do_getlk(struct file *file, struct file_lock *fl)
        if (vnode->lock_state == AFS_VNODE_LOCK_DELETED)
                return -ENOENT;
 
-       fl->fl_type = F_UNLCK;
+       fl->c.flc_type = F_UNLCK;
 
        /* check local lock records first */
        posix_test_lock(file, fl);
-       if (fl->fl_type == F_UNLCK) {
+       if (lock_is_unlock(fl)) {
                /* no local locks; consult the server */
                ret = afs_fetch_status(vnode, key, false, NULL);
                if (ret < 0)
@@ -743,18 +743,18 @@ static int afs_do_getlk(struct file *file, struct file_lock *fl)
                lock_count = READ_ONCE(vnode->status.lock_count);
                if (lock_count != 0) {
                        if (lock_count > 0)
-                               fl->fl_type = F_RDLCK;
+                               fl->c.flc_type = F_RDLCK;
                        else
-                               fl->fl_type = F_WRLCK;
+                               fl->c.flc_type = F_WRLCK;
                        fl->fl_start = 0;
                        fl->fl_end = OFFSET_MAX;
-                       fl->fl_pid = 0;
+                       fl->c.flc_pid = 0;
                }
        }
 
        ret = 0;
 error:
-       _leave(" = %d [%hd]", ret, fl->fl_type);
+       _leave(" = %d [%hd]", ret, fl->c.flc_type);
        return ret;
 }
 
@@ -769,7 +769,7 @@ int afs_lock(struct file *file, int cmd, struct file_lock *fl)
 
        _enter("{%llx:%llu},%d,{t=%x,fl=%x,r=%Ld:%Ld}",
               vnode->fid.vid, vnode->fid.vnode, cmd,
-              fl->fl_type, fl->fl_flags,
+              fl->c.flc_type, fl->c.flc_flags,
               (long long) fl->fl_start, (long long) fl->fl_end);
 
        if (IS_GETLK(cmd))
@@ -778,7 +778,7 @@ int afs_lock(struct file *file, int cmd, struct file_lock *fl)
        fl->fl_u.afs.debug_id = atomic_inc_return(&afs_file_lock_debug_id);
        trace_afs_flock_op(vnode, fl, afs_flock_op_lock);
 
-       if (fl->fl_type == F_UNLCK)
+       if (lock_is_unlock(fl))
                ret = afs_do_unlk(file, fl);
        else
                ret = afs_do_setlk(file, fl);
@@ -804,7 +804,7 @@ int afs_flock(struct file *file, int cmd, struct file_lock *fl)
 
        _enter("{%llx:%llu},%d,{t=%x,fl=%x}",
               vnode->fid.vid, vnode->fid.vnode, cmd,
-              fl->fl_type, fl->fl_flags);
+              fl->c.flc_type, fl->c.flc_flags);
 
        /*
         * No BSD flocks over AFS allowed.
@@ -813,14 +813,14 @@ int afs_flock(struct file *file, int cmd, struct file_lock *fl)
         * Not sure whether that would be unique, though, or whether
         * that would break in other places.
         */
-       if (!(fl->fl_flags & FL_FLOCK))
+       if (!(fl->c.flc_flags & FL_FLOCK))
                return -ENOLCK;
 
        fl->fl_u.afs.debug_id = atomic_inc_return(&afs_file_lock_debug_id);
        trace_afs_flock_op(vnode, fl, afs_flock_op_flock);
 
        /* we're simulating flock() locks using posix locks on the server */
-       if (fl->fl_type == F_UNLCK)
+       if (lock_is_unlock(fl))
                ret = afs_do_unlk(file, fl);
        else
                ret = afs_do_setlk(file, fl);
@@ -843,7 +843,7 @@ int afs_flock(struct file *file, int cmd, struct file_lock *fl)
  */
 static void afs_fl_copy_lock(struct file_lock *new, struct file_lock *fl)
 {
-       struct afs_vnode *vnode = AFS_FS_I(file_inode(fl->fl_file));
+       struct afs_vnode *vnode = AFS_FS_I(file_inode(fl->c.flc_file));
 
        _enter("");
 
@@ -861,7 +861,7 @@ static void afs_fl_copy_lock(struct file_lock *new, struct file_lock *fl)
  */
 static void afs_fl_release_private(struct file_lock *fl)
 {
-       struct afs_vnode *vnode = AFS_FS_I(file_inode(fl->fl_file));
+       struct afs_vnode *vnode = AFS_FS_I(file_inode(fl->c.flc_file));
 
        _enter("");
 
index 9c03fcf7ffaa84e9f7604444209bd934b64db466..6ce5a612937c61e2021b32cad1f68a22b7c501ca 100644 (file)
@@ -321,8 +321,7 @@ struct afs_net {
        struct list_head        fs_probe_slow;  /* List of afs_server to probe at 5m intervals */
        struct hlist_head       fs_proc;        /* procfs servers list */
 
-       struct hlist_head       fs_addresses4;  /* afs_server (by lowest IPv4 addr) */
-       struct hlist_head       fs_addresses6;  /* afs_server (by lowest IPv6 addr) */
+       struct hlist_head       fs_addresses;   /* afs_server (by lowest addr) */
        seqlock_t               fs_addr_lock;   /* For fs_addresses */
 
        struct work_struct      fs_manager;
@@ -561,8 +560,7 @@ struct afs_server {
        struct afs_server __rcu *uuid_next;     /* Next server with same UUID */
        struct afs_server       *uuid_prev;     /* Previous server with same UUID */
        struct list_head        probe_link;     /* Link in net->fs_probe_list */
-       struct hlist_node       addr4_link;     /* Link in net->fs_addresses4 */
-       struct hlist_node       addr6_link;     /* Link in net->fs_addresses6 */
+       struct hlist_node       addr_link;      /* Link in net->fs_addresses */
        struct hlist_node       proc_link;      /* Link in net->fs_proc */
        struct list_head        volumes;        /* RCU list of afs_server_entry objects */
        struct afs_server       *gc_next;       /* Next server in manager's list */
index 1b3bd21c168acc223bfaf39fa454d2cb49ae3fbb..a14f6013e316d964bfa6eef3e09befe62d591411 100644 (file)
@@ -90,8 +90,7 @@ static int __net_init afs_net_init(struct net *net_ns)
        INIT_LIST_HEAD(&net->fs_probe_slow);
        INIT_HLIST_HEAD(&net->fs_proc);
 
-       INIT_HLIST_HEAD(&net->fs_addresses4);
-       INIT_HLIST_HEAD(&net->fs_addresses6);
+       INIT_HLIST_HEAD(&net->fs_addresses);
        seqlock_init(&net->fs_addr_lock);
 
        INIT_WORK(&net->fs_manager, afs_manage_servers);
index 3bd02571f30debca6159756b5abe30e3dd905583..15eab053af6dc05931363c619cd32cf041093a3f 100644 (file)
@@ -166,7 +166,7 @@ static int afs_proc_addr_prefs_show(struct seq_file *m, void *v)
 
        if (!preflist) {
                seq_puts(m, "NO PREFS\n");
-               return 0;
+               goto out;
        }
 
        seq_printf(m, "PROT SUBNET                                      PRIOR (v=%u n=%u/%u/%u)\n",
@@ -191,7 +191,8 @@ static int afs_proc_addr_prefs_show(struct seq_file *m, void *v)
                }
        }
 
-       rcu_read_lock();
+out:
+       rcu_read_unlock();
        return 0;
 }
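
The hunk above fixes a read-side imbalance: the early return for a missing
preference list left the function without dropping the RCU read lock taken
earlier, and the common exit path called rcu_read_lock() where an unlock was
intended. A sketch of the balanced pattern the fix restores (the matching
rcu_read_lock() and the preflist lookup happen earlier in the function,
outside the context shown):

	rcu_read_lock();
	/* ... look up preflist under the read lock ... */
	if (!preflist) {
		seq_puts(m, "NO PREFS\n");
		goto out;	/* never return with the read lock held */
	}
	/* ... print the preference table ... */
out:
	rcu_read_unlock();
	return 0;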
 
index e169121f603e28d5679a895d0ca0f136270a6f56..038f9d0ae3af8ee1df24dc163c972e826c5d62fb 100644 (file)
@@ -38,7 +38,7 @@ struct afs_server *afs_find_server(struct afs_net *net, const struct rxrpc_peer
                seq++; /* 2 on the 1st/lockless path, otherwise odd */
                read_seqbegin_or_lock(&net->fs_addr_lock, &seq);
 
-               hlist_for_each_entry_rcu(server, &net->fs_addresses6, addr6_link) {
+               hlist_for_each_entry_rcu(server, &net->fs_addresses, addr_link) {
                        estate = rcu_dereference(server->endpoint_state);
                        alist = estate->addresses;
                        for (i = 0; i < alist->nr_addrs; i++)
@@ -177,10 +177,8 @@ added_dup:
         * bit, but anything we might want to do gets messy and memory
         * intensive.
         */
-       if (alist->nr_ipv4 > 0)
-               hlist_add_head_rcu(&server->addr4_link, &net->fs_addresses4);
-       if (alist->nr_addrs > alist->nr_ipv4)
-               hlist_add_head_rcu(&server->addr6_link, &net->fs_addresses6);
+       if (alist->nr_addrs > 0)
+               hlist_add_head_rcu(&server->addr_link, &net->fs_addresses);
 
        write_sequnlock(&net->fs_addr_lock);
 
@@ -511,10 +509,8 @@ static void afs_gc_servers(struct afs_net *net, struct afs_server *gc_list)
 
                        list_del(&server->probe_link);
                        hlist_del_rcu(&server->proc_link);
-                       if (!hlist_unhashed(&server->addr4_link))
-                               hlist_del_rcu(&server->addr4_link);
-                       if (!hlist_unhashed(&server->addr6_link))
-                               hlist_del_rcu(&server->addr6_link);
+                       if (!hlist_unhashed(&server->addr_link))
+                               hlist_del_rcu(&server->addr_link);
                }
                write_sequnlock(&net->fs_lock);
 
index 020ecd45e476214f08b9867412ec4b379889344d..af3a3f57c1b3f9512bcaa08ce37a0f8173e809d0 100644 (file)
@@ -353,7 +353,7 @@ static int afs_update_volume_status(struct afs_volume *volume, struct key *key)
 {
        struct afs_server_list *new, *old, *discard;
        struct afs_vldb_entry *vldb;
-       char idbuf[16];
+       char idbuf[24];
        int ret, idsz;
 
        _enter("");
@@ -361,7 +361,7 @@ static int afs_update_volume_status(struct afs_volume *volume, struct key *key)
        /* We look up an ID by passing it as a decimal string in the
         * operation's name parameter.
         */
-       idsz = sprintf(idbuf, "%llu", volume->vid);
+       idsz = snprintf(idbuf, sizeof(idbuf), "%llu", volume->vid);
 
        vldb = afs_vl_lookup_vldb(volume->cell, key, idbuf, idsz);
        if (IS_ERR(vldb)) {
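
The bump from 16 to 24 bytes matters because a u64 can need up to 20 decimal
digits plus a NUL terminator, so 16 bytes could overflow for large volume
IDs; switching to snprintf() also bounds the write if the size is ever wrong
again. The worst case, worked out:

	/* 2^64 - 1 = 18446744073709551615: 20 digits + NUL = 21 bytes */
	char idbuf[24];
	int idsz = snprintf(idbuf, sizeof(idbuf), "%llu",
			    18446744073709551615ULL);	/* idsz == 20 */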
index bb2ff48991f35ed59479a004641e1452c7bad3ea..9cdaa2faa5363333627e0cba54a4efe75b45b144 100644 (file)
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -589,13 +589,24 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events)
 
 void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel)
 {
-       struct aio_kiocb *req = container_of(iocb, struct aio_kiocb, rw);
-       struct kioctx *ctx = req->ki_ctx;
+       struct aio_kiocb *req;
+       struct kioctx *ctx;
        unsigned long flags;
 
+       /*
+        * The kiocb didn't come from aio, or it is neither a read nor a
+        * write, so ignore it.
+        */
+       if (!(iocb->ki_flags & IOCB_AIO_RW))
+               return;
+
+       req = container_of(iocb, struct aio_kiocb, rw);
+
        if (WARN_ON_ONCE(!list_empty(&req->ki_list)))
                return;
 
+       ctx = req->ki_ctx;
+
        spin_lock_irqsave(&ctx->ctx_lock, flags);
        list_add_tail(&req->ki_list, &ctx->active_reqs);
        req->ki_cancel = cancel;
@@ -1509,7 +1520,7 @@ static int aio_prep_rw(struct kiocb *req, const struct iocb *iocb)
        req->ki_complete = aio_complete_rw;
        req->private = NULL;
        req->ki_pos = iocb->aio_offset;
-       req->ki_flags = req->ki_filp->f_iocb_flags;
+       req->ki_flags = req->ki_filp->f_iocb_flags | IOCB_AIO_RW;
        if (iocb->aio_flags & IOCB_FLAG_RESFD)
                req->ki_flags |= IOCB_EVENTFD;
        if (iocb->aio_flags & IOCB_FLAG_IOPRIO) {
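
The two aio hunks work together: kiocb_set_cancel_fn() used to run
container_of() on any kiocb handed to it, which yields a wild pointer when
the kiocb is not actually embedded in a struct aio_kiocb (for example, one
submitted by a different subsystem). With the new IOCB_AIO_RW flag set only
in aio_prep_rw(), the guard-before-container_of idiom becomes:

	if (!(iocb->ki_flags & IOCB_AIO_RW))
		return;		/* not ours: not embedded in an aio_kiocb */
	req = container_of(iocb, struct aio_kiocb, rw);	/* now safe */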
index 5a13f0c8495fde67df096d4501d32c47e86f056e..49d23b5dbab4b971bb8756c3fca360c5618cc2c3 100644 (file)
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -352,7 +352,7 @@ int may_setattr(struct mnt_idmap *idmap, struct inode *inode,
 EXPORT_SYMBOL(may_setattr);
 
 /**
- * notify_change - modify attributes of a filesytem object
+ * notify_change - modify attributes of a filesystem object
  * @idmap:     idmap of the mount the inode was found from
  * @dentry:    object affected
  * @attr:      new attributes
index a681f38d84d8e170bcca715c6045811c7c7f163b..740185198db3473163bb141719e6e03514314c72 100644 (file)
@@ -325,9 +325,7 @@ EXPORT_SYMBOL_GPL(backing_file_mmap);
 
 static int __init backing_aio_init(void)
 {
-       backing_aio_cachep = kmem_cache_create("backing_aio",
-                                              sizeof(struct backing_aio),
-                                              0, SLAB_HWCACHE_ALIGN, NULL);
+       backing_aio_cachep = KMEM_CACHE(backing_aio, SLAB_HWCACHE_ALIGN);
        if (!backing_aio_cachep)
                return -ENOMEM;
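
KMEM_CACHE() derives the cache name, object size and alignment from the
struct itself, so the three can never drift apart. The macro expands to
roughly the call it replaces here:

	/* KMEM_CACHE(backing_aio, SLAB_HWCACHE_ALIGN) is approximately: */
	kmem_cache_create("backing_aio", sizeof(struct backing_aio),
			  __alignof__(struct backing_aio),
			  SLAB_HWCACHE_ALIGN, NULL);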
 
index 10704f2d3af5302f71a931e13bd0ba5432d46fe2..fd3e175d83423261d68124cd26fc0351488ad05e 100644 (file)
@@ -1715,7 +1715,7 @@ static int bch2_discard_one_bucket(struct btree_trans *trans,
                 * This works without any other locks because this is the only
                 * thread that removes items from the need_discard tree
                 */
-               bch2_trans_unlock(trans);
+               bch2_trans_unlock_long(trans);
                blkdev_issue_discard(ca->disk_sb.bdev,
                                     k.k->p.offset * ca->mi.bucket_size,
                                     ca->mi.bucket_size,
index b4dc319bcb2bc0a5363e74f6d2096d3b5652599d..569b97904da42eec8975e8662dd78895d41d62fe 100644 (file)
@@ -68,9 +68,11 @@ void bch2_backpointer_to_text(struct printbuf *out, const struct bch_backpointer
 
 void bch2_backpointer_k_to_text(struct printbuf *out, struct bch_fs *c, struct bkey_s_c k)
 {
-       prt_str(out, "bucket=");
-       bch2_bpos_to_text(out, bp_pos_to_bucket(c, k.k->p));
-       prt_str(out, " ");
+       if (bch2_dev_exists2(c, k.k->p.inode)) {
+               prt_str(out, "bucket=");
+               bch2_bpos_to_text(out, bp_pos_to_bucket(c, k.k->p));
+               prt_str(out, " ");
+       }
 
        bch2_backpointer_to_text(out, bkey_s_c_to_backpointer(k).v);
 }
index b80c6c9efd8cef95b46b5b45b21f639e18373755..69d0d60d50e366edf9e56ba101dda047536f5338 100644 (file)
@@ -1249,6 +1249,18 @@ static inline struct stdio_redirect *bch2_fs_stdio_redirect(struct bch_fs *c)
        return stdio;
 }
 
+static inline unsigned metadata_replicas_required(struct bch_fs *c)
+{
+       return min(c->opts.metadata_replicas,
+                  c->opts.metadata_replicas_required);
+}
+
+static inline unsigned data_replicas_required(struct bch_fs *c)
+{
+       return min(c->opts.data_replicas,
+                  c->opts.data_replicas_required);
+}
+
 #define BKEY_PADDED_ONSTACK(key, pad)                          \
        struct { struct bkey_i key; __u64 key ## _pad[pad]; }
 
index 5467a8635be113102c56bb6f02986209533c35ac..3ef338df82f5e46228f583a85a7cacdba233a64b 100644 (file)
@@ -2156,7 +2156,9 @@ struct bkey_s_c bch2_btree_iter_peek_upto(struct btree_iter *iter, struct bpos e
                 * isn't monotonically increasing before FILTER_SNAPSHOTS, and
                 * that's what we check against in extents mode:
                 */
-               if (k.k->p.inode > end.inode)
+               if (unlikely(!(iter->flags & BTREE_ITER_IS_EXTENTS)
+                            ? bkey_gt(k.k->p, end)
+                            : k.k->p.inode > end.inode))
                        goto end;
 
                if (iter->update_path &&
index bed75c93c06904e06f70e3afa92cc507a68b81c9..6843974423381029e7a8cf24fd4cd5c6c33627cd 100644 (file)
@@ -92,7 +92,7 @@ static noinline void print_cycle(struct printbuf *out, struct lock_graph *g)
                        continue;
 
                bch2_btree_trans_to_text(out, i->trans);
-               bch2_prt_task_backtrace(out, task, i == g->g ? 5 : 1);
+               bch2_prt_task_backtrace(out, task, i == g->g ? 5 : 1, GFP_NOWAIT);
        }
 }
 
@@ -227,7 +227,7 @@ static noinline int break_cycle(struct lock_graph *g, struct printbuf *cycle)
                        prt_printf(&buf, "backtrace:");
                        prt_newline(&buf);
                        printbuf_indent_add(&buf, 2);
-                       bch2_prt_task_backtrace(&buf, trans->locking_wait.task, 2);
+                       bch2_prt_task_backtrace(&buf, trans->locking_wait.task, 2, GFP_NOWAIT);
                        printbuf_indent_sub(&buf, 2);
                        prt_newline(&buf);
                }
index 17a5938aa71a6b43b45c12383e4690df146ee2a3..4530b14ff2c3717ec15e92615385c04e185e28e1 100644 (file)
@@ -280,7 +280,8 @@ retry:
                                      writepoint_ptr(&c->btree_write_point),
                                      &devs_have,
                                      res->nr_replicas,
-                                     c->opts.metadata_replicas_required,
+                                     min(res->nr_replicas,
+                                         c->opts.metadata_replicas_required),
                                      watermark, 0, cl, &wp);
        if (unlikely(ret))
                return ERR_PTR(ret);
index cadda9bbe4a4cd67fe3b6f6f7aa5a5d93e496307..7bdba8507fc93cdfdecc29de3e70e5589cf8177b 100644 (file)
@@ -627,7 +627,7 @@ restart:
                prt_printf(&i->buf, "backtrace:");
                prt_newline(&i->buf);
                printbuf_indent_add(&i->buf, 2);
-               bch2_prt_task_backtrace(&i->buf, task, 0);
+               bch2_prt_task_backtrace(&i->buf, task, 0, GFP_KERNEL);
                printbuf_indent_sub(&i->buf, 2);
                prt_newline(&i->buf);
 
index 73c12e565af50a465260856baaa831eb2a542caa..27710cdd5710ec5bba9ff9a11cad92f7cf14bc09 100644 (file)
@@ -303,18 +303,6 @@ void bch2_readahead(struct readahead_control *ractl)
        darray_exit(&readpages_iter.folios);
 }
 
-static void __bchfs_readfolio(struct bch_fs *c, struct bch_read_bio *rbio,
-                            subvol_inum inum, struct folio *folio)
-{
-       bch2_folio_create(folio, __GFP_NOFAIL);
-
-       rbio->bio.bi_opf = REQ_OP_READ|REQ_SYNC;
-       rbio->bio.bi_iter.bi_sector = folio_sector(folio);
-       BUG_ON(!bio_add_folio(&rbio->bio, folio, folio_size(folio), 0));
-
-       bch2_trans_run(c, (bchfs_read(trans, rbio, inum, NULL), 0));
-}
-
 static void bch2_read_single_folio_end_io(struct bio *bio)
 {
        complete(bio->bi_private);
@@ -329,6 +317,9 @@ int bch2_read_single_folio(struct folio *folio, struct address_space *mapping)
        int ret;
        DECLARE_COMPLETION_ONSTACK(done);
 
+       if (!bch2_folio_create(folio, GFP_KERNEL))
+               return -ENOMEM;
+
        bch2_inode_opts_get(&opts, c, &inode->ei_inode);
 
        rbio = rbio_init(bio_alloc_bioset(NULL, 1, REQ_OP_READ, GFP_KERNEL, &c->bio_read),
@@ -336,7 +327,11 @@ int bch2_read_single_folio(struct folio *folio, struct address_space *mapping)
        rbio->bio.bi_private = &done;
        rbio->bio.bi_end_io = bch2_read_single_folio_end_io;
 
-       __bchfs_readfolio(c, rbio, inode_inum(inode), folio);
+       rbio->bio.bi_opf = REQ_OP_READ|REQ_SYNC;
+       rbio->bio.bi_iter.bi_sector = folio_sector(folio);
+       BUG_ON(!bio_add_folio(&rbio->bio, folio, folio_size(folio), 0));
+
+       bch2_trans_run(c, (bchfs_read(trans, rbio, inode_inum(inode), NULL), 0));
        wait_for_completion(&done);
 
        ret = blk_status_to_errno(rbio->bio.bi_status);
index e3b219e19e1008ccfe1ff61e966115795f9c1831..33cb6da3a5ad28f2c014c2ef12408937933d49c3 100644 (file)
@@ -88,6 +88,8 @@ static int bch2_direct_IO_read(struct kiocb *req, struct iov_iter *iter)
                return ret;
 
        shorten = iov_iter_count(iter) - round_up(ret, block_bytes(c));
+       if (shorten >= iter->count)
+               shorten = 0;
        iter->count -= shorten;
 
        bio = bio_alloc_bioset(NULL,
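
The guard matters because shorten is unsigned: when round_up(ret,
block_bytes(c)) exceeds the iterator count, the subtraction wraps to a huge
value and the following "iter->count -= shorten" would corrupt the iterator.
A concrete instance, assuming 4096-byte blocks:

	/* iter->count = 512, ret = 512: round_up(512, 4096) = 4096,
	 * so 512 - 4096 wraps; treat that as nothing to shorten. */
	if (shorten >= iter->count)
		shorten = 0;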
index dc52918d06ef3f91c30484822a5a170b08543f9c..8c70123b6a0c809b6d50040593281c2e9c115828 100644 (file)
@@ -79,7 +79,7 @@ void bch2_inode_flush_nocow_writes_async(struct bch_fs *c,
                        continue;
 
                bio = container_of(bio_alloc_bioset(ca->disk_sb.bdev, 0,
-                                                   REQ_OP_FLUSH,
+                                                   REQ_OP_WRITE|REQ_PREFLUSH,
                                                    GFP_KERNEL,
                                                    &c->nocow_flush_bioset),
                                   struct nocow_flush, bio);
index 3a4c24c28e7fa06deff38f6bb0b240a5daacda8c..3dc8630ff9fe139bd44317d72502ed9bf1f73751 100644 (file)
@@ -455,6 +455,7 @@ static long bch2_ioctl_subvolume_destroy(struct bch_fs *c, struct file *filp,
        if (IS_ERR(victim))
                return PTR_ERR(victim);
 
+       dir = d_inode(path.dentry);
        if (victim->d_sb->s_fs_info != c) {
                ret = -EXDEV;
                goto err;
@@ -463,14 +464,13 @@ static long bch2_ioctl_subvolume_destroy(struct bch_fs *c, struct file *filp,
                ret = -ENOENT;
                goto err;
        }
-       dir = d_inode(path.dentry);
        ret = __bch2_unlink(dir, victim, true);
        if (!ret) {
                fsnotify_rmdir(dir, victim);
                d_delete(victim);
        }
-       inode_unlock(dir);
 err:
+       inode_unlock(dir);
        dput(victim);
        path_put(&path);
        return ret;
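
The reordering fixes lock balance on the error paths: dir is now assigned
before any goto err, and inode_unlock(dir) sits under the err label, so the
-EXDEV and -ENOENT exits (which previously jumped past the unlock) release
the directory lock exactly once. The canonical shape, sketched:

	inode_lock(dir);
	if (failed)
		goto err;	/* still runs the unlock below */
	/* ... do the work ... */
err:
	inode_unlock(dir);
	return ret;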
index ec419b8e2c43123b42e0d84c837611fc5f6e2314..77ae65542db9166a4168a78a55064295bb1d9ebf 100644 (file)
@@ -435,7 +435,7 @@ static int bch2_link(struct dentry *old_dentry, struct inode *vdir,
                bch2_subvol_is_ro(c, inode->ei_subvol) ?:
                __bch2_link(c, inode, dir, dentry);
        if (unlikely(ret))
-               return ret;
+               return bch2_err_class(ret);
 
        ihold(&inode->v);
        d_instantiate(dentry, &inode->v);
@@ -487,8 +487,9 @@ static int bch2_unlink(struct inode *vdir, struct dentry *dentry)
        struct bch_inode_info *dir= to_bch_ei(vdir);
        struct bch_fs *c = dir->v.i_sb->s_fs_info;
 
-       return bch2_subvol_is_ro(c, dir->ei_subvol) ?:
+       int ret = bch2_subvol_is_ro(c, dir->ei_subvol) ?:
                __bch2_unlink(vdir, dentry, false);
+       return bch2_err_class(ret);
 }
 
 static int bch2_symlink(struct mnt_idmap *idmap,
@@ -523,7 +524,7 @@ static int bch2_symlink(struct mnt_idmap *idmap,
        return 0;
 err:
        iput(&inode->v);
-       return ret;
+       return bch2_err_class(ret);
 }
 
 static int bch2_mkdir(struct mnt_idmap *idmap,
@@ -641,7 +642,7 @@ err:
                           src_inode,
                           dst_inode);
 
-       return ret;
+       return bch2_err_class(ret);
 }
 
 static void bch2_setattr_copy(struct mnt_idmap *idmap,
index 4f0ecd60567570b7364cef517225ea0e3dfa5575..6a760777bafb06d08b449ee0db4308a77b54b11e 100644 (file)
@@ -119,22 +119,19 @@ static int lookup_inode(struct btree_trans *trans, u64 inode_nr,
        if (!ret)
                *snapshot = iter.pos.snapshot;
 err:
-       bch_err_msg(trans->c, ret, "fetching inode %llu:%u", inode_nr, *snapshot);
        bch2_trans_iter_exit(trans, &iter);
        return ret;
 }
 
-static int __lookup_dirent(struct btree_trans *trans,
+static int lookup_dirent_in_snapshot(struct btree_trans *trans,
                           struct bch_hash_info hash_info,
                           subvol_inum dir, struct qstr *name,
-                          u64 *target, unsigned *type)
+                          u64 *target, unsigned *type, u32 snapshot)
 {
        struct btree_iter iter;
        struct bkey_s_c_dirent d;
-       int ret;
-
-       ret = bch2_hash_lookup(trans, &iter, bch2_dirent_hash_desc,
-                              &hash_info, dir, name, 0);
+       int ret = bch2_hash_lookup_in_snapshot(trans, &iter, bch2_dirent_hash_desc,
+                              &hash_info, dir, name, 0, snapshot);
        if (ret)
                return ret;
 
@@ -225,15 +222,16 @@ static int lookup_lostfound(struct btree_trans *trans, u32 snapshot,
 
        struct bch_inode_unpacked root_inode;
        struct bch_hash_info root_hash_info;
-       ret = lookup_inode(trans, root_inum.inum, &root_inode, &snapshot);
+       u32 root_inode_snapshot = snapshot;
+       ret = lookup_inode(trans, root_inum.inum, &root_inode, &root_inode_snapshot);
        bch_err_msg(c, ret, "looking up root inode");
        if (ret)
                return ret;
 
        root_hash_info = bch2_hash_info_init(c, &root_inode);
 
-       ret = __lookup_dirent(trans, root_hash_info, root_inum,
-                             &lostfound_str, &inum, &d_type);
+       ret = lookup_dirent_in_snapshot(trans, root_hash_info, root_inum,
+                             &lostfound_str, &inum, &d_type, snapshot);
        if (bch2_err_matches(ret, ENOENT))
                goto create_lostfound;
 
@@ -250,7 +248,10 @@ static int lookup_lostfound(struct btree_trans *trans, u32 snapshot,
         * The bch2_check_dirents pass has already run; dangling dirents
         * shouldn't exist here:
         */
-       return lookup_inode(trans, inum, lostfound, &snapshot);
+       ret = lookup_inode(trans, inum, lostfound, &snapshot);
+       bch_err_msg(c, ret, "looking up lost+found %llu:%u in (root inode %llu, snapshot root %u)",
+                   inum, snapshot, root_inum.inum, bch2_snapshot_root(c, snapshot));
+       return ret;
 
 create_lostfound:
        /*
index ef3a53f9045af2591ab1f9e272dd9d6151250444..2c098ac017b30b6a4b5d016e9f5dde93ee258f2f 100644 (file)
@@ -1564,6 +1564,7 @@ CLOSURE_CALLBACK(bch2_write)
        BUG_ON(!op->write_point.v);
        BUG_ON(bkey_eq(op->pos, POS_MAX));
 
+       op->nr_replicas_required = min_t(unsigned, op->nr_replicas_required, op->nr_replicas);
        op->start_time = local_clock();
        bch2_keylist_init(&op->insert_keys, op->inline_keys);
        wbio_init(bio)->put_bio = false;
index d71d26e39521e4410a90cb6bf3e21df360e6c201..bc890776eb57933a5931edd2a2f07570f52b7ab3 100644 (file)
@@ -233,7 +233,7 @@ static void __journal_entry_close(struct journal *j, unsigned closed_val, bool t
                prt_str(&pbuf, "entry size: ");
                prt_human_readable_u64(&pbuf, vstruct_bytes(buf->data));
                prt_newline(&pbuf);
-               bch2_prt_task_backtrace(&pbuf, current, 1);
+               bch2_prt_task_backtrace(&pbuf, current, 1, GFP_NOWAIT);
                trace_journal_entry_close(c, pbuf.buf);
                printbuf_exit(&pbuf);
        }
index 04a1e79a5ed392cd8ebaac922a2516b374a6d094..47805193f18cc72c941f72f5b82cfb461eb8982c 100644 (file)
@@ -1478,6 +1478,8 @@ static int journal_write_alloc(struct journal *j, struct journal_buf *w)
                c->opts.foreground_target;
        unsigned i, replicas = 0, replicas_want =
                READ_ONCE(c->opts.metadata_replicas);
+       unsigned replicas_need = min_t(unsigned, replicas_want,
+                                      READ_ONCE(c->opts.metadata_replicas_required));
 
        rcu_read_lock();
 retry:
@@ -1526,7 +1528,7 @@ done:
 
        BUG_ON(bkey_val_u64s(&w->key.k) > BCH_REPLICAS_MAX);
 
-       return replicas >= c->opts.metadata_replicas_required ? 0 : -EROFS;
+       return replicas >= replicas_need ? 0 : -EROFS;
 }
 
 static void journal_buf_realloc(struct journal *j, struct journal_buf *buf)
@@ -1988,7 +1990,8 @@ CLOSURE_CALLBACK(bch2_journal_write)
                        percpu_ref_get(&ca->io_ref);
 
                        bio = ca->journal.bio;
-                       bio_reset(bio, ca->disk_sb.bdev, REQ_OP_FLUSH);
+                       bio_reset(bio, ca->disk_sb.bdev,
+                                 REQ_OP_WRITE|REQ_PREFLUSH);
                        bio->bi_end_io          = journal_write_endio;
                        bio->bi_private         = ca;
                        closure_bio_submit(bio, cl);
index 820d25e19e5fe3ee6a45e70f23eb74fc1d558e88..c33dca641575dffc58b6db8354e71c879ed5cf26 100644 (file)
@@ -205,7 +205,7 @@ void bch2_journal_space_available(struct journal *j)
 
        j->can_discard = can_discard;
 
-       if (nr_online < c->opts.metadata_replicas_required) {
+       if (nr_online < metadata_replicas_required(c)) {
                ret = JOURNAL_ERR_insufficient_devices;
                goto out;
        }
@@ -892,9 +892,11 @@ int bch2_journal_flush_device_pins(struct journal *j, int dev_idx)
                                         journal_seq_pin(j, seq)->devs);
                seq++;
 
-               spin_unlock(&j->lock);
-               ret = bch2_mark_replicas(c, &replicas.e);
-               spin_lock(&j->lock);
+               if (replicas.e.nr_devs) {
+                       spin_unlock(&j->lock);
+                       ret = bch2_mark_replicas(c, &replicas.e);
+                       spin_lock(&j->lock);
+               }
        }
        spin_unlock(&j->lock);
 err:
index b2be565bb8f214bc2ac4ebd6efac324ac20b7241..64df11ab422bf455560bad095973cc6e5a296697 100644 (file)
@@ -17,7 +17,7 @@
  * Rust and rustc have issues with u128.
  */
 
-#if defined(__SIZEOF_INT128__) && defined(__KERNEL__)
+#if defined(__SIZEOF_INT128__) && defined(__KERNEL__) && !defined(CONFIG_PARISC)
 
 typedef struct {
        unsigned __int128 v;
index accf246c32330919869bccff32a1ecfcc6d97856..b27d22925929a6554079fb8731f82dfb3dd0421c 100644 (file)
@@ -56,6 +56,7 @@ void bch2_prt_vprintf(struct printbuf *out, const char *fmt, va_list args)
 
                va_copy(args2, args);
                len = vsnprintf(out->buf + out->pos, printbuf_remaining(out), fmt, args2);
+               va_end(args2);
        } while (len + 1 >= printbuf_remaining(out) &&
                 !bch2_printbuf_make_room(out, len + 1));
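
Every va_copy() must be paired with a va_end() before the copy goes out of
scope; on ABIs where va_list carries allocated state, omitting it leaks. The
loop now follows the canonical pattern:

	va_list args2;

	va_copy(args2, args);			/* take a private copy ...   */
	len = vsnprintf(buf, size, fmt, args2);	/* ... consume it ...        */
	va_end(args2);				/* ... and always release it */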
 
index 9127d0e3ca2f6a3fd44e076b42f01ee6f7736427..21e13bb4335be3b6d48005282000c2f0a7c4e2bd 100644 (file)
@@ -577,8 +577,9 @@ u64 bch2_recovery_passes_from_stable(u64 v)
 
 static bool check_version_upgrade(struct bch_fs *c)
 {
-       unsigned latest_compatible = bch2_latest_compatible_version(c->sb.version);
        unsigned latest_version = bcachefs_metadata_version_current;
+       unsigned latest_compatible = min(latest_version,
+                                        bch2_latest_compatible_version(c->sb.version));
        unsigned old_version = c->sb.version_upgrade_complete ?: c->sb.version;
        unsigned new_version = 0;
 
@@ -597,7 +598,7 @@ static bool check_version_upgrade(struct bch_fs *c)
                        new_version = latest_version;
                        break;
                case BCH_VERSION_UPGRADE_none:
-                       new_version = old_version;
+                       new_version = min(old_version, latest_version);
                        break;
                }
        }
@@ -774,7 +775,7 @@ int bch2_fs_recovery(struct bch_fs *c)
                goto err;
        }
 
-       if (!(c->opts.nochanges && c->opts.norecovery)) {
+       if (!c->opts.nochanges) {
                mutex_lock(&c->sb_lock);
                bool write_sb = false;
 
@@ -804,7 +805,7 @@ int bch2_fs_recovery(struct bch_fs *c)
                if (bch2_check_version_downgrade(c)) {
                        struct printbuf buf = PRINTBUF;
 
-                       prt_str(&buf, "Version downgrade required:\n");
+                       prt_str(&buf, "Version downgrade required:");
 
                        __le64 passes = ext->recovery_passes_required[0];
                        bch2_sb_set_downgrade(c,
@@ -812,7 +813,7 @@ int bch2_fs_recovery(struct bch_fs *c)
                                        BCH_VERSION_MINOR(c->sb.version));
                        passes = ext->recovery_passes_required[0] & ~passes;
                        if (passes) {
-                               prt_str(&buf, "  running recovery passes: ");
+                               prt_str(&buf, "\n  running recovery passes: ");
                                prt_bitflags(&buf, bch2_recovery_passes,
                                             bch2_recovery_passes_from_stable(le64_to_cpu(passes)));
                        }
index a45354d2acde9f3ad0b149247c8ff4c7c869fb15..eff5ce18c69c0600047c1fef688a5980af33c678 100644 (file)
@@ -421,7 +421,7 @@ void bch2_dev_errors_reset(struct bch_dev *ca)
        m = bch2_members_v2_get_mut(c->disk_sb.sb, ca->dev_idx);
        for (unsigned i = 0; i < ARRAY_SIZE(m->errors_at_reset); i++)
                m->errors_at_reset[i] = cpu_to_le64(atomic64_read(&ca->errors[i]));
-       m->errors_reset_time = ktime_get_real_seconds();
+       m->errors_reset_time = cpu_to_le64(ktime_get_real_seconds());
 
        bch2_write_super(c);
        mutex_unlock(&c->sb_lock);
index 45f67e8b29eb67f188e5cfb32aa39e0b1ad1d625..ac6ba04d5521714ece2e2cb00400fff60ec05eb6 100644 (file)
@@ -728,7 +728,7 @@ static int check_snapshot(struct btree_trans *trans,
                return 0;
 
        memset(&s, 0, sizeof(s));
-       memcpy(&s, k.v, bkey_val_bytes(k.k));
+       memcpy(&s, k.v, min(sizeof(s), bkey_val_bytes(k.k)));
 
        id = le32_to_cpu(s.parent);
        if (id) {
index 89fdb7c21134ebbb6c145a88ed5b1943ab54588a..fcaa5a888744881a4f6c37dd77fbd8cf73b2f4d0 100644 (file)
@@ -160,21 +160,16 @@ static inline bool is_visible_key(struct bch_hash_desc desc, subvol_inum inum, s
 }
 
 static __always_inline int
-bch2_hash_lookup(struct btree_trans *trans,
+bch2_hash_lookup_in_snapshot(struct btree_trans *trans,
                 struct btree_iter *iter,
                 const struct bch_hash_desc desc,
                 const struct bch_hash_info *info,
                 subvol_inum inum, const void *key,
-                unsigned flags)
+                unsigned flags, u32 snapshot)
 {
        struct bkey_s_c k;
-       u32 snapshot;
        int ret;
 
-       ret = bch2_subvolume_get_snapshot(trans, inum.subvol, &snapshot);
-       if (ret)
-               return ret;
-
        for_each_btree_key_upto_norestart(trans, *iter, desc.btree_id,
                           SPOS(inum.inum, desc.hash_key(info, key), snapshot),
                           POS(inum.inum, U64_MAX),
@@ -194,6 +189,19 @@ bch2_hash_lookup(struct btree_trans *trans,
        return ret ?: -BCH_ERR_ENOENT_str_hash_lookup;
 }
 
+static __always_inline int
+bch2_hash_lookup(struct btree_trans *trans,
+                struct btree_iter *iter,
+                const struct bch_hash_desc desc,
+                const struct bch_hash_info *info,
+                subvol_inum inum, const void *key,
+                unsigned flags)
+{
+       u32 snapshot;
+       return  bch2_subvolume_get_snapshot(trans, inum.subvol, &snapshot) ?:
+               bch2_hash_lookup_in_snapshot(trans, iter, desc, info, inum, key, flags, snapshot);
+}
+
 static __always_inline int
 bch2_hash_hole(struct btree_trans *trans,
               struct btree_iter *iter,
index d60c7d27a0477cb0de116675671d5c888d8f1c86..bd64eb68e84af4c6b7afed028a6bd7d9ae2cc5d6 100644 (file)
@@ -142,8 +142,8 @@ void bch2_sb_field_delete(struct bch_sb_handle *sb,
 void bch2_free_super(struct bch_sb_handle *sb)
 {
        kfree(sb->bio);
-       if (!IS_ERR_OR_NULL(sb->bdev_handle))
-               bdev_release(sb->bdev_handle);
+       if (!IS_ERR_OR_NULL(sb->s_bdev_file))
+               fput(sb->s_bdev_file);
        kfree(sb->holder);
        kfree(sb->sb_name);
 
@@ -704,22 +704,22 @@ retry:
        if (!opt_get(*opts, nochanges))
                sb->mode |= BLK_OPEN_WRITE;
 
-       sb->bdev_handle = bdev_open_by_path(path, sb->mode, sb->holder, &bch2_sb_handle_bdev_ops);
-       if (IS_ERR(sb->bdev_handle) &&
-           PTR_ERR(sb->bdev_handle) == -EACCES &&
+       sb->s_bdev_file = bdev_file_open_by_path(path, sb->mode, sb->holder, &bch2_sb_handle_bdev_ops);
+       if (IS_ERR(sb->s_bdev_file) &&
+           PTR_ERR(sb->s_bdev_file) == -EACCES &&
            opt_get(*opts, read_only)) {
                sb->mode &= ~BLK_OPEN_WRITE;
 
-               sb->bdev_handle = bdev_open_by_path(path, sb->mode, sb->holder, &bch2_sb_handle_bdev_ops);
-               if (!IS_ERR(sb->bdev_handle))
+               sb->s_bdev_file = bdev_file_open_by_path(path, sb->mode, sb->holder, &bch2_sb_handle_bdev_ops);
+               if (!IS_ERR(sb->s_bdev_file))
                        opt_set(*opts, nochanges, true);
        }
 
-       if (IS_ERR(sb->bdev_handle)) {
-               ret = PTR_ERR(sb->bdev_handle);
-               goto out;
+       if (IS_ERR(sb->s_bdev_file)) {
+               ret = PTR_ERR(sb->s_bdev_file);
+               goto err;
        }
-       sb->bdev = sb->bdev_handle->bdev;
+       sb->bdev = file_bdev(sb->s_bdev_file);
 
        ret = bch2_sb_realloc(sb, 0);
        if (ret) {
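
This is part of the tree-wide move from struct bdev_handle to file-backed
block device opens. The new lifecycle, sketched from the calls used in this
hunk (error handling elided):

	struct file *f = bdev_file_open_by_path(path, mode, holder, ops);
	struct block_device *bdev = file_bdev(f);	/* borrow the bdev */
	/* ... use bdev ... */
	fput(f);					/* closes the device */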
index b9911402b1753baa986a1673339c4454eba87431..6b23e11825e6d47ef46c7f294add46fa455e6a8f 100644 (file)
@@ -1428,10 +1428,10 @@ bool bch2_dev_state_allowed(struct bch_fs *c, struct bch_dev *ca,
 
                required = max(!(flags & BCH_FORCE_IF_METADATA_DEGRADED)
                               ? c->opts.metadata_replicas
-                              : c->opts.metadata_replicas_required,
+                              : metadata_replicas_required(c),
                               !(flags & BCH_FORCE_IF_DATA_DEGRADED)
                               ? c->opts.data_replicas
-                              : c->opts.data_replicas_required);
+                              : data_replicas_required(c));
 
                return nr_rw >= required;
        case BCH_MEMBER_STATE_failed:
index 0e5a14fc8e7fbfde622ec68dfae45f69ad83bd87..ec784d975f6655a378207692644975e53271ddca 100644 (file)
@@ -4,7 +4,7 @@
 
 struct bch_sb_handle {
        struct bch_sb           *sb;
-       struct bdev_handle      *bdev_handle;
+       struct file             *s_bdev_file;
        struct block_device     *bdev;
        char                    *sb_name;
        struct bio              *bio;
index b1c867aa2b58e6f097cba1e4eedc37f55a58cc93..9220d7de10db67f6cd4a36040af7fe557756230b 100644 (file)
@@ -53,9 +53,9 @@ int bch2_run_thread_with_file(struct thread_with_file *thr,
        if (ret)
                goto err;
 
-       fd_install(fd, file);
        get_task_struct(thr->task);
        wake_up_process(thr->task);
+       fd_install(fd, file);
        return fd;
 err:
        if (fd >= 0)
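
fd_install() publishes the descriptor to userspace immediately, and another
thread of the caller can use or close it from that instant, so it has to be
the last step; installing it before the task reference and wake-up left a
window where the fd was live while setup was still in flight:

	get_task_struct(thr->task);	/* hold the thread for the caller */
	wake_up_process(thr->task);	/* let it run */
	fd_install(fd, file);		/* only now expose the fd */
	return fd;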
index a135136adeee355cb8854482e85b0c85e6c1b8f8..3a32faa86b5c4a2eee98de32951c18dc73052041 100644 (file)
@@ -272,14 +272,14 @@ void bch2_print_string_as_lines(const char *prefix, const char *lines)
        console_unlock();
 }
 
-int bch2_save_backtrace(bch_stacktrace *stack, struct task_struct *task, unsigned skipnr)
+int bch2_save_backtrace(bch_stacktrace *stack, struct task_struct *task, unsigned skipnr,
+                       gfp_t gfp)
 {
 #ifdef CONFIG_STACKTRACE
        unsigned nr_entries = 0;
-       int ret = 0;
 
        stack->nr = 0;
-       ret = darray_make_room(stack, 32);
+       int ret = darray_make_room_gfp(stack, 32, gfp);
        if (ret)
                return ret;
 
@@ -289,7 +289,7 @@ int bch2_save_backtrace(bch_stacktrace *stack, struct task_struct *task, unsigne
        do {
                nr_entries = stack_trace_save_tsk(task, stack->data, stack->size, skipnr + 1);
        } while (nr_entries == stack->size &&
-                !(ret = darray_make_room(stack, stack->size * 2)));
+                !(ret = darray_make_room_gfp(stack, stack->size * 2, gfp)));
 
        stack->nr = nr_entries;
        up_read(&task->signal->exec_update_lock);
@@ -308,10 +308,10 @@ void bch2_prt_backtrace(struct printbuf *out, bch_stacktrace *stack)
        }
 }
 
-int bch2_prt_task_backtrace(struct printbuf *out, struct task_struct *task, unsigned skipnr)
+int bch2_prt_task_backtrace(struct printbuf *out, struct task_struct *task, unsigned skipnr, gfp_t gfp)
 {
        bch_stacktrace stack = { 0 };
-       int ret = bch2_save_backtrace(&stack, task, skipnr + 1);
+       int ret = bch2_save_backtrace(&stack, task, skipnr + 1, gfp);
 
        bch2_prt_backtrace(out, &stack);
        darray_exit(&stack);
@@ -418,14 +418,15 @@ static inline void bch2_time_stats_update_one(struct bch2_time_stats *stats,
                bch2_quantiles_update(&stats->quantiles, duration);
        }
 
-       if (time_after64(end, stats->last_event)) {
+       if (stats->last_event && time_after64(end, stats->last_event)) {
                freq = end - stats->last_event;
                mean_and_variance_update(&stats->freq_stats, freq);
                mean_and_variance_weighted_update(&stats->freq_stats_weighted, freq);
                stats->max_freq = max(stats->max_freq, freq);
                stats->min_freq = min(stats->min_freq, freq);
-               stats->last_event = end;
        }
+
+       stats->last_event = end;
 }
 
 static void __bch2_time_stats_clear_buffer(struct bch2_time_stats *stats,
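
On the very first call last_event is still zero, so the old code fed
freq = end - 0 (an absurdly large interval) into the frequency statistics.
The fix only takes a frequency sample once a previous event exists, while
always remembering the timestamp:

	if (stats->last_event && time_after64(end, stats->last_event)) {
		u64 freq = end - stats->last_event;	/* genuine interval */
		/* update mean/variance and min/max with freq, as above */
	}
	stats->last_event = end;	/* unconditionally record this event */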
index df67bf55fe2bc2d74265eb8a52fe6d22fca2fd2f..b414736d59a5b36d1344657eaeb6de6113ec5a09 100644 (file)
@@ -348,9 +348,9 @@ void bch2_prt_u64_base2(struct printbuf *, u64);
 void bch2_print_string_as_lines(const char *prefix, const char *lines);
 
 typedef DARRAY(unsigned long) bch_stacktrace;
-int bch2_save_backtrace(bch_stacktrace *stack, struct task_struct *, unsigned);
+int bch2_save_backtrace(bch_stacktrace *stack, struct task_struct *, unsigned, gfp_t);
 void bch2_prt_backtrace(struct printbuf *, bch_stacktrace *);
-int bch2_prt_task_backtrace(struct printbuf *, struct task_struct *, unsigned);
+int bch2_prt_task_backtrace(struct printbuf *, struct task_struct *, unsigned, gfp_t);
 
 static inline void prt_bdevname(struct printbuf *out, struct block_device *bdev)
 {
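
Threading a gfp_t through the backtrace helpers lets callers in atomic or
lock-holding contexts (the btree deadlock-cycle printer, journal entry
close) allocate with GFP_NOWAIT, while sleepable callers such as debugfs
keep GFP_KERNEL. The call-site rule of thumb, as in the hunks above:

	bch2_prt_task_backtrace(&buf, task, 1, GFP_NOWAIT);	/* under locks */
	bch2_prt_task_backtrace(&buf, task, 0, GFP_KERNEL);	/* may sleep  */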
index a9be9ac9922225bb32801aec5834c9e9d87ffc97..378d9103a2072b1628e66d850a42b9254be72b36 100644 (file)
@@ -1455,6 +1455,7 @@ out:
  */
 void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
 {
+       LIST_HEAD(retry_list);
        struct btrfs_block_group *block_group;
        struct btrfs_space_info *space_info;
        struct btrfs_trans_handle *trans;
@@ -1476,6 +1477,7 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
 
        spin_lock(&fs_info->unused_bgs_lock);
        while (!list_empty(&fs_info->unused_bgs)) {
+               u64 used;
                int trimming;
 
                block_group = list_first_entry(&fs_info->unused_bgs,
@@ -1511,9 +1513,9 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
                        goto next;
                }
 
+               spin_lock(&space_info->lock);
                spin_lock(&block_group->lock);
-               if (block_group->reserved || block_group->pinned ||
-                   block_group->used || block_group->ro ||
+               if (btrfs_is_block_group_used(block_group) || block_group->ro ||
                    list_is_singular(&block_group->list)) {
                        /*
                         * We want to bail if we made new allocations or have
@@ -1523,10 +1525,49 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
                         */
                        trace_btrfs_skip_unused_block_group(block_group);
                        spin_unlock(&block_group->lock);
+                       spin_unlock(&space_info->lock);
                        up_write(&space_info->groups_sem);
                        goto next;
                }
+
+               /*
+                * The block group may be unused, but there may be reserved
+                * space accounted to it because it exists; that is,
+                * space_info->bytes_may_use was incremented by a task but no
+                * space was yet allocated from the block group by that task.
+                * That space may or may not be allocated, as we are generally
+                * pessimistic about space reservation for metadata as well as
+                * for data when using compression (we reserve space based on
+                * the worst case, when data can't be compressed, before
+                * actually attempting compression and before starting writeback).
+                *
+                * So check whether the total space of the space_info minus the
+                * size of this block group is less than the used space of the
+                * space_info. If so, tasks might be relying on the block group
+                * in order to allocate extents, so add the block group back to
+                * the unused list when we finish, and retry later in case no
+                * tasks ended up needing to allocate extents from it.
+                */
+               used = btrfs_space_info_used(space_info, true);
+               if (space_info->total_bytes - block_group->length < used) {
+                       /*
+                        * Add a reference for the list, compensate for the ref
+                        * drop under the "next" label for the
+                        * fs_info->unused_bgs list.
+                        */
+                       btrfs_get_block_group(block_group);
+                       list_add_tail(&block_group->bg_list, &retry_list);
+
+                       trace_btrfs_skip_unused_block_group(block_group);
+                       spin_unlock(&block_group->lock);
+                       spin_unlock(&space_info->lock);
+                       up_write(&space_info->groups_sem);
+                       goto next;
+               }
+
                spin_unlock(&block_group->lock);
+               spin_unlock(&space_info->lock);
 
                /* We don't want to force the issue, only flip if it's ok. */
                ret = inc_block_group_ro(block_group, 0);
@@ -1650,12 +1691,16 @@ next:
                btrfs_put_block_group(block_group);
                spin_lock(&fs_info->unused_bgs_lock);
        }
+       list_splice_tail(&retry_list, &fs_info->unused_bgs);
        spin_unlock(&fs_info->unused_bgs_lock);
        mutex_unlock(&fs_info->reclaim_bgs_lock);
        return;
 
 flip_async:
        btrfs_end_transaction(trans);
+       spin_lock(&fs_info->unused_bgs_lock);
+       list_splice_tail(&retry_list, &fs_info->unused_bgs);
+       spin_unlock(&fs_info->unused_bgs_lock);
        mutex_unlock(&fs_info->reclaim_bgs_lock);
        btrfs_put_block_group(block_group);
        btrfs_discard_punt_unused_bgs_list(fs_info);
@@ -2684,6 +2729,37 @@ next:
                btrfs_dec_delayed_refs_rsv_bg_inserts(fs_info);
                list_del_init(&block_group->bg_list);
                clear_bit(BLOCK_GROUP_FLAG_NEW, &block_group->runtime_flags);
+
+               /*
+                * If the block group is still unused, add it to the list of
+                * unused block groups. The block group may have been created in
+                * order to satisfy a space reservation, in which case the
+                * extent allocation only happens later. But often we don't
+                * actually need to allocate space that we previously reserved,
+                * so the block group may become unused for a long time. For
+                * example for metadata we generally reserve space for a worst
+                * possible scenario, but then don't end up allocating all that
+                * space or none at all (due to no need to COW, extent buffers
+                * were already COWed in the current transaction and still
+                * unwritten, tree heights lower than the maximum possible
+                * height, etc). For data we generally reserve the exact amount
+                * of space we are going to allocate later; the exception is
+                * when using compression, as we must reserve space based on the
+                * uncompressed data size, because the compression is only done
+                * when writeback is triggered and we don't know how much space
+                * we are actually going to need, so we reserve the uncompressed
+                * size because the data may be incompressible in the worst case.
+                */
+               if (ret == 0) {
+                       bool used;
+
+                       spin_lock(&block_group->lock);
+                       used = btrfs_is_block_group_used(block_group);
+                       spin_unlock(&block_group->lock);
+
+                       if (!used)
+                               btrfs_mark_bg_unused(block_group);
+               }
        }
        btrfs_trans_release_chunk_metadata(trans);
 }
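
The heart of the new deferral is a single comparison made under both
space_info->lock and block_group->lock. A compressed sketch of the decision
(defer_deletion() is a hypothetical stand-in for the ref-taking and
list_add_tail onto retry_list shown above):

	u64 used = btrfs_space_info_used(space_info, true);

	/*
	 * If the rest of the space_info cannot cover what is already
	 * used or reserved, some task may still allocate from this
	 * group, so put off deleting it and retry later.
	 */
	if (space_info->total_bytes - block_group->length < used)
		defer_deletion(block_group);	/* hypothetical helper */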
index c4a1f01cc1c240d108702fc8899de9efe00da613..962b11983901a86ae16add7962c5ea5a26796b6f 100644 (file)
@@ -257,6 +257,13 @@ static inline u64 btrfs_block_group_end(struct btrfs_block_group *block_group)
        return (block_group->start + block_group->length);
 }
 
+static inline bool btrfs_is_block_group_used(const struct btrfs_block_group *bg)
+{
+       lockdep_assert_held(&bg->lock);
+
+       return (bg->used > 0 || bg->reserved > 0 || bg->pinned > 0);
+}
+
 static inline bool btrfs_is_block_group_data_only(
                                        struct btrfs_block_group *block_group)
 {
index ceb5f586a2d55571d53db2de227f4ef0f5ec1c27..1043a8142351b2692587f4a5e6d11147ee7fde99 100644 (file)
@@ -494,7 +494,7 @@ struct btrfs_block_rsv *btrfs_use_block_rsv(struct btrfs_trans_handle *trans,
 
        block_rsv = get_block_rsv(trans, root);
 
-       if (unlikely(block_rsv->size == 0))
+       if (unlikely(btrfs_block_rsv_size(block_rsv) == 0))
                goto try_reserve;
 again:
        ret = btrfs_block_rsv_use_bytes(block_rsv, blocksize);
index b0bd12b8652f4f51e467a95b4bfa36ec8d894837..43a9a6b5a79f4622607529393eaced15cb1409ac 100644 (file)
@@ -101,4 +101,36 @@ static inline bool btrfs_block_rsv_full(const struct btrfs_block_rsv *rsv)
        return data_race(rsv->full);
 }
 
+/*
+ * Get the reserved amount of a block reserve in a context where getting a
+ * stale value is acceptable, instead of accessing it directly and triggering
+ * a data race warning from KCSAN.
+ */
+static inline u64 btrfs_block_rsv_reserved(struct btrfs_block_rsv *rsv)
+{
+       u64 ret;
+
+       spin_lock(&rsv->lock);
+       ret = rsv->reserved;
+       spin_unlock(&rsv->lock);
+
+       return ret;
+}
+
+/*
+ * Get the size of a block reserve in a context where getting a stale value is
+ * acceptable, instead of accessing it directly and triggering a data race
+ * warning from KCSAN.
+ */
+static inline u64 btrfs_block_rsv_size(struct btrfs_block_rsv *rsv)
+{
+       u64 ret;
+
+       spin_lock(&rsv->lock);
+       ret = rsv->size;
+       spin_unlock(&rsv->lock);
+
+       return ret;
+}
+
 #endif /* BTRFS_BLOCK_RSV_H */
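
Unlike btrfs_block_rsv_full() above, which annotates a lockless read with
data_race(), these two take the spinlock and return a coherent snapshot: the
value may be stale by the time the caller acts on it, but KCSAN sees a
properly synchronized access. The btrfs_use_block_rsv() hunk earlier shows
the intended call-site pattern:

	if (unlikely(btrfs_block_rsv_size(block_rsv) == 0))	/* snapshot */
		goto try_reserve;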
index 193168214eeb17fc8a8a9cff3942eb3f68958e1b..68345f73d429aa2d4537ef620a0048e61c4eb7a8 100644 (file)
@@ -141,16 +141,16 @@ static int compression_decompress_bio(struct list_head *ws,
 }
 
 static int compression_decompress(int type, struct list_head *ws,
-               const u8 *data_in, struct page *dest_page,
-               unsigned long start_byte, size_t srclen, size_t destlen)
+               const u8 *data_in, struct page *dest_page,
+               unsigned long dest_pgoff, size_t srclen, size_t destlen)
 {
        switch (type) {
        case BTRFS_COMPRESS_ZLIB: return zlib_decompress(ws, data_in, dest_page,
-                                               start_byte, srclen, destlen);
+                                               dest_pgoff, srclen, destlen);
        case BTRFS_COMPRESS_LZO:  return lzo_decompress(ws, data_in, dest_page,
-                                               start_byte, srclen, destlen);
+                                               dest_pgoff, srclen, destlen);
        case BTRFS_COMPRESS_ZSTD: return zstd_decompress(ws, data_in, dest_page,
-                                               start_byte, srclen, destlen);
+                                               dest_pgoff, srclen, destlen);
        case BTRFS_COMPRESS_NONE:
        default:
                /*
@@ -1037,14 +1037,23 @@ static int btrfs_decompress_bio(struct compressed_bio *cb)
  * dest_pgoff tells us the byte offset into the destination page we're interested in
  */
 int btrfs_decompress(int type, const u8 *data_in, struct page *dest_page,
-                    unsigned long start_byte, size_t srclen, size_t destlen)
+                    unsigned long dest_pgoff, size_t srclen, size_t destlen)
 {
+       struct btrfs_fs_info *fs_info = btrfs_sb(dest_page->mapping->host->i_sb);
        struct list_head *workspace;
+       const u32 sectorsize = fs_info->sectorsize;
        int ret;
 
+       /*
+        * The full destination page range should not exceed the page size,
+        * and @destlen should not exceed the sectorsize, as this is only
+        * called for inline file extents, which never exceed a sector.
+        */
+       ASSERT(dest_pgoff + destlen <= PAGE_SIZE && destlen <= sectorsize);
+
        workspace = get_workspace(type, 0);
        ret = compression_decompress(type, workspace, data_in, dest_page,
-                                    start_byte, srclen, destlen);
+                                    dest_pgoff, srclen, destlen);
        put_workspace(type, workspace);
 
        return ret;
index 93cc92974deee4cebb4fd25d38118f2c046e1840..afd7e50d073d4ac743c924b70e7e1734af2f6ffc 100644 (file)
@@ -148,7 +148,7 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
                unsigned long *total_in, unsigned long *total_out);
 int zlib_decompress_bio(struct list_head *ws, struct compressed_bio *cb);
 int zlib_decompress(struct list_head *ws, const u8 *data_in,
-               struct page *dest_page, unsigned long start_byte, size_t srclen,
+               struct page *dest_page, unsigned long dest_pgoff, size_t srclen,
                size_t destlen);
 struct list_head *zlib_alloc_workspace(unsigned int level);
 void zlib_free_workspace(struct list_head *ws);
@@ -159,7 +159,7 @@ int lzo_compress_pages(struct list_head *ws, struct address_space *mapping,
                unsigned long *total_in, unsigned long *total_out);
 int lzo_decompress_bio(struct list_head *ws, struct compressed_bio *cb);
 int lzo_decompress(struct list_head *ws, const u8 *data_in,
-               struct page *dest_page, unsigned long start_byte, size_t srclen,
+               struct page *dest_page, unsigned long dest_pgoff, size_t srclen,
                size_t destlen);
 struct list_head *lzo_alloc_workspace(unsigned int level);
 void lzo_free_workspace(struct list_head *ws);
index c276b136ab63a16d5278a654ab26a670098806c0..5b0b645714183a7adcbe093d48964ee7380c4b24 100644 (file)
@@ -1046,7 +1046,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
                        goto add;
 
                /* Skip too large extent */
-               if (range_len >= extent_thresh)
+               if (em->len >= extent_thresh)
                        goto next;
 
                /*
index 2833e8ef4c098f680a4883d41a1e925dc477bc2f..acf9f4b6c044025fe2ef288e99716d0373d01f31 100644 (file)
@@ -245,7 +245,6 @@ static void btrfs_calculate_inode_block_rsv_size(struct btrfs_fs_info *fs_info,
        struct btrfs_block_rsv *block_rsv = &inode->block_rsv;
        u64 reserve_size = 0;
        u64 qgroup_rsv_size = 0;
-       u64 csum_leaves;
        unsigned outstanding_extents;
 
        lockdep_assert_held(&inode->lock);
@@ -260,10 +259,12 @@ static void btrfs_calculate_inode_block_rsv_size(struct btrfs_fs_info *fs_info,
                                                outstanding_extents);
                reserve_size += btrfs_calc_metadata_size(fs_info, 1);
        }
-       csum_leaves = btrfs_csum_bytes_to_leaves(fs_info,
-                                                inode->csum_bytes);
-       reserve_size += btrfs_calc_insert_metadata_size(fs_info,
-                                                       csum_leaves);
+       if (!(inode->flags & BTRFS_INODE_NODATASUM)) {
+               u64 csum_leaves;
+
+               csum_leaves = btrfs_csum_bytes_to_leaves(fs_info, inode->csum_bytes);
+               reserve_size += btrfs_calc_insert_metadata_size(fs_info, csum_leaves);
+       }
        /*
         * For qgroup rsv, the calculation is very simple:
         * account one nodesize for each outstanding extent
@@ -278,14 +279,20 @@ static void btrfs_calculate_inode_block_rsv_size(struct btrfs_fs_info *fs_info,
        spin_unlock(&block_rsv->lock);
 }
 
-static void calc_inode_reservations(struct btrfs_fs_info *fs_info,
+static void calc_inode_reservations(struct btrfs_inode *inode,
                                    u64 num_bytes, u64 disk_num_bytes,
                                    u64 *meta_reserve, u64 *qgroup_reserve)
 {
+       struct btrfs_fs_info *fs_info = inode->root->fs_info;
        u64 nr_extents = count_max_extents(fs_info, num_bytes);
-       u64 csum_leaves = btrfs_csum_bytes_to_leaves(fs_info, disk_num_bytes);
+       u64 csum_leaves;
        u64 inode_update = btrfs_calc_metadata_size(fs_info, 1);
 
+       if (inode->flags & BTRFS_INODE_NODATASUM)
+               csum_leaves = 0;
+       else
+               csum_leaves = btrfs_csum_bytes_to_leaves(fs_info, disk_num_bytes);
+
        *meta_reserve = btrfs_calc_insert_metadata_size(fs_info,
                                                nr_extents + csum_leaves);
 
@@ -337,7 +344,7 @@ int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes,
         * everything out and try again, which is bad.  This way we just
         * over-reserve slightly, and clean up the mess when we are done.
         */
-       calc_inode_reservations(fs_info, num_bytes, disk_num_bytes,
+       calc_inode_reservations(inode, num_bytes, disk_num_bytes,
                                &meta_reserve, &qgroup_reserve);
        ret = btrfs_qgroup_reserve_meta_prealloc(root, qgroup_reserve, true,
                                                 noflush);
@@ -359,7 +366,8 @@ int btrfs_delalloc_reserve_metadata(struct btrfs_inode *inode, u64 num_bytes,
        nr_extents = count_max_extents(fs_info, num_bytes);
        spin_lock(&inode->lock);
        btrfs_mod_outstanding_extents(inode, nr_extents);
-       inode->csum_bytes += disk_num_bytes;
+       if (!(inode->flags & BTRFS_INODE_NODATASUM))
+               inode->csum_bytes += disk_num_bytes;
        btrfs_calculate_inode_block_rsv_size(fs_info, inode);
        spin_unlock(&inode->lock);
 
@@ -393,7 +401,8 @@ void btrfs_delalloc_release_metadata(struct btrfs_inode *inode, u64 num_bytes,
 
        num_bytes = ALIGN(num_bytes, fs_info->sectorsize);
        spin_lock(&inode->lock);
-       inode->csum_bytes -= num_bytes;
+       if (!(inode->flags & BTRFS_INODE_NODATASUM))
+               inode->csum_bytes -= num_bytes;
        btrfs_calculate_inode_block_rsv_size(fs_info, inode);
        spin_unlock(&inode->lock);
 
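
The two hunks above encode the same rule in both reservation paths: an inode flagged NODATASUM never gets checksum items, so no csum leaves should be reserved for it. A minimal userspace sketch of that rule follows; the sector size, leaf capacity and flag value are illustrative assumptions, not btrfs's real geometry.

#include <stdint.h>
#include <stdio.h>

#define SECTORSIZE       4096ULL        /* assumed */
#define CSUMS_PER_LEAF   2048ULL        /* assumed leaf capacity */
#define INODE_NODATASUM  (1U << 0)      /* illustrative flag bit */

/* Round up: how many leaves the checksums for @bytes of data need. */
static uint64_t csum_bytes_to_leaves(uint64_t bytes)
{
        uint64_t nr_csums = bytes / SECTORSIZE;

        return (nr_csums + CSUMS_PER_LEAF - 1) / CSUMS_PER_LEAF;
}

static uint64_t csum_leaves_for(unsigned int flags, uint64_t disk_num_bytes)
{
        /* NODATASUM inodes never get checksum items: reserve nothing. */
        if (flags & INODE_NODATASUM)
                return 0;
        return csum_bytes_to_leaves(disk_num_bytes);
}

int main(void)
{
        printf("COW 1MiB       -> %llu csum leaves\n",
               (unsigned long long)csum_leaves_for(0, 1ULL << 20));
        printf("NODATASUM 1MiB -> %llu csum leaves\n",
               (unsigned long long)csum_leaves_for(INODE_NODATASUM, 1ULL << 20));
        return 0;
}
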
index 1502d664c89273eb54ba3516528b74eab094f3b3..fb33027e5a4cd05b89c74ef97b25f9de9249cbf3 100644 (file)
@@ -246,7 +246,7 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
 {
        struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
        struct btrfs_device *device;
-       struct bdev_handle *bdev_handle;
+       struct file *bdev_file;
        struct block_device *bdev;
        u64 devid = BTRFS_DEV_REPLACE_DEVID;
        int ret = 0;
@@ -257,13 +257,13 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
                return -EINVAL;
        }
 
-       bdev_handle = bdev_open_by_path(device_path, BLK_OPEN_WRITE,
+       bdev_file = bdev_file_open_by_path(device_path, BLK_OPEN_WRITE,
                                        fs_info->bdev_holder, NULL);
-       if (IS_ERR(bdev_handle)) {
+       if (IS_ERR(bdev_file)) {
                btrfs_err(fs_info, "target device %s is invalid!", device_path);
-               return PTR_ERR(bdev_handle);
+               return PTR_ERR(bdev_file);
        }
-       bdev = bdev_handle->bdev;
+       bdev = file_bdev(bdev_file);
 
        if (!btrfs_check_device_zone_type(fs_info, bdev)) {
                btrfs_err(fs_info,
@@ -314,7 +314,7 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
        device->commit_bytes_used = device->bytes_used;
        device->fs_info = fs_info;
        device->bdev = bdev;
-       device->bdev_handle = bdev_handle;
+       device->bdev_file = bdev_file;
        set_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
        set_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state);
        device->dev_stats_valid = 1;
@@ -335,7 +335,7 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
        return 0;
 
 error:
-       bdev_release(bdev_handle);
+       fput(bdev_file);
        return ret;
 }
 
@@ -725,6 +725,23 @@ leave:
        return ret;
 }
 
+static int btrfs_check_replace_dev_names(struct btrfs_ioctl_dev_replace_args *args)
+{
+       if (args->start.srcdevid == 0) {
+               if (memchr(args->start.srcdev_name, 0,
+                          sizeof(args->start.srcdev_name)) == NULL)
+                       return -ENAMETOOLONG;
+       } else {
+               args->start.srcdev_name[0] = 0;
+       }
+
+       if (memchr(args->start.tgtdev_name, 0,
+                  sizeof(args->start.tgtdev_name)) == NULL)
+               return -ENAMETOOLONG;
+
+       return 0;
+}
+
 int btrfs_dev_replace_by_ioctl(struct btrfs_fs_info *fs_info,
                            struct btrfs_ioctl_dev_replace_args *args)
 {
@@ -737,10 +754,9 @@ int btrfs_dev_replace_by_ioctl(struct btrfs_fs_info *fs_info,
        default:
                return -EINVAL;
        }
-
-       if ((args->start.srcdevid == 0 && args->start.srcdev_name[0] == '\0') ||
-           args->start.tgtdev_name[0] == '\0')
-               return -EINVAL;
+       ret = btrfs_check_replace_dev_names(args);
+       if (ret < 0)
+               return ret;
 
        ret = btrfs_dev_replace_start(fs_info, args->start.tgtdev_name,
                                        args->start.srcdevid,
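
btrfs_check_replace_dev_names() replaces the old first-byte test with a bounded scan: a name copied in from the ioctl is only usable as a string if a NUL appears somewhere inside the fixed-size array. A standalone sketch of the same check, with an assumed array length standing in for the ioctl's field size:

#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEV_NAME_LEN 1024       /* stands in for the ioctl's fixed array size */

static int check_dev_name(const char name[DEV_NAME_LEN])
{
        /*
         * memchr() never reads past the array, unlike strlen() on an
         * unterminated buffer; no NUL within bounds means the name was
         * truncated or is not a valid string.
         */
        if (memchr(name, 0, DEV_NAME_LEN) == NULL)
                return -ENAMETOOLONG;
        return 0;
}

int main(void)
{
        char name[DEV_NAME_LEN] = "/dev/sdb";

        return check_dev_name(name);
}
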
index c6907d533fe83912576fd92283658539e0abbb81..c843563914cad08e2dd84ef1741e19d933f092ee 100644 (file)
@@ -1307,12 +1307,12 @@ void btrfs_free_fs_info(struct btrfs_fs_info *fs_info)
  *
  * @objectid:  root id
  * @anon_dev:  preallocated anonymous block device number for new roots,
- *             pass 0 for new allocation.
+ *             pass NULL for a new allocation.
  * @check_ref: whether to check root item references; if true, return -ENOENT
  *             for orphan roots
  */
 static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info,
-                                            u64 objectid, dev_t anon_dev,
+                                            u64 objectid, dev_t *anon_dev,
                                             bool check_ref)
 {
        struct btrfs_root *root;
@@ -1336,8 +1336,17 @@ static struct btrfs_root *btrfs_get_root_ref(struct btrfs_fs_info *fs_info,
 again:
        root = btrfs_lookup_fs_root(fs_info, objectid);
        if (root) {
-               /* Shouldn't get preallocated anon_dev for cached roots */
-               ASSERT(!anon_dev);
+               /*
+                * Some other caller may have read out the newly inserted
+                * subvolume already (for things like backref walk etc.).  Not
+                * that common but still possible.  In that case, we just need
+                * to free the anon_dev.
+                */
+               if (unlikely(anon_dev && *anon_dev)) {
+                       free_anon_bdev(*anon_dev);
+                       *anon_dev = 0;
+               }
+
                if (check_ref && btrfs_root_refs(&root->root_item) == 0) {
                        btrfs_put_root(root);
                        return ERR_PTR(-ENOENT);
@@ -1357,7 +1366,7 @@ again:
                goto fail;
        }
 
-       ret = btrfs_init_fs_root(root, anon_dev);
+       ret = btrfs_init_fs_root(root, anon_dev ? *anon_dev : 0);
        if (ret)
                goto fail;
 
@@ -1393,7 +1402,7 @@ fail:
         * root's anon_dev to 0 to avoid a double free, once by btrfs_put_root()
         * and once again by our caller.
         */
-       if (anon_dev)
+       if (anon_dev && *anon_dev)
                root->anon_dev = 0;
        btrfs_put_root(root);
        return ERR_PTR(ret);
@@ -1409,7 +1418,7 @@ fail:
 struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
                                     u64 objectid, bool check_ref)
 {
-       return btrfs_get_root_ref(fs_info, objectid, 0, check_ref);
+       return btrfs_get_root_ref(fs_info, objectid, NULL, check_ref);
 }
 
 /*
@@ -1417,11 +1426,11 @@ struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
  * the anonymous block device id
  *
  * @objectid:  tree objectid
- * @anon_dev:  if zero, allocate a new anonymous block device or use the
- *             parameter value
+ * @anon_dev:  if NULL, allocate a new anonymous block device; otherwise
+ *             use the preallocated device number it points to
  */
 struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info,
-                                        u64 objectid, dev_t anon_dev)
+                                        u64 objectid, dev_t *anon_dev)
 {
        return btrfs_get_root_ref(fs_info, objectid, anon_dev, true);
 }
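
Switching btrfs_get_root_ref() to take `dev_t *anon_dev` encodes an ownership hand-off: when the root turns out to be cached already, the callee frees the preallocated id and zeroes the caller's copy so no later cleanup path frees it twice. A small model of that zero-on-consume convention, with free_anon_bdev() stubbed out:

#include <stdio.h>

/* Stub standing in for free_anon_bdev(). */
static void free_id(int id)
{
        printf("freed id %d\n", id);
}

/*
 * Consume a preallocated id: either hand it to the new object, or, if
 * the object already exists, free it and zero the caller's copy so a
 * later cleanup path cannot double-free it.
 */
static void take_id(int *id, int already_cached)
{
        if (!id || !*id)
                return;          /* nothing was preallocated */
        if (already_cached) {
                free_id(*id);    /* someone else created the object first */
                *id = 0;         /* caller must not free it again */
        }
}

int main(void)
{
        int id = 42;

        take_id(&id, 1);
        if (id == 0)
                printf("caller sees the id as consumed\n");
        return 0;
}
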
index 9413726b329bb123202a66cf341320ca2d99e410..eb3473d1c1ac1b239092a594cf0f788f961b4943 100644 (file)
@@ -61,7 +61,7 @@ void btrfs_free_fs_roots(struct btrfs_fs_info *fs_info);
 struct btrfs_root *btrfs_get_fs_root(struct btrfs_fs_info *fs_info,
                                     u64 objectid, bool check_ref);
 struct btrfs_root *btrfs_get_new_fs_root(struct btrfs_fs_info *fs_info,
-                                        u64 objectid, dev_t anon_dev);
+                                        u64 objectid, dev_t *anon_dev);
 struct btrfs_root *btrfs_get_fs_root_commit_root(struct btrfs_fs_info *fs_info,
                                                 struct btrfs_path *path,
                                                 u64 objectid);
index f396aba92c579641d1cce38b48e7e7cd4febc510..8e8cc11112772dfd020217e30d74fe138c3151ca 100644 (file)
@@ -1260,7 +1260,8 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
        u64 bytes_left, end;
        u64 aligned_start = ALIGN(start, 1 << SECTOR_SHIFT);
 
-       if (WARN_ON(start != aligned_start)) {
+       /* Adjust the range to be aligned to 512B sectors if necessary. */
+       if (start != aligned_start) {
                len -= aligned_start - start;
                len = round_down(len, 1 << SECTOR_SHIFT);
                start = aligned_start;
@@ -4298,6 +4299,42 @@ static int prepare_allocation_clustered(struct btrfs_fs_info *fs_info,
        return 0;
 }
 
+static int prepare_allocation_zoned(struct btrfs_fs_info *fs_info,
+                                   struct find_free_extent_ctl *ffe_ctl)
+{
+       if (ffe_ctl->for_treelog) {
+               spin_lock(&fs_info->treelog_bg_lock);
+               if (fs_info->treelog_bg)
+                       ffe_ctl->hint_byte = fs_info->treelog_bg;
+               spin_unlock(&fs_info->treelog_bg_lock);
+       } else if (ffe_ctl->for_data_reloc) {
+               spin_lock(&fs_info->relocation_bg_lock);
+               if (fs_info->data_reloc_bg)
+                       ffe_ctl->hint_byte = fs_info->data_reloc_bg;
+               spin_unlock(&fs_info->relocation_bg_lock);
+       } else if (ffe_ctl->flags & BTRFS_BLOCK_GROUP_DATA) {
+               struct btrfs_block_group *block_group;
+
+               spin_lock(&fs_info->zone_active_bgs_lock);
+               list_for_each_entry(block_group, &fs_info->zone_active_bgs, active_bg_list) {
+                       /*
+                        * No lock is OK here because avail is monotonically
+                        * decreasing, and this is just a hint.
+                        */
+                       u64 avail = block_group->zone_capacity - block_group->alloc_offset;
+
+                       if (block_group_bits(block_group, ffe_ctl->flags) &&
+                           avail >= ffe_ctl->num_bytes) {
+                               ffe_ctl->hint_byte = block_group->start;
+                               break;
+                       }
+               }
+               spin_unlock(&fs_info->zone_active_bgs_lock);
+       }
+
+       return 0;
+}
+
 static int prepare_allocation(struct btrfs_fs_info *fs_info,
                              struct find_free_extent_ctl *ffe_ctl,
                              struct btrfs_space_info *space_info,
@@ -4308,19 +4345,7 @@ static int prepare_allocation(struct btrfs_fs_info *fs_info,
                return prepare_allocation_clustered(fs_info, ffe_ctl,
                                                    space_info, ins);
        case BTRFS_EXTENT_ALLOC_ZONED:
-               if (ffe_ctl->for_treelog) {
-                       spin_lock(&fs_info->treelog_bg_lock);
-                       if (fs_info->treelog_bg)
-                               ffe_ctl->hint_byte = fs_info->treelog_bg;
-                       spin_unlock(&fs_info->treelog_bg_lock);
-               }
-               if (ffe_ctl->for_data_reloc) {
-                       spin_lock(&fs_info->relocation_bg_lock);
-                       if (fs_info->data_reloc_bg)
-                               ffe_ctl->hint_byte = fs_info->data_reloc_bg;
-                       spin_unlock(&fs_info->relocation_bg_lock);
-               }
-               return 0;
+               return prepare_allocation_zoned(fs_info, ffe_ctl);
        default:
                BUG();
        }
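
For zoned filesystems, data allocations now also get a hint: scan the active block groups and point the allocator at the first one whose remaining zone capacity fits the request. A compact userspace model of that scan; the block-group numbers below are invented for illustration:

#include <stdint.h>
#include <stdio.h>

struct bg {
        uint64_t start;
        uint64_t zone_capacity;
        uint64_t alloc_offset;
};

/* Return a start-offset hint, or 0 if no active group has enough room. */
static uint64_t hint_for(const struct bg *bgs, int nr, uint64_t num_bytes)
{
        for (int i = 0; i < nr; i++) {
                /* avail only ever shrinks, so a stale read is still a hint */
                uint64_t avail = bgs[i].zone_capacity - bgs[i].alloc_offset;

                if (avail >= num_bytes)
                        return bgs[i].start;
        }
        return 0;
}

int main(void)
{
        struct bg bgs[] = {
                { .start = 1 << 20, .zone_capacity = 256 << 10, .alloc_offset = 200 << 10 },
                { .start = 2 << 20, .zone_capacity = 256 << 10, .alloc_offset =  64 << 10 },
        };

        printf("hint: %llu\n",
               (unsigned long long)hint_for(bgs, 2, 128 << 10));
        return 0;
}
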
index cfd2967f04a293cf3d38956e9e21ce9e6656b498..8b4bef05e22217cfe43af497060889e0e5b02d0a 100644 (file)
@@ -2480,6 +2480,7 @@ static int emit_fiemap_extent(struct fiemap_extent_info *fieinfo,
                                struct fiemap_cache *cache,
                                u64 offset, u64 phys, u64 len, u32 flags)
 {
+       u64 cache_end;
        int ret = 0;
 
        /* Set at the end of extent_fiemap(). */
@@ -2489,15 +2490,102 @@ static int emit_fiemap_extent(struct fiemap_extent_info *fieinfo,
                goto assign;
 
        /*
-        * Sanity check, extent_fiemap() should have ensured that new
-        * fiemap extent won't overlap with cached one.
-        * Not recoverable.
+        * When iterating the extents of the inode, at extent_fiemap(), we may
+        * find an extent that starts at an offset before the end offset of the
+        * previous extent we processed. This happens if fiemap is called
+        * without FIEMAP_FLAG_SYNC and there are ordered extents completing
+        * while we call btrfs_next_leaf() (through fiemap_next_leaf_item()).
         *
-        * NOTE: Physical address can overlap, due to compression
+        * For example we are in leaf X processing its last item, which is the
+        * file extent item for file range [512K, 1M[, and after
+        * btrfs_next_leaf() releases the path, there's an ordered extent that
+        * completes for the file range [768K, 2M[, and that results in trimming
+        * the file extent item so that it now corresponds to the file range
+        * [512K, 768K[ and a new file extent item is inserted for the file
+        * range [768K, 2M[, which may end up as the last item of leaf X or as
+        * the first item of the next leaf - in either case btrfs_next_leaf()
+        * will leave us with a path pointing to the new extent item, for the
+        * file range [768K, 2M[, since that's the first key that follows the
+        * last one we processed. So in order not to report overlapping extents
+        * to user space, we trim the length of the previously cached extent and
+        * emit it.
+        *
+        * Upon calling btrfs_next_leaf() we may also find an extent with an
+        * offset smaller than or equal to cache->offset, and this happens
+        * when we had a hole or prealloc extent with several delalloc ranges in
+        * it, but after btrfs_next_leaf() released the path, delalloc was
+        * flushed and the resulting ordered extents were completed, so we can
+        * now have found a file extent item for an offset that is smaller than
+        * or equal to what we have in cache->offset. We deal with this as
+        * described below.
         */
-       if (cache->offset + cache->len > offset) {
-               WARN_ON(1);
-               return -EINVAL;
+       cache_end = cache->offset + cache->len;
+       if (cache_end > offset) {
+               if (offset == cache->offset) {
+                       /*
+                        * We cached a delalloc range (found in the io tree) for
+                        * a hole or prealloc extent and we have now found a
+                        * file extent item for the same offset. What we have
+                        * now is more recent and up to date, so discard what
+                        * we had in the cache and use what we have just found.
+                        */
+                       goto assign;
+               } else if (offset > cache->offset) {
+                       /*
+                        * The extent range we previously found ends after the
+                        * offset of the file extent item we found and that
+                        * offset falls somewhere in the middle of that previous
+                        * extent range. So adjust the range we previously found
+                        * to end at the offset of the file extent item we have
+                        * just found, since this extent is more up to date.
+                        * Emit that adjusted range and cache the file extent
+                        * item we have just found. This corresponds to the case
+                        * where a previously found file extent item was split
+                        * due to an ordered extent completing.
+                        */
+                       cache->len = offset - cache->offset;
+                       goto emit;
+               } else {
+                       const u64 range_end = offset + len;
+
+                       /*
+                        * The offset of the file extent item we have just found
+                        * precedes the cached offset. This means we were
+                        * processing a hole or prealloc extent for which we
+                        * have found delalloc ranges (in the io tree), so what
+                        * we have in the cache is the last delalloc range we
+                        * found, while the file extent item we found can cover
+                        * either a whole delalloc range we previously emitted
+                        * or only a part of that range.
+                        *
+                        * We have two cases here:
+                        *
+                        * 1) The file extent item's range ends at or before the
+                        *    cached extent's end. In this case just ignore the
+                        *    current file extent item because we don't want to
+                        *    overlap with previous ranges that may have been
+                        *    emitted already;
+                        *
+                        * 2) The file extent item starts before the currently
+                        *    cached extent but its end offset goes beyond the
+                        *    end offset of the cached extent. We don't want to
+                        *    overlap with a previous range that may have been
+                        *    emitted already, so we emit the currently cached
+                        *    extent and then partially store the current file
+                        *    extent item's range in the cache, for the subrange
+                        *    going from the cached extent's end to the end of
+                        *    the file extent item.
+                        */
+                       if (range_end <= cache_end)
+                               return 0;
+
+                       if (!(flags & (FIEMAP_EXTENT_ENCODED | FIEMAP_EXTENT_DELALLOC)))
+                               phys += cache_end - offset;
+
+                       offset = cache_end;
+                       len = range_end - cache_end;
+                       goto emit;
+               }
        }
 
        /*
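
The rewritten emit_fiemap_extent() resolves an overlap between the cached extent and a newly found one in three ways, depending on where the new offset lands relative to the cache. A self-contained model of that decision; it ignores the FIEMAP flag handling (the kernel skips the phys adjustment for encoded and delalloc extents) and reuses the 512K/768K example from the comment above:

#include <stdint.h>
#include <stdio.h>

struct ext { uint64_t off, phys, len; };

/*
 * Resolve a new extent @n against the cached extent @c.
 * Returns: 0 cache @n as-is (no overlap, or same offset: newer data wins),
 *          1 trim @c, emit it, then cache @n,
 *          2 drop @n entirely (already covered),
 *          3 clip @n so it starts at the cached end, then emit.
 */
static int resolve(struct ext *c, struct ext *n)
{
        uint64_t cache_end = c->off + c->len;

        if (cache_end <= n->off)
                return 0;                       /* no overlap */
        if (n->off == c->off)
                return 0;                       /* newer data replaces cache */
        if (n->off > c->off) {
                c->len = n->off - c->off;       /* trim the cached extent */
                return 1;
        }
        if (n->off + n->len <= cache_end)
                return 2;                       /* fully covered, ignore */
        /* Keep only the part past the cached end (phys kept in step). */
        n->phys += cache_end - n->off;
        n->len = (n->off + n->len) - cache_end;
        n->off = cache_end;
        return 3;
}

int main(void)
{
        struct ext c = { .off = 512 << 10, .phys = 0, .len = 512 << 10 };
        struct ext n = { .off = 768 << 10, .phys = 0, .len = 1280 << 10 };
        int ret = resolve(&c, &n);

        printf("case %d, cached len now %llu KiB\n",
               ret, (unsigned long long)(c.len >> 10));
        return 0;
}
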
@@ -2517,6 +2605,7 @@ static int emit_fiemap_extent(struct fiemap_extent_info *fieinfo,
                return 0;
        }
 
+emit:
        /* Not mergeable, need to submit cached one */
        ret = fiemap_fill_next_extent(fieinfo, cache->offset, cache->phys,
                                      cache->len, cache->flags);
@@ -2689,16 +2778,34 @@ static int fiemap_process_hole(struct btrfs_inode *inode,
         * it beyond i_size.
         */
        while (cur_offset < end && cur_offset < i_size) {
+               struct extent_state *cached_state = NULL;
                u64 delalloc_start;
                u64 delalloc_end;
                u64 prealloc_start;
+               u64 lockstart;
+               u64 lockend;
                u64 prealloc_len = 0;
                bool delalloc;
 
+               lockstart = round_down(cur_offset, inode->root->fs_info->sectorsize);
+               lockend = round_up(end, inode->root->fs_info->sectorsize);
+
+               /*
+                * We are only locking for the delalloc range because that's the
+                * only thing that can change here.  With fiemap we have a lock
+                * on the inode, so no buffered or direct writes can happen.
+                *
+                * However mmaps and normal page writeback will cause this to
+                * change arbitrarily.  We have to lock the extent lock here to
+                * make sure that nobody messes with the tree while we're doing
+                * btrfs_find_delalloc_in_range.
+                */
+               lock_extent(&inode->io_tree, lockstart, lockend, &cached_state);
                delalloc = btrfs_find_delalloc_in_range(inode, cur_offset, end,
                                                        delalloc_cached_state,
                                                        &delalloc_start,
                                                        &delalloc_end);
+               unlock_extent(&inode->io_tree, lockstart, lockend, &cached_state);
                if (!delalloc)
                        break;
 
@@ -2866,15 +2973,15 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
                  u64 start, u64 len)
 {
        const u64 ino = btrfs_ino(inode);
-       struct extent_state *cached_state = NULL;
        struct extent_state *delalloc_cached_state = NULL;
        struct btrfs_path *path;
        struct fiemap_cache cache = { 0 };
        struct btrfs_backref_share_check_ctx *backref_ctx;
        u64 last_extent_end;
        u64 prev_extent_end;
-       u64 lockstart;
-       u64 lockend;
+       u64 range_start;
+       u64 range_end;
+       const u64 sectorsize = inode->root->fs_info->sectorsize;
        bool stopped = false;
        int ret;
 
@@ -2885,22 +2992,19 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
                goto out;
        }
 
-       lockstart = round_down(start, inode->root->fs_info->sectorsize);
-       lockend = round_up(start + len, inode->root->fs_info->sectorsize);
-       prev_extent_end = lockstart;
-
-       btrfs_inode_lock(inode, BTRFS_ILOCK_SHARED);
-       lock_extent(&inode->io_tree, lockstart, lockend, &cached_state);
+       range_start = round_down(start, sectorsize);
+       range_end = round_up(start + len, sectorsize);
+       prev_extent_end = range_start;
 
        ret = fiemap_find_last_extent_offset(inode, path, &last_extent_end);
        if (ret < 0)
-               goto out_unlock;
+               goto out;
        btrfs_release_path(path);
 
        path->reada = READA_FORWARD;
-       ret = fiemap_search_slot(inode, path, lockstart);
+       ret = fiemap_search_slot(inode, path, range_start);
        if (ret < 0) {
-               goto out_unlock;
+               goto out;
        } else if (ret > 0) {
                /*
                 * No file extent item found, but we may have delalloc between
@@ -2910,7 +3014,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
                goto check_eof_delalloc;
        }
 
-       while (prev_extent_end < lockend) {
+       while (prev_extent_end < range_end) {
                struct extent_buffer *leaf = path->nodes[0];
                struct btrfs_file_extent_item *ei;
                struct btrfs_key key;
@@ -2933,21 +3037,21 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
                 * The first iteration can leave us at an extent item that ends
                 * before our range's start. Move to the next item.
                 */
-               if (extent_end <= lockstart)
+               if (extent_end <= range_start)
                        goto next_item;
 
                backref_ctx->curr_leaf_bytenr = leaf->start;
 
                /* We have an implicit hole (NO_HOLES feature enabled). */
                if (prev_extent_end < key.offset) {
-                       const u64 range_end = min(key.offset, lockend) - 1;
+                       const u64 hole_end = min(key.offset, range_end) - 1;
 
                        ret = fiemap_process_hole(inode, fieinfo, &cache,
                                                  &delalloc_cached_state,
                                                  backref_ctx, 0, 0, 0,
-                                                 prev_extent_end, range_end);
+                                                 prev_extent_end, hole_end);
                        if (ret < 0) {
-                               goto out_unlock;
+                               goto out;
                        } else if (ret > 0) {
                                /* fiemap_fill_next_extent() told us to stop. */
                                stopped = true;
@@ -2955,7 +3059,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
                        }
 
                        /* We've reached the end of the fiemap range, stop. */
-                       if (key.offset >= lockend) {
+                       if (key.offset >= range_end) {
                                stopped = true;
                                break;
                        }
@@ -3003,7 +3107,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
                                                                  extent_gen,
                                                                  backref_ctx);
                                if (ret < 0)
-                                       goto out_unlock;
+                                       goto out;
                                else if (ret > 0)
                                        flags |= FIEMAP_EXTENT_SHARED;
                        }
@@ -3014,7 +3118,7 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
                }
 
                if (ret < 0) {
-                       goto out_unlock;
+                       goto out;
                } else if (ret > 0) {
                        /* fiemap_fill_next_extent() told us to stop. */
                        stopped = true;
@@ -3025,12 +3129,12 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
 next_item:
                if (fatal_signal_pending(current)) {
                        ret = -EINTR;
-                       goto out_unlock;
+                       goto out;
                }
 
                ret = fiemap_next_leaf_item(inode, path);
                if (ret < 0) {
-                       goto out_unlock;
+                       goto out;
                } else if (ret > 0) {
                        /* No more file extent items for this inode. */
                        break;
@@ -3049,29 +3153,41 @@ check_eof_delalloc:
        btrfs_free_path(path);
        path = NULL;
 
-       if (!stopped && prev_extent_end < lockend) {
+       if (!stopped && prev_extent_end < range_end) {
                ret = fiemap_process_hole(inode, fieinfo, &cache,
                                          &delalloc_cached_state, backref_ctx,
-                                         0, 0, 0, prev_extent_end, lockend - 1);
+                                         0, 0, 0, prev_extent_end, range_end - 1);
                if (ret < 0)
-                       goto out_unlock;
-               prev_extent_end = lockend;
+                       goto out;
+               prev_extent_end = range_end;
        }
 
        if (cache.cached && cache.offset + cache.len >= last_extent_end) {
                const u64 i_size = i_size_read(&inode->vfs_inode);
 
                if (prev_extent_end < i_size) {
+                       struct extent_state *cached_state = NULL;
                        u64 delalloc_start;
                        u64 delalloc_end;
+                       u64 lockstart;
+                       u64 lockend;
                        bool delalloc;
 
+                       lockstart = round_down(prev_extent_end, sectorsize);
+                       lockend = round_up(i_size, sectorsize);
+
+                       /*
+                        * See the comment in fiemap_process_hole as to why
+                        * we're doing the locking here.
+                        */
+                       lock_extent(&inode->io_tree, lockstart, lockend, &cached_state);
                        delalloc = btrfs_find_delalloc_in_range(inode,
                                                                prev_extent_end,
                                                                i_size - 1,
                                                                &delalloc_cached_state,
                                                                &delalloc_start,
                                                                &delalloc_end);
+                       unlock_extent(&inode->io_tree, lockstart, lockend, &cached_state);
                        if (!delalloc)
                                cache.flags |= FIEMAP_EXTENT_LAST;
                } else {
@@ -3080,10 +3196,6 @@ check_eof_delalloc:
        }
 
        ret = emit_last_fiemap_cache(fieinfo, &cache);
-
-out_unlock:
-       unlock_extent(&inode->io_tree, lockstart, lockend, &cached_state);
-       btrfs_inode_unlock(inode, BTRFS_ILOCK_SHARED);
 out:
        free_extent_state(delalloc_cached_state);
        btrfs_free_backref_share_ctx(backref_ctx);
index 809b11472a806c92ef9ad4454d354a9460a51b7b..4795738d5785bce730fad21b68a00ff729b97915 100644 (file)
@@ -3184,8 +3184,23 @@ out:
                        unwritten_start += logical_len;
                clear_extent_uptodate(io_tree, unwritten_start, end, NULL);
 
-               /* Drop extent maps for the part of the extent we didn't write. */
-               btrfs_drop_extent_map_range(inode, unwritten_start, end, false);
+               /*
+                * Drop extent maps for the part of the extent we didn't write.
+                *
+                * We have an exception here for the free_space_inode, this is
+                * because when we do btrfs_get_extent() on the free space inode
+                * we will search the commit root.  If this is a new block group
+                * we won't find anything, and we will trip over the assert in
+                * writepage where we do ASSERT(em->block_start !=
+                * EXTENT_MAP_HOLE).
+                *
+                * Theoretically we could also skip this for any NOCOW extent as
+                * we don't mess with the extent map tree in the NOCOW case, but
+                * for now simply skip this if we are the free space inode.
+                */
+               if (!btrfs_is_free_space_inode(inode))
+                       btrfs_drop_extent_map_range(inode, unwritten_start,
+                                                   end, false);
 
                /*
                 * If the ordered extent had an IOERR or something else went
@@ -4458,6 +4473,8 @@ int btrfs_delete_subvolume(struct btrfs_inode *dir, struct dentry *dentry)
        u64 root_flags;
        int ret;
 
+       down_write(&fs_info->subvol_sem);
+
        /*
         * Don't allow to delete a subvolume with send in progress. This is
         * inside the inode lock so the error handling that has to drop the bit
@@ -4469,25 +4486,25 @@ int btrfs_delete_subvolume(struct btrfs_inode *dir, struct dentry *dentry)
                btrfs_warn(fs_info,
                           "attempt to delete subvolume %llu during send",
                           dest->root_key.objectid);
-               return -EPERM;
+               ret = -EPERM;
+               goto out_up_write;
        }
        if (atomic_read(&dest->nr_swapfiles)) {
                spin_unlock(&dest->root_item_lock);
                btrfs_warn(fs_info,
                           "attempt to delete subvolume %llu with active swapfile",
                           root->root_key.objectid);
-               return -EPERM;
+               ret = -EPERM;
+               goto out_up_write;
        }
        root_flags = btrfs_root_flags(&dest->root_item);
        btrfs_set_root_flags(&dest->root_item,
                             root_flags | BTRFS_ROOT_SUBVOL_DEAD);
        spin_unlock(&dest->root_item_lock);
 
-       down_write(&fs_info->subvol_sem);
-
        ret = may_destroy_subvol(dest);
        if (ret)
-               goto out_up_write;
+               goto out_undead;
 
        btrfs_init_block_rsv(&block_rsv, BTRFS_BLOCK_RSV_TEMP);
        /*
@@ -4497,7 +4514,7 @@ int btrfs_delete_subvolume(struct btrfs_inode *dir, struct dentry *dentry)
         */
        ret = btrfs_subvolume_reserve_metadata(root, &block_rsv, 5, true);
        if (ret)
-               goto out_up_write;
+               goto out_undead;
 
        trans = btrfs_start_transaction(root, 0);
        if (IS_ERR(trans)) {
@@ -4563,15 +4580,17 @@ out_end_trans:
        inode->i_flags |= S_DEAD;
 out_release:
        btrfs_subvolume_release_metadata(root, &block_rsv);
-out_up_write:
-       up_write(&fs_info->subvol_sem);
+out_undead:
        if (ret) {
                spin_lock(&dest->root_item_lock);
                root_flags = btrfs_root_flags(&dest->root_item);
                btrfs_set_root_flags(&dest->root_item,
                                root_flags & ~BTRFS_ROOT_SUBVOL_DEAD);
                spin_unlock(&dest->root_item_lock);
-       } else {
+       }
+out_up_write:
+       up_write(&fs_info->subvol_sem);
+       if (!ret) {
                d_invalidate(dentry);
                btrfs_prune_dentries(dest);
                ASSERT(dest->send_in_progress == 0);
@@ -7816,6 +7835,7 @@ struct iomap_dio *btrfs_dio_write(struct kiocb *iocb, struct iov_iter *iter,
 static int btrfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
                        u64 start, u64 len)
 {
+       struct btrfs_inode *btrfs_inode = BTRFS_I(inode);
        int     ret;
 
        ret = fiemap_prep(inode, fieinfo, start, &len, 0);
@@ -7841,7 +7861,26 @@ static int btrfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
                        return ret;
        }
 
-       return extent_fiemap(BTRFS_I(inode), fieinfo, start, len);
+       btrfs_inode_lock(btrfs_inode, BTRFS_ILOCK_SHARED);
+
+       /*
+        * We did an initial flush to avoid holding the inode's lock while
+        * triggering writeback and waiting for the completion of IO and ordered
+        * extents. Now after we locked the inode we do it again, because it's
+        * possible a new write may have happened in between those two steps.
+        */
+       if (fieinfo->fi_flags & FIEMAP_FLAG_SYNC) {
+               ret = btrfs_wait_ordered_range(inode, 0, LLONG_MAX);
+               if (ret) {
+                       btrfs_inode_unlock(btrfs_inode, BTRFS_ILOCK_SHARED);
+                       return ret;
+               }
+       }
+
+       ret = extent_fiemap(btrfs_inode, fieinfo, start, len);
+       btrfs_inode_unlock(btrfs_inode, BTRFS_ILOCK_SHARED);
+
+       return ret;
 }
 
 static int btrfs_writepages(struct address_space *mapping,
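
The flush is done twice on FIEMAP_FLAG_SYNC by design: once before taking the inode lock, so the expensive writeback wait happens unlocked, and once under the lock to catch writes that raced in between. A generic sketch of that flush-lock-flush pattern with a pthread mutex; sync_state() is a hypothetical stand-in for the ordered-extent wait:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for "flush and wait for ordered extents". */
static int sync_state(const char *when)
{
        printf("flush (%s)\n", when);
        return 0;
}

static int query(void)
{
        int ret;

        /* First pass: do the expensive wait without holding the lock. */
        ret = sync_state("unlocked");
        if (ret)
                return ret;

        pthread_mutex_lock(&lock);
        /* Second pass: catch anything that raced in before we locked. */
        ret = sync_state("locked");
        if (ret == 0) {
                /* ... run the actual query under the lock ... */
        }
        pthread_mutex_unlock(&lock);
        return ret;
}

int main(void)
{
        return query();
}
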
@@ -10269,6 +10308,13 @@ ssize_t btrfs_do_encoded_write(struct kiocb *iocb, struct iov_iter *from,
        if (encoded->encryption != BTRFS_ENCODED_IO_ENCRYPTION_NONE)
                return -EINVAL;
 
+       /*
+        * Compressed extents should always have checksums, so error out if we
+        * have a NOCOW file or the inode was created while mounted with NODATASUM.
+        */
+       if (inode->flags & BTRFS_INODE_NODATASUM)
+               return -EINVAL;
+
        orig_count = iov_iter_count(from);
 
        /* The extent size must be sane. */
index 41b479861b3c767bb582920db56ea442c8f7f381..9876ee27f069e693f47c74170f5c54167b8c0268 100644 (file)
@@ -721,7 +721,7 @@ static noinline int create_subvol(struct mnt_idmap *idmap,
        free_extent_buffer(leaf);
        leaf = NULL;
 
-       new_root = btrfs_get_new_fs_root(fs_info, objectid, anon_dev);
+       new_root = btrfs_get_new_fs_root(fs_info, objectid, &anon_dev);
        if (IS_ERR(new_root)) {
                ret = PTR_ERR(new_root);
                btrfs_abort_transaction(trans, ret);
@@ -790,6 +790,9 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
                return -EOPNOTSUPP;
        }
 
+       if (btrfs_root_refs(&root->root_item) == 0)
+               return -ENOENT;
+
        if (!test_bit(BTRFS_ROOT_SHAREABLE, &root->state))
                return -EINVAL;
 
@@ -2608,6 +2611,10 @@ static int btrfs_ioctl_defrag(struct file *file, void __user *argp)
                                ret = -EFAULT;
                                goto out;
                        }
+                       if (range.flags & ~BTRFS_DEFRAG_RANGE_FLAGS_SUPP) {
+                               ret = -EOPNOTSUPP;
+                               goto out;
+                       }
                        /* compression requires us to start the IO */
                        if ((range.flags & BTRFS_DEFRAG_RANGE_COMPRESS)) {
                                range.flags |= BTRFS_DEFRAG_RANGE_START_IO;
@@ -2691,7 +2698,7 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
        struct inode *inode = file_inode(file);
        struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
        struct btrfs_ioctl_vol_args_v2 *vol_args;
-       struct bdev_handle *bdev_handle = NULL;
+       struct file *bdev_file = NULL;
        int ret;
        bool cancel = false;
 
@@ -2728,7 +2735,7 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
                goto err_drop;
 
        /* Exclusive operation is now claimed */
-       ret = btrfs_rm_device(fs_info, &args, &bdev_handle);
+       ret = btrfs_rm_device(fs_info, &args, &bdev_file);
 
        btrfs_exclop_finish(fs_info);
 
@@ -2742,8 +2749,8 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
        }
 err_drop:
        mnt_drop_write_file(file);
-       if (bdev_handle)
-               bdev_release(bdev_handle);
+       if (bdev_file)
+               fput(bdev_file);
 out:
        btrfs_put_dev_args_from_path(&args);
        kfree(vol_args);
@@ -2756,7 +2763,7 @@ static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg)
        struct inode *inode = file_inode(file);
        struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
        struct btrfs_ioctl_vol_args *vol_args;
-       struct bdev_handle *bdev_handle = NULL;
+       struct file *bdev_file = NULL;
        int ret;
        bool cancel = false;
 
@@ -2783,15 +2790,15 @@ static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg)
        ret = exclop_start_or_cancel_reloc(fs_info, BTRFS_EXCLOP_DEV_REMOVE,
                                           cancel);
        if (ret == 0) {
-               ret = btrfs_rm_device(fs_info, &args, &bdev_handle);
+               ret = btrfs_rm_device(fs_info, &args, &bdev_file);
                if (!ret)
                        btrfs_info(fs_info, "disk deleted %s", vol_args->name);
                btrfs_exclop_finish(fs_info);
        }
 
        mnt_drop_write_file(file);
-       if (bdev_handle)
-               bdev_release(bdev_handle);
+       if (bdev_file)
+               fput(bdev_file);
 out:
        btrfs_put_dev_args_from_path(&args);
        kfree(vol_args);
@@ -3808,6 +3815,11 @@ static long btrfs_ioctl_qgroup_create(struct file *file, void __user *arg)
                goto out;
        }
 
+       if (sa->create && is_fstree(sa->qgroupid)) {
+               ret = -EINVAL;
+               goto out;
+       }
+
        trans = btrfs_join_transaction(root);
        if (IS_ERR(trans)) {
                ret = PTR_ERR(trans);
index 1131d5a29d612ee50e14c488b1812a0657c259f1..e43bc0fdc74ec9b0224568928b31e0ca10c77805 100644 (file)
@@ -425,16 +425,16 @@ int lzo_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
 }
 
 int lzo_decompress(struct list_head *ws, const u8 *data_in,
-               struct page *dest_page, unsigned long start_byte, size_t srclen,
+               struct page *dest_page, unsigned long dest_pgoff, size_t srclen,
                size_t destlen)
 {
        struct workspace *workspace = list_entry(ws, struct workspace, list);
+       struct btrfs_fs_info *fs_info = btrfs_sb(dest_page->mapping->host->i_sb);
+       const u32 sectorsize = fs_info->sectorsize;
        size_t in_len;
        size_t out_len;
        size_t max_segment_len = WORKSPACE_BUF_LENGTH;
        int ret = 0;
-       char *kaddr;
-       unsigned long bytes;
 
        if (srclen < LZO_LEN || srclen > max_segment_len + LZO_LEN * 2)
                return -EUCLEAN;
@@ -451,7 +451,7 @@ int lzo_decompress(struct list_head *ws, const u8 *data_in,
        }
        data_in += LZO_LEN;
 
-       out_len = PAGE_SIZE;
+       out_len = sectorsize;
        ret = lzo1x_decompress_safe(data_in, in_len, workspace->buf, &out_len);
        if (ret != LZO_E_OK) {
                pr_warn("BTRFS: decompress failed!\n");
@@ -459,29 +459,13 @@ int lzo_decompress(struct list_head *ws, const u8 *data_in,
                goto out;
        }
 
-       if (out_len < start_byte) {
+       ASSERT(out_len <= sectorsize);
+       memcpy_to_page(dest_page, dest_pgoff, workspace->buf, out_len);
+       /* Decompressed data ended early, treat that as an error. */
+       if (unlikely(out_len < destlen)) {
                ret = -EIO;
-               goto out;
+               memzero_page(dest_page, dest_pgoff + out_len, destlen - out_len);
        }
-
-       /*
-        * the caller is already checking against PAGE_SIZE, but lets
-        * move this check closer to the memcpy/memset
-        */
-       destlen = min_t(unsigned long, destlen, PAGE_SIZE);
-       bytes = min_t(unsigned long, destlen, out_len - start_byte);
-
-       kaddr = kmap_local_page(dest_page);
-       memcpy(kaddr, workspace->buf + start_byte, bytes);
-
-       /*
-        * btrfs_getblock is doing a zero on the tail of the page too,
-        * but this will cover anything missing from the decompressed
-        * data.
-        */
-       if (bytes < destlen)
-               memset(kaddr+bytes, 0, destlen-bytes);
-       kunmap_local(kaddr);
 out:
        return ret;
 }
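
The new tail handling copies whatever was decompressed and zero-fills the remainder of the destination, returning an error for the short case rather than leaving it to later page zeroing. A minimal model with plain buffers instead of pages; the defensive clamp is an assumption, since the kernel asserts that out_len fits a sector:

#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Copy @out_len decompressed bytes; zero-fill and fail if short. */
static int finish_sector(char *dest, size_t destlen,
                         const char *buf, size_t out_len)
{
        if (out_len > destlen)
                out_len = destlen;      /* defensive clamp, assumed */
        memcpy(dest, buf, out_len);
        if (out_len < destlen) {
                memset(dest + out_len, 0, destlen - out_len);
                return -EIO;            /* early end is an error */
        }
        return 0;
}

int main(void)
{
        char dest[8];

        printf("ret = %d\n", finish_sector(dest, sizeof(dest), "abc", 3));
        return 0;
}
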
index 63b426cc77989670e0f7890a1cb75a17348e96cb..5470e1cdf10c5348df676cd290bef45811a46019 100644 (file)
@@ -1736,6 +1736,15 @@ out:
        return ret;
 }
 
+static bool qgroup_has_usage(struct btrfs_qgroup *qgroup)
+{
+       return (qgroup->rfer > 0 || qgroup->rfer_cmpr > 0 ||
+               qgroup->excl > 0 || qgroup->excl_cmpr > 0 ||
+               qgroup->rsv.values[BTRFS_QGROUP_RSV_DATA] > 0 ||
+               qgroup->rsv.values[BTRFS_QGROUP_RSV_META_PREALLOC] > 0 ||
+               qgroup->rsv.values[BTRFS_QGROUP_RSV_META_PERTRANS] > 0);
+}
+
 int btrfs_remove_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid)
 {
        struct btrfs_fs_info *fs_info = trans->fs_info;
@@ -1755,6 +1764,11 @@ int btrfs_remove_qgroup(struct btrfs_trans_handle *trans, u64 qgroupid)
                goto out;
        }
 
+       if (is_fstree(qgroupid) && qgroup_has_usage(qgroup)) {
+               ret = -EBUSY;
+               goto out;
+       }
+
        /* Check if there are no children of this qgroup */
        if (!list_empty(&qgroup->members)) {
                ret = -EBUSY;
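
btrfs_remove_qgroup() now refuses to delete a qgroup that still tracks referenced/exclusive usage or holds reservations, mirroring the existing refusal for qgroups with children. A simplified model of the busy test (the kernel additionally restricts this to fs-tree qgroup ids):

#include <errno.h>
#include <stdio.h>

struct qgroup {
        unsigned long long rfer, excl;
        unsigned long long rsv[3];      /* data, meta-prealloc, meta-pertrans */
};

static int has_usage(const struct qgroup *qg)
{
        return qg->rfer > 0 || qg->excl > 0 ||
               qg->rsv[0] > 0 || qg->rsv[1] > 0 || qg->rsv[2] > 0;
}

static int remove_qgroup(const struct qgroup *qg)
{
        if (has_usage(qg))
                return -EBUSY;          /* still accounted or reserved: refuse */
        return 0;
}

int main(void)
{
        struct qgroup qg = { .rfer = 4096 };

        printf("remove: %d\n", remove_qgroup(&qg));
        return 0;
}
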
index 6486f0d7e9931b4fafbc03ddc5ddca0863679d7a..8c4fc98ca9ce7de055841a06e43863eeb6b960e0 100644 (file)
@@ -889,8 +889,10 @@ int btrfs_ref_tree_mod(struct btrfs_fs_info *fs_info,
 out_unlock:
        spin_unlock(&fs_info->ref_verify_lock);
 out:
-       if (ret)
+       if (ret) {
+               btrfs_free_ref_cache(fs_info);
                btrfs_clear_opt(fs_info->mount_opt, REF_VERIFY);
+       }
        return ret;
 }
 
@@ -1021,8 +1023,8 @@ int btrfs_build_ref_tree(struct btrfs_fs_info *fs_info)
                }
        }
        if (ret) {
-               btrfs_clear_opt(fs_info->mount_opt, REF_VERIFY);
                btrfs_free_ref_cache(fs_info);
+               btrfs_clear_opt(fs_info->mount_opt, REF_VERIFY);
        }
        btrfs_free_path(path);
        return ret;
index a01807cbd4d44e4127c798e470cef51d8bfa13e6..0123d272892373b3465c942e75e181d3bc77e681 100644 (file)
@@ -1098,12 +1098,22 @@ out:
 static void scrub_read_endio(struct btrfs_bio *bbio)
 {
        struct scrub_stripe *stripe = bbio->private;
+       struct bio_vec *bvec;
+       int sector_nr = calc_sector_number(stripe, bio_first_bvec_all(&bbio->bio));
+       int num_sectors;
+       u32 bio_size = 0;
+       int i;
+
+       ASSERT(sector_nr < stripe->nr_sectors);
+       bio_for_each_bvec_all(bvec, &bbio->bio, i)
+               bio_size += bvec->bv_len;
+       num_sectors = bio_size >> stripe->bg->fs_info->sectorsize_bits;
 
        if (bbio->bio.bi_status) {
-               bitmap_set(&stripe->io_error_bitmap, 0, stripe->nr_sectors);
-               bitmap_set(&stripe->error_bitmap, 0, stripe->nr_sectors);
+               bitmap_set(&stripe->io_error_bitmap, sector_nr, num_sectors);
+               bitmap_set(&stripe->error_bitmap, sector_nr, num_sectors);
        } else {
-               bitmap_clear(&stripe->io_error_bitmap, 0, stripe->nr_sectors);
+               bitmap_clear(&stripe->io_error_bitmap, sector_nr, num_sectors);
        }
        bio_put(&bbio->bio);
        if (atomic_dec_and_test(&stripe->pending_io)) {
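
scrub_read_endio() used to flip the error bitmaps for the whole stripe even when the bio covered only part of it; it now derives the affected sector range from the bio itself. A userspace model of that byte-length to bitmap-range conversion, assuming 4K sectors and an inlined bit helper:

#include <stdint.h>
#include <stdio.h>

#define SECTORSIZE_BITS 12      /* assumed 4K sectors */

static void bitmap_set_range(uint64_t *bitmap, int start, int count)
{
        for (int i = start; i < start + count; i++)
                *bitmap |= 1ULL << i;
}

int main(void)
{
        uint64_t io_error_bitmap = 0;
        int first_sector = 4;           /* from the bio's first bvec */
        uint32_t bio_size = 3 * 4096;   /* summed over all bvecs */
        int num_sectors = bio_size >> SECTORSIZE_BITS;

        /* Mark only the sectors this bio actually covered. */
        bitmap_set_range(&io_error_bitmap, first_sector, num_sectors);
        printf("error bitmap: %#llx\n",
               (unsigned long long)io_error_bitmap);
        return 0;
}
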
@@ -1636,6 +1646,9 @@ static void scrub_submit_extent_sector_read(struct scrub_ctx *sctx,
 {
        struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
        struct btrfs_bio *bbio = NULL;
+       unsigned int nr_sectors = min(BTRFS_STRIPE_LEN, stripe->bg->start +
+                                     stripe->bg->length - stripe->logical) >>
+                                 fs_info->sectorsize_bits;
        u64 stripe_len = BTRFS_STRIPE_LEN;
        int mirror = stripe->mirror_num;
        int i;
@@ -1646,6 +1659,10 @@ static void scrub_submit_extent_sector_read(struct scrub_ctx *sctx,
                struct page *page = scrub_stripe_get_page(stripe, i);
                unsigned int pgoff = scrub_stripe_get_page_offset(stripe, i);
 
+               /* We're beyond the chunk boundary, no need to read anymore. */
+               if (i >= nr_sectors)
+                       break;
+
                /* The current sector cannot be merged, submit the bio. */
                if (bbio &&
                    ((i > 0 &&
@@ -1701,6 +1718,9 @@ static void scrub_submit_initial_read(struct scrub_ctx *sctx,
 {
        struct btrfs_fs_info *fs_info = sctx->fs_info;
        struct btrfs_bio *bbio;
+       unsigned int nr_sectors = min(BTRFS_STRIPE_LEN, stripe->bg->start +
+                                     stripe->bg->length - stripe->logical) >>
+                                 fs_info->sectorsize_bits;
        int mirror = stripe->mirror_num;
 
        ASSERT(stripe->bg);
@@ -1715,14 +1735,16 @@ static void scrub_submit_initial_read(struct scrub_ctx *sctx,
        bbio = btrfs_bio_alloc(SCRUB_STRIPE_PAGES, REQ_OP_READ, fs_info,
                               scrub_read_endio, stripe);
 
-       /* Read the whole stripe. */
        bbio->bio.bi_iter.bi_sector = stripe->logical >> SECTOR_SHIFT;
-       for (int i = 0; i < BTRFS_STRIPE_LEN >> PAGE_SHIFT; i++) {
+       /* Read the whole range inside the chunk boundary. */
+       for (unsigned int cur = 0; cur < nr_sectors; cur++) {
+               struct page *page = scrub_stripe_get_page(stripe, cur);
+               unsigned int pgoff = scrub_stripe_get_page_offset(stripe, cur);
                int ret;
 
-               ret = bio_add_page(&bbio->bio, stripe->pages[i], PAGE_SIZE, 0);
+               ret = bio_add_page(&bbio->bio, page, fs_info->sectorsize, pgoff);
                /* We should have allocated enough bio vectors. */
-               ASSERT(ret == PAGE_SIZE);
+               ASSERT(ret == fs_info->sectorsize);
        }
        atomic_inc(&stripe->pending_io);
 
index 4e36550618e580044fb0b0d573ddfee196cdca5d..e48a063ef0851f9476fd37a00572c1dd6c6fe379 100644 (file)
@@ -6705,11 +6705,20 @@ static int finish_inode_if_needed(struct send_ctx *sctx, int at_end)
                                if (ret)
                                        goto out;
                        }
-                       if (sctx->cur_inode_last_extent <
-                           sctx->cur_inode_size) {
-                               ret = send_hole(sctx, sctx->cur_inode_size);
-                               if (ret)
+                       if (sctx->cur_inode_last_extent < sctx->cur_inode_size) {
+                               ret = range_is_hole_in_parent(sctx,
+                                                     sctx->cur_inode_last_extent,
+                                                     sctx->cur_inode_size);
+                               if (ret < 0) {
                                        goto out;
+                               } else if (ret == 0) {
+                                       ret = send_hole(sctx, sctx->cur_inode_size);
+                                       if (ret < 0)
+                                               goto out;
+                               } else {
+                                       /* Range is already a hole, skip. */
+                                       ret = 0;
+                               }
                        }
                }
                if (need_truncate) {
@@ -8111,7 +8120,7 @@ long btrfs_ioctl_send(struct inode *inode, struct btrfs_ioctl_send_args *arg)
        }
 
        if (arg->flags & ~BTRFS_SEND_FLAG_MASK) {
-               ret = -EINVAL;
+               ret = -EOPNOTSUPP;
                goto out;
        }
 
@@ -8205,8 +8214,8 @@ long btrfs_ioctl_send(struct inode *inode, struct btrfs_ioctl_send_args *arg)
                goto out;
        }
 
-       sctx->clone_roots = kvcalloc(sizeof(*sctx->clone_roots),
-                                    arg->clone_sources_count + 1,
+       sctx->clone_roots = kvcalloc(arg->clone_sources_count + 1,
+                                    sizeof(*sctx->clone_roots),
                                     GFP_KERNEL);
        if (!sctx->clone_roots) {
                ret = -ENOMEM;
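
The kvcalloc() fix is purely about argument order: like calloc(), the element count comes first and the element size second, which is also what the allocator's overflow check expects. A userspace illustration with calloc(); the struct fields are made up:

#include <stdio.h>
#include <stdlib.h>

struct clone_root { void *root; long ino; };

int main(void)
{
        size_t count = 8;

        /* calloc(nmemb, size): element count first, element size second. */
        struct clone_root *roots = calloc(count + 1, sizeof(*roots));

        if (!roots)
                return 1;
        printf("allocated %zu zeroed entries\n", count + 1);
        free(roots);
        return 0;
}
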
index 571bb13587d5e7aabc1c40feab093af5257a6782..3b54eb5834746be51807d3e13023fba3f2701981 100644 (file)
@@ -856,7 +856,7 @@ btrfs_calc_reclaim_metadata_size(struct btrfs_fs_info *fs_info,
 static bool need_preemptive_reclaim(struct btrfs_fs_info *fs_info,
                                    struct btrfs_space_info *space_info)
 {
-       u64 global_rsv_size = fs_info->global_block_rsv.reserved;
+       const u64 global_rsv_size = btrfs_block_rsv_reserved(&fs_info->global_block_rsv);
        u64 ordered, delalloc;
        u64 thresh;
        u64 used;
@@ -956,8 +956,8 @@ static bool need_preemptive_reclaim(struct btrfs_fs_info *fs_info,
        ordered = percpu_counter_read_positive(&fs_info->ordered_bytes) >> 1;
        delalloc = percpu_counter_read_positive(&fs_info->delalloc_bytes);
        if (ordered >= delalloc)
-               used += fs_info->delayed_refs_rsv.reserved +
-                       fs_info->delayed_block_rsv.reserved;
+               used += btrfs_block_rsv_reserved(&fs_info->delayed_refs_rsv) +
+                       btrfs_block_rsv_reserved(&fs_info->delayed_block_rsv);
        else
                used += space_info->bytes_may_use - global_rsv_size;
 
@@ -1173,7 +1173,7 @@ static void btrfs_preempt_reclaim_metadata_space(struct work_struct *work)
                enum btrfs_flush_state flush;
                u64 delalloc_size = 0;
                u64 to_reclaim, block_rsv_size;
-               u64 global_rsv_size = global_rsv->reserved;
+               const u64 global_rsv_size = btrfs_block_rsv_reserved(global_rsv);
 
                loops++;
 
@@ -1185,9 +1185,9 @@ static void btrfs_preempt_reclaim_metadata_space(struct work_struct *work)
                 * assume it's tied up in delalloc reservations.
                 */
                block_rsv_size = global_rsv_size +
-                       delayed_block_rsv->reserved +
-                       delayed_refs_rsv->reserved +
-                       trans_rsv->reserved;
+                       btrfs_block_rsv_reserved(delayed_block_rsv) +
+                       btrfs_block_rsv_reserved(delayed_refs_rsv) +
+                       btrfs_block_rsv_reserved(trans_rsv);
                if (block_rsv_size < space_info->bytes_may_use)
                        delalloc_size = space_info->bytes_may_use - block_rsv_size;
 
@@ -1207,16 +1207,16 @@ static void btrfs_preempt_reclaim_metadata_space(struct work_struct *work)
                        to_reclaim = delalloc_size;
                        flush = FLUSH_DELALLOC;
                } else if (space_info->bytes_pinned >
-                          (delayed_block_rsv->reserved +
-                           delayed_refs_rsv->reserved)) {
+                          (btrfs_block_rsv_reserved(delayed_block_rsv) +
+                           btrfs_block_rsv_reserved(delayed_refs_rsv))) {
                        to_reclaim = space_info->bytes_pinned;
                        flush = COMMIT_TRANS;
-               } else if (delayed_block_rsv->reserved >
-                          delayed_refs_rsv->reserved) {
-                       to_reclaim = delayed_block_rsv->reserved;
+               } else if (btrfs_block_rsv_reserved(delayed_block_rsv) >
+                          btrfs_block_rsv_reserved(delayed_refs_rsv)) {
+                       to_reclaim = btrfs_block_rsv_reserved(delayed_block_rsv);
                        flush = FLUSH_DELAYED_ITEMS_NR;
                } else {
-                       to_reclaim = delayed_refs_rsv->reserved;
+                       to_reclaim = btrfs_block_rsv_reserved(delayed_refs_rsv);
                        flush = FLUSH_DELAYED_REFS_NR;
                }
 
index 93511d54abf8280bc6778a17b5fa75a28d3585c1..0e49dab8dad2480243f4d32e6ee934c0f2b35b67 100644 (file)
@@ -475,7 +475,8 @@ void btrfs_subpage_set_writeback(const struct btrfs_fs_info *fs_info,
 
        spin_lock_irqsave(&subpage->lock, flags);
        bitmap_set(subpage->bitmaps, start_bit, len >> fs_info->sectorsize_bits);
-       folio_start_writeback(folio);
+       if (!folio_test_writeback(folio))
+               folio_start_writeback(folio);
        spin_unlock_irqrestore(&subpage->lock, flags);
 }
 
index 896acfda17895150ff501960dd72f084c542301e..101f786963d4d7712baab28c912226fb741c0c9b 100644 (file)
@@ -1457,6 +1457,14 @@ static int btrfs_reconfigure(struct fs_context *fc)
 
        btrfs_info_to_ctx(fs_info, &old_ctx);
 
+       /*
+        * This is our "bind mount" trick: we don't want to allow the user to
+        * change anything other than ro/rw and the subvolume, so all of the
+        * other mount options must be maintained.
+        */
+       if (mount_reconfigure)
+               ctx->mount_opt = old_ctx.mount_opt;
+
        sync_filesystem(sb);
        set_bit(BTRFS_FS_STATE_REMOUNTING, &fs_info->fs_state);
 
index 5b3333ceef04818dbf98270da4bb84c99e5c70f8..bf8e64c766b63b4c8b424f4437791eaea12f24a2 100644 (file)
@@ -564,56 +564,22 @@ static int btrfs_reserve_trans_metadata(struct btrfs_fs_info *fs_info,
                                        u64 num_bytes,
                                        u64 *delayed_refs_bytes)
 {
-       struct btrfs_block_rsv *delayed_refs_rsv = &fs_info->delayed_refs_rsv;
        struct btrfs_space_info *si = fs_info->trans_block_rsv.space_info;
-       u64 extra_delayed_refs_bytes = 0;
-       u64 bytes;
+       u64 bytes = num_bytes + *delayed_refs_bytes;
        int ret;
 
-       /*
-        * If there's a gap between the size of the delayed refs reserve and
-        * its reserved space, than some tasks have added delayed refs or bumped
-        * its size otherwise (due to block group creation or removal, or block
-        * group item update). Also try to allocate that gap in order to prevent
-        * using (and possibly abusing) the global reserve when committing the
-        * transaction.
-        */
-       if (flush == BTRFS_RESERVE_FLUSH_ALL &&
-           !btrfs_block_rsv_full(delayed_refs_rsv)) {
-               spin_lock(&delayed_refs_rsv->lock);
-               if (delayed_refs_rsv->size > delayed_refs_rsv->reserved)
-                       extra_delayed_refs_bytes = delayed_refs_rsv->size -
-                               delayed_refs_rsv->reserved;
-               spin_unlock(&delayed_refs_rsv->lock);
-       }
-
-       bytes = num_bytes + *delayed_refs_bytes + extra_delayed_refs_bytes;
-
        /*
         * We want to reserve all the bytes we may need all at once, so we only
         * do 1 enospc flushing cycle per transaction start.
         */
        ret = btrfs_reserve_metadata_bytes(fs_info, si, bytes, flush);
-       if (ret == 0) {
-               if (extra_delayed_refs_bytes > 0)
-                       btrfs_migrate_to_delayed_refs_rsv(fs_info,
-                                                         extra_delayed_refs_bytes);
-               return 0;
-       }
-
-       if (extra_delayed_refs_bytes > 0) {
-               bytes -= extra_delayed_refs_bytes;
-               ret = btrfs_reserve_metadata_bytes(fs_info, si, bytes, flush);
-               if (ret == 0)
-                       return 0;
-       }
 
        /*
         * If we are an emergency flush, which can steal from the global block
         * reserve, then attempt to not reserve space for the delayed refs, as
         * we will consume space for them from the global block reserve.
         */
-       if (flush == BTRFS_RESERVE_FLUSH_ALL_STEAL) {
+       if (ret && flush == BTRFS_RESERVE_FLUSH_ALL_STEAL) {
                bytes -= *delayed_refs_bytes;
                *delayed_refs_bytes = 0;
                ret = btrfs_reserve_metadata_bytes(fs_info, si, bytes, flush);
@@ -1868,7 +1834,7 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
        }
 
        key.offset = (u64)-1;
-       pending->snap = btrfs_get_new_fs_root(fs_info, objectid, pending->anon_dev);
+       pending->snap = btrfs_get_new_fs_root(fs_info, objectid, &pending->anon_dev);
        if (IS_ERR(pending->snap)) {
                ret = PTR_ERR(pending->snap);
                pending->snap = NULL;
index 50fdc69fdddf9d26014a65ed73c13fe694d05e4b..6eccf8496486c0630cd85c90ca813170f08e6eb5 100644 (file)
@@ -1436,7 +1436,7 @@ static int check_extent_item(struct extent_buffer *leaf,
                if (unlikely(ptr + btrfs_extent_inline_ref_size(inline_type) > end)) {
                        extent_err(leaf, slot,
 "inline ref item overflows extent item, ptr %lu iref size %u end %lu",
-                                  ptr, inline_type, end);
+                                  ptr, btrfs_extent_inline_ref_size(inline_type), end);
                        return -EUCLEAN;
                }
 
index 4c32497311d2ff6ba28fc9ac5ba8dd5b8f835a66..e180da4cc227317370401651d495c303d5a79782 100644 (file)
@@ -468,39 +468,39 @@ static noinline struct btrfs_fs_devices *find_fsid(
 
 static int
 btrfs_get_bdev_and_sb(const char *device_path, blk_mode_t flags, void *holder,
-                     int flush, struct bdev_handle **bdev_handle,
+                     int flush, struct file **bdev_file,
                      struct btrfs_super_block **disk_super)
 {
        struct block_device *bdev;
        int ret;
 
-       *bdev_handle = bdev_open_by_path(device_path, flags, holder, NULL);
+       *bdev_file = bdev_file_open_by_path(device_path, flags, holder, NULL);
 
-       if (IS_ERR(*bdev_handle)) {
-               ret = PTR_ERR(*bdev_handle);
+       if (IS_ERR(*bdev_file)) {
+               ret = PTR_ERR(*bdev_file);
                goto error;
        }
-       bdev = (*bdev_handle)->bdev;
+       bdev = file_bdev(*bdev_file);
 
        if (flush)
                sync_blockdev(bdev);
        ret = set_blocksize(bdev, BTRFS_BDEV_BLOCKSIZE);
        if (ret) {
-               bdev_release(*bdev_handle);
+               fput(*bdev_file);
                goto error;
        }
        invalidate_bdev(bdev);
        *disk_super = btrfs_read_dev_super(bdev);
        if (IS_ERR(*disk_super)) {
                ret = PTR_ERR(*disk_super);
-               bdev_release(*bdev_handle);
+               fput(*bdev_file);
                goto error;
        }
 
        return 0;
 
 error:
-       *bdev_handle = NULL;
+       *bdev_file = NULL;
        return ret;
 }
 
@@ -643,7 +643,7 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
                        struct btrfs_device *device, blk_mode_t flags,
                        void *holder)
 {
-       struct bdev_handle *bdev_handle;
+       struct file *bdev_file;
        struct btrfs_super_block *disk_super;
        u64 devid;
        int ret;
@@ -654,7 +654,7 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
                return -EINVAL;
 
        ret = btrfs_get_bdev_and_sb(device->name->str, flags, holder, 1,
-                                   &bdev_handle, &disk_super);
+                                   &bdev_file, &disk_super);
        if (ret)
                return ret;
 
@@ -678,20 +678,20 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
                clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
                fs_devices->seeding = true;
        } else {
-               if (bdev_read_only(bdev_handle->bdev))
+               if (bdev_read_only(file_bdev(bdev_file)))
                        clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
                else
                        set_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
        }
 
-       if (!bdev_nonrot(bdev_handle->bdev))
+       if (!bdev_nonrot(file_bdev(bdev_file)))
                fs_devices->rotating = true;
 
-       if (bdev_max_discard_sectors(bdev_handle->bdev))
+       if (bdev_max_discard_sectors(file_bdev(bdev_file)))
                fs_devices->discardable = true;
 
-       device->bdev_handle = bdev_handle;
-       device->bdev = bdev_handle->bdev;
+       device->bdev_file = bdev_file;
+       device->bdev = file_bdev(bdev_file);
        clear_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
 
        fs_devices->open_devices++;
@@ -706,7 +706,7 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
 
 error_free_page:
        btrfs_release_disk_super(disk_super);
-       bdev_release(bdev_handle);
+       fput(bdev_file);
 
        return -EINVAL;
 }
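
This and the following hunks all follow the same tree-wide conversion: an opened block device is now represented by a plain struct file instead of struct bdev_handle. A condensed sketch of the new lifecycle, with path and holder as placeholders and error handling trimmed:

	struct file *bdev_file;
	struct block_device *bdev;

	bdev_file = bdev_file_open_by_path(path, BLK_OPEN_READ, holder, NULL);
	if (IS_ERR(bdev_file))
		return PTR_ERR(bdev_file);

	bdev = file_bdev(bdev_file);	/* borrow the underlying block_device */
	/* ... use bdev ... */
	fput(bdev_file);		/* close; replaces bdev_release() */
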
@@ -1015,10 +1015,10 @@ static void __btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices,
                if (device->devid == BTRFS_DEV_REPLACE_DEVID)
                        continue;
 
-               if (device->bdev_handle) {
-                       bdev_release(device->bdev_handle);
+               if (device->bdev_file) {
+                       fput(device->bdev_file);
                        device->bdev = NULL;
-                       device->bdev_handle = NULL;
+                       device->bdev_file = NULL;
                        fs_devices->open_devices--;
                }
                if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
@@ -1063,7 +1063,7 @@ static void btrfs_close_bdev(struct btrfs_device *device)
                invalidate_bdev(device->bdev);
        }
 
-       bdev_release(device->bdev_handle);
+       fput(device->bdev_file);
 }
 
 static void btrfs_close_one_device(struct btrfs_device *device)
@@ -1316,7 +1316,7 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags,
        struct btrfs_super_block *disk_super;
        bool new_device_added = false;
        struct btrfs_device *device = NULL;
-       struct bdev_handle *bdev_handle;
+       struct file *bdev_file;
        u64 bytenr, bytenr_orig;
        int ret;
 
@@ -1339,18 +1339,18 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags,
         * values temporarily, as the device paths of the fsid are the only
         * required information for assembling the volume.
         */
-       bdev_handle = bdev_open_by_path(path, flags, NULL, NULL);
-       if (IS_ERR(bdev_handle))
-               return ERR_CAST(bdev_handle);
+       bdev_file = bdev_file_open_by_path(path, flags, NULL, NULL);
+       if (IS_ERR(bdev_file))
+               return ERR_CAST(bdev_file);
 
        bytenr_orig = btrfs_sb_offset(0);
-       ret = btrfs_sb_log_location_bdev(bdev_handle->bdev, 0, READ, &bytenr);
+       ret = btrfs_sb_log_location_bdev(file_bdev(bdev_file), 0, READ, &bytenr);
        if (ret) {
                device = ERR_PTR(ret);
                goto error_bdev_put;
        }
 
-       disk_super = btrfs_read_disk_super(bdev_handle->bdev, bytenr,
+       disk_super = btrfs_read_disk_super(file_bdev(bdev_file), bytenr,
                                           bytenr_orig);
        if (IS_ERR(disk_super)) {
                device = ERR_CAST(disk_super);
@@ -1381,7 +1381,7 @@ free_disk_super:
        btrfs_release_disk_super(disk_super);
 
 error_bdev_put:
-       bdev_release(bdev_handle);
+       fput(bdev_file);
 
        return device;
 }
@@ -2057,7 +2057,7 @@ void btrfs_scratch_superblocks(struct btrfs_fs_info *fs_info,
 
 int btrfs_rm_device(struct btrfs_fs_info *fs_info,
                    struct btrfs_dev_lookup_args *args,
-                   struct bdev_handle **bdev_handle)
+                   struct file **bdev_file)
 {
        struct btrfs_trans_handle *trans;
        struct btrfs_device *device;
@@ -2166,7 +2166,7 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info,
 
        btrfs_assign_next_active_device(device, NULL);
 
-       if (device->bdev_handle) {
+       if (device->bdev_file) {
                cur_devices->open_devices--;
                /* remove sysfs entry */
                btrfs_sysfs_remove_device(device);
@@ -2182,9 +2182,9 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info,
         * free the device.
         *
         * We cannot call btrfs_close_bdev() here because we're holding the sb
-        * write lock, and bdev_release() will pull in the ->open_mutex on
-        * the block device and it's dependencies.  Instead just flush the
-        * device and let the caller do the final bdev_release.
+        * write lock, and fput() on the block device will pull in the
+        * ->open_mutex on the block device and its dependencies.  Instead
+        * just flush the device and let the caller do the final fput().
         */
        if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
                btrfs_scratch_superblocks(fs_info, device->bdev,
@@ -2195,7 +2195,7 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info,
                }
        }
 
-       *bdev_handle = device->bdev_handle;
+       *bdev_file = device->bdev_file;
        synchronize_rcu();
        btrfs_free_device(device);
 
@@ -2332,7 +2332,7 @@ int btrfs_get_dev_args_from_path(struct btrfs_fs_info *fs_info,
                                 const char *path)
 {
        struct btrfs_super_block *disk_super;
-       struct bdev_handle *bdev_handle;
+       struct file *bdev_file;
        int ret;
 
        if (!path || !path[0])
@@ -2350,7 +2350,7 @@ int btrfs_get_dev_args_from_path(struct btrfs_fs_info *fs_info,
        }
 
        ret = btrfs_get_bdev_and_sb(path, BLK_OPEN_READ, NULL, 0,
-                                   &bdev_handle, &disk_super);
+                                   &bdev_file, &disk_super);
        if (ret) {
                btrfs_put_dev_args_from_path(args);
                return ret;
@@ -2363,7 +2363,7 @@ int btrfs_get_dev_args_from_path(struct btrfs_fs_info *fs_info,
        else
                memcpy(args->fsid, disk_super->fsid, BTRFS_FSID_SIZE);
        btrfs_release_disk_super(disk_super);
-       bdev_release(bdev_handle);
+       fput(bdev_file);
        return 0;
 }
 
@@ -2583,7 +2583,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
        struct btrfs_root *root = fs_info->dev_root;
        struct btrfs_trans_handle *trans;
        struct btrfs_device *device;
-       struct bdev_handle *bdev_handle;
+       struct file *bdev_file;
        struct super_block *sb = fs_info->sb;
        struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
        struct btrfs_fs_devices *seed_devices = NULL;
@@ -2596,12 +2596,12 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
        if (sb_rdonly(sb) && !fs_devices->seeding)
                return -EROFS;
 
-       bdev_handle = bdev_open_by_path(device_path, BLK_OPEN_WRITE,
+       bdev_file = bdev_file_open_by_path(device_path, BLK_OPEN_WRITE,
                                        fs_info->bdev_holder, NULL);
-       if (IS_ERR(bdev_handle))
-               return PTR_ERR(bdev_handle);
+       if (IS_ERR(bdev_file))
+               return PTR_ERR(bdev_file);
 
-       if (!btrfs_check_device_zone_type(fs_info, bdev_handle->bdev)) {
+       if (!btrfs_check_device_zone_type(fs_info, file_bdev(bdev_file))) {
                ret = -EINVAL;
                goto error;
        }
@@ -2613,11 +2613,11 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
                locked = true;
        }
 
-       sync_blockdev(bdev_handle->bdev);
+       sync_blockdev(file_bdev(bdev_file));
 
        rcu_read_lock();
        list_for_each_entry_rcu(device, &fs_devices->devices, dev_list) {
-               if (device->bdev == bdev_handle->bdev) {
+               if (device->bdev == file_bdev(bdev_file)) {
                        ret = -EEXIST;
                        rcu_read_unlock();
                        goto error;
@@ -2633,8 +2633,8 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
        }
 
        device->fs_info = fs_info;
-       device->bdev_handle = bdev_handle;
-       device->bdev = bdev_handle->bdev;
+       device->bdev_file = bdev_file;
+       device->bdev = file_bdev(bdev_file);
        ret = lookup_bdev(device_path, &device->devt);
        if (ret)
                goto error_free_device;
@@ -2817,7 +2817,7 @@ error_free_zone:
 error_free_device:
        btrfs_free_device(device);
 error:
-       bdev_release(bdev_handle);
+       fput(bdev_file);
        if (locked) {
                mutex_unlock(&uuid_mutex);
                up_write(&sb->s_umount);
@@ -3087,7 +3087,6 @@ struct btrfs_chunk_map *btrfs_get_chunk_map(struct btrfs_fs_info *fs_info,
        map = btrfs_find_chunk_map(fs_info, logical, length);
 
        if (unlikely(!map)) {
-               read_unlock(&fs_info->mapping_tree_lock);
                btrfs_crit(fs_info,
                           "unable to find chunk map for logical %llu length %llu",
                           logical, length);
@@ -3095,7 +3094,6 @@ struct btrfs_chunk_map *btrfs_get_chunk_map(struct btrfs_fs_info *fs_info,
        }
 
        if (unlikely(map->start > logical || map->start + map->chunk_len <= logical)) {
-               read_unlock(&fs_info->mapping_tree_lock);
                btrfs_crit(fs_info,
                           "found a bad chunk map, wanted %llu-%llu, found %llu-%llu",
                           logical, logical + length, map->start,
index 53f87f398da779aba835c5d3cb65a090ac5a5698..a11854912d535fe60325d31b19879ef183392c04 100644 (file)
@@ -90,7 +90,7 @@ struct btrfs_device {
 
        u64 generation;
 
-       struct bdev_handle *bdev_handle;
+       struct file *bdev_file;
        struct block_device *bdev;
 
        struct btrfs_zoned_device_info *zone_info;
@@ -661,7 +661,7 @@ struct btrfs_device *btrfs_alloc_device(struct btrfs_fs_info *fs_info,
 void btrfs_put_dev_args_from_path(struct btrfs_dev_lookup_args *args);
 int btrfs_rm_device(struct btrfs_fs_info *fs_info,
                    struct btrfs_dev_lookup_args *args,
-                   struct bdev_handle **bdev_handle);
+                   struct file **bdev_file);
 void __exit btrfs_cleanup_fs_uuids(void);
 int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 logical, u64 len);
 int btrfs_grow_device(struct btrfs_trans_handle *trans,
index 36cf1f0e338e2f59d736aaeb1001e00e8eaddaa3..8da66ea699e8febfdef6cc189c5917d22628265d 100644 (file)
@@ -354,18 +354,13 @@ done:
 }
 
 int zlib_decompress(struct list_head *ws, const u8 *data_in,
-               struct page *dest_page, unsigned long start_byte, size_t srclen,
+               struct page *dest_page, unsigned long dest_pgoff, size_t srclen,
                size_t destlen)
 {
        struct workspace *workspace = list_entry(ws, struct workspace, list);
        int ret = 0;
        int wbits = MAX_WBITS;
-       unsigned long bytes_left;
-       unsigned long total_out = 0;
-       unsigned long pg_offset = 0;
-
-       destlen = min_t(unsigned long, destlen, PAGE_SIZE);
-       bytes_left = destlen;
+       unsigned long to_copy;
 
        workspace->strm.next_in = data_in;
        workspace->strm.avail_in = srclen;
@@ -390,60 +385,30 @@ int zlib_decompress(struct list_head *ws, const u8 *data_in,
                return -EIO;
        }
 
-       while (bytes_left > 0) {
-               unsigned long buf_start;
-               unsigned long buf_offset;
-               unsigned long bytes;
-
-               ret = zlib_inflate(&workspace->strm, Z_NO_FLUSH);
-               if (ret != Z_OK && ret != Z_STREAM_END)
-                       break;
-
-               buf_start = total_out;
-               total_out = workspace->strm.total_out;
-
-               if (total_out == buf_start) {
-                       ret = -EIO;
-                       break;
-               }
-
-               if (total_out <= start_byte)
-                       goto next;
-
-               if (total_out > start_byte && buf_start < start_byte)
-                       buf_offset = start_byte - buf_start;
-               else
-                       buf_offset = 0;
-
-               bytes = min(PAGE_SIZE - pg_offset,
-                           PAGE_SIZE - (buf_offset % PAGE_SIZE));
-               bytes = min(bytes, bytes_left);
+       /*
+        * Everything (in/out buf) should be at most one sector; there should
+        * be no need to switch any input/output buffers.
+        */
+       ret = zlib_inflate(&workspace->strm, Z_FINISH);
+       to_copy = min(workspace->strm.total_out, destlen);
+       if (ret != Z_STREAM_END)
+               goto out;
 
-               memcpy_to_page(dest_page, pg_offset,
-                              workspace->buf + buf_offset, bytes);
+       memcpy_to_page(dest_page, dest_pgoff, workspace->buf, to_copy);
 
-               pg_offset += bytes;
-               bytes_left -= bytes;
-next:
-               workspace->strm.next_out = workspace->buf;
-               workspace->strm.avail_out = workspace->buf_size;
-       }
-
-       if (ret != Z_STREAM_END && bytes_left != 0)
+out:
+       if (unlikely(to_copy != destlen)) {
+               pr_warn_ratelimited("BTRFS: inflate failed, decompressed=%lu expected=%zu\n",
+                                   to_copy, destlen);
                ret = -EIO;
-       else
+       } else {
                ret = 0;
+       }
 
        zlib_inflateEnd(&workspace->strm);
 
-       /*
-        * this should only happen if zlib returned fewer bytes than we
-        * expected.  btrfs_get_block is responsible for zeroing from the
-        * end of the inline extent (destlen) to the end of the page
-        */
-       if (pg_offset < destlen) {
-               memzero_page(dest_page, pg_offset, destlen - pg_offset);
-       }
+       if (unlikely(to_copy < destlen))
+               memzero_page(dest_page, dest_pgoff + to_copy, destlen - to_copy);
        return ret;
 }
 
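The rewritten zlib_decompress() depends on the compressed input and its output each fitting in a single buffer, so one inflate call with Z_FINISH replaces the old page-switching loop. For illustration, the same single-shot pattern in plain userspace zlib (kernel-specific workspace setup omitted; the function name is ours):

#include <string.h>
#include <zlib.h>

/* Decompress a stream known to fit entirely within 'out'; 0 on success. */
static int inflate_one_shot(const unsigned char *in, size_t inlen,
			    unsigned char *out, size_t outlen)
{
	z_stream strm;
	int ret;

	memset(&strm, 0, sizeof(strm));
	if (inflateInit2(&strm, MAX_WBITS) != Z_OK)
		return -1;
	strm.next_in = (unsigned char *)in;
	strm.avail_in = (uInt)inlen;
	strm.next_out = out;
	strm.avail_out = (uInt)outlen;
	ret = inflate(&strm, Z_FINISH);	/* consume everything in one call */
	inflateEnd(&strm);
	return ret == Z_STREAM_END ? 0 : -1;
}
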
index 5bd76813b23f065fdf670bf8fe3fbd59ee0c88d9..aea51fd850cd9de130b30644a831488b5601f9ff 100644 (file)
@@ -824,11 +824,14 @@ static int sb_log_location(struct block_device *bdev, struct blk_zone *zones,
                        reset = &zones[1];
 
                if (reset && reset->cond != BLK_ZONE_COND_EMPTY) {
+                       unsigned int nofs_flags;
+
                        ASSERT(sb_zone_is_full(reset));
 
+                       nofs_flags = memalloc_nofs_save();
                        ret = blkdev_zone_mgmt(bdev, REQ_OP_ZONE_RESET,
-                                              reset->start, reset->len,
-                                              GFP_NOFS);
+                                              reset->start, reset->len);
+                       memalloc_nofs_restore(nofs_flags);
                        if (ret)
                                return ret;
 
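blkdev_zone_mgmt() lost its gfp_t argument this cycle, so callers that used to pass GFP_NOFS now mark a scoped-NOFS section around the call, as in the hunks in this file. A hypothetical wrapper showing the recurring shape:

static int zone_mgmt_nofs(struct block_device *bdev, enum req_op op,
			  sector_t sector, sector_t nr_sectors)
{
	unsigned int nofs_flags;
	int ret;

	/* every allocation inside the scope implicitly behaves as GFP_NOFS */
	nofs_flags = memalloc_nofs_save();
	ret = blkdev_zone_mgmt(bdev, op, sector, nr_sectors);
	memalloc_nofs_restore(nofs_flags);
	return ret;
}
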
@@ -974,11 +977,14 @@ int btrfs_advance_sb_log(struct btrfs_device *device, int mirror)
                         * explicit ZONE_FINISH is not necessary.
                         */
                        if (zone->wp != zone->start + zone->capacity) {
+                               unsigned int nofs_flags;
                                int ret;
 
+                               nofs_flags = memalloc_nofs_save();
                                ret = blkdev_zone_mgmt(device->bdev,
                                                REQ_OP_ZONE_FINISH, zone->start,
-                                               zone->len, GFP_NOFS);
+                                               zone->len);
+                               memalloc_nofs_restore(nofs_flags);
                                if (ret)
                                        return ret;
                        }
@@ -996,11 +1002,13 @@ int btrfs_advance_sb_log(struct btrfs_device *device, int mirror)
 
 int btrfs_reset_sb_log_zones(struct block_device *bdev, int mirror)
 {
+       unsigned int nofs_flags;
        sector_t zone_sectors;
        sector_t nr_sectors;
        u8 zone_sectors_shift;
        u32 sb_zone;
        u32 nr_zones;
+       int ret;
 
        zone_sectors = bdev_zone_sectors(bdev);
        zone_sectors_shift = ilog2(zone_sectors);
@@ -1011,9 +1019,12 @@ int btrfs_reset_sb_log_zones(struct block_device *bdev, int mirror)
        if (sb_zone + 1 >= nr_zones)
                return -ENOENT;
 
-       return blkdev_zone_mgmt(bdev, REQ_OP_ZONE_RESET,
-                               zone_start_sector(sb_zone, bdev),
-                               zone_sectors * BTRFS_NR_SB_LOG_ZONES, GFP_NOFS);
+       nofs_flags = memalloc_nofs_save();
+       ret = blkdev_zone_mgmt(bdev, REQ_OP_ZONE_RESET,
+                              zone_start_sector(sb_zone, bdev),
+                              zone_sectors * BTRFS_NR_SB_LOG_ZONES);
+       memalloc_nofs_restore(nofs_flags);
+       return ret;
 }
 
 /*
@@ -1124,12 +1135,14 @@ static void btrfs_dev_clear_active_zone(struct btrfs_device *device, u64 pos)
 int btrfs_reset_device_zone(struct btrfs_device *device, u64 physical,
                            u64 length, u64 *bytes)
 {
+       unsigned int nofs_flags;
        int ret;
 
        *bytes = 0;
+       nofs_flags = memalloc_nofs_save();
        ret = blkdev_zone_mgmt(device->bdev, REQ_OP_ZONE_RESET,
-                              physical >> SECTOR_SHIFT, length >> SECTOR_SHIFT,
-                              GFP_NOFS);
+                              physical >> SECTOR_SHIFT, length >> SECTOR_SHIFT);
+       memalloc_nofs_restore(nofs_flags);
        if (ret)
                return ret;
 
@@ -1639,6 +1652,15 @@ int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
        }
 
 out:
+       /* Reject non-SINGLE data profiles without RST */
+       if ((map->type & BTRFS_BLOCK_GROUP_DATA) &&
+           (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) &&
+           !fs_info->stripe_root) {
+               btrfs_err(fs_info, "zoned: data %s needs raid-stripe-tree",
+                         btrfs_bg_type_to_raid_name(map->type));
+               return -EINVAL;
+       }
+
        if (cache->alloc_offset > cache->zone_capacity) {
                btrfs_err(fs_info,
 "zoned: invalid write pointer %llu (larger than zone capacity %llu) in block group %llu",
@@ -1670,6 +1692,7 @@ out:
        }
        bitmap_free(active);
        kfree(zone_info);
+       btrfs_free_chunk_map(map);
 
        return ret;
 }
@@ -2055,6 +2078,7 @@ bool btrfs_zone_activate(struct btrfs_block_group *block_group)
 
        map = block_group->physical_map;
 
+       spin_lock(&fs_info->zone_active_bgs_lock);
        spin_lock(&block_group->lock);
        if (test_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE, &block_group->runtime_flags)) {
                ret = true;
@@ -2067,7 +2091,6 @@ bool btrfs_zone_activate(struct btrfs_block_group *block_group)
                goto out_unlock;
        }
 
-       spin_lock(&fs_info->zone_active_bgs_lock);
        for (i = 0; i < map->num_stripes; i++) {
                struct btrfs_zoned_device_info *zinfo;
                int reserved = 0;
@@ -2087,20 +2110,17 @@ bool btrfs_zone_activate(struct btrfs_block_group *block_group)
                 */
                if (atomic_read(&zinfo->active_zones_left) <= reserved) {
                        ret = false;
-                       spin_unlock(&fs_info->zone_active_bgs_lock);
                        goto out_unlock;
                }
 
                if (!btrfs_dev_set_active_zone(device, physical)) {
                        /* Cannot activate the zone */
                        ret = false;
-                       spin_unlock(&fs_info->zone_active_bgs_lock);
                        goto out_unlock;
                }
                if (!is_data)
                        zinfo->reserved_active_zones--;
        }
-       spin_unlock(&fs_info->zone_active_bgs_lock);
 
        /* Successfully activated all the zones */
        set_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE, &block_group->runtime_flags);
@@ -2108,8 +2128,6 @@ bool btrfs_zone_activate(struct btrfs_block_group *block_group)
 
        /* For the active block group list */
        btrfs_get_block_group(block_group);
-
-       spin_lock(&fs_info->zone_active_bgs_lock);
        list_add_tail(&block_group->active_bg_list, &fs_info->zone_active_bgs);
        spin_unlock(&fs_info->zone_active_bgs_lock);
 
@@ -2117,6 +2135,7 @@ bool btrfs_zone_activate(struct btrfs_block_group *block_group)
 
 out_unlock:
        spin_unlock(&block_group->lock);
+       spin_unlock(&fs_info->zone_active_bgs_lock);
        return ret;
 }
 
@@ -2238,14 +2257,16 @@ static int do_zone_finish(struct btrfs_block_group *block_group, bool fully_writ
                struct btrfs_device *device = map->stripes[i].dev;
                const u64 physical = map->stripes[i].physical;
                struct btrfs_zoned_device_info *zinfo = device->zone_info;
+               unsigned int nofs_flags;
 
                if (zinfo->max_active_zones == 0)
                        continue;
 
+               nofs_flags = memalloc_nofs_save();
                ret = blkdev_zone_mgmt(device->bdev, REQ_OP_ZONE_FINISH,
                                       physical >> SECTOR_SHIFT,
-                                      zinfo->zone_size >> SECTOR_SHIFT,
-                                      GFP_NOFS);
+                                      zinfo->zone_size >> SECTOR_SHIFT);
+               memalloc_nofs_restore(nofs_flags);
 
                if (ret)
                        return ret;
index d3bcf601d3e5a5330419d8e69500c3781f9e8cfc..4f73d23c2c469182a5d873993cdcf0651bfd04f8 100644 (file)
@@ -55,7 +55,7 @@
 
 static int fsync_buffers_list(spinlock_t *lock, struct list_head *list);
 static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
-                         struct writeback_control *wbc);
+                         enum rw_hint hint, struct writeback_control *wbc);
 
 #define BH_ENTRY(list) list_entry((list), struct buffer_head, b_assoc_buffers)
 
@@ -464,7 +464,7 @@ EXPORT_SYMBOL(mark_buffer_async_write);
  * a successful fsync().  For example, ext2 indirect blocks need to be
  * written back and waited upon before fsync() returns.
  *
- * The functions mark_buffer_inode_dirty(), fsync_inode_buffers(),
+ * The functions mark_buffer_dirty_inode(), fsync_inode_buffers(),
  * inode_has_buffers() and invalidate_inode_buffers() are provided for the
  * management of a list of dependent buffers at ->i_mapping->i_private_list.
  *
@@ -1889,7 +1889,8 @@ int __block_write_full_folio(struct inode *inode, struct folio *folio,
        do {
                struct buffer_head *next = bh->b_this_page;
                if (buffer_async_write(bh)) {
-                       submit_bh_wbc(REQ_OP_WRITE | write_flags, bh, wbc);
+                       submit_bh_wbc(REQ_OP_WRITE | write_flags, bh,
+                                     inode->i_write_hint, wbc);
                        nr_underway++;
                }
                bh = next;
@@ -1944,7 +1945,8 @@ recover:
                struct buffer_head *next = bh->b_this_page;
                if (buffer_async_write(bh)) {
                        clear_buffer_dirty(bh);
-                       submit_bh_wbc(REQ_OP_WRITE | write_flags, bh, wbc);
+                       submit_bh_wbc(REQ_OP_WRITE | write_flags, bh,
+                                     inode->i_write_hint, wbc);
                        nr_underway++;
                }
                bh = next;
@@ -2756,6 +2758,7 @@ static void end_bio_bh_io_sync(struct bio *bio)
 }
 
 static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
+                         enum rw_hint write_hint,
                          struct writeback_control *wbc)
 {
        const enum req_op op = opf & REQ_OP_MASK;
@@ -2783,6 +2786,7 @@ static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
        fscrypt_set_bio_crypt_ctx_bh(bio, bh, GFP_NOIO);
 
        bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
+       bio->bi_write_hint = write_hint;
 
        __bio_add_page(bio, bh->b_page, bh->b_size, bh_offset(bh));
 
@@ -2802,7 +2806,7 @@ static void submit_bh_wbc(blk_opf_t opf, struct buffer_head *bh,
 
 void submit_bh(blk_opf_t opf, struct buffer_head *bh)
 {
-       submit_bh_wbc(opf, bh, NULL);
+       submit_bh_wbc(opf, bh, WRITE_LIFE_NOT_SET, NULL);
 }
 EXPORT_SYMBOL(submit_bh);
 
@@ -3121,12 +3125,8 @@ void __init buffer_init(void)
        unsigned long nrpages;
        int ret;
 
-       bh_cachep = kmem_cache_create("buffer_head",
-                       sizeof(struct buffer_head), 0,
-                               (SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|
-                               SLAB_MEM_SPREAD),
-                               NULL);
-
+       bh_cachep = KMEM_CACHE(buffer_head,
+                               SLAB_RECLAIM_ACCOUNT|SLAB_PANIC);
        /*
         * Limit the bh occupancy to 10% of ZONE_NORMAL
         */
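
KMEM_CACHE() takes the cache's name, size and alignment from the struct itself (the SLAB_MEM_SPREAD flag disappears here as part of its tree-wide retirement); the macro expands to roughly the call it replaces:

	/* approximate expansion of KMEM_CACHE(buffer_head, flags) */
	kmem_cache_create("buffer_head", sizeof(struct buffer_head),
			  __alignof__(struct buffer_head),
			  SLAB_RECLAIM_ACCOUNT | SLAB_PANIC, NULL);
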
index 7077f72e6f4747c2a1f1bd6c898221d4bdaab380..f449f7340aad0811ae2cea3134731e1a2111f5ff 100644 (file)
@@ -168,6 +168,8 @@ error_unsupported:
        dput(root);
 error_open_root:
        cachefiles_end_secure(cache, saved_cred);
+       put_cred(cache->cache_cred);
+       cache->cache_cred = NULL;
 error_getsec:
        fscache_relinquish_cache(cache_cookie);
        cache->cache = NULL;
index 3f24905f40661302936f08122394947d55e3d5f3..6465e257423091d5183a6bf4c7963a2e8e900766 100644 (file)
@@ -816,6 +816,7 @@ static void cachefiles_daemon_unbind(struct cachefiles_cache *cache)
        cachefiles_put_directory(cache->graveyard);
        cachefiles_put_directory(cache->store);
        mntput(cache->mnt);
+       put_cred(cache->cache_cred);
 
        kfree(cache->rootdirname);
        kfree(cache->secctx);
index 5fd74ec60befc6cb192e8102a14e87de2b45bb87..4ba42f1fa3b4077b04735282354de250e70fe87d 100644 (file)
@@ -539,6 +539,9 @@ int cachefiles_ondemand_init_object(struct cachefiles_object *object)
        struct fscache_volume *volume = object->volume->vcookie;
        size_t volume_key_size, cookie_key_size, data_len;
 
+       if (!object->ondemand)
+               return 0;
+
        /*
         * CacheFiles will firstly check the cache file under the root cache
         * directory. If the coherency check failed, it will fallback to
index 9c02f328c966cbdd12b8af17d7ddb9d5bb19ea38..7fb4aae97412464c54b42037f75016085d214bd3 100644 (file)
@@ -1452,7 +1452,7 @@ static void __prep_cap(struct cap_msg_args *arg, struct ceph_cap *cap,
        if (flushing & CEPH_CAP_XATTR_EXCL) {
                arg->old_xattr_buf = __ceph_build_xattrs_blob(ci);
                arg->xattr_version = ci->i_xattrs.version;
-               arg->xattr_buf = ci->i_xattrs.blob;
+               arg->xattr_buf = ceph_buffer_get(ci->i_xattrs.blob);
        } else {
                arg->xattr_buf = NULL;
                arg->old_xattr_buf = NULL;
@@ -1553,6 +1553,7 @@ static void __send_cap(struct cap_msg_args *arg, struct ceph_inode_info *ci)
        encode_cap_msg(msg, arg);
        ceph_con_send(&arg->session->s_con, msg);
        ceph_buffer_put(arg->old_xattr_buf);
+       ceph_buffer_put(arg->xattr_buf);
        if (arg->wake)
                wake_up_all(&ci->i_cap_wq);
 }
@@ -2155,6 +2156,30 @@ retry:
                      ceph_cap_string(cap->implemented),
                      ceph_cap_string(revoking));
 
+               /* completed revocation? going down and there are no caps? */
+               if (revoking) {
+                       if ((revoking & cap_used) == 0) {
+                               doutc(cl, "completed revocation of %s\n",
+                                     ceph_cap_string(cap->implemented & ~cap->issued));
+                               goto ack;
+                       }
+
+                       /*
+                        * If the "i_wrbuffer_ref" was increased by mmap or generic
+                        * cache write just before the ceph_check_caps() is called,
+                        * the Fb capability revoking will fail this time. Then we
+                        * must wait for the BDI's delayed work to flush the dirty
+                        * pages and to release the "i_wrbuffer_ref", which will cost
+                        * at most 5 seconds. That means the MDS needs to wait at
+                        * most 5 seconds to finish the Fb capability's revocation.
+                        *
+                        * Let's queue a writeback for it.
+                        */
+                       if (S_ISREG(inode->i_mode) && ci->i_wrbuffer_ref &&
+                           (revoking & CEPH_CAP_FILE_BUFFER))
+                               queue_writeback = true;
+               }
+
                if (cap == ci->i_auth_cap &&
                    (cap->issued & CEPH_CAP_FILE_WR)) {
                        /* request larger max_size from MDS? */
@@ -2182,30 +2207,6 @@ retry:
                        }
                }
 
-               /* completed revocation? going down and there are no caps? */
-               if (revoking) {
-                       if ((revoking & cap_used) == 0) {
-                               doutc(cl, "completed revocation of %s\n",
-                                     ceph_cap_string(cap->implemented & ~cap->issued));
-                               goto ack;
-                       }
-
-                       /*
-                        * If the "i_wrbuffer_ref" was increased by mmap or generic
-                        * cache write just before the ceph_check_caps() is called,
-                        * the Fb capability revoking will fail this time. Then we
-                        * must wait for the BDI's delayed work to flush the dirty
-                        * pages and to release the "i_wrbuffer_ref", which will cost
-                        * at most 5 seconds. That means the MDS needs to wait at
-                        * most 5 seconds to finished the Fb capability's revocation.
-                        *
-                        * Let's queue a writeback for it.
-                        */
-                       if (S_ISREG(inode->i_mode) && ci->i_wrbuffer_ref &&
-                           (revoking & CEPH_CAP_FILE_BUFFER))
-                               queue_writeback = true;
-               }
-
                /* want more caps from mds? */
                if (want & ~cap->mds_wanted) {
                        if (want & ~(cap->mds_wanted | cap->issued))
@@ -3215,7 +3216,6 @@ static int ceph_try_drop_cap_snap(struct ceph_inode_info *ci,
 
 enum put_cap_refs_mode {
        PUT_CAP_REFS_SYNC = 0,
-       PUT_CAP_REFS_NO_CHECK,
        PUT_CAP_REFS_ASYNC,
 };
 
@@ -3331,11 +3331,6 @@ void ceph_put_cap_refs_async(struct ceph_inode_info *ci, int had)
        __ceph_put_cap_refs(ci, had, PUT_CAP_REFS_ASYNC);
 }
 
-void ceph_put_cap_refs_no_check_caps(struct ceph_inode_info *ci, int had)
-{
-       __ceph_put_cap_refs(ci, had, PUT_CAP_REFS_NO_CHECK);
-}
-
 /*
  * Release @nr WRBUFFER refs on dirty pages for the given @snapc snap
  * context.  Adjust per-snap dirty page accounting as appropriate.
@@ -4777,7 +4772,22 @@ int ceph_drop_caps_for_unlink(struct inode *inode)
                if (__ceph_caps_dirty(ci)) {
                        struct ceph_mds_client *mdsc =
                                ceph_inode_to_fs_client(inode)->mdsc;
-                       __cap_delay_requeue_front(mdsc, ci);
+
+                       doutc(mdsc->fsc->client, "%p %llx.%llx\n", inode,
+                             ceph_vinop(inode));
+                       spin_lock(&mdsc->cap_unlink_delay_lock);
+                       ci->i_ceph_flags |= CEPH_I_FLUSH;
+                       if (!list_empty(&ci->i_cap_delay_list))
+                               list_del_init(&ci->i_cap_delay_list);
+                       list_add_tail(&ci->i_cap_delay_list,
+                                     &mdsc->cap_unlink_delay_list);
+                       spin_unlock(&mdsc->cap_unlink_delay_lock);
+
+                       /*
+                        * Fire the work immediately, because the MDS maybe
+                        * waiting for caps release.
+                        */
+                       ceph_queue_cap_unlink_work(mdsc);
                }
        }
        spin_unlock(&ci->i_ceph_lock);
index 0c25d326afc41d9d4d8ba98d3c6c5976647bb3fe..7b2e77517f235ecd47264061c04bae2e6d0b7c83 100644 (file)
@@ -78,6 +78,8 @@ struct inode *ceph_new_inode(struct inode *dir, struct dentry *dentry,
        if (!inode)
                return ERR_PTR(-ENOMEM);
 
+       inode->i_blkbits = CEPH_FSCRYPT_BLOCK_SHIFT;
+
        if (!S_ISLNK(*mode)) {
                err = ceph_pre_init_acls(dir, mode, as_ctx);
                if (err < 0)
index e07ad29ff8b97210ed3d3173412f0b1d1cb7a257..ebf4ac0055ddc59815e4121550b09ebec08f801b 100644 (file)
@@ -33,7 +33,7 @@ void __init ceph_flock_init(void)
 
 static void ceph_fl_copy_lock(struct file_lock *dst, struct file_lock *src)
 {
-       struct inode *inode = file_inode(dst->fl_file);
+       struct inode *inode = file_inode(dst->c.flc_file);
        atomic_inc(&ceph_inode(inode)->i_filelock_ref);
        dst->fl_u.ceph.inode = igrab(inode);
 }
@@ -110,17 +110,18 @@ static int ceph_lock_message(u8 lock_type, u16 operation, struct inode *inode,
        else
                length = fl->fl_end - fl->fl_start + 1;
 
-       owner = secure_addr(fl->fl_owner);
+       owner = secure_addr(fl->c.flc_owner);
 
        doutc(cl, "rule: %d, op: %d, owner: %llx, pid: %llu, "
                    "start: %llu, length: %llu, wait: %d, type: %d\n",
-                   (int)lock_type, (int)operation, owner, (u64)fl->fl_pid,
-                   fl->fl_start, length, wait, fl->fl_type);
+                   (int)lock_type, (int)operation, owner,
+                   (u64) fl->c.flc_pid,
+                   fl->fl_start, length, wait, fl->c.flc_type);
 
        req->r_args.filelock_change.rule = lock_type;
        req->r_args.filelock_change.type = cmd;
        req->r_args.filelock_change.owner = cpu_to_le64(owner);
-       req->r_args.filelock_change.pid = cpu_to_le64((u64)fl->fl_pid);
+       req->r_args.filelock_change.pid = cpu_to_le64((u64) fl->c.flc_pid);
        req->r_args.filelock_change.start = cpu_to_le64(fl->fl_start);
        req->r_args.filelock_change.length = cpu_to_le64(length);
        req->r_args.filelock_change.wait = wait;
@@ -130,13 +131,13 @@ static int ceph_lock_message(u8 lock_type, u16 operation, struct inode *inode,
                err = ceph_mdsc_wait_request(mdsc, req, wait ?
                                        ceph_lock_wait_for_completion : NULL);
        if (!err && operation == CEPH_MDS_OP_GETFILELOCK) {
-               fl->fl_pid = -le64_to_cpu(req->r_reply_info.filelock_reply->pid);
+               fl->c.flc_pid = -le64_to_cpu(req->r_reply_info.filelock_reply->pid);
                if (CEPH_LOCK_SHARED == req->r_reply_info.filelock_reply->type)
-                       fl->fl_type = F_RDLCK;
+                       fl->c.flc_type = F_RDLCK;
                else if (CEPH_LOCK_EXCL == req->r_reply_info.filelock_reply->type)
-                       fl->fl_type = F_WRLCK;
+                       fl->c.flc_type = F_WRLCK;
                else
-                       fl->fl_type = F_UNLCK;
+                       fl->c.flc_type = F_UNLCK;
 
                fl->fl_start = le64_to_cpu(req->r_reply_info.filelock_reply->start);
                length = le64_to_cpu(req->r_reply_info.filelock_reply->start) +
@@ -150,8 +151,8 @@ static int ceph_lock_message(u8 lock_type, u16 operation, struct inode *inode,
        ceph_mdsc_put_request(req);
        doutc(cl, "rule: %d, op: %d, pid: %llu, start: %llu, "
              "length: %llu, wait: %d, type: %d, err code %d\n",
-             (int)lock_type, (int)operation, (u64)fl->fl_pid,
-             fl->fl_start, length, wait, fl->fl_type, err);
+             (int)lock_type, (int)operation, (u64) fl->c.flc_pid,
+             fl->fl_start, length, wait, fl->c.flc_type, err);
        return err;
 }
 
@@ -227,10 +228,10 @@ static int ceph_lock_wait_for_completion(struct ceph_mds_client *mdsc,
 static int try_unlock_file(struct file *file, struct file_lock *fl)
 {
        int err;
-       unsigned int orig_flags = fl->fl_flags;
-       fl->fl_flags |= FL_EXISTS;
+       unsigned int orig_flags = fl->c.flc_flags;
+       fl->c.flc_flags |= FL_EXISTS;
        err = locks_lock_file_wait(file, fl);
-       fl->fl_flags = orig_flags;
+       fl->c.flc_flags = orig_flags;
        if (err == -ENOENT) {
                if (!(orig_flags & FL_EXISTS))
                        err = 0;
@@ -253,13 +254,13 @@ int ceph_lock(struct file *file, int cmd, struct file_lock *fl)
        u8 wait = 0;
        u8 lock_cmd;
 
-       if (!(fl->fl_flags & FL_POSIX))
+       if (!(fl->c.flc_flags & FL_POSIX))
                return -ENOLCK;
 
        if (ceph_inode_is_shutdown(inode))
                return -ESTALE;
 
-       doutc(cl, "fl_owner: %p\n", fl->fl_owner);
+       doutc(cl, "fl_owner: %p\n", fl->c.flc_owner);
 
        /* set wait bit as appropriate, then make command as Ceph expects it*/
        if (IS_GETLK(cmd))
@@ -273,19 +274,19 @@ int ceph_lock(struct file *file, int cmd, struct file_lock *fl)
        }
        spin_unlock(&ci->i_ceph_lock);
        if (err < 0) {
-               if (op == CEPH_MDS_OP_SETFILELOCK && F_UNLCK == fl->fl_type)
+               if (op == CEPH_MDS_OP_SETFILELOCK && lock_is_unlock(fl))
                        posix_lock_file(file, fl, NULL);
                return err;
        }
 
-       if (F_RDLCK == fl->fl_type)
+       if (lock_is_read(fl))
                lock_cmd = CEPH_LOCK_SHARED;
-       else if (F_WRLCK == fl->fl_type)
+       else if (lock_is_write(fl))
                lock_cmd = CEPH_LOCK_EXCL;
        else
                lock_cmd = CEPH_LOCK_UNLOCK;
 
-       if (op == CEPH_MDS_OP_SETFILELOCK && F_UNLCK == fl->fl_type) {
+       if (op == CEPH_MDS_OP_SETFILELOCK && lock_is_unlock(fl)) {
                err = try_unlock_file(file, fl);
                if (err <= 0)
                        return err;
@@ -293,7 +294,7 @@ int ceph_lock(struct file *file, int cmd, struct file_lock *fl)
 
        err = ceph_lock_message(CEPH_LOCK_FCNTL, op, inode, lock_cmd, wait, fl);
        if (!err) {
-               if (op == CEPH_MDS_OP_SETFILELOCK && F_UNLCK != fl->fl_type) {
+               if (op == CEPH_MDS_OP_SETFILELOCK && F_UNLCK != fl->c.flc_type) {
                        doutc(cl, "locking locally\n");
                        err = posix_lock_file(file, fl, NULL);
                        if (err) {
@@ -319,13 +320,13 @@ int ceph_flock(struct file *file, int cmd, struct file_lock *fl)
        u8 wait = 0;
        u8 lock_cmd;
 
-       if (!(fl->fl_flags & FL_FLOCK))
+       if (!(fl->c.flc_flags & FL_FLOCK))
                return -ENOLCK;
 
        if (ceph_inode_is_shutdown(inode))
                return -ESTALE;
 
-       doutc(cl, "fl_file: %p\n", fl->fl_file);
+       doutc(cl, "fl_file: %p\n", fl->c.flc_file);
 
        spin_lock(&ci->i_ceph_lock);
        if (ci->i_ceph_flags & CEPH_I_ERROR_FILELOCK) {
@@ -333,7 +334,7 @@ int ceph_flock(struct file *file, int cmd, struct file_lock *fl)
        }
        spin_unlock(&ci->i_ceph_lock);
        if (err < 0) {
-               if (F_UNLCK == fl->fl_type)
+               if (lock_is_unlock(fl))
                        locks_lock_file_wait(file, fl);
                return err;
        }
@@ -341,14 +342,14 @@ int ceph_flock(struct file *file, int cmd, struct file_lock *fl)
        if (IS_SETLKW(cmd))
                wait = 1;
 
-       if (F_RDLCK == fl->fl_type)
+       if (lock_is_read(fl))
                lock_cmd = CEPH_LOCK_SHARED;
-       else if (F_WRLCK == fl->fl_type)
+       else if (lock_is_write(fl))
                lock_cmd = CEPH_LOCK_EXCL;
        else
                lock_cmd = CEPH_LOCK_UNLOCK;
 
-       if (F_UNLCK == fl->fl_type) {
+       if (lock_is_unlock(fl)) {
                err = try_unlock_file(file, fl);
                if (err <= 0)
                        return err;
@@ -356,7 +357,7 @@ int ceph_flock(struct file *file, int cmd, struct file_lock *fl)
 
        err = ceph_lock_message(CEPH_LOCK_FLOCK, CEPH_MDS_OP_SETFILELOCK,
                                inode, lock_cmd, wait, fl);
-       if (!err && F_UNLCK != fl->fl_type) {
+       if (!err && F_UNLCK != fl->c.flc_type) {
                err = locks_lock_file_wait(file, fl);
                if (err) {
                        ceph_lock_message(CEPH_LOCK_FLOCK,
@@ -385,9 +386,9 @@ void ceph_count_locks(struct inode *inode, int *fcntl_count, int *flock_count)
        ctx = locks_inode_context(inode);
        if (ctx) {
                spin_lock(&ctx->flc_lock);
-               list_for_each_entry(lock, &ctx->flc_posix, fl_list)
+               for_each_file_lock(lock, &ctx->flc_posix)
                        ++(*fcntl_count);
-               list_for_each_entry(lock, &ctx->flc_flock, fl_list)
+               for_each_file_lock(lock, &ctx->flc_flock)
                        ++(*flock_count);
                spin_unlock(&ctx->flc_lock);
        }
@@ -408,10 +409,10 @@ static int lock_to_ceph_filelock(struct inode *inode,
        cephlock->start = cpu_to_le64(lock->fl_start);
        cephlock->length = cpu_to_le64(lock->fl_end - lock->fl_start + 1);
        cephlock->client = cpu_to_le64(0);
-       cephlock->pid = cpu_to_le64((u64)lock->fl_pid);
-       cephlock->owner = cpu_to_le64(secure_addr(lock->fl_owner));
+       cephlock->pid = cpu_to_le64((u64) lock->c.flc_pid);
+       cephlock->owner = cpu_to_le64(secure_addr(lock->c.flc_owner));
 
-       switch (lock->fl_type) {
+       switch (lock->c.flc_type) {
        case F_RDLCK:
                cephlock->type = CEPH_LOCK_SHARED;
                break;
@@ -422,7 +423,8 @@ static int lock_to_ceph_filelock(struct inode *inode,
                cephlock->type = CEPH_LOCK_UNLOCK;
                break;
        default:
-               doutc(cl, "Have unknown lock type %d\n", lock->fl_type);
+               doutc(cl, "Have unknown lock type %d\n",
+                     lock->c.flc_type);
                err = -EINVAL;
        }
 
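These fs/ceph/locks.c hunks track the fs/locks change that moved the generic fields of struct file_lock into an embedded core (the 'c' member, a struct file_lock_core), with lock_is_read()/lock_is_write()/lock_is_unlock() replacing open-coded fl_type comparisons. The branch the file repeats, condensed into one sketch:

static u8 lock_type_to_ceph_cmd(const struct file_lock *fl)
{
	if (lock_is_read(fl))		/* fl->c.flc_type == F_RDLCK */
		return CEPH_LOCK_SHARED;
	if (lock_is_write(fl))		/* fl->c.flc_type == F_WRLCK */
		return CEPH_LOCK_EXCL;
	return CEPH_LOCK_UNLOCK;	/* F_UNLCK */
}
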
@@ -453,7 +455,7 @@ int ceph_encode_locks_to_buffer(struct inode *inode,
                return 0;
 
        spin_lock(&ctx->flc_lock);
-       list_for_each_entry(lock, &ctx->flc_posix, fl_list) {
+       for_each_file_lock(lock, &ctx->flc_posix) {
                ++seen_fcntl;
                if (seen_fcntl > num_fcntl_locks) {
                        err = -ENOSPC;
@@ -464,7 +466,7 @@ int ceph_encode_locks_to_buffer(struct inode *inode,
                        goto fail;
                ++l;
        }
-       list_for_each_entry(lock, &ctx->flc_flock, fl_list) {
+       for_each_file_lock(lock, &ctx->flc_flock) {
                ++seen_flock;
                if (seen_flock > num_flock_locks) {
                        err = -ENOSPC;
index 548d1de379f3570b729af9e50b67aaff65e36e14..3ab9c268a8bb398b779cc93d3da98f3d13df8fe3 100644 (file)
@@ -1089,7 +1089,7 @@ void ceph_mdsc_release_request(struct kref *kref)
        struct ceph_mds_request *req = container_of(kref,
                                                    struct ceph_mds_request,
                                                    r_kref);
-       ceph_mdsc_release_dir_caps_no_check(req);
+       ceph_mdsc_release_dir_caps_async(req);
        destroy_reply_info(&req->r_reply_info);
        if (req->r_request)
                ceph_msg_put(req->r_request);
@@ -2484,6 +2484,50 @@ void ceph_reclaim_caps_nr(struct ceph_mds_client *mdsc, int nr)
        }
 }
 
+void ceph_queue_cap_unlink_work(struct ceph_mds_client *mdsc)
+{
+       struct ceph_client *cl = mdsc->fsc->client;
+
+       if (mdsc->stopping)
+               return;
+
+       if (queue_work(mdsc->fsc->cap_wq, &mdsc->cap_unlink_work))
+               doutc(cl, "caps unlink work queued\n");
+       else
+               doutc(cl, "failed to queue caps unlink work\n");
+}
+
+static void ceph_cap_unlink_work(struct work_struct *work)
+{
+       struct ceph_mds_client *mdsc =
+               container_of(work, struct ceph_mds_client, cap_unlink_work);
+       struct ceph_client *cl = mdsc->fsc->client;
+
+       doutc(cl, "begin\n");
+       spin_lock(&mdsc->cap_unlink_delay_lock);
+       while (!list_empty(&mdsc->cap_unlink_delay_list)) {
+               struct ceph_inode_info *ci;
+               struct inode *inode;
+
+               ci = list_first_entry(&mdsc->cap_unlink_delay_list,
+                                     struct ceph_inode_info,
+                                     i_cap_delay_list);
+               list_del_init(&ci->i_cap_delay_list);
+
+               inode = igrab(&ci->netfs.inode);
+               if (inode) {
+                       spin_unlock(&mdsc->cap_unlink_delay_lock);
+                       doutc(cl, "on %p %llx.%llx\n", inode,
+                             ceph_vinop(inode));
+                       ceph_check_caps(ci, CHECK_CAPS_FLUSH);
+                       iput(inode);
+                       spin_lock(&mdsc->cap_unlink_delay_lock);
+               }
+       }
+       spin_unlock(&mdsc->cap_unlink_delay_lock);
+       doutc(cl, "done\n");
+}
+
 /*
  * requests
  */
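
ceph_cap_unlink_work() above uses a standard drain idiom: detach one entry, pin the object so it cannot be freed, drop the spinlock for the blocking call, then retake the lock before reading the list again. The skeleton, with all names hypothetical:

	spin_lock(&lock);
	while (!list_empty(&pending)) {
		item = list_first_entry(&pending, struct item, node);
		list_del_init(&item->node);	/* detach while still locked */
		if (pin(item)) {		/* e.g. igrab() on the inode */
			spin_unlock(&lock);
			process(item);		/* may sleep */
			unpin(item);		/* e.g. iput() */
			spin_lock(&lock);
		}
	}
	spin_unlock(&lock);
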
@@ -4261,7 +4305,7 @@ void ceph_mdsc_release_dir_caps(struct ceph_mds_request *req)
        }
 }
 
-void ceph_mdsc_release_dir_caps_no_check(struct ceph_mds_request *req)
+void ceph_mdsc_release_dir_caps_async(struct ceph_mds_request *req)
 {
        struct ceph_client *cl = req->r_mdsc->fsc->client;
        int dcaps;
@@ -4269,8 +4313,7 @@ void ceph_mdsc_release_dir_caps_no_check(struct ceph_mds_request *req)
        dcaps = xchg(&req->r_dir_caps, 0);
        if (dcaps) {
                doutc(cl, "releasing r_dir_caps=%s\n", ceph_cap_string(dcaps));
-               ceph_put_cap_refs_no_check_caps(ceph_inode(req->r_parent),
-                                               dcaps);
+               ceph_put_cap_refs_async(ceph_inode(req->r_parent), dcaps);
        }
 }
 
@@ -4306,7 +4349,7 @@ static void replay_unsafe_requests(struct ceph_mds_client *mdsc,
                if (req->r_session->s_mds != session->s_mds)
                        continue;
 
-               ceph_mdsc_release_dir_caps_no_check(req);
+               ceph_mdsc_release_dir_caps_async(req);
 
                __send_request(session, req, true);
        }
@@ -5360,6 +5403,8 @@ int ceph_mdsc_init(struct ceph_fs_client *fsc)
        INIT_LIST_HEAD(&mdsc->cap_delay_list);
        INIT_LIST_HEAD(&mdsc->cap_wait_list);
        spin_lock_init(&mdsc->cap_delay_lock);
+       INIT_LIST_HEAD(&mdsc->cap_unlink_delay_list);
+       spin_lock_init(&mdsc->cap_unlink_delay_lock);
        INIT_LIST_HEAD(&mdsc->snap_flush_list);
        spin_lock_init(&mdsc->snap_flush_lock);
        mdsc->last_cap_flush_tid = 1;
@@ -5368,6 +5413,7 @@ int ceph_mdsc_init(struct ceph_fs_client *fsc)
        spin_lock_init(&mdsc->cap_dirty_lock);
        init_waitqueue_head(&mdsc->cap_flushing_wq);
        INIT_WORK(&mdsc->cap_reclaim_work, ceph_cap_reclaim_work);
+       INIT_WORK(&mdsc->cap_unlink_work, ceph_cap_unlink_work);
        err = ceph_metric_init(&mdsc->metric);
        if (err)
                goto err_mdsmap;
@@ -5641,6 +5687,7 @@ void ceph_mdsc_close_sessions(struct ceph_mds_client *mdsc)
        ceph_cleanup_global_and_empty_realms(mdsc);
 
        cancel_work_sync(&mdsc->cap_reclaim_work);
+       cancel_work_sync(&mdsc->cap_unlink_work);
        cancel_delayed_work_sync(&mdsc->delayed_work); /* cancel timer */
 
        doutc(cl, "done\n");
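
Across the mdsc hunks, the new cap_unlink work item follows the usual work_struct lifecycle: INIT_WORK() when the client is set up, queue_work() from ceph_queue_cap_unlink_work(), and cancel_work_sync() on teardown so the handler cannot run against freed state. The generic shape, names hypothetical:

static void my_handler(struct work_struct *work)
{
	/* runs later, in process context, on the chosen workqueue */
}

	INIT_WORK(&obj->work, my_handler);	/* setup, once */
	queue_work(wq, &obj->work);		/* runtime, any number of times */
	cancel_work_sync(&obj->work);		/* teardown: waits if running */
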
index 2e6ddaa13d725016dc9a93c6ad1838806eac547e..03f8ff00874f727adff8b88cc8d538fc989692d8 100644 (file)
@@ -462,6 +462,8 @@ struct ceph_mds_client {
        unsigned long    last_renew_caps;  /* last time we renewed our caps */
        struct list_head cap_delay_list;   /* caps with delayed release */
        spinlock_t       cap_delay_lock;   /* protects cap_delay_list */
+       struct list_head cap_unlink_delay_list;  /* caps with delayed release for unlink */
+       spinlock_t       cap_unlink_delay_lock;  /* protects cap_unlink_delay_list */
        struct list_head snap_flush_list;  /* cap_snaps ready to flush */
        spinlock_t       snap_flush_lock;
 
@@ -475,6 +477,8 @@ struct ceph_mds_client {
        struct work_struct cap_reclaim_work;
        atomic_t           cap_reclaim_pending;
 
+       struct work_struct cap_unlink_work;
+
        /*
         * Cap reservations
         *
@@ -552,7 +556,7 @@ extern int ceph_mdsc_do_request(struct ceph_mds_client *mdsc,
                                struct inode *dir,
                                struct ceph_mds_request *req);
 extern void ceph_mdsc_release_dir_caps(struct ceph_mds_request *req);
-extern void ceph_mdsc_release_dir_caps_no_check(struct ceph_mds_request *req);
+extern void ceph_mdsc_release_dir_caps_async(struct ceph_mds_request *req);
 static inline void ceph_mdsc_get_request(struct ceph_mds_request *req)
 {
        kref_get(&req->r_kref);
@@ -574,6 +578,7 @@ extern void ceph_flush_cap_releases(struct ceph_mds_client *mdsc,
                                    struct ceph_mds_session *session);
 extern void ceph_queue_cap_reclaim_work(struct ceph_mds_client *mdsc);
 extern void ceph_reclaim_caps_nr(struct ceph_mds_client *mdsc, int nr);
+extern void ceph_queue_cap_unlink_work(struct ceph_mds_client *mdsc);
 extern int ceph_iterate_session_caps(struct ceph_mds_session *session,
                                     int (*cb)(struct inode *, int mds, void *),
                                     void *arg);
index fae97c25ce58d5b268b7e3d73c5d4c94def4946d..8109aba66e023eb0d3dd5cdf06f3060c5cbf4b1a 100644 (file)
@@ -380,10 +380,11 @@ struct ceph_mdsmap *ceph_mdsmap_decode(struct ceph_mds_client *mdsc, void **p,
                ceph_decode_skip_8(p, end, bad_ext);
                /* required_client_features */
                ceph_decode_skip_set(p, end, 64, bad_ext);
+               /* bal_rank_mask */
+               ceph_decode_skip_string(p, end, bad_ext);
+       }
+       if (mdsmap_ev >= 18) {
                ceph_decode_64_safe(p, end, m->m_max_xattr_size, bad_ext);
-       } else {
-               /* This forces the usage of the (sync) SETXATTR Op */
-               m->m_max_xattr_size = 0;
        }
 bad_ext:
        doutc(cl, "m_enabled: %d, m_damaged: %d, m_num_laggy: %d\n",
index 89f1931f1ba6c9643a4098b1255c240e00f0c38e..1f2171dd01bfa34a404eef00113646bdcb978980 100644 (file)
@@ -27,7 +27,11 @@ struct ceph_mdsmap {
        u32 m_session_timeout;          /* seconds */
        u32 m_session_autoclose;        /* seconds */
        u64 m_max_file_size;
-       u64 m_max_xattr_size;           /* maximum size for xattrs blob */
+       /*
+        * maximum size for xattrs blob.
+        * Zeroed by default to force the usage of the (sync) SETXATTR Op.
+        */
+       u64 m_max_xattr_size;
        u32 m_max_mds;                  /* expected up:active mds number */
        u32 m_num_active_mds;           /* actual up:active mds number */
        u32 possible_max_rank;          /* possible max rank index */
index b06e2bc86221bf02fe54b2aa3304be80bedc5214..b63b4cd9b5b685a930bc33673f28e7c48a93d605 100644 (file)
@@ -1255,8 +1255,6 @@ extern void ceph_take_cap_refs(struct ceph_inode_info *ci, int caps,
 extern void ceph_get_cap_refs(struct ceph_inode_info *ci, int caps);
 extern void ceph_put_cap_refs(struct ceph_inode_info *ci, int had);
 extern void ceph_put_cap_refs_async(struct ceph_inode_info *ci, int had);
-extern void ceph_put_cap_refs_no_check_caps(struct ceph_inode_info *ci,
-                                           int had);
 extern void ceph_put_wrbuffer_cap_refs(struct ceph_inode_info *ci, int nr,
                                       struct ceph_snap_context *snapc);
 extern void __ceph_remove_capsnap(struct inode *inode,
index 0c7c2528791ebc010acad4754b51fe3e9283901b..a50356c541f6c7bb61769ac7ef3122304226a2de 100644 (file)
@@ -24,6 +24,8 @@
 #include <linux/pid_namespace.h>
 #include <linux/uaccess.h>
 #include <linux/fs.h>
+#include <linux/fs_context.h>
+#include <linux/fs_parser.h>
 #include <linux/vmalloc.h>
 
 #include <linux/coda.h>
@@ -87,10 +89,10 @@ void coda_destroy_inodecache(void)
        kmem_cache_destroy(coda_inode_cachep);
 }
 
-static int coda_remount(struct super_block *sb, int *flags, char *data)
+static int coda_reconfigure(struct fs_context *fc)
 {
-       sync_filesystem(sb);
-       *flags |= SB_NOATIME;
+       sync_filesystem(fc->root->d_sb);
+       fc->sb_flags |= SB_NOATIME;
        return 0;
 }
 
@@ -102,78 +104,102 @@ static const struct super_operations coda_super_operations =
        .evict_inode    = coda_evict_inode,
        .put_super      = coda_put_super,
        .statfs         = coda_statfs,
-       .remount_fs     = coda_remount,
 };
 
-static int get_device_index(struct coda_mount_data *data)
+struct coda_fs_context {
+       int     idx;
+};
+
+enum {
+       Opt_fd,
+};
+
+static const struct fs_parameter_spec coda_param_specs[] = {
+       fsparam_fd      ("fd",  Opt_fd),
+       {}
+};
+
+static int coda_parse_fd(struct fs_context *fc, int fd)
 {
+       struct coda_fs_context *ctx = fc->fs_private;
        struct fd f;
        struct inode *inode;
        int idx;
 
-       if (data == NULL) {
-               pr_warn("%s: Bad mount data\n", __func__);
-               return -1;
-       }
-
-       if (data->version != CODA_MOUNT_VERSION) {
-               pr_warn("%s: Bad mount version\n", __func__);
-               return -1;
-       }
-
-       f = fdget(data->fd);
+       f = fdget(fd);
        if (!f.file)
-               goto Ebadf;
+               return -EBADF;
        inode = file_inode(f.file);
        if (!S_ISCHR(inode->i_mode) || imajor(inode) != CODA_PSDEV_MAJOR) {
                fdput(f);
-               goto Ebadf;
+               return invalf(fc, "code: Not coda psdev");
        }
 
        idx = iminor(inode);
        fdput(f);
 
-       if (idx < 0 || idx >= MAX_CODADEVS) {
-               pr_warn("%s: Bad minor number\n", __func__);
-               return -1;
+       if (idx < 0 || idx >= MAX_CODADEVS)
+               return invalf(fc, "coda: Bad minor number");
+       ctx->idx = idx;
+       return 0;
+}
+
+static int coda_parse_param(struct fs_context *fc, struct fs_parameter *param)
+{
+       struct fs_parse_result result;
+       int opt;
+
+       opt = fs_parse(fc, coda_param_specs, param, &result);
+       if (opt < 0)
+               return opt;
+
+       switch (opt) {
+       case Opt_fd:
+               return coda_parse_fd(fc, result.uint_32);
        }
 
-       return idx;
-Ebadf:
-       pr_warn("%s: Bad file\n", __func__);
-       return -1;
+       return 0;
+}
+
+/*
+ * Parse coda's binary mount data format.  For backward compatibility, any
+ * error from coda_parse_fd() is ignored and the default index 0 is used.
+ */
+static int coda_parse_monolithic(struct fs_context *fc, void *_data)
+{
+       struct coda_mount_data *data = _data;
+
+       if (!data)
+               return invalf(fc, "coda: Bad mount data");
+
+       if (data->version != CODA_MOUNT_VERSION)
+               return invalf(fc, "coda: Bad mount version");
+
+       coda_parse_fd(fc, data->fd);
+       return 0;
 }
 
-static int coda_fill_super(struct super_block *sb, void *data, int silent)
+static int coda_fill_super(struct super_block *sb, struct fs_context *fc)
 {
+       struct coda_fs_context *ctx = fc->fs_private;
        struct inode *root = NULL;
        struct venus_comm *vc;
        struct CodaFid fid;
        int error;
-       int idx;
-
-       if (task_active_pid_ns(current) != &init_pid_ns)
-               return -EINVAL;
-
-       idx = get_device_index((struct coda_mount_data *) data);
 
-       /* Ignore errors in data, for backward compatibility */
-       if(idx == -1)
-               idx = 0;
-       
-       pr_info("%s: device index: %i\n", __func__,  idx);
+       infof(fc, "coda: device index: %i\n", ctx->idx);
 
-       vc = &coda_comms[idx];
+       vc = &coda_comms[ctx->idx];
        mutex_lock(&vc->vc_mutex);
 
        if (!vc->vc_inuse) {
-               pr_warn("%s: No pseudo device\n", __func__);
+               errorf(fc, "coda: No pseudo device");
                error = -EINVAL;
                goto unlock_out;
        }
 
        if (vc->vc_sb) {
-               pr_warn("%s: Device already mounted\n", __func__);
+               errorf(fc, "coda: Device already mounted");
                error = -EBUSY;
                goto unlock_out;
        }
@@ -313,18 +339,45 @@ static int coda_statfs(struct dentry *dentry, struct kstatfs *buf)
        return 0; 
 }
 
-/* init_coda: used by filesystems.c to register coda */
+static int coda_get_tree(struct fs_context *fc)
+{
+       if (task_active_pid_ns(current) != &init_pid_ns)
+               return -EINVAL;
 
-static struct dentry *coda_mount(struct file_system_type *fs_type,
-       int flags, const char *dev_name, void *data)
+       return get_tree_nodev(fc, coda_fill_super);
+}
+
+static void coda_free_fc(struct fs_context *fc)
 {
-       return mount_nodev(fs_type, flags, data, coda_fill_super);
+       kfree(fc->fs_private);
+}
+
+static const struct fs_context_operations coda_context_ops = {
+       .free           = coda_free_fc,
+       .parse_param    = coda_parse_param,
+       .parse_monolithic = coda_parse_monolithic,
+       .get_tree       = coda_get_tree,
+       .reconfigure    = coda_reconfigure,
+};
+
+static int coda_init_fs_context(struct fs_context *fc)
+{
+       struct coda_fs_context *ctx;
+
+       ctx = kzalloc(sizeof(struct coda_fs_context), GFP_KERNEL);
+       if (!ctx)
+               return -ENOMEM;
+
+       fc->fs_private = ctx;
+       fc->ops = &coda_context_ops;
+       return 0;
 }
 
 struct file_system_type coda_fs_type = {
        .owner          = THIS_MODULE,
        .name           = "coda",
-       .mount          = coda_mount,
+       .init_fs_context = coda_init_fs_context,
+       .parameters     = coda_param_specs,
        .kill_sb        = kill_anon_super,
        .fs_flags       = FS_BINARY_MOUNTDATA,
 };
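
With the conversion above, coda's pseudo-device descriptor can be handed over through fsconfig(2) as well as through the legacy binary mount data (still accepted via coda_parse_monolithic() and FS_BINARY_MOUNTDATA). A minimal userspace sketch, assuming raw syscall numbers since glibc ships no wrappers for the new mount API; the device path and mount point are illustrative and error handling is omitted:

    #include <fcntl.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int main(void)
    {
            int psdev = open("/dev/cfs0", O_RDWR);  /* coda pseudo device */
            int fsfd = syscall(SYS_fsopen, "coda", 0);

            /* FSCONFIG_SET_FD (5) feeds the fsparam_fd("fd") spec above */
            syscall(SYS_fsconfig, fsfd, 5, "fd", NULL, psdev);
            /* FSCONFIG_CMD_CREATE (6) triggers coda_get_tree() */
            syscall(SYS_fsconfig, fsfd, 6, NULL, NULL, 0);

            int mfd = syscall(SYS_fsmount, fsfd, 0, 0);
            syscall(SYS_move_mount, mfd, "", AT_FDCWD, "/coda",
                    0x4 /* MOVE_MOUNT_F_EMPTY_PATH */);
            return 0;
    }
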
index f258c17c18411284b9725ed88110de32fadc8993..be6403b4b14b6a26e611398f0903244d1af96343 100644 (file)
@@ -872,6 +872,9 @@ static int dump_emit_page(struct coredump_params *cprm, struct page *page)
        loff_t pos;
        ssize_t n;
 
+       if (!page)
+               return 0;
+
        if (cprm->to_skip) {
                if (!__dump_skip(cprm, cprm->to_skip))
                        return 0;
@@ -884,7 +887,6 @@ static int dump_emit_page(struct coredump_params *cprm, struct page *page)
        pos = file->f_pos;
        bvec_set_page(&bvec, page, PAGE_SIZE, 0);
        iov_iter_bvec(&iter, ITER_SOURCE, &bvec, 1, PAGE_SIZE);
-       iov_iter_set_copy_mc(&iter);
        n = __kernel_write_iter(cprm->file, &iter, &pos);
        if (n != PAGE_SIZE)
                return 0;
@@ -895,10 +897,44 @@ static int dump_emit_page(struct coredump_params *cprm, struct page *page)
        return 1;
 }
 
+/*
+ * If we might get machine checks from kernel accesses during the
+ * core dump, let's get those errors early rather than during the
+ * IO. This is not performance-critical enough to warrant having
+ * all the machine check logic in the iovec paths.
+ */
+#ifdef copy_mc_to_kernel
+
+#define dump_page_alloc() alloc_page(GFP_KERNEL)
+#define dump_page_free(x) __free_page(x)
+static struct page *dump_page_copy(struct page *src, struct page *dst)
+{
+       void *buf = kmap_local_page(src);
+       size_t left = copy_mc_to_kernel(page_address(dst), buf, PAGE_SIZE);
+       kunmap_local(buf);
+       return left ? NULL : dst;
+}
+
+#else
+
+/* We just want to return non-NULL; it's never used. */
+#define dump_page_alloc() ERR_PTR(-EINVAL)
+#define dump_page_free(x) ((void)(x))
+static inline struct page *dump_page_copy(struct page *src, struct page *dst)
+{
+       return src;
+}
+#endif
+
 int dump_user_range(struct coredump_params *cprm, unsigned long start,
                    unsigned long len)
 {
        unsigned long addr;
+       struct page *dump_page;
+
+       dump_page = dump_page_alloc();
+       if (!dump_page)
+               return 0;
 
        for (addr = start; addr < start + len; addr += PAGE_SIZE) {
                struct page *page;
@@ -912,14 +948,17 @@ int dump_user_range(struct coredump_params *cprm, unsigned long start,
                 */
                page = get_dump_page(addr);
                if (page) {
-                       int stop = !dump_emit_page(cprm, page);
+                       int stop = !dump_emit_page(cprm, dump_page_copy(page, dump_page));
                        put_page(page);
-                       if (stop)
+                       if (stop) {
+                               dump_page_free(dump_page);
                                return 0;
+                       }
                } else {
                        dump_skip(cprm, PAGE_SIZE);
                }
        }
+       dump_page_free(dump_page);
        return 1;
 }
 #endif
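
The bounce-page scheme works because copy_mc_to_kernel() reports a machine check as a short copy rather than faulting in the write path: it returns the number of bytes left uncopied, zero on success. A minimal sketch of that contract (the helper name here is illustrative):

    /*
     * Read a possibly-poisoned source page through a bounce buffer.
     * Returns false when a machine check truncated the copy, which is
     * why dump_page_copy() above hands dump_emit_page() a NULL page.
     */
    static bool bounce_copy_ok(void *bounce, const void *src, size_t len)
    {
            return copy_mc_to_kernel(bounce, src, len) == 0;
    }
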
index 60dbfa0f880514d2bb5ce155a94109db5f34087f..39e75131fd5aa01d732f703cb1f421a3696bffd6 100644 (file)
@@ -495,7 +495,7 @@ static void cramfs_kill_sb(struct super_block *sb)
                sb->s_mtd = NULL;
        } else if (IS_ENABLED(CONFIG_CRAMFS_BLOCKDEV) && sb->s_bdev) {
                sync_blockdev(sb->s_bdev);
-               bdev_release(sb->s_bdev_handle);
+               fput(sb->s_bdev_file);
        }
        kfree(sbi);
 }
index 7b3fc189593a5ac67d7e68b6cb8be55d57348cdc..0ad52fbe51c944eded295196a4cd27f91301571a 100644 (file)
@@ -74,13 +74,7 @@ struct fscrypt_nokey_name {
 
 static inline bool fscrypt_is_dot_dotdot(const struct qstr *str)
 {
-       if (str->len == 1 && str->name[0] == '.')
-               return true;
-
-       if (str->len == 2 && str->name[0] == '.' && str->name[1] == '.')
-               return true;
-
-       return false;
+       return is_dot_dotdot(str->name, str->len);
 }
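
The open-coded checks collapse into the generic is_dot_dotdot() helper (ecryptfs drops its private copy further below). Paraphrased from memory of the include/linux/fs.h addition:

    static inline bool is_dot_dotdot(const char *name, size_t len)
    {
            return len <= 2 && name[0] == '.' &&
                   (len == 1 || name[1] == '.');
    }
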
 
 /**
index 52504dd478d31b4b1cd2db9780970e867435f91c..104771c3d3f6ad968ef822c464edfbc7f1b52ca2 100644 (file)
@@ -102,11 +102,8 @@ int __fscrypt_prepare_lookup(struct inode *dir, struct dentry *dentry,
        if (err && err != -ENOENT)
                return err;
 
-       if (fname->is_nokey_name) {
-               spin_lock(&dentry->d_lock);
-               dentry->d_flags |= DCACHE_NOKEY_NAME;
-               spin_unlock(&dentry->d_lock);
-       }
+       fscrypt_prepare_dentry(dentry, fname->is_nokey_name);
+
        return err;
 }
 EXPORT_SYMBOL_GPL(__fscrypt_prepare_lookup);
@@ -131,12 +128,10 @@ EXPORT_SYMBOL_GPL(__fscrypt_prepare_lookup);
 int fscrypt_prepare_lookup_partial(struct inode *dir, struct dentry *dentry)
 {
        int err = fscrypt_get_encryption_info(dir, true);
+       bool is_nokey_name = (!err && !fscrypt_has_encryption_key(dir));
+
+       fscrypt_prepare_dentry(dentry, is_nokey_name);
 
-       if (!err && !fscrypt_has_encryption_key(dir)) {
-               spin_lock(&dentry->d_lock);
-               dentry->d_flags |= DCACHE_NOKEY_NAME;
-               spin_unlock(&dentry->d_lock);
-       }
        return err;
 }
 EXPORT_SYMBOL_GPL(fscrypt_prepare_lookup_partial);
index b813528fb147784c6f308e67d47f3069e3a96e33..71a8e943a0fa506c93fd7f11400de9a5d7e23e01 100644 (file)
@@ -3061,7 +3061,10 @@ static enum d_walk_ret d_genocide_kill(void *data, struct dentry *dentry)
                if (d_unhashed(dentry) || !dentry->d_inode)
                        return D_WALK_SKIP;
 
-               dentry->d_lockref.count--;
+               if (!(dentry->d_flags & DCACHE_GENOCIDE)) {
+                       dentry->d_flags |= DCACHE_GENOCIDE;
+                       dentry->d_lockref.count--;
+               }
        }
        return D_WALK_CONTINUE;
 }
@@ -3136,7 +3139,7 @@ static void __init dcache_init(void)
         * of the dcache.
         */
        dentry_cache = KMEM_CACHE_USERCOPY(dentry,
-               SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD|SLAB_ACCOUNT,
+               SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_ACCOUNT,
                d_iname);
 
        /* Hash may have been set up in dcache_init_early */
index 60456263a338e018e6b4cb0f79ce26e0806ec4c9..62c97ff9e852a15b495b1bdb3790290dddc33515 100644 (file)
@@ -410,6 +410,8 @@ dio_bio_alloc(struct dio *dio, struct dio_submit *sdio,
                bio->bi_end_io = dio_bio_end_io;
        if (dio->is_pinned)
                bio_set_flag(bio, BIO_PAGE_PINNED);
+       bio->bi_write_hint = file_inode(dio->iocb->ki_filp)->i_write_hint;
+
        sdio->bio = bio;
        sdio->logical_offset_in_bio = sdio->cur_page_fs_offset;
 }
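
With bi_write_hint now copied from the inode, write-lifetime hints that userspace sets via fcntl(2) reach the device through the direct-I/O path as well. A small sketch, assuming the F_SET_RW_HINT and RWH_WRITE_LIFE_* definitions from <linux/fcntl.h> are visible (header layout varies between libcs):

    #include <fcntl.h>
    #include <stdint.h>
    #include <linux/fcntl.h>   /* F_SET_RW_HINT, RWH_WRITE_LIFE_* */

    /* Tag a file's future writes as short-lived data. */
    static int set_short_lived(int fd)
    {
            uint64_t hint = RWH_WRITE_LIFE_SHORT;

            return fcntl(fd, F_SET_RW_HINT, &hint);
    }
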
index d814c51213670bb8897701e30c01dc5b78ab6af9..9ca83ef70ed1e11ee2bc50920405603d2e047c71 100644 (file)
@@ -138,14 +138,14 @@ int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
        }
 
        op->info.optype         = DLM_PLOCK_OP_LOCK;
-       op->info.pid            = fl->fl_pid;
-       op->info.ex             = (fl->fl_type == F_WRLCK);
-       op->info.wait           = !!(fl->fl_flags & FL_SLEEP);
+       op->info.pid            = fl->c.flc_pid;
+       op->info.ex             = lock_is_write(fl);
+       op->info.wait           = !!(fl->c.flc_flags & FL_SLEEP);
        op->info.fsid           = ls->ls_global_id;
        op->info.number         = number;
        op->info.start          = fl->fl_start;
        op->info.end            = fl->fl_end;
-       op->info.owner = (__u64)(long)fl->fl_owner;
+       op->info.owner = (__u64)(long)fl->c.flc_owner;
        /* async handling */
        if (fl->fl_lmops && fl->fl_lmops->lm_grant) {
                op_data = kzalloc(sizeof(*op_data), GFP_NOFS);
@@ -258,7 +258,7 @@ static int dlm_plock_callback(struct plock_op *op)
        }
 
        /* got fs lock; bookkeep locally as well: */
-       flc->fl_flags &= ~FL_SLEEP;
+       flc->c.flc_flags &= ~FL_SLEEP;
        if (posix_lock_file(file, flc, NULL)) {
                /*
                 * This can only happen in the case of kmalloc() failure.
@@ -291,7 +291,7 @@ int dlm_posix_unlock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
        struct dlm_ls *ls;
        struct plock_op *op;
        int rv;
-       unsigned char fl_flags = fl->fl_flags;
+       unsigned char saved_flags = fl->c.flc_flags;
 
        ls = dlm_find_lockspace_local(lockspace);
        if (!ls)
@@ -304,7 +304,7 @@ int dlm_posix_unlock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
        }
 
        /* cause the vfs unlock to return ENOENT if lock is not found */
-       fl->fl_flags |= FL_EXISTS;
+       fl->c.flc_flags |= FL_EXISTS;
 
        rv = locks_lock_file_wait(file, fl);
        if (rv == -ENOENT) {
@@ -317,14 +317,14 @@ int dlm_posix_unlock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
        }
 
        op->info.optype         = DLM_PLOCK_OP_UNLOCK;
-       op->info.pid            = fl->fl_pid;
+       op->info.pid            = fl->c.flc_pid;
        op->info.fsid           = ls->ls_global_id;
        op->info.number         = number;
        op->info.start          = fl->fl_start;
        op->info.end            = fl->fl_end;
-       op->info.owner = (__u64)(long)fl->fl_owner;
+       op->info.owner = (__u64)(long)fl->c.flc_owner;
 
-       if (fl->fl_flags & FL_CLOSE) {
+       if (fl->c.flc_flags & FL_CLOSE) {
                op->info.flags |= DLM_PLOCK_FL_CLOSE;
                send_op(op);
                rv = 0;
@@ -345,7 +345,7 @@ out_free:
        dlm_release_plock_op(op);
 out:
        dlm_put_lockspace(ls);
-       fl->fl_flags = fl_flags;
+       fl->c.flc_flags = saved_flags;
        return rv;
 }
 EXPORT_SYMBOL_GPL(dlm_posix_unlock);
@@ -375,14 +375,14 @@ int dlm_posix_cancel(dlm_lockspace_t *lockspace, u64 number, struct file *file,
                return -EINVAL;
 
        memset(&info, 0, sizeof(info));
-       info.pid = fl->fl_pid;
-       info.ex = (fl->fl_type == F_WRLCK);
+       info.pid = fl->c.flc_pid;
+       info.ex = lock_is_write(fl);
        info.fsid = ls->ls_global_id;
        dlm_put_lockspace(ls);
        info.number = number;
        info.start = fl->fl_start;
        info.end = fl->fl_end;
-       info.owner = (__u64)(long)fl->fl_owner;
+       info.owner = (__u64)(long)fl->c.flc_owner;
 
        rv = do_lock_cancel(&info);
        switch (rv) {
@@ -437,13 +437,13 @@ int dlm_posix_get(dlm_lockspace_t *lockspace, u64 number, struct file *file,
        }
 
        op->info.optype         = DLM_PLOCK_OP_GET;
-       op->info.pid            = fl->fl_pid;
-       op->info.ex             = (fl->fl_type == F_WRLCK);
+       op->info.pid            = fl->c.flc_pid;
+       op->info.ex             = lock_is_write(fl);
        op->info.fsid           = ls->ls_global_id;
        op->info.number         = number;
        op->info.start          = fl->fl_start;
        op->info.end            = fl->fl_end;
-       op->info.owner = (__u64)(long)fl->fl_owner;
+       op->info.owner = (__u64)(long)fl->c.flc_owner;
 
        send_op(op);
        wait_event(recv_wq, (op->done != 0));
@@ -455,16 +455,16 @@ int dlm_posix_get(dlm_lockspace_t *lockspace, u64 number, struct file *file,
 
        rv = op->info.rv;
 
-       fl->fl_type = F_UNLCK;
+       fl->c.flc_type = F_UNLCK;
        if (rv == -ENOENT)
                rv = 0;
        else if (rv > 0) {
                locks_init_lock(fl);
-               fl->fl_type = (op->info.ex) ? F_WRLCK : F_RDLCK;
-               fl->fl_flags = FL_POSIX;
-               fl->fl_pid = op->info.pid;
+               fl->c.flc_type = (op->info.ex) ? F_WRLCK : F_RDLCK;
+               fl->c.flc_flags = FL_POSIX;
+               fl->c.flc_pid = op->info.pid;
                if (op->info.nodeid != dlm_our_nodeid())
-                       fl->fl_pid = -fl->fl_pid;
+                       fl->c.flc_pid = -fl->c.flc_pid;
                fl->fl_start = op->info.start;
                fl->fl_end = op->info.end;
                rv = 0;
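
The fl_* renames above come from the 6.9 split of struct file_lock: the fields shared with leases moved into an embedded struct file_lock_core (member `c`), and type tests grew helpers such as lock_is_write(). Abridged from memory, roughly matching include/linux/filelock.h:

    struct file_lock_core {
            fl_owner_t      flc_owner;
            unsigned int    flc_flags;      /* FL_POSIX, FL_SLEEP, ... */
            unsigned char   flc_type;       /* F_RDLCK, F_WRLCK, F_UNLCK */
            pid_t           flc_pid;
            /* ... list linkage, wait queue, owning struct file ... */
    };

    struct file_lock {
            struct file_lock_core c;
            loff_t fl_start, fl_end;        /* byte-range fields stay here */
            /* ... */
    };

    static inline bool lock_is_write(struct file_lock *fl)
    {
            return fl->c.flc_type == F_WRLCK;
    }
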
index 03bd55069d8600bb5b838150e1a74da0fc1e4a1d..2fe0f3af1a08ec5831d2609862e0e9a80c9b0475 100644 (file)
@@ -1949,16 +1949,6 @@ out:
        return rc;
 }
 
-static bool is_dot_dotdot(const char *name, size_t name_size)
-{
-       if (name_size == 1 && name[0] == '.')
-               return true;
-       else if (name_size == 2 && name[0] == '.' && name[1] == '.')
-               return true;
-
-       return false;
-}
-
 /**
  * ecryptfs_decode_and_decrypt_filename - converts the encoded cipher text name to decoded plaintext
  * @plaintext_name: The plaintext name
index 169252e6dc4616c7712126adde4a36ce8e2d6922..f7206158ee81385eeaab387fd16b05aea5a7634b 100644 (file)
@@ -38,7 +38,7 @@ struct efivar_entry {
 
 int efivar_init(int (*func)(efi_char16_t *, efi_guid_t, unsigned long, void *,
                            struct list_head *),
-               void *data, bool duplicates, struct list_head *head);
+               void *data, struct list_head *head);
 
 int efivar_entry_add(struct efivar_entry *entry, struct list_head *head);
 void __efivar_entry_add(struct efivar_entry *entry, struct list_head *head);
index 6038dd39367abe41430c55b04448ced7727dd287..bb14462f6d992a5506f4fda2158952cb410c96f3 100644 (file)
@@ -343,12 +343,7 @@ static int efivarfs_fill_super(struct super_block *sb, struct fs_context *fc)
        if (err)
                return err;
 
-       err = efivar_init(efivarfs_callback, (void *)sb, true,
-                         &sfi->efivarfs_list);
-       if (err)
-               efivar_entry_iter(efivarfs_destroy, &sfi->efivarfs_list, NULL);
-
-       return err;
+       return efivar_init(efivarfs_callback, sb, &sfi->efivarfs_list);
 }
 
 static int efivarfs_get_tree(struct fs_context *fc)
index 114ff0fd4e55732e2ebe0cdc8b20d82436571abf..4d722af1014f2a18198cc3e831d1fea68d46e251 100644 (file)
@@ -361,7 +361,6 @@ static void dup_variable_bug(efi_char16_t *str16, efi_guid_t *vendor_guid,
  * efivar_init - build the initial list of EFI variables
  * @func: callback function to invoke for every variable
  * @data: function-specific data to pass to @func
- * @duplicates: error if we encounter duplicates on @head?
  * @head: initialised head of variable list
  *
  * Get every EFI variable from the firmware and invoke @func. @func
@@ -371,9 +370,9 @@ static void dup_variable_bug(efi_char16_t *str16, efi_guid_t *vendor_guid,
  */
 int efivar_init(int (*func)(efi_char16_t *, efi_guid_t, unsigned long, void *,
                            struct list_head *),
-               void *data, bool duplicates, struct list_head *head)
+               void *data, struct list_head *head)
 {
-       unsigned long variable_name_size = 1024;
+       unsigned long variable_name_size = 512;
        efi_char16_t *variable_name;
        efi_status_t status;
        efi_guid_t vendor_guid;
@@ -390,12 +389,13 @@ int efivar_init(int (*func)(efi_char16_t *, efi_guid_t, unsigned long, void *,
                goto free;
 
        /*
-        * Per EFI spec, the maximum storage allocated for both
-        * the variable name and variable data is 1024 bytes.
+        * A small set of old UEFI implementations reject sizes
+        * above a certain threshold; the lowest threshold seen
+        * in the wild is 512.
         */
 
        do {
-               variable_name_size = 1024;
+               variable_name_size = 512;
 
                status = efivar_get_next_variable(&variable_name_size,
                                                  variable_name,
@@ -413,8 +413,7 @@ int efivar_init(int (*func)(efi_char16_t *, efi_guid_t, unsigned long, void *,
                         * we'll ever see a different variable name,
                         * and may end up looping here forever.
                         */
-                       if (duplicates &&
-                           variable_is_present(variable_name, &vendor_guid,
+                       if (variable_is_present(variable_name, &vendor_guid,
                                                head)) {
                                dup_variable_bug(variable_name, &vendor_guid,
                                                 variable_name_size);
@@ -432,9 +431,13 @@ int efivar_init(int (*func)(efi_char16_t *, efi_guid_t, unsigned long, void *,
                        break;
                case EFI_NOT_FOUND:
                        break;
+               case EFI_BUFFER_TOO_SMALL:
+                       pr_warn("efivars: Variable name size exceeds maximum (%lu > 512)\n",
+                               variable_name_size);
+                       status = EFI_NOT_FOUND;
+                       break;
                default:
-                       printk(KERN_WARNING "efivars: get_next_variable: status=%lx\n",
-                               status);
+                       pr_warn("efivars: get_next_variable: status=%lx\n", status);
                        status = EFI_NOT_FOUND;
                        break;
                }
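
The loop above leans on the UEFI GetNextVariableName() contract: the size argument is the buffer capacity on input and, on EFI_BUFFER_TOO_SMALL, is updated by the firmware to the capacity the next name needs, which is what the new warning prints. Distilled:

    unsigned long size;
    efi_status_t status;

    do {
            size = 512;     /* buffer capacity on input */
            status = efivar_get_next_variable(&size, variable_name,
                                              &vendor_guid);
            /*
             * EFI_SUCCESS:          name/GUID returned, size updated
             * EFI_NOT_FOUND:        enumeration finished
             * EFI_BUFFER_TOO_SMALL: size now holds the needed capacity
             */
    } while (status == EFI_SUCCESS);
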
index f17fdac76b2eea716631f63bce3ca2c66b14a2fc..e4421c10caebe5a5114cbcbe76c717272f6d14f6 100644 (file)
 #include <linux/buffer_head.h>
 #include <linux/vfs.h>
 #include <linux/blkdev.h>
-
+#include <linux/fs_context.h>
+#include <linux/fs_parser.h>
 #include "efs.h"
 #include <linux/efs_vh.h>
 #include <linux/efs_fs_sb.h>
 
 static int efs_statfs(struct dentry *dentry, struct kstatfs *buf);
-static int efs_fill_super(struct super_block *s, void *d, int silent);
-
-static struct dentry *efs_mount(struct file_system_type *fs_type,
-       int flags, const char *dev_name, void *data)
-{
-       return mount_bdev(fs_type, flags, dev_name, data, efs_fill_super);
-}
+static int efs_init_fs_context(struct fs_context *fc);
 
 static void efs_kill_sb(struct super_block *s)
 {
@@ -35,15 +30,6 @@ static void efs_kill_sb(struct super_block *s)
        kfree(sbi);
 }
 
-static struct file_system_type efs_fs_type = {
-       .owner          = THIS_MODULE,
-       .name           = "efs",
-       .mount          = efs_mount,
-       .kill_sb        = efs_kill_sb,
-       .fs_flags       = FS_REQUIRES_DEV,
-};
-MODULE_ALIAS_FS("efs");
-
 static struct pt_types sgi_pt_types[] = {
        {0x00,          "SGI vh"},
        {0x01,          "SGI trkrepl"},
@@ -63,6 +49,27 @@ static struct pt_types sgi_pt_types[] = {
        {0,             NULL}
 };
 
+enum {
+       Opt_explicit_open,
+};
+
+static const struct fs_parameter_spec efs_param_spec[] = {
+       fsparam_flag    ("explicit-open",       Opt_explicit_open),
+       {}
+};
+
+/*
+ * File system definition and registration.
+ */
+static struct file_system_type efs_fs_type = {
+       .owner                  = THIS_MODULE,
+       .name                   = "efs",
+       .kill_sb                = efs_kill_sb,
+       .fs_flags               = FS_REQUIRES_DEV,
+       .init_fs_context        = efs_init_fs_context,
+       .parameters             = efs_param_spec,
+};
+MODULE_ALIAS_FS("efs");
 
 static struct kmem_cache * efs_inode_cachep;
 
@@ -91,8 +98,8 @@ static int __init init_inodecache(void)
 {
        efs_inode_cachep = kmem_cache_create("efs_inode_cache",
                                sizeof(struct efs_inode_info), 0,
-                               SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|
-                               SLAB_ACCOUNT, init_once);
+                               SLAB_RECLAIM_ACCOUNT|SLAB_ACCOUNT,
+                               init_once);
        if (efs_inode_cachep == NULL)
                return -ENOMEM;
        return 0;
@@ -108,18 +115,10 @@ static void destroy_inodecache(void)
        kmem_cache_destroy(efs_inode_cachep);
 }
 
-static int efs_remount(struct super_block *sb, int *flags, char *data)
-{
-       sync_filesystem(sb);
-       *flags |= SB_RDONLY;
-       return 0;
-}
-
 static const struct super_operations efs_superblock_operations = {
        .alloc_inode    = efs_alloc_inode,
        .free_inode     = efs_free_inode,
        .statfs         = efs_statfs,
-       .remount_fs     = efs_remount,
 };
 
 static const struct export_operations efs_export_ops = {
@@ -249,26 +248,26 @@ static int efs_validate_super(struct efs_sb_info *sb, struct efs_super *super) {
        return 0;    
 }
 
-static int efs_fill_super(struct super_block *s, void *d, int silent)
+static int efs_fill_super(struct super_block *s, struct fs_context *fc)
 {
        struct efs_sb_info *sb;
        struct buffer_head *bh;
        struct inode *root;
 
-       sb = kzalloc(sizeof(struct efs_sb_info), GFP_KERNEL);
+       sb = kzalloc(sizeof(struct efs_sb_info), GFP_KERNEL);
        if (!sb)
                return -ENOMEM;
        s->s_fs_info = sb;
        s->s_time_min = 0;
        s->s_time_max = U32_MAX;
+
        s->s_magic              = EFS_SUPER_MAGIC;
        if (!sb_set_blocksize(s, EFS_BLOCKSIZE)) {
                pr_err("device does not support %d byte blocks\n",
                        EFS_BLOCKSIZE);
                return -EINVAL;
        }
-  
+
        /* read the vh (volume header) block */
        bh = sb_bread(s, 0);
 
@@ -294,7 +293,7 @@ static int efs_fill_super(struct super_block *s, void *d, int silent)
                pr_err("cannot read superblock\n");
                return -EIO;
        }
-               
+
        if (efs_validate_super(sb, (struct efs_super *) bh->b_data)) {
 #ifdef DEBUG
                pr_warn("invalid superblock at block %u\n",
@@ -328,6 +327,61 @@ static int efs_fill_super(struct super_block *s, void *d, int silent)
        return 0;
 }
 
+static void efs_free_fc(struct fs_context *fc)
+{
+       kfree(fc->fs_private);
+}
+
+static int efs_get_tree(struct fs_context *fc)
+{
+       return get_tree_bdev(fc, efs_fill_super);
+}
+
+static int efs_parse_param(struct fs_context *fc, struct fs_parameter *param)
+{
+       int token;
+       struct fs_parse_result result;
+
+       token = fs_parse(fc, efs_param_spec, param, &result);
+       if (token < 0)
+               return token;
+       return 0;
+}
+
+static int efs_reconfigure(struct fs_context *fc)
+{
+       sync_filesystem(fc->root->d_sb);
+
+       return 0;
+}
+
+struct efs_context {
+       unsigned long s_mount_opts;
+};
+
+static const struct fs_context_operations efs_context_opts = {
+       .parse_param    = efs_parse_param,
+       .get_tree       = efs_get_tree,
+       .reconfigure    = efs_reconfigure,
+       .free           = efs_free_fc,
+};
+
+/*
+ * Set up the filesystem mount context.
+ */
+static int efs_init_fs_context(struct fs_context *fc)
+{
+       struct efs_context *ctx;
+
+       ctx = kzalloc(sizeof(struct efs_context), GFP_KERNEL);
+       if (!ctx)
+               return -ENOMEM;
+       fc->fs_private = ctx;
+       fc->ops = &efs_context_opts;
+
+       return 0;
+}
+
 static int efs_statfs(struct dentry *dentry, struct kstatfs *buf) {
        struct super_block *sb = dentry->d_sb;
        struct efs_sb_info *sbi = SUPER_INFO(sb);
index 279933e007d21798549df035b4aa595597f225b6..7cc5841577b240f90f9a623e64adc87c3fb24982 100644 (file)
 struct z_erofs_decompress_req {
        struct super_block *sb;
        struct page **in, **out;
-
        unsigned short pageofs_in, pageofs_out;
        unsigned int inputsize, outputsize;
 
-       /* indicate the algorithm will be used for decompression */
-       unsigned int alg;
+       unsigned int alg;       /* the algorithm for decompression */
        bool inplace_io, partial_decoding, fillgaps;
+       gfp_t gfp;      /* allocation flags for extra temporary buffers */
 };
 
 struct z_erofs_decompressor {
index c98aeda8abb215e9be577d1b27dea2713b0b6e87..52524bd9698b43591e767cd0e45fc3c807375c58 100644 (file)
@@ -220,7 +220,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
                        up_read(&devs->rwsem);
                        return 0;
                }
-               map->m_bdev = dif->bdev_handle ? dif->bdev_handle->bdev : NULL;
+               map->m_bdev = dif->bdev_file ? file_bdev(dif->bdev_file) : NULL;
                map->m_daxdev = dif->dax_dev;
                map->m_dax_part_off = dif->dax_part_off;
                map->m_fscache = dif->fscache;
@@ -238,8 +238,8 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
                        if (map->m_pa >= startoff &&
                            map->m_pa < startoff + length) {
                                map->m_pa -= startoff;
-                               map->m_bdev = dif->bdev_handle ?
-                                             dif->bdev_handle->bdev : NULL;
+                               map->m_bdev = dif->bdev_file ?
+                                             file_bdev(dif->bdev_file) : NULL;
                                map->m_daxdev = dif->dax_dev;
                                map->m_dax_part_off = dif->dax_part_off;
                                map->m_fscache = dif->fscache;
@@ -447,5 +447,6 @@ const struct file_operations erofs_file_fops = {
        .llseek         = generic_file_llseek,
        .read_iter      = erofs_file_read_iter,
        .mmap           = erofs_file_mmap,
+       .get_unmapped_area = thp_get_unmapped_area,
        .splice_read    = filemap_splice_read,
 };
index 072ef6a66823ef351923f2c0514c9ddec50e5d8f..2ec9b2bb628d6b03bdf454c3fc6457b4065aabc8 100644 (file)
@@ -111,8 +111,9 @@ static int z_erofs_lz4_prepare_dstpages(struct z_erofs_lz4_decompress_ctx *ctx,
                        victim = availables[--top];
                        get_page(victim);
                } else {
-                       victim = erofs_allocpage(pagepool,
-                                                GFP_KERNEL | __GFP_NOFAIL);
+                       victim = erofs_allocpage(pagepool, rq->gfp);
+                       if (!victim)
+                               return -ENOMEM;
                        set_page_private(victim, Z_EROFS_SHORTLIVED_PAGE);
                }
                rq->out[i] = victim;
@@ -322,7 +323,8 @@ static int z_erofs_transform_plain(struct z_erofs_decompress_req *rq,
        unsigned int cur = 0, ni = 0, no, pi, po, insz, cnt;
        u8 *kin;
 
-       DBG_BUGON(rq->outputsize > rq->inputsize);
+       if (rq->outputsize > rq->inputsize)
+               return -EOPNOTSUPP;
        if (rq->alg == Z_EROFS_COMPRESSION_INTERLACED) {
                cur = bs - (rq->pageofs_out & (bs - 1));
                pi = (rq->pageofs_in + rq->inputsize - cur) & ~PAGE_MASK;
index 4a64a9c91dd322379d2c4be2268f6c4c24f995ee..b98872058abe82d4034b84c1c93c46645b50968b 100644 (file)
@@ -95,7 +95,7 @@ int z_erofs_load_deflate_config(struct super_block *sb,
 }
 
 int z_erofs_deflate_decompress(struct z_erofs_decompress_req *rq,
-                              struct page **pagepool)
+                              struct page **pgpl)
 {
        const unsigned int nrpages_out =
                PAGE_ALIGN(rq->pageofs_out + rq->outputsize) >> PAGE_SHIFT;
@@ -158,8 +158,12 @@ again:
                        strm->z.avail_out = min_t(u32, outsz, PAGE_SIZE - pofs);
                        outsz -= strm->z.avail_out;
                        if (!rq->out[no]) {
-                               rq->out[no] = erofs_allocpage(pagepool,
-                                               GFP_KERNEL | __GFP_NOFAIL);
+                               rq->out[no] = erofs_allocpage(pgpl, rq->gfp);
+                               if (!rq->out[no]) {
+                                       kout = NULL;
+                                       err = -ENOMEM;
+                                       break;
+                               }
                                set_page_private(rq->out[no],
                                                 Z_EROFS_SHORTLIVED_PAGE);
                        }
@@ -211,8 +215,11 @@ again:
 
                        DBG_BUGON(erofs_page_is_managed(EROFS_SB(sb),
                                                        rq->in[j]));
-                       tmppage = erofs_allocpage(pagepool,
-                                                 GFP_KERNEL | __GFP_NOFAIL);
+                       tmppage = erofs_allocpage(pgpl, rq->gfp);
+                       if (!tmppage) {
+                               err = -ENOMEM;
+                               goto failed;
+                       }
                        set_page_private(tmppage, Z_EROFS_SHORTLIVED_PAGE);
                        copy_highpage(tmppage, rq->in[j]);
                        rq->in[j] = tmppage;
@@ -230,7 +237,7 @@ again:
                        break;
                }
        }
-
+failed:
        if (zlib_inflateEnd(&strm->z) != Z_OK && !err)
                err = -EIO;
        if (kout)
index 2dd14f99c1dc10eeea57eedfccbb649bf184828f..6ca357d83cfa458225f20e2d6f6a45307fef2194 100644 (file)
@@ -148,7 +148,7 @@ again:
 }
 
 int z_erofs_lzma_decompress(struct z_erofs_decompress_req *rq,
-                           struct page **pagepool)
+                           struct page **pgpl)
 {
        const unsigned int nrpages_out =
                PAGE_ALIGN(rq->pageofs_out + rq->outputsize) >> PAGE_SHIFT;
@@ -215,8 +215,11 @@ again:
                                                   PAGE_SIZE - pageofs);
                        outlen -= strm->buf.out_size;
                        if (!rq->out[no] && rq->fillgaps) {     /* deduped */
-                               rq->out[no] = erofs_allocpage(pagepool,
-                                               GFP_KERNEL | __GFP_NOFAIL);
+                               rq->out[no] = erofs_allocpage(pgpl, rq->gfp);
+                               if (!rq->out[no]) {
+                                       err = -ENOMEM;
+                                       break;
+                               }
                                set_page_private(rq->out[no],
                                                 Z_EROFS_SHORTLIVED_PAGE);
                        }
@@ -258,8 +261,11 @@ again:
 
                        DBG_BUGON(erofs_page_is_managed(EROFS_SB(rq->sb),
                                                        rq->in[j]));
-                       tmppage = erofs_allocpage(pagepool,
-                                                 GFP_KERNEL | __GFP_NOFAIL);
+                       tmppage = erofs_allocpage(pgpl, rq->gfp);
+                       if (!tmppage) {
+                               err = -ENOMEM;
+                               goto failed;
+                       }
                        set_page_private(tmppage, Z_EROFS_SHORTLIVED_PAGE);
                        copy_highpage(tmppage, rq->in[j]);
                        rq->in[j] = tmppage;
@@ -277,6 +283,7 @@ again:
                        break;
                }
        }
+failed:
        if (no < nrpages_out && strm->buf.out)
                kunmap(rq->out[no]);
        if (ni < nrpages_in)
index bc12030393b24f26231fb363ac07e3150cd6babb..89a7c2453aae6f130e679af1459673397d581842 100644 (file)
@@ -381,11 +381,12 @@ static int erofs_fscache_init_domain(struct super_block *sb)
                goto out;
 
        if (!erofs_pseudo_mnt) {
-               erofs_pseudo_mnt = kern_mount(&erofs_fs_type);
-               if (IS_ERR(erofs_pseudo_mnt)) {
-                       err = PTR_ERR(erofs_pseudo_mnt);
+               struct vfsmount *mnt = kern_mount(&erofs_fs_type);
+               if (IS_ERR(mnt)) {
+                       err = PTR_ERR(mnt);
                        goto out;
                }
+               erofs_pseudo_mnt = mnt;
        }
 
        domain->volume = sbi->volume;
@@ -459,7 +460,7 @@ static struct erofs_fscache *erofs_fscache_acquire_cookie(struct super_block *sb
 
        inode->i_size = OFFSET_MAX;
        inode->i_mapping->a_ops = &erofs_fscache_meta_aops;
-       mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS);
+       mapping_set_gfp_mask(inode->i_mapping, GFP_KERNEL);
        inode->i_blkbits = EROFS_SB(sb)->blkszbits;
        inode->i_private = ctx;
 
index 3d616dea55dc3dbccbac495988f865947b0d2a96..36e638e8b53a3d290fcb7ade23a40dc4805be9e6 100644 (file)
@@ -60,7 +60,7 @@ static void *erofs_read_inode(struct erofs_buf *buf,
                } else {
                        const unsigned int gotten = sb->s_blocksize - *ofs;
 
-                       copied = kmalloc(vi->inode_isize, GFP_NOFS);
+                       copied = kmalloc(vi->inode_isize, GFP_KERNEL);
                        if (!copied) {
                                err = -ENOMEM;
                                goto err_out;
index b0409badb0172387f8b96c03f69267da5403b68d..0f0706325b7b4753f93c515ec423f35466e5224d 100644 (file)
@@ -49,7 +49,7 @@ typedef u32 erofs_blk_t;
 struct erofs_device_info {
        char *path;
        struct erofs_fscache *fscache;
-       struct bdev_handle *bdev_handle;
+       struct file *bdev_file;
        struct dax_device *dax_dev;
        u64 dax_part_off;
 
index d4f631d39f0fa83141eafcb13951bb3fd36598bd..f0110a78acb2078aa2ce6eae13e39481e46b7ea9 100644 (file)
@@ -130,24 +130,24 @@ static void *erofs_find_target_block(struct erofs_buf *target,
                        /* string comparison without already matched prefix */
                        diff = erofs_dirnamecmp(name, &dname, &matched);
 
-                       if (!diff) {
-                               *_ndirents = 0;
-                               goto out;
-                       } else if (diff > 0) {
-                               head = mid + 1;
-                               startprfx = matched;
-
-                               if (!IS_ERR(candidate))
-                                       erofs_put_metabuf(target);
-                               *target = buf;
-                               candidate = de;
-                               *_ndirents = ndirents;
-                       } else {
+                       if (diff < 0) {
                                erofs_put_metabuf(&buf);
-
                                back = mid - 1;
                                endprfx = matched;
+                               continue;
+                       }
+
+                       if (!IS_ERR(candidate))
+                               erofs_put_metabuf(target);
+                       *target = buf;
+                       if (!diff) {
+                               *_ndirents = 0;
+                               return de;
                        }
+                       head = mid + 1;
+                       startprfx = matched;
+                       candidate = de;
+                       *_ndirents = ndirents;
                        continue;
                }
 out:           /* free if the candidate is valid */
index 5f60f163bd56e272d167399ddac9a95de42b1b34..9b4b66dcdd4f10d5f8338c398e394b6601941818 100644 (file)
@@ -177,7 +177,7 @@ static int erofs_init_device(struct erofs_buf *buf, struct super_block *sb,
        struct erofs_sb_info *sbi = EROFS_SB(sb);
        struct erofs_fscache *fscache;
        struct erofs_deviceslot *dis;
-       struct bdev_handle *bdev_handle;
+       struct file *bdev_file;
        void *ptr;
 
        ptr = erofs_read_metabuf(buf, sb, erofs_blknr(sb, *pos), EROFS_KMAP);
@@ -201,12 +201,12 @@ static int erofs_init_device(struct erofs_buf *buf, struct super_block *sb,
                        return PTR_ERR(fscache);
                dif->fscache = fscache;
        } else if (!sbi->devs->flatdev) {
-               bdev_handle = bdev_open_by_path(dif->path, BLK_OPEN_READ,
+               bdev_file = bdev_file_open_by_path(dif->path, BLK_OPEN_READ,
                                                sb->s_type, NULL);
-               if (IS_ERR(bdev_handle))
-                       return PTR_ERR(bdev_handle);
-               dif->bdev_handle = bdev_handle;
-               dif->dax_dev = fs_dax_get_by_bdev(bdev_handle->bdev,
+               if (IS_ERR(bdev_file))
+                       return PTR_ERR(bdev_file);
+               dif->bdev_file = bdev_file;
+               dif->dax_dev = fs_dax_get_by_bdev(file_bdev(bdev_file),
                                &dif->dax_part_off, NULL, NULL);
        }
 
@@ -754,8 +754,8 @@ static int erofs_release_device_info(int id, void *ptr, void *data)
        struct erofs_device_info *dif = ptr;
 
        fs_put_dax(dif->dax_dev, NULL);
-       if (dif->bdev_handle)
-               bdev_release(dif->bdev_handle);
+       if (dif->bdev_file)
+               fput(dif->bdev_file);
        erofs_fscache_unregister_cookie(dif->fscache);
        dif->fscache = NULL;
        kfree(dif->path);
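
The bdev_handle to bdev_file conversion changes the open/use/release pattern for extra block devices; distilled from the hunks above:

    struct file *bf = bdev_file_open_by_path(path, BLK_OPEN_READ,
                                             holder, NULL);
    if (IS_ERR(bf))
            return PTR_ERR(bf);

    struct block_device *bdev = file_bdev(bf);      /* borrow the bdev */
    /* ... use bdev ... */
    fput(bf);                                       /* replaces bdev_release() */
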
index 5dea308764b45038f8236bf31b004067f0f297a6..e146d09151af4188efe4cb7bf2ad4a938b8596af 100644 (file)
@@ -81,7 +81,7 @@ struct erofs_workgroup *erofs_insert_workgroup(struct super_block *sb,
 repeat:
        xa_lock(&sbi->managed_pslots);
        pre = __xa_cmpxchg(&sbi->managed_pslots, grp->index,
-                          NULL, grp, GFP_NOFS);
+                          NULL, grp, GFP_KERNEL);
        if (pre) {
                if (xa_is_err(pre)) {
                        pre = ERR_PTR(xa_err(pre));
index 692c0c39be638dc4b2454b63968a0467043ddc7a..ff0aa72b0db342f10ed7c1b565d2cc7bd6a540ff 100644 (file)
@@ -82,6 +82,9 @@ struct z_erofs_pcluster {
        /* L: indicate several pageofs_outs or not */
        bool multibases;
 
+       /* L: whether extra buffer allocations are best-effort */
+       bool besteffort;
+
        /* A: compressed bvecs (can be cached or inplaced pages) */
        struct z_erofs_bvec compressed_bvecs[];
 };
@@ -230,7 +233,7 @@ static int z_erofs_bvec_enqueue(struct z_erofs_bvec_iter *iter,
                struct page *nextpage = *candidate_bvpage;
 
                if (!nextpage) {
-                       nextpage = erofs_allocpage(pagepool, GFP_NOFS);
+                       nextpage = erofs_allocpage(pagepool, GFP_KERNEL);
                        if (!nextpage)
                                return -ENOMEM;
                        set_page_private(nextpage, Z_EROFS_SHORTLIVED_PAGE);
@@ -302,7 +305,7 @@ static struct z_erofs_pcluster *z_erofs_alloc_pcluster(unsigned int size)
                if (nrpages > pcs->maxpages)
                        continue;
 
-               pcl = kmem_cache_zalloc(pcs->slab, GFP_NOFS);
+               pcl = kmem_cache_zalloc(pcs->slab, GFP_KERNEL);
                if (!pcl)
                        return ERR_PTR(-ENOMEM);
                pcl->pclustersize = size;
@@ -563,21 +566,19 @@ static void z_erofs_bind_cache(struct z_erofs_decompress_frontend *fe)
                        __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN;
        unsigned int i;
 
-       if (i_blocksize(fe->inode) != PAGE_SIZE)
-               return;
-       if (fe->mode < Z_EROFS_PCLUSTER_FOLLOWED)
+       if (i_blocksize(fe->inode) != PAGE_SIZE ||
+           fe->mode < Z_EROFS_PCLUSTER_FOLLOWED)
                return;
 
        for (i = 0; i < pclusterpages; ++i) {
                struct page *page, *newpage;
                void *t;        /* mark pages just found for debugging */
 
-               /* the compressed page was loaded before */
+               /* Inaccurate check w/o locking to avoid unneeded lookups */
                if (READ_ONCE(pcl->compressed_bvecs[i].page))
                        continue;
 
                page = find_get_page(mc, pcl->obj.index + i);
-
                if (page) {
                        t = (void *)((unsigned long)page | 1);
                        newpage = NULL;
@@ -597,9 +598,13 @@ static void z_erofs_bind_cache(struct z_erofs_decompress_frontend *fe)
                        set_page_private(newpage, Z_EROFS_PREALLOCATED_PAGE);
                        t = (void *)((unsigned long)newpage | 1);
                }
-
-               if (!cmpxchg_relaxed(&pcl->compressed_bvecs[i].page, NULL, t))
+               spin_lock(&pcl->obj.lockref.lock);
+               if (!pcl->compressed_bvecs[i].page) {
+                       pcl->compressed_bvecs[i].page = t;
+                       spin_unlock(&pcl->obj.lockref.lock);
                        continue;
+               }
+               spin_unlock(&pcl->obj.lockref.lock);
 
                if (page)
                        put_page(page);
@@ -694,7 +699,7 @@ static void z_erofs_cache_invalidate_folio(struct folio *folio,
        DBG_BUGON(stop > folio_size(folio) || stop < length);
 
        if (offset == 0 && stop == folio_size(folio))
-               while (!z_erofs_cache_release_folio(folio, GFP_NOFS))
+               while (!z_erofs_cache_release_folio(folio, 0))
                        cond_resched();
 }
 
@@ -713,36 +718,30 @@ int erofs_init_managed_cache(struct super_block *sb)
        set_nlink(inode, 1);
        inode->i_size = OFFSET_MAX;
        inode->i_mapping->a_ops = &z_erofs_cache_aops;
-       mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS);
+       mapping_set_gfp_mask(inode->i_mapping, GFP_KERNEL);
        EROFS_SB(sb)->managed_cache = inode;
        return 0;
 }
 
-static bool z_erofs_try_inplace_io(struct z_erofs_decompress_frontend *fe,
-                                  struct z_erofs_bvec *bvec)
-{
-       struct z_erofs_pcluster *const pcl = fe->pcl;
-
-       while (fe->icur > 0) {
-               if (!cmpxchg(&pcl->compressed_bvecs[--fe->icur].page,
-                            NULL, bvec->page)) {
-                       pcl->compressed_bvecs[fe->icur] = *bvec;
-                       return true;
-               }
-       }
-       return false;
-}
-
 /* callers must be with pcluster lock held */
 static int z_erofs_attach_page(struct z_erofs_decompress_frontend *fe,
                               struct z_erofs_bvec *bvec, bool exclusive)
 {
+       struct z_erofs_pcluster *pcl = fe->pcl;
        int ret;
 
        if (exclusive) {
                /* give priority for inplaceio to use file pages first */
-               if (z_erofs_try_inplace_io(fe, bvec))
+               spin_lock(&pcl->obj.lockref.lock);
+               while (fe->icur > 0) {
+                       if (pcl->compressed_bvecs[--fe->icur].page)
+                               continue;
+                       pcl->compressed_bvecs[fe->icur] = *bvec;
+                       spin_unlock(&pcl->obj.lockref.lock);
                        return 0;
+               }
+               spin_unlock(&pcl->obj.lockref.lock);
+
                /* otherwise, check if it can be used as a bvpage */
                if (fe->mode >= Z_EROFS_PCLUSTER_FOLLOWED &&
                    !fe->candidate_bvpage)
@@ -964,7 +963,7 @@ static int z_erofs_read_fragment(struct super_block *sb, struct page *page,
 }
 
 static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe,
-                               struct page *page)
+                               struct page *page, bool ra)
 {
        struct inode *const inode = fe->inode;
        struct erofs_map_blocks *const map = &fe->map;
@@ -1014,6 +1013,7 @@ repeat:
                err = z_erofs_pcluster_begin(fe);
                if (err)
                        goto out;
+               fe->pcl->besteffort |= !ra;
        }
 
        /*
@@ -1280,6 +1280,9 @@ static int z_erofs_decompress_pcluster(struct z_erofs_decompress_backend *be,
                                        .inplace_io = overlapped,
                                        .partial_decoding = pcl->partial,
                                        .fillgaps = pcl->multibases,
+                                       .gfp = pcl->besteffort ?
+                                               GFP_KERNEL | __GFP_NOFAIL :
+                                               GFP_NOWAIT | __GFP_NORETRY
                                 }, be->pagepool);
 
        /* must handle all compressed pages before actual file pages */
@@ -1322,6 +1325,7 @@ static int z_erofs_decompress_pcluster(struct z_erofs_decompress_backend *be,
        pcl->length = 0;
        pcl->partial = true;
        pcl->multibases = false;
+       pcl->besteffort = false;
        pcl->bvset.nextpage = NULL;
        pcl->vcnt = 0;
 
@@ -1423,23 +1427,26 @@ static void z_erofs_fill_bio_vec(struct bio_vec *bvec,
 {
        gfp_t gfp = mapping_gfp_mask(mc);
        bool tocache = false;
-       struct z_erofs_bvec *zbv = pcl->compressed_bvecs + nr;
+       struct z_erofs_bvec zbv;
        struct address_space *mapping;
-       struct page *page, *oldpage;
+       struct page *page;
        int justfound, bs = i_blocksize(f->inode);
 
        /* Except for inplace pages, the entire page can be used for I/Os */
        bvec->bv_offset = 0;
        bvec->bv_len = PAGE_SIZE;
 repeat:
-       oldpage = READ_ONCE(zbv->page);
-       if (!oldpage)
+       spin_lock(&pcl->obj.lockref.lock);
+       zbv = pcl->compressed_bvecs[nr];
+       page = zbv.page;
+       justfound = (unsigned long)page & 1UL;
+       page = (struct page *)((unsigned long)page & ~1UL);
+       pcl->compressed_bvecs[nr].page = page;
+       spin_unlock(&pcl->obj.lockref.lock);
+       if (!page)
                goto out_allocpage;
 
-       justfound = (unsigned long)oldpage & 1UL;
-       page = (struct page *)((unsigned long)oldpage & ~1UL);
        bvec->bv_page = page;
-
        DBG_BUGON(z_erofs_is_shortlived_page(page));
        /*
         * Handle preallocated cached pages.  We tried to allocate such pages
@@ -1448,7 +1455,6 @@ repeat:
         */
        if (page->private == Z_EROFS_PREALLOCATED_PAGE) {
                set_page_private(page, 0);
-               WRITE_ONCE(zbv->page, page);
                tocache = true;
                goto out_tocache;
        }
@@ -1459,9 +1465,9 @@ repeat:
         * therefore it is impossible for `mapping` to be NULL.
         */
        if (mapping && mapping != mc) {
-               if (zbv->offset < 0)
-                       bvec->bv_offset = round_up(-zbv->offset, bs);
-               bvec->bv_len = round_up(zbv->end, bs) - bvec->bv_offset;
+               if (zbv.offset < 0)
+                       bvec->bv_offset = round_up(-zbv.offset, bs);
+               bvec->bv_len = round_up(zbv.end, bs) - bvec->bv_offset;
                return;
        }
 
@@ -1471,7 +1477,6 @@ repeat:
 
        /* the cached page is still in managed cache */
        if (page->mapping == mc) {
-               WRITE_ONCE(zbv->page, page);
                /*
                 * The cached page is still available but without a valid
                 * `->private` pcluster hint.  Let's reconnect them.
@@ -1503,11 +1508,15 @@ repeat:
        put_page(page);
 out_allocpage:
        page = erofs_allocpage(&f->pagepool, gfp | __GFP_NOFAIL);
-       if (oldpage != cmpxchg(&zbv->page, oldpage, page)) {
+       spin_lock(&pcl->obj.lockref.lock);
+       if (pcl->compressed_bvecs[nr].page) {
                erofs_pagepool_add(&f->pagepool, page);
+               spin_unlock(&pcl->obj.lockref.lock);
                cond_resched();
                goto repeat;
        }
+       pcl->compressed_bvecs[nr].page = page;
+       spin_unlock(&pcl->obj.lockref.lock);
        bvec->bv_page = page;
 out_tocache:
        if (!tocache || bs != PAGE_SIZE ||
@@ -1685,6 +1694,7 @@ submit_bio_retry:
 
                        if (cur + bvec.bv_len > end)
                                bvec.bv_len = end - cur;
+                       DBG_BUGON(bvec.bv_len < sb->s_blocksize);
                        if (!bio_add_page(bio, bvec.bv_page, bvec.bv_len,
                                          bvec.bv_offset))
                                goto submit_bio_retry;
@@ -1785,7 +1795,7 @@ static void z_erofs_pcluster_readmore(struct z_erofs_decompress_frontend *f,
                        if (PageUptodate(page))
                                unlock_page(page);
                        else
-                               (void)z_erofs_do_read_page(f, page);
+                               (void)z_erofs_do_read_page(f, page, !!rac);
                        put_page(page);
                }
 
@@ -1806,7 +1816,7 @@ static int z_erofs_read_folio(struct file *file, struct folio *folio)
        f.headoffset = (erofs_off_t)folio->index << PAGE_SHIFT;
 
        z_erofs_pcluster_readmore(&f, NULL, true);
-       err = z_erofs_do_read_page(&f, &folio->page);
+       err = z_erofs_do_read_page(&f, &folio->page, false);
        z_erofs_pcluster_readmore(&f, NULL, false);
        z_erofs_pcluster_end(&f);
 
@@ -1847,7 +1857,7 @@ static void z_erofs_readahead(struct readahead_control *rac)
                folio = head;
                head = folio_get_private(folio);
 
-               err = z_erofs_do_read_page(&f, &folio->page);
+               err = z_erofs_do_read_page(&f, &folio->page, true);
                if (err && err != -EINTR)
                        erofs_err(inode->i_sb, "readahead error at folio %lu @ nid %llu",
                                  folio->index, EROFS_I(inode)->nid);
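
The net effect of the besteffort plumbing is a two-tier allocation policy for temporary decompression pages: anything a synchronous read depends on may block and must not fail, while pure readahead allocates opportunistically and lets decompression bail out with -ENOMEM (which the decompressor changes earlier in this series now tolerate). Distilled from the .gfp initializer above:

    static gfp_t z_erofs_extra_gfp(bool besteffort) /* pcl->besteffort */
    {
            return besteffort ?
                    GFP_KERNEL | __GFP_NOFAIL :     /* sync read path */
                    GFP_NOWAIT | __GFP_NORETRY;     /* readahead */
    }
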
index ad8186d47ba76062f1540835c9a5a0a64f560cd5..9afdb722fa9257f4f0dfac12146a229933c11441 100644 (file)
@@ -251,7 +251,7 @@ static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t c
        ssize_t res;
        __u64 ucnt;
 
-       if (count < sizeof(ucnt))
+       if (count != sizeof(ucnt))
                return -EINVAL;
        if (copy_from_user(&ucnt, buf, sizeof(ucnt)))
                return -EFAULT;
@@ -283,13 +283,18 @@ static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t c
 static void eventfd_show_fdinfo(struct seq_file *m, struct file *f)
 {
        struct eventfd_ctx *ctx = f->private_data;
+       __u64 cnt;
 
        spin_lock_irq(&ctx->wqh.lock);
-       seq_printf(m, "eventfd-count: %16llx\n",
-                  (unsigned long long)ctx->count);
+       cnt = ctx->count;
        spin_unlock_irq(&ctx->wqh.lock);
-       seq_printf(m, "eventfd-id: %d\n", ctx->id);
-       seq_printf(m, "eventfd-semaphore: %d\n",
+
+       seq_printf(m,
+                  "eventfd-count: %16llx\n"
+                  "eventfd-id: %d\n"
+                  "eventfd-semaphore: %d\n",
+                  cnt,
+                  ctx->id,
                   !!(ctx->flags & EFD_SEMAPHORE));
 }
 #endif
@@ -383,6 +388,7 @@ static int do_eventfd(unsigned int count, int flags)
        /* Check the EFD_* constants for consistency.  */
        BUILD_BUG_ON(EFD_CLOEXEC != O_CLOEXEC);
        BUILD_BUG_ON(EFD_NONBLOCK != O_NONBLOCK);
+       BUILD_BUG_ON(EFD_SEMAPHORE != (1 << 0));
 
        if (flags & ~EFD_FLAGS_SET)
                return -EINVAL;
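
eventfd counters are a fixed eight-byte ABI; the write side previously rejected only short buffers, and now requires exactly eight bytes, matching the read side and the documented interface. A minimal userspace round trip:

    #include <stdint.h>
    #include <sys/eventfd.h>
    #include <unistd.h>

    int main(void)
    {
            int efd = eventfd(0, 0);
            uint64_t v = 1;

            write(efd, &v, sizeof(v));  /* must be exactly 8 bytes; any
                                           other count now gets EINVAL */
            read(efd, &v, sizeof(v));
            return 0;
    }
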
index 3534d36a147400079bfd618d198e47fe669aee5f..39ac6fdf8bcab38533b3c2f4e5b4bb97ed3432fa 100644 (file)
@@ -206,7 +206,7 @@ struct eventpoll {
         */
        struct epitem *ovflist;
 
-       /* wakeup_source used when ep_scan_ready_list is running */
+       /* wakeup_source used when ep_send_events or __ep_eventpoll_poll is running */
        struct wakeup_source *ws;
 
        /* The user that created the eventpoll descriptor */
@@ -678,12 +678,6 @@ static void ep_done_scan(struct eventpoll *ep,
        write_unlock_irq(&ep->lock);
 }
 
-static void epi_rcu_free(struct rcu_head *head)
-{
-       struct epitem *epi = container_of(head, struct epitem, rcu);
-       kmem_cache_free(epi_cache, epi);
-}
-
 static void ep_get(struct eventpoll *ep)
 {
        refcount_inc(&ep->refcount);
@@ -767,7 +761,7 @@ static bool __ep_remove(struct eventpoll *ep, struct epitem *epi, bool force)
         * ep->mtx. The rcu read side, reverse_path_check_proc(), does not make
         * use of the rbn field.
         */
-       call_rcu(&epi->rcu, epi_rcu_free);
+       kfree_rcu(epi, rcu);
 
        percpu_counter_dec(&ep->user->epoll_watches);
        return ep_refcount_dec_and_test(ep);
@@ -1153,7 +1147,7 @@ static inline bool chain_epi_lockless(struct epitem *epi)
  * This callback takes a read lock in order not to contend with concurrent
  * events from another file descriptor, thus all modifications to ->rdllist
  * or ->ovflist are lockless.  Read lock is paired with the write lock from
- * ep_scan_ready_list(), which stops all list modifications and guarantees
+ * ep_start/done_scan(), which stops all list modifications and guarantees
 * that the lists' state is seen correctly.
  *
 * Another thing worth mentioning is that ep_poll_callback() can be called
@@ -1751,7 +1745,7 @@ static int ep_send_events(struct eventpoll *ep,
                         * availability. At this point, no one can insert
                         * into ep->rdllist besides us. The epoll_ctl()
                         * callers are locked out by
-                        * ep_scan_ready_list() holding "mtx" and the
+                        * ep_send_events() holding "mtx" and the
                         * poll callback will queue them in ep->ovflist.
                         */
                        list_add_tail(&epi->rdllink, &ep->rdllist);
@@ -1904,7 +1898,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
                __set_current_state(TASK_INTERRUPTIBLE);
 
                /*
-                * Do the final check under the lock. ep_scan_ready_list()
+                * Do the final check under the lock. ep_start/done_scan()
                 * plays with two lists (->rdllist and ->ovflist) and there
                 * is always a race when both lists are empty for a short
                 * period of time although events are pending, so lock is
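
The eventpoll hunks above replace a call_rcu() callback whose only job was a
container_of() plus a free with kfree_rcu(); kfree() also handles
kmem_cache-allocated objects, so the epi_cache allocation can drop its
dedicated callback. A generic sketch of the pattern (kernel context assumed;
struct foo is illustrative, not from this patch):

#include <linux/rcupdate.h>
#include <linux/slab.h>

struct foo {
        int payload;
        struct rcu_head rcu;
};

/* Before: a hand-written callback that exists only to free the object. */
static void foo_rcu_free(struct rcu_head *head)
{
        kfree(container_of(head, struct foo, rcu));
}

static void foo_release_old(struct foo *f)
{
        call_rcu(&f->rcu, foo_rcu_free);
}

/*
 * After: kfree_rcu() takes the pointer and the name of the embedded
 * rcu_head member and frees the object once a grace period has elapsed.
 */
static void foo_release_new(struct foo *f)
{
        kfree_rcu(f, rcu);
}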
index 8cdd5b2dd09c2e8047d6bd360e14b060dd23fbf0..ece3ab0998e11ee3fb6f0e13d7766ae2c50b2968 100644 (file)
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -128,7 +128,7 @@ SYSCALL_DEFINE1(uselib, const char __user *, library)
        struct filename *tmp = getname(library);
        int error = PTR_ERR(tmp);
        static const struct open_flags uselib_flags = {
-               .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,
+               .open_flag = O_LARGEFILE | O_RDONLY,
                .acc_mode = MAY_READ | MAY_EXEC,
                .intent = LOOKUP_OPEN,
                .lookup_flags = LOOKUP_FOLLOW,
@@ -904,6 +904,10 @@ EXPORT_SYMBOL(transfer_args_to_stack);
 
 #endif /* CONFIG_MMU */
 
+/*
+ * On success, caller must call do_close_execat() on the returned
+ * struct file to close it.
+ */
 static struct file *do_open_execat(int fd, struct filename *name, int flags)
 {
        struct file *file;
@@ -948,6 +952,17 @@ exit:
        return ERR_PTR(err);
 }
 
+/**
+ * open_exec - Open a path name for execution
+ *
+ * @name: path name to open with the intent of executing it.
+ *
+ * Returns ERR_PTR on failure or allocated struct file on success.
+ *
+ * As this is a wrapper for the internal do_open_execat(), callers
+ * must call allow_write_access() before fput() on release. Also see
+ * do_close_execat().
+ */
 struct file *open_exec(const char *name)
 {
        struct filename *filename = getname_kernel(name);
@@ -1143,7 +1158,6 @@ static int de_thread(struct task_struct *tsk)
 
                BUG_ON(leader->exit_state != EXIT_ZOMBIE);
                leader->exit_state = EXIT_DEAD;
-
                /*
                 * We are going to release_task()->ptrace_unlink() silently,
                 * the tracer can sleep in do_wait(). EXIT_DEAD guarantees
@@ -1409,6 +1423,9 @@ int begin_new_exec(struct linux_binprm * bprm)
 
 out_unlock:
        up_write(&me->signal->exec_update_lock);
+       if (!bprm->cred)
+               mutex_unlock(&me->signal->cred_guard_mutex);
+
 out:
        return retval;
 }
@@ -1484,6 +1501,15 @@ static int prepare_bprm_creds(struct linux_binprm *bprm)
        return -ENOMEM;
 }
 
+/* Matches do_open_execat() */
+static void do_close_execat(struct file *file)
+{
+       if (!file)
+               return;
+       allow_write_access(file);
+       fput(file);
+}
+
 static void free_bprm(struct linux_binprm *bprm)
 {
        if (bprm->mm) {
@@ -1495,10 +1521,7 @@ static void free_bprm(struct linux_binprm *bprm)
                mutex_unlock(&current->signal->cred_guard_mutex);
                abort_creds(bprm->cred);
        }
-       if (bprm->file) {
-               allow_write_access(bprm->file);
-               fput(bprm->file);
-       }
+       do_close_execat(bprm->file);
        if (bprm->executable)
                fput(bprm->executable);
        /* If a binfmt changed the interp, free it. */
@@ -1520,8 +1543,7 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl
 
        bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
        if (!bprm) {
-               allow_write_access(file);
-               fput(file);
+               do_close_execat(file);
                return ERR_PTR(-ENOMEM);
        }
 
@@ -1610,6 +1632,7 @@ static void check_unsafe_exec(struct linux_binprm *bprm)
        }
        rcu_read_unlock();
 
+       /* "users" and "in_exec" locked for copy_fs() */
        if (p->fs->users > n_fs)
                bprm->unsafe |= LSM_UNSAFE_SHARE;
        else
@@ -1826,9 +1849,6 @@ static int exec_binprm(struct linux_binprm *bprm)
        return 0;
 }
 
-/*
- * sys_execve() executes a new program.
- */
 static int bprm_execve(struct linux_binprm *bprm)
 {
        int retval;
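
The new do_close_execat() follows the kfree(NULL) convention of letting a
cleanup helper tolerate a NULL argument, so the call sites in free_bprm() and
alloc_bprm() need no guard of their own. A minimal userspace sketch of the
idiom (the resource type and names are illustrative):

#include <stdio.h>
#include <stdlib.h>

struct resource {
        FILE *fp;
};

/* NULL-tolerant release, mirroring kfree(NULL) semantics. */
static void resource_put(struct resource *res)
{
        if (!res)
                return;
        if (res->fp)
                fclose(res->fp);
        free(res);
}

int main(void)
{
        struct resource *res = calloc(1, sizeof(*res));

        if (res)
                res->fp = fopen("/dev/null", "r");

        /* Error paths may call this unconditionally. */
        resource_put(res);
        resource_put(NULL);     /* harmless no-op */
        return 0;
}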
index 9474cd50da6d4fd8b9fba92f1f3d8717f19245dc..361595433480c46562765ad4d5c886a071005c25 100644 (file)
@@ -275,6 +275,7 @@ struct exfat_sb_info {
 
        spinlock_t inode_hash_lock;
        struct hlist_head inode_hashtable[EXFAT_HASH_SIZE];
+       struct rcu_head rcu;
 };
 
 #define EXFAT_CACHE_VALID      0
index d25a96a148af4cdb966c5d20f720aa944cab10c2..cc00f1a7a1e18082af9e0e8ff28f5995de75f1ba 100644 (file)
@@ -35,13 +35,18 @@ static int exfat_cont_expand(struct inode *inode, loff_t size)
        if (new_num_clusters == num_clusters)
                goto out;
 
-       exfat_chain_set(&clu, ei->start_clu, num_clusters, ei->flags);
-       ret = exfat_find_last_cluster(sb, &clu, &last_clu);
-       if (ret)
-               return ret;
+       if (num_clusters) {
+               exfat_chain_set(&clu, ei->start_clu, num_clusters, ei->flags);
+               ret = exfat_find_last_cluster(sb, &clu, &last_clu);
+               if (ret)
+                       return ret;
+
+               clu.dir = last_clu + 1;
+       } else {
+               last_clu = EXFAT_EOF_CLUSTER;
+               clu.dir = EXFAT_EOF_CLUSTER;
+       }
 
-       clu.dir = (last_clu == EXFAT_EOF_CLUSTER) ?
-                       EXFAT_EOF_CLUSTER : last_clu + 1;
        clu.size = 0;
        clu.flags = ei->flags;
 
@@ -51,17 +56,19 @@ static int exfat_cont_expand(struct inode *inode, loff_t size)
                return ret;
 
        /* Append new clusters to chain */
-       if (clu.flags != ei->flags) {
-               exfat_chain_cont_cluster(sb, ei->start_clu, num_clusters);
-               ei->flags = ALLOC_FAT_CHAIN;
-       }
-       if (clu.flags == ALLOC_FAT_CHAIN)
-               if (exfat_ent_set(sb, last_clu, clu.dir))
-                       goto free_clu;
-
-       if (num_clusters == 0)
+       if (num_clusters) {
+               if (clu.flags != ei->flags)
+                       if (exfat_chain_cont_cluster(sb, ei->start_clu, num_clusters))
+                               goto free_clu;
+
+               if (clu.flags == ALLOC_FAT_CHAIN)
+                       if (exfat_ent_set(sb, last_clu, clu.dir))
+                               goto free_clu;
+       } else
                ei->start_clu = clu.dir;
 
+       ei->flags = clu.flags;
+
 out:
        inode_set_mtime_to_ts(inode, inode_set_ctime_current(inode));
        /* Expanded range not zeroed, do not update valid_size */
index 522edcbb2ce4d17a7f219e6016bff31f5d466fdd..0687f952956c34b6d85e785ee13f231d49679e64 100644 (file)
@@ -501,7 +501,7 @@ static ssize_t exfat_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
        struct inode *inode = mapping->host;
        struct exfat_inode_info *ei = EXFAT_I(inode);
        loff_t pos = iocb->ki_pos;
-       loff_t size = iocb->ki_pos + iov_iter_count(iter);
+       loff_t size = pos + iov_iter_count(iter);
        int rw = iov_iter_rw(iter);
        ssize_t ret;
 
@@ -525,11 +525,10 @@ static ssize_t exfat_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
         */
        ret = blockdev_direct_IO(iocb, inode, iter, exfat_get_block);
        if (ret < 0) {
-               if (rw == WRITE)
+               if (rw == WRITE && ret != -EIOCBQUEUED)
                        exfat_write_failed(mapping, size);
 
-               if (ret != -EIOCBQUEUED)
-                       return ret;
+               return ret;
        } else
                size = pos + ret;
 
index 705710f93e2ddd3c911df119b2c8e0ca5a9152cd..afdf13c34ff526fb423f322d3503b571e2d153e8 100644 (file)
@@ -655,7 +655,6 @@ static int exfat_load_upcase_table(struct super_block *sb,
        unsigned int sect_size = sb->s_blocksize;
        unsigned int i, index = 0;
        u32 chksum = 0;
-       int ret;
        unsigned char skip = false;
        unsigned short *upcase_table;
 
@@ -673,8 +672,7 @@ static int exfat_load_upcase_table(struct super_block *sb,
                if (!bh) {
                        exfat_err(sb, "failed to read sector(0x%llx)",
                                  (unsigned long long)sector);
-                       ret = -EIO;
-                       goto free_table;
+                       return -EIO;
                }
                sector++;
                for (i = 0; i < sect_size && index <= 0xFFFF; i += 2) {
@@ -701,15 +699,12 @@ static int exfat_load_upcase_table(struct super_block *sb,
 
        exfat_err(sb, "failed to load upcase table (idx : 0x%08x, chksum : 0x%08x, utbl_chksum : 0x%08x)",
                  index, chksum, utbl_checksum);
-       ret = -EINVAL;
-free_table:
-       exfat_free_upcase_table(sbi);
-       return ret;
+       return -EINVAL;
 }
 
 static int exfat_load_default_upcase_table(struct super_block *sb)
 {
-       int i, ret = -EIO;
+       int i;
        struct exfat_sb_info *sbi = EXFAT_SB(sb);
        unsigned char skip = false;
        unsigned short uni = 0, *upcase_table;
@@ -740,8 +735,7 @@ static int exfat_load_default_upcase_table(struct super_block *sb)
                return 0;
 
        /* FATAL error: default upcase table has error */
-       exfat_free_upcase_table(sbi);
-       return ret;
+       return -EIO;
 }
 
 int exfat_create_upcase_table(struct super_block *sb)
index d9d4fa91010bb1d226b1d00afdbb841e73911d33..fcb6582677650bd1462e501e9c5bb67a032befd4 100644 (file)
@@ -39,9 +39,6 @@ static void exfat_put_super(struct super_block *sb)
        exfat_free_bitmap(sbi);
        brelse(sbi->boot_bh);
        mutex_unlock(&sbi->s_lock);
-
-       unload_nls(sbi->nls_io);
-       exfat_free_upcase_table(sbi);
 }
 
 static int exfat_sync_fs(struct super_block *sb, int wait)
@@ -600,7 +597,7 @@ static int __exfat_fill_super(struct super_block *sb)
        ret = exfat_load_bitmap(sb);
        if (ret) {
                exfat_err(sb, "failed to load alloc-bitmap");
-               goto free_upcase_table;
+               goto free_bh;
        }
 
        ret = exfat_count_used_clusters(sb, &sbi->used_clusters);
@@ -613,8 +610,6 @@ static int __exfat_fill_super(struct super_block *sb)
 
 free_alloc_bitmap:
        exfat_free_bitmap(sbi);
-free_upcase_table:
-       exfat_free_upcase_table(sbi);
 free_bh:
        brelse(sbi->boot_bh);
        return ret;
@@ -701,12 +696,10 @@ put_inode:
        sb->s_root = NULL;
 
 free_table:
-       exfat_free_upcase_table(sbi);
        exfat_free_bitmap(sbi);
        brelse(sbi->boot_bh);
 
 check_nls_io:
-       unload_nls(sbi->nls_io);
        return err;
 }
 
@@ -771,13 +764,22 @@ static int exfat_init_fs_context(struct fs_context *fc)
        return 0;
 }
 
+static void delayed_free(struct rcu_head *p)
+{
+       struct exfat_sb_info *sbi = container_of(p, struct exfat_sb_info, rcu);
+
+       unload_nls(sbi->nls_io);
+       exfat_free_upcase_table(sbi);
+       exfat_free_sbi(sbi);
+}
+
 static void exfat_kill_sb(struct super_block *sb)
 {
        struct exfat_sb_info *sbi = sb->s_fs_info;
 
        kill_block_super(sb);
        if (sbi)
-               exfat_free_sbi(sbi);
+               call_rcu(&sbi->rcu, delayed_free);
 }
 
 static struct file_system_type exfat_fs_type = {
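
exfat's delayed_free() above is the complement of the kfree_rcu() conversion
in eventpoll: because the callback must unload the NLS table and the upcase
table as well as free the sbi, a real call_rcu() callback is still required,
and the frees are removed from exfat_put_super() and the fill_super error
paths so that nothing is torn down until a grace period has elapsed. A
userspace sketch of the container_of() recovery the callback relies on
(struct names and fields are illustrative):

#include <stddef.h>
#include <stdio.h>

#define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

struct rcu_head {
        void *next;
        void (*func)(struct rcu_head *);
};

struct sb_info {
        int tables_loaded;
        struct rcu_head rcu;
};

/* Recover the enclosing sb_info from the embedded rcu_head. */
static void delayed_free(struct rcu_head *p)
{
        struct sb_info *sbi = container_of(p, struct sb_info, rcu);

        printf("freeing sbi, tables_loaded=%d\n", sbi->tables_loaded);
}

int main(void)
{
        struct sb_info sbi = { .tables_loaded = 1 };

        delayed_free(&sbi.rcu); /* grace period elided in this sketch */
        return 0;
}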
index 3ae0154c5680b2c06771a3819d825a1953aa707f..07ea3d62b2982d308439cd9c4008b0c7113df5fa 100644 (file)
@@ -255,7 +255,7 @@ static bool filldir_one(struct dir_context *ctx, const char *name, int len,
                container_of(ctx, struct getdents_callback, ctx);
 
        buf->sequence++;
-       if (buf->ino == ino && len <= NAME_MAX) {
+       if (buf->ino == ino && len <= NAME_MAX && !is_dot_dotdot(name, len)) {
                memcpy(buf->name, name, len);
                buf->name[len] = '\0';
                buf->found = 1;
index a5d784872303ddb6731f2bf0f8579170809b36fd..3c0d7d143036abd34867604cd9c89a4a2a3151d4 100644 (file)
@@ -252,8 +252,10 @@ struct ext4_allocation_request {
 #define EXT4_MAP_MAPPED                BIT(BH_Mapped)
 #define EXT4_MAP_UNWRITTEN     BIT(BH_Unwritten)
 #define EXT4_MAP_BOUNDARY      BIT(BH_Boundary)
+#define EXT4_MAP_DELAYED       BIT(BH_Delay)
 #define EXT4_MAP_FLAGS         (EXT4_MAP_NEW | EXT4_MAP_MAPPED |\
-                                EXT4_MAP_UNWRITTEN | EXT4_MAP_BOUNDARY)
+                                EXT4_MAP_UNWRITTEN | EXT4_MAP_BOUNDARY |\
+                                EXT4_MAP_DELAYED)
 
 struct ext4_map_blocks {
        ext4_fsblk_t m_pblk;
@@ -1548,7 +1550,7 @@ struct ext4_sb_info {
        unsigned long s_commit_interval;
        u32 s_max_batch_time;
        u32 s_min_batch_time;
-       struct bdev_handle *s_journal_bdev_handle;
+       struct file *s_journal_bdev_file;
 #ifdef CONFIG_QUOTA
        /* Names of quota files with journalled quota */
        char __rcu *s_qf_names[EXT4_MAXQUOTAS];
@@ -2912,10 +2914,10 @@ extern const struct seq_operations ext4_mb_seq_groups_ops;
 extern const struct seq_operations ext4_mb_seq_structs_summary_ops;
 extern int ext4_seq_mb_stats_show(struct seq_file *seq, void *offset);
 extern int ext4_mb_init(struct super_block *);
-extern int ext4_mb_release(struct super_block *);
+extern void ext4_mb_release(struct super_block *);
 extern ext4_fsblk_t ext4_mb_new_blocks(handle_t *,
                                struct ext4_allocation_request *, int *);
-extern void ext4_discard_preallocations(struct inode *, unsigned int);
+extern void ext4_discard_preallocations(struct inode *);
 extern int __init ext4_init_mballoc(void);
 extern void ext4_exit_mballoc(void);
 extern ext4_group_t ext4_mb_prefetch(struct super_block *sb,
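
EXT4_MAP_DELAYED lets ext4_map_blocks() callers tell a delayed-allocation
extent apart from a plain hole; the ext4_set_iomap() hunk further down uses it
to report IOMAP_DELALLOC. A userspace-compilable sketch of the flag scheme
(the BH_* values here are illustrative, not the kernel's enum ordering):

#include <stdio.h>

#define BIT(n) (1u << (n))

enum { BH_New = 0, BH_Mapped, BH_Unwritten, BH_Boundary, BH_Delay };

#define EXT4_MAP_NEW        BIT(BH_New)
#define EXT4_MAP_MAPPED     BIT(BH_Mapped)
#define EXT4_MAP_UNWRITTEN  BIT(BH_Unwritten)
#define EXT4_MAP_BOUNDARY   BIT(BH_Boundary)
#define EXT4_MAP_DELAYED    BIT(BH_Delay)

int main(void)
{
        unsigned int m_flags = EXT4_MAP_DELAYED;

        /* A delalloc extent is unmapped, but it is not a hole. */
        if (!(m_flags & EXT4_MAP_MAPPED) && (m_flags & EXT4_MAP_DELAYED))
                puts("delalloc: report IOMAP_DELALLOC, not IOMAP_HOLE");
        return 0;
}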
index 01299b55a567aa41fe7147c3d625e5087b11a0fc..7669d154c05e0c1c86c725c2bab490753317d632 100644 (file)
@@ -100,7 +100,7 @@ static int ext4_ext_trunc_restart_fn(struct inode *inode, int *dropped)
         * i_rwsem. So we can safely drop the i_data_sem here.
         */
        BUG_ON(EXT4_JOURNAL(inode) == NULL);
-       ext4_discard_preallocations(inode, 0);
+       ext4_discard_preallocations(inode);
        up_write(&EXT4_I(inode)->i_data_sem);
        *dropped = 1;
        return 0;
@@ -2229,7 +2229,7 @@ static int ext4_fill_es_cache_info(struct inode *inode,
 
 
 /*
- * ext4_ext_determine_hole - determine hole around given block
+ * ext4_ext_find_hole - find hole around given block according to the given path
  * @inode:     inode we lookup in
  * @path:      path in extent tree to @lblk
 * @lblk:      pointer to logical block around which we want to determine the hole
@@ -2241,9 +2241,9 @@ static int ext4_fill_es_cache_info(struct inode *inode,
  * The function returns the length of a hole starting at @lblk. We update @lblk
  * to the beginning of the hole if we managed to find it.
  */
-static ext4_lblk_t ext4_ext_determine_hole(struct inode *inode,
-                                          struct ext4_ext_path *path,
-                                          ext4_lblk_t *lblk)
+static ext4_lblk_t ext4_ext_find_hole(struct inode *inode,
+                                     struct ext4_ext_path *path,
+                                     ext4_lblk_t *lblk)
 {
        int depth = ext_depth(inode);
        struct ext4_extent *ex;
@@ -2270,30 +2270,6 @@ static ext4_lblk_t ext4_ext_determine_hole(struct inode *inode,
        return len;
 }
 
-/*
- * ext4_ext_put_gap_in_cache:
- * calculate boundaries of the gap that the requested block fits into
- * and cache this gap
- */
-static void
-ext4_ext_put_gap_in_cache(struct inode *inode, ext4_lblk_t hole_start,
-                         ext4_lblk_t hole_len)
-{
-       struct extent_status es;
-
-       ext4_es_find_extent_range(inode, &ext4_es_is_delayed, hole_start,
-                                 hole_start + hole_len - 1, &es);
-       if (es.es_len) {
-               /* There's delayed extent containing lblock? */
-               if (es.es_lblk <= hole_start)
-                       return;
-               hole_len = min(es.es_lblk - hole_start, hole_len);
-       }
-       ext_debug(inode, " -> %u:%u\n", hole_start, hole_len);
-       ext4_es_insert_extent(inode, hole_start, hole_len, ~0,
-                             EXTENT_STATUS_HOLE);
-}
-
 /*
  * ext4_ext_rm_idx:
  * removes index from the index block.
@@ -4062,6 +4038,72 @@ static int get_implied_cluster_alloc(struct super_block *sb,
        return 0;
 }
 
+/*
+ * Determine the hole length around the given logical block: first try to
+ * locate and expand the hole from the given @path, then shrink it if it
+ * is partially or completely covered by delayed extents, insert it into
+ * the extent cache tree if it is indeed a hole, and finally return the
+ * length of the determined extent.
+ */
+static ext4_lblk_t ext4_ext_determine_insert_hole(struct inode *inode,
+                                                 struct ext4_ext_path *path,
+                                                 ext4_lblk_t lblk)
+{
+       ext4_lblk_t hole_start, len;
+       struct extent_status es;
+
+       hole_start = lblk;
+       len = ext4_ext_find_hole(inode, path, &hole_start);
+again:
+       ext4_es_find_extent_range(inode, &ext4_es_is_delayed, hole_start,
+                                 hole_start + len - 1, &es);
+       if (!es.es_len)
+               goto insert_hole;
+
+       /*
+        * There's a delalloc extent in the hole; handle the cases where the
+        * delalloc extent is in front of, behind, or straddles the queried
+        * range.
+        */
+       if (lblk >= es.es_lblk + es.es_len) {
+               /*
+                * The delalloc extent is in front of the queried range;
+                * search again from the queried start block.
+                */
+               len -= lblk - hole_start;
+               hole_start = lblk;
+               goto again;
+       } else if (in_range(lblk, es.es_lblk, es.es_len)) {
+               /*
+                * The delalloc extent contains lblk; it must have been
+                * added after ext4_map_blocks() checked the extent status
+                * tree, since we are not holding i_rwsem and the delalloc
+                * info is only stabilized by the i_data_sem we are about
+                * to release. Don't modify the extent status tree or
+                * report the extent as a hole; just return the length of
+                * the delalloc extent remaining after lblk.
+                */
+               len = es.es_lblk + es.es_len - lblk;
+               return len;
+       } else {
+               /*
+                * The delalloc extent is partially or completely behind
+                * the queried range, update hole length until the
+                * beginning of the delalloc extent.
+                */
+               len = min(es.es_lblk - hole_start, len);
+       }
+
+insert_hole:
+       /* Put the just-found gap into the cache to speed up subsequent requests */
+       ext_debug(inode, " -> %u:%u\n", hole_start, len);
+       ext4_es_insert_extent(inode, hole_start, len, ~0, EXTENT_STATUS_HOLE);
+
+       /* Update len to reflect the hole size after lblk */
+       if (hole_start != lblk)
+               len -= lblk - hole_start;
+
+       return len;
+}
 
 /*
  * Block allocation/map/preallocation routine for extents based files
@@ -4179,22 +4221,12 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
         * we can't allocate blocks if the create flag is zero
         */
        if ((flags & EXT4_GET_BLOCKS_CREATE) == 0) {
-               ext4_lblk_t hole_start, hole_len;
+               ext4_lblk_t len;
 
-               hole_start = map->m_lblk;
-               hole_len = ext4_ext_determine_hole(inode, path, &hole_start);
-               /*
-                * put just found gap into cache to speed up
-                * subsequent requests
-                */
-               ext4_ext_put_gap_in_cache(inode, hole_start, hole_len);
+               len = ext4_ext_determine_insert_hole(inode, path, map->m_lblk);
 
-               /* Update hole_len to reflect hole size after map->m_lblk */
-               if (hole_start != map->m_lblk)
-                       hole_len -= map->m_lblk - hole_start;
                map->m_pblk = 0;
-               map->m_len = min_t(unsigned int, map->m_len, hole_len);
-
+               map->m_len = min_t(unsigned int, map->m_len, len);
                goto out;
        }
 
@@ -4313,7 +4345,7 @@ got_allocated_blocks:
                         * not a good idea to call discard here directly,
                         * but otherwise we'd need to call it every free().
                         */
-                       ext4_discard_preallocations(inode, 0);
+                       ext4_discard_preallocations(inode);
                        if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE)
                                fb_flags = EXT4_FREE_BLOCKS_NO_QUOT_UPDATE;
                        ext4_free_blocks(handle, inode, NULL, newblock,
@@ -5357,7 +5389,7 @@ static int ext4_collapse_range(struct file *file, loff_t offset, loff_t len)
        ext4_fc_mark_ineligible(sb, EXT4_FC_REASON_FALLOC_RANGE, handle);
 
        down_write(&EXT4_I(inode)->i_data_sem);
-       ext4_discard_preallocations(inode, 0);
+       ext4_discard_preallocations(inode);
        ext4_es_remove_extent(inode, punch_start, EXT_MAX_BLOCKS - punch_start);
 
        ret = ext4_ext_remove_space(inode, punch_start, punch_stop - 1);
@@ -5365,7 +5397,7 @@ static int ext4_collapse_range(struct file *file, loff_t offset, loff_t len)
                up_write(&EXT4_I(inode)->i_data_sem);
                goto out_stop;
        }
-       ext4_discard_preallocations(inode, 0);
+       ext4_discard_preallocations(inode);
 
        ret = ext4_ext_shift_extents(inode, handle, punch_stop,
                                     punch_stop - punch_start, SHIFT_LEFT);
@@ -5497,7 +5529,7 @@ static int ext4_insert_range(struct file *file, loff_t offset, loff_t len)
                goto out_stop;
 
        down_write(&EXT4_I(inode)->i_data_sem);
-       ext4_discard_preallocations(inode, 0);
+       ext4_discard_preallocations(inode);
 
        path = ext4_find_extent(inode, offset_lblk, NULL, 0);
        if (IS_ERR(path)) {
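
ext4_ext_determine_insert_hole() distinguishes three positions of a delalloc
extent relative to the queried block. A standalone sketch of just that
classification (in_range() re-implemented to mirror the kernel macro; the
block numbers are made up):

#include <stdio.h>

#define in_range(b, first, len) ((b) >= (first) && (b) < (first) + (len))

/*
 * Classify a delalloc extent [es_lblk, es_lblk + es_len) against a hole
 * that contains the queried block lblk.
 */
static const char *classify(unsigned int lblk, unsigned int es_lblk,
                            unsigned int es_len)
{
        if (lblk >= es_lblk + es_len)
                return "in front of lblk: rescan from lblk";
        if (in_range(lblk, es_lblk, es_len))
                return "contains lblk: report only the extent part after lblk";
        return "behind lblk: trim the hole at the extent start";
}

int main(void)
{
        printf("%s\n", classify(100, 90, 5));   /* extent ends before lblk */
        printf("%s\n", classify(100, 98, 5));   /* extent covers lblk */
        printf("%s\n", classify(100, 110, 5));  /* extent starts after lblk */
        return 0;
}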
index 6aa15dafc67786559d3b68ebfefd8f90e119b3fc..54d6ff22585cf1835e8aced5548dbac7c1b89757 100644 (file)
@@ -174,7 +174,7 @@ static int ext4_release_file(struct inode *inode, struct file *filp)
                        (atomic_read(&inode->i_writecount) == 1) &&
                        !EXT4_I(inode)->i_reserved_data_blocks) {
                down_write(&EXT4_I(inode)->i_data_sem);
-               ext4_discard_preallocations(inode, 0);
+               ext4_discard_preallocations(inode);
                up_write(&EXT4_I(inode)->i_data_sem);
        }
        if (is_dx(inode) && filp->private_data)
index 11e6f33677a2c8566cbb7fc230f42e514a1c726c..df853c4d3a8c91b8ee8e7f07bb5853362067cb25 100644 (file)
@@ -576,9 +576,9 @@ static bool ext4_getfsmap_is_valid_device(struct super_block *sb,
        if (fm->fmr_device == 0 || fm->fmr_device == UINT_MAX ||
            fm->fmr_device == new_encode_dev(sb->s_bdev->bd_dev))
                return true;
-       if (EXT4_SB(sb)->s_journal_bdev_handle &&
+       if (EXT4_SB(sb)->s_journal_bdev_file &&
            fm->fmr_device ==
-           new_encode_dev(EXT4_SB(sb)->s_journal_bdev_handle->bdev->bd_dev))
+           new_encode_dev(file_bdev(EXT4_SB(sb)->s_journal_bdev_file)->bd_dev))
                return true;
        return false;
 }
@@ -648,9 +648,9 @@ int ext4_getfsmap(struct super_block *sb, struct ext4_fsmap_head *head,
        memset(handlers, 0, sizeof(handlers));
        handlers[0].gfd_dev = new_encode_dev(sb->s_bdev->bd_dev);
        handlers[0].gfd_fn = ext4_getfsmap_datadev;
-       if (EXT4_SB(sb)->s_journal_bdev_handle) {
+       if (EXT4_SB(sb)->s_journal_bdev_file) {
                handlers[1].gfd_dev = new_encode_dev(
-                       EXT4_SB(sb)->s_journal_bdev_handle->bdev->bd_dev);
+                       file_bdev(EXT4_SB(sb)->s_journal_bdev_file)->bd_dev);
                handlers[1].gfd_fn = ext4_getfsmap_logdev;
        }
 
index a9f3716119d37249de9cd1c12f02abf5c8db08cb..d8ca7f64f9523412a264dc5e266a72a7e6027e04 100644 (file)
@@ -714,7 +714,7 @@ static int ext4_ind_trunc_restart_fn(handle_t *handle, struct inode *inode,
         * i_rwsem. So we can safely drop the i_data_sem here.
         */
        BUG_ON(EXT4_JOURNAL(inode) == NULL);
-       ext4_discard_preallocations(inode, 0);
+       ext4_discard_preallocations(inode);
        up_write(&EXT4_I(inode)->i_data_sem);
        *dropped = 1;
        return 0;
index 5af1b0b8680e9fa5f34f94f4a49127b554691f00..2ccf3b5e3a7c4dcb1b0c6a9d27a3c8a77a145730 100644 (file)
@@ -371,7 +371,7 @@ void ext4_da_update_reserve_space(struct inode *inode,
         */
        if ((ei->i_reserved_data_blocks == 0) &&
            !inode_is_open_for_write(inode))
-               ext4_discard_preallocations(inode, 0);
+               ext4_discard_preallocations(inode);
 }
 
 static int __check_block_validity(struct inode *inode, const char *func,
@@ -515,6 +515,8 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
                        map->m_len = retval;
                } else if (ext4_es_is_delayed(&es) || ext4_es_is_hole(&es)) {
                        map->m_pblk = 0;
+                       map->m_flags |= ext4_es_is_delayed(&es) ?
+                                       EXT4_MAP_DELAYED : 0;
                        retval = es.es_len - (map->m_lblk - es.es_lblk);
                        if (retval > map->m_len)
                                retval = map->m_len;
@@ -1703,11 +1705,8 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
 
        /* Lookup extent status tree firstly */
        if (ext4_es_lookup_extent(inode, iblock, NULL, &es)) {
-               if (ext4_es_is_hole(&es)) {
-                       retval = 0;
-                       down_read(&EXT4_I(inode)->i_data_sem);
+               if (ext4_es_is_hole(&es))
                        goto add_delayed;
-               }
 
                /*
                 * Delayed extent could be allocated by fallocate.
@@ -1749,26 +1748,11 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
                retval = ext4_ext_map_blocks(NULL, inode, map, 0);
        else
                retval = ext4_ind_map_blocks(NULL, inode, map, 0);
-
-add_delayed:
-       if (retval == 0) {
-               int ret;
-
-               /*
-                * XXX: __block_prepare_write() unmaps passed block,
-                * is it OK?
-                */
-
-               ret = ext4_insert_delayed_block(inode, map->m_lblk);
-               if (ret != 0) {
-                       retval = ret;
-                       goto out_unlock;
-               }
-
-               map_bh(bh, inode->i_sb, invalid_block);
-               set_buffer_new(bh);
-               set_buffer_delay(bh);
-       } else if (retval > 0) {
+       if (retval < 0) {
+               up_read(&EXT4_I(inode)->i_data_sem);
+               return retval;
+       }
+       if (retval > 0) {
                unsigned int status;
 
                if (unlikely(retval != map->m_len)) {
@@ -1783,11 +1767,21 @@ add_delayed:
                                EXTENT_STATUS_UNWRITTEN : EXTENT_STATUS_WRITTEN;
                ext4_es_insert_extent(inode, map->m_lblk, map->m_len,
                                      map->m_pblk, status);
+               up_read(&EXT4_I(inode)->i_data_sem);
+               return retval;
        }
+       up_read(&EXT4_I(inode)->i_data_sem);
 
-out_unlock:
-       up_read((&EXT4_I(inode)->i_data_sem));
+add_delayed:
+       down_write(&EXT4_I(inode)->i_data_sem);
+       retval = ext4_insert_delayed_block(inode, map->m_lblk);
+       up_write(&EXT4_I(inode)->i_data_sem);
+       if (retval)
+               return retval;
 
+       map_bh(bh, inode->i_sb, invalid_block);
+       set_buffer_new(bh);
+       set_buffer_delay(bh);
        return retval;
 }
 
@@ -3268,6 +3262,9 @@ static void ext4_set_iomap(struct inode *inode, struct iomap *iomap,
                iomap->addr = (u64) map->m_pblk << blkbits;
                if (flags & IOMAP_DAX)
                        iomap->addr += EXT4_SB(inode->i_sb)->s_dax_part_off;
+       } else if (map->m_flags & EXT4_MAP_DELAYED) {
+               iomap->type = IOMAP_DELALLOC;
+               iomap->addr = IOMAP_NULL_ADDR;
        } else {
                iomap->type = IOMAP_HOLE;
                iomap->addr = IOMAP_NULL_ADDR;
@@ -3430,35 +3427,11 @@ const struct iomap_ops ext4_iomap_overwrite_ops = {
        .iomap_end              = ext4_iomap_end,
 };
 
-static bool ext4_iomap_is_delalloc(struct inode *inode,
-                                  struct ext4_map_blocks *map)
-{
-       struct extent_status es;
-       ext4_lblk_t offset = 0, end = map->m_lblk + map->m_len - 1;
-
-       ext4_es_find_extent_range(inode, &ext4_es_is_delayed,
-                                 map->m_lblk, end, &es);
-
-       if (!es.es_len || es.es_lblk > end)
-               return false;
-
-       if (es.es_lblk > map->m_lblk) {
-               map->m_len = es.es_lblk - map->m_lblk;
-               return false;
-       }
-
-       offset = map->m_lblk - es.es_lblk;
-       map->m_len = es.es_len - offset;
-
-       return true;
-}
-
 static int ext4_iomap_begin_report(struct inode *inode, loff_t offset,
                                   loff_t length, unsigned int flags,
                                   struct iomap *iomap, struct iomap *srcmap)
 {
        int ret;
-       bool delalloc = false;
        struct ext4_map_blocks map;
        u8 blkbits = inode->i_blkbits;
 
@@ -3499,13 +3472,8 @@ static int ext4_iomap_begin_report(struct inode *inode, loff_t offset,
        ret = ext4_map_blocks(NULL, inode, &map, 0);
        if (ret < 0)
                return ret;
-       if (ret == 0)
-               delalloc = ext4_iomap_is_delalloc(inode, &map);
-
 set_iomap:
        ext4_set_iomap(inode, iomap, &map, offset, length, flags);
-       if (delalloc && iomap->type == IOMAP_HOLE)
-               iomap->type = IOMAP_DELALLOC;
 
        return 0;
 }
@@ -4015,12 +3983,12 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
 
        /* If there are blocks to remove, do it */
        if (stop_block > first_block) {
+               ext4_lblk_t hole_len = stop_block - first_block;
 
                down_write(&EXT4_I(inode)->i_data_sem);
-               ext4_discard_preallocations(inode, 0);
+               ext4_discard_preallocations(inode);
 
-               ext4_es_remove_extent(inode, first_block,
-                                     stop_block - first_block);
+               ext4_es_remove_extent(inode, first_block, hole_len);
 
                if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
                        ret = ext4_ext_remove_space(inode, first_block,
@@ -4029,6 +3997,8 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
                        ret = ext4_ind_remove_space(handle, inode, first_block,
                                                    stop_block);
 
+               ext4_es_insert_extent(inode, first_block, hole_len, ~0,
+                                     EXTENT_STATUS_HOLE);
                up_write(&EXT4_I(inode)->i_data_sem);
        }
        ext4_fc_track_range(handle, inode, first_block, stop_block);
@@ -4170,7 +4140,7 @@ int ext4_truncate(struct inode *inode)
 
        down_write(&EXT4_I(inode)->i_data_sem);
 
-       ext4_discard_preallocations(inode, 0);
+       ext4_discard_preallocations(inode);
 
        if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
                err = ext4_ext_truncate(handle, inode);
index aa6be510eb8f578f09faf937a3debaabb9c3b499..7160a71044c88a8fe409111ec51cd597408f98f4 100644 (file)
@@ -467,7 +467,7 @@ static long swap_inode_boot_loader(struct super_block *sb,
        ext4_reset_inode_seed(inode);
        ext4_reset_inode_seed(inode_bl);
 
-       ext4_discard_preallocations(inode, 0);
+       ext4_discard_preallocations(inode);
 
        err = ext4_mark_inode_dirty(handle, inode);
        if (err < 0) {
index f44f668e407f2bda9fe325631c9a4ab62649b9dc..e4f7cf9d89c45a881d6c403fd50fcc499db0b708 100644 (file)
@@ -564,14 +564,14 @@ static void mb_free_blocks_double(struct inode *inode, struct ext4_buddy *e4b,
 
                        blocknr = ext4_group_first_block_no(sb, e4b->bd_group);
                        blocknr += EXT4_C2B(EXT4_SB(sb), first + i);
+                       ext4_mark_group_bitmap_corrupted(sb, e4b->bd_group,
+                                       EXT4_GROUP_INFO_BBITMAP_CORRUPT);
                        ext4_grp_locked_error(sb, e4b->bd_group,
                                              inode ? inode->i_ino : 0,
                                              blocknr,
                                              "freeing block already freed "
                                              "(bit %u)",
                                              first + i);
-                       ext4_mark_group_bitmap_corrupted(sb, e4b->bd_group,
-                                       EXT4_GROUP_INFO_BBITMAP_CORRUPT);
                }
                mb_clear_bit(first + i, e4b->bd_info->bb_bitmap);
        }
@@ -677,7 +677,7 @@ do {                                                                        \
        }                                                               \
 } while (0)
 
-static int __mb_check_buddy(struct ext4_buddy *e4b, char *file,
+static void __mb_check_buddy(struct ext4_buddy *e4b, char *file,
                                const char *function, int line)
 {
        struct super_block *sb = e4b->bd_sb;
@@ -696,7 +696,7 @@ static int __mb_check_buddy(struct ext4_buddy *e4b, char *file,
        void *buddy2;
 
        if (e4b->bd_info->bb_check_counter++ % 10)
-               return 0;
+               return;
 
        while (order > 1) {
                buddy = mb_find_buddy(e4b, order, &max);
@@ -758,7 +758,7 @@ static int __mb_check_buddy(struct ext4_buddy *e4b, char *file,
 
        grp = ext4_get_group_info(sb, e4b->bd_group);
        if (!grp)
-               return NULL;
+               return;
        list_for_each(cur, &grp->bb_prealloc_list) {
                ext4_group_t groupnr;
                struct ext4_prealloc_space *pa;
@@ -768,7 +768,6 @@ static int __mb_check_buddy(struct ext4_buddy *e4b, char *file,
                for (i = 0; i < pa->pa_len; i++)
                        MB_CHECK_ASSERT(mb_test_bit(k + i, buddy));
        }
-       return 0;
 }
 #undef MB_CHECK_ASSERT
 #define mb_check_buddy(e4b) __mb_check_buddy(e4b,      \
@@ -842,7 +841,7 @@ mb_update_avg_fragment_size(struct super_block *sb, struct ext4_group_info *grp)
        struct ext4_sb_info *sbi = EXT4_SB(sb);
        int new_order;
 
-       if (!test_opt2(sb, MB_OPTIMIZE_SCAN) || grp->bb_free == 0)
+       if (!test_opt2(sb, MB_OPTIMIZE_SCAN) || grp->bb_fragments == 0)
                return;
 
        new_order = mb_avg_fragment_size_order(sb,
@@ -871,7 +870,7 @@ mb_update_avg_fragment_size(struct super_block *sb, struct ext4_group_info *grp)
  * cr level needs an update.
  */
 static void ext4_mb_choose_next_group_p2_aligned(struct ext4_allocation_context *ac,
-                       enum criteria *new_cr, ext4_group_t *group, ext4_group_t ngroups)
+                       enum criteria *new_cr, ext4_group_t *group)
 {
        struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
        struct ext4_group_info *iter;
@@ -945,7 +944,7 @@ ext4_mb_find_good_group_avg_frag_lists(struct ext4_allocation_context *ac, int o
  * order. Updates *new_cr if cr level needs an update.
  */
 static void ext4_mb_choose_next_group_goal_fast(struct ext4_allocation_context *ac,
-               enum criteria *new_cr, ext4_group_t *group, ext4_group_t ngroups)
+               enum criteria *new_cr, ext4_group_t *group)
 {
        struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
        struct ext4_group_info *grp = NULL;
@@ -990,7 +989,7 @@ static void ext4_mb_choose_next_group_goal_fast(struct ext4_allocation_context *
  * much and fall to CR_GOAL_LEN_SLOW in that case.
  */
 static void ext4_mb_choose_next_group_best_avail(struct ext4_allocation_context *ac,
-               enum criteria *new_cr, ext4_group_t *group, ext4_group_t ngroups)
+               enum criteria *new_cr, ext4_group_t *group)
 {
        struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
        struct ext4_group_info *grp = NULL;
@@ -1125,11 +1124,11 @@ static void ext4_mb_choose_next_group(struct ext4_allocation_context *ac,
        }
 
        if (*new_cr == CR_POWER2_ALIGNED) {
-               ext4_mb_choose_next_group_p2_aligned(ac, new_cr, group, ngroups);
+               ext4_mb_choose_next_group_p2_aligned(ac, new_cr, group);
        } else if (*new_cr == CR_GOAL_LEN_FAST) {
-               ext4_mb_choose_next_group_goal_fast(ac, new_cr, group, ngroups);
+               ext4_mb_choose_next_group_goal_fast(ac, new_cr, group);
        } else if (*new_cr == CR_BEST_AVAIL_LEN) {
-               ext4_mb_choose_next_group_best_avail(ac, new_cr, group, ngroups);
+               ext4_mb_choose_next_group_best_avail(ac, new_cr, group);
        } else {
                /*
                 * TODO: For CR=2, we can arrange groups in an rb tree sorted by
@@ -1233,6 +1232,24 @@ void ext4_mb_generate_buddy(struct super_block *sb,
        atomic64_add(period, &sbi->s_mb_generation_time);
 }
 
+static void mb_regenerate_buddy(struct ext4_buddy *e4b)
+{
+       int count;
+       int order = 1;
+       void *buddy;
+
+       while ((buddy = mb_find_buddy(e4b, order++, &count)))
+               mb_set_bits(buddy, 0, count);
+
+       e4b->bd_info->bb_fragments = 0;
+       memset(e4b->bd_info->bb_counters, 0,
+               sizeof(*e4b->bd_info->bb_counters) *
+               (e4b->bd_sb->s_blocksize_bits + 2));
+
+       ext4_mb_generate_buddy(e4b->bd_sb, e4b->bd_buddy,
+               e4b->bd_bitmap, e4b->bd_group, e4b->bd_info);
+}
+
 /* The buddy information is attached to the buddy cache inode
  * for convenience. The information regarding each group
  * is loaded via ext4_mb_load_buddy. The information involves
@@ -1891,11 +1908,6 @@ static void mb_free_blocks(struct inode *inode, struct ext4_buddy *e4b,
        mb_check_buddy(e4b);
        mb_free_blocks_double(inode, e4b, first, count);
 
-       this_cpu_inc(discard_pa_seq);
-       e4b->bd_info->bb_free += count;
-       if (first < e4b->bd_info->bb_first_free)
-               e4b->bd_info->bb_first_free = first;
-
        /* access memory sequentially: check left neighbour,
         * clear range and then check right neighbour
         */
@@ -1909,21 +1921,31 @@ static void mb_free_blocks(struct inode *inode, struct ext4_buddy *e4b,
                struct ext4_sb_info *sbi = EXT4_SB(sb);
                ext4_fsblk_t blocknr;
 
+               /*
+                * Fastcommit replay can free already freed blocks, which
+                * corrupts the allocation info. Regenerate it.
+                */
+               if (sbi->s_mount_state & EXT4_FC_REPLAY) {
+                       mb_regenerate_buddy(e4b);
+                       goto check;
+               }
+
                blocknr = ext4_group_first_block_no(sb, e4b->bd_group);
                blocknr += EXT4_C2B(sbi, block);
-               if (!(sbi->s_mount_state & EXT4_FC_REPLAY)) {
-                       ext4_grp_locked_error(sb, e4b->bd_group,
-                                             inode ? inode->i_ino : 0,
-                                             blocknr,
-                                             "freeing already freed block (bit %u); block bitmap corrupt.",
-                                             block);
-                       ext4_mark_group_bitmap_corrupted(
-                               sb, e4b->bd_group,
+               ext4_mark_group_bitmap_corrupted(sb, e4b->bd_group,
                                EXT4_GROUP_INFO_BBITMAP_CORRUPT);
-               }
-               goto done;
+               ext4_grp_locked_error(sb, e4b->bd_group,
+                                     inode ? inode->i_ino : 0, blocknr,
+                                     "freeing already freed block (bit %u); block bitmap corrupt.",
+                                     block);
+               return;
        }
 
+       this_cpu_inc(discard_pa_seq);
+       e4b->bd_info->bb_free += count;
+       if (first < e4b->bd_info->bb_first_free)
+               e4b->bd_info->bb_first_free = first;
+
        /* let's maintain fragments counter */
        if (left_is_free && right_is_free)
                e4b->bd_info->bb_fragments--;
@@ -1948,9 +1970,9 @@ static void mb_free_blocks(struct inode *inode, struct ext4_buddy *e4b,
        if (first <= last)
                mb_buddy_mark_free(e4b, first >> 1, last >> 1);
 
-done:
        mb_set_largest_free_order(sb, e4b->bd_info);
        mb_update_avg_fragment_size(sb, e4b->bd_info);
+check:
        mb_check_buddy(e4b);
 }
 
@@ -2276,6 +2298,9 @@ void ext4_mb_try_best_found(struct ext4_allocation_context *ac,
                return;
 
        ext4_lock_group(ac->ac_sb, group);
+       if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info)))
+               goto out;
+
        max = mb_find_extent(e4b, ex.fe_start, ex.fe_len, &ex);
 
        if (max > 0) {
@@ -2283,6 +2308,7 @@ void ext4_mb_try_best_found(struct ext4_allocation_context *ac,
                ext4_mb_use_best_found(ac, e4b);
        }
 
+out:
        ext4_unlock_group(ac->ac_sb, group);
        ext4_mb_unload_buddy(e4b);
 }
@@ -2309,12 +2335,10 @@ int ext4_mb_find_by_goal(struct ext4_allocation_context *ac,
        if (err)
                return err;
 
-       if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info))) {
-               ext4_mb_unload_buddy(e4b);
-               return 0;
-       }
-
        ext4_lock_group(ac->ac_sb, group);
+       if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info)))
+               goto out;
+
        max = mb_find_extent(e4b, ac->ac_g_ex.fe_start,
                             ac->ac_g_ex.fe_len, &ex);
        ex.fe_logical = 0xDEADFA11; /* debug value */
@@ -2347,6 +2371,7 @@ int ext4_mb_find_by_goal(struct ext4_allocation_context *ac,
                ac->ac_b_ex = ex;
                ext4_mb_use_best_found(ac, e4b);
        }
+out:
        ext4_unlock_group(ac->ac_sb, group);
        ext4_mb_unload_buddy(e4b);
 
@@ -2380,12 +2405,12 @@ void ext4_mb_simple_scan_group(struct ext4_allocation_context *ac,
 
                k = mb_find_next_zero_bit(buddy, max, 0);
                if (k >= max) {
+                       ext4_mark_group_bitmap_corrupted(ac->ac_sb,
+                                       e4b->bd_group,
+                                       EXT4_GROUP_INFO_BBITMAP_CORRUPT);
                        ext4_grp_locked_error(ac->ac_sb, e4b->bd_group, 0, 0,
                                "%d free clusters of order %d. But found 0",
                                grp->bb_counters[i], i);
-                       ext4_mark_group_bitmap_corrupted(ac->ac_sb,
-                                        e4b->bd_group,
-                                       EXT4_GROUP_INFO_BBITMAP_CORRUPT);
                        break;
                }
                ac->ac_found++;
@@ -2436,12 +2461,12 @@ void ext4_mb_complex_scan_group(struct ext4_allocation_context *ac,
                         * free blocks even though group info says we
                         * have free blocks
                         */
+                       ext4_mark_group_bitmap_corrupted(sb, e4b->bd_group,
+                                       EXT4_GROUP_INFO_BBITMAP_CORRUPT);
                        ext4_grp_locked_error(sb, e4b->bd_group, 0, 0,
                                        "%d free clusters as per "
                                        "group info. But bitmap says 0",
                                        free);
-                       ext4_mark_group_bitmap_corrupted(sb, e4b->bd_group,
-                                       EXT4_GROUP_INFO_BBITMAP_CORRUPT);
                        break;
                }
 
@@ -2467,12 +2492,12 @@ void ext4_mb_complex_scan_group(struct ext4_allocation_context *ac,
                if (WARN_ON(ex.fe_len <= 0))
                        break;
                if (free < ex.fe_len) {
+                       ext4_mark_group_bitmap_corrupted(sb, e4b->bd_group,
+                                       EXT4_GROUP_INFO_BBITMAP_CORRUPT);
                        ext4_grp_locked_error(sb, e4b->bd_group, 0, 0,
                                        "%d free clusters as per "
                                        "group info. But got %d blocks",
                                        free, ex.fe_len);
-                       ext4_mark_group_bitmap_corrupted(sb, e4b->bd_group,
-                                       EXT4_GROUP_INFO_BBITMAP_CORRUPT);
                        /*
                         * The number of free blocks differs. This mostly
                         * indicates that the bitmap is corrupt. So exit
@@ -3725,7 +3750,7 @@ static int ext4_mb_cleanup_pa(struct ext4_group_info *grp)
        return count;
 }
 
-int ext4_mb_release(struct super_block *sb)
+void ext4_mb_release(struct super_block *sb)
 {
        ext4_group_t ngroups = ext4_get_groups_count(sb);
        ext4_group_t i;
@@ -3801,8 +3826,6 @@ int ext4_mb_release(struct super_block *sb)
        }
 
        free_percpu(sbi->s_locality_groups);
-
-       return 0;
 }
 
 static inline int ext4_issue_discard(struct super_block *sb,
@@ -5284,7 +5307,7 @@ static void ext4_mb_new_preallocation(struct ext4_allocation_context *ac)
  * the caller MUST hold group/inode locks.
  * TODO: optimize the case when there are no in-core structures yet
  */
-static noinline_for_stack int
+static noinline_for_stack void
 ext4_mb_release_inode_pa(struct ext4_buddy *e4b, struct buffer_head *bitmap_bh,
                        struct ext4_prealloc_space *pa)
 {
@@ -5334,11 +5357,9 @@ ext4_mb_release_inode_pa(struct ext4_buddy *e4b, struct buffer_head *bitmap_bh,
                 */
        }
        atomic_add(free, &sbi->s_mb_discarded);
-
-       return 0;
 }
 
-static noinline_for_stack int
+static noinline_for_stack void
 ext4_mb_release_group_pa(struct ext4_buddy *e4b,
                                struct ext4_prealloc_space *pa)
 {
@@ -5352,13 +5373,11 @@ ext4_mb_release_group_pa(struct ext4_buddy *e4b,
        if (unlikely(group != e4b->bd_group && pa->pa_len != 0)) {
                ext4_warning(sb, "bad group: expected %u, group %u, pa_start %llu",
                             e4b->bd_group, group, pa->pa_pstart);
-               return 0;
+               return;
        }
        mb_free_blocks(pa->pa_inode, e4b, bit, pa->pa_len);
        atomic_add(pa->pa_len, &EXT4_SB(sb)->s_mb_discarded);
        trace_ext4_mballoc_discard(sb, NULL, group, bit, pa->pa_len);
-
-       return 0;
 }
 
 /*
@@ -5479,7 +5498,7 @@ out_dbg:
  *
  * FIXME!! Make sure it is valid at all the call sites
  */
-void ext4_discard_preallocations(struct inode *inode, unsigned int needed)
+void ext4_discard_preallocations(struct inode *inode)
 {
        struct ext4_inode_info *ei = EXT4_I(inode);
        struct super_block *sb = inode->i_sb;
@@ -5491,9 +5510,8 @@ void ext4_discard_preallocations(struct inode *inode, unsigned int needed)
        struct rb_node *iter;
        int err;
 
-       if (!S_ISREG(inode->i_mode)) {
+       if (!S_ISREG(inode->i_mode))
                return;
-       }
 
        if (EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY)
                return;
@@ -5501,15 +5519,12 @@ void ext4_discard_preallocations(struct inode *inode, unsigned int needed)
        mb_debug(sb, "discard preallocation for inode %lu\n",
                 inode->i_ino);
        trace_ext4_discard_preallocations(inode,
-                       atomic_read(&ei->i_prealloc_active), needed);
-
-       if (needed == 0)
-               needed = UINT_MAX;
+                       atomic_read(&ei->i_prealloc_active));
 
 repeat:
        /* first, collect all pa's in the inode */
        write_lock(&ei->i_prealloc_lock);
-       for (iter = rb_first(&ei->i_prealloc_node); iter && needed;
+       for (iter = rb_first(&ei->i_prealloc_node); iter;
             iter = rb_next(iter)) {
                pa = rb_entry(iter, struct ext4_prealloc_space,
                              pa_node.inode_node);
@@ -5533,7 +5548,6 @@ repeat:
                        spin_unlock(&pa->pa_lock);
                        rb_erase(&pa->pa_node.inode_node, &ei->i_prealloc_node);
                        list_add(&pa->u.pa_tmp_list, &list);
-                       needed--;
                        continue;
                }
 
@@ -5943,7 +5957,7 @@ static void ext4_mb_add_n_trim(struct ext4_allocation_context *ac)
 /*
  * release all resource we used in allocation
  */
-static int ext4_mb_release_context(struct ext4_allocation_context *ac)
+static void ext4_mb_release_context(struct ext4_allocation_context *ac)
 {
        struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
        struct ext4_prealloc_space *pa = ac->ac_pa;
@@ -5980,7 +5994,6 @@ static int ext4_mb_release_context(struct ext4_allocation_context *ac)
        if (ac->ac_flags & EXT4_MB_HINT_GROUP_ALLOC)
                mutex_unlock(&ac->ac_lg->lg_mutex);
        ext4_mb_collect_stats(ac);
-       return 0;
 }
 
 static int ext4_mb_discard_preallocations(struct super_block *sb, int needed)
@@ -6761,6 +6774,9 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
        bool set_trimmed = false;
        void *bitmap;
 
+       if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(e4b->bd_info)))
+               return 0;
+
        last = ext4_last_grp_cluster(sb, e4b->bd_group);
        bitmap = e4b->bd_bitmap;
        if (start == 0 && max >= last)
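
Several mballoc hunks swap the order of ext4_mark_group_bitmap_corrupted()
and ext4_grp_locked_error() so the group is flagged before the error is
reported; the new early bail-outs in ext4_mb_try_best_found(),
ext4_mb_find_by_goal() and the trim path above check exactly that flag, so
scanners skip a corrupt group as soon as possible. A toy sketch of the
ordering idea (plain C, no kernel APIs; the flag is illustrative):

#include <stdbool.h>
#include <stdio.h>

static bool group_corrupt;

static void mark_corrupt(void)
{
        group_corrupt = true;
}

static void report_error(const char *msg)
{
        /* Potentially slow: logging, error policy handling... */
        fprintf(stderr, "error: %s\n", msg);
}

static bool try_alloc_from_group(void)
{
        /* Scanners check the flag before touching the group. */
        return !group_corrupt;
}

int main(void)
{
        /*
         * Flag first, then report: try_alloc_from_group() already sees
         * the corruption while report_error() is still running.
         */
        mark_corrupt();
        report_error("freeing already freed block");

        printf("alloc allowed: %d\n", try_alloc_from_group());
        return 0;
}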
index d7aeb5da7d86768c10efdc7ef38b4a6cee38ca78..56938532b4ce258e210178d2406e187eec5ef8cc 100644 (file)
@@ -192,7 +192,6 @@ struct ext4_allocation_context {
         */
        ext4_grpblk_t   ac_orig_goal_len;
 
-       __u32 ac_groups_considered;
        __u32 ac_flags;         /* allocation hints */
        __u16 ac_groups_scanned;
        __u16 ac_groups_linear_remaining;
index 3aa57376d9c2ecbba3d272b572bf56b924b33103..7cd4afa4de1d3127a34ec02f166e90876ad5c6e4 100644 (file)
@@ -618,6 +618,7 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp, __u64 orig_blk,
                goto out;
        o_end = o_start + len;
 
+       *moved_len = 0;
        while (o_start < o_end) {
                struct ext4_extent *ex;
                ext4_lblk_t cur_blk, next_blk;
@@ -672,7 +673,7 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp, __u64 orig_blk,
                 */
                ext4_double_up_write_data_sem(orig_inode, donor_inode);
                /* Swap original branches with new branches */
-               move_extent_per_page(o_filp, donor_inode,
+               *moved_len += move_extent_per_page(o_filp, donor_inode,
                                     orig_page_index, donor_page_index,
                                     offset_in_page, cur_len,
                                     unwritten, &ret);
@@ -682,14 +683,11 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp, __u64 orig_blk,
                o_start += cur_len;
                d_start += cur_len;
        }
-       *moved_len = o_start - orig_blk;
-       if (*moved_len > len)
-               *moved_len = len;
 
 out:
        if (*moved_len) {
-               ext4_discard_preallocations(orig_inode, 0);
-               ext4_discard_preallocations(donor_inode, 0);
+               ext4_discard_preallocations(orig_inode);
+               ext4_discard_preallocations(donor_inode);
        }
 
        ext4_free_ext_path(path);
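
The move_extents hunks change *moved_len from a total derived from the loop
cursor after the fact into a per-iteration accumulation of what
move_extent_per_page() actually moved, so a partial failure no longer
over-reports progress. A toy sketch of the difference (the step function is
made up):

#include <stdio.h>

/* Made-up step: tries to move `want` units but may come up short. */
static int move_step(int want, int *budget)
{
        int got = want < *budget ? want : *budget;

        *budget -= got;
        return got;
}

int main(void)
{
        int budget = 6, moved_len = 0;
        int o_start = 0, o_end = 12, chunk = 4;

        while (o_start < o_end) {
                int got = move_step(chunk, &budget);

                moved_len += got;       /* accumulate actual progress */
                o_start += chunk;       /* cursor advances regardless */
                if (got < chunk)
                        break;          /* partial move: stop */
        }

        /* The cursor-derived total (the old calculation) over-reports. */
        printf("actual=%d cursor-derived=%d\n", moved_len, o_start);
        return 0;
}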
index 05b647e6bc19547a117214b17226d97420887ce6..5e4f65c14dfb9e2d508f259fd2bb7013b7da38a8 100644 (file)
@@ -1762,7 +1762,6 @@ static struct buffer_head *ext4_lookup_entry(struct inode *dir,
        struct buffer_head *bh;
 
        err = ext4_fname_prepare_lookup(dir, dentry, &fname);
-       generic_set_encrypted_ci_d_ops(dentry);
        if (err == -ENOENT)
                return NULL;
        if (err)
index dcba0f85dfe245ab83598d5a451783b02044be52..a8ba84eabab2c28df962174ba2b752c3cd928676 100644 (file)
@@ -1359,14 +1359,14 @@ static void ext4_put_super(struct super_block *sb)
 
        sync_blockdev(sb->s_bdev);
        invalidate_bdev(sb->s_bdev);
-       if (sbi->s_journal_bdev_handle) {
+       if (sbi->s_journal_bdev_file) {
                /*
                 * Invalidate the journal device's buffers.  We don't want them
                 * floating about in memory - the physical journal device may
                  * be hotswapped, and it breaks the `ro-after' testing code.
                 */
-               sync_blockdev(sbi->s_journal_bdev_handle->bdev);
-               invalidate_bdev(sbi->s_journal_bdev_handle->bdev);
+               sync_blockdev(file_bdev(sbi->s_journal_bdev_file));
+               invalidate_bdev(file_bdev(sbi->s_journal_bdev_file));
        }
 
        ext4_xattr_destroy_cache(sbi->s_ea_inode_cache);
@@ -1525,7 +1525,7 @@ void ext4_clear_inode(struct inode *inode)
        ext4_fc_del(inode);
        invalidate_inode_buffers(inode);
        clear_inode(inode);
-       ext4_discard_preallocations(inode, 0);
+       ext4_discard_preallocations(inode);
        ext4_es_remove_extent(inode, 0, EXT_MAX_BLOCKS);
        dquot_drop(inode);
        if (EXT4_I(inode)->jinode) {
@@ -4233,7 +4233,7 @@ int ext4_calculate_overhead(struct super_block *sb)
         * Add the internal journal blocks whether the journal has been
         * loaded or not
         */
-       if (sbi->s_journal && !sbi->s_journal_bdev_handle)
+       if (sbi->s_journal && !sbi->s_journal_bdev_file)
                overhead += EXT4_NUM_B2C(sbi, sbi->s_journal->j_total_len);
        else if (ext4_has_feature_journal(sb) && !sbi->s_journal && j_inum) {
                /* j_inum for internal journal is non-zero */
@@ -5346,7 +5346,7 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
                sb->s_qcop = &ext4_qctl_operations;
        sb->s_quota_types = QTYPE_MASK_USR | QTYPE_MASK_GRP | QTYPE_MASK_PRJ;
 #endif
-       memcpy(&sb->s_uuid, es->s_uuid, sizeof(es->s_uuid));
+       super_set_uuid(sb, es->s_uuid, sizeof(es->s_uuid));
 
        INIT_LIST_HEAD(&sbi->s_orphan); /* unlinked but open files */
        mutex_init(&sbi->s_orphan_lock);
@@ -5484,6 +5484,7 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
                goto failed_mount4;
        }
 
+       generic_set_sb_d_ops(sb);
        sb->s_root = d_make_root(root);
        if (!sb->s_root) {
                ext4_msg(sb, KERN_ERR, "get root dentry failed");
@@ -5670,9 +5671,9 @@ failed_mount:
 #endif
        fscrypt_free_dummy_policy(&sbi->s_dummy_enc_policy);
        brelse(sbi->s_sbh);
-       if (sbi->s_journal_bdev_handle) {
-               invalidate_bdev(sbi->s_journal_bdev_handle->bdev);
-               bdev_release(sbi->s_journal_bdev_handle);
+       if (sbi->s_journal_bdev_file) {
+               invalidate_bdev(file_bdev(sbi->s_journal_bdev_file));
+               fput(sbi->s_journal_bdev_file);
        }
 out_fail:
        invalidate_bdev(sb->s_bdev);
@@ -5842,30 +5843,30 @@ static journal_t *ext4_open_inode_journal(struct super_block *sb,
        return journal;
 }
 
-static struct bdev_handle *ext4_get_journal_blkdev(struct super_block *sb,
+static struct file *ext4_get_journal_blkdev(struct super_block *sb,
                                        dev_t j_dev, ext4_fsblk_t *j_start,
                                        ext4_fsblk_t *j_len)
 {
        struct buffer_head *bh;
        struct block_device *bdev;
-       struct bdev_handle *bdev_handle;
+       struct file *bdev_file;
        int hblock, blocksize;
        ext4_fsblk_t sb_block;
        unsigned long offset;
        struct ext4_super_block *es;
        int errno;
 
-       bdev_handle = bdev_open_by_dev(j_dev,
+       bdev_file = bdev_file_open_by_dev(j_dev,
                BLK_OPEN_READ | BLK_OPEN_WRITE | BLK_OPEN_RESTRICT_WRITES,
                sb, &fs_holder_ops);
-       if (IS_ERR(bdev_handle)) {
+       if (IS_ERR(bdev_file)) {
                ext4_msg(sb, KERN_ERR,
                         "failed to open journal device unknown-block(%u,%u) %ld",
-                        MAJOR(j_dev), MINOR(j_dev), PTR_ERR(bdev_handle));
-               return bdev_handle;
+                        MAJOR(j_dev), MINOR(j_dev), PTR_ERR(bdev_file));
+               return bdev_file;
        }
 
-       bdev = bdev_handle->bdev;
+       bdev = file_bdev(bdev_file);
        blocksize = sb->s_blocksize;
        hblock = bdev_logical_block_size(bdev);
        if (blocksize < hblock) {
@@ -5912,12 +5913,12 @@ static struct bdev_handle *ext4_get_journal_blkdev(struct super_block *sb,
        *j_start = sb_block + 1;
        *j_len = ext4_blocks_count(es);
        brelse(bh);
-       return bdev_handle;
+       return bdev_file;
 
 out_bh:
        brelse(bh);
 out_bdev:
-       bdev_release(bdev_handle);
+       fput(bdev_file);
        return ERR_PTR(errno);
 }
 
@@ -5927,14 +5928,14 @@ static journal_t *ext4_open_dev_journal(struct super_block *sb,
        journal_t *journal;
        ext4_fsblk_t j_start;
        ext4_fsblk_t j_len;
-       struct bdev_handle *bdev_handle;
+       struct file *bdev_file;
        int errno = 0;
 
-       bdev_handle = ext4_get_journal_blkdev(sb, j_dev, &j_start, &j_len);
-       if (IS_ERR(bdev_handle))
-               return ERR_CAST(bdev_handle);
+       bdev_file = ext4_get_journal_blkdev(sb, j_dev, &j_start, &j_len);
+       if (IS_ERR(bdev_file))
+               return ERR_CAST(bdev_file);
 
-       journal = jbd2_journal_init_dev(bdev_handle->bdev, sb->s_bdev, j_start,
+       journal = jbd2_journal_init_dev(file_bdev(bdev_file), sb->s_bdev, j_start,
                                        j_len, sb->s_blocksize);
        if (IS_ERR(journal)) {
                ext4_msg(sb, KERN_ERR, "failed to create device journal");
@@ -5949,14 +5950,14 @@ static journal_t *ext4_open_dev_journal(struct super_block *sb,
                goto out_journal;
        }
        journal->j_private = sb;
-       EXT4_SB(sb)->s_journal_bdev_handle = bdev_handle;
+       EXT4_SB(sb)->s_journal_bdev_file = bdev_file;
        ext4_init_journal_params(sb, journal);
        return journal;
 
 out_journal:
        jbd2_journal_destroy(journal);
 out_bdev:
-       bdev_release(bdev_handle);
+       fput(bdev_file);
        return ERR_PTR(errno);
 }
 
@@ -7314,12 +7315,12 @@ static inline int ext3_feature_set_ok(struct super_block *sb)
 static void ext4_kill_sb(struct super_block *sb)
 {
        struct ext4_sb_info *sbi = EXT4_SB(sb);
-       struct bdev_handle *handle = sbi ? sbi->s_journal_bdev_handle : NULL;
+       struct file *bdev_file = sbi ? sbi->s_journal_bdev_file : NULL;
 
        kill_block_super(sb);
 
-       if (handle)
-               bdev_release(handle);
+       if (bdev_file)
+               fput(bdev_file);
 }
 
 static struct file_system_type ext4_fs_type = {
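
The pattern above repeats across this merge: the short-lived struct bdev_handle is replaced by a plain struct file returned from bdev_file_open_by_dev()/bdev_file_open_by_path(), the underlying device is recovered with file_bdev(), and the reference is dropped with an ordinary fput() instead of bdev_release(). A minimal sketch of the new lifecycle, using only the calls visible in these hunks:

	static int open_and_release(dev_t dev, void *holder)
	{
		struct file *bdev_file;
		struct block_device *bdev;

		bdev_file = bdev_file_open_by_dev(dev,
				BLK_OPEN_READ | BLK_OPEN_WRITE, holder,
				&fs_holder_ops);
		if (IS_ERR(bdev_file))
			return PTR_ERR(bdev_file);

		bdev = file_bdev(bdev_file);	/* the block_device, for actual I/O */
		/* ... use bdev ... */

		fput(bdev_file);		/* replaces bdev_release(handle) */
		return 0;
	}
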
index 75bf1f88843c4ce96c285423228a70ff77c08517..645240cc0229fe4a2eda4499ae4a834fe3bd3a66 100644 (file)
@@ -92,10 +92,12 @@ static const char *ext4_get_link(struct dentry *dentry, struct inode *inode,
 
        if (!dentry) {
                bh = ext4_getblk(NULL, inode, 0, EXT4_GET_BLOCKS_CACHED_NOWAIT);
-               if (IS_ERR(bh))
-                       return ERR_CAST(bh);
-               if (!bh || !ext4_buffer_uptodate(bh))
+               if (IS_ERR(bh) || !bh)
                        return ERR_PTR(-ECHILD);
+               if (!ext4_buffer_uptodate(bh)) {
+                       brelse(bh);
+                       return ERR_PTR(-ECHILD);
+               }
        } else {
                bh = ext4_bread(NULL, inode, 0, 0);
                if (IS_ERR(bh))
index 65294e3b0bef880a424c480ba4f4997b23c6b59c..4c77e8ce5c7514461831d4c02e9c42a136aec3c1 100644 (file)
@@ -24,6 +24,7 @@
 #include <linux/blkdev.h>
 #include <linux/quotaops.h>
 #include <linux/part_stat.h>
+#include <linux/rw_hint.h>
 #include <crypto/hash.h>
 
 #include <linux/fscrypt.h>
@@ -1239,7 +1240,7 @@ struct f2fs_bio_info {
 #define FDEV(i)                                (sbi->devs[i])
 #define RDEV(i)                                (raw_super->devs[i])
 struct f2fs_dev_info {
-       struct bdev_handle *bdev_handle;
+       struct file *bdev_file;
        struct block_device *bdev;
        char path[MAX_PATH_LEN];
        unsigned int total_segments;
@@ -3364,17 +3365,6 @@ static inline bool f2fs_cp_error(struct f2fs_sb_info *sbi)
        return is_set_ckpt_flags(sbi, CP_ERROR_FLAG);
 }
 
-static inline bool is_dot_dotdot(const u8 *name, size_t len)
-{
-       if (len == 1 && name[0] == '.')
-               return true;
-
-       if (len == 2 && name[0] == '.' && name[1] == '.')
-               return true;
-
-       return false;
-}
-
 static inline void *f2fs_kmalloc(struct f2fs_sb_info *sbi,
                                        size_t size, gfp_t flags)
 {
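
The removed is_dot_dotdot() helper is presumably superseded by a shared VFS equivalent (its new home is not visible in this excerpt). For reference, the predicate the deleted lines implemented, restated as a single expression:

	/* true for the "." and ".." directory entries */
	static inline bool is_dot_dotdot(const u8 *name, size_t len)
	{
		return (len == 1 && name[0] == '.') ||
		       (len == 2 && name[0] == '.' && name[1] == '.');
	}
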
index b3bb815fc6aa45b957b89da9feae04bad9f0a9a8..f7f63a567d869d66e024b0334a06e1cc83c8b6f4 100644 (file)
@@ -531,7 +531,6 @@ static struct dentry *f2fs_lookup(struct inode *dir, struct dentry *dentry,
        }
 
        err = f2fs_prepare_lookup(dir, dentry, &fname);
-       generic_set_encrypted_ci_d_ops(dentry);
        if (err == -ENOENT)
                goto out_splice;
        if (err)
index 4c8836ded90fc253a593a8348fb8ab78cf9eab40..e1065ba702076131067091fe173b19edab41b5d1 100644 (file)
@@ -1971,9 +1971,15 @@ static int __f2fs_issue_discard_zone(struct f2fs_sb_info *sbi,
                }
 
                if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING))) {
+                       unsigned int nofs_flags;
+                       int ret;
+
                        trace_f2fs_issue_reset_zone(bdev, blkstart);
-                       return blkdev_zone_mgmt(bdev, REQ_OP_ZONE_RESET,
-                                               sector, nr_sects, GFP_NOFS);
+                       nofs_flags = memalloc_nofs_save();
+                       ret = blkdev_zone_mgmt(bdev, REQ_OP_ZONE_RESET,
+                                               sector, nr_sects);
+                       memalloc_nofs_restore(nofs_flags);
+                       return ret;
                }
 
                __queue_zone_reset_cmd(sbi, bdev, blkstart, lblkstart, blklen);
@@ -4865,6 +4871,7 @@ static int check_zone_write_pointer(struct f2fs_sb_info *sbi,
        block_t zone_block, valid_block_cnt;
        unsigned int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
        int ret;
+       unsigned int nofs_flags;
 
        if (zone->type != BLK_ZONE_TYPE_SEQWRITE_REQ)
                return 0;
@@ -4912,8 +4919,10 @@ static int check_zone_write_pointer(struct f2fs_sb_info *sbi,
                    "pointer: valid block[0x%x,0x%x] cond[0x%x]",
                    zone_segno, valid_block_cnt, zone->cond);
 
+       nofs_flags = memalloc_nofs_save();
        ret = blkdev_zone_mgmt(fdev->bdev, REQ_OP_ZONE_FINISH,
-                               zone->start, zone->len, GFP_NOFS);
+                               zone->start, zone->len);
+       memalloc_nofs_restore(nofs_flags);
        if (ret == -EOPNOTSUPP) {
                ret = blkdev_issue_zeroout(fdev->bdev, zone->wp,
                                        zone->len - (zone->wp - zone->start),
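
Both hunks above reflect the same interface change: blkdev_zone_mgmt() no longer takes a gfp_t, so callers that relied on GFP_NOFS must now scope the allocation context themselves. The resulting pattern, exactly as used here:

	unsigned int nofs_flags;
	int ret;

	nofs_flags = memalloc_nofs_save();	/* allocations below behave as GFP_NOFS */
	ret = blkdev_zone_mgmt(bdev, REQ_OP_ZONE_RESET, sector, nr_sects);
	memalloc_nofs_restore(nofs_flags);
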
index d45ab0992ae5947e6f89628e8e8829c548645d26..b880b746f2263f72868f27b7f43247637f068950 100644 (file)
@@ -1605,7 +1605,7 @@ static void destroy_device_list(struct f2fs_sb_info *sbi)
 
        for (i = 0; i < sbi->s_ndevs; i++) {
                if (i > 0)
-                       bdev_release(FDEV(i).bdev_handle);
+                       fput(FDEV(i).bdev_file);
 #ifdef CONFIG_BLK_DEV_ZONED
                kvfree(FDEV(i).blkz_seq);
 #endif
@@ -4247,7 +4247,7 @@ static int f2fs_scan_devices(struct f2fs_sb_info *sbi)
 
        for (i = 0; i < max_devices; i++) {
                if (i == 0)
-                       FDEV(0).bdev_handle = sbi->sb->s_bdev_handle;
+                       FDEV(0).bdev_file = sbi->sb->s_bdev_file;
                else if (!RDEV(i).path[0])
                        break;
 
@@ -4267,14 +4267,14 @@ static int f2fs_scan_devices(struct f2fs_sb_info *sbi)
                                FDEV(i).end_blk = FDEV(i).start_blk +
                                        (FDEV(i).total_segments <<
                                        sbi->log_blocks_per_seg) - 1;
-                               FDEV(i).bdev_handle = bdev_open_by_path(
+                               FDEV(i).bdev_file = bdev_file_open_by_path(
                                        FDEV(i).path, mode, sbi->sb, NULL);
                        }
                }
-               if (IS_ERR(FDEV(i).bdev_handle))
-                       return PTR_ERR(FDEV(i).bdev_handle);
+               if (IS_ERR(FDEV(i).bdev_file))
+                       return PTR_ERR(FDEV(i).bdev_file);
 
-               FDEV(i).bdev = FDEV(i).bdev_handle->bdev;
+               FDEV(i).bdev = file_bdev(FDEV(i).bdev_file);
                /* to release errored devices */
                sbi->s_ndevs = i + 1;
 
@@ -4496,7 +4496,7 @@ try_onemore:
        sb->s_time_gran = 1;
        sb->s_flags = (sb->s_flags & ~SB_POSIXACL) |
                (test_opt(sbi, POSIX_ACL) ? SB_POSIXACL : 0);
-       memcpy(&sb->s_uuid, raw_super->uuid, sizeof(raw_super->uuid));
+       super_set_uuid(sb, (void *) raw_super->uuid, sizeof(raw_super->uuid));
        sb->s_iflags |= SB_I_CGROUPWB;
 
        /* init f2fs-specific super block info */
@@ -4660,6 +4660,7 @@ try_onemore:
                goto free_node_inode;
        }
 
+       generic_set_sb_d_ops(sb);
        sb->s_root = d_make_root(root); /* allocate root dentry */
        if (!sb->s_root) {
                err = -ENOMEM;
index 1fac3dabf13031fdf02e257f363d451190721a9e..5c813696d1ff282220ee5a28b1173d0eb02aaa25 100644 (file)
@@ -1762,6 +1762,9 @@ int fat_fill_super(struct super_block *sb, void *data, int silent, int isvfat,
        else /* fat 16 or 12 */
                sbi->vol_id = bpb.fat16_vol_id;
 
+       __le32 vol_id_le = cpu_to_le32(sbi->vol_id);
+       super_set_uuid(sb, (void *) &vol_id_le, sizeof(vol_id_le));
+
        sbi->dir_per_block = sb->s_blocksize / sizeof(struct msdos_dir_entry);
        sbi->dir_per_block_bits = ffs(sbi->dir_per_block) - 1;
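
Here, as in the ext4, f2fs, and gfs2 hunks, the open-coded memcpy() into sb->s_uuid becomes super_set_uuid(), which also records the UUID length that the new FS_IOC_GETFSUUID ioctl (further down in this merge) reports to userspace. A sketch of the helper's likely shape, inferred from its callers; the body is an assumption, not taken from this excerpt:

	void super_set_uuid(struct super_block *sb, const u8 *uuid, unsigned len)
	{
		if (len > sizeof(sb->s_uuid))
			len = sizeof(sb->s_uuid);
		sb->s_uuid_len = len;
		memcpy(&sb->s_uuid, uuid, len);
	}
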
 
index c80a6acad742fb027a4a990f63ef05d0678736d7..54cc85d3338ed5afb12021e02a4fe6877aa675af 100644 (file)
@@ -27,6 +27,7 @@
 #include <linux/memfd.h>
 #include <linux/compat.h>
 #include <linux/mount.h>
+#include <linux/rw_hint.h>
 
 #include <linux/poll.h>
 #include <asm/siginfo.h>
@@ -268,8 +269,15 @@ static int f_getowner_uids(struct file *filp, unsigned long arg)
 }
 #endif
 
-static bool rw_hint_valid(enum rw_hint hint)
+static bool rw_hint_valid(u64 hint)
 {
+       BUILD_BUG_ON(WRITE_LIFE_NOT_SET != RWH_WRITE_LIFE_NOT_SET);
+       BUILD_BUG_ON(WRITE_LIFE_NONE != RWH_WRITE_LIFE_NONE);
+       BUILD_BUG_ON(WRITE_LIFE_SHORT != RWH_WRITE_LIFE_SHORT);
+       BUILD_BUG_ON(WRITE_LIFE_MEDIUM != RWH_WRITE_LIFE_MEDIUM);
+       BUILD_BUG_ON(WRITE_LIFE_LONG != RWH_WRITE_LIFE_LONG);
+       BUILD_BUG_ON(WRITE_LIFE_EXTREME != RWH_WRITE_LIFE_EXTREME);
+
        switch (hint) {
        case RWH_WRITE_LIFE_NOT_SET:
        case RWH_WRITE_LIFE_NONE:
@@ -283,34 +291,40 @@ static bool rw_hint_valid(enum rw_hint hint)
        }
 }
 
-static long fcntl_rw_hint(struct file *file, unsigned int cmd,
-                         unsigned long arg)
+static long fcntl_get_rw_hint(struct file *file, unsigned int cmd,
+                             unsigned long arg)
 {
        struct inode *inode = file_inode(file);
        u64 __user *argp = (u64 __user *)arg;
-       enum rw_hint hint;
-       u64 h;
+       u64 hint = READ_ONCE(inode->i_write_hint);
 
-       switch (cmd) {
-       case F_GET_RW_HINT:
-               h = inode->i_write_hint;
-               if (copy_to_user(argp, &h, sizeof(*argp)))
-                       return -EFAULT;
-               return 0;
-       case F_SET_RW_HINT:
-               if (copy_from_user(&h, argp, sizeof(h)))
-                       return -EFAULT;
-               hint = (enum rw_hint) h;
-               if (!rw_hint_valid(hint))
-                       return -EINVAL;
+       if (copy_to_user(argp, &hint, sizeof(*argp)))
+               return -EFAULT;
+       return 0;
+}
 
-               inode_lock(inode);
-               inode->i_write_hint = hint;
-               inode_unlock(inode);
-               return 0;
-       default:
+static long fcntl_set_rw_hint(struct file *file, unsigned int cmd,
+                             unsigned long arg)
+{
+       struct inode *inode = file_inode(file);
+       u64 __user *argp = (u64 __user *)arg;
+       u64 hint;
+
+       if (copy_from_user(&hint, argp, sizeof(hint)))
+               return -EFAULT;
+       if (!rw_hint_valid(hint))
                return -EINVAL;
-       }
+
+       WRITE_ONCE(inode->i_write_hint, hint);
+
+       /*
+        * file->f_mapping->host may differ from inode. As an example,
+        * blkdev_open() modifies file->f_mapping.
+        */
+       if (file->f_mapping->host != inode)
+               WRITE_ONCE(file->f_mapping->host->i_write_hint, hint);
+
+       return 0;
 }
 
 static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
@@ -416,8 +430,10 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
                err = memfd_fcntl(filp, cmd, argi);
                break;
        case F_GET_RW_HINT:
+               err = fcntl_get_rw_hint(filp, cmd, arg);
+               break;
        case F_SET_RW_HINT:
-               err = fcntl_rw_hint(filp, cmd, arg);
+               err = fcntl_set_rw_hint(filp, cmd, arg);
                break;
        default:
                break;
@@ -846,12 +862,6 @@ int send_sigurg(struct fown_struct *fown)
 static DEFINE_SPINLOCK(fasync_lock);
 static struct kmem_cache *fasync_cache __ro_after_init;
 
-static void fasync_free_rcu(struct rcu_head *head)
-{
-       kmem_cache_free(fasync_cache,
-                       container_of(head, struct fasync_struct, fa_rcu));
-}
-
 /*
  * Remove a fasync entry. If successfully removed, return
  * positive and clear the FASYNC flag. If no entry exists,
@@ -877,7 +887,7 @@ int fasync_remove_entry(struct file *filp, struct fasync_struct **fapp)
                write_unlock_irq(&fa->fa_lock);
 
                *fp = fa->fa_next;
-               call_rcu(&fa->fa_rcu, fasync_free_rcu);
+               kfree_rcu(fa, fa_rcu);
                filp->f_flags &= ~FASYNC;
                result = 1;
                break;
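
F_GET_RW_HINT and F_SET_RW_HINT are now separate, lockless handlers (READ_ONCE()/WRITE_ONCE() on i_write_hint rather than inode_lock()), but the userspace ABI is unchanged. A hedged userspace sketch; the fallback constants mirror the kernel UAPI values from linux/fcntl.h and are only needed if the libc headers lack them:

	#include <fcntl.h>
	#include <stdint.h>

	#ifndef F_SET_RW_HINT			/* values mirror <linux/fcntl.h> */
	#define F_GET_RW_HINT		1035
	#define F_SET_RW_HINT		1036
	#define RWH_WRITE_LIFE_SHORT	2
	#endif

	static int set_write_life_short(int fd)
	{
		uint64_t hint = RWH_WRITE_LIFE_SHORT;

		if (fcntl(fd, F_SET_RW_HINT, &hint) == -1)
			return -1;
		return fcntl(fd, F_GET_RW_HINT, &hint);	/* read back the stored hint */
	}
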
index 18b3ba8dc8ead7c6016a1f76d96275268b0667f0..57a12614addfd434f981a5b8c981a48ae1a71ceb 100644 (file)
@@ -36,7 +36,7 @@ static long do_sys_name_to_handle(const struct path *path,
        if (f_handle.handle_bytes > MAX_HANDLE_SZ)
                return -EINVAL;
 
-       handle = kmalloc(sizeof(struct file_handle) + f_handle.handle_bytes,
+       handle = kzalloc(sizeof(struct file_handle) + f_handle.handle_bytes,
                         GFP_KERNEL);
        if (!handle)
                return -ENOMEM;
index b991f90571b4d3089a0c00884fea3e38f317b1da..6925522faa0ae53cb9105eaaed56c6a7fcdb305a 100644 (file)
@@ -276,21 +276,15 @@ struct file *alloc_empty_backing_file(int flags, const struct cred *cred)
 }
 
 /**
- * alloc_file - allocate and initialize a 'struct file'
+ * file_init_path - initialize a 'struct file' based on path
  *
+ * @file: the file to set up
  * @path: the (dentry, vfsmount) pair for the new file
- * @flags: O_... flags with which the new file will be opened
  * @fop: the 'struct file_operations' for the new file
  */
-static struct file *alloc_file(const struct path *path, int flags,
-               const struct file_operations *fop)
+static void file_init_path(struct file *file, const struct path *path,
+                          const struct file_operations *fop)
 {
-       struct file *file;
-
-       file = alloc_empty_file(flags, current_cred());
-       if (IS_ERR(file))
-               return file;
-
        file->f_path = *path;
        file->f_inode = path->dentry->d_inode;
        file->f_mapping = path->dentry->d_inode->i_mapping;
@@ -309,22 +303,51 @@ static struct file *alloc_file(const struct path *path, int flags,
        file->f_op = fop;
        if ((file->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ)
                i_readcount_inc(path->dentry->d_inode);
+}
+
+/**
+ * alloc_file - allocate and initialize a 'struct file'
+ *
+ * @path: the (dentry, vfsmount) pair for the new file
+ * @flags: O_... flags with which the new file will be opened
+ * @fop: the 'struct file_operations' for the new file
+ */
+static struct file *alloc_file(const struct path *path, int flags,
+               const struct file_operations *fop)
+{
+       struct file *file;
+
+       file = alloc_empty_file(flags, current_cred());
+       if (!IS_ERR(file))
+               file_init_path(file, path, fop);
        return file;
 }
 
-struct file *alloc_file_pseudo(struct inode *inode, struct vfsmount *mnt,
-                               const char *name, int flags,
-                               const struct file_operations *fops)
+static inline int alloc_path_pseudo(const char *name, struct inode *inode,
+                                   struct vfsmount *mnt, struct path *path)
 {
        struct qstr this = QSTR_INIT(name, strlen(name));
+
+       path->dentry = d_alloc_pseudo(mnt->mnt_sb, &this);
+       if (!path->dentry)
+               return -ENOMEM;
+       path->mnt = mntget(mnt);
+       d_instantiate(path->dentry, inode);
+       return 0;
+}
+
+struct file *alloc_file_pseudo(struct inode *inode, struct vfsmount *mnt,
+                              const char *name, int flags,
+                              const struct file_operations *fops)
+{
+       int ret;
        struct path path;
        struct file *file;
 
-       path.dentry = d_alloc_pseudo(mnt->mnt_sb, &this);
-       if (!path.dentry)
-               return ERR_PTR(-ENOMEM);
-       path.mnt = mntget(mnt);
-       d_instantiate(path.dentry, inode);
+       ret = alloc_path_pseudo(name, inode, mnt, &path);
+       if (ret)
+               return ERR_PTR(ret);
+
        file = alloc_file(&path, flags, fops);
        if (IS_ERR(file)) {
                ihold(inode);
@@ -334,6 +357,30 @@ struct file *alloc_file_pseudo(struct inode *inode, struct vfsmount *mnt,
 }
 EXPORT_SYMBOL(alloc_file_pseudo);
 
+struct file *alloc_file_pseudo_noaccount(struct inode *inode,
+                                        struct vfsmount *mnt, const char *name,
+                                        int flags,
+                                        const struct file_operations *fops)
+{
+       int ret;
+       struct path path;
+       struct file *file;
+
+       ret = alloc_path_pseudo(name, inode, mnt, &path);
+       if (ret)
+               return ERR_PTR(ret);
+
+       file = alloc_empty_file_noaccount(flags, current_cred());
+       if (IS_ERR(file)) {
+               ihold(inode);
+               path_put(&path);
+               return file;
+       }
+       file_init_path(file, &path, fops);
+       return file;
+}
+EXPORT_SYMBOL_GPL(alloc_file_pseudo_noaccount);
+
 struct file *alloc_file_clone(struct file *base, int flags,
                                const struct file_operations *fops)
 {
index 3d84fcc471c6000e38625e2652121282e4bec3c0..e4f17c53ddfcf345bda9961f239be00a98a528f4 100644 (file)
@@ -141,6 +141,31 @@ static void wb_wakeup(struct bdi_writeback *wb)
        spin_unlock_irq(&wb->work_lock);
 }
 
+/*
+ * This function is used when the first inode for this wb is marked dirty. It
+ * wakes up the corresponding bdi thread which should then take care of the
+ * periodic background write-out of dirty inodes. Since the write-out would
+ * start only 'dirty_writeback_interval' centisecs from now anyway, we just
+ * set up a timer which wakes the bdi thread up later.
+ *
+ * Note, we wouldn't bother setting up the timer, but this function is on the
+ * fast-path (used by '__mark_inode_dirty()'), so we save a few context switches
+ * by delaying the wake-up.
+ *
+ * We have to be careful not to postpone flush work if it is scheduled for
+ * earlier. Thus we use queue_delayed_work().
+ */
+static void wb_wakeup_delayed(struct bdi_writeback *wb)
+{
+       unsigned long timeout;
+
+       timeout = msecs_to_jiffies(dirty_writeback_interval * 10);
+       spin_lock_irq(&wb->work_lock);
+       if (test_bit(WB_registered, &wb->state))
+               queue_delayed_work(bdi_wq, &wb->dwork, timeout);
+       spin_unlock_irq(&wb->work_lock);
+}
+
 static void finish_writeback_work(struct bdi_writeback *wb,
                                  struct wb_writeback_work *work)
 {
index edb3712dcfa5805aa60364bd0c794c257ae04734..a4d6ca0b8971e65b1e3b832d591ac01bad10f8d8 100644 (file)
@@ -83,8 +83,8 @@ static const struct fs_parameter_spec *fs_lookup_key(
 }
 
 /*
- * fs_parse - Parse a filesystem configuration parameter
- * @fc: The filesystem context to log errors through.
+ * __fs_parse - Parse a filesystem configuration parameter
+ * @log: The filesystem context to log errors through.
  * @desc: The parameter description to use.
  * @param: The parameter.
  * @result: Where to place the result of the parse
index 91e89e68177ee4bd686a920b9dfad4978d8d5062..b6cad106c37e44258bd6e4433cd4aaedfbb98f65 100644 (file)
@@ -474,8 +474,7 @@ err:
 
 static void cuse_fc_release(struct fuse_conn *fc)
 {
-       struct cuse_conn *cc = fc_to_cc(fc);
-       kfree_rcu(cc, fc.rcu);
+       kfree(fc_to_cc(fc));
 }
 
 /**
index 148a71b8b4d0e51037b6b59c754dabb9d3273173..c007b0f0c3a7e73bf222eab2928fc86d544aa692 100644 (file)
@@ -2509,14 +2509,14 @@ static int convert_fuse_file_lock(struct fuse_conn *fc,
                 * translate it into the caller's pid namespace.
                 */
                rcu_read_lock();
-               fl->fl_pid = pid_nr_ns(find_pid_ns(ffl->pid, fc->pid_ns), &init_pid_ns);
+               fl->c.flc_pid = pid_nr_ns(find_pid_ns(ffl->pid, fc->pid_ns), &init_pid_ns);
                rcu_read_unlock();
                break;
 
        default:
                return -EIO;
        }
-       fl->fl_type = ffl->type;
+       fl->c.flc_type = ffl->type;
        return 0;
 }
 
@@ -2530,10 +2530,10 @@ static void fuse_lk_fill(struct fuse_args *args, struct file *file,
 
        memset(inarg, 0, sizeof(*inarg));
        inarg->fh = ff->fh;
-       inarg->owner = fuse_lock_owner_id(fc, fl->fl_owner);
+       inarg->owner = fuse_lock_owner_id(fc, fl->c.flc_owner);
        inarg->lk.start = fl->fl_start;
        inarg->lk.end = fl->fl_end;
-       inarg->lk.type = fl->fl_type;
+       inarg->lk.type = fl->c.flc_type;
        inarg->lk.pid = pid;
        if (flock)
                inarg->lk_flags |= FUSE_LK_FLOCK;
@@ -2570,8 +2570,8 @@ static int fuse_setlk(struct file *file, struct file_lock *fl, int flock)
        struct fuse_mount *fm = get_fuse_mount(inode);
        FUSE_ARGS(args);
        struct fuse_lk_in inarg;
-       int opcode = (fl->fl_flags & FL_SLEEP) ? FUSE_SETLKW : FUSE_SETLK;
-       struct pid *pid = fl->fl_type != F_UNLCK ? task_tgid(current) : NULL;
+       int opcode = (fl->c.flc_flags & FL_SLEEP) ? FUSE_SETLKW : FUSE_SETLK;
+       struct pid *pid = fl->c.flc_type != F_UNLCK ? task_tgid(current) : NULL;
        pid_t pid_nr = pid_nr_ns(pid, fm->fc->pid_ns);
        int err;
 
@@ -2581,7 +2581,7 @@ static int fuse_setlk(struct file *file, struct file_lock *fl, int flock)
        }
 
        /* Unlock on close is handled by the flush method */
-       if ((fl->fl_flags & FL_CLOSE_POSIX) == FL_CLOSE_POSIX)
+       if ((fl->c.flc_flags & FL_CLOSE_POSIX) == FL_CLOSE_POSIX)
                return 0;
 
        fuse_lk_fill(&args, file, fl, opcode, pid_nr, flock, &inarg);
index 1df83eebda92771d20a42ea2aaefa118effcbc77..bcbe34488862752154ca2284386baacadf972744 100644 (file)
@@ -888,6 +888,7 @@ struct fuse_mount {
 
        /* Entry on fc->mounts */
        struct list_head fc_entry;
+       struct rcu_head rcu;
 };
 
 static inline struct fuse_mount *get_fuse_mount_super(struct super_block *sb)
index 2a6d44f91729bbd7e3bf1c955a952ecdd695bd0f..516ea2979a90ff2d0eff63a71dc6b8edc4c91b98 100644 (file)
@@ -930,6 +930,14 @@ void fuse_conn_init(struct fuse_conn *fc, struct fuse_mount *fm,
 }
 EXPORT_SYMBOL_GPL(fuse_conn_init);
 
+static void delayed_release(struct rcu_head *p)
+{
+       struct fuse_conn *fc = container_of(p, struct fuse_conn, rcu);
+
+       put_user_ns(fc->user_ns);
+       fc->release(fc);
+}
+
 void fuse_conn_put(struct fuse_conn *fc)
 {
        if (refcount_dec_and_test(&fc->count)) {
@@ -941,13 +949,12 @@ void fuse_conn_put(struct fuse_conn *fc)
                if (fiq->ops->release)
                        fiq->ops->release(fiq);
                put_pid_ns(fc->pid_ns);
-               put_user_ns(fc->user_ns);
                bucket = rcu_dereference_protected(fc->curr_bucket, 1);
                if (bucket) {
                        WARN_ON(atomic_read(&bucket->count) != 1);
                        kfree(bucket);
                }
-               fc->release(fc);
+               call_rcu(&fc->rcu, delayed_release);
        }
 }
 EXPORT_SYMBOL_GPL(fuse_conn_put);
@@ -1366,7 +1373,7 @@ EXPORT_SYMBOL_GPL(fuse_send_init);
 void fuse_free_conn(struct fuse_conn *fc)
 {
        WARN_ON(!list_empty(&fc->devices));
-       kfree_rcu(fc, rcu);
+       kfree(fc);
 }
 EXPORT_SYMBOL_GPL(fuse_free_conn);
 
@@ -1902,7 +1909,7 @@ static void fuse_sb_destroy(struct super_block *sb)
 void fuse_mount_destroy(struct fuse_mount *fm)
 {
        fuse_conn_put(fm->fc);
-       kfree(fm);
+       kfree_rcu(fm, rcu);
 }
 EXPORT_SYMBOL(fuse_mount_destroy);
 
index d9ccfd27e4f11fe4ecc7ce36981cbef469942847..789af5c8fade9d86354f86a6a7ffe696a9f5447d 100644 (file)
@@ -2465,7 +2465,7 @@ out:
 }
 
 static int gfs2_map_blocks(struct iomap_writepage_ctx *wpc, struct inode *inode,
-               loff_t offset)
+               loff_t offset, unsigned int len)
 {
        int ret;
 
index 177f1f41f225458344cd000147d71079c19689ab..2e215e8c3c88e57d6ed17ba6cc5cb22420e99af6 100644 (file)
 
 static int gfs2_drevalidate(struct dentry *dentry, unsigned int flags)
 {
-       struct dentry *parent = NULL;
+       struct dentry *parent;
        struct gfs2_sbd *sdp;
        struct gfs2_inode *dip;
-       struct inode *dinode, *inode;
+       struct inode *inode;
        struct gfs2_holder d_gh;
        struct gfs2_inode *ip = NULL;
        int error, valid = 0;
        int had_lock = 0;
 
-       if (flags & LOOKUP_RCU) {
-               dinode = d_inode_rcu(READ_ONCE(dentry->d_parent));
-               if (!dinode)
-                       return -ECHILD;
-       } else {
-               parent = dget_parent(dentry);
-               dinode = d_inode(parent);
-       }
-       sdp = GFS2_SB(dinode);
-       dip = GFS2_I(dinode);
+       if (flags & LOOKUP_RCU)
+               return -ECHILD;
+
+       parent = dget_parent(dentry);
+       sdp = GFS2_SB(d_inode(parent));
+       dip = GFS2_I(d_inode(parent));
        inode = d_inode(dentry);
 
        if (inode) {
@@ -66,8 +62,7 @@ static int gfs2_drevalidate(struct dentry *dentry, unsigned int flags)
 
        had_lock = (gfs2_glock_is_locked_by_me(dip->i_gl) != NULL);
        if (!had_lock) {
-               error = gfs2_glock_nq_init(dip->i_gl, LM_ST_SHARED,
-                                          flags & LOOKUP_RCU ? GL_NOBLOCK : 0, &d_gh);
+               error = gfs2_glock_nq_init(dip->i_gl, LM_ST_SHARED, 0, &d_gh);
                if (error)
                        goto out;
        }
index 992ca4effb505ec2936a403ea31fac4391cba76e..4c42ada60ae7a4b2fabbfd17ae1508a3f5d82a39 100644 (file)
@@ -1440,10 +1440,10 @@ static int gfs2_lock(struct file *file, int cmd, struct file_lock *fl)
        struct gfs2_sbd *sdp = GFS2_SB(file->f_mapping->host);
        struct lm_lockstruct *ls = &sdp->sd_lockstruct;
 
-       if (!(fl->fl_flags & FL_POSIX))
+       if (!(fl->c.flc_flags & FL_POSIX))
                return -ENOLCK;
        if (gfs2_withdrawing_or_withdrawn(sdp)) {
-               if (fl->fl_type == F_UNLCK)
+               if (lock_is_unlock(fl))
                        locks_lock_file_wait(file, fl);
                return -EIO;
        }
@@ -1451,7 +1451,7 @@ static int gfs2_lock(struct file *file, int cmd, struct file_lock *fl)
                return dlm_posix_cancel(ls->ls_dlm, ip->i_no_addr, file, fl);
        else if (IS_GETLK(cmd))
                return dlm_posix_get(ls->ls_dlm, ip->i_no_addr, file, fl);
-       else if (fl->fl_type == F_UNLCK)
+       else if (lock_is_unlock(fl))
                return dlm_posix_unlock(ls->ls_dlm, ip->i_no_addr, file, fl);
        else
                return dlm_posix_lock(ls->ls_dlm, ip->i_no_addr, file, cmd, fl);
@@ -1483,7 +1483,7 @@ static int do_flock(struct file *file, int cmd, struct file_lock *fl)
        int error = 0;
        int sleeptime;
 
-       state = (fl->fl_type == F_WRLCK) ? LM_ST_EXCLUSIVE : LM_ST_SHARED;
+       state = lock_is_write(fl) ? LM_ST_EXCLUSIVE : LM_ST_SHARED;
        flags = GL_EXACT | GL_NOPID;
        if (!IS_SETLKW(cmd))
                flags |= LM_FLAG_TRY_1CB;
@@ -1495,8 +1495,8 @@ static int do_flock(struct file *file, int cmd, struct file_lock *fl)
                if (fl_gh->gh_state == state)
                        goto out;
                locks_init_lock(&request);
-               request.fl_type = F_UNLCK;
-               request.fl_flags = FL_FLOCK;
+               request.c.flc_type = F_UNLCK;
+               request.c.flc_flags = FL_FLOCK;
                locks_lock_file_wait(file, &request);
                gfs2_glock_dq(fl_gh);
                gfs2_holder_reinit(state, flags, fl_gh);
@@ -1557,10 +1557,10 @@ static void do_unflock(struct file *file, struct file_lock *fl)
 
 static int gfs2_flock(struct file *file, int cmd, struct file_lock *fl)
 {
-       if (!(fl->fl_flags & FL_FLOCK))
+       if (!(fl->c.flc_flags & FL_FLOCK))
                return -ENOLCK;
 
-       if (fl->fl_type == F_UNLCK) {
+       if (lock_is_unlock(fl)) {
                do_unflock(file, fl);
                return 0;
        } else {
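
These gfs2 hunks, like the fuse ones earlier, are part of a tree-wide change moving the generic locking fields of struct file_lock into an embedded struct file_lock_core, reachable as fl->c with flc_-prefixed members, plus small predicates for the common type tests. A sketch of the shape those helpers plausibly have, inferred from the call sites above:

	static inline bool lock_is_unlock(struct file_lock *fl)
	{
		return fl->c.flc_type == F_UNLCK;
	}

	static inline bool lock_is_write(struct file_lock *fl)
	{
		return fl->c.flc_type == F_WRLCK;
	}
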
index 6bfc9383b7b8eca60aad0d88c341904b572681bb..1b95db2c3aac3c9a9d5d881985e70622342b52ab 100644 (file)
@@ -1882,10 +1882,10 @@ int gfs2_permission(struct mnt_idmap *idmap, struct inode *inode,
                WARN_ON_ONCE(!may_not_block);
                return -ECHILD;
         }
-       if (gfs2_glock_is_locked_by_me(ip->i_gl) == NULL) {
-               int noblock = may_not_block ? GL_NOBLOCK : 0;
-               error = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED,
-                                          LM_FLAG_ANY | noblock, &i_gh);
+       if (gfs2_glock_is_locked_by_me(gl) == NULL) {
+               if (may_not_block)
+                       return -ECHILD;
+               error = gfs2_glock_nq_init(gl, LM_ST_SHARED, LM_FLAG_ANY, &i_gh);
                if (error)
                        return error;
        }
index 1281e60be63900764f8c6c97973e057e02897a12..572d58e86296f9117a4e476075eaacdef52394f7 100644 (file)
@@ -214,7 +214,7 @@ static void gfs2_sb_in(struct gfs2_sbd *sdp, const void *buf)
 
        memcpy(sb->sb_lockproto, str->sb_lockproto, GFS2_LOCKNAME_LEN);
        memcpy(sb->sb_locktable, str->sb_locktable, GFS2_LOCKNAME_LEN);
-       memcpy(&s->s_uuid, str->sb_uuid, 16);
+       super_set_uuid(s, str->sb_uuid, 16);
 }
 
 /**
index 7ededcb720c121794eb3782dd84ccff74685296b..012a3d003fbe6162db231058a647cb07f78ef0a9 100644 (file)
@@ -190,6 +190,7 @@ struct hfsplus_sb_info {
        int work_queued;               /* non-zero delayed work is queued */
        struct delayed_work sync_work; /* FS sync delayed work */
        spinlock_t work_lock;          /* protects sync_work and work_queued */
+       struct rcu_head rcu;
 };
 
 #define HFSPLUS_SB_WRITEBACKUP 0
index 1986b4f18a9013ee27f056b7c871df215f05f862..97920202790f944f0d03dda35cc1a83f27201470 100644 (file)
@@ -277,6 +277,14 @@ void hfsplus_mark_mdb_dirty(struct super_block *sb)
        spin_unlock(&sbi->work_lock);
 }
 
+static void delayed_free(struct rcu_head *p)
+{
+       struct hfsplus_sb_info *sbi = container_of(p, struct hfsplus_sb_info, rcu);
+
+       unload_nls(sbi->nls);
+       kfree(sbi);
+}
+
 static void hfsplus_put_super(struct super_block *sb)
 {
        struct hfsplus_sb_info *sbi = HFSPLUS_SB(sb);
@@ -302,9 +310,7 @@ static void hfsplus_put_super(struct super_block *sb)
        hfs_btree_close(sbi->ext_tree);
        kfree(sbi->s_vhdr_buf);
        kfree(sbi->s_backup_vhdr_buf);
-       unload_nls(sbi->nls);
-       kfree(sb->s_fs_info);
-       sb->s_fs_info = NULL;
+       call_rcu(&sbi->rcu, delayed_free);
 }
 
 static int hfsplus_statfs(struct dentry *dentry, struct kstatfs *buf)
index b0cb704009963c73d1021acdd70f65b01cab9911..ce9346099c72dc89b15a24d7be67a076923ebfcc 100644 (file)
@@ -30,7 +30,7 @@ struct hfsplus_wd {
  * @sector: block to read or write, for blocks of HFSPLUS_SECTOR_SIZE bytes
  * @buf: buffer for I/O
  * @data: output pointer for location of requested data
- * @opf: request op flags
+ * @opf: I/O operation type and flags
  *
  * The unit of I/O is hfsplus_min_io_size(sb), which may be bigger than
  * HFSPLUS_SECTOR_SIZE, and @buf must be sized accordingly. On reads
index ea5b8e57d904e20b964fb5e627c4bae894370401..6502c7e776d195e1d004908964f74ce5a76d6db2 100644 (file)
@@ -100,6 +100,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
        loff_t len, vma_len;
        int ret;
        struct hstate *h = hstate_file(file);
+       vm_flags_t vm_flags;
 
        /*
         * vma address alignment (but not the pgoff alignment) has
@@ -141,10 +142,20 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
        file_accessed(file);
 
        ret = -ENOMEM;
+
+       vm_flags = vma->vm_flags;
+       /*
+        * for SHM_HUGETLB, the pages are reserved in the shmget() call so skip
+        * reserving here. Note: the inode flag S_PRIVATE is set only for
+        * SHM hugetlbfs files.
+        */
+       if (inode->i_flags & S_PRIVATE)
+               vm_flags |= VM_NORESERVE;
+
        if (!hugetlb_reserve_pages(inode,
                                vma->vm_pgoff >> huge_page_order(h),
                                len >> huge_page_shift(h), vma,
-                               vma->vm_flags))
+                               vm_flags))
                goto out;
 
        ret = 0;
@@ -340,7 +351,7 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
                } else {
                        folio_unlock(folio);
 
-                       if (!folio_test_has_hwpoisoned(folio))
+                       if (!folio_test_hwpoison(folio))
                                want = nr;
                        else {
                                /*
@@ -922,7 +933,7 @@ static int hugetlbfs_setattr(struct mnt_idmap *idmap,
        unsigned int ia_valid = attr->ia_valid;
        struct hugetlbfs_inode_info *info = HUGETLBFS_I(inode);
 
-       error = setattr_prepare(&nop_mnt_idmap, dentry, attr);
+       error = setattr_prepare(idmap, dentry, attr);
        if (error)
                return error;
 
@@ -939,7 +950,7 @@ static int hugetlbfs_setattr(struct mnt_idmap *idmap,
                hugetlb_vmtruncate(inode, newsize);
        }
 
-       setattr_copy(&nop_mnt_idmap, inode, attr);
+       setattr_copy(idmap, inode, attr);
        mark_inode_dirty(inode);
        return 0;
 }
@@ -974,6 +985,7 @@ static struct inode *hugetlbfs_get_root(struct super_block *sb,
 static struct lock_class_key hugetlbfs_i_mmap_rwsem_key;
 
 static struct inode *hugetlbfs_get_inode(struct super_block *sb,
+                                       struct mnt_idmap *idmap,
                                        struct inode *dir,
                                        umode_t mode, dev_t dev)
 {
@@ -995,7 +1007,7 @@ static struct inode *hugetlbfs_get_inode(struct super_block *sb,
                struct hugetlbfs_inode_info *info = HUGETLBFS_I(inode);
 
                inode->i_ino = get_next_ino();
-               inode_init_owner(&nop_mnt_idmap, inode, dir, mode);
+               inode_init_owner(idmap, inode, dir, mode);
                lockdep_set_class(&inode->i_mapping->i_mmap_rwsem,
                                &hugetlbfs_i_mmap_rwsem_key);
                inode->i_mapping->a_ops = &hugetlbfs_aops;
@@ -1039,7 +1051,7 @@ static int hugetlbfs_mknod(struct mnt_idmap *idmap, struct inode *dir,
 {
        struct inode *inode;
 
-       inode = hugetlbfs_get_inode(dir->i_sb, dir, mode, dev);
+       inode = hugetlbfs_get_inode(dir->i_sb, idmap, dir, mode, dev);
        if (!inode)
                return -ENOSPC;
        inode_set_mtime_to_ts(dir, inode_set_ctime_current(dir));
@@ -1051,7 +1063,7 @@ static int hugetlbfs_mknod(struct mnt_idmap *idmap, struct inode *dir,
 static int hugetlbfs_mkdir(struct mnt_idmap *idmap, struct inode *dir,
                           struct dentry *dentry, umode_t mode)
 {
-       int retval = hugetlbfs_mknod(&nop_mnt_idmap, dir, dentry,
+       int retval = hugetlbfs_mknod(idmap, dir, dentry,
                                     mode | S_IFDIR, 0);
        if (!retval)
                inc_nlink(dir);
@@ -1062,7 +1074,7 @@ static int hugetlbfs_create(struct mnt_idmap *idmap,
                            struct inode *dir, struct dentry *dentry,
                            umode_t mode, bool excl)
 {
-       return hugetlbfs_mknod(&nop_mnt_idmap, dir, dentry, mode | S_IFREG, 0);
+       return hugetlbfs_mknod(idmap, dir, dentry, mode | S_IFREG, 0);
 }
 
 static int hugetlbfs_tmpfile(struct mnt_idmap *idmap,
@@ -1071,7 +1083,7 @@ static int hugetlbfs_tmpfile(struct mnt_idmap *idmap,
 {
        struct inode *inode;
 
-       inode = hugetlbfs_get_inode(dir->i_sb, dir, mode | S_IFREG, 0);
+       inode = hugetlbfs_get_inode(dir->i_sb, idmap, dir, mode | S_IFREG, 0);
        if (!inode)
                return -ENOSPC;
        inode_set_mtime_to_ts(dir, inode_set_ctime_current(dir));
@@ -1083,10 +1095,11 @@ static int hugetlbfs_symlink(struct mnt_idmap *idmap,
                             struct inode *dir, struct dentry *dentry,
                             const char *symname)
 {
+       const umode_t mode = S_IFLNK|S_IRWXUGO;
        struct inode *inode;
        int error = -ENOSPC;
 
-       inode = hugetlbfs_get_inode(dir->i_sb, dir, S_IFLNK|S_IRWXUGO, 0);
+       inode = hugetlbfs_get_inode(dir->i_sb, idmap, dir, mode, 0);
        if (inode) {
                int l = strlen(symname)+1;
                error = page_symlink(inode, symname, l);
@@ -1354,6 +1367,7 @@ static int hugetlbfs_parse_param(struct fs_context *fc, struct fs_parameter *par
 {
        struct hugetlbfs_fs_context *ctx = fc->fs_private;
        struct fs_parse_result result;
+       struct hstate *h;
        char *rest;
        unsigned long ps;
        int opt;
@@ -1398,11 +1412,12 @@ static int hugetlbfs_parse_param(struct fs_context *fc, struct fs_parameter *par
 
        case Opt_pagesize:
                ps = memparse(param->string, &rest);
-               ctx->hstate = size_to_hstate(ps);
-               if (!ctx->hstate) {
+               h = size_to_hstate(ps);
+               if (!h) {
                        pr_err("Unsupported page size %lu MB\n", ps / SZ_1M);
                        return -EINVAL;
                }
+               ctx->hstate = h;
                return 0;
 
        case Opt_min_size:
@@ -1553,6 +1568,7 @@ static struct file_system_type hugetlbfs_fs_type = {
        .init_fs_context        = hugetlbfs_init_fs_context,
        .parameters             = hugetlb_fs_parameters,
        .kill_sb                = kill_litter_super,
+       .fs_flags               = FS_ALLOW_IDMAP,
 };
 
 static struct vfsmount *hugetlbfs_vfsmount[HUGE_MAX_HSTATE];
@@ -1606,7 +1622,9 @@ struct file *hugetlb_file_setup(const char *name, size_t size,
        }
 
        file = ERR_PTR(-ENOSPC);
-       inode = hugetlbfs_get_inode(mnt->mnt_sb, NULL, S_IFREG | S_IRWXUGO, 0);
+       /* hugetlbfs_vfsmount[] mounts do not use idmapped mounts.  */
+       inode = hugetlbfs_get_inode(mnt->mnt_sb, &nop_mnt_idmap, NULL,
+                                   S_IFREG | S_IRWXUGO, 0);
        if (!inode)
                goto out;
        if (creat_flags == HUGETLB_SHMFS_INODE)
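
With FS_ALLOW_IDMAP set and the idmap threaded through hugetlbfs_get_inode(), hugetlbfs mounts can now carry an ID mapping. A hedged userspace sketch of attaching one to a detached mount, assuming mnt_fd comes from open_tree(..., OPEN_TREE_CLONE) and userns_fd names a user namespace defining the mapping; error handling is omitted:

	#include <fcntl.h>		/* AT_EMPTY_PATH */
	#include <linux/mount.h>	/* struct mount_attr, MOUNT_ATTR_IDMAP */
	#include <sys/syscall.h>
	#include <unistd.h>

	static int attach_idmap(int mnt_fd, int userns_fd)
	{
		struct mount_attr attr = {
			.attr_set  = MOUNT_ATTR_IDMAP,
			.userns_fd = userns_fd,
		};

		return syscall(SYS_mount_setattr, mnt_fd, "", AT_EMPTY_PATH,
			       &attr, sizeof(attr));
	}
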
index 91048c4c9c9e7d1079d375afe64383c74f0c3c81..d290f007b3d13132319ea4eec2774ab9622d7ff9 100644 (file)
@@ -20,6 +20,7 @@
 #include <linux/ratelimit.h>
 #include <linux/list_lru.h>
 #include <linux/iversion.h>
+#include <linux/rw_hint.h>
 #include <trace/events/writeback.h>
 #include "internal.h"
 
@@ -588,7 +589,8 @@ void dump_mapping(const struct address_space *mapping)
        }
 
        dentry_ptr = container_of(dentry_first, struct dentry, d_u.d_alias);
-       if (get_kernel_nofault(dentry, dentry_ptr)) {
+       if (get_kernel_nofault(dentry, dentry_ptr) ||
+           !dentry.d_parent || !dentry.d_name.name) {
                pr_warn("aops:%ps ino:%lx invalid dentry:%px\n",
                                a_ops, ino, dentry_ptr);
                return;
@@ -2285,7 +2287,7 @@ void __init inode_init(void)
                                         sizeof(struct inode),
                                         0,
                                         (SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|
-                                        SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+                                        SLAB_ACCOUNT),
                                         init_once);
 
        /* Hash may have been set up in inode_init_early */
@@ -2509,7 +2511,7 @@ struct timespec64 inode_set_ctime_current(struct inode *inode)
 {
        struct timespec64 now = current_time(inode);
 
-       inode_set_ctime(inode, now.tv_sec, now.tv_nsec);
+       inode_set_ctime_to_ts(inode, now);
        return now;
 }
 EXPORT_SYMBOL(inode_set_ctime_current);
index b67406435fc02763299e8732fa4cf6dfe1d51bf6..49c1fcfee4b35a8f8f653ad916ec4a4459f1ed34 100644 (file)
@@ -183,6 +183,7 @@ extern struct open_how build_open_how(int flags, umode_t mode);
 extern int build_open_flags(const struct open_how *how, struct open_flags *op);
 struct file *file_close_fd_locked(struct files_struct *files, unsigned fd);
 
+long do_ftruncate(struct file *file, loff_t length, int small);
 long do_sys_ftruncate(unsigned int fd, loff_t length, int small);
 int chmod_common(const struct path *path, umode_t mode);
 int do_fchownat(int dfd, const char __user *filename, uid_t user, gid_t group,
@@ -310,3 +311,10 @@ ssize_t __kernel_write_iter(struct file *file, struct iov_iter *from, loff_t *po
 struct mnt_idmap *alloc_mnt_idmap(struct user_namespace *mnt_userns);
 struct mnt_idmap *mnt_idmap_get(struct mnt_idmap *idmap);
 void mnt_idmap_put(struct mnt_idmap *idmap);
+struct stashed_operations {
+       void (*put_data)(void *data);
+       void (*init_inode)(struct inode *inode, void *data);
+};
+int path_from_stashed(struct dentry **stashed, unsigned long ino,
+                     struct vfsmount *mnt, void *data, struct path *path);
+void stashed_dentry_prune(struct dentry *dentry);
index 76cf22ac97d76256104a6dbc4f819247168ce769..1d5abfdf0f22a626560b9ae6bb95309f8c146be5 100644 (file)
@@ -763,6 +763,33 @@ static int ioctl_fssetxattr(struct file *file, void __user *argp)
        return err;
 }
 
+static int ioctl_getfsuuid(struct file *file, void __user *argp)
+{
+       struct super_block *sb = file_inode(file)->i_sb;
+       struct fsuuid2 u = { .len = sb->s_uuid_len, };
+
+       if (!sb->s_uuid_len)
+               return -ENOIOCTLCMD;
+
+       memcpy(&u.uuid[0], &sb->s_uuid, sb->s_uuid_len);
+
+       return copy_to_user(argp, &u, sizeof(u)) ? -EFAULT : 0;
+}
+
+static int ioctl_get_fs_sysfs_path(struct file *file, void __user *argp)
+{
+       struct super_block *sb = file_inode(file)->i_sb;
+
+       if (!strlen(sb->s_sysfs_name))
+               return -ENOIOCTLCMD;
+
+       struct fs_sysfs_path u = {};
+
+       u.len = scnprintf(u.name, sizeof(u.name), "%s/%s", sb->s_type->name, sb->s_sysfs_name);
+
+       return copy_to_user(argp, &u, sizeof(u)) ? -EFAULT : 0;
+}
+
 /*
  * do_vfs_ioctl() is not for drivers and not intended to be EXPORT_SYMBOL()'d.
  * It's just a simple helper for sys_ioctl and compat_sys_ioctl.
@@ -845,6 +872,12 @@ static int do_vfs_ioctl(struct file *filp, unsigned int fd,
        case FS_IOC_FSSETXATTR:
                return ioctl_fssetxattr(filp, argp);
 
+       case FS_IOC_GETFSUUID:
+               return ioctl_getfsuuid(filp, argp);
+
+       case FS_IOC_GETFSSYSFSPATH:
+               return ioctl_get_fs_sysfs_path(filp, argp);
+
        default:
                if (S_ISREG(inode->i_mode))
                        return file_ioctl(filp, cmd, argp);
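
These two generic ioctls let tooling query a filesystem's UUID and sysfs path without per-filesystem ioctls; they only work on filesystems that populate sb->s_uuid_len (via super_set_uuid()) and sb->s_sysfs_name respectively. A hedged userspace sketch for the UUID side, assuming struct fsuuid2 and FS_IOC_GETFSUUID as exported by the 6.9 <linux/fs.h>:

	#include <linux/fs.h>	/* struct fsuuid2, FS_IOC_GETFSUUID (6.9+) */
	#include <stdio.h>
	#include <sys/ioctl.h>

	static int print_fs_uuid(int fd)
	{
		struct fsuuid2 u;

		if (ioctl(fd, FS_IOC_GETFSUUID, &u) != 0)
			return -1;	/* fs never recorded a UUID length */
		for (int i = 0; i < u.len; i++)
			printf("%02x", u.uuid[i]);
		putchar('\n');
		return 0;
	}
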
index 093c4515b22a53b9b9f9b862747269d76448ab09..4e8e41c8b3c0e41c97ed544bfafbef4113f4fbd5 100644 (file)
@@ -1,7 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 /*
  * Copyright (C) 2010 Red Hat, Inc.
- * Copyright (C) 2016-2019 Christoph Hellwig.
+ * Copyright (C) 2016-2023 Christoph Hellwig.
  */
 #include <linux/module.h>
 #include <linux/compiler.h>
@@ -95,6 +95,44 @@ static inline bool ifs_block_is_dirty(struct folio *folio,
        return test_bit(block + blks_per_folio, ifs->state);
 }
 
+static unsigned ifs_find_dirty_range(struct folio *folio,
+               struct iomap_folio_state *ifs, u64 *range_start, u64 range_end)
+{
+       struct inode *inode = folio->mapping->host;
+       unsigned start_blk =
+               offset_in_folio(folio, *range_start) >> inode->i_blkbits;
+       unsigned end_blk = min_not_zero(
+               offset_in_folio(folio, range_end) >> inode->i_blkbits,
+               i_blocks_per_folio(inode, folio));
+       unsigned nblks = 1;
+
+       while (!ifs_block_is_dirty(folio, ifs, start_blk))
+               if (++start_blk == end_blk)
+                       return 0;
+
+       while (start_blk + nblks < end_blk) {
+               if (!ifs_block_is_dirty(folio, ifs, start_blk + nblks))
+                       break;
+               nblks++;
+       }
+
+       *range_start = folio_pos(folio) + (start_blk << inode->i_blkbits);
+       return nblks << inode->i_blkbits;
+}
+
+static unsigned iomap_find_dirty_range(struct folio *folio, u64 *range_start,
+               u64 range_end)
+{
+       struct iomap_folio_state *ifs = folio->private;
+
+       if (*range_start >= range_end)
+               return 0;
+
+       if (ifs)
+               return ifs_find_dirty_range(folio, ifs, range_start, range_end);
+       return range_end - *range_start;
+}
+
 static void ifs_clear_range_dirty(struct folio *folio,
                struct iomap_folio_state *ifs, size_t off, size_t len)
 {
@@ -1454,15 +1492,10 @@ out_unlock:
 EXPORT_SYMBOL_GPL(iomap_page_mkwrite);
 
 static void iomap_finish_folio_write(struct inode *inode, struct folio *folio,
-               size_t len, int error)
+               size_t len)
 {
        struct iomap_folio_state *ifs = folio->private;
 
-       if (error) {
-               folio_set_error(folio);
-               mapping_set_error(inode->i_mapping, error);
-       }
-
        WARN_ON_ONCE(i_blocks_per_folio(inode, folio) > 1 && !ifs);
        WARN_ON_ONCE(ifs && atomic_read(&ifs->write_bytes_pending) <= 0);
 
@@ -1479,40 +1512,29 @@ static u32
 iomap_finish_ioend(struct iomap_ioend *ioend, int error)
 {
        struct inode *inode = ioend->io_inode;
-       struct bio *bio = &ioend->io_inline_bio;
-       struct bio *last = ioend->io_bio, *next;
-       u64 start = bio->bi_iter.bi_sector;
-       loff_t offset = ioend->io_offset;
-       bool quiet = bio_flagged(bio, BIO_QUIET);
+       struct bio *bio = &ioend->io_bio;
+       struct folio_iter fi;
        u32 folio_count = 0;
 
-       for (bio = &ioend->io_inline_bio; bio; bio = next) {
-               struct folio_iter fi;
-
-               /*
-                * For the last bio, bi_private points to the ioend, so we
-                * need to explicitly end the iteration here.
-                */
-               if (bio == last)
-                       next = NULL;
-               else
-                       next = bio->bi_private;
-
-               /* walk all folios in bio, ending page IO on them */
-               bio_for_each_folio_all(fi, bio) {
-                       iomap_finish_folio_write(inode, fi.folio, fi.length,
-                                       error);
-                       folio_count++;
+       if (error) {
+               mapping_set_error(inode->i_mapping, error);
+               if (!bio_flagged(bio, BIO_QUIET)) {
+                       pr_err_ratelimited(
+"%s: writeback error on inode %lu, offset %lld, sector %llu",
+                               inode->i_sb->s_id, inode->i_ino,
+                               ioend->io_offset, ioend->io_sector);
                }
-               bio_put(bio);
        }
-       /* The ioend has been freed by bio_put() */
 
-       if (unlikely(error && !quiet)) {
-               printk_ratelimited(KERN_ERR
-"%s: writeback error on inode %lu, offset %lld, sector %llu",
-                       inode->i_sb->s_id, inode->i_ino, offset, start);
+       /* walk all folios in bio, ending page IO on them */
+       bio_for_each_folio_all(fi, bio) {
+               if (error)
+                       folio_set_error(fi.folio);
+               iomap_finish_folio_write(inode, fi.folio, fi.length);
+               folio_count++;
        }
+
+       bio_put(bio);   /* frees the ioend */
        return folio_count;
 }
 
@@ -1553,7 +1575,7 @@ EXPORT_SYMBOL_GPL(iomap_finish_ioends);
 static bool
 iomap_ioend_can_merge(struct iomap_ioend *ioend, struct iomap_ioend *next)
 {
-       if (ioend->io_bio->bi_status != next->io_bio->bi_status)
+       if (ioend->io_bio.bi_status != next->io_bio.bi_status)
                return false;
        if ((ioend->io_flags & IOMAP_F_SHARED) ^
            (next->io_flags & IOMAP_F_SHARED))
@@ -1618,47 +1640,46 @@ EXPORT_SYMBOL_GPL(iomap_sort_ioends);
 
 static void iomap_writepage_end_bio(struct bio *bio)
 {
-       struct iomap_ioend *ioend = bio->bi_private;
-
-       iomap_finish_ioend(ioend, blk_status_to_errno(bio->bi_status));
+       iomap_finish_ioend(iomap_ioend_from_bio(bio),
+                       blk_status_to_errno(bio->bi_status));
 }
 
 /*
  * Submit the final bio for an ioend.
  *
  * If @error is non-zero, it means that we have a situation where some part of
- * the submission process has failed after we've marked pages for writeback
- * and unlocked them.  In this situation, we need to fail the bio instead of
- * submitting it.  This typically only happens on a filesystem shutdown.
+ * the submission process has failed after we've marked pages for writeback.
+ * We cannot cancel the ioend directly in that case, so call the bio end I/O
+ * handler with the error status here to run the normal I/O completion handler
+ * to clear the writeback bit and let the file system process the errors.
  */
-static int
-iomap_submit_ioend(struct iomap_writepage_ctx *wpc, struct iomap_ioend *ioend,
-               int error)
+static int iomap_submit_ioend(struct iomap_writepage_ctx *wpc, int error)
 {
-       ioend->io_bio->bi_private = ioend;
-       ioend->io_bio->bi_end_io = iomap_writepage_end_bio;
+       if (!wpc->ioend)
+               return error;
 
+       /*
+        * Let the file systems prepare the I/O submission and hook in an I/O
+        * comletion handler.  This also needs to happen in case after a
+        * failure happened so that the file system end I/O handler gets called
+        * to clean up.
+        */
        if (wpc->ops->prepare_ioend)
-               error = wpc->ops->prepare_ioend(ioend, error);
+               error = wpc->ops->prepare_ioend(wpc->ioend, error);
+
        if (error) {
-               /*
-                * If we're failing the IO now, just mark the ioend with an
-                * error and finish it.  This will run IO completion immediately
-                * as there is only one reference to the ioend at this point in
-                * time.
-                */
-               ioend->io_bio->bi_status = errno_to_blk_status(error);
-               bio_endio(ioend->io_bio);
-               return error;
+               wpc->ioend->io_bio.bi_status = errno_to_blk_status(error);
+               bio_endio(&wpc->ioend->io_bio);
+       } else {
+               submit_bio(&wpc->ioend->io_bio);
        }
 
-       submit_bio(ioend->io_bio);
-       return 0;
+       wpc->ioend = NULL;
+       return error;
 }
 
-static struct iomap_ioend *
-iomap_alloc_ioend(struct inode *inode, struct iomap_writepage_ctx *wpc,
-               loff_t offset, sector_t sector, struct writeback_control *wbc)
+static struct iomap_ioend *iomap_alloc_ioend(struct iomap_writepage_ctx *wpc,
+               struct writeback_control *wbc, struct inode *inode, loff_t pos)
 {
        struct iomap_ioend *ioend;
        struct bio *bio;
@@ -1666,63 +1687,42 @@ iomap_alloc_ioend(struct inode *inode, struct iomap_writepage_ctx *wpc,
        bio = bio_alloc_bioset(wpc->iomap.bdev, BIO_MAX_VECS,
                               REQ_OP_WRITE | wbc_to_write_flags(wbc),
                               GFP_NOFS, &iomap_ioend_bioset);
-       bio->bi_iter.bi_sector = sector;
+       bio->bi_iter.bi_sector = iomap_sector(&wpc->iomap, pos);
+       bio->bi_end_io = iomap_writepage_end_bio;
        wbc_init_bio(wbc, bio);
+       bio->bi_write_hint = inode->i_write_hint;
 
-       ioend = container_of(bio, struct iomap_ioend, io_inline_bio);
+       ioend = iomap_ioend_from_bio(bio);
        INIT_LIST_HEAD(&ioend->io_list);
        ioend->io_type = wpc->iomap.type;
        ioend->io_flags = wpc->iomap.flags;
        ioend->io_inode = inode;
        ioend->io_size = 0;
-       ioend->io_folios = 0;
-       ioend->io_offset = offset;
-       ioend->io_bio = bio;
-       ioend->io_sector = sector;
-       return ioend;
-}
-
-/*
- * Allocate a new bio, and chain the old bio to the new one.
- *
- * Note that we have to perform the chaining in this unintuitive order
- * so that the bi_private linkage is set up in the right direction for the
- * traversal in iomap_finish_ioend().
- */
-static struct bio *
-iomap_chain_bio(struct bio *prev)
-{
-       struct bio *new;
-
-       new = bio_alloc(prev->bi_bdev, BIO_MAX_VECS, prev->bi_opf, GFP_NOFS);
-       bio_clone_blkg_association(new, prev);
-       new->bi_iter.bi_sector = bio_end_sector(prev);
+       ioend->io_offset = pos;
+       ioend->io_sector = bio->bi_iter.bi_sector;
 
-       bio_chain(prev, new);
-       bio_get(prev);          /* for iomap_finish_ioend */
-       submit_bio(prev);
-       return new;
+       wpc->nr_folios = 0;
+       return ioend;
 }
 
-static bool
-iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t offset,
-               sector_t sector)
+static bool iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t pos)
 {
        if ((wpc->iomap.flags & IOMAP_F_SHARED) !=
            (wpc->ioend->io_flags & IOMAP_F_SHARED))
                return false;
        if (wpc->iomap.type != wpc->ioend->io_type)
                return false;
-       if (offset != wpc->ioend->io_offset + wpc->ioend->io_size)
+       if (pos != wpc->ioend->io_offset + wpc->ioend->io_size)
                return false;
-       if (sector != bio_end_sector(wpc->ioend->io_bio))
+       if (iomap_sector(&wpc->iomap, pos) !=
+           bio_end_sector(&wpc->ioend->io_bio))
                return false;
        /*
         * Limit ioend bio chain lengths to minimise IO completion latency. This
         * also prevents long tight loops ending page writeback on all the
         * folios in the ioend.
         */
-       if (wpc->ioend->io_folios >= IOEND_BATCH_SIZE)
+       if (wpc->nr_folios >= IOEND_BATCH_SIZE)
                return false;
        return true;
 }
@@ -1730,255 +1730,238 @@ iomap_can_add_to_ioend(struct iomap_writepage_ctx *wpc, loff_t offset,
 /*
  * Test to see if we have an existing ioend structure that we could append to
  * first; otherwise finish off the current ioend and start another.
+ *
+ * If a new ioend is created and cached, the old ioend is submitted to the block
+ * layer instantly.  Batching optimisations are provided by higher level block
+ * plugging.
+ *
+ * At the end of a writeback pass, there will be a cached ioend remaining on the
+ * writepage context that the caller will need to submit.
  */
-static void
-iomap_add_to_ioend(struct inode *inode, loff_t pos, struct folio *folio,
-               struct iomap_folio_state *ifs, struct iomap_writepage_ctx *wpc,
-               struct writeback_control *wbc, struct list_head *iolist)
+static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc,
+               struct writeback_control *wbc, struct folio *folio,
+               struct inode *inode, loff_t pos, unsigned len)
 {
-       sector_t sector = iomap_sector(&wpc->iomap, pos);
-       unsigned len = i_blocksize(inode);
+       struct iomap_folio_state *ifs = folio->private;
        size_t poff = offset_in_folio(folio, pos);
+       int error;
 
-       if (!wpc->ioend || !iomap_can_add_to_ioend(wpc, pos, sector)) {
-               if (wpc->ioend)
-                       list_add(&wpc->ioend->io_list, iolist);
-               wpc->ioend = iomap_alloc_ioend(inode, wpc, pos, sector, wbc);
+       if (!wpc->ioend || !iomap_can_add_to_ioend(wpc, pos)) {
+new_ioend:
+               error = iomap_submit_ioend(wpc, 0);
+               if (error)
+                       return error;
+               wpc->ioend = iomap_alloc_ioend(wpc, wbc, inode, pos);
        }
 
-       if (!bio_add_folio(wpc->ioend->io_bio, folio, len, poff)) {
-               wpc->ioend->io_bio = iomap_chain_bio(wpc->ioend->io_bio);
-               bio_add_folio_nofail(wpc->ioend->io_bio, folio, len, poff);
-       }
+       if (!bio_add_folio(&wpc->ioend->io_bio, folio, len, poff))
+               goto new_ioend;
 
        if (ifs)
                atomic_add(len, &ifs->write_bytes_pending);
        wpc->ioend->io_size += len;
        wbc_account_cgroup_owner(wbc, &folio->page, len);
+       return 0;
 }
 
-/*
- * We implement an immediate ioend submission policy here to avoid needing to
- * chain multiple ioends and hence nest mempool allocations which can violate
- * the forward progress guarantees we need to provide. The current ioend we're
- * adding blocks to is cached in the writepage context, and if the new block
- * doesn't append to the cached ioend, it will create a new ioend and cache that
- * instead.
- *
- * If a new ioend is created and cached, the old ioend is returned and queued
- * locally for submission once the entire page is processed or an error has been
- * detected.  While ioends are submitted immediately after they are completed,
- * batching optimisations are provided by higher level block plugging.
- *
- * At the end of a writeback pass, there will be a cached ioend remaining on the
- * writepage context that the caller will need to submit.
- */
-static int
-iomap_writepage_map(struct iomap_writepage_ctx *wpc,
-               struct writeback_control *wbc, struct inode *inode,
-               struct folio *folio, u64 end_pos)
+static int iomap_writepage_map_blocks(struct iomap_writepage_ctx *wpc,
+               struct writeback_control *wbc, struct folio *folio,
+               struct inode *inode, u64 pos, unsigned dirty_len,
+               unsigned *count)
 {
-       struct iomap_folio_state *ifs = folio->private;
-       struct iomap_ioend *ioend, *next;
-       unsigned len = i_blocksize(inode);
-       unsigned nblocks = i_blocks_per_folio(inode, folio);
-       u64 pos = folio_pos(folio);
-       int error = 0, count = 0, i;
-       LIST_HEAD(submit_list);
-
-       WARN_ON_ONCE(end_pos <= pos);
-
-       if (!ifs && nblocks > 1) {
-               ifs = ifs_alloc(inode, folio, 0);
-               iomap_set_range_dirty(folio, 0, end_pos - pos);
-       }
+       int error;
 
-       WARN_ON_ONCE(ifs && atomic_read(&ifs->write_bytes_pending) != 0);
-
-       /*
-        * Walk through the folio to find areas to write back. If we
-        * run off the end of the current map or find the current map
-        * invalid, grab a new one.
-        */
-       for (i = 0; i < nblocks && pos < end_pos; i++, pos += len) {
-               if (ifs && !ifs_block_is_dirty(folio, ifs, i))
-                       continue;
+       do {
+               unsigned map_len;
 
-               error = wpc->ops->map_blocks(wpc, inode, pos);
+               error = wpc->ops->map_blocks(wpc, inode, pos, dirty_len);
                if (error)
                        break;
-               trace_iomap_writepage_map(inode, &wpc->iomap);
-               if (WARN_ON_ONCE(wpc->iomap.type == IOMAP_INLINE))
-                       continue;
-               if (wpc->iomap.type == IOMAP_HOLE)
-                       continue;
-               iomap_add_to_ioend(inode, pos, folio, ifs, wpc, wbc,
-                                &submit_list);
-               count++;
-       }
-       if (count)
-               wpc->ioend->io_folios++;
+               trace_iomap_writepage_map(inode, pos, dirty_len, &wpc->iomap);
 
-       WARN_ON_ONCE(!wpc->ioend && !list_empty(&submit_list));
-       WARN_ON_ONCE(!folio_test_locked(folio));
-       WARN_ON_ONCE(folio_test_writeback(folio));
-       WARN_ON_ONCE(folio_test_dirty(folio));
+               map_len = min_t(u64, dirty_len,
+                       wpc->iomap.offset + wpc->iomap.length - pos);
+               WARN_ON_ONCE(!folio->private && map_len < dirty_len);
+
+               switch (wpc->iomap.type) {
+               case IOMAP_INLINE:
+                       WARN_ON_ONCE(1);
+                       error = -EIO;
+                       break;
+               case IOMAP_HOLE:
+                       break;
+               default:
+                       error = iomap_add_to_ioend(wpc, wbc, folio, inode, pos,
+                                       map_len);
+                       if (!error)
+                               (*count)++;
+                       break;
+               }
+               dirty_len -= map_len;
+               pos += map_len;
+       } while (dirty_len && !error);
 
        /*
         * We cannot cancel the ioend directly here on error.  We may have
         * already set other pages under writeback and hence we have to run I/O
         * completion to mark the error state of the pages under writeback
         * appropriately.
+        *
+        * Just let the file system know what portion of the folio failed to
+        * map.
         */
-       if (unlikely(error)) {
-               /*
-                * Let the filesystem know what portion of the current page
-                * failed to map. If the page hasn't been added to ioend, it
-                * won't be affected by I/O completion and we must unlock it
-                * now.
-                */
-               if (wpc->ops->discard_folio)
-                       wpc->ops->discard_folio(folio, pos);
-               if (!count) {
-                       folio_unlock(folio);
-                       goto done;
-               }
-       }
-
-       /*
-        * We can have dirty bits set past end of file in page_mkwrite path
-        * while mapping the last partial folio. Hence it's better to clear
-        * all the dirty bits in the folio here.
-        */
-       iomap_clear_range_dirty(folio, 0, folio_size(folio));
-       folio_start_writeback(folio);
-       folio_unlock(folio);
-
-       /*
-        * Preserve the original error if there was one; catch
-        * submission errors here and propagate into subsequent ioend
-        * submissions.
-        */
-       list_for_each_entry_safe(ioend, next, &submit_list, io_list) {
-               int error2;
-
-               list_del_init(&ioend->io_list);
-               error2 = iomap_submit_ioend(wpc, ioend, error);
-               if (error2 && !error)
-                       error = error2;
-       }
-
-       /*
-        * We can end up here with no error and nothing to write only if we race
-        * with a partial page truncate on a sub-page block sized filesystem.
-        */
-       if (!count)
-               folio_end_writeback(folio);
-done:
-       mapping_set_error(inode->i_mapping, error);
+       if (error && wpc->ops->discard_folio)
+               wpc->ops->discard_folio(folio, pos);
        return error;
 }
 
 /*
- * Write out a dirty page.
+ * Check interaction of the folio with the file end.
  *
- * For delalloc space on the page, we need to allocate space and flush it.
- * For unwritten space on the page, we need to start the conversion to
- * regular allocated space.
+ * If the folio is entirely beyond i_size, return false.  If it straddles
+ * i_size, adjust end_pos and zero all data beyond i_size.
  */
-static int iomap_do_writepage(struct folio *folio,
-               struct writeback_control *wbc, void *data)
+static bool iomap_writepage_handle_eof(struct folio *folio, struct inode *inode,
+               u64 *end_pos)
 {
-       struct iomap_writepage_ctx *wpc = data;
-       struct inode *inode = folio->mapping->host;
-       u64 end_pos, isize;
-
-       trace_iomap_writepage(inode, folio_pos(folio), folio_size(folio));
+       u64 isize = i_size_read(inode);
 
-       /*
-        * Refuse to write the folio out if we're called from reclaim context.
-        *
-        * This avoids stack overflows when called from deeply used stacks in
-        * random callers for direct reclaim or memcg reclaim.  We explicitly
-        * allow reclaim from kswapd as the stack usage there is relatively low.
-        *
-        * This should never happen except in the case of a VM regression so
-        * warn about it.
-        */
-       if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) ==
-                       PF_MEMALLOC))
-               goto redirty;
-
-       /*
-        * Is this folio beyond the end of the file?
-        *
-        * The folio index is less than the end_index, adjust the end_pos
-        * to the highest offset that this folio should represent.
-        * -----------------------------------------------------
-        * |                    file mapping           | <EOF> |
-        * -----------------------------------------------------
-        * | Page ... | Page N-2 | Page N-1 |  Page N  |       |
-        * ^--------------------------------^----------|--------
-        * |     desired writeback range    |      see else    |
-        * ---------------------------------^------------------|
-        */
-       isize = i_size_read(inode);
-       end_pos = folio_pos(folio) + folio_size(folio);
-       if (end_pos > isize) {
-               /*
-                * Check whether the page to write out is beyond or straddles
-                * i_size or not.
-                * -------------------------------------------------------
-                * |            file mapping                    | <EOF>  |
-                * -------------------------------------------------------
-                * | Page ... | Page N-2 | Page N-1 |  Page N   | Beyond |
-                * ^--------------------------------^-----------|---------
-                * |                                |      Straddles     |
-                * ---------------------------------^-----------|--------|
-                */
+       if (*end_pos > isize) {
                size_t poff = offset_in_folio(folio, isize);
                pgoff_t end_index = isize >> PAGE_SHIFT;
 
                /*
-                * Skip the page if it's fully outside i_size, e.g.
-                * due to a truncate operation that's in progress.  We've
-                * cleaned this page and truncate will finish things off for
-                * us.
+                * If the folio is entirely outside of i_size, skip it.
+                *
+                * This can happen due to a truncate operation that is in
+                * progress and in that case truncate will finish it off once
+                * we've dropped the folio lock.
                 *
-                * Note that the end_index is unsigned long.  If the given
-                * offset is greater than 16TB on a 32-bit system then if we
-                * checked if the page is fully outside i_size with
-                * "if (page->index >= end_index + 1)", "end_index + 1" would
-                * overflow and evaluate to 0.  Hence this page would be
+                * Note that the pgoff_t used for end_index is an unsigned long.
+                * If the given offset is greater than 16TB on a 32-bit system,
+                * then if we checked if the folio is fully outside i_size with
+                * "if (folio->index >= end_index + 1)", "end_index + 1" would
+                * overflow and evaluate to 0.  Hence this folio would be
                 * redirtied and written out repeatedly, which would result in
                 * an infinite loop; the user program performing this operation
                 * would hang.  Instead, we can detect this situation by
-                * checking if the page is totally beyond i_size or if its
+                * checking if the folio is totally beyond i_size or if its
                 * offset is just equal to the EOF.
                 */
                if (folio->index > end_index ||
                    (folio->index == end_index && poff == 0))
-                       goto unlock;
+                       return false;
 
                /*
-                * The page straddles i_size.  It must be zeroed out on each
-                * and every writepage invocation because it may be mmapped.
-                * "A file is mapped in multiples of the page size.  For a file
-                * that is not a multiple of the page size, the remaining
-                * memory is zeroed when mapped, and writes to that region are
-                * not written out to the file."
+                * The folio straddles i_size.
+                *
+                * It must be zeroed out on each and every writepage invocation
+                * because it may be mmapped:
+                *
+                *    A file is mapped in multiples of the page size.  For a
+                *    file that is not a multiple of the page size, the
+                *    remaining memory is zeroed when mapped, and writes to that
+                *    region are not written out to the file.
+                *
+                * Also adjust the writeback range to skip all blocks entirely
+                * beyond i_size.
                 */
                folio_zero_segment(folio, poff, folio_size(folio));
-               end_pos = isize;
+               *end_pos = round_up(isize, i_blocksize(inode));
+       }
+
+       return true;
+}
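
A worked example of the straddling case handled above (numbers hypothetical): with a 16 KiB folio at file offset 0, 4 KiB blocks and i_size = 9216, end_pos starts at 16384 and poff is 9216. end_index is 2 while folio->index is 0, so neither skip condition fires and the folio straddles i_size: bytes 9216..16383 are zeroed and *end_pos is rounded up to 12288. The block containing EOF is still written back, while the final block, which lies entirely beyond i_size, is skipped.
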
+
+static int iomap_writepage_map(struct iomap_writepage_ctx *wpc,
+               struct writeback_control *wbc, struct folio *folio)
+{
+       struct iomap_folio_state *ifs = folio->private;
+       struct inode *inode = folio->mapping->host;
+       u64 pos = folio_pos(folio);
+       u64 end_pos = pos + folio_size(folio);
+       unsigned count = 0;
+       int error = 0;
+       u32 rlen;
+
+       WARN_ON_ONCE(!folio_test_locked(folio));
+       WARN_ON_ONCE(folio_test_dirty(folio));
+       WARN_ON_ONCE(folio_test_writeback(folio));
+
+       trace_iomap_writepage(inode, pos, folio_size(folio));
+
+       if (!iomap_writepage_handle_eof(folio, inode, &end_pos)) {
+               folio_unlock(folio);
+               return 0;
+       }
+       WARN_ON_ONCE(end_pos <= pos);
+
+       if (i_blocks_per_folio(inode, folio) > 1) {
+               if (!ifs) {
+                       ifs = ifs_alloc(inode, folio, 0);
+                       iomap_set_range_dirty(folio, 0, end_pos - pos);
+               }
+
+               /*
+                * Keep the I/O completion handler from clearing the writeback
+                * bit until we have submitted all blocks by adding a bias to
+                * ifs->write_bytes_pending, which is dropped after submitting
+                * all blocks.
+                */
+               WARN_ON_ONCE(atomic_read(&ifs->write_bytes_pending) != 0);
+               atomic_inc(&ifs->write_bytes_pending);
        }
 
-       return iomap_writepage_map(wpc, wbc, inode, folio, end_pos);
+       /*
+        * Set the writeback bit ASAP, as the I/O completion for the single
+        * block per folio case can happen as soon as we submit the bio.
+        */
+       folio_start_writeback(folio);
 
-redirty:
-       folio_redirty_for_writepage(wbc, folio);
-unlock:
+       /*
+        * Walk through the folio to find dirty areas to write back.
+        */
+       while ((rlen = iomap_find_dirty_range(folio, &pos, end_pos))) {
+               error = iomap_writepage_map_blocks(wpc, wbc, folio, inode,
+                               pos, rlen, &count);
+               if (error)
+                       break;
+               pos += rlen;
+       }
+
+       if (count)
+               wpc->nr_folios++;
+
+       /*
+        * We can have dirty bits set past end of file in page_mkwrite path
+        * while mapping the last partial folio. Hence it's better to clear
+        * all the dirty bits in the folio here.
+        */
+       iomap_clear_range_dirty(folio, 0, folio_size(folio));
+
+       /*
+        * Usually the writeback bit is cleared by the I/O completion handler.
+        * But we may end up not writing any blocks at all, or (when there
+        * are multiple blocks in a folio) all I/O may already have completed
+        * by this point.  In that case we need to clear the writeback bit
+        * ourselves right after unlocking the folio.
+        */
        folio_unlock(folio);
-       return 0;
+       if (ifs) {
+               if (atomic_dec_and_test(&ifs->write_bytes_pending))
+                       folio_end_writeback(folio);
+       } else {
+               if (!count)
+                       folio_end_writeback(folio);
+       }
+       mapping_set_error(inode->i_mapping, error);
+       return error;
+}
+
+static int iomap_do_writepage(struct folio *folio,
+               struct writeback_control *wbc, void *data)
+{
+       return iomap_writepage_map(data, wbc, folio);
 }
 
 int
@@ -1988,18 +1971,24 @@ iomap_writepages(struct address_space *mapping, struct writeback_control *wbc,
 {
        int                     ret;
 
+       /*
+        * Writeback from reclaim context should never happen except in the case
+        * of a VM regression so warn about it and refuse to write the data.
+        */
+       if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC | PF_KSWAPD)) ==
+                       PF_MEMALLOC))
+               return -EIO;
+
        wpc->ops = ops;
        ret = write_cache_pages(mapping, wbc, iomap_do_writepage, wpc);
-       if (!wpc->ioend)
-               return ret;
-       return iomap_submit_ioend(wpc, wpc->ioend, ret);
+       return iomap_submit_ioend(wpc, ret);
 }
 EXPORT_SYMBOL_GPL(iomap_writepages);
 
 static int __init iomap_init(void)
 {
        return bioset_init(&iomap_ioend_bioset, 4 * (PAGE_SIZE / SECTOR_SIZE),
-                          offsetof(struct iomap_ioend, io_inline_bio),
+                          offsetof(struct iomap_ioend, io_bio),
                           BIOSET_NEED_BVECS);
 }
 fs_initcall(iomap_init);
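
To see how a filesystem plugs into the reworked writeback path, here is a minimal sketch (not part of the patch; the "myfs_*" names are hypothetical). It assumes a contiguously allocated file so that ->map_blocks, with its new dirty_len argument, can map the whole dirty range in one go; a real implementation would do an extent lookup here and may return a shorter mapping:

    #include <linux/iomap.h>
    #include <linux/writeback.h>

    static int myfs_map_blocks(struct iomap_writepage_ctx *wpc,
                    struct inode *inode, loff_t pos, unsigned dirty_len)
    {
            /* Pretend pos..pos+dirty_len is one mapped extent. */
            wpc->iomap.type = IOMAP_MAPPED;
            wpc->iomap.flags = 0;
            wpc->iomap.bdev = inode->i_sb->s_bdev;
            wpc->iomap.offset = pos;
            wpc->iomap.length = dirty_len;
            wpc->iomap.addr = pos;  /* purely illustrative 1:1 mapping */
            return 0;
    }

    static const struct iomap_writeback_ops myfs_writeback_ops = {
            .map_blocks = myfs_map_blocks,
    };

    static int myfs_writepages(struct address_space *mapping,
                    struct writeback_control *wbc)
    {
            struct iomap_writepage_ctx wpc = { };

            /* Submits the final cached ioend (or returns the first error). */
            return iomap_writepages(mapping, wbc, &wpc, &myfs_writeback_ops);
    }

With the cached ioend and nr_folios now living in the writepage context, the per-call ioend list plumbing removed above is no longer needed.
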
index bcd3f8cf5ea42f0fa0e4967510230f7e5d25b1fa..f3b43d223a46ee9064315b6bc902852f16172ce1 100644 (file)
@@ -380,6 +380,7 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
                fscrypt_set_bio_crypt_ctx(bio, inode, pos >> inode->i_blkbits,
                                          GFP_KERNEL);
                bio->bi_iter.bi_sector = iomap_sector(iomap, pos);
+               bio->bi_write_hint = inode->i_write_hint;
                bio->bi_ioprio = dio->iocb->ki_ioprio;
                bio->bi_private = dio;
                bio->bi_end_io = iomap_dio_bio_end_io;
index c16fd55f5595c2984c24ddf77002eab739eeffc8..0a991c4ce87d2c2ef1ec3e35bd8eefe716582d3c 100644 (file)
@@ -154,7 +154,48 @@ DEFINE_EVENT(iomap_class, name,    \
        TP_ARGS(inode, iomap))
 DEFINE_IOMAP_EVENT(iomap_iter_dstmap);
 DEFINE_IOMAP_EVENT(iomap_iter_srcmap);
-DEFINE_IOMAP_EVENT(iomap_writepage_map);
+
+TRACE_EVENT(iomap_writepage_map,
+       TP_PROTO(struct inode *inode, u64 pos, unsigned int dirty_len,
+                struct iomap *iomap),
+       TP_ARGS(inode, pos, dirty_len, iomap),
+       TP_STRUCT__entry(
+               __field(dev_t, dev)
+               __field(u64, ino)
+               __field(u64, pos)
+               __field(u64, dirty_len)
+               __field(u64, addr)
+               __field(loff_t, offset)
+               __field(u64, length)
+               __field(u16, type)
+               __field(u16, flags)
+               __field(dev_t, bdev)
+       ),
+       TP_fast_assign(
+               __entry->dev = inode->i_sb->s_dev;
+               __entry->ino = inode->i_ino;
+               __entry->pos = pos;
+               __entry->dirty_len = dirty_len;
+               __entry->addr = iomap->addr;
+               __entry->offset = iomap->offset;
+               __entry->length = iomap->length;
+               __entry->type = iomap->type;
+               __entry->flags = iomap->flags;
+               __entry->bdev = iomap->bdev ? iomap->bdev->bd_dev : 0;
+       ),
+       TP_printk("dev %d:%d ino 0x%llx bdev %d:%d pos 0x%llx dirty len 0x%llx "
+                 "addr 0x%llx offset 0x%llx length 0x%llx type %s flags %s",
+                 MAJOR(__entry->dev), MINOR(__entry->dev),
+                 __entry->ino,
+                 MAJOR(__entry->bdev), MINOR(__entry->bdev),
+                 __entry->pos,
+                 __entry->dirty_len,
+                 __entry->addr,
+                 __entry->offset,
+                 __entry->length,
+                 __print_symbolic(__entry->type, IOMAP_TYPE_STRINGS),
+                 __print_flags(__entry->flags, "|", IOMAP_F_FLAGS_STRINGS))
+);
 
 TRACE_EVENT(iomap_iter,
        TP_PROTO(struct iomap_iter *iter, const void *ops,
@@ -165,6 +206,7 @@ TRACE_EVENT(iomap_iter,
                __field(u64, ino)
                __field(loff_t, pos)
                __field(u64, length)
+               __field(s64, processed)
                __field(unsigned int, flags)
                __field(const void *, ops)
                __field(unsigned long, caller)
@@ -174,15 +216,17 @@ TRACE_EVENT(iomap_iter,
                __entry->ino = iter->inode->i_ino;
                __entry->pos = iter->pos;
                __entry->length = iomap_length(iter);
+               __entry->processed = iter->processed;
                __entry->flags = iter->flags;
                __entry->ops = ops;
                __entry->caller = caller;
        ),
-       TP_printk("dev %d:%d ino 0x%llx pos 0x%llx length 0x%llx flags %s (0x%x) ops %ps caller %pS",
+       TP_printk("dev %d:%d ino 0x%llx pos 0x%llx length 0x%llx processed %lld flags %s (0x%x) ops %ps caller %pS",
                  MAJOR(__entry->dev), MINOR(__entry->dev),
                   __entry->ino,
                   __entry->pos,
                   __entry->length,
+                  __entry->processed,
                   __print_flags(__entry->flags, "|", IOMAP_FLAGS_STRINGS),
                   __entry->flags,
                   __entry->ops,
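
Given the TP_printk format above, a line from the new iomap_writepage_map event should render along these lines (all values hypothetical):

    iomap_writepage_map: dev 253:0 ino 0x24 bdev 253:0 pos 0x4000 dirty len 0x1000 addr 0x713000 offset 0x4000 length 0x1000 type MAPPED flags DIRTY
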
index 8eec84c651bfba2da05af6a834c4ad3fe7a60f2b..cb3cda1390adb16e1ad8031783849ba59022db87 100644 (file)
@@ -2763,9 +2763,7 @@ static int dbBackSplit(dmtree_t *tp, int leafno, bool is_ctl)
  *     leafno  - the number of the leaf to be updated.
  *     newval  - the new value for the leaf.
  *
- * RETURN VALUES:
- *  0          - success
- *     -EIO    - i/o error
+ * RETURN VALUES: none
  */
 static int dbJoin(dmtree_t *tp, int leafno, int newval, bool is_ctl)
 {
@@ -2792,10 +2790,6 @@ static int dbJoin(dmtree_t *tp, int leafno, int newval, bool is_ctl)
                 * get the buddy size (number of words covered) of
                 * the new value.
                 */
-
-               if ((newval - tp->dmt_budmin) > BUDMIN)
-                       return -EIO;
-
                budsz = BUDSIZE(newval, tp->dmt_budmin);
 
                /* try to join.
index cb6d1fda66a7021a9ce5b42959122be9ad1934b2..73389c68e25170c81d6f84483f09b43216ba4b52 100644 (file)
@@ -1058,7 +1058,7 @@ void jfs_syncpt(struct jfs_log *log, int hard_sync)
 int lmLogOpen(struct super_block *sb)
 {
        int rc;
-       struct bdev_handle *bdev_handle;
+       struct file *bdev_file;
        struct jfs_log *log;
        struct jfs_sb_info *sbi = JFS_SBI(sb);
 
@@ -1070,7 +1070,7 @@ int lmLogOpen(struct super_block *sb)
 
        mutex_lock(&jfs_log_mutex);
        list_for_each_entry(log, &jfs_external_logs, journal_list) {
-               if (log->bdev_handle->bdev->bd_dev == sbi->logdev) {
+               if (file_bdev(log->bdev_file)->bd_dev == sbi->logdev) {
                        if (!uuid_equal(&log->uuid, &sbi->loguuid)) {
                                jfs_warn("wrong uuid on JFS journal");
                                mutex_unlock(&jfs_log_mutex);
@@ -1100,14 +1100,14 @@ int lmLogOpen(struct super_block *sb)
         * file systems to log may have n-to-1 relationship;
         */
 
-       bdev_handle = bdev_open_by_dev(sbi->logdev,
+       bdev_file = bdev_file_open_by_dev(sbi->logdev,
                        BLK_OPEN_READ | BLK_OPEN_WRITE, log, NULL);
-       if (IS_ERR(bdev_handle)) {
-               rc = PTR_ERR(bdev_handle);
+       if (IS_ERR(bdev_file)) {
+               rc = PTR_ERR(bdev_file);
                goto free;
        }
 
-       log->bdev_handle = bdev_handle;
+       log->bdev_file = bdev_file;
        uuid_copy(&log->uuid, &sbi->loguuid);
 
        /*
@@ -1141,7 +1141,7 @@ journal_found:
        lbmLogShutdown(log);
 
       close:           /* close external log device */
-       bdev_release(bdev_handle);
+       fput(bdev_file);
 
       free:            /* free log descriptor */
        mutex_unlock(&jfs_log_mutex);
@@ -1162,7 +1162,7 @@ static int open_inline_log(struct super_block *sb)
        init_waitqueue_head(&log->syncwait);
 
        set_bit(log_INLINELOG, &log->flag);
-       log->bdev_handle = sb->s_bdev_handle;
+       log->bdev_file = sb->s_bdev_file;
        log->base = addressPXD(&JFS_SBI(sb)->logpxd);
        log->size = lengthPXD(&JFS_SBI(sb)->logpxd) >>
            (L2LOGPSIZE - sb->s_blocksize_bits);
@@ -1436,7 +1436,7 @@ int lmLogClose(struct super_block *sb)
 {
        struct jfs_sb_info *sbi = JFS_SBI(sb);
        struct jfs_log *log = sbi->log;
-       struct bdev_handle *bdev_handle;
+       struct file *bdev_file;
        int rc = 0;
 
        jfs_info("lmLogClose: log:0x%p", log);
@@ -1482,10 +1482,10 @@ int lmLogClose(struct super_block *sb)
         *      external log as separate logical volume
         */
        list_del(&log->journal_list);
-       bdev_handle = log->bdev_handle;
+       bdev_file = log->bdev_file;
        rc = lmLogShutdown(log);
 
-       bdev_release(bdev_handle);
+       fput(bdev_file);
 
        kfree(log);
 
@@ -1972,7 +1972,7 @@ static int lbmRead(struct jfs_log * log, int pn, struct lbuf ** bpp)
 
        bp->l_flag |= lbmREAD;
 
-       bio = bio_alloc(log->bdev_handle->bdev, 1, REQ_OP_READ, GFP_NOFS);
+       bio = bio_alloc(file_bdev(log->bdev_file), 1, REQ_OP_READ, GFP_NOFS);
        bio->bi_iter.bi_sector = bp->l_blkno << (log->l2bsize - 9);
        __bio_add_page(bio, bp->l_page, LOGPSIZE, bp->l_offset);
        BUG_ON(bio->bi_iter.bi_size != LOGPSIZE);
@@ -2115,7 +2115,7 @@ static void lbmStartIO(struct lbuf * bp)
        jfs_info("lbmStartIO");
 
        if (!log->no_integrity)
-               bdev = log->bdev_handle->bdev;
+               bdev = file_bdev(log->bdev_file);
 
        bio = bio_alloc(bdev, 1, REQ_OP_WRITE | REQ_SYNC,
                        GFP_NOFS);
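
The bdev_handle-to-bdev_file churn in this file follows one fixed pattern. A minimal sketch of the new open/use/close sequence (hypothetical "example_*" names; bdev_file_open_by_dev(), file_bdev() and fput() replace bdev_open_by_dev(), handle->bdev and bdev_release() from the removed lines):

    static int example_log_device_io(dev_t dev, void *holder)
    {
            struct file *bdev_file;
            struct block_device *bdev;

            bdev_file = bdev_file_open_by_dev(dev,
                            BLK_OPEN_READ | BLK_OPEN_WRITE, holder, NULL);
            if (IS_ERR(bdev_file))
                    return PTR_ERR(bdev_file);

            bdev = file_bdev(bdev_file);    /* block_device for bio_alloc() etc. */
            /* ... build and submit bios against bdev ... */

            fput(bdev_file);                /* drops the file, closing the device */
            return 0;
    }
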
index 84aa2d2539074361cf6694dd9586f5925ad56d3a..8b8994e48cd080b4df8084108e89c06c5ce36417 100644 (file)
@@ -356,7 +356,7 @@ struct jfs_log {
                                 *    before writing syncpt.
                                 */
        struct list_head journal_list; /* Global list */
-       struct bdev_handle *bdev_handle; /* 4: log lv pointer */
+       struct file *bdev_file; /* 4: log lv pointer */
        int serial;             /* 4: log mount serial number */
 
        s64 base;               /* @8: log extent address (inline log ) */
index 9b5c6a20b30c8323f09aefcce6b4ff0482c9b2dc..98f9a432c33662f51b1236579299719bf732f11c 100644 (file)
@@ -431,7 +431,7 @@ int updateSuper(struct super_block *sb, uint state)
        if (state == FM_MOUNT) {
                /* record log's dev_t and mount serial number */
                j_sb->s_logdev = cpu_to_le32(
-                       new_encode_dev(sbi->log->bdev_handle->bdev->bd_dev));
+                       new_encode_dev(file_bdev(sbi->log->bdev_file)->bd_dev));
                j_sb->s_logserial = cpu_to_le32(sbi->log->serial);
        } else if (state == FM_CLEAN) {
                /*
index 8d8e556bd6104eca1ec55d7ea4da3bcfec0c967f..73f09a762b79b0495cfd56017e5e10e0362b2d30 100644 (file)
@@ -932,7 +932,7 @@ static int __init init_jfs_fs(void)
 
        jfs_inode_cachep =
            kmem_cache_create_usercopy("jfs_ip", sizeof(struct jfs_inode_info),
-                       0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|SLAB_ACCOUNT,
+                       0, SLAB_RECLAIM_ACCOUNT|SLAB_ACCOUNT,
                        offsetof(struct jfs_inode_info, i_inline_all),
                        sizeof_field(struct jfs_inode_info, i_inline_all),
                        init_once);
index 0c93cad0f0acac80425dcdbd21b1eedaffe8c4d6..e29f4edf9572e739c5da8e30f51ebfa05c4eb5ee 100644 (file)
@@ -358,7 +358,9 @@ int kernfs_get_tree(struct fs_context *fc)
                }
                sb->s_flags |= SB_ACTIVE;
 
-               uuid_gen(&sb->s_uuid);
+               uuid_t uuid;
+               uuid_gen(&uuid);
+               super_set_uuid(sb, uuid.b, sizeof(uuid));
 
                down_write(&root->kernfs_supers_rwsem);
                list_add(&info->node, &info->root->supers);
index eec6031b0155442eab924ce946c2295c711105d2..0d14ae808fcfdb314af5b95ccabec10f1dac8c15 100644 (file)
@@ -23,6 +23,7 @@
 #include <linux/fsnotify.h>
 #include <linux/unicode.h>
 #include <linux/fscrypt.h>
+#include <linux/pidfs.h>
 
 #include <linux/uaccess.h>
 
@@ -240,17 +241,22 @@ const struct inode_operations simple_dir_inode_operations = {
 };
 EXPORT_SYMBOL(simple_dir_inode_operations);
 
-static void offset_set(struct dentry *dentry, u32 offset)
+/* 0 is '.', 1 is '..', so always start with offset 2 or more */
+enum {
+       DIR_OFFSET_MIN  = 2,
+};
+
+static void offset_set(struct dentry *dentry, long offset)
 {
-       dentry->d_fsdata = (void *)((uintptr_t)(offset));
+       dentry->d_fsdata = (void *)offset;
 }
 
-static u32 dentry2offset(struct dentry *dentry)
+static long dentry2offset(struct dentry *dentry)
 {
-       return (u32)((uintptr_t)(dentry->d_fsdata));
+       return (long)dentry->d_fsdata;
 }
 
-static struct lock_class_key simple_offset_xa_lock;
+static struct lock_class_key simple_offset_lock_class;
 
 /**
  * simple_offset_init - initialize an offset_ctx
@@ -259,11 +265,9 @@ static struct lock_class_key simple_offset_xa_lock;
  */
 void simple_offset_init(struct offset_ctx *octx)
 {
-       xa_init_flags(&octx->xa, XA_FLAGS_ALLOC1);
-       lockdep_set_class(&octx->xa.xa_lock, &simple_offset_xa_lock);
-
-       /* 0 is '.', 1 is '..', so always start with offset 2 */
-       octx->next_offset = 2;
+       mt_init_flags(&octx->mt, MT_FLAGS_ALLOC_RANGE);
+       lockdep_set_class(&octx->mt.ma_lock, &simple_offset_lock_class);
+       octx->next_offset = DIR_OFFSET_MIN;
 }
 
 /**
@@ -271,20 +275,19 @@ void simple_offset_init(struct offset_ctx *octx)
  * @octx: directory offset ctx to be updated
  * @dentry: new dentry being added
  *
- * Returns zero on success. @so_ctx and the dentry offset are updated.
+ * Returns zero on success. @octx and the dentry's offset are updated.
  * Otherwise, a negative errno value is returned.
  */
 int simple_offset_add(struct offset_ctx *octx, struct dentry *dentry)
 {
-       static const struct xa_limit limit = XA_LIMIT(2, U32_MAX);
-       u32 offset;
+       unsigned long offset;
        int ret;
 
        if (dentry2offset(dentry) != 0)
                return -EBUSY;
 
-       ret = xa_alloc_cyclic(&octx->xa, &offset, dentry, limit,
-                             &octx->next_offset, GFP_KERNEL);
+       ret = mtree_alloc_cyclic(&octx->mt, &offset, dentry, DIR_OFFSET_MIN,
+                                LONG_MAX, &octx->next_offset, GFP_KERNEL);
        if (ret < 0)
                return ret;
 
@@ -300,16 +303,48 @@ int simple_offset_add(struct offset_ctx *octx, struct dentry *dentry)
  */
 void simple_offset_remove(struct offset_ctx *octx, struct dentry *dentry)
 {
-       u32 offset;
+       long offset;
 
        offset = dentry2offset(dentry);
        if (offset == 0)
                return;
 
-       xa_erase(&octx->xa, offset);
+       mtree_erase(&octx->mt, offset);
        offset_set(dentry, 0);
 }
 
+/**
+ * simple_offset_empty - Check if a dentry can be unlinked
+ * @dentry: dentry to be tested
+ *
+ * Returns 0 if @dentry is a non-empty directory; otherwise returns 1.
+ */
+int simple_offset_empty(struct dentry *dentry)
+{
+       struct inode *inode = d_inode(dentry);
+       struct offset_ctx *octx;
+       struct dentry *child;
+       unsigned long index;
+       int ret = 1;
+
+       if (!inode || !S_ISDIR(inode->i_mode))
+               return ret;
+
+       index = DIR_OFFSET_MIN;
+       octx = inode->i_op->get_offset_ctx(inode);
+       mt_for_each(&octx->mt, child, index, LONG_MAX) {
+               spin_lock(&child->d_lock);
+               if (simple_positive(child)) {
+                       spin_unlock(&child->d_lock);
+                       ret = 0;
+                       break;
+               }
+               spin_unlock(&child->d_lock);
+       }
+
+       return ret;
+}
+
 /**
  * simple_offset_rename_exchange - exchange rename with directory offsets
  * @old_dir: parent of dentry being moved
@@ -327,8 +362,8 @@ int simple_offset_rename_exchange(struct inode *old_dir,
 {
        struct offset_ctx *old_ctx = old_dir->i_op->get_offset_ctx(old_dir);
        struct offset_ctx *new_ctx = new_dir->i_op->get_offset_ctx(new_dir);
-       u32 old_index = dentry2offset(old_dentry);
-       u32 new_index = dentry2offset(new_dentry);
+       long old_index = dentry2offset(old_dentry);
+       long new_index = dentry2offset(new_dentry);
        int ret;
 
        simple_offset_remove(old_ctx, old_dentry);
@@ -354,9 +389,9 @@ int simple_offset_rename_exchange(struct inode *old_dir,
 
 out_restore:
        offset_set(old_dentry, old_index);
-       xa_store(&old_ctx->xa, old_index, old_dentry, GFP_KERNEL);
+       mtree_store(&old_ctx->mt, old_index, old_dentry, GFP_KERNEL);
        offset_set(new_dentry, new_index);
-       xa_store(&new_ctx->xa, new_index, new_dentry, GFP_KERNEL);
+       mtree_store(&new_ctx->mt, new_index, new_dentry, GFP_KERNEL);
        return ret;
 }
 
@@ -369,7 +404,7 @@ out_restore:
  */
 void simple_offset_destroy(struct offset_ctx *octx)
 {
-       xa_destroy(&octx->xa);
+       mtree_destroy(&octx->mt);
 }
 
 /**
@@ -399,15 +434,16 @@ static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence)
 
        /* In this case, ->private_data is protected by f_pos_lock */
        file->private_data = NULL;
-       return vfs_setpos(file, offset, U32_MAX);
+       return vfs_setpos(file, offset, LONG_MAX);
 }
 
-static struct dentry *offset_find_next(struct xa_state *xas)
+static struct dentry *offset_find_next(struct offset_ctx *octx, loff_t offset)
 {
+       MA_STATE(mas, &octx->mt, offset, offset);
        struct dentry *child, *found = NULL;
 
        rcu_read_lock();
-       child = xas_next_entry(xas, U32_MAX);
+       child = mas_find(&mas, LONG_MAX);
        if (!child)
                goto out;
        spin_lock(&child->d_lock);
@@ -421,8 +457,8 @@ out:
 
 static bool offset_dir_emit(struct dir_context *ctx, struct dentry *dentry)
 {
-       u32 offset = dentry2offset(dentry);
        struct inode *inode = d_inode(dentry);
+       long offset = dentry2offset(dentry);
 
        return ctx->actor(ctx, dentry->d_name.name, dentry->d_name.len, offset,
                          inode->i_ino, fs_umode_to_dtype(inode->i_mode));
@@ -430,12 +466,11 @@ static bool offset_dir_emit(struct dir_context *ctx, struct dentry *dentry)
 
 static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
 {
-       struct offset_ctx *so_ctx = inode->i_op->get_offset_ctx(inode);
-       XA_STATE(xas, &so_ctx->xa, ctx->pos);
+       struct offset_ctx *octx = inode->i_op->get_offset_ctx(inode);
        struct dentry *dentry;
 
        while (true) {
-               dentry = offset_find_next(&xas);
+               dentry = offset_find_next(octx, ctx->pos);
                if (!dentry)
                        return ERR_PTR(-ENOENT);
 
@@ -444,8 +479,8 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
                        break;
                }
 
+               ctx->pos = dentry2offset(dentry) + 1;
                dput(dentry);
-               ctx->pos = xas.xa_index + 1;
        }
        return NULL;
 }
@@ -481,7 +516,7 @@ static int offset_readdir(struct file *file, struct dir_context *ctx)
                return 0;
 
        /* In this case, ->private_data is protected by f_pos_lock */
-       if (ctx->pos == 2)
+       if (ctx->pos == DIR_OFFSET_MIN)
                file->private_data = NULL;
        else if (file->private_data == ERR_PTR(-ENOENT))
                return 0;
@@ -1580,7 +1615,7 @@ EXPORT_SYMBOL(alloc_anon_inode);
  * All arguments are ignored and it just returns -EINVAL.
  */
 int
-simple_nosetlease(struct file *filp, int arg, struct file_lock **flp,
+simple_nosetlease(struct file *filp, int arg, struct file_lease **flp,
                  void **priv)
 {
        return -EINVAL;
@@ -1704,16 +1739,28 @@ bool is_empty_dir_inode(struct inode *inode)
 static int generic_ci_d_compare(const struct dentry *dentry, unsigned int len,
                                const char *str, const struct qstr *name)
 {
-       const struct dentry *parent = READ_ONCE(dentry->d_parent);
-       const struct inode *dir = READ_ONCE(parent->d_inode);
-       const struct super_block *sb = dentry->d_sb;
-       const struct unicode_map *um = sb->s_encoding;
-       struct qstr qstr = QSTR_INIT(str, len);
+       const struct dentry *parent;
+       const struct inode *dir;
        char strbuf[DNAME_INLINE_LEN];
-       int ret;
+       struct qstr qstr;
+
+       /*
+        * Attempt a case-sensitive match first. It is cheaper and
+        * should cover most lookups, including all the sane
+        * applications that expect a case-sensitive filesystem.
+        *
+        * This comparison is safe under RCU because the caller
+        * guarantees the consistency between str and len. See
+        * __d_lookup_rcu_op_compare() for details.
+        */
+       if (len == name->len && !memcmp(str, name->name, len))
+               return 0;
 
+       parent = READ_ONCE(dentry->d_parent);
+       dir = READ_ONCE(parent->d_inode);
        if (!dir || !IS_CASEFOLDED(dir))
-               goto fallback;
+               return 1;
+
        /*
         * If the dentry name is stored in-line, then it may be concurrently
         * modified by a rename.  If this happens, the VFS will eventually retry
@@ -1724,20 +1771,14 @@ static int generic_ci_d_compare(const struct dentry *dentry, unsigned int len,
        if (len <= DNAME_INLINE_LEN - 1) {
                memcpy(strbuf, str, len);
                strbuf[len] = 0;
-               qstr.name = strbuf;
+               str = strbuf;
                /* prevent compiler from optimizing out the temporary buffer */
                barrier();
        }
-       ret = utf8_strncasecmp(um, name, &qstr);
-       if (ret >= 0)
-               return ret;
+       qstr.len = len;
+       qstr.name = str;
 
-       if (sb_has_strict_encoding(sb))
-               return -EINVAL;
-fallback:
-       if (len != name->len)
-               return 1;
-       return !!memcmp(str, name->name, len);
+       return utf8_strncasecmp(dentry->d_sb->s_encoding, name, &qstr);
 }
 
 /**
@@ -1752,7 +1793,7 @@ static int generic_ci_d_hash(const struct dentry *dentry, struct qstr *str)
        const struct inode *dir = READ_ONCE(dentry->d_inode);
        struct super_block *sb = dentry->d_sb;
        const struct unicode_map *um = sb->s_encoding;
-       int ret = 0;
+       int ret;
 
        if (!dir || !IS_CASEFOLDED(dir))
                return 0;
@@ -1766,73 +1807,45 @@ static int generic_ci_d_hash(const struct dentry *dentry, struct qstr *str)
 static const struct dentry_operations generic_ci_dentry_ops = {
        .d_hash = generic_ci_d_hash,
        .d_compare = generic_ci_d_compare,
-};
-#endif
-
 #ifdef CONFIG_FS_ENCRYPTION
-static const struct dentry_operations generic_encrypted_dentry_ops = {
        .d_revalidate = fscrypt_d_revalidate,
+#endif
 };
 #endif
 
-#if defined(CONFIG_FS_ENCRYPTION) && IS_ENABLED(CONFIG_UNICODE)
-static const struct dentry_operations generic_encrypted_ci_dentry_ops = {
-       .d_hash = generic_ci_d_hash,
-       .d_compare = generic_ci_d_compare,
+#ifdef CONFIG_FS_ENCRYPTION
+static const struct dentry_operations generic_encrypted_dentry_ops = {
        .d_revalidate = fscrypt_d_revalidate,
 };
 #endif
 
 /**
- * generic_set_encrypted_ci_d_ops - helper for setting d_ops for given dentry
- * @dentry:    dentry to set ops on
- *
- * Casefolded directories need d_hash and d_compare set, so that the dentries
- * contained in them are handled case-insensitively.  Note that these operations
- * are needed on the parent directory rather than on the dentries in it, and
- * while the casefolding flag can be toggled on and off on an empty directory,
- * dentry_operations can't be changed later.  As a result, if the filesystem has
- * casefolding support enabled at all, we have to give all dentries the
- * casefolding operations even if their inode doesn't have the casefolding flag
- * currently (and thus the casefolding ops would be no-ops for now).
- *
- * Encryption works differently in that the only dentry operation it needs is
- * d_revalidate, which it only needs on dentries that have the no-key name flag.
- * The no-key flag can't be set "later", so we don't have to worry about that.
+ * generic_set_sb_d_ops - helper for choosing the set of
+ * filesystem-wide dentry operations for the enabled features
+ * @sb: superblock to be configured
  *
- * Finally, to maximize compatibility with overlayfs (which isn't compatible
- * with certain dentry operations) and to avoid taking an unnecessary
- * performance hit, we use custom dentry_operations for each possible
- * combination rather than always installing all operations.
+ * Filesystems supporting casefolding and/or fscrypt can call this
+ * helper at mount-time to configure sb->s_d_op to the best set of dentry
+ * operations required for the enabled features. The helper must be
+ * called after these have been configured, but before the root dentry
+ * is created.
  */
-void generic_set_encrypted_ci_d_ops(struct dentry *dentry)
+void generic_set_sb_d_ops(struct super_block *sb)
 {
-#ifdef CONFIG_FS_ENCRYPTION
-       bool needs_encrypt_ops = dentry->d_flags & DCACHE_NOKEY_NAME;
-#endif
 #if IS_ENABLED(CONFIG_UNICODE)
-       bool needs_ci_ops = dentry->d_sb->s_encoding;
-#endif
-#if defined(CONFIG_FS_ENCRYPTION) && IS_ENABLED(CONFIG_UNICODE)
-       if (needs_encrypt_ops && needs_ci_ops) {
-               d_set_d_op(dentry, &generic_encrypted_ci_dentry_ops);
+       if (sb->s_encoding) {
+               sb->s_d_op = &generic_ci_dentry_ops;
                return;
        }
 #endif
 #ifdef CONFIG_FS_ENCRYPTION
-       if (needs_encrypt_ops) {
-               d_set_d_op(dentry, &generic_encrypted_dentry_ops);
-               return;
-       }
-#endif
-#if IS_ENABLED(CONFIG_UNICODE)
-       if (needs_ci_ops) {
-               d_set_d_op(dentry, &generic_ci_dentry_ops);
+       if (sb->s_cop) {
+               sb->s_d_op = &generic_encrypted_dentry_ops;
                return;
        }
 #endif
 }
-EXPORT_SYMBOL(generic_set_encrypted_ci_d_ops);
+EXPORT_SYMBOL(generic_set_sb_d_ops);
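
A sketch of where the new helper slots in (hypothetical "example_*" names): it runs in fill_super after s_encoding/s_cop are configured but before the root dentry exists, matching the ordering requirement documented above.

    static int example_fill_super(struct super_block *sb, struct fs_context *fc)
    {
            struct inode *root;

            sb->s_encoding = example_load_casefold_map();   /* enables casefolding */
            generic_set_sb_d_ops(sb);       /* picks sb->s_d_op for the features */

            root = example_make_root_inode(sb);
            if (!root)
                    return -ENOMEM;
            sb->s_root = d_make_root(root);
            return sb->s_root ? 0 : -ENOMEM;
    }
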
 
 /**
  * inode_maybe_inc_iversion - increments i_version
@@ -1973,3 +1986,144 @@ struct timespec64 simple_inode_init_ts(struct inode *inode)
        return ts;
 }
 EXPORT_SYMBOL(simple_inode_init_ts);
+
+static inline struct dentry *get_stashed_dentry(struct dentry *stashed)
+{
+       struct dentry *dentry;
+
+       guard(rcu)();
+       dentry = READ_ONCE(stashed);
+       if (!dentry)
+               return NULL;
+       if (!lockref_get_not_dead(&dentry->d_lockref))
+               return NULL;
+       return dentry;
+}
+
+static struct dentry *prepare_anon_dentry(struct dentry **stashed,
+                                         unsigned long ino,
+                                         struct super_block *sb,
+                                         void *data)
+{
+       struct dentry *dentry;
+       struct inode *inode;
+       const struct stashed_operations *sops = sb->s_fs_info;
+
+       dentry = d_alloc_anon(sb);
+       if (!dentry)
+               return ERR_PTR(-ENOMEM);
+
+       inode = new_inode_pseudo(sb);
+       if (!inode) {
+               dput(dentry);
+               return ERR_PTR(-ENOMEM);
+       }
+
+       inode->i_ino = ino;
+       inode->i_flags |= S_IMMUTABLE;
+       inode->i_mode = S_IFREG;
+       simple_inode_init_ts(inode);
+       sops->init_inode(inode, data);
+
+       /* Notice when this is changed. */
+       WARN_ON_ONCE(!S_ISREG(inode->i_mode));
+       WARN_ON_ONCE(!IS_IMMUTABLE(inode));
+
+       /* Store address of location where dentry's supposed to be stashed. */
+       dentry->d_fsdata = stashed;
+
+       /* @data is now owned by the fs */
+       d_instantiate(dentry, inode);
+       return dentry;
+}
+
+static struct dentry *stash_dentry(struct dentry **stashed,
+                                  struct dentry *dentry)
+{
+       guard(rcu)();
+       for (;;) {
+               struct dentry *old;
+
+               /* Assume any old dentry was cleared out. */
+               old = cmpxchg(stashed, NULL, dentry);
+               if (likely(!old))
+                       return dentry;
+
+               /* Check if somebody else installed a reusable dentry. */
+               if (lockref_get_not_dead(&old->d_lockref))
+                       return old;
+
+               /* There's an old dead dentry there, try to take it over. */
+               if (likely(try_cmpxchg(stashed, &old, dentry)))
+                       return dentry;
+       }
+}
+
+/**
+ * path_from_stashed - create path from stashed or new dentry
+ * @stashed:    where to retrieve or stash dentry
+ * @ino:        inode number to use
+ * @mnt:        mnt of the filesystems to use
+ * @data:       data to store in inode->i_private
+ * @path:       path to create
+ *
+ * The function tries to retrieve a stashed dentry from @stashed. If the dentry
+ * is still valid then it will be reused. If the dentry cannot be reused, the
+ * function will allocate a new dentry and inode. It will then check again whether it
+ * can reuse an existing dentry in case one has been added in the meantime or
+ * update @stashed with the newly added dentry.
+ *
+ * Special-purpose helper for nsfs and pidfs.
+ *
+ * Return: On success zero and on failure a negative error is returned.
+ */
+int path_from_stashed(struct dentry **stashed, unsigned long ino,
+                     struct vfsmount *mnt, void *data, struct path *path)
+{
+       struct dentry *dentry;
+       const struct stashed_operations *sops = mnt->mnt_sb->s_fs_info;
+
+       /* See if dentry can be reused. */
+       path->dentry = get_stashed_dentry(*stashed);
+       if (path->dentry) {
+               sops->put_data(data);
+               goto out_path;
+       }
+
+       /* Allocate a new dentry. */
+       dentry = prepare_anon_dentry(stashed, ino, mnt->mnt_sb, data);
+       if (IS_ERR(dentry)) {
+               sops->put_data(data);
+               return PTR_ERR(dentry);
+       }
+
+       /* Added a new dentry. @data is now owned by the filesystem. */
+       path->dentry = stash_dentry(stashed, dentry);
+       if (path->dentry != dentry)
+               dput(dentry);
+
+out_path:
+       WARN_ON_ONCE(path->dentry->d_fsdata != stashed);
+       WARN_ON_ONCE(d_inode(path->dentry)->i_private != data);
+       path->mnt = mntget(mnt);
+       return 0;
+}
+
+void stashed_dentry_prune(struct dentry *dentry)
+{
+       struct dentry **stashed = dentry->d_fsdata;
+       struct inode *inode = d_inode(dentry);
+
+       if (WARN_ON_ONCE(!stashed))
+               return;
+
+       if (!inode)
+               return;
+
+       /*
+        * Only replace our own @dentry as someone else might've
+        * already cleared out @dentry and stashed their own
+        * dentry in there.
+        */
+       cmpxchg(stashed, dentry, NULL);
+}
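
The xarray-to-maple-tree conversion earlier in this file reduces to one allocation-pattern change. A standalone sketch (hypothetical "example_*" names) of the cyclic offset allocation that simple_offset_add() now relies on, with the ceiling raised from U32_MAX to LONG_MAX:

    #include <linux/maple_tree.h>

    static struct maple_tree example_mt =
            MTREE_INIT(example_mt, MT_FLAGS_ALLOC_RANGE);
    static unsigned long example_next = 2;  /* DIR_OFFSET_MIN */

    static long example_alloc_offset(void *entry)
    {
            unsigned long offset;
            int ret;

            /* Cyclically hand out offsets in [2, LONG_MAX], resuming at
             * example_next and wrapping once the range is exhausted. */
            ret = mtree_alloc_cyclic(&example_mt, &offset, entry, 2, LONG_MAX,
                                     &example_next, GFP_KERNEL);
            if (ret < 0)
                    return ret;
            return offset;
    }
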
index 8161667c976f8c487de84ba9fcba189ae29072d9..527458db4525af3e76d9119feb6e5e5b62890741 100644 (file)
@@ -243,7 +243,7 @@ static void encode_nlm4_holder(struct xdr_stream *xdr,
        u64 l_offset, l_len;
        __be32 *p;
 
-       encode_bool(xdr, lock->fl.fl_type == F_RDLCK);
+       encode_bool(xdr, lock->fl.c.flc_type == F_RDLCK);
        encode_int32(xdr, lock->svid);
        encode_netobj(xdr, lock->oh.data, lock->oh.len);
 
@@ -270,7 +270,7 @@ static int decode_nlm4_holder(struct xdr_stream *xdr, struct nlm_res *result)
                goto out_overflow;
        exclusive = be32_to_cpup(p++);
        lock->svid = be32_to_cpup(p);
-       fl->fl_pid = (pid_t)lock->svid;
+       fl->c.flc_pid = (pid_t)lock->svid;
 
        error = decode_netobj(xdr, &lock->oh);
        if (unlikely(error))
@@ -280,8 +280,8 @@ static int decode_nlm4_holder(struct xdr_stream *xdr, struct nlm_res *result)
        if (unlikely(p == NULL))
                goto out_overflow;
 
-       fl->fl_flags = FL_POSIX;
-       fl->fl_type  = exclusive != 0 ? F_WRLCK : F_RDLCK;
+       fl->c.flc_flags = FL_POSIX;
+       fl->c.flc_type  = exclusive != 0 ? F_WRLCK : F_RDLCK;
        p = xdr_decode_hyper(p, &l_offset);
        xdr_decode_hyper(p, &l_len);
        nlm4svc_set_file_lock_range(fl, l_offset, l_len);
@@ -357,7 +357,7 @@ static void nlm4_xdr_enc_testargs(struct rpc_rqst *req,
        const struct nlm_lock *lock = &args->lock;
 
        encode_cookie(xdr, &args->cookie);
-       encode_bool(xdr, lock->fl.fl_type == F_WRLCK);
+       encode_bool(xdr, lock->fl.c.flc_type == F_WRLCK);
        encode_nlm4_lock(xdr, lock);
 }
 
@@ -380,7 +380,7 @@ static void nlm4_xdr_enc_lockargs(struct rpc_rqst *req,
 
        encode_cookie(xdr, &args->cookie);
        encode_bool(xdr, args->block);
-       encode_bool(xdr, lock->fl.fl_type == F_WRLCK);
+       encode_bool(xdr, lock->fl.c.flc_type == F_WRLCK);
        encode_nlm4_lock(xdr, lock);
        encode_bool(xdr, args->reclaim);
        encode_int32(xdr, args->state);
@@ -403,7 +403,7 @@ static void nlm4_xdr_enc_cancargs(struct rpc_rqst *req,
 
        encode_cookie(xdr, &args->cookie);
        encode_bool(xdr, args->block);
-       encode_bool(xdr, lock->fl.fl_type == F_WRLCK);
+       encode_bool(xdr, lock->fl.c.flc_type == F_WRLCK);
        encode_nlm4_lock(xdr, lock);
 }
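
The fl_* to c.flc_* renames throughout lockd track the 6.9 split of the generic locking fields into struct file_lock_core, embedded as the "c" member of struct file_lock. A sketch of the before/after field access (hypothetical helper; byte-range fields such as fl_start/fl_end stay on struct file_lock itself):

    static void example_init_read_lock(struct file_lock *fl, struct file *filp)
    {
            fl->c.flc_type = F_RDLCK;       /* was fl->fl_type  */
            fl->c.flc_flags = FL_POSIX;     /* was fl->fl_flags */
            fl->c.flc_file = filp;          /* was fl->fl_file  */
            fl->c.flc_pid = current->tgid;  /* was fl->fl_pid   */
            fl->fl_start = 0;               /* unchanged */
            fl->fl_end = OFFSET_MAX;        /* unchanged */
    }
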
 
index 5d85715be763044ad2245815295567bcfb84c2d4..a7e0519ec024a9f73ca05d0d4a0ca29f22d0eb23 100644 (file)
@@ -185,7 +185,7 @@ __be32 nlmclnt_grant(const struct sockaddr *addr, const struct nlm_lock *lock)
                        continue;
                if (!rpc_cmp_addr(nlm_addr(block->b_host), addr))
                        continue;
-               if (nfs_compare_fh(NFS_FH(file_inode(fl_blocked->fl_file)), fh) != 0)
+               if (nfs_compare_fh(NFS_FH(file_inode(fl_blocked->c.flc_file)), fh) != 0)
                        continue;
                /* Alright, we found a lock. Set the return status
                 * and wake up the caller
index fba6c7fa74747e9c411ed23917b744cc93c49ac3..cebcc283b7ce2e813944d9037de2a7462585a2c9 100644 (file)
@@ -133,7 +133,8 @@ static void nlmclnt_setlockargs(struct nlm_rqst *req, struct file_lock *fl)
        char *nodename = req->a_host->h_rpcclnt->cl_nodename;
 
        nlmclnt_next_cookie(&argp->cookie);
-       memcpy(&lock->fh, NFS_FH(file_inode(fl->fl_file)), sizeof(struct nfs_fh));
+       memcpy(&lock->fh, NFS_FH(file_inode(fl->c.flc_file)),
+              sizeof(struct nfs_fh));
        lock->caller  = nodename;
        lock->oh.data = req->a_owner;
        lock->oh.len  = snprintf(req->a_owner, sizeof(req->a_owner), "%u@%s",
@@ -142,7 +143,7 @@ static void nlmclnt_setlockargs(struct nlm_rqst *req, struct file_lock *fl)
        lock->svid = fl->fl_u.nfs_fl.owner->pid;
        lock->fl.fl_start = fl->fl_start;
        lock->fl.fl_end = fl->fl_end;
-       lock->fl.fl_type = fl->fl_type;
+       lock->fl.c.flc_type = fl->c.flc_type;
 }
 
 static void nlmclnt_release_lockargs(struct nlm_rqst *req)
@@ -182,7 +183,7 @@ int nlmclnt_proc(struct nlm_host *host, int cmd, struct file_lock *fl, void *dat
        call->a_callback_data = data;
 
        if (IS_SETLK(cmd) || IS_SETLKW(cmd)) {
-               if (fl->fl_type != F_UNLCK) {
+               if (fl->c.flc_type != F_UNLCK) {
                        call->a_args.block = IS_SETLKW(cmd) ? 1 : 0;
                        status = nlmclnt_lock(call, fl);
                } else
@@ -432,13 +433,14 @@ nlmclnt_test(struct nlm_rqst *req, struct file_lock *fl)
 {
        int     status;
 
-       status = nlmclnt_call(nfs_file_cred(fl->fl_file), req, NLMPROC_TEST);
+       status = nlmclnt_call(nfs_file_cred(fl->c.flc_file), req,
+                             NLMPROC_TEST);
        if (status < 0)
                goto out;
 
        switch (req->a_res.status) {
                case nlm_granted:
-                       fl->fl_type = F_UNLCK;
+                       fl->c.flc_type = F_UNLCK;
                        break;
                case nlm_lck_denied:
                        /*
@@ -446,8 +448,8 @@ nlmclnt_test(struct nlm_rqst *req, struct file_lock *fl)
                         */
                        fl->fl_start = req->a_res.lock.fl.fl_start;
                        fl->fl_end = req->a_res.lock.fl.fl_end;
-                       fl->fl_type = req->a_res.lock.fl.fl_type;
-                       fl->fl_pid = -req->a_res.lock.fl.fl_pid;
+                       fl->c.flc_type = req->a_res.lock.fl.c.flc_type;
+                       fl->c.flc_pid = -req->a_res.lock.fl.c.flc_pid;
                        break;
                default:
                        status = nlm_stat_to_errno(req->a_res.status);
@@ -485,14 +487,15 @@ static const struct file_lock_operations nlmclnt_lock_ops = {
 static void nlmclnt_locks_init_private(struct file_lock *fl, struct nlm_host *host)
 {
        fl->fl_u.nfs_fl.state = 0;
-       fl->fl_u.nfs_fl.owner = nlmclnt_find_lockowner(host, fl->fl_owner);
+       fl->fl_u.nfs_fl.owner = nlmclnt_find_lockowner(host,
+                                                      fl->c.flc_owner);
        INIT_LIST_HEAD(&fl->fl_u.nfs_fl.list);
        fl->fl_ops = &nlmclnt_lock_ops;
 }
 
 static int do_vfs_lock(struct file_lock *fl)
 {
-       return locks_lock_file_wait(fl->fl_file, fl);
+       return locks_lock_file_wait(fl->c.flc_file, fl);
 }
 
 /*
@@ -518,12 +521,12 @@ static int do_vfs_lock(struct file_lock *fl)
 static int
 nlmclnt_lock(struct nlm_rqst *req, struct file_lock *fl)
 {
-       const struct cred *cred = nfs_file_cred(fl->fl_file);
+       const struct cred *cred = nfs_file_cred(fl->c.flc_file);
        struct nlm_host *host = req->a_host;
        struct nlm_res  *resp = &req->a_res;
        struct nlm_wait block;
-       unsigned char fl_flags = fl->fl_flags;
-       unsigned char fl_type;
+       unsigned char flags = fl->c.flc_flags;
+       unsigned char type;
        __be32 b_status;
        int status = -ENOLCK;
 
@@ -531,9 +534,9 @@ nlmclnt_lock(struct nlm_rqst *req, struct file_lock *fl)
                goto out;
        req->a_args.state = nsm_local_state;
 
-       fl->fl_flags |= FL_ACCESS;
+       fl->c.flc_flags |= FL_ACCESS;
        status = do_vfs_lock(fl);
-       fl->fl_flags = fl_flags;
+       fl->c.flc_flags = flags;
        if (status < 0)
                goto out;
 
@@ -591,11 +594,11 @@ again:
                        goto again;
                }
                /* Ensure the resulting lock will get added to granted list */
-               fl->fl_flags |= FL_SLEEP;
+               fl->c.flc_flags |= FL_SLEEP;
                if (do_vfs_lock(fl) < 0)
                        printk(KERN_WARNING "%s: VFS is out of sync with lock manager!\n", __func__);
                up_read(&host->h_rwsem);
-               fl->fl_flags = fl_flags;
+               fl->c.flc_flags = flags;
                status = 0;
        }
        if (status < 0)
@@ -605,7 +608,7 @@ again:
         * cases NLM_LCK_DENIED is returned for a permanent error.  So
         * turn it into an ENOLCK.
         */
-       if (resp->status == nlm_lck_denied && (fl_flags & FL_SLEEP))
+       if (resp->status == nlm_lck_denied && (flags & FL_SLEEP))
                status = -ENOLCK;
        else
                status = nlm_stat_to_errno(resp->status);
@@ -622,13 +625,13 @@ out_unlock:
                           req->a_host->h_addrlen, req->a_res.status);
        dprintk("lockd: lock attempt ended in fatal error.\n"
                "       Attempting to unlock.\n");
-       fl_type = fl->fl_type;
-       fl->fl_type = F_UNLCK;
+       type = fl->c.flc_type;
+       fl->c.flc_type = F_UNLCK;
        down_read(&host->h_rwsem);
        do_vfs_lock(fl);
        up_read(&host->h_rwsem);
-       fl->fl_type = fl_type;
-       fl->fl_flags = fl_flags;
+       fl->c.flc_type = type;
+       fl->c.flc_flags = flags;
        nlmclnt_async_call(cred, req, NLMPROC_UNLOCK, &nlmclnt_unlock_ops);
        return status;
 }
@@ -651,12 +654,14 @@ nlmclnt_reclaim(struct nlm_host *host, struct file_lock *fl,
        nlmclnt_setlockargs(req, fl);
        req->a_args.reclaim = 1;
 
-       status = nlmclnt_call(nfs_file_cred(fl->fl_file), req, NLMPROC_LOCK);
+       status = nlmclnt_call(nfs_file_cred(fl->c.flc_file), req,
+                             NLMPROC_LOCK);
        if (status >= 0 && req->a_res.status == nlm_granted)
                return 0;
 
        printk(KERN_WARNING "lockd: failed to reclaim lock for pid %d "
-                               "(errno %d, status %d)\n", fl->fl_pid,
+                               "(errno %d, status %d)\n",
+                               fl->c.flc_pid,
                                status, ntohl(req->a_res.status));
 
        /*
@@ -683,26 +688,26 @@ nlmclnt_unlock(struct nlm_rqst *req, struct file_lock *fl)
        struct nlm_host *host = req->a_host;
        struct nlm_res  *resp = &req->a_res;
        int status;
-       unsigned char fl_flags = fl->fl_flags;
+       unsigned char flags = fl->c.flc_flags;
 
        /*
         * Note: the server is supposed to either grant us the unlock
         * request, or to deny it with NLM_LCK_DENIED_GRACE_PERIOD. In either
         * case, we want to unlock.
         */
-       fl->fl_flags |= FL_EXISTS;
+       fl->c.flc_flags |= FL_EXISTS;
        down_read(&host->h_rwsem);
        status = do_vfs_lock(fl);
        up_read(&host->h_rwsem);
-       fl->fl_flags = fl_flags;
+       fl->c.flc_flags = flags;
        if (status == -ENOENT) {
                status = 0;
                goto out;
        }
 
        refcount_inc(&req->a_count);
-       status = nlmclnt_async_call(nfs_file_cred(fl->fl_file), req,
-                       NLMPROC_UNLOCK, &nlmclnt_unlock_ops);
+       status = nlmclnt_async_call(nfs_file_cred(fl->c.flc_file), req,
+                                   NLMPROC_UNLOCK, &nlmclnt_unlock_ops);
        if (status < 0)
                goto out;
 
@@ -795,8 +800,8 @@ static int nlmclnt_cancel(struct nlm_host *host, int block, struct file_lock *fl
        req->a_args.block = block;
 
        refcount_inc(&req->a_count);
-       status = nlmclnt_async_call(nfs_file_cred(fl->fl_file), req,
-                       NLMPROC_CANCEL, &nlmclnt_cancel_ops);
+       status = nlmclnt_async_call(nfs_file_cred(fl->c.flc_file), req,
+                                   NLMPROC_CANCEL, &nlmclnt_cancel_ops);
        if (status == 0 && req->a_res.status == nlm_lck_denied)
                status = -ENOLCK;
        nlmclnt_release_call(req);
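
The conversion running through the lockd client code above is mechanical: the generic bookkeeping fields of struct file_lock (owner, pid, flags, type, file, and the block/wait linkage) move into an embedded struct file_lock_core reached as fl->c.flc_*, while the byte-range fields and the ops pointers stay in the outer structure. A hedged sketch of the split, built only from the member names visible in this diff (the exact upstream layout may order, type, or extend the fields differently):

	struct file_lock_core {
		struct file_lock_core *flc_blocker;	/* lock we are waiting on */
		struct list_head flc_list;		/* per-inode flc_posix/flc_flock entry */
		struct hlist_node flc_link;		/* global file_lock_list / blocked_hash */
		struct list_head flc_blocked_requests;	/* requests waiting on this lock */
		struct list_head flc_blocked_member;	/* our entry in a blocker's list */
		fl_owner_t flc_owner;
		unsigned int flc_flags;
		unsigned char flc_type;
		pid_t flc_pid;
		int flc_link_cpu;
		wait_queue_head_t flc_wait;
		struct file *flc_file;
	};

	struct file_lock {
		struct file_lock_core c;	/* generic part, embedded first */
		loff_t fl_start;
		loff_t fl_end;
		const struct file_lock_operations *fl_ops;
		const struct lock_manager_operations *fl_lmops;
		/* ... fl_u and the rest as before ... */
	};
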
index 4df62f6355295556a4efa65a7148ccc14334a975..a3e97278b997cfee94217ce3b83adbd0df5134f4 100644
@@ -238,7 +238,7 @@ static void encode_nlm_holder(struct xdr_stream *xdr,
        u32 l_offset, l_len;
        __be32 *p;
 
-       encode_bool(xdr, lock->fl.fl_type == F_RDLCK);
+       encode_bool(xdr, lock->fl.c.flc_type == F_RDLCK);
        encode_int32(xdr, lock->svid);
        encode_netobj(xdr, lock->oh.data, lock->oh.len);
 
@@ -265,7 +265,7 @@ static int decode_nlm_holder(struct xdr_stream *xdr, struct nlm_res *result)
                goto out_overflow;
        exclusive = be32_to_cpup(p++);
        lock->svid = be32_to_cpup(p);
-       fl->fl_pid = (pid_t)lock->svid;
+       fl->c.flc_pid = (pid_t)lock->svid;
 
        error = decode_netobj(xdr, &lock->oh);
        if (unlikely(error))
@@ -275,8 +275,8 @@ static int decode_nlm_holder(struct xdr_stream *xdr, struct nlm_res *result)
        if (unlikely(p == NULL))
                goto out_overflow;
 
-       fl->fl_flags = FL_POSIX;
-       fl->fl_type  = exclusive != 0 ? F_WRLCK : F_RDLCK;
+       fl->c.flc_flags = FL_POSIX;
+       fl->c.flc_type  = exclusive != 0 ? F_WRLCK : F_RDLCK;
        l_offset = be32_to_cpup(p++);
        l_len = be32_to_cpup(p);
        end = l_offset + l_len - 1;
@@ -357,7 +357,7 @@ static void nlm_xdr_enc_testargs(struct rpc_rqst *req,
        const struct nlm_lock *lock = &args->lock;
 
        encode_cookie(xdr, &args->cookie);
-       encode_bool(xdr, lock->fl.fl_type == F_WRLCK);
+       encode_bool(xdr, lock->fl.c.flc_type == F_WRLCK);
        encode_nlm_lock(xdr, lock);
 }
 
@@ -380,7 +380,7 @@ static void nlm_xdr_enc_lockargs(struct rpc_rqst *req,
 
        encode_cookie(xdr, &args->cookie);
        encode_bool(xdr, args->block);
-       encode_bool(xdr, lock->fl.fl_type == F_WRLCK);
+       encode_bool(xdr, lock->fl.c.flc_type == F_WRLCK);
        encode_nlm_lock(xdr, lock);
        encode_bool(xdr, args->reclaim);
        encode_int32(xdr, args->state);
@@ -403,7 +403,7 @@ static void nlm_xdr_enc_cancargs(struct rpc_rqst *req,
 
        encode_cookie(xdr, &args->cookie);
        encode_bool(xdr, args->block);
-       encode_bool(xdr, lock->fl.fl_type == F_WRLCK);
+       encode_bool(xdr, lock->fl.c.flc_type == F_WRLCK);
        encode_nlm_lock(xdr, lock);
 }
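
The XDR encoders above still open-code the type tests (flc_type == F_RDLCK / F_WRLCK). The same tests appear elsewhere in the series behind predicates: lock_is_write() shows up in lock_to_openmode() further down, and lock_is_unlock() in the fs/locks.c changes. A plausible sketch of those inlines, assuming they are thin wrappers over the embedded core (lock_is_read() is inferred by symmetry and is not visible in this diff):

	static inline bool lock_is_unlock(struct file_lock *fl)
	{
		return fl->c.flc_type == F_UNLCK;
	}

	static inline bool lock_is_read(struct file_lock *fl)
	{
		return fl->c.flc_type == F_RDLCK;
	}

	static inline bool lock_is_write(struct file_lock *fl)
	{
		return fl->c.flc_type == F_WRLCK;
	}
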
 
index b72023a6b4c16d9fbdb4d9c0021d93e732bbc99b..8a72c418cdcc09b5172532964c0cbe8bc7eeda3a 100644
@@ -52,16 +52,16 @@ nlm4svc_retrieve_args(struct svc_rqst *rqstp, struct nlm_args *argp,
                *filp = file;
 
                /* Set up the missing parts of the file_lock structure */
-               lock->fl.fl_flags = FL_POSIX;
-               lock->fl.fl_file  = file->f_file[mode];
-               lock->fl.fl_pid = current->tgid;
+               lock->fl.c.flc_flags = FL_POSIX;
+               lock->fl.c.flc_file  = file->f_file[mode];
+               lock->fl.c.flc_pid = current->tgid;
                lock->fl.fl_start = (loff_t)lock->lock_start;
                lock->fl.fl_end = lock->lock_len ?
                                   (loff_t)(lock->lock_start + lock->lock_len - 1) :
                                   OFFSET_MAX;
                lock->fl.fl_lmops = &nlmsvc_lock_operations;
                nlmsvc_locks_init_private(&lock->fl, host, (pid_t)lock->svid);
-               if (!lock->fl.fl_owner) {
+               if (!lock->fl.c.flc_owner) {
                        /* lockowner allocation has failed */
                        nlmsvc_release_host(host);
                        return nlm_lck_denied_nolocks;
@@ -106,7 +106,7 @@ __nlm4svc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp)
        if ((resp->status = nlm4svc_retrieve_args(rqstp, argp, &host, &file)))
        return resp->status == nlm_drop_reply ? rpc_drop_reply : rpc_success;
 
-       test_owner = argp->lock.fl.fl_owner;
+       test_owner = argp->lock.fl.c.flc_owner;
        /* Now check for conflicting locks */
        resp->status = nlmsvc_testlock(rqstp, file, host, &argp->lock, &resp->lock, &resp->cookie);
        if (resp->status == nlm_drop_reply)
index 2dc10900ad1c335415e1ee43f9310d18a79dacc7..1f2149db10f2469562fce16af291246dbeae9ee6 100644
@@ -150,16 +150,17 @@ nlmsvc_lookup_block(struct nlm_file *file, struct nlm_lock *lock)
        struct file_lock        *fl;
 
        dprintk("lockd: nlmsvc_lookup_block f=%p pd=%d %Ld-%Ld ty=%d\n",
-                               file, lock->fl.fl_pid,
+                               file, lock->fl.c.flc_pid,
                                (long long)lock->fl.fl_start,
-                               (long long)lock->fl.fl_end, lock->fl.fl_type);
+                               (long long)lock->fl.fl_end,
+                               lock->fl.c.flc_type);
        spin_lock(&nlm_blocked_lock);
        list_for_each_entry(block, &nlm_blocked, b_list) {
                fl = &block->b_call->a_args.lock.fl;
                dprintk("lockd: check f=%p pd=%d %Ld-%Ld ty=%d cookie=%s\n",
-                               block->b_file, fl->fl_pid,
+                               block->b_file, fl->c.flc_pid,
                                (long long)fl->fl_start,
-                               (long long)fl->fl_end, fl->fl_type,
+                               (long long)fl->fl_end, fl->c.flc_type,
                                nlmdbg_cookie2a(&block->b_call->a_args.cookie));
                if (block->b_file == file && nlm_compare_locks(fl, &lock->fl)) {
                        kref_get(&block->b_count);
@@ -244,7 +245,7 @@ nlmsvc_create_block(struct svc_rqst *rqstp, struct nlm_host *host,
                goto failed_free;
 
        /* Set notifier function for VFS, and init args */
-       call->a_args.lock.fl.fl_flags |= FL_SLEEP;
+       call->a_args.lock.fl.c.flc_flags |= FL_SLEEP;
        call->a_args.lock.fl.fl_lmops = &nlmsvc_lock_operations;
        nlmclnt_next_cookie(&call->a_args.cookie);
 
@@ -402,14 +403,14 @@ static struct nlm_lockowner *nlmsvc_find_lockowner(struct nlm_host *host, pid_t
 void
 nlmsvc_release_lockowner(struct nlm_lock *lock)
 {
-       if (lock->fl.fl_owner)
-               nlmsvc_put_lockowner(lock->fl.fl_owner);
+       if (lock->fl.c.flc_owner)
+               nlmsvc_put_lockowner(lock->fl.c.flc_owner);
 }
 
 void nlmsvc_locks_init_private(struct file_lock *fl, struct nlm_host *host,
                                                pid_t pid)
 {
-       fl->fl_owner = nlmsvc_find_lockowner(host, pid);
+       fl->c.flc_owner = nlmsvc_find_lockowner(host, pid);
 }
 
 /*
@@ -425,7 +426,7 @@ static int nlmsvc_setgrantargs(struct nlm_rqst *call, struct nlm_lock *lock)
 
        /* set default data area */
        call->a_args.lock.oh.data = call->a_owner;
-       call->a_args.lock.svid = ((struct nlm_lockowner *)lock->fl.fl_owner)->pid;
+       call->a_args.lock.svid = ((struct nlm_lockowner *) lock->fl.c.flc_owner)->pid;
 
        if (lock->oh.len > NLMCLNT_OHSIZE) {
                void *data = kmalloc(lock->oh.len, GFP_KERNEL);
@@ -489,7 +490,8 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file,
 
        dprintk("lockd: nlmsvc_lock(%s/%ld, ty=%d, pi=%d, %Ld-%Ld, bl=%d)\n",
                                inode->i_sb->s_id, inode->i_ino,
-                               lock->fl.fl_type, lock->fl.fl_pid,
+                               lock->fl.c.flc_type,
+                               lock->fl.c.flc_pid,
                                (long long)lock->fl.fl_start,
                                (long long)lock->fl.fl_end,
                                wait);
@@ -512,7 +514,7 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file,
                        goto out;
                lock = &block->b_call->a_args.lock;
        } else
-               lock->fl.fl_flags &= ~FL_SLEEP;
+               lock->fl.c.flc_flags &= ~FL_SLEEP;
 
        if (block->b_flags & B_QUEUED) {
                dprintk("lockd: nlmsvc_lock deferred block %p flags %d\n",
@@ -560,10 +562,10 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file,
        spin_unlock(&nlm_blocked_lock);
 
        if (!wait)
-               lock->fl.fl_flags &= ~FL_SLEEP;
+               lock->fl.c.flc_flags &= ~FL_SLEEP;
        mode = lock_to_openmode(&lock->fl);
        error = vfs_lock_file(file->f_file[mode], F_SETLK, &lock->fl, NULL);
-       lock->fl.fl_flags &= ~FL_SLEEP;
+       lock->fl.c.flc_flags &= ~FL_SLEEP;
 
        dprintk("lockd: vfs_lock_file returned %d\n", error);
        switch (error) {
@@ -616,7 +618,7 @@ nlmsvc_testlock(struct svc_rqst *rqstp, struct nlm_file *file,
        dprintk("lockd: nlmsvc_testlock(%s/%ld, ty=%d, %Ld-%Ld)\n",
                                nlmsvc_file_inode(file)->i_sb->s_id,
                                nlmsvc_file_inode(file)->i_ino,
-                               lock->fl.fl_type,
+                               lock->fl.c.flc_type,
                                (long long)lock->fl.fl_start,
                                (long long)lock->fl.fl_end);
 
@@ -636,19 +638,19 @@ nlmsvc_testlock(struct svc_rqst *rqstp, struct nlm_file *file,
                goto out;
        }
 
-       if (lock->fl.fl_type == F_UNLCK) {
+       if (lock->fl.c.flc_type == F_UNLCK) {
                ret = nlm_granted;
                goto out;
        }
 
        dprintk("lockd: conflicting lock(ty=%d, %Ld-%Ld)\n",
-               lock->fl.fl_type, (long long)lock->fl.fl_start,
+               lock->fl.c.flc_type, (long long)lock->fl.fl_start,
                (long long)lock->fl.fl_end);
        conflock->caller = "somehost";  /* FIXME */
        conflock->len = strlen(conflock->caller);
        conflock->oh.len = 0;           /* don't return OH info */
-       conflock->svid = lock->fl.fl_pid;
-       conflock->fl.fl_type = lock->fl.fl_type;
+       conflock->svid = lock->fl.c.flc_pid;
+       conflock->fl.c.flc_type = lock->fl.c.flc_type;
        conflock->fl.fl_start = lock->fl.fl_start;
        conflock->fl.fl_end = lock->fl.fl_end;
        locks_release_private(&lock->fl);
@@ -673,21 +675,21 @@ nlmsvc_unlock(struct net *net, struct nlm_file *file, struct nlm_lock *lock)
        dprintk("lockd: nlmsvc_unlock(%s/%ld, pi=%d, %Ld-%Ld)\n",
                                nlmsvc_file_inode(file)->i_sb->s_id,
                                nlmsvc_file_inode(file)->i_ino,
-                               lock->fl.fl_pid,
+                               lock->fl.c.flc_pid,
                                (long long)lock->fl.fl_start,
                                (long long)lock->fl.fl_end);
 
        /* First, cancel any lock that might be there */
        nlmsvc_cancel_blocked(net, file, lock);
 
-       lock->fl.fl_type = F_UNLCK;
-       lock->fl.fl_file = file->f_file[O_RDONLY];
-       if (lock->fl.fl_file)
-               error = vfs_lock_file(lock->fl.fl_file, F_SETLK,
+       lock->fl.c.flc_type = F_UNLCK;
+       lock->fl.c.flc_file = file->f_file[O_RDONLY];
+       if (lock->fl.c.flc_file)
+               error = vfs_lock_file(lock->fl.c.flc_file, F_SETLK,
                                        &lock->fl, NULL);
-       lock->fl.fl_file = file->f_file[O_WRONLY];
-       if (lock->fl.fl_file)
-               error |= vfs_lock_file(lock->fl.fl_file, F_SETLK,
+       lock->fl.c.flc_file = file->f_file[O_WRONLY];
+       if (lock->fl.c.flc_file)
+               error |= vfs_lock_file(lock->fl.c.flc_file, F_SETLK,
                                        &lock->fl, NULL);
 
        return (error < 0)? nlm_lck_denied_nolocks : nlm_granted;
@@ -710,7 +712,7 @@ nlmsvc_cancel_blocked(struct net *net, struct nlm_file *file, struct nlm_lock *l
        dprintk("lockd: nlmsvc_cancel(%s/%ld, pi=%d, %Ld-%Ld)\n",
                                nlmsvc_file_inode(file)->i_sb->s_id,
                                nlmsvc_file_inode(file)->i_ino,
-                               lock->fl.fl_pid,
+                               lock->fl.c.flc_pid,
                                (long long)lock->fl.fl_start,
                                (long long)lock->fl.fl_end);
 
@@ -863,12 +865,12 @@ nlmsvc_grant_blocked(struct nlm_block *block)
        /* vfs_lock_file() can mangle fl_start and fl_end, but we need
         * them unchanged for the GRANT_MSG
         */
-       lock->fl.fl_flags |= FL_SLEEP;
+       lock->fl.c.flc_flags |= FL_SLEEP;
        fl_start = lock->fl.fl_start;
        fl_end = lock->fl.fl_end;
        mode = lock_to_openmode(&lock->fl);
        error = vfs_lock_file(file->f_file[mode], F_SETLK, &lock->fl, NULL);
-       lock->fl.fl_flags &= ~FL_SLEEP;
+       lock->fl.c.flc_flags &= ~FL_SLEEP;
        lock->fl.fl_start = fl_start;
        lock->fl.fl_end = fl_end;
 
@@ -993,8 +995,8 @@ nlmsvc_grant_reply(struct nlm_cookie *cookie, __be32 status)
                /* Client doesn't want it, just unlock it */
                nlmsvc_unlink_block(block);
                fl = &block->b_call->a_args.lock.fl;
-               fl->fl_type = F_UNLCK;
-               error = vfs_lock_file(fl->fl_file, F_SETLK, fl, NULL);
+               fl->c.flc_type = F_UNLCK;
+               error = vfs_lock_file(fl->c.flc_file, F_SETLK, fl, NULL);
                if (error)
                        pr_warn("lockd: unable to unlock lock rejected by client!\n");
                break;
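
nlmsvc_lock() and nlmsvc_grant_blocked() above repeatedly toggle FL_SLEEP around vfs_lock_file(). The convention, restated in isolation (a schematic sketch using the names already in this file; example_try_lock itself is illustrative, not new API):

	static int example_try_lock(struct nlm_file *file, struct nlm_lock *lock,
				    bool wait)
	{
		int mode = lock_to_openmode(&lock->fl);
		int error;

		if (wait)	/* ask the VFS to queue a blocking request */
			lock->fl.c.flc_flags |= FL_SLEEP;
		error = vfs_lock_file(file->f_file[mode], F_SETLK,
				      &lock->fl, NULL);
		/* may be FILE_LOCK_DEFERRED when FL_SLEEP was set; the flag
		 * is always cleared again so the cached lock stays sane */
		lock->fl.c.flc_flags &= ~FL_SLEEP;
		return error;
	}
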
index 32784f508c8106313a4b1e7728b56e22bcd857b7..a03220e66ce02fe9a7b2a5fcfb56293df97e6f24 100644
@@ -77,12 +77,12 @@ nlmsvc_retrieve_args(struct svc_rqst *rqstp, struct nlm_args *argp,
 
                /* Set up the missing parts of the file_lock structure */
                mode = lock_to_openmode(&lock->fl);
-               lock->fl.fl_flags = FL_POSIX;
-               lock->fl.fl_file  = file->f_file[mode];
-               lock->fl.fl_pid = current->tgid;
+               lock->fl.c.flc_flags = FL_POSIX;
+               lock->fl.c.flc_file  = file->f_file[mode];
+               lock->fl.c.flc_pid = current->tgid;
                lock->fl.fl_lmops = &nlmsvc_lock_operations;
                nlmsvc_locks_init_private(&lock->fl, host, (pid_t)lock->svid);
-               if (!lock->fl.fl_owner) {
+               if (!lock->fl.c.flc_owner) {
                        /* lockowner allocation has failed */
                        nlmsvc_release_host(host);
                        return nlm_lck_denied_nolocks;
@@ -127,7 +127,7 @@ __nlmsvc_proc_test(struct svc_rqst *rqstp, struct nlm_res *resp)
        if ((resp->status = nlmsvc_retrieve_args(rqstp, argp, &host, &file)))
        return resp->status == nlm_drop_reply ? rpc_drop_reply : rpc_success;
 
-       test_owner = argp->lock.fl.fl_owner;
+       test_owner = argp->lock.fl.c.flc_owner;
 
        /* Now check for conflicting locks */
        resp->status = cast_status(nlmsvc_testlock(rqstp, file, host, &argp->lock, &resp->lock, &resp->cookie));
index e3b6229e7ae5cdd807198a47bbb8e8546a6ff3ba..9103896164f6886eec5adf65a55f4c29433dcb29 100644
@@ -73,7 +73,7 @@ static inline unsigned int file_hash(struct nfs_fh *f)
 
 int lock_to_openmode(struct file_lock *lock)
 {
-       return (lock->fl_type == F_WRLCK) ? O_WRONLY : O_RDONLY;
+       return lock_is_write(lock) ? O_WRONLY : O_RDONLY;
 }
 
 /*
@@ -181,18 +181,18 @@ static int nlm_unlock_files(struct nlm_file *file, const struct file_lock *fl)
        struct file_lock lock;
 
        locks_init_lock(&lock);
-       lock.fl_type  = F_UNLCK;
+       lock.c.flc_type  = F_UNLCK;
        lock.fl_start = 0;
        lock.fl_end   = OFFSET_MAX;
-       lock.fl_owner = fl->fl_owner;
-       lock.fl_pid   = fl->fl_pid;
-       lock.fl_flags = FL_POSIX;
+       lock.c.flc_owner = fl->c.flc_owner;
+       lock.c.flc_pid   = fl->c.flc_pid;
+       lock.c.flc_flags = FL_POSIX;
 
-       lock.fl_file = file->f_file[O_RDONLY];
-       if (lock.fl_file && vfs_lock_file(lock.fl_file, F_SETLK, &lock, NULL))
+       lock.c.flc_file = file->f_file[O_RDONLY];
+       if (lock.c.flc_file && vfs_lock_file(lock.c.flc_file, F_SETLK, &lock, NULL))
                goto out_err;
-       lock.fl_file = file->f_file[O_WRONLY];
-       if (lock.fl_file && vfs_lock_file(lock.fl_file, F_SETLK, &lock, NULL))
+       lock.c.flc_file = file->f_file[O_WRONLY];
+       if (lock.c.flc_file && vfs_lock_file(lock.c.flc_file, F_SETLK, &lock, NULL))
                goto out_err;
        return 0;
 out_err:
@@ -218,14 +218,14 @@ nlm_traverse_locks(struct nlm_host *host, struct nlm_file *file,
 again:
        file->f_locks = 0;
        spin_lock(&flctx->flc_lock);
-       list_for_each_entry(fl, &flctx->flc_posix, fl_list) {
+       for_each_file_lock(fl, &flctx->flc_posix) {
                if (fl->fl_lmops != &nlmsvc_lock_operations)
                        continue;
 
                /* update current lock count */
                file->f_locks++;
 
-               lockhost = ((struct nlm_lockowner *)fl->fl_owner)->host;
+               lockhost = ((struct nlm_lockowner *) fl->c.flc_owner)->host;
                if (match(lockhost, host)) {
 
                        spin_unlock(&flctx->flc_lock);
@@ -272,7 +272,7 @@ nlm_file_inuse(struct nlm_file *file)
 
        if (flctx && !list_empty_careful(&flctx->flc_posix)) {
                spin_lock(&flctx->flc_lock);
-               list_for_each_entry(fl, &flctx->flc_posix, fl_list) {
+               for_each_file_lock(fl, &flctx->flc_posix) {
                        if (fl->fl_lmops == &nlmsvc_lock_operations) {
                                spin_unlock(&flctx->flc_lock);
                                return 1;
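
nlm_traverse_locks() and nlm_file_inuse() above replace the open-coded list_for_each_entry(fl, ..., fl_list) with a for_each_file_lock() iterator. Given that the list linkage now lives at fl->c.flc_list, the macro is presumably a thin wrapper along these lines (a sketch, not the verbatim upstream definition):

	#define for_each_file_lock(_fl, _head) \
		list_for_each_entry(_fl, _head, c.flc_list)
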
index 2fb5748dae0c808bfc55f1435cf6e79350357cd9..adfcce2bf11ba7081d5748f3978c43f13a922e49 100644
@@ -88,8 +88,8 @@ svcxdr_decode_lock(struct xdr_stream *xdr, struct nlm_lock *lock)
                return false;
 
        locks_init_lock(fl);
-       fl->fl_flags = FL_POSIX;
-       fl->fl_type  = F_RDLCK;
+       fl->c.flc_flags = FL_POSIX;
+       fl->c.flc_type  = F_RDLCK;
        end = start + len - 1;
        fl->fl_start = s32_to_loff_t(start);
        if (len == 0 || end < 0)
@@ -107,7 +107,7 @@ svcxdr_encode_holder(struct xdr_stream *xdr, const struct nlm_lock *lock)
        s32 start, len;
 
        /* exclusive */
-       if (xdr_stream_encode_bool(xdr, fl->fl_type != F_RDLCK) < 0)
+       if (xdr_stream_encode_bool(xdr, fl->c.flc_type != F_RDLCK) < 0)
                return false;
        if (xdr_stream_encode_u32(xdr, lock->svid) < 0)
                return false;
@@ -164,7 +164,7 @@ nlmsvc_decode_testargs(struct svc_rqst *rqstp, struct xdr_stream *xdr)
        if (!svcxdr_decode_lock(xdr, &argp->lock))
                return false;
        if (exclusive)
-               argp->lock.fl.fl_type = F_WRLCK;
+               argp->lock.fl.c.flc_type = F_WRLCK;
 
        return true;
 }
@@ -184,7 +184,7 @@ nlmsvc_decode_lockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr)
        if (!svcxdr_decode_lock(xdr, &argp->lock))
                return false;
        if (exclusive)
-               argp->lock.fl.fl_type = F_WRLCK;
+               argp->lock.fl.c.flc_type = F_WRLCK;
        if (xdr_stream_decode_bool(xdr, &argp->reclaim) < 0)
                return false;
        if (xdr_stream_decode_u32(xdr, &argp->state) < 0)
@@ -209,7 +209,7 @@ nlmsvc_decode_cancargs(struct svc_rqst *rqstp, struct xdr_stream *xdr)
        if (!svcxdr_decode_lock(xdr, &argp->lock))
                return false;
        if (exclusive)
-               argp->lock.fl.fl_type = F_WRLCK;
+               argp->lock.fl.c.flc_type = F_WRLCK;
 
        return true;
 }
@@ -223,7 +223,7 @@ nlmsvc_decode_unlockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr)
                return false;
        if (!svcxdr_decode_lock(xdr, &argp->lock))
                return false;
-       argp->lock.fl.fl_type = F_UNLCK;
+       argp->lock.fl.c.flc_type = F_UNLCK;
 
        return true;
 }
index 5fcbf30cd275928d1d364cf63e4f39fb61cb2a5a..3d28b9c3ed1509a262cff891f0e1630791b9fcba 100644
@@ -89,8 +89,8 @@ svcxdr_decode_lock(struct xdr_stream *xdr, struct nlm_lock *lock)
                return false;
 
        locks_init_lock(fl);
-       fl->fl_flags = FL_POSIX;
-       fl->fl_type  = F_RDLCK;
+       fl->c.flc_flags = FL_POSIX;
+       fl->c.flc_type  = F_RDLCK;
        nlm4svc_set_file_lock_range(fl, lock->lock_start, lock->lock_len);
        return true;
 }
@@ -102,7 +102,7 @@ svcxdr_encode_holder(struct xdr_stream *xdr, const struct nlm_lock *lock)
        s64 start, len;
 
        /* exclusive */
-       if (xdr_stream_encode_bool(xdr, fl->fl_type != F_RDLCK) < 0)
+       if (xdr_stream_encode_bool(xdr, fl->c.flc_type != F_RDLCK) < 0)
                return false;
        if (xdr_stream_encode_u32(xdr, lock->svid) < 0)
                return false;
@@ -159,7 +159,7 @@ nlm4svc_decode_testargs(struct svc_rqst *rqstp, struct xdr_stream *xdr)
        if (!svcxdr_decode_lock(xdr, &argp->lock))
                return false;
        if (exclusive)
-               argp->lock.fl.fl_type = F_WRLCK;
+               argp->lock.fl.c.flc_type = F_WRLCK;
 
        return true;
 }
@@ -179,7 +179,7 @@ nlm4svc_decode_lockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr)
        if (!svcxdr_decode_lock(xdr, &argp->lock))
                return false;
        if (exclusive)
-               argp->lock.fl.fl_type = F_WRLCK;
+               argp->lock.fl.c.flc_type = F_WRLCK;
        if (xdr_stream_decode_bool(xdr, &argp->reclaim) < 0)
                return false;
        if (xdr_stream_decode_u32(xdr, &argp->state) < 0)
@@ -204,7 +204,7 @@ nlm4svc_decode_cancargs(struct svc_rqst *rqstp, struct xdr_stream *xdr)
        if (!svcxdr_decode_lock(xdr, &argp->lock))
                return false;
        if (exclusive)
-               argp->lock.fl.fl_type = F_WRLCK;
+               argp->lock.fl.c.flc_type = F_WRLCK;
 
        return true;
 }
@@ -218,7 +218,7 @@ nlm4svc_decode_unlockargs(struct svc_rqst *rqstp, struct xdr_stream *xdr)
                return false;
        if (!svcxdr_decode_lock(xdr, &argp->lock))
                return false;
-       argp->lock.fl.fl_type = F_UNLCK;
+       argp->lock.fl.c.flc_type = F_UNLCK;
 
        return true;
 }
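
All of the NLM decode paths above share one shape: svcxdr_decode_lock() initialises a shared POSIX lock, and each per-procedure decoder then either upgrades the type from the "exclusive" boolean (TEST/LOCK/CANCEL) or forces F_UNLCK (UNLOCK). The pattern, condensed into a hedged helper (the example_ name is illustrative and not in the patch):

	static bool example_decode_nlm_lock(struct xdr_stream *xdr,
					    struct nlm_lock *lock, u32 exclusive)
	{
		if (!svcxdr_decode_lock(xdr, lock))	/* sets FL_POSIX + F_RDLCK */
			return false;
		if (exclusive)
			lock->fl.c.flc_type = F_WRLCK;
		return true;
	}
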
index cc7c117ee19294410b02333638ba5f810b0cdadd..90c8746874dedbbb71e14b8269bbbcd216fd1bf7 100644
@@ -48,7 +48,6 @@
  * children.
  *
  */
-
 #include <linux/capability.h>
 #include <linux/file.h>
 #include <linux/fdtable.h>
 
 #include <linux/uaccess.h>
 
-#define IS_POSIX(fl)   (fl->fl_flags & FL_POSIX)
-#define IS_FLOCK(fl)   (fl->fl_flags & FL_FLOCK)
-#define IS_LEASE(fl)   (fl->fl_flags & (FL_LEASE|FL_DELEG|FL_LAYOUT))
-#define IS_OFDLCK(fl)  (fl->fl_flags & FL_OFDLCK)
-#define IS_REMOTELCK(fl)       (fl->fl_pid <= 0)
+static struct file_lock *file_lock(struct file_lock_core *flc)
+{
+       return container_of(flc, struct file_lock, c);
+}
+
+static struct file_lease *file_lease(struct file_lock_core *flc)
+{
+       return container_of(flc, struct file_lease, c);
+}
 
-static bool lease_breaking(struct file_lock *fl)
+static bool lease_breaking(struct file_lease *fl)
 {
-       return fl->fl_flags & (FL_UNLOCK_PENDING | FL_DOWNGRADE_PENDING);
+       return fl->c.flc_flags & (FL_UNLOCK_PENDING | FL_DOWNGRADE_PENDING);
 }
 
-static int target_leasetype(struct file_lock *fl)
+static int target_leasetype(struct file_lease *fl)
 {
-       if (fl->fl_flags & FL_UNLOCK_PENDING)
+       if (fl->c.flc_flags & FL_UNLOCK_PENDING)
                return F_UNLCK;
-       if (fl->fl_flags & FL_DOWNGRADE_PENDING)
+       if (fl->c.flc_flags & FL_DOWNGRADE_PENDING)
                return F_RDLCK;
-       return fl->fl_type;
+       return fl->c.flc_type;
 }
 
 static int leases_enable = 1;
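
The file_lock() and file_lease() helpers above recover the containing object from an embedded struct file_lock_core via container_of(); which container applies is encoded in flc_flags. A usage sketch mirroring the dispatch in locks_dispose_list() further down (example_free is an illustrative name):

	static void example_free(struct file_lock_core *flc)
	{
		/* Leases, delegations and layouts live in struct file_lease;
		 * everything else is a byte-range struct file_lock. */
		if (flc->flc_flags & (FL_LEASE | FL_DELEG | FL_LAYOUT))
			locks_free_lease(file_lease(flc));
		else
			locks_free_lock(file_lock(flc));
	}
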
@@ -168,6 +171,7 @@ static DEFINE_SPINLOCK(blocked_lock_lock);
 
 static struct kmem_cache *flctx_cache __ro_after_init;
 static struct kmem_cache *filelock_cache __ro_after_init;
+static struct kmem_cache *filelease_cache __ro_after_init;
 
 static struct file_lock_context *
 locks_get_lock_context(struct inode *inode, int type)
@@ -204,11 +208,12 @@ out:
 static void
 locks_dump_ctx_list(struct list_head *list, char *list_type)
 {
-       struct file_lock *fl;
+       struct file_lock_core *flc;
 
-       list_for_each_entry(fl, list, fl_list) {
-               pr_warn("%s: fl_owner=%p fl_flags=0x%x fl_type=0x%x fl_pid=%u\n", list_type, fl->fl_owner, fl->fl_flags, fl->fl_type, fl->fl_pid);
-       }
+       list_for_each_entry(flc, list, flc_list)
+               pr_warn("%s: fl_owner=%p fl_flags=0x%x fl_type=0x%x fl_pid=%u\n",
+                       list_type, flc->flc_owner, flc->flc_flags,
+                       flc->flc_type, flc->flc_pid);
 }
 
 static void
@@ -229,19 +234,19 @@ locks_check_ctx_lists(struct inode *inode)
 }
 
 static void
-locks_check_ctx_file_list(struct file *filp, struct list_head *list,
-                               char *list_type)
+locks_check_ctx_file_list(struct file *filp, struct list_head *list, char *list_type)
 {
-       struct file_lock *fl;
+       struct file_lock_core *flc;
        struct inode *inode = file_inode(filp);
 
-       list_for_each_entry(fl, list, fl_list)
-               if (fl->fl_file == filp)
+       list_for_each_entry(flc, list, flc_list)
+               if (flc->flc_file == filp)
                        pr_warn("Leaked %s lock on dev=0x%x:0x%x ino=0x%lx "
                                " fl_owner=%p fl_flags=0x%x fl_type=0x%x fl_pid=%u\n",
                                list_type, MAJOR(inode->i_sb->s_dev),
                                MINOR(inode->i_sb->s_dev), inode->i_ino,
-                               fl->fl_owner, fl->fl_flags, fl->fl_type, fl->fl_pid);
+                               flc->flc_owner, flc->flc_flags,
+                               flc->flc_type, flc->flc_pid);
 }
 
 void
@@ -255,13 +260,13 @@ locks_free_lock_context(struct inode *inode)
        }
 }
 
-static void locks_init_lock_heads(struct file_lock *fl)
+static void locks_init_lock_heads(struct file_lock_core *flc)
 {
-       INIT_HLIST_NODE(&fl->fl_link);
-       INIT_LIST_HEAD(&fl->fl_list);
-       INIT_LIST_HEAD(&fl->fl_blocked_requests);
-       INIT_LIST_HEAD(&fl->fl_blocked_member);
-       init_waitqueue_head(&fl->fl_wait);
+       INIT_HLIST_NODE(&flc->flc_link);
+       INIT_LIST_HEAD(&flc->flc_list);
+       INIT_LIST_HEAD(&flc->flc_blocked_requests);
+       INIT_LIST_HEAD(&flc->flc_blocked_member);
+       init_waitqueue_head(&flc->flc_wait);
 }
 
 /* Allocate an empty lock structure. */
@@ -270,19 +275,33 @@ struct file_lock *locks_alloc_lock(void)
        struct file_lock *fl = kmem_cache_zalloc(filelock_cache, GFP_KERNEL);
 
        if (fl)
-               locks_init_lock_heads(fl);
+               locks_init_lock_heads(&fl->c);
 
        return fl;
 }
 EXPORT_SYMBOL_GPL(locks_alloc_lock);
 
+/* Allocate an empty lock structure. */
+struct file_lease *locks_alloc_lease(void)
+{
+       struct file_lease *fl = kmem_cache_zalloc(filelease_cache, GFP_KERNEL);
+
+       if (fl)
+               locks_init_lock_heads(&fl->c);
+
+       return fl;
+}
+EXPORT_SYMBOL_GPL(locks_alloc_lease);
+
 void locks_release_private(struct file_lock *fl)
 {
-       BUG_ON(waitqueue_active(&fl->fl_wait));
-       BUG_ON(!list_empty(&fl->fl_list));
-       BUG_ON(!list_empty(&fl->fl_blocked_requests));
-       BUG_ON(!list_empty(&fl->fl_blocked_member));
-       BUG_ON(!hlist_unhashed(&fl->fl_link));
+       struct file_lock_core *flc = &fl->c;
+
+       BUG_ON(waitqueue_active(&flc->flc_wait));
+       BUG_ON(!list_empty(&flc->flc_list));
+       BUG_ON(!list_empty(&flc->flc_blocked_requests));
+       BUG_ON(!list_empty(&flc->flc_blocked_member));
+       BUG_ON(!hlist_unhashed(&flc->flc_link));
 
        if (fl->fl_ops) {
                if (fl->fl_ops->fl_release_private)
@@ -292,8 +311,8 @@ void locks_release_private(struct file_lock *fl)
 
        if (fl->fl_lmops) {
                if (fl->fl_lmops->lm_put_owner) {
-                       fl->fl_lmops->lm_put_owner(fl->fl_owner);
-                       fl->fl_owner = NULL;
+                       fl->fl_lmops->lm_put_owner(flc->flc_owner);
+                       flc->flc_owner = NULL;
                }
                fl->fl_lmops = NULL;
        }
@@ -309,16 +328,15 @@ EXPORT_SYMBOL_GPL(locks_release_private);
  *   %true: @owner has at least one blocker
  *   %false: @owner has no blockers
  */
-bool locks_owner_has_blockers(struct file_lock_context *flctx,
-               fl_owner_t owner)
+bool locks_owner_has_blockers(struct file_lock_context *flctx, fl_owner_t owner)
 {
-       struct file_lock *fl;
+       struct file_lock_core *flc;
 
        spin_lock(&flctx->flc_lock);
-       list_for_each_entry(fl, &flctx->flc_posix, fl_list) {
-               if (fl->fl_owner != owner)
+       list_for_each_entry(flc, &flctx->flc_posix, flc_list) {
+               if (flc->flc_owner != owner)
                        continue;
-               if (!list_empty(&fl->fl_blocked_requests)) {
+               if (!list_empty(&flc->flc_blocked_requests)) {
                        spin_unlock(&flctx->flc_lock);
                        return true;
                }
@@ -336,35 +354,52 @@ void locks_free_lock(struct file_lock *fl)
 }
 EXPORT_SYMBOL(locks_free_lock);
 
+/* Free a lease which is not in use. */
+void locks_free_lease(struct file_lease *fl)
+{
+       kmem_cache_free(filelease_cache, fl);
+}
+EXPORT_SYMBOL(locks_free_lease);
+
 static void
 locks_dispose_list(struct list_head *dispose)
 {
-       struct file_lock *fl;
+       struct file_lock_core *flc;
 
        while (!list_empty(dispose)) {
-               fl = list_first_entry(dispose, struct file_lock, fl_list);
-               list_del_init(&fl->fl_list);
-               locks_free_lock(fl);
+               flc = list_first_entry(dispose, struct file_lock_core, flc_list);
+               list_del_init(&flc->flc_list);
+               if (flc->flc_flags & (FL_LEASE|FL_DELEG|FL_LAYOUT))
+                       locks_free_lease(file_lease(flc));
+               else
+                       locks_free_lock(file_lock(flc));
        }
 }
 
 void locks_init_lock(struct file_lock *fl)
 {
        memset(fl, 0, sizeof(struct file_lock));
-       locks_init_lock_heads(fl);
+       locks_init_lock_heads(&fl->c);
 }
 EXPORT_SYMBOL(locks_init_lock);
 
+void locks_init_lease(struct file_lease *fl)
+{
+       memset(fl, 0, sizeof(*fl));
+       locks_init_lock_heads(&fl->c);
+}
+EXPORT_SYMBOL(locks_init_lease);
+
 /*
  * Initialize a new lock from an existing file_lock structure.
  */
 void locks_copy_conflock(struct file_lock *new, struct file_lock *fl)
 {
-       new->fl_owner = fl->fl_owner;
-       new->fl_pid = fl->fl_pid;
-       new->fl_file = NULL;
-       new->fl_flags = fl->fl_flags;
-       new->fl_type = fl->fl_type;
+       new->c.flc_owner = fl->c.flc_owner;
+       new->c.flc_pid = fl->c.flc_pid;
+       new->c.flc_file = NULL;
+       new->c.flc_flags = fl->c.flc_flags;
+       new->c.flc_type = fl->c.flc_type;
        new->fl_start = fl->fl_start;
        new->fl_end = fl->fl_end;
        new->fl_lmops = fl->fl_lmops;
@@ -372,7 +407,7 @@ void locks_copy_conflock(struct file_lock *new, struct file_lock *fl)
 
        if (fl->fl_lmops) {
                if (fl->fl_lmops->lm_get_owner)
-                       fl->fl_lmops->lm_get_owner(fl->fl_owner);
+                       fl->fl_lmops->lm_get_owner(fl->c.flc_owner);
        }
 }
 EXPORT_SYMBOL(locks_copy_conflock);
@@ -384,7 +419,7 @@ void locks_copy_lock(struct file_lock *new, struct file_lock *fl)
 
        locks_copy_conflock(new, fl);
 
-       new->fl_file = fl->fl_file;
+       new->c.flc_file = fl->c.flc_file;
        new->fl_ops = fl->fl_ops;
 
        if (fl->fl_ops) {
@@ -400,15 +435,17 @@ static void locks_move_blocks(struct file_lock *new, struct file_lock *fl)
 
        /*
         * As ctx->flc_lock is held, new requests cannot be added to
-        * ->fl_blocked_requests, so we don't need a lock to check if it
+        * ->flc_blocked_requests, so we don't need a lock to check if it
         * is empty.
         */
-       if (list_empty(&fl->fl_blocked_requests))
+       if (list_empty(&fl->c.flc_blocked_requests))
                return;
        spin_lock(&blocked_lock_lock);
-       list_splice_init(&fl->fl_blocked_requests, &new->fl_blocked_requests);
-       list_for_each_entry(f, &new->fl_blocked_requests, fl_blocked_member)
-               f->fl_blocker = new;
+       list_splice_init(&fl->c.flc_blocked_requests,
+                        &new->c.flc_blocked_requests);
+       list_for_each_entry(f, &new->c.flc_blocked_requests,
+                           c.flc_blocked_member)
+               f->c.flc_blocker = &new->c;
        spin_unlock(&blocked_lock_lock);
 }
 
@@ -429,21 +466,21 @@ static void flock_make_lock(struct file *filp, struct file_lock *fl, int type)
 {
        locks_init_lock(fl);
 
-       fl->fl_file = filp;
-       fl->fl_owner = filp;
-       fl->fl_pid = current->tgid;
-       fl->fl_flags = FL_FLOCK;
-       fl->fl_type = type;
+       fl->c.flc_file = filp;
+       fl->c.flc_owner = filp;
+       fl->c.flc_pid = current->tgid;
+       fl->c.flc_flags = FL_FLOCK;
+       fl->c.flc_type = type;
        fl->fl_end = OFFSET_MAX;
 }
 
-static int assign_type(struct file_lock *fl, int type)
+static int assign_type(struct file_lock_core *flc, int type)
 {
        switch (type) {
        case F_RDLCK:
        case F_WRLCK:
        case F_UNLCK:
-               fl->fl_type = type;
+               flc->flc_type = type;
                break;
        default:
                return -EINVAL;
@@ -488,14 +525,14 @@ static int flock64_to_posix_lock(struct file *filp, struct file_lock *fl,
        } else
                fl->fl_end = OFFSET_MAX;
 
-       fl->fl_owner = current->files;
-       fl->fl_pid = current->tgid;
-       fl->fl_file = filp;
-       fl->fl_flags = FL_POSIX;
+       fl->c.flc_owner = current->files;
+       fl->c.flc_pid = current->tgid;
+       fl->c.flc_file = filp;
+       fl->c.flc_flags = FL_POSIX;
        fl->fl_ops = NULL;
        fl->fl_lmops = NULL;
 
-       return assign_type(fl, l->l_type);
+       return assign_type(&fl->c, l->l_type);
 }
 
 /* Verify a "struct flock" and copy it to a "struct file_lock" as a POSIX
@@ -516,16 +553,16 @@ static int flock_to_posix_lock(struct file *filp, struct file_lock *fl,
 
 /* default lease lock manager operations */
 static bool
-lease_break_callback(struct file_lock *fl)
+lease_break_callback(struct file_lease *fl)
 {
        kill_fasync(&fl->fl_fasync, SIGIO, POLL_MSG);
        return false;
 }
 
 static void
-lease_setup(struct file_lock *fl, void **priv)
+lease_setup(struct file_lease *fl, void **priv)
 {
-       struct file *filp = fl->fl_file;
+       struct file *filp = fl->c.flc_file;
        struct fasync_struct *fa = *priv;
 
        /*
@@ -539,7 +576,7 @@ lease_setup(struct file_lock *fl, void **priv)
        __f_setown(filp, task_pid(current), PIDTYPE_TGID, 0);
 }
 
-static const struct lock_manager_operations lease_manager_ops = {
+static const struct lease_manager_operations lease_manager_ops = {
        .lm_break = lease_break_callback,
        .lm_change = lease_modify,
        .lm_setup = lease_setup,
@@ -548,27 +585,24 @@ static const struct lock_manager_operations lease_manager_ops = {
 /*
  * Initialize a lease, use the default lock manager operations
  */
-static int lease_init(struct file *filp, int type, struct file_lock *fl)
+static int lease_init(struct file *filp, int type, struct file_lease *fl)
 {
-       if (assign_type(fl, type) != 0)
+       if (assign_type(&fl->c, type) != 0)
                return -EINVAL;
 
-       fl->fl_owner = filp;
-       fl->fl_pid = current->tgid;
+       fl->c.flc_owner = filp;
+       fl->c.flc_pid = current->tgid;
 
-       fl->fl_file = filp;
-       fl->fl_flags = FL_LEASE;
-       fl->fl_start = 0;
-       fl->fl_end = OFFSET_MAX;
-       fl->fl_ops = NULL;
+       fl->c.flc_file = filp;
+       fl->c.flc_flags = FL_LEASE;
        fl->fl_lmops = &lease_manager_ops;
        return 0;
 }
 
 /* Allocate a file_lock initialised to this type of lease */
-static struct file_lock *lease_alloc(struct file *filp, int type)
+static struct file_lease *lease_alloc(struct file *filp, int type)
 {
-       struct file_lock *fl = locks_alloc_lock();
+       struct file_lease *fl = locks_alloc_lease();
        int error = -ENOMEM;
 
        if (fl == NULL)
@@ -576,7 +610,7 @@ static struct file_lock *lease_alloc(struct file *filp, int type)
 
        error = lease_init(filp, type, fl);
        if (error) {
-               locks_free_lock(fl);
+               locks_free_lease(fl);
                return ERR_PTR(error);
        }
        return fl;
@@ -593,26 +627,26 @@ static inline int locks_overlap(struct file_lock *fl1, struct file_lock *fl2)
 /*
  * Check whether two locks have the same owner.
  */
-static int posix_same_owner(struct file_lock *fl1, struct file_lock *fl2)
+static int posix_same_owner(struct file_lock_core *fl1, struct file_lock_core *fl2)
 {
-       return fl1->fl_owner == fl2->fl_owner;
+       return fl1->flc_owner == fl2->flc_owner;
 }
 
 /* Must be called with the flc_lock held! */
-static void locks_insert_global_locks(struct file_lock *fl)
+static void locks_insert_global_locks(struct file_lock_core *flc)
 {
        struct file_lock_list_struct *fll = this_cpu_ptr(&file_lock_list);
 
        percpu_rwsem_assert_held(&file_rwsem);
 
        spin_lock(&fll->lock);
-       fl->fl_link_cpu = smp_processor_id();
-       hlist_add_head(&fl->fl_link, &fll->hlist);
+       flc->flc_link_cpu = smp_processor_id();
+       hlist_add_head(&flc->flc_link, &fll->hlist);
        spin_unlock(&fll->lock);
 }
 
 /* Must be called with the flc_lock held! */
-static void locks_delete_global_locks(struct file_lock *fl)
+static void locks_delete_global_locks(struct file_lock_core *flc)
 {
        struct file_lock_list_struct *fll;
 
@@ -623,33 +657,33 @@ static void locks_delete_global_locks(struct file_lock *fl)
         * is done while holding the flc_lock, and new insertions into the list
         * also require that it be held.
         */
-       if (hlist_unhashed(&fl->fl_link))
+       if (hlist_unhashed(&flc->flc_link))
                return;
 
-       fll = per_cpu_ptr(&file_lock_list, fl->fl_link_cpu);
+       fll = per_cpu_ptr(&file_lock_list, flc->flc_link_cpu);
        spin_lock(&fll->lock);
-       hlist_del_init(&fl->fl_link);
+       hlist_del_init(&flc->flc_link);
        spin_unlock(&fll->lock);
 }
 
 static unsigned long
-posix_owner_key(struct file_lock *fl)
+posix_owner_key(struct file_lock_core *flc)
 {
-       return (unsigned long)fl->fl_owner;
+       return (unsigned long) flc->flc_owner;
 }
 
-static void locks_insert_global_blocked(struct file_lock *waiter)
+static void locks_insert_global_blocked(struct file_lock_core *waiter)
 {
        lockdep_assert_held(&blocked_lock_lock);
 
-       hash_add(blocked_hash, &waiter->fl_link, posix_owner_key(waiter));
+       hash_add(blocked_hash, &waiter->flc_link, posix_owner_key(waiter));
 }
 
-static void locks_delete_global_blocked(struct file_lock *waiter)
+static void locks_delete_global_blocked(struct file_lock_core *waiter)
 {
        lockdep_assert_held(&blocked_lock_lock);
 
-       hash_del(&waiter->fl_link);
+       hash_del(&waiter->flc_link);
 }
 
 /* Remove waiter from blocker's block list.
@@ -657,41 +691,39 @@ static void locks_delete_global_blocked(struct file_lock *waiter)
  *
  * Must be called with blocked_lock_lock held.
  */
-static void __locks_delete_block(struct file_lock *waiter)
+static void __locks_unlink_block(struct file_lock_core *waiter)
 {
        locks_delete_global_blocked(waiter);
-       list_del_init(&waiter->fl_blocked_member);
+       list_del_init(&waiter->flc_blocked_member);
 }
 
-static void __locks_wake_up_blocks(struct file_lock *blocker)
+static void __locks_wake_up_blocks(struct file_lock_core *blocker)
 {
-       while (!list_empty(&blocker->fl_blocked_requests)) {
-               struct file_lock *waiter;
+       while (!list_empty(&blocker->flc_blocked_requests)) {
+               struct file_lock_core *waiter;
+               struct file_lock *fl;
 
-               waiter = list_first_entry(&blocker->fl_blocked_requests,
-                                         struct file_lock, fl_blocked_member);
-               __locks_delete_block(waiter);
-               if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
-                       waiter->fl_lmops->lm_notify(waiter);
+               waiter = list_first_entry(&blocker->flc_blocked_requests,
+                                         struct file_lock_core, flc_blocked_member);
+
+               fl = file_lock(waiter);
+               __locks_unlink_block(waiter);
+               if ((waiter->flc_flags & (FL_POSIX | FL_FLOCK)) &&
+                   fl->fl_lmops && fl->fl_lmops->lm_notify)
+                       fl->fl_lmops->lm_notify(fl);
                else
-                       wake_up(&waiter->fl_wait);
+                       locks_wake_up(fl);
 
                /*
-                * The setting of fl_blocker to NULL marks the "done"
+                * The setting of flc_blocker to NULL marks the "done"
                 * point in deleting a block. Paired with acquire at the top
                 * of locks_delete_block().
                 */
-               smp_store_release(&waiter->fl_blocker, NULL);
+               smp_store_release(&waiter->flc_blocker, NULL);
        }
 }
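
Note that __locks_wake_up_blocks() above now wakes lock-manager waiters through lm_notify() but everything else through locks_wake_up(fl), rather than a bare wake_up(&waiter->fl_wait). Presumably a one-line inline over the core's waitqueue, along these lines (sketch):

	static inline void locks_wake_up(struct file_lock *fl)
	{
		wake_up(&fl->c.flc_wait);
	}
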
 
-/**
- *     locks_delete_block - stop waiting for a file lock
- *     @waiter: the lock which was waiting
- *
- *     lockd/nfsd need to disconnect the lock while working on it.
- */
-int locks_delete_block(struct file_lock *waiter)
+static int __locks_delete_block(struct file_lock_core *waiter)
 {
        int status = -ENOENT;
 
@@ -716,24 +748,35 @@ int locks_delete_block(struct file_lock *waiter)
         * no new locks can be inserted into its fl_blocked_requests list, and
         * can avoid doing anything further if the list is empty.
         */
-       if (!smp_load_acquire(&waiter->fl_blocker) &&
-           list_empty(&waiter->fl_blocked_requests))
+       if (!smp_load_acquire(&waiter->flc_blocker) &&
+           list_empty(&waiter->flc_blocked_requests))
                return status;
 
        spin_lock(&blocked_lock_lock);
-       if (waiter->fl_blocker)
+       if (waiter->flc_blocker)
                status = 0;
        __locks_wake_up_blocks(waiter);
-       __locks_delete_block(waiter);
+       __locks_unlink_block(waiter);
 
        /*
         * The setting of fl_blocker to NULL marks the "done" point in deleting
         * a block. Paired with acquire at the top of this function.
         */
-       smp_store_release(&waiter->fl_blocker, NULL);
+       smp_store_release(&waiter->flc_blocker, NULL);
        spin_unlock(&blocked_lock_lock);
        return status;
 }
+
+/**
+ *     locks_delete_block - stop waiting for a file lock
+ *     @waiter: the lock which was waiting
+ *
+ *     lockd/nfsd need to disconnect the lock while working on it.
+ */
+int locks_delete_block(struct file_lock *waiter)
+{
+       return __locks_delete_block(&waiter->c);
+}
 EXPORT_SYMBOL(locks_delete_block);
 
 /* Insert waiter into blocker's block list.
@@ -751,26 +794,28 @@ EXPORT_SYMBOL(locks_delete_block);
  * waiters, and add beneath any waiter that blocks the new waiter.
  * Thus wakeups don't happen until needed.
  */
-static void __locks_insert_block(struct file_lock *blocker,
-                                struct file_lock *waiter,
-                                bool conflict(struct file_lock *,
-                                              struct file_lock *))
+static void __locks_insert_block(struct file_lock_core *blocker,
+                                struct file_lock_core *waiter,
+                                bool conflict(struct file_lock_core *,
+                                              struct file_lock_core *))
 {
-       struct file_lock *fl;
-       BUG_ON(!list_empty(&waiter->fl_blocked_member));
+       struct file_lock_core *flc;
 
+       BUG_ON(!list_empty(&waiter->flc_blocked_member));
 new_blocker:
-       list_for_each_entry(fl, &blocker->fl_blocked_requests, fl_blocked_member)
-               if (conflict(fl, waiter)) {
-                       blocker =  fl;
+       list_for_each_entry(flc, &blocker->flc_blocked_requests, flc_blocked_member)
+               if (conflict(flc, waiter)) {
+                       blocker = flc;
                        goto new_blocker;
                }
-       waiter->fl_blocker = blocker;
-       list_add_tail(&waiter->fl_blocked_member, &blocker->fl_blocked_requests);
-       if (IS_POSIX(blocker) && !IS_OFDLCK(blocker))
+       waiter->flc_blocker = blocker;
+       list_add_tail(&waiter->flc_blocked_member,
+                     &blocker->flc_blocked_requests);
+
+       if ((blocker->flc_flags & (FL_POSIX|FL_OFDLCK)) == FL_POSIX)
                locks_insert_global_blocked(waiter);
 
-       /* The requests in waiter->fl_blocked are known to conflict with
+       /* The requests in waiter->flc_blocked_requests are known to conflict with
         * waiter, but might not conflict with blocker, or the requests
         * and lock which block it.  So they all need to be woken.
         */
@@ -778,10 +823,10 @@ new_blocker:
 }
 
 /* Must be called with flc_lock held. */
-static void locks_insert_block(struct file_lock *blocker,
-                              struct file_lock *waiter,
-                              bool conflict(struct file_lock *,
-                                            struct file_lock *))
+static void locks_insert_block(struct file_lock_core *blocker,
+                              struct file_lock_core *waiter,
+                              bool conflict(struct file_lock_core *,
+                                            struct file_lock_core *))
 {
        spin_lock(&blocked_lock_lock);
        __locks_insert_block(blocker, waiter, conflict);
@@ -793,7 +838,7 @@ static void locks_insert_block(struct file_lock *blocker,
  *
  * Must be called with the inode->flc_lock held!
  */
-static void locks_wake_up_blocks(struct file_lock *blocker)
+static void locks_wake_up_blocks(struct file_lock_core *blocker)
 {
        /*
         * Avoid taking global lock if list is empty. This is safe since new
@@ -802,7 +847,7 @@ static void locks_wake_up_blocks(struct file_lock *blocker)
         * fl_blocked_requests list does not require the flc_lock, so we must
         * recheck list_empty() after acquiring the blocked_lock_lock.
         */
-       if (list_empty(&blocker->fl_blocked_requests))
+       if (list_empty(&blocker->flc_blocked_requests))
                return;
 
        spin_lock(&blocked_lock_lock);
@@ -811,39 +856,39 @@ static void locks_wake_up_blocks(struct file_lock *blocker)
 }
 
 static void
-locks_insert_lock_ctx(struct file_lock *fl, struct list_head *before)
+locks_insert_lock_ctx(struct file_lock_core *fl, struct list_head *before)
 {
-       list_add_tail(&fl->fl_list, before);
+       list_add_tail(&fl->flc_list, before);
        locks_insert_global_locks(fl);
 }
 
 static void
-locks_unlink_lock_ctx(struct file_lock *fl)
+locks_unlink_lock_ctx(struct file_lock_core *fl)
 {
        locks_delete_global_locks(fl);
-       list_del_init(&fl->fl_list);
+       list_del_init(&fl->flc_list);
        locks_wake_up_blocks(fl);
 }
 
 static void
-locks_delete_lock_ctx(struct file_lock *fl, struct list_head *dispose)
+locks_delete_lock_ctx(struct file_lock_core *fl, struct list_head *dispose)
 {
        locks_unlink_lock_ctx(fl);
        if (dispose)
-               list_add(&fl->fl_list, dispose);
+               list_add(&fl->flc_list, dispose);
        else
-               locks_free_lock(fl);
+               locks_free_lock(file_lock(fl));
 }
 
 /* Determine if lock sys_fl blocks lock caller_fl. Common functionality
  * checks for shared/exclusive status of overlapping locks.
  */
-static bool locks_conflict(struct file_lock *caller_fl,
-                          struct file_lock *sys_fl)
+static bool locks_conflict(struct file_lock_core *caller_flc,
+                          struct file_lock_core *sys_flc)
 {
-       if (sys_fl->fl_type == F_WRLCK)
+       if (sys_flc->flc_type == F_WRLCK)
                return true;
-       if (caller_fl->fl_type == F_WRLCK)
+       if (caller_flc->flc_type == F_WRLCK)
                return true;
        return false;
 }
@@ -851,20 +896,23 @@ static bool locks_conflict(struct file_lock *caller_fl,
 /* Determine if lock sys_fl blocks lock caller_fl. POSIX specific
  * checking before calling the locks_conflict().
  */
-static bool posix_locks_conflict(struct file_lock *caller_fl,
-                                struct file_lock *sys_fl)
+static bool posix_locks_conflict(struct file_lock_core *caller_flc,
+                                struct file_lock_core *sys_flc)
 {
+       struct file_lock *caller_fl = file_lock(caller_flc);
+       struct file_lock *sys_fl = file_lock(sys_flc);
+
        /* POSIX locks owned by the same process do not conflict with
         * each other.
         */
-       if (posix_same_owner(caller_fl, sys_fl))
+       if (posix_same_owner(caller_flc, sys_flc))
                return false;
 
        /* Check whether they overlap */
        if (!locks_overlap(caller_fl, sys_fl))
                return false;
 
-       return locks_conflict(caller_fl, sys_fl);
+       return locks_conflict(caller_flc, sys_flc);
 }
 
 /* Determine if lock sys_fl blocks lock caller_fl. Used on xx_GETLK
@@ -873,28 +921,31 @@ static bool posix_locks_conflict(struct file_lock *caller_fl,
 static bool posix_test_locks_conflict(struct file_lock *caller_fl,
                                      struct file_lock *sys_fl)
 {
+       struct file_lock_core *caller = &caller_fl->c;
+       struct file_lock_core *sys = &sys_fl->c;
+
        /* F_UNLCK checks any locks on the same fd. */
-       if (caller_fl->fl_type == F_UNLCK) {
-               if (!posix_same_owner(caller_fl, sys_fl))
+       if (lock_is_unlock(caller_fl)) {
+               if (!posix_same_owner(caller, sys))
                        return false;
                return locks_overlap(caller_fl, sys_fl);
        }
-       return posix_locks_conflict(caller_fl, sys_fl);
+       return posix_locks_conflict(caller, sys);
 }
 
 /* Determine if lock sys_fl blocks lock caller_fl. FLOCK specific
  * checking before calling the locks_conflict().
  */
-static bool flock_locks_conflict(struct file_lock *caller_fl,
-                                struct file_lock *sys_fl)
+static bool flock_locks_conflict(struct file_lock_core *caller_flc,
+                                struct file_lock_core *sys_flc)
 {
        /* FLOCK locks referring to the same filp do not conflict with
         * each other.
         */
-       if (caller_fl->fl_file == sys_fl->fl_file)
+       if (caller_flc->flc_file == sys_flc->flc_file)
                return false;
 
-       return locks_conflict(caller_fl, sys_fl);
+       return locks_conflict(caller_flc, sys_flc);
 }
 
 void
@@ -908,13 +959,13 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
 
        ctx = locks_inode_context(inode);
        if (!ctx || list_empty_careful(&ctx->flc_posix)) {
-               fl->fl_type = F_UNLCK;
+               fl->c.flc_type = F_UNLCK;
                return;
        }
 
 retry:
        spin_lock(&ctx->flc_lock);
-       list_for_each_entry(cfl, &ctx->flc_posix, fl_list) {
+       list_for_each_entry(cfl, &ctx->flc_posix, c.flc_list) {
                if (!posix_test_locks_conflict(fl, cfl))
                        continue;
                if (cfl->fl_lmops && cfl->fl_lmops->lm_lock_expirable
@@ -930,7 +981,7 @@ retry:
                locks_copy_conflock(fl, cfl);
                goto out;
        }
-       fl->fl_type = F_UNLCK;
+       fl->c.flc_type = F_UNLCK;
 out:
        spin_unlock(&ctx->flc_lock);
        return;
@@ -972,25 +1023,27 @@ EXPORT_SYMBOL(posix_test_lock);
 
 #define MAX_DEADLK_ITERATIONS 10
 
-/* Find a lock that the owner of the given block_fl is blocking on. */
-static struct file_lock *what_owner_is_waiting_for(struct file_lock *block_fl)
+/* Find a lock that the owner of the given @blocker is blocking on. */
+static struct file_lock_core *what_owner_is_waiting_for(struct file_lock_core *blocker)
 {
-       struct file_lock *fl;
+       struct file_lock_core *flc;
 
-       hash_for_each_possible(blocked_hash, fl, fl_link, posix_owner_key(block_fl)) {
-               if (posix_same_owner(fl, block_fl)) {
-                       while (fl->fl_blocker)
-                               fl = fl->fl_blocker;
-                       return fl;
+       hash_for_each_possible(blocked_hash, flc, flc_link, posix_owner_key(blocker)) {
+               if (posix_same_owner(flc, blocker)) {
+                       while (flc->flc_blocker)
+                               flc = flc->flc_blocker;
+                       return flc;
                }
        }
        return NULL;
 }
 
 /* Must be called with the blocked_lock_lock held! */
-static int posix_locks_deadlock(struct file_lock *caller_fl,
-                               struct file_lock *block_fl)
+static bool posix_locks_deadlock(struct file_lock *caller_fl,
+                                struct file_lock *block_fl)
 {
+       struct file_lock_core *caller = &caller_fl->c;
+       struct file_lock_core *blocker = &block_fl->c;
        int i = 0;
 
        lockdep_assert_held(&blocked_lock_lock);
@@ -999,16 +1052,16 @@ static int posix_locks_deadlock(struct file_lock *caller_fl,
         * This deadlock detector can't reasonably detect deadlocks with
         * FL_OFDLCK locks, since they aren't owned by a process, per-se.
         */
-       if (IS_OFDLCK(caller_fl))
-               return 0;
+       if (caller->flc_flags & FL_OFDLCK)
+               return false;
 
-       while ((block_fl = what_owner_is_waiting_for(block_fl))) {
+       while ((blocker = what_owner_is_waiting_for(blocker))) {
                if (i++ > MAX_DEADLK_ITERATIONS)
-                       return 0;
-               if (posix_same_owner(caller_fl, block_fl))
-                       return 1;
+                       return false;
+               if (posix_same_owner(caller, blocker))
+                       return true;
        }
-       return 0;
+       return false;
 }
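
posix_locks_deadlock() walks the chain of blockers owner by owner, giving up
after MAX_DEADLK_ITERATIONS so a pathological chain cannot stall the box. The
classic two-process cycle it catches looks like this from userspace
(illustrative, error handling trimmed):

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>

    static int lock_byte(int fd, off_t off)     /* blocking 1-byte write lock */
    {
            struct flock fl = {
                    .l_type   = F_WRLCK,
                    .l_whence = SEEK_SET,
                    .l_start  = off,
                    .l_len    = 1,
            };
            return fcntl(fd, F_SETLKW, &fl);
    }

    /* Process A holds byte 0 and is waiting on byte 1, which this
     * process already holds; closing the cycle fails fast: */
    if (lock_byte(fd, 0) == -1 && errno == EDEADLK)
            fprintf(stderr, "deadlock detected by the kernel\n");
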
 
 /* Try to create a FLOCK lock on filp. We always insert new FLOCK locks
@@ -1027,14 +1080,14 @@ static int flock_lock_inode(struct inode *inode, struct file_lock *request)
        bool found = false;
        LIST_HEAD(dispose);
 
-       ctx = locks_get_lock_context(inode, request->fl_type);
+       ctx = locks_get_lock_context(inode, request->c.flc_type);
        if (!ctx) {
-               if (request->fl_type != F_UNLCK)
+               if (request->c.flc_type != F_UNLCK)
                        return -ENOMEM;
-               return (request->fl_flags & FL_EXISTS) ? -ENOENT : 0;
+               return (request->c.flc_flags & FL_EXISTS) ? -ENOENT : 0;
        }
 
-       if (!(request->fl_flags & FL_ACCESS) && (request->fl_type != F_UNLCK)) {
+       if (!(request->c.flc_flags & FL_ACCESS) && (request->c.flc_type != F_UNLCK)) {
                new_fl = locks_alloc_lock();
                if (!new_fl)
                        return -ENOMEM;
@@ -1042,41 +1095,41 @@ static int flock_lock_inode(struct inode *inode, struct file_lock *request)
 
        percpu_down_read(&file_rwsem);
        spin_lock(&ctx->flc_lock);
-       if (request->fl_flags & FL_ACCESS)
+       if (request->c.flc_flags & FL_ACCESS)
                goto find_conflict;
 
-       list_for_each_entry(fl, &ctx->flc_flock, fl_list) {
-               if (request->fl_file != fl->fl_file)
+       list_for_each_entry(fl, &ctx->flc_flock, c.flc_list) {
+               if (request->c.flc_file != fl->c.flc_file)
                        continue;
-               if (request->fl_type == fl->fl_type)
+               if (request->c.flc_type == fl->c.flc_type)
                        goto out;
                found = true;
-               locks_delete_lock_ctx(fl, &dispose);
+               locks_delete_lock_ctx(&fl->c, &dispose);
                break;
        }
 
-       if (request->fl_type == F_UNLCK) {
-               if ((request->fl_flags & FL_EXISTS) && !found)
+       if (lock_is_unlock(request)) {
+               if ((request->c.flc_flags & FL_EXISTS) && !found)
                        error = -ENOENT;
                goto out;
        }
 
 find_conflict:
-       list_for_each_entry(fl, &ctx->flc_flock, fl_list) {
-               if (!flock_locks_conflict(request, fl))
+       list_for_each_entry(fl, &ctx->flc_flock, c.flc_list) {
+               if (!flock_locks_conflict(&request->c, &fl->c))
                        continue;
                error = -EAGAIN;
-               if (!(request->fl_flags & FL_SLEEP))
+               if (!(request->c.flc_flags & FL_SLEEP))
                        goto out;
                error = FILE_LOCK_DEFERRED;
-               locks_insert_block(fl, request, flock_locks_conflict);
+               locks_insert_block(&fl->c, &request->c, flock_locks_conflict);
                goto out;
        }
-       if (request->fl_flags & FL_ACCESS)
+       if (request->c.flc_flags & FL_ACCESS)
                goto out;
        locks_copy_lock(new_fl, request);
        locks_move_blocks(new_fl, request);
-       locks_insert_lock_ctx(new_fl, &ctx->flc_flock);
+       locks_insert_lock_ctx(&new_fl->c, &ctx->flc_flock);
        new_fl = NULL;
        error = 0;
 
@@ -1105,9 +1158,9 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
        void *owner;
        void (*func)(void);
 
-       ctx = locks_get_lock_context(inode, request->fl_type);
+       ctx = locks_get_lock_context(inode, request->c.flc_type);
        if (!ctx)
-               return (request->fl_type == F_UNLCK) ? 0 : -ENOMEM;
+               return lock_is_unlock(request) ? 0 : -ENOMEM;
 
        /*
         * We may need two file_lock structures for this operation,
@@ -1115,8 +1168,8 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
         *
         * In some cases we can be sure, that no new locks will be needed
         */
-       if (!(request->fl_flags & FL_ACCESS) &&
-           (request->fl_type != F_UNLCK ||
+       if (!(request->c.flc_flags & FL_ACCESS) &&
+           (request->c.flc_type != F_UNLCK ||
             request->fl_start != 0 || request->fl_end != OFFSET_MAX)) {
                new_fl = locks_alloc_lock();
                new_fl2 = locks_alloc_lock();
@@ -1130,9 +1183,9 @@ retry:
         * there are any, either return error or put the request on the
         * blocker's list of waiters and the global blocked_hash.
         */
-       if (request->fl_type != F_UNLCK) {
-               list_for_each_entry(fl, &ctx->flc_posix, fl_list) {
-                       if (!posix_locks_conflict(request, fl))
+       if (request->c.flc_type != F_UNLCK) {
+               list_for_each_entry(fl, &ctx->flc_posix, c.flc_list) {
+                       if (!posix_locks_conflict(&request->c, &fl->c))
                                continue;
                        if (fl->fl_lmops && fl->fl_lmops->lm_lock_expirable
                                && (*fl->fl_lmops->lm_lock_expirable)(fl)) {
@@ -1148,7 +1201,7 @@ retry:
                        if (conflock)
                                locks_copy_conflock(conflock, fl);
                        error = -EAGAIN;
-                       if (!(request->fl_flags & FL_SLEEP))
+                       if (!(request->c.flc_flags & FL_SLEEP))
                                goto out;
                        /*
                         * Deadlock detection and insertion into the blocked
@@ -1160,10 +1213,10 @@ retry:
                         * Ensure that we don't find any locks blocked on this
                         * request during deadlock detection.
                         */
-                       __locks_wake_up_blocks(request);
+                       __locks_wake_up_blocks(&request->c);
                        if (likely(!posix_locks_deadlock(request, fl))) {
                                error = FILE_LOCK_DEFERRED;
-                               __locks_insert_block(fl, request,
+                               __locks_insert_block(&fl->c, &request->c,
                                                     posix_locks_conflict);
                        }
                        spin_unlock(&blocked_lock_lock);
@@ -1173,22 +1226,22 @@ retry:
 
        /* If we're just looking for a conflict, we're done. */
        error = 0;
-       if (request->fl_flags & FL_ACCESS)
+       if (request->c.flc_flags & FL_ACCESS)
                goto out;
 
        /* Find the first old lock with the same owner as the new lock */
-       list_for_each_entry(fl, &ctx->flc_posix, fl_list) {
-               if (posix_same_owner(request, fl))
+       list_for_each_entry(fl, &ctx->flc_posix, c.flc_list) {
+               if (posix_same_owner(&request->c, &fl->c))
                        break;
        }
 
        /* Process locks with this owner. */
-       list_for_each_entry_safe_from(fl, tmp, &ctx->flc_posix, fl_list) {
-               if (!posix_same_owner(request, fl))
+       list_for_each_entry_safe_from(fl, tmp, &ctx->flc_posix, c.flc_list) {
+               if (!posix_same_owner(&request->c, &fl->c))
                        break;
 
                /* Detect adjacent or overlapping regions (if same lock type) */
-               if (request->fl_type == fl->fl_type) {
+               if (request->c.flc_type == fl->c.flc_type) {
                        /* In all comparisons of start vs end, use
                         * "start - 1" rather than "end + 1". If end
                         * is OFFSET_MAX, end + 1 will become negative.
@@ -1215,7 +1268,7 @@ retry:
                        else
                                request->fl_end = fl->fl_end;
                        if (added) {
-                               locks_delete_lock_ctx(fl, &dispose);
+                               locks_delete_lock_ctx(&fl->c, &dispose);
                                continue;
                        }
                        request = fl;
@@ -1228,7 +1281,7 @@ retry:
                                continue;
                        if (fl->fl_start > request->fl_end)
                                break;
-                       if (request->fl_type == F_UNLCK)
+                       if (lock_is_unlock(request))
                                added = true;
                        if (fl->fl_start < request->fl_start)
                                left = fl;
@@ -1244,7 +1297,7 @@ retry:
                                 * one (This may happen several times).
                                 */
                                if (added) {
-                                       locks_delete_lock_ctx(fl, &dispose);
+                                       locks_delete_lock_ctx(&fl->c, &dispose);
                                        continue;
                                }
                                /*
@@ -1261,8 +1314,9 @@ retry:
                                locks_move_blocks(new_fl, request);
                                request = new_fl;
                                new_fl = NULL;
-                               locks_insert_lock_ctx(request, &fl->fl_list);
-                               locks_delete_lock_ctx(fl, &dispose);
+                               locks_insert_lock_ctx(&request->c,
+                                                     &fl->c.flc_list);
+                               locks_delete_lock_ctx(&fl->c, &dispose);
                                added = true;
                        }
                }
@@ -1279,8 +1333,8 @@ retry:
 
        error = 0;
        if (!added) {
-               if (request->fl_type == F_UNLCK) {
-                       if (request->fl_flags & FL_EXISTS)
+               if (lock_is_unlock(request)) {
+                       if (request->c.flc_flags & FL_EXISTS)
                                error = -ENOENT;
                        goto out;
                }
@@ -1291,7 +1345,7 @@ retry:
                }
                locks_copy_lock(new_fl, request);
                locks_move_blocks(new_fl, request);
-               locks_insert_lock_ctx(new_fl, &fl->fl_list);
+               locks_insert_lock_ctx(&new_fl->c, &fl->c.flc_list);
                fl = new_fl;
                new_fl = NULL;
        }
@@ -1303,14 +1357,14 @@ retry:
                        left = new_fl2;
                        new_fl2 = NULL;
                        locks_copy_lock(left, right);
-                       locks_insert_lock_ctx(left, &fl->fl_list);
+                       locks_insert_lock_ctx(&left->c, &fl->c.flc_list);
                }
                right->fl_start = request->fl_end + 1;
-               locks_wake_up_blocks(right);
+               locks_wake_up_blocks(&right->c);
        }
        if (left) {
                left->fl_end = request->fl_start - 1;
-               locks_wake_up_blocks(left);
+               locks_wake_up_blocks(&left->c);
        }
  out:
        spin_unlock(&ctx->flc_lock);
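
The left/right bookkeeping at the end is the range-splitting path: unlocking or
re-typing the middle of an existing lock leaves two fragments, which is why
posix_lock_inode() preallocates a second file_lock (new_fl2) up front rather
than risk ENOMEM halfway through. From userspace the effect is:

    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                        .l_start = 0, .l_len = 100 };
    fcntl(fd, F_SETLK, &fl);        /* lock bytes 0-99 */

    fl.l_type  = F_UNLCK;
    fl.l_start = 40;
    fl.l_len   = 20;
    fcntl(fd, F_SETLK, &fl);        /* unlock 40-59: two locks remain,
                                     * 0-39 and 60-99, one of them carried
                                     * by the preallocated new_fl2 */
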
@@ -1364,8 +1418,8 @@ static int posix_lock_inode_wait(struct inode *inode, struct file_lock *fl)
                error = posix_lock_inode(inode, fl, NULL);
                if (error != FILE_LOCK_DEFERRED)
                        break;
-               error = wait_event_interruptible(fl->fl_wait,
-                                       list_empty(&fl->fl_blocked_member));
+               error = wait_event_interruptible(fl->c.flc_wait,
+                                                list_empty(&fl->c.flc_blocked_member));
                if (error)
                        break;
        }
@@ -1373,37 +1427,37 @@ static int posix_lock_inode_wait(struct inode *inode, struct file_lock *fl)
        return error;
 }
 
-static void lease_clear_pending(struct file_lock *fl, int arg)
+static void lease_clear_pending(struct file_lease *fl, int arg)
 {
        switch (arg) {
        case F_UNLCK:
-               fl->fl_flags &= ~FL_UNLOCK_PENDING;
+               fl->c.flc_flags &= ~FL_UNLOCK_PENDING;
                fallthrough;
        case F_RDLCK:
-               fl->fl_flags &= ~FL_DOWNGRADE_PENDING;
+               fl->c.flc_flags &= ~FL_DOWNGRADE_PENDING;
        }
 }
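
From here on leases are carried by the new 'struct file_lease' instead of a
full 'struct file_lock': it shares the embedded file_lock_core but drops the
byte-range fields. A sketch of the layout, with the members inferred from the
accesses in this patch:

    struct file_lease {
            struct file_lock_core c;
            struct fasync_struct *fl_fasync;        /* lease-break signal delivery */
            unsigned long fl_break_time;            /* jiffies deadline for a break */
            unsigned long fl_downgrade_time;
            const struct lease_manager_operations *fl_lmops;
    };

    static inline struct file_lease *file_lease(struct file_lock_core *flc)
    {
            return container_of(flc, struct file_lease, c);
    }
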
 
 /* We already had a lease on this file; just change its type */
-int lease_modify(struct file_lock *fl, int arg, struct list_head *dispose)
+int lease_modify(struct file_lease *fl, int arg, struct list_head *dispose)
 {
-       int error = assign_type(fl, arg);
+       int error = assign_type(&fl->c, arg);
 
        if (error)
                return error;
        lease_clear_pending(fl, arg);
-       locks_wake_up_blocks(fl);
+       locks_wake_up_blocks(&fl->c);
        if (arg == F_UNLCK) {
-               struct file *filp = fl->fl_file;
+               struct file *filp = fl->c.flc_file;
 
                f_delown(filp);
                filp->f_owner.signum = 0;
-               fasync_helper(0, fl->fl_file, 0, &fl->fl_fasync);
+               fasync_helper(0, fl->c.flc_file, 0, &fl->fl_fasync);
                if (fl->fl_fasync != NULL) {
                        printk(KERN_ERR "locks_delete_lock: fasync == %p\n", fl->fl_fasync);
                        fl->fl_fasync = NULL;
                }
-               locks_delete_lock_ctx(fl, dispose);
+               locks_delete_lock_ctx(&fl->c, dispose);
        }
        return 0;
 }
@@ -1420,11 +1474,11 @@ static bool past_time(unsigned long then)
 static void time_out_leases(struct inode *inode, struct list_head *dispose)
 {
        struct file_lock_context *ctx = inode->i_flctx;
-       struct file_lock *fl, *tmp;
+       struct file_lease *fl, *tmp;
 
        lockdep_assert_held(&ctx->flc_lock);
 
-       list_for_each_entry_safe(fl, tmp, &ctx->flc_lease, fl_list) {
+       list_for_each_entry_safe(fl, tmp, &ctx->flc_lease, c.flc_list) {
                trace_time_out_leases(inode, fl);
                if (past_time(fl->fl_downgrade_time))
                        lease_modify(fl, F_RDLCK, dispose);
@@ -1433,38 +1487,40 @@ static void time_out_leases(struct inode *inode, struct list_head *dispose)
        }
 }
 
-static bool leases_conflict(struct file_lock *lease, struct file_lock *breaker)
+static bool leases_conflict(struct file_lock_core *lc, struct file_lock_core *bc)
 {
        bool rc;
+       struct file_lease *lease = file_lease(lc);
+       struct file_lease *breaker = file_lease(bc);
 
        if (lease->fl_lmops->lm_breaker_owns_lease
                        && lease->fl_lmops->lm_breaker_owns_lease(lease))
                return false;
-       if ((breaker->fl_flags & FL_LAYOUT) != (lease->fl_flags & FL_LAYOUT)) {
+       if ((bc->flc_flags & FL_LAYOUT) != (lc->flc_flags & FL_LAYOUT)) {
                rc = false;
                goto trace;
        }
-       if ((breaker->fl_flags & FL_DELEG) && (lease->fl_flags & FL_LEASE)) {
+       if ((bc->flc_flags & FL_DELEG) && (lc->flc_flags & FL_LEASE)) {
                rc = false;
                goto trace;
        }
 
-       rc = locks_conflict(breaker, lease);
+       rc = locks_conflict(bc, lc);
 trace:
        trace_leases_conflict(rc, lease, breaker);
        return rc;
 }
 
 static bool
-any_leases_conflict(struct inode *inode, struct file_lock *breaker)
+any_leases_conflict(struct inode *inode, struct file_lease *breaker)
 {
        struct file_lock_context *ctx = inode->i_flctx;
-       struct file_lock *fl;
+       struct file_lock_core *flc;
 
        lockdep_assert_held(&ctx->flc_lock);
 
-       list_for_each_entry(fl, &ctx->flc_lease, fl_list) {
-               if (leases_conflict(fl, breaker))
+       list_for_each_entry(flc, &ctx->flc_lease, flc_list) {
+               if (leases_conflict(flc, &breaker->c))
                        return true;
        }
        return false;
@@ -1487,7 +1543,7 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
 {
        int error = 0;
        struct file_lock_context *ctx;
-       struct file_lock *new_fl, *fl, *tmp;
+       struct file_lease *new_fl, *fl, *tmp;
        unsigned long break_time;
        int want_write = (mode & O_ACCMODE) != O_RDONLY;
        LIST_HEAD(dispose);
@@ -1495,7 +1551,7 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
        new_fl = lease_alloc(NULL, want_write ? F_WRLCK : F_RDLCK);
        if (IS_ERR(new_fl))
                return PTR_ERR(new_fl);
-       new_fl->fl_flags = type;
+       new_fl->c.flc_flags = type;
 
        /* typically we will check that ctx is non-NULL before calling */
        ctx = locks_inode_context(inode);
@@ -1519,22 +1575,22 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
                        break_time++;   /* so that 0 means no break time */
        }
 
-       list_for_each_entry_safe(fl, tmp, &ctx->flc_lease, fl_list) {
-               if (!leases_conflict(fl, new_fl))
+       list_for_each_entry_safe(fl, tmp, &ctx->flc_lease, c.flc_list) {
+               if (!leases_conflict(&fl->c, &new_fl->c))
                        continue;
                if (want_write) {
-                       if (fl->fl_flags & FL_UNLOCK_PENDING)
+                       if (fl->c.flc_flags & FL_UNLOCK_PENDING)
                                continue;
-                       fl->fl_flags |= FL_UNLOCK_PENDING;
+                       fl->c.flc_flags |= FL_UNLOCK_PENDING;
                        fl->fl_break_time = break_time;
                } else {
                        if (lease_breaking(fl))
                                continue;
-                       fl->fl_flags |= FL_DOWNGRADE_PENDING;
+                       fl->c.flc_flags |= FL_DOWNGRADE_PENDING;
                        fl->fl_downgrade_time = break_time;
                }
                if (fl->fl_lmops->lm_break(fl))
-                       locks_delete_lock_ctx(fl, &dispose);
+                       locks_delete_lock_ctx(&fl->c, &dispose);
        }
 
        if (list_empty(&ctx->flc_lease))
@@ -1547,26 +1603,26 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
        }
 
 restart:
-       fl = list_first_entry(&ctx->flc_lease, struct file_lock, fl_list);
+       fl = list_first_entry(&ctx->flc_lease, struct file_lease, c.flc_list);
        break_time = fl->fl_break_time;
        if (break_time != 0)
                break_time -= jiffies;
        if (break_time == 0)
                break_time++;
-       locks_insert_block(fl, new_fl, leases_conflict);
+       locks_insert_block(&fl->c, &new_fl->c, leases_conflict);
        trace_break_lease_block(inode, new_fl);
        spin_unlock(&ctx->flc_lock);
        percpu_up_read(&file_rwsem);
 
        locks_dispose_list(&dispose);
-       error = wait_event_interruptible_timeout(new_fl->fl_wait,
-                                       list_empty(&new_fl->fl_blocked_member),
-                                       break_time);
+       error = wait_event_interruptible_timeout(new_fl->c.flc_wait,
+                                                list_empty(&new_fl->c.flc_blocked_member),
+                                                break_time);
 
        percpu_down_read(&file_rwsem);
        spin_lock(&ctx->flc_lock);
        trace_break_lease_unblock(inode, new_fl);
-       locks_delete_block(new_fl);
+       __locks_delete_block(&new_fl->c);
        if (error >= 0) {
                /*
                 * Wait for the next conflicting lease that has not been
@@ -1583,7 +1639,7 @@ out:
        percpu_up_read(&file_rwsem);
        locks_dispose_list(&dispose);
 free_lock:
-       locks_free_lock(new_fl);
+       locks_free_lease(new_fl);
        return error;
 }
 EXPORT_SYMBOL(__break_lease);
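
__break_lease() is the path a conflicting open() or truncate() lands in: the
lease holder is notified through the fasync hook set up at F_SETLEASE time and
gets the lease-break-time window to flush and release before the kernel revokes
the lease. The holder's side, for reference:

    #include <fcntl.h>

    fcntl(fd, F_SETLEASE, F_WRLCK);         /* take a write lease */

    /* A conflicting open() elsewhere raises SIGIO (or the F_SETSIG
     * signal) here; flush dirty state, then release in time: */
    fcntl(fd, F_SETLEASE, F_UNLCK);
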
@@ -1601,14 +1657,14 @@ void lease_get_mtime(struct inode *inode, struct timespec64 *time)
 {
        bool has_lease = false;
        struct file_lock_context *ctx;
-       struct file_lock *fl;
+       struct file_lock_core *flc;
 
        ctx = locks_inode_context(inode);
        if (ctx && !list_empty_careful(&ctx->flc_lease)) {
                spin_lock(&ctx->flc_lock);
-               fl = list_first_entry_or_null(&ctx->flc_lease,
-                                             struct file_lock, fl_list);
-               if (fl && (fl->fl_type == F_WRLCK))
+               flc = list_first_entry_or_null(&ctx->flc_lease,
+                                              struct file_lock_core, flc_list);
+               if (flc && flc->flc_type == F_WRLCK)
                        has_lease = true;
                spin_unlock(&ctx->flc_lock);
        }
@@ -1643,7 +1699,7 @@ EXPORT_SYMBOL(lease_get_mtime);
  */
 int fcntl_getlease(struct file *filp)
 {
-       struct file_lock *fl;
+       struct file_lease *fl;
        struct inode *inode = file_inode(filp);
        struct file_lock_context *ctx;
        int type = F_UNLCK;
@@ -1654,8 +1710,8 @@ int fcntl_getlease(struct file *filp)
                percpu_down_read(&file_rwsem);
                spin_lock(&ctx->flc_lock);
                time_out_leases(inode, &dispose);
-               list_for_each_entry(fl, &ctx->flc_lease, fl_list) {
-                       if (fl->fl_file != filp)
+               list_for_each_entry(fl, &ctx->flc_lease, c.flc_list) {
+                       if (fl->c.flc_file != filp)
                                continue;
                        type = target_leasetype(fl);
                        break;
@@ -1715,12 +1771,12 @@ check_conflicting_open(struct file *filp, const int arg, int flags)
 }
 
 static int
-generic_add_lease(struct file *filp, int arg, struct file_lock **flp, void **priv)
+generic_add_lease(struct file *filp, int arg, struct file_lease **flp, void **priv)
 {
-       struct file_lock *fl, *my_fl = NULL, *lease;
+       struct file_lease *fl, *my_fl = NULL, *lease;
        struct inode *inode = file_inode(filp);
        struct file_lock_context *ctx;
-       bool is_deleg = (*flp)->fl_flags & FL_DELEG;
+       bool is_deleg = (*flp)->c.flc_flags & FL_DELEG;
        int error;
        LIST_HEAD(dispose);
 
@@ -1746,7 +1802,7 @@ generic_add_lease(struct file *filp, int arg, struct file_lock **flp, void **pri
        percpu_down_read(&file_rwsem);
        spin_lock(&ctx->flc_lock);
        time_out_leases(inode, &dispose);
-       error = check_conflicting_open(filp, arg, lease->fl_flags);
+       error = check_conflicting_open(filp, arg, lease->c.flc_flags);
        if (error)
                goto out;
 
@@ -1759,9 +1815,9 @@ generic_add_lease(struct file *filp, int arg, struct file_lock **flp, void **pri
         * except for this filp.
         */
        error = -EAGAIN;
-       list_for_each_entry(fl, &ctx->flc_lease, fl_list) {
-               if (fl->fl_file == filp &&
-                   fl->fl_owner == lease->fl_owner) {
+       list_for_each_entry(fl, &ctx->flc_lease, c.flc_list) {
+               if (fl->c.flc_file == filp &&
+                   fl->c.flc_owner == lease->c.flc_owner) {
                        my_fl = fl;
                        continue;
                }
@@ -1776,7 +1832,7 @@ generic_add_lease(struct file *filp, int arg, struct file_lock **flp, void **pri
                 * Modifying our existing lease is OK, but no getting a
                 * new lease if someone else is opening for write:
                 */
-               if (fl->fl_flags & FL_UNLOCK_PENDING)
+               if (fl->c.flc_flags & FL_UNLOCK_PENDING)
                        goto out;
        }
 
@@ -1792,7 +1848,7 @@ generic_add_lease(struct file *filp, int arg, struct file_lock **flp, void **pri
        if (!leases_enable)
                goto out;
 
-       locks_insert_lock_ctx(lease, &ctx->flc_lease);
+       locks_insert_lock_ctx(&lease->c, &ctx->flc_lease);
        /*
         * The check in break_lease() is lockless. It's possible for another
         * open to race in after we did the earlier check for a conflicting
@@ -1803,9 +1859,9 @@ generic_add_lease(struct file *filp, int arg, struct file_lock **flp, void **pri
         * precedes these checks.
         */
        smp_mb();
-       error = check_conflicting_open(filp, arg, lease->fl_flags);
+       error = check_conflicting_open(filp, arg, lease->c.flc_flags);
        if (error) {
-               locks_unlink_lock_ctx(lease);
+               locks_unlink_lock_ctx(&lease->c);
                goto out;
        }
 
@@ -1826,7 +1882,7 @@ out:
 static int generic_delete_lease(struct file *filp, void *owner)
 {
        int error = -EAGAIN;
-       struct file_lock *fl, *victim = NULL;
+       struct file_lease *fl, *victim = NULL;
        struct inode *inode = file_inode(filp);
        struct file_lock_context *ctx;
        LIST_HEAD(dispose);
@@ -1839,9 +1895,9 @@ static int generic_delete_lease(struct file *filp, void *owner)
 
        percpu_down_read(&file_rwsem);
        spin_lock(&ctx->flc_lock);
-       list_for_each_entry(fl, &ctx->flc_lease, fl_list) {
-               if (fl->fl_file == filp &&
-                   fl->fl_owner == owner) {
+       list_for_each_entry(fl, &ctx->flc_lease, c.flc_list) {
+               if (fl->c.flc_file == filp &&
+                   fl->c.flc_owner == owner) {
                        victim = fl;
                        break;
                }
@@ -1866,21 +1922,9 @@ static int generic_delete_lease(struct file *filp, void *owner)
  *     The (input) flp->fl_lmops->lm_break function is required
  *     by break_lease().
  */
-int generic_setlease(struct file *filp, int arg, struct file_lock **flp,
+int generic_setlease(struct file *filp, int arg, struct file_lease **flp,
                        void **priv)
 {
-       struct inode *inode = file_inode(filp);
-       vfsuid_t vfsuid = i_uid_into_vfsuid(file_mnt_idmap(filp), inode);
-       int error;
-
-       if ((!vfsuid_eq_kuid(vfsuid, current_fsuid())) && !capable(CAP_LEASE))
-               return -EACCES;
-       if (!S_ISREG(inode->i_mode))
-               return -EINVAL;
-       error = security_file_lock(filp, arg);
-       if (error)
-               return error;
-
        switch (arg) {
        case F_UNLCK:
                return generic_delete_lease(filp, *priv);
@@ -1913,7 +1957,7 @@ lease_notifier_chain_init(void)
 }
 
 static inline void
-setlease_notifier(int arg, struct file_lock *lease)
+setlease_notifier(int arg, struct file_lease *lease)
 {
        if (arg != F_UNLCK)
                srcu_notifier_call_chain(&lease_notifier_chain, arg, lease);
@@ -1931,6 +1975,19 @@ void lease_unregister_notifier(struct notifier_block *nb)
 }
 EXPORT_SYMBOL_GPL(lease_unregister_notifier);
 
+
+int
+kernel_setlease(struct file *filp, int arg, struct file_lease **lease, void **priv)
+{
+       if (lease)
+               setlease_notifier(arg, *lease);
+       if (filp->f_op->setlease)
+               return filp->f_op->setlease(filp, arg, lease, priv);
+       else
+               return generic_setlease(filp, arg, lease, priv);
+}
+EXPORT_SYMBOL_GPL(kernel_setlease);
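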
+
 /**
  * vfs_setlease        -       sets a lease on an open file
  * @filp:      file pointer
@@ -1949,20 +2006,26 @@ EXPORT_SYMBOL_GPL(lease_unregister_notifier);
  * may be NULL if the lm_setup operation doesn't require it.
  */
 int
-vfs_setlease(struct file *filp, int arg, struct file_lock **lease, void **priv)
+vfs_setlease(struct file *filp, int arg, struct file_lease **lease, void **priv)
 {
-       if (lease)
-               setlease_notifier(arg, *lease);
-       if (filp->f_op->setlease)
-               return filp->f_op->setlease(filp, arg, lease, priv);
-       else
-               return generic_setlease(filp, arg, lease, priv);
+       struct inode *inode = file_inode(filp);
+       vfsuid_t vfsuid = i_uid_into_vfsuid(file_mnt_idmap(filp), inode);
+       int error;
+
+       if ((!vfsuid_eq_kuid(vfsuid, current_fsuid())) && !capable(CAP_LEASE))
+               return -EACCES;
+       if (!S_ISREG(inode->i_mode))
+               return -EINVAL;
+       error = security_file_lock(filp, arg);
+       if (error)
+               return error;
+       return kernel_setlease(filp, arg, lease, priv);
 }
 EXPORT_SYMBOL_GPL(vfs_setlease);
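
Note that the permission checks (CAP_LEASE, the S_ISREG restriction,
security_file_lock()) moved from generic_setlease() into vfs_setlease(), so the
new kernel_setlease() lets in-kernel lease setters such as file servers skip
checks that only make sense for userspace. A hypothetical in-kernel caller,
assuming 'lease' was allocated by the caller (my_alloc_lease() is illustrative,
not a real helper):

    struct file_lease *lease = my_alloc_lease(F_RDLCK);   /* hypothetical */
    void *priv = NULL;
    int err = kernel_setlease(filp, F_RDLCK, &lease, &priv);
    /* unlike vfs_setlease(), no CAP_LEASE / S_ISREG / LSM checks ran */
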
 
 static int do_fcntl_add_lease(unsigned int fd, struct file *filp, int arg)
 {
-       struct file_lock *fl;
+       struct file_lease *fl;
        struct fasync_struct *new;
        int error;
 
@@ -1972,14 +2035,14 @@ static int do_fcntl_add_lease(unsigned int fd, struct file *filp, int arg)
 
        new = fasync_alloc();
        if (!new) {
-               locks_free_lock(fl);
+               locks_free_lease(fl);
                return -ENOMEM;
        }
        new->fa_fd = fd;
 
        error = vfs_setlease(filp, arg, &fl, (void **)&new);
        if (fl)
-               locks_free_lock(fl);
+               locks_free_lease(fl);
        if (new)
                fasync_free(new);
        return error;
@@ -2017,8 +2080,8 @@ static int flock_lock_inode_wait(struct inode *inode, struct file_lock *fl)
                error = flock_lock_inode(inode, fl);
                if (error != FILE_LOCK_DEFERRED)
                        break;
-               error = wait_event_interruptible(fl->fl_wait,
-                               list_empty(&fl->fl_blocked_member));
+               error = wait_event_interruptible(fl->c.flc_wait,
+                                                list_empty(&fl->c.flc_blocked_member));
                if (error)
                        break;
        }
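
flock_lock_inode() gives flock(2) its documented semantics: one lock per open
file description, and a conversion deletes the old lock before acquiring the
new one, so an upgrade is not atomic. Userspace view:

    #include <sys/file.h>

    flock(fd, LOCK_SH);     /* shared lock granted */
    flock(fd, LOCK_EX);     /* conversion: the shared lock is dropped first,
                             * so another task can slip in and this call
                             * may block (or fail with LOCK_NB set) */
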
@@ -2036,7 +2099,7 @@ static int flock_lock_inode_wait(struct inode *inode, struct file_lock *fl)
 int locks_lock_inode_wait(struct inode *inode, struct file_lock *fl)
 {
        int res = 0;
-       switch (fl->fl_flags & (FL_POSIX|FL_FLOCK)) {
+       switch (fl->c.flc_flags & (FL_POSIX|FL_FLOCK)) {
                case FL_POSIX:
                        res = posix_lock_inode_wait(inode, fl);
                        break;
@@ -2098,13 +2161,13 @@ SYSCALL_DEFINE2(flock, unsigned int, fd, unsigned int, cmd)
 
        flock_make_lock(f.file, &fl, type);
 
-       error = security_file_lock(f.file, fl.fl_type);
+       error = security_file_lock(f.file, fl.c.flc_type);
        if (error)
                goto out_putf;
 
        can_sleep = !(cmd & LOCK_NB);
        if (can_sleep)
-               fl.fl_flags |= FL_SLEEP;
+               fl.c.flc_flags |= FL_SLEEP;
 
        if (f.file->f_op->flock)
                error = f.file->f_op->flock(f.file,
@@ -2130,7 +2193,7 @@ SYSCALL_DEFINE2(flock, unsigned int, fd, unsigned int, cmd)
  */
 int vfs_test_lock(struct file *filp, struct file_lock *fl)
 {
-       WARN_ON_ONCE(filp != fl->fl_file);
+       WARN_ON_ONCE(filp != fl->c.flc_file);
        if (filp->f_op->lock)
                return filp->f_op->lock(filp, F_GETLK, fl);
        posix_test_lock(filp, fl);
@@ -2145,25 +2208,28 @@ EXPORT_SYMBOL_GPL(vfs_test_lock);
  *
  * Used to translate a fl_pid into a namespace virtual pid number
  */
-static pid_t locks_translate_pid(struct file_lock *fl, struct pid_namespace *ns)
+static pid_t locks_translate_pid(struct file_lock_core *fl, struct pid_namespace *ns)
 {
        pid_t vnr;
        struct pid *pid;
 
-       if (IS_OFDLCK(fl))
+       if (fl->flc_flags & FL_OFDLCK)
                return -1;
-       if (IS_REMOTELCK(fl))
-               return fl->fl_pid;
+
+       /* Remote locks report a negative pid value */
+       if (fl->flc_pid <= 0)
+               return fl->flc_pid;
+
        /*
         * If the flock owner process is dead and its pid has been already
         * freed, the translation below won't work, but we still want to show
         * flock owner pid number in init pidns.
         */
        if (ns == &init_pid_ns)
-               return (pid_t)fl->fl_pid;
+               return (pid_t) fl->flc_pid;
 
        rcu_read_lock();
-       pid = find_pid_ns(fl->fl_pid, &init_pid_ns);
+       pid = find_pid_ns(fl->flc_pid, &init_pid_ns);
        vnr = pid_nr_ns(pid, ns);
        rcu_read_unlock();
        return vnr;
@@ -2171,7 +2237,7 @@ static pid_t locks_translate_pid(struct file_lock *fl, struct pid_namespace *ns)
 
 static int posix_lock_to_flock(struct flock *flock, struct file_lock *fl)
 {
-       flock->l_pid = locks_translate_pid(fl, task_active_pid_ns(current));
+       flock->l_pid = locks_translate_pid(&fl->c, task_active_pid_ns(current));
 #if BITS_PER_LONG == 32
        /*
         * Make sure we can represent the posix lock via
@@ -2186,19 +2252,19 @@ static int posix_lock_to_flock(struct flock *flock, struct file_lock *fl)
        flock->l_len = fl->fl_end == OFFSET_MAX ? 0 :
                fl->fl_end - fl->fl_start + 1;
        flock->l_whence = 0;
-       flock->l_type = fl->fl_type;
+       flock->l_type = fl->c.flc_type;
        return 0;
 }
 
 #if BITS_PER_LONG == 32
 static void posix_lock_to_flock64(struct flock64 *flock, struct file_lock *fl)
 {
-       flock->l_pid = locks_translate_pid(fl, task_active_pid_ns(current));
+       flock->l_pid = locks_translate_pid(&fl->c, task_active_pid_ns(current));
        flock->l_start = fl->fl_start;
        flock->l_len = fl->fl_end == OFFSET_MAX ? 0 :
                fl->fl_end - fl->fl_start + 1;
        flock->l_whence = 0;
-       flock->l_type = fl->fl_type;
+       flock->l_type = fl->c.flc_type;
 }
 #endif
 
@@ -2227,16 +2293,16 @@ int fcntl_getlk(struct file *filp, unsigned int cmd, struct flock *flock)
                if (flock->l_pid != 0)
                        goto out;
 
-               fl->fl_flags |= FL_OFDLCK;
-               fl->fl_owner = filp;
+               fl->c.flc_flags |= FL_OFDLCK;
+               fl->c.flc_owner = filp;
        }
 
        error = vfs_test_lock(filp, fl);
        if (error)
                goto out;
 
-       flock->l_type = fl->fl_type;
-       if (fl->fl_type != F_UNLCK) {
+       flock->l_type = fl->c.flc_type;
+       if (fl->c.flc_type != F_UNLCK) {
                error = posix_lock_to_flock(flock, fl);
                if (error)
                        goto out;
@@ -2283,7 +2349,7 @@ out:
  */
 int vfs_lock_file(struct file *filp, unsigned int cmd, struct file_lock *fl, struct file_lock *conf)
 {
-       WARN_ON_ONCE(filp != fl->fl_file);
+       WARN_ON_ONCE(filp != fl->c.flc_file);
        if (filp->f_op->lock)
                return filp->f_op->lock(filp, cmd, fl);
        else
@@ -2296,7 +2362,7 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
 {
        int error;
 
-       error = security_file_lock(filp, fl->fl_type);
+       error = security_file_lock(filp, fl->c.flc_type);
        if (error)
                return error;
 
@@ -2304,8 +2370,8 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
                error = vfs_lock_file(filp, cmd, fl, NULL);
                if (error != FILE_LOCK_DEFERRED)
                        break;
-               error = wait_event_interruptible(fl->fl_wait,
-                                       list_empty(&fl->fl_blocked_member));
+               error = wait_event_interruptible(fl->c.flc_wait,
+                                                list_empty(&fl->c.flc_blocked_member));
                if (error)
                        break;
        }
@@ -2318,13 +2384,13 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
 static int
 check_fmode_for_setlk(struct file_lock *fl)
 {
-       switch (fl->fl_type) {
+       switch (fl->c.flc_type) {
        case F_RDLCK:
-               if (!(fl->fl_file->f_mode & FMODE_READ))
+               if (!(fl->c.flc_file->f_mode & FMODE_READ))
                        return -EBADF;
                break;
        case F_WRLCK:
-               if (!(fl->fl_file->f_mode & FMODE_WRITE))
+               if (!(fl->c.flc_file->f_mode & FMODE_WRITE))
                        return -EBADF;
        }
        return 0;
@@ -2363,8 +2429,8 @@ int fcntl_setlk(unsigned int fd, struct file *filp, unsigned int cmd,
                        goto out;
 
                cmd = F_SETLK;
-               file_lock->fl_flags |= FL_OFDLCK;
-               file_lock->fl_owner = filp;
+               file_lock->c.flc_flags |= FL_OFDLCK;
+               file_lock->c.flc_owner = filp;
                break;
        case F_OFD_SETLKW:
                error = -EINVAL;
@@ -2372,11 +2438,11 @@ int fcntl_setlk(unsigned int fd, struct file *filp, unsigned int cmd,
                        goto out;
 
                cmd = F_SETLKW;
-               file_lock->fl_flags |= FL_OFDLCK;
-               file_lock->fl_owner = filp;
+               file_lock->c.flc_flags |= FL_OFDLCK;
+               file_lock->c.flc_owner = filp;
                fallthrough;
        case F_SETLKW:
-               file_lock->fl_flags |= FL_SLEEP;
+               file_lock->c.flc_flags |= FL_SLEEP;
        }
 
        error = do_lock_file_wait(filp, cmd, file_lock);
@@ -2386,8 +2452,8 @@ int fcntl_setlk(unsigned int fd, struct file *filp, unsigned int cmd,
         * lock that was just acquired. There is no need to do that when we're
         * unlocking though, or for OFD locks.
         */
-       if (!error && file_lock->fl_type != F_UNLCK &&
-           !(file_lock->fl_flags & FL_OFDLCK)) {
+       if (!error && file_lock->c.flc_type != F_UNLCK &&
+           !(file_lock->c.flc_flags & FL_OFDLCK)) {
                struct files_struct *files = current->files;
                /*
                 * We need that spin_lock here - it prevents reordering between
@@ -2398,7 +2464,7 @@ int fcntl_setlk(unsigned int fd, struct file *filp, unsigned int cmd,
                f = files_lookup_fd_locked(files, fd);
                spin_unlock(&files->file_lock);
                if (f != filp) {
-                       file_lock->fl_type = F_UNLCK;
+                       file_lock->c.flc_type = F_UNLCK;
                        error = do_lock_file_wait(filp, cmd, file_lock);
                        WARN_ON_ONCE(error);
                        error = -EBADF;
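
The F_OFD_* commands differ from classic POSIX locks only in ownership:
flc_owner is the struct file rather than the process, so the lock lives and
dies with the open file description instead of being dropped when the process
closes any descriptor for the file. l_pid must be zero on input:

    struct flock fl = {
            .l_type   = F_WRLCK,
            .l_whence = SEEK_SET,
            .l_start  = 0,
            .l_len    = 0,          /* whole file */
            .l_pid    = 0,          /* required for OFD commands */
    };
    if (fcntl(fd, F_OFD_SETLK, &fl) == -1)
            perror("F_OFD_SETLK");
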
@@ -2437,16 +2503,16 @@ int fcntl_getlk64(struct file *filp, unsigned int cmd, struct flock64 *flock)
                if (flock->l_pid != 0)
                        goto out;
 
-               fl->fl_flags |= FL_OFDLCK;
-               fl->fl_owner = filp;
+               fl->c.flc_flags |= FL_OFDLCK;
+               fl->c.flc_owner = filp;
        }
 
        error = vfs_test_lock(filp, fl);
        if (error)
                goto out;
 
-       flock->l_type = fl->fl_type;
-       if (fl->fl_type != F_UNLCK)
+       flock->l_type = fl->c.flc_type;
+       if (fl->c.flc_type != F_UNLCK)
                posix_lock_to_flock64(flock, fl);
 
 out:
@@ -2486,8 +2552,8 @@ int fcntl_setlk64(unsigned int fd, struct file *filp, unsigned int cmd,
                        goto out;
 
                cmd = F_SETLK64;
-               file_lock->fl_flags |= FL_OFDLCK;
-               file_lock->fl_owner = filp;
+               file_lock->c.flc_flags |= FL_OFDLCK;
+               file_lock->c.flc_owner = filp;
                break;
        case F_OFD_SETLKW:
                error = -EINVAL;
@@ -2495,11 +2561,11 @@ int fcntl_setlk64(unsigned int fd, struct file *filp, unsigned int cmd,
                        goto out;
 
                cmd = F_SETLKW64;
-               file_lock->fl_flags |= FL_OFDLCK;
-               file_lock->fl_owner = filp;
+               file_lock->c.flc_flags |= FL_OFDLCK;
+               file_lock->c.flc_owner = filp;
                fallthrough;
        case F_SETLKW64:
-               file_lock->fl_flags |= FL_SLEEP;
+               file_lock->c.flc_flags |= FL_SLEEP;
        }
 
        error = do_lock_file_wait(filp, cmd, file_lock);
@@ -2509,8 +2575,8 @@ int fcntl_setlk64(unsigned int fd, struct file *filp, unsigned int cmd,
         * lock that was just acquired. There is no need to do that when we're
         * unlocking though, or for OFD locks.
         */
-       if (!error && file_lock->fl_type != F_UNLCK &&
-           !(file_lock->fl_flags & FL_OFDLCK)) {
+       if (!error && file_lock->c.flc_type != F_UNLCK &&
+           !(file_lock->c.flc_flags & FL_OFDLCK)) {
                struct files_struct *files = current->files;
                /*
                 * We need that spin_lock here - it prevents reordering between
@@ -2521,7 +2587,7 @@ int fcntl_setlk64(unsigned int fd, struct file *filp, unsigned int cmd,
                f = files_lookup_fd_locked(files, fd);
                spin_unlock(&files->file_lock);
                if (f != filp) {
-                       file_lock->fl_type = F_UNLCK;
+                       file_lock->c.flc_type = F_UNLCK;
                        error = do_lock_file_wait(filp, cmd, file_lock);
                        WARN_ON_ONCE(error);
                        error = -EBADF;
@@ -2555,13 +2621,13 @@ void locks_remove_posix(struct file *filp, fl_owner_t owner)
                return;
 
        locks_init_lock(&lock);
-       lock.fl_type = F_UNLCK;
-       lock.fl_flags = FL_POSIX | FL_CLOSE;
+       lock.c.flc_type = F_UNLCK;
+       lock.c.flc_flags = FL_POSIX | FL_CLOSE;
        lock.fl_start = 0;
        lock.fl_end = OFFSET_MAX;
-       lock.fl_owner = owner;
-       lock.fl_pid = current->tgid;
-       lock.fl_file = filp;
+       lock.c.flc_owner = owner;
+       lock.c.flc_pid = current->tgid;
+       lock.c.flc_file = filp;
        lock.fl_ops = NULL;
        lock.fl_lmops = NULL;
 
@@ -2584,7 +2650,7 @@ locks_remove_flock(struct file *filp, struct file_lock_context *flctx)
                return;
 
        flock_make_lock(filp, &fl, F_UNLCK);
-       fl.fl_flags |= FL_CLOSE;
+       fl.c.flc_flags |= FL_CLOSE;
 
        if (filp->f_op->flock)
                filp->f_op->flock(filp, F_SETLKW, &fl);
@@ -2599,7 +2665,7 @@ locks_remove_flock(struct file *filp, struct file_lock_context *flctx)
 static void
 locks_remove_lease(struct file *filp, struct file_lock_context *ctx)
 {
-       struct file_lock *fl, *tmp;
+       struct file_lease *fl, *tmp;
        LIST_HEAD(dispose);
 
        if (list_empty(&ctx->flc_lease))
@@ -2607,8 +2673,8 @@ locks_remove_lease(struct file *filp, struct file_lock_context *ctx)
 
        percpu_down_read(&file_rwsem);
        spin_lock(&ctx->flc_lock);
-       list_for_each_entry_safe(fl, tmp, &ctx->flc_lease, fl_list)
-               if (filp == fl->fl_file)
+       list_for_each_entry_safe(fl, tmp, &ctx->flc_lease, c.flc_list)
+               if (filp == fl->c.flc_file)
                        lease_modify(fl, F_UNLCK, &dispose);
        spin_unlock(&ctx->flc_lock);
        percpu_up_read(&file_rwsem);
@@ -2652,7 +2718,7 @@ void locks_remove_file(struct file *filp)
  */
 int vfs_cancel_lock(struct file *filp, struct file_lock *fl)
 {
-       WARN_ON_ONCE(filp != fl->fl_file);
+       WARN_ON_ONCE(filp != fl->c.flc_file);
        if (filp->f_op->lock)
                return filp->f_op->lock(filp, F_CANCELLK, fl);
        return 0;
@@ -2691,69 +2757,73 @@ struct locks_iterator {
        loff_t  li_pos;
 };
 
-static void lock_get_status(struct seq_file *f, struct file_lock *fl,
+static void lock_get_status(struct seq_file *f, struct file_lock_core *flc,
                            loff_t id, char *pfx, int repeat)
 {
        struct inode *inode = NULL;
-       unsigned int fl_pid;
+       unsigned int pid;
        struct pid_namespace *proc_pidns = proc_pid_ns(file_inode(f->file)->i_sb);
-       int type;
+       int type = flc->flc_type;
+       struct file_lock *fl = file_lock(flc);
+
+       pid = locks_translate_pid(flc, proc_pidns);
 
-       fl_pid = locks_translate_pid(fl, proc_pidns);
        /*
         * If lock owner is dead (and pid is freed) or not visible in current
         * pidns, zero is shown as a pid value. Check lock info from
         * init_pid_ns to get saved lock pid value.
         */
-
-       if (fl->fl_file != NULL)
-               inode = file_inode(fl->fl_file);
+       if (flc->flc_file != NULL)
+               inode = file_inode(flc->flc_file);
 
        seq_printf(f, "%lld: ", id);
 
        if (repeat)
                seq_printf(f, "%*s", repeat - 1 + (int)strlen(pfx), pfx);
 
-       if (IS_POSIX(fl)) {
-               if (fl->fl_flags & FL_ACCESS)
+       if (flc->flc_flags & FL_POSIX) {
+               if (flc->flc_flags & FL_ACCESS)
                        seq_puts(f, "ACCESS");
-               else if (IS_OFDLCK(fl))
+               else if (flc->flc_flags & FL_OFDLCK)
                        seq_puts(f, "OFDLCK");
                else
                        seq_puts(f, "POSIX ");
 
                seq_printf(f, " %s ",
                             (inode == NULL) ? "*NOINODE*" : "ADVISORY ");
-       } else if (IS_FLOCK(fl)) {
+       } else if (flc->flc_flags & FL_FLOCK) {
                seq_puts(f, "FLOCK  ADVISORY  ");
-       } else if (IS_LEASE(fl)) {
-               if (fl->fl_flags & FL_DELEG)
+       } else if (flc->flc_flags & (FL_LEASE|FL_DELEG|FL_LAYOUT)) {
+               struct file_lease *lease = file_lease(flc);
+
+               type = target_leasetype(lease);
+
+               if (flc->flc_flags & FL_DELEG)
                        seq_puts(f, "DELEG  ");
                else
                        seq_puts(f, "LEASE  ");
 
-               if (lease_breaking(fl))
+               if (lease_breaking(lease))
                        seq_puts(f, "BREAKING  ");
-               else if (fl->fl_file)
+               else if (flc->flc_file)
                        seq_puts(f, "ACTIVE    ");
                else
                        seq_puts(f, "BREAKER   ");
        } else {
                seq_puts(f, "UNKNOWN UNKNOWN  ");
        }
-       type = IS_LEASE(fl) ? target_leasetype(fl) : fl->fl_type;
 
        seq_printf(f, "%s ", (type == F_WRLCK) ? "WRITE" :
                             (type == F_RDLCK) ? "READ" : "UNLCK");
        if (inode) {
                /* userspace relies on this representation of dev_t */
-               seq_printf(f, "%d %02x:%02x:%lu ", fl_pid,
+               seq_printf(f, "%d %02x:%02x:%lu ", pid,
                                MAJOR(inode->i_sb->s_dev),
                                MINOR(inode->i_sb->s_dev), inode->i_ino);
        } else {
-               seq_printf(f, "%d <none>:0 ", fl_pid);
+               seq_printf(f, "%d <none>:0 ", pid);
        }
-       if (IS_POSIX(fl)) {
+       if (flc->flc_flags & FL_POSIX) {
                if (fl->fl_end == OFFSET_MAX)
                        seq_printf(f, "%Ld EOF\n", fl->fl_start);
                else
@@ -2763,17 +2833,18 @@ static void lock_get_status(struct seq_file *f, struct file_lock *fl,
        }
 }
 
-static struct file_lock *get_next_blocked_member(struct file_lock *node)
+static struct file_lock_core *get_next_blocked_member(struct file_lock_core *node)
 {
-       struct file_lock *tmp;
+       struct file_lock_core *tmp;
 
        /* NULL node or root node */
-       if (node == NULL || node->fl_blocker == NULL)
+       if (node == NULL || node->flc_blocker == NULL)
                return NULL;
 
        /* Next member in the linked list could be itself */
-       tmp = list_next_entry(node, fl_blocked_member);
-       if (list_entry_is_head(tmp, &node->fl_blocker->fl_blocked_requests, fl_blocked_member)
+       tmp = list_next_entry(node, flc_blocked_member);
+       if (list_entry_is_head(tmp, &node->flc_blocker->flc_blocked_requests,
+                              flc_blocked_member)
                || tmp == node) {
                return NULL;
        }
@@ -2784,18 +2855,18 @@ static struct file_lock *get_next_blocked_member(struct file_lock *node)
 static int locks_show(struct seq_file *f, void *v)
 {
        struct locks_iterator *iter = f->private;
-       struct file_lock *cur, *tmp;
+       struct file_lock_core *cur, *tmp;
        struct pid_namespace *proc_pidns = proc_pid_ns(file_inode(f->file)->i_sb);
        int level = 0;
 
-       cur = hlist_entry(v, struct file_lock, fl_link);
+       cur = hlist_entry(v, struct file_lock_core, flc_link);
 
        if (locks_translate_pid(cur, proc_pidns) == 0)
                return 0;
 
-       /* View this crossed linked list as a binary tree, the first member of fl_blocked_requests
-        * is the left child of current node, the next silibing in fl_blocked_member is the
-        * right child, we can alse get the parent of current node from fl_blocker, so this
+       /* View this crossed linked list as a binary tree, the first member of flc_blocked_requests
+        * is the left child of current node, the next sibling in flc_blocked_member is the
+        * right child, we can also get the parent of current node from flc_blocker, so this
         * question becomes traversal of a binary tree
         */
        while (cur != NULL) {
@@ -2804,17 +2875,18 @@ static int locks_show(struct seq_file *f, void *v)
                else
                        lock_get_status(f, cur, iter->li_pos, "", level);
 
-               if (!list_empty(&cur->fl_blocked_requests)) {
+               if (!list_empty(&cur->flc_blocked_requests)) {
                        /* Turn left */
-                       cur = list_first_entry_or_null(&cur->fl_blocked_requests,
-                               struct file_lock, fl_blocked_member);
+                       cur = list_first_entry_or_null(&cur->flc_blocked_requests,
+                                                      struct file_lock_core,
+                                                      flc_blocked_member);
                        level++;
                } else {
                        /* Turn right */
                        tmp = get_next_blocked_member(cur);
                        /* Fall back to parent node */
-                       while (tmp == NULL && cur->fl_blocker != NULL) {
-                               cur = cur->fl_blocker;
+                       while (tmp == NULL && cur->flc_blocker != NULL) {
+                               cur = cur->flc_blocker;
                                level--;
                                tmp = get_next_blocked_member(cur);
                        }
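
The loop is an ordinary iterative walk of a first-child/next-sibling tree:
flc_blocked_requests holds the first child, the following entry on
flc_blocked_member is the sibling, and flc_blocker points back at the parent.
Schematically (not the kernel code itself):

    node = root;
    while (node) {
            visit(node, depth);
            if (first_child(node)) {                /* turn left */
                    node = first_child(node);
                    depth++;
            } else {
                    while (node && !next_sibling(node)) {   /* climb back up */
                            node = parent(node);            /* NULL at root */
                            depth--;
                    }
                    if (node)
                            node = next_sibling(node);      /* turn right */
            }
    }
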
@@ -2829,14 +2901,13 @@ static void __show_fd_locks(struct seq_file *f,
                        struct list_head *head, int *id,
                        struct file *filp, struct files_struct *files)
 {
-       struct file_lock *fl;
+       struct file_lock_core *fl;
 
-       list_for_each_entry(fl, head, fl_list) {
+       list_for_each_entry(fl, head, flc_list) {
 
-               if (filp != fl->fl_file)
+               if (filp != fl->flc_file)
                        continue;
-               if (fl->fl_owner != files &&
-                   fl->fl_owner != filp)
+               if (fl->flc_owner != files && fl->flc_owner != filp)
                        continue;
 
                (*id)++;
@@ -2915,6 +2986,9 @@ static int __init filelock_init(void)
        filelock_cache = kmem_cache_create("file_lock_cache",
                        sizeof(struct file_lock), 0, SLAB_PANIC, NULL);
 
+       filelease_cache = kmem_cache_create("file_lease_cache",
+                       sizeof(struct file_lease), 0, SLAB_PANIC, NULL);
+
        for_each_possible_cpu(i) {
                struct file_lock_list_struct *fll = per_cpu_ptr(&file_lock_list, i);
 
index 82aa7a35db26b3f45950f2ec0f0fb6b04036f811..e60a840999aa98315f7a6dd038fe7ae2eed749d5 100644 (file)
@@ -426,9 +426,7 @@ EXPORT_SYMBOL(mb_cache_destroy);
 
 static int __init mbcache_init(void)
 {
-       mb_entry_cache = kmem_cache_create("mbcache",
-                               sizeof(struct mb_cache_entry), 0,
-                               SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD, NULL);
+       mb_entry_cache = KMEM_CACHE(mb_cache_entry, SLAB_RECLAIM_ACCOUNT);
        if (!mb_entry_cache)
                return -ENOMEM;
        return 0;
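
KMEM_CACHE() is the preferred shorthand when a cache is named after its struct;
it derives the name, size and alignment from the type, roughly (per
include/linux/slab.h):

    #define KMEM_CACHE(__struct, __flags)                                   \
            kmem_cache_create(#__struct, sizeof(struct __struct),           \
                              __alignof__(struct __struct), (__flags), NULL)

(The SLAB_MEM_SPREAD flag dropped here and in the minix hunk below appears to
have become a no-op this cycle, hence its removal.)
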
index 73f37f298087d14b6201962c21706b6bab5a8374..7cbd2b9f4d115c4406a691b953477113a5bde4f1 100644 (file)
@@ -87,7 +87,7 @@ static int __init init_inodecache(void)
        minix_inode_cachep = kmem_cache_create("minix_inode_cache",
                                             sizeof(struct minix_inode_info),
                                             0, (SLAB_RECLAIM_ACCOUNT|
-                                               SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+                                               SLAB_ACCOUNT),
                                             init_once);
        if (minix_inode_cachep == NULL)
                return -ENOMEM;
index 64c5205e2b5e7dac35c3ee8470d0f65d3254d4fb..3c60f1eaca615a24588a3a7e1645dbaf71234774 100644 (file)
@@ -214,7 +214,7 @@ static int copy_mnt_idmap(struct uid_gid_map *map_from,
         * anything at all.
         */
        if (nr_extents == 0)
-               return 0;
+               return -EINVAL;
 
        /*
         * Here we know that nr_extents is greater than zero which means
index 738882e0766d083743698916df5f77ce1e68dea7..fa8b99a199fa70eed226ed9ea54927751849d62f 100644 (file)
@@ -605,6 +605,7 @@ alloc_new:
                                GFP_NOFS);
                bio->bi_iter.bi_sector = first_block << (blkbits - 9);
                wbc_init_bio(wbc, bio);
+               bio->bi_write_hint = inode->i_write_hint;
        }
 
        /*
index 4e0de939fea127034c24d7badb18253a9351b52e..d0c4a3e9278e444d0fd6e504ba89a3ba335c7fcf 100644 (file)
@@ -1717,7 +1717,11 @@ static inline int may_lookup(struct mnt_idmap *idmap,
 {
        if (nd->flags & LOOKUP_RCU) {
                int err = inode_permission(idmap, nd->inode, MAY_EXEC|MAY_NOT_BLOCK);
-               if (err != -ECHILD || !try_to_unlazy(nd))
+               if (!err)               // success, keep going
+                       return 0;
+               if (!try_to_unlazy(nd))
+                       return -ECHILD; // redo it all non-lazy
+               if (err != -ECHILD)     // hard error
                        return err;
        }
        return inode_permission(idmap, nd->inode, MAY_EXEC);
@@ -2676,10 +2680,8 @@ static int lookup_one_common(struct mnt_idmap *idmap,
        if (!len)
                return -EACCES;
 
-       if (unlikely(name[0] == '.')) {
-               if (len < 2 || (len == 2 && name[1] == '.'))
-                       return -EACCES;
-       }
+       if (is_dot_dotdot(name, len))
+               return -EACCES;
 
        while (len--) {
                unsigned int c = *(const unsigned char *)name++;
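
is_dot_dotdot() centralizes the '.' / '..' rejection that lookup_one_common()
used to open-code; presumably something along these lines:

    static inline bool is_dot_dotdot(const char *name, size_t len)
    {
            return len <= 2 && name[0] == '.' &&
                   (len == 1 || name[1] == '.');
    }
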
index 437f60e96d405861683f7e3596063d26c0e55038..5a51315c6678145467520800ceedc3378df5e7da 100644 (file)
@@ -4472,10 +4472,15 @@ static int do_mount_setattr(struct path *path, struct mount_kattr *kattr)
        /*
         * If this is an attached mount make sure it's located in the callers
         * mount namespace. If it's not don't let the caller interact with it.
-        * If this is a detached mount make sure it has an anonymous mount
-        * namespace attached to it, i.e. we've created it via OPEN_TREE_CLONE.
+        *
+        * If this mount doesn't have a parent it's most often simply a
+        * detached mount with an anonymous mount namespace. IOW, something
+        * that's simply not attached yet. But there are apparently also users
+        * that do change mount properties on the rootfs itself. That obviously
+        * neither has a parent nor is it a detached mount so we cannot
+        * unconditionally check for detached mounts.
         */
-       if (!(mnt_has_parent(mnt) ? check_mnt(mnt) : is_anon_ns(mnt->mnt_ns)))
+       if ((mnt_has_parent(mnt) || !is_anon_ns(mnt->mnt_ns)) && !check_mnt(mnt))
                goto out;
 
        /*
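
Restated as the condition under which mount_setattr() may proceed (an illustrative negation of the check above, not kernel code):

    /* Allowed iff the mount is attached and in the caller's mount
     * namespace, or is parentless in an anonymous namespace (i.e. a
     * detached OPEN_TREE_CLONE mount). A parentless rootfs in the
     * caller's namespace passes via check_mnt(). */
    bool allowed = check_mnt(mnt) ||
                   (!mnt_has_parent(mnt) && is_anon_ns(mnt->mnt_ns));
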
index a59e7b2edaacdcb251765793f14e87eb93a60bb3..3298c29b5548398c0026ccf6ab30c32cb290070d 100644
@@ -101,7 +101,7 @@ void netfs_rreq_unlock_folios(struct netfs_io_request *rreq)
                }
 
                if (!test_bit(NETFS_RREQ_DONT_UNLOCK_FOLIOS, &rreq->flags)) {
-                       if (folio_index(folio) == rreq->no_unlock_folio &&
+                       if (folio->index == rreq->no_unlock_folio &&
                            test_bit(NETFS_RREQ_NO_UNLOCK_FOLIO, &rreq->flags))
                                _debug("no unlock");
                        else
@@ -246,13 +246,13 @@ EXPORT_SYMBOL(netfs_readahead);
  */
 int netfs_read_folio(struct file *file, struct folio *folio)
 {
-       struct address_space *mapping = folio_file_mapping(folio);
+       struct address_space *mapping = folio->mapping;
        struct netfs_io_request *rreq;
        struct netfs_inode *ctx = netfs_inode(mapping->host);
        struct folio *sink = NULL;
        int ret;
 
-       _enter("%lx", folio_index(folio));
+       _enter("%lx", folio->index);
 
        rreq = netfs_alloc_request(mapping, file,
                                   folio_file_pos(folio), folio_size(folio),
@@ -460,7 +460,7 @@ retry:
                ret = PTR_ERR(rreq);
                goto error;
        }
-       rreq->no_unlock_folio   = folio_index(folio);
+       rreq->no_unlock_folio   = folio->index;
        __set_bit(NETFS_RREQ_NO_UNLOCK_FOLIO, &rreq->flags);
 
        ret = netfs_begin_cache_read(rreq, ctx);
@@ -518,7 +518,7 @@ int netfs_prefetch_for_write(struct file *file, struct folio *folio,
                             size_t offset, size_t len)
 {
        struct netfs_io_request *rreq;
-       struct address_space *mapping = folio_file_mapping(folio);
+       struct address_space *mapping = folio->mapping;
        struct netfs_inode *ctx = netfs_inode(mapping->host);
        unsigned long long start = folio_pos(folio);
        size_t flen = folio_size(folio);
@@ -535,7 +535,7 @@ int netfs_prefetch_for_write(struct file *file, struct folio *folio,
                goto error;
        }
 
-       rreq->no_unlock_folio = folio_index(folio);
+       rreq->no_unlock_folio = folio->index;
        __set_bit(NETFS_RREQ_NO_UNLOCK_FOLIO, &rreq->flags);
        ret = netfs_begin_cache_read(rreq, ctx);
        if (ret == -ENOMEM || ret == -EINTR || ret == -ERESTARTSYS)
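
The folio_index() calls removed throughout the netfs hunks differ from a plain folio->index read only for swap-cache folios, which netfs pagecache code never sees; hence the direct field accesses. A sketch of the helper being bypassed (simplified; the swap-cache branch body is an assumption about its exact shape):

    static inline pgoff_t folio_index(struct folio *folio)
    {
            /* Swap-cache folios keep their offset in folio->swap
             * (assumed detail); everything else uses ->index. */
            if (unlikely(folio_test_swapcache(folio)))
                    return swp_offset(folio->swap);
            return folio->index;
    }
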
index 93dc76f34e39a077a82d235fe5ec69bbc5d6e13d..9a0d32e4b422ad09518a6c6143638d0c68fb8b84 100644
@@ -221,10 +221,11 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
                if (unlikely(fault_in_iov_iter_readable(iter, part) == part))
                        break;
 
-               ret = -ENOMEM;
                folio = netfs_grab_folio_for_write(mapping, pos, part);
-               if (!folio)
+               if (IS_ERR(folio)) {
+                       ret = PTR_ERR(folio);
                        break;
+               }
 
                flen = folio_size(folio);
                offset = pos & (flen - 1);
@@ -343,7 +344,7 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter,
                        break;
                default:
                        WARN(true, "Unexpected modify type %u ix=%lx\n",
-                            howto, folio_index(folio));
+                            howto, folio->index);
                        ret = -EIO;
                        goto error_folio_unlock;
                }
@@ -476,6 +477,9 @@ ssize_t netfs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 
        _enter("%llx,%zx,%llx", iocb->ki_pos, iov_iter_count(from), i_size_read(inode));
 
+       if (!iov_iter_count(from))
+               return 0;
+
        if ((iocb->ki_flags & IOCB_DIRECT) ||
            test_bit(NETFS_ICTX_UNBUFFERED, &ictx->flags))
                return netfs_unbuffered_write_iter(iocb, from);
@@ -648,7 +652,7 @@ static void netfs_pages_written_back(struct netfs_io_request *wreq)
        xas_for_each(&xas, folio, last) {
                WARN(!folio_test_writeback(folio),
                     "bad %zx @%llx page %lx %lx\n",
-                    wreq->len, wreq->start, folio_index(folio), last);
+                    wreq->len, wreq->start, folio->index, last);
 
                if ((finfo = netfs_folio_info(folio))) {
                        /* Streaming writes cannot be redirtied whilst under
@@ -795,7 +799,7 @@ static void netfs_extend_writeback(struct address_space *mapping,
                                continue;
                        if (xa_is_value(folio))
                                break;
-                       if (folio_index(folio) != index) {
+                       if (folio->index != index) {
                                xas_reset(xas);
                                break;
                        }
@@ -901,7 +905,7 @@ static ssize_t netfs_write_back_from_locked_folio(struct address_space *mapping,
        long count = wbc->nr_to_write;
        int ret;
 
-       _enter(",%lx,%llx-%llx,%u", folio_index(folio), start, end, caching);
+       _enter(",%lx,%llx-%llx,%u", folio->index, start, end, caching);
 
        wreq = netfs_alloc_request(mapping, NULL, start, folio_size(folio),
                                   NETFS_WRITEBACK);
@@ -1047,7 +1051,7 @@ search_again:
 
        start = folio_pos(folio); /* May regress with THPs */
 
-       _debug("wback %lx", folio_index(folio));
+       _debug("wback %lx", folio->index);
 
        /* At this point we hold neither the i_pages lock nor the page lock:
         * the page may be truncated or invalidated (changing page->mapping to
index 60a40d293c87f5fd1088830f07488775b8725bb4..bee047e20f5d6933e3af452eb150e4eb2e97d941 100644
@@ -139,6 +139,9 @@ ssize_t netfs_unbuffered_write_iter(struct kiocb *iocb, struct iov_iter *from)
 
        _enter("%llx,%zx,%llx", iocb->ki_pos, iov_iter_count(from), i_size_read(inode));
 
+       if (!iov_iter_count(from))
+               return 0;
+
        trace_netfs_write_iter(iocb, from);
        netfs_stat(&netfs_n_rh_dio_write);
 
@@ -146,7 +149,7 @@ ssize_t netfs_unbuffered_write_iter(struct kiocb *iocb, struct iov_iter *from)
        if (ret < 0)
                return ret;
        ret = generic_write_checks(iocb, from);
-       if (ret < 0)
+       if (ret <= 0)
                goto out;
        ret = file_remove_privs(file);
        if (ret < 0)
index d645f8b302a27882c86c3c46e134dd5bcbc35cef..9397ed39b0b4ecbdd9c9b5860887162990c2f66d 100644
@@ -179,13 +179,14 @@ EXPORT_SYMBOL(fscache_acquire_cache);
 void fscache_put_cache(struct fscache_cache *cache,
                       enum fscache_cache_trace where)
 {
-       unsigned int debug_id = cache->debug_id;
+       unsigned int debug_id;
        bool zero;
        int ref;
 
        if (IS_ERR_OR_NULL(cache))
                return;
 
+       debug_id = cache->debug_id;
        zero = __refcount_dec_and_test(&cache->ref, &ref);
        trace_fscache_cache(debug_id, ref - 1, where);
 
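
The fscache_put_cache() change above fixes a dereference that happened before the IS_ERR_OR_NULL() guard; the same pattern in miniature (struct obj is hypothetical):

    struct obj {
            unsigned int id;
    };

    static void put_obj(struct obj *obj)
    {
            unsigned int id;

            if (IS_ERR_OR_NULL(obj))
                    return;
            id = obj->id;   /* read only after the guard, not before */
            pr_debug("put obj %u\n", id);
    }
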
index 4309edf338627eee2963e1520ab6485a483e1c5d..4261ad6c55b664a7e3da006d007de03664790641 100644
@@ -124,7 +124,7 @@ static void netfs_rreq_unmark_after_write(struct netfs_io_request *rreq,
                        /* We might have multiple writes from the same huge
                         * folio, but we mustn't unlock a folio more than once.
                         */
-                       if (have_unlocked && folio_index(folio) <= unlocked)
+                       if (have_unlocked && folio->index <= unlocked)
                                continue;
                        unlocked = folio_next_index(folio) - 1;
                        trace_netfs_folio(folio, netfs_folio_trace_end_copy);
@@ -748,6 +748,8 @@ int netfs_begin_read(struct netfs_io_request *rreq, bool sync)
 
        if (!rreq->submitted) {
                netfs_put_request(rreq, false, netfs_rreq_trace_put_no_submit);
+               if (rreq->origin == NETFS_DIO_READ)
+                       inode_dio_end(rreq->inode);
                ret = 0;
                goto out;
        }
index 0e3af37fc9243f7a0d351840904aa0ce5d91ee59..90051ced8e2a879827e54d4b50c976bc11f6b759 100644
@@ -180,7 +180,7 @@ void netfs_invalidate_folio(struct folio *folio, size_t offset, size_t length)
        struct netfs_folio *finfo = NULL;
        size_t flen = folio_size(folio);
 
-       _enter("{%lx},%zx,%zx", folio_index(folio), offset, length);
+       _enter("{%lx},%zx,%zx", folio->index, offset, length);
 
        folio_wait_fscache(folio);
 
index b4294a8aa2d4c5a75382951ce134507ec707b0e0..f1eeb49141992995fb422c830ce241e7612dc9a4 100644
@@ -108,7 +108,7 @@ struct pnfs_block_dev {
        struct pnfs_block_dev           *children;
        u64                             chunk_size;
 
-       struct bdev_handle              *bdev_handle;
+       struct file                     *bdev_file;
        u64                             disk_offset;
 
        u64                             pr_key;
index c97ebc42ec0fee283463be4ad79a8f0fece9a301..93ef7f864980b1dca830d5799c57f33cf21d2886 100644
@@ -25,17 +25,17 @@ bl_free_device(struct pnfs_block_dev *dev)
        } else {
                if (dev->pr_registered) {
                        const struct pr_ops *ops =
-                               dev->bdev_handle->bdev->bd_disk->fops->pr_ops;
+                               file_bdev(dev->bdev_file)->bd_disk->fops->pr_ops;
                        int error;
 
-                       error = ops->pr_register(dev->bdev_handle->bdev,
+                       error = ops->pr_register(file_bdev(dev->bdev_file),
                                dev->pr_key, 0, false);
                        if (error)
                                pr_err("failed to unregister PR key.\n");
                }
 
-               if (dev->bdev_handle)
-                       bdev_release(dev->bdev_handle);
+               if (dev->bdev_file)
+                       fput(dev->bdev_file);
        }
 }
 
@@ -169,7 +169,7 @@ static bool bl_map_simple(struct pnfs_block_dev *dev, u64 offset,
        map->start = dev->start;
        map->len = dev->len;
        map->disk_offset = dev->disk_offset;
-       map->bdev = dev->bdev_handle->bdev;
+       map->bdev = file_bdev(dev->bdev_file);
        return true;
 }
 
@@ -236,26 +236,26 @@ bl_parse_simple(struct nfs_server *server, struct pnfs_block_dev *d,
                struct pnfs_block_volume *volumes, int idx, gfp_t gfp_mask)
 {
        struct pnfs_block_volume *v = &volumes[idx];
-       struct bdev_handle *bdev_handle;
+       struct file *bdev_file;
        dev_t dev;
 
        dev = bl_resolve_deviceid(server, v, gfp_mask);
        if (!dev)
                return -EIO;
 
-       bdev_handle = bdev_open_by_dev(dev, BLK_OPEN_READ | BLK_OPEN_WRITE,
+       bdev_file = bdev_file_open_by_dev(dev, BLK_OPEN_READ | BLK_OPEN_WRITE,
                                       NULL, NULL);
-       if (IS_ERR(bdev_handle)) {
+       if (IS_ERR(bdev_file)) {
                printk(KERN_WARNING "pNFS: failed to open device %d:%d (%ld)\n",
-                       MAJOR(dev), MINOR(dev), PTR_ERR(bdev_handle));
-               return PTR_ERR(bdev_handle);
+                       MAJOR(dev), MINOR(dev), PTR_ERR(bdev_file));
+               return PTR_ERR(bdev_file);
        }
-       d->bdev_handle = bdev_handle;
-       d->len = bdev_nr_bytes(bdev_handle->bdev);
+       d->bdev_file = bdev_file;
+       d->len = bdev_nr_bytes(file_bdev(bdev_file));
        d->map = bl_map_simple;
 
        printk(KERN_INFO "pNFS: using block device %s\n",
-               bdev_handle->bdev->bd_disk->disk_name);
+               file_bdev(bdev_file)->bd_disk->disk_name);
        return 0;
 }
 
@@ -300,10 +300,10 @@ bl_validate_designator(struct pnfs_block_volume *v)
        }
 }
 
-static struct bdev_handle *
+static struct file *
 bl_open_path(struct pnfs_block_volume *v, const char *prefix)
 {
-       struct bdev_handle *bdev_handle;
+       struct file *bdev_file;
        const char *devname;
 
        devname = kasprintf(GFP_KERNEL, "/dev/disk/by-id/%s%*phN",
@@ -311,15 +311,15 @@ bl_open_path(struct pnfs_block_volume *v, const char *prefix)
        if (!devname)
                return ERR_PTR(-ENOMEM);
 
-       bdev_handle = bdev_open_by_path(devname, BLK_OPEN_READ | BLK_OPEN_WRITE,
+       bdev_file = bdev_file_open_by_path(devname, BLK_OPEN_READ | BLK_OPEN_WRITE,
                                        NULL, NULL);
-       if (IS_ERR(bdev_handle)) {
+       if (IS_ERR(bdev_file)) {
                pr_warn("pNFS: failed to open device %s (%ld)\n",
-                       devname, PTR_ERR(bdev_handle));
+                       devname, PTR_ERR(bdev_file));
        }
 
        kfree(devname);
-       return bdev_handle;
+       return bdev_file;
 }
 
 static int
@@ -327,7 +327,7 @@ bl_parse_scsi(struct nfs_server *server, struct pnfs_block_dev *d,
                struct pnfs_block_volume *volumes, int idx, gfp_t gfp_mask)
 {
        struct pnfs_block_volume *v = &volumes[idx];
-       struct bdev_handle *bdev_handle;
+       struct file *bdev_file;
        const struct pr_ops *ops;
        int error;
 
@@ -340,14 +340,14 @@ bl_parse_scsi(struct nfs_server *server, struct pnfs_block_dev *d,
         * On other distributions like Debian, the default SCSI by-id path will
         * point to the dm-multipath device if one exists.
         */
-       bdev_handle = bl_open_path(v, "dm-uuid-mpath-0x");
-       if (IS_ERR(bdev_handle))
-               bdev_handle = bl_open_path(v, "wwn-0x");
-       if (IS_ERR(bdev_handle))
-               return PTR_ERR(bdev_handle);
-       d->bdev_handle = bdev_handle;
-
-       d->len = bdev_nr_bytes(d->bdev_handle->bdev);
+       bdev_file = bl_open_path(v, "dm-uuid-mpath-0x");
+       if (IS_ERR(bdev_file))
+               bdev_file = bl_open_path(v, "wwn-0x");
+       if (IS_ERR(bdev_file))
+               return PTR_ERR(bdev_file);
+       d->bdev_file = bdev_file;
+
+       d->len = bdev_nr_bytes(file_bdev(d->bdev_file));
        d->map = bl_map_simple;
        d->pr_key = v->scsi.pr_key;
 
@@ -355,20 +355,20 @@ bl_parse_scsi(struct nfs_server *server, struct pnfs_block_dev *d,
                return -ENODEV;
 
        pr_info("pNFS: using block device %s (reservation key 0x%llx)\n",
-               d->bdev_handle->bdev->bd_disk->disk_name, d->pr_key);
+               file_bdev(d->bdev_file)->bd_disk->disk_name, d->pr_key);
 
-       ops = d->bdev_handle->bdev->bd_disk->fops->pr_ops;
+       ops = file_bdev(d->bdev_file)->bd_disk->fops->pr_ops;
        if (!ops) {
                pr_err("pNFS: block device %s does not support reservations.",
-                               d->bdev_handle->bdev->bd_disk->disk_name);
+                               file_bdev(d->bdev_file)->bd_disk->disk_name);
                error = -EINVAL;
                goto out_blkdev_put;
        }
 
-       error = ops->pr_register(d->bdev_handle->bdev, 0, d->pr_key, true);
+       error = ops->pr_register(file_bdev(d->bdev_file), 0, d->pr_key, true);
        if (error) {
                pr_err("pNFS: failed to register key for block device %s.",
-                               d->bdev_handle->bdev->bd_disk->disk_name);
+                               file_bdev(d->bdev_file)->bd_disk->disk_name);
                goto out_blkdev_put;
        }
 
@@ -376,7 +376,7 @@ bl_parse_scsi(struct nfs_server *server, struct pnfs_block_dev *d,
        return 0;
 
 out_blkdev_put:
-       bdev_release(d->bdev_handle);
+       fput(d->bdev_file);
        return error;
 }
 
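
The bdev_handle to bdev_file conversion above follows the 6.9 block-layer pattern: a block device is opened as a struct file, the underlying struct block_device is reached through file_bdev(), and the reference is dropped with plain fput(). A condensed sketch using the same calls as the hunks (error handling trimmed):

    static int open_and_report(dev_t dev)
    {
            struct file *bdev_file;
            struct block_device *bdev;

            bdev_file = bdev_file_open_by_dev(dev,
                            BLK_OPEN_READ | BLK_OPEN_WRITE, NULL, NULL);
            if (IS_ERR(bdev_file))
                    return PTR_ERR(bdev_file);

            bdev = file_bdev(bdev_file);
            pr_info("opened %pg: %llu bytes\n", bdev,
                    (unsigned long long)bdev_nr_bytes(bdev));

            fput(bdev_file);        /* closes the device */
            return 0;
    }
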
index 44eca51b28085d9deff764bfe6f9286388e93983..fbdc9ca80f714bdf3d3cad54e63d7c858612e5f1 100644
@@ -246,7 +246,7 @@ void nfs_free_client(struct nfs_client *clp)
        put_nfs_version(clp->cl_nfs_mod);
        kfree(clp->cl_hostname);
        kfree(clp->cl_acceptor);
-       kfree(clp);
+       kfree_rcu(clp, rcu);
 }
 EXPORT_SYMBOL_GPL(nfs_free_client);
 
@@ -1006,6 +1006,14 @@ struct nfs_server *nfs_alloc_server(void)
 }
 EXPORT_SYMBOL_GPL(nfs_alloc_server);
 
+static void delayed_free(struct rcu_head *p)
+{
+       struct nfs_server *server = container_of(p, struct nfs_server, rcu);
+
+       nfs_free_iostats(server->io_stats);
+       kfree(server);
+}
+
 /*
  * Free up a server record
  */
@@ -1031,10 +1039,9 @@ void nfs_free_server(struct nfs_server *server)
 
        ida_destroy(&server->lockowner_id);
        ida_destroy(&server->openowner_id);
-       nfs_free_iostats(server->io_stats);
        put_cred(server->cred);
-       kfree(server);
        nfs_release_automount_timer();
+       call_rcu(&server->rcu, delayed_free);
 }
 EXPORT_SYMBOL_GPL(nfs_free_server);
 
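
The conversion to call_rcu() above is the standard deferred-free pattern: lookups that found the nfs_client or nfs_server under rcu_read_lock() may still be dereferencing it, so the final kfree() must wait out a grace period. kfree_rcu() is the shorthand when no other teardown is needed, as in nfs_free_client(). Generic shape, with a hypothetical struct foo:

    struct foo {
            struct rcu_head rcu;
            /* ... payload ... */
    };

    static void foo_free_rcu(struct rcu_head *head)
    {
            struct foo *foo = container_of(head, struct foo, rcu);

            kfree(foo);     /* runs after a full RCU grace period */
    }

    /* Writer side: instead of kfree(foo) ... */
    call_rcu(&foo->rcu, foo_free_rcu);
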
index fa1a14def45cea2fc485b598831becc9e7497992..d4a42ce0c7e3dfeef19884be272a8ddd449c143f 100644
@@ -156,8 +156,8 @@ static int nfs_delegation_claim_locks(struct nfs4_state *state, const nfs4_state
        list = &flctx->flc_posix;
        spin_lock(&flctx->flc_lock);
 restart:
-       list_for_each_entry(fl, list, fl_list) {
-               if (nfs_file_open_context(fl->fl_file)->state != state)
+       for_each_file_lock(fl, list) {
+               if (nfs_file_open_context(fl->c.flc_file)->state != state)
                        continue;
                spin_unlock(&flctx->flc_lock);
                status = nfs4_lock_delegation_recall(fl, state, stateid);
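
The c.flc_* accesses above and throughout the NFS/NFSD hunks that follow come from the 6.9 split of struct file_lock: fields shared between byte-range locks and leases moved into an embedded common core. A simplified sketch of the layout (assumed from include/linux/filelock.h; several fields elided):

    struct file_lock_core {
            struct file             *flc_file;
            fl_owner_t              flc_owner;
            unsigned int            flc_flags;
            unsigned char           flc_type;
            pid_t                   flc_pid;
            struct list_head        flc_list;
            /* ... blocker lists, wait queue ... */
    };

    struct file_lock {                      /* byte-range (POSIX/flock) locks */
            struct file_lock_core   c;
            loff_t                  fl_start;
            loff_t                  fl_end;
            /* ... */
    };
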
index c8ecbe99905960ccd63b7128f273fc38543d876d..ac505671efbdb7a91a346e4f300e352261562eae 100644
@@ -1431,9 +1431,9 @@ static bool nfs_verifier_is_delegated(struct dentry *dentry)
 static void nfs_set_verifier_locked(struct dentry *dentry, unsigned long verf)
 {
        struct inode *inode = d_inode(dentry);
-       struct inode *dir = d_inode(dentry->d_parent);
+       struct inode *dir = d_inode_rcu(dentry->d_parent);
 
-       if (!nfs_verify_change_attribute(dir, verf))
+       if (!dir || !nfs_verify_change_attribute(dir, verf))
                return;
        if (inode && NFS_PROTO(inode)->have_delegation(inode, FMODE_READ))
                nfs_set_verifier_delegated(&verf);
index 8577ccf621f5be6c72f6954a49867e03be9cd87a..407c6e15afe25c4fd5a6dc452f41d55438d7fb24 100644
@@ -720,15 +720,15 @@ do_getlk(struct file *filp, int cmd, struct file_lock *fl, int is_local)
 {
        struct inode *inode = filp->f_mapping->host;
        int status = 0;
-       unsigned int saved_type = fl->fl_type;
+       unsigned int saved_type = fl->c.flc_type;
 
        /* Try local locking first */
        posix_test_lock(filp, fl);
-       if (fl->fl_type != F_UNLCK) {
+       if (fl->c.flc_type != F_UNLCK) {
                /* found a conflict */
                goto out;
        }
-       fl->fl_type = saved_type;
+       fl->c.flc_type = saved_type;
 
        if (NFS_PROTO(inode)->have_delegation(inode, FMODE_READ))
                goto out_noconflict;
@@ -740,7 +740,7 @@ do_getlk(struct file *filp, int cmd, struct file_lock *fl, int is_local)
 out:
        return status;
 out_noconflict:
-       fl->fl_type = F_UNLCK;
+       fl->c.flc_type = F_UNLCK;
        goto out;
 }
 
@@ -765,7 +765,7 @@ do_unlk(struct file *filp, int cmd, struct file_lock *fl, int is_local)
                 *      If we're signalled while cleaning up locks on process exit, we
                 *      still need to complete the unlock.
                 */
-               if (status < 0 && !(fl->fl_flags & FL_CLOSE))
+               if (status < 0 && !(fl->c.flc_flags & FL_CLOSE))
                        return status;
        }
 
@@ -832,12 +832,12 @@ int nfs_lock(struct file *filp, int cmd, struct file_lock *fl)
        int is_local = 0;
 
        dprintk("NFS: lock(%pD2, t=%x, fl=%x, r=%lld:%lld)\n",
-                       filp, fl->fl_type, fl->fl_flags,
+                       filp, fl->c.flc_type, fl->c.flc_flags,
                        (long long)fl->fl_start, (long long)fl->fl_end);
 
        nfs_inc_stats(inode, NFSIOS_VFSLOCK);
 
-       if (fl->fl_flags & FL_RECLAIM)
+       if (fl->c.flc_flags & FL_RECLAIM)
                return -ENOGRACE;
 
        if (NFS_SERVER(inode)->flags & NFS_MOUNT_LOCAL_FCNTL)
@@ -851,7 +851,7 @@ int nfs_lock(struct file *filp, int cmd, struct file_lock *fl)
 
        if (IS_GETLK(cmd))
                ret = do_getlk(filp, cmd, fl, is_local);
-       else if (fl->fl_type == F_UNLCK)
+       else if (lock_is_unlock(fl))
                ret = do_unlk(filp, cmd, fl, is_local);
        else
                ret = do_setlk(filp, cmd, fl, is_local);
@@ -869,16 +869,16 @@ int nfs_flock(struct file *filp, int cmd, struct file_lock *fl)
        int is_local = 0;
 
        dprintk("NFS: flock(%pD2, t=%x, fl=%x)\n",
-                       filp, fl->fl_type, fl->fl_flags);
+                       filp, fl->c.flc_type, fl->c.flc_flags);
 
-       if (!(fl->fl_flags & FL_FLOCK))
+       if (!(fl->c.flc_flags & FL_FLOCK))
                return -ENOLCK;
 
        if (NFS_SERVER(inode)->flags & NFS_MOUNT_LOCAL_FLOCK)
                is_local = 1;
 
        /* We're simulating flock() locks using posix locks on the server */
-       if (fl->fl_type == F_UNLCK)
+       if (lock_is_unlock(fl))
                return do_unlk(filp, cmd, fl, is_local);
        return do_setlk(filp, cmd, fl, is_local);
 }
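
The lock_is_unlock()/lock_is_read()/lock_is_write() predicates used above are thin wrappers over the new core field; a sketch (simplified from include/linux/filelock.h):

    static inline bool lock_is_unlock(struct file_lock *fl)
    {
            return fl->c.flc_type == F_UNLCK;
    }

    static inline bool lock_is_read(struct file_lock *fl)
    {
            return fl->c.flc_type == F_RDLCK;
    }

    static inline bool lock_is_write(struct file_lock *fl)
    {
            return fl->c.flc_type == F_WRLCK;
    }
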
index 2de66e4e8280a801b647dfe10c577e69192e236c..cbbe3f0193b8a34e5a64794248c2b0e4edd63fa0 100644
@@ -963,7 +963,7 @@ nfs3_proc_lock(struct file *filp, int cmd, struct file_lock *fl)
        struct nfs_open_context *ctx = nfs_file_open_context(filp);
        int status;
 
-       if (fl->fl_flags & FL_CLOSE) {
+       if (fl->c.flc_flags & FL_CLOSE) {
                l_ctx = nfs_get_lock_context(ctx);
                if (IS_ERR(l_ctx))
                        l_ctx = NULL;
index 581698f1b7b2441025d5421b3b00b4fba42b2ab8..6ff41ceb9f1c770cfe0af2f032bfa3f23d95e290 100644
@@ -330,7 +330,7 @@ extern int update_open_stateid(struct nfs4_state *state,
                                const nfs4_stateid *deleg_stateid,
                                fmode_t fmode);
 extern int nfs4_proc_setlease(struct file *file, int arg,
-                             struct file_lock **lease, void **priv);
+                             struct file_lease **lease, void **priv);
 extern int nfs4_proc_get_lease_time(struct nfs_client *clp,
                struct nfs_fsinfo *fsinfo);
 extern void nfs4_update_changeattr(struct inode *dir,
index e238abc78a13e72672fe0e635e433ce92252a669..1cd9652f3c280358209f22503ea573a906a6194e 100644
@@ -439,7 +439,7 @@ void nfs42_ssc_unregister_ops(void)
 }
 #endif /* CONFIG_NFS_V4_2 */
 
-static int nfs4_setlease(struct file *file, int arg, struct file_lock **lease,
+static int nfs4_setlease(struct file *file, int arg, struct file_lease **lease,
                         void **priv)
 {
        return nfs4_proc_setlease(file, arg, lease, priv);
index 23819a756508573efbcc61ecc0f529a7ad178e2a..815996cb27fc4589bed01827c086b32e766f0bc0 100644
@@ -6800,7 +6800,7 @@ static int _nfs4_proc_getlk(struct nfs4_state *state, int cmd, struct file_lock
        status = nfs4_call_sync(server->client, server, &msg, &arg.seq_args, &res.seq_res, 1);
        switch (status) {
                case 0:
-                       request->fl_type = F_UNLCK;
+                       request->c.flc_type = F_UNLCK;
                        break;
                case -NFS4ERR_DENIED:
                        status = 0;
@@ -7018,8 +7018,8 @@ static struct rpc_task *nfs4_do_unlck(struct file_lock *fl,
        /* Ensure this is an unlock - when canceling a lock, the
         * canceled lock is passed in, and it won't be an unlock.
         */
-       fl->fl_type = F_UNLCK;
-       if (fl->fl_flags & FL_CLOSE)
+       fl->c.flc_type = F_UNLCK;
+       if (fl->c.flc_flags & FL_CLOSE)
                set_bit(NFS_CONTEXT_UNLOCK, &ctx->flags);
 
        data = nfs4_alloc_unlockdata(fl, ctx, lsp, seqid);
@@ -7045,11 +7045,11 @@ static int nfs4_proc_unlck(struct nfs4_state *state, int cmd, struct file_lock *
        struct rpc_task *task;
        struct nfs_seqid *(*alloc_seqid)(struct nfs_seqid_counter *, gfp_t);
        int status = 0;
-       unsigned char fl_flags = request->fl_flags;
+       unsigned char saved_flags = request->c.flc_flags;
 
        status = nfs4_set_lock_state(state, request);
        /* Unlock _before_ we do the RPC call */
-       request->fl_flags |= FL_EXISTS;
+       request->c.flc_flags |= FL_EXISTS;
        /* Exclude nfs_delegation_claim_locks() */
        mutex_lock(&sp->so_delegreturn_mutex);
        /* Exclude nfs4_reclaim_open_stateid() - note nesting! */
@@ -7073,14 +7073,16 @@ static int nfs4_proc_unlck(struct nfs4_state *state, int cmd, struct file_lock *
        status = -ENOMEM;
        if (IS_ERR(seqid))
                goto out;
-       task = nfs4_do_unlck(request, nfs_file_open_context(request->fl_file), lsp, seqid);
+       task = nfs4_do_unlck(request,
+                            nfs_file_open_context(request->c.flc_file),
+                            lsp, seqid);
        status = PTR_ERR(task);
        if (IS_ERR(task))
                goto out;
        status = rpc_wait_for_completion_task(task);
        rpc_put_task(task);
 out:
-       request->fl_flags = fl_flags;
+       request->c.flc_flags = saved_flags;
        trace_nfs4_unlock(request, state, F_SETLK, status);
        return status;
 }
@@ -7191,7 +7193,7 @@ static void nfs4_lock_done(struct rpc_task *task, void *calldata)
                renew_lease(NFS_SERVER(d_inode(data->ctx->dentry)),
                                data->timestamp);
                if (data->arg.new_lock && !data->cancelled) {
-                       data->fl.fl_flags &= ~(FL_SLEEP | FL_ACCESS);
+                       data->fl.c.flc_flags &= ~(FL_SLEEP | FL_ACCESS);
                        if (locks_lock_inode_wait(lsp->ls_state->inode, &data->fl) < 0)
                                goto out_restart;
                }
@@ -7292,7 +7294,8 @@ static int _nfs4_do_setlk(struct nfs4_state *state, int cmd, struct file_lock *f
        if (nfs_server_capable(state->inode, NFS_CAP_MOVEABLE))
                task_setup_data.flags |= RPC_TASK_MOVEABLE;
 
-       data = nfs4_alloc_lockdata(fl, nfs_file_open_context(fl->fl_file),
+       data = nfs4_alloc_lockdata(fl,
+                                  nfs_file_open_context(fl->c.flc_file),
                                   fl->fl_u.nfs4_fl.owner, GFP_KERNEL);
        if (data == NULL)
                return -ENOMEM;
@@ -7398,10 +7401,10 @@ static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock
 {
        struct nfs_inode *nfsi = NFS_I(state->inode);
        struct nfs4_state_owner *sp = state->owner;
-       unsigned char fl_flags = request->fl_flags;
+       unsigned char flags = request->c.flc_flags;
        int status;
 
-       request->fl_flags |= FL_ACCESS;
+       request->c.flc_flags |= FL_ACCESS;
        status = locks_lock_inode_wait(state->inode, request);
        if (status < 0)
                goto out;
@@ -7410,7 +7413,7 @@ static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock
        if (test_bit(NFS_DELEGATED_STATE, &state->flags)) {
                /* Yes: cache locks! */
                /* ...but avoid races with delegation recall... */
-               request->fl_flags = fl_flags & ~FL_SLEEP;
+               request->c.flc_flags = flags & ~FL_SLEEP;
                status = locks_lock_inode_wait(state->inode, request);
                up_read(&nfsi->rwsem);
                mutex_unlock(&sp->so_delegreturn_mutex);
@@ -7420,7 +7423,7 @@ static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock
        mutex_unlock(&sp->so_delegreturn_mutex);
        status = _nfs4_do_setlk(state, cmd, request, NFS_LOCK_NEW);
 out:
-       request->fl_flags = fl_flags;
+       request->c.flc_flags = flags;
        return status;
 }
 
@@ -7562,7 +7565,7 @@ nfs4_proc_lock(struct file *filp, int cmd, struct file_lock *request)
        if (!(IS_SETLK(cmd) || IS_SETLKW(cmd)))
                return -EINVAL;
 
-       if (request->fl_type == F_UNLCK) {
+       if (lock_is_unlock(request)) {
                if (state != NULL)
                        return nfs4_proc_unlck(state, cmd, request);
                return 0;
@@ -7571,7 +7574,7 @@ nfs4_proc_lock(struct file *filp, int cmd, struct file_lock *request)
        if (state == NULL)
                return -ENOLCK;
 
-       if ((request->fl_flags & FL_POSIX) &&
+       if ((request->c.flc_flags & FL_POSIX) &&
            !test_bit(NFS_STATE_POSIX_LOCKS, &state->flags))
                return -ENOLCK;
 
@@ -7579,7 +7582,7 @@ nfs4_proc_lock(struct file *filp, int cmd, struct file_lock *request)
         * Don't rely on the VFS having checked the file open mode,
         * since it won't do this for flock() locks.
         */
-       switch (request->fl_type) {
+       switch (request->c.flc_type) {
        case F_RDLCK:
                if (!(filp->f_mode & FMODE_READ))
                        return -EBADF;
@@ -7601,7 +7604,7 @@ static int nfs4_delete_lease(struct file *file, void **priv)
        return generic_setlease(file, F_UNLCK, NULL, priv);
 }
 
-static int nfs4_add_lease(struct file *file, int arg, struct file_lock **lease,
+static int nfs4_add_lease(struct file *file, int arg, struct file_lease **lease,
                          void **priv)
 {
        struct inode *inode = file_inode(file);
@@ -7619,7 +7622,7 @@ static int nfs4_add_lease(struct file *file, int arg, struct file_lock **lease,
        return -EAGAIN;
 }
 
-int nfs4_proc_setlease(struct file *file, int arg, struct file_lock **lease,
+int nfs4_proc_setlease(struct file *file, int arg, struct file_lease **lease,
                       void **priv)
 {
        switch (arg) {
index 9a5d911a7edc77cae53cac45aa43f196b32caf20..8cfabdbda33694912652eeb736126c673bdb745a 100644
@@ -847,15 +847,15 @@ void nfs4_close_sync(struct nfs4_state *state, fmode_t fmode)
  */
 static struct nfs4_lock_state *
 __nfs4_find_lock_state(struct nfs4_state *state,
-                      fl_owner_t fl_owner, fl_owner_t fl_owner2)
+                      fl_owner_t owner, fl_owner_t owner2)
 {
        struct nfs4_lock_state *pos, *ret = NULL;
        list_for_each_entry(pos, &state->lock_states, ls_locks) {
-               if (pos->ls_owner == fl_owner) {
+               if (pos->ls_owner == owner) {
                        ret = pos;
                        break;
                }
-               if (pos->ls_owner == fl_owner2)
+               if (pos->ls_owner == owner2)
                        ret = pos;
        }
        if (ret)
@@ -868,7 +868,7 @@ __nfs4_find_lock_state(struct nfs4_state *state,
  * exists, return an uninitialized one.
  *
  */
-static struct nfs4_lock_state *nfs4_alloc_lock_state(struct nfs4_state *state, fl_owner_t fl_owner)
+static struct nfs4_lock_state *nfs4_alloc_lock_state(struct nfs4_state *state, fl_owner_t owner)
 {
        struct nfs4_lock_state *lsp;
        struct nfs_server *server = state->owner->so_server;
@@ -879,7 +879,7 @@ static struct nfs4_lock_state *nfs4_alloc_lock_state(struct nfs4_state *state, f
        nfs4_init_seqid_counter(&lsp->ls_seqid);
        refcount_set(&lsp->ls_count, 1);
        lsp->ls_state = state;
-       lsp->ls_owner = fl_owner;
+       lsp->ls_owner = owner;
        lsp->ls_seqid.owner_id = ida_alloc(&server->lockowner_id, GFP_KERNEL_ACCOUNT);
        if (lsp->ls_seqid.owner_id < 0)
                goto out_free;
@@ -980,7 +980,7 @@ int nfs4_set_lock_state(struct nfs4_state *state, struct file_lock *fl)
 
        if (fl->fl_ops != NULL)
                return 0;
-       lsp = nfs4_get_lock_state(state, fl->fl_owner);
+       lsp = nfs4_get_lock_state(state, fl->c.flc_owner);
        if (lsp == NULL)
                return -ENOMEM;
        fl->fl_u.nfs4_fl.owner = lsp;
@@ -993,7 +993,7 @@ static int nfs4_copy_lock_stateid(nfs4_stateid *dst,
                const struct nfs_lock_context *l_ctx)
 {
        struct nfs4_lock_state *lsp;
-       fl_owner_t fl_owner, fl_flock_owner;
+       fl_owner_t owner, fl_flock_owner;
        int ret = -ENOENT;
 
        if (l_ctx == NULL)
@@ -1002,11 +1002,11 @@ static int nfs4_copy_lock_stateid(nfs4_stateid *dst,
        if (test_bit(LK_STATE_IN_USE, &state->flags) == 0)
                goto out;
 
-       fl_owner = l_ctx->lockowner;
+       owner = l_ctx->lockowner;
        fl_flock_owner = l_ctx->open_context->flock_owner;
 
        spin_lock(&state->state_lock);
-       lsp = __nfs4_find_lock_state(state, fl_owner, fl_flock_owner);
+       lsp = __nfs4_find_lock_state(state, owner, fl_flock_owner);
        if (lsp && test_bit(NFS_LOCK_LOST, &lsp->ls_flags))
                ret = -EIO;
        else if (lsp != NULL && test_bit(NFS_LOCK_INITIALIZED, &lsp->ls_flags) != 0) {
@@ -1529,8 +1529,8 @@ static int nfs4_reclaim_locks(struct nfs4_state *state, const struct nfs4_state_
        down_write(&nfsi->rwsem);
        spin_lock(&flctx->flc_lock);
 restart:
-       list_for_each_entry(fl, list, fl_list) {
-               if (nfs_file_open_context(fl->fl_file)->state != state)
+       for_each_file_lock(fl, list) {
+               if (nfs_file_open_context(fl->c.flc_file)->state != state)
                        continue;
                spin_unlock(&flctx->flc_lock);
                status = ops->recover_lock(state, fl);
index d27919d7241d389b257939c26cc134d4ff1be76f..fd7cb15b08b27628f205cdb4284b9035a7d3172a 100644
@@ -699,7 +699,7 @@ DECLARE_EVENT_CLASS(nfs4_lock_event,
 
                        __entry->error = error < 0 ? -error : 0;
                        __entry->cmd = cmd;
-                       __entry->type = request->fl_type;
+                       __entry->type = request->c.flc_type;
                        __entry->start = request->fl_start;
                        __entry->end = request->fl_end;
                        __entry->dev = inode->i_sb->s_dev;
@@ -771,7 +771,7 @@ TRACE_EVENT(nfs4_set_lock,
 
                        __entry->error = error < 0 ? -error : 0;
                        __entry->cmd = cmd;
-                       __entry->type = request->fl_type;
+                       __entry->type = request->c.flc_type;
                        __entry->start = request->fl_start;
                        __entry->end = request->fl_end;
                        __entry->dev = inode->i_sb->s_dev;
index 69406e60f391e274c83a215bdbd72fc36c8ad78e..1416099dfcd159a9cb4a0ffb21dbd826ad940a07 100644
@@ -1305,7 +1305,7 @@ static void encode_link(struct xdr_stream *xdr, const struct qstr *name, struct
 
 static inline int nfs4_lock_type(struct file_lock *fl, int block)
 {
-       if (fl->fl_type == F_RDLCK)
+       if (lock_is_read(fl))
                return block ? NFS4_READW_LT : NFS4_READ_LT;
        return block ? NFS4_WRITEW_LT : NFS4_WRITE_LT;
 }
@@ -5052,10 +5052,10 @@ static int decode_lock_denied (struct xdr_stream *xdr, struct file_lock *fl)
                fl->fl_end = fl->fl_start + (loff_t)length - 1;
                if (length == ~(uint64_t)0)
                        fl->fl_end = OFFSET_MAX;
-               fl->fl_type = F_WRLCK;
+               fl->c.flc_type = F_WRLCK;
                if (type & 1)
-                       fl->fl_type = F_RDLCK;
-               fl->fl_pid = 0;
+                       fl->c.flc_type = F_RDLCK;
+               fl->c.flc_pid = 0;
        }
        p = xdr_decode_hyper(p, &clientid); /* read 8 bytes */
        namelen = be32_to_cpup(p); /* read 4 bytes */  /* have read all 32 bytes now */
index bb79d3a886ae83d15395371ec735b7d0e6075bae..84bb852645728b3edf427c5ac1020e38f329f325 100644
@@ -1301,7 +1301,7 @@ static bool
 is_whole_file_wrlock(struct file_lock *fl)
 {
        return fl->fl_start == 0 && fl->fl_end == OFFSET_MAX &&
-                       fl->fl_type == F_WRLCK;
+                       lock_is_write(fl);
 }
 
 /* If we know the page is up to date, and we're not using byte range locks (or
@@ -1335,13 +1335,13 @@ static int nfs_can_extend_write(struct file *file, struct folio *folio,
        spin_lock(&flctx->flc_lock);
        if (!list_empty(&flctx->flc_posix)) {
                fl = list_first_entry(&flctx->flc_posix, struct file_lock,
-                                       fl_list);
+                                       c.flc_list);
                if (is_whole_file_wrlock(fl))
                        ret = 1;
        } else if (!list_empty(&flctx->flc_flock)) {
                fl = list_first_entry(&flctx->flc_flock, struct file_lock,
-                                       fl_list);
-               if (fl->fl_type == F_WRLCK)
+                                       c.flc_list);
+               if (lock_is_write(fl))
                        ret = 1;
        }
        spin_unlock(&flctx->flc_lock);
index 9cb7f0c33df587875773bb91f7d20de73692f912..b86d8494052cd8b0d70a5b4bdd756f3dda3404ec 100644
@@ -662,8 +662,8 @@ nfsd_file_lease_notifier_call(struct notifier_block *nb, unsigned long arg,
        struct file_lock *fl = data;
 
        /* Only close files for F_SETLEASE leases */
-       if (fl->fl_flags & FL_LEASE)
-               nfsd_file_close_inode(file_inode(fl->fl_file));
+       if (fl->c.flc_flags & FL_LEASE)
+               nfsd_file_close_inode(file_inode(fl->c.flc_file));
        return 0;
 }
 
index 926c29879c6ab892e4d12329169eb1b5b43d2649..32d23ef3e5de5b4c0f50322ebbbc272439e37d76 100644
@@ -674,7 +674,7 @@ static void nfs4_xdr_enc_cb_notify_lock(struct rpc_rqst *req,
        const struct nfsd4_callback *cb = data;
        const struct nfsd4_blocked_lock *nbl =
                container_of(cb, struct nfsd4_blocked_lock, nbl_cb);
-       struct nfs4_lockowner *lo = (struct nfs4_lockowner *)nbl->nbl_lock.fl_owner;
+       struct nfs4_lockowner *lo = (struct nfs4_lockowner *)nbl->nbl_lock.c.flc_owner;
        struct nfs4_cb_compound_hdr hdr = {
                .ident = 0,
                .minorversion = cb->cb_clp->cl_minorversion,
index 5e8096bc5eaa452c9ef1d24cbfa58cfb2d9b35d5..4c0d00bdfbb1f3bdc7c3affdeb45bf9e8d7a0b4b 100644
@@ -25,7 +25,7 @@ static struct kmem_cache *nfs4_layout_cache;
 static struct kmem_cache *nfs4_layout_stateid_cache;
 
 static const struct nfsd4_callback_ops nfsd4_cb_layout_ops;
-static const struct lock_manager_operations nfsd4_layouts_lm_ops;
+static const struct lease_manager_operations nfsd4_layouts_lm_ops;
 
 const struct nfsd4_layout_ops *nfsd4_layout_ops[LAYOUT_TYPE_MAX] =  {
 #ifdef CONFIG_NFSD_FLEXFILELAYOUT
@@ -170,7 +170,7 @@ nfsd4_free_layout_stateid(struct nfs4_stid *stid)
        spin_unlock(&fp->fi_lock);
 
        if (!nfsd4_layout_ops[ls->ls_layout_type]->disable_recalls)
-               vfs_setlease(ls->ls_file->nf_file, F_UNLCK, NULL, (void **)&ls);
+               kernel_setlease(ls->ls_file->nf_file, F_UNLCK, NULL, (void **)&ls);
        nfsd_file_put(ls->ls_file);
 
        if (ls->ls_recalled)
@@ -182,27 +182,26 @@ nfsd4_free_layout_stateid(struct nfs4_stid *stid)
 static int
 nfsd4_layout_setlease(struct nfs4_layout_stateid *ls)
 {
-       struct file_lock *fl;
+       struct file_lease *fl;
        int status;
 
        if (nfsd4_layout_ops[ls->ls_layout_type]->disable_recalls)
                return 0;
 
-       fl = locks_alloc_lock();
+       fl = locks_alloc_lease();
        if (!fl)
                return -ENOMEM;
-       locks_init_lock(fl);
+       locks_init_lease(fl);
        fl->fl_lmops = &nfsd4_layouts_lm_ops;
-       fl->fl_flags = FL_LAYOUT;
-       fl->fl_type = F_RDLCK;
-       fl->fl_end = OFFSET_MAX;
-       fl->fl_owner = ls;
-       fl->fl_pid = current->tgid;
-       fl->fl_file = ls->ls_file->nf_file;
-
-       status = vfs_setlease(fl->fl_file, fl->fl_type, &fl, NULL);
+       fl->c.flc_flags = FL_LAYOUT;
+       fl->c.flc_type = F_RDLCK;
+       fl->c.flc_owner = ls;
+       fl->c.flc_pid = current->tgid;
+       fl->c.flc_file = ls->ls_file->nf_file;
+
+       status = kernel_setlease(fl->c.flc_file, fl->c.flc_type, &fl, NULL);
        if (status) {
-               locks_free_lock(fl);
+               locks_free_lease(fl);
                return status;
        }
        BUG_ON(fl != NULL);
@@ -723,7 +722,7 @@ static const struct nfsd4_callback_ops nfsd4_cb_layout_ops = {
 };
 
 static bool
-nfsd4_layout_lm_break(struct file_lock *fl)
+nfsd4_layout_lm_break(struct file_lease *fl)
 {
        /*
         * We don't want the locks code to timeout the lease for us;
@@ -731,19 +730,19 @@ nfsd4_layout_lm_break(struct file_lock *fl)
         * in time:
         */
        fl->fl_break_time = 0;
-       nfsd4_recall_file_layout(fl->fl_owner);
+       nfsd4_recall_file_layout(fl->c.flc_owner);
        return false;
 }
 
 static int
-nfsd4_layout_lm_change(struct file_lock *onlist, int arg,
+nfsd4_layout_lm_change(struct file_lease *onlist, int arg,
                struct list_head *dispose)
 {
        BUG_ON(!(arg & F_UNLCK));
        return lease_modify(onlist, arg, dispose);
 }
 
-static const struct lock_manager_operations nfsd4_layouts_lm_ops = {
+static const struct lease_manager_operations nfsd4_layouts_lm_ops = {
        .lm_break       = nfsd4_layout_lm_break,
        .lm_change      = nfsd4_layout_lm_change,
 };
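
Leases likewise get their own type and ops table, which is why the layout and delegation code above switches to struct file_lease, locks_alloc_lease(), locks_free_lease() and kernel_setlease(). A sketch of the split (assumed shape; fields and ops elided):

    struct file_lease {
            struct file_lock_core   c;      /* same shared core */
            unsigned long           fl_break_time;
            unsigned long           fl_downgrade_time;
            const struct lease_manager_operations *fl_lmops;
            /* ... */
    };

    struct lease_manager_operations {
            bool (*lm_break)(struct file_lease *);
            int (*lm_change)(struct file_lease *, int, struct list_head *);
            void (*lm_setup)(struct file_lease *, void **);
            bool (*lm_breaker_owns_lease)(struct file_lease *);
    };
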
index 2fa54cfd4882307e87e9c109070ecdc25a3db401..9257425cbd1a0d0e1dc87b9497e04409a23bd06a 100644
@@ -1249,7 +1249,7 @@ static void nfs4_unlock_deleg_lease(struct nfs4_delegation *dp)
 
        WARN_ON_ONCE(!fp->fi_delegees);
 
-       vfs_setlease(nf->nf_file, F_UNLCK, NULL, (void **)&dp);
+       kernel_setlease(nf->nf_file, F_UNLCK, NULL, (void **)&dp);
        put_deleg_file(fp);
 }
 
@@ -4922,9 +4922,9 @@ static void nfsd_break_one_deleg(struct nfs4_delegation *dp)
 
 /* Called from break_lease() with flc_lock held. */
 static bool
-nfsd_break_deleg_cb(struct file_lock *fl)
+nfsd_break_deleg_cb(struct file_lease *fl)
 {
-       struct nfs4_delegation *dp = (struct nfs4_delegation *)fl->fl_owner;
+       struct nfs4_delegation *dp = (struct nfs4_delegation *) fl->c.flc_owner;
        struct nfs4_file *fp = dp->dl_stid.sc_file;
        struct nfs4_client *clp = dp->dl_stid.sc_client;
        struct nfsd_net *nn;
@@ -4945,10 +4945,8 @@ nfsd_break_deleg_cb(struct file_lock *fl)
         */
        fl->fl_break_time = 0;
 
-       spin_lock(&fp->fi_lock);
        fp->fi_had_conflict = true;
        nfsd_break_one_deleg(dp);
-       spin_unlock(&fp->fi_lock);
        return false;
 }
 
@@ -4960,9 +4958,9 @@ nfsd_break_deleg_cb(struct file_lock *fl)
  *   %true: Lease conflict was resolved
  *   %false: Lease conflict was not resolved.
  */
-static bool nfsd_breaker_owns_lease(struct file_lock *fl)
+static bool nfsd_breaker_owns_lease(struct file_lease *fl)
 {
-       struct nfs4_delegation *dl = fl->fl_owner;
+       struct nfs4_delegation *dl = fl->c.flc_owner;
        struct svc_rqst *rqst;
        struct nfs4_client *clp;
 
@@ -4977,10 +4975,10 @@ static bool nfsd_breaker_owns_lease(struct file_lock *fl)
 }
 
 static int
-nfsd_change_deleg_cb(struct file_lock *onlist, int arg,
+nfsd_change_deleg_cb(struct file_lease *onlist, int arg,
                     struct list_head *dispose)
 {
-       struct nfs4_delegation *dp = (struct nfs4_delegation *)onlist->fl_owner;
+       struct nfs4_delegation *dp = (struct nfs4_delegation *) onlist->c.flc_owner;
        struct nfs4_client *clp = dp->dl_stid.sc_client;
 
        if (arg & F_UNLCK) {
@@ -4991,7 +4989,7 @@ nfsd_change_deleg_cb(struct file_lock *onlist, int arg,
                return -EAGAIN;
 }
 
-static const struct lock_manager_operations nfsd_lease_mng_ops = {
+static const struct lease_manager_operations nfsd_lease_mng_ops = {
        .lm_breaker_owns_lease = nfsd_breaker_owns_lease,
        .lm_break = nfsd_break_deleg_cb,
        .lm_change = nfsd_change_deleg_cb,
@@ -5331,21 +5329,20 @@ static bool nfsd4_cb_channel_good(struct nfs4_client *clp)
        return clp->cl_minorversion && clp->cl_cb_state == NFSD4_CB_UNKNOWN;
 }
 
-static struct file_lock *nfs4_alloc_init_lease(struct nfs4_delegation *dp,
+static struct file_lease *nfs4_alloc_init_lease(struct nfs4_delegation *dp,
                                                int flag)
 {
-       struct file_lock *fl;
+       struct file_lease *fl;
 
-       fl = locks_alloc_lock();
+       fl = locks_alloc_lease();
        if (!fl)
                return NULL;
        fl->fl_lmops = &nfsd_lease_mng_ops;
-       fl->fl_flags = FL_DELEG;
-       fl->fl_type = flag == NFS4_OPEN_DELEGATE_READ? F_RDLCK: F_WRLCK;
-       fl->fl_end = OFFSET_MAX;
-       fl->fl_owner = (fl_owner_t)dp;
-       fl->fl_pid = current->tgid;
-       fl->fl_file = dp->dl_stid.sc_file->fi_deleg_file->nf_file;
+       fl->c.flc_flags = FL_DELEG;
+       fl->c.flc_type = flag == NFS4_OPEN_DELEGATE_READ? F_RDLCK: F_WRLCK;
+       fl->c.flc_owner = (fl_owner_t)dp;
+       fl->c.flc_pid = current->tgid;
+       fl->c.flc_file = dp->dl_stid.sc_file->fi_deleg_file->nf_file;
        return fl;
 }
 
@@ -5463,7 +5460,7 @@ nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp,
        struct nfs4_clnt_odstate *odstate = stp->st_clnt_odstate;
        struct nfs4_delegation *dp;
        struct nfsd_file *nf = NULL;
-       struct file_lock *fl;
+       struct file_lease *fl;
        u32 dl_type;
 
        /*
@@ -5533,9 +5530,10 @@ nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp,
        if (!fl)
                goto out_clnt_odstate;
 
-       status = vfs_setlease(fp->fi_deleg_file->nf_file, fl->fl_type, &fl, NULL);
+       status = kernel_setlease(fp->fi_deleg_file->nf_file,
+                                     fl->c.flc_type, &fl, NULL);
        if (fl)
-               locks_free_lock(fl);
+               locks_free_lease(fl);
        if (status)
                goto out_clnt_odstate;
 
@@ -5557,12 +5555,13 @@ nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp,
        if (status)
                goto out_unlock;
 
+       status = -EAGAIN;
+       if (fp->fi_had_conflict)
+               goto out_unlock;
+
        spin_lock(&state_lock);
        spin_lock(&fp->fi_lock);
-       if (fp->fi_had_conflict)
-               status = -EAGAIN;
-       else
-               status = hash_delegation_locked(dp, fp);
+       status = hash_delegation_locked(dp, fp);
        spin_unlock(&fp->fi_lock);
        spin_unlock(&state_lock);
 
@@ -5571,7 +5570,7 @@ nfs4_set_delegation(struct nfsd4_open *open, struct nfs4_ol_stateid *stp,
 
        return dp;
 out_unlock:
-       vfs_setlease(fp->fi_deleg_file->nf_file, F_UNLCK, NULL, (void **)&dp);
+       kernel_setlease(fp->fi_deleg_file->nf_file, F_UNLCK, NULL, (void **)&dp);
 out_clnt_odstate:
        put_clnt_odstate(dp->dl_clnt_odstate);
        nfs4_put_stid(&dp->dl_stid);
@@ -7149,7 +7148,7 @@ nfsd4_lm_put_owner(fl_owner_t owner)
 static bool
 nfsd4_lm_lock_expirable(struct file_lock *cfl)
 {
-       struct nfs4_lockowner *lo = (struct nfs4_lockowner *)cfl->fl_owner;
+       struct nfs4_lockowner *lo = (struct nfs4_lockowner *) cfl->c.flc_owner;
        struct nfs4_client *clp = lo->lo_owner.so_client;
        struct nfsd_net *nn;
 
@@ -7171,7 +7170,7 @@ nfsd4_lm_expire_lock(void)
 static void
 nfsd4_lm_notify(struct file_lock *fl)
 {
-       struct nfs4_lockowner           *lo = (struct nfs4_lockowner *)fl->fl_owner;
+       struct nfs4_lockowner           *lo = (struct nfs4_lockowner *) fl->c.flc_owner;
        struct net                      *net = lo->lo_owner.so_client->net;
        struct nfsd_net                 *nn = net_generic(net, nfsd_net_id);
        struct nfsd4_blocked_lock       *nbl = container_of(fl,
@@ -7208,7 +7207,7 @@ nfs4_set_lock_denied(struct file_lock *fl, struct nfsd4_lock_denied *deny)
        struct nfs4_lockowner *lo;
 
        if (fl->fl_lmops == &nfsd_posix_mng_ops) {
-               lo = (struct nfs4_lockowner *) fl->fl_owner;
+               lo = (struct nfs4_lockowner *) fl->c.flc_owner;
                xdr_netobj_dup(&deny->ld_owner, &lo->lo_owner.so_owner,
                                                GFP_KERNEL);
                if (!deny->ld_owner.data)
@@ -7227,7 +7226,7 @@ nevermind:
        if (fl->fl_end != NFS4_MAX_UINT64)
                deny->ld_length = fl->fl_end - fl->fl_start + 1;        
        deny->ld_type = NFS4_READ_LT;
-       if (fl->fl_type != F_RDLCK)
+       if (fl->c.flc_type != F_RDLCK)
                deny->ld_type = NFS4_WRITE_LT;
 }
 
@@ -7493,8 +7492,8 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
        int lkflg;
        int err;
        bool new = false;
-       unsigned char fl_type;
-       unsigned int fl_flags = FL_POSIX;
+       unsigned char type;
+       unsigned int flags = FL_POSIX;
        struct net *net = SVC_NET(rqstp);
        struct nfsd_net *nn = net_generic(net, nfsd_net_id);
 
@@ -7557,14 +7556,14 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
                goto out;
 
        if (lock->lk_reclaim)
-               fl_flags |= FL_RECLAIM;
+               flags |= FL_RECLAIM;
 
        fp = lock_stp->st_stid.sc_file;
        switch (lock->lk_type) {
                case NFS4_READW_LT:
                        if (nfsd4_has_session(cstate) ||
                            exportfs_lock_op_is_async(sb->s_export_op))
-                               fl_flags |= FL_SLEEP;
+                               flags |= FL_SLEEP;
                        fallthrough;
                case NFS4_READ_LT:
                        spin_lock(&fp->fi_lock);
@@ -7572,12 +7571,12 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
                        if (nf)
                                get_lock_access(lock_stp, NFS4_SHARE_ACCESS_READ);
                        spin_unlock(&fp->fi_lock);
-                       fl_type = F_RDLCK;
+                       type = F_RDLCK;
                        break;
                case NFS4_WRITEW_LT:
                        if (nfsd4_has_session(cstate) ||
                            exportfs_lock_op_is_async(sb->s_export_op))
-                               fl_flags |= FL_SLEEP;
+                               flags |= FL_SLEEP;
                        fallthrough;
                case NFS4_WRITE_LT:
                        spin_lock(&fp->fi_lock);
@@ -7585,7 +7584,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
                        if (nf)
                                get_lock_access(lock_stp, NFS4_SHARE_ACCESS_WRITE);
                        spin_unlock(&fp->fi_lock);
-                       fl_type = F_WRLCK;
+                       type = F_WRLCK;
                        break;
                default:
                        status = nfserr_inval;
@@ -7605,7 +7604,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
         * on those filesystems:
         */
        if (!exportfs_lock_op_is_async(sb->s_export_op))
-               fl_flags &= ~FL_SLEEP;
+               flags &= ~FL_SLEEP;
 
        nbl = find_or_allocate_block(lock_sop, &fp->fi_fhandle, nn);
        if (!nbl) {
@@ -7615,11 +7614,11 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
        }
 
        file_lock = &nbl->nbl_lock;
-       file_lock->fl_type = fl_type;
-       file_lock->fl_owner = (fl_owner_t)lockowner(nfs4_get_stateowner(&lock_sop->lo_owner));
-       file_lock->fl_pid = current->tgid;
-       file_lock->fl_file = nf->nf_file;
-       file_lock->fl_flags = fl_flags;
+       file_lock->c.flc_type = type;
+       file_lock->c.flc_owner = (fl_owner_t)lockowner(nfs4_get_stateowner(&lock_sop->lo_owner));
+       file_lock->c.flc_pid = current->tgid;
+       file_lock->c.flc_file = nf->nf_file;
+       file_lock->c.flc_flags = flags;
        file_lock->fl_lmops = &nfsd_posix_mng_ops;
        file_lock->fl_start = lock->lk_offset;
        file_lock->fl_end = last_byte_offset(lock->lk_offset, lock->lk_length);
@@ -7632,7 +7631,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
                goto out;
        }
 
-       if (fl_flags & FL_SLEEP) {
+       if (flags & FL_SLEEP) {
                nbl->nbl_time = ktime_get_boottime_seconds();
                spin_lock(&nn->blocked_locks_lock);
                list_add_tail(&nbl->nbl_list, &lock_sop->lo_blocked);
@@ -7669,7 +7668,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 out:
        if (nbl) {
                /* dequeue it if we queued it before */
-               if (fl_flags & FL_SLEEP) {
+               if (flags & FL_SLEEP) {
                        spin_lock(&nn->blocked_locks_lock);
                        if (!list_empty(&nbl->nbl_list) &&
                            !list_empty(&nbl->nbl_lru)) {
@@ -7737,9 +7736,9 @@ static __be32 nfsd_test_lock(struct svc_rqst *rqstp, struct svc_fh *fhp, struct
        err = nfserrno(nfsd_open_break_lease(inode, NFSD_MAY_READ));
        if (err)
                goto out;
-       lock->fl_file = nf->nf_file;
+       lock->c.flc_file = nf->nf_file;
        err = nfserrno(vfs_test_lock(nf->nf_file, lock));
-       lock->fl_file = NULL;
+       lock->c.flc_file = NULL;
 out:
        inode_unlock(inode);
        nfsd_file_put(nf);
@@ -7784,11 +7783,11 @@ nfsd4_lockt(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
        switch (lockt->lt_type) {
                case NFS4_READ_LT:
                case NFS4_READW_LT:
-                       file_lock->fl_type = F_RDLCK;
+                       file_lock->c.flc_type = F_RDLCK;
                        break;
                case NFS4_WRITE_LT:
                case NFS4_WRITEW_LT:
-                       file_lock->fl_type = F_WRLCK;
+                       file_lock->c.flc_type = F_WRLCK;
                        break;
                default:
                        dprintk("NFSD: nfs4_lockt: bad lock type!\n");
@@ -7798,9 +7797,9 @@ nfsd4_lockt(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 
        lo = find_lockowner_str(cstate->clp, &lockt->lt_owner);
        if (lo)
-               file_lock->fl_owner = (fl_owner_t)lo;
-       file_lock->fl_pid = current->tgid;
-       file_lock->fl_flags = FL_POSIX;
+               file_lock->c.flc_owner = (fl_owner_t)lo;
+       file_lock->c.flc_pid = current->tgid;
+       file_lock->c.flc_flags = FL_POSIX;
 
        file_lock->fl_start = lockt->lt_offset;
        file_lock->fl_end = last_byte_offset(lockt->lt_offset, lockt->lt_length);
@@ -7811,7 +7810,7 @@ nfsd4_lockt(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
        if (status)
                goto out;
 
-       if (file_lock->fl_type != F_UNLCK) {
+       if (file_lock->c.flc_type != F_UNLCK) {
                status = nfserr_denied;
                nfs4_set_lock_denied(file_lock, &lockt->lt_denied);
        }
@@ -7867,11 +7866,11 @@ nfsd4_locku(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
                goto put_file;
        }
 
-       file_lock->fl_type = F_UNLCK;
-       file_lock->fl_owner = (fl_owner_t)lockowner(nfs4_get_stateowner(stp->st_stateowner));
-       file_lock->fl_pid = current->tgid;
-       file_lock->fl_file = nf->nf_file;
-       file_lock->fl_flags = FL_POSIX;
+       file_lock->c.flc_type = F_UNLCK;
+       file_lock->c.flc_owner = (fl_owner_t)lockowner(nfs4_get_stateowner(stp->st_stateowner));
+       file_lock->c.flc_pid = current->tgid;
+       file_lock->c.flc_file = nf->nf_file;
+       file_lock->c.flc_flags = FL_POSIX;
        file_lock->fl_lmops = &nfsd_posix_mng_ops;
        file_lock->fl_start = locku->lu_offset;
 
@@ -7911,14 +7910,16 @@ check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner)
 {
        struct file_lock *fl;
        int status = false;
-       struct nfsd_file *nf = find_any_file(fp);
+       struct nfsd_file *nf;
        struct inode *inode;
        struct file_lock_context *flctx;
 
+       spin_lock(&fp->fi_lock);
+       nf = find_any_file_locked(fp);
        if (!nf) {
                /* Any valid lock stateid should have some sort of access */
                WARN_ON_ONCE(1);
-               return status;
+               goto out;
        }
 
        inode = file_inode(nf->nf_file);
@@ -7926,15 +7927,16 @@ check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner)
 
        if (flctx && !list_empty_careful(&flctx->flc_posix)) {
                spin_lock(&flctx->flc_lock);
-               list_for_each_entry(fl, &flctx->flc_posix, fl_list) {
-                       if (fl->fl_owner == (fl_owner_t)lowner) {
+               for_each_file_lock(fl, &flctx->flc_posix) {
+                       if (fl->c.flc_owner == (fl_owner_t)lowner) {
                                status = true;
                                break;
                        }
                }
                spin_unlock(&flctx->flc_lock);
        }
-       nfsd_file_put(nf);
+out:
+       spin_unlock(&fp->fi_lock);
        return status;
 }
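
check_for_locks() also trades the hand-rolled list walk over fl_list for the
new for_each_file_lock() helper. Judging by the call sites in this series, it
is a thin wrapper over the list node that moved into the core; a hedged
sketch of that presumed definition:

	/*
	 * Presumed shape of the helper (the real definition lives in the
	 * filelock header): iterate a lock list via the list_head
	 * embedded in each entry's file_lock_core.
	 */
	#define for_each_file_lock(_fl, _head) \
		list_for_each_entry(_fl, _head, c.flc_list)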
 
@@ -7944,10 +7946,8 @@ check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner)
  * @cstate: NFSv4 COMPOUND state
  * @u: RELEASE_LOCKOWNER arguments
  *
- * The lockowner's so_count is bumped when a lock record is added
- * or when copying a conflicting lock. The latter case is brief,
- * but can lead to fleeting false positives when looking for
- * locks-in-use.
+ * Check if there are any locks still held and, if not, free the lockowner
+ * and any lock state it owns.
  *
  * Return values:
  *   %nfs_ok: lockowner released or not found
@@ -7983,10 +7983,13 @@ nfsd4_release_lockowner(struct svc_rqst *rqstp,
                spin_unlock(&clp->cl_lock);
                return nfs_ok;
        }
-       if (atomic_read(&lo->lo_owner.so_count) != 2) {
-               spin_unlock(&clp->cl_lock);
-               nfs4_put_stateowner(&lo->lo_owner);
-               return nfserr_locks_held;
+
+       list_for_each_entry(stp, &lo->lo_owner.so_stateids, st_perstateowner) {
+               if (check_for_locks(stp->st_stid.sc_file, lo)) {
+                       spin_unlock(&clp->cl_lock);
+                       nfs4_put_stateowner(&lo->lo_owner);
+                       return nfserr_locks_held;
+               }
        }
        unhash_lockowner_locked(lo);
        while (!list_empty(&lo->lo_owner.so_stateids)) {
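
Taken together, the two changes above replace a heuristic with a direct
check: instead of guessing "locks still held" from the stateowner's
reference count, nfsd4_release_lockowner() now walks the owner's lock
stateids under cl_lock and asks check_for_locks() about each file. In
outline (a condensed restatement of the hunk, not new logic):

	/* Under clp->cl_lock: refuse the release while any lock survives. */
	list_for_each_entry(stp, &lo->lo_owner.so_stateids, st_perstateowner) {
		if (check_for_locks(stp->st_stid.sc_file, lo))
			return nfserr_locks_held;	/* after unlock/put */
	}
	/* No locks anywhere: safe to unhash and free the lockowner. */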
@@ -8448,15 +8451,17 @@ nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct inode *inode)
 {
        __be32 status;
        struct file_lock_context *ctx;
-       struct file_lock *fl;
+       struct file_lease *fl;
        struct nfs4_delegation *dp;
 
        ctx = locks_inode_context(inode);
        if (!ctx)
                return 0;
        spin_lock(&ctx->flc_lock);
-       list_for_each_entry(fl, &ctx->flc_lease, fl_list) {
-               if (fl->fl_flags == FL_LAYOUT)
+       for_each_file_lock(fl, &ctx->flc_lease) {
+               unsigned char type = fl->c.flc_type;
+
+               if (fl->c.flc_flags == FL_LAYOUT)
                        continue;
                if (fl->fl_lmops != &nfsd_lease_mng_ops) {
                        /*
@@ -8464,12 +8469,12 @@ nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct inode *inode)
                         * we are done; there isn't any write delegation
                         * on this inode
                         */
-                       if (fl->fl_type == F_RDLCK)
+                       if (type == F_RDLCK)
                                break;
                        goto break_lease;
                }
-               if (fl->fl_type == F_WRLCK) {
-                       dp = fl->fl_owner;
+               if (type == F_WRLCK) {
+                       dp = fl->c.flc_owner;
                        if (dp->dl_recall.cb_clp == *(rqstp->rq_lease_breaker)) {
                                spin_unlock(&ctx->flc_lock);
                                return 0;
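
Leases also split off into their own type in this rework, which is why
nfsd4_deleg_getattr_conflict() now walks flc_lease with a struct file_lease
cursor; nfsd stores the delegation itself as the lease owner. A hedged
sketch of the shape this implies:

	/* Illustrative only: leases wrap the same core as file_lock. */
	struct file_lease {
		struct file_lock_core c;
		/* lease-specific state (fasync, break/downgrade times,
		 * lease manager ops) lives here instead of in file_lock */
	};

	static struct nfs4_delegation *lease_to_delegation(struct file_lease *fl)
	{
		return fl->c.flc_owner;	/* fl_owner_t is a void *, set by nfsd */
	}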
index bec33b89a075858ebf289a95fa4c83dbf6e86103..0e3fc5ba33c73d7f22deefc1cb68ee8395a1efa4 100644 (file)
@@ -107,7 +107,13 @@ static vm_fault_t nilfs_page_mkwrite(struct vm_fault *vmf)
        nilfs_transaction_commit(inode->i_sb);
 
  mapped:
-       folio_wait_stable(folio);
+       /*
+        * The checksums used to validate a log during recovery cover its
+        * data blocks as well, so writeback must be waited for here
+        * unconditionally, regardless of the stable-write requirement of
+        * the backing device.
+        */
+       folio_wait_writeback(folio);
  out:
        sb_end_pagefault(inode->i_sb);
        return vmf_fs_error(ret);
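
Context for the swap above: folio_wait_stable() waits only when the backing
device demands stable page writes, while nilfs2 checksums the data blocks it
is about to log, so in-flight writeback must finish unconditionally.
Paraphrasing the mm helper this fix routes around:

	/* Gist of folio_wait_stable() (paraphrased from mm/page-writeback.c): */
	void folio_wait_stable(struct folio *folio)
	{
		if (mapping_stable_writes(folio->mapping))	/* conditional... */
			folio_wait_writeback(folio);		/* ...wait */
	}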
index 0955b657938ff2ce993d92e7d8f81322ce71c2e1..a9b8d77c8c1d55b551582b826dafdcdcd047d13a 100644 (file)
@@ -472,9 +472,10 @@ static int nilfs_prepare_segment_for_recovery(struct the_nilfs *nilfs,
 
 static int nilfs_recovery_copy_block(struct the_nilfs *nilfs,
                                     struct nilfs_recovery_block *rb,
-                                    struct page *page)
+                                    loff_t pos, struct page *page)
 {
        struct buffer_head *bh_org;
+       size_t from = pos & ~PAGE_MASK;
        void *kaddr;
 
        bh_org = __bread(nilfs->ns_bdev, rb->blocknr, nilfs->ns_blocksize);
@@ -482,7 +483,7 @@ static int nilfs_recovery_copy_block(struct the_nilfs *nilfs,
                return -EIO;
 
        kaddr = kmap_atomic(page);
-       memcpy(kaddr + bh_offset(bh_org), bh_org->b_data, bh_org->b_size);
+       memcpy(kaddr + from, bh_org->b_data, bh_org->b_size);
        kunmap_atomic(kaddr);
        brelse(bh_org);
        return 0;
@@ -521,7 +522,7 @@ static int nilfs_recover_dsync_blocks(struct the_nilfs *nilfs,
                        goto failed_inode;
                }
 
-               err = nilfs_recovery_copy_block(nilfs, rb, page);
+               err = nilfs_recovery_copy_block(nilfs, rb, pos, page);
                if (unlikely(err))
                        goto failed_page;
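
The bug in the recovery path: the destination offset in the target page was
taken from bh_offset(bh_org), i.e. from where the source buffer happened to
sit inside the block device's page, which need not match where the block
belongs in the file's page. Deriving it from the file position is always
correct. The mask arithmetic, with a worked value assuming 4 KiB pages:

	/* PAGE_MASK is ~(PAGE_SIZE - 1), so ~PAGE_MASK keeps the low,
	 * in-page bits of the file position. */
	size_t from = pos & ~PAGE_MASK;
	/* e.g. pos = 0x3200 -> from = 0x200: the block lands 512 bytes
	 * into its page, wherever the source buffer_head lived. */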
 
index 2590a0860eab022ba68a18b9db8fe181faab5069..2bfb08052d399972dee9fd49583b77b95104ac83 100644 (file)
@@ -1703,7 +1703,6 @@ static void nilfs_segctor_prepare_write(struct nilfs_sc_info *sci)
 
                list_for_each_entry(bh, &segbuf->sb_payload_buffers,
                                    b_assoc_buffers) {
-                       set_buffer_async_write(bh);
                        if (bh == segbuf->sb_super_root) {
                                if (bh->b_folio != bd_folio) {
                                        folio_lock(bd_folio);
@@ -1714,6 +1713,7 @@ static void nilfs_segctor_prepare_write(struct nilfs_sc_info *sci)
                                }
                                break;
                        }
+                       set_buffer_async_write(bh);
                        if (bh->b_folio != fs_folio) {
                                nilfs_begin_folio_io(fs_folio);
                                fs_folio = bh->b_folio;
@@ -1800,7 +1800,6 @@ static void nilfs_abort_logs(struct list_head *logs, int err)
 
                list_for_each_entry(bh, &segbuf->sb_payload_buffers,
                                    b_assoc_buffers) {
-                       clear_buffer_async_write(bh);
                        if (bh == segbuf->sb_super_root) {
                                clear_buffer_uptodate(bh);
                                if (bh->b_folio != bd_folio) {
@@ -1809,6 +1808,7 @@ static void nilfs_abort_logs(struct list_head *logs, int err)
                                }
                                break;
                        }
+                       clear_buffer_async_write(bh);
                        if (bh->b_folio != fs_folio) {
                                nilfs_end_folio_io(fs_folio, err);
                                fs_folio = bh->b_folio;
@@ -1896,8 +1896,9 @@ static void nilfs_segctor_complete_write(struct nilfs_sc_info *sci)
                                 BIT(BH_Delay) | BIT(BH_NILFS_Volatile) |
                                 BIT(BH_NILFS_Redirected));
 
-                       set_mask_bits(&bh->b_state, clear_bits, set_bits);
                        if (bh == segbuf->sb_super_root) {
+                               set_buffer_uptodate(bh);
+                               clear_buffer_dirty(bh);
                                if (bh->b_folio != bd_folio) {
                                        folio_end_writeback(bd_folio);
                                        bd_folio = bh->b_folio;
@@ -1905,6 +1906,7 @@ static void nilfs_segctor_complete_write(struct nilfs_sc_info *sci)
                                update_sr = true;
                                break;
                        }
+                       set_mask_bits(&bh->b_state, clear_bits, set_bits);
                        if (bh->b_folio != fs_folio) {
                                nilfs_end_folio_io(fs_folio, 0);
                                fs_folio = bh->b_folio;
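
All three segment-constructor hunks make the same move: the per-buffer state
change now happens after the super-root check, so the super root buffer,
which is completed through the bd_folio path and leaves the loop via break,
never has payload-buffer flags applied to it. Skeleton of the corrected loop
shape (sketch only, per-folio bookkeeping elided):

	list_for_each_entry(bh, &segbuf->sb_payload_buffers, b_assoc_buffers) {
		if (bh == segbuf->sb_super_root)
			break;			/* handled on the bd_folio side */
		set_buffer_async_write(bh);	/* ordinary payload buffers only */
		/* ...folio writeback bookkeeping continues as before... */
	}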
index 34e1e3e36733da8ed12de3582123638095f30fab..7aaafb5cb9fc9f7792c7d7198b49f561ee14be0b 100644 (file)
--- a/fs/nsfs.c
+++ b/fs/nsfs.c
@@ -27,26 +27,17 @@ static const struct file_operations ns_file_operations = {
 static char *ns_dname(struct dentry *dentry, char *buffer, int buflen)
 {
        struct inode *inode = d_inode(dentry);
-       const struct proc_ns_operations *ns_ops = dentry->d_fsdata;
+       struct ns_common *ns = inode->i_private;
+       const struct proc_ns_operations *ns_ops = ns->ops;
 
        return dynamic_dname(buffer, buflen, "%s:[%lu]",
                ns_ops->name, inode->i_ino);
 }
 
-static void ns_prune_dentry(struct dentry *dentry)
-{
-       struct inode *inode = d_inode(dentry);
-       if (inode) {
-               struct ns_common *ns = inode->i_private;
-               atomic_long_set(&ns->stashed, 0);
-       }
-}
-
-const struct dentry_operations ns_dentry_operations =
-{
-       .d_prune        = ns_prune_dentry,
+const struct dentry_operations ns_dentry_operations = {
        .d_delete       = always_delete_dentry,
        .d_dname        = ns_dname,
+       .d_prune        = stashed_dentry_prune,
 };
 
 static void nsfs_evict(struct inode *inode)
@@ -56,67 +47,16 @@ static void nsfs_evict(struct inode *inode)
        ns->ops->put(ns);
 }
 
-static int __ns_get_path(struct path *path, struct ns_common *ns)
-{
-       struct vfsmount *mnt = nsfs_mnt;
-       struct dentry *dentry;
-       struct inode *inode;
-       unsigned long d;
-
-       rcu_read_lock();
-       d = atomic_long_read(&ns->stashed);
-       if (!d)
-               goto slow;
-       dentry = (struct dentry *)d;
-       if (!lockref_get_not_dead(&dentry->d_lockref))
-               goto slow;
-       rcu_read_unlock();
-       ns->ops->put(ns);
-got_it:
-       path->mnt = mntget(mnt);
-       path->dentry = dentry;
-       return 0;
-slow:
-       rcu_read_unlock();
-       inode = new_inode_pseudo(mnt->mnt_sb);
-       if (!inode) {
-               ns->ops->put(ns);
-               return -ENOMEM;
-       }
-       inode->i_ino = ns->inum;
-       simple_inode_init_ts(inode);
-       inode->i_flags |= S_IMMUTABLE;
-       inode->i_mode = S_IFREG | S_IRUGO;
-       inode->i_fop = &ns_file_operations;
-       inode->i_private = ns;
-
-       dentry = d_make_root(inode);    /* not the normal use, but... */
-       if (!dentry)
-               return -ENOMEM;
-       dentry->d_fsdata = (void *)ns->ops;
-       d = atomic_long_cmpxchg(&ns->stashed, 0, (unsigned long)dentry);
-       if (d) {
-               d_delete(dentry);       /* make sure ->d_prune() does nothing */
-               dput(dentry);
-               cpu_relax();
-               return -EAGAIN;
-       }
-       goto got_it;
-}
-
 int ns_get_path_cb(struct path *path, ns_get_path_helper_t *ns_get_cb,
                     void *private_data)
 {
-       int ret;
+       struct ns_common *ns;
 
-       do {
-               struct ns_common *ns = ns_get_cb(private_data);
-               if (!ns)
-                       return -ENOENT;
-               ret = __ns_get_path(path, ns);
-       } while (ret == -EAGAIN);
+       ns = ns_get_cb(private_data);
+       if (!ns)
+               return -ENOENT;
 
-       return ret;
+       return path_from_stashed(&ns->stashed, ns->inum, nsfs_mnt, ns, path);
 }
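
Both nsfs call sites shed their -EAGAIN retry loops here: the cmpxchg race
in the old __ns_get_path() (lose the race, delete your dentry, start over)
is now resolved inside path_from_stashed(), which either revives the stashed
dentry or installs a fresh one. Its signature, as used in this diff (a
kernel-internal helper):

	/* Fills *path with a (mnt, dentry) pair for the stashed object;
	 * the reference on data is dropped via the put_data callback when
	 * an existing dentry is reused or the helper fails. */
	int path_from_stashed(struct dentry **stashed, unsigned long ino,
			      struct vfsmount *mnt, void *data,
			      struct path *path);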
 
 struct ns_get_path_task_args {
@@ -146,6 +86,7 @@ int open_related_ns(struct ns_common *ns,
                   struct ns_common *(*get_ns)(struct ns_common *ns))
 {
        struct path path = {};
+       struct ns_common *relative;
        struct file *f;
        int err;
        int fd;
@@ -154,19 +95,15 @@ int open_related_ns(struct ns_common *ns,
        if (fd < 0)
                return fd;
 
-       do {
-               struct ns_common *relative;
-
-               relative = get_ns(ns);
-               if (IS_ERR(relative)) {
-                       put_unused_fd(fd);
-                       return PTR_ERR(relative);
-               }
-
-               err = __ns_get_path(&path, relative);
-       } while (err == -EAGAIN);
+       relative = get_ns(ns);
+       if (IS_ERR(relative)) {
+               put_unused_fd(fd);
+               return PTR_ERR(relative);
+       }
 
-       if (err) {
+       err = path_from_stashed(&relative->stashed, relative->inum, nsfs_mnt,
+                               relative, &path);
+       if (err < 0) {
                put_unused_fd(fd);
                return err;
        }
@@ -249,7 +186,8 @@ bool ns_match(const struct ns_common *ns, dev_t dev, ino_t ino)
 static int nsfs_show_path(struct seq_file *seq, struct dentry *dentry)
 {
        struct inode *inode = d_inode(dentry);
-       const struct proc_ns_operations *ns_ops = dentry->d_fsdata;
+       const struct ns_common *ns = inode->i_private;
+       const struct proc_ns_operations *ns_ops = ns->ops;
 
        seq_printf(seq, "%s:[%lu]", ns_ops->name, inode->i_ino);
        return 0;
@@ -261,6 +199,24 @@ static const struct super_operations nsfs_ops = {
        .show_path = nsfs_show_path,
 };
 
+static void nsfs_init_inode(struct inode *inode, void *data)
+{
+       inode->i_private = data;
+       inode->i_mode |= S_IRUGO;
+       inode->i_fop = &ns_file_operations;
+}
+
+static void nsfs_put_data(void *data)
+{
+       struct ns_common *ns = data;
+       ns->ops->put(ns);
+}
+
+static const struct stashed_operations nsfs_stashed_ops = {
+       .init_inode = nsfs_init_inode,
+       .put_data = nsfs_put_data,
+};
+
 static int nsfs_init_fs_context(struct fs_context *fc)
 {
        struct pseudo_fs_context *ctx = init_pseudo(fc, NSFS_MAGIC);
@@ -268,6 +224,7 @@ static int nsfs_init_fs_context(struct fs_context *fc)
                return -ENOMEM;
        ctx->ops = &nsfs_ops;
        ctx->dops = &ns_dentry_operations;
+       fc->s_fs_info = (void *)&nsfs_stashed_ops;
        return 0;
 }
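
The per-filesystem half of that contract is the stashed_operations vtable
published through s_fs_info, which the hunks above wire up for nsfs: the
generic code calls init_inode() to decorate a freshly allocated inode with
the stashed object, and put_data() to drop the object reference when no new
inode ends up owning it. Condensed from the diff:

	static const struct stashed_operations nsfs_stashed_ops = {
		.init_inode	= nsfs_init_inode,	/* i_private, mode, fops */
		.put_data	= nsfs_put_data,	/* ns->ops->put(ns) */
	};
	/* registered in nsfs_init_fs_context():
	 *	fc->s_fs_info = (void *)&nsfs_stashed_ops;
	 */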
 
diff --git a/fs/ntfs/Kconfig b/fs/ntfs/Kconfig
deleted file mode 100644 (file)
index 7b25097..0000000
+++ /dev/null
@@ -1,81 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0-only
-config NTFS_FS
-       tristate "NTFS file system support"
-       select BUFFER_HEAD
-       select NLS
-       help
-         NTFS is the file system of Microsoft Windows NT, 2000, XP and 2003.
-
-         Saying Y or M here enables read support.  There is partial, but
-         safe, write support available.  For write support you must also
-         say Y to "NTFS write support" below.
-
-         There are also a number of user-space tools available, called
-         ntfsprogs.  These include ntfsundelete and ntfsresize, that work
-         without NTFS support enabled in the kernel.
-
-         This is a rewrite from scratch of Linux NTFS support and replaced
-         the old NTFS code starting with Linux 2.5.11.  A backport to
-         the Linux 2.4 kernel series is separately available as a patch
-         from the project web site.
-
-         For more information see <file:Documentation/filesystems/ntfs.rst>
-         and <http://www.linux-ntfs.org/>.
-
-         To compile this file system support as a module, choose M here: the
-         module will be called ntfs.
-
-         If you are not using Windows NT, 2000, XP or 2003 in addition to
-         Linux on your computer it is safe to say N.
-
-config NTFS_DEBUG
-       bool "NTFS debugging support"
-       depends on NTFS_FS
-       help
-         If you are experiencing any problems with the NTFS file system, say
-         Y here.  This will result in additional consistency checks to be
-         performed by the driver as well as additional debugging messages to
-         be written to the system log.  Note that debugging messages are
-         disabled by default.  To enable them, supply the option debug_msgs=1
-         at the kernel command line when booting the kernel or as an option
-         to insmod when loading the ntfs module.  Once the driver is active,
-         you can enable debugging messages by doing (as root):
-         echo 1 > /proc/sys/fs/ntfs-debug
-         Replacing the "1" with "0" would disable debug messages.
-
-         If you leave debugging messages disabled, this results in little
-         overhead, but enabling debug messages results in very significant
-         slowdown of the system.
-
-         When reporting bugs, please try to have available a full dump of
-         debugging messages while the misbehaviour was occurring.
-
-config NTFS_RW
-       bool "NTFS write support"
-       depends on NTFS_FS
-       depends on PAGE_SIZE_LESS_THAN_64KB
-       help
-         This enables the partial, but safe, write support in the NTFS driver.
-
-         The only supported operation is overwriting existing files, without
-         changing the file length.  No file or directory creation, deletion or
-         renaming is possible.  Note only non-resident files can be written to
-         so you may find that some very small files (<500 bytes or so) cannot
-         be written to.
-
-         While we cannot guarantee that it will not damage any data, we have
-         so far not received a single report where the driver would have
-         damaged someones data so we assume it is perfectly safe to use.
-
-         Note:  While write support is safe in this version (a rewrite from
-         scratch of the NTFS support), it should be noted that the old NTFS
-         write support, included in Linux 2.5.10 and before (since 1997),
-         is not safe.
-
-         This is currently useful with TopologiLinux.  TopologiLinux is run
-         on top of any DOS/Microsoft Windows system without partitioning your
-         hard disk.  Unlike other Linux distributions TopologiLinux does not
-         need its own partition.  For more information see
-         <http://topologi-linux.sourceforge.net/>
-
-         It is perfectly safe to say N here.
diff --git a/fs/ntfs/Makefile b/fs/ntfs/Makefile
deleted file mode 100644 (file)
index 3e73657..0000000
+++ /dev/null
@@ -1,15 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0
-# Rules for making the NTFS driver.
-
-obj-$(CONFIG_NTFS_FS) += ntfs.o
-
-ntfs-y := aops.o attrib.o collate.o compress.o debug.o dir.o file.o \
-         index.o inode.o mft.o mst.o namei.o runlist.o super.o sysctl.o \
-         unistr.o upcase.o
-
-ntfs-$(CONFIG_NTFS_RW) += bitmap.o lcnalloc.o logfile.o quota.o usnjrnl.o
-
-ccflags-y := -DNTFS_VERSION=\"2.1.32\"
-ccflags-$(CONFIG_NTFS_DEBUG)   += -DDEBUG
-ccflags-$(CONFIG_NTFS_RW)      += -DNTFS_RW
-
diff --git a/fs/ntfs/aops.c b/fs/ntfs/aops.c
deleted file mode 100644 (file)
index 2d01517..0000000
+++ /dev/null
@@ -1,1744 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * aops.c - NTFS kernel address space operations and page cache handling.
- *
- * Copyright (c) 2001-2014 Anton Altaparmakov and Tuxera Inc.
- * Copyright (c) 2002 Richard Russon
- */
-
-#include <linux/errno.h>
-#include <linux/fs.h>
-#include <linux/gfp.h>
-#include <linux/mm.h>
-#include <linux/pagemap.h>
-#include <linux/swap.h>
-#include <linux/buffer_head.h>
-#include <linux/writeback.h>
-#include <linux/bit_spinlock.h>
-#include <linux/bio.h>
-
-#include "aops.h"
-#include "attrib.h"
-#include "debug.h"
-#include "inode.h"
-#include "mft.h"
-#include "runlist.h"
-#include "types.h"
-#include "ntfs.h"
-
-/**
- * ntfs_end_buffer_async_read - async io completion for reading attributes
- * @bh:                buffer head on which io is completed
- * @uptodate:  whether @bh is now uptodate or not
- *
- * Asynchronous I/O completion handler for reading pages belonging to the
- * attribute address space of an inode.  The inodes can either be files or
- * directories or they can be fake inodes describing some attribute.
- *
- * If NInoMstProtected(), perform the post read mst fixups when all IO on the
- * page has been completed and mark the page uptodate or set the error bit on
- * the page.  To determine the size of the records that need fixing up, we
- * cheat a little bit by setting the index_block_size in ntfs_inode to the ntfs
- * record size, and index_block_size_bits, to the log(base 2) of the ntfs
- * record size.
- */
-static void ntfs_end_buffer_async_read(struct buffer_head *bh, int uptodate)
-{
-       unsigned long flags;
-       struct buffer_head *first, *tmp;
-       struct page *page;
-       struct inode *vi;
-       ntfs_inode *ni;
-       int page_uptodate = 1;
-
-       page = bh->b_page;
-       vi = page->mapping->host;
-       ni = NTFS_I(vi);
-
-       if (likely(uptodate)) {
-               loff_t i_size;
-               s64 file_ofs, init_size;
-
-               set_buffer_uptodate(bh);
-
-               file_ofs = ((s64)page->index << PAGE_SHIFT) +
-                               bh_offset(bh);
-               read_lock_irqsave(&ni->size_lock, flags);
-               init_size = ni->initialized_size;
-               i_size = i_size_read(vi);
-               read_unlock_irqrestore(&ni->size_lock, flags);
-               if (unlikely(init_size > i_size)) {
-                       /* Race with shrinking truncate. */
-                       init_size = i_size;
-               }
-               /* Check for the current buffer head overflowing. */
-               if (unlikely(file_ofs + bh->b_size > init_size)) {
-                       int ofs;
-                       void *kaddr;
-
-                       ofs = 0;
-                       if (file_ofs < init_size)
-                               ofs = init_size - file_ofs;
-                       kaddr = kmap_atomic(page);
-                       memset(kaddr + bh_offset(bh) + ofs, 0,
-                                       bh->b_size - ofs);
-                       flush_dcache_page(page);
-                       kunmap_atomic(kaddr);
-               }
-       } else {
-               clear_buffer_uptodate(bh);
-               SetPageError(page);
-               ntfs_error(ni->vol->sb, "Buffer I/O error, logical block "
-                               "0x%llx.", (unsigned long long)bh->b_blocknr);
-       }
-       first = page_buffers(page);
-       spin_lock_irqsave(&first->b_uptodate_lock, flags);
-       clear_buffer_async_read(bh);
-       unlock_buffer(bh);
-       tmp = bh;
-       do {
-               if (!buffer_uptodate(tmp))
-                       page_uptodate = 0;
-               if (buffer_async_read(tmp)) {
-                       if (likely(buffer_locked(tmp)))
-                               goto still_busy;
-                       /* Async buffers must be locked. */
-                       BUG();
-               }
-               tmp = tmp->b_this_page;
-       } while (tmp != bh);
-       spin_unlock_irqrestore(&first->b_uptodate_lock, flags);
-       /*
-        * If none of the buffers had errors then we can set the page uptodate,
-        * but we first have to perform the post read mst fixups, if the
-        * attribute is mst protected, i.e. if NInoMstProteced(ni) is true.
-        * Note we ignore fixup errors as those are detected when
-        * map_mft_record() is called which gives us per record granularity
-        * rather than per page granularity.
-        */
-       if (!NInoMstProtected(ni)) {
-               if (likely(page_uptodate && !PageError(page)))
-                       SetPageUptodate(page);
-       } else {
-               u8 *kaddr;
-               unsigned int i, recs;
-               u32 rec_size;
-
-               rec_size = ni->itype.index.block_size;
-               recs = PAGE_SIZE / rec_size;
-               /* Should have been verified before we got here... */
-               BUG_ON(!recs);
-               kaddr = kmap_atomic(page);
-               for (i = 0; i < recs; i++)
-                       post_read_mst_fixup((NTFS_RECORD*)(kaddr +
-                                       i * rec_size), rec_size);
-               kunmap_atomic(kaddr);
-               flush_dcache_page(page);
-               if (likely(page_uptodate && !PageError(page)))
-                       SetPageUptodate(page);
-       }
-       unlock_page(page);
-       return;
-still_busy:
-       spin_unlock_irqrestore(&first->b_uptodate_lock, flags);
-       return;
-}
-
-/**
- * ntfs_read_block - fill a @folio of an address space with data
- * @folio:     page cache folio to fill with data
- *
- * We read each buffer asynchronously and when all buffers are read in, our io
- * completion handler ntfs_end_buffer_read_async(), if required, automatically
- * applies the mst fixups to the folio before finally marking it uptodate and
- * unlocking it.
- *
- * We only enforce allocated_size limit because i_size is checked for in
- * generic_file_read().
- *
- * Return 0 on success and -errno on error.
- *
- * Contains an adapted version of fs/buffer.c::block_read_full_folio().
- */
-static int ntfs_read_block(struct folio *folio)
-{
-       loff_t i_size;
-       VCN vcn;
-       LCN lcn;
-       s64 init_size;
-       struct inode *vi;
-       ntfs_inode *ni;
-       ntfs_volume *vol;
-       runlist_element *rl;
-       struct buffer_head *bh, *head, *arr[MAX_BUF_PER_PAGE];
-       sector_t iblock, lblock, zblock;
-       unsigned long flags;
-       unsigned int blocksize, vcn_ofs;
-       int i, nr;
-       unsigned char blocksize_bits;
-
-       vi = folio->mapping->host;
-       ni = NTFS_I(vi);
-       vol = ni->vol;
-
-       /* $MFT/$DATA must have its complete runlist in memory at all times. */
-       BUG_ON(!ni->runlist.rl && !ni->mft_no && !NInoAttr(ni));
-
-       blocksize = vol->sb->s_blocksize;
-       blocksize_bits = vol->sb->s_blocksize_bits;
-
-       head = folio_buffers(folio);
-       if (!head)
-               head = create_empty_buffers(folio, blocksize, 0);
-       bh = head;
-
-       /*
-        * We may be racing with truncate.  To avoid some of the problems we
-        * now take a snapshot of the various sizes and use those for the whole
-        * of the function.  In case of an extending truncate it just means we
-        * may leave some buffers unmapped which are now allocated.  This is
-        * not a problem since these buffers will just get mapped when a write
-        * occurs.  In case of a shrinking truncate, we will detect this later
-        * on due to the runlist being incomplete and if the folio is being
-        * fully truncated, truncate will throw it away as soon as we unlock
-        * it so no need to worry what we do with it.
-        */
-       iblock = (s64)folio->index << (PAGE_SHIFT - blocksize_bits);
-       read_lock_irqsave(&ni->size_lock, flags);
-       lblock = (ni->allocated_size + blocksize - 1) >> blocksize_bits;
-       init_size = ni->initialized_size;
-       i_size = i_size_read(vi);
-       read_unlock_irqrestore(&ni->size_lock, flags);
-       if (unlikely(init_size > i_size)) {
-               /* Race with shrinking truncate. */
-               init_size = i_size;
-       }
-       zblock = (init_size + blocksize - 1) >> blocksize_bits;
-
-       /* Loop through all the buffers in the folio. */
-       rl = NULL;
-       nr = i = 0;
-       do {
-               int err = 0;
-
-               if (unlikely(buffer_uptodate(bh)))
-                       continue;
-               if (unlikely(buffer_mapped(bh))) {
-                       arr[nr++] = bh;
-                       continue;
-               }
-               bh->b_bdev = vol->sb->s_bdev;
-               /* Is the block within the allowed limits? */
-               if (iblock < lblock) {
-                       bool is_retry = false;
-
-                       /* Convert iblock into corresponding vcn and offset. */
-                       vcn = (VCN)iblock << blocksize_bits >>
-                                       vol->cluster_size_bits;
-                       vcn_ofs = ((VCN)iblock << blocksize_bits) &
-                                       vol->cluster_size_mask;
-                       if (!rl) {
-lock_retry_remap:
-                               down_read(&ni->runlist.lock);
-                               rl = ni->runlist.rl;
-                       }
-                       if (likely(rl != NULL)) {
-                               /* Seek to element containing target vcn. */
-                               while (rl->length && rl[1].vcn <= vcn)
-                                       rl++;
-                               lcn = ntfs_rl_vcn_to_lcn(rl, vcn);
-                       } else
-                               lcn = LCN_RL_NOT_MAPPED;
-                       /* Successful remap. */
-                       if (lcn >= 0) {
-                               /* Setup buffer head to correct block. */
-                               bh->b_blocknr = ((lcn << vol->cluster_size_bits)
-                                               + vcn_ofs) >> blocksize_bits;
-                               set_buffer_mapped(bh);
-                               /* Only read initialized data blocks. */
-                               if (iblock < zblock) {
-                                       arr[nr++] = bh;
-                                       continue;
-                               }
-                               /* Fully non-initialized data block, zero it. */
-                               goto handle_zblock;
-                       }
-                       /* It is a hole, need to zero it. */
-                       if (lcn == LCN_HOLE)
-                               goto handle_hole;
-                       /* If first try and runlist unmapped, map and retry. */
-                       if (!is_retry && lcn == LCN_RL_NOT_MAPPED) {
-                               is_retry = true;
-                               /*
-                                * Attempt to map runlist, dropping lock for
-                                * the duration.
-                                */
-                               up_read(&ni->runlist.lock);
-                               err = ntfs_map_runlist(ni, vcn);
-                               if (likely(!err))
-                                       goto lock_retry_remap;
-                               rl = NULL;
-                       } else if (!rl)
-                               up_read(&ni->runlist.lock);
-                       /*
-                        * If buffer is outside the runlist, treat it as a
-                        * hole.  This can happen due to concurrent truncate
-                        * for example.
-                        */
-                       if (err == -ENOENT || lcn == LCN_ENOENT) {
-                               err = 0;
-                               goto handle_hole;
-                       }
-                       /* Hard error, zero out region. */
-                       if (!err)
-                               err = -EIO;
-                       bh->b_blocknr = -1;
-                       folio_set_error(folio);
-                       ntfs_error(vol->sb, "Failed to read from inode 0x%lx, "
-                                       "attribute type 0x%x, vcn 0x%llx, "
-                                       "offset 0x%x because its location on "
-                                       "disk could not be determined%s "
-                                       "(error code %i).", ni->mft_no,
-                                       ni->type, (unsigned long long)vcn,
-                                       vcn_ofs, is_retry ? " even after "
-                                       "retrying" : "", err);
-               }
-               /*
-                * Either iblock was outside lblock limits or
-                * ntfs_rl_vcn_to_lcn() returned error.  Just zero that portion
-                * of the folio and set the buffer uptodate.
-                */
-handle_hole:
-               bh->b_blocknr = -1UL;
-               clear_buffer_mapped(bh);
-handle_zblock:
-               folio_zero_range(folio, i * blocksize, blocksize);
-               if (likely(!err))
-                       set_buffer_uptodate(bh);
-       } while (i++, iblock++, (bh = bh->b_this_page) != head);
-
-       /* Release the lock if we took it. */
-       if (rl)
-               up_read(&ni->runlist.lock);
-
-       /* Check we have at least one buffer ready for i/o. */
-       if (nr) {
-               struct buffer_head *tbh;
-
-               /* Lock the buffers. */
-               for (i = 0; i < nr; i++) {
-                       tbh = arr[i];
-                       lock_buffer(tbh);
-                       tbh->b_end_io = ntfs_end_buffer_async_read;
-                       set_buffer_async_read(tbh);
-               }
-               /* Finally, start i/o on the buffers. */
-               for (i = 0; i < nr; i++) {
-                       tbh = arr[i];
-                       if (likely(!buffer_uptodate(tbh)))
-                               submit_bh(REQ_OP_READ, tbh);
-                       else
-                               ntfs_end_buffer_async_read(tbh, 1);
-               }
-               return 0;
-       }
-       /* No i/o was scheduled on any of the buffers. */
-       if (likely(!folio_test_error(folio)))
-               folio_mark_uptodate(folio);
-       else /* Signal synchronous i/o error. */
-               nr = -EIO;
-       folio_unlock(folio);
-       return nr;
-}
-
-/**
- * ntfs_read_folio - fill a @folio of a @file with data from the device
- * @file:      open file to which the folio @folio belongs or NULL
- * @folio:     page cache folio to fill with data
- *
- * For non-resident attributes, ntfs_read_folio() fills the @folio of the open
- * file @file by calling the ntfs version of the generic block_read_full_folio()
- * function, ntfs_read_block(), which in turn creates and reads in the buffers
- * associated with the folio asynchronously.
- *
- * For resident attributes, OTOH, ntfs_read_folio() fills @folio by copying the
- * data from the mft record (which at this stage is most likely in memory) and
- * fills the remainder with zeroes. Thus, in this case, I/O is synchronous, as
- * even if the mft record is not cached at this point in time, we need to wait
- * for it to be read in before we can do the copy.
- *
- * Return 0 on success and -errno on error.
- */
-static int ntfs_read_folio(struct file *file, struct folio *folio)
-{
-       struct page *page = &folio->page;
-       loff_t i_size;
-       struct inode *vi;
-       ntfs_inode *ni, *base_ni;
-       u8 *addr;
-       ntfs_attr_search_ctx *ctx;
-       MFT_RECORD *mrec;
-       unsigned long flags;
-       u32 attr_len;
-       int err = 0;
-
-retry_readpage:
-       BUG_ON(!PageLocked(page));
-       vi = page->mapping->host;
-       i_size = i_size_read(vi);
-       /* Is the page fully outside i_size? (truncate in progress) */
-       if (unlikely(page->index >= (i_size + PAGE_SIZE - 1) >>
-                       PAGE_SHIFT)) {
-               zero_user(page, 0, PAGE_SIZE);
-               ntfs_debug("Read outside i_size - truncated?");
-               goto done;
-       }
-       /*
-        * This can potentially happen because we clear PageUptodate() during
-        * ntfs_writepage() of MstProtected() attributes.
-        */
-       if (PageUptodate(page)) {
-               unlock_page(page);
-               return 0;
-       }
-       ni = NTFS_I(vi);
-       /*
-        * Only $DATA attributes can be encrypted and only unnamed $DATA
-        * attributes can be compressed.  Index root can have the flags set but
-        * this means to create compressed/encrypted files, not that the
-        * attribute is compressed/encrypted.  Note we need to check for
-        * AT_INDEX_ALLOCATION since this is the type of both directory and
-        * index inodes.
-        */
-       if (ni->type != AT_INDEX_ALLOCATION) {
-               /* If attribute is encrypted, deny access, just like NT4. */
-               if (NInoEncrypted(ni)) {
-                       BUG_ON(ni->type != AT_DATA);
-                       err = -EACCES;
-                       goto err_out;
-               }
-               /* Compressed data streams are handled in compress.c. */
-               if (NInoNonResident(ni) && NInoCompressed(ni)) {
-                       BUG_ON(ni->type != AT_DATA);
-                       BUG_ON(ni->name_len);
-                       return ntfs_read_compressed_block(page);
-               }
-       }
-       /* NInoNonResident() == NInoIndexAllocPresent() */
-       if (NInoNonResident(ni)) {
-               /* Normal, non-resident data stream. */
-               return ntfs_read_block(folio);
-       }
-       /*
-        * Attribute is resident, implying it is not compressed or encrypted.
-        * This also means the attribute is smaller than an mft record and
-        * hence smaller than a page, so can simply zero out any pages with
-        * index above 0.  Note the attribute can actually be marked compressed
-        * but if it is resident the actual data is not compressed so we are
-        * ok to ignore the compressed flag here.
-        */
-       if (unlikely(page->index > 0)) {
-               zero_user(page, 0, PAGE_SIZE);
-               goto done;
-       }
-       if (!NInoAttr(ni))
-               base_ni = ni;
-       else
-               base_ni = ni->ext.base_ntfs_ino;
-       /* Map, pin, and lock the mft record. */
-       mrec = map_mft_record(base_ni);
-       if (IS_ERR(mrec)) {
-               err = PTR_ERR(mrec);
-               goto err_out;
-       }
-       /*
-        * If a parallel write made the attribute non-resident, drop the mft
-        * record and retry the read_folio.
-        */
-       if (unlikely(NInoNonResident(ni))) {
-               unmap_mft_record(base_ni);
-               goto retry_readpage;
-       }
-       ctx = ntfs_attr_get_search_ctx(base_ni, mrec);
-       if (unlikely(!ctx)) {
-               err = -ENOMEM;
-               goto unm_err_out;
-       }
-       err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
-                       CASE_SENSITIVE, 0, NULL, 0, ctx);
-       if (unlikely(err))
-               goto put_unm_err_out;
-       attr_len = le32_to_cpu(ctx->attr->data.resident.value_length);
-       read_lock_irqsave(&ni->size_lock, flags);
-       if (unlikely(attr_len > ni->initialized_size))
-               attr_len = ni->initialized_size;
-       i_size = i_size_read(vi);
-       read_unlock_irqrestore(&ni->size_lock, flags);
-       if (unlikely(attr_len > i_size)) {
-               /* Race with shrinking truncate. */
-               attr_len = i_size;
-       }
-       addr = kmap_atomic(page);
-       /* Copy the data to the page. */
-       memcpy(addr, (u8*)ctx->attr +
-                       le16_to_cpu(ctx->attr->data.resident.value_offset),
-                       attr_len);
-       /* Zero the remainder of the page. */
-       memset(addr + attr_len, 0, PAGE_SIZE - attr_len);
-       flush_dcache_page(page);
-       kunmap_atomic(addr);
-put_unm_err_out:
-       ntfs_attr_put_search_ctx(ctx);
-unm_err_out:
-       unmap_mft_record(base_ni);
-done:
-       SetPageUptodate(page);
-err_out:
-       unlock_page(page);
-       return err;
-}
-
-#ifdef NTFS_RW
-
-/**
- * ntfs_write_block - write a @folio to the backing store
- * @folio:     page cache folio to write out
- * @wbc:       writeback control structure
- *
- * This function is for writing folios belonging to non-resident, non-mst
- * protected attributes to their backing store.
- *
- * For a folio with buffers, map and write the dirty buffers asynchronously
- * under folio writeback. For a folio without buffers, create buffers for the
- * folio, then proceed as above.
- *
- * If a folio doesn't have buffers the folio dirty state is definitive. If
- * a folio does have buffers, the folio dirty state is just a hint,
- * and the buffer dirty state is definitive. (A hint which has rules:
- * dirty buffers against a clean folio is illegal. Other combinations are
- * legal and need to be handled. In particular a dirty folio containing
- * clean buffers for example.)
- *
- * Return 0 on success and -errno on error.
- *
- * Based on ntfs_read_block() and __block_write_full_folio().
- */
-static int ntfs_write_block(struct folio *folio, struct writeback_control *wbc)
-{
-       VCN vcn;
-       LCN lcn;
-       s64 initialized_size;
-       loff_t i_size;
-       sector_t block, dblock, iblock;
-       struct inode *vi;
-       ntfs_inode *ni;
-       ntfs_volume *vol;
-       runlist_element *rl;
-       struct buffer_head *bh, *head;
-       unsigned long flags;
-       unsigned int blocksize, vcn_ofs;
-       int err;
-       bool need_end_writeback;
-       unsigned char blocksize_bits;
-
-       vi = folio->mapping->host;
-       ni = NTFS_I(vi);
-       vol = ni->vol;
-
-       ntfs_debug("Entering for inode 0x%lx, attribute type 0x%x, page index "
-                       "0x%lx.", ni->mft_no, ni->type, folio->index);
-
-       BUG_ON(!NInoNonResident(ni));
-       BUG_ON(NInoMstProtected(ni));
-       blocksize = vol->sb->s_blocksize;
-       blocksize_bits = vol->sb->s_blocksize_bits;
-       head = folio_buffers(folio);
-       if (!head) {
-               BUG_ON(!folio_test_uptodate(folio));
-               head = create_empty_buffers(folio, blocksize,
-                               (1 << BH_Uptodate) | (1 << BH_Dirty));
-       }
-       bh = head;
-
-       /* NOTE: Different naming scheme to ntfs_read_block()! */
-
-       /* The first block in the folio. */
-       block = (s64)folio->index << (PAGE_SHIFT - blocksize_bits);
-
-       read_lock_irqsave(&ni->size_lock, flags);
-       i_size = i_size_read(vi);
-       initialized_size = ni->initialized_size;
-       read_unlock_irqrestore(&ni->size_lock, flags);
-
-       /* The first out of bounds block for the data size. */
-       dblock = (i_size + blocksize - 1) >> blocksize_bits;
-
-       /* The last (fully or partially) initialized block. */
-       iblock = initialized_size >> blocksize_bits;
-
-       /*
-        * Be very careful.  We have no exclusion from block_dirty_folio
-        * here, and the (potentially unmapped) buffers may become dirty at
-        * any time.  If a buffer becomes dirty here after we've inspected it
-        * then we just miss that fact, and the folio stays dirty.
-        *
-        * Buffers outside i_size may be dirtied by block_dirty_folio;
-        * handle that here by just cleaning them.
-        */
-
-       /*
-        * Loop through all the buffers in the folio, mapping all the dirty
-        * buffers to disk addresses and handling any aliases from the
-        * underlying block device's mapping.
-        */
-       rl = NULL;
-       err = 0;
-       do {
-               bool is_retry = false;
-
-               if (unlikely(block >= dblock)) {
-                       /*
-                        * Mapped buffers outside i_size will occur, because
-                        * this folio can be outside i_size when there is a
-                        * truncate in progress. The contents of such buffers
-                        * were zeroed by ntfs_writepage().
-                        *
-                        * FIXME: What about the small race window where
-                        * ntfs_writepage() has not done any clearing because
-                        * the folio was within i_size but before we get here,
-                        * vmtruncate() modifies i_size?
-                        */
-                       clear_buffer_dirty(bh);
-                       set_buffer_uptodate(bh);
-                       continue;
-               }
-
-               /* Clean buffers are not written out, so no need to map them. */
-               if (!buffer_dirty(bh))
-                       continue;
-
-               /* Make sure we have enough initialized size. */
-               if (unlikely((block >= iblock) &&
-                               (initialized_size < i_size))) {
-                       /*
-                        * If this folio is fully outside initialized
-                        * size, zero out all folios between the current
-                        * initialized size and the current folio. Just
-                        * use ntfs_read_folio() to do the zeroing
-                        * transparently.
-                        */
-                       if (block > iblock) {
-                               // TODO:
-                               // For each folio do:
-                               // - read_cache_folio()
-                               // Again for each folio do:
-                               // - wait_on_folio_locked()
-                               // - Check (folio_test_uptodate(folio) &&
-                               //              !folio_test_error(folio))
-                               // Update initialized size in the attribute and
-                               // in the inode.
-                               // Again, for each folio do:
-                               //      block_dirty_folio();
-                               // folio_put()
-                               // We don't need to wait on the writes.
-                               // Update iblock.
-                       }
-                       /*
-                        * The current folio straddles initialized size. Zero
-                        * all non-uptodate buffers and set them uptodate (and
-                        * dirty?). Note, there aren't any non-uptodate buffers
-                        * if the folio is uptodate.
-                        * FIXME: For an uptodate folio, the buffers may need to
-                        * be written out because they were not initialized on
-                        * disk before.
-                        */
-                       if (!folio_test_uptodate(folio)) {
-                               // TODO:
-                               // Zero any non-uptodate buffers up to i_size.
-                               // Set them uptodate and dirty.
-                       }
-                       // TODO:
-                       // Update initialized size in the attribute and in the
-                       // inode (up to i_size).
-                       // Update iblock.
-                       // FIXME: This is inefficient. Try to batch the two
-                       // size changes to happen in one go.
-                       ntfs_error(vol->sb, "Writing beyond initialized size "
-                                       "is not supported yet. Sorry.");
-                       err = -EOPNOTSUPP;
-                       break;
-                       // Do NOT set_buffer_new() BUT DO clear buffer range
-                       // outside write request range.
-                       // set_buffer_uptodate() on complete buffers as well as
-                       // set_buffer_dirty().
-               }
-
-               /* No need to map buffers that are already mapped. */
-               if (buffer_mapped(bh))
-                       continue;
-
-               /* Unmapped, dirty buffer. Need to map it. */
-               bh->b_bdev = vol->sb->s_bdev;
-
-               /* Convert block into corresponding vcn and offset. */
-               vcn = (VCN)block << blocksize_bits;
-               vcn_ofs = vcn & vol->cluster_size_mask;
-               vcn >>= vol->cluster_size_bits;
-               if (!rl) {
-lock_retry_remap:
-                       down_read(&ni->runlist.lock);
-                       rl = ni->runlist.rl;
-               }
-               if (likely(rl != NULL)) {
-                       /* Seek to element containing target vcn. */
-                       while (rl->length && rl[1].vcn <= vcn)
-                               rl++;
-                       lcn = ntfs_rl_vcn_to_lcn(rl, vcn);
-               } else
-                       lcn = LCN_RL_NOT_MAPPED;
-               /* Successful remap. */
-               if (lcn >= 0) {
-                       /* Setup buffer head to point to correct block. */
-                       bh->b_blocknr = ((lcn << vol->cluster_size_bits) +
-                                       vcn_ofs) >> blocksize_bits;
-                       set_buffer_mapped(bh);
-                       continue;
-               }
-               /* It is a hole, need to instantiate it. */
-               if (lcn == LCN_HOLE) {
-                       u8 *kaddr;
-                       unsigned long *bpos, *bend;
-
-                       /* Check if the buffer is zero. */
-                       kaddr = kmap_local_folio(folio, bh_offset(bh));
-                       bpos = (unsigned long *)kaddr;
-                       bend = (unsigned long *)(kaddr + blocksize);
-                       do {
-                               if (unlikely(*bpos))
-                                       break;
-                       } while (likely(++bpos < bend));
-                       kunmap_local(kaddr);
-                       if (bpos == bend) {
-                               /*
-                                * Buffer is zero and sparse, no need to write
-                                * it.
-                                */
-                               bh->b_blocknr = -1;
-                               clear_buffer_dirty(bh);
-                               continue;
-                       }
-                       // TODO: Instantiate the hole.
-                       // clear_buffer_new(bh);
-                       // clean_bdev_bh_alias(bh);
-                       ntfs_error(vol->sb, "Writing into sparse regions is "
-                                       "not supported yet. Sorry.");
-                       err = -EOPNOTSUPP;
-                       break;
-               }
-               /* If first try and runlist unmapped, map and retry. */
-               if (!is_retry && lcn == LCN_RL_NOT_MAPPED) {
-                       is_retry = true;
-                       /*
-                        * Attempt to map runlist, dropping lock for
-                        * the duration.
-                        */
-                       up_read(&ni->runlist.lock);
-                       err = ntfs_map_runlist(ni, vcn);
-                       if (likely(!err))
-                               goto lock_retry_remap;
-                       rl = NULL;
-               } else if (!rl)
-                       up_read(&ni->runlist.lock);
-               /*
-                * If buffer is outside the runlist, truncate has cut it out
-                * of the runlist.  Just clean and clear the buffer and set it
-                * uptodate so it can get discarded by the VM.
-                */
-               if (err == -ENOENT || lcn == LCN_ENOENT) {
-                       bh->b_blocknr = -1;
-                       clear_buffer_dirty(bh);
-                       folio_zero_range(folio, bh_offset(bh), blocksize);
-                       set_buffer_uptodate(bh);
-                       err = 0;
-                       continue;
-               }
-               /* Failed to map the buffer, even after retrying. */
-               if (!err)
-                       err = -EIO;
-               bh->b_blocknr = -1;
-               ntfs_error(vol->sb, "Failed to write to inode 0x%lx, "
-                               "attribute type 0x%x, vcn 0x%llx, offset 0x%x "
-                               "because its location on disk could not be "
-                               "determined%s (error code %i).", ni->mft_no,
-                               ni->type, (unsigned long long)vcn,
-                               vcn_ofs, is_retry ? " even after "
-                               "retrying" : "", err);
-               break;
-       } while (block++, (bh = bh->b_this_page) != head);
-
-       /* Release the lock if we took it. */
-       if (rl)
-               up_read(&ni->runlist.lock);
-
-       /* For the error case, need to reset bh to the beginning. */
-       bh = head;
-
-       /* Just an optimization, so ->read_folio() is not called later. */
-       if (unlikely(!folio_test_uptodate(folio))) {
-               int uptodate = 1;
-               do {
-                       if (!buffer_uptodate(bh)) {
-                               uptodate = 0;
-                               bh = head;
-                               break;
-                       }
-               } while ((bh = bh->b_this_page) != head);
-               if (uptodate)
-                       folio_mark_uptodate(folio);
-       }
-
-       /* Setup all mapped, dirty buffers for async write i/o. */
-       do {
-               if (buffer_mapped(bh) && buffer_dirty(bh)) {
-                       lock_buffer(bh);
-                       if (test_clear_buffer_dirty(bh)) {
-                               BUG_ON(!buffer_uptodate(bh));
-                               mark_buffer_async_write(bh);
-                       } else
-                               unlock_buffer(bh);
-               } else if (unlikely(err)) {
-                       /*
-                        * For the error case. The buffer may have been set
-                        * dirty during attachment to a dirty folio.
-                        */
-                       if (err != -ENOMEM)
-                               clear_buffer_dirty(bh);
-               }
-       } while ((bh = bh->b_this_page) != head);
-
-       if (unlikely(err)) {
-               // TODO: Remove the -EOPNOTSUPP check later on...
-               if (unlikely(err == -EOPNOTSUPP))
-                       err = 0;
-               else if (err == -ENOMEM) {
-                       ntfs_warning(vol->sb, "Error allocating memory. "
-                                       "Redirtying folio so we try again "
-                                       "later.");
-                       /*
-                        * Put the folio back on mapping->dirty_pages, but
-                        * leave its buffer's dirty state as-is.
-                        */
-                       folio_redirty_for_writepage(wbc, folio);
-                       err = 0;
-               } else
-                       folio_set_error(folio);
-       }
-
-       BUG_ON(folio_test_writeback(folio));
-       folio_start_writeback(folio);   /* Keeps try_to_free_buffers() away. */
-
-       /* Submit the prepared buffers for i/o. */
-       need_end_writeback = true;
-       do {
-               struct buffer_head *next = bh->b_this_page;
-               if (buffer_async_write(bh)) {
-                       submit_bh(REQ_OP_WRITE, bh);
-                       need_end_writeback = false;
-               }
-               bh = next;
-       } while (bh != head);
-       folio_unlock(folio);
-
-       /* If no i/o was started, need to end writeback here. */
-       if (unlikely(need_end_writeback))
-               folio_end_writeback(folio);
-
-       ntfs_debug("Done.");
-       return err;
-}
-
-/**
- * ntfs_write_mst_block - write a @page to the backing store
- * @page:      page cache page to write out
- * @wbc:       writeback control structure
- *
- * This function is for writing pages belonging to non-resident, mst protected
- * attributes to their backing store.  The only supported attributes are index
- * allocation and $MFT/$DATA.  Both directory inodes and index inodes are
- * supported for the index allocation case.
- *
- * The page must remain locked for the duration of the write because we apply
- * the mst fixups, write, and then undo the fixups, so if we were to unlock the
- * page before undoing the fixups, any other user of the page will see the
- * page contents as corrupt.
- *
- * We clear the page uptodate flag for the duration of the function to ensure
- * exclusion for the $MFT/$DATA case against someone mapping an mft record we
- * are about to apply the mst fixups to.
- *
- * Return 0 on success and -errno on error.
- *
- * Based on ntfs_write_block(), ntfs_mft_writepage(), and
- * write_mft_record_nolock().
- */
-static int ntfs_write_mst_block(struct page *page,
-               struct writeback_control *wbc)
-{
-       sector_t block, dblock, rec_block;
-       struct inode *vi = page->mapping->host;
-       ntfs_inode *ni = NTFS_I(vi);
-       ntfs_volume *vol = ni->vol;
-       u8 *kaddr;
-       unsigned int rec_size = ni->itype.index.block_size;
-       ntfs_inode *locked_nis[PAGE_SIZE / NTFS_BLOCK_SIZE];
-       struct buffer_head *bh, *head, *tbh, *rec_start_bh;
-       struct buffer_head *bhs[MAX_BUF_PER_PAGE];
-       runlist_element *rl;
-       int i, nr_locked_nis, nr_recs, nr_bhs, max_bhs, bhs_per_rec, err, err2;
-       unsigned bh_size, rec_size_bits;
-       bool sync, is_mft, page_is_dirty, rec_is_dirty;
-       unsigned char bh_size_bits;
-
-       if (WARN_ON(rec_size < NTFS_BLOCK_SIZE))
-               return -EINVAL;
-
-       ntfs_debug("Entering for inode 0x%lx, attribute type 0x%x, page index "
-                       "0x%lx.", vi->i_ino, ni->type, page->index);
-       BUG_ON(!NInoNonResident(ni));
-       BUG_ON(!NInoMstProtected(ni));
-       is_mft = (S_ISREG(vi->i_mode) && !vi->i_ino);
-       /*
-        * NOTE: ntfs_write_mst_block() would be called for $MFTMirr if a page
-        * in its page cache were to be marked dirty.  However, this should
-        * never happen with the current driver and, considering we do not
-        * handle this case here, we do want to BUG(), at least for now.
-        */
-       BUG_ON(!(is_mft || S_ISDIR(vi->i_mode) ||
-                       (NInoAttr(ni) && ni->type == AT_INDEX_ALLOCATION)));
-       bh_size = vol->sb->s_blocksize;
-       bh_size_bits = vol->sb->s_blocksize_bits;
-       max_bhs = PAGE_SIZE / bh_size;
-       BUG_ON(!max_bhs);
-       BUG_ON(max_bhs > MAX_BUF_PER_PAGE);
-
-       /* Were we called for sync purposes? */
-       sync = (wbc->sync_mode == WB_SYNC_ALL);
-
-       /* Make sure we have mapped buffers. */
-       bh = head = page_buffers(page);
-       BUG_ON(!bh);
-
-       rec_size_bits = ni->itype.index.block_size_bits;
-       BUG_ON(!(PAGE_SIZE >> rec_size_bits));
-       bhs_per_rec = rec_size >> bh_size_bits;
-       BUG_ON(!bhs_per_rec);
-
-       /* The first block in the page. */
-       rec_block = block = (sector_t)page->index <<
-                       (PAGE_SHIFT - bh_size_bits);
-
-       /* The first out of bounds block for the data size. */
-       dblock = (i_size_read(vi) + bh_size - 1) >> bh_size_bits;
-
-       rl = NULL;
-       err = err2 = nr_bhs = nr_recs = nr_locked_nis = 0;
-       page_is_dirty = rec_is_dirty = false;
-       rec_start_bh = NULL;
-       do {
-               bool is_retry = false;
-
-               if (likely(block < rec_block)) {
-                       if (unlikely(block >= dblock)) {
-                               clear_buffer_dirty(bh);
-                               set_buffer_uptodate(bh);
-                               continue;
-                       }
-                       /*
-                        * This block is not the first one in the record.  We
-                        * ignore the buffer's dirty state because we could
-                        * have raced with a parallel mark_ntfs_record_dirty().
-                        */
-                       if (!rec_is_dirty)
-                               continue;
-                       if (unlikely(err2)) {
-                               if (err2 != -ENOMEM)
-                                       clear_buffer_dirty(bh);
-                               continue;
-                       }
-               } else /* if (block == rec_block) */ {
-                       BUG_ON(block > rec_block);
-                       /* This block is the first one in the record. */
-                       rec_block += bhs_per_rec;
-                       err2 = 0;
-                       if (unlikely(block >= dblock)) {
-                               clear_buffer_dirty(bh);
-                               continue;
-                       }
-                       if (!buffer_dirty(bh)) {
-                               /* Clean records are not written out. */
-                               rec_is_dirty = false;
-                               continue;
-                       }
-                       rec_is_dirty = true;
-                       rec_start_bh = bh;
-               }
-               /* Need to map the buffer if it is not mapped already. */
-               if (unlikely(!buffer_mapped(bh))) {
-                       VCN vcn;
-                       LCN lcn;
-                       unsigned int vcn_ofs;
-
-                       bh->b_bdev = vol->sb->s_bdev;
-                       /* Obtain the vcn and offset of the current block. */
-                       vcn = (VCN)block << bh_size_bits;
-                       vcn_ofs = vcn & vol->cluster_size_mask;
-                       vcn >>= vol->cluster_size_bits;
-                       if (!rl) {
-lock_retry_remap:
-                               down_read(&ni->runlist.lock);
-                               rl = ni->runlist.rl;
-                       }
-                       if (likely(rl != NULL)) {
-                               /* Seek to element containing target vcn. */
-                               while (rl->length && rl[1].vcn <= vcn)
-                                       rl++;
-                               lcn = ntfs_rl_vcn_to_lcn(rl, vcn);
-                       } else
-                               lcn = LCN_RL_NOT_MAPPED;
-                       /* Successful remap. */
-                       if (likely(lcn >= 0)) {
-                               /* Setup buffer head to correct block. */
-                               bh->b_blocknr = ((lcn <<
-                                               vol->cluster_size_bits) +
-                                               vcn_ofs) >> bh_size_bits;
-                               set_buffer_mapped(bh);
-                       } else {
-                               /*
-                                * Remap failed.  Retry mapping the runlist
-                                * once, unless we are working on $MFT, which
-                                * always has the whole of its runlist in
-                                * memory.
-                                */
-                               if (!is_mft && !is_retry &&
-                                               lcn == LCN_RL_NOT_MAPPED) {
-                                       is_retry = true;
-                                       /*
-                                        * Attempt to map runlist, dropping
-                                        * lock for the duration.
-                                        */
-                                       up_read(&ni->runlist.lock);
-                                       err2 = ntfs_map_runlist(ni, vcn);
-                                       if (likely(!err2))
-                                               goto lock_retry_remap;
-                                       if (err2 == -ENOMEM)
-                                               page_is_dirty = true;
-                                       lcn = err2;
-                               } else {
-                                       err2 = -EIO;
-                                       if (!rl)
-                                               up_read(&ni->runlist.lock);
-                               }
-                               /* Hard error.  Abort writing this record. */
-                               if (!err || err == -ENOMEM)
-                                       err = err2;
-                               bh->b_blocknr = -1;
-                               ntfs_error(vol->sb, "Cannot write ntfs record "
-                                               "0x%llx (inode 0x%lx, "
-                                               "attribute type 0x%x) because "
-                                               "its location on disk could "
-                                               "not be determined (error "
-                                               "code %lli).",
-                                               (long long)block <<
-                                               bh_size_bits >>
-                                               vol->mft_record_size_bits,
-                                               ni->mft_no, ni->type,
-                                               (long long)lcn);
-                               /*
-                                * If this is not the first buffer, remove the
-                                * buffers in this record from the list of
-                                * buffers to write and clear their dirty bit
-                                * if not error -ENOMEM.
-                                */
-                               if (rec_start_bh != bh) {
-                                       while (bhs[--nr_bhs] != rec_start_bh)
-                                               ;
-                                       if (err2 != -ENOMEM) {
-                                               do {
-                                                       clear_buffer_dirty(
-                                                               rec_start_bh);
-                                               } while ((rec_start_bh =
-                                                               rec_start_bh->
-                                                               b_this_page) !=
-                                                               bh);
-                                       }
-                               }
-                               continue;
-                       }
-               }
-               BUG_ON(!buffer_uptodate(bh));
-               BUG_ON(nr_bhs >= max_bhs);
-               bhs[nr_bhs++] = bh;
-       } while (block++, (bh = bh->b_this_page) != head);
-       if (unlikely(rl))
-               up_read(&ni->runlist.lock);
-       /* If there were no dirty buffers, we are done. */
-       if (!nr_bhs)
-               goto done;
-       /* Map the page so we can access its contents. */
-       kaddr = kmap(page);
-       /* Clear the page uptodate flag whilst the mst fixups are applied. */
-       BUG_ON(!PageUptodate(page));
-       ClearPageUptodate(page);
-       for (i = 0; i < nr_bhs; i++) {
-               unsigned int ofs;
-
-               /* Skip buffers which are not at the beginning of records. */
-               if (i % bhs_per_rec)
-                       continue;
-               tbh = bhs[i];
-               ofs = bh_offset(tbh);
-               if (is_mft) {
-                       ntfs_inode *tni;
-                       unsigned long mft_no;
-
-                       /* Get the mft record number. */
-                       mft_no = (((s64)page->index << PAGE_SHIFT) + ofs)
-                                       >> rec_size_bits;
-                       /* Check whether to write this mft record. */
-                       tni = NULL;
-                       if (!ntfs_may_write_mft_record(vol, mft_no,
-                                       (MFT_RECORD*)(kaddr + ofs), &tni)) {
-                               /*
-                                * The record should not be written.  This
-                                * means we need to redirty the page before
-                                * returning.
-                                */
-                               page_is_dirty = true;
-                               /*
-                                * Remove the buffers in this mft record from
-                                * the list of buffers to write.
-                                */
-                               do {
-                                       bhs[i] = NULL;
-                               } while (++i % bhs_per_rec);
-                               continue;
-                       }
-                       /*
-                        * The record should be written.  If a locked ntfs
-                        * inode was returned, add it to the array of locked
-                        * ntfs inodes.
-                        */
-                       if (tni)
-                               locked_nis[nr_locked_nis++] = tni;
-               }
-               /* Apply the mst protection fixups. */
-               err2 = pre_write_mst_fixup((NTFS_RECORD*)(kaddr + ofs),
-                               rec_size);
-               if (unlikely(err2)) {
-                       if (!err || err == -ENOMEM)
-                               err = -EIO;
-                       ntfs_error(vol->sb, "Failed to apply mst fixups "
-                                       "(inode 0x%lx, attribute type 0x%x, "
-                                       "page index 0x%lx, page offset 0x%x)!"
-                                       "  Unmount and run chkdsk.", vi->i_ino,
-                                       ni->type, page->index, ofs);
-                       /*
-                        * Mark all the buffers in this record clean as we do
-                        * not want to write corrupt data to disk.
-                        */
-                       do {
-                               clear_buffer_dirty(bhs[i]);
-                               bhs[i] = NULL;
-                       } while (++i % bhs_per_rec);
-                       continue;
-               }
-               nr_recs++;
-       }
-       /* If no records are to be written out, we are done. */
-       if (!nr_recs)
-               goto unm_done;
-       flush_dcache_page(page);
-       /* Lock buffers and start synchronous write i/o on them. */
-       for (i = 0; i < nr_bhs; i++) {
-               tbh = bhs[i];
-               if (!tbh)
-                       continue;
-               if (!trylock_buffer(tbh))
-                       BUG();
-               /* The buffer dirty state is now irrelevant, just clean it. */
-               clear_buffer_dirty(tbh);
-               BUG_ON(!buffer_uptodate(tbh));
-               BUG_ON(!buffer_mapped(tbh));
-               get_bh(tbh);
-               tbh->b_end_io = end_buffer_write_sync;
-               submit_bh(REQ_OP_WRITE, tbh);
-       }
-       /* Synchronize the mft mirror now if not @sync. */
-       if (is_mft && !sync)
-               goto do_mirror;
-do_wait:
-       /* Wait on i/o completion of buffers. */
-       for (i = 0; i < nr_bhs; i++) {
-               tbh = bhs[i];
-               if (!tbh)
-                       continue;
-               wait_on_buffer(tbh);
-               if (unlikely(!buffer_uptodate(tbh))) {
-                       ntfs_error(vol->sb, "I/O error while writing ntfs "
-                                       "record buffer (inode 0x%lx, "
-                                       "attribute type 0x%x, page index "
-                                       "0x%lx, page offset 0x%lx)!  Unmount "
-                                       "and run chkdsk.", vi->i_ino, ni->type,
-                                       page->index, bh_offset(tbh));
-                       if (!err || err == -ENOMEM)
-                               err = -EIO;
-                       /*
-                        * Set the buffer uptodate so the page and buffer
-                        * states do not become out of sync.
-                        */
-                       set_buffer_uptodate(tbh);
-               }
-       }
-       /* If @sync, now synchronize the mft mirror. */
-       if (is_mft && sync) {
-do_mirror:
-               for (i = 0; i < nr_bhs; i++) {
-                       unsigned long mft_no;
-                       unsigned int ofs;
-
-                       /*
-                        * Skip buffers which are not at the beginning of
-                        * records.
-                        */
-                       if (i % bhs_per_rec)
-                               continue;
-                       tbh = bhs[i];
-                       /* Skip removed buffers (and hence records). */
-                       if (!tbh)
-                               continue;
-                       ofs = bh_offset(tbh);
-                       /* Get the mft record number. */
-                       mft_no = (((s64)page->index << PAGE_SHIFT) + ofs)
-                                       >> rec_size_bits;
-                       if (mft_no < vol->mftmirr_size)
-                               ntfs_sync_mft_mirror(vol, mft_no,
-                                               (MFT_RECORD*)(kaddr + ofs),
-                                               sync);
-               }
-               if (!sync)
-                       goto do_wait;
-       }
-       /* Remove the mst protection fixups again. */
-       for (i = 0; i < nr_bhs; i++) {
-               if (!(i % bhs_per_rec)) {
-                       tbh = bhs[i];
-                       if (!tbh)
-                               continue;
-                       post_write_mst_fixup((NTFS_RECORD*)(kaddr +
-                                       bh_offset(tbh)));
-               }
-       }
-       flush_dcache_page(page);
-unm_done:
-       /* Unlock any locked inodes. */
-       while (nr_locked_nis-- > 0) {
-               ntfs_inode *tni, *base_tni;
-               
-               tni = locked_nis[nr_locked_nis];
-               /* Get the base inode. */
-               mutex_lock(&tni->extent_lock);
-               if (tni->nr_extents >= 0)
-                       base_tni = tni;
-               else {
-                       base_tni = tni->ext.base_ntfs_ino;
-                       BUG_ON(!base_tni);
-               }
-               mutex_unlock(&tni->extent_lock);
-               ntfs_debug("Unlocking %s inode 0x%lx.",
-                               tni == base_tni ? "base" : "extent",
-                               tni->mft_no);
-               mutex_unlock(&tni->mrec_lock);
-               atomic_dec(&tni->count);
-               iput(VFS_I(base_tni));
-       }
-       SetPageUptodate(page);
-       kunmap(page);
-done:
-       if (unlikely(err && err != -ENOMEM)) {
-               /*
-                * Set page error if there is only one ntfs record in the page.
-                * Otherwise we would lose per-record granularity.
-                */
-               if (ni->itype.index.block_size == PAGE_SIZE)
-                       SetPageError(page);
-               NVolSetErrors(vol);
-       }
-       if (page_is_dirty) {
-               ntfs_debug("Page still contains one or more dirty ntfs "
-                               "records.  Redirtying the page starting at "
-                               "record 0x%lx.", page->index <<
-                               (PAGE_SHIFT - rec_size_bits));
-               redirty_page_for_writepage(wbc, page);
-               unlock_page(page);
-       } else {
-               /*
-                * Keep the VM happy.  This must be done otherwise the
-                * radix-tree tag PAGECACHE_TAG_DIRTY remains set even though
-                * the page is clean.
-                */
-               BUG_ON(PageWriteback(page));
-               set_page_writeback(page);
-               unlock_page(page);
-               end_page_writeback(page);
-       }
-       if (likely(!err))
-               ntfs_debug("Done.");
-       return err;
-}
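
Condensing the above, the per-record round-trip described in the function comment (apply the fixups in place, write, undo the fixups before anyone else can see the buffer) reduces to roughly the following. This is a hedged sketch with a hypothetical helper name, not the batched logic the driver actually uses:

	static int write_one_mst_record_sketch(u8 *rec, u32 rec_size,
			struct buffer_head *bh)
	{
		/* Fixups are applied in place, so the page must stay locked
		 * (and, for $MFT/$DATA, !uptodate) until they are undone. */
		int err = pre_write_mst_fixup((NTFS_RECORD *)rec, rec_size);

		if (err)
			return err;	/* Corrupt record: do not write it. */
		lock_buffer(bh);
		clear_buffer_dirty(bh);
		get_bh(bh);
		bh->b_end_io = end_buffer_write_sync;
		submit_bh(REQ_OP_WRITE, bh);
		wait_on_buffer(bh);
		/* Undo the in-memory fixups so other users see valid data. */
		post_write_mst_fixup((NTFS_RECORD *)rec);
		return buffer_uptodate(bh) ? 0 : -EIO;
	}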
-
-/**
- * ntfs_writepage - write a @page to the backing store
- * @page:      page cache page to write out
- * @wbc:       writeback control structure
- *
- * This is called from the VM when it wants to have a dirty ntfs page cache
- * page cleaned.  The VM has already locked the page and marked it clean.
- *
- * For non-resident attributes, ntfs_writepage() writes the @page by calling
- * the ntfs version of the generic block_write_full_folio() function,
- * ntfs_write_block(), which in turn if necessary creates and writes the
- * buffers associated with the page asynchronously.
- *
- * For resident attributes, OTOH, ntfs_writepage() writes the @page by copying
- * the data to the mft record (which at this stage is most likely in memory).
- * The mft record is then marked dirty and written out asynchronously via the
- * vfs inode dirty code path for the inode the mft record belongs to or via the
- * vm page dirty code path for the page the mft record is in.
- *
- * Based on ntfs_read_folio() and fs/buffer.c::block_write_full_folio().
- *
- * Return 0 on success and -errno on error.
- */
-static int ntfs_writepage(struct page *page, struct writeback_control *wbc)
-{
-       struct folio *folio = page_folio(page);
-       loff_t i_size;
-       struct inode *vi = folio->mapping->host;
-       ntfs_inode *base_ni = NULL, *ni = NTFS_I(vi);
-       char *addr;
-       ntfs_attr_search_ctx *ctx = NULL;
-       MFT_RECORD *m = NULL;
-       u32 attr_len;
-       int err;
-
-retry_writepage:
-       BUG_ON(!folio_test_locked(folio));
-       i_size = i_size_read(vi);
-       /* Is the folio fully outside i_size? (truncate in progress) */
-       if (unlikely(folio->index >= (i_size + PAGE_SIZE - 1) >>
-                       PAGE_SHIFT)) {
-               /*
-                * The folio may have dirty, unmapped buffers.  Make them
-                * freeable here, so the page does not leak.
-                */
-               block_invalidate_folio(folio, 0, folio_size(folio));
-               folio_unlock(folio);
-               ntfs_debug("Write outside i_size - truncated?");
-               return 0;
-       }
-       /*
-        * Only $DATA attributes can be encrypted and only unnamed $DATA
-        * attributes can be compressed.  Index root can have the flags set but
-        * this means to create compressed/encrypted files, not that the
-        * attribute is compressed/encrypted.  Note we need to check for
-        * AT_INDEX_ALLOCATION since this is the type of both directory and
-        * index inodes.
-        */
-       if (ni->type != AT_INDEX_ALLOCATION) {
-               /* If file is encrypted, deny access, just like NT4. */
-               if (NInoEncrypted(ni)) {
-                       folio_unlock(folio);
-                       BUG_ON(ni->type != AT_DATA);
-                       ntfs_debug("Denying write access to encrypted file.");
-                       return -EACCES;
-               }
-               /* Compressed data streams are handled in compress.c. */
-               if (NInoNonResident(ni) && NInoCompressed(ni)) {
-                       BUG_ON(ni->type != AT_DATA);
-                       BUG_ON(ni->name_len);
-                       // TODO: Implement and replace this with
-                       // return ntfs_write_compressed_block(page);
-                       folio_unlock(folio);
-                       ntfs_error(vi->i_sb, "Writing to compressed files is "
-                                       "not supported yet.  Sorry.");
-                       return -EOPNOTSUPP;
-               }
-               // TODO: Implement and remove this check.
-               if (NInoNonResident(ni) && NInoSparse(ni)) {
-                       folio_unlock(folio);
-                       ntfs_error(vi->i_sb, "Writing to sparse files is not "
-                                       "supported yet.  Sorry.");
-                       return -EOPNOTSUPP;
-               }
-       }
-       /* NInoNonResident() == NInoIndexAllocPresent() */
-       if (NInoNonResident(ni)) {
-               /* We have to zero every time due to mmap-at-end-of-file. */
-               if (folio->index >= (i_size >> PAGE_SHIFT)) {
-                       /* The folio straddles i_size. */
-                       unsigned int ofs = i_size & (folio_size(folio) - 1);
-                       folio_zero_segment(folio, ofs, folio_size(folio));
-               }
-               /* Handle mst protected attributes. */
-               if (NInoMstProtected(ni))
-                       return ntfs_write_mst_block(page, wbc);
-               /* Normal, non-resident data stream. */
-               return ntfs_write_block(folio, wbc);
-       }
-       /*
-        * Attribute is resident, implying it is not compressed, encrypted, or
-        * mst protected.  This also means the attribute is smaller than an mft
-        * record and hence smaller than a folio, so we can simply return an
-        * error on any folios with index above 0.  Note the attribute can
-        * actually be marked compressed but if it is resident the actual data
-        * is not compressed, so we are ok to ignore the compressed flag here.
-        */
-       BUG_ON(folio_buffers(folio));
-       BUG_ON(!folio_test_uptodate(folio));
-       if (unlikely(folio->index > 0)) {
-               ntfs_error(vi->i_sb, "BUG()! folio->index (0x%lx) > 0.  "
-                               "Aborting write.", folio->index);
-               BUG_ON(folio_test_writeback(folio));
-               folio_start_writeback(folio);
-               folio_unlock(folio);
-               folio_end_writeback(folio);
-               return -EIO;
-       }
-       if (!NInoAttr(ni))
-               base_ni = ni;
-       else
-               base_ni = ni->ext.base_ntfs_ino;
-       /* Map, pin, and lock the mft record. */
-       m = map_mft_record(base_ni);
-       if (IS_ERR(m)) {
-               err = PTR_ERR(m);
-               m = NULL;
-               ctx = NULL;
-               goto err_out;
-       }
-       /*
-        * If a parallel write made the attribute non-resident, drop the mft
-        * record and retry the writepage.
-        */
-       if (unlikely(NInoNonResident(ni))) {
-               unmap_mft_record(base_ni);
-               goto retry_writepage;
-       }
-       ctx = ntfs_attr_get_search_ctx(base_ni, m);
-       if (unlikely(!ctx)) {
-               err = -ENOMEM;
-               goto err_out;
-       }
-       err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
-                       CASE_SENSITIVE, 0, NULL, 0, ctx);
-       if (unlikely(err))
-               goto err_out;
-       /*
-        * Keep the VM happy.  This must be done otherwise
-        * PAGECACHE_TAG_DIRTY remains set even though the folio is clean.
-        */
-       BUG_ON(folio_test_writeback(folio));
-       folio_start_writeback(folio);
-       folio_unlock(folio);
-       attr_len = le32_to_cpu(ctx->attr->data.resident.value_length);
-       i_size = i_size_read(vi);
-       if (unlikely(attr_len > i_size)) {
-               /* Race with shrinking truncate or a failed truncate. */
-               attr_len = i_size;
-               /*
-                * If the truncate failed, fix it up now.  If there is a
-                * concurrent truncate, we do its job, so it does not have to
-                * do anything.
-                */
-               err = ntfs_resident_attr_value_resize(ctx->mrec, ctx->attr,
-                               attr_len);
-               /* Shrinking cannot fail. */
-               BUG_ON(err);
-       }
-       addr = kmap_local_folio(folio, 0);
-       /* Copy the data from the folio to the mft record. */
-       memcpy((u8*)ctx->attr +
-                       le16_to_cpu(ctx->attr->data.resident.value_offset),
-                       addr, attr_len);
-       /* Zero out of bounds area in the page cache folio. */
-       memset(addr + attr_len, 0, folio_size(folio) - attr_len);
-       kunmap_local(addr);
-       flush_dcache_folio(folio);
-       flush_dcache_mft_record_page(ctx->ntfs_ino);
-       /* We are done with the folio. */
-       folio_end_writeback(folio);
-       /* Finally, mark the mft record dirty, so it gets written back. */
-       mark_mft_record_dirty(ctx->ntfs_ino);
-       ntfs_attr_put_search_ctx(ctx);
-       unmap_mft_record(base_ni);
-       return 0;
-err_out:
-       if (err == -ENOMEM) {
-               ntfs_warning(vi->i_sb, "Error allocating memory. Redirtying "
-                               "page so we try again later.");
-               /*
-                * Put the folio back on mapping->dirty_pages, but leave its
-                * buffers' dirty state as-is.
-                */
-               folio_redirty_for_writepage(wbc, folio);
-               err = 0;
-       } else {
-               ntfs_error(vi->i_sb, "Resident attribute write failed with "
-                               "error %i.", err);
-               folio_set_error(folio);
-               NVolSetErrors(ni->vol);
-       }
-       folio_unlock(folio);
-       if (ctx)
-               ntfs_attr_put_search_ctx(ctx);
-       if (m)
-               unmap_mft_record(base_ni);
-       return err;
-}
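
The retry_writepage label implements a lock-map-revalidate pattern: the attribute may go non-resident between the folio checks and map_mft_record(), so the state is re-checked after mapping and the whole dispatch redone if it changed. Stripped of the surrounding detail (hedged skeleton, hypothetical function name):

	static int resident_writepage_sketch(struct folio *folio,
			ntfs_inode *ni, ntfs_inode *base_ni)
	{
		MFT_RECORD *m;

	retry_writepage:
		/* ... folio checks and the non-resident dispatch go here ... */
		m = map_mft_record(base_ni);
		if (IS_ERR(m))
			return PTR_ERR(m);
		if (unlikely(NInoNonResident(ni))) {
			/*
			 * Lost a race with resident-to-non-resident
			 * conversion: drop the mft record and redo the
			 * dispatch, which now takes the non-resident path.
			 */
			unmap_mft_record(base_ni);
			goto retry_writepage;
		}
		/* ... resident write proceeds with @m mapped and pinned ... */
		unmap_mft_record(base_ni);
		return 0;
	}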
-
-#endif /* NTFS_RW */
-
-/**
- * ntfs_bmap - map logical file block to physical device block
- * @mapping:   address space mapping to which the block to be mapped belongs
- * @block:     logical block to map to its physical device block
- *
- * For regular, non-resident files (i.e. not compressed and not encrypted), map
- * the logical @block belonging to the file described by the address space
- * mapping @mapping to its physical device block.
- *
- * The size of the block is equal to the @s_blocksize field of the super block
- * of the mounted file system, which is guaranteed to be smaller than or equal
- * to the cluster size.  Thus the block is guaranteed to fit entirely inside
- * the cluster, which means we do not need to care how many contiguous bytes
- * are available after the beginning of the block.
- *
- * Return the physical device block if the mapping succeeded or 0 if the block
- * is sparse or there was an error.
- *
- * Note: This is a problem if someone tries to run bmap() on the $Boot system
- * file, as that really is in block zero, but there is nothing we can do.
- * bmap() is just broken in that respect (just like it cannot distinguish
- * sparse from not available or error).
- */
-static sector_t ntfs_bmap(struct address_space *mapping, sector_t block)
-{
-       s64 ofs, size;
-       loff_t i_size;
-       LCN lcn;
-       unsigned long blocksize, flags;
-       ntfs_inode *ni = NTFS_I(mapping->host);
-       ntfs_volume *vol = ni->vol;
-       unsigned delta;
-       unsigned char blocksize_bits, cluster_size_shift;
-
-       ntfs_debug("Entering for mft_no 0x%lx, logical block 0x%llx.",
-                       ni->mft_no, (unsigned long long)block);
-       if (ni->type != AT_DATA || !NInoNonResident(ni) || NInoEncrypted(ni)) {
-               ntfs_error(vol->sb, "BMAP does not make sense for %s "
-                               "attributes, returning 0.",
-                               (ni->type != AT_DATA) ? "non-data" :
-                               (!NInoNonResident(ni) ? "resident" :
-                               "encrypted"));
-               return 0;
-       }
-       /* None of these can happen. */
-       BUG_ON(NInoCompressed(ni));
-       BUG_ON(NInoMstProtected(ni));
-       blocksize = vol->sb->s_blocksize;
-       blocksize_bits = vol->sb->s_blocksize_bits;
-       ofs = (s64)block << blocksize_bits;
-       read_lock_irqsave(&ni->size_lock, flags);
-       size = ni->initialized_size;
-       i_size = i_size_read(VFS_I(ni));
-       read_unlock_irqrestore(&ni->size_lock, flags);
-       /*
-        * If the offset is outside the initialized size or the block straddles
-        * the initialized size then pretend it is a hole unless the
-        * initialized size equals the file size.
-        */
-       if (unlikely(ofs >= size || (ofs + blocksize > size && size < i_size)))
-               goto hole;
-       cluster_size_shift = vol->cluster_size_bits;
-       down_read(&ni->runlist.lock);
-       lcn = ntfs_attr_vcn_to_lcn_nolock(ni, ofs >> cluster_size_shift, false);
-       up_read(&ni->runlist.lock);
-       if (unlikely(lcn < LCN_HOLE)) {
-               /*
-                * Step down to an integer to avoid gcc doing a long long
-                * comparison in the switch when we know @lcn is between
-                * LCN_HOLE and LCN_EIO (i.e. -1 to -5).
-                *
-                * Otherwise older gcc (at least on some architectures) will
-                * try to use __cmpdi2() which is of course not available in
-                * the kernel.
-                */
-               switch ((int)lcn) {
-               case LCN_ENOENT:
-                       /*
-                        * If the offset is out of bounds then pretend it is a
-                        * hole.
-                        */
-                       goto hole;
-               case LCN_ENOMEM:
-                       ntfs_error(vol->sb, "Not enough memory to complete "
-                                       "mapping for inode 0x%lx.  "
-                                       "Returning 0.", ni->mft_no);
-                       break;
-               default:
-                       ntfs_error(vol->sb, "Failed to complete mapping for "
-                                       "inode 0x%lx.  Run chkdsk.  "
-                                       "Returning 0.", ni->mft_no);
-                       break;
-               }
-               return 0;
-       }
-       if (lcn < 0) {
-               /* It is a hole. */
-hole:
-               ntfs_debug("Done (returning hole).");
-               return 0;
-       }
-       /*
-        * The block is really allocated and fulfils all our criteria.
-        * Convert the cluster to units of block size and return the result.
-        */
-       delta = ofs & vol->cluster_size_mask;
-       if (unlikely(sizeof(block) < sizeof(lcn))) {
-               block = lcn = ((lcn << cluster_size_shift) + delta) >>
-                               blocksize_bits;
-               /* If the block number was truncated return 0. */
-               if (unlikely(block != lcn)) {
-                       ntfs_error(vol->sb, "Physical block 0x%llx is too "
-                                       "large to be returned, returning 0.",
-                                       (long long)lcn);
-                       return 0;
-               }
-       } else
-               block = ((lcn << cluster_size_shift) + delta) >>
-                               blocksize_bits;
-       ntfs_debug("Done (returning block 0x%llx).", (unsigned long long)block);
-       return block;
-}
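
For context, ->bmap is what services the FIBMAP ioctl, so the 0-for-hole-or-error convention above is directly visible from userspace. A hypothetical probe (plain C; FIBMAP requires CAP_SYS_RAWIO):

	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/ioctl.h>
	#include <linux/fs.h>		/* FIBMAP */

	int main(int argc, char **argv)
	{
		int fd = open(argv[1], O_RDONLY);
		int block = 0;	/* in: logical block; out: physical block */

		if (fd < 0 || ioctl(fd, FIBMAP, &block) < 0)
			return 1;
		/* 0 means sparse, not available, or error, as noted above. */
		printf("physical block: %d\n", block);
		return 0;
	}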
-
-/*
- * ntfs_normal_aops - address space operations for normal inodes and attributes
- *
- * Note these are not used for compressed or mst protected inodes and
- * attributes.
- */
-const struct address_space_operations ntfs_normal_aops = {
-       .read_folio     = ntfs_read_folio,
-#ifdef NTFS_RW
-       .writepage      = ntfs_writepage,
-       .dirty_folio    = block_dirty_folio,
-#endif /* NTFS_RW */
-       .bmap           = ntfs_bmap,
-       .migrate_folio  = buffer_migrate_folio,
-       .is_partially_uptodate = block_is_partially_uptodate,
-       .error_remove_folio = generic_error_remove_folio,
-};
-
-/*
- * ntfs_compressed_aops - address space operations for compressed inodes
- */
-const struct address_space_operations ntfs_compressed_aops = {
-       .read_folio     = ntfs_read_folio,
-#ifdef NTFS_RW
-       .writepage      = ntfs_writepage,
-       .dirty_folio    = block_dirty_folio,
-#endif /* NTFS_RW */
-       .migrate_folio  = buffer_migrate_folio,
-       .is_partially_uptodate = block_is_partially_uptodate,
-       .error_remove_folio = generic_error_remove_folio,
-};
-
-/*
- * ntfs_mst_aops - general address space operations for mst protected inodes
- *                       and attributes
- */
-const struct address_space_operations ntfs_mst_aops = {
-       .read_folio     = ntfs_read_folio,      /* Fill page with data. */
-#ifdef NTFS_RW
-       .writepage      = ntfs_writepage,       /* Write dirty page to disk. */
-       .dirty_folio    = filemap_dirty_folio,
-#endif /* NTFS_RW */
-       .migrate_folio  = buffer_migrate_folio,
-       .is_partially_uptodate  = block_is_partially_uptodate,
-       .error_remove_folio = generic_error_remove_folio,
-};
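
The three tables differ only in their write-side hooks; each inode's address space gets exactly one of them at setup time, roughly along these lines (a simplified sketch; the real selection in fs/ntfs/inode.c covers more attribute cases):

	if (NInoMstProtected(ni))
		vi->i_mapping->a_ops = &ntfs_mst_aops;
	else if (NInoCompressed(ni))
		vi->i_mapping->a_ops = &ntfs_compressed_aops;
	else
		vi->i_mapping->a_ops = &ntfs_normal_aops;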
-
-#ifdef NTFS_RW
-
-/**
- * mark_ntfs_record_dirty - mark an ntfs record dirty
- * @page:      page containing the ntfs record to mark dirty
- * @ofs:       byte offset within @page at which the ntfs record begins
- *
- * Set the buffers and the page in which the ntfs record is located dirty.
- *
- * The latter also marks the vfs inode the ntfs record belongs to dirty
- * (I_DIRTY_PAGES only).
- *
- * If the page does not have buffers, we create them and set them uptodate.
- * The page may not be locked, which is why we need to handle the buffers under
- * the mapping->i_private_lock.  Once the buffers are marked dirty we no longer
- * need the lock since try_to_free_buffers() does not free dirty buffers.
- */
-void mark_ntfs_record_dirty(struct page *page, const unsigned int ofs)
-{
-       struct address_space *mapping = page->mapping;
-       ntfs_inode *ni = NTFS_I(mapping->host);
-       struct buffer_head *bh, *head, *buffers_to_free = NULL;
-       unsigned int end, bh_size, bh_ofs;
-
-       BUG_ON(!PageUptodate(page));
-       end = ofs + ni->itype.index.block_size;
-       bh_size = VFS_I(ni)->i_sb->s_blocksize;
-       spin_lock(&mapping->i_private_lock);
-       if (unlikely(!page_has_buffers(page))) {
-               spin_unlock(&mapping->i_private_lock);
-               bh = head = alloc_page_buffers(page, bh_size, true);
-               spin_lock(&mapping->i_private_lock);
-               if (likely(!page_has_buffers(page))) {
-                       struct buffer_head *tail;
-
-                       do {
-                               set_buffer_uptodate(bh);
-                               tail = bh;
-                               bh = bh->b_this_page;
-                       } while (bh);
-                       tail->b_this_page = head;
-                       attach_page_private(page, head);
-               } else
-                       buffers_to_free = bh;
-       }
-       bh = head = page_buffers(page);
-       BUG_ON(!bh);
-       do {
-               bh_ofs = bh_offset(bh);
-               if (bh_ofs + bh_size <= ofs)
-                       continue;
-               if (unlikely(bh_ofs >= end))
-                       break;
-               set_buffer_dirty(bh);
-       } while ((bh = bh->b_this_page) != head);
-       spin_unlock(&mapping->i_private_lock);
-       filemap_dirty_folio(mapping, page_folio(page));
-       if (unlikely(buffers_to_free)) {
-               do {
-                       bh = buffers_to_free->b_this_page;
-                       free_buffer_head(buffers_to_free);
-                       buffers_to_free = bh;
-               } while (buffers_to_free);
-       }
-}
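
A hypothetical caller, following the contract above (the page must be uptodate but need not be locked; @ofs is the byte offset of the record within the page):

	u8 *kaddr = kmap_local_page(page);

	/* ... modify the ntfs record that starts at kaddr + ofs ... */
	flush_dcache_page(page);
	kunmap_local(kaddr);
	/* Dirties the covering buffers, the page, and thereby the vfs
	 * inode (I_DIRTY_PAGES only). */
	mark_ntfs_record_dirty(page, ofs);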
-
-#endif /* NTFS_RW */
diff --git a/fs/ntfs/aops.h b/fs/ntfs/aops.h
deleted file mode 100644 (file)
index 8d0958a..0000000
+++ /dev/null
@@ -1,88 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * aops.h - Defines for NTFS kernel address space operations and page cache
- *         handling.  Part of the Linux-NTFS project.
- *
- * Copyright (c) 2001-2004 Anton Altaparmakov
- * Copyright (c) 2002 Richard Russon
- */
-
-#ifndef _LINUX_NTFS_AOPS_H
-#define _LINUX_NTFS_AOPS_H
-
-#include <linux/mm.h>
-#include <linux/highmem.h>
-#include <linux/pagemap.h>
-#include <linux/fs.h>
-
-#include "inode.h"
-
-/**
- * ntfs_unmap_page - release a page that was mapped using ntfs_map_page()
- * @page:      the page to release
- *
- * Unpin, unmap and release a page that was obtained from ntfs_map_page().
- */
-static inline void ntfs_unmap_page(struct page *page)
-{
-       kunmap(page);
-       put_page(page);
-}
-
-/**
- * ntfs_map_page - map a page into accessible memory, reading it if necessary
- * @mapping:   address space for which to obtain the page
- * @index:     index into the page cache for @mapping of the page to map
- *
- * Read a page from the page cache of the address space @mapping at position
- * @index, where @index is in units of PAGE_SIZE, and not in bytes.
- *
- * If the page is not in memory it is loaded from disk first using the
- * read_folio method defined in the address space operations of @mapping
- * and the page is added to the page cache of @mapping in the process.
- *
- * If the page belongs to an mst protected attribute and it is marked as such
- * in its ntfs inode (NInoMstProtected()) the mst fixups are applied but no
- * error checking is performed.  This means the caller has to verify whether
- * the ntfs record(s) contained in the page are valid or not using one of the
- * ntfs_is_XXXX_record{,p}() macros, where XXXX is the record type you are
- * expecting to see.  (For details of the macros, see fs/ntfs/layout.h.)
- *
- * If the page is in high memory it is mapped into memory directly addressable
- * by the kernel.
- *
- * Finally the page count is incremented, thus pinning the page into place.
- *
- * The above means that page_address(page) can be used on all pages obtained
- * with ntfs_map_page() to get the kernel virtual address of the page.
- *
- * When finished with the page, the caller has to call ntfs_unmap_page() to
- * unpin, unmap and release the page.
- *
- * Note this does not grant exclusive access. If such is desired, the caller
- * must provide it independently of the ntfs_{un}map_page() calls by using
- * a {rw_}semaphore or other means of serialization. A spin lock cannot be
- * used as ntfs_map_page() can block.
- *
- * The unlocked and uptodate page is returned on success or an encoded error
- * on failure. Caller has to test for error using the IS_ERR() macro on the
- * return value. If that evaluates to 'true', the negative error code can be
- * obtained using PTR_ERR() on the return value of ntfs_map_page().
- */
-static inline struct page *ntfs_map_page(struct address_space *mapping,
-               unsigned long index)
-{
-       struct page *page = read_mapping_page(mapping, index, NULL);
-
-       if (!IS_ERR(page))
-               kmap(page);
-       return page;
-}
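
Putting the two helpers together, the documented calling convention is (sketch; error handling shortened):

	struct page *page = ntfs_map_page(mapping, index);
	u8 *kaddr;

	if (IS_ERR(page))
		return PTR_ERR(page);
	/* The page is pinned and kmapped: page_address() is valid here. */
	kaddr = page_address(page);
	/* ... use the data; for mst protected attributes validate the
	 * record(s) with the ntfs_is_XXXX_record() macros first ... */
	ntfs_unmap_page(page);	/* kunmap() + put_page() */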
-
-#ifdef NTFS_RW
-
-extern void mark_ntfs_record_dirty(struct page *page, const unsigned int ofs);
-
-#endif /* NTFS_RW */
-
-#endif /* _LINUX_NTFS_AOPS_H */
diff --git a/fs/ntfs/attrib.c b/fs/ntfs/attrib.c
deleted file mode 100644 (file)
index f79408f..0000000
+++ /dev/null
@@ -1,2624 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * attrib.c - NTFS attribute operations.  Part of the Linux-NTFS project.
- *
- * Copyright (c) 2001-2012 Anton Altaparmakov and Tuxera Inc.
- * Copyright (c) 2002 Richard Russon
- */
-
-#include <linux/buffer_head.h>
-#include <linux/sched.h>
-#include <linux/slab.h>
-#include <linux/swap.h>
-#include <linux/writeback.h>
-
-#include "attrib.h"
-#include "debug.h"
-#include "layout.h"
-#include "lcnalloc.h"
-#include "malloc.h"
-#include "mft.h"
-#include "ntfs.h"
-#include "types.h"
-
-/**
- * ntfs_map_runlist_nolock - map (a part of) a runlist of an ntfs inode
- * @ni:                ntfs inode for which to map (part of) a runlist
- * @vcn:       map runlist part containing this vcn
- * @ctx:       active attribute search context if present or NULL if not
- *
- * Map the part of a runlist containing the @vcn of the ntfs inode @ni.
- *
- * If @ctx is specified, it is an active search context of @ni and its base mft
- * record.  This is needed when ntfs_map_runlist_nolock() encounters unmapped
- * runlist fragments and allows their mapping.  If you do not have the mft
- * record mapped, you can specify @ctx as NULL and ntfs_map_runlist_nolock()
- * will perform the necessary mapping and unmapping.
- *
- * Note, ntfs_map_runlist_nolock() saves the state of @ctx on entry and
- * restores it before returning.  Thus, @ctx will be left pointing to the same
- * attribute on return as on entry.  However, the actual pointers in @ctx may
- * point to different memory locations on return, so you must remember to reset
- * any cached pointers from the @ctx, i.e. after the call to
- * ntfs_map_runlist_nolock(), you will probably want to do:
- *     m = ctx->mrec;
- *     a = ctx->attr;
- * Assuming you cache ctx->attr in a variable @a of type ATTR_RECORD * and that
- * you cache ctx->mrec in a variable @m of type MFT_RECORD *.
- *
- * Return 0 on success and -errno on error.  There is one special error code
- * which is not an error as such.  This is -ENOENT.  It means that @vcn is out
- * of bounds of the runlist.
- *
- * Note the runlist can be NULL after this function returns if @vcn is zero and
- * the attribute has zero allocated size, i.e. there simply is no runlist.
- *
- * WARNING: If @ctx is supplied, regardless of whether success or failure is
- *         returned, you need to check IS_ERR(@ctx->mrec) and if 'true' the @ctx
- *         is no longer valid, i.e. you need to either call
- *         ntfs_attr_reinit_search_ctx() or ntfs_attr_put_search_ctx() on it.
- *         In that case PTR_ERR(@ctx->mrec) will give you the error code for
- *         why the mapping of the old inode failed.
- *
- * Locking: - The runlist described by @ni must be locked for writing on entry
- *           and is locked on return.  Note the runlist will be modified.
- *         - If @ctx is NULL, the base mft record of @ni must not be mapped on
- *           entry and it will be left unmapped on return.
- *         - If @ctx is not NULL, the base mft record must be mapped on entry
- *           and it will be left mapped on return.
- */
-int ntfs_map_runlist_nolock(ntfs_inode *ni, VCN vcn, ntfs_attr_search_ctx *ctx)
-{
-       VCN end_vcn;
-       unsigned long flags;
-       ntfs_inode *base_ni;
-       MFT_RECORD *m;
-       ATTR_RECORD *a;
-       runlist_element *rl;
-       struct page *put_this_page = NULL;
-       int err = 0;
-       bool ctx_is_temporary, ctx_needs_reset;
-       ntfs_attr_search_ctx old_ctx = { NULL, };
-
-       ntfs_debug("Mapping runlist part containing vcn 0x%llx.",
-                       (unsigned long long)vcn);
-       if (!NInoAttr(ni))
-               base_ni = ni;
-       else
-               base_ni = ni->ext.base_ntfs_ino;
-       if (!ctx) {
-               ctx_is_temporary = ctx_needs_reset = true;
-               m = map_mft_record(base_ni);
-               if (IS_ERR(m))
-                       return PTR_ERR(m);
-               ctx = ntfs_attr_get_search_ctx(base_ni, m);
-               if (unlikely(!ctx)) {
-                       err = -ENOMEM;
-                       goto err_out;
-               }
-       } else {
-               VCN allocated_size_vcn;
-
-               BUG_ON(IS_ERR(ctx->mrec));
-               a = ctx->attr;
-               BUG_ON(!a->non_resident);
-               ctx_is_temporary = false;
-               end_vcn = sle64_to_cpu(a->data.non_resident.highest_vcn);
-               read_lock_irqsave(&ni->size_lock, flags);
-               allocated_size_vcn = ni->allocated_size >>
-                               ni->vol->cluster_size_bits;
-               read_unlock_irqrestore(&ni->size_lock, flags);
-               if (!a->data.non_resident.lowest_vcn && end_vcn <= 0)
-                       end_vcn = allocated_size_vcn - 1;
-               /*
-                * If we already have the attribute extent containing @vcn in
-                * @ctx, no need to look it up again.  We slightly cheat in
-                * that if vcn exceeds the allocated size, we will refuse to
-                * map the runlist below, so there is definitely no need to get
-                * the right attribute extent.
-                */
-               if (vcn >= allocated_size_vcn || (a->type == ni->type &&
-                               a->name_length == ni->name_len &&
-                               !memcmp((u8*)a + le16_to_cpu(a->name_offset),
-                               ni->name, ni->name_len) &&
-                               sle64_to_cpu(a->data.non_resident.lowest_vcn)
-                               <= vcn && end_vcn >= vcn))
-                       ctx_needs_reset = false;
-               else {
-                       /* Save the old search context. */
-                       old_ctx = *ctx;
-                       /*
-                        * If the currently mapped (extent) inode is not the
-                        * base inode we will unmap it when we reinitialize the
-                        * search context which means we need to get a
-                        * reference to the page containing the mapped mft
-                        * record so we do not accidentally drop changes to the
-                        * mft record when it has not been marked dirty yet.
-                        */
-                       if (old_ctx.base_ntfs_ino && old_ctx.ntfs_ino !=
-                                       old_ctx.base_ntfs_ino) {
-                               put_this_page = old_ctx.ntfs_ino->page;
-                               get_page(put_this_page);
-                       }
-                       /*
-                        * Reinitialize the search context so we can lookup the
-                        * needed attribute extent.
-                        */
-                       ntfs_attr_reinit_search_ctx(ctx);
-                       ctx_needs_reset = true;
-               }
-       }
-       if (ctx_needs_reset) {
-               err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
-                               CASE_SENSITIVE, vcn, NULL, 0, ctx);
-               if (unlikely(err)) {
-                       if (err == -ENOENT)
-                               err = -EIO;
-                       goto err_out;
-               }
-               BUG_ON(!ctx->attr->non_resident);
-       }
-       a = ctx->attr;
-       /*
-        * Only decompress the mapping pairs if @vcn is inside it.  Otherwise
-        * we get into problems when we try to map an out of bounds vcn because
-        * we then try to map the already mapped runlist fragment and
-        * ntfs_mapping_pairs_decompress() fails.
-        */
-       end_vcn = sle64_to_cpu(a->data.non_resident.highest_vcn) + 1;
-       if (unlikely(vcn && vcn >= end_vcn)) {
-               err = -ENOENT;
-               goto err_out;
-       }
-       rl = ntfs_mapping_pairs_decompress(ni->vol, a, ni->runlist.rl);
-       if (IS_ERR(rl))
-               err = PTR_ERR(rl);
-       else
-               ni->runlist.rl = rl;
-err_out:
-       if (ctx_is_temporary) {
-               if (likely(ctx))
-                       ntfs_attr_put_search_ctx(ctx);
-               unmap_mft_record(base_ni);
-       } else if (ctx_needs_reset) {
-               /*
-                * If there is no attribute list, restoring the search context
-                * is accomplished simply by copying the saved context back over
-                * the caller-supplied context.  If there is an attribute list,
-                * things are more complicated as we need to deal with mapping
-                * of mft records and resulting potential changes in pointers.
-                */
-               if (NInoAttrList(base_ni)) {
-                       /*
-                        * If the currently mapped (extent) inode is not the
-                        * one we had before, we need to unmap it and map the
-                        * old one.
-                        */
-                       if (ctx->ntfs_ino != old_ctx.ntfs_ino) {
-                               /*
-                                * If the currently mapped inode is not the
-                                * base inode, unmap it.
-                                */
-                               if (ctx->base_ntfs_ino && ctx->ntfs_ino !=
-                                               ctx->base_ntfs_ino) {
-                                       unmap_extent_mft_record(ctx->ntfs_ino);
-                                       ctx->mrec = ctx->base_mrec;
-                                       BUG_ON(!ctx->mrec);
-                               }
-                               /*
-                                * If the old mapped inode is not the base
-                                * inode, map it.
-                                */
-                               if (old_ctx.base_ntfs_ino &&
-                                               old_ctx.ntfs_ino !=
-                                               old_ctx.base_ntfs_ino) {
-retry_map:
-                                       ctx->mrec = map_mft_record(
-                                                       old_ctx.ntfs_ino);
-                                       /*
-                                        * Something bad has happened.  If out
-                                        * of memory retry till it succeeds.
-                                        * Any other errors are fatal and we
-                                        * return the error code in ctx->mrec.
-                                        * Let the caller deal with it...  We
-                                        * just need to fudge things so the
-                                        * caller can reinit and/or put the
-                                        * search context safely.
-                                        */
-                                       if (IS_ERR(ctx->mrec)) {
-                                               if (PTR_ERR(ctx->mrec) ==
-                                                               -ENOMEM) {
-                                                       schedule();
-                                                       goto retry_map;
-                                               } else
-                                                       old_ctx.ntfs_ino =
-                                                               old_ctx.
-                                                               base_ntfs_ino;
-                                       }
-                               }
-                       }
-                       /* Update the changed pointers in the saved context. */
-                       if (ctx->mrec != old_ctx.mrec) {
-                               if (!IS_ERR(ctx->mrec))
-                                       old_ctx.attr = (ATTR_RECORD*)(
-                                                       (u8*)ctx->mrec +
-                                                       ((u8*)old_ctx.attr -
-                                                       (u8*)old_ctx.mrec));
-                               old_ctx.mrec = ctx->mrec;
-                       }
-               }
-               /* Restore the search context to the saved one. */
-               *ctx = old_ctx;
-               /*
-                * We drop the reference on the page we took earlier.  In the
-                * case that IS_ERR(ctx->mrec) is true this means we might lose
-                * some changes to the mft record that had been made between
-                * the last time it was marked dirty/written out and now.  This
-                * at this stage is not a problem as the mapping error is fatal
-                * enough that the mft record cannot be written out anyway and
-                * the caller is very likely to shutdown the whole inode
-                * immediately and mark the volume dirty for chkdsk to pick up
-                * the pieces anyway.
-                */
-               if (put_this_page)
-                       put_page(put_this_page);
-       }
-       return err;
-}
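
Following the WARNING above, a caller that supplies a search context has to both validate the context afterwards and re-cache any pointers into it; a hedged sketch of that pattern:

	MFT_RECORD *m;
	ATTR_RECORD *a;
	int err;

	err = ntfs_map_runlist_nolock(ni, vcn, ctx);
	if (ctx && IS_ERR(ctx->mrec)) {
		/* @ctx is no longer valid: reinit (or put) it before reuse. */
		err = PTR_ERR(ctx->mrec);
		ntfs_attr_reinit_search_ctx(ctx);
	} else if (!err) {
		/* The mft record may have been remapped: re-cache pointers. */
		m = ctx->mrec;
		a = ctx->attr;
	}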
-
-/**
- * ntfs_map_runlist - map (a part of) a runlist of an ntfs inode
- * @ni:                ntfs inode for which to map (part of) a runlist
- * @vcn:       map runlist part containing this vcn
- *
- * Map the part of a runlist containing the @vcn of the ntfs inode @ni.
- *
- * Return 0 on success and -errno on error.  There is one special error code
- * which is not an error as such.  This is -ENOENT.  It means that @vcn is out
- * of bounds of the runlist.
- *
- * Locking: - The runlist must be unlocked on entry and is unlocked on return.
- *         - This function takes the runlist lock for writing and may modify
- *           the runlist.
- */
-int ntfs_map_runlist(ntfs_inode *ni, VCN vcn)
-{
-       int err = 0;
-
-       down_write(&ni->runlist.lock);
-       /* Make sure someone else didn't do the work while we were sleeping. */
-       if (likely(ntfs_rl_vcn_to_lcn(ni->runlist.rl, vcn) <=
-                       LCN_RL_NOT_MAPPED))
-               err = ntfs_map_runlist_nolock(ni, vcn, NULL);
-       up_write(&ni->runlist.lock);
-       return err;
-}
-
-/**
- * ntfs_attr_vcn_to_lcn_nolock - convert a vcn into a lcn given an ntfs inode
- * @ni:                        ntfs inode of the attribute whose runlist to search
- * @vcn:               vcn to convert
- * @write_locked:      true if the runlist is locked for writing
- *
- * Find the virtual cluster number @vcn in the runlist of the ntfs attribute
- * described by the ntfs inode @ni and return the corresponding logical cluster
- * number (lcn).
- *
- * If the @vcn is not mapped yet, the attempt is made to map the attribute
- * extent containing the @vcn and the vcn to lcn conversion is retried.
- *
- * If @write_locked is true the caller has locked the runlist for writing and
- * if false for reading.
- *
- * Since lcns must be >= 0, we use negative return codes with special meaning:
- *
- * Return code   Meaning / Description
- * ==========================================
- *  LCN_HOLE     Hole / not allocated on disk.
- *  LCN_ENOENT   There is no such vcn in the runlist, i.e. @vcn is out of bounds.
- *  LCN_ENOMEM   Not enough memory to map runlist.
- *  LCN_EIO      Critical error (runlist/file is corrupt, I/O error, etc).
- *
- * Locking: - The runlist must be locked on entry and is left locked on return.
- *         - If @write_locked is 'false', i.e. the runlist is locked for reading,
- *           the lock may be dropped inside the function so you cannot rely on
- *           the runlist still being the same when this function returns.
- */
-LCN ntfs_attr_vcn_to_lcn_nolock(ntfs_inode *ni, const VCN vcn,
-               const bool write_locked)
-{
-       LCN lcn;
-       unsigned long flags;
-       bool is_retry = false;
-
-       BUG_ON(!ni);
-       ntfs_debug("Entering for i_ino 0x%lx, vcn 0x%llx, %s_locked.",
-                       ni->mft_no, (unsigned long long)vcn,
-                       write_locked ? "write" : "read");
-       BUG_ON(!NInoNonResident(ni));
-       BUG_ON(vcn < 0);
-       if (!ni->runlist.rl) {
-               read_lock_irqsave(&ni->size_lock, flags);
-               if (!ni->allocated_size) {
-                       read_unlock_irqrestore(&ni->size_lock, flags);
-                       return LCN_ENOENT;
-               }
-               read_unlock_irqrestore(&ni->size_lock, flags);
-       }
-retry_remap:
-       /* Convert vcn to lcn.  If that fails map the runlist and retry once. */
-       lcn = ntfs_rl_vcn_to_lcn(ni->runlist.rl, vcn);
-       if (likely(lcn >= LCN_HOLE)) {
-               ntfs_debug("Done, lcn 0x%llx.", (long long)lcn);
-               return lcn;
-       }
-       if (lcn != LCN_RL_NOT_MAPPED) {
-               if (lcn != LCN_ENOENT)
-                       lcn = LCN_EIO;
-       } else if (!is_retry) {
-               int err;
-
-               if (!write_locked) {
-                       up_read(&ni->runlist.lock);
-                       down_write(&ni->runlist.lock);
-                       if (unlikely(ntfs_rl_vcn_to_lcn(ni->runlist.rl, vcn) !=
-                                       LCN_RL_NOT_MAPPED)) {
-                               up_write(&ni->runlist.lock);
-                               down_read(&ni->runlist.lock);
-                               goto retry_remap;
-                       }
-               }
-               err = ntfs_map_runlist_nolock(ni, vcn, NULL);
-               if (!write_locked) {
-                       up_write(&ni->runlist.lock);
-                       down_read(&ni->runlist.lock);
-               }
-               if (likely(!err)) {
-                       is_retry = true;
-                       goto retry_remap;
-               }
-               if (err == -ENOENT)
-                       lcn = LCN_ENOENT;
-               else if (err == -ENOMEM)
-                       lcn = LCN_ENOMEM;
-               else
-                       lcn = LCN_EIO;
-       }
-       if (lcn != LCN_ENOENT)
-               ntfs_error(ni->vol->sb, "Failed with error code %lli.",
-                               (long long)lcn);
-       return lcn;
-}
-
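
A sketch of the read-locked usage the comment above describes; the wrapper is hypothetical.

    static LCN example_vcn_to_lcn(ntfs_inode *ni, const VCN vcn)
    {
            LCN lcn;

            down_read(&ni->runlist.lock);
            lcn = ntfs_attr_vcn_to_lcn_nolock(ni, vcn, false);
            up_read(&ni->runlist.lock);
            /*
             * lcn >= 0 is an allocated cluster and LCN_HOLE a sparse run
             * (zeroes on read, allocate on write); LCN_ENOENT, LCN_ENOMEM
             * and LCN_EIO correspond to -ENOENT, -ENOMEM and -EIO.  As
             * @write_locked was false, the lock may have been dropped and
             * re-taken inside, so previously cached runlist pointers are
             * no longer trustworthy.
             */
            return lcn;
    }
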
-/**
- * ntfs_attr_find_vcn_nolock - find a vcn in the runlist of an ntfs inode
- * @ni:                ntfs inode describing the runlist to search
- * @vcn:       vcn to find
- * @ctx:       active attribute search context if present or NULL if not
- *
- * Find the virtual cluster number @vcn in the runlist described by the ntfs
- * inode @ni and return the address of the runlist element containing the @vcn.
- *
- * If the @vcn is not mapped yet, the attempt is made to map the attribute
- * extent containing the @vcn and the vcn to lcn conversion is retried.
- *
- * If @ctx is specified, it is an active search context of @ni and its base mft
- * record.  This is needed when ntfs_attr_find_vcn_nolock() encounters unmapped
- * runlist fragments and allows their mapping.  If you do not have the mft
- * record mapped, you can specify @ctx as NULL and ntfs_attr_find_vcn_nolock()
- * will perform the necessary mapping and unmapping.
- *
- * Note, ntfs_attr_find_vcn_nolock() saves the state of @ctx on entry and
- * restores it before returning.  Thus, @ctx will be left pointing to the same
- * attribute on return as on entry.  However, the actual pointers in @ctx may
- * point to different memory locations on return, so you must remember to reset
- * any cached pointers from the @ctx, i.e. after the call to
- * ntfs_attr_find_vcn_nolock(), you will probably want to do:
- *     m = ctx->mrec;
- *     a = ctx->attr;
- * Assuming you cache ctx->attr in a variable @a of type ATTR_RECORD * and that
- * you cache ctx->mrec in a variable @m of type MFT_RECORD *.
- * Note you need to distinguish between the lcn of the returned runlist element
- * being >= 0 and LCN_HOLE.  In the latter case you have to return zeroes on
- * read and allocate clusters on write.
- *
- * Return the runlist element containing the @vcn on success and
- * ERR_PTR(-errno) on error.  You need to test the return value with IS_ERR()
- * to decide if the return is success or failure and PTR_ERR() to get to the
- * error code if IS_ERR() is true.
- *
- * The possible error return codes are:
- *     -ENOENT - No such vcn in the runlist, i.e. @vcn is out of bounds.
- *     -ENOMEM - Not enough memory to map runlist.
- *     -EIO    - Critical error (runlist/file is corrupt, i/o error, etc).
- *
- * WARNING: If @ctx is supplied, regardless of whether success or failure is
- *         returned, you need to check IS_ERR(@ctx->mrec) and if 'true' the @ctx
- *         is no longer valid, i.e. you need to either call
- *         ntfs_attr_reinit_search_ctx() or ntfs_attr_put_search_ctx() on it.
- *         In that case PTR_ERR(@ctx->mrec) will give you the error code for
- *         why the mapping of the old inode failed.
- *
- * Locking: - The runlist described by @ni must be locked for writing on entry
- *           and is locked on return.  Note the runlist may be modified when
- *           needed runlist fragments need to be mapped.
- *         - If @ctx is NULL, the base mft record of @ni must not be mapped on
- *           entry and it will be left unmapped on return.
- *         - If @ctx is not NULL, the base mft record must be mapped on entry
- *           and it will be left mapped on return.
- */
-runlist_element *ntfs_attr_find_vcn_nolock(ntfs_inode *ni, const VCN vcn,
-               ntfs_attr_search_ctx *ctx)
-{
-       unsigned long flags;
-       runlist_element *rl;
-       int err = 0;
-       bool is_retry = false;
-
-       BUG_ON(!ni);
-       ntfs_debug("Entering for i_ino 0x%lx, vcn 0x%llx, with%s ctx.",
-                       ni->mft_no, (unsigned long long)vcn, ctx ? "" : "out");
-       BUG_ON(!NInoNonResident(ni));
-       BUG_ON(vcn < 0);
-       if (!ni->runlist.rl) {
-               read_lock_irqsave(&ni->size_lock, flags);
-               if (!ni->allocated_size) {
-                       read_unlock_irqrestore(&ni->size_lock, flags);
-                       return ERR_PTR(-ENOENT);
-               }
-               read_unlock_irqrestore(&ni->size_lock, flags);
-       }
-retry_remap:
-       rl = ni->runlist.rl;
-       if (likely(rl && vcn >= rl[0].vcn)) {
-               while (likely(rl->length)) {
-                       if (unlikely(vcn < rl[1].vcn)) {
-                               if (likely(rl->lcn >= LCN_HOLE)) {
-                                       ntfs_debug("Done.");
-                                       return rl;
-                               }
-                               break;
-                       }
-                       rl++;
-               }
-               if (likely(rl->lcn != LCN_RL_NOT_MAPPED)) {
-                       if (likely(rl->lcn == LCN_ENOENT))
-                               err = -ENOENT;
-                       else
-                               err = -EIO;
-               }
-       }
-       if (!err && !is_retry) {
-               /*
-                * If the search context is invalid we cannot map the unmapped
-                * region.
-                */
-               if (IS_ERR(ctx->mrec))
-                       err = PTR_ERR(ctx->mrec);
-               else {
-                       /*
-                        * The @vcn is in an unmapped region, map the runlist
-                        * and retry.
-                        */
-                       err = ntfs_map_runlist_nolock(ni, vcn, ctx);
-                       if (likely(!err)) {
-                               is_retry = true;
-                               goto retry_remap;
-                       }
-               }
-               if (err == -EINVAL)
-                       err = -EIO;
-       } else if (!err)
-               err = -EIO;
-       if (err != -ENOENT)
-               ntfs_error(ni->vol->sb, "Failed with error code %i.", err);
-       return ERR_PTR(err);
-}
-
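
A sketch that follows the doc comment and WARNING above; the wrapper is hypothetical and error handling is abbreviated.

    static LCN example_find_vcn(ntfs_inode *ni, const VCN vcn,
                    ntfs_attr_search_ctx *ctx)
    {
            runlist_element *rl = ntfs_attr_find_vcn_nolock(ni, vcn, ctx);

            /* Per the WARNING above, check @ctx even if @rl looks fine. */
            if (IS_ERR(ctx->mrec)) {
                    LCN err = PTR_ERR(ctx->mrec);

                    ntfs_attr_reinit_search_ctx(ctx);
                    return err;
            }
            if (IS_ERR(rl))
                    return PTR_ERR(rl);     /* -ENOENT, -ENOMEM or -EIO. */
            /*
             * Pointers cached from @ctx before the call (m = ctx->mrec,
             * a = ctx->attr as in the doc comment) must be re-fetched now.
             */
            return rl->lcn; /* >= 0, or LCN_HOLE for a sparse run. */
    }
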
-/**
- * ntfs_attr_find - find (next) attribute in mft record
- * @type:      attribute type to find
- * @name:      attribute name to find (optional, i.e. NULL means don't care)
- * @name_len:  attribute name length (only needed if @name present)
- * @ic:                IGNORE_CASE or CASE_SENSITIVE (ignored if @name not present)
- * @val:       attribute value to find (optional, resident attributes only)
- * @val_len:   attribute value length
- * @ctx:       search context with mft record and attribute to search from
- *
- * You should not need to call this function directly.  Use ntfs_attr_lookup()
- * instead.
- *
- * ntfs_attr_find() takes a search context @ctx as parameter and searches the
- * mft record specified by @ctx->mrec, beginning at @ctx->attr, for an
- * attribute of @type, optionally @name and @val.
- *
- * If the attribute is found, ntfs_attr_find() returns 0 and @ctx->attr will
- * point to the found attribute.
- *
- * If the attribute is not found, ntfs_attr_find() returns -ENOENT and
- * @ctx->attr will point to the attribute before which the attribute being
- * searched for would need to be inserted if such an action were to be desired.
- *
- * On actual error, ntfs_attr_find() returns -EIO.  In this case @ctx->attr is
- * undefined and in particular do not rely on it not changing.
- *
- * If @ctx->is_first is 'true', the search begins with @ctx->attr itself.  If it
- * is 'false', the search begins after @ctx->attr.
- *
- * If @ic is IGNORE_CASE, the @name comparison is not case sensitive and
- * @ctx->ntfs_ino must be set to the ntfs inode to which the mft record
- * @ctx->mrec belongs.  This is so we can get at the ntfs volume and hence at
- * the upcase table.  If @ic is CASE_SENSITIVE, the comparison is case
- * sensitive.  When @name is present, @name_len is the @name length in Unicode
- * characters.
- *
- * If @name is not present (NULL), we assume that the unnamed attribute is
- * being searched for.
- *
- * Finally, the resident attribute value @val is looked for, if present.  If
- * @val is not present (NULL), @val_len is ignored.
- *
- * ntfs_attr_find() only searches the specified mft record and it ignores the
- * presence of an attribute list attribute (unless it is the one being searched
- * for, obviously).  If you need to take attribute lists into consideration,
- * use ntfs_attr_lookup() instead (see below).  This also means that you cannot
- * use ntfs_attr_find() to search for extent records of non-resident
- * attributes, as extents with lowest_vcn != 0 are usually described by the
- * attribute list attribute only. - Note that it is possible that the first
- * extent is only in the attribute list while the last extent is in the base
- * mft record, so do not rely on being able to find the first extent in the
- * base mft record.
- *
- * Warning: Never use @val when looking for attribute types which can be
- *         non-resident as this most likely will result in a crash!
- */
-static int ntfs_attr_find(const ATTR_TYPE type, const ntfschar *name,
-               const u32 name_len, const IGNORE_CASE_BOOL ic,
-               const u8 *val, const u32 val_len, ntfs_attr_search_ctx *ctx)
-{
-       ATTR_RECORD *a;
-       ntfs_volume *vol = ctx->ntfs_ino->vol;
-       ntfschar *upcase = vol->upcase;
-       u32 upcase_len = vol->upcase_len;
-
-       /*
-        * Iterate over attributes in mft record starting at @ctx->attr, or the
-        * attribute following that, if @ctx->is_first is 'true'.
-        */
-       if (ctx->is_first) {
-               a = ctx->attr;
-               ctx->is_first = false;
-       } else
-               a = (ATTR_RECORD*)((u8*)ctx->attr +
-                               le32_to_cpu(ctx->attr->length));
-       for (;; a = (ATTR_RECORD*)((u8*)a + le32_to_cpu(a->length))) {
-               u8 *mrec_end = (u8 *)ctx->mrec +
-                              le32_to_cpu(ctx->mrec->bytes_allocated);
-               u8 *name_end;
-
-               /* check whether ATTR_RECORD wraps */
-               if ((u8 *)a < (u8 *)ctx->mrec)
-                       break;
-
-               /* check whether Attribute Record Header is within bounds */
-               if ((u8 *)a > mrec_end ||
-                   (u8 *)a + sizeof(ATTR_RECORD) > mrec_end)
-                       break;
-
-               /* check whether ATTR_RECORD's name is within bounds */
-               name_end = (u8 *)a + le16_to_cpu(a->name_offset) +
-                          a->name_length * sizeof(ntfschar);
-               if (name_end > mrec_end)
-                       break;
-
-               ctx->attr = a;
-               if (unlikely(le32_to_cpu(a->type) > le32_to_cpu(type) ||
-                               a->type == AT_END))
-                       return -ENOENT;
-               if (unlikely(!a->length))
-                       break;
-
-               /* check whether ATTR_RECORD's length wraps */
-               if ((u8 *)a + le32_to_cpu(a->length) < (u8 *)a)
-                       break;
-               /* check whether ATTR_RECORD's length is within bounds */
-               if ((u8 *)a + le32_to_cpu(a->length) > mrec_end)
-                       break;
-
-               if (a->type != type)
-                       continue;
-               /*
-                * If @name is present, compare the two names.  If @name is
-                * missing, assume we want an unnamed attribute.
-                */
-               if (!name) {
-                       /* The search failed if the found attribute is named. */
-                       if (a->name_length)
-                               return -ENOENT;
-               } else if (!ntfs_are_names_equal(name, name_len,
-                           (ntfschar*)((u8*)a + le16_to_cpu(a->name_offset)),
-                           a->name_length, ic, upcase, upcase_len)) {
-                       register int rc;
-
-                       rc = ntfs_collate_names(name, name_len,
-                                       (ntfschar*)((u8*)a +
-                                       le16_to_cpu(a->name_offset)),
-                                       a->name_length, 1, IGNORE_CASE,
-                                       upcase, upcase_len);
-                       /*
-                        * If @name collates before a->name, there is no
-                        * matching attribute.
-                        */
-                       if (rc == -1)
-                               return -ENOENT;
-                       /* If the strings are not equal, continue search. */
-                       if (rc)
-                               continue;
-                       rc = ntfs_collate_names(name, name_len,
-                                       (ntfschar*)((u8*)a +
-                                       le16_to_cpu(a->name_offset)),
-                                       a->name_length, 1, CASE_SENSITIVE,
-                                       upcase, upcase_len);
-                       if (rc == -1)
-                               return -ENOENT;
-                       if (rc)
-                               continue;
-               }
-               /*
-                * The names match or @name not present and attribute is
-                * unnamed.  If no @val specified, we have found the attribute
-                * and are done.
-                */
-               if (!val)
-                       return 0;
-               /* @val is present; compare values. */
-               else {
-                       register int rc;
-
-                       rc = memcmp(val, (u8*)a + le16_to_cpu(
-                                       a->data.resident.value_offset),
-                                       min_t(u32, val_len, le32_to_cpu(
-                                       a->data.resident.value_length)));
-                       /*
-                        * If @val collates before the current attribute's
-                        * value, there is no matching attribute.
-                        */
-                       if (!rc) {
-                               register u32 avl;
-
-                               avl = le32_to_cpu(
-                                               a->data.resident.value_length);
-                               if (val_len == avl)
-                                       return 0;
-                               if (val_len < avl)
-                                       return -ENOENT;
-                       } else if (rc < 0)
-                               return -ENOENT;
-               }
-       }
-       ntfs_error(vol->sb, "Inode is corrupt.  Run chkdsk.");
-       NVolSetErrors(vol);
-       return -EIO;
-}
-
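
ntfs_attr_find() is static, so callers reach it through ntfs_attr_lookup(); once @ctx->attr is positioned, a resident value is addressed the same way as in the memcmp above. A hypothetical sketch:

    static int example_copy_resident(ntfs_attr_search_ctx *ctx, u8 *buf,
                    const u32 buf_len)
    {
            ATTR_RECORD *a = ctx->attr;
            u32 val_len;

            if (a->non_resident)
                    return -EINVAL; /* Only resident values are in-record. */
            val_len = le32_to_cpu(a->data.resident.value_length);
            if (val_len > buf_len)
                    return -ERANGE;
            memcpy(buf, (u8*)a + le16_to_cpu(a->data.resident.value_offset),
                            val_len);
            return 0;
    }
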
-/**
- * load_attribute_list - load an attribute list into memory
- * @vol:               ntfs volume from which to read
- * @runlist:           runlist of the attribute list
- * @al_start:          destination buffer
- * @size:              size of the destination buffer in bytes
- * @initialized_size:  initialized size of the attribute list
- *
- * Walk the runlist @runlist and load all clusters from it copying them into
- * the linear buffer @al_start. The maximum number of bytes copied to
- * @al_start is @size bytes. Note, @size does not need to be a multiple of
- * the cluster size. If @initialized_size is less than @size, the region in
- * @al_start between
- * @initialized_size and @size will be zeroed and not read from disk.
- *
- * Return 0 on success or -errno on error.
- */
-int load_attribute_list(ntfs_volume *vol, runlist *runlist, u8 *al_start,
-               const s64 size, const s64 initialized_size)
-{
-       LCN lcn;
-       u8 *al = al_start;
-       u8 *al_end = al + initialized_size;
-       runlist_element *rl;
-       struct buffer_head *bh;
-       struct super_block *sb;
-       unsigned long block_size;
-       unsigned long block, max_block;
-       int err = 0;
-       unsigned char block_size_bits;
-
-       ntfs_debug("Entering.");
-       if (!vol || !runlist || !al || size <= 0 || initialized_size < 0 ||
-                       initialized_size > size)
-               return -EINVAL;
-       if (!initialized_size) {
-               memset(al, 0, size);
-               return 0;
-       }
-       sb = vol->sb;
-       block_size = sb->s_blocksize;
-       block_size_bits = sb->s_blocksize_bits;
-       down_read(&runlist->lock);
-       rl = runlist->rl;
-       if (!rl) {
-               ntfs_error(sb, "Cannot read attribute list since runlist is "
-                               "missing.");
-               goto err_out;
-       }
-       /* Read all clusters specified by the runlist one run at a time. */
-       while (rl->length) {
-               lcn = ntfs_rl_vcn_to_lcn(rl, rl->vcn);
-               ntfs_debug("Reading vcn = 0x%llx, lcn = 0x%llx.",
-                               (unsigned long long)rl->vcn,
-                               (unsigned long long)lcn);
-               /* The attribute list cannot be sparse. */
-               if (lcn < 0) {
-                       ntfs_error(sb, "ntfs_rl_vcn_to_lcn() failed.  Cannot "
-                                       "read attribute list.");
-                       goto err_out;
-               }
-               block = lcn << vol->cluster_size_bits >> block_size_bits;
-               /* Read the run from device in chunks of block_size bytes. */
-               max_block = block + (rl->length << vol->cluster_size_bits >>
-                               block_size_bits);
-               ntfs_debug("max_block = 0x%lx.", max_block);
-               do {
-                       ntfs_debug("Reading block = 0x%lx.", block);
-                       bh = sb_bread(sb, block);
-                       if (!bh) {
-                               ntfs_error(sb, "sb_bread() failed. Cannot "
-                                               "read attribute list.");
-                               goto err_out;
-                       }
-                       if (al + block_size >= al_end)
-                               goto do_final;
-                       memcpy(al, bh->b_data, block_size);
-                       brelse(bh);
-                       al += block_size;
-               } while (++block < max_block);
-               rl++;
-       }
-       if (initialized_size < size) {
-initialize:
-               memset(al_start + initialized_size, 0, size - initialized_size);
-       }
-done:
-       up_read(&runlist->lock);
-       return err;
-do_final:
-       if (al < al_end) {
-               /*
-                * Partial block.
-                *
-                * Note: The attribute list can be smaller than its allocation
-                * by multiple clusters.  This has been encountered by at least
-                * two people running Windows XP, thus we cannot do any
-                * truncation sanity checking here. (AIA)
-                */
-               memcpy(al, bh->b_data, al_end - al);
-               brelse(bh);
-               if (initialized_size < size)
-                       goto initialize;
-               goto done;
-       }
-       brelse(bh);
-       /* Real overflow! */
-       ntfs_error(sb, "Attribute list buffer overflow. Read attribute list "
-                       "is truncated.");
-err_out:
-       err = -EIO;
-       goto done;
-}
-
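
A hedged caller sketch for load_attribute_list(); it assumes @ni->attr_list was already allocated to @ni->attr_list_size bytes, @ni->attr_list_rl was decoded from the $ATTRIBUTE_LIST attribute's mapping pairs, and @init_size is that attribute's initialized size.

    static int example_load_attr_list(ntfs_volume *vol, ntfs_inode *ni,
                    const s64 init_size)
    {
            int err = load_attribute_list(vol, &ni->attr_list_rl,
                            ni->attr_list, ni->attr_list_size, init_size);

            if (err)
                    ntfs_error(vol->sb, "Failed to load attribute list "
                                    "(error %i).", err);
            return err;
    }
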
-/**
- * ntfs_external_attr_find - find an attribute in the attribute list of an inode
- * @type:      attribute type to find
- * @name:      attribute name to find (optional, i.e. NULL means don't care)
- * @name_len:  attribute name length (only needed if @name present)
- * @ic:                IGNORE_CASE or CASE_SENSITIVE (ignored if @name not present)
- * @lowest_vcn:        lowest vcn to find (optional, non-resident attributes only)
- * @val:       attribute value to find (optional, resident attributes only)
- * @val_len:   attribute value length
- * @ctx:       search context with mft record and attribute to search from
- *
- * You should not need to call this function directly.  Use ntfs_attr_lookup()
- * instead.
- *
- * Find an attribute by searching the attribute list for the corresponding
- * attribute list entry.  Having found the entry, map the mft record if the
- * attribute is in a different mft record/inode, ntfs_attr_find() the attribute
- * in there and return it.
- *
- * On first search @ctx->ntfs_ino must be the base mft record and @ctx must
- * have been obtained from a call to ntfs_attr_get_search_ctx().  On subsequent
- * calls @ctx->ntfs_ino can be any extent inode, too (@ctx->base_ntfs_ino is
- * then the base inode).
- *
- * After finishing with the attribute/mft record you need to call
- * ntfs_attr_put_search_ctx() to cleanup the search context (unmapping any
- * mapped inodes, etc).
- *
- * If the attribute is found, ntfs_external_attr_find() returns 0 and
- * @ctx->attr will point to the found attribute.  @ctx->mrec will point to the
- * mft record in which @ctx->attr is located and @ctx->al_entry will point to
- * the attribute list entry for the attribute.
- *
- * If the attribute is not found, ntfs_external_attr_find() returns -ENOENT and
- * @ctx->attr will point to the attribute in the base mft record before which
- * the attribute being searched for would need to be inserted if such an action
- * were to be desired.  @ctx->mrec will point to the mft record in which
- * @ctx->attr is located and @ctx->al_entry will point to the attribute list
- * entry of the attribute before which the attribute being searched for would
- * need to be inserted if such an action were to be desired.
- *
- * Thus to insert the not found attribute, one wants to add the attribute to
- * @ctx->mrec (the base mft record) and if there is not enough space, the
- * attribute should be placed in a newly allocated extent mft record.  The
- * attribute list entry for the inserted attribute should be inserted in the
- * attribute list attribute at @ctx->al_entry.
- *
- * On actual error, ntfs_external_attr_find() returns -EIO.  In this case
- * @ctx->attr is undefined and in particular do not rely on it not changing.
- */
-static int ntfs_external_attr_find(const ATTR_TYPE type,
-               const ntfschar *name, const u32 name_len,
-               const IGNORE_CASE_BOOL ic, const VCN lowest_vcn,
-               const u8 *val, const u32 val_len, ntfs_attr_search_ctx *ctx)
-{
-       ntfs_inode *base_ni, *ni;
-       ntfs_volume *vol;
-       ATTR_LIST_ENTRY *al_entry, *next_al_entry;
-       u8 *al_start, *al_end;
-       ATTR_RECORD *a;
-       ntfschar *al_name;
-       u32 al_name_len;
-       int err = 0;
-       static const char *es = " Unmount and run chkdsk.";
-
-       ni = ctx->ntfs_ino;
-       base_ni = ctx->base_ntfs_ino;
-       ntfs_debug("Entering for inode 0x%lx, type 0x%x.", ni->mft_no, type);
-       if (!base_ni) {
-               /* First call happens with the base mft record. */
-               base_ni = ctx->base_ntfs_ino = ctx->ntfs_ino;
-               ctx->base_mrec = ctx->mrec;
-       }
-       if (ni == base_ni)
-               ctx->base_attr = ctx->attr;
-       if (type == AT_END)
-               goto not_found;
-       vol = base_ni->vol;
-       al_start = base_ni->attr_list;
-       al_end = al_start + base_ni->attr_list_size;
-       if (!ctx->al_entry)
-               ctx->al_entry = (ATTR_LIST_ENTRY*)al_start;
-       /*
-        * Iterate over entries in attribute list starting at @ctx->al_entry,
-        * or the entry following that, if @ctx->is_first is 'true'.
-        */
-       if (ctx->is_first) {
-               al_entry = ctx->al_entry;
-               ctx->is_first = false;
-       } else
-               al_entry = (ATTR_LIST_ENTRY*)((u8*)ctx->al_entry +
-                               le16_to_cpu(ctx->al_entry->length));
-       for (;; al_entry = next_al_entry) {
-               /* Out of bounds check. */
-               if ((u8*)al_entry < base_ni->attr_list ||
-                               (u8*)al_entry > al_end)
-                       break;  /* Inode is corrupt. */
-               ctx->al_entry = al_entry;
-               /* Catch the end of the attribute list. */
-               if ((u8*)al_entry == al_end)
-                       goto not_found;
-               if (!al_entry->length)
-                       break;
-               if ((u8*)al_entry + 6 > al_end || (u8*)al_entry +
-                               le16_to_cpu(al_entry->length) > al_end)
-                       break;
-               next_al_entry = (ATTR_LIST_ENTRY*)((u8*)al_entry +
-                               le16_to_cpu(al_entry->length));
-               if (le32_to_cpu(al_entry->type) > le32_to_cpu(type))
-                       goto not_found;
-               if (type != al_entry->type)
-                       continue;
-               /*
-                * If @name is present, compare the two names.  If @name is
-                * missing, assume we want an unnamed attribute.
-                */
-               al_name_len = al_entry->name_length;
-               al_name = (ntfschar*)((u8*)al_entry + al_entry->name_offset);
-               if (!name) {
-                       if (al_name_len)
-                               goto not_found;
-               } else if (!ntfs_are_names_equal(al_name, al_name_len, name,
-                               name_len, ic, vol->upcase, vol->upcase_len)) {
-                       register int rc;
-
-                       rc = ntfs_collate_names(name, name_len, al_name,
-                                       al_name_len, 1, IGNORE_CASE,
-                                       vol->upcase, vol->upcase_len);
-                       /*
-                        * If @name collates before al_name, there is no
-                        * matching attribute.
-                        */
-                       if (rc == -1)
-                               goto not_found;
-                       /* If the strings are not equal, continue search. */
-                       if (rc)
-                               continue;
-                       /*
-                        * FIXME: Reverse engineering showed 0, IGNORE_CASE but
-                        * that is inconsistent with ntfs_attr_find().  The
-                        * subsequent rc checks were also different.  Perhaps I
-                        * made a mistake in one of the two.  Need to recheck
-                        * which is correct or at least see what is going on...
-                        * (AIA)
-                        */
-                       rc = ntfs_collate_names(name, name_len, al_name,
-                                       al_name_len, 1, CASE_SENSITIVE,
-                                       vol->upcase, vol->upcase_len);
-                       if (rc == -1)
-                               goto not_found;
-                       if (rc)
-                               continue;
-               }
-               /*
-                * The names match or @name not present and attribute is
-                * unnamed.  Now check @lowest_vcn.  Continue search if the
-                * next attribute list entry still fits @lowest_vcn.  Otherwise
-                * we have reached the right one or the search has failed.
-                */
-               if (lowest_vcn && (u8*)next_al_entry >= al_start            &&
-                               (u8*)next_al_entry + 6 < al_end             &&
-                               (u8*)next_al_entry + le16_to_cpu(
-                                       next_al_entry->length) <= al_end    &&
-                               sle64_to_cpu(next_al_entry->lowest_vcn) <=
-                                       lowest_vcn                          &&
-                               next_al_entry->type == al_entry->type       &&
-                               next_al_entry->name_length == al_name_len   &&
-                               ntfs_are_names_equal((ntfschar*)((u8*)
-                                       next_al_entry +
-                                       next_al_entry->name_offset),
-                                       next_al_entry->name_length,
-                                       al_name, al_name_len, CASE_SENSITIVE,
-                                       vol->upcase, vol->upcase_len))
-                       continue;
-               if (MREF_LE(al_entry->mft_reference) == ni->mft_no) {
-                       if (MSEQNO_LE(al_entry->mft_reference) != ni->seq_no) {
-                               ntfs_error(vol->sb, "Found stale mft "
-                                               "reference in attribute list "
-                                               "of base inode 0x%lx.%s",
-                                               base_ni->mft_no, es);
-                               err = -EIO;
-                               break;
-                       }
-               } else { /* Mft references do not match. */
-                       /* If there is a mapped record unmap it first. */
-                       if (ni != base_ni)
-                               unmap_extent_mft_record(ni);
-                       /* Do we want the base record back? */
-                       if (MREF_LE(al_entry->mft_reference) ==
-                                       base_ni->mft_no) {
-                               ni = ctx->ntfs_ino = base_ni;
-                               ctx->mrec = ctx->base_mrec;
-                       } else {
-                               /* We want an extent record. */
-                               ctx->mrec = map_extent_mft_record(base_ni,
-                                               le64_to_cpu(
-                                               al_entry->mft_reference), &ni);
-                               if (IS_ERR(ctx->mrec)) {
-                                       ntfs_error(vol->sb, "Failed to map "
-                                                       "extent mft record "
-                                                       "0x%lx of base inode "
-                                                       "0x%lx.%s",
-                                                       MREF_LE(al_entry->
-                                                       mft_reference),
-                                                       base_ni->mft_no, es);
-                                       err = PTR_ERR(ctx->mrec);
-                                       if (err == -ENOENT)
-                                               err = -EIO;
-                                       /* Cause @ctx to be sanitized below. */
-                                       ni = NULL;
-                                       break;
-                               }
-                               ctx->ntfs_ino = ni;
-                       }
-                       ctx->attr = (ATTR_RECORD*)((u8*)ctx->mrec +
-                                       le16_to_cpu(ctx->mrec->attrs_offset));
-               }
-               /*
-                * ctx->vfs_ino, ctx->mrec, and ctx->attr now point to the
-                * mft record containing the attribute represented by the
-                * current al_entry.
-                */
-               /*
-                * We could call into ntfs_attr_find() to find the right
-                * attribute in this mft record but this would be less
-                * efficient and not quite accurate as ntfs_attr_find() ignores
- * the attribute instance numbers, for example, which become
-                * important when one plays with attribute lists.  Also,
-                * because a proper match has been found in the attribute list
-                * entry above, the comparison can now be optimized.  So it is
-                * worth re-implementing a simplified ntfs_attr_find() here.
-                */
-               a = ctx->attr;
-               /*
-                * Use a manual loop so we can still use break and continue
-                * with the same meanings as above.
-                */
-do_next_attr_loop:
-               if ((u8*)a < (u8*)ctx->mrec || (u8*)a > (u8*)ctx->mrec +
-                               le32_to_cpu(ctx->mrec->bytes_allocated))
-                       break;
-               if (a->type == AT_END)
-                       break;
-               if (!a->length)
-                       break;
-               if (al_entry->instance != a->instance)
-                       goto do_next_attr;
-               /*
-                * If the type and/or the name are mismatched between the
-                * attribute list entry and the attribute record, there is
-                * corruption so we break and return error EIO.
-                */
-               if (al_entry->type != a->type)
-                       break;
-               if (!ntfs_are_names_equal((ntfschar*)((u8*)a +
-                               le16_to_cpu(a->name_offset)), a->name_length,
-                               al_name, al_name_len, CASE_SENSITIVE,
-                               vol->upcase, vol->upcase_len))
-                       break;
-               ctx->attr = a;
-               /*
-                * If no @val specified or @val specified and it matches, we
-                * have found it!
-                */
-               if (!val || (!a->non_resident && le32_to_cpu(
-                               a->data.resident.value_length) == val_len &&
-                               !memcmp((u8*)a +
-                               le16_to_cpu(a->data.resident.value_offset),
-                               val, val_len))) {
-                       ntfs_debug("Done, found.");
-                       return 0;
-               }
-do_next_attr:
-               /* Proceed to the next attribute in the current mft record. */
-               a = (ATTR_RECORD*)((u8*)a + le32_to_cpu(a->length));
-               goto do_next_attr_loop;
-       }
-       if (!err) {
-               ntfs_error(vol->sb, "Base inode 0x%lx contains corrupt "
-                               "attribute list attribute.%s", base_ni->mft_no,
-                               es);
-               err = -EIO;
-       }
-       if (ni != base_ni) {
-               if (ni)
-                       unmap_extent_mft_record(ni);
-               ctx->ntfs_ino = base_ni;
-               ctx->mrec = ctx->base_mrec;
-               ctx->attr = ctx->base_attr;
-       }
-       if (err != -ENOMEM)
-               NVolSetErrors(vol);
-       return err;
-not_found:
-       /*
-        * If we were looking for AT_END, we reset the search context @ctx and
-        * use ntfs_attr_find() to seek to the end of the base mft record.
-        */
-       if (type == AT_END) {
-               ntfs_attr_reinit_search_ctx(ctx);
-               return ntfs_attr_find(AT_END, name, name_len, ic, val, val_len,
-                               ctx);
-       }
-       /*
-        * The attribute was not found.  Before we return, we want to ensure
-        * @ctx->mrec and @ctx->attr indicate the position at which the
-        * attribute should be inserted in the base mft record.  Since we also
-        * want to preserve @ctx->al_entry we cannot reinitialize the search
-        * context using ntfs_attr_reinit_search_ctx() as this would set
-        * @ctx->al_entry to NULL.  Thus we do the necessary bits manually (see
-        * ntfs_attr_init_search_ctx() below).  Note, we _only_ preserve
-        * @ctx->al_entry as the remaining fields (base_*) are identical to
-        * their non base_ counterparts and we cannot set @ctx->base_attr
-        * correctly yet as we do not know what @ctx->attr will be set to by
-        * the call to ntfs_attr_find() below.
-        */
-       if (ni != base_ni)
-               unmap_extent_mft_record(ni);
-       ctx->mrec = ctx->base_mrec;
-       ctx->attr = (ATTR_RECORD*)((u8*)ctx->mrec +
-                       le16_to_cpu(ctx->mrec->attrs_offset));
-       ctx->is_first = true;
-       ctx->ntfs_ino = base_ni;
-       ctx->base_ntfs_ino = NULL;
-       ctx->base_mrec = NULL;
-       ctx->base_attr = NULL;
-       /*
-        * In case there are multiple matches in the base mft record, need to
-        * keep enumerating until we get an attribute not found response (or
-        * another error), otherwise we would keep returning the same attribute
-        * over and over again and all programs using us for enumeration would
-        * lock up in a tight loop.
-        */
-       do {
-               err = ntfs_attr_find(type, name, name_len, ic, val, val_len,
-                               ctx);
-       } while (!err);
-       ntfs_debug("Done, not found.");
-       return err;
-}
-
-/**
- * ntfs_attr_lookup - find an attribute in an ntfs inode
- * @type:      attribute type to find
- * @name:      attribute name to find (optional, i.e. NULL means don't care)
- * @name_len:  attribute name length (only needed if @name present)
- * @ic:                IGNORE_CASE or CASE_SENSITIVE (ignored if @name not present)
- * @lowest_vcn:        lowest vcn to find (optional, non-resident attributes only)
- * @val:       attribute value to find (optional, resident attributes only)
- * @val_len:   attribute value length
- * @ctx:       search context with mft record and attribute to search from
- *
- * Find an attribute in an ntfs inode.  On first search @ctx->ntfs_ino must
- * be the base mft record and @ctx must have been obtained from a call to
- * ntfs_attr_get_search_ctx().
- *
- * This function transparently handles attribute lists and @ctx is used to
- * continue searches where they left off.
- *
- * After finishing with the attribute/mft record you need to call
- * ntfs_attr_put_search_ctx() to cleanup the search context (unmapping any
- * mapped inodes, etc).
- *
- * Return 0 if the search was successful and -errno if not.
- *
- * When 0, @ctx->attr is the found attribute and it is in mft record
- * @ctx->mrec.  If an attribute list attribute is present, @ctx->al_entry is
- * the attribute list entry of the found attribute.
- *
- * When -ENOENT, @ctx->attr is the attribute which collates just after the
- * attribute being searched for, i.e. if one wants to add the attribute to the
- * mft record this is the correct place to insert it into.  If an attribute
- * list attribute is present, @ctx->al_entry is the attribute list entry which
- * collates just after the attribute list entry of the attribute being searched
- * for, i.e. if one wants to add the attribute to the mft record this is the
- * correct place to insert its attribute list entry into.
- *
- * When -errno != -ENOENT, an error occurred during the lookup.  @ctx->attr is
- * then undefined and in particular you should not rely on it not changing.
- */
-int ntfs_attr_lookup(const ATTR_TYPE type, const ntfschar *name,
-               const u32 name_len, const IGNORE_CASE_BOOL ic,
-               const VCN lowest_vcn, const u8 *val, const u32 val_len,
-               ntfs_attr_search_ctx *ctx)
-{
-       ntfs_inode *base_ni;
-
-       ntfs_debug("Entering.");
-       BUG_ON(IS_ERR(ctx->mrec));
-       if (ctx->base_ntfs_ino)
-               base_ni = ctx->base_ntfs_ino;
-       else
-               base_ni = ctx->ntfs_ino;
-       /* Sanity check, just for debugging really. */
-       BUG_ON(!base_ni);
-       if (!NInoAttrList(base_ni) || type == AT_ATTRIBUTE_LIST)
-               return ntfs_attr_find(type, name, name_len, ic, val, val_len,
-                               ctx);
-       return ntfs_external_attr_find(type, name, name_len, ic, lowest_vcn,
-                       val, val_len, ctx);
-}
-
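
The canonical get/lookup/put pattern just described, as a sketch; looking up the unnamed $DATA attribute is only an example.

    static int example_lookup_data(ntfs_inode *ni)
    {
            ntfs_attr_search_ctx *ctx;
            MFT_RECORD *m = map_mft_record(ni);
            int err;

            if (IS_ERR(m))
                    return PTR_ERR(m);
            ctx = ntfs_attr_get_search_ctx(ni, m);
            if (unlikely(!ctx)) {
                    err = -ENOMEM;
                    goto unm_err_out;
            }
            err = ntfs_attr_lookup(AT_DATA, NULL, 0, CASE_SENSITIVE, 0,
                            NULL, 0, ctx);
            if (!err) {
                    /* Found: ctx->attr lives in ctx->mrec (possibly an
                     * extent mft record). */
            }
            ntfs_attr_put_search_ctx(ctx);
    unm_err_out:
            unmap_mft_record(ni);
            return err;
    }
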
-/**
- * ntfs_attr_init_search_ctx - initialize an attribute search context
- * @ctx:       attribute search context to initialize
- * @ni:                ntfs inode with which to initialize the search context
- * @mrec:      mft record with which to initialize the search context
- *
- * Initialize the attribute search context @ctx with @ni and @mrec.
- */
-static inline void ntfs_attr_init_search_ctx(ntfs_attr_search_ctx *ctx,
-               ntfs_inode *ni, MFT_RECORD *mrec)
-{
-       *ctx = (ntfs_attr_search_ctx) {
-               .mrec = mrec,
-               /* Sanity checks are performed elsewhere. */
-               .attr = (ATTR_RECORD*)((u8*)mrec +
-                               le16_to_cpu(mrec->attrs_offset)),
-               .is_first = true,
-               .ntfs_ino = ni,
-       };
-}
-
-/**
- * ntfs_attr_reinit_search_ctx - reinitialize an attribute search context
- * @ctx:       attribute search context to reinitialize
- *
- * Reinitialize the attribute search context @ctx, unmapping an associated
- * extent mft record if present, and initialize the search context again.
- *
- * This is used when a search for a new attribute is being started to reset
- * the search context to the beginning.
- */
-void ntfs_attr_reinit_search_ctx(ntfs_attr_search_ctx *ctx)
-{
-       if (likely(!ctx->base_ntfs_ino)) {
-               /* No attribute list. */
-               ctx->is_first = true;
-               /* Sanity checks are performed elsewhere. */
-               ctx->attr = (ATTR_RECORD*)((u8*)ctx->mrec +
-                               le16_to_cpu(ctx->mrec->attrs_offset));
-               /*
-                * This needs resetting due to ntfs_external_attr_find() which
-                * can leave it set despite having zeroed ctx->base_ntfs_ino.
-                */
-               ctx->al_entry = NULL;
-               return;
-       } /* Attribute list. */
-       if (ctx->ntfs_ino != ctx->base_ntfs_ino)
-               unmap_extent_mft_record(ctx->ntfs_ino);
-       ntfs_attr_init_search_ctx(ctx, ctx->base_ntfs_ino, ctx->base_mrec);
-       return;
-}
-
-/**
- * ntfs_attr_get_search_ctx - allocate/initialize a new attribute search context
- * @ni:                ntfs inode with which to initialize the search context
- * @mrec:      mft record with which to initialize the search context
- *
- * Allocate a new attribute search context, initialize it with @ni and @mrec,
- * and return it. Return NULL if allocation failed.
- */
-ntfs_attr_search_ctx *ntfs_attr_get_search_ctx(ntfs_inode *ni, MFT_RECORD *mrec)
-{
-       ntfs_attr_search_ctx *ctx;
-
-       ctx = kmem_cache_alloc(ntfs_attr_ctx_cache, GFP_NOFS);
-       if (ctx)
-               ntfs_attr_init_search_ctx(ctx, ni, mrec);
-       return ctx;
-}
-
-/**
- * ntfs_attr_put_search_ctx - release an attribute search context
- * @ctx:       attribute search context to free
- *
- * Release the attribute search context @ctx, unmapping an associated extent
- * mft record if present.
- */
-void ntfs_attr_put_search_ctx(ntfs_attr_search_ctx *ctx)
-{
-       if (ctx->base_ntfs_ino && ctx->ntfs_ino != ctx->base_ntfs_ino)
-               unmap_extent_mft_record(ctx->ntfs_ino);
-       kmem_cache_free(ntfs_attr_ctx_cache, ctx);
-       return;
-}
-
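
A sketch of the enumerate-then-rewind usage described above; the wrapper is hypothetical.

    static int example_enumerate(ntfs_attr_search_ctx *ctx)
    {
            int err;

            /* Enumerate until the documented "not found" response. */
            while (!(err = ntfs_attr_lookup(AT_DATA, NULL, 0,
                            CASE_SENSITIVE, 0, NULL, 0, ctx))) {
                    /* Visit ctx->attr (and ctx->al_entry if present). */
            }
            if (err != -ENOENT)
                    return err;     /* Hard error such as -EIO. */
            /* Rewind the same context before starting a new search. */
            ntfs_attr_reinit_search_ctx(ctx);
            return 0;
    }
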
-#ifdef NTFS_RW
-
-/**
- * ntfs_attr_find_in_attrdef - find an attribute in the $AttrDef system file
- * @vol:       ntfs volume to which the attribute belongs
- * @type:      attribute type which to find
- *
- * Search for the attribute definition record corresponding to the attribute
- * @type in the $AttrDef system file.
- *
- * Return the attribute type definition record if found and NULL if not found.
- */
-static ATTR_DEF *ntfs_attr_find_in_attrdef(const ntfs_volume *vol,
-               const ATTR_TYPE type)
-{
-       ATTR_DEF *ad;
-
-       BUG_ON(!vol->attrdef);
-       BUG_ON(!type);
-       for (ad = vol->attrdef; (u8*)ad - (u8*)vol->attrdef <
-                       vol->attrdef_size && ad->type; ++ad) {
-               /* We have not found it yet, carry on searching. */
-               if (likely(le32_to_cpu(ad->type) < le32_to_cpu(type)))
-                       continue;
-               /* We found the attribute; return it. */
-               if (likely(ad->type == type))
-                       return ad;
-               /* We have gone too far already.  No point in continuing. */
-               break;
-       }
-       /* Attribute not found. */
-       ntfs_debug("Attribute type 0x%x not found in $AttrDef.",
-                       le32_to_cpu(type));
-       return NULL;
-}
-
-/**
- * ntfs_attr_size_bounds_check - check a size of an attribute type for validity
- * @vol:       ntfs volume to which the attribute belongs
- * @type:      attribute type which to check
- * @size:      size which to check
- *
- * Check whether the @size in bytes is valid for an attribute of @type on the
- * ntfs volume @vol.  This information is obtained from $AttrDef system file.
- *
- * Return 0 if valid, -ERANGE if not valid, or -ENOENT if the attribute is not
- * listed in $AttrDef.
- */
-int ntfs_attr_size_bounds_check(const ntfs_volume *vol, const ATTR_TYPE type,
-               const s64 size)
-{
-       ATTR_DEF *ad;
-
-       BUG_ON(size < 0);
-       /*
-        * $ATTRIBUTE_LIST has a maximum size of 256kiB, but this is not
-        * listed in $AttrDef.
-        */
-       if (unlikely(type == AT_ATTRIBUTE_LIST && size > 256 * 1024))
-               return -ERANGE;
-       /* Get the $AttrDef entry for the attribute @type. */
-       ad = ntfs_attr_find_in_attrdef(vol, type);
-       if (unlikely(!ad))
-               return -ENOENT;
-       /* Do the bounds check. */
-       if (((sle64_to_cpu(ad->min_size) > 0) &&
-                       size < sle64_to_cpu(ad->min_size)) ||
-                       ((sle64_to_cpu(ad->max_size) > 0) && size >
-                       sle64_to_cpu(ad->max_size)))
-               return -ERANGE;
-       return 0;
-}
-
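
A caller sketch; mapping -ENOENT to -EIO here is an assumed policy, and real callers differ.

    static int example_check_size(const ntfs_volume *vol, const s64 new_size)
    {
            int err = ntfs_attr_size_bounds_check(vol, AT_DATA, new_size);

            if (err == -ERANGE)
                    return err;     /* Outside the $AttrDef min/max. */
            if (err == -ENOENT)
                    return -EIO;    /* Unlisted type; treating this as
                                     * corruption is one plausible policy. */
            return 0;
    }
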
-/**
- * ntfs_attr_can_be_non_resident - check if an attribute can be non-resident
- * @vol:       ntfs volume to which the attribute belongs
- * @type:      attribute type which to check
- *
- * Check whether the attribute of @type on the ntfs volume @vol is allowed to
- * be non-resident.  This information is obtained from $AttrDef system file.
- *
- * Return 0 if the attribute is allowed to be non-resident, -EPERM if not, and
- * -ENOENT if the attribute is not listed in $AttrDef.
- */
-int ntfs_attr_can_be_non_resident(const ntfs_volume *vol, const ATTR_TYPE type)
-{
-       ATTR_DEF *ad;
-
-       /* Find the attribute definition record in $AttrDef. */
-       ad = ntfs_attr_find_in_attrdef(vol, type);
-       if (unlikely(!ad))
-               return -ENOENT;
-       /* Check the flags and return the result. */
-       if (ad->flags & ATTR_DEF_RESIDENT)
-               return -EPERM;
-       return 0;
-}
-
-/**
- * ntfs_attr_can_be_resident - check if an attribute can be resident
- * @vol:       ntfs volume to which the attribute belongs
- * @type:      attribute type which to check
- *
- * Check whether the attribute of @type on the ntfs volume @vol is allowed to
- * be resident.  This information is derived from our ntfs knowledge and may
- * not be completely accurate, especially when user defined attributes are
- * present.  Basically we allow everything to be resident except for index
- * allocation and $EA attributes.
- *
- * Return 0 if the attribute is allowed to be resident and -EPERM if not.
- *
- * Warning: In the system file $MFT the attribute $Bitmap must be non-resident
- *         otherwise windows will not boot (blue screen of death)!  We cannot
- *         check for this here as we do not know which inode's $Bitmap is
- *         being asked about so the caller needs to special case this.
- */
-int ntfs_attr_can_be_resident(const ntfs_volume *vol, const ATTR_TYPE type)
-{
-       if (type == AT_INDEX_ALLOCATION)
-               return -EPERM;
-       return 0;
-}
-
-/**
- * ntfs_attr_record_resize - resize an attribute record
- * @m:         mft record containing attribute record
- * @a:         attribute record to resize
- * @new_size:  new size in bytes to which to resize the attribute record @a
- *
- * Resize the attribute record @a, i.e. the resident part of the attribute, in
- * the mft record @m to @new_size bytes.
- *
- * Return 0 on success and -errno on error.  The following error codes are
- * defined:
- *     -ENOSPC - Not enough space in the mft record @m to perform the resize.
- *
- * Note: On error, no modifications have been performed whatsoever.
- *
- * Warning: If you make a record smaller without having copied all the data you
- *         are interested in, the data may be overwritten.
- */
-int ntfs_attr_record_resize(MFT_RECORD *m, ATTR_RECORD *a, u32 new_size)
-{
-       ntfs_debug("Entering for new_size %u.", new_size);
-       /* Align to 8 bytes if it is not already done. */
-       if (new_size & 7)
-               new_size = (new_size + 7) & ~7;
-       /* If the actual attribute length has changed, move things around. */
-       if (new_size != le32_to_cpu(a->length)) {
-               u32 new_muse = le32_to_cpu(m->bytes_in_use) -
-                               le32_to_cpu(a->length) + new_size;
-               /* Not enough space in this mft record. */
-               if (new_muse > le32_to_cpu(m->bytes_allocated))
-                       return -ENOSPC;
-               /* Move attributes following @a to their new location. */
-               memmove((u8*)a + new_size, (u8*)a + le32_to_cpu(a->length),
-                               le32_to_cpu(m->bytes_in_use) - ((u8*)a -
-                               (u8*)m) - le32_to_cpu(a->length));
-               /* Adjust @m to reflect the change in used space. */
-               m->bytes_in_use = cpu_to_le32(new_muse);
-               /* Adjust @a to reflect the new size. */
-               if (new_size >= offsetof(ATTR_REC, length) + sizeof(a->length))
-                       a->length = cpu_to_le32(new_size);
-       }
-       return 0;
-}
-
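
A sketch of growing an attribute record in place; @extra is illustrative.

    static int example_grow_record(MFT_RECORD *m, ATTR_RECORD *a,
                    const u32 extra)
    {
            int err = ntfs_attr_record_resize(m, a,
                            le32_to_cpu(a->length) + extra);

            /*
             * -ENOSPC means @m has no room: the caller must make space in
             * the mft record (or move the attribute) and retry; nothing
             * has been modified in that case.
             */
            return err;
    }
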
-/**
- * ntfs_resident_attr_value_resize - resize the value of a resident attribute
- * @m:         mft record containing attribute record
- * @a:         attribute record whose value to resize
- * @new_size:  new size in bytes to which to resize the attribute value of @a
- *
- * Resize the value of the attribute @a in the mft record @m to @new_size bytes.
- * If the value is made bigger, the newly allocated space is cleared.
- *
- * Return 0 on success and -errno on error.  The following error codes are
- * defined:
- *     -ENOSPC - Not enough space in the mft record @m to perform the resize.
- *
- * Note: On error, no modifications have been performed whatsoever.
- *
- * Warning: If you make a record smaller without having copied all the data you
- *         are interested in, the data may be overwritten.
- */
-int ntfs_resident_attr_value_resize(MFT_RECORD *m, ATTR_RECORD *a,
-               const u32 new_size)
-{
-       u32 old_size;
-
-       /* Resize the resident part of the attribute record. */
-       if (ntfs_attr_record_resize(m, a,
-                       le16_to_cpu(a->data.resident.value_offset) + new_size))
-               return -ENOSPC;
-       /*
-        * The resize succeeded!  If we made the attribute value bigger, clear
-        * the area between the old size and @new_size.
-        */
-       old_size = le32_to_cpu(a->data.resident.value_length);
-       if (new_size > old_size)
-               memset((u8*)a + le16_to_cpu(a->data.resident.value_offset) +
-                               old_size, 0, new_size - old_size);
-       /* Finally update the length of the attribute value. */
-       a->data.resident.value_length = cpu_to_le32(new_size);
-       return 0;
-}
-
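
The resident-value variant, as a similar sketch; @new_len is illustrative.

    static int example_grow_value(MFT_RECORD *m, ATTR_RECORD *a,
                    const u32 new_len)
    {
            /* On success any grown tail has already been zeroed. */
            return ntfs_resident_attr_value_resize(m, a, new_len);
    }
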
-/**
- * ntfs_attr_make_non_resident - convert a resident to a non-resident attribute
- * @ni:                ntfs inode describing the attribute to convert
- * @data_size: size of the resident data to copy to the non-resident attribute
- *
- * Convert the resident ntfs attribute described by the ntfs inode @ni to a
- * non-resident one.
- *
- * @data_size must be equal to the attribute value size.  This is needed since
- * we need to know the size before we can map the mft record and our callers
- * always know it.  The reason we cannot simply read the size from the vfs
- * inode i_size is that this is not necessarily uptodate.  This happens when
- * ntfs_attr_make_non_resident() is called in the ->truncate call path(s).
- *
- * Return 0 on success and -errno on error.  The following error return codes
- * are defined:
- *     -EPERM  - The attribute is not allowed to be non-resident.
- *     -ENOMEM - Not enough memory.
- *     -ENOSPC - Not enough disk space.
- *     -EINVAL - Attribute not defined on the volume.
- *     -EIO    - I/O error or other error.
- * Note that -ENOSPC is also returned in the case that there is not enough
- * space in the mft record to do the conversion.  This can happen when the mft
- * record is already very full.  The caller is responsible for trying to make
- * space in the mft record and trying again.  FIXME: Do we need a separate
- * error return code for this kind of -ENOSPC or is it always worth trying
- * again in case the attribute may then fit in a resident state so no need to
- * make it non-resident at all?  Ho-hum...  (AIA)
- *
- * NOTE to self: No changes in the attribute list are required to move from
- *              a resident to a non-resident attribute.
- *
- * Locking: - The caller must hold i_mutex on the inode.
- */
-int ntfs_attr_make_non_resident(ntfs_inode *ni, const u32 data_size)
-{
-       s64 new_size;
-       struct inode *vi = VFS_I(ni);
-       ntfs_volume *vol = ni->vol;
-       ntfs_inode *base_ni;
-       MFT_RECORD *m;
-       ATTR_RECORD *a;
-       ntfs_attr_search_ctx *ctx;
-       struct page *page;
-       runlist_element *rl;
-       u8 *kaddr;
-       unsigned long flags;
-       int mp_size, mp_ofs, name_ofs, arec_size, err, err2;
-       u32 attr_size;
-       u8 old_res_attr_flags;
-
-       /* Check that the attribute is allowed to be non-resident. */
-       err = ntfs_attr_can_be_non_resident(vol, ni->type);
-       if (unlikely(err)) {
-               if (err == -EPERM)
-                       ntfs_debug("Attribute is not allowed to be "
-                                       "non-resident.");
-               else
-                       ntfs_debug("Attribute not defined on the NTFS "
-                                       "volume!");
-               return err;
-       }
-       /*
-        * FIXME: Compressed and encrypted attributes are not supported when
-        * writing and we should never have gotten here for them.
-        */
-       BUG_ON(NInoCompressed(ni));
-       BUG_ON(NInoEncrypted(ni));
-       /*
-        * The size needs to be aligned to a cluster boundary for allocation
-        * purposes.
-        */
-       new_size = (data_size + vol->cluster_size - 1) &
-                       ~(vol->cluster_size - 1);
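-       /*
-        * Worked example (added for illustration): with a 4096 byte cluster
-        * size, data_size 5000 rounds up to new_size 8192, i.e. two whole
-        * clusters.  This relies on cluster_size being a power of two.
-        */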
-       if (new_size > 0) {
-               /*
-                * Will need the page later and since the page lock nests
-                * outside all ntfs locks, we need to get the page now.
-                */
-               page = find_or_create_page(vi->i_mapping, 0,
-                               mapping_gfp_mask(vi->i_mapping));
-               if (unlikely(!page))
-                       return -ENOMEM;
-               /* Start by allocating clusters to hold the attribute value. */
-               rl = ntfs_cluster_alloc(vol, 0, new_size >>
-                               vol->cluster_size_bits, -1, DATA_ZONE, true);
-               if (IS_ERR(rl)) {
-                       err = PTR_ERR(rl);
-                       ntfs_debug("Failed to allocate cluster%s, error code "
-                                       "%i.", (new_size >>
-                                       vol->cluster_size_bits) > 1 ? "s" : "",
-                                       err);
-                       goto page_err_out;
-               }
-       } else {
-               rl = NULL;
-               page = NULL;
-       }
-       /* Determine the size of the mapping pairs array. */
-       mp_size = ntfs_get_size_for_mapping_pairs(vol, rl, 0, -1);
-       if (unlikely(mp_size < 0)) {
-               err = mp_size;
-               ntfs_debug("Failed to get size for mapping pairs array, error "
-                               "code %i.", err);
-               goto rl_err_out;
-       }
-       down_write(&ni->runlist.lock);
-       if (!NInoAttr(ni))
-               base_ni = ni;
-       else
-               base_ni = ni->ext.base_ntfs_ino;
-       m = map_mft_record(base_ni);
-       if (IS_ERR(m)) {
-               err = PTR_ERR(m);
-               m = NULL;
-               ctx = NULL;
-               goto err_out;
-       }
-       ctx = ntfs_attr_get_search_ctx(base_ni, m);
-       if (unlikely(!ctx)) {
-               err = -ENOMEM;
-               goto err_out;
-       }
-       err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
-                       CASE_SENSITIVE, 0, NULL, 0, ctx);
-       if (unlikely(err)) {
-               if (err == -ENOENT)
-                       err = -EIO;
-               goto err_out;
-       }
-       m = ctx->mrec;
-       a = ctx->attr;
-       BUG_ON(NInoNonResident(ni));
-       BUG_ON(a->non_resident);
-       /*
-        * Calculate new offsets for the name and the mapping pairs array.
-        */
-       if (NInoSparse(ni) || NInoCompressed(ni))
-               name_ofs = (offsetof(ATTR_REC,
-                               data.non_resident.compressed_size) +
-                               sizeof(a->data.non_resident.compressed_size) +
-                               7) & ~7;
-       else
-               name_ofs = (offsetof(ATTR_REC,
-                               data.non_resident.compressed_size) + 7) & ~7;
-       mp_ofs = (name_ofs + a->name_length * sizeof(ntfschar) + 7) & ~7;
-       /*
-        * Determine the size of the resident part of the now non-resident
-        * attribute record.
-        */
-       arec_size = (mp_ofs + mp_size + 7) & ~7;
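-       /*
-        * Note (added for illustration): (x + 7) & ~7 rounds x up to the
-        * next multiple of 8, e.g. (90 + 7) & ~7 = 96, so name_ofs, mp_ofs
-        * and arec_size are all 8-byte aligned as NTFS requires.
-        */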
-       /*
-        * If the page is not uptodate, bring it uptodate by copying from the
-        * attribute value.
-        */
-       attr_size = le32_to_cpu(a->data.resident.value_length);
-       BUG_ON(attr_size != data_size);
-       if (page && !PageUptodate(page)) {
-               kaddr = kmap_atomic(page);
-               memcpy(kaddr, (u8*)a +
-                               le16_to_cpu(a->data.resident.value_offset),
-                               attr_size);
-               memset(kaddr + attr_size, 0, PAGE_SIZE - attr_size);
-               kunmap_atomic(kaddr);
-               flush_dcache_page(page);
-               SetPageUptodate(page);
-       }
-       /* Backup the attribute flag. */
-       old_res_attr_flags = a->data.resident.flags;
-       /* Resize the resident part of the attribute record. */
-       err = ntfs_attr_record_resize(m, a, arec_size);
-       if (unlikely(err))
-               goto err_out;
-       /*
-        * Convert the resident part of the attribute record to describe a
-        * non-resident attribute.
-        */
-       a->non_resident = 1;
-       /* Move the attribute name if it exists and update the offset. */
-       if (a->name_length)
-               memmove((u8*)a + name_ofs, (u8*)a + le16_to_cpu(a->name_offset),
-                               a->name_length * sizeof(ntfschar));
-       a->name_offset = cpu_to_le16(name_ofs);
-       /* Setup the fields specific to non-resident attributes. */
-       a->data.non_resident.lowest_vcn = 0;
-       a->data.non_resident.highest_vcn = cpu_to_sle64((new_size - 1) >>
-                       vol->cluster_size_bits);
-       a->data.non_resident.mapping_pairs_offset = cpu_to_le16(mp_ofs);
-       memset(&a->data.non_resident.reserved, 0,
-                       sizeof(a->data.non_resident.reserved));
-       a->data.non_resident.allocated_size = cpu_to_sle64(new_size);
-       a->data.non_resident.data_size =
-                       a->data.non_resident.initialized_size =
-                       cpu_to_sle64(attr_size);
-       if (NInoSparse(ni) || NInoCompressed(ni)) {
-               a->data.non_resident.compression_unit = 0;
-               if (NInoCompressed(ni) || vol->major_ver < 3)
-                       a->data.non_resident.compression_unit = 4;
-               a->data.non_resident.compressed_size =
-                               a->data.non_resident.allocated_size;
-       } else
-               a->data.non_resident.compression_unit = 0;
-       /* Generate the mapping pairs array into the attribute record. */
-       err = ntfs_mapping_pairs_build(vol, (u8*)a + mp_ofs,
-                       arec_size - mp_ofs, rl, 0, -1, NULL);
-       if (unlikely(err)) {
-               ntfs_debug("Failed to build mapping pairs, error code %i.",
-                               err);
-               goto undo_err_out;
-       }
-       /* Setup the in-memory attribute structure to be non-resident. */
-       ni->runlist.rl = rl;
-       write_lock_irqsave(&ni->size_lock, flags);
-       ni->allocated_size = new_size;
-       if (NInoSparse(ni) || NInoCompressed(ni)) {
-               ni->itype.compressed.size = ni->allocated_size;
-               if (a->data.non_resident.compression_unit) {
-                       ni->itype.compressed.block_size = 1U << (a->data.
-                                       non_resident.compression_unit +
-                                       vol->cluster_size_bits);
-                       ni->itype.compressed.block_size_bits =
-                                       ffs(ni->itype.compressed.block_size) -
-                                       1;
-                       ni->itype.compressed.block_clusters = 1U <<
-                                       a->data.non_resident.compression_unit;
-               } else {
-                       ni->itype.compressed.block_size = 0;
-                       ni->itype.compressed.block_size_bits = 0;
-                       ni->itype.compressed.block_clusters = 0;
-               }
-               vi->i_blocks = ni->itype.compressed.size >> 9;
-       } else
-               vi->i_blocks = ni->allocated_size >> 9;
-       write_unlock_irqrestore(&ni->size_lock, flags);
-       /*
-        * This needs to be last since the address space operations ->read_folio
-        * and ->writepage can run concurrently with us as they are not
-        * serialized on i_mutex.  Note, we are not allowed to fail once we flip
-        * this switch, which is another reason to do this last.
-        */
-       NInoSetNonResident(ni);
-       /* Mark the mft record dirty, so it gets written back. */
-       flush_dcache_mft_record_page(ctx->ntfs_ino);
-       mark_mft_record_dirty(ctx->ntfs_ino);
-       ntfs_attr_put_search_ctx(ctx);
-       unmap_mft_record(base_ni);
-       up_write(&ni->runlist.lock);
-       if (page) {
-               set_page_dirty(page);
-               unlock_page(page);
-               put_page(page);
-       }
-       ntfs_debug("Done.");
-       return 0;
-undo_err_out:
-       /* Convert the attribute back into a resident attribute. */
-       a->non_resident = 0;
-       /* Move the attribute name if it exists and update the offset. */
-       name_ofs = (offsetof(ATTR_RECORD, data.resident.reserved) +
-                       sizeof(a->data.resident.reserved) + 7) & ~7;
-       if (a->name_length)
-               memmove((u8*)a + name_ofs, (u8*)a + le16_to_cpu(a->name_offset),
-                               a->name_length * sizeof(ntfschar));
-       mp_ofs = (name_ofs + a->name_length * sizeof(ntfschar) + 7) & ~7;
-       a->name_offset = cpu_to_le16(name_ofs);
-       arec_size = (mp_ofs + attr_size + 7) & ~7;
-       /* Resize the resident part of the attribute record. */
-       err2 = ntfs_attr_record_resize(m, a, arec_size);
-       if (unlikely(err2)) {
-               /*
-                * This cannot happen (well if memory corruption is at work it
-                * could happen in theory), but deal with it as well as we can.
-                * If the old size is too small, truncate the attribute,
-                * otherwise simply give it a larger allocated size.
-                * FIXME: Should check whether chkdsk complains when the
-                * allocated size is much bigger than the resident value size.
-                */
-               arec_size = le32_to_cpu(a->length);
-               if ((mp_ofs + attr_size) > arec_size) {
-                       err2 = attr_size;
-                       attr_size = arec_size - mp_ofs;
-                       ntfs_error(vol->sb, "Failed to undo partial resident "
-                                       "to non-resident attribute "
-                                       "conversion.  Truncating inode 0x%lx, "
-                                       "attribute type 0x%x from %i bytes to "
-                                       "%i bytes to maintain metadata "
-                                       "consistency.  THIS MEANS YOU ARE "
-                                       "LOSING %i BYTES OF DATA FROM THIS %s.",
-                                       vi->i_ino,
-                                       (unsigned)le32_to_cpu(ni->type),
-                                       err2, attr_size, err2 - attr_size,
-                                       ((ni->type == AT_DATA) &&
-                                       !ni->name_len) ? "FILE": "ATTRIBUTE");
-                       write_lock_irqsave(&ni->size_lock, flags);
-                       ni->initialized_size = attr_size;
-                       i_size_write(vi, attr_size);
-                       write_unlock_irqrestore(&ni->size_lock, flags);
-               }
-       }
-       /* Setup the fields specific to resident attributes. */
-       a->data.resident.value_length = cpu_to_le32(attr_size);
-       a->data.resident.value_offset = cpu_to_le16(mp_ofs);
-       a->data.resident.flags = old_res_attr_flags;
-       memset(&a->data.resident.reserved, 0,
-                       sizeof(a->data.resident.reserved));
-       /* Copy the data from the page back to the attribute value. */
-       if (page) {
-               kaddr = kmap_atomic(page);
-               memcpy((u8*)a + mp_ofs, kaddr, attr_size);
-               kunmap_atomic(kaddr);
-       }
-       /* Setup the allocated size in the ntfs inode in case it changed. */
-       write_lock_irqsave(&ni->size_lock, flags);
-       ni->allocated_size = arec_size - mp_ofs;
-       write_unlock_irqrestore(&ni->size_lock, flags);
-       /* Mark the mft record dirty, so it gets written back. */
-       flush_dcache_mft_record_page(ctx->ntfs_ino);
-       mark_mft_record_dirty(ctx->ntfs_ino);
-err_out:
-       if (ctx)
-               ntfs_attr_put_search_ctx(ctx);
-       if (m)
-               unmap_mft_record(base_ni);
-       ni->runlist.rl = NULL;
-       up_write(&ni->runlist.lock);
-rl_err_out:
-       if (rl) {
-               if (ntfs_cluster_free_from_rl(vol, rl) < 0) {
-                       ntfs_error(vol->sb, "Failed to release allocated "
-                                       "cluster(s) in error code path.  Run "
-                                       "chkdsk to recover the lost "
-                                       "cluster(s).");
-                       NVolSetErrors(vol);
-               }
-               ntfs_free(rl);
-page_err_out:
-               unlock_page(page);
-               put_page(page);
-       }
-       if (err == -EINVAL)
-               err = -EIO;
-       return err;
-}
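-
-/*
- * Illustrative sketch only (not from the original driver): the retry pattern
- * the documentation above asks of callers.  On -ENOSPC from a resident
- * resize, convert the attribute to non-resident and let the caller redo the
- * lookup and retry; ntfs_attr_extend_allocation() below implements exactly
- * this dance.  The -EAGAIN convention here is hypothetical.
- */
-static int example_resize_or_go_non_resident(ntfs_inode *ni, MFT_RECORD *m,
-               ATTR_RECORD *a, const u32 new_size)
-{
-       int err = ntfs_resident_attr_value_resize(m, a, new_size);
-
-       if (err != -ENOSPC)
-               return err;
-       err = ntfs_attr_make_non_resident(ni,
-                       le32_to_cpu(a->data.resident.value_length));
-       return err ? err : -EAGAIN; /* Hypothetical: caller redoes lookup. */
-}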
-
-/**
- * ntfs_attr_extend_allocation - extend the allocated space of an attribute
- * @ni:                        ntfs inode of the attribute whose allocation to extend
- * @new_alloc_size:    new size in bytes to which to extend the allocation
- * @new_data_size:     new size in bytes to which to extend the data
- * @data_start:                beginning of region which is required to be non-sparse
- *
- * Extend the allocated space of an attribute described by the ntfs inode @ni
- * to @new_alloc_size bytes.  If @data_start is -1, the whole extension may be
- * implemented as a hole in the file (as long as both the volume and the ntfs
- * inode @ni have sparse support enabled).  If @data_start is >= 0, then the
- * region between the old allocated size and @data_start - 1 may be made sparse
- * but the region between @data_start and @new_alloc_size must be backed by
- * actual clusters.
- *
- * If @new_data_size is -1, it is ignored.  If it is >= 0, then the data size
- * of the attribute is extended to @new_data_size.  Note that the i_size of the
- * vfs inode is not updated.  Only the data size in the base attribute record
- * is updated.  The caller has to update i_size separately if this is required.
- * WARNING: It is a BUG() for @new_data_size to be smaller than the old data
- * size as well as for @new_data_size to be greater than @new_alloc_size.
- *
- * For resident attributes this involves resizing the attribute record and if
- * necessary moving it and/or other attributes into extent mft records and/or
- * converting the attribute to a non-resident attribute which in turn involves
- * extending the allocation of a non-resident attribute as described below.
- *
- * For non-resident attributes this involves allocating clusters in the data
- * zone on the volume (except for regions that are being made sparse) and
- * extending the run list to describe the allocated clusters as well as
- * updating the mapping pairs array of the attribute.  This in turn involves
- * resizing the attribute record and if necessary moving it and/or other
- * attributes into extent mft records and/or splitting the attribute record
- * into multiple extent attribute records.
- *
- * Also, the attribute list attribute is updated if present and in some of the
- * above cases (the ones where extent mft records/attributes come into play),
- * an attribute list attribute is created if not already present.
- *
- * Return the new allocated size on success and -errno on error.  In the case
- * that an error is encountered but a partial extension at least up to
- * @data_start (if present) is possible, the allocation is partially extended
- * and this is returned.  This means the caller must check the returned size to
- * determine if the extension was partial.  If @data_start is -1 then partial
- * allocations are not performed.
- *
- * WARNING: Do not call ntfs_attr_extend_allocation() for $MFT/$DATA.
- *
- * Locking: This function takes the runlist lock of @ni for writing as well as
- * locking the mft record of the base ntfs inode.  These locks are maintained
- * throughout execution of the function.  These locks are required so that the
- * attribute can be resized safely and so that it can for example be converted
- * from resident to non-resident safely.
- *
- * TODO: At present attribute list attribute handling is not implemented.
- *
- * TODO: At present it is not safe to call this function for anything other
- * than the $DATA attribute(s) of an uncompressed and unencrypted file.
- */
-s64 ntfs_attr_extend_allocation(ntfs_inode *ni, s64 new_alloc_size,
-               const s64 new_data_size, const s64 data_start)
-{
-       VCN vcn;
-       s64 ll, allocated_size, start = data_start;
-       struct inode *vi = VFS_I(ni);
-       ntfs_volume *vol = ni->vol;
-       ntfs_inode *base_ni;
-       MFT_RECORD *m;
-       ATTR_RECORD *a;
-       ntfs_attr_search_ctx *ctx;
-       runlist_element *rl, *rl2;
-       unsigned long flags;
-       int err, mp_size;
-       u32 attr_len = 0; /* Silence stupid gcc warning. */
-       bool mp_rebuilt;
-
-#ifdef DEBUG
-       read_lock_irqsave(&ni->size_lock, flags);
-       allocated_size = ni->allocated_size;
-       read_unlock_irqrestore(&ni->size_lock, flags);
-       ntfs_debug("Entering for i_ino 0x%lx, attribute type 0x%x, "
-                       "old_allocated_size 0x%llx, "
-                       "new_allocated_size 0x%llx, new_data_size 0x%llx, "
-                       "data_start 0x%llx.", vi->i_ino,
-                       (unsigned)le32_to_cpu(ni->type),
-                       (unsigned long long)allocated_size,
-                       (unsigned long long)new_alloc_size,
-                       (unsigned long long)new_data_size,
-                       (unsigned long long)start);
-#endif
-retry_extend:
-       /*
-        * For non-resident attributes, @start and @new_alloc_size need to be
-        * aligned to cluster boundaries for allocation purposes.
-        */
-       if (NInoNonResident(ni)) {
-               if (start > 0)
-                       start &= ~(s64)vol->cluster_size_mask;
-               new_alloc_size = (new_alloc_size + vol->cluster_size - 1) &
-                               ~(s64)vol->cluster_size_mask;
-       }
-       BUG_ON(new_data_size >= 0 && new_data_size > new_alloc_size);
-       /* Check if new size is allowed in $AttrDef. */
-       err = ntfs_attr_size_bounds_check(vol, ni->type, new_alloc_size);
-       if (unlikely(err)) {
-               /* Only emit errors when the write will fail completely. */
-               read_lock_irqsave(&ni->size_lock, flags);
-               allocated_size = ni->allocated_size;
-               read_unlock_irqrestore(&ni->size_lock, flags);
-               if (start < 0 || start >= allocated_size) {
-                       if (err == -ERANGE) {
-                               ntfs_error(vol->sb, "Cannot extend allocation "
-                                               "of inode 0x%lx, attribute "
-                                               "type 0x%x, because the new "
-                                               "allocation would exceed the "
-                                               "maximum allowed size for "
-                                               "this attribute type.",
-                                               vi->i_ino, (unsigned)
-                                               le32_to_cpu(ni->type));
-                       } else {
-                               ntfs_error(vol->sb, "Cannot extend allocation "
-                                               "of inode 0x%lx, attribute "
-                                               "type 0x%x, because this "
-                                               "attribute type is not "
-                                               "defined on the NTFS volume.  "
-                                               "Possible corruption!  You "
-                                               "should run chkdsk!",
-                                               vi->i_ino, (unsigned)
-                                               le32_to_cpu(ni->type));
-                       }
-               }
-               /* Translate error code to be POSIX conformant for write(2). */
-               if (err == -ERANGE)
-                       err = -EFBIG;
-               else
-                       err = -EIO;
-               return err;
-       }
-       if (!NInoAttr(ni))
-               base_ni = ni;
-       else
-               base_ni = ni->ext.base_ntfs_ino;
-       /*
-        * We will be modifying both the runlist (if non-resident) and the mft
-        * record so lock them both down.
-        */
-       down_write(&ni->runlist.lock);
-       m = map_mft_record(base_ni);
-       if (IS_ERR(m)) {
-               err = PTR_ERR(m);
-               m = NULL;
-               ctx = NULL;
-               goto err_out;
-       }
-       ctx = ntfs_attr_get_search_ctx(base_ni, m);
-       if (unlikely(!ctx)) {
-               err = -ENOMEM;
-               goto err_out;
-       }
-       read_lock_irqsave(&ni->size_lock, flags);
-       allocated_size = ni->allocated_size;
-       read_unlock_irqrestore(&ni->size_lock, flags);
-       /*
-        * If non-resident, seek to the last extent.  If resident, there is
-        * only one extent, so seek to that.
-        */
-       vcn = NInoNonResident(ni) ? allocated_size >> vol->cluster_size_bits :
-                       0;
-       /*
-        * Abort if someone did the work whilst we waited for the locks.  If we
-        * just converted the attribute from resident to non-resident it is
-        * likely that exactly this has happened already.  We cannot quite
-        * abort if we need to update the data size.
-        */
-       if (unlikely(new_alloc_size <= allocated_size)) {
-               ntfs_debug("Allocated size already exceeds requested size.");
-               new_alloc_size = allocated_size;
-               if (new_data_size < 0)
-                       goto done;
-               /*
-                * We want the first attribute extent so that we can update the
-                * data size.
-                */
-               vcn = 0;
-       }
-       err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
-                       CASE_SENSITIVE, vcn, NULL, 0, ctx);
-       if (unlikely(err)) {
-               if (err == -ENOENT)
-                       err = -EIO;
-               goto err_out;
-       }
-       m = ctx->mrec;
-       a = ctx->attr;
-       /* Use goto to reduce indentation. */
-       if (a->non_resident)
-               goto do_non_resident_extend;
-       BUG_ON(NInoNonResident(ni));
-       /* The total length of the attribute value. */
-       attr_len = le32_to_cpu(a->data.resident.value_length);
-       /*
-        * Extend the attribute record to be able to store the new attribute
-        * size.  ntfs_attr_record_resize() will not do anything if the size is
-        * not changing.
-        */
-       if (new_alloc_size < vol->mft_record_size &&
-                       !ntfs_attr_record_resize(m, a,
-                       le16_to_cpu(a->data.resident.value_offset) +
-                       new_alloc_size)) {
-               /* The resize succeeded! */
-               write_lock_irqsave(&ni->size_lock, flags);
-               ni->allocated_size = le32_to_cpu(a->length) -
-                               le16_to_cpu(a->data.resident.value_offset);
-               write_unlock_irqrestore(&ni->size_lock, flags);
-               if (new_data_size >= 0) {
-                       BUG_ON(new_data_size < attr_len);
-                       a->data.resident.value_length =
-                                       cpu_to_le32((u32)new_data_size);
-               }
-               goto flush_done;
-       }
-       /*
-        * We have to drop all the locks so we can call
-        * ntfs_attr_make_non_resident().  This could be optimised by try-
-        * locking the first page cache page and only if that fails dropping
-        * the locks, locking the page, and redoing all the locking and
-        * lookups.  While this would be a huge optimisation, it is not worth
-        * it as this is definitely a slow code path.
-        */
-       ntfs_attr_put_search_ctx(ctx);
-       unmap_mft_record(base_ni);
-       up_write(&ni->runlist.lock);
-       /*
-        * Not enough space in the mft record, try to make the attribute
-        * non-resident and if successful restart the extension process.
-        */
-       err = ntfs_attr_make_non_resident(ni, attr_len);
-       if (likely(!err))
-               goto retry_extend;
-       /*
-        * Could not make non-resident.  If this is due to this not being
-        * permitted for this attribute type or there not being enough space,
-        * try to make other attributes non-resident.  Otherwise fail.
-        */
-       if (unlikely(err != -EPERM && err != -ENOSPC)) {
-               /* Only emit errors when the write will fail completely. */
-               read_lock_irqsave(&ni->size_lock, flags);
-               allocated_size = ni->allocated_size;
-               read_unlock_irqrestore(&ni->size_lock, flags);
-               if (start < 0 || start >= allocated_size)
-                       ntfs_error(vol->sb, "Cannot extend allocation of "
-                                       "inode 0x%lx, attribute type 0x%x, "
-                                       "because the conversion from resident "
-                                       "to non-resident attribute failed "
-                                       "with error code %i.", vi->i_ino,
-                                       (unsigned)le32_to_cpu(ni->type), err);
-               if (err != -ENOMEM)
-                       err = -EIO;
-               goto conv_err_out;
-       }
-       /* TODO: Not implemented from here, abort. */
-       read_lock_irqsave(&ni->size_lock, flags);
-       allocated_size = ni->allocated_size;
-       read_unlock_irqrestore(&ni->size_lock, flags);
-       if (start < 0 || start >= allocated_size) {
-               if (err == -ENOSPC)
-                       ntfs_error(vol->sb, "Not enough space in the mft "
-                                       "record/on disk for the non-resident "
-                                       "attribute value.  This case is not "
-                                       "implemented yet.");
-               else /* if (err == -EPERM) */
-                       ntfs_error(vol->sb, "This attribute type may not be "
-                                       "non-resident.  This case is not "
-                                       "implemented yet.");
-       }
-       err = -EOPNOTSUPP;
-       goto conv_err_out;
-#if 0
-       // TODO: Attempt to make other attributes non-resident.
-       if (!err)
-               goto do_resident_extend;
-       /*
-        * Both the attribute list attribute and the standard information
-        * attribute must remain in the base inode.  Thus, if this is one of
-        * these attributes, we have to try to move other attributes out into
-        * extent mft records instead.
-        */
-       if (ni->type == AT_ATTRIBUTE_LIST ||
-                       ni->type == AT_STANDARD_INFORMATION) {
-               // TODO: Attempt to move other attributes into extent mft
-               // records.
-               err = -EOPNOTSUPP;
-               if (!err)
-                       goto do_resident_extend;
-               goto err_out;
-       }
-       // TODO: Attempt to move this attribute to an extent mft record, but
-       // only if it is not already the only attribute in an mft record in
-       // which case there would be nothing to gain.
-       err = -EOPNOTSUPP;
-       if (!err)
-               goto do_resident_extend;
-       /* There is nothing we can do to make enough space. )-: */
-       goto err_out;
-#endif
-do_non_resident_extend:
-       BUG_ON(!NInoNonResident(ni));
-       if (new_alloc_size == allocated_size) {
-               BUG_ON(vcn);
-               goto alloc_done;
-       }
-       /*
-        * If the data starts after the end of the old allocation, this is a
-        * $DATA attribute and sparse attributes are enabled on the volume and
-        * for this inode, then create a sparse region between the old
-        * allocated size and the start of the data.  Otherwise simply proceed
-        * with filling the whole space between the old allocated size and the
-        * new allocated size with clusters.
-        */
-       if ((start >= 0 && start <= allocated_size) || ni->type != AT_DATA ||
-                       !NVolSparseEnabled(vol) || NInoSparseDisabled(ni))
-               goto skip_sparse;
-       // TODO: This is not implemented yet.  We just fill in with real
-       // clusters for now...
-       ntfs_debug("Inserting holes is not implemented yet.  Falling back to "
-                       "allocating real clusters instead.");
-skip_sparse:
-       rl = ni->runlist.rl;
-       if (likely(rl)) {
-               /* Seek to the end of the runlist. */
-               while (rl->length)
-                       rl++;
-       }
-       /* If this attribute extent is not mapped, map it now. */
-       if (unlikely(!rl || rl->lcn == LCN_RL_NOT_MAPPED ||
-                       (rl->lcn == LCN_ENOENT && rl > ni->runlist.rl &&
-                       (rl-1)->lcn == LCN_RL_NOT_MAPPED))) {
-               if (!rl && !allocated_size)
-                       goto first_alloc;
-               rl = ntfs_mapping_pairs_decompress(vol, a, ni->runlist.rl);
-               if (IS_ERR(rl)) {
-                       err = PTR_ERR(rl);
-                       if (start < 0 || start >= allocated_size)
-                               ntfs_error(vol->sb, "Cannot extend allocation "
-                                               "of inode 0x%lx, attribute "
-                                               "type 0x%x, because the "
-                                               "mapping of a runlist "
-                                               "fragment failed with error "
-                                               "code %i.", vi->i_ino,
-                                               (unsigned)le32_to_cpu(ni->type),
-                                               err);
-                       if (err != -ENOMEM)
-                               err = -EIO;
-                       goto err_out;
-               }
-               ni->runlist.rl = rl;
-               /* Seek to the end of the runlist. */
-               while (rl->length)
-                       rl++;
-       }
-       /*
-        * We now know the runlist of the last extent is mapped and @rl is at
-        * the end of the runlist.  We want to begin allocating clusters
-        * starting at the last allocated cluster to reduce fragmentation.  If
-        * there are no valid LCNs in the attribute we let the cluster
-        * allocator choose the starting cluster.
-        */
-       /* If the last LCN is a hole or similar, seek back to the last real LCN. */
-       while (rl->lcn < 0 && rl > ni->runlist.rl)
-               rl--;
-first_alloc:
-       // FIXME: Need to implement partial allocations so at least part of the
-       // write can be performed when start >= 0.  (Needed for POSIX write(2)
-       // conformance.)
-       rl2 = ntfs_cluster_alloc(vol, allocated_size >> vol->cluster_size_bits,
-                       (new_alloc_size - allocated_size) >>
-                       vol->cluster_size_bits, (rl && (rl->lcn >= 0)) ?
-                       rl->lcn + rl->length : -1, DATA_ZONE, true);
-       if (IS_ERR(rl2)) {
-               err = PTR_ERR(rl2);
-               if (start < 0 || start >= allocated_size)
-                       ntfs_error(vol->sb, "Cannot extend allocation of "
-                                       "inode 0x%lx, attribute type 0x%x, "
-                                       "because the allocation of clusters "
-                                       "failed with error code %i.", vi->i_ino,
-                                       (unsigned)le32_to_cpu(ni->type), err);
-               if (err != -ENOMEM && err != -ENOSPC)
-                       err = -EIO;
-               goto err_out;
-       }
-       rl = ntfs_runlists_merge(ni->runlist.rl, rl2);
-       if (IS_ERR(rl)) {
-               err = PTR_ERR(rl);
-               if (start < 0 || start >= allocated_size)
-                       ntfs_error(vol->sb, "Cannot extend allocation of "
-                                       "inode 0x%lx, attribute type 0x%x, "
-                                       "because the runlist merge failed "
-                                       "with error code %i.", vi->i_ino,
-                                       (unsigned)le32_to_cpu(ni->type), err);
-               if (err != -ENOMEM)
-                       err = -EIO;
-               if (ntfs_cluster_free_from_rl(vol, rl2)) {
-                       ntfs_error(vol->sb, "Failed to release allocated "
-                                       "cluster(s) in error code path.  Run "
-                                       "chkdsk to recover the lost "
-                                       "cluster(s).");
-                       NVolSetErrors(vol);
-               }
-               ntfs_free(rl2);
-               goto err_out;
-       }
-       ni->runlist.rl = rl;
-       ntfs_debug("Allocated 0x%llx clusters.", (long long)(new_alloc_size -
-                       allocated_size) >> vol->cluster_size_bits);
-       /* Find the runlist element with which the attribute extent starts. */
-       ll = sle64_to_cpu(a->data.non_resident.lowest_vcn);
-       rl2 = ntfs_rl_find_vcn_nolock(rl, ll);
-       BUG_ON(!rl2);
-       BUG_ON(!rl2->length);
-       BUG_ON(rl2->lcn < LCN_HOLE);
-       mp_rebuilt = false;
-       /* Get the size for the new mapping pairs array for this extent. */
-       mp_size = ntfs_get_size_for_mapping_pairs(vol, rl2, ll, -1);
-       if (unlikely(mp_size <= 0)) {
-               err = mp_size;
-               if (start < 0 || start >= allocated_size)
-                       ntfs_error(vol->sb, "Cannot extend allocation of "
-                                       "inode 0x%lx, attribute type 0x%x, "
-                                       "because determining the size for the "
-                                       "mapping pairs failed with error code "
-                                       "%i.", vi->i_ino,
-                                       (unsigned)le32_to_cpu(ni->type), err);
-               err = -EIO;
-               goto undo_alloc;
-       }
-       /* Extend the attribute record to fit the bigger mapping pairs array. */
-       attr_len = le32_to_cpu(a->length);
-       err = ntfs_attr_record_resize(m, a, mp_size +
-                       le16_to_cpu(a->data.non_resident.mapping_pairs_offset));
-       if (unlikely(err)) {
-               BUG_ON(err != -ENOSPC);
-               // TODO: Deal with this by moving this extent to a new mft
-               // record or by starting a new extent in a new mft record,
-               // possibly by extending this extent partially and filling it
-               // and creating a new extent for the remainder, or by making
-               // other attributes non-resident and/or by moving other
-               // attributes out of this mft record.
-               if (start < 0 || start >= allocated_size)
-                       ntfs_error(vol->sb, "Not enough space in the mft "
-                                       "record for the extended attribute "
-                                       "record.  This case is not "
-                                       "implemented yet.");
-               err = -EOPNOTSUPP;
-               goto undo_alloc;
-       }
-       mp_rebuilt = true;
-       /* Generate the mapping pairs array directly into the attr record. */
-       err = ntfs_mapping_pairs_build(vol, (u8*)a +
-                       le16_to_cpu(a->data.non_resident.mapping_pairs_offset),
-                       mp_size, rl2, ll, -1, NULL);
-       if (unlikely(err)) {
-               if (start < 0 || start >= allocated_size)
-                       ntfs_error(vol->sb, "Cannot extend allocation of "
-                                       "inode 0x%lx, attribute type 0x%x, "
-                                       "because building the mapping pairs "
-                                       "failed with error code %i.", vi->i_ino,
-                                       (unsigned)le32_to_cpu(ni->type), err);
-               err = -EIO;
-               goto undo_alloc;
-       }
-       /* Update the highest_vcn. */
-       a->data.non_resident.highest_vcn = cpu_to_sle64((new_alloc_size >>
-                       vol->cluster_size_bits) - 1);
-       /*
-        * We now have extended the allocated size of the attribute.  Reflect
-        * this in the ntfs_inode structure and the attribute record.
-        */
-       if (a->data.non_resident.lowest_vcn) {
-               /*
-                * We are not in the first attribute extent, switch to it, but
-                * first ensure the changes will make it to disk later.
-                */
-               flush_dcache_mft_record_page(ctx->ntfs_ino);
-               mark_mft_record_dirty(ctx->ntfs_ino);
-               ntfs_attr_reinit_search_ctx(ctx);
-               err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
-                               CASE_SENSITIVE, 0, NULL, 0, ctx);
-               if (unlikely(err))
-                       goto restore_undo_alloc;
-               /* @m is not used any more so no need to set it. */
-               a = ctx->attr;
-       }
-       write_lock_irqsave(&ni->size_lock, flags);
-       ni->allocated_size = new_alloc_size;
-       a->data.non_resident.allocated_size = cpu_to_sle64(new_alloc_size);
-       /*
-        * FIXME: This would fail if @ni is a directory, $MFT, or an index,
-        * since those can have sparse/compressed set.  For example, a
-        * directory can be set compressed even though it is not compressed
-        * itself, and in that case the bit means that files are to be created
-        * compressed in the directory...  At present this is ok as this code
-        * is only called for regular files, and only for their $DATA
-        * attribute(s).
-        * FIXME: The calculation is wrong if we created a hole above.  For now
-        * it does not matter as we never create holes.
-        */
-       if (NInoSparse(ni) || NInoCompressed(ni)) {
-               ni->itype.compressed.size += new_alloc_size - allocated_size;
-               a->data.non_resident.compressed_size =
-                               cpu_to_sle64(ni->itype.compressed.size);
-               vi->i_blocks = ni->itype.compressed.size >> 9;
-       } else
-               vi->i_blocks = new_alloc_size >> 9;
-       write_unlock_irqrestore(&ni->size_lock, flags);
-alloc_done:
-       if (new_data_size >= 0) {
-               BUG_ON(new_data_size <
-                               sle64_to_cpu(a->data.non_resident.data_size));
-               a->data.non_resident.data_size = cpu_to_sle64(new_data_size);
-       }
-flush_done:
-       /* Ensure the changes make it to disk. */
-       flush_dcache_mft_record_page(ctx->ntfs_ino);
-       mark_mft_record_dirty(ctx->ntfs_ino);
-done:
-       ntfs_attr_put_search_ctx(ctx);
-       unmap_mft_record(base_ni);
-       up_write(&ni->runlist.lock);
-       ntfs_debug("Done, new_allocated_size 0x%llx.",
-                       (unsigned long long)new_alloc_size);
-       return new_alloc_size;
-restore_undo_alloc:
-       if (start < 0 || start >= allocated_size)
-               ntfs_error(vol->sb, "Cannot complete extension of allocation "
-                               "of inode 0x%lx, attribute type 0x%x, because "
-                               "lookup of first attribute extent failed with "
-                               "error code %i.", vi->i_ino,
-                               (unsigned)le32_to_cpu(ni->type), err);
-       if (err == -ENOENT)
-               err = -EIO;
-       ntfs_attr_reinit_search_ctx(ctx);
-       if (ntfs_attr_lookup(ni->type, ni->name, ni->name_len, CASE_SENSITIVE,
-                       allocated_size >> vol->cluster_size_bits, NULL, 0,
-                       ctx)) {
-               ntfs_error(vol->sb, "Failed to find last attribute extent of "
-                               "attribute in error code path.  Run chkdsk to "
-                               "recover.");
-               write_lock_irqsave(&ni->size_lock, flags);
-               ni->allocated_size = new_alloc_size;
-               /*
-                * FIXME: This would fail if @ni is a directory...  See above.
-                * FIXME: The calculation is wrong if we created a hole above.
-                * For now it does not matter as we never create holes.
-                */
-               if (NInoSparse(ni) || NInoCompressed(ni)) {
-                       ni->itype.compressed.size += new_alloc_size -
-                                       allocated_size;
-                       vi->i_blocks = ni->itype.compressed.size >> 9;
-               } else
-                       vi->i_blocks = new_alloc_size >> 9;
-               write_unlock_irqrestore(&ni->size_lock, flags);
-               ntfs_attr_put_search_ctx(ctx);
-               unmap_mft_record(base_ni);
-               up_write(&ni->runlist.lock);
-               /*
-                * The only thing that is now wrong is the allocated size of the
-                * base attribute extent which chkdsk should be able to fix.
-                */
-               NVolSetErrors(vol);
-               return err;
-       }
-       ctx->attr->data.non_resident.highest_vcn = cpu_to_sle64(
-                       (allocated_size >> vol->cluster_size_bits) - 1);
-undo_alloc:
-       ll = allocated_size >> vol->cluster_size_bits;
-       if (ntfs_cluster_free(ni, ll, -1, ctx) < 0) {
-               ntfs_error(vol->sb, "Failed to release allocated cluster(s) "
-                               "in error code path.  Run chkdsk to recover "
-                               "the lost cluster(s).");
-               NVolSetErrors(vol);
-       }
-       m = ctx->mrec;
-       a = ctx->attr;
-       /*
-        * If the runlist truncation fails and/or the search context is no
-        * longer valid, we cannot resize the attribute record or build the
-        * mapping pairs array thus we mark the inode bad so that no access to
-        * the freed clusters can happen.
-        */
-       if (ntfs_rl_truncate_nolock(vol, &ni->runlist, ll) || IS_ERR(m)) {
-               ntfs_error(vol->sb, "Failed to %s in error code path.  Run "
-                               "chkdsk to recover.", IS_ERR(m) ?
-                               "restore attribute search context" :
-                               "truncate attribute runlist");
-               NVolSetErrors(vol);
-       } else if (mp_rebuilt) {
-               if (ntfs_attr_record_resize(m, a, attr_len)) {
-                       ntfs_error(vol->sb, "Failed to restore attribute "
-                                       "record in error code path.  Run "
-                                       "chkdsk to recover.");
-                       NVolSetErrors(vol);
-               } else /* if (success) */ {
-                       if (ntfs_mapping_pairs_build(vol, (u8*)a + le16_to_cpu(
-                                       a->data.non_resident.
-                                       mapping_pairs_offset), attr_len -
-                                       le16_to_cpu(a->data.non_resident.
-                                       mapping_pairs_offset), rl2, ll, -1,
-                                       NULL)) {
-                               ntfs_error(vol->sb, "Failed to restore "
-                                               "mapping pairs array in error "
-                                               "code path.  Run chkdsk to "
-                                               "recover.");
-                               NVolSetErrors(vol);
-                       }
-                       flush_dcache_mft_record_page(ctx->ntfs_ino);
-                       mark_mft_record_dirty(ctx->ntfs_ino);
-               }
-       }
-err_out:
-       if (ctx)
-               ntfs_attr_put_search_ctx(ctx);
-       if (m)
-               unmap_mft_record(base_ni);
-       up_write(&ni->runlist.lock);
-conv_err_out:
-       ntfs_debug("Failed.  Returning error code %i.", err);
-       return err;
-}
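-
-/*
- * Illustrative sketch only (not part of the original driver): checking for a
- * partial extension as the documentation above requires.  @want is the
- * desired allocated size; a non-negative return smaller than @want means
- * only a partial extension up to @data_start was possible.
- */
-static int example_extend_checked(ntfs_inode *ni, const s64 want,
-               const s64 data_start)
-{
-       s64 got = ntfs_attr_extend_allocation(ni, want, -1, data_start);
-
-       if (got < 0)
-               return (int)got;        /* -errno, nothing was extended. */
-       if (got < want)
-               ntfs_debug("Partial extension only, got 0x%llx bytes.",
-                               (unsigned long long)got);
-       return 0;
-}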
-
-/**
- * ntfs_attr_set - fill (a part of) an attribute with a byte
- * @ni:                ntfs inode describing the attribute to fill
- * @ofs:       offset inside the attribute at which to start to fill
- * @cnt:       number of bytes to fill
- * @val:       the unsigned 8-bit value with which to fill the attribute
- *
- * Fill @cnt bytes of the attribute described by the ntfs inode @ni starting at
- * byte offset @ofs inside the attribute with the constant byte @val.
- *
- * This function is effectively like memset() applied to an ntfs attribute.
- * Note this function actually only operates on the page cache pages belonging
- * to the ntfs attribute and it marks them dirty after doing the memset().
- * Thus it relies on the vm dirty page write code paths to cause the modified
- * pages to be written to the mft record/disk.
- *
- * Return 0 on success and -errno on error.  An error code of -ESPIPE means
- * that @ofs + @cnt were outside the end of the attribute and no write was
- * performed.
- */
-int ntfs_attr_set(ntfs_inode *ni, const s64 ofs, const s64 cnt, const u8 val)
-{
-       ntfs_volume *vol = ni->vol;
-       struct address_space *mapping;
-       struct page *page;
-       u8 *kaddr;
-       pgoff_t idx, end;
-       unsigned start_ofs, end_ofs, size;
-
-       ntfs_debug("Entering for ofs 0x%llx, cnt 0x%llx, val 0x%hx.",
-                       (long long)ofs, (long long)cnt, val);
-       BUG_ON(ofs < 0);
-       BUG_ON(cnt < 0);
-       if (!cnt)
-               goto done;
-       /*
-        * FIXME: Compressed and encrypted attributes are not supported when
-        * writing and we should never have gotten here for them.
-        */
-       BUG_ON(NInoCompressed(ni));
-       BUG_ON(NInoEncrypted(ni));
-       mapping = VFS_I(ni)->i_mapping;
-       /* Work out the starting index and page offset. */
-       idx = ofs >> PAGE_SHIFT;
-       start_ofs = ofs & ~PAGE_MASK;
-       /* Work out the ending index and page offset. */
-       end = ofs + cnt;
-       end_ofs = end & ~PAGE_MASK;
-       /* If the end is outside the inode size, return -ESPIPE. */
-       if (unlikely(end > i_size_read(VFS_I(ni)))) {
-               ntfs_error(vol->sb, "Request exceeds end of attribute.");
-               return -ESPIPE;
-       }
-       end >>= PAGE_SHIFT;
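-       /*
-        * Worked example (added for illustration): with 4 KiB pages, ofs 1000
-        * and cnt 10000 give idx 0, start_ofs 1000, end_ofs 2808 and, after
-        * the shift above, end 2: a partial first page, one whole middle page
-        * and a partial last page.
-        */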
-       /* If there is a first partial page, need to do it the slow way. */
-       if (start_ofs) {
-               page = read_mapping_page(mapping, idx, NULL);
-               if (IS_ERR(page)) {
-                       ntfs_error(vol->sb, "Failed to read first partial "
-                                       "page (error, index 0x%lx).", idx);
-                       return PTR_ERR(page);
-               }
-               /*
-                * If the last page is the same as the first page, need to
-                * limit the write to the end offset.
-                */
-               size = PAGE_SIZE;
-               if (idx == end)
-                       size = end_ofs;
-               kaddr = kmap_atomic(page);
-               memset(kaddr + start_ofs, val, size - start_ofs);
-               flush_dcache_page(page);
-               kunmap_atomic(kaddr);
-               set_page_dirty(page);
-               put_page(page);
-               balance_dirty_pages_ratelimited(mapping);
-               cond_resched();
-               if (idx == end)
-                       goto done;
-               idx++;
-       }
-       /* Do the whole pages the fast way. */
-       for (; idx < end; idx++) {
-               /* Find or create the current page.  (The page is locked.) */
-               page = grab_cache_page(mapping, idx);
-               if (unlikely(!page)) {
-                       ntfs_error(vol->sb, "Insufficient memory to grab "
-                                       "page (index 0x%lx).", idx);
-                       return -ENOMEM;
-               }
-               kaddr = kmap_atomic(page);
-               memset(kaddr, val, PAGE_SIZE);
-               flush_dcache_page(page);
-               kunmap_atomic(kaddr);
-               /*
-                * If the page has buffers, mark them uptodate since buffer
-                * state and not page state is definitive in 2.6 kernels.
-                */
-               if (page_has_buffers(page)) {
-                       struct buffer_head *bh, *head;
-
-                       bh = head = page_buffers(page);
-                       do {
-                               set_buffer_uptodate(bh);
-                       } while ((bh = bh->b_this_page) != head);
-               }
-               /* Now that buffers are uptodate, set the page uptodate, too. */
-               SetPageUptodate(page);
-               /*
-                * Set the page and all its buffers dirty and mark the inode
-                * dirty, too.  The VM will write the page later on.
-                */
-               set_page_dirty(page);
-               /* Finally unlock and release the page. */
-               unlock_page(page);
-               put_page(page);
-               balance_dirty_pages_ratelimited(mapping);
-               cond_resched();
-       }
-       /* If there is a last partial page, need to do it the slow way. */
-       if (end_ofs) {
-               page = read_mapping_page(mapping, idx, NULL);
-               if (IS_ERR(page)) {
-                       ntfs_error(vol->sb, "Failed to read last partial page "
-                                       "(error, index 0x%lx).", idx);
-                       return PTR_ERR(page);
-               }
-               kaddr = kmap_atomic(page);
-               memset(kaddr, val, end_ofs);
-               flush_dcache_page(page);
-               kunmap_atomic(kaddr);
-               set_page_dirty(page);
-               put_page(page);
-               balance_dirty_pages_ratelimited(mapping);
-               cond_resched();
-       }
-done:
-       ntfs_debug("Done.");
-       return 0;
-}
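-
-/*
- * Illustrative sketch only (not part of the original driver): zeroing a
- * region of an attribute, e.g. the tail of an attribute after a truncate,
- * using ntfs_attr_set() as a memset() over the page cache.  @ofs and @cnt
- * must lie within i_size or -ESPIPE is returned and nothing is written.
- */
-static int example_zero_region(ntfs_inode *ni, const s64 ofs, const s64 cnt)
-{
-       return ntfs_attr_set(ni, ofs, cnt, 0);
-}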
-
-#endif /* NTFS_RW */
diff --git a/fs/ntfs/attrib.h b/fs/ntfs/attrib.h
deleted file mode 100644 (file)
index fe0890d..0000000
+++ /dev/null
@@ -1,102 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * attrib.h - Defines for attribute handling in NTFS Linux kernel driver.
- *           Part of the Linux-NTFS project.
- *
- * Copyright (c) 2001-2005 Anton Altaparmakov
- * Copyright (c) 2002 Richard Russon
- */
-
-#ifndef _LINUX_NTFS_ATTRIB_H
-#define _LINUX_NTFS_ATTRIB_H
-
-#include "endian.h"
-#include "types.h"
-#include "layout.h"
-#include "inode.h"
-#include "runlist.h"
-#include "volume.h"
-
-/**
- * ntfs_attr_search_ctx - used in attribute search functions
- * @mrec:      buffer containing mft record to search
- * @attr:      attribute record in @mrec where to begin/continue search
- * @is_first:  if true ntfs_attr_lookup() begins search with @attr, else after
- *
- * Structure must be initialized to zero before the first call to one of the
- * attribute search functions. Initialize @mrec to point to the mft record to
- * search, and @attr to point to the first attribute within @mrec (not necessary
- * if calling the _first() functions), and set @is_first to 'true' (not necessary
- * if calling the _first() functions).
- *
- * If @is_first is 'true', the search begins with @attr. If @is_first is 'false',
- * the search begins after @attr. This is so that, after the first call to one
- * of the search attribute functions, we can call the function again, without
- * any modification of the search context, to automagically get the next
- * matching attribute.
- */
-typedef struct {
-       MFT_RECORD *mrec;
-       ATTR_RECORD *attr;
-       bool is_first;
-       ntfs_inode *ntfs_ino;
-       ATTR_LIST_ENTRY *al_entry;
-       ntfs_inode *base_ntfs_ino;
-       MFT_RECORD *base_mrec;
-       ATTR_RECORD *base_attr;
-} ntfs_attr_search_ctx;
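-
-/*
- * Illustrative usage sketch (added comment, not part of the original
- * header): the repeated-lookup pattern described above, e.g. walking all
- * $DATA attributes of a base inode @ni whose mft record @m is mapped:
- *
- *     ctx = ntfs_attr_get_search_ctx(ni, m);
- *     while (!ntfs_attr_lookup(AT_DATA, NULL, 0, CASE_SENSITIVE, 0, NULL,
- *                     0, ctx))
- *             ;       // ctx->attr is the next matching attribute record.
- *     ntfs_attr_put_search_ctx(ctx);
- */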
-
-extern int ntfs_map_runlist_nolock(ntfs_inode *ni, VCN vcn,
-               ntfs_attr_search_ctx *ctx);
-extern int ntfs_map_runlist(ntfs_inode *ni, VCN vcn);
-
-extern LCN ntfs_attr_vcn_to_lcn_nolock(ntfs_inode *ni, const VCN vcn,
-               const bool write_locked);
-
-extern runlist_element *ntfs_attr_find_vcn_nolock(ntfs_inode *ni,
-               const VCN vcn, ntfs_attr_search_ctx *ctx);
-
-int ntfs_attr_lookup(const ATTR_TYPE type, const ntfschar *name,
-               const u32 name_len, const IGNORE_CASE_BOOL ic,
-               const VCN lowest_vcn, const u8 *val, const u32 val_len,
-               ntfs_attr_search_ctx *ctx);
-
-extern int load_attribute_list(ntfs_volume *vol, runlist *rl, u8 *al_start,
-               const s64 size, const s64 initialized_size);
-
-static inline s64 ntfs_attr_size(const ATTR_RECORD *a)
-{
-       if (!a->non_resident)
-               return (s64)le32_to_cpu(a->data.resident.value_length);
-       return sle64_to_cpu(a->data.non_resident.data_size);
-}
-
-extern void ntfs_attr_reinit_search_ctx(ntfs_attr_search_ctx *ctx);
-extern ntfs_attr_search_ctx *ntfs_attr_get_search_ctx(ntfs_inode *ni,
-               MFT_RECORD *mrec);
-extern void ntfs_attr_put_search_ctx(ntfs_attr_search_ctx *ctx);
-
-#ifdef NTFS_RW
-
-extern int ntfs_attr_size_bounds_check(const ntfs_volume *vol,
-               const ATTR_TYPE type, const s64 size);
-extern int ntfs_attr_can_be_non_resident(const ntfs_volume *vol,
-               const ATTR_TYPE type);
-extern int ntfs_attr_can_be_resident(const ntfs_volume *vol,
-               const ATTR_TYPE type);
-
-extern int ntfs_attr_record_resize(MFT_RECORD *m, ATTR_RECORD *a, u32 new_size);
-extern int ntfs_resident_attr_value_resize(MFT_RECORD *m, ATTR_RECORD *a,
-               const u32 new_size);
-
-extern int ntfs_attr_make_non_resident(ntfs_inode *ni, const u32 data_size);
-
-extern s64 ntfs_attr_extend_allocation(ntfs_inode *ni, s64 new_alloc_size,
-               const s64 new_data_size, const s64 data_start);
-
-extern int ntfs_attr_set(ntfs_inode *ni, const s64 ofs, const s64 cnt,
-               const u8 val);
-
-#endif /* NTFS_RW */
-
-#endif /* _LINUX_NTFS_ATTRIB_H */
diff --git a/fs/ntfs/bitmap.c b/fs/ntfs/bitmap.c
deleted file mode 100644 (file)
index 0675b24..0000000
+++ /dev/null
@@ -1,179 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * bitmap.c - NTFS kernel bitmap handling.  Part of the Linux-NTFS project.
- *
- * Copyright (c) 2004-2005 Anton Altaparmakov
- */
-
-#ifdef NTFS_RW
-
-#include <linux/pagemap.h>
-
-#include "bitmap.h"
-#include "debug.h"
-#include "aops.h"
-#include "ntfs.h"
-
-/**
- * __ntfs_bitmap_set_bits_in_run - set a run of bits in a bitmap to a value
- * @vi:                        vfs inode describing the bitmap
- * @start_bit:         first bit to set
- * @count:             number of bits to set
- * @value:             value to set the bits to (i.e. 0 or 1)
- * @is_rollback:       if 'true' this is a rollback operation
- *
- * Set @count bits starting at bit @start_bit in the bitmap described by the
- * vfs inode @vi to @value, where @value is either 0 or 1.
- *
- * @is_rollback should always be 'false', it is for internal use to rollback
- * errors.  You probably want to use ntfs_bitmap_set_bits_in_run() instead.
- *
- * Return 0 on success and -errno on error.
- */
-int __ntfs_bitmap_set_bits_in_run(struct inode *vi, const s64 start_bit,
-               const s64 count, const u8 value, const bool is_rollback)
-{
-       s64 cnt = count;
-       pgoff_t index, end_index;
-       struct address_space *mapping;
-       struct page *page;
-       u8 *kaddr;
-       int pos, len;
-       u8 bit;
-
-       BUG_ON(!vi);
-       ntfs_debug("Entering for i_ino 0x%lx, start_bit 0x%llx, count 0x%llx, "
-                       "value %u.%s", vi->i_ino, (unsigned long long)start_bit,
-                       (unsigned long long)cnt, (unsigned int)value,
-                       is_rollback ? " (rollback)" : "");
-       BUG_ON(start_bit < 0);
-       BUG_ON(cnt < 0);
-       BUG_ON(value > 1);
-       /*
-        * Calculate the indices for the pages containing the first and last
-        * bits, i.e. @start_bit and @start_bit + @cnt - 1, respectively.
-        */
-       index = start_bit >> (3 + PAGE_SHIFT);
-       end_index = (start_bit + cnt - 1) >> (3 + PAGE_SHIFT);
-
-       /* Get the page containing the first bit (@start_bit). */
-       mapping = vi->i_mapping;
-       page = ntfs_map_page(mapping, index);
-       if (IS_ERR(page)) {
-               if (!is_rollback)
-                       ntfs_error(vi->i_sb, "Failed to map first page (error "
-                                       "%li), aborting.", PTR_ERR(page));
-               return PTR_ERR(page);
-       }
-       kaddr = page_address(page);
-
-       /* Set @pos to the position of the byte containing @start_bit. */
-       pos = (start_bit >> 3) & ~PAGE_MASK;
-
-       /* Calculate the position of @start_bit in the first byte. */
-       bit = start_bit & 7;
-
-       /* If the first byte is partial, modify the appropriate bits in it. */
-       if (bit) {
-               u8 *byte = kaddr + pos;
-               while ((bit & 7) && cnt) {
-                       cnt--;
-                       if (value)
-                               *byte |= 1 << bit++;
-                       else
-                               *byte &= ~(1 << bit++);
-               }
-               /* If we are done, unmap the page and return success. */
-               if (!cnt)
-                       goto done;
-
-               /* Update @pos to the new position. */
-               pos++;
-       }
-       /*
-        * Depending on @value, modify all remaining whole bytes in the page up
-        * to @cnt.
-        */
-       len = min_t(s64, cnt >> 3, PAGE_SIZE - pos);
-       memset(kaddr + pos, value ? 0xff : 0, len);
-       cnt -= len << 3;
-
-       /* Update @len to point to the first not-done byte in the page. */
-       if (cnt < 8)
-               len += pos;
-
-       /* If we are not in the last page, deal with all subsequent pages. */
-       while (index < end_index) {
-               BUG_ON(cnt <= 0);
-
-               /* Update @index and get the next page. */
-               flush_dcache_page(page);
-               set_page_dirty(page);
-               ntfs_unmap_page(page);
-               page = ntfs_map_page(mapping, ++index);
-               if (IS_ERR(page))
-                       goto rollback;
-               kaddr = page_address(page);
-               /*
-                * Depending on @value, modify all remaining whole bytes in the
-                * page up to @cnt.
-                */
-               len = min_t(s64, cnt >> 3, PAGE_SIZE);
-               memset(kaddr, value ? 0xff : 0, len);
-               cnt -= len << 3;
-       }
-       /*
-        * The currently mapped page is the last one.  If the last byte is
-        * partial, modify the appropriate bits in it.  Note, @len is the
-        * position of the last byte inside the page.
-        */
-       if (cnt) {
-               u8 *byte;
-
-               BUG_ON(cnt > 7);
-
-               bit = cnt;
-               byte = kaddr + len;
-               while (bit--) {
-                       if (value)
-                               *byte |= 1 << bit;
-                       else
-                               *byte &= ~(1 << bit);
-               }
-       }
-done:
-       /* We are done.  Unmap the page and return success. */
-       flush_dcache_page(page);
-       set_page_dirty(page);
-       ntfs_unmap_page(page);
-       ntfs_debug("Done.");
-       return 0;
-rollback:
-       /*
-        * Current state:
-        *      - no pages are mapped
-        *      - @count - @cnt is the number of bits that have been modified
-        */
-       if (is_rollback)
-               return PTR_ERR(page);
-       if (count != cnt)
-               pos = __ntfs_bitmap_set_bits_in_run(vi, start_bit, count - cnt,
-                               value ? 0 : 1, true);
-       else
-               pos = 0;
-       if (!pos) {
-               /* Rollback was successful. */
-               ntfs_error(vi->i_sb, "Failed to map subsequent page (error "
-                               "%li), aborting.", PTR_ERR(page));
-       } else {
-               /* Rollback failed. */
-               ntfs_error(vi->i_sb, "Failed to map subsequent page (error "
-                               "%li) and rollback failed (error %i).  "
-                               "Aborting and leaving inconsistent metadata.  "
-                               "Unmount and run chkdsk.", PTR_ERR(page), pos);
-               NVolSetErrors(NTFS_SB(vi->i_sb));
-       }
-       return PTR_ERR(page);
-}
-
-#endif /* NTFS_RW */
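Note: the index arithmetic at the top of __ntfs_bitmap_set_bits_in_run() packs eight bits per byte and PAGE_SIZE bytes per page, hence the "3 +" in the shifts. A standalone sketch of those computations, assuming 4096-byte pages (the SK_ names and the helper are illustrative):

#include <stdint.h>

#define SK_PAGE_SHIFT   12                      /* assume 4096-byte pages */
#define SK_PAGE_SIZE    (1UL << SK_PAGE_SHIFT)
#define SK_PAGE_MASK    (~(SK_PAGE_SIZE - 1))

static void bit_run_indices(int64_t start_bit, int64_t count,
                uint64_t *first_page, uint64_t *last_page,
                unsigned int *byte_in_page, unsigned int *bit_in_byte)
{
        /* A page holds 8 * PAGE_SIZE bits, hence the "3 +" shift. */
        *first_page = start_bit >> (3 + SK_PAGE_SHIFT);
        *last_page = (start_bit + count - 1) >> (3 + SK_PAGE_SHIFT);
        /* Byte containing start_bit, relative to the start of its page. */
        *byte_in_page = (start_bit >> 3) & ~SK_PAGE_MASK;
        *bit_in_byte = start_bit & 7;
}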
diff --git a/fs/ntfs/bitmap.h b/fs/ntfs/bitmap.h
deleted file mode 100644 (file)
index 9dd2224..0000000
+++ /dev/null
@@ -1,104 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * bitmap.h - Defines for NTFS kernel bitmap handling.  Part of the Linux-NTFS
- *           project.
- *
- * Copyright (c) 2004 Anton Altaparmakov
- */
-
-#ifndef _LINUX_NTFS_BITMAP_H
-#define _LINUX_NTFS_BITMAP_H
-
-#ifdef NTFS_RW
-
-#include <linux/fs.h>
-
-#include "types.h"
-
-extern int __ntfs_bitmap_set_bits_in_run(struct inode *vi, const s64 start_bit,
-               const s64 count, const u8 value, const bool is_rollback);
-
-/**
- * ntfs_bitmap_set_bits_in_run - set a run of bits in a bitmap to a value
- * @vi:                        vfs inode describing the bitmap
- * @start_bit:         first bit to set
- * @count:             number of bits to set
- * @value:             value to set the bits to (i.e. 0 or 1)
- *
- * Set @count bits starting at bit @start_bit in the bitmap described by the
- * vfs inode @vi to @value, where @value is either 0 or 1.
- *
- * Return 0 on success and -errno on error.
- */
-static inline int ntfs_bitmap_set_bits_in_run(struct inode *vi,
-               const s64 start_bit, const s64 count, const u8 value)
-{
-       return __ntfs_bitmap_set_bits_in_run(vi, start_bit, count, value,
-                       false);
-}
-
-/**
- * ntfs_bitmap_set_run - set a run of bits in a bitmap
- * @vi:                vfs inode describing the bitmap
- * @start_bit: first bit to set
- * @count:     number of bits to set
- *
- * Set @count bits starting at bit @start_bit in the bitmap described by the
- * vfs inode @vi.
- *
- * Return 0 on success and -errno on error.
- */
-static inline int ntfs_bitmap_set_run(struct inode *vi, const s64 start_bit,
-               const s64 count)
-{
-       return ntfs_bitmap_set_bits_in_run(vi, start_bit, count, 1);
-}
-
-/**
- * ntfs_bitmap_clear_run - clear a run of bits in a bitmap
- * @vi:                vfs inode describing the bitmap
- * @start_bit: first bit to clear
- * @count:     number of bits to clear
- *
- * Clear @count bits starting at bit @start_bit in the bitmap described by the
- * vfs inode @vi.
- *
- * Return 0 on success and -errno on error.
- */
-static inline int ntfs_bitmap_clear_run(struct inode *vi, const s64 start_bit,
-               const s64 count)
-{
-       return ntfs_bitmap_set_bits_in_run(vi, start_bit, count, 0);
-}
-
-/**
- * ntfs_bitmap_set_bit - set a bit in a bitmap
- * @vi:                vfs inode describing the bitmap
- * @bit:       bit to set
- *
- * Set bit @bit in the bitmap described by the vfs inode @vi.
- *
- * Return 0 on success and -errno on error.
- */
-static inline int ntfs_bitmap_set_bit(struct inode *vi, const s64 bit)
-{
-       return ntfs_bitmap_set_run(vi, bit, 1);
-}
-
-/**
- * ntfs_bitmap_clear_bit - clear a bit in a bitmap
- * @vi:                vfs inode describing the bitmap
- * @bit:       bit to clear
- *
- * Clear bit @bit in the bitmap described by the vfs inode @vi.
- *
- * Return 0 on success and -errno on error.
- */
-static inline int ntfs_bitmap_clear_bit(struct inode *vi, const s64 bit)
-{
-       return ntfs_bitmap_clear_run(vi, bit, 1);
-}
-
-#endif /* NTFS_RW */
-
-#endif /* defined _LINUX_NTFS_BITMAP_H */
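Note: a hypothetical caller of the wrappers above, marking a run of clusters allocated in a bitmap inode (for example the volume's cluster bitmap, which tracks one bit per cluster); the helper name is illustrative:

/* Hypothetical helper; ntfs_bitmap_set_run() does the real work. */
static int mark_clusters_allocated(struct inode *bmp_vi, s64 first_bit,
                s64 nr_bits)
{
        int err = ntfs_bitmap_set_run(bmp_vi, first_bit, nr_bits);

        if (err)
                ntfs_error(bmp_vi->i_sb, "Failed to set bitmap run (error %i).",
                                err);
        return err;
}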
diff --git a/fs/ntfs/collate.c b/fs/ntfs/collate.c
deleted file mode 100644 (file)
index 3ab6ec9..0000000
+++ /dev/null
@@ -1,110 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * collate.c - NTFS kernel collation handling.  Part of the Linux-NTFS project.
- *
- * Copyright (c) 2004 Anton Altaparmakov
- */
-
-#include "collate.h"
-#include "debug.h"
-#include "ntfs.h"
-
-static int ntfs_collate_binary(ntfs_volume *vol,
-               const void *data1, const int data1_len,
-               const void *data2, const int data2_len)
-{
-       int rc;
-
-       ntfs_debug("Entering.");
-       rc = memcmp(data1, data2, min(data1_len, data2_len));
-       if (!rc && (data1_len != data2_len)) {
-               if (data1_len < data2_len)
-                       rc = -1;
-               else
-                       rc = 1;
-       }
-       ntfs_debug("Done, returning %i", rc);
-       return rc;
-}
-
-static int ntfs_collate_ntofs_ulong(ntfs_volume *vol,
-               const void *data1, const int data1_len,
-               const void *data2, const int data2_len)
-{
-       int rc;
-       u32 d1, d2;
-
-       ntfs_debug("Entering.");
-       /* FIXME: We don't really want to BUG() here. */
-       BUG_ON(data1_len != data2_len);
-       BUG_ON(data1_len != 4);
-       d1 = le32_to_cpup(data1);
-       d2 = le32_to_cpup(data2);
-       if (d1 < d2)
-               rc = -1;
-       else {
-               if (d1 == d2)
-                       rc = 0;
-               else
-                       rc = 1;
-       }
-       ntfs_debug("Done, returning %i", rc);
-       return rc;
-}
-
-typedef int (*ntfs_collate_func_t)(ntfs_volume *, const void *, const int,
-               const void *, const int);
-
-static ntfs_collate_func_t ntfs_do_collate0x0[3] = {
-       ntfs_collate_binary,
-       NULL /* ntfs_collate_file_name */,
-       NULL /* ntfs_collate_unicode_string */,
-};
-
-static ntfs_collate_func_t ntfs_do_collate0x1[4] = {
-       ntfs_collate_ntofs_ulong,
-       NULL /* ntfs_collate_ntofs_sid */,
-       NULL /* ntfs_collate_ntofs_security_hash */,
-       NULL /* ntfs_collate_ntofs_ulongs */,
-};
-
-/**
- * ntfs_collate - collate two data items using a specified collation rule
- * @vol:       ntfs volume to which the data items belong
- * @cr:                collation rule to use when comparing the items
- * @data1:     first data item to collate
- * @data1_len: length in bytes of @data1
- * @data2:     second data item to collate
- * @data2_len: length in bytes of @data2
- *
- * Collate the two data items @data1 and @data2 using the collation rule @cr
- * and return -1, 0, or 1 if @data1 is found, respectively, to collate before,
- * to match, or to collate after @data2.
- *
- * For speed we use the collation rule @cr as an index into two tables of
- * function pointers to call the appropriate collation function.
- */
-int ntfs_collate(ntfs_volume *vol, COLLATION_RULE cr,
-               const void *data1, const int data1_len,
-               const void *data2, const int data2_len)
-{
-       int i;
-
-       ntfs_debug("Entering.");
-       /*
-        * FIXME:  At the moment we only support COLLATION_BINARY and
-        * COLLATION_NTOFS_ULONG, so we BUG() for everything else for now.
-        */
-       BUG_ON(cr != COLLATION_BINARY && cr != COLLATION_NTOFS_ULONG);
-       i = le32_to_cpu(cr);
-       BUG_ON(i < 0);
-       if (i <= 0x02)
-               return ntfs_do_collate0x0[i](vol, data1, data1_len,
-                               data2, data2_len);
-       BUG_ON(i < 0x10);
-       i -= 0x10;
-       if (likely(i <= 3))
-               return ntfs_do_collate0x1[i](vol, data1, data1_len,
-                               data2, data2_len);
-       BUG();
-       return 0;
-}
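Note: the dispatch above is plain index arithmetic: rules 0x00-0x02 index ntfs_do_collate0x0[] directly, while rules 0x10-0x13 are rebased by 0x10 and index ntfs_do_collate0x1[]. A sketch of that mapping, returning the table and slot instead of calling through (names illustrative):

#include <stdint.h>

/* Returns the slot, with *table set to 0 or 1; -1 if unsupported. */
static int collate_table_slot(uint32_t cr_cpu, int *table)
{
        if (cr_cpu <= 0x02) {
                *table = 0;
                return (int)cr_cpu;
        }
        if (cr_cpu >= 0x10 && cr_cpu <= 0x13) {
                *table = 1;
                return (int)(cr_cpu - 0x10);
        }
        return -1;
}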
diff --git a/fs/ntfs/collate.h b/fs/ntfs/collate.h
deleted file mode 100644 (file)
index f225561..0000000
+++ /dev/null
@@ -1,36 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * collate.h - Defines for NTFS kernel collation handling.  Part of the
- *            Linux-NTFS project.
- *
- * Copyright (c) 2004 Anton Altaparmakov
- */
-
-#ifndef _LINUX_NTFS_COLLATE_H
-#define _LINUX_NTFS_COLLATE_H
-
-#include "types.h"
-#include "volume.h"
-
-static inline bool ntfs_is_collation_rule_supported(COLLATION_RULE cr)
-{
-       int i;
-
-       /*
-        * FIXME:  At the moment we only support COLLATION_BINARY and
-        * COLLATION_NTOFS_ULONG, so we return false for everything else for
-        * now.
-        */
-       if (unlikely(cr != COLLATION_BINARY && cr != COLLATION_NTOFS_ULONG))
-               return false;
-       i = le32_to_cpu(cr);
-       if (likely(((i >= 0) && (i <= 0x02)) ||
-                       ((i >= 0x10) && (i <= 0x13))))
-               return true;
-       return false;
-}
-
-extern int ntfs_collate(ntfs_volume *vol, COLLATION_RULE cr,
-               const void *data1, const int data1_len,
-               const void *data2, const int data2_len);
-
-#endif /* _LINUX_NTFS_COLLATE_H */
diff --git a/fs/ntfs/compress.c b/fs/ntfs/compress.c
deleted file mode 100644 (file)
index 761aaa0..0000000
+++ /dev/null
@@ -1,950 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * compress.c - NTFS kernel compressed attributes handling.
- *             Part of the Linux-NTFS project.
- *
- * Copyright (c) 2001-2004 Anton Altaparmakov
- * Copyright (c) 2002 Richard Russon
- */
-
-#include <linux/fs.h>
-#include <linux/buffer_head.h>
-#include <linux/blkdev.h>
-#include <linux/vmalloc.h>
-#include <linux/slab.h>
-
-#include "attrib.h"
-#include "inode.h"
-#include "debug.h"
-#include "ntfs.h"
-
-/**
- * ntfs_compression_constants - enum of constants used in the compression code
- */
-typedef enum {
-       /* Token types and access mask. */
-       NTFS_SYMBOL_TOKEN       =       0,
-       NTFS_PHRASE_TOKEN       =       1,
-       NTFS_TOKEN_MASK         =       1,
-
-       /* Compression sub-block constants. */
-       NTFS_SB_SIZE_MASK       =       0x0fff,
-       NTFS_SB_SIZE            =       0x1000,
-       NTFS_SB_IS_COMPRESSED   =       0x8000,
-
-       /*
-        * The maximum compression block size is by definition 16 * the cluster
-        * size, with the maximum supported cluster size being 4kiB. Thus the
-        * maximum compression buffer size is 64kiB, so we use this when
-        * initializing the compression buffer.
-        */
-       NTFS_MAX_CB_SIZE        = 64 * 1024,
-} ntfs_compression_constants;
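Note: the decompressor below reads each sub-block through a 16-bit little-endian header: bit 15 (NTFS_SB_IS_COMPRESSED) flags a compressed sub-block, the low 12 bits (NTFS_SB_SIZE_MASK) hold the payload length minus one, so the whole on-disk sub-block including its 2-byte header spans (hdr & NTFS_SB_SIZE_MASK) + 3 bytes, and a zero header terminates the compression block. A small sketch of that decoding (struct and function names illustrative):

#include <stdbool.h>
#include <stdint.h>

struct sb_header_sketch {
        uint32_t total_bytes;   /* header + payload, as consumed from the cb */
        bool compressed;
        bool terminator;        /* hdr == 0 ends the compression block */
};

/* @hdr_cpu: the 16-bit sub-block header, already in CPU byte order. */
static struct sb_header_sketch parse_sb_header(uint16_t hdr_cpu)
{
        struct sb_header_sketch h;

        h.terminator = (hdr_cpu == 0);
        h.compressed = (hdr_cpu & 0x8000) != 0; /* NTFS_SB_IS_COMPRESSED */
        h.total_bytes = (hdr_cpu & 0x0fff) + 3; /* NTFS_SB_SIZE_MASK */
        return h;
}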
-
-/*
- * ntfs_compression_buffer - one buffer for the decompression engine
- */
-static u8 *ntfs_compression_buffer;
-
-/*
- * ntfs_cb_lock - spinlock which protects ntfs_compression_buffer
- */
-static DEFINE_SPINLOCK(ntfs_cb_lock);
-
-/**
- * allocate_compression_buffers - allocate the decompression buffers
- *
- * Caller has to hold the ntfs_lock mutex.
- *
- * Return 0 on success or -ENOMEM if the allocations failed.
- */
-int allocate_compression_buffers(void)
-{
-       BUG_ON(ntfs_compression_buffer);
-
-       ntfs_compression_buffer = vmalloc(NTFS_MAX_CB_SIZE);
-       if (!ntfs_compression_buffer)
-               return -ENOMEM;
-       return 0;
-}
-
-/**
- * free_compression_buffers - free the decompression buffers
- *
- * Caller has to hold the ntfs_lock mutex.
- */
-void free_compression_buffers(void)
-{
-       BUG_ON(!ntfs_compression_buffer);
-       vfree(ntfs_compression_buffer);
-       ntfs_compression_buffer = NULL;
-}
-
-/**
- * zero_partial_compressed_page - zero out of bounds compressed page region
- */
-static void zero_partial_compressed_page(struct page *page,
-               const s64 initialized_size)
-{
-       u8 *kp = page_address(page);
-       unsigned int kp_ofs;
-
-       ntfs_debug("Zeroing page region outside initialized size.");
-       if (((s64)page->index << PAGE_SHIFT) >= initialized_size) {
-               clear_page(kp);
-               return;
-       }
-       kp_ofs = initialized_size & ~PAGE_MASK;
-       memset(kp + kp_ofs, 0, PAGE_SIZE - kp_ofs);
-       return;
-}
-
-/**
- * handle_bounds_compressed_page - test for and handle an out of bounds compressed page
- */
-static inline void handle_bounds_compressed_page(struct page *page,
-               const loff_t i_size, const s64 initialized_size)
-{
-       if ((page->index >= (initialized_size >> PAGE_SHIFT)) &&
-                       (initialized_size < i_size))
-               zero_partial_compressed_page(page, initialized_size);
-       return;
-}
-
-/**
- * ntfs_decompress - decompress a compression block into an array of pages
- * @dest_pages:                destination array of pages
- * @completed_pages:   scratch space to track completed pages
- * @dest_index:                current index into @dest_pages (IN/OUT)
- * @dest_ofs:          current offset within @dest_pages[@dest_index] (IN/OUT)
- * @dest_max_index:    maximum index into @dest_pages (IN)
- * @dest_max_ofs:      maximum offset within @dest_pages[@dest_max_index] (IN)
- * @xpage:             the target page (-1 if none) (IN)
- * @xpage_done:                set to 1 if xpage was completed successfully (IN/OUT)
- * @cb_start:          compression block to decompress (IN)
- * @cb_size:           size of compression block @cb_start in bytes (IN)
- * @i_size:            file size when we started the read (IN)
- * @initialized_size:  initialized file size when we started the read (IN)
- *
- * The caller must have disabled preemption. ntfs_decompress() reenables it when
- * the critical section is finished.
- *
- * This decompresses the compression block @cb_start into the array of
- * destination pages @dest_pages starting at index @dest_index into @dest_pages
- * and at offset @dest_pos into the page @dest_pages[@dest_index].
- *
- * When the page @dest_pages[@xpage] is completed, @xpage_done is set to 1.
- * If xpage is -1 or @xpage has not been completed, @xpage_done is not modified.
- *
- * @cb_start is a pointer to the compression block which needs decompressing
- * and @cb_size is the size of @cb_start in bytes (8-64kiB).
- *
- * Return 0 if success or -EOVERFLOW on error in the compressed stream.
- * @xpage_done indicates whether the target page (@dest_pages[@xpage]) was
- * completed during the decompression of the compression block (@cb_start).
- *
- * Warning: This function *REQUIRES* PAGE_SIZE >= 4096 or it will blow up
- * unpredictably! You have been warned!
- *
- * Note to hackers: This function may not sleep until it has finished accessing
- * the compression block @cb_start, as the buffer is shared and only guarded
- * by the ntfs_cb_lock spinlock, which is held for the duration.
- */
-static int ntfs_decompress(struct page *dest_pages[], int completed_pages[],
-               int *dest_index, int *dest_ofs, const int dest_max_index,
-               const int dest_max_ofs, const int xpage, char *xpage_done,
-               u8 *const cb_start, const u32 cb_size, const loff_t i_size,
-               const s64 initialized_size)
-{
-       /*
-        * Pointers into the compressed data, i.e. the compression block (cb),
-        * and the therein contained sub-blocks (sb).
-        */
-       u8 *cb_end = cb_start + cb_size; /* End of cb. */
-       u8 *cb = cb_start;      /* Current position in cb. */
-       u8 *cb_sb_start;        /* Beginning of the current sb in the cb. */
-       u8 *cb_sb_end;          /* End of current sb / beginning of next sb. */
-
-       /* Variables for uncompressed data / destination. */
-       struct page *dp;        /* Current destination page being worked on. */
-       u8 *dp_addr;            /* Current pointer into dp. */
-       u8 *dp_sb_start;        /* Start of current sub-block in dp. */
-       u8 *dp_sb_end;          /* End of current sb in dp (dp_sb_start +
-                                  NTFS_SB_SIZE). */
-       u16 do_sb_start;        /* @dest_ofs when starting this sub-block. */
-       u16 do_sb_end;          /* @dest_ofs of end of this sb (do_sb_start +
-                                  NTFS_SB_SIZE). */
-
-       /* Variables for tag and token parsing. */
-       u8 tag;                 /* Current tag. */
-       int token;              /* Loop counter for the eight tokens in tag. */
-       int nr_completed_pages = 0;
-
-       /* Default error code. */
-       int err = -EOVERFLOW;
-
-       ntfs_debug("Entering, cb_size = 0x%x.", cb_size);
-do_next_sb:
-       ntfs_debug("Beginning sub-block at offset = 0x%zx in the cb.",
-                       cb - cb_start);
-       /*
-        * Have we reached the end of the compression block or the end of the
-        * decompressed data?  The latter can happen for example if the current
-        * position in the compression block is one byte before its end so the
-        * first two checks do not detect it.
-        */
-       if (cb == cb_end || !le16_to_cpup((le16*)cb) ||
-                       (*dest_index == dest_max_index &&
-                       *dest_ofs == dest_max_ofs)) {
-               int i;
-
-               ntfs_debug("Completed. Returning success (0).");
-               err = 0;
-return_error:
-               /* We can sleep from now on, so we drop lock. */
-               spin_unlock(&ntfs_cb_lock);
-               /* Second stage: finalize completed pages. */
-               if (nr_completed_pages > 0) {
-                       for (i = 0; i < nr_completed_pages; i++) {
-                               int di = completed_pages[i];
-
-                               dp = dest_pages[di];
-                               /*
-                                * If we are outside the initialized size, zero
-                                * the out of bounds page range.
-                                */
-                               handle_bounds_compressed_page(dp, i_size,
-                                               initialized_size);
-                               flush_dcache_page(dp);
-                               kunmap(dp);
-                               SetPageUptodate(dp);
-                               unlock_page(dp);
-                               if (di == xpage)
-                                       *xpage_done = 1;
-                               else
-                                       put_page(dp);
-                               dest_pages[di] = NULL;
-                       }
-               }
-               return err;
-       }
-
-       /* Setup offsets for the current sub-block destination. */
-       do_sb_start = *dest_ofs;
-       do_sb_end = do_sb_start + NTFS_SB_SIZE;
-
-       /* Check that we are still within allowed boundaries. */
-       if (*dest_index == dest_max_index && do_sb_end > dest_max_ofs)
-               goto return_overflow;
-
-       /* Does the minimum size of a compressed sb overflow valid range? */
-       if (cb + 6 > cb_end)
-               goto return_overflow;
-
-       /* Setup the current sub-block source pointers and validate range. */
-       cb_sb_start = cb;
-       cb_sb_end = cb_sb_start + (le16_to_cpup((le16*)cb) & NTFS_SB_SIZE_MASK)
-                       + 3;
-       if (cb_sb_end > cb_end)
-               goto return_overflow;
-
-       /* Get the current destination page. */
-       dp = dest_pages[*dest_index];
-       if (!dp) {
-               /* No page present. Skip decompression of this sub-block. */
-               cb = cb_sb_end;
-
-               /* Advance destination position to next sub-block. */
-               *dest_ofs = (*dest_ofs + NTFS_SB_SIZE) & ~PAGE_MASK;
-               if (!*dest_ofs && (++*dest_index > dest_max_index))
-                       goto return_overflow;
-               goto do_next_sb;
-       }
-
-       /* We have a valid destination page. Setup the destination pointers. */
-       dp_addr = (u8*)page_address(dp) + do_sb_start;
-
-       /* Now, we are ready to process the current sub-block (sb). */
-       if (!(le16_to_cpup((le16*)cb) & NTFS_SB_IS_COMPRESSED)) {
-               ntfs_debug("Found uncompressed sub-block.");
-               /* This sb is not compressed, just copy it into destination. */
-
-               /* Advance source position to first data byte. */
-               cb += 2;
-
-               /* An uncompressed sb must be full size. */
-               if (cb_sb_end - cb != NTFS_SB_SIZE)
-                       goto return_overflow;
-
-               /* Copy the block and advance the source position. */
-               memcpy(dp_addr, cb, NTFS_SB_SIZE);
-               cb += NTFS_SB_SIZE;
-
-               /* Advance destination position to next sub-block. */
-               *dest_ofs += NTFS_SB_SIZE;
-               if (!(*dest_ofs &= ~PAGE_MASK)) {
-finalize_page:
-                       /*
-                        * First stage: add current page index to array of
-                        * completed pages.
-                        */
-                       completed_pages[nr_completed_pages++] = *dest_index;
-                       if (++*dest_index > dest_max_index)
-                               goto return_overflow;
-               }
-               goto do_next_sb;
-       }
-       ntfs_debug("Found compressed sub-block.");
-       /* This sb is compressed, decompress it into destination. */
-
-       /* Setup destination pointers. */
-       dp_sb_start = dp_addr;
-       dp_sb_end = dp_sb_start + NTFS_SB_SIZE;
-
-       /* Forward to the first tag in the sub-block. */
-       cb += 2;
-do_next_tag:
-       if (cb == cb_sb_end) {
-               /* Check if the decompressed sub-block was not full-length. */
-               if (dp_addr < dp_sb_end) {
-                       int nr_bytes = do_sb_end - *dest_ofs;
-
-                       ntfs_debug("Filling incomplete sub-block with "
-                                       "zeroes.");
-                       /* Zero remainder and update destination position. */
-                       memset(dp_addr, 0, nr_bytes);
-                       *dest_ofs += nr_bytes;
-               }
-               /* We have finished the current sub-block. */
-               if (!(*dest_ofs &= ~PAGE_MASK))
-                       goto finalize_page;
-               goto do_next_sb;
-       }
-
-       /* Check we are still in range. */
-       if (cb > cb_sb_end || dp_addr > dp_sb_end)
-               goto return_overflow;
-
-       /* Get the next tag and advance to first token. */
-       tag = *cb++;
-
-       /* Parse the eight tokens described by the tag. */
-       for (token = 0; token < 8; token++, tag >>= 1) {
-               u16 lg, pt, length, max_non_overlap;
-               register u16 i;
-               u8 *dp_back_addr;
-
-               /* Check if we are done / still in range. */
-               if (cb >= cb_sb_end || dp_addr > dp_sb_end)
-                       break;
-
-               /* Determine token type and parse appropriately.*/
-               if ((tag & NTFS_TOKEN_MASK) == NTFS_SYMBOL_TOKEN) {
-                       /*
-                        * We have a symbol token, copy the symbol across, and
-                        * advance the source and destination positions.
-                        */
-                       *dp_addr++ = *cb++;
-                       ++*dest_ofs;
-
-                       /* Continue with the next token. */
-                       continue;
-               }
-
-               /*
-                * We have a phrase token. Make sure it is not the first tag in
-                * the sb as this is illegal and would confuse the code below.
-                */
-               if (dp_addr == dp_sb_start)
-                       goto return_overflow;
-
-               /*
-                * Determine the number of bytes to go back (p) and the number
-                * of bytes to copy (l). We use an optimized algorithm in which
-                * we first calculate log2(current destination position in sb),
-                * which allows determination of l and p in O(1) rather than
-                * O(n). We just need an arch-optimized log2() function now.
-                */
-               lg = 0;
-               for (i = *dest_ofs - do_sb_start - 1; i >= 0x10; i >>= 1)
-                       lg++;
-
-               /* Get the phrase token into i. */
-               pt = le16_to_cpup((le16*)cb);
-
-               /*
-                * Calculate starting position of the byte sequence in
-                * the destination using the fact that p = (pt >> (12 - lg)) + 1
-                * and make sure we don't go too far back.
-                */
-               dp_back_addr = dp_addr - (pt >> (12 - lg)) - 1;
-               if (dp_back_addr < dp_sb_start)
-                       goto return_overflow;
-
-               /* Now calculate the length of the byte sequence. */
-               length = (pt & (0xfff >> lg)) + 3;
-
-               /* Advance destination position and verify it is in range. */
-               *dest_ofs += length;
-               if (*dest_ofs > do_sb_end)
-                       goto return_overflow;
-
-               /* The number of non-overlapping bytes. */
-               max_non_overlap = dp_addr - dp_back_addr;
-
-               if (length <= max_non_overlap) {
-                       /* The byte sequence doesn't overlap, just copy it. */
-                       memcpy(dp_addr, dp_back_addr, length);
-
-                       /* Advance destination pointer. */
-                       dp_addr += length;
-               } else {
-                       /*
-                        * The byte sequence does overlap, copy non-overlapping
-                        * part and then do a slow byte by byte copy for the
-                        * overlapping part. Also, advance the destination
-                        * pointer.
-                        */
-                       memcpy(dp_addr, dp_back_addr, max_non_overlap);
-                       dp_addr += max_non_overlap;
-                       dp_back_addr += max_non_overlap;
-                       length -= max_non_overlap;
-                       while (length--)
-                               *dp_addr++ = *dp_back_addr++;
-               }
-
-               /* Advance source position and continue with the next token. */
-               cb += 2;
-       }
-
-       /* No tokens left in the current tag. Continue with the next tag. */
-       goto do_next_tag;
-
-return_overflow:
-       ntfs_error(NULL, "Failed. Returning -EOVERFLOW.");
-       goto return_error;
-}
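Note: the phrase-token split above adapts the 12/4 bit budget of the 16-bit token to the current position in the sub-block: lg counts how many right shifts bring (bytes already emitted - 1) below 0x10, the back distance is then p = (pt >> (12 - lg)) + 1 and the copy length l = (pt & (0xfff >> lg)) + 3. Isolated as a standalone sketch (function name illustrative):

#include <stdint.h>

/* @pos_in_sb: bytes already emitted in this sub-block; must be >= 1,
 * since a phrase token is never the first token of a sub-block. */
static void split_phrase_token(uint16_t pt, unsigned int pos_in_sb,
                unsigned int *back, unsigned int *len)
{
        unsigned int lg = 0, i;

        for (i = pos_in_sb - 1; i >= 0x10; i >>= 1)
                lg++;
        *back = (pt >> (12 - lg)) + 1;
        *len = (pt & (0xfff >> lg)) + 3;
}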
-
-/**
- * ntfs_read_compressed_block - read a compressed block into the page cache
- * @page:      locked page in the compression block(s) we need to read
- *
- * When we are called the page has already been verified to be locked and the
- * attribute is known to be non-resident, not encrypted, but compressed.
- *
- * 1. Determine which compression block(s) @page is in.
- * 2. Get hold of all pages corresponding to this/these compression block(s).
- * 3. Read the (first) compression block.
- * 4. Decompress it into the corresponding pages.
- * 5. Throw the compressed data away and proceed to 3. for the next compression
- *    block or return success if no more compression blocks left.
- *
- * Warning: We have to be careful what we do about existing pages. They might
- * have been written to so that we would lose data if we were to just overwrite
- * them with the out-of-date uncompressed data.
- *
- * FIXME: For PAGE_SIZE > cb_size we are not doing the Right Thing(TM) at
- * the end of the file I think. We need to detect this case and zero the out
- * of bounds remainder of the page in question and mark it as handled. At the
- * moment we would just return -EIO on such a page. This bug will only become
- * apparent if pages are above 8kiB and the NTFS volume only uses 512 byte
- * clusters so is probably not going to be seen by anyone. Still this should
- * be fixed. (AIA)
- *
- * FIXME: Again for PAGE_SIZE > cb_size we are screwing up both in
- * handling sparse and compressed cbs. (AIA)
- *
- * FIXME: At the moment we don't do any zeroing out in the case that
- * initialized_size is less than data_size. This should be safe because of the
- * nature of the compression algorithm used. Just in case we check and output
- * an error message in read inode if the two sizes are not equal for a
- * compressed file. (AIA)
- */
-int ntfs_read_compressed_block(struct page *page)
-{
-       loff_t i_size;
-       s64 initialized_size;
-       struct address_space *mapping = page->mapping;
-       ntfs_inode *ni = NTFS_I(mapping->host);
-       ntfs_volume *vol = ni->vol;
-       struct super_block *sb = vol->sb;
-       runlist_element *rl;
-       unsigned long flags, block_size = sb->s_blocksize;
-       unsigned char block_size_bits = sb->s_blocksize_bits;
-       u8 *cb, *cb_pos, *cb_end;
-       struct buffer_head **bhs;
-       unsigned long offset, index = page->index;
-       u32 cb_size = ni->itype.compressed.block_size;
-       u64 cb_size_mask = cb_size - 1UL;
-       VCN vcn;
-       LCN lcn;
-       /* The first wanted vcn (minimum alignment is PAGE_SIZE). */
-       VCN start_vcn = (((s64)index << PAGE_SHIFT) & ~cb_size_mask) >>
-                       vol->cluster_size_bits;
-       /*
-        * The first vcn after the last wanted vcn (minimum alignment is again
-        * PAGE_SIZE).
-        */
-       VCN end_vcn = ((((s64)(index + 1UL) << PAGE_SHIFT) + cb_size - 1)
-                       & ~cb_size_mask) >> vol->cluster_size_bits;
-       /* Number of compression blocks (cbs) in the wanted vcn range. */
-       unsigned int nr_cbs = (end_vcn - start_vcn) << vol->cluster_size_bits
-                       >> ni->itype.compressed.block_size_bits;
-       /*
-        * Number of pages required to store the uncompressed data from all
-        * compression blocks (cbs) overlapping @page. Due to alignment
-        * guarantees of start_vcn and end_vcn, no need to round up here.
-        */
-       unsigned int nr_pages = (end_vcn - start_vcn) <<
-                       vol->cluster_size_bits >> PAGE_SHIFT;
-       unsigned int xpage, max_page, cur_page, cur_ofs, i;
-       unsigned int cb_clusters, cb_max_ofs;
-       int block, max_block, cb_max_page, bhs_size, nr_bhs, err = 0;
-       struct page **pages;
-       int *completed_pages;
-       unsigned char xpage_done = 0;
-
-       ntfs_debug("Entering, page->index = 0x%lx, cb_size = 0x%x, nr_pages = "
-                       "%i.", index, cb_size, nr_pages);
-       /*
-        * Bad things happen if we get here for anything that is not an
-        * unnamed $DATA attribute.
-        */
-       BUG_ON(ni->type != AT_DATA);
-       BUG_ON(ni->name_len);
-
-       pages = kmalloc_array(nr_pages, sizeof(struct page *), GFP_NOFS);
-       completed_pages = kmalloc_array(nr_pages + 1, sizeof(int), GFP_NOFS);
-
-       /* Allocate memory to store the buffer heads we need. */
-       bhs_size = cb_size / block_size * sizeof(struct buffer_head *);
-       bhs = kmalloc(bhs_size, GFP_NOFS);
-
-       if (unlikely(!pages || !bhs || !completed_pages)) {
-               kfree(bhs);
-               kfree(pages);
-               kfree(completed_pages);
-               unlock_page(page);
-               ntfs_error(vol->sb, "Failed to allocate internal buffers.");
-               return -ENOMEM;
-       }
-
-       /*
-        * We have already been given one page, this is the one we must do.
-        * Once again, the alignment guarantees keep it simple.
-        */
-       offset = start_vcn << vol->cluster_size_bits >> PAGE_SHIFT;
-       xpage = index - offset;
-       pages[xpage] = page;
-       /*
-        * The remaining pages need to be allocated and inserted into the page
-        * cache, alignment guarantees keep all the below much simpler. (-8
-        */
-       read_lock_irqsave(&ni->size_lock, flags);
-       i_size = i_size_read(VFS_I(ni));
-       initialized_size = ni->initialized_size;
-       read_unlock_irqrestore(&ni->size_lock, flags);
-       max_page = ((i_size + PAGE_SIZE - 1) >> PAGE_SHIFT) -
-                       offset;
-       /* Is the page fully outside i_size? (truncate in progress) */
-       if (xpage >= max_page) {
-               kfree(bhs);
-               kfree(pages);
-               kfree(completed_pages);
-               zero_user(page, 0, PAGE_SIZE);
-               ntfs_debug("Compressed read outside i_size - truncated?");
-               SetPageUptodate(page);
-               unlock_page(page);
-               return 0;
-       }
-       if (nr_pages < max_page)
-               max_page = nr_pages;
-       for (i = 0; i < max_page; i++, offset++) {
-               if (i != xpage)
-                       pages[i] = grab_cache_page_nowait(mapping, offset);
-               page = pages[i];
-               if (page) {
-                       /*
-                        * We only (re)read the page if it isn't already read
-                        * in and/or dirty or we would be losing data or at
-                        * least wasting our time.
-                        */
-                       if (!PageDirty(page) && (!PageUptodate(page) ||
-                                       PageError(page))) {
-                               ClearPageError(page);
-                               kmap(page);
-                               continue;
-                       }
-                       unlock_page(page);
-                       put_page(page);
-                       pages[i] = NULL;
-               }
-       }
-
-       /*
-        * We have the runlist, and all the destination pages we need to fill.
-        * Now read the first compression block.
-        */
-       cur_page = 0;
-       cur_ofs = 0;
-       cb_clusters = ni->itype.compressed.block_clusters;
-do_next_cb:
-       nr_cbs--;
-       nr_bhs = 0;
-
-       /* Read all cb buffer heads one cluster at a time. */
-       rl = NULL;
-       for (vcn = start_vcn, start_vcn += cb_clusters; vcn < start_vcn;
-                       vcn++) {
-               bool is_retry = false;
-
-               if (!rl) {
-lock_retry_remap:
-                       down_read(&ni->runlist.lock);
-                       rl = ni->runlist.rl;
-               }
-               if (likely(rl != NULL)) {
-                       /* Seek to element containing target vcn. */
-                       while (rl->length && rl[1].vcn <= vcn)
-                               rl++;
-                       lcn = ntfs_rl_vcn_to_lcn(rl, vcn);
-               } else
-                       lcn = LCN_RL_NOT_MAPPED;
-               ntfs_debug("Reading vcn = 0x%llx, lcn = 0x%llx.",
-                               (unsigned long long)vcn,
-                               (unsigned long long)lcn);
-               if (lcn < 0) {
-                       /*
-                        * When we reach the first sparse cluster we have
-                        * finished with the cb.
-                        */
-                       if (lcn == LCN_HOLE)
-                               break;
-                       if (is_retry || lcn != LCN_RL_NOT_MAPPED)
-                               goto rl_err;
-                       is_retry = true;
-                       /*
-                        * Attempt to map runlist, dropping lock for the
-                        * duration.
-                        */
-                       up_read(&ni->runlist.lock);
-                       if (!ntfs_map_runlist(ni, vcn))
-                               goto lock_retry_remap;
-                       goto map_rl_err;
-               }
-               block = lcn << vol->cluster_size_bits >> block_size_bits;
-               /* Read the lcn from device in chunks of block_size bytes. */
-               max_block = block + (vol->cluster_size >> block_size_bits);
-               do {
-                       ntfs_debug("block = 0x%x.", block);
-                       if (unlikely(!(bhs[nr_bhs] = sb_getblk(sb, block))))
-                               goto getblk_err;
-                       nr_bhs++;
-               } while (++block < max_block);
-       }
-
-       /* Release the lock if we took it. */
-       if (rl)
-               up_read(&ni->runlist.lock);
-
-       /* Setup and initiate io on all buffer heads. */
-       for (i = 0; i < nr_bhs; i++) {
-               struct buffer_head *tbh = bhs[i];
-
-               if (!trylock_buffer(tbh))
-                       continue;
-               if (unlikely(buffer_uptodate(tbh))) {
-                       unlock_buffer(tbh);
-                       continue;
-               }
-               get_bh(tbh);
-               tbh->b_end_io = end_buffer_read_sync;
-               submit_bh(REQ_OP_READ, tbh);
-       }
-
-       /* Wait for io completion on all buffer heads. */
-       for (i = 0; i < nr_bhs; i++) {
-               struct buffer_head *tbh = bhs[i];
-
-               if (buffer_uptodate(tbh))
-                       continue;
-               wait_on_buffer(tbh);
-               /*
-                * We need an optimization barrier here, otherwise we start
-                * hitting the below fixup code when accessing a loopback
-                * mounted ntfs partition. This indicates either there is a
-                * race condition in the loop driver or, more likely, gcc
-                * overoptimises the code without the barrier and it doesn't
-                * do the Right Thing(TM).
-                */
-               barrier();
-               if (unlikely(!buffer_uptodate(tbh))) {
-                       ntfs_warning(vol->sb, "Buffer is unlocked but not "
-                                       "uptodate! Unplugging the disk queue "
-                                       "and rescheduling.");
-                       get_bh(tbh);
-                       io_schedule();
-                       put_bh(tbh);
-                       if (unlikely(!buffer_uptodate(tbh)))
-                               goto read_err;
-                       ntfs_warning(vol->sb, "Buffer is now uptodate. Good.");
-               }
-       }
-
-       /*
-        * Get the compression buffer. We must not sleep any more
-        * until we are finished with it.
-        */
-       spin_lock(&ntfs_cb_lock);
-       cb = ntfs_compression_buffer;
-
-       BUG_ON(!cb);
-
-       cb_pos = cb;
-       cb_end = cb + cb_size;
-
-       /* Copy the buffer heads into the contiguous buffer. */
-       for (i = 0; i < nr_bhs; i++) {
-               memcpy(cb_pos, bhs[i]->b_data, block_size);
-               cb_pos += block_size;
-       }
-
-       /* Just a precaution. */
-       if (cb_pos + 2 <= cb + cb_size)
-               *(u16*)cb_pos = 0;
-
-       /* Reset cb_pos back to the beginning. */
-       cb_pos = cb;
-
-       /* We now have both source (if present) and destination. */
-       ntfs_debug("Successfully read the compression block.");
-
-       /* The last page and maximum offset within it for the current cb. */
-       cb_max_page = (cur_page << PAGE_SHIFT) + cur_ofs + cb_size;
-       cb_max_ofs = cb_max_page & ~PAGE_MASK;
-       cb_max_page >>= PAGE_SHIFT;
-
-       /* Catch end of file inside a compression block. */
-       if (cb_max_page > max_page)
-               cb_max_page = max_page;
-
-       if (vcn == start_vcn - cb_clusters) {
-               /* Sparse cb, zero out page range overlapping the cb. */
-               ntfs_debug("Found sparse compression block.");
-               /* We can sleep from now on, so we drop lock. */
-               spin_unlock(&ntfs_cb_lock);
-               if (cb_max_ofs)
-                       cb_max_page--;
-               for (; cur_page < cb_max_page; cur_page++) {
-                       page = pages[cur_page];
-                       if (page) {
-                               if (likely(!cur_ofs))
-                                       clear_page(page_address(page));
-                               else
-                                       memset(page_address(page) + cur_ofs, 0,
-                                                       PAGE_SIZE -
-                                                       cur_ofs);
-                               flush_dcache_page(page);
-                               kunmap(page);
-                               SetPageUptodate(page);
-                               unlock_page(page);
-                               if (cur_page == xpage)
-                                       xpage_done = 1;
-                               else
-                                       put_page(page);
-                               pages[cur_page] = NULL;
-                       }
-                       cb_pos += PAGE_SIZE - cur_ofs;
-                       cur_ofs = 0;
-                       if (cb_pos >= cb_end)
-                               break;
-               }
-               /* If we have a partial final page, deal with it now. */
-               if (cb_max_ofs && cb_pos < cb_end) {
-                       page = pages[cur_page];
-                       if (page)
-                               memset(page_address(page) + cur_ofs, 0,
-                                               cb_max_ofs - cur_ofs);
-                       /*
-                        * No need to update cb_pos at this stage:
-                        *      cb_pos += cb_max_ofs - cur_ofs;
-                        */
-                       cur_ofs = cb_max_ofs;
-               }
-       } else if (vcn == start_vcn) {
-               /* We can't sleep so we need two stages. */
-               unsigned int cur2_page = cur_page;
-               unsigned int cur_ofs2 = cur_ofs;
-               u8 *cb_pos2 = cb_pos;
-
-               ntfs_debug("Found uncompressed compression block.");
-               /* Uncompressed cb, copy it to the destination pages. */
-               /*
-                * TODO: As a big optimization, we could detect this case
-                * before we read all the pages and use block_read_full_folio()
-                * on all full pages instead (we still have to treat partial
-                * pages especially but at least we are getting rid of the
-                * synchronous io for the majority of pages).
-                * Or if we choose not to do the read-ahead/-behind stuff, we
-                * could just return block_read_full_folio(pages[xpage]) as long
-                * as PAGE_SIZE <= cb_size.
-                */
-               if (cb_max_ofs)
-                       cb_max_page--;
-               /* First stage: copy data into destination pages. */
-               for (; cur_page < cb_max_page; cur_page++) {
-                       page = pages[cur_page];
-                       if (page)
-                               memcpy(page_address(page) + cur_ofs, cb_pos,
-                                               PAGE_SIZE - cur_ofs);
-                       cb_pos += PAGE_SIZE - cur_ofs;
-                       cur_ofs = 0;
-                       if (cb_pos >= cb_end)
-                               break;
-               }
-               /* If we have a partial final page, deal with it now. */
-               if (cb_max_ofs && cb_pos < cb_end) {
-                       page = pages[cur_page];
-                       if (page)
-                               memcpy(page_address(page) + cur_ofs, cb_pos,
-                                               cb_max_ofs - cur_ofs);
-                       cb_pos += cb_max_ofs - cur_ofs;
-                       cur_ofs = cb_max_ofs;
-               }
-               /* We can sleep from now on, so drop lock. */
-               spin_unlock(&ntfs_cb_lock);
-               /* Second stage: finalize pages. */
-               for (; cur2_page < cb_max_page; cur2_page++) {
-                       page = pages[cur2_page];
-                       if (page) {
-                               /*
-                                * If we are outside the initialized size, zero
-                                * the out of bounds page range.
-                                */
-                               handle_bounds_compressed_page(page, i_size,
-                                               initialized_size);
-                               flush_dcache_page(page);
-                               kunmap(page);
-                               SetPageUptodate(page);
-                               unlock_page(page);
-                               if (cur2_page == xpage)
-                                       xpage_done = 1;
-                               else
-                                       put_page(page);
-                               pages[cur2_page] = NULL;
-                       }
-                       cb_pos2 += PAGE_SIZE - cur_ofs2;
-                       cur_ofs2 = 0;
-                       if (cb_pos2 >= cb_end)
-                               break;
-               }
-       } else {
-               /* Compressed cb, decompress it into the destination page(s). */
-               unsigned int prev_cur_page = cur_page;
-
-               ntfs_debug("Found compressed compression block.");
-               err = ntfs_decompress(pages, completed_pages, &cur_page,
-                               &cur_ofs, cb_max_page, cb_max_ofs, xpage,
-                               &xpage_done, cb_pos, cb_size - (cb_pos - cb),
-                               i_size, initialized_size);
-               /*
-                * We can sleep from now on, lock already dropped by
-                * ntfs_decompress().
-                */
-               if (err) {
-                       ntfs_error(vol->sb, "ntfs_decompress() failed in inode "
-                                       "0x%lx with error code %i. Skipping "
-                                       "this compression block.",
-                                       ni->mft_no, -err);
-                       /* Release the unfinished pages. */
-                       for (; prev_cur_page < cur_page; prev_cur_page++) {
-                               page = pages[prev_cur_page];
-                               if (page) {
-                                       flush_dcache_page(page);
-                                       kunmap(page);
-                                       unlock_page(page);
-                                       if (prev_cur_page != xpage)
-                                               put_page(page);
-                                       pages[prev_cur_page] = NULL;
-                               }
-                       }
-               }
-       }
-
-       /* Release the buffer heads. */
-       for (i = 0; i < nr_bhs; i++)
-               brelse(bhs[i]);
-
-       /* Do we have more work to do? */
-       if (nr_cbs)
-               goto do_next_cb;
-
-       /* We no longer need the list of buffer heads. */
-       kfree(bhs);
-
-       /* Clean up if we have any pages left. Should never happen. */
-       for (cur_page = 0; cur_page < max_page; cur_page++) {
-               page = pages[cur_page];
-               if (page) {
-                       ntfs_error(vol->sb, "Still have pages left! "
-                                       "Terminating them with extreme "
-                                       "prejudice.  Inode 0x%lx, page index "
-                                       "0x%lx.", ni->mft_no, page->index);
-                       flush_dcache_page(page);
-                       kunmap(page);
-                       unlock_page(page);
-                       if (cur_page != xpage)
-                               put_page(page);
-                       pages[cur_page] = NULL;
-               }
-       }
-
-       /* We no longer need the list of pages. */
-       kfree(pages);
-       kfree(completed_pages);
-
-       /* If we have completed the requested page, we return success. */
-       if (likely(xpage_done))
-               return 0;
-
-       ntfs_debug("Failed. Returning error code %s.", err == -EOVERFLOW ?
-                       "EOVERFLOW" : (!err ? "EIO" : "unknown error"));
-       return err < 0 ? err : -EIO;
-
-read_err:
-       ntfs_error(vol->sb, "IO error while reading compressed data.");
-       /* Release the buffer heads. */
-       for (i = 0; i < nr_bhs; i++)
-               brelse(bhs[i]);
-       goto err_out;
-
-map_rl_err:
-       ntfs_error(vol->sb, "ntfs_map_runlist() failed. Cannot read "
-                       "compression block.");
-       goto err_out;
-
-rl_err:
-       up_read(&ni->runlist.lock);
-       ntfs_error(vol->sb, "ntfs_rl_vcn_to_lcn() failed. Cannot read "
-                       "compression block.");
-       goto err_out;
-
-getblk_err:
-       up_read(&ni->runlist.lock);
-       ntfs_error(vol->sb, "getblk() failed. Cannot read compression block.");
-
-err_out:
-       kfree(bhs);
-       for (i = cur_page; i < max_page; i++) {
-               page = pages[i];
-               if (page) {
-                       flush_dcache_page(page);
-                       kunmap(page);
-                       unlock_page(page);
-                       if (i != xpage)
-                               put_page(page);
-               }
-       }
-       kfree(pages);
-       kfree(completed_pages);
-       return -EIO;
-}
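Note: the vcn window computed at the top of ntfs_read_compressed_block() simply aligns the byte range covered by the page down and up to whole compression blocks (cb_size is a power of two, so cb_size - 1 works as a mask). A sketch of that arithmetic, assuming PAGE_SHIFT == 12 for illustration (helper name illustrative):

#include <stdint.h>

static void cb_vcn_range(uint64_t page_index, uint32_t cb_size,
                unsigned int cluster_size_bits, int64_t *start_vcn,
                int64_t *end_vcn)
{
        const unsigned int page_shift = 12;     /* assumed PAGE_SHIFT */
        uint64_t mask = cb_size - 1ULL;

        /* First wanted vcn: page start, aligned down to a cb boundary. */
        *start_vcn = (int64_t)((page_index << page_shift) & ~mask) >>
                        cluster_size_bits;
        /* First vcn past the last wanted one: page end, aligned up. */
        *end_vcn = (int64_t)((((page_index + 1) << page_shift) + cb_size - 1) &
                        ~mask) >> cluster_size_bits;
}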
diff --git a/fs/ntfs/debug.c b/fs/ntfs/debug.c
deleted file mode 100644 (file)
index a3c1c56..0000000
+++ /dev/null
@@ -1,159 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * debug.c - NTFS kernel debug support. Part of the Linux-NTFS project.
- *
- * Copyright (c) 2001-2004 Anton Altaparmakov
- */
-#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-#include "debug.h"
-
-/**
- * __ntfs_warning - output a warning to the syslog
- * @function:  name of function outputting the warning
- * @sb:                super block of mounted ntfs filesystem
- * @fmt:       warning string containing format specifications
- * @...:       a variable number of arguments specified in @fmt
- *
- * Outputs a warning to the syslog for the mounted ntfs filesystem described
- * by @sb.
- *
- * @fmt and the corresponding @... is printf style format string containing
- * the warning string and the corresponding format arguments, respectively.
- *
- * @function is the name of the function from which __ntfs_warning is being
- * called.
- *
- * Note, you should be using debug.h::ntfs_warning(@sb, @fmt, @...) instead
- * as this provides the @function parameter automatically.
- */
-void __ntfs_warning(const char *function, const struct super_block *sb,
-               const char *fmt, ...)
-{
-       struct va_format vaf;
-       va_list args;
-       int flen = 0;
-
-#ifndef DEBUG
-       if (!printk_ratelimit())
-               return;
-#endif
-       if (function)
-               flen = strlen(function);
-       va_start(args, fmt);
-       vaf.fmt = fmt;
-       vaf.va = &args;
-       if (sb)
-               pr_warn("(device %s): %s(): %pV\n",
-                       sb->s_id, flen ? function : "", &vaf);
-       else
-               pr_warn("%s(): %pV\n", flen ? function : "", &vaf);
-       va_end(args);
-}
-
-/**
- * __ntfs_error - output an error to the syslog
- * @function:  name of function outputting the error
- * @sb:                super block of mounted ntfs filesystem
- * @fmt:       error string containing format specifications
- * @...:       a variable number of arguments specified in @fmt
- *
- * Outputs an error to the syslog for the mounted ntfs filesystem described
- * by @sb.
- *
- * @fmt and the corresponding @... is printf style format string containing
- * the error string and the corresponding format arguments, respectively.
- *
- * @function is the name of the function from which __ntfs_error is being
- * called.
- *
- * Note, you should be using debug.h::ntfs_error(@sb, @fmt, @...) instead
- * as this provides the @function parameter automatically.
- */
-void __ntfs_error(const char *function, const struct super_block *sb,
-               const char *fmt, ...)
-{
-       struct va_format vaf;
-       va_list args;
-       int flen = 0;
-
-#ifndef DEBUG
-       if (!printk_ratelimit())
-               return;
-#endif
-       if (function)
-               flen = strlen(function);
-       va_start(args, fmt);
-       vaf.fmt = fmt;
-       vaf.va = &args;
-       if (sb)
-               pr_err("(device %s): %s(): %pV\n",
-                      sb->s_id, flen ? function : "", &vaf);
-       else
-               pr_err("%s(): %pV\n", flen ? function : "", &vaf);
-       va_end(args);
-}
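Note: a hypothetical call site for the two helpers above, going through the debug.h wrappers so that __func__ is supplied automatically:

/* Hypothetical reporting helper; ntfs_error()/ntfs_warning() come from
 * debug.h and splice in __func__ for the caller. */
static void report_mount_status(struct super_block *sb, int err)
{
        if (err)
                ntfs_error(sb, "Failed to read $MFT (error %i).", err);
        else
                ntfs_warning(sb, "Volume is dirty, mounting read-only.");
}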
-
-#ifdef DEBUG
-
-/* If 1, output debug messages, and if 0, don't. */
-int debug_msgs = 0;
-
-void __ntfs_debug(const char *file, int line, const char *function,
-               const char *fmt, ...)
-{
-       struct va_format vaf;
-       va_list args;
-       int flen = 0;
-
-       if (!debug_msgs)
-               return;
-       if (function)
-               flen = strlen(function);
-       va_start(args, fmt);
-       vaf.fmt = fmt;
-       vaf.va = &args;
-       pr_debug("(%s, %d): %s(): %pV", file, line, flen ? function : "", &vaf);
-       va_end(args);
-}
-
-/* Dump a runlist. Caller has to provide synchronisation for @rl. */
-void ntfs_debug_dump_runlist(const runlist_element *rl)
-{
-       int i;
-       const char *lcn_str[4] = { "LCN_HOLE         ", "LCN_RL_NOT_MAPPED",
-                                  "LCN_ENOENT       ", "LCN_unknown      " };
-
-       if (!debug_msgs)
-               return;
-       pr_debug("Dumping runlist (values in hex):\n");
-       if (!rl) {
-               pr_debug("Run list not present.\n");
-               return;
-       }
-       pr_debug("VCN              LCN               Run length\n");
-       for (i = 0; ; i++) {
-               LCN lcn = (rl + i)->lcn;
-
-               if (lcn < (LCN)0) {
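-                       /*
-                        * Negative LCNs are the special out-of-band values
-                        * (LCN_HOLE == -1, LCN_RL_NOT_MAPPED == -2,
-                        * LCN_ENOENT == -3), matching lcn_str[0..2]; anything
-                        * more negative is clamped to "LCN_unknown" below.
-                        */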
-                       int index = -lcn - 1;
-
-                       if (index > -LCN_ENOENT - 1)
-                               index = 3;
-                       pr_debug("%-16Lx %s %-16Lx%s\n",
-                                       (long long)(rl + i)->vcn, lcn_str[index],
-                                       (long long)(rl + i)->length,
-                                       (rl + i)->length ? "" :
-                                               " (runlist end)");
-               } else
-                       pr_debug("%-16Lx %-16Lx  %-16Lx%s\n",
-                                       (long long)(rl + i)->vcn,
-                                       (long long)(rl + i)->lcn,
-                                       (long long)(rl + i)->length,
-                                       (rl + i)->length ? "" :
-                                               " (runlist end)");
-               if (!(rl + i)->length)
-                       break;
-       }
-}
-
-#endif
diff --git a/fs/ntfs/debug.h b/fs/ntfs/debug.h
deleted file mode 100644 (file)
index 6fdef38..0000000
+++ /dev/null
@@ -1,57 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * debug.h - NTFS kernel debug support. Part of the Linux-NTFS project.
- *
- * Copyright (c) 2001-2004 Anton Altaparmakov
- */
-
-#ifndef _LINUX_NTFS_DEBUG_H
-#define _LINUX_NTFS_DEBUG_H
-
-#include <linux/fs.h>
-
-#include "runlist.h"
-
-#ifdef DEBUG
-
-extern int debug_msgs;
-
-extern __printf(4, 5)
-void __ntfs_debug(const char *file, int line, const char *function,
-                 const char *format, ...);
-/**
- * ntfs_debug - write a debug level message to syslog
- * @f:         a printf format string containing the message
- * @...:       the variables to substitute into @f
- *
- * ntfs_debug() writes a DEBUG level message to the syslog but only if the
- * driver was compiled with -DDEBUG. Otherwise, the call turns into a NOP.
- */
-#define ntfs_debug(f, a...)                                            \
-       __ntfs_debug(__FILE__, __LINE__, __func__, f, ##a)
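-
-/*
- * Illustrative example (not from the original file): with the driver built
- * with -DDEBUG and debug messages enabled, a call such as
- *
- *	ntfs_debug("Entering for inode 0x%lx.", vi->i_ino);
- *
- * logs the file, line and function together with the formatted message;
- * "vi" is a placeholder for the caller's VFS inode.
- */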
-
-extern void ntfs_debug_dump_runlist(const runlist_element *rl);
-
-#else  /* !DEBUG */
-
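-/*
- * Note: the "if (0) no_printk(...)" body compiles away to nothing while still
- * letting the compiler type-check the printf-style arguments.
- */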
-#define ntfs_debug(fmt, ...)                                           \
-do {                                                                   \
-       if (0)                                                          \
-               no_printk(fmt, ##__VA_ARGS__);                          \
-} while (0)
-
-#define ntfs_debug_dump_runlist(rl)    do {} while (0)
-
-#endif /* !DEBUG */
-
-extern  __printf(3, 4)
-void __ntfs_warning(const char *function, const struct super_block *sb,
-                   const char *fmt, ...);
-#define ntfs_warning(sb, f, a...)      __ntfs_warning(__func__, sb, f, ##a)
-
-extern  __printf(3, 4)
-void __ntfs_error(const char *function, const struct super_block *sb,
-                 const char *fmt, ...);
-#define ntfs_error(sb, f, a...)                __ntfs_error(__func__, sb, f, ##a)
-
-#endif /* _LINUX_NTFS_DEBUG_H */
diff --git a/fs/ntfs/dir.c b/fs/ntfs/dir.c
deleted file mode 100644 (file)
index 629723a..0000000
+++ /dev/null
@@ -1,1540 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * dir.c - NTFS kernel directory operations. Part of the Linux-NTFS project.
- *
- * Copyright (c) 2001-2007 Anton Altaparmakov
- * Copyright (c) 2002 Richard Russon
- */
-
-#include <linux/buffer_head.h>
-#include <linux/slab.h>
-#include <linux/blkdev.h>
-
-#include "dir.h"
-#include "aops.h"
-#include "attrib.h"
-#include "mft.h"
-#include "debug.h"
-#include "ntfs.h"
-
-/*
- * The little endian Unicode string $I30 as a global constant.
- */
-ntfschar I30[5] = { cpu_to_le16('$'), cpu_to_le16('I'),
-               cpu_to_le16('3'),       cpu_to_le16('0'), 0 };
-
-/**
- * ntfs_lookup_inode_by_name - find an inode in a directory given its name
- * @dir_ni:    ntfs inode of the directory in which to search for the name
- * @uname:     Unicode name for which to search in the directory
- * @uname_len: length of the name @uname in Unicode characters
- * @res:       return the found file name if necessary (see below)
- *
- * Look for an inode with name @uname in the directory with inode @dir_ni.
- * ntfs_lookup_inode_by_name() walks the contents of the directory looking for
- * the Unicode name. If the name is found in the directory, the corresponding
- * inode number (>= 0) is returned as a mft reference in cpu format, i.e. it
- * is a 64-bit number containing the sequence number.
- *
- * On error, a negative value is returned corresponding to the error code. In
- * particular if the inode is not found -ENOENT is returned. Note that you
- * can't just check the return value for being negative; you have to test it
- * with IS_ERR_MREF(return value) and, on error, extract the error code using
- * MREF_ERR(return value).
- *
- * Note, @uname_len does not include the (optional) terminating NUL character.
- *
- * Note, we look for a case sensitive match first but we also look for a case
- * insensitive match at the same time. If we find a case insensitive match, we
- * save that for the case that we don't find an exact match, where we return
- * the case insensitive match and setup @res (which we allocate!) with the mft
- * reference, the file name type, length and with a copy of the little endian
- * Unicode file name itself. If we match a file name which is in the DOS name
- * space, we only return the mft reference and file name type in @res.
- * ntfs_lookup() then uses this to find the long file name in the inode itself.
- * This is to avoid polluting the dcache with short file names. We want them to
- * work but we don't care for how quickly one can access them. This also fixes
- * the dcache aliasing issues.
- *
- * Locking:  - Caller must hold i_mutex on the directory.
- *          - Each page cache page in the index allocation mapping must be
- *            locked whilst being accessed, otherwise we may find a corrupt
- *            page: ->writepage applies the mst protection fixups before
- *            writing the page out, removes them again once the write has
- *            completed, and only then unlocks the page.
- */
-MFT_REF ntfs_lookup_inode_by_name(ntfs_inode *dir_ni, const ntfschar *uname,
-               const int uname_len, ntfs_name **res)
-{
-       ntfs_volume *vol = dir_ni->vol;
-       struct super_block *sb = vol->sb;
-       MFT_RECORD *m;
-       INDEX_ROOT *ir;
-       INDEX_ENTRY *ie;
-       INDEX_ALLOCATION *ia;
-       u8 *index_end;
-       u64 mref;
-       ntfs_attr_search_ctx *ctx;
-       int err, rc;
-       VCN vcn, old_vcn;
-       struct address_space *ia_mapping;
-       struct page *page;
-       u8 *kaddr;
-       ntfs_name *name = NULL;
-
-       BUG_ON(!S_ISDIR(VFS_I(dir_ni)->i_mode));
-       BUG_ON(NInoAttr(dir_ni));
-       /* Get hold of the mft record for the directory. */
-       m = map_mft_record(dir_ni);
-       if (IS_ERR(m)) {
-               ntfs_error(sb, "map_mft_record() failed with error code %ld.",
-                               -PTR_ERR(m));
-               return ERR_MREF(PTR_ERR(m));
-       }
-       ctx = ntfs_attr_get_search_ctx(dir_ni, m);
-       if (unlikely(!ctx)) {
-               err = -ENOMEM;
-               goto err_out;
-       }
-       /* Find the index root attribute in the mft record. */
-       err = ntfs_attr_lookup(AT_INDEX_ROOT, I30, 4, CASE_SENSITIVE, 0, NULL,
-                       0, ctx);
-       if (unlikely(err)) {
-               if (err == -ENOENT) {
-                       ntfs_error(sb, "Index root attribute missing in "
-                                       "directory inode 0x%lx.",
-                                       dir_ni->mft_no);
-                       err = -EIO;
-               }
-               goto err_out;
-       }
-       /* Get to the index root value (it's been verified in read_inode). */
-       ir = (INDEX_ROOT*)((u8*)ctx->attr +
-                       le16_to_cpu(ctx->attr->data.resident.value_offset));
-       index_end = (u8*)&ir->index + le32_to_cpu(ir->index.index_length);
-       /* The first index entry. */
-       ie = (INDEX_ENTRY*)((u8*)&ir->index +
-                       le32_to_cpu(ir->index.entries_offset));
-       /*
-        * Loop until we exceed valid memory (corruption case) or until we
-        * reach the last entry.
-        */
-       for (;; ie = (INDEX_ENTRY*)((u8*)ie + le16_to_cpu(ie->length))) {
-               /* Bounds checks. */
-               if ((u8*)ie < (u8*)ctx->mrec || (u8*)ie +
-                               sizeof(INDEX_ENTRY_HEADER) > index_end ||
-                               (u8*)ie + le16_to_cpu(ie->key_length) >
-                               index_end)
-                       goto dir_err_out;
-               /*
-                * The last entry cannot contain a name. It can however contain
-                * a pointer to a child node in the B+tree so we just break out.
-                */
-               if (ie->flags & INDEX_ENTRY_END)
-                       break;
-               /*
-                * We perform a case sensitive comparison and if that matches
-                * we are done and return the mft reference of the inode (i.e.
-                * the inode number together with the sequence number for
-                * consistency checking). We convert it to cpu format before
-                * returning.
-                */
-               if (ntfs_are_names_equal(uname, uname_len,
-                               (ntfschar*)&ie->key.file_name.file_name,
-                               ie->key.file_name.file_name_length,
-                               CASE_SENSITIVE, vol->upcase, vol->upcase_len)) {
-found_it:
-                       /*
-                        * We have a perfect match, so we don't need to care
-                        * about having matched imperfectly before, so we can
-                        * free name and set *res to NULL.
-                        * However, if the perfect match is a short file name,
-                        * we need to signal this through *res, so that
-                        * ntfs_lookup() can fix dcache aliasing issues.
-                        * As an optimization we just reuse an existing
-                        * allocation of *res.
-                        */
-                       if (ie->key.file_name.file_name_type == FILE_NAME_DOS) {
-                               if (!name) {
-                                       name = kmalloc(sizeof(ntfs_name),
-                                                       GFP_NOFS);
-                                       if (!name) {
-                                               err = -ENOMEM;
-                                               goto err_out;
-                                       }
-                               }
-                               name->mref = le64_to_cpu(
-                                               ie->data.dir.indexed_file);
-                               name->type = FILE_NAME_DOS;
-                               name->len = 0;
-                               *res = name;
-                       } else {
-                               kfree(name);
-                               *res = NULL;
-                       }
-                       mref = le64_to_cpu(ie->data.dir.indexed_file);
-                       ntfs_attr_put_search_ctx(ctx);
-                       unmap_mft_record(dir_ni);
-                       return mref;
-               }
-               /*
-                * For a case insensitive mount, we also perform a case
-                * insensitive comparison (provided the file name is not in the
-                * POSIX namespace). If the comparison matches, and the name is
-                * in the WIN32 namespace, we cache the filename in *res so
-                * that the caller, ntfs_lookup(), can work on it. If the
-                * comparison matches, and the name is in the DOS namespace, we
-                * only cache the mft reference and the file name type (we set
-                * the name length to zero for simplicity).
-                */
-               if (!NVolCaseSensitive(vol) &&
-                               ie->key.file_name.file_name_type &&
-                               ntfs_are_names_equal(uname, uname_len,
-                               (ntfschar*)&ie->key.file_name.file_name,
-                               ie->key.file_name.file_name_length,
-                               IGNORE_CASE, vol->upcase, vol->upcase_len)) {
-                       int name_size = sizeof(ntfs_name);
-                       u8 type = ie->key.file_name.file_name_type;
-                       u8 len = ie->key.file_name.file_name_length;
-
-                       /* Only one case insensitive matching name allowed. */
-                       if (name) {
-                               ntfs_error(sb, "Found already allocated name "
-                                               "in phase 1. Please run chkdsk "
-                                               "and if that doesn't find any "
-                                               "errors please report you saw "
-                                               "this message to "
-                                               "linux-ntfs-dev@lists."
-                                               "sourceforge.net.");
-                               goto dir_err_out;
-                       }
-
-                       if (type != FILE_NAME_DOS)
-                               name_size += len * sizeof(ntfschar);
-                       name = kmalloc(name_size, GFP_NOFS);
-                       if (!name) {
-                               err = -ENOMEM;
-                               goto err_out;
-                       }
-                       name->mref = le64_to_cpu(ie->data.dir.indexed_file);
-                       name->type = type;
-                       if (type != FILE_NAME_DOS) {
-                               name->len = len;
-                               memcpy(name->name, ie->key.file_name.file_name,
-                                               len * sizeof(ntfschar));
-                       } else
-                               name->len = 0;
-                       *res = name;
-               }
-               /*
-                * Not a perfect match, need to do full blown collation so we
-                * know which way in the B+tree we have to go.
-                */
-               rc = ntfs_collate_names(uname, uname_len,
-                               (ntfschar*)&ie->key.file_name.file_name,
-                               ie->key.file_name.file_name_length, 1,
-                               IGNORE_CASE, vol->upcase, vol->upcase_len);
-               /*
-                * If uname collates before the name of the current entry, there
-                * is definitely no such name in this index but we might need to
-                * descend into the B+tree so we just break out of the loop.
-                */
-               if (rc == -1)
-                       break;
-               /* The names are not equal, continue the search. */
-               if (rc)
-                       continue;
-               /*
-                * Names match with case insensitive comparison, now try the
-                * case sensitive comparison, which is required for proper
-                * collation.
-                */
-               rc = ntfs_collate_names(uname, uname_len,
-                               (ntfschar*)&ie->key.file_name.file_name,
-                               ie->key.file_name.file_name_length, 1,
-                               CASE_SENSITIVE, vol->upcase, vol->upcase_len);
-               if (rc == -1)
-                       break;
-               if (rc)
-                       continue;
-               /*
-                * A perfect match should never be reached here, as the
-                * ntfs_are_names_equal() call above would already have caught
-                * it, but we handle it correctly anyway.
-                */
-               goto found_it;
-       }
-       /*
-        * We have finished with this index without success. Check for the
-        * presence of a child node and if not present return -ENOENT, unless
-        * we have got a matching name cached in name in which case return the
-        * mft reference associated with it.
-        */
-       if (!(ie->flags & INDEX_ENTRY_NODE)) {
-               if (name) {
-                       ntfs_attr_put_search_ctx(ctx);
-                       unmap_mft_record(dir_ni);
-                       return name->mref;
-               }
-               ntfs_debug("Entry not found.");
-               err = -ENOENT;
-               goto err_out;
-       } /* Child node present, descend into it. */
-       /* Consistency check: Verify that an index allocation exists. */
-       if (!NInoIndexAllocPresent(dir_ni)) {
-               ntfs_error(sb, "No index allocation attribute but index entry "
-                               "requires one. Directory inode 0x%lx is "
-                               "corrupt or driver bug.", dir_ni->mft_no);
-               goto err_out;
-       }
-       /* Get the starting vcn of the index_block holding the child node. */
-       vcn = sle64_to_cpup((sle64*)((u8*)ie + le16_to_cpu(ie->length) - 8));
-       ia_mapping = VFS_I(dir_ni)->i_mapping;
-       /*
-        * We are done with the index root and the mft record. Release them,
-        * otherwise we deadlock with ntfs_map_page().
-        */
-       ntfs_attr_put_search_ctx(ctx);
-       unmap_mft_record(dir_ni);
-       m = NULL;
-       ctx = NULL;
-descend_into_child_node:
-       /*
-        * Convert vcn to index into the index allocation attribute in units
-        * of PAGE_SIZE and map the page cache page, reading it from
-        * disk if necessary.
-        */
-       page = ntfs_map_page(ia_mapping, vcn <<
-                       dir_ni->itype.index.vcn_size_bits >> PAGE_SHIFT);
-       if (IS_ERR(page)) {
-               ntfs_error(sb, "Failed to map directory index page, error %ld.",
-                               -PTR_ERR(page));
-               err = PTR_ERR(page);
-               goto err_out;
-       }
-       lock_page(page);
-       kaddr = (u8*)page_address(page);
-fast_descend_into_child_node:
-       /* Get to the index allocation block. */
-       ia = (INDEX_ALLOCATION*)(kaddr + ((vcn <<
-                       dir_ni->itype.index.vcn_size_bits) & ~PAGE_MASK));
-       /* Bounds checks. */
-       if ((u8*)ia < kaddr || (u8*)ia > kaddr + PAGE_SIZE) {
-               ntfs_error(sb, "Out of bounds check failed. Corrupt directory "
-                               "inode 0x%lx or driver bug.", dir_ni->mft_no);
-               goto unm_err_out;
-       }
-       /* Catch multi sector transfer fixup errors. */
-       if (unlikely(!ntfs_is_indx_record(ia->magic))) {
-               ntfs_error(sb, "Directory index record with vcn 0x%llx is "
-                               "corrupt.  Corrupt inode 0x%lx.  Run chkdsk.",
-                               (unsigned long long)vcn, dir_ni->mft_no);
-               goto unm_err_out;
-       }
-       if (sle64_to_cpu(ia->index_block_vcn) != vcn) {
-               ntfs_error(sb, "Actual VCN (0x%llx) of index buffer is "
-                               "different from expected VCN (0x%llx). "
-                               "Directory inode 0x%lx is corrupt or driver "
-                               "bug.", (unsigned long long)
-                               sle64_to_cpu(ia->index_block_vcn),
-                               (unsigned long long)vcn, dir_ni->mft_no);
-               goto unm_err_out;
-       }
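-       /*
-        * Note: 0x18 is the size of the INDEX_ALLOCATION header preceding the
-        * embedded INDEX_HEADER, i.e. offsetof(INDEX_ALLOCATION, index), so
-        * allocated_size + 0x18 gives the full index block size.
-        */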
-       if (le32_to_cpu(ia->index.allocated_size) + 0x18 !=
-                       dir_ni->itype.index.block_size) {
-               ntfs_error(sb, "Index buffer (VCN 0x%llx) of directory inode "
-                               "0x%lx has a size (%u) differing from the "
-                               "directory specified size (%u). Directory "
-                               "inode is corrupt or driver bug.",
-                               (unsigned long long)vcn, dir_ni->mft_no,
-                               le32_to_cpu(ia->index.allocated_size) + 0x18,
-                               dir_ni->itype.index.block_size);
-               goto unm_err_out;
-       }
-       index_end = (u8*)ia + dir_ni->itype.index.block_size;
-       if (index_end > kaddr + PAGE_SIZE) {
-               ntfs_error(sb, "Index buffer (VCN 0x%llx) of directory inode "
-                               "0x%lx crosses page boundary. Impossible! "
-                               "Cannot access! This is probably a bug in the "
-                               "driver.", (unsigned long long)vcn,
-                               dir_ni->mft_no);
-               goto unm_err_out;
-       }
-       index_end = (u8*)&ia->index + le32_to_cpu(ia->index.index_length);
-       if (index_end > (u8*)ia + dir_ni->itype.index.block_size) {
-               ntfs_error(sb, "Size of index buffer (VCN 0x%llx) of directory "
-                               "inode 0x%lx exceeds maximum size.",
-                               (unsigned long long)vcn, dir_ni->mft_no);
-               goto unm_err_out;
-       }
-       /* The first index entry. */
-       ie = (INDEX_ENTRY*)((u8*)&ia->index +
-                       le32_to_cpu(ia->index.entries_offset));
-       /*
-        * Iterate similar to above big loop but applied to index buffer, thus
-        * loop until we exceed valid memory (corruption case) or until we
-        * reach the last entry.
-        */
-       for (;; ie = (INDEX_ENTRY*)((u8*)ie + le16_to_cpu(ie->length))) {
-               /* Bounds check. */
-               if ((u8*)ie < (u8*)ia || (u8*)ie +
-                               sizeof(INDEX_ENTRY_HEADER) > index_end ||
-                               (u8*)ie + le16_to_cpu(ie->key_length) >
-                               index_end) {
-                       ntfs_error(sb, "Index entry out of bounds in "
-                                       "directory inode 0x%lx.",
-                                       dir_ni->mft_no);
-                       goto unm_err_out;
-               }
-               /*
-                * The last entry cannot contain a name. It can however contain
-                * a pointer to a child node in the B+tree so we just break out.
-                */
-               if (ie->flags & INDEX_ENTRY_END)
-                       break;
-               /*
-                * We perform a case sensitive comparison and if that matches
-                * we are done and return the mft reference of the inode (i.e.
-                * the inode number together with the sequence number for
-                * consistency checking). We convert it to cpu format before
-                * returning.
-                */
-               if (ntfs_are_names_equal(uname, uname_len,
-                               (ntfschar*)&ie->key.file_name.file_name,
-                               ie->key.file_name.file_name_length,
-                               CASE_SENSITIVE, vol->upcase, vol->upcase_len)) {
-found_it2:
-                       /*
-                        * We have a perfect match, so we don't need to care
-                        * about having matched imperfectly before, so we can
-                        * free name and set *res to NULL.
-                        * However, if the perfect match is a short file name,
-                        * we need to signal this through *res, so that
-                        * ntfs_lookup() can fix dcache aliasing issues.
-                        * As an optimization we just reuse an existing
-                        * allocation of *res.
-                        */
-                       if (ie->key.file_name.file_name_type == FILE_NAME_DOS) {
-                               if (!name) {
-                                       name = kmalloc(sizeof(ntfs_name),
-                                                       GFP_NOFS);
-                                       if (!name) {
-                                               err = -ENOMEM;
-                                               goto unm_err_out;
-                                       }
-                               }
-                               name->mref = le64_to_cpu(
-                                               ie->data.dir.indexed_file);
-                               name->type = FILE_NAME_DOS;
-                               name->len = 0;
-                               *res = name;
-                       } else {
-                               kfree(name);
-                               *res = NULL;
-                       }
-                       mref = le64_to_cpu(ie->data.dir.indexed_file);
-                       unlock_page(page);
-                       ntfs_unmap_page(page);
-                       return mref;
-               }
-               /*
-                * For a case insensitive mount, we also perform a case
-                * insensitive comparison (provided the file name is not in the
-                * POSIX namespace). If the comparison matches, and the name is
-                * in the WIN32 namespace, we cache the filename in *res so
-                * that the caller, ntfs_lookup(), can work on it. If the
-                * comparison matches, and the name is in the DOS namespace, we
-                * only cache the mft reference and the file name type (we set
-                * the name length to zero for simplicity).
-                */
-               if (!NVolCaseSensitive(vol) &&
-                               ie->key.file_name.file_name_type &&
-                               ntfs_are_names_equal(uname, uname_len,
-                               (ntfschar*)&ie->key.file_name.file_name,
-                               ie->key.file_name.file_name_length,
-                               IGNORE_CASE, vol->upcase, vol->upcase_len)) {
-                       int name_size = sizeof(ntfs_name);
-                       u8 type = ie->key.file_name.file_name_type;
-                       u8 len = ie->key.file_name.file_name_length;
-
-                       /* Only one case insensitive matching name allowed. */
-                       if (name) {
-                               ntfs_error(sb, "Found already allocated name "
-                                               "in phase 2. Please run chkdsk "
-                                               "and if that doesn't find any "
-                                               "errors please report you saw "
-                                               "this message to "
-                                               "linux-ntfs-dev@lists."
-                                               "sourceforge.net.");
-                               unlock_page(page);
-                               ntfs_unmap_page(page);
-                               goto dir_err_out;
-                       }
-
-                       if (type != FILE_NAME_DOS)
-                               name_size += len * sizeof(ntfschar);
-                       name = kmalloc(name_size, GFP_NOFS);
-                       if (!name) {
-                               err = -ENOMEM;
-                               goto unm_err_out;
-                       }
-                       name->mref = le64_to_cpu(ie->data.dir.indexed_file);
-                       name->type = type;
-                       if (type != FILE_NAME_DOS) {
-                               name->len = len;
-                               memcpy(name->name, ie->key.file_name.file_name,
-                                               len * sizeof(ntfschar));
-                       } else
-                               name->len = 0;
-                       *res = name;
-               }
-               /*
-                * Not a perfect match, need to do full blown collation so we
-                * know which way in the B+tree we have to go.
-                */
-               rc = ntfs_collate_names(uname, uname_len,
-                               (ntfschar*)&ie->key.file_name.file_name,
-                               ie->key.file_name.file_name_length, 1,
-                               IGNORE_CASE, vol->upcase, vol->upcase_len);
-               /*
-                * If uname collates before the name of the current entry, there
-                * is definitely no such name in this index but we might need to
-                * descend into the B+tree so we just break out of the loop.
-                */
-               if (rc == -1)
-                       break;
-               /* The names are not equal, continue the search. */
-               if (rc)
-                       continue;
-               /*
-                * Names match with case insensitive comparison, now try the
-                * case sensitive comparison, which is required for proper
-                * collation.
-                */
-               rc = ntfs_collate_names(uname, uname_len,
-                               (ntfschar*)&ie->key.file_name.file_name,
-                               ie->key.file_name.file_name_length, 1,
-                               CASE_SENSITIVE, vol->upcase, vol->upcase_len);
-               if (rc == -1)
-                       break;
-               if (rc)
-                       continue;
-               /*
-                * A perfect match should never be reached here, as the
-                * ntfs_are_names_equal() call above would already have caught
-                * it, but we handle it correctly anyway.
-                */
-               goto found_it2;
-       }
-       /*
-        * We have finished with this index buffer without success. Check for
-        * the presence of a child node.
-        */
-       if (ie->flags & INDEX_ENTRY_NODE) {
-               if ((ia->index.flags & NODE_MASK) == LEAF_NODE) {
-                       ntfs_error(sb, "Index entry with child node found in "
-                                       "a leaf node in directory inode 0x%lx.",
-                                       dir_ni->mft_no);
-                       goto unm_err_out;
-               }
-               /* Child node present, descend into it. */
-               old_vcn = vcn;
-               vcn = sle64_to_cpup((sle64*)((u8*)ie +
-                               le16_to_cpu(ie->length) - 8));
-               if (vcn >= 0) {
-                       /* If vcn is in the same page cache page as old_vcn we
-                        * recycle the mapped page. */
-                       if (old_vcn << vol->cluster_size_bits >>
-                                       PAGE_SHIFT == vcn <<
-                                       vol->cluster_size_bits >>
-                                       PAGE_SHIFT)
-                               goto fast_descend_into_child_node;
-                       unlock_page(page);
-                       ntfs_unmap_page(page);
-                       goto descend_into_child_node;
-               }
-               ntfs_error(sb, "Negative child node vcn in directory inode "
-                               "0x%lx.", dir_ni->mft_no);
-               goto unm_err_out;
-       }
-       /*
-        * No child node present, return -ENOENT, unless we have got a matching
-        * name cached in name in which case return the mft reference
-        * associated with it.
-        */
-       if (name) {
-               unlock_page(page);
-               ntfs_unmap_page(page);
-               return name->mref;
-       }
-       ntfs_debug("Entry not found.");
-       err = -ENOENT;
-unm_err_out:
-       unlock_page(page);
-       ntfs_unmap_page(page);
-err_out:
-       if (!err)
-               err = -EIO;
-       if (ctx)
-               ntfs_attr_put_search_ctx(ctx);
-       if (m)
-               unmap_mft_record(dir_ni);
-       if (name) {
-               kfree(name);
-               *res = NULL;
-       }
-       return ERR_MREF(err);
-dir_err_out:
-       ntfs_error(sb, "Corrupt directory.  Aborting lookup.");
-       goto err_out;
-}
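-
-/*
- * Caller-side sketch (illustrative, not from the original file): the returned
- * MFT_REF must be tested with IS_ERR_MREF() rather than with a plain sign
- * check, roughly as ntfs_lookup() does:
- *
- *	mref = ntfs_lookup_inode_by_name(dir_ni, uname, uname_len, &name);
- *	if (IS_ERR_MREF(mref))
- *		err = MREF_ERR(mref);	(e.g. -ENOENT when not found)
- *	else
- *		ino = MREF(mref);	(mft record number in cpu format)
- */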
-
-#if 0
-
-// TODO: (AIA)
-// The algorithm embedded in this code will be required when we want to
-// support adding entries to directories, where correct collation of file
-// names is required in order not to corrupt the filesystem.
-
-/**
- * ntfs_lookup_inode_by_name - find an inode in a directory given its name
- * @dir_ni:    ntfs inode of the directory in which to search for the name
- * @uname:     Unicode name for which to search in the directory
- * @uname_len: length of the name @uname in Unicode characters
- *
- * Look for an inode with name @uname in the directory with inode @dir_ni.
- * ntfs_lookup_inode_by_name() walks the contents of the directory looking for
- * the Unicode name. If the name is found in the directory, the corresponding
- * inode number (>= 0) is returned as a mft reference in cpu format, i.e. it
- * is a 64-bit number containing the sequence number.
- *
- * On error, a negative value is returned corresponding to the error code. In
- * particular if the inode is not found -ENOENT is returned. Note that you
- * can't just check the return value for being negative; you have to test it
- * with IS_ERR_MREF(return value) and, on error, extract the error code using
- * MREF_ERR(return value).
- *
- * Note, @uname_len does not include the (optional) terminating NUL character.
- */
-u64 ntfs_lookup_inode_by_name(ntfs_inode *dir_ni, const ntfschar *uname,
-               const int uname_len)
-{
-       ntfs_volume *vol = dir_ni->vol;
-       struct super_block *sb = vol->sb;
-       MFT_RECORD *m;
-       INDEX_ROOT *ir;
-       INDEX_ENTRY *ie;
-       INDEX_ALLOCATION *ia;
-       u8 *index_end;
-       u64 mref;
-       ntfs_attr_search_ctx *ctx;
-       int err, rc;
-       IGNORE_CASE_BOOL ic;
-       VCN vcn, old_vcn;
-       struct address_space *ia_mapping;
-       struct page *page;
-       u8 *kaddr;
-
-       /* Get hold of the mft record for the directory. */
-       m = map_mft_record(dir_ni);
-       if (IS_ERR(m)) {
-               ntfs_error(sb, "map_mft_record() failed with error code %ld.",
-                               -PTR_ERR(m));
-               return ERR_MREF(PTR_ERR(m));
-       }
-       ctx = ntfs_attr_get_search_ctx(dir_ni, m);
-       if (!ctx) {
-               err = -ENOMEM;
-               goto err_out;
-       }
-       /* Find the index root attribute in the mft record. */
-       err = ntfs_attr_lookup(AT_INDEX_ROOT, I30, 4, CASE_SENSITIVE, 0, NULL,
-                       0, ctx);
-       if (unlikely(err)) {
-               if (err == -ENOENT) {
-                       ntfs_error(sb, "Index root attribute missing in "
-                                       "directory inode 0x%lx.",
-                                       dir_ni->mft_no);
-                       err = -EIO;
-               }
-               goto err_out;
-       }
-       /* Get to the index root value (it's been verified in read_inode). */
-       ir = (INDEX_ROOT*)((u8*)ctx->attr +
-                       le16_to_cpu(ctx->attr->data.resident.value_offset));
-       index_end = (u8*)&ir->index + le32_to_cpu(ir->index.index_length);
-       /* The first index entry. */
-       ie = (INDEX_ENTRY*)((u8*)&ir->index +
-                       le32_to_cpu(ir->index.entries_offset));
-       /*
-        * Loop until we exceed valid memory (corruption case) or until we
-        * reach the last entry.
-        */
-       for (;; ie = (INDEX_ENTRY*)((u8*)ie + le16_to_cpu(ie->length))) {
-               /* Bounds checks. */
-               if ((u8*)ie < (u8*)ctx->mrec || (u8*)ie +
-                               sizeof(INDEX_ENTRY_HEADER) > index_end ||
-                               (u8*)ie + le16_to_cpu(ie->key_length) >
-                               index_end)
-                       goto dir_err_out;
-               /*
-                * The last entry cannot contain a name. It can however contain
-                * a pointer to a child node in the B+tree so we just break out.
-                */
-               if (ie->flags & INDEX_ENTRY_END)
-                       break;
-               /*
-                * If the current entry has a name type of POSIX, the name is
-                * case sensitive; otherwise it is not. This has the effect of us
-                * not being able to access any POSIX file names which collate
-                * after the non-POSIX one when they only differ in case, but
-                * anyone doing screwy stuff like that deserves to burn in
-                * hell... Doing that kind of stuff on NT4 actually causes
-                * corruption on the partition even when using SP6a and Linux
-                * is not involved at all.
-                */
-               ic = ie->key.file_name.file_name_type ? IGNORE_CASE :
-                               CASE_SENSITIVE;
-               /*
-                * If the names match perfectly, we are done and return the
-                * mft reference of the inode (i.e. the inode number together
-                * with the sequence number for consistency checking). We
-                * convert it to cpu format before returning.
-                */
-               if (ntfs_are_names_equal(uname, uname_len,
-                               (ntfschar*)&ie->key.file_name.file_name,
-                               ie->key.file_name.file_name_length, ic,
-                               vol->upcase, vol->upcase_len)) {
-found_it:
-                       mref = le64_to_cpu(ie->data.dir.indexed_file);
-                       ntfs_attr_put_search_ctx(ctx);
-                       unmap_mft_record(dir_ni);
-                       return mref;
-               }
-               /*
-                * Not a perfect match, need to do full blown collation so we
-                * know which way in the B+tree we have to go.
-                */
-               rc = ntfs_collate_names(uname, uname_len,
-                               (ntfschar*)&ie->key.file_name.file_name,
-                               ie->key.file_name.file_name_length, 1,
-                               IGNORE_CASE, vol->upcase, vol->upcase_len);
-               /*
-                * If uname collates before the name of the current entry, there
-                * is definitely no such name in this index but we might need to
-                * descend into the B+tree so we just break out of the loop.
-                */
-               if (rc == -1)
-                       break;
-               /* The names are not equal, continue the search. */
-               if (rc)
-                       continue;
-               /*
-                * Names match with case insensitive comparison, now try the
-                * case sensitive comparison, which is required for proper
-                * collation.
-                */
-               rc = ntfs_collate_names(uname, uname_len,
-                               (ntfschar*)&ie->key.file_name.file_name,
-                               ie->key.file_name.file_name_length, 1,
-                               CASE_SENSITIVE, vol->upcase, vol->upcase_len);
-               if (rc == -1)
-                       break;
-               if (rc)
-                       continue;
-               /*
-                * A perfect match should never be reached here, as the
-                * ntfs_are_names_equal() call above would already have caught
-                * it, but we handle it correctly anyway.
-                */
-               goto found_it;
-       }
-       /*
-        * We have finished with this index without success. Check for the
-        * presence of a child node.
-        */
-       if (!(ie->flags & INDEX_ENTRY_NODE)) {
-               /* No child node, return -ENOENT. */
-               err = -ENOENT;
-               goto err_out;
-       } /* Child node present, descend into it. */
-       /* Consistency check: Verify that an index allocation exists. */
-       if (!NInoIndexAllocPresent(dir_ni)) {
-               ntfs_error(sb, "No index allocation attribute but index entry "
-                               "requires one. Directory inode 0x%lx is "
-                               "corrupt or driver bug.", dir_ni->mft_no);
-               goto err_out;
-       }
-       /* Get the starting vcn of the index_block holding the child node. */
-       vcn = sle64_to_cpup((sle64*)((u8*)ie + le16_to_cpu(ie->length) - 8));
-       ia_mapping = VFS_I(dir_ni)->i_mapping;
-       /*
-        * We are done with the index root and the mft record. Release them,
-        * otherwise we deadlock with ntfs_map_page().
-        */
-       ntfs_attr_put_search_ctx(ctx);
-       unmap_mft_record(dir_ni);
-       m = NULL;
-       ctx = NULL;
-descend_into_child_node:
-       /*
-        * Convert vcn to index into the index allocation attribute in units
-        * of PAGE_SIZE and map the page cache page, reading it from
-        * disk if necessary.
-        */
-       page = ntfs_map_page(ia_mapping, vcn <<
-                       dir_ni->itype.index.vcn_size_bits >> PAGE_SHIFT);
-       if (IS_ERR(page)) {
-               ntfs_error(sb, "Failed to map directory index page, error %ld.",
-                               -PTR_ERR(page));
-               err = PTR_ERR(page);
-               goto err_out;
-       }
-       lock_page(page);
-       kaddr = (u8*)page_address(page);
-fast_descend_into_child_node:
-       /* Get to the index allocation block. */
-       ia = (INDEX_ALLOCATION*)(kaddr + ((vcn <<
-                       dir_ni->itype.index.vcn_size_bits) & ~PAGE_MASK));
-       /* Bounds checks. */
-       if ((u8*)ia < kaddr || (u8*)ia > kaddr + PAGE_SIZE) {
-               ntfs_error(sb, "Out of bounds check failed. Corrupt directory "
-                               "inode 0x%lx or driver bug.", dir_ni->mft_no);
-               goto unm_err_out;
-       }
-       /* Catch multi sector transfer fixup errors. */
-       if (unlikely(!ntfs_is_indx_record(ia->magic))) {
-               ntfs_error(sb, "Directory index record with vcn 0x%llx is "
-                               "corrupt.  Corrupt inode 0x%lx.  Run chkdsk.",
-                               (unsigned long long)vcn, dir_ni->mft_no);
-               goto unm_err_out;
-       }
-       if (sle64_to_cpu(ia->index_block_vcn) != vcn) {
-               ntfs_error(sb, "Actual VCN (0x%llx) of index buffer is "
-                               "different from expected VCN (0x%llx). "
-                               "Directory inode 0x%lx is corrupt or driver "
-                               "bug.", (unsigned long long)
-                               sle64_to_cpu(ia->index_block_vcn),
-                               (unsigned long long)vcn, dir_ni->mft_no);
-               goto unm_err_out;
-       }
-       if (le32_to_cpu(ia->index.allocated_size) + 0x18 !=
-                       dir_ni->itype.index.block_size) {
-               ntfs_error(sb, "Index buffer (VCN 0x%llx) of directory inode "
-                               "0x%lx has a size (%u) differing from the "
-                               "directory specified size (%u). Directory "
-                               "inode is corrupt or driver bug.",
-                               (unsigned long long)vcn, dir_ni->mft_no,
-                               le32_to_cpu(ia->index.allocated_size) + 0x18,
-                               dir_ni->itype.index.block_size);
-               goto unm_err_out;
-       }
-       index_end = (u8*)ia + dir_ni->itype.index.block_size;
-       if (index_end > kaddr + PAGE_SIZE) {
-               ntfs_error(sb, "Index buffer (VCN 0x%llx) of directory inode "
-                               "0x%lx crosses page boundary. Impossible! "
-                               "Cannot access! This is probably a bug in the "
-                               "driver.", (unsigned long long)vcn,
-                               dir_ni->mft_no);
-               goto unm_err_out;
-       }
-       index_end = (u8*)&ia->index + le32_to_cpu(ia->index.index_length);
-       if (index_end > (u8*)ia + dir_ni->itype.index.block_size) {
-               ntfs_error(sb, "Size of index buffer (VCN 0x%llx) of directory "
-                               "inode 0x%lx exceeds maximum size.",
-                               (unsigned long long)vcn, dir_ni->mft_no);
-               goto unm_err_out;
-       }
-       /* The first index entry. */
-       ie = (INDEX_ENTRY*)((u8*)&ia->index +
-                       le32_to_cpu(ia->index.entries_offset));
-       /*
-        * Iterate similar to above big loop but applied to index buffer, thus
-        * loop until we exceed valid memory (corruption case) or until we
-        * reach the last entry.
-        */
-       for (;; ie = (INDEX_ENTRY*)((u8*)ie + le16_to_cpu(ie->length))) {
-               /* Bounds check. */
-               if ((u8*)ie < (u8*)ia || (u8*)ie +
-                               sizeof(INDEX_ENTRY_HEADER) > index_end ||
-                               (u8*)ie + le16_to_cpu(ie->key_length) >
-                               index_end) {
-                       ntfs_error(sb, "Index entry out of bounds in "
-                                       "directory inode 0x%lx.",
-                                       dir_ni->mft_no);
-                       goto unm_err_out;
-               }
-               /*
-                * The last entry cannot contain a name. It can however contain
-                * a pointer to a child node in the B+tree so we just break out.
-                */
-               if (ie->flags & INDEX_ENTRY_END)
-                       break;
-               /*
-                * If the current entry has a name type of POSIX, the name is
-                * case sensitive; otherwise it is not. This has the effect of us
-                * not being able to access any POSIX file names which collate
-                * after the non-POSIX one when they only differ in case, but
-                * anyone doing screwy stuff like that deserves to burn in
-                * hell... Doing that kind of stuff on NT4 actually causes
-                * corruption on the partition even when using SP6a and Linux
-                * is not involved at all.
-                */
-               ic = ie->key.file_name.file_name_type ? IGNORE_CASE :
-                               CASE_SENSITIVE;
-               /*
-                * If the names match perfectly, we are done and return the
-                * mft reference of the inode (i.e. the inode number together
-                * with the sequence number for consistency checking). We
-                * convert it to cpu format before returning.
-                */
-               if (ntfs_are_names_equal(uname, uname_len,
-                               (ntfschar*)&ie->key.file_name.file_name,
-                               ie->key.file_name.file_name_length, ic,
-                               vol->upcase, vol->upcase_len)) {
-found_it2:
-                       mref = le64_to_cpu(ie->data.dir.indexed_file);
-                       unlock_page(page);
-                       ntfs_unmap_page(page);
-                       return mref;
-               }
-               /*
-                * Not a perfect match, need to do full blown collation so we
-                * know which way in the B+tree we have to go.
-                */
-               rc = ntfs_collate_names(uname, uname_len,
-                               (ntfschar*)&ie->key.file_name.file_name,
-                               ie->key.file_name.file_name_length, 1,
-                               IGNORE_CASE, vol->upcase, vol->upcase_len);
-               /*
-                * If uname collates before the name of the current entry, there
-                * is definitely no such name in this index but we might need to
-                * descend into the B+tree so we just break out of the loop.
-                */
-               if (rc == -1)
-                       break;
-               /* The names are not equal, continue the search. */
-               if (rc)
-                       continue;
-               /*
-                * Names match with case insensitive comparison, now try the
-                * case sensitive comparison, which is required for proper
-                * collation.
-                */
-               rc = ntfs_collate_names(uname, uname_len,
-                               (ntfschar*)&ie->key.file_name.file_name,
-                               ie->key.file_name.file_name_length, 1,
-                               CASE_SENSITIVE, vol->upcase, vol->upcase_len);
-               if (rc == -1)
-                       break;
-               if (rc)
-                       continue;
-               /*
-                * A perfect match should never be reached here, as the
-                * ntfs_are_names_equal() call above would already have caught
-                * it, but we handle it correctly anyway.
-                */
-               goto found_it2;
-       }
-       /*
-        * We have finished with this index buffer without success. Check for
-        * the presence of a child node.
-        */
-       if (ie->flags & INDEX_ENTRY_NODE) {
-               if ((ia->index.flags & NODE_MASK) == LEAF_NODE) {
-                       ntfs_error(sb, "Index entry with child node found in "
-                                       "a leaf node in directory inode 0x%lx.",
-                                       dir_ni->mft_no);
-                       goto unm_err_out;
-               }
-               /* Child node present, descend into it. */
-               old_vcn = vcn;
-               vcn = sle64_to_cpup((sle64*)((u8*)ie +
-                               le16_to_cpu(ie->length) - 8));
-               if (vcn >= 0) {
-                       /* If vcn is in the same page cache page as old_vcn we
-                        * recycle the mapped page. */
-                       if (old_vcn << vol->cluster_size_bits >>
-                                       PAGE_SHIFT == vcn <<
-                                       vol->cluster_size_bits >>
-                                       PAGE_SHIFT)
-                               goto fast_descend_into_child_node;
-                       unlock_page(page);
-                       ntfs_unmap_page(page);
-                       goto descend_into_child_node;
-               }
-               ntfs_error(sb, "Negative child node vcn in directory inode "
-                               "0x%lx.", dir_ni->mft_no);
-               goto unm_err_out;
-       }
-       /* No child node, return -ENOENT. */
-       ntfs_debug("Entry not found.");
-       err = -ENOENT;
-unm_err_out:
-       unlock_page(page);
-       ntfs_unmap_page(page);
-err_out:
-       if (!err)
-               err = -EIO;
-       if (ctx)
-               ntfs_attr_put_search_ctx(ctx);
-       if (m)
-               unmap_mft_record(dir_ni);
-       return ERR_MREF(err);
-dir_err_out:
-       ntfs_error(sb, "Corrupt directory. Aborting lookup.");
-       goto err_out;
-}
-
-#endif
-
-/**
- * ntfs_filldir - ntfs specific filldir method
- * @vol:       current ntfs volume
- * @ndir:      ntfs inode of current directory
- * @ia_page:   page in which the index allocation block containing @ie resides
- * @ie:                current index entry
- * @name:      buffer to use for the converted name
- * @actor:     what to feed the entries to
- *
- * Convert the Unicode name of the index entry @ie to the loaded NLS, using
- * @name as the conversion buffer, and emit it via dir_emit() on @actor.
- *
- * If @ia_page is not NULL it is the locked page containing the index
- * allocation block containing the index entry @ie.
- *
- * Note, we drop (and then reacquire) the page lock on @ia_page across the
- * dir_emit() call, otherwise we would deadlock with NFSd when it calls
- * ->lookup, since ntfs_lookup() will lock the same page.  As an optimization,
- * we do not retake the lock if we are returning a non-zero value, as
- * ntfs_readdir() would need to drop the lock immediately anyway.
- */
-static inline int ntfs_filldir(ntfs_volume *vol,
-               ntfs_inode *ndir, struct page *ia_page, INDEX_ENTRY *ie,
-               u8 *name, struct dir_context *actor)
-{
-       unsigned long mref;
-       int name_len;
-       unsigned dt_type;
-       FILE_NAME_TYPE_FLAGS name_type;
-
-       name_type = ie->key.file_name.file_name_type;
-       if (name_type == FILE_NAME_DOS) {
-               ntfs_debug("Skipping DOS name space entry.");
-               return 0;
-       }
-       if (MREF_LE(ie->data.dir.indexed_file) == FILE_root) {
-               ntfs_debug("Skipping root directory self reference entry.");
-               return 0;
-       }
-       if (MREF_LE(ie->data.dir.indexed_file) < FILE_first_user &&
-                       !NVolShowSystemFiles(vol)) {
-               ntfs_debug("Skipping system file.");
-               return 0;
-       }
-       name_len = ntfs_ucstonls(vol, (ntfschar*)&ie->key.file_name.file_name,
-                       ie->key.file_name.file_name_length, &name,
-                       NTFS_MAX_NAME_LEN * NLS_MAX_CHARSET_SIZE + 1);
-       if (name_len <= 0) {
-               ntfs_warning(vol->sb, "Skipping unrepresentable inode 0x%llx.",
-                               (long long)MREF_LE(ie->data.dir.indexed_file));
-               return 0;
-       }
-       if (ie->key.file_name.file_attributes &
-                       FILE_ATTR_DUP_FILE_NAME_INDEX_PRESENT)
-               dt_type = DT_DIR;
-       else
-               dt_type = DT_REG;
-       mref = MREF_LE(ie->data.dir.indexed_file);
-       /*
-        * Drop the page lock otherwise we deadlock with NFS when it calls
-        * ->lookup since ntfs_lookup() will lock the same page.
-        */
-       if (ia_page)
-               unlock_page(ia_page);
-       ntfs_debug("Calling filldir for %s with len %i, fpos 0x%llx, inode "
-                       "0x%lx, DT_%s.", name, name_len, actor->pos, mref,
-                       dt_type == DT_DIR ? "DIR" : "REG");
-       if (!dir_emit(actor, name, name_len, mref, dt_type))
-               return 1;
-       /* Relock the page but not if we are aborting ->readdir. */
-       if (ia_page)
-               lock_page(ia_page);
-       return 0;
-}
-
-/*
- * We use the same basic approach as the old NTFS driver, i.e. we parse the
- * index root entries and then the index allocation entries that are marked
- * as in use in the index bitmap.
- *
- * While this will return the names in random order, that does not matter for
- * ->readdir; on the other hand, it results in a faster ->readdir.
- *
- * VFS calls ->readdir without BKL but with i_mutex held. This protects the VFS
- * parts (e.g. ->f_pos and ->i_size), and it also protects against directory
- * modifications.
- *
- * Locking:  - Caller must hold i_mutex on the directory.
- *          - Each page cache page in the index allocation mapping must be
- *            locked whilst being accessed, otherwise we may find a corrupt
- *            page: ->writepage applies the mst protection fixups before
- *            writing the page out, removes them again once the write has
- *            completed, and only then unlocks the page.
- */
-static int ntfs_readdir(struct file *file, struct dir_context *actor)
-{
-       s64 ia_pos, ia_start, prev_ia_pos, bmp_pos;
-       loff_t i_size;
-       struct inode *bmp_vi, *vdir = file_inode(file);
-       struct super_block *sb = vdir->i_sb;
-       ntfs_inode *ndir = NTFS_I(vdir);
-       ntfs_volume *vol = NTFS_SB(sb);
-       MFT_RECORD *m;
-       INDEX_ROOT *ir = NULL;
-       INDEX_ENTRY *ie;
-       INDEX_ALLOCATION *ia;
-       u8 *name = NULL;
-       int rc, err, ir_pos, cur_bmp_pos;
-       struct address_space *ia_mapping, *bmp_mapping;
-       struct page *bmp_page = NULL, *ia_page = NULL;
-       u8 *kaddr, *bmp, *index_end;
-       ntfs_attr_search_ctx *ctx;
-
-       ntfs_debug("Entering for inode 0x%lx, fpos 0x%llx.",
-                       vdir->i_ino, actor->pos);
-       rc = err = 0;
-       /* Are we at end of dir yet? */
-       i_size = i_size_read(vdir);
-       if (actor->pos >= i_size + vol->mft_record_size)
-               return 0;
-       /* Emulate . and .. for all directories. */
-       if (!dir_emit_dots(file, actor))
-               return 0;
-       m = NULL;
-       ctx = NULL;
-       /*
-        * Allocate a buffer to store the current name being processed
-        * converted to format determined by current NLS.
-        */
-       name = kmalloc(NTFS_MAX_NAME_LEN * NLS_MAX_CHARSET_SIZE + 1, GFP_NOFS);
-       if (unlikely(!name)) {
-               err = -ENOMEM;
-               goto err_out;
-       }
-       /* Are we jumping straight into the index allocation attribute? */
-       if (actor->pos >= vol->mft_record_size)
-               goto skip_index_root;
-       /* Get hold of the mft record for the directory. */
-       m = map_mft_record(ndir);
-       if (IS_ERR(m)) {
-               err = PTR_ERR(m);
-               m = NULL;
-               goto err_out;
-       }
-       ctx = ntfs_attr_get_search_ctx(ndir, m);
-       if (unlikely(!ctx)) {
-               err = -ENOMEM;
-               goto err_out;
-       }
-       /* Get the offset into the index root attribute. */
-       ir_pos = (s64)actor->pos;
-       /* Find the index root attribute in the mft record. */
-       err = ntfs_attr_lookup(AT_INDEX_ROOT, I30, 4, CASE_SENSITIVE, 0, NULL,
-                       0, ctx);
-       if (unlikely(err)) {
-               ntfs_error(sb, "Index root attribute missing in directory "
-                               "inode 0x%lx.", vdir->i_ino);
-               goto err_out;
-       }
-       /*
-        * Copy the index root attribute value to a buffer so that we can put
-        * the search context and unmap the mft record before calling the
- * filldir() callback.  We need to do this because of NFSd, which calls
- * ->lookup() from its filldir() callback, and this causes NTFS to
-        * deadlock as ntfs_lookup() maps the mft record of the directory and
-        * we have got it mapped here already.  The only solution is for us to
-        * unmap the mft record here so that a call to ntfs_lookup() is able to
-        * map the mft record without deadlocking.
-        */
-       rc = le32_to_cpu(ctx->attr->data.resident.value_length);
-       ir = kmalloc(rc, GFP_NOFS);
-       if (unlikely(!ir)) {
-               err = -ENOMEM;
-               goto err_out;
-       }
-       /* Copy the index root value (it has been verified in read_inode). */
-       memcpy(ir, (u8*)ctx->attr +
-                       le16_to_cpu(ctx->attr->data.resident.value_offset), rc);
-       ntfs_attr_put_search_ctx(ctx);
-       unmap_mft_record(ndir);
-       ctx = NULL;
-       m = NULL;
-       index_end = (u8*)&ir->index + le32_to_cpu(ir->index.index_length);
-       /* The first index entry. */
-       ie = (INDEX_ENTRY*)((u8*)&ir->index +
-                       le32_to_cpu(ir->index.entries_offset));
-       /*
-        * Loop until we exceed valid memory (corruption case) or until we
-        * reach the last entry or until filldir tells us it has had enough
-        * or signals an error (both covered by the rc test).
-        */
-       for (;; ie = (INDEX_ENTRY*)((u8*)ie + le16_to_cpu(ie->length))) {
-               ntfs_debug("In index root, offset 0x%zx.", (u8*)ie - (u8*)ir);
-               /* Bounds checks. */
-               if (unlikely((u8*)ie < (u8*)ir || (u8*)ie +
-                               sizeof(INDEX_ENTRY_HEADER) > index_end ||
-                               (u8*)ie + le16_to_cpu(ie->key_length) >
-                               index_end))
-                       goto err_out;
-               /* The last entry cannot contain a name. */
-               if (ie->flags & INDEX_ENTRY_END)
-                       break;
-               /* Skip index root entry if continuing previous readdir. */
-               if (ir_pos > (u8*)ie - (u8*)ir)
-                       continue;
-               /* Advance the position even if going to skip the entry. */
-               actor->pos = (u8*)ie - (u8*)ir;
-               /* Submit the name to the filldir callback. */
-               rc = ntfs_filldir(vol, ndir, NULL, ie, name, actor);
-               if (rc) {
-                       kfree(ir);
-                       goto abort;
-               }
-       }
-       /* We are done with the index root and can free the buffer. */
-       kfree(ir);
-       ir = NULL;
-       /* If there is no index allocation attribute we are finished. */
-       if (!NInoIndexAllocPresent(ndir))
-               goto EOD;
-       /* Advance fpos to the beginning of the index allocation. */
-       actor->pos = vol->mft_record_size;
-skip_index_root:
-       kaddr = NULL;
-       prev_ia_pos = -1LL;
-       /* Get the offset into the index allocation attribute. */
-       ia_pos = (s64)actor->pos - vol->mft_record_size;
-       ia_mapping = vdir->i_mapping;
-       ntfs_debug("Inode 0x%lx, getting index bitmap.", vdir->i_ino);
-       bmp_vi = ntfs_attr_iget(vdir, AT_BITMAP, I30, 4);
-       if (IS_ERR(bmp_vi)) {
-               ntfs_error(sb, "Failed to get bitmap attribute.");
-               err = PTR_ERR(bmp_vi);
-               goto err_out;
-       }
-       bmp_mapping = bmp_vi->i_mapping;
-       /* Get the starting bitmap bit position and sanity check it. */
-       bmp_pos = ia_pos >> ndir->itype.index.block_size_bits;
-       if (unlikely(bmp_pos >> 3 >= i_size_read(bmp_vi))) {
-               ntfs_error(sb, "Current index allocation position exceeds "
-                               "index bitmap size.");
-               goto iput_err_out;
-       }
-       /* Get the starting bit position in the current bitmap page. */
-       cur_bmp_pos = bmp_pos & ((PAGE_SIZE * 8) - 1);
-       bmp_pos &= ~(u64)((PAGE_SIZE * 8) - 1);
-get_next_bmp_page:
-       ntfs_debug("Reading bitmap with page index 0x%llx, bit ofs 0x%llx",
-                       (unsigned long long)bmp_pos >> (3 + PAGE_SHIFT),
-                       (unsigned long long)bmp_pos &
-                       (unsigned long long)((PAGE_SIZE * 8) - 1));
-       bmp_page = ntfs_map_page(bmp_mapping,
-                       bmp_pos >> (3 + PAGE_SHIFT));
-       if (IS_ERR(bmp_page)) {
-               ntfs_error(sb, "Reading index bitmap failed.");
-               err = PTR_ERR(bmp_page);
-               bmp_page = NULL;
-               goto iput_err_out;
-       }
-       bmp = (u8*)page_address(bmp_page);
-       /* Find next index block in use. */
-       while (!(bmp[cur_bmp_pos >> 3] & (1 << (cur_bmp_pos & 7)))) {
-find_next_index_buffer:
-               cur_bmp_pos++;
-               /*
-                * If we have reached the end of the bitmap page, get the next
-                * page, and put away the old one.
-                */
-               if (unlikely((cur_bmp_pos >> 3) >= PAGE_SIZE)) {
-                       ntfs_unmap_page(bmp_page);
-                       bmp_pos += PAGE_SIZE * 8;
-                       cur_bmp_pos = 0;
-                       goto get_next_bmp_page;
-               }
-               /* If we have reached the end of the bitmap, we are done. */
-               if (unlikely(((bmp_pos + cur_bmp_pos) >> 3) >= i_size))
-                       goto unm_EOD;
-               ia_pos = (bmp_pos + cur_bmp_pos) <<
-                               ndir->itype.index.block_size_bits;
-       }
-       ntfs_debug("Handling index buffer 0x%llx.",
-                       (unsigned long long)bmp_pos + cur_bmp_pos);
-       /* If the current index buffer is in the same page we reuse the page. */
-       if ((prev_ia_pos & (s64)PAGE_MASK) !=
-                       (ia_pos & (s64)PAGE_MASK)) {
-               prev_ia_pos = ia_pos;
-               if (likely(ia_page != NULL)) {
-                       unlock_page(ia_page);
-                       ntfs_unmap_page(ia_page);
-               }
-               /*
-                * Map the page cache page containing the current ia_pos,
-                * reading it from disk if necessary.
-                */
-               ia_page = ntfs_map_page(ia_mapping, ia_pos >> PAGE_SHIFT);
-               if (IS_ERR(ia_page)) {
-                       ntfs_error(sb, "Reading index allocation data failed.");
-                       err = PTR_ERR(ia_page);
-                       ia_page = NULL;
-                       goto err_out;
-               }
-               lock_page(ia_page);
-               kaddr = (u8*)page_address(ia_page);
-       }
-       /* Get the current index buffer. */
-       ia = (INDEX_ALLOCATION*)(kaddr + (ia_pos & ~PAGE_MASK &
-                                         ~(s64)(ndir->itype.index.block_size - 1)));
-       /* Bounds checks. */
-       if (unlikely((u8*)ia < kaddr || (u8*)ia > kaddr + PAGE_SIZE)) {
-               ntfs_error(sb, "Out of bounds check failed. Corrupt directory "
-                               "inode 0x%lx or driver bug.", vdir->i_ino);
-               goto err_out;
-       }
-       /* Catch multi sector transfer fixup errors. */
-       if (unlikely(!ntfs_is_indx_record(ia->magic))) {
-               ntfs_error(sb, "Directory index record with vcn 0x%llx is "
-                               "corrupt.  Corrupt inode 0x%lx.  Run chkdsk.",
-                               (unsigned long long)ia_pos >>
-                               ndir->itype.index.vcn_size_bits, vdir->i_ino);
-               goto err_out;
-       }
-       if (unlikely(sle64_to_cpu(ia->index_block_vcn) != (ia_pos &
-                       ~(s64)(ndir->itype.index.block_size - 1)) >>
-                       ndir->itype.index.vcn_size_bits)) {
-               ntfs_error(sb, "Actual VCN (0x%llx) of index buffer is "
-                               "different from expected VCN (0x%llx). "
-                               "Directory inode 0x%lx is corrupt or driver "
-                               "bug.", (unsigned long long)
-                               sle64_to_cpu(ia->index_block_vcn),
-                               (unsigned long long)ia_pos >>
-                               ndir->itype.index.vcn_size_bits, vdir->i_ino);
-               goto err_out;
-       }
-       if (unlikely(le32_to_cpu(ia->index.allocated_size) + 0x18 !=
-                       ndir->itype.index.block_size)) {
-               ntfs_error(sb, "Index buffer (VCN 0x%llx) of directory inode "
-                               "0x%lx has a size (%u) differing from the "
-                               "directory specified size (%u). Directory "
-                               "inode is corrupt or driver bug.",
-                               (unsigned long long)ia_pos >>
-                               ndir->itype.index.vcn_size_bits, vdir->i_ino,
-                               le32_to_cpu(ia->index.allocated_size) + 0x18,
-                               ndir->itype.index.block_size);
-               goto err_out;
-       }
-       index_end = (u8*)ia + ndir->itype.index.block_size;
-       if (unlikely(index_end > kaddr + PAGE_SIZE)) {
-               ntfs_error(sb, "Index buffer (VCN 0x%llx) of directory inode "
-                               "0x%lx crosses page boundary. Impossible! "
-                               "Cannot access! This is probably a bug in the "
-                               "driver.", (unsigned long long)ia_pos >>
-                               ndir->itype.index.vcn_size_bits, vdir->i_ino);
-               goto err_out;
-       }
-       ia_start = ia_pos & ~(s64)(ndir->itype.index.block_size - 1);
-       index_end = (u8*)&ia->index + le32_to_cpu(ia->index.index_length);
-       if (unlikely(index_end > (u8*)ia + ndir->itype.index.block_size)) {
-               ntfs_error(sb, "Size of index buffer (VCN 0x%llx) of directory "
-                               "inode 0x%lx exceeds maximum size.",
-                               (unsigned long long)ia_pos >>
-                               ndir->itype.index.vcn_size_bits, vdir->i_ino);
-               goto err_out;
-       }
-       /* The first index entry in this index buffer. */
-       ie = (INDEX_ENTRY*)((u8*)&ia->index +
-                       le32_to_cpu(ia->index.entries_offset));
-       /*
-        * Loop until we exceed valid memory (corruption case) or until we
-        * reach the last entry or until filldir tells us it has had enough
-        * or signals an error (both covered by the rc test).
-        */
-       for (;; ie = (INDEX_ENTRY*)((u8*)ie + le16_to_cpu(ie->length))) {
-               ntfs_debug("In index allocation, offset 0x%llx.",
-                               (unsigned long long)ia_start +
-                               (unsigned long long)((u8*)ie - (u8*)ia));
-               /* Bounds checks. */
-               if (unlikely((u8*)ie < (u8*)ia || (u8*)ie +
-                               sizeof(INDEX_ENTRY_HEADER) > index_end ||
-                               (u8*)ie + le16_to_cpu(ie->key_length) >
-                               index_end))
-                       goto err_out;
-               /* The last entry cannot contain a name. */
-               if (ie->flags & INDEX_ENTRY_END)
-                       break;
-               /* Skip index block entry if continuing previous readdir. */
-               if (ia_pos - ia_start > (u8*)ie - (u8*)ia)
-                       continue;
-               /* Advance the position even if going to skip the entry. */
-               actor->pos = (u8*)ie - (u8*)ia +
-                               (sle64_to_cpu(ia->index_block_vcn) <<
-                               ndir->itype.index.vcn_size_bits) +
-                               vol->mft_record_size;
-               /*
-                * Submit the name to the @filldir callback.  Note,
-                * ntfs_filldir() drops the lock on @ia_page but it retakes it
-                * before returning, unless a non-zero value is returned in
-                * which case the page is left unlocked.
-                */
-               rc = ntfs_filldir(vol, ndir, ia_page, ie, name, actor);
-               if (rc) {
-                       /* @ia_page is already unlocked in this case. */
-                       ntfs_unmap_page(ia_page);
-                       ntfs_unmap_page(bmp_page);
-                       iput(bmp_vi);
-                       goto abort;
-               }
-       }
-       goto find_next_index_buffer;
-unm_EOD:
-       if (ia_page) {
-               unlock_page(ia_page);
-               ntfs_unmap_page(ia_page);
-       }
-       ntfs_unmap_page(bmp_page);
-       iput(bmp_vi);
-EOD:
-       /* We are finished, set fpos to EOD. */
-       actor->pos = i_size + vol->mft_record_size;
-abort:
-       kfree(name);
-       return 0;
-err_out:
-       if (bmp_page) {
-               ntfs_unmap_page(bmp_page);
-iput_err_out:
-               iput(bmp_vi);
-       }
-       if (ia_page) {
-               unlock_page(ia_page);
-               ntfs_unmap_page(ia_page);
-       }
-       kfree(ir);
-       kfree(name);
-       if (ctx)
-               ntfs_attr_put_search_ctx(ctx);
-       if (m)
-               unmap_mft_record(ndir);
-       if (!err)
-               err = -EIO;
-       ntfs_debug("Failed. Returning error code %i.", -err);
-       return err;
-}
-
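/*
 * Editor's sketch (plain userspace C, assumption-labelled): how the f_pos
 * encoding used by ntfs_readdir() above can be decoded.  Positions 0 and 1
 * are the emulated "." and ".." entries, positions below mft_record_size
 * are byte offsets into the index root, positions above it are offsets
 * into the index allocation, and i_size + mft_record_size marks end of
 * directory.  The sizes below are example values only.
 */
#include <stdio.h>

int main(void)
{
        const long long mft_record_size = 1024; /* example value */
        const long long i_size = 8192;          /* example value */
        const long long pos[] = { 0, 1, 0x50, 1024 + 4096, 8192 + 1024 };

        for (int i = 0; i < 5; i++) {
                long long p = pos[i];

                if (p < 2)
                        printf("0x%llx: dot entry\n", p);
                else if (p < mft_record_size)
                        printf("0x%llx: index root, offset 0x%llx\n", p, p);
                else if (p < i_size + mft_record_size)
                        printf("0x%llx: index allocation, offset 0x%llx\n",
                                        p, p - mft_record_size);
                else
                        printf("0x%llx: end of directory\n", p);
        }
        return 0;
}
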
-/**
- * ntfs_dir_open - called when an inode is about to be opened
- * @vi:                inode to be opened
- * @filp:      file structure describing the inode
- *
- * Limit directory size to the page cache limit on architectures where unsigned
- * long is 32 bits. This is the most we can do for now without overflowing the
- * page cache page index. Doing it this way means we don't run into problems
- * because of existing directories that are too large. It would be better to
- * allow the user to read the accessible part of the directory but I doubt very
- * much anyone is going to hit this check on a 32-bit architecture, so there is
- * no point in adding the extra complexity required to support this.
- *
- * On 64-bit architectures, the check is hopefully optimized away by the
- * compiler.
- */
-static int ntfs_dir_open(struct inode *vi, struct file *filp)
-{
-       if (sizeof(unsigned long) < 8) {
-               if (i_size_read(vi) > MAX_LFS_FILESIZE)
-                       return -EFBIG;
-       }
-       return 0;
-}
-
-#ifdef NTFS_RW
-
-/**
- * ntfs_dir_fsync - sync a directory to disk
- * @filp:      directory to be synced
- * @start:     offset in bytes of the beginning of data range to sync
- * @end:       offset in bytes of the end of data range (inclusive)
- * @datasync:  if non-zero only flush user data and not metadata
- *
- * Data integrity sync of a directory to disk.  Used for fsync, fdatasync, and
- * msync system calls.  This function is based on file.c::ntfs_file_fsync().
- *
- * Write the mft record and all associated extent mft records as well as the
- * $INDEX_ALLOCATION and $BITMAP attributes and then sync the block device.
- *
- * If @datasync is true, we do not wait on the inode(s) to be written out
- * but we always wait on the page cache pages to be written out.
- *
- * Note: In the past @filp could be NULL so we ignore it as we don't need it
- * anyway.
- *
- * Locking: Caller must hold i_mutex on the inode.
- *
- * TODO: We should probably also write all attribute/index inodes associated
- * with this inode but since we have no simple way of getting to them we ignore
- * this problem for now.  We do write the $BITMAP attribute if it is present
- * which is the important one for a directory so things are not too bad.
- */
-static int ntfs_dir_fsync(struct file *filp, loff_t start, loff_t end,
-                         int datasync)
-{
-       struct inode *bmp_vi, *vi = filp->f_mapping->host;
-       int err, ret;
-       ntfs_attr na;
-
-       ntfs_debug("Entering for inode 0x%lx.", vi->i_ino);
-
-       err = file_write_and_wait_range(filp, start, end);
-       if (err)
-               return err;
-       inode_lock(vi);
-
-       BUG_ON(!S_ISDIR(vi->i_mode));
-       /* If the bitmap attribute inode is in memory sync it, too. */
-       na.mft_no = vi->i_ino;
-       na.type = AT_BITMAP;
-       na.name = I30;
-       na.name_len = 4;
-       bmp_vi = ilookup5(vi->i_sb, vi->i_ino, ntfs_test_inode, &na);
-       if (bmp_vi) {
-               write_inode_now(bmp_vi, !datasync);
-               iput(bmp_vi);
-       }
-       ret = __ntfs_write_inode(vi, 1);
-       write_inode_now(vi, !datasync);
-       err = sync_blockdev(vi->i_sb->s_bdev);
-       if (unlikely(err && !ret))
-               ret = err;
-       if (likely(!ret))
-               ntfs_debug("Done.");
-       else
-               ntfs_warning(vi->i_sb, "Failed to f%ssync inode 0x%lx.  Error "
-                               "%u.", datasync ? "data" : "", vi->i_ino, -ret);
-       inode_unlock(vi);
-       return ret;
-}
-
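/*
 * Editor's sketch -- the general shape of the data-integrity pattern used
 * by ntfs_dir_fsync() above, not the driver's code: flush the data range,
 * write the inode (synchronously unless @datasync), then flush the block
 * device.  example_write_inode() is a hypothetical per-filesystem helper
 * standing in for __ntfs_write_inode().
 */
static int example_write_inode(struct inode *inode, int sync); /* hypothetical */

static int example_dir_fsync(struct file *filp, loff_t start, loff_t end,
                int datasync)
{
        struct inode *inode = filp->f_mapping->host;
        int err, ret;

        err = file_write_and_wait_range(filp, start, end);
        if (err)
                return err;
        inode_lock(inode);
        ret = example_write_inode(inode, !datasync);
        err = sync_blockdev(inode->i_sb->s_bdev);
        if (err && !ret)
                ret = err;
        inode_unlock(inode);
        return ret;
}
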
-#endif /* NTFS_RW */
-
-WRAP_DIR_ITER(ntfs_readdir) // FIXME!
-const struct file_operations ntfs_dir_ops = {
-       .llseek         = generic_file_llseek,  /* Seek inside directory. */
-       .read           = generic_read_dir,     /* Return -EISDIR. */
-       .iterate_shared = shared_ntfs_readdir,  /* Read directory contents. */
-#ifdef NTFS_RW
-       .fsync          = ntfs_dir_fsync,       /* Sync a directory to disk. */
-#endif /* NTFS_RW */
-       /*.ioctl        = ,*/                   /* Perform function on the
-                                                  mounted filesystem. */
-       .open           = ntfs_dir_open,        /* Open directory. */
-};
diff --git a/fs/ntfs/dir.h b/fs/ntfs/dir.h
deleted file mode 100644 (file)
index 0e32675..0000000
+++ /dev/null
@@ -1,34 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * dir.h - Defines for directory handling in NTFS Linux kernel driver. Part of
- *        the Linux-NTFS project.
- *
- * Copyright (c) 2002-2004 Anton Altaparmakov
- */
-
-#ifndef _LINUX_NTFS_DIR_H
-#define _LINUX_NTFS_DIR_H
-
-#include "layout.h"
-#include "inode.h"
-#include "types.h"
-
-/*
- * ntfs_name is used to return the file name to the caller of
- * ntfs_lookup_inode_by_name() in order for the caller (namei.c::ntfs_lookup())
- * to be able to deal with dcache aliasing issues.
- */
-typedef struct {
-       MFT_REF mref;
-       FILE_NAME_TYPE_FLAGS type;
-       u8 len;
-       ntfschar name[];
-} __attribute__ ((__packed__)) ntfs_name;
-
-/* The little endian Unicode string $I30 as a global constant. */
-extern ntfschar I30[5];
-
-extern MFT_REF ntfs_lookup_inode_by_name(ntfs_inode *dir_ni,
-               const ntfschar *uname, const int uname_len, ntfs_name **res);
-
-#endif /* _LINUX_NTFS_DIR_H */
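
/*
 * Editor's sketch (example_alloc_name() is hypothetical): since "name"
 * above is a variable-length trailing array, callers must size the
 * allocation as header plus characters, e.g.:
 */
static ntfs_name *example_alloc_name(u8 name_len)
{
        return kmalloc(sizeof(ntfs_name) + name_len * sizeof(ntfschar),
                        GFP_NOFS);
}
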
diff --git a/fs/ntfs/endian.h b/fs/ntfs/endian.h
deleted file mode 100644 (file)
index f30c139..0000000
+++ /dev/null
@@ -1,79 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * endian.h - Defines for endianness handling in NTFS Linux kernel driver.
- *           Part of the Linux-NTFS project.
- *
- * Copyright (c) 2001-2004 Anton Altaparmakov
- */
-
-#ifndef _LINUX_NTFS_ENDIAN_H
-#define _LINUX_NTFS_ENDIAN_H
-
-#include <asm/byteorder.h>
-#include "types.h"
-
-/*
- * Signed endianness conversion functions.
- */
-
-static inline s16 sle16_to_cpu(sle16 x)
-{
-       return le16_to_cpu((__force le16)x);
-}
-
-static inline s32 sle32_to_cpu(sle32 x)
-{
-       return le32_to_cpu((__force le32)x);
-}
-
-static inline s64 sle64_to_cpu(sle64 x)
-{
-       return le64_to_cpu((__force le64)x);
-}
-
-static inline s16 sle16_to_cpup(sle16 *x)
-{
-       return le16_to_cpu(*(__force le16*)x);
-}
-
-static inline s32 sle32_to_cpup(sle32 *x)
-{
-       return le32_to_cpu(*(__force le32*)x);
-}
-
-static inline s64 sle64_to_cpup(sle64 *x)
-{
-       return le64_to_cpu(*(__force le64*)x);
-}
-
-static inline sle16 cpu_to_sle16(s16 x)
-{
-       return (__force sle16)cpu_to_le16(x);
-}
-
-static inline sle32 cpu_to_sle32(s32 x)
-{
-       return (__force sle32)cpu_to_le32(x);
-}
-
-static inline sle64 cpu_to_sle64(s64 x)
-{
-       return (__force sle64)cpu_to_le64(x);
-}
-
-static inline sle16 cpu_to_sle16p(s16 *x)
-{
-       return (__force sle16)cpu_to_le16(*x);
-}
-
-static inline sle32 cpu_to_sle32p(s32 *x)
-{
-       return (__force sle32)cpu_to_le32(*x);
-}
-
-static inline sle64 cpu_to_sle64p(s64 *x)
-{
-       return (__force sle64)cpu_to_le64(*x);
-}
-
-#endif /* _LINUX_NTFS_ENDIAN_H */
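
/*
 * Editor's sketch (standalone userspace analogue, not kernel code): the
 * helpers above exist only to funnel signed on-disk values through the
 * unsigned le*_to_cpu()/cpu_to_le*() primitives; the underlying operation
 * is a plain byte-order swap.  This demo hand-rolls the le64 read to show
 * the round-trip of a negative value such as an NTFS LCN of -1 (a hole).
 */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

static int64_t demo_sle64_to_cpu(const uint8_t b[8])
{
        uint64_t v = 0;

        for (int i = 7; i >= 0; i--)
                v = (v << 8) | b[i];    /* little endian: b[0] is the LSB */
        return (int64_t)v;
}

int main(void)
{
        uint8_t disk[8];

        memset(disk, 0xff, sizeof(disk));       /* -1 stored little endian */
        printf("%lld\n", (long long)demo_sle64_to_cpu(disk));   /* prints -1 */
        return 0;
}
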
diff --git a/fs/ntfs/file.c b/fs/ntfs/file.c
deleted file mode 100644 (file)
index 297c0b9..0000000
+++ /dev/null
@@ -1,1997 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * file.c - NTFS kernel file operations.  Part of the Linux-NTFS project.
- *
- * Copyright (c) 2001-2015 Anton Altaparmakov and Tuxera Inc.
- */
-
-#include <linux/blkdev.h>
-#include <linux/backing-dev.h>
-#include <linux/buffer_head.h>
-#include <linux/gfp.h>
-#include <linux/pagemap.h>
-#include <linux/pagevec.h>
-#include <linux/sched/signal.h>
-#include <linux/swap.h>
-#include <linux/uio.h>
-#include <linux/writeback.h>
-
-#include <asm/page.h>
-#include <linux/uaccess.h>
-
-#include "attrib.h"
-#include "bitmap.h"
-#include "inode.h"
-#include "debug.h"
-#include "lcnalloc.h"
-#include "malloc.h"
-#include "mft.h"
-#include "ntfs.h"
-
-/**
- * ntfs_file_open - called when an inode is about to be opened
- * @vi:                inode to be opened
- * @filp:      file structure describing the inode
- *
- * Limit file size to the page cache limit on architectures where unsigned long
- * is 32 bits. This is the most we can do for now without overflowing the page
- * cache page index. Doing it this way means we don't run into problems because
- * of existing files that are too large. It would be better to allow the user
- * to read the beginning of the file but I doubt very much anyone is going to
- * hit this check on a 32-bit architecture, so there is no point in adding the
- * extra complexity required to support this.
- *
- * On 64-bit architectures, the check is hopefully optimized away by the
- * compiler.
- *
- * After the check passes, just call generic_file_open() to do its work.
- */
-static int ntfs_file_open(struct inode *vi, struct file *filp)
-{
-       if (sizeof(unsigned long) < 8) {
-               if (i_size_read(vi) > MAX_LFS_FILESIZE)
-                       return -EOVERFLOW;
-       }
-       return generic_file_open(vi, filp);
-}
-
-#ifdef NTFS_RW
-
-/**
- * ntfs_attr_extend_initialized - extend the initialized size of an attribute
- * @ni:                        ntfs inode of the attribute to extend
- * @new_init_size:     requested new initialized size in bytes
- *
- * Extend the initialized size of an attribute described by the ntfs inode @ni
- * to @new_init_size bytes.  This involves zeroing any non-sparse space between
- * the old initialized size and @new_init_size both in the page cache and on
- * disk (if relevant complete pages are already uptodate in the page cache then
- * these are simply marked dirty).
- *
- * As a side-effect, the file size (vfs inode->i_size) may be incremented as,
- * in the resident attribute case, it is tied to the initialized size and, in
- * the non-resident attribute case, it may not fall below the initialized size.
- *
- * Note that if the attribute is resident, we do not need to touch the page
- * cache at all.  This is because if the page cache page is not uptodate we
- * bring it uptodate later, when doing the write to the mft record since we
- * then already have the page mapped.  And if the page is uptodate, the
- * non-initialized region will already have been zeroed when the page was
- * brought uptodate and the region may in fact already have been overwritten
- * with new data via mmap() based writes, so we cannot just zero it.  And since
- * POSIX specifies that the behaviour of resizing a file whilst it is mmap()ped
- * is unspecified, we choose not to do zeroing and thus we do not need to touch
- * the page at all.  For a more detailed explanation see ntfs_truncate() in
- * fs/ntfs/inode.c.
- *
- * Return 0 on success and -errno on error.  In the case that an error is
- * encountered it is possible that the initialized size will already have been
- * incremented some way towards @new_init_size but it is guaranteed that if
- * this is the case, the necessary zeroing will also have happened and that all
- * metadata is self-consistent.
- *
- * Locking: i_mutex on the vfs inode corresponding to the ntfs inode @ni must
- *         be held by the caller.
- */
-static int ntfs_attr_extend_initialized(ntfs_inode *ni, const s64 new_init_size)
-{
-       s64 old_init_size;
-       loff_t old_i_size;
-       pgoff_t index, end_index;
-       unsigned long flags;
-       struct inode *vi = VFS_I(ni);
-       ntfs_inode *base_ni;
-       MFT_RECORD *m = NULL;
-       ATTR_RECORD *a;
-       ntfs_attr_search_ctx *ctx = NULL;
-       struct address_space *mapping;
-       struct page *page = NULL;
-       u8 *kattr;
-       int err;
-       u32 attr_len;
-
-       read_lock_irqsave(&ni->size_lock, flags);
-       old_init_size = ni->initialized_size;
-       old_i_size = i_size_read(vi);
-       BUG_ON(new_init_size > ni->allocated_size);
-       read_unlock_irqrestore(&ni->size_lock, flags);
-       ntfs_debug("Entering for i_ino 0x%lx, attribute type 0x%x, "
-                       "old_initialized_size 0x%llx, "
-                       "new_initialized_size 0x%llx, i_size 0x%llx.",
-                       vi->i_ino, (unsigned)le32_to_cpu(ni->type),
-                       (unsigned long long)old_init_size,
-                       (unsigned long long)new_init_size, old_i_size);
-       if (!NInoAttr(ni))
-               base_ni = ni;
-       else
-               base_ni = ni->ext.base_ntfs_ino;
-       /* Use goto to reduce indentation; we need the label below anyway. */
-       if (NInoNonResident(ni))
-               goto do_non_resident_extend;
-       BUG_ON(old_init_size != old_i_size);
-       m = map_mft_record(base_ni);
-       if (IS_ERR(m)) {
-               err = PTR_ERR(m);
-               m = NULL;
-               goto err_out;
-       }
-       ctx = ntfs_attr_get_search_ctx(base_ni, m);
-       if (unlikely(!ctx)) {
-               err = -ENOMEM;
-               goto err_out;
-       }
-       err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
-                       CASE_SENSITIVE, 0, NULL, 0, ctx);
-       if (unlikely(err)) {
-               if (err == -ENOENT)
-                       err = -EIO;
-               goto err_out;
-       }
-       m = ctx->mrec;
-       a = ctx->attr;
-       BUG_ON(a->non_resident);
-       /* The total length of the attribute value. */
-       attr_len = le32_to_cpu(a->data.resident.value_length);
-       BUG_ON(old_i_size != (loff_t)attr_len);
-       /*
-        * Do the zeroing in the mft record and update the attribute size in
-        * the mft record.
-        */
-       kattr = (u8*)a + le16_to_cpu(a->data.resident.value_offset);
-       memset(kattr + attr_len, 0, new_init_size - attr_len);
-       a->data.resident.value_length = cpu_to_le32((u32)new_init_size);
-       /* Finally, update the sizes in the vfs and ntfs inodes. */
-       write_lock_irqsave(&ni->size_lock, flags);
-       i_size_write(vi, new_init_size);
-       ni->initialized_size = new_init_size;
-       write_unlock_irqrestore(&ni->size_lock, flags);
-       goto done;
-do_non_resident_extend:
-       /*
-        * If the new initialized size @new_init_size exceeds the current file
-        * size (vfs inode->i_size), we need to extend the file size to the
-        * new initialized size.
-        */
-       if (new_init_size > old_i_size) {
-               m = map_mft_record(base_ni);
-               if (IS_ERR(m)) {
-                       err = PTR_ERR(m);
-                       m = NULL;
-                       goto err_out;
-               }
-               ctx = ntfs_attr_get_search_ctx(base_ni, m);
-               if (unlikely(!ctx)) {
-                       err = -ENOMEM;
-                       goto err_out;
-               }
-               err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
-                               CASE_SENSITIVE, 0, NULL, 0, ctx);
-               if (unlikely(err)) {
-                       if (err == -ENOENT)
-                               err = -EIO;
-                       goto err_out;
-               }
-               m = ctx->mrec;
-               a = ctx->attr;
-               BUG_ON(!a->non_resident);
-               BUG_ON(old_i_size != (loff_t)
-                               sle64_to_cpu(a->data.non_resident.data_size));
-               a->data.non_resident.data_size = cpu_to_sle64(new_init_size);
-               flush_dcache_mft_record_page(ctx->ntfs_ino);
-               mark_mft_record_dirty(ctx->ntfs_ino);
-               /* Update the file size in the vfs inode. */
-               i_size_write(vi, new_init_size);
-               ntfs_attr_put_search_ctx(ctx);
-               ctx = NULL;
-               unmap_mft_record(base_ni);
-               m = NULL;
-       }
-       mapping = vi->i_mapping;
-       index = old_init_size >> PAGE_SHIFT;
-       end_index = (new_init_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
-       do {
-               /*
-                * Read the page.  If the page is not present, this will zero
-                * the uninitialized regions for us.
-                */
-               page = read_mapping_page(mapping, index, NULL);
-               if (IS_ERR(page)) {
-                       err = PTR_ERR(page);
-                       goto init_err_out;
-               }
-               /*
-                * Update the initialized size in the ntfs inode.  This is
-                * enough to make ntfs_writepage() work.
-                */
-               write_lock_irqsave(&ni->size_lock, flags);
-               ni->initialized_size = (s64)(index + 1) << PAGE_SHIFT;
-               if (ni->initialized_size > new_init_size)
-                       ni->initialized_size = new_init_size;
-               write_unlock_irqrestore(&ni->size_lock, flags);
-               /* Set the page dirty so it gets written out. */
-               set_page_dirty(page);
-               put_page(page);
-               /*
-                * Play nice with the vm and the rest of the system.  This is
-                * very much needed as we can potentially be modifying the
-                * initialised size from a very small value to a really huge
-                * value, e.g.
-                *      f = open(somefile, O_TRUNC);
-                *      truncate(f, 10GiB);
-                *      seek(f, 10GiB);
-                *      write(f, 1);
-                * And this would mean we would be marking dirty hundreds of
-                * thousands of pages or as in the above example more than
-                * two and a half million pages!
-                *
-                * TODO: For sparse pages could optimize this workload by using
-                * the FsMisc / MiscFs page bit as a "PageIsSparse" bit.  This
-                * would be set in read_folio for sparse pages and here we would
-                * not need to mark dirty any pages which have this bit set.
-                * The only caveat is that we have to clear the bit everywhere
-                * where we allocate any clusters that lie in the page or that
-                * contain the page.
-                *
-                * TODO: An even greater optimization would be for us to only
-                * call read_folio() on pages which are not in sparse regions as
-                * determined from the runlist.  This would greatly reduce the
-                * number of pages we read and make dirty in the case of sparse
-                * files.
-                */
-               balance_dirty_pages_ratelimited(mapping);
-               cond_resched();
-       } while (++index < end_index);
-       read_lock_irqsave(&ni->size_lock, flags);
-       BUG_ON(ni->initialized_size != new_init_size);
-       read_unlock_irqrestore(&ni->size_lock, flags);
-       /* Now bring in sync the initialized_size in the mft record. */
-       m = map_mft_record(base_ni);
-       if (IS_ERR(m)) {
-               err = PTR_ERR(m);
-               m = NULL;
-               goto init_err_out;
-       }
-       ctx = ntfs_attr_get_search_ctx(base_ni, m);
-       if (unlikely(!ctx)) {
-               err = -ENOMEM;
-               goto init_err_out;
-       }
-       err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
-                       CASE_SENSITIVE, 0, NULL, 0, ctx);
-       if (unlikely(err)) {
-               if (err == -ENOENT)
-                       err = -EIO;
-               goto init_err_out;
-       }
-       m = ctx->mrec;
-       a = ctx->attr;
-       BUG_ON(!a->non_resident);
-       a->data.non_resident.initialized_size = cpu_to_sle64(new_init_size);
-done:
-       flush_dcache_mft_record_page(ctx->ntfs_ino);
-       mark_mft_record_dirty(ctx->ntfs_ino);
-       if (ctx)
-               ntfs_attr_put_search_ctx(ctx);
-       if (m)
-               unmap_mft_record(base_ni);
-       ntfs_debug("Done, initialized_size 0x%llx, i_size 0x%llx.",
-                       (unsigned long long)new_init_size, i_size_read(vi));
-       return 0;
-init_err_out:
-       write_lock_irqsave(&ni->size_lock, flags);
-       ni->initialized_size = old_init_size;
-       write_unlock_irqrestore(&ni->size_lock, flags);
-err_out:
-       if (ctx)
-               ntfs_attr_put_search_ctx(ctx);
-       if (m)
-               unmap_mft_record(base_ni);
-       ntfs_debug("Failed.  Returning error code %i.", err);
-       return err;
-}
-
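/*
 * Editor's sketch (plain userspace C): the page-range arithmetic used by
 * the zeroing loop in ntfs_attr_extend_initialized() above -- start at the
 * page containing the old initialized size and stop after the page
 * containing the last byte of the new one.  The sizes are example values.
 */
#include <stdio.h>

int main(void)
{
        const unsigned page_shift = 12;         /* 4 KiB pages, for example */
        const unsigned page_size = 1u << page_shift;
        const long long old_init_size = 5000, new_init_size = 20000;
        unsigned long index = old_init_size >> page_shift;
        unsigned long end_index = (new_init_size + page_size - 1) >> page_shift;

        /* Pages 1..4 are read (zeroing holes), dirtied and written out. */
        printf("touch pages %lu..%lu\n", index, end_index - 1);
        return 0;
}
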
-static ssize_t ntfs_prepare_file_for_write(struct kiocb *iocb,
-               struct iov_iter *from)
-{
-       loff_t pos;
-       s64 end, ll;
-       ssize_t err;
-       unsigned long flags;
-       struct file *file = iocb->ki_filp;
-       struct inode *vi = file_inode(file);
-       ntfs_inode *ni = NTFS_I(vi);
-       ntfs_volume *vol = ni->vol;
-
-       ntfs_debug("Entering for i_ino 0x%lx, attribute type 0x%x, pos "
-                       "0x%llx, count 0x%zx.", vi->i_ino,
-                       (unsigned)le32_to_cpu(ni->type),
-                       (unsigned long long)iocb->ki_pos,
-                       iov_iter_count(from));
-       err = generic_write_checks(iocb, from);
-       if (unlikely(err <= 0))
-               goto out;
-       /*
-        * All checks have passed.  Before we start doing any writing we want
-        * to abort any totally illegal writes.
-        */
-       BUG_ON(NInoMstProtected(ni));
-       BUG_ON(ni->type != AT_DATA);
-       /* If file is encrypted, deny access, just like NT4. */
-       if (NInoEncrypted(ni)) {
-               /* Only $DATA attributes can be encrypted. */
-               /*
-                * Reminder for later: Encrypted files are _always_
-                * non-resident so that the content can always be encrypted.
-                */
-               ntfs_debug("Denying write access to encrypted file.");
-               err = -EACCES;
-               goto out;
-       }
-       if (NInoCompressed(ni)) {
-               /* Only unnamed $DATA attribute can be compressed. */
-               BUG_ON(ni->name_len);
-               /*
-                * Reminder for later: If resident, the data is not actually
-                * compressed.  Only on the switch to non-resident does
-                * compression kick in.  This is in contrast to encrypted files
-                * (see above).
-                */
-               ntfs_error(vi->i_sb, "Writing to compressed files is not "
-                               "implemented yet.  Sorry.");
-               err = -EOPNOTSUPP;
-               goto out;
-       }
-       err = file_remove_privs(file);
-       if (unlikely(err))
-               goto out;
-       /*
-        * Our ->update_time method always succeeds thus file_update_time()
-        * cannot fail either so there is no need to check the return code.
-        */
-       file_update_time(file);
-       pos = iocb->ki_pos;
-       /* The first byte after the last cluster being written to. */
-       end = (pos + iov_iter_count(from) + vol->cluster_size_mask) &
-                       ~(u64)vol->cluster_size_mask;
-       /*
-        * If the write goes beyond the allocated size, extend the allocation
-        * to cover the whole of the write, rounded up to the nearest cluster.
-        */
-       read_lock_irqsave(&ni->size_lock, flags);
-       ll = ni->allocated_size;
-       read_unlock_irqrestore(&ni->size_lock, flags);
-       if (end > ll) {
-               /*
-                * Extend the allocation without changing the data size.
-                *
-                * Note we ensure the allocation is big enough to at least
-                * write some data but we do not require the allocation to be
-                * complete, i.e. it may be partial.
-                */
-               ll = ntfs_attr_extend_allocation(ni, end, -1, pos);
-               if (likely(ll >= 0)) {
-                       BUG_ON(pos >= ll);
-                       /* If the extension was partial truncate the write. */
-                       if (end > ll) {
-                               ntfs_debug("Truncating write to inode 0x%lx, "
-                                               "attribute type 0x%x, because "
-                                               "the allocation was only "
-                                               "partially extended.",
-                                               vi->i_ino, (unsigned)
-                                               le32_to_cpu(ni->type));
-                               iov_iter_truncate(from, ll - pos);
-                       }
-               } else {
-                       err = ll;
-                       read_lock_irqsave(&ni->size_lock, flags);
-                       ll = ni->allocated_size;
-                       read_unlock_irqrestore(&ni->size_lock, flags);
-                       /* Perform a partial write if possible or fail. */
-                       if (pos < ll) {
-                               ntfs_debug("Truncating write to inode 0x%lx "
-                                               "attribute type 0x%x, because "
-                                               "extending the allocation "
-                                               "failed (error %d).",
-                                               vi->i_ino, (unsigned)
-                                               le32_to_cpu(ni->type),
-                                               (int)-err);
-                               iov_iter_truncate(from, ll - pos);
-                       } else {
-                               if (err != -ENOSPC)
-                                       ntfs_error(vi->i_sb, "Cannot perform "
-                                                       "write to inode "
-                                                       "0x%lx, attribute "
-                                                       "type 0x%x, because "
-                                                       "extending the "
-                                                       "allocation failed "
-                                                       "(error %ld).",
-                                                       vi->i_ino, (unsigned)
-                                                       le32_to_cpu(ni->type),
-                                                       (long)-err);
-                               else
-                                       ntfs_debug("Cannot perform write to "
-                                                       "inode 0x%lx, "
-                                                       "attribute type 0x%x, "
-                                                       "because there is no "
-                                                       "space left.",
-                                                       vi->i_ino, (unsigned)
-                                                       le32_to_cpu(ni->type));
-                               goto out;
-                       }
-               }
-       }
-       /*
-        * If the write starts beyond the initialized size, extend it up to the
-        * beginning of the write and initialize all non-sparse space between
-        * the old initialized size and the new one.  This automatically also
-        * increments the vfs inode->i_size to keep it above or equal to the
-        * initialized_size.
-        */
-       read_lock_irqsave(&ni->size_lock, flags);
-       ll = ni->initialized_size;
-       read_unlock_irqrestore(&ni->size_lock, flags);
-       if (pos > ll) {
-               /*
-                * Wait for ongoing direct i/o to complete before proceeding.
-                * New direct i/o cannot start as we hold i_mutex.
-                */
-               inode_dio_wait(vi);
-               err = ntfs_attr_extend_initialized(ni, pos);
-               if (unlikely(err < 0))
-                       ntfs_error(vi->i_sb, "Cannot perform write to inode "
-                                       "0x%lx, attribute type 0x%x, because "
-                                       "extending the initialized size "
-                                       "failed (error %d).", vi->i_ino,
-                                       (unsigned)le32_to_cpu(ni->type),
-                                       (int)-err);
-       }
-out:
-       return err;
-}
-
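/*
 * Editor's sketch (plain userspace C): the cluster rounding used above to
 * find the first byte after the last cluster a write touches.  The cluster
 * size must be a power of two, so the mask is cluster_size - 1.
 */
#include <stdio.h>

int main(void)
{
        const long long cluster_size = 4096, mask = cluster_size - 1;
        const long long pos = 10000, count = 3000;
        long long end = (pos + count + mask) & ~mask;

        /* Write [10000, 13000) needs clusters up to byte 16384. */
        printf("write [%lld, %lld) -> allocate up to %lld\n",
                        pos, pos + count, end);
        return 0;
}
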
-/**
- * __ntfs_grab_cache_pages - obtain a number of locked pages
- * @mapping:   address space mapping from which to obtain page cache pages
- * @index:     starting index in @mapping at which to begin obtaining pages
- * @nr_pages:  number of page cache pages to obtain
- * @pages:     array of pages in which to return the obtained page cache pages
- * @cached_page: allocated but as yet unused page
- *
- * Obtain @nr_pages locked page cache pages from the mapping @mapping and
- * starting at index @index.
- *
- * If a page is newly created, it is added to the LRU list.
- *
- * Note, the page locks are obtained in ascending page index order.
- */
-static inline int __ntfs_grab_cache_pages(struct address_space *mapping,
-               pgoff_t index, const unsigned nr_pages, struct page **pages,
-               struct page **cached_page)
-{
-       int err, nr;
-
-       BUG_ON(!nr_pages);
-       err = nr = 0;
-       do {
-               pages[nr] = find_get_page_flags(mapping, index, FGP_LOCK |
-                               FGP_ACCESSED);
-               if (!pages[nr]) {
-                       if (!*cached_page) {
-                               *cached_page = page_cache_alloc(mapping);
-                               if (unlikely(!*cached_page)) {
-                                       err = -ENOMEM;
-                                       goto err_out;
-                               }
-                       }
-                       err = add_to_page_cache_lru(*cached_page, mapping,
-                                  index,
-                                  mapping_gfp_constraint(mapping, GFP_KERNEL));
-                       if (unlikely(err)) {
-                               if (err == -EEXIST)
-                                       continue;
-                               goto err_out;
-                       }
-                       pages[nr] = *cached_page;
-                       *cached_page = NULL;
-               }
-               index++;
-               nr++;
-       } while (nr < nr_pages);
-out:
-       return err;
-err_out:
-       while (nr > 0) {
-               unlock_page(pages[--nr]);
-               put_page(pages[nr]);
-       }
-       goto out;
-}
-
-static inline void ntfs_submit_bh_for_read(struct buffer_head *bh)
-{
-       lock_buffer(bh);
-       get_bh(bh);
-       bh->b_end_io = end_buffer_read_sync;
-       submit_bh(REQ_OP_READ, bh);
-}
-
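/*
 * Editor's sketch (assumption-labelled): the companion wait step for
 * ntfs_submit_bh_for_read() above.  Callers collect the submitted buffer
 * heads, wait for the reads to finish, check for I/O errors and drop the
 * reference taken at submit time.  example_wait_for_read() is a
 * hypothetical name for illustration.
 */
static int example_wait_for_read(struct buffer_head **wait_bh, unsigned nr)
{
        int err = 0;

        while (nr--) {
                struct buffer_head *bh = wait_bh[nr];

                wait_on_buffer(bh);
                if (!buffer_uptodate(bh))
                        err = -EIO;
                put_bh(bh);     /* Drop the reference taken at submit. */
        }
        return err;
}
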
-/**
- * ntfs_prepare_pages_for_non_resident_write - prepare pages for receiving data
- * @pages:     array of destination pages
- * @nr_pages:  number of pages in @pages
- * @pos:       byte position in file at which the write begins
- * @bytes:     number of bytes to be written
- *
- * This is called for non-resident attributes from ntfs_file_buffered_write()
- * with i_mutex held on the inode (@pages[0]->mapping->host).  There are
- * @nr_pages pages in @pages which are locked but not kmap()ped.  The source
- * data has not yet been copied into the @pages.
- *
- * Need to fill any holes with actual clusters, allocate buffers if necessary,
- * ensure all the buffers are mapped, and bring uptodate any buffers that are
- * only partially being written to.
- *
- * If @nr_pages is greater than one, we are guaranteed that the cluster size is
- * greater than PAGE_SIZE, that all pages in @pages are entirely inside
- * the same cluster and that they are the entirety of that cluster, and that
- * the cluster is sparse, i.e. we need to allocate a cluster to fill the hole.
- *
- * i_size is not to be modified yet.
- *
- * Return 0 on success or -errno on error.
- */
-static int ntfs_prepare_pages_for_non_resident_write(struct page **pages,
-               unsigned nr_pages, s64 pos, size_t bytes)
-{
-       VCN vcn, highest_vcn = 0, cpos, cend, bh_cpos, bh_cend;
-       LCN lcn;
-       s64 bh_pos, vcn_len, end, initialized_size;
-       sector_t lcn_block;
-       struct folio *folio;
-       struct inode *vi;
-       ntfs_inode *ni, *base_ni = NULL;
-       ntfs_volume *vol;
-       runlist_element *rl, *rl2;
-       struct buffer_head *bh, *head, *wait[2], **wait_bh = wait;
-       ntfs_attr_search_ctx *ctx = NULL;
-       MFT_RECORD *m = NULL;
-       ATTR_RECORD *a = NULL;
-       unsigned long flags;
-       u32 attr_rec_len = 0;
-       unsigned blocksize, u;
-       int err, mp_size;
-       bool rl_write_locked, was_hole, is_retry;
-       unsigned char blocksize_bits;
-       struct {
-               u8 runlist_merged:1;
-               u8 mft_attr_mapped:1;
-               u8 mp_rebuilt:1;
-               u8 attr_switched:1;
-       } status = { 0, 0, 0, 0 };
-
-       BUG_ON(!nr_pages);
-       BUG_ON(!pages);
-       BUG_ON(!*pages);
-       vi = pages[0]->mapping->host;
-       ni = NTFS_I(vi);
-       vol = ni->vol;
-       ntfs_debug("Entering for inode 0x%lx, attribute type 0x%x, start page "
-                       "index 0x%lx, nr_pages 0x%x, pos 0x%llx, bytes 0x%zx.",
-                       vi->i_ino, ni->type, pages[0]->index, nr_pages,
-                       (long long)pos, bytes);
-       blocksize = vol->sb->s_blocksize;
-       blocksize_bits = vol->sb->s_blocksize_bits;
-       rl_write_locked = false;
-       rl = NULL;
-       err = 0;
-       vcn = lcn = -1;
-       vcn_len = 0;
-       lcn_block = -1;
-       was_hole = false;
-       cpos = pos >> vol->cluster_size_bits;
-       end = pos + bytes;
-       cend = (end + vol->cluster_size - 1) >> vol->cluster_size_bits;
-       /*
-        * Loop over each buffer in each folio.  Use goto to
-        * reduce indentation.
-        */
-       u = 0;
-do_next_folio:
-       folio = page_folio(pages[u]);
-       bh_pos = folio_pos(folio);
-       head = folio_buffers(folio);
-       if (!head)
-               /*
-                * create_empty_buffers() will create uptodate/dirty
-                * buffers if the folio is uptodate/dirty.
-                */
-               head = create_empty_buffers(folio, blocksize, 0);
-       bh = head;
-       do {
-               VCN cdelta;
-               s64 bh_end;
-               unsigned bh_cofs;
-
-               /* Clear buffer_new on all buffers to reinitialise state. */
-               if (buffer_new(bh))
-                       clear_buffer_new(bh);
-               bh_end = bh_pos + blocksize;
-               bh_cpos = bh_pos >> vol->cluster_size_bits;
-               bh_cofs = bh_pos & vol->cluster_size_mask;
-               if (buffer_mapped(bh)) {
-                       /*
-                        * The buffer is already mapped.  If it is uptodate,
-                        * ignore it.
-                        */
-                       if (buffer_uptodate(bh))
-                               continue;
-                       /*
-                        * The buffer is not uptodate.  If the folio is uptodate
-                        * set the buffer uptodate and otherwise ignore it.
-                        */
-                       if (folio_test_uptodate(folio)) {
-                               set_buffer_uptodate(bh);
-                               continue;
-                       }
-                       /*
-                        * Neither the folio nor the buffer are uptodate.  If
-                        * the buffer is only partially being written to, we
-                        * need to read it in before the write, i.e. now.
-                        */
-                       if ((bh_pos < pos && bh_end > pos) ||
-                                       (bh_pos < end && bh_end > end)) {
-                               /*
-                                * If the buffer is fully or partially within
-                                * the initialized size, do an actual read.
-                                * Otherwise, simply zero the buffer.
-                                */
-                               read_lock_irqsave(&ni->size_lock, flags);
-                               initialized_size = ni->initialized_size;
-                               read_unlock_irqrestore(&ni->size_lock, flags);
-                               if (bh_pos < initialized_size) {
-                                       ntfs_submit_bh_for_read(bh);
-                                       *wait_bh++ = bh;
-                               } else {
-                                       folio_zero_range(folio, bh_offset(bh),
-                                                       blocksize);
-                                       set_buffer_uptodate(bh);
-                               }
-                       }
-                       continue;
-               }
-               /* Unmapped buffer.  Need to map it. */
-               bh->b_bdev = vol->sb->s_bdev;
-               /*
-                * If the current buffer is in the same clusters as the map
-                * cache, there is no need to check the runlist again.  The
-                * map cache is made up of @vcn, which is the first cached file
-                * cluster, @vcn_len which is the number of cached file
-                * clusters, @lcn is the device cluster corresponding to @vcn,
-                * and @lcn_block is the block number corresponding to @lcn.
-                */
-               cdelta = bh_cpos - vcn;
-               if (likely(!cdelta || (cdelta > 0 && cdelta < vcn_len))) {
-map_buffer_cached:
-                       BUG_ON(lcn < 0);
-                       bh->b_blocknr = lcn_block +
-                                       (cdelta << (vol->cluster_size_bits -
-                                       blocksize_bits)) +
-                                       (bh_cofs >> blocksize_bits);
-                       set_buffer_mapped(bh);
-                       /*
-                        * If the folio is uptodate so is the buffer.  If the
-                        * buffer is fully outside the write, we ignore it if
-                        * it was already allocated and we mark it dirty so it
-                        * gets written out if we allocated it.  On the other
-                        * hand, if we allocated the buffer but we are not
-                        * marking it dirty we set buffer_new so we can do
-                        * error recovery.
-                        */
-                       if (folio_test_uptodate(folio)) {
-                               if (!buffer_uptodate(bh))
-                                       set_buffer_uptodate(bh);
-                               if (unlikely(was_hole)) {
-                                       /* We allocated the buffer. */
-                                       clean_bdev_bh_alias(bh);
-                                       if (bh_end <= pos || bh_pos >= end)
-                                               mark_buffer_dirty(bh);
-                                       else
-                                               set_buffer_new(bh);
-                               }
-                               continue;
-                       }
-                       /* Page is _not_ uptodate. */
-                       if (likely(!was_hole)) {
-                               /*
-                                * Buffer was already allocated.  If it is not
-                                * uptodate and is only partially being written
-                                * to, we need to read it in before the write,
-                                * i.e. now.
-                                */
-                               if (!buffer_uptodate(bh) && bh_pos < end &&
-                                               bh_end > pos &&
-                                               (bh_pos < pos ||
-                                               bh_end > end)) {
-                                       /*
-                                        * If the buffer is fully or partially
-                                        * within the initialized size, do an
-                                        * actual read.  Otherwise, simply zero
-                                        * the buffer.
-                                        */
-                                       read_lock_irqsave(&ni->size_lock,
-                                                       flags);
-                                       initialized_size = ni->initialized_size;
-                                       read_unlock_irqrestore(&ni->size_lock,
-                                                       flags);
-                                       if (bh_pos < initialized_size) {
-                                               ntfs_submit_bh_for_read(bh);
-                                               *wait_bh++ = bh;
-                                       } else {
-                                               folio_zero_range(folio,
-                                                               bh_offset(bh),
-                                                               blocksize);
-                                               set_buffer_uptodate(bh);
-                                       }
-                               }
-                               continue;
-                       }
-                       /* We allocated the buffer. */
-                       clean_bdev_bh_alias(bh);
-                       /*
-                        * If the buffer is fully outside the write, zero it,
-                        * set it uptodate, and mark it dirty so it gets
-                        * written out.  If it is partially being written to,
-                        * zero region surrounding the write but leave it to
-                        * commit write to do anything else.  Finally, if the
-                        * buffer is fully being overwritten, do nothing.
-                        */
-                       if (bh_end <= pos || bh_pos >= end) {
-                               if (!buffer_uptodate(bh)) {
-                                       folio_zero_range(folio, bh_offset(bh),
-                                                       blocksize);
-                                       set_buffer_uptodate(bh);
-                               }
-                               mark_buffer_dirty(bh);
-                               continue;
-                       }
-                       set_buffer_new(bh);
-                       if (!buffer_uptodate(bh) &&
-                                       (bh_pos < pos || bh_end > end)) {
-                               u8 *kaddr;
-                               unsigned pofs;
-
-                               kaddr = kmap_local_folio(folio, 0);
-                               if (bh_pos < pos) {
-                                       pofs = bh_pos & ~PAGE_MASK;
-                                       memset(kaddr + pofs, 0, pos - bh_pos);
-                               }
-                               if (bh_end > end) {
-                                       pofs = end & ~PAGE_MASK;
-                                       memset(kaddr + pofs, 0, bh_end - end);
-                               }
-                               kunmap_local(kaddr);
-                               flush_dcache_folio(folio);
-                       }
-                       continue;
-               }
-               /*
-                * Slow path: this is the first buffer in the cluster.  If it
-                * is outside allocated size and is not uptodate, zero it and
-                * set it uptodate.
-                */
-               read_lock_irqsave(&ni->size_lock, flags);
-               initialized_size = ni->allocated_size;
-               read_unlock_irqrestore(&ni->size_lock, flags);
-               if (bh_pos > initialized_size) {
-                       if (folio_test_uptodate(folio)) {
-                               if (!buffer_uptodate(bh))
-                                       set_buffer_uptodate(bh);
-                       } else if (!buffer_uptodate(bh)) {
-                               folio_zero_range(folio, bh_offset(bh),
-                                               blocksize);
-                               set_buffer_uptodate(bh);
-                       }
-                       continue;
-               }
-               is_retry = false;
-               if (!rl) {
-                       down_read(&ni->runlist.lock);
-retry_remap:
-                       rl = ni->runlist.rl;
-               }
-               if (likely(rl != NULL)) {
-                       /* Seek to element containing target cluster. */
-                       while (rl->length && rl[1].vcn <= bh_cpos)
-                               rl++;
-                       lcn = ntfs_rl_vcn_to_lcn(rl, bh_cpos);
-                       if (likely(lcn >= 0)) {
-                               /*
-                                * Successful remap, setup the map cache and
-                                * use that to deal with the buffer.
-                                */
-                               was_hole = false;
-                               vcn = bh_cpos;
-                               vcn_len = rl[1].vcn - vcn;
-                               lcn_block = lcn << (vol->cluster_size_bits -
-                                               blocksize_bits);
-                               cdelta = 0;
-                               /*
-                                * If the number of remaining clusters touched
-                                * by the write is less than or equal to the
-                                * number of cached clusters, unlock the
-                                * runlist as the map cache will be used from
-                                * now on.
-                                */
-                               if (likely(vcn + vcn_len >= cend)) {
-                                       if (rl_write_locked) {
-                                               up_write(&ni->runlist.lock);
-                                               rl_write_locked = false;
-                                       } else
-                                               up_read(&ni->runlist.lock);
-                                       rl = NULL;
-                               }
-                               goto map_buffer_cached;
-                       }
-               } else
-                       lcn = LCN_RL_NOT_MAPPED;
-               /*
-                * If it is not a hole and not out of bounds, the runlist is
-                * probably unmapped so try to map it now.
-                */
-               if (unlikely(lcn != LCN_HOLE && lcn != LCN_ENOENT)) {
-                       if (likely(!is_retry && lcn == LCN_RL_NOT_MAPPED)) {
-                               /* Attempt to map runlist. */
-                               if (!rl_write_locked) {
-                                       /*
-                                        * We need the runlist locked for
-                                        * writing, so if it is locked for
-                                        * reading relock it now and retry in
-                                        * case it changed whilst we dropped
-                                        * the lock.
-                                        */
-                                       up_read(&ni->runlist.lock);
-                                       down_write(&ni->runlist.lock);
-                                       rl_write_locked = true;
-                                       goto retry_remap;
-                               }
-                               err = ntfs_map_runlist_nolock(ni, bh_cpos,
-                                               NULL);
-                               if (likely(!err)) {
-                                       is_retry = true;
-                                       goto retry_remap;
-                               }
-                               /*
-                                * If @vcn is out of bounds, pretend @lcn is
-                                * LCN_ENOENT.  As long as the buffer is out
-                                * of bounds this will work fine.
-                                */
-                               if (err == -ENOENT) {
-                                       lcn = LCN_ENOENT;
-                                       err = 0;
-                                       goto rl_not_mapped_enoent;
-                               }
-                       } else
-                               err = -EIO;
-                       /* Failed to map the buffer, even after retrying. */
-                       bh->b_blocknr = -1;
-                       ntfs_error(vol->sb, "Failed to write to inode 0x%lx, "
-                                       "attribute type 0x%x, vcn 0x%llx, "
-                                       "vcn offset 0x%x, because its "
-                                       "location on disk could not be "
-                                       "determined%s (error code %i).",
-                                       ni->mft_no, ni->type,
-                                       (unsigned long long)bh_cpos,
-                                       (unsigned)bh_pos &
-                                       vol->cluster_size_mask,
-                                       is_retry ? " even after retrying" : "",
-                                       err);
-                       break;
-               }
-rl_not_mapped_enoent:
-               /*
-                * The buffer is in a hole or out of bounds.  We need to fill
-                * the hole, unless the buffer is in a cluster which is not
-                * touched by the write, in which case we just leave the buffer
-                * unmapped.  This can only happen when the cluster size is
-                * less than the page cache size.
-                */
-               if (unlikely(vol->cluster_size < PAGE_SIZE)) {
-                       bh_cend = (bh_end + vol->cluster_size - 1) >>
-                                       vol->cluster_size_bits;
-                       if (bh_cend <= cpos || bh_cpos >= cend) {
-                               bh->b_blocknr = -1;
-                               /*
-                                * If the buffer is uptodate we skip it.  If it
-                                * is not but the folio is uptodate, we can set
-                                * the buffer uptodate.  If the folio is not
-                                * uptodate, we can clear the buffer and set it
-                                * uptodate.  Whether this is worthwhile is
-                                * debatable and this could be removed.
-                                */
-                               if (folio_test_uptodate(folio)) {
-                                       if (!buffer_uptodate(bh))
-                                               set_buffer_uptodate(bh);
-                               } else if (!buffer_uptodate(bh)) {
-                                       folio_zero_range(folio, bh_offset(bh),
-                                               blocksize);
-                                       set_buffer_uptodate(bh);
-                               }
-                               continue;
-                       }
-               }
-               /*
-                * Reaching here with LCN_ENOENT would mean an out of bounds
-                * buffer is touched by the write, which is invalid.  The only
-                * valid case left is a hole.
-                */
-               BUG_ON(lcn != LCN_HOLE);
-               /*
-                * We need the runlist locked for writing, so if it is locked
-                * for reading relock it now and retry in case it changed
-                * whilst we dropped the lock.
-                */
-               BUG_ON(!rl);
-               if (!rl_write_locked) {
-                       up_read(&ni->runlist.lock);
-                       down_write(&ni->runlist.lock);
-                       rl_write_locked = true;
-                       goto retry_remap;
-               }
-               /* Find the previous last allocated cluster. */
-               BUG_ON(rl->lcn != LCN_HOLE);
-               lcn = -1;
-               rl2 = rl;
-               while (--rl2 >= ni->runlist.rl) {
-                       if (rl2->lcn >= 0) {
-                               lcn = rl2->lcn + rl2->length;
-                               break;
-                       }
-               }
-               rl2 = ntfs_cluster_alloc(vol, bh_cpos, 1, lcn, DATA_ZONE,
-                               false);
-               if (IS_ERR(rl2)) {
-                       err = PTR_ERR(rl2);
-                       ntfs_debug("Failed to allocate cluster, error code %i.",
-                                       err);
-                       break;
-               }
-               lcn = rl2->lcn;
-               rl = ntfs_runlists_merge(ni->runlist.rl, rl2);
-               if (IS_ERR(rl)) {
-                       err = PTR_ERR(rl);
-                       if (err != -ENOMEM)
-                               err = -EIO;
-                       if (ntfs_cluster_free_from_rl(vol, rl2)) {
-                               ntfs_error(vol->sb, "Failed to release "
-                                               "allocated cluster in error "
-                                               "code path.  Run chkdsk to "
-                                               "recover the lost cluster.");
-                               NVolSetErrors(vol);
-                       }
-                       ntfs_free(rl2);
-                       break;
-               }
-               ni->runlist.rl = rl;
-               status.runlist_merged = 1;
-               ntfs_debug("Allocated cluster, lcn 0x%llx.",
-                               (unsigned long long)lcn);
-               /* Map and lock the mft record and get the attribute record. */
-               if (!NInoAttr(ni))
-                       base_ni = ni;
-               else
-                       base_ni = ni->ext.base_ntfs_ino;
-               m = map_mft_record(base_ni);
-               if (IS_ERR(m)) {
-                       err = PTR_ERR(m);
-                       break;
-               }
-               ctx = ntfs_attr_get_search_ctx(base_ni, m);
-               if (unlikely(!ctx)) {
-                       err = -ENOMEM;
-                       unmap_mft_record(base_ni);
-                       break;
-               }
-               status.mft_attr_mapped = 1;
-               err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
-                               CASE_SENSITIVE, bh_cpos, NULL, 0, ctx);
-               if (unlikely(err)) {
-                       if (err == -ENOENT)
-                               err = -EIO;
-                       break;
-               }
-               m = ctx->mrec;
-               a = ctx->attr;
-               /*
-                * Find the runlist element with which the attribute extent
-                * starts.  Note, we cannot use the _attr_ version because we
-                * have mapped the mft record.  That is ok because we know the
-                * runlist fragment must be mapped already to have ever gotten
-                * here, so we can just use the _rl_ version.
-                */
-               vcn = sle64_to_cpu(a->data.non_resident.lowest_vcn);
-               rl2 = ntfs_rl_find_vcn_nolock(rl, vcn);
-               BUG_ON(!rl2);
-               BUG_ON(!rl2->length);
-               BUG_ON(rl2->lcn < LCN_HOLE);
-               highest_vcn = sle64_to_cpu(a->data.non_resident.highest_vcn);
-               /*
-                * If @highest_vcn is zero, calculate the real highest_vcn
-                * (which can really be zero).
-                */
-               if (!highest_vcn)
-                       highest_vcn = (sle64_to_cpu(
-                                       a->data.non_resident.allocated_size) >>
-                                       vol->cluster_size_bits) - 1;
-               /*
-                * Determine the size of the mapping pairs array for the new
-                * extent, i.e. the old extent with the hole filled.
-                */
-               mp_size = ntfs_get_size_for_mapping_pairs(vol, rl2, vcn,
-                               highest_vcn);
-               if (unlikely(mp_size <= 0)) {
-                       if (!(err = mp_size))
-                               err = -EIO;
-                       ntfs_debug("Failed to get size for mapping pairs "
-                                       "array, error code %i.", err);
-                       break;
-               }
-               /*
-                * Resize the attribute record to fit the new mapping pairs
-                * array.
-                */
-               attr_rec_len = le32_to_cpu(a->length);
-               err = ntfs_attr_record_resize(m, a, mp_size + le16_to_cpu(
-                               a->data.non_resident.mapping_pairs_offset));
-               if (unlikely(err)) {
-                       BUG_ON(err != -ENOSPC);
-                       // TODO: Deal with this by using the current attribute
-                       // and fill it with as much of the mapping pairs
-                       // array as possible.  Then loop over each attribute
-                       // extent rewriting the mapping pairs arrays as we go
-                       // along and if when we reach the end we have not
-                       // enough space, try to resize the last attribute
-                       // extent and if even that fails, add a new attribute
-                       // extent.
-                       // We could also try to resize at each step in the hope
-                       // that we will not need to rewrite every single extent.
-                       // Note, we may need to decompress some extents to fill
-                       // the runlist as we are walking the extents...
-                       ntfs_error(vol->sb, "Not enough space in the mft "
-                                       "record for the extended attribute "
-                                       "record.  This case is not "
-                                       "implemented yet.");
-                       err = -EOPNOTSUPP;
-                       break;
-               }
-               status.mp_rebuilt = 1;
-               /*
-                * Generate the mapping pairs array directly into the attribute
-                * record.
-                */
-               err = ntfs_mapping_pairs_build(vol, (u8*)a + le16_to_cpu(
-                               a->data.non_resident.mapping_pairs_offset),
-                               mp_size, rl2, vcn, highest_vcn, NULL);
-               if (unlikely(err)) {
-                       ntfs_error(vol->sb, "Cannot fill hole in inode 0x%lx, "
-                                       "attribute type 0x%x, because building "
-                                       "the mapping pairs failed with error "
-                                       "code %i.", vi->i_ino,
-                                       (unsigned)le32_to_cpu(ni->type), err);
-                       err = -EIO;
-                       break;
-               }
-               /* Update the highest_vcn but only if it was not set. */
-               if (unlikely(!a->data.non_resident.highest_vcn))
-                       a->data.non_resident.highest_vcn =
-                                       cpu_to_sle64(highest_vcn);
-               /*
-                * If the attribute is sparse/compressed, update the compressed
-                * size in the ntfs_inode structure and the attribute record.
-                */
-               if (likely(NInoSparse(ni) || NInoCompressed(ni))) {
-                       /*
-                        * If we are not in the first attribute extent, switch
-                        * to it, but first ensure the changes will make it to
-                        * disk later.
-                        */
-                       if (a->data.non_resident.lowest_vcn) {
-                               flush_dcache_mft_record_page(ctx->ntfs_ino);
-                               mark_mft_record_dirty(ctx->ntfs_ino);
-                               ntfs_attr_reinit_search_ctx(ctx);
-                               err = ntfs_attr_lookup(ni->type, ni->name,
-                                               ni->name_len, CASE_SENSITIVE,
-                                               0, NULL, 0, ctx);
-                               if (unlikely(err)) {
-                                       status.attr_switched = 1;
-                                       break;
-                               }
-                               /* @m is not used any more so do not set it. */
-                               a = ctx->attr;
-                       }
-                       write_lock_irqsave(&ni->size_lock, flags);
-                       ni->itype.compressed.size += vol->cluster_size;
-                       a->data.non_resident.compressed_size =
-                                       cpu_to_sle64(ni->itype.compressed.size);
-                       write_unlock_irqrestore(&ni->size_lock, flags);
-               }
-               /* Ensure the changes make it to disk. */
-               flush_dcache_mft_record_page(ctx->ntfs_ino);
-               mark_mft_record_dirty(ctx->ntfs_ino);
-               ntfs_attr_put_search_ctx(ctx);
-               unmap_mft_record(base_ni);
-               /* Successfully filled the hole. */
-               status.runlist_merged = 0;
-               status.mft_attr_mapped = 0;
-               status.mp_rebuilt = 0;
-               /* Setup the map cache and use that to deal with the buffer. */
-               was_hole = true;
-               vcn = bh_cpos;
-               vcn_len = 1;
-               lcn_block = lcn << (vol->cluster_size_bits - blocksize_bits);
-               cdelta = 0;
-               /*
-                * If the number of remaining clusters in the @pages is less
-                * than or equal to the number of cached clusters, unlock the
-                * runlist as the map cache will be used from now on.
-                */
-               if (likely(vcn + vcn_len >= cend)) {
-                       up_write(&ni->runlist.lock);
-                       rl_write_locked = false;
-                       rl = NULL;
-               }
-               goto map_buffer_cached;
-       } while (bh_pos += blocksize, (bh = bh->b_this_page) != head);
-       /* If there are no errors, do the next page. */
-       if (likely(!err && ++u < nr_pages))
-               goto do_next_folio;
-       /* If there are no errors, release the runlist lock if we took it. */
-       if (likely(!err)) {
-               if (unlikely(rl_write_locked)) {
-                       up_write(&ni->runlist.lock);
-                       rl_write_locked = false;
-               } else if (unlikely(rl))
-                       up_read(&ni->runlist.lock);
-               rl = NULL;
-       }
-       /* If we issued read requests, let them complete. */
-       read_lock_irqsave(&ni->size_lock, flags);
-       initialized_size = ni->initialized_size;
-       read_unlock_irqrestore(&ni->size_lock, flags);
-       while (wait_bh > wait) {
-               bh = *--wait_bh;
-               wait_on_buffer(bh);
-               if (likely(buffer_uptodate(bh))) {
-                       folio = bh->b_folio;
-                       bh_pos = folio_pos(folio) + bh_offset(bh);
-                       /*
-                        * If the buffer overflows the initialized size, need
-                        * to zero the overflowing region.
-                        */
-                       if (unlikely(bh_pos + blocksize > initialized_size)) {
-                               int ofs = 0;
-
-                               if (likely(bh_pos < initialized_size))
-                                       ofs = initialized_size - bh_pos;
-                               folio_zero_segment(folio, bh_offset(bh) + ofs,
-                                               blocksize);
-                       }
-               } else /* if (unlikely(!buffer_uptodate(bh))) */
-                       err = -EIO;
-       }
-       if (likely(!err)) {
-               /* Clear buffer_new on all buffers. */
-               u = 0;
-               do {
-                       bh = head = page_buffers(pages[u]);
-                       do {
-                               if (buffer_new(bh))
-                                       clear_buffer_new(bh);
-                       } while ((bh = bh->b_this_page) != head);
-               } while (++u < nr_pages);
-               ntfs_debug("Done.");
-               return err;
-       }
-       if (status.attr_switched) {
-               /* Get back to the attribute extent we modified. */
-               ntfs_attr_reinit_search_ctx(ctx);
-               if (ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
-                               CASE_SENSITIVE, bh_cpos, NULL, 0, ctx)) {
-                       ntfs_error(vol->sb, "Failed to find required "
-                                       "attribute extent of attribute in "
-                                       "error code path.  Run chkdsk to "
-                                       "recover.");
-                       write_lock_irqsave(&ni->size_lock, flags);
-                       ni->itype.compressed.size += vol->cluster_size;
-                       write_unlock_irqrestore(&ni->size_lock, flags);
-                       flush_dcache_mft_record_page(ctx->ntfs_ino);
-                       mark_mft_record_dirty(ctx->ntfs_ino);
-                       /*
-                        * The only thing that is now wrong is the compressed
-                        * size of the base attribute extent which chkdsk
-                        * should be able to fix.
-                        */
-                       NVolSetErrors(vol);
-               } else {
-                       m = ctx->mrec;
-                       a = ctx->attr;
-                       status.attr_switched = 0;
-               }
-       }
-       /*
-        * If the runlist has been modified, need to restore it by punching a
-        * hole into it and we then need to deallocate the on-disk cluster as
-        * well.  Note, we only modify the runlist if we are able to generate a
-        * new mapping pairs array, i.e. only when the mapped attribute extent
-        * is not switched.
-        */
-       if (status.runlist_merged && !status.attr_switched) {
-               BUG_ON(!rl_write_locked);
-               /* Make the file cluster we allocated sparse in the runlist. */
-               if (ntfs_rl_punch_nolock(vol, &ni->runlist, bh_cpos, 1)) {
-                       ntfs_error(vol->sb, "Failed to punch hole into "
-                                       "attribute runlist in error code "
-                                       "path.  Run chkdsk to recover the "
-                                       "lost cluster.");
-                       NVolSetErrors(vol);
-               } else /* if (success) */ {
-                       status.runlist_merged = 0;
-                       /*
-                        * Deallocate the on-disk cluster we allocated but only
-                        * if we succeeded in punching its vcn out of the
-                        * runlist.
-                        */
-                       down_write(&vol->lcnbmp_lock);
-                       if (ntfs_bitmap_clear_bit(vol->lcnbmp_ino, lcn)) {
-                               ntfs_error(vol->sb, "Failed to release "
-                                               "allocated cluster in error "
-                                               "code path.  Run chkdsk to "
-                                               "recover the lost cluster.");
-                               NVolSetErrors(vol);
-                       }
-                       up_write(&vol->lcnbmp_lock);
-               }
-       }
-       /*
-        * Resize the attribute record to its old size and rebuild the mapping
-        * pairs array.  Note, we only can do this if the runlist has been
-        * restored to its old state which also implies that the mapped
-        * attribute extent is not switched.
-        */
-       if (status.mp_rebuilt && !status.runlist_merged) {
-               if (ntfs_attr_record_resize(m, a, attr_rec_len)) {
-                       ntfs_error(vol->sb, "Failed to restore attribute "
-                                       "record in error code path.  Run "
-                                       "chkdsk to recover.");
-                       NVolSetErrors(vol);
-               } else /* if (success) */ {
-                       if (ntfs_mapping_pairs_build(vol, (u8*)a +
-                                       le16_to_cpu(a->data.non_resident.
-                                       mapping_pairs_offset), attr_rec_len -
-                                       le16_to_cpu(a->data.non_resident.
-                                       mapping_pairs_offset), ni->runlist.rl,
-                                       vcn, highest_vcn, NULL)) {
-                               ntfs_error(vol->sb, "Failed to restore "
-                                               "mapping pairs array in error "
-                                               "code path.  Run chkdsk to "
-                                               "recover.");
-                               NVolSetErrors(vol);
-                       }
-                       flush_dcache_mft_record_page(ctx->ntfs_ino);
-                       mark_mft_record_dirty(ctx->ntfs_ino);
-               }
-       }
-       /* Release the mft record and the attribute. */
-       if (status.mft_attr_mapped) {
-               ntfs_attr_put_search_ctx(ctx);
-               unmap_mft_record(base_ni);
-       }
-       /* Release the runlist lock. */
-       if (rl_write_locked)
-               up_write(&ni->runlist.lock);
-       else if (rl)
-               up_read(&ni->runlist.lock);
-       /*
-        * Zero out any newly allocated blocks to avoid exposing stale data.
-        * If BH_New is set, we know that the block was newly allocated above
-        * and that it has not been fully zeroed and marked dirty yet.
-        */
-       nr_pages = u;
-       u = 0;
-       end = bh_cpos << vol->cluster_size_bits;
-       do {
-               folio = page_folio(pages[u]);
-               bh = head = folio_buffers(folio);
-               do {
-                       if (u == nr_pages &&
-                           folio_pos(folio) + bh_offset(bh) >= end)
-                               break;
-                       if (!buffer_new(bh))
-                               continue;
-                       clear_buffer_new(bh);
-                       if (!buffer_uptodate(bh)) {
-                               if (folio_test_uptodate(folio))
-                                       set_buffer_uptodate(bh);
-                               else {
-                                       folio_zero_range(folio, bh_offset(bh),
-                                                       blocksize);
-                                       set_buffer_uptodate(bh);
-                               }
-                       }
-                       mark_buffer_dirty(bh);
-               } while ((bh = bh->b_this_page) != head);
-       } while (++u <= nr_pages);
-       ntfs_error(vol->sb, "Failed.  Returning error code %i.", err);
-       return err;
-}
-
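Much of the control flow above keys off the special negative LCN values
returned by ntfs_rl_vcn_to_lcn(); a condensed dispatch sketch, assuming the
same meanings the code above relies on:

        lcn = ntfs_rl_vcn_to_lcn(rl, vcn);
        if (lcn >= 0) {
                /* Mapped: translate the cluster to device blocks. */
        } else if (lcn == LCN_HOLE) {
                /* Sparse range: allocate a real cluster before writing. */
        } else if (lcn == LCN_RL_NOT_MAPPED) {
                /* Runlist fragment not read in yet: map it and retry. */
        } else {
                /* LCN_ENOENT: beyond the end of the attribute. */
        }
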
-static inline void ntfs_flush_dcache_pages(struct page **pages,
-               unsigned nr_pages)
-{
-       BUG_ON(!nr_pages);
-       /*
-        * Warning: Do not do the decrement inside the argument of the call to
-        * flush_dcache_page() because it is a no-op macro on i386 and hence
-        * the decrement would never happen, so the loop would never terminate.
-        */
-       do {
-               --nr_pages;
-               flush_dcache_page(pages[nr_pages]);
-       } while (nr_pages > 0);
-}
-
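To make the warning concrete, here is a hypothetical stand-alone rendering of
the broken variant, assuming flush_dcache_page() expands to an empty macro
that discards its argument (the i386 case the comment describes):

        #define flush_dcache_page(page) do { } while (0)  /* no-op on i386 */

        static void broken_flush(struct page **pages, unsigned nr_pages)
        {
                do {
                        /*
                         * --nr_pages sits inside the macro argument, which
                         * the no-op expansion discards, so it never executes
                         * and the loop never terminates.
                         */
                        flush_dcache_page(pages[--nr_pages]);
                } while (nr_pages > 0);
        }
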
-/**
- * ntfs_commit_pages_after_non_resident_write - commit the received data
- * @pages:     array of destination pages
- * @nr_pages:  number of pages in @pages
- * @pos:       byte position in file at which the write begins
- * @bytes:     number of bytes to be written
- *
- * See description of ntfs_commit_pages_after_write(), below.
- */
-static inline int ntfs_commit_pages_after_non_resident_write(
-               struct page **pages, const unsigned nr_pages,
-               s64 pos, size_t bytes)
-{
-       s64 end, initialized_size;
-       struct inode *vi;
-       ntfs_inode *ni, *base_ni;
-       struct buffer_head *bh, *head;
-       ntfs_attr_search_ctx *ctx;
-       MFT_RECORD *m;
-       ATTR_RECORD *a;
-       unsigned long flags;
-       unsigned blocksize, u;
-       int err;
-
-       vi = pages[0]->mapping->host;
-       ni = NTFS_I(vi);
-       blocksize = vi->i_sb->s_blocksize;
-       end = pos + bytes;
-       u = 0;
-       do {
-               s64 bh_pos;
-               struct page *page;
-               bool partial;
-
-               page = pages[u];
-               bh_pos = (s64)page->index << PAGE_SHIFT;
-               bh = head = page_buffers(page);
-               partial = false;
-               do {
-                       s64 bh_end;
-
-                       bh_end = bh_pos + blocksize;
-                       if (bh_end <= pos || bh_pos >= end) {
-                               if (!buffer_uptodate(bh))
-                                       partial = true;
-                       } else {
-                               set_buffer_uptodate(bh);
-                               mark_buffer_dirty(bh);
-                       }
-               } while (bh_pos += blocksize, (bh = bh->b_this_page) != head);
-               /*
-                * If all buffers are now uptodate but the page is not, set the
-                * page uptodate.
-                */
-               if (!partial && !PageUptodate(page))
-                       SetPageUptodate(page);
-       } while (++u < nr_pages);
-       /*
-        * Finally, if we do not need to update initialized_size or i_size we
-        * are finished.
-        */
-       read_lock_irqsave(&ni->size_lock, flags);
-       initialized_size = ni->initialized_size;
-       read_unlock_irqrestore(&ni->size_lock, flags);
-       if (end <= initialized_size) {
-               ntfs_debug("Done.");
-               return 0;
-       }
-       /*
-        * Update initialized_size/i_size as appropriate, both in the inode and
-        * the mft record.
-        */
-       if (!NInoAttr(ni))
-               base_ni = ni;
-       else
-               base_ni = ni->ext.base_ntfs_ino;
-       /* Map, pin, and lock the mft record. */
-       m = map_mft_record(base_ni);
-       if (IS_ERR(m)) {
-               err = PTR_ERR(m);
-               m = NULL;
-               ctx = NULL;
-               goto err_out;
-       }
-       BUG_ON(!NInoNonResident(ni));
-       ctx = ntfs_attr_get_search_ctx(base_ni, m);
-       if (unlikely(!ctx)) {
-               err = -ENOMEM;
-               goto err_out;
-       }
-       err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
-                       CASE_SENSITIVE, 0, NULL, 0, ctx);
-       if (unlikely(err)) {
-               if (err == -ENOENT)
-                       err = -EIO;
-               goto err_out;
-       }
-       a = ctx->attr;
-       BUG_ON(!a->non_resident);
-       write_lock_irqsave(&ni->size_lock, flags);
-       BUG_ON(end > ni->allocated_size);
-       ni->initialized_size = end;
-       a->data.non_resident.initialized_size = cpu_to_sle64(end);
-       if (end > i_size_read(vi)) {
-               i_size_write(vi, end);
-               a->data.non_resident.data_size =
-                               a->data.non_resident.initialized_size;
-       }
-       write_unlock_irqrestore(&ni->size_lock, flags);
-       /* Mark the mft record dirty, so it gets written back. */
-       flush_dcache_mft_record_page(ctx->ntfs_ino);
-       mark_mft_record_dirty(ctx->ntfs_ino);
-       ntfs_attr_put_search_ctx(ctx);
-       unmap_mft_record(base_ni);
-       ntfs_debug("Done.");
-       return 0;
-err_out:
-       if (ctx)
-               ntfs_attr_put_search_ctx(ctx);
-       if (m)
-               unmap_mft_record(base_ni);
-       ntfs_error(vi->i_sb, "Failed to update initialized_size/i_size (error "
-                       "code %i).", err);
-       if (err != -ENOMEM)
-               NVolSetErrors(ni->vol);
-       return err;
-}
-
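The update above is written to preserve the usual NTFS ordering of the three
sizes; schematically, with the names used in this file:

        /*
         * Invariant assumed throughout:
         *
         *      initialized_size <= data_size (i_size) <= allocated_size
         *
         * Growing initialized_size past i_size therefore drags data_size
         * along with it, as done above, and BUG_ON(end > allocated_size)
         * enforces the upper bound.
         */
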
-/**
- * ntfs_commit_pages_after_write - commit the received data
- * @pages:     array of destination pages
- * @nr_pages:  number of pages in @pages
- * @pos:       byte position in file at which the write begins
- * @bytes:     number of bytes to be written
- *
- * This is called from ntfs_perform_write() with i_mutex held on the inode
- * (@pages[0]->mapping->host).  There are @nr_pages pages in @pages which are
- * locked but not kmap()ped.  The source data has already been copied into the
- * @pages.  ntfs_prepare_pages_for_non_resident_write() has been called before
- * the data was copied (for non-resident attributes only) and it returned
- * success.
- *
- * Need to set uptodate and mark dirty all buffers within the boundary of the
- * write.  If all buffers in a page are uptodate we set the page uptodate, too.
- *
- * Setting the buffers dirty ensures that they get written out later when
- * ntfs_writepage() is invoked by the VM.
- *
- * Finally, we need to update i_size and initialized_size as appropriate both
- * in the inode and the mft record.
- *
- * This is modelled after fs/buffer.c::generic_commit_write(), which marks
- * buffers uptodate and dirty, sets the page uptodate if all buffers in the
- * page are uptodate, and updates i_size if the end of io is beyond i_size.  In
- * that case, it also marks the inode dirty.
- *
- * If things have gone as outlined in
- * ntfs_prepare_pages_for_non_resident_write(), we do not need to do any page
- * content modifications here for non-resident attributes.  For resident
- * attributes we need to do the uptodate bringing here which we combine with
- * the copying into the mft record which means we save one atomic kmap.
- *
- * Return 0 on success or -errno on error.
- */
-static int ntfs_commit_pages_after_write(struct page **pages,
-               const unsigned nr_pages, s64 pos, size_t bytes)
-{
-       s64 end, initialized_size;
-       loff_t i_size;
-       struct inode *vi;
-       ntfs_inode *ni, *base_ni;
-       struct page *page;
-       ntfs_attr_search_ctx *ctx;
-       MFT_RECORD *m;
-       ATTR_RECORD *a;
-       char *kattr, *kaddr;
-       unsigned long flags;
-       u32 attr_len;
-       int err;
-
-       BUG_ON(!nr_pages);
-       BUG_ON(!pages);
-       page = pages[0];
-       BUG_ON(!page);
-       vi = page->mapping->host;
-       ni = NTFS_I(vi);
-       ntfs_debug("Entering for inode 0x%lx, attribute type 0x%x, start page "
-                       "index 0x%lx, nr_pages 0x%x, pos 0x%llx, bytes 0x%zx.",
-                       vi->i_ino, ni->type, page->index, nr_pages,
-                       (long long)pos, bytes);
-       if (NInoNonResident(ni))
-               return ntfs_commit_pages_after_non_resident_write(pages,
-                               nr_pages, pos, bytes);
-       BUG_ON(nr_pages > 1);
-       /*
-        * Attribute is resident, implying it is not compressed, encrypted, or
-        * sparse.
-        */
-       if (!NInoAttr(ni))
-               base_ni = ni;
-       else
-               base_ni = ni->ext.base_ntfs_ino;
-       BUG_ON(NInoNonResident(ni));
-       /* Map, pin, and lock the mft record. */
-       m = map_mft_record(base_ni);
-       if (IS_ERR(m)) {
-               err = PTR_ERR(m);
-               m = NULL;
-               ctx = NULL;
-               goto err_out;
-       }
-       ctx = ntfs_attr_get_search_ctx(base_ni, m);
-       if (unlikely(!ctx)) {
-               err = -ENOMEM;
-               goto err_out;
-       }
-       err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
-                       CASE_SENSITIVE, 0, NULL, 0, ctx);
-       if (unlikely(err)) {
-               if (err == -ENOENT)
-                       err = -EIO;
-               goto err_out;
-       }
-       a = ctx->attr;
-       BUG_ON(a->non_resident);
-       /* The total length of the attribute value. */
-       attr_len = le32_to_cpu(a->data.resident.value_length);
-       i_size = i_size_read(vi);
-       BUG_ON(attr_len != i_size);
-       BUG_ON(pos > attr_len);
-       end = pos + bytes;
-       BUG_ON(end > le32_to_cpu(a->length) -
-                       le16_to_cpu(a->data.resident.value_offset));
-       kattr = (u8*)a + le16_to_cpu(a->data.resident.value_offset);
-       kaddr = kmap_atomic(page);
-       /* Copy the received data from the page to the mft record. */
-       memcpy(kattr + pos, kaddr + pos, bytes);
-       /* Update the attribute length if necessary. */
-       if (end > attr_len) {
-               attr_len = end;
-               a->data.resident.value_length = cpu_to_le32(attr_len);
-       }
-       /*
-        * If the page is not uptodate, bring the out of bounds area(s)
-        * uptodate by copying data from the mft record to the page.
-        */
-       if (!PageUptodate(page)) {
-               if (pos > 0)
-                       memcpy(kaddr, kattr, pos);
-               if (end < attr_len)
-                       memcpy(kaddr + end, kattr + end, attr_len - end);
-               /* Zero the region outside the end of the attribute value. */
-               memset(kaddr + attr_len, 0, PAGE_SIZE - attr_len);
-               flush_dcache_page(page);
-               SetPageUptodate(page);
-       }
-       kunmap_atomic(kaddr);
-       /* Update initialized_size/i_size if necessary. */
-       read_lock_irqsave(&ni->size_lock, flags);
-       initialized_size = ni->initialized_size;
-       BUG_ON(end > ni->allocated_size);
-       read_unlock_irqrestore(&ni->size_lock, flags);
-       BUG_ON(initialized_size != i_size);
-       if (end > initialized_size) {
-               write_lock_irqsave(&ni->size_lock, flags);
-               ni->initialized_size = end;
-               i_size_write(vi, end);
-               write_unlock_irqrestore(&ni->size_lock, flags);
-       }
-       /* Mark the mft record dirty, so it gets written back. */
-       flush_dcache_mft_record_page(ctx->ntfs_ino);
-       mark_mft_record_dirty(ctx->ntfs_ino);
-       ntfs_attr_put_search_ctx(ctx);
-       unmap_mft_record(base_ni);
-       ntfs_debug("Done.");
-       return 0;
-err_out:
-       if (err == -ENOMEM) {
-               ntfs_warning(vi->i_sb, "Error allocating memory required to "
-                               "commit the write.");
-               if (PageUptodate(page)) {
-                       ntfs_warning(vi->i_sb, "Page is uptodate, setting "
-                                       "dirty so the write will be retried "
-                                       "later on by the VM.");
-                       /*
-                        * Put the page on mapping->dirty_pages, but leave its
-                        * buffers' dirty state as-is.
-                        */
-                       __set_page_dirty_nobuffers(page);
-                       err = 0;
-               } else
-                       ntfs_error(vi->i_sb, "Page is not uptodate.  Written "
-                                       "data has been lost.");
-       } else {
-               ntfs_error(vi->i_sb, "Resident attribute commit write failed "
-                               "with error %i.", err);
-               NVolSetErrors(ni->vol);
-       }
-       if (ctx)
-               ntfs_attr_put_search_ctx(ctx);
-       if (m)
-               unmap_mft_record(base_ni);
-       return err;
-}
-
-/*
- * Copy as much as we can into the pages and return the number of bytes which
- * were successfully copied.  If a fault is encountered then clear the pages
- * out to (ofs + bytes) and return the number of bytes which were copied.
- */
-static size_t ntfs_copy_from_user_iter(struct page **pages, unsigned nr_pages,
-               unsigned ofs, struct iov_iter *i, size_t bytes)
-{
-       struct page **last_page = pages + nr_pages;
-       size_t total = 0;
-       unsigned len, copied;
-
-       do {
-               len = PAGE_SIZE - ofs;
-               if (len > bytes)
-                       len = bytes;
-               copied = copy_page_from_iter_atomic(*pages, ofs, len, i);
-               total += copied;
-               bytes -= copied;
-               if (!bytes)
-                       break;
-               if (copied < len)
-                       goto err;
-               ofs = 0;
-       } while (++pages < last_page);
-out:
-       return total;
-err:
-       /* Zero the rest of the target like __copy_from_user(). */
-       len = PAGE_SIZE - copied;
-       do {
-               if (len > bytes)
-                       len = bytes;
-               zero_user(*pages, copied, len);
-               bytes -= len;
-               copied = 0;
-               len = PAGE_SIZE;
-       } while (++pages < last_page);
-       goto out;
-}
-
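A sketch of the caller-side contract this implies (hypothetical fragment; the
real retry logic lives in ntfs_perform_write() below):

        copied = ntfs_copy_from_user_iter(pages, do_pages, ofs, i, bytes);
        if (copied < bytes) {
                /*
                 * The rest of the target range, out to ofs + bytes, has been
                 * zeroed.  Revert the iterator and retry with a shorter
                 * length instead of exposing the zeroes as written data.
                 */
                iov_iter_revert(i, copied);
        }
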
-/**
- * ntfs_perform_write - perform buffered write to a file
- * @file:      file to write to
- * @i:         iov_iter with data to write
- * @pos:       byte offset in file at which to begin writing to
- */
-static ssize_t ntfs_perform_write(struct file *file, struct iov_iter *i,
-               loff_t pos)
-{
-       struct address_space *mapping = file->f_mapping;
-       struct inode *vi = mapping->host;
-       ntfs_inode *ni = NTFS_I(vi);
-       ntfs_volume *vol = ni->vol;
-       struct page *pages[NTFS_MAX_PAGES_PER_CLUSTER];
-       struct page *cached_page = NULL;
-       VCN last_vcn;
-       LCN lcn;
-       size_t bytes;
-       ssize_t status, written = 0;
-       unsigned nr_pages;
-
-       ntfs_debug("Entering for i_ino 0x%lx, attribute type 0x%x, pos "
-                       "0x%llx, count 0x%lx.", vi->i_ino,
-                       (unsigned)le32_to_cpu(ni->type),
-                       (unsigned long long)pos,
-                       (unsigned long)iov_iter_count(i));
-       /*
-        * If a previous ntfs_truncate() failed, repeat it and abort if it
-        * fails again.
-        */
-       if (unlikely(NInoTruncateFailed(ni))) {
-               int err;
-
-               inode_dio_wait(vi);
-               err = ntfs_truncate(vi);
-               if (err || NInoTruncateFailed(ni)) {
-                       if (!err)
-                               err = -EIO;
-                       ntfs_error(vol->sb, "Cannot perform write to inode "
-                                       "0x%lx, attribute type 0x%x, because "
-                                       "ntfs_truncate() failed (error code "
-                                       "%i).", vi->i_ino,
-                                       (unsigned)le32_to_cpu(ni->type), err);
-                       return err;
-               }
-       }
-       /*
-        * Determine the number of pages per cluster for non-resident
-        * attributes.
-        */
-       nr_pages = 1;
-       if (vol->cluster_size > PAGE_SIZE && NInoNonResident(ni))
-               nr_pages = vol->cluster_size >> PAGE_SHIFT;
-       last_vcn = -1;
-       do {
-               VCN vcn;
-               pgoff_t start_idx;
-               unsigned ofs, do_pages, u;
-               size_t copied;
-
-               start_idx = pos >> PAGE_SHIFT;
-               ofs = pos & ~PAGE_MASK;
-               bytes = PAGE_SIZE - ofs;
-               do_pages = 1;
-               if (nr_pages > 1) {
-                       vcn = pos >> vol->cluster_size_bits;
-                       if (vcn != last_vcn) {
-                               last_vcn = vcn;
-                               /*
-                                * Get the lcn of the vcn the write is in.  If
-                                * it is a hole, need to lock down all pages in
-                                * the cluster.
-                                */
-                               down_read(&ni->runlist.lock);
-                               lcn = ntfs_attr_vcn_to_lcn_nolock(ni, pos >>
-                                               vol->cluster_size_bits, false);
-                               up_read(&ni->runlist.lock);
-                               if (unlikely(lcn < LCN_HOLE)) {
-                                       if (lcn == LCN_ENOMEM)
-                                               status = -ENOMEM;
-                                       else {
-                                               status = -EIO;
-                                               ntfs_error(vol->sb, "Cannot "
-                                                       "perform write to "
-                                                       "inode 0x%lx, "
-                                                       "attribute type 0x%x, "
-                                                       "because the attribute "
-                                                       "is corrupt.",
-                                                       vi->i_ino, (unsigned)
-                                                       le32_to_cpu(ni->type));
-                                       }
-                                       break;
-                               }
-                               if (lcn == LCN_HOLE) {
-                                       start_idx = (pos & ~(s64)
-                                                       vol->cluster_size_mask)
-                                                       >> PAGE_SHIFT;
-                                       bytes = vol->cluster_size - (pos &
-                                                       vol->cluster_size_mask);
-                                       do_pages = nr_pages;
-                               }
-                       }
-               }
-               if (bytes > iov_iter_count(i))
-                       bytes = iov_iter_count(i);
-again:
-               /*
-                * Bring in the user page(s) that we will copy from _first_.
-                * Otherwise there is a nasty deadlock on copying from the same
-                * page(s) as we are writing to, without it/them being marked
-                * up-to-date.  Note, at present there is nothing to stop the
-                * pages being swapped out between us bringing them into memory
-                * and doing the actual copying.
-                */
-               if (unlikely(fault_in_iov_iter_readable(i, bytes))) {
-                       status = -EFAULT;
-                       break;
-               }
-               /* Get and lock @do_pages starting at index @start_idx. */
-               status = __ntfs_grab_cache_pages(mapping, start_idx, do_pages,
-                               pages, &cached_page);
-               if (unlikely(status))
-                       break;
-               /*
-                * For non-resident attributes, we need to fill any holes with
-                * actual clusters and ensure all buffers are mapped.  We also
-                * need to bring uptodate any buffers that are only partially
-                * being written to.
-                */
-               if (NInoNonResident(ni)) {
-                       status = ntfs_prepare_pages_for_non_resident_write(
-                                       pages, do_pages, pos, bytes);
-                       if (unlikely(status)) {
-                               do {
-                                       unlock_page(pages[--do_pages]);
-                                       put_page(pages[do_pages]);
-                               } while (do_pages);
-                               break;
-                       }
-               }
-               u = (pos >> PAGE_SHIFT) - pages[0]->index;
-               copied = ntfs_copy_from_user_iter(pages + u, do_pages - u, ofs,
-                                       i, bytes);
-               ntfs_flush_dcache_pages(pages + u, do_pages - u);
-               status = 0;
-               if (likely(copied == bytes)) {
-                       status = ntfs_commit_pages_after_write(pages, do_pages,
-                                       pos, bytes);
-               }
-               do {
-                       unlock_page(pages[--do_pages]);
-                       put_page(pages[do_pages]);
-               } while (do_pages);
-               if (unlikely(status < 0)) {
-                       iov_iter_revert(i, copied);
-                       break;
-               }
-               cond_resched();
-               if (unlikely(copied < bytes)) {
-                       iov_iter_revert(i, copied);
-                       if (copied)
-                               bytes = copied;
-                       else if (bytes > PAGE_SIZE - ofs)
-                               bytes = PAGE_SIZE - ofs;
-                       goto again;
-               }
-               pos += copied;
-               written += copied;
-               balance_dirty_pages_ratelimited(mapping);
-               if (fatal_signal_pending(current)) {
-                       status = -EINTR;
-                       break;
-               }
-       } while (iov_iter_count(i));
-       if (cached_page)
-               put_page(cached_page);
-       ntfs_debug("Done.  Returning %s (written 0x%lx, status %li).",
-                       written ? "written" : "status", (unsigned long)written,
-                       (long)status);
-       return written ? written : status;
-}
-
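The pre-faulting in the loop above guards against a self-deadlock that user
space can trigger by writing a file from a mapping of that same file; a
hypothetical user-space illustration:

        #include <fcntl.h>
        #include <sys/mman.h>
        #include <unistd.h>

        /*
         * The write source is the very page cache page being written to.
         * Without the fault_in_iov_iter_readable() call above, the kernel
         * would lock the destination page and then fault on the source,
         * which needs the same page lock.
         */
        void overlapping_write(int fd)
        {
                char *map = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);

                if (map != MAP_FAILED)
                        write(fd, map, 4096);
        }
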
-/**
- * ntfs_file_write_iter - simple wrapper for ntfs_perform_write()
- * @iocb:      IO state structure
- * @from:      iov_iter with data to write
- *
- * Basically the same as generic_file_write_iter() except that it ends up
- * calling ntfs_perform_write() instead of generic_perform_write() and that
- * O_DIRECT is not implemented.
- */
-static ssize_t ntfs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
-{
-       struct file *file = iocb->ki_filp;
-       struct inode *vi = file_inode(file);
-       ssize_t written = 0;
-       ssize_t err;
-
-       inode_lock(vi);
-       err = ntfs_prepare_file_for_write(iocb, from);
-       if (iov_iter_count(from) && !err)
-               written = ntfs_perform_write(file, from, iocb->ki_pos);
-       inode_unlock(vi);
-       iocb->ki_pos += written;
-       if (likely(written > 0))
-               written = generic_write_sync(iocb, written);
-       return written ? written : err;
-}
-
-/**
- * ntfs_file_fsync - sync a file to disk
- * @filp:      file to be synced
- * @datasync:  if non-zero only flush user data and not metadata
- *
- * Data integrity sync of a file to disk.  Used for fsync, fdatasync, and msync
- * system calls.  This function is inspired by fs/buffer.c::file_fsync().
- *
- * If @datasync is false, write the mft record and all associated extent mft
- * records as well as the $DATA attribute and then sync the block device.
- *
- * If @datasync is true and the attribute is non-resident, we skip the writing
- * of the mft record and all associated extent mft records (this might still
- * happen due to the write_inode_now() call).
- *
- * Also, if @datasync is true, we do not wait on the inode to be written out
- * but we always wait on the page cache pages to be written out.
- *
- * Locking: This function takes and releases i_mutex on the inode itself.
- *
- * TODO: We should probably also write all attribute/index inodes associated
- * with this inode but since we have no simple way of getting to them we ignore
- * this problem for now.
- */
-static int ntfs_file_fsync(struct file *filp, loff_t start, loff_t end,
-                          int datasync)
-{
-       struct inode *vi = filp->f_mapping->host;
-       int err, ret = 0;
-
-       ntfs_debug("Entering for inode 0x%lx.", vi->i_ino);
-
-       err = file_write_and_wait_range(filp, start, end);
-       if (err)
-               return err;
-       inode_lock(vi);
-
-       BUG_ON(S_ISDIR(vi->i_mode));
-       if (!datasync || !NInoNonResident(NTFS_I(vi)))
-               ret = __ntfs_write_inode(vi, 1);
-       write_inode_now(vi, !datasync);
-       /*
-        * NOTE: If we were to use mapping->private_list (see ext2 and
-        * fs/buffer.c) for dirty blocks then we could optimize the below to be
-        * sync_mapping_buffers(vi->i_mapping).
-        */
-       err = sync_blockdev(vi->i_sb->s_bdev);
-       if (unlikely(err && !ret))
-               ret = err;
-       if (likely(!ret))
-               ntfs_debug("Done.");
-       else
-               ntfs_warning(vi->i_sb, "Failed to f%ssync inode 0x%lx.  Error "
-                               "%u.", datasync ? "data" : "", vi->i_ino, -ret);
-       inode_unlock(vi);
-       return ret;
-}
-
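From user space, the @datasync flag above is driven by which system call is
issued; a minimal illustration:

        #include <unistd.h>

        /*
         * fdatasync() reaches ntfs_file_fsync() with @datasync set and may
         * skip writing the mft record for non-resident attributes; fsync()
         * always flushes the metadata as well.
         */
        int flush_ntfs_file(int fd, int data_only)
        {
                return data_only ? fdatasync(fd) : fsync(fd);
        }
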
-#endif /* NTFS_RW */
-
-const struct file_operations ntfs_file_ops = {
-       .llseek         = generic_file_llseek,
-       .read_iter      = generic_file_read_iter,
-#ifdef NTFS_RW
-       .write_iter     = ntfs_file_write_iter,
-       .fsync          = ntfs_file_fsync,
-#endif /* NTFS_RW */
-       .mmap           = generic_file_mmap,
-       .open           = ntfs_file_open,
-       .splice_read    = filemap_splice_read,
-};
-
-const struct inode_operations ntfs_file_inode_ops = {
-#ifdef NTFS_RW
-       .setattr        = ntfs_setattr,
-#endif /* NTFS_RW */
-};
-
-const struct file_operations ntfs_empty_file_ops = {};
-
-const struct inode_operations ntfs_empty_inode_ops = {};
diff --git a/fs/ntfs/index.c b/fs/ntfs/index.c
deleted file mode 100644 (file)
index d46c2c0..0000000
+++ /dev/null
@@ -1,440 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * index.c - NTFS kernel index handling.  Part of the Linux-NTFS project.
- *
- * Copyright (c) 2004-2005 Anton Altaparmakov
- */
-
-#include <linux/slab.h>
-
-#include "aops.h"
-#include "collate.h"
-#include "debug.h"
-#include "index.h"
-#include "ntfs.h"
-
-/**
- * ntfs_index_ctx_get - allocate and initialize a new index context
- * @idx_ni:    ntfs index inode with which to initialize the context
- *
- * Allocate a new index context, initialize it with @idx_ni and return it.
- * Return NULL if allocation failed.
- *
- * Locking:  Caller must hold i_mutex on the index inode.
- */
-ntfs_index_context *ntfs_index_ctx_get(ntfs_inode *idx_ni)
-{
-       ntfs_index_context *ictx;
-
-       ictx = kmem_cache_alloc(ntfs_index_ctx_cache, GFP_NOFS);
-       if (ictx)
-               *ictx = (ntfs_index_context){ .idx_ni = idx_ni };
-       return ictx;
-}
-
-/**
- * ntfs_index_ctx_put - release an index context
- * @ictx:      index context to free
- *
- * Release the index context @ictx, releasing all associated resources.
- *
- * Locking:  Caller must hold i_mutex on the index inode.
- */
-void ntfs_index_ctx_put(ntfs_index_context *ictx)
-{
-       if (ictx->entry) {
-               if (ictx->is_in_root) {
-                       if (ictx->actx)
-                               ntfs_attr_put_search_ctx(ictx->actx);
-                       if (ictx->base_ni)
-                               unmap_mft_record(ictx->base_ni);
-               } else {
-                       struct page *page = ictx->page;
-                       if (page) {
-                               BUG_ON(!PageLocked(page));
-                               unlock_page(page);
-                               ntfs_unmap_page(page);
-                       }
-               }
-       }
-       kmem_cache_free(ntfs_index_ctx_cache, ictx);
-}
-
-/**
- * ntfs_index_lookup - find a key in an index and return its index entry
- * @key:       [IN] key for which to search in the index
- * @key_len:   [IN] length of @key in bytes
- * @ictx:      [IN/OUT] context describing the index and the returned entry
- *
- * Before calling ntfs_index_lookup(), @ictx must have been obtained from a
- * call to ntfs_index_ctx_get().
- *
- * Look for the @key in the index specified by the index lookup context @ictx.
- * ntfs_index_lookup() walks the contents of the index looking for the @key.
- *
- * If the @key is found in the index, 0 is returned and @ictx is set up to
- * describe the index entry containing the matching @key.  @ictx->entry is the
- * index entry and @ictx->data and @ictx->data_len are the index entry data and
- * its length in bytes, respectively.
- *
- * If the @key is not found in the index, -ENOENT is returned and @ictx is
- * set up to describe the index entry whose key collates immediately after the
- * search @key, i.e. this is the position in the index at which an index entry
- * with a key of @key would need to be inserted.
- *
- * If an error occurs, the negative error code is returned and @ictx is left
- * untouched.
- *
- * When finished with the entry and its data, call ntfs_index_ctx_put() to free
- * the context and other associated resources.
- *
- * If the index entry was modified, call ntfs_index_entry_flush_dcache_page()
- * immediately after the modification and either ntfs_index_entry_mark_dirty()
- * or ntfs_index_entry_write() before the call to ntfs_index_ctx_put() to
- * ensure that the changes are written to disk.
- *
- * Locking:  - Caller must hold i_mutex on the index inode.
- *          - Each page cache page in the index allocation mapping must be
- *            locked whilst being accessed, otherwise we may find a corrupt
- *            page due to it being under ->writepage at the moment, which
- *            applies the mst protection fixups before writing out and then
- *            removes them again after the write is complete, after which it
- *            unlocks the page.
- */
-int ntfs_index_lookup(const void *key, const int key_len,
-               ntfs_index_context *ictx)
-{
-       VCN vcn, old_vcn;
-       ntfs_inode *idx_ni = ictx->idx_ni;
-       ntfs_volume *vol = idx_ni->vol;
-       struct super_block *sb = vol->sb;
-       ntfs_inode *base_ni = idx_ni->ext.base_ntfs_ino;
-       MFT_RECORD *m;
-       INDEX_ROOT *ir;
-       INDEX_ENTRY *ie;
-       INDEX_ALLOCATION *ia;
-       u8 *index_end, *kaddr;
-       ntfs_attr_search_ctx *actx;
-       struct address_space *ia_mapping;
-       struct page *page;
-       int rc, err = 0;
-
-       ntfs_debug("Entering.");
-       BUG_ON(!NInoAttr(idx_ni));
-       BUG_ON(idx_ni->type != AT_INDEX_ALLOCATION);
-       BUG_ON(idx_ni->nr_extents != -1);
-       BUG_ON(!base_ni);
-       BUG_ON(!key);
-       BUG_ON(key_len <= 0);
-       if (!ntfs_is_collation_rule_supported(
-                       idx_ni->itype.index.collation_rule)) {
-               ntfs_error(sb, "Index uses unsupported collation rule 0x%x.  "
-                               "Aborting lookup.", le32_to_cpu(
-                               idx_ni->itype.index.collation_rule));
-               return -EOPNOTSUPP;
-       }
-       /* Get hold of the mft record for the index inode. */
-       m = map_mft_record(base_ni);
-       if (IS_ERR(m)) {
-               ntfs_error(sb, "map_mft_record() failed with error code %ld.",
-                               -PTR_ERR(m));
-               return PTR_ERR(m);
-       }
-       actx = ntfs_attr_get_search_ctx(base_ni, m);
-       if (unlikely(!actx)) {
-               err = -ENOMEM;
-               goto err_out;
-       }
-       /* Find the index root attribute in the mft record. */
-       err = ntfs_attr_lookup(AT_INDEX_ROOT, idx_ni->name, idx_ni->name_len,
-                       CASE_SENSITIVE, 0, NULL, 0, actx);
-       if (unlikely(err)) {
-               if (err == -ENOENT) {
-                       ntfs_error(sb, "Index root attribute missing in inode "
-                                       "0x%lx.", idx_ni->mft_no);
-                       err = -EIO;
-               }
-               goto err_out;
-       }
-       /* Get to the index root value (it has been verified in read_inode). */
-       ir = (INDEX_ROOT*)((u8*)actx->attr +
-                       le16_to_cpu(actx->attr->data.resident.value_offset));
-       index_end = (u8*)&ir->index + le32_to_cpu(ir->index.index_length);
-       /* The first index entry. */
-       ie = (INDEX_ENTRY*)((u8*)&ir->index +
-                       le32_to_cpu(ir->index.entries_offset));
-       /*
-        * Loop until we exceed valid memory (corruption case) or until we
-        * reach the last entry.
-        */
-       for (;; ie = (INDEX_ENTRY*)((u8*)ie + le16_to_cpu(ie->length))) {
-               /* Bounds checks. */
-               if ((u8*)ie < (u8*)actx->mrec || (u8*)ie +
-                               sizeof(INDEX_ENTRY_HEADER) > index_end ||
-                               (u8*)ie + le16_to_cpu(ie->length) > index_end)
-                       goto idx_err_out;
-               /*
-                * The last entry cannot contain a key.  It can however contain
-                * a pointer to a child node in the B+tree so we just break out.
-                */
-               if (ie->flags & INDEX_ENTRY_END)
-                       break;
-               /* Further bounds checks. */
-               if ((u32)sizeof(INDEX_ENTRY_HEADER) +
-                               le16_to_cpu(ie->key_length) >
-                               le16_to_cpu(ie->data.vi.data_offset) ||
-                               (u32)le16_to_cpu(ie->data.vi.data_offset) +
-                               le16_to_cpu(ie->data.vi.data_length) >
-                               le16_to_cpu(ie->length))
-                       goto idx_err_out;
-               /* If the keys match perfectly, we set up @ictx and return 0. */
-               if ((key_len == le16_to_cpu(ie->key_length)) && !memcmp(key,
-                               &ie->key, key_len)) {
-ir_done:
-                       ictx->is_in_root = true;
-                       ictx->ir = ir;
-                       ictx->actx = actx;
-                       ictx->base_ni = base_ni;
-                       ictx->ia = NULL;
-                       ictx->page = NULL;
-done:
-                       ictx->entry = ie;
-                       ictx->data = (u8*)ie +
-                                       le16_to_cpu(ie->data.vi.data_offset);
-                       ictx->data_len = le16_to_cpu(ie->data.vi.data_length);
-                       ntfs_debug("Done.");
-                       return err;
-               }
-               /*
-                * Not a perfect match, need to do full-blown collation so we
-                * know which way in the B+tree we have to go.
-                */
-               rc = ntfs_collate(vol, idx_ni->itype.index.collation_rule, key,
-                               key_len, &ie->key, le16_to_cpu(ie->key_length));
-               /*
-                * If @key collates before the key of the current entry, there
-                * is definitely no such key in this index but we might need to
-                * descend into the B+tree so we just break out of the loop.
-                */
-               if (rc == -1)
-                       break;
-               /*
-                * A match should never happen as the memcmp() call should have
-        * caught it, but we still treat it correctly.
-                */
-               if (!rc)
-                       goto ir_done;
-               /* The keys are not equal, continue the search. */
-       }
-       /*
-        * We have finished with this index without success.  Check for the
-        * presence of a child node and, if not present, set up @ictx and return
-        * -ENOENT.
-        */
-       if (!(ie->flags & INDEX_ENTRY_NODE)) {
-               ntfs_debug("Entry not found.");
-               err = -ENOENT;
-               goto ir_done;
-       } /* Child node present, descend into it. */
-       /* Consistency check: Verify that an index allocation exists. */
-       if (!NInoIndexAllocPresent(idx_ni)) {
-               ntfs_error(sb, "No index allocation attribute but index entry "
-                               "requires one.  Inode 0x%lx is corrupt or "
-                               "driver bug.", idx_ni->mft_no);
-               goto err_out;
-       }
-       /* Get the starting vcn of the index_block holding the child node. */
-       vcn = sle64_to_cpup((sle64*)((u8*)ie + le16_to_cpu(ie->length) - 8));
-       ia_mapping = VFS_I(idx_ni)->i_mapping;
-       /*
-        * We are done with the index root and the mft record.  Release them,
-        * otherwise we deadlock with ntfs_map_page().
-        */
-       ntfs_attr_put_search_ctx(actx);
-       unmap_mft_record(base_ni);
-       m = NULL;
-       actx = NULL;
-descend_into_child_node:
-       /*
-        * Convert vcn to index into the index allocation attribute in units
-        * of PAGE_SIZE and map the page cache page, reading it from
-        * disk if necessary.
-        */
-       page = ntfs_map_page(ia_mapping, vcn <<
-                       idx_ni->itype.index.vcn_size_bits >> PAGE_SHIFT);
-       if (IS_ERR(page)) {
-               ntfs_error(sb, "Failed to map index page, error %ld.",
-                               -PTR_ERR(page));
-               err = PTR_ERR(page);
-               goto err_out;
-       }
-       lock_page(page);
-       kaddr = (u8*)page_address(page);
-fast_descend_into_child_node:
-       /* Get to the index allocation block. */
-       ia = (INDEX_ALLOCATION*)(kaddr + ((vcn <<
-                       idx_ni->itype.index.vcn_size_bits) & ~PAGE_MASK));
-       /* Bounds checks. */
-       if ((u8*)ia < kaddr || (u8*)ia > kaddr + PAGE_SIZE) {
-               ntfs_error(sb, "Out of bounds check failed.  Corrupt inode "
-                               "0x%lx or driver bug.", idx_ni->mft_no);
-               goto unm_err_out;
-       }
-       /* Catch multi sector transfer fixup errors. */
-       if (unlikely(!ntfs_is_indx_record(ia->magic))) {
-               ntfs_error(sb, "Index record with vcn 0x%llx is corrupt.  "
-                               "Corrupt inode 0x%lx.  Run chkdsk.",
-                               (long long)vcn, idx_ni->mft_no);
-               goto unm_err_out;
-       }
-       if (sle64_to_cpu(ia->index_block_vcn) != vcn) {
-               ntfs_error(sb, "Actual VCN (0x%llx) of index buffer is "
-                               "different from expected VCN (0x%llx).  Inode "
-                               "0x%lx is corrupt or driver bug.",
-                               (unsigned long long)
-                               sle64_to_cpu(ia->index_block_vcn),
-                               (unsigned long long)vcn, idx_ni->mft_no);
-               goto unm_err_out;
-       }
-       if (le32_to_cpu(ia->index.allocated_size) + 0x18 !=
-                       idx_ni->itype.index.block_size) {
-               ntfs_error(sb, "Index buffer (VCN 0x%llx) of inode 0x%lx has "
-                               "a size (%u) differing from the index "
-                               "specified size (%u).  Inode is corrupt or "
-                               "driver bug.", (unsigned long long)vcn,
-                               idx_ni->mft_no,
-                               le32_to_cpu(ia->index.allocated_size) + 0x18,
-                               idx_ni->itype.index.block_size);
-               goto unm_err_out;
-       }
-       index_end = (u8*)ia + idx_ni->itype.index.block_size;
-       if (index_end > kaddr + PAGE_SIZE) {
-               ntfs_error(sb, "Index buffer (VCN 0x%llx) of inode 0x%lx "
-                               "crosses page boundary.  Impossible!  Cannot "
-                               "access!  This is probably a bug in the "
-                               "driver.", (unsigned long long)vcn,
-                               idx_ni->mft_no);
-               goto unm_err_out;
-       }
-       index_end = (u8*)&ia->index + le32_to_cpu(ia->index.index_length);
-       if (index_end > (u8*)ia + idx_ni->itype.index.block_size) {
-               ntfs_error(sb, "Size of index buffer (VCN 0x%llx) of inode "
-                               "0x%lx exceeds maximum size.",
-                               (unsigned long long)vcn, idx_ni->mft_no);
-               goto unm_err_out;
-       }
-       /* The first index entry. */
-       ie = (INDEX_ENTRY*)((u8*)&ia->index +
-                       le32_to_cpu(ia->index.entries_offset));
-       /*
-        * Iterate similarly to the big loop above, but applied to the index buffer:
-        * loop until we exceed valid memory (corruption case) or until we
-        * reach the last entry.
-        */
-       for (;; ie = (INDEX_ENTRY*)((u8*)ie + le16_to_cpu(ie->length))) {
-               /* Bounds checks. */
-               if ((u8*)ie < (u8*)ia || (u8*)ie +
-                               sizeof(INDEX_ENTRY_HEADER) > index_end ||
-                               (u8*)ie + le16_to_cpu(ie->length) > index_end) {
-                       ntfs_error(sb, "Index entry out of bounds in inode "
-                                       "0x%lx.", idx_ni->mft_no);
-                       goto unm_err_out;
-               }
-               /*
-                * The last entry cannot contain a key.  It can however contain
-                * a pointer to a child node in the B+tree so we just break out.
-                */
-               if (ie->flags & INDEX_ENTRY_END)
-                       break;
-               /* Further bounds checks. */
-               if ((u32)sizeof(INDEX_ENTRY_HEADER) +
-                               le16_to_cpu(ie->key_length) >
-                               le16_to_cpu(ie->data.vi.data_offset) ||
-                               (u32)le16_to_cpu(ie->data.vi.data_offset) +
-                               le16_to_cpu(ie->data.vi.data_length) >
-                               le16_to_cpu(ie->length)) {
-                       ntfs_error(sb, "Index entry out of bounds in inode "
-                                       "0x%lx.", idx_ni->mft_no);
-                       goto unm_err_out;
-               }
-               /* If the keys match perfectly, we set up @ictx and return 0. */
-               if ((key_len == le16_to_cpu(ie->key_length)) && !memcmp(key,
-                               &ie->key, key_len)) {
-ia_done:
-                       ictx->is_in_root = false;
-                       ictx->actx = NULL;
-                       ictx->base_ni = NULL;
-                       ictx->ia = ia;
-                       ictx->page = page;
-                       goto done;
-               }
-               /*
-                * Not a perfect match, need to do full-blown collation so we
-                * know which way in the B+tree we have to go.
-                */
-               rc = ntfs_collate(vol, idx_ni->itype.index.collation_rule, key,
-                               key_len, &ie->key, le16_to_cpu(ie->key_length));
-               /*
-                * If @key collates before the key of the current entry, there
-                * is definitely no such key in this index but we might need to
-                * descend into the B+tree so we just break out of the loop.
-                */
-               if (rc == -1)
-                       break;
-               /*
-                * A match should never happen as the memcmp() call should have
-                * caught it, but we still treat it correctly.
-                */
-               if (!rc)
-                       goto ia_done;
-               /* The keys are not equal, continue the search. */
-       }
-       /*
-        * We have finished with this index buffer without success.  Check for
-        * the presence of a child node and if not present return -ENOENT.
-        */
-       if (!(ie->flags & INDEX_ENTRY_NODE)) {
-               ntfs_debug("Entry not found.");
-               err = -ENOENT;
-               goto ia_done;
-       }
-       if ((ia->index.flags & NODE_MASK) == LEAF_NODE) {
-               ntfs_error(sb, "Index entry with child node found in a leaf "
-                               "node in inode 0x%lx.", idx_ni->mft_no);
-               goto unm_err_out;
-       }
-       /* Child node present, descend into it. */
-       old_vcn = vcn;
-       vcn = sle64_to_cpup((sle64*)((u8*)ie + le16_to_cpu(ie->length) - 8));
-       if (vcn >= 0) {
-               /*
-                * If vcn is in the same page cache page as old_vcn we recycle
-                * the mapped page.
-                */
-               if (old_vcn << vol->cluster_size_bits >>
-                               PAGE_SHIFT == vcn <<
-                               vol->cluster_size_bits >>
-                               PAGE_SHIFT)
-                       goto fast_descend_into_child_node;
-               unlock_page(page);
-               ntfs_unmap_page(page);
-               goto descend_into_child_node;
-       }
-       ntfs_error(sb, "Negative child node vcn in inode 0x%lx.",
-                       idx_ni->mft_no);
-unm_err_out:
-       unlock_page(page);
-       ntfs_unmap_page(page);
-err_out:
-       if (!err)
-               err = -EIO;
-       if (actx)
-               ntfs_attr_put_search_ctx(actx);
-       if (m)
-               unmap_mft_record(base_ni);
-       return err;
-idx_err_out:
-       ntfs_error(sb, "Corrupt index.  Aborting lookup.");
-       goto err_out;
-}
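The doc comment of ntfs_index_lookup() above prescribes a fixed call sequence:
obtain a context, look up, consume the entry, put the context.  A minimal
sketch of a read-only caller following that sequence (myfs_find_entry is
hypothetical, not from this driver; the caller is assumed to hold i_mutex on
the index inode as documented):

	/* Hypothetical caller following the documented lookup sequence. */
	static int myfs_find_entry(ntfs_inode *idx_ni, const void *key,
			int key_len)
	{
		ntfs_index_context *ictx;
		int err;

		ictx = ntfs_index_ctx_get(idx_ni);
		if (!ictx)
			return -ENOMEM;
		err = ntfs_index_lookup(key, key_len, ictx);
		if (!err) {
			/* Match: ictx->entry, ictx->data and ictx->data_len
			   are valid; consume ictx->data here. */
		}
		/* On -ENOENT, ictx describes the insertion position instead. */
		ntfs_index_ctx_put(ictx);
		return err;
	}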
diff --git a/fs/ntfs/index.h b/fs/ntfs/index.h
deleted file mode 100644 (file)
index bb3c3ae..0000000
+++ /dev/null
@@ -1,134 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * index.h - Defines for NTFS kernel index handling.  Part of the Linux-NTFS
- *          project.
- *
- * Copyright (c) 2004 Anton Altaparmakov
- */
-
-#ifndef _LINUX_NTFS_INDEX_H
-#define _LINUX_NTFS_INDEX_H
-
-#include <linux/fs.h>
-
-#include "types.h"
-#include "layout.h"
-#include "inode.h"
-#include "attrib.h"
-#include "mft.h"
-#include "aops.h"
-
-/**
- * ntfs_index_context - ntfs index lookup context
- * @idx_ni:    index inode containing the @entry described by this context
- * @entry:     index entry (points into @ir or @ia)
- * @data:      index entry data (points into @entry)
- * @data_len:  length in bytes of @data
- * @is_in_root:        'true' if @entry is in @ir and 'false' if it is in @ia
- * @ir:                index root if @is_in_root and NULL otherwise
- * @actx:      attribute search context if @is_in_root and NULL otherwise
- * @base_ni:   base inode if @is_in_root and NULL otherwise
- * @ia:                index block if @is_in_root is 'false' and NULL otherwise
- * @page:      page if @is_in_root is 'false' and NULL otherwise
- *
- * @idx_ni is the index inode this context belongs to.
- *
- * @entry is the index entry described by this context.  @data and @data_len
- * are the index entry data and its length in bytes, respectively.  @data
- * simply points into @entry.  This is probably what the user is interested in.
- *
- * If @is_in_root is 'true', @entry is in the index root attribute @ir described
- * by the attribute search context @actx and the base inode @base_ni.  @ia and
- * @page are NULL in this case.
- *
- * If @is_in_root is 'false', @entry is in the index allocation attribute and @ia
- * and @page point to the index allocation block and the mapped, locked page it
- * is in, respectively.  @ir, @actx and @base_ni are NULL in this case.
- *
- * To obtain a context call ntfs_index_ctx_get().
- *
- * We use this context to allow ntfs_index_lookup() to return the found index
- * @entry and its @data without having to allocate a buffer and copy the @entry
- * and/or its @data into it.
- *
- * When finished with the @entry and its @data, call ntfs_index_ctx_put() to
- * free the context and other associated resources.
- *
- * If the index entry was modified, call ntfs_index_entry_flush_dcache_page()
- * immediately after the modification and either ntfs_index_entry_mark_dirty()
- * or ntfs_index_entry_write() before the call to ntfs_index_ctx_put() to
- * ensure that the changes are written to disk.
- */
-typedef struct {
-       ntfs_inode *idx_ni;
-       INDEX_ENTRY *entry;
-       void *data;
-       u16 data_len;
-       bool is_in_root;
-       INDEX_ROOT *ir;
-       ntfs_attr_search_ctx *actx;
-       ntfs_inode *base_ni;
-       INDEX_ALLOCATION *ia;
-       struct page *page;
-} ntfs_index_context;
-
-extern ntfs_index_context *ntfs_index_ctx_get(ntfs_inode *idx_ni);
-extern void ntfs_index_ctx_put(ntfs_index_context *ictx);
-
-extern int ntfs_index_lookup(const void *key, const int key_len,
-               ntfs_index_context *ictx);
-
-#ifdef NTFS_RW
-
-/**
- * ntfs_index_entry_flush_dcache_page - flush_dcache_page() for index entries
- * @ictx:      ntfs index context describing the index entry
- *
- * Call flush_dcache_page() for the page in which an index entry resides.
- *
- * This must be called every time an index entry is modified, just after the
- * modification.
- *
- * If the index entry is in the index root attribute, simply flush the page
- * containing the mft record containing the index root attribute.
- *
- * If the index entry is in an index block belonging to the index allocation
- * attribute, simply flush the page cache page containing the index block.
- */
-static inline void ntfs_index_entry_flush_dcache_page(ntfs_index_context *ictx)
-{
-       if (ictx->is_in_root)
-               flush_dcache_mft_record_page(ictx->actx->ntfs_ino);
-       else
-               flush_dcache_page(ictx->page);
-}
-
-/**
- * ntfs_index_entry_mark_dirty - mark an index entry dirty
- * @ictx:      ntfs index context describing the index entry
- *
- * Mark the index entry described by the index entry context @ictx dirty.
- *
- * If the index entry is in the index root attribute, simply mark the mft
- * record containing the index root attribute dirty.  This ensures the mft
- * record, and hence the index root attribute, will be written out to disk
- * later.
- *
- * If the index entry is in an index block belonging to the index allocation
- * attribute, mark the buffers belonging to the index record as well as the
- * page cache page the index block is in dirty.  This automatically marks the
- * VFS inode of the ntfs index inode to which the index entry belongs dirty,
- * too (I_DIRTY_PAGES) and this in turn ensures the page buffers, and hence the
- * dirty index block, will be written out to disk later.
- */
-static inline void ntfs_index_entry_mark_dirty(ntfs_index_context *ictx)
-{
-       if (ictx->is_in_root)
-               mark_mft_record_dirty(ictx->actx->ntfs_ino);
-       else
-               mark_ntfs_record_dirty(ictx->page,
-                               (u8*)ictx->ia - (u8*)page_address(ictx->page));
-}
-
-#endif /* NTFS_RW */
-
-#endif /* _LINUX_NTFS_INDEX_H */
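The comments in this header require a dcache flush plus a dirtying step
between modifying an entry and releasing the context.  A minimal sketch of
that write-out sequence (myfs_update_entry is hypothetical; assumes an
NTFS_RW build and a context returned by a successful ntfs_index_lookup()):

	/* Hypothetical in-place update of a looked-up index entry. */
	static void myfs_update_entry(ntfs_index_context *ictx,
			const void *new_data, u16 new_data_len)
	{
		BUG_ON(new_data_len > ictx->data_len);
		memcpy(ictx->data, new_data, new_data_len);
		/* Required immediately after the modification. */
		ntfs_index_entry_flush_dcache_page(ictx);
		/* Dirty the record so the change reaches disk... */
		ntfs_index_entry_mark_dirty(ictx);
		/* ...before the context and its locked page are released. */
		ntfs_index_ctx_put(ictx);
	}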
diff --git a/fs/ntfs/inode.c b/fs/ntfs/inode.c
deleted file mode 100644 (file)
index aba1e22..0000000
+++ /dev/null
@@ -1,3102 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * inode.c - NTFS kernel inode handling.
- *
- * Copyright (c) 2001-2014 Anton Altaparmakov and Tuxera Inc.
- */
-
-#include <linux/buffer_head.h>
-#include <linux/fs.h>
-#include <linux/mm.h>
-#include <linux/mount.h>
-#include <linux/mutex.h>
-#include <linux/pagemap.h>
-#include <linux/quotaops.h>
-#include <linux/slab.h>
-#include <linux/log2.h>
-
-#include "aops.h"
-#include "attrib.h"
-#include "bitmap.h"
-#include "dir.h"
-#include "debug.h"
-#include "inode.h"
-#include "lcnalloc.h"
-#include "malloc.h"
-#include "mft.h"
-#include "time.h"
-#include "ntfs.h"
-
-/**
- * ntfs_test_inode - compare two (possibly fake) inodes for equality
- * @vi:                vfs inode to test
- * @data:      ntfs attribute (ntfs_attr) to test @vi against
- *
- * Compare the ntfs attribute embedded in the ntfs specific part of the vfs
- * inode @vi for equality with the ntfs attribute @data.
- *
- * If searching for the normal file/directory inode, set @na->type to AT_UNUSED.
- * @na->name and @na->name_len are then ignored.
- *
- * Return 1 if the attributes match and 0 if not.
- *
- * NOTE: This function runs with the inode_hash_lock spin lock held so it is not
- * allowed to sleep.
- */
-int ntfs_test_inode(struct inode *vi, void *data)
-{
-       ntfs_attr *na = (ntfs_attr *)data;
-       ntfs_inode *ni;
-
-       if (vi->i_ino != na->mft_no)
-               return 0;
-       ni = NTFS_I(vi);
-       /* If !NInoAttr(ni), @vi is a normal file or directory inode. */
-       if (likely(!NInoAttr(ni))) {
-               /* If not looking for a normal inode this is a mismatch. */
-               if (unlikely(na->type != AT_UNUSED))
-                       return 0;
-       } else {
-               /* A fake inode describing an attribute. */
-               if (ni->type != na->type)
-                       return 0;
-               if (ni->name_len != na->name_len)
-                       return 0;
-               if (na->name_len && memcmp(ni->name, na->name,
-                               na->name_len * sizeof(ntfschar)))
-                       return 0;
-       }
-       /* Match! */
-       return 1;
-}
-
-/**
- * ntfs_init_locked_inode - initialize an inode
- * @vi:                vfs inode to initialize
- * @data:      ntfs attribute (ntfs_attr) with which to initialize @vi
- *
- * Initialize the vfs inode @vi with the values from the ntfs attribute @data in
- * order to enable ntfs_test_inode() to do its work.
- *
- * If initializing the normal file/directory inode, set @na->type to AT_UNUSED.
- * In that case, @na->name and @na->name_len should be set to NULL and 0,
- * respectively, although that is not strictly necessary as
- * ntfs_read_locked_inode() will fill them in later.
- *
- * Return 0 on success and -errno on error.
- *
- * NOTE: This function runs with the inode->i_lock spin lock held so it is not
- * allowed to sleep. (Hence the GFP_ATOMIC allocation.)
- */
-static int ntfs_init_locked_inode(struct inode *vi, void *data)
-{
-       ntfs_attr *na = (ntfs_attr *)data;
-       ntfs_inode *ni = NTFS_I(vi);
-
-       vi->i_ino = na->mft_no;
-
-       ni->type = na->type;
-       if (na->type == AT_INDEX_ALLOCATION)
-               NInoSetMstProtected(ni);
-
-       ni->name = na->name;
-       ni->name_len = na->name_len;
-
-       /* If initializing a normal inode, we are done. */
-       if (likely(na->type == AT_UNUSED)) {
-               BUG_ON(na->name);
-               BUG_ON(na->name_len);
-               return 0;
-       }
-
-       /* It is a fake inode. */
-       NInoSetAttr(ni);
-
-       /*
-        * We have the I30 global constant as an optimization as it is the name
-        * in >99.9% of named attributes! The other <0.1% incur a GFP_ATOMIC
-        * allocation but that is ok. And most attributes are unnamed anyway,
-        * thus the fraction of named attributes with name != I30 is actually
-        * absolutely tiny.
-        */
-       if (na->name_len && na->name != I30) {
-               unsigned int i;
-
-               BUG_ON(!na->name);
-               i = na->name_len * sizeof(ntfschar);
-               ni->name = kmalloc(i + sizeof(ntfschar), GFP_ATOMIC);
-               if (!ni->name)
-                       return -ENOMEM;
-               memcpy(ni->name, na->name, i);
-               ni->name[na->name_len] = 0;
-       }
-       return 0;
-}
-
-static int ntfs_read_locked_inode(struct inode *vi);
-static int ntfs_read_locked_attr_inode(struct inode *base_vi, struct inode *vi);
-static int ntfs_read_locked_index_inode(struct inode *base_vi,
-               struct inode *vi);
-
-/**
- * ntfs_iget - obtain a struct inode corresponding to a specific normal inode
- * @sb:                super block of mounted volume
- * @mft_no:    mft record number / inode number to obtain
- *
- * Obtain the struct inode corresponding to a specific normal inode (i.e. a
- * file or directory).
- *
- * If the inode is in the cache, it is just returned with an increased
- * reference count. Otherwise, a new struct inode is allocated and initialized,
- * and finally ntfs_read_locked_inode() is called to read in the inode and
- * fill in the remainder of the inode structure.
- *
- * Return the struct inode on success. Check the return value with IS_ERR() and
- * if true, the function failed and the error code is obtained from PTR_ERR().
- */
-struct inode *ntfs_iget(struct super_block *sb, unsigned long mft_no)
-{
-       struct inode *vi;
-       int err;
-       ntfs_attr na;
-
-       na.mft_no = mft_no;
-       na.type = AT_UNUSED;
-       na.name = NULL;
-       na.name_len = 0;
-
-       vi = iget5_locked(sb, mft_no, ntfs_test_inode,
-                       ntfs_init_locked_inode, &na);
-       if (unlikely(!vi))
-               return ERR_PTR(-ENOMEM);
-
-       err = 0;
-
-       /* If this is a freshly allocated inode, need to read it now. */
-       if (vi->i_state & I_NEW) {
-               err = ntfs_read_locked_inode(vi);
-               unlock_new_inode(vi);
-       }
-       /*
-        * There is no point in keeping bad inodes around if the failure was
-        * due to ENOMEM. We want to be able to retry again later.
-        */
-       if (unlikely(err == -ENOMEM)) {
-               iput(vi);
-               vi = ERR_PTR(err);
-       }
-       return vi;
-}
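As the doc comment says, the result must be checked with IS_ERR()/PTR_ERR()
rather than for NULL; the same convention applies to ntfs_attr_iget() and
ntfs_index_iget() below.  A minimal sketch (hypothetical caller):

	/* Hypothetical caller showing the documented error check. */
	static int myfs_check_inode(struct super_block *sb, unsigned long mft_no)
	{
		struct inode *vi = ntfs_iget(sb, mft_no);

		if (IS_ERR(vi))
			return PTR_ERR(vi);
		/* ... use vi ... */
		iput(vi);	/* Drop the reference when done. */
		return 0;
	}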
-
-/**
- * ntfs_attr_iget - obtain a struct inode corresponding to an attribute
- * @base_vi:   vfs base inode containing the attribute
- * @type:      attribute type
- * @name:      Unicode name of the attribute (NULL if unnamed)
- * @name_len:  length of @name in Unicode characters (0 if unnamed)
- *
- * Obtain the (fake) struct inode corresponding to the attribute specified by
- * @type, @name, and @name_len, which is present in the base mft record
- * specified by the vfs inode @base_vi.
- *
- * If the attribute inode is in the cache, it is just returned with an
- * increased reference count. Otherwise, a new struct inode is allocated and
- * initialized, and finally ntfs_read_locked_attr_inode() is called to read the
- * attribute and fill in the inode structure.
- *
- * Note, for index allocation attributes, you need to use ntfs_index_iget()
- * instead of ntfs_attr_iget() as working with indices is a lot more complex.
- *
- * Return the struct inode of the attribute inode on success. Check the return
- * value with IS_ERR() and if true, the function failed and the error code is
- * obtained from PTR_ERR().
- */
-struct inode *ntfs_attr_iget(struct inode *base_vi, ATTR_TYPE type,
-               ntfschar *name, u32 name_len)
-{
-       struct inode *vi;
-       int err;
-       ntfs_attr na;
-
-       /* Make sure no one calls ntfs_attr_iget() for indices. */
-       BUG_ON(type == AT_INDEX_ALLOCATION);
-
-       na.mft_no = base_vi->i_ino;
-       na.type = type;
-       na.name = name;
-       na.name_len = name_len;
-
-       vi = iget5_locked(base_vi->i_sb, na.mft_no, ntfs_test_inode,
-                       ntfs_init_locked_inode, &na);
-       if (unlikely(!vi))
-               return ERR_PTR(-ENOMEM);
-
-       err = 0;
-
-       /* If this is a freshly allocated inode, need to read it now. */
-       if (vi->i_state & I_NEW) {
-               err = ntfs_read_locked_attr_inode(base_vi, vi);
-               unlock_new_inode(vi);
-       }
-       /*
-        * There is no point in keeping bad attribute inodes around. This also
-        * simplifies things in that we never need to check for bad attribute
-        * inodes elsewhere.
-        */
-       if (unlikely(err)) {
-               iput(vi);
-               vi = ERR_PTR(err);
-       }
-       return vi;
-}
-
-/**
- * ntfs_index_iget - obtain a struct inode corresponding to an index
- * @base_vi:   vfs base inode containing the index related attributes
- * @name:      Unicode name of the index
- * @name_len:  length of @name in Unicode characters
- *
- * Obtain the (fake) struct inode corresponding to the index specified by @name
- * and @name_len, which is present in the base mft record specified by the vfs
- * inode @base_vi.
- *
- * If the index inode is in the cache, it is just returned with an increased
- * reference count.  Otherwise, a new struct inode is allocated and
- * initialized, and finally ntfs_read_locked_index_inode() is called to read
- * the index related attributes and fill in the inode structure.
- *
- * Return the struct inode of the index inode on success. Check the return
- * value with IS_ERR() and if true, the function failed and the error code is
- * obtained from PTR_ERR().
- */
-struct inode *ntfs_index_iget(struct inode *base_vi, ntfschar *name,
-               u32 name_len)
-{
-       struct inode *vi;
-       int err;
-       ntfs_attr na;
-
-       na.mft_no = base_vi->i_ino;
-       na.type = AT_INDEX_ALLOCATION;
-       na.name = name;
-       na.name_len = name_len;
-
-       vi = iget5_locked(base_vi->i_sb, na.mft_no, ntfs_test_inode,
-                       ntfs_init_locked_inode, &na);
-       if (unlikely(!vi))
-               return ERR_PTR(-ENOMEM);
-
-       err = 0;
-
-       /* If this is a freshly allocated inode, need to read it now. */
-       if (vi->i_state & I_NEW) {
-               err = ntfs_read_locked_index_inode(base_vi, vi);
-               unlock_new_inode(vi);
-       }
-       /*
-        * There is no point in keeping bad index inodes around.  This also
-        * simplifies things in that we never need to check for bad index
-        * inodes elsewhere.
-        */
-       if (unlikely(err)) {
-               iput(vi);
-               vi = ERR_PTR(err);
-       }
-       return vi;
-}
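A directory's filename index would be obtained through this helper rather than
ntfs_attr_iget().  A minimal sketch, reusing the I30 name constant referenced
above with name length 4, as this file's later $INDEX_ROOT lookup does (the
caller itself is hypothetical):

	/* Hypothetical: get the fake inode of a directory's $I30 index. */
	static struct inode *myfs_get_dir_index_inode(struct inode *dir_vi)
	{
		struct inode *ivi = ntfs_index_iget(dir_vi, I30, 4);

		if (IS_ERR(ivi))
			return ivi;	/* Error code via PTR_ERR(). */
		/* Release with iput(ivi) when finished with the index. */
		return ivi;
	}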
-
-struct inode *ntfs_alloc_big_inode(struct super_block *sb)
-{
-       ntfs_inode *ni;
-
-       ntfs_debug("Entering.");
-       ni = alloc_inode_sb(sb, ntfs_big_inode_cache, GFP_NOFS);
-       if (likely(ni != NULL)) {
-               ni->state = 0;
-               return VFS_I(ni);
-       }
-       ntfs_error(sb, "Allocation of NTFS big inode structure failed.");
-       return NULL;
-}
-
-void ntfs_free_big_inode(struct inode *inode)
-{
-       kmem_cache_free(ntfs_big_inode_cache, NTFS_I(inode));
-}
-
-static inline ntfs_inode *ntfs_alloc_extent_inode(void)
-{
-       ntfs_inode *ni;
-
-       ntfs_debug("Entering.");
-       ni = kmem_cache_alloc(ntfs_inode_cache, GFP_NOFS);
-       if (likely(ni != NULL)) {
-               ni->state = 0;
-               return ni;
-       }
-       ntfs_error(NULL, "Allocation of NTFS inode structure failed.");
-       return NULL;
-}
-
-static void ntfs_destroy_extent_inode(ntfs_inode *ni)
-{
-       ntfs_debug("Entering.");
-       BUG_ON(ni->page);
-       if (!atomic_dec_and_test(&ni->count))
-               BUG();
-       kmem_cache_free(ntfs_inode_cache, ni);
-}
-
-/*
- * The attribute runlist lock has separate locking rules from the
- * normal runlist lock, so split the two lock-classes:
- */
-static struct lock_class_key attr_list_rl_lock_class;
-
-/**
- * __ntfs_init_inode - initialize ntfs specific part of an inode
- * @sb:                super block of mounted volume
- * @ni:                freshly allocated ntfs inode to initialize
- *
- * Initialize an ntfs inode to defaults.
- *
- * NOTE: ni->mft_no, ni->state, ni->type, ni->name, and ni->name_len are left
- * untouched. Make sure to initialize them elsewhere.
- */
-void __ntfs_init_inode(struct super_block *sb, ntfs_inode *ni)
-{
-       ntfs_debug("Entering.");
-       rwlock_init(&ni->size_lock);
-       ni->initialized_size = ni->allocated_size = 0;
-       ni->seq_no = 0;
-       atomic_set(&ni->count, 1);
-       ni->vol = NTFS_SB(sb);
-       ntfs_init_runlist(&ni->runlist);
-       mutex_init(&ni->mrec_lock);
-       ni->page = NULL;
-       ni->page_ofs = 0;
-       ni->attr_list_size = 0;
-       ni->attr_list = NULL;
-       ntfs_init_runlist(&ni->attr_list_rl);
-       lockdep_set_class(&ni->attr_list_rl.lock,
-                               &attr_list_rl_lock_class);
-       ni->itype.index.block_size = 0;
-       ni->itype.index.vcn_size = 0;
-       ni->itype.index.collation_rule = 0;
-       ni->itype.index.block_size_bits = 0;
-       ni->itype.index.vcn_size_bits = 0;
-       mutex_init(&ni->extent_lock);
-       ni->nr_extents = 0;
-       ni->ext.base_ntfs_ino = NULL;
-}
-
-/*
- * Extent inodes get MFT-mapped in a nested way, while the base inode
- * is still mapped. Teach this nesting to the lock validator by creating
- * a separate class for the nested inodes' mrec_locks:
- */
-static struct lock_class_key extent_inode_mrec_lock_key;
-
-inline ntfs_inode *ntfs_new_extent_inode(struct super_block *sb,
-               unsigned long mft_no)
-{
-       ntfs_inode *ni = ntfs_alloc_extent_inode();
-
-       ntfs_debug("Entering.");
-       if (likely(ni != NULL)) {
-               __ntfs_init_inode(sb, ni);
-               lockdep_set_class(&ni->mrec_lock, &extent_inode_mrec_lock_key);
-               ni->mft_no = mft_no;
-               ni->type = AT_UNUSED;
-               ni->name = NULL;
-               ni->name_len = 0;
-       }
-       return ni;
-}
-
-/**
- * ntfs_is_extended_system_file - check if a file is in the $Extend directory
- * @ctx:       initialized attribute search context
- *
- * Search all file name attributes in the inode described by the attribute
- * search context @ctx and check if any of the names are in the $Extend system
- * directory.
- *
- * Return values:
- *        1: file is in $Extend directory
- *        0: file is not in $Extend directory
- *    -errno: failed to determine if the file is in the $Extend directory
- */
-static int ntfs_is_extended_system_file(ntfs_attr_search_ctx *ctx)
-{
-       int nr_links, err;
-
-       /* Restart search. */
-       ntfs_attr_reinit_search_ctx(ctx);
-
-       /* Get number of hard links. */
-       nr_links = le16_to_cpu(ctx->mrec->link_count);
-
-       /* Loop through all hard links. */
-       while (!(err = ntfs_attr_lookup(AT_FILE_NAME, NULL, 0, 0, 0, NULL, 0,
-                       ctx))) {
-               FILE_NAME_ATTR *file_name_attr;
-               ATTR_RECORD *attr = ctx->attr;
-               u8 *p, *p2;
-
-               nr_links--;
-               /*
-                * Maximum sanity checking as we are called on an inode that
-                * we suspect might be corrupt.
-                */
-               p = (u8*)attr + le32_to_cpu(attr->length);
-               if (p < (u8*)ctx->mrec || (u8*)p > (u8*)ctx->mrec +
-                               le32_to_cpu(ctx->mrec->bytes_in_use)) {
-err_corrupt_attr:
-                       ntfs_error(ctx->ntfs_ino->vol->sb, "Corrupt file name "
-                                       "attribute. You should run chkdsk.");
-                       return -EIO;
-               }
-               if (attr->non_resident) {
-                       ntfs_error(ctx->ntfs_ino->vol->sb, "Non-resident file "
-                                       "name. You should run chkdsk.");
-                       return -EIO;
-               }
-               if (attr->flags) {
-                       ntfs_error(ctx->ntfs_ino->vol->sb, "File name with "
-                                       "invalid flags. You should run "
-                                       "chkdsk.");
-                       return -EIO;
-               }
-               if (!(attr->data.resident.flags & RESIDENT_ATTR_IS_INDEXED)) {
-                       ntfs_error(ctx->ntfs_ino->vol->sb, "Unindexed file "
-                                       "name. You should run chkdsk.");
-                       return -EIO;
-               }
-               file_name_attr = (FILE_NAME_ATTR*)((u8*)attr +
-                               le16_to_cpu(attr->data.resident.value_offset));
-               p2 = (u8 *)file_name_attr + le32_to_cpu(attr->data.resident.value_length);
-               if (p2 < (u8*)attr || p2 > p)
-                       goto err_corrupt_attr;
-               /* This attribute is ok, but is it in the $Extend directory? */
-               if (MREF_LE(file_name_attr->parent_directory) == FILE_Extend)
-                       return 1;       /* YES, it's an extended system file. */
-       }
-       if (unlikely(err != -ENOENT))
-               return err;
-       if (unlikely(nr_links)) {
-               ntfs_error(ctx->ntfs_ino->vol->sb, "Inode hard link count "
-                               "doesn't match number of name attributes. You "
-                               "should run chkdsk.");
-               return -EIO;
-       }
-       return 0;       /* NO, it is not an extended system file. */
-}
-
-/**
- * ntfs_read_locked_inode - read an inode from its device
- * @vi:                inode to read
- *
- * ntfs_read_locked_inode() is called from ntfs_iget() to read the inode
- * described by @vi into memory from the device.
- *
- * The only fields in @vi that we need to/can look at when the function is
- * called are i_sb, pointing to the mounted device's super block, and i_ino,
- * the number of the inode to load.
- *
- * ntfs_read_locked_inode() maps, pins and locks the mft record number i_ino
- * for reading and sets up the necessary @vi fields as well as initializing
- * the ntfs inode.
- *
- * Q: What locks are held when the function is called?
- * A: i_state has I_NEW set, hence the inode is locked, and
- *    i_count is set to 1, so it is not going to go away.
- *    i_flags is set to 0 and we have no business touching it.  Only an ioctl()
- *    is allowed to write to it. We should of course be honouring it but
- *    we need to do that using the IS_* macros defined in include/linux/fs.h.
- *    In any case ntfs_read_locked_inode() has nothing to do with i_flags.
- *
- * Return 0 on success and -errno on error.  In the error case, the inode will
- * have had make_bad_inode() executed on it.
- */
-static int ntfs_read_locked_inode(struct inode *vi)
-{
-       ntfs_volume *vol = NTFS_SB(vi->i_sb);
-       ntfs_inode *ni;
-       struct inode *bvi;
-       MFT_RECORD *m;
-       ATTR_RECORD *a;
-       STANDARD_INFORMATION *si;
-       ntfs_attr_search_ctx *ctx;
-       int err = 0;
-
-       ntfs_debug("Entering for i_ino 0x%lx.", vi->i_ino);
-
-       /* Setup the generic vfs inode parts now. */
-       vi->i_uid = vol->uid;
-       vi->i_gid = vol->gid;
-       vi->i_mode = 0;
-
-       /*
-        * Initialize the ntfs specific part of @vi special casing
-        * FILE_MFT which we need to do at mount time.
-        */
-       if (vi->i_ino != FILE_MFT)
-               ntfs_init_big_inode(vi);
-       ni = NTFS_I(vi);
-
-       m = map_mft_record(ni);
-       if (IS_ERR(m)) {
-               err = PTR_ERR(m);
-               goto err_out;
-       }
-       ctx = ntfs_attr_get_search_ctx(ni, m);
-       if (!ctx) {
-               err = -ENOMEM;
-               goto unm_err_out;
-       }
-
-       if (!(m->flags & MFT_RECORD_IN_USE)) {
-               ntfs_error(vi->i_sb, "Inode is not in use!");
-               goto unm_err_out;
-       }
-       if (m->base_mft_record) {
-               ntfs_error(vi->i_sb, "Inode is an extent inode!");
-               goto unm_err_out;
-       }
-
-       /* Transfer information from mft record into vfs and ntfs inodes. */
-       vi->i_generation = ni->seq_no = le16_to_cpu(m->sequence_number);
-
-       /*
-        * FIXME: Keep in mind that link_count is two for files which have both
-        * a long file name and a short file name as separate entries, so if
-        * we are hiding short file names this will be too high. Either we need
-        * to account for the short file names by subtracting them or we need
-        * to make sure we delete files even though i_nlink is not zero which
-        * might be tricky due to vfs interactions. Need to think about this
-        * some more when implementing the unlink command.
-        */
-       set_nlink(vi, le16_to_cpu(m->link_count));
-       /*
-        * FIXME: Reparse points can have the directory bit set even though
-        * they would be S_IFLNK. Need to deal with this further below when we
-        * implement reparse points / symbolic links but it will do for now.
-        * Also if not a directory, it could be something else, rather than
-        * a regular file. But again, will do for now.
-        */
-       /* Everyone gets all permissions. */
-       vi->i_mode |= S_IRWXUGO;
-       /* If read-only, no one gets write permissions. */
-       if (IS_RDONLY(vi))
-               vi->i_mode &= ~S_IWUGO;
-       if (m->flags & MFT_RECORD_IS_DIRECTORY) {
-               vi->i_mode |= S_IFDIR;
-               /*
-                * Apply the directory permissions mask set in the mount
-                * options.
-                */
-               vi->i_mode &= ~vol->dmask;
-               /* Things break without this kludge! */
-               if (vi->i_nlink > 1)
-                       set_nlink(vi, 1);
-       } else {
-               vi->i_mode |= S_IFREG;
-               /* Apply the file permissions mask set in the mount options. */
-               vi->i_mode &= ~vol->fmask;
-       }
-       /*
-        * Find the standard information attribute in the mft record. At this
-        * stage we haven't set up the attribute list stuff yet, so this could
-        * in fact fail if the standard information is in an extent record, but
-        * I don't think this actually ever happens.
-        */
-       err = ntfs_attr_lookup(AT_STANDARD_INFORMATION, NULL, 0, 0, 0, NULL, 0,
-                       ctx);
-       if (unlikely(err)) {
-               if (err == -ENOENT) {
-                       /*
-                        * TODO: We should be performing a hot fix here (if the
-                        * recover mount option is set) by creating a new
-                        * attribute.
-                        */
-                       ntfs_error(vi->i_sb, "$STANDARD_INFORMATION attribute "
-                                       "is missing.");
-               }
-               goto unm_err_out;
-       }
-       a = ctx->attr;
-       /* Get the standard information attribute value. */
-       if ((u8 *)a + le16_to_cpu(a->data.resident.value_offset)
-                       + le32_to_cpu(a->data.resident.value_length) >
-                       (u8 *)ctx->mrec + vol->mft_record_size) {
-               ntfs_error(vi->i_sb, "Corrupt standard information attribute in inode.");
-               goto unm_err_out;
-       }
-       si = (STANDARD_INFORMATION*)((u8*)a +
-                       le16_to_cpu(a->data.resident.value_offset));
-
-       /* Transfer information from the standard information into vi. */
-       /*
-        * Note: The i_?times do not quite map perfectly onto the NTFS times,
-        * but they are close enough, and in the end it doesn't really matter
-        * that much...
-        */
-       /*
-        * mtime is the last change of the data within the file. Not changed
-        * when only metadata is changed, e.g. a rename doesn't affect mtime.
-        */
-       inode_set_mtime_to_ts(vi, ntfs2utc(si->last_data_change_time));
-       /*
-        * ctime is the last change of the metadata of the file. This obviously
-        * always changes, when mtime is changed. ctime can be changed on its
-        * own, mtime is then not changed, e.g. when a file is renamed.
-        */
-       inode_set_ctime_to_ts(vi, ntfs2utc(si->last_mft_change_time));
-       /*
-        * Last access to the data within the file. Not changed during a rename
-        * for example but changed whenever the file is written to.
-        */
-       inode_set_atime_to_ts(vi, ntfs2utc(si->last_access_time));
-
-       /* Find the attribute list attribute if present. */
-       ntfs_attr_reinit_search_ctx(ctx);
-       err = ntfs_attr_lookup(AT_ATTRIBUTE_LIST, NULL, 0, 0, 0, NULL, 0, ctx);
-       if (err) {
-               if (unlikely(err != -ENOENT)) {
-                       ntfs_error(vi->i_sb, "Failed to lookup attribute list "
-                                       "attribute.");
-                       goto unm_err_out;
-               }
-       } else /* if (!err) */ {
-               if (vi->i_ino == FILE_MFT)
-                       goto skip_attr_list_load;
-               ntfs_debug("Attribute list found in inode 0x%lx.", vi->i_ino);
-               NInoSetAttrList(ni);
-               a = ctx->attr;
-               if (a->flags & ATTR_COMPRESSION_MASK) {
-                       ntfs_error(vi->i_sb, "Attribute list attribute is "
-                                       "compressed.");
-                       goto unm_err_out;
-               }
-               if (a->flags & ATTR_IS_ENCRYPTED ||
-                               a->flags & ATTR_IS_SPARSE) {
-                       if (a->non_resident) {
-                               ntfs_error(vi->i_sb, "Non-resident attribute "
-                                               "list attribute is encrypted/"
-                                               "sparse.");
-                               goto unm_err_out;
-                       }
-                       ntfs_warning(vi->i_sb, "Resident attribute list "
-                                       "attribute in inode 0x%lx is marked "
-                                       "encrypted/sparse which is not true.  "
-                                       "However, Windows allows this and "
-                                       "chkdsk does not detect or correct it "
-                                       "so we will just ignore the invalid "
-                                       "flags and pretend they are not set.",
-                                       vi->i_ino);
-               }
-               /* Now allocate memory for the attribute list. */
-               ni->attr_list_size = (u32)ntfs_attr_size(a);
-               ni->attr_list = ntfs_malloc_nofs(ni->attr_list_size);
-               if (!ni->attr_list) {
-                       ntfs_error(vi->i_sb, "Not enough memory to allocate "
-                                       "buffer for attribute list.");
-                       err = -ENOMEM;
-                       goto unm_err_out;
-               }
-               if (a->non_resident) {
-                       NInoSetAttrListNonResident(ni);
-                       if (a->data.non_resident.lowest_vcn) {
-                               ntfs_error(vi->i_sb, "Attribute list has non "
-                                               "zero lowest_vcn.");
-                               goto unm_err_out;
-                       }
-                       /*
-                        * Set up the runlist. No need for locking as we have
-                        * exclusive access to the inode at this time.
-                        */
-                       ni->attr_list_rl.rl = ntfs_mapping_pairs_decompress(vol,
-                                       a, NULL);
-                       if (IS_ERR(ni->attr_list_rl.rl)) {
-                               err = PTR_ERR(ni->attr_list_rl.rl);
-                               ni->attr_list_rl.rl = NULL;
-                               ntfs_error(vi->i_sb, "Mapping pairs "
-                                               "decompression failed.");
-                               goto unm_err_out;
-                       }
-                       /* Now load the attribute list. */
-                       if ((err = load_attribute_list(vol, &ni->attr_list_rl,
-                                       ni->attr_list, ni->attr_list_size,
-                                       sle64_to_cpu(a->data.non_resident.
-                                       initialized_size)))) {
-                               ntfs_error(vi->i_sb, "Failed to load "
-                                               "attribute list attribute.");
-                               goto unm_err_out;
-                       }
-               } else /* if (!a->non_resident) */ {
-                       if ((u8*)a + le16_to_cpu(a->data.resident.value_offset)
-                                       + le32_to_cpu(
-                                       a->data.resident.value_length) >
-                                       (u8*)ctx->mrec + vol->mft_record_size) {
-                               ntfs_error(vi->i_sb, "Corrupt attribute list "
-                                               "in inode.");
-                               goto unm_err_out;
-                       }
-                       /* Now copy the attribute list. */
-                       memcpy(ni->attr_list, (u8*)a + le16_to_cpu(
-                                       a->data.resident.value_offset),
-                                       le32_to_cpu(
-                                       a->data.resident.value_length));
-               }
-       }
-skip_attr_list_load:
-       /*
-        * If an attribute list is present we now have the attribute list value
-        * in ntfs_ino->attr_list and it is ntfs_ino->attr_list_size bytes.
-        */
-       if (S_ISDIR(vi->i_mode)) {
-               loff_t bvi_size;
-               ntfs_inode *bni;
-               INDEX_ROOT *ir;
-               u8 *ir_end, *index_end;
-
-               /* It is a directory, find index root attribute. */
-               ntfs_attr_reinit_search_ctx(ctx);
-               err = ntfs_attr_lookup(AT_INDEX_ROOT, I30, 4, CASE_SENSITIVE,
-                               0, NULL, 0, ctx);
-               if (unlikely(err)) {
-                       if (err == -ENOENT) {
-                               // FIXME: File is corrupt! Hot-fix with empty
-                               // index root attribute if recovery option is
-                               // set.
-                               ntfs_error(vi->i_sb, "$INDEX_ROOT attribute "
-                                               "is missing.");
-                       }
-                       goto unm_err_out;
-               }
-               a = ctx->attr;
-               /* Set up the state. */
-               if (unlikely(a->non_resident)) {
-                       ntfs_error(vol->sb, "$INDEX_ROOT attribute is not "
-                                       "resident.");
-                       goto unm_err_out;
-               }
-               /* Ensure the attribute name is placed before the value. */
-               if (unlikely(a->name_length && (le16_to_cpu(a->name_offset) >=
-                               le16_to_cpu(a->data.resident.value_offset)))) {
-                       ntfs_error(vol->sb, "$INDEX_ROOT attribute name is "
-                                       "placed after the attribute value.");
-                       goto unm_err_out;
-               }
-               /*
-                * Compressed/encrypted index root just means that the newly
-                * created files in that directory should be created compressed/
-                * encrypted. However, index root cannot be both compressed and
-                * encrypted.
-                */
-               if (a->flags & ATTR_COMPRESSION_MASK)
-                       NInoSetCompressed(ni);
-               if (a->flags & ATTR_IS_ENCRYPTED) {
-                       if (a->flags & ATTR_COMPRESSION_MASK) {
-                               ntfs_error(vi->i_sb, "Found encrypted and "
-                                               "compressed attribute.");
-                               goto unm_err_out;
-                       }
-                       NInoSetEncrypted(ni);
-               }
-               if (a->flags & ATTR_IS_SPARSE)
-                       NInoSetSparse(ni);
-               ir = (INDEX_ROOT*)((u8*)a +
-                               le16_to_cpu(a->data.resident.value_offset));
-               ir_end = (u8*)ir + le32_to_cpu(a->data.resident.value_length);
-               if (ir_end > (u8*)ctx->mrec + vol->mft_record_size) {
-                       ntfs_error(vi->i_sb, "$INDEX_ROOT attribute is "
-                                       "corrupt.");
-                       goto unm_err_out;
-               }
-               index_end = (u8*)&ir->index +
-                               le32_to_cpu(ir->index.index_length);
-               if (index_end > ir_end) {
-                       ntfs_error(vi->i_sb, "Directory index is corrupt.");
-                       goto unm_err_out;
-               }
-               if (ir->type != AT_FILE_NAME) {
-                       ntfs_error(vi->i_sb, "Indexed attribute is not "
-                                       "$FILE_NAME.");
-                       goto unm_err_out;
-               }
-               if (ir->collation_rule != COLLATION_FILE_NAME) {
-                       ntfs_error(vi->i_sb, "Index collation rule is not "
-                                       "COLLATION_FILE_NAME.");
-                       goto unm_err_out;
-               }
-               ni->itype.index.collation_rule = ir->collation_rule;
-               ni->itype.index.block_size = le32_to_cpu(ir->index_block_size);
-               if (ni->itype.index.block_size &
-                               (ni->itype.index.block_size - 1)) {
-                       ntfs_error(vi->i_sb, "Index block size (%u) is not a "
-                                       "power of two.",
-                                       ni->itype.index.block_size);
-                       goto unm_err_out;
-               }
-               if (ni->itype.index.block_size > PAGE_SIZE) {
-                       ntfs_error(vi->i_sb, "Index block size (%u) > "
-                                       "PAGE_SIZE (%ld) is not "
-                                       "supported.  Sorry.",
-                                       ni->itype.index.block_size,
-                                       PAGE_SIZE);
-                       err = -EOPNOTSUPP;
-                       goto unm_err_out;
-               }
-               if (ni->itype.index.block_size < NTFS_BLOCK_SIZE) {
-                       ntfs_error(vi->i_sb, "Index block size (%u) < "
-                                       "NTFS_BLOCK_SIZE (%i) is not "
-                                       "supported.  Sorry.",
-                                       ni->itype.index.block_size,
-                                       NTFS_BLOCK_SIZE);
-                       err = -EOPNOTSUPP;
-                       goto unm_err_out;
-               }
-               ni->itype.index.block_size_bits =
-                               ffs(ni->itype.index.block_size) - 1;
-               /* Determine the size of a vcn in the directory index. */
-               if (vol->cluster_size <= ni->itype.index.block_size) {
-                       ni->itype.index.vcn_size = vol->cluster_size;
-                       ni->itype.index.vcn_size_bits = vol->cluster_size_bits;
-               } else {
-                       ni->itype.index.vcn_size = vol->sector_size;
-                       ni->itype.index.vcn_size_bits = vol->sector_size_bits;
-               }
-
-               /* Setup the index allocation attribute, even if not present. */
-               NInoSetMstProtected(ni);
-               ni->type = AT_INDEX_ALLOCATION;
-               ni->name = I30;
-               ni->name_len = 4;
-
-               if (!(ir->index.flags & LARGE_INDEX)) {
-                       /* No index allocation. */
-                       vi->i_size = ni->initialized_size =
-                                       ni->allocated_size = 0;
-                       /* We are done with the mft record, so we release it. */
-                       ntfs_attr_put_search_ctx(ctx);
-                       unmap_mft_record(ni);
-                       m = NULL;
-                       ctx = NULL;
-                       goto skip_large_dir_stuff;
-               } /* LARGE_INDEX: Index allocation present. Setup state. */
-               NInoSetIndexAllocPresent(ni);
-               /* Find index allocation attribute. */
-               ntfs_attr_reinit_search_ctx(ctx);
-               err = ntfs_attr_lookup(AT_INDEX_ALLOCATION, I30, 4,
-                               CASE_SENSITIVE, 0, NULL, 0, ctx);
-               if (unlikely(err)) {
-                       if (err == -ENOENT)
-                               ntfs_error(vi->i_sb, "$INDEX_ALLOCATION "
-                                               "attribute is not present but "
-                                               "$INDEX_ROOT indicated it is.");
-                       else
-                               ntfs_error(vi->i_sb, "Failed to lookup "
-                                               "$INDEX_ALLOCATION "
-                                               "attribute.");
-                       goto unm_err_out;
-               }
-               a = ctx->attr;
-               if (!a->non_resident) {
-                       ntfs_error(vi->i_sb, "$INDEX_ALLOCATION attribute "
-                                       "is resident.");
-                       goto unm_err_out;
-               }
-               /*
-                * Ensure the attribute name is placed before the mapping pairs
-                * array.
-                */
-               if (unlikely(a->name_length && (le16_to_cpu(a->name_offset) >=
-                               le16_to_cpu(
-                               a->data.non_resident.mapping_pairs_offset)))) {
-                       ntfs_error(vol->sb, "$INDEX_ALLOCATION attribute name "
-                                       "is placed after the mapping pairs "
-                                       "array.");
-                       goto unm_err_out;
-               }
-               if (a->flags & ATTR_IS_ENCRYPTED) {
-                       ntfs_error(vi->i_sb, "$INDEX_ALLOCATION attribute "
-                                       "is encrypted.");
-                       goto unm_err_out;
-               }
-               if (a->flags & ATTR_IS_SPARSE) {
-                       ntfs_error(vi->i_sb, "$INDEX_ALLOCATION attribute "
-                                       "is sparse.");
-                       goto unm_err_out;
-               }
-               if (a->flags & ATTR_COMPRESSION_MASK) {
-                       ntfs_error(vi->i_sb, "$INDEX_ALLOCATION attribute "
-                                       "is compressed.");
-                       goto unm_err_out;
-               }
-               if (a->data.non_resident.lowest_vcn) {
-                       ntfs_error(vi->i_sb, "First extent of "
-                                       "$INDEX_ALLOCATION attribute has "
-                                       "non-zero lowest_vcn.");
-                       goto unm_err_out;
-               }
-               vi->i_size = sle64_to_cpu(a->data.non_resident.data_size);
-               ni->initialized_size = sle64_to_cpu(
-                               a->data.non_resident.initialized_size);
-               ni->allocated_size = sle64_to_cpu(
-                               a->data.non_resident.allocated_size);
-               /*
-                * We are done with the mft record, so we release it. Otherwise
-                * we would deadlock in ntfs_attr_iget().
-                */
-               ntfs_attr_put_search_ctx(ctx);
-               unmap_mft_record(ni);
-               m = NULL;
-               ctx = NULL;
-               /* Get the index bitmap attribute inode. */
-               bvi = ntfs_attr_iget(vi, AT_BITMAP, I30, 4);
-               if (IS_ERR(bvi)) {
-                       ntfs_error(vi->i_sb, "Failed to get bitmap attribute.");
-                       err = PTR_ERR(bvi);
-                       goto unm_err_out;
-               }
-               bni = NTFS_I(bvi);
-               if (NInoCompressed(bni) || NInoEncrypted(bni) ||
-                               NInoSparse(bni)) {
-                       ntfs_error(vi->i_sb, "$BITMAP attribute is compressed "
-                                       "and/or encrypted and/or sparse.");
-                       goto iput_unm_err_out;
-               }
-               /* Consistency check bitmap size vs. index allocation size. */
-               bvi_size = i_size_read(bvi);
-               if ((bvi_size << 3) < (vi->i_size >>
-                               ni->itype.index.block_size_bits)) {
-                       ntfs_error(vi->i_sb, "Index bitmap too small (0x%llx) "
-                                       "for index allocation (0x%llx).",
-                                       bvi_size << 3, vi->i_size);
-                       goto iput_unm_err_out;
-               }
-               /* No longer need the bitmap attribute inode. */
-               iput(bvi);
-skip_large_dir_stuff:
-               /* Setup the operations for this inode. */
-               vi->i_op = &ntfs_dir_inode_ops;
-               vi->i_fop = &ntfs_dir_ops;
-               vi->i_mapping->a_ops = &ntfs_mst_aops;
-       } else {
-               /* It is a file. */
-               ntfs_attr_reinit_search_ctx(ctx);
-
-               /* Setup the data attribute, even if not present. */
-               ni->type = AT_DATA;
-               ni->name = NULL;
-               ni->name_len = 0;
-
-               /* Find first extent of the unnamed data attribute. */
-               err = ntfs_attr_lookup(AT_DATA, NULL, 0, 0, 0, NULL, 0, ctx);
-               if (unlikely(err)) {
-                       vi->i_size = ni->initialized_size =
-                                       ni->allocated_size = 0;
-                       if (err != -ENOENT) {
-                               ntfs_error(vi->i_sb, "Failed to lookup $DATA "
-                                               "attribute.");
-                               goto unm_err_out;
-                       }
-                       /*
-                        * FILE_Secure does not have an unnamed $DATA
-                        * attribute, so we special case it here.
-                        */
-                       if (vi->i_ino == FILE_Secure)
-                               goto no_data_attr_special_case;
-                       /*
-                        * Most if not all the system files in the $Extend
-                        * system directory do not have unnamed data
-                        * attributes so we need to check if the parent
-                        * directory of the file is FILE_Extend and, if it is,
-                        * ignore this error. To do this we need to get the
-                        * name of this inode from the mft record as the name
-                        * contains the back reference to the parent directory.
-                        */
-                       if (ntfs_is_extended_system_file(ctx) > 0)
-                               goto no_data_attr_special_case;
-                       // FIXME: File is corrupt! Hot-fix with empty data
-                       // attribute if recovery option is set.
-                       ntfs_error(vi->i_sb, "$DATA attribute is missing.");
-                       goto unm_err_out;
-               }
-               a = ctx->attr;
-               /* Setup the state. */
-               if (a->flags & (ATTR_COMPRESSION_MASK | ATTR_IS_SPARSE)) {
-                       if (a->flags & ATTR_COMPRESSION_MASK) {
-                               NInoSetCompressed(ni);
-                               if (vol->cluster_size > 4096) {
-                                       ntfs_error(vi->i_sb, "Found "
-                                                       "compressed data but "
-                                                       "compression is "
-                                                       "disabled due to "
-                                                       "cluster size (%i) > "
-                                                       "4kiB.",
-                                                       vol->cluster_size);
-                                       goto unm_err_out;
-                               }
-                               if ((a->flags & ATTR_COMPRESSION_MASK)
-                                               != ATTR_IS_COMPRESSED) {
-                                       ntfs_error(vi->i_sb, "Found unknown "
-                                                       "compression method "
-                                                       "or corrupt file.");
-                                       goto unm_err_out;
-                               }
-                       }
-                       if (a->flags & ATTR_IS_SPARSE)
-                               NInoSetSparse(ni);
-               }
-               if (a->flags & ATTR_IS_ENCRYPTED) {
-                       if (NInoCompressed(ni)) {
-                               ntfs_error(vi->i_sb, "Found encrypted and "
-                                               "compressed data.");
-                               goto unm_err_out;
-                       }
-                       NInoSetEncrypted(ni);
-               }
-               if (a->non_resident) {
-                       NInoSetNonResident(ni);
-                       if (NInoCompressed(ni) || NInoSparse(ni)) {
-                               if (NInoCompressed(ni) && a->data.non_resident.
-                                               compression_unit != 4) {
-                                       ntfs_error(vi->i_sb, "Found "
-                                                       "non-standard "
-                                                       "compression unit (%u "
-                                                       "instead of 4).  "
-                                                       "Cannot handle this.",
-                                                       a->data.non_resident.
-                                                       compression_unit);
-                                       err = -EOPNOTSUPP;
-                                       goto unm_err_out;
-                               }
-                               if (a->data.non_resident.compression_unit) {
-                                       ni->itype.compressed.block_size = 1U <<
-                                                       (a->data.non_resident.
-                                                       compression_unit +
-                                                       vol->cluster_size_bits);
-                                       ni->itype.compressed.block_size_bits =
-                                                       ffs(ni->itype.
-                                                       compressed.
-                                                       block_size) - 1;
-                                       ni->itype.compressed.block_clusters =
-                                                       1U << a->data.
-                                                       non_resident.
-                                                       compression_unit;
-                               } else {
-                                       ni->itype.compressed.block_size = 0;
-                                       ni->itype.compressed.block_size_bits =
-                                                       0;
-                                       ni->itype.compressed.block_clusters =
-                                                       0;
-                               }
-                               ni->itype.compressed.size = sle64_to_cpu(
-                                               a->data.non_resident.
-                                               compressed_size);
-                       }
-                       if (a->data.non_resident.lowest_vcn) {
-                               ntfs_error(vi->i_sb, "First extent of $DATA "
-                                               "attribute has non-zero "
-                                               "lowest_vcn.");
-                               goto unm_err_out;
-                       }
-                       vi->i_size = sle64_to_cpu(
-                                       a->data.non_resident.data_size);
-                       ni->initialized_size = sle64_to_cpu(
-                                       a->data.non_resident.initialized_size);
-                       ni->allocated_size = sle64_to_cpu(
-                                       a->data.non_resident.allocated_size);
-               } else { /* Resident attribute. */
-                       vi->i_size = ni->initialized_size = le32_to_cpu(
-                                       a->data.resident.value_length);
-                       ni->allocated_size = le32_to_cpu(a->length) -
-                                       le16_to_cpu(
-                                       a->data.resident.value_offset);
-                       if (vi->i_size > ni->allocated_size) {
-                               ntfs_error(vi->i_sb, "Resident data attribute "
-                                               "is corrupt (size exceeds "
-                                               "allocation).");
-                               goto unm_err_out;
-                       }
-               }
-no_data_attr_special_case:
-               /* We are done with the mft record, so we release it. */
-               ntfs_attr_put_search_ctx(ctx);
-               unmap_mft_record(ni);
-               m = NULL;
-               ctx = NULL;
-               /* Setup the operations for this inode. */
-               vi->i_op = &ntfs_file_inode_ops;
-               vi->i_fop = &ntfs_file_ops;
-               vi->i_mapping->a_ops = &ntfs_normal_aops;
-               if (NInoMstProtected(ni))
-                       vi->i_mapping->a_ops = &ntfs_mst_aops;
-               else if (NInoCompressed(ni))
-                       vi->i_mapping->a_ops = &ntfs_compressed_aops;
-       }
-       /*
-        * The number of 512-byte blocks used on disk (for stat). This is
-        * inaccurate insofar as it doesn't account for any named streams or
-        * other special non-resident attributes, but that is how Windows
-        * works, too, so we are at least consistent with Windows, if not
-        * entirely consistent with the Linux Way. Doing it the Linux Way
-        * would cause a significant slowdown as it would involve iterating
-        * over all attributes in the mft record and adding the
-        * allocated/compressed sizes of all non-resident attributes present
-        * to give us the correct size that should go into i_blocks (after
-        * division by 512).
-        */
-       if (S_ISREG(vi->i_mode) && (NInoCompressed(ni) || NInoSparse(ni)))
-               vi->i_blocks = ni->itype.compressed.size >> 9;
-       else
-               vi->i_blocks = ni->allocated_size >> 9;
-       ntfs_debug("Done.");
-       return 0;
-iput_unm_err_out:
-       iput(bvi);
-unm_err_out:
-       if (!err)
-               err = -EIO;
-       if (ctx)
-               ntfs_attr_put_search_ctx(ctx);
-       if (m)
-               unmap_mft_record(ni);
-err_out:
-       ntfs_error(vol->sb, "Failed with error code %i.  Marking corrupt "
-                       "inode 0x%lx as bad.  Run chkdsk.", err, vi->i_ino);
-       make_bad_inode(vi);
-       if (err != -EOPNOTSUPP && err != -ENOMEM)
-               NVolSetErrors(vol);
-       return err;
-}
-
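The i_blocks computation just above stores the 512-byte block count that
userspace later observes through stat(2). A minimal userspace sketch of that
relationship (not part of the driver; the mount point and file name are
hypothetical):

	#include <stdio.h>
	#include <sys/stat.h>

	int main(void)
	{
		struct stat st;

		if (stat("/mnt/ntfs/somefile", &st) != 0) {
			perror("stat");
			return 1;
		}
		/*
		 * st_blocks is reported in 512-byte units, i.e. the value
		 * the driver derived as allocated_size >> 9 (or
		 * compressed.size >> 9 for compressed/sparse regular files).
		 */
		printf("on-disk 512-byte blocks: %lld\n",
				(long long)st.st_blocks);
		return 0;
	}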
-/**
- * ntfs_read_locked_attr_inode - read an attribute inode from its base inode
- * @base_vi:   base inode
- * @vi:                attribute inode to read
- *
- * ntfs_read_locked_attr_inode() is called from ntfs_attr_iget() to read the
- * attribute inode described by @vi into memory from the base mft record
 - * described by @base_vi.
- *
- * ntfs_read_locked_attr_inode() maps, pins and locks the base inode for
- * reading and looks up the attribute described by @vi before setting up the
- * necessary fields in @vi as well as initializing the ntfs inode.
- *
- * Q: What locks are held when the function is called?
- * A: i_state has I_NEW set, hence the inode is locked, also
- *    i_count is set to 1, so it is not going to go away
- *
- * Return 0 on success and -errno on error.  In the error case, the inode will
- * have had make_bad_inode() executed on it.
- *
- * Note this cannot be called for AT_INDEX_ALLOCATION.
- */
-static int ntfs_read_locked_attr_inode(struct inode *base_vi, struct inode *vi)
-{
-       ntfs_volume *vol = NTFS_SB(vi->i_sb);
-       ntfs_inode *ni, *base_ni;
-       MFT_RECORD *m;
-       ATTR_RECORD *a;
-       ntfs_attr_search_ctx *ctx;
-       int err = 0;
-
-       ntfs_debug("Entering for i_ino 0x%lx.", vi->i_ino);
-
-       ntfs_init_big_inode(vi);
-
-       ni      = NTFS_I(vi);
-       base_ni = NTFS_I(base_vi);
-
-       /* Just mirror the values from the base inode. */
-       vi->i_uid       = base_vi->i_uid;
-       vi->i_gid       = base_vi->i_gid;
-       set_nlink(vi, base_vi->i_nlink);
-       inode_set_mtime_to_ts(vi, inode_get_mtime(base_vi));
-       inode_set_ctime_to_ts(vi, inode_get_ctime(base_vi));
-       inode_set_atime_to_ts(vi, inode_get_atime(base_vi));
-       vi->i_generation = ni->seq_no = base_ni->seq_no;
-
-       /* Set inode type to zero but preserve permissions. */
-       vi->i_mode      = base_vi->i_mode & ~S_IFMT;
-
-       m = map_mft_record(base_ni);
-       if (IS_ERR(m)) {
-               err = PTR_ERR(m);
-               goto err_out;
-       }
-       ctx = ntfs_attr_get_search_ctx(base_ni, m);
-       if (!ctx) {
-               err = -ENOMEM;
-               goto unm_err_out;
-       }
-       /* Find the attribute. */
-       err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
-                       CASE_SENSITIVE, 0, NULL, 0, ctx);
-       if (unlikely(err))
-               goto unm_err_out;
-       a = ctx->attr;
-       if (a->flags & (ATTR_COMPRESSION_MASK | ATTR_IS_SPARSE)) {
-               if (a->flags & ATTR_COMPRESSION_MASK) {
-                       NInoSetCompressed(ni);
-                       if ((ni->type != AT_DATA) || (ni->type == AT_DATA &&
-                                       ni->name_len)) {
-                               ntfs_error(vi->i_sb, "Found compressed "
-                                               "non-data or named data "
-                                               "attribute.  Please report "
-                                               "you saw this message to "
-                                               "linux-ntfs-dev@lists."
-                                               "sourceforge.net");
-                               goto unm_err_out;
-                       }
-                       if (vol->cluster_size > 4096) {
-                               ntfs_error(vi->i_sb, "Found compressed "
-                                               "attribute but compression is "
-                                               "disabled due to cluster size "
-                                               "(%i) > 4kiB.",
-                                               vol->cluster_size);
-                               goto unm_err_out;
-                       }
-                       if ((a->flags & ATTR_COMPRESSION_MASK) !=
-                                       ATTR_IS_COMPRESSED) {
-                               ntfs_error(vi->i_sb, "Found unknown "
-                                               "compression method.");
-                               goto unm_err_out;
-                       }
-               }
-               /*
-                * The compressed/sparse flag set in an index root just means
-                * to compress all files.
-                */
-               if (NInoMstProtected(ni) && ni->type != AT_INDEX_ROOT) {
-                       ntfs_error(vi->i_sb, "Found mst protected attribute "
-                                       "but the attribute is %s.  Please "
-                                       "report you saw this message to "
-                                       "linux-ntfs-dev@lists.sourceforge.net",
-                                       NInoCompressed(ni) ? "compressed" :
-                                       "sparse");
-                       goto unm_err_out;
-               }
-               if (a->flags & ATTR_IS_SPARSE)
-                       NInoSetSparse(ni);
-       }
-       if (a->flags & ATTR_IS_ENCRYPTED) {
-               if (NInoCompressed(ni)) {
-                       ntfs_error(vi->i_sb, "Found encrypted and compressed "
-                                       "data.");
-                       goto unm_err_out;
-               }
-               /*
-                * The encryption flag set in an index root just means to
-                * encrypt all files.
-                */
-               if (NInoMstProtected(ni) && ni->type != AT_INDEX_ROOT) {
-                       ntfs_error(vi->i_sb, "Found mst protected attribute "
-                                       "but the attribute is encrypted.  "
-                                       "Please report you saw this message "
-                                       "to linux-ntfs-dev@lists.sourceforge."
-                                       "net");
-                       goto unm_err_out;
-               }
-               if (ni->type != AT_DATA) {
-                       ntfs_error(vi->i_sb, "Found encrypted non-data "
-                                       "attribute.");
-                       goto unm_err_out;
-               }
-               NInoSetEncrypted(ni);
-       }
-       if (!a->non_resident) {
-               /* Ensure the attribute name is placed before the value. */
-               if (unlikely(a->name_length && (le16_to_cpu(a->name_offset) >=
-                               le16_to_cpu(a->data.resident.value_offset)))) {
-                       ntfs_error(vol->sb, "Attribute name is placed after "
-                                       "the attribute value.");
-                       goto unm_err_out;
-               }
-               if (NInoMstProtected(ni)) {
-                       ntfs_error(vi->i_sb, "Found mst protected attribute "
-                                       "but the attribute is resident.  "
-                                       "Please report you saw this message to "
-                                       "linux-ntfs-dev@lists.sourceforge.net");
-                       goto unm_err_out;
-               }
-               vi->i_size = ni->initialized_size = le32_to_cpu(
-                               a->data.resident.value_length);
-               ni->allocated_size = le32_to_cpu(a->length) -
-                               le16_to_cpu(a->data.resident.value_offset);
-               if (vi->i_size > ni->allocated_size) {
-                       ntfs_error(vi->i_sb, "Resident attribute is corrupt "
-                                       "(size exceeds allocation).");
-                       goto unm_err_out;
-               }
-       } else {
-               NInoSetNonResident(ni);
-               /*
-                * Ensure the attribute name is placed before the mapping pairs
-                * array.
-                */
-               if (unlikely(a->name_length && (le16_to_cpu(a->name_offset) >=
-                               le16_to_cpu(
-                               a->data.non_resident.mapping_pairs_offset)))) {
-                       ntfs_error(vol->sb, "Attribute name is placed after "
-                                       "the mapping pairs array.");
-                       goto unm_err_out;
-               }
-               if (NInoCompressed(ni) || NInoSparse(ni)) {
-                       if (NInoCompressed(ni) && a->data.non_resident.
-                                       compression_unit != 4) {
-                               ntfs_error(vi->i_sb, "Found non-standard "
-                                               "compression unit (%u instead "
-                                               "of 4).  Cannot handle this.",
-                                               a->data.non_resident.
-                                               compression_unit);
-                               err = -EOPNOTSUPP;
-                               goto unm_err_out;
-                       }
-                       if (a->data.non_resident.compression_unit) {
-                               ni->itype.compressed.block_size = 1U <<
-                                               (a->data.non_resident.
-                                               compression_unit +
-                                               vol->cluster_size_bits);
-                               ni->itype.compressed.block_size_bits =
-                                               ffs(ni->itype.compressed.
-                                               block_size) - 1;
-                               ni->itype.compressed.block_clusters = 1U <<
-                                               a->data.non_resident.
-                                               compression_unit;
-                       } else {
-                               ni->itype.compressed.block_size = 0;
-                               ni->itype.compressed.block_size_bits = 0;
-                               ni->itype.compressed.block_clusters = 0;
-                       }
-                       ni->itype.compressed.size = sle64_to_cpu(
-                                       a->data.non_resident.compressed_size);
-               }
-               if (a->data.non_resident.lowest_vcn) {
-                       ntfs_error(vi->i_sb, "First extent of attribute has "
-                                       "non-zero lowest_vcn.");
-                       goto unm_err_out;
-               }
-               vi->i_size = sle64_to_cpu(a->data.non_resident.data_size);
-               ni->initialized_size = sle64_to_cpu(
-                               a->data.non_resident.initialized_size);
-               ni->allocated_size = sle64_to_cpu(
-                               a->data.non_resident.allocated_size);
-       }
-       vi->i_mapping->a_ops = &ntfs_normal_aops;
-       if (NInoMstProtected(ni))
-               vi->i_mapping->a_ops = &ntfs_mst_aops;
-       else if (NInoCompressed(ni))
-               vi->i_mapping->a_ops = &ntfs_compressed_aops;
-       if ((NInoCompressed(ni) || NInoSparse(ni)) && ni->type != AT_INDEX_ROOT)
-               vi->i_blocks = ni->itype.compressed.size >> 9;
-       else
-               vi->i_blocks = ni->allocated_size >> 9;
-       /*
-        * Make sure the base inode does not go away and attach it to the
-        * attribute inode.
-        */
-       igrab(base_vi);
-       ni->ext.base_ntfs_ino = base_ni;
-       ni->nr_extents = -1;
-
-       ntfs_attr_put_search_ctx(ctx);
-       unmap_mft_record(base_ni);
-
-       ntfs_debug("Done.");
-       return 0;
-
-unm_err_out:
-       if (!err)
-               err = -EIO;
-       if (ctx)
-               ntfs_attr_put_search_ctx(ctx);
-       unmap_mft_record(base_ni);
-err_out:
-       ntfs_error(vol->sb, "Failed with error code %i while reading attribute "
-                       "inode (mft_no 0x%lx, type 0x%x, name_len %i).  "
-                       "Marking corrupt inode and base inode 0x%lx as bad.  "
-                       "Run chkdsk.", err, vi->i_ino, ni->type, ni->name_len,
-                       base_vi->i_ino);
-       make_bad_inode(vi);
-       if (err != -ENOMEM)
-               NVolSetErrors(vol);
-       return err;
-}
-
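Both compression set-up paths above (the unnamed data attribute case earlier
and the attribute inode case just finished) derive the compression block
geometry from the on-disk compression_unit exponent. A standalone sketch of
that arithmetic, assuming the standard unit of 4; the helper and parameter
names are hypothetical:

	#include <stdint.h>
	#include <stdio.h>

	/*
	 * compression_unit is a log2 cluster count, so with the standard
	 * unit of 4 and 4kiB clusters (cluster_size_bits == 12) a
	 * compression block is 1 << (4 + 12) == 64kiB and spans
	 * 1 << 4 == 16 clusters, matching the driver's bookkeeping.
	 */
	static void compression_geometry(unsigned int compression_unit,
			unsigned int cluster_size_bits)
	{
		uint32_t block_size = 1U << (compression_unit +
				cluster_size_bits);
		unsigned int block_clusters = 1U << compression_unit;

		printf("block_size %u, block_clusters %u\n", block_size,
				block_clusters);
	}

	int main(void)
	{
		compression_geometry(4, 12);
		return 0;
	}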
-/**
- * ntfs_read_locked_index_inode - read an index inode from its base inode
- * @base_vi:   base inode
- * @vi:                index inode to read
- *
- * ntfs_read_locked_index_inode() is called from ntfs_index_iget() to read the
- * index inode described by @vi into memory from the base mft record described
 - * by @base_vi.
- *
- * ntfs_read_locked_index_inode() maps, pins and locks the base inode for
- * reading and looks up the attributes relating to the index described by @vi
- * before setting up the necessary fields in @vi as well as initializing the
- * ntfs inode.
- *
- * Note, index inodes are essentially attribute inodes (NInoAttr() is true)
- * with the attribute type set to AT_INDEX_ALLOCATION.  Apart from that, they
 - * are set up like directory inodes since directories are a special case of
 - * indices so they need to be treated in much the same way.  Most importantly,
 - * for small indices the index allocation attribute might not actually exist.
 - * However, the index root attribute always exists but does not need to have
 - * an inode associated with it, which is why we define a new inode type,
 - * index.  Also, like for directories, we need to have an attribute inode for
- * the bitmap attribute corresponding to the index allocation attribute and we
- * can store this in the appropriate field of the inode, just like we do for
- * normal directory inodes.
- *
- * Q: What locks are held when the function is called?
- * A: i_state has I_NEW set, hence the inode is locked, also
- *    i_count is set to 1, so it is not going to go away
- *
- * Return 0 on success and -errno on error.  In the error case, the inode will
- * have had make_bad_inode() executed on it.
- */
-static int ntfs_read_locked_index_inode(struct inode *base_vi, struct inode *vi)
-{
-       loff_t bvi_size;
-       ntfs_volume *vol = NTFS_SB(vi->i_sb);
-       ntfs_inode *ni, *base_ni, *bni;
-       struct inode *bvi;
-       MFT_RECORD *m;
-       ATTR_RECORD *a;
-       ntfs_attr_search_ctx *ctx;
-       INDEX_ROOT *ir;
-       u8 *ir_end, *index_end;
-       int err = 0;
-
-       ntfs_debug("Entering for i_ino 0x%lx.", vi->i_ino);
-       ntfs_init_big_inode(vi);
-       ni      = NTFS_I(vi);
-       base_ni = NTFS_I(base_vi);
-       /* Just mirror the values from the base inode. */
-       vi->i_uid       = base_vi->i_uid;
-       vi->i_gid       = base_vi->i_gid;
-       set_nlink(vi, base_vi->i_nlink);
-       inode_set_mtime_to_ts(vi, inode_get_mtime(base_vi));
-       inode_set_ctime_to_ts(vi, inode_get_ctime(base_vi));
-       inode_set_atime_to_ts(vi, inode_get_atime(base_vi));
-       vi->i_generation = ni->seq_no = base_ni->seq_no;
-       /* Set inode type to zero but preserve permissions. */
-       vi->i_mode      = base_vi->i_mode & ~S_IFMT;
-       /* Map the mft record for the base inode. */
-       m = map_mft_record(base_ni);
-       if (IS_ERR(m)) {
-               err = PTR_ERR(m);
-               goto err_out;
-       }
-       ctx = ntfs_attr_get_search_ctx(base_ni, m);
-       if (!ctx) {
-               err = -ENOMEM;
-               goto unm_err_out;
-       }
-       /* Find the index root attribute. */
-       err = ntfs_attr_lookup(AT_INDEX_ROOT, ni->name, ni->name_len,
-                       CASE_SENSITIVE, 0, NULL, 0, ctx);
-       if (unlikely(err)) {
-               if (err == -ENOENT)
-                       ntfs_error(vi->i_sb, "$INDEX_ROOT attribute is "
-                                       "missing.");
-               goto unm_err_out;
-       }
-       a = ctx->attr;
-       /* Set up the state. */
-       if (unlikely(a->non_resident)) {
-               ntfs_error(vol->sb, "$INDEX_ROOT attribute is not resident.");
-               goto unm_err_out;
-       }
-       /* Ensure the attribute name is placed before the value. */
-       if (unlikely(a->name_length && (le16_to_cpu(a->name_offset) >=
-                       le16_to_cpu(a->data.resident.value_offset)))) {
-               ntfs_error(vol->sb, "$INDEX_ROOT attribute name is placed "
-                               "after the attribute value.");
-               goto unm_err_out;
-       }
-       /*
-        * Compressed/encrypted/sparse index root is not allowed, except for
-        * directories of course but those are not dealt with here.
-        */
-       if (a->flags & (ATTR_COMPRESSION_MASK | ATTR_IS_ENCRYPTED |
-                       ATTR_IS_SPARSE)) {
-               ntfs_error(vi->i_sb, "Found compressed/encrypted/sparse index "
-                               "root attribute.");
-               goto unm_err_out;
-       }
-       ir = (INDEX_ROOT*)((u8*)a + le16_to_cpu(a->data.resident.value_offset));
-       ir_end = (u8*)ir + le32_to_cpu(a->data.resident.value_length);
-       if (ir_end > (u8*)ctx->mrec + vol->mft_record_size) {
-               ntfs_error(vi->i_sb, "$INDEX_ROOT attribute is corrupt.");
-               goto unm_err_out;
-       }
-       index_end = (u8*)&ir->index + le32_to_cpu(ir->index.index_length);
-       if (index_end > ir_end) {
-               ntfs_error(vi->i_sb, "Index is corrupt.");
-               goto unm_err_out;
-       }
-       if (ir->type) {
-               ntfs_error(vi->i_sb, "Index type is not 0 (type is 0x%x).",
-                               le32_to_cpu(ir->type));
-               goto unm_err_out;
-       }
-       ni->itype.index.collation_rule = ir->collation_rule;
-       ntfs_debug("Index collation rule is 0x%x.",
-                       le32_to_cpu(ir->collation_rule));
-       ni->itype.index.block_size = le32_to_cpu(ir->index_block_size);
-       if (!is_power_of_2(ni->itype.index.block_size)) {
-               ntfs_error(vi->i_sb, "Index block size (%u) is not a power of "
-                               "two.", ni->itype.index.block_size);
-               goto unm_err_out;
-       }
-       if (ni->itype.index.block_size > PAGE_SIZE) {
-               ntfs_error(vi->i_sb, "Index block size (%u) > PAGE_SIZE "
-                               "(%ld) is not supported.  Sorry.",
-                               ni->itype.index.block_size, PAGE_SIZE);
-               err = -EOPNOTSUPP;
-               goto unm_err_out;
-       }
-       if (ni->itype.index.block_size < NTFS_BLOCK_SIZE) {
-               ntfs_error(vi->i_sb, "Index block size (%u) < NTFS_BLOCK_SIZE "
-                               "(%i) is not supported.  Sorry.",
-                               ni->itype.index.block_size, NTFS_BLOCK_SIZE);
-               err = -EOPNOTSUPP;
-               goto unm_err_out;
-       }
-       ni->itype.index.block_size_bits = ffs(ni->itype.index.block_size) - 1;
-       /* Determine the size of a vcn in the index. */
-       if (vol->cluster_size <= ni->itype.index.block_size) {
-               ni->itype.index.vcn_size = vol->cluster_size;
-               ni->itype.index.vcn_size_bits = vol->cluster_size_bits;
-       } else {
-               ni->itype.index.vcn_size = vol->sector_size;
-               ni->itype.index.vcn_size_bits = vol->sector_size_bits;
-       }
-       /* Check for presence of index allocation attribute. */
-       if (!(ir->index.flags & LARGE_INDEX)) {
-               /* No index allocation. */
-               vi->i_size = ni->initialized_size = ni->allocated_size = 0;
-               /* We are done with the mft record, so we release it. */
-               ntfs_attr_put_search_ctx(ctx);
-               unmap_mft_record(base_ni);
-               m = NULL;
-               ctx = NULL;
-               goto skip_large_index_stuff;
-       } /* LARGE_INDEX:  Index allocation present.  Setup state. */
-       NInoSetIndexAllocPresent(ni);
-       /* Find index allocation attribute. */
-       ntfs_attr_reinit_search_ctx(ctx);
-       err = ntfs_attr_lookup(AT_INDEX_ALLOCATION, ni->name, ni->name_len,
-                       CASE_SENSITIVE, 0, NULL, 0, ctx);
-       if (unlikely(err)) {
-               if (err == -ENOENT)
-                       ntfs_error(vi->i_sb, "$INDEX_ALLOCATION attribute is "
-                                       "not present but $INDEX_ROOT "
-                                       "indicated it is.");
-               else
-                       ntfs_error(vi->i_sb, "Failed to lookup "
-                                       "$INDEX_ALLOCATION attribute.");
-               goto unm_err_out;
-       }
-       a = ctx->attr;
-       if (!a->non_resident) {
-               ntfs_error(vi->i_sb, "$INDEX_ALLOCATION attribute is "
-                               "resident.");
-               goto unm_err_out;
-       }
-       /*
-        * Ensure the attribute name is placed before the mapping pairs array.
-        */
-       if (unlikely(a->name_length && (le16_to_cpu(a->name_offset) >=
-                       le16_to_cpu(
-                       a->data.non_resident.mapping_pairs_offset)))) {
-               ntfs_error(vol->sb, "$INDEX_ALLOCATION attribute name is "
-                               "placed after the mapping pairs array.");
-               goto unm_err_out;
-       }
-       if (a->flags & ATTR_IS_ENCRYPTED) {
-               ntfs_error(vi->i_sb, "$INDEX_ALLOCATION attribute is "
-                               "encrypted.");
-               goto unm_err_out;
-       }
-       if (a->flags & ATTR_IS_SPARSE) {
-               ntfs_error(vi->i_sb, "$INDEX_ALLOCATION attribute is sparse.");
-               goto unm_err_out;
-       }
-       if (a->flags & ATTR_COMPRESSION_MASK) {
-               ntfs_error(vi->i_sb, "$INDEX_ALLOCATION attribute is "
-                               "compressed.");
-               goto unm_err_out;
-       }
-       if (a->data.non_resident.lowest_vcn) {
-               ntfs_error(vi->i_sb, "First extent of $INDEX_ALLOCATION "
-                               "attribute has non-zero lowest_vcn.");
-               goto unm_err_out;
-       }
-       vi->i_size = sle64_to_cpu(a->data.non_resident.data_size);
-       ni->initialized_size = sle64_to_cpu(
-                       a->data.non_resident.initialized_size);
-       ni->allocated_size = sle64_to_cpu(a->data.non_resident.allocated_size);
-       /*
-        * We are done with the mft record, so we release it.  Otherwise
-        * we would deadlock in ntfs_attr_iget().
-        */
-       ntfs_attr_put_search_ctx(ctx);
-       unmap_mft_record(base_ni);
-       m = NULL;
-       ctx = NULL;
-       /* Get the index bitmap attribute inode. */
-       bvi = ntfs_attr_iget(base_vi, AT_BITMAP, ni->name, ni->name_len);
-       if (IS_ERR(bvi)) {
-               ntfs_error(vi->i_sb, "Failed to get bitmap attribute.");
-               err = PTR_ERR(bvi);
-               goto unm_err_out;
-       }
-       bni = NTFS_I(bvi);
-       if (NInoCompressed(bni) || NInoEncrypted(bni) ||
-                       NInoSparse(bni)) {
-               ntfs_error(vi->i_sb, "$BITMAP attribute is compressed and/or "
-                               "encrypted and/or sparse.");
-               goto iput_unm_err_out;
-       }
-       /* Consistency check bitmap size vs. index allocation size. */
-       bvi_size = i_size_read(bvi);
-       if ((bvi_size << 3) < (vi->i_size >> ni->itype.index.block_size_bits)) {
-               ntfs_error(vi->i_sb, "Index bitmap too small (0x%llx) for "
-                               "index allocation (0x%llx).", bvi_size << 3,
-                               vi->i_size);
-               goto iput_unm_err_out;
-       }
-       iput(bvi);
-skip_large_index_stuff:
-       /* Setup the operations for this index inode. */
-       vi->i_mapping->a_ops = &ntfs_mst_aops;
-       vi->i_blocks = ni->allocated_size >> 9;
-       /*
-        * Make sure the base inode doesn't go away and attach it to the
-        * index inode.
-        */
-       igrab(base_vi);
-       ni->ext.base_ntfs_ino = base_ni;
-       ni->nr_extents = -1;
-
-       ntfs_debug("Done.");
-       return 0;
-iput_unm_err_out:
-       iput(bvi);
-unm_err_out:
-       if (!err)
-               err = -EIO;
-       if (ctx)
-               ntfs_attr_put_search_ctx(ctx);
-       if (m)
-               unmap_mft_record(base_ni);
-err_out:
-       ntfs_error(vi->i_sb, "Failed with error code %i while reading index "
-                       "inode (mft_no 0x%lx, name_len %i).", err, vi->i_ino,
-                       ni->name_len);
-       make_bad_inode(vi);
-       if (err != -EOPNOTSUPP && err != -ENOMEM)
-               NVolSetErrors(vol);
-       return err;
-}
-
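The bitmap consistency check performed in both index paths above compares
bits available against index blocks needed: one $BITMAP bit tracks one index
block. The same test in isolation, as a sketch with hypothetical names:

	#include <stdbool.h>
	#include <stdint.h>
	#include <stdio.h>

	/*
	 * A bitmap of bitmap_bytes bytes provides bitmap_bytes << 3 bits,
	 * which must cover every block of the index allocation attribute,
	 * i.e. index_bytes >> block_size_bits blocks.
	 */
	static bool bitmap_covers_index(int64_t bitmap_bytes,
			int64_t index_bytes, unsigned int block_size_bits)
	{
		return (bitmap_bytes << 3) >= (index_bytes >> block_size_bits);
	}

	int main(void)
	{
		/* One byte of bitmap covers eight 4kiB index blocks. */
		printf("%d\n", bitmap_covers_index(1, 8 * 4096, 12));
		return 0;
	}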
-/*
- * The MFT inode has special locking, so teach the lock validator
- * about this by splitting off the locking rules of the MFT from
- * the locking rules of other inodes. The MFT inode can never be
- * accessed from the VFS side (or even internally), only by the
- * map_mft functions.
- */
-static struct lock_class_key mft_ni_runlist_lock_key, mft_ni_mrec_lock_key;
-
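For context on the two lock class keys declared above: such keys are attached
to the individual lock instances with lockdep_set_class() once the locks have
been initialized, along these lines (a kernel-context sketch; the wrapper
function is hypothetical, while the field and key names follow this file):

	#include <linux/lockdep.h>

	/*
	 * Hypothetical wrapper: give the $MFT inode's locks their own
	 * lockdep classes so the validator tracks them separately from the
	 * classes shared by all other ntfs inodes.
	 */
	static void ntfs_mft_set_lock_classes(ntfs_inode *ni)
	{
		lockdep_set_class(&ni->runlist.lock,
				&mft_ni_runlist_lock_key);
		lockdep_set_class(&ni->mrec_lock, &mft_ni_mrec_lock_key);
	}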
-/**
- * ntfs_read_inode_mount - special read_inode for mount time use only
- * @vi:                inode to read
- *
- * Read inode FILE_MFT at mount time, only called with super_block lock
- * held from within the read_super() code path.
- *
- * This function exists because when it is called the page cache for $MFT/$DATA
- * is not initialized and hence we cannot get at the contents of mft records
- * by calling map_mft_record*().
- *
- * Further it needs to cope with the circular references problem, i.e. cannot
- * load any attributes other than $ATTRIBUTE_LIST until $DATA is loaded, because
- * we do not know where the other extent mft records are yet and again, because
- * we cannot call map_mft_record*() yet.  Obviously this applies only when an
- * attribute list is actually present in $MFT inode.
- *
- * We solve these problems by starting with the $DATA attribute before anything
- * else and iterating using ntfs_attr_lookup($DATA) over all extents.  As each
 - * extent is found, we run it through ntfs_mapping_pairs_decompress(), which
 - * includes the implied ntfs_runlists_merge().  Each step of the iteration
 - * necessarily provides sufficient information for the next step to complete.
- *
 - * This should work but there are two possible pitfalls (see inline comments
- * below), but only time will tell if they are real pits or just smoke...
- */
-int ntfs_read_inode_mount(struct inode *vi)
-{
-       VCN next_vcn, last_vcn, highest_vcn;
-       s64 block;
-       struct super_block *sb = vi->i_sb;
-       ntfs_volume *vol = NTFS_SB(sb);
-       struct buffer_head *bh;
-       ntfs_inode *ni;
-       MFT_RECORD *m = NULL;
-       ATTR_RECORD *a;
-       ntfs_attr_search_ctx *ctx;
-       unsigned int i, nr_blocks;
-       int err;
-
-       ntfs_debug("Entering.");
-
-       /* Initialize the ntfs specific part of @vi. */
-       ntfs_init_big_inode(vi);
-
-       ni = NTFS_I(vi);
-
-       /* Setup the data attribute. It is special as it is mst protected. */
-       NInoSetNonResident(ni);
-       NInoSetMstProtected(ni);
-       NInoSetSparseDisabled(ni);
-       ni->type = AT_DATA;
-       ni->name = NULL;
-       ni->name_len = 0;
-       /*
-        * This sets up our little cheat allowing us to reuse the async read io
-        * completion handler for directories.
-        */
-       ni->itype.index.block_size = vol->mft_record_size;
-       ni->itype.index.block_size_bits = vol->mft_record_size_bits;
-
-       /* Very important! Needed to be able to call map_mft_record*(). */
-       vol->mft_ino = vi;
-
-       /* Allocate enough memory to read the first mft record. */
-       if (vol->mft_record_size > 64 * 1024) {
-               ntfs_error(sb, "Unsupported mft record size %i (max 64kiB).",
-                               vol->mft_record_size);
-               goto err_out;
-       }
-       i = vol->mft_record_size;
-       if (i < sb->s_blocksize)
-               i = sb->s_blocksize;
-       m = (MFT_RECORD*)ntfs_malloc_nofs(i);
-       if (!m) {
-               ntfs_error(sb, "Failed to allocate buffer for $MFT record 0.");
-               goto err_out;
-       }
-
-       /* Determine the first block of the $MFT/$DATA attribute. */
-       block = vol->mft_lcn << vol->cluster_size_bits >>
-                       sb->s_blocksize_bits;
-       nr_blocks = vol->mft_record_size >> sb->s_blocksize_bits;
-       if (!nr_blocks)
-               nr_blocks = 1;
-
-       /* Load $MFT/$DATA's first mft record. */
-       for (i = 0; i < nr_blocks; i++) {
-               bh = sb_bread(sb, block++);
-               if (!bh) {
-                       ntfs_error(sb, "Device read failed.");
-                       goto err_out;
-               }
-               memcpy((char*)m + (i << sb->s_blocksize_bits), bh->b_data,
-                               sb->s_blocksize);
-               brelse(bh);
-       }
-
-       if (le32_to_cpu(m->bytes_allocated) != vol->mft_record_size) {
-               ntfs_error(sb, "Incorrect mft record size %u in superblock, should be %u.",
-                               le32_to_cpu(m->bytes_allocated), vol->mft_record_size);
-               goto err_out;
-       }
-
-       /* Apply the mst fixups. */
-       if (post_read_mst_fixup((NTFS_RECORD*)m, vol->mft_record_size)) {
-               /* FIXME: Try to use the $MFTMirr now. */
-               ntfs_error(sb, "MST fixup failed. $MFT is corrupt.");
-               goto err_out;
-       }
-
-       /* Sanity check offset to the first attribute */
-       if (le16_to_cpu(m->attrs_offset) >= le32_to_cpu(m->bytes_allocated)) {
-               ntfs_error(sb, "Incorrect mft offset to the first attribute %u in superblock.",
-                              le16_to_cpu(m->attrs_offset));
-               goto err_out;
-       }
-
-       /* Need this to sanity check attribute list references to $MFT. */
-       vi->i_generation = ni->seq_no = le16_to_cpu(m->sequence_number);
-
-       /* Provides read_folio() for map_mft_record(). */
-       vi->i_mapping->a_ops = &ntfs_mst_aops;
-
-       ctx = ntfs_attr_get_search_ctx(ni, m);
-       if (!ctx) {
-               err = -ENOMEM;
-               goto err_out;
-       }
-
-       /* Find the attribute list attribute if present. */
-       err = ntfs_attr_lookup(AT_ATTRIBUTE_LIST, NULL, 0, 0, 0, NULL, 0, ctx);
-       if (err) {
-               if (unlikely(err != -ENOENT)) {
-                       ntfs_error(sb, "Failed to lookup attribute list "
-                                       "attribute. You should run chkdsk.");
-                       goto put_err_out;
-               }
-       } else /* if (!err) */ {
-               ATTR_LIST_ENTRY *al_entry, *next_al_entry;
-               u8 *al_end;
-               static const char *es = "  Not allowed.  $MFT is corrupt.  "
-                               "You should run chkdsk.";
-
-               ntfs_debug("Attribute list attribute found in $MFT.");
-               NInoSetAttrList(ni);
-               a = ctx->attr;
-               if (a->flags & ATTR_COMPRESSION_MASK) {
-                       ntfs_error(sb, "Attribute list attribute is "
-                                       "compressed.%s", es);
-                       goto put_err_out;
-               }
-               if (a->flags & ATTR_IS_ENCRYPTED ||
-                               a->flags & ATTR_IS_SPARSE) {
-                       if (a->non_resident) {
-                               ntfs_error(sb, "Non-resident attribute list "
-                                               "attribute is encrypted/"
-                                               "sparse.%s", es);
-                               goto put_err_out;
-                       }
-                       ntfs_warning(sb, "Resident attribute list attribute "
-                                       "in $MFT system file is marked "
-                                       "encrypted/sparse which is not true.  "
-                                       "However, Windows allows this and "
-                                       "chkdsk does not detect or correct it "
-                                       "so we will just ignore the invalid "
-                                       "flags and pretend they are not set.");
-               }
-               /* Now allocate memory for the attribute list. */
-               ni->attr_list_size = (u32)ntfs_attr_size(a);
-               if (!ni->attr_list_size) {
-                       ntfs_error(sb, "Attr_list_size is zero");
-                       goto put_err_out;
-               }
-               ni->attr_list = ntfs_malloc_nofs(ni->attr_list_size);
-               if (!ni->attr_list) {
-                       ntfs_error(sb, "Not enough memory to allocate buffer "
-                                       "for attribute list.");
-                       goto put_err_out;
-               }
-               if (a->non_resident) {
-                       NInoSetAttrListNonResident(ni);
-                       if (a->data.non_resident.lowest_vcn) {
-                               ntfs_error(sb, "Attribute list has non-zero "
-                                               "lowest_vcn. $MFT is corrupt. "
-                                               "You should run chkdsk.");
-                               goto put_err_out;
-                       }
-                       /* Setup the runlist. */
-                       ni->attr_list_rl.rl = ntfs_mapping_pairs_decompress(vol,
-                                       a, NULL);
-                       if (IS_ERR(ni->attr_list_rl.rl)) {
-                               err = PTR_ERR(ni->attr_list_rl.rl);
-                               ni->attr_list_rl.rl = NULL;
-                               ntfs_error(sb, "Mapping pairs decompression "
-                                               "failed with error code %i.",
-                                               -err);
-                               goto put_err_out;
-                       }
-                       /* Now load the attribute list. */
-                       if ((err = load_attribute_list(vol, &ni->attr_list_rl,
-                                       ni->attr_list, ni->attr_list_size,
-                                       sle64_to_cpu(a->data.
-                                       non_resident.initialized_size)))) {
-                               ntfs_error(sb, "Failed to load attribute list "
-                                               "attribute with error code %i.",
-                                               -err);
-                               goto put_err_out;
-                       }
-               } else /* if (!a->non_resident) */ {
-                       if ((u8*)a + le16_to_cpu(
-                                       a->data.resident.value_offset) +
-                                       le32_to_cpu(
-                                       a->data.resident.value_length) >
-                                       (u8*)ctx->mrec + vol->mft_record_size) {
-                               ntfs_error(sb, "Corrupt attribute list "
-                                               "attribute.");
-                               goto put_err_out;
-                       }
-                       /* Now copy the attribute list. */
-                       memcpy(ni->attr_list, (u8*)a + le16_to_cpu(
-                                       a->data.resident.value_offset),
-                                       le32_to_cpu(
-                                       a->data.resident.value_length));
-               }
-               /* The attribute list is now setup in memory. */
-               /*
-                * FIXME: I don't know if this case is actually possible.
-                * According to logic it is not possible but I have seen too
-                * many weird things in MS software to rely on logic... Thus we
-                * perform a manual search and make sure the first $MFT/$DATA
-                * extent is in the base inode. If it is not we abort with an
-                * error and if we ever see a report of this error we will need
-                * to do some magic in order to have the necessary mft record
-                * loaded and in the right place in the page cache. But
-                * hopefully logic will prevail and this never happens...
-                */
-               al_entry = (ATTR_LIST_ENTRY*)ni->attr_list;
-               al_end = (u8*)al_entry + ni->attr_list_size;
-               for (;; al_entry = next_al_entry) {
-                       /* Out of bounds check. */
-                       if ((u8*)al_entry < ni->attr_list ||
-                                       (u8*)al_entry > al_end)
-                               goto em_put_err_out;
-                       /* Catch the end of the attribute list. */
-                       if ((u8*)al_entry == al_end)
-                               goto em_put_err_out;
-                       if (!al_entry->length)
-                               goto em_put_err_out;
-                       if ((u8*)al_entry + 6 > al_end || (u8*)al_entry +
-                                       le16_to_cpu(al_entry->length) > al_end)
-                               goto em_put_err_out;
-                       next_al_entry = (ATTR_LIST_ENTRY*)((u8*)al_entry +
-                                       le16_to_cpu(al_entry->length));
-                       if (le32_to_cpu(al_entry->type) > le32_to_cpu(AT_DATA))
-                               goto em_put_err_out;
-                       if (AT_DATA != al_entry->type)
-                               continue;
-                       /* We want an unnamed attribute. */
-                       if (al_entry->name_length)
-                               goto em_put_err_out;
-                       /* Want the first entry, i.e. lowest_vcn == 0. */
-                       if (al_entry->lowest_vcn)
-                               goto em_put_err_out;
-                       /* First entry has to be in the base mft record. */
-                       if (MREF_LE(al_entry->mft_reference) != vi->i_ino) {
-                               /* MFT references do not match, logic fails. */
-                               ntfs_error(sb, "BUG: The first $DATA extent "
-                                               "of $MFT is not in the base "
-                                               "mft record. Please report "
-                                               "that you saw this message to "
-                                               "linux-ntfs-dev@lists."
-                                               "sourceforge.net");
-                               goto put_err_out;
-                       } else {
-                               /* Sequence numbers must match. */
-                               if (MSEQNO_LE(al_entry->mft_reference) !=
-                                               ni->seq_no)
-                                       goto em_put_err_out;
-                               /* Got it. All is ok. We can stop now. */
-                               break;
-                       }
-               }
-       }
-
-       ntfs_attr_reinit_search_ctx(ctx);
-
-       /* Now load all attribute extents. */
-       a = NULL;
-       next_vcn = last_vcn = highest_vcn = 0;
-       while (!(err = ntfs_attr_lookup(AT_DATA, NULL, 0, 0, next_vcn, NULL, 0,
-                       ctx))) {
-               runlist_element *nrl;
-
-               /* Cache the current attribute. */
-               a = ctx->attr;
-               /* $MFT must be non-resident. */
-               if (!a->non_resident) {
-                       ntfs_error(sb, "$MFT must be non-resident but a "
-                                       "resident extent was found. $MFT is "
-                                       "corrupt. Run chkdsk.");
-                       goto put_err_out;
-               }
-               /* $MFT must be uncompressed and unencrypted. */
-               if (a->flags & ATTR_COMPRESSION_MASK ||
-                               a->flags & ATTR_IS_ENCRYPTED ||
-                               a->flags & ATTR_IS_SPARSE) {
-                       ntfs_error(sb, "$MFT must be uncompressed, "
-                                       "non-sparse, and unencrypted but a "
-                                       "compressed/sparse/encrypted extent "
-                                       "was found. $MFT is corrupt. Run "
-                                       "chkdsk.");
-                       goto put_err_out;
-               }
-               /*
-                * Decompress the mapping pairs array of this extent and merge
-                * the result into the existing runlist. No need for locking
-                * as we have exclusive access to the inode at this time and we
-                * are the mount-in-progress task, too.
-                */
-               nrl = ntfs_mapping_pairs_decompress(vol, a, ni->runlist.rl);
-               if (IS_ERR(nrl)) {
-                       ntfs_error(sb, "ntfs_mapping_pairs_decompress() "
-                                       "failed with error code %ld.  $MFT is "
-                                       "corrupt.", PTR_ERR(nrl));
-                       goto put_err_out;
-               }
-               ni->runlist.rl = nrl;
-
-               /* Are we in the first extent? */
-               if (!next_vcn) {
-                       if (a->data.non_resident.lowest_vcn) {
-                               ntfs_error(sb, "First extent of $DATA "
-                                               "attribute has non zero "
-                                               "lowest_vcn. $MFT is corrupt. "
-                                               "You should run chkdsk.");
-                               goto put_err_out;
-                       }
-                       /* Get the last vcn in the $DATA attribute. */
-                       last_vcn = sle64_to_cpu(
-                                       a->data.non_resident.allocated_size)
-                                       >> vol->cluster_size_bits;
-                       /* Fill in the inode size. */
-                       vi->i_size = sle64_to_cpu(
-                                       a->data.non_resident.data_size);
-                       ni->initialized_size = sle64_to_cpu(
-                                       a->data.non_resident.initialized_size);
-                       ni->allocated_size = sle64_to_cpu(
-                                       a->data.non_resident.allocated_size);
-                       /*
-                        * Verify the number of mft records does not exceed
-                        * 2^32 - 1.
-                        */
-                       if ((vi->i_size >> vol->mft_record_size_bits) >=
-                                       (1ULL << 32)) {
-                               ntfs_error(sb, "$MFT is too big! Aborting.");
-                               goto put_err_out;
-                       }
-                       /*
-                        * We have got the first extent of the runlist for
-                        * $MFT which means it is now relatively safe to call
-                        * the normal ntfs_read_inode() function.
-                        * Complete reading the inode, this will actually
-                        * re-read the mft record for $MFT, this time entering
-                        * it into the page cache with which we complete the
-                        * kick start of the volume. It should be safe to do
-                        * this now as the first extent of $MFT/$DATA is
-                        * already known and we would hope that we don't need
-                        * further extents in order to find the other
-                        * attributes belonging to $MFT. Only time will tell if
-                        * this is really the case. If not we will have to play
-                        * magic at this point, possibly duplicating a lot of
-                        * ntfs_read_inode() at this point. We will need to
-                        * ensure we do enough of its work to be able to call
-                        * ntfs_read_inode() on extents of $MFT/$DATA. But let's
-                        * hope this never happens...
-                        */
-                       ntfs_read_locked_inode(vi);
-                       if (is_bad_inode(vi)) {
-                               ntfs_error(sb, "ntfs_read_inode() of $MFT "
-                                               "failed. BUG or corrupt $MFT. "
-                                               "Run chkdsk and if no errors "
-                                               "are found, please report "
-                                               "that you saw this message to "
-                                               "linux-ntfs-dev@lists."
-                                               "sourceforge.net");
-                               ntfs_attr_put_search_ctx(ctx);
-                               /* Revert to the safe super operations. */
-                               ntfs_free(m);
-                               return -1;
-                       }
-                       /*
-                        * Re-initialize some specifics about $MFT's inode as
-                        * ntfs_read_inode() will have set up the default ones.
-                        */
-                       /* Set uid and gid to root. */
-                       vi->i_uid = GLOBAL_ROOT_UID;
-                       vi->i_gid = GLOBAL_ROOT_GID;
-                       /* Regular file. No access for anyone. */
-                       vi->i_mode = S_IFREG;
-                       /* No VFS initiated operations allowed for $MFT. */
-                       vi->i_op = &ntfs_empty_inode_ops;
-                       vi->i_fop = &ntfs_empty_file_ops;
-               }
-
-               /* Get the lowest vcn for the next extent. */
-               highest_vcn = sle64_to_cpu(a->data.non_resident.highest_vcn);
-               next_vcn = highest_vcn + 1;
-
-               /* Only one extent or error, which we catch below. */
-               if (next_vcn <= 0)
-                       break;
-
-               /* Avoid endless loops due to corruption. */
-               if (next_vcn < sle64_to_cpu(
-                               a->data.non_resident.lowest_vcn)) {
-                       ntfs_error(sb, "$MFT has corrupt attribute list "
-                                       "attribute. Run chkdsk.");
-                       goto put_err_out;
-               }
-       }
-       if (err != -ENOENT) {
-               ntfs_error(sb, "Failed to lookup $MFT/$DATA attribute extent. "
-                               "$MFT is corrupt. Run chkdsk.");
-               goto put_err_out;
-       }
-       if (!a) {
-               ntfs_error(sb, "$MFT/$DATA attribute not found. $MFT is "
-                               "corrupt. Run chkdsk.");
-               goto put_err_out;
-       }
-       if (highest_vcn && highest_vcn != last_vcn - 1) {
-               ntfs_error(sb, "Failed to load the complete runlist for "
-                               "$MFT/$DATA. Driver bug or corrupt $MFT. "
-                               "Run chkdsk.");
-               ntfs_debug("highest_vcn = 0x%llx, last_vcn - 1 = 0x%llx",
-                               (unsigned long long)highest_vcn,
-                               (unsigned long long)last_vcn - 1);
-               goto put_err_out;
-       }
-       ntfs_attr_put_search_ctx(ctx);
-       ntfs_debug("Done.");
-       ntfs_free(m);
-
-       /*
-        * Split the locking rules of the MFT inode from the
-        * locking rules of other inodes:
-        */
-       lockdep_set_class(&ni->runlist.lock, &mft_ni_runlist_lock_key);
-       lockdep_set_class(&ni->mrec_lock, &mft_ni_mrec_lock_key);
-
-       return 0;
-
-em_put_err_out:
-       ntfs_error(sb, "Couldn't find first extent of $DATA attribute in "
-                       "attribute list. $MFT is corrupt. Run chkdsk.");
-put_err_out:
-       ntfs_attr_put_search_ctx(ctx);
-err_out:
-       ntfs_error(sb, "Failed. Marking inode as bad.");
-       make_bad_inode(vi);
-       ntfs_free(m);
-       return -1;
-}
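
For reference, the MREF_LE() and MSEQNO_LE() calls in the extent-walking loop above split an on-disk, little-endian MFT reference into its two halves; the sequence-number comparison then guards against a stale reference to a reused mft record. A minimal sketch of that packing, assuming the usual 48/16-bit split (the driver's real macros live in fs/ntfs/layout.h; names suffixed _sketch are illustrative):

/* Illustrative only: an NTFS MFT reference is a 64-bit value holding
 * the mft record number in the low 48 bits and the record's sequence
 * number in the high 16 bits. */
typedef unsigned long long MFT_REF_SKETCH;

/* Mft record number: low 48 bits. */
#define MREF_SKETCH(x)   ((unsigned long)((x) & 0x0000ffffffffffffULL))
/* Sequence number: high 16 bits, bumped each time the record is reused. */
#define MSEQNO_SKETCH(x) ((unsigned short)(((x) >> 48) & 0xffffULL))
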
-
-static void __ntfs_clear_inode(ntfs_inode *ni)
-{
-       /* Free all allocated memory. */
-       down_write(&ni->runlist.lock);
-       if (ni->runlist.rl) {
-               ntfs_free(ni->runlist.rl);
-               ni->runlist.rl = NULL;
-       }
-       up_write(&ni->runlist.lock);
-
-       if (ni->attr_list) {
-               ntfs_free(ni->attr_list);
-               ni->attr_list = NULL;
-       }
-
-       down_write(&ni->attr_list_rl.lock);
-       if (ni->attr_list_rl.rl) {
-               ntfs_free(ni->attr_list_rl.rl);
-               ni->attr_list_rl.rl = NULL;
-       }
-       up_write(&ni->attr_list_rl.lock);
-
-       if (ni->name_len && ni->name != I30) {
-               /* Catch bugs... */
-               BUG_ON(!ni->name);
-               kfree(ni->name);
-       }
-}
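
The down_write()/up_write() pairs above follow from the shape of the driver's runlist type, which bundles the run array with its own rw_semaphore. A sketch, assuming the layout in fs/ntfs/runlist.h (names suffixed _sketch are illustrative; the driver's VCN and LCN typedefs are both s64):

#include <linux/rwsem.h>

typedef struct {
        s64 vcn;        /* First virtual cluster number of this run. */
        s64 lcn;        /* First logical cluster number, or a negative
                           LCN_* code (e.g. "not mapped"). */
        s64 length;     /* Length of this run in clusters. */
} runlist_element_sketch;

typedef struct {
        runlist_element_sketch *rl;     /* NULL if not mapped in yet or
                                           already unmapped. */
        struct rw_semaphore lock;       /* Guards rl; hence the
                                           down_write() above. */
} runlist_sketch;
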
-
-void ntfs_clear_extent_inode(ntfs_inode *ni)
-{
-       ntfs_debug("Entering for inode 0x%lx.", ni->mft_no);
-
-       BUG_ON(NInoAttr(ni));
-       BUG_ON(ni->nr_extents != -1);
-
-#ifdef NTFS_RW
-       if (NInoDirty(ni)) {
-               if (!is_bad_inode(VFS_I(ni->ext.base_ntfs_ino)))
-                       ntfs_error(ni->vol->sb, "Clearing dirty extent inode!  "
-                                       "Losing data!  This is a BUG!!!");
-               // FIXME:  Do something!!!
-       }
-#endif /* NTFS_RW */
-
-       __ntfs_clear_inode(ni);
-
-       /* Bye, bye... */
-       ntfs_destroy_extent_inode(ni);
-}
-
-/**
- * ntfs_evict_big_inode - clean up the ntfs specific part of an inode
- * @vi:                vfs inode pending annihilation
- *
- * When the VFS is going to remove an inode from memory, ntfs_evict_big_inode()
- * is called, which deallocates all memory belonging to the NTFS specific part
- * of the inode and returns.
- *
- * If the MFT record is dirty, we commit it before doing anything else.
- */
-void ntfs_evict_big_inode(struct inode *vi)
-{
-       ntfs_inode *ni = NTFS_I(vi);
-
-       truncate_inode_pages_final(&vi->i_data);
-       clear_inode(vi);
-
-#ifdef NTFS_RW
-       if (NInoDirty(ni)) {
-               bool was_bad = (is_bad_inode(vi));
-
-               /* Committing the inode also commits all extent inodes. */
-               ntfs_commit_inode(vi);
-
-               if (!was_bad && (is_bad_inode(vi) || NInoDirty(ni))) {
-                       ntfs_error(vi->i_sb, "Failed to commit dirty inode "
-                                       "0x%lx.  Losing data!", vi->i_ino);
-                       // FIXME:  Do something!!!
-               }
-       }
-#endif /* NTFS_RW */
-
-       /* No need to lock at this stage as no one else has a reference. */
-       if (ni->nr_extents > 0) {
-               int i;
-
-               for (i = 0; i < ni->nr_extents; i++)
-                       ntfs_clear_extent_inode(ni->ext.extent_ntfs_inos[i]);
-               kfree(ni->ext.extent_ntfs_inos);
-       }
-
-       __ntfs_clear_inode(ni);
-
-       if (NInoAttr(ni)) {
-               /* Release the base inode if we are holding it. */
-               if (ni->nr_extents == -1) {
-                       iput(VFS_I(ni->ext.base_ntfs_ino));
-                       ni->nr_extents = 0;
-                       ni->ext.base_ntfs_ino = NULL;
-               }
-       }
-       BUG_ON(ni->page);
-       if (!atomic_dec_and_test(&ni->count))
-               BUG();
-       return;
-}
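
ntfs_evict_big_inode() is the driver's ->evict_inode hook. A sketch of how it and ntfs_show_options() below plug into the VFS (the real operations table lives in fs/ntfs/super.c; only the two hooks discussed here are shown):

#include <linux/fs.h>

static const struct super_operations ntfs_sops_sketch = {
        .evict_inode    = ntfs_evict_big_inode, /* NTFS inode teardown. */
        .show_options   = ntfs_show_options,    /* /proc/mounts output. */
};
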
-
-/**
- * ntfs_show_options - show mount options in /proc/mounts
- * @sf:                seq_file in which to write our mount options
- * @root:      root of the mounted tree whose mount options to display
- *
- * Called by the VFS once for each mounted ntfs volume when someone reads
- * /proc/mounts in order to display the NTFS specific mount options of each
- * mount. The mount options of the filesystem specified by @root are written to
- * the seq file @sf and success is returned.
- */
-int ntfs_show_options(struct seq_file *sf, struct dentry *root)
-{
-       ntfs_volume *vol = NTFS_SB(root->d_sb);
-       int i;
-
-       seq_printf(sf, ",uid=%i", from_kuid_munged(&init_user_ns, vol->uid));
-       seq_printf(sf, ",gid=%i", from_kgid_munged(&init_user_ns, vol->gid));
-       if (vol->fmask == vol->dmask)
-               seq_printf(sf, ",umask=0%o", vol->fmask);
-       else {
-               seq_printf(sf, ",fmask=0%o", vol->fmask);
-               seq_printf(sf, ",dmask=0%o", vol->dmask);
-       }
-       seq_printf(sf, ",nls=%s", vol->nls_map->charset);
-       if (NVolCaseSensitive(vol))
-               seq_printf(sf, ",case_sensitive");
-       if (NVolShowSystemFiles(vol))
-               seq_printf(sf, ",show_sys_files");
-       if (!NVolSparseEnabled(vol))
-               seq_printf(sf, ",disable_sparse");
-       for (i = 0; on_errors_arr[i].val; i++) {
-               if (on_errors_arr[i].val & vol->on_errors)
-                       seq_printf(sf, ",errors=%s", on_errors_arr[i].str);
-       }
-       seq_printf(sf, ",mft_zone_multiplier=%i", vol->mft_zone_multiplier);
-       return 0;
-}
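
For illustration, the seq_printf() calls above would contribute an option string along these lines to a /proc/mounts entry (all values invented for the example):

,uid=1000,gid=1000,umask=0222,nls=utf8,errors=continue,mft_zone_multiplier=1
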
-
-#ifdef NTFS_RW
-
-static const char *es = "  Leaving inconsistent metadata.  Unmount and run "
-               "chkdsk.";
-
-/**
- * ntfs_truncate - called when the i_size of an ntfs inode is changed
- * @vi:                inode for which the i_size was changed
- *
- * We only support i_size changes for normal files at present, i.e. not
- * compressed and not encrypted.  This is enforced in ntfs_setattr(), see
- * below.
- *
- * The kernel guarantees that @vi is a regular file (S_ISREG() is true) and
- * that the change is allowed.
- *
- * This implies for us that @vi is a file inode rather than a directory, index,
- * or attribute inode as well as that @vi is a base inode.
- *
- * Returns 0 on success or -errno on error.
- *
- * Called with ->i_mutex held.
- */
-int ntfs_truncate(struct inode *vi)
-{
-       s64 new_size, old_size, nr_freed, new_alloc_size, old_alloc_size;
-       VCN highest_vcn;
-       unsigned long flags;
-       ntfs_inode *base_ni, *ni = NTFS_I(vi);
-       ntfs_volume *vol = ni->vol;
-       ntfs_attr_search_ctx *ctx;
-       MFT_RECORD *m;
-       ATTR_RECORD *a;
-       const char *te = "  Leaving file length out of sync with i_size.";
-       int err, mp_size, size_change, alloc_change;
-
-       ntfs_debug("Entering for inode 0x%lx.", vi->i_ino);
-       BUG_ON(NInoAttr(ni));
-       BUG_ON(S_ISDIR(vi->i_mode));
-       BUG_ON(NInoMstProtected(ni));
-       BUG_ON(ni->nr_extents < 0);
-retry_truncate:
-       /*
-        * Lock the runlist for writing and map the mft record to ensure it is
-        * safe to mess with the attribute runlist and sizes.
-        */
-       down_write(&ni->runlist.lock);
-       if (!NInoAttr(ni))
-               base_ni = ni;
-       else
-               base_ni = ni->ext.base_ntfs_ino;
-       m = map_mft_record(base_ni);
-       if (IS_ERR(m)) {
-               err = PTR_ERR(m);
-               ntfs_error(vi->i_sb, "Failed to map mft record for inode 0x%lx "
-                               "(error code %d).%s", vi->i_ino, err, te);
-               ctx = NULL;
-               m = NULL;
-               goto old_bad_out;
-       }
-       ctx = ntfs_attr_get_search_ctx(base_ni, m);
-       if (unlikely(!ctx)) {
-               ntfs_error(vi->i_sb, "Failed to allocate a search context for "
-                               "inode 0x%lx (not enough memory).%s",
-                               vi->i_ino, te);
-               err = -ENOMEM;
-               goto old_bad_out;
-       }
-       err = ntfs_attr_lookup(ni->type, ni->name, ni->name_len,
-                       CASE_SENSITIVE, 0, NULL, 0, ctx);
-       if (unlikely(err)) {
-               if (err == -ENOENT) {
-                       ntfs_error(vi->i_sb, "Open attribute is missing from "
-                                       "mft record.  Inode 0x%lx is corrupt.  "
-                                       "Run chkdsk.%s", vi->i_ino, te);
-                       err = -EIO;
-               } else
-                       ntfs_error(vi->i_sb, "Failed to lookup attribute in "
-                                       "inode 0x%lx (error code %d).%s",
-                                       vi->i_ino, err, te);
-               goto old_bad_out;
-       }
-       m = ctx->mrec;
-       a = ctx->attr;
-       /*
-        * The i_size of the vfs inode is the new size for the attribute value.
-        */
-       new_size = i_size_read(vi);
-       /* The current size of the attribute value is the old size. */
-       old_size = ntfs_attr_size(a);
-       /* Calculate the new allocated size. */
-       if (NInoNonResident(ni))
-               new_alloc_size = (new_size + vol->cluster_size - 1) &
-                               ~(s64)vol->cluster_size_mask;
-       else
-               new_alloc_size = (new_size + 7) & ~7;
-       /* The current allocated size is the old allocated size. */
-       read_lock_irqsave(&ni->size_lock, flags);
-       old_alloc_size = ni->allocated_size;
-       read_unlock_irqrestore(&ni->size_lock, flags);
-       /*
-        * The change in the file size.  This will be 0 if no change, >0 if the
-        * size is growing, and <0 if the size is shrinking.
-        */
-       size_change = -1;
-       if (new_size - old_size >= 0) {
-               size_change = 1;
-               if (new_size == old_size)
-                       size_change = 0;
-       }
-       /* As above for the allocated size. */
-       alloc_change = -1;
-       if (new_alloc_size - old_alloc_size >= 0) {
-               alloc_change = 1;
-               if (new_alloc_size == old_alloc_size)
-                       alloc_change = 0;
-       }
-       /*
-        * If neither the size nor the allocation are being changed there is
-        * nothing to do.
-        */
-       if (!size_change && !alloc_change)
-               goto unm_done;
-       /* If the size is changing, check if new size is allowed in $AttrDef. */
-       if (size_change) {
-               err = ntfs_attr_size_bounds_check(vol, ni->type, new_size);
-               if (unlikely(err)) {
-                       if (err == -ERANGE) {
-                               ntfs_error(vol->sb, "Truncate would cause the "
-                                               "inode 0x%lx to %simum size "
-                                               "for its attribute type "
-                                               "(0x%x).  Aborting truncate.",
-                                               vi->i_ino,
-                                               new_size > old_size ? "exceed "
-                                               "the max" : "go under the min",
-                                               le32_to_cpu(ni->type));
-                               err = -EFBIG;
-                       } else {
-                               ntfs_error(vol->sb, "Inode 0x%lx has unknown "
-                                               "attribute type 0x%x.  "
-                                               "Aborting truncate.",
-                                               vi->i_ino,
-                                               le32_to_cpu(ni->type));
-                               err = -EIO;
-                       }
-                       /* Reset the vfs inode size to the old size. */
-                       i_size_write(vi, old_size);
-                       goto err_out;
-               }
-       }
-       if (NInoCompressed(ni) || NInoEncrypted(ni)) {
-               ntfs_warning(vi->i_sb, "Changes in inode size are not "
-                               "supported yet for %s files, ignoring.",
-                               NInoCompressed(ni) ? "compressed" :
-                               "encrypted");
-               err = -EOPNOTSUPP;
-               goto bad_out;
-       }
-       if (a->non_resident)
-               goto do_non_resident_truncate;
-       BUG_ON(NInoNonResident(ni));
-       /* Resize the attribute record to best fit the new attribute size. */
-       if (new_size < vol->mft_record_size &&
-                       !ntfs_resident_attr_value_resize(m, a, new_size)) {
-               /* The resize succeeded! */
-               flush_dcache_mft_record_page(ctx->ntfs_ino);
-               mark_mft_record_dirty(ctx->ntfs_ino);
-               write_lock_irqsave(&ni->size_lock, flags);
-               /* Update the sizes in the ntfs inode and all is done. */
-               ni->allocated_size = le32_to_cpu(a->length) -
-                               le16_to_cpu(a->data.resident.value_offset);
-               /*
-                * Note ntfs_resident_attr_value_resize() has already done any
-                * necessary data clearing in the attribute record.  When the
-                * file is being shrunk vmtruncate() will already have cleared
-                * the top part of the last partial page, i.e. since this is
-                * the resident case this is the page with index 0.  However,
-                * when the file is being expanded, the page cache page data
-                * between the old data_size, i.e. old_size, and the new_size
-                * has not been zeroed.  Fortunately, we do not need to zero it
-                * either since on one hand it will either already be zero due
-                * to both read_folio and writepage clearing partial page data
-                * beyond i_size in which case there is nothing to do or in the
-                * case of the file being mmap()ped at the same time, POSIX
-                * specifies that the behaviour is unspecified thus we do not
-                * have to do anything.  This means that in our implementation
-                * in the rare case that the file is mmap()ped and a write
-                * occurred into the mmap()ped region just beyond the file size
-                * and writepage has not yet been called to write out the page
-                * (which would clear the area beyond the file size) and we now
-                * extend the file size to incorporate this dirty region
-                * outside the file size, a write of the page would result in
-                * this data being written to disk instead of being cleared.
-                * Given that both POSIX and the Linux mmap(2) man page specify
-                * that this corner case is undefined, we choose to leave it
-                * like that, as this is much simpler for us: we cannot lock the
-                * relevant page now, since we are holding too many ntfs locks,
-                * which would result in a lock reversal deadlock.
-                */
-               ni->initialized_size = new_size;
-               write_unlock_irqrestore(&ni->size_lock, flags);
-               goto unm_done;
-       }
-       /* If the above resize failed, this must be an attribute extension. */
-       BUG_ON(size_change < 0);
-       /*
-        * We have to drop all the locks so we can call
-        * ntfs_attr_make_non_resident().  This could be optimised by try-
-        * locking the first page cache page and only if that fails dropping
-        * the locks, locking the page, and redoing all the locking and
-        * lookups.  While this would be a huge optimisation, it is not worth
-        * it, as this is definitely a slow code path: it can only ever happen
-        * once for any given file.
-        */
-       ntfs_attr_put_search_ctx(ctx);
-       unmap_mft_record(base_ni);
-       up_write(&ni->runlist.lock);
-       /*
-        * Not enough space in the mft record, try to make the attribute
-        * non-resident and if successful restart the truncation process.
-        */
-       err = ntfs_attr_make_non_resident(ni, old_size);
-       if (likely(!err))
-               goto retry_truncate;
-       /*
-        * Could not make non-resident.  If this is due to this not being
-        * permitted for this attribute type or there not being enough space,
-        * try to make other attributes non-resident.  Otherwise fail.
-        */
-       if (unlikely(err != -EPERM && err != -ENOSPC)) {
-               ntfs_error(vol->sb, "Cannot truncate inode 0x%lx, attribute "
-                               "type 0x%x, because the conversion from "
-                               "resident to non-resident attribute failed "
-                               "with error code %i.", vi->i_ino,
-                               (unsigned)le32_to_cpu(ni->type), err);
-               if (err != -ENOMEM)
-                       err = -EIO;
-               goto conv_err_out;
-       }
-       /* TODO: Not implemented from here, abort. */
-       if (err == -ENOSPC)
-               ntfs_error(vol->sb, "Not enough space in the mft record/on "
-                               "disk for the non-resident attribute value.  "
-                               "This case is not implemented yet.");
-       else /* if (err == -EPERM) */
-               ntfs_error(vol->sb, "This attribute type may not be "
-                               "non-resident.  This case is not implemented "
-                               "yet.");
-       err = -EOPNOTSUPP;
-       goto conv_err_out;
-#if 0
-       // TODO: Attempt to make other attributes non-resident.
-       if (!err)
-               goto do_resident_extend;
-       /*
-        * Both the attribute list attribute and the standard information
-        * attribute must remain in the base inode.  Thus, if this is one of
-        * these attributes, we have to try to move other attributes out into
-        * extent mft records instead.
-        */
-       if (ni->type == AT_ATTRIBUTE_LIST ||
-                       ni->type == AT_STANDARD_INFORMATION) {
-               // TODO: Attempt to move other attributes into extent mft
-               // records.
-               err = -EOPNOTSUPP;
-               if (!err)
-                       goto do_resident_extend;
-               goto err_out;
-       }
-       // TODO: Attempt to move this attribute to an extent mft record, but
-       // only if it is not already the only attribute in an mft record in
-       // which case there would be nothing to gain.
-       err = -EOPNOTSUPP;
-       if (!err)
-               goto do_resident_extend;
-       /* There is nothing we can do to make enough space. )-: */
-       goto err_out;
-#endif
-do_non_resident_truncate:
-       BUG_ON(!NInoNonResident(ni));
-       if (alloc_change < 0) {
-               highest_vcn = sle64_to_cpu(a->data.non_resident.highest_vcn);
-               if (highest_vcn > 0 &&
-                               old_alloc_size >> vol->cluster_size_bits >
-                               highest_vcn + 1) {
-                       /*
-                        * This attribute has multiple extents.  Not yet
-                        * supported.
-                        */
-                       ntfs_error(vol->sb, "Cannot truncate inode 0x%lx, "
-                                       "attribute type 0x%x, because the "
-                                       "attribute is highly fragmented (it "
-                                       "consists of multiple extents) and "
-                                       "this case is not implemented yet.",
-                                       vi->i_ino,
-                                       (unsigned)le32_to_cpu(ni->type));
-                       err = -EOPNOTSUPP;
-                       goto bad_out;
-               }
-       }
-       /*
-        * If the size is shrinking, need to reduce the initialized_size and
-        * the data_size before reducing the allocation.
-        */
-       if (size_change < 0) {
-               /*
-                * Make the valid size smaller (i_size is already up-to-date).
-                */
-               write_lock_irqsave(&ni->size_lock, flags);
-               if (new_size < ni->initialized_size) {
-                       ni->initialized_size = new_size;
-                       a->data.non_resident.initialized_size =
-                                       cpu_to_sle64(new_size);
-               }
-               a->data.non_resident.data_size = cpu_to_sle64(new_size);
-               write_unlock_irqrestore(&ni->size_lock, flags);
-               flush_dcache_mft_record_page(ctx->ntfs_ino);
-               mark_mft_record_dirty(ctx->ntfs_ino);
-               /* If the allocated size is not changing, we are done. */
-               if (!alloc_change)
-                       goto unm_done;
-               /*
-                * If the size is shrinking it makes no sense for the
-                * allocation to be growing.
-                */
-               BUG_ON(alloc_change > 0);
-       } else /* if (size_change >= 0) */ {
-               /*
-                * The file size is growing or staying the same but the
-                * allocation can be shrinking, growing or staying the same.
-                */
-               if (alloc_change > 0) {
-                       /*
-                        * We need to extend the allocation and possibly update
-                        * the data size.  If we are updating the data size,
-                        * since we are not touching the initialized_size we do
-                        * not need to worry about the actual data on disk.
-                        * And as far as the page cache is concerned, there
-                        * will be no pages beyond the old data size and any
-                        * partial region in the last page between the old and
-                        * new data size (or the end of the page if the new
-                        * data size is outside the page) does not need to be
-                        * modified as explained above for the resident
-                        * attribute truncate case.  To do this, we simply drop
-                        * the locks we hold and leave all the work to our
-                        * friendly helper ntfs_attr_extend_allocation().
-                        */
-                       ntfs_attr_put_search_ctx(ctx);
-                       unmap_mft_record(base_ni);
-                       up_write(&ni->runlist.lock);
-                       err = ntfs_attr_extend_allocation(ni, new_size,
-                                       size_change > 0 ? new_size : -1, -1);
-                       /*
-                        * ntfs_attr_extend_allocation() will have done error
-                        * output already.
-                        */
-                       goto done;
-               }
-               if (!alloc_change)
-                       goto alloc_done;
-       }
-       /* alloc_change < 0 */
-       /* Free the clusters. */
-       nr_freed = ntfs_cluster_free(ni, new_alloc_size >>
-                       vol->cluster_size_bits, -1, ctx);
-       m = ctx->mrec;
-       a = ctx->attr;
-       if (unlikely(nr_freed < 0)) {
-               ntfs_error(vol->sb, "Failed to release cluster(s) (error code "
-                               "%lli).  Unmount and run chkdsk to recover "
-                               "the lost cluster(s).", (long long)nr_freed);
-               NVolSetErrors(vol);
-               nr_freed = 0;
-       }
-       /* Truncate the runlist. */
-       err = ntfs_rl_truncate_nolock(vol, &ni->runlist,
-                       new_alloc_size >> vol->cluster_size_bits);
-       /*
-        * If the runlist truncation failed and/or the search context is no
-        * longer valid, we cannot resize the attribute record or build the
-        * mapping pairs array thus we mark the inode bad so that no access to
-        * the freed clusters can happen.
-        */
-       if (unlikely(err || IS_ERR(m))) {
-               ntfs_error(vol->sb, "Failed to %s (error code %li).%s",
-                               IS_ERR(m) ?
-                               "restore attribute search context" :
-                               "truncate attribute runlist",
-                               IS_ERR(m) ? PTR_ERR(m) : err, es);
-               err = -EIO;
-               goto bad_out;
-       }
-       /* Get the size for the shrunk mapping pairs array for the runlist. */
-       mp_size = ntfs_get_size_for_mapping_pairs(vol, ni->runlist.rl, 0, -1);
-       if (unlikely(mp_size <= 0)) {
-               ntfs_error(vol->sb, "Cannot shrink allocation of inode 0x%lx, "
-                               "attribute type 0x%x, because determining the "
-                               "size for the mapping pairs failed with error "
-                               "code %i.%s", vi->i_ino,
-                               (unsigned)le32_to_cpu(ni->type), mp_size, es);
-               err = -EIO;
-               goto bad_out;
-       }
-       /*
-        * Shrink the attribute record for the new mapping pairs array.  Note,
-        * this cannot fail since we are making the attribute smaller thus by
-        * definition there is enough space to do so.
-        */
-       err = ntfs_attr_record_resize(m, a, mp_size +
-                       le16_to_cpu(a->data.non_resident.mapping_pairs_offset));
-       BUG_ON(err);
-       /*
-        * Generate the mapping pairs array directly into the attribute record.
-        */
-       err = ntfs_mapping_pairs_build(vol, (u8*)a +
-                       le16_to_cpu(a->data.non_resident.mapping_pairs_offset),
-                       mp_size, ni->runlist.rl, 0, -1, NULL);
-       if (unlikely(err)) {
-               ntfs_error(vol->sb, "Cannot shrink allocation of inode 0x%lx, "
-                               "attribute type 0x%x, because building the "
-                               "mapping pairs failed with error code %i.%s",
-                               vi->i_ino, (unsigned)le32_to_cpu(ni->type),
-                               err, es);
-               err = -EIO;
-               goto bad_out;
-       }
-       /* Update the allocated/compressed size as well as the highest vcn. */
-       a->data.non_resident.highest_vcn = cpu_to_sle64((new_alloc_size >>
-                       vol->cluster_size_bits) - 1);
-       write_lock_irqsave(&ni->size_lock, flags);
-       ni->allocated_size = new_alloc_size;
-       a->data.non_resident.allocated_size = cpu_to_sle64(new_alloc_size);
-       if (NInoSparse(ni) || NInoCompressed(ni)) {
-               if (nr_freed) {
-                       ni->itype.compressed.size -= nr_freed <<
-                                       vol->cluster_size_bits;
-                       BUG_ON(ni->itype.compressed.size < 0);
-                       a->data.non_resident.compressed_size = cpu_to_sle64(
-                                       ni->itype.compressed.size);
-                       vi->i_blocks = ni->itype.compressed.size >> 9;
-               }
-       } else
-               vi->i_blocks = new_alloc_size >> 9;
-       write_unlock_irqrestore(&ni->size_lock, flags);
-       /*
-        * We have shrunk the allocation.  If this is a shrinking truncate we
-        * have already dealt with the initialized_size and the data_size above
-        * and we are done.  If the truncate is only changing the allocation
-        * and not the data_size, we are also done.  If this is an extending
-        * truncate, we still need to extend the data_size now, which is
-        * handled below because @size_change is positive.
-        */
-alloc_done:
-       /*
-        * If the size is growing, need to update it now.  If it is shrinking,
-        * we have already updated it above (before the allocation change).
-        */
-       if (size_change > 0)
-               a->data.non_resident.data_size = cpu_to_sle64(new_size);
-       /* Ensure the modified mft record is written out. */
-       flush_dcache_mft_record_page(ctx->ntfs_ino);
-       mark_mft_record_dirty(ctx->ntfs_ino);
-unm_done:
-       ntfs_attr_put_search_ctx(ctx);
-       unmap_mft_record(base_ni);
-       up_write(&ni->runlist.lock);
-done:
-       /* Update the mtime and ctime on the base inode. */
-       /* Normally ->truncate shouldn't update ctime or mtime, but
-        * ntfs did before, so it got a copy & paste version of
-        * file_update_time().  One day someone should fix this for
-        * real.
-        */
-       if (!IS_NOCMTIME(VFS_I(base_ni)) && !IS_RDONLY(VFS_I(base_ni))) {
-               struct timespec64 now = current_time(VFS_I(base_ni));
-               struct timespec64 ctime = inode_get_ctime(VFS_I(base_ni));
-               struct timespec64 mtime = inode_get_mtime(VFS_I(base_ni));
-               int sync_it = 0;
-
-               if (!timespec64_equal(&mtime, &now) ||
-                   !timespec64_equal(&ctime, &now))
-                       sync_it = 1;
-               inode_set_ctime_to_ts(VFS_I(base_ni), now);
-               inode_set_mtime_to_ts(VFS_I(base_ni), now);
-
-               if (sync_it)
-                       mark_inode_dirty_sync(VFS_I(base_ni));
-       }
-
-       if (likely(!err)) {
-               NInoClearTruncateFailed(ni);
-               ntfs_debug("Done.");
-       }
-       return err;
-old_bad_out:
-       old_size = -1;
-bad_out:
-       if (err != -ENOMEM && err != -EOPNOTSUPP)
-               NVolSetErrors(vol);
-       if (err != -EOPNOTSUPP)
-               NInoSetTruncateFailed(ni);
-       else if (old_size >= 0)
-               i_size_write(vi, old_size);
-err_out:
-       if (ctx)
-               ntfs_attr_put_search_ctx(ctx);
-       if (m)
-               unmap_mft_record(base_ni);
-       up_write(&ni->runlist.lock);
-out:
-       ntfs_debug("Failed.  Returning error code %i.", err);
-       return err;
-conv_err_out:
-       if (err != -ENOMEM && err != -EOPNOTSUPP)
-               NVolSetErrors(vol);
-       if (err != -EOPNOTSUPP)
-               NInoSetTruncateFailed(ni);
-       else
-               i_size_write(vi, old_size);
-       goto out;
-}
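
Two different roundings drive the new_alloc_size computation near the top of ntfs_truncate(): cluster granularity for non-resident attributes and 8-byte granularity for resident attribute values inside the mft record. A sketch of that arithmetic with hypothetical helper names (the ntfs_volume fields are the driver's own):

/* Hypothetical helpers mirroring the rounding done in ntfs_truncate(). */

/* Non-resident attributes are allocated in whole clusters. */
static inline s64 ntfs_round_to_clusters_sketch(const ntfs_volume *vol,
                s64 size)
{
        return (size + vol->cluster_size - 1) & ~(s64)vol->cluster_size_mask;
}

/* Resident attribute values are 8-byte aligned inside the mft record. */
static inline s64 ntfs_round_to_eights_sketch(s64 size)
{
        return (size + 7) & ~(s64)7;
}
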
-
-/**
- * ntfs_truncate_vfs - wrapper for ntfs_truncate() that has no return value
- * @vi:                inode for which the i_size was changed
- *
- * Wrapper for ntfs_truncate() that has no return value.
- *
- * See ntfs_truncate() description above for details.
- */
-#ifdef NTFS_RW
-void ntfs_truncate_vfs(struct inode *vi)
-{
-       ntfs_truncate(vi);
-}
-#endif
-
-/**
- * ntfs_setattr - called from notify_change() when an attribute is being changed
- * @idmap:     idmap of the mount the inode was found from
- * @dentry:    dentry whose attributes to change
- * @attr:      structure describing the attributes and the changes
- *
- * We have to trap VFS attempts to truncate the file described by @dentry as
- * soon as possible, because we do not implement changes in i_size for
- * compressed or encrypted files yet.  So we abort those i_size changes here.
- *
- * We also abort all changes of user, group, and mode as we do not implement
- * the NTFS ACLs yet.
- *
- * Called with ->i_mutex held.
- */
-int ntfs_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
-                struct iattr *attr)
-{
-       struct inode *vi = d_inode(dentry);
-       int err;
-       unsigned int ia_valid = attr->ia_valid;
-
-       err = setattr_prepare(&nop_mnt_idmap, dentry, attr);
-       if (err)
-               goto out;
-       /* We do not support NTFS ACLs yet. */
-       if (ia_valid & (ATTR_UID | ATTR_GID | ATTR_MODE)) {
-               ntfs_warning(vi->i_sb, "Changes in user/group/mode are not "
-                               "supported yet, ignoring.");
-               err = -EOPNOTSUPP;
-               goto out;
-       }
-       if (ia_valid & ATTR_SIZE) {
-               if (attr->ia_size != i_size_read(vi)) {
-                       ntfs_inode *ni = NTFS_I(vi);
-                       /*
-                        * FIXME: For now we do not support resizing of
-                        * compressed or encrypted files yet.
-                        */
-                       if (NInoCompressed(ni) || NInoEncrypted(ni)) {
-                               ntfs_warning(vi->i_sb, "Changes in inode size "
-                                               "are not supported yet for "
-                                               "%s files, ignoring.",
-                                               NInoCompressed(ni) ?
-                                               "compressed" : "encrypted");
-                               err = -EOPNOTSUPP;
-                       } else {
-                               truncate_setsize(vi, attr->ia_size);
-                               ntfs_truncate_vfs(vi);
-                       }
-                       if (err || ia_valid == ATTR_SIZE)
-                               goto out;
-               } else {
-                       /*
-                        * We skipped the truncate but must still update
-                        * timestamps.
-                        */
-                       ia_valid |= ATTR_MTIME | ATTR_CTIME;
-               }
-       }
-       if (ia_valid & ATTR_ATIME)
-               inode_set_atime_to_ts(vi, attr->ia_atime);
-       if (ia_valid & ATTR_MTIME)
-               inode_set_mtime_to_ts(vi, attr->ia_mtime);
-       if (ia_valid & ATTR_CTIME)
-               inode_set_ctime_to_ts(vi, attr->ia_ctime);
-       mark_inode_dirty(vi);
-out:
-       return err;
-}
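
ntfs_setattr() reaches the VFS through the driver's inode_operations. A sketch of the wiring, with an illustrative table name (the real tables live elsewhere in the driver, e.g. fs/ntfs/file.c; only the hook discussed here is shown):

#include <linux/fs.h>

static const struct inode_operations ntfs_file_inode_ops_sketch = {
        .setattr = ntfs_setattr,        /* Trap size/owner/mode changes. */
};
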
-
-/**
- * __ntfs_write_inode - write out a dirty inode
- * @vi:                inode to write out
- * @sync:      if true, write out synchronously
- *
- * Write out a dirty inode to disk including any extent inodes if present.
- *
- * If @sync is true, commit the inode to disk and wait for io completion.  This
- * is done using write_mft_record().
- *
- * If @sync is false, just schedule the write to happen but do not wait for i/o
- * completion.  In 2.6 kernels, scheduling usually happens just by virtue of
- * marking the page (and in this case mft record) dirty, but we do not
- * implement this yet, as write_mft_record() largely ignores the @sync
- * parameter and always performs synchronous writes.
- *
- * Return 0 on success and -errno on error.
- */
-int __ntfs_write_inode(struct inode *vi, int sync)
-{
-       sle64 nt;
-       ntfs_inode *ni = NTFS_I(vi);
-       ntfs_attr_search_ctx *ctx;
-       MFT_RECORD *m;
-       STANDARD_INFORMATION *si;
-       int err = 0;
-       bool modified = false;
-
-       ntfs_debug("Entering for %sinode 0x%lx.", NInoAttr(ni) ? "attr " : "",
-                       vi->i_ino);
-       /*
-        * Dirty attribute inodes are written via their real inodes so just
-        * clean them here.  Access time updates are taken care of when the
-        * real inode is written.
-        */
-       if (NInoAttr(ni)) {
-               NInoClearDirty(ni);
-               ntfs_debug("Done.");
-               return 0;
-       }
-       /* Map, pin, and lock the mft record belonging to the inode. */
-       m = map_mft_record(ni);
-       if (IS_ERR(m)) {
-               err = PTR_ERR(m);
-               goto err_out;
-       }
-       /* Update the access times in the standard information attribute. */
-       ctx = ntfs_attr_get_search_ctx(ni, m);
-       if (unlikely(!ctx)) {
-               err = -ENOMEM;
-               goto unm_err_out;
-       }
-       err = ntfs_attr_lookup(AT_STANDARD_INFORMATION, NULL, 0,
-                       CASE_SENSITIVE, 0, NULL, 0, ctx);
-       if (unlikely(err)) {
-               ntfs_attr_put_search_ctx(ctx);
-               goto unm_err_out;
-       }
-       si = (STANDARD_INFORMATION*)((u8*)ctx->attr +
-                       le16_to_cpu(ctx->attr->data.resident.value_offset));
-       /* Update the access times if they have changed. */
-       nt = utc2ntfs(inode_get_mtime(vi));
-       if (si->last_data_change_time != nt) {
-               ntfs_debug("Updating mtime for inode 0x%lx: old = 0x%llx, "
-                               "new = 0x%llx", vi->i_ino, (long long)
-                               sle64_to_cpu(si->last_data_change_time),
-                               (long long)sle64_to_cpu(nt));
-               si->last_data_change_time = nt;
-               modified = true;
-       }
-       nt = utc2ntfs(inode_get_ctime(vi));
-       if (si->last_mft_change_time != nt) {
-               ntfs_debug("Updating ctime for inode 0x%lx: old = 0x%llx, "
-                               "new = 0x%llx", vi->i_ino, (long long)
-                               sle64_to_cpu(si->last_mft_change_time),
-                               (long long)sle64_to_cpu(nt));
-               si->last_mft_change_time = nt;
-               modified = true;
-       }
-       nt = utc2ntfs(inode_get_atime(vi));
-       if (si->last_access_time != nt) {
-               ntfs_debug("Updating atime for inode 0x%lx: old = 0x%llx, "
-                               "new = 0x%llx", vi->i_ino,
-                               (long long)sle64_to_cpu(si->last_access_time),
-                               (long long)sle64_to_cpu(nt));
-               si->last_access_time = nt;
-               modified = true;
-       }
-       /*
-        * If we just modified the standard information attribute we need to
-        * mark the mft record it is in dirty.  We do this manually so that
-        * mark_inode_dirty() is not called which would redirty the inode and
-        * hence result in an infinite loop of trying to write the inode.
-        * There is no need to mark the base inode nor the base mft record
-        * dirty, since we are going to write this mft record below in any case
-        * and the base mft record may actually not have been modified so it
-        * might not need to be written out.
-        * NOTE: It is not a problem when the inode for $MFT itself is being
-        * written out as mark_ntfs_record_dirty() will only set I_DIRTY_PAGES
-        * on the $MFT inode and hence __ntfs_write_inode() will not be
-        * re-invoked because of it which in turn is ok since the dirtied mft
-        * record will be cleaned and written out to disk below, i.e. before
-        * this function returns.
-        */
-       if (modified) {
-               flush_dcache_mft_record_page(ctx->ntfs_ino);
-               if (!NInoTestSetDirty(ctx->ntfs_ino))
-                       mark_ntfs_record_dirty(ctx->ntfs_ino->page,
-                                       ctx->ntfs_ino->page_ofs);
-       }
-       ntfs_attr_put_search_ctx(ctx);
-       /* Now the access times are updated, write the base mft record. */
-       if (NInoDirty(ni))
-               err = write_mft_record(ni, m, sync);
-       /* Write all attached extent mft records. */
-       mutex_lock(&ni->extent_lock);
-       if (ni->nr_extents > 0) {
-               ntfs_inode **extent_nis = ni->ext.extent_ntfs_inos;
-               int i;
-
-               ntfs_debug("Writing %i extent inodes.", ni->nr_extents);
-               for (i = 0; i < ni->nr_extents; i++) {
-                       ntfs_inode *tni = extent_nis[i];
-
-                       if (NInoDirty(tni)) {
-                               MFT_RECORD *tm = map_mft_record(tni);
-                               int ret;
-
-                               if (IS_ERR(tm)) {
-                                       if (!err || err == -ENOMEM)
-                                               err = PTR_ERR(tm);
-                                       continue;
-                               }
-                               ret = write_mft_record(tni, tm, sync);
-                               unmap_mft_record(tni);
-                               if (unlikely(ret)) {
-                                       if (!err || err == -ENOMEM)
-                                               err = ret;
-                               }
-                       }
-               }
-       }
-       mutex_unlock(&ni->extent_lock);
-       unmap_mft_record(ni);
-       if (unlikely(err))
-               goto err_out;
-       ntfs_debug("Done.");
-       return 0;
-unm_err_out:
-       unmap_mft_record(ni);
-err_out:
-       if (err == -ENOMEM) {
-               ntfs_warning(vi->i_sb, "Not enough memory to write inode.  "
-                               "Marking the inode dirty again, so the VFS "
-                               "retries later.");
-               mark_inode_dirty(vi);
-       } else {
-               ntfs_error(vi->i_sb, "Failed (error %i):  Run chkdsk.", -err);
-               NVolSetErrors(ni->vol);
-       }
-       return err;
-}
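
The utc2ntfs() calls above convert Unix timestamps into NTFS on-disk time, which counts 100-nanosecond intervals since 1601-01-01 and is stored little-endian. A sketch of such a conversion, assuming the usual epoch-offset constant (the driver's real helper lives in fs/ntfs/time.h; names suffixed _SKETCH/_sketch are illustrative):

#include <linux/time64.h>

/* 369 years and 89 leap days separate 1601-01-01 from the Unix epoch. */
#define NTFS_TIME_OFFSET_SKETCH ((s64)(369 * 365 + 89) * 24 * 3600 * 10000000)

static inline s64 utc2ntfs_sketch(const struct timespec64 ts)
{
        /* 100ns intervals since 1601; the result is then byte-swapped
         * to little-endian before hitting the disk. */
        return (s64)ts.tv_sec * 10000000 + ts.tv_nsec / 100 +
                        NTFS_TIME_OFFSET_SKETCH;
}
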
-
-#endif /* NTFS_RW */
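
The @sync flag passed to __ntfs_write_inode() comes from the VFS writeback machinery; a ->write_inode wrapper of roughly this shape (in fs/ntfs/super.c) translates the writeback mode into it:

#include <linux/writeback.h>

static int ntfs_write_inode_sketch(struct inode *vi,
                struct writeback_control *wbc)
{
        /* WB_SYNC_ALL callers wait for I/O, so write synchronously. */
        return __ntfs_write_inode(vi, wbc->sync_mode == WB_SYNC_ALL);
}
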
diff --git a/fs/ntfs/inode.h b/fs/ntfs/inode.h
deleted file mode 100644 (file)
index 147ef4d..0000000
+++ /dev/null
@@ -1,310 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * inode.h - Defines for the inode structures of the NTFS Linux kernel driver.
- *          Part of the Linux-NTFS project.
- *
- * Copyright (c) 2001-2007 Anton Altaparmakov
- * Copyright (c) 2002 Richard Russon
- */
-
-#ifndef _LINUX_NTFS_INODE_H
-#define _LINUX_NTFS_INODE_H
-
-#include <linux/atomic.h>
-
-#include <linux/fs.h>
-#include <linux/list.h>
-#include <linux/mm.h>
-#include <linux/mutex.h>
-#include <linux/seq_file.h>
-
-#include "layout.h"
-#include "volume.h"
-#include "types.h"
-#include "runlist.h"
-#include "debug.h"
-
-typedef struct _ntfs_inode ntfs_inode;
-
-/*
- * The NTFS in-memory inode structure. It is just used as an extension to the
- * fields already provided in the VFS inode.
- */
-struct _ntfs_inode {
-       rwlock_t size_lock;     /* Lock serializing access to inode sizes. */
-       s64 initialized_size;   /* Copy from the attribute record. */
-       s64 allocated_size;     /* Copy from the attribute record. */
-       unsigned long state;    /* NTFS specific flags describing this inode.
-                                  See ntfs_inode_state_bits below. */
-       unsigned long mft_no;   /* Number of the mft record / inode. */
-       u16 seq_no;             /* Sequence number of the mft record. */
-       atomic_t count;         /* Inode reference count for book keeping. */
-       ntfs_volume *vol;       /* Pointer to the ntfs volume of this inode. */
-       /*
-        * If NInoAttr() is true, the below fields describe the attribute which
-        * this fake inode belongs to. The actual inode of this attribute is
-        * pointed to by base_ntfs_ino and nr_extents is always set to -1 (see
-        * below). For real inodes, we also set the type (AT_DATA for files and
-        * AT_INDEX_ALLOCATION for directories), with the name = NULL and
-        * name_len = 0 for files and name = I30 (global constant) and
-        * name_len = 4 for directories.
-        */
-       ATTR_TYPE type; /* Attribute type of this fake inode. */
-       ntfschar *name;         /* Attribute name of this fake inode. */
-       u32 name_len;           /* Attribute name length of this fake inode. */
-       runlist runlist;        /* If state has the NI_NonResident bit set,
-                                  the runlist of the unnamed data attribute
-                                  (if a file) or of the index allocation
-                                  attribute (directory) or of the attribute
-                                  described by the fake inode (if NInoAttr()).
-                                  If runlist.rl is NULL, the runlist has not
-                                  been read in yet or has been unmapped. If
-                                  NI_NonResident is clear, the attribute is
-                                  resident (file and fake inode) or there is
-                                  no $I30 index allocation attribute
-                                  (small directory). In the latter case
-                                  runlist.rl is always NULL. */
-       /*
-        * The following fields are only valid for real inodes and extent
-        * inodes.
-        */
-       struct mutex mrec_lock; /* Lock for serializing access to the
-                                  mft record belonging to this inode. */
-       struct page *page;      /* The page containing the mft record of the
-                                  inode. This should only be touched by the
-                                  (un)map_mft_record*() functions. */
-       int page_ofs;           /* Offset into the page at which the mft record
-                                  begins. This should only be touched by the
-                                  (un)map_mft_record*() functions. */
-       /*
-        * Attribute list support (only for use by the attribute lookup
-        * functions). Setup during read_inode for all inodes with attribute
-        * functions). Set up during read_inode for all inodes with attribute
-        * further only valid if NI_AttrListNonResident is set.
-        */
-       u32 attr_list_size;     /* Length of attribute list value in bytes. */
-       u8 *attr_list;          /* Attribute list value itself. */
-       runlist attr_list_rl;   /* Run list for the attribute list value. */
-       union {
-               struct { /* It is a directory, $MFT, or an index inode. */
-                       u32 block_size;         /* Size of an index block. */
-                       u32 vcn_size;           /* Size of a vcn in this
-                                                  index. */
-                       COLLATION_RULE collation_rule; /* The collation rule
-                                                  for the index. */
-                       u8 block_size_bits;     /* Log2 of the above. */
-                       u8 vcn_size_bits;       /* Log2 of the above. */
-               } index;
-               struct { /* It is a compressed/sparse file/attribute inode. */
-                       s64 size;               /* Copy of compressed_size from
-                                                  $DATA. */
-                       u32 block_size;         /* Size of a compression block
-                                                  (cb). */
-                       u8 block_size_bits;     /* Log2 of the size of a cb. */
-                       u8 block_clusters;      /* Number of clusters per cb. */
-               } compressed;
-       } itype;
-       struct mutex extent_lock;       /* Lock for accessing/modifying the
-                                          below fields. */
-       s32 nr_extents; /* For a base mft record, the number of attached extent
-                          inodes (0 if none), for extent records and for fake
-                          inodes describing an attribute this is -1. */
-       union {         /* This union is only used if nr_extents != 0. */
-               ntfs_inode **extent_ntfs_inos;  /* For nr_extents > 0, array of
-                                                  the ntfs inodes of the extent
-                                                  mft records belonging to
-                                                  this base inode which have
-                                                  been loaded. */
-               ntfs_inode *base_ntfs_ino;      /* For nr_extents == -1, the
-                                                  ntfs inode of the base mft
-                                                  record. For fake inodes, the
-                                                  real (base) inode to which
-                                                  the attribute belongs. */
-       } ext;
-};
-
-/*
- * Defined bits for the state field in the ntfs_inode structure.
- * (f) = files only, (d) = directories only, (a) = attributes/fake inodes only
- */
-typedef enum {
-       NI_Dirty,               /* 1: Mft record needs to be written to disk. */
-       NI_AttrList,            /* 1: Mft record contains an attribute list. */
-       NI_AttrListNonResident, /* 1: Attribute list is non-resident. Implies
-                                     NI_AttrList is set. */
-
-       NI_Attr,                /* 1: Fake inode for attribute i/o.
-                                  0: Real inode or extent inode. */
-
-       NI_MstProtected,        /* 1: Attribute is protected by MST fixups.
-                                  0: Attribute is not protected by fixups. */
-       NI_NonResident,         /* 1: Unnamed data attr is non-resident (f).
-                                  1: Attribute is non-resident (a). */
-       NI_IndexAllocPresent = NI_NonResident,  /* 1: $I30 index alloc attr is
-                                                  present (d). */
-       NI_Compressed,          /* 1: Unnamed data attr is compressed (f).
-                                  1: Create compressed files by default (d).
-                                  1: Attribute is compressed (a). */
-       NI_Encrypted,           /* 1: Unnamed data attr is encrypted (f).
-                                  1: Create encrypted files by default (d).
-                                  1: Attribute is encrypted (a). */
-       NI_Sparse,              /* 1: Unnamed data attr is sparse (f).
-                                  1: Create sparse files by default (d).
-                                  1: Attribute is sparse (a). */
-       NI_SparseDisabled,      /* 1: May not create sparse regions. */
-       NI_TruncateFailed,      /* 1: Last ntfs_truncate() call failed. */
-} ntfs_inode_state_bits;
-
-/*
- * NOTE: We should be adding dirty mft records to a list somewhere and they
- * should be independent of the (ntfs/vfs) inode structure so that an inode can
- * be removed but the record can be left dirty for syncing later.
- */
-
-/*
- * Macro tricks to expand the NInoFoo(), NInoSetFoo(), and NInoClearFoo()
- * functions.
- */
-#define NINO_FNS(flag)                                 \
-static inline int NIno##flag(ntfs_inode *ni)           \
-{                                                      \
-       return test_bit(NI_##flag, &(ni)->state);       \
-}                                                      \
-static inline void NInoSet##flag(ntfs_inode *ni)       \
-{                                                      \
-       set_bit(NI_##flag, &(ni)->state);               \
-}                                                      \
-static inline void NInoClear##flag(ntfs_inode *ni)     \
-{                                                      \
-       clear_bit(NI_##flag, &(ni)->state);             \
-}
-
-/*
- * As above for NInoTestSetFoo() and NInoTestClearFoo().
- */
-#define TAS_NINO_FNS(flag)                                     \
-static inline int NInoTestSet##flag(ntfs_inode *ni)            \
-{                                                              \
-       return test_and_set_bit(NI_##flag, &(ni)->state);       \
-}                                                              \
-static inline int NInoTestClear##flag(ntfs_inode *ni)          \
-{                                                              \
-       return test_and_clear_bit(NI_##flag, &(ni)->state);     \
-}
-
-/* Emit the ntfs inode bitops functions. */
-NINO_FNS(Dirty)
-TAS_NINO_FNS(Dirty)
-NINO_FNS(AttrList)
-NINO_FNS(AttrListNonResident)
-NINO_FNS(Attr)
-NINO_FNS(MstProtected)
-NINO_FNS(NonResident)
-NINO_FNS(IndexAllocPresent)
-NINO_FNS(Compressed)
-NINO_FNS(Encrypted)
-NINO_FNS(Sparse)
-NINO_FNS(SparseDisabled)
-NINO_FNS(TruncateFailed)
-
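For reference, the generators above expand mechanically. NINO_FNS(Dirty), for example, emits exactly these three helpers (expansion shown for illustration; it follows directly from the macro definition above):

static inline int NInoDirty(ntfs_inode *ni)
{
        return test_bit(NI_Dirty, &(ni)->state);
}
static inline void NInoSetDirty(ntfs_inode *ni)
{
        set_bit(NI_Dirty, &(ni)->state);
}
static inline void NInoClearDirty(ntfs_inode *ni)
{
        clear_bit(NI_Dirty, &(ni)->state);
}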
-/*
- * The full structure containing a ntfs_inode and a vfs struct inode. Used for
- * all real and fake inodes but not for extent inodes which lack the vfs struct
- * inode.
- */
-typedef struct {
-       ntfs_inode ntfs_inode;
-       struct inode vfs_inode;         /* The vfs inode structure. */
-} big_ntfs_inode;
-
-/**
- * NTFS_I - return the ntfs inode given a vfs inode
- * @inode:     VFS inode
- *
- * NTFS_I() returns the ntfs inode associated with the VFS @inode.
- */
-static inline ntfs_inode *NTFS_I(struct inode *inode)
-{
-       return (ntfs_inode *)container_of(inode, big_ntfs_inode, vfs_inode);
-}
-
-static inline struct inode *VFS_I(ntfs_inode *ni)
-{
-       return &((big_ntfs_inode *)ni)->vfs_inode;
-}
-
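NTFS_I() relies on ntfs_inode being the first member of big_ntfs_inode (so the cast is equivalent to taking the address of the containing structure), while VFS_I() goes the other way. A self-contained userspace sketch of the same container_of() round trip, with stand-in struct names rather than the kernel types:

#include <assert.h>
#include <stddef.h>

struct demo_vfs_inode { int dummy; };
struct demo_ntfs_inode { long mft_no; };

struct demo_big_inode {
        struct demo_ntfs_inode ntfs;    /* first member, as above */
        struct demo_vfs_inode vfs;
};

#define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

int main(void)
{
        struct demo_big_inode big = { { 42 }, { 0 } };
        struct demo_vfs_inode *vi = &big.vfs;
        struct demo_big_inode *b;

        /* The same pointer arithmetic NTFS_I() performs. */
        b = container_of(vi, struct demo_big_inode, vfs);
        assert(&b->ntfs == &big.ntfs);
        return 0;
}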
-/**
- * ntfs_attr - ntfs in memory attribute structure
- * @mft_no:    mft record number of the base mft record of this attribute
- * @name:      Unicode name of the attribute (NULL if unnamed)
- * @name_len:  length of @name in Unicode characters (0 if unnamed)
- * @type:      attribute type (see layout.h)
- *
- * This structure exists only to provide a small structure for the
- * ntfs_{attr_}iget()/ntfs_test_inode()/ntfs_init_locked_inode() mechanism.
- *
- * NOTE: Elements are ordered by size to make the structure as compact as
- * possible on all architectures.
- */
-typedef struct {
-       unsigned long mft_no;
-       ntfschar *name;
-       u32 name_len;
-       ATTR_TYPE type;
-} ntfs_attr;
-
-extern int ntfs_test_inode(struct inode *vi, void *data);
-
-extern struct inode *ntfs_iget(struct super_block *sb, unsigned long mft_no);
-extern struct inode *ntfs_attr_iget(struct inode *base_vi, ATTR_TYPE type,
-               ntfschar *name, u32 name_len);
-extern struct inode *ntfs_index_iget(struct inode *base_vi, ntfschar *name,
-               u32 name_len);
-
-extern struct inode *ntfs_alloc_big_inode(struct super_block *sb);
-extern void ntfs_free_big_inode(struct inode *inode);
-extern void ntfs_evict_big_inode(struct inode *vi);
-
-extern void __ntfs_init_inode(struct super_block *sb, ntfs_inode *ni);
-
-static inline void ntfs_init_big_inode(struct inode *vi)
-{
-       ntfs_inode *ni = NTFS_I(vi);
-
-       ntfs_debug("Entering.");
-       __ntfs_init_inode(vi->i_sb, ni);
-       ni->mft_no = vi->i_ino;
-}
-
-extern ntfs_inode *ntfs_new_extent_inode(struct super_block *sb,
-               unsigned long mft_no);
-extern void ntfs_clear_extent_inode(ntfs_inode *ni);
-
-extern int ntfs_read_inode_mount(struct inode *vi);
-
-extern int ntfs_show_options(struct seq_file *sf, struct dentry *root);
-
-#ifdef NTFS_RW
-
-extern int ntfs_truncate(struct inode *vi);
-extern void ntfs_truncate_vfs(struct inode *vi);
-
-extern int ntfs_setattr(struct mnt_idmap *idmap,
-                       struct dentry *dentry, struct iattr *attr);
-
-extern int __ntfs_write_inode(struct inode *vi, int sync);
-
-static inline void ntfs_commit_inode(struct inode *vi)
-{
-       if (!is_bad_inode(vi))
-               __ntfs_write_inode(vi, 1);
-       return;
-}
-
-#else
-
-static inline void ntfs_truncate_vfs(struct inode *vi) {}
-
-#endif /* NTFS_RW */
-
-#endif /* _LINUX_NTFS_INODE_H */
diff --git a/fs/ntfs/layout.h b/fs/ntfs/layout.h
deleted file mode 100644
index 5d4bf7a..0000000
--- a/fs/ntfs/layout.h
+++ /dev/null
@@ -1,2421 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * layout.h - All NTFS associated on-disk structures. Part of the Linux-NTFS
- *           project.
- *
- * Copyright (c) 2001-2005 Anton Altaparmakov
- * Copyright (c) 2002 Richard Russon
- */
-
-#ifndef _LINUX_NTFS_LAYOUT_H
-#define _LINUX_NTFS_LAYOUT_H
-
-#include <linux/types.h>
-#include <linux/bitops.h>
-#include <linux/list.h>
-#include <asm/byteorder.h>
-
-#include "types.h"
-
-/* The NTFS oem_id "NTFS    " */
-#define magicNTFS      cpu_to_le64(0x202020205346544eULL)
-
-/*
- * Location of bootsector on partition:
- *     The standard NTFS_BOOT_SECTOR is on sector 0 of the partition.
-        *     On NT4 and above there is one backup copy of the boot sector to
-        *     be found on the last sector of the partition (not normally accessible
-        *     from within Windows as the number of sectors value contained in the
-        *     bootsector is one less than the actual value!).
- *     On versions of NT 3.51 and earlier, the backup copy was located at
- *     number of sectors/2 (integer divide), i.e. in the middle of the volume.
- */
-
-/*
- * BIOS parameter block (bpb) structure.
- */
-typedef struct {
-       le16 bytes_per_sector;          /* Size of a sector in bytes. */
-       u8  sectors_per_cluster;        /* Size of a cluster in sectors. */
-       le16 reserved_sectors;          /* zero */
-       u8  fats;                       /* zero */
-       le16 root_entries;              /* zero */
-       le16 sectors;                   /* zero */
-       u8  media_type;                 /* 0xf8 = hard disk */
-       le16 sectors_per_fat;           /* zero */
-       le16 sectors_per_track;         /* irrelevant */
-       le16 heads;                     /* irrelevant */
-       le32 hidden_sectors;            /* zero */
-       le32 large_sectors;             /* zero */
-} __attribute__ ((__packed__)) BIOS_PARAMETER_BLOCK;
-
-/*
- * NTFS boot sector structure.
- */
-typedef struct {
-       u8  jump[3];                    /* Irrelevant (jump to boot up code).*/
-       le64 oem_id;                    /* Magic "NTFS    ". */
-       BIOS_PARAMETER_BLOCK bpb;       /* See BIOS_PARAMETER_BLOCK. */
-       u8  unused[4];                  /* zero, NTFS diskedit.exe states that
-                                          this is actually:
-                                               __u8 physical_drive;    // 0x80
-                                               __u8 current_head;      // zero
-                                               __u8 extended_boot_signature;
-                                                                       // 0x80
-                                               __u8 unused;            // zero
-                                        */
-/*0x28*/sle64 number_of_sectors;       /* Number of sectors in volume. Gives
-                                          maximum volume size of 2^63 sectors.
-                                          Assuming standard sector size of 512
-                                          bytes, the maximum byte size is
-                                          approx. 4.7x10^21 bytes. (-; */
-       sle64 mft_lcn;                  /* Cluster location of mft data. */
-       sle64 mftmirr_lcn;              /* Cluster location of copy of mft. */
-       s8  clusters_per_mft_record;    /* Mft record size in clusters. */
-       u8  reserved0[3];               /* zero */
-       s8  clusters_per_index_record;  /* Index block size in clusters. */
-       u8  reserved1[3];               /* zero */
-       le64 volume_serial_number;      /* Irrelevant (serial number). */
-       le32 checksum;                  /* Boot sector checksum. */
-/*0x54*/u8  bootstrap[426];            /* Irrelevant (boot up code). */
-       le16 end_of_sector_marker;      /* End of bootsector magic. Always is
-                                          0xaa55 in little endian. */
-/* sizeof() = 512 (0x200) bytes */
-} __attribute__ ((__packed__)) NTFS_BOOT_SECTOR;
-
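As an illustration of the fields above, a minimal sanity check on a boot sector would test the "NTFS    " oem_id and the 0xaa55 end-of-sector marker. This is only a sketch with a hypothetical helper name; the removed driver's real mount-time validation was considerably more thorough:

static inline bool ntfs_boot_sector_looks_sane(const NTFS_BOOT_SECTOR *b)
{
        return b->oem_id == magicNTFS &&
               b->end_of_sector_marker == cpu_to_le16(0xaa55);
}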
-/*
- * Magic identifiers present at the beginning of all ntfs record containing
- * records (like mft records for example).
- */
-enum {
-       /* Found in $MFT/$DATA. */
-       magic_FILE = cpu_to_le32(0x454c4946), /* Mft entry. */
-       magic_INDX = cpu_to_le32(0x58444e49), /* Index buffer. */
-       magic_HOLE = cpu_to_le32(0x454c4f48), /* ? (NTFS 3.0+?) */
-
-       /* Found in $LogFile/$DATA. */
-       magic_RSTR = cpu_to_le32(0x52545352), /* Restart page. */
-       magic_RCRD = cpu_to_le32(0x44524352), /* Log record page. */
-
-       /* Found in $LogFile/$DATA.  (May be found in $MFT/$DATA, also?) */
-       magic_CHKD = cpu_to_le32(0x444b4843), /* Modified by chkdsk. */
-
-       /* Found in all ntfs record containing records. */
-       magic_BAAD = cpu_to_le32(0x44414142), /* Failed multi sector
-                                                      transfer was detected. */
-       /*
-        * Found in $LogFile/$DATA when a page is full of 0xff bytes and is
-        * thus not initialized.  Page must be initialized before using it.
-        */
-       magic_empty = cpu_to_le32(0xffffffff) /* Record is empty. */
-};
-
-typedef le32 NTFS_RECORD_TYPE;
-
-/*
- * Generic magic comparison macros. Finally found a use for the ## preprocessor
- * operator! (-8
- */
-
-static inline bool __ntfs_is_magic(le32 x, NTFS_RECORD_TYPE r)
-{
-       return (x == r);
-}
-#define ntfs_is_magic(x, m)    __ntfs_is_magic(x, magic_##m)
-
-static inline bool __ntfs_is_magicp(le32 *p, NTFS_RECORD_TYPE r)
-{
-       return (*p == r);
-}
-#define ntfs_is_magicp(p, m)   __ntfs_is_magicp(p, magic_##m)
-
-/*
- * Specialised magic comparison macros for the NTFS_RECORD_TYPEs defined above.
- */
-#define ntfs_is_file_record(x)         ( ntfs_is_magic (x, FILE) )
-#define ntfs_is_file_recordp(p)                ( ntfs_is_magicp(p, FILE) )
-#define ntfs_is_mft_record(x)          ( ntfs_is_file_record (x) )
-#define ntfs_is_mft_recordp(p)         ( ntfs_is_file_recordp(p) )
-#define ntfs_is_indx_record(x)         ( ntfs_is_magic (x, INDX) )
-#define ntfs_is_indx_recordp(p)                ( ntfs_is_magicp(p, INDX) )
-#define ntfs_is_hole_record(x)         ( ntfs_is_magic (x, HOLE) )
-#define ntfs_is_hole_recordp(p)                ( ntfs_is_magicp(p, HOLE) )
-
-#define ntfs_is_rstr_record(x)         ( ntfs_is_magic (x, RSTR) )
-#define ntfs_is_rstr_recordp(p)                ( ntfs_is_magicp(p, RSTR) )
-#define ntfs_is_rcrd_record(x)         ( ntfs_is_magic (x, RCRD) )
-#define ntfs_is_rcrd_recordp(p)                ( ntfs_is_magicp(p, RCRD) )
-
-#define ntfs_is_chkd_record(x)         ( ntfs_is_magic (x, CHKD) )
-#define ntfs_is_chkd_recordp(p)                ( ntfs_is_magicp(p, CHKD) )
-
-#define ntfs_is_baad_record(x)         ( ntfs_is_magic (x, BAAD) )
-#define ntfs_is_baad_recordp(p)                ( ntfs_is_magicp(p, BAAD) )
-
-#define ntfs_is_empty_record(x)                ( ntfs_is_magic (x, empty) )
-#define ntfs_is_empty_recordp(p)       ( ntfs_is_magicp(p, empty) )
-
-/*
- * The Update Sequence Array (usa) is an array of the le16 values which belong
- * to the end of each sector protected by the update sequence record in which
- * this array is contained. Note that the first entry is the Update Sequence
- * Number (usn), a cyclic counter of how many times the protected record has
- * been written to disk. The values 0 and -1 (i.e. 0xffff) are not used. All
- * last le16's of each sector have to be equal to the usn (during reading) or
- * are set to it (during writing). If they are not, an incomplete multi sector
- * transfer has occurred when the data was written.
- * The maximum size for the update sequence array is fixed to:
- *     maximum size = usa_ofs + (usa_count * 2) = 510 bytes
- * The 510 bytes comes from the fact that the last le16 in the array has to
- * (obviously) finish before the last le16 of the first 512-byte sector.
- * This formula can be used as a consistency check in that usa_ofs +
- * (usa_count * 2) has to be less than or equal to 510.
- */
-typedef struct {
-       NTFS_RECORD_TYPE magic; /* A four-byte magic identifying the record
-                                  type and/or status. */
-       le16 usa_ofs;           /* Offset to the Update Sequence Array (usa)
-                                  from the start of the ntfs record. */
-       le16 usa_count;         /* Number of le16 sized entries in the usa
-                                  including the Update Sequence Number (usn),
-                                  thus the number of fixups is the usa_count
-                                  minus 1. */
-} __attribute__ ((__packed__)) NTFS_RECORD;
-
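The maximum-size formula above translates directly into a consistency check. A sketch (hypothetical helper name), using only the NTFS_RECORD fields just defined:

static inline bool ntfs_usa_fits(const NTFS_RECORD *r)
{
        u32 end = (u32)le16_to_cpu(r->usa_ofs) +
                  (u32)le16_to_cpu(r->usa_count) * 2;

        return end <= 510;
}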
-/*
- * System files mft record numbers. All these files are always marked as used
- * in the bitmap attribute of the mft; presumably in order to avoid accidental
- * allocation for random other mft records. Also, the sequence number for each
- * of the system files is always equal to their mft record number and it is
- * never modified.
- */
-typedef enum {
-       FILE_MFT       = 0,     /* Master file table (mft). Data attribute
-                                  contains the entries and bitmap attribute
-                                  records which ones are in use (bit==1). */
-       FILE_MFTMirr   = 1,     /* Mft mirror: copy of first four mft records
-                                  in data attribute. If cluster size > 4kiB,
-                                  copy of first N mft records, with
-                                       N = cluster_size / mft_record_size. */
-       FILE_LogFile   = 2,     /* Journalling log in data attribute. */
-       FILE_Volume    = 3,     /* Volume name attribute and volume information
-                                  attribute (flags and ntfs version). Windows
-                                  refers to this file as volume DASD (Direct
-                                  Access Storage Device). */
-       FILE_AttrDef   = 4,     /* Array of attribute definitions in data
-                                  attribute. */
-       FILE_root      = 5,     /* Root directory. */
-       FILE_Bitmap    = 6,     /* Allocation bitmap of all clusters (lcns) in
-                                  data attribute. */
-       FILE_Boot      = 7,     /* Boot sector (always at cluster 0) in data
-                                  attribute. */
-       FILE_BadClus   = 8,     /* Contains all bad clusters in the non-resident
-                                  data attribute. */
-       FILE_Secure    = 9,     /* Shared security descriptors in data attribute
-                                  and two indexes into the descriptors.
-                                  Appeared in Windows 2000. Before that, this
-                                  file was named $Quota but was unused. */
-       FILE_UpCase    = 10,    /* Uppercase equivalents of all 65536 Unicode
-                                  characters in data attribute. */
-       FILE_Extend    = 11,    /* Directory containing other system files (e.g.
-                                  $ObjId, $Quota, $Reparse and $UsnJrnl). This
-                                  is new to NTFS3.0. */
-       FILE_reserved12 = 12,   /* Reserved for future use (records 12-15). */
-       FILE_reserved13 = 13,
-       FILE_reserved14 = 14,
-       FILE_reserved15 = 15,
-       FILE_first_user = 16,   /* First user file, used as test limit for
-                                  whether to allow opening a file or not. */
-} NTFS_SYSTEM_FILES;
-
-/*
- * These are the so far known MFT_RECORD_* flags (16-bit) which contain
- * information about the mft record in which they are present.
- */
-enum {
-       MFT_RECORD_IN_USE       = cpu_to_le16(0x0001),
-       MFT_RECORD_IS_DIRECTORY = cpu_to_le16(0x0002),
-} __attribute__ ((__packed__));
-
-typedef le16 MFT_RECORD_FLAGS;
-
-/*
- * mft references (aka file references or file record segment references) are
- * used whenever a structure needs to refer to a record in the mft.
- *
- * A reference consists of a 48-bit index into the mft and a 16-bit sequence
- * number used to detect stale references.
- *
- * For error reporting purposes we treat the 48-bit index as a signed quantity.
- *
- * The sequence number is a circular counter (skipping 0) describing how many
- * times the referenced mft record has been (re)used. This has to match the
- * sequence number of the mft record being referenced, otherwise the reference
- * is considered stale and removed (FIXME: only ntfsck or the driver itself?).
- *
- * If the sequence number is zero it is assumed that no sequence number
- * consistency checking should be performed.
- *
- * FIXME: Since inodes are 32-bit as of now, the driver needs to always check
- * for high_part being 0 and if not either BUG(), cause a panic() or handle
- * the situation in some other way. This shouldn't be a problem as a volume has
- * to become HUGE in order to need more than 32-bits worth of mft records.
- * Assuming the standard mft record size of 1kiB, only the records (never mind
- * the non-resident attributes, etc.) would require 4TiB of space on their own
- * for the first 32 bits worth of records. This is only if some strange person
- * doesn't decide to foul play and make the mft sparse which would be a really
- * horrible thing to do as it would trash our current driver implementation. )-:
- * Do I hear screams "we want 64-bit inodes!" ?!? (-;
- *
- * FIXME: The mft zone is defined as the first 12% of the volume. This space is
- * reserved so that the mft can grow contiguously and hence doesn't become
- * fragmented. Volume free space includes the empty part of the mft zone and
- * when the volume's free 88% are used up, the mft zone is shrunk by a factor
- * of 2, thus making more space available for more files/data. This process is
- * repeated every time there is no more free space except for the mft zone until
- * there really is no more free space.
- */
-
-/*
- * Typedef the MFT_REF as a 64-bit value for easier handling.
- * Also define two unpacking macros to get to the reference (MREF) and
- * sequence number (MSEQNO) respectively.
- * The _LE versions are to be applied on little endian MFT_REFs.
- * Note: The _LE versions will return a CPU endian formatted value!
- */
-#define MFT_REF_MASK_CPU 0x0000ffffffffffffULL
-#define MFT_REF_MASK_LE cpu_to_le64(MFT_REF_MASK_CPU)
-
-typedef u64 MFT_REF;
-typedef le64 leMFT_REF;
-
-#define MK_MREF(m, s)  ((MFT_REF)(((MFT_REF)(s) << 48) |               \
-                                       ((MFT_REF)(m) & MFT_REF_MASK_CPU)))
-#define MK_LE_MREF(m, s) cpu_to_le64(MK_MREF(m, s))
-
-#define MREF(x)                ((unsigned long)((x) & MFT_REF_MASK_CPU))
-#define MSEQNO(x)      ((u16)(((x) >> 48) & 0xffff))
-#define MREF_LE(x)     ((unsigned long)(le64_to_cpu(x) & MFT_REF_MASK_CPU))
-#define MSEQNO_LE(x)   ((u16)((le64_to_cpu(x) >> 48) & 0xffff))
-
-#define IS_ERR_MREF(x) (((x) & 0x0000800000000000ULL) ? true : false)
-#define ERR_MREF(x)    ((u64)((s64)(x)))
-#define MREF_ERR(x)    ((int)((s64)(x)))
-
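A worked example of the packing and unpacking above, made self-contained with userspace typedefs standing in for the kernel ones (the values are arbitrary):

#include <assert.h>
#include <stdint.h>

typedef uint64_t MFT_REF;
#define MFT_REF_MASK_CPU 0x0000ffffffffffffULL
#define MK_MREF(m, s)   ((MFT_REF)(((MFT_REF)(s) << 48) | \
                                   ((MFT_REF)(m) & MFT_REF_MASK_CPU)))
#define MREF(x)         ((uint64_t)((x) & MFT_REF_MASK_CPU))
#define MSEQNO(x)       ((uint16_t)(((x) >> 48) & 0xffff))

int main(void)
{
        MFT_REF ref = MK_MREF(0x12345, 7); /* record 0x12345, seqno 7 */

        assert(MREF(ref) == 0x12345);   /* low 48 bits: the mft index */
        assert(MSEQNO(ref) == 7);       /* top 16 bits: the sequence number */
        return 0;
}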
-/*
- * The mft record header present at the beginning of every record in the mft.
- * This is followed by a sequence of variable length attribute records which
- * is terminated by an attribute of type AT_END which is a truncated attribute
- * in that it only consists of the attribute type code AT_END and none of the
- * other members of the attribute structure are present.
- */
-typedef struct {
-/*Ofs*/
-/*  0  NTFS_RECORD; -- Unfolded here as gcc doesn't like unnamed structs. */
-       NTFS_RECORD_TYPE magic; /* Usually the magic is "FILE". */
-       le16 usa_ofs;           /* See NTFS_RECORD definition above. */
-       le16 usa_count;         /* See NTFS_RECORD definition above. */
-
-/*  8*/        le64 lsn;               /* $LogFile sequence number for this record.
-                                  Changed every time the record is modified. */
-/* 16*/        le16 sequence_number;   /* Number of times this mft record has been
-                                  reused. (See description for MFT_REF
-                                  above.) NOTE: The increment (skipping zero)
-                                  is done when the file is deleted. NOTE: If
-                                  this is zero it is left zero. */
-/* 18*/        le16 link_count;        /* Number of hard links, i.e. the number of
-                                  directory entries referencing this record.
-                                  NOTE: Only used in mft base records.
-                                  NOTE: When deleting a directory entry we
-                                  check the link_count and if it is 1 we
-                                  delete the file. Otherwise we delete the
-                                  FILE_NAME_ATTR being referenced by the
-                                  directory entry from the mft record and
-                                  decrement the link_count.
-                                  FIXME: Careful with Win32 + DOS names! */
-/* 20*/        le16 attrs_offset;      /* Byte offset to the first attribute in this
-                                  mft record from the start of the mft record.
-                                  NOTE: Must be aligned to 8-byte boundary. */
-/* 22*/        MFT_RECORD_FLAGS flags; /* Bit array of MFT_RECORD_FLAGS. When a file
-                                  is deleted, the MFT_RECORD_IN_USE flag is
-                                  set to zero. */
-/* 24*/        le32 bytes_in_use;      /* Number of bytes used in this mft record.
-                                  NOTE: Must be aligned to 8-byte boundary. */
-/* 28*/        le32 bytes_allocated;   /* Number of bytes allocated for this mft
-                                  record. This should be equal to the mft
-                                  record size. */
-/* 32*/        leMFT_REF base_mft_record;/* This is zero for base mft records.
-                                  When it is not zero it is a mft reference
-                                  pointing to the base mft record to which
-                                  this record belongs (this is then used to
-                                  locate the attribute list attribute present
-                                  in the base record which describes this
-                                  extension record and hence might need
-                                  modification when the extension record
-                                  itself is modified, also locating the
-                                  attribute list also means finding the other
-                                  potential extents, belonging to the non-base
-                                  mft record). */
-/* 40*/        le16 next_attr_instance;/* The instance number that will be assigned to
-                                  the next attribute added to this mft record.
-                                  NOTE: Incremented each time after it is used.
-                                  NOTE: Every time the mft record is reused
-                                  this number is set to zero.  NOTE: The first
-                                  instance number is always 0. */
-/* The below fields are specific to NTFS 3.1+ (Windows XP and above): */
-/* 42*/ le16 reserved;         /* Reserved/alignment. */
-/* 44*/ le32 mft_record_number;        /* Number of this mft record. */
-/* sizeof() = 48 bytes */
-/*
- * When (re)using the mft record, we place the update sequence array at this
- * offset, i.e. before we start with the attributes.  This also makes sense,
- * otherwise we could run into problems with the update sequence array
- * containing in itself the last two bytes of a sector which would mean that
- * multi sector transfer protection wouldn't work.  As you can't protect data
- * by overwriting it since you then can't get it back...
- * When reading we obviously use the data from the ntfs record header.
- */
-} __attribute__ ((__packed__)) MFT_RECORD;
-
-/* This is the version without the NTFS 3.1+ specific fields. */
-typedef struct {
-/*Ofs*/
-/*  0  NTFS_RECORD; -- Unfolded here as gcc doesn't like unnamed structs. */
-       NTFS_RECORD_TYPE magic; /* Usually the magic is "FILE". */
-       le16 usa_ofs;           /* See NTFS_RECORD definition above. */
-       le16 usa_count;         /* See NTFS_RECORD definition above. */
-
-/*  8*/        le64 lsn;               /* $LogFile sequence number for this record.
-                                  Changed every time the record is modified. */
-/* 16*/        le16 sequence_number;   /* Number of times this mft record has been
-                                  reused. (See description for MFT_REF
-                                  above.) NOTE: The increment (skipping zero)
-                                  is done when the file is deleted. NOTE: If
-                                  this is zero it is left zero. */
-/* 18*/        le16 link_count;        /* Number of hard links, i.e. the number of
-                                  directory entries referencing this record.
-                                  NOTE: Only used in mft base records.
-                                  NOTE: When deleting a directory entry we
-                                  check the link_count and if it is 1 we
-                                  delete the file. Otherwise we delete the
-                                  FILE_NAME_ATTR being referenced by the
-                                  directory entry from the mft record and
-                                  decrement the link_count.
-                                  FIXME: Careful with Win32 + DOS names! */
-/* 20*/        le16 attrs_offset;      /* Byte offset to the first attribute in this
-                                  mft record from the start of the mft record.
-                                  NOTE: Must be aligned to 8-byte boundary. */
-/* 22*/        MFT_RECORD_FLAGS flags; /* Bit array of MFT_RECORD_FLAGS. When a file
-                                  is deleted, the MFT_RECORD_IN_USE flag is
-                                  set to zero. */
-/* 24*/        le32 bytes_in_use;      /* Number of bytes used in this mft record.
-                                  NOTE: Must be aligned to 8-byte boundary. */
-/* 28*/        le32 bytes_allocated;   /* Number of bytes allocated for this mft
-                                  record. This should be equal to the mft
-                                  record size. */
-/* 32*/        leMFT_REF base_mft_record;/* This is zero for base mft records.
-                                  When it is not zero it is a mft reference
-                                  pointing to the base mft record to which
-                                  this record belongs (this is then used to
-                                  locate the attribute list attribute present
-                                  in the base record which describes this
-                                  extension record and hence might need
-                                  modification when the extension record
-                                  itself is modified, also locating the
-                                  attribute list also means finding the other
-                                  potential extents, belonging to the non-base
-                                  mft record). */
-/* 40*/        le16 next_attr_instance;/* The instance number that will be assigned to
-                                  the next attribute added to this mft record.
-                                  NOTE: Incremented each time after it is used.
-                                  NOTE: Every time the mft record is reused
-                                  this number is set to zero.  NOTE: The first
-                                  instance number is always 0. */
-/* sizeof() = 42 bytes */
-/*
- * When (re)using the mft record, we place the update sequence array at this
- * offset, i.e. before we start with the attributes.  This also makes sense,
- * otherwise we could run into problems with the update sequence array
- * containing in itself the last two bytes of a sector which would mean that
- * multi sector transfer protection wouldn't work.  As you can't protect data
- * by overwriting it since you then can't get it back...
- * When reading we obviously use the data from the ntfs record header.
- */
-} __attribute__ ((__packed__)) MFT_RECORD_OLD;
-
-/*
- * System defined attributes (32-bit).  Each attribute type has a corresponding
- * attribute name (Unicode string of maximum 64 character length) as described
- * by the attribute definitions present in the data attribute of the $AttrDef
- * system file.  On NTFS 3.0 volumes the names are just as the types are named
- * in the below defines exchanging AT_ for the dollar sign ($).  If that is not
- * a revealing choice of symbol I do not know what is... (-;
- */
-enum {
-       AT_UNUSED                       = cpu_to_le32(         0),
-       AT_STANDARD_INFORMATION         = cpu_to_le32(      0x10),
-       AT_ATTRIBUTE_LIST               = cpu_to_le32(      0x20),
-       AT_FILE_NAME                    = cpu_to_le32(      0x30),
-       AT_OBJECT_ID                    = cpu_to_le32(      0x40),
-       AT_SECURITY_DESCRIPTOR          = cpu_to_le32(      0x50),
-       AT_VOLUME_NAME                  = cpu_to_le32(      0x60),
-       AT_VOLUME_INFORMATION           = cpu_to_le32(      0x70),
-       AT_DATA                         = cpu_to_le32(      0x80),
-       AT_INDEX_ROOT                   = cpu_to_le32(      0x90),
-       AT_INDEX_ALLOCATION             = cpu_to_le32(      0xa0),
-       AT_BITMAP                       = cpu_to_le32(      0xb0),
-       AT_REPARSE_POINT                = cpu_to_le32(      0xc0),
-       AT_EA_INFORMATION               = cpu_to_le32(      0xd0),
-       AT_EA                           = cpu_to_le32(      0xe0),
-       AT_PROPERTY_SET                 = cpu_to_le32(      0xf0),
-       AT_LOGGED_UTILITY_STREAM        = cpu_to_le32(     0x100),
-       AT_FIRST_USER_DEFINED_ATTRIBUTE = cpu_to_le32(    0x1000),
-       AT_END                          = cpu_to_le32(0xffffffff)
-};
-
-typedef le32 ATTR_TYPE;
-
-/*
- * The collation rules for sorting views/indexes/etc (32-bit).
- *
- * COLLATION_BINARY - Collate by binary compare where the first byte is most
- *     significant.
- * COLLATION_UNICODE_STRING - Collate Unicode strings by comparing their binary
- *     Unicode values, except that when a character can be uppercased, the
- *     upper case value collates before the lower case one.
- * COLLATION_FILE_NAME - Collate file names as Unicode strings. The collation
- *     is done very much like COLLATION_UNICODE_STRING. In fact I have no idea
- *     what the difference is. Perhaps the difference is that file names
- *     would treat some special characters in an odd way (see
- *     unistr.c::ntfs_collate_names() and unistr.c::legal_ansi_char_array[]
- *     for what I mean), whereas COLLATION_UNICODE_STRING would not give any
- *     special treatment to any characters at all; but this is speculation.
- * COLLATION_NTOFS_ULONG - Sorting is done according to ascending le32 key
- *     values. E.g. used for $SII index in FILE_Secure, which sorts by
- *     security_id (le32).
- * COLLATION_NTOFS_SID - Sorting is done according to ascending SID values.
- *     E.g. used for $O index in FILE_Extend/$Quota.
- * COLLATION_NTOFS_SECURITY_HASH - Sorting is done first by ascending hash
- *     values and second by ascending security_id values. E.g. used for $SDH
- *     index in FILE_Secure.
- * COLLATION_NTOFS_ULONGS - Sorting is done according to a sequence of ascending
- *     le32 key values. E.g. used for $O index in FILE_Extend/$ObjId, which
- *     sorts by object_id (16-byte), by splitting up the object_id in four
- *     le32 values and using them as individual keys. E.g. take the following
- *     two object_ids, stored as follows on disk:
- *             1st: a1 61 65 b7 65 7b d4 11 9e 3d 00 e0 81 10 42 59
- *             2nd: 38 14 37 d2 d2 f3 d4 11 a5 21 c8 6b 79 b1 97 45
- *     To compare them, they are split into four le32 values each, like so:
- *             1st: 0xb76561a1 0x11d47b65 0xe0003d9e 0x59421081
- *             2nd: 0xd2371438 0x11d4f3d2 0x6bc821a5 0x4597b179
- *     Now, it is apparent why the 2nd object_id collates after the 1st: the
- *     first le32 value of the 1st object_id is less than the first le32 of
- *     the 2nd object_id. If the first le32 values of both object_ids were
- *     equal then the second le32 values would be compared, etc.
- */
-enum {
-       COLLATION_BINARY                = cpu_to_le32(0x00),
-       COLLATION_FILE_NAME             = cpu_to_le32(0x01),
-       COLLATION_UNICODE_STRING        = cpu_to_le32(0x02),
-       COLLATION_NTOFS_ULONG           = cpu_to_le32(0x10),
-       COLLATION_NTOFS_SID             = cpu_to_le32(0x11),
-       COLLATION_NTOFS_SECURITY_HASH   = cpu_to_le32(0x12),
-       COLLATION_NTOFS_ULONGS          = cpu_to_le32(0x13),
-};
-
-typedef le32 COLLATION_RULE;
-
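A sketch of the COLLATION_NTOFS_ULONGS comparison described above: the key is treated as a sequence of le32 values which are compared in order, most significant first. The helper name is hypothetical; it is not the removed driver's comparator:

static int ntofs_ulongs_cmp(const le32 *a, const le32 *b, int nr)
{
        int i;

        for (i = 0; i < nr; i++) {
                u32 va = le32_to_cpu(a[i]);
                u32 vb = le32_to_cpu(b[i]);

                if (va < vb)
                        return -1;
                if (va > vb)
                        return 1;
        }
        return 0;       /* all nr keys equal */
}

For a 16-byte object_id, nr would be 4, matching the split shown in the comment above.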
-/*
- * The flags (32-bit) describing attribute properties in the attribute
- * definition structure.  FIXME: This information is based on Regis's
- * information and, according to him, it is not certain and probably
- * incomplete.  The INDEXABLE flag is fairly certainly correct as only the file
- * name attribute has this flag set and this is the only attribute indexed in
- * NT4.
- */
-enum {
-       ATTR_DEF_INDEXABLE      = cpu_to_le32(0x02), /* Attribute can be
-                                       indexed. */
-       ATTR_DEF_MULTIPLE       = cpu_to_le32(0x04), /* Attribute type
-                                       can be present multiple times in the
-                                       mft records of an inode. */
-       ATTR_DEF_NOT_ZERO       = cpu_to_le32(0x08), /* Attribute value
-                                       must contain at least one non-zero
-                                       byte. */
-       ATTR_DEF_INDEXED_UNIQUE = cpu_to_le32(0x10), /* Attribute must be
-                                       indexed and the attribute value must be
-                                       unique for the attribute type in all of
-                                       the mft records of an inode. */
-       ATTR_DEF_NAMED_UNIQUE   = cpu_to_le32(0x20), /* Attribute must be
-                                       named and the name must be unique for
-                                       the attribute type in all of the mft
-                                       records of an inode. */
-       ATTR_DEF_RESIDENT       = cpu_to_le32(0x40), /* Attribute must be
-                                       resident. */
-       ATTR_DEF_ALWAYS_LOG     = cpu_to_le32(0x80), /* Always log
-                                       modifications to this attribute,
-                                       regardless of whether it is resident or
-                                       non-resident.  Without this, only log
-                                       modifications if the attribute is
-                                       resident. */
-};
-
-typedef le32 ATTR_DEF_FLAGS;
-
-/*
- * The data attribute of FILE_AttrDef contains a sequence of attribute
- * definitions for the NTFS volume. With this, it is supposed to be safe for an
- * older NTFS driver to mount a volume containing a newer NTFS version without
- * damaging it (that's the theory. In practice it's: not damaging it too much).
- * Entries are sorted by attribute type. The flags describe whether the
- * attribute can be resident/non-resident and possibly other things, but the
- * actual bits are unknown.
- */
-typedef struct {
-/*hex ofs*/
-/*  0*/        ntfschar name[0x40];            /* Unicode name of the attribute. Zero
-                                          terminated. */
-/* 80*/        ATTR_TYPE type;                 /* Type of the attribute. */
-/* 84*/        le32 display_rule;              /* Default display rule.
-                                          FIXME: What does it mean? (AIA) */
-/* 88*/ COLLATION_RULE collation_rule; /* Default collation rule. */
-/* 8c*/        ATTR_DEF_FLAGS flags;           /* Flags describing the attribute. */
-/* 90*/        sle64 min_size;                 /* Optional minimum attribute size. */
-/* 98*/        sle64 max_size;                 /* Maximum size of attribute. */
-/* sizeof() = 0xa0 or 160 bytes */
-} __attribute__ ((__packed__)) ATTR_DEF;
-
-/*
- * Attribute flags (16-bit).
- */
-enum {
-       ATTR_IS_COMPRESSED    = cpu_to_le16(0x0001),
-       ATTR_COMPRESSION_MASK = cpu_to_le16(0x00ff), /* Compression method
-                                                             mask.  Also, first
-                                                             illegal value. */
-       ATTR_IS_ENCRYPTED     = cpu_to_le16(0x4000),
-       ATTR_IS_SPARSE        = cpu_to_le16(0x8000),
-} __attribute__ ((__packed__));
-
-typedef le16 ATTR_FLAGS;
-
-/*
- * Attribute compression.
- *
- * Only the data attribute is ever compressed in the current ntfs driver in
- * Windows. Further, compression is only applied when the data attribute is
- * non-resident. Finally, to use compression, the maximum allowed cluster size
- * on a volume is 4kiB.
- *
- * The compression method is based on independently compressing blocks of X
- * clusters, where X is determined from the compression_unit value found in the
- * non-resident attribute record header (more precisely: X = 2^compression_unit
- * clusters). On Windows NT/2k, X always is 16 clusters (compression_unit = 4).
- *
- * There are three different cases of how a compression block of X clusters
- * can be stored:
- *
- *   1) The data in the block is all zero (a sparse block):
- *       This is stored as a sparse block in the runlist, i.e. the runlist
- *       entry has length = X and lcn = -1. The mapping pairs array actually
- *       uses a delta_lcn value length of 0, i.e. delta_lcn is not present at
- *       all, which is then interpreted by the driver as lcn = -1.
- *       NOTE: Even uncompressed files can be sparse on NTFS 3.0 volumes, in
- *       which case the same principles apply as above, except that the length
- *       is not restricted to being any particular value.
- *
- *   2) The data in the block is not compressed:
- *       This happens when compression doesn't reduce the size of the block
- *       in clusters. I.e. if compression has a small effect so that the
- *       compressed data still occupies X clusters, then the uncompressed data
- *       is stored in the block.
- *       This case is recognised by the fact that the runlist entry has
- *       length = X and lcn >= 0. The mapping pairs array stores this as
- *       normal with a run length of X and some specific delta_lcn, i.e.
- *       delta_lcn has to be present.
- *
- *   3) The data in the block is compressed:
- *       The common case. This case is recognised by the fact that the run
- *       list entry has length L < X and lcn >= 0. The mapping pairs array
- *       stores this as normal with a run length of X and some specific
- *       delta_lcn, i.e. delta_lcn has to be present. This runlist entry is
- *       immediately followed by a sparse entry with length = X - L and
- *       lcn = -1. The latter entry is to make up the vcn counting to the
- *       full compression block size X.
- *
- * In fact, life is more complicated because adjacent entries of the same type
- * can be coalesced. This means that one has to keep track of the number of
- * clusters handled and work on a basis of X clusters at a time being one
- * block. An example: if length L > X this means that this particular runlist
- * entry contains a block of length X and part of one or more blocks of length
- * L - X. Another example: if length L < X, this does not necessarily mean that
- * the block is compressed as it might be that the lcn changes inside the block
- * and hence the following runlist entry describes the continuation of the
- * potentially compressed block. The block would be compressed if the
- * following runlist entry describes at least X - L sparse clusters, thus
- * making up the compression block length as described in point 3 above. (Of
- * course, there can be several runlist entries with small lengths so that the
- * sparse entry does not follow the first data containing entry with
- * length < X.)
- *
- * NOTE: At the end of the compressed attribute value, there most likely is not
- * just the right amount of data to make up a compression block, so no attempt
- * is made to compress this tail. It is just stored as is, unless compression
- * reduces the number of clusters it occupies, in which case it is stored as a
- * compressed compression block, complete with sparse clusters at the end.
- */
-
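The three cases reduce to a simple classification once runlist entries have been re-split into whole compression blocks. A sketch (names illustrative; real code must first undo the coalescing described above):

enum cb_kind { CB_SPARSE, CB_UNCOMPRESSED, CB_COMPRESSED };

/* @length/@lcn describe the first runlist entry of one compression
 * block; @cb_clusters is X = 2^compression_unit. */
static enum cb_kind classify_cb(s64 length, s64 lcn, s64 cb_clusters)
{
        if (lcn < 0)                    /* case 1: all-zero, sparse block */
                return CB_SPARSE;
        if (length >= cb_clusters)      /* case 2: stored uncompressed */
                return CB_UNCOMPRESSED;
        return CB_COMPRESSED;           /* case 3: L < X, sparse tail follows */
}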
-/*
- * Flags of resident attributes (8-bit).
- */
-enum {
-       RESIDENT_ATTR_IS_INDEXED = 0x01, /* Attribute is referenced in an index
-                                           (has implications for deleting and
-                                           modifying the attribute). */
-} __attribute__ ((__packed__));
-
-typedef u8 RESIDENT_ATTR_FLAGS;
-
-/*
- * Attribute record header. Always aligned to 8-byte boundary.
- */
-typedef struct {
-/*Ofs*/
-/*  0*/        ATTR_TYPE type;         /* The (32-bit) type of the attribute. */
-/*  4*/        le32 length;            /* Byte size of the resident part of the
-                                  attribute (aligned to 8-byte boundary).
-                                  Used to get to the next attribute. */
-/*  8*/        u8 non_resident;        /* If 0, attribute is resident.
-                                  If 1, attribute is non-resident. */
-/*  9*/        u8 name_length;         /* Unicode character size of name of attribute.
-                                  0 if unnamed. */
-/* 10*/        le16 name_offset;       /* If name_length != 0, the byte offset to the
-                                  beginning of the name from the attribute
-                                  record. Note that the name is stored as a
-                                  Unicode string. When creating, place offset
-                                  just at the end of the record header. Then,
-                                  follow with attribute value or mapping pairs
-                                  array, resident and non-resident attributes
-                                  respectively, aligning to an 8-byte
-                                  boundary. */
-/* 12*/        ATTR_FLAGS flags;       /* Flags describing the attribute. */
-/* 14*/        le16 instance;          /* The instance of this attribute record. This
-                                  number is unique within this mft record (see
-                                  MFT_RECORD/next_attribute_instance notes in
-                                  mft.h for more details). */
-/* 16*/        union {
-               /* Resident attributes. */
-               struct {
-/* 16 */               le32 value_length;/* Byte size of attribute value. */
-/* 20 */               le16 value_offset;/* Byte offset of the attribute
-                                            value from the start of the
-                                            attribute record. When creating,
-                                            align to 8-byte boundary if we
-                                            have a name present as this might
-                                            not have a length of a multiple
-                                            of 8-bytes. */
-/* 22 */               RESIDENT_ATTR_FLAGS flags; /* See above. */
-/* 23 */               s8 reserved;      /* Reserved/alignment to 8-byte
-                                            boundary. */
-               } __attribute__ ((__packed__)) resident;
-               /* Non-resident attributes. */
-               struct {
-/* 16*/                        leVCN lowest_vcn;/* Lowest valid virtual cluster number
-                               for this portion of the attribute value or
-                               0 if this is the only extent (usually the
-                               case). - Only when an attribute list is used
-                               does lowest_vcn != 0 ever occur. */
-/* 24*/                        leVCN highest_vcn;/* Highest valid vcn of this extent of
-                               the attribute value. - Usually there is only one
-                               portion, so this usually equals the attribute
-                               value size in clusters minus 1. Can be -1 for
-                               zero length files. Can be 0 for "single extent"
-                               attributes. */
-/* 32*/                        le16 mapping_pairs_offset; /* Byte offset from the
-                               beginning of the structure to the mapping pairs
-                               array which contains the mappings between the
-                               vcns and the logical cluster numbers (lcns).
-                               When creating, place this at the end of this
-                               record header aligned to 8-byte boundary. */
-/* 34*/                        u8 compression_unit; /* The compression unit expressed
-                               as the log to the base 2 of the number of
-                               clusters in a compression unit.  0 means not
-                               compressed.  (This effectively limits the
-                               compression unit size to be a power of two
-                               clusters.)  WinNT4 only uses a value of 4.
-                               Sparse files have this set to 0 on XPSP2. */
-/* 35*/                        u8 reserved[5];         /* Align to 8-byte boundary. */
-/* The sizes below are only used when lowest_vcn is zero, as otherwise it would
-   be difficult to keep them up-to-date.*/
-/* 40*/                        sle64 allocated_size;   /* Byte size of disk space
-                               allocated to hold the attribute value. Always
-                               is a multiple of the cluster size. When a file
-                               is compressed, this field is a multiple of the
-                               compression block size (2^compression_unit) and
-                               it represents the logically allocated space
-                               rather than the actual on disk usage. For this
-                               use the compressed_size (see below). */
-/* 48*/                        sle64 data_size;        /* Byte size of the attribute
-                               value. Can be larger than allocated_size if
-                               attribute value is compressed or sparse. */
-/* 56*/                        sle64 initialized_size; /* Byte size of initialized
-                               portion of the attribute value. Usually equals
-                               data_size. */
-/* sizeof(uncompressed attr) = 64*/
-/* 64*/                        sle64 compressed_size;  /* Byte size of the attribute
-                               value after compression.  Only present when
-                               compressed or sparse.  Always is a multiple of
-                               the cluster size.  Represents the actual amount
-                               of disk space being used on the disk. */
-/* sizeof(compressed attr) = 72*/
-               } __attribute__ ((__packed__)) non_resident;
-       } __attribute__ ((__packed__)) data;
-} __attribute__ ((__packed__)) ATTR_RECORD;
-
-typedef ATTR_RECORD ATTR_REC;
-
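Taken together with the MFT_RECORD layout above, these headers imply the canonical attribute walk: start at attrs_offset, advance by each record's (8-byte aligned) length, and stop at AT_END. A sketch, without the bounds checking a real walker needs against bytes_in_use and bytes_allocated:

static inline ATTR_RECORD *ntfs_first_attr(MFT_RECORD *m)
{
        return (ATTR_RECORD *)((u8 *)m + le16_to_cpu(m->attrs_offset));
}

static inline ATTR_RECORD *ntfs_next_attr(ATTR_RECORD *a)
{
        return (ATTR_RECORD *)((u8 *)a + le32_to_cpu(a->length));
}

/* Typical loop:
 *      for (a = ntfs_first_attr(m); a->type != AT_END; a = ntfs_next_attr(a))
 *              ...;
 */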
-/*
- * File attribute flags (32-bit) appearing in the file_attributes fields of the
- * STANDARD_INFORMATION attribute of MFT_RECORDs and the FILENAME_ATTR
- * attributes of MFT_RECORDs and directory index entries.
- *
- * All of the below flags appear in the directory index entries but only some
- * appear in the STANDARD_INFORMATION attribute whilst only some others appear
- * in the FILENAME_ATTR attribute of MFT_RECORDs.  Unless otherwise stated the
- * flags appear in all of the above.
- */
-enum {
-       FILE_ATTR_READONLY              = cpu_to_le32(0x00000001),
-       FILE_ATTR_HIDDEN                = cpu_to_le32(0x00000002),
-       FILE_ATTR_SYSTEM                = cpu_to_le32(0x00000004),
-       /* Old DOS volid. Unused in NT. = cpu_to_le32(0x00000008), */
-
-       FILE_ATTR_DIRECTORY             = cpu_to_le32(0x00000010),
-       /* Note, FILE_ATTR_DIRECTORY is not considered valid in NT.  It is
-          reserved for the DOS SUBDIRECTORY flag. */
-       FILE_ATTR_ARCHIVE               = cpu_to_le32(0x00000020),
-       FILE_ATTR_DEVICE                = cpu_to_le32(0x00000040),
-       FILE_ATTR_NORMAL                = cpu_to_le32(0x00000080),
-
-       FILE_ATTR_TEMPORARY             = cpu_to_le32(0x00000100),
-       FILE_ATTR_SPARSE_FILE           = cpu_to_le32(0x00000200),
-       FILE_ATTR_REPARSE_POINT         = cpu_to_le32(0x00000400),
-       FILE_ATTR_COMPRESSED            = cpu_to_le32(0x00000800),
-
-       FILE_ATTR_OFFLINE               = cpu_to_le32(0x00001000),
-       FILE_ATTR_NOT_CONTENT_INDEXED   = cpu_to_le32(0x00002000),
-       FILE_ATTR_ENCRYPTED             = cpu_to_le32(0x00004000),
-
-       FILE_ATTR_VALID_FLAGS           = cpu_to_le32(0x00007fb7),
-       /* Note, FILE_ATTR_VALID_FLAGS masks out the old DOS VolId and the
-          FILE_ATTR_DEVICE and preserves everything else.  This mask is used
-          to obtain all flags that are valid for reading. */
-       FILE_ATTR_VALID_SET_FLAGS       = cpu_to_le32(0x000031a7),
-       /* Note, FILE_ATTR_VALID_SET_FLAGS masks out the old DOS VolId, the
-          F_A_DEVICE, F_A_DIRECTORY, F_A_SPARSE_FILE, F_A_REPARSE_POINT,
-          F_A_COMPRESSED, and F_A_ENCRYPTED and preserves the rest.  This mask
-          is used to obtain all flags that are valid for setting. */
-       /*
-        * The flag FILE_ATTR_DUP_FILENAME_INDEX_PRESENT is present in all
-        * FILENAME_ATTR attributes but not in the STANDARD_INFORMATION
-        * attribute of an mft record.
-        */
-       FILE_ATTR_DUP_FILE_NAME_INDEX_PRESENT   = cpu_to_le32(0x10000000),
-       /* Note, this is a copy of the corresponding bit from the mft record,
-          telling us whether this is a directory or not, i.e. whether it has
-          an index root attribute or not. */
-       FILE_ATTR_DUP_VIEW_INDEX_PRESENT        = cpu_to_le32(0x20000000),
-       /* Note, this is a copy of the corresponding bit from the mft record,
-          telling us whether this file has a view index present (e.g. object id
-          index, quota index, one of the security indexes or the encrypting
-          filesystem related indexes). */
-};
-
-typedef le32 FILE_ATTR_FLAGS;
-
-/*
- * NOTE on times in NTFS: All times are in MS standard time format, i.e. they
- * are the number of 100-nanosecond intervals since 1st January 1601, 00:00:00
- * universal coordinated time (UTC). (In Linux time starts 1st January 1970,
- * 00:00:00 UTC and is stored as the number of 1-second intervals since then.)
- */
-
-/*
- * Attribute: Standard information (0x10).
- *
- * NOTE: Always resident.
- * NOTE: Present in all base file records on a volume.
- * NOTE: There is conflicting information about the meaning of each of the time
- *      fields but the meaning as defined below has been verified to be
- *      correct by practical experimentation on Windows NT4 SP6a and is hence
- *      assumed to be the one and only correct interpretation.
- */
-typedef struct {
-/*Ofs*/
-/*  0*/        sle64 creation_time;            /* Time file was created. Updated when
-                                          a filename is changed(?). */
-/*  8*/        sle64 last_data_change_time;    /* Time the data attribute was last
-                                          modified. */
-/* 16*/        sle64 last_mft_change_time;     /* Time this mft record was last
-                                          modified. */
-/* 24*/        sle64 last_access_time;         /* Approximate time when the file was
-                                          last accessed (obviously this is not
-                                          updated on read-only volumes). In
-                                          Windows this is only updated when
-                                          accessed if some time delta has
-                                          passed since the last update. Also,
-                                          last access time updates can be
-                                          disabled altogether for speed. */
-/* 32*/        FILE_ATTR_FLAGS file_attributes; /* Flags describing the file. */
-/* 36*/        union {
-       /* NTFS 1.2 */
-               struct {
-               /* 36*/ u8 reserved12[12];      /* Reserved/alignment to 8-byte
-                                                  boundary. */
-               } __attribute__ ((__packed__)) v1;
-       /* sizeof() = 48 bytes */
-       /* NTFS 3.x */
-               struct {
-/*
- * If a volume has been upgraded from a previous NTFS version, then these
- * fields are present only if the file has been accessed since the upgrade.
- * Recognize the difference by comparing the length of the resident attribute
- * value. If it is 48, then the following fields are missing. If it is 72 then
- * the fields are present. Maybe just check like this:
- *     if (resident.ValueLength < sizeof(STANDARD_INFORMATION)) {
- *             Assume NTFS 1.2- format.
- *             If (volume version is 3.x)
- *                     Upgrade attribute to NTFS 3.x format.
- *             else
- *                     Use NTFS 1.2- format for access.
- *     } else
- *             Use NTFS 3.x format for access.
- * The only problem is that it might be legal to set the length of the value
- * to an arbitrarily large value, thus spoiling this check.  However, chkdsk
- * probably views that as corruption, assuming that it behaves like this for
- * all attributes.
- */
-               /* 36*/ le32 maximum_versions;  /* Maximum allowed versions for
-                               file. Zero if version numbering is disabled. */
-               /* 40*/ le32 version_number;    /* This file's version (if any).
-                               Set to zero if maximum_versions is zero. */
-               /* 44*/ le32 class_id;          /* Class id from bidirectional
-                               class id index (?). */
-               /* 48*/ le32 owner_id;          /* Owner_id of the user owning
-                               the file. Translate via $Q index in FILE_Extend
-                               /$Quota to the quota control entry for the user
-                               owning the file. Zero if quotas are disabled. */
-               /* 52*/ le32 security_id;       /* Security_id for the file.
-                               Translate via $SII index and $SDS data stream
-                               in FILE_Secure to the security descriptor. */
-               /* 56*/ le64 quota_charged;     /* Byte size of the charge to
-                               the quota for all streams of the file. Note: Is
-                               zero if quotas are disabled. */
-               /* 64*/ leUSN usn;              /* Last update sequence number
-                               of the file.  This is a direct index into the
-                               transaction log file ($UsnJrnl).  It is zero if
-                               the usn journal is disabled or this file has
-                               not been subject to logging yet.  See usnjrnl.h
-                               for details. */
-               } __attribute__ ((__packed__)) v3;
-       /* sizeof() = 72 bytes (NTFS 3.x) */
-       } __attribute__ ((__packed__)) ver;
-} __attribute__ ((__packed__)) STANDARD_INFORMATION;
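-
-/*
- * Illustrative sketch (not part of the original header): the version check
- * described in the comment above, applied to the byte length of the resident
- * attribute value.
- */
-static inline bool ntfs_std_info_is_v3_sketch(u32 value_length)
-{
-       /* 48 bytes => NTFS 1.2- layout, 72 bytes => NTFS 3.x layout. */
-       return value_length >= sizeof(STANDARD_INFORMATION);
-}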
-
-/*
- * Attribute: Attribute list (0x20).
- *
- * - Can be either resident or non-resident.
- * - Value consists of a sequence of variable length, 8-byte aligned,
- * ATTR_LIST_ENTRY records.
- * - The list is not terminated by anything at all! The only way to know when
- * the end is reached is to keep track of the current offset and compare it to
- * the attribute value size.
- * - The attribute list attribute contains one entry for each attribute of
- * the file in which the list is located, except for the list attribute
- * itself. The list is sorted: first by attribute type, second by attribute
- * name (if present), third by instance number. The extents of one
- * non-resident attribute (if present) immediately follow after the initial
- * extent. They are ordered by lowest_vcn and have their instance set to zero.
- * It is not allowed to have two attributes with all sorting keys equal.
- * - Further restrictions:
- *     - If not resident, the vcn to lcn mapping array has to fit inside the
- *       base mft record.
- *     - The attribute list attribute value has a maximum size of 256KiB. This
- *       is imposed by the Windows cache manager.
- * - Attribute lists are only used when the attributes of the mft record do not
- * fit inside the mft record despite all attributes (that can be made
- * non-resident) having been made non-resident. This can happen e.g. when:
- *     - File has a large number of hard links (lots of file name
- *       attributes present).
- *     - The mapping pairs array of some non-resident attribute becomes so
- *       large due to fragmentation that it overflows the mft record.
- *     - The security descriptor is very complex (not applicable to
- *       NTFS 3.0 volumes).
- *     - There are many named streams.
- */
-typedef struct {
-/*Ofs*/
-/*  0*/        ATTR_TYPE type;         /* Type of referenced attribute. */
-/*  4*/        le16 length;            /* Byte size of this entry (8-byte aligned). */
-/*  6*/        u8 name_length;         /* Size in Unicode chars of the name of the
-                                  attribute or 0 if unnamed. */
-/*  7*/        u8 name_offset;         /* Byte offset to beginning of attribute name
-                                  (always set this to where the name would
-                                  start even if unnamed). */
-/*  8*/        leVCN lowest_vcn;       /* Lowest virtual cluster number of this portion
-                                  of the attribute value. This is usually 0. It
-                                  is non-zero for the case where one attribute
-                                  does not fit into one mft record and thus
-                                  several mft records are allocated to hold
-                                  this attribute. In the latter case, each mft
-                                  record holds one extent of the attribute and
-                                  there is one attribute list entry for each
-                                  extent. NOTE: This is DEFINITELY a signed
-                                  value! The windows driver uses cmp, followed
-                                  by jg when comparing this, thus it treats it
-                                  as signed. */
-/* 16*/        leMFT_REF mft_reference;/* The reference of the mft record holding
-                                  the ATTR_RECORD for this portion of the
-                                  attribute value. */
-/* 24*/        le16 instance;          /* If lowest_vcn = 0, the instance of the
-                                  attribute being referenced; otherwise 0. */
-/* 26*/        ntfschar name[0];       /* Use when creating only. When reading use
-                                  name_offset to determine the location of the
-                                  name. */
-/* sizeof() = 26 + (attribute_name_length * 2) bytes */
-} __attribute__ ((__packed__)) ATTR_LIST_ENTRY;
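-
-/*
- * Illustrative sketch (not part of the original header): since the list has
- * no terminator, a walk is bounded purely by the attribute value size, e.g.:
- */
-static inline void ntfs_walk_attr_list_sketch(const u8 *value, u32 value_size)
-{
-       u32 ofs = 0;
-
-       while (ofs + sizeof(ATTR_LIST_ENTRY) <= value_size) {
-               const ATTR_LIST_ENTRY *e =
-                               (const ATTR_LIST_ENTRY *)(value + ofs);
-
-               if (!le16_to_cpu(e->length))
-                       break;  /* Corrupt entry; avoid looping forever. */
-               /* ... inspect e->type, e->mft_reference, etc. here ... */
-               ofs += le16_to_cpu(e->length);
-       }
-}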
-
-/*
- * The maximum allowed length for a file name.
- */
-#define MAXIMUM_FILE_NAME_LENGTH       255
-
-/*
- * Possible namespaces for filenames in ntfs (8-bit).
- */
-enum {
-       FILE_NAME_POSIX         = 0x00,
-       /* This is the largest namespace. It is case sensitive and allows all
-          Unicode characters except for: '\0' and '/'.  Beware that in
-          WinNT/2k/2003, by default, files which e.g. have the same name except
-          for their case will not be distinguished by the standard utilities
-          and thus a "del filename" will delete both "filename" and "fileName"
-          without warning.  However if for example Services For Unix (SFU) are
-          installed and the case sensitive option was enabled at installation
-          time, then you can create/access/delete such files.
-          Note that even SFU places restrictions on the filenames beyond the
-          '\0' and '/' and in particular the following set of characters is
-          not allowed: '"', '/', '<', '>', '\'.  All other characters,
-          including the ones not allowed in the WIN32 namespace, are allowed.
-          Tested with SFU 3.5 (this is now free) running on Windows XP. */
-       FILE_NAME_WIN32         = 0x01,
-       /* The standard WinNT/2k NTFS long filenames. Case insensitive.  All
-          Unicode chars except: '\0', '"', '*', '/', ':', '<', '>', '?', '\',
-          and '|'.  Further, names cannot end with a '.' or a space. */
-       FILE_NAME_DOS           = 0x02,
-       /* The standard DOS filenames (8.3 format). Uppercase only.  All 8-bit
-          characters greater than space, except: '"', '*', '+', ',', '/', ':',
-          '<', '=', '>', '?', and '\'. */
-       FILE_NAME_WIN32_AND_DOS = 0x03,
-       /* 3 means that both the Win32 and the DOS filenames are identical and
-          hence have been saved in this single filename record. */
-} __attribute__ ((__packed__));
-
-typedef u8 FILE_NAME_TYPE_FLAGS;
-
-/*
- * Attribute: Filename (0x30).
- *
- * NOTE: Always resident.
- * NOTE: All fields, except the parent_directory, are only updated when the
- *      filename is changed. Until then, they just become out of sync with
- *      reality and the more up to date values are present in the standard
- *      information attribute.
- * NOTE: There is conflicting information about the meaning of each of the time
- *      fields but the meaning as defined below has been verified to be
- *      correct by practical experimentation on Windows NT4 SP6a and is hence
- *      assumed to be the one and only correct interpretation.
- */
-typedef struct {
-/*hex ofs*/
-/*  0*/        leMFT_REF parent_directory;     /* Directory this filename is
-                                          referenced from. */
-/*  8*/        sle64 creation_time;            /* Time file was created. */
-/* 10*/        sle64 last_data_change_time;    /* Time the data attribute was last
-                                          modified. */
-/* 18*/        sle64 last_mft_change_time;     /* Time this mft record was last
-                                          modified. */
-/* 20*/        sle64 last_access_time;         /* Time this mft record was last
-                                          accessed. */
-/* 28*/        sle64 allocated_size;           /* Byte size of on-disk allocated space
-                                          for the unnamed data attribute.  So
-                                          for normal $DATA, this is the
-                                          allocated_size from the unnamed
-                                          $DATA attribute and for compressed
-                                          and/or sparse $DATA, this is the
-                                          compressed_size from the unnamed
-                                          $DATA attribute.  For a directory or
-                                          other inode without an unnamed $DATA
-                                          attribute, this is always 0.  NOTE:
-                                          This is a multiple of the cluster
-                                          size. */
-/* 30*/        sle64 data_size;                /* Byte size of actual data in unnamed
-                                          data attribute.  For a directory or
-                                          other inode without an unnamed $DATA
-                                          attribute, this is always 0. */
-/* 38*/        FILE_ATTR_FLAGS file_attributes;        /* Flags describing the file. */
-/* 3c*/        union {
-       /* 3c*/ struct {
-               /* 3c*/ le16 packed_ea_size;    /* Size of the buffer needed to
-                                                  pack the extended attributes
-                                                  (EAs), if such are present.*/
-               /* 3e*/ le16 reserved;          /* Reserved for alignment. */
-               } __attribute__ ((__packed__)) ea;
-       /* 3c*/ struct {
-               /* 3c*/ le32 reparse_point_tag; /* Type of reparse point,
-                                                  present only in reparse
-                                                  points and only if there are
-                                                  no EAs. */
-               } __attribute__ ((__packed__)) rp;
-       } __attribute__ ((__packed__)) type;
-/* 40*/        u8 file_name_length;                    /* Length of file name in
-                                                  (Unicode) characters. */
-/* 41*/        FILE_NAME_TYPE_FLAGS file_name_type;    /* Namespace of the file name.*/
-/* 42*/        ntfschar file_name[0];                  /* File name in Unicode. */
-} __attribute__ ((__packed__)) FILE_NAME_ATTR;
-
-/*
- * GUID structures store globally unique identifiers (GUID). A GUID is a
- * 128-bit value consisting of one group of eight hexadecimal digits, followed
- * by three groups of four hexadecimal digits each, followed by one group of
- * twelve hexadecimal digits. GUIDs are Microsoft's implementation of the
- * distributed computing environment (DCE) universally unique identifier (UUID).
- * Example of a GUID (note the five groups of 8-4-4-4-12 hexadecimal digits):
- *     12345678-9abc-def0-1234-56789abcdef0
- */
-typedef struct {
-       le32 data1;     /* The first eight hexadecimal digits of the GUID. */
-       le16 data2;     /* The first group of four hexadecimal digits. */
-       le16 data3;     /* The second group of four hexadecimal digits. */
-       u8 data4[8];    /* The first two bytes are the third group of four
-                          hexadecimal digits. The remaining six bytes are the
-                          final 12 hexadecimal digits. */
-} __attribute__ ((__packed__)) GUID;
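-
-/*
- * Illustrative sketch (not part of the original header): printing a GUID in
- * the textual form described above (data4 supplies the last two groups).
- */
-static inline int ntfs_guid_to_str_sketch(const GUID *g, char *b, size_t n)
-{
-       return snprintf(b, n,
-                       "%08X-%04X-%04X-%02X%02X-%02X%02X%02X%02X%02X%02X",
-                       le32_to_cpu(g->data1), le16_to_cpu(g->data2),
-                       le16_to_cpu(g->data3), g->data4[0], g->data4[1],
-                       g->data4[2], g->data4[3], g->data4[4], g->data4[5],
-                       g->data4[6], g->data4[7]);
-}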
-
-/*
- * FILE_Extend/$ObjId contains an index named $O. This index contains all
- * object_ids present on the volume as the index keys and the corresponding
- * mft_record numbers as the index entry data parts. The data part (defined
- * below) also contains three other object_ids:
- *     birth_volume_id - object_id of FILE_Volume on which the file was first
- *                       created. Optional (i.e. can be zero).
- *     birth_object_id - object_id of file when it was first created. Usually
- *                       equals the object_id. Optional (i.e. can be zero).
- *     domain_id       - Reserved (always zero).
- */
-typedef struct {
-       leMFT_REF mft_reference;/* Mft record containing the object_id in
-                                  the index entry key. */
-       union {
-               struct {
-                       GUID birth_volume_id;
-                       GUID birth_object_id;
-                       GUID domain_id;
-               } __attribute__ ((__packed__)) origin;
-               u8 extended_info[48];
-       } __attribute__ ((__packed__)) opt;
-} __attribute__ ((__packed__)) OBJ_ID_INDEX_DATA;
-
-/*
- * Attribute: Object id (NTFS 3.0+) (0x40).
- *
- * NOTE: Always resident.
- */
-typedef struct {
-       GUID object_id;                         /* Unique id assigned to the
-                                                  file.*/
-       /* The following fields are optional. The attribute value size is 16
-          bytes, i.e. sizeof(GUID), if these are not present at all. Note,
-          the entries can be present but one or more (or all) can be zero,
-          meaning that those particular values are not defined. */
-       union {
-               struct {
-                       GUID birth_volume_id;   /* Unique id of volume on which
-                                                  the file was first created.*/
-                       GUID birth_object_id;   /* Unique id of file when it was
-                                                  first created. */
-                       GUID domain_id;         /* Reserved, zero. */
-               } __attribute__ ((__packed__)) origin;
-               u8 extended_info[48];
-       } __attribute__ ((__packed__)) opt;
-} __attribute__ ((__packed__)) OBJECT_ID_ATTR;
-
-/*
- * The pre-defined IDENTIFIER_AUTHORITIES used as SID_IDENTIFIER_AUTHORITY in
- * the SID structure (see below).
- */
-//typedef enum {                                       /* SID string prefix. */
-//     SECURITY_NULL_SID_AUTHORITY     = {0, 0, 0, 0, 0, 0},   /* S-1-0 */
-//     SECURITY_WORLD_SID_AUTHORITY    = {0, 0, 0, 0, 0, 1},   /* S-1-1 */
-//     SECURITY_LOCAL_SID_AUTHORITY    = {0, 0, 0, 0, 0, 2},   /* S-1-2 */
-//     SECURITY_CREATOR_SID_AUTHORITY  = {0, 0, 0, 0, 0, 3},   /* S-1-3 */
-//     SECURITY_NON_UNIQUE_AUTHORITY   = {0, 0, 0, 0, 0, 4},   /* S-1-4 */
-//     SECURITY_NT_SID_AUTHORITY       = {0, 0, 0, 0, 0, 5},   /* S-1-5 */
-//} IDENTIFIER_AUTHORITIES;
-
-/*
- * These relative identifiers (RIDs) are used with the above identifier
- * authorities to make up universal well-known SIDs.
- *
- * Note: The relative identifier (RID) refers to the portion of a SID, which
- * identifies a user or group in relation to the authority that issued the SID.
- * For example, the universal well-known SID Creator Owner ID (S-1-3-0) is
- * made up of the identifier authority SECURITY_CREATOR_SID_AUTHORITY (3) and
- * the relative identifier SECURITY_CREATOR_OWNER_RID (0).
- */
-typedef enum {                                 /* Identifier authority. */
-       SECURITY_NULL_RID                 = 0,  /* S-1-0 */
-       SECURITY_WORLD_RID                = 0,  /* S-1-1 */
-       SECURITY_LOCAL_RID                = 0,  /* S-1-2 */
-
-       SECURITY_CREATOR_OWNER_RID        = 0,  /* S-1-3 */
-       SECURITY_CREATOR_GROUP_RID        = 1,  /* S-1-3 */
-
-       SECURITY_CREATOR_OWNER_SERVER_RID = 2,  /* S-1-3 */
-       SECURITY_CREATOR_GROUP_SERVER_RID = 3,  /* S-1-3 */
-
-       SECURITY_DIALUP_RID               = 1,
-       SECURITY_NETWORK_RID              = 2,
-       SECURITY_BATCH_RID                = 3,
-       SECURITY_INTERACTIVE_RID          = 4,
-       SECURITY_SERVICE_RID              = 6,
-       SECURITY_ANONYMOUS_LOGON_RID      = 7,
-       SECURITY_PROXY_RID                = 8,
-       SECURITY_ENTERPRISE_CONTROLLERS_RID = 9,
-       SECURITY_SERVER_LOGON_RID         = 9,
-       SECURITY_PRINCIPAL_SELF_RID       = 0xa,
-       SECURITY_AUTHENTICATED_USER_RID   = 0xb,
-       SECURITY_RESTRICTED_CODE_RID      = 0xc,
-       SECURITY_TERMINAL_SERVER_RID      = 0xd,
-
-       SECURITY_LOGON_IDS_RID            = 5,
-       SECURITY_LOGON_IDS_RID_COUNT      = 3,
-
-       SECURITY_LOCAL_SYSTEM_RID         = 0x12,
-
-       SECURITY_NT_NON_UNIQUE            = 0x15,
-
-       SECURITY_BUILTIN_DOMAIN_RID       = 0x20,
-
-       /*
-        * Well-known domain relative sub-authority values (RIDs).
-        */
-
-       /* Users. */
-       DOMAIN_USER_RID_ADMIN             = 0x1f4,
-       DOMAIN_USER_RID_GUEST             = 0x1f5,
-       DOMAIN_USER_RID_KRBTGT            = 0x1f6,
-
-       /* Groups. */
-       DOMAIN_GROUP_RID_ADMINS           = 0x200,
-       DOMAIN_GROUP_RID_USERS            = 0x201,
-       DOMAIN_GROUP_RID_GUESTS           = 0x202,
-       DOMAIN_GROUP_RID_COMPUTERS        = 0x203,
-       DOMAIN_GROUP_RID_CONTROLLERS      = 0x204,
-       DOMAIN_GROUP_RID_CERT_ADMINS      = 0x205,
-       DOMAIN_GROUP_RID_SCHEMA_ADMINS    = 0x206,
-       DOMAIN_GROUP_RID_ENTERPRISE_ADMINS= 0x207,
-       DOMAIN_GROUP_RID_POLICY_ADMINS    = 0x208,
-
-       /* Aliases. */
-       DOMAIN_ALIAS_RID_ADMINS           = 0x220,
-       DOMAIN_ALIAS_RID_USERS            = 0x221,
-       DOMAIN_ALIAS_RID_GUESTS           = 0x222,
-       DOMAIN_ALIAS_RID_POWER_USERS      = 0x223,
-
-       DOMAIN_ALIAS_RID_ACCOUNT_OPS      = 0x224,
-       DOMAIN_ALIAS_RID_SYSTEM_OPS       = 0x225,
-       DOMAIN_ALIAS_RID_PRINT_OPS        = 0x226,
-       DOMAIN_ALIAS_RID_BACKUP_OPS       = 0x227,
-
-       DOMAIN_ALIAS_RID_REPLICATOR       = 0x228,
-       DOMAIN_ALIAS_RID_RAS_SERVERS      = 0x229,
-       DOMAIN_ALIAS_RID_PREW2KCOMPACCESS = 0x22a,
-} RELATIVE_IDENTIFIERS;
-
-/*
- * The universal well-known SIDs:
- *
- *     NULL_SID                        S-1-0-0
- *     WORLD_SID                       S-1-1-0
- *     LOCAL_SID                       S-1-2-0
- *     CREATOR_OWNER_SID               S-1-3-0
- *     CREATOR_GROUP_SID               S-1-3-1
- *     CREATOR_OWNER_SERVER_SID        S-1-3-2
- *     CREATOR_GROUP_SERVER_SID        S-1-3-3
- *
- *     (Non-unique IDs)                S-1-4
- *
- * NT well-known SIDs:
- *
- *     NT_AUTHORITY_SID        S-1-5
- *     DIALUP_SID              S-1-5-1
- *
- *     NETWORK_SID             S-1-5-2
- *     BATCH_SID               S-1-5-3
- *     INTERACTIVE_SID         S-1-5-4
- *     SERVICE_SID             S-1-5-6
- *     ANONYMOUS_LOGON_SID     S-1-5-7         (aka null logon session)
- *     PROXY_SID               S-1-5-8
- *     SERVER_LOGON_SID        S-1-5-9         (aka domain controller account)
- *     SELF_SID                S-1-5-10        (self RID)
- *     AUTHENTICATED_USER_SID  S-1-5-11
- *     RESTRICTED_CODE_SID     S-1-5-12        (running restricted code)
- *     TERMINAL_SERVER_SID     S-1-5-13        (running on terminal server)
- *
- *     (Logon IDs)             S-1-5-5-X-Y
- *
- *     (NT non-unique IDs)     S-1-5-0x15-...
- *
- *     (Built-in domain)       S-1-5-0x20
- */
-
-/*
- * The SID_IDENTIFIER_AUTHORITY is a 48-bit value used in the SID structure.
- *
- * NOTE: This is stored as a big endian number, hence the high_part comes
- * before the low_part.
- */
-typedef union {
-       struct {
-               u16 high_part;  /* High 16-bits. */
-               u32 low_part;   /* Low 32-bits. */
-       } __attribute__ ((__packed__)) parts;
-       u8 value[6];            /* Value as individual bytes. */
-} __attribute__ ((__packed__)) SID_IDENTIFIER_AUTHORITY;
-
-/*
- * The SID structure is a variable-length structure used to uniquely identify
- * users or groups. SID stands for security identifier.
- *
- * The standard textual representation of the SID is of the form:
- *     S-R-I-S-S...
- * Where:
- *    - The first "S" is the literal character 'S' identifying the following
- *     digits as a SID.
- *    - R is the revision level of the SID expressed as a sequence of digits
- *     either in decimal or hexadecimal (if the latter, prefixed by "0x").
- *    - I is the 48-bit identifier_authority, expressed as digits as R above.
- *    - S... is one or more sub_authority values, expressed as digits as above.
- *
- * Example SID; the domain-relative SID of the local Administrators group on
- * Windows NT/2k:
- *     S-1-5-32-544
- * This translates to a SID with:
- *     revision = 1,
- *     sub_authority_count = 2,
- *     identifier_authority = {0,0,0,0,0,5},   // SECURITY_NT_AUTHORITY
- *     sub_authority[0] = 32,                  // SECURITY_BUILTIN_DOMAIN_RID
- *     sub_authority[1] = 544                  // DOMAIN_ALIAS_RID_ADMINS
- */
-typedef struct {
-       u8 revision;
-       u8 sub_authority_count;
-       SID_IDENTIFIER_AUTHORITY identifier_authority;
-       le32 sub_authority[1];          /* At least one sub_authority. */
-} __attribute__ ((__packed__)) SID;
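-
-/*
- * Illustrative sketch (not part of the original header): the on-disk byte
- * size of a SID follows directly from sub_authority_count, since each
- * sub-authority is a le32.
- */
-static inline unsigned int ntfs_sid_size_sketch(const SID *sid)
-{
-       return offsetof(SID, sub_authority) +
-                       sid->sub_authority_count * sizeof(le32);
-}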
-
-/*
- * Current constants for SIDs.
- */
-typedef enum {
-       SID_REVISION                    =  1,   /* Current revision level. */
-       SID_MAX_SUB_AUTHORITIES         = 15,   /* Maximum number of those. */
-       SID_RECOMMENDED_SUB_AUTHORITIES =  1,   /* Will change to around 6 in
-                                                  a future revision. */
-} SID_CONSTANTS;
-
-/*
- * The predefined ACE types (8-bit, see below).
- */
-enum {
-       ACCESS_MIN_MS_ACE_TYPE          = 0,
-       ACCESS_ALLOWED_ACE_TYPE         = 0,
-       ACCESS_DENIED_ACE_TYPE          = 1,
-       SYSTEM_AUDIT_ACE_TYPE           = 2,
-       SYSTEM_ALARM_ACE_TYPE           = 3, /* Not implemented as of Win2k. */
-       ACCESS_MAX_MS_V2_ACE_TYPE       = 3,
-
-       ACCESS_ALLOWED_COMPOUND_ACE_TYPE= 4,
-       ACCESS_MAX_MS_V3_ACE_TYPE       = 4,
-
-       /* The following are Win2k only. */
-       ACCESS_MIN_MS_OBJECT_ACE_TYPE   = 5,
-       ACCESS_ALLOWED_OBJECT_ACE_TYPE  = 5,
-       ACCESS_DENIED_OBJECT_ACE_TYPE   = 6,
-       SYSTEM_AUDIT_OBJECT_ACE_TYPE    = 7,
-       SYSTEM_ALARM_OBJECT_ACE_TYPE    = 8,
-       ACCESS_MAX_MS_OBJECT_ACE_TYPE   = 8,
-
-       ACCESS_MAX_MS_V4_ACE_TYPE       = 8,
-
-       /* This one is for WinNT/2k. */
-       ACCESS_MAX_MS_ACE_TYPE          = 8,
-} __attribute__ ((__packed__));
-
-typedef u8 ACE_TYPES;
-
-/*
- * The ACE flags (8-bit) for audit and inheritance (see below).
- *
- * SUCCESSFUL_ACCESS_ACE_FLAG is only used with system audit and alarm ACE
- * types to indicate that a message is generated (in Windows!) for successful
- * accesses.
- *
- * FAILED_ACCESS_ACE_FLAG is only used with system audit and alarm ACE types
- * to indicate that a message is generated (in Windows!) for failed accesses.
- */
-enum {
-       /* The inheritance flags. */
-       OBJECT_INHERIT_ACE              = 0x01,
-       CONTAINER_INHERIT_ACE           = 0x02,
-       NO_PROPAGATE_INHERIT_ACE        = 0x04,
-       INHERIT_ONLY_ACE                = 0x08,
-       INHERITED_ACE                   = 0x10, /* Win2k only. */
-       VALID_INHERIT_FLAGS             = 0x1f,
-
-       /* The audit flags. */
-       SUCCESSFUL_ACCESS_ACE_FLAG      = 0x40,
-       FAILED_ACCESS_ACE_FLAG          = 0x80,
-} __attribute__ ((__packed__));
-
-typedef u8 ACE_FLAGS;
-
-/*
- * An ACE is an access-control entry in an access-control list (ACL).
- * An ACE defines access to an object for a specific user or group or defines
- * the types of access that generate system-administration messages or alarms
- * for a specific user or group. The user or group is identified by a security
- * identifier (SID).
- *
- * Each ACE starts with an ACE_HEADER structure (aligned on 4-byte boundary),
- * which specifies the type and size of the ACE. The format of the subsequent
- * data depends on the ACE type.
- */
-typedef struct {
-/*Ofs*/
-/*  0*/        ACE_TYPES type;         /* Type of the ACE. */
-/*  1*/        ACE_FLAGS flags;        /* Flags describing the ACE. */
-/*  2*/        le16 size;              /* Size in bytes of the ACE. */
-} __attribute__ ((__packed__)) ACE_HEADER;
-
-/*
- * The access mask (32-bit). Defines the access rights.
- *
- * The specific rights (bits 0 to 15).  These depend on the type of the object
- * being secured by the ACE.
- */
-enum {
-       /* Specific rights for files and directories are as follows: */
-
-       /* Right to read data from the file. (FILE) */
-       FILE_READ_DATA                  = cpu_to_le32(0x00000001),
-       /* Right to list contents of a directory. (DIRECTORY) */
-       FILE_LIST_DIRECTORY             = cpu_to_le32(0x00000001),
-
-       /* Right to write data to the file. (FILE) */
-       FILE_WRITE_DATA                 = cpu_to_le32(0x00000002),
-       /* Right to create a file in the directory. (DIRECTORY) */
-       FILE_ADD_FILE                   = cpu_to_le32(0x00000002),
-
-       /* Right to append data to the file. (FILE) */
-       FILE_APPEND_DATA                = cpu_to_le32(0x00000004),
-       /* Right to create a subdirectory. (DIRECTORY) */
-       FILE_ADD_SUBDIRECTORY           = cpu_to_le32(0x00000004),
-
-       /* Right to read extended attributes. (FILE/DIRECTORY) */
-       FILE_READ_EA                    = cpu_to_le32(0x00000008),
-
-       /* Right to write extended attributes. (FILE/DIRECTORY) */
-       FILE_WRITE_EA                   = cpu_to_le32(0x00000010),
-
-       /* Right to execute a file. (FILE) */
-       FILE_EXECUTE                    = cpu_to_le32(0x00000020),
-       /* Right to traverse the directory. (DIRECTORY) */
-       FILE_TRAVERSE                   = cpu_to_le32(0x00000020),
-
-       /*
-        * Right to delete a directory and all the files it contains (its
-        * children), even if the files are read-only. (DIRECTORY)
-        */
-       FILE_DELETE_CHILD               = cpu_to_le32(0x00000040),
-
-       /* Right to read file attributes. (FILE/DIRECTORY) */
-       FILE_READ_ATTRIBUTES            = cpu_to_le32(0x00000080),
-
-       /* Right to change file attributes. (FILE/DIRECTORY) */
-       FILE_WRITE_ATTRIBUTES           = cpu_to_le32(0x00000100),
-
-       /*
-        * The standard rights (bits 16 to 23).  These are independent of the
-        * type of object being secured.
-        */
-
-       /* Right to delete the object. */
-       DELETE                          = cpu_to_le32(0x00010000),
-
-       /*
-        * Right to read the information in the object's security descriptor,
-        * not including the information in the SACL, i.e. right to read the
-        * security descriptor and owner.
-        */
-       READ_CONTROL                    = cpu_to_le32(0x00020000),
-
-       /* Right to modify the DACL in the object's security descriptor. */
-       WRITE_DAC                       = cpu_to_le32(0x00040000),
-
-       /* Right to change the owner in the object's security descriptor. */
-       WRITE_OWNER                     = cpu_to_le32(0x00080000),
-
-       /*
-        * Right to use the object for synchronization.  Enables a process to
-        * wait until the object is in the signalled state.  Some object types
-        * do not support this access right.
-        */
-       SYNCHRONIZE                     = cpu_to_le32(0x00100000),
-
-       /*
-        * The following STANDARD_RIGHTS_* are combinations of the above for
-        * convenience and are defined by the Win32 API.
-        */
-
-       /* These are currently defined to READ_CONTROL. */
-       STANDARD_RIGHTS_READ            = cpu_to_le32(0x00020000),
-       STANDARD_RIGHTS_WRITE           = cpu_to_le32(0x00020000),
-       STANDARD_RIGHTS_EXECUTE         = cpu_to_le32(0x00020000),
-
-       /* Combines DELETE, READ_CONTROL, WRITE_DAC, and WRITE_OWNER access. */
-       STANDARD_RIGHTS_REQUIRED        = cpu_to_le32(0x000f0000),
-
-       /*
-        * Combines DELETE, READ_CONTROL, WRITE_DAC, WRITE_OWNER, and
-        * SYNCHRONIZE access.
-        */
-       STANDARD_RIGHTS_ALL             = cpu_to_le32(0x001f0000),
-
-       /*
-        * The access system ACL and maximum allowed access types (bits 24 to
-        * 25, bits 26 to 27 are reserved).
-        */
-       ACCESS_SYSTEM_SECURITY          = cpu_to_le32(0x01000000),
-       MAXIMUM_ALLOWED                 = cpu_to_le32(0x02000000),
-
-       /*
-        * The generic rights (bits 28 to 31).  These map onto the standard and
-        * specific rights.
-        */
-
-       /* Read, write, and execute access. */
-       GENERIC_ALL                     = cpu_to_le32(0x10000000),
-
-       /* Execute access. */
-       GENERIC_EXECUTE                 = cpu_to_le32(0x20000000),
-
-       /*
-        * Write access.  For files, this maps onto:
-        *      FILE_APPEND_DATA | FILE_WRITE_ATTRIBUTES | FILE_WRITE_DATA |
-        *      FILE_WRITE_EA | STANDARD_RIGHTS_WRITE | SYNCHRONIZE
-        * For directories, the mapping has the same numerical value.  See
-        * above for the descriptions of the rights granted.
-        */
-       GENERIC_WRITE                   = cpu_to_le32(0x40000000),
-
-       /*
-        * Read access.  For files, this maps onto:
-        *      FILE_READ_ATTRIBUTES | FILE_READ_DATA | FILE_READ_EA |
-        *      STANDARD_RIGHTS_READ | SYNCHRONIZE
- *      For directories, the mapping has the same numerical value.  See
-        * above for the descriptions of the rights granted.
-        */
-       GENERIC_READ                    = cpu_to_le32(0x80000000),
-};
-
-typedef le32 ACCESS_MASK;
-
-/*
- * The generic mapping array. Used to denote the mapping of each generic
- * access right to a specific access mask.
- *
- * FIXME: What exactly is this and what is it for? (AIA)
- */
-typedef struct {
-       ACCESS_MASK generic_read;
-       ACCESS_MASK generic_write;
-       ACCESS_MASK generic_execute;
-       ACCESS_MASK generic_all;
-} __attribute__ ((__packed__)) GENERIC_MAPPING;
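-
-/*
- * Illustrative sketch (not part of the original header): a GENERIC_MAPPING
- * is presumably used to expand the generic rights in an access mask into
- * standard and specific rights, in the spirit of the Win32 MapGenericMask()
- * call:
- */
-static inline ACCESS_MASK ntfs_map_generic_sketch(ACCESS_MASK mask,
-               const GENERIC_MAPPING *map)
-{
-       if (mask & GENERIC_READ)
-               mask |= map->generic_read;
-       if (mask & GENERIC_WRITE)
-               mask |= map->generic_write;
-       if (mask & GENERIC_EXECUTE)
-               mask |= map->generic_execute;
-       if (mask & GENERIC_ALL)
-               mask |= map->generic_all;
-       return mask & ~(GENERIC_READ | GENERIC_WRITE | GENERIC_EXECUTE |
-                       GENERIC_ALL);
-}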
-
-/*
- * The predefined ACE type structures are as defined below.
- */
-
-/*
- * ACCESS_ALLOWED_ACE, ACCESS_DENIED_ACE, SYSTEM_AUDIT_ACE, SYSTEM_ALARM_ACE
- */
-typedef struct {
-/*  0  ACE_HEADER; -- Unfolded here as gcc doesn't like unnamed structs. */
-       ACE_TYPES type;         /* Type of the ACE. */
-       ACE_FLAGS flags;        /* Flags describing the ACE. */
-       le16 size;              /* Size in bytes of the ACE. */
-/*  4*/        ACCESS_MASK mask;       /* Access mask associated with the ACE. */
-
-/*  8*/        SID sid;                /* The SID associated with the ACE. */
-} __attribute__ ((__packed__)) ACCESS_ALLOWED_ACE, ACCESS_DENIED_ACE,
-                              SYSTEM_AUDIT_ACE, SYSTEM_ALARM_ACE;
-
-/*
- * The object ACE flags (32-bit).
- */
-enum {
-       ACE_OBJECT_TYPE_PRESENT                 = cpu_to_le32(1),
-       ACE_INHERITED_OBJECT_TYPE_PRESENT       = cpu_to_le32(2),
-};
-
-typedef le32 OBJECT_ACE_FLAGS;
-
-typedef struct {
-/*  0  ACE_HEADER; -- Unfolded here as gcc doesn't like unnamed structs. */
-       ACE_TYPES type;         /* Type of the ACE. */
-       ACE_FLAGS flags;        /* Flags describing the ACE. */
-       le16 size;              /* Size in bytes of the ACE. */
-/*  4*/        ACCESS_MASK mask;       /* Access mask associated with the ACE. */
-
-/*  8*/        OBJECT_ACE_FLAGS object_flags;  /* Flags describing the object ACE. */
-/* 12*/        GUID object_type;
-/* 28*/        GUID inherited_object_type;
-
-/* 44*/        SID sid;                /* The SID associated with the ACE. */
-} __attribute__ ((__packed__)) ACCESS_ALLOWED_OBJECT_ACE,
-                              ACCESS_DENIED_OBJECT_ACE,
-                              SYSTEM_AUDIT_OBJECT_ACE,
-                              SYSTEM_ALARM_OBJECT_ACE;
-
-/*
- * An ACL is an access-control list (ACL).
- * An ACL starts with an ACL header structure, which specifies the size of
- * the ACL and the number of ACEs it contains. The ACL header is followed by
- * zero or more access control entries (ACEs). The ACL as well as each ACE
- * are aligned on 4-byte boundaries.
- */
-typedef struct {
-       u8 revision;    /* Revision of this ACL. */
-       u8 alignment1;
-       le16 size;      /* Allocated space in bytes for ACL. Includes this
-                          header, the ACEs and the remaining free space. */
-       le16 ace_count; /* Number of ACEs in the ACL. */
-       le16 alignment2;
-/* sizeof() = 8 bytes */
-} __attribute__ ((__packed__)) ACL;
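-
-/*
- * Illustrative sketch (not part of the original header): iterating the ACEs
- * of an ACL.  The first ACE_HEADER directly follows the ACL header and each
- * subsequent one is found by adding the current ACE's size.
- */
-static inline void ntfs_walk_acl_sketch(const ACL *acl)
-{
-       const u8 *p = (const u8 *)(acl + 1);
-       unsigned int i;
-
-       for (i = 0; i < le16_to_cpu(acl->ace_count); i++) {
-               const ACE_HEADER *ace = (const ACE_HEADER *)p;
-
-               /* ... dispatch on ace->type here ... */
-               p += le16_to_cpu(ace->size);
-       }
-}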
-
-/*
- * Current constants for ACLs.
- */
-typedef enum {
-       /* Current revision. */
-       ACL_REVISION            = 2,
-       ACL_REVISION_DS         = 4,
-
-       /* History of revisions. */
-       ACL_REVISION1           = 1,
-       MIN_ACL_REVISION        = 2,
-       ACL_REVISION2           = 2,
-       ACL_REVISION3           = 3,
-       ACL_REVISION4           = 4,
-       MAX_ACL_REVISION        = 4,
-} ACL_CONSTANTS;
-
-/*
- * The security descriptor control flags (16-bit).
- *
- * SE_OWNER_DEFAULTED - This boolean flag, when set, indicates that the SID
- *     pointed to by the Owner field was provided by a defaulting mechanism
- *     rather than explicitly provided by the original provider of the
- *     security descriptor.  This may affect the treatment of the SID with
- *     respect to inheritance of an owner.
- *
- * SE_GROUP_DEFAULTED - This boolean flag, when set, indicates that the SID in
- *     the Group field was provided by a defaulting mechanism rather than
- *     explicitly provided by the original provider of the security
- *     descriptor.  This may affect the treatment of the SID with respect to
- *     inheritance of a primary group.
- *
- * SE_DACL_PRESENT - This boolean flag, when set, indicates that the security
- *     descriptor contains a discretionary ACL.  If this flag is set and the
- *     Dacl field of the SECURITY_DESCRIPTOR is null, then a null ACL is
- *     explicitly being specified.
- *
- * SE_DACL_DEFAULTED - This boolean flag, when set, indicates that the ACL
- *     pointed to by the Dacl field was provided by a defaulting mechanism
- *     rather than explicitly provided by the original provider of the
- *     security descriptor.  This may affect the treatment of the ACL with
- *     respect to inheritance of an ACL.  This flag is ignored if the
- *     DaclPresent flag is not set.
- *
- * SE_SACL_PRESENT - This boolean flag, when set,  indicates that the security
- *     descriptor contains a system ACL pointed to by the Sacl field.  If this
- *     flag is set and the Sacl field of the SECURITY_DESCRIPTOR is null, then
- *     an empty (but present) ACL is being specified.
- *
- * SE_SACL_DEFAULTED - This boolean flag, when set, indicates that the ACL
- *     pointed to by the Sacl field was provided by a defaulting mechanism
- *     rather than explicitly provided by the original provider of the
- *     security descriptor.  This may affect the treatment of the ACL with
- *     respect to inheritance of an ACL.  This flag is ignored if the
- *     SaclPresent flag is not set.
- *
- * SE_SELF_RELATIVE - This boolean flag, when set, indicates that the security
- *     descriptor is in self-relative form.  In this form, all fields of the
- *     security descriptor are contiguous in memory and all pointer fields are
- *     expressed as offsets from the beginning of the security descriptor.
- */
-enum {
-       SE_OWNER_DEFAULTED              = cpu_to_le16(0x0001),
-       SE_GROUP_DEFAULTED              = cpu_to_le16(0x0002),
-       SE_DACL_PRESENT                 = cpu_to_le16(0x0004),
-       SE_DACL_DEFAULTED               = cpu_to_le16(0x0008),
-
-       SE_SACL_PRESENT                 = cpu_to_le16(0x0010),
-       SE_SACL_DEFAULTED               = cpu_to_le16(0x0020),
-
-       SE_DACL_AUTO_INHERIT_REQ        = cpu_to_le16(0x0100),
-       SE_SACL_AUTO_INHERIT_REQ        = cpu_to_le16(0x0200),
-       SE_DACL_AUTO_INHERITED          = cpu_to_le16(0x0400),
-       SE_SACL_AUTO_INHERITED          = cpu_to_le16(0x0800),
-
-       SE_DACL_PROTECTED               = cpu_to_le16(0x1000),
-       SE_SACL_PROTECTED               = cpu_to_le16(0x2000),
-       SE_RM_CONTROL_VALID             = cpu_to_le16(0x4000),
-       SE_SELF_RELATIVE                = cpu_to_le16(0x8000)
-} __attribute__ ((__packed__));
-
-typedef le16 SECURITY_DESCRIPTOR_CONTROL;
-
-/*
- * Self-relative security descriptor. Contains the owner and group SIDs as well
- * as the sacl and dacl ACLs inside the security descriptor itself.
- */
-typedef struct {
-       u8 revision;    /* Revision level of the security descriptor. */
-       u8 alignment;
-       SECURITY_DESCRIPTOR_CONTROL control; /* Flags qualifying the type of
-                          the descriptor as well as the following fields. */
-       le32 owner;     /* Byte offset to a SID representing an object's
-                          owner. If this is NULL, no owner SID is present in
-                          the descriptor. */
-       le32 group;     /* Byte offset to a SID representing an object's
-                          primary group. If this is NULL, no primary group
-                          SID is present in the descriptor. */
-       le32 sacl;      /* Byte offset to a system ACL. Only valid, if
-                          SE_SACL_PRESENT is set in the control field. If
-                          SE_SACL_PRESENT is set but sacl is NULL, a NULL ACL
-                          is specified. */
-       le32 dacl;      /* Byte offset to a discretionary ACL. Only valid, if
-                          SE_DACL_PRESENT is set in the control field. If
-                          SE_DACL_PRESENT is set but dacl is NULL, a NULL ACL
-                          (unconditionally granting access) is specified. */
-/* sizeof() = 0x14 bytes */
-} __attribute__ ((__packed__)) SECURITY_DESCRIPTOR_RELATIVE;
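-
-/*
- * Illustrative sketch (not part of the original header): in the
- * self-relative form all "pointers" are byte offsets from the start of the
- * descriptor, with 0 meaning the field is absent, e.g.:
- */
-static inline const SID *
-ntfs_sd_owner_sketch(const SECURITY_DESCRIPTOR_RELATIVE *sd)
-{
-       u32 ofs = le32_to_cpu(sd->owner);
-
-       return ofs ? (const SID *)((const u8 *)sd + ofs) : NULL;
-}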
-
-/*
- * Absolute security descriptor. Does not contain the owner and group SIDs, nor
- * the sacl and dacl ACLs inside the security descriptor. Instead, it contains
- * pointers to these structures in memory. Obviously, absolute security
- * descriptors are only useful for in memory representations of security
- * descriptors. On disk, a self-relative security descriptor is used.
- */
-typedef struct {
-       u8 revision;    /* Revision level of the security descriptor. */
-       u8 alignment;
-       SECURITY_DESCRIPTOR_CONTROL control;    /* Flags qualifying the type of
-                          the descriptor as well as the following fields. */
-       SID *owner;     /* Points to a SID representing an object's owner. If
-                          this is NULL, no owner SID is present in the
-                          descriptor. */
-       SID *group;     /* Points to a SID representing an object's primary
-                          group. If this is NULL, no primary group SID is
-                          present in the descriptor. */
-       ACL *sacl;      /* Points to a system ACL. Only valid, if
-                          SE_SACL_PRESENT is set in the control field. If
-                          SE_SACL_PRESENT is set but sacl is NULL, a NULL ACL
-                          is specified. */
-       ACL *dacl;      /* Points to a discretionary ACL. Only valid, if
-                          SE_DACL_PRESENT is set in the control field. If
-                          SE_DACL_PRESENT is set but dacl is NULL, a NULL ACL
-                          (unconditionally granting access) is specified. */
-} __attribute__ ((__packed__)) SECURITY_DESCRIPTOR;
-
-/*
- * Current constants for security descriptors.
- */
-typedef enum {
-       /* Current revision. */
-       SECURITY_DESCRIPTOR_REVISION    = 1,
-       SECURITY_DESCRIPTOR_REVISION1   = 1,
-
-       /* The sizes of both the absolute and relative security descriptors
-          are the same, as pointers, at least on the ia32 architecture, are
-          32-bit. */
-       SECURITY_DESCRIPTOR_MIN_LENGTH  = sizeof(SECURITY_DESCRIPTOR),
-} SECURITY_DESCRIPTOR_CONSTANTS;
-
-/*
- * Attribute: Security descriptor (0x50). A standard self-relative security
- * descriptor.
- *
- * NOTE: Can be resident or non-resident.
- * NOTE: Not used in NTFS 3.0+, as security descriptors are stored centrally
- * in FILE_Secure and the correct descriptor is found using the security_id
- * from the standard information attribute.
- */
-typedef SECURITY_DESCRIPTOR_RELATIVE SECURITY_DESCRIPTOR_ATTR;
-
-/*
- * On NTFS 3.0+, all security descriptors are stored in FILE_Secure. Only one
- * referenced instance of each unique security descriptor is stored.
- *
- * FILE_Secure contains no unnamed data attribute, i.e. it has zero length. It
- * does, however, contain two indexes ($SDH and $SII) as well as a named data
- * stream ($SDS).
- *
- * Every unique security descriptor is assigned a unique security identifier
- * (security_id, not to be confused with a SID). The security_id is unique for
- * the NTFS volume and is used as an index into the $SII index, which maps
- * security_ids to the security descriptor's storage location within the $SDS
- * data attribute. The $SII index is sorted by ascending security_id.
- *
- * A simple hash is computed from each security descriptor. This hash is used
- * as an index into the $SDH index, which maps security descriptor hashes to
- * the security descriptor's storage location within the $SDS data attribute.
- * The $SDH index is sorted by security descriptor hash and is stored in a B+
- * tree. When searching $SDH (with the intent of determining whether or not a
- * new security descriptor is already present in the $SDS data stream), if a
- * matching hash is found, but the security descriptors do not match, the
- * search in the $SDH index is continued, searching for a next matching hash.
- *
- * When a precise match is found, the security_id corresponding to the security
- * descriptor in the $SDS attribute is read from the found $SDH index entry and
- * is stored in the $STANDARD_INFORMATION attribute of the file/directory to
- * which the security descriptor is being applied. The $STANDARD_INFORMATION
- * attribute is present in all base mft records (i.e. in all files and
- * directories).
- *
- * If a match is not found, the security descriptor is assigned a new unique
- * security_id and is added to the $SDS data attribute. Then, entries
- * referencing this security descriptor in the $SDS data attribute are
- * added to the $SDH and $SII indexes.
- *
- * Note: Entries are never deleted from FILE_Secure, even if nothing
- * references an entry any more.
- */
-
-/*
- * This header precedes each security descriptor in the $SDS data stream.
- * This is also the index entry data part of both the $SII and $SDH indexes.
- */
-typedef struct {
-       le32 hash;        /* Hash of the security descriptor. */
-       le32 security_id; /* The security_id assigned to the descriptor. */
-       le64 offset;      /* Byte offset of this entry in the $SDS stream. */
-       le32 length;      /* Size in bytes of this entry in $SDS stream. */
-} __attribute__ ((__packed__)) SECURITY_DESCRIPTOR_HEADER;
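-
-/*
- * Illustrative sketch (not part of the original header): the hash above is
- * believed to be computed over the security descriptor in le32-sized chunks
- * as below (this matches the ntfs_security_hash() used by ntfsprogs/ntfs-3g;
- * treat it as an assumption, not a definition).
- */
-static inline u32 ntfs_sd_hash_sketch(const le32 *sd, u32 length)
-{
-       u32 hash = 0, i;
-
-       for (i = 0; i < length / sizeof(le32); i++)
-               hash = le32_to_cpu(sd[i]) + rol32(hash, 3);
-       return hash;
-}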
-
-/*
- * The $SDS data stream contains the security descriptors, aligned on 16-byte
- * boundaries, sorted by security_id in a B+ tree. Security descriptors cannot
- * cross 256KiB boundaries (this restriction is imposed by the Windows cache
- * manager). Each security descriptor is contained in a SDS_ENTRY structure.
- * Also, each security descriptor is stored twice in the $SDS stream with a
- * fixed offset of 0x40000 bytes (256KiB, the Windows cache manager's max size)
- * between them; i.e. if an SDS_ENTRY specifies an offset of 0x51d0, then the
- * first copy of the security descriptor will be at offset 0x51d0 in the
- * $SDS data stream and the second copy will be at offset 0x451d0.
- */
-typedef struct {
-/*Ofs*/
-/*  0  SECURITY_DESCRIPTOR_HEADER; -- Unfolded here as gcc doesn't like
-                                      unnamed structs. */
-       le32 hash;        /* Hash of the security descriptor. */
-       le32 security_id; /* The security_id assigned to the descriptor. */
-       le64 offset;      /* Byte offset of this entry in the $SDS stream. */
-       le32 length;      /* Size in bytes of this entry in $SDS stream. */
-/* 20*/        SECURITY_DESCRIPTOR_RELATIVE sid; /* The self-relative security
-                                            descriptor. */
-} __attribute__ ((__packed__)) SDS_ENTRY;
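-
-/*
- * Illustrative sketch (not part of the original header): locating the second,
- * mirrored copy of an $SDS entry per the fixed 0x40000 byte stride described
- * above (0x51d0 + 0x40000 = 0x451d0 in the example).
- */
-static inline s64 ntfs_sds_mirror_ofs_sketch(const SDS_ENTRY *sds)
-{
-       return le64_to_cpu(sds->offset) + 0x40000;
-}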
-
-/*
- * The index entry key used in the $SII index. The collation type is
- * COLLATION_NTOFS_ULONG.
- */
-typedef struct {
-       le32 security_id; /* The security_id assigned to the descriptor. */
-} __attribute__ ((__packed__)) SII_INDEX_KEY;
-
-/*
- * The index entry key used in the $SDH index. The keys are sorted first by
- * hash and then by security_id. The collation rule is
- * COLLATION_NTOFS_SECURITY_HASH.
- */
-typedef struct {
-       le32 hash;        /* Hash of the security descriptor. */
-       le32 security_id; /* The security_id assigned to the descriptor. */
-} __attribute__ ((__packed__)) SDH_INDEX_KEY;
-
-/*
- * Attribute: Volume name (0x60).
- *
- * NOTE: Always resident.
- * NOTE: Present only in FILE_Volume.
- */
-typedef struct {
-       ntfschar name[0];       /* The name of the volume in Unicode. */
-} __attribute__ ((__packed__)) VOLUME_NAME;
-
-/*
- * Possible flags for the volume (16-bit).
- */
-enum {
-       VOLUME_IS_DIRTY                 = cpu_to_le16(0x0001),
-       VOLUME_RESIZE_LOG_FILE          = cpu_to_le16(0x0002),
-       VOLUME_UPGRADE_ON_MOUNT         = cpu_to_le16(0x0004),
-       VOLUME_MOUNTED_ON_NT4           = cpu_to_le16(0x0008),
-
-       VOLUME_DELETE_USN_UNDERWAY      = cpu_to_le16(0x0010),
-       VOLUME_REPAIR_OBJECT_ID         = cpu_to_le16(0x0020),
-
-       VOLUME_CHKDSK_UNDERWAY          = cpu_to_le16(0x4000),
-       VOLUME_MODIFIED_BY_CHKDSK       = cpu_to_le16(0x8000),
-
-       VOLUME_FLAGS_MASK               = cpu_to_le16(0xc03f),
-
-       /* To make our life easier when checking if we must mount read-only. */
-       VOLUME_MUST_MOUNT_RO_MASK       = cpu_to_le16(0xc027),
-} __attribute__ ((__packed__));
-
-typedef le16 VOLUME_FLAGS;
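-
-/*
- * Illustrative sketch (not part of the original header): the convenience
- * mask above reduces the mount-time policy decision to a single test.
- */
-static inline bool ntfs_must_mount_ro_sketch(VOLUME_FLAGS flags)
-{
-       return (flags & VOLUME_MUST_MOUNT_RO_MASK) != 0;
-}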
-
-/*
- * Attribute: Volume information (0x70).
- *
- * NOTE: Always resident.
- * NOTE: Present only in FILE_Volume.
- * NOTE: Windows 2000 uses NTFS 3.0 while Windows NT4 service pack 6a uses
- *      NTFS 1.2. I haven't personally seen other values yet.
- */
-typedef struct {
-       le64 reserved;          /* Not used (yet?). */
-       u8 major_ver;           /* Major version of the ntfs format. */
-       u8 minor_ver;           /* Minor version of the ntfs format. */
-       VOLUME_FLAGS flags;     /* Bit array of VOLUME_* flags. */
-} __attribute__ ((__packed__)) VOLUME_INFORMATION;
-
-/*
- * Attribute: Data attribute (0x80).
- *
- * NOTE: Can be resident or non-resident.
- *
- * Data contents of a file (i.e. the unnamed stream) or of a named stream.
- */
-typedef struct {
-       u8 data[0];             /* The file's data contents. */
-} __attribute__ ((__packed__)) DATA_ATTR;
-
-/*
- * Index header flags (8-bit).
- */
-enum {
-       /*
-        * When index header is in an index root attribute:
-        */
-       SMALL_INDEX = 0, /* The index is small enough to fit inside the index
-                           root attribute and there is no index allocation
-                           attribute present. */
-       LARGE_INDEX = 1, /* The index is too large to fit in the index root
-                           attribute and/or an index allocation attribute is
-                           present. */
-       /*
-        * When index header is in an index block, i.e. is part of index
-        * allocation attribute:
-        */
-       LEAF_NODE  = 0, /* This is a leaf node, i.e. there are no more nodes
-                          branching off it. */
-       INDEX_NODE = 1, /* This node indexes other nodes, i.e. it is not a leaf
-                          node. */
-       NODE_MASK  = 1, /* Mask for accessing the *_NODE bits. */
-} __attribute__ ((__packed__));
-
-typedef u8 INDEX_HEADER_FLAGS;
-
-/*
- * This is the header for indexes, describing the INDEX_ENTRY records, which
- * follow the INDEX_HEADER. Together the index header and the index entries
- * make up a complete index.
- *
- * IMPORTANT NOTE: The offset, length and size structure members are counted
- * relative to the start of the index header structure and not relative to the
- * start of the index root or index allocation structures themselves.
- */
-typedef struct {
-       le32 entries_offset;            /* Byte offset to first INDEX_ENTRY
-                                          aligned to 8-byte boundary. */
-       le32 index_length;              /* Data size of the index in bytes,
-                                          i.e. bytes used from allocated
-                                          size, aligned to 8-byte boundary. */
-       le32 allocated_size;            /* Byte size of this index (block),
-                                          multiple of 8 bytes. */
-       /* NOTE: For the index root attribute, the above two numbers are always
-          equal, as the attribute is resident and it is resized as needed. In
-          the case of the index allocation attribute the attribute is not
-          resident and hence the allocated_size is a fixed value and must
-          equal the index_block_size specified by the INDEX_ROOT attribute
-          corresponding to the INDEX_ALLOCATION attribute this INDEX_BLOCK
-          belongs to. */
-       INDEX_HEADER_FLAGS flags;       /* Bit field of INDEX_HEADER_FLAGS. */
-       u8 reserved[3];                 /* Reserved/align to 8-byte boundary. */
-} __attribute__ ((__packed__)) INDEX_HEADER;
-
-/*
- * Attribute: Index root (0x90).
- *
- * NOTE: Always resident.
- *
- * This is followed by a sequence of index entries (INDEX_ENTRY structures)
- * as described by the index header.
- *
- * When a directory is small enough to fit inside the index root then this
- * is the only attribute describing the directory. When the directory is too
- * large to fit in the index root, on the other hand, two additional attributes
- * are present: an index allocation attribute, containing sub-nodes of the B+
- * directory tree (see below), and a bitmap attribute, describing which virtual
- * cluster numbers (vcns) in the index allocation attribute are in use by an
- * index block.
- *
- * NOTE: The root directory (FILE_root) contains an entry for itself. Other
- * directories do not contain entries for themselves, though.
- */
-typedef struct {
-       ATTR_TYPE type;                 /* Type of the indexed attribute. Is
-                                          $FILE_NAME for directories, zero
-                                          for view indexes. No other values
-                                          allowed. */
-       COLLATION_RULE collation_rule;  /* Collation rule used to sort the
-                                          index entries. If type is $FILE_NAME,
-                                          this must be COLLATION_FILE_NAME. */
-       le32 index_block_size;          /* Size of each index block in bytes (in
-                                          the index allocation attribute). */
-       u8 clusters_per_index_block;    /* Cluster size of each index block (in
-                                          the index allocation attribute), when
-                                          an index block is >= a cluster,
-                                          otherwise this will be the log of
-                                          the size (like how the encoding of
-                                          the mft record size and the index
-                                          record size found in the boot sector
-                                          work). Has to be a power of 2. */
-       u8 reserved[3];                 /* Reserved/align to 8-byte boundary. */
-       INDEX_HEADER index;             /* Index header describing the
-                                          following index entries. */
-} __attribute__ ((__packed__)) INDEX_ROOT;
-
-/*
- * Attribute: Index allocation (0xa0).
- *
- * NOTE: Always non-resident (doesn't make sense to be resident anyway!).
- *
- * This is an array of index blocks. Each index block starts with an
- * INDEX_BLOCK structure containing an index header, followed by a sequence of
- * index entries (INDEX_ENTRY structures), as described by the INDEX_HEADER.
- */
-typedef struct {
-/*  0  NTFS_RECORD; -- Unfolded here as gcc doesn't like unnamed structs. */
-       NTFS_RECORD_TYPE magic; /* Magic is "INDX". */
-       le16 usa_ofs;           /* See NTFS_RECORD definition. */
-       le16 usa_count;         /* See NTFS_RECORD definition. */
-
-/*  8*/        sle64 lsn;              /* $LogFile sequence number of the last
-                                  modification of this index block. */
-/* 16*/        leVCN index_block_vcn;  /* Virtual cluster number of the index block.
-                                  If the cluster_size on the volume is <= the
-                                  index_block_size of the directory,
-                                  index_block_vcn counts in units of clusters,
-                                  and in units of sectors otherwise. */
-/* 24*/        INDEX_HEADER index;     /* Describes the following index entries. */
-/* sizeof()= 40 (0x28) bytes */
-/*
- * When creating the index block, we place the update sequence array at this
- * offset, i.e. before we start with the index entries. This also makes sense:
- * otherwise the update sequence array could end up containing the last two
- * bytes of a sector, which would break multi sector transfer protection, as
- * you can't protect data by overwriting it (you then can't get it back).
- * When reading, use the data from the ntfs record header.
- */
-} __attribute__ ((__packed__)) INDEX_BLOCK;
-
-typedef INDEX_BLOCK INDEX_ALLOCATION;
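The unit rule for index_block_vcn described above can be made concrete with a small sketch. The 512-byte sector size is an assumption for illustration; the real value comes from the boot sector.

/*
 * Sketch: byte offset of an index block within the non-resident index
 * allocation attribute, following the index_block_vcn unit rule above.
 * Assumes 512-byte sectors.
 */
static s64 index_block_byte_offset(s64 index_block_vcn, u32 cluster_size,
		u32 index_block_size)
{
	if (cluster_size <= index_block_size)
		return index_block_vcn * cluster_size;	/* units: clusters */
	return index_block_vcn * 512;			/* units: sectors */
}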
-
-/*
- * The system file FILE_Extend/$Reparse contains an index named $R listing
- * all reparse points on the volume. The index entry keys are as defined
- * below. Note that there is no index data associated with the index entries.
- *
- * The index entries are sorted by the index key file_id. The collation rule is
- * COLLATION_NTOFS_ULONGS. FIXME: Verify whether the reparse_tag is not the
- * primary key / is not a key at all. (AIA)
- */
-typedef struct {
-       le32 reparse_tag;       /* Reparse point type (inc. flags). */
-       leMFT_REF file_id;      /* Mft record of the file containing the
-                                  reparse point attribute. */
-} __attribute__ ((__packed__)) REPARSE_INDEX_KEY;
-
-/*
- * Quota flags (32-bit).
- *
- * The user quota flags.  Names explain meaning.
- */
-enum {
-       QUOTA_FLAG_DEFAULT_LIMITS       = cpu_to_le32(0x00000001),
-       QUOTA_FLAG_LIMIT_REACHED        = cpu_to_le32(0x00000002),
-       QUOTA_FLAG_ID_DELETED           = cpu_to_le32(0x00000004),
-
-       QUOTA_FLAG_USER_MASK            = cpu_to_le32(0x00000007),
-       /* This is a bit mask for the user quota flags. */
-
-       /*
-        * These flags are only present in the quota defaults index entry, i.e.
-        * in the entry where owner_id = QUOTA_DEFAULTS_ID.
-        */
-       QUOTA_FLAG_TRACKING_ENABLED     = cpu_to_le32(0x00000010),
-       QUOTA_FLAG_ENFORCEMENT_ENABLED  = cpu_to_le32(0x00000020),
-       QUOTA_FLAG_TRACKING_REQUESTED   = cpu_to_le32(0x00000040),
-       QUOTA_FLAG_LOG_THRESHOLD        = cpu_to_le32(0x00000080),
-
-       QUOTA_FLAG_LOG_LIMIT            = cpu_to_le32(0x00000100),
-       QUOTA_FLAG_OUT_OF_DATE          = cpu_to_le32(0x00000200),
-       QUOTA_FLAG_CORRUPT              = cpu_to_le32(0x00000400),
-       QUOTA_FLAG_PENDING_DELETES      = cpu_to_le32(0x00000800),
-};
-
-typedef le32 QUOTA_FLAGS;
-
-/*
- * The system file FILE_Extend/$Quota contains two indexes $O and $Q. Quotas
- * are on a per volume and per user basis.
- *
- * The $Q index contains one entry for each existing user_id on the volume. The
- * index key is the user_id of the user/group owning this quota control entry,
- * i.e. the key is the owner_id. The user_id of the owner of a file, i.e. the
- * owner_id, is found in the standard information attribute. The collation rule
- * for $Q is COLLATION_NTOFS_ULONG.
- *
- * The $O index contains one entry for each user/group who has been assigned
- * a quota on that volume. The index key holds the SID of the user_id the
- * entry belongs to, i.e. the owner_id. The collation rule for $O is
- * COLLATION_NTOFS_SID.
- *
- * The $O index entry data is the user_id of the user corresponding to the SID.
- * This user_id is used as an index into $Q to find the quota control entry
- * associated with the SID.
- *
- * The $Q index entry data is the quota control entry and is defined below.
- */
-typedef struct {
-       le32 version;           /* Currently equals 2. */
-       QUOTA_FLAGS flags;      /* Flags describing this quota entry. */
-       le64 bytes_used;        /* How many bytes of the quota are in use. */
-       sle64 change_time;      /* Last time this quota entry was changed. */
-       sle64 threshold;        /* Soft quota (-1 if not limited). */
-       sle64 limit;            /* Hard quota (-1 if not limited). */
-       sle64 exceeded_time;    /* How long the soft quota has been exceeded. */
-       SID sid;                /* The SID of the user/object associated with
-                                  this quota entry.  Equals zero for the quota
-                                  defaults entry (and in fact on a WinXP
-                                  volume, it is not present at all). */
-} __attribute__ ((__packed__)) QUOTA_CONTROL_ENTRY;
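The two-step lookup described above (the $O index maps a SID to an owner_id, the $Q index maps that owner_id to the quota control entry) can be sketched as pseudocode. ntfs_quota_index_lookup() is purely hypothetical and stands in for whatever index lookup machinery is available; only the chaining of the two indexes is taken from the description.

/*
 * Pseudocode sketch of the SID -> owner_id -> quota entry chain.
 * ntfs_quota_index_lookup() is a hypothetical helper that looks up
 * @key in the named index of FILE_Extend/$Quota and copies the entry
 * data to @data; it is not part of this driver.
 */
static QUOTA_CONTROL_ENTRY *quota_entry_for_sid(ntfs_volume *vol,
		const SID *sid, QUOTA_CONTROL_ENTRY *qce)
{
	le32 owner_id;

	/* Step 1: $O, key = SID, data = owner_id. */
	if (ntfs_quota_index_lookup(vol, "$O", sid, &owner_id))
		return NULL;
	/* Step 2: $Q, key = owner_id, data = quota control entry. */
	if (ntfs_quota_index_lookup(vol, "$Q", &owner_id, qce))
		return NULL;
	return qce;
}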
-
-/*
- * Predefined owner_id values (32-bit).
- */
-enum {
-       QUOTA_INVALID_ID        = cpu_to_le32(0x00000000),
-       QUOTA_DEFAULTS_ID       = cpu_to_le32(0x00000001),
-       QUOTA_FIRST_USER_ID     = cpu_to_le32(0x00000100),
-};
-
-/*
- * Current constants for quota control entries.
- */
-typedef enum {
-       /* Current version. */
-       QUOTA_VERSION   = 2,
-} QUOTA_CONTROL_ENTRY_CONSTANTS;
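A consumer would presumably validate the version field against this constant before trusting an entry; a minimal check might be:

/* Sketch: reject quota control entries with an unexpected version. */
static bool quota_entry_version_ok(const QUOTA_CONTROL_ENTRY *qce)
{
	return le32_to_cpu(qce->version) == QUOTA_VERSION;
}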
-
-/*
- * Index entry flags (16-bit).
- */
-enum {
-       INDEX_ENTRY_NODE = cpu_to_le16(1), /* This entry contains a
-                       sub-node, i.e. a reference to an index block in the
-                       form of a virtual cluster number (see below). */
-                       a virtual cluster number (see below). */
-       INDEX_ENTRY_END  = cpu_to_le16(2), /* This signifies the last
-                       entry in an index block.  The index entry does not
-                       represent a file but it can point to a sub-node. */
-
-       INDEX_ENTRY_SPACE_FILLER = cpu_to_le16(0xffff), /* gcc: Force
-                       enum bit width to 16-bit. */
-} __attribute__ ((__packed__));
-
-typedef le16 INDEX_ENTRY_FLAGS;
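These flags drive the usual walk over an entry sequence: advance by length until the INDEX_ENTRY_END entry is reached. A sketch (INDEX_ENTRY itself is defined further below):

/*
 * Sketch: iterate the index entries that follow an INDEX_HEADER,
 * stopping at the INDEX_ENTRY_END entry (which represents no file but
 * may still point to a sub-node).
 */
static void ntfs_index_walk(INDEX_ENTRY *first)
{
	INDEX_ENTRY *ie;

	for (ie = first; !(ie->flags & INDEX_ENTRY_END);
			ie = (INDEX_ENTRY*)((u8*)ie +
			le16_to_cpu(ie->length)))
		;	/* process @ie here */
}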
-
-/*
- * This is the index entry header (see below).
- */
-typedef struct {
-/*  0*/        union {
-               struct { /* Only valid when INDEX_ENTRY_END is not set. */
-                       leMFT_REF indexed_file; /* The mft reference of the file
-                                                  described by this index
-                                                  entry. Used for directory
-                                                  indexes. */
-               } __attribute__ ((__packed__)) dir;
-               struct { /* Used for views/indexes to find the entry's data. */
-                       le16 data_offset;       /* Data byte offset from this
-                                                  INDEX_ENTRY. Follows the
-                                                  index key. */
-                       le16 data_length;       /* Data length in bytes. */
-                       le32 reservedV;         /* Reserved (zero). */
-               } __attribute__ ((__packed__)) vi;
-       } __attribute__ ((__packed__)) data;
-/*  8*/        le16 length;             /* Byte size of this index entry, a
-                                   multiple of 8 bytes. */
-/* 10*/        le16 key_length;         /* Byte size of the key value, which is in the
-                                   index entry. It follows the reserved field.
-                                   Not necessarily a multiple of 8 bytes. */
-/* 12*/        INDEX_ENTRY_FLAGS flags; /* Bit field of INDEX_ENTRY_* flags. */
-/* 14*/        le16 reserved;           /* Reserved/align to 8-byte boundary. */
-/* sizeof() = 16 bytes */
-} __attribute__ ((__packed__)) INDEX_ENTRY_HEADER;
-
-/*
- * This is an index entry. A sequence of such entries follows each INDEX_HEADER
- * structure. Together they make up a complete index. The index follows either
- * an index root attribute or an index allocation attribute.
- *
- * NOTE: Before NTFS 3.0 only filename attributes were indexed.
- */
-typedef struct {
-/*Ofs*/
-/*  0  INDEX_ENTRY_HEADER; -- Unfolded here as gcc dislikes unnamed structs. */
-       union {
-               struct { /* Only valid when INDEX_ENTRY_END is not set. */
-                       leMFT_REF indexed_file; /* The mft reference of the file
-                                                  described by this index
-                                                  entry. Used for directory
-                                                  indexes. */
-               } __attribute__ ((__packed__)) dir;
-               struct { /* Used for views/indexes to find the entry's data. */
-                       le16 data_offset;       /* Data byte offset from this
-                                                  INDEX_ENTRY. Follows the
-                                                  index key. */
-                       le16 data_length;       /* Data length in bytes. */
-                       le32 reservedV;         /* Reserved (zero). */
-               } __attribute__ ((__packed__)) vi;
-       } __attribute__ ((__packed__)) data;
-       le16 length;             /* Byte size of this index entry, a multiple
-                                   of 8 bytes. */
-       le16 key_length;         /* Byte size of the key value, which is in the
-                                   index entry. It follows the reserved field.
-                                   Not necessarily a multiple of 8 bytes. */
-       INDEX_ENTRY_FLAGS flags; /* Bit field of INDEX_ENTRY_* flags. */
-       le16 reserved;           /* Reserved/align to 8-byte boundary. */
-
-/* 16*/        union {         /* The key of the indexed attribute. NOTE: Only present
-                          if INDEX_ENTRY_END bit in flags is not set. NOTE: On
-                          NTFS versions before 3.0 the only valid key is the
-                          FILE_NAME_ATTR. On NTFS 3.0+ the following
-                          additional index keys are defined: */
-               FILE_NAME_ATTR file_name;/* $I30 index in directories. */
-               SII_INDEX_KEY sii;      /* $SII index in $Secure. */
-               SDH_INDEX_KEY sdh;      /* $SDH index in $Secure. */
-               GUID object_id;         /* $O index in FILE_Extend/$ObjId: The
-                                          object_id of the mft record found in
-                                          the data part of the index. */
-               REPARSE_INDEX_KEY reparse;      /* $R index in
-                                                  FILE_Extend/$Reparse. */
-               SID sid;                /* $O index in FILE_Extend/$Quota:
-                                          SID of the owner of the user_id. */
-               le32 owner_id;          /* $Q index in FILE_Extend/$Quota:
-                                          user_id of the owner of the quota
-                                          control entry in the data part of
-                                          the index. */
-       } __attribute__ ((__packed__)) key;
-       /* The (optional) index data is inserted here when creating. */
-       // leVCN vcn;   /* If INDEX_ENTRY_NODE bit in flags is set, the last
-       //                 eight bytes of this index entry contain the virtual
-       //                 cluster number of the index block that holds the
-       //                 entries immediately preceding the current entry (the
-       //                 vcn references the corresponding cluster in the data
-       //                 of the non-resident index allocation attribute). If
-       //                 the key_length is zero, then the vcn immediately
-       //                 follows the INDEX_ENTRY_HEADER. Regardless of
-       //                 key_length, the address of the 8-byte boundary
-       //                 aligned vcn of INDEX_ENTRY{_HEADER} *ie is given by
-       //                 (char*)ie + le16_to_cpu(ie->length) - sizeof(VCN),
-       //                 where sizeof(VCN) can be hardcoded as 8 if wanted. */
-} __attribute__ ((__packed__)) INDEX_ENTRY;
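The address computation in the vcn note above transcribes directly into a helper; this is a sketch of that formula, nothing more:

/*
 * Sketch: when INDEX_ENTRY_NODE is set in @ie->flags, the sub-node
 * VCN occupies the last eight bytes of the entry, per the note above.
 */
static inline leVCN *ntfs_ie_subnode_vcn(INDEX_ENTRY *ie)
{
	return (leVCN*)((u8*)ie + le16_to_cpu(ie->length) - sizeof(leVCN));
}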
-
-/*
- * Attribute: Bitmap (0xb0).
- *
- * Contains an array of bits (aka a bitfield).
- *
- * When used in conjunction with the index allocation attribute, each bit
- * corresponds to one index block within the index allocation attribute. Thus
- * the number of bits in the bitmap, multiplied by the index block size and
- * divided by the cluster size, gives the number of clusters in the index
- * allocation attribute.
- */
-typedef struct {
-       u8 bitmap[0];                   /* Array of bits. */
-} __attribute__ ((__packed__)) BITMAP_ATTR;
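Following the comment above, bit n of the bitmap records whether index block n is in use; a minimal test might look like:

/*
 * Sketch: test whether index block @block_nr of the accompanying
 * index allocation attribute is in use.  @block_nr numbers index
 * blocks, not clusters or vcns.
 */
static inline bool ntfs_index_block_in_use(const BITMAP_ATTR *ba,
		s64 block_nr)
{
	return ba->bitmap[block_nr >> 3] & (1 << (block_nr & 7));
}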
-
-/*
- * The reparse point tag defines the type of the reparse point. It also
- * includes several flags, which further describe the reparse point.
- *
- * The reparse point tag is an unsigned 32-bit value divided into three parts:
- *
- * 1. The least significant 16 bits (i.e. bits 0 to 15) specify the type of
- *    the reparse point.
- * 2. The 13 bits after this (i.e. bits 16 to 28) are reserved for future use.
- * 3. The most significant three bits are flags describing the reparse point.
- *    They are defined as follows:
- *     bit 29: Name surrogate bit. If set, the filename is an alias for
- *             another object in the system.
- *     bit 30: High-latency bit. If set, accessing the first byte of data will
- *             be slow. (E.g. the data is stored on a tape drive.)
- *     bit 31: Microsoft bit. If set, the tag is owned by Microsoft. User
- *             defined tags have to use zero here.
- *
- * These are the predefined reparse point tags:
- */
-enum {
-       IO_REPARSE_TAG_IS_ALIAS         = cpu_to_le32(0x20000000),
-       IO_REPARSE_TAG_IS_HIGH_LATENCY  = cpu_to_le32(0x40000000),
-       IO_REPARSE_TAG_IS_MICROSOFT     = cpu_to_le32(0x80000000),
-
-       IO_REPARSE_TAG_RESERVED_ZERO    = cpu_to_le32(0x00000000),
-       IO_REPARSE_TAG_RESERVED_ONE     = cpu_to_le32(0x00000001),
-       IO_REPARSE_TAG_RESERVED_RANGE   = cpu_to_le32(0x00000001),
-
-       IO_REPARSE_TAG_NSS              = cpu_to_le32(0x68000005),
-       IO_REPARSE_TAG_NSS_RECOVER      = cpu_to_le32(0x68000006),
-       IO_REPARSE_TAG_SIS              = cpu_to_le32(0x68000007),
-       IO_REPARSE_TAG_DFS              = cpu_to_le32(0x68000008),
-
-       IO_REPARSE_TAG_MOUNT_POINT      = cpu_to_le32(0x88000003),
-
-       IO_REPARSE_TAG_HSM              = cpu_to_le32(0xa8000004),
-
-       IO_REPARSE_TAG_SYMBOLIC_LINK    = cpu_to_le32(0xe8000000),
-
-       IO_REPARSE_TAG_VALID_VALUES     = cpu_to_le32(0xe000ffff),
-};
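The bit layout described above translates into simple helpers; these are illustrative only and not part of this header:

/* Illustrative helpers for the reparse tag layout described above. */
static inline u16 ntfs_reparse_tag_type(le32 tag)
{
	return (u16)(le32_to_cpu(tag) & 0xffff);	/* bits 0 to 15 */
}

static inline bool ntfs_reparse_tag_is_microsoft(le32 tag)
{
	return (tag & IO_REPARSE_TAG_IS_MICROSOFT) != 0; /* bit 31 */
}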
-
-/*
- * Attribute: Reparse point (0xc0).
- *
- * NOTE: Can be resident or non-resident.
- */
-typedef struct {
-       le32 reparse_tag;               /* Reparse point type (inc. flags). */
-       le16 reparse_data_length;       /* Byte size of reparse data. */
-       le16 reserved;                  /* Align to 8-byte boundary. */
-       u8 reparse_data[0];             /* Meaning depends on reparse_tag. */
-} __attribute__ ((__packed__)) REPARSE_POINT;
-
-/*
- * Attribute: Extended attribute (EA) information (0xd0).
- *
- * NOTE: Always resident. (Is this true???)
- */
-typedef struct {
-       le16 ea_length;         /* Byte size of the packed extended
-                                  attributes. */
-       le16 need_ea_count;     /* The number of extended attributes which have
-                                  the NEED_EA bit set. */
-       le32 ea_query_length;   /* Byte size of the buffer required to query
-                                  the extended attributes when calling
-                                  ZwQueryEaFile() in Windows NT/2k. I.e. the
-                                  byte size of the unpacked extended
-                                  attributes. */
-} __attribute__ ((__packed__)) EA_INFORMATION;
-
-/*
- * Extended attribute flags (8-bit).
- */
-enum {
-       NEED_EA = 0x80          /* If set, the file to which the EA belongs
-                                  cannot be interpreted without understanding
-                                  the associated extended attributes. */
-} __attribute__ ((__packed__));
-
-typedef u8 EA_FLAGS;
-
-/*
- * Attribute: Extended attribute (EA) (0xe0).
- *
- * NOTE: Can be resident or non-resident.
- *
- * Like the attribute list and the index buffer list, the EA attribute value is
- * a sequence of EA_ATTR variable length records.
- */
-typedef struct {
-       le32 next_entry_offset; /* Offset to the next EA_ATTR. */
-       EA_FLAGS flags;         /* Flags describing the EA. */
-       u8 ea_name_length;      /* Length of the name of the EA in bytes
-                                  excluding the '\0' byte terminator. */
-       le16 ea_value_length;   /* Byte size of the EA's value. */
-       u8 ea_name[0];          /* Name of the EA.  Note this is ASCII, not
-                                  Unicode, and it is zero terminated. */
-       u8 ea_value[0];         /* The value of the EA.  Immediately follows
-                                  the name. */
-} __attribute__ ((__packed__)) EA_ATTR;
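Since the EA attribute value is a sequence of variable length records, a consumer walks it via next_entry_offset. A sketch, where treating a zero next_entry_offset as the terminator is an assumption, not something stated above:

/*
 * Sketch: walk the sequence of EA_ATTR records using
 * next_entry_offset.  @end bounds the attribute value; treating a
 * zero next_entry_offset as the terminator is an assumption.
 */
static void ntfs_ea_for_each(EA_ATTR *ea, const u8 *end)
{
	while ((u8*)ea < end) {
		u32 next = le32_to_cpu(ea->next_entry_offset);

		/* @ea->ea_name and @ea->ea_value are valid here. */
		if (!next)
			break;
		ea = (EA_ATTR*)((u8*)ea + next);
	}
}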
-
-/*
- * Attribute: Property set (0xf0).
- *
- * Intended to support Native Structure Storage (NSS) - a feature removed from
- * NTFS 3.0 during beta testing.
- */
-typedef struct {
-       /* Irrelevant as feature unused. */
-} __attribute__ ((__packed__)) PROPERTY_SET;
-
-/*
- * Attribute: Logged utility stream (0x100).
- *
- * NOTE: Can be resident or non-resident.
- *
- * Operations on this attribute are logged to the journal ($LogFile) like
- * normal metadata changes.
- *
- * Used by the Encrypting File System (EFS). All encrypted files have this
- * attribute with the name $EFS.
- */
-typedef struct {
-       /* Can be anything the creator chooses. */
-       /* EFS uses it as follows: */
-       // FIXME: Type this info, verifying it along the way. (AIA)
-} __attribute__ ((__packed__)) LOGGED_UTILITY_STREAM, EFS_ATTR;
-
-#endif /* _LINUX_NTFS_LAYOUT_H */
diff --git a/fs/ntfs/lcnalloc.c b/fs/ntfs/lcnalloc.c
deleted file mode 100644 (file)
index eda9972..0000000
+++ /dev/null
@@ -1,1000 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * lcnalloc.c - Cluster (de)allocation code.  Part of the Linux-NTFS project.
- *
- * Copyright (c) 2004-2005 Anton Altaparmakov
- */
-
-#ifdef NTFS_RW
-
-#include <linux/pagemap.h>
-
-#include "lcnalloc.h"
-#include "debug.h"
-#include "bitmap.h"
-#include "inode.h"
-#include "volume.h"
-#include "attrib.h"
-#include "malloc.h"
-#include "aops.h"
-#include "ntfs.h"
-
-/**
- * ntfs_cluster_free_from_rl_nolock - free clusters from runlist
- * @vol:       mounted ntfs volume on which to free the clusters
- * @rl:                runlist describing the clusters to free
- *
- * Free all the clusters described by the runlist @rl on the volume @vol.  If
- * an error is returned, at least some of the clusters were not freed.
- *
- * Return 0 on success and -errno on error.
- *
- * Locking: - The volume lcn bitmap must be locked for writing on entry and is
- *           left locked on return.
- */
-int ntfs_cluster_free_from_rl_nolock(ntfs_volume *vol,
-               const runlist_element *rl)
-{
-       struct inode *lcnbmp_vi = vol->lcnbmp_ino;
-       int ret = 0;
-
-       ntfs_debug("Entering.");
-       if (!rl)
-               return 0;
-       for (; rl->length; rl++) {
-               int err;
-
-               if (rl->lcn < 0)
-                       continue;
-               err = ntfs_bitmap_clear_run(lcnbmp_vi, rl->lcn, rl->length);
-               if (unlikely(err && (!ret || ret == -ENOMEM) && ret != err))
-                       ret = err;
-       }
-       ntfs_debug("Done.");
-       return ret;
-}
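Per the locking rule documented above, a caller takes the lcn bitmap lock for writing around the call; a usage sketch:

/* Usage sketch: the caller holds the lcn bitmap lock for writing. */
static int example_free_clusters(ntfs_volume *vol,
		const runlist_element *rl)
{
	int err;

	down_write(&vol->lcnbmp_lock);
	err = ntfs_cluster_free_from_rl_nolock(vol, rl);
	up_write(&vol->lcnbmp_lock);
	return err;
}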
-
-/**
- * ntfs_cluster_alloc - allocate clusters on an ntfs volume
- * @vol:       mounted ntfs volume on which to allocate the clusters
- * @start_vcn: vcn to use for the first allocated cluster
- * @count:     number of clusters to allocate
- * @start_lcn: starting lcn at which to allocate the clusters (or -1 if none)
- * @zone:      zone from which to allocate the clusters
- * @is_extension:      if 'true', this is an attribute extension
- *
- * Allocate @count clusters preferably starting at cluster @start_lcn or at the
- * current allocator position if @start_lcn is -1, on the mounted ntfs volume
- * @vol. @zone is either DATA_ZONE for allocation of normal clusters or
- * MFT_ZONE for allocation of clusters for the master file table, i.e. the
- * $MFT/$DATA attribute.
- *
- * @start_vcn specifies the vcn of the first allocated cluster.  This makes
- * merging the resulting runlist with the old runlist easier.
- *
- * If @is_extension is 'true', the caller is allocating clusters to extend an
- * attribute and if it is 'false', the caller is allocating clusters to fill a
- * hole in an attribute.  Practically the difference is that if @is_extension
- * is 'true' the returned runlist will be terminated with LCN_ENOENT and if
- * @is_extension is 'false' the runlist will be terminated with
- * LCN_RL_NOT_MAPPED.
- *
- * You need to check the return value with IS_ERR().  If this is false, the
- * function was successful and the return value is a runlist describing the
- * allocated cluster(s).  If IS_ERR() is true, the function failed and
- * PTR_ERR() gives you the error code.
- *
- * Notes on the allocation algorithm
- * =================================
- *
- * There are two data zones.  First is the area between the end of the mft zone
- * and the end of the volume, and second is the area between the start of the
- * volume and the start of the mft zone.  On unmodified/standard NTFS 1.x
- * volumes, the second data zone does not exist due to the mft zone being
- * expanded to cover the start of the volume in order to reserve space for the
- * mft bitmap attribute.
- *
- * This is not the prettiest function, but the complexity stems from the need
- * to implement the mft vs data zoned approach and from the fact that we have
- * access to the lcn bitmap in portions of up to 8192 bytes at a time, so we
- * need to cope with crossing over boundaries of two buffers.  Further, the
- * fact that the allocator allows for caller supplied hints as to the location
- * of where allocation should begin and the fact that the allocator keeps track
- * of where in the data zones the next natural allocation should occur,
- * contribute to the complexity of the function.  But it should all be
- * worthwhile, because this allocator should: 1) be a full implementation of
- * the MFT zone approach used by Windows NT, 2) cause reduction in
- * fragmentation, and 3) be speedy in allocations (the code is not optimized
- * for speed, but the algorithm is, so further speed improvements are probably
- * possible).
- *
- * FIXME: We should be monitoring cluster allocation and incrementing the MFT
- * zone size dynamically, but this is something for the future.  We will just
- * cause heavier fragmentation by not doing it and I am not even sure Windows
- * would grow the MFT zone dynamically, so it might even be correct not to do
- * this.  The overhead in doing dynamic MFT zone expansion would be very large
- * and likely not worth the effort. (AIA)
- *
- * TODO: I have added in double the required zone position pointer wrap around
- * logic which can be optimized to having only one of the two logic sets.
- * However, having the double logic will work fine, but if we have only one of
- * the sets and we get it wrong somewhere, then we get into trouble, so
- * removing the duplicate logic requires _very_ careful consideration of _all_
- * possible code paths.  So at least for now, I am leaving the double logic -
- * better safe than sorry... (AIA)
- *
- * Locking: - The volume lcn bitmap must be unlocked on entry and is unlocked
- *           on return.
- *         - This function takes the volume lcn bitmap lock for writing and
- *           modifies the bitmap contents.
- */
-runlist_element *ntfs_cluster_alloc(ntfs_volume *vol, const VCN start_vcn,
-               const s64 count, const LCN start_lcn,
-               const NTFS_CLUSTER_ALLOCATION_ZONES zone,
-               const bool is_extension)
-{
-       LCN zone_start, zone_end, bmp_pos, bmp_initial_pos, last_read_pos, lcn;
-       LCN prev_lcn = 0, prev_run_len = 0, mft_zone_size;
-       s64 clusters;
-       loff_t i_size;
-       struct inode *lcnbmp_vi;
-       runlist_element *rl = NULL;
-       struct address_space *mapping;
-       struct page *page = NULL;
-       u8 *buf, *byte;
-       int err = 0, rlpos, rlsize, buf_size;
-       u8 pass, done_zones, search_zone, need_writeback = 0, bit;
-
-       ntfs_debug("Entering for start_vcn 0x%llx, count 0x%llx, start_lcn "
-                       "0x%llx, zone %s_ZONE.", (unsigned long long)start_vcn,
-                       (unsigned long long)count,
-                       (unsigned long long)start_lcn,
-                       zone == MFT_ZONE ? "MFT" : "DATA");
-       BUG_ON(!vol);
-       lcnbmp_vi = vol->lcnbmp_ino;
-       BUG_ON(!lcnbmp_vi);
-       BUG_ON(start_vcn < 0);
-       BUG_ON(count < 0);
-       BUG_ON(start_lcn < -1);
-       BUG_ON(zone < FIRST_ZONE);
-       BUG_ON(zone > LAST_ZONE);
-
-       /* Return NULL if @count is zero. */
-       if (!count)
-               return NULL;
-       /* Take the lcnbmp lock for writing. */
-       down_write(&vol->lcnbmp_lock);
-       /*
-        * If no specific @start_lcn was requested, use the current data zone
-        * position, otherwise use the requested @start_lcn but make sure it
-        * lies outside the mft zone.  Also set done_zones to 0 (no zones done)
-        * and pass depending on whether we are starting inside a zone (1) or
-        * at the beginning of a zone (2).  If requesting from the MFT_ZONE,
-        * we either start at the current position within the mft zone or at
-        * the specified position.  If the latter is out of bounds then we start
-        * at the beginning of the MFT_ZONE.
-        */
-       done_zones = 0;
-       pass = 1;
-       /*
-        * zone_start and zone_end are the current search range.  search_zone
-        * is 1 for mft zone, 2 for data zone 1 (end of mft zone till end of
-        * volume) and 4 for data zone 2 (start of volume till start of mft
-        * zone).
-        */
-       zone_start = start_lcn;
-       if (zone_start < 0) {
-               if (zone == DATA_ZONE)
-                       zone_start = vol->data1_zone_pos;
-               else
-                       zone_start = vol->mft_zone_pos;
-               if (!zone_start) {
-                       /*
-                        * Zone starts at beginning of volume which means a
-                        * single pass is sufficient.
-                        */
-                       pass = 2;
-               }
-       } else if (zone == DATA_ZONE && zone_start >= vol->mft_zone_start &&
-                       zone_start < vol->mft_zone_end) {
-               zone_start = vol->mft_zone_end;
-               /*
-                * Starting at beginning of data1_zone which means a single
-                * pass in this zone is sufficient.
-                */
-               pass = 2;
-       } else if (zone == MFT_ZONE && (zone_start < vol->mft_zone_start ||
-                       zone_start >= vol->mft_zone_end)) {
-               zone_start = vol->mft_lcn;
-               if (!vol->mft_zone_end)
-                       zone_start = 0;
-               /*
-                * Starting at beginning of volume which means a single pass
-                * is sufficient.
-                */
-               pass = 2;
-       }
-       if (zone == MFT_ZONE) {
-               zone_end = vol->mft_zone_end;
-               search_zone = 1;
-       } else /* if (zone == DATA_ZONE) */ {
-               /* Skip searching the mft zone. */
-               done_zones |= 1;
-               if (zone_start >= vol->mft_zone_end) {
-                       zone_end = vol->nr_clusters;
-                       search_zone = 2;
-               } else {
-                       zone_end = vol->mft_zone_start;
-                       search_zone = 4;
-               }
-       }
-       /*
-        * bmp_pos is the current bit position inside the bitmap.  We use
-        * bmp_initial_pos to determine whether or not to do a zone switch.
-        */
-       bmp_pos = bmp_initial_pos = zone_start;
-
-       /* Loop until all clusters are allocated, i.e. clusters == 0. */
-       clusters = count;
-       rlpos = rlsize = 0;
-       mapping = lcnbmp_vi->i_mapping;
-       i_size = i_size_read(lcnbmp_vi);
-       while (1) {
-               ntfs_debug("Start of outer while loop: done_zones 0x%x, "
-                               "search_zone %i, pass %i, zone_start 0x%llx, "
-                               "zone_end 0x%llx, bmp_initial_pos 0x%llx, "
-                               "bmp_pos 0x%llx, rlpos %i, rlsize %i.",
-                               done_zones, search_zone, pass,
-                               (unsigned long long)zone_start,
-                               (unsigned long long)zone_end,
-                               (unsigned long long)bmp_initial_pos,
-                               (unsigned long long)bmp_pos, rlpos, rlsize);
-               /* Loop until we run out of free clusters. */
-               last_read_pos = bmp_pos >> 3;
-               ntfs_debug("last_read_pos 0x%llx.",
-                               (unsigned long long)last_read_pos);
-               if (last_read_pos > i_size) {
-                       ntfs_debug("End of attribute reached.  "
-                                       "Skipping to zone_pass_done.");
-                       goto zone_pass_done;
-               }
-               if (likely(page)) {
-                       if (need_writeback) {
-                               ntfs_debug("Marking page dirty.");
-                               flush_dcache_page(page);
-                               set_page_dirty(page);
-                               need_writeback = 0;
-                       }
-                       ntfs_unmap_page(page);
-               }
-               page = ntfs_map_page(mapping, last_read_pos >>
-                               PAGE_SHIFT);
-               if (IS_ERR(page)) {
-                       err = PTR_ERR(page);
-                       ntfs_error(vol->sb, "Failed to map page.");
-                       goto out;
-               }
-               buf_size = last_read_pos & ~PAGE_MASK;
-               buf = page_address(page) + buf_size;
-               buf_size = PAGE_SIZE - buf_size;
-               if (unlikely(last_read_pos + buf_size > i_size))
-                       buf_size = i_size - last_read_pos;
-               buf_size <<= 3;
-               lcn = bmp_pos & 7;
-               bmp_pos &= ~(LCN)7;
-               ntfs_debug("Before inner while loop: buf_size %i, lcn 0x%llx, "
-                               "bmp_pos 0x%llx, need_writeback %i.", buf_size,
-                               (unsigned long long)lcn,
-                               (unsigned long long)bmp_pos, need_writeback);
-               while (lcn < buf_size && lcn + bmp_pos < zone_end) {
-                       byte = buf + (lcn >> 3);
-                       ntfs_debug("In inner while loop: buf_size %i, "
-                                       "lcn 0x%llx, bmp_pos 0x%llx, "
-                                       "need_writeback %i, byte ofs 0x%x, "
-                                       "*byte 0x%x.", buf_size,
-                                       (unsigned long long)lcn,
-                                       (unsigned long long)bmp_pos,
-                                       need_writeback,
-                                       (unsigned int)(lcn >> 3),
-                                       (unsigned int)*byte);
-                       /* Skip full bytes. */
-                       if (*byte == 0xff) {
-                               lcn = (lcn + 8) & ~(LCN)7;
-                               ntfs_debug("Continuing while loop 1.");
-                               continue;
-                       }
-                       bit = 1 << (lcn & 7);
-                       ntfs_debug("bit 0x%x.", bit);
-                       /* If the bit is already set, go onto the next one. */
-                       if (*byte & bit) {
-                               lcn++;
-                               ntfs_debug("Continuing while loop 2.");
-                               continue;
-                       }
-                       /*
-                        * Allocate more memory if needed, including space for
-                        * the terminator element.
-                        * ntfs_malloc_nofs() operates on whole pages only.
-                        */
-                       if ((rlpos + 2) * sizeof(*rl) > rlsize) {
-                               runlist_element *rl2;
-
-                               ntfs_debug("Reallocating memory.");
-                               if (!rl)
-                                       ntfs_debug("First free bit is at LCN "
-                                                       "0x%llx.",
-                                                       (unsigned long long)
-                                                       (lcn + bmp_pos));
-                               rl2 = ntfs_malloc_nofs(rlsize + (int)PAGE_SIZE);
-                               if (unlikely(!rl2)) {
-                                       err = -ENOMEM;
-                                       ntfs_error(vol->sb, "Failed to "
-                                                       "allocate memory.");
-                                       goto out;
-                               }
-                               memcpy(rl2, rl, rlsize);
-                               ntfs_free(rl);
-                               rl = rl2;
-                               rlsize += PAGE_SIZE;
-                               ntfs_debug("Reallocated memory, rlsize 0x%x.",
-                                               rlsize);
-                       }
-                       /* Allocate the bitmap bit. */
-                       *byte |= bit;
-                       /* We need to write this bitmap page to disk. */
-                       need_writeback = 1;
-                       ntfs_debug("*byte 0x%x, need_writeback is set.",
-                                       (unsigned int)*byte);
-                       /*
-                        * Coalesce with previous run if adjacent LCNs.
-                        * Otherwise, append a new run.
-                        */
-                       ntfs_debug("Adding run (lcn 0x%llx, len 0x%llx), "
-                                       "prev_lcn 0x%llx, lcn 0x%llx, "
-                                       "bmp_pos 0x%llx, prev_run_len 0x%llx, "
-                                       "rlpos %i.",
-                                       (unsigned long long)(lcn + bmp_pos),
-                                       1ULL, (unsigned long long)prev_lcn,
-                                       (unsigned long long)lcn,
-                                       (unsigned long long)bmp_pos,
-                                       (unsigned long long)prev_run_len,
-                                       rlpos);
-                       if (prev_lcn == lcn + bmp_pos - prev_run_len && rlpos) {
-                               ntfs_debug("Coalescing to run (lcn 0x%llx, "
-                                               "len 0x%llx).",
-                                               (unsigned long long)
-                                               rl[rlpos - 1].lcn,
-                                               (unsigned long long)
-                                               rl[rlpos - 1].length);
-                               rl[rlpos - 1].length = ++prev_run_len;
-                               ntfs_debug("Run now (lcn 0x%llx, len 0x%llx), "
-                                               "prev_run_len 0x%llx.",
-                                               (unsigned long long)
-                                               rl[rlpos - 1].lcn,
-                                               (unsigned long long)
-                                               rl[rlpos - 1].length,
-                                               (unsigned long long)
-                                               prev_run_len);
-                       } else {
-                               if (likely(rlpos)) {
-                                       ntfs_debug("Adding new run, (previous "
-                                                       "run lcn 0x%llx, "
-                                                       "len 0x%llx).",
-                                                       (unsigned long long)
-                                                       rl[rlpos - 1].lcn,
-                                                       (unsigned long long)
-                                                       rl[rlpos - 1].length);
-                                       rl[rlpos].vcn = rl[rlpos - 1].vcn +
-                                                       prev_run_len;
-                               } else {
-                                       ntfs_debug("Adding new run, is first "
-                                                       "run.");
-                                       rl[rlpos].vcn = start_vcn;
-                               }
-                               rl[rlpos].lcn = prev_lcn = lcn + bmp_pos;
-                               rl[rlpos].length = prev_run_len = 1;
-                               rlpos++;
-                       }
-                       /* Done? */
-                       if (!--clusters) {
-                               LCN tc;
-                               /*
-                                * Update the current zone position.  Positions
-                                * of already scanned zones have been updated
-                                * during the respective zone switches.
-                                */
-                               tc = lcn + bmp_pos + 1;
-                               ntfs_debug("Done. Updating current zone "
-                                               "position, tc 0x%llx, "
-                                               "search_zone %i.",
-                                               (unsigned long long)tc,
-                                               search_zone);
-                               switch (search_zone) {
-                               case 1:
-                                       ntfs_debug("Before checks, "
-                                                       "vol->mft_zone_pos "
-                                                       "0x%llx.",
-                                                       (unsigned long long)
-                                                       vol->mft_zone_pos);
-                                       if (tc >= vol->mft_zone_end) {
-                                               vol->mft_zone_pos =
-                                                               vol->mft_lcn;
-                                               if (!vol->mft_zone_end)
-                                                       vol->mft_zone_pos = 0;
-                                       } else if ((bmp_initial_pos >=
-                                                       vol->mft_zone_pos ||
-                                                       tc > vol->mft_zone_pos)
-                                                       && tc >= vol->mft_lcn)
-                                               vol->mft_zone_pos = tc;
-                                       ntfs_debug("After checks, "
-                                                       "vol->mft_zone_pos "
-                                                       "0x%llx.",
-                                                       (unsigned long long)
-                                                       vol->mft_zone_pos);
-                                       break;
-                               case 2:
-                                       ntfs_debug("Before checks, "
-                                                       "vol->data1_zone_pos "
-                                                       "0x%llx.",
-                                                       (unsigned long long)
-                                                       vol->data1_zone_pos);
-                                       if (tc >= vol->nr_clusters)
-                                               vol->data1_zone_pos =
-                                                            vol->mft_zone_end;
-                                       else if ((bmp_initial_pos >=
-                                                   vol->data1_zone_pos ||
-                                                   tc > vol->data1_zone_pos)
-                                                   && tc >= vol->mft_zone_end)
-                                               vol->data1_zone_pos = tc;
-                                       ntfs_debug("After checks, "
-                                                       "vol->data1_zone_pos "
-                                                       "0x%llx.",
-                                                       (unsigned long long)
-                                                       vol->data1_zone_pos);
-                                       break;
-                               case 4:
-                                       ntfs_debug("Before checks, "
-                                                       "vol->data2_zone_pos "
-                                                       "0x%llx.",
-                                                       (unsigned long long)
-                                                       vol->data2_zone_pos);
-                                       if (tc >= vol->mft_zone_start)
-                                               vol->data2_zone_pos = 0;
-                                       else if (bmp_initial_pos >=
-                                                     vol->data2_zone_pos ||
-                                                     tc > vol->data2_zone_pos)
-                                               vol->data2_zone_pos = tc;
-                                       ntfs_debug("After checks, "
-                                                       "vol->data2_zone_pos "
-                                                       "0x%llx.",
-                                                       (unsigned long long)
-                                                       vol->data2_zone_pos);
-                                       break;
-                               default:
-                                       BUG();
-                               }
-                               ntfs_debug("Finished.  Going to out.");
-                               goto out;
-                       }
-                       lcn++;
-               }
-               bmp_pos += buf_size;
-               ntfs_debug("After inner while loop: buf_size 0x%x, lcn "
-                               "0x%llx, bmp_pos 0x%llx, need_writeback %i.",
-                               buf_size, (unsigned long long)lcn,
-                               (unsigned long long)bmp_pos, need_writeback);
-               if (bmp_pos < zone_end) {
-                       ntfs_debug("Continuing outer while loop, "
-                                       "bmp_pos 0x%llx, zone_end 0x%llx.",
-                                       (unsigned long long)bmp_pos,
-                                       (unsigned long long)zone_end);
-                       continue;
-               }
-zone_pass_done:        /* Finished with the current zone pass. */
-               ntfs_debug("At zone_pass_done, pass %i.", pass);
-               if (pass == 1) {
-                       /*
-                        * Now do pass 2, scanning the first part of the zone
-                        * we omitted in pass 1.
-                        */
-                       pass = 2;
-                       zone_end = zone_start;
-                       switch (search_zone) {
-                       case 1: /* mft_zone */
-                               zone_start = vol->mft_zone_start;
-                               break;
-                       case 2: /* data1_zone */
-                               zone_start = vol->mft_zone_end;
-                               break;
-                       case 4: /* data2_zone */
-                               zone_start = 0;
-                               break;
-                       default:
-                               BUG();
-                       }
-                       /* Sanity check. */
-                       if (zone_end < zone_start)
-                               zone_end = zone_start;
-                       bmp_pos = zone_start;
-                       ntfs_debug("Continuing outer while loop, pass 2, "
-                                       "zone_start 0x%llx, zone_end 0x%llx, "
-                                       "bmp_pos 0x%llx.",
-                                       (unsigned long long)zone_start,
-                                       (unsigned long long)zone_end,
-                                       (unsigned long long)bmp_pos);
-                       continue;
-               } /* pass == 2 */
-done_zones_check:
-               ntfs_debug("At done_zones_check, search_zone %i, done_zones "
-                               "before 0x%x, done_zones after 0x%x.",
-                               search_zone, done_zones,
-                               done_zones | search_zone);
-               done_zones |= search_zone;
-               if (done_zones < 7) {
-                       ntfs_debug("Switching zone.");
-                       /* Now switch to the next zone we haven't done yet. */
-                       pass = 1;
-                       switch (search_zone) {
-                       case 1:
-                               ntfs_debug("Switching from mft zone to data1 "
-                                               "zone.");
-                               /* Update mft zone position. */
-                               if (rlpos) {
-                                       LCN tc;
-
-                                       ntfs_debug("Before checks, "
-                                                       "vol->mft_zone_pos "
-                                                       "0x%llx.",
-                                                       (unsigned long long)
-                                                       vol->mft_zone_pos);
-                                       tc = rl[rlpos - 1].lcn +
-                                                       rl[rlpos - 1].length;
-                                       if (tc >= vol->mft_zone_end) {
-                                               vol->mft_zone_pos =
-                                                               vol->mft_lcn;
-                                               if (!vol->mft_zone_end)
-                                                       vol->mft_zone_pos = 0;
-                                       } else if ((bmp_initial_pos >=
-                                                       vol->mft_zone_pos ||
-                                                       tc > vol->mft_zone_pos)
-                                                       && tc >= vol->mft_lcn)
-                                               vol->mft_zone_pos = tc;
-                                       ntfs_debug("After checks, "
-                                                       "vol->mft_zone_pos "
-                                                       "0x%llx.",
-                                                       (unsigned long long)
-                                                       vol->mft_zone_pos);
-                               }
-                               /* Switch from mft zone to data1 zone. */
-switch_to_data1_zone:          search_zone = 2;
-                               zone_start = bmp_initial_pos =
-                                               vol->data1_zone_pos;
-                               zone_end = vol->nr_clusters;
-                               if (zone_start == vol->mft_zone_end)
-                                       pass = 2;
-                               if (zone_start >= zone_end) {
-                                       vol->data1_zone_pos = zone_start =
-                                                       vol->mft_zone_end;
-                                       pass = 2;
-                               }
-                               break;
-                       case 2:
-                               ntfs_debug("Switching from data1 zone to "
-                                               "data2 zone.");
-                               /* Update data1 zone position. */
-                               if (rlpos) {
-                                       LCN tc;
-
-                                       ntfs_debug("Before checks, "
-                                                       "vol->data1_zone_pos "
-                                                       "0x%llx.",
-                                                       (unsigned long long)
-                                                       vol->data1_zone_pos);
-                                       tc = rl[rlpos - 1].lcn +
-                                                       rl[rlpos - 1].length;
-                                       if (tc >= vol->nr_clusters)
-                                               vol->data1_zone_pos =
-                                                            vol->mft_zone_end;
-                                       else if ((bmp_initial_pos >=
-                                                   vol->data1_zone_pos ||
-                                                   tc > vol->data1_zone_pos)
-                                                   && tc >= vol->mft_zone_end)
-                                               vol->data1_zone_pos = tc;
-                                       ntfs_debug("After checks, "
-                                                       "vol->data1_zone_pos "
-                                                       "0x%llx.",
-                                                       (unsigned long long)
-                                                       vol->data1_zone_pos);
-                               }
-                               /* Switch from data1 zone to data2 zone. */
-                               search_zone = 4;
-                               zone_start = bmp_initial_pos =
-                                               vol->data2_zone_pos;
-                               zone_end = vol->mft_zone_start;
-                               if (!zone_start)
-                                       pass = 2;
-                               if (zone_start >= zone_end) {
-                                       vol->data2_zone_pos = zone_start =
-                                                       bmp_initial_pos = 0;
-                                       pass = 2;
-                               }
-                               break;
-                       case 4:
-                               ntfs_debug("Switching from data2 zone to "
-                                               "data1 zone.");
-                               /* Update data2 zone position. */
-                               if (rlpos) {
-                                       LCN tc;
-
-                                       ntfs_debug("Before checks, "
-                                                       "vol->data2_zone_pos "
-                                                       "0x%llx.",
-                                                       (unsigned long long)
-                                                       vol->data2_zone_pos);
-                                       tc = rl[rlpos - 1].lcn +
-                                                       rl[rlpos - 1].length;
-                                       if (tc >= vol->mft_zone_start)
-                                               vol->data2_zone_pos = 0;
-                                       else if (bmp_initial_pos >=
-                                                     vol->data2_zone_pos ||
-                                                     tc > vol->data2_zone_pos)
-                                               vol->data2_zone_pos = tc;
-                                       ntfs_debug("After checks, "
-                                                       "vol->data2_zone_pos "
-                                                       "0x%llx.",
-                                                       (unsigned long long)
-                                                       vol->data2_zone_pos);
-                               }
-                               /* Switch from data2 zone to data1 zone. */
-                               goto switch_to_data1_zone;
-                       default:
-                               BUG();
-                       }
-                       ntfs_debug("After zone switch, search_zone %i, "
-                                       "pass %i, bmp_initial_pos 0x%llx, "
-                                       "zone_start 0x%llx, zone_end 0x%llx.",
-                                       search_zone, pass,
-                                       (unsigned long long)bmp_initial_pos,
-                                       (unsigned long long)zone_start,
-                                       (unsigned long long)zone_end);
-                       bmp_pos = zone_start;
-                       if (zone_start == zone_end) {
-                               ntfs_debug("Empty zone, going to "
-                                               "done_zones_check.");
-                               /* Empty zone. Don't bother searching it. */
-                               goto done_zones_check;
-                       }
-                       ntfs_debug("Continuing outer while loop.");
-                       continue;
-               } /* done_zones == 7 */
-               ntfs_debug("All zones are finished.");
-               /*
-                * All zones are finished!  If DATA_ZONE, shrink mft zone.  If
-                * MFT_ZONE, we have really run out of space.
-                */
-               mft_zone_size = vol->mft_zone_end - vol->mft_zone_start;
-               ntfs_debug("vol->mft_zone_start 0x%llx, vol->mft_zone_end "
-                               "0x%llx, mft_zone_size 0x%llx.",
-                               (unsigned long long)vol->mft_zone_start,
-                               (unsigned long long)vol->mft_zone_end,
-                               (unsigned long long)mft_zone_size);
-               if (zone == MFT_ZONE || mft_zone_size <= 0) {
-                       ntfs_debug("No free clusters left, going to out.");
-                       /* Really no more space left on device. */
-                       err = -ENOSPC;
-                       goto out;
-               } /* zone == DATA_ZONE && mft_zone_size > 0 */
-               ntfs_debug("Shrinking mft zone.");
-               zone_end = vol->mft_zone_end;
-               mft_zone_size >>= 1;
-               if (mft_zone_size > 0)
-                       vol->mft_zone_end = vol->mft_zone_start + mft_zone_size;
-               else /* mft zone and data2 zone no longer exist. */
-                       vol->data2_zone_pos = vol->mft_zone_start =
-                                       vol->mft_zone_end = 0;
-               if (vol->mft_zone_pos >= vol->mft_zone_end) {
-                       vol->mft_zone_pos = vol->mft_lcn;
-                       if (!vol->mft_zone_end)
-                               vol->mft_zone_pos = 0;
-               }
-               bmp_pos = zone_start = bmp_initial_pos =
-                               vol->data1_zone_pos = vol->mft_zone_end;
-               search_zone = 2;
-               pass = 2;
-               done_zones &= ~2;
-               ntfs_debug("After shrinking mft zone, mft_zone_size 0x%llx, "
-                               "vol->mft_zone_start 0x%llx, "
-                               "vol->mft_zone_end 0x%llx, "
-                               "vol->mft_zone_pos 0x%llx, search_zone 2, "
-                               "pass 2, done_zones 0x%x, zone_start 0x%llx, "
-                               "zone_end 0x%llx, vol->data1_zone_pos 0x%llx, "
-                               "continuing outer while loop.",
-                               (unsigned long long)mft_zone_size,
-                               (unsigned long long)vol->mft_zone_start,
-                               (unsigned long long)vol->mft_zone_end,
-                               (unsigned long long)vol->mft_zone_pos,
-                               done_zones, (unsigned long long)zone_start,
-                               (unsigned long long)zone_end,
-                               (unsigned long long)vol->data1_zone_pos);
-       }
-       ntfs_debug("After outer while loop.");
-out:
-       ntfs_debug("At out.");
-       /* Add runlist terminator element. */
-       if (likely(rl)) {
-               rl[rlpos].vcn = rl[rlpos - 1].vcn + rl[rlpos - 1].length;
-               rl[rlpos].lcn = is_extension ? LCN_ENOENT : LCN_RL_NOT_MAPPED;
-               rl[rlpos].length = 0;
-       }
-       if (likely(page && !IS_ERR(page))) {
-               if (need_writeback) {
-                       ntfs_debug("Marking page dirty.");
-                       flush_dcache_page(page);
-                       set_page_dirty(page);
-                       need_writeback = 0;
-               }
-               ntfs_unmap_page(page);
-       }
-       if (likely(!err)) {
-               up_write(&vol->lcnbmp_lock);
-               ntfs_debug("Done.");
-               return rl;
-       }
-       ntfs_error(vol->sb, "Failed to allocate clusters, aborting "
-                       "(error %i).", err);
-       if (rl) {
-               int err2;
-
-               if (err == -ENOSPC)
-                       ntfs_debug("Not enough space to complete allocation, "
-                                       "err -ENOSPC, first free lcn 0x%llx, "
-                                       "could allocate up to 0x%llx "
-                                       "clusters.",
-                                       (unsigned long long)rl[0].lcn,
-                                       (unsigned long long)(count - clusters));
-               /* Deallocate all allocated clusters. */
-               ntfs_debug("Attempting rollback...");
-               err2 = ntfs_cluster_free_from_rl_nolock(vol, rl);
-               if (err2) {
-                       ntfs_error(vol->sb, "Failed to rollback (error %i).  "
-                                       "Leaving inconsistent metadata!  "
-                                       "Unmount and run chkdsk.", err2);
-                       NVolSetErrors(vol);
-               }
-               /* Free the runlist. */
-               ntfs_free(rl);
-       } else if (err == -ENOSPC)
-               ntfs_debug("No space left at all, err = -ENOSPC, first free "
-                               "lcn = 0x%llx.",
-                               (long long)vol->data1_zone_pos);
-       up_write(&vol->lcnbmp_lock);
-       return ERR_PTR(err);
-}
-
-/**
- * __ntfs_cluster_free - free clusters on an ntfs volume
- * @ni:                ntfs inode whose runlist describes the clusters to free
- * @start_vcn: vcn in the runlist of @ni at which to start freeing clusters
- * @count:     number of clusters to free or -1 for all clusters
- * @ctx:       active attribute search context if present or NULL if not
- * @is_rollback:       true if this is a rollback operation
- *
- * Free @count clusters starting at the cluster @start_vcn in the runlist
- * described by the ntfs inode @ni.
- *
- * If @count is -1, all clusters from @start_vcn to the end of the runlist are
- * deallocated.  Thus, to completely free all clusters in a runlist, use
- * @start_vcn = 0 and @count = -1.
- *
- * If @ctx is specified, it is an active search context of @ni and its base mft
- * record.  This is needed when __ntfs_cluster_free() encounters unmapped
- * runlist fragments and allows their mapping.  If you do not have the mft
- * record mapped, you can specify @ctx as NULL and __ntfs_cluster_free() will
- * perform the necessary mapping and unmapping.
- *
- * Note, __ntfs_cluster_free() saves the state of @ctx on entry and restores it
- * before returning.  Thus, @ctx will be left pointing to the same attribute on
- * return as on entry.  However, the actual pointers in @ctx may point to
- * different memory locations on return, so you must remember to reset any
- * cached pointers from the @ctx, i.e. after the call to __ntfs_cluster_free(),
- * you will probably want to do:
- *     m = ctx->mrec;
- *     a = ctx->attr;
- * Assuming you cache ctx->attr in a variable @a of type ATTR_RECORD * and that
- * you cache ctx->mrec in a variable @m of type MFT_RECORD *.
- *
- * @is_rollback should always be 'false', it is for internal use to rollback
- * errors.  You probably want to use ntfs_cluster_free() instead.
- *
- * Note, __ntfs_cluster_free() does not modify the runlist, so you have to
- * remove from the runlist or mark sparse the freed runs later.
- *
- * Return the number of deallocated clusters (not counting sparse ones) on
- * success and -errno on error.
- *
- * WARNING: If @ctx is supplied, regardless of whether success or failure is
- *         returned, you need to check IS_ERR(@ctx->mrec) and if 'true' the @ctx
- *         is no longer valid, i.e. you need to either call
- *         ntfs_attr_reinit_search_ctx() or ntfs_attr_put_search_ctx() on it.
- *         In that case PTR_ERR(@ctx->mrec) will give you the error code for
- *         why the mapping of the old inode failed.
- *
- * Locking: - The runlist described by @ni must be locked for writing on entry
- *           and is locked on return.  Note the runlist may be modified when
- *           needed runlist fragments need to be mapped.
- *         - The volume lcn bitmap must be unlocked on entry and is unlocked
- *           on return.
- *         - This function takes the volume lcn bitmap lock for writing and
- *           modifies the bitmap contents.
- *         - If @ctx is NULL, the base mft record of @ni must not be mapped on
- *           entry and it will be left unmapped on return.
- *         - If @ctx is not NULL, the base mft record must be mapped on entry
- *           and it will be left mapped on return.
- */
-s64 __ntfs_cluster_free(ntfs_inode *ni, const VCN start_vcn, s64 count,
-               ntfs_attr_search_ctx *ctx, const bool is_rollback)
-{
-       s64 delta, to_free, total_freed, real_freed;
-       ntfs_volume *vol;
-       struct inode *lcnbmp_vi;
-       runlist_element *rl;
-       int err;
-
-       BUG_ON(!ni);
-       ntfs_debug("Entering for i_ino 0x%lx, start_vcn 0x%llx, count "
-                       "0x%llx.%s", ni->mft_no, (unsigned long long)start_vcn,
-                       (unsigned long long)count,
-                       is_rollback ? " (rollback)" : "");
-       vol = ni->vol;
-       lcnbmp_vi = vol->lcnbmp_ino;
-       BUG_ON(!lcnbmp_vi);
-       BUG_ON(start_vcn < 0);
-       BUG_ON(count < -1);
-       /*
-        * Lock the lcn bitmap for writing but only if not rolling back.  We
-        * must hold the lock all the way including through rollback otherwise
-        * rollback is not possible because once we have cleared a bit and
-        * dropped the lock, anyone could have set the bit again, thus
-        * allocating the cluster for another use.
-        */
-       if (likely(!is_rollback))
-               down_write(&vol->lcnbmp_lock);
-
-       total_freed = real_freed = 0;
-
-       rl = ntfs_attr_find_vcn_nolock(ni, start_vcn, ctx);
-       if (IS_ERR(rl)) {
-               if (!is_rollback)
-                       ntfs_error(vol->sb, "Failed to find first runlist "
-                                       "element (error %li), aborting.",
-                                       PTR_ERR(rl));
-               err = PTR_ERR(rl);
-               goto err_out;
-       }
-       if (unlikely(rl->lcn < LCN_HOLE)) {
-               if (!is_rollback)
-                       ntfs_error(vol->sb, "First runlist element has "
-                                       "invalid lcn, aborting.");
-               err = -EIO;
-               goto err_out;
-       }
-       /* Find the starting cluster inside the run that needs freeing. */
-       delta = start_vcn - rl->vcn;
-
-       /* The number of clusters in this run that need freeing. */
-       to_free = rl->length - delta;
-       if (count >= 0 && to_free > count)
-               to_free = count;
-
-       if (likely(rl->lcn >= 0)) {
-               /* Do the actual freeing of the clusters in this run. */
-               err = ntfs_bitmap_set_bits_in_run(lcnbmp_vi, rl->lcn + delta,
-                               to_free, likely(!is_rollback) ? 0 : 1);
-               if (unlikely(err)) {
-                       if (!is_rollback)
-                               ntfs_error(vol->sb, "Failed to clear first run "
-                                               "(error %i), aborting.", err);
-                       goto err_out;
-               }
-               /* We have freed @to_free real clusters. */
-               real_freed = to_free;
-       }
-       /* Go to the next run and adjust the number of clusters left to free. */
-       ++rl;
-       if (count >= 0)
-               count -= to_free;
-
-       /* Keep track of the total "freed" clusters, including sparse ones. */
-       total_freed = to_free;
-       /*
-        * Loop over the remaining runs, using @count as a capping value, and
-        * free them.
-        */
-       for (; rl->length && count != 0; ++rl) {
-               if (unlikely(rl->lcn < LCN_HOLE)) {
-                       VCN vcn;
-
-                       /* Attempt to map runlist. */
-                       vcn = rl->vcn;
-                       rl = ntfs_attr_find_vcn_nolock(ni, vcn, ctx);
-                       if (IS_ERR(rl)) {
-                               err = PTR_ERR(rl);
-                               if (!is_rollback)
-                                       ntfs_error(vol->sb, "Failed to map "
-                                                       "runlist fragment or "
-                                                       "failed to find "
-                                                       "subsequent runlist "
-                                                       "element.");
-                               goto err_out;
-                       }
-                       if (unlikely(rl->lcn < LCN_HOLE)) {
-                               if (!is_rollback)
-                                       ntfs_error(vol->sb, "Runlist element "
-                                                       "has invalid lcn "
-                                                       "(0x%llx).",
-                                                       (unsigned long long)
-                                                       rl->lcn);
-                               err = -EIO;
-                               goto err_out;
-                       }
-               }
-               /* The number of clusters in this run that need freeing. */
-               to_free = rl->length;
-               if (count >= 0 && to_free > count)
-                       to_free = count;
-
-               if (likely(rl->lcn >= 0)) {
-                       /* Do the actual freeing of the clusters in the run. */
-                       err = ntfs_bitmap_set_bits_in_run(lcnbmp_vi, rl->lcn,
-                                       to_free, likely(!is_rollback) ? 0 : 1);
-                       if (unlikely(err)) {
-                               if (!is_rollback)
-                                       ntfs_error(vol->sb, "Failed to clear "
-                                                       "subsequent run.");
-                               goto err_out;
-                       }
-                       /* We have freed @to_free real clusters. */
-                       real_freed += to_free;
-               }
-               /* Adjust the number of clusters left to free. */
-               if (count >= 0)
-                       count -= to_free;
-
-               /* Update the total done clusters. */
-               total_freed += to_free;
-       }
-       if (likely(!is_rollback))
-               up_write(&vol->lcnbmp_lock);
-
-       BUG_ON(count > 0);
-
-       /* We are done.  Return the number of actually freed clusters. */
-       ntfs_debug("Done.");
-       return real_freed;
-err_out:
-       if (is_rollback)
-               return err;
-       /* If no real clusters were freed, no need to rollback. */
-       if (!real_freed) {
-               up_write(&vol->lcnbmp_lock);
-               return err;
-       }
-       /*
-        * Attempt to rollback and if that succeeds just return the error code.
-        * If rollback fails, set the volume errors flag, emit an error
-        * message, and return the error code.
-        */
-       delta = __ntfs_cluster_free(ni, start_vcn, total_freed, ctx, true);
-       if (delta < 0) {
-               ntfs_error(vol->sb, "Failed to rollback (error %i).  Leaving "
-                               "inconsistent metadata!  Unmount and run "
-                               "chkdsk.", (int)delta);
-               NVolSetErrors(vol);
-       }
-       up_write(&vol->lcnbmp_lock);
-       ntfs_error(vol->sb, "Aborting (error %i).", err);
-       return err;
-}
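
A minimal sketch of the calling pattern the kernel-doc above implies,
assuming the locking rules are already satisfied; @ni and @ctx are as
described above, @m and @a are the hypothetical cached pointers:

    MFT_RECORD *m;
    ATTR_RECORD *a;
    s64 nr;

    nr = ntfs_cluster_free(ni, 0, -1, ctx);  /* free the whole runlist */
    if (IS_ERR(ctx->mrec))
            /* Per the WARNING: @ctx is stale and must be reset. */
            ntfs_attr_reinit_search_ctx(ctx);
    /* Pointers inside @ctx may have moved; re-cache them. */
    m = ctx->mrec;
    a = ctx->attr;
    if (nr < 0)
            ntfs_error(ni->vol->sb, "Freeing failed (error %i).", (int)nr);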
-
-#endif /* NTFS_RW */
diff --git a/fs/ntfs/lcnalloc.h b/fs/ntfs/lcnalloc.h
deleted file mode 100644 (file)
index 1589a6d..0000000
+++ /dev/null
@@ -1,131 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * lcnalloc.h - Exports for NTFS kernel cluster (de)allocation.  Part of the
- *             Linux-NTFS project.
- *
- * Copyright (c) 2004-2005 Anton Altaparmakov
- */
-
-#ifndef _LINUX_NTFS_LCNALLOC_H
-#define _LINUX_NTFS_LCNALLOC_H
-
-#ifdef NTFS_RW
-
-#include <linux/fs.h>
-
-#include "attrib.h"
-#include "types.h"
-#include "inode.h"
-#include "runlist.h"
-#include "volume.h"
-
-typedef enum {
-       FIRST_ZONE      = 0,    /* For sanity checking. */
-       MFT_ZONE        = 0,    /* Allocate from $MFT zone. */
-       DATA_ZONE       = 1,    /* Allocate from $DATA zone. */
-       LAST_ZONE       = 1,    /* For sanity checking. */
-} NTFS_CLUSTER_ALLOCATION_ZONES;
-
-extern runlist_element *ntfs_cluster_alloc(ntfs_volume *vol,
-               const VCN start_vcn, const s64 count, const LCN start_lcn,
-               const NTFS_CLUSTER_ALLOCATION_ZONES zone,
-               const bool is_extension);
-
-extern s64 __ntfs_cluster_free(ntfs_inode *ni, const VCN start_vcn,
-               s64 count, ntfs_attr_search_ctx *ctx, const bool is_rollback);
-
-/**
- * ntfs_cluster_free - free clusters on an ntfs volume
- * @ni:                ntfs inode whose runlist describes the clusters to free
- * @start_vcn: vcn in the runlist of @ni at which to start freeing clusters
- * @count:     number of clusters to free or -1 for all clusters
- * @ctx:       active attribute search context if present or NULL if not
- *
- * Free @count clusters starting at the cluster @start_vcn in the runlist
- * described by the ntfs inode @ni.
- *
- * If @count is -1, all clusters from @start_vcn to the end of the runlist are
- * deallocated.  Thus, to completely free all clusters in a runlist, use
- * @start_vcn = 0 and @count = -1.
- *
- * If @ctx is specified, it is an active search context of @ni and its base mft
- * record.  This is needed when ntfs_cluster_free() encounters unmapped runlist
- * fragments and allows their mapping.  If you do not have the mft record
- * mapped, you can specify @ctx as NULL and ntfs_cluster_free() will perform
- * the necessary mapping and unmapping.
- *
- * Note, ntfs_cluster_free() saves the state of @ctx on entry and restores it
- * before returning.  Thus, @ctx will be left pointing to the same attribute on
- * return as on entry.  However, the actual pointers in @ctx may point to
- * different memory locations on return, so you must remember to reset any
- * cached pointers from the @ctx, i.e. after the call to ntfs_cluster_free(),
- * you will probably want to do:
- *     m = ctx->mrec;
- *     a = ctx->attr;
- * Assuming you cache ctx->attr in a variable @a of type ATTR_RECORD * and that
- * you cache ctx->mrec in a variable @m of type MFT_RECORD *.
- *
- * Note, ntfs_cluster_free() does not modify the runlist, so you have to remove
- * from the runlist or mark sparse the freed runs later.
- *
- * Return the number of deallocated clusters (not counting sparse ones) on
- * success and -errno on error.
- *
- * WARNING: If @ctx is supplied, regardless of whether success or failure is
- *         returned, you need to check IS_ERR(@ctx->mrec) and if 'true' the @ctx
- *         is no longer valid, i.e. you need to either call
- *         ntfs_attr_reinit_search_ctx() or ntfs_attr_put_search_ctx() on it.
- *         In that case PTR_ERR(@ctx->mrec) will give you the error code for
- *         why the mapping of the old inode failed.
- *
- * Locking: - The runlist described by @ni must be locked for writing on entry
- *           and is locked on return.  Note the runlist may be modified when
- *           needed runlist fragments need to be mapped.
- *         - The volume lcn bitmap must be unlocked on entry and is unlocked
- *           on return.
- *         - This function takes the volume lcn bitmap lock for writing and
- *           modifies the bitmap contents.
- *         - If @ctx is NULL, the base mft record of @ni must not be mapped on
- *           entry and it will be left unmapped on return.
- *         - If @ctx is not NULL, the base mft record must be mapped on entry
- *           and it will be left mapped on return.
- */
-static inline s64 ntfs_cluster_free(ntfs_inode *ni, const VCN start_vcn,
-               s64 count, ntfs_attr_search_ctx *ctx)
-{
-       return __ntfs_cluster_free(ni, start_vcn, count, ctx, false);
-}
-
-extern int ntfs_cluster_free_from_rl_nolock(ntfs_volume *vol,
-               const runlist_element *rl);
-
-/**
- * ntfs_cluster_free_from_rl - free clusters from runlist
- * @vol:       mounted ntfs volume on which to free the clusters
- * @rl:                runlist describing the clusters to free
- *
- * Free all the clusters described by the runlist @rl on the volume @vol.  In
- * the case of an error being returned, at least some of the clusters were not
- * freed.
- *
- * Return 0 on success and -errno on error.
- *
- * Locking: - This function takes the volume lcn bitmap lock for writing and
- *           modifies the bitmap contents.
- *         - The caller must have locked the runlist @rl for reading or
- *           writing.
- */
-static inline int ntfs_cluster_free_from_rl(ntfs_volume *vol,
-               const runlist_element *rl)
-{
-       int ret;
-
-       down_write(&vol->lcnbmp_lock);
-       ret = ntfs_cluster_free_from_rl_nolock(vol, rl);
-       up_write(&vol->lcnbmp_lock);
-       return ret;
-}
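
The wrapper above takes the bitmap lock itself; the _nolock variant
exists because the allocator's rollback must run while the lock is
still held, or another allocation could grab the just-freed clusters.
A condensed sketch of that error path, with names as in
ntfs_cluster_alloc():

    down_write(&vol->lcnbmp_lock);
    /* ... allocation builds @rl, then fails with @err ... */
    if (err && rl) {
            int err2 = ntfs_cluster_free_from_rl_nolock(vol, rl);

            if (err2)
                    NVolSetErrors(vol);  /* rollback failed, mark volume */
    }
    up_write(&vol->lcnbmp_lock);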
-
-#endif /* NTFS_RW */
-
-#endif /* defined _LINUX_NTFS_LCNALLOC_H */
diff --git a/fs/ntfs/logfile.c b/fs/ntfs/logfile.c
deleted file mode 100644 (file)
index 6ce60ff..0000000
+++ /dev/null
@@ -1,849 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * logfile.c - NTFS kernel journal handling. Part of the Linux-NTFS project.
- *
- * Copyright (c) 2002-2007 Anton Altaparmakov
- */
-
-#ifdef NTFS_RW
-
-#include <linux/types.h>
-#include <linux/fs.h>
-#include <linux/highmem.h>
-#include <linux/buffer_head.h>
-#include <linux/bitops.h>
-#include <linux/log2.h>
-#include <linux/bio.h>
-
-#include "attrib.h"
-#include "aops.h"
-#include "debug.h"
-#include "logfile.h"
-#include "malloc.h"
-#include "volume.h"
-#include "ntfs.h"
-
-/**
- * ntfs_check_restart_page_header - check the page header for consistency
- * @vi:                $LogFile inode to which the restart page header belongs
- * @rp:                restart page header to check
- * @pos:       position in @vi at which the restart page header resides
- *
- * Check the restart page header @rp for consistency and return 'true' if it is
- * consistent and 'false' otherwise.
- *
- * This function only needs NTFS_BLOCK_SIZE bytes in @rp, i.e. it does not
- * require the full restart page.
- */
-static bool ntfs_check_restart_page_header(struct inode *vi,
-               RESTART_PAGE_HEADER *rp, s64 pos)
-{
-       u32 logfile_system_page_size, logfile_log_page_size;
-       u16 ra_ofs, usa_count, usa_ofs, usa_end = 0;
-       bool have_usa = true;
-
-       ntfs_debug("Entering.");
-       /*
-        * If the system or log page sizes are smaller than the ntfs block size
-        * or either is not a power of 2, we cannot handle this log file.
-        */
-       logfile_system_page_size = le32_to_cpu(rp->system_page_size);
-       logfile_log_page_size = le32_to_cpu(rp->log_page_size);
-       if (logfile_system_page_size < NTFS_BLOCK_SIZE ||
-                       logfile_log_page_size < NTFS_BLOCK_SIZE ||
-                       logfile_system_page_size &
-                       (logfile_system_page_size - 1) ||
-                       !is_power_of_2(logfile_log_page_size)) {
-               ntfs_error(vi->i_sb, "$LogFile uses unsupported page size.");
-               return false;
-       }
-       /*
-        * We must be either at !pos (1st restart page) or at pos = system page
-        * size (2nd restart page).
-        */
-       if (pos && pos != logfile_system_page_size) {
-               ntfs_error(vi->i_sb, "Found restart area in incorrect "
-                               "position in $LogFile.");
-               return false;
-       }
-       /* We only know how to handle version 1.1. */
-       if (sle16_to_cpu(rp->major_ver) != 1 ||
-                       sle16_to_cpu(rp->minor_ver) != 1) {
-               ntfs_error(vi->i_sb, "$LogFile version %i.%i is not "
-                               "supported.  (This driver supports version "
-                               "1.1 only.)", (int)sle16_to_cpu(rp->major_ver),
-                               (int)sle16_to_cpu(rp->minor_ver));
-               return false;
-       }
-       /*
-        * If chkdsk has been run the restart page may not be protected by an
-        * update sequence array.
-        */
-       if (ntfs_is_chkd_record(rp->magic) && !le16_to_cpu(rp->usa_count)) {
-               have_usa = false;
-               goto skip_usa_checks;
-       }
-       /* Verify the size of the update sequence array. */
-       usa_count = 1 + (logfile_system_page_size >> NTFS_BLOCK_SIZE_BITS);
-       if (usa_count != le16_to_cpu(rp->usa_count)) {
-               ntfs_error(vi->i_sb, "$LogFile restart page specifies "
-                               "inconsistent update sequence array count.");
-               return false;
-       }
-       /* Verify the position of the update sequence array. */
-       usa_ofs = le16_to_cpu(rp->usa_ofs);
-       usa_end = usa_ofs + usa_count * sizeof(u16);
-       if (usa_ofs < sizeof(RESTART_PAGE_HEADER) ||
-                       usa_end > NTFS_BLOCK_SIZE - sizeof(u16)) {
-               ntfs_error(vi->i_sb, "$LogFile restart page specifies "
-                               "inconsistent update sequence array offset.");
-               return false;
-       }
-skip_usa_checks:
-       /*
-        * Verify the position of the restart area.  It must be:
-        *      - aligned to 8-byte boundary,
-        *      - after the update sequence array, and
-        *      - within the system page size.
-        */
-       ra_ofs = le16_to_cpu(rp->restart_area_offset);
-       if (ra_ofs & 7 || (have_usa ? ra_ofs < usa_end :
-                       ra_ofs < sizeof(RESTART_PAGE_HEADER)) ||
-                       ra_ofs > logfile_system_page_size) {
-               ntfs_error(vi->i_sb, "$LogFile restart page specifies "
-                               "inconsistent restart area offset.");
-               return false;
-       }
-       /*
-        * Only restart pages modified by chkdsk are allowed to have chkdsk_lsn
-        * set.
-        */
-       if (!ntfs_is_chkd_record(rp->magic) && sle64_to_cpu(rp->chkdsk_lsn)) {
-               ntfs_error(vi->i_sb, "$LogFile restart page is not modified "
-                               "by chkdsk but a chkdsk LSN is specified.");
-               return false;
-       }
-       ntfs_debug("Done.");
-       return true;
-}
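
Two power-of-two tests appear in the function above: the open-coded
x & (x - 1) idiom for the system page size and is_power_of_2() for the
log page size.  They differ only for x == 0 (the idiom accepts it, the
helper rejects it), which is harmless here because the preceding
< NTFS_BLOCK_SIZE comparisons already rejected zero.  In isolation:

    /* 0x1000 passes both tests, 0x600 fails both, 0 splits them. */
    bool idiom  = !(logfile_system_page_size &
                    (logfile_system_page_size - 1));
    bool helper = is_power_of_2(logfile_log_page_size);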
-
-/**
- * ntfs_check_restart_area - check the restart area for consistency
- * @vi:                $LogFile inode to which the restart page belongs
- * @rp:                restart page whose restart area to check
- *
- * Check the restart area of the restart page @rp for consistency and return
- * 'true' if it is consistent and 'false' otherwise.
- *
- * This function assumes that the restart page header has already been
- * consistency checked.
- *
- * This function only needs NTFS_BLOCK_SIZE bytes in @rp, i.e. it does not
- * require the full restart page.
- */
-static bool ntfs_check_restart_area(struct inode *vi, RESTART_PAGE_HEADER *rp)
-{
-       u64 file_size;
-       RESTART_AREA *ra;
-       u16 ra_ofs, ra_len, ca_ofs;
-       u8 fs_bits;
-
-       ntfs_debug("Entering.");
-       ra_ofs = le16_to_cpu(rp->restart_area_offset);
-       ra = (RESTART_AREA*)((u8*)rp + ra_ofs);
-       /*
-        * Everything before ra->file_size must be before the first word
-        * protected by an update sequence number.  This ensures that it is
-        * safe to access ra->client_array_offset.
-        */
-       if (ra_ofs + offsetof(RESTART_AREA, file_size) >
-                       NTFS_BLOCK_SIZE - sizeof(u16)) {
-               ntfs_error(vi->i_sb, "$LogFile restart area specifies "
-                               "inconsistent file offset.");
-               return false;
-       }
-       /*
-        * Now that we can access ra->client_array_offset, make sure everything
-        * up to the log client array is before the first word protected by an
-        * update sequence number.  This ensures we can access all of the
-        * restart area elements safely.  Also, the client array offset must be
-        * aligned to an 8-byte boundary.
-        */
-       ca_ofs = le16_to_cpu(ra->client_array_offset);
-       if (((ca_ofs + 7) & ~7) != ca_ofs ||
-                       ra_ofs + ca_ofs > NTFS_BLOCK_SIZE - sizeof(u16)) {
-               ntfs_error(vi->i_sb, "$LogFile restart area specifies "
-                               "inconsistent client array offset.");
-               return false;
-       }
-       /*
-        * The restart area must end within the system page size both when
-        * calculated manually and as specified by ra->restart_area_length.
-        * Also, the calculated length must not exceed the specified length.
-        */
-       ra_len = ca_ofs + le16_to_cpu(ra->log_clients) *
-                       sizeof(LOG_CLIENT_RECORD);
-       if (ra_ofs + ra_len > le32_to_cpu(rp->system_page_size) ||
-                       ra_ofs + le16_to_cpu(ra->restart_area_length) >
-                       le32_to_cpu(rp->system_page_size) ||
-                       ra_len > le16_to_cpu(ra->restart_area_length)) {
-               ntfs_error(vi->i_sb, "$LogFile restart area is out of bounds "
-                               "of the system page size specified by the "
-                               "restart page header and/or the specified "
-                               "restart area length is inconsistent.");
-               return false;
-       }
-       /*
-        * The ra->client_free_list and ra->client_in_use_list must be either
-        * LOGFILE_NO_CLIENT or less than ra->log_clients or they are
-        * overflowing the client array.
-        */
-       if ((ra->client_free_list != LOGFILE_NO_CLIENT &&
-                       le16_to_cpu(ra->client_free_list) >=
-                       le16_to_cpu(ra->log_clients)) ||
-                       (ra->client_in_use_list != LOGFILE_NO_CLIENT &&
-                       le16_to_cpu(ra->client_in_use_list) >=
-                       le16_to_cpu(ra->log_clients))) {
-               ntfs_error(vi->i_sb, "$LogFile restart area specifies "
-                               "overflowing client free and/or in use lists.");
-               return false;
-       }
-       /*
-        * Check ra->seq_number_bits against ra->file_size for consistency.
-        * We cannot just use ffs() because the file size is not a power of 2.
-        */
-       file_size = (u64)sle64_to_cpu(ra->file_size);
-       fs_bits = 0;
-       while (file_size) {
-               file_size >>= 1;
-               fs_bits++;
-       }
-       if (le32_to_cpu(ra->seq_number_bits) != 67 - fs_bits) {
-               ntfs_error(vi->i_sb, "$LogFile restart area specifies "
-                               "inconsistent sequence number bits.");
-               return false;
-       }
-       /* The log record header length must be a multiple of 8. */
-       if (((le16_to_cpu(ra->log_record_header_length) + 7) & ~7) !=
-                       le16_to_cpu(ra->log_record_header_length)) {
-               ntfs_error(vi->i_sb, "$LogFile restart area specifies "
-                               "inconsistent log record header length.");
-               return false;
-       }
-       /* Ditto for the log page data offset. */
-       if (((le16_to_cpu(ra->log_page_data_offset) + 7) & ~7) !=
-                       le16_to_cpu(ra->log_page_data_offset)) {
-               ntfs_error(vi->i_sb, "$LogFile restart area specifies "
-                               "inconsistent log page data offset.");
-               return false;
-       }
-       ntfs_debug("Done.");
-       return true;
-}
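
The seq_number_bits check above counts how many bits are needed to
represent file_size and requires the restart area to agree.  A worked
example with a hypothetical 64 MiB journal:

    u64 file_size = 0x4000000;  /* hypothetical 64 MiB $LogFile */
    u8 fs_bits = 0;

    while (file_size) {
            file_size >>= 1;
            fs_bits++;
    }
    /* fs_bits == 27, so the page is consistent only if
     * ra->seq_number_bits == 67 - 27 == 40. */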
-
-/**
- * ntfs_check_log_client_array - check the log client array for consistency
- * @vi:                $LogFile inode to which the restart page belongs
- * @rp:                restart page whose log client array to check
- *
- * Check the log client array of the restart page @rp for consistency and
- * return 'true' if it is consistent and 'false' otherwise.
- *
- * This function assumes that the restart page header and the restart area have
- * already been consistency checked.
- *
- * Unlike ntfs_check_restart_page_header() and ntfs_check_restart_area(), this
- * function needs @rp->system_page_size bytes in @rp, i.e. it requires the full
- * restart page and the page must be multi sector transfer deprotected.
- */
-static bool ntfs_check_log_client_array(struct inode *vi,
-               RESTART_PAGE_HEADER *rp)
-{
-       RESTART_AREA *ra;
-       LOG_CLIENT_RECORD *ca, *cr;
-       u16 nr_clients, idx;
-       bool in_free_list, idx_is_first;
-
-       ntfs_debug("Entering.");
-       ra = (RESTART_AREA*)((u8*)rp + le16_to_cpu(rp->restart_area_offset));
-       ca = (LOG_CLIENT_RECORD*)((u8*)ra +
-                       le16_to_cpu(ra->client_array_offset));
-       /*
-        * Check the ra->client_free_list first and then check the
-        * ra->client_in_use_list.  Check each of the log client records in
-        * each of the lists and check that the array does not overflow the
-        * ra->log_clients value.  Also keep track of the number of records
-        * visited as there cannot be more than ra->log_clients records and
-        * that way we detect any loops within a list.
-        */
-       nr_clients = le16_to_cpu(ra->log_clients);
-       idx = le16_to_cpu(ra->client_free_list);
-       in_free_list = true;
-check_list:
-       for (idx_is_first = true; idx != LOGFILE_NO_CLIENT_CPU; nr_clients--,
-                       idx = le16_to_cpu(cr->next_client)) {
-               if (!nr_clients || idx >= le16_to_cpu(ra->log_clients))
-                       goto err_out;
-               /* Set @cr to the current log client record. */
-               cr = ca + idx;
-               /* The first log client record must not have a prev_client. */
-               if (idx_is_first) {
-                       if (cr->prev_client != LOGFILE_NO_CLIENT)
-                               goto err_out;
-                       idx_is_first = false;
-               }
-       }
-       /* Switch to and check the in use list if we just did the free list. */
-       if (in_free_list) {
-               in_free_list = false;
-               idx = le16_to_cpu(ra->client_in_use_list);
-               goto check_list;
-       }
-       ntfs_debug("Done.");
-       return true;
-err_out:
-       ntfs_error(vi->i_sb, "$LogFile log client array is corrupt.");
-       return false;
-}
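
The walk above cannot loop forever on a corrupt array: nr_clients
starts at ra->log_clients and is decremented per visited record, so a
cycle exhausts the budget and drops into err_out.  The same bounded
traversal idiom reduced to its core, with names as above:

    u16 idx, budget = le16_to_cpu(ra->log_clients);

    for (idx = le16_to_cpu(ra->client_free_list);
                    idx != LOGFILE_NO_CLIENT_CPU;
                    idx = le16_to_cpu(ca[idx].next_client)) {
            if (!budget-- || idx >= le16_to_cpu(ra->log_clients))
                    return false;  /* cycle or out-of-range index */
    }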
-
-/**
- * ntfs_check_and_load_restart_page - check the restart page for consistency
- * @vi:                $LogFile inode to which the restart page belongs
- * @rp:                restart page to check
- * @pos:       position in @vi at which the restart page resides
- * @wrp:       [OUT] copy of the multi sector transfer deprotected restart page
- * @lsn:       [OUT] set to the current logfile lsn on success
- *
- * Check the restart page @rp for consistency and return 0 if it is consistent
- * and -errno otherwise.  The restart page may have been modified by chkdsk in
- * which case its magic is CHKD instead of RSTR.
- *
- * This function only needs NTFS_BLOCK_SIZE bytes in @rp, i.e. it does not
- * require the full restart page.
- *
- * If @wrp is not NULL, on success, *@wrp will point to a buffer containing a
- * copy of the complete multi sector transfer deprotected page.  On failure,
- * *@wrp is undefined.
- *
- * Similarly, if @lsn is not NULL, on success *@lsn will be set to the current
- * logfile lsn according to this restart page.  On failure, *@lsn is undefined.
- *
- * The following error codes are defined:
- *     -EINVAL - The restart page is inconsistent.
- *     -ENOMEM - Not enough memory to load the restart page.
- *     -EIO    - Failed to read from $LogFile.
- */
-static int ntfs_check_and_load_restart_page(struct inode *vi,
-               RESTART_PAGE_HEADER *rp, s64 pos, RESTART_PAGE_HEADER **wrp,
-               LSN *lsn)
-{
-       RESTART_AREA *ra;
-       RESTART_PAGE_HEADER *trp;
-       int size, err;
-
-       ntfs_debug("Entering.");
-       /* Check the restart page header for consistency. */
-       if (!ntfs_check_restart_page_header(vi, rp, pos)) {
-               /* Error output already done inside the function. */
-               return -EINVAL;
-       }
-       /* Check the restart area for consistency. */
-       if (!ntfs_check_restart_area(vi, rp)) {
-               /* Error output already done inside the function. */
-               return -EINVAL;
-       }
-       ra = (RESTART_AREA*)((u8*)rp + le16_to_cpu(rp->restart_area_offset));
-       /*
-        * Allocate a buffer to store the whole restart page so we can multi
-        * sector transfer deprotect it.
-        */
-       trp = ntfs_malloc_nofs(le32_to_cpu(rp->system_page_size));
-       if (!trp) {
-               ntfs_error(vi->i_sb, "Failed to allocate memory for $LogFile "
-                               "restart page buffer.");
-               return -ENOMEM;
-       }
-       /*
-        * Read the whole of the restart page into the buffer.  If it fits
-        * completely inside @rp, just copy it from there.  Otherwise map all
-        * the required pages and copy the data from them.
-        */
-       size = PAGE_SIZE - (pos & ~PAGE_MASK);
-       if (size >= le32_to_cpu(rp->system_page_size)) {
-               memcpy(trp, rp, le32_to_cpu(rp->system_page_size));
-       } else {
-               pgoff_t idx;
-               struct page *page;
-               int have_read, to_read;
-
-               /* First copy what we already have in @rp. */
-               memcpy(trp, rp, size);
-               /* Copy the remaining data one page at a time. */
-               have_read = size;
-               to_read = le32_to_cpu(rp->system_page_size) - size;
-               idx = (pos + size) >> PAGE_SHIFT;
-               BUG_ON((pos + size) & ~PAGE_MASK);
-               do {
-                       page = ntfs_map_page(vi->i_mapping, idx);
-                       if (IS_ERR(page)) {
-                               ntfs_error(vi->i_sb, "Error mapping $LogFile "
-                                               "page (index %lu).", idx);
-                               err = PTR_ERR(page);
-                               if (err != -EIO && err != -ENOMEM)
-                                       err = -EIO;
-                               goto err_out;
-                       }
-                       size = min_t(int, to_read, PAGE_SIZE);
-                       memcpy((u8*)trp + have_read, page_address(page), size);
-                       ntfs_unmap_page(page);
-                       have_read += size;
-                       to_read -= size;
-                       idx++;
-               } while (to_read > 0);
-       }
-       /*
-        * Perform the multi sector transfer deprotection on the buffer if the
-        * restart page is protected.
-        */
-       if ((!ntfs_is_chkd_record(trp->magic) || le16_to_cpu(trp->usa_count))
-                       && post_read_mst_fixup((NTFS_RECORD*)trp,
-                       le32_to_cpu(rp->system_page_size))) {
-               /*
-                * A multi sector transfer error was detected.  We only need to
-                * abort if the restart page contents exceed the multi sector
-                * transfer fixup of the first sector.
-                */
-               if (le16_to_cpu(rp->restart_area_offset) +
-                               le16_to_cpu(ra->restart_area_length) >
-                               NTFS_BLOCK_SIZE - sizeof(u16)) {
-                       ntfs_error(vi->i_sb, "Multi sector transfer error "
-                                       "detected in $LogFile restart page.");
-                       err = -EINVAL;
-                       goto err_out;
-               }
-       }
-       /*
-        * If the restart page is modified by chkdsk or there are no active
-        * logfile clients, the logfile is consistent.  Otherwise, need to
-        * check the log client records for consistency, too.
-        */
-       err = 0;
-       if (ntfs_is_rstr_record(rp->magic) &&
-                       ra->client_in_use_list != LOGFILE_NO_CLIENT) {
-               if (!ntfs_check_log_client_array(vi, trp)) {
-                       err = -EINVAL;
-                       goto err_out;
-               }
-       }
-       if (lsn) {
-               if (ntfs_is_rstr_record(rp->magic))
-                       *lsn = sle64_to_cpu(ra->current_lsn);
-               else /* if (ntfs_is_chkd_record(rp->magic)) */
-                       *lsn = sle64_to_cpu(rp->chkdsk_lsn);
-       }
-       ntfs_debug("Done.");
-       if (wrp)
-               *wrp = trp;
-       else {
-err_out:
-               ntfs_free(trp);
-       }
-       return err;
-}
-
-/**
- * ntfs_check_logfile - check the journal for consistency
- * @log_vi:    struct inode of loaded journal $LogFile to check
- * @rp:                [OUT] on success this is a copy of the current restart page
- *
- * Check the $LogFile journal for consistency and return 'true' if it is
- * consistent and 'false' if not.  On success, the current restart page is
- * returned in *@rp.  Caller must call ntfs_free(*@rp) when finished with it.
- *
- * At present we only check the two restart pages and ignore the log record
- * pages.
- *
- * Note that the MstProtected flag is not set on the $LogFile inode and hence
- * when reading pages they are not deprotected.  This is because we do not know
- * if the $LogFile was created on a system with a different page size to ours
- * yet and mst deprotection would fail if our page size is smaller.
- */
-bool ntfs_check_logfile(struct inode *log_vi, RESTART_PAGE_HEADER **rp)
-{
-       s64 size, pos;
-       LSN rstr1_lsn, rstr2_lsn;
-       ntfs_volume *vol = NTFS_SB(log_vi->i_sb);
-       struct address_space *mapping = log_vi->i_mapping;
-       struct page *page = NULL;
-       u8 *kaddr = NULL;
-       RESTART_PAGE_HEADER *rstr1_ph = NULL;
-       RESTART_PAGE_HEADER *rstr2_ph = NULL;
-       int log_page_size, err;
-       bool logfile_is_empty = true;
-       u8 log_page_bits;
-
-       ntfs_debug("Entering.");
-       /* An empty $LogFile must have been clean before it got emptied. */
-       if (NVolLogFileEmpty(vol))
-               goto is_empty;
-       size = i_size_read(log_vi);
-       /* Make sure the file doesn't exceed the maximum allowed size. */
-       if (size > MaxLogFileSize)
-               size = MaxLogFileSize;
-       /*
-        * Truncate size to a multiple of the page cache size or the default
-        * log page size if the page cache size is between the default log page
-        * size and twice that.
-        */
-       if (PAGE_SIZE >= DefaultLogPageSize && PAGE_SIZE <=
-                       DefaultLogPageSize * 2)
-               log_page_size = DefaultLogPageSize;
-       else
-               log_page_size = PAGE_SIZE;
-       /*
-        * Use ntfs_ffs() instead of ffs() to enable the compiler to
-        * optimize log_page_size and log_page_bits into constants.
-        */
-       log_page_bits = ntfs_ffs(log_page_size) - 1;
-       size &= ~(s64)(log_page_size - 1);
-       /*
-        * Ensure the log file is big enough to store at least the two restart
-        * pages and the minimum number of log record pages.
-        */
-       if (size < log_page_size * 2 || (size - log_page_size * 2) >>
-                       log_page_bits < MinLogRecordPages) {
-               ntfs_error(vol->sb, "$LogFile is too small.");
-               return false;
-       }
-       /*
-        * Read through the file looking for a restart page.  Since the restart
-        * page header is at the beginning of a page we only need to search at
-        * what could be the beginning of a page (for each page size) rather
-        * than scanning the whole file byte by byte.  If all potential places
-        * contain empty and uninitialized records, the log file can be assumed
-        * to be empty.
-        */
-       for (pos = 0; pos < size; pos <<= 1) {
-               pgoff_t idx = pos >> PAGE_SHIFT;
-               if (!page || page->index != idx) {
-                       if (page)
-                               ntfs_unmap_page(page);
-                       page = ntfs_map_page(mapping, idx);
-                       if (IS_ERR(page)) {
-                               ntfs_error(vol->sb, "Error mapping $LogFile "
-                                               "page (index %lu).", idx);
-                               goto err_out;
-                       }
-               }
-               kaddr = (u8*)page_address(page) + (pos & ~PAGE_MASK);
-               /*
-                * A non-empty block means the logfile is not empty while an
-                * empty block after a non-empty block has been encountered
-                * means we are done.
-                */
-               if (!ntfs_is_empty_recordp((le32*)kaddr))
-                       logfile_is_empty = false;
-               else if (!logfile_is_empty)
-                       break;
-               /*
-                * A log record page means there cannot be a restart page after
-                * this so no need to continue searching.
-                */
-               if (ntfs_is_rcrd_recordp((le32*)kaddr))
-                       break;
-               /* If not a (modified by chkdsk) restart page, continue. */
-               if (!ntfs_is_rstr_recordp((le32*)kaddr) &&
-                               !ntfs_is_chkd_recordp((le32*)kaddr)) {
-                       if (!pos)
-                               pos = NTFS_BLOCK_SIZE >> 1;
-                       continue;
-               }
-               /*
-                * Check the (modified by chkdsk) restart page for consistency
-                * and get a copy of the complete multi sector transfer
-                * deprotected restart page.
-                */
-               err = ntfs_check_and_load_restart_page(log_vi,
-                               (RESTART_PAGE_HEADER*)kaddr, pos,
-                               !rstr1_ph ? &rstr1_ph : &rstr2_ph,
-                               !rstr1_ph ? &rstr1_lsn : &rstr2_lsn);
-               if (!err) {
-                       /*
-                        * If we have now found the first (modified by chkdsk)
-                        * restart page, continue looking for the second one.
-                        */
-                       if (!pos) {
-                               pos = NTFS_BLOCK_SIZE >> 1;
-                               continue;
-                       }
-                       /*
-                        * We have now found the second (modified by chkdsk)
-                        * restart page, so we can stop looking.
-                        */
-                       break;
-               }
-               /*
-                * Error output already done inside the function.  Note, we do
-                * not abort if the restart page was invalid as we might still
-                * find a valid one further in the file.
-                */
-               if (err != -EINVAL) {
-                       ntfs_unmap_page(page);
-                       goto err_out;
-               }
-               /* Continue looking. */
-               if (!pos)
-                       pos = NTFS_BLOCK_SIZE >> 1;
-       }
-       if (page)
-               ntfs_unmap_page(page);
-       if (logfile_is_empty) {
-               NVolSetLogFileEmpty(vol);
-is_empty:
-               ntfs_debug("Done.  ($LogFile is empty.)");
-               return true;
-       }
-       if (!rstr1_ph) {
-               BUG_ON(rstr2_ph);
-               ntfs_error(vol->sb, "Did not find any restart pages in "
-                               "$LogFile and it was not empty.");
-               return false;
-       }
-       /* If both restart pages were found, use the more recent one. */
-       if (rstr2_ph) {
-               /*
-                * If the second restart area is more recent, switch to it.
-                * Otherwise just throw it away.
-                */
-               if (rstr2_lsn > rstr1_lsn) {
-                       ntfs_debug("Using second restart page as it is more "
-                                       "recent.");
-                       ntfs_free(rstr1_ph);
-                       rstr1_ph = rstr2_ph;
-                       /* rstr1_lsn = rstr2_lsn; */
-               } else {
-                       ntfs_debug("Using first restart page as it is more "
-                                       "recent.");
-                       ntfs_free(rstr2_ph);
-               }
-               rstr2_ph = NULL;
-       }
-       /* All consistency checks passed. */
-       if (rp)
-               *rp = rstr1_ph;
-       else
-               ntfs_free(rstr1_ph);
-       ntfs_debug("Done.");
-       return true;
-err_out:
-       if (rstr1_ph)
-               ntfs_free(rstr1_ph);
-       return false;
-}
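
A note on the probe pattern of the scan loop above: pos starts at 0;
an inconclusive probe bumps it to NTFS_BLOCK_SIZE >> 1 (256), which
the loop's pos <<= 1 immediately doubles to 512.  The offsets actually
examined are therefore 0, 512, 1024, 2048, ... i.e. every power of two
at which a restart page could begin.  The bare control flow:

    s64 pos;

    for (pos = 0; pos < size; pos <<= 1) {
            /* ... inspect the record at @pos ... */
            if (!pos)
                    pos = NTFS_BLOCK_SIZE >> 1;  /* doubled to 512 next */
    }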
-
-/**
- * ntfs_is_logfile_clean - check in the journal if the volume is clean
- * @log_vi:    struct inode of loaded journal $LogFile to check
- * @rp:                copy of the current restart page
- *
- * Analyze the $LogFile journal and return 'true' if it indicates the volume was
- * shut down cleanly and 'false' if not.
- *
- * At present we only look at the two restart pages and ignore the log record
- * pages.  This is a little bit crude in that there will be a very small number
- * of cases where we think that a volume is dirty when in fact it is clean.
- * This should only affect volumes that have not been shut down cleanly but did
- * not have any pending, non-check-pointed i/o, i.e. they were completely idle
- * at least for the five seconds preceding the unclean shutdown.
- *
- * This function assumes that the $LogFile journal has already been consistency
- * checked by a call to ntfs_check_logfile() and in particular if the $LogFile
- * is empty this function requires that NVolLogFileEmpty() is true otherwise an
- * empty volume will be reported as dirty.
- */
-bool ntfs_is_logfile_clean(struct inode *log_vi, const RESTART_PAGE_HEADER *rp)
-{
-       ntfs_volume *vol = NTFS_SB(log_vi->i_sb);
-       RESTART_AREA *ra;
-
-       ntfs_debug("Entering.");
-       /* An empty $LogFile must have been clean before it got emptied. */
-       if (NVolLogFileEmpty(vol)) {
-               ntfs_debug("Done.  ($LogFile is empty.)");
-               return true;
-       }
-       BUG_ON(!rp);
-       if (!ntfs_is_rstr_record(rp->magic) &&
-                       !ntfs_is_chkd_record(rp->magic)) {
-               ntfs_error(vol->sb, "Restart page buffer is invalid.  This is "
-                               "probably a bug in that the $LogFile should "
-                               "have been consistency checked before calling "
-                               "this function.");
-               return false;
-       }
-       ra = (RESTART_AREA*)((u8*)rp + le16_to_cpu(rp->restart_area_offset));
-       /*
-        * If the $LogFile has active clients, i.e. it is open, and we do not
-        * have the RESTART_VOLUME_IS_CLEAN bit set in the restart area flags,
-        * we assume there was an unclean shutdown.
-        */
-       if (ra->client_in_use_list != LOGFILE_NO_CLIENT &&
-                       !(ra->flags & RESTART_VOLUME_IS_CLEAN)) {
-               ntfs_debug("Done.  $LogFile indicates a dirty shutdown.");
-               return false;
-       }
-       /* $LogFile indicates a clean shutdown. */
-       ntfs_debug("Done.  $LogFile indicates a clean shutdown.");
-       return true;
-}
-
-/**
- * ntfs_empty_logfile - empty the contents of the $LogFile journal
- * @log_vi:    struct inode of loaded journal $LogFile to empty
- *
- * Empty the contents of the $LogFile journal @log_vi and return 'true' on
- * success and 'false' on error.
- *
- * This function assumes that the $LogFile journal has already been consistency
- * checked by a call to ntfs_check_logfile() and that ntfs_is_logfile_clean()
- * has been used to ensure that the $LogFile is clean.
- */
-bool ntfs_empty_logfile(struct inode *log_vi)
-{
-       VCN vcn, end_vcn;
-       ntfs_inode *log_ni = NTFS_I(log_vi);
-       ntfs_volume *vol = log_ni->vol;
-       struct super_block *sb = vol->sb;
-       runlist_element *rl;
-       unsigned long flags;
-       unsigned block_size, block_size_bits;
-       int err;
-       bool should_wait = true;
-
-       ntfs_debug("Entering.");
-       if (NVolLogFileEmpty(vol)) {
-               ntfs_debug("Done.");
-               return true;
-       }
-       /*
-        * We cannot use ntfs_attr_set() because we may be still in the middle
-        * of a mount operation.  Thus we do the emptying by hand by first
-        * zapping the page cache pages for the $LogFile/$DATA attribute and
-        * then emptying each of the buffers in each of the clusters specified
-        * by the runlist by hand.
-        */
-       block_size = sb->s_blocksize;
-       block_size_bits = sb->s_blocksize_bits;
-       vcn = 0;
-       read_lock_irqsave(&log_ni->size_lock, flags);
-       end_vcn = (log_ni->initialized_size + vol->cluster_size_mask) >>
-                       vol->cluster_size_bits;
-       read_unlock_irqrestore(&log_ni->size_lock, flags);
-       truncate_inode_pages(log_vi->i_mapping, 0);
-       down_write(&log_ni->runlist.lock);
-       rl = log_ni->runlist.rl;
-       if (unlikely(!rl || vcn < rl->vcn || !rl->length)) {
-map_vcn:
-               err = ntfs_map_runlist_nolock(log_ni, vcn, NULL);
-               if (err) {
-                       ntfs_error(sb, "Failed to map runlist fragment (error "
-                                       "%d).", -err);
-                       goto err;
-               }
-               rl = log_ni->runlist.rl;
-               BUG_ON(!rl || vcn < rl->vcn || !rl->length);
-       }
-       /* Seek to the runlist element containing @vcn. */
-       while (rl->length && vcn >= rl[1].vcn)
-               rl++;
-       do {
-               LCN lcn;
-               sector_t block, end_block;
-               s64 len;
-
-               /*
-                * If this run is not mapped map it now and start again as the
-                * runlist will have been updated.
-                */
-               lcn = rl->lcn;
-               if (unlikely(lcn == LCN_RL_NOT_MAPPED)) {
-                       vcn = rl->vcn;
-                       goto map_vcn;
-               }
-               /* If this run is not valid abort with an error. */
-               if (unlikely(!rl->length || lcn < LCN_HOLE))
-                       goto rl_err;
-               /* Skip holes. */
-               if (lcn == LCN_HOLE)
-                       continue;
-               block = lcn << vol->cluster_size_bits >> block_size_bits;
-               len = rl->length;
-               if (rl[1].vcn > end_vcn)
-                       len = end_vcn - rl->vcn;
-               end_block = (lcn + len) << vol->cluster_size_bits >>
-                               block_size_bits;
-               /* Iterate over the blocks in the run and empty them. */
-               do {
-                       struct buffer_head *bh;
-
-                       /* Obtain the buffer, possibly not uptodate. */
-                       bh = sb_getblk(sb, block);
-                       BUG_ON(!bh);
-                       /* Setup buffer i/o submission. */
-                       lock_buffer(bh);
-                       bh->b_end_io = end_buffer_write_sync;
-                       get_bh(bh);
-                       /* Set the entire contents of the buffer to 0xff. */
-                       memset(bh->b_data, -1, block_size);
-                       if (!buffer_uptodate(bh))
-                               set_buffer_uptodate(bh);
-                       if (buffer_dirty(bh))
-                               clear_buffer_dirty(bh);
-                       /*
-                        * Submit the buffer and wait for i/o to complete but
-                        * only for the first buffer so we do not miss really
-                        * serious i/o errors.  Once the first buffer has
-                        * completed ignore errors afterwards as we can assume
-                        * that if one buffer worked all of them will work.
-                        */
-                       submit_bh(REQ_OP_WRITE, bh);
-                       if (should_wait) {
-                               should_wait = false;
-                               wait_on_buffer(bh);
-                               if (unlikely(!buffer_uptodate(bh)))
-                                       goto io_err;
-                       }
-                       brelse(bh);
-               } while (++block < end_block);
-       } while ((++rl)->vcn < end_vcn);
-       up_write(&log_ni->runlist.lock);
-       /*
-        * Zap the pages again just in case any got instantiated whilst we were
-        * emptying the blocks by hand.  FIXME: We may not have completed
-        * writing to all the buffer heads yet so this may happen too early.
-        * We really should use a kernel thread to do the emptying
-        * asynchronously and then we can also set the volume dirty and output
-        * an error message if emptying should fail.
-        */
-       truncate_inode_pages(log_vi->i_mapping, 0);
-       /* Set the flag so we do not have to do it again on remount. */
-       NVolSetLogFileEmpty(vol);
-       ntfs_debug("Done.");
-       return true;
-io_err:
-       ntfs_error(sb, "Failed to write buffer.  Unmount and run chkdsk.");
-       goto dirty_err;
-rl_err:
-       ntfs_error(sb, "Runlist is corrupt.  Unmount and run chkdsk.");
-dirty_err:
-       NVolSetErrors(vol);
-       err = -EIO;
-err:
-       up_write(&log_ni->runlist.lock);
-       ntfs_error(sb, "Failed to fill $LogFile with 0xff bytes (error %d).",
-                       -err);
-       return false;
-}
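
Taken together, the routines in this file form a fixed mount-time
sequence: consistency-check the journal, decide whether the volume was
cleanly shut down, and only then reset it.  A condensed sketch of that
flow, error handling abbreviated:

    RESTART_PAGE_HEADER *rp;

    if (!ntfs_check_logfile(log_vi, &rp))
            return -EINVAL;               /* journal inconsistent */
    if (!ntfs_is_logfile_clean(log_vi, rp)) {
            ntfs_free(rp);
            return -EINVAL;               /* dirty, run chkdsk */
    }
    ntfs_free(rp);
    if (!ntfs_empty_logfile(log_vi))
            return -EIO;                  /* reset failed */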
-
-#endif /* NTFS_RW */
diff --git a/fs/ntfs/logfile.h b/fs/ntfs/logfile.h
deleted file mode 100644 (file)
index 429d490..0000000
+++ /dev/null
@@ -1,295 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * logfile.h - Defines for NTFS kernel journal ($LogFile) handling.  Part of
- *            the Linux-NTFS project.
- *
- * Copyright (c) 2000-2005 Anton Altaparmakov
- */
-
-#ifndef _LINUX_NTFS_LOGFILE_H
-#define _LINUX_NTFS_LOGFILE_H
-
-#ifdef NTFS_RW
-
-#include <linux/fs.h>
-
-#include "types.h"
-#include "endian.h"
-#include "layout.h"
-
-/*
- * Journal ($LogFile) organization:
- *
- * Two restart areas present in the first two pages (restart pages, one restart
- * area in each page).  When the volume is dismounted they should be identical,
- * except for the update sequence array which usually has a different update
- * sequence number.
- *
- * These are followed by log records organized in pages headed by a log record
- * header going up to log file size.  Not all pages contain log records when a
- * volume is first formatted, but as the volume ages, all records will be used.
- * When the log file fills up, the records at the beginning are purged (by
- * modifying the oldest_lsn to a higher value presumably) and writing begins
- * at the beginning of the file.  Effectively, the log file is viewed as a
- * circular entity.
- *
- * NOTE: Windows NT, 2000, and XP all use log file version 1.1 but they accept
- * versions <= 1.x, including 0.-1.  (Yes, that is a minus one in there!)  We
- * probably only want to support 1.1 as this seems to be the current version
- * and we don't know how that differs from the older versions.  The only
- * exception is if the journal is clean as marked by the two restart pages
- * then it doesn't matter whether we are on an earlier version.  We can just
- * reinitialize the logfile and start again with version 1.1.
- */
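
The version note above reduces to a single predicate: only 1.1 is fully understood, and any other version is tolerable exactly when the journal is clean, because a clean journal can simply be reinitialized as 1.1. A minimal userspace sketch of that rule (plain ints stand in for the sle16 version fields; the helper name is invented for illustration):

#include <stdbool.h>

/* Illustrative only: mirrors the version note above. */
static bool logfile_version_usable(int major, int minor, bool journal_clean)
{
	if (major == 1 && minor == 1)
		return true;		/* the only version we understand */
	return journal_clean;		/* anything else: ok iff reinitializable */
}
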
-
-/* Some $LogFile related constants. */
-#define MaxLogFileSize         0x100000000ULL
-#define DefaultLogPageSize     4096
-#define MinLogRecordPages      48
-
-/*
- * Log file restart page header (begins the restart area).
- */
-typedef struct {
-/*Ofs*/
-/*  0  NTFS_RECORD; -- Unfolded here as gcc doesn't like unnamed structs. */
-/*  0*/        NTFS_RECORD_TYPE magic; /* The magic is "RSTR". */
-/*  4*/        le16 usa_ofs;           /* See NTFS_RECORD definition in layout.h.
-                                  When creating, set this to be immediately
-                                  after this header structure (without any
-                                  alignment). */
-/*  6*/        le16 usa_count;         /* See NTFS_RECORD definition in layout.h. */
-
-/*  8*/        leLSN chkdsk_lsn;       /* The last log file sequence number found by
-                                  chkdsk.  Only used when the magic is changed
-                                  to "CHKD".  Otherwise this is zero. */
-/* 16*/        le32 system_page_size;  /* Byte size of system pages when the log file
-                                  was created, has to be >= 512 and a power of
-                                  2.  Use this to calculate the required size
-                                  of the usa (usa_count) and add it to usa_ofs.
-                                  Then verify that the result is less than the
-                                  value of the restart_area_offset. */
-/* 20*/        le32 log_page_size;     /* Byte size of log file pages, has to be >=
-                                  512 and a power of 2.  The default is 4096
-                                  and is used when the system page size is
-                                  between 4096 and 8192.  Otherwise this is
-                                  set to the system page size instead. */
-/* 24*/        le16 restart_area_offset;/* Byte offset from the start of this header to
-                                  the RESTART_AREA.  Value has to be aligned
-                                  to 8-byte boundary.  When creating, set this
-                                  to be after the usa. */
-/* 26*/        sle16 minor_ver;        /* Log file minor version.  Only check if major
-                                  version is 1. */
-/* 28*/        sle16 major_ver;        /* Log file major version.  We only support
-                                  version 1.1. */
-/* sizeof() = 30 (0x1e) bytes */
-} __attribute__ ((__packed__)) RESTART_PAGE_HEADER;
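
The field comments above spell out most of the consistency rules for a restart page header. A hedged userspace sketch of those checks, with the le16/le32 fields assumed already converted to CPU byte order; the usa_count formula (one slot per 512-byte stride plus one for the update sequence number itself) is an assumption drawn from the usual NTFS record layout rather than stated here:

#include <stdbool.h>
#include <stdint.h>

static bool is_pow2_ge_512(uint32_t x)
{
	return x >= 512 && (x & (x - 1)) == 0;
}

static bool restart_page_header_sane(uint32_t system_page_size,
		uint32_t log_page_size, uint16_t usa_ofs, uint16_t usa_count,
		uint16_t restart_area_offset)
{
	if (!is_pow2_ge_512(system_page_size) || !is_pow2_ge_512(log_page_size))
		return false;
	/* Assumed: one usa slot per 512-byte stride plus the usn itself. */
	if (usa_count != system_page_size / 512 + 1)
		return false;
	/* The usa must end before the restart area begins. */
	if (usa_ofs + usa_count * sizeof(uint16_t) > restart_area_offset)
		return false;
	/* The restart area offset must be 8-byte aligned. */
	return (restart_area_offset & 7) == 0;
}
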
-
-/*
- * Constant for the log client indices meaning that there are no client records
- * in this particular client array.  Also inside the client records themselves,
- * this means that there are no client records preceding or following this one.
- */
-#define LOGFILE_NO_CLIENT      cpu_to_le16(0xffff)
-#define LOGFILE_NO_CLIENT_CPU  0xffff
-
-/*
- * These are the so far known RESTART_AREA_* flags (16-bit) which contain
- * information about the log file in which they are present.
- */
-enum {
-       RESTART_VOLUME_IS_CLEAN = cpu_to_le16(0x0002),
-       RESTART_SPACE_FILLER    = cpu_to_le16(0xffff), /* gcc: Force enum bit width to 16. */
-} __attribute__ ((__packed__));
-
-typedef le16 RESTART_AREA_FLAGS;
-
-/*
- * Log file restart area record.  The offset of this record is found by adding
- * the offset of the RESTART_PAGE_HEADER to the restart_area_offset value found
- * in it.  See notes at restart_area_offset above.
- */
-typedef struct {
-/*Ofs*/
-/*  0*/        leLSN current_lsn;      /* The current, i.e. last LSN inside the log
-                                  when the restart area was last written.
-                                  This happens often but what is the interval?
-                                  Is it just fixed time or is it every time a
-                                  check point is written or something else?

-                                  On create set to 0. */
-/*  8*/        le16 log_clients;       /* Number of log client records in the array of
-                                  log client records which follows this
-                                  restart area.  Must be 1.  */
-/* 10*/        le16 client_free_list;  /* The index of the first free log client record
-                                  in the array of log client records.
-                                  LOGFILE_NO_CLIENT means that there are no
-                                  free log client records in the array.
-                                  If != LOGFILE_NO_CLIENT, check that
-                                  log_clients > client_free_list.  On Win2k
-                                  and presumably earlier, on a clean volume
-                                  this is != LOGFILE_NO_CLIENT, and it should
-                                  be 0, i.e. the first (and only) client
-                                  record is free and thus the logfile is
-                                  closed and hence clean.  A dirty volume
-                                  would have left the logfile open and hence
-                                  this would be LOGFILE_NO_CLIENT.  On WinXP
-                                  and presumably later, the logfile is always
-                                  open, even on clean shutdown so this should
-                                  always be LOGFILE_NO_CLIENT. */
-/* 12*/        le16 client_in_use_list;/* The index of the first in-use log client
-                                  record in the array of log client records.
-                                  LOGFILE_NO_CLIENT means that there are no
-                                  in-use log client records in the array.  If
-                                  != LOGFILE_NO_CLIENT check that log_clients
-                                  > client_in_use_list.  On Win2k and
-                                  presumably earlier, on a clean volume this
-                                  is LOGFILE_NO_CLIENT, i.e. there are no
-                                  client records in use and thus the logfile
-                                  is closed and hence clean.  A dirty volume
-                                  would have left the logfile open and hence
-                                  this would be != LOGFILE_NO_CLIENT, and it
-                                  should be 0, i.e. the first (and only)
-                                  client record is in use.  On WinXP and
-                                  presumably later, the logfile is always
-                                  open, even on clean shutdown so this should
-                                  always be 0. */
-/* 14*/        RESTART_AREA_FLAGS flags;/* Flags modifying LFS behaviour.  On Win2k
-                                  and presumably earlier this is always 0.  On
-                                  WinXP and presumably later, if the logfile
-                                  was shutdown cleanly, the second bit,
-                                  RESTART_VOLUME_IS_CLEAN, is set.  This bit
-                                  is cleared when the volume is mounted by
-                                  WinXP and set when the volume is dismounted,
-                                  thus if the logfile is dirty, this bit is
-                                  clear.  Thus we don't need to check the
-                                  Windows version to determine if the logfile
-                                  is clean.  Instead if the logfile is closed,
-                                  we know it must be clean.  If it is open and
-                                  this bit is set, we also know it must be
-                                  clean.  If on the other hand the logfile is
-                                  open and this bit is clear, we can be almost
-                                  certain that the logfile is dirty. */
-/* 16*/        le32 seq_number_bits;   /* How many bits to use for the sequence
-                                  number.  This is calculated as 67 - the
-                                  number of bits required to store the logfile
-                                  size in bytes and this can be used with
-                                  the specified file_size as a consistency
-                                  check. */
-/* 20*/        le16 restart_area_length;/* Length of the restart area including the
-                                  client array.  Following checks required if
-                                  version matches.  Otherwise, skip them.
-                                  restart_area_offset + restart_area_length
-                                  has to be <= system_page_size.  Also,
-                                  restart_area_length has to be >=
-                                  client_array_offset + (log_clients *
-                                  sizeof(log client record)). */
-/* 22*/        le16 client_array_offset;/* Offset from the start of this record to
-                                  the first log client record if versions are
-                                  matched.  When creating, set this to be
-                                  after this restart area structure, aligned
-                                  to an 8-byte boundary.  If the versions do not
-                                  match, this is ignored and the offset is
-                                  assumed to be (sizeof(RESTART_AREA) + 7) &
-                                  ~7, i.e. rounded up to first 8-byte
-                                  boundary.  Either way, client_array_offset
-                                  has to be aligned to an 8-byte boundary.
-                                  Also, restart_area_offset +
-                                  client_array_offset has to be <= 510.
-                                  Finally, client_array_offset + (log_clients
-                                  * sizeof(log client record)) has to be <=
-                                  system_page_size.  On Win2k and presumably
-                                  earlier, this is 0x30, i.e. immediately
-                                  following this record.  On WinXP and
-                                  presumably later, this is 0x40, i.e. there
-                                  are 16 extra bytes between this record and
-                                  the client array.  This probably means that
-                                  the RESTART_AREA record is actually bigger
-                                  in WinXP and later. */
-/* 24*/        sle64 file_size;        /* Usable byte size of the log file.  If the
-                                  restart_area_offset + the offset of the
-                                  file_size are > 510 then corruption has
-                                  occurred.  This is the very first check when
-                                  starting with the restart_area as if it
-                                  fails it means that some of the above values
-                                  will be corrupted by the multi sector
-                                  transfer protection.  The file_size has to
-                                  be rounded down to be a multiple of the
-                                  log_page_size in the RESTART_PAGE_HEADER and
-                                  then it has to be at least big enough to
-                                  store the two restart pages and 48 (0x30)
-                                  log record pages. */
-/* 32*/        le32 last_lsn_data_length;/* Length of data of last LSN, not including
-                                  the log record header.  On create set to
-                                  0. */
-/* 36*/        le16 log_record_header_length;/* Byte size of the log record header.
-                                  If the version matches then check that the
-                                  value of log_record_header_length is a
-                                  multiple of 8, i.e.
-                                  (log_record_header_length + 7) & ~7 ==
-                                  log_record_header_length.  When creating set
-                                  it to sizeof(LOG_RECORD_HEADER), aligned to
-                                  8 bytes. */
-/* 38*/        le16 log_page_data_offset;/* Offset to the start of data in a log record
-                                  page.  Must be a multiple of 8.  On create
-                                  set it to immediately after the update
-                                  sequence array of the log record page. */
-/* 40*/        le32 restart_log_open_count;/* A counter that gets incremented every
-                                  time the logfile is restarted which happens
-                                  at mount time when the logfile is opened.
-                                  When creating set to a random value.  Win2k
-                                  sets it to the low 32 bits of the current
-                                  system time in NTFS format (see time.h). */
-/* 44*/        le32 reserved;          /* Reserved/alignment to 8-byte boundary. */
-/* sizeof() = 48 (0x30) bytes */
-} __attribute__ ((__packed__)) RESTART_AREA;
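
Taken together, the client_free_list, client_in_use_list and flags comments above describe one decision: the journal is clean if it was closed (no client record in use), or if it was left open but the RESTART_VOLUME_IS_CLEAN bit is set (WinXP and later). A minimal userspace sketch of that decision, with the fields already byte-swapped to CPU order and the constants taken from their definitions above:

#include <stdbool.h>
#include <stdint.h>

#define NO_CLIENT_CPU		0xffff	/* LOGFILE_NO_CLIENT, CPU order */
#define VOLUME_IS_CLEAN		0x0002	/* RESTART_VOLUME_IS_CLEAN, CPU order */

static bool logfile_looks_clean(uint16_t client_in_use_list, uint16_t flags)
{
	/* Closed journal (Win2k style): no client in use, hence clean. */
	if (client_in_use_list == NO_CLIENT_CPU)
		return true;
	/* Open journal: clean only if WinXP+ set the clean bit on dismount. */
	return (flags & VOLUME_IS_CLEAN) != 0;
}
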
-
-/*
- * Log client record.  The offset of this record is found by adding the offset
- * of the RESTART_AREA to the client_array_offset value found in it.
- */
-typedef struct {
-/*Ofs*/
-/*  0*/        leLSN oldest_lsn;       /* Oldest LSN needed by this client.  On create
-                                  set to 0. */
-/*  8*/        leLSN client_restart_lsn;/* LSN at which this client needs to restart
-                                  the volume, i.e. the current position within
-                                  the log file.  At present, if clean this
-                                  should = current_lsn in restart area but it
-                                  probably also = current_lsn when dirty most
-                                  of the time.  At create set to 0. */
-/* 16*/        le16 prev_client;       /* The offset to the previous log client record
-                                  in the array of log client records.
-                                  LOGFILE_NO_CLIENT means there is no previous
-                                  client record, i.e. this is the first one.
-                                  This is always LOGFILE_NO_CLIENT. */
-/* 18*/        le16 next_client;       /* The offset to the next log client record in
-                                  the array of log client records.
-                                  LOGFILE_NO_CLIENT means there are no next
-                                  client records, i.e. this is the last one.
-                                  This is always LOGFILE_NO_CLIENT. */
-/* 20*/        le16 seq_number;        /* On Win2k and presumably earlier, this is set
-                                  to zero every time the logfile is restarted
-                                  and it is incremented when the logfile is
-                                  closed at dismount time.  Thus it is 0 when
-                                  dirty and 1 when clean.  On WinXP and
-                                  presumably later, this is always 0. */
-/* 22*/        u8 reserved[6];         /* Reserved/alignment. */
-/* 28*/        le32 client_name_length;/* Length of client name in bytes.  Should
-                                  always be 8. */
-/* 32*/        ntfschar client_name[64];/* Name of the client in Unicode.  Should
-                                  always be "NTFS" with the remaining bytes
-                                  set to 0. */
-/* sizeof() = 160 (0xa0) bytes */
-} __attribute__ ((__packed__)) LOG_CLIENT_RECORD;
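
Per the comments above, a well-formed log client record always carries the client name "NTFS" with client_name_length equal to 8 bytes (four UTF-16 code units). A small userspace sketch of that sanity check, assuming the ntfschar values have already been converted to CPU byte order:

#include <stdbool.h>
#include <stdint.h>

/* "NTFS" in UTF-16 code units, already in CPU byte order. */
static const uint16_t expected_name[4] = { 'N', 'T', 'F', 'S' };

static bool log_client_record_sane(uint32_t client_name_length,
		const uint16_t *client_name)
{
	int i;

	if (client_name_length != 8)	/* 4 characters * 2 bytes each */
		return false;
	for (i = 0; i < 4; i++)
		if (client_name[i] != expected_name[i])
			return false;
	return true;
}
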
-
-extern bool ntfs_check_logfile(struct inode *log_vi,
-               RESTART_PAGE_HEADER **rp);
-
-extern bool ntfs_is_logfile_clean(struct inode *log_vi,
-               const RESTART_PAGE_HEADER *rp);
-
-extern bool ntfs_empty_logfile(struct inode *log_vi);
-
-#endif /* NTFS_RW */
-
-#endif /* _LINUX_NTFS_LOGFILE_H */
diff --git a/fs/ntfs/malloc.h b/fs/ntfs/malloc.h
deleted file mode 100644 (file)
index 7068425..0000000
+++ /dev/null
@@ -1,77 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * malloc.h - NTFS kernel memory handling. Part of the Linux-NTFS project.
- *
- * Copyright (c) 2001-2005 Anton Altaparmakov
- */
-
-#ifndef _LINUX_NTFS_MALLOC_H
-#define _LINUX_NTFS_MALLOC_H
-
-#include <linux/vmalloc.h>
-#include <linux/slab.h>
-#include <linux/highmem.h>
-
-/**
- * __ntfs_malloc - allocate memory in multiples of pages
- * @size:      number of bytes to allocate
- * @gfp_mask:  extra flags for the allocator
- *
- * Internal function.  You probably want ntfs_malloc_nofs()...
- *
- * Allocates @size bytes of memory, rounded up to multiples of PAGE_SIZE and
- * returns a pointer to the allocated memory.
- *
- * If there was insufficient memory to complete the request, return NULL.
- * Depending on @gfp_mask the allocation may be guaranteed to succeed.
- */
-static inline void *__ntfs_malloc(unsigned long size, gfp_t gfp_mask)
-{
-       if (likely(size <= PAGE_SIZE)) {
-               BUG_ON(!size);
-               /* kmalloc() has per-CPU caches so is faster for now. */
-               return kmalloc(PAGE_SIZE, gfp_mask & ~__GFP_HIGHMEM);
-               /* return (void *)__get_free_page(gfp_mask); */
-       }
-       if (likely((size >> PAGE_SHIFT) < totalram_pages()))
-               return __vmalloc(size, gfp_mask);
-       return NULL;
-}
-
-/**
- * ntfs_malloc_nofs - allocate memory in multiples of pages
- * @size:      number of bytes to allocate
- *
- * Allocates @size bytes of memory, rounded up to multiples of PAGE_SIZE and
- * returns a pointer to the allocated memory.
- *
- * If there was insufficient memory to complete the request, return NULL.
- */
-static inline void *ntfs_malloc_nofs(unsigned long size)
-{
-       return __ntfs_malloc(size, GFP_NOFS | __GFP_HIGHMEM);
-}
-
-/**
- * ntfs_malloc_nofs_nofail - allocate memory in multiples of pages
- * @size:      number of bytes to allocate
- *
- * Allocates @size bytes of memory, rounded up to multiples of PAGE_SIZE and
- * returns a pointer to the allocated memory.
- *
- * This function guarantees that the allocation will succeed.  It will sleep
- * for as long as it takes to complete the allocation.
- *
- * NULL is still returned if @size exceeds the total amount of RAM, in which
- * case __ntfs_malloc() never attempts the allocation and __GFP_NOFAIL cannot
- * apply.
- */
-static inline void *ntfs_malloc_nofs_nofail(unsigned long size)
-{
-       return __ntfs_malloc(size, GFP_NOFS | __GFP_HIGHMEM | __GFP_NOFAIL);
-}
-
-static inline void ntfs_free(void *addr)
-{
-       kvfree(addr);
-}
-
-#endif /* _LINUX_NTFS_MALLOC_H */
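
A brief usage note on the helpers above: __ntfs_malloc() returns slab memory for requests up to a page and vmalloc()ed memory beyond that, so callers must always release through ntfs_free(), whose kvfree() copes with either origin. A hedged kernel-style fragment (illustrative only, not buildable outside the driver):

	u8 *buf;

	buf = ntfs_malloc_nofs(vol->mft_record_size);	/* may exceed PAGE_SIZE */
	if (!buf)
		return -ENOMEM;
	/* ... fill and use buf ... */
	ntfs_free(buf);		/* kvfree() frees kmalloc and vmalloc memory alike */
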
diff --git a/fs/ntfs/mft.c b/fs/ntfs/mft.c
deleted file mode 100644 (file)
index 6fd1dc4..0000000
+++ /dev/null
@@ -1,2907 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * mft.c - NTFS kernel mft record operations. Part of the Linux-NTFS project.
- *
- * Copyright (c) 2001-2012 Anton Altaparmakov and Tuxera Inc.
- * Copyright (c) 2002 Richard Russon
- */
-
-#include <linux/buffer_head.h>
-#include <linux/slab.h>
-#include <linux/swap.h>
-#include <linux/bio.h>
-
-#include "attrib.h"
-#include "aops.h"
-#include "bitmap.h"
-#include "debug.h"
-#include "dir.h"
-#include "lcnalloc.h"
-#include "malloc.h"
-#include "mft.h"
-#include "ntfs.h"
-
-#define MAX_BHS        (PAGE_SIZE / NTFS_BLOCK_SIZE)
-
-/**
- * map_mft_record_page - map the page in which a specific mft record resides
- * @ni:                ntfs inode whose mft record page to map
- *
- * This maps the page in which the mft record of the ntfs inode @ni is situated
- * and returns a pointer to the mft record within the mapped page.
- *
- * Return value needs to be checked with IS_ERR() and if that is true PTR_ERR()
- * contains the negative error code returned.
- */
-static inline MFT_RECORD *map_mft_record_page(ntfs_inode *ni)
-{
-       loff_t i_size;
-       ntfs_volume *vol = ni->vol;
-       struct inode *mft_vi = vol->mft_ino;
-       struct page *page;
-       unsigned long index, end_index;
-       unsigned ofs;
-
-       BUG_ON(ni->page);
-       /*
-        * The index into the page cache and the offset within the page cache
-        * page of the wanted mft record. FIXME: We need to check for
-        * overflowing the unsigned long, but I don't think we would ever get
-        * here if the volume was that big...
-        */
-       index = (u64)ni->mft_no << vol->mft_record_size_bits >>
-                       PAGE_SHIFT;
-       ofs = (ni->mft_no << vol->mft_record_size_bits) & ~PAGE_MASK;
-
-       i_size = i_size_read(mft_vi);
-       /* The maximum valid index into the page cache for $MFT's data. */
-       end_index = i_size >> PAGE_SHIFT;
-
-       /* If the wanted index is out of bounds the mft record doesn't exist. */
-       if (unlikely(index >= end_index)) {
-               if (index > end_index || (i_size & ~PAGE_MASK) < ofs +
-                               vol->mft_record_size) {
-                       page = ERR_PTR(-ENOENT);
-                       ntfs_error(vol->sb, "Attempt to read mft record 0x%lx, "
-                                       "which is beyond the end of the mft.  "
-                                       "This is probably a bug in the ntfs "
-                                       "driver.", ni->mft_no);
-                       goto err_out;
-               }
-       }
-       /* Read, map, and pin the page. */
-       page = ntfs_map_page(mft_vi->i_mapping, index);
-       if (!IS_ERR(page)) {
-               /* Catch multi sector transfer fixup errors. */
-               if (likely(ntfs_is_mft_recordp((le32*)(page_address(page) +
-                               ofs)))) {
-                       ni->page = page;
-                       ni->page_ofs = ofs;
-                       return page_address(page) + ofs;
-               }
-               ntfs_error(vol->sb, "Mft record 0x%lx is corrupt.  "
-                               "Run chkdsk.", ni->mft_no);
-               ntfs_unmap_page(page);
-               page = ERR_PTR(-EIO);
-               NVolSetErrors(vol);
-       }
-err_out:
-       ni->page = NULL;
-       ni->page_ofs = 0;
-       return (void*)page;
-}
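
The index/offset arithmetic above is easiest to see with concrete numbers. A self-contained sketch, assuming the common case of 1024-byte mft records (mft_record_size_bits = 10) and 4096-byte pages (PAGE_SHIFT = 12):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	unsigned mft_record_size_bits = 10;	/* 1024-byte mft records */
	unsigned page_shift = 12;		/* 4096-byte pages */
	uint64_t mft_no = 0x23;

	uint64_t index = (mft_no << mft_record_size_bits) >> page_shift;
	unsigned ofs = (unsigned)((mft_no << mft_record_size_bits) &
			((1u << page_shift) - 1));

	/* Record 0x23 lands in page cache page 8, 0xc00 bytes in. */
	printf("index=%llu ofs=0x%x\n", (unsigned long long)index, ofs);
	return 0;
}
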
-
-/**
- * map_mft_record - map, pin and lock an mft record
- * @ni:                ntfs inode whose MFT record to map
- *
- * First, take the mrec_lock mutex.  We might now be sleeping, while waiting
- * for the mutex if it was already locked by someone else.
- *
- * The page of the record is mapped using map_mft_record_page() before being
- * returned to the caller.
- *
- * This in turn uses ntfs_map_page() to get the page containing the wanted mft
- * record (it in turn calls read_cache_page() which reads it in from disk if
- * necessary, increments the use count on the page so that it cannot disappear
- * under us and returns a reference to the page cache page).
- *
- * If read_cache_page() invokes ntfs_readpage() to load the page from disk, it
- * sets PG_locked and clears PG_uptodate on the page. Once I/O has completed
- * and the post-read mst fixups on each mft record in the page have been
- * performed, the page gets PG_uptodate set and PG_locked cleared (this is done
- * in our asynchronous I/O completion handler end_buffer_read_mft_async()).
- * ntfs_map_page() waits for PG_locked to become clear and checks if
- * PG_uptodate is set and returns an error code if not. This provides
- * sufficient protection against races when reading/using the page.
- *
- * However there is the write mapping to think about. Doing the above described
- * checking here will be fine, because when initiating the write we will set
- * PG_locked and clear PG_uptodate making sure nobody is touching the page
- * contents. Doing the locking this way means that the commit to disk code in
- * the page cache code paths is automatically sufficiently locked with us as
- * we will not touch a page that has been locked or is not uptodate. The only
- * locking problem then is them locking the page while we are accessing it.
- *
- * So that code will end up having to own the mrec_lock of all mft
- * records/inodes present in the page before I/O can proceed. In that case we
- * wouldn't need to bother with PG_locked and PG_uptodate as nobody will be
- * accessing anything without owning the mrec_lock mutex.  But we do need to
- * use them because of the read_cache_page() invocation and the code becomes so
- * much simpler this way that it is well worth it.
- *
- * The mft record is now ours and we return a pointer to it. You need to check
- * the returned pointer with IS_ERR() and if that is true, PTR_ERR() will return
- * the error code.
- *
- * NOTE: Caller is responsible for setting the mft record dirty before calling
- * unmap_mft_record(). This is obviously only necessary if the caller really
- * modified the mft record...
- * Q: Do we want to recycle one of the VFS inode state bits instead?
- * A: No, the inode ones mean we want to change the mft record, not we want to
- * write it out.
- */
-MFT_RECORD *map_mft_record(ntfs_inode *ni)
-{
-       MFT_RECORD *m;
-
-       ntfs_debug("Entering for mft_no 0x%lx.", ni->mft_no);
-
-       /* Make sure the ntfs inode doesn't go away. */
-       atomic_inc(&ni->count);
-
-       /* Serialize access to this mft record. */
-       mutex_lock(&ni->mrec_lock);
-
-       m = map_mft_record_page(ni);
-       if (!IS_ERR(m))
-               return m;
-
-       mutex_unlock(&ni->mrec_lock);
-       atomic_dec(&ni->count);
-       ntfs_error(ni->vol->sb, "Failed with error code %ld.", -PTR_ERR(m));
-       return m;
-}
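
The NOTE above fixes the calling pattern for every writer: map, modify, mark dirty, then unmap, in that order. A hedged kernel-style fragment of a typical caller (illustrative only, not buildable outside the driver):

	MFT_RECORD *m;

	m = map_mft_record(ni);
	if (IS_ERR(m))
		return PTR_ERR(m);
	/* ... modify the mft record through m ... */
	mark_mft_record_dirty(ni);	/* must happen before the unmap */
	unmap_mft_record(ni);
	return 0;
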
-
-/**
- * unmap_mft_record_page - unmap the page in which a specific mft record resides
- * @ni:                ntfs inode whose mft record page to unmap
- *
- * This unmaps the page in which the mft record of the ntfs inode @ni is
- * situated and returns. This is a NOOP if highmem is not configured.
- *
- * The unmap happens via ntfs_unmap_page() which in turn decrements the use
- * count on the page thus releasing it from the pinned state.
- *
- * We do not actually unmap the page from memory of course, as that will be
- * done by the page cache code itself when memory pressure increases or
- * whatever.
- */
-static inline void unmap_mft_record_page(ntfs_inode *ni)
-{
-       BUG_ON(!ni->page);
-
-       // TODO: If dirty, blah...
-       ntfs_unmap_page(ni->page);
-       ni->page = NULL;
-       ni->page_ofs = 0;
-       return;
-}
-
-/**
- * unmap_mft_record - release a mapped mft record
- * @ni:                ntfs inode whose MFT record to unmap
- *
- * We release the page mapping and the mrec_lock mutex which unmaps the mft
- * record and releases it for others to get hold of. We also release the ntfs
- * inode by decrementing the ntfs inode reference count.
- *
- * NOTE: If caller has modified the mft record, it is imperative to set the mft
- * record dirty BEFORE calling unmap_mft_record().
- */
-void unmap_mft_record(ntfs_inode *ni)
-{
-       struct page *page = ni->page;
-
-       BUG_ON(!page);
-
-       ntfs_debug("Entering for mft_no 0x%lx.", ni->mft_no);
-
-       unmap_mft_record_page(ni);
-       mutex_unlock(&ni->mrec_lock);
-       atomic_dec(&ni->count);
-       /*
-        * If pure ntfs_inode, i.e. no vfs inode attached, we leave it to
-        * ntfs_clear_extent_inode() in the extent inode case, and to the
-        * caller in the non-extent, yet pure ntfs inode case, to do the actual
-        * tear down of all structures and freeing of all allocated memory.
-        */
-       return;
-}
-
-/**
- * map_extent_mft_record - load an extent inode and attach it to its base
- * @base_ni:   base ntfs inode
- * @mref:      mft reference of the extent inode to load
- * @ntfs_ino:  on successful return, pointer to the ntfs_inode structure
- *
- * Load the extent mft record @mref and attach it to its base inode @base_ni.
- * Return the mapped extent mft record if IS_ERR(result) is false.  Otherwise
- * PTR_ERR(result) gives the negative error code.
- *
- * On successful return, @ntfs_ino contains a pointer to the ntfs_inode
- * structure of the mapped extent inode.
- */
-MFT_RECORD *map_extent_mft_record(ntfs_inode *base_ni, MFT_REF mref,
-               ntfs_inode **ntfs_ino)
-{
-       MFT_RECORD *m;
-       ntfs_inode *ni = NULL;
-       ntfs_inode **extent_nis = NULL;
-       int i;
-       unsigned long mft_no = MREF(mref);
-       u16 seq_no = MSEQNO(mref);
-       bool destroy_ni = false;
-
-       ntfs_debug("Mapping extent mft record 0x%lx (base mft record 0x%lx).",
-                       mft_no, base_ni->mft_no);
-       /* Make sure the base ntfs inode doesn't go away. */
-       atomic_inc(&base_ni->count);
-       /*
-        * Check if this extent inode has already been added to the base inode,
-        * in which case just return it. If not found, add it to the base
-        * inode before returning it.
-        */
-       mutex_lock(&base_ni->extent_lock);
-       if (base_ni->nr_extents > 0) {
-               extent_nis = base_ni->ext.extent_ntfs_inos;
-               for (i = 0; i < base_ni->nr_extents; i++) {
-                       if (mft_no != extent_nis[i]->mft_no)
-                               continue;
-                       ni = extent_nis[i];
-                       /* Make sure the ntfs inode doesn't go away. */
-                       atomic_inc(&ni->count);
-                       break;
-               }
-       }
-       if (likely(ni != NULL)) {
-               mutex_unlock(&base_ni->extent_lock);
-               atomic_dec(&base_ni->count);
-               /* We found the record; just have to map and return it. */
-               m = map_mft_record(ni);
-               /* map_mft_record() has incremented this on success. */
-               atomic_dec(&ni->count);
-               if (!IS_ERR(m)) {
-                       /* Verify the sequence number. */
-                       if (likely(le16_to_cpu(m->sequence_number) == seq_no)) {
-                               ntfs_debug("Done 1.");
-                               *ntfs_ino = ni;
-                               return m;
-                       }
-                       unmap_mft_record(ni);
-                       ntfs_error(base_ni->vol->sb, "Found stale extent mft "
-                                       "reference! Corrupt filesystem. "
-                                       "Run chkdsk.");
-                       return ERR_PTR(-EIO);
-               }
-map_err_out:
-               ntfs_error(base_ni->vol->sb, "Failed to map extent "
-                               "mft record, error code %ld.", -PTR_ERR(m));
-               return m;
-       }
-       /* Record wasn't there. Get a new ntfs inode and initialize it. */
-       ni = ntfs_new_extent_inode(base_ni->vol->sb, mft_no);
-       if (unlikely(!ni)) {
-               mutex_unlock(&base_ni->extent_lock);
-               atomic_dec(&base_ni->count);
-               return ERR_PTR(-ENOMEM);
-       }
-       ni->vol = base_ni->vol;
-       ni->seq_no = seq_no;
-       ni->nr_extents = -1;
-       ni->ext.base_ntfs_ino = base_ni;
-       /* Now map the record. */
-       m = map_mft_record(ni);
-       if (IS_ERR(m)) {
-               mutex_unlock(&base_ni->extent_lock);
-               atomic_dec(&base_ni->count);
-               ntfs_clear_extent_inode(ni);
-               goto map_err_out;
-       }
-       /* Verify the sequence number if it is present. */
-       if (seq_no && (le16_to_cpu(m->sequence_number) != seq_no)) {
-               ntfs_error(base_ni->vol->sb, "Found stale extent mft "
-                               "reference! Corrupt filesystem. Run chkdsk.");
-               destroy_ni = true;
-               m = ERR_PTR(-EIO);
-               goto unm_err_out;
-       }
-       /* Attach extent inode to base inode, reallocating memory if needed. */
-       if (!(base_ni->nr_extents & 3)) {
-               ntfs_inode **tmp;
-               int new_size = (base_ni->nr_extents + 4) * sizeof(ntfs_inode *);
-
-               tmp = kmalloc(new_size, GFP_NOFS);
-               if (unlikely(!tmp)) {
-                       ntfs_error(base_ni->vol->sb, "Failed to allocate "
-                                       "internal buffer.");
-                       destroy_ni = true;
-                       m = ERR_PTR(-ENOMEM);
-                       goto unm_err_out;
-               }
-               if (base_ni->nr_extents) {
-                       BUG_ON(!base_ni->ext.extent_ntfs_inos);
-                       memcpy(tmp, base_ni->ext.extent_ntfs_inos, new_size -
-                                       4 * sizeof(ntfs_inode *));
-                       kfree(base_ni->ext.extent_ntfs_inos);
-               }
-               base_ni->ext.extent_ntfs_inos = tmp;
-       }
-       base_ni->ext.extent_ntfs_inos[base_ni->nr_extents++] = ni;
-       mutex_unlock(&base_ni->extent_lock);
-       atomic_dec(&base_ni->count);
-       ntfs_debug("Done 2.");
-       *ntfs_ino = ni;
-       return m;
-unm_err_out:
-       unmap_mft_record(ni);
-       mutex_unlock(&base_ni->extent_lock);
-       atomic_dec(&base_ni->count);
-       /*
-        * If the extent inode was not attached to the base inode we need to
-        * release it or we will leak memory.
-        */
-       if (destroy_ni)
-               ntfs_clear_extent_inode(ni);
-       return m;
-}
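
map_extent_mft_record() splits its @mref argument with the MREF() and MSEQNO() macros used above: an mft reference packs the 48-bit record number into the low bits and the 16-bit sequence number into the high bits of one 64-bit value. A self-contained sketch of that decomposition (the helper names and mask are assumptions mirroring the driver's layout.h conventions):

#include <stdint.h>
#include <stdio.h>

#define MFT_REF_MASK	0x0000ffffffffffffULL	/* low 48 bits: record number */

static uint64_t mref_mft_no(uint64_t mref) { return mref & MFT_REF_MASK; }
static uint16_t mref_seq_no(uint64_t mref) { return (uint16_t)(mref >> 48); }

int main(void)
{
	/* Pack sequence number 5 with record number 0x1234, then split. */
	uint64_t mref = ((uint64_t)5 << 48) | 0x1234;

	printf("mft_no=0x%llx seq_no=%u\n",
			(unsigned long long)mref_mft_no(mref),
			mref_seq_no(mref));
	return 0;
}
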
-
-#ifdef NTFS_RW
-
-/**
- * __mark_mft_record_dirty - set the mft record and the page containing it dirty
- * @ni:                ntfs inode describing the mapped mft record
- *
- * Internal function.  Users should call mark_mft_record_dirty() instead.
- *
- * Set the mapped (extent) mft record of the (base or extent) ntfs inode @ni,
- * as well as the page containing the mft record, dirty.  Also, mark the base
- * vfs inode dirty.  This ensures that any changes to the mft record are
- * written out to disk.
- *
- * NOTE:  We only set I_DIRTY_DATASYNC (and not I_DIRTY_PAGES)
- * on the base vfs inode, because even though file data may have been modified,
- * it is dirty in the inode meta data rather than the data page cache of the
- * inode, and thus there are no data pages that need writing out.  Therefore, a
- * full mark_inode_dirty() is overkill.  A mark_inode_dirty_sync(), on the
- * other hand, is not sufficient, because ->write_inode needs to be called even
- * in case of fdatasync. This needs to happen or the file data would not
- * necessarily hit the device synchronously, even though the vfs inode has the
- * O_SYNC flag set.  Also, I_DIRTY_DATASYNC simply "feels" better than just
- * I_DIRTY_SYNC, since the file data has not actually hit the block device yet,
- * which is not what I_DIRTY_SYNC on its own would suggest.
- */
-void __mark_mft_record_dirty(ntfs_inode *ni)
-{
-       ntfs_inode *base_ni;
-
-       ntfs_debug("Entering for inode 0x%lx.", ni->mft_no);
-       BUG_ON(NInoAttr(ni));
-       mark_ntfs_record_dirty(ni->page, ni->page_ofs);
-       /* Determine the base vfs inode and mark it dirty, too. */
-       mutex_lock(&ni->extent_lock);
-       if (likely(ni->nr_extents >= 0))
-               base_ni = ni;
-       else
-               base_ni = ni->ext.base_ntfs_ino;
-       mutex_unlock(&ni->extent_lock);
-       __mark_inode_dirty(VFS_I(base_ni), I_DIRTY_DATASYNC);
-}
-
-static const char *ntfs_please_email = "Please email "
-               "linux-ntfs-dev@lists.sourceforge.net and say that you saw "
-               "this message.  Thank you.";
-
-/**
- * ntfs_sync_mft_mirror_umount - synchronise an mft record to the mft mirror
- * @vol:       ntfs volume on which the mft record to synchronize resides
- * @mft_no:    mft record number of mft record to synchronize
- * @m:         mapped, mst protected (extent) mft record to synchronize
- *
- * Write the mapped, mst protected (extent) mft record @m with mft record
- * number @mft_no to the mft mirror ($MFTMirr) of the ntfs volume @vol,
- * bypassing the page cache and the $MFTMirr inode itself.
- *
- * This function is only for use at umount time when the mft mirror inode has
- * already been disposed off.  We BUG() if we are called while the mft mirror
- * already been disposed of.  We BUG() if we are called while the mft mirror
- *
- * On success return 0.  On error return -errno.
- *
- * NOTE:  This function is not implemented yet as I am not convinced it can
- * actually be triggered considering the sequence of commits we do in super.c::
- * ntfs_put_super().  But just in case we provide this place holder as the
- * alternative would be either to BUG() or to get a NULL pointer dereference
- * and Oops.
- */
-static int ntfs_sync_mft_mirror_umount(ntfs_volume *vol,
-               const unsigned long mft_no, MFT_RECORD *m)
-{
-       BUG_ON(vol->mftmirr_ino);
-       ntfs_error(vol->sb, "Umount time mft mirror syncing is not "
-                       "implemented yet.  %s", ntfs_please_email);
-       return -EOPNOTSUPP;
-}
-
-/**
- * ntfs_sync_mft_mirror - synchronize an mft record to the mft mirror
- * @vol:       ntfs volume on which the mft record to synchronize resides
- * @mft_no:    mft record number of mft record to synchronize
- * @m:         mapped, mst protected (extent) mft record to synchronize
- * @sync:      if true, wait for i/o completion
- *
- * Write the mapped, mst protected (extent) mft record @m with mft record
- * number @mft_no to the mft mirror ($MFTMirr) of the ntfs volume @vol.
- *
- * On success return 0.  On error return -errno and set the volume errors flag
- * in the ntfs volume @vol.
- *
- * NOTE:  We always perform synchronous i/o and ignore the @sync parameter.
- *
- * TODO:  If @sync is false, want to do truly asynchronous i/o, i.e. just
- * schedule i/o via ->writepage or do it via kntfsd or whatever.
- */
-int ntfs_sync_mft_mirror(ntfs_volume *vol, const unsigned long mft_no,
-               MFT_RECORD *m, int sync)
-{
-       struct page *page;
-       unsigned int blocksize = vol->sb->s_blocksize;
-       int max_bhs = vol->mft_record_size / blocksize;
-       struct buffer_head *bhs[MAX_BHS];
-       struct buffer_head *bh, *head;
-       u8 *kmirr;
-       runlist_element *rl;
-       unsigned int block_start, block_end, m_start, m_end, page_ofs;
-       int i_bhs, nr_bhs, err = 0;
-       unsigned char blocksize_bits = vol->sb->s_blocksize_bits;
-
-       ntfs_debug("Entering for inode 0x%lx.", mft_no);
-       BUG_ON(!max_bhs);
-       if (WARN_ON(max_bhs > MAX_BHS))
-               return -EINVAL;
-       if (unlikely(!vol->mftmirr_ino)) {
-               /* This could happen during umount... */
-               err = ntfs_sync_mft_mirror_umount(vol, mft_no, m);
-               if (likely(!err))
-                       return err;
-               goto err_out;
-       }
-       /* Get the page containing the mirror copy of the mft record @m. */
-       page = ntfs_map_page(vol->mftmirr_ino->i_mapping, mft_no >>
-                       (PAGE_SHIFT - vol->mft_record_size_bits));
-       if (IS_ERR(page)) {
-               ntfs_error(vol->sb, "Failed to map mft mirror page.");
-               err = PTR_ERR(page);
-               goto err_out;
-       }
-       lock_page(page);
-       BUG_ON(!PageUptodate(page));
-       ClearPageUptodate(page);
-       /* Offset of the mft mirror record inside the page. */
-       page_ofs = (mft_no << vol->mft_record_size_bits) & ~PAGE_MASK;
-       /* The address in the page of the mirror copy of the mft record @m. */
-       kmirr = page_address(page) + page_ofs;
-       /* Copy the mst protected mft record to the mirror. */
-       memcpy(kmirr, m, vol->mft_record_size);
-       /* Create uptodate buffers if not present. */
-       if (unlikely(!page_has_buffers(page))) {
-               struct buffer_head *tail;
-
-               bh = head = alloc_page_buffers(page, blocksize, true);
-               do {
-                       set_buffer_uptodate(bh);
-                       tail = bh;
-                       bh = bh->b_this_page;
-               } while (bh);
-               tail->b_this_page = head;
-               attach_page_private(page, head);
-       }
-       bh = head = page_buffers(page);
-       BUG_ON(!bh);
-       rl = NULL;
-       nr_bhs = 0;
-       block_start = 0;
-       m_start = kmirr - (u8*)page_address(page);
-       m_end = m_start + vol->mft_record_size;
-       do {
-               block_end = block_start + blocksize;
-               /* If the buffer is outside the mft record, skip it. */
-               if (block_end <= m_start)
-                       continue;
-               if (unlikely(block_start >= m_end))
-                       break;
-               /* Need to map the buffer if it is not mapped already. */
-               if (unlikely(!buffer_mapped(bh))) {
-                       VCN vcn;
-                       LCN lcn;
-                       unsigned int vcn_ofs;
-
-                       bh->b_bdev = vol->sb->s_bdev;
-                       /* Obtain the vcn and offset of the current block. */
-                       vcn = ((VCN)mft_no << vol->mft_record_size_bits) +
-                                       (block_start - m_start);
-                       vcn_ofs = vcn & vol->cluster_size_mask;
-                       vcn >>= vol->cluster_size_bits;
-                       if (!rl) {
-                               down_read(&NTFS_I(vol->mftmirr_ino)->
-                                               runlist.lock);
-                               rl = NTFS_I(vol->mftmirr_ino)->runlist.rl;
-                               /*
-                                * $MFTMirr always has the whole of its runlist
-                                * in memory.
-                                */
-                               BUG_ON(!rl);
-                       }
-                       /* Seek to element containing target vcn. */
-                       while (rl->length && rl[1].vcn <= vcn)
-                               rl++;
-                       lcn = ntfs_rl_vcn_to_lcn(rl, vcn);
-                       /* For $MFTMirr, only lcn >= 0 is a successful remap. */
-                       if (likely(lcn >= 0)) {
-                               /* Setup buffer head to correct block. */
-                               bh->b_blocknr = ((lcn <<
-                                               vol->cluster_size_bits) +
-                                               vcn_ofs) >> blocksize_bits;
-                               set_buffer_mapped(bh);
-                       } else {
-                               bh->b_blocknr = -1;
-                               ntfs_error(vol->sb, "Cannot write mft mirror "
-                                               "record 0x%lx because its "
-                                               "location on disk could not "
-                                               "be determined (error code "
-                                               "%lli).", mft_no,
-                                               (long long)lcn);
-                               err = -EIO;
-                       }
-               }
-               BUG_ON(!buffer_uptodate(bh));
-               BUG_ON(!nr_bhs && (m_start != block_start));
-               BUG_ON(nr_bhs >= max_bhs);
-               bhs[nr_bhs++] = bh;
-               BUG_ON((nr_bhs >= max_bhs) && (m_end != block_end));
-       } while (block_start = block_end, (bh = bh->b_this_page) != head);
-       if (unlikely(rl))
-               up_read(&NTFS_I(vol->mftmirr_ino)->runlist.lock);
-       if (likely(!err)) {
-               /* Lock buffers and start synchronous write i/o on them. */
-               for (i_bhs = 0; i_bhs < nr_bhs; i_bhs++) {
-                       struct buffer_head *tbh = bhs[i_bhs];
-
-                       if (!trylock_buffer(tbh))
-                               BUG();
-                       BUG_ON(!buffer_uptodate(tbh));
-                       clear_buffer_dirty(tbh);
-                       get_bh(tbh);
-                       tbh->b_end_io = end_buffer_write_sync;
-                       submit_bh(REQ_OP_WRITE, tbh);
-               }
-               /* Wait on i/o completion of buffers. */
-               for (i_bhs = 0; i_bhs < nr_bhs; i_bhs++) {
-                       struct buffer_head *tbh = bhs[i_bhs];
-
-                       wait_on_buffer(tbh);
-                       if (unlikely(!buffer_uptodate(tbh))) {
-                               err = -EIO;
-                               /*
-                                * Set the buffer uptodate so the page and
-                                * buffer states do not become out of sync.
-                                */
-                               set_buffer_uptodate(tbh);
-                       }
-               }
-       } else /* if (unlikely(err)) */ {
-               /* Clean the buffers. */
-               for (i_bhs = 0; i_bhs < nr_bhs; i_bhs++)
-                       clear_buffer_dirty(bhs[i_bhs]);
-       }
-       /* Current state: all buffers are clean, unlocked, and uptodate. */
-       /* Remove the mst protection fixups again. */
-       post_write_mst_fixup((NTFS_RECORD*)kmirr);
-       flush_dcache_page(page);
-       SetPageUptodate(page);
-       unlock_page(page);
-       ntfs_unmap_page(page);
-       if (likely(!err)) {
-               ntfs_debug("Done.");
-       } else {
-               ntfs_error(vol->sb, "I/O error while writing mft mirror "
-                               "record 0x%lx!", mft_no);
-err_out:
-               ntfs_error(vol->sb, "Failed to synchronize $MFTMirr (error "
-                               "code %i).  Volume will be left marked dirty "
-                               "on umount.  Run ntfsfix on the partition "
-                               "after umounting to correct this.", -err);
-               NVolSetErrors(vol);
-       }
-       return err;
-}
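
The buffer-mapping loop above performs the driver's standard vcn-to-device-block translation (the same arithmetic reappears in write_mft_record_nolock() below). A self-contained sketch of the math, assuming 4096-byte clusters, 1024-byte mft records and 512-byte device blocks, with a fixed lcn standing in for the runlist lookup:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	unsigned mft_record_size_bits = 10;	/* 1024-byte mft records */
	unsigned cluster_size_bits = 12;	/* 4096-byte clusters */
	unsigned blocksize_bits = 9;		/* 512-byte device blocks */
	uint64_t mft_no = 0x23, block_start = 512, m_start = 0;

	/* Byte offset of the block inside $MFT's data, then split into
	 * a virtual cluster number and an offset within that cluster. */
	uint64_t byte_ofs = (mft_no << mft_record_size_bits) +
			(block_start - m_start);
	unsigned vcn_ofs = (unsigned)(byte_ofs &
			((1u << cluster_size_bits) - 1));
	uint64_t vcn = byte_ofs >> cluster_size_bits;

	/* Stand-in for the runlist lookup ntfs_rl_vcn_to_lcn(rl, vcn). */
	int64_t lcn = 100;

	/* Logical cluster back to a device block number for bh->b_blocknr. */
	uint64_t blocknr = (((uint64_t)lcn << cluster_size_bits) + vcn_ofs)
			>> blocksize_bits;

	printf("vcn=%llu vcn_ofs=0x%x blocknr=%llu\n",
			(unsigned long long)vcn, vcn_ofs,
			(unsigned long long)blocknr);
	return 0;
}
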
-
-/**
- * write_mft_record_nolock - write out a mapped (extent) mft record
- * @ni:                ntfs inode describing the mapped (extent) mft record
- * @m:         mapped (extent) mft record to write
- * @sync:      if true, wait for i/o completion
- *
- * Write the mapped (extent) mft record @m described by the (regular or extent)
- * ntfs inode @ni to backing store.  If the mft record @m has a counterpart in
- * the mft mirror, that is also updated.
- *
- * We only write the mft record if the ntfs inode @ni is dirty and the first
- * buffer belonging to its mft record is dirty, too.  We ignore the dirty state
- * of subsequent buffers because we could have raced with
- * fs/ntfs/aops.c::mark_ntfs_record_dirty().
- *
- * On success, clean the mft record and return 0.  On error, leave the mft
- * record dirty and return -errno.
- *
- * NOTE:  We always perform synchronous i/o and ignore the @sync parameter.
- * However, if the mft record has a counterpart in the mft mirror and @sync is
- * true, we write the mft record, wait for i/o completion, and only then write
- * the mft mirror copy.  This ensures that if the system crashes either the mft
- * or the mft mirror will contain a self-consistent mft record @m.  If @sync is
- * false on the other hand, we start i/o on both and then wait for completion
- * on them.  This provides a speedup but no longer guarantees that you will end
- * up with a self-consistent mft record in the case of a crash but if you asked
- * for asynchronous writing you probably do not care about that anyway.
- *
- * TODO:  If @sync is false, want to do truly asynchronous i/o, i.e. just
- * schedule i/o via ->writepage or do it via kntfsd or whatever.
- */
-int write_mft_record_nolock(ntfs_inode *ni, MFT_RECORD *m, int sync)
-{
-       ntfs_volume *vol = ni->vol;
-       struct page *page = ni->page;
-       unsigned int blocksize = vol->sb->s_blocksize;
-       unsigned char blocksize_bits = vol->sb->s_blocksize_bits;
-       int max_bhs = vol->mft_record_size / blocksize;
-       struct buffer_head *bhs[MAX_BHS];
-       struct buffer_head *bh, *head;
-       runlist_element *rl;
-       unsigned int block_start, block_end, m_start, m_end;
-       int i_bhs, nr_bhs, err = 0;
-
-       ntfs_debug("Entering for inode 0x%lx.", ni->mft_no);
-       BUG_ON(NInoAttr(ni));
-       BUG_ON(!max_bhs);
-       BUG_ON(!PageLocked(page));
-       if (WARN_ON(max_bhs > MAX_BHS)) {
-               err = -EINVAL;
-               goto err_out;
-       }
-       /*
-        * If the ntfs_inode is clean no need to do anything.  If it is dirty,
-        * mark it as clean now so that it can be redirtied later on if needed.
-        * There is no danger of races since the caller is holding the locks
-        * for the mft record @m and the page it is in.
-        */
-       if (!NInoTestClearDirty(ni))
-               goto done;
-       bh = head = page_buffers(page);
-       BUG_ON(!bh);
-       rl = NULL;
-       nr_bhs = 0;
-       block_start = 0;
-       m_start = ni->page_ofs;
-       m_end = m_start + vol->mft_record_size;
-       do {
-               block_end = block_start + blocksize;
-               /* If the buffer is outside the mft record, skip it. */
-               if (block_end <= m_start)
-                       continue;
-               if (unlikely(block_start >= m_end))
-                       break;
-               /*
-                * If this block is not the first one in the record, we ignore
-                * the buffer's dirty state because we could have raced with a
-                * parallel mark_ntfs_record_dirty().
-                */
-               if (block_start == m_start) {
-                       /* This block is the first one in the record. */
-                       if (!buffer_dirty(bh)) {
-                               BUG_ON(nr_bhs);
-                               /* Clean records are not written out. */
-                               break;
-                       }
-               }
-               /* Need to map the buffer if it is not mapped already. */
-               if (unlikely(!buffer_mapped(bh))) {
-                       VCN vcn;
-                       LCN lcn;
-                       unsigned int vcn_ofs;
-
-                       bh->b_bdev = vol->sb->s_bdev;
-                       /* Obtain the vcn and offset of the current block. */
-                       vcn = ((VCN)ni->mft_no << vol->mft_record_size_bits) +
-                                       (block_start - m_start);
-                       vcn_ofs = vcn & vol->cluster_size_mask;
-                       vcn >>= vol->cluster_size_bits;
-                       if (!rl) {
-                               down_read(&NTFS_I(vol->mft_ino)->runlist.lock);
-                               rl = NTFS_I(vol->mft_ino)->runlist.rl;
-                               BUG_ON(!rl);
-                       }
-                       /* Seek to element containing target vcn. */
-                       while (rl->length && rl[1].vcn <= vcn)
-                               rl++;
-                       lcn = ntfs_rl_vcn_to_lcn(rl, vcn);
-                       /* For $MFT, only lcn >= 0 is a successful remap. */
-                       if (likely(lcn >= 0)) {
-                               /* Setup buffer head to correct block. */
-                               bh->b_blocknr = ((lcn <<
-                                               vol->cluster_size_bits) +
-                                               vcn_ofs) >> blocksize_bits;
-                               set_buffer_mapped(bh);
-                       } else {
-                               bh->b_blocknr = -1;
-                               ntfs_error(vol->sb, "Cannot write mft record "
-                                               "0x%lx because its location "
-                                               "on disk could not be "
-                                               "determined (error code %lli).",
-                                               ni->mft_no, (long long)lcn);
-                               err = -EIO;
-                       }
-               }
-               BUG_ON(!buffer_uptodate(bh));
-               BUG_ON(!nr_bhs && (m_start != block_start));
-               BUG_ON(nr_bhs >= max_bhs);
-               bhs[nr_bhs++] = bh;
-               BUG_ON((nr_bhs >= max_bhs) && (m_end != block_end));
-       } while (block_start = block_end, (bh = bh->b_this_page) != head);
-       if (unlikely(rl))
-               up_read(&NTFS_I(vol->mft_ino)->runlist.lock);
-       if (!nr_bhs)
-               goto done;
-       if (unlikely(err))
-               goto cleanup_out;
-       /* Apply the mst protection fixups. */
-       err = pre_write_mst_fixup((NTFS_RECORD*)m, vol->mft_record_size);
-       if (err) {
-               ntfs_error(vol->sb, "Failed to apply mst fixups!");
-               goto cleanup_out;
-       }
-       flush_dcache_mft_record_page(ni);
-       /* Lock buffers and start synchronous write i/o on them. */
-       for (i_bhs = 0; i_bhs < nr_bhs; i_bhs++) {
-               struct buffer_head *tbh = bhs[i_bhs];
-
-               if (!trylock_buffer(tbh))
-                       BUG();
-               BUG_ON(!buffer_uptodate(tbh));
-               clear_buffer_dirty(tbh);
-               get_bh(tbh);
-               tbh->b_end_io = end_buffer_write_sync;
-               submit_bh(REQ_OP_WRITE, tbh);
-       }
-       /* Synchronize the mft mirror now if not @sync. */
-       if (!sync && ni->mft_no < vol->mftmirr_size)
-               ntfs_sync_mft_mirror(vol, ni->mft_no, m, sync);
-       /* Wait on i/o completion of buffers. */
-       for (i_bhs = 0; i_bhs < nr_bhs; i_bhs++) {
-               struct buffer_head *tbh = bhs[i_bhs];
-
-               wait_on_buffer(tbh);
-               if (unlikely(!buffer_uptodate(tbh))) {
-                       err = -EIO;
-                       /*
-                        * Set the buffer uptodate so the page and buffer
-                        * states do not become out of sync.
-                        */
-                       if (PageUptodate(page))
-                               set_buffer_uptodate(tbh);
-               }
-       }
-       /* If @sync, now synchronize the mft mirror. */
-       if (sync && ni->mft_no < vol->mftmirr_size)
-               ntfs_sync_mft_mirror(vol, ni->mft_no, m, sync);
-       /* Remove the mst protection fixups again. */
-       post_write_mst_fixup((NTFS_RECORD*)m);
-       flush_dcache_mft_record_page(ni);
-       if (unlikely(err)) {
-               /* I/O error during writing.  This is really bad! */
-               ntfs_error(vol->sb, "I/O error while writing mft record "
-                               "0x%lx!  Marking base inode as bad.  You "
-                               "should unmount the volume and run chkdsk.",
-                               ni->mft_no);
-               goto err_out;
-       }
-done:
-       ntfs_debug("Done.");
-       return 0;
-cleanup_out:
-       /* Clean the buffers. */
-       for (i_bhs = 0; i_bhs < nr_bhs; i_bhs++)
-               clear_buffer_dirty(bhs[i_bhs]);
-err_out:
-       /*
-        * Current state: all buffers are clean, unlocked, and uptodate.
-        * The caller should mark the base inode as bad so that no more i/o
-        * happens.  ->clear_inode() will still be invoked so all extent inodes
-        * and other allocated memory will be freed.
-        */
-       if (err == -ENOMEM) {
-               ntfs_error(vol->sb, "Not enough memory to write mft record.  "
-                               "Redirtying so the write is retried later.");
-               mark_mft_record_dirty(ni);
-               err = 0;
-       } else
-               NVolSetErrors(vol);
-       return err;
-}
-
-/**
- * ntfs_may_write_mft_record - check if an mft record may be written out
- * @vol:       [IN]  ntfs volume on which the mft record to check resides
- * @mft_no:    [IN]  mft record number of the mft record to check
- * @m:         [IN]  mapped mft record to check
- * @locked_ni: [OUT] caller has to unlock this ntfs inode if one is returned
- *
- * Check if the mapped (base or extent) mft record @m with mft record number
- * @mft_no belonging to the ntfs volume @vol may be written out.  If necessary
- * and possible the ntfs inode of the mft record is locked and the base vfs
- * inode is pinned.  The locked ntfs inode is then returned in @locked_ni.  The
- * caller is responsible for unlocking the ntfs inode and unpinning the base
- * vfs inode.
- *
- * Return 'true' if the mft record may be written out and 'false' if not.
- *
- * The caller has locked the page and cleared the uptodate flag on it, which
- * means that we can safely write out any dirty mft records that do not have
- * their inodes in icache, as determined by ilookup5(): anyone
- * opening/creating such an inode would block when attempting to map the mft
- * record in read_cache_page() until we are finished with the write out.
- *
- * Here is a description of the tests we perform:
- *
- * If the inode is found in icache we know the mft record must be a base mft
- * record.  If it is dirty, we do not write it and return 'false' as the vfs
- * inode write paths will result in the access times being updated which would
- * cause the base mft record to be redirtied and written out again.  (We know
- * the access time update will modify the base mft record because Windows
- * chkdsk complains if the standard information attribute is not in the base
- * mft record.)
- *
- * If the inode is in icache and not dirty, we attempt to lock the mft record
- * and if we find the lock was already taken, it is not safe to write the mft
- * record and we return 'false'.
- *
- * If we manage to obtain the lock we have exclusive access to the mft record,
- * which also allows us safe writeout of the mft record.  We then set
- * @locked_ni to the locked ntfs inode and return 'true'.
- *
- * Note we cannot just lock the mft record and sleep while waiting for the lock
- * because this would deadlock due to lock reversal (normally the mft record is
- * locked before the page is locked but we already have the page locked here
- * when we try to lock the mft record).
- *
- * If the inode is not in icache we need to perform further checks.
- *
- * If the mft record is not a FILE record or it is a base mft record, we can
- * safely write it and return 'true'.
- *
- * We now know the mft record is an extent mft record.  We check if the inode
- * corresponding to its base mft record is in icache and obtain a reference to
- * it if it is.  If it is not, we can safely write it and return 'true'.
- *
- * We now have the base inode for the extent mft record.  We check if it has an
- * ntfs inode for the extent mft record attached and if not it is safe to write
- * the extent mft record and we return 'true'.
- *
- * The ntfs inode for the extent mft record is attached to the base inode so we
- * attempt to lock the extent mft record and if we find the lock was already
- * taken, it is not safe to write the extent mft record and we return 'false'.
- *
- * If we manage to obtain the lock we have exclusive access to the extent mft
- * record, which also allows us safe writeout of the extent mft record.  We
- * set the ntfs inode of the extent mft record clean and then set @locked_ni to
- * the now locked ntfs inode and return 'true'.
- *
- * Note, the reason for actually writing dirty mft records here and not just
- * relying on the vfs inode dirty code paths is that we can have mft records
- * modified without them ever having actual inodes in memory.  Also we can have
- * dirty mft records with clean ntfs inodes in memory.  None of the described
- * cases would result in the dirty mft records being written out if we only
- * relied on the vfs inode dirty code paths.  And these cases can really occur
- * during allocation of new mft records and in particular when the
- * initialized_size of the $MFT/$DATA attribute is extended and the new space
- * is initialized using ntfs_mft_record_format().  The clean inode can then
- * appear if the mft record is reused for a new inode before it got written
- * out.
- */
-bool ntfs_may_write_mft_record(ntfs_volume *vol, const unsigned long mft_no,
-               const MFT_RECORD *m, ntfs_inode **locked_ni)
-{
-       struct super_block *sb = vol->sb;
-       struct inode *mft_vi = vol->mft_ino;
-       struct inode *vi;
-       ntfs_inode *ni, *eni, **extent_nis;
-       int i;
-       ntfs_attr na;
-
-       ntfs_debug("Entering for inode 0x%lx.", mft_no);
-       /*
-        * Normally we do not return a locked inode so set @locked_ni to NULL.
-        */
-       BUG_ON(!locked_ni);
-       *locked_ni = NULL;
-       /*
-        * Check if the inode corresponding to this mft record is in the VFS
-        * inode cache and obtain a reference to it if it is.
-        */
-       ntfs_debug("Looking for inode 0x%lx in icache.", mft_no);
-       na.mft_no = mft_no;
-       na.name = NULL;
-       na.name_len = 0;
-       na.type = AT_UNUSED;
-       /*
-        * Optimize inode 0, i.e. $MFT itself, since we have it in memory and
-        * we get here for it rather often.
-        */
-       if (!mft_no) {
-               /* Balance the below iput(). */
-               vi = igrab(mft_vi);
-               BUG_ON(vi != mft_vi);
-       } else {
-               /*
-                * Have to use ilookup5_nowait() since ilookup5() waits for
-                * the inode lock, which causes ntfs to deadlock when writing
-                * $MFT: a concurrent inode write via the inode dirty code
-                * paths can race with the page dirty code path.
-                */
-               vi = ilookup5_nowait(sb, mft_no, ntfs_test_inode, &na);
-       }
-       if (vi) {
-               ntfs_debug("Base inode 0x%lx is in icache.", mft_no);
-               /* The inode is in icache. */
-               ni = NTFS_I(vi);
-               /* Take a reference to the ntfs inode. */
-               atomic_inc(&ni->count);
-               /* If the inode is dirty, do not write this record. */
-               if (NInoDirty(ni)) {
-                       ntfs_debug("Inode 0x%lx is dirty, do not write it.",
-                                       mft_no);
-                       atomic_dec(&ni->count);
-                       iput(vi);
-                       return false;
-               }
-               ntfs_debug("Inode 0x%lx is not dirty.", mft_no);
-               /* The inode is not dirty, try to take the mft record lock. */
-               if (unlikely(!mutex_trylock(&ni->mrec_lock))) {
-                       ntfs_debug("Mft record 0x%lx is already locked, do "
-                                       "not write it.", mft_no);
-                       atomic_dec(&ni->count);
-                       iput(vi);
-                       return false;
-               }
-               ntfs_debug("Managed to lock mft record 0x%lx, write it.",
-                               mft_no);
-               /*
-                * The write has to occur while we hold the mft record lock so
-                * return the locked ntfs inode.
-                */
-               *locked_ni = ni;
-               return true;
-       }
-       ntfs_debug("Inode 0x%lx is not in icache.", mft_no);
-       /* The inode is not in icache. */
-       /* Write the record if it is not a mft record (type "FILE"). */
-       if (!ntfs_is_mft_record(m->magic)) {
-               ntfs_debug("Mft record 0x%lx is not a FILE record, write it.",
-                               mft_no);
-               return true;
-       }
-       /* Write the mft record if it is a base inode. */
-       if (!m->base_mft_record) {
-               ntfs_debug("Mft record 0x%lx is a base record, write it.",
-                               mft_no);
-               return true;
-       }
-       /*
-        * This is an extent mft record.  Check if the inode corresponding to
-        * its base mft record is in icache and obtain a reference to it if it
-        * is.
-        */
-       na.mft_no = MREF_LE(m->base_mft_record);
-       ntfs_debug("Mft record 0x%lx is an extent record.  Looking for base "
-                       "inode 0x%lx in icache.", mft_no, na.mft_no);
-       if (!na.mft_no) {
-               /* Balance the below iput(). */
-               vi = igrab(mft_vi);
-               BUG_ON(vi != mft_vi);
-       } else
-               vi = ilookup5_nowait(sb, na.mft_no, ntfs_test_inode,
-                               &na);
-       if (!vi) {
-               /*
-                * The base inode is not in icache, write this extent mft
-                * record.
-                */
-               ntfs_debug("Base inode 0x%lx is not in icache, write the "
-                               "extent record.", na.mft_no);
-               return true;
-       }
-       ntfs_debug("Base inode 0x%lx is in icache.", na.mft_no);
-       /*
-        * The base inode is in icache.  Check if it has the extent inode
-        * corresponding to this extent mft record attached.
-        */
-       ni = NTFS_I(vi);
-       mutex_lock(&ni->extent_lock);
-       if (ni->nr_extents <= 0) {
-               /*
-                * The base inode has no attached extent inodes, write this
-                * extent mft record.
-                */
-               mutex_unlock(&ni->extent_lock);
-               iput(vi);
-               ntfs_debug("Base inode 0x%lx has no attached extent inodes, "
-                               "write the extent record.", na.mft_no);
-               return true;
-       }
-       /* Iterate over the attached extent inodes. */
-       extent_nis = ni->ext.extent_ntfs_inos;
-       for (eni = NULL, i = 0; i < ni->nr_extents; ++i) {
-               if (mft_no == extent_nis[i]->mft_no) {
-                       /*
-                        * Found the extent inode corresponding to this extent
-                        * mft record.
-                        */
-                       eni = extent_nis[i];
-                       break;
-               }
-       }
-       /*
-        * If the extent inode was not attached to the base inode, write this
-        * extent mft record.
-        */
-       if (!eni) {
-               mutex_unlock(&ni->extent_lock);
-               iput(vi);
-               ntfs_debug("Extent inode 0x%lx is not attached to its base "
-                               "inode 0x%lx, write the extent record.",
-                               mft_no, na.mft_no);
-               return true;
-       }
-       ntfs_debug("Extent inode 0x%lx is attached to its base inode 0x%lx.",
-                       mft_no, na.mft_no);
-       /* Take a reference to the extent ntfs inode. */
-       atomic_inc(&eni->count);
-       mutex_unlock(&ni->extent_lock);
-       /*
-        * Found the extent inode corresponding to this extent mft record.
-        * Try to take the mft record lock.
-        */
-       if (unlikely(!mutex_trylock(&eni->mrec_lock))) {
-               atomic_dec(&eni->count);
-               iput(vi);
-               ntfs_debug("Extent mft record 0x%lx is already locked, do "
-                               "not write it.", mft_no);
-               return false;
-       }
-       ntfs_debug("Managed to lock extent mft record 0x%lx, write it.",
-                       mft_no);
-       if (NInoTestClearDirty(eni))
-               ntfs_debug("Extent inode 0x%lx is dirty, marking it clean.",
-                               mft_no);
-       /*
-        * The write has to occur while we hold the mft record lock so return
-        * the locked extent ntfs inode.
-        */
-       *locked_ni = eni;
-       return true;
-}
-
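The trylock rule the comment above spells out (the page lock is already held, while the normal order takes the mft record lock first) is a generic lock-inversion defence. A minimal pthreads sketch, with struct record and may_write_record() as invented names:

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* hypothetical record type; the mutex plays the role of mrec_lock */
struct record {
	pthread_mutex_t lock;
	int data;
};

/*
 * Called with the "page lock" already held, i.e. in the reverse of the
 * usual lock order, so blocking on the record lock could deadlock:
 * only a trylock is safe, and on contention we simply report that the
 * record may not be written right now.
 */
static bool may_write_record(struct record *r)
{
	if (pthread_mutex_trylock(&r->lock) != 0)
		return false;	/* contended: skip, retry later */
	/* ... write the record while holding the lock ... */
	pthread_mutex_unlock(&r->lock);
	return true;
}

int main(void)
{
	struct record r = { .lock = PTHREAD_MUTEX_INITIALIZER, .data = 0 };

	printf("may write: %d\n", may_write_record(&r));
	return 0;
}
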
-static const char *es = "  Leaving inconsistent metadata.  Unmount and run "
-               "chkdsk.";
-
-/**
- * ntfs_mft_bitmap_find_and_alloc_free_rec_nolock - find and allocate a free mft record
- * @vol:       volume on which to search for a free mft record
- * @base_ni:   open base inode if allocating an extent mft record or NULL
- *
- * Search for a free mft record in the mft bitmap attribute on the ntfs volume
- * @vol.
- *
- * If @base_ni is NULL start the search at the default allocator position.
- *
- * If @base_ni is not NULL start the search at the mft record after the base
- * mft record @base_ni.
- *
- * Return the mft record number of the allocated record on success and -errno
- * on error.  An error code of -ENOSPC means that there are no free mft
- * records in the currently initialized mft bitmap.
- *
- * Locking: Caller must hold vol->mftbmp_lock for writing.
- */
-static int ntfs_mft_bitmap_find_and_alloc_free_rec_nolock(ntfs_volume *vol,
-               ntfs_inode *base_ni)
-{
-       s64 pass_end, ll, data_pos, pass_start, ofs, bit;
-       unsigned long flags;
-       struct address_space *mftbmp_mapping;
-       u8 *buf, *byte;
-       struct page *page;
-       unsigned int page_ofs, size;
-       u8 pass, b;
-
-       ntfs_debug("Searching for free mft record in the currently "
-                       "initialized mft bitmap.");
-       mftbmp_mapping = vol->mftbmp_ino->i_mapping;
-       /*
-        * Set the end of the pass making sure we do not overflow the mft
-        * bitmap.
-        */
-       read_lock_irqsave(&NTFS_I(vol->mft_ino)->size_lock, flags);
-       pass_end = NTFS_I(vol->mft_ino)->allocated_size >>
-                       vol->mft_record_size_bits;
-       read_unlock_irqrestore(&NTFS_I(vol->mft_ino)->size_lock, flags);
-       read_lock_irqsave(&NTFS_I(vol->mftbmp_ino)->size_lock, flags);
-       ll = NTFS_I(vol->mftbmp_ino)->initialized_size << 3;
-       read_unlock_irqrestore(&NTFS_I(vol->mftbmp_ino)->size_lock, flags);
-       if (pass_end > ll)
-               pass_end = ll;
-       pass = 1;
-       if (!base_ni)
-               data_pos = vol->mft_data_pos;
-       else
-               data_pos = base_ni->mft_no + 1;
-       if (data_pos < 24)
-               data_pos = 24;
-       if (data_pos >= pass_end) {
-               data_pos = 24;
-               pass = 2;
-               /* This happens on a freshly formatted volume. */
-               if (data_pos >= pass_end)
-                       return -ENOSPC;
-       }
-       pass_start = data_pos;
-       ntfs_debug("Starting bitmap search: pass %u, pass_start 0x%llx, "
-                       "pass_end 0x%llx, data_pos 0x%llx.", pass,
-                       (long long)pass_start, (long long)pass_end,
-                       (long long)data_pos);
-       /* Loop until a free mft record is found. */
-       for (; pass <= 2;) {
-               /* Cap size to pass_end. */
-               ofs = data_pos >> 3;
-               page_ofs = ofs & ~PAGE_MASK;
-               size = PAGE_SIZE - page_ofs;
-               ll = ((pass_end + 7) >> 3) - ofs;
-               if (size > ll)
-                       size = ll;
-               size <<= 3;
-               /*
-                * If we are still within the active pass, search the next page
-                * for a zero bit.
-                */
-               if (size) {
-                       page = ntfs_map_page(mftbmp_mapping,
-                                       ofs >> PAGE_SHIFT);
-                       if (IS_ERR(page)) {
-                               ntfs_error(vol->sb, "Failed to read mft "
-                                               "bitmap, aborting.");
-                               return PTR_ERR(page);
-                       }
-                       buf = (u8*)page_address(page) + page_ofs;
-                       bit = data_pos & 7;
-                       data_pos &= ~7ull;
-                       ntfs_debug("Before inner for loop: size 0x%x, "
-                                       "data_pos 0x%llx, bit 0x%llx", size,
-                                       (long long)data_pos, (long long)bit);
-                       for (; bit < size && data_pos + bit < pass_end;
-                                       bit &= ~7ull, bit += 8) {
-                               byte = buf + (bit >> 3);
-                               if (*byte == 0xff)
-                                       continue;
-                               b = ffz((unsigned long)*byte);
-                               if (b < 8 && b >= (bit & 7)) {
-                                       ll = data_pos + (bit & ~7ull) + b;
-                                       if (unlikely(ll > (1ll << 32))) {
-                                               ntfs_unmap_page(page);
-                                               return -ENOSPC;
-                                       }
-                                       *byte |= 1 << b;
-                                       flush_dcache_page(page);
-                                       set_page_dirty(page);
-                                       ntfs_unmap_page(page);
-                                       ntfs_debug("Done.  (Found and "
-                                                       "allocated mft record "
-                                                       "0x%llx.)",
-                                                       (long long)ll);
-                                       return ll;
-                               }
-                       }
-                       ntfs_debug("After inner for loop: size 0x%x, "
-                                       "data_pos 0x%llx, bit 0x%llx", size,
-                                       (long long)data_pos, (long long)bit);
-                       data_pos += size;
-                       ntfs_unmap_page(page);
-                       /*
-                        * If the end of the pass has not been reached yet,
-                        * continue searching the mft bitmap for a zero bit.
-                        */
-                       if (data_pos < pass_end)
-                               continue;
-               }
-               /* Do the next pass. */
-               if (++pass == 2) {
-                       /*
-                        * Starting the second pass, in which we scan the first
-                        * part of the zone which we omitted earlier.
-                        */
-                       pass_end = pass_start;
-                       data_pos = pass_start = 24;
-                       ntfs_debug("pass %i, pass_start 0x%llx, pass_end "
-                                       "0x%llx.", pass, (long long)pass_start,
-                                       (long long)pass_end);
-                       if (data_pos >= pass_end)
-                               break;
-               }
-       }
-       /* No free mft records in currently initialized mft bitmap. */
-       ntfs_debug("Done.  (No free mft records left in currently initialized "
-                       "mft bitmap.)");
-       return -ENOSPC;
-}
-
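Stripped of the page-cache mapping, the search above is a two-pass, wrap-around scan of a bitmap with a lower floor of 24. A self-contained model, where alloc_bit() and its parameters are invented for illustration:

#include <stdint.h>
#include <stdio.h>

/*
 * Two-pass search modelled on the function above: pass 1 scans from
 * @start up to @nbits; pass 2 wraps around and rescans from @floor
 * (the analogue of mft record 24) up to the original @start.  The
 * found bit is set (allocated) and returned, or -1 if none is free.
 */
static long alloc_bit(uint8_t *bmp, long nbits, long start, long floor)
{
	long pass_start = start, pass_end = nbits, bit;
	int pass;

	for (pass = 1; pass <= 2; pass++) {
		for (bit = pass_start; bit < pass_end; bit++) {
			uint8_t *byte = &bmp[bit >> 3];

			if (*byte == 0xff) {	/* whole byte in use */
				bit |= 7;	/* skip to its last bit */
				continue;
			}
			if (!(*byte & (1u << (bit & 7)))) {
				*byte |= 1u << (bit & 7);
				return bit;
			}
		}
		pass_end = pass_start;	/* pass 2: the skipped region */
		pass_start = floor;
	}
	return -1;
}

int main(void)
{
	/* bits 0-39 and 44-63 in use; only bits 40-43 are free */
	uint8_t bmp[8] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xf0, 0xff, 0xff };

	/* starting at 48 finds nothing; the wrap-around pass finds 40 */
	printf("allocated bit %ld\n", alloc_bit(bmp, 64, 48, 24));
	return 0;
}
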
-/**
- * ntfs_mft_bitmap_extend_allocation_nolock - extend mft bitmap by a cluster
- * @vol:       volume on which to extend the mft bitmap attribute
- *
- * Extend the mft bitmap attribute on the ntfs volume @vol by one cluster.
- *
- * Note: Only changes allocated_size, i.e. does not touch initialized_size or
- * data_size.
- *
- * Return 0 on success and -errno on error.
- *
- * Locking: - Caller must hold vol->mftbmp_lock for writing.
- *         - This function takes NTFS_I(vol->mftbmp_ino)->runlist.lock for
- *           writing and releases it before returning.
- *         - This function takes vol->lcnbmp_lock for writing and releases it
- *           before returning.
- */
-static int ntfs_mft_bitmap_extend_allocation_nolock(ntfs_volume *vol)
-{
-       LCN lcn;
-       s64 ll;
-       unsigned long flags;
-       struct page *page;
-       ntfs_inode *mft_ni, *mftbmp_ni;
-       runlist_element *rl, *rl2 = NULL;
-       ntfs_attr_search_ctx *ctx = NULL;
-       MFT_RECORD *mrec;
-       ATTR_RECORD *a = NULL;
-       int ret, mp_size;
-       u32 old_alen = 0;
-       u8 *b, tb;
-       struct {
-               u8 added_cluster:1;
-               u8 added_run:1;
-               u8 mp_rebuilt:1;
-       } status = { 0, 0, 0 };
-
-       ntfs_debug("Extending mft bitmap allocation.");
-       mft_ni = NTFS_I(vol->mft_ino);
-       mftbmp_ni = NTFS_I(vol->mftbmp_ino);
-       /*
-        * Determine the last lcn of the mft bitmap.  The allocated size of the
-        * mft bitmap cannot be zero so we are ok to do this.
-        */
-       down_write(&mftbmp_ni->runlist.lock);
-       read_lock_irqsave(&mftbmp_ni->size_lock, flags);
-       ll = mftbmp_ni->allocated_size;
-       read_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
-       rl = ntfs_attr_find_vcn_nolock(mftbmp_ni,
-                       (ll - 1) >> vol->cluster_size_bits, NULL);
-       if (IS_ERR(rl) || unlikely(!rl->length || rl->lcn < 0)) {
-               up_write(&mftbmp_ni->runlist.lock);
-               ntfs_error(vol->sb, "Failed to determine last allocated "
-                               "cluster of mft bitmap attribute.");
-               if (!IS_ERR(rl))
-                       ret = -EIO;
-               else
-                       ret = PTR_ERR(rl);
-               return ret;
-       }
-       lcn = rl->lcn + rl->length;
-       ntfs_debug("Last lcn of mft bitmap attribute is 0x%llx.",
-                       (long long)lcn);
-       /*
-        * Attempt to get the cluster following the last allocated cluster by
-        * hand as it may be in the MFT zone so the allocator would not give it
-        * to us.
-        */
-       ll = lcn >> 3;
-       page = ntfs_map_page(vol->lcnbmp_ino->i_mapping,
-                       ll >> PAGE_SHIFT);
-       if (IS_ERR(page)) {
-               up_write(&mftbmp_ni->runlist.lock);
-               ntfs_error(vol->sb, "Failed to read from lcn bitmap.");
-               return PTR_ERR(page);
-       }
-       b = (u8*)page_address(page) + (ll & ~PAGE_MASK);
-       tb = 1 << (lcn & 7ull);
-       down_write(&vol->lcnbmp_lock);
-       if (*b != 0xff && !(*b & tb)) {
-               /* Next cluster is free, allocate it. */
-               *b |= tb;
-               flush_dcache_page(page);
-               set_page_dirty(page);
-               up_write(&vol->lcnbmp_lock);
-               ntfs_unmap_page(page);
-               /* Update the mft bitmap runlist. */
-               rl->length++;
-               rl[1].vcn++;
-               status.added_cluster = 1;
-               ntfs_debug("Appending one cluster to mft bitmap.");
-       } else {
-               up_write(&vol->lcnbmp_lock);
-               ntfs_unmap_page(page);
-               /* Allocate a cluster from the DATA_ZONE. */
-               rl2 = ntfs_cluster_alloc(vol, rl[1].vcn, 1, lcn, DATA_ZONE,
-                               true);
-               if (IS_ERR(rl2)) {
-                       up_write(&mftbmp_ni->runlist.lock);
-                       ntfs_error(vol->sb, "Failed to allocate a cluster for "
-                                       "the mft bitmap.");
-                       return PTR_ERR(rl2);
-               }
-               rl = ntfs_runlists_merge(mftbmp_ni->runlist.rl, rl2);
-               if (IS_ERR(rl)) {
-                       up_write(&mftbmp_ni->runlist.lock);
-                       ntfs_error(vol->sb, "Failed to merge runlists for mft "
-                                       "bitmap.");
-                       if (ntfs_cluster_free_from_rl(vol, rl2)) {
-                               ntfs_error(vol->sb, "Failed to deallocate "
-                                               "allocated cluster.%s", es);
-                               NVolSetErrors(vol);
-                       }
-                       ntfs_free(rl2);
-                       return PTR_ERR(rl);
-               }
-               mftbmp_ni->runlist.rl = rl;
-               status.added_run = 1;
-               ntfs_debug("Adding one run to mft bitmap.");
-               /* Find the last run in the new runlist. */
-               for (; rl[1].length; rl++)
-                       ;
-       }
-       /*
-        * Update the attribute record as well.  Note: @rl is the last
-        * (non-terminator) runlist element of mft bitmap.
-        */
-       mrec = map_mft_record(mft_ni);
-       if (IS_ERR(mrec)) {
-               ntfs_error(vol->sb, "Failed to map mft record.");
-               ret = PTR_ERR(mrec);
-               goto undo_alloc;
-       }
-       ctx = ntfs_attr_get_search_ctx(mft_ni, mrec);
-       if (unlikely(!ctx)) {
-               ntfs_error(vol->sb, "Failed to get search context.");
-               ret = -ENOMEM;
-               goto undo_alloc;
-       }
-       ret = ntfs_attr_lookup(mftbmp_ni->type, mftbmp_ni->name,
-                       mftbmp_ni->name_len, CASE_SENSITIVE, rl[1].vcn, NULL,
-                       0, ctx);
-       if (unlikely(ret)) {
-               ntfs_error(vol->sb, "Failed to find last attribute extent of "
-                               "mft bitmap attribute.");
-               if (ret == -ENOENT)
-                       ret = -EIO;
-               goto undo_alloc;
-       }
-       a = ctx->attr;
-       ll = sle64_to_cpu(a->data.non_resident.lowest_vcn);
-       /* Search back for the previous last allocated cluster of mft bitmap. */
-       for (rl2 = rl; rl2 > mftbmp_ni->runlist.rl; rl2--) {
-               if (ll >= rl2->vcn)
-                       break;
-       }
-       BUG_ON(ll < rl2->vcn);
-       BUG_ON(ll >= rl2->vcn + rl2->length);
-       /* Get the size for the new mapping pairs array for this extent. */
-       mp_size = ntfs_get_size_for_mapping_pairs(vol, rl2, ll, -1);
-       if (unlikely(mp_size <= 0)) {
-               ntfs_error(vol->sb, "Get size for mapping pairs failed for "
-                               "mft bitmap attribute extent.");
-               ret = mp_size;
-               if (!ret)
-                       ret = -EIO;
-               goto undo_alloc;
-       }
-       /* Expand the attribute record if necessary. */
-       old_alen = le32_to_cpu(a->length);
-       ret = ntfs_attr_record_resize(ctx->mrec, a, mp_size +
-                       le16_to_cpu(a->data.non_resident.mapping_pairs_offset));
-       if (unlikely(ret)) {
-               if (ret != -ENOSPC) {
-                       ntfs_error(vol->sb, "Failed to resize attribute "
-                                       "record for mft bitmap attribute.");
-                       goto undo_alloc;
-               }
-               // TODO: Deal with this by moving this extent to a new mft
-               // record or by starting a new extent in a new mft record or by
-               // moving other attributes out of this mft record.
-               // Note: It will need to be a special mft record and if none of
-               // those are available it gets rather complicated...
-               ntfs_error(vol->sb, "Not enough space in this mft record to "
-                               "accommodate extended mft bitmap attribute "
-                               "extent.  Cannot handle this yet.");
-               ret = -EOPNOTSUPP;
-               goto undo_alloc;
-       }
-       status.mp_rebuilt = 1;
-       /* Generate the mapping pairs array directly into the attr record. */
-       ret = ntfs_mapping_pairs_build(vol, (u8*)a +
-                       le16_to_cpu(a->data.non_resident.mapping_pairs_offset),
-                       mp_size, rl2, ll, -1, NULL);
-       if (unlikely(ret)) {
-               ntfs_error(vol->sb, "Failed to build mapping pairs array for "
-                               "mft bitmap attribute.");
-               goto undo_alloc;
-       }
-       /* Update the highest_vcn. */
-       a->data.non_resident.highest_vcn = cpu_to_sle64(rl[1].vcn - 1);
-       /*
-        * We now have extended the mft bitmap allocated_size by one cluster.
-        * Reflect this in the ntfs_inode structure and the attribute record.
-        */
-       if (a->data.non_resident.lowest_vcn) {
-               /*
-                * We are not in the first attribute extent, switch to it, but
-                * first ensure the changes will make it to disk later.
-                */
-               flush_dcache_mft_record_page(ctx->ntfs_ino);
-               mark_mft_record_dirty(ctx->ntfs_ino);
-               ntfs_attr_reinit_search_ctx(ctx);
-               ret = ntfs_attr_lookup(mftbmp_ni->type, mftbmp_ni->name,
-                               mftbmp_ni->name_len, CASE_SENSITIVE, 0, NULL,
-                               0, ctx);
-               if (unlikely(ret)) {
-                       ntfs_error(vol->sb, "Failed to find first attribute "
-                                       "extent of mft bitmap attribute.");
-                       goto restore_undo_alloc;
-               }
-               a = ctx->attr;
-       }
-       write_lock_irqsave(&mftbmp_ni->size_lock, flags);
-       mftbmp_ni->allocated_size += vol->cluster_size;
-       a->data.non_resident.allocated_size =
-                       cpu_to_sle64(mftbmp_ni->allocated_size);
-       write_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
-       /* Ensure the changes make it to disk. */
-       flush_dcache_mft_record_page(ctx->ntfs_ino);
-       mark_mft_record_dirty(ctx->ntfs_ino);
-       ntfs_attr_put_search_ctx(ctx);
-       unmap_mft_record(mft_ni);
-       up_write(&mftbmp_ni->runlist.lock);
-       ntfs_debug("Done.");
-       return 0;
-restore_undo_alloc:
-       ntfs_attr_reinit_search_ctx(ctx);
-       if (ntfs_attr_lookup(mftbmp_ni->type, mftbmp_ni->name,
-                       mftbmp_ni->name_len, CASE_SENSITIVE, rl[1].vcn, NULL,
-                       0, ctx)) {
-               ntfs_error(vol->sb, "Failed to find last attribute extent of "
-                               "mft bitmap attribute.%s", es);
-               write_lock_irqsave(&mftbmp_ni->size_lock, flags);
-               mftbmp_ni->allocated_size += vol->cluster_size;
-               write_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
-               ntfs_attr_put_search_ctx(ctx);
-               unmap_mft_record(mft_ni);
-               up_write(&mftbmp_ni->runlist.lock);
-               /*
-                * The only thing that is now wrong is ->allocated_size of the
-                * base attribute extent which chkdsk should be able to fix.
-                */
-               NVolSetErrors(vol);
-               return ret;
-       }
-       a = ctx->attr;
-       a->data.non_resident.highest_vcn = cpu_to_sle64(rl[1].vcn - 2);
-undo_alloc:
-       if (status.added_cluster) {
-               /* Truncate the last run in the runlist by one cluster. */
-               rl->length--;
-               rl[1].vcn--;
-       } else if (status.added_run) {
-               lcn = rl->lcn;
-               /* Remove the last run from the runlist. */
-               rl->lcn = rl[1].lcn;
-               rl->length = 0;
-       }
-       /* Deallocate the cluster. */
-       down_write(&vol->lcnbmp_lock);
-       if (ntfs_bitmap_clear_bit(vol->lcnbmp_ino, lcn)) {
-               ntfs_error(vol->sb, "Failed to free allocated cluster.%s", es);
-               NVolSetErrors(vol);
-       }
-       up_write(&vol->lcnbmp_lock);
-       if (status.mp_rebuilt) {
-               if (ntfs_mapping_pairs_build(vol, (u8*)a + le16_to_cpu(
-                               a->data.non_resident.mapping_pairs_offset),
-                               old_alen - le16_to_cpu(
-                               a->data.non_resident.mapping_pairs_offset),
-                               rl2, ll, -1, NULL)) {
-                       ntfs_error(vol->sb, "Failed to restore mapping pairs "
-                                       "array.%s", es);
-                       NVolSetErrors(vol);
-               }
-               if (ntfs_attr_record_resize(ctx->mrec, a, old_alen)) {
-                       ntfs_error(vol->sb, "Failed to restore attribute "
-                                       "record.%s", es);
-                       NVolSetErrors(vol);
-               }
-               flush_dcache_mft_record_page(ctx->ntfs_ino);
-               mark_mft_record_dirty(ctx->ntfs_ino);
-       }
-       if (ctx)
-               ntfs_attr_put_search_ctx(ctx);
-       if (!IS_ERR(mrec))
-               unmap_mft_record(mft_ni);
-       up_write(&mftbmp_ni->runlist.lock);
-       return ret;
-}
-
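At its core, the function above first tries a cheap test-and-set on the cluster immediately after the current last one, and only falls back to the general allocator (plus a runlist merge) when that cluster is taken. A sketch with invented names extend_by_one() and alloc_elsewhere():

#include <stdint.h>
#include <stdio.h>

/* stand-in for ntfs_cluster_alloc(): always hands out cluster 100 */
static long alloc_elsewhere(void)
{
	return 100;
}

/*
 * Try the lcn right after the last allocated one first: if its bit in
 * the cluster bitmap is clear, taking it just lengthens the existing
 * run; otherwise fall back to a full allocation, which creates a new
 * run that has to be merged into the runlist.
 */
static long extend_by_one(uint8_t *cluster_bmp, long last_lcn)
{
	long next = last_lcn + 1;
	uint8_t *byte = &cluster_bmp[next >> 3];
	uint8_t mask = 1u << (next & 7);

	if (!(*byte & mask)) {
		*byte |= mask;		/* adjacent cluster was free */
		return next;
	}
	return alloc_elsewhere();	/* adjacent cluster taken */
}

int main(void)
{
	uint8_t bmp[16] = { 0x03 };	/* clusters 0 and 1 in use */

	printf("grew to cluster %ld\n", extend_by_one(bmp, 1));
	return 0;
}
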
-/**
- * ntfs_mft_bitmap_extend_initialized_nolock - extend mftbmp initialized data
- * @vol:       volume on which to extend the mft bitmap attribute
- *
- * Extend the initialized portion of the mft bitmap attribute on the ntfs
- * volume @vol by 8 bytes.
- *
- * Note:  Only changes initialized_size and data_size, i.e. requires that
- * allocated_size is big enough to fit the new initialized_size.
- *
- * Return 0 on success and -errno on error.
- *
- * Locking: Caller must hold vol->mftbmp_lock for writing.
- */
-static int ntfs_mft_bitmap_extend_initialized_nolock(ntfs_volume *vol)
-{
-       s64 old_data_size, old_initialized_size;
-       unsigned long flags;
-       struct inode *mftbmp_vi;
-       ntfs_inode *mft_ni, *mftbmp_ni;
-       ntfs_attr_search_ctx *ctx;
-       MFT_RECORD *mrec;
-       ATTR_RECORD *a;
-       int ret;
-
-       ntfs_debug("Extending mft bitmap initialized (and data) size.");
-       mft_ni = NTFS_I(vol->mft_ino);
-       mftbmp_vi = vol->mftbmp_ino;
-       mftbmp_ni = NTFS_I(mftbmp_vi);
-       /* Get the attribute record. */
-       mrec = map_mft_record(mft_ni);
-       if (IS_ERR(mrec)) {
-               ntfs_error(vol->sb, "Failed to map mft record.");
-               return PTR_ERR(mrec);
-       }
-       ctx = ntfs_attr_get_search_ctx(mft_ni, mrec);
-       if (unlikely(!ctx)) {
-               ntfs_error(vol->sb, "Failed to get search context.");
-               ret = -ENOMEM;
-               goto unm_err_out;
-       }
-       ret = ntfs_attr_lookup(mftbmp_ni->type, mftbmp_ni->name,
-                       mftbmp_ni->name_len, CASE_SENSITIVE, 0, NULL, 0, ctx);
-       if (unlikely(ret)) {
-               ntfs_error(vol->sb, "Failed to find first attribute extent of "
-                               "mft bitmap attribute.");
-               if (ret == -ENOENT)
-                       ret = -EIO;
-               goto put_err_out;
-       }
-       a = ctx->attr;
-       write_lock_irqsave(&mftbmp_ni->size_lock, flags);
-       old_data_size = i_size_read(mftbmp_vi);
-       old_initialized_size = mftbmp_ni->initialized_size;
-       /*
-        * We can simply update the initialized_size before filling the space
-        * with zeroes because the caller is holding the mft bitmap lock for
-        * writing which ensures that no one else is trying to access the data.
-        */
-       mftbmp_ni->initialized_size += 8;
-       a->data.non_resident.initialized_size =
-                       cpu_to_sle64(mftbmp_ni->initialized_size);
-       if (mftbmp_ni->initialized_size > old_data_size) {
-               i_size_write(mftbmp_vi, mftbmp_ni->initialized_size);
-               a->data.non_resident.data_size =
-                               cpu_to_sle64(mftbmp_ni->initialized_size);
-       }
-       write_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
-       /* Ensure the changes make it to disk. */
-       flush_dcache_mft_record_page(ctx->ntfs_ino);
-       mark_mft_record_dirty(ctx->ntfs_ino);
-       ntfs_attr_put_search_ctx(ctx);
-       unmap_mft_record(mft_ni);
-       /* Initialize the mft bitmap attribute value with zeroes. */
-       ret = ntfs_attr_set(mftbmp_ni, old_initialized_size, 8, 0);
-       if (likely(!ret)) {
-               ntfs_debug("Done.  (Wrote eight initialized bytes to mft "
-                               "bitmap.)");
-               return 0;
-       }
-       ntfs_error(vol->sb, "Failed to write to mft bitmap.");
-       /* Try to recover from the error. */
-       mrec = map_mft_record(mft_ni);
-       if (IS_ERR(mrec)) {
-               ntfs_error(vol->sb, "Failed to map mft record.%s", es);
-               NVolSetErrors(vol);
-               return ret;
-       }
-       ctx = ntfs_attr_get_search_ctx(mft_ni, mrec);
-       if (unlikely(!ctx)) {
-               ntfs_error(vol->sb, "Failed to get search context.%s", es);
-               NVolSetErrors(vol);
-               goto unm_err_out;
-       }
-       if (ntfs_attr_lookup(mftbmp_ni->type, mftbmp_ni->name,
-                       mftbmp_ni->name_len, CASE_SENSITIVE, 0, NULL, 0, ctx)) {
-               ntfs_error(vol->sb, "Failed to find first attribute extent of "
-                               "mft bitmap attribute.%s", es);
-               NVolSetErrors(vol);
-put_err_out:
-               ntfs_attr_put_search_ctx(ctx);
-unm_err_out:
-               unmap_mft_record(mft_ni);
-               goto err_out;
-       }
-       a = ctx->attr;
-       write_lock_irqsave(&mftbmp_ni->size_lock, flags);
-       mftbmp_ni->initialized_size = old_initialized_size;
-       a->data.non_resident.initialized_size =
-                       cpu_to_sle64(old_initialized_size);
-       if (i_size_read(mftbmp_vi) != old_data_size) {
-               i_size_write(mftbmp_vi, old_data_size);
-               a->data.non_resident.data_size = cpu_to_sle64(old_data_size);
-       }
-       write_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
-       flush_dcache_mft_record_page(ctx->ntfs_ino);
-       mark_mft_record_dirty(ctx->ntfs_ino);
-       ntfs_attr_put_search_ctx(ctx);
-       unmap_mft_record(mft_ni);
-#ifdef DEBUG
-       read_lock_irqsave(&mftbmp_ni->size_lock, flags);
-       ntfs_debug("Restored status of mftbmp: allocated_size 0x%llx, "
-                       "data_size 0x%llx, initialized_size 0x%llx.",
-                       (long long)mftbmp_ni->allocated_size,
-                       (long long)i_size_read(mftbmp_vi),
-                       (long long)mftbmp_ni->initialized_size);
-       read_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
-#endif /* DEBUG */
-err_out:
-       return ret;
-}
-
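The ordering the function above relies on — publish the larger initialized_size first, zero the new range afterwards, restore both sizes on failure — is safe only because the caller holds the lock exclusively. A minimal sketch, with struct attr and zero_range() invented for illustration:

#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* toy attribute carrying the two sizes the driver tracks here */
struct attr {
	size_t initialized_size;
	size_t data_size;
	unsigned char data[64];
};

/* stand-in for ntfs_attr_set(); fails past the end of the buffer */
static int zero_range(struct attr *a, size_t from, size_t len)
{
	if (from + len > sizeof(a->data))
		return -1;	/* plays the role of an I/O error */
	memset(a->data + from, 0, len);
	return 0;
}

/*
 * Caller is assumed to hold an exclusive lock, so it is safe to bump
 * initialized_size (and data_size, if overtaken) before the new bytes
 * are actually zeroed; on failure both sizes are restored.
 */
static int extend_initialized(struct attr *a)
{
	size_t old_init = a->initialized_size;
	size_t old_data = a->data_size;

	a->initialized_size += 8;
	if (a->initialized_size > a->data_size)
		a->data_size = a->initialized_size;
	if (zero_range(a, old_init, 8)) {
		a->initialized_size = old_init;	/* undo on error */
		a->data_size = old_data;
		return -1;
	}
	return 0;
}

int main(void)
{
	struct attr a = { .initialized_size = 8, .data_size = 8 };

	printf("extend: %d, initialized_size now %zu\n",
	       extend_initialized(&a), a.initialized_size);
	return 0;
}
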
-/**
- * ntfs_mft_data_extend_allocation_nolock - extend mft data attribute
- * @vol:       volume on which to extend the mft data attribute
- *
- * Extend the mft data attribute on the ntfs volume @vol by 16 mft records
- * worth of clusters or if not enough space for this by one mft record worth
- * of clusters.
- *
- * Note:  Only changes allocated_size, i.e. does not touch initialized_size or
- * data_size.
- *
- * Return 0 on success and -errno on error.
- *
- * Locking: - Caller must hold vol->mftbmp_lock for writing.
- *         - This function takes NTFS_I(vol->mft_ino)->runlist.lock for
- *           writing and releases it before returning.
- *         - This function calls functions which take vol->lcnbmp_lock for
- *           writing and release it before returning.
- */
-static int ntfs_mft_data_extend_allocation_nolock(ntfs_volume *vol)
-{
-       LCN lcn;
-       VCN old_last_vcn;
-       s64 min_nr, nr, ll;
-       unsigned long flags;
-       ntfs_inode *mft_ni;
-       runlist_element *rl, *rl2;
-       ntfs_attr_search_ctx *ctx = NULL;
-       MFT_RECORD *mrec;
-       ATTR_RECORD *a = NULL;
-       int ret, mp_size;
-       u32 old_alen = 0;
-       bool mp_rebuilt = false;
-
-       ntfs_debug("Extending mft data allocation.");
-       mft_ni = NTFS_I(vol->mft_ino);
-       /*
-        * Determine the preferred allocation location, i.e. the last lcn of
-        * the mft data attribute.  The allocated size of the mft data
-        * attribute cannot be zero so we are ok to do this.
-        */
-       down_write(&mft_ni->runlist.lock);
-       read_lock_irqsave(&mft_ni->size_lock, flags);
-       ll = mft_ni->allocated_size;
-       read_unlock_irqrestore(&mft_ni->size_lock, flags);
-       rl = ntfs_attr_find_vcn_nolock(mft_ni,
-                       (ll - 1) >> vol->cluster_size_bits, NULL);
-       if (IS_ERR(rl) || unlikely(!rl->length || rl->lcn < 0)) {
-               up_write(&mft_ni->runlist.lock);
-               ntfs_error(vol->sb, "Failed to determine last allocated "
-                               "cluster of mft data attribute.");
-               if (!IS_ERR(rl))
-                       ret = -EIO;
-               else
-                       ret = PTR_ERR(rl);
-               return ret;
-       }
-       lcn = rl->lcn + rl->length;
-       ntfs_debug("Last lcn of mft data attribute is 0x%llx.", (long long)lcn);
-       /* Minimum allocation is one mft record worth of clusters. */
-       min_nr = vol->mft_record_size >> vol->cluster_size_bits;
-       if (!min_nr)
-               min_nr = 1;
-       /* Want to allocate 16 mft records worth of clusters. */
-       nr = vol->mft_record_size << 4 >> vol->cluster_size_bits;
-       if (!nr)
-               nr = min_nr;
-       /* Ensure we do not go above 2^32-1 mft records. */
-       read_lock_irqsave(&mft_ni->size_lock, flags);
-       ll = mft_ni->allocated_size;
-       read_unlock_irqrestore(&mft_ni->size_lock, flags);
-       if (unlikely((ll + (nr << vol->cluster_size_bits)) >>
-                       vol->mft_record_size_bits >= (1ll << 32))) {
-               nr = min_nr;
-               if (unlikely((ll + (nr << vol->cluster_size_bits)) >>
-                               vol->mft_record_size_bits >= (1ll << 32))) {
-                       ntfs_warning(vol->sb, "Cannot allocate mft record "
-                                       "because the maximum number of inodes "
-                                       "(2^32) has already been reached.");
-                       up_write(&mft_ni->runlist.lock);
-                       return -ENOSPC;
-               }
-       }
-       ntfs_debug("Trying mft data allocation with %s cluster count %lli.",
-                       nr > min_nr ? "default" : "minimal", (long long)nr);
-       old_last_vcn = rl[1].vcn;
-       do {
-               rl2 = ntfs_cluster_alloc(vol, old_last_vcn, nr, lcn, MFT_ZONE,
-                               true);
-               if (!IS_ERR(rl2))
-                       break;
-               if (PTR_ERR(rl2) != -ENOSPC || nr == min_nr) {
-                       ntfs_error(vol->sb, "Failed to allocate the minimal "
-                                       "number of clusters (%lli) for the "
-                                       "mft data attribute.", (long long)nr);
-                       up_write(&mft_ni->runlist.lock);
-                       return PTR_ERR(rl2);
-               }
-               /*
-                * There is not enough space to do the allocation, but there
-                * might be enough space to do a minimal allocation so try that
-                * before failing.
-                */
-               nr = min_nr;
-               ntfs_debug("Retrying mft data allocation with minimal cluster "
-                               "count %lli.", (long long)nr);
-       } while (1);
-       rl = ntfs_runlists_merge(mft_ni->runlist.rl, rl2);
-       if (IS_ERR(rl)) {
-               up_write(&mft_ni->runlist.lock);
-               ntfs_error(vol->sb, "Failed to merge runlists for mft data "
-                               "attribute.");
-               if (ntfs_cluster_free_from_rl(vol, rl2)) {
-                       ntfs_error(vol->sb, "Failed to deallocate clusters "
-                                       "from the mft data attribute.%s", es);
-                       NVolSetErrors(vol);
-               }
-               ntfs_free(rl2);
-               return PTR_ERR(rl);
-       }
-       mft_ni->runlist.rl = rl;
-       ntfs_debug("Allocated %lli clusters.", (long long)nr);
-       /* Find the last run in the new runlist. */
-       for (; rl[1].length; rl++)
-               ;
-       /* Update the attribute record as well. */
-       mrec = map_mft_record(mft_ni);
-       if (IS_ERR(mrec)) {
-               ntfs_error(vol->sb, "Failed to map mft record.");
-               ret = PTR_ERR(mrec);
-               goto undo_alloc;
-       }
-       ctx = ntfs_attr_get_search_ctx(mft_ni, mrec);
-       if (unlikely(!ctx)) {
-               ntfs_error(vol->sb, "Failed to get search context.");
-               ret = -ENOMEM;
-               goto undo_alloc;
-       }
-       ret = ntfs_attr_lookup(mft_ni->type, mft_ni->name, mft_ni->name_len,
-                       CASE_SENSITIVE, rl[1].vcn, NULL, 0, ctx);
-       if (unlikely(ret)) {
-               ntfs_error(vol->sb, "Failed to find last attribute extent of "
-                               "mft data attribute.");
-               if (ret == -ENOENT)
-                       ret = -EIO;
-               goto undo_alloc;
-       }
-       a = ctx->attr;
-       ll = sle64_to_cpu(a->data.non_resident.lowest_vcn);
-       /* Search back for the previous last allocated cluster of mft bitmap. */
-       for (rl2 = rl; rl2 > mft_ni->runlist.rl; rl2--) {
-               if (ll >= rl2->vcn)
-                       break;
-       }
-       BUG_ON(ll < rl2->vcn);
-       BUG_ON(ll >= rl2->vcn + rl2->length);
-       /* Get the size for the new mapping pairs array for this extent. */
-       mp_size = ntfs_get_size_for_mapping_pairs(vol, rl2, ll, -1);
-       if (unlikely(mp_size <= 0)) {
-               ntfs_error(vol->sb, "Get size for mapping pairs failed for "
-                               "mft data attribute extent.");
-               ret = mp_size;
-               if (!ret)
-                       ret = -EIO;
-               goto undo_alloc;
-       }
-       /* Expand the attribute record if necessary. */
-       old_alen = le32_to_cpu(a->length);
-       ret = ntfs_attr_record_resize(ctx->mrec, a, mp_size +
-                       le16_to_cpu(a->data.non_resident.mapping_pairs_offset));
-       if (unlikely(ret)) {
-               if (ret != -ENOSPC) {
-                       ntfs_error(vol->sb, "Failed to resize attribute "
-                                       "record for mft data attribute.");
-                       goto undo_alloc;
-               }
-               // TODO: Deal with this by moving this extent to a new mft
-               // record or by starting a new extent in a new mft record or by
-               // moving other attributes out of this mft record.
-               // Note: Use the special reserved mft records and ensure that
-               // this extent is not required to find the mft record in
-               // question.  If no free special records left we would need to
-               // move an existing record away, insert ours in its place, and
-               // then place the moved record into the newly allocated space
-               // and we would then need to update all references to this mft
-               // record appropriately.  This is rather complicated...
-               ntfs_error(vol->sb, "Not enough space in this mft record to "
-                               "accommodate extended mft data attribute "
-                               "extent.  Cannot handle this yet.");
-               ret = -EOPNOTSUPP;
-               goto undo_alloc;
-       }
-       mp_rebuilt = true;
-       /* Generate the mapping pairs array directly into the attr record. */
-       ret = ntfs_mapping_pairs_build(vol, (u8*)a +
-                       le16_to_cpu(a->data.non_resident.mapping_pairs_offset),
-                       mp_size, rl2, ll, -1, NULL);
-       if (unlikely(ret)) {
-               ntfs_error(vol->sb, "Failed to build mapping pairs array of "
-                               "mft data attribute.");
-               goto undo_alloc;
-       }
-       /* Update the highest_vcn. */
-       a->data.non_resident.highest_vcn = cpu_to_sle64(rl[1].vcn - 1);
-       /*
-        * We now have extended the mft data allocated_size by nr clusters.
-        * Reflect this in the ntfs_inode structure and the attribute record.
-        * @rl is the last (non-terminator) runlist element of mft data
-        * attribute.
-        */
-       if (a->data.non_resident.lowest_vcn) {
-               /*
-                * We are not in the first attribute extent, switch to it, but
-                * first ensure the changes will make it to disk later.
-                */
-               flush_dcache_mft_record_page(ctx->ntfs_ino);
-               mark_mft_record_dirty(ctx->ntfs_ino);
-               ntfs_attr_reinit_search_ctx(ctx);
-               ret = ntfs_attr_lookup(mft_ni->type, mft_ni->name,
-                               mft_ni->name_len, CASE_SENSITIVE, 0, NULL, 0,
-                               ctx);
-               if (unlikely(ret)) {
-                       ntfs_error(vol->sb, "Failed to find first attribute "
-                                       "extent of mft data attribute.");
-                       goto restore_undo_alloc;
-               }
-               a = ctx->attr;
-       }
-       write_lock_irqsave(&mft_ni->size_lock, flags);
-       mft_ni->allocated_size += nr << vol->cluster_size_bits;
-       a->data.non_resident.allocated_size =
-                       cpu_to_sle64(mft_ni->allocated_size);
-       write_unlock_irqrestore(&mft_ni->size_lock, flags);
-       /* Ensure the changes make it to disk. */
-       flush_dcache_mft_record_page(ctx->ntfs_ino);
-       mark_mft_record_dirty(ctx->ntfs_ino);
-       ntfs_attr_put_search_ctx(ctx);
-       unmap_mft_record(mft_ni);
-       up_write(&mft_ni->runlist.lock);
-       ntfs_debug("Done.");
-       return 0;
-restore_undo_alloc:
-       ntfs_attr_reinit_search_ctx(ctx);
-       if (ntfs_attr_lookup(mft_ni->type, mft_ni->name, mft_ni->name_len,
-                       CASE_SENSITIVE, rl[1].vcn, NULL, 0, ctx)) {
-               ntfs_error(vol->sb, "Failed to find last attribute extent of "
-                               "mft data attribute.%s", es);
-               write_lock_irqsave(&mft_ni->size_lock, flags);
-               mft_ni->allocated_size += nr << vol->cluster_size_bits;
-               write_unlock_irqrestore(&mft_ni->size_lock, flags);
-               ntfs_attr_put_search_ctx(ctx);
-               unmap_mft_record(mft_ni);
-               up_write(&mft_ni->runlist.lock);
-               /*
-                * The only thing that is now wrong is ->allocated_size of the
-                * base attribute extent which chkdsk should be able to fix.
-                */
-               NVolSetErrors(vol);
-               return ret;
-       }
-       ctx->attr->data.non_resident.highest_vcn =
-                       cpu_to_sle64(old_last_vcn - 1);
-undo_alloc:
-       if (ntfs_cluster_free(mft_ni, old_last_vcn, -1, ctx) < 0) {
-               ntfs_error(vol->sb, "Failed to free clusters from mft data "
-                               "attribute.%s", es);
-               NVolSetErrors(vol);
-       }
-
-       if (ntfs_rl_truncate_nolock(vol, &mft_ni->runlist, old_last_vcn)) {
-               ntfs_error(vol->sb, "Failed to truncate mft data attribute "
-                               "runlist.%s", es);
-               NVolSetErrors(vol);
-       }
-       if (ctx) {
-               a = ctx->attr;
-               if (mp_rebuilt && !IS_ERR(ctx->mrec)) {
-                       if (ntfs_mapping_pairs_build(vol, (u8 *)a + le16_to_cpu(
-                               a->data.non_resident.mapping_pairs_offset),
-                               old_alen - le16_to_cpu(
-                                       a->data.non_resident.mapping_pairs_offset),
-                               rl2, ll, -1, NULL)) {
-                               ntfs_error(vol->sb, "Failed to restore mapping pairs "
-                                       "array.%s", es);
-                               NVolSetErrors(vol);
-                       }
-                       if (ntfs_attr_record_resize(ctx->mrec, a, old_alen)) {
-                               ntfs_error(vol->sb, "Failed to restore attribute "
-                                       "record.%s", es);
-                               NVolSetErrors(vol);
-                       }
-                       flush_dcache_mft_record_page(ctx->ntfs_ino);
-                       mark_mft_record_dirty(ctx->ntfs_ino);
-               } else if (IS_ERR(ctx->mrec)) {
-                       ntfs_error(vol->sb, "Failed to restore attribute search "
-                               "context.%s", es);
-                       NVolSetErrors(vol);
-               }
-               ntfs_attr_put_search_ctx(ctx);
-       }
-       if (!IS_ERR(mrec))
-               unmap_mft_record(mft_ni);
-       up_write(&mft_ni->runlist.lock);
-       return ret;
-}
-
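The allocation strategy above reduces to: request the preferred size, and on -ENOSPC retry exactly once with the minimum before giving up. A toy model, where try_alloc() is an invented stand-in for the cluster allocator:

#include <errno.h>
#include <stdio.h>

/* toy allocator: pretend only requests of up to 4 clusters fit */
static int try_alloc(long nr)
{
	return nr > 4 ? -ENOSPC : 0;
}

/*
 * Ask for the preferred count first (16 records' worth of clusters in
 * the driver); if that fails only for lack of space, retry once with
 * the minimum (one record's worth) before giving up.
 */
static int extend_data(long min_nr, long want_nr)
{
	long nr = want_nr;

	for (;;) {
		int err = try_alloc(nr);

		if (!err)
			return 0;		/* got nr clusters */
		if (err != -ENOSPC || nr == min_nr)
			return err;		/* hard failure */
		nr = min_nr;			/* minimal retry */
	}
}

int main(void)
{
	printf("extend: %d\n", extend_data(1, 16));
	return 0;
}
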
-/**
- * ntfs_mft_record_layout - layout an mft record into a memory buffer
- * @vol:       volume to which the mft record will belong
- * @mft_no:    mft reference specifying the mft record number
- * @m:         destination buffer of size >= @vol->mft_record_size bytes
- *
- * Layout an empty, unused mft record with the mft record number @mft_no into
- * the buffer @m.  The volume @vol is needed because the mft record structure
- * was modified in NTFS 3.1 so we need to know which volume version this mft
- * record will be used on.
- *
- * Return 0 on success and -errno on error.
- */
-static int ntfs_mft_record_layout(const ntfs_volume *vol, const s64 mft_no,
-               MFT_RECORD *m)
-{
-       ATTR_RECORD *a;
-
-       ntfs_debug("Entering for mft record 0x%llx.", (long long)mft_no);
-       if (mft_no >= (1ll << 32)) {
-               ntfs_error(vol->sb, "Mft record number 0x%llx exceeds "
-                               "maximum of 2^32.", (long long)mft_no);
-               return -ERANGE;
-       }
-       /* Start by clearing the whole mft record to give us a clean slate. */
-       memset(m, 0, vol->mft_record_size);
-       /* Aligned to 2-byte boundary. */
-       if (vol->major_ver < 3 || (vol->major_ver == 3 && !vol->minor_ver))
-               m->usa_ofs = cpu_to_le16((sizeof(MFT_RECORD_OLD) + 1) & ~1);
-       else {
-               m->usa_ofs = cpu_to_le16((sizeof(MFT_RECORD) + 1) & ~1);
-               /*
-                * Set the NTFS 3.1+ specific fields while we know that the
-                * volume version is 3.1+.
-                */
-               m->reserved = 0;
-               m->mft_record_number = cpu_to_le32((u32)mft_no);
-       }
-       m->magic = magic_FILE;
-       if (vol->mft_record_size >= NTFS_BLOCK_SIZE)
-               m->usa_count = cpu_to_le16(vol->mft_record_size /
-                               NTFS_BLOCK_SIZE + 1);
-       else {
-               m->usa_count = cpu_to_le16(1);
-               ntfs_warning(vol->sb, "Sector size is bigger than mft record "
-                               "size.  Setting usa_count to 1.  If chkdsk "
-                               "reports this as corruption, please email "
-                               "linux-ntfs-dev@lists.sourceforge.net stating "
-                               "that you saw this message and that the "
-                               "modified filesystem created was corrupt.  "
-                               "Thank you.");
-       }
-       /* Set the update sequence number to 1. */
-       *(le16*)((u8*)m + le16_to_cpu(m->usa_ofs)) = cpu_to_le16(1);
-       m->lsn = 0;
-       m->sequence_number = cpu_to_le16(1);
-       m->link_count = 0;
-       /*
-        * Place the attributes straight after the update sequence array,
-        * aligned to 8-byte boundary.
-        */
-       m->attrs_offset = cpu_to_le16((le16_to_cpu(m->usa_ofs) +
-                       (le16_to_cpu(m->usa_count) << 1) + 7) & ~7);
-       m->flags = 0;
-       /*
-        * Using attrs_offset plus eight bytes (for the termination attribute).
-        * attrs_offset is already aligned to 8-byte boundary, so no need to
-        * align again.
-        */
-       m->bytes_in_use = cpu_to_le32(le16_to_cpu(m->attrs_offset) + 8);
-       m->bytes_allocated = cpu_to_le32(vol->mft_record_size);
-       m->base_mft_record = 0;
-       m->next_attr_instance = 0;
-       /* Add the termination attribute. */
-       a = (ATTR_RECORD*)((u8*)m + le16_to_cpu(m->attrs_offset));
-       a->type = AT_END;
-       a->length = 0;
-       ntfs_debug("Done.");
-       return 0;
-}
-
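The offset arithmetic in ntfs_mft_record_layout() is easier to follow with concrete numbers. The sketch below walks a 1024-byte NTFS 3.1+ record; the 48-byte header size is assumed for illustration, not taken from the driver's struct definitions:

#include <stdio.h>

#define NTFS_BLOCK_SIZE 512u

int main(void)
{
	unsigned mft_record_size = 1024;
	unsigned hdr_size = 48;			/* assumed, for illustration */
	unsigned usa_ofs = (hdr_size + 1) & ~1u;	/* 2-byte aligned */
	unsigned usa_count = mft_record_size / NTFS_BLOCK_SIZE + 1;
	unsigned attrs_offset = (usa_ofs + (usa_count << 1) + 7) & ~7u;

	/* bytes_in_use = attrs_offset + 8 for the empty AT_END terminator */
	printf("usa_ofs=%u usa_count=%u attrs_offset=%u bytes_in_use=%u\n",
	       usa_ofs, usa_count, attrs_offset, attrs_offset + 8);
	return 0;
}

With these inputs the USA lands at offset 48 with three entries, attributes start at the next 8-byte boundary (56), and the empty record uses 64 bytes.
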
-/**
- * ntfs_mft_record_format - format an mft record on an ntfs volume
- * @vol:       volume on which to format the mft record
- * @mft_no:    mft record number to format
- *
- * Format the mft record @mft_no in $MFT/$DATA, i.e. lay out an empty, unused
- * mft record into the appropriate place of the mft data attribute.  This is
- * used when extending the mft data attribute.
- *
- * Return 0 on success and -errno on error.
- */
-static int ntfs_mft_record_format(const ntfs_volume *vol, const s64 mft_no)
-{
-       loff_t i_size;
-       struct inode *mft_vi = vol->mft_ino;
-       struct page *page;
-       MFT_RECORD *m;
-       pgoff_t index, end_index;
-       unsigned int ofs;
-       int err;
-
-       ntfs_debug("Entering for mft record 0x%llx.", (long long)mft_no);
-       /*
-        * The index into the page cache and the offset within the page cache
-        * page of the wanted mft record.
-        */
-       index = mft_no << vol->mft_record_size_bits >> PAGE_SHIFT;
-       ofs = (mft_no << vol->mft_record_size_bits) & ~PAGE_MASK;
-       /* The maximum valid index into the page cache for $MFT's data. */
-       i_size = i_size_read(mft_vi);
-       end_index = i_size >> PAGE_SHIFT;
-       if (unlikely(index >= end_index)) {
-               if (unlikely(index > end_index || ofs + vol->mft_record_size >=
-                               (i_size & ~PAGE_MASK))) {
-                       ntfs_error(vol->sb, "Tried to format non-existing mft "
-                                       "record 0x%llx.", (long long)mft_no);
-                       return -ENOENT;
-               }
-       }
-       /* Read, map, and pin the page containing the mft record. */
-       page = ntfs_map_page(mft_vi->i_mapping, index);
-       if (IS_ERR(page)) {
-               ntfs_error(vol->sb, "Failed to map page containing mft record "
-                               "to format 0x%llx.", (long long)mft_no);
-               return PTR_ERR(page);
-       }
-       lock_page(page);
-       BUG_ON(!PageUptodate(page));
-       ClearPageUptodate(page);
-       m = (MFT_RECORD*)((u8*)page_address(page) + ofs);
-       err = ntfs_mft_record_layout(vol, mft_no, m);
-       if (unlikely(err)) {
-               ntfs_error(vol->sb, "Failed to layout mft record 0x%llx.",
-                               (long long)mft_no);
-               SetPageUptodate(page);
-               unlock_page(page);
-               ntfs_unmap_page(page);
-               return err;
-       }
-       flush_dcache_page(page);
-       SetPageUptodate(page);
-       unlock_page(page);
-       /*
-        * Make sure the mft record is written out to disk.  We could use
-        * ilookup5() to check if an inode is in icache and so on but this is
-        * unnecessary as ntfs_writepage() will write the dirty record anyway.
-        */
-       mark_ntfs_record_dirty(page, ofs);
-       ntfs_unmap_page(page);
-       ntfs_debug("Done.");
-       return 0;
-}
-
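-/*
- * Editor's illustrative sketch (not from the original file): the page cache
- * position arithmetic used by ntfs_mft_record_format() above, assuming
- * 1 KiB mft records (mft_record_size_bits = 10) and 4 KiB pages
- * (PAGE_SHIFT = 12, so ~PAGE_MASK selects the low 12 bits).
- */
-static void example_mft_record_position(void)
-{
-       unsigned long mft_no = 5;                       /* hypothetical */
-       unsigned long index = mft_no << 10 >> 12;       /* page 1 */
-       unsigned int ofs = (mft_no << 10) & (4096 - 1); /* byte offset 1024 */
-
-       /* Record 5 starts 5120 bytes into $MFT/$DATA: page 1, offset 1024. */
-       (void)index;
-       (void)ofs;
-}
-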
-/**
- * ntfs_mft_record_alloc - allocate an mft record on an ntfs volume
- * @vol:       [IN]  volume on which to allocate the mft record
- * @mode:      [IN]  mode if want a file or directory, i.e. base inode or 0
- * @base_ni:   [IN]  open base inode if allocating an extent mft record or NULL
- * @mrec:      [OUT] on successful return this is the mapped mft record
- *
- * Allocate an mft record in $MFT/$DATA of an open ntfs volume @vol.
- *
- * If @base_ni is NULL make the mft record a base mft record, i.e. a file or
- * directory inode, and allocate it at the default allocator position.  In
- * this case @mode is the file mode as given to us by the caller.  We in
- * particular use @mode to distinguish whether a file or a directory is being
- * created (S_ISDIR(mode) and S_ISREG(mode), respectively).
- *
- * If @base_ni is not NULL make the allocated mft record an extent record,
- * allocate it starting at the mft record after the base mft record and attach
- * the allocated and opened ntfs inode to the base inode @base_ni.  In this
- * case @mode must be 0 as it is meaningless for extent inodes.
- *
- * You need to check the return value with IS_ERR().  If false, the function
- * was successful and the return value is the now opened ntfs inode of the
- * allocated mft record.  *@mrec is then set to the allocated, mapped, pinned,
- * and locked mft record.  If IS_ERR() is true, the function failed and the
- * error code is obtained from PTR_ERR(return value).  *@mrec is undefined in
- * this case.
- *
- * Allocation strategy:
- *
- * To find a free mft record, we scan the mft bitmap for a zero bit.  To
- * optimize this we start scanning at the place specified by @base_ni or, if
- * @base_ni is NULL, where we last stopped, and we wrap around when we reach
- * the end (a reduced sketch of this scan follows the function below).  Note,
- * we do not try to allocate mft records below
- * number 24 because numbers 0 to 15 are the defined system files anyway and 16
- * to 24 are special in that they are used for storing extension mft records
- * for the $DATA attribute of $MFT.  This is required to avoid the possibility
- * of creating a runlist with a circular dependency which once written to disk
- * can never be read in again.  Windows will only use records 16 to 24 for
- * normal files if the volume is completely out of space.  We never use them
- * which means that when the volume is really out of space we cannot create any
- * more files while Windows can still create up to 8 small files.  We could
- * start doing this at some later time; it does not matter much for now.
- *
- * When scanning the mft bitmap, we only search up to the last allocated mft
- * record.  If there are no free records left in the range 24 to number of
- * allocated mft records, then we extend the $MFT/$DATA attribute in order to
- * create free mft records.  We extend the allocated size of $MFT/$DATA by 16
- * records at a time or one cluster, if cluster size is above 16kiB.  If there
- * is not sufficient space to do this, we try to extend by a single mft record
- * or one cluster, if cluster size is above the mft record size.
- *
- * No matter how many mft records we allocate, we initialize only the first
- * allocated mft record, incrementing mft data size and initialized size
- * accordingly, open an ntfs_inode for it, and return it to the caller, unless
- * there are fewer than 24 mft records, in which case we allocate and
- * initialize mft records until we reach record 24, which we consider the
- * first free mft record for use by normal files.
- *
- * If during any stage we overflow the initialized data in the mft bitmap, we
- * extend the initialized size (and data size) by 8 bytes, allocating another
- * cluster if required.  The bitmap data size has to be at least equal to the
- * number of mft records in the mft, but it can be bigger, in which case the
- * superfluous bits are padded with zeroes.
- *
- * Thus, when we return successfully (IS_ERR() is false), we will have:
- *     - initialized / extended the mft bitmap if necessary,
- *     - initialized / extended the mft data if necessary,
- *     - set the bit corresponding to the mft record being allocated in the
- *       mft bitmap,
- *     - opened an ntfs_inode for the allocated mft record, and we will have
- *     - returned the ntfs_inode as well as the allocated mapped, pinned, and
- *       locked mft record.
- *
- * On error, the volume will be left in a consistent state and no record will
- * be allocated.  If rolling back a partial operation fails, we may leave some
- * inconsistent metadata, in which case we set the volume error flag via
- * NVolSetErrors() so the volume is left dirty when unmounted.
- *
- * Note, this function cannot make use of most of the normal functions, such
- * as the ones for attribute resizing, because when the run list overflows
- * the base mft record and an attribute list is used, it is very important that
- * the extension mft records used to store the $DATA attribute of $MFT can be
- * reached without having to read the information contained inside them, as
- * this would make it impossible to find them in the first place after the
- * volume is unmounted.  $MFT/$BITMAP probably does not need to follow this
- * rule because the bitmap is not essential for finding the mft records, but on
- * the other hand, handling the bitmap in this special way would make life
- * easier because otherwise there might be circular invocations of functions
- * when reading the bitmap.
- */
-ntfs_inode *ntfs_mft_record_alloc(ntfs_volume *vol, const int mode,
-               ntfs_inode *base_ni, MFT_RECORD **mrec)
-{
-       s64 ll, bit, old_data_initialized, old_data_size;
-       unsigned long flags;
-       struct inode *vi;
-       struct page *page;
-       ntfs_inode *mft_ni, *mftbmp_ni, *ni;
-       ntfs_attr_search_ctx *ctx;
-       MFT_RECORD *m;
-       ATTR_RECORD *a;
-       pgoff_t index;
-       unsigned int ofs;
-       int err;
-       le16 seq_no, usn;
-       bool record_formatted = false;
-
-       if (base_ni) {
-               ntfs_debug("Entering (allocating an extent mft record for "
-                               "base mft record 0x%llx).",
-                               (long long)base_ni->mft_no);
-               /* @mode and @base_ni are mutually exclusive. */
-               BUG_ON(mode);
-       } else
-               ntfs_debug("Entering (allocating a base mft record).");
-       if (mode) {
-               /* @mode and @base_ni are mutually exclusive. */
-               BUG_ON(base_ni);
-               /* We only support creation of normal files and directories. */
-               if (!S_ISREG(mode) && !S_ISDIR(mode))
-                       return ERR_PTR(-EOPNOTSUPP);
-       }
-       BUG_ON(!mrec);
-       mft_ni = NTFS_I(vol->mft_ino);
-       mftbmp_ni = NTFS_I(vol->mftbmp_ino);
-       down_write(&vol->mftbmp_lock);
-       bit = ntfs_mft_bitmap_find_and_alloc_free_rec_nolock(vol, base_ni);
-       if (bit >= 0) {
-               ntfs_debug("Found and allocated free record (#1), bit 0x%llx.",
-                               (long long)bit);
-               goto have_alloc_rec;
-       }
-       if (bit != -ENOSPC) {
-               up_write(&vol->mftbmp_lock);
-               return ERR_PTR(bit);
-       }
-       /*
-        * No free mft records left.  If the mft bitmap already covers more
-        * than the currently used mft records, the next records are all free,
-        * so we can simply allocate the first unused mft record.
-        * Note: We also have to make sure that the mft bitmap at least covers
-        * the first 24 mft records as they are special and whilst they may not
-        * be in use, we do not allocate from them.
-        */
-       read_lock_irqsave(&mft_ni->size_lock, flags);
-       ll = mft_ni->initialized_size >> vol->mft_record_size_bits;
-       read_unlock_irqrestore(&mft_ni->size_lock, flags);
-       read_lock_irqsave(&mftbmp_ni->size_lock, flags);
-       old_data_initialized = mftbmp_ni->initialized_size;
-       read_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
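-       /*
-        * Editor's note: initialized_size is in bytes and every bitmap byte
-        * covers eight mft records, hence the << 3 when comparing against
-        * ll, the number of records covered by the initialized mft data.
-        * The > 3 test makes sure the bitmap covers at least the reserved
-        * records 0-23 before the record at bit ll is handed out.
-        */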
-       if (old_data_initialized << 3 > ll && old_data_initialized > 3) {
-               bit = ll;
-               if (bit < 24)
-                       bit = 24;
-               if (unlikely(bit >= (1ll << 32)))
-                       goto max_err_out;
-               ntfs_debug("Found free record (#2), bit 0x%llx.",
-                               (long long)bit);
-               goto found_free_rec;
-       }
-       /*
-        * The mft bitmap needs to be expanded until it covers the first unused
-        * mft record that we can allocate.
-        * Note: The smallest mft record we allocate is mft record 24.
-        */
-       bit = old_data_initialized << 3;
-       if (unlikely(bit >= (1ll << 32)))
-               goto max_err_out;
-       read_lock_irqsave(&mftbmp_ni->size_lock, flags);
-       old_data_size = mftbmp_ni->allocated_size;
-       ntfs_debug("Status of mftbmp before extension: allocated_size 0x%llx, "
-                       "data_size 0x%llx, initialized_size 0x%llx.",
-                       (long long)old_data_size,
-                       (long long)i_size_read(vol->mftbmp_ino),
-                       (long long)old_data_initialized);
-       read_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
-       if (old_data_initialized + 8 > old_data_size) {
-               /* Need to extend bitmap by one more cluster. */
-               ntfs_debug("mftbmp: initialized_size + 8 > allocated_size.");
-               err = ntfs_mft_bitmap_extend_allocation_nolock(vol);
-               if (unlikely(err)) {
-                       up_write(&vol->mftbmp_lock);
-                       goto err_out;
-               }
-#ifdef DEBUG
-               read_lock_irqsave(&mftbmp_ni->size_lock, flags);
-               ntfs_debug("Status of mftbmp after allocation extension: "
-                               "allocated_size 0x%llx, data_size 0x%llx, "
-                               "initialized_size 0x%llx.",
-                               (long long)mftbmp_ni->allocated_size,
-                               (long long)i_size_read(vol->mftbmp_ino),
-                               (long long)mftbmp_ni->initialized_size);
-               read_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
-#endif /* DEBUG */
-       }
-       /*
-        * We now have sufficient allocated space; extend the initialized_size
-        * as well as the data_size if necessary and fill the new space with
-        * zeroes.
-        */
-       err = ntfs_mft_bitmap_extend_initialized_nolock(vol);
-       if (unlikely(err)) {
-               up_write(&vol->mftbmp_lock);
-               goto err_out;
-       }
-#ifdef DEBUG
-       read_lock_irqsave(&mftbmp_ni->size_lock, flags);
-       ntfs_debug("Status of mftbmp after initialized extension: "
-                       "allocated_size 0x%llx, data_size 0x%llx, "
-                       "initialized_size 0x%llx.",
-                       (long long)mftbmp_ni->allocated_size,
-                       (long long)i_size_read(vol->mftbmp_ino),
-                       (long long)mftbmp_ni->initialized_size);
-       read_unlock_irqrestore(&mftbmp_ni->size_lock, flags);
-#endif /* DEBUG */
-       ntfs_debug("Found free record (#3), bit 0x%llx.", (long long)bit);
-found_free_rec:
-       /* @bit is the found free mft record, allocate it in the mft bitmap. */
-       ntfs_debug("At found_free_rec.");
-       err = ntfs_bitmap_set_bit(vol->mftbmp_ino, bit);
-       if (unlikely(err)) {
-               ntfs_error(vol->sb, "Failed to allocate bit in mft bitmap.");
-               up_write(&vol->mftbmp_lock);
-               goto err_out;
-       }
-       ntfs_debug("Set bit 0x%llx in mft bitmap.", (long long)bit);
-have_alloc_rec:
-       /*
-        * The mft bitmap is now uptodate.  Deal with mft data attribute now.
-        * Note, we keep hold of the mft bitmap lock for writing until all
-        * modifications to the mft data attribute are complete, too, as they
-        * will impact decisions for mft bitmap and mft record allocation done
-        * by a parallel allocation; if the lock were not held, a parallel
-        * allocation could allocate the same mft record as this one.
-        */
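-       /*
-        * Editor's note: ll below is the byte offset just past the end of
-        * the allocated record; if the initialized mft data already reaches
-        * that far, the record has been laid out before and is only
-        * (re-)formatted further below if necessary.
-        */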
-       ll = (bit + 1) << vol->mft_record_size_bits;
-       read_lock_irqsave(&mft_ni->size_lock, flags);
-       old_data_initialized = mft_ni->initialized_size;
-       read_unlock_irqrestore(&mft_ni->size_lock, flags);
-       if (ll <= old_data_initialized) {
-               ntfs_debug("Allocated mft record already initialized.");
-               goto mft_rec_already_initialized;
-       }
-       ntfs_debug("Initializing allocated mft record.");
-       /*
-        * The mft record is outside the initialized data.  Extend the mft data
-        * attribute until it covers the allocated record.  The loop is only
-        * actually traversed more than once when a freshly formatted volume is
-        * first written to so it optimizes away nicely in the common case.
-        */
-       read_lock_irqsave(&mft_ni->size_lock, flags);
-       ntfs_debug("Status of mft data before extension: "
-                       "allocated_size 0x%llx, data_size 0x%llx, "
-                       "initialized_size 0x%llx.",
-                       (long long)mft_ni->allocated_size,
-                       (long long)i_size_read(vol->mft_ino),
-                       (long long)mft_ni->initialized_size);
-       while (ll > mft_ni->allocated_size) {
-               read_unlock_irqrestore(&mft_ni->size_lock, flags);
-               err = ntfs_mft_data_extend_allocation_nolock(vol);
-               if (unlikely(err)) {
-                       ntfs_error(vol->sb, "Failed to extend mft data "
-                                       "allocation.");
-                       goto undo_mftbmp_alloc_nolock;
-               }
-               read_lock_irqsave(&mft_ni->size_lock, flags);
-               ntfs_debug("Status of mft data after allocation extension: "
-                               "allocated_size 0x%llx, data_size 0x%llx, "
-                               "initialized_size 0x%llx.",
-                               (long long)mft_ni->allocated_size,
-                               (long long)i_size_read(vol->mft_ino),
-                               (long long)mft_ni->initialized_size);
-       }
-       read_unlock_irqrestore(&mft_ni->size_lock, flags);
-       /*
-        * Extend mft data initialized size (and data size of course) to reach
-        * the allocated mft record, formatting the mft records along the way.
-        * Note: We only modify the ntfs_inode structure as that is all that is
-        * needed by ntfs_mft_record_format().  We will update the attribute
-        * record itself in one fell swoop later on.
-        */
-       write_lock_irqsave(&mft_ni->size_lock, flags);
-       old_data_initialized = mft_ni->initialized_size;
-       old_data_size = vol->mft_ino->i_size;
-       while (ll > mft_ni->initialized_size) {
-               s64 new_initialized_size, mft_no;
-
-               new_initialized_size = mft_ni->initialized_size +
-                               vol->mft_record_size;
-               mft_no = mft_ni->initialized_size >> vol->mft_record_size_bits;
-               if (new_initialized_size > i_size_read(vol->mft_ino))
-                       i_size_write(vol->mft_ino, new_initialized_size);
-               write_unlock_irqrestore(&mft_ni->size_lock, flags);
-               ntfs_debug("Initializing mft record 0x%llx.",
-                               (long long)mft_no);
-               err = ntfs_mft_record_format(vol, mft_no);
-               if (unlikely(err)) {
-                       ntfs_error(vol->sb, "Failed to format mft record.");
-                       goto undo_data_init;
-               }
-               write_lock_irqsave(&mft_ni->size_lock, flags);
-               mft_ni->initialized_size = new_initialized_size;
-       }
-       write_unlock_irqrestore(&mft_ni->size_lock, flags);
-       record_formatted = true;
-       /* Update the mft data attribute record to reflect the new sizes. */
-       m = map_mft_record(mft_ni);
-       if (IS_ERR(m)) {
-               ntfs_error(vol->sb, "Failed to map mft record.");
-               err = PTR_ERR(m);
-               goto undo_data_init;
-       }
-       ctx = ntfs_attr_get_search_ctx(mft_ni, m);
-       if (unlikely(!ctx)) {
-               ntfs_error(vol->sb, "Failed to get search context.");
-               err = -ENOMEM;
-               unmap_mft_record(mft_ni);
-               goto undo_data_init;
-       }
-       err = ntfs_attr_lookup(mft_ni->type, mft_ni->name, mft_ni->name_len,
-                       CASE_SENSITIVE, 0, NULL, 0, ctx);
-       if (unlikely(err)) {
-               ntfs_error(vol->sb, "Failed to find first attribute extent of "
-                               "mft data attribute.");
-               ntfs_attr_put_search_ctx(ctx);
-               unmap_mft_record(mft_ni);
-               goto undo_data_init;
-       }
-       a = ctx->attr;
-       read_lock_irqsave(&mft_ni->size_lock, flags);
-       a->data.non_resident.initialized_size =
-                       cpu_to_sle64(mft_ni->initialized_size);
-       a->data.non_resident.data_size =
-                       cpu_to_sle64(i_size_read(vol->mft_ino));
-       read_unlock_irqrestore(&mft_ni->size_lock, flags);
-       /* Ensure the changes make it to disk. */
-       flush_dcache_mft_record_page(ctx->ntfs_ino);
-       mark_mft_record_dirty(ctx->ntfs_ino);
-       ntfs_attr_put_search_ctx(ctx);
-       unmap_mft_record(mft_ni);
-       read_lock_irqsave(&mft_ni->size_lock, flags);
-       ntfs_debug("Status of mft data after mft record initialization: "
-                       "allocated_size 0x%llx, data_size 0x%llx, "
-                       "initialized_size 0x%llx.",
-                       (long long)mft_ni->allocated_size,
-                       (long long)i_size_read(vol->mft_ino),
-                       (long long)mft_ni->initialized_size);
-       BUG_ON(i_size_read(vol->mft_ino) > mft_ni->allocated_size);
-       BUG_ON(mft_ni->initialized_size > i_size_read(vol->mft_ino));
-       read_unlock_irqrestore(&mft_ni->size_lock, flags);
-mft_rec_already_initialized:
-       /*
-        * We can finally drop the mft bitmap lock as the mft data attribute
-        * has been fully updated.  The only disparity left is that the
-        * allocated mft record still needs to be marked as in use to match the
-        * set bit in the mft bitmap but this is actually not a problem since
-        * this mft record is not referenced from anywhere yet and the fact
-        * that it is allocated in the mft bitmap means that no-one will try to
-        * allocate it either.
-        */
-       up_write(&vol->mftbmp_lock);
-       /*
-        * We now have allocated and initialized the mft record.  Calculate the
-        * index of and the offset within the page cache page the record is in.
-        */
-       index = bit << vol->mft_record_size_bits >> PAGE_SHIFT;
-       ofs = (bit << vol->mft_record_size_bits) & ~PAGE_MASK;
-       /* Read, map, and pin the page containing the mft record. */
-       page = ntfs_map_page(vol->mft_ino->i_mapping, index);
-       if (IS_ERR(page)) {
-               ntfs_error(vol->sb, "Failed to map page containing allocated "
-                               "mft record 0x%llx.", (long long)bit);
-               err = PTR_ERR(page);
-               goto undo_mftbmp_alloc;
-       }
-       lock_page(page);
-       BUG_ON(!PageUptodate(page));
-       ClearPageUptodate(page);
-       m = (MFT_RECORD*)((u8*)page_address(page) + ofs);
-       /* If we just formatted the mft record no need to do it again. */
-       if (!record_formatted) {
-               /* Sanity check that the mft record is really not in use. */
-               if (ntfs_is_file_record(m->magic) &&
-                               (m->flags & MFT_RECORD_IN_USE)) {
-                       ntfs_error(vol->sb, "Mft record 0x%llx was marked "
-                                       "free in mft bitmap but is marked "
-                                       "used itself.  Corrupt filesystem.  "
-                                       "Unmount and run chkdsk.",
-                                       (long long)bit);
-                       err = -EIO;
-                       SetPageUptodate(page);
-                       unlock_page(page);
-                       ntfs_unmap_page(page);
-                       NVolSetErrors(vol);
-                       goto undo_mftbmp_alloc;
-               }
-               /*
-                * We need to (re-)format the mft record, preserving the
-                * sequence number if it is not zero as well as the update
-                * sequence number if it is not zero or -1 (0xffff).  This
-                * means we do not need to care whether or not something went
-                * wrong with the previous mft record.
-                */
-               seq_no = m->sequence_number;
-               usn = *(le16*)((u8*)m + le16_to_cpu(m->usa_ofs));
-               err = ntfs_mft_record_layout(vol, bit, m);
-               if (unlikely(err)) {
-                       ntfs_error(vol->sb, "Failed to layout allocated mft "
-                                       "record 0x%llx.", (long long)bit);
-                       SetPageUptodate(page);
-                       unlock_page(page);
-                       ntfs_unmap_page(page);
-                       goto undo_mftbmp_alloc;
-               }
-               if (seq_no)
-                       m->sequence_number = seq_no;
-               if (usn && le16_to_cpu(usn) != 0xffff)
-                       *(le16*)((u8*)m + le16_to_cpu(m->usa_ofs)) = usn;
-       }
-       /* Set the mft record itself in use. */
-       m->flags |= MFT_RECORD_IN_USE;
-       if (S_ISDIR(mode))
-               m->flags |= MFT_RECORD_IS_DIRECTORY;
-       flush_dcache_page(page);
-       SetPageUptodate(page);
-       if (base_ni) {
-               MFT_RECORD *m_tmp;
-
-               /*
-                * Setup the base mft record in the extent mft record.  This
-                * completes initialization of the allocated extent mft record
-                * and we can simply use it with map_extent_mft_record().
-                */
-               m->base_mft_record = MK_LE_MREF(base_ni->mft_no,
-                               base_ni->seq_no);
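-               /*
-                * Editor's note: an NTFS file reference packs the mft record
-                * number into the low 48 bits and the sequence number into
-                * the top 16 bits; MK_LE_MREF() builds the little-endian
-                * on-disk form of that pair.
-                */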
-               /*
-                * Allocate an extent inode structure for the new mft record,
-                * attach it to the base inode @base_ni and map, pin, and
-                * lock its mft record, i.e. the allocated one.
-                */
-               m_tmp = map_extent_mft_record(base_ni, bit, &ni);
-               if (IS_ERR(m_tmp)) {
-                       ntfs_error(vol->sb, "Failed to map allocated extent "
-                                       "mft record 0x%llx.", (long long)bit);
-                       err = PTR_ERR(m_tmp);
-                       /* Set the mft record itself not in use. */
-                       m->flags &= cpu_to_le16(
-                                       ~le16_to_cpu(MFT_RECORD_IN_USE));
-                       flush_dcache_page(page);
-                       /* Make sure the mft record is written out to disk. */
-                       mark_ntfs_record_dirty(page, ofs);
-                       unlock_page(page);
-                       ntfs_unmap_page(page);
-                       goto undo_mftbmp_alloc;
-               }
-               BUG_ON(m != m_tmp);
-               /*
-                * Make sure the allocated mft record is written out to disk.
-                * No need to set the inode dirty because the caller is going
-                * to do that anyway after finishing with the new extent mft
-                * record (e.g. at a minimum a new attribute will be added to
-                * the mft record).
-                */
-               mark_ntfs_record_dirty(page, ofs);
-               unlock_page(page);
-               /*
-                * Need to unmap the page since map_extent_mft_record() mapped
-                * it as well so we have it mapped twice at the moment.
-                */
-               ntfs_unmap_page(page);
-       } else {
-               /*
-                * Allocate a new VFS inode and set it up.  NOTE: @vi->i_nlink
-                * is set to 1 but the mft record->link_count is 0.  The caller
-                * needs to bear this in mind.
-                */
-               vi = new_inode(vol->sb);
-               if (unlikely(!vi)) {
-                       err = -ENOMEM;
-                       /* Set the mft record itself not in use. */
-                       m->flags &= cpu_to_le16(
-                                       ~le16_to_cpu(MFT_RECORD_IN_USE));
-                       flush_dcache_page(page);
-                       /* Make sure the mft record is written out to disk. */
-                       mark_ntfs_record_dirty(page, ofs);
-                       unlock_page(page);
-                       ntfs_unmap_page(page);
-                       goto undo_mftbmp_alloc;
-               }
-               vi->i_ino = bit;
-
-               /* The owner and group come from the ntfs volume. */
-               vi->i_uid = vol->uid;
-               vi->i_gid = vol->gid;
-
-               /* Initialize the ntfs specific part of @vi. */
-               ntfs_init_big_inode(vi);
-               ni = NTFS_I(vi);
-               /*
-                * Set the appropriate mode, attribute type, and name.  For
-                * directories, also setup the index values to the defaults.
-                */
-               if (S_ISDIR(mode)) {
-                       vi->i_mode = S_IFDIR | S_IRWXUGO;
-                       vi->i_mode &= ~vol->dmask;
-
-                       NInoSetMstProtected(ni);
-                       ni->type = AT_INDEX_ALLOCATION;
-                       ni->name = I30;
-                       ni->name_len = 4;
-
-                       ni->itype.index.block_size = 4096;
-                       ni->itype.index.block_size_bits = ntfs_ffs(4096) - 1;
-                       ni->itype.index.collation_rule = COLLATION_FILE_NAME;
-                       if (vol->cluster_size <= ni->itype.index.block_size) {
-                               ni->itype.index.vcn_size = vol->cluster_size;
-                               ni->itype.index.vcn_size_bits =
-                                               vol->cluster_size_bits;
-                       } else {
-                               ni->itype.index.vcn_size = vol->sector_size;
-                               ni->itype.index.vcn_size_bits =
-                                               vol->sector_size_bits;
-                       }
-               } else {
-                       vi->i_mode = S_IFREG | S_IRWXUGO;
-                       vi->i_mode &= ~vol->fmask;
-
-                       ni->type = AT_DATA;
-                       ni->name = NULL;
-                       ni->name_len = 0;
-               }
-               if (IS_RDONLY(vi))
-                       vi->i_mode &= ~S_IWUGO;
-
-               /* Set the inode times to the current time. */
-               simple_inode_init_ts(vi);
-               /*
-                * Set the file size to 0; the ntfs inode sizes were already
-                * set to 0 by the call to ntfs_init_big_inode() above.
-                */
-               vi->i_size = 0;
-               vi->i_blocks = 0;
-
-               /* Set the sequence number. */
-               vi->i_generation = ni->seq_no = le16_to_cpu(m->sequence_number);
-               /*
-                * Manually map, pin, and lock the mft record as we already
-                * have its page mapped and it is very easy to do.
-                */
-               atomic_inc(&ni->count);
-               mutex_lock(&ni->mrec_lock);
-               ni->page = page;
-               ni->page_ofs = ofs;
-               /*
-                * Make sure the allocated mft record is written out to disk.
-                * NOTE: We do not set the ntfs inode dirty because this would
-                * fail in ntfs_write_inode() because the inode does not have a
-                * standard information attribute yet.  Also, there is no need
-                * to set the inode dirty because the caller is going to do
-                * that anyway after finishing with the new mft record (e.g. at
-                * a minimum some new attributes will be added to the mft
-                * record).
-                */
-               mark_ntfs_record_dirty(page, ofs);
-               unlock_page(page);
-
-               /* Add the inode to the inode hash for the superblock. */
-               insert_inode_hash(vi);
-
-               /* Update the default mft allocation position. */
-               vol->mft_data_pos = bit + 1;
-       }
-       /*
-        * Return the opened, allocated inode of the allocated mft record as
-        * well as the mapped, pinned, and locked mft record.
-        */
-       ntfs_debug("Returning opened, allocated %sinode 0x%llx.",
-                       base_ni ? "extent " : "", (long long)bit);
-       *mrec = m;
-       return ni;
-undo_data_init:
-       write_lock_irqsave(&mft_ni->size_lock, flags);
-       mft_ni->initialized_size = old_data_initialized;
-       i_size_write(vol->mft_ino, old_data_size);
-       write_unlock_irqrestore(&mft_ni->size_lock, flags);
-       goto undo_mftbmp_alloc_nolock;
-undo_mftbmp_alloc:
-       down_write(&vol->mftbmp_lock);
-undo_mftbmp_alloc_nolock:
-       if (ntfs_bitmap_clear_bit(vol->mftbmp_ino, bit)) {
-               ntfs_error(vol->sb, "Failed to clear bit in mft bitmap.%s", es);
-               NVolSetErrors(vol);
-       }
-       up_write(&vol->mftbmp_lock);
-err_out:
-       return ERR_PTR(err);
-max_err_out:
-       ntfs_warning(vol->sb, "Cannot allocate mft record because the maximum "
-                       "number of inodes (2^32) has already been reached.");
-       up_write(&vol->mftbmp_lock);
-       return ERR_PTR(-ENOSPC);
-}
-
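-/*
- * Editor's illustrative sketch (not from the original file): the wraparound
- * bitmap scan described in the "Allocation strategy" comment above, reduced
- * to a plain in-memory bitmap.  The driver does the equivalent on the
- * $MFT/$BITMAP pages in ntfs_mft_bitmap_find_and_alloc_free_rec_nolock().
- */
-static long example_find_and_alloc_free_rec(unsigned char *bmp, long nr_recs,
-               long start)
-{
-       long bit, pass;
-
-       if (start < 24)
-               start = 24;     /* records 0-23 are never handed out */
-       for (pass = 0; pass < 2; pass++) {
-               for (bit = start; bit < nr_recs; bit++) {
-                       if (!(bmp[bit >> 3] & (1 << (bit & 7)))) {
-                               bmp[bit >> 3] |= 1 << (bit & 7);
-                               return bit;     /* found and allocated */
-                       }
-               }
-               /* Wrap around once, rescanning from the first usable bit. */
-               nr_recs = start;
-               start = 24;
-       }
-       return -1;      /* the driver returns -ENOSPC here */
-}
-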
-/**
- * ntfs_extent_mft_record_free - free an extent mft record on an ntfs volume
- * @ni:                ntfs inode of the mapped extent mft record to free
- * @m:         mapped extent mft record of the ntfs inode @ni
- *
- * Free the mapped extent mft record @m of the extent ntfs inode @ni.
- *
- * Note that this function unmaps the mft record and closes and destroys @ni
- * internally, hence you cannot use either @ni or @m any more after this
- * function returns success.
- *
- * On success return 0 and on error return -errno.  @ni and @m are still valid
- * in this case and have not been freed.
- *
- * For some errors an error message is displayed, the success code 0 is
- * returned, and the volume is then left dirty on umount.  This makes sense
- * when we could not roll back the changes that were already done, since the
- * caller no longer wants to reference this mft record so it does not matter to
- * the caller if something is wrong with it as long as it is properly detached
- * from the base inode.
- */
-int ntfs_extent_mft_record_free(ntfs_inode *ni, MFT_RECORD *m)
-{
-       unsigned long mft_no = ni->mft_no;
-       ntfs_volume *vol = ni->vol;
-       ntfs_inode *base_ni;
-       ntfs_inode **extent_nis;
-       int i, err;
-       le16 old_seq_no;
-       u16 seq_no;
-
-       BUG_ON(NInoAttr(ni));
-       BUG_ON(ni->nr_extents != -1);
-
-       mutex_lock(&ni->extent_lock);
-       base_ni = ni->ext.base_ntfs_ino;
-       mutex_unlock(&ni->extent_lock);
-
-       BUG_ON(base_ni->nr_extents <= 0);
-
-       ntfs_debug("Entering for extent inode 0x%lx, base inode 0x%lx.\n",
-                       mft_no, base_ni->mft_no);
-
-       mutex_lock(&base_ni->extent_lock);
-
-       /* Make sure we are holding the only reference to the extent inode. */
-       if (atomic_read(&ni->count) > 2) {
-               ntfs_error(vol->sb, "Tried to free busy extent inode 0x%lx, "
-                               "not freeing.", mft_no);
-               mutex_unlock(&base_ni->extent_lock);
-               return -EBUSY;
-       }
-
-       /* Dissociate the ntfs inode from the base inode. */
-       extent_nis = base_ni->ext.extent_ntfs_inos;
-       err = -ENOENT;
-       for (i = 0; i < base_ni->nr_extents; i++) {
-               if (ni != extent_nis[i])
-                       continue;
-               extent_nis += i;
-               base_ni->nr_extents--;
-               memmove(extent_nis, extent_nis + 1, (base_ni->nr_extents - i) *
-                               sizeof(ntfs_inode*));
-               err = 0;
-               break;
-       }
-
-       mutex_unlock(&base_ni->extent_lock);
-
-       if (unlikely(err)) {
-               ntfs_error(vol->sb, "Extent inode 0x%lx is not attached to "
-                               "its base inode 0x%lx.", mft_no,
-                               base_ni->mft_no);
-               BUG();
-       }
-
-       /*
-        * The extent inode is no longer attached to the base inode so no one
-        * can get a reference to it any more.
-        */
-
-       /* Mark the mft record as not in use. */
-       m->flags &= ~MFT_RECORD_IN_USE;
-
-       /* Increment the sequence number, skipping zero, if it is not zero. */
-       old_seq_no = m->sequence_number;
-       seq_no = le16_to_cpu(old_seq_no);
-       if (seq_no == 0xffff)
-               seq_no = 1;
-       else if (seq_no)
-               seq_no++;
-       m->sequence_number = cpu_to_le16(seq_no);
-
-       /*
-        * Set the ntfs inode dirty and write it out.  We do not need to worry
-        * about the base inode here since whatever caused the extent mft
-        * record to be freed is guaranteed to do it already.
-        */
-       NInoSetDirty(ni);
-       err = write_mft_record(ni, m, 0);
-       if (unlikely(err)) {
-               ntfs_error(vol->sb, "Failed to write mft record 0x%lx, not "
-                               "freeing.", mft_no);
-               goto rollback;
-       }
-rollback_error:
-       /* Unmap and throw away the now freed extent inode. */
-       unmap_extent_mft_record(ni);
-       ntfs_clear_extent_inode(ni);
-
-       /* Clear the bit in the $MFT/$BITMAP corresponding to this record. */
-       down_write(&vol->mftbmp_lock);
-       err = ntfs_bitmap_clear_bit(vol->mftbmp_ino, mft_no);
-       up_write(&vol->mftbmp_lock);
-       if (unlikely(err)) {
-               /*
-                * The extent inode is gone but we failed to deallocate it in
-                * the mft bitmap.  Just emit a warning and leave the volume
-                * dirty on umount.
-                */
-               ntfs_error(vol->sb, "Failed to clear bit in mft bitmap.%s", es);
-               NVolSetErrors(vol);
-       }
-       return 0;
-rollback:
-       /* Rollback what we did... */
-       mutex_lock(&base_ni->extent_lock);
-       extent_nis = base_ni->ext.extent_ntfs_inos;
-       if (!(base_ni->nr_extents & 3)) {
-               int new_size = (base_ni->nr_extents + 4) * sizeof(ntfs_inode*);
-
-               extent_nis = kmalloc(new_size, GFP_NOFS);
-               if (unlikely(!extent_nis)) {
-                       ntfs_error(vol->sb, "Failed to allocate internal "
-                                       "buffer during rollback.%s", es);
-                       mutex_unlock(&base_ni->extent_lock);
-                       NVolSetErrors(vol);
-                       goto rollback_error;
-               }
-               if (base_ni->nr_extents) {
-                       BUG_ON(!base_ni->ext.extent_ntfs_inos);
-                       memcpy(extent_nis, base_ni->ext.extent_ntfs_inos,
-                                       new_size - 4 * sizeof(ntfs_inode*));
-                       kfree(base_ni->ext.extent_ntfs_inos);
-               }
-               base_ni->ext.extent_ntfs_inos = extent_nis;
-       }
-       m->flags |= MFT_RECORD_IN_USE;
-       m->sequence_number = old_seq_no;
-       extent_nis[base_ni->nr_extents++] = ni;
-       mutex_unlock(&base_ni->extent_lock);
-       mark_mft_record_dirty(ni);
-       return err;
-}
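-
-/*
- * Editor's illustrative sketch (not from the original file): the sequence
- * number bump performed above when the record is freed.  Zero (no sequence
- * number tracking) stays zero and 0xffff wraps straight to 1, so an
- * increment never produces zero.
- */
-static u16 example_bump_seq_no(u16 seq_no)
-{
-       if (seq_no == 0xffff)
-               return 1;
-       return seq_no ? seq_no + 1 : 0;
-}
-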
-#endif /* NTFS_RW */
diff --git a/fs/ntfs/mft.h b/fs/ntfs/mft.h
deleted file mode 100644 (file)
index 49c001a..0000000
+++ /dev/null
@@ -1,110 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * mft.h - Defines for mft record handling in NTFS Linux kernel driver.
- *        Part of the Linux-NTFS project.
- *
- * Copyright (c) 2001-2004 Anton Altaparmakov
- */
-
-#ifndef _LINUX_NTFS_MFT_H
-#define _LINUX_NTFS_MFT_H
-
-#include <linux/fs.h>
-#include <linux/highmem.h>
-#include <linux/pagemap.h>
-
-#include "inode.h"
-
-extern MFT_RECORD *map_mft_record(ntfs_inode *ni);
-extern void unmap_mft_record(ntfs_inode *ni);
-
-extern MFT_RECORD *map_extent_mft_record(ntfs_inode *base_ni, MFT_REF mref,
-               ntfs_inode **ntfs_ino);
-
-static inline void unmap_extent_mft_record(ntfs_inode *ni)
-{
-       unmap_mft_record(ni);
-       return;
-}
-
-#ifdef NTFS_RW
-
-/**
- * flush_dcache_mft_record_page - flush_dcache_page() for mft records
- * @ni:                ntfs inode structure of mft record
- *
- * Call flush_dcache_page() for the page in which an mft record resides.
- *
- * This must be called every time an mft record is modified, just after the
- * modification.
- */
-static inline void flush_dcache_mft_record_page(ntfs_inode *ni)
-{
-       flush_dcache_page(ni->page);
-}
-
-extern void __mark_mft_record_dirty(ntfs_inode *ni);
-
-/**
- * mark_mft_record_dirty - set the mft record and the page containing it dirty
- * @ni:                ntfs inode describing the mapped mft record
- *
- * Set the mapped (extent) mft record of the (base or extent) ntfs inode @ni,
- * as well as the page containing the mft record, dirty.  Also, mark the base
- * vfs inode dirty.  This ensures that any changes to the mft record are
- * written out to disk.
- *
- * NOTE:  Do not do anything if the mft record is already marked dirty.
- */
-static inline void mark_mft_record_dirty(ntfs_inode *ni)
-{
-       if (!NInoTestSetDirty(ni))
-               __mark_mft_record_dirty(ni);
-}
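-
-/*
- * Editor's note: NInoTestSetDirty() is an atomic test-and-set on the ntfs
- * inode state bits, so redundant dirtying is cheap; only the clean-to-dirty
- * transition takes the slow path through __mark_mft_record_dirty().
- */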
-
-extern int ntfs_sync_mft_mirror(ntfs_volume *vol, const unsigned long mft_no,
-               MFT_RECORD *m, int sync);
-
-extern int write_mft_record_nolock(ntfs_inode *ni, MFT_RECORD *m, int sync);
-
-/**
- * write_mft_record - write out a mapped (extent) mft record
- * @ni:                ntfs inode describing the mapped (extent) mft record
- * @m:         mapped (extent) mft record to write
- * @sync:      if true, wait for i/o completion
- *
- * This is just a wrapper for write_mft_record_nolock() (see mft.c), which
- * locks the page for the duration of the write.  This ensures that there are
- * no race conditions between writing the mft record via the dirty inode code
- * paths and via the page cache write back code paths or between writing
- * neighbouring mft records residing in the same page.
- *
- * Locking the page also serializes us against ->read_folio() if the page is not
- * uptodate.
- *
- * On success, clean the mft record and return 0.  On error, leave the mft
- * record dirty and return -errno.
- */
-static inline int write_mft_record(ntfs_inode *ni, MFT_RECORD *m, int sync)
-{
-       struct page *page = ni->page;
-       int err;
-
-       BUG_ON(!page);
-       lock_page(page);
-       err = write_mft_record_nolock(ni, m, sync);
-       unlock_page(page);
-       return err;
-}
-
-extern bool ntfs_may_write_mft_record(ntfs_volume *vol,
-               const unsigned long mft_no, const MFT_RECORD *m,
-               ntfs_inode **locked_ni);
-
-extern ntfs_inode *ntfs_mft_record_alloc(ntfs_volume *vol, const int mode,
-               ntfs_inode *base_ni, MFT_RECORD **mrec);
-extern int ntfs_extent_mft_record_free(ntfs_inode *ni, MFT_RECORD *m);
-
-#endif /* NTFS_RW */
-
-#endif /* _LINUX_NTFS_MFT_H */
diff --git a/fs/ntfs/mst.c b/fs/ntfs/mst.c
deleted file mode 100644 (file)
index 16b3c88..0000000
+++ /dev/null
@@ -1,189 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * mst.c - NTFS multi sector transfer protection handling code. Part of the
- *        Linux-NTFS project.
- *
- * Copyright (c) 2001-2004 Anton Altaparmakov
- */
-
-#include "ntfs.h"
-
-/**
- * post_read_mst_fixup - deprotect multi sector transfer protected data
- * @b:         pointer to the data to deprotect
- * @size:      size in bytes of @b
- *
- * Perform the necessary post read multi sector transfer fixup and detect the
- * presence of incomplete multi sector transfers. - In that case, overwrite the
- * magic of the ntfs record header being processed with "BAAD" (in memory only!)
- * and abort processing.
- *
- * Return 0 on success and -EINVAL on error ("BAAD" magic will be present).
- *
- * NOTE: We consider the absence / invalidity of an update sequence array to
- * mean that the structure is not protected at all and hence doesn't need to
- * be fixed up. Thus, we return success and not failure in this case. This is
- * in contrast to pre_write_mst_fixup(), see below.
- */
-int post_read_mst_fixup(NTFS_RECORD *b, const u32 size)
-{
-       u16 usa_ofs, usa_count, usn;
-       u16 *usa_pos, *data_pos;
-
-       /* Setup the variables. */
-       usa_ofs = le16_to_cpu(b->usa_ofs);
-       /* Decrement usa_count to get number of fixups. */
-       usa_count = le16_to_cpu(b->usa_count) - 1;
-       /* Size and alignment checks. */
-       if ( size & (NTFS_BLOCK_SIZE - 1)       ||
-            usa_ofs & 1                        ||
-            usa_ofs + (usa_count * 2) > size   ||
-            (size >> NTFS_BLOCK_SIZE_BITS) != usa_count)
-               return 0;
-       /* Position of usn in update sequence array. */
-       usa_pos = (u16*)b + usa_ofs/sizeof(u16);
-       /*
-        * The update sequence number which has to be equal to each of the
-        * u16 values before they are fixed up. Note no need to care for
-        * endianness since we are comparing and moving data for on disk
-        * structures which means the data is consistent. - If it is
-        * consistently the wrong endianness, it doesn't make any difference.
-        */
-       usn = *usa_pos;
-       /*
-        * Position in protected data of first u16 that needs fixing up.
-        */
-       data_pos = (u16*)b + NTFS_BLOCK_SIZE/sizeof(u16) - 1;
-       /*
-        * Check for incomplete multi sector transfer(s).
-        */
-       while (usa_count--) {
-               if (*data_pos != usn) {
-                       /*
-                        * Incomplete multi sector transfer detected! )-:
-                        * Set the magic to "BAAD" and return failure.
-                        * Note that magic_BAAD is already converted to le32.
-                        */
-                       b->magic = magic_BAAD;
-                       return -EINVAL;
-               }
-               data_pos += NTFS_BLOCK_SIZE/sizeof(u16);
-       }
-       /* Re-setup the variables. */
-       usa_count = le16_to_cpu(b->usa_count) - 1;
-       data_pos = (u16*)b + NTFS_BLOCK_SIZE/sizeof(u16) - 1;
-       /* Fixup all sectors. */
-       while (usa_count--) {
-               /*
-                * Increment position in usa and restore original data from
-                * the usa into the data buffer.
-                */
-               *data_pos = *(++usa_pos);
-               /* Increment position in data as well. */
-               data_pos += NTFS_BLOCK_SIZE/sizeof(u16);
-       }
-       return 0;
-}
-
-/**
- * pre_write_mst_fixup - apply multi sector transfer protection
- * @b:         pointer to the data to protect
- * @size:      size in bytes of @b
- *
- * Perform the necessary pre write multi sector transfer fixup on the data
- * pointed to by @b of size @size.
- *
- * Return 0 if fixup applied (success) or -EINVAL if no fixup was performed
- * (assumed not needed). This is in contrast to post_read_mst_fixup() above.
- *
- * NOTE: We consider the absence / invalidity of an update sequence array to
- * mean that the structure is not subject to protection and hence doesn't need
- * to be fixed up. This means that you have to create a valid update sequence
- * array header in the ntfs record before calling this function, otherwise it
- * will fail (the header needs to contain the position of the update sequence
- * array together with the number of elements in the array). You also need to
- * initialise the update sequence number before calling this function
- * otherwise a random word will be used (whatever was in the record at that
- * position at that time).
- */
-int pre_write_mst_fixup(NTFS_RECORD *b, const u32 size)
-{
-       le16 *usa_pos, *data_pos;
-       u16 usa_ofs, usa_count, usn;
-       le16 le_usn;
-
-       /* Sanity check + only fixup if it makes sense. */
-       if (!b || ntfs_is_baad_record(b->magic) ||
-                       ntfs_is_hole_record(b->magic))
-               return -EINVAL;
-       /* Setup the variables. */
-       usa_ofs = le16_to_cpu(b->usa_ofs);
-       /* Decrement usa_count to get number of fixups. */
-       usa_count = le16_to_cpu(b->usa_count) - 1;
-       /* Size and alignment checks. */
-       if ( size & (NTFS_BLOCK_SIZE - 1)       ||
-            usa_ofs & 1                        ||
-            usa_ofs + (usa_count * 2) > size   ||
-            (size >> NTFS_BLOCK_SIZE_BITS) != usa_count)
-               return -EINVAL;
-       /* Position of usn in update sequence array. */
-       usa_pos = (le16*)((u8*)b + usa_ofs);
-       /*
-        * Cyclically increment the update sequence number
-        * (skipping 0 and -1, i.e. 0xffff).
-        */
-       usn = le16_to_cpup(usa_pos) + 1;
-       if (usn == 0xffff || !usn)
-               usn = 1;
-       le_usn = cpu_to_le16(usn);
-       *usa_pos = le_usn;
-       /* Position in data of first u16 that needs fixing up. */
-       data_pos = (le16*)b + NTFS_BLOCK_SIZE/sizeof(le16) - 1;
-       /* Fixup all sectors. */
-       while (usa_count--) {
-               /*
-                * Increment the position in the usa and save the
-                * original data from the data buffer into the usa.
-                */
-               *(++usa_pos) = *data_pos;
-               /* Apply fixup to data. */
-               *data_pos = le_usn;
-               /* Increment position in data as well. */
-               data_pos += NTFS_BLOCK_SIZE/sizeof(le16);
-       }
-       return 0;
-}
-
-/**
- * post_write_mst_fixup - fast deprotect multi sector transfer protected data
- * @b:         pointer to the data to deprotect
- *
- * Perform the necessary post write multi sector transfer fixup, not checking
- * for any errors, because we assume we have just used pre_write_mst_fixup(),
- * thus the data will be fine or we would never have gotten here.
- */
-void post_write_mst_fixup(NTFS_RECORD *b)
-{
-       le16 *usa_pos, *data_pos;
-
-       u16 usa_ofs = le16_to_cpu(b->usa_ofs);
-       u16 usa_count = le16_to_cpu(b->usa_count) - 1;
-
-       /* Position of usn in update sequence array. */
-       usa_pos = (le16*)b + usa_ofs/sizeof(le16);
-
-       /* Position in protected data of first u16 that needs fixing up. */
-       data_pos = (le16*)b + NTFS_BLOCK_SIZE/sizeof(le16) - 1;
-
-       /* Fixup all sectors. */
-       while (usa_count--) {
-               /*
-                * Increment position in usa and restore original data from
-                * the usa into the data buffer.
-                */
-               *data_pos = *(++usa_pos);
-
-               /* Increment position in data as well. */
-               data_pos += NTFS_BLOCK_SIZE/sizeof(le16);
-       }
-}
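-
-/*
- * Editor's illustrative sketch (not from the original file): the whole
- * protection scheme above as a stand-alone round trip over a single record,
- * ignoring endianness for brevity.  The last u16 of every 512-byte block is
- * swapped with a slot in the update sequence array (usa) before a write and
- * restored after a read; a torn write leaves a stale usn in one of the
- * blocks, which the read-side check catches.
- */
-static int example_mst_round_trip(u16 *rec, unsigned int usa_ofs_words,
-               unsigned int usa_count)
-{
-       u16 *usa = rec + usa_ofs_words;
-       u16 usn = usa[0] + 1;
-       unsigned int i, last;
-
-       if (usn == 0xffff || !usn)      /* cycle the usn, skipping 0/0xffff */
-               usn = 1;
-       usa[0] = usn;
-       /* Pre-write: stash each block's last u16 in the usa, stamp the usn. */
-       for (i = 1; i < usa_count; i++) {
-               last = i * (NTFS_BLOCK_SIZE / 2) - 1;
-               usa[i] = rec[last];
-               rec[last] = usn;
-       }
-       /* ...the record would be written out and read back here... */
-       /* Post-read: every block must still carry the usn... */
-       for (i = 1; i < usa_count; i++)
-               if (rec[i * (NTFS_BLOCK_SIZE / 2) - 1] != usn)
-                       return -EINVAL; /* incomplete multi sector transfer */
-       /* ...then the original last words are restored from the usa. */
-       for (i = 1; i < usa_count; i++)
-               rec[i * (NTFS_BLOCK_SIZE / 2) - 1] = usa[i];
-       return 0;
-}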
diff --git a/fs/ntfs/namei.c b/fs/ntfs/namei.c
deleted file mode 100644 (file)
index d7498dd..0000000
+++ /dev/null
@@ -1,392 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * namei.c - NTFS kernel directory inode operations. Part of the Linux-NTFS
- *          project.
- *
- * Copyright (c) 2001-2006 Anton Altaparmakov
- */
-
-#include <linux/dcache.h>
-#include <linux/exportfs.h>
-#include <linux/security.h>
-#include <linux/slab.h>
-
-#include "attrib.h"
-#include "debug.h"
-#include "dir.h"
-#include "mft.h"
-#include "ntfs.h"
-
-/**
- * ntfs_lookup - find the inode represented by a dentry in a directory inode
- * @dir_ino:   directory inode in which to look for the inode
- * @dent:      dentry representing the inode to look for
- * @flags:     lookup flags
- *
- * In short, ntfs_lookup() looks for the inode represented by the dentry @dent
- * in the directory inode @dir_ino and if found attaches the inode to the
- * dentry @dent.
- *
- * In more detail, the dentry @dent specifies which inode to look for by
- * supplying the name of the inode in @dent->d_name.name. ntfs_lookup()
- * converts the name to Unicode and walks the contents of the directory inode
- * @dir_ino looking for the converted Unicode name. If the name is found in the
- * directory, the corresponding inode is loaded by calling ntfs_iget() on its
- * inode number and the inode is associated with the dentry @dent via a call to
- * d_splice_alias().
- *
- * If the name is not found in the directory, a NULL inode is inserted into the
- * dentry @dent via a call to d_add(). The dentry is then termed a negative
- * dentry.
- *
- * Only if an actual error occurs do we return an error via ERR_PTR().
- *
- * In order to handle the case insensitivity issues of NTFS with regards to the
- * dcache and the dcache requiring only one dentry per directory, we deal with
- * dentry aliases that only differ in case in ->ntfs_lookup() while maintaining
- * a case sensitive dcache. This means that we get the full benefit of dcache
- * speed when the file/directory is looked up with the same case as returned by
- * ->ntfs_readdir() but that a lookup for any other case (or for the short file
- * name) will not find anything in dcache and will enter ->ntfs_lookup()
- * instead, where we search the directory for a fully matching file name
- * (including case) and if that is not found, we search for a file name that
- * matches with different case and if that has non-POSIX semantics we return
- * that. We actually do only one search (case sensitive) and keep tabs on
- * whether we have found a case insensitive match in the process.
- *
- * To simplify matters for us, we do not treat the short vs long filenames as
- * two hard links; instead, if the lookup matches a short filename, we
- * return the dentry for the corresponding long filename.
- *
- * There are three cases we need to distinguish here:
- *
- * 1) @dent perfectly matches (i.e. including case) a directory entry with a
- *    file name in the WIN32 or POSIX namespaces. In this case
- *    ntfs_lookup_inode_by_name() will return with name set to NULL and we
- *    just d_splice_alias() @dent.
- * 2) @dent matches (not including case) a directory entry with a file name in
- *    the WIN32 namespace. In this case ntfs_lookup_inode_by_name() will return
- *    with name set to point to a kmalloc()ed ntfs_name structure containing
- *    the properly cased little endian Unicode name. We convert the name to the
- *    current NLS code page, search if a dentry with this name already exists
- *    and if so return that instead of @dent.  At this point things are
- *    complicated by the possibility of 'disconnected' dentries due to NFS
- *    which we deal with appropriately (see the code comments).  The VFS will
- *    then destroy the old @dent and use the one we returned.  If a dentry is
- *    not found, we allocate a new one, d_splice_alias() it, and return it as
- *    above.
- * 3) @dent matches either perfectly or not (i.e. we don't care about case) a
- *    directory entry with a file name in the DOS namespace. In this case
- *    ntfs_lookup_inode_by_name() will return with name set to point to a
- *    kmalloc()ed ntfs_name structure containing the mft reference (cpu endian)
- *    of the inode. We use the mft reference to read the inode and to find the
- *    file name in the WIN32 namespace corresponding to the matched short file
- *    name. We then convert the name to the current NLS code page, and proceed
- *    searching for a dentry with this name, etc, as in case 2), above.
- *
- * Locking: Caller must hold i_mutex on the directory.
- */
-static struct dentry *ntfs_lookup(struct inode *dir_ino, struct dentry *dent,
-               unsigned int flags)
-{
-       ntfs_volume *vol = NTFS_SB(dir_ino->i_sb);
-       struct inode *dent_inode;
-       ntfschar *uname;
-       ntfs_name *name = NULL;
-       MFT_REF mref;
-       unsigned long dent_ino;
-       int uname_len;
-
-       ntfs_debug("Looking up %pd in directory inode 0x%lx.",
-                       dent, dir_ino->i_ino);
-       /* Convert the name of the dentry to Unicode. */
-       uname_len = ntfs_nlstoucs(vol, dent->d_name.name, dent->d_name.len,
-                       &uname);
-       if (uname_len < 0) {
-               if (uname_len != -ENAMETOOLONG)
-                       ntfs_error(vol->sb, "Failed to convert name to "
-                                       "Unicode.");
-               return ERR_PTR(uname_len);
-       }
-       mref = ntfs_lookup_inode_by_name(NTFS_I(dir_ino), uname, uname_len,
-                       &name);
-       kmem_cache_free(ntfs_name_cache, uname);
-       if (!IS_ERR_MREF(mref)) {
-               dent_ino = MREF(mref);
-               ntfs_debug("Found inode 0x%lx. Calling ntfs_iget.", dent_ino);
-               dent_inode = ntfs_iget(vol->sb, dent_ino);
-               if (!IS_ERR(dent_inode)) {
-                       /* Consistency check. */
-                       if (is_bad_inode(dent_inode) || MSEQNO(mref) ==
-                                       NTFS_I(dent_inode)->seq_no ||
-                                       dent_ino == FILE_MFT) {
-                               /* Perfect WIN32/POSIX match. -- Case 1. */
-                               if (!name) {
-                                       ntfs_debug("Done.  (Case 1.)");
-                                       return d_splice_alias(dent_inode, dent);
-                               }
-                               /*
-                                * We are too indented.  Handle imperfect
-                                * matches and short file names further below.
-                                */
-                               goto handle_name;
-                       }
-                       ntfs_error(vol->sb, "Found stale reference to inode "
-                                       "0x%lx (reference sequence number = "
-                                       "0x%x, inode sequence number = 0x%x), "
-                                       "returning -EIO. Run chkdsk.",
-                                       dent_ino, MSEQNO(mref),
-                                       NTFS_I(dent_inode)->seq_no);
-                       iput(dent_inode);
-                       dent_inode = ERR_PTR(-EIO);
-               } else
-                       ntfs_error(vol->sb, "ntfs_iget(0x%lx) failed with "
-                                       "error code %li.", dent_ino,
-                                       PTR_ERR(dent_inode));
-               kfree(name);
-               /* Return the error code. */
-               return ERR_CAST(dent_inode);
-       }
-       /* It is guaranteed that @name is no longer allocated at this point. */
-       if (MREF_ERR(mref) == -ENOENT) {
-               ntfs_debug("Entry was not found, adding negative dentry.");
-               /* The dcache will handle negative entries. */
-               d_add(dent, NULL);
-               ntfs_debug("Done.");
-               return NULL;
-       }
-       ntfs_error(vol->sb, "ntfs_lookup_ino_by_name() failed with error "
-                       "code %i.", -MREF_ERR(mref));
-       return ERR_PTR(MREF_ERR(mref));
-       // TODO: Consider moving this lot to a separate function! (AIA)
-handle_name:
-   {
-       MFT_RECORD *m;
-       ntfs_attr_search_ctx *ctx;
-       ntfs_inode *ni = NTFS_I(dent_inode);
-       int err;
-       struct qstr nls_name;
-
-       nls_name.name = NULL;
-       if (name->type != FILE_NAME_DOS) {                      /* Case 2. */
-               ntfs_debug("Case 2.");
-               nls_name.len = (unsigned)ntfs_ucstonls(vol,
-                               (ntfschar*)&name->name, name->len,
-                               (unsigned char**)&nls_name.name, 0);
-               kfree(name);
-       } else /* if (name->type == FILE_NAME_DOS) */ {         /* Case 3. */
-               FILE_NAME_ATTR *fn;
-
-               ntfs_debug("Case 3.");
-               kfree(name);
-
-               /* Find the WIN32 name corresponding to the matched DOS name. */
-               ni = NTFS_I(dent_inode);
-               m = map_mft_record(ni);
-               if (IS_ERR(m)) {
-                       err = PTR_ERR(m);
-                       m = NULL;
-                       ctx = NULL;
-                       goto err_out;
-               }
-               ctx = ntfs_attr_get_search_ctx(ni, m);
-               if (unlikely(!ctx)) {
-                       err = -ENOMEM;
-                       goto err_out;
-               }
-               do {
-                       ATTR_RECORD *a;
-                       u32 val_len;
-
-                       err = ntfs_attr_lookup(AT_FILE_NAME, NULL, 0, 0, 0,
-                                       NULL, 0, ctx);
-                       if (unlikely(err)) {
-                               ntfs_error(vol->sb, "Inode corrupt: No WIN32 "
-                                               "namespace counterpart to DOS "
-                                               "file name. Run chkdsk.");
-                               if (err == -ENOENT)
-                                       err = -EIO;
-                               goto err_out;
-                       }
-                       /* Consistency checks. */
-                       a = ctx->attr;
-                       if (a->non_resident || a->flags)
-                               goto eio_err_out;
-                       val_len = le32_to_cpu(a->data.resident.value_length);
-                       if (le16_to_cpu(a->data.resident.value_offset) +
-                                       val_len > le32_to_cpu(a->length))
-                               goto eio_err_out;
-                       fn = (FILE_NAME_ATTR*)((u8*)ctx->attr + le16_to_cpu(
-                                       ctx->attr->data.resident.value_offset));
-                       if ((u32)(fn->file_name_length * sizeof(ntfschar) +
-                                       sizeof(FILE_NAME_ATTR)) > val_len)
-                               goto eio_err_out;
-               } while (fn->file_name_type != FILE_NAME_WIN32);
-
-               /* Convert the found WIN32 name to current NLS code page. */
-               nls_name.len = (unsigned)ntfs_ucstonls(vol,
-                               (ntfschar*)&fn->file_name, fn->file_name_length,
-                               (unsigned char**)&nls_name.name, 0);
-
-               ntfs_attr_put_search_ctx(ctx);
-               unmap_mft_record(ni);
-       }
-       m = NULL;
-       ctx = NULL;
-
-       /* Check if a conversion error occurred. */
-       if ((signed)nls_name.len < 0) {
-               err = (signed)nls_name.len;
-               goto err_out;
-       }
-       nls_name.hash = full_name_hash(dent, nls_name.name, nls_name.len);
-
-       dent = d_add_ci(dent, dent_inode, &nls_name);
-       kfree(nls_name.name);
-       return dent;
-
-eio_err_out:
-       ntfs_error(vol->sb, "Illegal file name attribute. Run chkdsk.");
-       err = -EIO;
-err_out:
-       if (ctx)
-               ntfs_attr_put_search_ctx(ctx);
-       if (m)
-               unmap_mft_record(ni);
-       iput(dent_inode);
-       ntfs_error(vol->sb, "Failed, returning error code %i.", err);
-       return ERR_PTR(err);
-   }
-}
-
-/*
- * Inode operations for directories.
- */
-const struct inode_operations ntfs_dir_inode_ops = {
-       .lookup = ntfs_lookup,  /* VFS: Lookup directory. */
-};
-
-/**
- * ntfs_get_parent - find the dentry of the parent of a given directory dentry
- * @child_dent:                dentry of the directory whose parent directory to find
- *
- * Find the dentry for the parent directory of the directory specified by the
- * dentry @child_dent.  This function is called from
- * fs/exportfs/expfs.c::find_exported_dentry() which in turn is called from the
- * default ->decode_fh() which is export_decode_fh() in the same file.
- *
- * The code is based on the ext3 ->get_parent() implementation found in
- * fs/ext3/namei.c::ext3_get_parent().
- *
- * Note: ntfs_get_parent() is called with @d_inode(child_dent)->i_mutex down.
- *
- * Return the dentry of the parent directory on success or the error code on
- * error (IS_ERR() is true).
- */
-static struct dentry *ntfs_get_parent(struct dentry *child_dent)
-{
-       struct inode *vi = d_inode(child_dent);
-       ntfs_inode *ni = NTFS_I(vi);
-       MFT_RECORD *mrec;
-       ntfs_attr_search_ctx *ctx;
-       ATTR_RECORD *attr;
-       FILE_NAME_ATTR *fn;
-       unsigned long parent_ino;
-       int err;
-
-       ntfs_debug("Entering for inode 0x%lx.", vi->i_ino);
-       /* Get the mft record of the inode belonging to the child dentry. */
-       mrec = map_mft_record(ni);
-       if (IS_ERR(mrec))
-               return ERR_CAST(mrec);
-       /* Find the first file name attribute in the mft record. */
-       ctx = ntfs_attr_get_search_ctx(ni, mrec);
-       if (unlikely(!ctx)) {
-               unmap_mft_record(ni);
-               return ERR_PTR(-ENOMEM);
-       }
-try_next:
-       err = ntfs_attr_lookup(AT_FILE_NAME, NULL, 0, CASE_SENSITIVE, 0, NULL,
-                       0, ctx);
-       if (unlikely(err)) {
-               ntfs_attr_put_search_ctx(ctx);
-               unmap_mft_record(ni);
-               if (err == -ENOENT)
-                       ntfs_error(vi->i_sb, "Inode 0x%lx does not have a "
-                                       "file name attribute.  Run chkdsk.",
-                                       vi->i_ino);
-               return ERR_PTR(err);
-       }
-       attr = ctx->attr;
-       if (unlikely(attr->non_resident))
-               goto try_next;
-       fn = (FILE_NAME_ATTR *)((u8 *)attr +
-                       le16_to_cpu(attr->data.resident.value_offset));
-       if (unlikely((u8 *)fn + le32_to_cpu(attr->data.resident.value_length) >
-                       (u8*)attr + le32_to_cpu(attr->length)))
-               goto try_next;
-       /* Get the inode number of the parent directory. */
-       parent_ino = MREF_LE(fn->parent_directory);
-       /* Release the search context and the mft record of the child. */
-       ntfs_attr_put_search_ctx(ctx);
-       unmap_mft_record(ni);
-
-       return d_obtain_alias(ntfs_iget(vi->i_sb, parent_ino));
-}
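
The parent_directory field read above is a packed NTFS mft reference: the low
48 bits hold the mft record number and the top 16 bits the sequence number,
which the driver's MREF()/MSEQNO() (and their little-endian variants) unpack.
A minimal standalone sketch of that unpacking, using editorial stand-ins for
those macros:

#include <stdint.h>
#include <stdio.h>

/* Editorial stand-ins mirroring the driver's MREF()/MSEQNO(). */
#define MREF(x)   ((uint64_t)(x) & 0x0000ffffffffffffULL)
#define MSEQNO(x) ((uint16_t)(((uint64_t)(x) >> 48) & 0xffff))

int main(void)
{
        /* Hypothetical reference: sequence number 7, mft record 12345. */
        uint64_t mref = ((uint64_t)7 << 48) | 12345;

        printf("record = %llu, seq = %u\n",
               (unsigned long long)MREF(mref), MSEQNO(mref));
        return 0;
}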
-
-static struct inode *ntfs_nfs_get_inode(struct super_block *sb,
-               u64 ino, u32 generation)
-{
-       struct inode *inode;
-
-       inode = ntfs_iget(sb, ino);
-       if (!IS_ERR(inode)) {
-               if (is_bad_inode(inode) || inode->i_generation != generation) {
-                       iput(inode);
-                       inode = ERR_PTR(-ESTALE);
-               }
-       }
-
-       return inode;
-}
-
-static struct dentry *ntfs_fh_to_dentry(struct super_block *sb, struct fid *fid,
-               int fh_len, int fh_type)
-{
-       return generic_fh_to_dentry(sb, fid, fh_len, fh_type,
-                                   ntfs_nfs_get_inode);
-}
-
-static struct dentry *ntfs_fh_to_parent(struct super_block *sb, struct fid *fid,
-               int fh_len, int fh_type)
-{
-       return generic_fh_to_parent(sb, fid, fh_len, fh_type,
-                                   ntfs_nfs_get_inode);
-}
-
-/*
- * Export operations allowing NFS exporting of mounted NTFS partitions.
- *
- * We use the default ->encode_fh() for now.  Note that it only stores 32 bits
- * of the inode number, which is an unsigned long and hence usually 64 bits on
- * 64-bit architectures, so this would all fail horribly on huge volumes.  At
- * some point we need to define our own encode and decode fh functions that
- * store 64-bit inode numbers, but for now we will ignore the problem...
- *
- * We also use the default ->get_name() helper (used by ->decode_fh() via
- * fs/exportfs/expfs.c::find_exported_dentry()) as that is completely fs
- * independent.
- *
- * The default ->get_parent() just returns -EACCES so we have to provide our
- * own and the default ->get_dentry() is incompatible with NTFS due to not
- * allowing the inode number 0 which is used in NTFS for the system file $MFT
- * and due to using iget() whereas NTFS needs ntfs_iget().
- */
-const struct export_operations ntfs_export_ops = {
-       .encode_fh      = generic_encode_ino32_fh,
-       .get_parent     = ntfs_get_parent,      /* Find the parent of a given
-                                                  directory. */
-       .fh_to_dentry   = ntfs_fh_to_dentry,
-       .fh_to_parent   = ntfs_fh_to_parent,
-};
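
The 32-bit truncation concern described above is easy to see in isolation; a
hedged, self-contained sketch (the inode number is hypothetical):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
        /* A hypothetical inode number that needs more than 32 bits, as
         * could occur on a huge volume. */
        uint64_t ino = 0x100000001ULL;
        uint32_t fh_ino = (uint32_t)ino;        /* what a 32-bit fh keeps */

        /* The upper bits are silently lost in the round trip. */
        printf("ino = 0x%llx, fh_ino = 0x%x, lossy = %d\n",
               (unsigned long long)ino, fh_ino, (uint64_t)fh_ino != ino);
        return 0;
}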
diff --git a/fs/ntfs/ntfs.h b/fs/ntfs/ntfs.h
deleted file mode 100644 (file)
index e81376e..0000000
+++ /dev/null
@@ -1,150 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * ntfs.h - Defines for NTFS Linux kernel driver.
- *
- * Copyright (c) 2001-2014 Anton Altaparmakov and Tuxera Inc.
- * Copyright (C) 2002 Richard Russon
- */
-
-#ifndef _LINUX_NTFS_H
-#define _LINUX_NTFS_H
-
-#include <linux/stddef.h>
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/compiler.h>
-#include <linux/fs.h>
-#include <linux/nls.h>
-#include <linux/smp.h>
-#include <linux/pagemap.h>
-
-#include "types.h"
-#include "volume.h"
-#include "layout.h"
-
-typedef enum {
-       NTFS_BLOCK_SIZE         = 512,
-       NTFS_BLOCK_SIZE_BITS    = 9,
-       NTFS_SB_MAGIC           = 0x5346544e,   /* 'NTFS' */
-       NTFS_MAX_NAME_LEN       = 255,
-       NTFS_MAX_ATTR_NAME_LEN  = 255,
-       NTFS_MAX_CLUSTER_SIZE   = 64 * 1024,    /* 64kiB */
-       NTFS_MAX_PAGES_PER_CLUSTER = NTFS_MAX_CLUSTER_SIZE / PAGE_SIZE,
-} NTFS_CONSTANTS;
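
NTFS_SB_MAGIC is simply the ASCII bytes "NTFS" read as a little-endian 32-bit
value ('N' = 0x4e, 'T' = 0x54, 'F' = 0x46, 'S' = 0x53), as this standalone
sketch shows (little-endian hosts only):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
        uint32_t magic = 0x5346544e;
        char buf[5] = { 0 };

        memcpy(buf, &magic, 4);         /* byte order matters here */
        printf("%s\n", buf);            /* prints "NTFS" */
        return 0;
}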
-
-/* Global variables. */
-
-/* Slab caches (from super.c). */
-extern struct kmem_cache *ntfs_name_cache;
-extern struct kmem_cache *ntfs_inode_cache;
-extern struct kmem_cache *ntfs_big_inode_cache;
-extern struct kmem_cache *ntfs_attr_ctx_cache;
-extern struct kmem_cache *ntfs_index_ctx_cache;
-
-/* The various operations structs defined throughout the driver files. */
-extern const struct address_space_operations ntfs_normal_aops;
-extern const struct address_space_operations ntfs_compressed_aops;
-extern const struct address_space_operations ntfs_mst_aops;
-
-extern const struct file_operations ntfs_file_ops;
-extern const struct inode_operations ntfs_file_inode_ops;
-
-extern const struct file_operations ntfs_dir_ops;
-extern const struct inode_operations ntfs_dir_inode_ops;
-
-extern const struct file_operations ntfs_empty_file_ops;
-extern const struct inode_operations ntfs_empty_inode_ops;
-
-extern const struct export_operations ntfs_export_ops;
-
-/**
- * NTFS_SB - return the ntfs volume given a vfs super block
- * @sb:                VFS super block
- *
- * NTFS_SB() returns the ntfs volume associated with the VFS super block @sb.
- */
-static inline ntfs_volume *NTFS_SB(struct super_block *sb)
-{
-       return sb->s_fs_info;
-}
-
-/* Declarations of functions and global variables. */
-
-/* From fs/ntfs/compress.c */
-extern int ntfs_read_compressed_block(struct page *page);
-extern int allocate_compression_buffers(void);
-extern void free_compression_buffers(void);
-
-/* From fs/ntfs/super.c */
-#define default_upcase_len 0x10000
-extern struct mutex ntfs_lock;
-
-typedef struct {
-       int val;
-       char *str;
-} option_t;
-extern const option_t on_errors_arr[];
-
-/* From fs/ntfs/mst.c */
-extern int post_read_mst_fixup(NTFS_RECORD *b, const u32 size);
-extern int pre_write_mst_fixup(NTFS_RECORD *b, const u32 size);
-extern void post_write_mst_fixup(NTFS_RECORD *b);
-
-/* From fs/ntfs/unistr.c */
-extern bool ntfs_are_names_equal(const ntfschar *s1, size_t s1_len,
-               const ntfschar *s2, size_t s2_len,
-               const IGNORE_CASE_BOOL ic,
-               const ntfschar *upcase, const u32 upcase_size);
-extern int ntfs_collate_names(const ntfschar *name1, const u32 name1_len,
-               const ntfschar *name2, const u32 name2_len,
-               const int err_val, const IGNORE_CASE_BOOL ic,
-               const ntfschar *upcase, const u32 upcase_len);
-extern int ntfs_ucsncmp(const ntfschar *s1, const ntfschar *s2, size_t n);
-extern int ntfs_ucsncasecmp(const ntfschar *s1, const ntfschar *s2, size_t n,
-               const ntfschar *upcase, const u32 upcase_size);
-extern void ntfs_upcase_name(ntfschar *name, u32 name_len,
-               const ntfschar *upcase, const u32 upcase_len);
-extern void ntfs_file_upcase_value(FILE_NAME_ATTR *file_name_attr,
-               const ntfschar *upcase, const u32 upcase_len);
-extern int ntfs_file_compare_values(FILE_NAME_ATTR *file_name_attr1,
-               FILE_NAME_ATTR *file_name_attr2,
-               const int err_val, const IGNORE_CASE_BOOL ic,
-               const ntfschar *upcase, const u32 upcase_len);
-extern int ntfs_nlstoucs(const ntfs_volume *vol, const char *ins,
-               const int ins_len, ntfschar **outs);
-extern int ntfs_ucstonls(const ntfs_volume *vol, const ntfschar *ins,
-               const int ins_len, unsigned char **outs, int outs_len);
-
-/* From fs/ntfs/upcase.c */
-extern ntfschar *generate_default_upcase(void);
-
-static inline int ntfs_ffs(int x)
-{
-       int r = 1;
-
-       if (!x)
-               return 0;
-       if (!(x & 0xffff)) {
-               x >>= 16;
-               r += 16;
-       }
-       if (!(x & 0xff)) {
-               x >>= 8;
-               r += 8;
-       }
-       if (!(x & 0xf)) {
-               x >>= 4;
-               r += 4;
-       }
-       if (!(x & 3)) {
-               x >>= 2;
-               r += 2;
-       }
-       if (!(x & 1)) {
-               x >>= 1;
-               r += 1;
-       }
-       return r;
-}
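
ntfs_ffs() mirrors the semantics of ffs(3): it returns the 1-based index of
the least significant set bit, or 0 if no bit is set.  With the function
above pasted in verbatim, a quick standalone check passes:

#include <assert.h>
#include <strings.h>    /* ffs(3) */

/* ntfs_ffs() copied verbatim from above. */

int main(void)
{
        assert(ntfs_ffs(0) == 0 && ffs(0) == 0);
        assert(ntfs_ffs(1) == 1 && ffs(1) == 1);
        assert(ntfs_ffs(0x8000) == 16 && ffs(0x8000) == 16);
        assert(ntfs_ffs(0x40000000) == 31 && ffs(0x40000000) == 31);
        return 0;
}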
-
-#endif /* _LINUX_NTFS_H */
diff --git a/fs/ntfs/quota.c b/fs/ntfs/quota.c
deleted file mode 100644 (file)
index 9160480..0000000
+++ /dev/null
@@ -1,103 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * quota.c - NTFS kernel quota ($Quota) handling.  Part of the Linux-NTFS
- *          project.
- *
- * Copyright (c) 2004 Anton Altaparmakov
- */
-
-#ifdef NTFS_RW
-
-#include "index.h"
-#include "quota.h"
-#include "debug.h"
-#include "ntfs.h"
-
-/**
- * ntfs_mark_quotas_out_of_date - mark the quotas out of date on an ntfs volume
- * @vol:       ntfs volume on which to mark the quotas out of date
- *
- * Mark the quotas out of date on the ntfs volume @vol and return 'true' on
- * success and 'false' on error.
- */
-bool ntfs_mark_quotas_out_of_date(ntfs_volume *vol)
-{
-       ntfs_index_context *ictx;
-       QUOTA_CONTROL_ENTRY *qce;
-       const le32 qid = QUOTA_DEFAULTS_ID;
-       int err;
-
-       ntfs_debug("Entering.");
-       if (NVolQuotaOutOfDate(vol))
-               goto done;
-       if (!vol->quota_ino || !vol->quota_q_ino) {
-               ntfs_error(vol->sb, "Quota inodes are not open.");
-               return false;
-       }
-       inode_lock(vol->quota_q_ino);
-       ictx = ntfs_index_ctx_get(NTFS_I(vol->quota_q_ino));
-       if (!ictx) {
-               ntfs_error(vol->sb, "Failed to get index context.");
-               goto err_out;
-       }
-       err = ntfs_index_lookup(&qid, sizeof(qid), ictx);
-       if (err) {
-               if (err == -ENOENT)
-                       ntfs_error(vol->sb, "Quota defaults entry is not "
-                                       "present.");
-               else
-                       ntfs_error(vol->sb, "Lookup of quota defaults entry "
-                                       "failed.");
-               goto err_out;
-       }
-       if (ictx->data_len < offsetof(QUOTA_CONTROL_ENTRY, sid)) {
-               ntfs_error(vol->sb, "Quota defaults entry size is invalid.  "
-                               "Run chkdsk.");
-               goto err_out;
-       }
-       qce = (QUOTA_CONTROL_ENTRY*)ictx->data;
-       if (le32_to_cpu(qce->version) != QUOTA_VERSION) {
-               ntfs_error(vol->sb, "Quota defaults entry version 0x%x is not "
-                               "supported.", le32_to_cpu(qce->version));
-               goto err_out;
-       }
-       ntfs_debug("Quota defaults flags = 0x%x.", le32_to_cpu(qce->flags));
-       /* If quotas are already marked out of date, no need to do anything. */
-       if (qce->flags & QUOTA_FLAG_OUT_OF_DATE)
-               goto set_done;
-       /*
-        * If quota tracking is neither requested nor enabled, and there are no
-        * pending deletes, there is no need to mark the quotas out of date.
-        */
-       if (!(qce->flags & (QUOTA_FLAG_TRACKING_ENABLED |
-                       QUOTA_FLAG_TRACKING_REQUESTED |
-                       QUOTA_FLAG_PENDING_DELETES)))
-               goto set_done;
-       /*
-        * Set the QUOTA_FLAG_OUT_OF_DATE bit thus marking quotas out of date.
-        * This is verified on WinXP to be sufficient to cause Windows to
-        * rescan the volume on boot and update all quota entries.
-        */
-       qce->flags |= QUOTA_FLAG_OUT_OF_DATE;
-       /* Ensure the modified flags are written to disk. */
-       ntfs_index_entry_flush_dcache_page(ictx);
-       ntfs_index_entry_mark_dirty(ictx);
-set_done:
-       ntfs_index_ctx_put(ictx);
-       inode_unlock(vol->quota_q_ino);
-       /*
-        * We set the flag so we do not try to mark the quotas out of date
-        * again on remount.
-        */
-       NVolSetQuotaOutOfDate(vol);
-done:
-       ntfs_debug("Done.");
-       return true;
-err_out:
-       if (ictx)
-               ntfs_index_ctx_put(ictx);
-       inode_unlock(vol->quota_q_ino);
-       return false;
-}
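
Stripped of the index plumbing, the flag logic above reduces to: do nothing
if the entry is already marked out of date or if tracking is entirely off,
otherwise set QUOTA_FLAG_OUT_OF_DATE and write the entry back.  A condensed
model (the flag values are hypothetical stand-ins, not the on-disk ones):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define QUOTA_FLAG_OUT_OF_DATE          0x0001  /* hypothetical values */
#define QUOTA_FLAG_TRACKING_ENABLED     0x0002
#define QUOTA_FLAG_TRACKING_REQUESTED   0x0004
#define QUOTA_FLAG_PENDING_DELETES      0x0008

/* Returns true if the entry was modified and needs writing back. */
static bool mark_out_of_date(uint32_t *flags)
{
        if (*flags & QUOTA_FLAG_OUT_OF_DATE)
                return false;           /* already marked */
        if (!(*flags & (QUOTA_FLAG_TRACKING_ENABLED |
                        QUOTA_FLAG_TRACKING_REQUESTED |
                        QUOTA_FLAG_PENDING_DELETES)))
                return false;           /* nothing to invalidate */
        *flags |= QUOTA_FLAG_OUT_OF_DATE;
        return true;
}

int main(void)
{
        uint32_t f = QUOTA_FLAG_TRACKING_ENABLED;

        printf("dirtied = %d, flags = 0x%x\n", mark_out_of_date(&f), f);
        return 0;
}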
-
-#endif /* NTFS_RW */
diff --git a/fs/ntfs/quota.h b/fs/ntfs/quota.h
deleted file mode 100644 (file)
index fe3132a..0000000
+++ /dev/null
@@ -1,21 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * quota.h - Defines for NTFS kernel quota ($Quota) handling.  Part of the
- *          Linux-NTFS project.
- *
- * Copyright (c) 2004 Anton Altaparmakov
- */
-
-#ifndef _LINUX_NTFS_QUOTA_H
-#define _LINUX_NTFS_QUOTA_H
-
-#ifdef NTFS_RW
-
-#include "types.h"
-#include "volume.h"
-
-extern bool ntfs_mark_quotas_out_of_date(ntfs_volume *vol);
-
-#endif /* NTFS_RW */
-
-#endif /* _LINUX_NTFS_QUOTA_H */
diff --git a/fs/ntfs/runlist.c b/fs/ntfs/runlist.c
deleted file mode 100644 (file)
index 0d448e9..0000000
+++ /dev/null
@@ -1,1893 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * runlist.c - NTFS runlist handling code.  Part of the Linux-NTFS project.
- *
- * Copyright (c) 2001-2007 Anton Altaparmakov
- * Copyright (c) 2002-2005 Richard Russon
- */
-
-#include "debug.h"
-#include "dir.h"
-#include "endian.h"
-#include "malloc.h"
-#include "ntfs.h"
-
-/**
- * ntfs_rl_mm - runlist memmove
- *
- * It is up to the caller to serialize access to the runlist @base.
- */
-static inline void ntfs_rl_mm(runlist_element *base, int dst, int src,
-               int size)
-{
-       if (likely((dst != src) && (size > 0)))
-               memmove(base + dst, base + src, size * sizeof(*base));
-}
-
-/**
- * ntfs_rl_mc - runlist memory copy
- *
- * It is up to the caller to serialize access to the runlists @dstbase and
- * @srcbase.
- */
-static inline void ntfs_rl_mc(runlist_element *dstbase, int dst,
-               runlist_element *srcbase, int src, int size)
-{
-       if (likely(size > 0))
-               memcpy(dstbase + dst, srcbase + src, size * sizeof(*dstbase));
-}
-
-/**
- * ntfs_rl_realloc - Reallocate memory for runlists
- * @rl:                original runlist
- * @old_size:  number of runlist elements in the original runlist @rl
- * @new_size:  number of runlist elements we need space for
- *
- * As the runlists grow, more memory will be required.  To prevent the
- * kernel having to allocate and reallocate large numbers of small bits of
- * memory, this function returns an entire page of memory.
- *
- * It is up to the caller to serialize access to the runlist @rl.
- *
- * N.B.  If the new allocation doesn't require a different number of pages in
- *       memory, the function will return the original pointer.
- *
- * On success, return a pointer to the newly allocated, or recycled, memory.
- * On error, return -errno. The following error codes are defined:
- *     -ENOMEM - Not enough memory to allocate runlist array.
- *     -EINVAL - Invalid parameters were passed in.
- */
-static inline runlist_element *ntfs_rl_realloc(runlist_element *rl,
-               int old_size, int new_size)
-{
-       runlist_element *new_rl;
-
-       old_size = PAGE_ALIGN(old_size * sizeof(*rl));
-       new_size = PAGE_ALIGN(new_size * sizeof(*rl));
-       if (old_size == new_size)
-               return rl;
-
-       new_rl = ntfs_malloc_nofs(new_size);
-       if (unlikely(!new_rl))
-               return ERR_PTR(-ENOMEM);
-
-       if (likely(rl != NULL)) {
-               if (unlikely(old_size > new_size))
-                       old_size = new_size;
-               memcpy(new_rl, rl, old_size);
-               ntfs_free(rl);
-       }
-       return new_rl;
-}
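
Because both sizes are rounded up to whole pages, growing a runlist only
triggers a real allocation when it crosses a page boundary.  A standalone
sketch of the arithmetic (assuming 4 KiB pages and a 24-byte runlist_element
of three 64-bit fields):

#include <stdio.h>

#define PAGE_SIZE 4096UL
#define PAGE_ALIGN(x) (((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

int main(void)
{
        unsigned long elem = 24;        /* assumed sizeof(runlist_element) */

        /* Growing from 10 to 100 elements stays within one page, so
         * ntfs_rl_realloc() would hand back the original pointer. */
        printf("10 elems  -> %lu bytes\n", PAGE_ALIGN(10 * elem));  /* 4096 */
        printf("100 elems -> %lu bytes\n", PAGE_ALIGN(100 * elem)); /* 4096 */
        printf("200 elems -> %lu bytes\n", PAGE_ALIGN(200 * elem)); /* 8192 */
        return 0;
}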
-
-/**
- * ntfs_rl_realloc_nofail - Reallocate memory for runlists
- * @rl:                original runlist
- * @old_size:  number of runlist elements in the original runlist @rl
- * @new_size:  number of runlist elements we need space for
- *
- * As the runlists grow, more memory will be required.  To prevent the
- * kernel having to allocate and reallocate large numbers of small bits of
- * memory, this function returns an entire page of memory.
- *
- * This function guarantees that the allocation will succeed.  It will sleep
- * for as long as it takes to complete the allocation.
- *
- * It is up to the caller to serialize access to the runlist @rl.
- *
- * N.B.  If the new allocation doesn't require a different number of pages in
- *       memory, the function will return the original pointer.
- *
- * On success, return a pointer to the newly allocated, or recycled, memory.
- * On error, return -errno. The following error codes are defined:
- *     -ENOMEM - Not enough memory to allocate runlist array.
- *     -EINVAL - Invalid parameters were passed in.
- */
-static inline runlist_element *ntfs_rl_realloc_nofail(runlist_element *rl,
-               int old_size, int new_size)
-{
-       runlist_element *new_rl;
-
-       old_size = PAGE_ALIGN(old_size * sizeof(*rl));
-       new_size = PAGE_ALIGN(new_size * sizeof(*rl));
-       if (old_size == new_size)
-               return rl;
-
-       new_rl = ntfs_malloc_nofs_nofail(new_size);
-       BUG_ON(!new_rl);
-
-       if (likely(rl != NULL)) {
-               if (unlikely(old_size > new_size))
-                       old_size = new_size;
-               memcpy(new_rl, rl, old_size);
-               ntfs_free(rl);
-       }
-       return new_rl;
-}
-
-/**
- * ntfs_are_rl_mergeable - test if two runlists can be joined together
- * @dst:       original runlist
- * @src:       new runlist to test for mergeability with @dst
- *
- * Test if two runlists can be joined together. For this, their VCNs and LCNs
- * must be adjacent.
- *
- * It is up to the caller to serialize access to the runlists @dst and @src.
- *
- * Return: true   Success, the runlists can be merged.
- *        false  Failure, the runlists cannot be merged.
- */
-static inline bool ntfs_are_rl_mergeable(runlist_element *dst,
-               runlist_element *src)
-{
-       BUG_ON(!dst);
-       BUG_ON(!src);
-
-       /* We can merge unmapped regions even if they are misaligned. */
-       if ((dst->lcn == LCN_RL_NOT_MAPPED) && (src->lcn == LCN_RL_NOT_MAPPED))
-               return true;
-       /* If the runs are misaligned, we cannot merge them. */
-       if ((dst->vcn + dst->length) != src->vcn)
-               return false;
-       /* If both runs are non-sparse and contiguous, we can merge them. */
-       if ((dst->lcn >= 0) && (src->lcn >= 0) &&
-                       ((dst->lcn + dst->length) == src->lcn))
-               return true;
-       /* If we are merging two holes, we can merge them. */
-       if ((dst->lcn == LCN_HOLE) && (src->lcn == LCN_HOLE))
-               return true;
-       /* Cannot merge. */
-       return false;
-}
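
To make the adjacency rule concrete for the common non-sparse case, here is a
hedged, self-contained illustration using a stripped-down stand-in for
runlist_element (the field names mirror the driver's; everything else is
editorial):

#include <stdbool.h>
#include <stdio.h>

struct rle { long long vcn, lcn, length; }; /* stand-in, not the real type */

/* Two real (non-sparse) runs merge iff they are adjacent in both VCN and
 * LCN space. */
static bool mergeable(const struct rle *dst, const struct rle *src)
{
        return dst->lcn >= 0 && src->lcn >= 0 &&
               dst->vcn + dst->length == src->vcn &&
               dst->lcn + dst->length == src->lcn;
}

int main(void)
{
        struct rle a = { .vcn = 0, .lcn = 100, .length = 4 };
        struct rle b = { .vcn = 4, .lcn = 104, .length = 2 };
        struct rle c = { .vcn = 4, .lcn = 200, .length = 2 };

        printf("a+b: %d\n", mergeable(&a, &b)); /* 1: contiguous on disk */
        printf("a+c: %d\n", mergeable(&a, &c)); /* 0: LCNs not adjacent  */
        return 0;
}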
-
-/**
- * __ntfs_rl_merge - merge two runlists without testing if they can be merged
- * @dst:       original, destination runlist
- * @src:       new runlist to merge with @dst
- *
- * Merge the two runlists, writing into the destination runlist @dst. The
- * caller must make sure the runlists can be merged or this will corrupt the
- * destination runlist.
- *
- * It is up to the caller to serialize access to the runlists @dst and @src.
- */
-static inline void __ntfs_rl_merge(runlist_element *dst, runlist_element *src)
-{
-       dst->length += src->length;
-}
-
-/**
- * ntfs_rl_append - append a runlist after a given element
- * @dst:       original runlist to be worked on
- * @dsize:     number of elements in @dst (including end marker)
- * @src:       runlist to be inserted into @dst
- * @ssize:     number of elements in @src (excluding end marker)
- * @loc:       append the new runlist @src after this element in @dst
- *
- * Append the runlist @src after element @loc in @dst.  Merge the right end of
- * the new runlist, if necessary. Adjust the size of the hole before the
- * appended runlist.
- *
- * It is up to the caller to serialize access to the runlists @dst and @src.
- *
- * On success, return a pointer to the new, combined, runlist. Note, both
- * runlists @dst and @src are deallocated before returning so you cannot use
- * the pointers for anything any more. (Strictly speaking the returned runlist
- * may be the same as @dst but this is irrelevant.)
- *
- * On error, return -errno. Both runlists are left unmodified. The following
- * error codes are defined:
- *     -ENOMEM - Not enough memory to allocate runlist array.
- *     -EINVAL - Invalid parameters were passed in.
- */
-static inline runlist_element *ntfs_rl_append(runlist_element *dst,
-               int dsize, runlist_element *src, int ssize, int loc)
-{
-       bool right = false;     /* Right end of @src needs merging. */
-       int marker;             /* End of the inserted runs. */
-
-       BUG_ON(!dst);
-       BUG_ON(!src);
-
-       /* First, check if the right hand end needs merging. */
-       if ((loc + 1) < dsize)
-               right = ntfs_are_rl_mergeable(src + ssize - 1, dst + loc + 1);
-
-       /* Space required: @dst size + @src size, less one if we merged. */
-       dst = ntfs_rl_realloc(dst, dsize, dsize + ssize - right);
-       if (IS_ERR(dst))
-               return dst;
-       /*
-        * We are guaranteed to succeed from here so can start modifying the
-        * original runlists.
-        */
-
-       /* First, merge the right hand end, if necessary. */
-       if (right)
-               __ntfs_rl_merge(src + ssize - 1, dst + loc + 1);
-
-       /* First run after the @src runs that have been inserted. */
-       marker = loc + ssize + 1;
-
-       /* Move the tail of @dst out of the way, then copy in @src. */
-       ntfs_rl_mm(dst, marker, loc + 1 + right, dsize - (loc + 1 + right));
-       ntfs_rl_mc(dst, loc + 1, src, 0, ssize);
-
-       /* Adjust the size of the preceding hole. */
-       dst[loc].length = dst[loc + 1].vcn - dst[loc].vcn;
-
-       /* We may have changed the length of the file, so fix the end marker */
-       if (dst[marker].lcn == LCN_ENOENT)
-               dst[marker].vcn = dst[marker - 1].vcn + dst[marker - 1].length;
-
-       return dst;
-}
-
-/**
- * ntfs_rl_insert - insert a runlist into another
- * @dst:       original runlist to be worked on
- * @dsize:     number of elements in @dst (including end marker)
- * @src:       new runlist to be inserted
- * @ssize:     number of elements in @src (excluding end marker)
- * @loc:       insert the new runlist @src before this element in @dst
- *
- * Insert the runlist @src before element @loc in the runlist @dst. Merge the
- * left end of the new runlist, if necessary. Adjust the size of the hole
- * after the inserted runlist.
- *
- * It is up to the caller to serialize access to the runlists @dst and @src.
- *
- * On success, return a pointer to the new, combined, runlist. Note, both
- * runlists @dst and @src are deallocated before returning so you cannot use
- * the pointers for anything any more. (Strictly speaking the returned runlist
- * may be the same as @dst but this is irrelevant.)
- *
- * On error, return -errno. Both runlists are left unmodified. The following
- * error codes are defined:
- *     -ENOMEM - Not enough memory to allocate runlist array.
- *     -EINVAL - Invalid parameters were passed in.
- */
-static inline runlist_element *ntfs_rl_insert(runlist_element *dst,
-               int dsize, runlist_element *src, int ssize, int loc)
-{
-       bool left = false;      /* Left end of @src needs merging. */
-       bool disc = false;      /* Discontinuity between @dst and @src. */
-       int marker;             /* End of the inserted runs. */
-
-       BUG_ON(!dst);
-       BUG_ON(!src);
-
-       /*
-        * disc => Discontinuity between the end of @dst and the start of @src.
-        *         This means we might need to insert a "not mapped" run.
-        */
-       if (loc == 0)
-               disc = (src[0].vcn > 0);
-       else {
-               s64 merged_length;
-
-               left = ntfs_are_rl_mergeable(dst + loc - 1, src);
-
-               merged_length = dst[loc - 1].length;
-               if (left)
-                       merged_length += src->length;
-
-               disc = (src[0].vcn > dst[loc - 1].vcn + merged_length);
-       }
-       /*
-        * Space required: @dst size + @src size, less one if we merged, plus
-        * one if there was a discontinuity.
-        */
-       dst = ntfs_rl_realloc(dst, dsize, dsize + ssize - left + disc);
-       if (IS_ERR(dst))
-               return dst;
-       /*
-        * We are guaranteed to succeed from here so can start modifying the
-        * original runlist.
-        */
-       if (left)
-               __ntfs_rl_merge(dst + loc - 1, src);
-       /*
-        * First run after the @src runs that have been inserted.
-        * Nominally,  @marker equals @loc + @ssize, i.e. location + number of
-        * runs in @src.  However, if @left, then the first run in @src has
-        * been merged with one in @dst.  And if @disc, then @dst and @src do
-        * not meet and we need an extra run to fill the gap.
-        */
-       marker = loc + ssize - left + disc;
-
-       /* Move the tail of @dst out of the way, then copy in @src. */
-       ntfs_rl_mm(dst, marker, loc, dsize - loc);
-       ntfs_rl_mc(dst, loc + disc, src, left, ssize - left);
-
-       /* Adjust the VCN of the first run after the insertion... */
-       dst[marker].vcn = dst[marker - 1].vcn + dst[marker - 1].length;
-       /* ... and the length. */
-       if (dst[marker].lcn == LCN_HOLE || dst[marker].lcn == LCN_RL_NOT_MAPPED)
-               dst[marker].length = dst[marker + 1].vcn - dst[marker].vcn;
-
-       /* Writing beyond the end of the file and there is a discontinuity. */
-       if (disc) {
-               if (loc > 0) {
-                       dst[loc].vcn = dst[loc - 1].vcn + dst[loc - 1].length;
-                       dst[loc].length = dst[loc + 1].vcn - dst[loc].vcn;
-               } else {
-                       dst[loc].vcn = 0;
-                       dst[loc].length = dst[loc + 1].vcn;
-               }
-               dst[loc].lcn = LCN_RL_NOT_MAPPED;
-       }
-       return dst;
-}
-
-/**
- * ntfs_rl_replace - overwrite a runlist element with another runlist
- * @dst:       original runlist to be worked on
- * @dsize:     number of elements in @dst (including end marker)
- * @src:       new runlist to be inserted
- * @ssize:     number of elements in @src (excluding end marker)
- * @loc:       index in runlist @dst to overwrite with @src
- *
- * Replace the runlist element @dst at @loc with @src. Merge the left and
- * right ends of the inserted runlist, if necessary.
- *
- * It is up to the caller to serialize access to the runlists @dst and @src.
- *
- * On success, return a pointer to the new, combined, runlist. Note, both
- * runlists @dst and @src are deallocated before returning so you cannot use
- * the pointers for anything any more. (Strictly speaking the returned runlist
- * may be the same as @dst but this is irrelevant.)
- *
- * On error, return -errno. Both runlists are left unmodified. The following
- * error codes are defined:
- *     -ENOMEM - Not enough memory to allocate runlist array.
- *     -EINVAL - Invalid parameters were passed in.
- */
-static inline runlist_element *ntfs_rl_replace(runlist_element *dst,
-               int dsize, runlist_element *src, int ssize, int loc)
-{
-       signed delta;
-       bool left = false;      /* Left end of @src needs merging. */
-       bool right = false;     /* Right end of @src needs merging. */
-       int tail;               /* Start of tail of @dst. */
-       int marker;             /* End of the inserted runs. */
-
-       BUG_ON(!dst);
-       BUG_ON(!src);
-
-       /* First, see if the left and right ends need merging. */
-       if ((loc + 1) < dsize)
-               right = ntfs_are_rl_mergeable(src + ssize - 1, dst + loc + 1);
-       if (loc > 0)
-               left = ntfs_are_rl_mergeable(dst + loc - 1, src);
-       /*
-        * Allocate some space.  We will need less if the left, right, or both
-        * ends get merged.  The -1 accounts for the run being replaced.
-        */
-       delta = ssize - 1 - left - right;
-       if (delta > 0) {
-               dst = ntfs_rl_realloc(dst, dsize, dsize + delta);
-               if (IS_ERR(dst))
-                       return dst;
-       }
-       /*
-        * We are guaranteed to succeed from here so can start modifying the
-        * original runlists.
-        */
-
-       /* First, merge the left and right ends, if necessary. */
-       if (right)
-               __ntfs_rl_merge(src + ssize - 1, dst + loc + 1);
-       if (left)
-               __ntfs_rl_merge(dst + loc - 1, src);
-       /*
-        * Offset of the tail of @dst.  This needs to be moved out of the way
-        * to make space for the runs to be copied from @src, i.e. the first
-        * run of the tail of @dst.
-        * Nominally, @tail equals @loc + 1, i.e. location, skipping the
-        * replaced run.  However, if @right, then one of @dst's runs is
-        * already merged into @src.
-        */
-       tail = loc + right + 1;
-       /*
-        * First run after the @src runs that have been inserted, i.e. where
-        * the tail of @dst needs to be moved to.
-        * Nominally, @marker equals @loc + @ssize, i.e. location + number of
-        * runs in @src.  However, if @left, then the first run in @src has
-        * been merged with one in @dst.
-        */
-       marker = loc + ssize - left;
-
-       /* Move the tail of @dst out of the way, then copy in @src. */
-       ntfs_rl_mm(dst, marker, tail, dsize - tail);
-       ntfs_rl_mc(dst, loc, src, left, ssize - left);
-
-       /* We may have changed the length of the file, so fix the end marker. */
-       if (dsize - tail > 0 && dst[marker].lcn == LCN_ENOENT)
-               dst[marker].vcn = dst[marker - 1].vcn + dst[marker - 1].length;
-       return dst;
-}
-
-/**
- * ntfs_rl_split - insert a runlist into the centre of a hole
- * @dst:       original runlist to be worked on
- * @dsize:     number of elements in @dst (including end marker)
- * @src:       new runlist to be inserted
- * @ssize:     number of elements in @src (excluding end marker)
- * @loc:       index in runlist @dst at which to split and insert @src
- *
- * Split the runlist @dst at @loc into two and insert @src in between the two
- * fragments. No merging of runlists is necessary. Adjust the size of the
- * holes either side.
- *
- * It is up to the caller to serialize access to the runlists @dst and @src.
- *
- * On success, return a pointer to the new, combined, runlist. Note, both
- * runlists @dst and @src are deallocated before returning so you cannot use
- * the pointers for anything any more. (Strictly speaking the returned runlist
- * may be the same as @dst but this is irrelevant.)
- *
- * On error, return -errno. Both runlists are left unmodified. The following
- * error codes are defined:
- *     -ENOMEM - Not enough memory to allocate runlist array.
- *     -EINVAL - Invalid parameters were passed in.
- */
-static inline runlist_element *ntfs_rl_split(runlist_element *dst, int dsize,
-               runlist_element *src, int ssize, int loc)
-{
-       BUG_ON(!dst);
-       BUG_ON(!src);
-
-       /* Space required: @dst size + @src size + one new hole. */
-       dst = ntfs_rl_realloc(dst, dsize, dsize + ssize + 1);
-       if (IS_ERR(dst))
-               return dst;
-       /*
-        * We are guaranteed to succeed from here so can start modifying the
-        * original runlists.
-        */
-
-       /* Move the tail of @dst out of the way, then copy in @src. */
-       ntfs_rl_mm(dst, loc + 1 + ssize, loc, dsize - loc);
-       ntfs_rl_mc(dst, loc + 1, src, 0, ssize);
-
-       /* Adjust the size of the holes either side of @src. */
-       dst[loc].length         = dst[loc+1].vcn       - dst[loc].vcn;
-       dst[loc+ssize+1].vcn    = dst[loc+ssize].vcn   + dst[loc+ssize].length;
-       dst[loc+ssize+1].length = dst[loc+ssize+2].vcn - dst[loc+ssize+1].vcn;
-
-       return dst;
-}
-
-/**
- * ntfs_runlists_merge - merge two runlists into one
- * @drl:       original runlist to be worked on
- * @srl:       new runlist to be merged into @drl
- *
- * First we sanity check the two runlists @srl and @drl to make sure that they
- * are sensible and can be merged. The runlist @srl must be either after the
- * runlist @drl or completely within a hole (or unmapped region) in @drl.
- *
- * It is up to the caller to serialize access to the runlists @drl and @srl.
- *
- * Merging of runlists is necessary in two cases:
- *   1. When attribute lists are used and a further extent is being mapped.
- *   2. When new clusters are allocated to fill a hole or extend a file.
- *
- * There are four possible ways @srl can be merged. It can:
- *     - be inserted at the beginning of a hole,
- *     - split the hole in two and be inserted between the two fragments,
- *     - be appended at the end of a hole, or it can
- *     - replace the whole hole.
- * It can also be appended to the end of the runlist, which is just a variant
- * of the insert case.
- *
- * On success, return a pointer to the new, combined, runlist. Note, both
- * runlists @drl and @srl are deallocated before returning so you cannot use
- * the pointers for anything any more. (Strictly speaking the returned runlist
- * may be the same as @drl but this is irrelevant.)
- *
- * On error, return -errno. Both runlists are left unmodified. The following
- * error codes are defined:
- *     -ENOMEM - Not enough memory to allocate runlist array.
- *     -EINVAL - Invalid parameters were passed in.
- *     -ERANGE - The runlists overlap and cannot be merged.
- */
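
The choice among these four cases is driven by two booleans, start and
finish, computed in the function below.  As a simplified, self-contained
model of that dispatch (the real conditions also handle end-of-file and
unmapped regions, which are ignored here):

#include <stdbool.h>
#include <stdio.h>

/* Given a hole spanning [hole_start, hole_end) and a new run spanning
 * [run_start, run_end), pick the merge strategy.  Names are editorial. */
static const char *merge_strategy(long long hole_start, long long hole_end,
                                  long long run_start, long long run_end)
{
        bool start  = run_start == hole_start;  /* touches hole's start */
        bool finish = run_end == hole_end;      /* touches hole's end   */

        if (start && finish)
                return "replace";       /* run covers the whole hole     */
        if (start)
                return "insert";        /* run at the start of the hole  */
        if (finish)
                return "append";        /* run at the end of the hole    */
        return "split";                 /* run in the middle of the hole */
}

int main(void)
{
        printf("%s\n", merge_strategy(10, 20, 10, 20)); /* replace */
        printf("%s\n", merge_strategy(10, 20, 10, 16)); /* insert  */
        printf("%s\n", merge_strategy(10, 20, 16, 20)); /* append  */
        printf("%s\n", merge_strategy(10, 20, 12, 16)); /* split   */
        return 0;
}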
-runlist_element *ntfs_runlists_merge(runlist_element *drl,
-               runlist_element *srl)
-{
-       int di, si;             /* Current index into @[ds]rl. */
-       int sstart;             /* First index with lcn > LCN_RL_NOT_MAPPED. */
-       int dins;               /* Index into @drl at which to insert @srl. */
-       int dend, send;         /* Last index into @[ds]rl. */
-       int dfinal, sfinal;     /* The last index into @[ds]rl with
-                                  lcn >= LCN_HOLE. */
-       int marker = 0;
-       VCN marker_vcn = 0;
-
-#ifdef DEBUG
-       ntfs_debug("dst:");
-       ntfs_debug_dump_runlist(drl);
-       ntfs_debug("src:");
-       ntfs_debug_dump_runlist(srl);
-#endif
-
-       /* Check for silly calling... */
-       if (unlikely(!srl))
-               return drl;
-       if (IS_ERR(srl) || IS_ERR(drl))
-               return ERR_PTR(-EINVAL);
-
-       /* Check for the case where the first mapping is being done now. */
-       if (unlikely(!drl)) {
-               drl = srl;
-               /* Complete the source runlist if necessary. */
-               if (unlikely(drl[0].vcn)) {
-                       /* Scan to the end of the source runlist. */
-                       for (dend = 0; likely(drl[dend].length); dend++)
-                               ;
-                       dend++;
-                       drl = ntfs_rl_realloc(drl, dend, dend + 1);
-                       if (IS_ERR(drl))
-                               return drl;
-                       /* Insert start element at the front of the runlist. */
-                       ntfs_rl_mm(drl, 1, 0, dend);
-                       drl[0].vcn = 0;
-                       drl[0].lcn = LCN_RL_NOT_MAPPED;
-                       drl[0].length = drl[1].vcn;
-               }
-               goto finished;
-       }
-
-       si = di = 0;
-
-       /* Skip any unmapped start element(s) in the source runlist. */
-       while (srl[si].length && srl[si].lcn < LCN_HOLE)
-               si++;
-
-       /* Can't have an entirely unmapped source runlist. */
-       BUG_ON(!srl[si].length);
-
-       /* Record the starting points. */
-       sstart = si;
-
-       /*
-        * Skip forward in @drl until we reach the position where @srl needs to
-        * be inserted. If we reach the end of @drl, @srl just needs to be
-        * appended to @drl.
-        */
-       for (; drl[di].length; di++) {
-               if (drl[di].vcn + drl[di].length > srl[sstart].vcn)
-                       break;
-       }
-       dins = di;
-
-       /* Sanity check for illegal overlaps. */
-       if ((drl[di].vcn == srl[si].vcn) && (drl[di].lcn >= 0) &&
-                       (srl[si].lcn >= 0)) {
-               ntfs_error(NULL, "Run lists overlap. Cannot merge!");
-               return ERR_PTR(-ERANGE);
-       }
-
-       /* Scan to the end of both runlists in order to know their sizes. */
-       for (send = si; srl[send].length; send++)
-               ;
-       for (dend = di; drl[dend].length; dend++)
-               ;
-
-       if (srl[send].lcn == LCN_ENOENT)
-               marker_vcn = srl[marker = send].vcn;
-
-       /* Scan to the last element with lcn >= LCN_HOLE. */
-       for (sfinal = send; sfinal >= 0 && srl[sfinal].lcn < LCN_HOLE; sfinal--)
-               ;
-       for (dfinal = dend; dfinal >= 0 && drl[dfinal].lcn < LCN_HOLE; dfinal--)
-               ;
-
-       {
-       bool start;
-       bool finish;
-       int ds = dend + 1;              /* Number of elements in @drl. */
-       int ss = sfinal - sstart + 1;   /* Number of usable elements in @srl. */
-
-       start  = ((drl[dins].lcn <  LCN_RL_NOT_MAPPED) ||    /* End of file   */
-                 (drl[dins].vcn == srl[sstart].vcn));       /* Start of hole */
-       finish = ((drl[dins].lcn >= LCN_RL_NOT_MAPPED) &&    /* End of file   */
-                ((drl[dins].vcn + drl[dins].length) <=      /* End of hole   */
-                 (srl[send - 1].vcn + srl[send - 1].length)));
-
-       /* Or we will lose an end marker. */
-       if (finish && !drl[dins].length)
-               ss++;
-       if (marker && (drl[dins].vcn + drl[dins].length > srl[send - 1].vcn))
-               finish = false;
-#if 0
-       ntfs_debug("dfinal = %i, dend = %i", dfinal, dend);
-       ntfs_debug("sstart = %i, sfinal = %i, send = %i", sstart, sfinal, send);
-       ntfs_debug("start = %i, finish = %i", start, finish);
-       ntfs_debug("ds = %i, ss = %i, dins = %i", ds, ss, dins);
-#endif
-       if (start) {
-               if (finish)
-                       drl = ntfs_rl_replace(drl, ds, srl + sstart, ss, dins);
-               else
-                       drl = ntfs_rl_insert(drl, ds, srl + sstart, ss, dins);
-       } else {
-               if (finish)
-                       drl = ntfs_rl_append(drl, ds, srl + sstart, ss, dins);
-               else
-                       drl = ntfs_rl_split(drl, ds, srl + sstart, ss, dins);
-       }
-       if (IS_ERR(drl)) {
-               ntfs_error(NULL, "Merge failed.");
-               return drl;
-       }
-       ntfs_free(srl);
-       if (marker) {
-               ntfs_debug("Triggering marker code.");
-               for (ds = dend; drl[ds].length; ds++)
-                       ;
-               /* We only need to care if @srl ended after @drl. */
-               if (drl[ds].vcn <= marker_vcn) {
-                       int slots = 0;
-
-                       if (drl[ds].vcn == marker_vcn) {
-                               ntfs_debug("Old marker = 0x%llx, replacing "
-                                               "with LCN_ENOENT.",
-                                               (unsigned long long)
-                                               drl[ds].lcn);
-                               drl[ds].lcn = LCN_ENOENT;
-                               goto finished;
-                       }
-                       /*
-                        * We need to create an unmapped runlist element in
-                        * @drl or extend an existing one before adding the
-                        * ENOENT terminator.
-                        */
-                       if (drl[ds].lcn == LCN_ENOENT) {
-                               ds--;
-                               slots = 1;
-                       }
-                       if (drl[ds].lcn != LCN_RL_NOT_MAPPED) {
-                               /* Add an unmapped runlist element. */
-                               if (!slots) {
-                                       drl = ntfs_rl_realloc_nofail(drl, ds,
-                                                       ds + 2);
-                                       slots = 2;
-                               }
-                               ds++;
-                               /* Need to set vcn if it isn't set already. */
-                               if (slots != 1)
-                                       drl[ds].vcn = drl[ds - 1].vcn +
-                                                       drl[ds - 1].length;
-                               drl[ds].lcn = LCN_RL_NOT_MAPPED;
-                               /* We now used up a slot. */
-                               slots--;
-                       }
-                       drl[ds].length = marker_vcn - drl[ds].vcn;
-                       /* Finally add the ENOENT terminator. */
-                       ds++;
-                       if (!slots)
-                               drl = ntfs_rl_realloc_nofail(drl, ds, ds + 1);
-                       drl[ds].vcn = marker_vcn;
-                       drl[ds].lcn = LCN_ENOENT;
-                       drl[ds].length = (s64)0;
-               }
-       }
-       }
-
-finished:
-       /* The merge was completed successfully. */
-       ntfs_debug("Merged runlist:");
-       ntfs_debug_dump_runlist(drl);
-       return drl;
-}
-
-/**
- * ntfs_mapping_pairs_decompress - convert mapping pairs array to runlist
- * @vol:       ntfs volume on which the attribute resides
- * @attr:      attribute record whose mapping pairs array to decompress
- * @old_rl:    optional runlist in which to insert @attr's runlist
- *
- * It is up to the caller to serialize access to the runlist @old_rl.
- *
- * Decompress the attribute @attr's mapping pairs array into a runlist. On
- * success, return the decompressed runlist.
- *
- * If @old_rl is not NULL, decompressed runlist is inserted into the
- * appropriate place in @old_rl and the resultant, combined runlist is
- * returned. The original @old_rl is deallocated.
- *
- * On error, return -errno. @old_rl is left unmodified in that case.
- *
- * The following error codes are defined:
- *     -ENOMEM - Not enough memory to allocate runlist array.
- *     -EIO    - Corrupt runlist.
- *     -EINVAL - Invalid parameters were passed in.
- *     -ERANGE - The two runlists overlap.
- *
- * FIXME: For now we take the conceptually simplest approach of creating the
- * new runlist disregarding the already existing one and then splicing the
- * two into one, if that is possible (we check for overlap and, if an overlap
- * is present, discard the new runlist before returning ERR_PTR(-ERANGE)).
- */
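
Before the implementation, it may help to see the on-disk encoding in
isolation: each mapping pair starts with a header byte whose low nibble gives
the number of run-length bytes and whose high nibble gives the number of
lcn-delta bytes; both values are little-endian and sign-extended.  A minimal,
hypothetical decoder for a single pair (editorial sketch, not driver code):

#include <stdint.h>
#include <stdio.h>

/* Read an n-byte little-endian value, sign-extending from the top byte
 * (assumes 1 <= n <= 8). */
static int64_t read_le_signed(const uint8_t *p, int n)
{
        uint64_t v = 0;

        for (int i = 0; i < n; i++)
                v |= (uint64_t)p[i] << (8 * i);
        if (n < 8 && (p[n - 1] & 0x80))
                v |= ~(uint64_t)0 << (8 * n);
        return (int64_t)v;
}

int main(void)
{
        /* Header 0x21: 1 run-length byte (0x18 = 24 clusters) followed by
         * 2 lcn-delta bytes (0x5634 = 22068, little-endian). */
        const uint8_t pair[] = { 0x21, 0x18, 0x34, 0x56 };
        int len_bytes = pair[0] & 0xf;
        int lcn_bytes = (pair[0] >> 4) & 0xf;

        printf("run length = %lld clusters\n",
               (long long)read_le_signed(pair + 1, len_bytes));
        printf("lcn delta  = %lld\n",
               (long long)read_le_signed(pair + 1 + len_bytes, lcn_bytes));
        return 0;
}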
-runlist_element *ntfs_mapping_pairs_decompress(const ntfs_volume *vol,
-               const ATTR_RECORD *attr, runlist_element *old_rl)
-{
-       VCN vcn;                /* Current vcn. */
-       LCN lcn;                /* Current lcn. */
-       s64 deltaxcn;           /* Change in [vl]cn. */
-       runlist_element *rl;    /* The output runlist. */
-       u8 *buf;                /* Current position in mapping pairs array. */
-       u8 *attr_end;           /* End of attribute. */
-       int rlsize;             /* Size of runlist buffer. */
-       u16 rlpos;              /* Current runlist position in units of
-                                  runlist_elements. */
-       u8 b;                   /* Current byte offset in buf. */
-
-#ifdef DEBUG
-       /* Make sure attr exists and is non-resident. */
-       if (!attr || !attr->non_resident || sle64_to_cpu(
-                       attr->data.non_resident.lowest_vcn) < (VCN)0) {
-               ntfs_error(vol->sb, "Invalid arguments.");
-               return ERR_PTR(-EINVAL);
-       }
-#endif
-       /* Start at vcn = lowest_vcn and lcn 0. */
-       vcn = sle64_to_cpu(attr->data.non_resident.lowest_vcn);
-       lcn = 0;
-       /* Get start of the mapping pairs array. */
-       buf = (u8*)attr + le16_to_cpu(
-                       attr->data.non_resident.mapping_pairs_offset);
-       attr_end = (u8*)attr + le32_to_cpu(attr->length);
-       if (unlikely(buf < (u8*)attr || buf > attr_end)) {
-               ntfs_error(vol->sb, "Corrupt attribute.");
-               return ERR_PTR(-EIO);
-       }
-       /* If the mapping pairs array is valid but empty, nothing to do. */
-       if (!vcn && !*buf)
-               return old_rl;
-       /* Current position in runlist array. */
-       rlpos = 0;
-       /* Allocate first page and set current runlist size to one page. */
-       rl = ntfs_malloc_nofs(rlsize = PAGE_SIZE);
-       if (unlikely(!rl))
-               return ERR_PTR(-ENOMEM);
-       /* Insert unmapped starting element if necessary. */
-       if (vcn) {
-               rl->vcn = 0;
-               rl->lcn = LCN_RL_NOT_MAPPED;
-               rl->length = vcn;
-               rlpos++;
-       }
-       while (buf < attr_end && *buf) {
-               /*
-                * Allocate more memory if needed, including space for the
-                * not-mapped and terminator elements. ntfs_malloc_nofs()
-                * operates on whole pages only.
-                */
-               if (((rlpos + 3) * sizeof(*old_rl)) > rlsize) {
-                       runlist_element *rl2;
-
-                       rl2 = ntfs_malloc_nofs(rlsize + (int)PAGE_SIZE);
-                       if (unlikely(!rl2)) {
-                               ntfs_free(rl);
-                               return ERR_PTR(-ENOMEM);
-                       }
-                       memcpy(rl2, rl, rlsize);
-                       ntfs_free(rl);
-                       rl = rl2;
-                       rlsize += PAGE_SIZE;
-               }
-               /* Enter the current vcn into the current runlist element. */
-               rl[rlpos].vcn = vcn;
-               /*
-                * Get the change in vcn, i.e. the run length in clusters.
-                * Doing it this way ensures that we sign-extend negative values.
-                * A negative run length doesn't make any sense, but hey, I
-                * didn't make up the NTFS specs and Windows NT4 treats the run
-                * length as a signed value so that's how it is...
-                */
-               b = *buf & 0xf;
-               if (b) {
-                       if (unlikely(buf + b > attr_end))
-                               goto io_error;
-                       for (deltaxcn = (s8)buf[b--]; b; b--)
-                               deltaxcn = (deltaxcn << 8) + buf[b];
-               } else { /* The length entry is compulsory. */
-                       ntfs_error(vol->sb, "Missing length entry in mapping "
-                                       "pairs array.");
-                       deltaxcn = (s64)-1;
-               }
-               /*
-                * Assume a negative length to indicate data corruption and
-                * hence clean up and return an error.
-                */
-               if (unlikely(deltaxcn < 0)) {
-                       ntfs_error(vol->sb, "Invalid length in mapping pairs "
-                                       "array.");
-                       goto err_out;
-               }
-               /*
-                * Enter the current run length into the current runlist
-                * element.
-                */
-               rl[rlpos].length = deltaxcn;
-               /* Increment the current vcn by the current run length. */
-               vcn += deltaxcn;
-               /*
-                * There might be no lcn change at all, as is the case for
-                * sparse clusters on NTFS 3.0+, in which case we set the lcn
-                * to LCN_HOLE.
-                */
-               if (!(*buf & 0xf0))
-                       rl[rlpos].lcn = LCN_HOLE;
-               else {
-                       /* Get the lcn change which really can be negative. */
-                       u8 b2 = *buf & 0xf;
-                       b = b2 + ((*buf >> 4) & 0xf);
-                       if (buf + b > attr_end)
-                               goto io_error;
-                       for (deltaxcn = (s8)buf[b--]; b > b2; b--)
-                               deltaxcn = (deltaxcn << 8) + buf[b];
-                       /* Change the current lcn to its new value. */
-                       lcn += deltaxcn;
-#ifdef DEBUG
-                       /*
-                        * On NTFS 1.2-, we apparently can have lcn == -1 to
-                        * indicate a hole. But we haven't verified ourselves
-                        * whether it is really the lcn or the deltaxcn that is
-                        * -1. So if either is found give us a message so we
-                        * can investigate it further!
-                        */
-                       if (vol->major_ver < 3) {
-                               if (unlikely(deltaxcn == (LCN)-1))
-                                       ntfs_error(vol->sb, "lcn delta == -1");
-                               if (unlikely(lcn == (LCN)-1))
-                                       ntfs_error(vol->sb, "lcn == -1");
-                       }
-#endif
-                       /* Check lcn is not below -1. */
-                       if (unlikely(lcn < (LCN)-1)) {
-                               ntfs_error(vol->sb, "Invalid LCN < -1 in "
-                                               "mapping pairs array.");
-                               goto err_out;
-                       }
-                       /* Enter the current lcn into the runlist element. */
-                       rl[rlpos].lcn = lcn;
-               }
-               /* Get to the next runlist element. */
-               rlpos++;
-               /* Increment the buffer position to the next mapping pair. */
-               buf += (*buf & 0xf) + ((*buf >> 4) & 0xf) + 1;
-       }
-       if (unlikely(buf >= attr_end))
-               goto io_error;
-       /*
-        * If there is a highest_vcn specified, it must be equal to the final
-        * vcn in the runlist - 1, or something has gone badly wrong.
-        */
-       deltaxcn = sle64_to_cpu(attr->data.non_resident.highest_vcn);
-       if (unlikely(deltaxcn && vcn - 1 != deltaxcn)) {
-mpa_err:
-               ntfs_error(vol->sb, "Corrupt mapping pairs array in "
-                               "non-resident attribute.");
-               goto err_out;
-       }
-       /* Set up a not-mapped runlist element if this is the base extent. */
-       if (!attr->data.non_resident.lowest_vcn) {
-               VCN max_cluster;
-
-               max_cluster = ((sle64_to_cpu(
-                               attr->data.non_resident.allocated_size) +
-                               vol->cluster_size - 1) >>
-                               vol->cluster_size_bits) - 1;
-               /*
-                * A highest_vcn of zero means this is a single extent
-                * attribute so simply terminate the runlist with LCN_ENOENT.
-                */
-               if (deltaxcn) {
-                       /*
-                        * If there is a difference between the highest_vcn and
-                        * the highest cluster, the runlist is either corrupt
-                        * or, more likely, there are more extents following
-                        * this one.
-                        */
-                       if (deltaxcn < max_cluster) {
-                               ntfs_debug("More extents to follow; deltaxcn "
-                                               "= 0x%llx, max_cluster = "
-                                               "0x%llx",
-                                               (unsigned long long)deltaxcn,
-                                               (unsigned long long)
-                                               max_cluster);
-                               rl[rlpos].vcn = vcn;
-                               vcn += rl[rlpos].length = max_cluster -
-                                               deltaxcn;
-                               rl[rlpos].lcn = LCN_RL_NOT_MAPPED;
-                               rlpos++;
-                       } else if (unlikely(deltaxcn > max_cluster)) {
-                               ntfs_error(vol->sb, "Corrupt attribute.  "
-                                               "deltaxcn = 0x%llx, "
-                                               "max_cluster = 0x%llx",
-                                               (unsigned long long)deltaxcn,
-                                               (unsigned long long)
-                                               max_cluster);
-                               goto mpa_err;
-                       }
-               }
-               rl[rlpos].lcn = LCN_ENOENT;
-       } else /* Not the base extent. There may be more extents to follow. */
-               rl[rlpos].lcn = LCN_RL_NOT_MAPPED;
-
-       /* Set up the terminating runlist element. */
-       rl[rlpos].vcn = vcn;
-       rl[rlpos].length = (s64)0;
-       /* If no existing runlist was specified, we are done. */
-       if (!old_rl) {
-               ntfs_debug("Mapping pairs array successfully decompressed:");
-               ntfs_debug_dump_runlist(rl);
-               return rl;
-       }
-       /* Now combine the new and old runlists checking for overlaps. */
-       old_rl = ntfs_runlists_merge(old_rl, rl);
-       if (!IS_ERR(old_rl))
-               return old_rl;
-       ntfs_free(rl);
-       ntfs_error(vol->sb, "Failed to merge runlists.");
-       return old_rl;
-io_error:
-       ntfs_error(vol->sb, "Corrupt attribute.");
-err_out:
-       ntfs_free(rl);
-       return ERR_PTR(-EIO);
-}
-
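-/*
- * Editor's illustration, not part of the original driver: a minimal,
- * self-contained userspace sketch of how a single mapping pair is decoded
- * by the loop above.  The header byte's low nibble is the byte count of
- * the run length, the high nibble the byte count of the lcn delta (a zero
- * high nibble means a sparse run and is handled separately above); both
- * values are little-endian and sign-extended from their top byte, exactly
- * as done with deltaxcn above.
- */
-#include <stdint.h>
-#include <stdio.h>
-
-int main(void)
-{
-       /* Header 0x21: 1-byte run length, 2-byte lcn delta. */
-       const uint8_t pair[] = { 0x21, 0x18, 0x34, 0x56 };
-       const unsigned len_sz = pair[0] & 0xf;
-       const unsigned lcn_sz = (pair[0] >> 4) & 0xf;
-       int64_t length, delta;
-       unsigned i;
-
-       /* Sign-extend from the most significant stored byte. */
-       length = (int8_t)pair[len_sz];
-       for (i = len_sz - 1; i >= 1; i--)
-               length = (length << 8) | pair[i];
-       delta = (int8_t)pair[len_sz + lcn_sz];
-       for (i = len_sz + lcn_sz - 1; i > len_sz; i--)
-               delta = (delta << 8) | pair[i];
-
-       /* Prints: length 0x18, lcn delta 0x5634. */
-       printf("length 0x%llx, lcn delta 0x%llx\n",
-                       (long long)length, (long long)delta);
-       return 0;
-}
-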
-/**
- * ntfs_rl_vcn_to_lcn - convert a vcn into a lcn given a runlist
- * @rl:                runlist to use for conversion
- * @vcn:       vcn to convert
- *
- * Convert the virtual cluster number @vcn of an attribute into a logical
- * cluster number (lcn) of a device using the runlist @rl to map vcns to their
- * corresponding lcns.
- *
- * It is up to the caller to serialize access to the runlist @rl.
- *
- * Since lcns must be >= 0, we use negative return codes with special meaning:
- *
- * Return code         Meaning / Description
- * ==================================================
- *  LCN_HOLE           Hole / not allocated on disk.
- *  LCN_RL_NOT_MAPPED  This part of the attribute has not been mapped
- *                     into the runlist yet.
- *  LCN_ENOENT         There is no such vcn in the attribute.
- *
- * Locking: - The caller must have locked the runlist (for reading or writing).
- *         - This function does not touch the lock, nor does it modify the
- *           runlist.
- */
-LCN ntfs_rl_vcn_to_lcn(const runlist_element *rl, const VCN vcn)
-{
-       int i;
-
-       BUG_ON(vcn < 0);
-       /*
-        * If rl is NULL, assume that we have found an unmapped runlist. The
-        * caller can then attempt to map it and fail appropriately if
-        * necessary.
-        */
-       if (unlikely(!rl))
-               return LCN_RL_NOT_MAPPED;
-
-       /* Catch a vcn below the start of the runlist. */
-       if (unlikely(vcn < rl[0].vcn))
-               return LCN_ENOENT;
-
-       for (i = 0; likely(rl[i].length); i++) {
-               if (unlikely(vcn < rl[i+1].vcn)) {
-                       if (likely(rl[i].lcn >= (LCN)0))
-                               return rl[i].lcn + (vcn - rl[i].vcn);
-                       return rl[i].lcn;
-               }
-       }
-       /*
-        * The terminator element is set up to the correct value, i.e. one of
-        * LCN_HOLE, LCN_RL_NOT_MAPPED, or LCN_ENOENT.
-        */
-       if (likely(rl[i].lcn < (LCN)0))
-               return rl[i].lcn;
-       /* Just in case... We could replace this with BUG() some day. */
-       return LCN_ENOENT;
-}
-
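-/*
- * Editor's sketch of a hypothetical caller, not from the original driver:
- * the runlist lock must be held around the lookup and each special
- * negative return value handled explicitly; "ni" is an ntfs inode whose
- * runlist has already been mapped.
- */
-static LCN example_vcn_lookup(ntfs_inode *ni, const VCN vcn)
-{
-       LCN lcn;
-
-       down_read(&ni->runlist.lock);
-       lcn = ntfs_rl_vcn_to_lcn(ni->runlist.rl, vcn);
-       up_read(&ni->runlist.lock);
-       if (lcn == LCN_HOLE)
-               ;       /* Sparse: reads as zeroes, allocate on write. */
-       else if (lcn == LCN_RL_NOT_MAPPED)
-               ;       /* Map this part of the runlist and retry. */
-       else if (lcn == LCN_ENOENT)
-               ;       /* The vcn is beyond the end of the attribute. */
-       return lcn;
-}
-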
-#ifdef NTFS_RW
-
-/**
- * ntfs_rl_find_vcn_nolock - find a vcn in a runlist
- * @rl:                runlist to search
- * @vcn:       vcn to find
- *
- * Find the virtual cluster number @vcn in the runlist @rl and return the
- * address of the runlist element containing the @vcn on success.
- *
- * Return NULL if @rl is NULL, or if @vcn lies in an unmapped part of the
- * runlist or out of its bounds.
- *
- * Locking: The runlist must be locked on entry.
- */
-runlist_element *ntfs_rl_find_vcn_nolock(runlist_element *rl, const VCN vcn)
-{
-       BUG_ON(vcn < 0);
-       if (unlikely(!rl || vcn < rl[0].vcn))
-               return NULL;
-       while (likely(rl->length)) {
-               if (unlikely(vcn < rl[1].vcn)) {
-                       if (likely(rl->lcn >= LCN_HOLE))
-                               return rl;
-                       return NULL;
-               }
-               rl++;
-       }
-       if (likely(rl->lcn == LCN_ENOENT))
-               return rl;
-       return NULL;
-}
-
-/**
- * ntfs_get_nr_significant_bytes - get number of bytes needed to store a number
- * @n:         number for which to get the number of bytes
- *
- * Return the number of bytes required to store @n unambiguously as
- * a signed number.
- *
- * This is used in the context of the mapping pairs array to determine how
- * many bytes will be needed in the array to store a given logical cluster
- * number (lcn) or a specific run length.
- *
- * Return the number of bytes required.  This function cannot fail.
- */
-static inline int ntfs_get_nr_significant_bytes(const s64 n)
-{
-       s64 l = n;
-       int i;
-       s8 j;
-
-       i = 0;
-       do {
-               l >>= 8;
-               i++;
-       } while (l != 0 && l != -1);
-       j = (n >> 8 * (i - 1)) & 0xff;
-       /* If the sign bit is wrong, we need an extra byte. */
-       if ((n < 0 && j >= 0) || (n > 0 && j < 0))
-               i++;
-       return i;
-}
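-
-/*
- * Editor's note, not from the original source: a self-contained userspace
- * restatement of the rule above with worked examples.  0x7f fits in one
- * byte, but 0x80 needs two, as its top bit would otherwise be read as a
- * sign bit; symmetrically, -0x80 fits in one byte but -0x81 needs two.
- */
-#include <assert.h>
-#include <stdint.h>
-
-static int nr_significant_bytes(int64_t n)
-{
-       int64_t l = n;
-       int i = 0;
-       int8_t j;
-
-       do {
-               l >>= 8;        /* Arithmetic shift, as the kernel assumes. */
-               i++;
-       } while (l != 0 && l != -1);
-       j = (n >> 8 * (i - 1)) & 0xff;
-       if ((n < 0 && j >= 0) || (n > 0 && j < 0))
-               i++;            /* Stored sign bit would be wrong. */
-       return i;
-}
-
-int main(void)
-{
-       assert(nr_significant_bytes(0x7f) == 1);
-       assert(nr_significant_bytes(0x80) == 2);
-       assert(nr_significant_bytes(-0x80) == 1);
-       assert(nr_significant_bytes(-0x81) == 2);
-       return 0;
-}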
-
-/**
- * ntfs_get_size_for_mapping_pairs - get bytes needed for mapping pairs array
- * @vol:       ntfs volume (needed for the ntfs version)
- * @rl:                locked runlist for which to determine the mapping pairs size
- * @first_vcn: first vcn to include in the mapping pairs array
- * @last_vcn:  last vcn to include in the mapping pairs array
- *
- * Walk the locked runlist @rl and calculate the size in bytes of the mapping
- * pairs array corresponding to the runlist @rl, starting at vcn @first_vcn and
- * finishing with vcn @last_vcn.
- *
- * A @last_vcn of -1 means end of runlist and in that case the size of the
- * mapping pairs array corresponding to the runlist starting at vcn @first_vcn
- * and finishing at the end of the runlist is determined.
- *
- * This for example allows us to allocate a buffer of the right size when
- * building the mapping pairs array.
- *
- * If @rl is NULL, just return 1 (for the single terminator byte).
- *
- * Return the calculated size in bytes on success.  On error, return -errno.
- * The following error codes are defined:
- *     -EINVAL - Runlist contains unmapped elements.  Make sure to only pass
- *               fully mapped runlists to this function.
- *     -EIO    - The runlist is corrupt.
- *
- * Locking: @rl must be locked on entry (either for reading or writing), it
- *         remains locked throughout, and is left locked upon return.
- */
-int ntfs_get_size_for_mapping_pairs(const ntfs_volume *vol,
-               const runlist_element *rl, const VCN first_vcn,
-               const VCN last_vcn)
-{
-       LCN prev_lcn;
-       int rls;
-       bool the_end = false;
-
-       BUG_ON(first_vcn < 0);
-       BUG_ON(last_vcn < -1);
-       BUG_ON(last_vcn >= 0 && first_vcn > last_vcn);
-       if (!rl) {
-               BUG_ON(first_vcn);
-               BUG_ON(last_vcn > 0);
-               return 1;
-       }
-       /* Skip to runlist element containing @first_vcn. */
-       while (rl->length && first_vcn >= rl[1].vcn)
-               rl++;
-       if (unlikely((!rl->length && first_vcn > rl->vcn) ||
-                       first_vcn < rl->vcn))
-               return -EINVAL;
-       prev_lcn = 0;
-       /* Always need the terminating zero byte. */
-       rls = 1;
-       /* Do the first partial run if present. */
-       if (first_vcn > rl->vcn) {
-               s64 delta, length = rl->length;
-
-               /* We know rl->length != 0 already. */
-               if (unlikely(length < 0 || rl->lcn < LCN_HOLE))
-                       goto err_out;
-               /*
-                * If @last_vcn is given and it falls inside this run, cap the
-                * run length.
-                */
-               if (unlikely(last_vcn >= 0 && rl[1].vcn > last_vcn)) {
-                       s64 s1 = last_vcn + 1;
-                       if (unlikely(rl[1].vcn > s1))
-                               length = s1 - rl->vcn;
-                       the_end = true;
-               }
-               delta = first_vcn - rl->vcn;
-               /* Header byte + length. */
-               rls += 1 + ntfs_get_nr_significant_bytes(length - delta);
-               /*
-                * If the logical cluster number (lcn) denotes a hole and we
-                * are on NTFS 3.0+, we don't store it at all, i.e. we need
-                * zero space.  On earlier NTFS versions we just store the lcn.
-                * Note: this assumes that on NTFS 1.2-, holes are stored with
-                * an lcn of -1 and not a delta_lcn of -1 (unless both are -1).
-                */
-               if (likely(rl->lcn >= 0 || vol->major_ver < 3)) {
-                       prev_lcn = rl->lcn;
-                       if (likely(rl->lcn >= 0))
-                               prev_lcn += delta;
-                       /* Change in lcn. */
-                       rls += ntfs_get_nr_significant_bytes(prev_lcn);
-               }
-               /* Go to next runlist element. */
-               rl++;
-       }
-       /* Do the full runs. */
-       for (; rl->length && !the_end; rl++) {
-               s64 length = rl->length;
-
-               if (unlikely(length < 0 || rl->lcn < LCN_HOLE))
-                       goto err_out;
-               /*
-                * If @last_vcn is given and it falls inside this run, cap the
-                * run length.
-                */
-               if (unlikely(last_vcn >= 0 && rl[1].vcn > last_vcn)) {
-                       s64 s1 = last_vcn + 1;
-                       if (unlikely(rl[1].vcn > s1))
-                               length = s1 - rl->vcn;
-                       the_end = true;
-               }
-               /* Header byte + length. */
-               rls += 1 + ntfs_get_nr_significant_bytes(length);
-               /*
-                * If the logical cluster number (lcn) denotes a hole and we
-                * are on NTFS 3.0+, we don't store it at all, i.e. we need
-                * zero space.  On earlier NTFS versions we just store the lcn.
-                * Note: this assumes that on NTFS 1.2-, holes are stored with
-                * an lcn of -1 and not a delta_lcn of -1 (unless both are -1).
-                */
-               if (likely(rl->lcn >= 0 || vol->major_ver < 3)) {
-                       /* Change in lcn. */
-                       rls += ntfs_get_nr_significant_bytes(rl->lcn -
-                                       prev_lcn);
-                       prev_lcn = rl->lcn;
-               }
-       }
-       return rls;
-err_out:
-       if (rl->lcn == LCN_RL_NOT_MAPPED)
-               rls = -EINVAL;
-       else
-               rls = -EIO;
-       return rls;
-}
-
-/**
- * ntfs_write_significant_bytes - write the significant bytes of a number
- * @dst:       destination buffer to write to
- * @dst_max:   pointer to last byte of destination buffer for bounds checking
- * @n:         number whose significant bytes to write
- *
- * Store in @dst the minimum number of bytes of @n which are required to
- * identify @n unambiguously as a signed number, taking care not to exceed
- * @dst_max, the maximum position within @dst to which we are allowed to
- * write.
- *
- * This is used when building the mapping pairs array of a runlist to compress
- * a given logical cluster number (lcn) or a specific run length to the minimum
- * size possible.
- *
- * Return the number of bytes written on success.  On error, i.e. if the
- * destination buffer @dst is too small, return -ENOSPC.
- */
-static inline int ntfs_write_significant_bytes(s8 *dst, const s8 *dst_max,
-               const s64 n)
-{
-       s64 l = n;
-       int i;
-       s8 j;
-
-       i = 0;
-       do {
-               if (unlikely(dst > dst_max))
-                       goto err_out;
-               *dst++ = l & 0xffll;
-               l >>= 8;
-               i++;
-       } while (l != 0 && l != -1);
-       j = (n >> 8 * (i - 1)) & 0xff;
-       /* If the sign bit is wrong, we need an extra byte. */
-       if (n < 0 && j >= 0) {
-               if (unlikely(dst > dst_max))
-                       goto err_out;
-               i++;
-               *dst = (s8)-1;
-       } else if (n > 0 && j < 0) {
-               if (unlikely(dst > dst_max))
-                       goto err_out;
-               i++;
-               *dst = (s8)0;
-       }
-       return i;
-err_out:
-       return -ENOSPC;
-}
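-
-/*
- * Editor's illustration, not from the original source: a userspace
- * counterpart of the encoder above, minus the bounds checking.  0x5634 is
- * stored as the little-endian bytes 0x34 0x56; -2 is stored as the single
- * byte 0xfe, since sign extension recovers the rest; 0x80 needs an extra
- * 0x00 byte so it is not misread as negative.
- */
-#include <assert.h>
-#include <stdint.h>
-
-static int put_significant_bytes(int8_t *dst, int64_t n)
-{
-       int64_t l = n;
-       int i = 0;
-
-       do {
-               dst[i++] = l & 0xff;
-               l >>= 8;
-       } while (l != 0 && l != -1);
-       /* Append a sign byte if the top stored bit misrepresents @n. */
-       if (n < 0 && !(dst[i - 1] & 0x80))
-               dst[i++] = -1;
-       else if (n > 0 && (dst[i - 1] & 0x80))
-               dst[i++] = 0;
-       return i;
-}
-
-int main(void)
-{
-       int8_t b[8];
-
-       assert(put_significant_bytes(b, 0x5634) == 2 &&
-                       (uint8_t)b[0] == 0x34 && (uint8_t)b[1] == 0x56);
-       assert(put_significant_bytes(b, -2) == 1 && (uint8_t)b[0] == 0xfe);
-       assert(put_significant_bytes(b, 0x80) == 2 && (uint8_t)b[1] == 0x00);
-       return 0;
-}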
-
-/**
- * ntfs_mapping_pairs_build - build the mapping pairs array from a runlist
- * @vol:       ntfs volume (needed for the ntfs version)
- * @dst:       destination buffer to which to write the mapping pairs array
- * @dst_len:   size of destination buffer @dst in bytes
- * @rl:                locked runlist for which to build the mapping pairs array
- * @first_vcn: first vcn to include in the mapping pairs array
- * @last_vcn:  last vcn to include in the mapping pairs array
- * @stop_vcn:  first vcn outside destination buffer on success or -ENOSPC
- *
- * Create the mapping pairs array from the locked runlist @rl, starting at vcn
- * @first_vcn and finishing with vcn @last_vcn and save the array in @dst.
- * @dst_len is the size of @dst in bytes and it should be at least equal to the
- * value obtained by calling ntfs_get_size_for_mapping_pairs().
- *
- * A @last_vcn of -1 means end of runlist and in that case the mapping pairs
- * array corresponding to the runlist starting at vcn @first_vcn and finishing
- * at the end of the runlist is created.
- *
- * If @rl is NULL, just write a single terminator byte to @dst.
- *
- * On success or -ENOSPC error, if @stop_vcn is not NULL, *@stop_vcn is set to
- * the first vcn outside the destination buffer.  Note that on -ENOSPC, @dst
- * has been filled with all the mapping pairs that fit, so it can be treated
- * as a partial success: create a new attribute extent (or use the next
- * extent) and continue the mapping pairs build with @first_vcn set to
- * *@stop_vcn.
- *
- * Return 0 on success and -errno on error.  The following error codes are
- * defined:
- *     -EINVAL - Runlist contains unmapped elements.  Make sure to only pass
- *               fully mapped runlists to this function.
- *     -EIO    - The runlist is corrupt.
- *     -ENOSPC - The destination buffer is too small.
- *
- * Locking: @rl must be locked on entry (either for reading or writing), it
- *         remains locked throughout, and is left locked upon return.
- */
-int ntfs_mapping_pairs_build(const ntfs_volume *vol, s8 *dst,
-               const int dst_len, const runlist_element *rl,
-               const VCN first_vcn, const VCN last_vcn, VCN *const stop_vcn)
-{
-       LCN prev_lcn;
-       s8 *dst_max, *dst_next;
-       int err = -ENOSPC;
-       bool the_end = false;
-       s8 len_len, lcn_len;
-
-       BUG_ON(first_vcn < 0);
-       BUG_ON(last_vcn < -1);
-       BUG_ON(last_vcn >= 0 && first_vcn > last_vcn);
-       BUG_ON(dst_len < 1);
-       if (!rl) {
-               BUG_ON(first_vcn);
-               BUG_ON(last_vcn > 0);
-               if (stop_vcn)
-                       *stop_vcn = 0;
-               /* Terminator byte. */
-               *dst = 0;
-               return 0;
-       }
-       /* Skip to runlist element containing @first_vcn. */
-       while (rl->length && first_vcn >= rl[1].vcn)
-               rl++;
-       if (unlikely((!rl->length && first_vcn > rl->vcn) ||
-                       first_vcn < rl->vcn))
-               return -EINVAL;
-       /*
-        * @dst_max is used for bounds checking in
-        * ntfs_write_significant_bytes().
-        */
-       dst_max = dst + dst_len - 1;
-       prev_lcn = 0;
-       /* Do the first partial run if present. */
-       if (first_vcn > rl->vcn) {
-               s64 delta, length = rl->length;
-
-               /* We know rl->length != 0 already. */
-               if (unlikely(length < 0 || rl->lcn < LCN_HOLE))
-                       goto err_out;
-               /*
-                * If @last_vcn is given and it falls inside this run, cap the
-                * run length.
-                */
-               if (unlikely(last_vcn >= 0 && rl[1].vcn > last_vcn)) {
-                       s64 s1 = last_vcn + 1;
-                       if (unlikely(rl[1].vcn > s1))
-                               length = s1 - rl->vcn;
-                       the_end = true;
-               }
-               delta = first_vcn - rl->vcn;
-               /* Write length. */
-               len_len = ntfs_write_significant_bytes(dst + 1, dst_max,
-                               length - delta);
-               if (unlikely(len_len < 0))
-                       goto size_err;
-               /*
-                * If the logical cluster number (lcn) denotes a hole and we
-                * are on NTFS 3.0+, we don't store it at all, i.e. we need
-                * zero space.  On earlier NTFS versions we just write the lcn
-                * change.  FIXME: Do we need to write the lcn change or just
-                * the lcn in that case?  Not sure as I have never seen this
-                * case on NT4. - We assume that we just need to write the lcn
-                * change until someone tells us otherwise... (AIA)
-                */
-               if (likely(rl->lcn >= 0 || vol->major_ver < 3)) {
-                       prev_lcn = rl->lcn;
-                       if (likely(rl->lcn >= 0))
-                               prev_lcn += delta;
-                       /* Write change in lcn. */
-                       lcn_len = ntfs_write_significant_bytes(dst + 1 +
-                                       len_len, dst_max, prev_lcn);
-                       if (unlikely(lcn_len < 0))
-                               goto size_err;
-               } else
-                       lcn_len = 0;
-               dst_next = dst + len_len + lcn_len + 1;
-               if (unlikely(dst_next > dst_max))
-                       goto size_err;
-               /* Update header byte. */
-               *dst = lcn_len << 4 | len_len;
-               /* Position at next mapping pairs array element. */
-               dst = dst_next;
-               /* Go to next runlist element. */
-               rl++;
-       }
-       /* Do the full runs. */
-       for (; rl->length && !the_end; rl++) {
-               s64 length = rl->length;
-
-               if (unlikely(length < 0 || rl->lcn < LCN_HOLE))
-                       goto err_out;
-               /*
-                * If @last_vcn is given and it falls inside this run, cap the
-                * run length.
-                */
-               if (unlikely(last_vcn >= 0 && rl[1].vcn > last_vcn)) {
-                       s64 s1 = last_vcn + 1;
-                       if (unlikely(rl[1].vcn > s1))
-                               length = s1 - rl->vcn;
-                       the_end = true;
-               }
-               /* Write length. */
-               len_len = ntfs_write_significant_bytes(dst + 1, dst_max,
-                               length);
-               if (unlikely(len_len < 0))
-                       goto size_err;
-               /*
-                * If the logical cluster number (lcn) denotes a hole and we
-                * are on NTFS 3.0+, we don't store it at all, i.e. we need
-                * zero space.  On earlier NTFS versions we just write the lcn
-                * change.  FIXME: Do we need to write the lcn change or just
-                * the lcn in that case?  Not sure as I have never seen this
-                * case on NT4. - We assume that we just need to write the lcn
-                * change until someone tells us otherwise... (AIA)
-                */
-               if (likely(rl->lcn >= 0 || vol->major_ver < 3)) {
-                       /* Write change in lcn. */
-                       lcn_len = ntfs_write_significant_bytes(dst + 1 +
-                                       len_len, dst_max, rl->lcn - prev_lcn);
-                       if (unlikely(lcn_len < 0))
-                               goto size_err;
-                       prev_lcn = rl->lcn;
-               } else
-                       lcn_len = 0;
-               dst_next = dst + len_len + lcn_len + 1;
-               if (unlikely(dst_next > dst_max))
-                       goto size_err;
-               /* Update header byte. */
-               *dst = lcn_len << 4 | len_len;
-               /* Position at next mapping pairs array element. */
-               dst = dst_next;
-       }
-       /* Success. */
-       err = 0;
-size_err:
-       /* Set stop vcn. */
-       if (stop_vcn)
-               *stop_vcn = rl->vcn;
-       /* Add terminator byte. */
-       *dst = 0;
-       return err;
-err_out:
-       if (rl->lcn == LCN_RL_NOT_MAPPED)
-               err = -EINVAL;
-       else
-               err = -EIO;
-       return err;
-}
-
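-/*
- * Editor's sketch of the intended calling sequence (hypothetical helper,
- * not from the original driver): size the buffer first, then build the
- * array.  On -ENOSPC the build is a partial success and can be continued
- * from *stop_vcn in a new attribute extent.
- */
-static int example_build_mapping_pairs(const ntfs_volume *vol,
-               const runlist_element *rl, s8 **out, int *out_len)
-{
-       VCN stop_vcn;
-       int rls, err;
-       s8 *dst;
-
-       rls = ntfs_get_size_for_mapping_pairs(vol, rl, 0, -1);
-       if (rls < 0)
-               return rls;
-       dst = ntfs_malloc_nofs(rls);
-       if (!dst)
-               return -ENOMEM;
-       err = ntfs_mapping_pairs_build(vol, dst, rls, rl, 0, -1, &stop_vcn);
-       if (err) {
-               ntfs_free(dst);
-               return err;
-       }
-       *out = dst;
-       *out_len = rls;
-       return 0;
-}
-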
-/**
- * ntfs_rl_truncate_nolock - truncate a runlist starting at a specified vcn
- * @vol:       ntfs volume (needed for error output)
- * @runlist:   runlist to truncate
- * @new_length:        the new length of the runlist in VCNs
- *
- * Truncate the runlist described by @runlist as well as the memory buffer
- * holding the runlist elements to a length of @new_length VCNs.
- *
- * If @new_length lies within the runlist, the runlist elements with VCNs of
- * @new_length and above are discarded.  As a special case if @new_length is
- * zero, the runlist is discarded and set to NULL.
- *
- * If @new_length lies beyond the runlist, a sparse runlist element is added to
- * the end of the runlist @runlist or if the last runlist element is a sparse
- * one already, this is extended.
- *
- * Note, no checking is done for unmapped runlist elements.  It is assumed that
- * the caller has mapped any elements that need to be mapped already.
- *
- * Return 0 on success and -errno on error.
- *
- * Locking: The caller must hold @runlist->lock for writing.
- */
-int ntfs_rl_truncate_nolock(const ntfs_volume *vol, runlist *const runlist,
-               const s64 new_length)
-{
-       runlist_element *rl;
-       int old_size;
-
-       ntfs_debug("Entering for new_length 0x%llx.", (long long)new_length);
-       BUG_ON(!runlist);
-       BUG_ON(new_length < 0);
-       rl = runlist->rl;
-       if (!new_length) {
-               ntfs_debug("Freeing runlist.");
-               runlist->rl = NULL;
-               if (rl)
-                       ntfs_free(rl);
-               return 0;
-       }
-       if (unlikely(!rl)) {
-               /*
-                * Create a runlist consisting of a sparse runlist element of
-                * length @new_length followed by a terminator runlist element.
-                */
-               rl = ntfs_malloc_nofs(PAGE_SIZE);
-               if (unlikely(!rl)) {
-                       ntfs_error(vol->sb, "Not enough memory to allocate "
-                                       "runlist element buffer.");
-                       return -ENOMEM;
-               }
-               runlist->rl = rl;
-               rl[1].length = rl->vcn = 0;
-               rl->lcn = LCN_HOLE;
-               rl[1].vcn = rl->length = new_length;
-               rl[1].lcn = LCN_ENOENT;
-               return 0;
-       }
-       BUG_ON(new_length < rl->vcn);
-       /* Find @new_length in the runlist. */
-       while (likely(rl->length && new_length >= rl[1].vcn))
-               rl++;
-       /*
-        * If not at the end of the runlist we need to shrink it.
-        * If at the end of the runlist we need to expand it.
-        */
-       if (rl->length) {
-               runlist_element *trl;
-               bool is_end;
-
-               ntfs_debug("Shrinking runlist.");
-               /* Determine the runlist size. */
-               trl = rl + 1;
-               while (likely(trl->length))
-                       trl++;
-               old_size = trl - runlist->rl + 1;
-               /* Truncate the run. */
-               rl->length = new_length - rl->vcn;
-               /*
-                * If a run was partially truncated, make the following runlist
-                * element a terminator.
-                */
-               is_end = false;
-               if (rl->length) {
-                       rl++;
-                       if (!rl->length)
-                               is_end = true;
-                       rl->vcn = new_length;
-                       rl->length = 0;
-               }
-               rl->lcn = LCN_ENOENT;
-               /* Reallocate memory if necessary. */
-               if (!is_end) {
-                       int new_size = rl - runlist->rl + 1;
-                       rl = ntfs_rl_realloc(runlist->rl, old_size, new_size);
-                       if (IS_ERR(rl))
-                               ntfs_warning(vol->sb, "Failed to shrink "
-                                               "runlist buffer.  This just "
-                                               "wastes a bit of memory "
-                                               "temporarily so we ignore it "
-                                               "and return success.");
-                       else
-                               runlist->rl = rl;
-               }
-       } else if (likely(/* !rl->length && */ new_length > rl->vcn)) {
-               ntfs_debug("Expanding runlist.");
-               /*
-                * If there is a previous runlist element and it is a sparse
-                * one, extend it.  Otherwise need to add a new, sparse runlist
-                * element.
-                */
-               if ((rl > runlist->rl) && ((rl - 1)->lcn == LCN_HOLE))
-                       (rl - 1)->length = new_length - (rl - 1)->vcn;
-               else {
-                       /* Determine the runlist size. */
-                       old_size = rl - runlist->rl + 1;
-                       /* Reallocate memory if necessary. */
-                       rl = ntfs_rl_realloc(runlist->rl, old_size,
-                                       old_size + 1);
-                       if (IS_ERR(rl)) {
-                               ntfs_error(vol->sb, "Failed to expand runlist "
-                                               "buffer, aborting.");
-                               return PTR_ERR(rl);
-                       }
-                       runlist->rl = rl;
-                       /*
-                        * Set @rl to the same runlist element in the new
-                        * runlist as before in the old runlist.
-                        */
-                       rl += old_size - 1;
-                       /* Add a new, sparse runlist element. */
-                       rl->lcn = LCN_HOLE;
-                       rl->length = new_length - rl->vcn;
-                       /* Add a new terminator runlist element. */
-                       rl++;
-                       rl->length = 0;
-               }
-               rl->vcn = new_length;
-               rl->lcn = LCN_ENOENT;
-       } else /* if (unlikely(!rl->length && new_length == rl->vcn)) */ {
-               /* Runlist already has same size as requested. */
-               rl->lcn = LCN_ENOENT;
-       }
-       ntfs_debug("Done.");
-       return 0;
-}
-
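-/*
- * Editor's sketch of a hypothetical caller, not from the original driver:
- * truncation requires the runlist lock to be held for writing.
- */
-static int example_truncate(ntfs_inode *ni, const s64 new_length)
-{
-       int err;
-
-       down_write(&ni->runlist.lock);
-       err = ntfs_rl_truncate_nolock(ni->vol, &ni->runlist, new_length);
-       up_write(&ni->runlist.lock);
-       return err;
-}
-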
-/**
- * ntfs_rl_punch_nolock - punch a hole into a runlist
- * @vol:       ntfs volume (needed for error output)
- * @runlist:   runlist to punch a hole into
- * @start:     starting VCN of the hole to be created
- * @length:    size of the hole to be created in units of clusters
- *
- * Punch a hole into the runlist @runlist starting at VCN @start and of size
- * @length clusters.
- *
- * Return 0 on success and -errno on error, in which case @runlist has not been
- * modified.
- *
- * If @start and/or @start + @length are outside the runlist return error code
- * -ENOENT.
- *
- * If the runlist contains unmapped or error elements between @start and @start
- * + @length return error code -EINVAL.
- *
- * Locking: The caller must hold @runlist->lock for writing.
- */
-int ntfs_rl_punch_nolock(const ntfs_volume *vol, runlist *const runlist,
-               const VCN start, const s64 length)
-{
-       const VCN end = start + length;
-       s64 delta;
-       runlist_element *rl, *rl_end, *rl_real_end, *trl;
-       int old_size;
-       bool lcn_fixup = false;
-
-       ntfs_debug("Entering for start 0x%llx, length 0x%llx.",
-                       (long long)start, (long long)length);
-       BUG_ON(!runlist);
-       BUG_ON(start < 0);
-       BUG_ON(length < 0);
-       BUG_ON(end < 0);
-       rl = runlist->rl;
-       if (unlikely(!rl)) {
-               if (likely(!start && !length))
-                       return 0;
-               return -ENOENT;
-       }
-       /* Find @start in the runlist. */
-       while (likely(rl->length && start >= rl[1].vcn))
-               rl++;
-       rl_end = rl;
-       /* Find @end in the runlist. */
-       while (likely(rl_end->length && end >= rl_end[1].vcn)) {
-               /* Verify there are no unmapped or error elements. */
-               if (unlikely(rl_end->lcn < LCN_HOLE))
-                       return -EINVAL;
-               rl_end++;
-       }
-       /* Check the last element. */
-       if (unlikely(rl_end->length && rl_end->lcn < LCN_HOLE))
-               return -EINVAL;
-       /* This covers @start being out of bounds, too. */
-       if (!rl_end->length && end > rl_end->vcn)
-               return -ENOENT;
-       if (!length)
-               return 0;
-       if (!rl->length)
-               return -ENOENT;
-       rl_real_end = rl_end;
-       /* Determine the runlist size. */
-       while (likely(rl_real_end->length))
-               rl_real_end++;
-       old_size = rl_real_end - runlist->rl + 1;
-       /* If @start is in a hole simply extend the hole. */
-       if (rl->lcn == LCN_HOLE) {
-               /*
-                * If both @start and @end are in the same sparse run, we are
-                * done.
-                */
-               if (end <= rl[1].vcn) {
-                       ntfs_debug("Done (requested hole is already sparse).");
-                       return 0;
-               }
-extend_hole:
-               /* Extend the hole. */
-               rl->length = end - rl->vcn;
-               /* If @end is in a hole, merge it with the current one. */
-               if (rl_end->lcn == LCN_HOLE) {
-                       rl_end++;
-                       rl->length = rl_end->vcn - rl->vcn;
-               }
-               /* The hole is done.  Now deal with the remaining tail. */
-               rl++;
-               /* Cut out all runlist elements up to @end. */
-               if (rl < rl_end)
-                       memmove(rl, rl_end, (rl_real_end - rl_end + 1) *
-                                       sizeof(*rl));
-               /* Adjust the beginning of the tail if necessary. */
-               if (end > rl->vcn) {
-                       delta = end - rl->vcn;
-                       rl->vcn = end;
-                       rl->length -= delta;
-                       /* Only adjust the lcn if it is real. */
-                       if (rl->lcn >= 0)
-                               rl->lcn += delta;
-               }
-shrink_allocation:
-               /* Reallocate memory if the allocation changed. */
-               if (rl < rl_end) {
-                       rl = ntfs_rl_realloc(runlist->rl, old_size,
-                                       old_size - (rl_end - rl));
-                       if (IS_ERR(rl))
-                               ntfs_warning(vol->sb, "Failed to shrink "
-                                               "runlist buffer.  This just "
-                                               "wastes a bit of memory "
-                                               "temporarily so we ignore it "
-                                               "and return success.");
-                       else
-                               runlist->rl = rl;
-               }
-               ntfs_debug("Done (extend hole).");
-               return 0;
-       }
-       /*
-        * If @start is at the beginning of a run things are easier as there is
-        * no need to split the first run.
-        */
-       if (start == rl->vcn) {
-               /*
-                * @start is at the beginning of a run.
-                *
-                * If the previous run is sparse, extend its hole.
-                *
-                * If @end is not in the same run, switch the run to be sparse
-                * and extend the newly created hole.
-                *
-                * Thus both of these cases reduce the problem to the above
-                * case of "@start is in a hole".
-                */
-               if (rl > runlist->rl && (rl - 1)->lcn == LCN_HOLE) {
-                       rl--;
-                       goto extend_hole;
-               }
-               if (end >= rl[1].vcn) {
-                       rl->lcn = LCN_HOLE;
-                       goto extend_hole;
-               }
-               /*
-                * The final case is when @end is in the same run as @start.
-                * For this we need to split the run into two: one run for
-                * the sparse region between @start (the beginning of the old
-                * run) and @end, and one for the remaining non-sparse region
-                * between @end and the end of the old run.
-                */
-               trl = ntfs_rl_realloc(runlist->rl, old_size, old_size + 1);
-               if (IS_ERR(trl))
-                       goto enomem_out;
-               old_size++;
-               if (runlist->rl != trl) {
-                       rl = trl + (rl - runlist->rl);
-                       rl_end = trl + (rl_end - runlist->rl);
-                       rl_real_end = trl + (rl_real_end - runlist->rl);
-                       runlist->rl = trl;
-               }
-split_end:
-               /* Shift all the runs up by one. */
-               memmove(rl + 1, rl, (rl_real_end - rl + 1) * sizeof(*rl));
-               /* Finally, setup the two split runs. */
-               rl->lcn = LCN_HOLE;
-               rl->length = length;
-               rl++;
-               rl->vcn += length;
-               /* Only adjust the lcn if it is real. */
-               if (rl->lcn >= 0 || lcn_fixup)
-                       rl->lcn += length;
-               rl->length -= length;
-               ntfs_debug("Done (split one).");
-               return 0;
-       }
-       /*
-        * @start is neither in a hole nor at the beginning of a run.
-        *
-        * If @end is in a hole, things are easier: all that is needed is to
-        * truncate the run containing @start to end at @start - 1, delete all
-        * runs after that up to @end, and finally extend the beginning of the
-        * run containing @end back to @start.
-        */
-       if (rl_end->lcn == LCN_HOLE) {
-               /* Truncate the run containing @start. */
-               rl->length = start - rl->vcn;
-               rl++;
-               /* Cut out all runlist elements up to @end. */
-               if (rl < rl_end)
-                       memmove(rl, rl_end, (rl_real_end - rl_end + 1) *
-                                       sizeof(*rl));
-               /* Extend the beginning of the run @end is in to be @start. */
-               rl->vcn = start;
-               rl->length = rl[1].vcn - start;
-               goto shrink_allocation;
-       }
-       /*
-        * If @end is not in a hole there are still two cases to distinguish:
-        * either @end is, or it is not, in the same run as @start.
-        *
-        * The second case is easier as it can be reduced to an already solved
-        * problem by truncating the run containing @start to end at @start - 1.
-        * Then, if @end is in the next run, we need to split that run into a
-        * sparse run followed by a non-sparse run (already covered above); if
-        * @end is not in the next run, switching it to be sparse again reduces
-        * the problem to the already covered case of "@start is in a hole".
-        */
-       if (end >= rl[1].vcn) {
-               /*
-                * If @end is not in the next run, reduce the problem to the
-                * case of "@start is in a hole".
-                */
-               if (rl[1].length && end >= rl[2].vcn) {
-                       /* Truncate the run containing @start. */
-                       rl->length = start - rl->vcn;
-                       rl++;
-                       rl->vcn = start;
-                       rl->lcn = LCN_HOLE;
-                       goto extend_hole;
-               }
-               trl = ntfs_rl_realloc(runlist->rl, old_size, old_size + 1);
-               if (IS_ERR(trl))
-                       goto enomem_out;
-               old_size++;
-               if (runlist->rl != trl) {
-                       rl = trl + (rl - runlist->rl);
-                       rl_end = trl + (rl_end - runlist->rl);
-                       rl_real_end = trl + (rl_real_end - runlist->rl);
-                       runlist->rl = trl;
-               }
-               /* Truncate the run containing @start. */
-               rl->length = start - rl->vcn;
-               rl++;
-               /*
-                * @end is in the next run, reduce the problem to the case
-                * where "@start is at the beginning of a run and @end is in
-                * the same run as @start".
-                */
-               delta = rl->vcn - start;
-               rl->vcn = start;
-               if (rl->lcn >= 0) {
-                       rl->lcn -= delta;
-                       /* Need this in case the lcn just became negative. */
-                       lcn_fixup = true;
-               }
-               rl->length += delta;
-               goto split_end;
-       }
-       /*
-        * The first case from above, i.e. @end is in the same run as @start.
-        * We need to split the run into three.  One run for the non-sparse
-        * region between the beginning of the old run and @start, one for the
-        * sparse region between @start and @end, and one for the remaining
-        * non-sparse region, i.e. between @end and the end of the old run.
-        */
-       trl = ntfs_rl_realloc(runlist->rl, old_size, old_size + 2);
-       if (IS_ERR(trl))
-               goto enomem_out;
-       old_size += 2;
-       if (runlist->rl != trl) {
-               rl = trl + (rl - runlist->rl);
-               rl_end = trl + (rl_end - runlist->rl);
-               rl_real_end = trl + (rl_real_end - runlist->rl);
-               runlist->rl = trl;
-       }
-       /* Shift all the runs up by two. */
-       memmove(rl + 2, rl, (rl_real_end - rl + 1) * sizeof(*rl));
-       /* Finally, setup the three split runs. */
-       rl->length = start - rl->vcn;
-       rl++;
-       rl->vcn = start;
-       rl->lcn = LCN_HOLE;
-       rl->length = length;
-       rl++;
-       delta = end - rl->vcn;
-       rl->vcn = end;
-       rl->lcn += delta;
-       rl->length -= delta;
-       ntfs_debug("Done (split both).");
-       return 0;
-enomem_out:
-       ntfs_error(vol->sb, "Not enough memory to extend runlist buffer.");
-       return -ENOMEM;
-}
-
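-/*
- * Editor's illustration with hypothetical values, not from the original
- * driver: punching start = 2, length = 3 into the single run
- *
- *     { vcn 0, lcn 100, length 9 }
- *
- * splits it into three runs, as in the "split both" case above:
- *
- *     { vcn 0, lcn 100,      length 2 }
- *     { vcn 2, lcn LCN_HOLE, length 3 }
- *     { vcn 5, lcn 105,      length 4 }
- */
-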
-#endif /* NTFS_RW */
diff --git a/fs/ntfs/runlist.h b/fs/ntfs/runlist.h
deleted file mode 100644 (file)
index 38de0a3..0000000
+++ /dev/null
@@ -1,88 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * runlist.h - Defines for runlist handling in NTFS Linux kernel driver.
- *            Part of the Linux-NTFS project.
- *
- * Copyright (c) 2001-2005 Anton Altaparmakov
- * Copyright (c) 2002 Richard Russon
- */
-
-#ifndef _LINUX_NTFS_RUNLIST_H
-#define _LINUX_NTFS_RUNLIST_H
-
-#include "types.h"
-#include "layout.h"
-#include "volume.h"
-
-/**
- * runlist_element - in memory vcn to lcn mapping array element
- * @vcn:       starting vcn of the current array element
- * @lcn:       starting lcn of the current array element
- * @length:    length in clusters of the current array element
- *
- * The last vcn (in fact the last vcn + 1) is reached when length == 0.
- *
- * When lcn == -1 this means that the @length vcns starting at @vcn are not
- * physically allocated (i.e. this is a hole / the data is sparse).
- */
-typedef struct {       /* In memory vcn to lcn mapping structure element. */
-       VCN vcn;        /* vcn = Starting virtual cluster number. */
-       LCN lcn;        /* lcn = Starting logical cluster number. */
-       s64 length;     /* Run length in clusters. */
-} runlist_element;
-
-/**
- * runlist - in memory vcn to lcn mapping array including a read/write lock
- * @rl:                pointer to an array of runlist elements
- * @lock:      read/write semaphore for serializing access to @rl
- */
-typedef struct {
-       runlist_element *rl;
-       struct rw_semaphore lock;
-} runlist;
-
-static inline void ntfs_init_runlist(runlist *rl)
-{
-       rl->rl = NULL;
-       init_rwsem(&rl->lock);
-}
-
-typedef enum {
-       LCN_HOLE                = -1,   /* Keep this as highest value or die! */
-       LCN_RL_NOT_MAPPED       = -2,
-       LCN_ENOENT              = -3,
-       LCN_ENOMEM              = -4,
-       LCN_EIO                 = -5,
-} LCN_SPECIAL_VALUES;
-
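-/*
- * Editor's illustration, not part of the original header: a file with
- * four clusters at lcn 100, a two-cluster hole and three clusters at
- * lcn 200 would be described by the following (hypothetical) array:
- */
-static const runlist_element example_rl[] = {
-       { .vcn = 0, .lcn = 100,        .length = 4 },
-       { .vcn = 4, .lcn = LCN_HOLE,   .length = 2 },
-       { .vcn = 6, .lcn = 200,        .length = 3 },
-       { .vcn = 9, .lcn = LCN_ENOENT, .length = 0 },   /* Terminator. */
-};
-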
-extern runlist_element *ntfs_runlists_merge(runlist_element *drl,
-               runlist_element *srl);
-
-extern runlist_element *ntfs_mapping_pairs_decompress(const ntfs_volume *vol,
-               const ATTR_RECORD *attr, runlist_element *old_rl);
-
-extern LCN ntfs_rl_vcn_to_lcn(const runlist_element *rl, const VCN vcn);
-
-#ifdef NTFS_RW
-
-extern runlist_element *ntfs_rl_find_vcn_nolock(runlist_element *rl,
-               const VCN vcn);
-
-extern int ntfs_get_size_for_mapping_pairs(const ntfs_volume *vol,
-               const runlist_element *rl, const VCN first_vcn,
-               const VCN last_vcn);
-
-extern int ntfs_mapping_pairs_build(const ntfs_volume *vol, s8 *dst,
-               const int dst_len, const runlist_element *rl,
-               const VCN first_vcn, const VCN last_vcn, VCN *const stop_vcn);
-
-extern int ntfs_rl_truncate_nolock(const ntfs_volume *vol,
-               runlist *const runlist, const s64 new_length);
-
-int ntfs_rl_punch_nolock(const ntfs_volume *vol, runlist *const runlist,
-               const VCN start, const s64 length);
-
-#endif /* NTFS_RW */
-
-#endif /* _LINUX_NTFS_RUNLIST_H */
diff --git a/fs/ntfs/super.c b/fs/ntfs/super.c
deleted file mode 100644 (file)
index 56a7d5b..0000000
+++ /dev/null
@@ -1,3202 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * super.c - NTFS kernel super block handling. Part of the Linux-NTFS project.
- *
- * Copyright (c) 2001-2012 Anton Altaparmakov and Tuxera Inc.
- * Copyright (c) 2001,2002 Richard Russon
- */
-#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-
-#include <linux/stddef.h>
-#include <linux/init.h>
-#include <linux/slab.h>
-#include <linux/string.h>
-#include <linux/spinlock.h>
-#include <linux/blkdev.h>      /* For bdev_logical_block_size(). */
-#include <linux/backing-dev.h>
-#include <linux/buffer_head.h>
-#include <linux/vfs.h>
-#include <linux/moduleparam.h>
-#include <linux/bitmap.h>
-
-#include "sysctl.h"
-#include "logfile.h"
-#include "quota.h"
-#include "usnjrnl.h"
-#include "dir.h"
-#include "debug.h"
-#include "index.h"
-#include "inode.h"
-#include "aops.h"
-#include "layout.h"
-#include "malloc.h"
-#include "ntfs.h"
-
-/* Number of mounted filesystems which have compression enabled. */
-static unsigned long ntfs_nr_compression_users;
-
-/* A global default upcase table and a corresponding reference count. */
-static ntfschar *default_upcase;
-static unsigned long ntfs_nr_upcase_users;
-
-/* Error constants/strings used in inode.c::ntfs_show_options(). */
-typedef enum {
-       /* One of these must be present, default is ON_ERRORS_CONTINUE. */
-       ON_ERRORS_PANIC                 = 0x01,
-       ON_ERRORS_REMOUNT_RO            = 0x02,
-       ON_ERRORS_CONTINUE              = 0x04,
-       /* Optional, can be combined with any of the above. */
-       ON_ERRORS_RECOVER               = 0x10,
-} ON_ERRORS_ACTIONS;
-
-const option_t on_errors_arr[] = {
-       { ON_ERRORS_PANIC,      "panic" },
-       { ON_ERRORS_REMOUNT_RO, "remount-ro", },
-       { ON_ERRORS_CONTINUE,   "continue", },
-       { ON_ERRORS_RECOVER,    "recover" },
-       { 0,                    NULL }
-};
-
-/**
- * simple_getbool - convert input string to a boolean value
- * @s: input string to convert
- * @setval: where to store the output boolean value
- *
- * Copied from old ntfs driver (which copied from vfat driver).
- *
- * "1", "yes", "true", or an empty string are converted to %true.
- * "0", "no", and "false" are converted to %false.
- *
- * Return: %1 if the string is converted or was empty and *setval contains it;
- *        %0 if the string was not valid.
- */
-static int simple_getbool(char *s, bool *setval)
-{
-       if (s) {
-               if (!strcmp(s, "1") || !strcmp(s, "yes") || !strcmp(s, "true"))
-                       *setval = true;
-               else if (!strcmp(s, "0") || !strcmp(s, "no") ||
-                                                       !strcmp(s, "false"))
-                       *setval = false;
-               else
-                       return 0;
-       } else
-               *setval = true;
-       return 1;
-}
-
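-/*
- * Editor's example, illustrative only: how simple_getbool() behaves for
- * the values seen during option parsing below (@s is NULL when an option
- * is given without "=value"):
- *
- *     bool b;
- *     simple_getbool("yes", &b);      returns 1, b == true
- *     simple_getbool("0", &b);        returns 1, b == false
- *     simple_getbool(NULL, &b);       returns 1, b == true
- *     simple_getbool("maybe", &b);    returns 0, b untouched
- */
-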
-/**
- * parse_options - parse the (re)mount options
- * @vol:       ntfs volume
- * @opt:       string containing the (re)mount options
- *
- * Parse the recognized options in @opt for the ntfs volume described by @vol.
- */
-static bool parse_options(ntfs_volume *vol, char *opt)
-{
-       char *p, *v, *ov;
-       static char *utf8 = "utf8";
-       int errors = 0, sloppy = 0;
-       kuid_t uid = INVALID_UID;
-       kgid_t gid = INVALID_GID;
-       umode_t fmask = (umode_t)-1, dmask = (umode_t)-1;
-       int mft_zone_multiplier = -1, on_errors = -1;
-       int show_sys_files = -1, case_sensitive = -1, disable_sparse = -1;
-       struct nls_table *nls_map = NULL, *old_nls;
-
-       /* I am lazy... (-8 */
-#define NTFS_GETOPT_WITH_DEFAULT(option, variable, default_value)      \
-       if (!strcmp(p, option)) {                                       \
-               if (!v || !*v)                                          \
-                       variable = default_value;                       \
-               else {                                                  \
-                       variable = simple_strtoul(ov = v, &v, 0);       \
-                       if (*v)                                         \
-                               goto needs_val;                         \
-               }                                                       \
-       }
-#define NTFS_GETOPT(option, variable)                                  \
-       if (!strcmp(p, option)) {                                       \
-               if (!v || !*v)                                          \
-                       goto needs_arg;                                 \
-               variable = simple_strtoul(ov = v, &v, 0);               \
-               if (*v)                                                 \
-                       goto needs_val;                                 \
-       }
-#define NTFS_GETOPT_UID(option, variable)                              \
-       if (!strcmp(p, option)) {                                       \
-               uid_t uid_value;                                        \
-               if (!v || !*v)                                          \
-                       goto needs_arg;                                 \
-               uid_value = simple_strtoul(ov = v, &v, 0);              \
-               if (*v)                                                 \
-                       goto needs_val;                                 \
-               variable = make_kuid(current_user_ns(), uid_value);     \
-               if (!uid_valid(variable))                               \
-                       goto needs_val;                                 \
-       }
-#define NTFS_GETOPT_GID(option, variable)                              \
-       if (!strcmp(p, option)) {                                       \
-               gid_t gid_value;                                        \
-               if (!v || !*v)                                          \
-                       goto needs_arg;                                 \
-               gid_value = simple_strtoul(ov = v, &v, 0);              \
-               if (*v)                                                 \
-                       goto needs_val;                                 \
-               variable = make_kgid(current_user_ns(), gid_value);     \
-               if (!gid_valid(variable))                               \
-                       goto needs_val;                                 \
-       }
-#define NTFS_GETOPT_OCTAL(option, variable)                            \
-       if (!strcmp(p, option)) {                                       \
-               if (!v || !*v)                                          \
-                       goto needs_arg;                                 \
-               variable = simple_strtoul(ov = v, &v, 8);               \
-               if (*v)                                                 \
-                       goto needs_val;                                 \
-       }
-#define NTFS_GETOPT_BOOL(option, variable)                             \
-       if (!strcmp(p, option)) {                                       \
-               bool val;                                               \
-               if (!simple_getbool(v, &val))                           \
-                       goto needs_bool;                                \
-               variable = val;                                         \
-       }
-#define NTFS_GETOPT_OPTIONS_ARRAY(option, variable, opt_array)         \
-       if (!strcmp(p, option)) {                                       \
-               int _i;                                                 \
-               if (!v || !*v)                                          \
-                       goto needs_arg;                                 \
-               ov = v;                                                 \
-               if (variable == -1)                                     \
-                       variable = 0;                                   \
-               for (_i = 0; opt_array[_i].str && *opt_array[_i].str; _i++) \
-                       if (!strcmp(opt_array[_i].str, v)) {            \
-                               variable |= opt_array[_i].val;          \
-                               break;                                  \
-                       }                                               \
-               if (!opt_array[_i].str || !*opt_array[_i].str)          \
-                       goto needs_val;                                 \
-       }
-       if (!opt || !*opt)
-               goto no_mount_options;
-       ntfs_debug("Entering with mount options string: %s", opt);
-       while ((p = strsep(&opt, ","))) {
-               if ((v = strchr(p, '=')))
-                       *v++ = 0;
-               NTFS_GETOPT_UID("uid", uid)
-               else NTFS_GETOPT_GID("gid", gid)
-               else NTFS_GETOPT_OCTAL("umask", fmask = dmask)
-               else NTFS_GETOPT_OCTAL("fmask", fmask)
-               else NTFS_GETOPT_OCTAL("dmask", dmask)
-               else NTFS_GETOPT("mft_zone_multiplier", mft_zone_multiplier)
-               else NTFS_GETOPT_WITH_DEFAULT("sloppy", sloppy, true)
-               else NTFS_GETOPT_BOOL("show_sys_files", show_sys_files)
-               else NTFS_GETOPT_BOOL("case_sensitive", case_sensitive)
-               else NTFS_GETOPT_BOOL("disable_sparse", disable_sparse)
-               else NTFS_GETOPT_OPTIONS_ARRAY("errors", on_errors,
-                               on_errors_arr)
-               else if (!strcmp(p, "posix") || !strcmp(p, "show_inodes"))
-                       ntfs_warning(vol->sb, "Ignoring obsolete option %s.",
-                                       p);
-               else if (!strcmp(p, "nls") || !strcmp(p, "iocharset")) {
-                       if (!strcmp(p, "iocharset"))
-                               ntfs_warning(vol->sb, "Option iocharset is "
-                                               "deprecated. Please use "
-                                               "option nls=<charsetname> in "
-                                               "the future.");
-                       if (!v || !*v)
-                               goto needs_arg;
-use_utf8:
-                       old_nls = nls_map;
-                       nls_map = load_nls(v);
-                       if (!nls_map) {
-                               if (!old_nls) {
-                                       ntfs_error(vol->sb, "NLS character set "
-                                                       "%s not found.", v);
-                                       return false;
-                               }
-                               ntfs_error(vol->sb, "NLS character set %s not "
-                                               "found. Using previous one %s.",
-                                               v, old_nls->charset);
-                               nls_map = old_nls;
-                       } else /* nls_map */ {
-                               unload_nls(old_nls);
-                       }
-               } else if (!strcmp(p, "utf8")) {
-                       bool val = false;
-                       ntfs_warning(vol->sb, "Option utf8 is no longer "
-                                  "supported, falling back to option "
-                                  "nls=utf8. Please use option nls=utf8 in "
-                                  "the future and make sure utf8 is compiled "
-                                  "either as a module or into the kernel.");
-                       if (!v || !*v)
-                               val = true;
-                       else if (!simple_getbool(v, &val))
-                               goto needs_bool;
-                       if (val) {
-                               v = utf8;
-                               goto use_utf8;
-                       }
-               } else {
-                       ntfs_error(vol->sb, "Unrecognized mount option %s.", p);
-                       if (errors < INT_MAX)
-                               errors++;
-               }
-#undef NTFS_GETOPT_OPTIONS_ARRAY
-#undef NTFS_GETOPT_BOOL
-#undef NTFS_GETOPT
-#undef NTFS_GETOPT_WITH_DEFAULT
-       }
-no_mount_options:
-       if (errors && !sloppy)
-               return false;
-       if (sloppy)
-               ntfs_warning(vol->sb, "Sloppy option given. Ignoring "
-                               "unrecognized mount option(s) and continuing.");
-       /* Keep this first! */
-       if (on_errors != -1) {
-               if (!on_errors) {
-                       ntfs_error(vol->sb, "Invalid errors option argument "
-                                       "or bug in options parser.");
-                       return false;
-               }
-       }
-       if (nls_map) {
-               if (vol->nls_map && vol->nls_map != nls_map) {
-                       ntfs_error(vol->sb, "Cannot change NLS character set "
-                                       "on remount.");
-                       return false;
-               } /* else (!vol->nls_map) */
-               ntfs_debug("Using NLS character set %s.", nls_map->charset);
-               vol->nls_map = nls_map;
-       } else /* (!nls_map) */ {
-               if (!vol->nls_map) {
-                       vol->nls_map = load_nls_default();
-                       if (!vol->nls_map) {
-                               ntfs_error(vol->sb, "Failed to load default "
-                                               "NLS character set.");
-                               return false;
-                       }
-                       ntfs_debug("Using default NLS character set (%s).",
-                                       vol->nls_map->charset);
-               }
-       }
-       if (mft_zone_multiplier != -1) {
-               if (vol->mft_zone_multiplier && vol->mft_zone_multiplier !=
-                               mft_zone_multiplier) {
-                       ntfs_error(vol->sb, "Cannot change mft_zone_multiplier "
-                                       "on remount.");
-                       return false;
-               }
-               if (mft_zone_multiplier < 1 || mft_zone_multiplier > 4) {
-                       ntfs_error(vol->sb, "Invalid mft_zone_multiplier. "
-                                       "Using default value, i.e. 1.");
-                       mft_zone_multiplier = 1;
-               }
-               vol->mft_zone_multiplier = mft_zone_multiplier;
-       }
-       if (!vol->mft_zone_multiplier)
-               vol->mft_zone_multiplier = 1;
-       if (on_errors != -1)
-               vol->on_errors = on_errors;
-       if (!vol->on_errors || vol->on_errors == ON_ERRORS_RECOVER)
-               vol->on_errors |= ON_ERRORS_CONTINUE;
-       if (uid_valid(uid))
-               vol->uid = uid;
-       if (gid_valid(gid))
-               vol->gid = gid;
-       if (fmask != (umode_t)-1)
-               vol->fmask = fmask;
-       if (dmask != (umode_t)-1)
-               vol->dmask = dmask;
-       if (show_sys_files != -1) {
-               if (show_sys_files)
-                       NVolSetShowSystemFiles(vol);
-               else
-                       NVolClearShowSystemFiles(vol);
-       }
-       if (case_sensitive != -1) {
-               if (case_sensitive)
-                       NVolSetCaseSensitive(vol);
-               else
-                       NVolClearCaseSensitive(vol);
-       }
-       if (disable_sparse != -1) {
-               if (disable_sparse)
-                       NVolClearSparseEnabled(vol);
-               else {
-                       if (!NVolSparseEnabled(vol) &&
-                                       vol->major_ver && vol->major_ver < 3)
-                               ntfs_warning(vol->sb, "Not enabling sparse "
-                                               "support due to NTFS volume "
-                                               "version %i.%i (need at least "
-                                               "version 3.0).", vol->major_ver,
-                                               vol->minor_ver);
-                       else
-                               NVolSetSparseEnabled(vol);
-               }
-       }
-       return true;
-needs_arg:
-       ntfs_error(vol->sb, "The %s option requires an argument.", p);
-       return false;
-needs_bool:
-       ntfs_error(vol->sb, "The %s option requires a boolean argument.", p);
-       return false;
-needs_val:
-       ntfs_error(vol->sb, "Invalid %s option argument: %s.", p, ov);
-       return false;
-}
-
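For readers unpicking the NTFS_GETOPT* macros above: they all expand into this strsep()-driven name=value scan over the comma-separated options string. Below is a minimal user-space model of the same pattern; struct opts and parse() are hypothetical stand-ins for the driver's volume fields, and strsep() is a glibc/BSD extension. Note that strsep() modifies its input, so the string passed in must be writable.

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the volume fields the real parser fills in. */
struct opts {
        long mft_zone_multiplier;
        bool case_sensitive;
};

static bool parse(char *opt, struct opts *o)
{
        char *p, *v;

        while ((p = strsep(&opt, ","))) {
                /* Split "name=value"; v is NULL for a bare "name". */
                if ((v = strchr(p, '=')))
                        *v++ = '\0';
                if (!strcmp(p, "mft_zone_multiplier")) {
                        if (!v || !*v)
                                return false;   /* the needs_arg case */
                        o->mft_zone_multiplier = strtol(v, NULL, 0);
                } else if (!strcmp(p, "case_sensitive")) {
                        /* Like NTFS_GETOPT_BOOL, a bare flag means "true". */
                        o->case_sensitive = !v || !strcmp(v, "true");
                } else {
                        fprintf(stderr, "Unrecognized mount option %s.\n", p);
                }
        }
        return true;
}

Calling parse() with a writable copy of "mft_zone_multiplier=2,case_sensitive" fills both fields, mirroring what the macro chain does for the real option set.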
-#ifdef NTFS_RW
-
-/**
- * ntfs_write_volume_flags - write new flags to the volume information flags
- * @vol:       ntfs volume on which to modify the flags
- * @flags:     new flags value for the volume information flags
- *
- * Internal function.  You probably want to use ntfs_{set,clear}_volume_flags()
- * instead (see below).
- *
- * Replace the volume information flags on the volume @vol with the value
- * supplied in @flags.  Note, this overwrites the volume information flags, so
- * make sure to combine the flags you want to modify with the old flags and use
- * the result when calling ntfs_write_volume_flags().
- *
- * Return 0 on success and -errno on error.
- */
-static int ntfs_write_volume_flags(ntfs_volume *vol, const VOLUME_FLAGS flags)
-{
-       ntfs_inode *ni = NTFS_I(vol->vol_ino);
-       MFT_RECORD *m;
-       VOLUME_INFORMATION *vi;
-       ntfs_attr_search_ctx *ctx;
-       int err;
-
-       ntfs_debug("Entering, old flags = 0x%x, new flags = 0x%x.",
-                       le16_to_cpu(vol->vol_flags), le16_to_cpu(flags));
-       if (vol->vol_flags == flags)
-               goto done;
-       BUG_ON(!ni);
-       m = map_mft_record(ni);
-       if (IS_ERR(m)) {
-               err = PTR_ERR(m);
-               goto err_out;
-       }
-       ctx = ntfs_attr_get_search_ctx(ni, m);
-       if (!ctx) {
-               err = -ENOMEM;
-               goto put_unm_err_out;
-       }
-       err = ntfs_attr_lookup(AT_VOLUME_INFORMATION, NULL, 0, 0, 0, NULL, 0,
-                       ctx);
-       if (err)
-               goto put_unm_err_out;
-       vi = (VOLUME_INFORMATION*)((u8*)ctx->attr +
-                       le16_to_cpu(ctx->attr->data.resident.value_offset));
-       vol->vol_flags = vi->flags = flags;
-       flush_dcache_mft_record_page(ctx->ntfs_ino);
-       mark_mft_record_dirty(ctx->ntfs_ino);
-       ntfs_attr_put_search_ctx(ctx);
-       unmap_mft_record(ni);
-done:
-       ntfs_debug("Done.");
-       return 0;
-put_unm_err_out:
-       if (ctx)
-               ntfs_attr_put_search_ctx(ctx);
-       unmap_mft_record(ni);
-err_out:
-       ntfs_error(vol->sb, "Failed with error code %i.", -err);
-       return err;
-}
-
-/**
- * ntfs_set_volume_flags - set bits in the volume information flags
- * @vol:       ntfs volume on which to modify the flags
- * @flags:     flags to set on the volume
- *
- * Set the bits in @flags in the volume information flags on the volume @vol.
- *
- * Return 0 on success and -errno on error.
- */
-static inline int ntfs_set_volume_flags(ntfs_volume *vol, VOLUME_FLAGS flags)
-{
-       flags &= VOLUME_FLAGS_MASK;
-       return ntfs_write_volume_flags(vol, vol->vol_flags | flags);
-}
-
-/**
- * ntfs_clear_volume_flags - clear bits in the volume information flags
- * @vol:       ntfs volume on which to modify the flags
- * @flags:     flags to clear on the volume
- *
- * Clear the bits in @flags in the volume information flags on the volume @vol.
- *
- * Return 0 on success and -errno on error.
- */
-static inline int ntfs_clear_volume_flags(ntfs_volume *vol, VOLUME_FLAGS flags)
-{
-       flags &= VOLUME_FLAGS_MASK;
-       flags = vol->vol_flags & cpu_to_le16(~le16_to_cpu(flags));
-       return ntfs_write_volume_flags(vol, flags);
-}
-
-#endif /* NTFS_RW */
-
-/**
- * ntfs_remount - change the mount options of a mounted ntfs filesystem
- * @sb:                superblock of mounted ntfs filesystem
- * @flags:     remount flags
- * @opt:       remount options string
- *
- * Change the mount options of an already mounted ntfs filesystem.
- *
- * NOTE:  The VFS sets the @sb->s_flags remount flags to @flags after
- * ntfs_remount() returns successfully (i.e. returns 0).  Otherwise,
- * @sb->s_flags are not changed.
- */
-static int ntfs_remount(struct super_block *sb, int *flags, char *opt)
-{
-       ntfs_volume *vol = NTFS_SB(sb);
-
-       ntfs_debug("Entering with remount options string: %s", opt);
-
-       sync_filesystem(sb);
-
-#ifndef NTFS_RW
-       /* For read-only compiled driver, enforce read-only flag. */
-       *flags |= SB_RDONLY;
-#else /* NTFS_RW */
-       /*
-        * For the read-write compiled driver, if we are remounting read-write,
-        * make sure there are no volume errors and that no unsupported volume
-        * flags are set.  Also, empty the logfile journal as it would become
-        * stale as soon as something is written to the volume and mark the
-        * volume dirty so that chkdsk is run if the volume is not unmounted
-        * cleanly.  Finally, mark the quotas out of date so Windows rescans
-        * the volume on boot and updates them.
-        *
-        * When remounting read-only, mark the volume clean if no volume errors
-        * have occurred.
-        */
-       if (sb_rdonly(sb) && !(*flags & SB_RDONLY)) {
-               static const char *es = ".  Cannot remount read-write.";
-
-               /* Remounting read-write. */
-               if (NVolErrors(vol)) {
-                       ntfs_error(sb, "Volume has errors and is read-only%s",
-                                       es);
-                       return -EROFS;
-               }
-               if (vol->vol_flags & VOLUME_IS_DIRTY) {
-                       ntfs_error(sb, "Volume is dirty and read-only%s", es);
-                       return -EROFS;
-               }
-               if (vol->vol_flags & VOLUME_MODIFIED_BY_CHKDSK) {
-                       ntfs_error(sb, "Volume has been modified by chkdsk "
-                                       "and is read-only%s", es);
-                       return -EROFS;
-               }
-               if (vol->vol_flags & VOLUME_MUST_MOUNT_RO_MASK) {
-                       ntfs_error(sb, "Volume has unsupported flags set "
-                                       "(0x%x) and is read-only%s",
-                                       (unsigned)le16_to_cpu(vol->vol_flags),
-                                       es);
-                       return -EROFS;
-               }
-               if (ntfs_set_volume_flags(vol, VOLUME_IS_DIRTY)) {
-                       ntfs_error(sb, "Failed to set dirty bit in volume "
-                                       "information flags%s", es);
-                       return -EROFS;
-               }
-#if 0
-               // TODO: Enable this code once we start modifying anything that
-               //       is different between NTFS 1.2 and 3.x...
-               /* Set NT4 compatibility flag on newer NTFS version volumes. */
-               if ((vol->major_ver > 1)) {
-                       if (ntfs_set_volume_flags(vol, VOLUME_MOUNTED_ON_NT4)) {
-                               ntfs_error(sb, "Failed to set NT4 "
-                                               "compatibility flag%s", es);
-                               NVolSetErrors(vol);
-                               return -EROFS;
-                       }
-               }
-#endif
-               if (!ntfs_empty_logfile(vol->logfile_ino)) {
-                       ntfs_error(sb, "Failed to empty journal $LogFile%s",
-                                       es);
-                       NVolSetErrors(vol);
-                       return -EROFS;
-               }
-               if (!ntfs_mark_quotas_out_of_date(vol)) {
-                       ntfs_error(sb, "Failed to mark quotas out of date%s",
-                                       es);
-                       NVolSetErrors(vol);
-                       return -EROFS;
-               }
-               if (!ntfs_stamp_usnjrnl(vol)) {
-                       ntfs_error(sb, "Failed to stamp transaction log "
-                                       "($UsnJrnl)%s", es);
-                       NVolSetErrors(vol);
-                       return -EROFS;
-               }
-       } else if (!sb_rdonly(sb) && (*flags & SB_RDONLY)) {
-               /* Remounting read-only. */
-               if (!NVolErrors(vol)) {
-                       if (ntfs_clear_volume_flags(vol, VOLUME_IS_DIRTY))
-                               ntfs_warning(sb, "Failed to clear dirty bit "
-                                               "in volume information "
-                                               "flags.  Run chkdsk.");
-               }
-       }
-#endif /* NTFS_RW */
-
-       // TODO: Deal with *flags.
-
-       if (!parse_options(vol, opt))
-               return -EINVAL;
-
-       ntfs_debug("Done.");
-       return 0;
-}
-
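Condensing the ro->rw gate above into a single predicate makes the veto chain easier to see. This is a sketch only: the VOLUME_* values below are illustrative placeholders rather than the real le16 constants from fs/ntfs/layout.h, and the NVolErrors() test is modelled as a plain flag.

#include <stdint.h>

/* Placeholder bit values for illustration only. */
#define VOLUME_IS_DIRTY                 0x0001  /* placeholder */
#define VOLUME_MODIFIED_BY_CHKDSK       0x0800  /* placeholder */
#define VOLUME_MUST_MOUNT_RO_MASK       0xf000  /* placeholder */

/* Returns nonzero when a read-only volume may be remounted read-write. */
static int can_remount_rw(uint16_t vol_flags, int vol_has_errors)
{
        if (vol_has_errors)
                return 0;       /* the NVolErrors() check */
        if (vol_flags & VOLUME_IS_DIRTY)
                return 0;       /* needs chkdsk first */
        if (vol_flags & VOLUME_MODIFIED_BY_CHKDSK)
                return 0;
        if (vol_flags & VOLUME_MUST_MOUNT_RO_MASK)
                return 0;       /* unsupported feature flags set */
        return 1;
}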
-/**
- * is_boot_sector_ntfs - check whether a boot sector is a valid NTFS boot sector
- * @sb:                Super block of the device to which @b belongs.
- * @b:         Boot sector of device @sb to check.
- * @silent:    If 'true', all output will be silenced.
- *
- * is_boot_sector_ntfs() checks whether the boot sector @b is a valid NTFS boot
- * sector. Returns 'true' if it is valid and 'false' if not.
- *
- * @sb is only needed for warning/error output, i.e. it can be NULL when silent
- * is 'true'.
- */
-static bool is_boot_sector_ntfs(const struct super_block *sb,
-               const NTFS_BOOT_SECTOR *b, const bool silent)
-{
-       /*
-        * Check that checksum == sum of u32 values from b to the checksum
-        * field.  If checksum is zero, no checking is done.  We still accept
-        * the boot sector when the checksum test fails, since some utilities
-        * update the boot sector without recomputing the checksum, which
-        * leaves it out of date.  We report a warning in that case.
-        */
-       if ((void*)b < (void*)&b->checksum && b->checksum && !silent) {
-               le32 *u;
-               u32 i;
-
-               for (i = 0, u = (le32*)b; u < (le32*)(&b->checksum); ++u)
-                       i += le32_to_cpup(u);
-               if (le32_to_cpu(b->checksum) != i)
-                       ntfs_warning(sb, "Invalid boot sector checksum.");
-       }
-       /* Check that the OEM identifier is "NTFS    ". */
-       if (b->oem_id != magicNTFS)
-               goto not_ntfs;
-       /* Check bytes per sector value is between 256 and 4096. */
-       if (le16_to_cpu(b->bpb.bytes_per_sector) < 0x100 ||
-                       le16_to_cpu(b->bpb.bytes_per_sector) > 0x1000)
-               goto not_ntfs;
-       /* Check sectors per cluster value is valid. */
-       switch (b->bpb.sectors_per_cluster) {
-       case 1: case 2: case 4: case 8: case 16: case 32: case 64: case 128:
-               break;
-       default:
-               goto not_ntfs;
-       }
-       /* Check the cluster size is not above the maximum (64kiB). */
-       if ((u32)le16_to_cpu(b->bpb.bytes_per_sector) *
-                       b->bpb.sectors_per_cluster > NTFS_MAX_CLUSTER_SIZE)
-               goto not_ntfs;
-       /* Check reserved/unused fields are really zero. */
-       if (le16_to_cpu(b->bpb.reserved_sectors) ||
-                       le16_to_cpu(b->bpb.root_entries) ||
-                       le16_to_cpu(b->bpb.sectors) ||
-                       le16_to_cpu(b->bpb.sectors_per_fat) ||
-                       le32_to_cpu(b->bpb.large_sectors) || b->bpb.fats)
-               goto not_ntfs;
-       /* Check clusters per file mft record value is valid. */
-       if ((u8)b->clusters_per_mft_record < 0xe1 ||
-                       (u8)b->clusters_per_mft_record > 0xf7)
-               switch (b->clusters_per_mft_record) {
-               case 1: case 2: case 4: case 8: case 16: case 32: case 64:
-                       break;
-               default:
-                       goto not_ntfs;
-               }
-       /* Check clusters per index block value is valid. */
-       if ((u8)b->clusters_per_index_record < 0xe1 ||
-                       (u8)b->clusters_per_index_record > 0xf7)
-               switch (b->clusters_per_index_record) {
-               case 1: case 2: case 4: case 8: case 16: case 32: case 64:
-                       break;
-               default:
-                       goto not_ntfs;
-               }
-       /*
-        * Check for valid end of sector marker. We will work without it, but
-        * many BIOSes will refuse to boot from a boot sector if the magic is
-        * incorrect, so we emit a warning.
-        */
-       if (!silent && b->end_of_sector_marker != cpu_to_le16(0xaa55))
-               ntfs_warning(sb, "Invalid end of sector marker.");
-       return true;
-not_ntfs:
-       return false;
-}
-
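The checksum rule spelled out at the top of is_boot_sector_ntfs(): the stored value is the 32-bit sum of the le32 words that precede it, and a zero checksum disables the check. A user-space sketch, assuming the conventional layout with the checksum field at offset 0x50 and a little-endian host (the driver itself uses le32_to_cpup() to stay endian-clean):

#include <stdint.h>

#define NTFS_BS_CHECKSUM_OFS 0x50       /* assumed offset of the checksum */

static int boot_sector_checksum_ok(const uint8_t *bs)
{
        const uint32_t *u = (const uint32_t *)bs;
        const uint32_t *end = (const uint32_t *)(bs + NTFS_BS_CHECKSUM_OFS);
        uint32_t sum = 0;

        if (!*end)
                return 1;       /* zero checksum: no checking is done */
        while (u < end)
                sum += *u++;
        return sum == *end;
}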
-/**
- * read_ntfs_boot_sector - read the NTFS boot sector of a device
- * @sb:                super block of device to read the boot sector from
- * @silent:    if true, suppress all output
- *
- * Reads the boot sector from the device and validates it.  If that fails, it
- * tries to read the backup boot sector, first from the end of the device
- * (where NT4 and later store it) and then from the middle of the device
- * (where NT3.51 and earlier store it).
- *
- * If a valid boot sector is found but it is not the primary boot sector, we
- * repair the primary boot sector silently (unless the device is read-only or
- * the primary boot sector is not accessible).
- *
- * NOTE: To call this function, @sb must have the fields s_dev, the ntfs super
- * block (u.ntfs_sb), nr_blocks and the device flags (s_flags) initialized
- * to their respective values.
- *
- * Return the unlocked buffer head containing the boot sector or NULL on error.
- */
-static struct buffer_head *read_ntfs_boot_sector(struct super_block *sb,
-               const int silent)
-{
-       const char *read_err_str = "Unable to read %s boot sector.";
-       struct buffer_head *bh_primary, *bh_backup;
-       sector_t nr_blocks = NTFS_SB(sb)->nr_blocks;
-
-       /* Try to read primary boot sector. */
-       if ((bh_primary = sb_bread(sb, 0))) {
-               if (is_boot_sector_ntfs(sb, (NTFS_BOOT_SECTOR*)
-                               bh_primary->b_data, silent))
-                       return bh_primary;
-               if (!silent)
-                       ntfs_error(sb, "Primary boot sector is invalid.");
-       } else if (!silent)
-               ntfs_error(sb, read_err_str, "primary");
-       if (!(NTFS_SB(sb)->on_errors & ON_ERRORS_RECOVER)) {
-               if (bh_primary)
-                       brelse(bh_primary);
-               if (!silent)
-                       ntfs_error(sb, "Mount option errors=recover not used. "
-                                       "Aborting without trying to recover.");
-               return NULL;
-       }
-       /* Try to read NT4+ backup boot sector. */
-       if ((bh_backup = sb_bread(sb, nr_blocks - 1))) {
-               if (is_boot_sector_ntfs(sb, (NTFS_BOOT_SECTOR*)
-                               bh_backup->b_data, silent))
-                       goto hotfix_primary_boot_sector;
-               brelse(bh_backup);
-       } else if (!silent)
-               ntfs_error(sb, read_err_str, "backup");
-       /* Try to read NT3.51- backup boot sector. */
-       if ((bh_backup = sb_bread(sb, nr_blocks >> 1))) {
-               if (is_boot_sector_ntfs(sb, (NTFS_BOOT_SECTOR*)
-                               bh_backup->b_data, silent))
-                       goto hotfix_primary_boot_sector;
-               if (!silent)
-                       ntfs_error(sb, "Could not find a valid backup boot "
-                                       "sector.");
-               brelse(bh_backup);
-       } else if (!silent)
-               ntfs_error(sb, read_err_str, "backup");
-       /* We failed. Cleanup and return. */
-       if (bh_primary)
-               brelse(bh_primary);
-       return NULL;
-hotfix_primary_boot_sector:
-       if (bh_primary) {
-               /*
-                * If we managed to read sector zero and the volume is not
-                * read-only, copy the found, valid backup boot sector to the
-                * primary boot sector.  Note we only copy the actual boot
-                * sector structure, not the whole device sector, as that
-                * may be bigger and would potentially damage the $Boot system
-                * file (FIXME: Would be nice to know if the backup boot sector
-                * on a large sector device contains the whole boot loader or
-                * just the first 512 bytes).
-                */
-               if (!sb_rdonly(sb)) {
-                       ntfs_warning(sb, "Hot-fix: Recovering invalid primary "
-                                       "boot sector from backup copy.");
-                       memcpy(bh_primary->b_data, bh_backup->b_data,
-                                       NTFS_BLOCK_SIZE);
-                       mark_buffer_dirty(bh_primary);
-                       sync_dirty_buffer(bh_primary);
-                       if (buffer_uptodate(bh_primary)) {
-                               brelse(bh_backup);
-                               return bh_primary;
-                       }
-                       ntfs_error(sb, "Hot-fix: Device write error while "
-                                       "recovering primary boot sector.");
-               } else {
-                       ntfs_warning(sb, "Hot-fix: Recovery of primary boot "
-                                       "sector failed: Read-only mount.");
-               }
-               brelse(bh_primary);
-       }
-       ntfs_warning(sb, "Using backup boot sector.");
-       return bh_backup;
-}
-
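The probe order of read_ntfs_boot_sector(), reduced to the block numbers it hands to sb_bread(); nr_blocks is the device size in NTFS_BLOCK_SIZE units, as above:

#include <stdint.h>

static void boot_sector_candidates(uint64_t nr_blocks, uint64_t out[3])
{
        out[0] = 0;                     /* primary boot sector */
        out[1] = nr_blocks - 1;         /* NT4+ backup in the last block */
        out[2] = nr_blocks >> 1;        /* NT3.51- backup in the middle */
}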
-/**
- * parse_ntfs_boot_sector - parse the boot sector and store the data in @vol
- * @vol:       volume structure to initialise with data from boot sector
- * @b:         boot sector to parse
- *
- * Parse the ntfs boot sector @b and store all the important information it
- * contains in the ntfs super block @vol.  Return 'true' on success and
- * 'false' on error.
- */
-static bool parse_ntfs_boot_sector(ntfs_volume *vol, const NTFS_BOOT_SECTOR *b)
-{
-       unsigned int sectors_per_cluster_bits, nr_hidden_sects;
-       int clusters_per_mft_record, clusters_per_index_record;
-       s64 ll;
-
-       vol->sector_size = le16_to_cpu(b->bpb.bytes_per_sector);
-       vol->sector_size_bits = ffs(vol->sector_size) - 1;
-       ntfs_debug("vol->sector_size = %i (0x%x)", vol->sector_size,
-                       vol->sector_size);
-       ntfs_debug("vol->sector_size_bits = %i (0x%x)", vol->sector_size_bits,
-                       vol->sector_size_bits);
-       if (vol->sector_size < vol->sb->s_blocksize) {
-               ntfs_error(vol->sb, "Sector size (%i) is smaller than the "
-                               "device block size (%lu).  This is not "
-                               "supported.  Sorry.", vol->sector_size,
-                               vol->sb->s_blocksize);
-               return false;
-       }
-       ntfs_debug("sectors_per_cluster = 0x%x", b->bpb.sectors_per_cluster);
-       sectors_per_cluster_bits = ffs(b->bpb.sectors_per_cluster) - 1;
-       ntfs_debug("sectors_per_cluster_bits = 0x%x",
-                       sectors_per_cluster_bits);
-       nr_hidden_sects = le32_to_cpu(b->bpb.hidden_sectors);
-       ntfs_debug("number of hidden sectors = 0x%x", nr_hidden_sects);
-       vol->cluster_size = vol->sector_size << sectors_per_cluster_bits;
-       vol->cluster_size_mask = vol->cluster_size - 1;
-       vol->cluster_size_bits = ffs(vol->cluster_size) - 1;
-       ntfs_debug("vol->cluster_size = %i (0x%x)", vol->cluster_size,
-                       vol->cluster_size);
-       ntfs_debug("vol->cluster_size_mask = 0x%x", vol->cluster_size_mask);
-       ntfs_debug("vol->cluster_size_bits = %i", vol->cluster_size_bits);
-       if (vol->cluster_size < vol->sector_size) {
-               ntfs_error(vol->sb, "Cluster size (%i) is smaller than the "
-                               "sector size (%i).  This is not supported.  "
-                               "Sorry.", vol->cluster_size, vol->sector_size);
-               return false;
-       }
-       clusters_per_mft_record = b->clusters_per_mft_record;
-       ntfs_debug("clusters_per_mft_record = %i (0x%x)",
-                       clusters_per_mft_record, clusters_per_mft_record);
-       if (clusters_per_mft_record > 0)
-               vol->mft_record_size = vol->cluster_size <<
-                               (ffs(clusters_per_mft_record) - 1);
-       else
-               /*
-                * When mft_record_size < cluster_size, clusters_per_mft_record
-                * is -log2(mft_record_size in bytes).  mft_record_size normally
-                * is 1024 bytes, which is encoded as 0xF6 (-10 in decimal).
-                */
-               vol->mft_record_size = 1 << -clusters_per_mft_record;
-       vol->mft_record_size_mask = vol->mft_record_size - 1;
-       vol->mft_record_size_bits = ffs(vol->mft_record_size) - 1;
-       ntfs_debug("vol->mft_record_size = %i (0x%x)", vol->mft_record_size,
-                       vol->mft_record_size);
-       ntfs_debug("vol->mft_record_size_mask = 0x%x",
-                       vol->mft_record_size_mask);
-       ntfs_debug("vol->mft_record_size_bits = %i (0x%x)",
-                       vol->mft_record_size_bits, vol->mft_record_size_bits);
-       /*
-        * We cannot support mft record sizes above the PAGE_SIZE since
-        * we store $MFT/$DATA, the table of mft records, in the page cache.
-        */
-       if (vol->mft_record_size > PAGE_SIZE) {
-               ntfs_error(vol->sb, "Mft record size (%i) exceeds the "
-                               "PAGE_SIZE on your system (%lu).  "
-                               "This is not supported.  Sorry.",
-                               vol->mft_record_size, PAGE_SIZE);
-               return false;
-       }
-       /* We cannot support mft record sizes below the sector size. */
-       if (vol->mft_record_size < vol->sector_size) {
-               ntfs_error(vol->sb, "Mft record size (%i) is smaller than the "
-                               "sector size (%i).  This is not supported.  "
-                               "Sorry.", vol->mft_record_size,
-                               vol->sector_size);
-               return false;
-       }
-       clusters_per_index_record = b->clusters_per_index_record;
-       ntfs_debug("clusters_per_index_record = %i (0x%x)",
-                       clusters_per_index_record, clusters_per_index_record);
-       if (clusters_per_index_record > 0)
-               vol->index_record_size = vol->cluster_size <<
-                               (ffs(clusters_per_index_record) - 1);
-       else
-               /*
-                * When index_record_size < cluster_size,
-                * clusters_per_index_record is -log2(index_record_size in
-                * bytes).  index_record_size normally equals 4096 bytes,
-                * which is encoded as 0xF4 (-12 in decimal).
-                */
-               vol->index_record_size = 1 << -clusters_per_index_record;
-       vol->index_record_size_mask = vol->index_record_size - 1;
-       vol->index_record_size_bits = ffs(vol->index_record_size) - 1;
-       ntfs_debug("vol->index_record_size = %i (0x%x)",
-                       vol->index_record_size, vol->index_record_size);
-       ntfs_debug("vol->index_record_size_mask = 0x%x",
-                       vol->index_record_size_mask);
-       ntfs_debug("vol->index_record_size_bits = %i (0x%x)",
-                       vol->index_record_size_bits,
-                       vol->index_record_size_bits);
-       /* We cannot support index record sizes below the sector size. */
-       if (vol->index_record_size < vol->sector_size) {
-               ntfs_error(vol->sb, "Index record size (%i) is smaller than "
-                               "the sector size (%i).  This is not "
-                               "supported.  Sorry.", vol->index_record_size,
-                               vol->sector_size);
-               return false;
-       }
-       /*
-        * Get the size of the volume in clusters and check for 64-bit-ness.
-        * Windows currently uses only 32 bits to store the cluster count, so
-        * we do the same, as it is much faster on 32-bit CPUs.
-        */
-       ll = sle64_to_cpu(b->number_of_sectors) >> sectors_per_cluster_bits;
-       if ((u64)ll >= 1ULL << 32) {
-               ntfs_error(vol->sb, "Cannot handle 64-bit clusters.  Sorry.");
-               return false;
-       }
-       vol->nr_clusters = ll;
-       ntfs_debug("vol->nr_clusters = 0x%llx", (long long)vol->nr_clusters);
-       /*
-        * On an architecture where unsigned long is 32 bits, we restrict the
-        * volume size to 2TiB (2^41). On a 64-bit architecture, the compiler
-        * will hopefully optimize the whole check away.
-        */
-       if (sizeof(unsigned long) < 8) {
-               if ((ll << vol->cluster_size_bits) >= (1ULL << 41)) {
-                       ntfs_error(vol->sb, "Volume size (%lluTiB) is too "
-                                       "large for this architecture.  "
-                                       "Maximum supported is 2TiB.  Sorry.",
-                                       (unsigned long long)ll >> (40 -
-                                       vol->cluster_size_bits));
-                       return false;
-               }
-       }
-       ll = sle64_to_cpu(b->mft_lcn);
-       if (ll >= vol->nr_clusters) {
-               ntfs_error(vol->sb, "MFT LCN (%lli, 0x%llx) is beyond end of "
-                               "volume.  Weird.", (unsigned long long)ll,
-                               (unsigned long long)ll);
-               return false;
-       }
-       vol->mft_lcn = ll;
-       ntfs_debug("vol->mft_lcn = 0x%llx", (long long)vol->mft_lcn);
-       ll = sle64_to_cpu(b->mftmirr_lcn);
-       if (ll >= vol->nr_clusters) {
-               ntfs_error(vol->sb, "MFTMirr LCN (%lli, 0x%llx) is beyond end "
-                               "of volume.  Weird.", (unsigned long long)ll,
-                               (unsigned long long)ll);
-               return false;
-       }
-       vol->mftmirr_lcn = ll;
-       ntfs_debug("vol->mftmirr_lcn = 0x%llx", (long long)vol->mftmirr_lcn);
-#ifdef NTFS_RW
-       /*
-        * Work out the size of the mft mirror in number of mft records. If the
-        * cluster size is less than or equal to the size taken by four mft
-        * records, the mft mirror stores the first four mft records. If the
-        * cluster size is bigger than the size taken by four mft records, the
-        * mft mirror contains as many mft records as will fit into one
-        * cluster.
-        */
-       if (vol->cluster_size <= (4 << vol->mft_record_size_bits))
-               vol->mftmirr_size = 4;
-       else
-               vol->mftmirr_size = vol->cluster_size >>
-                               vol->mft_record_size_bits;
-       ntfs_debug("vol->mftmirr_size = %i", vol->mftmirr_size);
-#endif /* NTFS_RW */
-       vol->serial_no = le64_to_cpu(b->volume_serial_number);
-       ntfs_debug("vol->serial_no = 0x%llx",
-                       (unsigned long long)vol->serial_no);
-       return true;
-}
-
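The signed clusters_per_mft_record/clusters_per_index_record encoding handled twice above fits in one standalone helper. A sketch mirroring the driver's arithmetic, with GCC's __builtin_ffs() standing in for the kernel's ffs():

#include <stdint.h>

static uint32_t decode_record_size(int8_t clusters_per_record,
                uint32_t cluster_size)
{
        if (clusters_per_record > 0)    /* power-of-two cluster count */
                return cluster_size <<
                                (__builtin_ffs(clusters_per_record) - 1);
        /* Negative values encode -log2(size in bytes):
         * 0xF6 == -10 -> 1 << 10 == 1024. */
        return 1U << -clusters_per_record;
}

For example, decode_record_size((int8_t)0xF6, 4096) yields 1024, matching the comment above, and decode_record_size(2, 512) yields 1024 as well.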
-/**
- * ntfs_setup_allocators - initialize the cluster and mft allocators
- * @vol:       volume structure for which to setup the allocators
- *
- * Setup the cluster (lcn) and mft allocators to the starting values.
- */
-static void ntfs_setup_allocators(ntfs_volume *vol)
-{
-#ifdef NTFS_RW
-       LCN mft_zone_size, mft_lcn;
-#endif /* NTFS_RW */
-
-       ntfs_debug("vol->mft_zone_multiplier = 0x%x",
-                       vol->mft_zone_multiplier);
-#ifdef NTFS_RW
-       /* Determine the size of the MFT zone. */
-       mft_zone_size = vol->nr_clusters;
-       switch (vol->mft_zone_multiplier) {  /* % of volume size in clusters */
-       case 4:
-               mft_zone_size >>= 1;                    /* 50%   */
-               break;
-       case 3:
-               mft_zone_size = (mft_zone_size +
-                               (mft_zone_size >> 1)) >> 2;     /* 37.5% */
-               break;
-       case 2:
-               mft_zone_size >>= 2;                    /* 25%   */
-               break;
-       /* case 1: */
-       default:
-               mft_zone_size >>= 3;                    /* 12.5% */
-               break;
-       }
-       /* Setup the mft zone. */
-       vol->mft_zone_start = vol->mft_zone_pos = vol->mft_lcn;
-       ntfs_debug("vol->mft_zone_pos = 0x%llx",
-                       (unsigned long long)vol->mft_zone_pos);
-       /*
-        * Calculate the mft_lcn for an unmodified NTFS volume (see mkntfs
-        * source) and if the actual mft_lcn is in the expected place or even
-        * further to the front of the volume, extend the mft_zone to cover the
-        * beginning of the volume as well.  This is done so that the area
-        * reserved for the mft bitmap is also protected within the mft_zone.
-        * On non-standard volumes we do not protect it as the overhead would
-        * be higher than the speed increase we would get by doing it.
-        */
-       mft_lcn = (8192 + 2 * vol->cluster_size - 1) / vol->cluster_size;
-       if (mft_lcn * vol->cluster_size < 16 * 1024)
-               mft_lcn = (16 * 1024 + vol->cluster_size - 1) /
-                               vol->cluster_size;
-       if (vol->mft_zone_start <= mft_lcn)
-               vol->mft_zone_start = 0;
-       ntfs_debug("vol->mft_zone_start = 0x%llx",
-                       (unsigned long long)vol->mft_zone_start);
-       /*
-        * Need to cap the mft zone on non-standard volumes so that it does
-        * not point outside the boundaries of the volume.  We do this by
-        * halving the zone size until we are inside the volume.
-        */
-       vol->mft_zone_end = vol->mft_lcn + mft_zone_size;
-       while (vol->mft_zone_end >= vol->nr_clusters) {
-               mft_zone_size >>= 1;
-               vol->mft_zone_end = vol->mft_lcn + mft_zone_size;
-       }
-       ntfs_debug("vol->mft_zone_end = 0x%llx",
-                       (unsigned long long)vol->mft_zone_end);
-       /*
-        * Set the current position within each data zone to the start of the
-        * respective zone.
-        */
-       vol->data1_zone_pos = vol->mft_zone_end;
-       ntfs_debug("vol->data1_zone_pos = 0x%llx",
-                       (unsigned long long)vol->data1_zone_pos);
-       vol->data2_zone_pos = 0;
-       ntfs_debug("vol->data2_zone_pos = 0x%llx",
-                       (unsigned long long)vol->data2_zone_pos);
-
-       /* Set the mft data allocation position to mft record 24. */
-       vol->mft_data_pos = 24;
-       ntfs_debug("vol->mft_data_pos = 0x%llx",
-                       (unsigned long long)vol->mft_data_pos);
-#endif /* NTFS_RW */
-}
-
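Isolated from the surrounding setup code, the mft_zone_multiplier switch maps the multiplier onto a fraction of the volume using shifts only:

/* MFT zone size in clusters for a given multiplier (1-4). */
static long long mft_zone_clusters(long long nr_clusters, int multiplier)
{
        switch (multiplier) {
        case 4:
                return nr_clusters >> 1;                        /* 50%   */
        case 3:
                return (nr_clusters + (nr_clusters >> 1)) >> 2; /* 37.5% */
        case 2:
                return nr_clusters >> 2;                        /* 25%   */
        default:                                                /* case 1 */
                return nr_clusters >> 3;                        /* 12.5% */
        }
}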
-#ifdef NTFS_RW
-
-/**
- * load_and_init_mft_mirror - load and setup the mft mirror inode for a volume
- * @vol:       ntfs super block describing device whose mft mirror to load
- *
- * Return 'true' on success or 'false' on error.
- */
-static bool load_and_init_mft_mirror(ntfs_volume *vol)
-{
-       struct inode *tmp_ino;
-       ntfs_inode *tmp_ni;
-
-       ntfs_debug("Entering.");
-       /* Get mft mirror inode. */
-       tmp_ino = ntfs_iget(vol->sb, FILE_MFTMirr);
-       if (IS_ERR(tmp_ino) || is_bad_inode(tmp_ino)) {
-               if (!IS_ERR(tmp_ino))
-                       iput(tmp_ino);
-               /* Caller will display error message. */
-               return false;
-       }
-       /*
-        * Re-initialize some specifics about $MFTMirr's inode as
-        * ntfs_read_inode() will have set up the default ones.
-        */
-       /* Set uid and gid to root. */
-       tmp_ino->i_uid = GLOBAL_ROOT_UID;
-       tmp_ino->i_gid = GLOBAL_ROOT_GID;
-       /* Regular file.  No access for anyone. */
-       tmp_ino->i_mode = S_IFREG;
-       /* No VFS initiated operations allowed for $MFTMirr. */
-       tmp_ino->i_op = &ntfs_empty_inode_ops;
-       tmp_ino->i_fop = &ntfs_empty_file_ops;
-       /* Put in our special address space operations. */
-       tmp_ino->i_mapping->a_ops = &ntfs_mst_aops;
-       tmp_ni = NTFS_I(tmp_ino);
-       /* The $MFTMirr, like the $MFT, is multi sector transfer protected. */
-       NInoSetMstProtected(tmp_ni);
-       NInoSetSparseDisabled(tmp_ni);
-       /*
-        * Set up our little cheat allowing us to reuse the async read io
-        * completion handler for directories.
-        */
-       tmp_ni->itype.index.block_size = vol->mft_record_size;
-       tmp_ni->itype.index.block_size_bits = vol->mft_record_size_bits;
-       vol->mftmirr_ino = tmp_ino;
-       ntfs_debug("Done.");
-       return true;
-}
-
-/**
- * check_mft_mirror - compare contents of the mft mirror with the mft
- * @vol:       ntfs super block describing device whose mft mirror to check
- *
- * Return 'true' on success or 'false' on error.
- *
- * Note, this function also results in the mft mirror runlist being completely
- * mapped into memory.  The mft mirror write code requires this and will BUG()
- * should it find an unmapped runlist element.
- */
-static bool check_mft_mirror(ntfs_volume *vol)
-{
-       struct super_block *sb = vol->sb;
-       ntfs_inode *mirr_ni;
-       struct page *mft_page, *mirr_page;
-       u8 *kmft, *kmirr;
-       runlist_element *rl, rl2[2];
-       pgoff_t index;
-       int mrecs_per_page, i;
-
-       ntfs_debug("Entering.");
-       /* Compare contents of $MFT and $MFTMirr. */
-       mrecs_per_page = PAGE_SIZE / vol->mft_record_size;
-       BUG_ON(!mrecs_per_page);
-       BUG_ON(!vol->mftmirr_size);
-       mft_page = mirr_page = NULL;
-       kmft = kmirr = NULL;
-       index = i = 0;
-       do {
-               u32 bytes;
-
-               /* Switch pages if necessary. */
-               if (!(i % mrecs_per_page)) {
-                       if (index) {
-                               ntfs_unmap_page(mft_page);
-                               ntfs_unmap_page(mirr_page);
-                       }
-                       /* Get the $MFT page. */
-                       mft_page = ntfs_map_page(vol->mft_ino->i_mapping,
-                                       index);
-                       if (IS_ERR(mft_page)) {
-                               ntfs_error(sb, "Failed to read $MFT.");
-                               return false;
-                       }
-                       kmft = page_address(mft_page);
-                       /* Get the $MFTMirr page. */
-                       mirr_page = ntfs_map_page(vol->mftmirr_ino->i_mapping,
-                                       index);
-                       if (IS_ERR(mirr_page)) {
-                               ntfs_error(sb, "Failed to read $MFTMirr.");
-                               goto mft_unmap_out;
-                       }
-                       kmirr = page_address(mirr_page);
-                       ++index;
-               }
-               /* Do not check the record if it is not in use. */
-               if (((MFT_RECORD*)kmft)->flags & MFT_RECORD_IN_USE) {
-                       /* Make sure the record is ok. */
-                       if (ntfs_is_baad_recordp((le32*)kmft)) {
-                               ntfs_error(sb, "Incomplete multi sector "
-                                               "transfer detected in mft "
-                                               "record %i.", i);
-mm_unmap_out:
-                               ntfs_unmap_page(mirr_page);
-mft_unmap_out:
-                               ntfs_unmap_page(mft_page);
-                               return false;
-                       }
-               }
-               /* Do not check the mirror record if it is not in use. */
-               if (((MFT_RECORD*)kmirr)->flags & MFT_RECORD_IN_USE) {
-                       if (ntfs_is_baad_recordp((le32*)kmirr)) {
-                               ntfs_error(sb, "Incomplete multi sector "
-                                               "transfer detected in mft "
-                                               "mirror record %i.", i);
-                               goto mm_unmap_out;
-                       }
-               }
-               /* Get the amount of data in the current record. */
-               bytes = le32_to_cpu(((MFT_RECORD*)kmft)->bytes_in_use);
-               if (bytes < sizeof(MFT_RECORD_OLD) ||
-                               bytes > vol->mft_record_size ||
-                               ntfs_is_baad_recordp((le32*)kmft)) {
-                       bytes = le32_to_cpu(((MFT_RECORD*)kmirr)->bytes_in_use);
-                       if (bytes < sizeof(MFT_RECORD_OLD) ||
-                                       bytes > vol->mft_record_size ||
-                                       ntfs_is_baad_recordp((le32*)kmirr))
-                               bytes = vol->mft_record_size;
-               }
-               /* Compare the two records. */
-               if (memcmp(kmft, kmirr, bytes)) {
-                       ntfs_error(sb, "$MFT and $MFTMirr (record %i) do not "
-                                       "match.  Run ntfsfix or chkdsk.", i);
-                       goto mm_unmap_out;
-               }
-               kmft += vol->mft_record_size;
-               kmirr += vol->mft_record_size;
-       } while (++i < vol->mftmirr_size);
-       /* Release the last pages. */
-       ntfs_unmap_page(mft_page);
-       ntfs_unmap_page(mirr_page);
-
-       /* Construct the mft mirror runlist by hand. */
-       rl2[0].vcn = 0;
-       rl2[0].lcn = vol->mftmirr_lcn;
-       rl2[0].length = (vol->mftmirr_size * vol->mft_record_size +
-                       vol->cluster_size - 1) / vol->cluster_size;
-       rl2[1].vcn = rl2[0].length;
-       rl2[1].lcn = LCN_ENOENT;
-       rl2[1].length = 0;
-       /*
-        * Because we have just read all of the mft mirror, we know we have
-        * mapped the full runlist for it.
-        */
-       mirr_ni = NTFS_I(vol->mftmirr_ino);
-       down_read(&mirr_ni->runlist.lock);
-       rl = mirr_ni->runlist.rl;
-       /* Compare the two runlists.  They must be identical. */
-       i = 0;
-       do {
-               if (rl2[i].vcn != rl[i].vcn || rl2[i].lcn != rl[i].lcn ||
-                               rl2[i].length != rl[i].length) {
-                       ntfs_error(sb, "$MFTMirr location mismatch.  "
-                                       "Run chkdsk.");
-                       up_read(&mirr_ni->runlist.lock);
-                       return false;
-               }
-       } while (rl2[i++].length);
-       up_read(&mirr_ni->runlist.lock);
-       ntfs_debug("Done.");
-       return true;
-}
-
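Stripped of the page-cache walking and the multi sector transfer sanity checks, the core of check_mft_mirror() is a record-by-record memcmp(). A user-space model over two in-memory copies:

#include <stdint.h>
#include <string.h>

static int mft_mirror_matches(const uint8_t *mft, const uint8_t *mirr,
                uint32_t record_size, int mirror_records)
{
        int i;

        for (i = 0; i < mirror_records; i++)
                if (memcmp(mft + (size_t)i * record_size,
                                mirr + (size_t)i * record_size, record_size))
                        return 0;       /* record i differs: run chkdsk */
        return 1;
}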
-/**
- * load_and_check_logfile - load and check the logfile inode for a volume
- * @vol:       ntfs super block describing device whose logfile to load
- *
- * Return 'true' on success or 'false' on error.
- */
-static bool load_and_check_logfile(ntfs_volume *vol,
-               RESTART_PAGE_HEADER **rp)
-{
-       struct inode *tmp_ino;
-
-       ntfs_debug("Entering.");
-       tmp_ino = ntfs_iget(vol->sb, FILE_LogFile);
-       if (IS_ERR(tmp_ino) || is_bad_inode(tmp_ino)) {
-               if (!IS_ERR(tmp_ino))
-                       iput(tmp_ino);
-               /* Caller will display error message. */
-               return false;
-       }
-       if (!ntfs_check_logfile(tmp_ino, rp)) {
-               iput(tmp_ino);
-               /* ntfs_check_logfile() will have displayed error output. */
-               return false;
-       }
-       NInoSetSparseDisabled(NTFS_I(tmp_ino));
-       vol->logfile_ino = tmp_ino;
-       ntfs_debug("Done.");
-       return true;
-}
-
-#define NTFS_HIBERFIL_HEADER_SIZE      4096
-
-/**
- * check_windows_hibernation_status - check if Windows is suspended on a volume
- * @vol:       ntfs super block of device to check
- *
- * Check if Windows is hibernated on the ntfs volume @vol.  This is done by
- * looking for the file hiberfil.sys in the root directory of the volume.  If
- * the file is not present, Windows is definitely not suspended.
- *
- * If hiberfil.sys exists and is less than 4kiB in size, it means Windows is
- * definitely suspended (this volume is not the system volume).  Caveat:  on a
- * system with many volumes it is possible that the < 4kiB check is bogus but
- * for now this should do fine.
- *
- * If hiberfil.sys exists and is larger than 4kiB in size, we need to read the
- * hiberfil header (which is the first 4kiB).  If this begins with "hibr",
- * Windows is definitely suspended.  If it is completely full of zeroes,
- * Windows is definitely not hibernated.  Any other case is treated as if
- * Windows is suspended.  This caters for the above mentioned caveat of a
- * system with many volumes where no "hibr" magic would be present and there is
- * no zero header.
- *
- * Return 0 if Windows is not hibernated on the volume, >0 if Windows is
- * hibernated on the volume, and -errno on error.
- */
-static int check_windows_hibernation_status(ntfs_volume *vol)
-{
-       MFT_REF mref;
-       struct inode *vi;
-       struct page *page;
-       u32 *kaddr, *kend;
-       ntfs_name *name = NULL;
-       int ret = 1;
-       static const ntfschar hiberfil[13] = { cpu_to_le16('h'),
-                       cpu_to_le16('i'), cpu_to_le16('b'),
-                       cpu_to_le16('e'), cpu_to_le16('r'),
-                       cpu_to_le16('f'), cpu_to_le16('i'),
-                       cpu_to_le16('l'), cpu_to_le16('.'),
-                       cpu_to_le16('s'), cpu_to_le16('y'),
-                       cpu_to_le16('s'), 0 };
-
-       ntfs_debug("Entering.");
-       /*
-        * Find the inode number for the hibernation file by looking up the
-        * filename hiberfil.sys in the root directory.
-        */
-       inode_lock(vol->root_ino);
-       mref = ntfs_lookup_inode_by_name(NTFS_I(vol->root_ino), hiberfil, 12,
-                       &name);
-       inode_unlock(vol->root_ino);
-       if (IS_ERR_MREF(mref)) {
-               ret = MREF_ERR(mref);
-               /* If the file does not exist, Windows is not hibernated. */
-               if (ret == -ENOENT) {
-                       ntfs_debug("hiberfil.sys not present.  Windows is not "
-                                       "hibernated on the volume.");
-                       return 0;
-               }
-               /* A real error occurred. */
-               ntfs_error(vol->sb, "Failed to find inode number for "
-                               "hiberfil.sys.");
-               return ret;
-       }
-       /* We do not care for the type of match that was found. */
-       kfree(name);
-       /* Get the inode. */
-       vi = ntfs_iget(vol->sb, MREF(mref));
-       if (IS_ERR(vi) || is_bad_inode(vi)) {
-               if (!IS_ERR(vi))
-                       iput(vi);
-               ntfs_error(vol->sb, "Failed to load hiberfil.sys.");
-               return IS_ERR(vi) ? PTR_ERR(vi) : -EIO;
-       }
-       if (unlikely(i_size_read(vi) < NTFS_HIBERFIL_HEADER_SIZE)) {
-               ntfs_debug("hiberfil.sys is smaller than 4kiB (0x%llx).  "
-                               "Windows is hibernated on the volume.  This "
-                               "is not the system volume.", i_size_read(vi));
-               goto iput_out;
-       }
-       page = ntfs_map_page(vi->i_mapping, 0);
-       if (IS_ERR(page)) {
-               ntfs_error(vol->sb, "Failed to read from hiberfil.sys.");
-               ret = PTR_ERR(page);
-               goto iput_out;
-       }
-       kaddr = (u32*)page_address(page);
-       if (*(le32*)kaddr == cpu_to_le32(0x72626968)/*'hibr'*/) {
-               ntfs_debug("Magic \"hibr\" found in hiberfil.sys.  Windows is "
-                               "hibernated on the volume.  This is the "
-                               "system volume.");
-               goto unm_iput_out;
-       }
-       kend = kaddr + NTFS_HIBERFIL_HEADER_SIZE/sizeof(*kaddr);
-       do {
-               if (unlikely(*kaddr)) {
-                       ntfs_debug("hiberfil.sys is larger than 4kiB "
-                                       "(0x%llx), does not contain the "
-                                       "\"hibr\" magic, and does not have a "
-                                       "zero header.  Windows is hibernated "
-                                       "on the volume.  This is not the "
-                                       "system volume.", i_size_read(vi));
-                       goto unm_iput_out;
-               }
-       } while (++kaddr < kend);
-       ntfs_debug("hiberfil.sys contains a zero header.  Windows is not "
-                       "hibernated on the volume.  This is the system "
-                       "volume.");
-       ret = 0;
-unm_iput_out:
-       ntfs_unmap_page(page);
-iput_out:
-       iput(vi);
-       return ret;
-}
-
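The three hiberfil.sys header cases distinguished above, condensed into one predicate. A sketch assuming a little-endian host, so the "hibr" magic compares as the plain 0x72626968 constant used in the function:

#include <stdint.h>

/* Returns 1 if the header says Windows is (or must be assumed to be)
 * hibernated, 0 if it is definitely not. */
static int hiberfil_hibernated(const uint32_t *hdr, unsigned int words)
{
        unsigned int i;

        if (hdr[0] == 0x72626968)       /* "hibr" magic */
                return 1;
        for (i = 0; i < words; i++)
                if (hdr[i])
                        return 1;       /* non-zero, non-"hibr" header */
        return 0;                       /* all-zero header */
}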
-/**
- * load_and_init_quota - load and setup the quota file for a volume if present
- * @vol:       ntfs super block describing device whose quota file to load
- *
- * Return 'true' on success or 'false' on error.  If $Quota is not present, we
- * leave vol->quota_ino as NULL and return success.
- */
-static bool load_and_init_quota(ntfs_volume *vol)
-{
-       MFT_REF mref;
-       struct inode *tmp_ino;
-       ntfs_name *name = NULL;
-       static const ntfschar Quota[7] = { cpu_to_le16('$'),
-                       cpu_to_le16('Q'), cpu_to_le16('u'),
-                       cpu_to_le16('o'), cpu_to_le16('t'),
-                       cpu_to_le16('a'), 0 };
-       static ntfschar Q[3] = { cpu_to_le16('$'),
-                       cpu_to_le16('Q'), 0 };
-
-       ntfs_debug("Entering.");
-       /*
-        * Find the inode number for the quota file by looking up the filename
-        * $Quota in the extended system files directory $Extend.
-        */
-       inode_lock(vol->extend_ino);
-       mref = ntfs_lookup_inode_by_name(NTFS_I(vol->extend_ino), Quota, 6,
-                       &name);
-       inode_unlock(vol->extend_ino);
-       if (IS_ERR_MREF(mref)) {
-               /*
-                * If the file does not exist, quotas are disabled and have
-                * never been enabled on this volume, just return success.
-                */
-               if (MREF_ERR(mref) == -ENOENT) {
-                       ntfs_debug("$Quota not present.  Volume does not have "
-                                       "quotas enabled.");
-                       /*
-                        * No need to try to set quotas out of date if they are
-                        * not enabled.
-                        */
-                       NVolSetQuotaOutOfDate(vol);
-                       return true;
-               }
-               /* A real error occurred. */
-               ntfs_error(vol->sb, "Failed to find inode number for $Quota.");
-               return false;
-       }
-       /* We do not care for the type of match that was found. */
-       kfree(name);
-       /* Get the inode. */
-       tmp_ino = ntfs_iget(vol->sb, MREF(mref));
-       if (IS_ERR(tmp_ino) || is_bad_inode(tmp_ino)) {
-               if (!IS_ERR(tmp_ino))
-                       iput(tmp_ino);
-               ntfs_error(vol->sb, "Failed to load $Quota.");
-               return false;
-       }
-       vol->quota_ino = tmp_ino;
-       /* Get the $Q index allocation attribute. */
-       tmp_ino = ntfs_index_iget(vol->quota_ino, Q, 2);
-       if (IS_ERR(tmp_ino)) {
-               ntfs_error(vol->sb, "Failed to load $Quota/$Q index.");
-               return false;
-       }
-       vol->quota_q_ino = tmp_ino;
-       ntfs_debug("Done.");
-       return true;
-}
-
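The $Quota, $Q, $UsnJrnl, $Max and $J names above and below are spelled out as static cpu_to_le16() arrays because ntfs_lookup_inode_by_name() and friends take little-endian UTF-16. A hypothetical helper that builds such a name at run time from the ASCII subset these names use (a big-endian host would additionally need byte swapping):

#include <stdint.h>

static void ascii_to_ntfschar(const char *s, uint16_t *out)
{
        do {
                *out++ = (uint8_t)*s;   /* ASCII maps 1:1 to UTF-16 units */
        } while (*s++);
}

ascii_to_ntfschar("$Q", buf) reproduces the static Q[] initializer above, terminating 0 included.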
-/**
- * load_and_init_usnjrnl - load and setup the transaction log if present
- * @vol:       ntfs super block describing device whose usnjrnl file to load
- *
- * Return 'true' on success or 'false' on error.
- *
- * If $UsnJrnl is not present or in the process of being disabled, we set
- * NVolUsnJrnlStamped() and return success.
- *
- * If the $UsnJrnl $DATA/$J attribute has a size equal to the lowest valid usn,
- * i.e. transaction logging has only just been enabled or the journal has been
- * stamped and nothing has been logged since, we also set NVolUsnJrnlStamped()
- * and return success.
- */
-static bool load_and_init_usnjrnl(ntfs_volume *vol)
-{
-       MFT_REF mref;
-       struct inode *tmp_ino;
-       ntfs_inode *tmp_ni;
-       struct page *page;
-       ntfs_name *name = NULL;
-       USN_HEADER *uh;
-       static const ntfschar UsnJrnl[9] = { cpu_to_le16('$'),
-                       cpu_to_le16('U'), cpu_to_le16('s'),
-                       cpu_to_le16('n'), cpu_to_le16('J'),
-                       cpu_to_le16('r'), cpu_to_le16('n'),
-                       cpu_to_le16('l'), 0 };
-       static ntfschar Max[5] = { cpu_to_le16('$'),
-                       cpu_to_le16('M'), cpu_to_le16('a'),
-                       cpu_to_le16('x'), 0 };
-       static ntfschar J[3] = { cpu_to_le16('$'),
-                       cpu_to_le16('J'), 0 };
-
-       ntfs_debug("Entering.");
-       /*
-        * Find the inode number for the transaction log file by looking up the
-        * filename $UsnJrnl in the extended system files directory $Extend.
-        */
-       inode_lock(vol->extend_ino);
-       mref = ntfs_lookup_inode_by_name(NTFS_I(vol->extend_ino), UsnJrnl, 8,
-                       &name);
-       inode_unlock(vol->extend_ino);
-       if (IS_ERR_MREF(mref)) {
-               /*
-                * If the file does not exist, transaction logging is disabled,
-                * just return success.
-                */
-               if (MREF_ERR(mref) == -ENOENT) {
-                       ntfs_debug("$UsnJrnl not present.  Volume does not "
-                                       "have transaction logging enabled.");
-not_enabled:
-                       /*
-                        * No need to try to stamp the transaction log if
-                        * transaction logging is not enabled.
-                        */
-                       NVolSetUsnJrnlStamped(vol);
-                       return true;
-               }
-               /* A real error occurred. */
-               ntfs_error(vol->sb, "Failed to find inode number for "
-                               "$UsnJrnl.");
-               return false;
-       }
-       /* We do not care for the type of match that was found. */
-       kfree(name);
-       /* Get the inode. */
-       tmp_ino = ntfs_iget(vol->sb, MREF(mref));
-       if (IS_ERR(tmp_ino) || unlikely(is_bad_inode(tmp_ino))) {
-               if (!IS_ERR(tmp_ino))
-                       iput(tmp_ino);
-               ntfs_error(vol->sb, "Failed to load $UsnJrnl.");
-               return false;
-       }
-       vol->usnjrnl_ino = tmp_ino;
-       /*
-        * If the transaction log is in the process of being deleted, we can
-        * ignore it.
-        */
-       if (unlikely(vol->vol_flags & VOLUME_DELETE_USN_UNDERWAY)) {
-               ntfs_debug("$UsnJrnl in the process of being disabled.  "
-                               "Volume does not have transaction logging "
-                               "enabled.");
-               goto not_enabled;
-       }
-       /* Get the $DATA/$Max attribute. */
-       tmp_ino = ntfs_attr_iget(vol->usnjrnl_ino, AT_DATA, Max, 4);
-       if (IS_ERR(tmp_ino)) {
-               ntfs_error(vol->sb, "Failed to load $UsnJrnl/$DATA/$Max "
-                               "attribute.");
-               return false;
-       }
-       vol->usnjrnl_max_ino = tmp_ino;
-       if (unlikely(i_size_read(tmp_ino) < sizeof(USN_HEADER))) {
-               ntfs_error(vol->sb, "Found corrupt $UsnJrnl/$DATA/$Max "
-                               "attribute (size is 0x%llx but should be at "
-                               "least 0x%zx bytes).", i_size_read(tmp_ino),
-                               sizeof(USN_HEADER));
-               return false;
-       }
-       /* Get the $DATA/$J attribute. */
-       tmp_ino = ntfs_attr_iget(vol->usnjrnl_ino, AT_DATA, J, 2);
-       if (IS_ERR(tmp_ino)) {
-               ntfs_error(vol->sb, "Failed to load $UsnJrnl/$DATA/$J "
-                               "attribute.");
-               return false;
-       }
-       vol->usnjrnl_j_ino = tmp_ino;
-       /* Verify $J is non-resident and sparse. */
-       tmp_ni = NTFS_I(vol->usnjrnl_j_ino);
-       if (unlikely(!NInoNonResident(tmp_ni) || !NInoSparse(tmp_ni))) {
-               ntfs_error(vol->sb, "$UsnJrnl/$DATA/$J attribute is resident "
-                               "and/or not sparse.");
-               return false;
-       }
-       /* Read the USN_HEADER from $DATA/$Max. */
-       page = ntfs_map_page(vol->usnjrnl_max_ino->i_mapping, 0);
-       if (IS_ERR(page)) {
-               ntfs_error(vol->sb, "Failed to read from $UsnJrnl/$DATA/$Max "
-                               "attribute.");
-               return false;
-       }
-       uh = (USN_HEADER*)page_address(page);
-       /* Sanity check the $Max. */
-       if (unlikely(sle64_to_cpu(uh->allocation_delta) >
-                       sle64_to_cpu(uh->maximum_size))) {
-               ntfs_error(vol->sb, "Allocation delta (0x%llx) exceeds "
-                               "maximum size (0x%llx).  $UsnJrnl is corrupt.",
-                               (long long)sle64_to_cpu(uh->allocation_delta),
-                               (long long)sle64_to_cpu(uh->maximum_size));
-               ntfs_unmap_page(page);
-               return false;
-       }
-       /*
-        * If the transaction log has been stamped and nothing has been written
-        * to it since, we do not need to stamp it.
-        */
-       if (unlikely(sle64_to_cpu(uh->lowest_valid_usn) >=
-                       i_size_read(vol->usnjrnl_j_ino))) {
-               if (likely(sle64_to_cpu(uh->lowest_valid_usn) ==
-                               i_size_read(vol->usnjrnl_j_ino))) {
-                       ntfs_unmap_page(page);
-                       ntfs_debug("$UsnJrnl is enabled but nothing has been "
-                                       "logged since it was last stamped.  "
-                                       "Treating this as if the volume does "
-                                       "not have transaction logging "
-                                       "enabled.");
-                       goto not_enabled;
-               }
-               ntfs_error(vol->sb, "$UsnJrnl has lowest valid usn (0x%llx) "
-                               "which is out of bounds (0x%llx).  $UsnJrnl "
-                               "is corrupt.",
-                               (long long)sle64_to_cpu(uh->lowest_valid_usn),
-                               i_size_read(vol->usnjrnl_j_ino));
-               ntfs_unmap_page(page);
-               return false;
-       }
-       ntfs_unmap_page(page);
-       ntfs_debug("Done.");
-       return true;
-}
-
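The $Max sanity checks above reduce to three comparisons. A condensed standalone sketch of the same decision (the struct and helper names are hypothetical, and the fields are assumed to be already converted from little-endian):

#include <stdbool.h>
#include <stdint.h>

struct usn_header_sketch {
	int64_t maximum_size;
	int64_t allocation_delta;
	int64_t lowest_valid_usn;
};

/*
 * Decide what to do with the journal: returns true if it needs
 * stamping, false if it can be treated as disabled; *corrupt is set
 * when the header values are impossible.
 */
static bool usnjrnl_needs_stamp(const struct usn_header_sketch *uh,
		int64_t j_size, bool *corrupt)
{
	*corrupt = false;
	if (uh->allocation_delta > uh->maximum_size ||
	    uh->lowest_valid_usn > j_size) {
		*corrupt = true;
		return false;
	}
	/* Stamped and nothing logged since: treat as not enabled. */
	if (uh->lowest_valid_usn == j_size)
		return false;
	return true;
}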
-/**
- * load_and_init_attrdef - load the attribute definitions table for a volume
- * @vol:       ntfs super block describing device whose attrdef to load
- *
- * Return 'true' on success or 'false' on error.
- */
-static bool load_and_init_attrdef(ntfs_volume *vol)
-{
-       loff_t i_size;
-       struct super_block *sb = vol->sb;
-       struct inode *ino;
-       struct page *page;
-       pgoff_t index, max_index;
-       unsigned int size;
-
-       ntfs_debug("Entering.");
-       /* Read attrdef table and setup vol->attrdef and vol->attrdef_size. */
-       ino = ntfs_iget(sb, FILE_AttrDef);
-       if (IS_ERR(ino) || is_bad_inode(ino)) {
-               if (!IS_ERR(ino))
-                       iput(ino);
-               goto failed;
-       }
-       NInoSetSparseDisabled(NTFS_I(ino));
-       /* The size of FILE_AttrDef must be above 0 and fit inside 31 bits. */
-       i_size = i_size_read(ino);
-       if (i_size <= 0 || i_size > 0x7fffffff)
-               goto iput_failed;
-       vol->attrdef = (ATTR_DEF*)ntfs_malloc_nofs(i_size);
-       if (!vol->attrdef)
-               goto iput_failed;
-       index = 0;
-       max_index = i_size >> PAGE_SHIFT;
-       size = PAGE_SIZE;
-       while (index < max_index) {
-               /* Read the attrdef table and copy it into the linear buffer. */
-read_partial_attrdef_page:
-               page = ntfs_map_page(ino->i_mapping, index);
-               if (IS_ERR(page))
-                       goto free_iput_failed;
-               memcpy((u8*)vol->attrdef + (index++ << PAGE_SHIFT),
-                               page_address(page), size);
-               ntfs_unmap_page(page);
-       }
-       if (size == PAGE_SIZE) {
-               size = i_size & ~PAGE_MASK;
-               if (size)
-                       goto read_partial_attrdef_page;
-       }
-       vol->attrdef_size = i_size;
-       ntfs_debug("Read %llu bytes from $AttrDef.", i_size);
-       iput(ino);
-       return true;
-free_iput_failed:
-       ntfs_free(vol->attrdef);
-       vol->attrdef = NULL;
-iput_failed:
-       iput(ino);
-failed:
-       ntfs_error(sb, "Failed to initialize attribute definition table.");
-       return false;
-}
-
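Both this loader and load_and_init_upcase() below use the same pattern: map one page at a time, memcpy() it into a linear buffer, and handle the trailing partial page by jumping back into the loop body. The same pattern as a plain userspace sketch (a file descriptor standing in for the page cache mapping):

#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

#define SKETCH_PAGE_SIZE 4096UL

/* Read size bytes from offset 0 of fd into one linear allocation. */
static void *read_linear_sketch(int fd, size_t size)
{
	uint8_t *buf = malloc(size);
	size_t ofs = 0;

	if (!buf)
		return NULL;
	while (ofs < size) {
		/* Full pages first, then the partial tail, as above. */
		size_t chunk = size - ofs;

		if (chunk > SKETCH_PAGE_SIZE)
			chunk = SKETCH_PAGE_SIZE;
		if (pread(fd, buf + ofs, chunk, ofs) != (ssize_t)chunk) {
			free(buf);
			return NULL;
		}
		ofs += chunk;
	}
	return buf;
}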
-#endif /* NTFS_RW */
-
-/**
- * load_and_init_upcase - load the upcase table for an ntfs volume
- * @vol:       ntfs super block describing device whose upcase to load
- *
- * Return 'true' on success or 'false' on error.
- */
-static bool load_and_init_upcase(ntfs_volume *vol)
-{
-       loff_t i_size;
-       struct super_block *sb = vol->sb;
-       struct inode *ino;
-       struct page *page;
-       pgoff_t index, max_index;
-       unsigned int size;
-       int i, max;
-
-       ntfs_debug("Entering.");
-       /* Read upcase table and setup vol->upcase and vol->upcase_len. */
-       ino = ntfs_iget(sb, FILE_UpCase);
-       if (IS_ERR(ino) || is_bad_inode(ino)) {
-               if (!IS_ERR(ino))
-                       iput(ino);
-               goto upcase_failed;
-       }
-       /*
-        * The upcase size must not be above 64k Unicode characters, must not
-        * be zero and must be a multiple of sizeof(ntfschar).
-        */
-       i_size = i_size_read(ino);
-       if (!i_size || i_size & (sizeof(ntfschar) - 1) ||
-                       i_size > 64ULL * 1024 * sizeof(ntfschar))
-               goto iput_upcase_failed;
-       vol->upcase = (ntfschar*)ntfs_malloc_nofs(i_size);
-       if (!vol->upcase)
-               goto iput_upcase_failed;
-       index = 0;
-       max_index = i_size >> PAGE_SHIFT;
-       size = PAGE_SIZE;
-       while (index < max_index) {
-               /* Read the upcase table and copy it into the linear buffer. */
-read_partial_upcase_page:
-               page = ntfs_map_page(ino->i_mapping, index);
-               if (IS_ERR(page))
-                       goto iput_upcase_failed;
-               memcpy((char*)vol->upcase + (index++ << PAGE_SHIFT),
-                               page_address(page), size);
-               ntfs_unmap_page(page);
-       }
-       if (size == PAGE_SIZE) {
-               size = i_size & ~PAGE_MASK;
-               if (size)
-                       goto read_partial_upcase_page;
-       }
-       vol->upcase_len = i_size >> UCHAR_T_SIZE_BITS;
-       ntfs_debug("Read %llu bytes from $UpCase (expected %zu bytes).",
-                       i_size, 64 * 1024 * sizeof(ntfschar));
-       iput(ino);
-       mutex_lock(&ntfs_lock);
-       if (!default_upcase) {
-               ntfs_debug("Using volume specified $UpCase since default is "
-                               "not present.");
-               mutex_unlock(&ntfs_lock);
-               return true;
-       }
-       max = default_upcase_len;
-       if (max > vol->upcase_len)
-               max = vol->upcase_len;
-       for (i = 0; i < max; i++)
-               if (vol->upcase[i] != default_upcase[i])
-                       break;
-       if (i == max) {
-               ntfs_free(vol->upcase);
-               vol->upcase = default_upcase;
-               vol->upcase_len = max;
-               ntfs_nr_upcase_users++;
-               mutex_unlock(&ntfs_lock);
-               ntfs_debug("Volume specified $UpCase matches default. Using "
-                               "default.");
-               return true;
-       }
-       mutex_unlock(&ntfs_lock);
-       ntfs_debug("Using volume specified $UpCase since it does not match "
-                       "the default.");
-       return true;
-iput_upcase_failed:
-       iput(ino);
-       ntfs_free(vol->upcase);
-       vol->upcase = NULL;
-upcase_failed:
-       mutex_lock(&ntfs_lock);
-       if (default_upcase) {
-               vol->upcase = default_upcase;
-               vol->upcase_len = default_upcase_len;
-               ntfs_nr_upcase_users++;
-               mutex_unlock(&ntfs_lock);
-               ntfs_error(sb, "Failed to load $UpCase from the volume. Using "
-                               "default.");
-               return true;
-       }
-       mutex_unlock(&ntfs_lock);
-       ntfs_error(sb, "Failed to initialize upcase table.");
-       return false;
-}
-
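For context on why the table is worth sharing between mounts: NTFS compares names case-insensitively by upcasing every character of both names through $UpCase. A minimal sketch of such a comparison (hypothetical helper; uint16_t stands in for ntfschar, already in CPU byte order):

#include <stddef.h>
#include <stdint.h>

static int names_equal_upcase_sketch(const uint16_t *a, const uint16_t *b,
		size_t len, const uint16_t *upcase, size_t upcase_len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		uint16_t ca = a[i], cb = b[i];

		/* Characters beyond the table upcase to themselves. */
		if (ca < upcase_len)
			ca = upcase[ca];
		if (cb < upcase_len)
			cb = upcase[cb];
		if (ca != cb)
			return 0;
	}
	return 1;
}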
-/*
- * The lcn and mft bitmap inodes are NTFS-internal inodes with
- * their own special locking rules:
- */
-static struct lock_class_key
-       lcnbmp_runlist_lock_key, lcnbmp_mrec_lock_key,
-       mftbmp_runlist_lock_key, mftbmp_mrec_lock_key;
-
-/**
- * load_system_files - open the system files using normal functions
- * @vol:       ntfs super block describing device whose system files to load
- *
- * Open the system files with normal access functions and complete setting up
- * the ntfs super block @vol.
- *
- * Return 'true' on success or 'false' on error.
- */
-static bool load_system_files(ntfs_volume *vol)
-{
-       struct super_block *sb = vol->sb;
-       MFT_RECORD *m;
-       VOLUME_INFORMATION *vi;
-       ntfs_attr_search_ctx *ctx;
-#ifdef NTFS_RW
-       RESTART_PAGE_HEADER *rp;
-       int err;
-#endif /* NTFS_RW */
-
-       ntfs_debug("Entering.");
-#ifdef NTFS_RW
-       /* Get mft mirror inode and compare the contents of $MFT and $MFTMirr. */
-       if (!load_and_init_mft_mirror(vol) || !check_mft_mirror(vol)) {
-               static const char *es1 = "Failed to load $MFTMirr";
-               static const char *es2 = "$MFTMirr does not match $MFT";
-               static const char *es3 = ".  Run ntfsfix and/or chkdsk.";
-
-               /* If a read-write mount, convert it to a read-only mount. */
-               if (!sb_rdonly(sb)) {
-                       if (!(vol->on_errors & (ON_ERRORS_REMOUNT_RO |
-                                       ON_ERRORS_CONTINUE))) {
-                               ntfs_error(sb, "%s and neither on_errors="
-                                               "continue nor on_errors="
-                                               "remount-ro was specified%s",
-                                               !vol->mftmirr_ino ? es1 : es2,
-                                               es3);
-                               goto iput_mirr_err_out;
-                       }
-                       sb->s_flags |= SB_RDONLY;
-                       ntfs_error(sb, "%s.  Mounting read-only%s",
-                                       !vol->mftmirr_ino ? es1 : es2, es3);
-               } else
-                       ntfs_warning(sb, "%s.  Will not be able to remount "
-                                       "read-write%s",
-                                       !vol->mftmirr_ino ? es1 : es2, es3);
-               /* This will prevent a read-write remount. */
-               NVolSetErrors(vol);
-       }
-#endif /* NTFS_RW */
-       /* Get mft bitmap attribute inode. */
-       vol->mftbmp_ino = ntfs_attr_iget(vol->mft_ino, AT_BITMAP, NULL, 0);
-       if (IS_ERR(vol->mftbmp_ino)) {
-               ntfs_error(sb, "Failed to load $MFT/$BITMAP attribute.");
-               goto iput_mirr_err_out;
-       }
-       lockdep_set_class(&NTFS_I(vol->mftbmp_ino)->runlist.lock,
-                          &mftbmp_runlist_lock_key);
-       lockdep_set_class(&NTFS_I(vol->mftbmp_ino)->mrec_lock,
-                          &mftbmp_mrec_lock_key);
-       /* Read upcase table and setup @vol->upcase and @vol->upcase_len. */
-       if (!load_and_init_upcase(vol))
-               goto iput_mftbmp_err_out;
-#ifdef NTFS_RW
-       /*
-        * Read attribute definitions table and setup @vol->attrdef and
-        * @vol->attrdef_size.
-        */
-       if (!load_and_init_attrdef(vol))
-               goto iput_upcase_err_out;
-#endif /* NTFS_RW */
-       /*
-        * Get the cluster allocation bitmap inode and verify its size.  No
-        * locking is needed at this stage as we are the task performing the
-        * mount and hence are running exclusively.
-        */
-       vol->lcnbmp_ino = ntfs_iget(sb, FILE_Bitmap);
-       if (IS_ERR(vol->lcnbmp_ino) || is_bad_inode(vol->lcnbmp_ino)) {
-               if (!IS_ERR(vol->lcnbmp_ino))
-                       iput(vol->lcnbmp_ino);
-               goto bitmap_failed;
-       }
-       lockdep_set_class(&NTFS_I(vol->lcnbmp_ino)->runlist.lock,
-                          &lcnbmp_runlist_lock_key);
-       lockdep_set_class(&NTFS_I(vol->lcnbmp_ino)->mrec_lock,
-                          &lcnbmp_mrec_lock_key);
-
-       NInoSetSparseDisabled(NTFS_I(vol->lcnbmp_ino));
-       if ((vol->nr_clusters + 7) >> 3 > i_size_read(vol->lcnbmp_ino)) {
-               iput(vol->lcnbmp_ino);
-bitmap_failed:
-               ntfs_error(sb, "Failed to load $Bitmap.");
-               goto iput_attrdef_err_out;
-       }
-       /*
-        * Get the volume inode and setup our cache of the volume flags and
-        * version.
-        */
-       vol->vol_ino = ntfs_iget(sb, FILE_Volume);
-       if (IS_ERR(vol->vol_ino) || is_bad_inode(vol->vol_ino)) {
-               if (!IS_ERR(vol->vol_ino))
-                       iput(vol->vol_ino);
-volume_failed:
-               ntfs_error(sb, "Failed to load $Volume.");
-               goto iput_lcnbmp_err_out;
-       }
-       m = map_mft_record(NTFS_I(vol->vol_ino));
-       if (IS_ERR(m)) {
-iput_volume_failed:
-               iput(vol->vol_ino);
-               goto volume_failed;
-       }
-       if (!(ctx = ntfs_attr_get_search_ctx(NTFS_I(vol->vol_ino), m))) {
-               ntfs_error(sb, "Failed to get attribute search context.");
-               goto get_ctx_vol_failed;
-       }
-       if (ntfs_attr_lookup(AT_VOLUME_INFORMATION, NULL, 0, 0, 0, NULL, 0,
-                       ctx) || ctx->attr->non_resident || ctx->attr->flags) {
-err_put_vol:
-               ntfs_attr_put_search_ctx(ctx);
-get_ctx_vol_failed:
-               unmap_mft_record(NTFS_I(vol->vol_ino));
-               goto iput_volume_failed;
-       }
-       vi = (VOLUME_INFORMATION*)((char*)ctx->attr +
-                       le16_to_cpu(ctx->attr->data.resident.value_offset));
-       /* Some bounds checks. */
-       if ((u8*)vi < (u8*)ctx->attr || (u8*)vi +
-                       le32_to_cpu(ctx->attr->data.resident.value_length) >
-                       (u8*)ctx->attr + le32_to_cpu(ctx->attr->length))
-               goto err_put_vol;
-       /* Copy the volume flags and version to the ntfs_volume structure. */
-       vol->vol_flags = vi->flags;
-       vol->major_ver = vi->major_ver;
-       vol->minor_ver = vi->minor_ver;
-       ntfs_attr_put_search_ctx(ctx);
-       unmap_mft_record(NTFS_I(vol->vol_ino));
-       pr_info("volume version %i.%i.\n", vol->major_ver,
-                       vol->minor_ver);
-       if (vol->major_ver < 3 && NVolSparseEnabled(vol)) {
-               ntfs_warning(vol->sb, "Disabling sparse support due to NTFS "
-                               "volume version %i.%i (need at least version "
-                               "3.0).", vol->major_ver, vol->minor_ver);
-               NVolClearSparseEnabled(vol);
-       }
-#ifdef NTFS_RW
-       /* Make sure that no unsupported volume flags are set. */
-       if (vol->vol_flags & VOLUME_MUST_MOUNT_RO_MASK) {
-               static const char *es1a = "Volume is dirty";
-               static const char *es1b = "Volume has been modified by chkdsk";
-               static const char *es1c = "Volume has unsupported flags set";
-               static const char *es2a = ".  Run chkdsk and mount in Windows.";
-               static const char *es2b = ".  Mount in Windows.";
-               const char *es1, *es2;
-
-               es2 = es2a;
-               if (vol->vol_flags & VOLUME_IS_DIRTY)
-                       es1 = es1a;
-               else if (vol->vol_flags & VOLUME_MODIFIED_BY_CHKDSK) {
-                       es1 = es1b;
-                       es2 = es2b;
-               } else {
-                       es1 = es1c;
-                       ntfs_warning(sb, "Unsupported volume flags 0x%x "
-                                       "encountered.",
-                                       (unsigned)le16_to_cpu(vol->vol_flags));
-               }
-               /* If a read-write mount, convert it to a read-only mount. */
-               if (!sb_rdonly(sb)) {
-                       if (!(vol->on_errors & (ON_ERRORS_REMOUNT_RO |
-                                       ON_ERRORS_CONTINUE))) {
-                               ntfs_error(sb, "%s and neither on_errors="
-                                               "continue nor on_errors="
-                                               "remount-ro was specified%s",
-                                               es1, es2);
-                               goto iput_vol_err_out;
-                       }
-                       sb->s_flags |= SB_RDONLY;
-                       ntfs_error(sb, "%s.  Mounting read-only%s", es1, es2);
-               } else
-                       ntfs_warning(sb, "%s.  Will not be able to remount "
-                                       "read-write%s", es1, es2);
-               /*
-                * Do not set NVolErrors() because ntfs_remount() re-checks
-                * the flags, which we need it to do in case any of them have
-                * changed.
-                */
-       }
-       /*
-        * Get the inode for the logfile, check it and determine if the volume
-        * was shutdown cleanly.
-        */
-       rp = NULL;
-       if (!load_and_check_logfile(vol, &rp) ||
-                       !ntfs_is_logfile_clean(vol->logfile_ino, rp)) {
-               static const char *es1a = "Failed to load $LogFile";
-               static const char *es1b = "$LogFile is not clean";
-               static const char *es2 = ".  Mount in Windows.";
-               const char *es1;
-
-               es1 = !vol->logfile_ino ? es1a : es1b;
-               /* If a read-write mount, convert it to a read-only mount. */
-               if (!sb_rdonly(sb)) {
-                       if (!(vol->on_errors & (ON_ERRORS_REMOUNT_RO |
-                                       ON_ERRORS_CONTINUE))) {
-                               ntfs_error(sb, "%s and neither on_errors="
-                                               "continue nor on_errors="
-                                               "remount-ro was specified%s",
-                                               es1, es2);
-                               if (vol->logfile_ino) {
-                                       BUG_ON(!rp);
-                                       ntfs_free(rp);
-                               }
-                               goto iput_logfile_err_out;
-                       }
-                       sb->s_flags |= SB_RDONLY;
-                       ntfs_error(sb, "%s.  Mounting read-only%s", es1, es2);
-               } else
-                       ntfs_warning(sb, "%s.  Will not be able to remount "
-                                       "read-write%s", es1, es2);
-               /* This will prevent a read-write remount. */
-               NVolSetErrors(vol);
-       }
-       ntfs_free(rp);
-#endif /* NTFS_RW */
-       /* Get the root directory inode so we can do path lookups. */
-       vol->root_ino = ntfs_iget(sb, FILE_root);
-       if (IS_ERR(vol->root_ino) || is_bad_inode(vol->root_ino)) {
-               if (!IS_ERR(vol->root_ino))
-                       iput(vol->root_ino);
-               ntfs_error(sb, "Failed to load root directory.");
-               goto iput_logfile_err_out;
-       }
-#ifdef NTFS_RW
-       /*
-        * Check if Windows is suspended to disk on the target volume.  If it
-        * is hibernated, we must not write *anything* to the disk so set
-        * NVolErrors() without setting the dirty volume flag and mount
-        * read-only.  This will prevent read-write remounting and it will also
-        * prevent all writes.
-        */
-       err = check_windows_hibernation_status(vol);
-       if (unlikely(err)) {
-               static const char *es1a = "Failed to determine if Windows is "
-                               "hibernated";
-               static const char *es1b = "Windows is hibernated";
-               static const char *es2 = ".  Run chkdsk.";
-               const char *es1;
-
-               es1 = err < 0 ? es1a : es1b;
-               /* If a read-write mount, convert it to a read-only mount. */
-               if (!sb_rdonly(sb)) {
-                       if (!(vol->on_errors & (ON_ERRORS_REMOUNT_RO |
-                                       ON_ERRORS_CONTINUE))) {
-                               ntfs_error(sb, "%s and neither on_errors="
-                                               "continue nor on_errors="
-                                               "remount-ro was specified%s",
-                                               es1, es2);
-                               goto iput_root_err_out;
-                       }
-                       sb->s_flags |= SB_RDONLY;
-                       ntfs_error(sb, "%s.  Mounting read-only%s", es1, es2);
-               } else
-                       ntfs_warning(sb, "%s.  Will not be able to remount "
-                                       "read-write%s", es1, es2);
-               /* This will prevent a read-write remount. */
-               NVolSetErrors(vol);
-       }
-       /* If (still) a read-write mount, mark the volume dirty. */
-       if (!sb_rdonly(sb) && ntfs_set_volume_flags(vol, VOLUME_IS_DIRTY)) {
-               static const char *es1 = "Failed to set dirty bit in volume "
-                               "information flags";
-               static const char *es2 = ".  Run chkdsk.";
-
-               /* Convert to a read-only mount. */
-               if (!(vol->on_errors & (ON_ERRORS_REMOUNT_RO |
-                               ON_ERRORS_CONTINUE))) {
-                       ntfs_error(sb, "%s and neither on_errors=continue nor "
-                                       "on_errors=remount-ro was specified%s",
-                                       es1, es2);
-                       goto iput_root_err_out;
-               }
-               ntfs_error(sb, "%s.  Mounting read-only%s", es1, es2);
-               sb->s_flags |= SB_RDONLY;
-               /*
-                * Do not set NVolErrors() because ntfs_remount() might manage
-                * to set the dirty flag in which case all would be well.
-                */
-       }
-#if 0
-       // TODO: Enable this code once we start modifying anything that is
-       //       different between NTFS 1.2 and 3.x...
-       /*
-        * If (still) a read-write mount, set the NT4 compatibility flag on
-        * newer NTFS version volumes.
-        */
-       if (!(sb->s_flags & SB_RDONLY) && (vol->major_ver > 1) &&
-                       ntfs_set_volume_flags(vol, VOLUME_MOUNTED_ON_NT4)) {
-               static const char *es1 = "Failed to set NT4 compatibility flag";
-               static const char *es2 = ".  Run chkdsk.";
-
-               /* Convert to a read-only mount. */
-               if (!(vol->on_errors & (ON_ERRORS_REMOUNT_RO |
-                               ON_ERRORS_CONTINUE))) {
-                       ntfs_error(sb, "%s and neither on_errors=continue nor "
-                                       "on_errors=remount-ro was specified%s",
-                                       es1, es2);
-                       goto iput_root_err_out;
-               }
-               ntfs_error(sb, "%s.  Mounting read-only%s", es1, es2);
-               sb->s_flags |= SB_RDONLY;
-               NVolSetErrors(vol);
-       }
-#endif
-       /* If (still) a read-write mount, empty the logfile. */
-       if (!sb_rdonly(sb) && !ntfs_empty_logfile(vol->logfile_ino)) {
-               static const char *es1 = "Failed to empty $LogFile";
-               static const char *es2 = ".  Mount in Windows.";
-
-               /* Convert to a read-only mount. */
-               if (!(vol->on_errors & (ON_ERRORS_REMOUNT_RO |
-                               ON_ERRORS_CONTINUE))) {
-                       ntfs_error(sb, "%s and neither on_errors=continue nor "
-                                       "on_errors=remount-ro was specified%s",
-                                       es1, es2);
-                       goto iput_root_err_out;
-               }
-               ntfs_error(sb, "%s.  Mounting read-only%s", es1, es2);
-               sb->s_flags |= SB_RDONLY;
-               NVolSetErrors(vol);
-       }
-#endif /* NTFS_RW */
-       /* If on NTFS versions before 3.0, we are done. */
-       if (unlikely(vol->major_ver < 3))
-               return true;
-       /* NTFS 3.0+ specific initialization. */
-       /* Get the security descriptors inode. */
-       vol->secure_ino = ntfs_iget(sb, FILE_Secure);
-       if (IS_ERR(vol->secure_ino) || is_bad_inode(vol->secure_ino)) {
-               if (!IS_ERR(vol->secure_ino))
-                       iput(vol->secure_ino);
-               ntfs_error(sb, "Failed to load $Secure.");
-               goto iput_root_err_out;
-       }
-       // TODO: Initialize security.
-       /* Get the extended system files' directory inode. */
-       vol->extend_ino = ntfs_iget(sb, FILE_Extend);
-       if (IS_ERR(vol->extend_ino) || is_bad_inode(vol->extend_ino) ||
-           !S_ISDIR(vol->extend_ino->i_mode)) {
-               if (!IS_ERR(vol->extend_ino))
-                       iput(vol->extend_ino);
-               ntfs_error(sb, "Failed to load $Extend.");
-               goto iput_sec_err_out;
-       }
-#ifdef NTFS_RW
-       /* Find the quota file, load it if present, and set it up. */
-       if (!load_and_init_quota(vol)) {
-               static const char *es1 = "Failed to load $Quota";
-               static const char *es2 = ".  Run chkdsk.";
-
-               /* If a read-write mount, convert it to a read-only mount. */
-               if (!sb_rdonly(sb)) {
-                       if (!(vol->on_errors & (ON_ERRORS_REMOUNT_RO |
-                                       ON_ERRORS_CONTINUE))) {
-                               ntfs_error(sb, "%s and neither on_errors="
-                                               "continue nor on_errors="
-                                               "remount-ro was specified%s",
-                                               es1, es2);
-                               goto iput_quota_err_out;
-                       }
-                       sb->s_flags |= SB_RDONLY;
-                       ntfs_error(sb, "%s.  Mounting read-only%s", es1, es2);
-               } else
-                       ntfs_warning(sb, "%s.  Will not be able to remount "
-                                       "read-write%s", es1, es2);
-               /* This will prevent a read-write remount. */
-               NVolSetErrors(vol);
-       }
-       /* If (still) a read-write mount, mark the quotas out of date. */
-       if (!sb_rdonly(sb) && !ntfs_mark_quotas_out_of_date(vol)) {
-               static const char *es1 = "Failed to mark quotas out of date";
-               static const char *es2 = ".  Run chkdsk.";
-
-               /* Convert to a read-only mount. */
-               if (!(vol->on_errors & (ON_ERRORS_REMOUNT_RO |
-                               ON_ERRORS_CONTINUE))) {
-                       ntfs_error(sb, "%s and neither on_errors=continue nor "
-                                       "on_errors=remount-ro was specified%s",
-                                       es1, es2);
-                       goto iput_quota_err_out;
-               }
-               ntfs_error(sb, "%s.  Mounting read-only%s", es1, es2);
-               sb->s_flags |= SB_RDONLY;
-               NVolSetErrors(vol);
-       }
-       /*
-        * Find the transaction log file ($UsnJrnl), load it if present, check
-        * it, and set it up.
-        */
-       if (!load_and_init_usnjrnl(vol)) {
-               static const char *es1 = "Failed to load $UsnJrnl";
-               static const char *es2 = ".  Run chkdsk.";
-
-               /* If a read-write mount, convert it to a read-only mount. */
-               if (!sb_rdonly(sb)) {
-                       if (!(vol->on_errors & (ON_ERRORS_REMOUNT_RO |
-                                       ON_ERRORS_CONTINUE))) {
-                               ntfs_error(sb, "%s and neither on_errors="
-                                               "continue nor on_errors="
-                                               "remount-ro was specified%s",
-                                               es1, es2);
-                               goto iput_usnjrnl_err_out;
-                       }
-                       sb->s_flags |= SB_RDONLY;
-                       ntfs_error(sb, "%s.  Mounting read-only%s", es1, es2);
-               } else
-                       ntfs_warning(sb, "%s.  Will not be able to remount "
-                                       "read-write%s", es1, es2);
-               /* This will prevent a read-write remount. */
-               NVolSetErrors(vol);
-       }
-       /* If (still) a read-write mount, stamp the transaction log. */
-       if (!sb_rdonly(sb) && !ntfs_stamp_usnjrnl(vol)) {
-               static const char *es1 = "Failed to stamp transaction log "
-                               "($UsnJrnl)";
-               static const char *es2 = ".  Run chkdsk.";
-
-               /* Convert to a read-only mount. */
-               if (!(vol->on_errors & (ON_ERRORS_REMOUNT_RO |
-                               ON_ERRORS_CONTINUE))) {
-                       ntfs_error(sb, "%s and neither on_errors=continue nor "
-                                       "on_errors=remount-ro was specified%s",
-                                       es1, es2);
-                       goto iput_usnjrnl_err_out;
-               }
-               ntfs_error(sb, "%s.  Mounting read-only%s", es1, es2);
-               sb->s_flags |= SB_RDONLY;
-               NVolSetErrors(vol);
-       }
-#endif /* NTFS_RW */
-       return true;
-#ifdef NTFS_RW
-iput_usnjrnl_err_out:
-       iput(vol->usnjrnl_j_ino);
-       iput(vol->usnjrnl_max_ino);
-       iput(vol->usnjrnl_ino);
-iput_quota_err_out:
-       iput(vol->quota_q_ino);
-       iput(vol->quota_ino);
-       iput(vol->extend_ino);
-#endif /* NTFS_RW */
-iput_sec_err_out:
-       iput(vol->secure_ino);
-iput_root_err_out:
-       iput(vol->root_ino);
-iput_logfile_err_out:
-#ifdef NTFS_RW
-       iput(vol->logfile_ino);
-iput_vol_err_out:
-#endif /* NTFS_RW */
-       iput(vol->vol_ino);
-iput_lcnbmp_err_out:
-       iput(vol->lcnbmp_ino);
-iput_attrdef_err_out:
-       vol->attrdef_size = 0;
-       if (vol->attrdef) {
-               ntfs_free(vol->attrdef);
-               vol->attrdef = NULL;
-       }
-#ifdef NTFS_RW
-iput_upcase_err_out:
-#endif /* NTFS_RW */
-       vol->upcase_len = 0;
-       mutex_lock(&ntfs_lock);
-       if (vol->upcase == default_upcase) {
-               ntfs_nr_upcase_users--;
-               vol->upcase = NULL;
-       }
-       mutex_unlock(&ntfs_lock);
-       if (vol->upcase) {
-               ntfs_free(vol->upcase);
-               vol->upcase = NULL;
-       }
-iput_mftbmp_err_out:
-       iput(vol->mftbmp_ino);
-iput_mirr_err_out:
-#ifdef NTFS_RW
-       iput(vol->mftmirr_ino);
-#endif /* NTFS_RW */
-       return false;
-}
-
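The same on_errors policy block recurs throughout the function above for $MFTMirr, the volume flags, $LogFile, hibernation, the dirty bit, $Quota and $UsnJrnl. Distilled, it amounts to the sketch below (not a helper the driver actually defines; it assumes the surrounding driver declarations, and the set_errors flag reflects that the volume-flags case deliberately skips NVolSetErrors() so a remount can re-check them):

/*
 * Sketch only: returns false when the mount must be aborted, true when
 * it may continue (possibly downgraded to read-only).
 */
static bool on_errors_policy_sketch(struct super_block *sb, ntfs_volume *vol,
		const char *what, const char *advice, bool set_errors)
{
	if (!sb_rdonly(sb)) {
		if (!(vol->on_errors & (ON_ERRORS_REMOUNT_RO |
				ON_ERRORS_CONTINUE))) {
			ntfs_error(sb, "%s and neither on_errors=continue "
					"nor on_errors=remount-ro was "
					"specified%s", what, advice);
			return false;
		}
		sb->s_flags |= SB_RDONLY;
		ntfs_error(sb, "%s.  Mounting read-only%s", what, advice);
	} else
		ntfs_warning(sb, "%s.  Will not be able to remount "
				"read-write%s", what, advice);
	if (set_errors)
		NVolSetErrors(vol);	/* Prevents a read-write remount. */
	return true;
}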
-/**
- * ntfs_put_super - called by the vfs to unmount a volume
- * @sb:                vfs superblock of volume to unmount
- *
- * ntfs_put_super() is called by the VFS (from fs/super.c::do_umount()) when
- * the volume is being unmounted (umount system call has been invoked) and it
- * releases all inodes and memory belonging to the NTFS specific part of the
- * super block.
- */
-static void ntfs_put_super(struct super_block *sb)
-{
-       ntfs_volume *vol = NTFS_SB(sb);
-
-       ntfs_debug("Entering.");
-
-#ifdef NTFS_RW
-       /*
-        * Commit all inodes while they are still open in case some of them
-        * cause others to be dirtied.
-        */
-       ntfs_commit_inode(vol->vol_ino);
-
-       /* NTFS 3.0+ specific. */
-       if (vol->major_ver >= 3) {
-               if (vol->usnjrnl_j_ino)
-                       ntfs_commit_inode(vol->usnjrnl_j_ino);
-               if (vol->usnjrnl_max_ino)
-                       ntfs_commit_inode(vol->usnjrnl_max_ino);
-               if (vol->usnjrnl_ino)
-                       ntfs_commit_inode(vol->usnjrnl_ino);
-               if (vol->quota_q_ino)
-                       ntfs_commit_inode(vol->quota_q_ino);
-               if (vol->quota_ino)
-                       ntfs_commit_inode(vol->quota_ino);
-               if (vol->extend_ino)
-                       ntfs_commit_inode(vol->extend_ino);
-               if (vol->secure_ino)
-                       ntfs_commit_inode(vol->secure_ino);
-       }
-
-       ntfs_commit_inode(vol->root_ino);
-
-       down_write(&vol->lcnbmp_lock);
-       ntfs_commit_inode(vol->lcnbmp_ino);
-       up_write(&vol->lcnbmp_lock);
-
-       down_write(&vol->mftbmp_lock);
-       ntfs_commit_inode(vol->mftbmp_ino);
-       up_write(&vol->mftbmp_lock);
-
-       if (vol->logfile_ino)
-               ntfs_commit_inode(vol->logfile_ino);
-
-       if (vol->mftmirr_ino)
-               ntfs_commit_inode(vol->mftmirr_ino);
-       ntfs_commit_inode(vol->mft_ino);
-
-       /*
-        * If a read-write mount and no volume errors have occurred, mark the
-        * volume clean.  Also, re-commit all affected inodes.
-        */
-       if (!sb_rdonly(sb)) {
-               if (!NVolErrors(vol)) {
-                       if (ntfs_clear_volume_flags(vol, VOLUME_IS_DIRTY))
-                               ntfs_warning(sb, "Failed to clear dirty bit "
-                                               "in volume information "
-                                               "flags.  Run chkdsk.");
-                       ntfs_commit_inode(vol->vol_ino);
-                       ntfs_commit_inode(vol->root_ino);
-                       if (vol->mftmirr_ino)
-                               ntfs_commit_inode(vol->mftmirr_ino);
-                       ntfs_commit_inode(vol->mft_ino);
-               } else {
-                       ntfs_warning(sb, "Volume has errors.  Leaving volume "
-                                       "marked dirty.  Run chkdsk.");
-               }
-       }
-#endif /* NTFS_RW */
-
-       iput(vol->vol_ino);
-       vol->vol_ino = NULL;
-
-       /* NTFS 3.0+ specific clean up. */
-       if (vol->major_ver >= 3) {
-#ifdef NTFS_RW
-               if (vol->usnjrnl_j_ino) {
-                       iput(vol->usnjrnl_j_ino);
-                       vol->usnjrnl_j_ino = NULL;
-               }
-               if (vol->usnjrnl_max_ino) {
-                       iput(vol->usnjrnl_max_ino);
-                       vol->usnjrnl_max_ino = NULL;
-               }
-               if (vol->usnjrnl_ino) {
-                       iput(vol->usnjrnl_ino);
-                       vol->usnjrnl_ino = NULL;
-               }
-               if (vol->quota_q_ino) {
-                       iput(vol->quota_q_ino);
-                       vol->quota_q_ino = NULL;
-               }
-               if (vol->quota_ino) {
-                       iput(vol->quota_ino);
-                       vol->quota_ino = NULL;
-               }
-#endif /* NTFS_RW */
-               if (vol->extend_ino) {
-                       iput(vol->extend_ino);
-                       vol->extend_ino = NULL;
-               }
-               if (vol->secure_ino) {
-                       iput(vol->secure_ino);
-                       vol->secure_ino = NULL;
-               }
-       }
-
-       iput(vol->root_ino);
-       vol->root_ino = NULL;
-
-       down_write(&vol->lcnbmp_lock);
-       iput(vol->lcnbmp_ino);
-       vol->lcnbmp_ino = NULL;
-       up_write(&vol->lcnbmp_lock);
-
-       down_write(&vol->mftbmp_lock);
-       iput(vol->mftbmp_ino);
-       vol->mftbmp_ino = NULL;
-       up_write(&vol->mftbmp_lock);
-
-#ifdef NTFS_RW
-       if (vol->logfile_ino) {
-               iput(vol->logfile_ino);
-               vol->logfile_ino = NULL;
-       }
-       if (vol->mftmirr_ino) {
-               /* Re-commit the mft mirror and mft just in case. */
-               ntfs_commit_inode(vol->mftmirr_ino);
-               ntfs_commit_inode(vol->mft_ino);
-               iput(vol->mftmirr_ino);
-               vol->mftmirr_ino = NULL;
-       }
-       /*
-        * We should have no dirty inodes left, due to
-        * mft.c::ntfs_mft_writepage() cleaning all the dirty pages as
-        * the underlying mft records are written out and cleaned.
-        */
-       ntfs_commit_inode(vol->mft_ino);
-       write_inode_now(vol->mft_ino, 1);
-#endif /* NTFS_RW */
-
-       iput(vol->mft_ino);
-       vol->mft_ino = NULL;
-
-       /* Throw away the table of attribute definitions. */
-       vol->attrdef_size = 0;
-       if (vol->attrdef) {
-               ntfs_free(vol->attrdef);
-               vol->attrdef = NULL;
-       }
-       vol->upcase_len = 0;
-       /*
-        * Destroy the global default upcase table if necessary.  Also decrease
-        * the number of upcase users if we are a user.
-        */
-       mutex_lock(&ntfs_lock);
-       if (vol->upcase == default_upcase) {
-               ntfs_nr_upcase_users--;
-               vol->upcase = NULL;
-       }
-       if (!ntfs_nr_upcase_users && default_upcase) {
-               ntfs_free(default_upcase);
-               default_upcase = NULL;
-       }
-       if (vol->cluster_size <= 4096 && !--ntfs_nr_compression_users)
-               free_compression_buffers();
-       mutex_unlock(&ntfs_lock);
-       if (vol->upcase) {
-               ntfs_free(vol->upcase);
-               vol->upcase = NULL;
-       }
-
-       unload_nls(vol->nls_map);
-
-       sb->s_fs_info = NULL;
-       kfree(vol);
-}
-
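The default upcase table is shared between mounts with a manual use count under ntfs_lock, as seen both here and in the error paths of load_and_init_upcase() above. The same lifetime pattern in a self-contained userspace sketch (a pthread mutex standing in for the kernel mutex; all names are hypothetical):

#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static void *default_table;		/* shared, lazily created */
static unsigned long nr_table_users;

/*
 * Unmount-time release: drop our reference if we were sharing the
 * default table, free the default once the last user is gone, and
 * free a private table outside the lock.
 */
static void put_table_sketch(void **mine)
{
	pthread_mutex_lock(&table_lock);
	if (*mine == default_table) {
		nr_table_users--;
		*mine = NULL;
	}
	if (!nr_table_users && default_table) {
		free(default_table);
		default_table = NULL;
	}
	pthread_mutex_unlock(&table_lock);
	free(*mine);
	*mine = NULL;
}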
-/**
- * get_nr_free_clusters - return the number of free clusters on a volume
- * @vol:       ntfs volume for which to obtain free cluster count
- *
- * Calculate the number of free clusters on the mounted NTFS volume @vol. We
- * actually calculate the number of clusters in use instead because this
- * allows us to not care about partial pages as these will be just zero filled
- * and hence not be counted as allocated clusters.
- *
- * The only particularity is that clusters beyond the end of the logical ntfs
- * volume will be marked as allocated to prevent errors, which means we have
- * to discount those at the end.  This is important as the cluster bitmap
- * always has a size that is a multiple of 8 bytes, i.e. up to 63 clusters
- * could lie outside the logical volume and be marked in use when they are
- * not, as they do not exist.
- *
- * If any pages cannot be read, we assume all clusters in the erroring pages
- * are in use.  This means we return an underestimate on errors, which is
- * better than an overestimate.
- */
-static s64 get_nr_free_clusters(ntfs_volume *vol)
-{
-       s64 nr_free = vol->nr_clusters;
-       struct address_space *mapping = vol->lcnbmp_ino->i_mapping;
-       struct page *page;
-       pgoff_t index, max_index;
-
-       ntfs_debug("Entering.");
-       /* Serialize accesses to the cluster bitmap. */
-       down_read(&vol->lcnbmp_lock);
-       /*
-        * Convert the number of bits into bytes rounded up, then convert into
-        * multiples of PAGE_SIZE, rounding up so that if we have one
-        * full and one partial page, max_index = 2.
-        */
-       max_index = (((vol->nr_clusters + 7) >> 3) + PAGE_SIZE - 1) >>
-                       PAGE_SHIFT;
-       /* Use multiples of 4 bytes, thus max_size is PAGE_SIZE / 4. */
-       ntfs_debug("Reading $Bitmap, max_index = 0x%lx, max_size = 0x%lx.",
-                       max_index, PAGE_SIZE / 4);
-       for (index = 0; index < max_index; index++) {
-               unsigned long *kaddr;
-
-               /*
-                * Read the page from page cache, getting it from backing store
-                * if necessary, and increment the use count.
-                */
-               page = read_mapping_page(mapping, index, NULL);
-               /* Ignore pages which errored synchronously. */
-               if (IS_ERR(page)) {
-                       ntfs_debug("read_mapping_page() error. Skipping "
-                                       "page (index 0x%lx).", index);
-                       nr_free -= PAGE_SIZE * 8;
-                       continue;
-               }
-               kaddr = kmap_atomic(page);
-               /*
-                * Subtract the number of set bits.  If this is the last
-                * page and it is partial, we do not really care: it just
-                * means we do a little extra work, but it will not affect
-                * the result, as all out of range bytes are set to zero by
-                * ntfs_readpage().
-                */
-               nr_free -= bitmap_weight(kaddr,
-                                       PAGE_SIZE * BITS_PER_BYTE);
-               kunmap_atomic(kaddr);
-               put_page(page);
-       }
-       ntfs_debug("Finished reading $Bitmap, last index = 0x%lx.", index - 1);
-       /*
-        * Fixup for eventual bits outside logical ntfs volume (see function
-        * description above).
-        */
-       if (vol->nr_clusters & 63)
-               nr_free += 64 - (vol->nr_clusters & 63);
-       up_read(&vol->lcnbmp_lock);
-       /* If errors occurred we may well have gone below zero, fix this. */
-       if (nr_free < 0)
-               nr_free = 0;
-       ntfs_debug("Exiting.");
-       return nr_free;
-}
-
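A worked example of the final fixup: a volume of 1000 clusters has its bitmap rounded up to 128 bytes (1024 bits), so the 24 bits for nonexistent clusters 1000-1023 are set on disk; after subtracting every set bit, the code adds back 64 - (1000 & 63) = 24. The whole computation over an in-memory copy of the bitmap (sketch; the buffer stands in for the page cache walk and is assumed to cover the rounded-up bitmap size):

#include <stdint.h>

static int64_t free_clusters_sketch(const uint64_t *bitmap,
		int64_t nr_clusters)
{
	/* Bytes rounded up, then 64-bit words rounded up. */
	int64_t nr_words = (((nr_clusters + 7) >> 3) + 7) >> 3;
	int64_t nr_free = nr_clusters;
	int64_t i;

	for (i = 0; i < nr_words; i++)
		nr_free -= __builtin_popcountll(bitmap[i]);
	/* Add back the set bits beyond the end of the volume. */
	if (nr_clusters & 63)
		nr_free += 64 - (nr_clusters & 63);
	return nr_free < 0 ? 0 : nr_free;
}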
-/**
- * __get_nr_free_mft_records - return the number of free inodes on a volume
- * @vol:       ntfs volume for which to obtain free inode count
- * @nr_free:   number of mft records in filesystem
- * @max_index: maximum number of pages containing set bits
- *
- * Calculate the number of free mft records (inodes) on the mounted NTFS
- * volume @vol. We actually calculate the number of mft records in use instead
- * because this allows us to not care about partial pages as these will be just
- * zero filled and hence not be counted as allocated mft records.
- *
- * If any pages cannot be read, we assume all mft records in the erroring
- * pages are in use.  This means we return an underestimate on errors, which
- * is better than an overestimate.
- *
- * NOTE: Caller must hold mftbmp_lock rw_semaphore for reading or writing.
- */
-static unsigned long __get_nr_free_mft_records(ntfs_volume *vol,
-               s64 nr_free, const pgoff_t max_index)
-{
-       struct address_space *mapping = vol->mftbmp_ino->i_mapping;
-       struct page *page;
-       pgoff_t index;
-
-       ntfs_debug("Entering.");
-       /* Use multiples of 4 bytes, thus max_size is PAGE_SIZE / 4. */
-       ntfs_debug("Reading $MFT/$BITMAP, max_index = 0x%lx, max_size = "
-                       "0x%lx.", max_index, PAGE_SIZE / 4);
-       for (index = 0; index < max_index; index++) {
-               unsigned long *kaddr;
-
-               /*
-                * Read the page from page cache, getting it from backing store
-                * if necessary, and increment the use count.
-                */
-               page = read_mapping_page(mapping, index, NULL);
-               /* Ignore pages which errored synchronously. */
-               if (IS_ERR(page)) {
-                       ntfs_debug("read_mapping_page() error. Skipping "
-                                       "page (index 0x%lx).", index);
-                       nr_free -= PAGE_SIZE * 8;
-                       continue;
-               }
-               kaddr = kmap_atomic(page);
-               /*
-                * Subtract the number of set bits.  If this is the last
-                * page and it is partial, we do not really care: it just
-                * means we do a little extra work, but it will not affect
-                * the result, as all out of range bytes are set to zero by
-                * ntfs_readpage().
-                */
-               nr_free -= bitmap_weight(kaddr,
-                                       PAGE_SIZE * BITS_PER_BYTE);
-               kunmap_atomic(kaddr);
-               put_page(page);
-       }
-       ntfs_debug("Finished reading $MFT/$BITMAP, last index = 0x%lx.",
-                       index - 1);
-       /* If errors occurred we may well have gone below zero, fix this. */
-       if (nr_free < 0)
-               nr_free = 0;
-       ntfs_debug("Exiting.");
-       return nr_free;
-}
-
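ntfs_statfs() below derives both parameters under mft_ni->size_lock: nr_free starts from the total number of mft records, and max_index bounds the bitmap pages that can contain set bits. The arithmetic in isolation (sketch; the 4096-byte page and 1024-byte mft record sizes are illustrative assumptions):

#include <stdint.h>

#define SK_PAGE_SHIFT		12	/* assumed 4096-byte pages */
#define SK_MFT_RECORD_BITS	10	/* assumed 1024-byte mft records */

static void statfs_inputs_sketch(int64_t mft_data_size,
		int64_t mft_initialized_size,
		int64_t *nr_records, uint64_t *max_index)
{
	/* Total mft records: data size in units of the record size. */
	*nr_records = mft_data_size >> SK_MFT_RECORD_BITS;
	/*
	 * Initialized records -> bytes of bitmap (rounded up) -> pages
	 * (rounded up).
	 */
	*max_index = ((((mft_initialized_size >> SK_MFT_RECORD_BITS) + 7)
			>> 3) + (1UL << SK_PAGE_SHIFT) - 1) >> SK_PAGE_SHIFT;
}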
-/**
- * ntfs_statfs - return information about mounted NTFS volume
- * @dentry:    dentry from mounted volume
- * @sfs:       statfs structure in which to return the information
- *
- * Return information about the mounted NTFS volume to which @dentry belongs
- * in the statfs structure pointed to by @sfs (this is initialized with
- * zeros before ntfs_statfs is called).  We interpret the values to be
- * correct at the moment in time at which we are called.  Most values are
- * otherwise variable, and not just the free values but the totals as well.
- * For example, we can increase the total number of file nodes if we run out
- * and can keep doing this until there is no space left on the volume at
- * all.
- *
- * Called from vfs_statfs which is used to handle the statfs, fstatfs, and
- * ustat system calls.
- *
- * Return 0 on success or -errno on error.
- */
-static int ntfs_statfs(struct dentry *dentry, struct kstatfs *sfs)
-{
-       struct super_block *sb = dentry->d_sb;
-       s64 size;
-       ntfs_volume *vol = NTFS_SB(sb);
-       ntfs_inode *mft_ni = NTFS_I(vol->mft_ino);
-       pgoff_t max_index;
-       unsigned long flags;
-
-       ntfs_debug("Entering.");
-       /* Type of filesystem. */
-       sfs->f_type   = NTFS_SB_MAGIC;
-       /* Optimal transfer block size. */
-       sfs->f_bsize  = PAGE_SIZE;
-       /*
-        * Total data blocks in filesystem in units of f_bsize.  Since
-        * inodes are also stored in data blocks ($MFT is a file), this is
-        * just the total number of clusters.
-        */
-       sfs->f_blocks = vol->nr_clusters << vol->cluster_size_bits >>
-                               PAGE_SHIFT;
-       /* Free data blocks in filesystem in units of f_bsize. */
-       size          = get_nr_free_clusters(vol) << vol->cluster_size_bits >>
-                               PAGE_SHIFT;
-       if (size < 0LL)
-               size = 0LL;
-       /* Free blocks avail to non-superuser, same as above on NTFS. */
-       sfs->f_bavail = sfs->f_bfree = size;
-       /* Serialize accesses to the inode bitmap. */
-       down_read(&vol->mftbmp_lock);
-       read_lock_irqsave(&mft_ni->size_lock, flags);
-       size = i_size_read(vol->mft_ino) >> vol->mft_record_size_bits;
-       /*
-        * Convert the maximum number of set bits into bytes rounded up, then
-        * convert into multiples of PAGE_SIZE, rounding up so that if we
-        * have one full and one partial page, max_index = 2.
-        */
-       max_index = ((((mft_ni->initialized_size >> vol->mft_record_size_bits)
-                       + 7) >> 3) + PAGE_SIZE - 1) >> PAGE_SHIFT;
-       read_unlock_irqrestore(&mft_ni->size_lock, flags);
-       /* Number of inodes in filesystem (at this point in time). */
-       sfs->f_files = size;
-       /* Free inodes in fs (based on current total count). */
-       sfs->f_ffree = __get_nr_free_mft_records(vol, size, max_index);
-       up_read(&vol->mftbmp_lock);
-       /*
-        * File system id. This is extremely *nix flavour dependent and even
-        * within Linux itself all fs do their own thing. I interpret this to
-        * mean a unique id associated with the mounted fs and not the id
-        * associated with the filesystem driver; the latter is already given
-        * by the filesystem type in sfs->f_type.  Thus we use the 64-bit
-        * volume serial number, splitting it into two 32-bit parts.  We enter
-        * the least significant 32-bits in f_fsid[0] and the most significant
-        * 32-bits in f_fsid[1].
-        */
-       sfs->f_fsid = u64_to_fsid(vol->serial_no);
-       /* Maximum length of filenames. */
-       sfs->f_namelen     = NTFS_MAX_NAME_LEN;
-       return 0;
-}
-
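The f_fsid policy the comment above describes, spelled out: u64_to_fsid() packs the 64-bit serial number as two 32-bit halves, least significant half first. A standalone equivalent (sketch, with a hypothetical struct in place of __kernel_fsid_t):

#include <stdint.h>

struct fsid_sketch {
	uint32_t val[2];
};

static struct fsid_sketch serial_to_fsid_sketch(uint64_t serial_no)
{
	return (struct fsid_sketch){
		.val = { (uint32_t)serial_no, (uint32_t)(serial_no >> 32) },
	};
}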
-#ifdef NTFS_RW
-static int ntfs_write_inode(struct inode *vi, struct writeback_control *wbc)
-{
-       return __ntfs_write_inode(vi, wbc->sync_mode == WB_SYNC_ALL);
-}
-#endif
-
-/*
- * The complete super operations.
- */
-static const struct super_operations ntfs_sops = {
-       .alloc_inode    = ntfs_alloc_big_inode,   /* VFS: Allocate new inode. */
-       .free_inode     = ntfs_free_big_inode, /* VFS: Deallocate inode. */
-#ifdef NTFS_RW
-       .write_inode    = ntfs_write_inode,     /* VFS: Write dirty inode to
-                                                  disk. */
-#endif /* NTFS_RW */
-       .put_super      = ntfs_put_super,       /* Syscall: umount. */
-       .statfs         = ntfs_statfs,          /* Syscall: statfs */
-       .remount_fs     = ntfs_remount,         /* Syscall: mount -o remount. */
-       .evict_inode    = ntfs_evict_big_inode, /* VFS: Called when an inode is
-                                                  removed from memory. */
-       .show_options   = ntfs_show_options,    /* Show mount options in
-                                                  proc. */
-};
-
-/**
- * ntfs_fill_super - mount an ntfs filesystem
- * @sb:                super block of ntfs filesystem to mount
- * @opt:       string containing the mount options
- * @silent:    silence error output
- *
- * ntfs_fill_super() is called by the VFS to mount the device described by @sb
- * with the NTFS filesystem, using the mount options in @opt.
- *
- * If @silent is true, remain silent even if errors are detected. This is used
- * during bootup, when the kernel tries to mount the root filesystem with all
- * registered filesystems one after the other until one succeeds. This implies
- * that all filesystems except the correct one will, quite correctly and
- * expectedly, return an error, but nobody wants to see error messages when
- * in fact this is what is supposed to happen.
- *
- * NOTE: @sb->s_flags contains the mount options flags.
- */
-static int ntfs_fill_super(struct super_block *sb, void *opt, const int silent)
-{
-       ntfs_volume *vol;
-       struct buffer_head *bh;
-       struct inode *tmp_ino;
-       int blocksize, result;
-
-       /*
-        * We do a pretty difficult piece of bootstrap by reading the
-        * MFT (and other metadata) from disk into memory. We'll only
-        * release this metadata during umount, so the locking patterns
-        * observed during bootstrap do not count. So turn off the
-        * observation of locking patterns (strictly for this context
-        * only) while mounting NTFS. [The validator is still active
-        * otherwise, even for this context: it will for example record
-        * lock class registrations.]
-        */
-       lockdep_off();
-       ntfs_debug("Entering.");
-#ifndef NTFS_RW
-       sb->s_flags |= SB_RDONLY;
-#endif /* ! NTFS_RW */
-       /* Allocate a new ntfs_volume and place it in sb->s_fs_info. */
-       sb->s_fs_info = kmalloc(sizeof(ntfs_volume), GFP_NOFS);
-       vol = NTFS_SB(sb);
-       if (!vol) {
-               if (!silent)
-                       ntfs_error(sb, "Allocation of NTFS volume structure "
-                                       "failed. Aborting mount...");
-               lockdep_on();
-               return -ENOMEM;
-       }
-       /* Initialize ntfs_volume structure. */
-       *vol = (ntfs_volume) {
-               .sb = sb,
-               /*
-                * Default is group and other don't have any access to files or
-                * directories while owner has full access. Further, files by
-                * default are not executable but directories are of course
-                * browseable.
-                */
-               .fmask = 0177,
-               .dmask = 0077,
-       };
-       init_rwsem(&vol->mftbmp_lock);
-       init_rwsem(&vol->lcnbmp_lock);
-
-       /* By default, enable sparse support. */
-       NVolSetSparseEnabled(vol);
-
-       /* Important to get the mount options dealt with now. */
-       if (!parse_options(vol, (char*)opt))
-               goto err_out_now;
-
-       /* We support sector sizes up to the PAGE_SIZE. */
-       if (bdev_logical_block_size(sb->s_bdev) > PAGE_SIZE) {
-               if (!silent)
-                       ntfs_error(sb, "Device has unsupported sector size "
-                                       "(%i).  The maximum supported sector "
-                                       "size on this architecture is %lu "
-                                       "bytes.",
-                                       bdev_logical_block_size(sb->s_bdev),
-                                       PAGE_SIZE);
-               goto err_out_now;
-       }
-       /*
-        * Setup the device access block size to NTFS_BLOCK_SIZE or the hard
-        * sector size, whichever is bigger.
-        */
-       blocksize = sb_min_blocksize(sb, NTFS_BLOCK_SIZE);
-       if (blocksize < NTFS_BLOCK_SIZE) {
-               if (!silent)
-                       ntfs_error(sb, "Unable to set device block size.");
-               goto err_out_now;
-       }
-       BUG_ON(blocksize != sb->s_blocksize);
-       ntfs_debug("Set device block size to %i bytes (block size bits %i).",
-                       blocksize, sb->s_blocksize_bits);
-       /* Determine the size of the device in units of block_size bytes. */
-       vol->nr_blocks = sb_bdev_nr_blocks(sb);
-       if (!vol->nr_blocks) {
-               if (!silent)
-                       ntfs_error(sb, "Unable to determine device size.");
-               goto err_out_now;
-       }
-       /* Read the boot sector and return unlocked buffer head to it. */
-       if (!(bh = read_ntfs_boot_sector(sb, silent))) {
-               if (!silent)
-                       ntfs_error(sb, "Not an NTFS volume.");
-               goto err_out_now;
-       }
-       /*
-        * Extract the data from the boot sector and setup the ntfs volume
-        * using it.
-        */
-       result = parse_ntfs_boot_sector(vol, (NTFS_BOOT_SECTOR*)bh->b_data);
-       brelse(bh);
-       if (!result) {
-               if (!silent)
-                       ntfs_error(sb, "Unsupported NTFS filesystem.");
-               goto err_out_now;
-       }
-       /*
-        * If the boot sector indicates a sector size bigger than the current
-        * device block size, switch the device block size to the sector size.
-        * TODO: It may be possible to support this case even when the set
-        * below fails; we would just be breaking up the i/o for each sector
-        * into multiple blocks for i/o purposes but otherwise it should just
-        * work.  However it is safer to leave disabled until someone hits this
-        * error message and then we can get them to try it without the setting
-        * so we know for sure that it works.
-        */
-       if (vol->sector_size > blocksize) {
-               blocksize = sb_set_blocksize(sb, vol->sector_size);
-               if (blocksize != vol->sector_size) {
-                       if (!silent)
-                               ntfs_error(sb, "Unable to set device block "
-                                               "size to sector size (%i).",
-                                               vol->sector_size);
-                       goto err_out_now;
-               }
-               BUG_ON(blocksize != sb->s_blocksize);
-               vol->nr_blocks = sb_bdev_nr_blocks(sb);
-               ntfs_debug("Changed device block size to %i bytes (block size "
-                               "bits %i) to match volume sector size.",
-                               blocksize, sb->s_blocksize_bits);
-       }
-       /* Initialize the cluster and mft allocators. */
-       ntfs_setup_allocators(vol);
-       /* Setup remaining fields in the super block. */
-       sb->s_magic = NTFS_SB_MAGIC;
-       /*
-        * Ntfs allows 63 bits for the file size, i.e. the correct value would be:
-        *      sb->s_maxbytes = ~0ULL >> 1;
-        * But the kernel uses a long as the page cache page index which on
-        * 32-bit architectures is only 32-bits. MAX_LFS_FILESIZE is kernel
-        * defined to the maximum the page cache page index can cope with
-        * without overflowing the index or to 2^63 - 1, whichever is smaller.
-        */
-       sb->s_maxbytes = MAX_LFS_FILESIZE;
-       /* Ntfs measures time in 100ns intervals. */
-       sb->s_time_gran = 100;
-       /*
-        * Now load the metadata required for the page cache and our address
-        * space operations to function. We do this by setting up a specialised
-        * read_inode method and then just calling the normal iget() to obtain
-        * the inode for $MFT which is sufficient to allow our normal inode
-        * operations and associated address space operations to function.
-        */
-       sb->s_op = &ntfs_sops;
-       tmp_ino = new_inode(sb);
-       if (!tmp_ino) {
-               if (!silent)
-                       ntfs_error(sb, "Failed to load essential metadata.");
-               goto err_out_now;
-       }
-       tmp_ino->i_ino = FILE_MFT;
-       insert_inode_hash(tmp_ino);
-       if (ntfs_read_inode_mount(tmp_ino) < 0) {
-               if (!silent)
-                       ntfs_error(sb, "Failed to load essential metadata.");
-               goto iput_tmp_ino_err_out_now;
-       }
-       mutex_lock(&ntfs_lock);
-       /*
-        * The current mount is a compression user if the cluster size is
-        * less than or equal to 4kiB.
-        */
-       if (vol->cluster_size <= 4096 && !ntfs_nr_compression_users++) {
-               result = allocate_compression_buffers();
-               if (result) {
-                       ntfs_error(NULL, "Failed to allocate buffers "
-                                       "for compression engine.");
-                       ntfs_nr_compression_users--;
-                       mutex_unlock(&ntfs_lock);
-                       goto iput_tmp_ino_err_out_now;
-               }
-       }
-       /*
-        * Generate the global default upcase table if necessary.  Also
-        * temporarily increment the number of upcase users to avoid race
-        * conditions with concurrent (u)mounts.
-        */
-       if (!default_upcase)
-               default_upcase = generate_default_upcase();
-       ntfs_nr_upcase_users++;
-       mutex_unlock(&ntfs_lock);
-       /*
-        * From now on, ignore the @silent parameter. If we fail below this line,
-        * it will be due to a corrupt fs or a system error, so we report it.
-        */
-       /*
-        * Open the system files with normal access functions and complete
-        * setting up the ntfs super block.
-        */
-       if (!load_system_files(vol)) {
-               ntfs_error(sb, "Failed to load system files.");
-               goto unl_upcase_iput_tmp_ino_err_out_now;
-       }
-
-       /* We grab a reference, simulating an ntfs_iget(). */
-       ihold(vol->root_ino);
-       if ((sb->s_root = d_make_root(vol->root_ino))) {
-               ntfs_debug("Exiting, status successful.");
-               /* Release the default upcase if it has no users. */
-               mutex_lock(&ntfs_lock);
-               if (!--ntfs_nr_upcase_users && default_upcase) {
-                       ntfs_free(default_upcase);
-                       default_upcase = NULL;
-               }
-               mutex_unlock(&ntfs_lock);
-               sb->s_export_op = &ntfs_export_ops;
-               lockdep_on();
-               return 0;
-       }
-       ntfs_error(sb, "Failed to allocate root directory.");
-       /* Clean up after the successful load_system_files() call from above. */
-       // TODO: Use ntfs_put_super() instead of repeating all this code...
-       // FIXME: Should mark the volume clean as the error is most likely
-       //        -ENOMEM.
-       iput(vol->vol_ino);
-       vol->vol_ino = NULL;
-       /* NTFS 3.0+ specific clean up. */
-       if (vol->major_ver >= 3) {
-#ifdef NTFS_RW
-               if (vol->usnjrnl_j_ino) {
-                       iput(vol->usnjrnl_j_ino);
-                       vol->usnjrnl_j_ino = NULL;
-               }
-               if (vol->usnjrnl_max_ino) {
-                       iput(vol->usnjrnl_max_ino);
-                       vol->usnjrnl_max_ino = NULL;
-               }
-               if (vol->usnjrnl_ino) {
-                       iput(vol->usnjrnl_ino);
-                       vol->usnjrnl_ino = NULL;
-               }
-               if (vol->quota_q_ino) {
-                       iput(vol->quota_q_ino);
-                       vol->quota_q_ino = NULL;
-               }
-               if (vol->quota_ino) {
-                       iput(vol->quota_ino);
-                       vol->quota_ino = NULL;
-               }
-#endif /* NTFS_RW */
-               if (vol->extend_ino) {
-                       iput(vol->extend_ino);
-                       vol->extend_ino = NULL;
-               }
-               if (vol->secure_ino) {
-                       iput(vol->secure_ino);
-                       vol->secure_ino = NULL;
-               }
-       }
-       iput(vol->root_ino);
-       vol->root_ino = NULL;
-       iput(vol->lcnbmp_ino);
-       vol->lcnbmp_ino = NULL;
-       iput(vol->mftbmp_ino);
-       vol->mftbmp_ino = NULL;
-#ifdef NTFS_RW
-       if (vol->logfile_ino) {
-               iput(vol->logfile_ino);
-               vol->logfile_ino = NULL;
-       }
-       if (vol->mftmirr_ino) {
-               iput(vol->mftmirr_ino);
-               vol->mftmirr_ino = NULL;
-       }
-#endif /* NTFS_RW */
-       /* Throw away the table of attribute definitions. */
-       vol->attrdef_size = 0;
-       if (vol->attrdef) {
-               ntfs_free(vol->attrdef);
-               vol->attrdef = NULL;
-       }
-       vol->upcase_len = 0;
-       mutex_lock(&ntfs_lock);
-       if (vol->upcase == default_upcase) {
-               ntfs_nr_upcase_users--;
-               vol->upcase = NULL;
-       }
-       mutex_unlock(&ntfs_lock);
-       if (vol->upcase) {
-               ntfs_free(vol->upcase);
-               vol->upcase = NULL;
-       }
-       if (vol->nls_map) {
-               unload_nls(vol->nls_map);
-               vol->nls_map = NULL;
-       }
-       /* Error exit code path. */
-unl_upcase_iput_tmp_ino_err_out_now:
-       /*
-        * Decrease the number of upcase users and destroy the global default
-        * upcase table if necessary.
-        */
-       mutex_lock(&ntfs_lock);
-       if (!--ntfs_nr_upcase_users && default_upcase) {
-               ntfs_free(default_upcase);
-               default_upcase = NULL;
-       }
-       if (vol->cluster_size <= 4096 && !--ntfs_nr_compression_users)
-               free_compression_buffers();
-       mutex_unlock(&ntfs_lock);
-iput_tmp_ino_err_out_now:
-       iput(tmp_ino);
-       if (vol->mft_ino && vol->mft_ino != tmp_ino)
-               iput(vol->mft_ino);
-       vol->mft_ino = NULL;
-       /* Errors at this stage are irrelevant. */
-err_out_now:
-       sb->s_fs_info = NULL;
-       kfree(vol);
-       ntfs_debug("Failed, returning -EINVAL.");
-       lockdep_on();
-       return -EINVAL;
-}
-
-/*
- * This is a slab cache to optimize allocations and deallocations of Unicode
- * strings of the maximum length allowed by NTFS, which is NTFS_MAX_NAME_LEN
- * (255) Unicode characters + a terminating NULL Unicode character.
- */
-struct kmem_cache *ntfs_name_cache;
-
-/* Slab caches for efficient allocation/deallocation of inodes. */
-struct kmem_cache *ntfs_inode_cache;
-struct kmem_cache *ntfs_big_inode_cache;
-
-/* Init once constructor for the inode slab cache. */
-static void ntfs_big_inode_init_once(void *foo)
-{
-       ntfs_inode *ni = (ntfs_inode *)foo;
-
-       inode_init_once(VFS_I(ni));
-}
-
-/*
- * Slab caches to optimize allocations and deallocations of attribute search
- * contexts and index contexts, respectively.
- */
-struct kmem_cache *ntfs_attr_ctx_cache;
-struct kmem_cache *ntfs_index_ctx_cache;
-
-/* Driver wide mutex. */
-DEFINE_MUTEX(ntfs_lock);
-
-static struct dentry *ntfs_mount(struct file_system_type *fs_type,
-       int flags, const char *dev_name, void *data)
-{
-       return mount_bdev(fs_type, flags, dev_name, data, ntfs_fill_super);
-}
-
-static struct file_system_type ntfs_fs_type = {
-       .owner          = THIS_MODULE,
-       .name           = "ntfs",
-       .mount          = ntfs_mount,
-       .kill_sb        = kill_block_super,
-       .fs_flags       = FS_REQUIRES_DEV,
-};
-MODULE_ALIAS_FS("ntfs");
-
-/* Stable names for the slab caches. */
-static const char ntfs_index_ctx_cache_name[] = "ntfs_index_ctx_cache";
-static const char ntfs_attr_ctx_cache_name[] = "ntfs_attr_ctx_cache";
-static const char ntfs_name_cache_name[] = "ntfs_name_cache";
-static const char ntfs_inode_cache_name[] = "ntfs_inode_cache";
-static const char ntfs_big_inode_cache_name[] = "ntfs_big_inode_cache";
-
-static int __init init_ntfs_fs(void)
-{
-       int err = 0;
-
-       /* This may be ugly but it results in pretty output so who cares. (-8 */
-       pr_info("driver " NTFS_VERSION " [Flags: R/"
-#ifdef NTFS_RW
-                       "W"
-#else
-                       "O"
-#endif
-#ifdef DEBUG
-                       " DEBUG"
-#endif
-#ifdef MODULE
-                       " MODULE"
-#endif
-                       "].\n");
-
-       ntfs_debug("Debug messages are enabled.");
-
-       ntfs_index_ctx_cache = kmem_cache_create(ntfs_index_ctx_cache_name,
-                       sizeof(ntfs_index_context), 0 /* offset */,
-                       SLAB_HWCACHE_ALIGN, NULL /* ctor */);
-       if (!ntfs_index_ctx_cache) {
-               pr_crit("Failed to create %s!\n", ntfs_index_ctx_cache_name);
-               goto ictx_err_out;
-       }
-       ntfs_attr_ctx_cache = kmem_cache_create(ntfs_attr_ctx_cache_name,
-                       sizeof(ntfs_attr_search_ctx), 0 /* offset */,
-                       SLAB_HWCACHE_ALIGN, NULL /* ctor */);
-       if (!ntfs_attr_ctx_cache) {
-               pr_crit("NTFS: Failed to create %s!\n",
-                       ntfs_attr_ctx_cache_name);
-               goto actx_err_out;
-       }
-
-       ntfs_name_cache = kmem_cache_create(ntfs_name_cache_name,
-                       (NTFS_MAX_NAME_LEN+1) * sizeof(ntfschar), 0,
-                       SLAB_HWCACHE_ALIGN, NULL);
-       if (!ntfs_name_cache) {
-               pr_crit("Failed to create %s!\n", ntfs_name_cache_name);
-               goto name_err_out;
-       }
-
-       ntfs_inode_cache = kmem_cache_create(ntfs_inode_cache_name,
-                       sizeof(ntfs_inode), 0,
-                       SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD, NULL);
-       if (!ntfs_inode_cache) {
-               pr_crit("Failed to create %s!\n", ntfs_inode_cache_name);
-               goto inode_err_out;
-       }
-
-       ntfs_big_inode_cache = kmem_cache_create(ntfs_big_inode_cache_name,
-                       sizeof(big_ntfs_inode), 0,
-                       SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|
-                       SLAB_ACCOUNT, ntfs_big_inode_init_once);
-       if (!ntfs_big_inode_cache) {
-               pr_crit("Failed to create %s!\n", ntfs_big_inode_cache_name);
-               goto big_inode_err_out;
-       }
-
-       /* Register the ntfs sysctls. */
-       err = ntfs_sysctl(1);
-       if (err) {
-               pr_crit("Failed to register NTFS sysctls!\n");
-               goto sysctl_err_out;
-       }
-
-       err = register_filesystem(&ntfs_fs_type);
-       if (!err) {
-               ntfs_debug("NTFS driver registered successfully.");
-               return 0; /* Success! */
-       }
-       pr_crit("Failed to register NTFS filesystem driver!\n");
-
-       /* Unregister the ntfs sysctls. */
-       ntfs_sysctl(0);
-sysctl_err_out:
-       kmem_cache_destroy(ntfs_big_inode_cache);
-big_inode_err_out:
-       kmem_cache_destroy(ntfs_inode_cache);
-inode_err_out:
-       kmem_cache_destroy(ntfs_name_cache);
-name_err_out:
-       kmem_cache_destroy(ntfs_attr_ctx_cache);
-actx_err_out:
-       kmem_cache_destroy(ntfs_index_ctx_cache);
-ictx_err_out:
-       if (!err) {
-               pr_crit("Aborting NTFS filesystem driver registration...\n");
-               err = -ENOMEM;
-       }
-       return err;
-}
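
The failure paths above unwind in strict reverse order of setup, with each
label undoing exactly one step. A minimal freestanding sketch of the same
goto-ladder idiom (illustrative stand-in names, not driver code):

    #include <stdbool.h>

    static bool get_a(void) { return true; }    /* stand-ins for the caches */
    static bool get_b(void) { return true; }
    static bool get_c(void) { return false; }   /* pretend the third step fails */
    static void put_a(void) { }
    static void put_b(void) { }

    static int setup(void)
    {
            if (!get_a())
                    goto err_out;
            if (!get_b())
                    goto put_a_out;
            if (!get_c())
                    goto put_b_out;
            return 0;
    put_b_out:              /* step three failed: undo two, then one */
            put_b();
    put_a_out:
            put_a();
    err_out:
            return -1;
    }

    int main(void)
    {
            return setup() ? 1 : 0;
    }
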
-
-static void __exit exit_ntfs_fs(void)
-{
-       ntfs_debug("Unregistering NTFS driver.");
-
-       unregister_filesystem(&ntfs_fs_type);
-
-       /*
-        * Make sure all delayed rcu free inodes are flushed before we
-        * destroy cache.
-        */
-       rcu_barrier();
-       kmem_cache_destroy(ntfs_big_inode_cache);
-       kmem_cache_destroy(ntfs_inode_cache);
-       kmem_cache_destroy(ntfs_name_cache);
-       kmem_cache_destroy(ntfs_attr_ctx_cache);
-       kmem_cache_destroy(ntfs_index_ctx_cache);
-       /* Unregister the ntfs sysctls. */
-       ntfs_sysctl(0);
-}
-
-MODULE_AUTHOR("Anton Altaparmakov <anton@tuxera.com>");
-MODULE_DESCRIPTION("NTFS 1.2/3.x driver - Copyright (c) 2001-2014 Anton Altaparmakov and Tuxera Inc.");
-MODULE_VERSION(NTFS_VERSION);
-MODULE_LICENSE("GPL");
-#ifdef DEBUG
-module_param(debug_msgs, bint, 0);
-MODULE_PARM_DESC(debug_msgs, "Enable debug messages.");
-#endif
-
-module_init(init_ntfs_fs)
-module_exit(exit_ntfs_fs)
diff --git a/fs/ntfs/sysctl.c b/fs/ntfs/sysctl.c
deleted file mode 100644 (file)
index 4e98017..0000000
+++ /dev/null
@@ -1,58 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * sysctl.c - Code for sysctl handling in NTFS Linux kernel driver. Part of
- *           the Linux-NTFS project. Adapted from the old NTFS driver,
- *           Copyright (C) 1997 Martin von Löwis, Régis Duchesne
- *
- * Copyright (c) 2002-2005 Anton Altaparmakov
- */
-
-#ifdef DEBUG
-
-#include <linux/module.h>
-
-#ifdef CONFIG_SYSCTL
-
-#include <linux/proc_fs.h>
-#include <linux/sysctl.h>
-
-#include "sysctl.h"
-#include "debug.h"
-
-/* Definition of the ntfs sysctl. */
-static struct ctl_table ntfs_sysctls[] = {
-       {
-               .procname       = "ntfs-debug",
-               .data           = &debug_msgs,          /* Data pointer and size. */
-               .maxlen         = sizeof(debug_msgs),
-               .mode           = 0644,                 /* Mode, proc handler. */
-               .proc_handler   = proc_dointvec
-       },
-};
-
-/* Storage for the sysctls header. */
-static struct ctl_table_header *sysctls_root_table;
-
-/**
- * ntfs_sysctl - add or remove the debug sysctl
- * @add:       add (1) or remove (0) the sysctl
- *
- * Add or remove the debug sysctl. Return 0 on success or -errno on error.
- */
-int ntfs_sysctl(int add)
-{
-       if (add) {
-               BUG_ON(sysctls_root_table);
-               sysctls_root_table = register_sysctl("fs", ntfs_sysctls);
-               if (!sysctls_root_table)
-                       return -ENOMEM;
-       } else {
-               BUG_ON(!sysctls_root_table);
-               unregister_sysctl_table(sysctls_root_table);
-               sysctls_root_table = NULL;
-       }
-       return 0;
-}
-
-#endif /* CONFIG_SYSCTL */
-#endif /* DEBUG */
diff --git a/fs/ntfs/sysctl.h b/fs/ntfs/sysctl.h
deleted file mode 100644 (file)
index 96bb229..0000000
+++ /dev/null
@@ -1,27 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * sysctl.h - Defines for sysctl handling in NTFS Linux kernel driver. Part of
- *           the Linux-NTFS project. Adapted from the old NTFS driver,
- *           Copyright (C) 1997 Martin von Löwis, Régis Duchesne
- *
- * Copyright (c) 2002-2004 Anton Altaparmakov
- */
-
-#ifndef _LINUX_NTFS_SYSCTL_H
-#define _LINUX_NTFS_SYSCTL_H
-
-
-#if defined(DEBUG) && defined(CONFIG_SYSCTL)
-
-extern int ntfs_sysctl(int add);
-
-#else
-
-/* Just return success. */
-static inline int ntfs_sysctl(int add)
-{
-       return 0;
-}
-
-#endif /* DEBUG && CONFIG_SYSCTL */
-#endif /* _LINUX_NTFS_SYSCTL_H */
diff --git a/fs/ntfs/time.h b/fs/ntfs/time.h
deleted file mode 100644 (file)
index 6b63261..0000000
+++ /dev/null
@@ -1,89 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * time.h - NTFS time conversion functions.  Part of the Linux-NTFS project.
- *
- * Copyright (c) 2001-2005 Anton Altaparmakov
- */
-
-#ifndef _LINUX_NTFS_TIME_H
-#define _LINUX_NTFS_TIME_H
-
-#include <linux/time.h>                /* For struct timespec64. */
-#include <asm/div64.h>         /* For do_div(). */
-
-#include "endian.h"
-
-#define NTFS_TIME_OFFSET ((s64)(369 * 365 + 89) * 24 * 3600 * 10000000)
-
-/**
- * utc2ntfs - convert Linux UTC time to NTFS time
- * @ts:                Linux UTC time to convert to NTFS time
- *
- * Convert the Linux UTC time @ts to its corresponding NTFS time and return
- * that in little endian format.
- *
- * Linux stores time in a struct timespec64 consisting of a time64_t tv_sec
- * and a long tv_nsec where tv_sec is the number of 1-second intervals since
- * 1st January 1970, 00:00:00 UTC and tv_nsec is the number of 1-nano-second
- * intervals since the value of tv_sec.
- *
- * NTFS uses Microsoft's standard time format which is stored in a s64 and is
- * measured as the number of 100-nano-second intervals since 1st January 1601,
- * 00:00:00 UTC.
- */
-static inline sle64 utc2ntfs(const struct timespec64 ts)
-{
-       /*
-        * Convert the seconds to 100ns intervals, add the nano-seconds
-        * converted to 100ns intervals, and then add the NTFS time offset.
-        */
-       return cpu_to_sle64((s64)ts.tv_sec * 10000000 + ts.tv_nsec / 100 +
-                       NTFS_TIME_OFFSET);
-}
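
NTFS_TIME_OFFSET is the 1601-to-1970 gap expressed in 100ns units:
(369 * 365 + 89) days is 11644473600 seconds. A userspace check of the
arithmetic above (a sketch, not driver code; the #define is copied from
above):

    #include <assert.h>
    #include <stdint.h>

    #define NTFS_TIME_OFFSET ((int64_t)(369 * 365 + 89) * 24 * 3600 * 10000000)

    int main(void)
    {
            /* The Unix epoch itself (tv_sec = tv_nsec = 0) maps to the offset. */
            int64_t tv_sec = 0, tv_nsec = 0;
            int64_t ntfs = tv_sec * 10000000 + tv_nsec / 100 + NTFS_TIME_OFFSET;

            assert(NTFS_TIME_OFFSET == 11644473600LL * 10000000);
            assert(ntfs == 116444736000000000LL);
            return 0;
    }
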
-
-/**
- * get_current_ntfs_time - get the current time in little endian NTFS format
- *
- * Get the current time from the Linux kernel, convert it to its corresponding
- * NTFS time and return that in little endian format.
- */
-static inline sle64 get_current_ntfs_time(void)
-{
-       struct timespec64 ts;
-
-       ktime_get_coarse_real_ts64(&ts);
-       return utc2ntfs(ts);
-}
-
-/**
- * ntfs2utc - convert NTFS time to Linux time
- * @time:      NTFS time (little endian) to convert to Linux UTC
- *
- * Convert the little endian NTFS time @time to its corresponding Linux UTC
- * time and return that in cpu format.
- *
- * Linux stores time in a struct timespec64 consisting of a time64_t tv_sec
- * and a long tv_nsec where tv_sec is the number of 1-second intervals since
- * 1st January 1970, 00:00:00 UTC and tv_nsec is the number of 1-nano-second
- * intervals since the value of tv_sec.
- *
- * NTFS uses Microsoft's standard time format which is stored in a s64 and is
- * measured as the number of 100 nano-second intervals since 1st January 1601,
- * 00:00:00 UTC.
- */
-static inline struct timespec64 ntfs2utc(const sle64 time)
-{
-       struct timespec64 ts;
-
-       /* Subtract the NTFS time offset. */
-       u64 t = (u64)(sle64_to_cpu(time) - NTFS_TIME_OFFSET);
-       /*
-        * Convert the time to 1-second intervals and the remainder to
-        * 1-nano-second intervals.
-        */
-       ts.tv_nsec = do_div(t, 10000000) * 100;
-       ts.tv_sec = t;
-       return ts;
-}
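
In the kernel, do_div(n, base) divides n in place and returns the remainder,
so the split above leaves seconds in t and scales the remainder to
nanoseconds. A userspace model of the same split (illustration only):

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
            /* 15 s and 300 ns (three 100ns units) past the NTFS epoch offset. */
            uint64_t t = 116444736000000000ULL + 15 * 10000000ULL + 3;
            long tv_nsec;

            t -= 116444736000000000ULL;             /* subtract NTFS_TIME_OFFSET */
            tv_nsec = (long)(t % 10000000) * 100;   /* do_div() remainder, in ns */
            t /= 10000000;                          /* do_div() quotient: seconds */
            assert(t == 15 && tv_nsec == 300);
            return 0;
    }
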
-
-#endif /* _LINUX_NTFS_TIME_H */
diff --git a/fs/ntfs/types.h b/fs/ntfs/types.h
deleted file mode 100644 (file)
index 9a47859..0000000
+++ /dev/null
@@ -1,55 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * types.h - Defines for NTFS Linux kernel driver specific types.
- *          Part of the Linux-NTFS project.
- *
- * Copyright (c) 2001-2005 Anton Altaparmakov
- */
-
-#ifndef _LINUX_NTFS_TYPES_H
-#define _LINUX_NTFS_TYPES_H
-
-#include <linux/types.h>
-
-typedef __le16 le16;
-typedef __le32 le32;
-typedef __le64 le64;
-typedef __u16 __bitwise sle16;
-typedef __u32 __bitwise sle32;
-typedef __u64 __bitwise sle64;
-
-/* 2-byte Unicode character type. */
-typedef le16 ntfschar;
-#define UCHAR_T_SIZE_BITS 1
-
-/*
- * Clusters are signed 64-bit values on NTFS volumes. We define two types, LCN
- * and VCN, to allow for type checking and better code readability.
- */
-typedef s64 VCN;
-typedef sle64 leVCN;
-typedef s64 LCN;
-typedef sle64 leLCN;
-
-/*
- * The NTFS journal $LogFile uses log sequence numbers which are signed 64-bit
- * values.  We define our own type LSN, to allow for type checking and better
- * code readability.
- */
-typedef s64 LSN;
-typedef sle64 leLSN;
-
-/*
- * The NTFS transaction log $UsnJrnl uses usns, which are signed 64-bit
- * values.
- * We define our own type USN, to allow for type checking and better code
- * readability.
- */
-typedef s64 USN;
-typedef sle64 leUSN;
-
-typedef enum {
-       CASE_SENSITIVE = 0,
-       IGNORE_CASE = 1,
-} IGNORE_CASE_BOOL;
-
-#endif /* _LINUX_NTFS_TYPES_H */
diff --git a/fs/ntfs/unistr.c b/fs/ntfs/unistr.c
deleted file mode 100644 (file)
index a6b6c64..0000000
+++ /dev/null
@@ -1,384 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * unistr.c - NTFS Unicode string handling. Part of the Linux-NTFS project.
- *
- * Copyright (c) 2001-2006 Anton Altaparmakov
- */
-
-#include <linux/slab.h>
-
-#include "types.h"
-#include "debug.h"
-#include "ntfs.h"
-
-/*
- * IMPORTANT
- * =========
- *
- * All these routines assume that the Unicode characters are in little endian
- * encoding inside the strings!!!
- */
-
-/*
- * This is used by the name collation functions to quickly determine what
- * characters are (in)valid.
- */
-static const u8 legal_ansi_char_array[0x40] = {
-       0x00, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10,
-       0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10,
-
-       0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10,
-       0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10,
-
-       0x17, 0x07, 0x18, 0x17, 0x17, 0x17, 0x17, 0x17,
-       0x17, 0x17, 0x18, 0x16, 0x16, 0x17, 0x07, 0x00,
-
-       0x17, 0x17, 0x17, 0x17, 0x17, 0x17, 0x17, 0x17,
-       0x17, 0x17, 0x04, 0x16, 0x18, 0x16, 0x18, 0x18,
-};
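
As used by ntfs_collate_names() below, bit 3 (value 8) in this table flags
the characters NTFS rejects in names: '"', '*', '<', '>' and '?'. A
self-contained check of that reading of the table (not driver code; the
table values are copied from above):

    #include <assert.h>

    static const unsigned char tbl[0x40] = {
            0x00, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10,
            0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10,
            0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10,
            0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10, 0x10,
            0x17, 0x07, 0x18, 0x17, 0x17, 0x17, 0x17, 0x17,
            0x17, 0x17, 0x18, 0x16, 0x16, 0x17, 0x07, 0x00,
            0x17, 0x17, 0x17, 0x17, 0x17, 0x17, 0x17, 0x17,
            0x17, 0x17, 0x04, 0x16, 0x18, 0x16, 0x18, 0x18,
    };

    int main(void)
    {
            /* Bit 3 is set exactly for the five invalid name characters. */
            for (unsigned int c = 0; c < 0x40; c++)
                    assert(((tbl[c] & 8) != 0) ==
                           (c == '"' || c == '*' || c == '<' ||
                            c == '>' || c == '?'));
            return 0;
    }
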
-
-/**
- * ntfs_are_names_equal - compare two Unicode names for equality
- * @s1:                        name to compare to @s2
- * @s1_len:            length in Unicode characters of @s1
- * @s2:                        name to compare to @s1
- * @s2_len:            length in Unicode characters of @s2
- * @ic:                        ignore case bool
- * @upcase:            upcase table (only if @ic == IGNORE_CASE)
- * @upcase_size:       length in Unicode characters of @upcase (if present)
- *
- * Compare the names @s1 and @s2 and return 'true' (1) if the names are
- * identical, or 'false' (0) if they are not identical. If @ic is IGNORE_CASE,
- * the @upcase table is used to perform a case insensitive comparison.
- */
-bool ntfs_are_names_equal(const ntfschar *s1, size_t s1_len,
-               const ntfschar *s2, size_t s2_len, const IGNORE_CASE_BOOL ic,
-               const ntfschar *upcase, const u32 upcase_size)
-{
-       if (s1_len != s2_len)
-               return false;
-       if (ic == CASE_SENSITIVE)
-               return !ntfs_ucsncmp(s1, s2, s1_len);
-       return !ntfs_ucsncasecmp(s1, s2, s1_len, upcase, upcase_size);
-}
-
-/**
- * ntfs_collate_names - collate two Unicode names
- * @name1:     first Unicode name to compare
- * @name2:     second Unicode name to compare
- * @err_val:   if @name1 contains an invalid character return this value
- * @ic:                either CASE_SENSITIVE or IGNORE_CASE
- * @upcase:    upcase table (ignored if @ic is CASE_SENSITIVE)
- * @upcase_len:        upcase table size (ignored if @ic is CASE_SENSITIVE)
- *
- * ntfs_collate_names collates two Unicode names and returns:
- *
- *  -1 if the first name collates before the second one,
- *   0 if the names match,
- *   1 if the second name collates before the first one, or
- * @err_val if an invalid character is found in @name1 during the comparison.
- *
- * The following characters are considered invalid: '"', '*', '<', '>' and '?'.
- */
-int ntfs_collate_names(const ntfschar *name1, const u32 name1_len,
-               const ntfschar *name2, const u32 name2_len,
-               const int err_val, const IGNORE_CASE_BOOL ic,
-               const ntfschar *upcase, const u32 upcase_len)
-{
-       u32 cnt, min_len;
-       u16 c1, c2;
-
-       min_len = name1_len;
-       if (name1_len > name2_len)
-               min_len = name2_len;
-       for (cnt = 0; cnt < min_len; ++cnt) {
-               c1 = le16_to_cpu(*name1++);
-               c2 = le16_to_cpu(*name2++);
-               if (ic) {
-                       if (c1 < upcase_len)
-                               c1 = le16_to_cpu(upcase[c1]);
-                       if (c2 < upcase_len)
-                               c2 = le16_to_cpu(upcase[c2]);
-               }
-               if (c1 < 64 && legal_ansi_char_array[c1] & 8)
-                       return err_val;
-               if (c1 < c2)
-                       return -1;
-               if (c1 > c2)
-                       return 1;
-       }
-       if (name1_len < name2_len)
-               return -1;
-       if (name1_len == name2_len)
-               return 0;
-       /* name1_len > name2_len */
-       c1 = le16_to_cpu(*name1);
-       if (c1 < 64 && legal_ansi_char_array[c1] & 8)
-               return err_val;
-       return 1;
-}
-
-/**
- * ntfs_ucsncmp - compare two little endian Unicode strings
- * @s1:                first string
- * @s2:                second string
- * @n:         maximum Unicode characters to compare
- *
- * Compare the first @n characters of the Unicode strings @s1 and @s2.
- * The strings are in little endian format and the appropriate
- * le16_to_cpu() conversion is performed on non-little endian machines.
- *
- * The function returns an integer less than, equal to, or greater than zero
- * if @s1 (or the first @n Unicode characters thereof) is found, respectively,
- * to be less than, to match, or be greater than @s2.
- */
-int ntfs_ucsncmp(const ntfschar *s1, const ntfschar *s2, size_t n)
-{
-       u16 c1, c2;
-       size_t i;
-
-       for (i = 0; i < n; ++i) {
-               c1 = le16_to_cpu(s1[i]);
-               c2 = le16_to_cpu(s2[i]);
-               if (c1 < c2)
-                       return -1;
-               if (c1 > c2)
-                       return 1;
-               if (!c1)
-                       break;
-       }
-       return 0;
-}
-
-/**
- * ntfs_ucsncasecmp - compare two little endian Unicode strings, ignoring case
- * @s1:                        first string
- * @s2:                        second string
- * @n:                 maximum Unicode characters to compare
- * @upcase:            upcase table
- * @upcase_size:       upcase table size in Unicode characters
- *
- * Compare the first @n characters of the Unicode strings @s1 and @s2,
- * ignoring case.  The strings are in little endian format and the
- * appropriate le16_to_cpu() conversion is performed on non-little endian
- * machines.
- *
- * Each character is uppercased using the @upcase table before the comparison.
- *
- * The function returns an integer less than, equal to, or greater than zero
- * if @s1 (or the first @n Unicode characters thereof) is found, respectively,
- * to be less than, to match, or be greater than @s2.
- */
-int ntfs_ucsncasecmp(const ntfschar *s1, const ntfschar *s2, size_t n,
-               const ntfschar *upcase, const u32 upcase_size)
-{
-       size_t i;
-       u16 c1, c2;
-
-       for (i = 0; i < n; ++i) {
-               if ((c1 = le16_to_cpu(s1[i])) < upcase_size)
-                       c1 = le16_to_cpu(upcase[c1]);
-               if ((c2 = le16_to_cpu(s2[i])) < upcase_size)
-                       c2 = le16_to_cpu(upcase[c2]);
-               if (c1 < c2)
-                       return -1;
-               if (c1 > c2)
-                       return 1;
-               if (!c1)
-                       break;
-       }
-       return 0;
-}
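
Both sides are mapped through the upcase table before comparing, so
equality is case-insensitive wherever the table folds case. An ASCII-only
model of the mechanism (a sketch; the real table covers the full 16-bit
range):

    #include <assert.h>
    #include <ctype.h>
    #include <stdint.h>

    int main(void)
    {
            uint16_t upcase[128];
            const uint16_t s1[] = { 'N', 't', 'F', 'S' };
            const uint16_t s2[] = { 'n', 'T', 'f', 's' };

            for (int i = 0; i < 128; i++)
                    upcase[i] = (uint16_t)toupper(i);   /* toy upcase table */
            for (int i = 0; i < 4; i++)
                    assert(upcase[s1[i]] == upcase[s2[i]]);
            return 0;
    }
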
-
-void ntfs_upcase_name(ntfschar *name, u32 name_len, const ntfschar *upcase,
-               const u32 upcase_len)
-{
-       u32 i;
-       u16 u;
-
-       for (i = 0; i < name_len; i++)
-               if ((u = le16_to_cpu(name[i])) < upcase_len)
-                       name[i] = upcase[u];
-}
-
-void ntfs_file_upcase_value(FILE_NAME_ATTR *file_name_attr,
-               const ntfschar *upcase, const u32 upcase_len)
-{
-       ntfs_upcase_name((ntfschar*)&file_name_attr->file_name,
-                       file_name_attr->file_name_length, upcase, upcase_len);
-}
-
-int ntfs_file_compare_values(FILE_NAME_ATTR *file_name_attr1,
-               FILE_NAME_ATTR *file_name_attr2,
-               const int err_val, const IGNORE_CASE_BOOL ic,
-               const ntfschar *upcase, const u32 upcase_len)
-{
-       return ntfs_collate_names((ntfschar*)&file_name_attr1->file_name,
-                       file_name_attr1->file_name_length,
-                       (ntfschar*)&file_name_attr2->file_name,
-                       file_name_attr2->file_name_length,
-                       err_val, ic, upcase, upcase_len);
-}
-
-/**
- * ntfs_nlstoucs - convert NLS string to little endian Unicode string
- * @vol:       ntfs volume which we are working with
- * @ins:       input NLS string buffer
- * @ins_len:   length of input string in bytes
- * @outs:      on return contains the allocated output Unicode string buffer
- *
- * Convert the input string @ins, which is in whatever format the loaded NLS
- * map dictates, into a little endian, 2-byte Unicode string.
- *
- * This function allocates the string and the caller is responsible for
- * calling kmem_cache_free(ntfs_name_cache, *@outs); when finished with it.
- *
- * On success the function returns the number of Unicode characters written to
- * the output string *@outs (>= 0), not counting the terminating Unicode NULL
- * character. *@outs is set to the allocated output string buffer.
- *
- * On error, a negative number corresponding to the error code is returned. In
- * that case the output string is not allocated and the contents of *@outs
- * are undefined.
- *
- * This might look a bit odd due to fast path optimization...
- */
-int ntfs_nlstoucs(const ntfs_volume *vol, const char *ins,
-               const int ins_len, ntfschar **outs)
-{
-       struct nls_table *nls = vol->nls_map;
-       ntfschar *ucs;
-       wchar_t wc;
-       int i, o, wc_len;
-
-       /* We do not trust outside sources. */
-       if (likely(ins)) {
-               ucs = kmem_cache_alloc(ntfs_name_cache, GFP_NOFS);
-               if (likely(ucs)) {
-                       for (i = o = 0; i < ins_len; i += wc_len) {
-                               wc_len = nls->char2uni(ins + i, ins_len - i,
-                                               &wc);
-                               if (likely(wc_len >= 0 &&
-                                               o < NTFS_MAX_NAME_LEN)) {
-                                       if (likely(wc)) {
-                                               ucs[o++] = cpu_to_le16(wc);
-                                               continue;
-                                       } /* else if (!wc) */
-                                       break;
-                               } /* else if (wc_len < 0 ||
-                                               o >= NTFS_MAX_NAME_LEN) */
-                               goto name_err;
-                       }
-                       ucs[o] = 0;
-                       *outs = ucs;
-                       return o;
-               } /* else if (!ucs) */
-               ntfs_error(vol->sb, "Failed to allocate buffer for converted "
-                               "name from ntfs_name_cache.");
-               return -ENOMEM;
-       } /* else if (!ins) */
-       ntfs_error(vol->sb, "Received NULL pointer.");
-       return -EINVAL;
-name_err:
-       kmem_cache_free(ntfs_name_cache, ucs);
-       if (wc_len < 0) {
-               ntfs_error(vol->sb, "Name using character set %s contains "
-                               "characters that cannot be converted to "
-                               "Unicode.", nls->charset);
-               i = -EILSEQ;
-       } else /* if (o >= NTFS_MAX_NAME_LEN) */ {
-               ntfs_error(vol->sb, "Name is too long (maximum length for a "
-                               "name on NTFS is %d Unicode characters.",
-                               NTFS_MAX_NAME_LEN);
-               i = -ENAMETOOLONG;
-       }
-       return i;
-}
-
-/**
- * ntfs_ucstonls - convert little endian Unicode string to NLS string
- * @vol:       ntfs volume which we are working with
- * @ins:       input Unicode string buffer
- * @ins_len:   length of input string in Unicode characters
- * @outs:      on return contains the (allocated) output NLS string buffer
- * @outs_len:  length of output string buffer in bytes
- *
- * Convert the input little endian, 2-byte Unicode string @ins, of length
- * @ins_len into the string format dictated by the loaded NLS.
- *
- * If *@outs is NULL, this function allocates the string and the caller is
- * responsible for calling kfree(*@outs); when finished with it. In this case
- * @outs_len is ignored and can be 0.
- *
- * On success the function returns the number of bytes written to the output
- * string *@outs (>= 0), not counting the terminating NULL byte. If the output
- * string buffer was allocated, *@outs is set to it.
- *
- * On error, a negative number corresponding to the error code is returned. In
- * that case the output string is not allocated. The contents of *@outs are
- * then undefined.
- *
- * This might look a bit odd due to fast path optimization...
- */
-int ntfs_ucstonls(const ntfs_volume *vol, const ntfschar *ins,
-               const int ins_len, unsigned char **outs, int outs_len)
-{
-       struct nls_table *nls = vol->nls_map;
-       unsigned char *ns;
-       int i, o, ns_len, wc;
-
-       /* We don't trust outside sources. */
-       if (ins) {
-               ns = *outs;
-               ns_len = outs_len;
-               if (ns && !ns_len) {
-                       wc = -ENAMETOOLONG;
-                       goto conversion_err;
-               }
-               if (!ns) {
-                       ns_len = ins_len * NLS_MAX_CHARSET_SIZE;
-                       ns = kmalloc(ns_len + 1, GFP_NOFS);
-                       if (!ns)
-                               goto mem_err_out;
-               }
-               for (i = o = 0; i < ins_len; i++) {
-retry:                 wc = nls->uni2char(le16_to_cpu(ins[i]), ns + o,
-                                       ns_len - o);
-                       if (wc > 0) {
-                               o += wc;
-                               continue;
-                       } else if (!wc)
-                               break;
-                       else if (wc == -ENAMETOOLONG && ns != *outs) {
-                               unsigned char *tc;
-                               /* Grow in multiples of 64 bytes. */
-                               tc = kmalloc((ns_len + 64) &
-                                               ~63, GFP_NOFS);
-                               if (tc) {
-                                       memcpy(tc, ns, ns_len);
-                                       ns_len = ((ns_len + 64) & ~63) - 1;
-                                       kfree(ns);
-                                       ns = tc;
-                                       goto retry;
-                               } /* No memory so goto conversion_err; */
-                       } /* wc < 0, real error. */
-                       goto conversion_err;
-               }
-               ns[o] = 0;
-               *outs = ns;
-               return o;
-       } /* else (!ins) */
-       ntfs_error(vol->sb, "Received NULL pointer.");
-       return -EINVAL;
-conversion_err:
-       ntfs_error(vol->sb, "Unicode name contains characters that cannot be "
-                       "converted to character set %s.  You might want to "
-                       "try to use the mount option nls=utf8.", nls->charset);
-       if (ns != *outs)
-               kfree(ns);
-       if (wc != -ENAMETOOLONG)
-               wc = -EILSEQ;
-       return wc;
-mem_err_out:
-       ntfs_error(vol->sb, "Failed to allocate name!");
-       return -ENOMEM;
-}
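
The retry path above grows the output buffer with (ns_len + 64) & ~63,
which always rounds up to a strictly larger multiple of 64. A quick check
of that expression (illustration only):

    #include <assert.h>

    int main(void)
    {
            assert(((0 + 64) & ~63) == 64);
            assert(((63 + 64) & ~63) == 64);
            assert(((64 + 64) & ~63) == 128);   /* already a multiple: still grows */
            assert(((65 + 64) & ~63) == 128);
            return 0;
    }
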
diff --git a/fs/ntfs/upcase.c b/fs/ntfs/upcase.c
deleted file mode 100644 (file)
index 4ebe84a..0000000
+++ /dev/null
@@ -1,73 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * upcase.c - Generate the full NTFS Unicode upcase table in little endian.
- *           Part of the Linux-NTFS project.
- *
- * Copyright (c) 2001 Richard Russon <ntfs@flatcap.org>
- * Copyright (c) 2001-2006 Anton Altaparmakov
- */
-
-#include "malloc.h"
-#include "ntfs.h"
-
-ntfschar *generate_default_upcase(void)
-{
-       static const int uc_run_table[][3] = { /* Start, End, Add */
-       {0x0061, 0x007B,  -32}, {0x0451, 0x045D, -80}, {0x1F70, 0x1F72,  74},
-       {0x00E0, 0x00F7,  -32}, {0x045E, 0x0460, -80}, {0x1F72, 0x1F76,  86},
-       {0x00F8, 0x00FF,  -32}, {0x0561, 0x0587, -48}, {0x1F76, 0x1F78, 100},
-       {0x0256, 0x0258, -205}, {0x1F00, 0x1F08,   8}, {0x1F78, 0x1F7A, 128},
-       {0x028A, 0x028C, -217}, {0x1F10, 0x1F16,   8}, {0x1F7A, 0x1F7C, 112},
-       {0x03AC, 0x03AD,  -38}, {0x1F20, 0x1F28,   8}, {0x1F7C, 0x1F7E, 126},
-       {0x03AD, 0x03B0,  -37}, {0x1F30, 0x1F38,   8}, {0x1FB0, 0x1FB2,   8},
-       {0x03B1, 0x03C2,  -32}, {0x1F40, 0x1F46,   8}, {0x1FD0, 0x1FD2,   8},
-       {0x03C2, 0x03C3,  -31}, {0x1F51, 0x1F52,   8}, {0x1FE0, 0x1FE2,   8},
-       {0x03C3, 0x03CC,  -32}, {0x1F53, 0x1F54,   8}, {0x1FE5, 0x1FE6,   7},
-       {0x03CC, 0x03CD,  -64}, {0x1F55, 0x1F56,   8}, {0x2170, 0x2180, -16},
-       {0x03CD, 0x03CF,  -63}, {0x1F57, 0x1F58,   8}, {0x24D0, 0x24EA, -26},
-       {0x0430, 0x0450,  -32}, {0x1F60, 0x1F68,   8}, {0xFF41, 0xFF5B, -32},
-       {0}
-       };
-
-       static const int uc_dup_table[][2] = { /* Start, End */
-       {0x0100, 0x012F}, {0x01A0, 0x01A6}, {0x03E2, 0x03EF}, {0x04CB, 0x04CC},
-       {0x0132, 0x0137}, {0x01B3, 0x01B7}, {0x0460, 0x0481}, {0x04D0, 0x04EB},
-       {0x0139, 0x0149}, {0x01CD, 0x01DD}, {0x0490, 0x04BF}, {0x04EE, 0x04F5},
-       {0x014A, 0x0178}, {0x01DE, 0x01EF}, {0x04BF, 0x04BF}, {0x04F8, 0x04F9},
-       {0x0179, 0x017E}, {0x01F4, 0x01F5}, {0x04C1, 0x04C4}, {0x1E00, 0x1E95},
-       {0x018B, 0x018B}, {0x01FA, 0x0218}, {0x04C7, 0x04C8}, {0x1EA0, 0x1EF9},
-       {0}
-       };
-
-       static const int uc_word_table[][2] = { /* Offset, Value */
-       {0x00FF, 0x0178}, {0x01AD, 0x01AC}, {0x01F3, 0x01F1}, {0x0269, 0x0196},
-       {0x0183, 0x0182}, {0x01B0, 0x01AF}, {0x0253, 0x0181}, {0x026F, 0x019C},
-       {0x0185, 0x0184}, {0x01B9, 0x01B8}, {0x0254, 0x0186}, {0x0272, 0x019D},
-       {0x0188, 0x0187}, {0x01BD, 0x01BC}, {0x0259, 0x018F}, {0x0275, 0x019F},
-       {0x018C, 0x018B}, {0x01C6, 0x01C4}, {0x025B, 0x0190}, {0x0283, 0x01A9},
-       {0x0192, 0x0191}, {0x01C9, 0x01C7}, {0x0260, 0x0193}, {0x0288, 0x01AE},
-       {0x0199, 0x0198}, {0x01CC, 0x01CA}, {0x0263, 0x0194}, {0x0292, 0x01B7},
-       {0x01A8, 0x01A7}, {0x01DD, 0x018E}, {0x0268, 0x0197},
-       {0}
-       };
-
-       int i, r;
-       ntfschar *uc;
-
-       uc = ntfs_malloc_nofs(default_upcase_len * sizeof(ntfschar));
-       if (!uc)
-               return uc;
-       memset(uc, 0, default_upcase_len * sizeof(ntfschar));
-       /* Generate the little endian Unicode upcase table used by ntfs. */
-       for (i = 0; i < default_upcase_len; i++)
-               uc[i] = cpu_to_le16(i);
-       for (r = 0; uc_run_table[r][0]; r++)
-               for (i = uc_run_table[r][0]; i < uc_run_table[r][1]; i++)
-                       le16_add_cpu(&uc[i], uc_run_table[r][2]);
-       for (r = 0; uc_dup_table[r][0]; r++)
-               for (i = uc_dup_table[r][0]; i < uc_dup_table[r][1]; i += 2)
-                       le16_add_cpu(&uc[i + 1], -1);
-       for (r = 0; uc_word_table[r][0]; r++)
-               uc[uc_word_table[r][0]] = cpu_to_le16(uc_word_table[r][1]);
-       return uc;
-}
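
Each run-table entry adds a constant to a half-open code-point range; the
first entry, {0x0061, 0x007B, -32}, is the one that folds ASCII 'a'..'z'
onto 'A'..'Z'. A trimmed-down model of the run pass (not driver code,
host-endian for simplicity):

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
            uint16_t uc[0x80];
            int i;

            for (i = 0; i < 0x80; i++)
                    uc[i] = (uint16_t)i;    /* identity, as in the driver */
            for (i = 0x0061; i < 0x007B; i++)
                    uc[i] -= 32;            /* apply the run's "Add" of -32 */
            assert(uc['a'] == 'A' && uc['z'] == 'Z' && uc['A'] == 'A');
            return 0;
    }
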
diff --git a/fs/ntfs/usnjrnl.c b/fs/ntfs/usnjrnl.c
deleted file mode 100644 (file)
index 9097a0b..0000000
+++ /dev/null
@@ -1,70 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * usnjrnl.c - NTFS kernel transaction log ($UsnJrnl) handling.  Part of the
- *            Linux-NTFS project.
- *
- * Copyright (c) 2005 Anton Altaparmakov
- */
-
-#ifdef NTFS_RW
-
-#include <linux/fs.h>
-#include <linux/highmem.h>
-#include <linux/mm.h>
-
-#include "aops.h"
-#include "debug.h"
-#include "endian.h"
-#include "time.h"
-#include "types.h"
-#include "usnjrnl.h"
-#include "volume.h"
-
-/**
- * ntfs_stamp_usnjrnl - stamp the transaction log ($UsnJrnl) on an ntfs volume
- * @vol:       ntfs volume on which to stamp the transaction log
- *
- * Stamp the transaction log ($UsnJrnl) on the ntfs volume @vol and return
- * 'true' on success and 'false' on error.
- *
- * This function assumes that the transaction log has already been loaded and
- * consistency checked by a call to fs/ntfs/super.c::load_and_init_usnjrnl().
- */
-bool ntfs_stamp_usnjrnl(ntfs_volume *vol)
-{
-       ntfs_debug("Entering.");
-       if (likely(!NVolUsnJrnlStamped(vol))) {
-               sle64 stamp;
-               struct page *page;
-               USN_HEADER *uh;
-
-               page = ntfs_map_page(vol->usnjrnl_max_ino->i_mapping, 0);
-               if (IS_ERR(page)) {
-                       ntfs_error(vol->sb, "Failed to read from "
-                                       "$UsnJrnl/$DATA/$Max attribute.");
-                       return false;
-               }
-               uh = (USN_HEADER*)page_address(page);
-               stamp = get_current_ntfs_time();
-               ntfs_debug("Stamping transaction log ($UsnJrnl): old "
-                               "journal_id 0x%llx, old lowest_valid_usn "
-                               "0x%llx, new journal_id 0x%llx, new "
-                               "lowest_valid_usn 0x%llx.",
-                               (long long)sle64_to_cpu(uh->journal_id),
-                               (long long)sle64_to_cpu(uh->lowest_valid_usn),
-                               (long long)sle64_to_cpu(stamp),
-                               i_size_read(vol->usnjrnl_j_ino));
-               uh->lowest_valid_usn =
-                               cpu_to_sle64(i_size_read(vol->usnjrnl_j_ino));
-               uh->journal_id = stamp;
-               flush_dcache_page(page);
-               set_page_dirty(page);
-               ntfs_unmap_page(page);
-               /* Set the flag so we do not have to do it again on remount. */
-               NVolSetUsnJrnlStamped(vol);
-       }
-       ntfs_debug("Done.");
-       return true;
-}
-
-#endif /* NTFS_RW */
diff --git a/fs/ntfs/usnjrnl.h b/fs/ntfs/usnjrnl.h
deleted file mode 100644 (file)
index 85f531b..0000000
+++ /dev/null
@@ -1,191 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * usnjrnl.h - Defines for NTFS kernel transaction log ($UsnJrnl) handling.
- *            Part of the Linux-NTFS project.
- *
- * Copyright (c) 2005 Anton Altaparmakov
- */
-
-#ifndef _LINUX_NTFS_USNJRNL_H
-#define _LINUX_NTFS_USNJRNL_H
-
-#ifdef NTFS_RW
-
-#include "types.h"
-#include "endian.h"
-#include "layout.h"
-#include "volume.h"
-
-/*
- * Transaction log ($UsnJrnl) organization:
- *
- * The transaction log records whenever a file is modified in any way.  So for
- * example it will record that file "blah" was written to at a particular time
- * but not what was written.  It will record that a file was deleted or
- * created, that a file was truncated, etc.  See below for all the reason
- * codes used.
- *
- * The transaction log is in the $Extend directory which is in the root
- * directory of each volume.  If it is not present it means transaction
- * logging is disabled.  If it is present it means transaction logging is
- * either enabled or in the process of being disabled in which case we can
- * ignore it as it will go away as soon as Windows gets its hands on it.
- *
- * To determine whether transaction logging is enabled or in the process
- * of being disabled, one needs to check the volume flags in the
- * $VOLUME_INFORMATION attribute in the $Volume system file (which is present
- * in the root directory and has a fixed mft record number, see layout.h).
- * If the flag VOLUME_DELETE_USN_UNDERWAY is set it means the transaction log
- * is in the process of being disabled and if this flag is clear it means the
- * transaction log is enabled.
- *
- * The transaction log consists of two parts: the $DATA/$Max attribute as well
- * as the $DATA/$J attribute.  $Max is a header describing the transaction
- * log whilst $J is the transaction log data itself as a sequence of variable
- * sized USN_RECORDs (see below for all the structures).
- *
- * We do not care about transaction logging at this point in time but we still
- * need to let Windows know that the transaction log is out of date.  To do
- * this we need to stamp the transaction log.  This involves setting the
- * lowest_valid_usn field in the $DATA/$Max attribute to the usn to be used
- * for the next added USN_RECORD to the $DATA/$J attribute as well as
- * generating a new journal_id in $DATA/$Max.
- *
- * The journal_id is as of the current version (2.0) of the transaction log
- * simply the 64-bit timestamp of when the journal was either created or last
- * stamped.
- *
- * To determine the next usn there are two ways.  The first is to parse
- * $DATA/$J and to find the last USN_RECORD in it and to add its record_length
- * to its usn (which is the byte offset in the $DATA/$J attribute).  The
- * second is simply to take the data size of the attribute.  Since the usns
- * are simply byte offsets into $DATA/$J, this is exactly the next usn.  For
- * obvious reasons we use the second method as it is much simpler and faster.
- *
- * As an aside, note that to actually disable the transaction log, one would
- * need to set the VOLUME_DELETE_USN_UNDERWAY flag (see above), then go
- * through all the mft records on the volume and set the usn field in their
- * $STANDARD_INFORMATION attribute to zero.  Once that is done, one would need
- * to delete the transaction log file, i.e. \$Extend\$UsnJrnl, and finally,
- * one would need to clear the VOLUME_DELETE_USN_UNDERWAY flag.
- *
- * Note that if a volume is unmounted whilst the transaction log is being
- * disabled, the process will continue the next time the volume is mounted.
- * This is why we can safely mount read-write when we see a transaction log
- * in the process of being deleted.
- */
-
-/* Some $UsnJrnl related constants. */
-#define UsnJrnlMajorVer                2
-#define UsnJrnlMinorVer                0
-
-/*
- * $DATA/$Max attribute.  This is (always?) resident and has a fixed size of
- * 32 bytes.  It contains the header describing the transaction log.
- */
-typedef struct {
-/*Ofs*/
-/*   0*/sle64 maximum_size;    /* The maximum on-disk size of the $DATA/$J
-                                  attribute. */
-/*   8*/sle64 allocation_delta;        /* Number of bytes by which to increase the
-                                  size of the $DATA/$J attribute. */
-/*0x10*/sle64 journal_id;      /* Current id of the transaction log. */
-/*0x18*/leUSN lowest_valid_usn;        /* Lowest valid usn in $DATA/$J for the
-                                  current journal_id. */
-/* sizeof() = 32 (0x20) bytes */
-} __attribute__ ((__packed__)) USN_HEADER;
-
-/*
- * Reason flags (32-bit).  Cumulative flags describing the change(s) to the
- * file since it was last opened.  I think the names speak for themselves but
- * if you disagree check out the descriptions in the Linux NTFS project NTFS
- * documentation: http://www.linux-ntfs.org/
- */
-enum {
-       USN_REASON_DATA_OVERWRITE       = cpu_to_le32(0x00000001),
-       USN_REASON_DATA_EXTEND          = cpu_to_le32(0x00000002),
-       USN_REASON_DATA_TRUNCATION      = cpu_to_le32(0x00000004),
-       USN_REASON_NAMED_DATA_OVERWRITE = cpu_to_le32(0x00000010),
-       USN_REASON_NAMED_DATA_EXTEND    = cpu_to_le32(0x00000020),
-       USN_REASON_NAMED_DATA_TRUNCATION= cpu_to_le32(0x00000040),
-       USN_REASON_FILE_CREATE          = cpu_to_le32(0x00000100),
-       USN_REASON_FILE_DELETE          = cpu_to_le32(0x00000200),
-       USN_REASON_EA_CHANGE            = cpu_to_le32(0x00000400),
-       USN_REASON_SECURITY_CHANGE      = cpu_to_le32(0x00000800),
-       USN_REASON_RENAME_OLD_NAME      = cpu_to_le32(0x00001000),
-       USN_REASON_RENAME_NEW_NAME      = cpu_to_le32(0x00002000),
-       USN_REASON_INDEXABLE_CHANGE     = cpu_to_le32(0x00004000),
-       USN_REASON_BASIC_INFO_CHANGE    = cpu_to_le32(0x00008000),
-       USN_REASON_HARD_LINK_CHANGE     = cpu_to_le32(0x00010000),
-       USN_REASON_COMPRESSION_CHANGE   = cpu_to_le32(0x00020000),
-       USN_REASON_ENCRYPTION_CHANGE    = cpu_to_le32(0x00040000),
-       USN_REASON_OBJECT_ID_CHANGE     = cpu_to_le32(0x00080000),
-       USN_REASON_REPARSE_POINT_CHANGE = cpu_to_le32(0x00100000),
-       USN_REASON_STREAM_CHANGE        = cpu_to_le32(0x00200000),
-       USN_REASON_CLOSE                = cpu_to_le32(0x80000000),
-};
-
-typedef le32 USN_REASON_FLAGS;
-
-/*
- * Source info flags (32-bit).  Information about the source of the change(s)
- * to the file.  For detailed descriptions of what these mean, see the Linux
- * NTFS project NTFS documentation:
- *     http://www.linux-ntfs.org/
- */
-enum {
-       USN_SOURCE_DATA_MANAGEMENT        = cpu_to_le32(0x00000001),
-       USN_SOURCE_AUXILIARY_DATA         = cpu_to_le32(0x00000002),
-       USN_SOURCE_REPLICATION_MANAGEMENT = cpu_to_le32(0x00000004),
-};
-
-typedef le32 USN_SOURCE_INFO_FLAGS;
-
-/*
- * $DATA/$J attribute.  This is always non-resident, is marked as sparse, and
- * is of variable size.  It consists of a sequence of variable size
- * USN_RECORDs.  The minimum allocated_size is allocation_delta as
- * specified in $DATA/$Max.  When the maximum_size specified in $DATA/$Max is
- * exceeded by more than allocation_delta bytes, allocation_delta bytes are
- * allocated and appended to the $DATA/$J attribute and an equal number of
- * bytes at the beginning of the attribute are freed and made sparse.  Note
- * that making the data sparse only happens at volume checkpoints and hence
- * the actual $DATA/$J size can exceed maximum_size + allocation_delta
- * temporarily.
- */
-typedef struct {
-/*Ofs*/
-/*   0*/le32 length;           /* Byte size of this record (8-byte
-                                  aligned). */
-/*   4*/le16 major_ver;                /* Major version of the transaction log used
-                                  for this record. */
-/*   6*/le16 minor_ver;                /* Minor version of the transaction log used
-                                  for this record. */
-/*   8*/leMFT_REF mft_reference;/* The mft reference of the file (or
-                                  directory) described by this record. */
-/*0x10*/leMFT_REF parent_directory;/* The mft reference of the parent
-                                  directory of the file described by this
-                                  record. */
-/*0x18*/leUSN usn;             /* The usn of this record.  Equals the offset
-                                  within the $DATA/$J attribute. */
-/*0x20*/sle64 time;            /* Time when this record was created. */
-/*0x28*/USN_REASON_FLAGS reason;/* Reason flags (see above). */
-/*0x2c*/USN_SOURCE_INFO_FLAGS source_info;/* Source info flags (see above). */
-/*0x30*/le32 security_id;      /* File security_id copied from
-                                  $STANDARD_INFORMATION. */
-/*0x34*/FILE_ATTR_FLAGS file_attributes;       /* File attributes copied from
-                                  $STANDARD_INFORMATION or $FILE_NAME (not
-                                  sure which). */
-/*0x38*/le16 file_name_size;   /* Size of the file name in bytes. */
-/*0x3a*/le16 file_name_offset; /* Offset to the file name in bytes from the
-                                  start of this record. */
-/*0x3c*/ntfschar file_name[0]; /* Use when creating only.  When reading use
-                                  file_name_offset to determine the location
-                                  of the name. */
-/* sizeof() = 60 (0x3c) bytes */
-} __attribute__ ((__packed__)) USN_RECORD;
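
Because usns are byte offsets into $DATA/$J and each record leads with its
8-byte-aligned length, a consumer can walk the stream on the length field
alone. A userspace sketch of such a walker (hypothetical helper names, a
little-endian host assumed, not driver code):

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Leading fields of USN_RECORD as laid out above (little-endian on disk). */
    struct usn_hdr {
            uint32_t length;        /* byte size of the record, 8-byte aligned */
            uint16_t major_ver;
            uint16_t minor_ver;
    };

    /* Count the records in a $DATA/$J byte range, stopping at truncation. */
    static size_t count_usn_records(const uint8_t *buf, size_t size)
    {
            size_t pos = 0, n = 0;

            while (pos + sizeof(struct usn_hdr) <= size) {
                    struct usn_hdr hdr;

                    memcpy(&hdr, buf + pos, sizeof(hdr));
                    if (!hdr.length || hdr.length % 8 || pos + hdr.length > size)
                            break;          /* corrupt or truncated record */
                    n++;
                    pos += hdr.length;      /* the usn of the next record */
            }
            return n;
    }

    int main(void)
    {
            uint8_t buf[16] = { 16, 0, 0, 0, 2, 0, 0, 0 }; /* one 16-byte record */

            return count_usn_records(buf, sizeof(buf)) == 1 ? 0 : 1;
    }
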
-
-extern bool ntfs_stamp_usnjrnl(ntfs_volume *vol);
-
-#endif /* NTFS_RW */
-
-#endif /* _LINUX_NTFS_USNJRNL_H */
diff --git a/fs/ntfs/volume.h b/fs/ntfs/volume.h
deleted file mode 100644 (file)
index 930a9ae..0000000
+++ /dev/null
@@ -1,164 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * volume.h - Defines for volume structures in NTFS Linux kernel driver. Part
- *           of the Linux-NTFS project.
- *
- * Copyright (c) 2001-2006 Anton Altaparmakov
- * Copyright (c) 2002 Richard Russon
- */
-
-#ifndef _LINUX_NTFS_VOLUME_H
-#define _LINUX_NTFS_VOLUME_H
-
-#include <linux/rwsem.h>
-#include <linux/uidgid.h>
-
-#include "types.h"
-#include "layout.h"
-
-/*
- * The NTFS in memory super block structure.
- */
-typedef struct {
-       /*
-        * FIXME: Reorder to have commonly used together element within the
-        * same cache line, aiming at a cache line size of 32 bytes. Aim for
-        * 64 bytes for less commonly used together elements. Put most commonly
-        * used elements to front of structure. Obviously do this only when the
-        * structure has stabilized... (AIA)
-        */
-       /* Device specifics. */
-       struct super_block *sb;         /* Pointer back to the super_block. */
-       LCN nr_blocks;                  /* Number of sb->s_blocksize bytes
-                                          sized blocks on the device. */
-       /* Configuration provided by user at mount time. */
-       unsigned long flags;            /* Miscellaneous flags, see below. */
-       kuid_t uid;                     /* uid that files will be mounted as. */
-       kgid_t gid;                     /* gid that files will be mounted as. */
-       umode_t fmask;                  /* The mask for file permissions. */
-       umode_t dmask;                  /* The mask for directory
-                                          permissions. */
-       u8 mft_zone_multiplier;         /* Initial mft zone multiplier. */
-       u8 on_errors;                   /* What to do on filesystem errors. */
-       /* NTFS bootsector provided information. */
-       u16 sector_size;                /* in bytes */
-       u8 sector_size_bits;            /* log2(sector_size) */
-       u32 cluster_size;               /* in bytes */
-       u32 cluster_size_mask;          /* cluster_size - 1 */
-       u8 cluster_size_bits;           /* log2(cluster_size) */
-       u32 mft_record_size;            /* in bytes */
-       u32 mft_record_size_mask;       /* mft_record_size - 1 */
-       u8 mft_record_size_bits;        /* log2(mft_record_size) */
-       u32 index_record_size;          /* in bytes */
-       u32 index_record_size_mask;     /* index_record_size - 1 */
-       u8 index_record_size_bits;      /* log2(index_record_size) */
-       LCN nr_clusters;                /* Volume size in clusters == number of
-                                          bits in lcn bitmap. */
-       LCN mft_lcn;                    /* Cluster location of mft data. */
-       LCN mftmirr_lcn;                /* Cluster location of copy of mft. */
-       u64 serial_no;                  /* The volume serial number. */
-       /* Mount specific NTFS information. */
-       u32 upcase_len;                 /* Number of entries in upcase[]. */
-       ntfschar *upcase;               /* The upcase table. */
-
-       s32 attrdef_size;               /* Size of the attribute definition
-                                          table in bytes. */
-       ATTR_DEF *attrdef;              /* Table of attribute definitions.
-                                          Obtained from FILE_AttrDef. */
-
-#ifdef NTFS_RW
-       /* Variables used by the cluster and mft allocators. */
-       s64 mft_data_pos;               /* Mft record number at which to
-                                          allocate the next mft record. */
-       LCN mft_zone_start;             /* First cluster of the mft zone. */
-       LCN mft_zone_end;               /* First cluster beyond the mft zone. */
-       LCN mft_zone_pos;               /* Current position in the mft zone. */
-       LCN data1_zone_pos;             /* Current position in the first data
-                                          zone. */
-       LCN data2_zone_pos;             /* Current position in the second data
-                                          zone. */
-#endif /* NTFS_RW */
-
-       struct inode *mft_ino;          /* The VFS inode of $MFT. */
-
-       struct inode *mftbmp_ino;       /* Attribute inode for $MFT/$BITMAP. */
-       struct rw_semaphore mftbmp_lock; /* Lock for serializing accesses to the
-                                           mft record bitmap ($MFT/$BITMAP). */
-#ifdef NTFS_RW
-       struct inode *mftmirr_ino;      /* The VFS inode of $MFTMirr. */
-       int mftmirr_size;               /* Size of mft mirror in mft records. */
-
-       struct inode *logfile_ino;      /* The VFS inode of $LogFile. */
-#endif /* NTFS_RW */
-
-       struct inode *lcnbmp_ino;       /* The VFS inode of $Bitmap. */
-       struct rw_semaphore lcnbmp_lock; /* Lock for serializing accesses to the
-                                           cluster bitmap ($Bitmap/$DATA). */
-
-       struct inode *vol_ino;          /* The VFS inode of $Volume. */
-       VOLUME_FLAGS vol_flags;         /* Volume flags. */
-       u8 major_ver;                   /* Ntfs major version of volume. */
-       u8 minor_ver;                   /* Ntfs minor version of volume. */
-
-       struct inode *root_ino;         /* The VFS inode of the root
-                                          directory. */
-       struct inode *secure_ino;       /* The VFS inode of $Secure (NTFS3.0+
-                                          only, otherwise NULL). */
-       struct inode *extend_ino;       /* The VFS inode of $Extend (NTFS3.0+
-                                          only, otherwise NULL). */
-#ifdef NTFS_RW
-       /* $Quota stuff is NTFS3.0+ specific.  Unused/NULL otherwise. */
-       struct inode *quota_ino;        /* The VFS inode of $Quota. */
-       struct inode *quota_q_ino;      /* Attribute inode for $Quota/$Q. */
-       /* $UsnJrnl stuff is NTFS3.0+ specific.  Unused/NULL otherwise. */
-       struct inode *usnjrnl_ino;      /* The VFS inode of $UsnJrnl. */
-       struct inode *usnjrnl_max_ino;  /* Attribute inode for $UsnJrnl/$Max. */
-       struct inode *usnjrnl_j_ino;    /* Attribute inode for $UsnJrnl/$J. */
-#endif /* NTFS_RW */
-       struct nls_table *nls_map;
-} ntfs_volume;
-
-/*
- * Defined bits for the flags field in the ntfs_volume structure.
- */
-typedef enum {
-       NV_Errors,              /* 1: Volume has errors, prevent remount rw. */
-       NV_ShowSystemFiles,     /* 1: Return system files in ntfs_readdir(). */
-       NV_CaseSensitive,       /* 1: Treat file names as case sensitive and
-                                     create filenames in the POSIX namespace.
-                                     Otherwise be case insensitive but still
-                                     create file names in POSIX namespace. */
-       NV_LogFileEmpty,        /* 1: $LogFile journal is empty. */
-       NV_QuotaOutOfDate,      /* 1: $Quota is out of date. */
-       NV_UsnJrnlStamped,      /* 1: $UsnJrnl has been stamped. */
-       NV_SparseEnabled,       /* 1: May create sparse files. */
-} ntfs_volume_flags;
-
-/*
- * Macro tricks to expand the NVolFoo(), NVolSetFoo(), and NVolClearFoo()
- * functions.
- */
-#define DEFINE_NVOL_BIT_OPS(flag)                                      \
-static inline int NVol##flag(ntfs_volume *vol)         \
-{                                                      \
-       return test_bit(NV_##flag, &(vol)->flags);      \
-}                                                      \
-static inline void NVolSet##flag(ntfs_volume *vol)     \
-{                                                      \
-       set_bit(NV_##flag, &(vol)->flags);              \
-}                                                      \
-static inline void NVolClear##flag(ntfs_volume *vol)   \
-{                                                      \
-       clear_bit(NV_##flag, &(vol)->flags);            \
-}
-
-/* Emit the ntfs volume bitops functions. */
-DEFINE_NVOL_BIT_OPS(Errors)
-DEFINE_NVOL_BIT_OPS(ShowSystemFiles)
-DEFINE_NVOL_BIT_OPS(CaseSensitive)
-DEFINE_NVOL_BIT_OPS(LogFileEmpty)
-DEFINE_NVOL_BIT_OPS(QuotaOutOfDate)
-DEFINE_NVOL_BIT_OPS(UsnJrnlStamped)
-DEFINE_NVOL_BIT_OPS(SparseEnabled)
-
-#endif /* _LINUX_NTFS_VOLUME_H */
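Each DEFINE_NVOL_BIT_OPS() invocation in the deleted header above expanded to three trivial inline helpers around test_bit()/set_bit()/clear_bit(); for example, DEFINE_NVOL_BIT_OPS(Errors) produced:

	static inline int NVolErrors(ntfs_volume *vol)
	{
		return test_bit(NV_Errors, &(vol)->flags);
	}
	static inline void NVolSetErrors(ntfs_volume *vol)
	{
		set_bit(NV_Errors, &(vol)->flags);
	}
	static inline void NVolClearErrors(ntfs_volume *vol)
	{
		clear_bit(NV_Errors, &(vol)->flags);
	}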
index 63f70259edc0d44d6a52c946c9a90663fe16722a..7aadf5010999455e4d16e20581e2334034451e57 100644 (file)
@@ -886,7 +886,7 @@ int attr_data_get_block(struct ntfs_inode *ni, CLST vcn, CLST clen, CLST *lcn,
        struct runs_tree *run = &ni->file.run;
        struct ntfs_sb_info *sbi;
        u8 cluster_bits;
-       struct ATTRIB *attr = NULL, *attr_b;
+       struct ATTRIB *attr, *attr_b;
        struct ATTR_LIST_ENTRY *le, *le_b;
        struct mft_inode *mi, *mi_b;
        CLST hint, svcn, to_alloc, evcn1, next_svcn, asize, end, vcn0, alen;
@@ -904,12 +904,8 @@ int attr_data_get_block(struct ntfs_inode *ni, CLST vcn, CLST clen, CLST *lcn,
                *len = 0;
        up_read(&ni->file.run_lock);
 
-       if (*len) {
-               if (*lcn != SPARSE_LCN || !new)
-                       return 0; /* Fast normal way without allocation. */
-               else if (clen > *len)
-                       clen = *len;
-       }
+       if (*len && (*lcn != SPARSE_LCN || !new))
+               return 0; /* Fast normal way without allocation. */
 
        /* No cluster in cache or we need to allocate cluster in hole. */
        sbi = ni->mi.sbi;
@@ -918,6 +914,17 @@ int attr_data_get_block(struct ntfs_inode *ni, CLST vcn, CLST clen, CLST *lcn,
        ni_lock(ni);
        down_write(&ni->file.run_lock);
 
+       /* Repeat the lookup above, this time under the write lock. */
+       if (!run_lookup_entry(run, vcn, lcn, len, NULL))
+               *len = 0;
+
+       if (*len) {
+               if (*lcn != SPARSE_LCN || !new)
+                       goto out; /* Normal way without allocation. */
+               if (clen > *len)
+                       clen = *len;
+       }
+
        le_b = NULL;
        attr_b = ni_find_attr(ni, NULL, &le_b, ATTR_DATA, NULL, 0, NULL, &mi_b);
        if (!attr_b) {
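The added hunk repeats the run_lookup_entry() probe because the answer obtained under the read lock can go stale in the window before the write lock is taken. A minimal sketch of this check/recheck idiom, using the names from the surrounding code:

	bool ok;

	down_read(&ni->file.run_lock);
	ok = run_lookup_entry(run, vcn, lcn, len, NULL);   /* optimistic probe */
	up_read(&ni->file.run_lock);

	if (ok && (*lcn != SPARSE_LCN || !new))
		return 0;                  /* fast path: no allocation needed */

	ni_lock(ni);
	down_write(&ni->file.run_lock);
	/* Another task may have mapped the hole while no lock was held,
	 * so the lookup must be repeated before allocating clusters. */
	ok = run_lookup_entry(run, vcn, lcn, len, NULL);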
@@ -1736,8 +1743,10 @@ repack:
                        le_b = NULL;
                        attr_b = ni_find_attr(ni, NULL, &le_b, ATTR_DATA, NULL,
                                              0, NULL, &mi_b);
-                       if (!attr_b)
-                               return -ENOENT;
+                       if (!attr_b) {
+                               err = -ENOENT;
+                               goto out;
+                       }
 
                        attr = attr_b;
                        le = le_b;
@@ -1818,13 +1827,15 @@ ins_ext:
 ok:
        run_truncate_around(run, vcn);
 out:
-       if (new_valid > data_size)
-               new_valid = data_size;
+       if (attr_b) {
+               if (new_valid > data_size)
+                       new_valid = data_size;
 
-       valid_size = le64_to_cpu(attr_b->nres.valid_size);
-       if (new_valid != valid_size) {
-               attr_b->nres.valid_size = cpu_to_le64(valid_size);
-               mi_b->dirty = true;
+               valid_size = le64_to_cpu(attr_b->nres.valid_size);
+               if (new_valid != valid_size) {
+                       attr_b->nres.valid_size = cpu_to_le64(valid_size);
+                       mi_b->dirty = true;
+               }
        }
 
        return err;
@@ -2073,7 +2084,7 @@ next_attr:
 
        /* Update inode size. */
        ni->i_valid = valid_size;
-       ni->vfs_inode.i_size = data_size;
+       i_size_write(&ni->vfs_inode, data_size);
        inode_set_bytes(&ni->vfs_inode, total_size);
        ni->ni_flags |= NI_FLAG_UPDATE_PARENT;
        mark_inode_dirty(&ni->vfs_inode);
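The inode->i_size accesses converted here (and in many later hunks) switch to the i_size_read()/i_size_write() accessors. loff_t is 64 bits, which a 32-bit SMP kernel cannot load or store atomically, so the accessors wrap the value in a seqcount; writers must additionally be serialized among themselves. A sketch of the intended pairing (new_size assumed):

	loff_t size = i_size_read(inode);   /* lockless, always consistent */

	/* Writer side; assumed to be serialized, e.g. by the inode lock. */
	i_size_write(inode, new_size);
	mark_inode_dirty(inode);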
@@ -2488,7 +2499,7 @@ int attr_insert_range(struct ntfs_inode *ni, u64 vbo, u64 bytes)
        mi_b->dirty = true;
 
 done:
-       ni->vfs_inode.i_size += bytes;
+       i_size_write(&ni->vfs_inode, ni->vfs_inode.i_size + bytes);
        ni->ni_flags |= NI_FLAG_UPDATE_PARENT;
        mark_inode_dirty(&ni->vfs_inode);
 
index 7c01735d1219d858b46809147fa06dbcc6cafe4c..9f4bd8d260901ca4fd4db97aea3687459e4ebc1a 100644 (file)
@@ -29,7 +29,7 @@ static inline bool al_is_valid_le(const struct ntfs_inode *ni,
 void al_destroy(struct ntfs_inode *ni)
 {
        run_close(&ni->attr_list.run);
-       kfree(ni->attr_list.le);
+       kvfree(ni->attr_list.le);
        ni->attr_list.le = NULL;
        ni->attr_list.size = 0;
        ni->attr_list.dirty = false;
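The kfree() -> kvfree() switches in this file pair the frees with allocations that may have come from kvmalloc(), which falls back to vmalloc() when a large physically contiguous buffer is unavailable; kvfree() releases either kind. A sketch of the matching allocation side (size and flags assumed):

	void *ptr = kvmalloc(new_size, GFP_NOFS);  /* kmalloc- or vmalloc-backed */
	if (!ptr)
		return -ENOMEM;
	/* ... build the new attribute list ... */
	kvfree(ptr);                               /* handles both backings */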
@@ -127,12 +127,13 @@ struct ATTR_LIST_ENTRY *al_enumerate(struct ntfs_inode *ni,
 {
        size_t off;
        u16 sz;
+       const unsigned le_min_size = le_size(0);
 
        if (!le) {
                le = ni->attr_list.le;
        } else {
                sz = le16_to_cpu(le->size);
-               if (sz < sizeof(struct ATTR_LIST_ENTRY)) {
+               if (sz < le_min_size) {
                        /* Impossible 'cause we should not return such le. */
                        return NULL;
                }
@@ -141,7 +142,7 @@ struct ATTR_LIST_ENTRY *al_enumerate(struct ntfs_inode *ni,
 
        /* Check boundary. */
        off = PtrOffset(ni->attr_list.le, le);
-       if (off + sizeof(struct ATTR_LIST_ENTRY) > ni->attr_list.size) {
+       if (off + le_min_size > ni->attr_list.size) {
                /* The regular end of list. */
                return NULL;
        }
@@ -149,8 +150,7 @@ struct ATTR_LIST_ENTRY *al_enumerate(struct ntfs_inode *ni,
        sz = le16_to_cpu(le->size);
 
        /* Check le for errors. */
-       if (sz < sizeof(struct ATTR_LIST_ENTRY) ||
-           off + sz > ni->attr_list.size ||
+       if (sz < le_min_size || off + sz > ni->attr_list.size ||
            sz < le->name_off + le->name_len * sizeof(short)) {
                return NULL;
        }
@@ -318,7 +318,7 @@ int al_add_le(struct ntfs_inode *ni, enum ATTR_TYPE type, const __le16 *name,
                memcpy(ptr, al->le, off);
                memcpy(Add2Ptr(ptr, off + sz), le, old_size - off);
                le = Add2Ptr(ptr, off);
-               kfree(al->le);
+               kvfree(al->le);
                al->le = ptr;
        } else {
                memmove(Add2Ptr(le, sz), le, old_size - off);
index 63f14a0232f6a0e0672c5373748bf77b72931bf2..845f9b22deef0f42cabfb4156d4d8fe05c31fce9 100644 (file)
@@ -124,7 +124,7 @@ void wnd_close(struct wnd_bitmap *wnd)
 {
        struct rb_node *node, *next;
 
-       kfree(wnd->free_bits);
+       kvfree(wnd->free_bits);
        wnd->free_bits = NULL;
        run_close(&wnd->run);
 
@@ -1360,7 +1360,7 @@ int wnd_extend(struct wnd_bitmap *wnd, size_t new_bits)
                memcpy(new_free, wnd->free_bits, wnd->nwnd * sizeof(short));
                memset(new_free + wnd->nwnd, 0,
                       (new_wnd - wnd->nwnd) * sizeof(short));
-               kfree(wnd->free_bits);
+               kvfree(wnd->free_bits);
                wnd->free_bits = new_free;
        }
 
index ec0566b322d5d0b4b36533a3a218671bb0ff7b02..5cf3d9decf646b1935517e8b564d807626e60e0f 100644 (file)
@@ -309,11 +309,31 @@ static inline int ntfs_filldir(struct ntfs_sb_info *sbi, struct ntfs_inode *ni,
                return 0;
        }
 
-       /* NTFS: symlinks are "dir + reparse" or "file + reparse" */
-       if (fname->dup.fa & FILE_ATTRIBUTE_REPARSE_POINT)
-               dt_type = DT_LNK;
-       else
-               dt_type = (fname->dup.fa & FILE_ATTRIBUTE_DIRECTORY) ? DT_DIR : DT_REG;
+       /*
+        * NTFS: symlinks are "dir + reparse" or "file + reparse".
+        * Unfortunately the reparse attribute is used for many purposes
+        * (several dozen), so it is not possible to tell here whether this
+        * name is a symlink. To get the exact type of a name we would have
+        * to open the inode (read the MFT). getattr on an opened file
+        * (fstat) correctly reports a symlink.
+        */
+       dt_type = (fname->dup.fa & FILE_ATTRIBUTE_DIRECTORY) ? DT_DIR : DT_REG;
+
+       /*
+        * Detecting the type of a name from the duplicated information
+        * stored in the parent directory is not reliable. The only correct
+        * way to get the type of a name is to read the MFT record and find
+        * ATTR_STD. The code below is not a good idea: it takes additional
+        * locks and does extra reads just to determine the type of a name.
+        * Should a mount option be added to enable the branch below?
+        */
+       if ((fname->dup.fa & FILE_ATTRIBUTE_REPARSE_POINT) &&
+           ino != ni->mi.rno) {
+               struct inode *inode = ntfs_iget5(sbi->sb, &e->ref, NULL);
+               if (!IS_ERR_OR_NULL(inode)) {
+                       dt_type = fs_umode_to_dtype(inode->i_mode);
+                       iput(inode);
+               }
+       }
 
        return !dir_emit(ctx, (s8 *)name, name_len, ino, dt_type);
 }
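fs_umode_to_dtype() maps an inode mode to the DT_* constants that readdir reports. The DT_* values are defined as the S_IF* format bits shifted down by 12, so the conversion is effectively the following (a sketch of the mapping, not the kernel's literal source):

	static unsigned char umode_to_dtype(umode_t mode)
	{
		/* S_IFDIR >> 12 == DT_DIR (4), S_IFREG >> 12 == DT_REG (8),
		 * S_IFLNK >> 12 == DT_LNK (10), and so on. */
		return (mode & S_IFMT) >> 12;
	}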
@@ -495,11 +515,9 @@ static int ntfs_dir_count(struct inode *dir, bool *is_empty, size_t *dirs,
        struct INDEX_HDR *hdr;
        const struct ATTR_FILE_NAME *fname;
        u32 e_size, off, end;
-       u64 vbo = 0;
        size_t drs = 0, fles = 0, bit = 0;
-       loff_t i_size = ni->vfs_inode.i_size;
        struct indx_node *node = NULL;
-       u8 index_bits = ni->dir.index_bits;
+       size_t max_indx = i_size_read(&ni->vfs_inode) >> ni->dir.index_bits;
 
        if (is_empty)
                *is_empty = true;
@@ -518,8 +536,10 @@ static int ntfs_dir_count(struct inode *dir, bool *is_empty, size_t *dirs,
                        e = Add2Ptr(hdr, off);
                        e_size = le16_to_cpu(e->size);
                        if (e_size < sizeof(struct NTFS_DE) ||
-                           off + e_size > end)
+                           off + e_size > end) {
+                               /* Looks like corruption. */
                                break;
+                       }
 
                        if (de_is_last(e))
                                break;
@@ -543,7 +563,7 @@ static int ntfs_dir_count(struct inode *dir, bool *is_empty, size_t *dirs,
                                fles += 1;
                }
 
-               if (vbo >= i_size)
+               if (bit >= max_indx)
                        goto out;
 
                err = indx_used_bit(&ni->dir, ni, &bit);
@@ -553,8 +573,7 @@ static int ntfs_dir_count(struct inode *dir, bool *is_empty, size_t *dirs,
                if (bit == MINUS_ONE_T)
                        goto out;
 
-               vbo = (u64)bit << index_bits;
-               if (vbo >= i_size)
+               if (bit >= max_indx)
                        goto out;
 
                err = indx_read(&ni->dir, ni, bit << ni->dir.idx2vbn_bits,
@@ -564,7 +583,6 @@ static int ntfs_dir_count(struct inode *dir, bool *is_empty, size_t *dirs,
 
                hdr = &node->index->ihdr;
                bit += 1;
-               vbo = (u64)bit << ni->dir.idx2vbn_bits;
        }
 
 out:
@@ -593,5 +611,9 @@ const struct file_operations ntfs_dir_operations = {
        .iterate_shared = ntfs_readdir,
        .fsync          = generic_file_fsync,
        .open           = ntfs_file_open,
+       .unlocked_ioctl = ntfs_ioctl,
+#ifdef CONFIG_COMPAT
+       .compat_ioctl   = ntfs_compat_ioctl,
+#endif
 };
 // clang-format on
index a5a30a24ce5dfa70d670826d1b5ac16a668d06be..5418662c80d8878afe72a8b8e8ffc43cc834b176 100644 (file)
@@ -48,7 +48,7 @@ static int ntfs_ioctl_fitrim(struct ntfs_sb_info *sbi, unsigned long arg)
        return 0;
 }
 
-static long ntfs_ioctl(struct file *filp, u32 cmd, unsigned long arg)
+long ntfs_ioctl(struct file *filp, u32 cmd, unsigned long arg)
 {
        struct inode *inode = file_inode(filp);
        struct ntfs_sb_info *sbi = inode->i_sb->s_fs_info;
@@ -61,7 +61,7 @@ static long ntfs_ioctl(struct file *filp, u32 cmd, unsigned long arg)
 }
 
 #ifdef CONFIG_COMPAT
-static long ntfs_compat_ioctl(struct file *filp, u32 cmd, unsigned long arg)
+long ntfs_compat_ioctl(struct file *filp, u32 cmd, unsigned long arg)
 
 {
        return ntfs_ioctl(filp, cmd, (unsigned long)compat_ptr(arg));
@@ -188,6 +188,7 @@ static int ntfs_zero_range(struct inode *inode, u64 vbo, u64 vbo_to)
        u32 bh_next, bh_off, to;
        sector_t iblock;
        struct folio *folio;
+       bool dirty = false;
 
        for (; idx < idx_end; idx += 1, from = 0) {
                page_off = (loff_t)idx << PAGE_SHIFT;
@@ -223,29 +224,27 @@ static int ntfs_zero_range(struct inode *inode, u64 vbo, u64 vbo_to)
                        /* Ok, it's mapped. Make sure it's up-to-date. */
                        if (folio_test_uptodate(folio))
                                set_buffer_uptodate(bh);
-
-                       if (!buffer_uptodate(bh)) {
-                               err = bh_read(bh, 0);
-                               if (err < 0) {
-                                       folio_unlock(folio);
-                                       folio_put(folio);
-                                       goto out;
-                               }
+                       else if (bh_read(bh, 0) < 0) {
+                               err = -EIO;
+                               folio_unlock(folio);
+                               folio_put(folio);
+                               goto out;
                        }
 
                        mark_buffer_dirty(bh);
-
                } while (bh_off = bh_next, iblock += 1,
                         head != (bh = bh->b_this_page));
 
                folio_zero_segment(folio, from, to);
+               dirty = true;
 
                folio_unlock(folio);
                folio_put(folio);
                cond_resched();
        }
 out:
-       mark_inode_dirty(inode);
+       if (dirty)
+               mark_inode_dirty(inode);
        return err;
 }
 
@@ -261,6 +260,9 @@ static int ntfs_file_mmap(struct file *file, struct vm_area_struct *vma)
        bool rw = vma->vm_flags & VM_WRITE;
        int err;
 
+       if (unlikely(ntfs3_forced_shutdown(inode->i_sb)))
+               return -EIO;
+
        if (is_encrypted(ni)) {
                ntfs_inode_warn(inode, "mmap encrypted not supported");
                return -EOPNOTSUPP;
@@ -499,10 +501,14 @@ static long ntfs_fallocate(struct file *file, int mode, loff_t vbo, loff_t len)
                ni_lock(ni);
                err = attr_punch_hole(ni, vbo, len, &frame_size);
                ni_unlock(ni);
+               if (!err)
+                       goto ok;
+
                if (err != E_NTFS_NOTALIGNED)
                        goto out;
 
                /* Process not aligned punch. */
+               err = 0;
                mask = frame_size - 1;
                vbo_a = (vbo + mask) & ~mask;
                end_a = end & ~mask;
@@ -525,6 +531,8 @@ static long ntfs_fallocate(struct file *file, int mode, loff_t vbo, loff_t len)
                        ni_lock(ni);
                        err = attr_punch_hole(ni, vbo_a, end_a - vbo_a, NULL);
                        ni_unlock(ni);
+                       if (err)
+                               goto out;
                }
        } else if (mode & FALLOC_FL_COLLAPSE_RANGE) {
                /*
@@ -564,6 +572,8 @@ static long ntfs_fallocate(struct file *file, int mode, loff_t vbo, loff_t len)
                ni_lock(ni);
                err = attr_insert_range(ni, vbo, len);
                ni_unlock(ni);
+               if (err)
+                       goto out;
        } else {
                /* Check new size. */
                u8 cluster_bits = sbi->cluster_bits;
@@ -633,11 +643,18 @@ static long ntfs_fallocate(struct file *file, int mode, loff_t vbo, loff_t len)
                                            &ni->file.run, i_size, &ni->i_valid,
                                            true, NULL);
                        ni_unlock(ni);
+                       if (err)
+                               goto out;
                } else if (new_size > i_size) {
-                       inode->i_size = new_size;
+                       i_size_write(inode, new_size);
                }
        }
 
+ok:
+       err = file_modified(file);
+       if (err)
+               goto out;
+
 out:
        if (map_locked)
                filemap_invalidate_unlock(mapping);
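file_modified() is the stock VFS helper for paths that change file data: it updates the timestamps and strips the set[ug]id and capability bits, and it can fail, which is why both new call sites check its result. A sketch of its conventional placement in a write path (names assumed):

	ssize_t ret = generic_write_checks(iocb, from);
	if (ret <= 0)
		return ret;

	ret = file_modified(iocb->ki_filp);  /* timestamps + suid/sgid drop */
	if (ret)
		return ret;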
@@ -663,6 +680,9 @@ int ntfs3_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
        umode_t mode = inode->i_mode;
        int err;
 
+       if (unlikely(ntfs3_forced_shutdown(inode->i_sb)))
+               return -EIO;
+
        err = setattr_prepare(idmap, dentry, attr);
        if (err)
                goto out;
@@ -676,7 +696,7 @@ int ntfs3_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
                        goto out;
                }
                inode_dio_wait(inode);
-               oldsize = inode->i_size;
+               oldsize = i_size_read(inode);
                newsize = attr->ia_size;
 
                if (newsize <= oldsize)
@@ -688,7 +708,7 @@ int ntfs3_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
                        goto out;
 
                ni->ni_flags |= NI_FLAG_UPDATE_PARENT;
-               inode->i_size = newsize;
+               i_size_write(inode, newsize);
        }
 
        setattr_copy(idmap, inode, attr);
@@ -718,6 +738,9 @@ static ssize_t ntfs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
        struct inode *inode = file->f_mapping->host;
        struct ntfs_inode *ni = ntfs_i(inode);
 
+       if (unlikely(ntfs3_forced_shutdown(inode->i_sb)))
+               return -EIO;
+
        if (is_encrypted(ni)) {
                ntfs_inode_warn(inode, "encrypted i/o not supported");
                return -EOPNOTSUPP;
@@ -752,6 +775,9 @@ static ssize_t ntfs_file_splice_read(struct file *in, loff_t *ppos,
        struct inode *inode = in->f_mapping->host;
        struct ntfs_inode *ni = ntfs_i(inode);
 
+       if (unlikely(ntfs3_forced_shutdown(inode->i_sb)))
+               return -EIO;
+
        if (is_encrypted(ni)) {
                ntfs_inode_warn(inode, "encrypted i/o not supported");
                return -EOPNOTSUPP;
@@ -821,7 +847,7 @@ static ssize_t ntfs_compress_write(struct kiocb *iocb, struct iov_iter *from)
        size_t count = iov_iter_count(from);
        loff_t pos = iocb->ki_pos;
        struct inode *inode = file_inode(file);
-       loff_t i_size = inode->i_size;
+       loff_t i_size = i_size_read(inode);
        struct address_space *mapping = inode->i_mapping;
        struct ntfs_inode *ni = ntfs_i(inode);
        u64 valid = ni->i_valid;
@@ -1028,6 +1054,8 @@ out:
        iocb->ki_pos += written;
        if (iocb->ki_pos > ni->i_valid)
                ni->i_valid = iocb->ki_pos;
+       if (iocb->ki_pos > i_size)
+               i_size_write(inode, iocb->ki_pos);
 
        return written;
 }
@@ -1041,8 +1069,12 @@ static ssize_t ntfs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
        struct address_space *mapping = file->f_mapping;
        struct inode *inode = mapping->host;
        ssize_t ret;
+       int err;
        struct ntfs_inode *ni = ntfs_i(inode);
 
+       if (unlikely(ntfs3_forced_shutdown(inode->i_sb)))
+               return -EIO;
+
        if (is_encrypted(ni)) {
                ntfs_inode_warn(inode, "encrypted i/o not supported");
                return -EOPNOTSUPP;
@@ -1068,6 +1100,12 @@ static ssize_t ntfs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
        if (ret <= 0)
                goto out;
 
+       err = file_modified(iocb->ki_filp);
+       if (err) {
+               ret = err;
+               goto out;
+       }
+
        if (WARN_ON(ni->ni_flags & NI_FLAG_COMPRESSED_MASK)) {
                /* Should never be here, see ntfs_file_open(). */
                ret = -EOPNOTSUPP;
@@ -1097,6 +1135,9 @@ int ntfs_file_open(struct inode *inode, struct file *file)
 {
        struct ntfs_inode *ni = ntfs_i(inode);
 
+       if (unlikely(ntfs3_forced_shutdown(inode->i_sb)))
+               return -EIO;
+
        if (unlikely((is_compressed(ni) || is_encrypted(ni)) &&
                     (file->f_flags & O_DIRECT))) {
                return -EOPNOTSUPP;
@@ -1138,7 +1179,8 @@ static int ntfs_file_release(struct inode *inode, struct file *file)
                down_write(&ni->file.run_lock);
 
                err = attr_set_size(ni, ATTR_DATA, NULL, 0, &ni->file.run,
-                                   inode->i_size, &ni->i_valid, false, NULL);
+                                   i_size_read(inode), &ni->i_valid, false,
+                                   NULL);
 
                up_write(&ni->file.run_lock);
                ni_unlock(ni);
index 3df2d9e34b9144f4b039ccc6197c9aa249b7ac64..7f27382e0ce25bcb2c660fa799cf8295a9cf486b 100644 (file)
@@ -778,7 +778,7 @@ static int ni_try_remove_attr_list(struct ntfs_inode *ni)
        run_deallocate(sbi, &ni->attr_list.run, true);
        run_close(&ni->attr_list.run);
        ni->attr_list.size = 0;
-       kfree(ni->attr_list.le);
+       kvfree(ni->attr_list.le);
        ni->attr_list.le = NULL;
        ni->attr_list.dirty = false;
 
@@ -927,7 +927,7 @@ int ni_create_attr_list(struct ntfs_inode *ni)
        return 0;
 
 out:
-       kfree(ni->attr_list.le);
+       kvfree(ni->attr_list.le);
        ni->attr_list.le = NULL;
        ni->attr_list.size = 0;
        return err;
@@ -2099,7 +2099,7 @@ int ni_readpage_cmpr(struct ntfs_inode *ni, struct page *page)
        gfp_t gfp_mask;
        struct page *pg;
 
-       if (vbo >= ni->vfs_inode.i_size) {
+       if (vbo >= i_size_read(&ni->vfs_inode)) {
                SetPageUptodate(page);
                err = 0;
                goto out;
@@ -2173,7 +2173,7 @@ int ni_decompress_file(struct ntfs_inode *ni)
 {
        struct ntfs_sb_info *sbi = ni->mi.sbi;
        struct inode *inode = &ni->vfs_inode;
-       loff_t i_size = inode->i_size;
+       loff_t i_size = i_size_read(inode);
        struct address_space *mapping = inode->i_mapping;
        gfp_t gfp_mask = mapping_gfp_mask(mapping);
        struct page **pages = NULL;
@@ -2508,6 +2508,7 @@ int ni_read_frame(struct ntfs_inode *ni, u64 frame_vbo, struct page **pages,
                err = -EOPNOTSUPP;
                goto out1;
 #else
+               loff_t i_size = i_size_read(&ni->vfs_inode);
                u32 frame_bits = ni_ext_compress_bits(ni);
                u64 frame64 = frame_vbo >> frame_bits;
                u64 frames, vbo_data;
@@ -2548,7 +2549,7 @@ int ni_read_frame(struct ntfs_inode *ni, u64 frame_vbo, struct page **pages,
                        }
                }
 
-               frames = (ni->vfs_inode.i_size - 1) >> frame_bits;
+               frames = (i_size - 1) >> frame_bits;
 
                err = attr_wof_frame_info(ni, attr, run, frame64, frames,
                                          frame_bits, &ondisk_size, &vbo_data);
@@ -2556,8 +2557,7 @@ int ni_read_frame(struct ntfs_inode *ni, u64 frame_vbo, struct page **pages,
                        goto out2;
 
                if (frame64 == frames) {
-                       unc_size = 1 + ((ni->vfs_inode.i_size - 1) &
-                                       (frame_size - 1));
+                       unc_size = 1 + ((i_size - 1) & (frame_size - 1));
                        ondisk_size = attr_size(attr) - vbo_data;
                } else {
                        unc_size = frame_size;
@@ -3259,6 +3259,9 @@ int ni_write_inode(struct inode *inode, int sync, const char *hint)
        if (is_bad_inode(inode) || sb_rdonly(sb))
                return 0;
 
+       if (unlikely(ntfs3_forced_shutdown(sb)))
+               return -EIO;
+
        if (!ni_trylock(ni)) {
                /* 'ni' is under modification, skip for now. */
                mark_inode_dirty_sync(inode);
@@ -3288,7 +3291,7 @@ int ni_write_inode(struct inode *inode, int sync, const char *hint)
                        modified = true;
                }
 
-               ts = inode_get_mtime(inode);
+               ts = inode_get_ctime(inode);
                dup.c_time = kernel2nt(&ts);
                if (std->c_time != dup.c_time) {
                        std->c_time = dup.c_time;
index 98ccb66508583138ed7f5c273b2f4450be88159b..855519713bf79074ed336ca7094cca5d5cdbc009 100644 (file)
@@ -465,7 +465,7 @@ static inline bool is_rst_area_valid(const struct RESTART_HDR *rhdr)
 {
        const struct RESTART_AREA *ra;
        u16 cl, fl, ul;
-       u32 off, l_size, file_dat_bits, file_size_round;
+       u32 off, l_size, seq_bits;
        u16 ro = le16_to_cpu(rhdr->ra_off);
        u32 sys_page = le32_to_cpu(rhdr->sys_page_size);
 
@@ -511,13 +511,15 @@ static inline bool is_rst_area_valid(const struct RESTART_HDR *rhdr)
        /* Make sure the sequence number bits match the log file size. */
        l_size = le64_to_cpu(ra->l_size);
 
-       file_dat_bits = sizeof(u64) * 8 - le32_to_cpu(ra->seq_num_bits);
-       file_size_round = 1u << (file_dat_bits + 3);
-       if (file_size_round != l_size &&
-           (file_size_round < l_size || (file_size_round / 2) > l_size)) {
-               return false;
+       seq_bits = sizeof(u64) * 8 + 3;
+       while (l_size) {
+               l_size >>= 1;
+               seq_bits -= 1;
        }
 
+       if (seq_bits != ra->seq_num_bits)
+               return false;
+
        /* The log page data offset and record header length must be quad-aligned. */
        if (!IS_ALIGNED(le16_to_cpu(ra->data_off), 8) ||
            !IS_ALIGNED(le16_to_cpu(ra->rec_hdr_len), 8))
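The replacement loop is an open-coded bit-length computation: if k is the index of the highest set bit of l_size (2^k <= l_size < 2^(k+1)), the loop body runs k+1 times, leaving seq_bits = 64 + 3 - (k + 1) = 66 - k, which must then match the on-disk ra->seq_num_bits. A worked example (log size assumed):

	/* l_size == 4 MiB == 1 << 22, so k == 22: the loop shifts 23 times
	 * and seq_bits ends at 67 - 23 == 44; the restart area is rejected
	 * unless ra->seq_num_bits == 44. */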
@@ -974,6 +976,16 @@ skip_looking:
        return e;
 }
 
+struct restart_info {
+       u64 last_lsn;
+       struct RESTART_HDR *r_page;
+       u32 vbo;
+       bool chkdsk_was_run;
+       bool valid_page;
+       bool initialized;
+       bool restart;
+};
+
 #define RESTART_SINGLE_PAGE_IO cpu_to_le16(0x0001)
 
 #define NTFSLOG_WRAPPED 0x00000001
@@ -987,6 +999,7 @@ struct ntfs_log {
        struct ntfs_inode *ni;
 
        u32 l_size;
+       u32 orig_file_size;
        u32 sys_page_size;
        u32 sys_page_mask;
        u32 page_size;
@@ -1040,6 +1053,8 @@ struct ntfs_log {
 
        struct CLIENT_ID client_id;
        u32 client_undo_commit;
+
+       struct restart_info rst_info, rst_info2;
 };
 
 static inline u32 lsn_to_vbo(struct ntfs_log *log, const u64 lsn)
@@ -1105,16 +1120,6 @@ static inline bool verify_client_lsn(struct ntfs_log *log,
               lsn <= le64_to_cpu(log->ra->current_lsn) && lsn;
 }
 
-struct restart_info {
-       u64 last_lsn;
-       struct RESTART_HDR *r_page;
-       u32 vbo;
-       bool chkdsk_was_run;
-       bool valid_page;
-       bool initialized;
-       bool restart;
-};
-
 static int read_log_page(struct ntfs_log *log, u32 vbo,
                         struct RECORD_PAGE_HDR **buffer, bool *usa_error)
 {
@@ -1176,7 +1181,7 @@ out:
  * restart page header. It will stop the first time we find a
  * valid page header.
  */
-static int log_read_rst(struct ntfs_log *log, u32 l_size, bool first,
+static int log_read_rst(struct ntfs_log *log, bool first,
                        struct restart_info *info)
 {
        u32 skip, vbo;
@@ -1192,7 +1197,7 @@ static int log_read_rst(struct ntfs_log *log, u32 l_size, bool first,
        }
 
        /* Loop continuously until we succeed. */
-       for (; vbo < l_size; vbo = 2 * vbo + skip, skip = 0) {
+       for (; vbo < log->l_size; vbo = 2 * vbo + skip, skip = 0) {
                bool usa_error;
                bool brst, bchk;
                struct RESTART_AREA *ra;
@@ -1285,22 +1290,17 @@ check_result:
 /*
 * log_init_pg_hdr - Init @log from restart page header.
  */
-static void log_init_pg_hdr(struct ntfs_log *log, u32 sys_page_size,
-                           u32 page_size, u16 major_ver, u16 minor_ver)
+static void log_init_pg_hdr(struct ntfs_log *log, u16 major_ver, u16 minor_ver)
 {
-       log->sys_page_size = sys_page_size;
-       log->sys_page_mask = sys_page_size - 1;
-       log->page_size = page_size;
-       log->page_mask = page_size - 1;
-       log->page_bits = blksize_bits(page_size);
+       log->sys_page_size = log->page_size;
+       log->sys_page_mask = log->page_mask;
 
        log->clst_per_page = log->page_size >> log->ni->mi.sbi->cluster_bits;
        if (!log->clst_per_page)
                log->clst_per_page = 1;
 
-       log->first_page = major_ver >= 2 ?
-                                 0x22 * page_size :
-                                 ((sys_page_size << 1) + (page_size << 1));
+       log->first_page = major_ver >= 2 ? 0x22 * log->page_size :
+                                          4 * log->page_size;
        log->major_ver = major_ver;
        log->minor_ver = minor_ver;
 }
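The first_page change for v1 logs is a pure simplification: log_init_pg_hdr() now always sets sys_page_size equal to page_size, so the removed expression collapses to a constant multiple of the page size:

	/* With sys_page_size == page_size == p:
	 *   (sys_page_size << 1) + (page_size << 1) == 2*p + 2*p == 4*p,
	 * i.e. exactly the new "4 * log->page_size" branch. */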
@@ -1308,12 +1308,11 @@ static void log_init_pg_hdr(struct ntfs_log *log, u32 sys_page_size,
 /*
  * log_create - Init @log in cases when we don't have a restart area to use.
  */
-static void log_create(struct ntfs_log *log, u32 l_size, const u64 last_lsn,
+static void log_create(struct ntfs_log *log, const u64 last_lsn,
                       u32 open_log_count, bool wrapped, bool use_multi_page)
 {
-       log->l_size = l_size;
        /* All file offsets must be quadword aligned. */
-       log->file_data_bits = blksize_bits(l_size) - 3;
+       log->file_data_bits = blksize_bits(log->l_size) - 3;
        log->seq_num_mask = (8 << log->file_data_bits) - 1;
        log->seq_num_bits = sizeof(u64) * 8 - log->file_data_bits;
        log->seq_num = (last_lsn >> log->file_data_bits) + 2;
@@ -3720,10 +3719,8 @@ int log_replay(struct ntfs_inode *ni, bool *initialized)
        struct ntfs_sb_info *sbi = ni->mi.sbi;
        struct ntfs_log *log;
 
-       struct restart_info rst_info, rst_info2;
-       u64 rec_lsn, ra_lsn, checkpt_lsn = 0, rlsn = 0;
+       u64 rec_lsn, checkpt_lsn = 0, rlsn = 0;
        struct ATTR_NAME_ENTRY *attr_names = NULL;
-       struct ATTR_NAME_ENTRY *ane;
        struct RESTART_TABLE *dptbl = NULL;
        struct RESTART_TABLE *trtbl = NULL;
        const struct RESTART_TABLE *rt;
@@ -3741,9 +3738,7 @@ int log_replay(struct ntfs_inode *ni, bool *initialized)
        struct TRANSACTION_ENTRY *tr;
        struct DIR_PAGE_ENTRY *dp;
        u32 i, bytes_per_attr_entry;
-       u32 l_size = ni->vfs_inode.i_size;
-       u32 orig_file_size = l_size;
-       u32 page_size, vbo, tail, off, dlen;
+       u32 vbo, tail, off, dlen;
        u32 saved_len, rec_len, transact_id;
        bool use_second_page;
        struct RESTART_AREA *ra2, *ra = NULL;
@@ -3758,52 +3753,50 @@ int log_replay(struct ntfs_inode *ni, bool *initialized)
        u16 t16;
        u32 t32;
 
-       /* Get the size of page. NOTE: To replay we can use default page. */
-#if PAGE_SIZE >= DefaultLogPageSize && PAGE_SIZE <= DefaultLogPageSize * 2
-       page_size = norm_file_page(PAGE_SIZE, &l_size, true);
-#else
-       page_size = norm_file_page(PAGE_SIZE, &l_size, false);
-#endif
-       if (!page_size)
-               return -EINVAL;
-
        log = kzalloc(sizeof(struct ntfs_log), GFP_NOFS);
        if (!log)
                return -ENOMEM;
 
        log->ni = ni;
-       log->l_size = l_size;
-       log->one_page_buf = kmalloc(page_size, GFP_NOFS);
+       log->l_size = log->orig_file_size = ni->vfs_inode.i_size;
 
+       /* Get the size of page. NOTE: To replay we can use default page. */
+#if PAGE_SIZE >= DefaultLogPageSize && PAGE_SIZE <= DefaultLogPageSize * 2
+       log->page_size = norm_file_page(PAGE_SIZE, &log->l_size, true);
+#else
+       log->page_size = norm_file_page(PAGE_SIZE, &log->l_size, false);
+#endif
+       if (!log->page_size) {
+               err = -EINVAL;
+               goto out;
+       }
+
+       log->one_page_buf = kmalloc(log->page_size, GFP_NOFS);
        if (!log->one_page_buf) {
                err = -ENOMEM;
                goto out;
        }
 
-       log->page_size = page_size;
-       log->page_mask = page_size - 1;
-       log->page_bits = blksize_bits(page_size);
+       log->page_mask = log->page_size - 1;
+       log->page_bits = blksize_bits(log->page_size);
 
        /* Look for a restart area on the disk. */
-       memset(&rst_info, 0, sizeof(struct restart_info));
-       err = log_read_rst(log, l_size, true, &rst_info);
+       err = log_read_rst(log, true, &log->rst_info);
        if (err)
                goto out;
 
        /* remember 'initialized' */
-       *initialized = rst_info.initialized;
+       *initialized = log->rst_info.initialized;
 
-       if (!rst_info.restart) {
-               if (rst_info.initialized) {
+       if (!log->rst_info.restart) {
+               if (log->rst_info.initialized) {
                        /* No restart area but the file is not initialized. */
                        err = -EINVAL;
                        goto out;
                }
 
-               log_init_pg_hdr(log, page_size, page_size, 1, 1);
-               log_create(log, l_size, 0, get_random_u32(), false, false);
-
-               log->ra = ra;
+               log_init_pg_hdr(log, 1, 1);
+               log_create(log, 0, get_random_u32(), false, false);
 
                ra = log_create_ra(log);
                if (!ra) {
@@ -3820,25 +3813,26 @@ int log_replay(struct ntfs_inode *ni, bool *initialized)
         * If the restart offset above wasn't zero then we won't
         * look for a second restart.
         */
-       if (rst_info.vbo)
+       if (log->rst_info.vbo)
                goto check_restart_area;
 
-       memset(&rst_info2, 0, sizeof(struct restart_info));
-       err = log_read_rst(log, l_size, false, &rst_info2);
+       err = log_read_rst(log, false, &log->rst_info2);
        if (err)
                goto out;
 
        /* Determine which restart area to use. */
-       if (!rst_info2.restart || rst_info2.last_lsn <= rst_info.last_lsn)
+       if (!log->rst_info2.restart ||
+           log->rst_info2.last_lsn <= log->rst_info.last_lsn)
                goto use_first_page;
 
        use_second_page = true;
 
-       if (rst_info.chkdsk_was_run && page_size != rst_info.vbo) {
+       if (log->rst_info.chkdsk_was_run &&
+           log->page_size != log->rst_info.vbo) {
                struct RECORD_PAGE_HDR *sp = NULL;
                bool usa_error;
 
-               if (!read_log_page(log, page_size, &sp, &usa_error) &&
+               if (!read_log_page(log, log->page_size, &sp, &usa_error) &&
                    sp->rhdr.sign == NTFS_CHKD_SIGNATURE) {
                        use_second_page = false;
                }
@@ -3846,52 +3840,43 @@ int log_replay(struct ntfs_inode *ni, bool *initialized)
        }
 
        if (use_second_page) {
-               kfree(rst_info.r_page);
-               memcpy(&rst_info, &rst_info2, sizeof(struct restart_info));
-               rst_info2.r_page = NULL;
+               kfree(log->rst_info.r_page);
+               memcpy(&log->rst_info, &log->rst_info2,
+                      sizeof(struct restart_info));
+               log->rst_info2.r_page = NULL;
        }
 
 use_first_page:
-       kfree(rst_info2.r_page);
+       kfree(log->rst_info2.r_page);
 
 check_restart_area:
        /*
         * If the restart area is at offset 0, we want
         * to write the second restart area first.
         */
-       log->init_ra = !!rst_info.vbo;
+       log->init_ra = !!log->rst_info.vbo;
 
        /* If we have a valid page then grab a pointer to the restart area. */
-       ra2 = rst_info.valid_page ?
-                     Add2Ptr(rst_info.r_page,
-                             le16_to_cpu(rst_info.r_page->ra_off)) :
+       ra2 = log->rst_info.valid_page ?
+                     Add2Ptr(log->rst_info.r_page,
+                             le16_to_cpu(log->rst_info.r_page->ra_off)) :
                      NULL;
 
-       if (rst_info.chkdsk_was_run ||
+       if (log->rst_info.chkdsk_was_run ||
            (ra2 && ra2->client_idx[1] == LFS_NO_CLIENT_LE)) {
                bool wrapped = false;
                bool use_multi_page = false;
                u32 open_log_count;
 
                /* Do some checks based on whether we have a valid log page. */
-               if (!rst_info.valid_page) {
-                       open_log_count = get_random_u32();
-                       goto init_log_instance;
-               }
-               open_log_count = le32_to_cpu(ra2->open_log_count);
-
-               /*
-                * If the restart page size isn't changing then we want to
-                * check how much work we need to do.
-                */
-               if (page_size != le32_to_cpu(rst_info.r_page->sys_page_size))
-                       goto init_log_instance;
+               open_log_count = log->rst_info.valid_page ?
+                                        le32_to_cpu(ra2->open_log_count) :
+                                        get_random_u32();
 
-init_log_instance:
-               log_init_pg_hdr(log, page_size, page_size, 1, 1);
+               log_init_pg_hdr(log, 1, 1);
 
-               log_create(log, l_size, rst_info.last_lsn, open_log_count,
-                          wrapped, use_multi_page);
+               log_create(log, log->rst_info.last_lsn, open_log_count, wrapped,
+                          use_multi_page);
 
                ra = log_create_ra(log);
                if (!ra) {
@@ -3916,28 +3901,27 @@ init_log_instance:
         * use the log file. We must use the system page size instead of the
         * default size if there is not a clean shutdown.
         */
-       t32 = le32_to_cpu(rst_info.r_page->sys_page_size);
-       if (page_size != t32) {
-               l_size = orig_file_size;
-               page_size =
-                       norm_file_page(t32, &l_size, t32 == DefaultLogPageSize);
+       t32 = le32_to_cpu(log->rst_info.r_page->sys_page_size);
+       if (log->page_size != t32) {
+               log->l_size = log->orig_file_size;
+               log->page_size = norm_file_page(t32, &log->l_size,
+                                               t32 == DefaultLogPageSize);
        }
 
-       if (page_size != t32 ||
-           page_size != le32_to_cpu(rst_info.r_page->page_size)) {
+       if (log->page_size != t32 ||
+           log->page_size != le32_to_cpu(log->rst_info.r_page->page_size)) {
                err = -EINVAL;
                goto out;
        }
 
        /* If the file size has shrunk then we won't mount it. */
-       if (l_size < le64_to_cpu(ra2->l_size)) {
+       if (log->l_size < le64_to_cpu(ra2->l_size)) {
                err = -EINVAL;
                goto out;
        }
 
-       log_init_pg_hdr(log, page_size, page_size,
-                       le16_to_cpu(rst_info.r_page->major_ver),
-                       le16_to_cpu(rst_info.r_page->minor_ver));
+       log_init_pg_hdr(log, le16_to_cpu(log->rst_info.r_page->major_ver),
+                       le16_to_cpu(log->rst_info.r_page->minor_ver));
 
        log->l_size = le64_to_cpu(ra2->l_size);
        log->seq_num_bits = le32_to_cpu(ra2->seq_num_bits);
@@ -3945,7 +3929,7 @@ init_log_instance:
        log->seq_num_mask = (8 << log->file_data_bits) - 1;
        log->last_lsn = le64_to_cpu(ra2->current_lsn);
        log->seq_num = log->last_lsn >> log->file_data_bits;
-       log->ra_off = le16_to_cpu(rst_info.r_page->ra_off);
+       log->ra_off = le16_to_cpu(log->rst_info.r_page->ra_off);
        log->restart_size = log->sys_page_size - log->ra_off;
        log->record_header_len = le16_to_cpu(ra2->rec_hdr_len);
        log->ra_size = le16_to_cpu(ra2->ra_len);
@@ -4045,7 +4029,7 @@ find_oldest:
        log->current_avail = current_log_avail(log);
 
        /* Remember which restart area to write first. */
-       log->init_ra = rst_info.vbo;
+       log->init_ra = log->rst_info.vbo;
 
 process_log:
        /* 1.0, 1.1, 2.0 log->major_ver/minor_ver - short values. */
@@ -4105,7 +4089,7 @@ process_log:
        log->client_id.seq_num = cr->seq_num;
        log->client_id.client_idx = client;
 
-       err = read_rst_area(log, &rst, &ra_lsn);
+       err = read_rst_area(log, &rst, &checkpt_lsn);
        if (err)
                goto out;
 
@@ -4114,9 +4098,8 @@ process_log:
 
        bytes_per_attr_entry = !rst->major_ver ? 0x2C : 0x28;
 
-       checkpt_lsn = le64_to_cpu(rst->check_point_start);
-       if (!checkpt_lsn)
-               checkpt_lsn = ra_lsn;
+       if (rst->check_point_start)
+               checkpt_lsn = le64_to_cpu(rst->check_point_start);
 
        /* Allocate and Read the Transaction Table. */
        if (!rst->transact_table_len)
@@ -4330,23 +4313,20 @@ check_attr_table:
        lcb = NULL;
 
 check_attribute_names2:
-       if (!rst->attr_names_len)
-               goto trace_attribute_table;
-
-       ane = attr_names;
-       if (!oatbl)
-               goto trace_attribute_table;
-       while (ane->off) {
-               /* TODO: Clear table on exit! */
-               oe = Add2Ptr(oatbl, le16_to_cpu(ane->off));
-               t16 = le16_to_cpu(ane->name_bytes);
-               oe->name_len = t16 / sizeof(short);
-               oe->ptr = ane->name;
-               oe->is_attr_name = 2;
-               ane = Add2Ptr(ane, sizeof(struct ATTR_NAME_ENTRY) + t16);
-       }
-
-trace_attribute_table:
+       if (rst->attr_names_len && oatbl) {
+               struct ATTR_NAME_ENTRY *ane = attr_names;
+               while (ane->off) {
+                       /* TODO: Clear table on exit! */
+                       oe = Add2Ptr(oatbl, le16_to_cpu(ane->off));
+                       t16 = le16_to_cpu(ane->name_bytes);
+                       oe->name_len = t16 / sizeof(short);
+                       oe->ptr = ane->name;
+                       oe->is_attr_name = 2;
+                       ane = Add2Ptr(ane,
+                                     sizeof(struct ATTR_NAME_ENTRY) + t16);
+               }
+       }
+
        /*
         * If the checkpt_lsn is zero, then this is a freshly
         * formatted disk and we have no work to do.
@@ -5189,7 +5169,7 @@ out:
        kfree(oatbl);
        kfree(dptbl);
        kfree(attr_names);
-       kfree(rst_info.r_page);
+       kfree(log->rst_info.r_page);
 
        kfree(ra);
        kfree(log->one_page_buf);
index fbfe21dbb42597cdb25a6d809f7b396e37508046..ae2ef5c11868c360f1285feffae734b2974d0d5c 100644 (file)
@@ -853,7 +853,8 @@ void ntfs_update_mftmirr(struct ntfs_sb_info *sbi, int wait)
        /*
         * sb can be NULL here. In this case sbi->flags should be 0 too.
         */
-       if (!sb || !(sbi->flags & NTFS_FLAGS_MFTMIRR))
+       if (!sb || !(sbi->flags & NTFS_FLAGS_MFTMIRR) ||
+           unlikely(ntfs3_forced_shutdown(sb)))
                return;
 
        blocksize = sb->s_blocksize;
@@ -1006,6 +1007,30 @@ static inline __le32 security_hash(const void *sd, size_t bytes)
        return cpu_to_le32(hash);
 }
 
+/*
+ * ntfs_bread - Bounds-checked wrapper for sb_bread_unmovable().
+ */
+struct buffer_head *ntfs_bread(struct super_block *sb, sector_t block)
+{
+       struct ntfs_sb_info *sbi = sb->s_fs_info;
+       struct buffer_head *bh;
+
+       if (unlikely(block >= sbi->volume.blocks)) {
+               /* Prevent the generic "attempt to access beyond end of device" message. */
+               ntfs_err(sb, "try to read out of volume at offset 0x%llx",
+                        (u64)block << sb->s_blocksize_bits);
+               return NULL;
+       }
+
+       bh = sb_bread_unmovable(sb, block);
+       if (bh)
+               return bh;
+
+       ntfs_err(sb, "failed to read volume at offset 0x%llx",
+                (u64)block << sb->s_blocksize_bits);
+       return NULL;
+}
+
 int ntfs_sb_read(struct super_block *sb, u64 lbo, size_t bytes, void *buffer)
 {
        struct block_device *bdev = sb->s_bdev;
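Centralizing the read in ntfs_bread() means every caller now gets the range check and the error logging for free. A minimal caller sketch (locals assumed):

	struct buffer_head *bh = ntfs_bread(sb, lbo >> sb->s_blocksize_bits);

	if (!bh)
		return -EIO;    /* out of range or I/O error, already logged */
	memcpy(buffer, bh->b_data, sb->s_blocksize);
	brelse(bh);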
@@ -2128,8 +2153,8 @@ int ntfs_insert_security(struct ntfs_sb_info *sbi,
                        if (le32_to_cpu(d_security->size) == new_sec_size &&
                            d_security->key.hash == hash_key.hash &&
                            !memcmp(d_security + 1, sd, size_sd)) {
-                               *security_id = d_security->key.sec_id;
                                /* Such security already exists. */
+                               *security_id = d_security->key.sec_id;
                                err = 0;
                                goto out;
                        }
index cf92b2433f7a750aeb86383eb7440c730ad7dc95..daabaad63aaf64ae65b8d67bbd40de837bfe3486 100644 (file)
@@ -1462,7 +1462,7 @@ static int indx_create_allocate(struct ntfs_index *indx, struct ntfs_inode *ni,
                goto out2;
 
        if (in->name == I30_NAME) {
-               ni->vfs_inode.i_size = data_size;
+               i_size_write(&ni->vfs_inode, data_size);
                inode_set_bytes(&ni->vfs_inode, alloc_size);
        }
 
@@ -1544,7 +1544,7 @@ static int indx_add_allocate(struct ntfs_index *indx, struct ntfs_inode *ni,
        }
 
        if (in->name == I30_NAME)
-               ni->vfs_inode.i_size = data_size;
+               i_size_write(&ni->vfs_inode, data_size);
 
        *vbn = bit << indx->idx2vbn_bits;
 
@@ -2090,7 +2090,7 @@ static int indx_shrink(struct ntfs_index *indx, struct ntfs_inode *ni,
                return err;
 
        if (in->name == I30_NAME)
-               ni->vfs_inode.i_size = new_data;
+               i_size_write(&ni->vfs_inode, new_data);
 
        bpb = bitmap_size(bit);
        if (bpb * 8 == nbits)
@@ -2576,7 +2576,7 @@ int indx_delete_entry(struct ntfs_index *indx, struct ntfs_inode *ni,
                err = attr_set_size(ni, ATTR_ALLOC, in->name, in->name_len,
                                    &indx->alloc_run, 0, NULL, false, NULL);
                if (in->name == I30_NAME)
-                       ni->vfs_inode.i_size = 0;
+                       i_size_write(&ni->vfs_inode, 0);
 
                err = ni_remove_attr(ni, ATTR_ALLOC, in->name, in->name_len,
                                     false, NULL);
index 5e3d713749185f116e145adea4b196fbd7be72d2..eb7a8c9fba0183f40096d673473be4dffaa7c4c8 100644 (file)
@@ -345,9 +345,7 @@ next_attr:
                        inode->i_size = le16_to_cpu(rp.SymbolicLinkReparseBuffer
                                                            .PrintNameLength) /
                                        sizeof(u16);
-
                        ni->i_valid = inode->i_size;
-
                        /* Clear directory bit. */
                        if (ni->ni_flags & NI_FLAG_DIR) {
                                indx_clear(&ni->dir);
@@ -412,7 +410,6 @@ end_enum:
                goto out;
 
        if (!is_match && name) {
-               /* Reuse rec as buffer for ascii name. */
                err = -ENOENT;
                goto out;
        }
@@ -427,6 +424,7 @@ end_enum:
 
        if (names != le16_to_cpu(rec->hard_links)) {
                /* Correct minor error on the fly. Do not mark inode as dirty. */
+               ntfs_inode_warn(inode, "Correct links count -> %u.", names);
                rec->hard_links = cpu_to_le16(names);
                ni->mi.dirty = true;
        }
@@ -653,9 +651,10 @@ static noinline int ntfs_get_block_vbo(struct inode *inode, u64 vbo,
                        off = vbo & (PAGE_SIZE - 1);
                        folio_set_bh(bh, folio, off);
 
-                       err = bh_read(bh, 0);
-                       if (err < 0)
+                       if (bh_read(bh, 0) < 0) {
+                               err = -EIO;
                                goto out;
+                       }
                        folio_zero_segment(folio, off + voff, off + block_size);
                }
        }
@@ -853,9 +852,13 @@ static int ntfs_resident_writepage(struct folio *folio,
                                   struct writeback_control *wbc, void *data)
 {
        struct address_space *mapping = data;
-       struct ntfs_inode *ni = ntfs_i(mapping->host);
+       struct inode *inode = mapping->host;
+       struct ntfs_inode *ni = ntfs_i(inode);
        int ret;
 
+       if (unlikely(ntfs3_forced_shutdown(inode->i_sb)))
+               return -EIO;
+
        ni_lock(ni);
        ret = attr_data_write_resident(ni, &folio->page);
        ni_unlock(ni);
@@ -869,7 +872,12 @@ static int ntfs_resident_writepage(struct folio *folio,
 static int ntfs_writepages(struct address_space *mapping,
                           struct writeback_control *wbc)
 {
-       if (is_resident(ntfs_i(mapping->host)))
+       struct inode *inode = mapping->host;
+
+       if (unlikely(ntfs3_forced_shutdown(inode->i_sb)))
+               return -EIO;
+
+       if (is_resident(ntfs_i(inode)))
                return write_cache_pages(mapping, wbc, ntfs_resident_writepage,
                                         mapping);
        return mpage_writepages(mapping, wbc, ntfs_get_block);
@@ -889,6 +897,9 @@ int ntfs_write_begin(struct file *file, struct address_space *mapping,
        struct inode *inode = mapping->host;
        struct ntfs_inode *ni = ntfs_i(inode);
 
+       if (unlikely(ntfs3_forced_shutdown(inode->i_sb)))
+               return -EIO;
+
        *pagep = NULL;
        if (is_resident(ni)) {
                struct page *page =
@@ -974,7 +985,7 @@ int ntfs_write_end(struct file *file, struct address_space *mapping, loff_t pos,
                }
 
                if (pos + err > inode->i_size) {
-                       inode->i_size = pos + err;
+                       i_size_write(inode, pos + err);
                        dirty = true;
                }
 
@@ -1306,6 +1317,11 @@ struct inode *ntfs_create_inode(struct mnt_idmap *idmap, struct inode *dir,
                goto out1;
        }
 
+       if (unlikely(ntfs3_forced_shutdown(sb))) {
+               err = -EIO;
+               goto out2;
+       }
+
        /* Mark rw ntfs as dirty. it will be cleared at umount. */
        ntfs_set_state(sbi, NTFS_DIRTY_DIRTY);
 
index ee3093be51701e78d1e02f6f30a7b5a4019831a0..084d19d78397c9b6108b36b9730c59a806a66534 100644 (file)
@@ -181,6 +181,9 @@ static int ntfs_unlink(struct inode *dir, struct dentry *dentry)
        struct ntfs_inode *ni = ntfs_i(dir);
        int err;
 
+       if (unlikely(ntfs3_forced_shutdown(dir->i_sb)))
+               return -EIO;
+
        ni_lock_dir(ni);
 
        err = ntfs_unlink_inode(dir, dentry);
@@ -199,6 +202,9 @@ static int ntfs_symlink(struct mnt_idmap *idmap, struct inode *dir,
        u32 size = strlen(symname);
        struct inode *inode;
 
+       if (unlikely(ntfs3_forced_shutdown(dir->i_sb)))
+               return -EIO;
+
        inode = ntfs_create_inode(idmap, dir, dentry, NULL, S_IFLNK | 0777, 0,
                                  symname, size, NULL);
 
@@ -227,6 +233,9 @@ static int ntfs_rmdir(struct inode *dir, struct dentry *dentry)
        struct ntfs_inode *ni = ntfs_i(dir);
        int err;
 
+       if (unlikely(ntfs3_forced_shutdown(dir->i_sb)))
+               return -EIO;
+
        ni_lock_dir(ni);
 
        err = ntfs_unlink_inode(dir, dentry);
@@ -264,6 +273,9 @@ static int ntfs_rename(struct mnt_idmap *idmap, struct inode *dir,
                      1024);
        static_assert(PATH_MAX >= 4 * 1024);
 
+       if (unlikely(ntfs3_forced_shutdown(sb)))
+               return -EIO;
+
        if (flags & ~RENAME_NOREPLACE)
                return -EINVAL;
 
@@ -419,7 +431,7 @@ static int ntfs_atomic_open(struct inode *dir, struct dentry *dentry,
         * fnd contains tree's path to insert to.
         * If fnd is not NULL then dir is locked.
         */
-       inode = ntfs_create_inode(mnt_idmap(file->f_path.mnt), dir, dentry, uni,
+       inode = ntfs_create_inode(file_mnt_idmap(file), dir, dentry, uni,
                                  mode, 0, NULL, 0, fnd);
        err = IS_ERR(inode) ? PTR_ERR(inode) :
                              finish_open(file, dentry, ntfs_file_open);
index 86aecbb01a92f282ab621a26df9897b70b65df28..9c7478150a0352d4f46574b76e91ec27ce661cdb 100644 (file)
@@ -523,12 +523,10 @@ struct ATTR_LIST_ENTRY {
        __le64 vcn;             // 0x08: Starting VCN of this attribute.
        struct MFT_REF ref;     // 0x10: MFT record number with attribute.
        __le16 id;              // 0x18: struct ATTRIB ID.
-       __le16 name[3];         // 0x1A: Just to align. To get real name can use bNameOffset.
+       __le16 name[];          // 0x1A: To get the real name, use name_off.
 
 }; // sizeof(0x20)
 
-static_assert(sizeof(struct ATTR_LIST_ENTRY) == 0x20);
-
 static inline u32 le_size(u8 name_len)
 {
        return ALIGN(offsetof(struct ATTR_LIST_ENTRY, name) +
index f6706143d14bced3c2bfdbac59d4df1a8cb9da5e..79356fd29a14141de34ed006517b153fd9e4872b 100644 (file)
@@ -61,6 +61,8 @@ enum utf16_endian;
 
 /* sbi->flags */
 #define NTFS_FLAGS_NODISCARD           0x00000001
+/* The NTFS volume is in a shutdown state. */
+#define NTFS_FLAGS_SHUTDOWN_BIT                0x00000002  /* As a bit number; the mask is 4. */
 /* Set when LogFile is replaying. */
 #define NTFS_FLAGS_LOG_REPLAYING       0x00000008
 /* Set when we changed first MFT's which copy must be updated in $MftMirr. */
@@ -226,7 +228,7 @@ struct ntfs_sb_info {
        u64 maxbytes; // Maximum size for normal files.
        u64 maxbytes_sparse; // Maximum size for sparse file.
 
-       u32 flags; // See NTFS_FLAGS_XXX.
+       unsigned long flags; // See the NTFS_FLAGS_* defines above.
 
        CLST zone_max; // Maximum MFT zone length in clusters
        CLST bad_clusters; // The count of marked bad clusters.
@@ -473,7 +475,7 @@ bool al_delete_le(struct ntfs_inode *ni, enum ATTR_TYPE type, CLST vcn,
 int al_update(struct ntfs_inode *ni, int sync);
 static inline size_t al_aligned(size_t size)
 {
-       return (size + 1023) & ~(size_t)1023;
+       return size_add(size, 1023) & ~(size_t)1023;
 }
 
 /* Globals from bitfunc.c */
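
al_aligned() now funnels the addition through size_add() from
<linux/overflow.h>, which saturates at SIZE_MAX instead of wrapping, so a
hostile size near SIZE_MAX can no longer align down to a tiny allocation.
A userspace illustration of the failure mode (size_add_sat stands in for
the kernel helper):

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Saturating add, in the spirit of the kernel's size_add(). */
static size_t size_add_sat(size_t a, size_t b)
{
	size_t sum = a + b;

	return sum < a ? SIZE_MAX : sum;
}

int main(void)
{
	size_t evil = SIZE_MAX - 100;

	/* Plain addition wraps, and the mask then yields 0. */
	printf("wrapping:   %zu\n", (evil + 1023) & ~(size_t)1023);
	/* Saturated addition stays huge, so the allocation fails safely. */
	printf("saturating: %zu\n", size_add_sat(evil, 1023) & ~(size_t)1023);
	return 0;
}
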
@@ -500,6 +502,8 @@ int ntfs3_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
 int ntfs_file_open(struct inode *inode, struct file *file);
 int ntfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
                __u64 start, __u64 len);
+long ntfs_ioctl(struct file *filp, u32 cmd, unsigned long arg);
+long ntfs_compat_ioctl(struct file *filp, u32 cmd, unsigned long arg);
 extern const struct inode_operations ntfs_special_inode_operations;
 extern const struct inode_operations ntfs_file_inode_operations;
 extern const struct file_operations ntfs_file_operations;
@@ -584,6 +588,7 @@ bool check_index_header(const struct INDEX_HDR *hdr, size_t bytes);
 int log_replay(struct ntfs_inode *ni, bool *initialized);
 
 /* Globals from fsntfs.c */
+struct buffer_head *ntfs_bread(struct super_block *sb, sector_t block);
 bool ntfs_fix_pre_write(struct NTFS_RECORD_HEADER *rhdr, size_t bytes);
 int ntfs_fix_post_read(struct NTFS_RECORD_HEADER *rhdr, size_t bytes,
                       bool simple);
@@ -872,7 +877,7 @@ int ntfs_init_acl(struct mnt_idmap *idmap, struct inode *inode,
 
 int ntfs_acl_chmod(struct mnt_idmap *idmap, struct dentry *dentry);
 ssize_t ntfs_listxattr(struct dentry *dentry, char *buffer, size_t size);
-extern const struct xattr_handler * const ntfs_xattr_handlers[];
+extern const struct xattr_handler *const ntfs_xattr_handlers[];
 
 int ntfs_save_wsl_perm(struct inode *inode, __le16 *ea_size);
 void ntfs_get_wsl_perm(struct inode *inode);
@@ -999,6 +1004,11 @@ static inline struct ntfs_sb_info *ntfs_sb(struct super_block *sb)
        return sb->s_fs_info;
 }
 
+static inline int ntfs3_forced_shutdown(struct super_block *sb)
+{
+       return test_bit(NTFS_FLAGS_SHUTDOWN_BIT, &ntfs_sb(sb)->flags);
+}
+
 /*
  * ntfs_up_cluster - Align up on cluster boundary.
  */
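
Together with ntfs_shutdown() in super.c below, this helper forms a
standard forced-shutdown gate: super_operations::shutdown sets one flag
bit, and every entry point that would touch the volume returns -EIO once
it observes that bit. It is also why an earlier hunk widened sbi->flags
from u32 to unsigned long: set_bit()/test_bit() operate on unsigned long.
The pattern in miniature (illustrative names, not the ntfs3 code):

#include <linux/bitops.h>
#include <linux/errno.h>

#define MYFS_SHUTDOWN_BIT	2

struct myfs_info {
	unsigned long flags;	/* unsigned long, as the bitops require */
};

/* super_operations::shutdown, e.g. on surprise device removal. */
static void myfs_shutdown(struct myfs_info *sbi)
{
	set_bit(MYFS_SHUTDOWN_BIT, &sbi->flags);
}

/* Called first thing in unlink, rename, sync_fs, ... */
static int myfs_enter(struct myfs_info *sbi)
{
	if (unlikely(test_bit(MYFS_SHUTDOWN_BIT, &sbi->flags)))
		return -EIO;
	return 0;
}
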
@@ -1025,19 +1035,6 @@ static inline u64 bytes_to_block(const struct super_block *sb, u64 size)
        return (size + sb->s_blocksize - 1) >> sb->s_blocksize_bits;
 }
 
-static inline struct buffer_head *ntfs_bread(struct super_block *sb,
-                                            sector_t block)
-{
-       struct buffer_head *bh = sb_bread(sb, block);
-
-       if (bh)
-               return bh;
-
-       ntfs_err(sb, "failed to read volume at offset 0x%llx",
-                (u64)block << sb->s_blocksize_bits);
-       return NULL;
-}
-
 static inline struct ntfs_inode *ntfs_i(struct inode *inode)
 {
        return container_of(inode, struct ntfs_inode, vfs_inode);
index 53629b1f65e995978cef9b312462447305cb578d..6aa3a9d44df1bdc90f56a947caf94c902590d983 100644 (file)
@@ -279,7 +279,7 @@ struct ATTRIB *mi_enum_attr(struct mft_inode *mi, struct ATTRIB *attr)
                if (t16 > asize)
                        return NULL;
 
-               if (t16 + le32_to_cpu(attr->res.data_size) > asize)
+               if (le32_to_cpu(attr->res.data_size) > asize - t16)
                        return NULL;
 
                t32 = sizeof(short) * attr->name_len;
@@ -535,8 +535,20 @@ bool mi_remove_attr(struct ntfs_inode *ni, struct mft_inode *mi,
                return false;
 
        if (ni && is_attr_indexed(attr)) {
-               le16_add_cpu(&ni->mi.mrec->hard_links, -1);
-               ni->mi.dirty = true;
+               u16 links = le16_to_cpu(ni->mi.mrec->hard_links);
+               struct ATTR_FILE_NAME *fname =
+                       attr->type != ATTR_NAME ?
+                               NULL :
+                               resident_data_ex(attr,
+                                                SIZEOF_ATTRIBUTE_FILENAME);
+               if (fname && fname->type == FILE_NAME_DOS) {
+                       /* Do not decrease the links count when deleting a DOS name. */
+               } else if (!links) {
+                       /* Links count is already zero: minor error, not critical. */
+               } else {
+                       ni->mi.mrec->hard_links = cpu_to_le16(links - 1);
+                       ni->mi.dirty = true;
+               }
        }
 
        used -= asize;
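
The rewritten block above replaces a blind le16_add_cpu(..., -1) with a
guarded decrement: on a corrupted volume a hard_links count of zero would
otherwise wrap a 16-bit counter around to 0xFFFF. The shape of the fix,
reduced to its essentials (illustrative only):

#include <stdint.h>

/* Decrement a 16-bit on-disk counter without wrapping on corruption. */
static int dec_link_count(uint16_t *links)
{
	if (*links == 0)
		return 0;	/* already zero: minor error, nothing to do */
	(*links)--;
	return 1;		/* caller must mark the record dirty */
}
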
index 9153dffde950c2a396291bea88e3e6d31169f568..cef5467fd92833aec6fb0bce3879826c2a627a09 100644 (file)
@@ -122,13 +122,12 @@ void ntfs_inode_printk(struct inode *inode, const char *fmt, ...)
 
        if (name) {
                struct dentry *de = d_find_alias(inode);
-               const u32 name_len = ARRAY_SIZE(s_name_buf) - 1;
 
                if (de) {
                        spin_lock(&de->d_lock);
-                       snprintf(name, name_len, " \"%s\"", de->d_name.name);
+                       snprintf(name, sizeof(s_name_buf), " \"%s\"",
+                                de->d_name.name);
                        spin_unlock(&de->d_lock);
-                       name[name_len] = 0; /* To be sure. */
                } else {
                        name[0] = 0;
                }
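
Dropping the manual name[name_len] = 0 is safe because snprintf() always
NUL-terminates within the given size and truncates rather than overflows;
passing sizeof(s_name_buf) directly also stops wasting the byte the old
code reserved by hand. For example:

#include <stdio.h>

int main(void)
{
	char buf[8];

	/* snprintf() stores at most size - 1 chars and always adds a NUL. */
	int want = snprintf(buf, sizeof(buf), " \"%s\"", "very_long_name");

	printf("stored \"%s\", wanted %d bytes\n", buf, want);
	return 0;
}
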
@@ -625,7 +624,7 @@ static void ntfs3_free_sbi(struct ntfs_sb_info *sbi)
 {
        kfree(sbi->new_rec);
        kvfree(ntfs_put_shared(sbi->upcase));
-       kfree(sbi->def_table);
+       kvfree(sbi->def_table);
        kfree(sbi->compress.lznt);
 #ifdef CONFIG_NTFS3_LZX_XPRESS
        xpress_free_decompressor(sbi->compress.xpress);
@@ -714,6 +713,14 @@ static int ntfs_show_options(struct seq_file *m, struct dentry *root)
        return 0;
 }
 
+/*
+ * ntfs_shutdown - super_operations::shutdown
+ */
+static void ntfs_shutdown(struct super_block *sb)
+{
+       set_bit(NTFS_FLAGS_SHUTDOWN_BIT, &ntfs_sb(sb)->flags);
+}
+
 /*
  * ntfs_sync_fs - super_operations::sync_fs
  */
@@ -724,6 +731,9 @@ static int ntfs_sync_fs(struct super_block *sb, int wait)
        struct ntfs_inode *ni;
        struct inode *inode;
 
+       if (unlikely(ntfs3_forced_shutdown(sb)))
+               return -EIO;
+
        ni = sbi->security.ni;
        if (ni) {
                inode = &ni->vfs_inode;
@@ -763,6 +773,7 @@ static const struct super_operations ntfs_sops = {
        .put_super = ntfs_put_super,
        .statfs = ntfs_statfs,
        .show_options = ntfs_show_options,
+       .shutdown = ntfs_shutdown,
        .sync_fs = ntfs_sync_fs,
        .write_inode = ntfs3_write_inode,
 };
@@ -866,6 +877,7 @@ static int ntfs_init_from_boot(struct super_block *sb, u32 sector_size,
        u16 fn, ao;
        u8 cluster_bits;
        u32 boot_off = 0;
+       sector_t boot_block = 0;
        const char *hint = "Primary boot";
 
        /* Save original dev_size. Used with alternative boot. */
@@ -873,11 +885,11 @@ static int ntfs_init_from_boot(struct super_block *sb, u32 sector_size,
 
        sbi->volume.blocks = dev_size >> PAGE_SHIFT;
 
-       bh = ntfs_bread(sb, 0);
+read_boot:
+       bh = ntfs_bread(sb, boot_block);
        if (!bh)
-               return -EIO;
+               return boot_block ? -EINVAL : -EIO;
 
-check_boot:
        err = -EINVAL;
 
        /* Corrupted image; do not read OOB */
@@ -1108,26 +1120,24 @@ check_boot:
        }
 
 out:
-       if (err == -EINVAL && !bh->b_blocknr && dev_size0 > PAGE_SHIFT) {
+       brelse(bh);
+
+       if (err == -EINVAL && !boot_block && dev_size0 > PAGE_SHIFT) {
                u32 block_size = min_t(u32, sector_size, PAGE_SIZE);
                u64 lbo = dev_size0 - sizeof(*boot);
 
-               /*
-                * Try alternative boot (last sector)
-                */
-               brelse(bh);
-
-               sb_set_blocksize(sb, block_size);
-               bh = ntfs_bread(sb, lbo >> blksize_bits(block_size));
-               if (!bh)
-                       return -EINVAL;
-
+               boot_block = lbo >> blksize_bits(block_size);
                boot_off = lbo & (block_size - 1);
-               hint = "Alternative boot";
-               dev_size = dev_size0; /* restore original size. */
-               goto check_boot;
+               if (boot_block && block_size >= boot_off + sizeof(*boot)) {
+                       /*
+                        * Try alternative boot (last sector)
+                        */
+                       sb_set_blocksize(sb, block_size);
+                       hint = "Alternative boot";
+                       dev_size = dev_size0; /* restore original size. */
+                       goto read_boot;
+               }
        }
-       brelse(bh);
 
        return err;
 }
index 4274b6f31cfa1c49fec865fe0e1d81ab152e040b..53e7d1fa036aa6e50a3ccd529d88584b1350cd74 100644 (file)
@@ -219,6 +219,9 @@ static ssize_t ntfs_list_ea(struct ntfs_inode *ni, char *buffer,
                if (!ea->name_len)
                        break;
 
+               if (ea->name_len > ea_size)
+                       break;
+
                if (buffer) {
                        /* Check if we can use field ea->name */
                        if (off + ea_size > size)
@@ -744,6 +747,9 @@ static int ntfs_getxattr(const struct xattr_handler *handler, struct dentry *de,
        int err;
        struct ntfs_inode *ni = ntfs_i(inode);
 
+       if (unlikely(ntfs3_forced_shutdown(inode->i_sb)))
+               return -EIO;
+
        /* Dispatch request. */
        if (!strcmp(name, SYSTEM_DOS_ATTRIB)) {
                /* system.dos_attrib */
index 4d7efefa98c5ec3da43a554f1e9685e8e6b75e7e..1bde1281d5146d3d99e3b97c10cb7c6a3ad41b0f 100644 (file)
@@ -213,7 +213,7 @@ struct o2hb_region {
        unsigned int            hr_num_pages;
 
        struct page             **hr_slot_data;
-       struct bdev_handle      *hr_bdev_handle;
+       struct file             *hr_bdev_file;
        struct o2hb_disk_slot   *hr_slots;
 
        /* live node map of this region */
@@ -263,7 +263,7 @@ struct o2hb_region {
 
 static inline struct block_device *reg_bdev(struct o2hb_region *reg)
 {
-       return reg->hr_bdev_handle ? reg->hr_bdev_handle->bdev : NULL;
+       return reg->hr_bdev_file ? file_bdev(reg->hr_bdev_file) : NULL;
 }
 
 struct o2hb_bio_wait_ctxt {
@@ -1509,8 +1509,8 @@ static void o2hb_region_release(struct config_item *item)
                kfree(reg->hr_slot_data);
        }
 
-       if (reg->hr_bdev_handle)
-               bdev_release(reg->hr_bdev_handle);
+       if (reg->hr_bdev_file)
+               fput(reg->hr_bdev_file);
 
        kfree(reg->hr_slots);
 
@@ -1569,7 +1569,7 @@ static ssize_t o2hb_region_block_bytes_store(struct config_item *item,
        unsigned long block_bytes;
        unsigned int block_bits;
 
-       if (reg->hr_bdev_handle)
+       if (reg->hr_bdev_file)
                return -EINVAL;
 
        status = o2hb_read_block_input(reg, page, &block_bytes,
@@ -1598,7 +1598,7 @@ static ssize_t o2hb_region_start_block_store(struct config_item *item,
        char *p = (char *)page;
        ssize_t ret;
 
-       if (reg->hr_bdev_handle)
+       if (reg->hr_bdev_file)
                return -EINVAL;
 
        ret = kstrtoull(p, 0, &tmp);
@@ -1623,7 +1623,7 @@ static ssize_t o2hb_region_blocks_store(struct config_item *item,
        unsigned long tmp;
        char *p = (char *)page;
 
-       if (reg->hr_bdev_handle)
+       if (reg->hr_bdev_file)
                return -EINVAL;
 
        tmp = simple_strtoul(p, &p, 0);
@@ -1642,7 +1642,7 @@ static ssize_t o2hb_region_dev_show(struct config_item *item, char *page)
 {
        unsigned int ret = 0;
 
-       if (to_o2hb_region(item)->hr_bdev_handle)
+       if (to_o2hb_region(item)->hr_bdev_file)
                ret = sprintf(page, "%pg\n", reg_bdev(to_o2hb_region(item)));
 
        return ret;
@@ -1753,7 +1753,7 @@ out:
 }
 
 /*
- * this is acting as commit; we set up all of hr_bdev_handle and hr_task or
+ * this acts as the commit step; we set up all of hr_bdev_file and hr_task or
  * nothing
  */
 static ssize_t o2hb_region_dev_store(struct config_item *item,
@@ -1769,7 +1769,7 @@ static ssize_t o2hb_region_dev_store(struct config_item *item,
        ssize_t ret = -EINVAL;
        int live_threshold;
 
-       if (reg->hr_bdev_handle)
+       if (reg->hr_bdev_file)
                goto out;
 
        /* We can't heartbeat without having had our node number
@@ -1795,11 +1795,11 @@ static ssize_t o2hb_region_dev_store(struct config_item *item,
        if (!S_ISBLK(f.file->f_mapping->host->i_mode))
                goto out2;
 
-       reg->hr_bdev_handle = bdev_open_by_dev(f.file->f_mapping->host->i_rdev,
+       reg->hr_bdev_file = bdev_file_open_by_dev(f.file->f_mapping->host->i_rdev,
                        BLK_OPEN_WRITE | BLK_OPEN_READ, NULL, NULL);
-       if (IS_ERR(reg->hr_bdev_handle)) {
-               ret = PTR_ERR(reg->hr_bdev_handle);
-               reg->hr_bdev_handle = NULL;
+       if (IS_ERR(reg->hr_bdev_file)) {
+               ret = PTR_ERR(reg->hr_bdev_file);
+               reg->hr_bdev_file = NULL;
                goto out2;
        }
 
@@ -1903,8 +1903,8 @@ static ssize_t o2hb_region_dev_store(struct config_item *item,
 
 out3:
        if (ret < 0) {
-               bdev_release(reg->hr_bdev_handle);
-               reg->hr_bdev_handle = NULL;
+               fput(reg->hr_bdev_file);
+               reg->hr_bdev_file = NULL;
        }
 out2:
        fdput(f);
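
This ocfs2 hunk follows a tree-wide conversion in this cycle: block
devices are now held open as a struct file returned by
bdev_file_open_by_dev() and released with a plain fput(), with
file_bdev() recovering the struct block_device on demand. The calling
convention in outline (a sketch, not a complete driver):

#include <linux/blkdev.h>
#include <linux/fs.h>

static struct file *open_hb_device(dev_t dev)
{
	/* Returns a pinned struct file, or ERR_PTR() on failure. */
	return bdev_file_open_by_dev(dev, BLK_OPEN_READ | BLK_OPEN_WRITE,
				     NULL, NULL);
}

static void use_and_release(struct file *bdev_file)
{
	struct block_device *bdev = file_bdev(bdev_file);

	pr_info("heartbeat region on %pg\n", bdev);
	fput(bdev_file);	/* replaces the old bdev_release() */
}
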
index f37174e79fad4f377e76c04eec2a06c5f1f19f89..6de944818c569ceecc8b81a4acf49780dbb9b5b7 100644 (file)
@@ -27,7 +27,7 @@ static int ocfs2_do_flock(struct file *file, struct inode *inode,
        struct ocfs2_file_private *fp = file->private_data;
        struct ocfs2_lock_res *lockres = &fp->fp_flock;
 
-       if (fl->fl_type == F_WRLCK)
+       if (lock_is_write(fl))
                level = 1;
        if (!IS_SETLKW(cmd))
                trylock = 1;
@@ -53,8 +53,8 @@ static int ocfs2_do_flock(struct file *file, struct inode *inode,
                 */
 
                locks_init_lock(&request);
-               request.fl_type = F_UNLCK;
-               request.fl_flags = FL_FLOCK;
+               request.c.flc_type = F_UNLCK;
+               request.c.flc_flags = FL_FLOCK;
                locks_lock_file_wait(file, &request);
 
                ocfs2_file_unlock(file);
@@ -100,14 +100,14 @@ int ocfs2_flock(struct file *file, int cmd, struct file_lock *fl)
        struct inode *inode = file->f_mapping->host;
        struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
 
-       if (!(fl->fl_flags & FL_FLOCK))
+       if (!(fl->c.flc_flags & FL_FLOCK))
                return -ENOLCK;
 
        if ((osb->s_mount_opt & OCFS2_MOUNT_LOCALFLOCKS) ||
            ocfs2_mount_local(osb))
                return locks_lock_file_wait(file, fl);
 
-       if (fl->fl_type == F_UNLCK)
+       if (lock_is_unlock(fl))
                return ocfs2_do_funlock(file, cmd, fl);
        else
                return ocfs2_do_flock(file, inode, cmd, fl);
@@ -118,7 +118,7 @@ int ocfs2_lock(struct file *file, int cmd, struct file_lock *fl)
        struct inode *inode = file->f_mapping->host;
        struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
 
-       if (!(fl->fl_flags & FL_POSIX))
+       if (!(fl->c.flc_flags & FL_POSIX))
                return -ENOLCK;
 
        return ocfs2_plock(osb->cconn, OCFS2_I(inode)->ip_blkno, file, cmd, fl);
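
These flock/plock changes track another cross-tree cleanup: the generic
fields of struct file_lock moved into an embedded struct file_lock_core
(hence fl->c.flc_type and fl->c.flc_flags), with lock_is_read(),
lock_is_write() and lock_is_unlock() as readable accessors. Judging from
how the diff uses them, the accessors amount to (a paraphrase, not the
header's exact code):

#include <linux/filelock.h>

static bool my_lock_is_write(const struct file_lock *fl)
{
	return fl->c.flc_type == F_WRLCK;
}

static bool my_lock_is_unlock(const struct file_lock *fl)
{
	return fl->c.flc_type == F_UNLCK;
}
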
index 9b76ee66aeb2f4f6c4d5f5728e2c303a8a440007..c11406cd87a88e5329d78a8bd252a180d810e361 100644 (file)
@@ -744,7 +744,7 @@ static int user_plock(struct ocfs2_cluster_connection *conn,
                return dlm_posix_cancel(conn->cc_lockspace, ino, file, fl);
        else if (IS_GETLK(cmd))
                return dlm_posix_get(conn->cc_lockspace, ino, file, fl);
-       else if (fl->fl_type == F_UNLCK)
+       else if (lock_is_unlock(fl))
                return dlm_posix_unlock(conn->cc_lockspace, ino, file, fl);
        else
                return dlm_posix_lock(conn->cc_lockspace, ino, file, cmd, fl);
index 6b906424902b46b9b17e1fb2102e916fcb89d1fc..a70aff17d4554af8478308dc6c2b5b5cd3122d23 100644 (file)
@@ -2027,8 +2027,8 @@ static int ocfs2_initialize_super(struct super_block *sb,
        cbits = le32_to_cpu(di->id2.i_super.s_clustersize_bits);
        bbits = le32_to_cpu(di->id2.i_super.s_blocksize_bits);
        sb->s_maxbytes = ocfs2_max_file_offset(bbits, cbits);
-       memcpy(&sb->s_uuid, di->id2.i_super.s_uuid,
-              sizeof(di->id2.i_super.s_uuid));
+       super_set_uuid(sb, di->id2.i_super.s_uuid,
+                      sizeof(di->id2.i_super.s_uuid));
 
        osb->osb_dx_mask = (1 << (cbits - bbits)) - 1;
 
index a84d21e55c391eebdf662c7fa8c9d5b2aef8efd2..a7d4bb2c725f1e9bdde176f769c3a52df0627ab9 100644 (file)
--- a/fs/open.c
+++ b/fs/open.c
@@ -154,49 +154,52 @@ COMPAT_SYSCALL_DEFINE2(truncate, const char __user *, path, compat_off_t, length
 }
 #endif
 
-long do_sys_ftruncate(unsigned int fd, loff_t length, int small)
+long do_ftruncate(struct file *file, loff_t length, int small)
 {
        struct inode *inode;
        struct dentry *dentry;
-       struct fd f;
        int error;
 
-       error = -EINVAL;
-       if (length < 0)
-               goto out;
-       error = -EBADF;
-       f = fdget(fd);
-       if (!f.file)
-               goto out;
-
        /* explicitly opened as large or we are on 64-bit box */
-       if (f.file->f_flags & O_LARGEFILE)
+       if (file->f_flags & O_LARGEFILE)
                small = 0;
 
-       dentry = f.file->f_path.dentry;
+       dentry = file->f_path.dentry;
        inode = dentry->d_inode;
-       error = -EINVAL;
-       if (!S_ISREG(inode->i_mode) || !(f.file->f_mode & FMODE_WRITE))
-               goto out_putf;
+       if (!S_ISREG(inode->i_mode) || !(file->f_mode & FMODE_WRITE))
+               return -EINVAL;
 
-       error = -EINVAL;
        /* Cannot ftruncate over 2^31 bytes without large file support */
        if (small && length > MAX_NON_LFS)
-               goto out_putf;
+               return -EINVAL;
 
-       error = -EPERM;
        /* Check IS_APPEND on real upper inode */
-       if (IS_APPEND(file_inode(f.file)))
-               goto out_putf;
+       if (IS_APPEND(file_inode(file)))
+               return -EPERM;
        sb_start_write(inode->i_sb);
-       error = security_file_truncate(f.file);
+       error = security_file_truncate(file);
        if (!error)
-               error = do_truncate(file_mnt_idmap(f.file), dentry, length,
-                                   ATTR_MTIME | ATTR_CTIME, f.file);
+               error = do_truncate(file_mnt_idmap(file), dentry, length,
+                                   ATTR_MTIME | ATTR_CTIME, file);
        sb_end_write(inode->i_sb);
-out_putf:
+
+       return error;
+}
+
+long do_sys_ftruncate(unsigned int fd, loff_t length, int small)
+{
+       struct fd f;
+       int error;
+
+       if (length < 0)
+               return -EINVAL;
+       f = fdget(fd);
+       if (!f.file)
+               return -EBADF;
+
+       error = do_ftruncate(f.file, length, small);
+
        fdput(f);
-out:
        return error;
 }
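
The split above leaves do_sys_ftruncate() responsible only for argument
screening and fdget()/fdput(), so callers that already hold a struct file
(io_uring gained an ftruncate operation this cycle) can call
do_ftruncate() directly. The wrapper/core pattern, generically:

#include <linux/file.h>

/* Core helper: operates on an already-pinned file. */
long do_op(struct file *file, loff_t arg);

/* Syscall wrapper: resolves and pins the descriptor, nothing more. */
static long sys_op_wrapper(unsigned int fd, loff_t arg)
{
	struct fd f = fdget(fd);
	long err;

	if (!f.file)
		return -EBADF;
	err = do_op(f.file, arg);
	fdput(f);
	return err;
}
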
 
@@ -1364,7 +1367,7 @@ struct file *filp_open(const char *filename, int flags, umode_t mode)
 {
        struct filename *name = getname_kernel(filename);
        struct file *file = ERR_CAST(name);
-       
+
        if (!IS_ERR(name)) {
                file = file_open_name(name, flags, mode);
                putname(name);
index c4b65a6d41cc345c83e4729d3f9a76cad66391e3..4a0779e3ef792334904f18bfc5b5c791a7141768 100644 (file)
@@ -446,7 +446,7 @@ static int __init init_openprom_fs(void)
                                            sizeof(struct op_inode_info),
                                            0,
                                            (SLAB_RECLAIM_ACCOUNT |
-                                            SLAB_MEM_SPREAD | SLAB_ACCOUNT),
+                                            SLAB_ACCOUNT),
                                            op_inode_init_once);
        if (!op_inode_cachep)
                return -ENOMEM;
index b8e25ca51016d9df648ca58495baa9db553330ec..8586e2f5d24390c91263ea1ee48e7c3b22199cd2 100644 (file)
@@ -265,20 +265,18 @@ static int ovl_copy_up_file(struct ovl_fs *ofs, struct dentry *dentry,
        if (IS_ERR(old_file))
                return PTR_ERR(old_file);
 
+       /* Try to use clone_file_range to clone up within the same fs */
+       cloned = vfs_clone_file_range(old_file, 0, new_file, 0, len, 0);
+       if (cloned == len)
+               goto out_fput;
+
+       /* Couldn't clone, so now we try to copy the data */
        error = rw_verify_area(READ, old_file, &old_pos, len);
        if (!error)
                error = rw_verify_area(WRITE, new_file, &new_pos, len);
        if (error)
                goto out_fput;
 
-       /* Try to use clone_file_range to clone up within the same fs */
-       ovl_start_write(dentry);
-       cloned = do_clone_file_range(old_file, 0, new_file, 0, len, 0);
-       ovl_end_write(dentry);
-       if (cloned == len)
-               goto out_fput;
-       /* Couldn't clone, so now we try to copy the data */
-
        /* Check if lower fs supports seek operation */
        if (old_file->f_mode & FMODE_LSEEK)
                skip_hole = true;
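
Moving the clone attempt ahead of rw_verify_area() means a successful
reflink now skips checks that only matter for the manual copy path.
Userspace can apply the same try-reflink-then-copy strategy; a minimal
sketch using FICLONE with a copy_file_range() fallback (error handling
trimmed):

#define _GNU_SOURCE
#include <linux/fs.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Clone if the filesystem supports reflinks, otherwise copy the bytes. */
static int clone_or_copy(int src, int dst, off_t len)
{
	if (ioctl(dst, FICLONE, src) == 0)
		return 0;	/* whole-file reflink succeeded */

	while (len > 0) {
		ssize_t n = copy_file_range(src, NULL, dst, NULL, len, 0);

		if (n <= 0)
			return -1;
		len -= n;
	}
	return 0;
}
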
index 984ffdaeed6ca8efcf8acb7852a481628ee3c380..5764f91d283e7027e2ca075057242968c1016455 100644 (file)
 
 struct ovl_lookup_data {
        struct super_block *sb;
-       struct vfsmount *mnt;
+       const struct ovl_layer *layer;
        struct qstr name;
        bool is_dir;
        bool opaque;
+       bool xwhiteouts;
        bool stop;
        bool last;
        char *redirect;
@@ -201,17 +202,13 @@ struct dentry *ovl_decode_real_fh(struct ovl_fs *ofs, struct ovl_fh *fh,
        return real;
 }
 
-static bool ovl_is_opaquedir(struct ovl_fs *ofs, const struct path *path)
-{
-       return ovl_path_check_dir_xattr(ofs, path, OVL_XATTR_OPAQUE);
-}
-
 static struct dentry *ovl_lookup_positive_unlocked(struct ovl_lookup_data *d,
                                                   const char *name,
                                                   struct dentry *base, int len,
                                                   bool drop_negative)
 {
-       struct dentry *ret = lookup_one_unlocked(mnt_idmap(d->mnt), name, base, len);
+       struct dentry *ret = lookup_one_unlocked(mnt_idmap(d->layer->mnt), name,
+                                                base, len);
 
        if (!IS_ERR(ret) && d_flags_negative(smp_load_acquire(&ret->d_flags))) {
                if (drop_negative && ret->d_lockref.count == 1) {
@@ -232,10 +229,13 @@ static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
                             size_t prelen, const char *post,
                             struct dentry **ret, bool drop_negative)
 {
+       struct ovl_fs *ofs = OVL_FS(d->sb);
        struct dentry *this;
        struct path path;
        int err;
        bool last_element = !post[0];
+       bool is_upper = d->layer->idx == 0;
+       char val;
 
        this = ovl_lookup_positive_unlocked(d, name, base, namelen, drop_negative);
        if (IS_ERR(this)) {
@@ -253,8 +253,8 @@ static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
        }
 
        path.dentry = this;
-       path.mnt = d->mnt;
-       if (ovl_path_is_whiteout(OVL_FS(d->sb), &path)) {
+       path.mnt = d->layer->mnt;
+       if (ovl_path_is_whiteout(ofs, &path)) {
                d->stop = d->opaque = true;
                goto put_and_out;
        }
@@ -272,7 +272,7 @@ static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
                        d->stop = true;
                        goto put_and_out;
                }
-               err = ovl_check_metacopy_xattr(OVL_FS(d->sb), &path, NULL);
+               err = ovl_check_metacopy_xattr(ofs, &path, NULL);
                if (err < 0)
                        goto out_err;
 
@@ -292,7 +292,12 @@ static int ovl_lookup_single(struct dentry *base, struct ovl_lookup_data *d,
                if (d->last)
                        goto out;
 
-               if (ovl_is_opaquedir(OVL_FS(d->sb), &path)) {
+               /* overlay.opaque=x means xwhiteouts directory */
+               val = ovl_get_opaquedir_val(ofs, &path);
+               if (last_element && !is_upper && val == 'x') {
+                       d->xwhiteouts = true;
+                       ovl_layer_set_xwhiteouts(ofs, d->layer);
+               } else if (val == 'y') {
                        d->stop = true;
                        if (last_element)
                                d->opaque = true;
@@ -863,7 +868,8 @@ fail:
  * Returns next layer in stack starting from top.
  * Returns -1 if this is the last layer.
  */
-int ovl_path_next(int idx, struct dentry *dentry, struct path *path)
+int ovl_path_next(int idx, struct dentry *dentry, struct path *path,
+                 const struct ovl_layer **layer)
 {
        struct ovl_entry *oe = OVL_E(dentry);
        struct ovl_path *lowerstack = ovl_lowerstack(oe);
@@ -871,13 +877,16 @@ int ovl_path_next(int idx, struct dentry *dentry, struct path *path)
        BUG_ON(idx < 0);
        if (idx == 0) {
                ovl_path_upper(dentry, path);
-               if (path->dentry)
+               if (path->dentry) {
+                       *layer = &OVL_FS(dentry->d_sb)->layers[0];
                        return ovl_numlower(oe) ? 1 : -1;
+               }
                idx++;
        }
        BUG_ON(idx > ovl_numlower(oe));
        path->dentry = lowerstack[idx - 1].dentry;
-       path->mnt = lowerstack[idx - 1].layer->mnt;
+       *layer = lowerstack[idx - 1].layer;
+       path->mnt = (*layer)->mnt;
 
        return (idx < ovl_numlower(oe)) ? idx + 1 : -1;
 }
@@ -1055,7 +1064,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
        old_cred = ovl_override_creds(dentry->d_sb);
        upperdir = ovl_dentry_upper(dentry->d_parent);
        if (upperdir) {
-               d.mnt = ovl_upper_mnt(ofs);
+               d.layer = &ofs->layers[0];
                err = ovl_lookup_layer(upperdir, &d, &upperdentry, true);
                if (err)
                        goto out;
@@ -1111,7 +1120,7 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
                else if (d.is_dir || !ofs->numdatalayer)
                        d.last = lower.layer->idx == ovl_numlower(roe);
 
-               d.mnt = lower.layer->mnt;
+               d.layer = lower.layer;
                err = ovl_lookup_layer(lower.dentry, &d, &this, false);
                if (err)
                        goto out_put;
@@ -1278,6 +1287,8 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
 
        if (upperopaque)
                ovl_dentry_set_opaque(dentry);
+       if (d.xwhiteouts)
+               ovl_dentry_set_xwhiteouts(dentry);
 
        if (upperdentry)
                ovl_dentry_set_upper_alias(dentry);
index 5ba11eb4376792f3047bb683913557f79f1fa53e..ee949f3e7c77839e999cb6f5344cb167fd4e43b5 100644 (file)
@@ -50,7 +50,6 @@ enum ovl_xattr {
        OVL_XATTR_METACOPY,
        OVL_XATTR_PROTATTR,
        OVL_XATTR_XWHITEOUT,
-       OVL_XATTR_XWHITEOUTS,
 };
 
 enum ovl_inode_flag {
@@ -70,6 +69,8 @@ enum ovl_entry_flag {
        OVL_E_UPPER_ALIAS,
        OVL_E_OPAQUE,
        OVL_E_CONNECTED,
+       /* Lower stack may contain xwhiteout entries */
+       OVL_E_XWHITEOUTS,
 };
 
 enum {
@@ -477,6 +478,10 @@ bool ovl_dentry_test_flag(unsigned long flag, struct dentry *dentry);
 bool ovl_dentry_is_opaque(struct dentry *dentry);
 bool ovl_dentry_is_whiteout(struct dentry *dentry);
 void ovl_dentry_set_opaque(struct dentry *dentry);
+bool ovl_dentry_has_xwhiteouts(struct dentry *dentry);
+void ovl_dentry_set_xwhiteouts(struct dentry *dentry);
+void ovl_layer_set_xwhiteouts(struct ovl_fs *ofs,
+                             const struct ovl_layer *layer);
 bool ovl_dentry_has_upper_alias(struct dentry *dentry);
 void ovl_dentry_set_upper_alias(struct dentry *dentry);
 bool ovl_dentry_needs_data_copy_up(struct dentry *dentry, int flags);
@@ -494,11 +499,10 @@ struct file *ovl_path_open(const struct path *path, int flags);
 int ovl_copy_up_start(struct dentry *dentry, int flags);
 void ovl_copy_up_end(struct dentry *dentry);
 bool ovl_already_copied_up(struct dentry *dentry, int flags);
-bool ovl_path_check_dir_xattr(struct ovl_fs *ofs, const struct path *path,
-                             enum ovl_xattr ox);
+char ovl_get_dir_xattr_val(struct ovl_fs *ofs, const struct path *path,
+                          enum ovl_xattr ox);
 bool ovl_path_check_origin_xattr(struct ovl_fs *ofs, const struct path *path);
 bool ovl_path_check_xwhiteout_xattr(struct ovl_fs *ofs, const struct path *path);
-bool ovl_path_check_xwhiteouts_xattr(struct ovl_fs *ofs, const struct path *path);
 bool ovl_init_uuid_xattr(struct super_block *sb, struct ovl_fs *ofs,
                         const struct path *upperpath);
 
@@ -573,7 +577,13 @@ static inline bool ovl_is_impuredir(struct super_block *sb,
                .mnt = ovl_upper_mnt(ofs),
        };
 
-       return ovl_path_check_dir_xattr(ofs, &upperpath, OVL_XATTR_IMPURE);
+       return ovl_get_dir_xattr_val(ofs, &upperpath, OVL_XATTR_IMPURE) == 'y';
+}
+
+static inline char ovl_get_opaquedir_val(struct ovl_fs *ofs,
+                                        const struct path *path)
+{
+       return ovl_get_dir_xattr_val(ofs, path, OVL_XATTR_OPAQUE);
 }
 
 static inline bool ovl_redirect_follow(struct ovl_fs *ofs)
@@ -680,7 +690,8 @@ int ovl_get_index_name(struct ovl_fs *ofs, struct dentry *origin,
 struct dentry *ovl_get_index_fh(struct ovl_fs *ofs, struct ovl_fh *fh);
 struct dentry *ovl_lookup_index(struct ovl_fs *ofs, struct dentry *upper,
                                struct dentry *origin, bool verify);
-int ovl_path_next(int idx, struct dentry *dentry, struct path *path);
+int ovl_path_next(int idx, struct dentry *dentry, struct path *path,
+                 const struct ovl_layer **layer);
 int ovl_verify_lowerdata(struct dentry *dentry);
 struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
                          unsigned int flags);
index 5fa9c58af65f2107c19524fc58808a3140820f5c..cb449ab310a7a89aafa0ee04ee7ff6c8141dd7d5 100644 (file)
@@ -40,6 +40,8 @@ struct ovl_layer {
        int idx;
        /* One fsid per unique underlying sb (upper fsid == 0) */
        int fsid;
+       /* xwhiteouts were found on this layer */
+       bool has_xwhiteouts;
 };
 
 struct ovl_path {
@@ -59,7 +61,7 @@ struct ovl_fs {
        unsigned int numfs;
        /* Number of data-only lower layers */
        unsigned int numdatalayer;
-       const struct ovl_layer *layers;
+       struct ovl_layer *layers;
        struct ovl_sb *fs;
        /* workbasedir is the path at workdir= mount option */
        struct dentry *workbasedir;
index 112b4b12f8252affb5267cc4a908cb63d9bdd3c2..36dcc530ac286b0102e228acebdf03a5951fa57b 100644 (file)
@@ -280,12 +280,20 @@ static int ovl_mount_dir_check(struct fs_context *fc, const struct path *path,
 {
        struct ovl_fs_context *ctx = fc->fs_private;
 
-       if (ovl_dentry_weird(path->dentry))
-               return invalfc(fc, "filesystem on %s not supported", name);
-
        if (!d_is_dir(path->dentry))
                return invalfc(fc, "%s is not a directory", name);
 
+       /*
+        * Root dentries of case-insensitive capable filesystems might
+        * not have the dentry operations set, but still be incompatible
+        * with overlayfs.  Check explicitly to prevent post-mount
+        * failures.
+        */
+       if (sb_has_encoding(path->mnt->mnt_sb))
+               return invalfc(fc, "case-insensitive capable filesystem on %s not supported", name);
+
+       if (ovl_dentry_weird(path->dentry))
+               return invalfc(fc, "filesystem on %s not supported", name);
 
        /*
         * Check whether upper path is read-only here to report failures
index e71156baa7bccae2d15c1830938d62116f546b74..0ca8af060b0c194e5824e59b59d9d2dc8b051355 100644 (file)
@@ -305,8 +305,6 @@ static inline int ovl_dir_read(const struct path *realpath,
        if (IS_ERR(realfile))
                return PTR_ERR(realfile);
 
-       rdd->in_xwhiteouts_dir = rdd->dentry &&
-               ovl_path_check_xwhiteouts_xattr(OVL_FS(rdd->dentry->d_sb), realpath);
        rdd->first_maybe_whiteout = NULL;
        rdd->ctx.pos = 0;
        do {
@@ -359,10 +357,13 @@ static int ovl_dir_read_merged(struct dentry *dentry, struct list_head *list,
                .is_lowest = false,
        };
        int idx, next;
+       const struct ovl_layer *layer;
 
        for (idx = 0; idx != -1; idx = next) {
-               next = ovl_path_next(idx, dentry, &realpath);
+               next = ovl_path_next(idx, dentry, &realpath, &layer);
                rdd.is_upper = ovl_dentry_upper(dentry) == realpath.dentry;
+               rdd.in_xwhiteouts_dir = layer->has_xwhiteouts &&
+                                       ovl_dentry_has_xwhiteouts(dentry);
 
                if (next != -1) {
                        err = ovl_dir_read(&realpath, &rdd);
index 4ab66e3d4cff9854a99bcc1505963927476bf1d5..36d4b8b1f784462dffe665fb83f7a756eeba3262 100644 (file)
@@ -28,41 +28,38 @@ MODULE_LICENSE("GPL");
 
 struct ovl_dir_cache;
 
-static struct dentry *ovl_d_real(struct dentry *dentry,
-                                const struct inode *inode)
+static struct dentry *ovl_d_real(struct dentry *dentry, enum d_real_type type)
 {
-       struct dentry *real = NULL, *lower;
+       struct dentry *upper, *lower;
        int err;
 
-       /*
-        * vfs is only expected to call d_real() with NULL from d_real_inode()
-        * and with overlay inode from file_dentry() on an overlay file.
-        *
-        * TODO: remove @inode argument from d_real() API, remove code in this
-        * function that deals with non-NULL @inode and remove d_real() call
-        * from file_dentry().
-        */
-       if (inode && d_inode(dentry) == inode)
-               return dentry;
-       else if (inode)
+       switch (type) {
+       case D_REAL_DATA:
+       case D_REAL_METADATA:
+               break;
+       default:
                goto bug;
+       }
 
        if (!d_is_reg(dentry)) {
                /* d_real_inode() is only relevant for regular files */
                return dentry;
        }
 
-       real = ovl_dentry_upper(dentry);
-       if (real && (inode == d_inode(real)))
-               return real;
+       upper = ovl_dentry_upper(dentry);
+       if (upper && (type == D_REAL_METADATA ||
+                     ovl_has_upperdata(d_inode(dentry))))
+               return upper;
 
-       if (real && !inode && ovl_has_upperdata(d_inode(dentry)))
-               return real;
+       if (type == D_REAL_METADATA) {
+               lower = ovl_dentry_lower(dentry);
+               goto real_lower;
+       }
 
        /*
-        * Best effort lazy lookup of lowerdata for !inode case to return
+        * Best effort lazy lookup of lowerdata for D_REAL_DATA case to return
         * the real lowerdata dentry.  The only current caller of d_real() with
-        * NULL inode is d_real_inode() from trace_uprobe and this caller is
+        * D_REAL_DATA is d_real_inode() from trace_uprobe and this caller is
         * likely going to be followed reading from the file, before placing
         * uprobes on offset within the file, so lowerdata should be available
         * when setting the uprobe.
@@ -73,18 +70,13 @@ static struct dentry *ovl_d_real(struct dentry *dentry,
        lower = ovl_dentry_lowerdata(dentry);
        if (!lower)
                goto bug;
-       real = lower;
 
-       /* Handle recursion */
-       real = d_real(real, inode);
+real_lower:
+       /* Handle recursion into stacked lower fs */
+       return d_real(lower, type);
 
-       if (!inode || inode == d_inode(real))
-               return real;
 bug:
-       WARN(1, "%s(%pd4, %s:%lu): real dentry (%p/%lu) not found\n",
-            __func__, dentry, inode ? inode->i_sb->s_id : "NULL",
-            inode ? inode->i_ino : 0, real,
-            real && d_inode(real) ? d_inode(real)->i_ino : 0);
+       WARN(1, "%s(%pd4, %d): real dentry not found\n", __func__, dentry, type);
        return dentry;
 }
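
The d_real() rewrite replaces the old inode-matching contract with an
explicit enum d_real_type, so a stacking filesystem now answers two
direct questions: which dentry carries the data (D_REAL_DATA) and which
carries the metadata (D_REAL_METADATA). A skeleton of such a callback for
a hypothetical two-layer filesystem (the my_* helpers are assumed):

#include <linux/dcache.h>

struct dentry *my_upper_dentry(struct dentry *d);
struct dentry *my_lower_dentry(struct dentry *d);
bool my_upper_has_data(struct dentry *d);

static struct dentry *my_d_real(struct dentry *dentry, enum d_real_type type)
{
	struct dentry *upper = my_upper_dentry(dentry);

	/* Metadata always lives on the upper layer once one exists. */
	if (upper && (type == D_REAL_METADATA || my_upper_has_data(dentry)))
		return upper;

	/* Recurse in case the lower layer is itself stacked. */
	return d_real(my_lower_dentry(dentry), type);
}
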
 
@@ -1249,6 +1241,7 @@ static struct dentry *ovl_get_root(struct super_block *sb,
                                   struct ovl_entry *oe)
 {
        struct dentry *root;
+       struct ovl_fs *ofs = OVL_FS(sb);
        struct ovl_path *lowerpath = ovl_lowerstack(oe);
        unsigned long ino = d_inode(lowerpath->dentry)->i_ino;
        int fsid = lowerpath->layer->fsid;
@@ -1270,6 +1263,20 @@ static struct dentry *ovl_get_root(struct super_block *sb,
                        ovl_set_flag(OVL_IMPURE, d_inode(root));
        }
 
+       /* Look for xwhiteouts marker except in the lowermost layer */
+       for (int i = 0; i < ovl_numlower(oe) - 1; i++, lowerpath++) {
+               struct path path = {
+                       .mnt = lowerpath->layer->mnt,
+                       .dentry = lowerpath->dentry,
+               };
+
+               /* overlay.opaque=x means xwhiteouts directory */
+               if (ovl_get_opaquedir_val(ofs, &path) == 'x') {
+                       ovl_layer_set_xwhiteouts(ofs, lowerpath->layer);
+                       ovl_dentry_set_xwhiteouts(root);
+               }
+       }
+
        /* Root is always merge -> can have whiteouts */
        ovl_set_flag(OVL_WHITEOUTS, d_inode(root));
        ovl_dentry_set_flag(OVL_E_CONNECTED, root);
index 0217094c23ea6ae8905c7cb0c44c3ba969345200..d285d1d7baaddedf6fc938ac8aaf8dc35e74d7d7 100644 (file)
@@ -461,6 +461,33 @@ void ovl_dentry_set_opaque(struct dentry *dentry)
        ovl_dentry_set_flag(OVL_E_OPAQUE, dentry);
 }
 
+bool ovl_dentry_has_xwhiteouts(struct dentry *dentry)
+{
+       return ovl_dentry_test_flag(OVL_E_XWHITEOUTS, dentry);
+}
+
+void ovl_dentry_set_xwhiteouts(struct dentry *dentry)
+{
+       ovl_dentry_set_flag(OVL_E_XWHITEOUTS, dentry);
+}
+
+/*
+ * ovl_layer_set_xwhiteouts() is called before the overlay dir dentry is
+ * added to the dcache, while readdir of that same directory happens only
+ * after the dentry is in the dcache.  So if some CPU observes
+ * ovl_dentry_has_xwhiteouts(), it will also observe layer->has_xwhiteouts
+ * for the layers where the xwhiteouts marker was found in that merge dir.
+ */
+void ovl_layer_set_xwhiteouts(struct ovl_fs *ofs,
+                             const struct ovl_layer *layer)
+{
+       if (layer->has_xwhiteouts)
+               return;
+
+       /* Write once to read-mostly layer properties */
+       ofs->layers[layer->idx].has_xwhiteouts = true;
+}
+
 /*
  * For hard links and decoded file handles, it's possible for ovl_dentry_upper()
  * to return positive, while there's no actual upper alias for the inode.
@@ -739,19 +766,6 @@ bool ovl_path_check_xwhiteout_xattr(struct ovl_fs *ofs, const struct path *path)
        return res >= 0;
 }
 
-bool ovl_path_check_xwhiteouts_xattr(struct ovl_fs *ofs, const struct path *path)
-{
-       struct dentry *dentry = path->dentry;
-       int res;
-
-       /* xattr.whiteouts must be a directory */
-       if (!d_is_dir(dentry))
-               return false;
-
-       res = ovl_path_getxattr(ofs, path, OVL_XATTR_XWHITEOUTS, NULL, 0);
-       return res >= 0;
-}
-
 /*
  * Load persistent uuid from xattr into s_uuid if found, or store a new
  * random generated value in s_uuid and in xattr.
@@ -760,13 +774,14 @@ bool ovl_init_uuid_xattr(struct super_block *sb, struct ovl_fs *ofs,
                         const struct path *upperpath)
 {
        bool set = false;
+       uuid_t uuid;
        int res;
 
        /* Try to load existing persistent uuid */
-       res = ovl_path_getxattr(ofs, upperpath, OVL_XATTR_UUID, sb->s_uuid.b,
+       res = ovl_path_getxattr(ofs, upperpath, OVL_XATTR_UUID, uuid.b,
                                UUID_SIZE);
        if (res == UUID_SIZE)
-               return true;
+               goto set_uuid;
 
        if (res != -ENODATA)
                goto fail;
@@ -794,37 +809,37 @@ bool ovl_init_uuid_xattr(struct super_block *sb, struct ovl_fs *ofs,
        }
 
        /* Generate overlay instance uuid */
-       uuid_gen(&sb->s_uuid);
+       uuid_gen(&uuid);
 
        /* Try to store persistent uuid */
        set = true;
-       res = ovl_setxattr(ofs, upperpath->dentry, OVL_XATTR_UUID, sb->s_uuid.b,
+       res = ovl_setxattr(ofs, upperpath->dentry, OVL_XATTR_UUID, uuid.b,
                           UUID_SIZE);
-       if (res == 0)
-               return true;
+       if (res)
+               goto fail;
+
+set_uuid:
+       super_set_uuid(sb, uuid.b, sizeof(uuid));
+       return true;
 
 fail:
-       memset(sb->s_uuid.b, 0, UUID_SIZE);
        ofs->config.uuid = OVL_UUID_NULL;
        pr_warn("failed to %s uuid (%pd2, err=%i); falling back to uuid=null.\n",
                set ? "set" : "get", upperpath->dentry, res);
        return false;
 }
 
-bool ovl_path_check_dir_xattr(struct ovl_fs *ofs, const struct path *path,
-                              enum ovl_xattr ox)
+char ovl_get_dir_xattr_val(struct ovl_fs *ofs, const struct path *path,
+                          enum ovl_xattr ox)
 {
        int res;
        char val;
 
        if (!d_is_dir(path->dentry))
-               return false;
+               return 0;
 
        res = ovl_path_getxattr(ofs, path, ox, &val, 1);
-       if (res == 1 && val == 'y')
-               return true;
-
-       return false;
+       return res == 1 ? val : 0;
 }
 
 #define OVL_XATTR_OPAQUE_POSTFIX       "opaque"
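
Generalizing the boolean ovl_path_check_dir_xattr() into
ovl_get_dir_xattr_val() lets one single-byte xattr encode several states:
overlay.opaque is now 'y' for an opaque directory, 'x' for an xwhiteouts
directory, and 0 when absent. Callers just compare against the character
they care about; schematically (an illustration, not the lookup code):

#include <stdbool.h>

static void handle_opaque_val(char val, bool *stop, bool *xwhiteouts)
{
	if (val == 'y')
		*stop = true;		/* opaque dir: stop lower lookup */
	else if (val == 'x')
		*xwhiteouts = true;	/* dir may hold xwhiteout entries */
	/* 0 or anything else: a plain merge dir */
}
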
@@ -837,7 +852,6 @@ bool ovl_path_check_dir_xattr(struct ovl_fs *ofs, const struct path *path,
 #define OVL_XATTR_METACOPY_POSTFIX     "metacopy"
 #define OVL_XATTR_PROTATTR_POSTFIX     "protattr"
 #define OVL_XATTR_XWHITEOUT_POSTFIX    "whiteout"
-#define OVL_XATTR_XWHITEOUTS_POSTFIX   "whiteouts"
 
 #define OVL_XATTR_TAB_ENTRY(x) \
        [x] = { [false] = OVL_XATTR_TRUSTED_PREFIX x ## _POSTFIX, \
@@ -854,7 +868,6 @@ const char *const ovl_xattr_table[][2] = {
        OVL_XATTR_TAB_ENTRY(OVL_XATTR_METACOPY),
        OVL_XATTR_TAB_ENTRY(OVL_XATTR_PROTATTR),
        OVL_XATTR_TAB_ENTRY(OVL_XATTR_XWHITEOUT),
-       OVL_XATTR_TAB_ENTRY(OVL_XATTR_XWHITEOUTS),
 };
 
 int ovl_check_setxattr(struct ovl_fs *ofs, struct dentry *upperdentry,
diff --git a/fs/pidfs.c b/fs/pidfs.c
new file mode 100644 (file)
index 0000000..8fd71a0
--- /dev/null
@@ -0,0 +1,290 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/anon_inodes.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/magic.h>
+#include <linux/mount.h>
+#include <linux/pid.h>
+#include <linux/pidfs.h>
+#include <linux/pid_namespace.h>
+#include <linux/poll.h>
+#include <linux/proc_fs.h>
+#include <linux/proc_ns.h>
+#include <linux/pseudo_fs.h>
+#include <linux/seq_file.h>
+#include <uapi/linux/pidfd.h>
+
+#include "internal.h"
+
+static int pidfd_release(struct inode *inode, struct file *file)
+{
+#ifndef CONFIG_FS_PID
+       struct pid *pid = file->private_data;
+
+       file->private_data = NULL;
+       put_pid(pid);
+#endif
+       return 0;
+}
+
+#ifdef CONFIG_PROC_FS
+/**
+ * pidfd_show_fdinfo - print information about a pidfd
+ * @m: proc fdinfo file
+ * @f: file referencing a pidfd
+ *
+ * Pid:
+ * This function will print the pid that a given pidfd refers to in the
+ * pid namespace of the procfs instance.
+ * If the pid namespace of the process is not a descendant of the pid
+ * namespace of the procfs instance, 0 will be shown as its pid. This is
+ * similar to calling getppid() on a process whose parent is outside of
+ * its pid namespace.
+ *
+ * NSpid:
+ * If pid namespaces are supported then this function will also print
+ * the pid that a given pidfd refers to for all descendant pid namespaces
+ * starting from the current pid namespace of the instance, i.e. the
+ * Pid field and the first entry in the NSpid field will be identical.
+ * If the pid namespace of the process is not a descendant of the pid
+ * namespace of the procfs instance, 0 will be shown as its first NSpid
+ * entry and no others will be shown.
+ * Note that this differs from the Pid and NSpid fields in
+ * /proc/<pid>/status where Pid and NSpid are always shown relative to
+ * the pid namespace of the procfs instance. The difference becomes
+ * obvious when sending around a pidfd between pid namespaces from a
+ * different branch of the tree, i.e. where no ancestral relation is
+ * present between the pid namespaces:
+ * - create two new pid namespaces ns1 and ns2 in the initial pid
+ *   namespace (also take care to create new mount namespaces in the
+ *   new pid namespace and mount procfs)
+ * - create a process with a pidfd in ns1
+ * - send pidfd from ns1 to ns2
+ * - read /proc/self/fdinfo/<pidfd> and observe that both Pid and NSpid
+ *   have exactly one entry, which is 0
+ */
+static void pidfd_show_fdinfo(struct seq_file *m, struct file *f)
+{
+       struct pid *pid = pidfd_pid(f);
+       struct pid_namespace *ns;
+       pid_t nr = -1;
+
+       if (likely(pid_has_task(pid, PIDTYPE_PID))) {
+               ns = proc_pid_ns(file_inode(m->file)->i_sb);
+               nr = pid_nr_ns(pid, ns);
+       }
+
+       seq_put_decimal_ll(m, "Pid:\t", nr);
+
+#ifdef CONFIG_PID_NS
+       seq_put_decimal_ll(m, "\nNSpid:\t", nr);
+       if (nr > 0) {
+               int i;
+
+               /*
+                * If nr is non-zero it means that 'pid' is valid and that
+                * ns, i.e. the pid namespace associated with the procfs
+                * instance, is in the pid namespace hierarchy of pid.
+                * Start at one below the already printed level.
+                */
+               for (i = ns->level + 1; i <= pid->level; i++)
+                       seq_put_decimal_ll(m, "\t", pid->numbers[i].nr);
+       }
+#endif
+       seq_putc(m, '\n');
+}
+#endif
+
+/*
+ * Poll support for process exit notification.
+ */
+static __poll_t pidfd_poll(struct file *file, struct poll_table_struct *pts)
+{
+       struct pid *pid = pidfd_pid(file);
+       bool thread = file->f_flags & PIDFD_THREAD;
+       struct task_struct *task;
+       __poll_t poll_flags = 0;
+
+       poll_wait(file, &pid->wait_pidfd, pts);
+       /*
+        * Depending on PIDFD_THREAD, inform pollers when the thread
+        * or the whole thread-group exits.
+        */
+       guard(rcu)();
+       task = pid_task(pid, PIDTYPE_PID);
+       if (!task)
+               poll_flags = EPOLLIN | EPOLLRDNORM | EPOLLHUP;
+       else if (task->exit_state && (thread || thread_group_empty(task)))
+               poll_flags = EPOLLIN | EPOLLRDNORM;
+
+       return poll_flags;
+}
+
+static const struct file_operations pidfs_file_operations = {
+       .release        = pidfd_release,
+       .poll           = pidfd_poll,
+#ifdef CONFIG_PROC_FS
+       .show_fdinfo    = pidfd_show_fdinfo,
+#endif
+};
+
+struct pid *pidfd_pid(const struct file *file)
+{
+       if (file->f_op != &pidfs_file_operations)
+               return ERR_PTR(-EBADF);
+#ifdef CONFIG_FS_PID
+       return file_inode(file)->i_private;
+#else
+       return file->private_data;
+#endif
+}
+
+#ifdef CONFIG_FS_PID
+static struct vfsmount *pidfs_mnt __ro_after_init;
+
+/*
+ * The vfs falls back to simple_setattr() if i_op->setattr() isn't
+ * implemented. Let's reject it completely until we have a clean
+ * permission concept for pidfds.
+ */
+static int pidfs_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
+                        struct iattr *attr)
+{
+       return -EOPNOTSUPP;
+}
+
+static int pidfs_getattr(struct mnt_idmap *idmap, const struct path *path,
+                        struct kstat *stat, u32 request_mask,
+                        unsigned int query_flags)
+{
+       struct inode *inode = d_inode(path->dentry);
+
+       generic_fillattr(&nop_mnt_idmap, request_mask, inode, stat);
+       return 0;
+}
+
+static const struct inode_operations pidfs_inode_operations = {
+       .getattr = pidfs_getattr,
+       .setattr = pidfs_setattr,
+};
+
+static void pidfs_evict_inode(struct inode *inode)
+{
+       struct pid *pid = inode->i_private;
+
+       clear_inode(inode);
+       put_pid(pid);
+}
+
+static const struct super_operations pidfs_sops = {
+       .drop_inode     = generic_delete_inode,
+       .evict_inode    = pidfs_evict_inode,
+       .statfs         = simple_statfs,
+};
+
+static char *pidfs_dname(struct dentry *dentry, char *buffer, int buflen)
+{
+       return dynamic_dname(buffer, buflen, "pidfd:[%lu]",
+                            d_inode(dentry)->i_ino);
+}
+
+static const struct dentry_operations pidfs_dentry_operations = {
+       .d_delete       = always_delete_dentry,
+       .d_dname        = pidfs_dname,
+       .d_prune        = stashed_dentry_prune,
+};
+
+static void pidfs_init_inode(struct inode *inode, void *data)
+{
+       inode->i_private = data;
+       inode->i_flags |= S_PRIVATE;
+       inode->i_mode |= S_IRWXU;
+       inode->i_op = &pidfs_inode_operations;
+       inode->i_fop = &pidfs_file_operations;
+}
+
+static void pidfs_put_data(void *data)
+{
+       struct pid *pid = data;
+       put_pid(pid);
+}
+
+static const struct stashed_operations pidfs_stashed_ops = {
+       .init_inode = pidfs_init_inode,
+       .put_data = pidfs_put_data,
+};
+
+static int pidfs_init_fs_context(struct fs_context *fc)
+{
+       struct pseudo_fs_context *ctx;
+
+       ctx = init_pseudo(fc, PID_FS_MAGIC);
+       if (!ctx)
+               return -ENOMEM;
+
+       ctx->ops = &pidfs_sops;
+       ctx->dops = &pidfs_dentry_operations;
+       fc->s_fs_info = (void *)&pidfs_stashed_ops;
+       return 0;
+}
+
+static struct file_system_type pidfs_type = {
+       .name                   = "pidfs",
+       .init_fs_context        = pidfs_init_fs_context,
+       .kill_sb                = kill_anon_super,
+};
+
+struct file *pidfs_alloc_file(struct pid *pid, unsigned int flags)
+{
+       struct file *pidfd_file;
+       struct path path;
+       int ret;
+
+       /*
+        * Inode numbering for pidfs starts at RESERVED_PIDS + 1.
+        * This avoids collisions with the root inode, which is 1
+        * for pseudo filesystems.
+        */
+       ret = path_from_stashed(&pid->stashed, pid->ino, pidfs_mnt,
+                               get_pid(pid), &path);
+       if (ret < 0)
+               return ERR_PTR(ret);
+
+       pidfd_file = dentry_open(&path, flags, current_cred());
+       path_put(&path);
+       return pidfd_file;
+}
+
+void __init pidfs_init(void)
+{
+       pidfs_mnt = kern_mount(&pidfs_type);
+       if (IS_ERR(pidfs_mnt))
+               panic("Failed to mount pidfs pseudo filesystem");
+}
+
+bool is_pidfs_sb(const struct super_block *sb)
+{
+       return sb == pidfs_mnt->mnt_sb;
+}
+
+#else /* !CONFIG_FS_PID */
+
+struct file *pidfs_alloc_file(struct pid *pid, unsigned int flags)
+{
+       struct file *pidfd_file;
+
+       pidfd_file = anon_inode_getfile("[pidfd]", &pidfs_file_operations, pid,
+                                       flags | O_RDWR);
+       if (IS_ERR(pidfd_file))
+               return pidfd_file;
+
+       get_pid(pid);
+       return pidfd_file;
+}
+
+void __init pidfs_init(void) { }
+bool is_pidfs_sb(const struct super_block *sb)
+{
+       return false;
+}
+#endif
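
Backing pidfds with a dedicated pseudo-filesystem (under CONFIG_FS_PID)
leaves the userspace contract untouched: a pidfd still comes from
pidfd_open(2) or clone(2), and it becomes readable when the target exits,
exactly as pidfd_poll() above implements. A small demonstration:

#define _GNU_SOURCE
#include <poll.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	pid_t pid = fork();

	if (pid == 0) {
		sleep(1);
		_exit(0);
	}

	/* Not all libcs wrap pidfd_open(); use the raw syscall. */
	int pidfd = syscall(SYS_pidfd_open, pid, 0);
	struct pollfd pfd = { .fd = pidfd, .events = POLLIN };

	poll(&pfd, 1, -1);	/* returns once the child has exited */
	printf("child %d exited (revents=0x%x)\n", pid, pfd.revents);
	waitpid(pid, NULL, 0);
	return 0;
}
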
index f1adbfe743d4a7cce8c6270963f5ba45357964cd..50c8a8596b5245f90f43701ab039e1daa851d009 100644 (file)
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -76,18 +76,20 @@ static unsigned long pipe_user_pages_soft = PIPE_DEF_BUFFERS * INR_OPEN_CUR;
  * -- Manfred Spraul <manfred@colorfullife.com> 2002-05-09
  */
 
-static void pipe_lock_nested(struct pipe_inode_info *pipe, int subclass)
+#define cmp_int(l, r)          ((l > r) - (l < r))
+
+#ifdef CONFIG_PROVE_LOCKING
+static int pipe_lock_cmp_fn(const struct lockdep_map *a,
+                           const struct lockdep_map *b)
 {
-       if (pipe->files)
-               mutex_lock_nested(&pipe->mutex, subclass);
+       return cmp_int((unsigned long) a, (unsigned long) b);
 }
+#endif
 
 void pipe_lock(struct pipe_inode_info *pipe)
 {
-       /*
-        * pipe_lock() nests non-pipe inode locks (for writing to a file)
-        */
-       pipe_lock_nested(pipe, I_MUTEX_PARENT);
+       if (pipe->files)
+               mutex_lock(&pipe->mutex);
 }
 EXPORT_SYMBOL(pipe_lock);
 
@@ -98,28 +100,16 @@ void pipe_unlock(struct pipe_inode_info *pipe)
 }
 EXPORT_SYMBOL(pipe_unlock);
 
-static inline void __pipe_lock(struct pipe_inode_info *pipe)
-{
-       mutex_lock_nested(&pipe->mutex, I_MUTEX_PARENT);
-}
-
-static inline void __pipe_unlock(struct pipe_inode_info *pipe)
-{
-       mutex_unlock(&pipe->mutex);
-}
-
 void pipe_double_lock(struct pipe_inode_info *pipe1,
                      struct pipe_inode_info *pipe2)
 {
        BUG_ON(pipe1 == pipe2);
 
-       if (pipe1 < pipe2) {
-               pipe_lock_nested(pipe1, I_MUTEX_PARENT);
-               pipe_lock_nested(pipe2, I_MUTEX_CHILD);
-       } else {
-               pipe_lock_nested(pipe2, I_MUTEX_PARENT);
-               pipe_lock_nested(pipe1, I_MUTEX_CHILD);
-       }
+       if (pipe1 > pipe2)
+               swap(pipe1, pipe2);
+
+       pipe_lock(pipe1);
+       pipe_lock(pipe2);
 }
 
 static void anon_pipe_buf_release(struct pipe_inode_info *pipe,
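
Two things replace the old I_MUTEX_PARENT/I_MUTEX_CHILD subclass
annotations: pipe_double_lock() now sorts the two pipes by address before
locking, and alloc_pipe_info() (further down) registers pipe_lock_cmp_fn
via lock_set_cmp_fn() so lockdep can actually verify that ordering rather
than being silenced by subclasses. The address-ordering idiom in
isolation, in userspace terms:

#include <pthread.h>

/* Lock two mutexes of the same class deadlock-free by always taking
 * the lower-addressed one first. */
static void double_lock(pthread_mutex_t *a, pthread_mutex_t *b)
{
	if (a > b) {
		pthread_mutex_t *tmp = a;

		a = b;
		b = tmp;
	}
	pthread_mutex_lock(a);
	pthread_mutex_lock(b);
}
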
@@ -271,7 +261,7 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
                return 0;
 
        ret = 0;
-       __pipe_lock(pipe);
+       mutex_lock(&pipe->mutex);
 
        /*
         * We only wake up writers if the pipe was full when we started
@@ -368,7 +358,7 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
                        ret = -EAGAIN;
                        break;
                }
-               __pipe_unlock(pipe);
+               mutex_unlock(&pipe->mutex);
 
                /*
                 * We only get here if we didn't actually read anything.
@@ -400,13 +390,13 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
                if (wait_event_interruptible_exclusive(pipe->rd_wait, pipe_readable(pipe)) < 0)
                        return -ERESTARTSYS;
 
-               __pipe_lock(pipe);
+               mutex_lock(&pipe->mutex);
                was_full = pipe_full(pipe->head, pipe->tail, pipe->max_usage);
                wake_next_reader = true;
        }
        if (pipe_empty(pipe->head, pipe->tail))
                wake_next_reader = false;
-       __pipe_unlock(pipe);
+       mutex_unlock(&pipe->mutex);
 
        if (was_full)
                wake_up_interruptible_sync_poll(&pipe->wr_wait, EPOLLOUT | EPOLLWRNORM);
@@ -462,7 +452,7 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from)
        if (unlikely(total_len == 0))
                return 0;
 
-       __pipe_lock(pipe);
+       mutex_lock(&pipe->mutex);
 
        if (!pipe->readers) {
                send_sig(SIGPIPE, current, 0);
@@ -582,19 +572,19 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from)
                 * after waiting we need to re-check whether the pipe
                 * become empty while we dropped the lock.
                 */
-               __pipe_unlock(pipe);
+               mutex_unlock(&pipe->mutex);
                if (was_empty)
                        wake_up_interruptible_sync_poll(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM);
                kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
                wait_event_interruptible_exclusive(pipe->wr_wait, pipe_writable(pipe));
-               __pipe_lock(pipe);
+               mutex_lock(&pipe->mutex);
                was_empty = pipe_empty(pipe->head, pipe->tail);
                wake_next_writer = true;
        }
 out:
        if (pipe_full(pipe->head, pipe->tail, pipe->max_usage))
                wake_next_writer = false;
-       __pipe_unlock(pipe);
+       mutex_unlock(&pipe->mutex);
 
        /*
         * If we do a wakeup event, we do a 'sync' wakeup, because we
@@ -629,7 +619,7 @@ static long pipe_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 
        switch (cmd) {
        case FIONREAD:
-               __pipe_lock(pipe);
+               mutex_lock(&pipe->mutex);
                count = 0;
                head = pipe->head;
                tail = pipe->tail;
@@ -639,16 +629,16 @@ static long pipe_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
                        count += pipe->bufs[tail & mask].len;
                        tail++;
                }
-               __pipe_unlock(pipe);
+               mutex_unlock(&pipe->mutex);
 
                return put_user(count, (int __user *)arg);
 
 #ifdef CONFIG_WATCH_QUEUE
        case IOC_WATCH_QUEUE_SET_SIZE: {
                int ret;
-               __pipe_lock(pipe);
+               mutex_lock(&pipe->mutex);
                ret = watch_queue_set_size(pipe, arg);
-               __pipe_unlock(pipe);
+               mutex_unlock(&pipe->mutex);
                return ret;
        }
 
@@ -734,7 +724,7 @@ pipe_release(struct inode *inode, struct file *file)
 {
        struct pipe_inode_info *pipe = file->private_data;
 
-       __pipe_lock(pipe);
+       mutex_lock(&pipe->mutex);
        if (file->f_mode & FMODE_READ)
                pipe->readers--;
        if (file->f_mode & FMODE_WRITE)
@@ -747,7 +737,7 @@ pipe_release(struct inode *inode, struct file *file)
                kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
                kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT);
        }
-       __pipe_unlock(pipe);
+       mutex_unlock(&pipe->mutex);
 
        put_pipe_info(inode, pipe);
        return 0;
@@ -759,7 +749,7 @@ pipe_fasync(int fd, struct file *filp, int on)
        struct pipe_inode_info *pipe = filp->private_data;
        int retval = 0;
 
-       __pipe_lock(pipe);
+       mutex_lock(&pipe->mutex);
        if (filp->f_mode & FMODE_READ)
                retval = fasync_helper(fd, filp, on, &pipe->fasync_readers);
        if ((filp->f_mode & FMODE_WRITE) && retval >= 0) {
@@ -768,7 +758,7 @@ pipe_fasync(int fd, struct file *filp, int on)
                        /* this can happen only if on == T */
                        fasync_helper(-1, filp, 0, &pipe->fasync_readers);
        }
-       __pipe_unlock(pipe);
+       mutex_unlock(&pipe->mutex);
        return retval;
 }
 
@@ -834,6 +824,7 @@ struct pipe_inode_info *alloc_pipe_info(void)
                pipe->nr_accounted = pipe_bufs;
                pipe->user = user;
                mutex_init(&pipe->mutex);
+               lock_set_cmp_fn(&pipe->mutex, pipe_lock_cmp_fn, NULL);
                return pipe;
        }
 
@@ -1144,7 +1135,7 @@ static int fifo_open(struct inode *inode, struct file *filp)
        filp->private_data = pipe;
        /* OK, we have a pipe and it's pinned down */
 
-       __pipe_lock(pipe);
+       mutex_lock(&pipe->mutex);
 
        /* We can only do regular read/write on fifos */
        stream_open(inode, filp);
@@ -1214,7 +1205,7 @@ static int fifo_open(struct inode *inode, struct file *filp)
        }
 
        /* Ok! */
-       __pipe_unlock(pipe);
+       mutex_unlock(&pipe->mutex);
        return 0;
 
 err_rd:
@@ -1230,7 +1221,7 @@ err_wr:
        goto err;
 
 err:
-       __pipe_unlock(pipe);
+       mutex_unlock(&pipe->mutex);
 
        put_pipe_info(inode, pipe);
        return ret;
@@ -1411,7 +1402,7 @@ long pipe_fcntl(struct file *file, unsigned int cmd, unsigned int arg)
        if (!pipe)
                return -EBADF;
 
-       __pipe_lock(pipe);
+       mutex_lock(&pipe->mutex);
 
        switch (cmd) {
        case F_SETPIPE_SZ:
@@ -1425,7 +1416,7 @@ long pipe_fcntl(struct file *file, unsigned int cmd, unsigned int arg)
                break;
        }
 
-       __pipe_unlock(pipe);
+       mutex_unlock(&pipe->mutex);
        return ret;
 }
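
The pipe.c changes above drop the __pipe_lock()/__pipe_unlock() wrappers in favour of plain mutex_lock()/mutex_unlock() on pipe->mutex, and alloc_pipe_info() now registers a lockdep comparison function so that paths taking two pipe mutexes at once (e.g. splicing between pipes) get their ordering checked. The body of pipe_lock_cmp_fn() is outside these hunks; a minimal sketch, assuming the usual memcmp()-style contract for lock_set_cmp_fn() callbacks and ordering by address:

	/* Sketch only: the real pipe_lock_cmp_fn() is not shown in this diff.
	 * lock_set_cmp_fn() callbacks return <0, 0 or >0 like memcmp();
	 * ordering by address means nested pipe mutexes must be taken
	 * low-address first.
	 */
	static int pipe_lock_cmp_fn(const struct lockdep_map *a,
				    const struct lockdep_map *b)
	{
		unsigned long x = (unsigned long)a, y = (unsigned long)b;

		return (x > y) - (x < y);	/* -1, 0 or 1 */
	}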
 
index e1af20893ebe1ed400acb7c7215377868f623bbf..6bf587d1a9b873ca708ac598a70b8e6fea6bc469 100644
@@ -786,12 +786,12 @@ struct posix_acl *posix_acl_from_xattr(struct user_namespace *userns,
                return ERR_PTR(count);
        if (count == 0)
                return NULL;
-       
+
        acl = posix_acl_alloc(count, GFP_NOFS);
        if (!acl)
                return ERR_PTR(-ENOMEM);
        acl_e = acl->a_entries;
-       
+
        for (end = entry + count; entry != end; acl_e++, entry++) {
                acl_e->e_tag  = le16_to_cpu(entry->e_tag);
                acl_e->e_perm = le16_to_cpu(entry->e_perm);
index ff08a8957552add31a8fdf98e202f8380d519e50..34a47fb0c57f2570a4f7cb1f45373ddaf2afa883 100644
@@ -477,13 +477,13 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
        int permitted;
        struct mm_struct *mm;
        unsigned long long start_time;
-       unsigned long cmin_flt = 0, cmaj_flt = 0;
-       unsigned long  min_flt = 0,  maj_flt = 0;
-       u64 cutime, cstime, utime, stime;
-       u64 cgtime, gtime;
+       unsigned long cmin_flt, cmaj_flt, min_flt, maj_flt;
+       u64 cutime, cstime, cgtime, utime, stime, gtime;
        unsigned long rsslim = 0;
        unsigned long flags;
        int exit_code = task->exit_code;
+       struct signal_struct *sig = task->signal;
+       unsigned int seq = 1;
 
        state = *get_task_state(task);
        vsize = eip = esp = 0;
@@ -511,12 +511,8 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 
        sigemptyset(&sigign);
        sigemptyset(&sigcatch);
-       cutime = cstime = utime = stime = 0;
-       cgtime = gtime = 0;
 
        if (lock_task_sighand(task, &flags)) {
-               struct signal_struct *sig = task->signal;
-
                if (sig->tty) {
                        struct pid *pgrp = tty_get_pgrp(sig->tty);
                        tty_pgrp = pid_nr_ns(pgrp, ns);
@@ -527,28 +523,9 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
                num_threads = get_nr_threads(task);
                collect_sigign_sigcatch(task, &sigign, &sigcatch);
 
-               cmin_flt = sig->cmin_flt;
-               cmaj_flt = sig->cmaj_flt;
-               cutime = sig->cutime;
-               cstime = sig->cstime;
-               cgtime = sig->cgtime;
                rsslim = READ_ONCE(sig->rlim[RLIMIT_RSS].rlim_cur);
 
-               /* add up live thread stats at the group level */
                if (whole) {
-                       struct task_struct *t;
-
-                       __for_each_thread(sig, t) {
-                               min_flt += t->min_flt;
-                               maj_flt += t->maj_flt;
-                               gtime += task_gtime(t);
-                       }
-
-                       min_flt += sig->min_flt;
-                       maj_flt += sig->maj_flt;
-                       thread_group_cputime_adjusted(task, &utime, &stime);
-                       gtime += sig->gtime;
-
                        if (sig->flags & (SIGNAL_GROUP_EXIT | SIGNAL_STOP_STOPPED))
                                exit_code = sig->group_exit_code;
                }
@@ -562,10 +539,41 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 
        if (permitted && (!whole || num_threads < 2))
                wchan = !task_is_running(task);
-       if (!whole) {
+
+       do {
+               seq++; /* 2 on the 1st/lockless path, otherwise odd */
+               flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq);
+
+               cmin_flt = sig->cmin_flt;
+               cmaj_flt = sig->cmaj_flt;
+               cutime = sig->cutime;
+               cstime = sig->cstime;
+               cgtime = sig->cgtime;
+
+               if (whole) {
+                       struct task_struct *t;
+
+                       min_flt = sig->min_flt;
+                       maj_flt = sig->maj_flt;
+                       gtime = sig->gtime;
+
+                       rcu_read_lock();
+                       __for_each_thread(sig, t) {
+                               min_flt += t->min_flt;
+                               maj_flt += t->maj_flt;
+                               gtime += task_gtime(t);
+                       }
+                       rcu_read_unlock();
+               }
+       } while (need_seqretry(&sig->stats_lock, seq));
+       done_seqretry_irqrestore(&sig->stats_lock, seq, flags);
+
+       if (whole) {
+               thread_group_cputime_adjusted(task, &utime, &stime);
+       } else {
+               task_cputime_adjusted(task, &utime, &stime);
                min_flt = task->min_flt;
                maj_flt = task->maj_flt;
-               task_cputime_adjusted(task, &utime, &stime);
                gtime = task_gtime(task);
        }
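
The do_task_stat() rework above replaces unconditional accounting under the sighand lock with the seqcount-or-lock idiom on sig->stats_lock: seq starts at 1, the first pass bumps it to 2 (even, lockless), and if need_seqretry() detects a racing writer the next pass uses an odd seq, which makes read_seqbegin_or_lock_irqsave() actually take the lock. The generic shape of the idiom, with hypothetical names ('stats', 'counter', 'snapshot'):

	/* Illustrative skeleton of the retry loop used above. */
	unsigned int seq = 1;
	unsigned long flags;
	u64 snapshot;

	do {
		seq++;	/* even: lockless pass; odd: locked pass */
		flags = read_seqbegin_or_lock_irqsave(&stats->lock, &seq);
		snapshot = stats->counter;	/* read of shared state */
	} while (need_seqretry(&stats->lock, seq));
	done_seqretry_irqrestore(&stats->lock, seq, flags);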
 
index 98a031ac26484544b8b07aec1ca72f40250ba2ca..18550c071d71c733204e3a94d274ac4d47c00119 100644
@@ -1878,8 +1878,6 @@ void proc_pid_evict_inode(struct proc_inode *ei)
                hlist_del_init_rcu(&ei->sibling_inodes);
                spin_unlock(&pid->lock);
        }
-
-       put_pid(pid);
 }
 
 struct inode *proc_pid_make_inode(struct super_block *sb,
index b33e490e3fd9f88f569e3453d603041e665cf6bf..dcd513dccf55cbfa50d5abe965b7a636dcff8353 100644
@@ -30,7 +30,6 @@
 
 static void proc_evict_inode(struct inode *inode)
 {
-       struct proc_dir_entry *de;
        struct ctl_table_header *head;
        struct proc_inode *ei = PROC_I(inode);
 
@@ -38,17 +37,8 @@ static void proc_evict_inode(struct inode *inode)
        clear_inode(inode);
 
        /* Stop tracking associated processes */
-       if (ei->pid) {
+       if (ei->pid)
                proc_pid_evict_inode(ei);
-               ei->pid = NULL;
-       }
-
-       /* Let go of any associated proc directory entry */
-       de = ei->pde;
-       if (de) {
-               pde_put(de);
-               ei->pde = NULL;
-       }
 
        head = ei->sysctl;
        if (head) {
@@ -80,6 +70,13 @@ static struct inode *proc_alloc_inode(struct super_block *sb)
 
 static void proc_free_inode(struct inode *inode)
 {
+       struct proc_inode *ei = PROC_I(inode);
+
+       if (ei->pid)
+               put_pid(ei->pid);
+       /* Let go of any associated proc directory entry */
+       if (ei->pde)
+               pde_put(ei->pde);
        kmem_cache_free(proc_inode_cachep, PROC_I(inode));
 }
 
@@ -95,7 +92,7 @@ void __init proc_init_kmemcache(void)
        proc_inode_cachep = kmem_cache_create("proc_inode_cache",
                                             sizeof(struct proc_inode),
                                             0, (SLAB_RECLAIM_ACCOUNT|
-                                               SLAB_MEM_SPREAD|SLAB_ACCOUNT|
+                                               SLAB_ACCOUNT|
                                                SLAB_PANIC),
                                             init_once);
        pde_opener_cache =
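
Paired with the proc_pid_evict_inode() hunk above (which loses its put_pid()), this moves the final pid and pde references from eviction time to proc_free_inode(). Since ->free_inode() runs from the RCU callback that frees the inode, lockless RCU-walk lookups that still see the inode can keep dereferencing ei->pid and ei->pde until the grace period ends. A sketch of the rule with a hypothetical filesystem (all 'example_*' names made up):

	/* Pointers that RCU-protected lookups may still chase are dropped in
	 * ->free_inode() (runs after the grace period), not in
	 * ->evict_inode() (runs immediately on eviction).
	 */
	static void example_free_inode(struct inode *inode)
	{
		struct example_inode *ei = EXAMPLE_I(inode);

		if (ei->pid)
			put_pid(ei->pid);	/* safe: grace period elapsed */
		kmem_cache_free(example_inode_cachep, ei);
	}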
index b55dbc70287b492ae2e4ed43e2a3c04ee0818798..06a297a27ba3b31a5e2092fcd08d1ca9eebb5849 100644
@@ -271,7 +271,7 @@ static void proc_kill_sb(struct super_block *sb)
 
        kill_anon_super(sb);
        put_pid_ns(fs_info->pid_ns);
-       kfree(fs_info);
+       kfree_rcu(fs_info, rcu);
 }
 
 static struct file_system_type proc_fs_type = {
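
kfree_rcu() defers the kfree() until after an RCU grace period, letting proc_kill_sb() free fs_info while lockless readers may still be inspecting it. It requires a struct rcu_head embedded in the object under the name given as the second argument; the companion structure change is outside this hunk, but presumably looks like:

	/* Assumed shape of the (not shown) change to the fs_info struct: */
	struct proc_fs_info {
		struct pid_namespace *pid_ns;
		/* ... existing fields ... */
		struct rcu_head rcu;	/* consumed by kfree_rcu(fs_info, rcu) */
	};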
index 6eb9bb369b57ff1fdd9338b29a33d654c93cbda3..7b5711f76709cd27be1421809a082f0ec42e0a58 100644
@@ -21,6 +21,7 @@
 #include <linux/buffer_head.h>
 #include <linux/writeback.h>
 #include <linux/statfs.h>
+#include <linux/fs_context.h>
 #include "qnx4.h"
 
 #define QNX4_VERSION  4
@@ -30,28 +31,33 @@ static const struct super_operations qnx4_sops;
 
 static struct inode *qnx4_alloc_inode(struct super_block *sb);
 static void qnx4_free_inode(struct inode *inode);
-static int qnx4_remount(struct super_block *sb, int *flags, char *data);
 static int qnx4_statfs(struct dentry *, struct kstatfs *);
+static int qnx4_get_tree(struct fs_context *fc);
 
 static const struct super_operations qnx4_sops =
 {
        .alloc_inode    = qnx4_alloc_inode,
        .free_inode     = qnx4_free_inode,
        .statfs         = qnx4_statfs,
-       .remount_fs     = qnx4_remount,
 };
 
-static int qnx4_remount(struct super_block *sb, int *flags, char *data)
+static int qnx4_reconfigure(struct fs_context *fc)
 {
+       struct super_block *sb = fc->root->d_sb;
        struct qnx4_sb_info *qs;
 
        sync_filesystem(sb);
        qs = qnx4_sb(sb);
        qs->Version = QNX4_VERSION;
-       *flags |= SB_RDONLY;
+       fc->sb_flags |= SB_RDONLY;
        return 0;
 }
 
+static const struct fs_context_operations qnx4_context_opts = {
+       .get_tree       = qnx4_get_tree,
+       .reconfigure    = qnx4_reconfigure,
+};
+
 static int qnx4_get_block( struct inode *inode, sector_t iblock, struct buffer_head *bh, int create )
 {
        unsigned long phys;
@@ -183,12 +189,13 @@ static const char *qnx4_checkroot(struct super_block *sb,
        return "bitmap file not found.";
 }
 
-static int qnx4_fill_super(struct super_block *s, void *data, int silent)
+static int qnx4_fill_super(struct super_block *s, struct fs_context *fc)
 {
        struct buffer_head *bh;
        struct inode *root;
        const char *errmsg;
        struct qnx4_sb_info *qs;
+       int silent = fc->sb_flags & SB_SILENT;
 
        qs = kzalloc(sizeof(struct qnx4_sb_info), GFP_KERNEL);
        if (!qs)
@@ -216,7 +223,7 @@ static int qnx4_fill_super(struct super_block *s, void *data, int silent)
        errmsg = qnx4_checkroot(s, (struct qnx4_super_block *) bh->b_data);
        brelse(bh);
        if (errmsg != NULL) {
-               if (!silent)
+               if (!silent)
                        printk(KERN_ERR "qnx4: %s\n", errmsg);
                return -EINVAL;
        }
@@ -235,6 +242,18 @@ static int qnx4_fill_super(struct super_block *s, void *data, int silent)
        return 0;
 }
 
+static int qnx4_get_tree(struct fs_context *fc)
+{
+       return get_tree_bdev(fc, qnx4_fill_super);
+}
+
+static int qnx4_init_fs_context(struct fs_context *fc)
+{
+       fc->ops = &qnx4_context_opts;
+
+       return 0;
+}
+
 static void qnx4_kill_sb(struct super_block *sb)
 {
        struct qnx4_sb_info *qs = qnx4_sb(sb);
@@ -376,18 +395,12 @@ static void destroy_inodecache(void)
        kmem_cache_destroy(qnx4_inode_cachep);
 }
 
-static struct dentry *qnx4_mount(struct file_system_type *fs_type,
-       int flags, const char *dev_name, void *data)
-{
-       return mount_bdev(fs_type, flags, dev_name, data, qnx4_fill_super);
-}
-
 static struct file_system_type qnx4_fs_type = {
-       .owner          = THIS_MODULE,
-       .name           = "qnx4",
-       .mount          = qnx4_mount,
-       .kill_sb        = qnx4_kill_sb,
-       .fs_flags       = FS_REQUIRES_DEV,
+       .owner                  = THIS_MODULE,
+       .name                   = "qnx4",
+       .kill_sb                = qnx4_kill_sb,
+       .fs_flags               = FS_REQUIRES_DEV,
+       .init_fs_context        = qnx4_init_fs_context,
 };
 MODULE_ALIAS_FS("qnx4");
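
This converts qnx4 from the legacy mount entry points to the fs_context API. The correspondence, condensed from the hunks above:

	/*
	 * Legacy API                     fs_context API (this patch)
	 * ----------                     ---------------------------
	 * .mount = qnx4_mount            .init_fs_context = qnx4_init_fs_context,
	 *   (via mount_bdev)             which sets fc->ops; .get_tree =
	 *                                qnx4_get_tree (via get_tree_bdev)
	 * .remount_fs = qnx4_remount     .reconfigure = qnx4_reconfigure
	 * int *flags argument            fc->sb_flags
	 * int silent argument            fc->sb_flags & SB_SILENT
	 */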
 
index a286c545717f8991e3bae5808083f9ee5c2fd11b..405913f4faff99538beab737ee0887931a607b6b 100644
@@ -615,7 +615,7 @@ static int init_inodecache(void)
        qnx6_inode_cachep = kmem_cache_create("qnx6_inode_cache",
                                             sizeof(struct qnx6_inode_info),
                                             0, (SLAB_RECLAIM_ACCOUNT|
-                                               SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+                                               SLAB_ACCOUNT),
                                             init_once);
        if (!qnx6_inode_cachep)
                return -ENOMEM;
index 171c912af50f6f42bf96a9936f77fde70bbd8770..6474529c42530628fd3969573fb175283f4f51e8 100644
@@ -2386,7 +2386,7 @@ static int journal_read(struct super_block *sb)
 
        cur_dblock = SB_ONDISK_JOURNAL_1st_BLOCK(sb);
        reiserfs_info(sb, "checking transaction log (%pg)\n",
-                     journal->j_bdev_handle->bdev);
+                     file_bdev(journal->j_bdev_file));
        start = ktime_get_seconds();
 
        /*
@@ -2447,7 +2447,7 @@ static int journal_read(struct super_block *sb)
                 * device and journal device to be the same
                 */
                d_bh =
-                   reiserfs_breada(journal->j_bdev_handle->bdev, cur_dblock,
+                   reiserfs_breada(file_bdev(journal->j_bdev_file), cur_dblock,
                                    sb->s_blocksize,
                                    SB_ONDISK_JOURNAL_1st_BLOCK(sb) +
                                    SB_ONDISK_JOURNAL_SIZE(sb));
@@ -2588,9 +2588,9 @@ static void journal_list_init(struct super_block *sb)
 
 static void release_journal_dev(struct reiserfs_journal *journal)
 {
-       if (journal->j_bdev_handle) {
-               bdev_release(journal->j_bdev_handle);
-               journal->j_bdev_handle = NULL;
+       if (journal->j_bdev_file) {
+               fput(journal->j_bdev_file);
+               journal->j_bdev_file = NULL;
        }
 }
 
@@ -2605,7 +2605,7 @@ static int journal_init_dev(struct super_block *super,
 
        result = 0;
 
-       journal->j_bdev_handle = NULL;
+       journal->j_bdev_file = NULL;
        jdev = SB_ONDISK_JOURNAL_DEVICE(super) ?
            new_decode_dev(SB_ONDISK_JOURNAL_DEVICE(super)) : super->s_dev;
 
@@ -2616,37 +2616,37 @@ static int journal_init_dev(struct super_block *super,
        if ((!jdev_name || !jdev_name[0])) {
                if (jdev == super->s_dev)
                        holder = NULL;
-               journal->j_bdev_handle = bdev_open_by_dev(jdev, blkdev_mode,
+               journal->j_bdev_file = bdev_file_open_by_dev(jdev, blkdev_mode,
                                                          holder, NULL);
-               if (IS_ERR(journal->j_bdev_handle)) {
-                       result = PTR_ERR(journal->j_bdev_handle);
-                       journal->j_bdev_handle = NULL;
+               if (IS_ERR(journal->j_bdev_file)) {
+                       result = PTR_ERR(journal->j_bdev_file);
+                       journal->j_bdev_file = NULL;
                        reiserfs_warning(super, "sh-458",
                                         "cannot init journal device unknown-block(%u,%u): %i",
                                         MAJOR(jdev), MINOR(jdev), result);
                        return result;
                } else if (jdev != super->s_dev)
-                       set_blocksize(journal->j_bdev_handle->bdev,
+                       set_blocksize(file_bdev(journal->j_bdev_file),
                                      super->s_blocksize);
 
                return 0;
        }
 
-       journal->j_bdev_handle = bdev_open_by_path(jdev_name, blkdev_mode,
+       journal->j_bdev_file = bdev_file_open_by_path(jdev_name, blkdev_mode,
                                                   holder, NULL);
-       if (IS_ERR(journal->j_bdev_handle)) {
-               result = PTR_ERR(journal->j_bdev_handle);
-               journal->j_bdev_handle = NULL;
+       if (IS_ERR(journal->j_bdev_file)) {
+               result = PTR_ERR(journal->j_bdev_file);
+               journal->j_bdev_file = NULL;
                reiserfs_warning(super, "sh-457",
                                 "journal_init_dev: Cannot open '%s': %i",
                                 jdev_name, result);
                return result;
        }
 
-       set_blocksize(journal->j_bdev_handle->bdev, super->s_blocksize);
+       set_blocksize(file_bdev(journal->j_bdev_file), super->s_blocksize);
        reiserfs_info(super,
                      "journal_init_dev: journal device: %pg\n",
-                     journal->j_bdev_handle->bdev);
+                     file_bdev(journal->j_bdev_file));
        return 0;
 }
 
@@ -2804,7 +2804,7 @@ int journal_init(struct super_block *sb, const char *j_dev_name,
                                 "journal header magic %x (device %pg) does "
                                 "not match to magic found in super block %x",
                                 jh->jh_journal.jp_journal_magic,
-                                journal->j_bdev_handle->bdev,
+                                file_bdev(journal->j_bdev_file),
                                 sb_jp_journal_magic(rs));
                brelse(bhjh);
                goto free_and_return;
@@ -2828,7 +2828,7 @@ int journal_init(struct super_block *sb, const char *j_dev_name,
        reiserfs_info(sb, "journal params: device %pg, size %u, "
                      "journal first block %u, max trans len %u, max batch %u, "
                      "max commit age %u, max trans age %u\n",
-                     journal->j_bdev_handle->bdev,
+                     file_bdev(journal->j_bdev_file),
                      SB_ONDISK_JOURNAL_SIZE(sb),
                      SB_ONDISK_JOURNAL_1st_BLOCK(sb),
                      journal->j_trans_max,
index 83cb9402e0f9c54dacd9fc64751b6e0ecca11421..5c68a4a52d78818eca7efbb61d3f9ab5787a408d 100644
@@ -354,7 +354,7 @@ static int show_journal(struct seq_file *m, void *unused)
                   "prepare: \t%12lu\n"
                   "prepare_retry: \t%12lu\n",
                   DJP(jp_journal_1st_block),
-                  SB_JOURNAL(sb)->j_bdev_handle->bdev,
+                  file_bdev(SB_JOURNAL(sb)->j_bdev_file),
                   DJP(jp_journal_dev),
                   DJP(jp_journal_size),
                   DJP(jp_journal_trans_max),
index 725667880e626a4577621c9137733340f0c53d30..0554903f42a90954620de4c6fca37be828d4f2fb 100644
@@ -299,7 +299,7 @@ struct reiserfs_journal {
        /* oldest journal block.  start here for traverse */
        struct reiserfs_journal_cnode *j_first;
 
-       struct bdev_handle *j_bdev_handle;
+       struct file *j_bdev_file;
 
        /* first block on s_dev of reserved area journal */
        int j_1st_reserved_block;
@@ -2810,10 +2810,10 @@ struct reiserfs_journal_header {
 
 /* We need these to make journal.c code more readable */
 #define journal_find_get_block(s, block) __find_get_block(\
-               SB_JOURNAL(s)->j_bdev_handle->bdev, block, s->s_blocksize)
-#define journal_getblk(s, block) __getblk(SB_JOURNAL(s)->j_bdev_handle->bdev,\
+               file_bdev(SB_JOURNAL(s)->j_bdev_file), block, s->s_blocksize)
+#define journal_getblk(s, block) __getblk(file_bdev(SB_JOURNAL(s)->j_bdev_file),\
                block, s->s_blocksize)
-#define journal_bread(s, block) __bread(SB_JOURNAL(s)->j_bdev_handle->bdev,\
+#define journal_bread(s, block) __bread(file_bdev(SB_JOURNAL(s)->j_bdev_file),\
                block, s->s_blocksize)
 
 enum reiserfs_bh_state_bits {
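
The reiserfs journal conversion swaps struct bdev_handle for a plain struct file opened with the bdev_file_* helpers; the underlying block device is borrowed with file_bdev() and the open reference is dropped with fput(). The lifecycle, condensed from the hunks above (error handling elided; "/dev/sdX" is a placeholder):

	struct file *bdev_file;
	struct block_device *bdev;

	bdev_file = bdev_file_open_by_path("/dev/sdX", blkdev_mode, holder, NULL);
	if (IS_ERR(bdev_file))
		return PTR_ERR(bdev_file);

	bdev = file_bdev(bdev_file);		/* borrow the bdev */
	set_blocksize(bdev, sb->s_blocksize);
	/* ... journal I/O against bdev ... */

	fput(bdev_file);			/* releases the open */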
index 67b5510beded2260b04c41bbd0247950784244dd..2cc469d481a290fa9978880229cc6995dc5b0a60 100644
@@ -670,7 +670,6 @@ static int __init init_inodecache(void)
                                                  sizeof(struct
                                                         reiserfs_inode_info),
                                                  0, (SLAB_RECLAIM_ACCOUNT|
-                                                     SLAB_MEM_SPREAD|
                                                      SLAB_ACCOUNT),
                                                  init_once);
        if (reiserfs_inode_cachep == NULL)
index f8c1120b8311f62324324b911b0aa4aebe4ccb04..de07f978ce3ebe16bf42bf5315996fd074de5aac 100644
@@ -373,9 +373,9 @@ int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
 }
 EXPORT_SYMBOL(generic_remap_file_range_prep);
 
-loff_t do_clone_file_range(struct file *file_in, loff_t pos_in,
-                          struct file *file_out, loff_t pos_out,
-                          loff_t len, unsigned int remap_flags)
+loff_t vfs_clone_file_range(struct file *file_in, loff_t pos_in,
+                           struct file *file_out, loff_t pos_out,
+                           loff_t len, unsigned int remap_flags)
 {
        loff_t ret;
 
@@ -391,23 +391,6 @@ loff_t do_clone_file_range(struct file *file_in, loff_t pos_in,
        if (!file_in->f_op->remap_file_range)
                return -EOPNOTSUPP;
 
-       ret = file_in->f_op->remap_file_range(file_in, pos_in,
-                       file_out, pos_out, len, remap_flags);
-       if (ret < 0)
-               return ret;
-
-       fsnotify_access(file_in);
-       fsnotify_modify(file_out);
-       return ret;
-}
-EXPORT_SYMBOL(do_clone_file_range);
-
-loff_t vfs_clone_file_range(struct file *file_in, loff_t pos_in,
-                           struct file *file_out, loff_t pos_out,
-                           loff_t len, unsigned int remap_flags)
-{
-       loff_t ret;
-
        ret = remap_verify_area(file_in, pos_in, len, false);
        if (ret)
                return ret;
@@ -417,10 +400,14 @@ loff_t vfs_clone_file_range(struct file *file_in, loff_t pos_in,
                return ret;
 
        file_start_write(file_out);
-       ret = do_clone_file_range(file_in, pos_in, file_out, pos_out, len,
-                                 remap_flags);
+       ret = file_in->f_op->remap_file_range(file_in, pos_in,
+                       file_out, pos_out, len, remap_flags);
        file_end_write(file_out);
+       if (ret < 0)
+               return ret;
 
+       fsnotify_access(file_in);
+       fsnotify_modify(file_out);
        return ret;
 }
 EXPORT_SYMBOL(vfs_clone_file_range);
index 545ad44f96b89148f1fc122d17f5a4b4e1a66deb..2be227532f399788de82a03e55970d33c67dc695 100644
@@ -594,7 +594,7 @@ static void romfs_kill_sb(struct super_block *sb)
 #ifdef CONFIG_ROMFS_ON_BLOCK
        if (sb->s_bdev) {
                sync_blockdev(sb->s_bdev);
-               bdev_release(sb->s_bdev_handle);
+               fput(sb->s_bdev_file);
        }
 #endif
 }
@@ -630,8 +630,8 @@ static int __init init_romfs_fs(void)
        romfs_inode_cachep =
                kmem_cache_create("romfs_i",
                                  sizeof(struct romfs_inode_info), 0,
-                                 SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD |
-                                 SLAB_ACCOUNT, romfs_i_init_once);
+                                 SLAB_RECLAIM_ACCOUNT | SLAB_ACCOUNT,
+                                 romfs_i_init_once);
 
        if (!romfs_inode_cachep) {
                pr_err("Failed to initialise inode cache\n");
index 0ee55af1a55c29b14b118907b98e0329b8792376..9515c3fa1a03e8a8576f90f8a4dee46509dd1954 100644
@@ -476,7 +476,7 @@ static inline void wait_key_set(poll_table *wait, unsigned long in,
                wait->_key |= POLLOUT_SET;
 }
 
-static int do_select(int n, fd_set_bits *fds, struct timespec64 *end_time)
+static noinline_for_stack int do_select(int n, fd_set_bits *fds, struct timespec64 *end_time)
 {
        ktime_t expire, *to = NULL;
        struct poll_wqueues table;
@@ -839,7 +839,7 @@ SYSCALL_DEFINE1(old_select, struct sel_arg_struct __user *, arg)
 
 struct poll_list {
        struct poll_list *next;
-       int len;
+       unsigned int len;
        struct pollfd entries[];
 };
 
@@ -975,14 +975,15 @@ static int do_sys_poll(struct pollfd __user *ufds, unsigned int nfds,
                struct timespec64 *end_time)
 {
        struct poll_wqueues table;
-       int err = -EFAULT, fdcount, len;
+       int err = -EFAULT, fdcount;
        /* Allocate small arguments on the stack to save memory and be
           faster - use long to make sure the buffer is aligned properly
           on 64 bit archs to avoid unaligned access */
        long stack_pps[POLL_STACK_ALLOC/sizeof(long)];
        struct poll_list *const head = (struct poll_list *)stack_pps;
        struct poll_list *walk = head;
-       unsigned long todo = nfds;
+       unsigned int todo = nfds;
+       unsigned int len;
 
        if (nfds > rlimit(RLIMIT_NOFILE))
                return -EINVAL;
@@ -998,9 +999,9 @@ static int do_sys_poll(struct pollfd __user *ufds, unsigned int nfds,
                                        sizeof(struct pollfd) * walk->len))
                        goto out_fds;
 
-               todo -= walk->len;
-               if (!todo)
+               if (walk->len >= todo)
                        break;
+               todo -= walk->len;
 
                len = min(todo, POLLFD_PER_PAGE);
                walk = walk->next = kmalloc(struct_size(walk, entries, len),
@@ -1020,7 +1021,7 @@ static int do_sys_poll(struct pollfd __user *ufds, unsigned int nfds,
 
        for (walk = head; walk; walk = walk->next) {
                struct pollfd *fds = walk->entries;
-               int j;
+               unsigned int j;
 
                for (j = walk->len; j; fds++, ufds++, j--)
                        unsafe_put_user(fds->revents, &ufds->revents, Efault);
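
Besides widening the types, the do_sys_poll() hunk reorders the loop so walk->len is tested against todo before being subtracted. With unsigned arithmetic the old order could wrap around instead of reaching zero if walk->len ever exceeded the remainder; illustratively:

	unsigned int todo = 3, len = 5;

	/* old order (if len could exceed todo): */
	todo -= len;			/* wraps to 4294967294, never 0 */

	/* new order, as in the hunk above: */
	if (len >= todo)
		/* break: remainder fully covered */;
	else
		todo -= len;		/* len < todo, cannot wrap */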
index 971892620504730e6e2265f50c54874f3d676eac..3de5047a7ff988c2049350e464771e912b12894e 100644
@@ -145,21 +145,27 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon,
        struct cached_fid *cfid;
        struct cached_fids *cfids;
        const char *npath;
+       int retries = 0, cur_sleep = 1;
 
        if (tcon == NULL || tcon->cfids == NULL || tcon->nohandlecache ||
            is_smb1_server(tcon->ses->server) || (dir_cache_timeout == 0))
                return -EOPNOTSUPP;
 
        ses = tcon->ses;
-       server = cifs_pick_channel(ses);
        cfids = tcon->cfids;
 
-       if (!server->ops->new_lease_key)
-               return -EIO;
-
        if (cifs_sb->root == NULL)
                return -ENOENT;
 
+replay_again:
+       /* reinitialize for possible replay */
+       flags = 0;
+       oplock = SMB2_OPLOCK_LEVEL_II;
+       server = cifs_pick_channel(ses);
+
+       if (!server->ops->new_lease_key)
+               return -EIO;
+
        utf16_path = cifs_convert_path_to_utf16(path, cifs_sb);
        if (!utf16_path)
                return -ENOMEM;
@@ -236,6 +242,7 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon,
                .desired_access =  FILE_READ_DATA | FILE_READ_ATTRIBUTES,
                .disposition = FILE_OPEN,
                .fid = pfid,
+               .replay = !!(retries),
        };
 
        rc = SMB2_open_init(tcon, server,
@@ -268,6 +275,11 @@ int open_cached_dir(unsigned int xid, struct cifs_tcon *tcon,
         */
        cfid->has_lease = true;
 
+       if (retries) {
+               smb2_set_replay(server, &rqst[0]);
+               smb2_set_replay(server, &rqst[1]);
+       }
+
        rc = compound_send_recv(xid, ses, server,
                                flags, 2, rqst,
                                resp_buftype, rsp_iov);
@@ -367,6 +379,11 @@ out:
                atomic_inc(&tcon->num_remote_opens);
        }
        kfree(utf16_path);
+
+       if (is_replayable_error(rc) &&
+           smb2_should_replay(tcon, &retries, &cur_sleep))
+               goto replay_again;
+
        return rc;
 }
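
open_cached_dir() gains the SMB2 request-replay pattern used throughout this series: per-attempt state (flags, oplock, channel) is reinitialised at replay_again, retried requests are stamped with smb2_set_replay() so the server can detect duplicates, and the exchange is repeated while the failure is transport-level (is_replayable_error(), defined later in this diff) and smb2_should_replay() still grants a retry. Skeleton, condensed from the hunks above:

	int retries = 0, cur_sleep = 1;

replay_again:
	/* reinitialise anything tied to this attempt */
	flags = 0;
	oplock = SMB2_OPLOCK_LEVEL_II;
	server = cifs_pick_channel(ses);

	/* ... build the compound request ... */
	if (retries) {
		smb2_set_replay(server, &rqst[0]);
		smb2_set_replay(server, &rqst[1]);
	}

	rc = compound_send_recv(xid, ses, server, flags, 2, rqst,
				resp_buftype, rsp_iov);

	/* ... handle response, clean up ... */
	if (is_replayable_error(rc) &&
	    smb2_should_replay(tcon, &retries, &cur_sleep))
		goto replay_again;
	return rc;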
 
index ef4c2e3c9fa6130b129be94d4a15c4724b952ed9..6322f0f68a176b177c943b074fe414c4905bf9bb 100644
@@ -572,7 +572,7 @@ static int calc_ntlmv2_hash(struct cifs_ses *ses, char *ntlmv2_hash,
                len = cifs_strtoUTF16(user, ses->user_name, len, nls_cp);
                UniStrupr(user);
        } else {
-               memset(user, '\0', 2);
+               *(u16 *)user = 0;
        }
 
        rc = crypto_shash_update(ses->server->secmech.hmacmd5,
index e902de4e475af9cc3483fba922a1b11cbb068cd9..fb368b191eefd767a9f1f8b619580e3077827c46 100644
@@ -396,7 +396,7 @@ cifs_alloc_inode(struct super_block *sb)
        spin_lock_init(&cifs_inode->writers_lock);
        cifs_inode->writers = 0;
        cifs_inode->netfs.inode.i_blkbits = 14;  /* 2**14 = CIFS_MAX_MSGSIZE */
-       cifs_inode->server_eof = 0;
+       cifs_inode->netfs.remote_i_size = 0;
        cifs_inode->uniqueid = 0;
        cifs_inode->createtime = 0;
        cifs_inode->epoch = 0;
@@ -1085,7 +1085,7 @@ static loff_t cifs_llseek(struct file *file, loff_t offset, int whence)
 }
 
 static int
-cifs_setlease(struct file *file, int arg, struct file_lock **lease, void **priv)
+cifs_setlease(struct file *file, int arg, struct file_lease **lease, void **priv)
 {
        /*
         * Note that this is called by vfs setlease with i_lock held to
@@ -1094,9 +1094,6 @@ cifs_setlease(struct file *file, int arg, struct file_lock **lease, void **priv)
        struct inode *inode = file_inode(file);
        struct cifsFileInfo *cfile = file->private_data;
 
-       if (!(S_ISREG(inode->i_mode)))
-               return -EINVAL;
-
        /* Check if file is oplocked if this is request for new lease */
        if (arg == F_UNLCK ||
            ((arg == F_RDLCK) && CIFS_CACHE_READ(CIFS_I(inode))) ||
@@ -1172,6 +1169,9 @@ const char *cifs_get_link(struct dentry *dentry, struct inode *inode,
 {
        char *target_path;
 
+       if (!dentry)
+               return ERR_PTR(-ECHILD);
+
        target_path = kmalloc(PATH_MAX, GFP_KERNEL);
        if (!target_path)
                return ERR_PTR(-ENOMEM);
@@ -1380,6 +1380,7 @@ ssize_t cifs_file_copychunk_range(unsigned int xid,
        struct inode *src_inode = file_inode(src_file);
        struct inode *target_inode = file_inode(dst_file);
        struct cifsInodeInfo *src_cifsi = CIFS_I(src_inode);
+       struct cifsInodeInfo *target_cifsi = CIFS_I(target_inode);
        struct cifsFileInfo *smb_file_src;
        struct cifsFileInfo *smb_file_target;
        struct cifs_tcon *src_tcon;
@@ -1428,7 +1429,7 @@ ssize_t cifs_file_copychunk_range(unsigned int xid,
         * Advance the EOF marker after the flush above to the end of the range
         * if it's short of that.
         */
-       if (src_cifsi->server_eof < off + len) {
+       if (src_cifsi->netfs.remote_i_size < off + len) {
                rc = cifs_precopy_set_eof(src_inode, src_cifsi, src_tcon, xid, off + len);
                if (rc < 0)
                        goto unlock;
@@ -1452,12 +1453,22 @@ ssize_t cifs_file_copychunk_range(unsigned int xid,
        /* Discard all the folios that overlap the destination region. */
        truncate_inode_pages_range(&target_inode->i_data, fstart, fend);
 
+       fscache_invalidate(cifs_inode_cookie(target_inode), NULL,
+                          i_size_read(target_inode), 0);
+
        rc = file_modified(dst_file);
        if (!rc) {
                rc = target_tcon->ses->server->ops->copychunk_range(xid,
                        smb_file_src, smb_file_target, off, len, destoff);
-               if (rc > 0 && destoff + rc > i_size_read(target_inode))
+               if (rc > 0 && destoff + rc > i_size_read(target_inode)) {
                        truncate_setsize(target_inode, destoff + rc);
+                       netfs_resize_file(&target_cifsi->netfs,
+                                         i_size_read(target_inode), true);
+                       fscache_resize_cookie(cifs_inode_cookie(target_inode),
+                                             i_size_read(target_inode));
+               }
+               if (rc > 0 && destoff + rc > target_cifsi->netfs.zero_point)
+                       target_cifsi->netfs.zero_point = destoff + rc;
        }
 
        file_accessed(src_file);
index 20036fb16cececeaa3acffb78d81691ac86b1ec3..53c75cfb33ab9446740133e7f19da6229eeffd55 100644
  */
 #define CIFS_DEF_ACTIMEO (1 * HZ)
 
+/*
+ * max sleep time before retry to server
+ */
+#define CIFS_MAX_SLEEP 2000
+
 /*
  * max attribute cache timeout (jiffies) - 2^30
  */
@@ -82,7 +87,7 @@
 #define SMB_INTERFACE_POLL_INTERVAL    600
 
 /* maximum number of PDUs in one compound */
-#define MAX_COMPOUND 5
+#define MAX_COMPOUND 7
 
 /*
  * Default number of credits to keep available for SMB3.
@@ -1027,6 +1032,8 @@ struct cifs_chan {
        __u8 signkey[SMB3_SIGN_KEY_SIZE];
 };
 
+#define CIFS_SES_FLAG_SCALE_CHANNELS (0x1)
+
 /*
  * Session structure.  One of these for each uid session with a particular host
  */
@@ -1059,6 +1066,7 @@ struct cifs_ses {
        enum securityEnum sectype; /* what security flavor was specified? */
        bool sign;              /* is signing required? */
        bool domainAuto:1;
+       unsigned int flags;
        __u16 session_flags;
        __u8 smb3signingkey[SMB3_SIGN_KEY_SIZE];
        __u8 smb3encryptionkey[SMB3_ENC_DEC_KEY_SIZE];
@@ -1370,6 +1378,7 @@ struct cifs_open_parms {
        struct cifs_fid *fid;
        umode_t mode;
        bool reconnect:1;
+       bool replay:1; /* indicates that this open is for a replay */
 };
 
 struct cifs_fid {
@@ -1501,6 +1510,7 @@ struct cifs_writedata {
        struct smbd_mr                  *mr;
 #endif
        struct cifs_credits             credits;
+       bool                            replay;
 };
 
 /*
@@ -1561,7 +1571,6 @@ struct cifsInodeInfo {
        spinlock_t writers_lock;
        unsigned int writers;           /* Number of writers on this inode */
        unsigned long time;             /* jiffies of last update of inode */
-       u64  server_eof;                /* current file size on server -- protected by i_lock */
        u64  uniqueid;                  /* server inode number */
        u64  createtime;                /* creation time on server */
        __u8 lease_key[SMB2_LEASE_KEY_SIZE];    /* lease key for this inode */
@@ -1831,6 +1840,13 @@ static inline bool is_retryable_error(int error)
        return false;
 }
 
+static inline bool is_replayable_error(int error)
+{
+       if (error == -EAGAIN || error == -ECONNABORTED)
+               return true;
+       return false;
+}
+
 
 /* cifs_get_writable_file() flags */
 #define FIND_WR_ANY         0
index 01e89070df5ab2a38c1d041556e959419ec40fb4..5eb83bafc7fd2bfda2693e6b8f637a5c0292dbc0 100644
@@ -2066,20 +2066,20 @@ CIFSSMBPosixLock(const unsigned int xid, struct cifs_tcon *tcon,
                parm_data = (struct cifs_posix_lock *)
                        ((char *)&pSMBr->hdr.Protocol + data_offset);
                if (parm_data->lock_type == cpu_to_le16(CIFS_UNLCK))
-                       pLockData->fl_type = F_UNLCK;
+                       pLockData->c.flc_type = F_UNLCK;
                else {
                        if (parm_data->lock_type ==
                                        cpu_to_le16(CIFS_RDLCK))
-                               pLockData->fl_type = F_RDLCK;
+                               pLockData->c.flc_type = F_RDLCK;
                        else if (parm_data->lock_type ==
                                        cpu_to_le16(CIFS_WRLCK))
-                               pLockData->fl_type = F_WRLCK;
+                               pLockData->c.flc_type = F_WRLCK;
 
                        pLockData->fl_start = le64_to_cpu(parm_data->start);
                        pLockData->fl_end = pLockData->fl_start +
                                (le64_to_cpu(parm_data->length) ?
                                 le64_to_cpu(parm_data->length) - 1 : 0);
-                       pLockData->fl_pid = -le32_to_cpu(parm_data->pid);
+                       pLockData->c.flc_pid = -le32_to_cpu(parm_data->pid);
                }
        }
 
index bfd568f8971056b2c9ffbd509e026f140549f1af..ac9595504f4b11fa066a6516f034bff8bc09d56b 100644
@@ -233,6 +233,12 @@ cifs_mark_tcp_ses_conns_for_reconnect(struct TCP_Server_Info *server,
        list_for_each_entry_safe(ses, nses, &pserver->smb_ses_list, smb_ses_list) {
                /* check if iface is still active */
                spin_lock(&ses->chan_lock);
+               if (cifs_ses_get_chan_index(ses, server) ==
+                   CIFS_INVAL_CHAN_INDEX) {
+                       spin_unlock(&ses->chan_lock);
+                       continue;
+               }
+
                if (!cifs_chan_is_iface_active(ses, server)) {
                        spin_unlock(&ses->chan_lock);
                        cifs_chan_update_iface(ses, server);
@@ -3438,8 +3444,18 @@ int cifs_mount_get_tcon(struct cifs_mount_ctx *mnt_ctx)
         * the user on mount
         */
        if ((cifs_sb->ctx->wsize == 0) ||
-           (cifs_sb->ctx->wsize > server->ops->negotiate_wsize(tcon, ctx)))
-               cifs_sb->ctx->wsize = server->ops->negotiate_wsize(tcon, ctx);
+           (cifs_sb->ctx->wsize > server->ops->negotiate_wsize(tcon, ctx))) {
+               cifs_sb->ctx->wsize =
+                       round_down(server->ops->negotiate_wsize(tcon, ctx), PAGE_SIZE);
+               /*
+                * in the very unlikely event that the server sent a max write size under PAGE_SIZE,
+                * (which would get rounded down to 0) then reset wsize to absolute minimum eg 4096
+                */
+               if (cifs_sb->ctx->wsize == 0) {
+                       cifs_sb->ctx->wsize = PAGE_SIZE;
+                       cifs_dbg(VFS, "wsize too small, reset to minimum ie PAGE_SIZE, usually 4096\n");
+               }
+       }
        if ((cifs_sb->ctx->rsize == 0) ||
            (cifs_sb->ctx->rsize > server->ops->negotiate_rsize(tcon, ctx)))
                cifs_sb->ctx->rsize = server->ops->negotiate_rsize(tcon, ctx);
@@ -4228,6 +4244,11 @@ int cifs_tree_connect(const unsigned int xid, struct cifs_tcon *tcon, const stru
 
        /* only send once per connect */
        spin_lock(&tcon->tc_lock);
+
+       /* if tcon is marked for needing reconnect, update state */
+       if (tcon->need_reconnect)
+               tcon->status = TID_NEED_TCON;
+
        if (tcon->status == TID_GOOD) {
                spin_unlock(&tcon->tc_lock);
                return 0;
index a8a1d386da6566a2dec94099ae08a80199462bae..449c59830039bc04897e5031dba2dbc9c6649bad 100644
@@ -565,6 +565,11 @@ int cifs_tree_connect(const unsigned int xid, struct cifs_tcon *tcon, const stru
 
        /* only send once per connect */
        spin_lock(&tcon->tc_lock);
+
+       /* if tcon is marked for needing reconnect, update state */
+       if (tcon->need_reconnect)
+               tcon->status = TID_NEED_TCON;
+
        if (tcon->status == TID_GOOD) {
                spin_unlock(&tcon->tc_lock);
                return 0;
@@ -625,8 +630,8 @@ out:
                spin_lock(&tcon->tc_lock);
                if (tcon->status == TID_IN_TCON)
                        tcon->status = TID_GOOD;
-               spin_unlock(&tcon->tc_lock);
                tcon->need_reconnect = false;
+               spin_unlock(&tcon->tc_lock);
        }
 
        return rc;
index 3a213432775b167dfc844df81de1aba42c5fcfb9..c3b8e7091a4d4d07c2825fbb839b31ec6d4223a1 100644
@@ -87,7 +87,7 @@ void cifs_pages_written_back(struct inode *inode, loff_t start, unsigned int len
                        continue;
                if (!folio_test_writeback(folio)) {
                        WARN_ONCE(1, "bad %x @%llx page %lx %lx\n",
-                                 len, start, folio_index(folio), end);
+                                 len, start, folio->index, end);
                        continue;
                }
 
@@ -120,7 +120,7 @@ void cifs_pages_write_failed(struct inode *inode, loff_t start, unsigned int len
                        continue;
                if (!folio_test_writeback(folio)) {
                        WARN_ONCE(1, "bad %x @%llx page %lx %lx\n",
-                                 len, start, folio_index(folio), end);
+                                 len, start, folio->index, end);
                        continue;
                }
 
@@ -151,7 +151,7 @@ void cifs_pages_write_redirty(struct inode *inode, loff_t start, unsigned int le
        xas_for_each(&xas, folio, end) {
                if (!folio_test_writeback(folio)) {
                        WARN_ONCE(1, "bad %x @%llx page %lx %lx\n",
-                                 len, start, folio_index(folio), end);
+                                 len, start, folio->index, end);
                        continue;
                }
 
@@ -175,6 +175,9 @@ cifs_mark_open_files_invalid(struct cifs_tcon *tcon)
 
        /* only send once per connect */
        spin_lock(&tcon->tc_lock);
+       if (tcon->need_reconnect)
+               tcon->status = TID_NEED_RECON;
+
        if (tcon->status != TID_NEED_RECON) {
                spin_unlock(&tcon->tc_lock);
                return;
@@ -1312,20 +1315,20 @@ cifs_lock_test(struct cifsFileInfo *cfile, __u64 offset, __u64 length,
        down_read(&cinode->lock_sem);
 
        exist = cifs_find_lock_conflict(cfile, offset, length, type,
-                                       flock->fl_flags, &conf_lock,
+                                       flock->c.flc_flags, &conf_lock,
                                        CIFS_LOCK_OP);
        if (exist) {
                flock->fl_start = conf_lock->offset;
                flock->fl_end = conf_lock->offset + conf_lock->length - 1;
-               flock->fl_pid = conf_lock->pid;
+               flock->c.flc_pid = conf_lock->pid;
                if (conf_lock->type & server->vals->shared_lock_type)
-                       flock->fl_type = F_RDLCK;
+                       flock->c.flc_type = F_RDLCK;
                else
-                       flock->fl_type = F_WRLCK;
+                       flock->c.flc_type = F_WRLCK;
        } else if (!cinode->can_cache_brlcks)
                rc = 1;
        else
-               flock->fl_type = F_UNLCK;
+               flock->c.flc_type = F_UNLCK;
 
        up_read(&cinode->lock_sem);
        return rc;
@@ -1401,16 +1404,16 @@ cifs_posix_lock_test(struct file *file, struct file_lock *flock)
 {
        int rc = 0;
        struct cifsInodeInfo *cinode = CIFS_I(file_inode(file));
-       unsigned char saved_type = flock->fl_type;
+       unsigned char saved_type = flock->c.flc_type;
 
-       if ((flock->fl_flags & FL_POSIX) == 0)
+       if ((flock->c.flc_flags & FL_POSIX) == 0)
                return 1;
 
        down_read(&cinode->lock_sem);
        posix_test_lock(file, flock);
 
-       if (flock->fl_type == F_UNLCK && !cinode->can_cache_brlcks) {
-               flock->fl_type = saved_type;
+       if (lock_is_unlock(flock) && !cinode->can_cache_brlcks) {
+               flock->c.flc_type = saved_type;
                rc = 1;
        }
 
@@ -1431,7 +1434,7 @@ cifs_posix_lock_set(struct file *file, struct file_lock *flock)
        struct cifsInodeInfo *cinode = CIFS_I(file_inode(file));
        int rc = FILE_LOCK_DEFERRED + 1;
 
-       if ((flock->fl_flags & FL_POSIX) == 0)
+       if ((flock->c.flc_flags & FL_POSIX) == 0)
                return rc;
 
        cifs_down_write(&cinode->lock_sem);
@@ -1581,7 +1584,9 @@ cifs_push_posix_locks(struct cifsFileInfo *cfile)
 
        el = locks_to_send.next;
        spin_lock(&flctx->flc_lock);
-       list_for_each_entry(flock, &flctx->flc_posix, fl_list) {
+       for_each_file_lock(flock, &flctx->flc_posix) {
+               unsigned char ftype = flock->c.flc_type;
+
                if (el == &locks_to_send) {
                        /*
                         * The list ended. We don't have enough allocated
@@ -1591,12 +1596,12 @@ cifs_push_posix_locks(struct cifsFileInfo *cfile)
                        break;
                }
                length = cifs_flock_len(flock);
-               if (flock->fl_type == F_RDLCK || flock->fl_type == F_SHLCK)
+               if (ftype == F_RDLCK || ftype == F_SHLCK)
                        type = CIFS_RDLCK;
                else
                        type = CIFS_WRLCK;
                lck = list_entry(el, struct lock_to_push, llist);
-               lck->pid = hash_lockowner(flock->fl_owner);
+               lck->pid = hash_lockowner(flock->c.flc_owner);
                lck->netfid = cfile->fid.netfid;
                lck->length = length;
                lck->type = type;
@@ -1663,42 +1668,43 @@ static void
 cifs_read_flock(struct file_lock *flock, __u32 *type, int *lock, int *unlock,
                bool *wait_flag, struct TCP_Server_Info *server)
 {
-       if (flock->fl_flags & FL_POSIX)
+       if (flock->c.flc_flags & FL_POSIX)
                cifs_dbg(FYI, "Posix\n");
-       if (flock->fl_flags & FL_FLOCK)
+       if (flock->c.flc_flags & FL_FLOCK)
                cifs_dbg(FYI, "Flock\n");
-       if (flock->fl_flags & FL_SLEEP) {
+       if (flock->c.flc_flags & FL_SLEEP) {
                cifs_dbg(FYI, "Blocking lock\n");
                *wait_flag = true;
        }
-       if (flock->fl_flags & FL_ACCESS)
+       if (flock->c.flc_flags & FL_ACCESS)
                cifs_dbg(FYI, "Process suspended by mandatory locking - not implemented yet\n");
-       if (flock->fl_flags & FL_LEASE)
+       if (flock->c.flc_flags & FL_LEASE)
                cifs_dbg(FYI, "Lease on file - not implemented yet\n");
-       if (flock->fl_flags &
+       if (flock->c.flc_flags &
            (~(FL_POSIX | FL_FLOCK | FL_SLEEP |
               FL_ACCESS | FL_LEASE | FL_CLOSE | FL_OFDLCK)))
-               cifs_dbg(FYI, "Unknown lock flags 0x%x\n", flock->fl_flags);
+               cifs_dbg(FYI, "Unknown lock flags 0x%x\n",
+                        flock->c.flc_flags);
 
        *type = server->vals->large_lock_type;
-       if (flock->fl_type == F_WRLCK) {
+       if (lock_is_write(flock)) {
                cifs_dbg(FYI, "F_WRLCK\n");
                *type |= server->vals->exclusive_lock_type;
                *lock = 1;
-       } else if (flock->fl_type == F_UNLCK) {
+       } else if (lock_is_unlock(flock)) {
                cifs_dbg(FYI, "F_UNLCK\n");
                *type |= server->vals->unlock_lock_type;
                *unlock = 1;
                /* Check if unlock includes more than one lock range */
-       } else if (flock->fl_type == F_RDLCK) {
+       } else if (lock_is_read(flock)) {
                cifs_dbg(FYI, "F_RDLCK\n");
                *type |= server->vals->shared_lock_type;
                *lock = 1;
-       } else if (flock->fl_type == F_EXLCK) {
+       } else if (flock->c.flc_type == F_EXLCK) {
                cifs_dbg(FYI, "F_EXLCK\n");
                *type |= server->vals->exclusive_lock_type;
                *lock = 1;
-       } else if (flock->fl_type == F_SHLCK) {
+       } else if (flock->c.flc_type == F_SHLCK) {
                cifs_dbg(FYI, "F_SHLCK\n");
                *type |= server->vals->shared_lock_type;
                *lock = 1;
@@ -1730,7 +1736,7 @@ cifs_getlk(struct file *file, struct file_lock *flock, __u32 type,
                else
                        posix_lock_type = CIFS_WRLCK;
                rc = CIFSSMBPosixLock(xid, tcon, netfid,
-                                     hash_lockowner(flock->fl_owner),
+                                     hash_lockowner(flock->c.flc_owner),
                                      flock->fl_start, length, flock,
                                      posix_lock_type, wait_flag);
                return rc;
@@ -1747,7 +1753,7 @@ cifs_getlk(struct file *file, struct file_lock *flock, __u32 type,
        if (rc == 0) {
                rc = server->ops->mand_lock(xid, cfile, flock->fl_start, length,
                                            type, 0, 1, false);
-               flock->fl_type = F_UNLCK;
+               flock->c.flc_type = F_UNLCK;
                if (rc != 0)
                        cifs_dbg(VFS, "Error unlocking previously locked range %d during test of lock\n",
                                 rc);
@@ -1755,7 +1761,7 @@ cifs_getlk(struct file *file, struct file_lock *flock, __u32 type,
        }
 
        if (type & server->vals->shared_lock_type) {
-               flock->fl_type = F_WRLCK;
+               flock->c.flc_type = F_WRLCK;
                return 0;
        }
 
@@ -1767,12 +1773,12 @@ cifs_getlk(struct file *file, struct file_lock *flock, __u32 type,
        if (rc == 0) {
                rc = server->ops->mand_lock(xid, cfile, flock->fl_start, length,
                        type | server->vals->shared_lock_type, 0, 1, false);
-               flock->fl_type = F_RDLCK;
+               flock->c.flc_type = F_RDLCK;
                if (rc != 0)
                        cifs_dbg(VFS, "Error unlocking previously locked range %d during test of lock\n",
                                 rc);
        } else
-               flock->fl_type = F_WRLCK;
+               flock->c.flc_type = F_WRLCK;
 
        return 0;
 }
@@ -1940,7 +1946,7 @@ cifs_setlk(struct file *file, struct file_lock *flock, __u32 type,
                        posix_lock_type = CIFS_UNLCK;
 
                rc = CIFSSMBPosixLock(xid, tcon, cfile->fid.netfid,
-                                     hash_lockowner(flock->fl_owner),
+                                     hash_lockowner(flock->c.flc_owner),
                                      flock->fl_start, length,
                                      NULL, posix_lock_type, wait_flag);
                goto out;
@@ -1950,7 +1956,7 @@ cifs_setlk(struct file *file, struct file_lock *flock, __u32 type,
                struct cifsLockInfo *lock;
 
                lock = cifs_lock_init(flock->fl_start, length, type,
-                                     flock->fl_flags);
+                                     flock->c.flc_flags);
                if (!lock)
                        return -ENOMEM;
 
@@ -1989,7 +1995,7 @@ cifs_setlk(struct file *file, struct file_lock *flock, __u32 type,
                rc = server->ops->mand_unlock_range(cfile, flock, xid);
 
 out:
-       if ((flock->fl_flags & FL_POSIX) || (flock->fl_flags & FL_FLOCK)) {
+       if ((flock->c.flc_flags & FL_POSIX) || (flock->c.flc_flags & FL_FLOCK)) {
                /*
                 * If this is a request to remove all locks because we
                 * are closing the file, it doesn't matter if the
@@ -1998,7 +2004,7 @@ out:
                 */
                if (rc) {
                        cifs_dbg(VFS, "%s failed rc=%d\n", __func__, rc);
-                       if (!(flock->fl_flags & FL_CLOSE))
+                       if (!(flock->c.flc_flags & FL_CLOSE))
                                return rc;
                }
                rc = locks_lock_file_wait(file, flock);
@@ -2019,7 +2025,7 @@ int cifs_flock(struct file *file, int cmd, struct file_lock *fl)
 
        xid = get_xid();
 
-       if (!(fl->fl_flags & FL_FLOCK)) {
+       if (!(fl->c.flc_flags & FL_FLOCK)) {
                rc = -ENOLCK;
                free_xid(xid);
                return rc;
@@ -2070,7 +2076,8 @@ int cifs_lock(struct file *file, int cmd, struct file_lock *flock)
        xid = get_xid();
 
        cifs_dbg(FYI, "%s: %pD2 cmd=0x%x type=0x%x flags=0x%x r=%lld:%lld\n", __func__, file, cmd,
-                flock->fl_flags, flock->fl_type, (long long)flock->fl_start,
+                flock->c.flc_flags, flock->c.flc_type,
+                (long long)flock->fl_start,
                 (long long)flock->fl_end);
 
        cfile = (struct cifsFileInfo *)file->private_data;
@@ -2120,8 +2127,8 @@ cifs_update_eof(struct cifsInodeInfo *cifsi, loff_t offset,
 {
        loff_t end_of_write = offset + bytes_written;
 
-       if (end_of_write > cifsi->server_eof)
-               cifsi->server_eof = end_of_write;
+       if (end_of_write > cifsi->netfs.remote_i_size)
+               netfs_resize_file(&cifsi->netfs, end_of_write, true);
 }
 
 static ssize_t
@@ -2651,7 +2658,7 @@ static void cifs_extend_writeback(struct address_space *mapping,
                                continue;
                        if (xa_is_value(folio))
                                break;
-                       if (folio_index(folio) != index)
+                       if (folio->index != index)
                                break;
                        if (!folio_try_get_rcu(folio)) {
                                xas_reset(&xas);
@@ -2899,7 +2906,7 @@ redo_folio:
                                        goto skip_write;
                        }
 
-                       if (folio_mapping(folio) != mapping ||
+                       if (folio->mapping != mapping ||
                            !folio_test_dirty(folio)) {
                                start += folio_size(folio);
                                folio_unlock(folio);
@@ -2951,7 +2958,7 @@ skip_write:
                        continue;
                }
 
-               folio_batch_release(&fbatch);           
+               folio_batch_release(&fbatch);
                cond_resched();
        } while (wbc->nr_to_write > 0);
 
@@ -3247,8 +3254,8 @@ cifs_uncached_writev_complete(struct work_struct *work)
 
        spin_lock(&inode->i_lock);
        cifs_update_eof(cifsi, wdata->offset, wdata->bytes);
-       if (cifsi->server_eof > inode->i_size)
-               i_size_write(inode, cifsi->server_eof);
+       if (cifsi->netfs.remote_i_size > inode->i_size)
+               i_size_write(inode, cifsi->netfs.remote_i_size);
        spin_unlock(&inode->i_lock);
 
        complete(&wdata->done);
@@ -3300,6 +3307,7 @@ cifs_resend_wdata(struct cifs_writedata *wdata, struct list_head *wdata_list,
                        if (wdata->cfile->invalidHandle)
                                rc = -EAGAIN;
                        else {
+                               wdata->replay = true;
 #ifdef CONFIG_CIFS_SMB_DIRECT
                                if (wdata->mr) {
                                        wdata->mr->need_invalidate = true;
index 52cbef2eeb28f6ba0013063b4bafcecc08c3a02d..4b2f5aa2ea0e1de026302b9e543e2e13429107f9 100644
@@ -211,7 +211,7 @@ cifs_parse_security_flavors(struct fs_context *fc, char *value, struct smb3_fs_c
 
        switch (match_token(value, cifs_secflavor_tokens, args)) {
        case Opt_sec_krb5p:
-               cifs_errorf(fc, "sec=krb5p is not supported!\n");
+               cifs_errorf(fc, "sec=krb5p is not supported. Use sec=krb5,seal instead\n");
                return 1;
        case Opt_sec_krb5i:
                ctx->sign = true;
@@ -1111,6 +1111,17 @@ static int smb3_fs_context_parse_param(struct fs_context *fc,
        case Opt_wsize:
                ctx->wsize = result.uint_32;
                ctx->got_wsize = true;
+               if (ctx->wsize % PAGE_SIZE != 0) {
+                       ctx->wsize = round_down(ctx->wsize, PAGE_SIZE);
+                       if (ctx->wsize == 0) {
+                               ctx->wsize = PAGE_SIZE;
+                               cifs_dbg(VFS, "wsize too small, reset to minimum %ld\n", PAGE_SIZE);
+                       } else {
+                               cifs_dbg(VFS,
+                                        "wsize rounded down to %d, a multiple of PAGE_SIZE %ld\n",
+                                        ctx->wsize, PAGE_SIZE);
+                       }
+               }
                break;
        case Opt_acregmax:
                ctx->acregmax = HZ * result.uint_32;
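
The Opt_wsize change above rounds a user-supplied wsize down to a PAGE_SIZE multiple and falls back to a single page when the rounded value would be zero. A minimal userspace sketch of the same clamping, with the page size hardcoded to 4096 purely for illustration:

    #include <stdio.h>

    #define PAGE_SZ 4096u    /* illustration only; the kernel uses PAGE_SIZE */

    static unsigned int clamp_wsize(unsigned int wsize)
    {
            if (wsize % PAGE_SZ != 0) {
                    wsize -= wsize % PAGE_SZ;    /* round_down() equivalent */
                    if (wsize == 0)
                            wsize = PAGE_SZ;     /* enforce a one-page minimum */
            }
            return wsize;
    }

    int main(void)
    {
            /* prints "4096 8192 8192" */
            printf("%u %u %u\n", clamp_wsize(100), clamp_wsize(8192),
                   clamp_wsize(9000));
            return 0;
    }
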
index f0989484f2c648796d923fcd3f998b150b1f92cf..d02f8ba29cb5bf22f1dcdcc3932f20afc3094f22 100644 (file)
@@ -104,7 +104,7 @@ cifs_revalidate_cache(struct inode *inode, struct cifs_fattr *fattr)
        fattr->cf_mtime = timestamp_truncate(fattr->cf_mtime, inode);
        mtime = inode_get_mtime(inode);
        if (timespec64_equal(&mtime, &fattr->cf_mtime) &&
-           cifs_i->server_eof == fattr->cf_eof) {
+           cifs_i->netfs.remote_i_size == fattr->cf_eof) {
                cifs_dbg(FYI, "%s: inode %llu is unchanged\n",
                         __func__, cifs_i->uniqueid);
                return;
@@ -194,7 +194,7 @@ cifs_fattr_to_inode(struct inode *inode, struct cifs_fattr *fattr)
        else
                clear_bit(CIFS_INO_DELETE_PENDING, &cifs_i->flags);
 
-       cifs_i->server_eof = fattr->cf_eof;
+       cifs_i->netfs.remote_i_size = fattr->cf_eof;
        /*
         * Can't safely change the file size here if the client is writing to
         * it due to potential races.
@@ -2858,7 +2858,7 @@ cifs_set_file_size(struct inode *inode, struct iattr *attrs,
 
 set_size_out:
        if (rc == 0) {
-               cifsInode->server_eof = attrs->ia_size;
+               netfs_resize_file(&cifsInode->netfs, attrs->ia_size, true);
                cifs_setsize(inode, attrs->ia_size);
                /*
                 * i_blocks is not related to (i_size / i_blksize), but instead
@@ -3011,6 +3011,7 @@ cifs_setattr_unix(struct dentry *direntry, struct iattr *attrs)
        if ((attrs->ia_valid & ATTR_SIZE) &&
            attrs->ia_size != i_size_read(inode)) {
                truncate_setsize(inode, attrs->ia_size);
+               netfs_resize_file(&cifsInode->netfs, attrs->ia_size, true);
                fscache_resize_cookie(cifs_inode_cookie(inode), attrs->ia_size);
        }
 
@@ -3210,6 +3211,7 @@ cifs_setattr_nounix(struct dentry *direntry, struct iattr *attrs)
        if ((attrs->ia_valid & ATTR_SIZE) &&
            attrs->ia_size != i_size_read(inode)) {
                truncate_setsize(inode, attrs->ia_size);
+               netfs_resize_file(&cifsInode->netfs, attrs->ia_size, true);
                fscache_resize_cookie(cifs_inode_cookie(inode), attrs->ia_size);
        }
 
index a6968573b775e7bdcab3df948908e9494f792027..4a517b280f2b79a2c1395a1c627b496def7392a7 100644 (file)
@@ -168,6 +168,21 @@ static char *automount_fullpath(struct dentry *dentry, void *page)
        return s;
 }
 
+static void fs_context_set_ids(struct smb3_fs_context *ctx)
+{
+       kuid_t uid = current_fsuid();
+       kgid_t gid = current_fsgid();
+
+       if (ctx->multiuser) {
+               if (!ctx->uid_specified)
+                       ctx->linux_uid = uid;
+               if (!ctx->gid_specified)
+                       ctx->linux_gid = gid;
+       }
+       if (!ctx->cruid_specified)
+               ctx->cred_uid = uid;
+}
+
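
fs_context_set_ids() above fills in uid/gid defaults for the automounted submount from the calling task's credentials, but only where the user left them unspecified. A toy sketch of that default-only-when-unset logic (flag names and plain integer ids are simplified from the kernel's smb3_fs_context and kuid_t/kgid_t):

    #include <stdbool.h>

    struct ids_ctx {
            bool multiuser;
            bool uid_specified, gid_specified, cruid_specified;
            unsigned int linux_uid, linux_gid, cred_uid;
    };

    static void set_id_defaults(struct ids_ctx *ctx,
                                unsigned int fsuid, unsigned int fsgid)
    {
            if (ctx->multiuser) {
                    if (!ctx->uid_specified)
                            ctx->linux_uid = fsuid;    /* default owner uid */
                    if (!ctx->gid_specified)
                            ctx->linux_gid = fsgid;    /* default owner gid */
            }
            if (!ctx->cruid_specified)
                    ctx->cred_uid = fsuid;    /* credential uid defaults regardless */
    }
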
 /*
  * Create a vfsmount that we can automount
  */
@@ -205,6 +220,7 @@ static struct vfsmount *cifs_do_automount(struct path *path)
        tmp.leaf_fullpath = NULL;
        tmp.UNC = tmp.prepath = NULL;
        tmp.dfs_root_ses = NULL;
+       fs_context_set_ids(&tmp);
 
        rc = smb3_fs_context_dup(ctx, &tmp);
        if (rc) {
index 94255401b38dcb24c705f255731db2791e171c8d..b520eea7bfce83b2fd8e9a1d9e1685c39516840e 100644 (file)
@@ -141,7 +141,7 @@ retry:
                                        if (likely(reparse_inode_match(inode, fattr))) {
                                                fattr->cf_mode = inode->i_mode;
                                                fattr->cf_rdev = inode->i_rdev;
-                                               fattr->cf_eof = CIFS_I(inode)->server_eof;
+                                               fattr->cf_eof = CIFS_I(inode)->netfs.remote_i_size;
                                                fattr->cf_symlink_target = NULL;
                                        } else {
                                                CIFS_I(inode)->time = 0;
@@ -307,14 +307,16 @@ cifs_dir_info_to_fattr(struct cifs_fattr *fattr, FILE_DIRECTORY_INFO *info,
 }
 
 static void cifs_fulldir_info_to_fattr(struct cifs_fattr *fattr,
-                                      SEARCH_ID_FULL_DIR_INFO *info,
+                                      const void *info,
                                       struct cifs_sb_info *cifs_sb)
 {
+       const FILE_FULL_DIRECTORY_INFO *di = info;
+
        __dir_info_to_fattr(fattr, info);
 
-       /* See MS-FSCC 2.4.19 FileIdFullDirectoryInformation */
+       /* See MS-FSCC 2.4.14, 2.4.19 */
        if (fattr->cf_cifsattrs & ATTR_REPARSE)
-               fattr->cf_cifstag = le32_to_cpu(info->EaSize);
+               fattr->cf_cifstag = le32_to_cpu(di->EaSize);
        cifs_fill_common_info(fattr, cifs_sb);
 }
 
@@ -396,7 +398,7 @@ ffirst_retry:
        } else if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_SERVER_INUM) {
                cifsFile->srch_inf.info_level = SMB_FIND_FILE_ID_FULL_DIR_INFO;
        } else /* not srvinos - BB fixme add check for backlevel? */ {
-               cifsFile->srch_inf.info_level = SMB_FIND_FILE_DIRECTORY_INFO;
+               cifsFile->srch_inf.info_level = SMB_FIND_FILE_FULL_DIRECTORY_INFO;
        }
 
        search_flags = CIFS_SEARCH_CLOSE_AT_END | CIFS_SEARCH_RETURN_RESUME;
@@ -987,10 +989,9 @@ static int cifs_filldir(char *find_entry, struct file *file,
                                       (FIND_FILE_STANDARD_INFO *)find_entry,
                                       cifs_sb);
                break;
+       case SMB_FIND_FILE_FULL_DIRECTORY_INFO:
        case SMB_FIND_FILE_ID_FULL_DIR_INFO:
-               cifs_fulldir_info_to_fattr(&fattr,
-                                          (SEARCH_ID_FULL_DIR_INFO *)find_entry,
-                                          cifs_sb);
+               cifs_fulldir_info_to_fattr(&fattr, find_entry, cifs_sb);
                break;
        default:
                cifs_dir_info_to_fattr(&fattr,
index cde81042bebda6b8f3a454f46eb8b055af8d2f3c..8f37373fd33344bacbf4d492f6115d9396573379 100644 (file)
@@ -75,6 +75,10 @@ cifs_ses_get_chan_index(struct cifs_ses *ses,
 {
        unsigned int i;
 
+       /* bail out early if the channel is waiting for termination */
+       if (server && server->terminate)
+               return CIFS_INVAL_CHAN_INDEX;
+
        for (i = 0; i < ses->chan_count; i++) {
                if (ses->chans[i].server == server)
                        return i;
@@ -84,7 +88,6 @@ cifs_ses_get_chan_index(struct cifs_ses *ses,
        if (server)
                cifs_dbg(VFS, "unable to get chan index for server: 0x%llx",
                         server->conn_id);
-       WARN_ON(1);
        return CIFS_INVAL_CHAN_INDEX;
 }
 
@@ -269,6 +272,8 @@ int cifs_try_adding_channels(struct cifs_ses *ses)
                                         &iface->sockaddr,
                                         rc);
                                kref_put(&iface->refcount, release_iface);
+                               /* a failed channel add still counts against this iface's weight */
+                               iface->weight_fulfilled++;
                                continue;
                        }
 
index e0ee96d69d495216090d65817360c8fef6159419..c23478ab1cf851999e6578eed9d9f6b99cc08b30 100644 (file)
@@ -228,7 +228,7 @@ smb2_unlock_range(struct cifsFileInfo *cfile, struct file_lock *flock,
                         * flock and OFD lock are associated with an open
                         * file description, not the process.
                         */
-                       if (!(flock->fl_flags & (FL_FLOCK | FL_OFDLCK)))
+                       if (!(flock->c.flc_flags & (FL_FLOCK | FL_OFDLCK)))
                                continue;
                if (cinode->can_cache_brlcks) {
                        /*
index a652200540c8aa5d2aa0ecd68ed50cc66f587d05..05818cd6d932e91792ecc65d764eba0a942cb28d 100644 (file)
@@ -120,6 +120,14 @@ static int smb2_compound_op(const unsigned int xid, struct cifs_tcon *tcon,
        unsigned int size[2];
        void *data[2];
        int len;
+       int retries = 0, cur_sleep = 1;
+
+replay_again:
+       /* reinitialize for possible replay */
+       flags = 0;
+       oplock = SMB2_OPLOCK_LEVEL_NONE;
+       num_rqst = 0;
+       server = cifs_pick_channel(ses);
 
        vars = kzalloc(sizeof(*vars), GFP_ATOMIC);
        if (vars == NULL)
@@ -127,8 +135,6 @@ static int smb2_compound_op(const unsigned int xid, struct cifs_tcon *tcon,
        rqst = &vars->rqst[0];
        rsp_iov = &vars->rsp_iov[0];
 
-       server = cifs_pick_channel(ses);
-
        if (smb3_encryption_required(tcon))
                flags |= CIFS_TRANSFORM_REQ;
 
@@ -463,15 +469,24 @@ static int smb2_compound_op(const unsigned int xid, struct cifs_tcon *tcon,
        num_rqst++;
 
        if (cfile) {
+               if (retries)
+                       for (i = 1; i < num_rqst - 2; i++)
+                               smb2_set_replay(server, &rqst[i]);
+
                rc = compound_send_recv(xid, ses, server,
                                        flags, num_rqst - 2,
                                        &rqst[1], &resp_buftype[1],
                                        &rsp_iov[1]);
-       } else
+       } else {
+               if (retries)
+                       for (i = 0; i < num_rqst; i++)
+                               smb2_set_replay(server, &rqst[i]);
+
                rc = compound_send_recv(xid, ses, server,
                                        flags, num_rqst,
                                        rqst, resp_buftype,
                                        rsp_iov);
+       }
 
 finished:
        num_rqst = 0;
@@ -620,9 +635,6 @@ finished:
        }
        SMB2_close_free(&rqst[num_rqst]);
 
-       if (cfile)
-               cifsFileInfo_put(cfile);
-
        num_cmds += 2;
        if (out_iov && out_buftype) {
                memcpy(out_iov, rsp_iov, num_cmds * sizeof(*out_iov));
@@ -632,7 +644,16 @@ finished:
                for (i = 0; i < num_cmds; i++)
                        free_rsp_buf(resp_buftype[i], rsp_iov[i].iov_base);
        }
+       num_cmds -= 2; /* undo the increment above; a replay would repeat it */
        kfree(vars);
+
+       if (is_replayable_error(rc) &&
+           smb2_should_replay(tcon, &retries, &cur_sleep))
+               goto replay_again;
+
+       if (cfile)
+               cifsFileInfo_put(cfile);
+
        return rc;
 }
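
smb2_compound_op() above adopts the replay pattern that recurs through the rest of this series: per-attempt state is reinitialized at a replay_again label, the compound is sent, and on a replayable error the code backs off and jumps back. A self-contained sketch of that control flow, with is_replayable_error() and the backoff helper reduced to trivial stand-ins:

    #include <stdbool.h>
    #include <stdio.h>

    #define MAX_RETRIES 3

    static bool is_replayable_error(int rc) { return rc == -11; /* -EAGAIN */ }

    static bool should_replay(int *retries) { return (*retries)++ < MAX_RETRIES; }

    /* pretend the first two sends fail with a replayable error */
    static int send_compound(int attempt) { return attempt < 2 ? -11 : 0; }

    static int compound_op(void)
    {
            int rc, retries = 0;

    replay_again:
            /* reinitialize everything a retry must not inherit */
            rc = send_compound(retries);

            if (is_replayable_error(rc) && should_replay(&retries))
                    goto replay_again;    /* a real replay also sets the replay flag */

            return rc;
    }

    int main(void)
    {
            printf("rc=%d\n", compound_op());    /* succeeds on the third attempt */
            return 0;
    }
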
 
index d9553c2556a290dcea14434e00df9d854e713aa3..4695433fcf397f529754cc9ec266cb5ac1727512 100644 (file)
@@ -619,7 +619,7 @@ parse_server_interfaces(struct network_interface_info_ioctl_rsp *buf,
                goto out;
        }
 
-       while (bytes_left >= sizeof(*p)) {
+       while (bytes_left >= (ssize_t)sizeof(*p)) {
                memset(&tmp_iface, 0, sizeof(tmp_iface));
                tmp_iface.speed = le64_to_cpu(p->LinkSpeed);
                tmp_iface.rdma_capable = le32_to_cpu(p->Capability & RDMA_CAPABLE) ? 1 : 0;
@@ -1108,7 +1108,7 @@ smb2_set_ea(const unsigned int xid, struct cifs_tcon *tcon,
 {
        struct smb2_compound_vars *vars;
        struct cifs_ses *ses = tcon->ses;
-       struct TCP_Server_Info *server = cifs_pick_channel(ses);
+       struct TCP_Server_Info *server;
        struct smb_rqst *rqst;
        struct kvec *rsp_iov;
        __le16 *utf16_path = NULL;
@@ -1124,6 +1124,13 @@ smb2_set_ea(const unsigned int xid, struct cifs_tcon *tcon,
        struct smb2_file_full_ea_info *ea = NULL;
        struct smb2_query_info_rsp *rsp;
        int rc, used_len = 0;
+       int retries = 0, cur_sleep = 1;
+
+replay_again:
+       /* reinitialize for possible replay */
+       flags = CIFS_CP_CREATE_CLOSE_OP;
+       oplock = SMB2_OPLOCK_LEVEL_NONE;
+       server = cifs_pick_channel(ses);
 
        if (smb3_encryption_required(tcon))
                flags |= CIFS_TRANSFORM_REQ;
@@ -1197,6 +1204,7 @@ smb2_set_ea(const unsigned int xid, struct cifs_tcon *tcon,
                .disposition = FILE_OPEN,
                .create_options = cifs_create_options(cifs_sb, 0),
                .fid = &fid,
+               .replay = !!(retries),
        };
 
        rc = SMB2_open_init(tcon, server,
@@ -1244,6 +1252,12 @@ smb2_set_ea(const unsigned int xid, struct cifs_tcon *tcon,
                goto sea_exit;
        smb2_set_related(&rqst[2]);
 
+       if (retries) {
+               smb2_set_replay(server, &rqst[0]);
+               smb2_set_replay(server, &rqst[1]);
+               smb2_set_replay(server, &rqst[2]);
+       }
+
        rc = compound_send_recv(xid, ses, server,
                                flags, 3, rqst,
                                resp_buftype, rsp_iov);
@@ -1260,6 +1274,11 @@ smb2_set_ea(const unsigned int xid, struct cifs_tcon *tcon,
        kfree(vars);
 out_free_path:
        kfree(utf16_path);
+
+       if (is_replayable_error(rc) &&
+           smb2_should_replay(tcon, &retries, &cur_sleep))
+               goto replay_again;
+
        return rc;
 }
 #endif
@@ -1484,7 +1503,7 @@ smb2_ioctl_query_info(const unsigned int xid,
        struct smb_rqst *rqst;
        struct kvec *rsp_iov;
        struct cifs_ses *ses = tcon->ses;
-       struct TCP_Server_Info *server = cifs_pick_channel(ses);
+       struct TCP_Server_Info *server;
        char __user *arg = (char __user *)p;
        struct smb_query_info qi;
        struct smb_query_info __user *pqi;
@@ -1501,6 +1520,13 @@ smb2_ioctl_query_info(const unsigned int xid,
        void *data[2];
        int create_options = is_dir ? CREATE_NOT_FILE : CREATE_NOT_DIR;
        void (*free_req1_func)(struct smb_rqst *r);
+       int retries = 0, cur_sleep = 1;
+
+replay_again:
+       /* reinitialize for possible replay */
+       flags = CIFS_CP_CREATE_CLOSE_OP;
+       oplock = SMB2_OPLOCK_LEVEL_NONE;
+       server = cifs_pick_channel(ses);
 
        vars = kzalloc(sizeof(*vars), GFP_ATOMIC);
        if (vars == NULL)
@@ -1544,6 +1570,7 @@ smb2_ioctl_query_info(const unsigned int xid,
                .disposition = FILE_OPEN,
                .create_options = cifs_create_options(cifs_sb, create_options),
                .fid = &fid,
+               .replay = !!(retries),
        };
 
        if (qi.flags & PASSTHRU_FSCTL) {
@@ -1641,6 +1668,12 @@ smb2_ioctl_query_info(const unsigned int xid,
                goto free_req_1;
        smb2_set_related(&rqst[2]);
 
+       if (retries) {
+               smb2_set_replay(server, &rqst[0]);
+               smb2_set_replay(server, &rqst[1]);
+               smb2_set_replay(server, &rqst[2]);
+       }
+
        rc = compound_send_recv(xid, ses, server,
                                flags, 3, rqst,
                                resp_buftype, rsp_iov);
@@ -1701,6 +1734,11 @@ free_output_buffer:
        kfree(buffer);
 free_vars:
        kfree(vars);
+
+       if (is_replayable_error(rc) &&
+           smb2_should_replay(tcon, &retries, &cur_sleep))
+               goto replay_again;
+
        return rc;
 }
 
@@ -2227,8 +2265,14 @@ smb2_query_dir_first(const unsigned int xid, struct cifs_tcon *tcon,
        struct cifs_open_parms oparms;
        struct smb2_query_directory_rsp *qd_rsp = NULL;
        struct smb2_create_rsp *op_rsp = NULL;
-       struct TCP_Server_Info *server = cifs_pick_channel(tcon->ses);
-       int retry_count = 0;
+       struct TCP_Server_Info *server;
+       int retries = 0, cur_sleep = 1;
+
+replay_again:
+       /* reinitialize for possible replay */
+       flags = 0;
+       oplock = SMB2_OPLOCK_LEVEL_NONE;
+       server = cifs_pick_channel(tcon->ses);
 
        utf16_path = cifs_convert_path_to_utf16(path, cifs_sb);
        if (!utf16_path)
@@ -2253,6 +2297,7 @@ smb2_query_dir_first(const unsigned int xid, struct cifs_tcon *tcon,
                .disposition = FILE_OPEN,
                .create_options = cifs_create_options(cifs_sb, 0),
                .fid = fid,
+               .replay = !!(retries),
        };
 
        rc = SMB2_open_init(tcon, server,
@@ -2278,14 +2323,15 @@ smb2_query_dir_first(const unsigned int xid, struct cifs_tcon *tcon,
 
        smb2_set_related(&rqst[1]);
 
-again:
+       if (retries) {
+               smb2_set_replay(server, &rqst[0]);
+               smb2_set_replay(server, &rqst[1]);
+       }
+
        rc = compound_send_recv(xid, tcon->ses, server,
                                flags, 2, rqst,
                                resp_buftype, rsp_iov);
 
-       if (rc == -EAGAIN && retry_count++ < 10)
-               goto again;
-
        /* If the open failed there is nothing to do */
        op_rsp = (struct smb2_create_rsp *)rsp_iov[0].iov_base;
        if (op_rsp == NULL || op_rsp->hdr.Status != STATUS_SUCCESS) {
@@ -2333,6 +2379,11 @@ again:
        SMB2_query_directory_free(&rqst[1]);
        free_rsp_buf(resp_buftype[0], rsp_iov[0].iov_base);
        free_rsp_buf(resp_buftype[1], rsp_iov[1].iov_base);
+
+       if (is_replayable_error(rc) &&
+           smb2_should_replay(tcon, &retries, &cur_sleep))
+               goto replay_again;
+
        return rc;
 }
 
@@ -2457,6 +2508,22 @@ smb2_oplock_response(struct cifs_tcon *tcon, __u64 persistent_fid,
                                 CIFS_CACHE_READ(cinode) ? 1 : 0);
 }
 
+void
+smb2_set_replay(struct TCP_Server_Info *server, struct smb_rqst *rqst)
+{
+       struct smb2_hdr *shdr;
+
+       if (server->dialect < SMB30_PROT_ID)
+               return;
+
+       shdr = (struct smb2_hdr *)(rqst->rq_iov[0].iov_base);
+       if (shdr == NULL) {
+               cifs_dbg(FYI, "shdr NULL in smb2_set_replay\n");
+               return;
+       }
+       shdr->Flags |= SMB2_FLAGS_REPLAY_OPERATION;
+}
+
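
smb2_set_replay() above marks a resent request by setting SMB2_FLAGS_REPLAY_OPERATION in its SMB2 header, and only on SMB 3.0 or later dialects, where the protocol defines replay detection. A minimal sketch of the flag handling; the constants below follow MS-SMB2 but are included here only for illustration:

    #include <stdint.h>
    #include <stddef.h>

    #define SMB30_DIALECT         0x0300
    #define FLAG_REPLAY_OPERATION 0x20000000u

    struct smb2_hdr_sketch {
            uint32_t flags;
    };

    static void set_replay(uint16_t dialect, struct smb2_hdr_sketch *shdr)
    {
            if (dialect < SMB30_DIALECT)
                    return;    /* replay semantics only exist in SMB 3.x */
            if (shdr == NULL)
                    return;    /* mirrors the kernel's NULL guard */
            shdr->flags |= FLAG_REPLAY_OPERATION;
    }
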
 void
 smb2_set_related(struct smb_rqst *rqst)
 {
@@ -2529,6 +2596,27 @@ smb2_set_next_command(struct cifs_tcon *tcon, struct smb_rqst *rqst)
        shdr->NextCommand = cpu_to_le32(len);
 }
 
+/*
+ * helper that sleeps with exponential backoff and decides whether to replay
+ */
+bool smb2_should_replay(struct cifs_tcon *tcon,
+                               int *pretries,
+                               int *pcur_sleep)
+{
+       if (!pretries || !pcur_sleep)
+               return false;
+
+       if (tcon->retry || (*pretries)++ < tcon->ses->server->retrans) {
+               msleep(*pcur_sleep);
+               (*pcur_sleep) = ((*pcur_sleep) << 1);
+               if ((*pcur_sleep) > CIFS_MAX_SLEEP)
+                       (*pcur_sleep) = CIFS_MAX_SLEEP;
+               return true;
+       }
+
+       return false;
+}
+
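
smb2_should_replay() above is capped exponential backoff: sleep for the current delay, double it, clamp it at CIFS_MAX_SLEEP, and permit the retry while the budget holds. The same arithmetic in a standalone sketch (CAP stands in for CIFS_MAX_SLEEP; the tcon->retry escape hatch is omitted):

    #include <stdbool.h>
    #include <stdio.h>

    #define CAP 32    /* stand-in for CIFS_MAX_SLEEP, in milliseconds */

    static bool should_replay(int max_retries, int *retries, int *cur_sleep)
    {
            if ((*retries)++ >= max_retries)
                    return false;

            /* the kernel calls msleep(*cur_sleep) at this point */
            *cur_sleep <<= 1;           /* exponential growth... */
            if (*cur_sleep > CAP)
                    *cur_sleep = CAP;   /* ...with a hard cap */
            return true;
    }

    int main(void)
    {
            int retries = 0, cur_sleep = 1;

            while (should_replay(6, &retries, &cur_sleep))
                    printf("next sleep: %d ms\n", cur_sleep);
            /* prints 2, 4, 8, 16, 32, 32 */
            return 0;
    }
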
 /*
  * Passes the query info response back to the caller on success.
  * Caller need to free this with free_rsp_buf().
@@ -2542,7 +2630,7 @@ smb2_query_info_compound(const unsigned int xid, struct cifs_tcon *tcon,
 {
        struct smb2_compound_vars *vars;
        struct cifs_ses *ses = tcon->ses;
-       struct TCP_Server_Info *server = cifs_pick_channel(ses);
+       struct TCP_Server_Info *server;
        int flags = CIFS_CP_CREATE_CLOSE_OP;
        struct smb_rqst *rqst;
        int resp_buftype[3];
@@ -2553,6 +2641,13 @@ smb2_query_info_compound(const unsigned int xid, struct cifs_tcon *tcon,
        int rc;
        __le16 *utf16_path;
        struct cached_fid *cfid = NULL;
+       int retries = 0, cur_sleep = 1;
+
+replay_again:
+       /* reinitialize for possible replay */
+       flags = CIFS_CP_CREATE_CLOSE_OP;
+       oplock = SMB2_OPLOCK_LEVEL_NONE;
+       server = cifs_pick_channel(ses);
 
        if (!path)
                path = "";
@@ -2589,6 +2684,7 @@ smb2_query_info_compound(const unsigned int xid, struct cifs_tcon *tcon,
                .disposition = FILE_OPEN,
                .create_options = cifs_create_options(cifs_sb, 0),
                .fid = &fid,
+               .replay = !!(retries),
        };
 
        rc = SMB2_open_init(tcon, server,
@@ -2633,6 +2729,14 @@ smb2_query_info_compound(const unsigned int xid, struct cifs_tcon *tcon,
                goto qic_exit;
        smb2_set_related(&rqst[2]);
 
+       if (retries) {
+               if (!cfid) {
+                       smb2_set_replay(server, &rqst[0]);
+                       smb2_set_replay(server, &rqst[2]);
+               }
+               smb2_set_replay(server, &rqst[1]);
+       }
+
        if (cfid) {
                rc = compound_send_recv(xid, ses, server,
                                        flags, 1, &rqst[1],
@@ -2665,6 +2769,11 @@ smb2_query_info_compound(const unsigned int xid, struct cifs_tcon *tcon,
        kfree(vars);
 out_free_path:
        kfree(utf16_path);
+
+       if (is_replayable_error(rc) &&
+           smb2_should_replay(tcon, &retries, &cur_sleep))
+               goto replay_again;
+
        return rc;
 }
 
@@ -3213,6 +3322,9 @@ static long smb3_zero_range(struct file *file, struct cifs_tcon *tcon,
                                  cfile->fid.volatile_fid, cfile->pid, new_size);
                if (rc >= 0) {
                        truncate_setsize(inode, new_size);
+                       netfs_resize_file(&cifsi->netfs, new_size, true);
+                       if (offset < cifsi->netfs.zero_point)
+                               cifsi->netfs.zero_point = offset;
                        fscache_resize_cookie(cifs_inode_cookie(inode), new_size);
                }
        }
@@ -3436,7 +3548,7 @@ static long smb3_simple_falloc(struct file *file, struct cifs_tcon *tcon,
                rc = SMB2_set_eof(xid, tcon, cfile->fid.persistent_fid,
                                  cfile->fid.volatile_fid, cfile->pid, new_eof);
                if (rc == 0) {
-                       cifsi->server_eof = new_eof;
+                       netfs_resize_file(&cifsi->netfs, new_eof, true);
                        cifs_setsize(inode, new_eof);
                        cifs_truncate_page(inode->i_mapping, inode->i_size);
                        truncate_setsize(inode, new_eof);
@@ -3528,8 +3640,9 @@ static long smb3_collapse_range(struct file *file, struct cifs_tcon *tcon,
        int rc;
        unsigned int xid;
        struct inode *inode = file_inode(file);
-       struct cifsFileInfo *cfile = file->private_data;
        struct cifsInodeInfo *cifsi = CIFS_I(inode);
+       struct cifsFileInfo *cfile = file->private_data;
+       struct netfs_inode *ictx = &cifsi->netfs;
        loff_t old_eof, new_eof;
 
        xid = get_xid();
@@ -3549,6 +3662,7 @@ static long smb3_collapse_range(struct file *file, struct cifs_tcon *tcon,
                goto out_2;
 
        truncate_pagecache_range(inode, off, old_eof);
+       ictx->zero_point = old_eof;
 
        rc = smb2_copychunk_range(xid, cfile, cfile, off + len,
                                  old_eof - off - len, off);
@@ -3563,9 +3677,10 @@ static long smb3_collapse_range(struct file *file, struct cifs_tcon *tcon,
 
        rc = 0;
 
-       cifsi->server_eof = i_size_read(inode) - len;
-       truncate_setsize(inode, cifsi->server_eof);
-       fscache_resize_cookie(cifs_inode_cookie(inode), cifsi->server_eof);
+       truncate_setsize(inode, new_eof);
+       netfs_resize_file(&cifsi->netfs, new_eof, true);
+       ictx->zero_point = new_eof;
+       fscache_resize_cookie(cifs_inode_cookie(inode), new_eof);
 out_2:
        filemap_invalidate_unlock(inode->i_mapping);
  out:
@@ -3581,6 +3696,7 @@ static long smb3_insert_range(struct file *file, struct cifs_tcon *tcon,
        unsigned int xid;
        struct cifsFileInfo *cfile = file->private_data;
        struct inode *inode = file_inode(file);
+       struct cifsInodeInfo *cifsi = CIFS_I(inode);
        __u64 count, old_eof, new_eof;
 
        xid = get_xid();
@@ -3608,6 +3724,7 @@ static long smb3_insert_range(struct file *file, struct cifs_tcon *tcon,
                goto out_2;
 
        truncate_setsize(inode, new_eof);
+       netfs_resize_file(&cifsi->netfs, i_size_read(inode), true);
        fscache_resize_cookie(cifs_inode_cookie(inode), i_size_read(inode));
 
        rc = smb2_copychunk_range(xid, cfile, cfile, off, count, off + len);
@@ -5100,7 +5217,7 @@ static int smb2_create_reparse_symlink(const unsigned int xid,
        struct inode *new;
        struct kvec iov;
        __le16 *path;
-       char *sym;
+       char *sym, sep = CIFS_DIR_SEP(cifs_sb);
        u16 len, plen;
        int rc = 0;
 
@@ -5114,7 +5231,8 @@ static int smb2_create_reparse_symlink(const unsigned int xid,
                .symlink_target = sym,
        };
 
-       path = cifs_convert_path_to_utf16(symname, cifs_sb);
+       convert_delimiter(sym, sep);
+       path = cifs_convert_path_to_utf16(sym, cifs_sb);
        if (!path) {
                rc = -ENOMEM;
                goto out;
@@ -5137,7 +5255,10 @@ static int smb2_create_reparse_symlink(const unsigned int xid,
        buf->PrintNameLength = cpu_to_le16(plen);
        memcpy(buf->PathBuffer, path, plen);
        buf->Flags = cpu_to_le32(*symname != '/' ? SYMLINK_FLAG_RELATIVE : 0);
+       if (*sym != sep)
+               buf->Flags = cpu_to_le32(SYMLINK_FLAG_RELATIVE);
 
+       convert_delimiter(sym, '/');
        iov.iov_base = buf;
        iov.iov_len = len;
        new = smb2_get_reparse_inode(&data, inode->i_sb, xid,
index 288199f0b987df98ba3fab9320523bc16e73092d..608ee05491e262c5cf4555c6b51b364cdc60a03b 100644 (file)
@@ -178,6 +178,7 @@ cifs_chan_skip_or_disable(struct cifs_ses *ses,
                }
 
                ses->chans[chan_index].server = NULL;
+               server->terminate = true;
                spin_unlock(&ses->chan_lock);
 
                /*
@@ -188,14 +189,12 @@ cifs_chan_skip_or_disable(struct cifs_ses *ses,
                 */
                cifs_put_tcp_session(server, from_reconnect);
 
-               server->terminate = true;
                cifs_signal_cifsd_for_reconnect(server, false);
 
                /* mark primary server as needing reconnect */
                pserver = server->primary_server;
                cifs_signal_cifsd_for_reconnect(pserver, false);
 skip_terminate:
-               mutex_unlock(&ses->session_mutex);
                return -EHOSTDOWN;
        }
 
@@ -400,6 +399,15 @@ skip_sess_setup:
                goto out;
        }
 
+       spin_lock(&ses->ses_lock);
+       if (ses->flags & CIFS_SES_FLAG_SCALE_CHANNELS) {
+               spin_unlock(&ses->ses_lock);
+               mutex_unlock(&ses->session_mutex);
+               goto skip_add_channels;
+       }
+       ses->flags |= CIFS_SES_FLAG_SCALE_CHANNELS;
+       spin_unlock(&ses->ses_lock);
+
        if (!rc &&
            (server->capabilities & SMB2_GLOBAL_CAP_MULTI_CHANNEL)) {
                mutex_unlock(&ses->session_mutex);
@@ -411,7 +419,7 @@ skip_sess_setup:
                rc = SMB3_request_interfaces(xid, tcon, false);
                free_xid(xid);
 
-               if (rc == -EOPNOTSUPP) {
+               if (rc == -EOPNOTSUPP && ses->chan_count > 1) {
                        /*
                         * some servers like Azure SMB server do not advertise
                         * that multichannel has been disabled with server
@@ -429,17 +437,22 @@ skip_sess_setup:
                if (ses->chan_max > ses->chan_count &&
                    ses->iface_count &&
                    !SERVER_IS_CHAN(server)) {
-                       if (ses->chan_count == 1)
+                       if (ses->chan_count == 1) {
                                cifs_server_dbg(VFS, "supports multichannel now\n");
+                               queue_delayed_work(cifsiod_wq, &tcon->query_interfaces,
+                                                (SMB_INTERFACE_POLL_INTERVAL * HZ));
+                       }
 
                        cifs_try_adding_channels(ses);
-                       queue_delayed_work(cifsiod_wq, &tcon->query_interfaces,
-                                          (SMB_INTERFACE_POLL_INTERVAL * HZ));
                }
        } else {
                mutex_unlock(&ses->session_mutex);
        }
+
 skip_add_channels:
+       spin_lock(&ses->ses_lock);
+       ses->flags &= ~CIFS_SES_FLAG_SCALE_CHANNELS;
+       spin_unlock(&ses->ses_lock);
 
        if (smb2_command != SMB2_INTERNAL_CMD)
                mod_delayed_work(cifsiod_wq, &server->reconnect, 0);
@@ -2391,8 +2404,13 @@ create_durable_v2_buf(struct cifs_open_parms *oparms)
         */
        buf->dcontext.Timeout = cpu_to_le32(oparms->tcon->handle_timeout);
        buf->dcontext.Flags = cpu_to_le32(SMB2_DHANDLE_FLAG_PERSISTENT);
-       generate_random_uuid(buf->dcontext.CreateGuid);
-       memcpy(pfid->create_guid, buf->dcontext.CreateGuid, 16);
+
+       /* for replay, we should not overwrite the existing create guid */
+       if (!oparms->replay) {
+               generate_random_uuid(buf->dcontext.CreateGuid);
+               memcpy(pfid->create_guid, buf->dcontext.CreateGuid, 16);
+       } else
+               memcpy(buf->dcontext.CreateGuid, pfid->create_guid, 16);
 
        /* SMB2_CREATE_DURABLE_HANDLE_REQUEST is "DH2Q" */
        buf->Name[0] = 'D';
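
In create_durable_v2_buf() above, a fresh CreateGuid is generated only on the first attempt; a replayed create must resend the original GUID so the server can match it to the earlier request. A toy sketch of the preserve-on-replay branch (random_guid() is a hypothetical placeholder for the kernel's generate_random_uuid()):

    #include <stdbool.h>
    #include <stdlib.h>
    #include <string.h>

    /* hypothetical placeholder RNG, not the kernel API */
    static void random_guid(unsigned char guid[16])
    {
            for (int i = 0; i < 16; i++)
                    guid[i] = (unsigned char)rand();
    }

    static void fill_create_guid(bool replay, unsigned char saved[16],
                                 unsigned char out[16])
    {
            if (!replay) {
                    random_guid(out);          /* first attempt: brand-new GUID */
                    memcpy(saved, out, 16);    /* remember it for any replay */
            } else {
                    memcpy(out, saved, 16);    /* replay: resend the same GUID */
            }
    }
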
@@ -2765,7 +2783,14 @@ int smb311_posix_mkdir(const unsigned int xid, struct inode *inode,
        int flags = 0;
        unsigned int total_len;
        __le16 *utf16_path = NULL;
-       struct TCP_Server_Info *server = cifs_pick_channel(ses);
+       struct TCP_Server_Info *server;
+       int retries = 0, cur_sleep = 1;
+
+replay_again:
+       /* reinitialize for possible replay */
+       flags = 0;
+       n_iov = 2;
+       server = cifs_pick_channel(ses);
 
        cifs_dbg(FYI, "mkdir\n");
 
@@ -2869,6 +2894,10 @@ int smb311_posix_mkdir(const unsigned int xid, struct inode *inode,
        /* no need to inc num_remote_opens because we close it just below */
        trace_smb3_posix_mkdir_enter(xid, tcon->tid, ses->Suid, full_path, CREATE_NOT_FILE,
                                    FILE_WRITE_ATTRIBUTES);
+
+       if (retries)
+               smb2_set_replay(server, &rqst);
+
        /* resource #4: response buffer */
        rc = cifs_send_recv(xid, ses, server,
                            &rqst, &resp_buftype, flags, &rsp_iov);
@@ -2906,6 +2935,11 @@ err_free_req:
        cifs_small_buf_release(req);
 err_free_path:
        kfree(utf16_path);
+
+       if (is_replayable_error(rc) &&
+           smb2_should_replay(tcon, &retries, &cur_sleep))
+               goto replay_again;
+
        return rc;
 }
 
@@ -3101,12 +3135,19 @@ SMB2_open(const unsigned int xid, struct cifs_open_parms *oparms, __le16 *path,
        struct smb2_create_rsp *rsp = NULL;
        struct cifs_tcon *tcon = oparms->tcon;
        struct cifs_ses *ses = tcon->ses;
-       struct TCP_Server_Info *server = cifs_pick_channel(ses);
+       struct TCP_Server_Info *server;
        struct kvec iov[SMB2_CREATE_IOV_SIZE];
        struct kvec rsp_iov = {NULL, 0};
        int resp_buftype = CIFS_NO_BUFFER;
        int rc = 0;
        int flags = 0;
+       int retries = 0, cur_sleep = 1;
+
+replay_again:
+       /* reinitialize for possible replay */
+       flags = 0;
+       server = cifs_pick_channel(ses);
+       oparms->replay = !!(retries);
 
        cifs_dbg(FYI, "create/open\n");
        if (!ses || !server)
@@ -3128,6 +3169,9 @@ SMB2_open(const unsigned int xid, struct cifs_open_parms *oparms, __le16 *path,
        trace_smb3_open_enter(xid, tcon->tid, tcon->ses->Suid, oparms->path,
                oparms->create_options, oparms->desired_access);
 
+       if (retries)
+               smb2_set_replay(server, &rqst);
+
        rc = cifs_send_recv(xid, ses, server,
                            &rqst, &resp_buftype, flags,
                            &rsp_iov);
@@ -3181,6 +3225,11 @@ SMB2_open(const unsigned int xid, struct cifs_open_parms *oparms, __le16 *path,
 creat_exit:
        SMB2_open_free(&rqst);
        free_rsp_buf(resp_buftype, rsp);
+
+       if (is_replayable_error(rc) &&
+           smb2_should_replay(tcon, &retries, &cur_sleep))
+               goto replay_again;
+
        return rc;
 }
 
@@ -3305,15 +3354,7 @@ SMB2_ioctl(const unsigned int xid, struct cifs_tcon *tcon, u64 persistent_fid,
        int resp_buftype = CIFS_NO_BUFFER;
        int rc = 0;
        int flags = 0;
-
-       cifs_dbg(FYI, "SMB2 IOCTL\n");
-
-       if (out_data != NULL)
-               *out_data = NULL;
-
-       /* zero out returned data len, in case of error */
-       if (plen)
-               *plen = 0;
+       int retries = 0, cur_sleep = 1;
 
        if (!tcon)
                return -EIO;
@@ -3322,10 +3363,23 @@ SMB2_ioctl(const unsigned int xid, struct cifs_tcon *tcon, u64 persistent_fid,
        if (!ses)
                return -EIO;
 
+replay_again:
+       /* reinitialize for possible replay */
+       flags = 0;
        server = cifs_pick_channel(ses);
+
        if (!server)
                return -EIO;
 
+       cifs_dbg(FYI, "SMB2 IOCTL\n");
+
+       if (out_data != NULL)
+               *out_data = NULL;
+
+       /* zero out returned data len, in case of error */
+       if (plen)
+               *plen = 0;
+
        if (smb3_encryption_required(tcon))
                flags |= CIFS_TRANSFORM_REQ;
 
@@ -3340,6 +3394,9 @@ SMB2_ioctl(const unsigned int xid, struct cifs_tcon *tcon, u64 persistent_fid,
        if (rc)
                goto ioctl_exit;
 
+       if (retries)
+               smb2_set_replay(server, &rqst);
+
        rc = cifs_send_recv(xid, ses, server,
                            &rqst, &resp_buftype, flags,
                            &rsp_iov);
@@ -3409,6 +3466,11 @@ SMB2_ioctl(const unsigned int xid, struct cifs_tcon *tcon, u64 persistent_fid,
 ioctl_exit:
        SMB2_ioctl_free(&rqst);
        free_rsp_buf(resp_buftype, rsp);
+
+       if (is_replayable_error(rc) &&
+           smb2_should_replay(tcon, &retries, &cur_sleep))
+               goto replay_again;
+
        return rc;
 }
 
@@ -3480,13 +3542,20 @@ __SMB2_close(const unsigned int xid, struct cifs_tcon *tcon,
        struct smb_rqst rqst;
        struct smb2_close_rsp *rsp = NULL;
        struct cifs_ses *ses = tcon->ses;
-       struct TCP_Server_Info *server = cifs_pick_channel(ses);
+       struct TCP_Server_Info *server;
        struct kvec iov[1];
        struct kvec rsp_iov;
        int resp_buftype = CIFS_NO_BUFFER;
        int rc = 0;
        int flags = 0;
        bool query_attrs = false;
+       int retries = 0, cur_sleep = 1;
+
+replay_again:
+       /* reinitialize for possible replay */
+       flags = 0;
+       query_attrs = false;
+       server = cifs_pick_channel(ses);
 
        cifs_dbg(FYI, "Close\n");
 
@@ -3512,6 +3581,9 @@ __SMB2_close(const unsigned int xid, struct cifs_tcon *tcon,
        if (rc)
                goto close_exit;
 
+       if (retries)
+               smb2_set_replay(server, &rqst);
+
        rc = cifs_send_recv(xid, ses, server,
                            &rqst, &resp_buftype, flags, &rsp_iov);
        rsp = (struct smb2_close_rsp *)rsp_iov.iov_base;
@@ -3545,6 +3617,11 @@ close_exit:
                        cifs_dbg(VFS, "handle cancelled close fid 0x%llx returned error %d\n",
                                 persistent_fid, tmp_rc);
        }
+
+       if (is_replayable_error(rc) &&
+           smb2_should_replay(tcon, &retries, &cur_sleep))
+               goto replay_again;
+
        return rc;
 }
 
@@ -3675,12 +3752,19 @@ query_info(const unsigned int xid, struct cifs_tcon *tcon,
        struct TCP_Server_Info *server;
        int flags = 0;
        bool allocated = false;
+       int retries = 0, cur_sleep = 1;
 
        cifs_dbg(FYI, "Query Info\n");
 
        if (!ses)
                return -EIO;
+
+replay_again:
+       /* reinitialize for possible replay */
+       flags = 0;
+       allocated = false;
        server = cifs_pick_channel(ses);
+
        if (!server)
                return -EIO;
 
@@ -3702,6 +3786,9 @@ query_info(const unsigned int xid, struct cifs_tcon *tcon,
        trace_smb3_query_info_enter(xid, persistent_fid, tcon->tid,
                                    ses->Suid, info_class, (__u32)info_type);
 
+       if (retries)
+               smb2_set_replay(server, &rqst);
+
        rc = cifs_send_recv(xid, ses, server,
                            &rqst, &resp_buftype, flags, &rsp_iov);
        rsp = (struct smb2_query_info_rsp *)rsp_iov.iov_base;
@@ -3744,6 +3831,11 @@ query_info(const unsigned int xid, struct cifs_tcon *tcon,
 qinf_exit:
        SMB2_query_info_free(&rqst);
        free_rsp_buf(resp_buftype, rsp);
+
+       if (is_replayable_error(rc) &&
+           smb2_should_replay(tcon, &retries, &cur_sleep))
+               goto replay_again;
+
        return rc;
 }
 
@@ -3844,7 +3936,7 @@ SMB2_change_notify(const unsigned int xid, struct cifs_tcon *tcon,
                u32 *plen /* returned data len */)
 {
        struct cifs_ses *ses = tcon->ses;
-       struct TCP_Server_Info *server = cifs_pick_channel(ses);
+       struct TCP_Server_Info *server;
        struct smb_rqst rqst;
        struct smb2_change_notify_rsp *smb_rsp;
        struct kvec iov[1];
@@ -3852,6 +3944,12 @@ SMB2_change_notify(const unsigned int xid, struct cifs_tcon *tcon,
        int resp_buftype = CIFS_NO_BUFFER;
        int flags = 0;
        int rc = 0;
+       int retries = 0, cur_sleep = 1;
+
+replay_again:
+       /* reinitialize for possible replay */
+       flags = 0;
+       server = cifs_pick_channel(ses);
 
        cifs_dbg(FYI, "change notify\n");
        if (!ses || !server)
@@ -3876,6 +3974,10 @@ SMB2_change_notify(const unsigned int xid, struct cifs_tcon *tcon,
 
        trace_smb3_notify_enter(xid, persistent_fid, tcon->tid, ses->Suid,
                                (u8)watch_tree, completion_filter);
+
+       if (retries)
+               smb2_set_replay(server, &rqst);
+
        rc = cifs_send_recv(xid, ses, server,
                            &rqst, &resp_buftype, flags, &rsp_iov);
 
@@ -3910,6 +4012,11 @@ SMB2_change_notify(const unsigned int xid, struct cifs_tcon *tcon,
        if (rqst.rq_iov)
                cifs_small_buf_release(rqst.rq_iov[0].iov_base); /* request */
        free_rsp_buf(resp_buftype, rsp_iov.iov_base);
+
+       if (is_replayable_error(rc) &&
+           smb2_should_replay(tcon, &retries, &cur_sleep))
+               goto replay_again;
+
        return rc;
 }
 
@@ -4152,10 +4259,16 @@ SMB2_flush(const unsigned int xid, struct cifs_tcon *tcon, u64 persistent_fid,
        struct smb_rqst rqst;
        struct kvec iov[1];
        struct kvec rsp_iov = {NULL, 0};
-       struct TCP_Server_Info *server = cifs_pick_channel(ses);
+       struct TCP_Server_Info *server;
        int resp_buftype = CIFS_NO_BUFFER;
        int flags = 0;
        int rc = 0;
+       int retries = 0, cur_sleep = 1;
+
+replay_again:
+       /* reinitialize for possible replay */
+       flags = 0;
+       server = cifs_pick_channel(ses);
 
        cifs_dbg(FYI, "flush\n");
        if (!ses || !(ses->server))
@@ -4175,6 +4288,10 @@ SMB2_flush(const unsigned int xid, struct cifs_tcon *tcon, u64 persistent_fid,
                goto flush_exit;
 
        trace_smb3_flush_enter(xid, persistent_fid, tcon->tid, ses->Suid);
+
+       if (retries)
+               smb2_set_replay(server, &rqst);
+
        rc = cifs_send_recv(xid, ses, server,
                            &rqst, &resp_buftype, flags, &rsp_iov);
 
@@ -4189,6 +4306,11 @@ SMB2_flush(const unsigned int xid, struct cifs_tcon *tcon, u64 persistent_fid,
  flush_exit:
        SMB2_flush_free(&rqst);
        free_rsp_buf(resp_buftype, rsp_iov.iov_base);
+
+       if (is_replayable_error(rc) &&
+           smb2_should_replay(tcon, &retries, &cur_sleep))
+               goto replay_again;
+
        return rc;
 }
 
@@ -4668,7 +4790,7 @@ smb2_async_writev(struct cifs_writedata *wdata,
        struct cifs_io_parms *io_parms = NULL;
        int credit_request;
 
-       if (!wdata->server)
+       if (!wdata->server || wdata->replay)
                server = wdata->server = cifs_pick_channel(tcon->ses);
 
        /*
@@ -4753,6 +4875,8 @@ smb2_async_writev(struct cifs_writedata *wdata,
        rqst.rq_nvec = 1;
        rqst.rq_iter = wdata->iter;
        rqst.rq_iter_size = iov_iter_count(&rqst.rq_iter);
+       if (wdata->replay)
+               smb2_set_replay(server, &rqst);
 #ifdef CONFIG_CIFS_SMB_DIRECT
        if (wdata->mr)
                iov[0].iov_len += sizeof(struct smbd_buffer_descriptor_v1);
@@ -4826,18 +4950,21 @@ SMB2_write(const unsigned int xid, struct cifs_io_parms *io_parms,
        int flags = 0;
        unsigned int total_len;
        struct TCP_Server_Info *server;
+       int retries = 0, cur_sleep = 1;
 
+replay_again:
+       /* reinitialize for possible replay */
+       flags = 0;
        *nbytes = 0;
-
-       if (n_vec < 1)
-               return rc;
-
        if (!io_parms->server)
                io_parms->server = cifs_pick_channel(io_parms->tcon->ses);
        server = io_parms->server;
        if (server == NULL)
                return -ECONNABORTED;
 
+       if (n_vec < 1)
+               return rc;
+
        rc = smb2_plain_req_init(SMB2_WRITE, io_parms->tcon, server,
                                 (void **) &req, &total_len);
        if (rc)
@@ -4871,6 +4998,9 @@ SMB2_write(const unsigned int xid, struct cifs_io_parms *io_parms,
        rqst.rq_iov = iov;
        rqst.rq_nvec = n_vec + 1;
 
+       if (retries)
+               smb2_set_replay(server, &rqst);
+
        rc = cifs_send_recv(xid, io_parms->tcon->ses, server,
                            &rqst,
                            &resp_buftype, flags, &rsp_iov);
@@ -4895,6 +5025,11 @@ SMB2_write(const unsigned int xid, struct cifs_io_parms *io_parms,
 
        cifs_small_buf_release(req);
        free_rsp_buf(resp_buftype, rsp);
+
+       if (is_replayable_error(rc) &&
+           smb2_should_replay(io_parms->tcon, &retries, &cur_sleep))
+               goto replay_again;
+
        return rc;
 }
 
@@ -5077,6 +5212,9 @@ int SMB2_query_directory_init(const unsigned int xid,
        case SMB_FIND_FILE_POSIX_INFO:
                req->FileInformationClass = SMB_FIND_FILE_POSIX_INFO;
                break;
+       case SMB_FIND_FILE_FULL_DIRECTORY_INFO:
+               req->FileInformationClass = FILE_FULL_DIRECTORY_INFORMATION;
+               break;
        default:
                cifs_tcon_dbg(VFS, "info level %u isn't supported\n",
                        info_level);
@@ -5146,6 +5284,9 @@ smb2_parse_query_directory(struct cifs_tcon *tcon,
                /* note that posix payload are variable size */
                info_buf_size = sizeof(struct smb2_posix_info);
                break;
+       case SMB_FIND_FILE_FULL_DIRECTORY_INFO:
+               info_buf_size = sizeof(FILE_FULL_DIRECTORY_INFO);
+               break;
        default:
                cifs_tcon_dbg(VFS, "info level %u isn't supported\n",
                         srch_inf->info_level);
@@ -5206,8 +5347,14 @@ SMB2_query_directory(const unsigned int xid, struct cifs_tcon *tcon,
        struct kvec rsp_iov;
        int rc = 0;
        struct cifs_ses *ses = tcon->ses;
-       struct TCP_Server_Info *server = cifs_pick_channel(ses);
+       struct TCP_Server_Info *server;
        int flags = 0;
+       int retries = 0, cur_sleep = 1;
+
+replay_again:
+       /* reinitialize for possible replay */
+       flags = 0;
+       server = cifs_pick_channel(ses);
 
        if (!ses || !(ses->server))
                return -EIO;
@@ -5227,6 +5374,9 @@ SMB2_query_directory(const unsigned int xid, struct cifs_tcon *tcon,
        if (rc)
                goto qdir_exit;
 
+       if (retries)
+               smb2_set_replay(server, &rqst);
+
        rc = cifs_send_recv(xid, ses, server,
                            &rqst, &resp_buftype, flags, &rsp_iov);
        rsp = (struct smb2_query_directory_rsp *)rsp_iov.iov_base;
@@ -5261,6 +5411,11 @@ SMB2_query_directory(const unsigned int xid, struct cifs_tcon *tcon,
 qdir_exit:
        SMB2_query_directory_free(&rqst);
        free_rsp_buf(resp_buftype, rsp);
+
+       if (is_replayable_error(rc) &&
+           smb2_should_replay(tcon, &retries, &cur_sleep))
+               goto replay_again;
+
        return rc;
 }
 
@@ -5327,8 +5482,14 @@ send_set_info(const unsigned int xid, struct cifs_tcon *tcon,
        int rc = 0;
        int resp_buftype;
        struct cifs_ses *ses = tcon->ses;
-       struct TCP_Server_Info *server = cifs_pick_channel(ses);
+       struct TCP_Server_Info *server;
        int flags = 0;
+       int retries = 0, cur_sleep = 1;
+
+replay_again:
+       /* reinitialize for possible replay */
+       flags = 0;
+       server = cifs_pick_channel(ses);
 
        if (!ses || !server)
                return -EIO;
@@ -5356,6 +5517,8 @@ send_set_info(const unsigned int xid, struct cifs_tcon *tcon,
                return rc;
        }
 
+       if (retries)
+               smb2_set_replay(server, &rqst);
 
        rc = cifs_send_recv(xid, ses, server,
                            &rqst, &resp_buftype, flags,
@@ -5371,6 +5534,11 @@ send_set_info(const unsigned int xid, struct cifs_tcon *tcon,
 
        free_rsp_buf(resp_buftype, rsp);
        kfree(iov);
+
+       if (is_replayable_error(rc) &&
+           smb2_should_replay(tcon, &retries, &cur_sleep))
+               goto replay_again;
+
        return rc;
 }
 
@@ -5423,12 +5591,18 @@ SMB2_oplock_break(const unsigned int xid, struct cifs_tcon *tcon,
        int rc;
        struct smb2_oplock_break *req = NULL;
        struct cifs_ses *ses = tcon->ses;
-       struct TCP_Server_Info *server = cifs_pick_channel(ses);
+       struct TCP_Server_Info *server;
        int flags = CIFS_OBREAK_OP;
        unsigned int total_len;
        struct kvec iov[1];
        struct kvec rsp_iov;
        int resp_buf_type;
+       int retries = 0, cur_sleep = 1;
+
+replay_again:
+       /* reinitialize for possible replay */
+       flags = CIFS_OBREAK_OP;
+       server = cifs_pick_channel(ses);
 
        cifs_dbg(FYI, "SMB2_oplock_break\n");
        rc = smb2_plain_req_init(SMB2_OPLOCK_BREAK, tcon, server,
@@ -5453,15 +5627,21 @@ SMB2_oplock_break(const unsigned int xid, struct cifs_tcon *tcon,
        rqst.rq_iov = iov;
        rqst.rq_nvec = 1;
 
+       if (retries)
+               smb2_set_replay(server, &rqst);
+
        rc = cifs_send_recv(xid, ses, server,
                            &rqst, &resp_buf_type, flags, &rsp_iov);
        cifs_small_buf_release(req);
-
        if (rc) {
                cifs_stats_fail_inc(tcon, SMB2_OPLOCK_BREAK_HE);
                cifs_dbg(FYI, "Send error in Oplock Break = %d\n", rc);
        }
 
+       if (is_replayable_error(rc) &&
+           smb2_should_replay(tcon, &retries, &cur_sleep))
+               goto replay_again;
+
        return rc;
 }
 
@@ -5547,9 +5727,15 @@ SMB311_posix_qfs_info(const unsigned int xid, struct cifs_tcon *tcon,
        int rc = 0;
        int resp_buftype;
        struct cifs_ses *ses = tcon->ses;
-       struct TCP_Server_Info *server = cifs_pick_channel(ses);
+       struct TCP_Server_Info *server;
        FILE_SYSTEM_POSIX_INFO *info = NULL;
        int flags = 0;
+       int retries = 0, cur_sleep = 1;
+
+replay_again:
+       /* reinitialize for possible replay */
+       flags = 0;
+       server = cifs_pick_channel(ses);
 
        rc = build_qfs_info_req(&iov, tcon, server,
                                FS_POSIX_INFORMATION,
@@ -5565,6 +5751,9 @@ SMB311_posix_qfs_info(const unsigned int xid, struct cifs_tcon *tcon,
        rqst.rq_iov = &iov;
        rqst.rq_nvec = 1;
 
+       if (retries)
+               smb2_set_replay(server, &rqst);
+
        rc = cifs_send_recv(xid, ses, server,
                            &rqst, &resp_buftype, flags, &rsp_iov);
        free_qfs_info_req(&iov);
@@ -5584,6 +5773,11 @@ SMB311_posix_qfs_info(const unsigned int xid, struct cifs_tcon *tcon,
 
 posix_qfsinf_exit:
        free_rsp_buf(resp_buftype, rsp_iov.iov_base);
+
+       if (is_replayable_error(rc) &&
+           smb2_should_replay(tcon, &retries, &cur_sleep))
+               goto replay_again;
+
        return rc;
 }
 
@@ -5598,9 +5792,15 @@ SMB2_QFS_info(const unsigned int xid, struct cifs_tcon *tcon,
        int rc = 0;
        int resp_buftype;
        struct cifs_ses *ses = tcon->ses;
-       struct TCP_Server_Info *server = cifs_pick_channel(ses);
+       struct TCP_Server_Info *server;
        struct smb2_fs_full_size_info *info = NULL;
        int flags = 0;
+       int retries = 0, cur_sleep = 1;
+
+replay_again:
+       /* reinitialize for possible replay */
+       flags = 0;
+       server = cifs_pick_channel(ses);
 
        rc = build_qfs_info_req(&iov, tcon, server,
                                FS_FULL_SIZE_INFORMATION,
@@ -5616,6 +5816,9 @@ SMB2_QFS_info(const unsigned int xid, struct cifs_tcon *tcon,
        rqst.rq_iov = &iov;
        rqst.rq_nvec = 1;
 
+       if (retries)
+               smb2_set_replay(server, &rqst);
+
        rc = cifs_send_recv(xid, ses, server,
                            &rqst, &resp_buftype, flags, &rsp_iov);
        free_qfs_info_req(&iov);
@@ -5635,6 +5838,11 @@ SMB2_QFS_info(const unsigned int xid, struct cifs_tcon *tcon,
 
 qfsinf_exit:
        free_rsp_buf(resp_buftype, rsp_iov.iov_base);
+
+       if (is_replayable_error(rc) &&
+           smb2_should_replay(tcon, &retries, &cur_sleep))
+               goto replay_again;
+
        return rc;
 }
 
@@ -5649,9 +5857,15 @@ SMB2_QFS_attr(const unsigned int xid, struct cifs_tcon *tcon,
        int rc = 0;
        int resp_buftype, max_len, min_len;
        struct cifs_ses *ses = tcon->ses;
-       struct TCP_Server_Info *server = cifs_pick_channel(ses);
+       struct TCP_Server_Info *server;
        unsigned int rsp_len, offset;
        int flags = 0;
+       int retries = 0, cur_sleep = 1;
+
+replay_again:
+       /* reinitialize for possible replay */
+       flags = 0;
+       server = cifs_pick_channel(ses);
 
        if (level == FS_DEVICE_INFORMATION) {
                max_len = sizeof(FILE_SYSTEM_DEVICE_INFO);
@@ -5683,6 +5897,9 @@ SMB2_QFS_attr(const unsigned int xid, struct cifs_tcon *tcon,
        rqst.rq_iov = &iov;
        rqst.rq_nvec = 1;
 
+       if (retries)
+               smb2_set_replay(server, &rqst);
+
        rc = cifs_send_recv(xid, ses, server,
                            &rqst, &resp_buftype, flags, &rsp_iov);
        free_qfs_info_req(&iov);
@@ -5720,6 +5937,11 @@ SMB2_QFS_attr(const unsigned int xid, struct cifs_tcon *tcon,
 
 qfsattr_exit:
        free_rsp_buf(resp_buftype, rsp_iov.iov_base);
+
+       if (is_replayable_error(rc) &&
+           smb2_should_replay(tcon, &retries, &cur_sleep))
+               goto replay_again;
+
        return rc;
 }
 
@@ -5737,7 +5959,13 @@ smb2_lockv(const unsigned int xid, struct cifs_tcon *tcon,
        unsigned int count;
        int flags = CIFS_NO_RSP_BUF;
        unsigned int total_len;
-       struct TCP_Server_Info *server = cifs_pick_channel(tcon->ses);
+       struct TCP_Server_Info *server;
+       int retries = 0, cur_sleep = 1;
+
+replay_again:
+       /* reinitialize for possible replay */
+       flags = CIFS_NO_RSP_BUF;
+       server = cifs_pick_channel(tcon->ses);
 
        cifs_dbg(FYI, "smb2_lockv num lock %d\n", num_lock);
 
@@ -5768,6 +5996,9 @@ smb2_lockv(const unsigned int xid, struct cifs_tcon *tcon,
        rqst.rq_iov = iov;
        rqst.rq_nvec = 2;
 
+       if (retries)
+               smb2_set_replay(server, &rqst);
+
        rc = cifs_send_recv(xid, tcon->ses, server,
                            &rqst, &resp_buf_type, flags,
                            &rsp_iov);
@@ -5779,6 +6010,10 @@ smb2_lockv(const unsigned int xid, struct cifs_tcon *tcon,
                                    tcon->ses->Suid, rc);
        }
 
+       if (is_replayable_error(rc) &&
+           smb2_should_replay(tcon, &retries, &cur_sleep))
+               goto replay_again;
+
        return rc;
 }
 
index 0034b537b0b3f9dd057ce5183f05b6fedb147f77..b3069911e9dd8f51ea38ea54da740049696d18e6 100644 (file)
@@ -122,6 +122,11 @@ extern unsigned long smb_rqst_len(struct TCP_Server_Info *server,
 extern void smb2_set_next_command(struct cifs_tcon *tcon,
                                  struct smb_rqst *rqst);
 extern void smb2_set_related(struct smb_rqst *rqst);
+extern void smb2_set_replay(struct TCP_Server_Info *server,
+                           struct smb_rqst *rqst);
+extern bool smb2_should_replay(struct cifs_tcon *tcon,
+                         int *pretries,
+                         int *pcur_sleep);
 
 /*
  * SMB2 Worker functions - most of protocol specific implementation details
index f0ce26414f17377365ed0201f21dd4e9cdf06b59..1d1ee9f18f373501f781447f82b494857dd8e9f3 100644 (file)
 #include "cifsproto.h"
 #include "../common/md4.h"
 
-#ifndef false
-#define false 0
-#endif
-#ifndef true
-#define true 1
-#endif
-
 /* following came from the other byteorder.h to avoid include conflicts */
 #define CVAL(buf,pos) (((unsigned char *)(buf))[pos])
 #define SSVALX(buf,pos,val) (CVAL(buf,pos)=(val)&0xFF,CVAL(buf,pos+1)=(val)>>8)
index 4f717ad7c21b424d45f785fdbb94be941c1d7f14..994d70193432978de213a19a0f9933bd90e63671 100644 (file)
@@ -400,10 +400,17 @@ unmask:
                                                  server->conn_id, server->hostname);
        }
 smbd_done:
-       if (rc < 0 && rc != -EINTR)
+       /*
+        * The layers above have hardly any use for the actual error
+        * code here. All they can do at this point is retry the
+        * connection and hope the error goes away.
+        */
+       if (rc < 0 && rc != -EINTR && rc != -EAGAIN) {
                cifs_server_dbg(VFS, "Error %d sending data on socket to server\n",
                         rc);
-       else if (rc > 0)
+               rc = -ECONNABORTED;
+               cifs_signal_cifsd_for_reconnect(server, false);
+       } else if (rc > 0)
                rc = 0;
 out:
        cifs_in_send_dec(server);
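
The smbd_done hunk above stops leaking raw socket errors upward: anything other than -EINTR or -EAGAIN is logged, normalized to -ECONNABORTED, and cifsd is signalled to reconnect, while a positive bytes-sent count collapses to success. A sketch of that normalization:

    #include <errno.h>
    #include <stdio.h>

    static int normalize_send_error(int rc)
    {
            if (rc < 0 && rc != -EINTR && rc != -EAGAIN)
                    return -ECONNABORTED;    /* hide the raw socket errno */
            if (rc > 0)
                    return 0;                /* bytes-sent count means success */
            return rc;
    }

    int main(void)
    {
            printf("%d %d %d\n",
                   normalize_send_error(-EPIPE),     /* -> -ECONNABORTED */
                   normalize_send_error(-EAGAIN),    /* passed through */
                   normalize_send_error(42));        /* -> 0 */
            return 0;
    }
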
@@ -428,8 +435,8 @@ smb_send_rqst(struct TCP_Server_Info *server, int num_rqst,
        if (!(flags & CIFS_TRANSFORM_REQ))
                return __smb_send_rqst(server, num_rqst, rqst);
 
-       if (num_rqst > MAX_COMPOUND - 1)
-               return -ENOMEM;
+       if (WARN_ON_ONCE(num_rqst > MAX_COMPOUND - 1))
+               return -EIO;
 
        if (!server->ops->init_transform_rq) {
                cifs_server_dbg(VFS, "Encryption requested but transform callback is missing\n");
@@ -1026,6 +1033,9 @@ struct TCP_Server_Info *cifs_pick_channel(struct cifs_ses *ses)
                if (!server || server->terminate)
                        continue;
 
+               if (CIFS_CHAN_NEEDS_RECONNECT(ses, i))
+                       continue;
+
                /*
                 * strictly speaking, we should pick up req_lock to read
                 * server->in_flight. But it shouldn't matter much here if we
index b7521e41402e003a3fd7e121c6003cf7edf23539..0ebf91ffa2361c0940aba0fc301d1a65bf1612e5 100644 (file)
@@ -304,7 +304,8 @@ enum ksmbd_event {
        KSMBD_EVENT_SPNEGO_AUTHEN_REQUEST,
        KSMBD_EVENT_SPNEGO_AUTHEN_RESPONSE      = 15,
 
-       KSMBD_EVENT_MAX
+       __KSMBD_EVENT_MAX,
+       KSMBD_EVENT_MAX = __KSMBD_EVENT_MAX - 1
 };
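
The enum change above is the standard sentinel pattern: a private __KSMBD_EVENT_MAX marks one-past-the-end, and the public KSMBD_EVENT_MAX now names the last valid event, so bounds checks read naturally as '<= MAX'. The pattern in miniature:

    #include <stdbool.h>

    enum toy_event {
            TOY_EVENT_A,
            TOY_EVENT_B,
            TOY_EVENT_C,

            __TOY_EVENT_MAX,                        /* internal sentinel */
            TOY_EVENT_MAX = __TOY_EVENT_MAX - 1     /* last valid value */
    };

    static bool toy_event_valid(unsigned int ev)
    {
            return ev <= TOY_EVENT_MAX;
    }
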
 
 /*
index 9e8afaa686e3aa8c12e908348aa38b34168ec367..1a5faa6f6e7bc3ddb96bdaa1ce953ba06f3bf5a2 100644 (file)
@@ -261,6 +261,7 @@ out_ascii:
 
 /**
  * ksmbd_extract_sharename() - get share name from tree connect request
+ * @um: pointer to a unicode_map structure for character encoding handling
  * @treename:  buffer containing tree name and share name
  *
  * Return:      share name on success, otherwise error
index ba7a72a6a4f45f6b756768c4a3a48e19d74e3683..089527a8b4ff42211c7bccdd4aa8adba9b984436 100644 (file)
@@ -6173,8 +6173,10 @@ static noinline int smb2_read_pipe(struct ksmbd_work *work)
                err = ksmbd_iov_pin_rsp_read(work, (void *)rsp,
                                             offsetof(struct smb2_read_rsp, Buffer),
                                             aux_payload_buf, nbytes);
-               if (err)
+               if (err) {
+                       kvfree(aux_payload_buf);
                        goto out;
+               }
                kvfree(rpc_resp);
        } else {
                err = ksmbd_iov_pin_rsp(work, (void *)rsp,
@@ -6384,8 +6386,10 @@ int smb2_read(struct ksmbd_work *work)
        err = ksmbd_iov_pin_rsp_read(work, (void *)rsp,
                                     offsetof(struct smb2_read_rsp, Buffer),
                                     aux_payload_buf, nbytes);
-       if (err)
+       if (err) {
+               kvfree(aux_payload_buf);
                goto out;
+       }
        ksmbd_fd_put(work, fp);
        return 0;
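
Both smb2_read hunks above plug the same leak: if ksmbd_iov_pin_rsp_read() fails, the aux payload buffer was never handed off to the response machinery and must be freed on the error path. A hedged sketch of that ownership rule (pin_rsp() is a toy stand-in that fails for oversized reads):

    #include <stdlib.h>

    /* toy stand-in: pretend pinning fails for oversized reads (assumption) */
    static int pin_rsp(void *buf, size_t n)
    {
            (void)buf;
            return n > 4096 ? -1 : 0;
    }

    int read_pipe(size_t nbytes)
    {
            void *aux = malloc(nbytes);
            int err;

            if (!aux)
                    return -1;

            err = pin_rsp(aux, nbytes);
            if (err) {
                    free(aux);    /* pin failed: we still own the buffer */
                    return err;
            }
            /* success: ownership moved; the response path frees it later */
            return 0;
    }
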
 
@@ -6760,10 +6764,10 @@ struct file_lock *smb_flock_init(struct file *f)
 
        locks_init_lock(fl);
 
-       fl->fl_owner = f;
-       fl->fl_pid = current->tgid;
-       fl->fl_file = f;
-       fl->fl_flags = FL_POSIX;
+       fl->c.flc_owner = f;
+       fl->c.flc_pid = current->tgid;
+       fl->c.flc_file = f;
+       fl->c.flc_flags = FL_POSIX;
        fl->fl_ops = NULL;
        fl->fl_lmops = NULL;
 
@@ -6780,30 +6784,30 @@ static int smb2_set_flock_flags(struct file_lock *flock, int flags)
        case SMB2_LOCKFLAG_SHARED:
                ksmbd_debug(SMB, "received shared request\n");
                cmd = F_SETLKW;
-               flock->fl_type = F_RDLCK;
-               flock->fl_flags |= FL_SLEEP;
+               flock->c.flc_type = F_RDLCK;
+               flock->c.flc_flags |= FL_SLEEP;
                break;
        case SMB2_LOCKFLAG_EXCLUSIVE:
                ksmbd_debug(SMB, "received exclusive request\n");
                cmd = F_SETLKW;
-               flock->fl_type = F_WRLCK;
-               flock->fl_flags |= FL_SLEEP;
+               flock->c.flc_type = F_WRLCK;
+               flock->c.flc_flags |= FL_SLEEP;
                break;
        case SMB2_LOCKFLAG_SHARED | SMB2_LOCKFLAG_FAIL_IMMEDIATELY:
                ksmbd_debug(SMB,
                            "received shared & fail immediately request\n");
                cmd = F_SETLK;
-               flock->fl_type = F_RDLCK;
+               flock->c.flc_type = F_RDLCK;
                break;
        case SMB2_LOCKFLAG_EXCLUSIVE | SMB2_LOCKFLAG_FAIL_IMMEDIATELY:
                ksmbd_debug(SMB,
                            "received exclusive & fail immediately request\n");
                cmd = F_SETLK;
-               flock->fl_type = F_WRLCK;
+               flock->c.flc_type = F_WRLCK;
                break;
        case SMB2_LOCKFLAG_UNLOCK:
                ksmbd_debug(SMB, "received unlock request\n");
-               flock->fl_type = F_UNLCK;
+               flock->c.flc_type = F_UNLCK;
                cmd = F_SETLK;
                break;
        }
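
These hunks are ksmbd's side of the 6.9 file-locking split: the fields shared between byte-range locks and leases (owner, pid, file, flags, type, plus the wait queue and blocker seen below) move into an embedded struct file_lock_core, renamed with an flc_ prefix and reached through the 'c' member, while range fields such as fl_start/fl_end stay in struct file_lock. Direct fl_type comparisons give way to helpers like lock_is_unlock(), lock_is_read() and lock_is_write(). A simplified picture of the layout, illustrative rather than the full fs/locks.c definition:

#include <fcntl.h>

struct file_lock_core {
	void		*flc_owner;
	int		 flc_pid;
	unsigned int	 flc_flags;
	unsigned char	 flc_type;	/* F_RDLCK / F_WRLCK / F_UNLCK */
};

struct file_lock {
	struct file_lock_core c;	/* part shared with leases */
	long long fl_start, fl_end;	/* byte-range part stays here */
};

static inline int lock_is_unlock(const struct file_lock *fl)
{
	return fl->c.flc_type == F_UNLCK;
}
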
@@ -6841,13 +6845,13 @@ static void smb2_remove_blocked_lock(void **argv)
        struct file_lock *flock = (struct file_lock *)argv[0];
 
        ksmbd_vfs_posix_lock_unblock(flock);
-       wake_up(&flock->fl_wait);
+       locks_wake_up(flock);
 }
 
 static inline bool lock_defer_pending(struct file_lock *fl)
 {
        /* check pending lock waiters */
-       return waitqueue_active(&fl->fl_wait);
+       return waitqueue_active(&fl->c.flc_wait);
 }
 
 /**
@@ -6938,8 +6942,8 @@ int smb2_lock(struct ksmbd_work *work)
                list_for_each_entry(cmp_lock, &lock_list, llist) {
                        if (cmp_lock->fl->fl_start <= flock->fl_start &&
                            cmp_lock->fl->fl_end >= flock->fl_end) {
-                               if (cmp_lock->fl->fl_type != F_UNLCK &&
-                                   flock->fl_type != F_UNLCK) {
+                               if (cmp_lock->fl->c.flc_type != F_UNLCK &&
+                                   flock->c.flc_type != F_UNLCK) {
                                        pr_err("conflict two locks in one request\n");
                                        err = -EINVAL;
                                        locks_free_lock(flock);
@@ -6987,12 +6991,12 @@ int smb2_lock(struct ksmbd_work *work)
                list_for_each_entry(conn, &conn_list, conns_list) {
                        spin_lock(&conn->llist_lock);
                        list_for_each_entry_safe(cmp_lock, tmp2, &conn->lock_list, clist) {
-                               if (file_inode(cmp_lock->fl->fl_file) !=
-                                   file_inode(smb_lock->fl->fl_file))
+                               if (file_inode(cmp_lock->fl->c.flc_file) !=
+                                   file_inode(smb_lock->fl->c.flc_file))
                                        continue;
 
-                               if (smb_lock->fl->fl_type == F_UNLCK) {
-                                       if (cmp_lock->fl->fl_file == smb_lock->fl->fl_file &&
+                               if (lock_is_unlock(smb_lock->fl)) {
+                                       if (cmp_lock->fl->c.flc_file == smb_lock->fl->c.flc_file &&
                                            cmp_lock->start == smb_lock->start &&
                                            cmp_lock->end == smb_lock->end &&
                                            !lock_defer_pending(cmp_lock->fl)) {
@@ -7009,7 +7013,7 @@ int smb2_lock(struct ksmbd_work *work)
                                        continue;
                                }
 
-                               if (cmp_lock->fl->fl_file == smb_lock->fl->fl_file) {
+                               if (cmp_lock->fl->c.flc_file == smb_lock->fl->c.flc_file) {
                                        if (smb_lock->flags & SMB2_LOCKFLAG_SHARED)
                                                continue;
                                } else {
@@ -7051,7 +7055,7 @@ int smb2_lock(struct ksmbd_work *work)
                }
                up_read(&conn_list_lock);
 out_check_cl:
-               if (smb_lock->fl->fl_type == F_UNLCK && nolock) {
+               if (lock_is_unlock(smb_lock->fl) && nolock) {
                        pr_err("Try to unlock nolocked range\n");
                        rsp->hdr.Status = STATUS_RANGE_NOT_LOCKED;
                        goto out;
@@ -7175,7 +7179,7 @@ out:
                struct file_lock *rlock = NULL;
 
                rlock = smb_flock_init(filp);
-               rlock->fl_type = F_UNLCK;
+               rlock->c.flc_type = F_UNLCK;
                rlock->fl_start = smb_lock->start;
                rlock->fl_end = smb_lock->end;
 
index b49d47bdafc945e31bdfa8d7b9f9931752c4d17c..f29bb03f0dc47bfcb0fe3fc5c5acff16d5a314a8 100644
@@ -74,7 +74,7 @@ static int handle_unsupported_event(struct sk_buff *skb, struct genl_info *info)
 static int handle_generic_event(struct sk_buff *skb, struct genl_info *info);
 static int ksmbd_ipc_heartbeat_request(void);
 
-static const struct nla_policy ksmbd_nl_policy[KSMBD_EVENT_MAX] = {
+static const struct nla_policy ksmbd_nl_policy[KSMBD_EVENT_MAX + 1] = {
        [KSMBD_EVENT_UNSPEC] = {
                .len = 0,
        },
@@ -403,7 +403,7 @@ static int handle_generic_event(struct sk_buff *skb, struct genl_info *info)
                return -EPERM;
 #endif
 
-       if (type >= KSMBD_EVENT_MAX) {
+       if (type > KSMBD_EVENT_MAX) {
                WARN_ON(1);
                return -EINVAL;
        }
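
These two hunks complete the KSMBD_EVENT_MAX change from earlier: with MAX now the highest valid type, the policy table needs KSMBD_EVENT_MAX + 1 entries and the type check must reject only values strictly greater than MAX. That keeps the array in step with how netlink walks policy tables and removes the off-by-one the old count-style definition invited.
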
index 9d4222154dcc0c92201a0d7a6e4dac77e0eea37b..002a3f0dc7c5880b61045cf7f10f7e078b85d6a9 100644
@@ -365,6 +365,7 @@ static int ksmbd_tcp_readv(struct tcp_transport *t, struct kvec *iov_orig,
  * @t:         TCP transport instance
  * @buf:       buffer to store read data from socket
  * @to_read:   number of bytes to read from socket
+ * @max_retries: number of retries if reading from socket fails
  *
  * Return:     on success return number of bytes read from socket,
  *             otherwise return error number
@@ -416,6 +417,7 @@ static void tcp_destroy_socket(struct socket *ksmbd_socket)
 
 /**
  * create_socket - create socket for ksmbd/0
+ * @iface:      interface to bind the created socket to
  *
  * Return:     0 on success, error number otherwise
  */
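
The @um, @max_retries and @iface additions in these ksmbd files simply complete the kernel-doc for the helpers: a documented function is expected to list every parameter, and missing entries are the kind of thing scripts/kernel-doc flags in a W=1 build.
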
index a6961bfe3e139467296cf55798fe8f943aae9432..c487e834331aa61962b22d94c677289ec6e72067 100644
@@ -337,18 +337,18 @@ static int check_lock_range(struct file *filp, loff_t start, loff_t end,
                return 0;
 
        spin_lock(&ctx->flc_lock);
-       list_for_each_entry(flock, &ctx->flc_posix, fl_list) {
+       for_each_file_lock(flock, &ctx->flc_posix) {
                /* check conflict locks */
                if (flock->fl_end >= start && end >= flock->fl_start) {
-                       if (flock->fl_type == F_RDLCK) {
+                       if (lock_is_read(flock)) {
                                if (type == WRITE) {
                                        pr_err("not allow write by shared lock\n");
                                        error = 1;
                                        goto out;
                                }
-                       } else if (flock->fl_type == F_WRLCK) {
+                       } else if (lock_is_write(flock)) {
                                /* check owner in lock */
-                               if (flock->fl_file != filp) {
+                               if (flock->c.flc_file != filp) {
                                        error = 1;
                                        pr_err("not allow rw access by exclusive lock from other opens\n");
                                        goto out;
@@ -1837,13 +1837,13 @@ int ksmbd_vfs_copy_file_ranges(struct ksmbd_work *work,
 
 void ksmbd_vfs_posix_lock_wait(struct file_lock *flock)
 {
-       wait_event(flock->fl_wait, !flock->fl_blocker);
+       wait_event(flock->c.flc_wait, !flock->c.flc_blocker);
 }
 
 int ksmbd_vfs_posix_lock_wait_timeout(struct file_lock *flock, long timeout)
 {
-       return wait_event_interruptible_timeout(flock->fl_wait,
-                                               !flock->fl_blocker,
+       return wait_event_interruptible_timeout(flock->c.flc_wait,
+                                               !flock->c.flc_blocker,
                                                timeout);
 }
 
index d35e852954892dadcf1df6757c8b491904d2edbb..ee05ab6b37e769a16a3b1e40cd820c17e0b76d85 100644
@@ -274,9 +274,10 @@ static void destroy_super_work(struct work_struct *work)
 {
        struct super_block *s = container_of(work, struct super_block,
                                                        destroy_work);
-       int i;
-
-       for (i = 0; i < SB_FREEZE_LEVELS; i++)
+       security_sb_free(s);
+       put_user_ns(s->s_user_ns);
+       kfree(s->s_subtype);
+       for (int i = 0; i < SB_FREEZE_LEVELS; i++)
                percpu_free_rwsem(&s->s_writers.rw_sem[i]);
        kfree(s);
 }
@@ -296,9 +297,6 @@ static void destroy_unused_super(struct super_block *s)
        super_unlock_excl(s);
        list_lru_destroy(&s->s_dentry_lru);
        list_lru_destroy(&s->s_inode_lru);
-       security_sb_free(s);
-       put_user_ns(s->s_user_ns);
-       kfree(s->s_subtype);
        shrinker_free(s->s_shrink);
        /* no delays needed */
        destroy_super_work(&s->destroy_work);
@@ -409,9 +407,6 @@ static void __put_super(struct super_block *s)
                WARN_ON(s->s_dentry_lru.node);
                WARN_ON(s->s_inode_lru.node);
                WARN_ON(!list_empty(&s->s_mounts));
-               security_sb_free(s);
-               put_user_ns(s->s_user_ns);
-               kfree(s->s_subtype);
                call_rcu(&s->rcu, destroy_super_rcu);
        }
 }
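
Moving security_sb_free(), put_user_ns() and the s_subtype free out of destroy_unused_super() and __put_super() into destroy_super_work() means none of them are released before the super_block itself is torn down. That looks like a lifetime fix rather than a cleanup: anything that can still reach the superblock late (RCU pathwalk dereferencing ->s_user_ns, for example) no longer races against these frees.
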
@@ -1532,16 +1527,16 @@ int setup_bdev_super(struct super_block *sb, int sb_flags,
                struct fs_context *fc)
 {
        blk_mode_t mode = sb_open_mode(sb_flags);
-       struct bdev_handle *bdev_handle;
+       struct file *bdev_file;
        struct block_device *bdev;
 
-       bdev_handle = bdev_open_by_dev(sb->s_dev, mode, sb, &fs_holder_ops);
-       if (IS_ERR(bdev_handle)) {
+       bdev_file = bdev_file_open_by_dev(sb->s_dev, mode, sb, &fs_holder_ops);
+       if (IS_ERR(bdev_file)) {
                if (fc)
                        errorf(fc, "%s: Can't open blockdev", fc->source);
-               return PTR_ERR(bdev_handle);
+               return PTR_ERR(bdev_file);
        }
-       bdev = bdev_handle->bdev;
+       bdev = file_bdev(bdev_file);
 
        /*
         * This really should be in blkdev_get_by_dev, but right now can't due
@@ -1549,7 +1544,7 @@ int setup_bdev_super(struct super_block *sb, int sb_flags,
         * writable from userspace even for a read-only block device.
         */
        if ((mode & BLK_OPEN_WRITE) && bdev_read_only(bdev)) {
-               bdev_release(bdev_handle);
+               fput(bdev_file);
                return -EACCES;
        }
 
@@ -1560,11 +1555,11 @@ int setup_bdev_super(struct super_block *sb, int sb_flags,
        if (atomic_read(&bdev->bd_fsfreeze_count) > 0) {
                if (fc)
                        warnf(fc, "%pg: Can't mount, blockdev is frozen", bdev);
-               bdev_release(bdev_handle);
+               fput(bdev_file);
                return -EBUSY;
        }
        spin_lock(&sb_lock);
-       sb->s_bdev_handle = bdev_handle;
+       sb->s_bdev_file = bdev_file;
        sb->s_bdev = bdev;
        sb->s_bdi = bdi_get(bdev->bd_disk->bdi);
        if (bdev_stable_writes(bdev))
@@ -1680,7 +1675,7 @@ void kill_block_super(struct super_block *sb)
        generic_shutdown_super(sb);
        if (bdev) {
                sync_blockdev(bdev);
-               bdev_release(sb->s_bdev_handle);
+               fput(sb->s_bdev_file);
        }
 }
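
This is the fs/super.c side of the 6.9 move away from struct bdev_handle: block devices are now opened as ordinary struct file objects via bdev_file_open_by_dev(), the underlying block_device is recovered with file_bdev(), and release is a plain fput(). The superblock field is renamed from s_bdev_handle to s_bdev_file to match.
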
 
index 5a915b2e68f5e48769bab76b315913303014d8b4..76bc2d5e75a960ae4610f1cb9f00b9ece914ce83 100644
@@ -336,7 +336,7 @@ int __init sysv_init_icache(void)
 {
        sysv_inode_cachep = kmem_cache_create("sysv_inode_cache",
                        sizeof(struct sysv_inode_info), 0,
-                       SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|SLAB_ACCOUNT,
+                       SLAB_RECLAIM_ACCOUNT|SLAB_ACCOUNT,
                        init_once);
        if (!sysv_inode_cachep)
                return -ENOMEM;
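
SLAB_MEM_SPREAD has been a no-op for a while, and 6.9 strips the flag from kmem_cache_create() callers tree-wide; dropping it here changes no behavior.
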
index 410ab2a44d2f604d22d7e805317cfa616ca5d191..19bcb51a220366631b9b89c0dfa0e86ddada208c 100644
@@ -83,9 +83,6 @@ static inline sysv_zone_t *block_end(struct buffer_head *bh)
        return (sysv_zone_t*)((char*)bh->b_data + bh->b_size);
 }
 
-/*
- * Requires read_lock(&pointers_lock) or write_lock(&pointers_lock)
- */
 static Indirect *get_branch(struct inode *inode,
                            int depth,
                            int offsets[],
@@ -105,15 +102,18 @@ static Indirect *get_branch(struct inode *inode,
                bh = sb_bread(sb, block);
                if (!bh)
                        goto failure;
+               read_lock(&pointers_lock);
                if (!verify_chain(chain, p))
                        goto changed;
                add_chain(++p, bh, (sysv_zone_t*)bh->b_data + *++offsets);
+               read_unlock(&pointers_lock);
                if (!p->key)
                        goto no_block;
        }
        return NULL;
 
 changed:
+       read_unlock(&pointers_lock);
        brelse(bh);
        *err = -EAGAIN;
        goto no_block;
@@ -219,9 +219,7 @@ static int get_block(struct inode *inode, sector_t iblock, struct buffer_head *b
                goto out;
 
 reread:
-       read_lock(&pointers_lock);
        partial = get_branch(inode, depth, offsets, chain, &err);
-       read_unlock(&pointers_lock);
 
        /* Simplest case - block found, no allocation needed */
        if (!partial) {
@@ -291,9 +289,9 @@ static Indirect *find_shared(struct inode *inode,
        *top = 0;
        for (k = depth; k > 1 && !offsets[k-1]; k--)
                ;
+       partial = get_branch(inode, k, offsets, chain, &err);
 
        write_lock(&pointers_lock);
-       partial = get_branch(inode, k, offsets, chain, &err);
        if (!partial)
                partial = chain + k-1;
        /*
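
The reshuffle in get_branch() and find_shared() is a locking fix, not a cleanup: sb_bread() can sleep, so the non-sleeping pointers_lock rwlock must not be held across it. Taking the read lock only around the verify_chain()/add_chain() window, and calling get_branch() before write_lock() in find_shared(), keeps the block I/O outside the lock.
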
index 6795fda2af191ac7e5d9cacec4220bc0feba2a2c..110e8a27218900756f3af6cd515d6e8cf33e9514 100644
@@ -34,7 +34,15 @@ static DEFINE_MUTEX(eventfs_mutex);
 
 /* Choose something "unique" ;-) */
 #define EVENTFS_FILE_INODE_INO         0x12c4e37
-#define EVENTFS_DIR_INODE_INO          0x134b2f5
+
+/* Just try to make something consistent and unique */
+static int eventfs_dir_ino(struct eventfs_inode *ei)
+{
+       if (!ei->ino)
+               ei->ino = get_next_ino();
+
+       return ei->ino;
+}
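
Rather than giving every eventfs directory one shared constant inode number, each eventfs_inode now lazily pulls a unique number from get_next_ino() on first use and caches it, so lookup and readdir report the same stable value for the object's lifetime. The idiom in isolation (the counter stands in for get_next_ino()):

struct dir { unsigned int ino; };

static unsigned int next_ino = 2;	/* stand-in allocator; 0 means unset */

static unsigned int dir_ino(struct dir *d)
{
	if (!d->ino)		/* first use: allocate exactly once */
		d->ino = next_ino++;
	return d->ino;		/* stable thereafter */
}
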
 
 /*
  * The eventfs_inode (ei) itself is protected by SRCU. It is released from
@@ -54,6 +62,46 @@ enum {
 
 #define EVENTFS_MODE_MASK      (EVENTFS_SAVE_MODE - 1)
 
+/*
+ * eventfs_inode reference count management.
+ *
+ * NOTE! We count only references from dentries, in the
+ * form 'dentry->d_fsdata'. There are also references from
+ * directory inodes ('ti->private'), but the dentry reference
+ * count is always a superset of the inode reference count.
+ */
+static void release_ei(struct kref *ref)
+{
+       struct eventfs_inode *ei = container_of(ref, struct eventfs_inode, kref);
+
+       WARN_ON_ONCE(!ei->is_freed);
+
+       kfree(ei->entry_attrs);
+       kfree_const(ei->name);
+       kfree_rcu(ei, rcu);
+}
+
+static inline void put_ei(struct eventfs_inode *ei)
+{
+       if (ei)
+               kref_put(&ei->kref, release_ei);
+}
+
+static inline void free_ei(struct eventfs_inode *ei)
+{
+       if (ei) {
+               ei->is_freed = 1;
+               put_ei(ei);
+       }
+}
+
+static inline struct eventfs_inode *get_ei(struct eventfs_inode *ei)
+{
+       if (ei)
+               kref_get(&ei->kref);
+       return ei;
+}
+
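
This is the standard kref pattern: get_ei()/put_ei() wrap kref_get()/kref_put(), release_ei() runs only when the last reference drops, and free_ei() marks the object dead via is_freed before dropping the creator's reference, so lookups can refuse it while dentries still legitimately hold it. A minimal single-threaded analogue (the kernel's kref is built on refcount_t and is concurrency-safe; this sketch is not):

#include <stdlib.h>

struct obj {
	int refs;
	int is_freed;	/* logically dead; references may still exist */
};

static struct obj *obj_get(struct obj *o)
{
	if (o)
		o->refs++;
	return o;
}

static void obj_put(struct obj *o)
{
	if (o && --o->refs == 0)
		free(o);	/* only the last reference frees */
}

static void obj_kill(struct obj *o)	/* analogue of free_ei() */
{
	if (o) {
		o->is_freed = 1;	/* mark dead first... */
		obj_put(o);		/* ...then drop the creator's ref */
	}
}
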
 static struct dentry *eventfs_root_lookup(struct inode *dir,
                                          struct dentry *dentry,
                                          unsigned int flags);
@@ -148,33 +196,30 @@ static int eventfs_set_attr(struct mnt_idmap *idmap, struct dentry *dentry,
        return ret;
 }
 
-static void update_top_events_attr(struct eventfs_inode *ei, struct dentry *dentry)
+static void update_top_events_attr(struct eventfs_inode *ei, struct super_block *sb)
 {
-       struct inode *inode;
+       struct inode *root;
 
        /* Only update if the "events" was on the top level */
        if (!ei || !(ei->attr.mode & EVENTFS_TOPLEVEL))
                return;
 
        /* Get the tracefs root inode. */
-       inode = d_inode(dentry->d_sb->s_root);
-       ei->attr.uid = inode->i_uid;
-       ei->attr.gid = inode->i_gid;
+       root = d_inode(sb->s_root);
+       ei->attr.uid = root->i_uid;
+       ei->attr.gid = root->i_gid;
 }
 
 static void set_top_events_ownership(struct inode *inode)
 {
        struct tracefs_inode *ti = get_tracefs(inode);
        struct eventfs_inode *ei = ti->private;
-       struct dentry *dentry;
 
        /* The top events directory doesn't get automatically updated */
        if (!ei || !ei->is_events || !(ei->attr.mode & EVENTFS_TOPLEVEL))
                return;
 
-       dentry = ei->dentry;
-
-       update_top_events_attr(ei, dentry);
+       update_top_events_attr(ei, inode->i_sb);
 
        if (!(ei->attr.mode & EVENTFS_SAVE_UID))
                inode->i_uid = ei->attr.uid;
@@ -225,10 +270,11 @@ static struct eventfs_inode *eventfs_find_events(struct dentry *dentry)
 {
        struct eventfs_inode *ei;
 
-       mutex_lock(&eventfs_mutex);
        do {
-               /* The parent always has an ei, except for events itself */
-               ei = dentry->d_parent->d_fsdata;
+               // The parent is stable because we do not do renames
+               dentry = dentry->d_parent;
+               // ... and directories always have d_fsdata
+               ei = dentry->d_fsdata;
 
                /*
                 * If the ei is being freed, the ownership of the children
@@ -238,12 +284,10 @@ static struct eventfs_inode *eventfs_find_events(struct dentry *dentry)
                        ei = NULL;
                        break;
                }
-
-               dentry = ei->dentry;
+               // Walk upwards until you find the events inode
        } while (!ei->is_events);
-       mutex_unlock(&eventfs_mutex);
 
-       update_top_events_attr(ei, dentry);
+       update_top_events_attr(ei, dentry->d_sb);
 
        return ei;
 }
@@ -273,50 +317,11 @@ static void update_inode_attr(struct dentry *dentry, struct inode *inode,
                inode->i_gid = attr->gid;
 }
 
-static void update_gid(struct eventfs_inode *ei, kgid_t gid, int level)
-{
-       struct eventfs_inode *ei_child;
-
-       /* at most we have events/system/event */
-       if (WARN_ON_ONCE(level > 3))
-               return;
-
-       ei->attr.gid = gid;
-
-       if (ei->entry_attrs) {
-               for (int i = 0; i < ei->nr_entries; i++) {
-                       ei->entry_attrs[i].gid = gid;
-               }
-       }
-
-       /*
-        * Only eventfs_inode with dentries are updated, make sure
-        * all eventfs_inodes are updated. If one of the children
-        * do not have a dentry, this function must traverse it.
-        */
-       list_for_each_entry_srcu(ei_child, &ei->children, list,
-                                srcu_read_lock_held(&eventfs_srcu)) {
-               if (!ei_child->dentry)
-                       update_gid(ei_child, gid, level + 1);
-       }
-}
-
-void eventfs_update_gid(struct dentry *dentry, kgid_t gid)
-{
-       struct eventfs_inode *ei = dentry->d_fsdata;
-       int idx;
-
-       idx = srcu_read_lock(&eventfs_srcu);
-       update_gid(ei, gid, 0);
-       srcu_read_unlock(&eventfs_srcu, idx);
-}
-
 /**
- * create_file - create a file in the tracefs filesystem
- * @name: the name of the file to create.
+ * lookup_file - look up a file in the tracefs filesystem
+ * @dentry: the dentry to look up
  * @mode: the permission that the file should have.
  * @attr: saved attributes changed by user
- * @parent: parent dentry for this file.
  * @data: something that the caller will want to get to later on.
  * @fop: struct file_operations that should be used for this file.
  *
@@ -324,30 +329,25 @@ void eventfs_update_gid(struct dentry *dentry, kgid_t gid)
  * directory. The inode.i_private pointer will point to @data in the open()
  * call.
  */
-static struct dentry *create_file(const char *name, umode_t mode,
+static struct dentry *lookup_file(struct eventfs_inode *parent_ei,
+                                 struct dentry *dentry,
+                                 umode_t mode,
                                  struct eventfs_attr *attr,
-                                 struct dentry *parent, void *data,
+                                 void *data,
                                  const struct file_operations *fop)
 {
        struct tracefs_inode *ti;
-       struct dentry *dentry;
        struct inode *inode;
 
        if (!(mode & S_IFMT))
                mode |= S_IFREG;
 
        if (WARN_ON_ONCE(!S_ISREG(mode)))
-               return NULL;
-
-       WARN_ON_ONCE(!parent);
-       dentry = eventfs_start_creating(name, parent);
-
-       if (IS_ERR(dentry))
-               return dentry;
+               return ERR_PTR(-EIO);
 
        inode = tracefs_get_inode(dentry->d_sb);
        if (unlikely(!inode))
-               return eventfs_failed_creating(dentry);
+               return ERR_PTR(-ENOMEM);
 
        /* If the user updated the directory's attributes, use them */
        update_inode_attr(dentry, inode, attr, mode);
@@ -361,32 +361,31 @@ static struct dentry *create_file(const char *name, umode_t mode,
 
        ti = get_tracefs(inode);
        ti->flags |= TRACEFS_EVENT_INODE;
-       d_instantiate(dentry, inode);
-       fsnotify_create(dentry->d_parent->d_inode, dentry);
-       return eventfs_end_creating(dentry);
+
+       // Files have their parent's ei as their fsdata
+       dentry->d_fsdata = get_ei(parent_ei);
+
+       d_add(dentry, inode);
+       return NULL;
 };
 
 /**
- * create_dir - create a dir in the tracefs filesystem
+ * lookup_dir_entry - look up a dir in the tracefs filesystem
+ * @dentry: the directory to look up
  * @ei: the eventfs_inode that represents the directory to create
- * @parent: parent dentry for this file.
  *
- * This function will create a dentry for a directory represented by
+ * This function will look up a dentry for a directory represented by
  * a eventfs_inode.
  */
-static struct dentry *create_dir(struct eventfs_inode *ei, struct dentry *parent)
+static struct dentry *lookup_dir_entry(struct dentry *dentry,
+       struct eventfs_inode *pei, struct eventfs_inode *ei)
 {
        struct tracefs_inode *ti;
-       struct dentry *dentry;
        struct inode *inode;
 
-       dentry = eventfs_start_creating(ei->name, parent);
-       if (IS_ERR(dentry))
-               return dentry;
-
        inode = tracefs_get_inode(dentry->d_sb);
        if (unlikely(!inode))
-               return eventfs_failed_creating(dentry);
+               return ERR_PTR(-ENOMEM);
 
        /* If the user updated the directory's attributes, use them */
        update_inode_attr(dentry, inode, &ei->attr,
@@ -396,68 +395,50 @@ static struct dentry *create_dir(struct eventfs_inode *ei, struct dentry *parent
        inode->i_fop = &eventfs_file_operations;
 
        /* All directories will have the same inode number */
-       inode->i_ino = EVENTFS_DIR_INODE_INO;
+       inode->i_ino = eventfs_dir_ino(ei);
 
        ti = get_tracefs(inode);
        ti->flags |= TRACEFS_EVENT_INODE;
+       /* Only directories have ti->private set to an ei, not files */
+       ti->private = ei;
 
-       inc_nlink(inode);
-       d_instantiate(dentry, inode);
-       inc_nlink(dentry->d_parent->d_inode);
-       fsnotify_mkdir(dentry->d_parent->d_inode, dentry);
-       return eventfs_end_creating(dentry);
+       dentry->d_fsdata = get_ei(ei);
+
+       d_add(dentry, inode);
+       return NULL;
 }
 
-static void free_ei(struct eventfs_inode *ei)
+static inline struct eventfs_inode *alloc_ei(const char *name)
 {
-       kfree_const(ei->name);
-       kfree(ei->d_children);
-       kfree(ei->entry_attrs);
-       kfree(ei);
+       struct eventfs_inode *ei = kzalloc(sizeof(*ei), GFP_KERNEL);
+
+       if (!ei)
+               return NULL;
+
+       ei->name = kstrdup_const(name, GFP_KERNEL);
+       if (!ei->name) {
+               kfree(ei);
+               return NULL;
+       }
+       kref_init(&ei->kref);
+       return ei;
 }
 
 /**
- * eventfs_set_ei_status_free - remove the dentry reference from an eventfs_inode
- * @ti: the tracefs_inode of the dentry
+ * eventfs_d_release - dentry is going away
  * @dentry: dentry which has the reference to remove.
  *
  * Remove the association between a dentry from an eventfs_inode.
  */
-void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry)
+void eventfs_d_release(struct dentry *dentry)
 {
-       struct eventfs_inode *ei;
-       int i;
-
-       mutex_lock(&eventfs_mutex);
-
-       ei = dentry->d_fsdata;
-       if (!ei)
-               goto out;
-
-       /* This could belong to one of the files of the ei */
-       if (ei->dentry != dentry) {
-               for (i = 0; i < ei->nr_entries; i++) {
-                       if (ei->d_children[i] == dentry)
-                               break;
-               }
-               if (WARN_ON_ONCE(i == ei->nr_entries))
-                       goto out;
-               ei->d_children[i] = NULL;
-       } else if (ei->is_freed) {
-               free_ei(ei);
-       } else {
-               ei->dentry = NULL;
-       }
-
-       dentry->d_fsdata = NULL;
- out:
-       mutex_unlock(&eventfs_mutex);
+       put_ei(dentry->d_fsdata);
 }
 
 /**
- * create_file_dentry - create a dentry for a file of an eventfs_inode
+ * lookup_file_dentry - create a dentry for a file of an eventfs_inode
  * @ei: the eventfs_inode that the file will be created under
- * @idx: the index into the d_children[] of the @ei
+ * @idx: the index into the entry_attrs[] of the @ei
  * @parent: The parent dentry of the created file.
  * @name: The name of the file to create
  * @mode: The mode of the file.
@@ -468,163 +449,17 @@ void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry)
  * address located at @e_dentry.
  */
 static struct dentry *
-create_file_dentry(struct eventfs_inode *ei, int idx,
-                  struct dentry *parent, const char *name, umode_t mode, void *data,
+lookup_file_dentry(struct dentry *dentry,
+                  struct eventfs_inode *ei, int idx,
+                  umode_t mode, void *data,
                   const struct file_operations *fops)
 {
        struct eventfs_attr *attr = NULL;
-       struct dentry **e_dentry = &ei->d_children[idx];
-       struct dentry *dentry;
-
-       WARN_ON_ONCE(!inode_is_locked(parent->d_inode));
 
-       mutex_lock(&eventfs_mutex);
-       if (ei->is_freed) {
-               mutex_unlock(&eventfs_mutex);
-               return NULL;
-       }
-       /* If the e_dentry already has a dentry, use it */
-       if (*e_dentry) {
-               dget(*e_dentry);
-               mutex_unlock(&eventfs_mutex);
-               return *e_dentry;
-       }
-
-       /* ei->entry_attrs are protected by SRCU */
        if (ei->entry_attrs)
                attr = &ei->entry_attrs[idx];
 
-       mutex_unlock(&eventfs_mutex);
-
-       dentry = create_file(name, mode, attr, parent, data, fops);
-
-       mutex_lock(&eventfs_mutex);
-
-       if (IS_ERR_OR_NULL(dentry)) {
-               /*
-                * When the mutex was released, something else could have
-                * created the dentry for this e_dentry. In which case
-                * use that one.
-                *
-                * If ei->is_freed is set, the e_dentry is currently on its
-                * way to being freed, don't return it. If e_dentry is NULL
-                * it means it was already freed.
-                */
-               if (ei->is_freed) {
-                       dentry = NULL;
-               } else {
-                       dentry = *e_dentry;
-                       dget(dentry);
-               }
-               mutex_unlock(&eventfs_mutex);
-               return dentry;
-       }
-
-       if (!*e_dentry && !ei->is_freed) {
-               *e_dentry = dentry;
-               dentry->d_fsdata = ei;
-       } else {
-               /*
-                * Should never happen unless we get here due to being freed.
-                * Otherwise it means two dentries exist with the same name.
-                */
-               WARN_ON_ONCE(!ei->is_freed);
-               dentry = NULL;
-       }
-       mutex_unlock(&eventfs_mutex);
-
-       return dentry;
-}
-
-/**
- * eventfs_post_create_dir - post create dir routine
- * @ei: eventfs_inode of recently created dir
- *
- * Map the meta-data of files within an eventfs dir to their parent dentry
- */
-static void eventfs_post_create_dir(struct eventfs_inode *ei)
-{
-       struct eventfs_inode *ei_child;
-       struct tracefs_inode *ti;
-
-       lockdep_assert_held(&eventfs_mutex);
-
-       /* srcu lock already held */
-       /* fill parent-child relation */
-       list_for_each_entry_srcu(ei_child, &ei->children, list,
-                                srcu_read_lock_held(&eventfs_srcu)) {
-               ei_child->d_parent = ei->dentry;
-       }
-
-       ti = get_tracefs(ei->dentry->d_inode);
-       ti->private = ei;
-}
-
-/**
- * create_dir_dentry - Create a directory dentry for the eventfs_inode
- * @pei: The eventfs_inode parent of ei.
- * @ei: The eventfs_inode to create the directory for
- * @parent: The dentry of the parent of this directory
- *
- * This creates and attaches a directory dentry to the eventfs_inode @ei.
- */
-static struct dentry *
-create_dir_dentry(struct eventfs_inode *pei, struct eventfs_inode *ei,
-                 struct dentry *parent)
-{
-       struct dentry *dentry = NULL;
-
-       WARN_ON_ONCE(!inode_is_locked(parent->d_inode));
-
-       mutex_lock(&eventfs_mutex);
-       if (pei->is_freed || ei->is_freed) {
-               mutex_unlock(&eventfs_mutex);
-               return NULL;
-       }
-       if (ei->dentry) {
-               /* If the eventfs_inode already has a dentry, use it */
-               dentry = ei->dentry;
-               dget(dentry);
-               mutex_unlock(&eventfs_mutex);
-               return dentry;
-       }
-       mutex_unlock(&eventfs_mutex);
-
-       dentry = create_dir(ei, parent);
-
-       mutex_lock(&eventfs_mutex);
-
-       if (IS_ERR_OR_NULL(dentry) && !ei->is_freed) {
-               /*
-                * When the mutex was released, something else could have
-                * created the dentry for this e_dentry. In which case
-                * use that one.
-                *
-                * If ei->is_freed is set, the e_dentry is currently on its
-                * way to being freed.
-                */
-               dentry = ei->dentry;
-               if (dentry)
-                       dget(dentry);
-               mutex_unlock(&eventfs_mutex);
-               return dentry;
-       }
-
-       if (!ei->dentry && !ei->is_freed) {
-               ei->dentry = dentry;
-               eventfs_post_create_dir(ei);
-               dentry->d_fsdata = ei;
-       } else {
-               /*
-                * Should never happen unless we get here due to being freed.
-                * Otherwise it means two dentries exist with the same name.
-                */
-               WARN_ON_ONCE(!ei->is_freed);
-               dentry = NULL;
-       }
-       mutex_unlock(&eventfs_mutex);
-
-       return dentry;
+       return lookup_file(ei, dentry, mode, attr, data, fops);
 }
 
 /**
@@ -641,79 +476,50 @@ static struct dentry *eventfs_root_lookup(struct inode *dir,
                                          struct dentry *dentry,
                                          unsigned int flags)
 {
-       const struct file_operations *fops;
-       const struct eventfs_entry *entry;
        struct eventfs_inode *ei_child;
        struct tracefs_inode *ti;
        struct eventfs_inode *ei;
-       struct dentry *ei_dentry = NULL;
-       struct dentry *ret = NULL;
-       struct dentry *d;
        const char *name = dentry->d_name.name;
-       umode_t mode;
-       void *data;
-       int idx;
-       int i;
-       int r;
+       struct dentry *result = NULL;
 
        ti = get_tracefs(dir);
        if (!(ti->flags & TRACEFS_EVENT_INODE))
-               return NULL;
-
-       /* Grab srcu to prevent the ei from going away */
-       idx = srcu_read_lock(&eventfs_srcu);
+               return ERR_PTR(-EIO);
 
-       /*
-        * Grab the eventfs_mutex to consistent value from ti->private.
-        * This s
-        */
        mutex_lock(&eventfs_mutex);
-       ei = READ_ONCE(ti->private);
-       if (ei && !ei->is_freed)
-               ei_dentry = READ_ONCE(ei->dentry);
-       mutex_unlock(&eventfs_mutex);
 
-       if (!ei || !ei_dentry)
+       ei = ti->private;
+       if (!ei || ei->is_freed)
                goto out;
 
-       data = ei->data;
-
-       list_for_each_entry_srcu(ei_child, &ei->children, list,
-                                srcu_read_lock_held(&eventfs_srcu)) {
+       list_for_each_entry(ei_child, &ei->children, list) {
                if (strcmp(ei_child->name, name) != 0)
                        continue;
-               ret = simple_lookup(dir, dentry, flags);
-               if (IS_ERR(ret))
+               if (ei_child->is_freed)
                        goto out;
-               d = create_dir_dentry(ei, ei_child, ei_dentry);
-               dput(d);
+               result = lookup_dir_entry(dentry, ei, ei_child);
                goto out;
        }
 
-       for (i = 0; i < ei->nr_entries; i++) {
-               entry = &ei->entries[i];
-               if (strcmp(name, entry->name) == 0) {
-                       void *cdata = data;
-                       mutex_lock(&eventfs_mutex);
-                       /* If ei->is_freed, then the event itself may be too */
-                       if (!ei->is_freed)
-                               r = entry->callback(name, &mode, &cdata, &fops);
-                       else
-                               r = -1;
-                       mutex_unlock(&eventfs_mutex);
-                       if (r <= 0)
-                               continue;
-                       ret = simple_lookup(dir, dentry, flags);
-                       if (IS_ERR(ret))
-                               goto out;
-                       d = create_file_dentry(ei, i, ei_dentry, name, mode, cdata, fops);
-                       dput(d);
-                       break;
-               }
+       for (int i = 0; i < ei->nr_entries; i++) {
+               void *data;
+               umode_t mode;
+               const struct file_operations *fops;
+               const struct eventfs_entry *entry = &ei->entries[i];
+
+               if (strcmp(name, entry->name) != 0)
+                       continue;
+
+               data = ei->data;
+               if (entry->callback(name, &mode, &data, &fops) <= 0)
+                       goto out;
+
+               result = lookup_file_dentry(dentry, ei, i, mode, data, fops);
+               goto out;
        }
  out:
-       srcu_read_unlock(&eventfs_srcu, idx);
-       return ret;
+       mutex_unlock(&eventfs_mutex);
+       return result;
 }
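
The rewritten lookup does all of its work under eventfs_mutex and follows the usual ->lookup contract: on a hit, lookup_dir_entry() or lookup_file_dentry() attaches an inode with d_add() and returns NULL ("instantiated here"), an ERR_PTR reports a hard failure, and a miss leaves the dentry negative. The SRCU iteration, the simple_lookup() calls and the create-then-recheck dance of the old code disappear along with the cached dentry pointers.
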
 
 /*
@@ -802,7 +608,7 @@ static int eventfs_iterate(struct file *file, struct dir_context *ctx)
 
                name = ei_child->name;
 
-               ino = EVENTFS_DIR_INODE_INO;
+               ino = eventfs_dir_ino(ei_child);
 
                if (!dir_emit(ctx, name, strlen(name), ino, DT_DIR))
                        goto out_dec;
@@ -863,25 +669,10 @@ struct eventfs_inode *eventfs_create_dir(const char *name, struct eventfs_inode
        if (!parent)
                return ERR_PTR(-EINVAL);
 
-       ei = kzalloc(sizeof(*ei), GFP_KERNEL);
+       ei = alloc_ei(name);
        if (!ei)
                return ERR_PTR(-ENOMEM);
 
-       ei->name = kstrdup_const(name, GFP_KERNEL);
-       if (!ei->name) {
-               kfree(ei);
-               return ERR_PTR(-ENOMEM);
-       }
-
-       if (size) {
-               ei->d_children = kcalloc(size, sizeof(*ei->d_children), GFP_KERNEL);
-               if (!ei->d_children) {
-                       kfree_const(ei->name);
-                       kfree(ei);
-                       return ERR_PTR(-ENOMEM);
-               }
-       }
-
        ei->entries = entries;
        ei->nr_entries = size;
        ei->data = data;
@@ -889,10 +680,8 @@ struct eventfs_inode *eventfs_create_dir(const char *name, struct eventfs_inode
        INIT_LIST_HEAD(&ei->list);
 
        mutex_lock(&eventfs_mutex);
-       if (!parent->is_freed) {
+       if (!parent->is_freed)
                list_add_tail(&ei->list, &parent->children);
-               ei->d_parent = parent->dentry;
-       }
        mutex_unlock(&eventfs_mutex);
 
        /* Was the parent freed? */
@@ -932,28 +721,20 @@ struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry
        if (IS_ERR(dentry))
                return ERR_CAST(dentry);
 
-       ei = kzalloc(sizeof(*ei), GFP_KERNEL);
+       ei = alloc_ei(name);
        if (!ei)
-               goto fail_ei;
+               goto fail;
 
        inode = tracefs_get_inode(dentry->d_sb);
        if (unlikely(!inode))
                goto fail;
 
-       if (size) {
-               ei->d_children = kcalloc(size, sizeof(*ei->d_children), GFP_KERNEL);
-               if (!ei->d_children)
-                       goto fail;
-       }
-
-       ei->dentry = dentry;
+       // Note: we have a ref to the dentry from tracefs_start_creating()
+       ei->events_dir = dentry;
        ei->entries = entries;
        ei->nr_entries = size;
        ei->is_events = 1;
        ei->data = data;
-       ei->name = kstrdup_const(name, GFP_KERNEL);
-       if (!ei->name)
-               goto fail;
 
        /* Save the ownership of this directory */
        uid = d_inode(dentry->d_parent)->i_uid;
@@ -984,11 +765,19 @@ struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry
        inode->i_op = &eventfs_root_dir_inode_operations;
        inode->i_fop = &eventfs_file_operations;
 
-       dentry->d_fsdata = ei;
+       dentry->d_fsdata = get_ei(ei);
 
-       /* directory inodes start off with i_nlink == 2 (for "." entry) */
-       inc_nlink(inode);
+       /*
+        * Keep all eventfs directories with i_nlink == 1.
+        * Due to the dynamic nature of the dentry creations and not
+        * wanting to add a pointer to the parent eventfs_inode in the
+        * eventfs_inode structure, keeping the i_nlink in sync with the
+        * number of directories would cause too much complexity for
+        * something not worth much. Keeping directory links at 1
+        * tells userspace not to trust the link number.
+        */
        d_instantiate(dentry, inode);
+       /* The dentry of the "events" parent does keep track though */
        inc_nlink(dentry->d_parent->d_inode);
        fsnotify_mkdir(dentry->d_parent->d_inode, dentry);
        tracefs_end_creating(dentry);
@@ -996,72 +785,11 @@ struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry
        return ei;
 
  fail:
-       kfree(ei->d_children);
-       kfree(ei);
- fail_ei:
+       free_ei(ei);
        tracefs_failed_creating(dentry);
        return ERR_PTR(-ENOMEM);
 }
 
-static LLIST_HEAD(free_list);
-
-static void eventfs_workfn(struct work_struct *work)
-{
-        struct eventfs_inode *ei, *tmp;
-        struct llist_node *llnode;
-
-       llnode = llist_del_all(&free_list);
-        llist_for_each_entry_safe(ei, tmp, llnode, llist) {
-               /* This dput() matches the dget() from unhook_dentry() */
-               for (int i = 0; i < ei->nr_entries; i++) {
-                       if (ei->d_children[i])
-                               dput(ei->d_children[i]);
-               }
-               /* This should only get here if it had a dentry */
-               if (!WARN_ON_ONCE(!ei->dentry))
-                       dput(ei->dentry);
-        }
-}
-
-static DECLARE_WORK(eventfs_work, eventfs_workfn);
-
-static void free_rcu_ei(struct rcu_head *head)
-{
-       struct eventfs_inode *ei = container_of(head, struct eventfs_inode, rcu);
-
-       if (ei->dentry) {
-               /* Do not free the ei until all references of dentry are gone */
-               if (llist_add(&ei->llist, &free_list))
-                       queue_work(system_unbound_wq, &eventfs_work);
-               return;
-       }
-
-       /* If the ei doesn't have a dentry, neither should its children */
-       for (int i = 0; i < ei->nr_entries; i++) {
-               WARN_ON_ONCE(ei->d_children[i]);
-       }
-
-       free_ei(ei);
-}
-
-static void unhook_dentry(struct dentry *dentry)
-{
-       if (!dentry)
-               return;
-       /*
-        * Need to add a reference to the dentry that is expected by
-        * simple_recursive_removal(), which will include a dput().
-        */
-       dget(dentry);
-
-       /*
-        * Also add a reference for the dput() in eventfs_workfn().
-        * That is required as that dput() will free the ei after
-        * the SRCU grace period is over.
-        */
-       dget(dentry);
-}
-
 /**
  * eventfs_remove_rec - remove eventfs dir or file from list
  * @ei: eventfs_inode to be removed.
@@ -1074,8 +802,6 @@ static void eventfs_remove_rec(struct eventfs_inode *ei, int level)
 {
        struct eventfs_inode *ei_child;
 
-       if (!ei)
-               return;
        /*
         * Check recursion depth. It should never be greater than 3:
         * 0 - events/
@@ -1087,28 +813,11 @@ static void eventfs_remove_rec(struct eventfs_inode *ei, int level)
                return;
 
        /* search for nested folders or files */
-       list_for_each_entry_srcu(ei_child, &ei->children, list,
-                                lockdep_is_held(&eventfs_mutex)) {
-               /* Children only have dentry if parent does */
-               WARN_ON_ONCE(ei_child->dentry && !ei->dentry);
+       list_for_each_entry(ei_child, &ei->children, list)
                eventfs_remove_rec(ei_child, level + 1);
-       }
-
-
-       ei->is_freed = 1;
 
-       for (int i = 0; i < ei->nr_entries; i++) {
-               if (ei->d_children[i]) {
-                       /* Children only have dentry if parent does */
-                       WARN_ON_ONCE(!ei->dentry);
-                       unhook_dentry(ei->d_children[i]);
-               }
-       }
-
-       unhook_dentry(ei->dentry);
-
-       list_del_rcu(&ei->list);
-       call_srcu(&eventfs_srcu, &ei->rcu, free_rcu_ei);
+       list_del(&ei->list);
+       free_ei(ei);
 }
 
 /**
@@ -1119,22 +828,12 @@ static void eventfs_remove_rec(struct eventfs_inode *ei, int level)
  */
 void eventfs_remove_dir(struct eventfs_inode *ei)
 {
-       struct dentry *dentry;
-
        if (!ei)
                return;
 
        mutex_lock(&eventfs_mutex);
-       dentry = ei->dentry;
        eventfs_remove_rec(ei, 0);
        mutex_unlock(&eventfs_mutex);
-
-       /*
-        * If any of the ei children has a dentry, then the ei itself
-        * must have a dentry.
-        */
-       if (dentry)
-               simple_recursive_removal(dentry, NULL);
 }
 
 /**
@@ -1147,7 +846,11 @@ void eventfs_remove_events_dir(struct eventfs_inode *ei)
 {
        struct dentry *dentry;
 
-       dentry = ei->dentry;
+       dentry = ei->events_dir;
+       if (!dentry)
+               return;
+
+       ei->events_dir = NULL;
        eventfs_remove_dir(ei);
 
        /*
@@ -1157,5 +860,6 @@ void eventfs_remove_events_dir(struct eventfs_inode *ei)
         * sticks around while the other ei->dentry are created
         * and destroyed dynamically.
         */
+       d_invalidate(dentry);
        dput(dentry);
 }
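
With the refcounting in place, removal becomes almost declarative: eventfs_remove_rec() walks the children under eventfs_mutex, marks each ei dead and drops the creator's reference. Live dentries keep their ei pinned until d_release, and d_invalidate() on the events directory, together with the staleness check in d_revalidate (see the tracefs hunk below), flushes the rest out lazily; the old SRCU-plus-workqueue dance for deferred dput() calls goes away entirely.
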
index e1b172c0e091a8d55fcc80951fa4ed5202b1539e..d65ffad4c327ca11a98a8d2073d8e5c77ac138c3 100644
@@ -38,8 +38,6 @@ static struct inode *tracefs_alloc_inode(struct super_block *sb)
        if (!ti)
                return NULL;
 
-       ti->flags = 0;
-
        return &ti->vfs_inode;
 }
 
@@ -379,21 +377,30 @@ static const struct super_operations tracefs_super_operations = {
        .show_options   = tracefs_show_options,
 };
 
-static void tracefs_dentry_iput(struct dentry *dentry, struct inode *inode)
+/*
+ * It would be cleaner if eventfs had its own dentry ops.
+ *
+ * Note that d_revalidate is called potentially under RCU,
+ * so it can't take the eventfs mutex etc. It's fine - if
+ * we open a file just as it's marked dead, things will
+ * still work just fine, and just see the old stale case.
+ */
+static void tracefs_d_release(struct dentry *dentry)
 {
-       struct tracefs_inode *ti;
+       if (dentry->d_fsdata)
+               eventfs_d_release(dentry);
+}
 
-       if (!dentry || !inode)
-               return;
+static int tracefs_d_revalidate(struct dentry *dentry, unsigned int flags)
+{
+       struct eventfs_inode *ei = dentry->d_fsdata;
 
-       ti = get_tracefs(inode);
-       if (ti && ti->flags & TRACEFS_EVENT_INODE)
-               eventfs_set_ei_status_free(ti, dentry);
-       iput(inode);
+       return !(ei && ei->is_freed);
 }
 
 static const struct dentry_operations tracefs_dentry_operations = {
-       .d_iput = tracefs_dentry_iput,
+       .d_revalidate = tracefs_d_revalidate,
+       .d_release = tracefs_d_release,
 };
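
tracefs now relies on ->d_revalidate instead of ->d_iput for eventfs cleanup: a dentry whose eventfs_inode is marked is_freed reports itself stale (returns 0), so the VFS drops it and repeats the lookup, while ->d_release drops the dentry's reference on the ei. As the comment above notes, d_revalidate can run in RCU-walk mode, so it only tests the flag and takes no locks.
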
 
 static int trace_fill_super(struct super_block *sb, void *data, int silent)
@@ -497,75 +504,6 @@ struct dentry *tracefs_end_creating(struct dentry *dentry)
        return dentry;
 }
 
-/**
- * eventfs_start_creating - start the process of creating a dentry
- * @name: Name of the file created for the dentry
- * @parent: The parent dentry where this dentry will be created
- *
- * This is a simple helper function for the dynamically created eventfs
- * files. When the directory of the eventfs files are accessed, their
- * dentries are created on the fly. This function is used to start that
- * process.
- */
-struct dentry *eventfs_start_creating(const char *name, struct dentry *parent)
-{
-       struct dentry *dentry;
-       int error;
-
-       /* Must always have a parent. */
-       if (WARN_ON_ONCE(!parent))
-               return ERR_PTR(-EINVAL);
-
-       error = simple_pin_fs(&trace_fs_type, &tracefs_mount,
-                             &tracefs_mount_count);
-       if (error)
-               return ERR_PTR(error);
-
-       if (unlikely(IS_DEADDIR(parent->d_inode)))
-               dentry = ERR_PTR(-ENOENT);
-       else
-               dentry = lookup_one_len(name, parent, strlen(name));
-
-       if (!IS_ERR(dentry) && dentry->d_inode) {
-               dput(dentry);
-               dentry = ERR_PTR(-EEXIST);
-       }
-
-       if (IS_ERR(dentry))
-               simple_release_fs(&tracefs_mount, &tracefs_mount_count);
-
-       return dentry;
-}
-
-/**
- * eventfs_failed_creating - clean up a failed eventfs dentry creation
- * @dentry: The dentry to clean up
- *
- * If after calling eventfs_start_creating(), a failure is detected, the
- * resources created by eventfs_start_creating() needs to be cleaned up. In
- * that case, this function should be called to perform that clean up.
- */
-struct dentry *eventfs_failed_creating(struct dentry *dentry)
-{
-       dput(dentry);
-       simple_release_fs(&tracefs_mount, &tracefs_mount_count);
-       return NULL;
-}
-
-/**
- * eventfs_end_creating - Finish the process of creating a eventfs dentry
- * @dentry: The dentry that has successfully been created.
- *
- * This function is currently just a place holder to match
- * eventfs_start_creating(). In case any synchronization needs to be added,
- * this function will be used to implement that without having to modify
- * the callers of eventfs_start_creating().
- */
-struct dentry *eventfs_end_creating(struct dentry *dentry)
-{
-       return dentry;
-}
-
 /* Find the inode that this will use for default */
 static struct inode *instance_inode(struct dentry *parent, struct inode *inode)
 {
@@ -779,7 +717,11 @@ static void init_once(void *foo)
 {
        struct tracefs_inode *ti = (struct tracefs_inode *) foo;
 
+       /* inode_init_once() calls memset() on the vfs_inode portion */
        inode_init_once(&ti->vfs_inode);
+
+       /* Zero out the rest */
+       memset_after(ti, 0, vfs_inode);
 }
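
inode_init_once() memsets only the embedded inode, so the constructor now wipes the remaining tracefs_inode fields with a single memset_after() call; that requires vfs_inode to be the struct's first member (the companion hunk below reorders it) and is presumably what lets tracefs_alloc_inode() drop its manual ti->flags = 0 above. A GNU C sketch of the memset_after() idiom, simplified from the kernel's helper in include/linux/string.h:

#include <stddef.h>
#include <string.h>

#define offsetofend(T, m)	(offsetof(T, m) + sizeof(((T *)0)->m))
#define memset_after(obj, v, m)						\
	memset((char *)(obj) + offsetofend(__typeof__(*(obj)), m), (v), \
	       sizeof(*(obj)) - offsetofend(__typeof__(*(obj)), m))

struct ti {
	long vfs_inode;		/* stand-in: must remain the first member */
	unsigned long flags;	/* everything from here down gets zeroed */
	void *private_data;
};

static void init_once(struct ti *t)
{
	t->vfs_inode = 0;		/* first member set up separately */
	memset_after(t, 0, vfs_inode);	/* zero the rest in one call */
}
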
 
 static int __init tracefs_init(void)
index 12b7d0150ae9efeab86e21fd7e15c8e1a397a69e..beb3dcd0e434207c882bcfe6ec027b6ca3e43526 100644
@@ -11,9 +11,10 @@ enum {
 };
 
 struct tracefs_inode {
+       struct inode            vfs_inode;
+       /* The below gets initialized with memset_after(ti, 0, vfs_inode) */
        unsigned long           flags;
        void                    *private;
-       struct inode            vfs_inode;
 };
 
 /*
@@ -31,42 +32,37 @@ struct eventfs_attr {
 /*
  * struct eventfs_inode - hold the properties of the eventfs directories.
  * @list:      link list into the parent directory
+ * @rcu:       Union with @list for freeing
+ * @children:  link list into the child eventfs_inode
  * @entries:   the array of entries representing the files in the directory
  * @name:      the name of the directory to create
- * @children:  link list into the child eventfs_inode
- * @dentry:     the dentry of the directory
- * @d_parent:   pointer to the parent's dentry
- * @d_children: The array of dentries to represent the files when created
+ * @events_dir: the dentry of the events directory
  * @entry_attrs: Saved mode and ownership of the @d_children
- * @attr:      Saved mode and ownership of eventfs_inode itself
  * @data:      The private data to pass to the callbacks
+ * @attr:      Saved mode and ownership of eventfs_inode itself
  * @is_freed:  Flag set if the eventfs is on its way to be freed
  *                Note if is_freed is set, then dentry is corrupted.
+ * @is_events: Flag set for only the top level "events" directory
  * @nr_entries: The number of items in @entries
+ * @ino:       The saved inode number
  */
 struct eventfs_inode {
-       struct list_head                list;
+       union {
+               struct list_head        list;
+               struct rcu_head         rcu;
+       };
+       struct list_head                children;
        const struct eventfs_entry      *entries;
        const char                      *name;
-       struct list_head                children;
-       struct dentry                   *dentry; /* Check is_freed to access */
-       struct dentry                   *d_parent;
-       struct dentry                   **d_children;
+       struct dentry                   *events_dir;
        struct eventfs_attr             *entry_attrs;
-       struct eventfs_attr             attr;
        void                            *data;
-       /*
-        * Union - used for deletion
-        * @llist:      for calling dput() if needed after RCU
-        * @rcu:        eventfs_inode to delete in RCU
-        */
-       union {
-               struct llist_node       llist;
-               struct rcu_head         rcu;
-       };
+       struct eventfs_attr             attr;
+       struct kref                     kref;
        unsigned int                    is_freed:1;
        unsigned int                    is_events:1;
        unsigned int                    nr_entries:30;
+       unsigned int                    ino;
 };
 
 static inline struct tracefs_inode *get_tracefs(const struct inode *inode)
@@ -78,10 +74,7 @@ struct dentry *tracefs_start_creating(const char *name, struct dentry *parent);
 struct dentry *tracefs_end_creating(struct dentry *dentry);
 struct dentry *tracefs_failed_creating(struct dentry *dentry);
 struct inode *tracefs_get_inode(struct super_block *sb);
-struct dentry *eventfs_start_creating(const char *name, struct dentry *parent);
-struct dentry *eventfs_failed_creating(struct dentry *dentry);
-struct dentry *eventfs_end_creating(struct dentry *dentry);
-void eventfs_update_gid(struct dentry *dentry, kgid_t gid);
-void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry);
+
+void eventfs_d_release(struct dentry *dentry);
 
 #endif /* _TRACEFS_INTERNAL_H */
index e413a9cf8ee38b9e0f9fad3d9234653626dbf1d7..551148de66cd86dbf76484d73c4c3acdc5969a1c 100644
@@ -205,7 +205,6 @@ static struct dentry *ubifs_lookup(struct inode *dir, struct dentry *dentry,
        dbg_gen("'%pd' in dir ino %lu", dentry, dir->i_ino);
 
        err = fscrypt_prepare_lookup(dir, dentry, &nm);
-       generic_set_encrypted_ci_d_ops(dentry);
        if (err == -ENOENT)
                return d_splice_alias(NULL, dentry);
        if (err)
index 09e270d6ed0258923ccc7680025fad26d1cac5c6..d2881041b393c3fca36fb15e8aca78df3877e04d 100644
@@ -2239,13 +2239,14 @@ static int ubifs_fill_super(struct super_block *sb, void *data, int silent)
                goto out_umount;
        }
 
+       generic_set_sb_d_ops(sb);
        sb->s_root = d_make_root(root);
        if (!sb->s_root) {
                err = -ENOMEM;
                goto out_umount;
        }
 
-       import_uuid(&sb->s_uuid, c->uuid);
+       super_set_uuid(sb, c->uuid, sizeof(c->uuid));
 
        mutex_unlock(&c->umount_mutex);
        return 0;
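
Two small API migrations here: the per-dentry generic_set_encrypted_ci_d_ops() call at lookup time (removed in the previous hunk) is replaced by a single generic_set_sb_d_ops() on the superblock, and the open-coded import_uuid() becomes super_set_uuid(), which records the UUID length alongside the bytes.
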
index 9976a00a73f99c46fc27bf6a8e93a390af7b941e..e965a48e7db96f89b782038e0aa363d526c2a42e 100644
@@ -421,10 +421,10 @@ xfs_attr_complete_op(
        bool                    do_replace = args->op_flags & XFS_DA_OP_REPLACE;
 
        args->op_flags &= ~XFS_DA_OP_REPLACE;
-       if (do_replace) {
-               args->attr_filter &= ~XFS_ATTR_INCOMPLETE;
+       args->attr_filter &= ~XFS_ATTR_INCOMPLETE;
+       if (do_replace)
                return replace_state;
-       }
+
        return XFS_DAS_DONE;
 }
 
index 31100120b2c586bbcfb5ee3d1d413926400e215a..e31663cb7b4349e173c2b19ac33eb6b10cd59a33 100644
@@ -1118,20 +1118,6 @@ xfs_rtbitmap_blockcount(
        return howmany_64(rtextents, NBBY * mp->m_sb.sb_blocksize);
 }
 
-/*
- * Compute the maximum level number of the realtime summary file, as defined by
- * mkfs.  The historic use of highbit32 on a 64-bit quantity prohibited correct
- * use of rt volumes with more than 2^32 extents.
- */
-uint8_t
-xfs_compute_rextslog(
-       xfs_rtbxlen_t           rtextents)
-{
-       if (!rtextents)
-               return 0;
-       return xfs_highbit64(rtextents);
-}
-
 /*
  * Compute the number of rtbitmap words needed to populate every block of a
  * bitmap that is large enough to track the given number of rt extents.
index 274dc7dae1faf836217bcac95a859fb9cf510d93..152a66750af554d91a0641ae9a3ed1011ffb7386 100644
@@ -351,20 +351,6 @@ xfs_rtfree_extent(
 int xfs_rtfree_blocks(struct xfs_trans *tp, xfs_fsblock_t rtbno,
                xfs_filblks_t rtlen);
 
-uint8_t xfs_compute_rextslog(xfs_rtbxlen_t rtextents);
-
-/* Do we support an rt volume having this number of rtextents? */
-static inline bool
-xfs_validate_rtextents(
-       xfs_rtbxlen_t           rtextents)
-{
-       /* No runt rt volumes */
-       if (rtextents == 0)
-               return false;
-
-       return true;
-}
-
 xfs_filblks_t xfs_rtbitmap_blockcount(struct xfs_mount *mp, xfs_rtbxlen_t
                rtextents);
 unsigned long long xfs_rtbitmap_wordcount(struct xfs_mount *mp,
@@ -383,8 +369,6 @@ unsigned long long xfs_rtsummary_wordcount(struct xfs_mount *mp,
 # define xfs_rtsummary_read_buf(a,b)                   (-ENOSYS)
 # define xfs_rtbuf_cache_relse(a)                      (0)
 # define xfs_rtalloc_extent_is_free(m,t,s,l,i)         (-ENOSYS)
-# define xfs_compute_rextslog(rtx)                     (0)
-# define xfs_validate_rtextents(rtx)                   (false)
 static inline xfs_filblks_t
 xfs_rtbitmap_blockcount(struct xfs_mount *mp, xfs_rtbxlen_t rtextents)
 {
index 4a9e8588f4c98c3647a85682d56fe26620a59ffc..5bb6e2bd6deeed152414cbc8fae5db927f90bdd4 100644 (file)
@@ -1377,3 +1377,17 @@ xfs_validate_stripe_geometry(
        }
        return true;
 }
+
+/*
+ * Compute the maximum level number of the realtime summary file, as defined by
+ * mkfs.  The historic use of highbit32 on a 64-bit quantity prohibited correct
+ * use of rt volumes with more than 2^32 extents.
+ */
+uint8_t
+xfs_compute_rextslog(
+       xfs_rtbxlen_t           rtextents)
+{
+       if (!rtextents)
+               return 0;
+       return xfs_highbit64(rtextents);
+}
index 19134b23c10be3824de6a7949d6ccf9ebdfa8de0..2e8e8d63d4eb2249d148b8f6d50f2a71726911f5 100644 (file)
@@ -38,4 +38,6 @@ extern int    xfs_sb_get_secondary(struct xfs_mount *mp,
 extern bool    xfs_validate_stripe_geometry(struct xfs_mount *mp,
                __s64 sunit, __s64 swidth, int sectorsize, bool silent);
 
+uint8_t xfs_compute_rextslog(xfs_rtbxlen_t rtextents);
+
 #endif /* __XFS_SB_H__ */
index 20b5375f2d9c9ec466ab2cfc0a6482d20e23965b..62e02d5380ad3b47d6dc403a3b1ffba0d202ce43 100644 (file)
@@ -251,4 +251,16 @@ bool xfs_verify_fileoff(struct xfs_mount *mp, xfs_fileoff_t off);
 bool xfs_verify_fileext(struct xfs_mount *mp, xfs_fileoff_t off,
                xfs_fileoff_t len);
 
+/* Do we support an rt volume having this number of rtextents? */
+static inline bool
+xfs_validate_rtextents(
+       xfs_rtbxlen_t           rtextents)
+{
+       /* No runt rt volumes */
+       if (rtextents == 0)
+               return false;
+
+       return true;
+}
+
 #endif /* __XFS_TYPES_H__ */
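Moving xfs_compute_rextslog() next to the superblock code and xfs_validate_rtextents() into xfs_types.h keeps both helpers available as real functions even when CONFIG_XFS_RT=n, instead of the stubbed-out (0)/(false) macros deleted from xfs_rtbitmap.h above. A hedged sketch of the intended call pattern (rtextents would come from the superblock):

	uint8_t	rextslog;

	if (!xfs_validate_rtextents(rtextents))	/* reject zero-extent rt volumes */
		return -EINVAL;
	/* highest level number of the rt summary file for this extent count */
	rextslog = xfs_compute_rextslog(rtextents);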
index 441ca99776527453a19b2e709f2482a758fe5af0..46583517377ffadd57e557897bc050428e84bd4c 100644 (file)
@@ -15,6 +15,7 @@
 #include "xfs_inode.h"
 #include "xfs_bmap.h"
 #include "xfs_bit.h"
+#include "xfs_sb.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
 #include "scrub/repair.h"
index fabd0ed9dfa67637686768dff5d27ef0ee78674d..b1ff4f33324a7481ae9818173187e5a767348c76 100644 (file)
@@ -16,6 +16,7 @@
 #include "xfs_rtbitmap.h"
 #include "xfs_bit.h"
 #include "xfs_bmap.h"
+#include "xfs_sb.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
 #include "scrub/trace.h"
index 813f85156b0c3b9ed97a5d5740db2d20e3810e59..1698507d1ac73a0a4985322e00c52e61539d5317 100644 (file)
@@ -112,7 +112,7 @@ xfs_end_ioend(
         * longer dirty. If we don't remove delalloc blocks here, they become
         * stale and can corrupt free space accounting on unmount.
         */
-       error = blk_status_to_errno(ioend->io_bio->bi_status);
+       error = blk_status_to_errno(ioend->io_bio.bi_status);
        if (unlikely(error)) {
                if (ioend->io_flags & IOMAP_F_SHARED) {
                        xfs_reflink_cancel_cow_range(ip, offset, size, true);
@@ -179,7 +179,7 @@ STATIC void
 xfs_end_bio(
        struct bio              *bio)
 {
-       struct iomap_ioend      *ioend = bio->bi_private;
+       struct iomap_ioend      *ioend = iomap_ioend_from_bio(bio);
        struct xfs_inode        *ip = XFS_I(ioend->io_inode);
        unsigned long           flags;
 
@@ -276,7 +276,8 @@ static int
 xfs_map_blocks(
        struct iomap_writepage_ctx *wpc,
        struct inode            *inode,
-       loff_t                  offset)
+       loff_t                  offset,
+       unsigned int            len)
 {
        struct xfs_inode        *ip = XFS_I(inode);
        struct xfs_mount        *mp = ip->i_mount;
@@ -444,7 +445,7 @@ xfs_prepare_ioend(
        /* send ioends that might require a transaction to the completion wq */
        if (xfs_ioend_is_append(ioend) || ioend->io_type == IOMAP_UNWRITTEN ||
            (ioend->io_flags & IOMAP_F_SHARED))
-               ioend->io_bio->bi_end_io = xfs_end_bio;
+               ioend->io_bio.bi_end_io = xfs_end_bio;
        return status;
 }
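These hunks follow the iomap change that embeds the ioend's bio in struct iomap_ioend instead of pointing to a separately allocated one, so bi_private is no longer needed to find the ioend. Presumably the new helper is a plain container_of(), along these lines:

	/* Sketch: recover the ioend from its embedded bio. */
	static inline struct iomap_ioend *iomap_ioend_from_bio(struct bio *bio)
	{
		return container_of(bio, struct iomap_ioend, io_bio);
	}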
 
index 8e5bd50d29feb34b50a5a85dca9110648a6bf69e..01b41fabbe3c7ba89d2f59c6dad15e49dfe781c5 100644 (file)
@@ -1951,7 +1951,7 @@ xfs_free_buftarg(
        fs_put_dax(btp->bt_daxdev, btp->bt_mount);
        /* the main block device is closed by kill_block_super */
        if (btp->bt_bdev != btp->bt_mount->m_super->s_bdev)
-               bdev_release(btp->bt_bdev_handle);
+               fput(btp->bt_bdev_file);
 
        kmem_free(btp);
 }
@@ -1994,7 +1994,7 @@ xfs_setsize_buftarg_early(
 struct xfs_buftarg *
 xfs_alloc_buftarg(
        struct xfs_mount        *mp,
-       struct bdev_handle      *bdev_handle)
+       struct file             *bdev_file)
 {
        xfs_buftarg_t           *btp;
        const struct dax_holder_operations *ops = NULL;
@@ -2005,9 +2005,9 @@ xfs_alloc_buftarg(
        btp = kmem_zalloc(sizeof(*btp), KM_NOFS);
 
        btp->bt_mount = mp;
-       btp->bt_bdev_handle = bdev_handle;
-       btp->bt_dev = bdev_handle->bdev->bd_dev;
-       btp->bt_bdev = bdev_handle->bdev;
+       btp->bt_bdev_file = bdev_file;
+       btp->bt_bdev = file_bdev(bdev_file);
+       btp->bt_dev = btp->bt_bdev->bd_dev;
        btp->bt_daxdev = fs_dax_get_by_bdev(btp->bt_bdev, &btp->bt_dax_part_off,
                                            mp, ops);
 
index b470de08a46ca8e6e7c3b4dfe7e85110037f3949..304e858d04fb3cfca71d4378f2878293f7462d8b 100644 (file)
@@ -98,7 +98,7 @@ typedef unsigned int xfs_buf_flags_t;
  */
 typedef struct xfs_buftarg {
        dev_t                   bt_dev;
-       struct bdev_handle      *bt_bdev_handle;
+       struct file             *bt_bdev_file;
        struct block_device     *bt_bdev;
        struct dax_device       *bt_daxdev;
        u64                     bt_dax_part_off;
@@ -366,7 +366,7 @@ xfs_buf_update_cksum(struct xfs_buf *bp, unsigned long cksum_offset)
  *     Handling of buftargs.
  */
 struct xfs_buftarg *xfs_alloc_buftarg(struct xfs_mount *mp,
-               struct bdev_handle *bdev_handle);
+               struct file *bdev_file);
 extern void xfs_free_buftarg(struct xfs_buftarg *);
 extern void xfs_buftarg_wait(struct xfs_buftarg *);
 extern void xfs_buftarg_drain(struct xfs_buftarg *);
index aabb25dc3efab2ed57e8e1a99ec1aa0bfd503297..57fa21ad79124784d205f1313db461acda2546b1 100644 (file)
@@ -62,7 +62,7 @@ xfs_uuid_mount(
        int                     hole, i;
 
        /* Publish UUID in struct super_block */
-       uuid_copy(&mp->m_super->s_uuid, uuid);
+       super_set_uuid(mp->m_super, uuid->b, sizeof(*uuid));
 
        if (xfs_has_nouuid(mp))
                return 0;
@@ -706,6 +706,8 @@ xfs_mountfs(
        /* enable fail_at_unmount as default */
        mp->m_fail_unmount = true;
 
+       super_set_sysfs_name_id(mp->m_super);
+
        error = xfs_sysfs_init(&mp->m_kobj, &xfs_mp_ktype,
                               NULL, mp->m_super->s_id);
        if (error)
index aff20ddd4a9f9cdeeeca1f54f210d19462773a5b..00fbd5b6e582dffeeecec0e2a3f6f624d25304bd 100644 (file)
@@ -350,7 +350,6 @@ xfs_setup_dax_always(
                return -EINVAL;
        }
 
-       xfs_warn(mp, "DAX enabled. Warning: EXPERIMENTAL, use at your own risk");
        return 0;
 
 disable_dax:
@@ -362,16 +361,16 @@ STATIC int
 xfs_blkdev_get(
        xfs_mount_t             *mp,
        const char              *name,
-       struct bdev_handle      **handlep)
+       struct file             **bdev_filep)
 {
        int                     error = 0;
 
-       *handlep = bdev_open_by_path(name,
+       *bdev_filep = bdev_file_open_by_path(name,
                BLK_OPEN_READ | BLK_OPEN_WRITE | BLK_OPEN_RESTRICT_WRITES,
                mp->m_super, &fs_holder_ops);
-       if (IS_ERR(*handlep)) {
-               error = PTR_ERR(*handlep);
-               *handlep = NULL;
+       if (IS_ERR(*bdev_filep)) {
+               error = PTR_ERR(*bdev_filep);
+               *bdev_filep = NULL;
                xfs_warn(mp, "Invalid device [%s], error=%d", name, error);
        }
 
@@ -436,26 +435,26 @@ xfs_open_devices(
 {
        struct super_block      *sb = mp->m_super;
        struct block_device     *ddev = sb->s_bdev;
-       struct bdev_handle      *logdev_handle = NULL, *rtdev_handle = NULL;
+       struct file             *logdev_file = NULL, *rtdev_file = NULL;
        int                     error;
 
        /*
         * Open real time and log devices - order is important.
         */
        if (mp->m_logname) {
-               error = xfs_blkdev_get(mp, mp->m_logname, &logdev_handle);
+               error = xfs_blkdev_get(mp, mp->m_logname, &logdev_file);
                if (error)
                        return error;
        }
 
        if (mp->m_rtname) {
-               error = xfs_blkdev_get(mp, mp->m_rtname, &rtdev_handle);
+               error = xfs_blkdev_get(mp, mp->m_rtname, &rtdev_file);
                if (error)
                        goto out_close_logdev;
 
-               if (rtdev_handle->bdev == ddev ||
-                   (logdev_handle &&
-                    rtdev_handle->bdev == logdev_handle->bdev)) {
+               if (file_bdev(rtdev_file) == ddev ||
+                   (logdev_file &&
+                    file_bdev(rtdev_file) == file_bdev(logdev_file))) {
                        xfs_warn(mp,
        "Cannot mount filesystem with identical rtdev and ddev/logdev.");
                        error = -EINVAL;
@@ -467,25 +466,25 @@ xfs_open_devices(
         * Setup xfs_mount buffer target pointers
         */
        error = -ENOMEM;
-       mp->m_ddev_targp = xfs_alloc_buftarg(mp, sb->s_bdev_handle);
+       mp->m_ddev_targp = xfs_alloc_buftarg(mp, sb->s_bdev_file);
        if (!mp->m_ddev_targp)
                goto out_close_rtdev;
 
-       if (rtdev_handle) {
-               mp->m_rtdev_targp = xfs_alloc_buftarg(mp, rtdev_handle);
+       if (rtdev_file) {
+               mp->m_rtdev_targp = xfs_alloc_buftarg(mp, rtdev_file);
                if (!mp->m_rtdev_targp)
                        goto out_free_ddev_targ;
        }
 
-       if (logdev_handle && logdev_handle->bdev != ddev) {
-               mp->m_logdev_targp = xfs_alloc_buftarg(mp, logdev_handle);
+       if (logdev_file && file_bdev(logdev_file) != ddev) {
+               mp->m_logdev_targp = xfs_alloc_buftarg(mp, logdev_file);
                if (!mp->m_logdev_targp)
                        goto out_free_rtdev_targ;
        } else {
                mp->m_logdev_targp = mp->m_ddev_targp;
                /* Handle won't be used, drop it */
-               if (logdev_handle)
-                       bdev_release(logdev_handle);
+               if (logdev_file)
+                       fput(logdev_file);
        }
 
        return 0;
@@ -496,11 +495,11 @@ xfs_open_devices(
  out_free_ddev_targ:
        xfs_free_buftarg(mp->m_ddev_targp);
  out_close_rtdev:
-        if (rtdev_handle)
-               bdev_release(rtdev_handle);
+        if (rtdev_file)
+               fput(rtdev_file);
  out_close_logdev:
-       if (logdev_handle)
-               bdev_release(logdev_handle);
+       if (logdev_file)
+               fput(logdev_file);
        return error;
 }
 
@@ -1496,6 +1495,18 @@ xfs_fs_fill_super(
 
        mp->m_super = sb;
 
+       /*
+        * Copy VFS mount flags from the context now that all parameter parsing
+        * is guaranteed to have been completed by either the old mount API or
+        * the newer fsopen/fsconfig API.
+        */
+       if (fc->sb_flags & SB_RDONLY)
+               set_bit(XFS_OPSTATE_READONLY, &mp->m_opstate);
+       if (fc->sb_flags & SB_DIRSYNC)
+               mp->m_features |= XFS_FEAT_DIRSYNC;
+       if (fc->sb_flags & SB_SYNCHRONOUS)
+               mp->m_features |= XFS_FEAT_WSYNC;
+
        error = xfs_fs_validate_params(mp);
        if (error)
                return error;
@@ -1965,6 +1976,11 @@ static const struct fs_context_operations xfs_context_ops = {
        .free        = xfs_fs_free,
 };
 
+/*
+ * WARNING: do not initialise in this function any parameters that depend on
+ * mount option parsing having already been performed, as this function can be
+ * called from fsopen() before any parameters have been set.
+ */
 static int xfs_init_fs_context(
        struct fs_context       *fc)
 {
@@ -1996,16 +2012,6 @@ static int xfs_init_fs_context(
        mp->m_logbsize = -1;
        mp->m_allocsize_log = 16; /* 64k */
 
-       /*
-        * Copy binary VFS mount flags we are interested in.
-        */
-       if (fc->sb_flags & SB_RDONLY)
-               set_bit(XFS_OPSTATE_READONLY, &mp->m_opstate);
-       if (fc->sb_flags & SB_DIRSYNC)
-               mp->m_features |= XFS_FEAT_DIRSYNC;
-       if (fc->sb_flags & SB_SYNCHRONOUS)
-               mp->m_features |= XFS_FEAT_WSYNC;
-
        fc->s_fs_info = mp;
        fc->ops = &xfs_context_ops;
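The move matters because the new mount API splits context creation from parameter parsing: xfs_init_fs_context() runs at fsopen() time, before "ro" or "dirsync" can have been set, while fill_super only runs after FSCONFIG_CMD_CREATE. A userspace illustration of that ordering (error handling omitted; glibc 2.36+ provides these wrappers in <sys/mount.h>):

	int fsfd = fsopen("xfs", FSOPEN_CLOEXEC);	/* xfs_init_fs_context() runs here */
	fsconfig(fsfd, FSCONFIG_SET_STRING, "source", "/dev/sda1", 0);
	fsconfig(fsfd, FSCONFIG_SET_FLAG, "ro", NULL, 0);	/* sb flags arrive this late */
	fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);	/* xfs_fs_fill_super() runs here */
	int mfd = fsmount(fsfd, FSMOUNT_CLOEXEC, 0);
	move_mount(mfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);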
 
index 6ab2318a9c8e80271a3f17d2ad852dacad2f2fa1..3b103715acc90fe28a303d356f2a43d9b71f3eb6 100644 (file)
@@ -125,7 +125,8 @@ static void zonefs_readahead(struct readahead_control *rac)
  * which implies that the page range can only be within the fixed inode size.
  */
 static int zonefs_write_map_blocks(struct iomap_writepage_ctx *wpc,
-                                  struct inode *inode, loff_t offset)
+                                  struct inode *inode, loff_t offset,
+                                  unsigned int len)
 {
        struct zonefs_zone *z = zonefs_inode_zone(inode);
 
@@ -348,7 +349,12 @@ static int zonefs_file_write_dio_end_io(struct kiocb *iocb, ssize_t size,
        struct zonefs_inode_info *zi = ZONEFS_I(inode);
 
        if (error) {
-               zonefs_io_error(inode, true);
+               /*
+                * For Sync IOs, error recovery is called from
+                * zonefs_file_dio_write().
+                */
+               if (!is_sync_kiocb(iocb))
+                       zonefs_io_error(inode, true);
                return error;
        }
 
@@ -491,6 +497,14 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
                        ret = -EINVAL;
                        goto inode_unlock;
                }
+               /*
+                * Advance the zone write pointer offset. This assumes that the
+                * IO will succeed, which is OK to do because we do not allow
+                * partial writes (IOMAP_DIO_PARTIAL is not set), and if the IO
+                * fails, the error path will correct the write pointer offset.
+                */
+               z->z_wpoffset += count;
+               zonefs_inode_account_active(inode);
                mutex_unlock(&zi->i_truncate_mutex);
        }
 
@@ -504,20 +518,19 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
        if (ret == -ENOTBLK)
                ret = -EBUSY;
 
-       if (zonefs_zone_is_seq(z) &&
-           (ret > 0 || ret == -EIOCBQUEUED)) {
-               if (ret > 0)
-                       count = ret;
-
-               /*
-                * Update the zone write pointer offset assuming the write
-                * operation succeeded. If it did not, the error recovery path
-                * will correct it. Also do active seq file accounting.
-                */
-               mutex_lock(&zi->i_truncate_mutex);
-               z->z_wpoffset += count;
-               zonefs_inode_account_active(inode);
-               mutex_unlock(&zi->i_truncate_mutex);
+       /*
+        * For a failed IO or partial completion, trigger error recovery
+        * to update the zone write pointer offset to a correct value.
+        * For asynchronous IOs, zonefs_file_write_dio_end_io() may already
+        * have executed error recovery if the IO completed before we reached
+        * this point. We cannot know whether it did, so executing error
+        * recovery again here is harmless (it will not change anything).
+        */
+       if (zonefs_zone_is_seq(z)) {
+               if (ret > 0 && ret != count)
+                       ret = -EIO;
+               if (ret < 0 && ret != -EIOCBQUEUED)
+                       zonefs_io_error(inode, true);
        }
 
 inode_unlock:
index 93971742613a399d07fa2cf7e1f88cba61a91956..aadad16738df6b87feca401d14c8b9bab8cf72c8 100644 (file)
@@ -113,7 +113,7 @@ static int zonefs_zone_mgmt(struct super_block *sb,
 
        trace_zonefs_zone_mgmt(sb, z, op);
        ret = blkdev_zone_mgmt(sb->s_bdev, op, z->z_sector,
-                              z->z_size >> SECTOR_SHIFT, GFP_NOFS);
+                              z->z_size >> SECTOR_SHIFT);
        if (ret) {
                zonefs_err(sb,
                           "Zone management operation %s at %llu failed %d\n",
@@ -246,16 +246,18 @@ static void zonefs_inode_update_mode(struct inode *inode)
        z->z_mode = inode->i_mode;
 }
 
-struct zonefs_ioerr_data {
-       struct inode    *inode;
-       bool            write;
-};
-
 static int zonefs_io_error_cb(struct blk_zone *zone, unsigned int idx,
                              void *data)
 {
-       struct zonefs_ioerr_data *err = data;
-       struct inode *inode = err->inode;
+       struct blk_zone *z = data;
+
+       *z = *zone;
+       return 0;
+}
+
+static void zonefs_handle_io_error(struct inode *inode, struct blk_zone *zone,
+                                  bool write)
+{
        struct zonefs_zone *z = zonefs_inode_zone(inode);
        struct super_block *sb = inode->i_sb;
        struct zonefs_sb_info *sbi = ZONEFS_SB(sb);
@@ -270,8 +272,8 @@ static int zonefs_io_error_cb(struct blk_zone *zone, unsigned int idx,
        data_size = zonefs_check_zone_condition(sb, z, zone);
        isize = i_size_read(inode);
        if (!(z->z_flags & (ZONEFS_ZONE_READONLY | ZONEFS_ZONE_OFFLINE)) &&
-           !err->write && isize == data_size)
-               return 0;
+           !write && isize == data_size)
+               return;
 
        /*
         * At this point, we detected either a bad zone or an inconsistency
@@ -292,7 +294,7 @@ static int zonefs_io_error_cb(struct blk_zone *zone, unsigned int idx,
         * In all cases, warn about inode size inconsistency and handle the
         * IO error according to the zone condition and to the mount options.
         */
-       if (zonefs_zone_is_seq(z) && isize != data_size)
+       if (isize != data_size)
                zonefs_warn(sb,
                            "inode %lu: invalid size %lld (should be %lld)\n",
                            inode->i_ino, isize, data_size);
@@ -352,8 +354,6 @@ static int zonefs_io_error_cb(struct blk_zone *zone, unsigned int idx,
        zonefs_i_size_write(inode, data_size);
        z->z_wpoffset = data_size;
        zonefs_inode_account_active(inode);
-
-       return 0;
 }
 
 /*
@@ -367,23 +367,25 @@ void __zonefs_io_error(struct inode *inode, bool write)
 {
        struct zonefs_zone *z = zonefs_inode_zone(inode);
        struct super_block *sb = inode->i_sb;
-       struct zonefs_sb_info *sbi = ZONEFS_SB(sb);
        unsigned int noio_flag;
-       unsigned int nr_zones = 1;
-       struct zonefs_ioerr_data err = {
-               .inode = inode,
-               .write = write,
-       };
+       struct blk_zone zone;
        int ret;
 
        /*
-        * The only files that have more than one zone are conventional zone
-        * files with aggregated conventional zones, for which the inode zone
-        * size is always larger than the device zone size.
+        * Conventional zones have no write pointer and cannot become read-only
+        * or offline. So simply fake a report for a single or aggregated zone
+        * and let zonefs_handle_io_error() correct the zone inode information
+        * according to the mount options.
         */
-       if (z->z_size > bdev_zone_sectors(sb->s_bdev))
-               nr_zones = z->z_size >>
-                       (sbi->s_zone_sectors_shift + SECTOR_SHIFT);
+       if (!zonefs_zone_is_seq(z)) {
+               zone.start = z->z_sector;
+               zone.len = z->z_size >> SECTOR_SHIFT;
+               zone.wp = zone.start + zone.len;
+               zone.type = BLK_ZONE_TYPE_CONVENTIONAL;
+               zone.cond = BLK_ZONE_COND_NOT_WP;
+               zone.capacity = zone.len;
+               goto handle_io_error;
+       }
 
        /*
         * Memory allocations in blkdev_report_zones() can trigger a memory
@@ -394,12 +396,20 @@ void __zonefs_io_error(struct inode *inode, bool write)
         * the GFP_NOIO context avoids both problems.
         */
        noio_flag = memalloc_noio_save();
-       ret = blkdev_report_zones(sb->s_bdev, z->z_sector, nr_zones,
-                                 zonefs_io_error_cb, &err);
-       if (ret != nr_zones)
+       ret = blkdev_report_zones(sb->s_bdev, z->z_sector, 1,
+                                 zonefs_io_error_cb, &zone);
+       memalloc_noio_restore(noio_flag);
+
+       if (ret != 1) {
                zonefs_err(sb, "Get inode %lu zone information failed %d\n",
                           inode->i_ino, ret);
-       memalloc_noio_restore(noio_flag);
+               zonefs_warn(sb, "remounting filesystem read-only\n");
+               sb->s_flags |= SB_RDONLY;
+               return;
+       }
+
+handle_io_error:
+       zonefs_handle_io_error(inode, &zone, write);
 }
 
 static struct kmem_cache *zonefs_inode_cachep;
index 961f4d88f9ef784c3c8fbafd6925579698a93d5f..0c0695763bea394aadf9ed26abd8fb3bedc714cf 100644 (file)
@@ -193,7 +193,6 @@ do {                                                                        \
 #ifndef smp_store_release
 #define smp_store_release(p, v)                                                \
 do {                                                                   \
-       compiletime_assert_atomic_type(*p);                             \
        barrier();                                                      \
        WRITE_ONCE(*p, v);                                              \
 } while (0)
@@ -203,7 +202,6 @@ do {                                                                        \
 #define smp_load_acquire(p)                                            \
 ({                                                                     \
        __unqual_scalar_typeof(*p) ___p1 = READ_ONCE(*p);               \
-       compiletime_assert_atomic_type(*p);                             \
        barrier();                                                      \
        (typeof(*p))___p1;                                              \
 })
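Dropping compiletime_assert_atomic_type() relaxes which types the generic fallbacks accept; the ordering semantics are unchanged. For reference, the message-passing pattern these primitives exist to order, as a minimal sketch (compute()/consume() are hypothetical):

	int data;
	int ready;

	/* producer */
	data = compute();
	smp_store_release(&ready, 1);	/* orders the store to data before ready */

	/* consumer */
	if (smp_load_acquire(&ready))	/* pairs with the release above */
		consume(data);		/* guaranteed to observe the producer's data */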
index c4c423e97f069c325ba2ed41b6839adb160d95f6..4453906105ca183a8fe20be81468f5211666d01f 100644 (file)
@@ -9,6 +9,8 @@
 
 #include <drm/drm_connector.h>
 
+struct auxiliary_device;
+
 #if IS_ENABLED(CONFIG_DRM_AUX_BRIDGE)
 int drm_aux_bridge_register(struct device *parent);
 #else
@@ -19,10 +21,23 @@ static inline int drm_aux_bridge_register(struct device *parent)
 #endif
 
 #if IS_ENABLED(CONFIG_DRM_AUX_HPD_BRIDGE)
+struct auxiliary_device *devm_drm_dp_hpd_bridge_alloc(struct device *parent, struct device_node *np);
+int devm_drm_dp_hpd_bridge_add(struct device *dev, struct auxiliary_device *adev);
 struct device *drm_dp_hpd_bridge_register(struct device *parent,
                                          struct device_node *np);
 void drm_aux_hpd_bridge_notify(struct device *dev, enum drm_connector_status status);
 #else
+static inline struct auxiliary_device *devm_drm_dp_hpd_bridge_alloc(struct device *parent,
+                                                                   struct device_node *np)
+{
+       return NULL;
+}
+
+static inline int devm_drm_dp_hpd_bridge_add(struct device *dev, struct auxiliary_device *adev)
+{
+       return 0;
+}
+
 static inline struct device *drm_dp_hpd_bridge_register(struct device *parent,
                                                        struct device_node *np)
 {
index fcb4a4940ace74c98aacb1b18c62eee54a95ad4e..61637ef323026c8f87112af494eaca70fb963a17 100644 (file)
@@ -579,12 +579,12 @@ void __printf(2, 3) kunit_log_append(struct string_stream *log, const char *fmt,
 
 void __noreturn __kunit_abort(struct kunit *test);
 
-void __kunit_do_failed_assertion(struct kunit *test,
-                              const struct kunit_loc *loc,
-                              enum kunit_assert_type type,
-                              const struct kunit_assert *assert,
-                              assert_format_t assert_format,
-                              const char *fmt, ...);
+void __printf(6, 7) __kunit_do_failed_assertion(struct kunit *test,
+                                               const struct kunit_loc *loc,
+                                               enum kunit_assert_type type,
+                                               const struct kunit_assert *assert,
+                                               assert_format_t assert_format,
+                                               const char *fmt, ...);
 
 #define _KUNIT_FAILED(test, assert_type, assert_class, assert_format, INITIALIZER, fmt, ...) do { \
        static const struct kunit_loc __loc = KUNIT_CURRENT_LOC;               \
index ae12696ec492c67339409904bb612e9fdc372689..2ad261082bba5f6f0049fa1c642b6ff057f32b5a 100644 (file)
@@ -141,8 +141,6 @@ struct bdi_writeback {
        struct delayed_work dwork;      /* work item used for writeback */
        struct delayed_work bw_dwork;   /* work item used for bandwidth estimate */
 
-       unsigned long dirty_sleep;      /* last wait */
-
        struct list_head bdi_node;      /* anchored at bdi->wb_list */
 
 #ifdef CONFIG_CGROUP_WRITEBACK
@@ -179,6 +177,11 @@ struct backing_dev_info {
         * any dirty wbs, which is depended upon by bdi_has_dirty().
         */
        atomic_long_t tot_write_bandwidth;
+       /*
+        * Jiffies when the last process was dirty-throttled on this bdi.
+        * Used by blk-wbt.
+        */
+       unsigned long last_bdp_sleep;
 
        struct bdi_writeback wb;  /* the root writeback info for this bdi */
        struct list_head wb_list; /* list of all wbs */
index 1a97277f99b1b82de9e96eb8b9ca5544f9aa6e3a..8e7af9a03b41dd9261254eb4c6a74b748d625391 100644 (file)
@@ -38,7 +38,6 @@ struct backing_dev_info *bdi_alloc(int node_id);
 
 void wb_start_background_writeback(struct bdi_writeback *wb);
 void wb_workfn(struct work_struct *work);
-void wb_wakeup_delayed(struct bdi_writeback *wb);
 
 void wb_wait_for_completion(struct wb_completion *done);
 
index 378b2459efe2da03565997d0980ed10656516c17..e253e7bd0d1793f23f1de2fb03a9032c66bbe659 100644 (file)
@@ -20,6 +20,7 @@ struct blk_integrity_iter {
        unsigned int            data_size;
        unsigned short          interval;
        unsigned char           tuple_size;
+       unsigned char           pi_offset;
        const char              *disk_name;
 };
 
index 7a8150a5f051339f680b9df83fa78da48b8c8af1..d3d8fd8e229b61443f5e3e7256dfcb2a19c34513 100644 (file)
@@ -8,6 +8,7 @@
 #include <linux/scatterlist.h>
 #include <linux/prefetch.h>
 #include <linux/srcu.h>
+#include <linux/rw_hint.h>
 
 struct blk_mq_tags;
 struct blk_flush_queue;
@@ -135,6 +136,7 @@ struct request {
        struct blk_crypto_keyslot *crypt_keyslot;
 #endif
 
+       enum rw_hint write_hint;
        unsigned short ioprio;
 
        enum mq_rq_state state;
@@ -682,17 +684,19 @@ enum {
 
 #define BLK_MQ_NO_HCTX_IDX     (-1U)
 
-struct gendisk *__blk_mq_alloc_disk(struct blk_mq_tag_set *set, void *queuedata,
+struct gendisk *__blk_mq_alloc_disk(struct blk_mq_tag_set *set,
+               struct queue_limits *lim, void *queuedata,
                struct lock_class_key *lkclass);
-#define blk_mq_alloc_disk(set, queuedata)                              \
+#define blk_mq_alloc_disk(set, lim, queuedata)                         \
 ({                                                                     \
        static struct lock_class_key __key;                             \
                                                                        \
-       __blk_mq_alloc_disk(set, queuedata, &__key);                    \
+       __blk_mq_alloc_disk(set, lim, queuedata, &__key);               \
 })
 struct gendisk *blk_mq_alloc_disk_for_queue(struct request_queue *q,
                struct lock_class_key *lkclass);
-struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *);
+struct request_queue *blk_mq_alloc_queue(struct blk_mq_tag_set *set,
+               struct queue_limits *lim, void *queuedata);
 int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
                struct request_queue *q);
 void blk_mq_destroy_queue(struct request_queue *);
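blk_mq_alloc_disk() (and blk_mq_alloc_queue(), which replaces blk_mq_init_queue()) now take a queue_limits pointer so a driver can state its hardware limits at allocation time rather than calling blk_queue_*() setters afterwards. A hedged driver-side sketch (tag_set and drv_data are hypothetical; a NULL limits pointer presumably keeps the defaults):

	struct queue_limits lim = {
		.logical_block_size	= 512,
		.max_hw_sectors		= 256,
	};
	struct gendisk *disk;

	disk = blk_mq_alloc_disk(&tag_set, &lim, drv_data);
	if (IS_ERR(disk))
		return PTR_ERR(disk);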
index f288c94374b307fb1f47cade7e86943858fd6741..cb1526ec44b5f66572337fff1ba61dcc704a1d19 100644 (file)
@@ -10,6 +10,7 @@
 #include <linux/bvec.h>
 #include <linux/device.h>
 #include <linux/ktime.h>
+#include <linux/rw_hint.h>
 
 struct bio_set;
 struct bio;
@@ -206,52 +207,10 @@ static inline bool blk_path_error(blk_status_t error)
        return true;
 }
 
-/*
- * From most significant bit:
- * 1 bit: reserved for other usage, see below
- * 12 bits: original size of bio
- * 51 bits: issue time of bio
- */
-#define BIO_ISSUE_RES_BITS      1
-#define BIO_ISSUE_SIZE_BITS     12
-#define BIO_ISSUE_RES_SHIFT     (64 - BIO_ISSUE_RES_BITS)
-#define BIO_ISSUE_SIZE_SHIFT    (BIO_ISSUE_RES_SHIFT - BIO_ISSUE_SIZE_BITS)
-#define BIO_ISSUE_TIME_MASK     ((1ULL << BIO_ISSUE_SIZE_SHIFT) - 1)
-#define BIO_ISSUE_SIZE_MASK     \
-       (((1ULL << BIO_ISSUE_SIZE_BITS) - 1) << BIO_ISSUE_SIZE_SHIFT)
-#define BIO_ISSUE_RES_MASK      (~((1ULL << BIO_ISSUE_RES_SHIFT) - 1))
-
-/* Reserved bit for blk-throtl */
-#define BIO_ISSUE_THROTL_SKIP_LATENCY (1ULL << 63)
-
 struct bio_issue {
        u64 value;
 };
 
-static inline u64 __bio_issue_time(u64 time)
-{
-       return time & BIO_ISSUE_TIME_MASK;
-}
-
-static inline u64 bio_issue_time(struct bio_issue *issue)
-{
-       return __bio_issue_time(issue->value);
-}
-
-static inline sector_t bio_issue_size(struct bio_issue *issue)
-{
-       return ((issue->value & BIO_ISSUE_SIZE_MASK) >> BIO_ISSUE_SIZE_SHIFT);
-}
-
-static inline void bio_issue_init(struct bio_issue *issue,
-                                      sector_t size)
-{
-       size &= (1ULL << BIO_ISSUE_SIZE_BITS) - 1;
-       issue->value = ((issue->value & BIO_ISSUE_RES_MASK) |
-                       (ktime_get_ns() & BIO_ISSUE_TIME_MASK) |
-                       ((u64)size << BIO_ISSUE_SIZE_SHIFT));
-}
-
 typedef __u32 __bitwise blk_opf_t;
 
 typedef unsigned int blk_qc_t;
@@ -269,6 +228,7 @@ struct bio {
                                                 */
        unsigned short          bi_flags;       /* BIO_* below */
        unsigned short          bi_ioprio;
+       enum rw_hint            bi_write_hint;
        blk_status_t            bi_status;
        atomic_t                __bi_remaining;
 
index 99e4f5e722132c2c4f301816bbab7871f2f2ccb0..f9b87c39cab0478aac030e50174dd5b3fd7c8f16 100644 (file)
@@ -24,6 +24,7 @@
 #include <linux/sbitmap.h>
 #include <linux/uuid.h>
 #include <linux/xarray.h>
+#include <linux/file.h>
 
 struct module;
 struct request_queue;
@@ -42,7 +43,7 @@ struct blk_crypto_profile;
 
 extern const struct device_type disk_type;
 extern const struct device_type part_type;
-extern struct class block_class;
+extern const struct class block_class;
 
 /*
  * Maximum number of blkcg policies allowed to be registered concurrently.
@@ -108,6 +109,7 @@ struct blk_integrity {
        const struct blk_integrity_profile      *profile;
        unsigned char                           flags;
        unsigned char                           tuple_size;
+       unsigned char                           pi_offset;
        unsigned char                           interval_exp;
        unsigned char                           tag_size;
 };
@@ -189,8 +191,6 @@ struct gendisk {
         * blk_mq_unfreeze_queue().
         */
        unsigned int            nr_zones;
-       unsigned int            max_open_zones;
-       unsigned int            max_active_zones;
        unsigned long           *conv_zones_bitmap;
        unsigned long           *seq_zones_wlock;
 #endif /* CONFIG_BLK_DEV_ZONED */
@@ -292,6 +292,7 @@ struct queue_limits {
        unsigned int            io_opt;
        unsigned int            max_discard_sectors;
        unsigned int            max_hw_discard_sectors;
+       unsigned int            max_user_discard_sectors;
        unsigned int            max_secure_erase_sectors;
        unsigned int            max_write_zeroes_sectors;
        unsigned int            max_zone_append_sectors;
@@ -307,6 +308,8 @@ struct queue_limits {
        unsigned char           discard_misaligned;
        unsigned char           raid_partial_stripes_expensive;
        bool                    zoned;
+       unsigned int            max_open_zones;
+       unsigned int            max_active_zones;
 
        /*
         * Drivers that set dma_alignment to less than 511 must be prepared to
@@ -325,7 +328,7 @@ void disk_set_zoned(struct gendisk *disk);
 int blkdev_report_zones(struct block_device *bdev, sector_t sector,
                unsigned int nr_zones, report_zones_cb cb, void *data);
 int blkdev_zone_mgmt(struct block_device *bdev, enum req_op op,
-               sector_t sectors, sector_t nr_sectors, gfp_t gfp_mask);
+               sector_t sectors, sector_t nr_sectors);
 int blk_revalidate_disk_zones(struct gendisk *disk,
                void (*update_driver_data)(struct gendisk *disk));
 
@@ -473,6 +476,7 @@ struct request_queue {
 
        struct mutex            sysfs_lock;
        struct mutex            sysfs_dir_lock;
+       struct mutex            limits_lock;
 
        /*
         * for reusing dead hctx instance in case of updating
@@ -639,23 +643,23 @@ static inline bool disk_zone_is_seq(struct gendisk *disk, sector_t sector)
 static inline void disk_set_max_open_zones(struct gendisk *disk,
                unsigned int max_open_zones)
 {
-       disk->max_open_zones = max_open_zones;
+       disk->queue->limits.max_open_zones = max_open_zones;
 }
 
 static inline void disk_set_max_active_zones(struct gendisk *disk,
                unsigned int max_active_zones)
 {
-       disk->max_active_zones = max_active_zones;
+       disk->queue->limits.max_active_zones = max_active_zones;
 }
 
 static inline unsigned int bdev_max_open_zones(struct block_device *bdev)
 {
-       return bdev->bd_disk->max_open_zones;
+       return bdev->bd_disk->queue->limits.max_open_zones;
 }
 
 static inline unsigned int bdev_max_active_zones(struct block_device *bdev)
 {
-       return bdev->bd_disk->max_active_zones;
+       return bdev->bd_disk->queue->limits.max_active_zones;
 }
 
 #else /* CONFIG_BLK_DEV_ZONED */
@@ -763,22 +767,26 @@ static inline u64 sb_bdev_nr_blocks(struct super_block *sb)
 int bdev_disk_changed(struct gendisk *disk, bool invalidate);
 
 void put_disk(struct gendisk *disk);
-struct gendisk *__blk_alloc_disk(int node, struct lock_class_key *lkclass);
+struct gendisk *__blk_alloc_disk(struct queue_limits *lim, int node,
+               struct lock_class_key *lkclass);
 
 /**
  * blk_alloc_disk - allocate a gendisk structure
+ * @lim: queue limits to be used for this disk.
  * @node_id: numa node to allocate on
  *
  * Allocate and pre-initialize a gendisk structure for use with BIO based
  * drivers.
  *
+ * Returns an ERR_PTR on error, else the allocated disk.
+ *
  * Context: can sleep
  */
-#define blk_alloc_disk(node_id)                                                \
+#define blk_alloc_disk(lim, node_id)                                   \
 ({                                                                     \
        static struct lock_class_key __key;                             \
                                                                        \
-       __blk_alloc_disk(node_id, &__key);                              \
+       __blk_alloc_disk(lim, node_id, &__key);                         \
 })
 
 int __register_blkdev(unsigned int major, const char *name,
@@ -861,6 +869,29 @@ static inline unsigned int blk_chunk_sectors_left(sector_t offset,
        return chunk_sectors - (offset & (chunk_sectors - 1));
 }
 
+/**
+ * queue_limits_start_update - start an atomic update of queue limits
+ * @q:         queue to update
+ *
+ * This function starts an atomic update of the queue limits.  It takes a lock
+ * to prevent other updates and returns a snapshot of the current limits that
+ * the caller can modify.  The caller must call queue_limits_commit_update()
+ * to finish the update.
+ *
+ * Context: process context.  The caller must have frozen the queue or ensured
+ * that there is no outstanding I/O by other means.
+ */
+static inline struct queue_limits
+queue_limits_start_update(struct request_queue *q)
+       __acquires(q->limits_lock)
+{
+       mutex_lock(&q->limits_lock);
+       return q->limits;
+}
+int queue_limits_commit_update(struct request_queue *q,
+               struct queue_limits *lim);
+int queue_limits_set(struct request_queue *q, struct queue_limits *lim);
+
 /*
  * Access functions for manipulating queue properties
  */
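queue_limits_commit_update() pairs with the snapshot above and presumably validates the new limits and drops q->limits_lock. A minimal update sequence might look like:

	struct queue_limits lim;
	int error;

	lim = queue_limits_start_update(q);	/* takes q->limits_lock */
	lim.max_user_discard_sectors = new_max;	/* new_max is hypothetical */
	error = queue_limits_commit_update(q, &lim);
	if (error)
		return error;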
@@ -894,8 +925,8 @@ extern void blk_set_queue_depth(struct request_queue *q, unsigned int depth);
 extern void blk_set_stacking_limits(struct queue_limits *lim);
 extern int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
                            sector_t offset);
-extern void disk_stack_limits(struct gendisk *disk, struct block_device *bdev,
-                             sector_t offset);
+void queue_limits_stack_bdev(struct queue_limits *t, struct block_device *bdev,
+               sector_t offset, const char *pfx);
 extern void blk_queue_update_dma_pad(struct request_queue *, unsigned int);
 extern void blk_queue_segment_boundary(struct request_queue *, unsigned long);
 extern void blk_queue_virt_boundary(struct request_queue *, unsigned long);
@@ -942,6 +973,7 @@ struct blk_plug {
 
        /* if ios_left is > 1, we can batch tag/rq allocations */
        struct request *cached_rq;
+       u64 cur_ktime;
        unsigned short nr_ios;
 
        unsigned short rq_count;
@@ -972,6 +1004,18 @@ static inline void blk_flush_plug(struct blk_plug *plug, bool async)
                __blk_flush_plug(plug, async);
 }
 
+/*
+ * tsk == current here
+ */
+static inline void blk_plug_invalidate_ts(struct task_struct *tsk)
+{
+       struct blk_plug *plug = tsk->plug;
+
+       if (plug)
+               plug->cur_ktime = 0;
+       current->flags &= ~PF_BLOCK_TS;
+}
+
 int blkdev_issue_flush(struct block_device *bdev);
 long nr_blockdev_pages(void);
 #else /* CONFIG_BLOCK */
@@ -995,6 +1039,10 @@ static inline void blk_flush_plug(struct blk_plug *plug, bool async)
 {
 }
 
+static inline void blk_plug_invalidate_ts(struct task_struct *tsk)
+{
+}
+
 static inline int blkdev_issue_flush(struct block_device *bdev)
 {
        return 0;
@@ -1474,26 +1522,20 @@ extern const struct blk_holder_ops fs_holder_ops;
        (BLK_OPEN_READ | BLK_OPEN_RESTRICT_WRITES | \
         (((flags) & SB_RDONLY) ? 0 : BLK_OPEN_WRITE))
 
-struct bdev_handle {
-       struct block_device *bdev;
-       void *holder;
-       blk_mode_t mode;
-};
-
-struct bdev_handle *bdev_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
+struct file *bdev_file_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
                const struct blk_holder_ops *hops);
-struct bdev_handle *bdev_open_by_path(const char *path, blk_mode_t mode,
+struct file *bdev_file_open_by_path(const char *path, blk_mode_t mode,
                void *holder, const struct blk_holder_ops *hops);
 int bd_prepare_to_claim(struct block_device *bdev, void *holder,
                const struct blk_holder_ops *hops);
 void bd_abort_claiming(struct block_device *bdev, void *holder);
-void bdev_release(struct bdev_handle *handle);
 
 /* just for blk-cgroup, don't use elsewhere */
 struct block_device *blkdev_get_no_open(dev_t dev);
 void blkdev_put_no_open(struct block_device *bdev);
 
 struct block_device *I_BDEV(struct inode *inode);
+struct block_device *file_bdev(struct file *bdev_file);
 
 #ifdef CONFIG_BLOCK
 void invalidate_bdev(struct block_device *bdev);
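These declarations are the file-based replacement for struct bdev_handle seen throughout the xfs hunks above: the opener keeps a struct file, derives the block_device via file_bdev(), and releases everything with a plain fput(). A minimal sketch (the path and holder are hypothetical):

	struct file *bdev_file;
	struct block_device *bdev;

	bdev_file = bdev_file_open_by_path("/dev/example", BLK_OPEN_READ,
					   holder, &fs_holder_ops);
	if (IS_ERR(bdev_file))
		return PTR_ERR(bdev_file);
	bdev = file_bdev(bdev_file);
	/* ... submit I/O against bdev ... */
	fput(bdev_file);	/* closes the device */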
index 555aae5448ae4ec00065e00553955b31bb44884b..bd1e361b351c5afa88ff02e7023cd79acd5454fd 100644 (file)
@@ -83,7 +83,7 @@ struct bvec_iter {
 
        unsigned int            bi_bvec_done;   /* number of bytes completed in
                                                   current bvec */
-} __packed;
+} __packed __aligned(4);
 
 struct bvec_iter_all {
        struct bio_vec  bv;
index 2eaaabbe98cb64d3f64a698a6f527e244649ca4e..1717cc57cdacd3532e5de7d35f1c2d6eb4e1ef5b 100644 (file)
@@ -283,7 +283,7 @@ struct ceph_msg {
        struct kref kref;
        bool more_to_follow;
        bool needs_out_seq;
-       bool sparse_read;
+       u64 sparse_read_total;
        int front_alloc_len;
 
        struct ceph_msgpool *pool;
index fa018d5864e7422c522194c16ff45a8dd0db1376..f66f6aac74f6f108ffba40b62159e047a184b732 100644 (file)
@@ -45,6 +45,7 @@ enum ceph_sparse_read_state {
        CEPH_SPARSE_READ_HDR    = 0,
        CEPH_SPARSE_READ_EXTENTS,
        CEPH_SPARSE_READ_DATA_LEN,
+       CEPH_SPARSE_READ_DATA_PRE,
        CEPH_SPARSE_READ_DATA,
 };
 
@@ -64,7 +65,7 @@ struct ceph_sparse_read {
        u64                             sr_req_len;  /* orig request length */
        u64                             sr_pos;      /* current pos in buffer */
        int                             sr_index;    /* current extent index */
-       __le32                          sr_datalen;  /* length of actual data */
+       u32                             sr_datalen;  /* length of actual data */
        u32                             sr_count;    /* extent count in reply */
        int                             sr_ext_len;  /* length of extent array */
        struct ceph_sparse_extent       *sr_extent;  /* extent array */
index aebb65bf95a7988dfe8cf9cb1b5ec0945640ced3..75bd1692d2e3791c8122cdce74ebfd67032fee14 100644 (file)
                __builtin_unreachable();        \
        } while (0)
 
+/*
+ * GCC 'asm goto' with outputs miscompiles certain code sequences:
+ *
+ *   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921
+ *
+ * Work around it via the same compiler barrier quirk that we used
+ * to use for the old 'asm goto' workaround.
+ *
+ * Also, always mark such 'asm goto' statements as volatile: all
+ * asm goto statements are supposed to be volatile as per the
+ * documentation, but some versions of gcc didn't actually do
+ * that for asms with outputs:
+ *
+ *    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98619
+ */
+#ifdef CONFIG_GCC_ASM_GOTO_OUTPUT_WORKAROUND
+#define asm_goto_output(x...) \
+       do { asm volatile goto(x); asm (""); } while (0)
+#endif
+
 #if defined(CONFIG_ARCH_USE_BUILTIN_BSWAP)
 #define __HAVE_BUILTIN_BSWAP32__
 #define __HAVE_BUILTIN_BSWAP64__
index 28566624f008f49ebe334f1275f59aafaf0c75f4..289810685fc55edd95e3f98705de428786aa0d32 100644 (file)
 #endif
 
 /*
- * Optional: only supported since gcc >= 14
+ * Optional: only supported since gcc >= 15
  * Optional: only supported since clang >= 18
  *
  *   gcc: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108896
- * clang: https://reviews.llvm.org/D148381
+ * clang: https://github.com/llvm/llvm-project/pull/76348
  */
 #if __has_attribute(__counted_by__)
 # define __counted_by(member)          __attribute__((__counted_by__(member)))
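For reference, __counted_by() ties a flexible array member to the field holding its element count, letting FORTIFY_SOURCE and UBSAN bounds-check accesses. A typical annotation:

	struct example_buf {
		u16 len;			/* number of elements in data[] */
		u8 data[] __counted_by(len);	/* len must be set before data is indexed */
	};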
index 6f1ca49306d2f7e7b51817fc579b82a85246736d..0caf354cb94b5ad9d802addff81a1bd1c3250139 100644 (file)
@@ -362,8 +362,15 @@ struct ftrace_likely_data {
 #define __member_size(p)       __builtin_object_size(p, 1)
 #endif
 
-#ifndef asm_volatile_goto
-#define asm_volatile_goto(x...) asm goto(x)
+/*
+ * Some versions of gcc do not mark 'asm goto' volatile:
+ *
+ *  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103979
+ *
+ * We do it here by hand, because it doesn't hurt.
+ */
+#ifndef asm_goto_output
+#define asm_goto_output(x...) asm volatile goto(x)
 #endif
 
 #ifdef CONFIG_CC_HAS_ASM_INLINE
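asm_goto_output() is now the one spelling for 'asm goto' statements that have output operands, with the workaround above kicking in on affected gcc versions. A purely illustrative x86-flavoured use (read_and_test and its asm body are hypothetical):

	static inline bool read_and_test(unsigned long *addr, unsigned long *val)
	{
		asm_goto_output("mov %[a], %[v]\n\t"
				"test %[v], %[v]\n\t"
				"je %l[is_zero]"
				: [v] "=r" (*val)	/* output operand */
				: [a] "m" (*addr)
				: "cc"
				: is_zero);		/* label list */
		return true;
	is_zero:
		return false;
	}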
index c1a7dc3251215a5ba0e982568a746ff5b04602d1..265b0f8fc0b3c876191ba94bbc2d1d9dd66dd848 100644 (file)
@@ -90,6 +90,29 @@ enum {
        GUID_INIT(0x667DD791, 0xC6B3, 0x4c27, 0x8A, 0x6B, 0x0F, 0x8E,   \
                  0x72, 0x2D, 0xEB, 0x41)
 
+/* CXL Event record UUIDs are formatted as GUIDs and reported in section type */
+/*
+ * General Media Event Record
+ * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
+ */
+#define CPER_SEC_CXL_GEN_MEDIA_GUID                                    \
+       GUID_INIT(0xfbcd0a77, 0xc260, 0x417f,                           \
+                 0x85, 0xa9, 0x08, 0x8b, 0x16, 0x21, 0xeb, 0xa6)
+/*
+ * DRAM Event Record
+ * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
+ */
+#define CPER_SEC_CXL_DRAM_GUID                                         \
+       GUID_INIT(0x601dcbb3, 0x9c06, 0x4eab,                           \
+                 0xb8, 0xaf, 0x4e, 0x9b, 0xfb, 0x5c, 0x96, 0x24)
+/*
+ * Memory Module Event Record
+ * CXL rev 3.0 section 8.2.9.2.1.3; Table 8-45
+ */
+#define CPER_SEC_CXL_MEM_MODULE_GUID                                   \
+       GUID_INIT(0xfe927475, 0xdd59, 0x4339,                           \
+                 0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74)
+
 /*
  * Flags bits definitions for flags in struct cper_record_header
  * If set, the error has been recovered
index 91125eca4c8ab8ded08a5b4b687c65c69d656401..03fa6d50d46fe5886d92d3cb6cddfe29fc43af11 100644 (file)
@@ -140,22 +140,4 @@ struct cxl_cper_event_rec {
        union cxl_event event;
 } __packed;
 
-typedef void (*cxl_cper_callback)(enum cxl_event_type type,
-                                 struct cxl_cper_event_rec *rec);
-
-#ifdef CONFIG_ACPI_APEI_GHES
-int cxl_cper_register_callback(cxl_cper_callback callback);
-int cxl_cper_unregister_callback(cxl_cper_callback callback);
-#else
-static inline int cxl_cper_register_callback(cxl_cper_callback callback)
-{
-       return 0;
-}
-
-static inline int cxl_cper_unregister_callback(cxl_cper_callback callback)
-{
-       return 0;
-}
-#endif
-
 #endif /* _LINUX_CXL_EVENT_H */
index 1666c387861f7a8fae32d7ae3acd17c950142ff5..bf53e3894aae33ef15218463db3c9f85985b9e26 100644 (file)
@@ -125,6 +125,11 @@ enum dentry_d_lock_class
        DENTRY_D_LOCK_NESTED
 };
 
+enum d_real_type {
+       D_REAL_DATA,
+       D_REAL_METADATA,
+};
+
 struct dentry_operations {
        int (*d_revalidate)(struct dentry *, unsigned int);
        int (*d_weak_revalidate)(struct dentry *, unsigned int);
@@ -139,7 +144,7 @@ struct dentry_operations {
        char *(*d_dname)(struct dentry *, char *, int);
        struct vfsmount *(*d_automount)(struct path *);
        int (*d_manage)(const struct path *, bool);
-       struct dentry *(*d_real)(struct dentry *, const struct inode *);
+       struct dentry *(*d_real)(struct dentry *, enum d_real_type type);
 } ____cacheline_aligned;
 
 /*
@@ -173,6 +178,7 @@ struct dentry_operations {
 #define DCACHE_DONTCACHE               BIT(7) /* Purge from memory on final dput() */
 
 #define DCACHE_CANT_MOUNT              BIT(8)
+#define DCACHE_GENOCIDE                        BIT(9)
 #define DCACHE_SHRINK_LIST             BIT(10)
 
 #define DCACHE_OP_WEAK_REVALIDATE      BIT(11)
@@ -546,24 +552,23 @@ static inline struct inode *d_backing_inode(const struct dentry *upper)
 /**
  * d_real - Return the real dentry
  * @dentry: the dentry to query
- * @inode: inode to select the dentry from multiple layers (can be NULL)
+ * @type: the type of real dentry (data or metadata)
  *
  * If dentry is on a union/overlay, then return the underlying, real dentry.
  * Otherwise return the dentry itself.
  *
  * See also: Documentation/filesystems/vfs.rst
  */
-static inline struct dentry *d_real(struct dentry *dentry,
-                                   const struct inode *inode)
+static inline struct dentry *d_real(struct dentry *dentry, enum d_real_type type)
 {
        if (unlikely(dentry->d_flags & DCACHE_OP_REAL))
-               return dentry->d_op->d_real(dentry, inode);
+               return dentry->d_op->d_real(dentry, type);
        else
                return dentry;
 }
 
 /**
- * d_real_inode - Return the real inode
+ * d_real_inode - Return the real inode hosting the data
  * @dentry: The dentry to query
  *
  * If dentry is on a union/overlay, then return the underlying, real inode.
@@ -572,7 +577,7 @@ static inline struct dentry *d_real(struct dentry *dentry,
 static inline struct inode *d_real_inode(const struct dentry *dentry)
 {
        /* This usage of d_real() results in const dentry */
-       return d_backing_inode(d_real((struct dentry *) dentry, NULL));
+       return d_inode(d_real((struct dentry *) dentry, D_REAL_DATA));
 }
 
 struct name_snapshot {
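With the enum, a stacking filesystem's ->d_real() selects the layer by purpose rather than by matching a passed-in inode. A hypothetical implementation sketch (the examplefs_* helpers are made up):

	static struct dentry *examplefs_d_real(struct dentry *dentry,
					       enum d_real_type type)
	{
		switch (type) {
		case D_REAL_DATA:
			return examplefs_data_dentry(dentry);	/* layer holding contents */
		case D_REAL_METADATA:
			return examplefs_meta_dentry(dentry);	/* layer holding attributes */
		}
		return dentry;
	}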
index 772ab4d74d944b53d97a685c9c1e9f0b95fbb699..82b2195efaca782fc06b7018edda51311297ac87 100644 (file)
@@ -165,7 +165,7 @@ void dm_error(const char *message);
 
 struct dm_dev {
        struct block_device *bdev;
-       struct bdev_handle *bdev_handle;
+       struct file *bdev_file;
        struct dax_device *dax_dev;
        blk_mode_t mode;
        char name[16];
index 3df70d6131c8fee686ffbd8ecfc7e7c432370bac..752dbde4cec1f8073e225961a41bc91435583350 100644 (file)
@@ -953,7 +953,8 @@ static inline int dmaengine_slave_config(struct dma_chan *chan,
 
 static inline bool is_slave_direction(enum dma_transfer_direction direction)
 {
-       return (direction == DMA_MEM_TO_DEV) || (direction == DMA_DEV_TO_MEM);
+       return (direction == DMA_MEM_TO_DEV) || (direction == DMA_DEV_TO_MEM) ||
+              (direction == DMA_DEV_TO_DEV);
 }
 
 static inline struct dma_async_tx_descriptor *dmaengine_prep_slave_single(
index 9cf896ea1d4122f3bc7094e46a5af81b999937dc..e37344f6a231893fa829bf87c8a18e86bb8b8742 100644 (file)
@@ -10,6 +10,8 @@
 #include <uapi/linux/dpll.h>
 #include <linux/device.h>
 #include <linux/netlink.h>
+#include <linux/netdevice.h>
+#include <linux/rtnetlink.h>
 
 struct dpll_device;
 struct dpll_pin;
@@ -120,15 +122,24 @@ struct dpll_pin_properties {
 };
 
 #if IS_ENABLED(CONFIG_DPLL)
-size_t dpll_msg_pin_handle_size(struct dpll_pin *pin);
-int dpll_msg_add_pin_handle(struct sk_buff *msg, struct dpll_pin *pin);
+void dpll_netdev_pin_set(struct net_device *dev, struct dpll_pin *dpll_pin);
+void dpll_netdev_pin_clear(struct net_device *dev);
+
+size_t dpll_netdev_pin_handle_size(const struct net_device *dev);
+int dpll_netdev_add_pin_handle(struct sk_buff *msg,
+                              const struct net_device *dev);
 #else
-static inline size_t dpll_msg_pin_handle_size(struct dpll_pin *pin)
+static inline void
+dpll_netdev_pin_set(struct net_device *dev, struct dpll_pin *dpll_pin) { }
+static inline void dpll_netdev_pin_clear(struct net_device *dev) { }
+
+static inline size_t dpll_netdev_pin_handle_size(const struct net_device *dev)
 {
        return 0;
 }
 
-static inline int dpll_msg_add_pin_handle(struct sk_buff *msg, struct dpll_pin *pin)
+static inline int
+dpll_netdev_add_pin_handle(struct sk_buff *msg, const struct net_device *dev)
 {
        return 0;
 }
index 6834a29338c43c3370640a3432c98b606313cfb3..169692cb1906d8015cca7b854c1eaeedb3db09e2 100644 (file)
@@ -24,6 +24,8 @@ struct inode;
 struct path;
 extern struct file *alloc_file_pseudo(struct inode *, struct vfsmount *,
        const char *, int flags, const struct file_operations *);
+extern struct file *alloc_file_pseudo_noaccount(struct inode *, struct vfsmount *,
+       const char *, int flags, const struct file_operations *);
 extern struct file *alloc_file_clone(struct file *, int flags,
        const struct file_operations *);
 
index 95e868e09e298bb70ca23ec760d1c00fdb07201e..daee999d05f390a9fa28e83ff2643c2b1e5d1e96 100644 (file)
@@ -27,6 +27,7 @@
 #define FILE_LOCK_DEFERRED 1
 
 struct file_lock;
+struct file_lease;
 
 struct file_lock_operations {
        void (*fl_copy_lock)(struct file_lock *, struct file_lock *);
@@ -39,14 +40,17 @@ struct lock_manager_operations {
        void (*lm_put_owner)(fl_owner_t);
        void (*lm_notify)(struct file_lock *);  /* unblock callback */
        int (*lm_grant)(struct file_lock *, int);
-       bool (*lm_break)(struct file_lock *);
-       int (*lm_change)(struct file_lock *, int, struct list_head *);
-       void (*lm_setup)(struct file_lock *, void **);
-       bool (*lm_breaker_owns_lease)(struct file_lock *);
        bool (*lm_lock_expirable)(struct file_lock *cfl);
        void (*lm_expire_lock)(void);
 };
 
+struct lease_manager_operations {
+       bool (*lm_break)(struct file_lease *);
+       int (*lm_change)(struct file_lease *, int, struct list_head *);
+       void (*lm_setup)(struct file_lease *, void **);
+       bool (*lm_breaker_owns_lease)(struct file_lease *);
+};
+
 struct lock_manager {
        struct list_head list;
        /*
@@ -85,31 +89,31 @@ bool opens_in_grace(struct net *);
  *
  * Obviously, the last two criteria only matter for POSIX locks.
  */
-struct file_lock {
-       struct file_lock *fl_blocker;   /* The lock, that is blocking us */
-       struct list_head fl_list;       /* link into file_lock_context */
-       struct hlist_node fl_link;      /* node in global lists */
-       struct list_head fl_blocked_requests;   /* list of requests with
+
+struct file_lock_core {
+       struct file_lock_core *flc_blocker;     /* The lock that is blocking us */
+       struct list_head flc_list;      /* link into file_lock_context */
+       struct hlist_node flc_link;     /* node in global lists */
+       struct list_head flc_blocked_requests;  /* list of requests with
                                                 * ->fl_blocker pointing here
                                                 */
-       struct list_head fl_blocked_member;     /* node in
+       struct list_head flc_blocked_member;    /* node in
                                                 * ->fl_blocker->fl_blocked_requests
                                                 */
-       fl_owner_t fl_owner;
-       unsigned int fl_flags;
-       unsigned char fl_type;
-       unsigned int fl_pid;
-       int fl_link_cpu;                /* what cpu's list is this on? */
-       wait_queue_head_t fl_wait;
-       struct file *fl_file;
+       fl_owner_t flc_owner;
+       unsigned int flc_flags;
+       unsigned char flc_type;
+       pid_t flc_pid;
+       int flc_link_cpu;               /* what cpu's list is this on? */
+       wait_queue_head_t flc_wait;
+       struct file *flc_file;
+};
+
+struct file_lock {
+       struct file_lock_core c;
        loff_t fl_start;
        loff_t fl_end;
 
-       struct fasync_struct *  fl_fasync; /* for lease break notifications */
-       /* for lease breaks: */
-       unsigned long fl_break_time;
-       unsigned long fl_downgrade_time;
-
        const struct file_lock_operations *fl_ops;      /* Callbacks for filesystems */
        const struct lock_manager_operations *fl_lmops; /* Callbacks for lockmanagers */
        union {
@@ -126,6 +130,15 @@ struct file_lock {
        } fl_u;
 } __randomize_layout;
 
+struct file_lease {
+       struct file_lock_core c;
+       struct fasync_struct *  fl_fasync; /* for lease break notifications */
+       /* for lease breaks: */
+       unsigned long fl_break_time;
+       unsigned long fl_downgrade_time;
+       const struct lease_manager_operations *fl_lmops; /* Callbacks for lease managers */
+} __randomize_layout;
+
 struct file_lock_context {
        spinlock_t              flc_lock;
        struct list_head        flc_flock;
@@ -147,11 +160,31 @@ int fcntl_setlk64(unsigned int, struct file *, unsigned int,
 int fcntl_setlease(unsigned int fd, struct file *filp, int arg);
 int fcntl_getlease(struct file *filp);
 
+static inline bool lock_is_unlock(struct file_lock *fl)
+{
+       return fl->c.flc_type == F_UNLCK;
+}
+
+static inline bool lock_is_read(struct file_lock *fl)
+{
+       return fl->c.flc_type == F_RDLCK;
+}
+
+static inline bool lock_is_write(struct file_lock *fl)
+{
+       return fl->c.flc_type == F_WRLCK;
+}
+
+static inline void locks_wake_up(struct file_lock *fl)
+{
+       wake_up(&fl->c.flc_wait);
+}
+
 /* fs/locks.c */
 void locks_free_lock_context(struct inode *inode);
 void locks_free_lock(struct file_lock *fl);
 void locks_init_lock(struct file_lock *);
-struct file_lock * locks_alloc_lock(void);
+struct file_lock *locks_alloc_lock(void);
 void locks_copy_lock(struct file_lock *, struct file_lock *);
 void locks_copy_conflock(struct file_lock *, struct file_lock *);
 void locks_remove_posix(struct file *, fl_owner_t);
@@ -165,11 +198,16 @@ int vfs_lock_file(struct file *, unsigned int, struct file_lock *, struct file_l
 int vfs_cancel_lock(struct file *filp, struct file_lock *fl);
 bool vfs_inode_has_locks(struct inode *inode);
 int locks_lock_inode_wait(struct inode *inode, struct file_lock *fl);
+
+void locks_init_lease(struct file_lease *);
+void locks_free_lease(struct file_lease *fl);
+struct file_lease *locks_alloc_lease(void);
 int __break_lease(struct inode *inode, unsigned int flags, unsigned int type);
 void lease_get_mtime(struct inode *, struct timespec64 *time);
-int generic_setlease(struct file *, int, struct file_lock **, void **priv);
-int vfs_setlease(struct file *, int, struct file_lock **, void **);
-int lease_modify(struct file_lock *, int, struct list_head *);
+int generic_setlease(struct file *, int, struct file_lease **, void **priv);
+int kernel_setlease(struct file *, int, struct file_lease **, void **);
+int vfs_setlease(struct file *, int, struct file_lease **, void **);
+int lease_modify(struct file_lease *, int, struct list_head *);
 
 struct notifier_block;
 int lease_register_notifier(struct notifier_block *);
@@ -223,6 +261,25 @@ static inline int fcntl_getlease(struct file *filp)
        return F_UNLCK;
 }
 
+static inline bool lock_is_unlock(struct file_lock *fl)
+{
+       return false;
+}
+
+static inline bool lock_is_read(struct file_lock *fl)
+{
+       return false;
+}
+
+static inline bool lock_is_write(struct file_lock *fl)
+{
+       return false;
+}
+
+static inline void locks_wake_up(struct file_lock *fl)
+{
+}
+
 static inline void
 locks_free_lock_context(struct inode *inode)
 {
@@ -233,6 +290,11 @@ static inline void locks_init_lock(struct file_lock *fl)
        return;
 }
 
+static inline void locks_init_lease(struct file_lease *fl)
+{
+       return;
+}
+
 static inline void locks_copy_conflock(struct file_lock *new, struct file_lock *fl)
 {
        return;
@@ -307,18 +369,24 @@ static inline void lease_get_mtime(struct inode *inode,
 }
 
 static inline int generic_setlease(struct file *filp, int arg,
-                                   struct file_lock **flp, void **priv)
+                                   struct file_lease **flp, void **priv)
+{
+       return -EINVAL;
+}
+
+static inline int kernel_setlease(struct file *filp, int arg,
+                              struct file_lease **lease, void **priv)
 {
        return -EINVAL;
 }
 
 static inline int vfs_setlease(struct file *filp, int arg,
-                              struct file_lock **lease, void **priv)
+                              struct file_lease **lease, void **priv)
 {
        return -EINVAL;
 }
 
-static inline int lease_modify(struct file_lock *fl, int arg,
+static inline int lease_modify(struct file_lease *fl, int arg,
                               struct list_head *dispose)
 {
        return -EINVAL;
@@ -341,6 +409,9 @@ locks_inode_context(const struct inode *inode)
 
 #endif /* !CONFIG_FILE_LOCKING */
 
+/* for walking lists of file_locks linked by fl_list */
+#define for_each_file_lock(_fl, _head) list_for_each_entry(_fl, _head, c.flc_list)
+
 static inline int locks_lock_file_wait(struct file *filp, struct file_lock *fl)
 {
        return locks_lock_inode_wait(file_inode(filp), fl);
 }
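With the common fields split out, callers now reach type/owner/pid through the embedded file_lock_core and the new helpers above. A minimal sketch of the migrated access pattern, assuming CONFIG_FILE_LOCKING (the walker function itself is invented for illustration):

/*
 * Invented example: walk an inode's flock-family locks using the new
 * c.flc_* fields, the lock_is_*() helpers and for_each_file_lock().
 */
#include <linux/filelock.h>
#include <linux/fs.h>
#include <linux/printk.h>

static void demo_dump_flock_locks(struct inode *inode)
{
	struct file_lock_context *ctx = locks_inode_context(inode);
	struct file_lock *fl;

	if (!ctx)
		return;

	spin_lock(&ctx->flc_lock);
	for_each_file_lock(fl, &ctx->flc_flock) {
		/* type, pid and owner now live in the embedded core */
		pr_info("flock: pid=%d type=%s\n", fl->c.flc_pid,
			lock_is_write(fl) ? "write" : "read");
	}
	spin_unlock(&ctx->flc_lock);
}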
index ed5966a70495129be1d6729eed2918240db62df1..0a22b7245982a5389b983a697d8fb3c87a6273da 100644 (file)
@@ -43,6 +43,8 @@
 #include <linux/cred.h>
 #include <linux/mnt_idmapping.h>
 #include <linux/slab.h>
+#include <linux/maple_tree.h>
+#include <linux/rw_hint.h>
 
 #include <asm/byteorder.h>
 #include <uapi/linux/fs.h>
@@ -309,19 +311,6 @@ struct address_space;
 struct writeback_control;
 struct readahead_control;
 
-/*
- * Write life time hint values.
- * Stored in struct inode as u8.
- */
-enum rw_hint {
-       WRITE_LIFE_NOT_SET      = 0,
-       WRITE_LIFE_NONE         = RWH_WRITE_LIFE_NONE,
-       WRITE_LIFE_SHORT        = RWH_WRITE_LIFE_SHORT,
-       WRITE_LIFE_MEDIUM       = RWH_WRITE_LIFE_MEDIUM,
-       WRITE_LIFE_LONG         = RWH_WRITE_LIFE_LONG,
-       WRITE_LIFE_EXTREME      = RWH_WRITE_LIFE_EXTREME,
-};
-
 /* Match RWF_* bits to IOCB bits */
 #define IOCB_HIPRI             (__force int) RWF_HIPRI
 #define IOCB_DSYNC             (__force int) RWF_DSYNC
@@ -352,6 +341,8 @@ enum rw_hint {
  * unrelated IO (like cache flushing, new IO generation, etc).
  */
 #define IOCB_DIO_CALLER_COMP   (1 << 22)
+/* kiocb is a read or write operation submitted by fs/aio.c. */
+#define IOCB_AIO_RW            (1 << 23)
 
 /* for use in trace events */
 #define TRACE_IOCB_STRINGS \
@@ -482,10 +473,10 @@ struct address_space {
        pgoff_t                 writeback_index;
        const struct address_space_operations *a_ops;
        unsigned long           flags;
-       struct rw_semaphore     i_mmap_rwsem;
        errseq_t                wb_err;
        spinlock_t              i_private_lock;
        struct list_head        i_private_list;
+       struct rw_semaphore     i_mmap_rwsem;
        void *                  i_private_data;
 } __attribute__((aligned(sizeof(long)))) __randomize_layout;
        /*
@@ -677,7 +668,7 @@ struct inode {
        spinlock_t              i_lock; /* i_blocks, i_bytes, maybe i_size */
        unsigned short          i_bytes;
        u8                      i_blkbits;
-       u8                      i_write_hint;
+       enum rw_hint            i_write_hint;
        blkcnt_t                i_blocks;
 
 #ifdef __NEED_I_SIZE_ORDERED
@@ -907,7 +898,8 @@ static inline loff_t i_size_read(const struct inode *inode)
        preempt_enable();
        return i_size;
 #else
-       return inode->i_size;
+       /* Pairs with smp_store_release() in i_size_write() */
+       return smp_load_acquire(&inode->i_size);
 #endif
 }
 
@@ -929,7 +921,12 @@ static inline void i_size_write(struct inode *inode, loff_t i_size)
        inode->i_size = i_size;
        preempt_enable();
 #else
-       inode->i_size = i_size;
+       /*
+        * Pairs with smp_load_acquire() in i_size_read() to ensure
+        * changes related to inode size (such as page contents) are
+        * visible before we see the changed inode size.
+        */
+       smp_store_release(&inode->i_size, i_size);
 #endif
 }
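The store-release/load-acquire pair above is the classic publish pattern. A stand-in illustration of the contract (demo_inode is not a kernel type, just a sketch):

/*
 * Illustration only: the release on 'size' publishes the data written
 * before it; a reader that observes the new size via the acquire load
 * is guaranteed to also observe that data.
 */
#include <linux/types.h>
#include <linux/string.h>
#include <asm/barrier.h>

struct demo_inode {
	loff_t size;
	char   data[4096];
};

static void demo_extend(struct demo_inode *di, const char *buf, size_t n)
{
	memcpy(di->data, buf, n);		/* contents first... */
	smp_store_release(&di->size, n);	/* ...then publish the size */
}

static loff_t demo_size(struct demo_inode *di)
{
	return smp_load_acquire(&di->size);	/* pairs with the release */
}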
 
@@ -1064,6 +1061,7 @@ struct file *get_file_active(struct file **f);
 typedef void *fl_owner_t;
 
 struct file_lock;
+struct file_lease;
 
 /* The following constant reflects the upper bound of the file/locking space */
 #ifndef OFFSET_MAX
@@ -1078,9 +1076,20 @@ static inline struct inode *file_inode(const struct file *f)
        return f->f_inode;
 }
 
+/*
+ * file_dentry() is a relic from the days that overlayfs was using files with a
+ * "fake" path, meaning, f_path on overlayfs and f_inode on underlying fs.
+ * In those days, file_dentry() was needed to get the underlying fs dentry that
+ * matches f_inode.
+ * Files with "fake" path should not exist nowadays, so use an assertion to make
+ * sure that file_dentry() was not papering over filesystem bugs.
+ */
 static inline struct dentry *file_dentry(const struct file *file)
 {
-       return d_real(file->f_path.dentry, file_inode(file));
+       struct dentry *dentry = file->f_path.dentry;
+
+       WARN_ON_ONCE(d_inode(dentry) != file_inode(file));
+       return dentry;
 }
 
 struct fasync_struct {
@@ -1228,8 +1237,8 @@ struct super_block {
 #endif
        struct hlist_bl_head    s_roots;        /* alternate root dentries for NFS */
        struct list_head        s_mounts;       /* list of mounts; _not_ for fs use */
-       struct block_device     *s_bdev;
-       struct bdev_handle      *s_bdev_handle;
+       struct block_device     *s_bdev;        /* can go away once we use an accessor for @s_bdev_file */
+       struct file             *s_bdev_file;
        struct backing_dev_info *s_bdi;
        struct mtd_info         *s_mtd;
        struct hlist_node       s_instances;
@@ -1255,8 +1264,22 @@ struct super_block {
        struct fsnotify_mark_connector __rcu    *s_fsnotify_marks;
 #endif
 
+       /*
+        * q: why are s_id and s_sysfs_name not the same? both are human
+        * readable strings that identify the filesystem
+        * a: s_id is allowed to change at runtime; it's used in log
+        * messages, and we want to handle the case where a device starts
+        * out as a single device (s_id is the dev name) but another
+        * device is then hot added and we have to switch to identifying
+        * the filesystem by UUID
+        * but s_sysfs_name is a handle for programmatic access, and so
+        * can't change at runtime
+        */
        char                    s_id[32];       /* Informational name */
        uuid_t                  s_uuid;         /* UUID */
+       u8                      s_uuid_len;     /* Default 16, possibly smaller for weird filesystems */
+
+       /* if set, fs shows up under sysfs at /sys/fs/$FSTYP/s_sysfs_name */
+       char                    s_sysfs_name[UUID_STRING_LEN + 1];
 
        unsigned int            s_max_links;
 
@@ -2005,7 +2028,7 @@ struct file_operations {
        ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int);
        ssize_t (*splice_read)(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int);
        void (*splice_eof)(struct file *file);
-       int (*setlease)(struct file *, int, struct file_lock **, void **);
+       int (*setlease)(struct file *, int, struct file_lease **, void **);
        long (*fallocate)(struct file *file, int mode, loff_t offset,
                          loff_t len);
        void (*show_fdinfo)(struct seq_file *m, struct file *f);
@@ -2101,9 +2124,6 @@ int __generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
 int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
                                  struct file *file_out, loff_t pos_out,
                                  loff_t *count, unsigned int remap_flags);
-extern loff_t do_clone_file_range(struct file *file_in, loff_t pos_in,
-                                 struct file *file_out, loff_t pos_out,
-                                 loff_t len, unsigned int remap_flags);
 extern loff_t vfs_clone_file_range(struct file *file_in, loff_t pos_in,
                                   struct file *file_out, loff_t pos_out,
                                   loff_t len, unsigned int remap_flags);
@@ -2532,6 +2552,44 @@ extern __printf(2, 3)
 int super_setup_bdi_name(struct super_block *sb, char *fmt, ...);
 extern int super_setup_bdi(struct super_block *sb);
 
+static inline void super_set_uuid(struct super_block *sb, const u8 *uuid, unsigned len)
+{
+       if (WARN_ON(len > sizeof(sb->s_uuid)))
+               len = sizeof(sb->s_uuid);
+       sb->s_uuid_len = len;
+       memcpy(&sb->s_uuid, uuid, len);
+}
+
+/* set sb sysfs name based on sb->s_bdev */
+static inline void super_set_sysfs_name_bdev(struct super_block *sb)
+{
+       snprintf(sb->s_sysfs_name, sizeof(sb->s_sysfs_name), "%pg", sb->s_bdev);
+}
+
+/* set sb sysfs name based on sb->s_uuid */
+static inline void super_set_sysfs_name_uuid(struct super_block *sb)
+{
+       WARN_ON(sb->s_uuid_len != sizeof(sb->s_uuid));
+       snprintf(sb->s_sysfs_name, sizeof(sb->s_sysfs_name), "%pU", sb->s_uuid.b);
+}
+
+/* set sb sysfs name based on sb->s_id */
+static inline void super_set_sysfs_name_id(struct super_block *sb)
+{
+       strscpy(sb->s_sysfs_name, sb->s_id, sizeof(sb->s_sysfs_name));
+}
+
+/* try to use something standard before you use this */
+__printf(2, 3)
+static inline void super_set_sysfs_name_generic(struct super_block *sb, const char *fmt, ...)
+{
+       va_list args;
+
+       va_start(args, fmt);
+       vsnprintf(sb->s_sysfs_name, sizeof(sb->s_sysfs_name), fmt, args);
+       va_end(args);
+}
+
 extern int current_umask(void);
 
 extern void ihold(struct inode * inode);
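A hypothetical fill_super fragment showing how the new helpers are meant to be combined (demo_sb_info and its uuid field are invented for the sketch):

/* Invented filesystem-private info, purely for illustration. */
struct demo_sb_info {
	u8 uuid[16];
};

static void demo_setup_sb_names(struct super_block *sb,
				struct demo_sb_info *sbi)
{
	/* record the full 16-byte on-disk UUID */
	super_set_uuid(sb, sbi->uuid, sizeof(sbi->uuid));

	/* block-device filesystems usually name themselves after the bdev */
	if (sb->s_bdev)
		super_set_sysfs_name_bdev(sb);
	else
		super_set_sysfs_name_uuid(sb);
}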
@@ -2928,6 +2986,17 @@ extern bool path_is_under(const struct path *, const struct path *);
 
 extern char *file_path(struct file *, char *, int);
 
+/**
+ * is_dot_dotdot - returns true only if @name is "." or ".."
+ * @name: file name to check
+ * @len: length of file name, in bytes
+ */
+static inline bool is_dot_dotdot(const char *name, size_t len)
+{
+       return len && unlikely(name[0] == '.') &&
+               (len == 1 || (len == 2 && name[1] == '.'));
+}
+
 #include <linux/err.h>
 
 /* needed for stackable file system support */
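For reference, how the short-circuit in is_dot_dotdot() evaluates on a few inputs (illustrative):

/*
 *   is_dot_dotdot(".",   1)  -> true   (len == 1, name[0] == '.')
 *   is_dot_dotdot("..",  2)  -> true   (len == 2, both bytes are '.')
 *   is_dot_dotdot(".x",  2)  -> false  (second byte is not '.')
 *   is_dot_dotdot("...", 3)  -> false  (len is neither 1 nor 2)
 *   is_dot_dotdot("",    0)  -> false  (len == 0 short-circuits)
 */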
@@ -3238,7 +3307,7 @@ extern int simple_write_begin(struct file *file, struct address_space *mapping,
 extern const struct address_space_operations ram_aops;
 extern int always_delete_dentry(const struct dentry *);
 extern struct inode *alloc_anon_inode(struct super_block *);
-extern int simple_nosetlease(struct file *, int, struct file_lock **, void **);
+extern int simple_nosetlease(struct file *, int, struct file_lease **, void **);
 extern const struct dentry_operations simple_dentry_operations;
 
 extern struct dentry *simple_lookup(struct inode *, struct dentry *, unsigned int flags);
@@ -3260,13 +3329,14 @@ extern ssize_t simple_write_to_buffer(void *to, size_t available, loff_t *ppos,
                const void __user *from, size_t count);
 
 struct offset_ctx {
-       struct xarray           xa;
-       u32                     next_offset;
+       struct maple_tree       mt;
+       unsigned long           next_offset;
 };
 
 void simple_offset_init(struct offset_ctx *octx);
 int simple_offset_add(struct offset_ctx *octx, struct dentry *dentry);
 void simple_offset_remove(struct offset_ctx *octx, struct dentry *dentry);
+int simple_offset_empty(struct dentry *dentry);
 int simple_offset_rename_exchange(struct inode *old_dir,
                                  struct dentry *old_dentry,
                                  struct inode *new_dir,
@@ -3280,7 +3350,16 @@ extern int generic_file_fsync(struct file *, loff_t, loff_t, int);
 
 extern int generic_check_addressable(unsigned, u64);
 
-extern void generic_set_encrypted_ci_d_ops(struct dentry *dentry);
+extern void generic_set_sb_d_ops(struct super_block *sb);
+
+static inline bool sb_has_encoding(const struct super_block *sb)
+{
+#if IS_ENABLED(CONFIG_UNICODE)
+       return !!sb->s_encoding;
+#else
+       return false;
+#endif
+}
 
 int may_setattr(struct mnt_idmap *idmap, struct inode *inode,
                unsigned int ia_valid);
@@ -3335,6 +3414,8 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags)
                return 0;
        if (unlikely(flags & ~RWF_SUPPORTED))
                return -EOPNOTSUPP;
+       if (unlikely((flags & RWF_APPEND) && (flags & RWF_NOAPPEND)))
+               return -EINVAL;
 
        if (flags & RWF_NOWAIT) {
                if (!(ki->ki_filp->f_mode & FMODE_NOWAIT))
@@ -3345,6 +3426,12 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags)
        if (flags & RWF_SYNC)
                kiocb_flags |= IOCB_DSYNC;
 
+       if ((flags & RWF_NOAPPEND) && (ki->ki_flags & IOCB_APPEND)) {
+               if (IS_APPEND(file_inode(ki->ki_filp)))
+                       return -EPERM;
+               ki->ki_flags &= ~IOCB_APPEND;
+       }
+
        ki->ki_flags |= kiocb_flags;
        return 0;
 }
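From userspace, the new RWF_NOAPPEND flag is reached through preadv2()/pwritev2(). A hedged sketch; the flag is new in this cycle, so it must come from up-to-date uapi headers:

/*
 * Userspace sketch: write at a fixed offset through an O_APPEND fd.
 * RWF_NOAPPEND overrides the fd's O_APPEND for this call only, and the
 * kernel fails it with EPERM if the inode itself is append-only.
 */
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/uio.h>

static ssize_t write_at(int fd, const void *buf, size_t len, off_t off)
{
	struct iovec iov = {
		.iov_base = (void *)buf,
		.iov_len  = len,
	};

	return pwritev2(fd, &iov, 1, off, RWF_NOAPPEND);
}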
index 12f9e455d569f0a43aade8a9c6bb47402d6139b6..772f822dc6b82e92903e030b0d5c35512ffc9f55 100644 (file)
@@ -192,6 +192,8 @@ struct fscrypt_operations {
                                             unsigned int *num_devs);
 };
 
+int fscrypt_d_revalidate(struct dentry *dentry, unsigned int flags);
+
 static inline struct fscrypt_inode_info *
 fscrypt_get_inode_info(const struct inode *inode)
 {
@@ -221,15 +223,29 @@ static inline bool fscrypt_needs_contents_encryption(const struct inode *inode)
 }
 
 /*
- * When d_splice_alias() moves a directory's no-key alias to its plaintext alias
- * as a result of the encryption key being added, DCACHE_NOKEY_NAME must be
- * cleared.  Note that we don't have to support arbitrary moves of this flag
- * because fscrypt doesn't allow no-key names to be the source or target of a
- * rename().
+ * When d_splice_alias() moves a directory's no-key alias to its
+ * plaintext alias as a result of the encryption key being added,
+ * DCACHE_NOKEY_NAME must be cleared and there might be an opportunity
+ * to disable d_revalidate.  Note that we don't have to support the
+ * inverse operation because fscrypt doesn't allow no-key names to be
+ * the source or target of a rename().
  */
 static inline void fscrypt_handle_d_move(struct dentry *dentry)
 {
-       dentry->d_flags &= ~DCACHE_NOKEY_NAME;
+       /*
+        * VFS calls fscrypt_handle_d_move even for non-fscrypt
+        * filesystems.
+        */
+       if (dentry->d_flags & DCACHE_NOKEY_NAME) {
+               dentry->d_flags &= ~DCACHE_NOKEY_NAME;
+
+               /*
+                * Other filesystem features might be handling dentry
+                * revalidation, in which case it cannot be disabled.
+                */
+               if (dentry->d_op->d_revalidate == fscrypt_d_revalidate)
+                       dentry->d_flags &= ~DCACHE_OP_REVALIDATE;
+       }
 }
 
 /**
@@ -261,6 +277,35 @@ static inline bool fscrypt_is_nokey_name(const struct dentry *dentry)
        return dentry->d_flags & DCACHE_NOKEY_NAME;
 }
 
+static inline void fscrypt_prepare_dentry(struct dentry *dentry,
+                                         bool is_nokey_name)
+{
+       /*
+        * This code tries to take ->d_lock only when necessary to write
+        * to ->d_flags.  We shouldn't be peeking at ->d_flags for
+        * DCACHE_OP_REVALIDATE unlocked, but in the unlikely case
+        * there is a race, the worst that can happen is that we fail to
+        * unset DCACHE_OP_REVALIDATE and pay the cost of an extra
+        * d_revalidate.
+        */
+       if (is_nokey_name) {
+               spin_lock(&dentry->d_lock);
+               dentry->d_flags |= DCACHE_NOKEY_NAME;
+               spin_unlock(&dentry->d_lock);
+       } else if (dentry->d_flags & DCACHE_OP_REVALIDATE &&
+                  dentry->d_op->d_revalidate == fscrypt_d_revalidate) {
+               /*
+                * Unencrypted dentries and encrypted dentries where the
+                * key is available are always valid from the fscrypt
+                * perspective. Avoid the cost of calling
+                * fscrypt_d_revalidate unnecessarily.
+                */
+               spin_lock(&dentry->d_lock);
+               dentry->d_flags &= ~DCACHE_OP_REVALIDATE;
+               spin_unlock(&dentry->d_lock);
+       }
+}
+
 /* crypto.c */
 void fscrypt_enqueue_decrypt_work(struct work_struct *);
 
@@ -368,7 +413,6 @@ int fscrypt_fname_disk_to_usr(const struct inode *inode,
 bool fscrypt_match_name(const struct fscrypt_name *fname,
                        const u8 *de_name, u32 de_name_len);
 u64 fscrypt_fname_siphash(const struct inode *dir, const struct qstr *name);
-int fscrypt_d_revalidate(struct dentry *dentry, unsigned int flags);
 
 /* bio.c */
 bool fscrypt_decrypt_bio(struct bio *bio);
@@ -425,6 +469,11 @@ static inline bool fscrypt_is_nokey_name(const struct dentry *dentry)
        return false;
 }
 
+static inline void fscrypt_prepare_dentry(struct dentry *dentry,
+                                         bool is_nokey_name)
+{
+}
+
 /* crypto.c */
 static inline void fscrypt_enqueue_decrypt_work(struct work_struct *work)
 {
@@ -982,6 +1031,9 @@ static inline int fscrypt_prepare_lookup(struct inode *dir,
        fname->usr_fname = &dentry->d_name;
        fname->disk_name.name = (unsigned char *)dentry->d_name.name;
        fname->disk_name.len = dentry->d_name.len;
+
+       fscrypt_prepare_dentry(dentry, false);
+
        return 0;
 }
 
index de292a0071389ed122a3540c4a98870fe30aa8d8..e2a916cf29c42ff6c9e298bf64b9e3b3f9208879 100644 (file)
@@ -353,6 +353,15 @@ static inline bool gfp_has_io_fs(gfp_t gfp)
        return (gfp & (__GFP_IO | __GFP_FS)) == (__GFP_IO | __GFP_FS);
 }
 
+/*
+ * Check if the gfp flags allow compaction - GFP_NOIO is a really
+ * tricky context because the migration might require IO.
+ */
+static inline bool gfp_compaction_allowed(gfp_t gfp_mask)
+{
+       return IS_ENABLED(CONFIG_COMPACTION) && (gfp_mask & __GFP_IO);
+}
+
 extern gfp_t vma_thp_gfp_mask(struct vm_area_struct *vma);
 
 #ifdef CONFIG_CONTIG_ALLOC
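A sketch of the intended call-site shape for the new predicate (the function below is invented):

/*
 * Invented caller: bail out of compaction when the allocation context
 * (e.g. GFP_NOIO) could not tolerate migration itself doing IO.
 */
static bool demo_may_compact(gfp_t gfp_mask, unsigned int order)
{
	return order > 0 && gfp_compaction_allowed(gfp_mask);
}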
index 9a5c6c76e6533385dbb32de98abfd330c8736585..7f75c9a5187417b3b52386573a47f7c1e95e9126 100644 (file)
@@ -819,6 +819,24 @@ static inline struct gpio_chip *gpiod_to_chip(const struct gpio_desc *desc)
        return ERR_PTR(-ENODEV);
 }
 
+static inline struct gpio_device *gpiod_to_gpio_device(struct gpio_desc *desc)
+{
+       WARN_ON(1);
+       return ERR_PTR(-ENODEV);
+}
+
+static inline int gpio_device_get_base(struct gpio_device *gdev)
+{
+       WARN_ON(1);
+       return -ENODEV;
+}
+
+static inline const char *gpio_device_get_label(struct gpio_device *gdev)
+{
+       WARN_ON(1);
+       return NULL;
+}
+
 static inline int gpiochip_lock_as_irq(struct gpio_chip *gc,
                                       unsigned int offset)
 {
index 840cd254172d061ec445bdc845a7f1e8cbf20463..7118ac28d46879b615de35a6e3702208de4001e9 100644 (file)
@@ -77,17 +77,6 @@ enum hid_bpf_attach_flags {
 int hid_bpf_device_event(struct hid_bpf_ctx *ctx);
 int hid_bpf_rdesc_fixup(struct hid_bpf_ctx *ctx);
 
-/* Following functions are kfunc that we export to BPF programs */
-/* available everywhere in HID-BPF */
-__u8 *hid_bpf_get_data(struct hid_bpf_ctx *ctx, unsigned int offset, const size_t __sz);
-
-/* only available in syscall */
-int hid_bpf_attach_prog(unsigned int hid_id, int prog_fd, __u32 flags);
-int hid_bpf_hw_request(struct hid_bpf_ctx *ctx, __u8 *buf, size_t buf__sz,
-                      enum hid_report_type rtype, enum hid_class_request reqtype);
-struct hid_bpf_ctx *hid_bpf_allocate_context(unsigned int hid_id);
-void hid_bpf_release_context(struct hid_bpf_ctx *ctx);
-
 /*
  * Below is HID internal
  */
index 87e3bedf8eb00323c102787243e7dbfd045ba4e9..641c4567cfa7aee830f8ad0b52abb24bcbe353a8 100644 (file)
@@ -157,6 +157,7 @@ enum  hrtimer_base_type {
  * @max_hang_time:     Maximum time spent in hrtimer_interrupt
  * @softirq_expiry_lock: Lock which is taken while softirq based hrtimer are
  *                      expired
+ * @online:            CPU is online from an hrtimers point of view
  * @timer_waiters:     A hrtimer_cancel() invocation waits for the timer
  *                     callback to finish.
  * @expires_next:      absolute time of the next event, is required for remote
@@ -179,7 +180,8 @@ struct hrtimer_cpu_base {
        unsigned int                    hres_active             : 1,
                                        in_hrtirq               : 1,
                                        hang_detected           : 1,
-                                       softirq_activated       : 1;
+                                       softirq_activated       : 1,
+                                       online                  : 1;
 #ifdef CONFIG_HIGH_RES_TIMERS
        unsigned int                    nr_events;
        unsigned short                  nr_retries;
index 2b00faf98017cc9c2a99da9f2cc29c45ecbaf2ec..6ef0557b4bff8ed5d14bc18391d356913136c23c 100644 (file)
@@ -164,8 +164,28 @@ struct hv_ring_buffer {
        u8 buffer[];
 } __packed;
 
+
+/*
+ * If the requested ring buffer size is at least 8 times the size of the
+ * header, steal space from the ring buffer for the header. Otherwise, add
+ * space for the header so that it doesn't take too much of the ring buffer
+ * space.
+ *
+ * The factor of 8 is somewhat arbitrary. The goal is to prevent adding a
+ * relatively small header (4 Kbytes on x86) to a large-ish power-of-2 ring
+ * buffer size (such as 128 Kbytes) and so ending up with an allocation
+ * nearly twice as large, almost half of it wasted. As a contrasting example,
+ * on ARM64 with 64 Kbyte page size, we don't want to take 64 Kbytes for the
+ * header from a 128 Kbyte allocation, leaving only 64 Kbytes for the ring.
+ * In this latter case, we must add 64 Kbytes for the header and not worry
+ * about what's wasted.
+ */
+#define VMBUS_HEADER_ADJ(payload_sz) \
+       ((payload_sz) >=  8 * sizeof(struct hv_ring_buffer) ? \
+       0 : sizeof(struct hv_ring_buffer))
+
 /* Calculate the proper size of a ringbuffer, it must be page-aligned */
-#define VMBUS_RING_SIZE(payload_sz) PAGE_ALIGN(sizeof(struct hv_ring_buffer) + \
+#define VMBUS_RING_SIZE(payload_sz) PAGE_ALIGN(VMBUS_HEADER_ADJ(payload_sz) + \
                                               (payload_sz))
 
 struct hv_ring_buffer_info {
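Worked examples of the adjustment, using the header sizes the comment above assumes (4 KiB on x86, 64 KiB on ARM64 with a 64 KiB page size):

/*
 *   x86, payload 128K:   128K >= 8 * 4K, so VMBUS_HEADER_ADJ() is 0 and
 *     VMBUS_RING_SIZE() == PAGE_ALIGN(128K) == 128K; the header is
 *     carved out of the ring itself.
 *   x86, payload 16K:    16K < 8 * 4K, so the header is added on top:
 *     VMBUS_RING_SIZE() == PAGE_ALIGN(4K + 16K) == 20K.
 *   arm64, payload 128K: 128K < 8 * 64K, so the header is added:
 *     VMBUS_RING_SIZE() == PAGE_ALIGN(64K + 128K) == 192K.
 */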
index 7852f6c9a714c6fb7dc84321c4b416717b6d4666..719cf9cc6e1ac4db6abbd1171b1590716dc50dc9 100644 (file)
@@ -8,6 +8,8 @@
 #ifndef __AD_SIGMA_DELTA_H__
 #define __AD_SIGMA_DELTA_H__
 
+#include <linux/iio/iio.h>
+
 enum ad_sigma_delta_mode {
        AD_SD_MODE_CONTINUOUS = 0,
        AD_SD_MODE_SINGLE = 1,
@@ -99,7 +101,7 @@ struct ad_sigma_delta {
         * 'rx_buf' is up to 32 bits per sample + 64 bit timestamp,
         * rounded to 16 bytes to take into account padding.
         */
-       uint8_t                         tx_buf[4] ____cacheline_aligned;
+       uint8_t                         tx_buf[4] __aligned(IIO_DMA_MINALIGN);
        uint8_t                         rx_buf[16] __aligned(8);
 };
 
index 607c3a89a6471df963e6fc4ed09df84b53c0db0f..f9ae5cdd884f5be041246b5aebbf6467980319dc 100644 (file)
@@ -258,9 +258,9 @@ struct st_sensor_data {
        bool hw_irq_trigger;
        s64 hw_timestamp;
 
-       char buffer_data[ST_SENSORS_MAX_BUFFER_SIZE] ____cacheline_aligned;
-
        struct mutex odr_lock;
+
+       char buffer_data[ST_SENSORS_MAX_BUFFER_SIZE] __aligned(IIO_DMA_MINALIGN);
 };
 
 #ifdef CONFIG_IIO_BUFFER
index dc9ea299e0885cd55018f21b80a35155d0ca1d63..8898966bc0f08c152a8156b85151b223f76e9ffe 100644 (file)
@@ -11,6 +11,7 @@
 
 #include <linux/spi/spi.h>
 #include <linux/interrupt.h>
+#include <linux/iio/iio.h>
 #include <linux/iio/types.h>
 
 #define ADIS_WRITE_REG(reg) ((0x80 | (reg)))
@@ -131,7 +132,7 @@ struct adis {
        unsigned long           irq_flag;
        void                    *buffer;
 
-       u8                      tx[10] ____cacheline_aligned;
+       u8                      tx[10] __aligned(IIO_DMA_MINALIGN);
        u8                      rx[4];
 };
 
index 854ad67a5f70e8a9f9055f5d76762717c3c33c69..e248936250852dd09c7e71b958a0072c1ac86193 100644 (file)
@@ -2,6 +2,7 @@
 #define IO_URING_TYPES_H
 
 #include <linux/blkdev.h>
+#include <linux/hashtable.h>
 #include <linux/task_work.h>
 #include <linux/bitmap.h>
 #include <linux/llist.h>
@@ -240,12 +241,14 @@ struct io_ring_ctx {
                unsigned int            poll_activated: 1;
                unsigned int            drain_disabled: 1;
                unsigned int            compat: 1;
+               unsigned int            iowq_limits_set : 1;
 
                struct task_struct      *submitter_task;
                struct io_rings         *rings;
                struct percpu_ref       refs;
 
                enum task_work_notify_mode      notify_method;
+               unsigned                        sq_thread_idle;
        } ____cacheline_aligned_in_smp;
 
        /* submission data */
@@ -274,10 +277,20 @@ struct io_ring_ctx {
                 */
                struct io_rsrc_node     *rsrc_node;
                atomic_t                cancel_seq;
+
+               /*
+                * ->iopoll_list is protected by the ctx->uring_lock for
+                * io_uring instances that don't use IORING_SETUP_SQPOLL.
+                * For SQPOLL, only the single threaded io_sq_thread() will
+                * manipulate the list, hence no extra locking is needed there.
+                */
+               bool                    poll_multi_queue;
+               struct io_wq_work_list  iopoll_list;
+
                struct io_file_table    file_table;
+               struct io_mapped_ubuf   **user_bufs;
                unsigned                nr_user_files;
                unsigned                nr_user_bufs;
-               struct io_mapped_ubuf   **user_bufs;
 
                struct io_submit_state  submit_state;
 
@@ -288,15 +301,6 @@ struct io_ring_ctx {
                struct io_alloc_cache   apoll_cache;
                struct io_alloc_cache   netmsg_cache;
 
-               /*
-                * ->iopoll_list is protected by the ctx->uring_lock for
-                * io_uring instances that don't use IORING_SETUP_SQPOLL.
-                * For SQPOLL, only the single threaded io_sq_thread() will
-                * manipulate the list, hence no extra locking is needed there.
-                */
-               struct io_wq_work_list  iopoll_list;
-               bool                    poll_multi_queue;
-
                /*
                 * Any cancelable uring_cmd is added to this list in
                 * ->uring_cmd() by io_uring_cmd_insert_cancelable()
@@ -343,8 +347,8 @@ struct io_ring_ctx {
        spinlock_t              completion_lock;
 
        /* IRQ completion list, under ->completion_lock */
-       struct io_wq_work_list  locked_free_list;
        unsigned int            locked_free_nr;
+       struct io_wq_work_list  locked_free_list;
 
        struct list_head        io_buffers_comp;
        struct list_head        cq_overflow_list;
@@ -366,9 +370,6 @@ struct io_ring_ctx {
        unsigned int            file_alloc_start;
        unsigned int            file_alloc_end;
 
-       struct xarray           personalities;
-       u32                     pers_next;
-
        struct list_head        io_buffers_cache;
 
        /* deferred free list, protected by ->uring_lock */
@@ -389,6 +390,9 @@ struct io_ring_ctx {
        struct wait_queue_head          rsrc_quiesce_wq;
        unsigned                        rsrc_quiesce;
 
+       u32                     pers_next;
+       struct xarray           personalities;
+
        /* hashed buffered write serialization */
        struct io_wq_hash               *hash_map;
 
@@ -405,11 +409,22 @@ struct io_ring_ctx {
 
        /* io-wq management, e.g. thread count */
        u32                             iowq_limits[2];
-       bool                            iowq_limits_set;
 
        struct callback_head            poll_wq_task_work;
        struct list_head                defer_list;
-       unsigned                        sq_thread_idle;
+
+#ifdef CONFIG_NET_RX_BUSY_POLL
+       struct list_head        napi_list;      /* track busy poll napi_id */
+       spinlock_t              napi_lock;      /* napi_list lock */
+
+       /* napi busy poll default timeout */
+       unsigned int            napi_busy_poll_to;
+       bool                    napi_prefer_busy_poll;
+       bool                    napi_enabled;
+
+       DECLARE_HASHTABLE(napi_ht, 4);
+#endif
+
        /* protected by ->completion_lock */
        unsigned                        evfd_last_cq_tail;
 
@@ -455,7 +470,6 @@ enum {
        REQ_F_SKIP_LINK_CQES_BIT,
        REQ_F_SINGLE_POLL_BIT,
        REQ_F_DOUBLE_POLL_BIT,
-       REQ_F_PARTIAL_IO_BIT,
        REQ_F_APOLL_MULTISHOT_BIT,
        REQ_F_CLEAR_POLLIN_BIT,
        REQ_F_HASH_LOCKED_BIT,
@@ -463,75 +477,88 @@ enum {
        REQ_F_SUPPORT_NOWAIT_BIT,
        REQ_F_ISREG_BIT,
        REQ_F_POLL_NO_LAZY_BIT,
+       REQ_F_CANCEL_SEQ_BIT,
+       REQ_F_CAN_POLL_BIT,
+       REQ_F_BL_EMPTY_BIT,
+       REQ_F_BL_NO_RECYCLE_BIT,
 
        /* not a real bit, just to check we're not overflowing the space */
        __REQ_F_LAST_BIT,
 };
 
+typedef u64 __bitwise io_req_flags_t;
+#define IO_REQ_FLAG(bitno)     ((__force io_req_flags_t) BIT_ULL((bitno)))
+
 enum {
        /* ctx owns file */
-       REQ_F_FIXED_FILE        = BIT(REQ_F_FIXED_FILE_BIT),
+       REQ_F_FIXED_FILE        = IO_REQ_FLAG(REQ_F_FIXED_FILE_BIT),
        /* drain existing IO first */
-       REQ_F_IO_DRAIN          = BIT(REQ_F_IO_DRAIN_BIT),
+       REQ_F_IO_DRAIN          = IO_REQ_FLAG(REQ_F_IO_DRAIN_BIT),
        /* linked sqes */
-       REQ_F_LINK              = BIT(REQ_F_LINK_BIT),
+       REQ_F_LINK              = IO_REQ_FLAG(REQ_F_LINK_BIT),
        /* doesn't sever on completion < 0 */
-       REQ_F_HARDLINK          = BIT(REQ_F_HARDLINK_BIT),
+       REQ_F_HARDLINK          = IO_REQ_FLAG(REQ_F_HARDLINK_BIT),
        /* IOSQE_ASYNC */
-       REQ_F_FORCE_ASYNC       = BIT(REQ_F_FORCE_ASYNC_BIT),
+       REQ_F_FORCE_ASYNC       = IO_REQ_FLAG(REQ_F_FORCE_ASYNC_BIT),
        /* IOSQE_BUFFER_SELECT */
-       REQ_F_BUFFER_SELECT     = BIT(REQ_F_BUFFER_SELECT_BIT),
+       REQ_F_BUFFER_SELECT     = IO_REQ_FLAG(REQ_F_BUFFER_SELECT_BIT),
        /* IOSQE_CQE_SKIP_SUCCESS */
-       REQ_F_CQE_SKIP          = BIT(REQ_F_CQE_SKIP_BIT),
+       REQ_F_CQE_SKIP          = IO_REQ_FLAG(REQ_F_CQE_SKIP_BIT),
 
        /* fail rest of links */
-       REQ_F_FAIL              = BIT(REQ_F_FAIL_BIT),
+       REQ_F_FAIL              = IO_REQ_FLAG(REQ_F_FAIL_BIT),
        /* on inflight list, should be cancelled and waited on exit reliably */
-       REQ_F_INFLIGHT          = BIT(REQ_F_INFLIGHT_BIT),
+       REQ_F_INFLIGHT          = IO_REQ_FLAG(REQ_F_INFLIGHT_BIT),
        /* read/write uses file position */
-       REQ_F_CUR_POS           = BIT(REQ_F_CUR_POS_BIT),
+       REQ_F_CUR_POS           = IO_REQ_FLAG(REQ_F_CUR_POS_BIT),
        /* must not punt to workers */
-       REQ_F_NOWAIT            = BIT(REQ_F_NOWAIT_BIT),
+       REQ_F_NOWAIT            = IO_REQ_FLAG(REQ_F_NOWAIT_BIT),
        /* has or had linked timeout */
-       REQ_F_LINK_TIMEOUT      = BIT(REQ_F_LINK_TIMEOUT_BIT),
+       REQ_F_LINK_TIMEOUT      = IO_REQ_FLAG(REQ_F_LINK_TIMEOUT_BIT),
        /* needs cleanup */
-       REQ_F_NEED_CLEANUP      = BIT(REQ_F_NEED_CLEANUP_BIT),
+       REQ_F_NEED_CLEANUP      = IO_REQ_FLAG(REQ_F_NEED_CLEANUP_BIT),
        /* already went through poll handler */
-       REQ_F_POLLED            = BIT(REQ_F_POLLED_BIT),
+       REQ_F_POLLED            = IO_REQ_FLAG(REQ_F_POLLED_BIT),
        /* buffer already selected */
-       REQ_F_BUFFER_SELECTED   = BIT(REQ_F_BUFFER_SELECTED_BIT),
+       REQ_F_BUFFER_SELECTED   = IO_REQ_FLAG(REQ_F_BUFFER_SELECTED_BIT),
        /* buffer selected from ring, needs commit */
-       REQ_F_BUFFER_RING       = BIT(REQ_F_BUFFER_RING_BIT),
+       REQ_F_BUFFER_RING       = IO_REQ_FLAG(REQ_F_BUFFER_RING_BIT),
        /* caller should reissue async */
-       REQ_F_REISSUE           = BIT(REQ_F_REISSUE_BIT),
+       REQ_F_REISSUE           = IO_REQ_FLAG(REQ_F_REISSUE_BIT),
        /* supports async reads/writes */
-       REQ_F_SUPPORT_NOWAIT    = BIT(REQ_F_SUPPORT_NOWAIT_BIT),
+       REQ_F_SUPPORT_NOWAIT    = IO_REQ_FLAG(REQ_F_SUPPORT_NOWAIT_BIT),
        /* regular file */
-       REQ_F_ISREG             = BIT(REQ_F_ISREG_BIT),
+       REQ_F_ISREG             = IO_REQ_FLAG(REQ_F_ISREG_BIT),
        /* has creds assigned */
-       REQ_F_CREDS             = BIT(REQ_F_CREDS_BIT),
+       REQ_F_CREDS             = IO_REQ_FLAG(REQ_F_CREDS_BIT),
        /* skip refcounting if not set */
-       REQ_F_REFCOUNT          = BIT(REQ_F_REFCOUNT_BIT),
+       REQ_F_REFCOUNT          = IO_REQ_FLAG(REQ_F_REFCOUNT_BIT),
        /* there is a linked timeout that has to be armed */
-       REQ_F_ARM_LTIMEOUT      = BIT(REQ_F_ARM_LTIMEOUT_BIT),
+       REQ_F_ARM_LTIMEOUT      = IO_REQ_FLAG(REQ_F_ARM_LTIMEOUT_BIT),
        /* ->async_data allocated */
-       REQ_F_ASYNC_DATA        = BIT(REQ_F_ASYNC_DATA_BIT),
+       REQ_F_ASYNC_DATA        = IO_REQ_FLAG(REQ_F_ASYNC_DATA_BIT),
        /* don't post CQEs while failing linked requests */
-       REQ_F_SKIP_LINK_CQES    = BIT(REQ_F_SKIP_LINK_CQES_BIT),
+       REQ_F_SKIP_LINK_CQES    = IO_REQ_FLAG(REQ_F_SKIP_LINK_CQES_BIT),
        /* single poll may be active */
-       REQ_F_SINGLE_POLL       = BIT(REQ_F_SINGLE_POLL_BIT),
+       REQ_F_SINGLE_POLL       = IO_REQ_FLAG(REQ_F_SINGLE_POLL_BIT),
        /* double poll may be active */
-       REQ_F_DOUBLE_POLL       = BIT(REQ_F_DOUBLE_POLL_BIT),
-       /* request has already done partial IO */
-       REQ_F_PARTIAL_IO        = BIT(REQ_F_PARTIAL_IO_BIT),
+       REQ_F_DOUBLE_POLL       = IO_REQ_FLAG(REQ_F_DOUBLE_POLL_BIT),
        /* fast poll multishot mode */
-       REQ_F_APOLL_MULTISHOT   = BIT(REQ_F_APOLL_MULTISHOT_BIT),
+       REQ_F_APOLL_MULTISHOT   = IO_REQ_FLAG(REQ_F_APOLL_MULTISHOT_BIT),
        /* recvmsg special flag, clear EPOLLIN */
-       REQ_F_CLEAR_POLLIN      = BIT(REQ_F_CLEAR_POLLIN_BIT),
+       REQ_F_CLEAR_POLLIN      = IO_REQ_FLAG(REQ_F_CLEAR_POLLIN_BIT),
        /* hashed into ->cancel_hash_locked, protected by ->uring_lock */
-       REQ_F_HASH_LOCKED       = BIT(REQ_F_HASH_LOCKED_BIT),
+       REQ_F_HASH_LOCKED       = IO_REQ_FLAG(REQ_F_HASH_LOCKED_BIT),
        /* don't use lazy poll wake for this request */
-       REQ_F_POLL_NO_LAZY      = BIT(REQ_F_POLL_NO_LAZY_BIT),
+       REQ_F_POLL_NO_LAZY      = IO_REQ_FLAG(REQ_F_POLL_NO_LAZY_BIT),
+       /* cancel sequence is set and valid */
+       REQ_F_CANCEL_SEQ        = IO_REQ_FLAG(REQ_F_CANCEL_SEQ_BIT),
+       /* file is pollable */
+       REQ_F_CAN_POLL          = IO_REQ_FLAG(REQ_F_CAN_POLL_BIT),
+       /* buffer list was empty after selection of buffer */
+       REQ_F_BL_EMPTY          = IO_REQ_FLAG(REQ_F_BL_EMPTY_BIT),
+       /* don't recycle provided buffers for this request */
+       REQ_F_BL_NO_RECYCLE     = IO_REQ_FLAG(REQ_F_BL_NO_RECYCLE_BIT),
 };
 
 typedef void (*io_req_tw_func_t)(struct io_kiocb *req, struct io_tw_state *ts);
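The widening exists because the request flag space has outgrown 32 bits, and BIT() yields a 32-bit unsigned long on 32-bit architectures. A small invented illustration:

/*
 * Invented helper: io_req_flags_t is a 64-bit __bitwise type, so tests
 * against high flag bits stay correct on 32-bit kernels, and sparse can
 * catch accidental mixing with plain integers.
 */
static inline bool demo_req_is_pollable(io_req_flags_t flags)
{
	return flags & REQ_F_CAN_POLL;
}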
@@ -592,15 +619,17 @@ struct io_kiocb {
         * and after selection it points to the buffer ID itself.
         */
        u16                             buf_index;
-       unsigned int                    flags;
+
+       unsigned                        nr_tw;
+
+       /* REQ_F_* flags */
+       io_req_flags_t                  flags;
 
        struct io_cqe                   cqe;
 
        struct io_ring_ctx              *ctx;
        struct task_struct              *task;
 
-       struct io_rsrc_node             *rsrc_node;
-
        union {
                /* store used ubuf, so we can prevent reloading */
                struct io_mapped_ubuf   *imu;
@@ -621,10 +650,12 @@ struct io_kiocb {
                /* cache ->apoll->events */
                __poll_t apoll_events;
        };
+
+       struct io_rsrc_node             *rsrc_node;
+
        atomic_t                        refs;
        atomic_t                        poll_refs;
        struct io_task_work             io_task_work;
-       unsigned                        nr_tw;
        /* for polled requests, i.e. IORING_OP_POLL_ADD and async armed poll */
        struct hlist_node               hash_node;
        /* internal polling, see IORING_FEAT_FAST_POLL */
index 96dd0acbba44aca735ff027ffb8f1c118cb762e8..6fc1c858013d1e4dda4ed38fa4083acf25d16d36 100644 (file)
@@ -293,22 +293,32 @@ struct iomap_ioend {
        struct list_head        io_list;        /* next ioend in chain */
        u16                     io_type;
        u16                     io_flags;       /* IOMAP_F_* */
-       u32                     io_folios;      /* folios added to ioend */
        struct inode            *io_inode;      /* file being written to */
        size_t                  io_size;        /* size of the extent */
        loff_t                  io_offset;      /* offset in the file */
        sector_t                io_sector;      /* start sector of ioend */
-       struct bio              *io_bio;        /* bio being built */
-       struct bio              io_inline_bio;  /* MUST BE LAST! */
+       struct bio              io_bio;         /* MUST BE LAST! */
 };
 
+static inline struct iomap_ioend *iomap_ioend_from_bio(struct bio *bio)
+{
+       return container_of(bio, struct iomap_ioend, io_bio);
+}
+
 struct iomap_writeback_ops {
        /*
         * Required, maps the blocks so that writeback can be performed on
         * the range starting at offset.
+        *
+        * Can return arbitrarily large regions, but we need to call into it at
+        * least once per folio to allow the file systems to synchronize with
+        * the write path that could be invalidating mappings.
+        *
+        * An existing mapping from a previous call to this method can be reused
+        * by the file system if it is still valid.
         */
        int (*map_blocks)(struct iomap_writepage_ctx *wpc, struct inode *inode,
-                               loff_t offset);
+                         loff_t offset, unsigned len);
 
        /*
         * Optional, allows the file systems to perform actions just before
@@ -329,6 +339,7 @@ struct iomap_writepage_ctx {
        struct iomap            iomap;
        struct iomap_ioend      *ioend;
        const struct iomap_writeback_ops *ops;
+       u32                     nr_folios;      /* folios added to the ioend */
 };
 
 void iomap_finish_ioends(struct iomap_ioend *ioend, int error);
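A hedged sketch of a map_blocks implementation honoring the contract documented above (demo_lookup_extent is invented; real file systems additionally revalidate the cached mapping against concurrent invalidation):

#include <linux/iomap.h>

/* Invented extent lookup for the sketch. */
static int demo_lookup_extent(struct inode *inode, loff_t offset,
			      unsigned len, struct iomap *iomap);

/*
 * Reuse the mapping from a previous call when it still covers the
 * range, otherwise look up a fresh extent.
 */
static int demo_map_blocks(struct iomap_writepage_ctx *wpc,
			   struct inode *inode, loff_t offset, unsigned len)
{
	if (offset >= wpc->iomap.offset &&
	    offset + len <= wpc->iomap.offset + wpc->iomap.length)
		return 0;

	return demo_lookup_extent(inode, offset, len, &wpc->iomap);
}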
index 1ea2a820e1eb035c9eea2ec97d9874c52bbd0b42..5e27cb3a3be99b34e705cb7c4569cfbdf2b11f82 100644 (file)
@@ -892,11 +892,14 @@ struct iommu_fwspec {
 struct iommu_sva {
        struct device                   *dev;
        struct iommu_domain             *domain;
+       struct list_head                handle_item;
+       refcount_t                      users;
 };
 
 struct iommu_mm_data {
        u32                     pasid;
        struct list_head        sva_domains;
+       struct list_head        sva_handles;
 };
 
 int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
index 7e7fd25b09b3ebe3d81e30fb23f506a9ee5a6519..179df96b20f88d065d0c9be4d4ef643b71a801c6 100644 (file)
@@ -2031,6 +2031,32 @@ static inline int mmu_invalidate_retry_gfn(struct kvm *kvm,
                return 1;
        return 0;
 }
+
+/*
+ * This lockless version of the range-based retry check *must* be paired with a
+ * call to the locked version after acquiring mmu_lock, i.e. this is safe to
+ * use only as a pre-check to avoid contending mmu_lock.  This version *will*
+ * get false negatives and false positives.
+ */
+static inline bool mmu_invalidate_retry_gfn_unsafe(struct kvm *kvm,
+                                                  unsigned long mmu_seq,
+                                                  gfn_t gfn)
+{
+       /*
+        * Use READ_ONCE() to ensure the in-progress flag and sequence counter
+        * are always read from memory, e.g. so that checking for retry in a
+        * loop won't result in an infinite retry loop.  Don't force loads for
+        * start+end, as the key to avoiding infinite retry loops is observing
+        * the 1=>0 transition of in-progress, i.e. getting false negatives
+        * due to stale start+end values is acceptable.
+        */
+       if (unlikely(READ_ONCE(kvm->mmu_invalidate_in_progress)) &&
+           gfn >= kvm->mmu_invalidate_range_start &&
+           gfn < kvm->mmu_invalidate_range_end)
+               return true;
+
+       return READ_ONCE(kvm->mmu_invalidate_seq) != mmu_seq;
+}
 #endif
 
 #ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
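The intended pairing, sketched below; the fault-handler shape is hypothetical, and mmu_lock is written as an rwlock (some configurations use a spinlock instead):

/*
 * Cheap lockless pre-check first, then the authoritative locked check
 * before installing a mapping.
 */
#include <linux/kvm_host.h>

static int demo_map_gfn(struct kvm *kvm, gfn_t gfn, unsigned long mmu_seq)
{
	if (mmu_invalidate_retry_gfn_unsafe(kvm, mmu_seq, gfn))
		return -EAGAIN;		/* probably raced; retry cheaply */

	write_lock(&kvm->mmu_lock);
	if (mmu_invalidate_retry_gfn(kvm, mmu_seq, gfn)) {
		write_unlock(&kvm->mmu_lock);
		return -EAGAIN;		/* raced with an invalidation */
	}
	/* ... install the mapping under mmu_lock ... */
	write_unlock(&kvm->mmu_lock);
	return 0;
}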
index 1dbb14daccfaf326af0c54c89ff61afb50e07982..26d68115afb826b65a9fd11ce329635161e39cca 100644 (file)
@@ -471,7 +471,7 @@ enum ata_completion_errors {
 
 /*
  * Link power management policy: If you alter this, you also need to
- * alter libata-scsi.c (for the ascii descriptions)
+ * alter libata-sata.c (for the ascii descriptions)
  */
 enum ata_lpm_policy {
        ATA_LPM_UNKNOWN,
index 9f565416d18671e110e1fb99e44b96e241b519d4..1b95fe31051ff393084d622b03c7ff7746f75700 100644 (file)
@@ -375,12 +375,12 @@ static inline int nlm_privileged_requester(const struct svc_rqst *rqstp)
 static inline int nlm_compare_locks(const struct file_lock *fl1,
                                    const struct file_lock *fl2)
 {
-       return file_inode(fl1->fl_file) == file_inode(fl2->fl_file)
-            && fl1->fl_pid   == fl2->fl_pid
-            && fl1->fl_owner == fl2->fl_owner
+       return file_inode(fl1->c.flc_file) == file_inode(fl2->c.flc_file)
+            && fl1->c.flc_pid   == fl2->c.flc_pid
+            && fl1->c.flc_owner == fl2->c.flc_owner
             && fl1->fl_start == fl2->fl_start
             && fl1->fl_end   == fl2->fl_end
-            &&(fl1->fl_type  == fl2->fl_type || fl2->fl_type == F_UNLCK);
+            &&(fl1->c.flc_type  == fl2->c.flc_type || fl2->c.flc_type == F_UNLCK);
 }
 
 extern const struct lock_manager_operations nlmsvc_lock_operations;
index b60fbcd8cdfad5a5e607eb0c456e3afd559e0ec2..80cca9426761532f18c274b37789ee1f4462b701 100644 (file)
@@ -52,7 +52,7 @@ struct nlm_lock {
  *     FreeBSD uses 16, Apple Mac OS X 10.3 uses 20. Therefore we set it to
  *     32 bytes.
  */
+
 struct nlm_cookie
 {
        unsigned char data[NLM_MAXCOOKIELEN];
index 185924c5637876a153957a3a206fe907dcc784a5..76458b6d53da7667b31fc3a80007bb1a609ec1d8 100644 (file)
@@ -315,9 +315,9 @@ LSM_HOOK(int, 0, socket_getsockopt, struct socket *sock, int level, int optname)
 LSM_HOOK(int, 0, socket_setsockopt, struct socket *sock, int level, int optname)
 LSM_HOOK(int, 0, socket_shutdown, struct socket *sock, int how)
 LSM_HOOK(int, 0, socket_sock_rcv_skb, struct sock *sk, struct sk_buff *skb)
-LSM_HOOK(int, 0, socket_getpeersec_stream, struct socket *sock,
+LSM_HOOK(int, -ENOPROTOOPT, socket_getpeersec_stream, struct socket *sock,
         sockptr_t optval, sockptr_t optlen, unsigned int len)
-LSM_HOOK(int, 0, socket_getpeersec_dgram, struct socket *sock,
+LSM_HOOK(int, -ENOPROTOOPT, socket_getpeersec_dgram, struct socket *sock,
         struct sk_buff *skb, u32 *secid)
 LSM_HOOK(int, 0, sk_alloc_security, struct sock *sk, int family, gfp_t priority)
 LSM_HOOK(void, LSM_RET_VOID, sk_free_security, struct sock *sk)
index b3d63123b945b58bc7549142d79fd2783f2b0f07..a53ad4dabd7e8f7618a991d3fd17dc65fe2f33fd 100644 (file)
@@ -171,6 +171,7 @@ enum maple_type {
 #define MT_FLAGS_LOCK_IRQ      0x100
 #define MT_FLAGS_LOCK_BH       0x200
 #define MT_FLAGS_LOCK_EXTERN   0x300
+#define MT_FLAGS_ALLOC_WRAPPED 0x0800
 
 #define MAPLE_HEIGHT_MAX       31
 
@@ -319,6 +320,9 @@ int mtree_insert_range(struct maple_tree *mt, unsigned long first,
 int mtree_alloc_range(struct maple_tree *mt, unsigned long *startp,
                void *entry, unsigned long size, unsigned long min,
                unsigned long max, gfp_t gfp);
+int mtree_alloc_cyclic(struct maple_tree *mt, unsigned long *startp,
+               void *entry, unsigned long range_lo, unsigned long range_hi,
+               unsigned long *next, gfp_t gfp);
 int mtree_alloc_rrange(struct maple_tree *mt, unsigned long *startp,
                void *entry, unsigned long size, unsigned long min,
                unsigned long max, gfp_t gfp);
@@ -499,6 +503,9 @@ void *mas_find_range(struct ma_state *mas, unsigned long max);
 void *mas_find_rev(struct ma_state *mas, unsigned long min);
 void *mas_find_range_rev(struct ma_state *mas, unsigned long max);
 int mas_preallocate(struct ma_state *mas, void *entry, gfp_t gfp);
+int mas_alloc_cyclic(struct ma_state *mas, unsigned long *startp,
+               void *entry, unsigned long range_lo, unsigned long range_hi,
+               unsigned long *next, gfp_t gfp);
 
 bool mas_nomem(struct ma_state *mas, gfp_t gfp);
 void mas_pause(struct ma_state *mas);
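A hedged usage sketch of the new cyclic allocator, in the spirit of idr_alloc_cyclic() (the wrapper is invented; it treats any negative return as an error and a non-negative return as success):

#include <linux/maple_tree.h>
#include <linux/limits.h>
#include <linux/gfp.h>

/*
 * Invented wrapper: hand out IDs from [2, ULONG_MAX] cyclically, keeping
 * the next starting hint in *next so recently freed IDs are not reused
 * immediately.
 */
static int demo_alloc_id(struct maple_tree *mt, void *item,
			 unsigned long *next, unsigned long *id)
{
	int ret = mtree_alloc_cyclic(mt, id, item, 2, ULONG_MAX, next,
				     GFP_KERNEL);
	return ret < 0 ? ret : 0;
}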
index b695f9e946dabb46f08e1d1688d275bb3ff35b49..e2082240586d00b5f21af7b3f9e0ca6176b5884a 100644 (file)
@@ -121,6 +121,8 @@ int memblock_reserve(phys_addr_t base, phys_addr_t size);
 int memblock_physmem_add(phys_addr_t base, phys_addr_t size);
 #endif
 void memblock_trim_memory(phys_addr_t align);
+unsigned long memblock_addrs_overlap(phys_addr_t base1, phys_addr_t size1,
+                                    phys_addr_t base2, phys_addr_t size2);
 bool memblock_overlaps_region(struct memblock_type *type,
                              phys_addr_t base, phys_addr_t size);
 bool memblock_validate_numa_coverage(unsigned long threshold_bytes);
index 8c55ff351e5f2eed3416b0b59dd7e193f06bec02..41f03b352401e7556ddf92f0b9a53da4918a291a 100644 (file)
@@ -681,6 +681,7 @@ struct mlx5e_resources {
                struct mlx5_sq_bfreg       bfreg;
 #define MLX5_MAX_NUM_TC 8
                u32                        tisn[MLX5_MAX_PORTS][MLX5_MAX_NUM_TC];
+               bool                       tisn_valid;
        } hw_objs;
        struct net_device *uplink_netdev;
        struct mutex uplink_netdev_lock;
index 6f7725238abc2fcfeaf471e988b0035df25b9b87..3fb428ce7d1c7c0dd57969e8b82e227a4efb5d41 100644 (file)
@@ -132,6 +132,7 @@ struct mlx5_flow_handle;
 
 enum {
        FLOW_CONTEXT_HAS_TAG = BIT(0),
+       FLOW_CONTEXT_UPLINK_HAIRPIN_EN = BIT(1),
 };
 
 struct mlx5_flow_context {
index bf5320b28b8bf045f7ab3492eb7f050e027df29d..486b7492050c3daa04459c7de0e8471faca27ba4 100644 (file)
@@ -1103,7 +1103,7 @@ struct mlx5_ifc_roce_cap_bits {
        u8         sw_r_roce_src_udp_port[0x1];
        u8         fl_rc_qp_when_roce_disabled[0x1];
        u8         fl_rc_qp_when_roce_enabled[0x1];
-       u8         reserved_at_7[0x1];
+       u8         roce_cc_general[0x1];
        u8         qp_ooo_transmit_default[0x1];
        u8         reserved_at_9[0x15];
        u8         qp_ts_format[0x2];
@@ -3576,7 +3576,7 @@ struct mlx5_ifc_flow_context_bits {
        u8         action[0x10];
 
        u8         extended_destination[0x1];
-       u8         reserved_at_81[0x1];
+       u8         uplink_hairpin_en[0x1];
        u8         flow_source[0x2];
        u8         encrypt_decrypt_type[0x4];
        u8         destination_list_size[0x18];
@@ -4036,8 +4036,13 @@ struct mlx5_ifc_nic_vport_context_bits {
        u8         affiliation_criteria[0x4];
        u8         affiliated_vhca_id[0x10];
 
-       u8         reserved_at_60[0xd0];
+       u8         reserved_at_60[0xa0];
 
+       u8         reserved_at_100[0x1];
+       u8         sd_group[0x3];
+       u8         reserved_at_104[0x1c];
+
+       u8         reserved_at_120[0x10];
        u8         mtu[0x10];
 
        u8         system_image_guid[0x40];
@@ -10122,8 +10127,7 @@ struct mlx5_ifc_mpir_reg_bits {
        u8         reserved_at_20[0x20];
 
        u8         local_port[0x8];
-       u8         reserved_at_28[0x15];
-       u8         sd_group[0x3];
+       u8         reserved_at_28[0x18];
 
        u8         reserved_at_60[0x20];
 };
@@ -10257,7 +10261,9 @@ struct mlx5_ifc_mcam_access_reg_bits {
 
        u8         regs_63_to_46[0x12];
        u8         mrtc[0x1];
-       u8         regs_44_to_32[0xd];
+       u8         regs_44_to_41[0x4];
+       u8         mfrl[0x1];
+       u8         regs_39_to_32[0x8];
 
        u8         regs_31_to_10[0x16];
        u8         mtmp[0x1];
index bd53cf4be7bdcbe4ea47ab640fbe0052ffc88bef..f0e55bf3ec8b5b0dd10c3270c1659e1fbb96ac64 100644 (file)
@@ -269,7 +269,10 @@ struct mlx5_wqe_eth_seg {
        union {
                struct {
                        __be16 sz;
-                       u8     start[2];
+                       union {
+                               u8     start[2];
+                               DECLARE_FLEX_ARRAY(u8, data);
+                       };
                } inline_hdr;
                struct {
                        __be16 type;
index fbb9bf4478894c72e0e4a3f4d6404008611cbca0..c36cc6d829267e8b795c5c1ea7f71c1e28dcdaed 100644 (file)
@@ -72,6 +72,7 @@ int mlx5_query_nic_vport_mtu(struct mlx5_core_dev *mdev, u16 *mtu);
 int mlx5_modify_nic_vport_mtu(struct mlx5_core_dev *mdev, u16 mtu);
 int mlx5_query_nic_vport_system_image_guid(struct mlx5_core_dev *mdev,
                                           u64 *system_image_guid);
+int mlx5_query_nic_vport_sd_group(struct mlx5_core_dev *mdev, u8 *sd_group);
 int mlx5_query_nic_vport_node_guid(struct mlx5_core_dev *mdev, u64 *node_guid);
 int mlx5_modify_nic_vport_node_guid(struct mlx5_core_dev *mdev,
                                    u16 vport, u64 node_guid);
index 40d94411d49204e7276a6ad9554eb17335fd4577..dc7048824be81d628ca12f0874c1a7508da0d5c1 100644 (file)
@@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
        return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
               _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
               _calc_vm_trans(flags, MAP_SYNC,       VM_SYNC      ) |
+              _calc_vm_trans(flags, MAP_STACK,      VM_NOHUGEPAGE) |
               arch_calc_vm_flag_bits(flags);
 }
 
index 4ed33b12782151632e36aa114039cb4a0916fe06..a497f189d98818bcda37458746ebb2bded7826e4 100644 (file)
@@ -2013,9 +2013,9 @@ static inline int pfn_valid(unsigned long pfn)
        if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
                return 0;
        ms = __pfn_to_section(pfn);
-       rcu_read_lock();
+       rcu_read_lock_sched();
        if (!valid_section(ms)) {
-               rcu_read_unlock();
+               rcu_read_unlock_sched();
                return 0;
        }
        /*
@@ -2023,7 +2023,7 @@ static inline int pfn_valid(unsigned long pfn)
         * the entire section-sized span.
         */
        ret = early_section(ms) || pfn_section_valid(ms, pfn);
-       rcu_read_unlock();
+       rcu_read_unlock_sched();
 
        return ret;
 }
index 118c40258d07b787adf518e576e75545e4bae846..78a09af89e39b7a43ce211cbbf17e7fe035d36bb 100644 (file)
@@ -79,8 +79,6 @@ struct xdp_buff;
 struct xdp_frame;
 struct xdp_metadata_ops;
 struct xdp_md;
-/* DPLL specific */
-struct dpll_pin;
 
 typedef u32 xdp_features_t;
 
@@ -2141,6 +2139,11 @@ struct net_device {
 
        /* TXRX read-mostly hotpath */
        __cacheline_group_begin(net_device_read_txrx);
+       union {
+               struct pcpu_lstats __percpu             *lstats;
+               struct pcpu_sw_netstats __percpu        *tstats;
+               struct pcpu_dstats __percpu             *dstats;
+       };
        unsigned int            flags;
        unsigned short          hard_header_len;
        netdev_features_t       features;
@@ -2395,11 +2398,6 @@ struct net_device {
        enum netdev_ml_priv_type        ml_priv_type;
 
        enum netdev_stat_type           pcpu_stat_type:8;
-       union {
-               struct pcpu_lstats __percpu             *lstats;
-               struct pcpu_sw_netstats __percpu        *tstats;
-               struct pcpu_dstats __percpu             *dstats;
-       };
 
 #if IS_ENABLED(CONFIG_GARP)
        struct garp_port __rcu  *garp_port;
@@ -2469,7 +2467,7 @@ struct net_device {
        struct devlink_port     *devlink_port;
 
 #if IS_ENABLED(CONFIG_DPLL)
-       struct dpll_pin         *dpll_pin;
+       struct dpll_pin __rcu   *dpll_pin;
 #endif
 #if IS_ENABLED(CONFIG_PAGE_POOL)
        /** @page_pools: page pools created for this netdevice */
@@ -3499,6 +3497,16 @@ static inline void netdev_queue_set_dql_min_limit(struct netdev_queue *dev_queue
 #endif
 }
 
+static inline int netdev_queue_dql_avail(const struct netdev_queue *txq)
+{
+#ifdef CONFIG_BQL
+       /* Non-BQL migrated drivers will return 0, too. */
+       return dql_avail(&txq->dql);
+#else
+       return 0;
+#endif
+}
+
 /**
  *     netdev_txq_bql_enqueue_prefetchw - prefetch bql data for write
  *     @dev_queue: pointer to transmit queue
@@ -4032,17 +4040,6 @@ int dev_get_mac_address(struct sockaddr *sa, struct net *net, char *dev_name);
 int dev_get_port_parent_id(struct net_device *dev,
                           struct netdev_phys_item_id *ppid, bool recurse);
 bool netdev_port_same_parent_id(struct net_device *a, struct net_device *b);
-void netdev_dpll_pin_set(struct net_device *dev, struct dpll_pin *dpll_pin);
-void netdev_dpll_pin_clear(struct net_device *dev);
-
-static inline struct dpll_pin *netdev_dpll_pin(const struct net_device *dev)
-{
-#if IS_ENABLED(CONFIG_DPLL)
-       return dev->dpll_pin;
-#else
-       return NULL;
-#endif
-}
 
 struct sk_buff *validate_xmit_skb_list(struct sk_buff *skb, struct net_device *dev, bool *again);
 struct sk_buff *dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
index 80900d9109920f686f971e431b95361a831fcc26..ce660d51549b469243357bf9187e108582bc4d95 100644 (file)
@@ -474,6 +474,7 @@ struct nf_ct_hook {
                              const struct sk_buff *);
        void (*attach)(struct sk_buff *nskb, const struct sk_buff *skb);
        void (*set_closing)(struct nf_conntrack *nfct);
+       int (*confirm)(struct sk_buff *skb);
 };
 extern const struct nf_ct_hook __rcu *nf_ct_hook;
 
index e8c350a3ade153d852bec011dbd3c72a352d319d..e9f4f845d760afafbfb6e45b220dfb6919a29779 100644 (file)
@@ -186,6 +186,8 @@ struct ip_set_type_variant {
        /* Return true if "b" set is the same as "a"
         * according to the create set parameters */
        bool (*same_set)(const struct ip_set *a, const struct ip_set *b);
+       /* Cancel ongoing garbage collectors before destroying the set */
+       void (*cancel_gc)(struct ip_set *set);
        /* Region-locking is used */
        bool region_lock;
 };
@@ -242,6 +244,8 @@ extern void ip_set_type_unregister(struct ip_set_type *set_type);
 
 /* A generic IP set */
 struct ip_set {
+       /* For call_rcu in destroy */
+       struct rcu_head rcu;
        /* The name of the set */
        char name[IPSET_MAXNAMELEN];
        /* Lock protecting the set data */
index cd797e00fe359a91b44b2e012309e87d5b446a7e..92de074e63b98c03cefbb2f07d60de0b8f1fb039 100644 (file)
@@ -124,6 +124,7 @@ struct nfs_client {
        char                    cl_ipaddr[48];
        struct net              *cl_net;
        struct list_head        pending_cb_stateids;
+       struct rcu_head         rcu;
 };
 
 /*
@@ -265,6 +266,7 @@ struct nfs_server {
        const struct cred       *cred;
        bool                    has_sec_mnt_opts;
        struct kobject          kobj;
+       struct rcu_head         rcu;
 };
 
 /* Server capabilities */
index 0f1d024bd9582618a54b601988969694b4dafb47..7d22ea50b09841ed2f764273564e53588e9b7a4e 100644 (file)
@@ -7,7 +7,7 @@
 struct proc_ns_operations;
 
 struct ns_common {
-       atomic_long_t stashed;
+       struct dentry *stashed;
        const struct proc_ns_operations *ops;
        unsigned int inum;
        refcount_t count;
index 4dd7e6fe92fb011d70b5a9dcb83e16f9d0a96b3f..eb2f04d636c89866167021019049fb979006c81e 100644 (file)
@@ -6,7 +6,11 @@
 #ifndef _LINUX_NVME_RDMA_H
 #define _LINUX_NVME_RDMA_H
 
-#define NVME_RDMA_MAX_QUEUE_SIZE       128
+#define NVME_RDMA_IP_PORT              4420
+
+#define NVME_RDMA_MAX_QUEUE_SIZE 256
+#define NVME_RDMA_MAX_METADATA_QUEUE_SIZE 128
+#define NVME_RDMA_DEFAULT_QUEUE_SIZE 128
 
 enum nvme_rdma_cm_fmt {
        NVME_RDMA_CM_FMT_1_0 = 0x0,
index 462c21e0e417654e56edf314ebcb62d8c6f4ad16..425573202295352796c8261b1c9345a3721760df 100644 (file)
@@ -23,8 +23,6 @@
 
 #define NVME_DISC_SUBSYS_NAME  "nqn.2014-08.org.nvmexpress.discovery"
 
-#define NVME_RDMA_IP_PORT      4420
-
 #define NVME_NSID_ALL          0xffffffff
 
 enum nvme_subsys_type {
@@ -646,6 +644,7 @@ enum {
        NVME_CMD_EFFECTS_NCC            = 1 << 2,
        NVME_CMD_EFFECTS_NIC            = 1 << 3,
        NVME_CMD_EFFECTS_CCC            = 1 << 4,
+       NVME_CMD_EFFECTS_CSER_MASK      = GENMASK(15, 14),
        NVME_CMD_EFFECTS_CSE_MASK       = GENMASK(18, 16),
        NVME_CMD_EFFECTS_UUID_SEL       = 1 << 19,
        NVME_CMD_EFFECTS_SCOPE_MASK     = GENMASK(31, 20),
@@ -816,12 +815,6 @@ struct nvme_reservation_status_ext {
        struct nvme_registered_ctrl_ext regctl_eds[];
 };
 
-enum nvme_async_event_type {
-       NVME_AER_TYPE_ERROR     = 0,
-       NVME_AER_TYPE_SMART     = 1,
-       NVME_AER_TYPE_NOTICE    = 2,
-};
-
 /* I/O commands */
 
 enum nvme_opcode {
@@ -1818,7 +1811,7 @@ struct nvme_command {
        };
 };
 
-static inline bool nvme_is_fabrics(struct nvme_command *cmd)
+static inline bool nvme_is_fabrics(const struct nvme_command *cmd)
 {
        return cmd->common.opcode == nvme_fabrics_command;
 }
@@ -1837,7 +1830,7 @@ struct nvme_error_slot {
        __u8            resv2[24];
 };
 
-static inline bool nvme_is_write(struct nvme_command *cmd)
+static inline bool nvme_is_write(const struct nvme_command *cmd)
 {
        /*
         * What a mess...
index add9368e6314b9d7038a651af3f8e1b9e08d7ffa..7ab0d13672dafa0faaeaf4cf02e7bab6fcc3130b 100644 (file)
@@ -1422,6 +1422,7 @@ int pci_load_and_free_saved_state(struct pci_dev *dev,
                                  struct pci_saved_state **state);
 int pci_platform_power_transition(struct pci_dev *dev, pci_power_t state);
 int pci_set_power_state(struct pci_dev *dev, pci_power_t state);
+int pci_set_power_state_locked(struct pci_dev *dev, pci_power_t state);
 pci_power_t pci_choose_state(struct pci_dev *dev, pm_message_t state);
 bool pci_pme_capable(struct pci_dev *dev, pci_power_t state);
 void pci_pme_active(struct pci_dev *dev, bool enable);
@@ -1625,6 +1626,8 @@ int pci_scan_bridge(struct pci_bus *bus, struct pci_dev *dev, int max,
 
 void pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void *),
                  void *userdata);
+void pci_walk_bus_locked(struct pci_bus *top, int (*cb)(struct pci_dev *, void *),
+                        void *userdata);
 int pci_cfg_space_size(struct pci_dev *dev);
 unsigned char pci_bus_max_busnr(struct pci_bus *bus);
 void pci_setup_bridge(struct pci_bus *bus);
@@ -2025,6 +2028,8 @@ static inline int pci_save_state(struct pci_dev *dev) { return 0; }
 static inline void pci_restore_state(struct pci_dev *dev) { }
 static inline int pci_set_power_state(struct pci_dev *dev, pci_power_t state)
 { return 0; }
+static inline int pci_set_power_state_locked(struct pci_dev *dev, pci_power_t state)
+{ return 0; }
 static inline int pci_wake_from_d3(struct pci_dev *dev, bool enable)
 { return 0; }
 static inline pci_power_t pci_choose_state(struct pci_dev *dev,
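
A hedged sketch of how the new _locked variants compose for a caller that already holds the PCI bus lock; the callback contract (return nonzero to stop the walk) is assumed to match pci_walk_bus(), and the example_ names are illustrative:

/* Illustrative: put every device below @bus into D3hot, lock already held. */
static int example_suspend_one(struct pci_dev *dev, void *data)
{
	return pci_set_power_state_locked(dev, PCI_D3hot);
}

static void example_suspend_bus(struct pci_bus *bus)
{
	pci_walk_bus_locked(bus, example_suspend_one, NULL);
}
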
index 395cacce1179cac85b7f427d188361da75bfd169..c79a0efd02586b1b9147895272c0d0fd46c354f1 100644 (file)
@@ -55,6 +55,10 @@ struct pid
        refcount_t count;
        unsigned int level;
        spinlock_t lock;
+#ifdef CONFIG_FS_PID
+       struct dentry *stashed;
+       unsigned long ino;
+#endif
        /* lists of tasks that use this pid */
        struct hlist_head tasks[PIDTYPE_MAX];
        struct hlist_head inodes;
@@ -66,15 +70,13 @@ struct pid
 
 extern struct pid init_struct_pid;
 
-extern const struct file_operations pidfd_fops;
-
 struct file;
 
-extern struct pid *pidfd_pid(const struct file *file);
+struct pid *pidfd_pid(const struct file *file);
 struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags);
 struct task_struct *pidfd_get_task(int pidfd, unsigned int *flags);
-int pidfd_create(struct pid *pid, unsigned int flags);
 int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret);
+void do_notify_pidfd(struct task_struct *task);
 
 static inline struct pid *get_pid(struct pid *pid)
 {
diff --git a/include/linux/pidfs.h b/include/linux/pidfs.h
new file mode 100644 (file)
index 0000000..40dd325
--- /dev/null
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_PID_FS_H
+#define _LINUX_PID_FS_H
+
+struct file *pidfs_alloc_file(struct pid *pid, unsigned int flags);
+void __init pidfs_init(void);
+bool is_pidfs_sb(const struct super_block *sb);
+
+#endif /* _LINUX_PID_FS_H */
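
With pidfd_create() dropped from pid.h above, pidfs_alloc_file() becomes the way to mint the pidfd's struct file. A hedged sketch of the expected pattern, using only standard fd helpers; the example_ name is illustrative and error handling is trimmed:

/* Illustrative: turn a struct pid into an installed pidfd. */
static int example_make_pidfd(struct pid *pid, unsigned int flags)
{
	struct file *file;
	int fd;

	fd = get_unused_fd_flags(O_CLOEXEC);
	if (fd < 0)
		return fd;

	file = pidfs_alloc_file(pid, flags);
	if (IS_ERR(file)) {
		put_unused_fd(fd);
		return PTR_ERR(file);
	}

	fd_install(fd, file);
	return fd;
}
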
index 79594aeb160dafa2e04ff47390d37316025a5588..2f1b952d596aa4661ee20e2450d8ab4feac4c86f 100644 (file)
@@ -154,9 +154,9 @@ struct packet_stacked_data
 
 struct pktcdvd_device
 {
-       struct bdev_handle      *bdev_handle;   /* dev attached */
+       struct file             *bdev_file;     /* dev attached */
        /* handle acquired for bdev during pkt_open_dev() */
-       struct bdev_handle      *open_bdev_handle;
+       struct file             *f_open_bdev;
        dev_t                   pkt_dev;        /* our dev */
        struct packet_settings  settings;
        struct packet_stats     stats;
index 27a7dad17eefb83b917569fdc6a5df7298f3859b..1f0ee2459f2aa2db997a979ff7df74b1d1bd588c 100644 (file)
@@ -92,4 +92,7 @@
 /********** VFS **********/
 #define VFS_PTR_POISON ((void *)(0xF5 + POISON_POINTER_DELTA))
 
+/********** lib/stackdepot.c **********/
+#define STACK_DEPOT_POISON ((void *)(0xD390 + POISON_POINTER_DELTA))
+
 #endif
index a9e0e1c2d1f2ff89d99a9377a1de7d0dc78483cd..d1ea4f3714a8485677e89a4c6c07a657e53a15ca 100644 (file)
 
 /* ~832 bytes of stack space used max in sys_select/sys_poll before allocating
    additional memory. */
-#ifdef __clang__
-#define MAX_STACK_ALLOC 768
-#else
 #define MAX_STACK_ALLOC 832
-#endif
 #define FRONTEND_STACK_ALLOC   256
 #define SELECT_STACK_ALLOC     FRONTEND_STACK_ALLOC
 #define POLL_STACK_ALLOC       FRONTEND_STACK_ALLOC
index de407e7c3b55fdbd9b5d3cbe93585b1e417a3e20..0b2a8985444097f91cd0557563ad9438e3c494ea 100644 (file)
@@ -65,6 +65,7 @@ struct proc_fs_info {
        kgid_t pid_gid;
        enum proc_hidepid hide_pid;
        enum proc_pidonly pidonly;
+       struct rcu_head rcu;
 };
 
 static inline struct proc_fs_info *proc_sb_info(struct super_block *sb)
index 49539bc416cecb7b45bfd4f11df817cd11f2e791..5ea470eb4d768ad92f2785187aa9298c39025ea2 100644 (file)
@@ -66,7 +66,7 @@ static inline void proc_free_inum(unsigned int inum) {}
 
 static inline int ns_alloc_inum(struct ns_common *ns)
 {
-       atomic_long_set(&ns->stashed, 0);
+       WRITE_ONCE(ns->stashed, NULL);
        return proc_alloc_inum(&ns->inum);
 }
 
index eaaef3ffec221b93cfbb9a2f4c20646b473754fa..90507d4afcd6debb80eef494c4c874c8a1732e49 100644 (file)
@@ -393,6 +393,10 @@ static inline void user_single_step_report(struct pt_regs *regs)
 #define current_user_stack_pointer() user_stack_pointer(current_pt_regs())
 #endif
 
+#ifndef exception_ip
+#define exception_ip(x) instruction_pointer(x)
+#endif
+
 extern int task_current_syscall(struct task_struct *target, struct syscall_info *info);
 
 extern void sigaction_compat_abi(struct k_sigaction *act, struct k_sigaction *oact);
index 0027d4c8087c9ae7aec81e2412a2444a45bf673a..3860dbb9107a2117aa0078c18550aed42d36e241 100644 (file)
@@ -37,7 +37,6 @@ static inline bool rcu_sync_is_idle(struct rcu_sync *rsp)
 }
 
 extern void rcu_sync_init(struct rcu_sync *);
-extern void rcu_sync_enter_start(struct rcu_sync *);
 extern void rcu_sync_enter(struct rcu_sync *);
 extern void rcu_sync_exit(struct rcu_sync *);
 extern void rcu_sync_dtor(struct rcu_sync *);
index 0746b1b0b6639d9a912e2ba7503b928f40e92748..16f519914415ebf5592e75be67e8660339d6b0e7 100644 (file)
@@ -184,9 +184,9 @@ void rcu_tasks_trace_qs_blkd(struct task_struct *t);
        do {                                                                    \
                int ___rttq_nesting = READ_ONCE((t)->trc_reader_nesting);       \
                                                                                \
-               if (likely(!READ_ONCE((t)->trc_reader_special.b.need_qs)) &&    \
+               if (unlikely(READ_ONCE((t)->trc_reader_special.b.need_qs) == TRC_NEED_QS) &&    \
                    likely(!___rttq_nesting)) {                                 \
-                       rcu_trc_cmpxchg_need_qs((t), 0, TRC_NEED_QS_CHECKED);   \
+                       rcu_trc_cmpxchg_need_qs((t), TRC_NEED_QS, TRC_NEED_QS_CHECKED); \
                } else if (___rttq_nesting && ___rttq_nesting != INT_MIN &&     \
                           !READ_ONCE((t)->trc_reader_special.b.blocked)) {     \
                        rcu_tasks_trace_qs_blkd(t);                             \
diff --git a/include/linux/rw_hint.h b/include/linux/rw_hint.h
new file mode 100644 (file)
index 0000000..309ca72
--- /dev/null
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_RW_HINT_H
+#define _LINUX_RW_HINT_H
+
+#include <linux/build_bug.h>
+#include <linux/compiler_attributes.h>
+#include <uapi/linux/fcntl.h>
+
+/* Block storage write lifetime hint values. */
+enum rw_hint {
+       WRITE_LIFE_NOT_SET      = RWH_WRITE_LIFE_NOT_SET,
+       WRITE_LIFE_NONE         = RWH_WRITE_LIFE_NONE,
+       WRITE_LIFE_SHORT        = RWH_WRITE_LIFE_SHORT,
+       WRITE_LIFE_MEDIUM       = RWH_WRITE_LIFE_MEDIUM,
+       WRITE_LIFE_LONG         = RWH_WRITE_LIFE_LONG,
+       WRITE_LIFE_EXTREME      = RWH_WRITE_LIFE_EXTREME,
+} __packed;
+
+/* Sparse ignores __packed annotations on enums, hence the #ifndef below. */
+#ifndef __CHECKER__
+static_assert(sizeof(enum rw_hint) == 1);
+#endif
+
+#endif /* _LINUX_RW_HINT_H */
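
Because the enum is __packed (and the static_assert pins it at one byte), a write hint can be embedded in space-sensitive structures for a single byte rather than a full int. A minimal illustration, assuming <linux/types.h> for u8; struct example_ctx is purely hypothetical:

/* Illustrative: a packed rw_hint costs one byte inside a struct. */
struct example_ctx {
	enum rw_hint	hint;	/* 1 byte, thanks to __packed */
	u8		flags;	/* 1 byte */
};

#ifndef __CHECKER__
static_assert(sizeof(struct example_ctx) == 2);
#endif
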
index cdb8ea53c365ba45be4041c887de4c9d1c22afcd..17cb0761ff658e6838aa4e04eb429d4155e55e81 100644 (file)
@@ -858,6 +858,8 @@ struct task_struct {
        u8                              rcu_tasks_idx;
        int                             rcu_tasks_idle_cpu;
        struct list_head                rcu_tasks_holdout_list;
+       int                             rcu_tasks_exit_cpu;
+       struct list_head                rcu_tasks_exit_list;
 #endif /* #ifdef CONFIG_TASKS_RCU */
 
 #ifdef CONFIG_TASKS_TRACE_RCU
@@ -920,7 +922,7 @@ struct task_struct {
        unsigned                        sched_rt_mutex:1;
 #endif
 
-       /* Bit to tell LSMs we're in execve(): */
+       /* Bit to tell TOMOYO we're in execve(): */
        unsigned                        in_execve:1;
        unsigned                        in_iowait:1;
 #ifndef TIF_RESTORE_SIGMASK
@@ -1642,7 +1644,7 @@ extern struct pid *cad_pid;
 #define PF_NO_SETAFFINITY      0x04000000      /* Userland is not allowed to meddle with cpus_mask */
 #define PF_MCE_EARLY           0x08000000      /* Early kill for mce process policy */
 #define PF_MEMALLOC_PIN                0x10000000      /* Allocation context constrained to zones which allow long term pinning. */
-#define PF__HOLE__20000000     0x20000000
+#define PF_BLOCK_TS            0x20000000      /* plug has ts that needs updating */
 #define PF__HOLE__40000000     0x40000000
 #define PF_SUSPEND_TASK                0x80000000      /* This thread called freeze_processes() and should not be frozen */
 
index 4b7664c56208f9db855c527cd6daff672454d106..0a0e23c45406fe477a8c32534326bde451e4a45a 100644 (file)
@@ -735,8 +735,6 @@ static inline int thread_group_empty(struct task_struct *p)
 #define delay_group_leader(p) \
                (thread_group_leader(p) && !thread_group_empty(p))
 
-extern bool thread_group_exited(struct pid *pid);
-
 extern struct sighand_struct *__lock_task_sighand(struct task_struct *task,
                                                        unsigned long *flags);
 
index a6e04b4a21d70f9d3c807970815d1b79fee74ed4..11e0e00e0bb95599a0efebdefe6ebb9a5288b0c4 100644 (file)
@@ -176,6 +176,7 @@ extern void partition_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
 cpumask_var_t *alloc_sched_domains(unsigned int ndoms);
 void free_sched_domains(cpumask_var_t doms[], unsigned int ndoms);
 
+bool cpus_equal_capacity(int this_cpu, int that_cpu);
 bool cpus_share_cache(int this_cpu, int that_cpu);
 bool cpus_share_resources(int this_cpu, int that_cpu);
 
@@ -226,6 +227,11 @@ partition_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
 {
 }
 
+static inline bool cpus_equal_capacity(int this_cpu, int that_cpu)
+{
+       return true;
+}
+
 static inline bool cpus_share_cache(int this_cpu, int that_cpu)
 {
        return true;
index c44f4b47b945306318d8ed164c498abfe2512a10..fe41da0059700432044c1260952073914f7a26fc 100644 (file)
@@ -2,7 +2,10 @@
 #ifndef _LINUX_SEQ_BUF_H
 #define _LINUX_SEQ_BUF_H
 
-#include <linux/fs.h>
+#include <linux/bug.h>
+#include <linux/minmax.h>
+#include <linux/seq_file.h>
+#include <linux/types.h>
 
 /*
  * Trace sequences are used to allow a function to call several other functions
@@ -10,7 +13,7 @@
  */
 
 /**
- * seq_buf - seq buffer structure
+ * struct seq_buf - seq buffer structure
  * @buffer:    pointer to the buffer
  * @size:      size of the buffer
  * @len:       the amount of data inside the buffer
@@ -77,10 +80,10 @@ static inline unsigned int seq_buf_used(struct seq_buf *s)
 }
 
 /**
- * seq_buf_str - get %NUL-terminated C string from seq_buf
+ * seq_buf_str - get NUL-terminated C string from seq_buf
  * @s: the seq_buf handle
  *
- * This makes sure that the buffer in @s is nul terminated and
+ * This makes sure that the buffer in @s is NUL-terminated and
  * safe to read as a string.
  *
  * Note, if this is called when the buffer has overflowed, then
@@ -90,7 +93,7 @@ static inline unsigned int seq_buf_used(struct seq_buf *s)
  * After this function is called, s->buffer is safe to use
  * in string operations.
  *
- * Returns @s->buf after making sure it is terminated.
+ * Returns: @s->buf after making sure it is terminated.
  */
 static inline const char *seq_buf_str(struct seq_buf *s)
 {
@@ -110,7 +113,7 @@ static inline const char *seq_buf_str(struct seq_buf *s)
  * @s: the seq_buf handle
  * @bufp: the beginning of the buffer is stored here
  *
- * Return the number of bytes available in the buffer, or zero if
+ * Returns: the number of bytes available in the buffer, or zero if
  * there's no space.
  */
 static inline size_t seq_buf_get_buf(struct seq_buf *s, char **bufp)
@@ -132,7 +135,7 @@ static inline size_t seq_buf_get_buf(struct seq_buf *s, char **bufp)
  * @num: the number of bytes to commit
  *
  * Commit @num bytes of data written to a buffer previously acquired
- * by seq_buf_get To signal an error condition, or that the data
+ * by seq_buf_get_buf(). To signal an error condition, or that the data
  * didn't fit in the available space, pass a negative @num value.
  */
 static inline void seq_buf_commit(struct seq_buf *s, int num)
index 536b2581d3e2007593323a53c050d037d6ac5dd1..55b1f3ba48ac1725f110747b8d05ce6bbfc1de0b 100644 (file)
@@ -748,8 +748,17 @@ struct uart_driver {
 
 void uart_write_wakeup(struct uart_port *port);
 
-#define __uart_port_tx(uport, ch, tx_ready, put_char, tx_done, for_test,      \
-               for_post)                                                     \
+/**
+ * enum UART_TX_FLAGS -- flags for uart_port_tx_flags()
+ *
+ * @UART_TX_NOSTOP: don't call port->ops->stop_tx() on empty buffer
+ */
+enum UART_TX_FLAGS {
+       UART_TX_NOSTOP = BIT(0),
+};
+
+#define __uart_port_tx(uport, ch, flags, tx_ready, put_char, tx_done,        \
+                      for_test, for_post)                                    \
 ({                                                                           \
        struct uart_port *__port = (uport);                                   \
        struct circ_buf *xmit = &__port->state->xmit;                         \
@@ -777,7 +786,7 @@ void uart_write_wakeup(struct uart_port *port);
        if (pending < WAKEUP_CHARS) {                                         \
                uart_write_wakeup(__port);                                    \
                                                                              \
-               if (pending == 0)                                             \
+               if (!((flags) & UART_TX_NOSTOP) && pending == 0)              \
                        __port->ops->stop_tx(__port);                         \
        }                                                                     \
                                                                              \
@@ -812,7 +821,7 @@ void uart_write_wakeup(struct uart_port *port);
  */
 #define uart_port_tx_limited(port, ch, count, tx_ready, put_char, tx_done) ({ \
        unsigned int __count = (count);                                       \
-       __uart_port_tx(port, ch, tx_ready, put_char, tx_done, __count,        \
+       __uart_port_tx(port, ch, 0, tx_ready, put_char, tx_done, __count,     \
                        __count--);                                           \
 })
 
@@ -826,8 +835,21 @@ void uart_write_wakeup(struct uart_port *port);
  * See uart_port_tx_limited() for more details.
  */
 #define uart_port_tx(port, ch, tx_ready, put_char)                     \
-       __uart_port_tx(port, ch, tx_ready, put_char, ({}), true, ({}))
+       __uart_port_tx(port, ch, 0, tx_ready, put_char, ({}), true, ({}))
+
 
+/**
+ * uart_port_tx_flags -- transmit helper for uart_port with flags
+ * @port: uart port
+ * @ch: variable to store a character to be written to the HW
+ * @flags: %UART_TX_NOSTOP or similar
+ * @tx_ready: can HW accept more data function
+ * @put_char: function to write a character
+ *
+ * See uart_port_tx_limited() for more details.
+ */
+#define uart_port_tx_flags(port, ch, flags, tx_ready, put_char)                \
+       __uart_port_tx(port, ch, flags, tx_ready, put_char, ({}), true, ({}))
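
A hedged sketch of a driver TX handler built on the new flag; example_tx_ready() and example_put_char() stand in for hardware-specific helpers and are not real APIs:

/* Illustrative: keep TX running on an empty buffer (no ->stop_tx() call). */
static void example_tx_chars(struct uart_port *port)
{
	u8 ch;

	uart_port_tx_flags(port, ch, UART_TX_NOSTOP,
			   example_tx_ready(port),	/* FIFO has room?  */
			   example_put_char(port, ch));	/* write one byte */
}
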
 /*
  * Baud rate helpers.
  */
index 888a4b217829fd4d6baf52f784ce35e9ad6bd0ed..e65ec3fd27998a5b82fc2c4597c575125e653056 100644 (file)
@@ -505,12 +505,6 @@ static inline bool sk_psock_strp_enabled(struct sk_psock *psock)
        return !!psock->saved_data_ready;
 }
 
-static inline bool sk_is_udp(const struct sock *sk)
-{
-       return sk->sk_type == SOCK_DGRAM &&
-              sk->sk_protocol == IPPROTO_UDP;
-}
-
 #if IS_ENABLED(CONFIG_NET_SOCK_MSG)
 
 #define BPF_F_STRPARSER        (1UL << 1)
index 471fe2ff9066b75e82795b92972905bbae3cc48c..600fbd5daf683d4d93536a569ef5e52248d7a851 100644 (file)
@@ -21,7 +21,7 @@
 #include <uapi/linux/spi/spi.h>
 
 /* Max no. of CS supported per spi device */
-#define SPI_CS_CNT_MAX 4
+#define SPI_CS_CNT_MAX 16
 
 struct dma_chan;
 struct software_node;
index 4db00ddad26169060e1d42d5ca9b9723546ab81c..378ab1cd23bdcb4be1857d0f3bada023f8a6dae7 100644 (file)
@@ -298,7 +298,7 @@ struct swap_info_struct {
        unsigned int __percpu *cluster_next_cpu; /* percpu index for next allocation */
        struct percpu_cluster __percpu *percpu_cluster; /* per cpu's swap location */
        struct rb_root swap_extent_root;/* root of the swap extent rbtree */
-       struct bdev_handle *bdev_handle;/* open handle of the bdev */
+       struct file *bdev_file;         /* open handle of the bdev */
        struct block_device *bdev;      /* swap device or bdev of swap file */
        struct file *swap_file;         /* seldom referenced */
        unsigned int old_block_size;    /* seldom referenced */
@@ -549,6 +549,11 @@ static inline int swap_duplicate(swp_entry_t swp)
        return 0;
 }
 
+static inline int swapcache_prepare(swp_entry_t swp)
+{
+       return 0;
+}
+
 static inline void swap_free(swp_entry_t swp)
 {
 }
index cdba4d0c6d4a88dd19db34faa425addb8cd3f744..77eb9b0e768504daa57af63b2eb7c8debd00dfda 100644 (file)
@@ -128,6 +128,7 @@ struct mnt_id_req;
 #define __TYPE_IS_LL(t) (__TYPE_AS(t, 0LL) || __TYPE_AS(t, 0ULL))
 #define __SC_LONG(t, a) __typeof(__builtin_choose_expr(__TYPE_IS_LL(t), 0LL, 0L)) a
 #define __SC_CAST(t, a)        (__force t) a
+#define __SC_TYPE(t, a)        t
 #define __SC_ARGS(t, a)        a
 #define __SC_TEST(t, a) (void)BUILD_BUG_ON_ZERO(!__TYPE_IS_LL(t) && sizeof(t) > sizeof(long))
 
index 89b290d8c8dc9f115df7a295bc8d2512698db169..a1c47a6d69b0efd7e62765fbd873c848da22aaec 100644 (file)
@@ -221,8 +221,10 @@ struct tcp_sock {
        u32     lost_out;       /* Lost packets                 */
        u32     sacked_out;     /* SACK'd packets                       */
        u16     tcp_header_len; /* Bytes of tcp header to send          */
+       u8      scaling_ratio;  /* see tcp_win_from_space() */
        u8      chrono_type : 2,        /* current chronograph type */
                repair      : 1,
+               tcp_usec_ts : 1, /* TSval values in usec */
                is_sack_reneg:1,    /* in recovery from loss with SACK reneg? */
                is_cwnd_limited:1;/* forward progress limited by snd_cwnd? */
        __cacheline_group_end(tcp_sock_read_txrx);
@@ -352,7 +354,6 @@ struct tcp_sock {
        u32     compressed_ack_rcv_nxt;
        struct list_head tsq_node; /* anchor in tsq_tasklet.head list */
 
-       u8      scaling_ratio;  /* see tcp_win_from_space() */
        /* Information of the most recently (s)acked skb */
        struct tcp_rack {
                u64 mstamp; /* (Re)sent time of the skb */
@@ -368,8 +369,7 @@ struct tcp_sock {
        u8      compressed_ack;
        u8      dup_ack_counter:2,
                tlp_retrans:1,  /* TLP is a retransmission */
-               tcp_usec_ts:1, /* TSval values in usec */
-               unused:4;
+               unused:5;
        u8      thin_lto    : 1,/* Use linear timeouts for thin streams */
                recvmsg_inq : 1,/* Indicate # of bytes in queue upon recvmsg */
                fastopen_connect:1, /* FASTOPEN_CONNECT sockopt */
index 9ec229dfddaa774b9c0a4f2ae410eb273061c330..1ef95c0287f05daed3a1096ad9d5a4cf66dd0f3e 100644 (file)
@@ -9,9 +9,15 @@
 /*
  * Trace sequences are used to allow a function to call several other functions
  * to create a string of data to use.
+ *
+ * Make the trace seq 8K, which is typically PAGE_SIZE * 2 on
+ * most architectures. TRACE_SEQ_BUFFER_SIZE (TRACE_SEQ_SIZE minus
+ * the other fields of trace_seq) is the maximum size the output
+ * of a trace event may be.
  */
 
-#define TRACE_SEQ_BUFFER_SIZE  (PAGE_SIZE * 2 - \
+#define TRACE_SEQ_SIZE         8192
+#define TRACE_SEQ_BUFFER_SIZE  (TRACE_SEQ_SIZE - \
        (sizeof(struct seq_buf) + sizeof(size_t) + sizeof(int)))
 
 struct trace_seq {
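
For the record, a hedged LP64 back-of-the-envelope for the new constant (struct seq_buf being a pointer plus two size_t fields, i.e. 24 bytes on such builds):

/*
 * Illustrative arithmetic only (typical 64-bit build):
 *   TRACE_SEQ_BUFFER_SIZE = 8192 - (24 + 8 + 4) = 8156 bytes
 * of event output, now independent of PAGE_SIZE.
 */
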
index bea9c89922d908f66511dacb02edc8f4bcad918c..00cebe2b70de7ef3d74c814d4c59b4539c3d55da 100644 (file)
@@ -40,7 +40,6 @@ struct iov_iter_state {
 
 struct iov_iter {
        u8 iter_type;
-       bool copy_mc;
        bool nofault;
        bool data_source;
        size_t iov_offset;
@@ -248,22 +247,8 @@ size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i);
 
 #ifdef CONFIG_ARCH_HAS_COPY_MC
 size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i);
-static inline void iov_iter_set_copy_mc(struct iov_iter *i)
-{
-       i->copy_mc = true;
-}
-
-static inline bool iov_iter_is_copy_mc(const struct iov_iter *i)
-{
-       return i->copy_mc;
-}
 #else
 #define _copy_mc_to_iter _copy_to_iter
-static inline void iov_iter_set_copy_mc(struct iov_iter *i) { }
-static inline bool iov_iter_is_copy_mc(const struct iov_iter *i)
-{
-       return false;
-}
 #endif
 
 size_t iov_iter_zero(size_t bytes, struct iov_iter *);
@@ -355,7 +340,6 @@ static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction,
        WARN_ON(direction & ~(READ | WRITE));
        *i = (struct iov_iter) {
                .iter_type = ITER_UBUF,
-               .copy_mc = false,
                .data_source = direction,
                .ubuf = buf,
                .count = count,
index a771ccc038ac949f2b4a835e28d720735e17ee22..6532beb587b1978e09bc5b17dc088daf91f9f88c 100644 (file)
@@ -236,7 +236,6 @@ struct usb_ep {
        unsigned                max_streams:16;
        unsigned                mult:2;
        unsigned                maxburst:5;
-       unsigned                fifo_mode:1;
        u8                      address;
        const struct usb_endpoint_descriptor    *desc;
        const struct usb_ss_ep_comp_descriptor  *comp_desc;
index 49c4640027d8a6b93e903a6238d21e8541e31da4..afd40dce40f3d593f6fa0a11828aee9fd1582de3 100644 (file)
@@ -46,12 +46,6 @@ struct scm_stat {
 
 #define UNIXCB(skb)    (*(struct unix_skb_parms *)&((skb)->cb))
 
-#define unix_state_lock(s)     spin_lock(&unix_sk(s)->lock)
-#define unix_state_unlock(s)   spin_unlock(&unix_sk(s)->lock)
-#define unix_state_lock_nested(s) \
-                               spin_lock_nested(&unix_sk(s)->lock, \
-                               SINGLE_DEPTH_NESTING)
-
 /* The AF_UNIX socket */
 struct unix_sock {
        /* WARNING: sk has to be the first member */
@@ -77,6 +71,20 @@ struct unix_sock {
 #define unix_sk(ptr) container_of_const(ptr, struct unix_sock, sk)
 #define unix_peer(sk) (unix_sk(sk)->peer)
 
+#define unix_state_lock(s)     spin_lock(&unix_sk(s)->lock)
+#define unix_state_unlock(s)   spin_unlock(&unix_sk(s)->lock)
+enum unix_socket_lock_class {
+       U_LOCK_NORMAL,
+       U_LOCK_SECOND,  /* for double locking, see unix_state_double_lock(). */
+       U_LOCK_DIAG, /* used while dumping icons, see sk_diag_dump_icons(). */
+};
+
+static inline void unix_state_lock_nested(struct sock *sk,
+                                  enum unix_socket_lock_class subclass)
+{
+       spin_lock_nested(&unix_sk(sk)->lock, subclass);
+}
+
 #define peer_wait peer_wq.wait
 
 long unix_inq_len(struct sock *sk);
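
The explicit lock classes replace the bare SINGLE_DEPTH_NESTING annotation removed above. A hedged sketch of the double-locking idiom under the new helper, using the usual pointer-order convention to avoid ABBA deadlocks:

/* Illustrative: lock two unix sockets in a stable order for lockdep. */
static void example_double_lock(struct sock *sk1, struct sock *sk2)
{
	if (sk1 > sk2)
		swap(sk1, sk2);

	unix_state_lock(sk1);
	unix_state_lock_nested(sk2, U_LOCK_SECOND);
}
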
index 4dabeb6c76d31da1e3725a091a0a2636fcc9667c..9b09acac538eed8dbaa2576bf2af926ecd98eb44 100644 (file)
@@ -48,6 +48,10 @@ void napi_busy_loop(unsigned int napi_id,
                    bool (*loop_end)(void *, unsigned long),
                    void *loop_end_arg, bool prefer_busy_poll, u16 budget);
 
+void napi_busy_loop_rcu(unsigned int napi_id,
+                       bool (*loop_end)(void *, unsigned long),
+                       void *loop_end_arg, bool prefer_busy_poll, u16 budget);
+
 #else /* CONFIG_NET_RX_BUSY_POLL */
 static inline unsigned long net_busy_loop_on(void)
 {
index cf79656ce09ca1f05b733bfce2393f4a044d9454..2b54fdd8ca15a8fae0f810fc5ba550a1b1a44676 100644 (file)
@@ -2910,6 +2910,8 @@ struct cfg80211_bss_ies {
  *     own the beacon_ies, but they're just pointers to the ones from the
  *     @hidden_beacon_bss struct)
  * @proberesp_ies: the information elements from the last Probe Response frame
+ * @proberesp_ecsa_stuck: ECSA element is stuck in the Probe Response frame,
+ *     cannot rely on it having valid data
  * @hidden_beacon_bss: in case this BSS struct represents a probe response from
  *     a BSS that hides the SSID in its beacon, this points to the BSS struct
  *     that holds the beacon data. @beacon_ies is still valid, of course, and
@@ -2950,6 +2952,8 @@ struct cfg80211_bss {
        u8 chains;
        s8 chain_signal[IEEE80211_MAX_CHAINS];
 
+       u8 proberesp_ecsa_stuck:1;
+
        u8 bssid_index;
        u8 max_bssid_indicator;
 
index d0a2f827d5f20f3fed3c177d9b64d9dac373a26f..9ab4bf704e864358215d2370d33d3d9668681923 100644 (file)
@@ -357,4 +357,12 @@ static inline bool inet_csk_has_ulp(const struct sock *sk)
        return inet_test_bit(IS_ICSK, sk) && !!inet_csk(sk)->icsk_ulp_ops;
 }
 
+static inline void inet_init_csk_locks(struct sock *sk)
+{
+       struct inet_connection_sock *icsk = inet_csk(sk);
+
+       spin_lock_init(&icsk->icsk_accept_queue.rskq_lock);
+       spin_lock_init(&icsk->icsk_accept_queue.fastopenq.lock);
+}
+
 #endif /* _INET_CONNECTION_SOCK_H */
index aa86453f6b9ba367f772570a7b783bb098be6236..d94c242eb3ed20b2c5b2e5ceea3953cf96341fb7 100644 (file)
@@ -307,11 +307,6 @@ static inline unsigned long inet_cmsg_flags(const struct inet_sock *inet)
 #define inet_assign_bit(nr, sk, val)           \
        assign_bit(INET_FLAGS_##nr, &inet_sk(sk)->inet_flags, val)
 
-static inline bool sk_is_inet(struct sock *sk)
-{
-       return sk->sk_family == AF_INET || sk->sk_family == AF_INET6;
-}
-
 /**
  * sk_to_full_sk - Access to a full socket
  * @sk: pointer to a socket
index de0c69c57e3cb7485e3d8473bc0b109e4280d2f6..25cb688bdc62360292e25b0d676f135101a2118c 100644 (file)
@@ -767,7 +767,7 @@ int ip_options_rcv_srr(struct sk_buff *skb, struct net_device *dev);
  *     Functions provided by ip_sockglue.c
  */
 
-void ipv4_pktinfo_prepare(const struct sock *sk, struct sk_buff *skb);
+void ipv4_pktinfo_prepare(const struct sock *sk, struct sk_buff *skb, bool drop_dst);
 void ip_cmsg_recv_offset(struct msghdr *msg, struct sock *sk,
                         struct sk_buff *skb, int tlen, int offset);
 int ip_cmsg_send(struct sock *sk, struct msghdr *msg,
index 7e73f8e5e4970d4d12b89bf6a1a3988f88e2b635..1d55ba7c45be16356e4144e09cdfeb7da99a7971 100644 (file)
@@ -262,8 +262,7 @@ static inline void llc_pdu_header_init(struct sk_buff *skb, u8 type,
  */
 static inline void llc_pdu_decode_sa(struct sk_buff *skb, u8 *sa)
 {
-       if (skb->protocol == htons(ETH_P_802_2))
-               memcpy(sa, eth_hdr(skb)->h_source, ETH_ALEN);
+       memcpy(sa, eth_hdr(skb)->h_source, ETH_ALEN);
 }
 
 /**
@@ -275,8 +274,7 @@ static inline void llc_pdu_decode_sa(struct sk_buff *skb, u8 *sa)
  */
 static inline void llc_pdu_decode_da(struct sk_buff *skb, u8 *da)
 {
-       if (skb->protocol == htons(ETH_P_802_2))
-               memcpy(da, eth_hdr(skb)->h_dest, ETH_ALEN);
+       memcpy(da, eth_hdr(skb)->h_dest, ETH_ALEN);
 }
 
 /**
index da86e106c91d57b2eedfc7bb301867eb8e51c123..2bff5f47ce82f1c6f2774f49d13a647c573034d7 100644 (file)
@@ -249,6 +249,7 @@ struct mctp_route {
 struct mctp_route *mctp_route_lookup(struct net *net, unsigned int dnet,
                                     mctp_eid_t daddr);
 
+/* always takes ownership of skb */
 int mctp_local_output(struct sock *sk, struct mctp_route *rt,
                      struct sk_buff *skb, mctp_eid_t daddr, u8 req_tag);
 
index 956c752ceb3180115eec0b607d81cafe5f038ce8..a763dd327c6ea95d6b94fda1ea2efd8f1784335f 100644 (file)
@@ -276,7 +276,7 @@ nf_flow_table_offload_del_cb(struct nf_flowtable *flow_table,
 }
 
 void flow_offload_route_init(struct flow_offload *flow,
-                            const struct nf_flow_route *route);
+                            struct nf_flow_route *route);
 
 int flow_offload_add(struct nf_flowtable *flow_table, struct flow_offload *flow);
 void flow_offload_refresh(struct nf_flowtable *flow_table,
index b157c5cafd14cfe307f3d36ad533d528f142eea6..510244cc0f8f0e479f252598ba2aaf43b8918978 100644 (file)
@@ -205,6 +205,7 @@ static inline void nft_data_copy(u32 *dst, const struct nft_data *src,
  *     @nla: netlink attributes
  *     @portid: netlink portID of the original message
  *     @seq: netlink sequence number
+ *     @flags: modifiers to the new request
  *     @family: protocol family
  *     @level: depth of the chains
  *     @report: notify via unicast netlink message
@@ -282,6 +283,7 @@ struct nft_elem_priv { };
  *
  *     @key: element key
  *     @key_end: closing element key
+ *     @data: element data
  *     @priv: element private data and extensions
  */
 struct nft_set_elem {
@@ -325,10 +327,10 @@ struct nft_set_iter {
  *     @dtype: data type
  *     @dlen: data length
  *     @objtype: object type
- *     @flags: flags
  *     @size: number of set elements
  *     @policy: set policy
  *     @gc_int: garbage collector interval
+ *     @timeout: element timeout
  *     @field_len: length of each field in concatenation, bytes
  *     @field_count: number of concatenated fields in element
  *     @expr: set must support for expressions
@@ -351,9 +353,9 @@ struct nft_set_desc {
 /**
  *     enum nft_set_class - performance class
  *
- *     @NFT_LOOKUP_O_1: constant, O(1)
- *     @NFT_LOOKUP_O_LOG_N: logarithmic, O(log N)
- *     @NFT_LOOKUP_O_N: linear, O(N)
+ *     @NFT_SET_CLASS_O_1: constant, O(1)
+ *     @NFT_SET_CLASS_O_LOG_N: logarithmic, O(log N)
+ *     @NFT_SET_CLASS_O_N: linear, O(N)
  */
 enum nft_set_class {
        NFT_SET_CLASS_O_1,
@@ -422,9 +424,13 @@ struct nft_set_ext;
  *     @remove: remove element from set
  *     @walk: iterate over all set elements
  *     @get: get set elements
+ *     @commit: commit set elements
+ *     @abort: abort set elements
  *     @privsize: function to return size of set private data
+ *     @estimate: estimate the required memory size and the lookup complexity class
  *     @init: initialize private data of new set instance
  *     @destroy: destroy private data of set instance
+ *     @gc_init: initialize garbage collection
  *     @elemsize: element private size
  *
  *     Operations lookup, update and delete have simpler interfaces, are faster
@@ -540,13 +546,16 @@ struct nft_set_elem_expr {
  *     @policy: set parameterization (see enum nft_set_policies)
  *     @udlen: user data length
  *     @udata: user data
- *     @expr: stateful expression
+ *     @pending_update: list of set elements with pending updates
  *     @ops: set ops
  *     @flags: set flags
  *     @dead: set will be freed, never cleared
  *     @genmask: generation mask
  *     @klen: key length
  *     @dlen: data length
+ *     @num_exprs: number of expressions
+ *     @exprs: stateful expressions
+ *     @catchall_list: list of catch-all set elements
  *     @data: private set data
  */
 struct nft_set {
@@ -692,6 +701,7 @@ extern const struct nft_set_ext_type nft_set_ext_types[];
  *
  *     @len: length of extension area
  *     @offset: offsets of individual extension types
+ *     @ext_len: length of the expected extension (used for sanity checks)
  */
 struct nft_set_ext_tmpl {
        u16     len;
@@ -798,10 +808,16 @@ static inline struct nft_set_elem_expr *nft_set_ext_expr(const struct nft_set_ex
        return nft_set_ext(ext, NFT_SET_EXT_EXPRESSIONS);
 }
 
-static inline bool nft_set_elem_expired(const struct nft_set_ext *ext)
+static inline bool __nft_set_elem_expired(const struct nft_set_ext *ext,
+                                         u64 tstamp)
 {
        return nft_set_ext_exists(ext, NFT_SET_EXT_EXPIRATION) &&
-              time_is_before_eq_jiffies64(*nft_set_ext_expiration(ext));
+              time_after_eq64(tstamp, *nft_set_ext_expiration(ext));
+}
+
+static inline bool nft_set_elem_expired(const struct nft_set_ext *ext)
+{
+       return __nft_set_elem_expired(ext, get_jiffies_64());
 }
 
 static inline struct nft_set_ext *nft_set_elem_ext(const struct nft_set *set,
@@ -840,6 +856,7 @@ struct nft_expr_ops;
  *     @select_ops: function to select nft_expr_ops
  *     @release_ops: release nft_expr_ops
  *     @ops: default ops, used when no select_ops functions is present
+ *     @inner_ops: inner ops, used for inner packet operations
  *     @list: used internally
  *     @name: Identifier
  *     @owner: module reference
@@ -881,14 +898,22 @@ struct nft_offload_ctx;
  *     struct nft_expr_ops - nf_tables expression operations
  *
  *     @eval: Expression evaluation function
+ *     @clone: Expression clone function
  *     @size: full expression size, including private data size
  *     @init: initialization function
  *     @activate: activate expression in the next generation
  *     @deactivate: deactivate expression in next generation
  *     @destroy: destruction function, called after synchronize_rcu
+ *     @destroy_clone: destruction clone function
  *     @dump: function to dump parameters
- *     @type: expression type
  *     @validate: validate expression, called during loop detection
+ *     @reduce: reduce expression
+ *     @gc: garbage collection expression
+ *     @offload: hardware offload expression
+ *     @offload_action: function reporting whether a slot should be allocated
+ *                      in the flow offload array
+ *     @offload_stats: function to synchronize hardware stats by updating the counter expression
+ *     @type: expression type
  *     @data: extra data to attach to this expression operation
  */
 struct nft_expr_ops {
@@ -1041,14 +1066,21 @@ struct nft_rule_blob {
 /**
  *     struct nft_chain - nf_tables chain
  *
+ *     @blob_gen_0: rule blob pointer to the current generation
+ *     @blob_gen_1: rule blob pointer to the future generation
  *     @rules: list of rules in the chain
  *     @list: used internally
  *     @rhlhead: used internally
  *     @table: table that this chain belongs to
  *     @handle: chain handle
  *     @use: number of jump references to this chain
- *     @flags: bitmask of enum nft_chain_flags
+ *     @flags: bitmask of enum NFTA_CHAIN_FLAGS
+ *     @bound: whether the chain is bound to a rule
+ *     @genmask: generation mask
  *     @name: name of the chain
+ *     @udlen: user data length
+ *     @udata: user data in the chain
+ *     @blob_next: rule blob pointer to the next generation in the chain
  */
 struct nft_chain {
        struct nft_rule_blob            __rcu *blob_gen_0;
@@ -1146,6 +1178,7 @@ struct nft_hook {
  *     @hook_list: list of netfilter hooks (for NFPROTO_NETDEV family)
  *     @type: chain type
  *     @policy: default policy
+ *     @flags: indicates whether the base chain is disabled
  *     @stats: per-cpu chain stats
  *     @chain: the chain
  *     @flow_block: flow block (for hardware offload)
@@ -1274,11 +1307,13 @@ struct nft_object_hash_key {
  *     struct nft_object - nf_tables stateful object
  *
  *     @list: table stateful object list node
- *     @key:  keys that identify this object
  *     @rhlhead: nft_objname_ht node
+ *     @key: keys that identify this object
  *     @genmask: generation mask
  *     @use: number of references to this stateful object
  *     @handle: unique object handle
+ *     @udlen: length of user data
+ *     @udata: user data
  *     @ops: object operations
  *     @data: object data, layout depends on type
  */
@@ -1322,6 +1357,7 @@ void nft_obj_notify(struct net *net, const struct nft_table *table,
  *     @type: stateful object numeric type
  *     @owner: module owner
  *     @maxattr: maximum netlink attribute
+ *     @family: address family for AF-specific object types
  *     @policy: netlink attribute policy
  */
 struct nft_object_type {
@@ -1331,6 +1367,7 @@ struct nft_object_type {
        struct list_head                list;
        u32                             type;
        unsigned int                    maxattr;
+       u8                              family;
        struct module                   *owner;
        const struct nla_policy         *policy;
 };
@@ -1344,6 +1381,7 @@ struct nft_object_type {
  *     @destroy: release existing stateful object
  *     @dump: netlink dump stateful object
  *     @update: update stateful object
+ *     @type: pointer to object type
  */
 struct nft_object_ops {
        void                            (*eval)(struct nft_object *obj,
@@ -1379,9 +1417,8 @@ void nft_unregister_obj(struct nft_object_type *obj_type);
  *     @genmask: generation mask
  *     @use: number of references to this flow table
  *     @handle: unique object handle
- *     @dev_name: array of device names
+ *     @hook_list: list of hooks per net_device in the flowtable
  *     @data: rhashtable and garbage collector
- *     @ops: array of hooks
  */
 struct nft_flowtable {
        struct list_head                list;
@@ -1748,6 +1785,7 @@ struct nftables_pernet {
        struct list_head        notify_list;
        struct mutex            commit_mutex;
        u64                     table_handle;
+       u64                     tstamp;
        unsigned int            base_seq;
        unsigned int            gc_seq;
        u8                      validate_state;
@@ -1760,6 +1798,11 @@ static inline struct nftables_pernet *nft_pernet(const struct net *net)
        return net_generic(net, nf_tables_net_id);
 }
 
+static inline u64 nft_net_tstamp(const struct net *net)
+{
+       return nft_pernet(net)->tstamp;
+}
+
 #define __NFT_REDUCE_READONLY  1UL
 #define NFT_REDUCE_READONLY    (void *)__NFT_REDUCE_READONLY
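
Paired with __nft_set_elem_expired() earlier in this file, the per-net timestamp gives every expiry decision within one transaction the same notion of "now". A hedged sketch of the intended call pattern; the example_ name is illustrative:

/* Illustrative: expiry checks that are stable across one transaction. */
static bool example_elem_expired(const struct net *net,
				 const struct nft_set_ext *ext)
{
	/* The tstamp is captured once per transaction, not per check. */
	return __nft_set_elem_expired(ext, nft_net_tstamp(net));
}
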
 
index ba3e1b315de838f9696ad7948ae474552c288e73..cefe0c4bdae34c91868c22731a3b666f8e16e996 100644 (file)
@@ -238,12 +238,7 @@ static inline bool qdisc_may_bulk(const struct Qdisc *qdisc)
 
 static inline int qdisc_avail_bulklimit(const struct netdev_queue *txq)
 {
-#ifdef CONFIG_BQL
-       /* Non-BQL migrated drivers will return 0, too. */
-       return dql_avail(&txq->dql);
-#else
-       return 0;
-#endif
+       return netdev_queue_dql_avail(txq);
 }
 
 struct Qdisc_class_ops {
@@ -375,6 +370,10 @@ struct tcf_proto_ops {
                                                struct nlattr **tca,
                                                struct netlink_ext_ack *extack);
        void                    (*tmplt_destroy)(void *tmplt_priv);
+       void                    (*tmplt_reoffload)(struct tcf_chain *chain,
+                                                  bool add,
+                                                  flow_setup_cb_t *cb,
+                                                  void *cb_priv);
        struct tcf_exts *       (*get_exts)(const struct tcf_proto *tp,
                                            u32 handle);
 
index a7f815c7cfdfdf1296be2967fd100efdb10cdd63..54ca8dcbfb4335d657b5cea323aa7d8c4316d49e 100644 (file)
@@ -2765,9 +2765,25 @@ static inline void skb_setup_tx_timestamp(struct sk_buff *skb, __u16 tsflags)
                           &skb_shinfo(skb)->tskey);
 }
 
+static inline bool sk_is_inet(const struct sock *sk)
+{
+       int family = READ_ONCE(sk->sk_family);
+
+       return family == AF_INET || family == AF_INET6;
+}
+
 static inline bool sk_is_tcp(const struct sock *sk)
 {
-       return sk->sk_type == SOCK_STREAM && sk->sk_protocol == IPPROTO_TCP;
+       return sk_is_inet(sk) &&
+              sk->sk_type == SOCK_STREAM &&
+              sk->sk_protocol == IPPROTO_TCP;
+}
+
+static inline bool sk_is_udp(const struct sock *sk)
+{
+       return sk_is_inet(sk) &&
+              sk->sk_type == SOCK_DGRAM &&
+              sk->sk_protocol == IPPROTO_UDP;
 }
 
 static inline bool sk_is_stream_unix(const struct sock *sk)
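
The added READ_ONCE(sk->sk_family) guard means the type/protocol tests no longer apply to sockets outside AF_INET/AF_INET6. A minimal sketch of a caller relying on the hardened helpers; example_classify() is illustrative:

/* Illustrative: branch only for genuine inet TCP/UDP sockets. */
static void example_classify(const struct sock *sk)
{
	if (sk_is_tcp(sk))
		pr_debug("inet SOCK_STREAM / IPPROTO_TCP\n");
	else if (sk_is_udp(sk))
		pr_debug("inet SOCK_DGRAM / IPPROTO_UDP\n");
}
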
index a43062d4c734bb4e8e855fdd150b8c169a7fe172..8346b0d29542c3d5569b94b35eaa12461f78d62a 100644 (file)
@@ -308,6 +308,9 @@ void switchdev_deferred_process(void);
 int switchdev_port_attr_set(struct net_device *dev,
                            const struct switchdev_attr *attr,
                            struct netlink_ext_ack *extack);
+bool switchdev_port_obj_act_is_deferred(struct net_device *dev,
+                                       enum switchdev_notifier_type nt,
+                                       const struct switchdev_obj *obj);
 int switchdev_port_obj_add(struct net_device *dev,
                           const struct switchdev_obj *obj,
                           struct netlink_ext_ack *extack);
index dd78a11810310e84ef1c1ed8c3e0e274ddd77d7f..f6eba9652d010fbc8482bfd3c99377d631686324 100644 (file)
@@ -2506,7 +2506,7 @@ struct tcp_ulp_ops {
        /* cleanup ulp */
        void (*release)(struct sock *sk);
        /* diagnostic */
-       int (*get_info)(const struct sock *sk, struct sk_buff *skb);
+       int (*get_info)(struct sock *sk, struct sk_buff *skb);
        size_t (*get_info_size)(const struct sock *sk);
        /* clone ulp */
        void (*clone)(const struct request_sock *req, struct sock *newsk,
index 962f0c501111bac34781c4419913540c162ea058..340ad43971e4711d8091a6397bb5cf3c3c4ef0fd 100644 (file)
@@ -97,9 +97,6 @@ struct tls_sw_context_tx {
        struct tls_rec *open_rec;
        struct list_head tx_list;
        atomic_t encrypt_pending;
-       /* protect crypto_wait with encrypt_pending */
-       spinlock_t encrypt_compl_lock;
-       int async_notify;
        u8 async_capable:1;
 
 #define BIT_TX_SCHEDULED       0
@@ -136,8 +133,6 @@ struct tls_sw_context_rx {
        struct tls_strparser strp;
 
        atomic_t decrypt_pending;
-       /* protect crypto_wait with decrypt_pending*/
-       spinlock_t decrypt_compl_lock;
        struct sk_buff_head async_hold;
        struct wait_queue_head wq;
 };
index 526c1e7f505e4d9633bfb6da058ea25b9f2b9cfa..c9aec9ab6191205c7c6f8d3f0f5c136cae520750 100644 (file)
@@ -159,11 +159,29 @@ static inline struct xdp_buff *xsk_buff_get_frag(struct xdp_buff *first)
        return ret;
 }
 
+static inline void xsk_buff_del_tail(struct xdp_buff *tail)
+{
+       struct xdp_buff_xsk *xskb = container_of(tail, struct xdp_buff_xsk, xdp);
+
+       list_del(&xskb->xskb_list_node);
+}
+
+static inline struct xdp_buff *xsk_buff_get_tail(struct xdp_buff *first)
+{
+       struct xdp_buff_xsk *xskb = container_of(first, struct xdp_buff_xsk, xdp);
+       struct xdp_buff_xsk *frag;
+
+       frag = list_last_entry(&xskb->pool->xskb_list, struct xdp_buff_xsk,
+                              xskb_list_node);
+       return &frag->xdp;
+}
+
 static inline void xsk_buff_set_size(struct xdp_buff *xdp, u32 size)
 {
        xdp->data = xdp->data_hard_start + XDP_PACKET_HEADROOM;
        xdp->data_meta = xdp->data;
        xdp->data_end = xdp->data + size;
+       xdp->flags = 0;
 }
 
 static inline dma_addr_t xsk_buff_raw_get_dma(struct xsk_buff_pool *pool,
@@ -350,6 +368,15 @@ static inline struct xdp_buff *xsk_buff_get_frag(struct xdp_buff *first)
        return NULL;
 }
 
+static inline void xsk_buff_del_tail(struct xdp_buff *tail)
+{
+}
+
+static inline struct xdp_buff *xsk_buff_get_tail(struct xdp_buff *first)
+{
+       return NULL;
+}
+
 static inline void xsk_buff_set_size(struct xdp_buff *xdp, u32 size)
 {
 }
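
When XDP sockets are enabled, the new tail helpers let a driver shrink a multi-buffer frame. A hedged sketch, assuming xsk_buff_free() as the existing pool-return API; the example_ name is illustrative:

/* Illustrative: detach and recycle the last fragment of an XSK frame. */
static void example_pop_tail(struct xdp_buff *first)
{
	struct xdp_buff *tail = xsk_buff_get_tail(first);

	xsk_buff_del_tail(tail);
	xsk_buff_free(tail);
}
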
index 5ec1e71a09de7698616dff799a935da15083deef..c38f4fe5e64cf4f14b668328ab0cfac76ea5d496 100644 (file)
@@ -100,10 +100,6 @@ struct scsi_vpd {
        unsigned char   data[];
 };
 
-enum scsi_vpd_parameters {
-       SCSI_VPD_HEADER_SIZE = 4,
-};
-
 struct scsi_device {
        struct Scsi_Host *host;
        struct request_queue *request_queue;
@@ -208,6 +204,7 @@ struct scsi_device {
        unsigned use_10_for_rw:1; /* first try 10-byte read / write */
        unsigned use_10_for_ms:1; /* first try 10-byte mode sense/select */
        unsigned set_dbd_for_ms:1; /* Set "DBD" field in mode sense */
+       unsigned read_before_ms:1;      /* perform a READ before MODE SENSE */
        unsigned no_report_opcodes:1;   /* no REPORT SUPPORTED OPERATION CODES */
        unsigned no_write_same:1;       /* no WRITE SAME command */
        unsigned use_16_for_rw:1; /* Use read/write(16) over read/write(10) */
index 8c18e8b6d27d21b34962cc392dfeaffe48070847..b24716ab27504bdfa17f221a39053dd21dd961d9 100644 (file)
@@ -75,6 +75,7 @@
 #define CS35L56_DSP1_AHBM_WINDOW_DEBUG_0               0x25E2040
 #define CS35L56_DSP1_AHBM_WINDOW_DEBUG_1               0x25E2044
 #define CS35L56_DSP1_XMEM_UNPACKED24_0                 0x2800000
+#define CS35L56_DSP1_FW_VER                            0x2800010
 #define CS35L56_DSP1_HALO_STATE_A1                     0x2801E58
 #define CS35L56_DSP1_HALO_STATE                                0x28021E0
 #define CS35L56_DSP1_PM_CUR_STATE_A1                   0x2804000
 
 #define CS35L56_CONTROL_PORT_READY_US                  2200
 #define CS35L56_HALO_STATE_POLL_US                     1000
-#define CS35L56_HALO_STATE_TIMEOUT_US                  50000
+#define CS35L56_HALO_STATE_TIMEOUT_US                  250000
 #define CS35L56_RESET_PULSE_MIN_US                     1100
 #define CS35L56_WAKE_HOLD_TIME_US                      1000
 
@@ -272,6 +273,7 @@ extern const char * const cs35l56_tx_input_texts[CS35L56_NUM_INPUT_SRC];
 extern const unsigned int cs35l56_tx_input_values[CS35L56_NUM_INPUT_SRC];
 
 int cs35l56_set_patch(struct cs35l56_base *cs35l56_base);
+int cs35l56_force_sync_asp1_registers_from_cache(struct cs35l56_base *cs35l56_base);
 int cs35l56_mbox_send(struct cs35l56_base *cs35l56_base, unsigned int command);
 int cs35l56_firmware_shutdown(struct cs35l56_base *cs35l56_base);
 int cs35l56_wait_for_firmware_boot(struct cs35l56_base *cs35l56_base);
@@ -284,7 +286,10 @@ int cs35l56_is_fw_reload_needed(struct cs35l56_base *cs35l56_base);
 int cs35l56_runtime_suspend_common(struct cs35l56_base *cs35l56_base);
 int cs35l56_runtime_resume_common(struct cs35l56_base *cs35l56_base, bool is_soundwire);
 void cs35l56_init_cs_dsp(struct cs35l56_base *cs35l56_base, struct cs_dsp *cs_dsp);
+int cs35l56_read_prot_status(struct cs35l56_base *cs35l56_base,
+                            bool *fw_missing, unsigned int *fw_version);
 int cs35l56_hw_init(struct cs35l56_base *cs35l56_base);
+int cs35l56_get_speaker_id(struct cs35l56_base *cs35l56_base);
 int cs35l56_get_bclk_freq_id(unsigned int freq);
 void cs35l56_fill_supply_names(struct regulator_bulk_data *data);
 
index ecc02e955279fdfa3f10d116eeb5a2d7271cc91c..1f4c39922d825035be15ba38e7ac6d112b36c457 100644 (file)
@@ -30,6 +30,8 @@ static inline void snd_soc_card_mutex_unlock(struct snd_soc_card *card)
 
 struct snd_kcontrol *snd_soc_card_get_kcontrol(struct snd_soc_card *soc_card,
                                               const char *name);
+struct snd_kcontrol *snd_soc_card_get_kcontrol_locked(struct snd_soc_card *soc_card,
+                                                     const char *name);
 int snd_soc_card_jack_new(struct snd_soc_card *card, const char *id, int type,
                          struct snd_soc_jack *jack);
 int snd_soc_card_jack_new_pins(struct snd_soc_card *card, const char *id,
index b00d65417c310a42a39aec6ce927b85083ace264..9aff384941de27f925d0491312a7088ebeb6a297 100644 (file)
@@ -142,6 +142,7 @@ struct tasdevice_priv {
 
 void tas2781_reset(struct tasdevice_priv *tas_dev);
 int tascodec_init(struct tasdevice_priv *tas_priv, void *codec,
+       struct module *module,
        void (*cont)(const struct firmware *fw, void *context));
 struct tasdevice_priv *tasdevice_kzalloc(struct i2c_client *i2c);
 int tasdevice_init(struct tasdevice_priv *tas_priv);
index 8d73171cb9f0d78672355d4e822176d95aa13cb2..450c44c83a5d21bad22485efbb5734c47edb0125 100644 (file)
@@ -1071,6 +1071,31 @@ TRACE_EVENT(afs_file_error,
                      __print_symbolic(__entry->where, afs_file_errors))
            );
 
+TRACE_EVENT(afs_bulkstat_error,
+           TP_PROTO(struct afs_operation *op, struct afs_fid *fid, unsigned int index, s32 abort),
+
+           TP_ARGS(op, fid, index, abort),
+
+           TP_STRUCT__entry(
+                   __field_struct(struct afs_fid,      fid)
+                   __field(unsigned int,               op)
+                   __field(unsigned int,               index)
+                   __field(s32,                        abort)
+                            ),
+
+           TP_fast_assign(
+                   __entry->op = op->debug_id;
+                   __entry->fid = *fid;
+                   __entry->index = index;
+                   __entry->abort = abort;
+                          ),
+
+           TP_printk("OP=%08x[%02x] %llx:%llx:%x a=%d",
+                     __entry->op, __entry->index,
+                     __entry->fid.vid, __entry->fid.vnode, __entry->fid.unique,
+                     __entry->abort)
+           );
+
 TRACE_EVENT(afs_cm_no_server,
            TP_PROTO(struct afs_call *call, struct sockaddr_rxrpc *srx),
 
@@ -1164,8 +1189,8 @@ TRACE_EVENT(afs_flock_op,
                    __entry->from = fl->fl_start;
                    __entry->len = fl->fl_end - fl->fl_start + 1;
                    __entry->op = op;
-                   __entry->type = fl->fl_type;
-                   __entry->flags = fl->fl_flags;
+                   __entry->type = fl->c.flc_type;
+                   __entry->flags = fl->c.flc_flags;
                    __entry->debug_id = fl->fl_u.afs.debug_id;
                           ),
 
index 65029dfb92fbc3162c30d0a85a6805afa3ab335e..a697f4b77162dd79c45c5cdb25db63332818fcc7 100644 (file)
@@ -772,15 +772,14 @@ TRACE_EVENT(ext4_mb_release_group_pa,
 );
 
 TRACE_EVENT(ext4_discard_preallocations,
-       TP_PROTO(struct inode *inode, unsigned int len, unsigned int needed),
+       TP_PROTO(struct inode *inode, unsigned int len),
 
-       TP_ARGS(inode, len, needed),
+       TP_ARGS(inode, len),
 
        TP_STRUCT__entry(
                __field(        dev_t,          dev             )
                __field(        ino_t,          ino             )
                __field(        unsigned int,   len             )
-               __field(        unsigned int,   needed          )
 
        ),
 
@@ -788,13 +787,11 @@ TRACE_EVENT(ext4_discard_preallocations,
                __entry->dev    = inode->i_sb->s_dev;
                __entry->ino    = inode->i_ino;
                __entry->len    = len;
-               __entry->needed = needed;
        ),
 
-       TP_printk("dev %d,%d ino %lu len: %u needed %u",
+       TP_printk("dev %d,%d ino %lu len: %u",
                  MAJOR(__entry->dev), MINOR(__entry->dev),
-                 (unsigned long) __entry->ino, __entry->len,
-                 __entry->needed)
+                 (unsigned long) __entry->ino, __entry->len)
 );
 
 TRACE_EVENT(ext4_mb_discard_preallocations,
index 1646dadd7f37cf7c97f6cd140e9571f43e9aa141..b8d1e00a7982c9ef966f414ee279ed91663bf550 100644 (file)
@@ -68,11 +68,11 @@ DECLARE_EVENT_CLASS(filelock_lock,
                __field(struct file_lock *, fl)
                __field(unsigned long, i_ino)
                __field(dev_t, s_dev)
-               __field(struct file_lock *, fl_blocker)
-               __field(fl_owner_t, fl_owner)
-               __field(unsigned int, fl_pid)
-               __field(unsigned int, fl_flags)
-               __field(unsigned char, fl_type)
+               __field(struct file_lock_core *, blocker)
+               __field(fl_owner_t, owner)
+               __field(unsigned int, pid)
+               __field(unsigned int, flags)
+               __field(unsigned char, type)
                __field(loff_t, fl_start)
                __field(loff_t, fl_end)
                __field(int, ret)
@@ -82,11 +82,11 @@ DECLARE_EVENT_CLASS(filelock_lock,
                __entry->fl = fl ? fl : NULL;
                __entry->s_dev = inode->i_sb->s_dev;
                __entry->i_ino = inode->i_ino;
-               __entry->fl_blocker = fl ? fl->fl_blocker : NULL;
-               __entry->fl_owner = fl ? fl->fl_owner : NULL;
-               __entry->fl_pid = fl ? fl->fl_pid : 0;
-               __entry->fl_flags = fl ? fl->fl_flags : 0;
-               __entry->fl_type = fl ? fl->fl_type : 0;
+               __entry->blocker = fl ? fl->c.flc_blocker : NULL;
+               __entry->owner = fl ? fl->c.flc_owner : NULL;
+               __entry->pid = fl ? fl->c.flc_pid : 0;
+               __entry->flags = fl ? fl->c.flc_flags : 0;
+               __entry->type = fl ? fl->c.flc_type : 0;
                __entry->fl_start = fl ? fl->fl_start : 0;
                __entry->fl_end = fl ? fl->fl_end : 0;
                __entry->ret = ret;
@@ -94,9 +94,9 @@ DECLARE_EVENT_CLASS(filelock_lock,
 
        TP_printk("fl=%p dev=0x%x:0x%x ino=0x%lx fl_blocker=%p fl_owner=%p fl_pid=%u fl_flags=%s fl_type=%s fl_start=%lld fl_end=%lld ret=%d",
                __entry->fl, MAJOR(__entry->s_dev), MINOR(__entry->s_dev),
-               __entry->i_ino, __entry->fl_blocker, __entry->fl_owner,
-               __entry->fl_pid, show_fl_flags(__entry->fl_flags),
-               show_fl_type(__entry->fl_type),
+               __entry->i_ino, __entry->blocker, __entry->owner,
+               __entry->pid, show_fl_flags(__entry->flags),
+               show_fl_type(__entry->type),
                __entry->fl_start, __entry->fl_end, __entry->ret)
 );
 
@@ -117,59 +117,59 @@ DEFINE_EVENT(filelock_lock, flock_lock_inode,
                TP_ARGS(inode, fl, ret));
 
 DECLARE_EVENT_CLASS(filelock_lease,
-       TP_PROTO(struct inode *inode, struct file_lock *fl),
+       TP_PROTO(struct inode *inode, struct file_lease *fl),
 
        TP_ARGS(inode, fl),
 
        TP_STRUCT__entry(
-               __field(struct file_lock *, fl)
+               __field(struct file_lease *, fl)
                __field(unsigned long, i_ino)
                __field(dev_t, s_dev)
-               __field(struct file_lock *, fl_blocker)
-               __field(fl_owner_t, fl_owner)
-               __field(unsigned int, fl_flags)
-               __field(unsigned char, fl_type)
-               __field(unsigned long, fl_break_time)
-               __field(unsigned long, fl_downgrade_time)
+               __field(struct file_lock_core *, blocker)
+               __field(fl_owner_t, owner)
+               __field(unsigned int, flags)
+               __field(unsigned char, type)
+               __field(unsigned long, break_time)
+               __field(unsigned long, downgrade_time)
        ),
 
        TP_fast_assign(
                __entry->fl = fl ? fl : NULL;
                __entry->s_dev = inode->i_sb->s_dev;
                __entry->i_ino = inode->i_ino;
-               __entry->fl_blocker = fl ? fl->fl_blocker : NULL;
-               __entry->fl_owner = fl ? fl->fl_owner : NULL;
-               __entry->fl_flags = fl ? fl->fl_flags : 0;
-               __entry->fl_type = fl ? fl->fl_type : 0;
-               __entry->fl_break_time = fl ? fl->fl_break_time : 0;
-               __entry->fl_downgrade_time = fl ? fl->fl_downgrade_time : 0;
+               __entry->blocker = fl ? fl->c.flc_blocker : NULL;
+               __entry->owner = fl ? fl->c.flc_owner : NULL;
+               __entry->flags = fl ? fl->c.flc_flags : 0;
+               __entry->type = fl ? fl->c.flc_type : 0;
+               __entry->break_time = fl ? fl->fl_break_time : 0;
+               __entry->downgrade_time = fl ? fl->fl_downgrade_time : 0;
        ),
 
        TP_printk("fl=%p dev=0x%x:0x%x ino=0x%lx fl_blocker=%p fl_owner=%p fl_flags=%s fl_type=%s fl_break_time=%lu fl_downgrade_time=%lu",
                __entry->fl, MAJOR(__entry->s_dev), MINOR(__entry->s_dev),
-               __entry->i_ino, __entry->fl_blocker, __entry->fl_owner,
-               show_fl_flags(__entry->fl_flags),
-               show_fl_type(__entry->fl_type),
-               __entry->fl_break_time, __entry->fl_downgrade_time)
+               __entry->i_ino, __entry->blocker, __entry->owner,
+               show_fl_flags(__entry->flags),
+               show_fl_type(__entry->type),
+               __entry->break_time, __entry->downgrade_time)
 );
 
-DEFINE_EVENT(filelock_lease, break_lease_noblock, TP_PROTO(struct inode *inode, struct file_lock *fl),
+DEFINE_EVENT(filelock_lease, break_lease_noblock, TP_PROTO(struct inode *inode, struct file_lease *fl),
                TP_ARGS(inode, fl));
 
-DEFINE_EVENT(filelock_lease, break_lease_block, TP_PROTO(struct inode *inode, struct file_lock *fl),
+DEFINE_EVENT(filelock_lease, break_lease_block, TP_PROTO(struct inode *inode, struct file_lease *fl),
                TP_ARGS(inode, fl));
 
-DEFINE_EVENT(filelock_lease, break_lease_unblock, TP_PROTO(struct inode *inode, struct file_lock *fl),
+DEFINE_EVENT(filelock_lease, break_lease_unblock, TP_PROTO(struct inode *inode, struct file_lease *fl),
                TP_ARGS(inode, fl));
 
-DEFINE_EVENT(filelock_lease, generic_delete_lease, TP_PROTO(struct inode *inode, struct file_lock *fl),
+DEFINE_EVENT(filelock_lease, generic_delete_lease, TP_PROTO(struct inode *inode, struct file_lease *fl),
                TP_ARGS(inode, fl));
 
-DEFINE_EVENT(filelock_lease, time_out_leases, TP_PROTO(struct inode *inode, struct file_lock *fl),
+DEFINE_EVENT(filelock_lease, time_out_leases, TP_PROTO(struct inode *inode, struct file_lease *fl),
                TP_ARGS(inode, fl));
 
 TRACE_EVENT(generic_add_lease,
-       TP_PROTO(struct inode *inode, struct file_lock *fl),
+       TP_PROTO(struct inode *inode, struct file_lease *fl),
 
        TP_ARGS(inode, fl),
 
@@ -179,9 +179,9 @@ TRACE_EVENT(generic_add_lease,
                __field(int, rcount)
                __field(int, icount)
                __field(dev_t, s_dev)
-               __field(fl_owner_t, fl_owner)
-               __field(unsigned int, fl_flags)
-               __field(unsigned char, fl_type)
+               __field(fl_owner_t, owner)
+               __field(unsigned int, flags)
+               __field(unsigned char, type)
        ),
 
        TP_fast_assign(
@@ -190,21 +190,21 @@ TRACE_EVENT(generic_add_lease,
                __entry->wcount = atomic_read(&inode->i_writecount);
                __entry->rcount = atomic_read(&inode->i_readcount);
                __entry->icount = atomic_read(&inode->i_count);
-               __entry->fl_owner = fl->fl_owner;
-               __entry->fl_flags = fl->fl_flags;
-               __entry->fl_type = fl->fl_type;
+               __entry->owner = fl->c.flc_owner;
+               __entry->flags = fl->c.flc_flags;
+               __entry->type = fl->c.flc_type;
        ),
 
        TP_printk("dev=0x%x:0x%x ino=0x%lx wcount=%d rcount=%d icount=%d fl_owner=%p fl_flags=%s fl_type=%s",
                MAJOR(__entry->s_dev), MINOR(__entry->s_dev),
                __entry->i_ino, __entry->wcount, __entry->rcount,
-               __entry->icount, __entry->fl_owner,
-               show_fl_flags(__entry->fl_flags),
-               show_fl_type(__entry->fl_type))
+               __entry->icount, __entry->owner,
+               show_fl_flags(__entry->flags),
+               show_fl_type(__entry->type))
 );
 
 TRACE_EVENT(leases_conflict,
-       TP_PROTO(bool conflict, struct file_lock *lease, struct file_lock *breaker),
+       TP_PROTO(bool conflict, struct file_lease *lease, struct file_lease *breaker),
 
        TP_ARGS(conflict, lease, breaker),
 
@@ -220,11 +220,11 @@ TRACE_EVENT(leases_conflict,
 
        TP_fast_assign(
                __entry->lease = lease;
-               __entry->l_fl_flags = lease->fl_flags;
-               __entry->l_fl_type = lease->fl_type;
+               __entry->l_fl_flags = lease->c.flc_flags;
+               __entry->l_fl_type = lease->c.flc_type;
                __entry->breaker = breaker;
-               __entry->b_fl_flags = breaker->fl_flags;
-               __entry->b_fl_type = breaker->fl_type;
+               __entry->b_fl_flags = breaker->c.flc_flags;
+               __entry->b_fl_type = breaker->c.flc_type;
                __entry->conflict = conflict;
        ),
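
The field renames in this file follow the fs/locks.c rework that moves the state shared by locks and leases into a common struct file_lock_core, embedded as `c` in both containers. A rough, abridged sketch of the layout the new `flc_*` accessors assume (not the full kernel definitions):

    struct file_lock_core {
            struct file_lock_core *flc_blocker;     /* who we are waiting on */
            void *flc_owner;                        /* fl_owner_t in the kernel */
            unsigned int flc_pid;
            unsigned int flc_flags;
            unsigned char flc_type;
    };

    struct file_lock {                              /* byte-range locks */
            struct file_lock_core c;
            long long fl_start, fl_end;
    };

    struct file_lease {                             /* leases keep the timers */
            struct file_lock_core c;
            unsigned long fl_break_time;
            unsigned long fl_downgrade_time;
    };
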
 
index 69454f1f98b01eb3d16eb01eb2c9c39d69c7705b..e948df7ce62597507d88327723d5586f578c5c8d 100644 (file)
@@ -148,7 +148,7 @@ TRACE_EVENT(io_uring_queue_async_work,
                __field(  void *,                       req             )
                __field(  u64,                          user_data       )
                __field(  u8,                           opcode          )
-               __field(  unsigned int,                 flags           )
+               __field(  unsigned long long,           flags           )
                __field(  struct io_wq_work *,          work            )
                __field(  int,                          rw              )
 
@@ -159,7 +159,7 @@ TRACE_EVENT(io_uring_queue_async_work,
                __entry->ctx            = req->ctx;
                __entry->req            = req;
                __entry->user_data      = req->cqe.user_data;
-               __entry->flags          = req->flags;
+               __entry->flags          = (__force unsigned long long) req->flags;
                __entry->opcode         = req->opcode;
                __entry->work           = &req->work;
                __entry->rw             = rw;
@@ -167,10 +167,10 @@ TRACE_EVENT(io_uring_queue_async_work,
                __assign_str(op_str, io_uring_get_opcode(req->opcode));
        ),
 
-       TP_printk("ring %p, request %p, user_data 0x%llx, opcode %s, flags 0x%x, %s queue, work %p",
+       TP_printk("ring %p, request %p, user_data 0x%llx, opcode %s, flags 0x%llx, %s queue, work %p",
                __entry->ctx, __entry->req, __entry->user_data,
-               __get_str(op_str),
-               __entry->flags, __entry->rw ? "hashed" : "normal", __entry->work)
+               __get_str(op_str), __entry->flags,
+               __entry->rw ? "hashed" : "normal", __entry->work)
 );
 
 /**
@@ -378,7 +378,7 @@ TRACE_EVENT(io_uring_submit_req,
                __field(  void *,               req             )
                __field(  unsigned long long,   user_data       )
                __field(  u8,                   opcode          )
-               __field(  u32,                  flags           )
+               __field(  unsigned long long,   flags           )
                __field(  bool,                 sq_thread       )
 
                __string( op_str, io_uring_get_opcode(req->opcode) )
@@ -389,16 +389,16 @@ TRACE_EVENT(io_uring_submit_req,
                __entry->req            = req;
                __entry->user_data      = req->cqe.user_data;
                __entry->opcode         = req->opcode;
-               __entry->flags          = req->flags;
+               __entry->flags          = (__force unsigned long long) req->flags;
                __entry->sq_thread      = req->ctx->flags & IORING_SETUP_SQPOLL;
 
                __assign_str(op_str, io_uring_get_opcode(req->opcode));
        ),
 
-       TP_printk("ring %p, req %p, user_data 0x%llx, opcode %s, flags 0x%x, "
+       TP_printk("ring %p, req %p, user_data 0x%llx, opcode %s, flags 0x%llx, "
                  "sq_thread %d", __entry->ctx, __entry->req,
-                 __entry->user_data, __get_str(op_str),
-                 __entry->flags, __entry->sq_thread)
+                 __entry->user_data, __get_str(op_str), __entry->flags,
+                 __entry->sq_thread)
 );
 
 /*
@@ -602,29 +602,25 @@ TRACE_EVENT(io_uring_cqe_overflow,
  *
  * @tctx:              pointer to a io_uring_task
  * @count:             how many functions it ran
- * @loops:             how many loops it ran
  *
  */
 TRACE_EVENT(io_uring_task_work_run,
 
-       TP_PROTO(void *tctx, unsigned int count, unsigned int loops),
+       TP_PROTO(void *tctx, unsigned int count),
 
-       TP_ARGS(tctx, count, loops),
+       TP_ARGS(tctx, count),
 
        TP_STRUCT__entry (
                __field(  void *,               tctx            )
                __field(  unsigned int,         count           )
-               __field(  unsigned int,         loops           )
        ),
 
        TP_fast_assign(
                __entry->tctx           = tctx;
                __entry->count          = count;
-               __entry->loops          = loops;
        ),
 
-       TP_printk("tctx %p, count %u, loops %u",
-                __entry->tctx, __entry->count, __entry->loops)
+       TP_printk("tctx %p, count %u", __entry->tctx, __entry->count)
 );
 
 TRACE_EVENT(io_uring_short_write,
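
The widening above pairs with req->flags becoming a 64-bit sparse `__bitwise` type (io_req_flags_t), which is why TP_fast_assign() strips the annotation with `__force` before storing a plain integer. A stand-alone sketch of that pattern, assuming a userspace build where the annotations only mean something under sparse (__CHECKER__); the flag name and bit are illustrative:

    #include <stdio.h>

    #ifdef __CHECKER__                      /* sparse */
    #define __bitwise __attribute__((bitwise))
    #define __force   __attribute__((force))
    #else
    #define __bitwise
    #define __force
    #endif

    typedef unsigned long long __bitwise io_req_flags_t;
    #define REQ_F_EXAMPLE ((__force io_req_flags_t)(1ULL << 40))

    int main(void)
    {
            io_req_flags_t flags = REQ_F_EXAMPLE;

            /* printing (or tracing) must cast the bitwise type away */
            printf("flags 0x%llx\n", (__force unsigned long long)flags);
            return 0;
    }
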
index a3995925cb057021dc779344d19f7e3724f6df3c..1f4258308b967a9ca8e17bbf61ba4ef07b6d786b 100644 (file)
@@ -81,14 +81,14 @@ TRACE_EVENT(qdisc_reset,
        TP_ARGS(q),
 
        TP_STRUCT__entry(
-               __string(       dev,            qdisc_dev(q)    )
-               __string(       kind,           q->ops->id      )
-               __field(        u32,            parent          )
-               __field(        u32,            handle          )
+               __string(       dev,            qdisc_dev(q)->name      )
+               __string(       kind,           q->ops->id              )
+               __field(        u32,            parent                  )
+               __field(        u32,            handle                  )
        ),
 
        TP_fast_assign(
-               __assign_str(dev, qdisc_dev(q));
+               __assign_str(dev, qdisc_dev(q)->name);
                __assign_str(kind, q->ops->id);
                __entry->parent = q->parent;
                __entry->handle = q->handle;
@@ -106,14 +106,14 @@ TRACE_EVENT(qdisc_destroy,
        TP_ARGS(q),
 
        TP_STRUCT__entry(
-               __string(       dev,            qdisc_dev(q)    )
-               __string(       kind,           q->ops->id      )
-               __field(        u32,            parent          )
-               __field(        u32,            handle          )
+               __string(       dev,            qdisc_dev(q)->name      )
+               __string(       kind,           q->ops->id              )
+               __field(        u32,            parent                  )
+               __field(        u32,            handle                  )
        ),
 
        TP_fast_assign(
-               __assign_str(dev, qdisc_dev(q));
+               __assign_str(dev, qdisc_dev(q)->name);
                __assign_str(kind, q->ops->id);
                __entry->parent = q->parent;
                __entry->handle = q->handle;
index 4c1ef7b3705c26baf79c135f235c94195635bf4b..87b8de9b6c1c440ce4a8b2fe6072b4d81cbc1cf4 100644 (file)
        EM(rxrpc_skb_eaten_by_unshare_nomem,    "ETN unshar-nm") \
        EM(rxrpc_skb_get_conn_secured,          "GET conn-secd") \
        EM(rxrpc_skb_get_conn_work,             "GET conn-work") \
+       EM(rxrpc_skb_get_last_nack,             "GET last-nack") \
        EM(rxrpc_skb_get_local_work,            "GET locl-work") \
        EM(rxrpc_skb_get_reject_work,           "GET rej-work ") \
        EM(rxrpc_skb_get_to_recvmsg,            "GET to-recv  ") \
        EM(rxrpc_skb_put_error_report,          "PUT error-rep") \
        EM(rxrpc_skb_put_input,                 "PUT input    ") \
        EM(rxrpc_skb_put_jumbo_subpacket,       "PUT jumbo-sub") \
+       EM(rxrpc_skb_put_last_nack,             "PUT last-nack") \
        EM(rxrpc_skb_put_purge,                 "PUT purge    ") \
        EM(rxrpc_skb_put_rotate,                "PUT rotate   ") \
        EM(rxrpc_skb_put_unknown,               "PUT unknown  ") \
@@ -1552,7 +1554,7 @@ TRACE_EVENT(rxrpc_congest,
                    memcpy(&__entry->sum, summary, sizeof(__entry->sum));
                           ),
 
-           TP_printk("c=%08x r=%08x %s q=%08x %s cw=%u ss=%u nA=%u,%u+%u r=%u b=%u u=%u d=%u l=%x%s%s%s",
+           TP_printk("c=%08x r=%08x %s q=%08x %s cw=%u ss=%u nA=%u,%u+%u,%u b=%u u=%u d=%u l=%x%s%s%s",
                      __entry->call,
                      __entry->ack_serial,
                      __print_symbolic(__entry->sum.ack_reason, rxrpc_ack_names),
@@ -1560,9 +1562,9 @@ TRACE_EVENT(rxrpc_congest,
                      __print_symbolic(__entry->sum.mode, rxrpc_congest_modes),
                      __entry->sum.cwnd,
                      __entry->sum.ssthresh,
-                     __entry->sum.nr_acks, __entry->sum.saw_nacks,
+                     __entry->sum.nr_acks, __entry->sum.nr_retained_nacks,
                      __entry->sum.nr_new_acks,
-                     __entry->sum.nr_rot_new_acks,
+                     __entry->sum.nr_new_nacks,
                      __entry->top - __entry->hard_ack,
                      __entry->sum.cumulative_acks,
                      __entry->sum.dup_acks,
index de1944e42c6556a46a8a87855189f34b86859e99..19a13468eca5e4c13cbdf05444604a54567a67ae 100644 (file)
@@ -53,7 +53,7 @@ extern "C" {
 #define DRM_IVPU_PARAM_CORE_CLOCK_RATE     3
 #define DRM_IVPU_PARAM_NUM_CONTEXTS        4
 #define DRM_IVPU_PARAM_CONTEXT_BASE_ADDRESS 5
-#define DRM_IVPU_PARAM_CONTEXT_PRIORITY            6
+#define DRM_IVPU_PARAM_CONTEXT_PRIORITY            6 /* Deprecated */
 #define DRM_IVPU_PARAM_CONTEXT_ID          7
 #define DRM_IVPU_PARAM_FW_API_VERSION      8
 #define DRM_IVPU_PARAM_ENGINE_HEARTBEAT            9
@@ -64,11 +64,18 @@ extern "C" {
 
 #define DRM_IVPU_PLATFORM_TYPE_SILICON     0
 
+/* Deprecated, use DRM_IVPU_JOB_PRIORITY */
 #define DRM_IVPU_CONTEXT_PRIORITY_IDLE     0
 #define DRM_IVPU_CONTEXT_PRIORITY_NORMAL    1
 #define DRM_IVPU_CONTEXT_PRIORITY_FOCUS            2
 #define DRM_IVPU_CONTEXT_PRIORITY_REALTIME  3
 
+#define DRM_IVPU_JOB_PRIORITY_DEFAULT  0
+#define DRM_IVPU_JOB_PRIORITY_IDLE     1
+#define DRM_IVPU_JOB_PRIORITY_NORMAL   2
+#define DRM_IVPU_JOB_PRIORITY_FOCUS    3
+#define DRM_IVPU_JOB_PRIORITY_REALTIME 4
+
 /**
  * DRM_IVPU_CAP_METRIC_STREAMER
  *
@@ -112,10 +119,6 @@ struct drm_ivpu_param {
         * %DRM_IVPU_PARAM_CONTEXT_BASE_ADDRESS:
         * Lowest VPU virtual address available in the current context (read-only)
         *
-        * %DRM_IVPU_PARAM_CONTEXT_PRIORITY:
-        * Value of current context scheduling priority (read-write).
-        * See DRM_IVPU_CONTEXT_PRIORITY_* for possible values.
-        *
         * %DRM_IVPU_PARAM_CONTEXT_ID:
         * Current context ID, always greater than 0 (read-only)
         *
@@ -286,10 +289,23 @@ struct drm_ivpu_submit {
         * to be executed. The offset has to be 8-byte aligned.
         */
        __u32 commands_offset;
+
+       /**
+        * @priority:
+        *
+        * Priority to be set for the related job command queue; can be one of the following:
+        * %DRM_IVPU_JOB_PRIORITY_DEFAULT
+        * %DRM_IVPU_JOB_PRIORITY_IDLE
+        * %DRM_IVPU_JOB_PRIORITY_NORMAL
+        * %DRM_IVPU_JOB_PRIORITY_FOCUS
+        * %DRM_IVPU_JOB_PRIORITY_REALTIME
+        */
+       __u32 priority;
 };
 
 /* drm_ivpu_bo_wait job status codes */
 #define DRM_IVPU_JOB_STATUS_SUCCESS 0
+#define DRM_IVPU_JOB_STATUS_ABORTED 256
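
An illustrative caller-side sketch of the new @priority field; DRM_IOCTL_IVPU_SUBMIT is assumed to come from the same ivpu_accel.h header, and the rest of the job setup (buffer handles, command offset) is elided:

    #include <sys/ioctl.h>
    #include <drm/ivpu_accel.h>

    /* Submit an already-filled job at normal priority. */
    static int ivpu_submit_normal(int fd, struct drm_ivpu_submit *args)
    {
            args->priority = DRM_IVPU_JOB_PRIORITY_NORMAL;
            return ioctl(fd, DRM_IOCTL_IVPU_SUBMIT, args);
    }
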
 
 /**
  * struct drm_ivpu_bo_wait - Wait for BO to become inactive
index 0bade1592f34f21690eab41de48595d7aaa24fe4..77d7ff0d5b110da4a05a4a7730d01bbd2d7c581e 100644 (file)
@@ -54,6 +54,20 @@ extern "C" {
  */
 #define NOUVEAU_GETPARAM_EXEC_PUSH_MAX   17
 
+/*
+ * NOUVEAU_GETPARAM_VRAM_BAR_SIZE - query BAR size
+ *
+ * Query the VRAM BAR size.
+ */
+#define NOUVEAU_GETPARAM_VRAM_BAR_SIZE 18
+
+/*
+ * NOUVEAU_GETPARAM_VRAM_USED
+ *
+ * Get the amount of VRAM currently used.
+ */
+#define NOUVEAU_GETPARAM_VRAM_USED 19
+
 struct drm_nouveau_getparam {
        __u64 param;
        __u64 value;
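
A hedged sketch of querying the new parameter from userspace, assuming the usual DRM_IOCTL_NOUVEAU_GETPARAM ioctl on an open render node:

    #include <sys/ioctl.h>
    #include <drm/nouveau_drm.h>

    /* Returns 0 and stores the used-VRAM byte count on success. */
    static int nouveau_vram_used(int fd, __u64 *used)
    {
            struct drm_nouveau_getparam p = { .param = NOUVEAU_GETPARAM_VRAM_USED };
            int ret = ioctl(fd, DRM_IOCTL_NOUVEAU_GETPARAM, &p);

            if (ret == 0)
                    *used = p.value;
            return ret;
    }
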
index 9fa3ae324731a6a96d47d81e18566b321f2f0bca..bb0c8a9941164228fef069433194aa2e549a0174 100644 (file)
@@ -831,11 +831,6 @@ struct drm_xe_vm_destroy {
  *  - %DRM_XE_VM_BIND_OP_PREFETCH
  *
  * and the @flags can be:
- *  - %DRM_XE_VM_BIND_FLAG_READONLY
- *  - %DRM_XE_VM_BIND_FLAG_ASYNC
- *  - %DRM_XE_VM_BIND_FLAG_IMMEDIATE - Valid on a faulting VM only, do the
- *    MAP operation immediately rather than deferring the MAP to the page
- *    fault handler.
  *  - %DRM_XE_VM_BIND_FLAG_NULL - When the NULL flag is set, the page
  *    tables are setup with a special bit which indicates writes are
  *    dropped and all reads return zero. In the future, the NULL flags
@@ -928,9 +923,8 @@ struct drm_xe_vm_bind_op {
        /** @op: Bind operation to perform */
        __u32 op;
 
-#define DRM_XE_VM_BIND_FLAG_READONLY   (1 << 0)
-#define DRM_XE_VM_BIND_FLAG_IMMEDIATE  (1 << 1)
 #define DRM_XE_VM_BIND_FLAG_NULL       (1 << 2)
+#define DRM_XE_VM_BIND_FLAG_DUMPABLE   (1 << 3)
        /** @flags: Bind flags */
        __u32 flags;
 
@@ -1045,20 +1039,6 @@ struct drm_xe_exec_queue_create {
 #define DRM_XE_EXEC_QUEUE_EXTENSION_SET_PROPERTY               0
 #define   DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY              0
 #define   DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE             1
-#define   DRM_XE_EXEC_QUEUE_SET_PROPERTY_PREEMPTION_TIMEOUT    2
-#define   DRM_XE_EXEC_QUEUE_SET_PROPERTY_PERSISTENCE           3
-#define   DRM_XE_EXEC_QUEUE_SET_PROPERTY_JOB_TIMEOUT           4
-#define   DRM_XE_EXEC_QUEUE_SET_PROPERTY_ACC_TRIGGER           5
-#define   DRM_XE_EXEC_QUEUE_SET_PROPERTY_ACC_NOTIFY            6
-#define   DRM_XE_EXEC_QUEUE_SET_PROPERTY_ACC_GRANULARITY       7
-/* Monitor 128KB contiguous region with 4K sub-granularity */
-#define     DRM_XE_ACC_GRANULARITY_128K                                0
-/* Monitor 2MB contiguous region with 64KB sub-granularity */
-#define     DRM_XE_ACC_GRANULARITY_2M                          1
-/* Monitor 16MB contiguous region with 512KB sub-granularity */
-#define     DRM_XE_ACC_GRANULARITY_16M                         2
-/* Monitor 64MB contiguous region with 2M sub-granularity */
-#define     DRM_XE_ACC_GRANULARITY_64M                         3
 
        /** @extensions: Pointer to the first extension struct, if any */
        __u64 extensions;
index 7c29d82db9ee0dcb5ce770b384149c9734a50f30..f8bc34a6bcfa2f7313f2e9eac38e2df6a25aafca 100644 (file)
@@ -614,6 +614,9 @@ struct btrfs_ioctl_clone_range_args {
  */
 #define BTRFS_DEFRAG_RANGE_COMPRESS 1
 #define BTRFS_DEFRAG_RANGE_START_IO 2
+#define BTRFS_DEFRAG_RANGE_FLAGS_SUPP  (BTRFS_DEFRAG_RANGE_COMPRESS |          \
+                                        BTRFS_DEFRAG_RANGE_START_IO)
+
 struct btrfs_ioctl_defrag_range_args {
        /* start of the defrag operation */
        __u64 start;
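
The new mask collects every defrag range flag the kernel understands, so a handler can reject unknown bits in one test instead of checking them individually; a minimal sketch of that validation, assuming headers new enough to carry the mask:

    #include <errno.h>
    #include <linux/btrfs.h>

    /* Fail early if userspace passed flags this kernel does not support. */
    static int check_defrag_flags(const struct btrfs_ioctl_defrag_range_args *range)
    {
            if (range->flags & ~BTRFS_DEFRAG_RANGE_FLAGS_SUPP)
                    return -EOPNOTSUPP;
            return 0;
    }
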
index 48ad69f7722e1ae51ae5871a06482b6aa45dfc18..45e4e64fd6643ce3a83711cb295c711dd67ca511 100644 (file)
@@ -64,6 +64,24 @@ struct fstrim_range {
        __u64 minlen;
 };
 
+/*
+ * We include a length field because some filesystems (vfat) have an identifier
+ * that we do want to expose as a UUID, but which doesn't have the standard
+ * length.
+ *
+ * We use a fixed-size buffer because this interface will, by fiat, never
+ * support "UUIDs" longer than 16 bytes; we don't want to force all downstream
+ * users to deal with that.
+ */
+struct fsuuid2 {
+       __u8    len;
+       __u8    uuid[16];
+};
+
+struct fs_sysfs_path {
+       __u8                    len;
+       __u8                    name[128];
+};
+
 /* extent-same (dedupe) ioctls; these MUST match the btrfs ioctl definitions */
 #define FILE_DEDUPE_RANGE_SAME         0
 #define FILE_DEDUPE_RANGE_DIFFERS      1
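
A small userspace sketch of the new UUID query; it needs UAPI headers new enough to carry struct fsuuid2 and FS_IOC_GETFSUUID (defined further down in this file):

    #include <stdio.h>
    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <linux/fs.h>

    int main(int argc, char **argv)
    {
            struct fsuuid2 u = { 0 };
            int fd = open(argc > 1 ? argv[1] : "/", O_RDONLY);

            if (fd < 0 || ioctl(fd, FS_IOC_GETFSUUID, &u) != 0) {
                    perror("FS_IOC_GETFSUUID");
                    return 1;
            }
            for (int i = 0; i < u.len; i++)
                    printf("%02x", u.uuid[i]);
            printf("\n");
            return 0;
    }
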
@@ -215,6 +233,13 @@ struct fsxattr {
 #define FS_IOC_FSSETXATTR              _IOW('X', 32, struct fsxattr)
 #define FS_IOC_GETFSLABEL              _IOR(0x94, 49, char[FSLABEL_MAX])
 #define FS_IOC_SETFSLABEL              _IOW(0x94, 50, char[FSLABEL_MAX])
+/* Returns the external filesystem UUID, the same one blkid returns */
+#define FS_IOC_GETFSUUID               _IOR(0x15, 0, struct fsuuid2)
+/*
+ * Returns the path component under /sys/fs/ that refers to this filesystem;
+ * also /sys/kernel/debug/ for filesystems with debugfs exports
+ */
+#define FS_IOC_GETFSSYSFSPATH          _IOR(0x15, 1, struct fs_sysfs_path)
 
 /*
  * Inode flags (FS_IOC_GETFLAGS / FS_IOC_SETFLAGS)
@@ -301,9 +326,12 @@ typedef int __bitwise __kernel_rwf_t;
 /* per-IO O_APPEND */
 #define RWF_APPEND     ((__force __kernel_rwf_t)0x00000010)
 
+/* per-IO negation of O_APPEND */
+#define RWF_NOAPPEND   ((__force __kernel_rwf_t)0x00000020)
+
 /* mask of flags supported by the kernel */
 #define RWF_SUPPORTED  (RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT |\
-                        RWF_APPEND)
+                        RWF_APPEND | RWF_NOAPPEND)
 
 /* Pagemap ioctl */
 #define PAGEMAP_SCAN   _IOWR('f', 16, struct pm_scan_arg)
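
RWF_NOAPPEND lets a single write honor its offset even on an O_APPEND descriptor; a hedged sketch using pwritev2() (the running kernel must be new enough to accept the bit, and older libc headers may need the fallback define):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <fcntl.h>
    #include <sys/uio.h>

    #ifndef RWF_NOAPPEND
    #define RWF_NOAPPEND 0x00000020         /* value from the hunk above */
    #endif

    int main(void)
    {
            char hdr[] = "HDR";
            struct iovec iov = { .iov_base = hdr, .iov_len = sizeof(hdr) - 1 };
            int fd = open("log.txt", O_WRONLY | O_CREAT | O_APPEND, 0644);

            /* rewrite the header at offset 0 despite O_APPEND */
            if (fd < 0 || pwritev2(fd, &iov, 1, 0, RWF_NOAPPEND) < 0) {
                    perror("pwritev2");
                    return 1;
            }
            return 0;
    }
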
index 5060963707b1ed44d9b640454e9b0656a505d761..f2e0b2d50e6b5ffadd52ced37e8a1e36999c9d97 100644 (file)
@@ -91,8 +91,6 @@ enum iio_modifier {
        IIO_MOD_CO2,
        IIO_MOD_VOC,
        IIO_MOD_LIGHT_UV,
-       IIO_MOD_LIGHT_UVA,
-       IIO_MOD_LIGHT_UVB,
        IIO_MOD_LIGHT_DUV,
        IIO_MOD_PM1,
        IIO_MOD_PM2P5,
@@ -107,6 +105,8 @@ enum iio_modifier {
        IIO_MOD_PITCH,
        IIO_MOD_YAW,
        IIO_MOD_ROLL,
+       IIO_MOD_LIGHT_UVA,
+       IIO_MOD_LIGHT_UVB,
 };
 
 enum iio_event_type {
index c4c53a9ab9595b2a5b95e5b22cafa5bd2cd6fd3c..ff8d21f9e95b7798eaf3e00635050e1631d6697a 100644 (file)
@@ -145,7 +145,7 @@ struct in6_flowlabel_req {
 #define IPV6_TLV_PADN          1
 #define IPV6_TLV_ROUTERALERT   5
 #define IPV6_TLV_CALIPSO       7       /* RFC 5570 */
-#define IPV6_TLV_IOAM          49      /* TEMPORARY IANA allocation for IOAM */
+#define IPV6_TLV_IOAM          49      /* RFC 9486 */
 #define IPV6_TLV_JUMBO         194
 #define IPV6_TLV_HAO           201     /* home address option */
 
index 7a673b52827b160afbd9d3b3bd64d3112dc7ce7c..7bd10201a02bc833a5f6334d45b03e50a17f0e1b 100644 (file)
@@ -255,6 +255,7 @@ enum io_uring_op {
        IORING_OP_FUTEX_WAKE,
        IORING_OP_FUTEX_WAITV,
        IORING_OP_FIXED_FD_INSTALL,
+       IORING_OP_FTRUNCATE,
 
        /* this goes last, obviously */
        IORING_OP_LAST,
@@ -570,6 +571,10 @@ enum {
        /* return status information for a buffer group */
        IORING_REGISTER_PBUF_STATUS             = 26,
 
+       /* set/clear busy poll settings */
+       IORING_REGISTER_NAPI                    = 27,
+       IORING_UNREGISTER_NAPI                  = 28,
+
        /* this goes last */
        IORING_REGISTER_LAST,
 
@@ -703,6 +708,14 @@ struct io_uring_buf_status {
        __u32   resv[8];
 };
 
+/* argument for IORING_(UN)REGISTER_NAPI */
+struct io_uring_napi {
+       __u32   busy_poll_to;
+       __u8    prefer_busy_poll;
+       __u8    pad[3];
+       __u64   resv;
+};
+
 /*
  * io_uring_restriction->opcode values
  */
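
A sketch of wiring up the new registration from userspace through the raw io_uring_register() syscall; nr_args of 1 for the single struct and busy_poll_to in microseconds are assumptions worth checking against the final kernel documentation:

    #include <string.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <linux/io_uring.h>

    /* Enable ~50us of NAPI busy polling on an existing ring. */
    static int ring_enable_napi(int ring_fd)
    {
            struct io_uring_napi napi;

            memset(&napi, 0, sizeof(napi));
            napi.busy_poll_to = 50;
            napi.prefer_busy_poll = 1;
            return syscall(__NR_io_uring_register, ring_fd,
                           IORING_REGISTER_NAPI, &napi, 1);
    }
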
index 6325d1d0e90f5dcdc7bdc91d612f8fc4c7b40135..1b40a968ba91fc9d2d0a8339261584a0d89ff9a9 100644 (file)
 #define DMA_BUF_MAGIC          0x444d4142      /* "DMAB" */
 #define DEVMEM_MAGIC           0x454d444d      /* "DMEM" */
 #define SECRETMEM_MAGIC                0x5345434d      /* "SECM" */
+#define PID_FS_MAGIC           0x50494446      /* "PIDF" */
 
 #endif /* __LINUX_MAGIC_H__ */
index ca30232b7bc8af49a6c3dd1c03e105628aafabf9..117c6a9b845b1a6fde23a952560c0e807a5a3d90 100644 (file)
@@ -285,9 +285,11 @@ enum nft_rule_attributes {
 /**
  * enum nft_rule_compat_flags - nf_tables rule compat flags
  *
+ * @NFT_RULE_COMPAT_F_UNUSED: unused
  * @NFT_RULE_COMPAT_F_INV: invert the check result
  */
 enum nft_rule_compat_flags {
+       NFT_RULE_COMPAT_F_UNUSED = (1 << 0),
        NFT_RULE_COMPAT_F_INV   = (1 << 1),
        NFT_RULE_COMPAT_F_MASK  = NFT_RULE_COMPAT_F_INV,
 };
index 5406fbc1307489e2083a5c9136cf00c3844506ca..72ec000a97cda30dfeea1d04493f0ac7af7ed300 100644 (file)
@@ -7,6 +7,12 @@
 #include <linux/fcntl.h>
 
 /* Flags for pidfd_open().  */
-#define PIDFD_NONBLOCK O_NONBLOCK
+#define PIDFD_NONBLOCK O_NONBLOCK
+#define PIDFD_THREAD   O_EXCL
+
+/* Flags for pidfd_send_signal(). */
+#define PIDFD_SIGNAL_THREAD            (1UL << 0)
+#define PIDFD_SIGNAL_THREAD_GROUP      (1UL << 1)
+#define PIDFD_SIGNAL_PROCESS_GROUP     (1UL << 2)
 
 #endif /* _UAPI_LINUX_PIDFD_H */
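
A brief sketch of the new scope flags from userspace; pidfd_send_signal() has no libc wrapper here, so it goes through syscall(), and the PIDFD_SIGNAL_* semantics are inferred only from their names in this hunk:

    #include <signal.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <linux/pidfd.h>

    /* Send SIGTERM to the entire process group behind a pidfd. */
    static int pidfd_term_group(int pidfd)
    {
            return syscall(__NR_pidfd_send_signal, pidfd, SIGTERM,
                           NULL, PIDFD_SIGNAL_PROCESS_GROUP);
    }
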
index 9086367db0435365689d1f8cbe2a9693958df401..de9b4733607e6b61b08ff7089ff90070168ff4a2 100644 (file)
@@ -145,12 +145,13 @@ struct serial_rs485 {
 #define SER_RS485_ENABLED              _BITUL(0)
 #define SER_RS485_RTS_ON_SEND          _BITUL(1)
 #define SER_RS485_RTS_AFTER_SEND       _BITUL(2)
-#define SER_RS485_RX_DURING_TX         _BITUL(3)
-#define SER_RS485_TERMINATE_BUS                _BITUL(4)
-#define SER_RS485_ADDRB                        _BITUL(5)
-#define SER_RS485_ADDR_RECV            _BITUL(6)
-#define SER_RS485_ADDR_DEST            _BITUL(7)
-#define SER_RS485_MODE_RS422           _BITUL(8)
+/* Placeholder for bit 3: SER_RS485_RTS_BEFORE_SEND, which isn't used anymore */
+#define SER_RS485_RX_DURING_TX         _BITUL(4)
+#define SER_RS485_TERMINATE_BUS                _BITUL(5)
+#define SER_RS485_ADDRB                        _BITUL(6)
+#define SER_RS485_ADDR_RECV            _BITUL(7)
+#define SER_RS485_ADDR_DEST            _BITUL(8)
+#define SER_RS485_MODE_RS422           _BITUL(9)
 
        __u32   delay_rts_before_send;
        __u32   delay_rts_after_send;
index b9cfc5c962682364e6dda544322ec07aa4a1ff83..c8dc5f8ea699627402dc2069615acd066f103981 100644 (file)
@@ -49,6 +49,8 @@
        _IOR('u', UBLK_CMD_GET_DEV_INFO2, struct ublksrv_ctrl_cmd)
 #define UBLK_U_CMD_GET_FEATURES        \
        _IOR('u', 0x13, struct ublksrv_ctrl_cmd)
+#define UBLK_U_CMD_DEL_DEV_ASYNC       \
+       _IOR('u', 0x14, struct ublksrv_ctrl_cmd)
 
 /*
  * 64bits are enough now, and it should be easy to extend in case of
index d5b9cfbd9ceac69323d0fe487cc49ab388a2e523..628d46a0da92eb0393dd592a38e987d08dcf6db0 100644 (file)
@@ -142,7 +142,7 @@ struct snd_hwdep_dsp_image {
  *                                                                           *
  *****************************************************************************/
 
-#define SNDRV_PCM_VERSION              SNDRV_PROTOCOL_VERSION(2, 0, 16)
+#define SNDRV_PCM_VERSION              SNDRV_PROTOCOL_VERSION(2, 0, 17)
 
 typedef unsigned long snd_pcm_uframes_t;
 typedef signed long snd_pcm_sframes_t;
@@ -416,7 +416,7 @@ struct snd_pcm_hw_params {
        unsigned int rmask;             /* W: requested masks */
        unsigned int cmask;             /* R: changed masks */
        unsigned int info;              /* R: Info flags for returned setup */
-       unsigned int msbits;            /* R: used most significant bits */
+       unsigned int msbits;            /* R: used most significant bits (in sample bit-width) */
        unsigned int rate_num;          /* R: rate numerator */
        unsigned int rate_den;          /* R: rate denominator */
        snd_pcm_uframes_t fifo_size;    /* R: chip FIFO size in frames */
index 48d2790ef928c798279babb69fbd07f7d6dcc20d..3109282672f33cecc00cd3f08e9dfd22e7991637 100644 (file)
@@ -31,7 +31,10 @@ struct ioctl_gntalloc_alloc_gref {
        __u64 index;
        /* The grant references of the newly created grant, one per page */
        /* Variable size, depending on count */
-       __u32 gref_ids[1];
+       union {
+               __u32 gref_ids[1];
+               __DECLARE_FLEX_ARRAY(__u32, gref_ids_flex);
+       };
 };
 
 #define GNTALLOC_FLAG_WRITABLE 1
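
__DECLARE_FLEX_ARRAY() exists so a flexible array can legally sit inside a union while the old one-element gref_ids[] keeps its UAPI layout. Roughly what the macro expands to here (the zero-size struct trick is a GCC/Clang extension the kernel relies on; the wrapper name is illustrative):

    #include <linux/types.h>

    struct alloc_gref_like {
            __u32 count;
            union {
                    __u32 gref_ids[1];              /* legacy layout, kept for ABI */
                    struct {
                            struct { } __empty;     /* zero-size placeholder */
                            __u32 gref_ids_flex[];  /* same bytes, bounds-checkable */
                    };
            };
    };

New kernel code can then index gref_ids_flex[i] past the first element without tripping array-bounds instrumentation.
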
index 8df18f3a974846b48e41b2a8dcbc2f2f2f90128e..bee58f7468c36a8f030c73f1c16bbfb2ab9acab2 100644 (file)
@@ -89,6 +89,15 @@ config CC_HAS_ASM_GOTO_TIED_OUTPUT
        # Detect buggy gcc and clang, fixed in gcc-11 clang-14.
        def_bool $(success,echo 'int foo(int *x) { asm goto (".long (%l[bar]) - .": "+m"(*x) ::: bar); return *x; bar: return 0; }' | $CC -x c - -c -o /dev/null)
 
+config GCC_ASM_GOTO_OUTPUT_WORKAROUND
+       bool
+       depends on CC_IS_GCC && CC_HAS_ASM_GOTO_OUTPUT
+       # Fixed in GCC 14, 13.3, 12.4 and 11.5
+       # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921
+       default y if GCC_VERSION < 110500
+       default y if GCC_VERSION >= 120000 && GCC_VERSION < 120400
+       default y if GCC_VERSION >= 130000 && GCC_VERSION < 130300
+
 config TOOLS_SUPPORT_RELR
        def_bool $(success,env "CC=$(CC)" "LD=$(LD)" "NM=$(NM)" "OBJCOPY=$(OBJCOPY)" $(srctree)/scripts/tools-support-relr.sh)
 
@@ -867,14 +876,26 @@ config CC_IMPLICIT_FALLTHROUGH
        default "-Wimplicit-fallthrough=5" if CC_IS_GCC && $(cc-option,-Wimplicit-fallthrough=5)
        default "-Wimplicit-fallthrough" if CC_IS_CLANG && $(cc-option,-Wunreachable-code-fallthrough)
 
-# Currently, disable gcc-11+ array-bounds globally.
+# Currently, disable gcc-10+ array-bounds globally.
 # It's still broken in gcc-13, so no upper bound yet.
-config GCC11_NO_ARRAY_BOUNDS
+config GCC10_NO_ARRAY_BOUNDS
        def_bool y
 
 config CC_NO_ARRAY_BOUNDS
        bool
-       default y if CC_IS_GCC && GCC_VERSION >= 110000 && GCC11_NO_ARRAY_BOUNDS
+       default y if CC_IS_GCC && GCC_VERSION >= 100000 && GCC10_NO_ARRAY_BOUNDS
+
+# Currently, disable -Wstringop-overflow for GCC globally.
+config GCC_NO_STRINGOP_OVERFLOW
+       def_bool y
+
+config CC_NO_STRINGOP_OVERFLOW
+       bool
+       default y if CC_IS_GCC && GCC_NO_STRINGOP_OVERFLOW
+
+config CC_STRINGOP_OVERFLOW
+       bool
+       default y if CC_IS_GCC && !CC_NO_STRINGOP_OVERFLOW
 
 #
 # For architectures that know their GCC __int128 support is sound
index 279ad28bf4fb148e37cbd9600842a567c813039c..3c5fd993bc7e3e955f322e506bdad8aea8e0edbb 100644 (file)
@@ -208,6 +208,9 @@ retry:
                                goto out;
                        case -EACCES:
                        case -EINVAL:
+#ifdef CONFIG_BLOCK
+                               init_flush_fput();
+#endif
                                continue;
                }
                /*
index 15e372b00ce704bc1b1e5db6514db3a4dcd654a4..6069ea3eb80d70106d6a8d8b3515d50176353206 100644 (file)
@@ -9,6 +9,8 @@
 #include <linux/major.h>
 #include <linux/root_dev.h>
 #include <linux/init_syscalls.h>
+#include <linux/task_work.h>
+#include <linux/file.h>
 
 void  mount_root_generic(char *name, char *pretty_name, int flags);
 void  mount_root(char *root_device_name);
@@ -41,3 +43,10 @@ static inline bool initrd_load(char *root_device_name)
        }
 
 #endif
+
+/* Ensure that async file closing finished to prevent spurious errors. */
+static inline void init_flush_fput(void)
+{
+       flush_delayed_fput();
+       task_work_run();
+}
index 7ecb458eb3da60eb73123f4b2072910f194c4bb5..4daee6d761c86c11bcfe76b04a137d7f0ea38ec9 100644 (file)
@@ -147,6 +147,7 @@ struct task_struct init_task __aligned(L1_CACHE_BYTES) = {
        .rcu_tasks_holdout = false,
        .rcu_tasks_holdout_list = LIST_HEAD_INIT(init_task.rcu_tasks_holdout_list),
        .rcu_tasks_idle_cpu = -1,
+       .rcu_tasks_exit_list = LIST_HEAD_INIT(init_task.rcu_tasks_exit_list),
 #endif
 #ifdef CONFIG_TASKS_TRACE_RCU
        .trc_reader_nesting = 0,
index 76deb48c38cb16dd779de7ee91b35785b6890579..01dbd0e8150180c2dd50d449adf7ad5d3ac0c8d0 100644 (file)
 #include <linux/mm.h>
 #include <linux/namei.h>
 #include <linux/init_syscalls.h>
-#include <linux/task_work.h>
 #include <linux/umh.h>
 
+#include "do_mounts.h"
+
 static __initdata bool csum_present;
 static __initdata u32 io_csum;
 
@@ -679,8 +680,6 @@ static void __init populate_initrd_image(char *err)
        struct file *file;
        loff_t pos = 0;
 
-       unpack_to_rootfs(__initramfs_start, __initramfs_size);
-
        printk(KERN_INFO "rootfs image is not initramfs (%s); looks like an initrd\n",
                        err);
        file = filp_open("/initrd.image", O_WRONLY | O_CREAT, 0700);
@@ -736,8 +735,7 @@ done:
        initrd_start = 0;
        initrd_end = 0;
 
-       flush_delayed_fput();
-       task_work_run();
+       init_flush_fput();
 }
 
 static ASYNC_DOMAIN_EXCLUSIVE(initramfs_domain);
index e24b0780fdff7a807bd027ab26e61fc303c624ef..2fbf6a3114d57a8458bfcd0fd3d3b43163714010 100644 (file)
@@ -99,6 +99,7 @@
 #include <linux/init_syscalls.h>
 #include <linux/stackdepot.h>
 #include <linux/randomize_kstack.h>
+#include <linux/pidfs.h>
 #include <net/net_namespace.h>
 
 #include <asm/io.h>
@@ -1059,6 +1060,7 @@ void start_kernel(void)
        seq_file_init();
        proc_root_init();
        nsfs_init();
+       pidfs_init();
        cpuset_init();
        cgroup_init();
        taskstats_init_early();
index 2cdc51825405371a05e4357b0956799c6672431f..2e1d4e03799c34f4e56fa2cc5397eaeb69756498 100644 (file)
@@ -8,6 +8,7 @@ obj-$(CONFIG_IO_URING)          += io_uring.o xattr.o nop.o fs.o splice.o \
                                        statx.o net.o msg_ring.o timeout.o \
                                        sqpoll.o fdinfo.o tctx.o poll.o \
                                        cancel.o kbuf.o rsrc.o rw.o opdef.o \
-                                       notif.o waitid.o register.o
+                                       notif.o waitid.o register.o truncate.o
 obj-$(CONFIG_IO_WQ)            += io-wq.o
 obj-$(CONFIG_FUTEX)            += futex.o
+obj-$(CONFIG_NET_RX_BUSY_POLL) += napi.o
index 8a8b07dfc444cde6181e2e5f86b6aa799ab5980b..acfcdd7f059afd871e3dba0b591e26620f3db64b 100644 (file)
@@ -58,9 +58,8 @@ bool io_cancel_req_match(struct io_kiocb *req, struct io_cancel_data *cd)
                return false;
        if (cd->flags & IORING_ASYNC_CANCEL_ALL) {
 check_seq:
-               if (cd->seq == req->work.cancel_seq)
+               if (io_cancel_match_sequence(req, cd->seq))
                        return false;
-               req->work.cancel_seq = cd->seq;
        }
 
        return true;
index c0a8e7c520b6d65479b2874d1ec536f21450342a..76b32e65c03cd72452e37293bfcdc5b604e10267 100644 (file)
@@ -25,4 +25,14 @@ void init_hash_table(struct io_hash_table *table, unsigned size);
 int io_sync_cancel(struct io_ring_ctx *ctx, void __user *arg);
 bool io_cancel_req_match(struct io_kiocb *req, struct io_cancel_data *cd);
 
+static inline bool io_cancel_match_sequence(struct io_kiocb *req, int sequence)
+{
+       if ((req->flags & REQ_F_CANCEL_SEQ) && sequence == req->work.cancel_seq)
+               return true;
+
+       req->flags |= REQ_F_CANCEL_SEQ;
+       req->work.cancel_seq = sequence;
+       return false;
+}
+
 #endif
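
The helper above lazily stamps a request the first time a cancellation pass inspects it: the first call for a given sequence records it and reports no match, and only repeat calls with the same sequence match. A generic stand-alone sketch of the same guard pattern (names are illustrative, not io_uring API):

    struct item {
            unsigned int flags;
            int cancel_seq;
    };
    #define F_SEEN_SEQ (1u << 0)

    /* returns 0 on the first visit for this sequence, 1 afterwards */
    static int match_sequence(struct item *it, int seq)
    {
            if ((it->flags & F_SEEN_SEQ) && it->cancel_seq == seq)
                    return 1;

            it->flags |= F_SEEN_SEQ;
            it->cancel_seq = seq;
            return 0;
    }
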
index 976e9500f6518cbc121d212af5e80334ef3e2ace..8d444dd1b0a7b63c786198349648be8ad21c77f5 100644 (file)
@@ -55,6 +55,7 @@ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
        struct io_ring_ctx *ctx = f->private_data;
        struct io_overflow_cqe *ocqe;
        struct io_rings *r = ctx->rings;
+       struct rusage sq_usage;
        unsigned int sq_mask = ctx->sq_entries - 1, cq_mask = ctx->cq_entries - 1;
        unsigned int sq_head = READ_ONCE(r->sq.head);
        unsigned int sq_tail = READ_ONCE(r->sq.tail);
@@ -64,6 +65,7 @@ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
        unsigned int sq_shift = 0;
        unsigned int sq_entries, cq_entries;
        int sq_pid = -1, sq_cpu = -1;
+       u64 sq_total_time = 0, sq_work_time = 0;
        bool has_lock;
        unsigned int i;
 
@@ -145,12 +147,24 @@ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
        if (has_lock && (ctx->flags & IORING_SETUP_SQPOLL)) {
                struct io_sq_data *sq = ctx->sq_data;
 
-               sq_pid = sq->task_pid;
-               sq_cpu = sq->sq_cpu;
+               /*
+                * sq->thread might be NULL if we raced with the sqpoll
+                * thread termination.
+                */
+               if (sq->thread) {
+                       sq_pid = sq->task_pid;
+                       sq_cpu = sq->sq_cpu;
+                       getrusage(sq->thread, RUSAGE_SELF, &sq_usage);
+                       sq_total_time = (sq_usage.ru_stime.tv_sec * 1000000
+                                        + sq_usage.ru_stime.tv_usec);
+                       sq_work_time = sq->work_time;
+               }
        }
 
        seq_printf(m, "SqThread:\t%d\n", sq_pid);
        seq_printf(m, "SqThreadCpu:\t%d\n", sq_cpu);
+       seq_printf(m, "SqTotalTime:\t%llu\n", sq_total_time);
+       seq_printf(m, "SqWorkTime:\t%llu\n", sq_work_time);
        seq_printf(m, "UserFiles:\t%u\n", ctx->nr_user_files);
        for (i = 0; has_lock && i < ctx->nr_user_files; i++) {
                struct file *f = io_file_from_index(&ctx->file_table, i);
index b47adf170c314daaf4c09b2b8ab1876079ccbf60..b2435c4dca1f9fd24ddadf3396edbf0d146eea96 100644 (file)
@@ -17,7 +17,7 @@ int io_fixed_fd_remove(struct io_ring_ctx *ctx, unsigned int offset);
 int io_register_file_alloc_range(struct io_ring_ctx *ctx,
                                 struct io_uring_file_index_range __user *arg);
 
-unsigned int io_file_get_flags(struct file *file);
+io_req_flags_t io_file_get_flags(struct file *file);
 
 static inline void io_file_bitmap_clear(struct io_file_table *table, int bit)
 {
index cd9a137ad6cefbb907a177fd8f0c9753ac0c70dd..cf348c33f4855e519d1d28ef29cff8f106db5ff3 100644 (file)
@@ -59,7 +59,6 @@
 #include <linux/bvec.h>
 #include <linux/net.h>
 #include <net/sock.h>
-#include <net/af_unix.h>
 #include <linux/anon_inodes.h>
 #include <linux/sched/mm.h>
 #include <linux/uaccess.h>
@@ -95,6 +94,7 @@
 #include "notif.h"
 #include "waitid.h"
 #include "futex.h"
+#include "napi.h"
 
 #include "timeout.h"
 #include "poll.h"
 #define IO_COMPL_BATCH                 32
 #define IO_REQ_ALLOC_BATCH             8
 
-enum {
-       IO_CHECK_CQ_OVERFLOW_BIT,
-       IO_CHECK_CQ_DROPPED_BIT,
-};
-
 struct io_defer_entry {
        struct list_head        list;
        struct io_kiocb         *req;
@@ -349,6 +344,8 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
        INIT_DELAYED_WORK(&ctx->fallback_work, io_fallback_req_func);
        INIT_WQ_LIST(&ctx->submit_state.compl_reqs);
        INIT_HLIST_HEAD(&ctx->cancelable_uring_cmd);
+       io_napi_init(ctx);
+
        return ctx;
 err:
        kfree(ctx->cancel_table.hbs);
@@ -463,7 +460,6 @@ static void io_prep_async_work(struct io_kiocb *req)
 
        req->work.list.next = NULL;
        req->work.flags = 0;
-       req->work.cancel_seq = atomic_read(&ctx->cancel_seq);
        if (req->flags & REQ_F_FORCE_ASYNC)
                req->work.flags |= IO_WQ_WORK_CONCURRENT;
 
@@ -670,7 +666,6 @@ static void io_cq_unlock_post(struct io_ring_ctx *ctx)
        io_commit_cqring_flush(ctx);
 }
 
-/* Returns true if there are no backlogged entries after the flush */
 static void io_cqring_overflow_kill(struct io_ring_ctx *ctx)
 {
        struct io_overflow_cqe *ocqe;
@@ -949,6 +944,8 @@ bool io_fill_cqe_req_aux(struct io_kiocb *req, bool defer, s32 res, u32 cflags)
        u64 user_data = req->cqe.user_data;
        struct io_uring_cqe *cqe;
 
+       lockdep_assert(!io_wq_current_is_worker());
+
        if (!defer)
                return __io_post_aux_cqe(ctx, user_data, res, cflags, false);
 
@@ -1025,15 +1022,15 @@ static void __io_req_complete_post(struct io_kiocb *req, unsigned issue_flags)
 
 void io_req_complete_post(struct io_kiocb *req, unsigned issue_flags)
 {
-       if (req->ctx->task_complete && req->ctx->submitter_task != current) {
+       struct io_ring_ctx *ctx = req->ctx;
+
+       if (ctx->task_complete && ctx->submitter_task != current) {
                req->io_task_work.func = io_req_task_complete;
                io_req_task_work_add(req);
        } else if (!(issue_flags & IO_URING_F_UNLOCKED) ||
-                  !(req->ctx->flags & IORING_SETUP_IOPOLL)) {
+                  !(ctx->flags & IORING_SETUP_IOPOLL)) {
                __io_req_complete_post(req, issue_flags);
        } else {
-               struct io_ring_ctx *ctx = req->ctx;
-
                mutex_lock(&ctx->uring_lock);
                __io_req_complete_post(req, issue_flags & ~IO_URING_F_UNLOCKED);
                mutex_unlock(&ctx->uring_lock);
@@ -1174,40 +1171,44 @@ static void ctx_flush_and_put(struct io_ring_ctx *ctx, struct io_tw_state *ts)
        percpu_ref_put(&ctx->refs);
 }
 
-static unsigned int handle_tw_list(struct llist_node *node,
-                                  struct io_ring_ctx **ctx,
-                                  struct io_tw_state *ts,
-                                  struct llist_node *last)
+/*
+ * Run queued task_work, returning the number of entries processed in *count.
+ * If more entries than max_entries are available, stop processing once the
+ * limit is reached and return the rest of the list.
+ */
+struct llist_node *io_handle_tw_list(struct llist_node *node,
+                                    unsigned int *count,
+                                    unsigned int max_entries)
 {
-       unsigned int count = 0;
+       struct io_ring_ctx *ctx = NULL;
+       struct io_tw_state ts = { };
 
-       while (node && node != last) {
+       do {
                struct llist_node *next = node->next;
                struct io_kiocb *req = container_of(node, struct io_kiocb,
                                                    io_task_work.node);
 
-               prefetch(container_of(next, struct io_kiocb, io_task_work.node));
-
-               if (req->ctx != *ctx) {
-                       ctx_flush_and_put(*ctx, ts);
-                       *ctx = req->ctx;
+               if (req->ctx != ctx) {
+                       ctx_flush_and_put(ctx, &ts);
+                       ctx = req->ctx;
                        /* if not contended, grab and improve batching */
-                       ts->locked = mutex_trylock(&(*ctx)->uring_lock);
-                       percpu_ref_get(&(*ctx)->refs);
+                       ts.locked = mutex_trylock(&ctx->uring_lock);
+                       percpu_ref_get(&ctx->refs);
                }
                INDIRECT_CALL_2(req->io_task_work.func,
                                io_poll_task_func, io_req_rw_complete,
-                               req, ts);
+                               req, &ts);
                node = next;
-               count++;
+               (*count)++;
                if (unlikely(need_resched())) {
-                       ctx_flush_and_put(*ctx, ts);
-                       *ctx = NULL;
+                       ctx_flush_and_put(ctx, &ts);
+                       ctx = NULL;
                        cond_resched();
                }
-       }
+       } while (node && *count < max_entries);
 
-       return count;
+       ctx_flush_and_put(ctx, &ts);
+       return node;
 }
 
 /**
@@ -1224,22 +1225,6 @@ static inline struct llist_node *io_llist_xchg(struct llist_head *head,
        return xchg(&head->first, new);
 }
 
-/**
- * io_llist_cmpxchg - possibly swap all entries in a lock-less list
- * @head:      the head of lock-less list to delete all entries
- * @old:       expected old value of the first entry of the list
- * @new:       new entry as the head of the list
- *
- * perform a cmpxchg on the first entry of the list.
- */
-
-static inline struct llist_node *io_llist_cmpxchg(struct llist_head *head,
-                                                 struct llist_node *old,
-                                                 struct llist_node *new)
-{
-       return cmpxchg(&head->first, old, new);
-}
-
 static __cold void io_fallback_tw(struct io_uring_task *tctx, bool sync)
 {
        struct llist_node *node = llist_del_all(&tctx->task_list);
@@ -1268,45 +1253,41 @@ static __cold void io_fallback_tw(struct io_uring_task *tctx, bool sync)
        }
 }
 
-void tctx_task_work(struct callback_head *cb)
+struct llist_node *tctx_task_work_run(struct io_uring_task *tctx,
+                                     unsigned int max_entries,
+                                     unsigned int *count)
 {
-       struct io_tw_state ts = {};
-       struct io_ring_ctx *ctx = NULL;
-       struct io_uring_task *tctx = container_of(cb, struct io_uring_task,
-                                                 task_work);
-       struct llist_node fake = {};
        struct llist_node *node;
-       unsigned int loops = 0;
-       unsigned int count = 0;
 
        if (unlikely(current->flags & PF_EXITING)) {
                io_fallback_tw(tctx, true);
-               return;
+               return NULL;
        }
 
-       do {
-               loops++;
-               node = io_llist_xchg(&tctx->task_list, &fake);
-               count += handle_tw_list(node, &ctx, &ts, &fake);
-
-               /* skip expensive cmpxchg if there are items in the list */
-               if (READ_ONCE(tctx->task_list.first) != &fake)
-                       continue;
-               if (ts.locked && !wq_list_empty(&ctx->submit_state.compl_reqs)) {
-                       io_submit_flush_completions(ctx);
-                       if (READ_ONCE(tctx->task_list.first) != &fake)
-                               continue;
-               }
-               node = io_llist_cmpxchg(&tctx->task_list, &fake, NULL);
-       } while (node != &fake);
-
-       ctx_flush_and_put(ctx, &ts);
+       node = llist_del_all(&tctx->task_list);
+       if (node) {
+               node = llist_reverse_order(node);
+               node = io_handle_tw_list(node, count, max_entries);
+       }
 
        /* relaxed read is enough as only the task itself sets ->in_cancel */
        if (unlikely(atomic_read(&tctx->in_cancel)))
                io_uring_drop_tctx_refs(current);
 
-       trace_io_uring_task_work_run(tctx, count, loops);
+       trace_io_uring_task_work_run(tctx, *count);
+       return node;
+}
+
+void tctx_task_work(struct callback_head *cb)
+{
+       struct io_uring_task *tctx;
+       struct llist_node *ret;
+       unsigned int count = 0;
+
+       tctx = container_of(cb, struct io_uring_task, task_work);
+       ret = tctx_task_work_run(tctx, UINT_MAX, &count);
+       /* can't happen */
+       WARN_ON_ONCE(ret);
 }
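
The rewrite drains the pending work with one atomic llist_del_all() and then reverses it, since llist producers push LIFO but task_work should run in queue order. A userspace analogue of that detach-and-reverse step, assuming C11 atomics:

    #include <stdio.h>
    #include <stdatomic.h>

    struct node { struct node *next; int v; };

    static _Atomic(struct node *) head;

    static void push(struct node *n)        /* LIFO, like llist_add() */
    {
            n->next = atomic_load(&head);
            while (!atomic_compare_exchange_weak(&head, &n->next, n))
                    ;                       /* expected is refreshed on failure */
    }

    static struct node *drain_fifo(void)    /* llist_del_all + reverse */
    {
            struct node *n = atomic_exchange(&head, NULL), *prev = NULL;

            while (n) {
                    struct node *next = n->next;
                    n->next = prev;
                    prev = n;
                    n = next;
            }
            return prev;
    }

    int main(void)
    {
            struct node a = { .v = 1 }, b = { .v = 2 };
            push(&a);
            push(&b);
            for (struct node *n = drain_fifo(); n; n = n->next)
                    printf("%d\n", n->v);   /* prints 1 then 2 */
            return 0;
    }
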
 
 static inline void io_req_local_work_add(struct io_kiocb *req, unsigned flags)
@@ -1389,6 +1370,15 @@ static void io_req_normal_work_add(struct io_kiocb *req)
        if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
                atomic_or(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
 
+       /* SQPOLL doesn't need the task_work added, it'll run it itself */
+       if (ctx->flags & IORING_SETUP_SQPOLL) {
+               struct io_sq_data *sqd = ctx->sq_data;
+
+               if (wq_has_sleeper(&sqd->wait))
+                       wake_up(&sqd->wait);
+               return;
+       }
+
        if (likely(!task_work_add(req->task, &tctx->task_work, ctx->notify_method)))
                return;
 
@@ -1420,7 +1410,20 @@ static void __cold io_move_task_work_from_local(struct io_ring_ctx *ctx)
        }
 }
 
-static int __io_run_local_work(struct io_ring_ctx *ctx, struct io_tw_state *ts)
+static bool io_run_local_work_continue(struct io_ring_ctx *ctx, int events,
+                                      int min_events)
+{
+       if (llist_empty(&ctx->work_llist))
+               return false;
+       if (events < min_events)
+               return true;
+       if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
+               atomic_or(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
+       return false;
+}
+
+static int __io_run_local_work(struct io_ring_ctx *ctx, struct io_tw_state *ts,
+                              int min_events)
 {
        struct llist_node *node;
        unsigned int loops = 0;
@@ -1440,7 +1443,6 @@ again:
                struct llist_node *next = node->next;
                struct io_kiocb *req = container_of(node, struct io_kiocb,
                                                    io_task_work.node);
-               prefetch(container_of(next, struct io_kiocb, io_task_work.node));
                INDIRECT_CALL_2(req->io_task_work.func,
                                io_poll_task_func, io_req_rw_complete,
                                req, ts);
@@ -1449,18 +1451,20 @@ again:
        }
        loops++;
 
-       if (!llist_empty(&ctx->work_llist))
+       if (io_run_local_work_continue(ctx, ret, min_events))
                goto again;
        if (ts->locked) {
                io_submit_flush_completions(ctx);
-               if (!llist_empty(&ctx->work_llist))
+               if (io_run_local_work_continue(ctx, ret, min_events))
                        goto again;
        }
+
        trace_io_uring_local_work_run(ctx, ret, loops);
        return ret;
 }
 
-static inline int io_run_local_work_locked(struct io_ring_ctx *ctx)
+static inline int io_run_local_work_locked(struct io_ring_ctx *ctx,
+                                          int min_events)
 {
        struct io_tw_state ts = { .locked = true, };
        int ret;
@@ -1468,20 +1472,20 @@ static inline int io_run_local_work_locked(struct io_ring_ctx *ctx)
        if (llist_empty(&ctx->work_llist))
                return 0;
 
-       ret = __io_run_local_work(ctx, &ts);
+       ret = __io_run_local_work(ctx, &ts, min_events);
        /* shouldn't happen! */
        if (WARN_ON_ONCE(!ts.locked))
                mutex_lock(&ctx->uring_lock);
        return ret;
 }
 
-static int io_run_local_work(struct io_ring_ctx *ctx)
+static int io_run_local_work(struct io_ring_ctx *ctx, int min_events)
 {
        struct io_tw_state ts = {};
        int ret;
 
        ts.locked = mutex_trylock(&ctx->uring_lock);
-       ret = __io_run_local_work(ctx, &ts);
+       ret = __io_run_local_work(ctx, &ts, min_events);
        if (ts.locked)
                mutex_unlock(&ctx->uring_lock);
 
@@ -1677,7 +1681,7 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, long min)
                    io_task_work_pending(ctx)) {
                        u32 tail = ctx->cached_cq_tail;
 
-                       (void) io_run_local_work_locked(ctx);
+                       (void) io_run_local_work_locked(ctx, min);
 
                        if (task_work_pending(current) ||
                            wq_list_empty(&ctx->iopoll_list)) {
@@ -1768,9 +1772,9 @@ static void io_iopoll_req_issued(struct io_kiocb *req, unsigned int issue_flags)
        }
 }
 
-unsigned int io_file_get_flags(struct file *file)
+io_req_flags_t io_file_get_flags(struct file *file)
 {
-       unsigned int res = 0;
+       io_req_flags_t res = 0;
 
        if (S_ISREG(file_inode(file)->i_mode))
                res |= REQ_F_ISREG;
@@ -1966,10 +1970,28 @@ fail:
                goto fail;
        }
 
+       /*
+        * If DEFER_TASKRUN is set, it's only allowed to post CQEs from the
+        * submitter task context. Final request completions are handed to the
+        * right context; however, this is not the case for auxiliary CQEs,
+        * which are the main means of operation for multishot requests.
+        * Don't allow any multishot execution from io-wq. It's more restrictive
+        * than necessary and also cleaner.
+        */
+       if (req->flags & REQ_F_APOLL_MULTISHOT) {
+               err = -EBADFD;
+               if (!io_file_can_poll(req))
+                       goto fail;
+               err = -ECANCELED;
+               if (io_arm_poll_handler(req, issue_flags) != IO_APOLL_OK)
+                       goto fail;
+               return;
+       }
+
        if (req->flags & REQ_F_FORCE_ASYNC) {
                bool opcode_poll = def->pollin || def->pollout;
 
-               if (opcode_poll && file_can_poll(req->file)) {
+               if (opcode_poll && io_file_can_poll(req)) {
                        needs_poll = true;
                        issue_flags |= IO_URING_F_NONBLOCK;
                }
@@ -2171,7 +2193,8 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
        /* req is partially pre-initialised, see io_preinit_req() */
        req->opcode = opcode = READ_ONCE(sqe->opcode);
        /* same numerical values with corresponding REQ_F_*, safe to copy */
-       req->flags = sqe_flags = READ_ONCE(sqe->flags);
+       sqe_flags = READ_ONCE(sqe->flags);
+       req->flags = (io_req_flags_t) sqe_flags;
        req->cqe.user_data = READ_ONCE(sqe->user_data);
        req->file = NULL;
        req->rsrc_node = NULL;
@@ -2475,33 +2498,6 @@ int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
        return ret;
 }
 
-struct io_wait_queue {
-       struct wait_queue_entry wq;
-       struct io_ring_ctx *ctx;
-       unsigned cq_tail;
-       unsigned nr_timeouts;
-       ktime_t timeout;
-};
-
-static inline bool io_has_work(struct io_ring_ctx *ctx)
-{
-       return test_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq) ||
-              !llist_empty(&ctx->work_llist);
-}
-
-static inline bool io_should_wake(struct io_wait_queue *iowq)
-{
-       struct io_ring_ctx *ctx = iowq->ctx;
-       int dist = READ_ONCE(ctx->rings->cq.tail) - (int) iowq->cq_tail;
-
-       /*
-        * Wake up if we have enough events, or if a timeout occurred since we
-        * started waiting. For timeouts, we always want to return to userspace,
-        * regardless of event count.
-        */
-       return dist >= 0 || atomic_read(&ctx->cq_timeouts) != iowq->nr_timeouts;
-}
-
 static int io_wake_function(struct wait_queue_entry *curr, unsigned int mode,
                            int wake_flags, void *key)
 {
@@ -2520,7 +2516,7 @@ int io_run_task_work_sig(struct io_ring_ctx *ctx)
 {
        if (!llist_empty(&ctx->work_llist)) {
                __set_current_state(TASK_RUNNING);
-               if (io_run_local_work(ctx) > 0)
+               if (io_run_local_work(ctx, INT_MAX) > 0)
                        return 0;
        }
        if (io_run_task_work() > 0)
@@ -2588,7 +2584,7 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
        if (!io_allowed_run_tw(ctx))
                return -EEXIST;
        if (!llist_empty(&ctx->work_llist))
-               io_run_local_work(ctx);
+               io_run_local_work(ctx, min_events);
        io_run_task_work();
        io_cqring_overflow_flush(ctx);
        /* if user messes with these they will just get an early return */
@@ -2621,16 +2617,19 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 
                if (get_timespec64(&ts, uts))
                        return -EFAULT;
+
                iowq.timeout = ktime_add_ns(timespec64_to_ktime(ts), ktime_get_ns());
+               io_napi_adjust_timeout(ctx, &iowq, &ts);
        }
 
+       io_napi_busy_loop(ctx, &iowq);
+
        trace_io_uring_cqring_wait(ctx, min_events);
        do {
+               int nr_wait = (int) iowq.cq_tail - READ_ONCE(ctx->rings->cq.tail);
                unsigned long check_cq;
 
                if (ctx->flags & IORING_SETUP_DEFER_TASKRUN) {
-                       int nr_wait = (int) iowq.cq_tail - READ_ONCE(ctx->rings->cq.tail);
-
                        atomic_set(&ctx->cq_wait_nr, nr_wait);
                        set_current_state(TASK_INTERRUPTIBLE);
                } else {
@@ -2649,7 +2648,7 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
                 */
                io_run_task_work();
                if (!llist_empty(&ctx->work_llist))
-                       io_run_local_work(ctx);
+                       io_run_local_work(ctx, nr_wait);
 
                /*
                 * Non-local task_work will be run on exit to userspace, but
@@ -2917,6 +2916,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
        io_req_caches_free(ctx);
        if (ctx->hash_map)
                io_wq_put_hash(ctx->hash_map);
+       io_napi_free(ctx);
        kfree(ctx->cancel_table.hbs);
        kfree(ctx->cancel_table_locked.hbs);
        kfree(ctx->io_bl);
@@ -3304,7 +3304,7 @@ static __cold bool io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
 
        if ((ctx->flags & IORING_SETUP_DEFER_TASKRUN) &&
            io_allowed_defer_tw_run(ctx))
-               ret |= io_run_local_work(ctx) > 0;
+               ret |= io_run_local_work(ctx, INT_MAX) > 0;
        ret |= io_cancel_defer_files(ctx, task, cancel_all);
        mutex_lock(&ctx->uring_lock);
        ret |= io_poll_remove_all(ctx, task, cancel_all);
@@ -3666,7 +3666,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
                         * it should handle ownership problems if any.
                         */
                        if (ctx->flags & IORING_SETUP_DEFER_TASKRUN)
-                               (void)io_run_local_work_locked(ctx);
+                               (void)io_run_local_work_locked(ctx, min_complete);
                }
                mutex_unlock(&ctx->uring_lock);
        }
@@ -4153,7 +4153,7 @@ static int __init io_uring_init(void)
        BUILD_BUG_ON(SQE_COMMON_FLAGS >= (1 << 8));
        BUILD_BUG_ON((SQE_VALID_FLAGS | SQE_COMMON_FLAGS) != SQE_VALID_FLAGS);
 
-       BUILD_BUG_ON(__REQ_F_LAST_BIT > 8 * sizeof(int));
+       BUILD_BUG_ON(__REQ_F_LAST_BIT > 8 * sizeof_field(struct io_kiocb, flags));
 
        BUILD_BUG_ON(sizeof(atomic_t) != sizeof(u32));
 
@@ -4175,9 +4175,8 @@ static int __init io_uring_init(void)
                                SLAB_ACCOUNT | SLAB_TYPESAFE_BY_RCU,
                                offsetof(struct io_kiocb, cmd.data),
                                sizeof_field(struct io_kiocb, cmd.data), NULL);
-       io_buf_cachep = kmem_cache_create("io_buffer", sizeof(struct io_buffer), 0,
-                                         SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT,
-                                         NULL);
+       io_buf_cachep = KMEM_CACHE(io_buffer,
+                                  SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT);
 
 #ifdef CONFIG_SYSCTL
        register_sysctl_init("kernel", kernel_io_uring_disabled_table);
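
The final hunk above converts an open-coded kmem_cache_create() call to the KMEM_CACHE() helper, which derives the cache name, object size and alignment from the struct itself. A sketch of the expansion, mirroring include/linux/slab.h:

    #define KMEM_CACHE(__struct, __flags)                                  \
            kmem_cache_create(#__struct, sizeof(struct __struct),          \
                              __alignof__(struct __struct), (__flags), NULL)

So the new call is equivalent to the old one, except the "io_buffer" name string is generated from the struct name and the struct's natural alignment is enforced instead of the previous 0.
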
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 04e33f25919ca78332fc3429b8c98d427b768688..6426ee382276b5f298c68a450ceb894205eb374d 100644
@@ -5,6 +5,7 @@
 #include <linux/lockdep.h>
 #include <linux/resume_user_mode.h>
 #include <linux/kasan.h>
+#include <linux/poll.h>
 #include <linux/io_uring_types.h>
 #include <uapi/linux/eventpoll.h>
 #include "io-wq.h"
 #include <trace/events/io_uring.h>
 #endif
 
-
 enum {
        IOU_OK                  = 0,
        IOU_ISSUE_SKIP_COMPLETE = -EIOCBQUEUED,
 
+       /*
+        * Requeue the task_work to restart operations on this request. The
+        * actual value isn't important; it just needs to avoid clashing
+        * with an otherwise valid error code while keeping its magnitude
+        * below MAX_ERRNO, so it remains valid internally.
+        */
+       IOU_REQUEUE             = -3072,
+
        /*
         * Intended only when IO_URING_F_MULTISHOT is passed
         * to indicate to the poll runner that multishot should be
@@ -28,6 +35,32 @@ enum {
        IOU_STOP_MULTISHOT      = -ECANCELED,
 };
 
+struct io_wait_queue {
+       struct wait_queue_entry wq;
+       struct io_ring_ctx *ctx;
+       unsigned cq_tail;
+       unsigned nr_timeouts;
+       ktime_t timeout;
+
+#ifdef CONFIG_NET_RX_BUSY_POLL
+       unsigned int napi_busy_poll_to;
+       bool napi_prefer_busy_poll;
+#endif
+};
+
+static inline bool io_should_wake(struct io_wait_queue *iowq)
+{
+       struct io_ring_ctx *ctx = iowq->ctx;
+       int dist = READ_ONCE(ctx->rings->cq.tail) - (int) iowq->cq_tail;
+
+       /*
+        * Wake up if we have enough events, or if a timeout occurred since we
+        * started waiting. For timeouts, we always want to return to userspace,
+        * regardless of event count.
+        */
+       return dist >= 0 || atomic_read(&ctx->cq_timeouts) != iowq->nr_timeouts;
+}
+
 bool io_cqe_cache_refill(struct io_ring_ctx *ctx, bool overflow);
 void io_req_cqe_overflow(struct io_kiocb *req);
 int io_run_task_work_sig(struct io_ring_ctx *ctx);
@@ -50,6 +83,8 @@ void io_queue_iowq(struct io_kiocb *req, struct io_tw_state *ts_dont_use);
 void io_req_task_complete(struct io_kiocb *req, struct io_tw_state *ts);
 void io_req_task_queue_fail(struct io_kiocb *req, int ret);
 void io_req_task_submit(struct io_kiocb *req, struct io_tw_state *ts);
+struct llist_node *io_handle_tw_list(struct llist_node *node, unsigned int *count, unsigned int max_entries);
+struct llist_node *tctx_task_work_run(struct io_uring_task *tctx, unsigned int max_entries, unsigned int *count);
 void tctx_task_work(struct callback_head *cb);
 __cold void io_uring_cancel_generic(bool cancel_all, struct io_sq_data *sqd);
 int io_uring_alloc_task_context(struct task_struct *task,
@@ -201,7 +236,7 @@ static inline void io_ring_submit_unlock(struct io_ring_ctx *ctx,
                                         unsigned issue_flags)
 {
        lockdep_assert_held(&ctx->uring_lock);
-       if (issue_flags & IO_URING_F_UNLOCKED)
+       if (unlikely(issue_flags & IO_URING_F_UNLOCKED))
                mutex_unlock(&ctx->uring_lock);
 }
 
@@ -214,7 +249,7 @@ static inline void io_ring_submit_lock(struct io_ring_ctx *ctx,
         * The only exception is when we've detached the request and issue it
         * from an async worker thread, grab the lock for that case.
         */
-       if (issue_flags & IO_URING_F_UNLOCKED)
+       if (unlikely(issue_flags & IO_URING_F_UNLOCKED))
                mutex_lock(&ctx->uring_lock);
        lockdep_assert_held(&ctx->uring_lock);
 }
@@ -268,6 +303,8 @@ static inline unsigned int io_sqring_entries(struct io_ring_ctx *ctx)
 
 static inline int io_run_task_work(void)
 {
+       bool ret = false;
+
        /*
         * Always check-and-clear the task_work notification signal. With how
         * signaling works for task_work, we can find it set with nothing to
@@ -279,18 +316,26 @@ static inline int io_run_task_work(void)
         * PF_IO_WORKER never returns to userspace, so check here if we have
         * notify work that needs processing.
         */
-       if (current->flags & PF_IO_WORKER &&
-           test_thread_flag(TIF_NOTIFY_RESUME)) {
-               __set_current_state(TASK_RUNNING);
-               resume_user_mode_work(NULL);
+       if (current->flags & PF_IO_WORKER) {
+               if (test_thread_flag(TIF_NOTIFY_RESUME)) {
+                       __set_current_state(TASK_RUNNING);
+                       resume_user_mode_work(NULL);
+               }
+               if (current->io_uring) {
+                       unsigned int count = 0;
+
+                       tctx_task_work_run(current->io_uring, UINT_MAX, &count);
+                       if (count)
+                               ret = true;
+               }
        }
        if (task_work_pending(current)) {
                __set_current_state(TASK_RUNNING);
                task_work_run();
-               return 1;
+               ret = true;
        }
 
-       return 0;
+       return ret;
 }
 
 static inline bool io_task_work_pending(struct io_ring_ctx *ctx)
@@ -392,4 +437,26 @@ static inline size_t uring_sqe_size(struct io_ring_ctx *ctx)
                return 2 * sizeof(struct io_uring_sqe);
        return sizeof(struct io_uring_sqe);
 }
+
+static inline bool io_file_can_poll(struct io_kiocb *req)
+{
+       if (req->flags & REQ_F_CAN_POLL)
+               return true;
+       if (file_can_poll(req->file)) {
+               req->flags |= REQ_F_CAN_POLL;
+               return true;
+       }
+       return false;
+}
+
+enum {
+       IO_CHECK_CQ_OVERFLOW_BIT,
+       IO_CHECK_CQ_DROPPED_BIT,
+};
+
+static inline bool io_has_work(struct io_ring_ctx *ctx)
+{
+       return test_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq) ||
+              !llist_empty(&ctx->work_llist);
+}
 #endif
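
io_should_wake(), moved into this header above, leans on signed integer subtraction so that 32-bit CQ tail wrap-around needs no special casing. A self-contained sketch of the arithmetic (plain userspace C, not kernel code; the variable names are illustrative only):

    #include <stdio.h>

    int main(void)
    {
            unsigned int cq_tail_wanted = 0xfffffffeu; /* tail target recorded at wait start */
            unsigned int cq_tail_now    = 0x00000001u; /* ring tail after wrapping past 2^32 */

            /* Same shape as io_should_wake(): the unsigned difference is
             * reinterpreted as a small signed distance. */
            int dist = (int)(cq_tail_now - cq_tail_wanted);

            printf("dist = %d\n", dist); /* prints 3: target passed, wake the waiter */
            return 0;
    }

As long as producer and consumer stay within 2^31 entries of each other, dist >= 0 reliably means enough completions have been posted.
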
diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index 18df5a9d2f5e7defb2df04f548d9e67b737ddbaf..9be42bff936b95750beee16661893d7686a1fcb9 100644
@@ -81,15 +81,6 @@ bool io_kbuf_recycle_legacy(struct io_kiocb *req, unsigned issue_flags)
        struct io_buffer_list *bl;
        struct io_buffer *buf;
 
-       /*
-        * For legacy provided buffer mode, don't recycle if we already did
-        * IO to this buffer. For ring-mapped provided buffer mode, we should
-        * increment ring->head to explicitly monopolize the buffer to avoid
-        * multiple use.
-        */
-       if (req->flags & REQ_F_PARTIAL_IO)
-               return false;
-
        io_ring_submit_lock(ctx, issue_flags);
 
        buf = req->kbuf;
@@ -102,10 +93,8 @@ bool io_kbuf_recycle_legacy(struct io_kiocb *req, unsigned issue_flags)
        return true;
 }
 
-unsigned int __io_put_kbuf(struct io_kiocb *req, unsigned issue_flags)
+void __io_put_kbuf(struct io_kiocb *req, unsigned issue_flags)
 {
-       unsigned int cflags;
-
        /*
         * We can add this buffer back to two lists:
         *
@@ -118,21 +107,17 @@ unsigned int __io_put_kbuf(struct io_kiocb *req, unsigned issue_flags)
         * We migrate buffers from the comp_list to the issue cache list
         * when we need one.
         */
-       if (req->flags & REQ_F_BUFFER_RING) {
-               /* no buffers to recycle for this case */
-               cflags = __io_put_kbuf_list(req, NULL);
-       } else if (issue_flags & IO_URING_F_UNLOCKED) {
+       if (issue_flags & IO_URING_F_UNLOCKED) {
                struct io_ring_ctx *ctx = req->ctx;
 
                spin_lock(&ctx->completion_lock);
-               cflags = __io_put_kbuf_list(req, &ctx->io_buffers_comp);
+               __io_put_kbuf_list(req, &ctx->io_buffers_comp);
                spin_unlock(&ctx->completion_lock);
        } else {
                lockdep_assert_held(&req->ctx->uring_lock);
 
-               cflags = __io_put_kbuf_list(req, &req->ctx->io_buffers_cache);
+               __io_put_kbuf_list(req, &req->ctx->io_buffers_cache);
        }
-       return cflags;
 }
 
 static void __user *io_provided_buffer_select(struct io_kiocb *req, size_t *len,
@@ -145,6 +130,8 @@ static void __user *io_provided_buffer_select(struct io_kiocb *req, size_t *len,
                list_del(&kbuf->list);
                if (*len == 0 || *len > kbuf->len)
                        *len = kbuf->len;
+               if (list_empty(&bl->buf_list))
+                       req->flags |= REQ_F_BL_EMPTY;
                req->flags |= REQ_F_BUFFER_SELECTED;
                req->kbuf = kbuf;
                req->buf_index = kbuf->bid;
@@ -158,12 +145,16 @@ static void __user *io_ring_buffer_select(struct io_kiocb *req, size_t *len,
                                          unsigned int issue_flags)
 {
        struct io_uring_buf_ring *br = bl->buf_ring;
+       __u16 tail, head = bl->head;
        struct io_uring_buf *buf;
-       __u16 head = bl->head;
 
-       if (unlikely(smp_load_acquire(&br->tail) == head))
+       tail = smp_load_acquire(&br->tail);
+       if (unlikely(tail == head))
                return NULL;
 
+       if (head + 1 == tail)
+               req->flags |= REQ_F_BL_EMPTY;
+
        head &= bl->mask;
        /* mmapped buffers are always contiguous */
        if (bl->is_mmap || head < IO_BUFFER_LIST_BUF_PER_PAGE) {
@@ -180,7 +171,7 @@ static void __user *io_ring_buffer_select(struct io_kiocb *req, size_t *len,
        req->buf_list = bl;
        req->buf_index = buf->bid;
 
-       if (issue_flags & IO_URING_F_UNLOCKED || !file_can_poll(req->file)) {
+       if (issue_flags & IO_URING_F_UNLOCKED || !io_file_can_poll(req)) {
                /*
                 * If we came in unlocked, we have no choice but to consume the
                 * buffer here, otherwise nothing ensures that the buffer won't
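
The io_ring_buffer_select() changes above consume entries that userspace publishes through a provided-buffer ring; the smp_load_acquire() of br->tail pairs with the store-release userspace performs when it advances the tail. A hedged sketch of the producing side, assuming the liburing >= 2.4 helpers (io_uring_setup_buf_ring() and friends are liburing names, not part of this patch):

    #include <liburing.h>

    #define BUFS     8
    #define BUF_SIZE 4096
    #define BGID     0

    /* Provide BUFS buffers in group BGID for the kernel to select from;
     * 'base' points at BUFS * BUF_SIZE bytes allocated by the caller. */
    static struct io_uring_buf_ring *setup_bufs(struct io_uring *ring, char *base)
    {
            struct io_uring_buf_ring *br;
            int i, err;

            br = io_uring_setup_buf_ring(ring, BUFS, BGID, 0, &err);
            if (!br)
                    return NULL;
            for (i = 0; i < BUFS; i++)
                    io_uring_buf_ring_add(br, base + i * BUF_SIZE, BUF_SIZE, i,
                                          io_uring_buf_ring_mask(BUFS), i);
            /* Store-release of the tail; the kernel's acquire load sees it. */
            io_uring_buf_ring_advance(br, BUFS);
            return br;
    }

With REQ_F_BL_EMPTY now set when the last entry is taken, the kernel can tell "this ring just ran dry" apart from "more buffers remain" without re-reading the tail.
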
diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h
index 53dfaa71a397cedd9b52923a3f0178873aaf9960..5218bfd79e871eef88d0a34445ee72731320be22 100644
@@ -57,7 +57,7 @@ int io_register_pbuf_status(struct io_ring_ctx *ctx, void __user *arg);
 
 void io_kbuf_mmap_list_free(struct io_ring_ctx *ctx);
 
-unsigned int __io_put_kbuf(struct io_kiocb *req, unsigned issue_flags);
+void __io_put_kbuf(struct io_kiocb *req, unsigned issue_flags);
 
 bool io_kbuf_recycle_legacy(struct io_kiocb *req, unsigned issue_flags);
 
@@ -73,21 +73,9 @@ static inline bool io_kbuf_recycle_ring(struct io_kiocb *req)
         * to monopolize the buffer.
         */
        if (req->buf_list) {
-               if (req->flags & REQ_F_PARTIAL_IO) {
-                       /*
-                        * If we end up here, then the io_uring_lock has
-                        * been kept held since we retrieved the buffer.
-                        * For the io-wq case, we already cleared
-                        * req->buf_list when the buffer was retrieved,
-                        * hence it cannot be set here for that case.
-                        */
-                       req->buf_list->head++;
-                       req->buf_list = NULL;
-               } else {
-                       req->buf_index = req->buf_list->bgid;
-                       req->flags &= ~REQ_F_BUFFER_RING;
-                       return true;
-               }
+               req->buf_index = req->buf_list->bgid;
+               req->flags &= ~REQ_F_BUFFER_RING;
+               return true;
        }
        return false;
 }
@@ -101,6 +89,8 @@ static inline bool io_do_buffer_select(struct io_kiocb *req)
 
 static inline bool io_kbuf_recycle(struct io_kiocb *req, unsigned issue_flags)
 {
+       if (req->flags & REQ_F_BL_NO_RECYCLE)
+               return false;
        if (req->flags & REQ_F_BUFFER_SELECTED)
                return io_kbuf_recycle_legacy(req, issue_flags);
        if (req->flags & REQ_F_BUFFER_RING)
@@ -108,41 +98,54 @@ static inline bool io_kbuf_recycle(struct io_kiocb *req, unsigned issue_flags)
        return false;
 }
 
-static inline unsigned int __io_put_kbuf_list(struct io_kiocb *req,
-                                             struct list_head *list)
+static inline void __io_put_kbuf_ring(struct io_kiocb *req)
 {
-       unsigned int ret = IORING_CQE_F_BUFFER | (req->buf_index << IORING_CQE_BUFFER_SHIFT);
+       if (req->buf_list) {
+               req->buf_index = req->buf_list->bgid;
+               req->buf_list->head++;
+       }
+       req->flags &= ~REQ_F_BUFFER_RING;
+}
 
+static inline void __io_put_kbuf_list(struct io_kiocb *req,
+                                     struct list_head *list)
+{
        if (req->flags & REQ_F_BUFFER_RING) {
-               if (req->buf_list) {
-                       req->buf_index = req->buf_list->bgid;
-                       req->buf_list->head++;
-               }
-               req->flags &= ~REQ_F_BUFFER_RING;
+               __io_put_kbuf_ring(req);
        } else {
                req->buf_index = req->kbuf->bgid;
                list_add(&req->kbuf->list, list);
                req->flags &= ~REQ_F_BUFFER_SELECTED;
        }
-
-       return ret;
 }
 
 static inline unsigned int io_put_kbuf_comp(struct io_kiocb *req)
 {
+       unsigned int ret;
+
        lockdep_assert_held(&req->ctx->completion_lock);
 
        if (!(req->flags & (REQ_F_BUFFER_SELECTED|REQ_F_BUFFER_RING)))
                return 0;
-       return __io_put_kbuf_list(req, &req->ctx->io_buffers_comp);
+
+       ret = IORING_CQE_F_BUFFER | (req->buf_index << IORING_CQE_BUFFER_SHIFT);
+       __io_put_kbuf_list(req, &req->ctx->io_buffers_comp);
+       return ret;
 }
 
 static inline unsigned int io_put_kbuf(struct io_kiocb *req,
                                       unsigned issue_flags)
 {
+       unsigned int ret;
 
-       if (!(req->flags & (REQ_F_BUFFER_SELECTED|REQ_F_BUFFER_RING)))
+       if (!(req->flags & (REQ_F_BUFFER_RING | REQ_F_BUFFER_SELECTED)))
                return 0;
-       return __io_put_kbuf(req, issue_flags);
+
+       ret = IORING_CQE_F_BUFFER | (req->buf_index << IORING_CQE_BUFFER_SHIFT);
+       if (req->flags & REQ_F_BUFFER_RING)
+               __io_put_kbuf_ring(req);
+       else
+               __io_put_kbuf(req, issue_flags);
+       return ret;
 }
 #endif
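
The cflags word built in io_put_kbuf() and io_put_kbuf_comp() above is what userspace later decodes to learn which provided buffer a completion used. A minimal decode sketch against the UAPI constants (the helper name is illustrative, not from this patch):

    #include <liburing.h>  /* struct io_uring_cqe, IORING_CQE_* constants */

    static int cqe_buffer_id(const struct io_uring_cqe *cqe)
    {
            /* IORING_CQE_F_BUFFER marks a CQE carrying a buffer id in its
             * upper flag bits, packed exactly as in io_put_kbuf() above. */
            if (!(cqe->flags & IORING_CQE_F_BUFFER))
                    return -1;
            return cqe->flags >> IORING_CQE_BUFFER_SHIFT;
    }
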
diff --git a/io_uring/napi.c b/io_uring/napi.c
new file mode 100644
index 0000000..883a1a6
--- /dev/null
@@ -0,0 +1,332 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "io_uring.h"
+#include "napi.h"
+
+#ifdef CONFIG_NET_RX_BUSY_POLL
+
+/* Timeout for cleaning out stale entries. */
+#define NAPI_TIMEOUT           (60 * SEC_CONVERSION)
+
+struct io_napi_entry {
+       unsigned int            napi_id;
+       struct list_head        list;
+
+       unsigned long           timeout;
+       struct hlist_node       node;
+
+       struct rcu_head         rcu;
+};
+
+static struct io_napi_entry *io_napi_hash_find(struct hlist_head *hash_list,
+                                              unsigned int napi_id)
+{
+       struct io_napi_entry *e;
+
+       hlist_for_each_entry_rcu(e, hash_list, node) {
+               if (e->napi_id != napi_id)
+                       continue;
+               e->timeout = jiffies + NAPI_TIMEOUT;
+               return e;
+       }
+
+       return NULL;
+}
+
+void __io_napi_add(struct io_ring_ctx *ctx, struct socket *sock)
+{
+       struct hlist_head *hash_list;
+       unsigned int napi_id;
+       struct sock *sk;
+       struct io_napi_entry *e;
+
+       sk = sock->sk;
+       if (!sk)
+               return;
+
+       napi_id = READ_ONCE(sk->sk_napi_id);
+
+       /* Reject non-NAPI IDs. */
+       if (napi_id < MIN_NAPI_ID)
+               return;
+
+       hash_list = &ctx->napi_ht[hash_min(napi_id, HASH_BITS(ctx->napi_ht))];
+
+       rcu_read_lock();
+       e = io_napi_hash_find(hash_list, napi_id);
+       if (e) {
+               e->timeout = jiffies + NAPI_TIMEOUT;
+               rcu_read_unlock();
+               return;
+       }
+       rcu_read_unlock();
+
+       e = kmalloc(sizeof(*e), GFP_NOWAIT);
+       if (!e)
+               return;
+
+       e->napi_id = napi_id;
+       e->timeout = jiffies + NAPI_TIMEOUT;
+
+       spin_lock(&ctx->napi_lock);
+       if (unlikely(io_napi_hash_find(hash_list, napi_id))) {
+               spin_unlock(&ctx->napi_lock);
+               kfree(e);
+               return;
+       }
+
+       hlist_add_tail_rcu(&e->node, hash_list);
+       list_add_tail(&e->list, &ctx->napi_list);
+       spin_unlock(&ctx->napi_lock);
+}
+
+static void __io_napi_remove_stale(struct io_ring_ctx *ctx)
+{
+       struct io_napi_entry *e;
+       unsigned int i;
+
+       spin_lock(&ctx->napi_lock);
+       hash_for_each(ctx->napi_ht, i, e, node) {
+               if (time_after(jiffies, e->timeout)) {
+                       list_del(&e->list);
+                       hash_del_rcu(&e->node);
+                       kfree_rcu(e, rcu);
+               }
+       }
+       spin_unlock(&ctx->napi_lock);
+}
+
+static inline void io_napi_remove_stale(struct io_ring_ctx *ctx, bool is_stale)
+{
+       if (is_stale)
+               __io_napi_remove_stale(ctx);
+}
+
+static inline bool io_napi_busy_loop_timeout(unsigned long start_time,
+                                            unsigned long bp_usec)
+{
+       if (bp_usec) {
+               unsigned long end_time = start_time + bp_usec;
+               unsigned long now = busy_loop_current_time();
+
+               return time_after(now, end_time);
+       }
+
+       return true;
+}
+
+static bool io_napi_busy_loop_should_end(void *data,
+                                        unsigned long start_time)
+{
+       struct io_wait_queue *iowq = data;
+
+       if (signal_pending(current))
+               return true;
+       if (io_should_wake(iowq) || io_has_work(iowq->ctx))
+               return true;
+       if (io_napi_busy_loop_timeout(start_time, iowq->napi_busy_poll_to))
+               return true;
+
+       return false;
+}
+
+static bool __io_napi_do_busy_loop(struct io_ring_ctx *ctx,
+                                  void *loop_end_arg)
+{
+       struct io_napi_entry *e;
+       bool (*loop_end)(void *, unsigned long) = NULL;
+       bool is_stale = false;
+
+       if (loop_end_arg)
+               loop_end = io_napi_busy_loop_should_end;
+
+       list_for_each_entry_rcu(e, &ctx->napi_list, list) {
+               napi_busy_loop_rcu(e->napi_id, loop_end, loop_end_arg,
+                                  ctx->napi_prefer_busy_poll, BUSY_POLL_BUDGET);
+
+               if (time_after(jiffies, e->timeout))
+                       is_stale = true;
+       }
+
+       return is_stale;
+}
+
+static void io_napi_blocking_busy_loop(struct io_ring_ctx *ctx,
+                                      struct io_wait_queue *iowq)
+{
+       unsigned long start_time = busy_loop_current_time();
+       void *loop_end_arg = NULL;
+       bool is_stale = false;
+
+       /*
+        * A singular list uses a different napi loop end check function and
+        * is only executed once.
+        */
+       if (list_is_singular(&ctx->napi_list))
+               loop_end_arg = iowq;
+
+       rcu_read_lock();
+       do {
+               is_stale = __io_napi_do_busy_loop(ctx, loop_end_arg);
+       } while (!io_napi_busy_loop_should_end(iowq, start_time) && !loop_end_arg);
+       rcu_read_unlock();
+
+       io_napi_remove_stale(ctx, is_stale);
+}
+
+/*
+ * io_napi_init() - Init napi settings
+ * @ctx: pointer to io-uring context structure
+ *
+ * Init napi settings in the io-uring context.
+ */
+void io_napi_init(struct io_ring_ctx *ctx)
+{
+       INIT_LIST_HEAD(&ctx->napi_list);
+       spin_lock_init(&ctx->napi_lock);
+       ctx->napi_prefer_busy_poll = false;
+       ctx->napi_busy_poll_to = READ_ONCE(sysctl_net_busy_poll);
+}
+
+/*
+ * io_napi_free() - Deallocate napi
+ * @ctx: pointer to io-uring context structure
+ *
+ * Free the napi list and the hash table in the io-uring context.
+ */
+void io_napi_free(struct io_ring_ctx *ctx)
+{
+       struct io_napi_entry *e;
+       LIST_HEAD(napi_list);
+       unsigned int i;
+
+       spin_lock(&ctx->napi_lock);
+       hash_for_each(ctx->napi_ht, i, e, node) {
+               hash_del_rcu(&e->node);
+               kfree_rcu(e, rcu);
+       }
+       spin_unlock(&ctx->napi_lock);
+}
+
+/*
+ * io_register_napi() - Register napi with io-uring
+ * @ctx: pointer to io-uring context structure
+ * @arg: pointer to io_uring_napi structure
+ *
+ * Register napi in the io-uring context.
+ */
+int io_register_napi(struct io_ring_ctx *ctx, void __user *arg)
+{
+       const struct io_uring_napi curr = {
+               .busy_poll_to     = ctx->napi_busy_poll_to,
+               .prefer_busy_poll = ctx->napi_prefer_busy_poll
+       };
+       struct io_uring_napi napi;
+
+       if (copy_from_user(&napi, arg, sizeof(napi)))
+               return -EFAULT;
+       if (napi.pad[0] || napi.pad[1] || napi.pad[2] || napi.resv)
+               return -EINVAL;
+
+       if (copy_to_user(arg, &curr, sizeof(curr)))
+               return -EFAULT;
+
+       WRITE_ONCE(ctx->napi_busy_poll_to, napi.busy_poll_to);
+       WRITE_ONCE(ctx->napi_prefer_busy_poll, !!napi.prefer_busy_poll);
+       WRITE_ONCE(ctx->napi_enabled, true);
+       return 0;
+}
+
+/*
+ * io_unregister_napi() - Unregister napi with io-uring
+ * @ctx: pointer to io-uring context structure
+ * @arg: pointer to io_uring_napi structure
+ *
+ * Unregister napi. If arg has been specified, copy the current busy poll
+ * timeout and prefer busy poll setting to the passed-in structure.
+ */
+int io_unregister_napi(struct io_ring_ctx *ctx, void __user *arg)
+{
+       const struct io_uring_napi curr = {
+               .busy_poll_to     = ctx->napi_busy_poll_to,
+               .prefer_busy_poll = ctx->napi_prefer_busy_poll
+       };
+
+       if (arg && copy_to_user(arg, &curr, sizeof(curr)))
+               return -EFAULT;
+
+       WRITE_ONCE(ctx->napi_busy_poll_to, 0);
+       WRITE_ONCE(ctx->napi_prefer_busy_poll, false);
+       WRITE_ONCE(ctx->napi_enabled, false);
+       return 0;
+}
+
+/*
+ * __io_napi_adjust_timeout() - Adjust busy loop timeout
+ * @ctx: pointer to io-uring context structure
+ * @iowq: pointer to io wait queue
+ * @ts: pointer to timespec or NULL
+ *
+ * Adjust the busy loop timeout according to timespec and busy poll timeout.
+ */
+void __io_napi_adjust_timeout(struct io_ring_ctx *ctx, struct io_wait_queue *iowq,
+                             struct timespec64 *ts)
+{
+       unsigned int poll_to = READ_ONCE(ctx->napi_busy_poll_to);
+
+       if (ts) {
+               struct timespec64 poll_to_ts = ns_to_timespec64(1000 * (s64)poll_to);
+
+               if (timespec64_compare(ts, &poll_to_ts) > 0) {
+                       *ts = timespec64_sub(*ts, poll_to_ts);
+               } else {
+                       u64 to = timespec64_to_ns(ts);
+
+                       do_div(to, 1000);
+                       ts->tv_sec = 0;
+                       ts->tv_nsec = 0;
+               }
+       }
+
+       iowq->napi_busy_poll_to = poll_to;
+}
+
+/*
+ * __io_napi_busy_loop() - Execute busy poll loop
+ * @ctx: pointer to io-uring context structure
+ * @iowq: pointer to io wait queue
+ *
+ * Execute the busy poll loop when napi is enabled and the ring does not
+ * use SQPOLL.
+ */
+void __io_napi_busy_loop(struct io_ring_ctx *ctx, struct io_wait_queue *iowq)
+{
+       iowq->napi_prefer_busy_poll = READ_ONCE(ctx->napi_prefer_busy_poll);
+
+       if (!(ctx->flags & IORING_SETUP_SQPOLL) && ctx->napi_enabled)
+               io_napi_blocking_busy_loop(ctx, iowq);
+}
+
+/*
+ * io_napi_sqpoll_busy_poll() - Busy poll loop for sqpoll
+ * @ctx: pointer to io-uring context structure
+ *
+ * Execute the napi busy poll loop on behalf of the sqpoll thread.
+ */
+int io_napi_sqpoll_busy_poll(struct io_ring_ctx *ctx)
+{
+       LIST_HEAD(napi_list);
+       bool is_stale = false;
+
+       if (!READ_ONCE(ctx->napi_busy_poll_to))
+               return 0;
+       if (list_empty_careful(&ctx->napi_list))
+               return 0;
+
+       rcu_read_lock();
+       is_stale = __io_napi_do_busy_loop(ctx, NULL);
+       rcu_read_unlock();
+
+       io_napi_remove_stale(ctx, is_stale);
+       return 1;
+}
+
+#endif
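
The register/unregister entry points above are reached through io_uring_register(2) with the new IORING_REGISTER_NAPI / IORING_UNREGISTER_NAPI opcodes. A hedged userspace sketch; the io_uring_register_napi() wrapper name is from liburing 2.6 and is an assumption here, not part of this patch:

    #include <liburing.h>

    static int enable_busy_poll(struct io_uring *ring)
    {
            struct io_uring_napi napi = {
                    .busy_poll_to     = 50, /* usec of busy polling per wait */
                    .prefer_busy_poll = 1,
            };

            /* The kernel writes the previous settings back into 'napi'
             * before applying the new ones, as io_register_napi() shows. */
            return io_uring_register_napi(ring, &napi);
    }

Note that io_napi_init() seeds the timeout from the net.core.busy_poll sysctl; the sqpoll path above honors that default even without registration, while the blocking path additionally requires napi_enabled to be set via registration.
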
diff --git a/io_uring/napi.h b/io_uring/napi.h
new file mode 100644
index 0000000..6fc0393
--- /dev/null
@@ -0,0 +1,104 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef IOU_NAPI_H
+#define IOU_NAPI_H
+
+#include <linux/kernel.h>
+#include <linux/io_uring.h>
+#include <net/busy_poll.h>
+
+#ifdef CONFIG_NET_RX_BUSY_POLL
+
+void io_napi_init(struct io_ring_ctx *ctx);
+void io_napi_free(struct io_ring_ctx *ctx);
+
+int io_register_napi(struct io_ring_ctx *ctx, void __user *arg);
+int io_unregister_napi(struct io_ring_ctx *ctx, void __user *arg);
+
+void __io_napi_add(struct io_ring_ctx *ctx, struct socket *sock);
+
+void __io_napi_adjust_timeout(struct io_ring_ctx *ctx,
+               struct io_wait_queue *iowq, struct timespec64 *ts);
+void __io_napi_busy_loop(struct io_ring_ctx *ctx, struct io_wait_queue *iowq);
+int io_napi_sqpoll_busy_poll(struct io_ring_ctx *ctx);
+
+static inline bool io_napi(struct io_ring_ctx *ctx)
+{
+       return !list_empty(&ctx->napi_list);
+}
+
+static inline void io_napi_adjust_timeout(struct io_ring_ctx *ctx,
+                                         struct io_wait_queue *iowq,
+                                         struct timespec64 *ts)
+{
+       if (!io_napi(ctx))
+               return;
+       __io_napi_adjust_timeout(ctx, iowq, ts);
+}
+
+static inline void io_napi_busy_loop(struct io_ring_ctx *ctx,
+                                    struct io_wait_queue *iowq)
+{
+       if (!io_napi(ctx))
+               return;
+       __io_napi_busy_loop(ctx, iowq);
+}
+
+/*
+ * io_napi_add() - Add napi id to the busy poll list
+ * @req: pointer to io_kiocb request
+ *
+ * Add the napi id of the socket to the napi busy poll list and hash table.
+ */
+static inline void io_napi_add(struct io_kiocb *req)
+{
+       struct io_ring_ctx *ctx = req->ctx;
+       struct socket *sock;
+
+       if (!READ_ONCE(ctx->napi_busy_poll_to))
+               return;
+
+       sock = sock_from_file(req->file);
+       if (sock)
+               __io_napi_add(ctx, sock);
+}
+
+#else
+
+static inline void io_napi_init(struct io_ring_ctx *ctx)
+{
+}
+static inline void io_napi_free(struct io_ring_ctx *ctx)
+{
+}
+static inline int io_register_napi(struct io_ring_ctx *ctx, void __user *arg)
+{
+       return -EOPNOTSUPP;
+}
+static inline int io_unregister_napi(struct io_ring_ctx *ctx, void __user *arg)
+{
+       return -EOPNOTSUPP;
+}
+static inline bool io_napi(struct io_ring_ctx *ctx)
+{
+       return false;
+}
+static inline void io_napi_add(struct io_kiocb *req)
+{
+}
+static inline void io_napi_adjust_timeout(struct io_ring_ctx *ctx,
+                                         struct io_wait_queue *iowq,
+                                         struct timespec64 *ts)
+{
+}
+static inline void io_napi_busy_loop(struct io_ring_ctx *ctx,
+                                    struct io_wait_queue *iowq)
+{
+}
+static inline int io_napi_sqpoll_busy_poll(struct io_ring_ctx *ctx)
+{
+       return 0;
+}
+#endif /* CONFIG_NET_RX_BUSY_POLL */
+
+#endif
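
To make the io_napi_adjust_timeout() split declared above concrete, a worked sketch with assumed numbers (plain userspace C, illustrative only): with a 50 usec busy-poll budget and a 200 usec caller timeout, the wait busy polls for 50 usec and sleeps for the remaining 150.

    #include <stdio.h>

    int main(void)
    {
            /* mirrors the split in __io_napi_adjust_timeout() */
            long long poll_ns = 50 * 1000LL;   /* busy-poll budget: 50 usec */
            long long wait_ns = 200 * 1000LL;  /* caller timeout: 200 usec */

            if (wait_ns > poll_ns)
                    wait_ns -= poll_ns;        /* sleeping wait gets the rest */
            else
                    wait_ns = 0;               /* busy poll consumes it all */

            printf("busy poll %lld ns, then sleep %lld ns\n", poll_ns, wait_ns);
            return 0;
    }
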
diff --git a/io_uring/net.c b/io_uring/net.c
index 75d494dad7e2c7b22a53f50fc422d807a0559000..19451f0dbf813664f9b26e0a27c12b52be080944 100644
@@ -60,6 +60,7 @@ struct io_sr_msg {
        unsigned                        len;
        unsigned                        done_io;
        unsigned                        msg_flags;
+       unsigned                        nr_multishot_loops;
        u16                             flags;
        /* initialised and used only by !msg send variants */
        u16                             addr_len;
@@ -70,18 +71,12 @@ struct io_sr_msg {
        struct io_kiocb                 *notif;
 };
 
-static inline bool io_check_multishot(struct io_kiocb *req,
-                                     unsigned int issue_flags)
-{
-       /*
-        * When ->locked_cq is set we only allow to post CQEs from the original
-        * task context. Usual request completions will be handled in other
-        * generic paths but multipoll may decide to post extra cqes.
-        */
-       return !(issue_flags & IO_URING_F_IOWQ) ||
-               !(issue_flags & IO_URING_F_MULTISHOT) ||
-               !req->ctx->task_complete;
-}
+/*
+ * Number of times we'll try to do receives if there's more data. If we
+ * exceed this limit, then add ourselves to the back of the queue and retry
+ * from there. This helps maintain fairness between flooding clients.
+ */
+#define MULTISHOT_MAX_RETRY    32
 
 int io_shutdown_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
@@ -196,16 +191,130 @@ static int io_setup_async_msg(struct io_kiocb *req,
        return -EAGAIN;
 }
 
+#ifdef CONFIG_COMPAT
+static int io_compat_msg_copy_hdr(struct io_kiocb *req,
+                                 struct io_async_msghdr *iomsg,
+                                 struct compat_msghdr *msg, int ddir)
+{
+       struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
+       struct compat_iovec __user *uiov;
+       int ret;
+
+       if (copy_from_user(msg, sr->umsg_compat, sizeof(*msg)))
+               return -EFAULT;
+
+       uiov = compat_ptr(msg->msg_iov);
+       if (req->flags & REQ_F_BUFFER_SELECT) {
+               compat_ssize_t clen;
+
+               iomsg->free_iov = NULL;
+               if (msg->msg_iovlen == 0) {
+                       sr->len = 0;
+               } else if (msg->msg_iovlen > 1) {
+                       return -EINVAL;
+               } else {
+                       if (!access_ok(uiov, sizeof(*uiov)))
+                               return -EFAULT;
+                       if (__get_user(clen, &uiov->iov_len))
+                               return -EFAULT;
+                       if (clen < 0)
+                               return -EINVAL;
+                       sr->len = clen;
+               }
+
+               return 0;
+       }
+
+       iomsg->free_iov = iomsg->fast_iov;
+       ret = __import_iovec(ddir, (struct iovec __user *)uiov, msg->msg_iovlen,
+                               UIO_FASTIOV, &iomsg->free_iov,
+                               &iomsg->msg.msg_iter, true);
+       if (unlikely(ret < 0))
+               return ret;
+
+       return 0;
+}
+#endif
+
+static int io_msg_copy_hdr(struct io_kiocb *req, struct io_async_msghdr *iomsg,
+                          struct user_msghdr *msg, int ddir)
+{
+       struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
+       int ret;
+
+       if (!user_access_begin(sr->umsg, sizeof(*sr->umsg)))
+               return -EFAULT;
+
+       ret = -EFAULT;
+       unsafe_get_user(msg->msg_name, &sr->umsg->msg_name, ua_end);
+       unsafe_get_user(msg->msg_namelen, &sr->umsg->msg_namelen, ua_end);
+       unsafe_get_user(msg->msg_iov, &sr->umsg->msg_iov, ua_end);
+       unsafe_get_user(msg->msg_iovlen, &sr->umsg->msg_iovlen, ua_end);
+       unsafe_get_user(msg->msg_control, &sr->umsg->msg_control, ua_end);
+       unsafe_get_user(msg->msg_controllen, &sr->umsg->msg_controllen, ua_end);
+       msg->msg_flags = 0;
+
+       if (req->flags & REQ_F_BUFFER_SELECT) {
+               if (msg->msg_iovlen == 0) {
+                       sr->len = iomsg->fast_iov[0].iov_len = 0;
+                       iomsg->fast_iov[0].iov_base = NULL;
+                       iomsg->free_iov = NULL;
+               } else if (msg->msg_iovlen > 1) {
+                       ret = -EINVAL;
+                       goto ua_end;
+               } else {
+                       /* we only need the length for provided buffers */
+                       if (!access_ok(&msg->msg_iov[0].iov_len, sizeof(__kernel_size_t)))
+                               goto ua_end;
+                       unsafe_get_user(iomsg->fast_iov[0].iov_len,
+                                       &msg->msg_iov[0].iov_len, ua_end);
+                       sr->len = iomsg->fast_iov[0].iov_len;
+                       iomsg->free_iov = NULL;
+               }
+               ret = 0;
+ua_end:
+               user_access_end();
+               return ret;
+       }
+
+       user_access_end();
+       iomsg->free_iov = iomsg->fast_iov;
+       ret = __import_iovec(ddir, msg->msg_iov, msg->msg_iovlen, UIO_FASTIOV,
+                               &iomsg->free_iov, &iomsg->msg.msg_iter, false);
+       if (unlikely(ret < 0))
+               return ret;
+
+       return 0;
+}
+
 static int io_sendmsg_copy_hdr(struct io_kiocb *req,
                               struct io_async_msghdr *iomsg)
 {
        struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
+       struct user_msghdr msg;
        int ret;
 
        iomsg->msg.msg_name = &iomsg->addr;
-       iomsg->free_iov = iomsg->fast_iov;
-       ret = sendmsg_copy_msghdr(&iomsg->msg, sr->umsg, sr->msg_flags,
-                                       &iomsg->free_iov);
+       iomsg->msg.msg_iter.nr_segs = 0;
+
+#ifdef CONFIG_COMPAT
+       if (unlikely(req->ctx->compat)) {
+               struct compat_msghdr cmsg;
+
+               ret = io_compat_msg_copy_hdr(req, iomsg, &cmsg, ITER_SOURCE);
+               if (unlikely(ret))
+                       return ret;
+
+               return __get_compat_msghdr(&iomsg->msg, &cmsg, NULL);
+       }
+#endif
+
+       ret = io_msg_copy_hdr(req, iomsg, &msg, ITER_SOURCE);
+       if (unlikely(ret))
+               return ret;
+
+       ret = __copy_msghdr(&iomsg->msg, &msg, NULL);
+
        /* save msg_control as sys_sendmsg() overwrites it */
        sr->msg_control = iomsg->msg.msg_control_user;
        return ret;
@@ -265,6 +374,8 @@ int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
        struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
 
+       sr->done_io = 0;
+
        if (req->opcode == IORING_OP_SEND) {
                if (READ_ONCE(sqe->__pad3[0]))
                        return -EINVAL;
@@ -287,10 +398,20 @@ int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
        if (req->ctx->compat)
                sr->msg_flags |= MSG_CMSG_COMPAT;
 #endif
-       sr->done_io = 0;
        return 0;
 }
 
+static void io_req_msg_cleanup(struct io_kiocb *req,
+                              struct io_async_msghdr *kmsg,
+                              unsigned int issue_flags)
+{
+       req->flags &= ~REQ_F_NEED_CLEANUP;
+       /* fast path, check for non-NULL to avoid function call */
+       if (kmsg->free_iov)
+               kfree(kmsg->free_iov);
+       io_netmsg_recycle(req, issue_flags);
+}
+
 int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
 {
        struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
@@ -333,18 +454,14 @@ int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
                        kmsg->msg.msg_controllen = 0;
                        kmsg->msg.msg_control = NULL;
                        sr->done_io += ret;
-                       req->flags |= REQ_F_PARTIAL_IO;
+                       req->flags |= REQ_F_BL_NO_RECYCLE;
                        return io_setup_async_msg(req, kmsg, issue_flags);
                }
                if (ret == -ERESTARTSYS)
                        ret = -EINTR;
                req_set_fail(req);
        }
-       /* fast path, check for non-NULL to avoid function call */
-       if (kmsg->free_iov)
-               kfree(kmsg->free_iov);
-       req->flags &= ~REQ_F_NEED_CLEANUP;
-       io_netmsg_recycle(req, issue_flags);
+       io_req_msg_cleanup(req, kmsg, issue_flags);
        if (ret >= 0)
                ret += sr->done_io;
        else if (sr->done_io)
@@ -412,7 +529,7 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags)
                        sr->len -= ret;
                        sr->buf += ret;
                        sr->done_io += ret;
-                       req->flags |= REQ_F_PARTIAL_IO;
+                       req->flags |= REQ_F_BL_NO_RECYCLE;
                        return io_setup_async_addr(req, &__address, issue_flags);
                }
                if (ret == -ERESTARTSYS)
@@ -427,142 +544,77 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags)
        return IOU_OK;
 }
 
-static bool io_recvmsg_multishot_overflow(struct io_async_msghdr *iomsg)
+static int io_recvmsg_mshot_prep(struct io_kiocb *req,
+                                struct io_async_msghdr *iomsg,
+                                int namelen, size_t controllen)
 {
-       int hdr;
-
-       if (iomsg->namelen < 0)
-               return true;
-       if (check_add_overflow((int)sizeof(struct io_uring_recvmsg_out),
-                              iomsg->namelen, &hdr))
-               return true;
-       if (check_add_overflow(hdr, (int)iomsg->controllen, &hdr))
-               return true;
+       if ((req->flags & (REQ_F_APOLL_MULTISHOT|REQ_F_BUFFER_SELECT)) ==
+                         (REQ_F_APOLL_MULTISHOT|REQ_F_BUFFER_SELECT)) {
+               int hdr;
+
+               if (unlikely(namelen < 0))
+                       return -EOVERFLOW;
+               if (check_add_overflow(sizeof(struct io_uring_recvmsg_out),
+                                       namelen, &hdr))
+                       return -EOVERFLOW;
+               if (check_add_overflow(hdr, controllen, &hdr))
+                       return -EOVERFLOW;
+
+               iomsg->namelen = namelen;
+               iomsg->controllen = controllen;
+               return 0;
+       }
 
-       return false;
+       return 0;
 }
 
-static int __io_recvmsg_copy_hdr(struct io_kiocb *req,
-                                struct io_async_msghdr *iomsg)
+static int io_recvmsg_copy_hdr(struct io_kiocb *req,
+                              struct io_async_msghdr *iomsg)
 {
-       struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
        struct user_msghdr msg;
        int ret;
 
-       if (copy_from_user(&msg, sr->umsg, sizeof(*sr->umsg)))
-               return -EFAULT;
-
-       ret = __copy_msghdr(&iomsg->msg, &msg, &iomsg->uaddr);
-       if (ret)
-               return ret;
-
-       if (req->flags & REQ_F_BUFFER_SELECT) {
-               if (msg.msg_iovlen == 0) {
-                       sr->len = iomsg->fast_iov[0].iov_len = 0;
-                       iomsg->fast_iov[0].iov_base = NULL;
-                       iomsg->free_iov = NULL;
-               } else if (msg.msg_iovlen > 1) {
-                       return -EINVAL;
-               } else {
-                       if (copy_from_user(iomsg->fast_iov, msg.msg_iov, sizeof(*msg.msg_iov)))
-                               return -EFAULT;
-                       sr->len = iomsg->fast_iov[0].iov_len;
-                       iomsg->free_iov = NULL;
-               }
-
-               if (req->flags & REQ_F_APOLL_MULTISHOT) {
-                       iomsg->namelen = msg.msg_namelen;
-                       iomsg->controllen = msg.msg_controllen;
-                       if (io_recvmsg_multishot_overflow(iomsg))
-                               return -EOVERFLOW;
-               }
-       } else {
-               iomsg->free_iov = iomsg->fast_iov;
-               ret = __import_iovec(ITER_DEST, msg.msg_iov, msg.msg_iovlen, UIO_FASTIOV,
-                                    &iomsg->free_iov, &iomsg->msg.msg_iter,
-                                    false);
-               if (ret > 0)
-                       ret = 0;
-       }
-
-       return ret;
-}
+       iomsg->msg.msg_name = &iomsg->addr;
+       iomsg->msg.msg_iter.nr_segs = 0;
 
 #ifdef CONFIG_COMPAT
-static int __io_compat_recvmsg_copy_hdr(struct io_kiocb *req,
-                                       struct io_async_msghdr *iomsg)
-{
-       struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
-       struct compat_msghdr msg;
-       struct compat_iovec __user *uiov;
-       int ret;
+       if (unlikely(req->ctx->compat)) {
+               struct compat_msghdr cmsg;
 
-       if (copy_from_user(&msg, sr->umsg_compat, sizeof(msg)))
-               return -EFAULT;
-
-       ret = __get_compat_msghdr(&iomsg->msg, &msg, &iomsg->uaddr);
-       if (ret)
-               return ret;
-
-       uiov = compat_ptr(msg.msg_iov);
-       if (req->flags & REQ_F_BUFFER_SELECT) {
-               compat_ssize_t clen;
-
-               iomsg->free_iov = NULL;
-               if (msg.msg_iovlen == 0) {
-                       sr->len = 0;
-               } else if (msg.msg_iovlen > 1) {
-                       return -EINVAL;
-               } else {
-                       if (!access_ok(uiov, sizeof(*uiov)))
-                               return -EFAULT;
-                       if (__get_user(clen, &uiov->iov_len))
-                               return -EFAULT;
-                       if (clen < 0)
-                               return -EINVAL;
-                       sr->len = clen;
-               }
+               ret = io_compat_msg_copy_hdr(req, iomsg, &cmsg, ITER_DEST);
+               if (unlikely(ret))
+                       return ret;
 
-               if (req->flags & REQ_F_APOLL_MULTISHOT) {
-                       iomsg->namelen = msg.msg_namelen;
-                       iomsg->controllen = msg.msg_controllen;
-                       if (io_recvmsg_multishot_overflow(iomsg))
-                               return -EOVERFLOW;
-               }
-       } else {
-               iomsg->free_iov = iomsg->fast_iov;
-               ret = __import_iovec(ITER_DEST, (struct iovec __user *)uiov, msg.msg_iovlen,
-                                  UIO_FASTIOV, &iomsg->free_iov,
-                                  &iomsg->msg.msg_iter, true);
-               if (ret < 0)
+               ret = __get_compat_msghdr(&iomsg->msg, &cmsg, &iomsg->uaddr);
+               if (unlikely(ret))
                        return ret;
-       }
 
-       return 0;
-}
+               return io_recvmsg_mshot_prep(req, iomsg, cmsg.msg_namelen,
+                                               cmsg.msg_controllen);
+       }
 #endif
 
-static int io_recvmsg_copy_hdr(struct io_kiocb *req,
-                              struct io_async_msghdr *iomsg)
-{
-       iomsg->msg.msg_name = &iomsg->addr;
-       iomsg->msg.msg_iter.nr_segs = 0;
+       ret = io_msg_copy_hdr(req, iomsg, &msg, ITER_DEST);
+       if (unlikely(ret))
+               return ret;
 
-#ifdef CONFIG_COMPAT
-       if (req->ctx->compat)
-               return __io_compat_recvmsg_copy_hdr(req, iomsg);
-#endif
+       ret = __copy_msghdr(&iomsg->msg, &msg, &iomsg->uaddr);
+       if (unlikely(ret))
+               return ret;
 
-       return __io_recvmsg_copy_hdr(req, iomsg);
+       return io_recvmsg_mshot_prep(req, iomsg, msg.msg_namelen,
+                                       msg.msg_controllen);
 }
 
 int io_recvmsg_prep_async(struct io_kiocb *req)
 {
+       struct io_async_msghdr *iomsg;
        int ret;
 
        if (!io_msg_alloc_async_prep(req))
                return -ENOMEM;
-       ret = io_recvmsg_copy_hdr(req, req->async_data);
+       iomsg = req->async_data;
+       ret = io_recvmsg_copy_hdr(req, iomsg);
        if (!ret)
                req->flags |= REQ_F_NEED_CLEANUP;
        return ret;
@@ -574,6 +626,8 @@ int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
        struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
 
+       sr->done_io = 0;
+
        if (unlikely(sqe->file_index || sqe->addr2))
                return -EINVAL;
 
@@ -610,7 +664,7 @@ int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
        if (req->ctx->compat)
                sr->msg_flags |= MSG_CMSG_COMPAT;
 #endif
-       sr->done_io = 0;
+       sr->nr_multishot_loops = 0;
        return 0;
 }
 
@@ -618,6 +672,7 @@ static inline void io_recv_prep_retry(struct io_kiocb *req)
 {
        struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
 
+       req->flags &= ~REQ_F_BL_EMPTY;
        sr->done_io = 0;
        sr->len = 0; /* get from the provided buffer */
        req->buf_index = sr->buf_group;
@@ -636,32 +691,36 @@ static inline bool io_recv_finish(struct io_kiocb *req, int *ret,
        unsigned int cflags;
 
        cflags = io_put_kbuf(req, issue_flags);
-       if (msg->msg_inq && msg->msg_inq != -1)
+       if (msg->msg_inq > 0)
                cflags |= IORING_CQE_F_SOCK_NONEMPTY;
 
-       if (!(req->flags & REQ_F_APOLL_MULTISHOT)) {
-               io_req_set_res(req, *ret, cflags);
-               *ret = IOU_OK;
-               return true;
-       }
-
-       if (!mshot_finished) {
-               if (io_fill_cqe_req_aux(req, issue_flags & IO_URING_F_COMPLETE_DEFER,
-                                       *ret, cflags | IORING_CQE_F_MORE)) {
-                       io_recv_prep_retry(req);
-                       /* Known not-empty or unknown state, retry */
-                       if (cflags & IORING_CQE_F_SOCK_NONEMPTY ||
-                           msg->msg_inq == -1)
+       /*
+        * Fill CQE for this receive and see if we should keep trying to
+        * receive from this socket.
+        */
+       if ((req->flags & REQ_F_APOLL_MULTISHOT) && !mshot_finished &&
+           io_fill_cqe_req_aux(req, issue_flags & IO_URING_F_COMPLETE_DEFER,
+                               *ret, cflags | IORING_CQE_F_MORE)) {
+               struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
+               int mshot_retry_ret = IOU_ISSUE_SKIP_COMPLETE;
+
+               io_recv_prep_retry(req);
+               /* Known not-empty or unknown state, retry */
+               if (cflags & IORING_CQE_F_SOCK_NONEMPTY || msg->msg_inq < 0) {
+                       if (sr->nr_multishot_loops++ < MULTISHOT_MAX_RETRY)
                                return false;
-                       if (issue_flags & IO_URING_F_MULTISHOT)
-                               *ret = IOU_ISSUE_SKIP_COMPLETE;
-                       else
-                               *ret = -EAGAIN;
-                       return true;
+                       /* mshot retries exceeded, force a requeue */
+                       sr->nr_multishot_loops = 0;
+                       mshot_retry_ret = IOU_REQUEUE;
                }
-               /* Otherwise stop multishot but use the current result. */
+               if (issue_flags & IO_URING_F_MULTISHOT)
+                       *ret = mshot_retry_ret;
+               else
+                       *ret = -EAGAIN;
+               return true;
        }
 
+       /* Finish the request / stop multishot. */
        io_req_set_res(req, *ret, cflags);
 
        if (issue_flags & IO_URING_F_MULTISHOT)
@@ -782,8 +841,9 @@ int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
            (sr->flags & IORING_RECVSEND_POLL_FIRST))
                return io_setup_async_msg(req, kmsg, issue_flags);
 
-       if (!io_check_multishot(req, issue_flags))
-               return io_setup_async_msg(req, kmsg, issue_flags);
+       flags = sr->msg_flags;
+       if (force_nonblock)
+               flags |= MSG_DONTWAIT;
 
 retry_multishot:
        if (io_do_buffer_select(req)) {
@@ -805,10 +865,6 @@ retry_multishot:
                iov_iter_ubuf(&kmsg->msg.msg_iter, ITER_DEST, buf, len);
        }
 
-       flags = sr->msg_flags;
-       if (force_nonblock)
-               flags |= MSG_DONTWAIT;
-
        kmsg->msg.msg_get_inq = 1;
        kmsg->msg.msg_inq = -1;
        if (req->flags & REQ_F_APOLL_MULTISHOT) {
@@ -834,7 +890,7 @@ retry_multishot:
                }
                if (ret > 0 && io_net_retry(sock, flags)) {
                        sr->done_io += ret;
-                       req->flags |= REQ_F_PARTIAL_IO;
+                       req->flags |= REQ_F_BL_NO_RECYCLE;
                        return io_setup_async_msg(req, kmsg, issue_flags);
                }
                if (ret == -ERESTARTSYS)
@@ -854,13 +910,10 @@ retry_multishot:
        if (!io_recv_finish(req, &ret, &kmsg->msg, mshot_finished, issue_flags))
                goto retry_multishot;
 
-       if (mshot_finished) {
-               /* fast path, check for non-NULL to avoid function call */
-               if (kmsg->free_iov)
-                       kfree(kmsg->free_iov);
-               io_netmsg_recycle(req, issue_flags);
-               req->flags &= ~REQ_F_NEED_CLEANUP;
-       }
+       if (mshot_finished)
+               io_req_msg_cleanup(req, kmsg, issue_flags);
+       else if (ret == -EAGAIN)
+               return io_setup_async_msg(req, kmsg, issue_flags);
 
        return ret;
 }
@@ -879,9 +932,6 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
            (sr->flags & IORING_RECVSEND_POLL_FIRST))
                return -EAGAIN;
 
-       if (!io_check_multishot(req, issue_flags))
-               return -EAGAIN;
-
        sock = sock_from_file(req->file);
        if (unlikely(!sock))
                return -ENOTSOCK;
@@ -894,6 +944,10 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
        msg.msg_iocb = NULL;
        msg.msg_ubuf = NULL;
 
+       flags = sr->msg_flags;
+       if (force_nonblock)
+               flags |= MSG_DONTWAIT;
+
 retry_multishot:
        if (io_do_buffer_select(req)) {
                void __user *buf;
@@ -902,6 +956,7 @@ retry_multishot:
                if (!buf)
                        return -ENOBUFS;
                sr->buf = buf;
+               sr->len = len;
        }
 
        ret = import_ubuf(ITER_DEST, sr->buf, len, &msg.msg_iter);
@@ -911,9 +966,6 @@ retry_multishot:
        msg.msg_inq = -1;
        msg.msg_flags = 0;
 
-       flags = sr->msg_flags;
-       if (force_nonblock)
-               flags |= MSG_DONTWAIT;
        if (flags & MSG_WAITALL)
                min_ret = iov_iter_count(&msg.msg_iter);
 
@@ -931,7 +983,7 @@ retry_multishot:
                        sr->len -= ret;
                        sr->buf += ret;
                        sr->done_io += ret;
-                       req->flags |= REQ_F_PARTIAL_IO;
+                       req->flags |= REQ_F_BL_NO_RECYCLE;
                        return -EAGAIN;
                }
                if (ret == -ERESTARTSYS)
@@ -981,6 +1033,8 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
        struct io_ring_ctx *ctx = req->ctx;
        struct io_kiocb *notif;
 
+       zc->done_io = 0;
+
        if (unlikely(READ_ONCE(sqe->__pad2[0]) || READ_ONCE(sqe->addr3)))
                return -EINVAL;
        /* we don't support IOSQE_CQE_SKIP_SUCCESS just yet */
@@ -1033,8 +1087,6 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
        if (zc->msg_flags & MSG_DONTWAIT)
                req->flags |= REQ_F_NOWAIT;
 
-       zc->done_io = 0;
-
 #ifdef CONFIG_COMPAT
        if (req->ctx->compat)
                zc->msg_flags |= MSG_CMSG_COMPAT;
@@ -1174,7 +1226,7 @@ int io_send_zc(struct io_kiocb *req, unsigned int issue_flags)
                        zc->len -= ret;
                        zc->buf += ret;
                        zc->done_io += ret;
-                       req->flags |= REQ_F_PARTIAL_IO;
+                       req->flags |= REQ_F_BL_NO_RECYCLE;
                        return io_setup_async_addr(req, &__address, issue_flags);
                }
                if (ret == -ERESTARTSYS)
@@ -1244,7 +1296,7 @@ int io_sendmsg_zc(struct io_kiocb *req, unsigned int issue_flags)
 
                if (ret > 0 && io_net_retry(sock, flags)) {
                        sr->done_io += ret;
-                       req->flags |= REQ_F_PARTIAL_IO;
+                       req->flags |= REQ_F_BL_NO_RECYCLE;
                        return io_setup_async_msg(req, kmsg, issue_flags);
                }
                if (ret == -ERESTARTSYS)
@@ -1279,7 +1331,7 @@ void io_sendrecv_fail(struct io_kiocb *req)
 {
        struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
 
-       if (req->flags & REQ_F_PARTIAL_IO)
+       if (sr->done_io)
                req->cqe.res = sr->done_io;
 
        if ((req->flags & REQ_F_NEED_CLEANUP) &&
@@ -1329,8 +1381,6 @@ int io_accept(struct io_kiocb *req, unsigned int issue_flags)
        struct file *file;
        int ret, fd;
 
-       if (!io_check_multishot(req, issue_flags))
-               return -EAGAIN;
 retry:
        if (!fixed) {
                fd = __get_unused_fd_flags(accept->flags, accept->nofile);
@@ -1350,7 +1400,7 @@ retry:
                         * has already been done
                         */
                        if (issue_flags & IO_URING_F_MULTISHOT)
-                               ret = IOU_ISSUE_SKIP_COMPLETE;
+                               return IOU_ISSUE_SKIP_COMPLETE;
                        return ret;
                }
                if (ret == -ERESTARTSYS)
@@ -1375,7 +1425,8 @@ retry:
                                ret, IORING_CQE_F_MORE))
                goto retry;
 
-       return -ECANCELED;
+       io_req_set_res(req, ret, 0);
+       return IOU_STOP_MULTISHOT;
 }
 
 int io_socket_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
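
io_msg_copy_hdr() above trades a bulk copy_from_user() of the msghdr for user_access_begin()/unsafe_get_user(), fetching each field inside a single opened uaccess window (one SMAP/STAC-CLAC toggle instead of several). The shape of the pattern, isolated as a kernel-side sketch ('struct foo' and its fields are placeholders, not from this patch):

    /* Sketch only: not compilable outside kernel context. */
    static int copy_foo_from_user(struct foo *dst, struct foo __user *uptr)
    {
            int ret = -EFAULT;

            if (!user_access_begin(uptr, sizeof(*uptr)))
                    return -EFAULT;
            unsafe_get_user(dst->a, &uptr->a, out);  /* jumps to 'out' on fault */
            unsafe_get_user(dst->b, &uptr->b, out);
            ret = 0;
    out:
            user_access_end();  /* must pair with user_access_begin() on all paths */
            return ret;
    }
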
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index 6705634e5f52aa625b797f3c7931903c55130359..9c080aadc5a662f5fb9e8a15fd9a0f956405fb76 100644
@@ -35,6 +35,7 @@
 #include "rw.h"
 #include "waitid.h"
 #include "futex.h"
+#include "truncate.h"
 
 static int io_no_issue(struct io_kiocb *req, unsigned int issue_flags)
 {
@@ -471,10 +472,15 @@ const struct io_issue_def io_issue_defs[] = {
        },
        [IORING_OP_FIXED_FD_INSTALL] = {
                .needs_file             = 1,
-               .audit_skip             = 1,
                .prep                   = io_install_fixed_fd_prep,
                .issue                  = io_install_fixed_fd,
        },
+       [IORING_OP_FTRUNCATE] = {
+               .needs_file             = 1,
+               .hash_reg_file          = 1,
+               .prep                   = io_ftruncate_prep,
+               .issue                  = io_ftruncate,
+       },
 };
 
 const struct io_cold_def io_cold_defs[] = {
@@ -713,6 +719,9 @@ const struct io_cold_def io_cold_defs[] = {
        [IORING_OP_FIXED_FD_INSTALL] = {
                .name                   = "FIXED_FD_INSTALL",
        },
+       [IORING_OP_FTRUNCATE] = {
+               .name                   = "FTRUNCATE",
+       },
 };
 
 const char *io_uring_get_opcode(u8 opcode)
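
The new IORING_OP_FTRUNCATE entry sets hash_reg_file, so io-wq hashes punted truncates and serializes them with other hashed work on the same regular file. A hedged usage sketch; io_uring_prep_ftruncate() is the liburing 2.6 helper name and is an assumption here:

    #include <liburing.h>

    static int truncate_async(struct io_uring *ring, int fd, loff_t len)
    {
            struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

            if (!sqe)
                    return -EBUSY;  /* SQ ring full; submit and retry */
            io_uring_prep_ftruncate(sqe, fd, len);
            return io_uring_submit(ring);
    }
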
diff --git a/io_uring/openclose.c b/io_uring/openclose.c
index 0fe0dd30554623edb87cd159b4ffe0c52288211a..e3357dfa14ca42dd5b25e6cf9ce4a4be8b7ee0f4 100644
@@ -277,6 +277,10 @@ int io_install_fixed_fd_prep(struct io_kiocb *req, const struct io_uring_sqe *sq
        if (flags & ~IORING_FIXED_FD_NO_CLOEXEC)
                return -EINVAL;
 
+       /* ensure the task's creds are used when installing/receiving fds */
+       if (req->flags & REQ_F_CREDS)
+               return -EPERM;
+
        /* default to O_CLOEXEC, disable if IORING_FIXED_FD_NO_CLOEXEC is set */
        ifi = io_kiocb_to_cmd(req, struct io_fixed_install);
        ifi->o_flags = O_CLOEXEC;
index d59b74a99d4e4b444dcb2f86dc9d3594d838e1cf..5f779139cae1849e30f6caaae484c060f938febc 100644 (file)
@@ -15,6 +15,7 @@
 
 #include "io_uring.h"
 #include "refs.h"
+#include "napi.h"
 #include "opdef.h"
 #include "kbuf.h"
 #include "poll.h"
@@ -226,8 +227,29 @@ enum {
        IOU_POLL_NO_ACTION = 1,
        IOU_POLL_REMOVE_POLL_USE_RES = 2,
        IOU_POLL_REISSUE = 3,
+       IOU_POLL_REQUEUE = 4,
 };
 
+static void __io_poll_execute(struct io_kiocb *req, int mask)
+{
+       unsigned flags = 0;
+
+       io_req_set_res(req, mask, 0);
+       req->io_task_work.func = io_poll_task_func;
+
+       trace_io_uring_task_add(req, mask);
+
+       if (!(req->flags & REQ_F_POLL_NO_LAZY))
+               flags = IOU_F_TWQ_LAZY_WAKE;
+       __io_req_task_work_add(req, flags);
+}
+
+static inline void io_poll_execute(struct io_kiocb *req, int res)
+{
+       if (io_poll_get_ownership(req))
+               __io_poll_execute(req, res);
+}
+
 /*
  * All poll tw should go through this. Checks for poll events, manages
  * references, does rewait, etc.
@@ -309,6 +331,8 @@ static int io_poll_check_events(struct io_kiocb *req, struct io_tw_state *ts)
                        int ret = io_poll_issue(req, ts);
                        if (ret == IOU_STOP_MULTISHOT)
                                return IOU_POLL_REMOVE_POLL_USE_RES;
+                       else if (ret == IOU_REQUEUE)
+                               return IOU_POLL_REQUEUE;
                        if (ret < 0)
                                return ret;
                }
@@ -320,8 +344,8 @@ static int io_poll_check_events(struct io_kiocb *req, struct io_tw_state *ts)
                 * Release all references, retry if someone tried to restart
                 * task_work while we were executing it.
                 */
-       } while (atomic_sub_return(v & IO_POLL_REF_MASK, &req->poll_refs) &
-                                       IO_POLL_REF_MASK);
+               v &= IO_POLL_REF_MASK;
+       } while (atomic_sub_return(v, &req->poll_refs) & IO_POLL_REF_MASK);
 
        return IOU_POLL_NO_ACTION;
 }
@@ -331,8 +355,12 @@ void io_poll_task_func(struct io_kiocb *req, struct io_tw_state *ts)
        int ret;
 
        ret = io_poll_check_events(req, ts);
-       if (ret == IOU_POLL_NO_ACTION)
+       if (ret == IOU_POLL_NO_ACTION) {
                return;
+       } else if (ret == IOU_POLL_REQUEUE) {
+               __io_poll_execute(req, 0);
+               return;
+       }
        io_poll_remove_entries(req);
        io_poll_tw_hash_eject(req, ts);
 
@@ -364,26 +392,6 @@ void io_poll_task_func(struct io_kiocb *req, struct io_tw_state *ts)
        }
 }
 
-static void __io_poll_execute(struct io_kiocb *req, int mask)
-{
-       unsigned flags = 0;
-
-       io_req_set_res(req, mask, 0);
-       req->io_task_work.func = io_poll_task_func;
-
-       trace_io_uring_task_add(req, mask);
-
-       if (!(req->flags & REQ_F_POLL_NO_LAZY))
-               flags = IOU_F_TWQ_LAZY_WAKE;
-       __io_req_task_work_add(req, flags);
-}
-
-static inline void io_poll_execute(struct io_kiocb *req, int res)
-{
-       if (io_poll_get_ownership(req))
-               __io_poll_execute(req, res);
-}
-
 static void io_poll_cancel_req(struct io_kiocb *req)
 {
        io_poll_mark_cancelled(req);
@@ -532,14 +540,6 @@ static void __io_queue_proc(struct io_poll *poll, struct io_poll_table *pt,
        poll->wait.private = (void *) wqe_private;
 
        if (poll->events & EPOLLEXCLUSIVE) {
-               /*
-                * Exclusive waits may only wake a limited amount of entries
-                * rather than all of them, this may interfere with lazy
-                * wake if someone does wait(events > 1). Ensure we don't do
-                * lazy wake for those, as we need to process each one as they
-                * come in.
-                */
-               req->flags |= REQ_F_POLL_NO_LAZY;
                add_wait_queue_exclusive(head, &poll->wait);
        } else {
                add_wait_queue(head, &poll->wait);
@@ -581,10 +581,7 @@ static int __io_arm_poll_handler(struct io_kiocb *req,
                                 struct io_poll_table *ipt, __poll_t mask,
                                 unsigned issue_flags)
 {
-       struct io_ring_ctx *ctx = req->ctx;
-
        INIT_HLIST_NODE(&req->hash_node);
-       req->work.cancel_seq = atomic_read(&ctx->cancel_seq);
        io_init_poll_iocb(poll, mask);
        poll->file = req->file;
        req->apoll_events = poll->events;
@@ -611,6 +608,17 @@ static int __io_arm_poll_handler(struct io_kiocb *req,
        if (issue_flags & IO_URING_F_UNLOCKED)
                req->flags &= ~REQ_F_HASH_LOCKED;
 
+
+       /*
+        * Exclusive waits may only wake a limited amount of entries
+        * rather than all of them, this may interfere with lazy
+        * wake if someone does wait(events > 1). Ensure we don't do
+        * lazy wake for those, as we need to process each one as they
+        * come in.
+        */
+       if (poll->events & EPOLLEXCLUSIVE)
+               req->flags |= REQ_F_POLL_NO_LAZY;
+
        mask = vfs_poll(req->file, &ipt->pt) & poll->events;
 
        if (unlikely(ipt->error || !ipt->nr_entries)) {
@@ -645,6 +653,7 @@ static int __io_arm_poll_handler(struct io_kiocb *req,
                __io_poll_execute(req, mask);
                return 0;
        }
+       io_napi_add(req);
 
        if (ipt->owning) {
                /*
@@ -720,7 +729,7 @@ int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
 
        if (!def->pollin && !def->pollout)
                return IO_APOLL_ABORTED;
-       if (!file_can_poll(req->file))
+       if (!io_file_can_poll(req))
                return IO_APOLL_ABORTED;
        if (!(req->flags & REQ_F_APOLL_MULTISHOT))
                mask |= EPOLLONESHOT;
@@ -811,9 +820,8 @@ static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
                if (poll_only && req->opcode != IORING_OP_POLL_ADD)
                        continue;
                if (cd->flags & IORING_ASYNC_CANCEL_ALL) {
-                       if (cd->seq == req->work.cancel_seq)
+                       if (io_cancel_match_sequence(req, cd->seq))
                                continue;
-                       req->work.cancel_seq = cd->seq;
                }
                *out_bucket = hb;
                return req;
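The reference-drop loop rewritten above subtracts only the IO_POLL_REF_MASK bits it observed, looping while new references arrived in the meantime. A standalone sketch of the same ownership pattern using C11 atomics; all names are illustrative, not the kernel's:

    #include <stdatomic.h>
    #include <stdbool.h>

    #define REF_MASK 0xffffu        /* stands in for IO_POLL_REF_MASK */

    /* Take ownership iff we performed the 0 -> 1 refcount transition. */
    static bool get_ownership(atomic_uint *refs)
    {
            return (atomic_fetch_add(refs, 1) & REF_MASK) == 0;
    }

    /* The owner handles events, then drops every reference it saw;
     * if the subtraction leaves a nonzero count, someone queued more
     * work meanwhile and we run another pass. */
    static void process(atomic_uint *refs)
    {
            unsigned int v;

            do {
                    v = atomic_load(refs) & REF_MASK;
                    /* ... handle the events covered by these v refs ... */
            } while ((atomic_fetch_sub(refs, v) - v) & REF_MASK);
    }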
index ff4d5d753387e80568ccc90734732d6cb999b39e..1dacae9e816c9269e8a1ae5bfab4d12fa9aaa9ac 100644 (file)
@@ -24,6 +24,15 @@ struct async_poll {
        struct io_poll          *double_poll;
 };
 
+/*
+ * Must only be called inside issue_flags & IO_URING_F_MULTISHOT, or
+ * potentially other cases where we already "own" this poll request.
+ */
+static inline void io_poll_multishot_retry(struct io_kiocb *req)
+{
+       atomic_inc(&req->poll_refs);
+}
+
 int io_poll_add_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
 int io_poll_add(struct io_kiocb *req, unsigned int issue_flags);
 
index 5e62c1208996542537c6aedf4d57506863165e10..99c37775f974c02a1c2f99d244af69a7c69c4ed3 100644 (file)
@@ -26,6 +26,7 @@
 #include "register.h"
 #include "cancel.h"
 #include "kbuf.h"
+#include "napi.h"
 
 #define IORING_MAX_RESTRICTIONS        (IORING_RESTRICTION_LAST + \
                                 IORING_REGISTER_LAST + IORING_OP_LAST)
@@ -550,6 +551,18 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
                        break;
                ret = io_register_pbuf_status(ctx, arg);
                break;
+       case IORING_REGISTER_NAPI:
+               ret = -EINVAL;
+               if (!arg || nr_args != 1)
+                       break;
+               ret = io_register_napi(ctx, arg);
+               break;
+       case IORING_UNREGISTER_NAPI:
+               ret = -EINVAL;
+               if (nr_args != 1)
+                       break;
+               ret = io_unregister_napi(ctx, arg);
+               break;
        default:
                ret = -EINVAL;
                break;
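A userspace sketch of driving the two new registration opcodes, assuming liburing grows io_uring_register_napi()/io_uring_unregister_napi() wrappers and that struct io_uring_napi exposes a busy-poll timeout in microseconds (the field names are assumptions based on this series):

    #include <liburing.h>

    /* Illustrative: enable ~50us of NAPI busy polling on this ring. */
    static int enable_napi(struct io_uring *ring)
    {
            struct io_uring_napi napi = {
                    .busy_poll_to = 50,     /* assumed to be microseconds */
                    .prefer_busy_poll = 1,
            };

            return io_uring_register_napi(ring, &napi);
    }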
index c6f199bbee2843dfea2d88729b707d46a168d3d9..e210002389540b671e9cb1f00dcbff4303ae2f75 100644 (file)
@@ -2,8 +2,6 @@
 #ifndef IOU_RSRC_H
 #define IOU_RSRC_H
 
-#include <net/af_unix.h>
-
 #include "alloc_cache.h"
 
 #define IO_NODE_ALLOC_CACHE_MAX 32
index 118cc9f1cf1602a4859eb3359c8b2e64cf6db620..47e097ab5d7e4f2e0617146cb5c139dc3b92a667 100644 (file)
@@ -11,6 +11,7 @@
 #include <linux/nospec.h>
 #include <linux/compat.h>
 #include <linux/io_uring/cmd.h>
+#include <linux/indirect_call_wrapper.h>
 
 #include <uapi/linux/io_uring.h>
 
@@ -18,6 +19,7 @@
 #include "opdef.h"
 #include "kbuf.h"
 #include "rsrc.h"
+#include "poll.h"
 #include "rw.h"
 
 struct io_rw {
@@ -273,7 +275,7 @@ static bool __io_complete_rw_common(struct io_kiocb *req, long res)
                         * current cycle.
                         */
                        io_req_io_end(req);
-                       req->flags |= REQ_F_REISSUE | REQ_F_PARTIAL_IO;
+                       req->flags |= REQ_F_REISSUE | REQ_F_BL_NO_RECYCLE;
                        return true;
                }
                req_set_fail(req);
@@ -340,7 +342,7 @@ static void io_complete_rw_iopoll(struct kiocb *kiocb, long res)
                io_req_end_write(req);
        if (unlikely(res != req->cqe.res)) {
                if (res == -EAGAIN && io_rw_should_reissue(req)) {
-                       req->flags |= REQ_F_REISSUE | REQ_F_PARTIAL_IO;
+                       req->flags |= REQ_F_REISSUE | REQ_F_BL_NO_RECYCLE;
                        return;
                }
                req->cqe.res = res;
@@ -681,7 +683,7 @@ static bool io_rw_should_retry(struct io_kiocb *req)
         * just use poll if we can, and don't attempt if the fs doesn't
         * support callback based unlocks
         */
-       if (file_can_poll(req->file) || !(req->file->f_mode & FMODE_BUF_RASYNC))
+       if (io_file_can_poll(req) || !(req->file->f_mode & FMODE_BUF_RASYNC))
                return false;
 
        wait->wait.func = io_async_buf_func;
@@ -720,7 +722,7 @@ static int io_rw_init_file(struct io_kiocb *req, fmode_t mode)
        struct file *file = req->file;
        int ret;
 
-       if (unlikely(!file || !(file->f_mode & mode)))
+       if (unlikely(!(file->f_mode & mode)))
                return -EBADF;
 
        if (!(req->flags & REQ_F_FIXED_FILE))
@@ -830,7 +832,7 @@ static int __io_read(struct io_kiocb *req, unsigned int issue_flags)
                 * If we can poll, just do that. For a vectored read, we'll
                 * need to copy state first.
                 */
-               if (file_can_poll(req->file) && !io_issue_defs[req->opcode].vectored)
+               if (io_file_can_poll(req) && !io_issue_defs[req->opcode].vectored)
                        return -EAGAIN;
                /* IOPOLL retry should happen for io-wq threads */
                if (!force_nonblock && !(req->ctx->flags & IORING_SETUP_IOPOLL))
@@ -929,7 +931,7 @@ int io_read_mshot(struct io_kiocb *req, unsigned int issue_flags)
        /*
         * Multishot MUST be used on a pollable file
         */
-       if (!file_can_poll(req->file))
+       if (!io_file_can_poll(req))
                return -EBADFD;
 
        ret = __io_read(req, issue_flags);
@@ -962,8 +964,15 @@ int io_read_mshot(struct io_kiocb *req, unsigned int issue_flags)
                if (io_fill_cqe_req_aux(req,
                                        issue_flags & IO_URING_F_COMPLETE_DEFER,
                                        ret, cflags | IORING_CQE_F_MORE)) {
-                       if (issue_flags & IO_URING_F_MULTISHOT)
+                       if (issue_flags & IO_URING_F_MULTISHOT) {
+                               /*
+                                * Force retry, as we might have more data to
+                                * be read and otherwise it won't get retried
+                                * until (if ever) another poll is triggered.
+                                */
+                               io_poll_multishot_retry(req);
                                return IOU_ISSUE_SKIP_COMPLETE;
+                       }
                        return -EAGAIN;
                }
        }
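The forced retry works by taking an extra poll reference while the task still owns the request, so the release loop in io_poll_check_events() sees a nonzero count and runs one more pass, re-issuing the multishot read. In terms of the standalone atomics sketch given earlier for poll.c (names illustrative):

    /* Must be called while owning the request (get_ownership() above):
     * the extra reference keeps the release loop spinning once more. */
    static void multishot_retry(atomic_uint *refs)
    {
            atomic_fetch_add(refs, 1);
    }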
index 65b5dbe3c850ed564432c76f17e64739d430f2fe..363052b4ea76a218f2266f543203631a08d0502b 100644 (file)
 #include <uapi/linux/io_uring.h>
 
 #include "io_uring.h"
+#include "napi.h"
 #include "sqpoll.h"
 
 #define IORING_SQPOLL_CAP_ENTRIES_VALUE 8
+#define IORING_TW_CAP_ENTRIES_VALUE    8
 
 enum {
        IO_SQ_THREAD_SHOULD_STOP = 0,
@@ -193,6 +195,9 @@ static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
                        ret = io_submit_sqes(ctx, to_submit);
                mutex_unlock(&ctx->uring_lock);
 
+               if (io_napi(ctx))
+                       ret += io_napi_sqpoll_busy_poll(ctx);
+
                if (to_submit && wq_has_sleeper(&ctx->sqo_sq_wait))
                        wake_up(&ctx->sqo_sq_wait);
                if (creds)
@@ -219,10 +224,52 @@ static bool io_sqd_handle_event(struct io_sq_data *sqd)
        return did_sig || test_bit(IO_SQ_THREAD_SHOULD_STOP, &sqd->state);
 }
 
+/*
+ * Run task_work, processing the retry_list first. The retry_list holds
+ * entries that we passed on in the previous run, if we had more task_work
+ * than we were asked to process. Newly queued task_work isn't run until the
+ * retry list has been fully processed.
+ */
+static unsigned int io_sq_tw(struct llist_node **retry_list, int max_entries)
+{
+       struct io_uring_task *tctx = current->io_uring;
+       unsigned int count = 0;
+
+       if (*retry_list) {
+               *retry_list = io_handle_tw_list(*retry_list, &count, max_entries);
+               if (count >= max_entries)
+                       return count;
+               max_entries -= count;
+       }
+
+       *retry_list = tctx_task_work_run(tctx, max_entries, &count);
+       return count;
+}
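A standalone sketch of the capped two-phase drain described in the comment, with a plain singly linked list standing in for the kernel's llist (all names illustrative):

    struct node { struct node *next; };

    static struct node *run_list(struct node *n, unsigned int *count,
                                 unsigned int max)
    {
            while (n && *count < max) {
                    struct node *next = n->next;

                    /* ... execute work item 'n' ... */
                    (*count)++;
                    n = next;
            }
            return n;       /* NULL, or the remainder once capped */
    }

    /* Finish last round's leftovers before fetching new work, so
     * ordering is preserved across capped runs. */
    static unsigned int drain(struct node **retry,
                              struct node *(*fetch_new)(void),
                              unsigned int max_entries)
    {
            unsigned int count = 0;

            if (*retry) {
                    *retry = run_list(*retry, &count, max_entries);
                    if (count >= max_entries)
                            return count;
            }
            *retry = run_list(fetch_new(), &count, max_entries);
            return count;
    }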
+
+static bool io_sq_tw_pending(struct llist_node *retry_list)
+{
+       struct io_uring_task *tctx = current->io_uring;
+
+       return retry_list || !llist_empty(&tctx->task_list);
+}
+
+static void io_sq_update_worktime(struct io_sq_data *sqd, struct rusage *start)
+{
+       struct rusage end;
+
+       getrusage(current, RUSAGE_SELF, &end);
+       end.ru_stime.tv_sec -= start->ru_stime.tv_sec;
+       end.ru_stime.tv_usec -= start->ru_stime.tv_usec;
+
+       sqd->work_time += end.ru_stime.tv_usec + end.ru_stime.tv_sec * 1000000;
+}
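The same system-time accounting as a runnable userspace sketch (microsecond granularity, mirroring the arithmetic above):

    #include <sys/resource.h>

    /* System CPU time consumed since the 'start' getrusage() sample,
     * in microseconds. */
    static long long stime_delta_us(const struct rusage *start)
    {
            struct rusage end;

            getrusage(RUSAGE_SELF, &end);
            return (end.ru_stime.tv_sec - start->ru_stime.tv_sec) * 1000000LL +
                   (end.ru_stime.tv_usec - start->ru_stime.tv_usec);
    }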
+
 static int io_sq_thread(void *data)
 {
+       struct llist_node *retry_list = NULL;
        struct io_sq_data *sqd = data;
        struct io_ring_ctx *ctx;
+       struct rusage start;
        unsigned long timeout = 0;
        char buf[TASK_COMM_LEN];
        DEFINE_WAIT(wait);
@@ -251,18 +298,21 @@ static int io_sq_thread(void *data)
                }
 
                cap_entries = !list_is_singular(&sqd->ctx_list);
+               getrusage(current, RUSAGE_SELF, &start);
                list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
                        int ret = __io_sq_thread(ctx, cap_entries);
 
                        if (!sqt_spin && (ret > 0 || !wq_list_empty(&ctx->iopoll_list)))
                                sqt_spin = true;
                }
-               if (io_run_task_work())
+               if (io_sq_tw(&retry_list, IORING_TW_CAP_ENTRIES_VALUE))
                        sqt_spin = true;
 
                if (sqt_spin || !time_after(jiffies, timeout)) {
-                       if (sqt_spin)
+                       if (sqt_spin) {
+                               io_sq_update_worktime(sqd, &start);
                                timeout = jiffies + sqd->sq_thread_idle;
+                       }
                        if (unlikely(need_resched())) {
                                mutex_unlock(&sqd->lock);
                                cond_resched();
@@ -273,7 +323,7 @@ static int io_sq_thread(void *data)
                }
 
                prepare_to_wait(&sqd->wait, &wait, TASK_INTERRUPTIBLE);
-               if (!io_sqd_events_pending(sqd) && !task_work_pending(current)) {
+               if (!io_sqd_events_pending(sqd) && !io_sq_tw_pending(retry_list)) {
                        bool needs_sched = true;
 
                        list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
@@ -312,6 +362,9 @@ static int io_sq_thread(void *data)
                timeout = jiffies + sqd->sq_thread_idle;
        }
 
+       if (retry_list)
+               io_sq_tw(&retry_list, UINT_MAX);
+
        io_uring_cancel_generic(true, sqd);
        sqd->thread = NULL;
        list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
index 8df37e8c914936d777b9d0495796c236f41e5189..4171666b1cf4cc37b84cb4079483bdf7b762add1 100644 (file)
@@ -16,6 +16,7 @@ struct io_sq_data {
        pid_t                   task_pid;
        pid_t                   task_tgid;
 
+       u64                     work_time;
        unsigned long           state;
        struct completion       exited;
 };
diff --git a/io_uring/truncate.c b/io_uring/truncate.c
new file mode 100644 (file)
index 0000000..62ee73d
--- /dev/null
@@ -0,0 +1,48 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/fs.h>
+#include <linux/file.h>
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <linux/syscalls.h>
+#include <linux/io_uring.h>
+
+#include <uapi/linux/io_uring.h>
+
+#include "../fs/internal.h"
+
+#include "io_uring.h"
+#include "truncate.h"
+
+struct io_ftrunc {
+       struct file                     *file;
+       loff_t                          len;
+};
+
+int io_ftruncate_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+       struct io_ftrunc *ft = io_kiocb_to_cmd(req, struct io_ftrunc);
+
+       if (sqe->rw_flags || sqe->addr || sqe->len || sqe->buf_index ||
+           sqe->splice_fd_in || sqe->addr3)
+               return -EINVAL;
+
+       ft->len = READ_ONCE(sqe->off);
+
+       req->flags |= REQ_F_FORCE_ASYNC;
+       return 0;
+}
+
+int io_ftruncate(struct io_kiocb *req, unsigned int issue_flags)
+{
+       struct io_ftrunc *ft = io_kiocb_to_cmd(req, struct io_ftrunc);
+       int ret;
+
+       WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);
+
+       ret = do_ftruncate(req->file, ft->len, 1);
+
+       io_req_set_res(req, ret, 0);
+       return IOU_OK;
+}
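A userspace sketch of issuing the new opcode, assuming liburing gains a matching io_uring_prep_ftruncate(sqe, fd, len) helper (the helper name is an assumption):

    #include <liburing.h>

    /* Truncate 'fd' to 'len' bytes via IORING_OP_FTRUNCATE. */
    static int ring_ftruncate(struct io_uring *ring, int fd, long long len)
    {
            struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
            struct io_uring_cqe *cqe;
            int ret;

            io_uring_prep_ftruncate(sqe, fd, len);
            io_uring_submit(ring);

            if (io_uring_wait_cqe(ring, &cqe))
                    return -1;
            ret = cqe->res;         /* 0 on success, -errno on failure */
            io_uring_cqe_seen(ring, cqe);
            return ret;
    }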
diff --git a/io_uring/truncate.h b/io_uring/truncate.h
new file mode 100644 (file)
index 0000000..ec08829
--- /dev/null
@@ -0,0 +1,4 @@
+// SPDX-License-Identifier: GPL-2.0
+
+int io_ftruncate_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
+int io_ftruncate(struct io_kiocb *req, unsigned int issue_flags);
index c33fca585dde5ceb993b80d88025a9d9a61c9ae7..42f63adfa54a04f0b8c67a3babb7415b2bcc96bc 100644 (file)
@@ -5,6 +5,7 @@
 #include <linux/io_uring/cmd.h>
 #include <linux/security.h>
 #include <linux/nospec.h>
+#include <net/sock.h>
 
 #include <uapi/linux/io_uring.h>
 #include <asm/ioctls.h>
index e1c810e0b85a422435979a89d1edd9dfb04d6c22..44905b82eea8305d0cf40080fe3c6cb71fa6e985 100644 (file)
@@ -112,7 +112,7 @@ int io_fgetxattr(struct io_kiocb *req, unsigned int issue_flags)
 
        WARN_ON_ONCE(issue_flags & IO_URING_F_NONBLOCK);
 
-       ret = do_getxattr(mnt_idmap(req->file->f_path.mnt),
+       ret = do_getxattr(file_mnt_idmap(req->file),
                        req->file->f_path.dentry,
                        &ix->ctx);
 
index 8a0bb80fe48a344964e4029fec5e895ee512babf..ef82ffc90cbe9d7aeab50c4856b45a52621a90a9 100644 (file)
@@ -178,7 +178,7 @@ static int cpu_map_bpf_prog_run_xdp(struct bpf_cpu_map_entry *rcpu,
                                    void **frames, int n,
                                    struct xdp_cpumap_stats *stats)
 {
-       struct xdp_rxq_info rxq;
+       struct xdp_rxq_info rxq = {};
        struct xdp_buff xdp;
        int i, nframes = 0;
 
index be72824f32b2cc5e3dfcb8d2bd613b86116a498c..d19cd863d294ea1b589aeae327ae6b10e7211a93 100644 (file)
@@ -1101,6 +1101,7 @@ struct bpf_hrtimer {
        struct bpf_prog *prog;
        void __rcu *callback_fn;
        void *value;
+       struct rcu_head rcu;
 };
 
 /* the actual struct hidden inside uapi struct bpf_timer */
@@ -1332,6 +1333,7 @@ BPF_CALL_1(bpf_timer_cancel, struct bpf_timer_kern *, timer)
 
        if (in_nmi())
                return -EOPNOTSUPP;
+       rcu_read_lock();
        __bpf_spin_lock_irqsave(&timer->lock);
        t = timer->timer;
        if (!t) {
@@ -1353,6 +1355,7 @@ out:
         * if it was running.
         */
        ret = ret ?: hrtimer_cancel(&t->timer);
+       rcu_read_unlock();
        return ret;
 }
 
@@ -1407,7 +1410,7 @@ out:
         */
        if (this_cpu_read(hrtimer_running) != t)
                hrtimer_cancel(&t->timer);
-       kfree(t);
+       kfree_rcu(t, rcu);
 }
 
 BPF_CALL_2(bpf_kptr_xchg, void *, map_value, void *, ptr)
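The fix above is the standard pattern for freeing an object that concurrent RCU readers may still dereference: embed a struct rcu_head and defer the free past a grace period. A minimal kernel-style sketch of the pattern (the struct is illustrative):

    #include <linux/slab.h>
    #include <linux/rcupdate.h>

    struct foo {
            int data;
            struct rcu_head rcu;    /* storage for the deferred free */
    };

    static void release_foo(struct foo *f)
    {
            /* Readers inside rcu_read_lock() sections may still hold
             * 'f'; kfree_rcu() frees it only after they are done. */
            kfree_rcu(f, rcu);
    }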
index e5c3500443c6e71f4fca7a6403dc33aa054d6fd8..ec4e97c61eefe667955e984b934f354b9162b52d 100644 (file)
@@ -978,6 +978,8 @@ __bpf_kfunc int bpf_iter_task_new(struct bpf_iter_task *it,
        BUILD_BUG_ON(__alignof__(struct bpf_iter_task_kern) !=
                                        __alignof__(struct bpf_iter_task));
 
+       kit->pos = NULL;
+
        switch (flags) {
        case BPF_TASK_ITER_ALL_THREADS:
        case BPF_TASK_ITER_ALL_PROCS:
index 65f598694d550359f2b926ef26ae30d0c80c6f69..ddea9567f755946501cd2dc92aef56057ddee41d 100644 (file)
@@ -5227,7 +5227,9 @@ BTF_ID(struct, prog_test_ref_kfunc)
 #ifdef CONFIG_CGROUPS
 BTF_ID(struct, cgroup)
 #endif
+#ifdef CONFIG_BPF_JIT
 BTF_ID(struct, bpf_cpumask)
+#endif
 BTF_ID(struct, task_struct)
 BTF_SET_END(rcu_protected_types)
 
@@ -16600,6 +16602,9 @@ static bool func_states_equal(struct bpf_verifier_env *env, struct bpf_func_stat
 {
        int i;
 
+       if (old->callback_depth > cur->callback_depth)
+               return false;
+
        for (i = 0; i < MAX_BPF_REG; i++)
                if (!regsafe(env, &old->regs[i], &cur->regs[i],
                             &env->idmap_scratch, exact))
index ba36c073304a3eee081b770b12dacb0e5b1a60cd..927bef3a598ad5ce4b0fee8dfce70d0a89c1964c 100644 (file)
@@ -2562,7 +2562,7 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
                update_partition_sd_lb(cs, old_prs);
 out_free:
        free_cpumasks(NULL, &tmp);
-       return 0;
+       return retval;
 }
 
 /**
@@ -2598,9 +2598,6 @@ static int update_exclusive_cpumask(struct cpuset *cs, struct cpuset *trialcs,
        if (cpumask_equal(cs->exclusive_cpus, trialcs->exclusive_cpus))
                return 0;
 
-       if (alloc_cpumasks(NULL, &tmp))
-               return -ENOMEM;
-
        if (*buf)
                compute_effective_exclusive_cpumask(trialcs, NULL);
 
@@ -2615,6 +2612,9 @@ static int update_exclusive_cpumask(struct cpuset *cs, struct cpuset *trialcs,
        if (retval)
                return retval;
 
+       if (alloc_cpumasks(NULL, &tmp))
+               return -ENOMEM;
+
        if (old_prs) {
                if (cpumask_empty(trialcs->effective_xcpus)) {
                        invalidate = true;
index 6ef0b35fc28c5a50434837f89d798060a58e26b9..70ae70d0382337e1d7c361926b7162bd21cda9be 100644 (file)
@@ -458,6 +458,8 @@ static __always_inline void context_tracking_recursion_exit(void)
  * __ct_user_enter - Inform the context tracking that the CPU is going
  *                  to enter user or guest space mode.
  *
+ * @state: userspace context-tracking state to enter.
+ *
  * This function must be called right before we switch from the kernel
  * to user or guest space, when it's guaranteed the remaining kernel
  * instructions to execute won't use any RCU read side critical section
@@ -595,6 +597,8 @@ NOKPROBE_SYMBOL(user_enter_callable);
  * __ct_user_exit - Inform the context tracking that the CPU is
  *                 exiting user or guest mode and entering the kernel.
  *
+ * @state: userspace context-tracking state being exited from.
+ *
  * This function must be called after we entered the kernel from user or
  * guest space before any use of RCU read side critical section. This
 * potentially includes any high level kernel code like syscalls, exceptions,
index 485bb0389b488d28a4efb23901b514d93b3834f6..929e98c629652a0fef1b71e6c002cca41936c4b4 100644 (file)
@@ -537,7 +537,7 @@ retry:
                }
        }
 
-       ret = __replace_page(vma, vaddr, old_page, new_page);
+       ret = __replace_page(vma, vaddr & PAGE_MASK, old_page, new_page);
        if (new_page)
                put_page(new_page);
 put_old:
index 3988a02efaef06444654a415ce298d378ab925ec..41a12630cbbc9cd80b6b5a154041c514b46ad3fe 100644 (file)
@@ -739,6 +739,13 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
                kill_orphaned_pgrp(tsk->group_leader, NULL);
 
        tsk->exit_state = EXIT_ZOMBIE;
+       /*
+        * This is a sub-thread or a delay_group_leader(); wake up any
+        * PIDFD_THREAD waiters.
+        */
+       if (!thread_group_empty(tsk))
+               do_notify_pidfd(tsk);
+
        if (unlikely(tsk->ptrace)) {
                int sig = thread_group_leader(tsk) &&
                                thread_group_empty(tsk) &&
@@ -1127,17 +1134,14 @@ static int wait_task_zombie(struct wait_opts *wo, struct task_struct *p)
                 * and nobody can change them.
                 *
                 * psig->stats_lock also protects us from our sub-threads
-                * which can reap other children at the same time. Until
-                * we change k_getrusage()-like users to rely on this lock
-                * we have to take ->siglock as well.
+                * which can reap other children at the same time.
                 *
                 * We use thread_group_cputime_adjusted() to get times for
                 * the thread group, which consolidates times for all threads
                 * in the group including the group leader.
                 */
                thread_group_cputime_adjusted(p, &tgutime, &tgstime);
-               spin_lock_irq(&current->sighand->siglock);
-               write_seqlock(&psig->stats_lock);
+               write_seqlock_irq(&psig->stats_lock);
                psig->cutime += tgutime + sig->cutime;
                psig->cstime += tgstime + sig->cstime;
                psig->cgtime += task_gtime(p) + sig->gtime + sig->cgtime;
@@ -1160,8 +1164,7 @@ static int wait_task_zombie(struct wait_opts *wo, struct task_struct *p)
                        psig->cmaxrss = maxrss;
                task_io_accounting_add(&psig->ioac, &p->ioac);
                task_io_accounting_add(&psig->ioac, &sig->ioac);
-               write_sequnlock(&psig->stats_lock);
-               spin_unlock_irq(&current->sighand->siglock);
+               write_sequnlock_irq(&psig->stats_lock);
        }
 
        if (wo->wo_rusage)
@@ -1893,30 +1896,6 @@ Efault:
 }
 #endif
 
-/**
- * thread_group_exited - check that a thread group has exited
- * @pid: tgid of thread group to be checked.
- *
- * Test if the thread group represented by tgid has exited (all
- * threads are zombies, dead or completely gone).
- *
- * Return: true if the thread group has exited. false otherwise.
- */
-bool thread_group_exited(struct pid *pid)
-{
-       struct task_struct *task;
-       bool exited;
-
-       rcu_read_lock();
-       task = pid_task(pid, PIDTYPE_PID);
-       exited = !task ||
-               (READ_ONCE(task->exit_state) && thread_group_empty(task));
-       rcu_read_unlock();
-
-       return exited;
-}
-EXPORT_SYMBOL(thread_group_exited);
-
 /*
  * This needs to be __function_aligned as GCC implicitly makes any
  * implementation of abort() cold and drops alignment specified by
index 47ff3b35352e0bb4ec040b77241d9cbdcb986ef2..39a5046c2f0bf49e1bcade15c4c3c5574742b09d 100644 (file)
 #include <linux/user_events.h>
 #include <linux/iommu.h>
 #include <linux/rseq.h>
+#include <uapi/linux/pidfd.h>
+#include <linux/pidfs.h>
 
 #include <asm/pgalloc.h>
 #include <linux/uaccess.h>
@@ -1748,6 +1750,7 @@ static int copy_fs(unsigned long clone_flags, struct task_struct *tsk)
        if (clone_flags & CLONE_FS) {
                /* tsk->fs is already what we want */
                spin_lock(&fs->lock);
+               /* "users" and "in_exec" locked for check_unsafe_exec() */
                if (fs->in_exec) {
                        spin_unlock(&fs->lock);
                        return -EAGAIN;
@@ -1975,6 +1978,7 @@ static inline void rcu_copy_process(struct task_struct *p)
        p->rcu_tasks_holdout = false;
        INIT_LIST_HEAD(&p->rcu_tasks_holdout_list);
        p->rcu_tasks_idle_cpu = -1;
+       INIT_LIST_HEAD(&p->rcu_tasks_exit_list);
 #endif /* #ifdef CONFIG_TASKS_RCU */
 #ifdef CONFIG_TASKS_TRACE_RCU
        p->trc_reader_nesting = 0;
@@ -1984,119 +1988,6 @@ static inline void rcu_copy_process(struct task_struct *p)
 #endif /* #ifdef CONFIG_TASKS_TRACE_RCU */
 }
 
-struct pid *pidfd_pid(const struct file *file)
-{
-       if (file->f_op == &pidfd_fops)
-               return file->private_data;
-
-       return ERR_PTR(-EBADF);
-}
-
-static int pidfd_release(struct inode *inode, struct file *file)
-{
-       struct pid *pid = file->private_data;
-
-       file->private_data = NULL;
-       put_pid(pid);
-       return 0;
-}
-
-#ifdef CONFIG_PROC_FS
-/**
- * pidfd_show_fdinfo - print information about a pidfd
- * @m: proc fdinfo file
- * @f: file referencing a pidfd
- *
- * Pid:
- * This function will print the pid that a given pidfd refers to in the
- * pid namespace of the procfs instance.
- * If the pid namespace of the process is not a descendant of the pid
- * namespace of the procfs instance 0 will be shown as its pid. This is
- * similar to calling getppid() on a process whose parent is outside of
- * its pid namespace.
- *
- * NSpid:
- * If pid namespaces are supported then this function will also print
- * the pid of a given pidfd refers to for all descendant pid namespaces
- * starting from the current pid namespace of the instance, i.e. the
- * Pid field and the first entry in the NSpid field will be identical.
- * If the pid namespace of the process is not a descendant of the pid
- * namespace of the procfs instance 0 will be shown as its first NSpid
- * entry and no others will be shown.
- * Note that this differs from the Pid and NSpid fields in
- * /proc/<pid>/status where Pid and NSpid are always shown relative to
- * the  pid namespace of the procfs instance. The difference becomes
- * obvious when sending around a pidfd between pid namespaces from a
- * different branch of the tree, i.e. where no ancestral relation is
- * present between the pid namespaces:
- * - create two new pid namespaces ns1 and ns2 in the initial pid
- *   namespace (also take care to create new mount namespaces in the
- *   new pid namespace and mount procfs)
- * - create a process with a pidfd in ns1
- * - send pidfd from ns1 to ns2
- * - read /proc/self/fdinfo/<pidfd> and observe that both Pid and NSpid
- *   have exactly one entry, which is 0
- */
-static void pidfd_show_fdinfo(struct seq_file *m, struct file *f)
-{
-       struct pid *pid = f->private_data;
-       struct pid_namespace *ns;
-       pid_t nr = -1;
-
-       if (likely(pid_has_task(pid, PIDTYPE_PID))) {
-               ns = proc_pid_ns(file_inode(m->file)->i_sb);
-               nr = pid_nr_ns(pid, ns);
-       }
-
-       seq_put_decimal_ll(m, "Pid:\t", nr);
-
-#ifdef CONFIG_PID_NS
-       seq_put_decimal_ll(m, "\nNSpid:\t", nr);
-       if (nr > 0) {
-               int i;
-
-               /* If nr is non-zero it means that 'pid' is valid and that
-                * ns, i.e. the pid namespace associated with the procfs
-                * instance, is in the pid namespace hierarchy of pid.
-                * Start at one below the already printed level.
-                */
-               for (i = ns->level + 1; i <= pid->level; i++)
-                       seq_put_decimal_ll(m, "\t", pid->numbers[i].nr);
-       }
-#endif
-       seq_putc(m, '\n');
-}
-#endif
-
-/*
- * Poll support for process exit notification.
- */
-static __poll_t pidfd_poll(struct file *file, struct poll_table_struct *pts)
-{
-       struct pid *pid = file->private_data;
-       __poll_t poll_flags = 0;
-
-       poll_wait(file, &pid->wait_pidfd, pts);
-
-       /*
-        * Inform pollers only when the whole thread group exits.
-        * If the thread group leader exits before all other threads in the
-        * group, then poll(2) should block, similar to the wait(2) family.
-        */
-       if (thread_group_exited(pid))
-               poll_flags = EPOLLIN | EPOLLRDNORM;
-
-       return poll_flags;
-}
-
-const struct file_operations pidfd_fops = {
-       .release = pidfd_release,
-       .poll = pidfd_poll,
-#ifdef CONFIG_PROC_FS
-       .show_fdinfo = pidfd_show_fdinfo,
-#endif
-};
-
 /**
  * __pidfd_prepare - allocate a new pidfd_file and reserve a pidfd
  * @pid:   the struct pid for which to create a pidfd
@@ -2130,20 +2021,20 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
        int pidfd;
        struct file *pidfd_file;
 
-       if (flags & ~(O_NONBLOCK | O_RDWR | O_CLOEXEC))
-               return -EINVAL;
-
-       pidfd = get_unused_fd_flags(O_RDWR | O_CLOEXEC);
+       pidfd = get_unused_fd_flags(O_CLOEXEC);
        if (pidfd < 0)
                return pidfd;
 
-       pidfd_file = anon_inode_getfile("[pidfd]", &pidfd_fops, pid,
-                                       flags | O_RDWR | O_CLOEXEC);
+       pidfd_file = pidfs_alloc_file(pid, flags | O_RDWR);
        if (IS_ERR(pidfd_file)) {
                put_unused_fd(pidfd);
                return PTR_ERR(pidfd_file);
        }
-       get_pid(pid); /* held by pidfd_file now */
+       /*
+        * anon_inode_getfile() ignores everything outside of the
+        * O_ACCMODE | O_NONBLOCK mask, so set PIDFD_THREAD manually.
+        */
+       pidfd_file->f_flags |= (flags & PIDFD_THREAD);
        *ret = pidfd_file;
        return pidfd;
 }
@@ -2157,7 +2048,8 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
  * Allocate a new file that stashes @pid and reserve a new pidfd number in the
  * caller's file descriptor table. The pidfd is reserved but not installed yet.
  *
- * The helper verifies that @pid is used as a thread group leader.
+ * The helper verifies that @pid is still in use, without PIDFD_THREAD the
+ * task identified by @pid must be a thread-group leader.
  *
  * If this function returns successfully the caller is responsible to either
  * call fd_install() passing the returned pidfd and pidfd file as arguments in
@@ -2176,7 +2068,9 @@ static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
  */
 int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
 {
-       if (!pid || !pid_has_task(pid, PIDTYPE_TGID))
+       bool thread = flags & PIDFD_THREAD;
+
+       if (!pid || !pid_has_task(pid, thread ? PIDTYPE_PID : PIDTYPE_TGID))
                return -EINVAL;
 
        return __pidfd_prepare(pid, flags, ret);
@@ -2298,9 +2192,8 @@ __latent_entropy struct task_struct *copy_process(
                /*
                 * - CLONE_DETACHED is blocked so that we can potentially
                 *   reuse it later for CLONE_PIDFD.
-                * - CLONE_THREAD is blocked until someone really needs it.
                 */
-               if (clone_flags & (CLONE_DETACHED | CLONE_THREAD))
+               if (clone_flags & CLONE_DETACHED)
                        return ERR_PTR(-EINVAL);
        }
 
@@ -2523,8 +2416,10 @@ __latent_entropy struct task_struct *copy_process(
         * if the fd table isn't shared).
         */
        if (clone_flags & CLONE_PIDFD) {
+               int flags = (clone_flags & CLONE_THREAD) ? PIDFD_THREAD : 0;
+
                /* Note that no task has been attached to @pid yet. */
-               retval = __pidfd_prepare(pid, O_RDWR | O_CLOEXEC, &pidfile);
+               retval = __pidfd_prepare(pid, flags, &pidfile);
                if (retval < 0)
                        goto bad_fork_free_pid;
                pidfd = retval;
@@ -2875,8 +2770,8 @@ pid_t kernel_clone(struct kernel_clone_args *args)
         * here has the advantage that we don't need to have a separate helper
         * to check for legacy clone().
         */
-       if ((args->flags & CLONE_PIDFD) &&
-           (args->flags & CLONE_PARENT_SETTID) &&
+       if ((clone_flags & CLONE_PIDFD) &&
+           (clone_flags & CLONE_PARENT_SETTID) &&
            (args->pidfd == args->parent_tid))
                return -EINVAL;
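For reference, a runnable userspace sketch of obtaining a pidfd at clone time via CLONE_PIDFD; with this series, adding CLONE_THREAD to such a clone3() call yields a PIDFD_THREAD pidfd instead of failing with -EINVAL (real thread creation needs CLONE_VM, a stack, and more, and is omitted here). Assumes libc headers that define SYS_clone3:

    #define _GNU_SOURCE
    #include <linux/sched.h>
    #include <sys/syscall.h>
    #include <sys/wait.h>
    #include <signal.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
            int pidfd = -1;
            struct clone_args args;
            long pid;

            memset(&args, 0, sizeof(args));
            args.flags = CLONE_PIDFD;
            args.pidfd = (uintptr_t)&pidfd;     /* kernel writes the fd here */
            args.exit_signal = SIGCHLD;

            pid = syscall(SYS_clone3, &args, sizeof(args));
            if (pid < 0)
                    return 1;
            if (pid == 0)
                    _exit(0);                   /* child */
            printf("child %ld, pidfd %d\n", pid, pidfd);
            waitpid((pid_t)pid, NULL, 0);
            return 0;
    }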
 
index e0e853412c158e1277ea4c63a40576093b0bc673..1e78ef24321e82dbfaf0c07941a0c41ad3438aaa 100644 (file)
@@ -627,12 +627,21 @@ retry:
 }
 
 /*
- * PI futexes can not be requeued and must remove themselves from the
- * hash bucket. The hash bucket lock (i.e. lock_ptr) is held.
+ * PI futexes can not be requeued and must remove themselves from the hash
+ * bucket. The hash bucket lock (i.e. lock_ptr) is held.
  */
 void futex_unqueue_pi(struct futex_q *q)
 {
-       __futex_unqueue(q);
+       /*
+        * If the lock was not acquired (due to timeout or signal) then the
+        * rt_waiter is removed before futex_q is. If this is observed by
+        * an unlocker after dropping the rtmutex wait lock and before
+        * acquiring the hash bucket lock, then the unlocker dequeues the
+        * futex_q from the hash bucket list to guarantee consistent state
+        * vs. userspace. Therefore the dequeue here must be conditional.
+        */
+       if (!plist_node_empty(&q->list))
+               __futex_unqueue(q);
 
        BUG_ON(!q->pi_state);
        put_pi_state(q->pi_state);
index 90e5197f4e5696dbd5a79fd0033ed95e4bd32fac..5722467f273794ec314870fc76d0ba04a8617f7e 100644 (file)
@@ -1135,6 +1135,7 @@ retry:
 
        hb = futex_hash(&key);
        spin_lock(&hb->lock);
+retry_hb:
 
        /*
         * Check waiters first. We do not trust user space values at
@@ -1177,12 +1178,17 @@ retry:
                /*
                 * Futex vs rt_mutex waiter state -- if there are no rt_mutex
                 * waiters even though futex thinks there are, then the waiter
-                * is leaving and the uncontended path is safe to take.
+                * is leaving. The entry needs to be removed from the list so
+                * a new futex_lock_pi() does not use this stale PI-state while
+                * the futex is available in user space again. There can be
+                * more than one task on its way out, so we need to retry.
                 */
                rt_waiter = rt_mutex_top_waiter(&pi_state->pi_mutex);
                if (!rt_waiter) {
+                       __futex_unqueue(top_waiter);
                        raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
-                       goto do_uncontended;
+                       goto retry_hb;
                }
 
                get_pi_state(pi_state);
@@ -1217,7 +1223,6 @@ retry:
                return ret;
        }
 
-do_uncontended:
        /*
         * We have no kernel internal state, i.e. no waiters in the
         * kernel. Waiters which are about to queue themselves are stuck
index 27ca1c866f298bf9d8876bce68e418881702aabf..371eb1711d3467baf596c477411c1d3ac554cedd 100644 (file)
@@ -600,7 +600,7 @@ int __init early_irq_init(void)
                mutex_init(&desc[i].request_mutex);
                init_waitqueue_head(&desc[i].wait_for_threads);
                desc_set_defaults(i, &desc[i], node, NULL, NULL);
-               irq_resend_init(desc);
+               irq_resend_init(&desc[i]);
        }
        return arch_early_irq_init();
 }
index d5a0ee40bf66c5318df14c5a49294850434e13d3..9d9095e817928658d2c6d54d5da6f4826ff7c6be 100644 (file)
@@ -1993,7 +1993,7 @@ NOKPROBE_SYMBOL(__kretprobe_find_ret_addr);
 unsigned long kretprobe_find_ret_addr(struct task_struct *tsk, void *fp,
                                      struct llist_node **cur)
 {
-       struct kretprobe_instance *ri = NULL;
+       struct kretprobe_instance *ri;
        kprobe_opcode_t *ret;
 
        if (WARN_ON_ONCE(!cur))
@@ -2802,7 +2802,7 @@ static int show_kprobe_addr(struct seq_file *pi, void *v)
 {
        struct hlist_head *head;
        struct kprobe *p, *kp;
-       const char *sym = NULL;
+       const char *sym;
        unsigned int i = *(loff_t *) v;
        unsigned long offset = 0;
        char *modname, namebuf[KSYM_NAME_LEN];
index 15781acaac1ceec97fa2ae649d284c025693186e..6ec3deec68c200eb656d2686e20a31165922f3b3 100644 (file)
@@ -573,7 +573,7 @@ SYSCALL_DEFINE2(setns, int, fd, int, flags)
        if (proc_ns_file(f.file))
                err = validate_ns(&nsset, ns);
        else
-               err = validate_nsset(&nsset, f.file->private_data);
+               err = validate_nsset(&nsset, pidfd_pid(f.file));
        if (!err) {
                commit_nsset(&nsset);
                perf_event_namespaces(current);
index b52b108654541545797c19aa0bf68c735442bbac..99a0c5eb24b8d61df9b4fa2d171ddf71b54f3a92 100644 (file)
@@ -42,6 +42,7 @@
 #include <linux/sched/signal.h>
 #include <linux/sched/task.h>
 #include <linux/idr.h>
+#include <linux/pidfs.h>
 #include <net/sock.h>
 #include <uapi/linux/pidfd.h>
 
@@ -65,6 +66,13 @@ int pid_max = PID_MAX_DEFAULT;
 
 int pid_max_min = RESERVED_PIDS + 1;
 int pid_max_max = PID_MAX_LIMIT;
+#ifdef CONFIG_FS_PID
+/*
+ * Pseudo filesystems start inode numbering after one; we use RESERVED_PIDS
+ * as a natural offset.
+ */
+static u64 pidfs_ino = RESERVED_PIDS;
+#endif
 
 /*
  * PID-map pages start out as NULL, they get allocated upon
@@ -272,6 +280,10 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
        spin_lock_irq(&pidmap_lock);
        if (!(ns->pid_allocated & PIDNS_ADDING))
                goto out_unlock;
+#ifdef CONFIG_FS_PID
+       pid->stashed = NULL;
+       pid->ino = ++pidfs_ino;
+#endif
        for ( ; upid >= pid->numbers; --upid) {
                /* Make the PID visible to find_pid_ns. */
                idr_replace(&upid->ns->idr, pid, upid->nr);
@@ -349,6 +361,11 @@ static void __change_pid(struct task_struct *task, enum pid_type type,
        hlist_del_rcu(&task->pid_links[type]);
        *pid_ptr = new;
 
+       if (type == PIDTYPE_PID) {
+               WARN_ON_ONCE(pid_has_task(pid, PIDTYPE_PID));
+               wake_up_all(&pid->wait_pidfd);
+       }
+
        for (tmp = PIDTYPE_MAX; --tmp >= 0; )
                if (pid_has_task(pid, tmp))
                        return;
@@ -391,8 +408,7 @@ void exchange_tids(struct task_struct *left, struct task_struct *right)
 void transfer_pid(struct task_struct *old, struct task_struct *new,
                           enum pid_type type)
 {
-       if (type == PIDTYPE_PID)
-               new->thread_pid = old->thread_pid;
+       WARN_ON_ONCE(type == PIDTYPE_PID);
        hlist_replace_rcu(&old->pid_links[type], &new->pid_links[type]);
 }
 
@@ -552,11 +568,6 @@ struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags)
  * Return the task associated with @pidfd. The function takes a reference on
  * the returned task. The caller is responsible for releasing that reference.
  *
- * Currently, the process identified by @pidfd is always a thread-group leader.
- * This restriction currently exists for all aspects of pidfds including pidfd
- * creation (CLONE_PIDFD cannot be used with CLONE_THREAD) and pidfd polling
- * (only supports thread group leaders).
- *
  * Return: On success, the task_struct associated with the pidfd.
  *        On error, a negative errno number will be returned.
  */
@@ -595,7 +606,7 @@ struct task_struct *pidfd_get_task(int pidfd, unsigned int *flags)
  * Return: On success, a cloexec pidfd is returned.
  *         On error, a negative errno number will be returned.
  */
-int pidfd_create(struct pid *pid, unsigned int flags)
+static int pidfd_create(struct pid *pid, unsigned int flags)
 {
        int pidfd;
        struct file *pidfd_file;
@@ -615,11 +626,8 @@ int pidfd_create(struct pid *pid, unsigned int flags)
  * @flags: flags to pass
  *
  * This creates a new pid file descriptor with the O_CLOEXEC flag set for
- * the process identified by @pid. Currently, the process identified by
- * @pid must be a thread-group leader. This restriction currently exists
- * for all aspects of pidfds including pidfd creation (CLONE_PIDFD cannot
- * be used with CLONE_THREAD) and pidfd polling (only supports thread group
- * leaders).
+ * the task identified by @pid. Without PIDFD_THREAD flag the target task
+ * must be a thread-group leader.
  *
  * Return: On success, a cloexec pidfd is returned.
  *         On error, a negative errno number will be returned.
@@ -629,7 +637,7 @@ SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags)
        int fd;
        struct pid *p;
 
-       if (flags & ~PIDFD_NONBLOCK)
+       if (flags & ~(PIDFD_NONBLOCK | PIDFD_THREAD))
                return -EINVAL;
 
        if (pid <= 0)
@@ -682,7 +690,26 @@ static struct file *__pidfd_fget(struct task_struct *task, int fd)
 
        up_read(&task->signal->exec_update_lock);
 
-       return file ?: ERR_PTR(-EBADF);
+       if (!file) {
+               /*
+                * It is possible that the target thread is exiting; the
+                * lookup can race with that in three ways:
+                * 1. before exit_signals(): a real fd is returned
+                * 2. after exit_signals() but before exit_files() takes
+                *    task_lock(): a real fd is still returned
+                * 3. after exit_files() releases task_lock(): ->files is
+                *    NULL and PF_EXITING is set (exit_signals() set it),
+                *    so no file is found.
+                * In case 3 plain EBADF would be misleading: the task is
+                * exiting and has freed its files struct, so report ESRCH
+                * instead.
+                */
+               if (task->flags & PF_EXITING)
+                       file = ERR_PTR(-ESRCH);
+               else
+                       file = ERR_PTR(-EBADF);
+       }
+
+       return file;
 }
 
 static int pidfd_getfd(struct pid *pid, int fd)
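On the consumer side, a PIDFD_THREAD pidfd refers to one thread, and with the do_notify_pidfd() change in exit.c its poll readiness signals that single thread's exit rather than whole thread-group exit. A sketch, assuming <linux/pidfd.h> exports the PIDFD_THREAD flag this series introduces:

    #define _GNU_SOURCE
    #include <linux/pidfd.h>
    #include <sys/syscall.h>
    #include <poll.h>
    #include <unistd.h>

    /* Block until the single thread identified by 'tid' exits. */
    static int wait_thread_exit(pid_t tid)
    {
            struct pollfd pfd = { .events = POLLIN };

            pfd.fd = syscall(SYS_pidfd_open, tid, PIDFD_THREAD);
            if (pfd.fd < 0)
                    return -1;
            poll(&pfd, 1, -1);      /* readable once the thread exits */
            close(pfd.fd);
            return 0;
    }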
index 6053ddddaf6540209bdacc80cac431c1bcd78222..692f12fe60c1309232d565ed0d8b3bbd98609b2c 100644 (file)
@@ -222,7 +222,7 @@ int swsusp_swap_in_use(void)
  */
 
 static unsigned short root_swap = 0xffff;
-static struct bdev_handle *hib_resume_bdev_handle;
+static struct file *hib_resume_bdev_file;
 
 struct hib_bio_batch {
        atomic_t                count;
@@ -276,7 +276,7 @@ static int hib_submit_io(blk_opf_t opf, pgoff_t page_off, void *addr,
        struct bio *bio;
        int error = 0;
 
-       bio = bio_alloc(hib_resume_bdev_handle->bdev, 1, opf,
+       bio = bio_alloc(file_bdev(hib_resume_bdev_file), 1, opf,
                        GFP_NOIO | __GFP_HIGH);
        bio->bi_iter.bi_sector = page_off * (PAGE_SIZE >> 9);
 
@@ -357,14 +357,14 @@ static int swsusp_swap_check(void)
                return res;
        root_swap = res;
 
-       hib_resume_bdev_handle = bdev_open_by_dev(swsusp_resume_device,
+       hib_resume_bdev_file = bdev_file_open_by_dev(swsusp_resume_device,
                        BLK_OPEN_WRITE, NULL, NULL);
-       if (IS_ERR(hib_resume_bdev_handle))
-               return PTR_ERR(hib_resume_bdev_handle);
+       if (IS_ERR(hib_resume_bdev_file))
+               return PTR_ERR(hib_resume_bdev_file);
 
-       res = set_blocksize(hib_resume_bdev_handle->bdev, PAGE_SIZE);
+       res = set_blocksize(file_bdev(hib_resume_bdev_file), PAGE_SIZE);
        if (res < 0)
-               bdev_release(hib_resume_bdev_handle);
+               fput(hib_resume_bdev_file);
 
        return res;
 }
@@ -1523,10 +1523,10 @@ int swsusp_check(bool exclusive)
        void *holder = exclusive ? &swsusp_holder : NULL;
        int error;
 
-       hib_resume_bdev_handle = bdev_open_by_dev(swsusp_resume_device,
+       hib_resume_bdev_file = bdev_file_open_by_dev(swsusp_resume_device,
                                BLK_OPEN_READ, holder, NULL);
-       if (!IS_ERR(hib_resume_bdev_handle)) {
-               set_blocksize(hib_resume_bdev_handle->bdev, PAGE_SIZE);
+       if (!IS_ERR(hib_resume_bdev_file)) {
+               set_blocksize(file_bdev(hib_resume_bdev_file), PAGE_SIZE);
                clear_page(swsusp_header);
                error = hib_submit_io(REQ_OP_READ, swsusp_resume_block,
                                        swsusp_header, NULL);
@@ -1551,11 +1551,11 @@ int swsusp_check(bool exclusive)
 
 put:
                if (error)
-                       bdev_release(hib_resume_bdev_handle);
+                       fput(hib_resume_bdev_file);
                else
                        pr_debug("Image signature found, resuming\n");
        } else {
-               error = PTR_ERR(hib_resume_bdev_handle);
+               error = PTR_ERR(hib_resume_bdev_file);
        }
 
        if (error)
@@ -1570,12 +1570,12 @@ put:
 
 void swsusp_close(void)
 {
-       if (IS_ERR(hib_resume_bdev_handle)) {
+       if (IS_ERR(hib_resume_bdev_file)) {
                pr_debug("Image device not initialised\n");
                return;
        }
 
-       bdev_release(hib_resume_bdev_handle);
+       fput(hib_resume_bdev_file);
 }
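The conversion above swaps the bdev_handle API for plain files: bdev_file_open_by_dev() returns a struct file, file_bdev() recovers the block_device for I/O, and fput() releases the open. A condensed kernel-style sketch of the new lifecycle, mirroring the hunks above:

    #include <linux/blkdev.h>

    static int open_and_probe(dev_t dev, void *holder)
    {
            struct file *f;

            f = bdev_file_open_by_dev(dev, BLK_OPEN_READ, holder, NULL);
            if (IS_ERR(f))
                    return PTR_ERR(f);

            set_blocksize(file_bdev(f), PAGE_SIZE);
            /* ... submit bios against file_bdev(f) ... */

            fput(f);                /* replaces bdev_release() */
            return 0;
    }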
 
 /**
index bdd7eadb33d8fe0039349f8295cc967c9f137289..e7d2dd2675931fac627947959f3fbb9e93aae05b 100644 (file)
@@ -314,6 +314,19 @@ config RCU_LAZY
          To save power, batch RCU callbacks and flush after delay, memory
          pressure, or callback list growing too big.
 
+         Requires rcu_nocbs=all to be set.
+
+         Use rcutree.enable_rcu_lazy=0 to turn it off at boot time.
+
+config RCU_LAZY_DEFAULT_OFF
+       bool "Turn RCU lazy invocation off by default"
+       depends on RCU_LAZY
+       default n
+       help
+         Allows building the kernel with CONFIG_RCU_LAZY=y yet keep it
+         off by default. The boot-time parameter rcutree.enable_rcu_lazy=1
+         can be used to switch it back on.
+
 config RCU_DOUBLE_CHECK_CB_TIME
        bool "RCU callback-batch backup time check"
        depends on RCU_EXPERT
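Putting the RCU_LAZY knobs together: with CONFIG_RCU_LAZY=y and CONFIG_RCU_LAZY_DEFAULT_OFF=y, lazy callback batching can be switched back on at boot with the parameters named in the help text above:

    rcu_nocbs=all rcutree.enable_rcu_lazy=1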
index f94f65877f2b68055b4e5c7a057c22bf4fb96a18..86fce206560e83f05e3a57114e5771e0c0a76b7d 100644 (file)
@@ -528,6 +528,12 @@ struct task_struct *get_rcu_tasks_gp_kthread(void);
 struct task_struct *get_rcu_tasks_rude_gp_kthread(void);
 #endif // # ifdef CONFIG_TASKS_RUDE_RCU
 
+#ifdef CONFIG_TASKS_RCU_GENERIC
+void tasks_cblist_init_generic(void);
+#else /* #ifdef CONFIG_TASKS_RCU_GENERIC */
+static inline void tasks_cblist_init_generic(void) { }
+#endif /* #else #ifdef CONFIG_TASKS_RCU_GENERIC */
+
 #define RCU_SCHEDULER_INACTIVE 0
 #define RCU_SCHEDULER_INIT     1
 #define RCU_SCHEDULER_RUNNING  2
@@ -543,11 +549,11 @@ enum rcutorture_type {
 };
 
 #if defined(CONFIG_RCU_LAZY)
-unsigned long rcu_lazy_get_jiffies_till_flush(void);
-void rcu_lazy_set_jiffies_till_flush(unsigned long j);
+unsigned long rcu_get_jiffies_lazy_flush(void);
+void rcu_set_jiffies_lazy_flush(unsigned long j);
 #else
-static inline unsigned long rcu_lazy_get_jiffies_till_flush(void) { return 0; }
-static inline void rcu_lazy_set_jiffies_till_flush(unsigned long j) { }
+static inline unsigned long rcu_get_jiffies_lazy_flush(void) { return 0; }
+static inline void rcu_set_jiffies_lazy_flush(unsigned long j) { }
 #endif
 
 #if defined(CONFIG_TREE_RCU)
@@ -623,12 +629,7 @@ int rcu_get_gp_kthreads_prio(void);
 void rcu_fwd_progress_check(unsigned long j);
 void rcu_force_quiescent_state(void);
 extern struct workqueue_struct *rcu_gp_wq;
-#ifdef CONFIG_RCU_EXP_KTHREAD
 extern struct kthread_worker *rcu_exp_gp_kworker;
-extern struct kthread_worker *rcu_exp_par_gp_kworker;
-#else /* !CONFIG_RCU_EXP_KTHREAD */
-extern struct workqueue_struct *rcu_par_gp_wq;
-#endif /* CONFIG_RCU_EXP_KTHREAD */
 void rcu_gp_slow_register(atomic_t *rgssp);
 void rcu_gp_slow_unregister(atomic_t *rgssp);
 #endif /* #else #ifdef CONFIG_TINY_RCU */
index ffdb30495e3cc3d2a5adc2efcd550132fb8c7a4d..8db4fedaaa1eb7340cf9b4e808e9da45b0b687cc 100644 (file)
@@ -764,9 +764,9 @@ kfree_scale_init(void)
 
        if (kfree_by_call_rcu) {
                /* do a test to check the timeout. */
-               orig_jif = rcu_lazy_get_jiffies_till_flush();
+               orig_jif = rcu_get_jiffies_lazy_flush();
 
-               rcu_lazy_set_jiffies_till_flush(2 * HZ);
+               rcu_set_jiffies_lazy_flush(2 * HZ);
                rcu_barrier();
 
                jif_start = jiffies;
@@ -775,7 +775,7 @@ kfree_scale_init(void)
 
                smp_cond_load_relaxed(&rcu_lazy_test1_cb_called, VAL == 1);
 
-               rcu_lazy_set_jiffies_till_flush(orig_jif);
+               rcu_set_jiffies_lazy_flush(orig_jif);
 
                if (WARN_ON_ONCE(jiffies_at_lazy_cb - jif_start < 2 * HZ)) {
                        pr_alert("ERROR: call_rcu() CBs are not being lazy as expected!\n");
index 7567ca8e743ca62f92fe2dda179d1bce56aaedef..45d6b4c3d199c13481b281360c2f50d75e600cef 100644 (file)
@@ -1368,9 +1368,13 @@ rcu_torture_writer(void *arg)
        struct rcu_torture *rp;
        struct rcu_torture *old_rp;
        static DEFINE_TORTURE_RANDOM(rand);
+       unsigned long stallsdone = jiffies;
        bool stutter_waited;
        unsigned long ulo[NUM_ACTIVE_RCU_POLL_OLDSTATE];
 
+       // If a new stall test is added, this must be adjusted.
+       if (stall_cpu_holdoff + stall_gp_kthread + stall_cpu)
+               stallsdone += (stall_cpu_holdoff + stall_gp_kthread + stall_cpu + 60) * HZ;
        VERBOSE_TOROUT_STRING("rcu_torture_writer task started");
        if (!can_expedite)
                pr_alert("%s" TORTURE_FLAG
@@ -1576,11 +1580,11 @@ rcu_torture_writer(void *arg)
                    !atomic_read(&rcu_fwd_cb_nodelay) &&
                    !cur_ops->slow_gps &&
                    !torture_must_stop() &&
-                   boot_ended)
+                   boot_ended &&
+                   time_after(jiffies, stallsdone))
                        for (i = 0; i < ARRAY_SIZE(rcu_tortures); i++)
                                if (list_empty(&rcu_tortures[i].rtort_free) &&
-                                   rcu_access_pointer(rcu_torture_current) !=
-                                   &rcu_tortures[i]) {
+                                   rcu_access_pointer(rcu_torture_current) != &rcu_tortures[i]) {
                                        tracing_off();
                                        show_rcu_gp_kthreads();
                                        WARN(1, "%s: rtort_pipe_count: %d\n", __func__, rcu_tortures[i].rtort_pipe_count);
@@ -2441,7 +2445,8 @@ static struct notifier_block rcu_torture_stall_block = {
 
 /*
  * CPU-stall kthread.  It waits as specified by stall_cpu_holdoff, then
- * induces a CPU stall for the time specified by stall_cpu.
+ * induces a CPU stall for the time specified by stall_cpu.  If a new
+ * stall test is added, stallsdone in rcu_torture_writer() must be adjusted.
  */
 static int rcu_torture_stall(void *args)
 {
index 0351a4e83529e322f8e96cecb6244f728478fe7b..e4d673fc30f42f9c1278e2c5ee07d1ee894706f3 100644 (file)
@@ -1234,11 +1234,20 @@ static unsigned long srcu_gp_start_if_needed(struct srcu_struct *ssp,
        if (rhp)
                rcu_segcblist_enqueue(&sdp->srcu_cblist, rhp);
        /*
-        * The snapshot for acceleration must be taken _before_ the read of the
-        * current gp sequence used for advancing, otherwise advancing may fail
-        * and acceleration may then fail too.
+        * It's crucial to capture the snapshot 's' for acceleration before
+        * reading the current gp_seq that is used for advancing. This is
+        * essential because if the acceleration snapshot is taken after a
+        * failed advancement attempt, there's a risk that a grace period may
+        * conclude and a new one may start in the interim. If the snapshot is
+        * captured after this sequence of events, the acceleration snapshot 's'
+        * could be excessively advanced, leading to acceleration failure.
+        * In such a scenario, an 'acceleration leak' can occur, where new
+        * callbacks become indefinitely stuck in the RCU_NEXT_TAIL segment.
+        * Also note that encountering advancing failures is a normal
+        * occurrence when the grace period for RCU_WAIT_TAIL is in progress.
         *
-        * This could happen if:
+        * To see this, consider the following events which occur if
+        * rcu_seq_snap() were to be called after advance:
         *
         *  1) The RCU_WAIT_TAIL segment has callbacks (gp_num = X + 4) and the
         *     RCU_NEXT_READY_TAIL also has callbacks (gp_num = X + 8).
@@ -1264,6 +1273,13 @@ static unsigned long srcu_gp_start_if_needed(struct srcu_struct *ssp,
        if (rhp) {
                rcu_segcblist_advance(&sdp->srcu_cblist,
                                      rcu_seq_current(&ssp->srcu_sup->srcu_gp_seq));
+               /*
+                * Acceleration can never fail because the base current gp_seq
+                * used for acceleration is <= the value of gp_seq used for
+                * advancing.  This means that the RCU_NEXT_TAIL segment can
+                * always be emptied by acceleration into the
+                * RCU_NEXT_READY_TAIL or RCU_WAIT_TAIL segments.
+                */
                WARN_ON_ONCE(!rcu_segcblist_accelerate(&sdp->srcu_cblist, s));
        }
        if (ULONG_CMP_LT(sdp->srcu_gp_seq_needed, s)) {
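Condensed, the ordering that the comment above insists on is the following; this is an illustrative rearrangement of the surrounding code, not new API:

	unsigned long s;

	/* 1: Take the acceleration snapshot first. */
	s = rcu_seq_snap(&ssp->srcu_sup->srcu_gp_seq);

	/* 2: Only then read the current gp_seq used for advancing. */
	rcu_segcblist_advance(&sdp->srcu_cblist,
			      rcu_seq_current(&ssp->srcu_sup->srcu_gp_seq));

	/* 3: Acceleration with 's' therefore cannot fail. */
	WARN_ON_ONCE(!rcu_segcblist_accelerate(&sdp->srcu_cblist, s));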
index e550f97779b8dc82c950bcf92e0676339688c26e..86df878a2fee8b0b78718d8158c23bbb8137fb6f 100644 (file)
@@ -24,22 +24,6 @@ void rcu_sync_init(struct rcu_sync *rsp)
        init_waitqueue_head(&rsp->gp_wait);
 }
 
-/**
- * rcu_sync_enter_start - Force readers onto slow path for multiple updates
- * @rsp: Pointer to rcu_sync structure to use for synchronization
- *
- * Must be called after rcu_sync_init() and before first use.
- *
- * Ensures rcu_sync_is_idle() returns false and rcu_sync_{enter,exit}()
- * pairs turn into NO-OPs.
- */
-void rcu_sync_enter_start(struct rcu_sync *rsp)
-{
-       rsp->gp_count++;
-       rsp->gp_state = GP_PASSED;
-}
-
-
 static void rcu_sync_func(struct rcu_head *rhp);
 
 static void rcu_sync_call(struct rcu_sync *rsp)
index 732ad5b39946a519bb0ba7a0de35537d4d19f38b..147b5945d67a046dd568e6df586881f8c1a0c25f 100644 (file)
@@ -32,6 +32,7 @@ typedef void (*postgp_func_t)(struct rcu_tasks *rtp);
  * @rtp_irq_work: IRQ work queue for deferred wakeups.
  * @barrier_q_head: RCU callback for barrier operation.
  * @rtp_blkd_tasks: List of tasks blocked as readers.
+ * @rtp_exit_list: List of tasks in the latter portion of do_exit().
  * @cpu: CPU number corresponding to this entry.
  * @rtpp: Pointer to the rcu_tasks structure.
  */
@@ -46,6 +47,7 @@ struct rcu_tasks_percpu {
        struct irq_work rtp_irq_work;
        struct rcu_head barrier_q_head;
        struct list_head rtp_blkd_tasks;
+       struct list_head rtp_exit_list;
        int cpu;
        struct rcu_tasks *rtpp;
 };
@@ -144,8 +146,6 @@ static struct rcu_tasks rt_name =                                                   \
 }
 
 #ifdef CONFIG_TASKS_RCU
-/* Track exiting tasks in order to allow them to be waited for. */
-DEFINE_STATIC_SRCU(tasks_rcu_exit_srcu);
 
 /* Report delay in synchronize_srcu() completion in rcu_tasks_postscan(). */
 static void tasks_rcu_exit_srcu_stall(struct timer_list *unused);
@@ -240,7 +240,6 @@ static const char *tasks_gp_state_getname(struct rcu_tasks *rtp)
 static void cblist_init_generic(struct rcu_tasks *rtp)
 {
        int cpu;
-       unsigned long flags;
        int lim;
        int shift;
 
@@ -266,15 +265,15 @@ static void cblist_init_generic(struct rcu_tasks *rtp)
                WARN_ON_ONCE(!rtpcp);
                if (cpu)
                        raw_spin_lock_init(&ACCESS_PRIVATE(rtpcp, lock));
-               local_irq_save(flags);  // serialize initialization
                if (rcu_segcblist_empty(&rtpcp->cblist))
                        rcu_segcblist_init(&rtpcp->cblist);
-               local_irq_restore(flags);
                INIT_WORK(&rtpcp->rtp_work, rcu_tasks_invoke_cbs_wq);
                rtpcp->cpu = cpu;
                rtpcp->rtpp = rtp;
                if (!rtpcp->rtp_blkd_tasks.next)
                        INIT_LIST_HEAD(&rtpcp->rtp_blkd_tasks);
+               if (!rtpcp->rtp_exit_list.next)
+                       INIT_LIST_HEAD(&rtpcp->rtp_exit_list);
        }
 
        pr_info("%s: Setting shift to %d and lim to %d rcu_task_cb_adjust=%d.\n", rtp->name,
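The "->next == NULL" test above is the usual idempotent-initialization idiom for list_heads living in zeroed (static or per-CPU) storage: an untouched list_head has a NULL ->next, while an initialized one points to itself. A minimal sketch:

	#include <linux/list.h>

	static void maybe_init_list(struct list_head *head)
	{
		if (!head->next)		/* zeroed, never initialized */
			INIT_LIST_HEAD(head);	/* now head->next == head->prev == head */
	}

Idempotence matters here because either cblist_init_generic() or exit_tasks_rcu_start() may be the first to touch a given rtp_exit_list.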
@@ -851,10 +850,12 @@ static void rcu_tasks_wait_gp(struct rcu_tasks *rtp)
 //     number of voluntary context switches, and add that task to the
 //     holdout list.
 // rcu_tasks_postscan():
-//     Invoke synchronize_srcu() to ensure that all tasks that were
-//     in the process of exiting (and which thus might not know to
-//     synchronize with this RCU Tasks grace period) have completed
-//     exiting.
+//     Gather per-CPU lists of tasks in do_exit() to ensure that all
+//     tasks that were in the process of exiting (and which thus might
+//     not know to synchronize with this RCU Tasks grace period) have
+//     completed exiting.  The synchronize_rcu() in rcu_tasks_postgp()
+//     will take care of any tasks stuck in the non-preemptible region
+//     of do_exit() following its call to exit_tasks_rcu_stop().
 // check_all_holdout_tasks(), repeatedly until holdout list is empty:
 //     Scans the holdout list, attempting to identify a quiescent state
 //     for each task on the list.  If there is a quiescent state, the
@@ -867,8 +868,10 @@ static void rcu_tasks_wait_gp(struct rcu_tasks *rtp)
 //     with interrupts disabled.
 //
 // For each exiting task, the exit_tasks_rcu_start() and
-// exit_tasks_rcu_finish() functions begin and end, respectively, the SRCU
-// read-side critical sections waited for by rcu_tasks_postscan().
+// exit_tasks_rcu_finish() functions add and remove, respectively, the
+// current task to and from a per-CPU list of tasks that rcu_tasks_postscan() must
+// wait on.  This is necessary because rcu_tasks_postscan() must wait on
+// tasks that have already been removed from the global list of tasks.
 //
 // Pre-grace-period update-side code is ordered before the grace
 // period via the raw_spin_lock.*rcu_node().  Pre-grace-period read-side code
@@ -932,9 +935,13 @@ static void rcu_tasks_pertask(struct task_struct *t, struct list_head *hop)
        }
 }
 
+void call_rcu_tasks(struct rcu_head *rhp, rcu_callback_t func);
+DEFINE_RCU_TASKS(rcu_tasks, rcu_tasks_wait_gp, call_rcu_tasks, "RCU Tasks");
+
 /* Processing between scanning taskslist and draining the holdout list. */
 static void rcu_tasks_postscan(struct list_head *hop)
 {
+       int cpu;
        int rtsi = READ_ONCE(rcu_task_stall_info);
 
        if (!IS_ENABLED(CONFIG_TINY_RCU)) {
@@ -948,9 +955,9 @@ static void rcu_tasks_postscan(struct list_head *hop)
         * this, divide the fragile exit path part into two intersecting
         * read side critical sections:
         *
-        * 1) An _SRCU_ read side starting before calling exit_notify(),
-        *    which may remove the task from the tasklist, and ending after
-        *    the final preempt_disable() call in do_exit().
+        * 1) A task_struct list addition before calling exit_notify(),
+        *    which may remove the task from the tasklist, with the matching
+        *    list removal after the final preempt_disable() call in do_exit().
         *
         * 2) An _RCU_ read side starting with the final preempt_disable()
         *    call in do_exit() and ending with the final call to schedule()
@@ -959,7 +966,37 @@ static void rcu_tasks_postscan(struct list_head *hop)
         * This handles the part 1). And postgp will handle part 2) with a
         * call to synchronize_rcu().
         */
-       synchronize_srcu(&tasks_rcu_exit_srcu);
+
+       for_each_possible_cpu(cpu) {
+               unsigned long j = jiffies + 1;
+               struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rcu_tasks.rtpcpu, cpu);
+               struct task_struct *t;
+               struct task_struct *t1;
+               struct list_head tmp;
+
+               raw_spin_lock_irq_rcu_node(rtpcp);
+               list_for_each_entry_safe(t, t1, &rtpcp->rtp_exit_list, rcu_tasks_exit_list) {
+                       if (list_empty(&t->rcu_tasks_holdout_list))
+                               rcu_tasks_pertask(t, hop);
+
+                       // RT kernels need frequent pauses; otherwise,
+                       // pause at least once per pair of jiffies.
+                       if (!IS_ENABLED(CONFIG_PREEMPT_RT) && time_before(jiffies, j))
+                               continue;
+
+                       // Keep our place in the list while pausing.
+                       // Nothing else traverses this list, so adding a
+                       // bare list_head is OK.
+                       list_add(&tmp, &t->rcu_tasks_exit_list);
+                       raw_spin_unlock_irq_rcu_node(rtpcp);
+                       cond_resched(); // For CONFIG_PREEMPT=n kernels
+                       raw_spin_lock_irq_rcu_node(rtpcp);
+                       t1 = list_entry(tmp.next, struct task_struct, rcu_tasks_exit_list);
+                       list_del(&tmp);
+                       j = jiffies + 1;
+               }
+               raw_spin_unlock_irq_rcu_node(rtpcp);
+       }
 
        if (!IS_ENABLED(CONFIG_TINY_RCU))
                del_timer_sync(&tasks_rcu_exit_srcu_stall_timer);
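The loop above uses a classic trick for dropping a lock mid-walk: insert a bare list_head as a placeholder cursor, release the lock, then resume from the cursor after reacquiring it. A minimal sketch with illustrative names, assuming no other walker dereferences entries while the cursor is queued (the code above guarantees this by being the list's only traverser):

	static void walk_with_pauses(spinlock_t *lock, struct list_head *head)
	{
		struct list_head *pos, *next, cursor;

		spin_lock(lock);
		list_for_each_safe(pos, next, head) {
			/* ... process pos under the lock ... */

			list_add(&cursor, pos);		/* keep our place */
			spin_unlock(lock);
			cond_resched();
			spin_lock(lock);
			next = cursor.next;		/* resume after the cursor */
			list_del(&cursor);
		}
		spin_unlock(lock);
	}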
@@ -1027,7 +1064,6 @@ static void rcu_tasks_postgp(struct rcu_tasks *rtp)
         *
         * In addition, this synchronize_rcu() waits for exiting tasks
         * to complete their final preempt_disable() region of execution,
-        * cleaning up after synchronize_srcu(&tasks_rcu_exit_srcu),
         * enforcing the whole region before tasklist removal until
         * the final schedule() with TASK_DEAD state to be an RCU TASKS
         * read side critical section.
@@ -1035,9 +1071,6 @@ static void rcu_tasks_postgp(struct rcu_tasks *rtp)
        synchronize_rcu();
 }
 
-void call_rcu_tasks(struct rcu_head *rhp, rcu_callback_t func);
-DEFINE_RCU_TASKS(rcu_tasks, rcu_tasks_wait_gp, call_rcu_tasks, "RCU Tasks");
-
 static void tasks_rcu_exit_srcu_stall(struct timer_list *unused)
 {
 #ifndef CONFIG_TINY_RCU
@@ -1118,7 +1151,6 @@ module_param(rcu_tasks_lazy_ms, int, 0444);
 
 static int __init rcu_spawn_tasks_kthread(void)
 {
-       cblist_init_generic(&rcu_tasks);
        rcu_tasks.gp_sleep = HZ / 10;
        rcu_tasks.init_fract = HZ / 10;
        if (rcu_tasks_lazy_ms >= 0)
@@ -1147,25 +1179,48 @@ struct task_struct *get_rcu_tasks_gp_kthread(void)
 EXPORT_SYMBOL_GPL(get_rcu_tasks_gp_kthread);
 
 /*
- * Contribute to protect against tasklist scan blind spot while the
- * task is exiting and may be removed from the tasklist. See
- * corresponding synchronize_srcu() for further details.
+ * Protect against tasklist scan blind spot while the task is exiting and
+ * may be removed from the tasklist.  Do this by adding the task to yet
+ * another list.
+ *
+ * Note that the task will remove itself from this list, so there is no
+ * need for get_task_struct(), except in the case where rcu_tasks_pertask()
+ * adds it to the holdout list, in which case rcu_tasks_pertask() supplies
+ * the needed get_task_struct().
  */
-void exit_tasks_rcu_start(void) __acquires(&tasks_rcu_exit_srcu)
+void exit_tasks_rcu_start(void)
 {
-       current->rcu_tasks_idx = __srcu_read_lock(&tasks_rcu_exit_srcu);
+       unsigned long flags;
+       struct rcu_tasks_percpu *rtpcp;
+       struct task_struct *t = current;
+
+       WARN_ON_ONCE(!list_empty(&t->rcu_tasks_exit_list));
+       preempt_disable();
+       rtpcp = this_cpu_ptr(rcu_tasks.rtpcpu);
+       t->rcu_tasks_exit_cpu = smp_processor_id();
+       raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
+       if (!rtpcp->rtp_exit_list.next)
+               INIT_LIST_HEAD(&rtpcp->rtp_exit_list);
+       list_add(&t->rcu_tasks_exit_list, &rtpcp->rtp_exit_list);
+       raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
+       preempt_enable();
 }
 
 /*
- * Contribute to protect against tasklist scan blind spot while the
- * task is exiting and may be removed from the tasklist. See
- * corresponding synchronize_srcu() for further details.
+ * Remove the task from the "yet another list" because do_exit() is now
+ * non-preemptible, allowing synchronize_rcu() to wait beyond this point.
  */
-void exit_tasks_rcu_stop(void) __releases(&tasks_rcu_exit_srcu)
+void exit_tasks_rcu_stop(void)
 {
+       unsigned long flags;
+       struct rcu_tasks_percpu *rtpcp;
        struct task_struct *t = current;
 
-       __srcu_read_unlock(&tasks_rcu_exit_srcu, t->rcu_tasks_idx);
+       WARN_ON_ONCE(list_empty(&t->rcu_tasks_exit_list));
+       rtpcp = per_cpu_ptr(rcu_tasks.rtpcpu, t->rcu_tasks_exit_cpu);
+       raw_spin_lock_irqsave_rcu_node(rtpcp, flags);
+       list_del_init(&t->rcu_tasks_exit_list);
+       raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags);
 }
 
 /*
@@ -1282,7 +1337,6 @@ module_param(rcu_tasks_rude_lazy_ms, int, 0444);
 
 static int __init rcu_spawn_tasks_rude_kthread(void)
 {
-       cblist_init_generic(&rcu_tasks_rude);
        rcu_tasks_rude.gp_sleep = HZ / 10;
        if (rcu_tasks_rude_lazy_ms >= 0)
                rcu_tasks_rude.lazy_jiffies = msecs_to_jiffies(rcu_tasks_rude_lazy_ms);
@@ -1914,7 +1968,6 @@ module_param(rcu_tasks_trace_lazy_ms, int, 0444);
 
 static int __init rcu_spawn_tasks_trace_kthread(void)
 {
-       cblist_init_generic(&rcu_tasks_trace);
        if (IS_ENABLED(CONFIG_TASKS_TRACE_RCU_READ_MB)) {
                rcu_tasks_trace.gp_sleep = HZ / 10;
                rcu_tasks_trace.init_fract = HZ / 10;
@@ -2086,6 +2139,24 @@ late_initcall(rcu_tasks_verify_schedule_work);
 static void rcu_tasks_initiate_self_tests(void) { }
 #endif /* #else #ifdef CONFIG_PROVE_RCU */
 
+void __init tasks_cblist_init_generic(void)
+{
+       lockdep_assert_irqs_disabled();
+       WARN_ON(num_online_cpus() > 1);
+
+#ifdef CONFIG_TASKS_RCU
+       cblist_init_generic(&rcu_tasks);
+#endif
+
+#ifdef CONFIG_TASKS_RUDE_RCU
+       cblist_init_generic(&rcu_tasks_rude);
+#endif
+
+#ifdef CONFIG_TASKS_TRACE_RCU
+       cblist_init_generic(&rcu_tasks_trace);
+#endif
+}
+
 void __init rcu_init_tasks_generic(void)
 {
 #ifdef CONFIG_TASKS_RCU
index fec804b7908032d4416ea9ff3531eae154bba8f9..705c0d16850aa28db4e0ec7d26d10204e1ed24ce 100644 (file)
@@ -261,4 +261,5 @@ void __init rcu_init(void)
 {
        open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
        rcu_early_boot_tests();
+       tasks_cblist_init_generic();
 }
index 1ae8517778066284be5c7c15111b09a0f726f164..d9642dd06c2535d9a59682e9b9416d980bd6703c 100644 (file)
@@ -145,7 +145,7 @@ static int rcu_scheduler_fully_active __read_mostly;
 
 static void rcu_report_qs_rnp(unsigned long mask, struct rcu_node *rnp,
                              unsigned long gps, unsigned long flags);
-static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu);
+static struct task_struct *rcu_boost_task(struct rcu_node *rnp);
 static void invoke_rcu_core(void);
 static void rcu_report_exp_rdp(struct rcu_data *rdp);
 static void sync_sched_exp_online_cleanup(int cpu);
@@ -1013,6 +1013,38 @@ static bool rcu_future_gp_cleanup(struct rcu_node *rnp)
        return needmore;
 }
 
+static void swake_up_one_online_ipi(void *arg)
+{
+       struct swait_queue_head *wqh = arg;
+
+       swake_up_one(wqh);
+}
+
+static void swake_up_one_online(struct swait_queue_head *wqh)
+{
+       int cpu = get_cpu();
+
+       /*
+        * If called from rcutree_report_cpu_dying(), a wakeup
+        * is dangerous that late in the CPU-down hotplug process. The
+        * scheduler might queue an ignored hrtimer. Defer the wakeup
+        * to an online CPU instead.
+        */
+       if (unlikely(cpu_is_offline(cpu))) {
+               int target;
+
+               target = cpumask_any_and(housekeeping_cpumask(HK_TYPE_RCU),
+                                        cpu_online_mask);
+
+               smp_call_function_single(target, swake_up_one_online_ipi,
+                                        wqh, 0);
+               put_cpu();
+       } else {
+               put_cpu();
+               swake_up_one(wqh);
+       }
+}
+
 /*
  * Awaken the grace-period kthread.  Don't do a self-awaken (unless in an
  * interrupt or softirq handler, in which case we just might immediately
@@ -1037,7 +1069,7 @@ static void rcu_gp_kthread_wake(void)
                return;
        WRITE_ONCE(rcu_state.gp_wake_time, jiffies);
        WRITE_ONCE(rcu_state.gp_wake_seq, READ_ONCE(rcu_state.gp_seq));
-       swake_up_one(&rcu_state.gp_wq);
+       swake_up_one_online(&rcu_state.gp_wq);
 }
 
 /*
@@ -2113,6 +2145,12 @@ static void rcu_do_batch(struct rcu_data *rdp)
         * Extract the list of ready callbacks, disabling IRQs to prevent
         * races with call_rcu() from interrupt handlers.  Leave the
         * callback counts, as rcu_barrier() needs to be conservative.
+        *
+        * Callback execution is fully ordered against preceding grace-period
+        * completion (materialized by the rnp->gp_seq update) thanks to the
+        * smp_mb__after_unlock_lock() implied by the node locking required
+        * for callback advancing. In NOCB mode this ordering is further
+        * relayed through the nocb locking that protects both callback
+        * advancing and extraction.
         */
        rcu_nocb_lock_irqsave(rdp, flags);
        WARN_ON_ONCE(cpu_is_offline(smp_processor_id()));
@@ -2559,12 +2597,26 @@ static int __init rcu_spawn_core_kthreads(void)
        return 0;
 }
 
+static void rcutree_enqueue(struct rcu_data *rdp, struct rcu_head *head, rcu_callback_t func)
+{
+       rcu_segcblist_enqueue(&rdp->cblist, head);
+       if (__is_kvfree_rcu_offset((unsigned long)func))
+               trace_rcu_kvfree_callback(rcu_state.name, head,
+                                        (unsigned long)func,
+                                        rcu_segcblist_n_cbs(&rdp->cblist));
+       else
+               trace_rcu_callback(rcu_state.name, head,
+                                  rcu_segcblist_n_cbs(&rdp->cblist));
+       trace_rcu_segcb_stats(&rdp->cblist, TPS("SegCBQueued"));
+}
+
 /*
  * Handle any core-RCU processing required by a call_rcu() invocation.
  */
-static void __call_rcu_core(struct rcu_data *rdp, struct rcu_head *head,
-                           unsigned long flags)
+static void call_rcu_core(struct rcu_data *rdp, struct rcu_head *head,
+                         rcu_callback_t func, unsigned long flags)
 {
+       rcutree_enqueue(rdp, head, func);
        /*
         * If called from an extended quiescent state, invoke the RCU
         * core in order to force a re-evaluation of RCU's idleness.
@@ -2660,7 +2712,6 @@ __call_rcu_common(struct rcu_head *head, rcu_callback_t func, bool lazy_in)
        unsigned long flags;
        bool lazy;
        struct rcu_data *rdp;
-       bool was_alldone;
 
        /* Misaligned rcu_head! */
        WARN_ON_ONCE((unsigned long)head & (sizeof(void *) - 1));
@@ -2697,30 +2748,18 @@ __call_rcu_common(struct rcu_head *head, rcu_callback_t func, bool lazy_in)
        }
 
        check_cb_ovld(rdp);
-       if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags, lazy))
-               return; // Enqueued onto ->nocb_bypass, so just leave.
-       // If no-CBs CPU gets here, rcu_nocb_try_bypass() acquired ->nocb_lock.
-       rcu_segcblist_enqueue(&rdp->cblist, head);
-       if (__is_kvfree_rcu_offset((unsigned long)func))
-               trace_rcu_kvfree_callback(rcu_state.name, head,
-                                        (unsigned long)func,
-                                        rcu_segcblist_n_cbs(&rdp->cblist));
-       else
-               trace_rcu_callback(rcu_state.name, head,
-                                  rcu_segcblist_n_cbs(&rdp->cblist));
 
-       trace_rcu_segcb_stats(&rdp->cblist, TPS("SegCBQueued"));
-
-       /* Go handle any RCU core processing required. */
-       if (unlikely(rcu_rdp_is_offloaded(rdp))) {
-               __call_rcu_nocb_wake(rdp, was_alldone, flags); /* unlocks */
-       } else {
-               __call_rcu_core(rdp, head, flags);
-               local_irq_restore(flags);
-       }
+       if (unlikely(rcu_rdp_is_offloaded(rdp)))
+               call_rcu_nocb(rdp, head, func, flags, lazy);
+       else
+               call_rcu_core(rdp, head, func, flags);
+       local_irq_restore(flags);
 }
 
 #ifdef CONFIG_RCU_LAZY
+static bool enable_rcu_lazy __read_mostly = !IS_ENABLED(CONFIG_RCU_LAZY_DEFAULT_OFF);
+module_param(enable_rcu_lazy, bool, 0444);
+
 /**
  * call_rcu_hurry() - Queue RCU callback for invocation after grace period, and
  * flush all lazy callbacks (including the new one) to the main ->cblist while
@@ -2746,6 +2785,8 @@ void call_rcu_hurry(struct rcu_head *head, rcu_callback_t func)
        __call_rcu_common(head, func, false);
 }
 EXPORT_SYMBOL_GPL(call_rcu_hurry);
+#else
+#define enable_rcu_lazy                false
 #endif
 
 /**
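As a usage note: since kernel/rcu/tree.c is built in, the module_param() above should surface as a boot-time switch of the form rcutree.enable_rcu_lazy=0, read-only at runtime given the 0444 mode. The #ifdef/#else pair is the usual shape for such config-gated knobs; a minimal sketch with illustrative names:

	#ifdef CONFIG_FOO
	static bool enable_foo __read_mostly = true;
	module_param(enable_foo, bool, 0444);	/* boot-settable, read-only in sysfs */
	#else
	#define enable_foo	false		/* branches on it are compiled away */
	#endif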
@@ -2794,7 +2835,7 @@ EXPORT_SYMBOL_GPL(call_rcu_hurry);
  */
 void call_rcu(struct rcu_head *head, rcu_callback_t func)
 {
-       __call_rcu_common(head, func, IS_ENABLED(CONFIG_RCU_LAZY));
+       __call_rcu_common(head, func, enable_rcu_lazy);
 }
 EXPORT_SYMBOL_GPL(call_rcu);
 
@@ -4362,6 +4403,66 @@ rcu_boot_init_percpu_data(int cpu)
        rcu_boot_init_nocb_percpu_data(rdp);
 }
 
+struct kthread_worker *rcu_exp_gp_kworker;
+
+static void rcu_spawn_exp_par_gp_kworker(struct rcu_node *rnp)
+{
+       struct kthread_worker *kworker;
+       const char *name = "rcu_exp_par_gp_kthread_worker/%d";
+       struct sched_param param = { .sched_priority = kthread_prio };
+       int rnp_index = rnp - rcu_get_root();
+
+       if (rnp->exp_kworker)
+               return;
+
+       kworker = kthread_create_worker(0, name, rnp_index);
+       if (IS_ERR_OR_NULL(kworker)) {
+               pr_err("Failed to create par gp kworker on %d/%d\n",
+                      rnp->grplo, rnp->grphi);
+               return;
+       }
+       WRITE_ONCE(rnp->exp_kworker, kworker);
+
+       if (IS_ENABLED(CONFIG_RCU_EXP_KTHREAD))
+               sched_setscheduler_nocheck(kworker->task, SCHED_FIFO, &param);
+}
+
+static struct task_struct *rcu_exp_par_gp_task(struct rcu_node *rnp)
+{
+       struct kthread_worker *kworker = READ_ONCE(rnp->exp_kworker);
+
+       if (!kworker)
+               return NULL;
+
+       return kworker->task;
+}
+
+static void __init rcu_start_exp_gp_kworker(void)
+{
+       const char *name = "rcu_exp_gp_kthread_worker";
+       struct sched_param param = { .sched_priority = kthread_prio };
+
+       rcu_exp_gp_kworker = kthread_create_worker(0, name);
+       if (IS_ERR_OR_NULL(rcu_exp_gp_kworker)) {
+               pr_err("Failed to create %s!\n", name);
+               rcu_exp_gp_kworker = NULL;
+               return;
+       }
+
+       if (IS_ENABLED(CONFIG_RCU_EXP_KTHREAD))
+               sched_setscheduler_nocheck(rcu_exp_gp_kworker->task, SCHED_FIFO, &param);
+}
+
+static void rcu_spawn_rnp_kthreads(struct rcu_node *rnp)
+{
+       if (rcu_scheduler_fully_active) {
+               mutex_lock(&rnp->kthread_mutex);
+               rcu_spawn_one_boost_kthread(rnp);
+               rcu_spawn_exp_par_gp_kworker(rnp);
+               mutex_unlock(&rnp->kthread_mutex);
+       }
+}
+
 /*
  * Invoked early in the CPU-online process, when pretty much all services
  * are available.  The incoming CPU is not present.
@@ -4410,7 +4511,7 @@ int rcutree_prepare_cpu(unsigned int cpu)
        rdp->rcu_iw_gp_seq = rdp->gp_seq - 1;
        trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("cpuonl"));
        raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
-       rcu_spawn_one_boost_kthread(rnp);
+       rcu_spawn_rnp_kthreads(rnp);
        rcu_spawn_cpu_nocb_kthread(cpu);
        WRITE_ONCE(rcu_state.n_online_cpus, rcu_state.n_online_cpus + 1);
 
@@ -4418,13 +4519,64 @@ int rcutree_prepare_cpu(unsigned int cpu)
 }
 
 /*
- * Update RCU priority boot kthread affinity for CPU-hotplug changes.
+ * Update kthread affinity during CPU-hotplug changes.
+ *
+ * Set the per-rcu_node kthread's affinity to cover all CPUs that are
+ * served by the rcu_node in question.  The CPU hotplug lock is still
+ * held, so the value of rnp->qsmaskinit will be stable.
+ *
+ * We don't include outgoingcpu in the affinity set; use -1 if there is
+ * no outgoing CPU.  If there are no CPUs left in the affinity set,
+ * this function allows the kthread to execute on any CPU.
+ *
+ * Any future concurrent calls are serialized via ->kthread_mutex.
  */
-static void rcutree_affinity_setting(unsigned int cpu, int outgoing)
+static void rcutree_affinity_setting(unsigned int cpu, int outgoingcpu)
 {
-       struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
+       cpumask_var_t cm;
+       unsigned long mask;
+       struct rcu_data *rdp;
+       struct rcu_node *rnp;
+       struct task_struct *task_boost, *task_exp;
+
+       rdp = per_cpu_ptr(&rcu_data, cpu);
+       rnp = rdp->mynode;
+
+       task_boost = rcu_boost_task(rnp);
+       task_exp = rcu_exp_par_gp_task(rnp);
 
-       rcu_boost_kthread_setaffinity(rdp->mynode, outgoing);
+       /*
+        * If CPU is the boot one, those tasks are created later from early
+        * initcall since kthreadd must be created first.
+        */
+       if (!task_boost && !task_exp)
+               return;
+
+       if (!zalloc_cpumask_var(&cm, GFP_KERNEL))
+               return;
+
+       mutex_lock(&rnp->kthread_mutex);
+       mask = rcu_rnp_online_cpus(rnp);
+       for_each_leaf_node_possible_cpu(rnp, cpu)
+               if ((mask & leaf_node_cpu_bit(rnp, cpu)) &&
+                   cpu != outgoingcpu)
+                       cpumask_set_cpu(cpu, cm);
+       cpumask_and(cm, cm, housekeeping_cpumask(HK_TYPE_RCU));
+       if (cpumask_empty(cm)) {
+               cpumask_copy(cm, housekeeping_cpumask(HK_TYPE_RCU));
+               if (outgoingcpu >= 0)
+                       cpumask_clear_cpu(outgoingcpu, cm);
+       }
+
+       if (task_exp)
+               set_cpus_allowed_ptr(task_exp, cm);
+
+       if (task_boost)
+               set_cpus_allowed_ptr(task_boost, cm);
+
+       mutex_unlock(&rnp->kthread_mutex);
+
+       free_cpumask_var(cm);
 }
 
 /*
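The affinity computation above follows the CONFIG_CPUMASK_OFFSTACK-safe pattern for temporary cpumasks. A minimal sketch of that pattern in isolation, with an illustrative function built from real cpumask APIs:

	static void set_affinity_sketch(struct task_struct *t, int outgoingcpu)
	{
		cpumask_var_t cm;

		if (!zalloc_cpumask_var(&cm, GFP_KERNEL))
			return;		/* on allocation failure, skip the update */

		cpumask_and(cm, cpu_online_mask, housekeeping_cpumask(HK_TYPE_RCU));
		if (outgoingcpu >= 0)
			cpumask_clear_cpu(outgoingcpu, cm);
		if (!cpumask_empty(cm))
			set_cpus_allowed_ptr(t, cm);

		free_cpumask_var(cm);
	}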
@@ -4608,8 +4760,9 @@ void rcutree_migrate_callbacks(int cpu)
                __call_rcu_nocb_wake(my_rdp, true, flags);
        } else {
                rcu_nocb_unlock(my_rdp); /* irqs remain disabled. */
-               raw_spin_unlock_irqrestore_rcu_node(my_rnp, flags);
+               raw_spin_unlock_rcu_node(my_rnp); /* irqs remain disabled. */
        }
+       local_irq_restore(flags);
        if (needwake)
                rcu_gp_kthread_wake();
        lockdep_assert_irqs_enabled();
@@ -4698,51 +4851,6 @@ static int rcu_pm_notify(struct notifier_block *self,
        return NOTIFY_OK;
 }
 
-#ifdef CONFIG_RCU_EXP_KTHREAD
-struct kthread_worker *rcu_exp_gp_kworker;
-struct kthread_worker *rcu_exp_par_gp_kworker;
-
-static void __init rcu_start_exp_gp_kworkers(void)
-{
-       const char *par_gp_kworker_name = "rcu_exp_par_gp_kthread_worker";
-       const char *gp_kworker_name = "rcu_exp_gp_kthread_worker";
-       struct sched_param param = { .sched_priority = kthread_prio };
-
-       rcu_exp_gp_kworker = kthread_create_worker(0, gp_kworker_name);
-       if (IS_ERR_OR_NULL(rcu_exp_gp_kworker)) {
-               pr_err("Failed to create %s!\n", gp_kworker_name);
-               return;
-       }
-
-       rcu_exp_par_gp_kworker = kthread_create_worker(0, par_gp_kworker_name);
-       if (IS_ERR_OR_NULL(rcu_exp_par_gp_kworker)) {
-               pr_err("Failed to create %s!\n", par_gp_kworker_name);
-               kthread_destroy_worker(rcu_exp_gp_kworker);
-               return;
-       }
-
-       sched_setscheduler_nocheck(rcu_exp_gp_kworker->task, SCHED_FIFO, &param);
-       sched_setscheduler_nocheck(rcu_exp_par_gp_kworker->task, SCHED_FIFO,
-                                  &param);
-}
-
-static inline void rcu_alloc_par_gp_wq(void)
-{
-}
-#else /* !CONFIG_RCU_EXP_KTHREAD */
-struct workqueue_struct *rcu_par_gp_wq;
-
-static void __init rcu_start_exp_gp_kworkers(void)
-{
-}
-
-static inline void rcu_alloc_par_gp_wq(void)
-{
-       rcu_par_gp_wq = alloc_workqueue("rcu_par_gp", WQ_MEM_RECLAIM, 0);
-       WARN_ON(!rcu_par_gp_wq);
-}
-#endif /* CONFIG_RCU_EXP_KTHREAD */
-
 /*
  * Spawn the kthreads that handle RCU's grace periods.
  */
@@ -4777,10 +4885,10 @@ static int __init rcu_spawn_gp_kthread(void)
         * due to rcu_scheduler_fully_active.
         */
        rcu_spawn_cpu_nocb_kthread(smp_processor_id());
-       rcu_spawn_one_boost_kthread(rdp->mynode);
+       rcu_spawn_rnp_kthreads(rdp->mynode);
        rcu_spawn_core_kthreads();
        /* Create kthread worker for expedited GPs */
-       rcu_start_exp_gp_kworkers();
+       rcu_start_exp_gp_kworker();
        return 0;
 }
 early_initcall(rcu_spawn_gp_kthread);
@@ -4883,7 +4991,7 @@ static void __init rcu_init_one(void)
                        init_waitqueue_head(&rnp->exp_wq[2]);
                        init_waitqueue_head(&rnp->exp_wq[3]);
                        spin_lock_init(&rnp->exp_lock);
-                       mutex_init(&rnp->boost_kthread_mutex);
+                       mutex_init(&rnp->kthread_mutex);
                        raw_spin_lock_init(&rnp->exp_poll_lock);
                        rnp->exp_seq_poll_rq = RCU_GET_STATE_COMPLETED;
                        INIT_WORK(&rnp->exp_poll_wq, sync_rcu_do_polled_gp);
@@ -5120,7 +5228,6 @@ void __init rcu_init(void)
        /* Create workqueue for Tree SRCU and for expedited GPs. */
        rcu_gp_wq = alloc_workqueue("rcu_gp", WQ_MEM_RECLAIM, 0);
        WARN_ON(!rcu_gp_wq);
-       rcu_alloc_par_gp_wq();
 
        /* Fill in default value for rcutree.qovld boot parameter. */
        /* -After- the rcu_node ->lock fields are initialized! */
@@ -5133,6 +5240,8 @@ void __init rcu_init(void)
        (void)start_poll_synchronize_rcu_expedited();
 
        rcu_test_sync_prims();
+
+       tasks_cblist_init_generic();
 }
 
 #include "tree_stall.h"
index e9821a8422dbe7d327f875a633a8d4fc1348feba..df48160b3136dca1b668a3e0d275832ccc273dd3 100644 (file)
 
 #include "rcu_segcblist.h"
 
-/* Communicate arguments to a workqueue handler. */
+/* Communicate arguments to a kthread worker handler. */
 struct rcu_exp_work {
        unsigned long rew_s;
-#ifdef CONFIG_RCU_EXP_KTHREAD
        struct kthread_work rew_work;
-#else
-       struct work_struct rew_work;
-#endif /* CONFIG_RCU_EXP_KTHREAD */
 };
 
 /* RCU's kthread states for tracing. */
@@ -72,6 +68,9 @@ struct rcu_node {
                                /* Online CPUs for next expedited GP. */
                                /*  Any CPU that has ever been online will */
                                /*  have its bit set. */
+       struct kthread_worker *exp_kworker;
+                               /* Workers performing per-node expedited GP */
+                               /* initialization. */
        unsigned long cbovldmask;
                                /* CPUs experiencing callback overload. */
        unsigned long ffmask;   /* Fully functional CPUs. */
@@ -113,7 +112,7 @@ struct rcu_node {
                                /*  side effect, not as a lock. */
        unsigned long boost_time;
                                /* When to start boosting (jiffies). */
-       struct mutex boost_kthread_mutex;
+       struct mutex kthread_mutex;
                                /* Exclusion for thread spawning and affinity */
                                /*  manipulation. */
        struct task_struct *boost_kthread_task;
@@ -467,11 +466,10 @@ static void rcu_init_one_nocb(struct rcu_node *rnp);
 static bool wake_nocb_gp(struct rcu_data *rdp, bool force);
 static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
                                  unsigned long j, bool lazy);
-static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
-                               bool *was_alldone, unsigned long flags,
-                               bool lazy);
-static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_empty,
-                                unsigned long flags);
+static void call_rcu_nocb(struct rcu_data *rdp, struct rcu_head *head,
+                         rcu_callback_t func, unsigned long flags, bool lazy);
+static void __maybe_unused __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_empty,
+                                               unsigned long flags);
 static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp, int level);
 static bool do_nocb_deferred_wakeup(struct rcu_data *rdp);
 static void rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp);
index 6d7cea5d591f95d823b63972da899dded9e369d1..6b83537480b12f01c05be827b3eeae87dbb4c6ce 100644 (file)
@@ -173,7 +173,6 @@ static bool sync_rcu_exp_done_unlocked(struct rcu_node *rnp)
        return ret;
 }
 
-
 /*
  * Report the exit from RCU read-side critical section for the last task
  * that queued itself during or before the current expedited preemptible-RCU
@@ -199,10 +198,9 @@ static void __rcu_report_exp_rnp(struct rcu_node *rnp,
                }
                if (rnp->parent == NULL) {
                        raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
-                       if (wake) {
-                               smp_mb(); /* EGP done before wake_up(). */
-                               swake_up_one(&rcu_state.expedited_wq);
-                       }
+                       if (wake)
+                               swake_up_one_online(&rcu_state.expedited_wq);
+
                        break;
                }
                mask = rnp->grpmask;
@@ -420,7 +418,6 @@ retry_ipi:
 
 static void rcu_exp_sel_wait_wake(unsigned long s);
 
-#ifdef CONFIG_RCU_EXP_KTHREAD
 static void sync_rcu_exp_select_node_cpus(struct kthread_work *wp)
 {
        struct rcu_exp_work *rewp =
@@ -429,9 +426,14 @@ static void sync_rcu_exp_select_node_cpus(struct kthread_work *wp)
        __sync_rcu_exp_select_node_cpus(rewp);
 }
 
-static inline bool rcu_gp_par_worker_started(void)
+static inline bool rcu_exp_worker_started(void)
+{
+       return !!READ_ONCE(rcu_exp_gp_kworker);
+}
+
+static inline bool rcu_exp_par_worker_started(struct rcu_node *rnp)
 {
-       return !!READ_ONCE(rcu_exp_par_gp_kworker);
+       return !!READ_ONCE(rnp->exp_kworker);
 }
 
 static inline void sync_rcu_exp_select_cpus_queue_work(struct rcu_node *rnp)
@@ -442,7 +444,7 @@ static inline void sync_rcu_exp_select_cpus_queue_work(struct rcu_node *rnp)
         * another work item on the same kthread worker can result in
         * deadlock.
         */
-       kthread_queue_work(rcu_exp_par_gp_kworker, &rnp->rew.rew_work);
+       kthread_queue_work(READ_ONCE(rnp->exp_kworker), &rnp->rew.rew_work);
 }
 
 static inline void sync_rcu_exp_select_cpus_flush_work(struct rcu_node *rnp)
@@ -467,64 +469,6 @@ static inline void synchronize_rcu_expedited_queue_work(struct rcu_exp_work *rew
        kthread_queue_work(rcu_exp_gp_kworker, &rew->rew_work);
 }
 
-static inline void synchronize_rcu_expedited_destroy_work(struct rcu_exp_work *rew)
-{
-}
-#else /* !CONFIG_RCU_EXP_KTHREAD */
-static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
-{
-       struct rcu_exp_work *rewp =
-               container_of(wp, struct rcu_exp_work, rew_work);
-
-       __sync_rcu_exp_select_node_cpus(rewp);
-}
-
-static inline bool rcu_gp_par_worker_started(void)
-{
-       return !!READ_ONCE(rcu_par_gp_wq);
-}
-
-static inline void sync_rcu_exp_select_cpus_queue_work(struct rcu_node *rnp)
-{
-       int cpu = find_next_bit(&rnp->ffmask, BITS_PER_LONG, -1);
-
-       INIT_WORK(&rnp->rew.rew_work, sync_rcu_exp_select_node_cpus);
-       /* If all offline, queue the work on an unbound CPU. */
-       if (unlikely(cpu > rnp->grphi - rnp->grplo))
-               cpu = WORK_CPU_UNBOUND;
-       else
-               cpu += rnp->grplo;
-       queue_work_on(cpu, rcu_par_gp_wq, &rnp->rew.rew_work);
-}
-
-static inline void sync_rcu_exp_select_cpus_flush_work(struct rcu_node *rnp)
-{
-       flush_work(&rnp->rew.rew_work);
-}
-
-/*
- * Work-queue handler to drive an expedited grace period forward.
- */
-static void wait_rcu_exp_gp(struct work_struct *wp)
-{
-       struct rcu_exp_work *rewp;
-
-       rewp = container_of(wp, struct rcu_exp_work, rew_work);
-       rcu_exp_sel_wait_wake(rewp->rew_s);
-}
-
-static inline void synchronize_rcu_expedited_queue_work(struct rcu_exp_work *rew)
-{
-       INIT_WORK_ONSTACK(&rew->rew_work, wait_rcu_exp_gp);
-       queue_work(rcu_gp_wq, &rew->rew_work);
-}
-
-static inline void synchronize_rcu_expedited_destroy_work(struct rcu_exp_work *rew)
-{
-       destroy_work_on_stack(&rew->rew_work);
-}
-#endif /* CONFIG_RCU_EXP_KTHREAD */
-
 /*
  * Select the nodes that the upcoming expedited grace period needs
  * to wait for.
@@ -542,7 +486,7 @@ static void sync_rcu_exp_select_cpus(void)
                rnp->exp_need_flush = false;
                if (!READ_ONCE(rnp->expmask))
                        continue; /* Avoid early boot non-existent wq. */
-               if (!rcu_gp_par_worker_started() ||
+               if (!rcu_exp_par_worker_started(rnp) ||
                    rcu_scheduler_active != RCU_SCHEDULER_RUNNING ||
                    rcu_is_last_leaf_node(rnp)) {
                        /* No worker started yet or last leaf, do direct call. */
@@ -957,7 +901,6 @@ static void rcu_exp_print_detail_task_stall_rnp(struct rcu_node *rnp)
  */
 void synchronize_rcu_expedited(void)
 {
-       bool boottime = (rcu_scheduler_active == RCU_SCHEDULER_INIT);
        unsigned long flags;
        struct rcu_exp_work rew;
        struct rcu_node *rnp;
@@ -997,7 +940,7 @@ void synchronize_rcu_expedited(void)
                return;  /* Someone else did our work for us. */
 
        /* Ensure that load happens before action based on it. */
-       if (unlikely(boottime)) {
+       if (unlikely((rcu_scheduler_active == RCU_SCHEDULER_INIT) || !rcu_exp_worker_started())) {
                /* Direct call during scheduler init and early_initcalls(). */
                rcu_exp_sel_wait_wake(s);
        } else {
@@ -1014,9 +957,6 @@ void synchronize_rcu_expedited(void)
 
        /* Let the next expedited grace period start. */
        mutex_unlock(&rcu_state.exp_mutex);
-
-       if (likely(!boottime))
-               synchronize_rcu_expedited_destroy_work(&rew);
 }
 EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
 
index 4efbf7333d4e168c3bbaca61b43b919d5ce3fe26..3f85577bddd4ef101cbfc682a3c9622ca816d14a 100644 (file)
@@ -256,6 +256,7 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force)
        return __wake_nocb_gp(rdp_gp, rdp, force, flags);
 }
 
+#ifdef CONFIG_RCU_LAZY
 /*
  * LAZY_FLUSH_JIFFIES decides the maximum amount of time that
  * can elapse before lazy callbacks are flushed. Lazy callbacks
@@ -264,21 +265,20 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force)
  * left unsubmitted to RCU after those many jiffies.
  */
 #define LAZY_FLUSH_JIFFIES (10 * HZ)
-static unsigned long jiffies_till_flush = LAZY_FLUSH_JIFFIES;
+static unsigned long jiffies_lazy_flush = LAZY_FLUSH_JIFFIES;
 
-#ifdef CONFIG_RCU_LAZY
 // To be called only from test code.
-void rcu_lazy_set_jiffies_till_flush(unsigned long jif)
+void rcu_set_jiffies_lazy_flush(unsigned long jif)
 {
-       jiffies_till_flush = jif;
+       jiffies_lazy_flush = jif;
 }
-EXPORT_SYMBOL(rcu_lazy_set_jiffies_till_flush);
+EXPORT_SYMBOL(rcu_set_jiffies_lazy_flush);
 
-unsigned long rcu_lazy_get_jiffies_till_flush(void)
+unsigned long rcu_get_jiffies_lazy_flush(void)
 {
-       return jiffies_till_flush;
+       return jiffies_lazy_flush;
 }
-EXPORT_SYMBOL(rcu_lazy_get_jiffies_till_flush);
+EXPORT_SYMBOL(rcu_get_jiffies_lazy_flush);
 #endif
 
 /*
@@ -299,7 +299,7 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
         */
        if (waketype == RCU_NOCB_WAKE_LAZY &&
            rdp->nocb_defer_wakeup == RCU_NOCB_WAKE_NOT) {
-               mod_timer(&rdp_gp->nocb_timer, jiffies + jiffies_till_flush);
+               mod_timer(&rdp_gp->nocb_timer, jiffies + rcu_get_jiffies_lazy_flush());
                WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
        } else if (waketype == RCU_NOCB_WAKE_BYPASS) {
                mod_timer(&rdp_gp->nocb_timer, jiffies + 2);
@@ -482,7 +482,7 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
        // flush ->nocb_bypass to ->cblist.
        if ((ncbs && !bypass_is_lazy && j != READ_ONCE(rdp->nocb_bypass_first)) ||
            (ncbs &&  bypass_is_lazy &&
-            (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + jiffies_till_flush))) ||
+            (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + rcu_get_jiffies_lazy_flush()))) ||
            ncbs >= qhimark) {
                rcu_nocb_lock(rdp);
                *was_alldone = !rcu_segcblist_pend_cbs(&rdp->cblist);
@@ -532,9 +532,7 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
        // 2. Both of these conditions are met:
        //    a. The bypass list previously had only lazy CBs, and:
        //    b. The new CB is non-lazy.
-       if (ncbs && (!bypass_is_lazy || lazy)) {
-               local_irq_restore(flags);
-       } else {
+       if (!ncbs || (bypass_is_lazy && !lazy)) {
                // No-CBs GP kthread might be indefinitely asleep, if so, wake.
                rcu_nocb_lock(rdp); // Rare during call_rcu() flood.
                if (!rcu_segcblist_pend_cbs(&rdp->cblist)) {
@@ -544,7 +542,7 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
                } else {
                        trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
                                            TPS("FirstBQnoWake"));
-                       rcu_nocb_unlock_irqrestore(rdp, flags);
+                       rcu_nocb_unlock(rdp);
                }
        }
        return true; // Callback already enqueued.
@@ -566,11 +564,12 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
        long lazy_len;
        long len;
        struct task_struct *t;
+       struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
 
        // If we are being polled or there is no kthread, just leave.
        t = READ_ONCE(rdp->nocb_gp_kthread);
        if (rcu_nocb_poll || !t) {
-               rcu_nocb_unlock_irqrestore(rdp, flags);
+               rcu_nocb_unlock(rdp);
                trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
                                    TPS("WakeNotPoll"));
                return;
@@ -583,17 +582,17 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
                rdp->qlen_last_fqs_check = len;
                // Only lazy CBs in bypass list
                if (lazy_len && bypass_len == lazy_len) {
-                       rcu_nocb_unlock_irqrestore(rdp, flags);
+                       rcu_nocb_unlock(rdp);
                        wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE_LAZY,
                                           TPS("WakeLazy"));
                } else if (!irqs_disabled_flags(flags)) {
                        /* ... if queue was empty ... */
-                       rcu_nocb_unlock_irqrestore(rdp, flags);
+                       rcu_nocb_unlock(rdp);
                        wake_nocb_gp(rdp, false);
                        trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
                                            TPS("WakeEmpty"));
                } else {
-                       rcu_nocb_unlock_irqrestore(rdp, flags);
+                       rcu_nocb_unlock(rdp);
                        wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE,
                                           TPS("WakeEmptyIsDeferred"));
                }
@@ -610,20 +609,32 @@ static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_alldone,
                smp_mb(); /* Enqueue before timer_pending(). */
                if ((rdp->nocb_cb_sleep ||
                     !rcu_segcblist_ready_cbs(&rdp->cblist)) &&
-                   !timer_pending(&rdp->nocb_timer)) {
-                       rcu_nocb_unlock_irqrestore(rdp, flags);
+                   !timer_pending(&rdp_gp->nocb_timer)) {
+                       rcu_nocb_unlock(rdp);
                        wake_nocb_gp_defer(rdp, RCU_NOCB_WAKE_FORCE,
                                           TPS("WakeOvfIsDeferred"));
                } else {
-                       rcu_nocb_unlock_irqrestore(rdp, flags);
+                       rcu_nocb_unlock(rdp);
                        trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WakeNot"));
                }
        } else {
-               rcu_nocb_unlock_irqrestore(rdp, flags);
+               rcu_nocb_unlock(rdp);
                trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WakeNot"));
        }
 }
 
+static void call_rcu_nocb(struct rcu_data *rdp, struct rcu_head *head,
+                         rcu_callback_t func, unsigned long flags, bool lazy)
+{
+       bool was_alldone;
+
+       if (!rcu_nocb_try_bypass(rdp, head, &was_alldone, flags, lazy)) {
+               /* Not enqueued on bypass but locked, do regular enqueue */
+               rcutree_enqueue(rdp, head, func);
+               __call_rcu_nocb_wake(rdp, was_alldone, flags); /* unlocks */
+       }
+}
+
 static int nocb_gp_toggle_rdp(struct rcu_data *rdp,
                               bool *wake_state)
 {
@@ -723,7 +734,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
                lazy_ncbs = READ_ONCE(rdp->lazy_len);
 
                if (bypass_ncbs && (lazy_ncbs == bypass_ncbs) &&
-                   (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + jiffies_till_flush) ||
+                   (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + rcu_get_jiffies_lazy_flush()) ||
                     bypass_ncbs > 2 * qhimark)) {
                        flush_bypass = true;
                } else if (bypass_ncbs && (lazy_ncbs != bypass_ncbs) &&
@@ -779,7 +790,6 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
                if (rcu_segcblist_ready_cbs(&rdp->cblist)) {
                        needwake = rdp->nocb_cb_sleep;
                        WRITE_ONCE(rdp->nocb_cb_sleep, false);
-                       smp_mb(); /* CB invocation -after- GP end. */
                } else {
                        needwake = false;
                }
@@ -933,8 +943,7 @@ static void nocb_cb_wait(struct rcu_data *rdp)
                swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
                                                    nocb_cb_wait_cond(rdp));
 
-               // VVV Ensure CB invocation follows _sleep test.
-               if (smp_load_acquire(&rdp->nocb_cb_sleep)) { // ^^^
+               if (READ_ONCE(rdp->nocb_cb_sleep)) {
                        WARN_ON(signal_pending(current));
                        trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("WokeEmpty"));
                }
@@ -1383,7 +1392,7 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
                        rcu_nocb_unlock_irqrestore(rdp, flags);
                        continue;
                }
-               WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies, false));
+               rcu_nocb_try_flush_bypass(rdp, jiffies);
                rcu_nocb_unlock_irqrestore(rdp, flags);
                wake_nocb_gp(rdp, false);
                sc->nr_to_scan -= _count;
@@ -1768,10 +1777,10 @@ static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
        return true;
 }
 
-static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
-                               bool *was_alldone, unsigned long flags, bool lazy)
+static void call_rcu_nocb(struct rcu_data *rdp, struct rcu_head *head,
+                         rcu_callback_t func, unsigned long flags, bool lazy)
 {
-       return false;
+       WARN_ON_ONCE(1);  /* Should be dead code! */
 }
 
 static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_empty,
index 41021080ad258d7f68d43762ceb2299266754712..36a8b5dbf5b52ecff9e879f102a53fa6cd264474 100644 (file)
@@ -1195,14 +1195,13 @@ static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp)
        struct sched_param sp;
        struct task_struct *t;
 
-       mutex_lock(&rnp->boost_kthread_mutex);
-       if (rnp->boost_kthread_task || !rcu_scheduler_fully_active)
-               goto out;
+       if (rnp->boost_kthread_task)
+               return;
 
        t = kthread_create(rcu_boost_kthread, (void *)rnp,
                           "rcub/%d", rnp_index);
        if (WARN_ON_ONCE(IS_ERR(t)))
-               goto out;
+               return;
 
        raw_spin_lock_irqsave_rcu_node(rnp, flags);
        rnp->boost_kthread_task = t;
@@ -1210,48 +1209,11 @@ static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp)
        sp.sched_priority = kthread_prio;
        sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
        wake_up_process(t); /* get to TASK_INTERRUPTIBLE quickly. */
-
- out:
-       mutex_unlock(&rnp->boost_kthread_mutex);
 }
 
-/*
- * Set the per-rcu_node kthread's affinity to cover all CPUs that are
- * served by the rcu_node in question.  The CPU hotplug lock is still
- * held, so the value of rnp->qsmaskinit will be stable.
- *
- * We don't include outgoingcpu in the affinity set, use -1 if there is
- * no outgoing CPU.  If there are no CPUs left in the affinity set,
- * this function allows the kthread to execute on any CPU.
- *
- * Any future concurrent calls are serialized via ->boost_kthread_mutex.
- */
-static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
+static struct task_struct *rcu_boost_task(struct rcu_node *rnp)
 {
-       struct task_struct *t = rnp->boost_kthread_task;
-       unsigned long mask;
-       cpumask_var_t cm;
-       int cpu;
-
-       if (!t)
-               return;
-       if (!zalloc_cpumask_var(&cm, GFP_KERNEL))
-               return;
-       mutex_lock(&rnp->boost_kthread_mutex);
-       mask = rcu_rnp_online_cpus(rnp);
-       for_each_leaf_node_possible_cpu(rnp, cpu)
-               if ((mask & leaf_node_cpu_bit(rnp, cpu)) &&
-                   cpu != outgoingcpu)
-                       cpumask_set_cpu(cpu, cm);
-       cpumask_and(cm, cm, housekeeping_cpumask(HK_TYPE_RCU));
-       if (cpumask_empty(cm)) {
-               cpumask_copy(cm, housekeeping_cpumask(HK_TYPE_RCU));
-               if (outgoingcpu >= 0)
-                       cpumask_clear_cpu(outgoingcpu, cm);
-       }
-       set_cpus_allowed_ptr(t, cm);
-       mutex_unlock(&rnp->boost_kthread_mutex);
-       free_cpumask_var(cm);
+       return READ_ONCE(rnp->boost_kthread_task);
 }
 
 #else /* #ifdef CONFIG_RCU_BOOST */
@@ -1270,10 +1232,10 @@ static void rcu_spawn_one_boost_kthread(struct rcu_node *rnp)
 {
 }
 
-static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
+static struct task_struct *rcu_boost_task(struct rcu_node *rnp)
 {
+       return NULL;
 }
-
 #endif /* #else #ifdef CONFIG_RCU_BOOST */
 
 /*
index 9116bcc903467fe0d5854e3deb06b8d334cf85eb..540f229700b62c3e2d3f084ce68549251ad39be8 100644 (file)
@@ -3955,6 +3955,17 @@ void wake_up_if_idle(int cpu)
        }
 }
 
+bool cpus_equal_capacity(int this_cpu, int that_cpu)
+{
+       if (!sched_asym_cpucap_active())
+               return true;
+
+       if (this_cpu == that_cpu)
+               return true;
+
+       return arch_scale_cpu_capacity(this_cpu) == arch_scale_cpu_capacity(that_cpu);
+}
+
 bool cpus_share_cache(int this_cpu, int that_cpu)
 {
        if (this_cpu == that_cpu)
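A usage sketch for the new helper (a hypothetical caller, not part of this patch): wake-up placement logic can now require both a shared cache and equal capacity before treating two CPUs as interchangeable.

	static bool interchangeable_cpus(int this_cpu, int that_cpu)
	{
		return cpus_share_cache(this_cpu, that_cpu) &&
		       cpus_equal_capacity(this_cpu, that_cpu);
	}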
@@ -6787,10 +6798,12 @@ static inline void sched_submit_work(struct task_struct *tsk)
 
 static void sched_update_worker(struct task_struct *tsk)
 {
-       if (tsk->flags & (PF_WQ_WORKER | PF_IO_WORKER)) {
+       if (tsk->flags & (PF_WQ_WORKER | PF_IO_WORKER | PF_BLOCK_TS)) {
+               if (tsk->flags & PF_BLOCK_TS)
+                       blk_plug_invalidate_ts(tsk);
                if (tsk->flags & PF_WQ_WORKER)
                        wq_worker_running(tsk);
-               else
+               else if (tsk->flags & PF_IO_WORKER)
                        io_wq_worker_running(tsk);
        }
 }
index 2ad881d07752c15f60a4c14bee21051117d5aeb2..4e715b9b278e7fd7fbea70110f5a829635a4bc01 100644 (file)
        | MEMBARRIER_PRIVATE_EXPEDITED_RSEQ_BITMASK                     \
        | MEMBARRIER_CMD_GET_REGISTRATIONS)
 
+static DEFINE_MUTEX(membarrier_ipi_mutex);
+#define SERIALIZE_IPI() guard(mutex)(&membarrier_ipi_mutex)
+
 static void ipi_mb(void *info)
 {
        smp_mb();       /* IPIs should be serializing but paranoid. */
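SERIALIZE_IPI() leans on the scope-based cleanup helpers from linux/cleanup.h: guard(mutex) releases the mutex automatically when the enclosing scope exits, so early returns need no explicit unlock. A minimal sketch, with a hypothetical condition and body:

	static int serialized_section_sketch(void)
	{
		SERIALIZE_IPI();	/* guard(mutex)(&membarrier_ipi_mutex) */

		if (!resources_available())	/* hypothetical check */
			return -ENOMEM;		/* mutex released here */

		/* ... issue the IPIs ... */
		return 0;			/* ... and here */
	}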
@@ -259,6 +262,7 @@ static int membarrier_global_expedited(void)
        if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL))
                return -ENOMEM;
 
+       SERIALIZE_IPI();
        cpus_read_lock();
        rcu_read_lock();
        for_each_online_cpu(cpu) {
@@ -347,6 +351,7 @@ static int membarrier_private_expedited(int flags, int cpu_id)
        if (cpu_id < 0 && !zalloc_cpumask_var(&tmpmask, GFP_KERNEL))
                return -ENOMEM;
 
+       SERIALIZE_IPI();
        cpus_read_lock();
 
        if (cpu_id >= 0) {
@@ -460,6 +465,7 @@ static int sync_runqueues_membarrier_state(struct mm_struct *mm)
         * between threads which are users of @mm has its membarrier state
         * updated.
         */
+       SERIALIZE_IPI();
        cpus_read_lock();
        rcu_read_lock();
        for_each_online_cpu(cpu) {
index c9c57d053ce4f64d9a832f358b4e1ee837959b8b..bdca529f0f7b7aa23e377afca0bf9cc6d04a6473 100644 (file)
@@ -47,6 +47,7 @@
 #include <linux/cgroup.h>
 #include <linux/audit.h>
 #include <linux/sysctl.h>
+#include <uapi/linux/pidfd.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/signal.h>
@@ -1436,7 +1437,8 @@ void lockdep_assert_task_sighand_held(struct task_struct *task)
 #endif
 
 /*
- * send signal info to all the members of a group
+ * send signal info to all the members of a thread group or to the
+ * individual thread if type == PIDTYPE_PID.
  */
 int group_send_sig_info(int sig, struct kernel_siginfo *info,
                        struct task_struct *p, enum pid_type type)
@@ -1478,7 +1480,8 @@ int __kill_pgrp_info(int sig, struct kernel_siginfo *info, struct pid *pgrp)
        return ret;
 }
 
-int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid)
+static int kill_pid_info_type(int sig, struct kernel_siginfo *info,
+                               struct pid *pid, enum pid_type type)
 {
        int error = -ESRCH;
        struct task_struct *p;
@@ -1487,11 +1490,10 @@ int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid)
                rcu_read_lock();
                p = pid_task(pid, PIDTYPE_PID);
                if (p)
-                       error = group_send_sig_info(sig, info, p, PIDTYPE_TGID);
+                       error = group_send_sig_info(sig, info, p, type);
                rcu_read_unlock();
                if (likely(!p || error != -ESRCH))
                        return error;
-
                /*
                 * The task was unhashed in between, try again.  If it
                 * is dead, pid_task() will return NULL, if we race with
@@ -1500,6 +1502,11 @@ int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid)
        }
 }
 
+int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid)
+{
+       return kill_pid_info_type(sig, info, pid, PIDTYPE_TGID);
+}
+
 static int kill_proc_info(int sig, struct kernel_siginfo *info, pid_t pid)
 {
        int error;
@@ -1898,16 +1905,19 @@ int send_sig_fault_trapno(int sig, int code, void __user *addr, int trapno,
        return send_sig_info(info.si_signo, &info, t);
 }
 
-int kill_pgrp(struct pid *pid, int sig, int priv)
+static int kill_pgrp_info(int sig, struct kernel_siginfo *info, struct pid *pgrp)
 {
        int ret;
-
        read_lock(&tasklist_lock);
-       ret = __kill_pgrp_info(sig, __si_special(priv), pid);
+       ret = __kill_pgrp_info(sig, info, pgrp);
        read_unlock(&tasklist_lock);
-
        return ret;
 }
+
+int kill_pgrp(struct pid *pid, int sig, int priv)
+{
+       return kill_pgrp_info(sig, __si_special(priv), pid);
+}
 EXPORT_SYMBOL(kill_pgrp);
 
 int kill_pid(struct pid *pid, int sig, int priv)
@@ -2019,13 +2029,14 @@ ret:
        return ret;
 }
 
-static void do_notify_pidfd(struct task_struct *task)
+void do_notify_pidfd(struct task_struct *task)
 {
-       struct pid *pid;
+       struct pid *pid = task_pid(task);
 
        WARN_ON(task->exit_state == 0);
-       pid = task_pid(task);
-       wake_up_all(&pid->wait_pidfd);
+
+       __wake_up(&pid->wait_pidfd, TASK_NORMAL, 0,
+                       poll_to_key(EPOLLIN | EPOLLRDNORM));
 }
 
 /*
@@ -2050,9 +2061,12 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
 
        WARN_ON_ONCE(!tsk->ptrace &&
               (tsk->group_leader != tsk || !thread_group_empty(tsk)));
-
-       /* Wake up all pidfd waiters */
-       do_notify_pidfd(tsk);
+       /*
+        * If tsk is a group leader and its thread group is empty, wake
+        * up the non-PIDFD_THREAD waiters.
+        */
+       if (thread_group_empty(tsk))
+               do_notify_pidfd(tsk);
 
        if (sig != SIGCHLD) {
                /*
@@ -3789,12 +3803,13 @@ COMPAT_SYSCALL_DEFINE4(rt_sigtimedwait_time32, compat_sigset_t __user *, uthese,
 #endif
 #endif
 
-static inline void prepare_kill_siginfo(int sig, struct kernel_siginfo *info)
+static void prepare_kill_siginfo(int sig, struct kernel_siginfo *info,
+                                enum pid_type type)
 {
        clear_siginfo(info);
        info->si_signo = sig;
        info->si_errno = 0;
-       info->si_code = SI_USER;
+       info->si_code = (type == PIDTYPE_PID) ? SI_TKILL : SI_USER;
        info->si_pid = task_tgid_vnr(current);
        info->si_uid = from_kuid_munged(current_user_ns(), current_uid());
 }
@@ -3808,7 +3823,7 @@ SYSCALL_DEFINE2(kill, pid_t, pid, int, sig)
 {
        struct kernel_siginfo info;
 
-       prepare_kill_siginfo(sig, &info);
+       prepare_kill_siginfo(sig, &info, PIDTYPE_TGID);
 
        return kill_something_info(sig, &info, pid);
 }
@@ -3861,6 +3876,10 @@ static struct pid *pidfd_to_pid(const struct file *file)
        return tgid_pidfd_to_pid(file);
 }
 
+#define PIDFD_SEND_SIGNAL_FLAGS                            \
+       (PIDFD_SIGNAL_THREAD | PIDFD_SIGNAL_THREAD_GROUP | \
+        PIDFD_SIGNAL_PROCESS_GROUP)
+
 /**
  * sys_pidfd_send_signal - Signal a process through a pidfd
  * @pidfd:  file descriptor of the process
@@ -3868,14 +3887,10 @@ static struct pid *pidfd_to_pid(const struct file *file)
  * @info:   signal info
  * @flags:  future flags
  *
- * The syscall currently only signals via PIDTYPE_PID which covers
- * kill(<positive-pid>, <signal>. It does not signal threads or process
- * groups.
- * In order to extend the syscall to threads and process groups the @flags
- * argument should be used. In essence, the @flags argument will determine
- * what is signaled and not the file descriptor itself. Put in other words,
- * grouping is a property of the flags argument not a property of the file
- * descriptor.
+ * Send the signal to the thread group or to the individual thread,
+ * depending on whether PIDFD_THREAD was set when the pidfd was opened.
+ * In the future, an extension to @flags may be used to override the
+ * default scope of @pidfd.
  *
  * Return: 0 on success, negative errno on failure
  */
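
A minimal userspace sketch of the extended call; the PIDFD_SIGNAL_THREAD_GROUP value below is the one this series is expected to put in <uapi/linux/pidfd.h> and is defined as a fallback in case installed headers predate it.

    #include <signal.h>
    #include <stdio.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    #ifndef PIDFD_SIGNAL_THREAD_GROUP
    #define PIDFD_SIGNAL_THREAD_GROUP (1UL << 1)   /* assumed value */
    #endif

    int main(void)
    {
        pid_t pid = fork();
        if (pid == 0) {
            pause();                    /* child: wait for the signal */
            _exit(0);
        }

        int pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
        if (pidfd < 0) {
            perror("pidfd_open");
            return 1;
        }

        /* Explicit scope; flags == 0 would infer it from the pidfd. */
        if (syscall(SYS_pidfd_send_signal, pidfd, SIGTERM, NULL,
                    PIDFD_SIGNAL_THREAD_GROUP) < 0)
            perror("pidfd_send_signal");

        close(pidfd);
        return 0;
    }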
@@ -3886,9 +3901,14 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
        struct fd f;
        struct pid *pid;
        kernel_siginfo_t kinfo;
+       enum pid_type type;
 
        /* Reject any flags outside the supported scope-selection set. */
-       if (flags)
+       if (flags & ~PIDFD_SEND_SIGNAL_FLAGS)
+               return -EINVAL;
+
+       /* Ensure that at most one signal-scope-determining flag is set. */
+       if (hweight32(flags & PIDFD_SEND_SIGNAL_FLAGS) > 1)
                return -EINVAL;
 
        f = fdget(pidfd);
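
hweight32() is the kernel's 32-bit population count, so the check above rejects any combination with more than one scope bit set. The same test in a standalone sketch, with the compiler builtin standing in for hweight32():

    #include <assert.h>
    #include <stdint.h>

    /* True when at most one bit is set, i.e. zero or a power of two. */
    static int at_most_one_flag(uint32_t flags)
    {
        /* equivalent branch-free form: (flags & (flags - 1)) == 0 */
        return __builtin_popcount(flags) <= 1;
    }

    int main(void)
    {
        assert(at_most_one_flag(0));
        assert(at_most_one_flag(1u << 2));
        assert(!at_most_one_flag((1u << 0) | (1u << 1)));
        return 0;
    }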
@@ -3906,6 +3926,25 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
        if (!access_pidfd_pidns(pid))
                goto err;
 
+       switch (flags) {
+       case 0:
+               /* Infer scope from the type of pidfd. */
+               if (f.file->f_flags & PIDFD_THREAD)
+                       type = PIDTYPE_PID;
+               else
+                       type = PIDTYPE_TGID;
+               break;
+       case PIDFD_SIGNAL_THREAD:
+               type = PIDTYPE_PID;
+               break;
+       case PIDFD_SIGNAL_THREAD_GROUP:
+               type = PIDTYPE_TGID;
+               break;
+       case PIDFD_SIGNAL_PROCESS_GROUP:
+               type = PIDTYPE_PGID;
+               break;
+       }
+
        if (info) {
                ret = copy_siginfo_from_user_any(&kinfo, info);
                if (unlikely(ret))
@@ -3917,15 +3956,17 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
 
                /* Only allow sending arbitrary signals to yourself. */
                ret = -EPERM;
-               if ((task_pid(current) != pid) &&
+               if ((task_pid(current) != pid || type > PIDTYPE_TGID) &&
                    (kinfo.si_code >= 0 || kinfo.si_code == SI_TKILL))
                        goto err;
        } else {
-               prepare_kill_siginfo(sig, &kinfo);
+               prepare_kill_siginfo(sig, &kinfo, type);
        }
 
-       ret = kill_pid_info(sig, &kinfo, pid);
-
+       if (type == PIDTYPE_PGID)
+               ret = kill_pgrp_info(sig, &kinfo, pid);
+       else
+               ret = kill_pid_info_type(sig, &kinfo, pid, type);
 err:
        fdput(f);
        return ret;
@@ -3965,12 +4006,7 @@ static int do_tkill(pid_t tgid, pid_t pid, int sig)
 {
        struct kernel_siginfo info;
 
-       clear_siginfo(&info);
-       info.si_signo = sig;
-       info.si_errno = 0;
-       info.si_code = SI_TKILL;
-       info.si_pid = task_tgid_vnr(current);
-       info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
+       prepare_kill_siginfo(sig, &info, PIDTYPE_PID);
 
        return do_send_specific(tgid, pid, sig, &info);
 }
index e219fcfa112d863eeef58381d04fd4bab16a1e32..f8e543f1e38a06dc3a4aa2f777c7e88d444e5565 100644 (file)
@@ -1785,21 +1785,24 @@ void getrusage(struct task_struct *p, int who, struct rusage *r)
        struct task_struct *t;
        unsigned long flags;
        u64 tgutime, tgstime, utime, stime;
-       unsigned long maxrss = 0;
+       unsigned long maxrss;
+       struct mm_struct *mm;
        struct signal_struct *sig = p->signal;
+       unsigned int seq = 0;
 
-       memset((char *)r, 0, sizeof (*r));
+retry:
+       memset(r, 0, sizeof(*r));
        utime = stime = 0;
+       maxrss = 0;
 
        if (who == RUSAGE_THREAD) {
                task_cputime_adjusted(current, &utime, &stime);
                accumulate_thread_rusage(p, r);
                maxrss = sig->maxrss;
-               goto out;
+               goto out_thread;
        }
 
-       if (!lock_task_sighand(p, &flags))
-               return;
+       flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq);
 
        switch (who) {
        case RUSAGE_BOTH:
@@ -1819,9 +1822,6 @@ void getrusage(struct task_struct *p, int who, struct rusage *r)
                fallthrough;
 
        case RUSAGE_SELF:
-               thread_group_cputime_adjusted(p, &tgutime, &tgstime);
-               utime += tgutime;
-               stime += tgstime;
                r->ru_nvcsw += sig->nvcsw;
                r->ru_nivcsw += sig->nivcsw;
                r->ru_minflt += sig->min_flt;
@@ -1830,28 +1830,42 @@ void getrusage(struct task_struct *p, int who, struct rusage *r)
                r->ru_oublock += sig->oublock;
                if (maxrss < sig->maxrss)
                        maxrss = sig->maxrss;
+
+               rcu_read_lock();
                __for_each_thread(sig, t)
                        accumulate_thread_rusage(t, r);
+               rcu_read_unlock();
+
                break;
 
        default:
                BUG();
        }
-       unlock_task_sighand(p, &flags);
 
-out:
-       r->ru_utime = ns_to_kernel_old_timeval(utime);
-       r->ru_stime = ns_to_kernel_old_timeval(stime);
+       if (need_seqretry(&sig->stats_lock, seq)) {
+               seq = 1;
+               goto retry;
+       }
+       done_seqretry_irqrestore(&sig->stats_lock, seq, flags);
 
-       if (who != RUSAGE_CHILDREN) {
-               struct mm_struct *mm = get_task_mm(p);
+       if (who == RUSAGE_CHILDREN)
+               goto out_children;
 
-               if (mm) {
-                       setmax_mm_hiwater_rss(&maxrss, mm);
-                       mmput(mm);
-               }
+       thread_group_cputime_adjusted(p, &tgutime, &tgstime);
+       utime += tgutime;
+       stime += tgstime;
+
+out_thread:
+       mm = get_task_mm(p);
+       if (mm) {
+               setmax_mm_hiwater_rss(&maxrss, mm);
+               mmput(mm);
        }
+
+out_children:
        r->ru_maxrss = maxrss * (PAGE_SIZE / 1024); /* convert pages to KBs */
+       r->ru_utime = ns_to_kernel_old_timeval(utime);
+       r->ru_stime = ns_to_kernel_old_timeval(stime);
 }
 
 SYSCALL_DEFINE2(getrusage, int, who, struct rusage __user *, ru)
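
The conversion above trades the sighand lock for the lockless-first reader idiom of sig->stats_lock. A condensed sketch of that idiom, assuming the standard <linux/seqlock.h> API:

    unsigned int seq = 0;   /* even: lockless pass; odd: locked pass */
    unsigned long flags;

    retry:
        flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq);
        /* ... snapshot the signal_struct statistics ... */
        if (need_seqretry(&sig->stats_lock, seq)) {
            seq = 1;        /* a writer raced: retry with the lock held */
            goto retry;
        }
        done_seqretry_irqrestore(&sig->stats_lock, seq, flags);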
index c108ed8a9804ada919575c97b42dd663e33c1a16..3052b1f1168e29c4432ba3b068488af11029018d 100644 (file)
@@ -99,6 +99,7 @@ static u64 suspend_start;
  * Interval: 0.5sec.
  */
 #define WATCHDOG_INTERVAL (HZ >> 1)
+#define WATCHDOG_INTERVAL_MAX_NS ((2 * WATCHDOG_INTERVAL) * (NSEC_PER_SEC / HZ))
 
 /*
  * Threshold: 0.0312s, when doubled: 0.0625s.
@@ -134,6 +135,7 @@ static DECLARE_WORK(watchdog_work, clocksource_watchdog_work);
 static DEFINE_SPINLOCK(watchdog_lock);
 static int watchdog_running;
 static atomic_t watchdog_reset_pending;
+static int64_t watchdog_max_interval;
 
 static inline void clocksource_watchdog_lock(unsigned long *flags)
 {
@@ -399,8 +401,8 @@ static inline void clocksource_reset_watchdog(void)
 static void clocksource_watchdog(struct timer_list *unused)
 {
        u64 csnow, wdnow, cslast, wdlast, delta;
+       int64_t wd_nsec, cs_nsec, interval;
        int next_cpu, reset_pending;
-       int64_t wd_nsec, cs_nsec;
        struct clocksource *cs;
        enum wd_read_status read_ret;
        unsigned long extra_wait = 0;
@@ -470,6 +472,27 @@ static void clocksource_watchdog(struct timer_list *unused)
                if (atomic_read(&watchdog_reset_pending))
                        continue;
 
+               /*
+                * The processing of timer softirqs can get delayed (usually
+                * on account of ksoftirqd not getting to run in a timely
+                * manner), which causes the watchdog interval to stretch.
+                * Skew detection may fail for longer watchdog intervals
+                * on account of fixed margins being used.
+                * Some clocksources, e.g. acpi_pm, cannot tolerate
+                * watchdog intervals longer than a few seconds.
+                */
+               interval = max(cs_nsec, wd_nsec);
+               if (unlikely(interval > WATCHDOG_INTERVAL_MAX_NS)) {
+                       if (system_state > SYSTEM_SCHEDULING &&
+                           interval > 2 * watchdog_max_interval) {
+                               watchdog_max_interval = interval;
+                               pr_warn("Long readout interval, skipping watchdog check: cs_nsec: %lld wd_nsec: %lld\n",
+                                       cs_nsec, wd_nsec);
+                       }
+                       watchdog_timer.expires = jiffies;
+                       continue;
+               }
+
                /* Check the deviation from the watchdog clocksource. */
                md = cs->uncertainty_margin + watchdog->uncertainty_margin;
                if (abs(cs_nsec - wd_nsec) > md) {
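
The new threshold works out to roughly one second independent of HZ, because the HZ factors cancel: 2 * (HZ / 2) * (NSEC_PER_SEC / HZ) = NSEC_PER_SEC, modulo integer division. A small standalone check of that arithmetic:

    #include <stdio.h>

    int main(void)
    {
        const long long NSEC_PER_SEC = 1000000000LL;
        const long long hz_values[] = { 100, 250, 300, 1000 };

        for (int i = 0; i < 4; i++) {
            long long hz = hz_values[i];
            long long interval = hz >> 1;              /* WATCHDOG_INTERVAL */
            long long max_ns = 2 * interval * (NSEC_PER_SEC / hz);
            printf("HZ=%-4lld -> %lld ns\n", hz, max_ns);
        }
        return 0;   /* 1000000000 everywhere except 999999900 at HZ=300 */
    }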
index 760793998cdd703a387c64a792a7b7f7dab552d5..edb0f821dceaa1720ac94fc53f4002a1e5f7bdd3 100644 (file)
@@ -1085,6 +1085,7 @@ static int enqueue_hrtimer(struct hrtimer *timer,
                           enum hrtimer_mode mode)
 {
        debug_activate(timer, mode);
+       WARN_ON_ONCE(!base->cpu_base->online);
 
        base->cpu_base->active_bases |= 1 << base->index;
 
@@ -2183,6 +2184,7 @@ int hrtimers_prepare_cpu(unsigned int cpu)
        cpu_base->softirq_next_timer = NULL;
        cpu_base->expires_next = KTIME_MAX;
        cpu_base->softirq_expires_next = KTIME_MAX;
+       cpu_base->online = 1;
        hrtimer_cpu_base_init_expiry_lock(cpu_base);
        return 0;
 }
@@ -2250,6 +2252,7 @@ int hrtimers_cpu_dying(unsigned int dying_cpu)
        smp_call_function_single(ncpu, retrigger_next_event, NULL, 0);
 
        raw_spin_unlock(&new_base->lock);
+       old_base->online = 0;
        raw_spin_unlock(&old_base->lock);
 
        return 0;
index d2501673028da5e627aa08b51a103c181295cad3..01fb50c1b17e4f1b33285ae2ce2690f0747f8ee8 100644 (file)
@@ -1577,6 +1577,7 @@ void tick_cancel_sched_timer(int cpu)
 {
        struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
        ktime_t idle_sleeptime, iowait_sleeptime;
+       unsigned long idle_calls, idle_sleeps;
 
 # ifdef CONFIG_HIGH_RES_TIMERS
        if (ts->sched_timer.base)
@@ -1585,9 +1586,13 @@ void tick_cancel_sched_timer(int cpu)
 
        idle_sleeptime = ts->idle_sleeptime;
        iowait_sleeptime = ts->iowait_sleeptime;
+       idle_calls = ts->idle_calls;
+       idle_sleeps = ts->idle_sleeps;
        memset(ts, 0, sizeof(*ts));
        ts->idle_sleeptime = idle_sleeptime;
        ts->iowait_sleeptime = iowait_sleeptime;
+       ts->idle_calls = idle_calls;
+       ts->idle_sleeps = idle_sleeps;
 }
 #endif
 
index ca058c8af6bafdf2581733e8e4d5f7b6aa1dc88c..3e5d422dd15cbf32e0fec3a44d100c35671fa202 100644 (file)
@@ -73,7 +73,7 @@ static void time64_to_tm_test_date_range(struct kunit *test)
 
                days = div_s64(secs, 86400);
 
-               #define FAIL_MSG "%05ld/%02d/%02d (%2d) : %ld", \
+               #define FAIL_MSG "%05ld/%02d/%02d (%2d) : %lld", \
                        year, month, mdday, yday, days
 
                KUNIT_ASSERT_EQ_MSG(test, year - 1900, result.tm_year, FAIL_MSG);
index 6cd2a4e3afb8fb6045dbf27543a67147dea20900..9ff0182458408438ddc5d7c720d866d0adc02dfd 100644 (file)
@@ -189,9 +189,6 @@ static int fprobe_init_rethook(struct fprobe *fp, int num)
 {
        int size;
 
-       if (num <= 0)
-               return -EINVAL;
-
        if (!fp->exit_handler) {
                fp->rethook = NULL;
                return 0;
@@ -199,15 +196,16 @@ static int fprobe_init_rethook(struct fprobe *fp, int num)
 
        /* Initialize rethook if needed */
        if (fp->nr_maxactive)
-               size = fp->nr_maxactive;
+               num = fp->nr_maxactive;
        else
-               size = num * num_possible_cpus() * 2;
-       if (size <= 0)
+               num *= num_possible_cpus() * 2;
+       if (num <= 0)
                return -EINVAL;
 
+       size = sizeof(struct fprobe_rethook_node) + fp->entry_data_size;
+
        /* Initialize rethook */
-       fp->rethook = rethook_alloc((void *)fp, fprobe_exit_handler,
-                               sizeof(struct fprobe_rethook_node), size);
+       fp->rethook = rethook_alloc((void *)fp, fprobe_exit_handler, size, num);
        if (IS_ERR(fp->rethook))
                return PTR_ERR(fp->rethook);
 
index b01ae7d36021819e6d929ce2ab1e0a5a61464309..83ba342aef31f7d919f4c24a73d2c4cea76be137 100644 (file)
@@ -5325,7 +5325,17 @@ static LIST_HEAD(ftrace_direct_funcs);
 
 static int register_ftrace_function_nolock(struct ftrace_ops *ops);
 
+/*
+ * If there are multiple ftrace_ops, use SAVE_REGS by default, so that the
+ * direct call will be jumped to from ftrace_regs_caller. Only if the
+ * architecture does not support ftrace_regs_caller but does support
+ * direct_call, use SAVE_ARGS so that the jump is made from ftrace_caller
+ * for multiple ftrace_ops.
+ */
+#ifndef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS
 #define MULTI_FLAGS (FTRACE_OPS_FL_DIRECT | FTRACE_OPS_FL_SAVE_ARGS)
+#else
+#define MULTI_FLAGS (FTRACE_OPS_FL_DIRECT | FTRACE_OPS_FL_SAVE_REGS)
+#endif
 
 static int check_direct_multi(struct ftrace_ops *ops)
 {
index 13aaf5e85b811b72f60b355f833a680342a41c7f..aa332ace108b18169b3c67198a4fc0b621670b00 100644 (file)
@@ -384,7 +384,6 @@ struct rb_irq_work {
        struct irq_work                 work;
        wait_queue_head_t               waiters;
        wait_queue_head_t               full_waiters;
-       long                            wait_index;
        bool                            waiters_pending;
        bool                            full_waiters_pending;
        bool                            wakeup_full;
@@ -756,8 +755,19 @@ static void rb_wake_up_waiters(struct irq_work *work)
 
        wake_up_all(&rbwork->waiters);
        if (rbwork->full_waiters_pending || rbwork->wakeup_full) {
+               /* Only cpu_buffer sets the above flags */
+               struct ring_buffer_per_cpu *cpu_buffer =
+                       container_of(rbwork, struct ring_buffer_per_cpu, irq_work);
+
+               /* Called from interrupt context */
+               raw_spin_lock(&cpu_buffer->reader_lock);
                rbwork->wakeup_full = false;
                rbwork->full_waiters_pending = false;
+
+               /* Wake up all waiters; they will reset shortest_full */
+               cpu_buffer->shortest_full = 0;
+               raw_spin_unlock(&cpu_buffer->reader_lock);
+
                wake_up_all(&rbwork->full_waiters);
        }
 }
@@ -798,14 +808,40 @@ void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu)
                rbwork = &cpu_buffer->irq_work;
        }
 
-       rbwork->wait_index++;
-       /* make sure the waiters see the new index */
-       smp_wmb();
-
        /* This can be called in any context */
        irq_work_queue(&rbwork->work);
 }
 
+static bool rb_watermark_hit(struct trace_buffer *buffer, int cpu, int full)
+{
+       struct ring_buffer_per_cpu *cpu_buffer;
+       bool ret = false;
+
+       /* Reads of all CPUs always wait for any data */
+       if (cpu == RING_BUFFER_ALL_CPUS)
+               return !ring_buffer_empty(buffer);
+
+       cpu_buffer = buffer->buffers[cpu];
+
+       if (!ring_buffer_empty_cpu(buffer, cpu)) {
+               unsigned long flags;
+               bool pagebusy;
+
+               if (!full)
+                       return true;
+
+               raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+               pagebusy = cpu_buffer->reader_page == cpu_buffer->commit_page;
+               ret = !pagebusy && full_hit(buffer, cpu, full);
+
+               if (!cpu_buffer->shortest_full ||
+                   cpu_buffer->shortest_full > full)
+                       cpu_buffer->shortest_full = full;
+               raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
+       }
+       return ret;
+}
+
 /**
  * ring_buffer_wait - wait for input to the ring buffer
  * @buffer: buffer to wait on
@@ -821,7 +857,6 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
        struct ring_buffer_per_cpu *cpu_buffer;
        DEFINE_WAIT(wait);
        struct rb_irq_work *work;
-       long wait_index;
        int ret = 0;
 
        /*
@@ -840,81 +875,54 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
                work = &cpu_buffer->irq_work;
        }
 
-       wait_index = READ_ONCE(work->wait_index);
-
-       while (true) {
-               if (full)
-                       prepare_to_wait(&work->full_waiters, &wait, TASK_INTERRUPTIBLE);
-               else
-                       prepare_to_wait(&work->waiters, &wait, TASK_INTERRUPTIBLE);
-
-               /*
-                * The events can happen in critical sections where
-                * checking a work queue can cause deadlocks.
-                * After adding a task to the queue, this flag is set
-                * only to notify events to try to wake up the queue
-                * using irq_work.
-                *
-                * We don't clear it even if the buffer is no longer
-                * empty. The flag only causes the next event to run
-                * irq_work to do the work queue wake up. The worse
-                * that can happen if we race with !trace_empty() is that
-                * an event will cause an irq_work to try to wake up
-                * an empty queue.
-                *
-                * There's no reason to protect this flag either, as
-                * the work queue and irq_work logic will do the necessary
-                * synchronization for the wake ups. The only thing
-                * that is necessary is that the wake up happens after
-                * a task has been queued. It's OK for spurious wake ups.
-                */
-               if (full)
-                       work->full_waiters_pending = true;
-               else
-                       work->waiters_pending = true;
-
-               if (signal_pending(current)) {
-                       ret = -EINTR;
-                       break;
-               }
-
-               if (cpu == RING_BUFFER_ALL_CPUS && !ring_buffer_empty(buffer))
-                       break;
-
-               if (cpu != RING_BUFFER_ALL_CPUS &&
-                   !ring_buffer_empty_cpu(buffer, cpu)) {
-                       unsigned long flags;
-                       bool pagebusy;
-                       bool done;
-
-                       if (!full)
-                               break;
-
-                       raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
-                       pagebusy = cpu_buffer->reader_page == cpu_buffer->commit_page;
-                       done = !pagebusy && full_hit(buffer, cpu, full);
+       if (full)
+               prepare_to_wait(&work->full_waiters, &wait, TASK_INTERRUPTIBLE);
+       else
+               prepare_to_wait(&work->waiters, &wait, TASK_INTERRUPTIBLE);
 
-                       if (!cpu_buffer->shortest_full ||
-                           cpu_buffer->shortest_full > full)
-                               cpu_buffer->shortest_full = full;
-                       raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
-                       if (done)
-                               break;
-               }
+       /*
+        * The events can happen in critical sections where
+        * checking a work queue can cause deadlocks.
+        * After adding a task to the queue, this flag is set
+        * only to notify events to try to wake up the queue
+        * using irq_work.
+        *
+        * We don't clear it even if the buffer is no longer
+        * empty. The flag only causes the next event to run
+        * irq_work to do the work queue wake up. The worse
+        * that can happen if we race with !trace_empty() is that
+        * an event will cause an irq_work to try to wake up
+        * an empty queue.
+        *
+        * There's no reason to protect this flag either, as
+        * the work queue and irq_work logic will do the necessary
+        * synchronization for the wake ups. The only thing
+        * that is necessary is that the wake up happens after
+        * a task has been queued. It's OK for spurious wake ups.
+        */
+       if (full)
+               work->full_waiters_pending = true;
+       else
+               work->waiters_pending = true;
 
-               schedule();
+       if (rb_watermark_hit(buffer, cpu, full))
+               goto out;
 
-               /* Make sure to see the new wait index */
-               smp_rmb();
-               if (wait_index != work->wait_index)
-                       break;
+       if (signal_pending(current)) {
+               ret = -EINTR;
+               goto out;
        }
 
+       schedule();
+ out:
        if (full)
                finish_wait(&work->full_waiters, &wait);
        else
                finish_wait(&work->waiters, &wait);
 
+       if (!ret && !rb_watermark_hit(buffer, cpu, full) && signal_pending(current))
+               ret = -EINTR;
+
        return ret;
 }
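
The single pass above is the classic prepare_to_wait() pattern: register on the wait queue, publish the pending flag, and only then test the condition before sleeping, so a wakeup racing with the check still finds the task queued. A condensed sketch under those assumptions:

    DEFINE_WAIT(wait);

    prepare_to_wait(&work->waiters, &wait, TASK_INTERRUPTIBLE);
    work->waiters_pending = true;    /* producers may queue irq_work now */

    if (!rb_watermark_hit(buffer, cpu, full) && !signal_pending(current))
        schedule();                  /* woken by rb_wake_up_waiters() */

    finish_wait(&work->waiters, &wait);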
 
@@ -937,28 +945,33 @@ __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
                          struct file *filp, poll_table *poll_table, int full)
 {
        struct ring_buffer_per_cpu *cpu_buffer;
-       struct rb_irq_work *work;
+       struct rb_irq_work *rbwork;
 
        if (cpu == RING_BUFFER_ALL_CPUS) {
-               work = &buffer->irq_work;
+               rbwork = &buffer->irq_work;
                full = 0;
        } else {
                if (!cpumask_test_cpu(cpu, buffer->cpumask))
-                       return -EINVAL;
+                       return EPOLLERR;
 
                cpu_buffer = buffer->buffers[cpu];
-               work = &cpu_buffer->irq_work;
+               rbwork = &cpu_buffer->irq_work;
        }
 
        if (full) {
-               poll_wait(filp, &work->full_waiters, poll_table);
-               work->full_waiters_pending = true;
+               unsigned long flags;
+
+               poll_wait(filp, &rbwork->full_waiters, poll_table);
+
+               raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+               rbwork->full_waiters_pending = true;
                if (!cpu_buffer->shortest_full ||
                    cpu_buffer->shortest_full > full)
                        cpu_buffer->shortest_full = full;
+               raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
        } else {
-               poll_wait(filp, &work->waiters, poll_table);
-               work->waiters_pending = true;
+               poll_wait(filp, &rbwork->waiters, poll_table);
+               rbwork->waiters_pending = true;
        }
 
        /*
@@ -5877,6 +5890,10 @@ int ring_buffer_subbuf_order_set(struct trace_buffer *buffer, int order)
        if (psize <= BUF_PAGE_HDR_SIZE)
                return -EINVAL;
 
+       /* Size of a subbuf cannot be greater than the write counter */
+       if (psize > RB_WRITE_MASK + 1)
+               return -EINVAL;
+
        old_order = buffer->subbuf_order;
        old_size = buffer->subbuf_size;
 
index 2a7c6fd934e9cb391b5ddf589748c572810ad6d1..c9c8983073485bb3645062760ffc4b0dcef94e9d 100644 (file)
@@ -39,6 +39,7 @@
 #include <linux/ctype.h>
 #include <linux/init.h>
 #include <linux/panic_notifier.h>
+#include <linux/kmemleak.h>
 #include <linux/poll.h>
 #include <linux/nmi.h>
 #include <linux/fs.h>
@@ -1532,7 +1533,7 @@ void disable_trace_on_warning(void)
 bool tracer_tracing_is_on(struct trace_array *tr)
 {
        if (tr->array_buffer.buffer)
-               return ring_buffer_record_is_on(tr->array_buffer.buffer);
+               return ring_buffer_record_is_set_on(tr->array_buffer.buffer);
        return !tr->buffer_disabled;
 }
 
@@ -2320,7 +2321,7 @@ struct saved_cmdlines_buffer {
        unsigned *map_cmdline_to_pid;
        unsigned cmdline_num;
        int cmdline_idx;
-       char *saved_cmdlines;
+       char saved_cmdlines[];
 };
 static struct saved_cmdlines_buffer *savedcmd;
 
@@ -2334,47 +2335,60 @@ static inline void set_cmdline(int idx, const char *cmdline)
        strncpy(get_saved_cmdlines(idx), cmdline, TASK_COMM_LEN);
 }
 
-static int allocate_cmdlines_buffer(unsigned int val,
-                                   struct saved_cmdlines_buffer *s)
+static void free_saved_cmdlines_buffer(struct saved_cmdlines_buffer *s)
+{
+       int order = get_order(sizeof(*s) + s->cmdline_num * TASK_COMM_LEN);
+
+       kfree(s->map_cmdline_to_pid);
+       kmemleak_free(s);
+       free_pages((unsigned long)s, order);
+}
+
+static struct saved_cmdlines_buffer *allocate_cmdlines_buffer(unsigned int val)
 {
+       struct saved_cmdlines_buffer *s;
+       struct page *page;
+       int orig_size, size;
+       int order;
+
+       /* Figure out how much is needed to hold the given number of cmdlines */
+       orig_size = sizeof(*s) + val * TASK_COMM_LEN;
+       order = get_order(orig_size);
+       size = 1 << (order + PAGE_SHIFT);
+       page = alloc_pages(GFP_KERNEL, order);
+       if (!page)
+               return NULL;
+
+       s = page_address(page);
+       kmemleak_alloc(s, size, 1, GFP_KERNEL);
+       memset(s, 0, sizeof(*s));
+
+       /* Round up to actual allocation */
+       val = (size - sizeof(*s)) / TASK_COMM_LEN;
+       s->cmdline_num = val;
+
        s->map_cmdline_to_pid = kmalloc_array(val,
                                              sizeof(*s->map_cmdline_to_pid),
                                              GFP_KERNEL);
-       if (!s->map_cmdline_to_pid)
-               return -ENOMEM;
-
-       s->saved_cmdlines = kmalloc_array(TASK_COMM_LEN, val, GFP_KERNEL);
-       if (!s->saved_cmdlines) {
-               kfree(s->map_cmdline_to_pid);
-               return -ENOMEM;
+       if (!s->map_cmdline_to_pid) {
+               free_saved_cmdlines_buffer(s);
+               return NULL;
        }
 
        s->cmdline_idx = 0;
-       s->cmdline_num = val;
        memset(&s->map_pid_to_cmdline, NO_CMDLINE_MAP,
               sizeof(s->map_pid_to_cmdline));
        memset(s->map_cmdline_to_pid, NO_CMDLINE_MAP,
               val * sizeof(*s->map_cmdline_to_pid));
 
-       return 0;
+       return s;
 }
 
 static int trace_create_savedcmd(void)
 {
-       int ret;
-
-       savedcmd = kmalloc(sizeof(*savedcmd), GFP_KERNEL);
-       if (!savedcmd)
-               return -ENOMEM;
+       savedcmd = allocate_cmdlines_buffer(SAVED_CMDLINES_DEFAULT);
 
-       ret = allocate_cmdlines_buffer(SAVED_CMDLINES_DEFAULT, savedcmd);
-       if (ret < 0) {
-               kfree(savedcmd);
-               savedcmd = NULL;
-               return -ENOMEM;
-       }
-
-       return 0;
+       return savedcmd ? 0 : -ENOMEM;
 }
 
 int is_tracing_stopped(void)
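
Because the cmdline strings now live in a flexible array at the tail of the page allocation, the requested count is rounded up to whatever the pages actually hold. A standalone sketch of the arithmetic with assumed values (PAGE_SHIFT 12, TASK_COMM_LEN 16, a hypothetical 40-byte header, and the 128-entry default):

    #include <stdio.h>

    #define PAGE_SHIFT 12
    #define TASK_COMM_LEN 16

    /* Smallest order whose 2^order pages cover size, like get_order(). */
    static int get_order_sim(unsigned long size)
    {
        int order = 0;
        while ((1UL << (order + PAGE_SHIFT)) < size)
            order++;
        return order;
    }

    int main(void)
    {
        unsigned long header = 40;   /* assumed sizeof(*s) */
        unsigned int val = 128;      /* SAVED_CMDLINES_DEFAULT, assumed */
        unsigned long orig = header + val * TASK_COMM_LEN;   /* 2088 */
        int order = get_order_sim(orig);                     /* 0 */
        unsigned long size = 1UL << (order + PAGE_SHIFT);    /* 4096 */

        printf("asked for %u cmdlines, allocation holds %lu\n",
               val, (size - header) / TASK_COMM_LEN);        /* 253 */
        return 0;
    }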
@@ -6056,26 +6070,14 @@ tracing_saved_cmdlines_size_read(struct file *filp, char __user *ubuf,
        return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
 }
 
-static void free_saved_cmdlines_buffer(struct saved_cmdlines_buffer *s)
-{
-       kfree(s->saved_cmdlines);
-       kfree(s->map_cmdline_to_pid);
-       kfree(s);
-}
-
 static int tracing_resize_saved_cmdlines(unsigned int val)
 {
        struct saved_cmdlines_buffer *s, *savedcmd_temp;
 
-       s = kmalloc(sizeof(*s), GFP_KERNEL);
+       s = allocate_cmdlines_buffer(val);
        if (!s)
                return -ENOMEM;
 
-       if (allocate_cmdlines_buffer(val, s) < 0) {
-               kfree(s);
-               return -ENOMEM;
-       }
-
        preempt_disable();
        arch_spin_lock(&trace_cmdline_lock);
        savedcmd_temp = savedcmd;
@@ -7291,6 +7293,8 @@ tracing_free_buffer_release(struct inode *inode, struct file *filp)
        return 0;
 }
 
+#define TRACE_MARKER_MAX_SIZE          4096
+
 static ssize_t
 tracing_mark_write(struct file *filp, const char __user *ubuf,
                                        size_t cnt, loff_t *fpos)
@@ -7318,6 +7322,9 @@ tracing_mark_write(struct file *filp, const char __user *ubuf,
        if ((ssize_t)cnt < 0)
                return -EINVAL;
 
+       if (cnt > TRACE_MARKER_MAX_SIZE)
+               cnt = TRACE_MARKER_MAX_SIZE;
+
        meta_size = sizeof(*entry) + 2;  /* add '\0' and possible '\n' */
  again:
        size = cnt + meta_size;
@@ -7326,11 +7333,6 @@ tracing_mark_write(struct file *filp, const char __user *ubuf,
        if (cnt < FAULTED_SIZE)
                size += FAULTED_SIZE - cnt;
 
-       if (size > TRACE_SEQ_BUFFER_SIZE) {
-               cnt -= size - TRACE_SEQ_BUFFER_SIZE;
-               goto again;
-       }
-
        buffer = tr->array_buffer.buffer;
        event = __trace_buffer_lock_reserve(buffer, TRACE_PRINT, size,
                                            tracing_gen_ctx());
@@ -8391,6 +8393,20 @@ tracing_buffers_read(struct file *filp, char __user *ubuf,
        return size;
 }
 
+static int tracing_buffers_flush(struct file *file, fl_owner_t id)
+{
+       struct ftrace_buffer_info *info = file->private_data;
+       struct trace_iterator *iter = &info->iter;
+
+       iter->wait_index++;
+       /* Make sure the waiters see the new wait_index */
+       smp_wmb();
+
+       ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
+
+       return 0;
+}
+
 static int tracing_buffers_release(struct inode *inode, struct file *file)
 {
        struct ftrace_buffer_info *info = file->private_data;
@@ -8402,12 +8418,6 @@ static int tracing_buffers_release(struct inode *inode, struct file *file)
 
        __trace_array_put(iter->tr);
 
-       iter->wait_index++;
-       /* Make sure the waiters see the new wait_index */
-       smp_wmb();
-
-       ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
-
        if (info->spare)
                ring_buffer_free_read_page(iter->array_buffer->buffer,
                                           info->spare_cpu, info->spare);
@@ -8623,6 +8633,7 @@ static const struct file_operations tracing_buffers_fops = {
        .read           = tracing_buffers_read,
        .poll           = tracing_buffers_poll,
        .release        = tracing_buffers_release,
+       .flush          = tracing_buffers_flush,
        .splice_read    = tracing_buffers_splice_read,
        .unlocked_ioctl = tracing_buffers_ioctl,
        .llseek         = no_llseek,
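
Moving the wakeup from ->release() to ->flush() matters because the VFS calls ->flush() on every close() of a descriptor, while ->release() only runs once the last reference to the struct file is dropped; a reader blocked on a duplicated descriptor would otherwise sleep on. A userspace sketch of the distinction (path assumed):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/sys/kernel/tracing/per_cpu/cpu0/trace_pipe_raw",
                      O_RDONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        int dupfd = dup(fd);   /* second reference to the same file */

        close(fd);      /* ->flush() runs here: blocked readers wake */
        close(dupfd);   /* last reference: ->flush(), then ->release() */
        return 0;
    }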
index ca224d53bfdcd0df9f6609d3f30a4a6054868136..5bbdbcbbde3cd281f82f3cbc79f72f7a0c3e67ee 100644 (file)
@@ -91,8 +91,8 @@ retry:
        for_each_member(i, type, member) {
                if (!member->name_off) {
                        /* Anonymous union/struct: push it for later use */
-                       type = btf_type_skip_modifiers(btf, member->type, &tid);
-                       if (type && top < BTF_ANON_STACK_MAX) {
+                       if (btf_type_skip_modifiers(btf, member->type, &tid) &&
+                           top < BTF_ANON_STACK_MAX) {
                                anon_stack[top].tid = tid;
                                anon_stack[top++].offset =
                                        cur_offset + member->offset;
index e7af286af4f1ad9d3ac578cb3ce3c58ba3d5ce0b..c82b401a294d961ae75c48f3a164e92f3ad181b1 100644 (file)
@@ -441,8 +441,9 @@ static unsigned int trace_string(struct synth_trace_event *entry,
        if (is_dynamic) {
                union trace_synth_field *data = &entry->fields[*n_u64];
 
+               len = fetch_store_strlen((unsigned long)str_val);
                data->as_dynamic.offset = struct_size(entry, fields, event->n_u64) + data_size;
-               data->as_dynamic.len = fetch_store_strlen((unsigned long)str_val);
+               data->as_dynamic.len = len;
 
                ret = fetch_store_string((unsigned long)str_val, &entry->fields[*n_u64], entry);
 
index 46439e3bcec4d20b45ae8202d7a68888778fa208..b33c3861fbbbf303e78f740a0fcc41caa2a77d77 100644 (file)
@@ -1470,8 +1470,10 @@ register_snapshot_trigger(char *glob,
                          struct event_trigger_data *data,
                          struct trace_event_file *file)
 {
-       if (tracing_alloc_snapshot_instance(file->tr) != 0)
-               return 0;
+       int ret = tracing_alloc_snapshot_instance(file->tr);
+
+       if (ret < 0)
+               return ret;
 
        return register_trigger(glob, data, file);
 }
index bd0d01d00fb9d52d0736524c3274f1382fee8462..a8e28f9b9271cf6545351f7d4f7ece1fbd9d8989 100644 (file)
@@ -2444,6 +2444,9 @@ static int timerlat_fd_open(struct inode *inode, struct file *file)
        tlat = this_cpu_tmr_var();
        tlat->count = 0;
 
+       hrtimer_init(&tlat->timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_PINNED_HARD);
+       tlat->timer.function = timerlat_irq;
+
        migrate_enable();
        return 0;
 };
@@ -2526,9 +2529,6 @@ timerlat_fd_read(struct file *file, char __user *ubuf, size_t count,
                tlat->tracing_thread = false;
                tlat->kthread = current;
 
-               hrtimer_init(&tlat->timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_PINNED_HARD);
-               tlat->timer.function = timerlat_irq;
-
                /* Annotate now to drift new period */
                tlat->abs_period = hrtimer_cb_get_time(&tlat->timer);
 
index 3e7fa44dc2b24850f8b836f6bd223807fbcf0c48..d8b302d0108302d9ef2debe735c4b7778a217f90 100644 (file)
@@ -1587,12 +1587,11 @@ static enum print_line_t trace_print_print(struct trace_iterator *iter,
 {
        struct print_entry *field;
        struct trace_seq *s = &iter->seq;
-       int max = iter->ent_size - offsetof(struct print_entry, buf);
 
        trace_assign_type(field, iter->ent);
 
        seq_print_ip_sym(s, field->ip, flags);
-       trace_seq_printf(s, ": %.*s", max, field->buf);
+       trace_seq_printf(s, ": %s", field->buf);
 
        return trace_handle_return(s);
 }
@@ -1601,11 +1600,10 @@ static enum print_line_t trace_print_raw(struct trace_iterator *iter, int flags,
                                         struct trace_event *event)
 {
        struct print_entry *field;
-       int max = iter->ent_size - offsetof(struct print_entry, buf);
 
        trace_assign_type(field, iter->ent);
 
-       trace_seq_printf(&iter->seq, "# %lx %.*s", field->ip, max, field->buf);
+       trace_seq_printf(&iter->seq, "# %lx %s", field->ip, field->buf);
 
        return trace_handle_return(&iter->seq);
 }
index 4dc74d73fc1df5af9ee297611865705d44259663..34289f9c67076b2ab81ffc67bd5a518926e59ca6 100644 (file)
@@ -1159,9 +1159,12 @@ static int traceprobe_parse_probe_arg_body(const char *argv, ssize_t *size,
        if (!(ctx->flags & TPARG_FL_TEVENT) &&
            (strcmp(arg, "$comm") == 0 || strcmp(arg, "$COMM") == 0 ||
             strncmp(arg, "\\\"", 2) == 0)) {
-               /* The type of $comm must be "string", and not an array. */
-               if (parg->count || (t && strcmp(t, "string")))
+               /* The type of $comm must be "string", and not an array type. */
+               if (parg->count || (t && strcmp(t, "string"))) {
+                       trace_probe_log_err(ctx->offset + (t ? (t - arg) : 0),
+                                       NEED_STRING_TYPE);
                        goto out;
+               }
                parg->type = find_fetch_type("string", ctx->flags);
        } else
                parg->type = find_fetch_type(t, ctx->flags);
@@ -1169,18 +1172,6 @@ static int traceprobe_parse_probe_arg_body(const char *argv, ssize_t *size,
                trace_probe_log_err(ctx->offset + (t ? (t - arg) : 0), BAD_TYPE);
                goto out;
        }
-       parg->offset = *size;
-       *size += parg->type->size * (parg->count ?: 1);
-
-       ret = -ENOMEM;
-       if (parg->count) {
-               len = strlen(parg->type->fmttype) + 6;
-               parg->fmt = kmalloc(len, GFP_KERNEL);
-               if (!parg->fmt)
-                       goto out;
-               snprintf(parg->fmt, len, "%s[%d]", parg->type->fmttype,
-                        parg->count);
-       }
 
        code = tmp = kcalloc(FETCH_INSN_MAX, sizeof(*code), GFP_KERNEL);
        if (!code)
@@ -1204,6 +1195,19 @@ static int traceprobe_parse_probe_arg_body(const char *argv, ssize_t *size,
                                goto fail;
                }
        }
+       parg->offset = *size;
+       *size += parg->type->size * (parg->count ?: 1);
+
+       if (parg->count) {
+               len = strlen(parg->type->fmttype) + 6;
+               parg->fmt = kmalloc(len, GFP_KERNEL);
+               if (!parg->fmt) {
+                       ret = -ENOMEM;
+                       goto out;
+               }
+               snprintf(parg->fmt, len, "%s[%d]", parg->type->fmttype,
+                        parg->count);
+       }
 
        ret = -EINVAL;
        /* Store operation */
index 850d9ecb6765a8bd372b214ee6f302e3374ffa93..c1877d0182691c20eba09e60a98d80daf2dd810a 100644 (file)
@@ -515,7 +515,8 @@ extern int traceprobe_define_arg_fields(struct trace_event_call *event_call,
        C(BAD_HYPHEN,           "Failed to parse single hyphen. Forgot '>'?"),  \
        C(NO_BTF_FIELD,         "This field is not found."),    \
        C(BAD_BTF_TID,          "Failed to get BTF type info."),\
-       C(BAD_TYPE4STR,         "This type does not fit for string."),
+       C(BAD_TYPE4STR,         "This type does not fit for string."),\
+       C(NEED_STRING_TYPE,     "$comm and immediate-string only accept string type"),
 
 #undef C
 #define C(a, b)                TP_ERR_##a
index c774e560f2f957127c7e41b825164a0d102b6fd0..a4dcf0f2435213bc2b2b91d677ec18290aa53859 100644 (file)
@@ -574,7 +574,12 @@ __tracing_map_insert(struct tracing_map *map, void *key, bool lookup_only)
                                }
 
                                memcpy(elt->key, key, map->key_size);
-                               entry->val = elt;
+                               /*
+                                * Ensure the initialization is visible and
+                                * publish the elt.
+                                */
+                               smp_wmb();
+                               WRITE_ONCE(entry->val, elt);
                                atomic64_inc(&map->hits);
 
                                return entry->val;
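
The smp_wmb()/WRITE_ONCE() pair is only half of a publication protocol; the reader side is assumed to pair it with READ_ONCE() plus the address dependency of the following dereference. A sketch of the pairing (reader shape assumed, not shown in this hunk; keys_match() is the file-local comparator):

    /* Writer: fully initialize the element, then publish the pointer. */
    memcpy(elt->key, key, map->key_size);
    smp_wmb();                     /* order the init before the publish */
    WRITE_ONCE(entry->val, elt);

    /* Reader: the dependent load through val keeps the key access from
     * being reordered before the pointer read. */
    struct tracing_map_elt *val = READ_ONCE(entry->val);
    if (val && keys_match(key, val->key, map->key_size))
        return val;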
index 76e60faed892357002868cdf9fb41c76ad4eba54..7b482a26d74196c4505d7b45017ed153c75572fd 100644 (file)
@@ -5786,13 +5786,9 @@ static int workqueue_apply_unbound_cpumask(const cpumask_var_t unbound_cpumask)
        list_for_each_entry(wq, &workqueues, list) {
                if (!(wq->flags & WQ_UNBOUND))
                        continue;
-
                /* creating multiple pwqs breaks ordering guarantee */
-               if (!list_empty(&wq->pwqs)) {
-                       if (wq->flags & __WQ_ORDERED_EXPLICIT)
-                               continue;
-                       wq->flags &= ~__WQ_ORDERED;
-               }
+               if (wq->flags & __WQ_ORDERED)
+                       continue;
 
                ctx = apply_wqattrs_prepare(wq, wq->unbound_attrs, unbound_cpumask);
                if (IS_ERR(ctx)) {
index 975a07f9f1cc08838d272f83d5f04a85ff2f5cd2..f3b50b47b7eacda900f4f9d7bff3495da9cf2bcd 100644 (file)
@@ -2235,6 +2235,7 @@ config TEST_DIV64
 config TEST_IOV_ITER
        tristate "Test iov_iter operation" if !KUNIT_ALL_TESTS
        depends on KUNIT
+       depends on MMU
        default KUNIT_ALL_TESTS
        help
          Enable this to turn on testing of the operation of the I/O iterator
@@ -2857,28 +2858,6 @@ config TEST_MEMCAT_P
 
          If unsure, say N.
 
-config TEST_LIVEPATCH
-       tristate "Test livepatching"
-       default n
-       depends on DYNAMIC_DEBUG
-       depends on LIVEPATCH
-       depends on m
-       help
-         Test kernel livepatching features for correctness.  The tests will
-         load test modules that will be livepatched in various scenarios.
-
-         To run all the livepatching tests:
-
-         make -C tools/testing/selftests TARGETS=livepatch run_tests
-
-         Alternatively, individual tests may be invoked:
-
-         tools/testing/selftests/livepatch/test-callbacks.sh
-         tools/testing/selftests/livepatch/test-livepatch.sh
-         tools/testing/selftests/livepatch/test-shadow-vars.sh
-
-         If unsure, say N.
-
 config TEST_OBJAGG
        tristate "Perform selftest on object aggreration manager"
        default n
index 6b09731d8e6195603aab99a3fd3f5dc56331f567..95ed57f377fd9b9493d25bc9e7dd4681d86c75c6 100644 (file)
@@ -134,8 +134,6 @@ endif
 obj-$(CONFIG_TEST_FPU) += test_fpu.o
 CFLAGS_test_fpu.o += $(FPU_CFLAGS)
 
-obj-$(CONFIG_TEST_LIVEPATCH) += livepatch/
-
 # Some KUnit files (hooks.o) need to be built-in even when KUnit is a module,
 # so we can't just use obj-$(CONFIG_KUNIT).
 ifdef CONFIG_KUNIT
index 225bb77014600f796e972a9c0f03638c23750a06..bf70850035c76f468c7c0af023454bf5bc6716e3 100644 (file)
@@ -215,7 +215,7 @@ static const u32 init_sums_no_overflow[] = {
        0xffff0000, 0xfffffffb,
 };
 
-static const __sum16 expected_csum_ipv6_magic[] = {
+static const u16 expected_csum_ipv6_magic[] = {
        0x18d4, 0x3085, 0x2e4b, 0xd9f4, 0xbdc8, 0x78f,  0x1034, 0x8422, 0x6fc0,
        0xd2f6, 0xbeb5, 0x9d3,  0x7e2a, 0x312e, 0x778e, 0xc1bb, 0x7cf2, 0x9d1e,
        0xca21, 0xf3ff, 0x7569, 0xb02e, 0xca86, 0x7e76, 0x4539, 0x45e3, 0xf28d,
@@ -241,7 +241,7 @@ static const __sum16 expected_csum_ipv6_magic[] = {
        0x3845, 0x1014
 };
 
-static const __sum16 expected_fast_csum[] = {
+static const u16 expected_fast_csum[] = {
        0xda83, 0x45da, 0x4f46, 0x4e4f, 0x34e,  0xe902, 0xa5e9, 0x87a5, 0x7187,
        0x5671, 0xf556, 0x6df5, 0x816d, 0x8f81, 0xbb8f, 0xfbba, 0x5afb, 0xbe5a,
        0xedbe, 0xabee, 0x6aac, 0xe6b,  0xea0d, 0x67ea, 0x7e68, 0x8a7e, 0x6f8a,
@@ -577,7 +577,8 @@ static void test_csum_no_carry_inputs(struct kunit *test)
 
 static void test_ip_fast_csum(struct kunit *test)
 {
-       __sum16 csum_result, expected;
+       __sum16 csum_result;
+       u16 expected;
 
        for (int len = IPv4_MIN_WORDS; len < IPv4_MAX_WORDS; len++) {
                for (int index = 0; index < NUM_IP_FAST_CSUM_TESTS; index++) {
@@ -586,7 +587,7 @@ static void test_ip_fast_csum(struct kunit *test)
                                expected_fast_csum[(len - IPv4_MIN_WORDS) *
                                                   NUM_IP_FAST_CSUM_TESTS +
                                                   index];
-                       CHECK_EQ(expected, csum_result);
+                       CHECK_EQ(to_sum16(expected), csum_result);
                }
        }
 }
@@ -598,7 +599,7 @@ static void test_csum_ipv6_magic(struct kunit *test)
        const struct in6_addr *daddr;
        unsigned int len;
        unsigned char proto;
-       unsigned int csum;
+       __wsum csum;
 
        const int daddr_offset = sizeof(struct in6_addr);
        const int len_offset = sizeof(struct in6_addr) + sizeof(struct in6_addr);
@@ -611,10 +612,10 @@ static void test_csum_ipv6_magic(struct kunit *test)
                saddr = (const struct in6_addr *)(random_buf + i);
                daddr = (const struct in6_addr *)(random_buf + i +
                                                  daddr_offset);
-               len = *(unsigned int *)(random_buf + i + len_offset);
+               len = le32_to_cpu(*(__le32 *)(random_buf + i + len_offset));
                proto = *(random_buf + i + proto_offset);
-               csum = *(unsigned int *)(random_buf + i + csum_offset);
-               CHECK_EQ(expected_csum_ipv6_magic[i],
+               csum = *(__wsum *)(random_buf + i + csum_offset);
+               CHECK_EQ(to_sum16(expected_csum_ipv6_magic[i]),
                         csum_ipv6_magic(saddr, daddr, len, proto, csum));
        }
 #endif /* !CONFIG_NET */
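
The tables switch from __sum16 to u16 because __sum16 is a sparse __bitwise type: plain integer literals want an explicit conversion at the point of comparison. A sketch of the pattern, with to_sum16() assumed to be the file-local cast helper:

    /* to_sum16() assumed as a local helper in checksum_kunit.c: */
    #define to_sum16(x) ((__force __sum16)(x))

    /* Raw test vectors stay plain u16 ... */
    static const u16 expected[] = { 0x18d4, 0x3085 };

    /* ... and the conversion happens where the typed value is needed: */
    CHECK_EQ(to_sum16(expected[i]), csum_result);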
index d4572dbc914539a56c8b2a6da5ab101a9e11d80d..705b82736be0890de9417ca62233479eb3999cd9 100644 (file)
@@ -124,7 +124,7 @@ static void cmdline_do_one_range_test(struct kunit *test, const char *in,
                            n, e[0], r[0]);
 
        p = memchr_inv(&r[1], 0, sizeof(r) - sizeof(r[0]));
-       KUNIT_EXPECT_PTR_EQ_MSG(test, p, NULL, "in test %u at %u out of bound", n, p - r);
+       KUNIT_EXPECT_PTR_EQ_MSG(test, p, NULL, "in test %u at %td out of bound", n, p - r);
 }
 
 static void cmdline_test_range(struct kunit *test)
index e0aa6b440ca5f4a4f3560100985e39c068c7a6a8..4a6a9f419bd7eb8cf1370eed5f42c98eb6914f3e 100644 (file)
@@ -166,7 +166,6 @@ void iov_iter_init(struct iov_iter *i, unsigned int direction,
        WARN_ON(direction & ~(READ | WRITE));
        *i = (struct iov_iter) {
                .iter_type = ITER_IOVEC,
-               .copy_mc = false,
                .nofault = false,
                .data_source = direction,
                .__iov = iov,
@@ -244,27 +243,9 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
 EXPORT_SYMBOL_GPL(_copy_mc_to_iter);
 #endif /* CONFIG_ARCH_HAS_COPY_MC */
 
-static __always_inline
-size_t memcpy_from_iter_mc(void *iter_from, size_t progress,
-                          size_t len, void *to, void *priv2)
-{
-       return copy_mc_to_kernel(to + progress, iter_from, len);
-}
-
-static size_t __copy_from_iter_mc(void *addr, size_t bytes, struct iov_iter *i)
-{
-       if (unlikely(i->count < bytes))
-               bytes = i->count;
-       if (unlikely(!bytes))
-               return 0;
-       return iterate_bvec(i, bytes, addr, NULL, memcpy_from_iter_mc);
-}
-
 static __always_inline
 size_t __copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
 {
-       if (unlikely(iov_iter_is_copy_mc(i)))
-               return __copy_from_iter_mc(addr, bytes, i);
        return iterate_and_advance(i, bytes, addr,
                                   copy_from_user_iter, memcpy_from_iter);
 }
@@ -633,7 +614,6 @@ void iov_iter_kvec(struct iov_iter *i, unsigned int direction,
        WARN_ON(direction & ~(READ | WRITE));
        *i = (struct iov_iter){
                .iter_type = ITER_KVEC,
-               .copy_mc = false,
                .data_source = direction,
                .kvec = kvec,
                .nr_segs = nr_segs,
@@ -650,7 +630,6 @@ void iov_iter_bvec(struct iov_iter *i, unsigned int direction,
        WARN_ON(direction & ~(READ | WRITE));
        *i = (struct iov_iter){
                .iter_type = ITER_BVEC,
-               .copy_mc = false,
                .data_source = direction,
                .bvec = bvec,
                .nr_segs = nr_segs,
@@ -679,7 +658,6 @@ void iov_iter_xarray(struct iov_iter *i, unsigned int direction,
        BUG_ON(direction & ~1);
        *i = (struct iov_iter) {
                .iter_type = ITER_XARRAY,
-               .copy_mc = false,
                .data_source = direction,
                .xarray = xarray,
                .xarray_start = start,
@@ -703,7 +681,6 @@ void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count)
        BUG_ON(direction != READ);
        *i = (struct iov_iter){
                .iter_type = ITER_DISCARD,
-               .copy_mc = false,
                .data_source = false,
                .count = count,
                .iov_offset = 0
@@ -714,12 +691,11 @@ EXPORT_SYMBOL(iov_iter_discard);
 static bool iov_iter_aligned_iovec(const struct iov_iter *i, unsigned addr_mask,
                                   unsigned len_mask)
 {
+       const struct iovec *iov = iter_iov(i);
        size_t size = i->count;
        size_t skip = i->iov_offset;
-       unsigned k;
 
-       for (k = 0; k < i->nr_segs; k++, skip = 0) {
-               const struct iovec *iov = iter_iov(i) + k;
+       do {
                size_t len = iov->iov_len - skip;
 
                if (len > size)
@@ -729,34 +705,36 @@ static bool iov_iter_aligned_iovec(const struct iov_iter *i, unsigned addr_mask,
                if ((unsigned long)(iov->iov_base + skip) & addr_mask)
                        return false;
 
+               iov++;
                size -= len;
-               if (!size)
-                       break;
-       }
+               skip = 0;
+       } while (size);
+
        return true;
 }
 
 static bool iov_iter_aligned_bvec(const struct iov_iter *i, unsigned addr_mask,
                                  unsigned len_mask)
 {
-       size_t size = i->count;
+       const struct bio_vec *bvec = i->bvec;
        unsigned skip = i->iov_offset;
-       unsigned k;
+       size_t size = i->count;
 
-       for (k = 0; k < i->nr_segs; k++, skip = 0) {
-               size_t len = i->bvec[k].bv_len - skip;
+       do {
+               size_t len = bvec->bv_len;
 
                if (len > size)
                        len = size;
                if (len & len_mask)
                        return false;
-               if ((unsigned long)(i->bvec[k].bv_offset + skip) & addr_mask)
+               if ((unsigned long)(bvec->bv_offset + skip) & addr_mask)
                        return false;
 
+               bvec++;
                size -= len;
-               if (!size)
-                       break;
-       }
+               skip = 0;
+       } while (size);
+
        return true;
 }
 
@@ -800,13 +778,12 @@ EXPORT_SYMBOL_GPL(iov_iter_is_aligned);
 
 static unsigned long iov_iter_alignment_iovec(const struct iov_iter *i)
 {
+       const struct iovec *iov = iter_iov(i);
        unsigned long res = 0;
        size_t size = i->count;
        size_t skip = i->iov_offset;
-       unsigned k;
 
-       for (k = 0; k < i->nr_segs; k++, skip = 0) {
-               const struct iovec *iov = iter_iov(i) + k;
+       do {
                size_t len = iov->iov_len - skip;
                if (len) {
                        res |= (unsigned long)iov->iov_base + skip;
@@ -814,30 +791,31 @@ static unsigned long iov_iter_alignment_iovec(const struct iov_iter *i)
                                len = size;
                        res |= len;
                        size -= len;
-                       if (!size)
-                               break;
                }
-       }
+               iov++;
+               skip = 0;
+       } while (size);
        return res;
 }
 
 static unsigned long iov_iter_alignment_bvec(const struct iov_iter *i)
 {
+       const struct bio_vec *bvec = i->bvec;
        unsigned res = 0;
        size_t size = i->count;
        unsigned skip = i->iov_offset;
-       unsigned k;
 
-       for (k = 0; k < i->nr_segs; k++, skip = 0) {
-               size_t len = i->bvec[k].bv_len - skip;
-               res |= (unsigned long)i->bvec[k].bv_offset + skip;
+       do {
+               size_t len = bvec->bv_len - skip;
+               res |= (unsigned long)bvec->bv_offset + skip;
                if (len > size)
                        len = size;
                res |= len;
+               bvec++;
                size -= len;
-               if (!size)
-                       break;
-       }
+               skip = 0;
+       } while (size);
+
        return res;
 }
 
@@ -1166,11 +1144,12 @@ const void *dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags)
 EXPORT_SYMBOL(dup_iter);
 
 static __noclone int copy_compat_iovec_from_user(struct iovec *iov,
-               const struct iovec __user *uvec, unsigned long nr_segs)
+               const struct iovec __user *uvec, u32 nr_segs)
 {
        const struct compat_iovec __user *uiov =
                (const struct compat_iovec __user *)uvec;
-       int ret = -EFAULT, i;
+       int ret = -EFAULT;
+       u32 i;
 
        if (!user_access_begin(uiov, nr_segs * sizeof(*uiov)))
                return -EFAULT;
index 59dbcbdb1c916d93dcbcbb3b97911098ef9420a7..72fa20f405f1520a63dd50d9aa37f6609306eb3e 100644 (file)
@@ -74,10 +74,12 @@ static int create_dir(struct kobject *kobj)
        if (error)
                return error;
 
-       error = sysfs_create_groups(kobj, ktype->default_groups);
-       if (error) {
-               sysfs_remove_dir(kobj);
-               return error;
+       if (ktype) {
+               error = sysfs_create_groups(kobj, ktype->default_groups);
+               if (error) {
+                       sysfs_remove_dir(kobj);
+                       return error;
+               }
        }
 
        /*
@@ -589,7 +591,8 @@ static void __kobject_del(struct kobject *kobj)
        sd = kobj->sd;
        ktype = get_ktype(kobj);
 
-       sysfs_remove_groups(kobj, ktype->default_groups);
+       if (ktype)
+               sysfs_remove_groups(kobj, ktype->default_groups);
 
        /* send "remove" if the caller did not do it but sent "add" */
        if (kobj->state_add_uevent_sent && !kobj->state_remove_uevent_sent) {
@@ -666,6 +669,10 @@ static void kobject_cleanup(struct kobject *kobj)
        pr_debug("'%s' (%p): %s, parent %p\n",
                 kobject_name(kobj), kobj, __func__, kobj->parent);
 
+       if (t && !t->release)
+               pr_debug("'%s' (%p): does not have a release() function, it is broken and must be fixed. See Documentation/core-api/kobject.rst.\n",
+                        kobject_name(kobj), kobj);
+
        /* remove from sysfs if the caller did not do it */
        if (kobj->state_in_sysfs) {
                pr_debug("'%s' (%p): auto cleanup kobject_del\n",
@@ -676,13 +683,10 @@ static void kobject_cleanup(struct kobject *kobj)
                parent = NULL;
        }
 
-       if (t->release) {
+       if (t && t->release) {
                pr_debug("'%s' (%p): calling ktype release\n",
                         kobject_name(kobj), kobj);
                t->release(kobj);
-       } else {
-               pr_debug("'%s' (%p): does not have a release() function, it is broken and must be fixed. See Documentation/core-api/kobject.rst.\n",
-                        kobject_name(kobj), kobj);
        }
 
        /* free name if we allocated it */
@@ -1056,7 +1060,7 @@ const struct kobj_ns_type_operations *kobj_child_ns_ops(const struct kobject *pa
 {
        const struct kobj_ns_type_operations *ops = NULL;
 
-       if (parent && parent->ktype->child_ns_type)
+       if (parent && parent->ktype && parent->ktype->child_ns_type)
                ops = parent->ktype->child_ns_type(parent);
 
        return ops;
index 54bd558364053c2eb5437e0bf5b94da6cdfba0e1..5fcd48ff0f36a37415c67cf2223cdf7ffeaef794 100644 (file)
@@ -13,5 +13,7 @@
 
 // For internal use only -- registers the kunit_bus.
 int kunit_bus_init(void);
+// For internal use only -- unregisters the kunit_bus.
+void kunit_bus_shutdown(void);
 
 #endif //_KUNIT_DEVICE_IMPL_H
index f5371287b3750f0cefdc423748846b4257d8d14a..abc603730b8ea4e55e320419b671130a2266f0a3 100644 (file)
@@ -10,6 +10,7 @@
  */
 
 #include <linux/device.h>
+#include <linux/dma-mapping.h>
 
 #include <kunit/test.h>
 #include <kunit/device.h>
@@ -35,7 +36,7 @@ struct kunit_device {
 
 #define to_kunit_device(d) container_of_const(d, struct kunit_device, dev)
 
-static struct bus_type kunit_bus_type = {
+static const struct bus_type kunit_bus_type = {
        .name           = "kunit",
 };
 
@@ -45,8 +46,8 @@ int kunit_bus_init(void)
        int error;
 
        kunit_bus_device = root_device_register("kunit");
-       if (!kunit_bus_device)
-               return -ENOMEM;
+       if (IS_ERR(kunit_bus_device))
+               return PTR_ERR(kunit_bus_device);
 
        error = bus_register(&kunit_bus_type);
        if (error)
@@ -54,6 +55,20 @@ int kunit_bus_init(void)
        return error;
 }
 
+/* Unregister the 'kunit_bus' in case the KUnit module is unloaded. */
+void kunit_bus_shutdown(void)
+{
+       /* Make sure the bus exists before we unregister it. */
+       if (IS_ERR_OR_NULL(kunit_bus_device))
+               return;
+
+       bus_unregister(&kunit_bus_type);
+
+       root_device_unregister(kunit_bus_device);
+
+       kunit_bus_device = NULL;
+}
+
 /* Release a 'fake' KUnit device. */
 static void kunit_device_release(struct device *d)
 {
@@ -119,6 +134,9 @@ static struct kunit_device *kunit_device_register_internal(struct kunit *test,
                return ERR_PTR(err);
        }
 
+       kunit_dev->dev.dma_mask = &kunit_dev->dev.coherent_dma_mask;
+       kunit_dev->dev.coherent_dma_mask = DMA_BIT_MASK(32);
+
        kunit_add_action(test, device_unregister_wrapper, &kunit_dev->dev);
 
        return kunit_dev;
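
Besides constifying the bus_type and fixing the root_device_register() error check (it returns an ERR_PTR, not NULL), kunit devices now carry a 32-bit coherent DMA mask by default. A hedged sketch of a test (hypothetical names) that relies on that default:

    #include <kunit/device.h>
    #include <kunit/test.h>
    #include <linux/dma-mapping.h>

    static void example_dma_mask_test(struct kunit *test)
    {
            struct device *dev = kunit_device_register(test, "example-dma");

            KUNIT_ASSERT_NOT_ERR_OR_NULL(test, dev);
            /* The fake device now advertises 32-bit DMA by default. */
            KUNIT_EXPECT_EQ(test, dma_get_mask(dev), DMA_BIT_MASK(32));
    }
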
index 717b9599036ba0bccf1ffe846b7b051c591109f2..70b9a43cd2571620f8f7d3d12dd72c736e3741ab 100644 (file)
@@ -33,13 +33,13 @@ static char *filter_glob_param;
 static char *filter_param;
 static char *filter_action_param;
 
-module_param_named(filter_glob, filter_glob_param, charp, 0400);
+module_param_named(filter_glob, filter_glob_param, charp, 0600);
 MODULE_PARM_DESC(filter_glob,
                "Filter which KUnit test suites/tests run at boot-time, e.g. list* or list*.*del_test");
-module_param_named(filter, filter_param, charp, 0400);
+module_param_named(filter, filter_param, charp, 0600);
 MODULE_PARM_DESC(filter,
                "Filter which KUnit test suites/tests run at boot-time using attributes, e.g. speed>slow");
-module_param_named(filter_action, filter_action_param, charp, 0400);
+module_param_named(filter_action, filter_action_param, charp, 0600);
 MODULE_PARM_DESC(filter_action,
                "Changes behavior of filtered tests using attributes, valid values are:\n"
                "<none>: do not run filtered tests as normal\n"
@@ -146,6 +146,10 @@ void kunit_free_suite_set(struct kunit_suite_set suite_set)
        kfree(suite_set.start);
 }
 
+/*
+ * Filter and reallocate test suites. Must return the filtered test suites set
+ * allocated at a valid virtual address or NULL in case of error.
+ */
 struct kunit_suite_set
 kunit_filter_suites(const struct kunit_suite_set *suite_set,
                    const char *filter_glob,
index 22d4ee86dbedde1c477ee1cd1ed68cbc2b999d23..3f7f967e3688ee0de7e0e883c858bee2c2e9f683 100644 (file)
@@ -129,7 +129,7 @@ static void parse_filter_attr_test(struct kunit *test)
                        GFP_KERNEL);
        for (j = 0; j < filter_count; j++) {
                parsed_filters[j] = kunit_next_attr_filter(&filter, &err);
-               KUNIT_ASSERT_EQ_MSG(test, err, 0, "failed to parse filter '%s'", filters[j]);
+               KUNIT_ASSERT_EQ_MSG(test, err, 0, "failed to parse filter from '%s'", filters);
        }
 
        KUNIT_EXPECT_STREQ(test, kunit_attr_filter_name(parsed_filters[0]), "speed");
index c4259d910356ba7e8f24847cd347eb5861071cb4..f7980ef236a38bdefd8e0e7b53915f6057348617 100644 (file)
@@ -720,7 +720,7 @@ static void kunit_device_cleanup_test(struct kunit *test)
        long action_was_run = 0;
 
        test_device = kunit_device_register(test, "my_device");
-       KUNIT_ASSERT_NOT_NULL(test, test_device);
+       KUNIT_ASSERT_NOT_ERR_OR_NULL(test, test_device);
 
        /* Add an action to verify cleanup. */
        devm_add_action(test_device, test_dev_action, &action_was_run);
index f95d2093a0aa3359c0cb08462ea62e76ab0f2ecf..1d1475578515c261fe74b454502f1a2a5ac3bb81 100644 (file)
@@ -17,6 +17,7 @@
 #include <linux/panic.h>
 #include <linux/sched/debug.h>
 #include <linux/sched.h>
+#include <linux/mm.h>
 
 #include "debugfs.h"
 #include "device-impl.h"
@@ -801,12 +802,19 @@ static void kunit_module_exit(struct module *mod)
        };
        const char *action = kunit_action();
 
+       /*
+        * Check if the start address is a valid virtual address to detect
+        * if the module load sequence has failed and the suite set has not
+        * been initialized and filtered.
+        */
+       if (!suite_set.start || !virt_addr_valid(suite_set.start))
+               return;
+
        if (!action)
                __kunit_test_suites_exit(mod->kunit_suites,
                                         mod->num_kunit_suites);
 
-       if (suite_set.start)
-               kunit_free_suite_set(suite_set);
+       kunit_free_suite_set(suite_set);
 }
 
 static int kunit_module_notify(struct notifier_block *nb, unsigned long val,
@@ -816,12 +824,12 @@ static int kunit_module_notify(struct notifier_block *nb, unsigned long val,
 
        switch (val) {
        case MODULE_STATE_LIVE:
+               kunit_module_init(mod);
                break;
        case MODULE_STATE_GOING:
                kunit_module_exit(mod);
                break;
        case MODULE_STATE_COMING:
-               kunit_module_init(mod);
                break;
        case MODULE_STATE_UNFORMED:
                break;
@@ -920,6 +928,9 @@ static void __exit kunit_exit(void)
 #ifdef CONFIG_MODULES
        unregister_module_notifier(&kunit_mod_nb);
 #endif
+
+       kunit_bus_shutdown();
+
        kunit_debugfs_cleanup();
 }
 module_exit(kunit_exit);
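
Two lifecycle fixes land in test.c: kunit_module_init() moves from MODULE_STATE_COMING to MODULE_STATE_LIVE, so a module's suites run only after its own initialization has completed successfully (the virt_addr_valid() check above guards the exit path against a load sequence that failed before the suite set was filtered), and kunit_exit() now calls kunit_bus_shutdown() so unloading the kunit module releases the bus and its root device.
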
diff --git a/lib/livepatch/Makefile b/lib/livepatch/Makefile
deleted file mode 100644 (file)
index dcc912b..0000000
+++ /dev/null
@@ -1,14 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0
-#
-# Makefile for livepatch test code.
-
-obj-$(CONFIG_TEST_LIVEPATCH) += test_klp_atomic_replace.o \
-                               test_klp_callbacks_demo.o \
-                               test_klp_callbacks_demo2.o \
-                               test_klp_callbacks_busy.o \
-                               test_klp_callbacks_mod.o \
-                               test_klp_livepatch.o \
-                               test_klp_shadow_vars.o \
-                               test_klp_state.o \
-                               test_klp_state2.o \
-                               test_klp_state3.o
index 6f241bb38799201defb1b31154920dcc257b9831..af097028872722f4e3a0225f5ca4adb9097c3f39 100644 (file)
@@ -4290,6 +4290,56 @@ exists:
 
 }
 
+/**
+ * mas_alloc_cyclic() - Internal call to find somewhere to store an entry
+ * @mas: The maple state.
+ * @startp: Pointer to ID.
+ * @range_lo: Lower bound of range to search.
+ * @range_hi: Upper bound of range to search.
+ * @entry: The entry to store.
+ * @next: Pointer to next ID to allocate.
+ * @gfp: The GFP_FLAGS to use for allocations.
+ *
+ * Return: 0 if the allocation succeeded without wrapping, 1 if the
+ * allocation succeeded after wrapping, or -EBUSY if there are no
+ * free entries.
+ */
+int mas_alloc_cyclic(struct ma_state *mas, unsigned long *startp,
+               void *entry, unsigned long range_lo, unsigned long range_hi,
+               unsigned long *next, gfp_t gfp)
+{
+       unsigned long min = range_lo;
+       int ret = 0;
+
+       range_lo = max(min, *next);
+       ret = mas_empty_area(mas, range_lo, range_hi, 1);
+       if ((mas->tree->ma_flags & MT_FLAGS_ALLOC_WRAPPED) && ret == 0) {
+               mas->tree->ma_flags &= ~MT_FLAGS_ALLOC_WRAPPED;
+               ret = 1;
+       }
+       if (ret < 0 && range_lo > min) {
+               ret = mas_empty_area(mas, min, range_hi, 1);
+               if (ret == 0)
+                       ret = 1;
+       }
+       if (ret < 0)
+               return ret;
+
+       do {
+               mas_insert(mas, entry);
+       } while (mas_nomem(mas, gfp));
+       if (mas_is_err(mas))
+               return xa_err(mas->node);
+
+       *startp = mas->index;
+       *next = *startp + 1;
+       if (*next == 0)
+               mas->tree->ma_flags |= MT_FLAGS_ALLOC_WRAPPED;
+
+       return ret;
+}
+EXPORT_SYMBOL(mas_alloc_cyclic);
+
 static __always_inline void mas_rewalk(struct ma_state *mas, unsigned long index)
 {
 retry:
@@ -6443,6 +6493,49 @@ unlock:
 }
 EXPORT_SYMBOL(mtree_alloc_range);
 
+/**
+ * mtree_alloc_cyclic() - Find somewhere to store this entry in the tree.
+ * @mt: The maple tree.
+ * @startp: Pointer to ID.
+ * @range_lo: Lower bound of range to search.
+ * @range_hi: Upper bound of range to search.
+ * @entry: The entry to store.
+ * @next: Pointer to next ID to allocate.
+ * @gfp: The GFP_FLAGS to use for allocations.
+ *
+ * Finds an empty entry in @mt after @next, stores the new index into
+ * the @startp pointer, stores the entry at that index, then updates @next.
+ *
+ * @mt must be initialized with the MT_FLAGS_ALLOC_RANGE flag.
+ *
+ * Context: Any context.  Takes and releases the mt.lock.  May sleep if
+ * the @gfp flags permit.
+ *
+ * Return: 0 if the allocation succeeded without wrapping, 1 if the
+ * allocation succeeded after wrapping, -ENOMEM if memory could not be
+ * allocated, -EINVAL if @mt cannot be used, or -EBUSY if there are no
+ * free entries.
+ */
+int mtree_alloc_cyclic(struct maple_tree *mt, unsigned long *startp,
+               void *entry, unsigned long range_lo, unsigned long range_hi,
+               unsigned long *next, gfp_t gfp)
+{
+       int ret;
+
+       MA_STATE(mas, mt, 0, 0);
+
+       if (!mt_is_alloc(mt))
+               return -EINVAL;
+       if (WARN_ON_ONCE(mt_is_reserved(entry)))
+               return -EINVAL;
+       mtree_lock(mt);
+       ret = mas_alloc_cyclic(&mas, startp, entry, range_lo, range_hi,
+                              next, gfp);
+       mtree_unlock(mt);
+       return ret;
+}
+EXPORT_SYMBOL(mtree_alloc_cyclic);
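
A hedged usage sketch for the new cyclic allocator (hypothetical names; as the kernel-doc requires, the tree must be initialized with MT_FLAGS_ALLOC_RANGE):

    #include <linux/maple_tree.h>

    static struct maple_tree example_ids =
            MTREE_INIT(example_ids, MT_FLAGS_ALLOC_RANGE);
    static unsigned long example_next_id;

    /* Hand out IDs in [1, 1023], wrapping around like a cyclic IDR. */
    static int example_assign_id(void *object, unsigned long *id_out)
    {
            int ret = mtree_alloc_cyclic(&example_ids, id_out, object,
                                         1, 1023, &example_next_id,
                                         GFP_KERNEL);

            /* ret == 1 means the allocation succeeded after wraparound. */
            return ret < 0 ? ret : 0;
    }
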
+
 int mtree_alloc_rrange(struct maple_tree *mt, unsigned long *startp,
                void *entry, unsigned long size, unsigned long min,
                unsigned long max, gfp_t gfp)
index 440aee705cccab09a616870b5fb4640ad6f2b7cd..30e00ef0bf2e0f8e407a97ef555f8aa4f648dac3 100644 (file)
@@ -32,7 +32,7 @@ struct some_bytes {
        BUILD_BUG_ON(sizeof(instance.data) != 32);      \
        for (size_t i = 0; i < sizeof(instance.data); i++) {    \
                KUNIT_ASSERT_EQ_MSG(test, instance.data[i], v, \
-                       "line %d: '%s' not initialized to 0x%02x @ %d (saw 0x%02x)\n", \
+                       "line %d: '%s' not initialized to 0x%02x @ %zu (saw 0x%02x)\n", \
                        __LINE__, #instance, v, i, instance.data[i]);   \
        }       \
 } while (0)
@@ -41,7 +41,7 @@ struct some_bytes {
        BUILD_BUG_ON(sizeof(one) != sizeof(two)); \
        for (size_t i = 0; i < sizeof(one); i++) {      \
                KUNIT_EXPECT_EQ_MSG(test, one.data[i], two.data[i], \
-                       "line %d: %s.data[%d] (0x%02x) != %s.data[%d] (0x%02x)\n", \
+                       "line %d: %s.data[%zu] (0x%02x) != %s.data[%zu] (0x%02x)\n", \
                        __LINE__, #one, i, one.data[i], #two, i, two.data[i]); \
        }       \
        kunit_info(test, "ok: " TEST_OP "() " name "\n");       \
index ed2ab43e1b22c0156e5d361c6bfa7eb745759232..be9c576b6e2dc6d35d67d31f15014ab747f478ce 100644 (file)
@@ -30,6 +30,8 @@ static const u8 nla_attr_len[NLA_TYPE_MAX+1] = {
        [NLA_S16]       = sizeof(s16),
        [NLA_S32]       = sizeof(s32),
        [NLA_S64]       = sizeof(s64),
+       [NLA_BE16]      = sizeof(__be16),
+       [NLA_BE32]      = sizeof(__be32),
 };
 
 static const u8 nla_attr_minlen[NLA_TYPE_MAX+1] = {
@@ -43,6 +45,8 @@ static const u8 nla_attr_minlen[NLA_TYPE_MAX+1] = {
        [NLA_S16]       = sizeof(s16),
        [NLA_S32]       = sizeof(s32),
        [NLA_S64]       = sizeof(s64),
+       [NLA_BE16]      = sizeof(__be16),
+       [NLA_BE32]      = sizeof(__be32),
 };
 
 /*
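
With explicit lengths registered for the big-endian types, policy validation can reject mis-sized NLA_BE16/NLA_BE32 attributes instead of accepting them. A hedged sketch of a policy (hypothetical attribute names) that benefits:

    #include <net/netlink.h>

    enum {
            EXAMPLE_ATTR_UNSPEC,
            EXAMPLE_ATTR_PORT,      /* __be16 */
            EXAMPLE_ATTR_ADDR,      /* __be32 */
            __EXAMPLE_ATTR_MAX,
    };

    static const struct nla_policy example_policy[__EXAMPLE_ATTR_MAX] = {
            /* Exact-size checks now come from nla_attr_len[]. */
            [EXAMPLE_ATTR_PORT] = { .type = NLA_BE16 },
            [EXAMPLE_ATTR_ADDR] = { .type = NLA_BE32 },
    };
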
index 010c730ca7fca9b9aab83960a78638046043693f..f3f3436d60a9403eae5b1ef9b091b027881f14fb 100644 (file)
  * seq_buf_init() more than once to reset the seq_buf to start
  * from scratch.
  */
-#include <linux/uaccess.h>
-#include <linux/seq_file.h>
+
+#include <linux/bug.h>
+#include <linux/err.h>
+#include <linux/export.h>
+#include <linux/hex.h>
+#include <linux/minmax.h>
+#include <linux/printk.h>
 #include <linux/seq_buf.h>
+#include <linux/seq_file.h>
+#include <linux/sprintf.h>
+#include <linux/string.h>
+#include <linux/types.h>
+#include <linux/uaccess.h>
 
 /**
  * seq_buf_can_fit - can the new data fit in the current buffer?
  * @s: the seq_buf descriptor
  * @len: The length to see if it can fit in the current buffer
  *
- * Returns true if there's enough unused space in the seq_buf buffer
+ * Returns: true if there's enough unused space in the seq_buf buffer
  * to fit the amount of new data according to @len.
  */
 static bool seq_buf_can_fit(struct seq_buf *s, size_t len)
@@ -35,7 +45,7 @@ static bool seq_buf_can_fit(struct seq_buf *s, size_t len)
  * @m: the seq_file descriptor that is the destination
  * @s: the seq_buf descriptor that is the source.
  *
- * Returns zero on success, non zero otherwise
+ * Returns: zero on success, non-zero otherwise.
  */
 int seq_buf_print_seq(struct seq_file *m, struct seq_buf *s)
 {
@@ -50,9 +60,9 @@ int seq_buf_print_seq(struct seq_file *m, struct seq_buf *s)
  * @fmt: printf format string
  * @args: va_list of arguments from a printf() type function
  *
- * Writes a vnprintf() format into the sequencce buffer.
+ * Writes a vnprintf() format into the sequence buffer.
  *
- * Returns zero on success, -1 on overflow.
+ * Returns: zero on success, -1 on overflow.
  */
 int seq_buf_vprintf(struct seq_buf *s, const char *fmt, va_list args)
 {
@@ -78,7 +88,7 @@ int seq_buf_vprintf(struct seq_buf *s, const char *fmt, va_list args)
  *
  * Writes a printf() format into the sequence buffer.
  *
- * Returns zero on success, -1 on overflow.
+ * Returns: zero on success, -1 on overflow.
  */
 int seq_buf_printf(struct seq_buf *s, const char *fmt, ...)
 {
@@ -94,12 +104,12 @@ int seq_buf_printf(struct seq_buf *s, const char *fmt, ...)
 EXPORT_SYMBOL_GPL(seq_buf_printf);
 
 /**
- * seq_buf_do_printk - printk seq_buf line by line
+ * seq_buf_do_printk - printk() seq_buf line by line
  * @s: seq_buf descriptor
  * @lvl: printk level
  *
  * printk()-s a multi-line sequential buffer line by line. The function
- * makes sure that the buffer in @s is nul terminated and safe to read
+ * makes sure that the buffer in @s is NUL-terminated and safe to read
  * as a string.
  */
 void seq_buf_do_printk(struct seq_buf *s, const char *lvl)
@@ -139,7 +149,7 @@ EXPORT_SYMBOL_GPL(seq_buf_do_printk);
  * This function will take the format and the binary array and finish
  * the conversion into the ASCII string within the buffer.
  *
- * Returns zero on success, -1 on overflow.
+ * Returns: zero on success, -1 on overflow.
  */
 int seq_buf_bprintf(struct seq_buf *s, const char *fmt, const u32 *binary)
 {
@@ -167,7 +177,7 @@ int seq_buf_bprintf(struct seq_buf *s, const char *fmt, const u32 *binary)
  *
  * Copy a simple string into the sequence buffer.
  *
- * Returns zero on success, -1 on overflow
+ * Returns: zero on success, -1 on overflow.
  */
 int seq_buf_puts(struct seq_buf *s, const char *str)
 {
@@ -196,7 +206,7 @@ EXPORT_SYMBOL_GPL(seq_buf_puts);
  *
  * Copy a single character into the sequence buffer.
  *
- * Returns zero on success, -1 on overflow
+ * Returns: zero on success, -1 on overflow.
  */
 int seq_buf_putc(struct seq_buf *s, unsigned char c)
 {
@@ -212,7 +222,7 @@ int seq_buf_putc(struct seq_buf *s, unsigned char c)
 EXPORT_SYMBOL_GPL(seq_buf_putc);
 
 /**
- * seq_buf_putmem - write raw data into the sequenc buffer
+ * seq_buf_putmem - write raw data into the sequence buffer
  * @s: seq_buf descriptor
  * @mem: The raw memory to copy into the buffer
  * @len: The length of the raw memory to copy (in bytes)
@@ -221,7 +231,7 @@ EXPORT_SYMBOL_GPL(seq_buf_putc);
  * buffer and a strcpy() would not work. Using this function allows
  * for such cases.
  *
- * Returns zero on success, -1 on overflow
+ * Returns: zero on success, -1 on overflow.
  */
 int seq_buf_putmem(struct seq_buf *s, const void *mem, unsigned int len)
 {
@@ -249,7 +259,7 @@ int seq_buf_putmem(struct seq_buf *s, const void *mem, unsigned int len)
  * raw memory into the buffer it writes its ASCII representation of it
  * in hex characters.
  *
- * Returns zero on success, -1 on overflow
+ * Returns: zero on success, -1 on overflow.
  */
 int seq_buf_putmem_hex(struct seq_buf *s, const void *mem,
                       unsigned int len)
@@ -297,7 +307,7 @@ int seq_buf_putmem_hex(struct seq_buf *s, const void *mem,
  *
  * Write a path name into the sequence buffer.
  *
- * Returns the number of written bytes on success, -1 on overflow
+ * Returns: the number of written bytes on success, -1 on overflow.
  */
 int seq_buf_path(struct seq_buf *s, const struct path *path, const char *esc)
 {
@@ -332,6 +342,7 @@ int seq_buf_path(struct seq_buf *s, const struct path *path, const char *esc)
  * or until it reaches the end of the content in the buffer (@s->len),
  * whichever comes first.
  *
+ * Returns:
  * On success, it returns a positive number of the number of bytes
  * it copied.
  *
@@ -382,11 +393,11 @@ int seq_buf_to_user(struct seq_buf *s, char __user *ubuf, size_t start, int cnt)
  * linebuf size is maximal length for one line.
  * 32 * 3 - maximum bytes per line, each printed into 2 chars + 1 for
  *     separating space
- * 2 - spaces separating hex dump and ascii representation
- * 32 - ascii representation
+ * 2 - spaces separating hex dump and ASCII representation
+ * 32 - ASCII representation
  * 1 - terminating '\0'
  *
- * Returns zero on success, -1 on overflow
+ * Returns: zero on success, -1 on overflow.
  */
 int seq_buf_hex_dump(struct seq_buf *s, const char *prefix_str, int prefix_type,
                     int rowsize, int groupsize,
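
The seq_buf changes are documentation-only ("Returns:" kernel-doc markers, spelling fixes) plus an explicit include list. For context, a hedged usage sketch of the API being documented (hypothetical caller):

    #include <linux/seq_buf.h>

    static void example_report(void)
    {
            char buf[128];
            struct seq_buf s;

            seq_buf_init(&s, buf, sizeof(buf));
            /* Each write returns zero on success, -1 on overflow. */
            seq_buf_printf(&s, "state=%d\n", 42);
            /* printk() the accumulated, NUL-terminated lines. */
            seq_buf_do_printk(&s, KERN_INFO);
    }
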
index a0be5d05c7f08187667c91c7d0886843df52225c..4a7055a63d9f8a8a6723563fd8a30115653eea83 100644 (file)
@@ -14,6 +14,7 @@
 
 #define pr_fmt(fmt) "stackdepot: " fmt
 
+#include <linux/debugfs.h>
 #include <linux/gfp.h>
 #include <linux/jhash.h>
 #include <linux/kernel.h>
 #include <linux/list.h>
 #include <linux/mm.h>
 #include <linux/mutex.h>
-#include <linux/percpu.h>
+#include <linux/poison.h>
 #include <linux/printk.h>
+#include <linux/rculist.h>
+#include <linux/rcupdate.h>
 #include <linux/refcount.h>
 #include <linux/slab.h>
 #include <linux/spinlock.h>
 #define DEPOT_OFFSET_BITS (DEPOT_POOL_ORDER + PAGE_SHIFT - DEPOT_STACK_ALIGN)
 #define DEPOT_POOL_INDEX_BITS (DEPOT_HANDLE_BITS - DEPOT_OFFSET_BITS - \
                               STACK_DEPOT_EXTRA_BITS)
-#if IS_ENABLED(CONFIG_KMSAN) && CONFIG_STACKDEPOT_MAX_FRAMES >= 32
-/*
- * KMSAN is frequently used in fuzzing scenarios and thus saves a lot of stack
- * traces. As KMSAN does not support evicting stack traces from the stack
- * depot, the stack depot capacity might be reached quickly with large stack
- * records. Adjust the maximum number of stack depot pools for this case.
- */
-#define DEPOT_POOLS_CAP (8192 * (CONFIG_STACKDEPOT_MAX_FRAMES / 16))
-#else
 #define DEPOT_POOLS_CAP 8192
-#endif
 #define DEPOT_MAX_POOLS \
        (((1LL << (DEPOT_POOL_INDEX_BITS)) < DEPOT_POOLS_CAP) ? \
         (1LL << (DEPOT_POOL_INDEX_BITS)) : DEPOT_POOLS_CAP)
@@ -67,17 +60,30 @@ union handle_parts {
 };
 
 struct stack_record {
-       struct list_head list;          /* Links in hash table or freelist */
+       struct list_head hash_list;     /* Links in the hash table */
        u32 hash;                       /* Hash in hash table */
        u32 size;                       /* Number of stored frames */
-       union handle_parts handle;
+       union handle_parts handle;      /* Constant after initialization */
        refcount_t count;
-       unsigned long entries[CONFIG_STACKDEPOT_MAX_FRAMES];    /* Frames */
+       union {
+               unsigned long entries[CONFIG_STACKDEPOT_MAX_FRAMES];    /* Frames */
+               struct {
+                       /*
+                        * An important invariant of the implementation is to
+                        * only place a stack record onto the freelist iff its
+                        * refcount is zero. Because stack records with a zero
+                        * refcount are never considered as valid, it is safe to
+                        * union @entries and freelist management state below.
+                        * Conversely, as soon as an entry is off the freelist
+                        * and its refcount becomes non-zero, the below must not
+                        * be accessed until being placed back on the freelist.
+                        */
+                       struct list_head free_list;     /* Links in the freelist */
+                       unsigned long rcu_state;        /* RCU cookie */
+               };
+       };
 };
 
-#define DEPOT_STACK_RECORD_SIZE \
-       ALIGN(sizeof(struct stack_record), 1 << DEPOT_STACK_ALIGN)
-
 static bool stack_depot_disabled;
 static bool __stack_depot_early_init_requested __initdata = IS_ENABLED(CONFIG_STACKDEPOT_ALWAYS_INIT);
 static bool __stack_depot_early_init_passed __initdata;
@@ -103,17 +109,33 @@ static void *stack_pools[DEPOT_MAX_POOLS];
 static void *new_pool;
 /* Number of pools in stack_pools. */
 static int pools_num;
+/* Offset to the unused space in the currently used pool. */
+static size_t pool_offset = DEPOT_POOL_SIZE;
 /* Freelist of stack records within stack_pools. */
 static LIST_HEAD(free_stacks);
-/*
- * Stack depot tries to keep an extra pool allocated even before it runs out
- * of space in the currently used pool. This flag marks whether this extra pool
- * needs to be allocated. It has the value 0 when either an extra pool is not
- * yet allocated or if the limit on the number of pools is reached.
- */
-static bool new_pool_required = true;
-/* Lock that protects the variables above. */
-static DEFINE_RWLOCK(pool_rwlock);
+/* The lock must be held when performing pool or freelist modifications. */
+static DEFINE_RAW_SPINLOCK(pool_lock);
+
+/* Statistics counters for debugfs. */
+enum depot_counter_id {
+       DEPOT_COUNTER_REFD_ALLOCS,
+       DEPOT_COUNTER_REFD_FREES,
+       DEPOT_COUNTER_REFD_INUSE,
+       DEPOT_COUNTER_FREELIST_SIZE,
+       DEPOT_COUNTER_PERSIST_COUNT,
+       DEPOT_COUNTER_PERSIST_BYTES,
+       DEPOT_COUNTER_COUNT,
+};
+static long counters[DEPOT_COUNTER_COUNT];
+static const char *const counter_names[] = {
+       [DEPOT_COUNTER_REFD_ALLOCS]     = "refcounted_allocations",
+       [DEPOT_COUNTER_REFD_FREES]      = "refcounted_frees",
+       [DEPOT_COUNTER_REFD_INUSE]      = "refcounted_in_use",
+       [DEPOT_COUNTER_FREELIST_SIZE]   = "freelist_size",
+       [DEPOT_COUNTER_PERSIST_COUNT]   = "persistent_count",
+       [DEPOT_COUNTER_PERSIST_BYTES]   = "persistent_bytes",
+};
+static_assert(ARRAY_SIZE(counter_names) == DEPOT_COUNTER_COUNT);
 
 static int __init disable_stack_depot(char *str)
 {
@@ -258,174 +280,273 @@ out_unlock:
 }
 EXPORT_SYMBOL_GPL(stack_depot_init);
 
-/* Initializes a stack depol pool. */
-static void depot_init_pool(void *pool)
+/*
+ * Initializes new stack pool, and updates the list of pools.
+ */
+static bool depot_init_pool(void **prealloc)
 {
-       int offset;
+       lockdep_assert_held(&pool_lock);
 
-       lockdep_assert_held_write(&pool_rwlock);
+       if (unlikely(pools_num >= DEPOT_MAX_POOLS)) {
+               /* Bail out if we reached the pool limit. */
+               WARN_ON_ONCE(pools_num > DEPOT_MAX_POOLS); /* should never happen */
+               WARN_ON_ONCE(!new_pool); /* to avoid unnecessary pre-allocation */
+               WARN_ONCE(1, "Stack depot reached limit capacity");
+               return false;
+       }
 
-       WARN_ON(!list_empty(&free_stacks));
+       if (!new_pool && *prealloc) {
+               /* We have preallocated memory, use it. */
+               WRITE_ONCE(new_pool, *prealloc);
+               *prealloc = NULL;
+       }
 
-       /* Initialize handles and link stack records into the freelist. */
-       for (offset = 0; offset <= DEPOT_POOL_SIZE - DEPOT_STACK_RECORD_SIZE;
-            offset += DEPOT_STACK_RECORD_SIZE) {
-               struct stack_record *stack = pool + offset;
+       if (!new_pool)
+               return false; /* new_pool and *prealloc are NULL */
 
-               stack->handle.pool_index = pools_num;
-               stack->handle.offset = offset >> DEPOT_STACK_ALIGN;
-               stack->handle.extra = 0;
+       /* Save reference to the pool to be used by depot_fetch_stack(). */
+       stack_pools[pools_num] = new_pool;
 
-               list_add(&stack->list, &free_stacks);
-       }
+       /*
+        * Stack depot tries to keep an extra pool allocated even before it runs
+        * out of space in the currently used pool.
+        *
+        * To indicate that a new preallocation is needed, new_pool is reset
+        * to NULL; do not reset it if we have reached the maximum number of
+        * pools.
+        */
+       if (pools_num < DEPOT_MAX_POOLS)
+               WRITE_ONCE(new_pool, NULL);
+       else
+               WRITE_ONCE(new_pool, STACK_DEPOT_POISON);
 
-       /* Save reference to the pool to be used by depot_fetch_stack(). */
-       stack_pools[pools_num] = pool;
-       pools_num++;
+       /* Pairs with concurrent READ_ONCE() in depot_fetch_stack(). */
+       WRITE_ONCE(pools_num, pools_num + 1);
+       ASSERT_EXCLUSIVE_WRITER(pools_num);
+
+       pool_offset = 0;
+
+       return true;
 }
 
 /* Keeps the preallocated memory to be used for a new stack depot pool. */
 static void depot_keep_new_pool(void **prealloc)
 {
-       lockdep_assert_held_write(&pool_rwlock);
+       lockdep_assert_held(&pool_lock);
 
        /*
         * If a new pool is already saved or the maximum number of
         * pools is reached, do not use the preallocated memory.
         */
-       if (!new_pool_required)
+       if (new_pool)
                return;
 
-       /*
-        * Use the preallocated memory for the new pool
-        * as long as we do not exceed the maximum number of pools.
-        */
-       if (pools_num < DEPOT_MAX_POOLS) {
-               new_pool = *prealloc;
-               *prealloc = NULL;
+       WRITE_ONCE(new_pool, *prealloc);
+       *prealloc = NULL;
+}
+
+/*
+ * Try to initialize a new stack record from the current pool, a cached pool, or
+ * the current pre-allocation.
+ */
+static struct stack_record *depot_pop_free_pool(void **prealloc, size_t size)
+{
+       struct stack_record *stack;
+       void *current_pool;
+       u32 pool_index;
+
+       lockdep_assert_held(&pool_lock);
+
+       if (pool_offset + size > DEPOT_POOL_SIZE) {
+               if (!depot_init_pool(prealloc))
+                       return NULL;
        }
 
-       /*
-        * At this point, either a new pool is kept or the maximum
-        * number of pools is reached. In either case, take note that
-        * keeping another pool is not required.
-        */
-       new_pool_required = false;
+       if (WARN_ON_ONCE(pools_num < 1))
+               return NULL;
+       pool_index = pools_num - 1;
+       current_pool = stack_pools[pool_index];
+       if (WARN_ON_ONCE(!current_pool))
+               return NULL;
+
+       stack = current_pool + pool_offset;
+
+       /* Pre-initialize handle once. */
+       stack->handle.pool_index = pool_index;
+       stack->handle.offset = pool_offset >> DEPOT_STACK_ALIGN;
+       stack->handle.extra = 0;
+       INIT_LIST_HEAD(&stack->hash_list);
+
+       pool_offset += size;
+
+       return stack;
 }
 
-/* Updates references to the current and the next stack depot pools. */
-static bool depot_update_pools(void **prealloc)
+/* Try to find next free usable entry from the freelist. */
+static struct stack_record *depot_pop_free(void)
 {
-       lockdep_assert_held_write(&pool_rwlock);
+       struct stack_record *stack;
 
-       /* Check if we still have objects in the freelist. */
-       if (!list_empty(&free_stacks))
-               goto out_keep_prealloc;
+       lockdep_assert_held(&pool_lock);
 
-       /* Check if we have a new pool saved and use it. */
-       if (new_pool) {
-               depot_init_pool(new_pool);
-               new_pool = NULL;
+       if (list_empty(&free_stacks))
+               return NULL;
 
-               /* Take note that we might need a new new_pool. */
-               if (pools_num < DEPOT_MAX_POOLS)
-                       new_pool_required = true;
+       /*
+        * We maintain the invariant that the elements in front are least
+        * recently used, and are therefore more likely to be associated with an
+        * RCU grace period in the past. Consequently it is sufficient to only
+        * check the first entry.
+        */
+       stack = list_first_entry(&free_stacks, struct stack_record, free_list);
+       if (!poll_state_synchronize_rcu(stack->rcu_state))
+               return NULL;
 
-               /* Try keeping the preallocated memory for new_pool. */
-               goto out_keep_prealloc;
-       }
+       list_del(&stack->free_list);
+       counters[DEPOT_COUNTER_FREELIST_SIZE]--;
 
-       /* Bail out if we reached the pool limit. */
-       if (unlikely(pools_num >= DEPOT_MAX_POOLS)) {
-               WARN_ONCE(1, "Stack depot reached limit capacity");
-               return false;
-       }
+       return stack;
+}
 
-       /* Check if we have preallocated memory and use it. */
-       if (*prealloc) {
-               depot_init_pool(*prealloc);
-               *prealloc = NULL;
-               return true;
-       }
+static inline size_t depot_stack_record_size(struct stack_record *s, unsigned int nr_entries)
+{
+       const size_t used = flex_array_size(s, entries, nr_entries);
+       const size_t unused = sizeof(s->entries) - used;
 
-       return false;
+       WARN_ON_ONCE(sizeof(s->entries) < used);
 
-out_keep_prealloc:
-       /* Keep the preallocated memory for a new pool if required. */
-       if (*prealloc)
-               depot_keep_new_pool(prealloc);
-       return true;
+       return ALIGN(sizeof(struct stack_record) - unused, 1 << DEPOT_STACK_ALIGN);
 }
 
 /* Allocates a new stack in a stack depot pool. */
 static struct stack_record *
-depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
+depot_alloc_stack(unsigned long *entries, unsigned int nr_entries, u32 hash, depot_flags_t flags, void **prealloc)
 {
-       struct stack_record *stack;
+       struct stack_record *stack = NULL;
+       size_t record_size;
 
-       lockdep_assert_held_write(&pool_rwlock);
+       lockdep_assert_held(&pool_lock);
 
-       /* Update current and new pools if required and possible. */
-       if (!depot_update_pools(prealloc))
+       /* This should already be checked by public API entry points. */
+       if (WARN_ON_ONCE(!nr_entries))
                return NULL;
 
-       /* Check if we have a stack record to save the stack trace. */
-       if (list_empty(&free_stacks))
-               return NULL;
+       /* Limit number of saved frames to CONFIG_STACKDEPOT_MAX_FRAMES. */
+       if (nr_entries > CONFIG_STACKDEPOT_MAX_FRAMES)
+               nr_entries = CONFIG_STACKDEPOT_MAX_FRAMES;
 
-       /* Get and unlink the first entry from the freelist. */
-       stack = list_first_entry(&free_stacks, struct stack_record, list);
-       list_del(&stack->list);
+       if (flags & STACK_DEPOT_FLAG_GET) {
+               /*
+                * Evictable entries have to allocate the max. size so they may
+                * safely be re-used by differently sized allocations.
+                */
+               record_size = depot_stack_record_size(stack, CONFIG_STACKDEPOT_MAX_FRAMES);
+               stack = depot_pop_free();
+       } else {
+               record_size = depot_stack_record_size(stack, nr_entries);
+       }
 
-       /* Limit number of saved frames to CONFIG_STACKDEPOT_MAX_FRAMES. */
-       if (size > CONFIG_STACKDEPOT_MAX_FRAMES)
-               size = CONFIG_STACKDEPOT_MAX_FRAMES;
+       if (!stack) {
+               stack = depot_pop_free_pool(prealloc, record_size);
+               if (!stack)
+                       return NULL;
+       }
 
        /* Save the stack trace. */
        stack->hash = hash;
-       stack->size = size;
-       /* stack->handle is already filled in by depot_init_pool(). */
-       refcount_set(&stack->count, 1);
-       memcpy(stack->entries, entries, flex_array_size(stack, entries, size));
+       stack->size = nr_entries;
+       /* stack->handle is already filled in by depot_pop_free_pool(). */
+       memcpy(stack->entries, entries, flex_array_size(stack, entries, nr_entries));
+
+       if (flags & STACK_DEPOT_FLAG_GET) {
+               refcount_set(&stack->count, 1);
+               counters[DEPOT_COUNTER_REFD_ALLOCS]++;
+               counters[DEPOT_COUNTER_REFD_INUSE]++;
+       } else {
+               /* Warn on attempts to switch to refcounting this entry. */
+               refcount_set(&stack->count, REFCOUNT_SATURATED);
+               counters[DEPOT_COUNTER_PERSIST_COUNT]++;
+               counters[DEPOT_COUNTER_PERSIST_BYTES] += record_size;
+       }
 
        /*
         * Let KMSAN know the stored stack record is initialized. This shall
         * prevent false positive reports if instrumented code accesses it.
         */
-       kmsan_unpoison_memory(stack, DEPOT_STACK_RECORD_SIZE);
+       kmsan_unpoison_memory(stack, record_size);
 
        return stack;
 }
 
 static struct stack_record *depot_fetch_stack(depot_stack_handle_t handle)
 {
+       const int pools_num_cached = READ_ONCE(pools_num);
        union handle_parts parts = { .handle = handle };
        void *pool;
        size_t offset = parts.offset << DEPOT_STACK_ALIGN;
        struct stack_record *stack;
 
-       lockdep_assert_held(&pool_rwlock);
+       lockdep_assert_not_held(&pool_lock);
 
-       if (parts.pool_index > pools_num) {
+       if (parts.pool_index > pools_num_cached) {
                WARN(1, "pool index %d out of bounds (%d) for stack id %08x\n",
-                    parts.pool_index, pools_num, handle);
+                    parts.pool_index, pools_num_cached, handle);
                return NULL;
        }
 
        pool = stack_pools[parts.pool_index];
-       if (!pool)
+       if (WARN_ON(!pool))
                return NULL;
 
        stack = pool + offset;
+       if (WARN_ON(!refcount_read(&stack->count)))
+               return NULL;
+
        return stack;
 }
 
 /* Links stack into the freelist. */
 static void depot_free_stack(struct stack_record *stack)
 {
-       lockdep_assert_held_write(&pool_rwlock);
+       unsigned long flags;
+
+       lockdep_assert_not_held(&pool_lock);
+
+       raw_spin_lock_irqsave(&pool_lock, flags);
+       printk_deferred_enter();
 
-       list_add(&stack->list, &free_stacks);
+       /*
+        * Remove the entry from the hash list. Concurrent list traversal may
+        * still observe the entry, but since the refcount is zero, this entry
+        * will no longer be considered as valid.
+        */
+       list_del_rcu(&stack->hash_list);
+
+       /*
+        * Due to being used from constrained contexts such as the allocators,
+        * NMI, or even RCU itself, stack depot cannot rely on primitives that
+        * would sleep (such as synchronize_rcu()) or recursively call into
+        * stack depot again (such as call_rcu()).
+        *
+        * Instead, get an RCU cookie, so that we can ensure this entry isn't
+        * moved onto another list until the next grace period, and concurrent
+        * RCU list traversal remains safe.
+        */
+       stack->rcu_state = get_state_synchronize_rcu();
+
+       /*
+        * Add the entry to the freelist tail, so that older entries are
+        * considered first - their RCU cookie is more likely to no longer be
+        * associated with the current grace period.
+        */
+       list_add_tail(&stack->free_list, &free_stacks);
+
+       counters[DEPOT_COUNTER_FREELIST_SIZE]++;
+       counters[DEPOT_COUNTER_REFD_FREES]++;
+       counters[DEPOT_COUNTER_REFD_INUSE]--;
+
+       printk_deferred_exit();
+       raw_spin_unlock_irqrestore(&pool_lock, flags);
 }
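
depot_free_stack() cannot use synchronize_rcu() or call_rcu(): stack depot is called from allocators, NMI context, and RCU itself. Instead it stamps each freed record with an RCU cookie and defers reuse until that cookie's grace period has passed. A hedged, generic sketch of the same pattern (hypothetical names):

    #include <linux/list.h>
    #include <linux/rcupdate.h>

    struct example_obj {
            struct list_head free_list;
            unsigned long rcu_state;        /* RCU cookie */
    };

    static LIST_HEAD(example_free);

    static void example_retire(struct example_obj *obj)
    {
            /* Record a cookie instead of blocking in synchronize_rcu(). */
            obj->rcu_state = get_state_synchronize_rcu();
            /* Tail-add keeps the oldest (most reusable) entries in front. */
            list_add_tail(&obj->free_list, &example_free);
    }

    static struct example_obj *example_reuse(void)
    {
            struct example_obj *obj;

            if (list_empty(&example_free))
                    return NULL;
            obj = list_first_entry(&example_free, struct example_obj, free_list);
            /* Reuse only after a grace period has elapsed since retirement. */
            if (!poll_state_synchronize_rcu(obj->rcu_state))
                    return NULL;
            list_del(&obj->free_list);
            return obj;
    }
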
 
 /* Calculates the hash for a stack. */
@@ -453,22 +574,52 @@ int stackdepot_memcmp(const unsigned long *u1, const unsigned long *u2,
 
 /* Finds a stack in a bucket of the hash table. */
 static inline struct stack_record *find_stack(struct list_head *bucket,
-                                            unsigned long *entries, int size,
-                                            u32 hash)
+                                             unsigned long *entries, int size,
+                                             u32 hash, depot_flags_t flags)
 {
-       struct list_head *pos;
-       struct stack_record *found;
+       struct stack_record *stack, *ret = NULL;
 
-       lockdep_assert_held(&pool_rwlock);
+       /*
+        * Stack depot may be used from instrumentation that instruments RCU or
+        * tracing itself; use variant that does not call into RCU and cannot be
+        * traced.
+        *
+        * Note: Such use cases must take care when using refcounting to evict
+        * unused entries, because the stack record free-then-reuse code paths
+        * do call into RCU.
+        */
+       rcu_read_lock_sched_notrace();
+
+       list_for_each_entry_rcu(stack, bucket, hash_list) {
+               if (stack->hash != hash || stack->size != size)
+                       continue;
+
+               /*
+                * This may race with depot_free_stack() accessing the freelist
+                * management state unioned with @entries. The refcount is zero
+                * in that case and the below refcount_inc_not_zero() will fail.
+                */
+               if (data_race(stackdepot_memcmp(entries, stack->entries, size)))
+                       continue;
+
+               /*
+                * Try to increment refcount. If this succeeds, the stack record
+                * is valid and has not yet been freed.
+                *
+                * If STACK_DEPOT_FLAG_GET is not used, it is undefined behavior
+                * to then call stack_depot_put() later, and we can assume that
+                * a stack record is never placed back on the freelist.
+                */
+               if ((flags & STACK_DEPOT_FLAG_GET) && !refcount_inc_not_zero(&stack->count))
+                       continue;
 
-       list_for_each(pos, bucket) {
-               found = list_entry(pos, struct stack_record, list);
-               if (found->hash == hash &&
-                   found->size == size &&
-                   !stackdepot_memcmp(entries, found->entries, size))
-                       return found;
+               ret = stack;
+               break;
        }
-       return NULL;
+
+       rcu_read_unlock_sched_notrace();
+
+       return ret;
 }
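
find_stack() now walks the RCU-published hash list without taking pool_lock; refcount_inc_not_zero() doubles as the validity check, since records on the freelist always have a zero refcount and their @entries storage is unioned with the freelist state.
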
 
 depot_stack_handle_t stack_depot_save_flags(unsigned long *entries,
@@ -482,7 +633,6 @@ depot_stack_handle_t stack_depot_save_flags(unsigned long *entries,
        struct page *page = NULL;
        void *prealloc = NULL;
        bool can_alloc = depot_flags & STACK_DEPOT_FLAG_CAN_ALLOC;
-       bool need_alloc = false;
        unsigned long flags;
        u32 hash;
 
@@ -505,31 +655,16 @@ depot_stack_handle_t stack_depot_save_flags(unsigned long *entries,
        hash = hash_stack(entries, nr_entries);
        bucket = &stack_table[hash & stack_hash_mask];
 
-       read_lock_irqsave(&pool_rwlock, flags);
-       printk_deferred_enter();
-
-       /* Fast path: look the stack trace up without full locking. */
-       found = find_stack(bucket, entries, nr_entries, hash);
-       if (found) {
-               if (depot_flags & STACK_DEPOT_FLAG_GET)
-                       refcount_inc(&found->count);
-               printk_deferred_exit();
-               read_unlock_irqrestore(&pool_rwlock, flags);
+       /* Fast path: look the stack trace up without locking. */
+       found = find_stack(bucket, entries, nr_entries, hash, depot_flags);
+       if (found)
                goto exit;
-       }
-
-       /* Take note if another stack pool needs to be allocated. */
-       if (new_pool_required)
-               need_alloc = true;
-
-       printk_deferred_exit();
-       read_unlock_irqrestore(&pool_rwlock, flags);
 
        /*
         * Allocate memory for a new pool if required now:
         * we won't be able to do that under the lock.
         */
-       if (unlikely(can_alloc && need_alloc)) {
+       if (unlikely(can_alloc && !READ_ONCE(new_pool))) {
                /*
                 * Zero out zone modifiers, as we don't have specific zone
                 * requirements. Keep the flags related to allocation in atomic
@@ -543,31 +678,36 @@ depot_stack_handle_t stack_depot_save_flags(unsigned long *entries,
                        prealloc = page_address(page);
        }
 
-       write_lock_irqsave(&pool_rwlock, flags);
+       raw_spin_lock_irqsave(&pool_lock, flags);
        printk_deferred_enter();
 
-       found = find_stack(bucket, entries, nr_entries, hash);
+       /* Try to find again, to avoid concurrently inserting duplicates. */
+       found = find_stack(bucket, entries, nr_entries, hash, depot_flags);
        if (!found) {
                struct stack_record *new =
-                       depot_alloc_stack(entries, nr_entries, hash, &prealloc);
+                       depot_alloc_stack(entries, nr_entries, hash, depot_flags, &prealloc);
 
                if (new) {
-                       list_add(&new->list, bucket);
+                       /*
+                        * This releases the stack record into the bucket and
+                        * makes it visible to readers in find_stack().
+                        */
+                       list_add_rcu(&new->hash_list, bucket);
                        found = new;
                }
-       } else {
-               if (depot_flags & STACK_DEPOT_FLAG_GET)
-                       refcount_inc(&found->count);
+       }
+
+       if (prealloc) {
                /*
-                * Stack depot already contains this stack trace, but let's
-                * keep the preallocated memory for future.
+                * Either stack depot already contains this stack trace, or
+                * depot_alloc_stack() did not consume the preallocated memory.
+                * Try to keep the preallocated memory for future.
                 */
-               if (prealloc)
-                       depot_keep_new_pool(&prealloc);
+               depot_keep_new_pool(&prealloc);
        }
 
        printk_deferred_exit();
-       write_unlock_irqrestore(&pool_rwlock, flags);
+       raw_spin_unlock_irqrestore(&pool_lock, flags);
 exit:
        if (prealloc) {
                /* Stack depot didn't use this memory, free it. */
@@ -592,7 +732,6 @@ unsigned int stack_depot_fetch(depot_stack_handle_t handle,
                               unsigned long **entries)
 {
        struct stack_record *stack;
-       unsigned long flags;
 
        *entries = NULL;
        /*
@@ -604,13 +743,13 @@ unsigned int stack_depot_fetch(depot_stack_handle_t handle,
        if (!handle || stack_depot_disabled)
                return 0;
 
-       read_lock_irqsave(&pool_rwlock, flags);
-       printk_deferred_enter();
-
        stack = depot_fetch_stack(handle);
-
-       printk_deferred_exit();
-       read_unlock_irqrestore(&pool_rwlock, flags);
+       /*
+        * Should never be NULL, otherwise this is a use-after-put (or just a
+        * corrupt handle).
+        */
+       if (WARN(!stack, "corrupt handle or use after stack_depot_put()"))
+               return 0;
 
        *entries = stack->entries;
        return stack->size;
@@ -620,29 +759,20 @@ EXPORT_SYMBOL_GPL(stack_depot_fetch);
 void stack_depot_put(depot_stack_handle_t handle)
 {
        struct stack_record *stack;
-       unsigned long flags;
 
        if (!handle || stack_depot_disabled)
                return;
 
-       write_lock_irqsave(&pool_rwlock, flags);
-       printk_deferred_enter();
-
        stack = depot_fetch_stack(handle);
-       if (WARN_ON(!stack))
-               goto out;
-
-       if (refcount_dec_and_test(&stack->count)) {
-               /* Unlink stack from the hash table. */
-               list_del(&stack->list);
+       /*
+        * Should always be able to find the stack record, otherwise this is an
+        * unbalanced put attempt (or corrupt handle).
+        */
+       if (WARN(!stack, "corrupt handle or unbalanced stack_depot_put()"))
+               return;
 
-               /* Free stack. */
+       if (refcount_dec_and_test(&stack->count))
                depot_free_stack(stack);
-       }
-
-out:
-       printk_deferred_exit();
-       write_unlock_irqrestore(&pool_rwlock, flags);
 }
 EXPORT_SYMBOL_GPL(stack_depot_put);
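
A hedged sketch of the intended get/put pairing (hypothetical caller; only handles saved with STACK_DEPOT_FLAG_GET may later be put):

    #include <linux/stackdepot.h>
    #include <linux/stacktrace.h>

    static depot_stack_handle_t example_record(void)
    {
            unsigned long entries[16];
            unsigned int nr = stack_trace_save(entries, ARRAY_SIZE(entries), 0);

            /* FLAG_GET takes a reference; the record becomes evictable on put. */
            return stack_depot_save_flags(entries, nr, GFP_NOWAIT,
                                          STACK_DEPOT_FLAG_GET |
                                          STACK_DEPOT_FLAG_CAN_ALLOC);
    }

    static void example_release(depot_stack_handle_t handle)
    {
            if (handle)
                    stack_depot_put(handle);
    }
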
 
@@ -690,3 +820,30 @@ unsigned int stack_depot_get_extra_bits(depot_stack_handle_t handle)
        return parts.extra;
 }
 EXPORT_SYMBOL(stack_depot_get_extra_bits);
+
+static int stats_show(struct seq_file *seq, void *v)
+{
+       /*
+        * data race ok: These are just statistics counters, and approximate
+        * statistics are ok for debugging.
+        */
+       seq_printf(seq, "pools: %d\n", data_race(pools_num));
+       for (int i = 0; i < DEPOT_COUNTER_COUNT; i++)
+               seq_printf(seq, "%s: %ld\n", counter_names[i], data_race(counters[i]));
+
+       return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(stats);
+
+static int depot_debugfs_init(void)
+{
+       struct dentry *dir;
+
+       if (stack_depot_disabled)
+               return 0;
+
+       dir = debugfs_create_dir("stackdepot", NULL);
+       debugfs_create_file("stats", 0444, dir, NULL, &stats_fops);
+       return 0;
+}
+late_initcall(depot_debugfs_init);
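
Reading the new debugfs file at /sys/kernel/debug/stackdepot/stats reports the counters defined above, one per line; the values below are purely illustrative:

    pools: 3
    refcounted_allocations: 1024
    refcounted_frees: 512
    refcounted_in_use: 512
    freelist_size: 128
    persistent_count: 2048
    persistent_bytes: 425984
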
index 29185ac5c727f6f14bf2bd62ed9c16c9acdca085..399380db449cdd7c4b224d615fc5fc8a5528e06c 100644 (file)
@@ -3599,6 +3599,45 @@ static noinline void __init check_state_handling(struct maple_tree *mt)
        mas_unlock(&mas);
 }
 
+static noinline void __init alloc_cyclic_testing(struct maple_tree *mt)
+{
+       unsigned long location;
+       unsigned long next;
+       int ret = 0;
+       MA_STATE(mas, mt, 0, 0);
+
+       next = 0;
+       mtree_lock(mt);
+       for (int i = 0; i < 100; i++) {
+               mas_alloc_cyclic(&mas, &location, mt, 2, ULONG_MAX, &next, GFP_KERNEL);
+               MAS_BUG_ON(&mas, i != location - 2);
+               MAS_BUG_ON(&mas, mas.index != location);
+               MAS_BUG_ON(&mas, mas.last != location);
+               MAS_BUG_ON(&mas, i != next - 3);
+       }
+
+       mtree_unlock(mt);
+       mtree_destroy(mt);
+       next = 0;
+       mt_init_flags(mt, MT_FLAGS_ALLOC_RANGE);
+       for (int i = 0; i < 100; i++) {
+               mtree_alloc_cyclic(mt, &location, mt, 2, ULONG_MAX, &next, GFP_KERNEL);
+               MT_BUG_ON(mt, i != location - 2);
+               MT_BUG_ON(mt, i != next - 3);
+               MT_BUG_ON(mt, mtree_load(mt, location) != mt);
+       }
+
+       mtree_destroy(mt);
+       /* Overflow test */
+       next = ULONG_MAX - 1;
+       ret = mtree_alloc_cyclic(mt, &location, mt, 2, ULONG_MAX, &next, GFP_KERNEL);
+       MT_BUG_ON(mt, ret != 0);
+       ret = mtree_alloc_cyclic(mt, &location, mt, 2, ULONG_MAX, &next, GFP_KERNEL);
+       MT_BUG_ON(mt, ret != 0);
+       ret = mtree_alloc_cyclic(mt, &location, mt, 2, ULONG_MAX, &next, GFP_KERNEL);
+       MT_BUG_ON(mt, ret != 1);
+}
+
 static DEFINE_MTREE(tree);
 static int __init maple_tree_seed(void)
 {
@@ -3880,6 +3919,11 @@ static int __init maple_tree_seed(void)
        check_state_handling(&tree);
        mtree_destroy(&tree);
 
+       mt_init_flags(&tree, MT_FLAGS_ALLOC_RANGE);
+       alloc_cyclic_testing(&tree);
+       mtree_destroy(&tree);
+
+
 #if defined(BENCH)
 skip:
 #endif
index 1e3447bccdb14d126b3c108fd27ab652b5a3a94f..5f2be8c8df11f1ba31a1c2ea78be76651eb98747 100644 (file)
@@ -372,31 +372,6 @@ static int __init default_bdi_init(void)
 }
 subsys_initcall(default_bdi_init);
 
-/*
- * This function is used when the first inode for this wb is marked dirty. It
- * wakes-up the corresponding bdi thread which should then take care of the
- * periodic background write-out of dirty inodes. Since the write-out would
- * starts only 'dirty_writeback_interval' centisecs from now anyway, we just
- * set up a timer which wakes the bdi thread up later.
- *
- * Note, we wouldn't bother setting up the timer, but this function is on the
- * fast-path (used by '__mark_inode_dirty()'), so we save few context switches
- * by delaying the wake-up.
- *
- * We have to be careful not to postpone flush work if it is scheduled for
- * earlier. Thus we use queue_delayed_work().
- */
-void wb_wakeup_delayed(struct bdi_writeback *wb)
-{
-       unsigned long timeout;
-
-       timeout = msecs_to_jiffies(dirty_writeback_interval * 10);
-       spin_lock_irq(&wb->work_lock);
-       if (test_bit(WB_registered, &wb->state))
-               queue_delayed_work(bdi_wq, &wb->dwork, timeout);
-       spin_unlock_irq(&wb->work_lock);
-}
-
 static void wb_update_bandwidth_workfn(struct work_struct *work)
 {
        struct bdi_writeback *wb = container_of(to_delayed_work(work),
@@ -436,7 +411,6 @@ static int wb_init(struct bdi_writeback *wb, struct backing_dev_info *bdi,
        INIT_LIST_HEAD(&wb->work_list);
        INIT_DELAYED_WORK(&wb->dwork, wb_workfn);
        INIT_DELAYED_WORK(&wb->bw_dwork, wb_update_bandwidth_workfn);
-       wb->dirty_sleep = jiffies;
 
        err = fprop_local_init_percpu(&wb->completions, gfp);
        if (err)
@@ -921,6 +895,7 @@ int bdi_init(struct backing_dev_info *bdi)
        INIT_LIST_HEAD(&bdi->bdi_list);
        INIT_LIST_HEAD(&bdi->wb_list);
        init_waitqueue_head(&bdi->wb_waitq);
+       bdi->last_bdp_sleep = jiffies;
 
        return cgwb_bdi_init(bdi);
 }
index 4add68d40e8d99c72bd6af648510aadd587dddfc..b961db601df4194f4cc69535bf154bef4a2624f0 100644 (file)
@@ -2723,16 +2723,11 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
                unsigned int alloc_flags, const struct alloc_context *ac,
                enum compact_priority prio, struct page **capture)
 {
-       int may_perform_io = (__force int)(gfp_mask & __GFP_IO);
        struct zoneref *z;
        struct zone *zone;
        enum compact_result rc = COMPACT_SKIPPED;
 
-       /*
-        * Check if the GFP flags allow compaction - GFP_NOIO is really
-        * tricky context because the migration might require IO
-        */
-       if (!may_perform_io)
+       if (!gfp_compaction_allowed(gfp_mask))
                return COMPACT_SKIPPED;
 
        trace_mm_compaction_try_to_compact_pages(order, gfp_mask, prio);
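
The open-coded __GFP_IO test is replaced by the gfp_compaction_allowed() helper, centralizing the "may compaction run for these GFP flags" policy so that other callers, presumably including reclaim, consult the same predicate instead of duplicating it.
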
index 36f6f1d21ff069de12575a4f0d932e0dfc316c11..5b325749fc12597ddd273ae605bdb1c04a93f99e 100644 (file)
@@ -1026,6 +1026,9 @@ static void damon_do_apply_schemes(struct damon_ctx *c,
        damon_for_each_scheme(s, c) {
                struct damos_quota *quota = &s->quota;
 
+               if (c->passed_sample_intervals != s->next_apply_sis)
+                       continue;
+
                if (!s->wmarks.activated)
                        continue;
 
@@ -1176,10 +1179,6 @@ static void kdamond_apply_schemes(struct damon_ctx *c)
                if (c->passed_sample_intervals != s->next_apply_sis)
                        continue;
 
-               s->next_apply_sis +=
-                       (s->apply_interval_us ? s->apply_interval_us :
-                        c->attrs.aggr_interval) / sample_interval;
-
                if (!s->wmarks.activated)
                        continue;
 
@@ -1195,6 +1194,14 @@ static void kdamond_apply_schemes(struct damon_ctx *c)
                damon_for_each_region_safe(r, next_r, t)
                        damon_do_apply_schemes(c, t, r);
        }
+
+       damon_for_each_scheme(s, c) {
+               if (c->passed_sample_intervals != s->next_apply_sis)
+                       continue;
+               s->next_apply_sis +=
+                       (s->apply_interval_us ? s->apply_interval_us :
+                        c->attrs.aggr_interval) / sample_interval;
+       }
 }
 
 /*
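
The next_apply_sis bump moves out of the per-target loop into a final pass over the schemes, and damon_do_apply_schemes() now re-checks passed_sample_intervals itself; together these keep schemes installed while kdamond is running from being applied before their apply interval has actually elapsed.
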
index f2e5f9431892eb207bec1da87224282e3de27371..3de2916a65c38c372b5ed8472b7a87b34026aed7 100644 (file)
@@ -185,9 +185,21 @@ static struct damos *damon_lru_sort_new_cold_scheme(unsigned int cold_thres)
        return damon_lru_sort_new_scheme(&pattern, DAMOS_LRU_DEPRIO);
 }
 
+static void damon_lru_sort_copy_quota_status(struct damos_quota *dst,
+               struct damos_quota *src)
+{
+       dst->total_charged_sz = src->total_charged_sz;
+       dst->total_charged_ns = src->total_charged_ns;
+       dst->charged_sz = src->charged_sz;
+       dst->charged_from = src->charged_from;
+       dst->charge_target_from = src->charge_target_from;
+       dst->charge_addr_from = src->charge_addr_from;
+}
+
 static int damon_lru_sort_apply_parameters(void)
 {
-       struct damos *scheme;
+       struct damos *scheme, *hot_scheme, *cold_scheme;
+       struct damos *old_hot_scheme = NULL, *old_cold_scheme = NULL;
        unsigned int hot_thres, cold_thres;
        int err = 0;
 
@@ -195,18 +207,35 @@ static int damon_lru_sort_apply_parameters(void)
        if (err)
                return err;
 
+       damon_for_each_scheme(scheme, ctx) {
+               if (!old_hot_scheme) {
+                       old_hot_scheme = scheme;
+                       continue;
+               }
+               old_cold_scheme = scheme;
+       }
+
        hot_thres = damon_max_nr_accesses(&damon_lru_sort_mon_attrs) *
                hot_thres_access_freq / 1000;
-       scheme = damon_lru_sort_new_hot_scheme(hot_thres);
-       if (!scheme)
+       hot_scheme = damon_lru_sort_new_hot_scheme(hot_thres);
+       if (!hot_scheme)
                return -ENOMEM;
-       damon_set_schemes(ctx, &scheme, 1);
+       if (old_hot_scheme)
+               damon_lru_sort_copy_quota_status(&hot_scheme->quota,
+                               &old_hot_scheme->quota);
 
        cold_thres = cold_min_age / damon_lru_sort_mon_attrs.aggr_interval;
-       scheme = damon_lru_sort_new_cold_scheme(cold_thres);
-       if (!scheme)
+       cold_scheme = damon_lru_sort_new_cold_scheme(cold_thres);
+       if (!cold_scheme) {
+               damon_destroy_scheme(hot_scheme);
                return -ENOMEM;
-       damon_add_scheme(ctx, scheme);
+       }
+       if (old_cold_scheme)
+               damon_lru_sort_copy_quota_status(&cold_scheme->quota,
+                               &old_cold_scheme->quota);
+
+       damon_set_schemes(ctx, &hot_scheme, 1);
+       damon_add_scheme(ctx, cold_scheme);
 
        return damon_set_region_biggest_system_ram_default(target,
                                        &monitor_region_start,
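
Both DAMON_LRU_SORT (here) and DAMON_RECLAIM (below) now copy the charged-quota bookkeeping from the outgoing schemes into their replacements when parameters are re-applied; previously each parameter commit reset the charge state, letting the schemes briefly exceed their configured quotas.
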
index ab974e477d2f2850f642fbbafac48a8b3a5d136b..66e190f0374ac84b47100b8ba21fe4c32e104891 100644 (file)
@@ -150,9 +150,20 @@ static struct damos *damon_reclaim_new_scheme(void)
                        &damon_reclaim_wmarks);
 }
 
+static void damon_reclaim_copy_quota_status(struct damos_quota *dst,
+               struct damos_quota *src)
+{
+       dst->total_charged_sz = src->total_charged_sz;
+       dst->total_charged_ns = src->total_charged_ns;
+       dst->charged_sz = src->charged_sz;
+       dst->charged_from = src->charged_from;
+       dst->charge_target_from = src->charge_target_from;
+       dst->charge_addr_from = src->charge_addr_from;
+}
+
 static int damon_reclaim_apply_parameters(void)
 {
-       struct damos *scheme;
+       struct damos *scheme, *old_scheme;
        struct damos_filter *filter;
        int err = 0;
 
@@ -164,6 +175,11 @@ static int damon_reclaim_apply_parameters(void)
        scheme = damon_reclaim_new_scheme();
        if (!scheme)
                return -ENOMEM;
+       if (!list_empty(&ctx->schemes)) {
+               damon_for_each_scheme(old_scheme, ctx)
+                       damon_reclaim_copy_quota_status(&scheme->quota,
+                                       &old_scheme->quota);
+       }
        if (skip_anon) {
                filter = damos_new_filter(DAMOS_FILTER_TYPE_ANON, true);
                if (!filter) {
index 8dbaac6e5c2d05dc4bc7c2b7545be26d508afd9a..ae0f0b314f3a9a5ec251021d0fb68d423fa53cd7 100644 (file)
@@ -1905,6 +1905,10 @@ void damos_sysfs_set_quota_scores(struct damon_sysfs_schemes *sysfs_schemes,
        damon_for_each_scheme(scheme, ctx) {
                struct damon_sysfs_scheme *sysfs_scheme;
 
+               /* user could have removed the scheme sysfs dir */
+               if (i >= sysfs_schemes->nr)
+                       break;
+
                sysfs_scheme = sysfs_schemes->schemes_arr[i];
                damos_sysfs_set_quota_score(sysfs_scheme->quotas->goals,
                                &scheme->quota);
@@ -2194,7 +2198,7 @@ static void damos_tried_regions_init_upd_status(
                sysfs_regions->upd_timeout_jiffies = jiffies +
                        2 * usecs_to_jiffies(scheme->apply_interval_us ?
                                        scheme->apply_interval_us :
-                                       ctx->attrs.sample_interval);
+                                       ctx->attrs.aggr_interval);
        }
 }
 
index 5662e29fe25335cf9e6227ae9fd4f22971095adb..65c19025da3dfee99ba0c1129874283bfcd5c72f 100644 (file)
@@ -362,6 +362,12 @@ static void __init pud_advanced_tests(struct pgtable_debug_args *args)
        vaddr &= HPAGE_PUD_MASK;
 
        pud = pfn_pud(args->pud_pfn, args->page_prot);
+       /*
+        * Some architectures have debug checks to make sure
+        * huge pud mappings are only found with devmap entries.
+        * For now, test only with devmap entries.
+        */
+       pud = pud_mkdevmap(pud);
        set_pud_at(args->mm, vaddr, args->pudp, pud);
        flush_dcache_page(page);
        pudp_set_wrprotect(args->mm, vaddr, args->pudp);
@@ -374,6 +380,7 @@ static void __init pud_advanced_tests(struct pgtable_debug_args *args)
        WARN_ON(!pud_none(pud));
 #endif /* __PAGETABLE_PMD_FOLDED */
        pud = pfn_pud(args->pud_pfn, args->page_prot);
+       pud = pud_mkdevmap(pud);
        pud = pud_wrprotect(pud);
        pud = pud_mkclean(pud);
        set_pud_at(args->mm, vaddr, args->pudp, pud);
@@ -391,6 +398,7 @@ static void __init pud_advanced_tests(struct pgtable_debug_args *args)
 #endif /* __PAGETABLE_PMD_FOLDED */
 
        pud = pfn_pud(args->pud_pfn, args->page_prot);
+       pud = pud_mkdevmap(pud);
        pud = pud_mkyoung(pud);
        set_pud_at(args->mm, vaddr, args->pudp, pud);
        flush_dcache_page(page);
index 750e779c23db74730fa7743c2307d1b996729d62..8df4797c5287fa748aef1c26dd63f92dc24f03fd 100644 (file)
@@ -2608,15 +2608,6 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
                        goto put_folios;
                end_offset = min_t(loff_t, isize, iocb->ki_pos + iter->count);
 
-               /*
-                * Pairs with a barrier in
-                * block_write_end()->mark_buffer_dirty() or other page
-                * dirtying routines like iomap_write_end() to ensure
-                * changes to page contents are visible before we see
-                * increased inode size.
-                */
-               smp_rmb();
-
                /*
                 * Once we start copying data, we don't want to be touching any
                 * cachelines that might be contended:
@@ -4111,28 +4102,40 @@ static void filemap_cachestat(struct address_space *mapping,
 
        rcu_read_lock();
        xas_for_each(&xas, folio, last_index) {
+               int order;
                unsigned long nr_pages;
                pgoff_t folio_first_index, folio_last_index;
 
+               /*
+                * Don't deref the folio. It is not pinned, and might
+                * get freed (and reused) underneath us.
+                *
+                * We *could* pin it, but that would be expensive for
+                * what should be a fast and lightweight syscall.
+                *
+                * Instead, derive all information of interest from
+                * the rcu-protected xarray.
+                */
+
                if (xas_retry(&xas, folio))
                        continue;
 
+               order = xa_get_order(xas.xa, xas.xa_index);
+               nr_pages = 1 << order;
+               folio_first_index = round_down(xas.xa_index, 1 << order);
+               folio_last_index = folio_first_index + nr_pages - 1;
+
+               /* Folios might straddle the range boundaries, only count covered pages */
+               if (folio_first_index < first_index)
+                       nr_pages -= first_index - folio_first_index;
+
+               if (folio_last_index > last_index)
+                       nr_pages -= folio_last_index - last_index;
+
                if (xa_is_value(folio)) {
                        /* page is evicted */
                        void *shadow = (void *)folio;
                        bool workingset; /* not used */
-                       int order = xa_get_order(xas.xa, xas.xa_index);
-
-                       nr_pages = 1 << order;
-                       folio_first_index = round_down(xas.xa_index, 1 << order);
-                       folio_last_index = folio_first_index + nr_pages - 1;
-
-                       /* Folios might straddle the range boundaries, only count covered pages */
-                       if (folio_first_index < first_index)
-                               nr_pages -= first_index - folio_first_index;
-
-                       if (folio_last_index > last_index)
-                               nr_pages -= folio_last_index - last_index;
 
                        cs->nr_evicted += nr_pages;
 
@@ -4150,24 +4153,13 @@ static void filemap_cachestat(struct address_space *mapping,
                        goto resched;
                }
 
-               nr_pages = folio_nr_pages(folio);
-               folio_first_index = folio_pgoff(folio);
-               folio_last_index = folio_first_index + nr_pages - 1;
-
-               /* Folios might straddle the range boundaries, only count covered pages */
-               if (folio_first_index < first_index)
-                       nr_pages -= first_index - folio_first_index;
-
-               if (folio_last_index > last_index)
-                       nr_pages -= folio_last_index - last_index;
-
                /* page is in cache */
                cs->nr_cache += nr_pages;
 
-               if (folio_test_dirty(folio))
+               if (xas_get_mark(&xas, PAGECACHE_TAG_DIRTY))
                        cs->nr_dirty += nr_pages;
 
-               if (folio_test_writeback(folio))
+               if (xas_get_mark(&xas, PAGECACHE_TAG_WRITEBACK))
                        cs->nr_writeback += nr_pages;
 
 resched:
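
The hunk above hoists the extent math so both the evicted and the resident paths derive a folio's page count from the xarray entry's order and clamp it to the queried range, without ever dereferencing the folio. A self-contained sketch of that arithmetic (illustrative values only):

/* Hedged sketch of the clamping arithmetic above: derive a folio's page
 * extent from its order, then trim the parts outside the queried range. */
#include <stdio.h>

static unsigned long round_down_ul(unsigned long x, unsigned long align)
{
	return x & ~(align - 1);
}

static unsigned long covered_pages(unsigned long xa_index, int order,
				   unsigned long first, unsigned long last)
{
	unsigned long nr = 1UL << order;
	unsigned long folio_first = round_down_ul(xa_index, 1UL << order);
	unsigned long folio_last = folio_first + nr - 1;

	if (folio_first < first)
		nr -= first - folio_first;
	if (folio_last > last)
		nr -= folio_last - last;
	return nr;
}

int main(void)
{
	/* Order-2 folio spans pages 4..7; the query covers pages 6..20. */
	printf("covered: %lu\n", covered_pages(5, 2, 6, 20)); /* -> 2 */
	return 0;
}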
index 94ef5c02b459642f2625775bc66ca147cb2ac992..94c958f7ebb50dd925070157c0d0b2432dfc0483 100644 (file)
@@ -37,6 +37,7 @@
 #include <linux/page_owner.h>
 #include <linux/sched/sysctl.h>
 #include <linux/memory-tiers.h>
+#include <linux/compat.h>
 
 #include <asm/tlb.h>
 #include <asm/pgalloc.h>
@@ -809,7 +810,10 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
 {
        loff_t off_end = off + len;
        loff_t off_align = round_up(off, size);
-       unsigned long len_pad, ret;
+       unsigned long len_pad, ret, off_sub;
+
+       if (IS_ENABLED(CONFIG_32BIT) || in_compat_syscall())
+               return 0;
 
        if (off_end <= off_align || (off_end - off_align) < size)
                return 0;
@@ -835,7 +839,13 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
        if (ret == addr)
                return addr;
 
-       ret += (off - ret) & (size - 1);
+       off_sub = (off - ret) & (size - 1);
+
+       if (current->mm->get_unmapped_area == arch_get_unmapped_area_topdown &&
+           !off_sub)
+               return ret + size;
+
+       ret += off_sub;
        return ret;
 }
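
For the __thp_get_unmapped_area() hunk above: the padding math shifts the candidate address so that address and file offset agree modulo the huge-page size, which is what lets the range be mapped with huge pages. A sketch with illustrative numbers (the addresses are made up):

/* Hedged sketch: nudge a candidate address so its offset within a
 * size-aligned window matches the file offset, mirroring the off_sub
 * computation above. Values are illustrative only. */
#include <stdio.h>

int main(void)
{
	unsigned long size = 1UL << 21;          /* 2 MiB PMD size */
	unsigned long off  = (1UL << 21) + 4096; /* file offset: page 1 of a PMD */
	unsigned long ret  = 0x7f0000000000UL;   /* aligned hint from the allocator */
	unsigned long off_sub = (off - ret) & (size - 1);

	/* After the shift, addr and off share low bits modulo the PMD size,
	 * so the mapping can be backed by huge pages. */
	printf("addr = %#lx, addr mod size = %#lx, off mod size = %#lx\n",
	       ret + off_sub, (ret + off_sub) & (size - 1), off & (size - 1));
	return 0;
}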
 
@@ -2437,7 +2447,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
                        page = pmd_page(old_pmd);
                        folio = page_folio(page);
                        if (!folio_test_dirty(folio) && pmd_dirty(old_pmd))
-                               folio_set_dirty(folio);
+                               folio_mark_dirty(folio);
                        if (!folio_test_referenced(folio) && pmd_young(old_pmd))
                                folio_set_referenced(folio);
                        folio_remove_rmap_pmd(folio, page, vma);
@@ -3563,7 +3573,7 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
        }
 
        if (pmd_dirty(pmdval))
-               folio_set_dirty(folio);
+               folio_mark_dirty(folio);
        if (pmd_write(pmdval))
                entry = make_writable_migration_entry(page_to_pfn(page));
        else if (anon_exclusive)
index 610efae912209472d818628778f1ebce5b0e2cc1..6ca63e8dda741b5e4094f7205f0b74a163be2e43 100644 (file)
@@ -65,8 +65,7 @@ void kasan_save_track(struct kasan_track *track, gfp_t flags)
 {
        depot_stack_handle_t stack;
 
-       stack = kasan_save_stack(flags,
-                       STACK_DEPOT_FLAG_CAN_ALLOC | STACK_DEPOT_FLAG_GET);
+       stack = kasan_save_stack(flags, STACK_DEPOT_FLAG_CAN_ALLOC);
        kasan_set_track(track, stack);
 }
 
@@ -266,10 +265,9 @@ bool __kasan_slab_free(struct kmem_cache *cache, void *object,
                return true;
 
        /*
-        * If the object is not put into quarantine, it will likely be quickly
-        * reallocated. Thus, release its metadata now.
+        * Note: Keep per-object metadata to allow KASAN to print stack
+        * traces for use-after-free-before-realloc bugs.
         */
-       kasan_release_object_meta(cache, object);
 
        /* Let slab put the object onto the freelist. */
        return false;
index df6627f62402c01dab04e6955bf80e7fb4b4b2ae..1900f857603456ec20c1f7bb0841362624c11260 100644 (file)
@@ -485,16 +485,6 @@ void kasan_init_object_meta(struct kmem_cache *cache, const void *object)
        if (alloc_meta) {
                /* Zero out alloc meta to mark it as invalid. */
                __memset(alloc_meta, 0, sizeof(*alloc_meta));
-
-               /*
-                * Prepare the lock for saving auxiliary stack traces.
-                * Temporarily disable KASAN bug reporting to allow instrumented
-                * raw_spin_lock_init to access aux_lock, which resides inside
-                * of a redzone.
-                */
-               kasan_disable_current();
-               raw_spin_lock_init(&alloc_meta->aux_lock);
-               kasan_enable_current();
        }
 
        /*
@@ -506,47 +496,23 @@ void kasan_init_object_meta(struct kmem_cache *cache, const void *object)
 
 static void release_alloc_meta(struct kasan_alloc_meta *meta)
 {
-       /* Evict the stack traces from stack depot. */
-       stack_depot_put(meta->alloc_track.stack);
-       stack_depot_put(meta->aux_stack[0]);
-       stack_depot_put(meta->aux_stack[1]);
-
-       /*
-        * Zero out alloc meta to mark it as invalid but keep aux_lock
-        * initialized to avoid having to reinitialize it when another object
-        * is allocated in the same slot.
-        */
-       __memset(&meta->alloc_track, 0, sizeof(meta->alloc_track));
-       __memset(meta->aux_stack, 0, sizeof(meta->aux_stack));
+       /* Zero out alloc meta to mark it as invalid. */
+       __memset(meta, 0, sizeof(*meta));
 }
 
 static void release_free_meta(const void *object, struct kasan_free_meta *meta)
 {
+       if (!kasan_arch_is_ready())
+               return;
+
        /* Check if free meta is valid. */
        if (*(u8 *)kasan_mem_to_shadow(object) != KASAN_SLAB_FREE_META)
                return;
 
-       /* Evict the stack trace from the stack depot. */
-       stack_depot_put(meta->free_track.stack);
-
        /* Mark free meta as invalid. */
        *(u8 *)kasan_mem_to_shadow(object) = KASAN_SLAB_FREE;
 }
 
-void kasan_release_object_meta(struct kmem_cache *cache, const void *object)
-{
-       struct kasan_alloc_meta *alloc_meta;
-       struct kasan_free_meta *free_meta;
-
-       alloc_meta = kasan_get_alloc_meta(cache, object);
-       if (alloc_meta)
-               release_alloc_meta(alloc_meta);
-
-       free_meta = kasan_get_free_meta(cache, object);
-       if (free_meta)
-               release_free_meta(object, free_meta);
-}
-
 size_t kasan_metadata_size(struct kmem_cache *cache, bool in_object)
 {
        struct kasan_cache *info = &cache->kasan_info;
@@ -571,8 +537,6 @@ static void __kasan_record_aux_stack(void *addr, depot_flags_t depot_flags)
        struct kmem_cache *cache;
        struct kasan_alloc_meta *alloc_meta;
        void *object;
-       depot_stack_handle_t new_handle, old_handle;
-       unsigned long flags;
 
        if (is_kfence_address(addr) || !slab)
                return;
@@ -583,33 +547,18 @@ static void __kasan_record_aux_stack(void *addr, depot_flags_t depot_flags)
        if (!alloc_meta)
                return;
 
-       new_handle = kasan_save_stack(0, depot_flags);
-
-       /*
-        * Temporarily disable KASAN bug reporting to allow instrumented
-        * spinlock functions to access aux_lock, which resides inside of a
-        * redzone.
-        */
-       kasan_disable_current();
-       raw_spin_lock_irqsave(&alloc_meta->aux_lock, flags);
-       old_handle = alloc_meta->aux_stack[1];
        alloc_meta->aux_stack[1] = alloc_meta->aux_stack[0];
-       alloc_meta->aux_stack[0] = new_handle;
-       raw_spin_unlock_irqrestore(&alloc_meta->aux_lock, flags);
-       kasan_enable_current();
-
-       stack_depot_put(old_handle);
+       alloc_meta->aux_stack[0] = kasan_save_stack(0, depot_flags);
 }
 
 void kasan_record_aux_stack(void *addr)
 {
-       return __kasan_record_aux_stack(addr,
-                       STACK_DEPOT_FLAG_CAN_ALLOC | STACK_DEPOT_FLAG_GET);
+       return __kasan_record_aux_stack(addr, STACK_DEPOT_FLAG_CAN_ALLOC);
 }
 
 void kasan_record_aux_stack_noalloc(void *addr)
 {
-       return __kasan_record_aux_stack(addr, STACK_DEPOT_FLAG_GET);
+       return __kasan_record_aux_stack(addr, 0);
 }
 
 void kasan_save_alloc_info(struct kmem_cache *cache, void *object, gfp_t flags)
@@ -620,7 +569,7 @@ void kasan_save_alloc_info(struct kmem_cache *cache, void *object, gfp_t flags)
        if (!alloc_meta)
                return;
 
-       /* Evict previous stack traces (might exist for krealloc or mempool). */
+       /* Invalidate previous stack traces (might exist for krealloc or mempool). */
        release_alloc_meta(alloc_meta);
 
        kasan_save_track(&alloc_meta->alloc_track, flags);
@@ -634,7 +583,7 @@ void kasan_save_free_info(struct kmem_cache *cache, void *object)
        if (!free_meta)
                return;
 
-       /* Evict previous stack trace (might exist for mempool). */
+       /* Invalidate previous stack trace (might exist for mempool). */
        release_free_meta(object, free_meta);
 
        kasan_save_track(&free_meta->free_track, 0);
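
With the depot handles no longer reference-counted, recording an aux stack above degenerates to a two-slot history shift that needs no lock; a racing write at worst loses one old handle. A sketch of that shift, with a plain integer standing in for depot_stack_handle_t:

/* Hedged sketch: the lock-free two-slot history used for aux stacks
 * above -- the newest handle lands in slot 0, the previous one is
 * demoted to slot 1, and the oldest is simply overwritten. */
#include <stdio.h>

typedef unsigned int handle_t;

struct alloc_meta {
	handle_t aux_stack[2];
};

static void record_aux(struct alloc_meta *m, handle_t new_handle)
{
	m->aux_stack[1] = m->aux_stack[0];
	m->aux_stack[0] = new_handle;
}

int main(void)
{
	struct alloc_meta m = { { 0, 0 } };

	for (handle_t h = 1; h <= 3; h++)
		record_aux(&m, h);
	/* Keeps the two most recent handles: 3 and 2. */
	printf("aux: [%u, %u]\n", m.aux_stack[0], m.aux_stack[1]);
	return 0;
}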
index d0f172f2b9783f1b1e73ea82ed5d3e6aaf2bec75..fb2b9ac0659a7add8f4ca95b9dcdc38b937cd216 100644 (file)
@@ -6,7 +6,6 @@
 #include <linux/kasan.h>
 #include <linux/kasan-tags.h>
 #include <linux/kfence.h>
-#include <linux/spinlock.h>
 #include <linux/stackdepot.h>
 
 #if defined(CONFIG_KASAN_SW_TAGS) || defined(CONFIG_KASAN_HW_TAGS)
@@ -265,13 +264,6 @@ struct kasan_global {
 struct kasan_alloc_meta {
        struct kasan_track alloc_track;
        /* Free track is stored in kasan_free_meta. */
-       /*
-        * aux_lock protects aux_stack from accesses from concurrent
-        * kasan_record_aux_stack calls. It is a raw spinlock to avoid sleeping
-        * on RT kernels, as kasan_record_aux_stack_noalloc can be called from
-        * non-sleepable contexts.
-        */
-       raw_spinlock_t aux_lock;
        depot_stack_handle_t aux_stack[2];
 };
 
@@ -398,10 +390,8 @@ struct kasan_alloc_meta *kasan_get_alloc_meta(struct kmem_cache *cache,
 struct kasan_free_meta *kasan_get_free_meta(struct kmem_cache *cache,
                                                const void *object);
 void kasan_init_object_meta(struct kmem_cache *cache, const void *object);
-void kasan_release_object_meta(struct kmem_cache *cache, const void *object);
 #else
 static inline void kasan_init_object_meta(struct kmem_cache *cache, const void *object) { }
-static inline void kasan_release_object_meta(struct kmem_cache *cache, const void *object) { }
 #endif
 
 depot_stack_handle_t kasan_save_stack(gfp_t flags, depot_flags_t depot_flags);
index 3ba02efb952aac15b4e511ad985caae0d0935bac..6958aa713c67ee7b0d2c74af676ecb82788d22cd 100644 (file)
@@ -145,7 +145,10 @@ static void qlink_free(struct qlist_node *qlink, struct kmem_cache *cache)
        void *object = qlink_to_object(qlink, cache);
        struct kasan_free_meta *free_meta = kasan_get_free_meta(cache, object);
 
-       kasan_release_object_meta(cache, object);
+       /*
+        * Note: Keep per-object metadata to allow KASAN to print stack
+        * traces for use-after-free-before-realloc bugs.
+        */
 
        /*
         * If init_on_free is enabled and KASAN's free metadata is stored in
index 912155a94ed5871c1805f33ec624c7c7c1ee28c8..cfa5e7288261189cb8242e5a0367fe6ffeebca12 100644 (file)
@@ -429,6 +429,7 @@ restart:
                if (++batch_count == SWAP_CLUSTER_MAX) {
                        batch_count = 0;
                        if (need_resched()) {
+                               arch_leave_lazy_mmu_mode();
                                pte_unmap_unlock(start_pte, ptl);
                                cond_resched();
                                goto restart;
index abd92869874d75f62cd303475ba2b44c67db38ac..d09136e040d3cc37b1b7b74a7cdd5d2c92eb8cb7 100644 (file)
@@ -180,8 +180,9 @@ static inline phys_addr_t memblock_cap_size(phys_addr_t base, phys_addr_t *size)
 /*
  * Address comparison utilities
  */
-static unsigned long __init_memblock memblock_addrs_overlap(phys_addr_t base1, phys_addr_t size1,
-                                      phys_addr_t base2, phys_addr_t size2)
+unsigned long __init_memblock
+memblock_addrs_overlap(phys_addr_t base1, phys_addr_t size1, phys_addr_t base2,
+                      phys_addr_t size2)
 {
        return ((base1 < (base2 + size2)) && (base2 < (base1 + size1)));
 }
@@ -2176,6 +2177,9 @@ static void __init memmap_init_reserved_pages(void)
                        start = region->base;
                        end = start + region->size;
 
+                       if (nid == NUMA_NO_NODE || nid >= MAX_NUMNODES)
+                               nid = early_pfn_to_nid(PFN_DOWN(start));
+
                        reserve_bootmem_region(start, end, nid);
                }
        }
@@ -2246,6 +2250,7 @@ static const char * const flagname[] = {
        [ilog2(MEMBLOCK_MIRROR)] = "MIRROR",
        [ilog2(MEMBLOCK_NOMAP)] = "NOMAP",
        [ilog2(MEMBLOCK_DRIVER_MANAGED)] = "DRV_MNG",
+       [ilog2(MEMBLOCK_RSRV_NOINIT)] = "RSV_NIT",
 };
 
 static int memblock_debug_show(struct seq_file *m, void *private)
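
memblock_addrs_overlap(), made non-static above, is the standard interval-intersection test: two ranges overlap iff each begins before the other ends. A tiny sketch:

/* Hedged sketch of the overlap predicate exported above: two half-open
 * ranges intersect iff each starts before the other ends. */
#include <stdio.h>

static int addrs_overlap(unsigned long base1, unsigned long size1,
			 unsigned long base2, unsigned long size2)
{
	return (base1 < (base2 + size2)) && (base2 < (base1 + size1));
}

int main(void)
{
	printf("%d\n", addrs_overlap(0x1000, 0x1000, 0x1800, 0x1000)); /* 1 */
	printf("%d\n", addrs_overlap(0x1000, 0x1000, 0x2000, 0x1000)); /* 0: adjacent */
	return 0;
}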
index e4c8735e7c85cf061a2ab31c9be250934c680879..61932c9215e7734e4dfc7dc6e427c3692d1c3c6f 100644 (file)
@@ -621,6 +621,15 @@ static inline int memcg_events_index(enum vm_event_item idx)
 }
 
 struct memcg_vmstats_percpu {
+       /* Stats updates since the last flush */
+       unsigned int                    stats_updates;
+
+       /* Cached pointers for fast iteration in memcg_rstat_updated() */
+       struct memcg_vmstats_percpu     *parent;
+       struct memcg_vmstats            *vmstats;
+
+       /* The above should fit a single cacheline for memcg_rstat_updated() */
+
        /* Local (CPU and cgroup) page state & events */
        long                    state[MEMCG_NR_STAT];
        unsigned long           events[NR_MEMCG_EVENTS];
@@ -632,10 +641,7 @@ struct memcg_vmstats_percpu {
        /* Cgroup1: threshold notifications & softlimit tree updates */
        unsigned long           nr_page_events;
        unsigned long           targets[MEM_CGROUP_NTARGETS];
-
-       /* Stats updates since the last flush */
-       unsigned int            stats_updates;
-};
+} ____cacheline_aligned;
 
 struct memcg_vmstats {
        /* Aggregated (CPU and subtree) page state & events */
@@ -698,36 +704,35 @@ static void memcg_stats_unlock(void)
 }
 
 
-static bool memcg_should_flush_stats(struct mem_cgroup *memcg)
+static bool memcg_vmstats_needs_flush(struct memcg_vmstats *vmstats)
 {
-       return atomic64_read(&memcg->vmstats->stats_updates) >
+       return atomic64_read(&vmstats->stats_updates) >
                MEMCG_CHARGE_BATCH * num_online_cpus();
 }
 
 static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
 {
+       struct memcg_vmstats_percpu *statc;
        int cpu = smp_processor_id();
-       unsigned int x;
 
        if (!val)
                return;
 
        cgroup_rstat_updated(memcg->css.cgroup, cpu);
-
-       for (; memcg; memcg = parent_mem_cgroup(memcg)) {
-               x = __this_cpu_add_return(memcg->vmstats_percpu->stats_updates,
-                                         abs(val));
-
-               if (x < MEMCG_CHARGE_BATCH)
+       statc = this_cpu_ptr(memcg->vmstats_percpu);
+       for (; statc; statc = statc->parent) {
+               statc->stats_updates += abs(val);
+               if (statc->stats_updates < MEMCG_CHARGE_BATCH)
                        continue;
 
                /*
                 * If @memcg is already flush-able, increasing stats_updates is
                 * redundant. Avoid the overhead of the atomic update.
                 */
-               if (!memcg_should_flush_stats(memcg))
-                       atomic64_add(x, &memcg->vmstats->stats_updates);
-               __this_cpu_write(memcg->vmstats_percpu->stats_updates, 0);
+               if (!memcg_vmstats_needs_flush(statc->vmstats))
+                       atomic64_add(statc->stats_updates,
+                                    &statc->vmstats->stats_updates);
+               statc->stats_updates = 0;
        }
 }
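
The rewritten loop above walks per-CPU records through parent pointers cached at allocation time instead of calling parent_mem_cgroup() and this_cpu_add_return() per level. A userspace sketch of that shape, with simplified stand-in types and an ordinary long in place of the atomic:

/* Hedged sketch: accumulate an update up a pre-linked chain of per-CPU
 * records, flushing a batch into the shared counter once a threshold is
 * crossed -- the shape of the memcg_rstat_updated() loop above. */
#include <stdio.h>

#define BATCH 64

struct statc {
	struct statc *parent;      /* cached ancestor, set once at alloc */
	unsigned int stats_updates;
	long *shared;              /* stand-in for vmstats->stats_updates */
};

static void rstat_updated(struct statc *statc, int val)
{
	for (; statc; statc = statc->parent) {
		statc->stats_updates += (val < 0) ? -val : val;
		if (statc->stats_updates < BATCH)
			continue;
		*statc->shared += statc->stats_updates;
		statc->stats_updates = 0;
	}
}

int main(void)
{
	long root_total = 0, child_total = 0;
	struct statc root = { NULL, 0, &root_total };
	struct statc child = { &root, 0, &child_total };

	for (int i = 0; i < 100; i++)
		rstat_updated(&child, 1);
	printf("child=%ld root=%ld\n", child_total, root_total);
	return 0;
}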
 
@@ -756,7 +761,7 @@ void mem_cgroup_flush_stats(struct mem_cgroup *memcg)
        if (!memcg)
                memcg = root_mem_cgroup;
 
-       if (memcg_should_flush_stats(memcg))
+       if (memcg_vmstats_needs_flush(memcg->vmstats))
                do_flush_stats(memcg);
 }
 
@@ -770,7 +775,7 @@ void mem_cgroup_flush_stats_ratelimited(struct mem_cgroup *memcg)
 static void flush_memcg_stats_dwork(struct work_struct *w)
 {
        /*
-        * Deliberately ignore memcg_should_flush_stats() here so that flushing
+        * Deliberately ignore memcg_vmstats_needs_flush() here so that flushing
         * in latency-sensitive paths is as cheap as possible.
         */
        do_flush_stats(root_mem_cgroup);
@@ -2623,8 +2628,9 @@ static unsigned long calculate_high_delay(struct mem_cgroup *memcg,
 }
 
 /*
- * Scheduled by try_charge() to be executed from the userland return path
- * and reclaims memory over the high limit.
+ * Reclaims memory over the high limit. Called directly from
+ * try_charge() (context permitting), as well as from the userland
+ * return path where reclaim is always able to block.
  */
 void mem_cgroup_handle_over_high(gfp_t gfp_mask)
 {
@@ -2643,6 +2649,17 @@ void mem_cgroup_handle_over_high(gfp_t gfp_mask)
        current->memcg_nr_pages_over_high = 0;
 
 retry_reclaim:
+       /*
+        * Bail if the task is already exiting. Unlike memory.max,
+        * memory.high enforcement isn't as strict, and there is no
+        * OOM killer involved, which means the excess could already
+        * be much bigger (and still growing) than it could for
+        * memory.max; the dying task could get stuck in fruitless
+        * reclaim for a long time, which isn't desirable.
+        */
+       if (task_is_dying())
+               goto out;
+
        /*
         * The allocating task should reclaim at least the batch size, but for
         * subsequent retries we only want to do what's necessary to prevent oom
@@ -2693,6 +2710,9 @@ retry_reclaim:
        }
 
        /*
+        * Reclaim didn't manage to push usage below the limit, slow
+        * this allocating task down.
+        *
         * If we exit early, we're guaranteed to die (since
         * schedule_timeout_killable sets TASK_KILLABLE). This means we don't
         * need to account for any ill-begotten jiffies to pay them off later.
@@ -2887,11 +2907,17 @@ done_restock:
                }
        } while ((memcg = parent_mem_cgroup(memcg)));
 
+       /*
+        * Reclaim is set up above to be called from the userland
+        * return path. But also attempt synchronous reclaim to avoid
+        * excessive overrun while the task is still inside the
+        * kernel. If this is successful, the return path will see it
+        * when it rechecks the overage and simply bail out.
+        */
        if (current->memcg_nr_pages_over_high > MEMCG_CHARGE_BATCH &&
            !(current->flags & PF_MEMALLOC) &&
-           gfpflags_allow_blocking(gfp_mask)) {
+           gfpflags_allow_blocking(gfp_mask))
                mem_cgroup_handle_over_high(gfp_mask);
-       }
        return 0;
 }
 
@@ -5456,10 +5482,11 @@ static void mem_cgroup_free(struct mem_cgroup *memcg)
        __mem_cgroup_free(memcg);
 }
 
-static struct mem_cgroup *mem_cgroup_alloc(void)
+static struct mem_cgroup *mem_cgroup_alloc(struct mem_cgroup *parent)
 {
+       struct memcg_vmstats_percpu *statc, *pstatc;
        struct mem_cgroup *memcg;
-       int node;
+       int node, cpu;
        int __maybe_unused i;
        long error = -ENOMEM;
 
@@ -5483,6 +5510,14 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
        if (!memcg->vmstats_percpu)
                goto fail;
 
+       for_each_possible_cpu(cpu) {
+               if (parent)
+                       pstatc = per_cpu_ptr(parent->vmstats_percpu, cpu);
+               statc = per_cpu_ptr(memcg->vmstats_percpu, cpu);
+               statc->parent = parent ? pstatc : NULL;
+               statc->vmstats = memcg->vmstats;
+       }
+
        for_each_node(node)
                if (alloc_mem_cgroup_per_node_info(memcg, node))
                        goto fail;
@@ -5528,7 +5563,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
        struct mem_cgroup *memcg, *old_memcg;
 
        old_memcg = set_active_memcg(parent);
-       memcg = mem_cgroup_alloc();
+       memcg = mem_cgroup_alloc(parent);
        set_active_memcg(old_memcg);
        if (IS_ERR(memcg))
                return ERR_CAST(memcg);
@@ -7936,9 +7971,13 @@ bool mem_cgroup_swap_full(struct folio *folio)
 
 static int __init setup_swap_account(char *s)
 {
-       pr_warn_once("The swapaccount= commandline option is deprecated. "
-                    "Please report your usecase to linux-mm@kvack.org if you "
-                    "depend on this functionality.\n");
+       bool res;
+
+       if (!kstrtobool(s, &res) && !res)
+               pr_warn_once("The swapaccount=0 commandline option is deprecated "
+                            "in favor of configuring swap control via cgroupfs. "
+                            "Please report your usecase to linux-mm@kvack.org if you "
+                            "depend on this functionality.\n");
        return 1;
 }
 __setup("swapaccount=", setup_swap_account);
index 4f9b61f4a6682a530a202d02a6998c0b687906dd..9349948f1abfd120977706bbda23456999f057bc 100644 (file)
@@ -982,7 +982,7 @@ static bool has_extra_refcount(struct page_state *ps, struct page *p,
        int count = page_count(p) - 1;
 
        if (extra_pins)
-               count -= 1;
+               count -= folio_nr_pages(page_folio(p));
 
        if (count > 0) {
                pr_err("%#lx: %s still referenced by %d users\n",
@@ -1377,6 +1377,9 @@ void ClearPageHWPoisonTakenOff(struct page *page)
  */
 static inline bool HWPoisonHandlable(struct page *page, unsigned long flags)
 {
+       if (PageSlab(page))
+               return false;
+
        /* Soft offline could migrate non-LRU movable pages */
        if ((flags & MF_SOFT_OFFLINE) && __PageMovable(page))
                return true;
index 7e1f4849463aa3645a0eead97f40a90caf5e6d5f..0bfc8b007c01a3323a15a17d51c4da46a6207540 100644 (file)
@@ -1464,7 +1464,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
                        delay_rmap = 0;
                        if (!folio_test_anon(folio)) {
                                if (pte_dirty(ptent)) {
-                                       folio_set_dirty(folio);
+                                       folio_mark_dirty(folio);
                                        if (tlb_delay_rmap(tlb)) {
                                                delay_rmap = 1;
                                                force_flush = 1;
@@ -3799,6 +3799,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
        struct page *page;
        struct swap_info_struct *si = NULL;
        rmap_t rmap_flags = RMAP_NONE;
+       bool need_clear_cache = false;
        bool exclusive = false;
        swp_entry_t entry;
        pte_t pte;
@@ -3867,6 +3868,20 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
        if (!folio) {
                if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
                    __swap_count(entry) == 1) {
+                       /*
+                        * Prevent parallel swapin from proceeding with
+                        * the cache flag. Otherwise, another thread may
+                        * finish swapin first, free the entry, and swapout
+                        * reusing the same entry. It's undetectable as
+                        * pte_same() returns true due to entry reuse.
+                        */
+                       if (swapcache_prepare(entry)) {
+                               /* Relax a bit to prevent rapid repeated page faults */
+                               schedule_timeout_uninterruptible(1);
+                               goto out;
+                       }
+                       need_clear_cache = true;
+
                        /* skip swapcache */
                        folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0,
                                                vma, vmf->address, false);
@@ -4117,6 +4132,9 @@ unlock:
        if (vmf->pte)
                pte_unmap_unlock(vmf->pte, vmf->ptl);
 out:
+       /* Clear the swap cache pin for direct swapin after PTL unlock */
+       if (need_clear_cache)
+               swapcache_clear(si, entry);
        if (si)
                put_swap_device(si);
        return ret;
@@ -4131,6 +4149,8 @@ out_release:
                folio_unlock(swapcache);
                folio_put(swapcache);
        }
+       if (need_clear_cache)
+               swapcache_clear(si, entry);
        if (si)
                put_swap_device(si);
        return ret;
@@ -5478,7 +5498,7 @@ static inline bool get_mmap_lock_carefully(struct mm_struct *mm, struct pt_regs
                return true;
 
        if (regs && !user_mode(regs)) {
-               unsigned long ip = instruction_pointer(regs);
+               unsigned long ip = exception_ip(regs);
                if (!search_exception_tables(ip))
                        return false;
        }
@@ -5503,7 +5523,7 @@ static inline bool upgrade_mmap_lock_carefully(struct mm_struct *mm, struct pt_r
 {
        mmap_read_unlock(mm);
        if (regs && !user_mode(regs)) {
-               unsigned long ip = instruction_pointer(regs);
+               unsigned long ip = exception_ip(regs);
                if (!search_exception_tables(ip))
                        return false;
        }
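
The swapcache_prepare()/swapcache_clear() pair above effectively gives the skip-swapcache path an exclusive claim on the swap entry, so two faulting threads cannot both swap the same entry in. A sketch of the claim/release shape using a compare-and-swap flag (the kernel uses the SWAP_HAS_CACHE count, not an atomic bool):

/* Hedged sketch of the claim/release pattern above: the first thread to
 * set the flag does the work, late arrivals back off and retry, and the
 * flag is dropped only after the work is fully done. */
#include <stdbool.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_bool entry_claimed;

static bool claim_entry(void)          /* ~ swapcache_prepare() */
{
	bool expected = false;
	return atomic_compare_exchange_strong(&entry_claimed, &expected, true);
}

static void release_entry(void)        /* ~ swapcache_clear() */
{
	atomic_store(&entry_claimed, false);
}

int main(void)
{
	if (!claim_entry()) {
		puts("busy: another swapin in flight, retry later");
		return 0;
	}
	puts("claimed: performing the skip-swapcache swapin");
	/* ... swap the page in, map it, drop the page-table lock ... */
	release_entry();
	return 0;
}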
index cc9f2bcd73b492aebacab4b812a515cf7e70b92b..c27b1f8097d4a72e569ce5a06be42b93184e9db0 100644 (file)
@@ -2519,6 +2519,14 @@ static int numamigrate_isolate_folio(pg_data_t *pgdat, struct folio *folio)
                        if (managed_zone(pgdat->node_zones + z))
                                break;
                }
+
+               /*
+                * If there are no managed zones, do not proceed
+                * further.
+                */
+               if (z < 0)
+                       return 0;
+
                wakeup_kswapd(pgdat->node_zones + z, 0,
                              folio_order(folio), ZONE_MOVABLE);
                return 0;
index b78e83d351d2864a6a339059ac734b6602eb5824..3281287771c9c6100ebefde692bca06e247ae0f8 100644 (file)
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -954,13 +954,21 @@ static struct vm_area_struct
        } else if (merge_prev) {                        /* case 2 */
                if (curr) {
                        vma_start_write(curr);
-                       err = dup_anon_vma(prev, curr, &anon_dup);
                        if (end == curr->vm_end) {      /* case 7 */
+                               /*
+                                * can_vma_merge_after() assumed we would not be
+                                * removing prev vma, so it skipped the check
+                                * for vm_ops->close, but we are removing curr
+                                */
+                               if (curr->vm_ops && curr->vm_ops->close)
+                                       err = -EINVAL;
                                remove = curr;
                        } else {                        /* case 5 */
                                adjust = curr;
                                adj_start = (end - curr->vm_start);
                        }
+                       if (!err)
+                               err = dup_anon_vma(prev, curr, &anon_dup);
                }
        } else { /* merge_next */
                vma_start_write(next);
@@ -1825,15 +1833,17 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
                /*
                 * mmap_region() will call shmem_zero_setup() to create a file,
                 * so use shmem's get_unmapped_area in case it can be huge.
-                * do_mmap() will clear pgoff, so match alignment.
                 */
-               pgoff = 0;
                get_area = shmem_get_unmapped_area;
        } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
                /* Ensures that larger anonymous mappings are THP aligned. */
                get_area = thp_get_unmapped_area;
        }
 
+       /* Always treat pgoff as zero for anonymous memory. */
+       if (!file)
+               pgoff = 0;
+
        addr = get_area(file, addr, len, pgoff, flags);
        if (IS_ERR_VALUE(addr))
                return addr;
index cd4e4ae77c40ae0497efeaa8fb391f6550e51a4b..3f255534986a2fda07e2d35187bb385f64749c5c 100644 (file)
@@ -1638,7 +1638,7 @@ static inline void wb_dirty_limits(struct dirty_throttle_control *dtc)
         */
        dtc->wb_thresh = __wb_calc_thresh(dtc);
        dtc->wb_bg_thresh = dtc->thresh ?
-               div_u64((u64)dtc->wb_thresh * dtc->bg_thresh, dtc->thresh) : 0;
+               div64_u64(dtc->wb_thresh * dtc->bg_thresh, dtc->thresh) : 0;
 
        /*
         * In order to avoid the stacked BDI deadlock we need
@@ -1921,7 +1921,7 @@ pause:
                        break;
                }
                __set_current_state(TASK_KILLABLE);
-               wb->dirty_sleep = now;
+               bdi->last_bdp_sleep = jiffies;
                io_schedule_timeout(pause);
 
                current->dirty_paused_when = now + pause;
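
On the div_u64() -> div64_u64() change above: the kernel's div_u64() takes a u32 divisor, so a dirty threshold wider than 32 bits would be silently truncated; div64_u64() divides by the full 64-bit value. Userspace stand-ins for the two helpers make the difference visible:

/* Hedged sketch: div_u64() in the kernel divides by a u32, so a divisor
 * wider than 32 bits is silently truncated; div64_u64() keeps all 64
 * bits. The helpers below mimic those signatures in userspace. */
#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>

static uint64_t div_u64(uint64_t dividend, uint32_t divisor)
{
	return dividend / divisor;
}

static uint64_t div64_u64(uint64_t dividend, uint64_t divisor)
{
	return dividend / divisor;
}

int main(void)
{
	uint64_t dividend = 1ULL << 40;
	uint64_t divisor  = (1ULL << 32) + 1; /* does not fit in u32 */

	printf("truncated: %" PRIu64 "\n", div_u64(dividend, (uint32_t)divisor));
	printf("full:      %" PRIu64 "\n", div64_u64(dividend, divisor));
	return 0;
}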
index 150d4f23b01048ed7af53a74ec3e12a208fc17b5..a663202045dc437a4ff6186afb48e728fa30839a 100644 (file)
@@ -4041,6 +4041,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
                                                struct alloc_context *ac)
 {
        bool can_direct_reclaim = gfp_mask & __GFP_DIRECT_RECLAIM;
+       bool can_compact = gfp_compaction_allowed(gfp_mask);
        const bool costly_order = order > PAGE_ALLOC_COSTLY_ORDER;
        struct page *page = NULL;
        unsigned int alloc_flags;
@@ -4111,7 +4112,7 @@ restart:
         * Don't try this for allocations that are allowed to ignore
         * watermarks, as the ALLOC_NO_WATERMARKS attempt didn't yet happen.
         */
-       if (can_direct_reclaim &&
+       if (can_direct_reclaim && can_compact &&
                        (costly_order ||
                           (order > 0 && ac->migratetype != MIGRATE_MOVABLE))
                        && !gfp_pfmemalloc_allowed(gfp_mask)) {
@@ -4209,9 +4210,10 @@ retry:
 
        /*
         * Do not retry costly high order allocations unless they are
-        * __GFP_RETRY_MAYFAIL
+        * __GFP_RETRY_MAYFAIL and we can compact
         */
-       if (costly_order && !(gfp_mask & __GFP_RETRY_MAYFAIL))
+       if (costly_order && (!can_compact ||
+                            !(gfp_mask & __GFP_RETRY_MAYFAIL)))
                goto nopage;
 
        if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
@@ -4224,7 +4226,7 @@ retry:
         * implementation of the compaction depends on the sufficient amount
         * of free memory (see __compaction_suitable)
         */
-       if (did_some_progress > 0 &&
+       if (did_some_progress > 0 && can_compact &&
                        should_compact_retry(ac, order, alloc_flags,
                                compact_result, &compact_priority,
                                &compaction_retries))
index 23620c57c1225bef9e3e1193a7163c36a916951f..2648ec4f04947b2e837377da68d7b8ae1fd48f7a 100644 (file)
@@ -469,7 +469,7 @@ static inline int ra_alloc_folio(struct readahead_control *ractl, pgoff_t index,
 
        if (!folio)
                return -ENOMEM;
-       mark = round_up(mark, 1UL << order);
+       mark = round_down(mark, 1UL << order);
        if (index == mark)
                folio_set_readahead(folio);
        err = filemap_add_folio(ractl->mapping, folio, index, gfp);
@@ -575,7 +575,7 @@ static void ondemand_readahead(struct readahead_control *ractl,
         * It's the expected callback index, assume sequential access.
         * Ramp up sizes, and push forward the readahead window.
         */
-       expected = round_up(ra->start + ra->size - ra->async_size,
+       expected = round_down(ra->start + ra->size - ra->async_size,
                        1UL << order);
        if (index == expected || index == (ra->start + ra->size)) {
                ra->start += ra->size;
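
On the round_up() -> round_down() changes above: folios are allocated at indices aligned to 1 << order, and the readahead mark only takes effect when it equals such a start index. Rounding the mark down selects the folio that actually contains the intended page; rounding up selects the next folio, one folio too late. A numeric sketch assuming order-2 folios:

/* Hedged sketch: why the readahead mark is rounded *down*. With order-2
 * folios (4 pages), a mark at page 6 lies inside the folio starting at
 * page 4; rounding up to 8 would point at the following folio instead. */
#include <stdio.h>

static unsigned long rdown(unsigned long x, unsigned long a) { return x & ~(a - 1); }
static unsigned long rup(unsigned long x, unsigned long a)   { return (x + a - 1) & ~(a - 1); }

int main(void)
{
	unsigned long nr = 1UL << 2;   /* order-2 folio: 4 pages */
	unsigned long mark = 6;        /* page that should trigger async readahead */

	printf("round_down -> folio @%lu (contains page %lu)\n", rdown(mark, nr), mark);
	printf("round_up   -> folio @%lu (one folio too late)\n", rup(mark, nr));
	return 0;
}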
index d7c84ff621860b85090cf61d9b2970357da01b76..29d5a024df48ae85ca2c39793477bf359d0c3980 100644 (file)
@@ -3374,7 +3374,7 @@ static int shmem_unlink(struct inode *dir, struct dentry *dentry)
 
 static int shmem_rmdir(struct inode *dir, struct dentry *dentry)
 {
-       if (!simple_empty(dentry))
+       if (!simple_offset_empty(dentry))
                return -ENOTEMPTY;
 
        drop_nlink(d_inode(dentry));
@@ -3431,7 +3431,7 @@ static int shmem_rename2(struct mnt_idmap *idmap,
                return simple_offset_rename_exchange(old_dir, old_dentry,
                                                     new_dir, new_dentry);
 
-       if (!simple_empty(new_dentry))
+       if (!simple_offset_empty(new_dentry))
                return -ENOTEMPTY;
 
        if (flags & RENAME_WHITEOUT) {
@@ -4355,7 +4355,9 @@ static int shmem_fill_super(struct super_block *sb, struct fs_context *fc)
 #ifdef CONFIG_TMPFS_POSIX_ACL
        sb->s_flags |= SB_POSIXACL;
 #endif
-       uuid_gen(&sb->s_uuid);
+       uuid_t uuid;
+       uuid_gen(&uuid);
+       super_set_uuid(sb, uuid.b, sizeof(uuid));
 
 #ifdef CONFIG_TMPFS_QUOTA
        if (ctx->seen & SHMEM_SEEN_QUOTA) {
index 758c46ca671ed110ae8e25fad48196d3feed03dc..fc2f6ade7f80b399707bcc67c44f813aea0b846d 100644 (file)
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -41,6 +41,7 @@ void __delete_from_swap_cache(struct folio *folio,
 void delete_from_swap_cache(struct folio *folio);
 void clear_shadow_from_swap_cache(int type, unsigned long begin,
                                  unsigned long end);
+void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry);
 struct folio *swap_cache_get_folio(swp_entry_t entry,
                struct vm_area_struct *vma, unsigned long addr);
 struct folio *filemap_get_incore_folio(struct address_space *mapping,
@@ -97,6 +98,10 @@ static inline int swap_writepage(struct page *p, struct writeback_control *wbc)
        return 0;
 }
 
+static inline void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry)
+{
+}
+
 static inline struct folio *swap_cache_get_folio(swp_entry_t entry,
                struct vm_area_struct *vma, unsigned long addr)
 {
index e671266ad77241f461a17cbb2e486fe48a423f69..7255c01a1e4e16d758186019f904e70a7890a5cc 100644 (file)
@@ -680,9 +680,10 @@ skip:
        /* The page was likely read above, so no need for plugging here */
        folio = __read_swap_cache_async(entry, gfp_mask, mpol, ilx,
                                        &page_allocated, false);
-       if (unlikely(page_allocated))
+       if (unlikely(page_allocated)) {
+               zswap_folio_swapin(folio);
                swap_read_folio(folio, false, NULL);
-       zswap_folio_swapin(folio);
+       }
        return folio;
 }
 
@@ -855,9 +856,10 @@ skip:
        /* The folio was likely read above, so no need for plugging here */
        folio = __read_swap_cache_async(targ_entry, gfp_mask, mpol, targ_ilx,
                                        &page_allocated, false);
-       if (unlikely(page_allocated))
+       if (unlikely(page_allocated)) {
+               zswap_folio_swapin(folio);
                swap_read_folio(folio, false, NULL);
-       zswap_folio_swapin(folio);
+       }
        return folio;
 }
 
index 556ff7347d5f04402b61cc5bd9d0d123a36dc1d5..573843d9cc91ca8061ab45fbabbd3fe319bedc78 100644 (file)
@@ -2532,10 +2532,10 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
        exit_swap_address_space(p->type);
 
        inode = mapping->host;
-       if (p->bdev_handle) {
+       if (p->bdev_file) {
                set_blocksize(p->bdev, old_block_size);
-               bdev_release(p->bdev_handle);
-               p->bdev_handle = NULL;
+               fput(p->bdev_file);
+               p->bdev_file = NULL;
        }
 
        inode_lock(inode);
@@ -2765,14 +2765,14 @@ static int claim_swapfile(struct swap_info_struct *p, struct inode *inode)
        int error;
 
        if (S_ISBLK(inode->i_mode)) {
-               p->bdev_handle = bdev_open_by_dev(inode->i_rdev,
+               p->bdev_file = bdev_file_open_by_dev(inode->i_rdev,
                                BLK_OPEN_READ | BLK_OPEN_WRITE, p, NULL);
-               if (IS_ERR(p->bdev_handle)) {
-                       error = PTR_ERR(p->bdev_handle);
-                       p->bdev_handle = NULL;
+               if (IS_ERR(p->bdev_file)) {
+                       error = PTR_ERR(p->bdev_file);
+                       p->bdev_file = NULL;
                        return error;
                }
-               p->bdev = p->bdev_handle->bdev;
+               p->bdev = file_bdev(p->bdev_file);
                p->old_block_size = block_size(p->bdev);
                error = set_blocksize(p->bdev, PAGE_SIZE);
                if (error < 0)
@@ -3208,10 +3208,10 @@ bad_swap:
        p->percpu_cluster = NULL;
        free_percpu(p->cluster_next_cpu);
        p->cluster_next_cpu = NULL;
-       if (p->bdev_handle) {
+       if (p->bdev_file) {
                set_blocksize(p->bdev, p->old_block_size);
-               bdev_release(p->bdev_handle);
-               p->bdev_handle = NULL;
+               fput(p->bdev_file);
+               p->bdev_file = NULL;
        }
        inode = NULL;
        destroy_swap_extents(p);
@@ -3365,6 +3365,19 @@ int swapcache_prepare(swp_entry_t entry)
        return __swap_duplicate(entry, SWAP_HAS_CACHE);
 }
 
+void swapcache_clear(struct swap_info_struct *si, swp_entry_t entry)
+{
+       struct swap_cluster_info *ci;
+       unsigned long offset = swp_offset(entry);
+       unsigned char usage;
+
+       ci = lock_cluster_or_swap_info(si, offset);
+       usage = __swap_entry_free_locked(si, offset, SWAP_HAS_CACHE);
+       unlock_cluster_or_swap_info(si, ci);
+       if (!usage)
+               free_swap_slot(entry);
+}
+
 struct swap_info_struct *swp_swap_info(swp_entry_t entry)
 {
        return swap_type_to_swap_info(swp_type(entry));
index 20e3b0d9cf7ed0d59d86a11b2472f0e138160692..313f1c42768a621d59385a0673e0cdf85d5c1720 100644 (file)
@@ -357,6 +357,7 @@ static __always_inline ssize_t mfill_atomic_hugetlb(
                                              unsigned long dst_start,
                                              unsigned long src_start,
                                              unsigned long len,
+                                             atomic_t *mmap_changing,
                                              uffd_flags_t flags)
 {
        struct mm_struct *dst_mm = dst_vma->vm_mm;
@@ -472,6 +473,15 @@ retry:
                                goto out;
                        }
                        mmap_read_lock(dst_mm);
+                       /*
+                        * If memory mappings are changing because of a
+                        * non-cooperative operation (e.g. mremap) running in
+                        * parallel, bail out and ask the user to retry later.
+                        */
+                       if (mmap_changing && atomic_read(mmap_changing)) {
+                               err = -EAGAIN;
+                               break;
+                       }
 
                        dst_vma = NULL;
                        goto retry;
@@ -506,6 +516,7 @@ extern ssize_t mfill_atomic_hugetlb(struct vm_area_struct *dst_vma,
                                    unsigned long dst_start,
                                    unsigned long src_start,
                                    unsigned long len,
+                                   atomic_t *mmap_changing,
                                    uffd_flags_t flags);
 #endif /* CONFIG_HUGETLB_PAGE */
 
@@ -622,8 +633,8 @@ retry:
         * If this is a HUGETLB vma, pass off to appropriate routine
         */
        if (is_vm_hugetlb_page(dst_vma))
-               return  mfill_atomic_hugetlb(dst_vma, dst_start,
-                                            src_start, len, flags);
+               return  mfill_atomic_hugetlb(dst_vma, dst_start, src_start,
+                                            len, mmap_changing, flags);
 
        if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma))
                goto out_unlock;
@@ -891,8 +902,8 @@ static int move_present_pte(struct mm_struct *mm,
 
        double_pt_lock(dst_ptl, src_ptl);
 
-       if (!pte_same(*src_pte, orig_src_pte) ||
-           !pte_same(*dst_pte, orig_dst_pte)) {
+       if (!pte_same(ptep_get(src_pte), orig_src_pte) ||
+           !pte_same(ptep_get(dst_pte), orig_dst_pte)) {
                err = -EAGAIN;
                goto out;
        }
@@ -903,9 +914,6 @@ static int move_present_pte(struct mm_struct *mm,
                goto out;
        }
 
-       folio_move_anon_rmap(src_folio, dst_vma);
-       WRITE_ONCE(src_folio->index, linear_page_index(dst_vma, dst_addr));
-
        orig_src_pte = ptep_clear_flush(src_vma, src_addr, src_pte);
        /* Folio got pinned from under us. Put it back and fail the move. */
        if (folio_maybe_dma_pinned(src_folio)) {
@@ -914,6 +922,9 @@ static int move_present_pte(struct mm_struct *mm,
                goto out;
        }
 
+       folio_move_anon_rmap(src_folio, dst_vma);
+       WRITE_ONCE(src_folio->index, linear_page_index(dst_vma, dst_addr));
+
        orig_dst_pte = mk_pte(&src_folio->page, dst_vma->vm_page_prot);
        /* Follow mremap() behavior and treat the entry dirty after the move */
        orig_dst_pte = pte_mkwrite(pte_mkdirty(orig_dst_pte), dst_vma);
@@ -935,8 +946,8 @@ static int move_swap_pte(struct mm_struct *mm,
 
        double_pt_lock(dst_ptl, src_ptl);
 
-       if (!pte_same(*src_pte, orig_src_pte) ||
-           !pte_same(*dst_pte, orig_dst_pte)) {
+       if (!pte_same(ptep_get(src_pte), orig_src_pte) ||
+           !pte_same(ptep_get(dst_pte), orig_dst_pte)) {
                double_pt_unlock(dst_ptl, src_ptl);
                return -EAGAIN;
        }
@@ -1005,7 +1016,7 @@ retry:
        }
 
        spin_lock(dst_ptl);
-       orig_dst_pte = *dst_pte;
+       orig_dst_pte = ptep_get(dst_pte);
        spin_unlock(dst_ptl);
        if (!pte_none(orig_dst_pte)) {
                err = -EEXIST;
@@ -1013,7 +1024,7 @@ retry:
        }
 
        spin_lock(src_ptl);
-       orig_src_pte = *src_pte;
+       orig_src_pte = ptep_get(src_pte);
        spin_unlock(src_ptl);
        if (pte_none(orig_src_pte)) {
                if (!(mode & UFFDIO_MOVE_MODE_ALLOW_SRC_HOLES))
@@ -1043,7 +1054,7 @@ retry:
                         * page isn't freed under us
                         */
                        spin_lock(src_ptl);
-                       if (!pte_same(orig_src_pte, *src_pte)) {
+                       if (!pte_same(orig_src_pte, ptep_get(src_pte))) {
                                spin_unlock(src_ptl);
                                err = -EAGAIN;
                                goto out;
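
The *src_pte -> ptep_get(src_pte) conversions above matter because ptep_get() performs a single-copy (READ_ONCE-style) load, so the snapshot-and-revalidate dance cannot observe a torn or compiler-duplicated read. A sketch of that shape with a stand-in pte type and a C11 atomic in place of READ_ONCE():

/* Hedged sketch: re-validating a snapshot against a live slot with a
 * single-copy load, the shape of the pte_same(ptep_get(..), ..) checks
 * above. The pte type here is a stand-in, not the kernel's. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct { unsigned long val; } pte_t;

static _Atomic unsigned long live_slot = 0x1000;

static pte_t ptep_get_sketch(void)
{
	/* One atomic load: no torn or repeated reads of the live entry. */
	return (pte_t){ atomic_load_explicit(&live_slot, memory_order_relaxed) };
}

static bool pte_same_sketch(pte_t a, pte_t b)
{
	return a.val == b.val;
}

int main(void)
{
	pte_t snapshot = ptep_get_sketch();

	/* ... locks dropped and retaken; the entry may have changed ... */
	if (!pte_same_sketch(ptep_get_sketch(), snapshot)) {
		puts("changed under us: -EAGAIN, retry");
		return 0;
	}
	puts("still the same entry: proceed with the move");
	return 0;
}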
index 4f9c854ce6cc66c2d971767a8f00e51eeab8a65e..4255619a1a314717df613e20090b160fce72a7e9 100644 (file)
@@ -5753,7 +5753,7 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 /* Use reclaim/compaction for costly allocs or under memory pressure */
 static bool in_reclaim_compaction(struct scan_control *sc)
 {
-       if (IS_ENABLED(CONFIG_COMPACTION) && sc->order &&
+       if (gfp_compaction_allowed(sc->gfp_mask) && sc->order &&
                        (sc->order > PAGE_ALLOC_COSTLY_ORDER ||
                         sc->priority < DEF_PRIORITY - 2))
                return true;
@@ -5998,6 +5998,9 @@ static inline bool compaction_ready(struct zone *zone, struct scan_control *sc)
 {
        unsigned long watermark;
 
+       if (!gfp_compaction_allowed(sc->gfp_mask))
+               return false;
+
        /* Allocation can already succeed, nothing to do */
        if (zone_watermark_ok(zone, sc->order, min_wmark_pages(zone),
                              sc->reclaim_idx, 0))
index ca25b676048ea6d0b399661e9ebca137585f8dbd..db4625af65fb7f6655a057e145bbe20dd64f7ae9 100644 (file)
@@ -377,10 +377,9 @@ void zswap_folio_swapin(struct folio *folio)
 {
        struct lruvec *lruvec;
 
-       if (folio) {
-               lruvec = folio_lruvec(folio);
-               atomic_long_inc(&lruvec->zswap_lruvec_state.nr_zswap_protected);
-       }
+       VM_WARN_ON_ONCE(!folio_test_locked(folio));
+       lruvec = folio_lruvec(folio);
+       atomic_long_inc(&lruvec->zswap_lruvec_state.nr_zswap_protected);
 }
 
 /*********************************
@@ -536,10 +535,6 @@ static struct zpool *zswap_find_zpool(struct zswap_entry *entry)
  */
 static void zswap_free_entry(struct zswap_entry *entry)
 {
-       if (entry->objcg) {
-               obj_cgroup_uncharge_zswap(entry->objcg, entry->length);
-               obj_cgroup_put(entry->objcg);
-       }
        if (!entry->length)
                atomic_dec(&zswap_same_filled_pages);
        else {
@@ -548,6 +543,10 @@ static void zswap_free_entry(struct zswap_entry *entry)
                atomic_dec(&entry->pool->nr_stored);
                zswap_pool_put(entry->pool);
        }
+       if (entry->objcg) {
+               obj_cgroup_uncharge_zswap(entry->objcg, entry->length);
+               obj_cgroup_put(entry->objcg);
+       }
        zswap_entry_cache_free(entry);
        atomic_dec(&zswap_stored_pages);
        zswap_update_total_size();
@@ -895,10 +894,8 @@ static enum lru_status shrink_memcg_cb(struct list_head *item, struct list_lru_o
                 * into the warmer region. We should terminate shrinking (if we're in the dynamic
                 * shrinker context).
                 */
-               if (writeback_result == -EEXIST && encountered_page_in_swapcache) {
-                       ret = LRU_SKIP;
+               if (writeback_result == -EEXIST && encountered_page_in_swapcache)
                        *encountered_page_in_swapcache = true;
-               }
 
                goto put_unlock;
        }
@@ -1442,6 +1439,8 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
        if (zswap_rb_search(&tree->rbroot, swp_offset(entry->swpentry)) != entry) {
                spin_unlock(&tree->lock);
                delete_from_swap_cache(folio);
+               folio_unlock(folio);
+               folio_put(folio);
                return -ENOMEM;
        }
        spin_unlock(&tree->lock);
@@ -1519,7 +1518,7 @@ bool zswap_store(struct folio *folio)
        if (folio_test_large(folio))
                return false;
 
-       if (!zswap_enabled || !tree)
+       if (!tree)
                return false;
 
        /*
@@ -1534,6 +1533,10 @@ bool zswap_store(struct folio *folio)
                zswap_invalidate_entry(tree, dupentry);
        }
        spin_unlock(&tree->lock);
+
+       if (!zswap_enabled)
+               return false;
+
        objcg = get_obj_cgroup_from_folio(folio);
        if (objcg && !obj_cgroup_may_zswap(objcg)) {
                memcg = get_mem_cgroup_from_objcg(objcg);
index 7b3341cef926ef37ce84c7dd09301c84c0a103c6..850d4a185f55f87f70a5e0a5112d1ab20d3eb070 100644 (file)
@@ -179,4 +179,5 @@ static void __exit lowpan_module_exit(void)
 module_init(lowpan_module_init);
 module_exit(lowpan_module_exit);
 
+MODULE_DESCRIPTION("IPv6 over Low-Power Wireless Personal Area Network core module");
 MODULE_LICENSE("GPL");
index 214532173536b790cf032615f73fb3d868d2aae1..a3b68243fd4b18492220339f8a2151598cf6e98a 100644 (file)
@@ -118,12 +118,16 @@ static int vlan_changelink(struct net_device *dev, struct nlattr *tb[],
        }
        if (data[IFLA_VLAN_INGRESS_QOS]) {
                nla_for_each_nested(attr, data[IFLA_VLAN_INGRESS_QOS], rem) {
+                       if (nla_type(attr) != IFLA_VLAN_QOS_MAPPING)
+                               continue;
                        m = nla_data(attr);
                        vlan_dev_set_ingress_priority(dev, m->to, m->from);
                }
        }
        if (data[IFLA_VLAN_EGRESS_QOS]) {
                nla_for_each_nested(attr, data[IFLA_VLAN_EGRESS_QOS], rem) {
+                       if (nla_type(attr) != IFLA_VLAN_QOS_MAPPING)
+                               continue;
                        m = nla_data(attr);
                        err = vlan_dev_set_egress_priority(dev, m->from, m->to);
                        if (err)
index 033871e718a34f7430929f862fcbcc886d933622..324e3ab96bb393d815901bb2d221ef6e18cf25e4 100644 (file)
@@ -1532,4 +1532,5 @@ static void __exit atm_mpoa_cleanup(void)
 module_init(atm_mpoa_init);
 module_exit(atm_mpoa_cleanup);
 
+MODULE_DESCRIPTION("Multi-Protocol Over ATM (MPOA) driver");
 MODULE_LICENSE("GPL");
index d982daea832927d38474f8d46764a82f87a09659..14088c4ff2f66f9049858598f30dc0069c27fb70 100644 (file)
@@ -2175,6 +2175,7 @@ void batadv_mcast_free(struct batadv_priv *bat_priv)
        cancel_delayed_work_sync(&bat_priv->mcast.work);
 
        batadv_tvlv_container_unregister(bat_priv, BATADV_TVLV_MCAST, 2);
+       batadv_tvlv_handler_unregister(bat_priv, BATADV_TVLV_MCAST_TRACKER, 1);
        batadv_tvlv_handler_unregister(bat_priv, BATADV_TVLV_MCAST, 2);
 
        /* safely calling outside of worker, as worker was canceled above */
@@ -2198,6 +2199,8 @@ void batadv_mcast_purge_orig(struct batadv_orig_node *orig)
                                      BATADV_MCAST_WANT_NO_RTR4);
        batadv_mcast_want_rtr6_update(bat_priv, orig,
                                      BATADV_MCAST_WANT_NO_RTR6);
+       batadv_mcast_have_mc_ptype_update(bat_priv, orig,
+                                         BATADV_MCAST_HAVE_MC_PTYPE_CAPA);
 
        spin_unlock_bh(&orig->mcast_handler_lock);
 }
index 65601aa52e0d8b669ac8aaec116301398a5e865b..2821a42cefdc6e0f83fa4a765bc881a67795a2b5 100644 (file)
@@ -1049,6 +1049,7 @@ static void hci_error_reset(struct work_struct *work)
 {
        struct hci_dev *hdev = container_of(work, struct hci_dev, error_reset);
 
+       hci_dev_hold(hdev);
        BT_DBG("%s", hdev->name);
 
        if (hdev->hw_error)
@@ -1056,10 +1057,10 @@ static void hci_error_reset(struct work_struct *work)
        else
                bt_dev_err(hdev, "hardware error 0x%2.2x", hdev->hw_error_code);
 
-       if (hci_dev_do_close(hdev))
-               return;
+       if (!hci_dev_do_close(hdev))
+               hci_dev_do_open(hdev);
 
-       hci_dev_do_open(hdev);
+       hci_dev_put(hdev);
 }
 
 void hci_uuids_clear(struct hci_dev *hdev)
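
hci_error_reset() above now pins the device for the whole handler and drops the reference on every exit path, so a concurrent unregister cannot free hdev mid-reset. A generic sketch of that hold/put discipline (plain counter, not the kernel's refcounting):

/* Hedged sketch of the hold/put pattern above: a worker takes a
 * reference for its entire run so the object cannot be freed from
 * under it, and drops it on the way out. */
#include <stdio.h>
#include <stdlib.h>

struct dev {
	int refcnt;
	const char *name;
};

static struct dev *dev_hold(struct dev *d) { d->refcnt++; return d; }

static void dev_put(struct dev *d)
{
	if (--d->refcnt == 0) {
		printf("freeing %s\n", d->name);
		free(d);
	}
}

static void error_reset_work(struct dev *d)
{
	dev_hold(d);                 /* pin across the whole handler */
	printf("resetting %s\n", d->name);
	/* ... close and reopen; may sleep, may race with unregister ... */
	dev_put(d);                  /* single drop on the way out */
}

int main(void)
{
	struct dev *d = malloc(sizeof(*d));

	if (!d)
		return 1;
	d->refcnt = 1;
	d->name = "hci0";
	error_reset_work(d);
	dev_put(d);                  /* owner's reference */
	return 0;
}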
index ef8c3bed73617efa01052f7c84f170bee4666eef..2a5f5a7d2412be4aef32e8bfeb69cab0f6ad4fec 100644 (file)
@@ -5329,9 +5329,12 @@ static void hci_io_capa_request_evt(struct hci_dev *hdev, void *data,
        hci_dev_lock(hdev);
 
        conn = hci_conn_hash_lookup_ba(hdev, ACL_LINK, &ev->bdaddr);
-       if (!conn || !hci_conn_ssp_enabled(conn))
+       if (!conn || !hci_dev_test_flag(hdev, HCI_SSP_ENABLED))
                goto unlock;
 
+       /* Assume remote supports SSP since it has triggered this event */
+       set_bit(HCI_CONN_SSP_ENABLED, &conn->flags);
+
        hci_conn_hold(conn);
 
        if (!hci_dev_test_flag(hdev, HCI_MGMT))
@@ -6794,6 +6797,10 @@ static void hci_le_remote_conn_param_req_evt(struct hci_dev *hdev, void *data,
                return send_conn_param_neg_reply(hdev, handle,
                                                 HCI_ERROR_UNKNOWN_CONN_ID);
 
+       if (max > hcon->le_conn_max_interval)
+               return send_conn_param_neg_reply(hdev, handle,
+                                                HCI_ERROR_INVALID_LL_PARAMS);
+
        if (hci_check_conn_params(min, max, latency, timeout))
                return send_conn_param_neg_reply(hdev, handle,
                                                 HCI_ERROR_INVALID_LL_PARAMS);
@@ -7420,10 +7427,10 @@ static void hci_store_wake_reason(struct hci_dev *hdev, u8 event,
         * keep track of the bdaddr of the connection event that woke us up.
         */
        if (event == HCI_EV_CONN_REQUEST) {
-               bacpy(&hdev->wake_addr, &conn_complete->bdaddr);
+               bacpy(&hdev->wake_addr, &conn_request->bdaddr);
                hdev->wake_addr_type = BDADDR_BREDR;
        } else if (event == HCI_EV_CONN_COMPLETE) {
-               bacpy(&hdev->wake_addr, &conn_request->bdaddr);
+               bacpy(&hdev->wake_addr, &conn_complete->bdaddr);
                hdev->wake_addr_type = BDADDR_BREDR;
        } else if (event == HCI_EV_LE_META) {
                struct hci_ev_le_meta *le_ev = (void *)skb->data;
index a6fc8a2a5c673d5266ceb98bef1d69b70ae19e4c..5716345a26dfb757b540137fa7616f4af49d013c 100644 (file)
@@ -2206,8 +2206,11 @@ static int hci_le_add_accept_list_sync(struct hci_dev *hdev,
 
        /* During suspend, only wakeable devices can be in acceptlist */
        if (hdev->suspended &&
-           !(params->flags & HCI_CONN_FLAG_REMOTE_WAKEUP))
+           !(params->flags & HCI_CONN_FLAG_REMOTE_WAKEUP)) {
+               hci_le_del_accept_list_sync(hdev, &params->addr,
+                                           params->addr_type);
                return 0;
+       }
 
        /* Select filter policy to accept all advertising */
        if (*num_entries >= hdev->le_accept_list_size)
@@ -5559,7 +5562,7 @@ static int hci_inquiry_sync(struct hci_dev *hdev, u8 length)
 
        bt_dev_dbg(hdev, "");
 
-       if (hci_dev_test_flag(hdev, HCI_INQUIRY))
+       if (test_bit(HCI_INQUIRY, &hdev->flags))
                return 0;
 
        hci_dev_lock(hdev);
index 60298975d5c45620f21ca5fe161da1a9fdf55eec..656f49b299d20d9141b9579aef84acf3b81bff7e 100644 (file)
@@ -5613,7 +5613,13 @@ static inline int l2cap_conn_param_update_req(struct l2cap_conn *conn,
 
        memset(&rsp, 0, sizeof(rsp));
 
-       err = hci_check_conn_params(min, max, latency, to_multiplier);
+       if (max > hcon->le_conn_max_interval) {
+               BT_DBG("requested connection interval exceeds current bounds.");
+               err = -EINVAL;
+       } else {
+               err = hci_check_conn_params(min, max, latency, to_multiplier);
+       }
+
        if (err)
                rsp.result = cpu_to_le16(L2CAP_CONN_PARAM_REJECTED);
        else
index bb72ff6eb22f4b30864aefd2588cce982d37d153..ee3b4aad8bd8d65239efc591cf33a631690a270f 100644 (file)
@@ -1045,6 +1045,8 @@ static void rpa_expired(struct work_struct *work)
        hci_cmd_sync_queue(hdev, rpa_expired_sync, NULL, NULL);
 }
 
+static int set_discoverable_sync(struct hci_dev *hdev, void *data);
+
 static void discov_off(struct work_struct *work)
 {
        struct hci_dev *hdev = container_of(work, struct hci_dev,
@@ -1063,7 +1065,7 @@ static void discov_off(struct work_struct *work)
        hci_dev_clear_flag(hdev, HCI_DISCOVERABLE);
        hdev->discov_timeout = 0;
 
-       hci_update_discoverable(hdev);
+       hci_cmd_sync_queue(hdev, set_discoverable_sync, NULL, NULL);
 
        mgmt_new_settings(hdev);
 
index 053ef8f25fae47b369068adb49f1391b32fd7bc9..1d34d8497033299907d341212c2977b2b1d9b870 100644 (file)
@@ -1941,7 +1941,7 @@ static struct rfcomm_session *rfcomm_process_rx(struct rfcomm_session *s)
        /* Get data directly from socket receive queue without copying it. */
        while ((skb = skb_dequeue(&sk->sk_receive_queue))) {
                skb_orphan(skb);
-               if (!skb_linearize(skb)) {
+               if (!skb_linearize(skb) && sk->sk_state != BT_CLOSED) {
                        s = rfcomm_recv_frame(s, skb);
                        if (!s)
                                break;
index d7d021af102981255ba284d396826feb71ae20be..2d7b7324295885e7a5ee70dd63b5dffd9a9a8968 100644 (file)
@@ -1762,6 +1762,10 @@ static void br_ip6_multicast_querier_expired(struct timer_list *t)
 }
 #endif
 
+static void br_multicast_query_delay_expired(struct timer_list *t)
+{
+}
+
 static void br_multicast_select_own_querier(struct net_bridge_mcast *brmctx,
                                            struct br_ip *ip,
                                            struct sk_buff *skb)
@@ -3198,7 +3202,7 @@ br_multicast_update_query_timer(struct net_bridge_mcast *brmctx,
                                unsigned long max_delay)
 {
        if (!timer_pending(&query->timer))
-               query->delay_time = jiffies + max_delay;
+               mod_timer(&query->delay_timer, jiffies + max_delay);
 
        mod_timer(&query->timer, jiffies + brmctx->multicast_querier_interval);
 }
@@ -4041,13 +4045,11 @@ void br_multicast_ctx_init(struct net_bridge *br,
        brmctx->multicast_querier_interval = 255 * HZ;
        brmctx->multicast_membership_interval = 260 * HZ;
 
-       brmctx->ip4_other_query.delay_time = 0;
        brmctx->ip4_querier.port_ifidx = 0;
        seqcount_spinlock_init(&brmctx->ip4_querier.seq, &br->multicast_lock);
        brmctx->multicast_igmp_version = 2;
 #if IS_ENABLED(CONFIG_IPV6)
        brmctx->multicast_mld_version = 1;
-       brmctx->ip6_other_query.delay_time = 0;
        brmctx->ip6_querier.port_ifidx = 0;
        seqcount_spinlock_init(&brmctx->ip6_querier.seq, &br->multicast_lock);
 #endif
@@ -4056,6 +4058,8 @@ void br_multicast_ctx_init(struct net_bridge *br,
                    br_ip4_multicast_local_router_expired, 0);
        timer_setup(&brmctx->ip4_other_query.timer,
                    br_ip4_multicast_querier_expired, 0);
+       timer_setup(&brmctx->ip4_other_query.delay_timer,
+                   br_multicast_query_delay_expired, 0);
        timer_setup(&brmctx->ip4_own_query.timer,
                    br_ip4_multicast_query_expired, 0);
 #if IS_ENABLED(CONFIG_IPV6)
@@ -4063,6 +4067,8 @@ void br_multicast_ctx_init(struct net_bridge *br,
                    br_ip6_multicast_local_router_expired, 0);
        timer_setup(&brmctx->ip6_other_query.timer,
                    br_ip6_multicast_querier_expired, 0);
+       timer_setup(&brmctx->ip6_other_query.delay_timer,
+                   br_multicast_query_delay_expired, 0);
        timer_setup(&brmctx->ip6_own_query.timer,
                    br_ip6_multicast_query_expired, 0);
 #endif
@@ -4197,10 +4203,12 @@ static void __br_multicast_stop(struct net_bridge_mcast *brmctx)
 {
        del_timer_sync(&brmctx->ip4_mc_router_timer);
        del_timer_sync(&brmctx->ip4_other_query.timer);
+       del_timer_sync(&brmctx->ip4_other_query.delay_timer);
        del_timer_sync(&brmctx->ip4_own_query.timer);
 #if IS_ENABLED(CONFIG_IPV6)
        del_timer_sync(&brmctx->ip6_mc_router_timer);
        del_timer_sync(&brmctx->ip6_other_query.timer);
+       del_timer_sync(&brmctx->ip6_other_query.delay_timer);
        del_timer_sync(&brmctx->ip6_own_query.timer);
 #endif
 }
@@ -4643,13 +4651,15 @@ int br_multicast_set_querier(struct net_bridge_mcast *brmctx, unsigned long val)
        max_delay = brmctx->multicast_query_response_interval;
 
        if (!timer_pending(&brmctx->ip4_other_query.timer))
-               brmctx->ip4_other_query.delay_time = jiffies + max_delay;
+               mod_timer(&brmctx->ip4_other_query.delay_timer,
+                         jiffies + max_delay);
 
        br_multicast_start_querier(brmctx, &brmctx->ip4_own_query);
 
 #if IS_ENABLED(CONFIG_IPV6)
        if (!timer_pending(&brmctx->ip6_other_query.timer))
-               brmctx->ip6_other_query.delay_time = jiffies + max_delay;
+               mod_timer(&brmctx->ip6_other_query.delay_timer,
+                         jiffies + max_delay);
 
        br_multicast_start_querier(brmctx, &brmctx->ip6_own_query);
 #endif
index ed17208907578a231d283c04bd97ce48bebdffaa..35e10c5a766d550e0c5cb85cf5a0c4835b52a89d 100644 (file)
 #include <linux/sysctl.h>
 #endif
 
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+#include <net/netfilter/nf_conntrack_core.h>
+#endif
+
 static unsigned int brnf_net_id __read_mostly;
 
 struct brnf_net {
@@ -553,6 +557,90 @@ static unsigned int br_nf_pre_routing(void *priv,
        return NF_STOLEN;
 }
 
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+/* conntrack's nf_confirm logic cannot handle cloned skbs referencing
+ * the same nf_conn entry, which will happen for multicast (broadcast)
+ * frames on bridges.
+ *
+ * Example:
+ *      macvlan0
+ *      |
+ *      br0
+ *  ethX  ethY
+ *
+ * ethX (or Y) receives a multicast or broadcast packet containing
+ * an IP packet, not yet in conntrack table.
+ *
+ * 1. skb passes through the bridge and fake-ip (br_netfilter) Prerouting.
+ *    -> skb->_nfct now references an unconfirmed entry
+ * 2. skb is a broad-/multicast packet. The bridge now passes clones out on
+ *    each bridge interface.
+ * 3. skb gets passed up the stack.
+ * 4. In the macvlan case, the macvlan driver retains clone(s) of the mcast
+ *    skb and schedules a work queue to send them out on the lower devices.
+ *
+ *    The clone's skb->_nfct is not a copy; it is the same entry as the
+ *    original skb's.  The macvlan rx handler then returns RX_HANDLER_PASS.
+ * 5. Normal conntrack hooks (in NF_INET_LOCAL_IN) confirm the orig skb.
+ *
+ * The macvlan broadcast worker and the normal confirm path will race.
+ *
+ * This race will not happen if step 2 already confirmed a clone. In that
+ * case later steps perform skb_clone() with skb->_nfct already confirmed
+ * (in the hash table).  This works fine.
+ *
+ * But such confirmation won't happen when eb/ip/nftables rules drop the
+ * packets before they reach the nf_confirm step in postrouting.
+ *
+ * Work around this problem by explicitly confirming the entry at LOCAL_IN
+ * time, before the upper layer has a chance to clone the unconfirmed entry.
+ */
+static unsigned int br_nf_local_in(void *priv,
+                                  struct sk_buff *skb,
+                                  const struct nf_hook_state *state)
+{
+       struct nf_conntrack *nfct = skb_nfct(skb);
+       const struct nf_ct_hook *ct_hook;
+       struct nf_conn *ct;
+       int ret;
+
+       if (!nfct || skb->pkt_type == PACKET_HOST)
+               return NF_ACCEPT;
+
+       ct = container_of(nfct, struct nf_conn, ct_general);
+       if (likely(nf_ct_is_confirmed(ct)))
+               return NF_ACCEPT;
+
+       WARN_ON_ONCE(skb_shared(skb));
+       WARN_ON_ONCE(refcount_read(&nfct->use) != 1);
+
+       /* We can't call nf_confirm here, it would create a dependency
+        * on nf_conntrack module.
+        */
+       ct_hook = rcu_dereference(nf_ct_hook);
+       if (!ct_hook) {
+               skb->_nfct = 0ul;
+               nf_conntrack_put(nfct);
+               return NF_ACCEPT;
+       }
+
+       nf_bridge_pull_encap_header(skb);
+       ret = ct_hook->confirm(skb);
+       switch (ret & NF_VERDICT_MASK) {
+       case NF_STOLEN:
+               return NF_STOLEN;
+       default:
+               nf_bridge_push_encap_header(skb);
+               break;
+       }
+
+       ct = container_of(nfct, struct nf_conn, ct_general);
+       WARN_ON_ONCE(!nf_ct_is_confirmed(ct));
+
+       return ret;
+}
+#endif
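
The hook above confirms the entry at LOCAL_IN, while exactly one reference
to it exists. To see why the original ordering is unsafe, here is a
deterministic userspace caricature of the check-then-insert race described
in the comment; the interleaving that normally needs two concurrent
contexts is written out sequentially:

    #include <stdbool.h>
    #include <stdio.h>

    struct entry { bool confirmed; int inserts; };

    static void insert_into_table(struct entry *e)
    {
            e->inserts++;                   /* stands in for the hash insertion */
    }

    int main(void)
    {
            struct entry shared = { 0 };    /* one entry shared by skb and clone */

            /* both confirm paths observe "unconfirmed" before either inserts */
            bool orig_sees_unconfirmed  = !shared.confirmed;
            bool clone_sees_unconfirmed = !shared.confirmed;

            if (orig_sees_unconfirmed) {
                    insert_into_table(&shared);
                    shared.confirmed = true;
            }
            if (clone_sees_unconfirmed) {
                    insert_into_table(&shared);     /* second insert: corruption */
                    shared.confirmed = true;
            }

            printf("inserts = %d (anything > 1 is the bug)\n", shared.inserts);
            return 0;
    }
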
 
 /* PF_BRIDGE/FORWARD *************************************************/
 static int br_nf_forward_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
@@ -964,6 +1052,14 @@ static const struct nf_hook_ops br_nf_ops[] = {
                .hooknum = NF_BR_PRE_ROUTING,
                .priority = NF_BR_PRI_BRNF,
        },
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+       {
+               .hook = br_nf_local_in,
+               .pf = NFPROTO_BRIDGE,
+               .hooknum = NF_BR_LOCAL_IN,
+               .priority = NF_BR_PRI_LAST,
+       },
+#endif
        {
                .hook = br_nf_forward,
                .pf = NFPROTO_BRIDGE,
index b0a92c344722be6bf195d571bf1a26daf759dfca..86ea5e6689b5ce49a4b71b383893d2ef5b53d110 100644 (file)
@@ -78,7 +78,7 @@ struct bridge_mcast_own_query {
 /* other querier */
 struct bridge_mcast_other_query {
        struct timer_list               timer;
-       unsigned long                   delay_time;
+       struct timer_list               delay_timer;
 };
 
 /* selected querier */
@@ -1159,7 +1159,7 @@ __br_multicast_querier_exists(struct net_bridge_mcast *brmctx,
                own_querier_enabled = false;
        }
 
-       return time_is_before_jiffies(querier->delay_time) &&
+       return !timer_pending(&querier->delay_timer) &&
               (own_querier_enabled || timer_pending(&querier->timer));
 }
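
Replacing the bare delay_time stamp with a real timer sidesteps a classic
hazard: a jiffies stamp tested with time_is_before_jiffies() silently
inverts its meaning if it goes unrefreshed for longer than half the tick
range, whereas timer_pending() on an expired timer simply stays false. A
self-contained illustration of the wraparound using 32-bit ticks:

    #include <stdint.h>
    #include <stdio.h>

    /* mirrors time_is_before_jiffies(t): true once t lies in the past */
    #define time_is_before(t, now)  ((int32_t)((t) - (now)) < 0)

    int main(void)
    {
            uint32_t stamp = 1000;                  /* set once, never refreshed */
            uint32_t soon  = stamp + 10;            /* shortly afterwards */
            uint32_t later = stamp + 0x80000010u;   /* > 2^31 ticks afterwards */

            printf("soon:  expired=%d\n", time_is_before(stamp, soon));   /* 1 */
            printf("later: expired=%d\n", time_is_before(stamp, later));  /* 0 (!) */
            return 0;
    }
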
 
index ee84e783e1dff5b67994a3ba5a4e5d8aa875eeef..7b41ee8740cbbaf6b959d9273c49ebcd4830a5c8 100644 (file)
@@ -595,21 +595,40 @@ br_switchdev_mdb_replay_one(struct notifier_block *nb, struct net_device *dev,
 }
 
 static int br_switchdev_mdb_queue_one(struct list_head *mdb_list,
+                                     struct net_device *dev,
+                                     unsigned long action,
                                      enum switchdev_obj_id id,
                                      const struct net_bridge_mdb_entry *mp,
                                      struct net_device *orig_dev)
 {
-       struct switchdev_obj_port_mdb *mdb;
+       struct switchdev_obj_port_mdb mdb = {
+               .obj = {
+                       .id = id,
+                       .orig_dev = orig_dev,
+               },
+       };
+       struct switchdev_obj_port_mdb *pmdb;
 
-       mdb = kzalloc(sizeof(*mdb), GFP_ATOMIC);
-       if (!mdb)
-               return -ENOMEM;
+       br_switchdev_mdb_populate(&mdb, mp);
+
+       if (action == SWITCHDEV_PORT_OBJ_ADD &&
+           switchdev_port_obj_act_is_deferred(dev, action, &mdb.obj)) {
+               /* This event is already in the deferred queue of
+                * events, so this replay must be elided, lest the
+                * driver receive duplicate events for it. This can
+                * only happen when replaying additions, since
+                * modifications are always immediately visible in
+                * br->mdb_list, whereas actual event delivery may be
+                * delayed.
+                */
+               return 0;
+       }
 
-       mdb->obj.id = id;
-       mdb->obj.orig_dev = orig_dev;
-       br_switchdev_mdb_populate(mdb, mp);
-       list_add_tail(&mdb->obj.list, mdb_list);
+       pmdb = kmemdup(&mdb, sizeof(mdb), GFP_ATOMIC);
+       if (!pmdb)
+               return -ENOMEM;
 
+       list_add_tail(&pmdb->obj.list, mdb_list);
        return 0;
 }
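
As the comment explains, a replayed addition must be skipped when the same
event is still sitting in the switchdev deferred queue, or the driver would
see it twice. A toy version of that elision (hypothetical types, not the
switchdev API):

    #include <stdio.h>
    #include <string.h>

    #define MAX_EVTS 8

    struct evt { char obj[16]; };

    static struct evt deferred[MAX_EVTS];   /* queued but not yet delivered */
    static int n_deferred;

    static int is_deferred(const char *obj)
    {
            for (int i = 0; i < n_deferred; i++)
                    if (!strcmp(deferred[i].obj, obj))
                            return 1;
            return 0;
    }

    static void replay_add(const char *obj)
    {
            if (is_deferred(obj)) {
                    printf("skip %s: delivery already pending\n", obj);
                    return;         /* the driver will get the queued copy */
            }
            printf("replay %s\n", obj);
    }

    int main(void)
    {
            strcpy(deferred[n_deferred++].obj, "mdb:grp1");
            replay_add("mdb:grp1");         /* elided */
            replay_add("mdb:grp2");         /* replayed */
            return 0;
    }
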
 
@@ -677,51 +696,50 @@ br_switchdev_mdb_replay(struct net_device *br_dev, struct net_device *dev,
        if (!br_opt_get(br, BROPT_MULTICAST_ENABLED))
                return 0;
 
-       /* We cannot walk over br->mdb_list protected just by the rtnl_mutex,
-        * because the write-side protection is br->multicast_lock. But we
-        * need to emulate the [ blocking ] calling context of a regular
-        * switchdev event, so since both br->multicast_lock and RCU read side
-        * critical sections are atomic, we have no choice but to pick the RCU
-        * read side lock, queue up all our events, leave the critical section
-        * and notify switchdev from blocking context.
+       if (adding)
+               action = SWITCHDEV_PORT_OBJ_ADD;
+       else
+               action = SWITCHDEV_PORT_OBJ_DEL;
+
+       /* br_switchdev_mdb_queue_one() will take care to not queue a
+        * replay of an event that is already pending in the switchdev
+        * deferred queue. In order to safely determine that, there
+        * must be no new deferred MDB notifications enqueued for the
+        * duration of the MDB scan. Therefore, grab the write-side
+        * lock to avoid racing with any concurrent IGMP/MLD snooping.
         */
-       rcu_read_lock();
+       spin_lock_bh(&br->multicast_lock);
 
-       hlist_for_each_entry_rcu(mp, &br->mdb_list, mdb_node) {
+       hlist_for_each_entry(mp, &br->mdb_list, mdb_node) {
                struct net_bridge_port_group __rcu * const *pp;
                const struct net_bridge_port_group *p;
 
                if (mp->host_joined) {
-                       err = br_switchdev_mdb_queue_one(&mdb_list,
+                       err = br_switchdev_mdb_queue_one(&mdb_list, dev, action,
                                                         SWITCHDEV_OBJ_ID_HOST_MDB,
                                                         mp, br_dev);
                        if (err) {
-                               rcu_read_unlock();
+                               spin_unlock_bh(&br->multicast_lock);
                                goto out_free_mdb;
                        }
                }
 
-               for (pp = &mp->ports; (p = rcu_dereference(*pp)) != NULL;
+               for (pp = &mp->ports; (p = mlock_dereference(*pp, br)) != NULL;
                     pp = &p->next) {
                        if (p->key.port->dev != dev)
                                continue;
 
-                       err = br_switchdev_mdb_queue_one(&mdb_list,
+                       err = br_switchdev_mdb_queue_one(&mdb_list, dev, action,
                                                         SWITCHDEV_OBJ_ID_PORT_MDB,
                                                         mp, dev);
                        if (err) {
-                               rcu_read_unlock();
+                               spin_unlock_bh(&br->multicast_lock);
                                goto out_free_mdb;
                        }
                }
        }
 
-       rcu_read_unlock();
-
-       if (adding)
-               action = SWITCHDEV_PORT_OBJ_ADD;
-       else
-               action = SWITCHDEV_PORT_OBJ_DEL;
+       spin_unlock_bh(&br->multicast_lock);
 
        list_for_each_entry(obj, &mdb_list, list) {
                err = br_switchdev_mdb_replay_one(nb, dev,
@@ -786,6 +804,16 @@ static void nbp_switchdev_unsync_objs(struct net_bridge_port *p,
        br_switchdev_mdb_replay(br_dev, dev, ctx, false, blocking_nb, NULL);
 
        br_switchdev_vlan_replay(br_dev, ctx, false, blocking_nb, NULL);
+
+       /* Make sure that the device leaving this bridge has seen all
+        * relevant events before it is disassociated. In the normal
+        * case, when the device is directly attached to the bridge,
+        * this is covered by del_nbp(). If the association was indirect
+        * however, e.g. via a team or bond, and the device is leaving
+        * that intermediate device, then the bridge port remains in
+        * place.
+        */
+       switchdev_deferred_process();
 }
 
 /* Let the bridge know that this port is offloaded, so that it can assign a
index abb090f94ed2609eeb9cd54b4e5faed1c3cb7bfe..6f877e31709bad3646ea15bf3a96999ed275bdc1 100644 (file)
@@ -291,6 +291,30 @@ static unsigned int nf_ct_bridge_pre(void *priv, struct sk_buff *skb,
        return nf_conntrack_in(skb, &bridge_state);
 }
 
+static unsigned int nf_ct_bridge_in(void *priv, struct sk_buff *skb,
+                                   const struct nf_hook_state *state)
+{
+       enum ip_conntrack_info ctinfo;
+       struct nf_conn *ct;
+
+       if (skb->pkt_type == PACKET_HOST)
+               return NF_ACCEPT;
+
+       /* nf_conntrack_confirm() cannot handle concurrent clones,
+        * this happens for broad/multicast frames with e.g. macvlan on top
+        * of the bridge device.
+        */
+       ct = nf_ct_get(skb, &ctinfo);
+       if (!ct || nf_ct_is_confirmed(ct) || nf_ct_is_template(ct))
+               return NF_ACCEPT;
+
+       /* let inet prerouting call conntrack again */
+       skb->_nfct = 0;
+       nf_ct_put(ct);
+
+       return NF_ACCEPT;
+}
+
 static void nf_ct_bridge_frag_save(struct sk_buff *skb,
                                   struct nf_bridge_frag_data *data)
 {
@@ -385,6 +409,12 @@ static struct nf_hook_ops nf_ct_bridge_hook_ops[] __read_mostly = {
                .hooknum        = NF_BR_PRE_ROUTING,
                .priority       = NF_IP_PRI_CONNTRACK,
        },
+       {
+               .hook           = nf_ct_bridge_in,
+               .pf             = NFPROTO_BRIDGE,
+               .hooknum        = NF_BR_LOCAL_IN,
+               .priority       = NF_IP_PRI_CONNTRACK_CONFIRM,
+       },
        {
                .hook           = nf_ct_bridge_post,
                .pf             = NFPROTO_BRIDGE,
index 16af1a7f80f60e18b5526c973f66e98821786a78..31a93cae5111b50d54e10a061a9e235e81a1da1c 100644 (file)
@@ -86,7 +86,7 @@ struct j1939_priv {
        unsigned int tp_max_packet_size;
 
        /* lock for j1939_socks list */
-       spinlock_t j1939_socks_lock;
+       rwlock_t j1939_socks_lock;
        struct list_head j1939_socks;
 
        struct kref rx_kref;
@@ -301,6 +301,7 @@ struct j1939_sock {
 
        int ifindex;
        struct j1939_addr addr;
+       spinlock_t filters_lock;
        struct j1939_filter *filters;
        int nfilters;
        pgn_t pgn_rx_filter;
index ecff1c947d683b2f3e4eeff144f39a8f5ff5de1a..a6fb89fa62785121f9edc42308b052142fda5fdd 100644 (file)
@@ -274,7 +274,7 @@ struct j1939_priv *j1939_netdev_start(struct net_device *ndev)
                return ERR_PTR(-ENOMEM);
 
        j1939_tp_init(priv);
-       spin_lock_init(&priv->j1939_socks_lock);
+       rwlock_init(&priv->j1939_socks_lock);
        INIT_LIST_HEAD(&priv->j1939_socks);
 
        mutex_lock(&j1939_netdev_lock);
index 14c43166323393541bc102f47a311c79199a2acd..305dd72c844c70f1589f14fcb668bd46b92ff6f2 100644 (file)
@@ -80,16 +80,16 @@ static void j1939_jsk_add(struct j1939_priv *priv, struct j1939_sock *jsk)
        jsk->state |= J1939_SOCK_BOUND;
        j1939_priv_get(priv);
 
-       spin_lock_bh(&priv->j1939_socks_lock);
+       write_lock_bh(&priv->j1939_socks_lock);
        list_add_tail(&jsk->list, &priv->j1939_socks);
-       spin_unlock_bh(&priv->j1939_socks_lock);
+       write_unlock_bh(&priv->j1939_socks_lock);
 }
 
 static void j1939_jsk_del(struct j1939_priv *priv, struct j1939_sock *jsk)
 {
-       spin_lock_bh(&priv->j1939_socks_lock);
+       write_lock_bh(&priv->j1939_socks_lock);
        list_del_init(&jsk->list);
-       spin_unlock_bh(&priv->j1939_socks_lock);
+       write_unlock_bh(&priv->j1939_socks_lock);
 
        j1939_priv_put(priv);
        jsk->state &= ~J1939_SOCK_BOUND;
@@ -262,12 +262,17 @@ static bool j1939_sk_match_dst(struct j1939_sock *jsk,
 static bool j1939_sk_match_filter(struct j1939_sock *jsk,
                                  const struct j1939_sk_buff_cb *skcb)
 {
-       const struct j1939_filter *f = jsk->filters;
-       int nfilter = jsk->nfilters;
+       const struct j1939_filter *f;
+       int nfilter;
+
+       spin_lock_bh(&jsk->filters_lock);
+
+       f = jsk->filters;
+       nfilter = jsk->nfilters;
 
        if (!nfilter)
                /* receive all when no filters are assigned */
-               return true;
+               goto filter_match_found;
 
        for (; nfilter; ++f, --nfilter) {
                if ((skcb->addr.pgn & f->pgn_mask) != f->pgn)
@@ -276,9 +281,15 @@ static bool j1939_sk_match_filter(struct j1939_sock *jsk,
                        continue;
                if ((skcb->addr.src_name & f->name_mask) != f->name)
                        continue;
-               return true;
+               goto filter_match_found;
        }
+
+       spin_unlock_bh(&jsk->filters_lock);
        return false;
+
+filter_match_found:
+       spin_unlock_bh(&jsk->filters_lock);
+       return true;
 }
 
 static bool j1939_sk_recv_match_one(struct j1939_sock *jsk,
@@ -329,13 +340,13 @@ bool j1939_sk_recv_match(struct j1939_priv *priv, struct j1939_sk_buff_cb *skcb)
        struct j1939_sock *jsk;
        bool match = false;
 
-       spin_lock_bh(&priv->j1939_socks_lock);
+       read_lock_bh(&priv->j1939_socks_lock);
        list_for_each_entry(jsk, &priv->j1939_socks, list) {
                match = j1939_sk_recv_match_one(jsk, skcb);
                if (match)
                        break;
        }
-       spin_unlock_bh(&priv->j1939_socks_lock);
+       read_unlock_bh(&priv->j1939_socks_lock);
 
        return match;
 }
@@ -344,11 +355,11 @@ void j1939_sk_recv(struct j1939_priv *priv, struct sk_buff *skb)
 {
        struct j1939_sock *jsk;
 
-       spin_lock_bh(&priv->j1939_socks_lock);
+       read_lock_bh(&priv->j1939_socks_lock);
        list_for_each_entry(jsk, &priv->j1939_socks, list) {
                j1939_sk_recv_one(jsk, skb);
        }
-       spin_unlock_bh(&priv->j1939_socks_lock);
+       read_unlock_bh(&priv->j1939_socks_lock);
 }
 
 static void j1939_sk_sock_destruct(struct sock *sk)
@@ -401,6 +412,7 @@ static int j1939_sk_init(struct sock *sk)
        atomic_set(&jsk->skb_pending, 0);
        spin_lock_init(&jsk->sk_session_queue_lock);
        INIT_LIST_HEAD(&jsk->sk_session_queue);
+       spin_lock_init(&jsk->filters_lock);
 
        /* j1939_sk_sock_destruct() depends on SOCK_RCU_FREE flag */
        sock_set_flag(sk, SOCK_RCU_FREE);
@@ -703,9 +715,11 @@ static int j1939_sk_setsockopt(struct socket *sock, int level, int optname,
                }
 
                lock_sock(&jsk->sk);
+               spin_lock_bh(&jsk->filters_lock);
                ofilters = jsk->filters;
                jsk->filters = filters;
                jsk->nfilters = count;
+               spin_unlock_bh(&jsk->filters_lock);
                release_sock(&jsk->sk);
                kfree(ofilters);
                return 0;
@@ -1080,12 +1094,12 @@ void j1939_sk_errqueue(struct j1939_session *session,
        }
 
        /* spread RX notifications to all sockets subscribed to this session */
-       spin_lock_bh(&priv->j1939_socks_lock);
+       read_lock_bh(&priv->j1939_socks_lock);
        list_for_each_entry(jsk, &priv->j1939_socks, list) {
                if (j1939_sk_recv_match_one(jsk, &session->skcb))
                        __j1939_sk_errqueue(session, &jsk->sk, type);
        }
-       spin_unlock_bh(&priv->j1939_socks_lock);
+       read_unlock_bh(&priv->j1939_socks_lock);
 };
 
 void j1939_sk_send_loop_abort(struct sock *sk, int err)
@@ -1273,7 +1287,7 @@ void j1939_sk_netdev_event_netdown(struct j1939_priv *priv)
        struct j1939_sock *jsk;
        int error_code = ENETDOWN;
 
-       spin_lock_bh(&priv->j1939_socks_lock);
+       read_lock_bh(&priv->j1939_socks_lock);
        list_for_each_entry(jsk, &priv->j1939_socks, list) {
                jsk->sk.sk_err = error_code;
                if (!sock_flag(&jsk->sk, SOCK_DEAD))
@@ -1281,7 +1295,7 @@ void j1939_sk_netdev_event_netdown(struct j1939_priv *priv)
 
                j1939_sk_queue_drop_all(priv, jsk, error_code);
        }
-       spin_unlock_bh(&priv->j1939_socks_lock);
+       read_unlock_bh(&priv->j1939_socks_lock);
 }
 
 static int j1939_sk_no_ioctlcmd(struct socket *sock, unsigned int cmd,
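
The j1939_socks list is walked on every received frame but modified only
when sockets bind or unbind, so converting the plain spinlock to an rwlock
lets the receive paths run concurrently. A rough POSIX analogy of the
pattern (pthread rwlocks, not the kernel's rwlock_t):

    #include <pthread.h>
    #include <stdio.h>

    static pthread_rwlock_t socks_lock = PTHREAD_RWLOCK_INITIALIZER;
    static int socks[16], n_socks;

    static void *rx_match(void *arg)                /* hot path: per-frame scan */
    {
            (void)arg;
            pthread_rwlock_rdlock(&socks_lock);     /* readers run in parallel */
            for (int i = 0; i < n_socks; i++)
                    ;                               /* match frame per socket */
            pthread_rwlock_unlock(&socks_lock);
            return NULL;
    }

    static void bind_sock(int s)                    /* rare path: list update */
    {
            pthread_rwlock_wrlock(&socks_lock);     /* excludes all readers */
            socks[n_socks++] = s;
            pthread_rwlock_unlock(&socks_lock);
    }

    int main(void)
    {
            pthread_t t1, t2;

            bind_sock(42);
            pthread_create(&t1, NULL, rx_match, NULL);
            pthread_create(&t2, NULL, rx_match, NULL);
            pthread_join(t1, NULL);
            pthread_join(t2, NULL);
            puts("done");
            return 0;
    }
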
index f9a50d7f0d204639f821835d341bb87c13a80333..0cb61c76b9b87da0746294cb371bc62defec0f81 100644 (file)
@@ -160,8 +160,9 @@ static size_t sizeof_footer(struct ceph_connection *con)
 static void prepare_message_data(struct ceph_msg *msg, u32 data_len)
 {
-       /* Initialize data cursor if it's not a sparse read */
+       /* Initialize the data cursor; a sparse read spans its total length */
-       if (!msg->sparse_read)
-               ceph_msg_data_cursor_init(&msg->cursor, msg, data_len);
+       u64 len = msg->sparse_read_total ? : data_len;
+
+       ceph_msg_data_cursor_init(&msg->cursor, msg, len);
 }
 
 /*
@@ -991,7 +992,7 @@ static inline int read_partial_message_section(struct ceph_connection *con,
        return read_partial_message_chunk(con, section, sec_len, crc);
 }
 
-static int read_sparse_msg_extent(struct ceph_connection *con, u32 *crc)
+static int read_partial_sparse_msg_extent(struct ceph_connection *con, u32 *crc)
 {
        struct ceph_msg_data_cursor *cursor = &con->in_msg->cursor;
        bool do_bounce = ceph_test_opt(from_msgr(con->msgr), RXBOUNCE);
@@ -1026,7 +1027,7 @@ static int read_sparse_msg_extent(struct ceph_connection *con, u32 *crc)
        return 1;
 }
 
-static int read_sparse_msg_data(struct ceph_connection *con)
+static int read_partial_sparse_msg_data(struct ceph_connection *con)
 {
        struct ceph_msg_data_cursor *cursor = &con->in_msg->cursor;
        bool do_datacrc = !ceph_test_opt(from_msgr(con->msgr), NOCRC);
@@ -1036,31 +1037,31 @@ static int read_sparse_msg_data(struct ceph_connection *con)
        if (do_datacrc)
                crc = con->in_data_crc;
 
-       do {
+       while (cursor->total_resid) {
                if (con->v1.in_sr_kvec.iov_base)
                        ret = read_partial_message_chunk(con,
                                                         &con->v1.in_sr_kvec,
                                                         con->v1.in_sr_len,
                                                         &crc);
                else if (cursor->sr_resid > 0)
-                       ret = read_sparse_msg_extent(con, &crc);
-
-               if (ret <= 0) {
-                       if (do_datacrc)
-                               con->in_data_crc = crc;
-                       return ret;
-               }
+                       ret = read_partial_sparse_msg_extent(con, &crc);
+               if (ret <= 0)
+                       break;
 
                memset(&con->v1.in_sr_kvec, 0, sizeof(con->v1.in_sr_kvec));
                ret = con->ops->sparse_read(con, cursor,
                                (char **)&con->v1.in_sr_kvec.iov_base);
+               if (ret <= 0) {
+                       ret = ret ? ret : 1;  /* must return > 0 to indicate success */
+                       break;
+               }
                con->v1.in_sr_len = ret;
-       } while (ret > 0);
+       }
 
        if (do_datacrc)
                con->in_data_crc = crc;
 
-       return ret < 0 ? ret : 1;  /* must return > 0 to indicate success */
+       return ret;
 }
 
 static int read_partial_msg_data(struct ceph_connection *con)
@@ -1253,8 +1254,8 @@ static int read_partial_message(struct ceph_connection *con)
                if (!m->num_data_items)
                        return -EIO;
 
-               if (m->sparse_read)
-                       ret = read_sparse_msg_data(con);
+               if (m->sparse_read_total)
+                       ret = read_partial_sparse_msg_data(con);
                else if (ceph_test_opt(from_msgr(con->msgr), RXBOUNCE))
                        ret = read_partial_msg_data_bounce(con);
                else
index f8ec60e1aba3a112aaa024c235f0117297b9bf70..bd608ffa06279704b5f4f43e5e369035e3ff032c 100644 (file)
@@ -1128,7 +1128,7 @@ static int decrypt_tail(struct ceph_connection *con)
        struct sg_table enc_sgt = {};
        struct sg_table sgt = {};
        struct page **pages = NULL;
-       bool sparse = con->in_msg->sparse_read;
+       bool sparse = !!con->in_msg->sparse_read_total;
        int dpos = 0;
        int tail_len;
        int ret;
@@ -2034,6 +2034,9 @@ static int prepare_sparse_read_data(struct ceph_connection *con)
        if (!con_secure(con))
                con->in_data_crc = -1;
 
+       ceph_msg_data_cursor_init(&con->v2.in_cursor, msg,
+                                 msg->sparse_read_total);
+
        reset_in_kvecs(con);
        con->v2.in_state = IN_S_PREPARE_SPARSE_DATA_CONT;
        con->v2.data_len_remain = data_len(msg);
@@ -2060,7 +2063,7 @@ static int prepare_read_tail_plain(struct ceph_connection *con)
        }
 
        if (data_len(msg)) {
-               if (msg->sparse_read)
+               if (msg->sparse_read_total)
                        con->v2.in_state = IN_S_PREPARE_SPARSE_DATA;
                else
                        con->v2.in_state = IN_S_PREPARE_READ_DATA;
index 625622016f5761e36bccc3f7a239e265039ce95d..9d078b37fe0b9b085894be86db17053227de9a18 100644 (file)
@@ -5510,7 +5510,7 @@ static struct ceph_msg *get_reply(struct ceph_connection *con,
        }
 
        m = ceph_msg_get(req->r_reply);
-       m->sparse_read = (bool)srlen;
+       m->sparse_read_total = srlen;
 
        dout("get_reply tid %lld %p\n", tid, m);
 
@@ -5777,11 +5777,8 @@ static int prep_next_sparse_read(struct ceph_connection *con,
        }
 
        if (o->o_sparse_op_idx < 0) {
-               u64 srlen = sparse_data_requested(req);
-
-               dout("%s: [%d] starting new sparse read req. srlen=0x%llx\n",
-                    __func__, o->o_osd, srlen);
-               ceph_msg_data_cursor_init(cursor, con->in_msg, srlen);
+               dout("%s: [%d] starting new sparse read req\n",
+                    __func__, o->o_osd);
        } else {
                u64 end;
 
@@ -5857,8 +5854,8 @@ static int osd_sparse_read(struct ceph_connection *con,
        struct ceph_osd *o = con->private;
        struct ceph_sparse_read *sr = &o->o_sparse_read;
        u32 count = sr->sr_count;
-       u64 eoff, elen;
-       int ret;
+       u64 eoff, elen, len = 0;
+       int i, ret;
 
        switch (sr->sr_state) {
        case CEPH_SPARSE_READ_HDR:
@@ -5903,8 +5900,20 @@ next_op:
                convert_extent_map(sr);
                ret = sizeof(sr->sr_datalen);
                *pbuf = (char *)&sr->sr_datalen;
-               sr->sr_state = CEPH_SPARSE_READ_DATA;
+               sr->sr_state = CEPH_SPARSE_READ_DATA_PRE;
                break;
+       case CEPH_SPARSE_READ_DATA_PRE:
+               /* Convert sr_datalen to host-endian */
+               sr->sr_datalen = le32_to_cpu((__force __le32)sr->sr_datalen);
+               for (i = 0; i < count; i++)
+                       len += sr->sr_extent[i].len;
+               if (sr->sr_datalen != len) {
+                       pr_warn_ratelimited("data len %u != extent len %llu\n",
+                                           sr->sr_datalen, len);
+                       return -EREMOTEIO;
+               }
+               sr->sr_state = CEPH_SPARSE_READ_DATA;
+               fallthrough;
        case CEPH_SPARSE_READ_DATA:
                if (sr->sr_index >= count) {
                        sr->sr_state = CEPH_SPARSE_READ_HDR;
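
The new CEPH_SPARSE_READ_DATA_PRE state converts the on-wire length to host
endianness and rejects the reply before any payload is consumed if it
disagrees with the summed extent lengths. A standalone sketch of that
validation (hypothetical buffer layout):

    #include <stdint.h>
    #include <stdio.h>

    static uint32_t le32_to_host(const uint8_t b[4])
    {
            return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
                   (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
    }

    struct extent { uint64_t off, len; };

    int main(void)
    {
            const uint8_t wire_len[4] = { 0x00, 0x10, 0x00, 0x00 }; /* 4096, LE */
            struct extent ext[] = { { 0, 1024 }, { 8192, 3072 } };
            uint64_t sum = 0;

            for (unsigned int i = 0; i < sizeof(ext) / sizeof(ext[0]); i++)
                    sum += ext[i].len;

            if (le32_to_host(wire_len) != sum) {
                    fprintf(stderr, "data len mismatch, reject the reply\n");
                    return 1;
            }
            puts("lengths agree, proceed to read the extents");
            return 0;
    }
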
index 103d46fa0eeb34af20b2d74b79f38e7424e25155..a8b625abe242c657dca8cd0188c236553757c6b2 100644 (file)
@@ -751,7 +751,7 @@ size_t memcpy_to_iter_csum(void *iter_to, size_t progress,
                           size_t len, void *from, void *priv2)
 {
        __wsum *csum = priv2;
-       __wsum next = csum_partial_copy_nocheck(from, iter_to, len);
+       __wsum next = csum_partial_copy_nocheck(from + progress, iter_to, len);
 
        *csum = csum_block_add(*csum, next, progress);
        return 0;
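
The fix matters because this callback fires repeatedly as the copy
advances: each step must read the source bytes at from + progress and fold
its partial checksum in at block offset progress, otherwise the first chunk
is checksummed over and over. A self-contained sketch of the chunked
internet checksum (even-length chunks only; the kernel's csum_block_add()
additionally rotates sums folded at odd offsets):

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    static uint32_t sum16(const uint8_t *p, size_t len, uint32_t acc)
    {
            for (size_t i = 0; i + 1 < len; i += 2)
                    acc += (uint32_t)p[i] << 8 | p[i + 1];
            return acc;
    }

    static uint16_t fold(uint32_t acc)
    {
            while (acc >> 16)
                    acc = (acc & 0xffff) + (acc >> 16);
            return (uint16_t)~acc;
    }

    int main(void)
    {
            uint8_t buf[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
            uint32_t whole = sum16(buf, 8, 0);
            uint32_t good  = sum16(buf + 4, 4, sum16(buf, 4, 0)); /* from + progress */
            uint32_t bad   = sum16(buf, 4, sum16(buf, 4, 0));     /* bug: no offset */

            printf("whole=%04x chunked=%04x buggy=%04x\n",
                   fold(whole), fold(good), fold(bad));
            /* whole and chunked agree; the buggy variant differs */
            return 0;
    }
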
index f01a9b858347b41e88c25632d5d9524cbabba9a1..a892f72651890d4da0c1b954991130c07eb91ee9 100644 (file)
@@ -336,7 +336,7 @@ int netdev_name_node_alt_create(struct net_device *dev, const char *name)
                return -ENOMEM;
        netdev_name_node_add(net, name_node);
        /* The node that holds dev->name acts as a head of per-device list. */
-       list_add_tail(&name_node->list, &dev->name_node->list);
+       list_add_tail_rcu(&name_node->list, &dev->name_node->list);
 
        return 0;
 }
@@ -6177,8 +6177,13 @@ static void __busy_poll_stop(struct napi_struct *napi, bool skip_schedule)
        clear_bit(NAPI_STATE_SCHED, &napi->state);
 }
 
-static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock, bool prefer_busy_poll,
-                          u16 budget)
+enum {
+       NAPI_F_PREFER_BUSY_POLL = 1,
+       NAPI_F_END_ON_RESCHED   = 2,
+};
+
+static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock,
+                          unsigned flags, u16 budget)
 {
        bool skip_schedule = false;
        unsigned long timeout;
@@ -6198,7 +6203,7 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock, bool
 
        local_bh_disable();
 
-       if (prefer_busy_poll) {
+       if (flags & NAPI_F_PREFER_BUSY_POLL) {
                napi->defer_hard_irqs_count = READ_ONCE(napi->dev->napi_defer_hard_irqs);
                timeout = READ_ONCE(napi->dev->gro_flush_timeout);
                if (napi->defer_hard_irqs_count && timeout) {
@@ -6222,23 +6227,23 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock, bool
        local_bh_enable();
 }
 
-void napi_busy_loop(unsigned int napi_id,
-                   bool (*loop_end)(void *, unsigned long),
-                   void *loop_end_arg, bool prefer_busy_poll, u16 budget)
+static void __napi_busy_loop(unsigned int napi_id,
+                     bool (*loop_end)(void *, unsigned long),
+                     void *loop_end_arg, unsigned flags, u16 budget)
 {
        unsigned long start_time = loop_end ? busy_loop_current_time() : 0;
        int (*napi_poll)(struct napi_struct *napi, int budget);
        void *have_poll_lock = NULL;
        struct napi_struct *napi;
 
+       WARN_ON_ONCE(!rcu_read_lock_held());
+
 restart:
        napi_poll = NULL;
 
-       rcu_read_lock();
-
        napi = napi_by_id(napi_id);
        if (!napi)
-               goto out;
+               return;
 
        if (!IS_ENABLED(CONFIG_PREEMPT_RT))
                preempt_disable();
@@ -6254,14 +6259,14 @@ restart:
                         */
                        if (val & (NAPIF_STATE_DISABLE | NAPIF_STATE_SCHED |
                                   NAPIF_STATE_IN_BUSY_POLL)) {
-                               if (prefer_busy_poll)
+                               if (flags & NAPI_F_PREFER_BUSY_POLL)
                                        set_bit(NAPI_STATE_PREFER_BUSY_POLL, &napi->state);
                                goto count;
                        }
                        if (cmpxchg(&napi->state, val,
                                    val | NAPIF_STATE_IN_BUSY_POLL |
                                          NAPIF_STATE_SCHED) != val) {
-                               if (prefer_busy_poll)
+                               if (flags & NAPI_F_PREFER_BUSY_POLL)
                                        set_bit(NAPI_STATE_PREFER_BUSY_POLL, &napi->state);
                                goto count;
                        }
@@ -6281,12 +6286,15 @@ count:
                        break;
 
                if (unlikely(need_resched())) {
+                       if (flags & NAPI_F_END_ON_RESCHED)
+                               break;
                        if (napi_poll)
-                               busy_poll_stop(napi, have_poll_lock, prefer_busy_poll, budget);
+                               busy_poll_stop(napi, have_poll_lock, flags, budget);
                        if (!IS_ENABLED(CONFIG_PREEMPT_RT))
                                preempt_enable();
                        rcu_read_unlock();
                        cond_resched();
+                       rcu_read_lock();
                        if (loop_end(loop_end_arg, start_time))
                                return;
                        goto restart;
@@ -6294,10 +6302,31 @@ count:
                cpu_relax();
        }
        if (napi_poll)
-               busy_poll_stop(napi, have_poll_lock, prefer_busy_poll, budget);
+               busy_poll_stop(napi, have_poll_lock, flags, budget);
        if (!IS_ENABLED(CONFIG_PREEMPT_RT))
                preempt_enable();
-out:
+}
+
+void napi_busy_loop_rcu(unsigned int napi_id,
+                       bool (*loop_end)(void *, unsigned long),
+                       void *loop_end_arg, bool prefer_busy_poll, u16 budget)
+{
+       unsigned flags = NAPI_F_END_ON_RESCHED;
+
+       if (prefer_busy_poll)
+               flags |= NAPI_F_PREFER_BUSY_POLL;
+
+       __napi_busy_loop(napi_id, loop_end, loop_end_arg, flags, budget);
+}
+
+void napi_busy_loop(unsigned int napi_id,
+                   bool (*loop_end)(void *, unsigned long),
+                   void *loop_end_arg, bool prefer_busy_poll, u16 budget)
+{
+       unsigned flags = prefer_busy_poll ? NAPI_F_PREFER_BUSY_POLL : 0;
+
+       rcu_read_lock();
+       __napi_busy_loop(napi_id, loop_end, loop_end_arg, flags, budget);
        rcu_read_unlock();
 }
 EXPORT_SYMBOL(napi_busy_loop);
@@ -9074,28 +9103,6 @@ bool netdev_port_same_parent_id(struct net_device *a, struct net_device *b)
 }
 EXPORT_SYMBOL(netdev_port_same_parent_id);
 
-static void netdev_dpll_pin_assign(struct net_device *dev, struct dpll_pin *dpll_pin)
-{
-#if IS_ENABLED(CONFIG_DPLL)
-       rtnl_lock();
-       dev->dpll_pin = dpll_pin;
-       rtnl_unlock();
-#endif
-}
-
-void netdev_dpll_pin_set(struct net_device *dev, struct dpll_pin *dpll_pin)
-{
-       WARN_ON(!dpll_pin);
-       netdev_dpll_pin_assign(dev, dpll_pin);
-}
-EXPORT_SYMBOL(netdev_dpll_pin_set);
-
-void netdev_dpll_pin_clear(struct net_device *dev)
-{
-       netdev_dpll_pin_assign(dev, NULL);
-}
-EXPORT_SYMBOL(netdev_dpll_pin_clear);
-
 /**
  *     dev_change_proto_down - set carrier according to proto_down.
  *
@@ -11551,6 +11558,7 @@ static struct pernet_operations __net_initdata netdev_net_ops = {
 
 static void __net_exit default_device_exit_net(struct net *net)
 {
+       struct netdev_name_node *name_node, *tmp;
        struct net_device *dev, *aux;
        /*
         * Push all migratable network devices back to the
@@ -11573,6 +11581,14 @@ static void __net_exit default_device_exit_net(struct net *net)
                snprintf(fb_name, IFNAMSIZ, "dev%d", dev->ifindex);
                if (netdev_name_in_use(&init_net, fb_name))
                        snprintf(fb_name, IFNAMSIZ, "dev%%d");
+
+               netdev_for_each_altname_safe(dev, name_node, tmp)
+                       if (netdev_name_in_use(&init_net, name_node->name)) {
+                               netdev_name_node_del(name_node);
+                               synchronize_rcu();
+                               __netdev_name_node_alt_destroy(name_node);
+                       }
+
                err = dev_change_net_namespace(dev, &init_net, fb_name);
                if (err) {
                        pr_emerg("%s: failed to move %s to init_net: %d\n",
@@ -11643,11 +11659,12 @@ static void __init net_dev_struct_check(void)
        CACHELINE_ASSERT_GROUP_SIZE(struct net_device, net_device_read_tx, 160);
 
        /* TXRX read-mostly hotpath */
+       CACHELINE_ASSERT_GROUP_MEMBER(struct net_device, net_device_read_txrx, lstats);
        CACHELINE_ASSERT_GROUP_MEMBER(struct net_device, net_device_read_txrx, flags);
        CACHELINE_ASSERT_GROUP_MEMBER(struct net_device, net_device_read_txrx, hard_header_len);
        CACHELINE_ASSERT_GROUP_MEMBER(struct net_device, net_device_read_txrx, features);
        CACHELINE_ASSERT_GROUP_MEMBER(struct net_device, net_device_read_txrx, ip6_ptr);
-       CACHELINE_ASSERT_GROUP_SIZE(struct net_device, net_device_read_txrx, 30);
+       CACHELINE_ASSERT_GROUP_SIZE(struct net_device, net_device_read_txrx, 38);
 
        /* RX read-mostly hotpath */
        CACHELINE_ASSERT_GROUP_MEMBER(struct net_device, net_device_read_rx, ptype_specific);
index cf93e188785ba7f0fd6e9428762bf02105eb3154..7480b4c8429808378f7c5ec499c4f479d5a4b285 100644 (file)
@@ -63,6 +63,9 @@ int dev_change_name(struct net_device *dev, const char *newname);
 
 #define netdev_for_each_altname(dev, namenode)                         \
        list_for_each_entry((namenode), &(dev)->name_node->list, list)
+#define netdev_for_each_altname_safe(dev, namenode, next)              \
+       list_for_each_entry_safe((namenode), (next), &(dev)->name_node->list, \
+                                list)
 
 int netdev_name_node_alt_create(struct net_device *dev, const char *name);
 int netdev_name_node_alt_destroy(struct net_device *dev, const char *name);
index 24061f29c9dd25bcf2e852471d0e8e394ff4121a..ef3e78b6a39c45b9487931e0b7fa438e722aac2e 100644 (file)
@@ -83,6 +83,7 @@
 #include <net/netfilter/nf_conntrack_bpf.h>
 #include <net/netkit.h>
 #include <linux/un.h>
+#include <net/xdp_sock_drv.h>
 
 #include "dev.h"
 
@@ -4092,10 +4093,46 @@ static int bpf_xdp_frags_increase_tail(struct xdp_buff *xdp, int offset)
        memset(skb_frag_address(frag) + skb_frag_size(frag), 0, offset);
        skb_frag_size_add(frag, offset);
        sinfo->xdp_frags_size += offset;
+       if (rxq->mem.type == MEM_TYPE_XSK_BUFF_POOL)
+               xsk_buff_get_tail(xdp)->data_end += offset;
 
        return 0;
 }
 
+static void bpf_xdp_shrink_data_zc(struct xdp_buff *xdp, int shrink,
+                                  struct xdp_mem_info *mem_info, bool release)
+{
+       struct xdp_buff *zc_frag = xsk_buff_get_tail(xdp);
+
+       if (release) {
+               xsk_buff_del_tail(zc_frag);
+               __xdp_return(NULL, mem_info, false, zc_frag);
+       } else {
+               zc_frag->data_end -= shrink;
+       }
+}
+
+static bool bpf_xdp_shrink_data(struct xdp_buff *xdp, skb_frag_t *frag,
+                               int shrink)
+{
+       struct xdp_mem_info *mem_info = &xdp->rxq->mem;
+       bool release = skb_frag_size(frag) == shrink;
+
+       if (mem_info->type == MEM_TYPE_XSK_BUFF_POOL) {
+               bpf_xdp_shrink_data_zc(xdp, shrink, mem_info, release);
+               goto out;
+       }
+
+       if (release) {
+               struct page *page = skb_frag_page(frag);
+
+               __xdp_return(page_address(page), mem_info, false, NULL);
+       }
+
+out:
+       return release;
+}
+
 static int bpf_xdp_frags_shrink_tail(struct xdp_buff *xdp, int offset)
 {
        struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
@@ -4110,12 +4147,7 @@ static int bpf_xdp_frags_shrink_tail(struct xdp_buff *xdp, int offset)
 
                len_free += shrink;
                offset -= shrink;
-
-               if (skb_frag_size(frag) == shrink) {
-                       struct page *page = skb_frag_page(frag);
-
-                       __xdp_return(page_address(page), &xdp->rxq->mem,
-                                    false, NULL);
+               if (bpf_xdp_shrink_data(xdp, frag, shrink)) {
                        n_frags_free++;
                } else {
                        skb_frag_size_sub(frag, shrink);
index 4c2e77bd12f4b17f57f115ce4f9b99fb1da0a875..358c44680d917dfe8b324ab3a2a86fe78de44930 100644 (file)
@@ -225,7 +225,7 @@ static void gso_test_func(struct kunit *test)
 
        segs = skb_segment(skb, features);
        if (IS_ERR(segs)) {
-               KUNIT_FAIL(test, "segs error %lld", PTR_ERR(segs));
+               KUNIT_FAIL(test, "segs error %pe", segs);
                goto free_gso_skb;
        } else if (!segs) {
                KUNIT_FAIL(test, "no segments");
index ffe5244e5597e806e1cbd2dc82894276e107e91c..278294aca66ababdf5f7d383833ff5496255b274 100644 (file)
@@ -94,11 +94,12 @@ netdev_nl_page_pool_get_dump(struct sk_buff *skb, struct netlink_callback *cb,
                        state->pp_id = pool->user.id;
                        err = fill(skb, pool, info);
                        if (err)
-                               break;
+                               goto out;
                }
 
                state->pp_id = 0;
        }
+out:
        mutex_unlock(&page_pools_lock);
        rtnl_unlock();
 
index f35c2e9984062ba4bed637eaeace4eb9e71dadc0..63de5c635842b6f9e6d92f2a28a69009e54ec68c 100644 (file)
@@ -33,9 +33,6 @@
 
 void reqsk_queue_alloc(struct request_sock_queue *queue)
 {
-       spin_lock_init(&queue->rskq_lock);
-
-       spin_lock_init(&queue->fastopenq.lock);
        queue->fastopenq.rskq_rst_head = NULL;
        queue->fastopenq.rskq_rst_tail = NULL;
        queue->fastopenq.qlen = 0;
index f6f29eb03ec277a1ea17ccc220fa7624bf6db092..bd50e9fe3234b6c252a199f05adcae2a09b1bbdd 100644 (file)
@@ -1020,14 +1020,17 @@ static size_t rtnl_xdp_size(void)
 static size_t rtnl_prop_list_size(const struct net_device *dev)
 {
        struct netdev_name_node *name_node;
-       size_t size;
+       unsigned int cnt = 0;
+
+       rcu_read_lock();
+       list_for_each_entry_rcu(name_node, &dev->name_node->list, list)
+               cnt++;
+       rcu_read_unlock();
 
-       if (list_empty(&dev->name_node->list))
+       if (!cnt)
                return 0;
-       size = nla_total_size(0);
-       list_for_each_entry(name_node, &dev->name_node->list, list)
-               size += nla_total_size(ALTIFNAMSIZ);
-       return size;
+
+       return nla_total_size(0) + cnt * nla_total_size(ALTIFNAMSIZ);
 }
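
The rewritten helper just counts the alternative names under RCU and scales
the per-attribute size. For readers unfamiliar with the netlink size
helpers, a worked example of the arithmetic with the constants from the
uapi headers (4-byte attribute header, 4-byte alignment, ALTIFNAMSIZ 128):

    #include <stdio.h>

    #define NLA_ALIGNTO     4
    #define NLA_ALIGN(len)  (((len) + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1))
    #define NLA_HDRLEN      NLA_ALIGN(4)    /* sizeof(struct nlattr) == 4 */
    #define ALTIFNAMSIZ     128

    static int nla_total_size(int payload)
    {
            return NLA_ALIGN(NLA_HDRLEN + payload);
    }

    int main(void)
    {
            int cnt = 3;    /* three alternative names on the device */

            /* nest header plus one padded string attribute per name */
            printf("%d bytes\n",
                   nla_total_size(0) + cnt * nla_total_size(ALTIFNAMSIZ));
            /* prints: 400 bytes  (4 + 3 * 132) */
            return 0;
    }
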
 
 static size_t rtnl_proto_down_size(const struct net_device *dev)
@@ -1054,7 +1057,7 @@ static size_t rtnl_dpll_pin_size(const struct net_device *dev)
 {
        size_t size = nla_total_size(0); /* nest IFLA_DPLL_PIN */
 
-       size += dpll_msg_pin_handle_size(netdev_dpll_pin(dev));
+       size += dpll_netdev_pin_handle_size(dev);
 
        return size;
 }
@@ -1789,7 +1792,7 @@ static int rtnl_fill_dpll_pin(struct sk_buff *skb,
        if (!dpll_pin_nest)
                return -EMSGSIZE;
 
-       ret = dpll_msg_add_pin_handle(skb, netdev_dpll_pin(dev));
+       ret = dpll_netdev_add_pin_handle(skb, dev);
        if (ret < 0)
                goto nest_cancel;
 
@@ -5166,10 +5169,9 @@ static int rtnl_bridge_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
        struct net *net = sock_net(skb->sk);
        struct ifinfomsg *ifm;
        struct net_device *dev;
-       struct nlattr *br_spec, *attr = NULL;
+       struct nlattr *br_spec, *attr, *br_flags_attr = NULL;
        int rem, err = -EOPNOTSUPP;
        u16 flags = 0;
-       bool have_flags = false;
 
        if (nlmsg_len(nlh) < sizeof(*ifm))
                return -EINVAL;
@@ -5187,11 +5189,11 @@ static int rtnl_bridge_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
        br_spec = nlmsg_find_attr(nlh, sizeof(struct ifinfomsg), IFLA_AF_SPEC);
        if (br_spec) {
                nla_for_each_nested(attr, br_spec, rem) {
-                       if (nla_type(attr) == IFLA_BRIDGE_FLAGS && !have_flags) {
+                       if (nla_type(attr) == IFLA_BRIDGE_FLAGS && !br_flags_attr) {
                                if (nla_len(attr) < sizeof(flags))
                                        return -EINVAL;
 
-                               have_flags = true;
+                               br_flags_attr = attr;
                                flags = nla_get_u16(attr);
                        }
 
@@ -5235,8 +5237,8 @@ static int rtnl_bridge_setlink(struct sk_buff *skb, struct nlmsghdr *nlh,
                }
        }
 
-       if (have_flags)
-               memcpy(nla_data(attr), &flags, sizeof(flags));
+       if (br_flags_attr)
+               memcpy(nla_data(br_flags_attr), &flags, sizeof(flags));
 out:
        return err;
 }
index 93ecfceac1bc49bd843728518215ade5ced374a5..4d75ef9d24bfa7cbffe642448f5116ac0b943ed2 100644 (file)
@@ -1226,8 +1226,11 @@ static void sk_psock_verdict_data_ready(struct sock *sk)
 
                rcu_read_lock();
                psock = sk_psock(sk);
-               if (psock)
-                       psock->saved_data_ready(sk);
+               if (psock) {
+                       read_lock_bh(&sk->sk_callback_lock);
+                       sk_psock_data_ready(sk, psock);
+                       read_unlock_bh(&sk->sk_callback_lock);
+               }
                rcu_read_unlock();
        }
 }
index 158dbdebce6a3693deb63e557e856d9cdd7500ae..5e78798456fd81dbd34e94021531340f7ba5ab0a 100644 (file)
 #include <linux/interrupt.h>
 #include <linux/poll.h>
 #include <linux/tcp.h>
+#include <linux/udp.h>
 #include <linux/init.h>
 #include <linux/highmem.h>
 #include <linux/user_namespace.h>
@@ -1187,6 +1188,17 @@ int sk_setsockopt(struct sock *sk, int level, int optname,
                 */
                WRITE_ONCE(sk->sk_txrehash, (u8)val);
                return 0;
+       case SO_PEEK_OFF:
+               {
+               int (*set_peek_off)(struct sock *sk, int val);
+
+               set_peek_off = READ_ONCE(sock->ops)->set_peek_off;
+               if (set_peek_off)
+                       ret = set_peek_off(sk, val);
+               else
+                       ret = -EOPNOTSUPP;
+               return ret;
+               }
        }
 
        sockopt_lock_sock(sk);
@@ -1429,18 +1441,6 @@ set_sndbuf:
                sock_valbool_flag(sk, SOCK_WIFI_STATUS, valbool);
                break;
 
-       case SO_PEEK_OFF:
-               {
-               int (*set_peek_off)(struct sock *sk, int val);
-
-               set_peek_off = READ_ONCE(sock->ops)->set_peek_off;
-               if (set_peek_off)
-                       ret = set_peek_off(sk, val);
-               else
-                       ret = -EOPNOTSUPP;
-               break;
-               }
-
        case SO_NOFCS:
                sock_valbool_flag(sk, SOCK_NOFCS, valbool);
                break;
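
Hoisting SO_PEEK_OFF above the locked switch lets it be served through the
protocol's set_peek_off callback without taking the socket lock. For
context, a small userspace demonstration of what the option does, on an
address family that has long supported it (AF_UNIX); error checking is
omitted for brevity:

    #include <stdio.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
            int sv[2], off = 0;
            char buf[4] = { 0 };

            socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
            send(sv[0], "abcdef", 6, 0);

            setsockopt(sv[1], SOL_SOCKET, SO_PEEK_OFF, &off, sizeof(off));
            recv(sv[1], buf, 3, MSG_PEEK);  /* peeks "abc", offset becomes 3 */
            printf("first peek:  %.3s\n", buf);
            recv(sv[1], buf, 3, MSG_PEEK);  /* peeks "def", not "abc" again */
            printf("second peek: %.3s\n", buf);

            close(sv[0]);
            close(sv[1]);
            return 0;
    }
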
@@ -4144,8 +4144,14 @@ bool sk_busy_loop_end(void *p, unsigned long start_time)
 {
        struct sock *sk = p;
 
-       return !skb_queue_empty_lockless(&sk->sk_receive_queue) ||
-              sk_busy_loop_timeout(sk, start_time);
+       if (!skb_queue_empty_lockless(&sk->sk_receive_queue))
+               return true;
+
+       if (sk_is_udp(sk) &&
+           !skb_queue_empty_lockless(&udp_sk(sk)->reader_queue))
+               return true;
+
+       return sk_busy_loop_timeout(sk, start_time);
 }
 EXPORT_SYMBOL(sk_busy_loop_end);
 #endif /* CONFIG_NET_RX_BUSY_POLL */
index 4275a2bc6d8e062052a88503b731d9599ca55d2a..7f0b093208d75b91e25cb78a73bece8ef2577831 100644 (file)
@@ -46,7 +46,7 @@ struct devlink_rel {
                u32 obj_index;
                devlink_rel_notify_cb_t *notify_cb;
                devlink_rel_cleanup_cb_t *cleanup_cb;
-               struct work_struct notify_work;
+               struct delayed_work notify_work;
        } nested_in;
 };
 
@@ -70,7 +70,7 @@ static void __devlink_rel_put(struct devlink_rel *rel)
 static void devlink_rel_nested_in_notify_work(struct work_struct *work)
 {
        struct devlink_rel *rel = container_of(work, struct devlink_rel,
-                                              nested_in.notify_work);
+                                              nested_in.notify_work.work);
        struct devlink *devlink;
 
        devlink = devlinks_xa_get(rel->nested_in.devlink_index);
@@ -96,13 +96,13 @@ rel_put:
        return;
 
 reschedule_work:
-       schedule_work(&rel->nested_in.notify_work);
+       schedule_delayed_work(&rel->nested_in.notify_work, 1);
 }
 
 static void devlink_rel_nested_in_notify_work_schedule(struct devlink_rel *rel)
 {
        __devlink_rel_get(rel);
-       schedule_work(&rel->nested_in.notify_work);
+       schedule_delayed_work(&rel->nested_in.notify_work, 0);
 }
 
 static struct devlink_rel *devlink_rel_alloc(void)
@@ -123,8 +123,8 @@ static struct devlink_rel *devlink_rel_alloc(void)
        }
 
        refcount_set(&rel->refcount, 1);
-       INIT_WORK(&rel->nested_in.notify_work,
-                 &devlink_rel_nested_in_notify_work);
+       INIT_DELAYED_WORK(&rel->nested_in.notify_work,
+                         &devlink_rel_nested_in_notify_work);
        return rel;
 }
 
@@ -529,14 +529,20 @@ static int __init devlink_init(void)
 {
        int err;
 
-       err = genl_register_family(&devlink_nl_family);
-       if (err)
-               goto out;
        err = register_pernet_subsys(&devlink_pernet_ops);
        if (err)
                goto out;
+       err = genl_register_family(&devlink_nl_family);
+       if (err)
+               goto out_unreg_pernet_subsys;
        err = register_netdevice_notifier(&devlink_port_netdevice_nb);
+       if (!err)
+               return 0;
+
+       genl_unregister_family(&devlink_nl_family);
 
+out_unreg_pernet_subsys:
+       unregister_pernet_subsys(&devlink_pernet_ops);
 out:
        WARN_ON(err);
        return err;
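
The reordering registers the pernet subsystem before the generic netlink
family that can immediately start dispatching requests into it, and the new
error path unwinds completed steps in reverse. A minimal sketch of that
init/teardown discipline with hypothetical step names:

    #include <stdio.h>

    static int  step_a(void)  { puts("A up");   return 0; }
    static int  step_b(void)  { puts("B up");   return 0; }
    static int  step_c(void)  { puts("C up");   return -1; } /* pretend failure */
    static void undo_b(void)  { puts("B down"); }
    static void undo_a(void)  { puts("A down"); }

    static int subsys_init(void)
    {
            int err;

            err = step_a();
            if (err)
                    goto out;
            err = step_b();
            if (err)
                    goto out_undo_a;
            err = step_c();
            if (!err)
                    return 0;

            undo_b();               /* unwind in reverse registration order */
    out_undo_a:
            undo_a();
    out:
            return err;
    }

    int main(void)
    {
            printf("init: %d\n", subsys_init());
            return 0;
    }
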
index 62e54e152ecf1fa601cb2cd755988c9ff97670af..4b2d46ccfe484f1ae2c21b5b2921a113d59e13f5 100644 (file)
@@ -583,7 +583,7 @@ devlink_nl_port_get_dump_one(struct sk_buff *msg, struct devlink *devlink,
 
        xa_for_each_start(&devlink->ports, port_index, devlink_port, state->idx) {
                err = devlink_nl_port_fill(msg, devlink_port,
-                                          DEVLINK_CMD_NEW,
+                                          DEVLINK_CMD_PORT_NEW,
                                           NETLINK_CB(cb->skb).portid,
                                           cb->nlh->nlmsg_seq, flags,
                                           cb->extack);
@@ -674,7 +674,7 @@ static int devlink_port_function_validate(struct devlink_port *devlink_port,
                return -EOPNOTSUPP;
        }
        if (tb[DEVLINK_PORT_FN_ATTR_STATE] && !ops->port_fn_state_set) {
-               NL_SET_ERR_MSG_ATTR(extack, tb[DEVLINK_PORT_FUNCTION_ATTR_HW_ADDR],
+               NL_SET_ERR_MSG_ATTR(extack, tb[DEVLINK_PORT_FN_ATTR_STATE],
                                    "Function does not support state setting");
                return -EOPNOTSUPP;
        }
index 16ed7bfd29e4fbf39e204278950834b9e69b9f97..34fd1d9b2db861de15f7ce68828abedbf6bc771c 100644 (file)
@@ -471,7 +471,10 @@ static void handshake_req_destroy_test1(struct kunit *test)
        handshake_req_cancel(sock->sk);
 
        /* Act */
-       fput(filp);
+       /* Ensure the close/release/put process has run to
+        * completion before checking the result.
+        */
+       __fput_sync(filp);
 
        /* Assert */
        KUNIT_EXPECT_PTR_EQ(test, handshake_req_destroy_test, req);
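
fput() only schedules the final release of a file (the last-reference
cleanup runs later via task work), so the old test could assert before
->release() had actually executed; __fput_sync() forces that cleanup to
complete before returning. The resulting test shape, as a hedged sketch
(release_side_effect_seen() is a hypothetical probe, not handshake code):

static void example_sync_close(struct kunit *test, struct file *filp)
{
        /* fput(filp) would merely queue the release for later */
        __fput_sync(filp);      /* ->release() has run when this returns */

        KUNIT_EXPECT_TRUE(test, release_side_effect_seen());
}
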
index 7ceb9ac6e7309372a5931f92c9b8adcc390af5f4..9d71b66183daf94e19945d75cfb5c33df6ce346c 100644 (file)
@@ -308,7 +308,7 @@ static void send_hsr_supervision_frame(struct hsr_port *master,
 
        skb = hsr_init_skb(master);
        if (!skb) {
-               WARN_ONCE(1, "HSR: Could not send supervision frame\n");
+               netdev_warn_once(master->dev, "HSR: Could not send supervision frame\n");
                return;
        }
 
@@ -355,7 +355,7 @@ static void send_prp_supervision_frame(struct hsr_port *master,
 
        skb = hsr_init_skb(master);
        if (!skb) {
-               WARN_ONCE(1, "PRP: Could not send supervision frame\n");
+               netdev_warn_once(master->dev, "PRP: Could not send supervision frame\n");
                return;
        }
 
index 80cdc6f6b34c97601961179c4839dc68c0a6d2e1..5d68cb181695d9a9f83809142a0300b8ddad5f53 100644 (file)
@@ -83,7 +83,7 @@ static bool is_supervision_frame(struct hsr_priv *hsr, struct sk_buff *skb)
                return false;
 
        /* Get next tlv */
-       total_length += sizeof(struct hsr_sup_tlv) + hsr_sup_tag->tlv.HSR_TLV_length;
+       total_length += hsr_sup_tag->tlv.HSR_TLV_length;
        if (!pskb_may_pull(skb, total_length))
                return false;
        skb_pull(skb, total_length);
@@ -435,7 +435,7 @@ static void hsr_forward_do(struct hsr_frame_info *frame)
                        continue;
 
                /* Don't send frame over port where it has been sent before.
-                * Also fro SAN, this shouldn't be done.
+                * Also for SAN, this shouldn't be done.
                 */
                if (!frame->is_from_san &&
                    hsr_register_frame_out(port, frame->node_src,
index 835f4f9d98d25559fb8965a7531c6863448a55c2..a5a820ee2026691afdd5ca3255962b5116fca290 100644 (file)
@@ -330,6 +330,9 @@ lookup_protocol:
        if (INET_PROTOSW_REUSE & answer_flags)
                sk->sk_reuse = SK_CAN_REUSE;
 
+       if (INET_PROTOSW_ICSK & answer_flags)
+               inet_init_csk_locks(sk);
+
        inet = inet_sk(sk);
        inet_assign_bit(IS_ICSK, sk, INET_PROTOSW_ICSK & answer_flags);
 
@@ -1625,10 +1628,12 @@ EXPORT_SYMBOL(inet_current_timestamp);
 
 int inet_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
 {
-       if (sk->sk_family == AF_INET)
+       unsigned int family = READ_ONCE(sk->sk_family);
+
+       if (family == AF_INET)
                return ip_recv_error(sk, msg, len, addr_len);
 #if IS_ENABLED(CONFIG_IPV6)
-       if (sk->sk_family == AF_INET6)
+       if (family == AF_INET6)
                return pingv6_ops.ipv6_recv_error(sk, msg, len, addr_len);
 #endif
        return -EINVAL;
index a2e6e1fdf82be44c15daefa2a423967ccd8999f7..64aec3dff8ec85135a8d14e5618900927eb59959 100644 (file)
@@ -597,5 +597,6 @@ static void __exit ah4_fini(void)
 
 module_init(ah4_init);
 module_exit(ah4_fini);
+MODULE_DESCRIPTION("IPv4 AH transformation library");
 MODULE_LICENSE("GPL");
 MODULE_ALIAS_XFRM_TYPE(AF_INET, XFRM_PROTO_AH);
index 9456f5bb35e5d9e97d6c05be21561b435e2b704a..0d0d725b46ad0c56b19b6356f6d3e6be8bdcae83 100644 (file)
@@ -1125,7 +1125,8 @@ static int arp_req_get(struct arpreq *r, struct net_device *dev)
        if (neigh) {
                if (!(READ_ONCE(neigh->nud_state) & NUD_NOARP)) {
                        read_lock_bh(&neigh->lock);
-                       memcpy(r->arp_ha.sa_data, neigh->ha, dev->addr_len);
+                       memcpy(r->arp_ha.sa_data, neigh->ha,
+                              min(dev->addr_len, sizeof(r->arp_ha.sa_data_min)));
                        r->arp_flags = arp_state_to_flags(neigh);
                        read_unlock_bh(&neigh->lock);
                        r->arp_ha.sa_family = dev->type;
index ca0ff15dc8fa358b81a804eda7398ecd10f00743..bc74f131fe4dfad327e71c1a8f0a4b66cdc526e5 100644 (file)
@@ -1825,6 +1825,21 @@ done:
        return err;
 }
 
+/* Combine dev_addr_genid and dev_base_seq to detect changes.
+ */
+static u32 inet_base_seq(const struct net *net)
+{
+       u32 res = atomic_read(&net->ipv4.dev_addr_genid) +
+                 net->dev_base_seq;
+
+       /* Must not return 0 (see nl_dump_check_consistent()).
+        * Choose a value far away from 0.
+        */
+       if (!res)
+               res = 0x80000000;
+       return res;
+}
+
 static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb)
 {
        const struct nlmsghdr *nlh = cb->nlh;
@@ -1876,8 +1891,7 @@ static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb)
                idx = 0;
                head = &tgt_net->dev_index_head[h];
                rcu_read_lock();
-               cb->seq = atomic_read(&tgt_net->ipv4.dev_addr_genid) ^
-                         tgt_net->dev_base_seq;
+               cb->seq = inet_base_seq(tgt_net);
                hlist_for_each_entry_rcu(dev, head, index_hlist) {
                        if (idx < s_idx)
                                goto cont;
@@ -2278,8 +2292,7 @@ static int inet_netconf_dump_devconf(struct sk_buff *skb,
                idx = 0;
                head = &net->dev_index_head[h];
                rcu_read_lock();
-               cb->seq = atomic_read(&net->ipv4.dev_addr_genid) ^
-                         net->dev_base_seq;
+               cb->seq = inet_base_seq(net);
                hlist_for_each_entry_rcu(dev, head, index_hlist) {
                        if (idx < s_idx)
                                goto cont;
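
The reason inet_base_seq() must never return 0 lives in the netlink dump
machinery: the consistency helper records the sequence in cb->prev_seq and
skips the comparison whenever prev_seq is 0, so a genuine sequence of 0
would disable change detection for the following dump messages.
Approximately, as a sketch of that helper's logic rather than a verbatim
copy:

static inline void nl_dump_check_consistent_sketch(struct netlink_callback *cb,
                                                   struct nlmsghdr *nlh)
{
        /* prev_seq == 0 means "nothing recorded yet", which is why a
         * real sequence value of 0 would make the check a no-op
         */
        if (cb->prev_seq && cb->seq != cb->prev_seq)
                nlh->nlmsg_flags |= NLM_F_DUMP_INTR;
        cb->prev_seq = cb->seq;
}
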
index 4ccfc104f13a517ec15e5b609502708c289e3b57..4dd9e50406720cfc90d280f61e4616d6a2e58d3c 100644 (file)
@@ -1247,5 +1247,6 @@ static void __exit esp4_fini(void)
 
 module_init(esp4_init);
 module_exit(esp4_fini);
+MODULE_DESCRIPTION("IPv4 ESP transformation library");
 MODULE_LICENSE("GPL");
 MODULE_ALIAS_XFRM_TYPE(AF_INET, XFRM_PROTO_ESP);
index 8e2eb1793685ecd72da75bc841af12b90e85fcc7..459af1f8973958611c43936b0894f6154d23b99a 100644 (file)
@@ -727,6 +727,10 @@ out:
        }
        if (req)
                reqsk_put(req);
+
+       if (newsk)
+               inet_init_csk_locks(newsk);
+
        return newsk;
 out_err:
        newsk = NULL;
index 93e9193df54461b25c61089bd5db4dd33c32dab6..308ff34002ea6b5e0620004f65ffd833087afbc1 100644 (file)
@@ -1130,10 +1130,33 @@ ok:
        return 0;
 
 error:
+       if (sk_hashed(sk)) {
+               spinlock_t *lock = inet_ehash_lockp(hinfo, sk->sk_hash);
+
+               sock_prot_inuse_add(net, sk->sk_prot, -1);
+
+               spin_lock(lock);
+               sk_nulls_del_node_init_rcu(sk);
+               spin_unlock(lock);
+
+               sk->sk_hash = 0;
+               inet_sk(sk)->inet_sport = 0;
+               inet_sk(sk)->inet_num = 0;
+
+               if (tw)
+                       inet_twsk_bind_unhash(tw, hinfo);
+       }
+
        spin_unlock(&head2->lock);
        if (tb_created)
                inet_bind_bucket_destroy(hinfo->bind_bucket_cachep, tb);
-       spin_unlock_bh(&head->lock);
+       spin_unlock(&head->lock);
+
+       if (tw)
+               inet_twsk_deschedule_put(tw);
+
+       local_bh_enable();
+
        return -ENOMEM;
 }
 
index 5169c3c72cffe49cef613e69889d139db867ff74..6b9cf5a24c19ff06634f7841141b8a30639b8d17 100644 (file)
@@ -1793,6 +1793,7 @@ static void __exit ipgre_fini(void)
 
 module_init(ipgre_init);
 module_exit(ipgre_fini);
+MODULE_DESCRIPTION("IPv4 GRE tunnels over IP library");
 MODULE_LICENSE("GPL");
 MODULE_ALIAS_RTNL_LINK("gre");
 MODULE_ALIAS_RTNL_LINK("gretap");
index b06f678b03a19b806fd14764a4caad60caf02919..67d846622365e8da9c2295f76943a504d16b066f 100644 (file)
@@ -972,8 +972,8 @@ static int __ip_append_data(struct sock *sk,
        unsigned int maxfraglen, fragheaderlen, maxnonfragsize;
        int csummode = CHECKSUM_NONE;
        struct rtable *rt = (struct rtable *)cork->dst;
+       bool paged, hold_tskey, extra_uref = false;
        unsigned int wmem_alloc_delta = 0;
-       bool paged, extra_uref = false;
        u32 tskey = 0;
 
        skb = skb_peek_tail(queue);
@@ -982,10 +982,6 @@ static int __ip_append_data(struct sock *sk,
        mtu = cork->gso_size ? IP_MAX_MTU : cork->fragsize;
        paged = !!cork->gso_size;
 
-       if (cork->tx_flags & SKBTX_ANY_TSTAMP &&
-           READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID)
-               tskey = atomic_inc_return(&sk->sk_tskey) - 1;
-
        hh_len = LL_RESERVED_SPACE(rt->dst.dev);
 
        fragheaderlen = sizeof(struct iphdr) + (opt ? opt->optlen : 0);
@@ -1052,6 +1048,11 @@ static int __ip_append_data(struct sock *sk,
 
        cork->length += length;
 
+       hold_tskey = cork->tx_flags & SKBTX_ANY_TSTAMP &&
+                    READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID;
+       if (hold_tskey)
+               tskey = atomic_inc_return(&sk->sk_tskey) - 1;
+
        /* So, what's going on in the loop below?
         *
         * We use calculated fragment length to generate chained skb,
@@ -1274,6 +1275,8 @@ error:
        cork->length -= length;
        IP_INC_STATS(sock_net(sk), IPSTATS_MIB_OUTDISCARDS);
        refcount_add(wmem_alloc_delta, &sk->sk_wmem_alloc);
+       if (hold_tskey)
+               atomic_dec(&sk->sk_tskey);
        return err;
 }
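
Deferring the sk_tskey increment until after the last early return, and
undoing it on the error path, keeps SOF_TIMESTAMPING_OPT_ID identifiers
dense: an append that fails no longer burns an ID and leaves a gap in the
sequence the application sees. The pattern in isolation (a sketch;
example_append() and will_fail are illustrative):

static int example_append(struct sock *sk, bool will_fail)
{
        bool hold_tskey;
        u32 tskey = 0;

        /* claim the ID only once nothing before the copy loop can fail */
        hold_tskey = READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID;
        if (hold_tskey)
                tskey = atomic_inc_return(&sk->sk_tskey) - 1;

        if (will_fail)
                goto error;
        pr_debug("tagging skbs with tskey %u\n", tskey);
        return 0;

error:
        if (hold_tskey)
                atomic_dec(&sk->sk_tskey);      /* hand the ID back */
        return -ENOMEM;
}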
 
@@ -1287,6 +1290,12 @@ static int ip_setup_cork(struct sock *sk, struct inet_cork *cork,
        if (unlikely(!rt))
                return -EFAULT;
 
+       cork->fragsize = ip_sk_use_pmtu(sk) ?
+                        dst_mtu(&rt->dst) : READ_ONCE(rt->dst.dev->mtu);
+
+       if (!inetdev_valid_mtu(cork->fragsize))
+               return -ENETUNREACH;
+
        /*
         * setup for corking.
         */
@@ -1303,12 +1312,6 @@ static int ip_setup_cork(struct sock *sk, struct inet_cork *cork,
                cork->addr = ipc->addr;
        }
 
-       cork->fragsize = ip_sk_use_pmtu(sk) ?
-                        dst_mtu(&rt->dst) : READ_ONCE(rt->dst.dev->mtu);
-
-       if (!inetdev_valid_mtu(cork->fragsize))
-               return -ENETUNREACH;
-
        cork->gso_size = ipc->gso_size;
 
        cork->dst = &rt->dst;
index 7aa9dc0e6760df6c9980252854014ab6fdd1c3f7..21d2ffa919e98b41ed325f978ae573b9f25f4d71 100644 (file)
@@ -1363,12 +1363,13 @@ e_inval:
  * ipv4_pktinfo_prepare - transfer some info from rtable to skb
  * @sk: socket
  * @skb: buffer
+ * @drop_dst: if true, drops skb dst
  *
  * To support IP_CMSG_PKTINFO option, we store rt_iif and specific
  * destination in skb->cb[] before dst drop.
  * This way, receiver doesn't make cache line misses to read rtable.
  */
-void ipv4_pktinfo_prepare(const struct sock *sk, struct sk_buff *skb)
+void ipv4_pktinfo_prepare(const struct sock *sk, struct sk_buff *skb, bool drop_dst)
 {
        struct in_pktinfo *pktinfo = PKTINFO_SKB_CB(skb);
        bool prepare = inet_test_bit(PKTINFO, sk) ||
@@ -1397,7 +1398,8 @@ void ipv4_pktinfo_prepare(const struct sock *sk, struct sk_buff *skb)
                pktinfo->ipi_ifindex = 0;
                pktinfo->ipi_spec_dst.s_addr = 0;
        }
-       skb_dst_drop(skb);
+       if (drop_dst)
+               skb_dst_drop(skb);
 }
 
 int ip_setsockopt(struct sock *sk, int level, int optname, sockptr_t optval,
index beeae624c412d752bd5ee5d459a88f57640445e9..1b6981de3f29514dac72161be02f3ac6e4625551 100644 (file)
@@ -554,6 +554,20 @@ static int tnl_update_pmtu(struct net_device *dev, struct sk_buff *skb,
        return 0;
 }
 
+static void ip_tunnel_adj_headroom(struct net_device *dev, unsigned int headroom)
+{
+       /* we must cap headroom to some upper limit, else pskb_expand_head
+        * will overflow header offsets in skb_headers_offset_update().
+        */
+       static const unsigned int max_allowed = 512;
+
+       if (headroom > max_allowed)
+               headroom = max_allowed;
+
+       if (headroom > READ_ONCE(dev->needed_headroom))
+               WRITE_ONCE(dev->needed_headroom, headroom);
+}
+
 void ip_md_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
                       u8 proto, int tunnel_hlen)
 {
@@ -632,13 +646,13 @@ void ip_md_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
        }
 
        headroom += LL_RESERVED_SPACE(rt->dst.dev) + rt->dst.header_len;
-       if (headroom > READ_ONCE(dev->needed_headroom))
-               WRITE_ONCE(dev->needed_headroom, headroom);
-
-       if (skb_cow_head(skb, READ_ONCE(dev->needed_headroom))) {
+       if (skb_cow_head(skb, headroom)) {
                ip_rt_put(rt);
                goto tx_dropped;
        }
+
+       ip_tunnel_adj_headroom(dev, headroom);
+
        iptunnel_xmit(NULL, rt, skb, fl4.saddr, fl4.daddr, proto, tos, ttl,
                      df, !net_eq(tunnel->net, dev_net(dev)));
        return;
@@ -818,16 +832,16 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
 
        max_headroom = LL_RESERVED_SPACE(rt->dst.dev) + sizeof(struct iphdr)
                        + rt->dst.header_len + ip_encap_hlen(&tunnel->encap);
-       if (max_headroom > READ_ONCE(dev->needed_headroom))
-               WRITE_ONCE(dev->needed_headroom, max_headroom);
 
-       if (skb_cow_head(skb, READ_ONCE(dev->needed_headroom))) {
+       if (skb_cow_head(skb, max_headroom)) {
                ip_rt_put(rt);
                DEV_STATS_INC(dev, tx_dropped);
                kfree_skb(skb);
                return;
        }
 
+       ip_tunnel_adj_headroom(dev, max_headroom);
+
        iptunnel_xmit(NULL, rt, skb, fl4.saddr, fl4.daddr, protocol, tos, ttl,
                      df, !net_eq(tunnel->net, dev_net(dev)));
        return;
@@ -1298,4 +1312,5 @@ void ip_tunnel_setup(struct net_device *dev, unsigned int net_id)
 }
 EXPORT_SYMBOL_GPL(ip_tunnel_setup);
 
+MODULE_DESCRIPTION("IPv4 tunnel implementation library");
 MODULE_LICENSE("GPL");
index 586b1b3e35b805d46158531ae8e7b49122abbaa7..80ccd6661aa32f2b60a720a18deec26e9e2cc18d 100644 (file)
@@ -332,7 +332,7 @@ static int iptunnel_pmtud_build_icmpv6(struct sk_buff *skb, int mtu)
        };
        skb_reset_network_header(skb);
 
-       csum = csum_partial(icmp6h, len, 0);
+       csum = skb_checksum(skb, skb_transport_offset(skb), len, 0);
        icmp6h->icmp6_cksum = csum_ipv6_magic(&nip6h->saddr, &nip6h->daddr, len,
                                              IPPROTO_ICMPV6, csum);
 
index 9ab9b3ebe0cd1a9e95f489d98c5a3d89c7c0edf6..d1d6bb28ed6e95c6e9c247bf1df1b27287bc8328 100644 (file)
@@ -721,6 +721,7 @@ static void __exit vti_fini(void)
 
 module_init(vti_init);
 module_exit(vti_fini);
+MODULE_DESCRIPTION("Virtual (secure) IP tunneling library");
 MODULE_LICENSE("GPL");
 MODULE_ALIAS_RTNL_LINK("vti");
 MODULE_ALIAS_NETDEV("ip_vti0");
index 27b8f83c6ea200314f41a29ecfea494b9ddef2ca..03afa3871efc53b5af543e7d53283be69a02f818 100644 (file)
@@ -658,6 +658,7 @@ static void __exit ipip_fini(void)
 
 module_init(ipip_init);
 module_exit(ipip_fini);
+MODULE_DESCRIPTION("IP/IP protocol decoder library");
 MODULE_LICENSE("GPL");
 MODULE_ALIAS_RTNL_LINK("ipip");
 MODULE_ALIAS_NETDEV("tunl0");
index 9d6f59531b3a0b0bc082e1f1febf4568368580b9..3622298365105d99c0277f1c1616fb5fc63cdc2d 100644 (file)
@@ -1073,7 +1073,7 @@ static int ipmr_cache_report(const struct mr_table *mrt,
                msg = (struct igmpmsg *)skb_network_header(skb);
                msg->im_vif = vifi;
                msg->im_vif_hi = vifi >> 8;
-               ipv4_pktinfo_prepare(mroute_sk, pkt);
+               ipv4_pktinfo_prepare(mroute_sk, pkt, false);
                memcpy(skb->cb, pkt->cb, sizeof(skb->cb));
                /* Add our header */
                igmp = skb_put(skb, sizeof(struct igmphdr));
index 27da9d7294c0b4fb9027bb7feb704063dc6302db..aea89326c69793f94bb8489cdf0c93b7524ba3fc 100644 (file)
@@ -292,7 +292,7 @@ static int raw_rcv_skb(struct sock *sk, struct sk_buff *skb)
 
        /* Charge it to the socket. */
 
-       ipv4_pktinfo_prepare(sk, skb);
+       ipv4_pktinfo_prepare(sk, skb, true);
        if (sock_queue_rcv_skb_reason(sk, skb, &reason) < 0) {
                kfree_skb_reason(skb, reason);
                return NET_RX_DROP;
index 1baa484d21902d2492fc2830d960100dc09683bf..c82dc42f57c65df112f79080ff407cd98d11ce68 100644 (file)
@@ -722,6 +722,7 @@ void tcp_push(struct sock *sk, int flags, int mss_now,
                if (!test_bit(TSQ_THROTTLED, &sk->sk_tsq_flags)) {
                        NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPAUTOCORKING);
                        set_bit(TSQ_THROTTLED, &sk->sk_tsq_flags);
+                       smp_mb__after_atomic();
                }
                /* It is possible TX completion already happened
                 * before we set TSQ_THROTTLED.
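
The one-line addition matters because set_bit() is atomic but is not
guaranteed to be a memory barrier on every architecture; without
smp_mb__after_atomic(), the TSQ_THROTTLED store could be reordered after
the read that decides whether a TX completion already ran. The generic
shape of the pattern (names illustrative):

static void mark_throttled_and_recheck(unsigned long *flags,
                                       const atomic_t *inflight)
{
        set_bit(0, flags);              /* publish "I am throttled" */
        smp_mb__after_atomic();         /* order the store vs. the read */

        /* the completion may have run before our bit was visible */
        if (atomic_read(inflight) == 0)
                clear_bit(0, flags);
}
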
@@ -1785,7 +1786,17 @@ static skb_frag_t *skb_advance_to_frag(struct sk_buff *skb, u32 offset_skb,
 
 static bool can_map_frag(const skb_frag_t *frag)
 {
-       return skb_frag_size(frag) == PAGE_SIZE && !skb_frag_off(frag);
+       struct page *page;
+
+       if (skb_frag_size(frag) != PAGE_SIZE || skb_frag_off(frag))
+               return false;
+
+       page = skb_frag_page(frag);
+
+       if (PageCompound(page) || page->mapping)
+               return false;
+
+       return true;
 }
 
 static int find_next_mappable_frag(const skb_frag_t *frag,
@@ -4604,7 +4615,8 @@ static void __init tcp_struct_check(void)
        CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_read_txrx, prr_out);
        CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_read_txrx, lost_out);
        CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_read_txrx, sacked_out);
-       CACHELINE_ASSERT_GROUP_SIZE(struct tcp_sock, tcp_sock_read_txrx, 31);
+       CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_read_txrx, scaling_ratio);
+       CACHELINE_ASSERT_GROUP_SIZE(struct tcp_sock, tcp_sock_read_txrx, 32);
 
        /* RX read-mostly hotpath cache lines */
        CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_read_rx, copied_seq);
index 5048c47c79b2848a1b42032dceb4884f43cd4748..4c1f836aae38b7a75b912db9e2cdc84ccfc48e56 100644 (file)
@@ -294,4 +294,5 @@ static void __exit tunnel4_fini(void)
 
 module_init(tunnel4_init);
 module_exit(tunnel4_fini);
+MODULE_DESCRIPTION("IPv4 XFRM tunnel library");
 MODULE_LICENSE("GPL");
index 148ffb007969f57edc4be8ec1c235062ad49b503..e474b201900f9317069a31e4b507964fe11b2297 100644 (file)
@@ -1589,12 +1589,7 @@ int udp_init_sock(struct sock *sk)
 
 void skb_consume_udp(struct sock *sk, struct sk_buff *skb, int len)
 {
-       if (unlikely(READ_ONCE(sk->sk_peek_off) >= 0)) {
-               bool slow = lock_sock_fast(sk);
-
-               sk_peek_offset_bwd(sk, len);
-               unlock_sock_fast(sk, slow);
-       }
+       sk_peek_offset_bwd(sk, len);
 
        if (!skb_unref(skb))
                return;
@@ -2169,7 +2164,7 @@ static int udp_queue_rcv_one_skb(struct sock *sk, struct sk_buff *skb)
 
        udp_csum_pull_header(skb);
 
-       ipv4_pktinfo_prepare(sk, skb);
+       ipv4_pktinfo_prepare(sk, skb, true);
        return __udp_queue_rcv_skb(sk, skb);
 
 csum_error:
index a87defb2b16729886d20fcec53cea939c7fea4b7..860aff5f85990252607651c173a6d84006e5afe1 100644 (file)
@@ -253,4 +253,5 @@ struct rtable *udp_tunnel_dst_lookup(struct sk_buff *skb,
 }
 EXPORT_SYMBOL_GPL(udp_tunnel_dst_lookup);
 
+MODULE_DESCRIPTION("IPv4 Foo over UDP tunnel driver");
 MODULE_LICENSE("GPL");
index 8489fa10658377eb0942943e537a453d781f4520..8cb266af139311b48af474fc19eff32c6d8b5e37 100644 (file)
@@ -114,5 +114,6 @@ static void __exit ipip_fini(void)
 
 module_init(ipip_init);
 module_exit(ipip_fini);
+MODULE_DESCRIPTION("IPv4 XFRM tunnel driver");
 MODULE_LICENSE("GPL");
 MODULE_ALIAS_XFRM_TYPE(AF_INET, XFRM_PROTO_IPIP);
index 733ace18806c61f487d83081dc6d39d079959f77..055230b669cf21d87738a4371543c599c3476f98 100644 (file)
@@ -708,6 +708,22 @@ errout:
        return err;
 }
 
+/* Combine dev_addr_genid and dev_base_seq to detect changes.
+ */
+static u32 inet6_base_seq(const struct net *net)
+{
+       u32 res = atomic_read(&net->ipv6.dev_addr_genid) +
+                 net->dev_base_seq;
+
+       /* Must not return 0 (see nl_dump_check_consistent()).
+        * Choose a value far away from 0.
+        */
+       if (!res)
+               res = 0x80000000;
+       return res;
+}
+
+
 static int inet6_netconf_dump_devconf(struct sk_buff *skb,
                                      struct netlink_callback *cb)
 {
@@ -741,8 +757,7 @@ static int inet6_netconf_dump_devconf(struct sk_buff *skb,
                idx = 0;
                head = &net->dev_index_head[h];
                rcu_read_lock();
-               cb->seq = atomic_read(&net->ipv6.dev_addr_genid) ^
-                         net->dev_base_seq;
+               cb->seq = inet6_base_seq(net);
                hlist_for_each_entry_rcu(dev, head, index_hlist) {
                        if (idx < s_idx)
                                goto cont;
@@ -5362,7 +5377,7 @@ static int inet6_dump_addr(struct sk_buff *skb, struct netlink_callback *cb,
        }
 
        rcu_read_lock();
-       cb->seq = atomic_read(&tgt_net->ipv6.dev_addr_genid) ^ tgt_net->dev_base_seq;
+       cb->seq = inet6_base_seq(tgt_net);
        for (h = s_h; h < NETDEV_HASHENTRIES; h++, s_idx = 0) {
                idx = 0;
                head = &tgt_net->dev_index_head[h];
@@ -5494,9 +5509,10 @@ static int inet6_rtm_getaddr(struct sk_buff *in_skb, struct nlmsghdr *nlh,
        }
 
        addr = extract_addr(tb[IFA_ADDRESS], tb[IFA_LOCAL], &peer);
-       if (!addr)
-               return -EINVAL;
-
+       if (!addr) {
+               err = -EINVAL;
+               goto errout;
+       }
        ifm = nlmsg_data(nlh);
        if (ifm->ifa_index)
                dev = dev_get_by_index(tgt_net, ifm->ifa_index);
index 507a8353a6bdb94cd5e83aad6efd877d84cfdc85..c008d21925d7f4afa31cc55deec0ccc321cdab04 100644 (file)
@@ -220,19 +220,26 @@ const struct ipv6_stub *ipv6_stub __read_mostly = &(struct ipv6_stub) {
 EXPORT_SYMBOL_GPL(ipv6_stub);
 
 /* IPv6 Wildcard Address and Loopback Address defined by RFC2553 */
-const struct in6_addr in6addr_loopback = IN6ADDR_LOOPBACK_INIT;
+const struct in6_addr in6addr_loopback __aligned(BITS_PER_LONG/8)
+       = IN6ADDR_LOOPBACK_INIT;
 EXPORT_SYMBOL(in6addr_loopback);
-const struct in6_addr in6addr_any = IN6ADDR_ANY_INIT;
+const struct in6_addr in6addr_any __aligned(BITS_PER_LONG/8)
+       = IN6ADDR_ANY_INIT;
 EXPORT_SYMBOL(in6addr_any);
-const struct in6_addr in6addr_linklocal_allnodes = IN6ADDR_LINKLOCAL_ALLNODES_INIT;
+const struct in6_addr in6addr_linklocal_allnodes __aligned(BITS_PER_LONG/8)
+       = IN6ADDR_LINKLOCAL_ALLNODES_INIT;
 EXPORT_SYMBOL(in6addr_linklocal_allnodes);
-const struct in6_addr in6addr_linklocal_allrouters = IN6ADDR_LINKLOCAL_ALLROUTERS_INIT;
+const struct in6_addr in6addr_linklocal_allrouters __aligned(BITS_PER_LONG/8)
+       = IN6ADDR_LINKLOCAL_ALLROUTERS_INIT;
 EXPORT_SYMBOL(in6addr_linklocal_allrouters);
-const struct in6_addr in6addr_interfacelocal_allnodes = IN6ADDR_INTERFACELOCAL_ALLNODES_INIT;
+const struct in6_addr in6addr_interfacelocal_allnodes __aligned(BITS_PER_LONG/8)
+       = IN6ADDR_INTERFACELOCAL_ALLNODES_INIT;
 EXPORT_SYMBOL(in6addr_interfacelocal_allnodes);
-const struct in6_addr in6addr_interfacelocal_allrouters = IN6ADDR_INTERFACELOCAL_ALLROUTERS_INIT;
+const struct in6_addr in6addr_interfacelocal_allrouters __aligned(BITS_PER_LONG/8)
+       = IN6ADDR_INTERFACELOCAL_ALLROUTERS_INIT;
 EXPORT_SYMBOL(in6addr_interfacelocal_allrouters);
-const struct in6_addr in6addr_sitelocal_allrouters = IN6ADDR_SITELOCAL_ALLROUTERS_INIT;
+const struct in6_addr in6addr_sitelocal_allrouters __aligned(BITS_PER_LONG/8)
+       = IN6ADDR_SITELOCAL_ALLROUTERS_INIT;
 EXPORT_SYMBOL(in6addr_sitelocal_allrouters);
 
 static void snmp6_free_dev(struct inet6_dev *idev)
index 13a1833a4df52956431c5c2fefcb6af80e1a828f..959bfd9f6344f11241dd20246f92bd1d47ff565e 100644 (file)
@@ -199,6 +199,9 @@ lookup_protocol:
        if (INET_PROTOSW_REUSE & answer_flags)
                sk->sk_reuse = SK_CAN_REUSE;
 
+       if (INET_PROTOSW_ICSK & answer_flags)
+               inet_init_csk_locks(sk);
+
        inet = inet_sk(sk);
        inet_assign_bit(IS_ICSK, sk, INET_PROTOSW_ICSK & answer_flags);
 
index 2016e90e6e1d21a49696c9933f1b77320cc71953..eb474f0987ae016b9d800e9f83d70d73171b21d2 100644 (file)
@@ -800,5 +800,6 @@ static void __exit ah6_fini(void)
 module_init(ah6_init);
 module_exit(ah6_fini);
 
+MODULE_DESCRIPTION("IPv6 AH transformation helpers");
 MODULE_LICENSE("GPL");
 MODULE_ALIAS_XFRM_TYPE(AF_INET6, XFRM_PROTO_AH);
index 2cc1a45742d823a793d95140910942fb83e7f331..6e6efe026cdcc2feab9a1f18fb784042b586f045 100644 (file)
@@ -1301,5 +1301,6 @@ static void __exit esp6_fini(void)
 module_init(esp6_init);
 module_exit(esp6_fini);
 
+MODULE_DESCRIPTION("IPv6 ESP transformation helpers");
 MODULE_LICENSE("GPL");
 MODULE_ALIAS_XFRM_TYPE(AF_INET6, XFRM_PROTO_ESP);
index 4952ae792450575d275f1565d2bc198e440b67f6..02e9ffb63af1971c0949ccd0c392b995efb41ccb 100644 (file)
@@ -177,6 +177,8 @@ static bool ip6_parse_tlv(bool hopbyhop,
                                case IPV6_TLV_IOAM:
                                        if (!ipv6_hop_ioam(skb, off))
                                                return false;
+
+                                       nh = skb_network_header(skb);
                                        break;
                                case IPV6_TLV_JUMBO:
                                        if (!ipv6_hop_jumbo(skb, off))
@@ -943,6 +945,14 @@ static bool ipv6_hop_ioam(struct sk_buff *skb, int optoff)
                if (!skb_valid_dst(skb))
                        ip6_route_input(skb);
 
+               /* About to mangle packet header */
+               if (skb_ensure_writable(skb, optoff + 2 + hdr->opt_len))
+                       goto drop;
+
+               /* Trace pointer may have changed */
+               trace = (struct ioam6_trace_hdr *)(skb_network_header(skb)
+                                                  + optoff + sizeof(*hdr));
+
                ioam6_fill_trace_data(skb, ns, trace, true);
                break;
        default:
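
Both exthdrs hunks enforce the same rule: skb_ensure_writable(), like
pskb_may_pull() before it, may reallocate skb->head, after which every
pointer derived from the old head (the cached network-header pointer, the
IOAM trace header) is stale and must be recomputed. The same rule drives
the ip6_tunnel change further down. Condensed into a sketch (the option
layout and offsets are illustrative):

static int rewrite_option(struct sk_buff *skb, int optoff, int optlen)
{
        unsigned char *p;

        /* may COW or reallocate skb->head; old pointers die here */
        if (skb_ensure_writable(skb, optoff + optlen))
                return -ENOMEM;

        /* re-derive the pointer from the possibly-new head */
        p = skb_network_header(skb) + optoff;
        p[0] |= 0x01;   /* placeholder for the real in-place update */
        return 0;
}
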
index a722a43dd668581cf4efb08ee5ab8410e5adebb7..31b86fe661aa6cd94fb5d8848900406c2db110e3 100644 (file)
@@ -1424,11 +1424,11 @@ static int __ip6_append_data(struct sock *sk,
        bool zc = false;
        u32 tskey = 0;
        struct rt6_info *rt = (struct rt6_info *)cork->dst;
+       bool paged, hold_tskey, extra_uref = false;
        struct ipv6_txoptions *opt = v6_cork->opt;
        int csummode = CHECKSUM_NONE;
        unsigned int maxnonfragsize, headersize;
        unsigned int wmem_alloc_delta = 0;
-       bool paged, extra_uref = false;
 
        skb = skb_peek_tail(queue);
        if (!skb) {
@@ -1440,10 +1440,6 @@ static int __ip6_append_data(struct sock *sk,
        mtu = cork->gso_size ? IP6_MAX_MTU : cork->fragsize;
        orig_mtu = mtu;
 
-       if (cork->tx_flags & SKBTX_ANY_TSTAMP &&
-           READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID)
-               tskey = atomic_inc_return(&sk->sk_tskey) - 1;
-
        hh_len = LL_RESERVED_SPACE(rt->dst.dev);
 
        fragheaderlen = sizeof(struct ipv6hdr) + rt->rt6i_nfheader_len +
@@ -1538,6 +1534,11 @@ emsgsize:
                        flags &= ~MSG_SPLICE_PAGES;
        }
 
+       hold_tskey = cork->tx_flags & SKBTX_ANY_TSTAMP &&
+                    READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID;
+       if (hold_tskey)
+               tskey = atomic_inc_return(&sk->sk_tskey) - 1;
+
        /*
         * Let's try using as much space as possible.
         * Use MTU if total length of the message fits into the MTU.
@@ -1794,6 +1795,8 @@ error:
        cork->length -= length;
        IP6_INC_STATS(sock_net(sk), rt->rt6i_idev, IPSTATS_MIB_OUTDISCARDS);
        refcount_add(wmem_alloc_delta, &sk->sk_wmem_alloc);
+       if (hold_tskey)
+               atomic_dec(&sk->sk_tskey);
        return err;
 }
 
index 46c19bd4899011d53b4feb84e25013c01ddce701..9bbabf750a21e251d4e8f9e3059c707505f5ce32 100644 (file)
@@ -796,8 +796,8 @@ static int __ip6_tnl_rcv(struct ip6_tnl *tunnel, struct sk_buff *skb,
                                                struct sk_buff *skb),
                         bool log_ecn_err)
 {
-       const struct ipv6hdr *ipv6h = ipv6_hdr(skb);
-       int err;
+       const struct ipv6hdr *ipv6h;
+       int nh, err;
 
        if ((!(tpi->flags & TUNNEL_CSUM) &&
             (tunnel->parms.i_flags & TUNNEL_CSUM)) ||
@@ -829,7 +829,6 @@ static int __ip6_tnl_rcv(struct ip6_tnl *tunnel, struct sk_buff *skb,
                        goto drop;
                }
 
-               ipv6h = ipv6_hdr(skb);
                skb->protocol = eth_type_trans(skb, tunnel->dev);
                skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
        } else {
@@ -837,7 +836,23 @@ static int __ip6_tnl_rcv(struct ip6_tnl *tunnel, struct sk_buff *skb,
                skb_reset_mac_header(skb);
        }
 
+       /* Save offset of outer header relative to skb->head,
+        * because we are going to reset the network header to the inner header
+        * and might change skb->head.
+        */
+       nh = skb_network_header(skb) - skb->head;
+
        skb_reset_network_header(skb);
+
+       if (!pskb_inet_may_pull(skb)) {
+               DEV_STATS_INC(tunnel->dev, rx_length_errors);
+               DEV_STATS_INC(tunnel->dev, rx_errors);
+               goto drop;
+       }
+
+       /* Get the outer header. */
+       ipv6h = (struct ipv6hdr *)(skb->head + nh);
+
        memset(skb->cb, 0, sizeof(struct inet6_skb_parm));
 
        __skb_tunnel_rx(skb, tunnel->dev, tunnel->net);
index a7bf0327b380be90bfdcc2182ed3a4296a0e814f..c99053189ea8a13be63927290576655e8da0c0fb 100644 (file)
@@ -182,4 +182,5 @@ struct dst_entry *udp_tunnel6_dst_lookup(struct sk_buff *skb,
 }
 EXPORT_SYMBOL_GPL(udp_tunnel6_dst_lookup);
 
+MODULE_DESCRIPTION("IPv6 Foo over UDP tunnel driver");
 MODULE_LICENSE("GPL");
index 83d2a8be263fb7bdd0cbe820168b1aac9a4336b2..6a16a5bd0d910bca55a87c55580cbd1cae71bede 100644 (file)
@@ -405,6 +405,7 @@ static void __exit mip6_fini(void)
 module_init(mip6_init);
 module_exit(mip6_fini);
 
+MODULE_DESCRIPTION("IPv6 Mobility driver");
 MODULE_LICENSE("GPL");
 MODULE_ALIAS_XFRM_TYPE(AF_INET6, XFRM_PROTO_DSTOPTS);
 MODULE_ALIAS_XFRM_TYPE(AF_INET6, XFRM_PROTO_ROUTING);
index ea1dec8448fce8ccf29be650301e937cfce6bd7a..ef815ba583a8f4ed0ca523a13c515f108132a939 100644 (file)
@@ -5332,19 +5332,7 @@ static int ip6_route_multipath_add(struct fib6_config *cfg,
        err_nh = NULL;
        list_for_each_entry(nh, &rt6_nh_list, next) {
                err = __ip6_ins_rt(nh->fib6_info, info, extack);
-               fib6_info_release(nh->fib6_info);
-
-               if (!err) {
-                       /* save reference to last route successfully inserted */
-                       rt_last = nh->fib6_info;
-
-                       /* save reference to first route for notification */
-                       if (!rt_notif)
-                               rt_notif = nh->fib6_info;
-               }
 
-               /* nh->fib6_info is used or freed at this point, reset to NULL*/
-               nh->fib6_info = NULL;
                if (err) {
                        if (replace && nhn)
                                NL_SET_ERR_MSG_MOD(extack,
@@ -5352,6 +5340,12 @@ static int ip6_route_multipath_add(struct fib6_config *cfg,
                        err_nh = nh;
                        goto add_errout;
                }
+               /* save reference to last route successfully inserted */
+               rt_last = nh->fib6_info;
+
+               /* save reference to first route for notification */
+               if (!rt_notif)
+                       rt_notif = nh->fib6_info;
 
                /* Because each route is added like a single route we remove
                 * these flags after the first nexthop: if there is a collision,
@@ -5412,8 +5406,7 @@ add_errout:
 
 cleanup:
        list_for_each_entry_safe(nh, nh_safe, &rt6_nh_list, next) {
-               if (nh->fib6_info)
-                       fib6_info_release(nh->fib6_info);
+               fib6_info_release(nh->fib6_info);
                list_del(&nh->next);
                kfree(nh);
        }
index 29346a6eec9ffed46b00153c4a6cb0295a327ceb..35508abd76f43d771ed7e66f29bc143af4a81977 100644 (file)
@@ -512,22 +512,24 @@ int __init seg6_init(void)
 {
        int err;
 
-       err = genl_register_family(&seg6_genl_family);
+       err = register_pernet_subsys(&ip6_segments_ops);
        if (err)
                goto out;
 
-       err = register_pernet_subsys(&ip6_segments_ops);
+       err = genl_register_family(&seg6_genl_family);
        if (err)
-               goto out_unregister_genl;
+               goto out_unregister_pernet;
 
 #ifdef CONFIG_IPV6_SEG6_LWTUNNEL
        err = seg6_iptunnel_init();
        if (err)
-               goto out_unregister_pernet;
+               goto out_unregister_genl;
 
        err = seg6_local_init();
-       if (err)
-               goto out_unregister_pernet;
+       if (err) {
+               seg6_iptunnel_exit();
+               goto out_unregister_genl;
+       }
 #endif
 
 #ifdef CONFIG_IPV6_SEG6_HMAC
@@ -548,11 +550,11 @@ out_unregister_iptun:
 #endif
 #endif
 #ifdef CONFIG_IPV6_SEG6_LWTUNNEL
-out_unregister_pernet:
-       unregister_pernet_subsys(&ip6_segments_ops);
-#endif
 out_unregister_genl:
        genl_unregister_family(&seg6_genl_family);
+#endif
+out_unregister_pernet:
+       unregister_pernet_subsys(&ip6_segments_ops);
        goto out;
 }
 
index cc24cefdb85c0944c03c019b1c4214302d18e2c8..5e9f625b76e36b9a61c6c2db0b4163e78dca549a 100644 (file)
@@ -1956,6 +1956,7 @@ xfrm_tunnel_failed:
 
 module_init(sit_init);
 module_exit(sit_cleanup);
+MODULE_DESCRIPTION("IPv6-in-IPv4 tunnel SIT driver");
 MODULE_LICENSE("GPL");
 MODULE_ALIAS_RTNL_LINK("sit");
 MODULE_ALIAS_NETDEV("sit0");
index 00e8d8b1c9a75fa1a820ac85eba91e3e24750c01..dc4ea9b11794e800eb027855e59a56fa6197df0b 100644 (file)
@@ -302,4 +302,5 @@ static void __exit tunnel6_fini(void)
 
 module_init(tunnel6_init);
 module_exit(tunnel6_fini);
+MODULE_DESCRIPTION("IP-in-IPv6 tunnel driver");
 MODULE_LICENSE("GPL");
index 1323f2f6928e2abf277e9ce7bd06025cd0049031..f6cb94f82cc3a2b40717a0c4406801dd26ac18c3 100644 (file)
@@ -401,5 +401,6 @@ static void __exit xfrm6_tunnel_fini(void)
 
 module_init(xfrm6_tunnel_init);
 module_exit(xfrm6_tunnel_fini);
+MODULE_DESCRIPTION("IPv6 XFRM tunnel driver");
 MODULE_LICENSE("GPL");
 MODULE_ALIAS_XFRM_TYPE(AF_INET6, XFRM_PROTO_IPV6);
index 6334f64f04d5f28c7e01e959d18b343d7c641336..b0b3e9c5af44fdd83b0a108bdeb0f3f6a3ffb85e 100644 (file)
@@ -156,7 +156,7 @@ static char iucv_error_pathid[16] = "INVALID PATHID";
 static LIST_HEAD(iucv_handler_list);
 
 /*
- * iucv_path_table: an array of iucv_path structures.
+ * iucv_path_table: array of pointers to iucv_path structures.
  */
 static struct iucv_path **iucv_path_table;
 static unsigned long iucv_max_pathid;
@@ -544,7 +544,7 @@ static int iucv_enable(void)
 
        cpus_read_lock();
        rc = -ENOMEM;
-       alloc_size = iucv_max_pathid * sizeof(struct iucv_path);
+       alloc_size = iucv_max_pathid * sizeof(*iucv_path_table);
        iucv_path_table = kzalloc(alloc_size, GFP_KERNEL);
        if (!iucv_path_table)
                goto out;
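
The old expression sized each element as a full struct iucv_path where an
array of pointers was being allocated, over-allocating by roughly a factor
of sizeof(struct iucv_path) / sizeof(void *). Sizing from the destination
expression itself avoids the whole bug class, as in this sketch:

static struct iucv_path **alloc_path_table(unsigned long entries)
{
        struct iucv_path **table;

        /* sizeof(*table) is one element (a pointer), and it stays
         * correct even if the table's type changes later
         */
        table = kcalloc(entries, sizeof(*table), GFP_KERNEL);
        return table;
}
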
index d68d01804dc7bcf7df78ff6968518298fe9d596a..f79fb99271ed84b8fe981a2b34a25b6abcf9d8e0 100644 (file)
@@ -3924,5 +3924,6 @@ out_unregister_key_proto:
 
 module_init(ipsec_pfkey_init);
 module_exit(ipsec_pfkey_exit);
+MODULE_DESCRIPTION("PF_KEY socket helpers");
 MODULE_LICENSE("GPL");
 MODULE_ALIAS_NETPROTO(PF_KEY);
index dd3153966173db09d42de02fa3ad4d44d05620f4..7bf14cf9ffaa967483ac0ee01e3f8e835754cd57 100644 (file)
@@ -627,7 +627,7 @@ static int l2tp_ip6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 
 back_from_confirm:
        lock_sock(sk);
-       ulen = len + skb_queue_empty(&sk->sk_write_queue) ? transhdrlen : 0;
+       ulen = len + (skb_queue_empty(&sk->sk_write_queue) ? transhdrlen : 0);
        err = ip6_append_data(sk, ip_generic_getfrag, msg,
                              ulen, transhdrlen, &ipc6,
                              &fl6, (struct rt6_info *)dst,
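
The l2tp change is a pure operator-precedence fix: "a + b ? c : d" parses
as "(a + b) ? c : d", so the old expression collapsed ulen to transhdrlen
whenever len was non-zero, losing the payload length entirely. In
isolation, with illustrative names:

static size_t total_len(size_t len, size_t transhdrlen, bool queue_empty)
{
        /* broken form parsed as: (len + queue_empty) ? transhdrlen : 0 */
        /* fixed: count the transport header only for the first skb */
        return len + (queue_empty ? transhdrlen : 0);
}
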
index 9b06c380866b53bcb395bf255587279db025d11d..fde1140d899efc7ba02e6bc3998cb857ef30df14 100644 (file)
@@ -226,6 +226,8 @@ static int llc_ui_release(struct socket *sock)
        }
        netdev_put(llc->dev, &llc->dev_tracker);
        sock_put(sk);
+       sock_orphan(sk);
+       sock->sk = NULL;
        llc_sk_free(sk);
 out:
        return 0;
@@ -928,14 +930,15 @@ copy_uaddr:
  */
 static int llc_ui_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
 {
+       DECLARE_SOCKADDR(struct sockaddr_llc *, addr, msg->msg_name);
        struct sock *sk = sock->sk;
        struct llc_sock *llc = llc_sk(sk);
-       DECLARE_SOCKADDR(struct sockaddr_llc *, addr, msg->msg_name);
        int flags = msg->msg_flags;
        int noblock = flags & MSG_DONTWAIT;
+       int rc = -EINVAL, copied = 0, hdrlen, hh_len;
        struct sk_buff *skb = NULL;
+       struct net_device *dev;
        size_t size = 0;
-       int rc = -EINVAL, copied = 0, hdrlen;
 
        dprintk("%s: sending from %02X to %02X\n", __func__,
                llc->laddr.lsap, llc->daddr.lsap);
@@ -955,22 +958,29 @@ static int llc_ui_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
                if (rc)
                        goto out;
        }
-       hdrlen = llc->dev->hard_header_len + llc_ui_header_len(sk, addr);
+       dev = llc->dev;
+       hh_len = LL_RESERVED_SPACE(dev);
+       hdrlen = llc_ui_header_len(sk, addr);
        size = hdrlen + len;
-       if (size > llc->dev->mtu)
-               size = llc->dev->mtu;
+       size = min_t(size_t, size, READ_ONCE(dev->mtu));
        copied = size - hdrlen;
        rc = -EINVAL;
        if (copied < 0)
                goto out;
        release_sock(sk);
-       skb = sock_alloc_send_skb(sk, size, noblock, &rc);
+       skb = sock_alloc_send_skb(sk, hh_len + size, noblock, &rc);
        lock_sock(sk);
        if (!skb)
                goto out;
-       skb->dev      = llc->dev;
+       if (sock_flag(sk, SOCK_ZAPPED) ||
+           llc->dev != dev ||
+           hdrlen != llc_ui_header_len(sk, addr) ||
+           hh_len != LL_RESERVED_SPACE(dev) ||
+           size > READ_ONCE(dev->mtu))
+               goto out;
+       skb->dev      = dev;
        skb->protocol = llc_proto_type(addr->sllc_arphrd);
-       skb_reserve(skb, hdrlen);
+       skb_reserve(skb, hh_len + hdrlen);
        rc = memcpy_from_msg(skb_put(skb, copied), msg, copied);
        if (rc)
                goto out;
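
Two things are fixed at once above: the allocation now reserves
LL_RESERVED_SPACE(dev) for the link-layer header rather than counting
dev->hard_header_len against the payload, and every value derived from
llc->dev is validated again after the socket lock was dropped for the
blocking allocation, since the device can be reconfigured while the sender
sleeps. The recheck pattern in miniature (a sketch, not the LLC code):

static struct sk_buff *alloc_rechecked(struct sock *sk,
                                       struct net_device *dev,
                                       size_t size, int noblock, int *rc)
{
        struct sk_buff *skb;

        release_sock(sk);               /* the allocation may sleep */
        skb = sock_alloc_send_skb(sk, size, noblock, rc);
        lock_sock(sk);

        /* the device may have changed while the lock was dropped */
        if (skb && size > READ_ONCE(dev->mtu)) {
                kfree_skb(skb);
                *rc = -EINVAL;
                skb = NULL;
        }
        return skb;
}
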
index 6e387aadffcecbec01d63aef4d6289bccc17f59e..4f16d9c88350b4481805c145887df23c681a159d 100644 (file)
@@ -135,22 +135,15 @@ static struct packet_type llc_packet_type __read_mostly = {
        .func = llc_rcv,
 };
 
-static struct packet_type llc_tr_packet_type __read_mostly = {
-       .type = cpu_to_be16(ETH_P_TR_802_2),
-       .func = llc_rcv,
-};
-
 static int __init llc_init(void)
 {
        dev_add_pack(&llc_packet_type);
-       dev_add_pack(&llc_tr_packet_type);
        return 0;
 }
 
 static void __exit llc_exit(void)
 {
        dev_remove_pack(&llc_packet_type);
-       dev_remove_pack(&llc_tr_packet_type);
 }
 
 module_init(llc_init);
index cb0291decf2e56c7d4111e649f41d28577af987e..13438cc0a6b139b6cb10c15ce894153706514811 100644 (file)
@@ -62,7 +62,6 @@ config MAC80211_KUNIT_TEST
        depends on KUNIT
        depends on MAC80211
        default KUNIT_ALL_TESTS
-       depends on !KERNEL_6_2
        help
          Enable this option to test mac80211 internals with kunit.
 
index 489dd97f51724a86053a9c4e9269487c4c7e928b..327682995c9260c9c7498ff9b322ecf5d59c6717 100644 (file)
@@ -5,7 +5,7 @@
  * Copyright 2006-2010 Johannes Berg <johannes@sipsolutions.net>
  * Copyright 2013-2015  Intel Mobile Communications GmbH
  * Copyright (C) 2015-2017 Intel Deutschland GmbH
- * Copyright (C) 2018-2022 Intel Corporation
+ * Copyright (C) 2018-2024 Intel Corporation
  */
 
 #include <linux/ieee80211.h>
@@ -987,7 +987,8 @@ static int
 ieee80211_set_unsol_bcast_probe_resp(struct ieee80211_sub_if_data *sdata,
                                     struct cfg80211_unsol_bcast_probe_resp *params,
                                     struct ieee80211_link_data *link,
-                                    struct ieee80211_bss_conf *link_conf)
+                                    struct ieee80211_bss_conf *link_conf,
+                                    u64 *changed)
 {
        struct unsol_bcast_probe_resp_data *new, *old = NULL;
 
@@ -1011,7 +1012,8 @@ ieee80211_set_unsol_bcast_probe_resp(struct ieee80211_sub_if_data *sdata,
                RCU_INIT_POINTER(link->u.ap.unsol_bcast_probe_resp, NULL);
        }
 
-       return BSS_CHANGED_UNSOL_BCAST_PROBE_RESP;
+       *changed |= BSS_CHANGED_UNSOL_BCAST_PROBE_RESP;
+       return 0;
 }
 
 static int ieee80211_set_ftm_responder_params(
@@ -1450,10 +1452,9 @@ static int ieee80211_start_ap(struct wiphy *wiphy, struct net_device *dev,
 
        err = ieee80211_set_unsol_bcast_probe_resp(sdata,
                                                   &params->unsol_bcast_probe_resp,
-                                                  link, link_conf);
+                                                  link, link_conf, &changed);
        if (err < 0)
                goto error;
-       changed |= err;
 
        err = drv_start_ap(sdata->local, sdata, link_conf);
        if (err) {
@@ -1525,10 +1526,9 @@ static int ieee80211_change_beacon(struct wiphy *wiphy, struct net_device *dev,
 
        err = ieee80211_set_unsol_bcast_probe_resp(sdata,
                                                   &params->unsol_bcast_probe_resp,
-                                                  link, link_conf);
+                                                  link, link_conf, &changed);
        if (err < 0)
                return err;
-       changed |= err;
 
        if (beacon->he_bss_color_valid &&
            beacon->he_bss_color.enabled != link_conf->he_bss_color.enabled) {
@@ -1869,6 +1869,8 @@ static int sta_link_apply_parameters(struct ieee80211_local *local,
                                              sband->band);
        }
 
+       ieee80211_sta_set_rx_nss(link_sta);
+
        return ret;
 }
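
The ieee80211_set_unsol_bcast_probe_resp() refactor near the top of this
file exists because the function returned flag bits and error codes
through the same signed int: BSS_CHANGED_UNSOL_BCAST_PROBE_RESP is a high
bit, and a high bit squeezed through an int reads as negative, so
"err < 0" could treat success as failure. Routing the flags through the
u64 *changed out-parameter separates the two channels. The failure mode in
isolation (EXAMPLE_CHANGED_FLAG is an illustrative macro):

#define EXAMPLE_CHANGED_FLAG    BIT(31)

static int bad_helper(void)
{
        /* 1UL << 31 narrowed to int: negative on common ABIs */
        return EXAMPLE_CHANGED_FLAG;
}

static int good_helper(u64 *changed)
{
        *changed |= EXAMPLE_CHANGED_FLAG;       /* flags out of band */
        return 0;                               /* errno-only return */
}

A caller written as "if (bad_helper() < 0)" takes its error path even
though the call succeeded; with good_helper() it cannot.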
 
index dce5606ed66da5a31a476aec16bb55412e1e72cc..68596ef78b15ee9596f6f81e8dd2d2f82c1d56cd 100644 (file)
@@ -997,8 +997,8 @@ static void add_link_files(struct ieee80211_link_data *link,
        }
 }
 
-void ieee80211_debugfs_add_netdev(struct ieee80211_sub_if_data *sdata,
-                                 bool mld_vif)
+static void ieee80211_debugfs_add_netdev(struct ieee80211_sub_if_data *sdata,
+                                        bool mld_vif)
 {
        char buf[10+IFNAMSIZ];
 
index b226b1aae88a5d4205c0206b351b63e6ee54c2a2..a02ec0a413f61468dded52076fbfef9a35da17b0 100644 (file)
@@ -11,8 +11,6 @@
 #include "ieee80211_i.h"
 
 #ifdef CONFIG_MAC80211_DEBUGFS
-void ieee80211_debugfs_add_netdev(struct ieee80211_sub_if_data *sdata,
-                                 bool mld_vif);
 void ieee80211_debugfs_remove_netdev(struct ieee80211_sub_if_data *sdata);
 void ieee80211_debugfs_rename_netdev(struct ieee80211_sub_if_data *sdata);
 void ieee80211_debugfs_recreate_netdev(struct ieee80211_sub_if_data *sdata,
@@ -24,9 +22,6 @@ void ieee80211_link_debugfs_remove(struct ieee80211_link_data *link);
 void ieee80211_link_debugfs_drv_add(struct ieee80211_link_data *link);
 void ieee80211_link_debugfs_drv_remove(struct ieee80211_link_data *link);
 #else
-static inline void ieee80211_debugfs_add_netdev(
-       struct ieee80211_sub_if_data *sdata, bool mld_vif)
-{}
 static inline void ieee80211_debugfs_remove_netdev(
        struct ieee80211_sub_if_data *sdata)
 {}
index e4e7c0b38cb6efcbb65786d071f436e09c7bf322..11c4caa4748e4038a2c758e34ae8dbf762e8159e 100644 (file)
@@ -1783,7 +1783,7 @@ static void ieee80211_setup_sdata(struct ieee80211_sub_if_data *sdata,
        /* need to do this after the switch so vif.type is correct */
        ieee80211_link_setup(&sdata->deflink);
 
-       ieee80211_debugfs_add_netdev(sdata, false);
+       ieee80211_debugfs_recreate_netdev(sdata, false);
 }
 
 static int ieee80211_runtime_change_iftype(struct ieee80211_sub_if_data *sdata,
index 073105deb42481f2792a33fa0341d509f7a95017..2022a26eb8811492ef8029de9e89dfd4bbb5f101 100644 (file)
@@ -8,7 +8,7 @@
  * Copyright 2007, Michael Wu <flamingice@sourmilk.net>
  * Copyright 2013-2014  Intel Mobile Communications GmbH
  * Copyright (C) 2015 - 2017 Intel Deutschland GmbH
- * Copyright (C) 2018 - 2023 Intel Corporation
+ * Copyright (C) 2018 - 2024 Intel Corporation
  */
 
 #include <linux/delay.h>
@@ -2918,6 +2918,7 @@ static void ieee80211_set_disassoc(struct ieee80211_sub_if_data *sdata,
 
        /* other links will be destroyed */
        sdata->deflink.u.mgd.bss = NULL;
+       sdata->deflink.smps_mode = IEEE80211_SMPS_OFF;
 
        netif_carrier_off(sdata->dev);
 
@@ -5045,9 +5046,6 @@ static int ieee80211_prep_channel(struct ieee80211_sub_if_data *sdata,
        if (!link)
                return 0;
 
-       /* will change later if needed */
-       link->smps_mode = IEEE80211_SMPS_OFF;
-
        /*
         * If this fails (possibly due to channel context sharing
         * on incompatible channels, e.g. 80+80 and 160 sharing the
@@ -7096,6 +7094,7 @@ void ieee80211_mgd_setup_link(struct ieee80211_link_data *link)
        link->u.mgd.p2p_noa_index = -1;
        link->u.mgd.conn_flags = 0;
        link->conf->bssid = link->u.mgd.bssid;
+       link->smps_mode = IEEE80211_SMPS_OFF;
 
        wiphy_work_init(&link->u.mgd.request_smps_work,
                        ieee80211_request_smps_mgd_work);
@@ -7309,6 +7308,75 @@ out_err:
        return err;
 }
 
+static bool ieee80211_mgd_csa_present(struct ieee80211_sub_if_data *sdata,
+                                     const struct cfg80211_bss_ies *ies,
+                                     u8 cur_channel, bool ignore_ecsa)
+{
+       const struct element *csa_elem, *ecsa_elem;
+       struct ieee80211_channel_sw_ie *csa = NULL;
+       struct ieee80211_ext_chansw_ie *ecsa = NULL;
+
+       if (!ies)
+               return false;
+
+       csa_elem = cfg80211_find_elem(WLAN_EID_CHANNEL_SWITCH,
+                                     ies->data, ies->len);
+       if (csa_elem && csa_elem->datalen == sizeof(*csa))
+               csa = (void *)csa_elem->data;
+
+       ecsa_elem = cfg80211_find_elem(WLAN_EID_EXT_CHANSWITCH_ANN,
+                                      ies->data, ies->len);
+       if (ecsa_elem && ecsa_elem->datalen == sizeof(*ecsa))
+               ecsa = (void *)ecsa_elem->data;
+
+       if (csa && csa->count == 0)
+               csa = NULL;
+       if (csa && !csa->mode && csa->new_ch_num == cur_channel)
+               csa = NULL;
+
+       if (ecsa && ecsa->count == 0)
+               ecsa = NULL;
+       if (ecsa && !ecsa->mode && ecsa->new_ch_num == cur_channel)
+               ecsa = NULL;
+
+       if (ignore_ecsa && ecsa) {
+               sdata_info(sdata,
+                          "Ignoring ECSA in probe response - was considered stuck!\n");
+               return csa;
+       }
+
+       return csa || ecsa;
+}
+
+static bool ieee80211_mgd_csa_in_process(struct ieee80211_sub_if_data *sdata,
+                                        struct cfg80211_bss *bss)
+{
+       u8 cur_channel;
+       bool ret;
+
+       cur_channel = ieee80211_frequency_to_channel(bss->channel->center_freq);
+
+       rcu_read_lock();
+       if (ieee80211_mgd_csa_present(sdata,
+                                     rcu_dereference(bss->beacon_ies),
+                                     cur_channel, false)) {
+               ret = true;
+               goto out;
+       }
+
+       if (ieee80211_mgd_csa_present(sdata,
+                                     rcu_dereference(bss->proberesp_ies),
+                                     cur_channel, bss->proberesp_ecsa_stuck)) {
+               ret = true;
+               goto out;
+       }
+
+       ret = false;
+out:
+       rcu_read_unlock();
+       return ret;
+}
+
 /* config hooks */
 int ieee80211_mgd_auth(struct ieee80211_sub_if_data *sdata,
                       struct cfg80211_auth_request *req)
@@ -7317,7 +7385,6 @@ int ieee80211_mgd_auth(struct ieee80211_sub_if_data *sdata,
        struct ieee80211_if_managed *ifmgd = &sdata->u.mgd;
        struct ieee80211_mgd_auth_data *auth_data;
        struct ieee80211_link_data *link;
-       const struct element *csa_elem, *ecsa_elem;
        u16 auth_alg;
        int err;
        bool cont_auth;
@@ -7360,21 +7427,10 @@ int ieee80211_mgd_auth(struct ieee80211_sub_if_data *sdata,
        if (ifmgd->assoc_data)
                return -EBUSY;
 
-       rcu_read_lock();
-       csa_elem = ieee80211_bss_get_elem(req->bss, WLAN_EID_CHANNEL_SWITCH);
-       ecsa_elem = ieee80211_bss_get_elem(req->bss,
-                                          WLAN_EID_EXT_CHANSWITCH_ANN);
-       if ((csa_elem &&
-            csa_elem->datalen == sizeof(struct ieee80211_channel_sw_ie) &&
-            ((struct ieee80211_channel_sw_ie *)csa_elem->data)->count != 0) ||
-           (ecsa_elem &&
-            ecsa_elem->datalen == sizeof(struct ieee80211_ext_chansw_ie) &&
-            ((struct ieee80211_ext_chansw_ie *)ecsa_elem->data)->count != 0)) {
-               rcu_read_unlock();
+       if (ieee80211_mgd_csa_in_process(sdata, req->bss)) {
                sdata_info(sdata, "AP is in CSA process, reject auth\n");
                return -EINVAL;
        }
-       rcu_read_unlock();
 
        auth_data = kzalloc(sizeof(*auth_data) + req->auth_data_len +
                            req->ie_len, GFP_KERNEL);
@@ -7684,7 +7740,7 @@ int ieee80211_mgd_assoc(struct ieee80211_sub_if_data *sdata,
        struct ieee80211_local *local = sdata->local;
        struct ieee80211_if_managed *ifmgd = &sdata->u.mgd;
        struct ieee80211_mgd_assoc_data *assoc_data;
-       const struct element *ssid_elem, *csa_elem, *ecsa_elem;
+       const struct element *ssid_elem;
        struct ieee80211_vif_cfg *vif_cfg = &sdata->vif.cfg;
        ieee80211_conn_flags_t conn_flags = 0;
        struct ieee80211_link_data *link;
@@ -7707,23 +7763,15 @@ int ieee80211_mgd_assoc(struct ieee80211_sub_if_data *sdata,
 
        cbss = req->link_id < 0 ? req->bss : req->links[req->link_id].bss;
 
-       rcu_read_lock();
-       ssid_elem = ieee80211_bss_get_elem(cbss, WLAN_EID_SSID);
-       if (!ssid_elem || ssid_elem->datalen > sizeof(assoc_data->ssid)) {
-               rcu_read_unlock();
+       if (ieee80211_mgd_csa_in_process(sdata, cbss)) {
+               sdata_info(sdata, "AP is in CSA process, reject assoc\n");
                kfree(assoc_data);
                return -EINVAL;
        }
 
-       csa_elem = ieee80211_bss_get_elem(cbss, WLAN_EID_CHANNEL_SWITCH);
-       ecsa_elem = ieee80211_bss_get_elem(cbss, WLAN_EID_EXT_CHANSWITCH_ANN);
-       if ((csa_elem &&
-            csa_elem->datalen == sizeof(struct ieee80211_channel_sw_ie) &&
-            ((struct ieee80211_channel_sw_ie *)csa_elem->data)->count != 0) ||
-           (ecsa_elem &&
-            ecsa_elem->datalen == sizeof(struct ieee80211_ext_chansw_ie) &&
-            ((struct ieee80211_ext_chansw_ie *)ecsa_elem->data)->count != 0)) {
-               sdata_info(sdata, "AP is in CSA process, reject assoc\n");
+       rcu_read_lock();
+       ssid_elem = ieee80211_bss_get_elem(cbss, WLAN_EID_SSID);
+       if (!ssid_elem || ssid_elem->datalen > sizeof(assoc_data->ssid)) {
                rcu_read_unlock();
                kfree(assoc_data);
                return -EINVAL;
@@ -7998,8 +8046,7 @@ int ieee80211_mgd_assoc(struct ieee80211_sub_if_data *sdata,
 
                rcu_read_lock();
                beacon_ies = rcu_dereference(req->bss->beacon_ies);
-
-               if (beacon_ies) {
+               if (!beacon_ies) {
                        /*
                         * Wait up to one beacon interval ...
                         * should this be more if we miss one?
@@ -8080,6 +8127,7 @@ int ieee80211_mgd_deauth(struct ieee80211_sub_if_data *sdata,
                ieee80211_report_disconnect(sdata, frame_buf,
                                            sizeof(frame_buf), true,
                                            req->reason_code, false);
+               drv_mgd_complete_tx(sdata->local, sdata, &info);
                return 0;
        }
 
index d5ea5f5bcf3a069e1d4dc5dd2638275e58aae51f..9d33fd2377c88af8ec38b6e398d103449f3b03b8 100644 (file)
@@ -119,7 +119,8 @@ void rate_control_rate_update(struct ieee80211_local *local,
                rcu_read_unlock();
        }
 
-       drv_sta_rc_update(local, sta->sdata, &sta->sta, changed);
+       if (sta->uploaded)
+               drv_sta_rc_update(local, sta->sdata, &sta->sta, changed);
 }
 
 int ieee80211_rate_control_register(const struct rate_control_ops *ops)
index 645355e5f1bc7baba435db18c0c8d8243a4649c6..f9d5842601fa9433ba0303f3b6572129b3e2f9fe 100644 (file)
@@ -9,7 +9,7 @@
  * Copyright 2007, Michael Wu <flamingice@sourmilk.net>
  * Copyright 2013-2015  Intel Mobile Communications GmbH
  * Copyright 2016-2017  Intel Deutschland GmbH
- * Copyright (C) 2018-2023 Intel Corporation
+ * Copyright (C) 2018-2024 Intel Corporation
  */
 
 #include <linux/if_arp.h>
@@ -237,14 +237,18 @@ ieee80211_bss_info_update(struct ieee80211_local *local,
 }
 
 static bool ieee80211_scan_accept_presp(struct ieee80211_sub_if_data *sdata,
+                                       struct ieee80211_channel *channel,
                                        u32 scan_flags, const u8 *da)
 {
        if (!sdata)
                return false;
-       /* accept broadcast for OCE */
-       if (scan_flags & NL80211_SCAN_FLAG_ACCEPT_BCAST_PROBE_RESP &&
-           is_broadcast_ether_addr(da))
+
+       /* accept broadcast on 6 GHz and for OCE */
+       if (is_broadcast_ether_addr(da) &&
+           (channel->band == NL80211_BAND_6GHZ ||
+            scan_flags & NL80211_SCAN_FLAG_ACCEPT_BCAST_PROBE_RESP))
                return true;
+
        if (scan_flags & NL80211_SCAN_FLAG_RANDOM_ADDR)
                return true;
        return ether_addr_equal(da, sdata->vif.addr);
@@ -293,6 +297,12 @@ void ieee80211_scan_rx(struct ieee80211_local *local, struct sk_buff *skb)
                wiphy_delayed_work_queue(local->hw.wiphy, &local->scan_work, 0);
        }
 
+       channel = ieee80211_get_channel_khz(local->hw.wiphy,
+                                           ieee80211_rx_status_to_khz(rx_status));
+
+       if (!channel || channel->flags & IEEE80211_CHAN_DISABLED)
+               return;
+
        if (ieee80211_is_probe_resp(mgmt->frame_control)) {
                struct cfg80211_scan_request *scan_req;
                struct cfg80211_sched_scan_request *sched_scan_req;
@@ -310,19 +320,15 @@ void ieee80211_scan_rx(struct ieee80211_local *local, struct sk_buff *skb)
                /* ignore ProbeResp to foreign address or non-bcast (OCE)
                 * unless scanning with randomised address
                 */
-               if (!ieee80211_scan_accept_presp(sdata1, scan_req_flags,
+               if (!ieee80211_scan_accept_presp(sdata1, channel,
+                                                scan_req_flags,
                                                 mgmt->da) &&
-                   !ieee80211_scan_accept_presp(sdata2, sched_scan_req_flags,
+                   !ieee80211_scan_accept_presp(sdata2, channel,
+                                                sched_scan_req_flags,
                                                 mgmt->da))
                        return;
        }
 
-       channel = ieee80211_get_channel_khz(local->hw.wiphy,
-                                       ieee80211_rx_status_to_khz(rx_status));
-
-       if (!channel || channel->flags & IEEE80211_CHAN_DISABLED)
-               return;
-
        bss = ieee80211_bss_info_update(local, rx_status,
                                        mgmt, skb->len,
                                        channel);
index bf1adcd96b411327ba79b3bdc6734df1afd605ca..4391d8dd634bb557771dcc07c11bab296c5a18f3 100644 (file)
@@ -404,7 +404,10 @@ void sta_info_free(struct ieee80211_local *local, struct sta_info *sta)
        int i;
 
        for (i = 0; i < ARRAY_SIZE(sta->link); i++) {
-               if (!(sta->sta.valid_links & BIT(i)))
+               struct link_sta_info *link_sta;
+
+               link_sta = rcu_access_pointer(sta->link[i]);
+               if (!link_sta)
                        continue;
 
                sta_remove_link(sta, i, false);
@@ -910,6 +913,8 @@ static int sta_info_insert_finish(struct sta_info *sta) __acquires(RCU)
        if (ieee80211_vif_is_mesh(&sdata->vif))
                mesh_accept_plinks_update(sdata);
 
+       ieee80211_check_fast_xmit(sta);
+
        return 0;
  out_remove:
        if (sta->sta.valid_links)
index 314998fdb1a5a4853f84a90edf2ba2312933719a..6fbb15b65902c754ea4c2487a40d4ce0ed38634a 100644 (file)
@@ -5,7 +5,7 @@
  * Copyright 2006-2007 Jiri Benc <jbenc@suse.cz>
  * Copyright 2007      Johannes Berg <johannes@sipsolutions.net>
  * Copyright 2013-2014  Intel Mobile Communications GmbH
- * Copyright (C) 2018-2022 Intel Corporation
+ * Copyright (C) 2018-2024 Intel Corporation
  *
  * Transmit and frame generation functions.
  */
@@ -3048,7 +3048,7 @@ void ieee80211_check_fast_xmit(struct sta_info *sta)
            sdata->vif.type == NL80211_IFTYPE_STATION)
                goto out;
 
-       if (!test_sta_flag(sta, WLAN_STA_AUTHORIZED))
+       if (!test_sta_flag(sta, WLAN_STA_AUTHORIZED) || !sta->uploaded)
                goto out;
 
        if (test_sta_flag(sta, WLAN_STA_PS_STA) ||
@@ -3100,10 +3100,11 @@ void ieee80211_check_fast_xmit(struct sta_info *sta)
                        /* DA SA BSSID */
                        build.da_offs = offsetof(struct ieee80211_hdr, addr1);
                        build.sa_offs = offsetof(struct ieee80211_hdr, addr2);
+                       rcu_read_lock();
                        link = rcu_dereference(sdata->link[tdls_link_id]);
-                       if (WARN_ON_ONCE(!link))
-                               break;
-                       memcpy(hdr->addr3, link->u.mgd.bssid, ETH_ALEN);
+                       if (!WARN_ON_ONCE(!link))
+                               memcpy(hdr->addr3, link->u.mgd.bssid, ETH_ALEN);
+                       rcu_read_unlock();
                        build.hdr_len = 24;
                        break;
                }
@@ -3926,6 +3927,7 @@ begin:
                        goto begin;
 
                skb = __skb_dequeue(&tx.skbs);
+               info = IEEE80211_SKB_CB(skb);
 
                if (!skb_queue_empty(&tx.skbs)) {
                        spin_lock_bh(&fq->lock);
@@ -3970,7 +3972,7 @@ begin:
        }
 
 encap_out:
-       IEEE80211_SKB_CB(skb)->control.vif = vif;
+       info->control.vif = vif;
 
        if (tx.sta &&
            wiphy_ext_feature_isset(local->hw.wiphy, NL80211_EXT_FEATURE_AQL)) {
index a05c5b971789c796658a04e0a41a3748fdce75da..3a8612309137312f88dfe6bad79ac854fe76fcc1 100644 (file)
@@ -23,8 +23,6 @@ void ieee80211_check_wbrf_support(struct ieee80211_local *local)
                return;
 
        local->wbrf_supported = acpi_amd_wbrf_supported_producer(dev);
-       dev_dbg(dev, "WBRF is %s supported\n",
-               local->wbrf_supported ? "" : "not");
 }
 
 static void get_chan_freq_boundary(u32 center_freq, u32 bandwidth, u64 *start, u64 *end)
index 7a47a58aa54b446acf7451ba6bdc1b834adda327..ceee44ea09d97b025a490058403cf435e3337ef5 100644 (file)
@@ -663,7 +663,7 @@ struct mctp_sk_key *mctp_alloc_local_tag(struct mctp_sock *msk,
        spin_unlock_irqrestore(&mns->keys_lock, flags);
 
        if (!tagbits) {
-               kfree(key);
+               mctp_key_unref(key);
                return ERR_PTR(-EBUSY);
        }
 
@@ -888,7 +888,7 @@ int mctp_local_output(struct sock *sk, struct mctp_route *rt,
                dev = dev_get_by_index_rcu(sock_net(sk), cb->ifindex);
                if (!dev) {
                        rcu_read_unlock();
-                       return rc;
+                       goto out_free;
                }
                rt->dev = __mctp_dev_get(dev);
                rcu_read_unlock();
@@ -903,7 +903,8 @@ int mctp_local_output(struct sock *sk, struct mctp_route *rt,
                rt->mtu = 0;
 
        } else {
-               return -EINVAL;
+               rc = -EINVAL;
+               goto out_free;
        }
 
        spin_lock_irqsave(&rt->dev->addrs_lock, flags);
@@ -966,12 +967,17 @@ int mctp_local_output(struct sock *sk, struct mctp_route *rt,
                rc = mctp_do_fragment_route(rt, skb, mtu, tag);
        }
 
+       /* route output functions consume the skb, even on error */
+       skb = NULL;
+
 out_release:
        if (!ext_rt)
                mctp_route_release(rt);
 
        mctp_dev_put(tmp_rt.dev);
 
+out_free:
+       kfree_skb(skb);
        return rc;
 }
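
The new out_free label relies on the ownership rule stated in the added comment: once a route output function has been handed the skb, it owns it on every path, so the caller forgets its pointer and lets a NULL-tolerant kfree_skb() make the common exit safe. A runnable user-space sketch of that rule, with illustrative names only:

#include <stdio.h>
#include <stdlib.h>

struct buf { char *data; };

static void buf_free(struct buf *b)	/* tolerant of NULL, like kfree_skb() */
{
	if (!b)
		return;
	free(b->data);
	free(b);
}

static int route_output(struct buf *b)	/* consumes b on every path */
{
	int err = (b && b->data) ? 0 : -1;

	buf_free(b);
	return err;
}

int main(void)
{
	struct buf *b = calloc(1, sizeof(*b));
	int rc = route_output(b);

	b = NULL;	/* consumed above: forget the pointer */
	buf_free(b);	/* common exit, a safe no-op on this path */
	printf("rc=%d\n", rc);
	return 0;
}
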
 
index a536586742f28c1ddd54c79e62eb56fea267a8fa..7017dd60659dc7133318c1c82e3f429bea3a5d57 100644 (file)
 #include <uapi/linux/mptcp.h>
 #include "protocol.h"
 
-static int subflow_get_info(const struct sock *sk, struct sk_buff *skb)
+static int subflow_get_info(struct sock *sk, struct sk_buff *skb)
 {
        struct mptcp_subflow_context *sf;
        struct nlattr *start;
        u32 flags = 0;
+       bool slow;
        int err;
 
+       if (inet_sk_state_load(sk) == TCP_LISTEN)
+               return 0;
+
        start = nla_nest_start_noflag(skb, INET_ULP_INFO_MPTCP);
        if (!start)
                return -EMSGSIZE;
 
+       slow = lock_sock_fast(sk);
        rcu_read_lock();
        sf = rcu_dereference(inet_csk(sk)->icsk_ulp_data);
        if (!sf) {
@@ -63,17 +68,19 @@ static int subflow_get_info(const struct sock *sk, struct sk_buff *skb)
                        sf->map_data_len) ||
            nla_put_u32(skb, MPTCP_SUBFLOW_ATTR_FLAGS, flags) ||
            nla_put_u8(skb, MPTCP_SUBFLOW_ATTR_ID_REM, sf->remote_id) ||
-           nla_put_u8(skb, MPTCP_SUBFLOW_ATTR_ID_LOC, sf->local_id)) {
+           nla_put_u8(skb, MPTCP_SUBFLOW_ATTR_ID_LOC, subflow_get_local_id(sf))) {
                err = -EMSGSIZE;
                goto nla_failure;
        }
 
        rcu_read_unlock();
+       unlock_sock_fast(sk, slow);
        nla_nest_end(skb, start);
        return 0;
 
 nla_failure:
        rcu_read_unlock();
+       unlock_sock_fast(sk, slow);
        nla_nest_cancel(skb, start);
        return err;
 }
index 74698582a2859e4d6ea40abaf8d0f31943e0d128..ad28da655f8bcc75e4ea05d4de2e2ab073ebc2c5 100644 (file)
@@ -59,13 +59,12 @@ void mptcp_fastopen_subflow_synack_set_params(struct mptcp_subflow_context *subf
        mptcp_data_unlock(sk);
 }
 
-void mptcp_fastopen_gen_msk_ackseq(struct mptcp_sock *msk, struct mptcp_subflow_context *subflow,
-                                  const struct mptcp_options_received *mp_opt)
+void __mptcp_fastopen_gen_msk_ackseq(struct mptcp_sock *msk, struct mptcp_subflow_context *subflow,
+                                    const struct mptcp_options_received *mp_opt)
 {
        struct sock *sk = (struct sock *)msk;
        struct sk_buff *skb;
 
-       mptcp_data_lock(sk);
        skb = skb_peek_tail(&sk->sk_receive_queue);
        if (skb) {
                WARN_ON_ONCE(MPTCP_SKB_CB(skb)->end_seq);
@@ -77,5 +76,4 @@ void mptcp_fastopen_gen_msk_ackseq(struct mptcp_sock *msk, struct mptcp_subflow_
        }
 
        pr_debug("msk=%p ack_seq=%llx", msk, msk->ack_seq);
-       mptcp_data_unlock(sk);
 }
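
The rename follows the kernel convention that a double-underscore prefix marks a variant whose caller already holds the relevant lock; as the later hunks show, check_fully_established() now takes the msk data lock once around the whole established-path update. A compilable stand-in using a pthread mutex (names illustrative, not the MPTCP API):

#include <pthread.h>
#include <stdio.h>

struct state {
	pthread_mutex_t lock;
	unsigned long long pending_seq, ack_seq;
};

/* __-prefixed variant: caller must already hold s->lock */
static void __gen_ackseq(struct state *s)
{
	s->ack_seq = s->pending_seq;
}

/* locked wrapper for callers that do not hold the lock */
static void gen_ackseq(struct state *s)
{
	pthread_mutex_lock(&s->lock);
	__gen_ackseq(s);
	pthread_mutex_unlock(&s->lock);
}

int main(void)
{
	struct state s = { PTHREAD_MUTEX_INITIALIZER, 42, 0 };

	gen_ackseq(&s);
	printf("ack_seq=%llu\n", s.ack_seq);
	return 0;
}
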
index d2527d189a799319c068a5b76a5816cc7a905861..63fc0758c22d45e356d4edadff991b7e88ec8659 100644 (file)
@@ -962,9 +962,7 @@ static bool check_fully_established(struct mptcp_sock *msk, struct sock *ssk,
                /* subflows are fully established as soon as we get any
                 * additional ack, including ADD_ADDR.
                 */
-               subflow->fully_established = 1;
-               WRITE_ONCE(msk->fully_established, true);
-               goto check_notify;
+               goto set_fully_established;
        }
 
        /* If the first established packet does not contain MP_CAPABLE + data
@@ -983,10 +981,13 @@ static bool check_fully_established(struct mptcp_sock *msk, struct sock *ssk,
        if (mp_opt->deny_join_id0)
                WRITE_ONCE(msk->pm.remote_deny_join_id0, true);
 
-set_fully_established:
        if (unlikely(!READ_ONCE(msk->pm.server_side)))
                pr_warn_once("bogus mpc option on established client sk");
-       mptcp_subflow_fully_established(subflow, mp_opt);
+
+set_fully_established:
+       mptcp_data_lock((struct sock *)msk);
+       __mptcp_subflow_fully_established(msk, subflow, mp_opt);
+       mptcp_data_unlock((struct sock *)msk);
 
 check_notify:
        /* if the subflow is not already linked into the conn_list, we can't
index 287a60381eae6e39c68d49a65530ea5bdc8a6675..58d17d9604e78fde24795219e53e18646c53b0de 100644 (file)
@@ -396,19 +396,6 @@ void mptcp_pm_free_anno_list(struct mptcp_sock *msk)
        }
 }
 
-static bool lookup_address_in_vec(const struct mptcp_addr_info *addrs, unsigned int nr,
-                                 const struct mptcp_addr_info *addr)
-{
-       int i;
-
-       for (i = 0; i < nr; i++) {
-               if (addrs[i].id == addr->id)
-                       return true;
-       }
-
-       return false;
-}
-
 /* Fill all the remote addresses into the array addrs[],
  * and return the array size.
  */
@@ -440,18 +427,34 @@ static unsigned int fill_remote_addresses_vec(struct mptcp_sock *msk,
                msk->pm.subflows++;
                addrs[i++] = remote;
        } else {
+               DECLARE_BITMAP(unavail_id, MPTCP_PM_MAX_ADDR_ID + 1);
+
+               /* Forbid creation of new subflows matching existing
+                * ones, possibly already created by incoming ADD_ADDR
+                */
+               bitmap_zero(unavail_id, MPTCP_PM_MAX_ADDR_ID + 1);
+               mptcp_for_each_subflow(msk, subflow)
+                       if (READ_ONCE(subflow->local_id) == local->id)
+                               __set_bit(subflow->remote_id, unavail_id);
+
                mptcp_for_each_subflow(msk, subflow) {
                        ssk = mptcp_subflow_tcp_sock(subflow);
                        remote_address((struct sock_common *)ssk, &addrs[i]);
-                       addrs[i].id = subflow->remote_id;
+                       addrs[i].id = READ_ONCE(subflow->remote_id);
                        if (deny_id0 && !addrs[i].id)
                                continue;
 
+                       if (test_bit(addrs[i].id, unavail_id))
+                               continue;
+
                        if (!mptcp_pm_addr_families_match(sk, local, &addrs[i]))
                                continue;
 
-                       if (!lookup_address_in_vec(addrs, i, &addrs[i]) &&
-                           msk->pm.subflows < subflows_max) {
+                       if (msk->pm.subflows < subflows_max) {
+                               /* forbid creating multiple addresses towards
+                                * this id
+                                */
+                               __set_bit(addrs[i].id, unavail_id);
                                msk->pm.subflows++;
                                i++;
                        }
@@ -799,18 +802,18 @@ static void mptcp_pm_nl_rm_addr_or_subflow(struct mptcp_sock *msk,
 
                mptcp_for_each_subflow_safe(msk, subflow, tmp) {
                        struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+                       u8 remote_id = READ_ONCE(subflow->remote_id);
                        int how = RCV_SHUTDOWN | SEND_SHUTDOWN;
-                       u8 id = subflow->local_id;
+                       u8 id = subflow_get_local_id(subflow);
 
-                       if (rm_type == MPTCP_MIB_RMADDR && subflow->remote_id != rm_id)
+                       if (rm_type == MPTCP_MIB_RMADDR && remote_id != rm_id)
                                continue;
                        if (rm_type == MPTCP_MIB_RMSUBFLOW && !mptcp_local_id_match(msk, id, rm_id))
                                continue;
 
                        pr_debug(" -> %s rm_list_ids[%d]=%u local_id=%u remote_id=%u mpc_id=%u",
                                 rm_type == MPTCP_MIB_RMADDR ? "address" : "subflow",
-                                i, rm_id, subflow->local_id, subflow->remote_id,
-                                msk->mpc_endpoint_id);
+                                i, rm_id, id, remote_id, msk->mpc_endpoint_id);
                        spin_unlock_bh(&msk->pm.lock);
                        mptcp_subflow_shutdown(sk, ssk, how);
 
@@ -901,7 +904,8 @@ static void __mptcp_pm_release_addr_entry(struct mptcp_pm_addr_entry *entry)
 }
 
 static int mptcp_pm_nl_append_new_local_addr(struct pm_nl_pernet *pernet,
-                                            struct mptcp_pm_addr_entry *entry)
+                                            struct mptcp_pm_addr_entry *entry,
+                                            bool needs_id)
 {
        struct mptcp_pm_addr_entry *cur, *del_entry = NULL;
        unsigned int addr_max;
@@ -949,7 +953,7 @@ static int mptcp_pm_nl_append_new_local_addr(struct pm_nl_pernet *pernet,
                }
        }
 
-       if (!entry->addr.id) {
+       if (!entry->addr.id && needs_id) {
 find_next:
                entry->addr.id = find_next_zero_bit(pernet->id_bitmap,
                                                    MPTCP_PM_MAX_ADDR_ID + 1,
@@ -960,7 +964,7 @@ find_next:
                }
        }
 
-       if (!entry->addr.id)
+       if (!entry->addr.id && needs_id)
                goto out;
 
        __set_bit(entry->addr.id, pernet->id_bitmap);
@@ -1092,7 +1096,7 @@ int mptcp_pm_nl_get_local_id(struct mptcp_sock *msk, struct mptcp_addr_info *skc
        entry->ifindex = 0;
        entry->flags = MPTCP_PM_ADDR_FLAG_IMPLICIT;
        entry->lsk = NULL;
-       ret = mptcp_pm_nl_append_new_local_addr(pernet, entry);
+       ret = mptcp_pm_nl_append_new_local_addr(pernet, entry, true);
        if (ret < 0)
                kfree(entry);
 
@@ -1285,6 +1289,18 @@ next:
        return 0;
 }
 
+static bool mptcp_pm_has_addr_attr_id(const struct nlattr *attr,
+                                     struct genl_info *info)
+{
+       struct nlattr *tb[MPTCP_PM_ADDR_ATTR_MAX + 1];
+
+       if (!nla_parse_nested_deprecated(tb, MPTCP_PM_ADDR_ATTR_MAX, attr,
+                                        mptcp_pm_address_nl_policy, info->extack) &&
+           tb[MPTCP_PM_ADDR_ATTR_ID])
+               return true;
+       return false;
+}
+
 int mptcp_pm_nl_add_addr_doit(struct sk_buff *skb, struct genl_info *info)
 {
        struct nlattr *attr = info->attrs[MPTCP_PM_ENDPOINT_ADDR];
@@ -1326,7 +1342,8 @@ int mptcp_pm_nl_add_addr_doit(struct sk_buff *skb, struct genl_info *info)
                        goto out_free;
                }
        }
-       ret = mptcp_pm_nl_append_new_local_addr(pernet, entry);
+       ret = mptcp_pm_nl_append_new_local_addr(pernet, entry,
+                                               !mptcp_pm_has_addr_attr_id(attr, info));
        if (ret < 0) {
                GENL_SET_ERR_MSG_FMT(info, "too many addresses or duplicate one: %d", ret);
                goto out_free;
@@ -1980,7 +1997,7 @@ static int mptcp_event_add_subflow(struct sk_buff *skb, const struct sock *ssk)
        if (WARN_ON_ONCE(!sf))
                return -EINVAL;
 
-       if (nla_put_u8(skb, MPTCP_ATTR_LOC_ID, sf->local_id))
+       if (nla_put_u8(skb, MPTCP_ATTR_LOC_ID, subflow_get_local_id(sf)))
                return -EMSGSIZE;
 
        if (nla_put_u8(skb, MPTCP_ATTR_REM_ID, sf->remote_id))
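
Hedged sketch of the id-allocation scheme the new needs_id flag gates: endpoint ids live in a bitmap and an entry only receives an auto-assigned id when the caller asks for one, which lets the userspace PM keep address/id pairs exactly as given (id 0 stays reserved for the initial subflow, per the comments elsewhere in this series). A user-space stand-in for find_next_zero_bit(); all names are illustrative:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_ID 255

static uint64_t id_bitmap[(MAX_ID + 64) / 64];

static int find_next_zero(int from)
{
	for (int i = from; i <= MAX_ID; i++)
		if (!(id_bitmap[i / 64] & (1ULL << (i % 64))))
			return i;
	return -1;
}

static int append_addr(int id, bool needs_id)
{
	if (!id && needs_id)
		id = find_next_zero(1);	/* id 0 is reserved for the initial subflow */
	if (id > 0)
		id_bitmap[id / 64] |= 1ULL << (id % 64);
	return id;
}

int main(void)
{
	printf("%d\n", append_addr(0, true));	/* auto-assigned: 1 */
	printf("%d\n", append_addr(5, true));	/* explicit id kept: 5 */
	printf("%d\n", append_addr(0, false));	/* userspace PM keeps 0 */
	return 0;
}
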
index efecbe3cf41533324a5df71da39f775dd2078ca6..bc97cc30f013abdba076aa93596dd213e9353eb8 100644 (file)
@@ -26,7 +26,8 @@ void mptcp_free_local_addr_list(struct mptcp_sock *msk)
 }
 
 static int mptcp_userspace_pm_append_new_local_addr(struct mptcp_sock *msk,
-                                                   struct mptcp_pm_addr_entry *entry)
+                                                   struct mptcp_pm_addr_entry *entry,
+                                                   bool needs_id)
 {
        DECLARE_BITMAP(id_bitmap, MPTCP_PM_MAX_ADDR_ID + 1);
        struct mptcp_pm_addr_entry *match = NULL;
@@ -41,7 +42,7 @@ static int mptcp_userspace_pm_append_new_local_addr(struct mptcp_sock *msk,
        spin_lock_bh(&msk->pm.lock);
        list_for_each_entry(e, &msk->pm.userspace_pm_local_addr_list, list) {
                addr_match = mptcp_addresses_equal(&e->addr, &entry->addr, true);
-               if (addr_match && entry->addr.id == 0)
+               if (addr_match && entry->addr.id == 0 && needs_id)
                        entry->addr.id = e->addr.id;
                id_match = (e->addr.id == entry->addr.id);
                if (addr_match && id_match) {
@@ -64,7 +65,7 @@ static int mptcp_userspace_pm_append_new_local_addr(struct mptcp_sock *msk,
                }
 
                *e = *entry;
-               if (!e->addr.id)
+               if (!e->addr.id && needs_id)
                        e->addr.id = find_next_zero_bit(id_bitmap,
                                                        MPTCP_PM_MAX_ADDR_ID + 1,
                                                        1);
@@ -130,10 +131,21 @@ int mptcp_userspace_pm_get_flags_and_ifindex_by_id(struct mptcp_sock *msk,
 int mptcp_userspace_pm_get_local_id(struct mptcp_sock *msk,
                                    struct mptcp_addr_info *skc)
 {
-       struct mptcp_pm_addr_entry new_entry;
+       struct mptcp_pm_addr_entry *entry = NULL, *e, new_entry;
        __be16 msk_sport =  ((struct inet_sock *)
                             inet_sk((struct sock *)msk))->inet_sport;
 
+       spin_lock_bh(&msk->pm.lock);
+       list_for_each_entry(e, &msk->pm.userspace_pm_local_addr_list, list) {
+               if (mptcp_addresses_equal(&e->addr, skc, false)) {
+                       entry = e;
+                       break;
+               }
+       }
+       spin_unlock_bh(&msk->pm.lock);
+       if (entry)
+               return entry->addr.id;
+
        memset(&new_entry, 0, sizeof(struct mptcp_pm_addr_entry));
        new_entry.addr = *skc;
        new_entry.addr.id = 0;
@@ -142,7 +154,7 @@ int mptcp_userspace_pm_get_local_id(struct mptcp_sock *msk,
        if (new_entry.addr.port == msk_sport)
                new_entry.addr.port = 0;
 
-       return mptcp_userspace_pm_append_new_local_addr(msk, &new_entry);
+       return mptcp_userspace_pm_append_new_local_addr(msk, &new_entry, true);
 }
 
 int mptcp_pm_nl_announce_doit(struct sk_buff *skb, struct genl_info *info)
@@ -187,7 +199,7 @@ int mptcp_pm_nl_announce_doit(struct sk_buff *skb, struct genl_info *info)
                goto announce_err;
        }
 
-       err = mptcp_userspace_pm_append_new_local_addr(msk, &addr_val);
+       err = mptcp_userspace_pm_append_new_local_addr(msk, &addr_val, false);
        if (err < 0) {
                GENL_SET_ERR_MSG(info, "did not match address and id");
                goto announce_err;
@@ -222,7 +234,7 @@ static int mptcp_userspace_pm_remove_id_zero_address(struct mptcp_sock *msk,
 
        lock_sock(sk);
        mptcp_for_each_subflow(msk, subflow) {
-               if (subflow->local_id == 0) {
+               if (READ_ONCE(subflow->local_id) == 0) {
                        has_id_0 = true;
                        break;
                }
@@ -367,7 +379,7 @@ int mptcp_pm_nl_subflow_create_doit(struct sk_buff *skb, struct genl_info *info)
        }
 
        local.addr = addr_l;
-       err = mptcp_userspace_pm_append_new_local_addr(msk, &local);
+       err = mptcp_userspace_pm_append_new_local_addr(msk, &local, false);
        if (err < 0) {
                GENL_SET_ERR_MSG(info, "did not match address and id");
                goto create_err;
@@ -483,6 +495,16 @@ int mptcp_pm_nl_subflow_destroy_doit(struct sk_buff *skb, struct genl_info *info
                goto destroy_err;
        }
 
+#if IS_ENABLED(CONFIG_MPTCP_IPV6)
+       if (addr_l.family == AF_INET && ipv6_addr_v4mapped(&addr_r.addr6)) {
+               ipv6_addr_set_v4mapped(addr_l.addr.s_addr, &addr_l.addr6);
+               addr_l.family = AF_INET6;
+       }
+       if (addr_r.family == AF_INET && ipv6_addr_v4mapped(&addr_l.addr6)) {
+               ipv6_addr_set_v4mapped(addr_r.addr.s_addr, &addr_r.addr6);
+               addr_r.family = AF_INET6;
+       }
+#endif
        if (addr_l.family != addr_r.family) {
                GENL_SET_ERR_MSG(info, "address families do not match");
                err = -EINVAL;
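
The new IPV6 block normalizes mixed pairs before the family-equality check that follows it: an AF_INET address is rewritten as its IPv4-mapped IPv6 form (::ffff:a.b.c.d) so the comparison sees like with like. A plain user-space illustration of the mapping itself (the kernel uses ipv6_addr_set_v4mapped() for the same byte layout):

#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	struct in_addr v4;
	struct in6_addr v6;
	char buf[INET6_ADDRSTRLEN];

	inet_pton(AF_INET, "192.0.2.1", &v4);

	/* ::ffff:0:0/96 prefix, then the IPv4 address in the low 32 bits */
	memset(&v6, 0, sizeof(v6));
	v6.s6_addr[10] = 0xff;
	v6.s6_addr[11] = 0xff;
	memcpy(&v6.s6_addr[12], &v4, 4);

	printf("%s\n", inet_ntop(AF_INET6, &v6, buf, sizeof(buf)));
	return 0;	/* prints ::ffff:192.0.2.1 */
}
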
index 3ed4709a75096025683149b2d4af0a1d5f24141c..7833a49f6214a194a282bba92671e9cdd945ad92 100644 (file)
@@ -85,7 +85,7 @@ static int __mptcp_socket_create(struct mptcp_sock *msk)
        subflow->subflow_id = msk->subflow_id++;
 
        /* This is the first subflow, always with id 0 */
-       subflow->local_id_valid = 1;
+       WRITE_ONCE(subflow->local_id, 0);
        mptcp_sock_graft(msk->first, sk->sk_socket);
        iput(SOCK_INODE(ssock));
 
@@ -1260,6 +1260,7 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk,
                mpext = mptcp_get_ext(skb);
                if (!mptcp_skb_can_collapse_to(data_seq, skb, mpext)) {
                        TCP_SKB_CB(skb)->eor = 1;
+                       tcp_mark_push(tcp_sk(ssk), skb);
                        goto alloc_skb;
                }
 
@@ -1505,8 +1506,11 @@ static void mptcp_update_post_push(struct mptcp_sock *msk,
 
 void mptcp_check_and_set_pending(struct sock *sk)
 {
-       if (mptcp_send_head(sk))
-               mptcp_sk(sk)->push_pending |= BIT(MPTCP_PUSH_PENDING);
+       if (mptcp_send_head(sk)) {
+               mptcp_data_lock(sk);
+               mptcp_sk(sk)->cb_flags |= BIT(MPTCP_PUSH_PENDING);
+               mptcp_data_unlock(sk);
+       }
 }
 
 static int __subflow_push_pending(struct sock *sk, struct sock *ssk,
@@ -1960,6 +1964,9 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
        if (copied <= 0)
                return;
 
+       if (!msk->rcvspace_init)
+               mptcp_rcv_space_init(msk, msk->first);
+
        msk->rcvq_space.copied += copied;
 
        mstamp = div_u64(tcp_clock_ns(), NSEC_PER_USEC);
@@ -2314,9 +2321,6 @@ bool __mptcp_retransmit_pending_data(struct sock *sk)
        if (__mptcp_check_fallback(msk))
                return false;
 
-       if (tcp_rtx_and_write_queues_empty(sk))
-               return false;
-
        /* the closing socket has some data untransmitted and/or unacked:
         * some data in the mptcp rtx queue has not really xmitted yet.
         * keep it simple and re-inject the whole mptcp level rtx queue
@@ -3145,7 +3149,6 @@ static int mptcp_disconnect(struct sock *sk, int flags)
        mptcp_destroy_common(msk, MPTCP_CF_FASTCLOSE);
        WRITE_ONCE(msk->flags, 0);
        msk->cb_flags = 0;
-       msk->push_pending = 0;
        msk->recovery = false;
        msk->can_ack = false;
        msk->fully_established = false;
@@ -3161,6 +3164,7 @@ static int mptcp_disconnect(struct sock *sk, int flags)
        msk->bytes_received = 0;
        msk->bytes_sent = 0;
        msk->bytes_retrans = 0;
+       msk->rcvspace_init = 0;
 
        WRITE_ONCE(sk->sk_shutdown, 0);
        sk_error_report(sk);
@@ -3174,8 +3178,50 @@ static struct ipv6_pinfo *mptcp_inet6_sk(const struct sock *sk)
 
        return (struct ipv6_pinfo *)(((u8 *)sk) + offset);
 }
+
+static void mptcp_copy_ip6_options(struct sock *newsk, const struct sock *sk)
+{
+       const struct ipv6_pinfo *np = inet6_sk(sk);
+       struct ipv6_txoptions *opt;
+       struct ipv6_pinfo *newnp;
+
+       newnp = inet6_sk(newsk);
+
+       rcu_read_lock();
+       opt = rcu_dereference(np->opt);
+       if (opt) {
+               opt = ipv6_dup_options(newsk, opt);
+               if (!opt)
+                       net_warn_ratelimited("%s: Failed to copy ip6 options\n", __func__);
+       }
+       RCU_INIT_POINTER(newnp->opt, opt);
+       rcu_read_unlock();
+}
 #endif
 
+static void mptcp_copy_ip_options(struct sock *newsk, const struct sock *sk)
+{
+       struct ip_options_rcu *inet_opt, *newopt = NULL;
+       const struct inet_sock *inet = inet_sk(sk);
+       struct inet_sock *newinet;
+
+       newinet = inet_sk(newsk);
+
+       rcu_read_lock();
+       inet_opt = rcu_dereference(inet->inet_opt);
+       if (inet_opt) {
+               newopt = sock_kmalloc(newsk, sizeof(*inet_opt) +
+                                     inet_opt->opt.optlen, GFP_ATOMIC);
+               if (newopt)
+                       memcpy(newopt, inet_opt, sizeof(*inet_opt) +
+                              inet_opt->opt.optlen);
+               else
+                       net_warn_ratelimited("%s: Failed to copy ip options\n", __func__);
+       }
+       RCU_INIT_POINTER(newinet->inet_opt, newopt);
+       rcu_read_unlock();
+}
+
 struct sock *mptcp_sk_clone_init(const struct sock *sk,
                                 const struct mptcp_options_received *mp_opt,
                                 struct sock *ssk,
@@ -3183,6 +3229,7 @@ struct sock *mptcp_sk_clone_init(const struct sock *sk,
 {
        struct mptcp_subflow_request_sock *subflow_req = mptcp_subflow_rsk(req);
        struct sock *nsk = sk_clone_lock(sk, GFP_ATOMIC);
+       struct mptcp_subflow_context *subflow;
        struct mptcp_sock *msk;
 
        if (!nsk)
@@ -3195,6 +3242,13 @@ struct sock *mptcp_sk_clone_init(const struct sock *sk,
 
        __mptcp_init_sock(nsk);
 
+#if IS_ENABLED(CONFIG_MPTCP_IPV6)
+       if (nsk->sk_family == AF_INET6)
+               mptcp_copy_ip6_options(nsk, sk);
+       else
+#endif
+               mptcp_copy_ip_options(nsk, sk);
+
        msk = mptcp_sk(nsk);
        msk->local_key = subflow_req->local_key;
        msk->token = subflow_req->token;
@@ -3206,7 +3260,7 @@ struct sock *mptcp_sk_clone_init(const struct sock *sk,
        msk->write_seq = subflow_req->idsn + 1;
        msk->snd_nxt = msk->write_seq;
        msk->snd_una = msk->write_seq;
-       msk->wnd_end = msk->snd_nxt + req->rsk_rcv_wnd;
+       msk->wnd_end = msk->snd_nxt + tcp_sk(ssk)->snd_wnd;
        msk->setsockopt_seq = mptcp_sk(sk)->setsockopt_seq;
        mptcp_init_sched(msk, mptcp_sk(sk)->sched);
 
@@ -3223,7 +3277,8 @@ struct sock *mptcp_sk_clone_init(const struct sock *sk,
 
        /* The msk maintain a ref to each subflow in the connections list */
        WRITE_ONCE(msk->first, ssk);
-       list_add(&mptcp_subflow_ctx(ssk)->node, &msk->conn_list);
+       subflow = mptcp_subflow_ctx(ssk);
+       list_add(&subflow->node, &msk->conn_list);
        sock_hold(ssk);
 
        /* new mpc subflow takes ownership of the newly
@@ -3238,6 +3293,9 @@ struct sock *mptcp_sk_clone_init(const struct sock *sk,
        __mptcp_propagate_sndbuf(nsk, ssk);
 
        mptcp_rcv_space_init(msk, ssk);
+
+       if (mp_opt->suboptions & OPTION_MPTCP_MPC_ACK)
+               __mptcp_subflow_fully_established(msk, subflow, mp_opt);
        bh_unlock_sock(nsk);
 
        /* note: the newly allocated socket refcount is 2 now */
@@ -3248,6 +3306,7 @@ void mptcp_rcv_space_init(struct mptcp_sock *msk, const struct sock *ssk)
 {
        const struct tcp_sock *tp = tcp_sk(ssk);
 
+       msk->rcvspace_init = 1;
        msk->rcvq_space.copied = 0;
        msk->rcvq_space.rtt_us = 0;
 
@@ -3258,8 +3317,6 @@ void mptcp_rcv_space_init(struct mptcp_sock *msk, const struct sock *ssk)
                                      TCP_INIT_CWND * tp->advmss);
        if (msk->rcvq_space.space == 0)
                msk->rcvq_space.space = TCP_INIT_CWND * TCP_MSS_DEFAULT;
-
-       WRITE_ONCE(msk->wnd_end, msk->snd_nxt + tcp_sk(ssk)->snd_wnd);
 }
 
 void mptcp_destroy_common(struct mptcp_sock *msk, unsigned int flags)
@@ -3333,8 +3390,7 @@ static void mptcp_release_cb(struct sock *sk)
        struct mptcp_sock *msk = mptcp_sk(sk);
 
        for (;;) {
-               unsigned long flags = (msk->cb_flags & MPTCP_FLAGS_PROCESS_CTX_NEED) |
-                                     msk->push_pending;
+               unsigned long flags = (msk->cb_flags & MPTCP_FLAGS_PROCESS_CTX_NEED);
                struct list_head join_list;
 
                if (!flags)
@@ -3350,7 +3406,6 @@ static void mptcp_release_cb(struct sock *sk)
                 *    datapath acquires the msk socket spinlock while holding
                 *    the subflow socket lock
                 */
-               msk->push_pending = 0;
                msk->cb_flags &= ~flags;
                spin_unlock_bh(&sk->sk_lock.slock);
 
@@ -3478,13 +3533,8 @@ void mptcp_finish_connect(struct sock *ssk)
         * accessing the field below
         */
        WRITE_ONCE(msk->local_key, subflow->local_key);
-       WRITE_ONCE(msk->write_seq, subflow->idsn + 1);
-       WRITE_ONCE(msk->snd_nxt, msk->write_seq);
-       WRITE_ONCE(msk->snd_una, msk->write_seq);
 
        mptcp_pm_new_connection(msk, ssk, 0);
-
-       mptcp_rcv_space_init(msk, ssk);
 }
 
 void mptcp_sock_graft(struct sock *sk, struct socket *parent)
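
With push_pending folded into cb_flags, there is now a single place where deferred work is recorded and drained. A kernel-style condensation of that shape, pieced together from the hunks above (not compilable stand-alone):

/* request work while the socket may be owned by user context */
mptcp_data_lock(sk);
msk->cb_flags |= BIT(MPTCP_PUSH_PENDING);
mptcp_data_unlock(sk);

/* mptcp_release_cb() later drains every requested bit in one loop */
for (;;) {
	unsigned long flags = msk->cb_flags & MPTCP_FLAGS_PROCESS_CTX_NEED;

	if (!flags)
		break;
	msk->cb_flags &= ~flags;
	/* drop the socket spinlock, process each action, loop again */
}
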
index 3517f2d24a226ff0be1adec800044810f1aa31c6..07f6242afc1ae09d3c17aadfe7bb104eb3cf177c 100644 (file)
@@ -286,7 +286,6 @@ struct mptcp_sock {
        int             rmem_released;
        unsigned long   flags;
        unsigned long   cb_flags;
-       unsigned long   push_pending;
        bool            recovery;               /* closing subflow write queue reinjected */
        bool            can_ack;
        bool            fully_established;
@@ -305,7 +304,8 @@ struct mptcp_sock {
                        nodelay:1,
                        fastopening:1,
                        in_accept_queue:1,
-                       free_first:1;
+                       free_first:1,
+                       rcvspace_init:1;
        struct work_struct work;
        struct sk_buff  *ooo_last_skb;
        struct rb_root  out_of_order_queue;
@@ -491,10 +491,9 @@ struct mptcp_subflow_context {
                remote_key_valid : 1,        /* received the peer key from */
                disposable : 1,     /* ctx can be free at ulp release time */
                stale : 1,          /* unable to snd/rcv data, do not use for xmit */
-               local_id_valid : 1, /* local_id is correctly initialized */
                valid_csum_seen : 1,        /* at least one csum validated */
                is_mptfo : 1,       /* subflow is doing TFO */
-               __unused : 9;
+               __unused : 10;
        bool    data_avail;
        bool    scheduled;
        u32     remote_nonce;
@@ -505,7 +504,7 @@ struct mptcp_subflow_context {
                u8      hmac[MPTCPOPT_HMAC_LEN]; /* MPJ subflow only */
                u64     iasn;       /* initial ack sequence number, MPC subflows only */
        };
-       u8      local_id;
+       s16     local_id;           /* if negative, not initialized yet */
        u8      remote_id;
        u8      reset_seen:1;
        u8      reset_transient:1;
@@ -556,6 +555,7 @@ mptcp_subflow_ctx_reset(struct mptcp_subflow_context *subflow)
 {
        memset(&subflow->reset, 0, sizeof(subflow->reset));
        subflow->request_mptcp = 1;
+       WRITE_ONCE(subflow->local_id, -1);
 }
 
 static inline u64
@@ -622,8 +622,9 @@ unsigned int mptcp_stale_loss_cnt(const struct net *net);
 unsigned int mptcp_close_timeout(const struct sock *sk);
 int mptcp_get_pm_type(const struct net *net);
 const char *mptcp_get_scheduler(const struct net *net);
-void mptcp_subflow_fully_established(struct mptcp_subflow_context *subflow,
-                                    const struct mptcp_options_received *mp_opt);
+void __mptcp_subflow_fully_established(struct mptcp_sock *msk,
+                                      struct mptcp_subflow_context *subflow,
+                                      const struct mptcp_options_received *mp_opt);
 bool __mptcp_retransmit_pending_data(struct sock *sk);
 void mptcp_check_and_set_pending(struct sock *sk);
 void __mptcp_push_pending(struct sock *sk, unsigned int flags);
@@ -789,6 +790,16 @@ static inline bool mptcp_data_fin_enabled(const struct mptcp_sock *msk)
               READ_ONCE(msk->write_seq) == READ_ONCE(msk->snd_nxt);
 }
 
+static inline void mptcp_write_space(struct sock *sk)
+{
+       if (sk_stream_is_writeable(sk)) {
+               /* pairs with memory barrier in mptcp_poll */
+               smp_mb();
+               if (test_and_clear_bit(MPTCP_NOSPACE, &mptcp_sk(sk)->flags))
+                       sk_stream_write_space(sk);
+       }
+}
+
 static inline void __mptcp_sync_sndbuf(struct sock *sk)
 {
        struct mptcp_subflow_context *subflow;
@@ -807,6 +818,7 @@ static inline void __mptcp_sync_sndbuf(struct sock *sk)
 
        /* the msk max wmem limit is <nr_subflows> * tcp wmem[2] */
        WRITE_ONCE(sk->sk_sndbuf, new_sndbuf);
+       mptcp_write_space(sk);
 }
 
 /* The caller holds both the msk socket and the subflow socket locks,
@@ -837,16 +849,6 @@ static inline void mptcp_propagate_sndbuf(struct sock *sk, struct sock *ssk)
        local_bh_enable();
 }
 
-static inline void mptcp_write_space(struct sock *sk)
-{
-       if (sk_stream_is_writeable(sk)) {
-               /* pairs with memory barrier in mptcp_poll */
-               smp_mb();
-               if (test_and_clear_bit(MPTCP_NOSPACE, &mptcp_sk(sk)->flags))
-                       sk_stream_write_space(sk);
-       }
-}
-
 void mptcp_destroy_common(struct mptcp_sock *msk, unsigned int flags);
 
 #define MPTCP_TOKEN_MAX_RETRIES        4
@@ -952,8 +954,8 @@ void mptcp_event_pm_listener(const struct sock *ssk,
                             enum mptcp_event_type event);
 bool mptcp_userspace_pm_active(const struct mptcp_sock *msk);
 
-void mptcp_fastopen_gen_msk_ackseq(struct mptcp_sock *msk, struct mptcp_subflow_context *subflow,
-                                  const struct mptcp_options_received *mp_opt);
+void __mptcp_fastopen_gen_msk_ackseq(struct mptcp_sock *msk, struct mptcp_subflow_context *subflow,
+                                    const struct mptcp_options_received *mp_opt);
 void mptcp_fastopen_subflow_synack_set_params(struct mptcp_subflow_context *subflow,
                                              struct request_sock *req);
 
@@ -1021,6 +1023,15 @@ int mptcp_pm_get_local_id(struct mptcp_sock *msk, struct sock_common *skc);
 int mptcp_pm_nl_get_local_id(struct mptcp_sock *msk, struct mptcp_addr_info *skc);
 int mptcp_userspace_pm_get_local_id(struct mptcp_sock *msk, struct mptcp_addr_info *skc);
 
+static inline u8 subflow_get_local_id(const struct mptcp_subflow_context *subflow)
+{
+       int local_id = READ_ONCE(subflow->local_id);
+
+       if (local_id < 0)
+               return 0;
+       return local_id;
+}
+
 void __init mptcp_pm_nl_init(void);
 void mptcp_pm_nl_work(struct mptcp_sock *msk);
 void mptcp_pm_nl_rm_subflow_received(struct mptcp_sock *msk,
@@ -1128,7 +1139,8 @@ static inline bool subflow_simultaneous_connect(struct sock *sk)
 {
        struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
 
-       return (1 << sk->sk_state) & (TCPF_ESTABLISHED | TCPF_FIN_WAIT1) &&
+       return (1 << sk->sk_state) &
+              (TCPF_ESTABLISHED | TCPF_FIN_WAIT1 | TCPF_FIN_WAIT2 | TCPF_CLOSING) &&
               is_active_ssk(subflow) &&
               !subflow->conn_finished;
 }
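
The local_id change above trades a separate valid bit for a signed sentinel: -1 means unset, and the WRITE_ONCE()/READ_ONCE() pairs let lockless readers observe either the sentinel or a fully written id. A user-space stand-in using C11 atomics (the kernel macros are plain volatile accesses, so relaxed atomics are only an approximation):

#include <stdatomic.h>
#include <stdio.h>

static _Atomic short local_id = -1;	/* -1 means "not initialized yet" */

static unsigned int get_local_id(void)
{
	short id = atomic_load_explicit(&local_id, memory_order_relaxed);

	return id < 0 ? 0 : (unsigned int)id;	/* report 0 until assigned */
}

int main(void)
{
	printf("%u\n", get_local_id());		/* 0 */
	atomic_store_explicit(&local_id, 5, memory_order_relaxed);
	printf("%u\n", get_local_id());		/* 5 */
	return 0;
}
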
index 0dcb721c89d193e8943aa414610fcf4284d51f38..71ba86246ff893c5bf65f77802510b52c3d68fd4 100644 (file)
@@ -421,29 +421,26 @@ static bool subflow_use_different_dport(struct mptcp_sock *msk, const struct soc
 
 void __mptcp_sync_state(struct sock *sk, int state)
 {
+       struct mptcp_subflow_context *subflow;
        struct mptcp_sock *msk = mptcp_sk(sk);
+       struct sock *ssk = msk->first;
+
+       subflow = mptcp_subflow_ctx(ssk);
+       __mptcp_propagate_sndbuf(sk, ssk);
+       if (!msk->rcvspace_init)
+               mptcp_rcv_space_init(msk, ssk);
 
-       __mptcp_propagate_sndbuf(sk, msk->first);
        if (sk->sk_state == TCP_SYN_SENT) {
+               /* subflow->idsn is always available in TCP_SYN_SENT state,
+                * even for the FASTOPEN scenarios
+                */
+               WRITE_ONCE(msk->write_seq, subflow->idsn + 1);
+               WRITE_ONCE(msk->snd_nxt, msk->write_seq);
                mptcp_set_state(sk, state);
                sk->sk_state_change(sk);
        }
 }
 
-static void mptcp_propagate_state(struct sock *sk, struct sock *ssk)
-{
-       struct mptcp_sock *msk = mptcp_sk(sk);
-
-       mptcp_data_lock(sk);
-       if (!sock_owned_by_user(sk)) {
-               __mptcp_sync_state(sk, ssk->sk_state);
-       } else {
-               msk->pending_state = ssk->sk_state;
-               __set_bit(MPTCP_SYNC_STATE, &msk->cb_flags);
-       }
-       mptcp_data_unlock(sk);
-}
-
 static void subflow_set_remote_key(struct mptcp_sock *msk,
                                   struct mptcp_subflow_context *subflow,
                                   const struct mptcp_options_received *mp_opt)
@@ -465,6 +462,31 @@ static void subflow_set_remote_key(struct mptcp_sock *msk,
        atomic64_set(&msk->rcv_wnd_sent, subflow->iasn);
 }
 
+static void mptcp_propagate_state(struct sock *sk, struct sock *ssk,
+                                 struct mptcp_subflow_context *subflow,
+                                 const struct mptcp_options_received *mp_opt)
+{
+       struct mptcp_sock *msk = mptcp_sk(sk);
+
+       mptcp_data_lock(sk);
+       if (mp_opt) {
+               /* Options are available only in the non-fallback case;
+                * avoid updating rx path fields otherwise
+                */
+               WRITE_ONCE(msk->snd_una, subflow->idsn + 1);
+               WRITE_ONCE(msk->wnd_end, subflow->idsn + 1 + tcp_sk(ssk)->snd_wnd);
+               subflow_set_remote_key(msk, subflow, mp_opt);
+       }
+
+       if (!sock_owned_by_user(sk)) {
+               __mptcp_sync_state(sk, ssk->sk_state);
+       } else {
+               msk->pending_state = ssk->sk_state;
+               __set_bit(MPTCP_SYNC_STATE, &msk->cb_flags);
+       }
+       mptcp_data_unlock(sk);
+}
+
 static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb)
 {
        struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
@@ -499,10 +521,9 @@ static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb)
                if (mp_opt.deny_join_id0)
                        WRITE_ONCE(msk->pm.remote_deny_join_id0, true);
                subflow->mp_capable = 1;
-               subflow_set_remote_key(msk, subflow, &mp_opt);
                MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_MPCAPABLEACTIVEACK);
                mptcp_finish_connect(sk);
-               mptcp_propagate_state(parent, sk);
+               mptcp_propagate_state(parent, sk, subflow, &mp_opt);
        } else if (subflow->request_join) {
                u8 hmac[SHA256_DIGEST_SIZE];
 
@@ -514,7 +535,7 @@ static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb)
                subflow->backup = mp_opt.backup;
                subflow->thmac = mp_opt.thmac;
                subflow->remote_nonce = mp_opt.nonce;
-               subflow->remote_id = mp_opt.join_id;
+               WRITE_ONCE(subflow->remote_id, mp_opt.join_id);
                pr_debug("subflow=%p, thmac=%llu, remote_nonce=%u backup=%d",
                         subflow, subflow->thmac, subflow->remote_nonce,
                         subflow->backup);
@@ -545,8 +566,7 @@ static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb)
                }
        } else if (mptcp_check_fallback(sk)) {
 fallback:
-               mptcp_rcv_space_init(msk, sk);
-               mptcp_propagate_state(parent, sk);
+               mptcp_propagate_state(parent, sk, subflow, NULL);
        }
        return;
 
@@ -557,8 +577,8 @@ do_reset:
 
 static void subflow_set_local_id(struct mptcp_subflow_context *subflow, int local_id)
 {
-       subflow->local_id = local_id;
-       subflow->local_id_valid = 1;
+       WARN_ON_ONCE(local_id < 0 || local_id > 255);
+       WRITE_ONCE(subflow->local_id, local_id);
 }
 
 static int subflow_chk_local_id(struct sock *sk)
@@ -567,7 +587,7 @@ static int subflow_chk_local_id(struct sock *sk)
        struct mptcp_sock *msk = mptcp_sk(subflow->conn);
        int err;
 
-       if (likely(subflow->local_id_valid))
+       if (likely(subflow->local_id >= 0))
                return 0;
 
        err = mptcp_pm_get_local_id(msk, (struct sock_common *)sk);
@@ -731,17 +751,16 @@ void mptcp_subflow_drop_ctx(struct sock *ssk)
        kfree_rcu(ctx, rcu);
 }
 
-void mptcp_subflow_fully_established(struct mptcp_subflow_context *subflow,
-                                    const struct mptcp_options_received *mp_opt)
+void __mptcp_subflow_fully_established(struct mptcp_sock *msk,
+                                      struct mptcp_subflow_context *subflow,
+                                      const struct mptcp_options_received *mp_opt)
 {
-       struct mptcp_sock *msk = mptcp_sk(subflow->conn);
-
        subflow_set_remote_key(msk, subflow, mp_opt);
        subflow->fully_established = 1;
        WRITE_ONCE(msk->fully_established, true);
 
        if (subflow->is_mptfo)
-               mptcp_fastopen_gen_msk_ackseq(msk, subflow, mp_opt);
+               __mptcp_fastopen_gen_msk_ackseq(msk, subflow, mp_opt);
 }
 
 static struct sock *subflow_syn_recv_sock(const struct sock *sk,
@@ -834,7 +853,6 @@ create_child:
                         * mpc option
                         */
                        if (mp_opt.suboptions & OPTION_MPTCP_MPC_ACK) {
-                               mptcp_subflow_fully_established(ctx, &mp_opt);
                                mptcp_pm_fully_established(owner, child);
                                ctx->pm_notified = 1;
                        }
@@ -1549,7 +1567,7 @@ int __mptcp_subflow_connect(struct sock *sk, const struct mptcp_addr_info *loc,
        pr_debug("msk=%p remote_token=%u local_id=%d remote_id=%d", msk,
                 remote_token, local_id, remote_id);
        subflow->remote_token = remote_token;
-       subflow->remote_id = remote_id;
+       WRITE_ONCE(subflow->remote_id, remote_id);
        subflow->request_join = 1;
        subflow->request_bkup = !!(flags & MPTCP_PM_ADDR_FLAG_BACKUP);
        subflow->subflow_id = msk->subflow_id++;
@@ -1713,6 +1731,7 @@ static struct mptcp_subflow_context *subflow_create_ctx(struct sock *sk,
        pr_debug("subflow=%p", ctx);
 
        ctx->tcp_sock = sk;
+       WRITE_ONCE(ctx->local_id, -1);
 
        return ctx;
 }
@@ -1744,10 +1763,9 @@ static void subflow_state_change(struct sock *sk)
        msk = mptcp_sk(parent);
        if (subflow_simultaneous_connect(sk)) {
                mptcp_do_fallback(sk);
-               mptcp_rcv_space_init(msk, sk);
                pr_fallback(msk);
                subflow->conn_finished = 1;
-               mptcp_propagate_state(parent, sk);
+               mptcp_propagate_state(parent, sk, subflow, NULL);
        }
 
        /* as recvmsg() does not acquire the subflow socket for ssk selection
@@ -1949,14 +1967,14 @@ static void subflow_ulp_clone(const struct request_sock *req,
                new_ctx->idsn = subflow_req->idsn;
 
                /* this is the first subflow, id is always 0 */
-               new_ctx->local_id_valid = 1;
+               subflow_set_local_id(new_ctx, 0);
        } else if (subflow_req->mp_join) {
                new_ctx->ssn_offset = subflow_req->ssn_offset;
                new_ctx->mp_join = 1;
                new_ctx->fully_established = 1;
                new_ctx->remote_key_valid = 1;
                new_ctx->backup = subflow_req->backup;
-               new_ctx->remote_id = subflow_req->remote_id;
+               WRITE_ONCE(new_ctx->remote_id, subflow_req->remote_id);
                new_ctx->token = subflow_req->token;
                new_ctx->thmac = subflow_req->thmac;
 
index 21f7860e8fa1fd4f1f46c9ad278bfeb261090152..cb48a2b9cb9fd708c2f99adadfd4b21671b44a4a 100644 (file)
@@ -30,6 +30,7 @@
 #define mtype_del              IPSET_TOKEN(MTYPE, _del)
 #define mtype_list             IPSET_TOKEN(MTYPE, _list)
 #define mtype_gc               IPSET_TOKEN(MTYPE, _gc)
+#define mtype_cancel_gc                IPSET_TOKEN(MTYPE, _cancel_gc)
 #define mtype                  MTYPE
 
 #define get_ext(set, map, id)  ((map)->extensions + ((set)->dsize * (id)))
@@ -59,9 +60,6 @@ mtype_destroy(struct ip_set *set)
 {
        struct mtype *map = set->data;
 
-       if (SET_WITH_TIMEOUT(set))
-               del_timer_sync(&map->gc);
-
        if (set->dsize && set->extensions & IPSET_EXT_DESTROY)
                mtype_ext_cleanup(set);
        ip_set_free(map->members);
@@ -290,6 +288,15 @@ mtype_gc(struct timer_list *t)
        add_timer(&map->gc);
 }
 
+static void
+mtype_cancel_gc(struct ip_set *set)
+{
+       struct mtype *map = set->data;
+
+       if (SET_WITH_TIMEOUT(set))
+               del_timer_sync(&map->gc);
+}
+
 static const struct ip_set_type_variant mtype = {
        .kadt   = mtype_kadt,
        .uadt   = mtype_uadt,
@@ -303,6 +310,7 @@ static const struct ip_set_type_variant mtype = {
        .head   = mtype_head,
        .list   = mtype_list,
        .same_set = mtype_same_set,
+       .cancel_gc = mtype_cancel_gc,
 };
 
 #endif /* __IP_SET_BITMAP_IP_GEN_H */
index 4c133e06be1de2f8972b50ac87e6b0b7bfc9ac6d..3184cc6be4c9d375fb2bda49d1bbec6623618c77 100644 (file)
@@ -1154,6 +1154,7 @@ static int ip_set_create(struct sk_buff *skb, const struct nfnl_info *info,
        return ret;
 
 cleanup:
+       set->variant->cancel_gc(set);
        set->variant->destroy(set);
 put_out:
        module_put(set->type->me);
@@ -1182,6 +1183,14 @@ ip_set_destroy_set(struct ip_set *set)
        kfree(set);
 }
 
+static void
+ip_set_destroy_set_rcu(struct rcu_head *head)
+{
+       struct ip_set *set = container_of(head, struct ip_set, rcu);
+
+       ip_set_destroy_set(set);
+}
+
 static int ip_set_destroy(struct sk_buff *skb, const struct nfnl_info *info,
                          const struct nlattr * const attr[])
 {
@@ -1193,8 +1202,6 @@ static int ip_set_destroy(struct sk_buff *skb, const struct nfnl_info *info,
        if (unlikely(protocol_min_failed(attr)))
                return -IPSET_ERR_PROTOCOL;
 
-       /* Must wait for flush to be really finished in list:set */
-       rcu_barrier();
 
        /* Commands are serialized and references are
         * protected by the ip_set_ref_lock.
@@ -1206,8 +1213,10 @@ static int ip_set_destroy(struct sk_buff *skb, const struct nfnl_info *info,
         * counter, so if it's already zero, we can proceed
         * without holding the lock.
         */
-       read_lock_bh(&ip_set_ref_lock);
        if (!attr[IPSET_ATTR_SETNAME]) {
+               /* Must wait for flush to be really finished in list:set */
+               rcu_barrier();
+               read_lock_bh(&ip_set_ref_lock);
                for (i = 0; i < inst->ip_set_max; i++) {
                        s = ip_set(inst, i);
                        if (s && (s->ref || s->ref_netlink)) {
@@ -1221,6 +1230,8 @@ static int ip_set_destroy(struct sk_buff *skb, const struct nfnl_info *info,
                        s = ip_set(inst, i);
                        if (s) {
                                ip_set(inst, i) = NULL;
+                               /* Must cancel garbage collectors */
+                               s->variant->cancel_gc(s);
                                ip_set_destroy_set(s);
                        }
                }
@@ -1228,6 +1239,9 @@ static int ip_set_destroy(struct sk_buff *skb, const struct nfnl_info *info,
                inst->is_destroyed = false;
        } else {
                u32 flags = flag_exist(info->nlh);
+               u16 features = 0;
+
+               read_lock_bh(&ip_set_ref_lock);
                s = find_set_and_id(inst, nla_data(attr[IPSET_ATTR_SETNAME]),
                                    &i);
                if (!s) {
@@ -1238,10 +1252,16 @@ static int ip_set_destroy(struct sk_buff *skb, const struct nfnl_info *info,
                        ret = -IPSET_ERR_BUSY;
                        goto out;
                }
+               features = s->type->features;
                ip_set(inst, i) = NULL;
                read_unlock_bh(&ip_set_ref_lock);
-
-               ip_set_destroy_set(s);
+               if (features & IPSET_TYPE_NAME) {
+                       /* Must wait for flush to be really finished */
+                       rcu_barrier();
+               }
+               /* Must cancel garbage collectors */
+               s->variant->cancel_gc(s);
+               call_rcu(&s->rcu, ip_set_destroy_set_rcu);
        }
        return 0;
 out:
@@ -1394,9 +1414,6 @@ static int ip_set_swap(struct sk_buff *skb, const struct nfnl_info *info,
        ip_set(inst, to_id) = from;
        write_unlock_bh(&ip_set_ref_lock);
 
-       /* Make sure all readers of the old set pointers are completed. */
-       synchronize_rcu();
-
        return 0;
 }
 
@@ -2362,6 +2379,7 @@ ip_set_net_exit(struct net *net)
                set = ip_set(inst, i);
                if (set) {
                        ip_set(inst, i) = NULL;
+                       set->variant->cancel_gc(set);
                        ip_set_destroy_set(set);
                }
        }
@@ -2409,8 +2427,11 @@ ip_set_fini(void)
 {
        nf_unregister_sockopt(&so_set);
        nfnetlink_subsys_unregister(&ip_set_netlink_subsys);
-
        unregister_pernet_subsys(&ip_set_net_ops);
+
+       /* Wait for call_rcu() in destroy */
+       rcu_barrier();
+
        pr_debug("these are the famous last words\n");
 }
 
index cbf80da9a01caf0616d7d77d5be16521b6c0d47e..cf3ce72c3de645168b4698176518a02df6a6fa5a 100644 (file)
@@ -222,6 +222,7 @@ static const union nf_inet_addr zeromask = {};
 #undef mtype_gc_do
 #undef mtype_gc
 #undef mtype_gc_init
+#undef mtype_cancel_gc
 #undef mtype_variant
 #undef mtype_data_match
 
@@ -266,6 +267,7 @@ static const union nf_inet_addr zeromask = {};
 #define mtype_gc_do            IPSET_TOKEN(MTYPE, _gc_do)
 #define mtype_gc               IPSET_TOKEN(MTYPE, _gc)
 #define mtype_gc_init          IPSET_TOKEN(MTYPE, _gc_init)
+#define mtype_cancel_gc                IPSET_TOKEN(MTYPE, _cancel_gc)
 #define mtype_variant          IPSET_TOKEN(MTYPE, _variant)
 #define mtype_data_match       IPSET_TOKEN(MTYPE, _data_match)
 
@@ -430,7 +432,7 @@ mtype_ahash_destroy(struct ip_set *set, struct htable *t, bool ext_destroy)
        u32 i;
 
        for (i = 0; i < jhash_size(t->htable_bits); i++) {
-               n = __ipset_dereference(hbucket(t, i));
+               n = (__force struct hbucket *)hbucket(t, i);
                if (!n)
                        continue;
                if (set->extensions & IPSET_EXT_DESTROY && ext_destroy)
@@ -450,10 +452,7 @@ mtype_destroy(struct ip_set *set)
        struct htype *h = set->data;
        struct list_head *l, *lt;
 
-       if (SET_WITH_TIMEOUT(set))
-               cancel_delayed_work_sync(&h->gc.dwork);
-
-       mtype_ahash_destroy(set, ipset_dereference_nfnl(h->table), true);
+       mtype_ahash_destroy(set, (__force struct htable *)h->table, true);
        list_for_each_safe(l, lt, &h->ad) {
                list_del(l);
                kfree(l);
@@ -599,6 +598,15 @@ mtype_gc_init(struct htable_gc *gc)
        queue_delayed_work(system_power_efficient_wq, &gc->dwork, HZ);
 }
 
+static void
+mtype_cancel_gc(struct ip_set *set)
+{
+       struct htype *h = set->data;
+
+       if (SET_WITH_TIMEOUT(set))
+               cancel_delayed_work_sync(&h->gc.dwork);
+}
+
 static int
 mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
          struct ip_set_ext *mext, u32 flags);
@@ -1441,6 +1449,7 @@ static const struct ip_set_type_variant mtype_variant = {
        .uref   = mtype_uref,
        .resize = mtype_resize,
        .same_set = mtype_same_set,
+       .cancel_gc = mtype_cancel_gc,
        .region_lock = true,
 };
 
index e162636525cfb4ad02de58982382a289e5bcbc45..6c3f28bc59b3259f0033cd4adc0ba5711db08c26 100644 (file)
@@ -426,9 +426,6 @@ list_set_destroy(struct ip_set *set)
        struct list_set *map = set->data;
        struct set_elem *e, *n;
 
-       if (SET_WITH_TIMEOUT(set))
-               timer_shutdown_sync(&map->gc);
-
        list_for_each_entry_safe(e, n, &map->members, list) {
                list_del(&e->list);
                ip_set_put_byindex(map->net, e->id);
@@ -545,6 +542,15 @@ list_set_same_set(const struct ip_set *a, const struct ip_set *b)
               a->extensions == b->extensions;
 }
 
+static void
+list_set_cancel_gc(struct ip_set *set)
+{
+       struct list_set *map = set->data;
+
+       if (SET_WITH_TIMEOUT(set))
+               timer_shutdown_sync(&map->gc);
+}
+
 static const struct ip_set_type_variant set_variant = {
        .kadt   = list_set_kadt,
        .uadt   = list_set_uadt,
@@ -558,6 +564,7 @@ static const struct ip_set_type_variant set_variant = {
        .head   = list_set_head,
        .list   = list_set_list,
        .same_set = list_set_same_set,
+       .cancel_gc = list_set_cancel_gc,
 };
 
 static void
index 2e5f3864d353a39cfde138b725e790d7290b82c9..5b876fa7f9af9e5dfe950929b29f0fc92daf9bab 100644 (file)
@@ -2756,6 +2756,7 @@ static const struct nf_ct_hook nf_conntrack_hook = {
        .get_tuple_skb  = nf_conntrack_get_tuple_skb,
        .attach         = nf_conntrack_attach,
        .set_closing    = nf_conntrack_set_closing,
+       .confirm        = __nf_conntrack_confirm,
 };
 
 void nf_conntrack_init_end(void)
index e697a824b0018e1f1e26e3d547c1e80c6ca49e39..540d97715bd23d6f53f29fc7df39f09cd6b2f5c0 100644 (file)
@@ -533,6 +533,8 @@ static int decode_seq(struct bitstr *bs, const struct field_t *f,
        /* Get fields bitmap */
        if (nf_h323_error_boundary(bs, 0, f->sz))
                return H323_ERROR_BOUND;
+       if (f->sz > 32)
+               return H323_ERROR_RANGE;
        bmp = get_bitmap(bs, f->sz);
        if (base)
                *(unsigned int *)base = bmp;
@@ -589,6 +591,8 @@ static int decode_seq(struct bitstr *bs, const struct field_t *f,
        bmp2_len = get_bits(bs, 7) + 1;
        if (nf_h323_error_boundary(bs, 0, bmp2_len))
                return H323_ERROR_BOUND;
+       if (bmp2_len > 32)
+               return H323_ERROR_RANGE;
        bmp2 = get_bitmap(bs, bmp2_len);
        bmp |= bmp2 >> f->sz;
        if (base)
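
The two new range checks exist because the decoder accumulates the field bitmap in a 32-bit integer (note the *(unsigned int *)base store above): a declared size above 32 cannot be represented, and shifting a 32-bit value by 32 or more is undefined behaviour in C. A simplified, runnable stand-in for the guarded operation (the real get_bitmap() reads from a bit stream):

#include <stdint.h>
#include <stdio.h>

static uint32_t get_bitmap(uint32_t word, unsigned int sz)
{
	if (sz == 0 || sz > 32)	/* the added guard: reject, never shift */
		return 0;
	return word & (~0u << (32 - sz));
}

int main(void)
{
	printf("%08x\n", get_bitmap(0xdeadbeefu, 8));	/* keeps the top 8 bits */
	printf("%08x\n", get_bitmap(0xdeadbeefu, 40));	/* out of range: 0 */
	return 0;
}
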
index 0c22a02c2035ccb9c760d71fe2e5dd9c461bf239..3b846cbdc050d324626586fb6ece00985efd874b 100644 (file)
@@ -876,6 +876,7 @@ struct ctnetlink_filter_u32 {
 
 struct ctnetlink_filter {
        u8 family;
+       bool zone_filter;
 
        u_int32_t orig_flags;
        u_int32_t reply_flags;
@@ -992,9 +993,12 @@ ctnetlink_alloc_filter(const struct nlattr * const cda[], u8 family)
        if (err)
                goto err_filter;
 
-       err = ctnetlink_parse_zone(cda[CTA_ZONE], &filter->zone);
-       if (err < 0)
-               goto err_filter;
+       if (cda[CTA_ZONE]) {
+               err = ctnetlink_parse_zone(cda[CTA_ZONE], &filter->zone);
+               if (err < 0)
+                       goto err_filter;
+               filter->zone_filter = true;
+       }
 
        if (!cda[CTA_FILTER])
                return filter;
@@ -1148,7 +1152,7 @@ static int ctnetlink_filter_match(struct nf_conn *ct, void *data)
        if (filter->family && nf_ct_l3num(ct) != filter->family)
                goto ignore_entry;
 
-       if (filter->zone.id != NF_CT_DEFAULT_ZONE_ID &&
+       if (filter->zone_filter &&
            !nf_ct_zone_equal_any(ct, &filter->zone))
                goto ignore_entry;
 
index c6bd533983c1ff275796b789cdb973c09f646984..4cc97f971264ed779434ab4597dd0162586b3736 100644 (file)
@@ -283,7 +283,7 @@ sctp_new(struct nf_conn *ct, const struct sk_buff *skb,
                        pr_debug("Setting vtag %x for secondary conntrack\n",
                                 sh->vtag);
                        ct->proto.sctp.vtag[IP_CT_DIR_ORIGINAL] = sh->vtag;
-               } else {
+               } else if (sch->type == SCTP_CID_SHUTDOWN_ACK) {
                /* If it is a shutdown ack OOTB packet, we expect a return
                   shutdown complete, otherwise an ABORT Sec 8.4 (5) and (8) */
                        pr_debug("Setting vtag %x for new conn OOTB\n",
index e573be5afde7a591e00e799aadeadcf455b31f05..ae493599a3ef03415f6c40e942cdab700acb84c6 100644 (file)
@@ -457,7 +457,8 @@ static void tcp_init_sender(struct ip_ct_tcp_state *sender,
                            const struct sk_buff *skb,
                            unsigned int dataoff,
                            const struct tcphdr *tcph,
-                           u32 end, u32 win)
+                           u32 end, u32 win,
+                           enum ip_conntrack_dir dir)
 {
        /* SYN-ACK in reply to a SYN
         * or SYN from reply direction in simultaneous open.
@@ -471,7 +472,8 @@ static void tcp_init_sender(struct ip_ct_tcp_state *sender,
         * Both sides must send the Window Scale option
         * to enable window scaling in either direction.
         */
-       if (!(sender->flags & IP_CT_TCP_FLAG_WINDOW_SCALE &&
+       if (dir == IP_CT_DIR_REPLY &&
+           !(sender->flags & IP_CT_TCP_FLAG_WINDOW_SCALE &&
              receiver->flags & IP_CT_TCP_FLAG_WINDOW_SCALE)) {
                sender->td_scale = 0;
                receiver->td_scale = 0;
@@ -542,7 +544,7 @@ tcp_in_window(struct nf_conn *ct, enum ip_conntrack_dir dir,
                if (tcph->syn) {
                        tcp_init_sender(sender, receiver,
                                        skb, dataoff, tcph,
-                                       end, win);
+                                       end, win, dir);
                        if (!tcph->ack)
                                /* Simultaneous open */
                                return NFCT_TCP_ACCEPT;
@@ -585,7 +587,7 @@ tcp_in_window(struct nf_conn *ct, enum ip_conntrack_dir dir,
                 */
                tcp_init_sender(sender, receiver,
                                skb, dataoff, tcph,
-                               end, win);
+                               end, win, dir);
 
                if (dir == IP_CT_DIR_REPLY && !tcph->ack)
                        return NFCT_TCP_ACCEPT;
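
Passing the direction down lets tcp_init_sender() apply the window-scale reset only when the reply side (re)initialises state; seen from the original direction, the peer's scaling option may simply not have arrived yet. A toy model of that condition (names are illustrative):

    #include <stdbool.h>
    #include <stdio.h>

    enum dir { DIR_ORIGINAL, DIR_REPLY };

    struct peer { bool ws_seen; int td_scale; };

    /* Only when the reply direction initialises the sender is the
     * "did both sides send window scaling?" decision complete.
     */
    static void init_sender(struct peer *snd, struct peer *rcv, enum dir d)
    {
            if (d == DIR_REPLY && !(snd->ws_seen && rcv->ws_seen)) {
                    snd->td_scale = 0;
                    rcv->td_scale = 0;
            }
    }

    int main(void)
    {
            struct peer a = { true, 7 }, b = { false, 0 };

            init_sender(&a, &b, DIR_ORIGINAL);      /* scales preserved */
            printf("%d\n", a.td_scale);             /* 7 */
            init_sender(&a, &b, DIR_REPLY);         /* now the reset applies */
            printf("%d\n", a.td_scale);             /* 0 */
            return 0;
    }
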
index 920a5a29ae1dceba6849aaad6d62701567d3ec99..a0571339239c40ded96c4a9466d53d5de2887ed5 100644 (file)
@@ -87,12 +87,22 @@ static u32 flow_offload_dst_cookie(struct flow_offload_tuple *flow_tuple)
        return 0;
 }
 
+static struct dst_entry *nft_route_dst_fetch(struct nf_flow_route *route,
+                                            enum flow_offload_tuple_dir dir)
+{
+       struct dst_entry *dst = route->tuple[dir].dst;
+
+       route->tuple[dir].dst = NULL;
+
+       return dst;
+}
+
 static int flow_offload_fill_route(struct flow_offload *flow,
-                                  const struct nf_flow_route *route,
+                                  struct nf_flow_route *route,
                                   enum flow_offload_tuple_dir dir)
 {
        struct flow_offload_tuple *flow_tuple = &flow->tuplehash[dir].tuple;
-       struct dst_entry *dst = route->tuple[dir].dst;
+       struct dst_entry *dst = nft_route_dst_fetch(route, dir);
        int i, j = 0;
 
        switch (flow_tuple->l3proto) {
@@ -122,6 +132,7 @@ static int flow_offload_fill_route(struct flow_offload *flow,
                       ETH_ALEN);
                flow_tuple->out.ifidx = route->tuple[dir].out.ifindex;
                flow_tuple->out.hw_ifidx = route->tuple[dir].out.hw_ifindex;
+               dst_release(dst);
                break;
        case FLOW_OFFLOAD_XMIT_XFRM:
        case FLOW_OFFLOAD_XMIT_NEIGH:
@@ -146,7 +157,7 @@ static void nft_flow_dst_release(struct flow_offload *flow,
 }
 
 void flow_offload_route_init(struct flow_offload *flow,
-                           const struct nf_flow_route *route)
+                            struct nf_flow_route *route)
 {
        flow_offload_fill_route(flow, route, FLOW_OFFLOAD_DIR_ORIGINAL);
        flow_offload_fill_route(flow, route, FLOW_OFFLOAD_DIR_REPLY);
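
nft_route_dst_fetch() implements a move: the dst reference leaves the route structure and the slot is cleared, so whichever path later releases the route cannot drop the same reference twice. The shape of the pattern, as a small userspace sketch:

    #include <stddef.h>
    #include <stdlib.h>

    /* Generic "move" helper modelled on nft_route_dst_fetch(): take the
     * pointer out of its slot and clear the slot, so the caller becomes
     * the single owner and a later cleanup of the slot cannot double-free.
     */
    static void *fetch_and_clear(void **slot)
    {
            void *p = *slot;

            *slot = NULL;
            return p;
    }

    int main(void)
    {
            void *route_dst = malloc(16);
            void *dst = fetch_and_clear(&route_dst);

            free(dst);              /* single release */
            free(route_dst);        /* now NULL: harmless no-op */
            return 0;
    }
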
index 8cc52d2bd31be518df778bbe2cfaad6172d90dbc..e16f158388bbe568cddc1be5a0a6d16069897822 100644 (file)
@@ -193,11 +193,12 @@ void nf_logger_put(int pf, enum nf_log_type type)
                return;
        }
 
-       BUG_ON(loggers[pf][type] == NULL);
-
        rcu_read_lock();
        logger = rcu_dereference(loggers[pf][type]);
-       module_put(logger->me);
+       if (!logger)
+               WARN_ON_ONCE(1);
+       else
+               module_put(logger->me);
        rcu_read_unlock();
 }
 EXPORT_SYMBOL_GPL(nf_logger_put);
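
Replacing the BUG_ON() with a warn-and-skip path turns an "impossible" NULL logger into a recoverable event instead of a machine-halting crash. A standalone sketch of the warn-and-tolerate idiom (module_put() is only hinted at in a comment):

    #include <stdio.h>

    /* Instead of crashing on an "impossible" NULL (the old BUG_ON),
     * warn and skip the release, which is the recoverable behaviour
     * the nf_logger_put() change switches to.
     */
    static void put_logger(void *logger)
    {
            if (!logger) {
                    fprintf(stderr, "logger unexpectedly NULL\n");
                    return;                 /* tolerate, don't crash */
            }
            /* module_put(logger->me) would go here in the kernel */
    }

    int main(void)
    {
            put_logger(NULL);       /* warns, process keeps running */
            return 0;
    }
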
index c3d7ecbc777ce08525bedee77d637c18682d816c..016c816d91cbc49bfbd5417c295e26667b7be179 100644 (file)
@@ -551,8 +551,11 @@ static void nf_nat_l4proto_unique_tuple(struct nf_conntrack_tuple *tuple,
 find_free_id:
        if (range->flags & NF_NAT_RANGE_PROTO_OFFSET)
                off = (ntohs(*keyptr) - ntohs(range->base_proto.all));
-       else
+       else if ((range->flags & NF_NAT_RANGE_PROTO_RANDOM_ALL) ||
+                maniptype != NF_NAT_MANIP_DST)
                off = get_random_u16();
+       else
+               off = 0;
 
        attempts = range_size;
        if (attempts > NF_NAT_MAX_ATTEMPTS)
index 4b55533ce5ca2c29b1648b4f36de3e835c8953a6..1683dc196b5921da91c8dd81b99ac9106c7493fe 100644 (file)
@@ -24,6 +24,7 @@
 #include <net/sock.h>
 
 #define NFT_MODULE_AUTOLOAD_LIMIT (MODULE_NAME_LEN - sizeof("nft-expr-255-"))
+#define NFT_SET_MAX_ANONLEN 16
 
 unsigned int nf_tables_net_id __read_mostly;
 
@@ -683,15 +684,16 @@ static int nft_delobj(struct nft_ctx *ctx, struct nft_object *obj)
        return err;
 }
 
-static int nft_trans_flowtable_add(struct nft_ctx *ctx, int msg_type,
-                                  struct nft_flowtable *flowtable)
+static struct nft_trans *
+nft_trans_flowtable_add(struct nft_ctx *ctx, int msg_type,
+                       struct nft_flowtable *flowtable)
 {
        struct nft_trans *trans;
 
        trans = nft_trans_alloc(ctx, msg_type,
                                sizeof(struct nft_trans_flowtable));
        if (trans == NULL)
-               return -ENOMEM;
+               return ERR_PTR(-ENOMEM);
 
        if (msg_type == NFT_MSG_NEWFLOWTABLE)
                nft_activate_next(ctx->net, flowtable);
@@ -700,22 +702,22 @@ static int nft_trans_flowtable_add(struct nft_ctx *ctx, int msg_type,
        nft_trans_flowtable(trans) = flowtable;
        nft_trans_commit_list_add_tail(ctx->net, trans);
 
-       return 0;
+       return trans;
 }
 
 static int nft_delflowtable(struct nft_ctx *ctx,
                            struct nft_flowtable *flowtable)
 {
-       int err;
+       struct nft_trans *trans;
 
-       err = nft_trans_flowtable_add(ctx, NFT_MSG_DELFLOWTABLE, flowtable);
-       if (err < 0)
-               return err;
+       trans = nft_trans_flowtable_add(ctx, NFT_MSG_DELFLOWTABLE, flowtable);
+       if (IS_ERR(trans))
+               return PTR_ERR(trans);
 
        nft_deactivate_next(ctx->net, flowtable);
        nft_use_dec(&ctx->table->use);
 
-       return err;
+       return 0;
 }
 
 static void __nft_reg_track_clobber(struct nft_regs_track *track, u8 dreg)
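
nft_trans_flowtable_add() now returns either the transaction or an encoded errno, which is the kernel's ERR_PTR convention: small negative errnos live in the top 4095 bytes of the address space, so a single pointer return can carry both outcomes. A userspace model of the encoding:

    #include <errno.h>
    #include <stdint.h>
    #include <stdio.h>

    #define MAX_ERRNO 4095

    static inline void *ERR_PTR(long err)      { return (void *)err; }
    static inline long PTR_ERR(const void *p)  { return (long)p; }
    static inline int IS_ERR(const void *p)
    {
            return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
    }

    /* Stand-in for nft_trans_flowtable_add(): either a valid object
     * or an encoded errno comes back through the same pointer.
     */
    static void *alloc_trans(int fail)
    {
            static int dummy;

            return fail ? ERR_PTR(-ENOMEM) : &dummy;
    }

    int main(void)
    {
            void *t = alloc_trans(1);

            if (IS_ERR(t))
                    printf("error %ld\n", PTR_ERR(t));  /* error -12 */
            return 0;
    }
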
@@ -1250,6 +1252,7 @@ static int nf_tables_updtable(struct nft_ctx *ctx)
        return 0;
 
 err_register_hooks:
+       ctx->table->flags |= NFT_TABLE_F_DORMANT;
        nft_trans_destroy(trans);
        return ret;
 }
@@ -2079,7 +2082,7 @@ static struct nft_hook *nft_netdev_hook_alloc(struct net *net,
        struct nft_hook *hook;
        int err;
 
-       hook = kmalloc(sizeof(struct nft_hook), GFP_KERNEL_ACCOUNT);
+       hook = kzalloc(sizeof(struct nft_hook), GFP_KERNEL_ACCOUNT);
        if (!hook) {
                err = -ENOMEM;
                goto err_hook_alloc;
@@ -2502,19 +2505,15 @@ static int nf_tables_addchain(struct nft_ctx *ctx, u8 family, u8 genmask,
        RCU_INIT_POINTER(chain->blob_gen_0, blob);
        RCU_INIT_POINTER(chain->blob_gen_1, blob);
 
-       err = nf_tables_register_hook(net, table, chain);
-       if (err < 0)
-               goto err_destroy_chain;
-
        if (!nft_use_inc(&table->use)) {
                err = -EMFILE;
-               goto err_use;
+               goto err_destroy_chain;
        }
 
        trans = nft_trans_chain_add(ctx, NFT_MSG_NEWCHAIN);
        if (IS_ERR(trans)) {
                err = PTR_ERR(trans);
-               goto err_unregister_hook;
+               goto err_trans;
        }
 
        nft_trans_chain_policy(trans) = NFT_CHAIN_POLICY_UNSET;
@@ -2522,17 +2521,22 @@ static int nf_tables_addchain(struct nft_ctx *ctx, u8 family, u8 genmask,
                nft_trans_chain_policy(trans) = policy;
 
        err = nft_chain_add(table, chain);
-       if (err < 0) {
-               nft_trans_destroy(trans);
-               goto err_unregister_hook;
-       }
+       if (err < 0)
+               goto err_chain_add;
+
+       /* This must be LAST to ensure no packets are walking over this chain. */
+       err = nf_tables_register_hook(net, table, chain);
+       if (err < 0)
+               goto err_register_hook;
 
        return 0;
 
-err_unregister_hook:
+err_register_hook:
+       nft_chain_del(chain);
+err_chain_add:
+       nft_trans_destroy(trans);
+err_trans:
        nft_use_dec_restore(&table->use);
-err_use:
-       nf_tables_unregister_hook(net, table, chain);
 err_destroy_chain:
        nf_tables_chain_destroy(ctx);
 
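The reordered chain creation makes hook registration the final step, so packets can never traverse a chain whose transaction or list linkage is still incomplete, and the error labels unwind strictly in reverse order of acquisition. A generic sketch of that acquire-in-order, unwind-in-reverse shape (all function names here are placeholders):

    #include <stdio.h>

    static int step(const char *name, int fail)
    {
            printf("do   %s\n", name);
            return fail ? -1 : 0;
    }

    static void undo(const char *name) { printf("undo %s\n", name); }

    static int add_chain(int fail_at_hook)
    {
            int err;

            if ((err = step("use counter", 0)))
                    goto out;
            if ((err = step("transaction", 0)))
                    goto err_use;
            if ((err = step("chain list", 0)))
                    goto err_trans;
            /* LAST: only now can packets see the chain */
            if ((err = step("register hook", fail_at_hook)))
                    goto err_chain;
            return 0;

    err_chain:
            undo("chain list");
    err_trans:
            undo("transaction");
    err_use:
            undo("use counter");
    out:
            return err;
    }

    int main(void)
    {
            add_chain(1);   /* prints the reverse-order unwind */
            return 0;
    }
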
@@ -4413,6 +4417,9 @@ static int nf_tables_set_alloc_name(struct nft_ctx *ctx, struct nft_set *set,
                if (p[1] != 'd' || strchr(p + 2, '%'))
                        return -EINVAL;
 
+               if (strnlen(name, NFT_SET_MAX_ANONLEN) >= NFT_SET_MAX_ANONLEN)
+                       return -EINVAL;
+
                inuse = (unsigned long *)get_zeroed_page(GFP_KERNEL);
                if (inuse == NULL)
                        return -ENOMEM;
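
The NFT_SET_MAX_ANONLEN check relies on a property of strnlen(): bounded by the limit, it never scans past the buffer, and a result equal to the limit means the template (plus its terminating NUL) cannot fit. In miniature:

    #include <stdio.h>
    #include <string.h>

    #define MAX_ANONLEN 16          /* mirrors NFT_SET_MAX_ANONLEN */

    static int name_ok(const char *tmpl)
    {
            return strnlen(tmpl, MAX_ANONLEN) < MAX_ANONLEN;
    }

    int main(void)
    {
            printf("%d\n", name_ok("__set%d"));                 /* 1 */
            printf("%d\n", name_ok("an_overly_long_name%d"));   /* 0 */
            return 0;
    }
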
@@ -4994,6 +5001,12 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info,
                if ((flags & (NFT_SET_EVAL | NFT_SET_OBJECT)) ==
                             (NFT_SET_EVAL | NFT_SET_OBJECT))
                        return -EOPNOTSUPP;
+               if ((flags & (NFT_SET_ANONYMOUS | NFT_SET_TIMEOUT | NFT_SET_EVAL)) ==
+                            (NFT_SET_ANONYMOUS | NFT_SET_TIMEOUT))
+                       return -EOPNOTSUPP;
+               if ((flags & (NFT_SET_CONSTANT | NFT_SET_TIMEOUT)) ==
+                            (NFT_SET_CONSTANT | NFT_SET_TIMEOUT))
+                       return -EOPNOTSUPP;
        }
 
        desc.dtype = 0;
@@ -5417,6 +5430,7 @@ static void nf_tables_unbind_set(const struct nft_ctx *ctx, struct nft_set *set,
 
        if (list_empty(&set->bindings) && nft_set_is_anonymous(set)) {
                list_del_rcu(&set->list);
+               set->dead = 1;
                if (event)
                        nf_tables_set_notify(ctx, set, NFT_MSG_DELSET,
                                             GFP_KERNEL);
@@ -7547,11 +7561,15 @@ nla_put_failure:
        return -1;
 }
 
-static const struct nft_object_type *__nft_obj_type_get(u32 objtype)
+static const struct nft_object_type *__nft_obj_type_get(u32 objtype, u8 family)
 {
        const struct nft_object_type *type;
 
        list_for_each_entry(type, &nf_tables_objects, list) {
+               if (type->family != NFPROTO_UNSPEC &&
+                   type->family != family)
+                       continue;
+
                if (objtype == type->type)
                        return type;
        }
@@ -7559,11 +7577,11 @@ static const struct nft_object_type *__nft_obj_type_get(u32 objtype)
 }
 
 static const struct nft_object_type *
-nft_obj_type_get(struct net *net, u32 objtype)
+nft_obj_type_get(struct net *net, u32 objtype, u8 family)
 {
        const struct nft_object_type *type;
 
-       type = __nft_obj_type_get(objtype);
+       type = __nft_obj_type_get(objtype, family);
        if (type != NULL && try_module_get(type->owner))
                return type;
 
@@ -7656,7 +7674,7 @@ static int nf_tables_newobj(struct sk_buff *skb, const struct nfnl_info *info,
                if (info->nlh->nlmsg_flags & NLM_F_REPLACE)
                        return -EOPNOTSUPP;
 
-               type = __nft_obj_type_get(objtype);
+               type = __nft_obj_type_get(objtype, family);
                if (WARN_ON_ONCE(!type))
                        return -ENOENT;
 
@@ -7670,7 +7688,7 @@ static int nf_tables_newobj(struct sk_buff *skb, const struct nfnl_info *info,
        if (!nft_use_inc(&table->use))
                return -EMFILE;
 
-       type = nft_obj_type_get(net, objtype);
+       type = nft_obj_type_get(net, objtype, family);
        if (IS_ERR(type)) {
                err = PTR_ERR(type);
                goto err_type;
@@ -8447,9 +8465,9 @@ static int nf_tables_newflowtable(struct sk_buff *skb,
        u8 family = info->nfmsg->nfgen_family;
        const struct nf_flowtable_type *type;
        struct nft_flowtable *flowtable;
-       struct nft_hook *hook, *next;
        struct net *net = info->net;
        struct nft_table *table;
+       struct nft_trans *trans;
        struct nft_ctx ctx;
        int err;
 
@@ -8529,34 +8547,34 @@ static int nf_tables_newflowtable(struct sk_buff *skb,
        err = nft_flowtable_parse_hook(&ctx, nla, &flowtable_hook, flowtable,
                                       extack, true);
        if (err < 0)
-               goto err4;
+               goto err_flowtable_parse_hooks;
 
        list_splice(&flowtable_hook.list, &flowtable->hook_list);
        flowtable->data.priority = flowtable_hook.priority;
        flowtable->hooknum = flowtable_hook.num;
 
+       trans = nft_trans_flowtable_add(&ctx, NFT_MSG_NEWFLOWTABLE, flowtable);
+       if (IS_ERR(trans)) {
+               err = PTR_ERR(trans);
+               goto err_flowtable_trans;
+       }
+
+       /* This must be LAST to ensure no packets are walking over this flowtable. */
        err = nft_register_flowtable_net_hooks(ctx.net, table,
                                               &flowtable->hook_list,
                                               flowtable);
-       if (err < 0) {
-               nft_hooks_destroy(&flowtable->hook_list);
-               goto err4;
-       }
-
-       err = nft_trans_flowtable_add(&ctx, NFT_MSG_NEWFLOWTABLE, flowtable);
        if (err < 0)
-               goto err5;
+               goto err_flowtable_hooks;
 
        list_add_tail_rcu(&flowtable->list, &table->flowtables);
 
        return 0;
-err5:
-       list_for_each_entry_safe(hook, next, &flowtable->hook_list, list) {
-               nft_unregister_flowtable_hook(net, flowtable, hook);
-               list_del_rcu(&hook->list);
-               kfree_rcu(hook, rcu);
-       }
-err4:
+
+err_flowtable_hooks:
+       nft_trans_destroy(trans);
+err_flowtable_trans:
+       nft_hooks_destroy(&flowtable->hook_list);
+err_flowtable_parse_hooks:
        flowtable->data.type->free(&flowtable->data);
 err3:
        module_put(type->owner);
@@ -9819,6 +9837,7 @@ dead_elem:
 struct nft_trans_gc *nft_trans_gc_catchall_sync(struct nft_trans_gc *gc)
 {
        struct nft_set_elem_catchall *catchall, *next;
+       u64 tstamp = nft_net_tstamp(gc->net);
        const struct nft_set *set = gc->set;
        struct nft_elem_priv *elem_priv;
        struct nft_set_ext *ext;
@@ -9828,7 +9847,7 @@ struct nft_trans_gc *nft_trans_gc_catchall_sync(struct nft_trans_gc *gc)
        list_for_each_entry_safe(catchall, next, &set->catchall_list, list) {
                ext = nft_set_elem_ext(set, catchall->elem);
 
-               if (!nft_set_elem_expired(ext))
+               if (!__nft_set_elem_expired(ext, tstamp))
                        continue;
 
                gc = nft_trans_gc_queue_sync(gc, GFP_KERNEL);
@@ -10614,6 +10633,7 @@ static bool nf_tables_valid_genid(struct net *net, u32 genid)
        bool genid_ok;
 
        mutex_lock(&nft_net->commit_mutex);
+       nft_net->tstamp = get_jiffies_64();
 
        genid_ok = genid == 0 || nft_net->base_seq == genid;
        if (!genid_ok)
@@ -10988,16 +11008,10 @@ static int nft_verdict_init(const struct nft_ctx *ctx, struct nft_data *data,
        data->verdict.code = ntohl(nla_get_be32(tb[NFTA_VERDICT_CODE]));
 
        switch (data->verdict.code) {
-       default:
-               switch (data->verdict.code & NF_VERDICT_MASK) {
-               case NF_ACCEPT:
-               case NF_DROP:
-               case NF_QUEUE:
-                       break;
-               default:
-                       return -EINVAL;
-               }
-               fallthrough;
+       case NF_ACCEPT:
+       case NF_DROP:
+       case NF_QUEUE:
+               break;
        case NFT_CONTINUE:
        case NFT_BREAK:
        case NFT_RETURN:
@@ -11032,6 +11046,8 @@ static int nft_verdict_init(const struct nft_ctx *ctx, struct nft_data *data,
 
                data->verdict.chain = chain;
                break;
+       default:
+               return -EINVAL;
        }
 
        desc->len = sizeof(data->verdict);
index 171d1f52d3dd0da711cd63b23ec31d72fa88cdd2..5cf38fc0a366ac55c0ce11798baf7fb93c88283f 100644 (file)
@@ -232,18 +232,25 @@ static void nfqnl_reinject(struct nf_queue_entry *entry, unsigned int verdict)
        if (verdict == NF_ACCEPT ||
            verdict == NF_REPEAT ||
            verdict == NF_STOP) {
+               unsigned int ct_verdict = verdict;
+
                rcu_read_lock();
                ct_hook = rcu_dereference(nf_ct_hook);
                if (ct_hook)
-                       verdict = ct_hook->update(entry->state.net, entry->skb);
+                       ct_verdict = ct_hook->update(entry->state.net, entry->skb);
                rcu_read_unlock();
 
-               switch (verdict & NF_VERDICT_MASK) {
+               switch (ct_verdict & NF_VERDICT_MASK) {
+               case NF_ACCEPT:
+                       /* follow userspace verdict, could be REPEAT */
+                       break;
                case NF_STOLEN:
                        nf_queue_entry_free(entry);
                        return;
+               default:
+                       verdict = ct_verdict & NF_VERDICT_MASK;
+                       break;
                }
-
        }
        nf_reinject(entry, verdict);
 }
index 680fe557686e42d3421a445b6c5472bd4056a65a..274b6f7e6bb57e4f270262ef923ebf8d7f1cf02c 100644 (file)
@@ -357,9 +357,10 @@ static int nf_tables_netdev_event(struct notifier_block *this,
                                  unsigned long event, void *ptr)
 {
        struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+       struct nft_base_chain *basechain;
        struct nftables_pernet *nft_net;
-       struct nft_table *table;
        struct nft_chain *chain, *nr;
+       struct nft_table *table;
        struct nft_ctx ctx = {
                .net    = dev_net(dev),
        };
@@ -371,7 +372,8 @@ static int nf_tables_netdev_event(struct notifier_block *this,
        nft_net = nft_pernet(ctx.net);
        mutex_lock(&nft_net->commit_mutex);
        list_for_each_entry(table, &nft_net->tables, list) {
-               if (table->family != NFPROTO_NETDEV)
+               if (table->family != NFPROTO_NETDEV &&
+                   table->family != NFPROTO_INET)
                        continue;
 
                ctx.family = table->family;
@@ -380,6 +382,11 @@ static int nf_tables_netdev_event(struct notifier_block *this,
                        if (!nft_is_base_chain(chain))
                                continue;
 
+                       basechain = nft_base_chain(chain);
+                       if (table->family == NFPROTO_INET &&
+                           basechain->ops.hooknum != NF_INET_INGRESS)
+                               continue;
+
                        ctx.chain = chain;
                        nft_netdev_event(event, dev, &ctx);
                }
index 5284cd2ad532713368db0cd56bdf17baf1e0ed4d..d3d11dede54507262022725a5e54a12f0def7f89 100644 (file)
@@ -135,7 +135,7 @@ static void nft_target_eval_bridge(const struct nft_expr *expr,
 
 static const struct nla_policy nft_target_policy[NFTA_TARGET_MAX + 1] = {
        [NFTA_TARGET_NAME]      = { .type = NLA_NUL_STRING },
-       [NFTA_TARGET_REV]       = { .type = NLA_U32 },
+       [NFTA_TARGET_REV]       = NLA_POLICY_MAX(NLA_BE32, 255),
        [NFTA_TARGET_INFO]      = { .type = NLA_BINARY },
 };
 
@@ -200,6 +200,7 @@ static const struct nla_policy nft_rule_compat_policy[NFTA_RULE_COMPAT_MAX + 1]
 static int nft_parse_compat(const struct nlattr *attr, u16 *proto, bool *inv)
 {
        struct nlattr *tb[NFTA_RULE_COMPAT_MAX+1];
+       u32 l4proto;
        u32 flags;
        int err;
 
@@ -212,12 +213,18 @@ static int nft_parse_compat(const struct nlattr *attr, u16 *proto, bool *inv)
                return -EINVAL;
 
        flags = ntohl(nla_get_be32(tb[NFTA_RULE_COMPAT_FLAGS]));
-       if (flags & ~NFT_RULE_COMPAT_F_MASK)
+       if (flags & NFT_RULE_COMPAT_F_UNUSED ||
+           flags & ~NFT_RULE_COMPAT_F_MASK)
                return -EINVAL;
        if (flags & NFT_RULE_COMPAT_F_INV)
                *inv = true;
 
-       *proto = ntohl(nla_get_be32(tb[NFTA_RULE_COMPAT_PROTO]));
+       l4proto = ntohl(nla_get_be32(tb[NFTA_RULE_COMPAT_PROTO]));
+       if (l4proto > U16_MAX)
+               return -EINVAL;
+
+       *proto = l4proto;
+
        return 0;
 }
 
@@ -350,6 +357,22 @@ static int nft_target_validate(const struct nft_ctx *ctx,
        unsigned int hook_mask = 0;
        int ret;
 
+       if (ctx->family != NFPROTO_IPV4 &&
+           ctx->family != NFPROTO_IPV6 &&
+           ctx->family != NFPROTO_INET &&
+           ctx->family != NFPROTO_BRIDGE &&
+           ctx->family != NFPROTO_ARP)
+               return -EOPNOTSUPP;
+
+       ret = nft_chain_validate_hooks(ctx->chain,
+                                      (1 << NF_INET_PRE_ROUTING) |
+                                      (1 << NF_INET_LOCAL_IN) |
+                                      (1 << NF_INET_FORWARD) |
+                                      (1 << NF_INET_LOCAL_OUT) |
+                                      (1 << NF_INET_POST_ROUTING));
+       if (ret)
+               return ret;
+
        if (nft_is_base_chain(ctx->chain)) {
                const struct nft_base_chain *basechain =
                                                nft_base_chain(ctx->chain);
@@ -413,7 +436,7 @@ static void nft_match_eval(const struct nft_expr *expr,
 
 static const struct nla_policy nft_match_policy[NFTA_MATCH_MAX + 1] = {
        [NFTA_MATCH_NAME]       = { .type = NLA_NUL_STRING },
-       [NFTA_MATCH_REV]        = { .type = NLA_U32 },
+       [NFTA_MATCH_REV]        = NLA_POLICY_MAX(NLA_BE32, 255),
        [NFTA_MATCH_INFO]       = { .type = NLA_BINARY },
 };
 
@@ -595,6 +618,22 @@ static int nft_match_validate(const struct nft_ctx *ctx,
        unsigned int hook_mask = 0;
        int ret;
 
+       if (ctx->family != NFPROTO_IPV4 &&
+           ctx->family != NFPROTO_IPV6 &&
+           ctx->family != NFPROTO_INET &&
+           ctx->family != NFPROTO_BRIDGE &&
+           ctx->family != NFPROTO_ARP)
+               return -EOPNOTSUPP;
+
+       ret = nft_chain_validate_hooks(ctx->chain,
+                                      (1 << NF_INET_PRE_ROUTING) |
+                                      (1 << NF_INET_LOCAL_IN) |
+                                      (1 << NF_INET_FORWARD) |
+                                      (1 << NF_INET_LOCAL_OUT) |
+                                      (1 << NF_INET_POST_ROUTING));
+       if (ret)
+               return ret;
+
        if (nft_is_base_chain(ctx->chain)) {
                const struct nft_base_chain *basechain =
                                                nft_base_chain(ctx->chain);
@@ -712,7 +751,7 @@ out_put:
 static const struct nla_policy nfnl_compat_policy_get[NFTA_COMPAT_MAX+1] = {
        [NFTA_COMPAT_NAME]      = { .type = NLA_NUL_STRING,
                                    .len = NFT_COMPAT_NAME_MAX-1 },
-       [NFTA_COMPAT_REV]       = { .type = NLA_U32 },
+       [NFTA_COMPAT_REV]       = NLA_POLICY_MAX(NLA_BE32, 255),
        [NFTA_COMPAT_TYPE]      = { .type = NLA_U32 },
 };
 
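All three policy entries move from an unbounded NLA_U32 to NLA_POLICY_MAX(NLA_BE32, 255), and the compat protocol is additionally capped at U16_MAX before being narrowed into *proto. The same two range clamps as a standalone sketch:

    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative range validation matching the two checks in the hunks:
     * revisions are limited to 255 and the L4 protocol must fit in u16.
     */
    static int validate(uint32_t rev, uint32_t l4proto, uint16_t *proto)
    {
            if (rev > 255)
                    return -1;      /* NLA_POLICY_MAX(NLA_BE32, 255) */
            if (l4proto > UINT16_MAX)
                    return -1;      /* l4proto > U16_MAX */
            *proto = (uint16_t)l4proto;
            return 0;
    }

    int main(void)
    {
            uint16_t proto;

            printf("%d\n", validate(1, 6, &proto));         /*  0 */
            printf("%d\n", validate(1, 70000, &proto));     /* -1 */
            return 0;
    }
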
index 86bb9d7797d9eeaea730e463c389a958e0b6ec85..255640013ab84542b76026e9fc4ae4a2f61b2c99 100644 (file)
@@ -476,6 +476,9 @@ static int nft_ct_get_init(const struct nft_ctx *ctx,
                break;
 #endif
        case NFT_CT_ID:
+               if (tb[NFTA_CT_DIRECTION])
+                       return -EINVAL;
+
                len = sizeof(u32);
                break;
        default:
@@ -1250,7 +1253,30 @@ static int nft_ct_expect_obj_init(const struct nft_ctx *ctx,
        if (tb[NFTA_CT_EXPECT_L3PROTO])
                priv->l3num = ntohs(nla_get_be16(tb[NFTA_CT_EXPECT_L3PROTO]));
 
+       switch (priv->l3num) {
+       case NFPROTO_IPV4:
+       case NFPROTO_IPV6:
+               if (priv->l3num == ctx->family || ctx->family == NFPROTO_INET)
+                       break;
+
+               return -EINVAL;
+       case NFPROTO_INET: /* tuple.src.l3num supports NFPROTO_IPV4/6 only */
+       default:
+               return -EAFNOSUPPORT;
+       }
+
        priv->l4proto = nla_get_u8(tb[NFTA_CT_EXPECT_L4PROTO]);
+       switch (priv->l4proto) {
+       case IPPROTO_TCP:
+       case IPPROTO_UDP:
+       case IPPROTO_UDPLITE:
+       case IPPROTO_DCCP:
+       case IPPROTO_SCTP:
+               break;
+       default:
+               return -EOPNOTSUPP;
+       }
+
        priv->dport = nla_get_be16(tb[NFTA_CT_EXPECT_DPORT]);
        priv->timeout = nla_get_u32(tb[NFTA_CT_EXPECT_TIMEOUT]);
        priv->size = nla_get_u8(tb[NFTA_CT_EXPECT_SIZE]);
index ab3362c483b4a78c1e138815764e9e80bfd5d43d..ab95760987010b649483bf052fbdba9fde4c9624 100644 (file)
@@ -361,6 +361,7 @@ static void nft_flow_offload_eval(const struct nft_expr *expr,
                ct->proto.tcp.seen[1].flags |= IP_CT_TCP_FLAG_BE_LIBERAL;
        }
 
+       __set_bit(NF_FLOW_HW_BIDIRECTIONAL, &flow->flags);
        ret = flow_offload_add(flowtable, flow);
        if (ret < 0)
                goto err_flow_add;
@@ -384,6 +385,11 @@ static int nft_flow_offload_validate(const struct nft_ctx *ctx,
 {
        unsigned int hook_mask = (1 << NF_INET_FORWARD);
 
+       if (ctx->family != NFPROTO_IPV4 &&
+           ctx->family != NFPROTO_IPV6 &&
+           ctx->family != NFPROTO_INET)
+               return -EOPNOTSUPP;
+
        return nft_chain_validate_hooks(ctx->chain, hook_mask);
 }
 
index 79039afde34ecb1ca9fe1494855676ab26e7c53b..cefa25e0dbb0a2c87af43e8230cf7934ce8fa3d1 100644 (file)
@@ -58,17 +58,19 @@ static inline bool nft_limit_eval(struct nft_limit_priv *priv, u64 cost)
 static int nft_limit_init(struct nft_limit_priv *priv,
                          const struct nlattr * const tb[], bool pkts)
 {
+       u64 unit, tokens, rate_with_burst;
        bool invert = false;
-       u64 unit, tokens;
 
        if (tb[NFTA_LIMIT_RATE] == NULL ||
            tb[NFTA_LIMIT_UNIT] == NULL)
                return -EINVAL;
 
        priv->rate = be64_to_cpu(nla_get_be64(tb[NFTA_LIMIT_RATE]));
+       if (priv->rate == 0)
+               return -EINVAL;
+
        unit = be64_to_cpu(nla_get_be64(tb[NFTA_LIMIT_UNIT]));
-       priv->nsecs = unit * NSEC_PER_SEC;
-       if (priv->rate == 0 || priv->nsecs < unit)
+       if (check_mul_overflow(unit, NSEC_PER_SEC, &priv->nsecs))
                return -EOVERFLOW;
 
        if (tb[NFTA_LIMIT_BURST])
@@ -77,18 +79,25 @@ static int nft_limit_init(struct nft_limit_priv *priv,
        if (pkts && priv->burst == 0)
                priv->burst = NFT_LIMIT_PKT_BURST_DEFAULT;
 
-       if (priv->rate + priv->burst < priv->rate)
+       if (check_add_overflow(priv->rate, priv->burst, &rate_with_burst))
                return -EOVERFLOW;
 
        if (pkts) {
-               tokens = div64_u64(priv->nsecs, priv->rate) * priv->burst;
+               u64 tmp = div64_u64(priv->nsecs, priv->rate);
+
+               if (check_mul_overflow(tmp, priv->burst, &tokens))
+                       return -EOVERFLOW;
        } else {
+               u64 tmp;
+
                /* The token bucket size limits the number of tokens that can
                 * be accumulated. tokens_max specifies the bucket size.
                 * tokens_max = unit * (rate + burst) / rate.
                 */
-               tokens = div64_u64(priv->nsecs * (priv->rate + priv->burst),
-                                priv->rate);
+               if (check_mul_overflow(priv->nsecs, rate_with_burst, &tmp))
+                       return -EOVERFLOW;
+
+               tokens = div64_u64(tmp, priv->rate);
        }
 
        if (tb[NFTA_LIMIT_FLAGS]) {
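
nft_limit_init() previously detected overflow after the fact (priv->nsecs < unit, rate + burst < rate), which misses cases; the rewrite routes every multiply and add through check_mul_overflow()/check_add_overflow(). A userspace model using the compiler builtins those kernel helpers wrap:

    #include <stdint.h>
    #include <stdio.h>

    #define NSEC_PER_SEC 1000000000ULL

    /* Every multiply and add feeding the token bucket is checked for
     * u64 overflow before the result is used.
     */
    static int limit_tokens(uint64_t rate, uint64_t unit, uint64_t burst,
                            uint64_t *tokens)
    {
            uint64_t nsecs, rate_with_burst, tmp;

            if (rate == 0)
                    return -1;
            if (__builtin_mul_overflow(unit, NSEC_PER_SEC, &nsecs))
                    return -1;
            if (__builtin_add_overflow(rate, burst, &rate_with_burst))
                    return -1;
            /* tokens_max = unit * (rate + burst) / rate */
            if (__builtin_mul_overflow(nsecs, rate_with_burst, &tmp))
                    return -1;
            *tokens = tmp / rate;
            return 0;
    }

    int main(void)
    {
            uint64_t tokens;

            printf("%d\n", limit_tokens(1, UINT64_MAX, 0, &tokens));  /* -1 */
            printf("%d\n", limit_tokens(25, 1, 5, &tokens));          /*  0 */
            return 0;
    }
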
index 583885ce72328fab424da04f888398eb687c896a..808f5802c2704a583c747e71d227965fa5c1a8bf 100644 (file)
@@ -143,6 +143,11 @@ static int nft_nat_validate(const struct nft_ctx *ctx,
        struct nft_nat *priv = nft_expr_priv(expr);
        int err;
 
+       if (ctx->family != NFPROTO_IPV4 &&
+           ctx->family != NFPROTO_IPV6 &&
+           ctx->family != NFPROTO_INET)
+               return -EOPNOTSUPP;
+
        err = nft_chain_validate_dependency(ctx->chain, NFT_CHAIN_T_NAT);
        if (err < 0)
                return err;
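
This hunk is one instance of a pattern repeated across nft_rt, nft_socket, nft_synproxy, nft_tproxy, nft_xfrm, nft_flow_offload and the compat layer in this series: expressions with IPv4/IPv6-only semantics must refuse registration from any other family with -EOPNOTSUPP. The shape of the check, with the NFPROTO_* values written out as plain integers for the sketch:

    #include <stdio.h>

    enum { FAM_INET = 1, FAM_IPV4 = 2, FAM_BRIDGE = 7, FAM_IPV6 = 10 };

    static int validate_family(int family)
    {
            if (family != FAM_IPV4 &&
                family != FAM_IPV6 &&
                family != FAM_INET)
                    return -95;     /* -EOPNOTSUPP */
            return 0;
    }

    int main(void)
    {
            printf("%d\n", validate_family(FAM_INET));      /*   0 */
            printf("%d\n", validate_family(FAM_BRIDGE));    /* -95 */
            return 0;
    }
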
index 35a2c28caa60bb6d50da5febbf5a6d2be7c9bdd9..24d977138572988e87b8c726daf67441f0b41de2 100644 (file)
@@ -166,6 +166,11 @@ static int nft_rt_validate(const struct nft_ctx *ctx, const struct nft_expr *exp
        const struct nft_rt *priv = nft_expr_priv(expr);
        unsigned int hooks;
 
+       if (ctx->family != NFPROTO_IPV4 &&
+           ctx->family != NFPROTO_IPV6 &&
+           ctx->family != NFPROTO_INET)
+               return -EOPNOTSUPP;
+
        switch (priv->key) {
        case NFT_RT_NEXTHOP4:
        case NFT_RT_NEXTHOP6:
index 6c2061bfdae6c361c530088ca51aa2790d850ba4..6968a3b342367c6c0cb0df7523fdfd5864038802 100644 (file)
@@ -36,6 +36,7 @@ struct nft_rhash_cmp_arg {
        const struct nft_set            *set;
        const u32                       *key;
        u8                              genmask;
+       u64                             tstamp;
 };
 
 static inline u32 nft_rhash_key(const void *data, u32 len, u32 seed)
@@ -62,7 +63,7 @@ static inline int nft_rhash_cmp(struct rhashtable_compare_arg *arg,
                return 1;
        if (nft_set_elem_is_dead(&he->ext))
                return 1;
-       if (nft_set_elem_expired(&he->ext))
+       if (__nft_set_elem_expired(&he->ext, x->tstamp))
                return 1;
        if (!nft_set_elem_active(&he->ext, x->genmask))
                return 1;
@@ -87,6 +88,7 @@ bool nft_rhash_lookup(const struct net *net, const struct nft_set *set,
                .genmask = nft_genmask_cur(net),
                .set     = set,
                .key     = key,
+               .tstamp  = get_jiffies_64(),
        };
 
        he = rhashtable_lookup(&priv->ht, &arg, nft_rhash_params);
@@ -106,6 +108,7 @@ nft_rhash_get(const struct net *net, const struct nft_set *set,
                .genmask = nft_genmask_cur(net),
                .set     = set,
                .key     = elem->key.val.data,
+               .tstamp  = get_jiffies_64(),
        };
 
        he = rhashtable_lookup(&priv->ht, &arg, nft_rhash_params);
@@ -131,6 +134,7 @@ static bool nft_rhash_update(struct nft_set *set, const u32 *key,
                .genmask = NFT_GENMASK_ANY,
                .set     = set,
                .key     = key,
+               .tstamp  = get_jiffies_64(),
        };
 
        he = rhashtable_lookup(&priv->ht, &arg, nft_rhash_params);
@@ -175,6 +179,7 @@ static int nft_rhash_insert(const struct net *net, const struct nft_set *set,
                .genmask = nft_genmask_next(net),
                .set     = set,
                .key     = elem->key.val.data,
+               .tstamp  = nft_net_tstamp(net),
        };
        struct nft_rhash_elem *prev;
 
@@ -216,6 +221,7 @@ nft_rhash_deactivate(const struct net *net, const struct nft_set *set,
                .genmask = nft_genmask_next(net),
                .set     = set,
                .key     = elem->key.val.data,
+               .tstamp  = nft_net_tstamp(net),
        };
 
        rcu_read_lock();
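
Adding tstamp to the compare argument pins expiry decisions to a timestamp taken once per transaction (nft_net_tstamp()) instead of sampling jiffies at every comparison, so all lookups in a batch agree on which elements are dead. A sketch of the wrap-safe comparison this kind of check relies on:

    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative expiry check against a timestamp captured once per
     * transaction: the signed-difference form is wrap-safe, in the
     * style of the kernel's time_after_eq64().
     */
    static int expired(uint64_t expiration, uint64_t tstamp)
    {
            return (int64_t)(tstamp - expiration) >= 0;
    }

    int main(void)
    {
            uint64_t tstamp = 1000;     /* fixed at transaction start */

            printf("%d\n", expired(999,  tstamp));  /* 1 */
            printf("%d\n", expired(1001, tstamp));  /* 0 */
            return 0;
    }
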
index efd523496be45f59408e8b6dcec7ff40dbcf5844..aa1d9e93a9a04859d48e417501c7f9e889187400 100644 (file)
 #include "nft_set_pipapo_avx2.h"
 #include "nft_set_pipapo.h"
 
-/* Current working bitmap index, toggled between field matches */
-static DEFINE_PER_CPU(bool, nft_pipapo_scratch_index);
-
 /**
  * pipapo_refill() - For each set bit, set bits from selected mapping table item
  * @map:       Bitmap to be scanned for set bits
@@ -412,6 +409,7 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
                       const u32 *key, const struct nft_set_ext **ext)
 {
        struct nft_pipapo *priv = nft_set_priv(set);
+       struct nft_pipapo_scratch *scratch;
        unsigned long *res_map, *fill_map;
        u8 genmask = nft_genmask_cur(net);
        const u8 *rp = (const u8 *)key;
@@ -422,15 +420,17 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
 
        local_bh_disable();
 
-       map_index = raw_cpu_read(nft_pipapo_scratch_index);
-
        m = rcu_dereference(priv->match);
 
        if (unlikely(!m || !*raw_cpu_ptr(m->scratch)))
                goto out;
 
-       res_map  = *raw_cpu_ptr(m->scratch) + (map_index ? m->bsize_max : 0);
-       fill_map = *raw_cpu_ptr(m->scratch) + (map_index ? 0 : m->bsize_max);
+       scratch = *raw_cpu_ptr(m->scratch);
+
+       map_index = scratch->map_index;
+
+       res_map  = scratch->map + (map_index ? m->bsize_max : 0);
+       fill_map = scratch->map + (map_index ? 0 : m->bsize_max);
 
        memset(res_map, 0xff, m->bsize_max * sizeof(*res_map));
 
@@ -460,7 +460,7 @@ next_match:
                b = pipapo_refill(res_map, f->bsize, f->rules, fill_map, f->mt,
                                  last);
                if (b < 0) {
-                       raw_cpu_write(nft_pipapo_scratch_index, map_index);
+                       scratch->map_index = map_index;
                        local_bh_enable();
 
                        return false;
@@ -477,7 +477,7 @@ next_match:
                         * current inactive bitmap is clean and can be reused as
                         * *next* bitmap (not initial) for the next packet.
                         */
-                       raw_cpu_write(nft_pipapo_scratch_index, map_index);
+                       scratch->map_index = map_index;
                        local_bh_enable();
 
                        return true;
@@ -504,6 +504,7 @@ out:
  * @set:       nftables API set representation
  * @data:      Key data to be matched against existing elements
  * @genmask:   If set, check that element is active in given genmask
+ * @tstamp:    timestamp to check for expired elements
  *
  * This is essentially the same as the lookup function, except that it matches
  * key data against the uncommitted copy and doesn't use preallocated maps for
@@ -513,7 +514,8 @@ out:
  */
 static struct nft_pipapo_elem *pipapo_get(const struct net *net,
                                          const struct nft_set *set,
-                                         const u8 *data, u8 genmask)
+                                         const u8 *data, u8 genmask,
+                                         u64 tstamp)
 {
        struct nft_pipapo_elem *ret = ERR_PTR(-ENOENT);
        struct nft_pipapo *priv = nft_set_priv(set);
@@ -566,7 +568,7 @@ next_match:
                        goto out;
 
                if (last) {
-                       if (nft_set_elem_expired(&f->mt[b].e->ext))
+                       if (__nft_set_elem_expired(&f->mt[b].e->ext, tstamp))
                                goto next_match;
                        if ((genmask &&
                             !nft_set_elem_active(&f->mt[b].e->ext, genmask)))
@@ -603,10 +605,10 @@ static struct nft_elem_priv *
 nft_pipapo_get(const struct net *net, const struct nft_set *set,
               const struct nft_set_elem *elem, unsigned int flags)
 {
-       static struct nft_pipapo_elem *e;
+       struct nft_pipapo_elem *e;
 
        e = pipapo_get(net, set, (const u8 *)elem->key.val.data,
-                      nft_genmask_cur(net));
+                      nft_genmask_cur(net), get_jiffies_64());
        if (IS_ERR(e))
                return ERR_CAST(e);
 
@@ -1108,6 +1110,25 @@ static void pipapo_map(struct nft_pipapo_match *m,
                f->mt[map[i].to + j].e = e;
 }
 
+/**
+ * pipapo_free_scratch() - Free per-CPU map at original (not aligned) address
+ * @m:         Matching data
+ * @cpu:       CPU number
+ */
+static void pipapo_free_scratch(const struct nft_pipapo_match *m, unsigned int cpu)
+{
+       struct nft_pipapo_scratch *s;
+       void *mem;
+
+       s = *per_cpu_ptr(m->scratch, cpu);
+       if (!s)
+               return;
+
+       mem = s;
+       mem -= s->align_off;
+       kfree(mem);
+}
+
 /**
  * pipapo_realloc_scratch() - Reallocate scratch maps for partial match results
  * @clone:     Copy of matching data with pending insertions and deletions
@@ -1121,12 +1142,13 @@ static int pipapo_realloc_scratch(struct nft_pipapo_match *clone,
        int i;
 
        for_each_possible_cpu(i) {
-               unsigned long *scratch;
+               struct nft_pipapo_scratch *scratch;
 #ifdef NFT_PIPAPO_ALIGN
-               unsigned long *scratch_aligned;
+               void *scratch_aligned;
+               u32 align_off;
 #endif
-
-               scratch = kzalloc_node(bsize_max * sizeof(*scratch) * 2 +
+               scratch = kzalloc_node(struct_size(scratch, map,
+                                                  bsize_max * 2) +
                                       NFT_PIPAPO_ALIGN_HEADROOM,
                                       GFP_KERNEL, cpu_to_node(i));
                if (!scratch) {
@@ -1140,14 +1162,25 @@ static int pipapo_realloc_scratch(struct nft_pipapo_match *clone,
                        return -ENOMEM;
                }
 
-               kfree(*per_cpu_ptr(clone->scratch, i));
-
-               *per_cpu_ptr(clone->scratch, i) = scratch;
+               pipapo_free_scratch(clone, i);
 
 #ifdef NFT_PIPAPO_ALIGN
-               scratch_aligned = NFT_PIPAPO_LT_ALIGN(scratch);
-               *per_cpu_ptr(clone->scratch_aligned, i) = scratch_aligned;
+               /* Align &scratch->map (not the struct itself): the extra
+                * %NFT_PIPAPO_ALIGN_HEADROOM bytes passed to kzalloc_node()
+                * above guarantee we can waste up to those bytes in order
+                * to align the map field regardless of its offset within
+                * the struct.
+                */
+               BUILD_BUG_ON(offsetof(struct nft_pipapo_scratch, map) > NFT_PIPAPO_ALIGN_HEADROOM);
+
+               scratch_aligned = NFT_PIPAPO_LT_ALIGN(&scratch->map);
+               scratch_aligned -= offsetof(struct nft_pipapo_scratch, map);
+               align_off = scratch_aligned - (void *)scratch;
+
+               scratch = scratch_aligned;
+               scratch->align_off = align_off;
 #endif
+               *per_cpu_ptr(clone->scratch, i) = scratch;
        }
 
        return 0;
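
The replacement for the separate scratch_aligned percpu array is a self-describing allocation: over-allocate by NFT_PIPAPO_ALIGN_HEADROOM, shift the struct so map[] lands on the alignment boundary, and record the shift in align_off so pipapo_free_scratch() can recover the original kmalloc address. A userspace model of the same trick (constants are illustrative):

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define ALIGN_BYTES 32
    #define HEADROOM    (ALIGN_BYTES - 1)

    struct scratch {
            uint8_t  map_index;
            uint32_t align_off;     /* shift back to the raw pointer */
            unsigned long map[];
    };

    static struct scratch *scratch_alloc(size_t map_longs)
    {
            void *mem = calloc(1, sizeof(struct scratch) +
                                  map_longs * sizeof(unsigned long) + HEADROOM);
            struct scratch *s;
            uintptr_t map_addr, aligned;

            if (!mem)
                    return NULL;
            /* Align &s->map, not the struct itself, then shift the
             * struct so the flexible array starts on the boundary.
             */
            map_addr = (uintptr_t)mem + offsetof(struct scratch, map);
            aligned  = (map_addr + ALIGN_BYTES - 1) & ~(uintptr_t)(ALIGN_BYTES - 1);
            s = (struct scratch *)(aligned - offsetof(struct scratch, map));
            s->align_off = (uint32_t)((uintptr_t)s - (uintptr_t)mem);
            return s;
    }

    static void scratch_free(struct scratch *s)
    {
            if (s)
                    free((uint8_t *)s - s->align_off);
    }

    int main(void)
    {
            struct scratch *s = scratch_alloc(8);

            if (!s)
                    return 1;
            printf("map aligned: %d\n",
                   ((uintptr_t)s->map % ALIGN_BYTES) == 0);
            scratch_free(s);
            return 0;
    }
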
@@ -1173,6 +1206,7 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
        struct nft_pipapo_match *m = priv->clone;
        u8 genmask = nft_genmask_next(net);
        struct nft_pipapo_elem *e, *dup;
+       u64 tstamp = nft_net_tstamp(net);
        struct nft_pipapo_field *f;
        const u8 *start_p, *end_p;
        int i, bsize_max, err = 0;
@@ -1182,7 +1216,7 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
        else
                end = start;
 
-       dup = pipapo_get(net, set, start, genmask);
+       dup = pipapo_get(net, set, start, genmask, tstamp);
        if (!IS_ERR(dup)) {
                /* Check if we already have the same exact entry */
                const struct nft_data *dup_key, *dup_end;
@@ -1204,7 +1238,7 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
 
        if (PTR_ERR(dup) == -ENOENT) {
                /* Look for partially overlapping entries */
-               dup = pipapo_get(net, set, end, nft_genmask_next(net));
+               dup = pipapo_get(net, set, end, nft_genmask_next(net), tstamp);
        }
 
        if (PTR_ERR(dup) != -ENOENT) {
@@ -1301,11 +1335,6 @@ static struct nft_pipapo_match *pipapo_clone(struct nft_pipapo_match *old)
        if (!new->scratch)
                goto out_scratch;
 
-#ifdef NFT_PIPAPO_ALIGN
-       new->scratch_aligned = alloc_percpu(*new->scratch_aligned);
-       if (!new->scratch_aligned)
-               goto out_scratch;
-#endif
        for_each_possible_cpu(i)
                *per_cpu_ptr(new->scratch, i) = NULL;
 
@@ -1357,10 +1386,7 @@ out_lt:
        }
 out_scratch_realloc:
        for_each_possible_cpu(i)
-               kfree(*per_cpu_ptr(new->scratch, i));
-#ifdef NFT_PIPAPO_ALIGN
-       free_percpu(new->scratch_aligned);
-#endif
+               pipapo_free_scratch(new, i);
 out_scratch:
        free_percpu(new->scratch);
        kfree(new);
@@ -1560,6 +1586,7 @@ static void pipapo_gc(struct nft_set *set, struct nft_pipapo_match *m)
 {
        struct nft_pipapo *priv = nft_set_priv(set);
        struct net *net = read_pnet(&set->net);
+       u64 tstamp = nft_net_tstamp(net);
        int rules_f0, first_rule = 0;
        struct nft_pipapo_elem *e;
        struct nft_trans_gc *gc;
@@ -1594,7 +1621,7 @@ static void pipapo_gc(struct nft_set *set, struct nft_pipapo_match *m)
                /* synchronous gc never fails, there is no need to set
                 * NFT_SET_ELEM_DEAD_BIT.
                 */
-               if (nft_set_elem_expired(&e->ext)) {
+               if (__nft_set_elem_expired(&e->ext, tstamp)) {
                        priv->dirty = true;
 
                        gc = nft_trans_gc_queue_sync(gc, GFP_KERNEL);
@@ -1640,13 +1667,9 @@ static void pipapo_free_match(struct nft_pipapo_match *m)
        int i;
 
        for_each_possible_cpu(i)
-               kfree(*per_cpu_ptr(m->scratch, i));
+               pipapo_free_scratch(m, i);
 
-#ifdef NFT_PIPAPO_ALIGN
-       free_percpu(m->scratch_aligned);
-#endif
        free_percpu(m->scratch);
-
        pipapo_free_fields(m);
 
        kfree(m);
@@ -1769,7 +1792,7 @@ static void *pipapo_deactivate(const struct net *net, const struct nft_set *set,
 {
        struct nft_pipapo_elem *e;
 
-       e = pipapo_get(net, set, data, nft_genmask_next(net));
+       e = pipapo_get(net, set, data, nft_genmask_next(net), nft_net_tstamp(net));
        if (IS_ERR(e))
                return NULL;
 
@@ -2132,7 +2155,7 @@ static int nft_pipapo_init(const struct nft_set *set,
        m->field_count = field_count;
        m->bsize_max = 0;
 
-       m->scratch = alloc_percpu(unsigned long *);
+       m->scratch = alloc_percpu(struct nft_pipapo_scratch *);
        if (!m->scratch) {
                err = -ENOMEM;
                goto out_scratch;
@@ -2140,16 +2163,6 @@ static int nft_pipapo_init(const struct nft_set *set,
        for_each_possible_cpu(i)
                *per_cpu_ptr(m->scratch, i) = NULL;
 
-#ifdef NFT_PIPAPO_ALIGN
-       m->scratch_aligned = alloc_percpu(unsigned long *);
-       if (!m->scratch_aligned) {
-               err = -ENOMEM;
-               goto out_free;
-       }
-       for_each_possible_cpu(i)
-               *per_cpu_ptr(m->scratch_aligned, i) = NULL;
-#endif
-
        rcu_head_init(&m->rcu);
 
        nft_pipapo_for_each_field(f, i, m) {
@@ -2180,9 +2193,6 @@ static int nft_pipapo_init(const struct nft_set *set,
        return 0;
 
 out_free:
-#ifdef NFT_PIPAPO_ALIGN
-       free_percpu(m->scratch_aligned);
-#endif
        free_percpu(m->scratch);
 out_scratch:
        kfree(m);
@@ -2236,11 +2246,8 @@ static void nft_pipapo_destroy(const struct nft_ctx *ctx,
 
                nft_set_pipapo_match_destroy(ctx, set, m);
 
-#ifdef NFT_PIPAPO_ALIGN
-               free_percpu(m->scratch_aligned);
-#endif
                for_each_possible_cpu(cpu)
-                       kfree(*per_cpu_ptr(m->scratch, cpu));
+                       pipapo_free_scratch(m, cpu);
                free_percpu(m->scratch);
                pipapo_free_fields(m);
                kfree(m);
@@ -2253,11 +2260,8 @@ static void nft_pipapo_destroy(const struct nft_ctx *ctx,
                if (priv->dirty)
                        nft_set_pipapo_match_destroy(ctx, set, m);
 
-#ifdef NFT_PIPAPO_ALIGN
-               free_percpu(priv->clone->scratch_aligned);
-#endif
                for_each_possible_cpu(cpu)
-                       kfree(*per_cpu_ptr(priv->clone->scratch, cpu));
+                       pipapo_free_scratch(priv->clone, cpu);
                free_percpu(priv->clone->scratch);
 
                pipapo_free_fields(priv->clone);
index 1040223da5fa3ab7bbfd4da4d348baee3d22a0d6..3842c7341a9f40a088d78532c4b610f3a99d7d23 100644 (file)
@@ -130,21 +130,29 @@ struct nft_pipapo_field {
        union nft_pipapo_map_bucket *mt;
 };
 
+/**
+ * struct nft_pipapo_scratch - percpu data used for lookup and matching
+ * @map_index: Current working bitmap index, toggled between field matches
+ * @align_off: Offset to get the originally allocated address
+ * @map:       storage for partial matching results during lookup
+ */
+struct nft_pipapo_scratch {
+       u8 map_index;
+       u32 align_off;
+       unsigned long map[];
+};
+
 /**
  * struct nft_pipapo_match - Data used for lookup and matching
- * @field_count                Amount of fields in set
+ * @field_count:       Amount of fields in set
  * @scratch:           Preallocated per-CPU maps for partial matching results
- * @scratch_aligned:   Version of @scratch aligned to NFT_PIPAPO_ALIGN bytes
  * @bsize_max:         Maximum lookup table bucket size of all fields, in longs
- * @rcu                        Matching data is swapped on commits
+ * @rcu:               Matching data is swapped on commits
  * @f:                 Fields, with lookup and mapping tables
  */
 struct nft_pipapo_match {
        int field_count;
-#ifdef NFT_PIPAPO_ALIGN
-       unsigned long * __percpu *scratch_aligned;
-#endif
-       unsigned long * __percpu *scratch;
+       struct nft_pipapo_scratch * __percpu *scratch;
        size_t bsize_max;
        struct rcu_head rcu;
        struct nft_pipapo_field f[] __counted_by(field_count);
index 52e0d026d30ad2c92f63f589727cdc0b39d7092b..a3a8ddca991894b28aa1a1cd7c84ba0380366b5f 100644 (file)
@@ -57,7 +57,7 @@
 
 /* Jump to label if @reg is zero */
 #define NFT_PIPAPO_AVX2_NOMATCH_GOTO(reg, label)                       \
-       asm_volatile_goto("vptest %%ymm" #reg ", %%ymm" #reg ";"        \
+       asm goto("vptest %%ymm" #reg ", %%ymm" #reg ";" \
                          "je %l[" #label "]" : : : : label)
 
 /* Store 256 bits from YMM register into memory. Contrary to bucket load
@@ -71,9 +71,6 @@
 #define NFT_PIPAPO_AVX2_ZERO(reg)                                      \
        asm volatile("vpxor %ymm" #reg ", %ymm" #reg ", %ymm" #reg)
 
-/* Current working bitmap index, toggled between field matches */
-static DEFINE_PER_CPU(bool, nft_pipapo_avx2_scratch_index);
-
 /**
  * nft_pipapo_avx2_prepare() - Prepare before main algorithm body
  *
@@ -1120,11 +1117,12 @@ bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
                            const u32 *key, const struct nft_set_ext **ext)
 {
        struct nft_pipapo *priv = nft_set_priv(set);
-       unsigned long *res, *fill, *scratch;
+       struct nft_pipapo_scratch *scratch;
        u8 genmask = nft_genmask_cur(net);
        const u8 *rp = (const u8 *)key;
        struct nft_pipapo_match *m;
        struct nft_pipapo_field *f;
+       unsigned long *res, *fill;
        bool map_index;
        int i, ret = 0;
 
@@ -1141,15 +1139,16 @@ bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
         */
        kernel_fpu_begin_mask(0);
 
-       scratch = *raw_cpu_ptr(m->scratch_aligned);
+       scratch = *raw_cpu_ptr(m->scratch);
        if (unlikely(!scratch)) {
                kernel_fpu_end();
                return false;
        }
-       map_index = raw_cpu_read(nft_pipapo_avx2_scratch_index);
 
-       res  = scratch + (map_index ? m->bsize_max : 0);
-       fill = scratch + (map_index ? 0 : m->bsize_max);
+       map_index = scratch->map_index;
+
+       res  = scratch->map + (map_index ? m->bsize_max : 0);
+       fill = scratch->map + (map_index ? 0 : m->bsize_max);
 
        /* Starting map doesn't need to be set for this implementation */
 
@@ -1221,7 +1220,7 @@ next_match:
 
 out:
        if (i % 2)
-               raw_cpu_write(nft_pipapo_avx2_scratch_index, !map_index);
+               scratch->map_index = !map_index;
        kernel_fpu_end();
 
        return ret >= 0;
index baa3fea4fe65c8f938e665a7fb6b0e4fc0f8f9ad..9944fe479e5361dc140f75be8b90bf3c5deb40f6 100644 (file)
@@ -234,7 +234,7 @@ static void nft_rbtree_gc_elem_remove(struct net *net, struct nft_set *set,
 
 static const struct nft_rbtree_elem *
 nft_rbtree_gc_elem(const struct nft_set *__set, struct nft_rbtree *priv,
-                  struct nft_rbtree_elem *rbe, u8 genmask)
+                  struct nft_rbtree_elem *rbe)
 {
        struct nft_set *set = (struct nft_set *)__set;
        struct rb_node *prev = rb_prev(&rbe->node);
@@ -253,7 +253,7 @@ nft_rbtree_gc_elem(const struct nft_set *__set, struct nft_rbtree *priv,
        while (prev) {
                rbe_prev = rb_entry(prev, struct nft_rbtree_elem, node);
                if (nft_rbtree_interval_end(rbe_prev) &&
-                   nft_set_elem_active(&rbe_prev->ext, genmask))
+                   nft_set_elem_active(&rbe_prev->ext, NFT_GENMASK_ANY))
                        break;
 
                prev = rb_prev(prev);
@@ -313,6 +313,7 @@ static int __nft_rbtree_insert(const struct net *net, const struct nft_set *set,
        struct nft_rbtree *priv = nft_set_priv(set);
        u8 cur_genmask = nft_genmask_cur(net);
        u8 genmask = nft_genmask_next(net);
+       u64 tstamp = nft_net_tstamp(net);
        int d;
 
        /* Descend the tree to search for an existing element greater than the
@@ -360,11 +361,11 @@ static int __nft_rbtree_insert(const struct net *net, const struct nft_set *set,
                /* perform garbage collection to avoid bogus overlap reports
                 * but skip new elements in this transaction.
                 */
-               if (nft_set_elem_expired(&rbe->ext) &&
+               if (__nft_set_elem_expired(&rbe->ext, tstamp) &&
                    nft_set_elem_active(&rbe->ext, cur_genmask)) {
                        const struct nft_rbtree_elem *removed_end;
 
-                       removed_end = nft_rbtree_gc_elem(set, priv, rbe, genmask);
+                       removed_end = nft_rbtree_gc_elem(set, priv, rbe);
                        if (IS_ERR(removed_end))
                                return PTR_ERR(removed_end);
 
@@ -551,6 +552,7 @@ nft_rbtree_deactivate(const struct net *net, const struct nft_set *set,
        const struct nft_rbtree *priv = nft_set_priv(set);
        const struct rb_node *parent = priv->root.rb_node;
        u8 genmask = nft_genmask_next(net);
+       u64 tstamp = nft_net_tstamp(net);
        int d;
 
        while (parent != NULL) {
@@ -571,7 +573,7 @@ nft_rbtree_deactivate(const struct net *net, const struct nft_set *set,
                                   nft_rbtree_interval_end(this)) {
                                parent = parent->rb_right;
                                continue;
-                       } else if (nft_set_elem_expired(&rbe->ext)) {
+                       } else if (__nft_set_elem_expired(&rbe->ext, tstamp)) {
                                break;
                        } else if (!nft_set_elem_active(&rbe->ext, genmask)) {
                                parent = parent->rb_left;
@@ -624,9 +626,10 @@ static void nft_rbtree_gc(struct nft_set *set)
 {
        struct nft_rbtree *priv = nft_set_priv(set);
        struct nft_rbtree_elem *rbe, *rbe_end = NULL;
+       struct net *net = read_pnet(&set->net);
+       u64 tstamp = nft_net_tstamp(net);
        struct rb_node *node, *next;
        struct nft_trans_gc *gc;
-       struct net *net;
 
        set  = nft_set_container_of(priv);
        net  = read_pnet(&set->net);
@@ -648,7 +651,7 @@ static void nft_rbtree_gc(struct nft_set *set)
                        rbe_end = rbe;
                        continue;
                }
-               if (!nft_set_elem_expired(&rbe->ext))
+               if (!__nft_set_elem_expired(&rbe->ext, tstamp))
                        continue;
 
                gc = nft_trans_gc_queue_sync(gc, GFP_KERNEL);
index 9ed85be79452d990ad79ad9a0b31a26bb3f4c6a4..f30163e2ca620783cceda339c702c9f81b29cfa2 100644 (file)
@@ -242,6 +242,11 @@ static int nft_socket_validate(const struct nft_ctx *ctx,
                               const struct nft_expr *expr,
                               const struct nft_data **data)
 {
+       if (ctx->family != NFPROTO_IPV4 &&
+           ctx->family != NFPROTO_IPV6 &&
+           ctx->family != NFPROTO_INET)
+               return -EOPNOTSUPP;
+
        return nft_chain_validate_hooks(ctx->chain,
                                        (1 << NF_INET_PRE_ROUTING) |
                                        (1 << NF_INET_LOCAL_IN) |
index 13da882669a4ee026d286a7903e0c974e60541ac..1d737f89dfc18ccdf816e00407bc9be70c13e8f2 100644 (file)
@@ -186,7 +186,6 @@ static int nft_synproxy_do_init(const struct nft_ctx *ctx,
                break;
 #endif
        case NFPROTO_INET:
-       case NFPROTO_BRIDGE:
                err = nf_synproxy_ipv4_init(snet, ctx->net);
                if (err)
                        goto nf_ct_failure;
@@ -219,7 +218,6 @@ static void nft_synproxy_do_destroy(const struct nft_ctx *ctx)
                break;
 #endif
        case NFPROTO_INET:
-       case NFPROTO_BRIDGE:
                nf_synproxy_ipv4_fini(snet, ctx->net);
                nf_synproxy_ipv6_fini(snet, ctx->net);
                break;
@@ -253,6 +251,11 @@ static int nft_synproxy_validate(const struct nft_ctx *ctx,
                                 const struct nft_expr *expr,
                                 const struct nft_data **data)
 {
+       if (ctx->family != NFPROTO_IPV4 &&
+           ctx->family != NFPROTO_IPV6 &&
+           ctx->family != NFPROTO_INET)
+               return -EOPNOTSUPP;
+
        return nft_chain_validate_hooks(ctx->chain, (1 << NF_INET_LOCAL_IN) |
                                                    (1 << NF_INET_FORWARD));
 }
index ae15cd693f0ec2857215c1daa7e633af222de423..71412adb73d414c43d2082362e854c3ad561d815 100644 (file)
@@ -316,6 +316,11 @@ static int nft_tproxy_validate(const struct nft_ctx *ctx,
                               const struct nft_expr *expr,
                               const struct nft_data **data)
 {
+       if (ctx->family != NFPROTO_IPV4 &&
+           ctx->family != NFPROTO_IPV6 &&
+           ctx->family != NFPROTO_INET)
+               return -EOPNOTSUPP;
+
        return nft_chain_validate_hooks(ctx->chain, 1 << NF_INET_PRE_ROUTING);
 }
 
index 9f21953c7433ff942caba909a8c8673baa3e003c..f735d79d8be5778a008485e893a2be78584318fe 100644 (file)
@@ -713,6 +713,7 @@ static const struct nft_object_ops nft_tunnel_obj_ops = {
 
 static struct nft_object_type nft_tunnel_obj_type __read_mostly = {
        .type           = NFT_OBJECT_TUNNEL,
+       .family         = NFPROTO_NETDEV,
        .ops            = &nft_tunnel_obj_ops,
        .maxattr        = NFTA_TUNNEL_KEY_MAX,
        .policy         = nft_tunnel_key_policy,
index 452f8587addadce5a2e1f480d5685eb70c5760b0..1c866757db55247b8e267fb038dd4e1fbd9681ea 100644 (file)
@@ -235,6 +235,11 @@ static int nft_xfrm_validate(const struct nft_ctx *ctx, const struct nft_expr *e
        const struct nft_xfrm *priv = nft_expr_priv(expr);
        unsigned int hooks;
 
+       if (ctx->family != NFPROTO_IPV4 &&
+           ctx->family != NFPROTO_IPV6 &&
+           ctx->family != NFPROTO_INET)
+               return -EOPNOTSUPP;
+
        switch (priv->dir) {
        case XFRM_POLICY_IN:
                hooks = (1 << NF_INET_FORWARD) |
index 4ed8ffd58ff375f3fa9f262e6f3b4d1a1aaf2731..ff315351269fe643073bb2984485b3a76566b1c8 100644 (file)
@@ -167,7 +167,7 @@ static inline u32 netlink_group_mask(u32 group)
 static struct sk_buff *netlink_to_full_skb(const struct sk_buff *skb,
                                           gfp_t gfp_mask)
 {
-       unsigned int len = skb_end_offset(skb);
+       unsigned int len = skb->len;
        struct sk_buff *new;
 
        new = alloc_skb(len, gfp_mask);
@@ -374,7 +374,7 @@ static void netlink_skb_destructor(struct sk_buff *skb)
        if (is_vmalloc_addr(skb->head)) {
                if (!skb->cloned ||
                    !atomic_dec_return(&(skb_shinfo(skb)->dataref)))
-                       vfree(skb->head);
+                       vfree_atomic(skb->head);
 
                skb->head = NULL;
        }
index 0eed00184adf454d2e06bb44330c079a402a959e..104a80b75477f60199c69811ff48e65af293297c 100644 (file)
@@ -453,16 +453,16 @@ static int nr_create(struct net *net, struct socket *sock, int protocol,
        nr_init_timers(sk);
 
        nr->t1     =
-               msecs_to_jiffies(sysctl_netrom_transport_timeout);
+               msecs_to_jiffies(READ_ONCE(sysctl_netrom_transport_timeout));
        nr->t2     =
-               msecs_to_jiffies(sysctl_netrom_transport_acknowledge_delay);
+               msecs_to_jiffies(READ_ONCE(sysctl_netrom_transport_acknowledge_delay));
        nr->n2     =
-               msecs_to_jiffies(sysctl_netrom_transport_maximum_tries);
+               msecs_to_jiffies(READ_ONCE(sysctl_netrom_transport_maximum_tries));
        nr->t4     =
-               msecs_to_jiffies(sysctl_netrom_transport_busy_delay);
+               msecs_to_jiffies(READ_ONCE(sysctl_netrom_transport_busy_delay));
        nr->idle   =
-               msecs_to_jiffies(sysctl_netrom_transport_no_activity_timeout);
-       nr->window = sysctl_netrom_transport_requested_window_size;
+               msecs_to_jiffies(READ_ONCE(sysctl_netrom_transport_no_activity_timeout));
+       nr->window = READ_ONCE(sysctl_netrom_transport_requested_window_size);
 
        nr->bpqext = 1;
        nr->state  = NR_STATE_0;
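These netrom sysctls are updated from /proc with no lock that the fast paths take, so every lockless read gains a READ_ONCE() annotation to avoid load tearing; the same conversion repeats through the remaining net/netrom hunks below. The idiom pairs with WRITE_ONCE() on the writer side (hypothetical variable name):

    #include <linux/compiler.h>

    static int example_timeout;     /* written via sysctl, read locklessly */

    /* sysctl (writer) side */
    static void example_set(int v)
    {
            WRITE_ONCE(example_timeout, v);
    }

    /* fast-path (reader) side, no lock held */
    static int example_get(void)
    {
            return READ_ONCE(example_timeout);
    }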
@@ -954,7 +954,7 @@ int nr_rx_frame(struct sk_buff *skb, struct net_device *dev)
                 * G8PZT's Xrouter which is sending packets with command type 7
                 * as an extension of the protocol.
                 */
-               if (sysctl_netrom_reset_circuit &&
+               if (READ_ONCE(sysctl_netrom_reset_circuit) &&
                    (frametype != NR_RESET || flags != 0))
                        nr_transmit_reset(skb, 1);
 
index 3aaac4a22b38763cd855e494b2c19fdf3bcbd54e..2c34389c3ce6f16acf669fa14c7b38a7d63dda4b 100644 (file)
@@ -81,7 +81,7 @@ static int nr_header(struct sk_buff *skb, struct net_device *dev,
        buff[6] |= AX25_SSSID_SPARE;
        buff    += AX25_ADDR_LEN;
 
-       *buff++ = sysctl_netrom_network_ttl_initialiser;
+       *buff++ = READ_ONCE(sysctl_netrom_network_ttl_initialiser);
 
        *buff++ = NR_PROTO_IP;
        *buff++ = NR_PROTO_IP;
index 2f084b6f69d7e05b51d1673157e8e72f3b9e2635..97944db6b5ac64387d8c8d53dc551bb11f1789f1 100644 (file)
@@ -97,7 +97,7 @@ static int nr_state1_machine(struct sock *sk, struct sk_buff *skb,
                break;
 
        case NR_RESET:
-               if (sysctl_netrom_reset_circuit)
+               if (READ_ONCE(sysctl_netrom_reset_circuit))
                        nr_disconnect(sk, ECONNRESET);
                break;
 
@@ -128,7 +128,7 @@ static int nr_state2_machine(struct sock *sk, struct sk_buff *skb,
                break;
 
        case NR_RESET:
-               if (sysctl_netrom_reset_circuit)
+               if (READ_ONCE(sysctl_netrom_reset_circuit))
                        nr_disconnect(sk, ECONNRESET);
                break;
 
@@ -262,7 +262,7 @@ static int nr_state3_machine(struct sock *sk, struct sk_buff *skb, int frametype
                break;
 
        case NR_RESET:
-               if (sysctl_netrom_reset_circuit)
+               if (READ_ONCE(sysctl_netrom_reset_circuit))
                        nr_disconnect(sk, ECONNRESET);
                break;
 
index 44929657f5b717de639c13334f17f42f652a78cc..5e531394a724b7f919f22ae2be42c8feaafdc22e 100644 (file)
@@ -204,7 +204,7 @@ void nr_transmit_buffer(struct sock *sk, struct sk_buff *skb)
        dptr[6] |= AX25_SSSID_SPARE;
        dptr += AX25_ADDR_LEN;
 
-       *dptr++ = sysctl_netrom_network_ttl_initialiser;
+       *dptr++ = READ_ONCE(sysctl_netrom_network_ttl_initialiser);
 
        if (!nr_route_frame(skb, NULL)) {
                kfree_skb(skb);
index baea3cbd76ca5bb5803974a75c3f10f87cc80140..70480869ad1c566a8ab8a28c0d39bdae056ec596 100644 (file)
@@ -153,7 +153,7 @@ static int __must_check nr_add_node(ax25_address *nr, const char *mnemonic,
                nr_neigh->digipeat = NULL;
                nr_neigh->ax25     = NULL;
                nr_neigh->dev      = dev;
-               nr_neigh->quality  = sysctl_netrom_default_path_quality;
+               nr_neigh->quality  = READ_ONCE(sysctl_netrom_default_path_quality);
                nr_neigh->locked   = 0;
                nr_neigh->count    = 0;
                nr_neigh->number   = nr_neigh_no++;
@@ -728,7 +728,7 @@ void nr_link_failed(ax25_cb *ax25, int reason)
        nr_neigh->ax25 = NULL;
        ax25_cb_put(ax25);
 
-       if (++nr_neigh->failed < sysctl_netrom_link_fails_count) {
+       if (++nr_neigh->failed < READ_ONCE(sysctl_netrom_link_fails_count)) {
                nr_neigh_put(nr_neigh);
                return;
        }
@@ -766,7 +766,7 @@ int nr_route_frame(struct sk_buff *skb, ax25_cb *ax25)
        if (ax25 != NULL) {
                ret = nr_add_node(nr_src, "", &ax25->dest_addr, ax25->digipeat,
                                  ax25->ax25_dev->dev, 0,
-                                 sysctl_netrom_obsolescence_count_initialiser);
+                                 READ_ONCE(sysctl_netrom_obsolescence_count_initialiser));
                if (ret)
                        return ret;
        }
@@ -780,7 +780,7 @@ int nr_route_frame(struct sk_buff *skb, ax25_cb *ax25)
                return ret;
        }
 
-       if (!sysctl_netrom_routing_control && ax25 != NULL)
+       if (!READ_ONCE(sysctl_netrom_routing_control) && ax25 != NULL)
                return 0;
 
        /* Its Time-To-Live has expired */
index e2d2af924cff4a4103e59e04a6efe69c6fcca23e..c3bbd5880850bb047c7375e2c1b8acf6e80e0231 100644 (file)
@@ -182,7 +182,8 @@ void nr_write_internal(struct sock *sk, int frametype)
                *dptr++ = nr->my_id;
                *dptr++ = frametype;
                *dptr++ = nr->window;
-               if (nr->bpqext) *dptr++ = sysctl_netrom_network_ttl_initialiser;
+               if (nr->bpqext)
+                       *dptr++ = READ_ONCE(sysctl_netrom_network_ttl_initialiser);
                break;
 
        case NR_DISCREQ:
@@ -236,7 +237,7 @@ void __nr_transmit_reply(struct sk_buff *skb, int mine, unsigned char cmdflags)
        dptr[6] |= AX25_SSSID_SPARE;
        dptr += AX25_ADDR_LEN;
 
-       *dptr++ = sysctl_netrom_network_ttl_initialiser;
+       *dptr++ = READ_ONCE(sysctl_netrom_network_ttl_initialiser);
 
        if (mine) {
                *dptr++ = 0;
index 97348cedb16b30d9a60cb8096a8408f6a8890e6d..cdad47b140fa4bd54ac0571457ab16ab505a3a11 100644 (file)
@@ -1208,6 +1208,10 @@ void nci_free_device(struct nci_dev *ndev)
 {
        nfc_free_device(ndev->nfc_dev);
        nci_hci_deallocate(ndev);
+
+       /* drop partial rx data packet if present */
+       if (ndev->rx_data_reassembly)
+               kfree_skb(ndev->rx_data_reassembly);
        kfree(ndev);
 }
 EXPORT_SYMBOL(nci_free_device);
index 88965e2068ac655317169256486613aadf471580..ebc5728aab4eaf0bc165cbdb03e26fc852af9a3e 100644 (file)
@@ -48,6 +48,7 @@ struct ovs_len_tbl {
 
 #define OVS_ATTR_NESTED -1
 #define OVS_ATTR_VARIABLE -2
+#define OVS_COPY_ACTIONS_MAX_DEPTH 16
 
 static bool actions_may_change_flow(const struct nlattr *actions)
 {
@@ -2545,13 +2546,15 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
                                  const struct sw_flow_key *key,
                                  struct sw_flow_actions **sfa,
                                  __be16 eth_type, __be16 vlan_tci,
-                                 u32 mpls_label_count, bool log);
+                                 u32 mpls_label_count, bool log,
+                                 u32 depth);
 
 static int validate_and_copy_sample(struct net *net, const struct nlattr *attr,
                                    const struct sw_flow_key *key,
                                    struct sw_flow_actions **sfa,
                                    __be16 eth_type, __be16 vlan_tci,
-                                   u32 mpls_label_count, bool log, bool last)
+                                   u32 mpls_label_count, bool log, bool last,
+                                   u32 depth)
 {
        const struct nlattr *attrs[OVS_SAMPLE_ATTR_MAX + 1];
        const struct nlattr *probability, *actions;
@@ -2602,7 +2605,8 @@ static int validate_and_copy_sample(struct net *net, const struct nlattr *attr,
                return err;
 
        err = __ovs_nla_copy_actions(net, actions, key, sfa,
-                                    eth_type, vlan_tci, mpls_label_count, log);
+                                    eth_type, vlan_tci, mpls_label_count, log,
+                                    depth + 1);
 
        if (err)
                return err;
@@ -2617,7 +2621,8 @@ static int validate_and_copy_dec_ttl(struct net *net,
                                     const struct sw_flow_key *key,
                                     struct sw_flow_actions **sfa,
                                     __be16 eth_type, __be16 vlan_tci,
-                                    u32 mpls_label_count, bool log)
+                                    u32 mpls_label_count, bool log,
+                                    u32 depth)
 {
        const struct nlattr *attrs[OVS_DEC_TTL_ATTR_MAX + 1];
        int start, action_start, err, rem;
@@ -2660,7 +2665,8 @@ static int validate_and_copy_dec_ttl(struct net *net,
                return action_start;
 
        err = __ovs_nla_copy_actions(net, actions, key, sfa, eth_type,
-                                    vlan_tci, mpls_label_count, log);
+                                    vlan_tci, mpls_label_count, log,
+                                    depth + 1);
        if (err)
                return err;
 
@@ -2674,7 +2680,8 @@ static int validate_and_copy_clone(struct net *net,
                                   const struct sw_flow_key *key,
                                   struct sw_flow_actions **sfa,
                                   __be16 eth_type, __be16 vlan_tci,
-                                  u32 mpls_label_count, bool log, bool last)
+                                  u32 mpls_label_count, bool log, bool last,
+                                  u32 depth)
 {
        int start, err;
        u32 exec;
@@ -2694,7 +2701,8 @@ static int validate_and_copy_clone(struct net *net,
                return err;
 
        err = __ovs_nla_copy_actions(net, attr, key, sfa,
-                                    eth_type, vlan_tci, mpls_label_count, log);
+                                    eth_type, vlan_tci, mpls_label_count, log,
+                                    depth + 1);
        if (err)
                return err;
 
@@ -3063,7 +3071,7 @@ static int validate_and_copy_check_pkt_len(struct net *net,
                                           struct sw_flow_actions **sfa,
                                           __be16 eth_type, __be16 vlan_tci,
                                           u32 mpls_label_count,
-                                          bool log, bool last)
+                                          bool log, bool last, u32 depth)
 {
        const struct nlattr *acts_if_greater, *acts_if_lesser_eq;
        struct nlattr *a[OVS_CHECK_PKT_LEN_ATTR_MAX + 1];
@@ -3111,7 +3119,8 @@ static int validate_and_copy_check_pkt_len(struct net *net,
                return nested_acts_start;
 
        err = __ovs_nla_copy_actions(net, acts_if_lesser_eq, key, sfa,
-                                    eth_type, vlan_tci, mpls_label_count, log);
+                                    eth_type, vlan_tci, mpls_label_count, log,
+                                    depth + 1);
 
        if (err)
                return err;
@@ -3124,7 +3133,8 @@ static int validate_and_copy_check_pkt_len(struct net *net,
                return nested_acts_start;
 
        err = __ovs_nla_copy_actions(net, acts_if_greater, key, sfa,
-                                    eth_type, vlan_tci, mpls_label_count, log);
+                                    eth_type, vlan_tci, mpls_label_count, log,
+                                    depth + 1);
 
        if (err)
                return err;
@@ -3152,12 +3162,16 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
                                  const struct sw_flow_key *key,
                                  struct sw_flow_actions **sfa,
                                  __be16 eth_type, __be16 vlan_tci,
-                                 u32 mpls_label_count, bool log)
+                                 u32 mpls_label_count, bool log,
+                                 u32 depth)
 {
        u8 mac_proto = ovs_key_mac_proto(key);
        const struct nlattr *a;
        int rem, err;
 
+       if (depth > OVS_COPY_ACTIONS_MAX_DEPTH)
+               return -EOVERFLOW;
+
        nla_for_each_nested(a, attr, rem) {
                /* Expected argument lengths, (u32)-1 for variable length. */
                static const u32 action_lens[OVS_ACTION_ATTR_MAX + 1] = {
@@ -3355,7 +3369,7 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
                        err = validate_and_copy_sample(net, a, key, sfa,
                                                       eth_type, vlan_tci,
                                                       mpls_label_count,
-                                                      log, last);
+                                                      log, last, depth);
                        if (err)
                                return err;
                        skip_copy = true;
@@ -3426,7 +3440,7 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
                        err = validate_and_copy_clone(net, a, key, sfa,
                                                      eth_type, vlan_tci,
                                                      mpls_label_count,
-                                                     log, last);
+                                                     log, last, depth);
                        if (err)
                                return err;
                        skip_copy = true;
@@ -3440,7 +3454,8 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
                                                              eth_type,
                                                              vlan_tci,
                                                              mpls_label_count,
-                                                             log, last);
+                                                             log, last,
+                                                             depth);
                        if (err)
                                return err;
                        skip_copy = true;
@@ -3450,7 +3465,8 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
                case OVS_ACTION_ATTR_DEC_TTL:
                        err = validate_and_copy_dec_ttl(net, a, key, sfa,
                                                        eth_type, vlan_tci,
-                                                       mpls_label_count, log);
+                                                       mpls_label_count, log,
+                                                       depth);
                        if (err)
                                return err;
                        skip_copy = true;
@@ -3495,7 +3511,8 @@ int ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 
        (*sfa)->orig_len = nla_len(attr);
        err = __ovs_nla_copy_actions(net, attr, key, sfa, key->eth.type,
-                                    key->eth.vlan.tci, mpls_label_count, log);
+                                    key->eth.vlan.tci, mpls_label_count, log,
+                                    0);
        if (err)
                ovs_nla_free_flow_actions(*sfa);
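__ovs_nla_copy_actions() recurses through nested actions (sample, clone, dec_ttl, check_pkt_len), so the series threads a depth counter through every call site and fails with -EOVERFLOW beyond OVS_COPY_ACTIONS_MAX_DEPTH (16), bounding kernel stack use against maliciously nested netlink attributes. The guard, reduced to a standalone sketch:

    #include <errno.h>

    #define MAX_DEPTH 16

    /* Every self-call passes depth + 1; the check at the top bounds the
     * recursion regardless of how deeply the input nests. */
    static int copy_actions(const void *attr, unsigned int depth)
    {
            if (depth > MAX_DEPTH)
                    return -EOVERFLOW;
            /* ... for each nested action: copy_actions(nested, depth + 1); */
            return 0;
    }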
 
index 3aa50dc7535b7761c77652d2f38826419b57c26a..976fe250b50955ec51b0c5d73f2dfa132990b60b 100644 (file)
@@ -34,10 +34,10 @@ static int pn_ioctl(struct sock *sk, int cmd, int *karg)
 
        switch (cmd) {
        case SIOCINQ:
-               lock_sock(sk);
+               spin_lock_bh(&sk->sk_receive_queue.lock);
                skb = skb_peek(&sk->sk_receive_queue);
                *karg = skb ? skb->len : 0;
-               release_sock(sk);
+               spin_unlock_bh(&sk->sk_receive_queue.lock);
                return 0;
 
        case SIOCPNADDRESOURCE:
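Peeking the head of the receive queue needs only the queue's own spinlock: recvmsg() dequeues under sk_receive_queue.lock, not the socket lock, so lock_sock() never excluded a concurrent dequeue, while spin_lock_bh() makes the peek and length read atomic against it. The idiom as a hedged sketch (the helper name is illustrative):

    #include <linux/skbuff.h>
    #include <net/sock.h>

    /* Hypothetical helper: byte count of the first queued packet. */
    static unsigned int first_packet_len(struct sock *sk)
    {
            struct sk_buff *skb;
            unsigned int len = 0;

            spin_lock_bh(&sk->sk_receive_queue.lock);
            skb = skb_peek(&sk->sk_receive_queue);
            if (skb)
                    len = skb->len; /* skb cannot be freed while the lock is held */
            spin_unlock_bh(&sk->sk_receive_queue.lock);
            return len;
    }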
index faba31f2eff2903bee7082b295f137ff848a1e10..3dd5f52bc1b58e3f1ee4e235126438c723f1f73c 100644 (file)
@@ -917,6 +917,37 @@ static int pep_sock_enable(struct sock *sk, struct sockaddr *addr, int len)
        return 0;
 }
 
+static unsigned int pep_first_packet_length(struct sock *sk)
+{
+       struct pep_sock *pn = pep_sk(sk);
+       struct sk_buff_head *q;
+       struct sk_buff *skb;
+       unsigned int len = 0;
+       bool found = false;
+
+       if (sock_flag(sk, SOCK_URGINLINE)) {
+               q = &pn->ctrlreq_queue;
+               spin_lock_bh(&q->lock);
+               skb = skb_peek(q);
+               if (skb) {
+                       len = skb->len;
+                       found = true;
+               }
+               spin_unlock_bh(&q->lock);
+       }
+
+       if (likely(!found)) {
+               q = &sk->sk_receive_queue;
+               spin_lock_bh(&q->lock);
+               skb = skb_peek(q);
+               if (skb)
+                       len = skb->len;
+               spin_unlock_bh(&q->lock);
+       }
+
+       return len;
+}
+
 static int pep_ioctl(struct sock *sk, int cmd, int *karg)
 {
        struct pep_sock *pn = pep_sk(sk);
@@ -929,15 +960,7 @@ static int pep_ioctl(struct sock *sk, int cmd, int *karg)
                        break;
                }
 
-               lock_sock(sk);
-               if (sock_flag(sk, SOCK_URGINLINE) &&
-                   !skb_queue_empty(&pn->ctrlreq_queue))
-                       *karg = skb_peek(&pn->ctrlreq_queue)->len;
-               else if (!skb_queue_empty(&sk->sk_receive_queue))
-                       *karg = skb_peek(&sk->sk_receive_queue)->len;
-               else
-                       *karg = 0;
-               release_sock(sk);
+               *karg = pep_first_packet_length(sk);
                ret = 0;
                break;
 
index 01c4cdfef45df32ad0b0b942e416d6bc267687e1..8435a20968ef5112d44164ecbf89071f7ee4b855 100644 (file)
@@ -419,7 +419,7 @@ static int rds_recv_track_latency(struct rds_sock *rs, sockptr_t optval,
 
        rs->rs_rx_traces = trace.rx_traces;
        for (i = 0; i < rs->rs_rx_traces; i++) {
-               if (trace.rx_trace_pos[i] > RDS_MSG_RX_DGRAM_TRACE_MAX) {
+               if (trace.rx_trace_pos[i] >= RDS_MSG_RX_DGRAM_TRACE_MAX) {
                        rs->rs_rx_traces = 0;
                        return -EFAULT;
                }
index fba82d36593add3317e89104aabbd69a0281cb33..a4e3c5de998be4c756cb0dc423ee9a7e7fa3e1a9 100644 (file)
@@ -301,6 +301,9 @@ static int __rds_rdma_map(struct rds_sock *rs, struct rds_get_mr_args *args,
                        kfree(sg);
                }
                ret = PTR_ERR(trans_private);
+               /* Trigger connection so that it's ready for the next retry */
+               /* Trigger connection so that it's ready for the next retry */
+               if (ret == -ENODEV)
+                       rds_conn_connect_if_down(cp->cp_conn);
                goto out;
        }
 
index c71b923764fd7cd7268953b968c5f5749a0b98a6..5627f80013f8b17d3de6284784fe3cbb02bba754 100644 (file)
@@ -425,6 +425,7 @@ static int rds_still_queued(struct rds_sock *rs, struct rds_incoming *inc,
        struct sock *sk = rds_rs_to_sk(rs);
        int ret = 0;
        unsigned long flags;
+       struct rds_incoming *to_drop = NULL;
 
        write_lock_irqsave(&rs->rs_recv_lock, flags);
        if (!list_empty(&inc->i_item)) {
@@ -435,11 +436,14 @@ static int rds_still_queued(struct rds_sock *rs, struct rds_incoming *inc,
                                              -be32_to_cpu(inc->i_hdr.h_len),
                                              inc->i_hdr.h_dport);
                        list_del_init(&inc->i_item);
-                       rds_inc_put(inc);
+                       to_drop = inc;
                }
        }
        write_unlock_irqrestore(&rs->rs_recv_lock, flags);
 
+       if (to_drop)
+               rds_inc_put(to_drop);
+
        rdsdebug("inc %p rs %p still %d dropped %d\n", inc, rs, ret, drop);
        return ret;
 }
@@ -758,16 +762,21 @@ void rds_clear_recv_queue(struct rds_sock *rs)
        struct sock *sk = rds_rs_to_sk(rs);
        struct rds_incoming *inc, *tmp;
        unsigned long flags;
+       LIST_HEAD(to_drop);
 
        write_lock_irqsave(&rs->rs_recv_lock, flags);
        list_for_each_entry_safe(inc, tmp, &rs->rs_recv_queue, i_item) {
                rds_recv_rcvbuf_delta(rs, sk, inc->i_conn->c_lcong,
                                      -be32_to_cpu(inc->i_hdr.h_len),
                                      inc->i_hdr.h_dport);
+               list_move(&inc->i_item, &to_drop);
+       }
+       write_unlock_irqrestore(&rs->rs_recv_lock, flags);
+
+       list_for_each_entry_safe(inc, tmp, &to_drop, i_item) {
                list_del_init(&inc->i_item);
                rds_inc_put(inc);
        }
-       write_unlock_irqrestore(&rs->rs_recv_lock, flags);
 }
 
 /*
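Both rds_recv hunks above make the same move: dropping the last reference in rds_inc_put() can reach the transport's free path, which is not safe inside the IRQ-disabled rs_recv_lock write lock, so entries are detached onto a local list (or pointer) under the lock and released only after it is dropped. The general shape, sketched:

    #include <linux/list.h>
    #include <linux/spinlock.h>

    /* Sketch: detach under the lock, release after dropping it. */
    static void clear_queue(spinlock_t *lock, struct list_head *queue)
    {
            LIST_HEAD(to_drop);

            spin_lock(lock);
            list_splice_init(queue, &to_drop);      /* detach everything */
            spin_unlock(lock);

            /* free/put each entry on to_drop here, with no lock held */
    }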
index 5e57a1581dc60571406e2faeeeba63af1e8aa29c..2899def23865fa47ce55faa7c7eb72fb52bc432b 100644 (file)
@@ -1313,12 +1313,8 @@ int rds_sendmsg(struct socket *sock, struct msghdr *msg, size_t payload_len)
 
        /* Parse any control messages the user may have included. */
        ret = rds_cmsg_send(rs, rm, msg, &allocated_mr, &vct);
-       if (ret) {
-               /* Trigger connection so that its ready for the next retry */
-               if (ret ==  -EAGAIN)
-                       rds_conn_connect_if_down(conn);
+       if (ret)
                goto out;
-       }
 
        if (rm->rdma.op_active && !conn->c_trans->xmit_rdma) {
                printk_ratelimited(KERN_NOTICE "rdma_op %p conn xmit_rdma %p\n",
index dbeb75c298573adc580568744d6781a5c6193b0d..7818aae1be8e00c1e9b15c918868ca11b40a7213 100644 (file)
@@ -199,11 +199,19 @@ struct rxrpc_host_header {
  */
 struct rxrpc_skb_priv {
        struct rxrpc_connection *conn;  /* Connection referred to (poke packet) */
-       u16             offset;         /* Offset of data */
-       u16             len;            /* Length of data */
-       u8              flags;
+       union {
+               struct {
+                       u16             offset;         /* Offset of data */
+                       u16             len;            /* Length of data */
+                       u8              flags;
 #define RXRPC_RX_VERIFIED      0x01
-
+               };
+               struct {
+                       rxrpc_seq_t     first_ack;      /* First packet in acks table */
+                       u8              nr_acks;        /* Number of acks+nacks */
+                       u8              nr_nacks;       /* Number of nacks */
+               };
+       };
        struct rxrpc_host_header hdr;   /* RxRPC packet header from this packet */
 };
 
@@ -510,7 +518,7 @@ struct rxrpc_connection {
        enum rxrpc_call_completion completion;  /* Completion condition */
        s32                     abort_code;     /* Abort code of connection abort */
        int                     debug_id;       /* debug ID for printks */
-       atomic_t                serial;         /* packet serial number counter */
+       rxrpc_serial_t          tx_serial;      /* Outgoing packet serial number counter */
        unsigned int            hi_serial;      /* highest serial number received */
        u32                     service_id;     /* Service ID, possibly upgraded */
        u32                     security_level; /* Security level selected */
@@ -692,11 +700,11 @@ struct rxrpc_call {
        u8                      cong_dup_acks;  /* Count of ACKs showing missing packets */
        u8                      cong_cumul_acks; /* Cumulative ACK count */
        ktime_t                 cong_tstamp;    /* Last time cwnd was changed */
+       struct sk_buff          *cong_last_nack; /* Last ACK with nacks received */
 
        /* Receive-phase ACK management (ACKs we send). */
        u8                      ackr_reason;    /* reason to ACK */
        u16                     ackr_sack_base; /* Starting slot in SACK table ring */
-       rxrpc_serial_t          ackr_serial;    /* serial of packet being ACK'd */
        rxrpc_seq_t             ackr_window;    /* Base of SACK window */
        rxrpc_seq_t             ackr_wtop;      /* Base of SACK window */
        unsigned int            ackr_nr_unacked; /* Number of unacked packets */
@@ -730,7 +738,8 @@ struct rxrpc_call {
 struct rxrpc_ack_summary {
        u16                     nr_acks;                /* Number of ACKs in packet */
        u16                     nr_new_acks;            /* Number of new ACKs in packet */
-       u16                     nr_rot_new_acks;        /* Number of rotated new ACKs */
+       u16                     nr_new_nacks;           /* Number of new nacks in packet */
+       u16                     nr_retained_nacks;      /* Number of nacks retained between ACKs */
        u8                      ack_reason;
        bool                    saw_nacks;              /* Saw NACKs in packet */
        bool                    new_low_nack;           /* T if new low NACK found */
@@ -822,6 +831,20 @@ static inline bool rxrpc_sending_to_client(const struct rxrpc_txbuf *txb)
 
 #include <trace/events/rxrpc.h>
 
+/*
+ * Allocate the next serial number on a connection.  0 must be skipped.
+ */
+static inline rxrpc_serial_t rxrpc_get_next_serial(struct rxrpc_connection *conn)
+{
+       rxrpc_serial_t serial;
+
+       serial = conn->tx_serial;
+       if (serial == 0)
+               serial = 1;
+       conn->tx_serial = serial + 1;
+       return serial;
+}
+
 /*
  * af_rxrpc.c
  */
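A serial of 0 is treated as 'no serial referenced' (see the new acked_serial check in the ACK input path below), so the allocator skips it on 32-bit wrap-around; the callers appear to serialise access per connection, which allows the old atomic_t counter to become a plain field. The wrap behaviour, checked in a tiny user-space model:

    #include <assert.h>
    #include <stdint.h>

    static uint32_t tx_serial;              /* models conn->tx_serial */

    static uint32_t get_next_serial(void)
    {
            uint32_t serial = tx_serial;

            if (serial == 0)                /* 0 is reserved, skip it */
                    serial = 1;
            tx_serial = serial + 1;
            return serial;
    }

    int main(void)
    {
            tx_serial = UINT32_MAX;
            assert(get_next_serial() == UINT32_MAX); /* last pre-wrap value */
            assert(get_next_serial() == 1);          /* wrapped past 0 */
            return 0;
    }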
index e363f21a20141bb13c931fc0cd40c862c49e5829..0f78544d043be9327ea13cc91fbfa532d6ef4002 100644 (file)
@@ -43,8 +43,6 @@ void rxrpc_propose_delay_ACK(struct rxrpc_call *call, rxrpc_serial_t serial,
        unsigned long expiry = rxrpc_soft_ack_delay;
        unsigned long now = jiffies, ack_at;
 
-       call->ackr_serial = serial;
-
        if (rxrpc_soft_ack_delay < expiry)
                expiry = rxrpc_soft_ack_delay;
        if (call->peer->srtt_us != 0)
@@ -114,6 +112,7 @@ static void rxrpc_congestion_timeout(struct rxrpc_call *call)
 void rxrpc_resend(struct rxrpc_call *call, struct sk_buff *ack_skb)
 {
        struct rxrpc_ackpacket *ack = NULL;
+       struct rxrpc_skb_priv *sp;
        struct rxrpc_txbuf *txb;
        unsigned long resend_at;
        rxrpc_seq_t transmitted = READ_ONCE(call->tx_transmitted);
@@ -141,14 +140,15 @@ void rxrpc_resend(struct rxrpc_call *call, struct sk_buff *ack_skb)
         * explicitly NAK'd packets.
         */
        if (ack_skb) {
+               sp = rxrpc_skb(ack_skb);
                ack = (void *)ack_skb->data + sizeof(struct rxrpc_wire_header);
 
-               for (i = 0; i < ack->nAcks; i++) {
+               for (i = 0; i < sp->nr_acks; i++) {
                        rxrpc_seq_t seq;
 
                        if (ack->acks[i] & 1)
                                continue;
-                       seq = ntohl(ack->firstPacket) + i;
+                       seq = sp->first_ack + i;
                        if (after(txb->seq, transmitted))
                                break;
                        if (after(txb->seq, seq))
@@ -373,7 +373,6 @@ static void rxrpc_send_initial_ping(struct rxrpc_call *call)
 bool rxrpc_input_call_event(struct rxrpc_call *call, struct sk_buff *skb)
 {
        unsigned long now, next, t;
-       rxrpc_serial_t ackr_serial;
        bool resend = false, expired = false;
        s32 abort_code;
 
@@ -423,8 +422,7 @@ bool rxrpc_input_call_event(struct rxrpc_call *call, struct sk_buff *skb)
        if (time_after_eq(now, t)) {
                trace_rxrpc_timer(call, rxrpc_timer_exp_ack, now);
                cmpxchg(&call->delay_ack_at, t, now + MAX_JIFFY_OFFSET);
-               ackr_serial = xchg(&call->ackr_serial, 0);
-               rxrpc_send_ACK(call, RXRPC_ACK_DELAY, ackr_serial,
+               rxrpc_send_ACK(call, RXRPC_ACK_DELAY, 0,
                               rxrpc_propose_ack_ping_for_lost_ack);
        }
 
index 0943e54370ba0e71bcfa6d2238704b0b41c49ee9..9fc9a6c3f685868fe69842d5ac133a1b898674eb 100644 (file)
@@ -686,6 +686,7 @@ static void rxrpc_destroy_call(struct work_struct *work)
 
        del_timer_sync(&call->timer);
 
+       rxrpc_free_skb(call->cong_last_nack, rxrpc_skb_put_last_nack);
        rxrpc_cleanup_ring(call);
        while ((txb = list_first_entry_or_null(&call->tx_sendmsg,
                                               struct rxrpc_txbuf, call_link))) {
index 95f4bc206b3dc9a571abe6fb63cc6fe05575e9c9..1f251d758cb9d8be81856187d78e1994ef179072 100644 (file)
@@ -95,6 +95,14 @@ void rxrpc_conn_retransmit_call(struct rxrpc_connection *conn,
 
        _enter("%d", conn->debug_id);
 
+       if (sp && sp->hdr.type == RXRPC_PACKET_TYPE_ACK) {
+               if (skb_copy_bits(skb, sizeof(struct rxrpc_wire_header),
+                                 &pkt.ack, sizeof(pkt.ack)) < 0)
+                       return;
+               if (pkt.ack.reason == RXRPC_ACK_PING_RESPONSE)
+                       return;
+       }
+
        chan = &conn->channels[channel];
 
        /* If the last call got moved on whilst we were waiting to run, just
@@ -117,7 +125,7 @@ void rxrpc_conn_retransmit_call(struct rxrpc_connection *conn,
        iov[2].iov_base = &ack_info;
        iov[2].iov_len  = sizeof(ack_info);
 
-       serial = atomic_inc_return(&conn->serial);
+       serial = rxrpc_get_next_serial(conn);
 
        pkt.whdr.epoch          = htonl(conn->proto.epoch);
        pkt.whdr.cid            = htonl(conn->proto.cid | channel);
index 92495e73b8699185cf76c60aa88f62d77a29dd56..9691de00ade7522d36174bbe1ab9098c1b52b145 100644 (file)
@@ -45,11 +45,9 @@ static void rxrpc_congestion_management(struct rxrpc_call *call,
        }
 
        cumulative_acks += summary->nr_new_acks;
-       cumulative_acks += summary->nr_rot_new_acks;
        if (cumulative_acks > 255)
                cumulative_acks = 255;
 
-       summary->mode = call->cong_mode;
        summary->cwnd = call->cong_cwnd;
        summary->ssthresh = call->cong_ssthresh;
        summary->cumulative_acks = cumulative_acks;
@@ -151,6 +149,7 @@ out_no_clear_ca:
                cwnd = RXRPC_TX_MAX_WINDOW;
        call->cong_cwnd = cwnd;
        call->cong_cumul_acks = cumulative_acks;
+       summary->mode = call->cong_mode;
        trace_rxrpc_congest(call, summary, acked_serial, change);
        if (resend)
                rxrpc_resend(call, skb);
@@ -213,7 +212,6 @@ static bool rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to,
        list_for_each_entry_rcu(txb, &call->tx_buffer, call_link, false) {
                if (before_eq(txb->seq, call->acks_hard_ack))
                        continue;
-               summary->nr_rot_new_acks++;
                if (test_bit(RXRPC_TXBUF_LAST, &txb->flags)) {
                        set_bit(RXRPC_CALL_TX_LAST, &call->flags);
                        rot_last = true;
@@ -254,6 +252,11 @@ static void rxrpc_end_tx_phase(struct rxrpc_call *call, bool reply_begun,
 {
        ASSERT(test_bit(RXRPC_CALL_TX_LAST, &call->flags));
 
+       if (unlikely(call->cong_last_nack)) {
+               rxrpc_free_skb(call->cong_last_nack, rxrpc_skb_put_last_nack);
+               call->cong_last_nack = NULL;
+       }
+
        switch (__rxrpc_call_state(call)) {
        case RXRPC_CALL_CLIENT_SEND_REQUEST:
        case RXRPC_CALL_CLIENT_AWAIT_REPLY:
@@ -702,6 +705,43 @@ static void rxrpc_input_ackinfo(struct rxrpc_call *call, struct sk_buff *skb,
                wake_up(&call->waitq);
 }
 
+/*
+ * Determine how many nacks from the previous ACK have now been satisfied.
+ */
+static rxrpc_seq_t rxrpc_input_check_prev_ack(struct rxrpc_call *call,
+                                             struct rxrpc_ack_summary *summary,
+                                             rxrpc_seq_t seq)
+{
+       struct sk_buff *skb = call->cong_last_nack;
+       struct rxrpc_ackpacket ack;
+       struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
+       unsigned int i, new_acks = 0, retained_nacks = 0;
+       rxrpc_seq_t old_seq = sp->first_ack;
+       u8 *acks = skb->data + sizeof(struct rxrpc_wire_header) + sizeof(ack);
+
+       if (after_eq(seq, old_seq + sp->nr_acks)) {
+               summary->nr_new_acks += sp->nr_nacks;
+               summary->nr_new_acks += seq - (old_seq + sp->nr_acks);
+               summary->nr_retained_nacks = 0;
+       } else if (seq == old_seq) {
+               summary->nr_retained_nacks = sp->nr_nacks;
+       } else {
+               for (i = 0; i < sp->nr_acks; i++) {
+                       if (acks[i] == RXRPC_ACK_TYPE_NACK) {
+                               if (before(old_seq + i, seq))
+                                       new_acks++;
+                               else
+                                       retained_nacks++;
+                       }
+               }
+
+               summary->nr_new_acks += new_acks;
+               summary->nr_retained_nacks = retained_nacks;
+       }
+
+       return old_seq + sp->nr_acks;
+}
+
 /*
  * Process individual soft ACKs.
  *
@@ -711,25 +751,51 @@ static void rxrpc_input_ackinfo(struct rxrpc_call *call, struct sk_buff *skb,
  * the timer on the basis that the peer might just not have processed them at
  * the time the ACK was sent.
  */
-static void rxrpc_input_soft_acks(struct rxrpc_call *call, u8 *acks,
-                                 rxrpc_seq_t seq, int nr_acks,
-                                 struct rxrpc_ack_summary *summary)
+static void rxrpc_input_soft_acks(struct rxrpc_call *call,
+                                 struct rxrpc_ack_summary *summary,
+                                 struct sk_buff *skb,
+                                 rxrpc_seq_t seq,
+                                 rxrpc_seq_t since)
 {
-       unsigned int i;
+       struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
+       unsigned int i, old_nacks = 0;
+       rxrpc_seq_t lowest_nak = seq + sp->nr_acks;
+       u8 *acks = skb->data + sizeof(struct rxrpc_wire_header) + sizeof(struct rxrpc_ackpacket);
 
-       for (i = 0; i < nr_acks; i++) {
+       for (i = 0; i < sp->nr_acks; i++) {
                if (acks[i] == RXRPC_ACK_TYPE_ACK) {
                        summary->nr_acks++;
-                       summary->nr_new_acks++;
+                       if (after_eq(seq, since))
+                               summary->nr_new_acks++;
                } else {
-                       if (!summary->saw_nacks &&
-                           call->acks_lowest_nak != seq + i) {
-                               call->acks_lowest_nak = seq + i;
-                               summary->new_low_nack = true;
-                       }
                        summary->saw_nacks = true;
+                       if (before(seq, since)) {
+                               /* Overlap with previous ACK */
+                               old_nacks++;
+                       } else {
+                               summary->nr_new_nacks++;
+                               sp->nr_nacks++;
+                       }
+
+                       if (before(seq, lowest_nak))
+                               lowest_nak = seq;
                }
+               seq++;
+       }
+
+       if (lowest_nak != call->acks_lowest_nak) {
+               call->acks_lowest_nak = lowest_nak;
+               summary->new_low_nack = true;
        }
+
+       /* We *can* have more nacks than we did - the peer is permitted to drop
+        * packets it has soft-acked and re-request them.  Further, it is
+        * possible for the nack distribution to change whilst the number of
+        * nacks stays the same or goes down.
+        */
+       if (old_nacks < summary->nr_retained_nacks)
+               summary->nr_new_acks += summary->nr_retained_nacks - old_nacks;
+       summary->nr_retained_nacks = old_nacks;
 }
 
 /*
@@ -773,7 +839,7 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb)
        struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
        struct rxrpc_ackinfo info;
        rxrpc_serial_t ack_serial, acked_serial;
-       rxrpc_seq_t first_soft_ack, hard_ack, prev_pkt;
+       rxrpc_seq_t first_soft_ack, hard_ack, prev_pkt, since;
        int nr_acks, offset, ioffset;
 
        _enter("");
@@ -789,6 +855,8 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb)
        prev_pkt = ntohl(ack.previousPacket);
        hard_ack = first_soft_ack - 1;
        nr_acks = ack.nAcks;
+       sp->first_ack = first_soft_ack;
+       sp->nr_acks = nr_acks;
        summary.ack_reason = (ack.reason < RXRPC_ACK__INVALID ?
                              ack.reason : RXRPC_ACK__INVALID);
 
@@ -858,6 +926,16 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb)
        if (nr_acks > 0)
                skb_condense(skb);
 
+       if (call->cong_last_nack) {
+               since = rxrpc_input_check_prev_ack(call, &summary, first_soft_ack);
+               rxrpc_free_skb(call->cong_last_nack, rxrpc_skb_put_last_nack);
+               call->cong_last_nack = NULL;
+       } else {
+               summary.nr_new_acks = first_soft_ack - call->acks_first_seq;
+               call->acks_lowest_nak = first_soft_ack + nr_acks;
+               since = first_soft_ack;
+       }
+
        call->acks_latest_ts = skb->tstamp;
        call->acks_first_seq = first_soft_ack;
        call->acks_prev_seq = prev_pkt;
@@ -866,7 +944,7 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb)
        case RXRPC_ACK_PING:
                break;
        default:
-               if (after(acked_serial, call->acks_highest_serial))
+               if (acked_serial && after(acked_serial, call->acks_highest_serial))
                        call->acks_highest_serial = acked_serial;
                break;
        }
@@ -905,8 +983,9 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb)
        if (nr_acks > 0) {
                if (offset > (int)skb->len - nr_acks)
                        return rxrpc_proto_abort(call, 0, rxrpc_eproto_ackr_short_sack);
-               rxrpc_input_soft_acks(call, skb->data + offset, first_soft_ack,
-                                     nr_acks, &summary);
+               rxrpc_input_soft_acks(call, &summary, skb, first_soft_ack, since);
+               rxrpc_get_skb(skb, rxrpc_skb_get_last_nack);
+               call->cong_last_nack = skb;
        }
 
        if (test_bit(RXRPC_CALL_TX_LAST, &call->flags) &&
index a0906145e8293ca457fd0b1493ba3892f5f0729a..4a292f860ae37a41bddcd99f7e3bdc6a2c092d29 100644 (file)
@@ -216,7 +216,7 @@ int rxrpc_send_ack_packet(struct rxrpc_call *call, struct rxrpc_txbuf *txb)
        iov[0].iov_len  = sizeof(txb->wire) + sizeof(txb->ack) + n;
        len = iov[0].iov_len;
 
-       serial = atomic_inc_return(&conn->serial);
+       serial = rxrpc_get_next_serial(conn);
        txb->wire.serial = htonl(serial);
        trace_rxrpc_tx_ack(call->debug_id, serial,
                           ntohl(txb->ack.firstPacket),
@@ -302,7 +302,7 @@ int rxrpc_send_abort_packet(struct rxrpc_call *call)
        iov[0].iov_base = &pkt;
        iov[0].iov_len  = sizeof(pkt);
 
-       serial = atomic_inc_return(&conn->serial);
+       serial = rxrpc_get_next_serial(conn);
        pkt.whdr.serial = htonl(serial);
 
        iov_iter_kvec(&msg.msg_iter, WRITE, iov, 1, sizeof(pkt));
@@ -334,7 +334,7 @@ int rxrpc_send_data_packet(struct rxrpc_call *call, struct rxrpc_txbuf *txb)
        _enter("%x,{%d}", txb->seq, txb->len);
 
        /* Each transmission of a Tx packet needs a new serial number */
-       serial = atomic_inc_return(&conn->serial);
+       serial = rxrpc_get_next_serial(conn);
        txb->wire.serial = htonl(serial);
 
        if (test_bit(RXRPC_CONN_PROBING_FOR_UPGRADE, &conn->flags) &&
@@ -558,7 +558,7 @@ void rxrpc_send_conn_abort(struct rxrpc_connection *conn)
 
        len = iov[0].iov_len + iov[1].iov_len;
 
-       serial = atomic_inc_return(&conn->serial);
+       serial = rxrpc_get_next_serial(conn);
        whdr.serial = htonl(serial);
 
        iov_iter_kvec(&msg.msg_iter, WRITE, iov, 2, len);
index 6c86cbb98d1d601edc9dad728c72f887067a376e..26dc2f26d92d8d67f82229675254d7217c2184e0 100644 (file)
@@ -181,7 +181,7 @@ print:
                   atomic_read(&conn->active),
                   state,
                   key_serial(conn->key),
-                  atomic_read(&conn->serial),
+                  conn->tx_serial,
                   conn->hi_serial,
                   conn->channels[0].call_id,
                   conn->channels[1].call_id,
index b52dedcebce0a7aafe0888f97e79bb81435749f2..6b32d61d4cdc46719d4a011987f6ea112ae59fc1 100644 (file)
@@ -664,7 +664,7 @@ static int rxkad_issue_challenge(struct rxrpc_connection *conn)
 
        len = iov[0].iov_len + iov[1].iov_len;
 
-       serial = atomic_inc_return(&conn->serial);
+       serial = rxrpc_get_next_serial(conn);
        whdr.serial = htonl(serial);
 
        ret = kernel_sendmsg(conn->local->socket, &msg, iov, 2, len);
@@ -721,7 +721,7 @@ static int rxkad_send_response(struct rxrpc_connection *conn,
 
        len = iov[0].iov_len + iov[1].iov_len + iov[2].iov_len;
 
-       serial = atomic_inc_return(&conn->serial);
+       serial = rxrpc_get_next_serial(conn);
        whdr.serial = htonl(serial);
 
        rxrpc_local_dont_fragment(conn->local, false);
index 12386f590b0f61f45e4ff40c9ec3605326671a2d..6faa7d00da09771ae130581604c3b14c50472966 100644 (file)
@@ -232,18 +232,14 @@ release_idr:
        return err;
 }
 
-static bool is_mirred_nested(void)
-{
-       return unlikely(__this_cpu_read(mirred_nest_level) > 1);
-}
-
-static int tcf_mirred_forward(bool want_ingress, struct sk_buff *skb)
+static int
+tcf_mirred_forward(bool at_ingress, bool want_ingress, struct sk_buff *skb)
 {
        int err;
 
        if (!want_ingress)
                err = tcf_dev_queue_xmit(skb, dev_queue_xmit);
-       else if (is_mirred_nested())
+       else if (!at_ingress)
                err = netif_rx(skb);
        else
                err = netif_receive_skb(skb);
@@ -270,8 +266,7 @@ static int tcf_mirred_to_dev(struct sk_buff *skb, struct tcf_mirred *m,
        if (unlikely(!(dev->flags & IFF_UP)) || !netif_carrier_ok(dev)) {
                net_notice_ratelimited("tc mirred to Houston: device %s is down\n",
                                       dev->name);
-               err = -ENODEV;
-               goto out;
+               goto err_cant_do;
        }
 
        /* we could easily avoid the clone only if called by ingress and clsact;
@@ -283,10 +278,8 @@ static int tcf_mirred_to_dev(struct sk_buff *skb, struct tcf_mirred *m,
                tcf_mirred_can_reinsert(retval);
        if (!dont_clone) {
                skb_to_send = skb_clone(skb, GFP_ATOMIC);
-               if (!skb_to_send) {
-                       err =  -ENOMEM;
-                       goto out;
-               }
+               if (!skb_to_send)
+                       goto err_cant_do;
        }
 
        want_ingress = tcf_mirred_act_wants_ingress(m_eaction);
@@ -319,19 +312,20 @@ static int tcf_mirred_to_dev(struct sk_buff *skb, struct tcf_mirred *m,
 
                skb_set_redirected(skb_to_send, skb_to_send->tc_at_ingress);
 
-               err = tcf_mirred_forward(want_ingress, skb_to_send);
+               err = tcf_mirred_forward(at_ingress, want_ingress, skb_to_send);
        } else {
-               err = tcf_mirred_forward(want_ingress, skb_to_send);
+               err = tcf_mirred_forward(at_ingress, want_ingress, skb_to_send);
        }
-
-       if (err) {
-out:
+       if (err)
                tcf_action_inc_overlimit_qstats(&m->common);
-               if (is_redirect)
-                       retval = TC_ACT_SHOT;
-       }
 
        return retval;
+
+err_cant_do:
+       if (is_redirect)
+               retval = TC_ACT_SHOT;
+       tcf_action_inc_overlimit_qstats(&m->common);
+       return retval;
 }
 
 static int tcf_blockcast_redir(struct sk_buff *skb, struct tcf_mirred *m,
@@ -533,8 +527,6 @@ static int mirred_device_event(struct notifier_block *unused,
                                 * net_device are already rcu protected.
                                 */
                                RCU_INIT_POINTER(m->tcfm_dev, NULL);
-                       } else if (m->tcfm_blockid) {
-                               m->tcfm_blockid = 0;
                        }
                        spin_unlock_bh(&m->tcf_lock);
                }
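The mirred rework consolidates failure handling behind a single err_cant_do: label, so the overlimit statistic and the TC_ACT_SHOT downgrade for redirects are applied exactly once per failure; the old code reached out: from the middle of the success path. The single-exit-label idiom in miniature:

    #include <errno.h>
    #include <stdlib.h>

    /* Sketch: every failure jumps to one label that applies the shared
     * error handling once. free(NULL) is a no-op, so partial setup is fine. */
    static int do_work(void)
    {
            char *a = malloc(16), *b = NULL;

            if (!a)
                    goto err;
            b = malloc(16);
            if (!b)
                    goto err;
            /* ... use a and b ... */
            free(b);
            free(a);
            return 0;

    err:
            free(b);
            free(a);
            return -ENOMEM;
    }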
index 92a12e3d0fe63646b1d82751c9986e08de6ab673..ff3d396a65aac0dec81fc79a6bea44c24cfd7a68 100644 (file)
@@ -1560,6 +1560,9 @@ tcf_block_playback_offloads(struct tcf_block *block, flow_setup_cb_t *cb,
             chain_prev = chain,
                     chain = __tcf_get_next_chain(block, chain),
                     tcf_chain_put(chain_prev)) {
+               if (chain->tmplt_ops && add)
+                       chain->tmplt_ops->tmplt_reoffload(chain, true, cb,
+                                                         cb_priv);
                for (tp = __tcf_get_next_proto(chain, NULL); tp;
                     tp_prev = tp,
                             tp = __tcf_get_next_proto(chain, tp),
@@ -1575,6 +1578,9 @@ tcf_block_playback_offloads(struct tcf_block *block, flow_setup_cb_t *cb,
                                goto err_playback_remove;
                        }
                }
+               if (chain->tmplt_ops && !add)
+                       chain->tmplt_ops->tmplt_reoffload(chain, false, cb,
+                                                         cb_priv);
        }
 
        return 0;
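Note the ordering: when binding a callback (add) the chain template is replayed before the chain's filters, and when unbinding it is removed only after the filters, so hardware never sees a filter whose template it does not know. Pseudo-shape of the loop (types and helper names are illustrative, not the kernel's):

    struct chain;
    void tmplt_reoffload(struct chain *c, int add);
    void replay_filters(struct chain *c, int add);

    void playback(struct chain *chain, int add)
    {
            if (add)
                    tmplt_reoffload(chain, 1);      /* template first on bind */
            replay_filters(chain, add);
            if (!add)
                    tmplt_reoffload(chain, 0);      /* template last on unbind */
    }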
@@ -3000,7 +3006,8 @@ static int tc_chain_tmplt_add(struct tcf_chain *chain, struct net *net,
        ops = tcf_proto_lookup_ops(name, true, extack);
        if (IS_ERR(ops))
                return PTR_ERR(ops);
-       if (!ops->tmplt_create || !ops->tmplt_destroy || !ops->tmplt_dump) {
+       if (!ops->tmplt_create || !ops->tmplt_destroy || !ops->tmplt_dump ||
+           !ops->tmplt_reoffload) {
                NL_SET_ERR_MSG(extack, "Chain templates are not supported with specified classifier");
                module_put(ops->owner);
                return -EOPNOTSUPP;
index e5314a31f75ae3a6db31cb81a3ebf5316a3005ff..6ee7064c82fcc3bdb7596e2ad8fe33bc6456102d 100644 (file)
@@ -2460,8 +2460,11 @@ unbind_filter:
        }
 
 errout_idr:
-       if (!fold)
+       if (!fold) {
+               spin_lock(&tp->lock);
                idr_remove(&head->handle_idr, fnew->handle);
+               spin_unlock(&tp->lock);
+       }
        __fl_put(fnew);
 errout_tb:
        kfree(tb);
@@ -2721,6 +2724,28 @@ static void fl_tmplt_destroy(void *tmplt_priv)
        kfree(tmplt);
 }
 
+static void fl_tmplt_reoffload(struct tcf_chain *chain, bool add,
+                              flow_setup_cb_t *cb, void *cb_priv)
+{
+       struct fl_flow_tmplt *tmplt = chain->tmplt_priv;
+       struct flow_cls_offload cls_flower = {};
+
+       cls_flower.rule = flow_rule_alloc(0);
+       if (!cls_flower.rule)
+               return;
+
+       cls_flower.common.chain_index = chain->index;
+       cls_flower.command = add ? FLOW_CLS_TMPLT_CREATE :
+                                  FLOW_CLS_TMPLT_DESTROY;
+       cls_flower.cookie = (unsigned long) tmplt;
+       cls_flower.rule->match.dissector = &tmplt->dissector;
+       cls_flower.rule->match.mask = &tmplt->mask;
+       cls_flower.rule->match.key = &tmplt->dummy_key;
+
+       cb(TC_SETUP_CLSFLOWER, &cls_flower, cb_priv);
+       kfree(cls_flower.rule);
+}
+
 static int fl_dump_key_val(struct sk_buff *skb,
                           void *val, int val_type,
                           void *mask, int mask_type, int len)
@@ -3628,6 +3653,7 @@ static struct tcf_proto_ops cls_fl_ops __read_mostly = {
        .bind_class     = fl_bind_class,
        .tmplt_create   = fl_tmplt_create,
        .tmplt_destroy  = fl_tmplt_destroy,
+       .tmplt_reoffload = fl_tmplt_reoffload,
        .tmplt_dump     = fl_tmplt_dump,
        .get_exts       = fl_get_exts,
        .owner          = THIS_MODULE,
index 5ea84decec19a6b6593e1f9caa31e0e914a52e9d..5337bc46275519a062e54d093df84f3ea8f58583 100644 (file)
@@ -222,6 +222,7 @@ static void __exit exit_em_canid(void)
        tcf_em_unregister(&em_canid_ops);
 }
 
+MODULE_DESCRIPTION("ematch classifier to match CAN IDs embedded in skb CAN frames");
 MODULE_LICENSE("GPL");
 
 module_init(init_em_canid);
index f17b049ea53090d750133422f90b3ad673fd6c41..c90ad7ea26b4697cbedf25a51fc1b92771040c4e 100644 (file)
@@ -87,6 +87,7 @@ static void __exit exit_em_cmp(void)
        tcf_em_unregister(&em_cmp_ops);
 }
 
+MODULE_DESCRIPTION("ematch classifier for basic data types(8/16/32 bit) against skb data");
 MODULE_LICENSE("GPL");
 
 module_init(init_em_cmp);
index 09d8afd04a2a78ac55b0ddd1b424ddcb28b9ba83..8996c73c9779b5fa804e6f913834cf1fe4d071e6 100644 (file)
@@ -1006,6 +1006,7 @@ static void __exit exit_em_meta(void)
        tcf_em_unregister(&em_meta_ops);
 }
 
+MODULE_DESCRIPTION("ematch classifier for various internal kernel metadata, skb metadata and sk metadata");
 MODULE_LICENSE("GPL");
 
 module_init(init_em_meta);
index a83b237cbeb06553c805dfeac3632fd69d6dc3c6..4f9f21a05d5e40aadfdc4c339b8178ad43dc2c8b 100644 (file)
@@ -68,6 +68,7 @@ static void __exit exit_em_nbyte(void)
        tcf_em_unregister(&em_nbyte_ops);
 }
 
+MODULE_DESCRIPTION("ematch classifier for arbitrary skb multi-bytes");
 MODULE_LICENSE("GPL");
 
 module_init(init_em_nbyte);
index f176afb70559eb0a594a2f724765ccb0a1d3b746..420c66203b1777632500ee3d5e2d89a46b50bc4c 100644 (file)
@@ -147,6 +147,7 @@ static void __exit exit_em_text(void)
        tcf_em_unregister(&em_text_ops);
 }
 
+MODULE_DESCRIPTION("ematch classifier for embedded text in skbs");
 MODULE_LICENSE("GPL");
 
 module_init(init_em_text);
index 71b070da043796d1872ae7aecc2348ab6a4b37f1..fdec4db5ec89d047427d62fb6ce95b3649a80f9f 100644 (file)
@@ -52,6 +52,7 @@ static void __exit exit_em_u32(void)
        tcf_em_unregister(&em_u32_ops);
 }
 
+MODULE_DESCRIPTION("ematch skb classifier using 32 bit chunks of data");
 MODULE_LICENSE("GPL");
 
 module_init(init_em_u32);
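Each of these one-liners fills in the module's description string, silencing modpost's W=1 warning about a missing MODULE_DESCRIPTION(); the macro simply records the string in the module's .modinfo section. Minimal boilerplate for reference (a generic sketch, not one of the ematch modules):

    #include <linux/init.h>
    #include <linux/module.h>

    static int __init demo_init(void) { return 0; }
    static void __exit demo_exit(void) { }

    module_init(demo_init);
    module_exit(demo_exit);

    MODULE_DESCRIPTION("Minimal module showing the .modinfo description");
    MODULE_LICENSE("GPL");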
index 7182c5a450fb5b804d19fdb1b04b75a2c34eb2d0..5c1652181805880ff9f5eaae45e0bab6b00170df 100644 (file)
@@ -38,6 +38,14 @@ void sctp_inq_init(struct sctp_inq *queue)
        INIT_WORK(&queue->immediate, NULL);
 }
 
+/* Properly release the chunk which is being worked on. */
+static inline void sctp_inq_chunk_free(struct sctp_chunk *chunk)
+{
+       if (chunk->head_skb)
+               chunk->skb = chunk->head_skb;
+       sctp_chunk_free(chunk);
+}
+
 /* Release the memory associated with an SCTP inqueue.  */
 void sctp_inq_free(struct sctp_inq *queue)
 {
@@ -53,7 +61,7 @@ void sctp_inq_free(struct sctp_inq *queue)
         * free it as well.
         */
        if (queue->in_progress) {
-               sctp_chunk_free(queue->in_progress);
+               sctp_inq_chunk_free(queue->in_progress);
                queue->in_progress = NULL;
        }
 }
@@ -130,9 +138,7 @@ struct sctp_chunk *sctp_inq_pop(struct sctp_inq *queue)
                                goto new_skb;
                        }
 
-                       if (chunk->head_skb)
-                               chunk->skb = chunk->head_skb;
-                       sctp_chunk_free(chunk);
+                       sctp_inq_chunk_free(chunk);
                        chunk = queue->in_progress = NULL;
                } else {
                        /* Nothing to do. Next chunk in the packet, please. */
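For GSO packets, chunk->skb points at a fragment while chunk->head_skb holds the outer skb; freeing the fragment pointer directly leaks the head, so the pointer is restored first. Factoring this into sctp_inq_chunk_free() lets sctp_inq_free() apply the same fix when a queue is torn down mid-packet. The invariant, condensed (a sketch, not the sctp structs):

    #include <linux/skbuff.h>

    /* Sketch: always free via the outer (head) skb when one exists. */
    struct frag_chunk {
            struct sk_buff *skb;            /* current fragment */
            struct sk_buff *head_skb;       /* outer GSO skb, or NULL */
    };

    static void frag_chunk_free(struct frag_chunk *c)
    {
            if (c->head_skb)
                    c->skb = c->head_skb;   /* free the whole train */
            kfree_skb(c->skb);
    }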
index a2cb30af46cb158fcd3a1c12349a0375a1020203..0f53a5c6fd9d9c88c78f51640b179bf214e78bda 100644 (file)
@@ -924,6 +924,7 @@ static int smc_switch_to_fallback(struct smc_sock *smc, int reason_code)
                smc->clcsock->file->private_data = smc->clcsock;
                smc->clcsock->wq.fasync_list =
                        smc->sk.sk_socket->wq.fasync_list;
+               smc->sk.sk_socket->wq.fasync_list = NULL;
 
                /* There might be some wait entries remaining
                 * in smc sk->sk_wq and they should be woken up
index 95cc95458e2d8d2c2c3578088544ee1abe0ea8a6..e4c858411207a51d043aef96a47c48ac63f5dd8a 100644 (file)
@@ -1877,9 +1877,15 @@ static bool smcd_lgr_match(struct smc_link_group *lgr,
                           struct smcd_dev *smcismdev,
                           struct smcd_gid *peer_gid)
 {
-       return lgr->peer_gid.gid == peer_gid->gid && lgr->smcd == smcismdev &&
-               smc_ism_is_virtual(smcismdev) ?
-               (lgr->peer_gid.gid_ext == peer_gid->gid_ext) : 1;
+       if (lgr->peer_gid.gid != peer_gid->gid ||
+           lgr->smcd != smcismdev)
+               return false;
+
+       if (smc_ism_is_virtual(smcismdev) &&
+           lgr->peer_gid.gid_ext != peer_gid->gid_ext)
+               return false;
+
+       return true;
 }
 
 /* create a new SMC connection (and a new link group if necessary) */
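The old single-expression return mixed && with ?:, and since && binds tighter, it parsed as (gid_match && dev_match && is_virtual) ? ext_match : 1, i.e. it reported a match whenever the gid or device comparison failed. The early-return rewrite is both correct and readable. The precedence trap in miniature:

    #include <assert.h>

    int main(void)
    {
            int a = 0, b = 1;

            /* && binds tighter than ?:, so this is (a && b) ? 0 : 1 ... */
            assert((a && b ? 0 : 1) == 1);
            /* ... not a && (b ? 0 : 1), which evaluates to 0. */
            assert((a && (b ? 0 : 1)) == 0);
            return 0;
    }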
index 52f7c4f1e7670d723a6858614f071f73dbd88dc5..5a33908015f3e3197ad11869c6f5134799307c56 100644 (file)
@@ -164,7 +164,7 @@ static int __smc_diag_dump(struct sock *sk, struct sk_buff *skb,
        }
        if (smc_conn_lgr_valid(&smc->conn) && smc->conn.lgr->is_smcd &&
            (req->diag_ext & (1 << (SMC_DIAG_DMBINFO - 1))) &&
-           !list_empty(&smc->conn.lgr->list)) {
+           !list_empty(&smc->conn.lgr->list) && smc->conn.rmb_desc) {
                struct smc_connection *conn = &smc->conn;
                struct smcd_diag_dmbinfo dinfo;
                struct smcd_dev *smcd = conn->lgr->smcd;
index f60c93e5a25d69f6c918ab43a9c48a973cbf90b4..b969e505c7b77002e17936c7ee4fa6e6c79ad223 100644 (file)
@@ -1598,10 +1598,10 @@ void svc_process_bc(struct rpc_rqst *req, struct svc_rqst *rqstp)
        /* Finally, send the reply synchronously */
        if (rqstp->bc_to_initval > 0) {
                timeout.to_initval = rqstp->bc_to_initval;
-               timeout.to_retries = rqstp->bc_to_initval;
+               timeout.to_retries = rqstp->bc_to_retries;
        } else {
                timeout.to_initval = req->rq_xprt->timeout->to_initval;
-               timeout.to_initval = req->rq_xprt->timeout->to_retries;
+               timeout.to_retries = req->rq_xprt->timeout->to_retries;
        }
        memcpy(&req->rq_snd_buf, &rqstp->rq_res, sizeof(req->rq_snd_buf));
        task = rpc_run_bc_task(req, &timeout);
index bfb2f78523a8289f0a6ea758ca61c53d06832273..545017a3daa4d6b20255c51c6c0dea73ec32ecfc 100644 (file)
@@ -717,12 +717,12 @@ static int svc_udp_sendto(struct svc_rqst *rqstp)
                                ARRAY_SIZE(rqstp->rq_bvec), xdr);
 
        iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, rqstp->rq_bvec,
-                     count, 0);
+                     count, rqstp->rq_res.len);
        err = sock_sendmsg(svsk->sk_sock, &msg);
        if (err == -ECONNREFUSED) {
                /* ICMP error on earlier request. */
                iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, rqstp->rq_bvec,
-                             count, 0);
+                             count, rqstp->rq_res.len);
                err = sock_sendmsg(svsk->sk_sock, &msg);
        }
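The last argument of iov_iter_bvec() is the iterator's total byte count, not a flag: seeding it with 0 leaves sock_sendmsg() nothing to pull, so the UDP reply needs rq_res.len there. The signature, as declared in include/linux/uio.h:

    void iov_iter_bvec(struct iov_iter *i, unsigned int direction,
                       const struct bio_vec *bvec, unsigned long nr_segs,
                       size_t count);  /* count = total bytes, not segments */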
 
index 5b045284849e03151b172cf55248492aed2b3472..c9189a970eec317745a06c27064f504a6ff2e3d2 100644 (file)
 #include <linux/rtnetlink.h>
 #include <net/switchdev.h>
 
+static bool switchdev_obj_eq(const struct switchdev_obj *a,
+                            const struct switchdev_obj *b)
+{
+       const struct switchdev_obj_port_vlan *va, *vb;
+       const struct switchdev_obj_port_mdb *ma, *mb;
+
+       if (a->id != b->id || a->orig_dev != b->orig_dev)
+               return false;
+
+       switch (a->id) {
+       case SWITCHDEV_OBJ_ID_PORT_VLAN:
+               va = SWITCHDEV_OBJ_PORT_VLAN(a);
+               vb = SWITCHDEV_OBJ_PORT_VLAN(b);
+               return va->flags == vb->flags &&
+                       va->vid == vb->vid &&
+                       va->changed == vb->changed;
+       case SWITCHDEV_OBJ_ID_PORT_MDB:
+       case SWITCHDEV_OBJ_ID_HOST_MDB:
+               ma = SWITCHDEV_OBJ_PORT_MDB(a);
+               mb = SWITCHDEV_OBJ_PORT_MDB(b);
+               return ma->vid == mb->vid &&
+                       ether_addr_equal(ma->addr, mb->addr);
+       default:
+               break;
+       }
+
+       BUG();
+}
+
 static LIST_HEAD(deferred);
 static DEFINE_SPINLOCK(deferred_lock);
 
@@ -307,6 +336,50 @@ int switchdev_port_obj_del(struct net_device *dev,
 }
 EXPORT_SYMBOL_GPL(switchdev_port_obj_del);
 
+/**
+ *     switchdev_port_obj_act_is_deferred - Is object action pending?
+ *
+ *     @dev: port device
+ *     @nt: type of action; add or delete
+ *     @obj: object to test
+ *
+ *     Returns true if a deferred item is pending, which is
+ *     equivalent to the action @nt on an object @obj.
+ *
+ *     rtnl_lock must be held.
+ */
+bool switchdev_port_obj_act_is_deferred(struct net_device *dev,
+                                       enum switchdev_notifier_type nt,
+                                       const struct switchdev_obj *obj)
+{
+       struct switchdev_deferred_item *dfitem;
+       bool found = false;
+
+       ASSERT_RTNL();
+
+       spin_lock_bh(&deferred_lock);
+
+       list_for_each_entry(dfitem, &deferred, list) {
+               if (dfitem->dev != dev)
+                       continue;
+
+               if ((dfitem->func == switchdev_port_obj_add_deferred &&
+                    nt == SWITCHDEV_PORT_OBJ_ADD) ||
+                   (dfitem->func == switchdev_port_obj_del_deferred &&
+                    nt == SWITCHDEV_PORT_OBJ_DEL)) {
+                       if (switchdev_obj_eq((const void *)dfitem->data, obj)) {
+                               found = true;
+                               break;
+                       }
+               }
+       }
+
+       spin_unlock_bh(&deferred_lock);
+
+       return found;
+}
+EXPORT_SYMBOL_GPL(switchdev_port_obj_act_is_deferred);
+
 static ATOMIC_NOTIFIER_HEAD(switchdev_notif_chain);
 static BLOCKING_NOTIFIER_HEAD(switchdev_blocking_notif_chain);
 
index 2cde375477e381aa4a542cd4cf24db067770b466..878415c43527615801186d79a6c0c73b62bf5750 100644 (file)
@@ -1086,6 +1086,12 @@ int tipc_nl_bearer_add(struct sk_buff *skb, struct genl_info *info)
 
 #ifdef CONFIG_TIPC_MEDIA_UDP
        if (attrs[TIPC_NLA_BEARER_UDP_OPTS]) {
+               if (b->media->type_id != TIPC_MEDIA_TYPE_UDP) {
+                       rtnl_unlock();
+                       NL_SET_ERR_MSG(info->extack, "UDP option is unsupported");
+                       return -EINVAL;
+               }
+
                err = tipc_udp_nl_bearer_add(b,
                                             attrs[TIPC_NLA_BEARER_UDP_OPTS]);
                if (err) {
index 1c2c6800949dd4c2800f76326b7500f9743c4720..b4674f03d71a9fb9a5526555d7aca9b9cc5e665c 100644 (file)
@@ -1003,7 +1003,7 @@ static u16 tls_user_config(struct tls_context *ctx, bool tx)
        return 0;
 }
 
-static int tls_get_info(const struct sock *sk, struct sk_buff *skb)
+static int tls_get_info(struct sock *sk, struct sk_buff *skb)
 {
        u16 version, cipher_type;
        struct tls_context *ctx;
index 31e8a94dfc111b7705fe19b9b4ddee3e6a317a23..211f57164cb611fd2665f682906be96aa35463ed 100644 (file)
@@ -52,6 +52,7 @@ struct tls_decrypt_arg {
        struct_group(inargs,
        bool zc;
        bool async;
+       bool async_done;
        u8 tail;
        );
 
@@ -63,6 +64,7 @@ struct tls_decrypt_ctx {
        u8 iv[TLS_MAX_IV_SIZE];
        u8 aad[TLS_MAX_AAD_SIZE];
        u8 tail;
+       bool free_sgout;
        struct scatterlist sg[];
 };
 
@@ -187,7 +189,6 @@ static void tls_decrypt_done(void *data, int err)
        struct aead_request *aead_req = data;
        struct crypto_aead *aead = crypto_aead_reqtfm(aead_req);
        struct scatterlist *sgout = aead_req->dst;
-       struct scatterlist *sgin = aead_req->src;
        struct tls_sw_context_rx *ctx;
        struct tls_decrypt_ctx *dctx;
        struct tls_context *tls_ctx;
@@ -196,6 +197,17 @@ static void tls_decrypt_done(void *data, int err)
        struct sock *sk;
        int aead_size;
 
+       /* If requests get too backlogged, the crypto API returns -EBUSY and
+        * calls ->complete(-EINPROGRESS) immediately followed by ->complete(0)
+        * to make waiting for the backlog to flush with crypto_wait_req()
+        * easier. The first wait converts -EBUSY -> -EINPROGRESS, and the
+        * second one converts -EINPROGRESS -> 0.
+        * Since we have a single struct crypto_async_request per direction,
+        * this scheme doesn't help us, so just ignore the first ->complete().
+        */
+       if (err == -EINPROGRESS)
+               return;
+
        aead_size = sizeof(*aead_req) + crypto_aead_reqsize(aead);
        aead_size = ALIGN(aead_size, __alignof__(*dctx));
        dctx = (void *)((u8 *)aead_req + aead_size);
@@ -213,7 +225,7 @@ static void tls_decrypt_done(void *data, int err)
        }
 
        /* Free the destination pages if skb was not decrypted inplace */
-       if (sgout != sgin) {
+       if (dctx->free_sgout) {
                /* Skip the first S/G entry as it points to AAD */
                for_each_sg(sg_next(sgout), sg, UINT_MAX, pages) {
                        if (!sg)
@@ -224,10 +236,17 @@ static void tls_decrypt_done(void *data, int err)
 
        kfree(aead_req);
 
-       spin_lock_bh(&ctx->decrypt_compl_lock);
-       if (!atomic_dec_return(&ctx->decrypt_pending))
+       if (atomic_dec_and_test(&ctx->decrypt_pending))
                complete(&ctx->async_wait.completion);
-       spin_unlock_bh(&ctx->decrypt_compl_lock);
+}
+
+static int tls_decrypt_async_wait(struct tls_sw_context_rx *ctx)
+{
+       if (!atomic_dec_and_test(&ctx->decrypt_pending))
+               crypto_wait_req(-EINPROGRESS, &ctx->async_wait);
+       atomic_inc(&ctx->decrypt_pending);
+
+       return ctx->async_wait.err;
 }
 
 static int tls_do_decryption(struct sock *sk,
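
The spinlock/completion scheme is replaced by a pending counter initialised to
1 (see the atomic_set(..., 1) calls in init_ctx_tx()/init_ctx_rx() further
down): the extra reference belongs to the submitter, so in-flight completions
can never drop the counter to zero behind its back. The wait/re-arm step in
isolation (a sketch, not the kernel code verbatim):

    /* The counter starts at 1; only the waiter's own decrement can reach
     * zero, and only after every completion has run.
     */
    static int pending_wait(atomic_t *pending, struct crypto_wait *wait)
    {
            if (!atomic_dec_and_test(pending))      /* drop submitter ref */
                    crypto_wait_req(-EINPROGRESS, wait);
            atomic_inc(pending);                    /* re-arm for reuse */
            return wait->err;
    }
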
@@ -253,20 +272,33 @@ static int tls_do_decryption(struct sock *sk,
                aead_request_set_callback(aead_req,
                                          CRYPTO_TFM_REQ_MAY_BACKLOG,
                                          tls_decrypt_done, aead_req);
+               DEBUG_NET_WARN_ON_ONCE(atomic_read(&ctx->decrypt_pending) < 1);
                atomic_inc(&ctx->decrypt_pending);
        } else {
+               DECLARE_CRYPTO_WAIT(wait);
+
                aead_request_set_callback(aead_req,
                                          CRYPTO_TFM_REQ_MAY_BACKLOG,
-                                         crypto_req_done, &ctx->async_wait);
+                                         crypto_req_done, &wait);
+               ret = crypto_aead_decrypt(aead_req);
+               if (ret == -EINPROGRESS || ret == -EBUSY)
+                       ret = crypto_wait_req(ret, &wait);
+               return ret;
        }
 
        ret = crypto_aead_decrypt(aead_req);
-       if (ret == -EINPROGRESS) {
-               if (darg->async)
-                       return 0;
+       if (ret == -EINPROGRESS)
+               return 0;
 
-               ret = crypto_wait_req(ret, &ctx->async_wait);
+       if (ret == -EBUSY) {
+               ret = tls_decrypt_async_wait(ctx);
+               darg->async_done = true;
+               /* all completions have run, we're not doing async anymore */
+               darg->async = false;
+               return ret;
        }
+
+       atomic_dec(&ctx->decrypt_pending);
        darg->async = false;
 
        return ret;
@@ -439,9 +471,10 @@ static void tls_encrypt_done(void *data, int err)
        struct tls_rec *rec = data;
        struct scatterlist *sge;
        struct sk_msg *msg_en;
-       bool ready = false;
        struct sock *sk;
-       int pending;
+
+       if (err == -EINPROGRESS) /* see the comment in tls_decrypt_done() */
+               return;
 
        msg_en = &rec->msg_encrypted;
 
@@ -476,23 +509,25 @@ static void tls_encrypt_done(void *data, int err)
                /* If received record is at head of tx_list, schedule tx */
                first_rec = list_first_entry(&ctx->tx_list,
                                             struct tls_rec, list);
-               if (rec == first_rec)
-                       ready = true;
+               if (rec == first_rec) {
+                       /* Schedule the transmission */
+                       if (!test_and_set_bit(BIT_TX_SCHEDULED,
+                                             &ctx->tx_bitmask))
+                               schedule_delayed_work(&ctx->tx_work.work, 1);
+               }
        }
 
-       spin_lock_bh(&ctx->encrypt_compl_lock);
-       pending = atomic_dec_return(&ctx->encrypt_pending);
-
-       if (!pending && ctx->async_notify)
+       if (atomic_dec_and_test(&ctx->encrypt_pending))
                complete(&ctx->async_wait.completion);
-       spin_unlock_bh(&ctx->encrypt_compl_lock);
+}
 
-       if (!ready)
-               return;
+static int tls_encrypt_async_wait(struct tls_sw_context_tx *ctx)
+{
+       if (!atomic_dec_and_test(&ctx->encrypt_pending))
+               crypto_wait_req(-EINPROGRESS, &ctx->async_wait);
+       atomic_inc(&ctx->encrypt_pending);
 
-       /* Schedule the transmission */
-       if (!test_and_set_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask))
-               schedule_delayed_work(&ctx->tx_work.work, 1);
+       return ctx->async_wait.err;
 }
 
 static int tls_do_encryption(struct sock *sk,
@@ -541,9 +576,14 @@ static int tls_do_encryption(struct sock *sk,
 
        /* Add the record in tx_list */
        list_add_tail((struct list_head *)&rec->list, &ctx->tx_list);
+       DEBUG_NET_WARN_ON_ONCE(atomic_read(&ctx->encrypt_pending) < 1);
        atomic_inc(&ctx->encrypt_pending);
 
        rc = crypto_aead_encrypt(aead_req);
+       if (rc == -EBUSY) {
+               rc = tls_encrypt_async_wait(ctx);
+               rc = rc ?: -EINPROGRESS;
+       }
        if (!rc || rc != -EINPROGRESS) {
                atomic_dec(&ctx->encrypt_pending);
                sge->offset -= prot->prepend_size;
@@ -984,7 +1024,6 @@ static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg,
        int num_zc = 0;
        int orig_size;
        int ret = 0;
-       int pending;
 
        if (!eor && (msg->msg_flags & MSG_EOR))
                return -EINVAL;
@@ -1163,24 +1202,12 @@ trim_sgl:
        if (!num_async) {
                goto send_end;
        } else if (num_zc) {
-               /* Wait for pending encryptions to get completed */
-               spin_lock_bh(&ctx->encrypt_compl_lock);
-               ctx->async_notify = true;
-
-               pending = atomic_read(&ctx->encrypt_pending);
-               spin_unlock_bh(&ctx->encrypt_compl_lock);
-               if (pending)
-                       crypto_wait_req(-EINPROGRESS, &ctx->async_wait);
-               else
-                       reinit_completion(&ctx->async_wait.completion);
-
-               /* There can be no concurrent accesses, since we have no
-                * pending encrypt operations
-                */
-               WRITE_ONCE(ctx->async_notify, false);
+               int err;
 
-               if (ctx->async_wait.err) {
-                       ret = ctx->async_wait.err;
+               /* Wait for pending encryptions to get completed */
+               err = tls_encrypt_async_wait(ctx);
+               if (err) {
+                       ret = err;
                        copied = 0;
                }
        }
@@ -1229,7 +1256,6 @@ void tls_sw_splice_eof(struct socket *sock)
        ssize_t copied = 0;
        bool retrying = false;
        int ret = 0;
-       int pending;
 
        if (!ctx->open_rec)
                return;
@@ -1264,22 +1290,7 @@ retry:
        }
 
        /* Wait for pending encryptions to get completed */
-       spin_lock_bh(&ctx->encrypt_compl_lock);
-       ctx->async_notify = true;
-
-       pending = atomic_read(&ctx->encrypt_pending);
-       spin_unlock_bh(&ctx->encrypt_compl_lock);
-       if (pending)
-               crypto_wait_req(-EINPROGRESS, &ctx->async_wait);
-       else
-               reinit_completion(&ctx->async_wait.completion);
-
-       /* There can be no concurrent accesses, since we have no pending
-        * encrypt operations
-        */
-       WRITE_ONCE(ctx->async_notify, false);
-
-       if (ctx->async_wait.err)
+       if (tls_encrypt_async_wait(ctx))
                goto unlock;
 
        /* Transmit if any encryptions have completed */
@@ -1581,12 +1592,16 @@ static int tls_decrypt_sg(struct sock *sk, struct iov_iter *out_iov,
        } else if (out_sg) {
                memcpy(sgout, out_sg, n_sgout * sizeof(*sgout));
        }
+       dctx->free_sgout = !!pages;
 
        /* Prepare and submit AEAD request */
        err = tls_do_decryption(sk, sgin, sgout, dctx->iv,
                                data_len + prot->tail_size, aead_req, darg);
-       if (err)
+       if (err) {
+               if (darg->async_done)
+                       goto exit_free_skb;
                goto exit_free_pages;
+       }
 
        darg->skb = clear_skb ?: tls_strp_msg(ctx);
        clear_skb = NULL;
@@ -1598,6 +1613,9 @@ static int tls_decrypt_sg(struct sock *sk, struct iov_iter *out_iov,
                return err;
        }
 
+       if (unlikely(darg->async_done))
+               return 0;
+
        if (prot->tail_size)
                darg->tail = dctx->tail;
 
@@ -1769,7 +1787,8 @@ static int process_rx_list(struct tls_sw_context_rx *ctx,
                           u8 *control,
                           size_t skip,
                           size_t len,
-                          bool is_peek)
+                          bool is_peek,
+                          bool *more)
 {
        struct sk_buff *skb = skb_peek(&ctx->rx_list);
        struct tls_msg *tlm;
@@ -1782,7 +1801,7 @@ static int process_rx_list(struct tls_sw_context_rx *ctx,
 
                err = tls_record_content_type(msg, tlm, control);
                if (err <= 0)
-                       goto out;
+                       goto more;
 
                if (skip < rxm->full_len)
                        break;
@@ -1800,12 +1819,12 @@ static int process_rx_list(struct tls_sw_context_rx *ctx,
 
                err = tls_record_content_type(msg, tlm, control);
                if (err <= 0)
-                       goto out;
+                       goto more;
 
                err = skb_copy_datagram_msg(skb, rxm->offset + skip,
                                            msg, chunk);
                if (err < 0)
-                       goto out;
+                       goto more;
 
                len = len - chunk;
                copied = copied + chunk;
@@ -1841,6 +1860,10 @@ static int process_rx_list(struct tls_sw_context_rx *ctx,
 
 out:
        return copied ? : err;
+more:
+       if (more)
+               *more = true;
+       goto out;
 }
 
 static bool
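
process_rx_list() now reports "records remain but could not be delivered"
through an optional out-parameter instead of through the copied/err return
alone, so tls_sw_recvmsg() can stop early rather than treat a half-drained
rx_list as success. The shape of the pattern (function names hypothetical):

    /* Sketch: set the flag only on paths where queued data remains. */
    static int drain_list(struct msghdr *msg, bool *more)
    {
            int err = copy_one_record(msg);         /* hypothetical step */

            if (err <= 0 && more)
                    *more = true;   /* caller must not trust "copied" alone */
            return err;
    }
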
@@ -1940,10 +1963,12 @@ int tls_sw_recvmsg(struct sock *sk,
        struct strp_msg *rxm;
        struct tls_msg *tlm;
        ssize_t copied = 0;
+       ssize_t peeked = 0;
        bool async = false;
        int target, err;
        bool is_kvec = iov_iter_is_kvec(&msg->msg_iter);
        bool is_peek = flags & MSG_PEEK;
+       bool rx_more = false;
        bool released = true;
        bool bpf_strp_enabled;
        bool zc_capable;
@@ -1963,12 +1988,12 @@ int tls_sw_recvmsg(struct sock *sk,
                goto end;
 
        /* Process pending decrypted records. It must be non-zero-copy */
-       err = process_rx_list(ctx, msg, &control, 0, len, is_peek);
+       err = process_rx_list(ctx, msg, &control, 0, len, is_peek, &rx_more);
        if (err < 0)
                goto end;
 
        copied = err;
-       if (len <= copied)
+       if (len <= copied || (copied && control != TLS_RECORD_TYPE_DATA) || rx_more)
                goto end;
 
        target = sock_rcvlowat(sk, flags & MSG_WAITALL, len);
@@ -2061,6 +2086,8 @@ put_on_rx_list:
                                decrypted += chunk;
                                len -= chunk;
                                __skb_queue_tail(&ctx->rx_list, skb);
+                               if (unlikely(control != TLS_RECORD_TYPE_DATA))
+                                       break;
                                continue;
                        }
 
@@ -2084,8 +2111,10 @@ put_on_rx_list:
                        if (err < 0)
                                goto put_on_rx_list_err;
 
-                       if (is_peek)
+                       if (is_peek) {
+                               peeked += chunk;
                                goto put_on_rx_list;
+                       }
 
                        if (partially_consumed) {
                                rxm->offset += chunk;
@@ -2109,16 +2138,10 @@ put_on_rx_list:
 
 recv_end:
        if (async) {
-               int ret, pending;
+               int ret;
 
                /* Wait for all previously submitted records to be decrypted */
-               spin_lock_bh(&ctx->decrypt_compl_lock);
-               reinit_completion(&ctx->async_wait.completion);
-               pending = atomic_read(&ctx->decrypt_pending);
-               spin_unlock_bh(&ctx->decrypt_compl_lock);
-               ret = 0;
-               if (pending)
-                       ret = crypto_wait_req(-EINPROGRESS, &ctx->async_wait);
+               ret = tls_decrypt_async_wait(ctx);
                __skb_queue_purge(&ctx->async_hold);
 
                if (ret) {
@@ -2130,12 +2153,11 @@ recv_end:
 
                /* Drain records from the rx_list & copy if required */
                if (is_peek || is_kvec)
-                       err = process_rx_list(ctx, msg, &control, copied,
-                                             decrypted, is_peek);
+                       err = process_rx_list(ctx, msg, &control, copied + peeked,
+                                             decrypted - peeked, is_peek, NULL);
                else
                        err = process_rx_list(ctx, msg, &control, 0,
-                                             async_copy_bytes, is_peek);
-               decrypted += max(err, 0);
+                                             async_copy_bytes, is_peek, NULL);
        }
 
        copied += decrypted;
@@ -2435,16 +2457,9 @@ void tls_sw_release_resources_tx(struct sock *sk)
        struct tls_context *tls_ctx = tls_get_ctx(sk);
        struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
        struct tls_rec *rec, *tmp;
-       int pending;
 
        /* Wait for any pending async encryptions to complete */
-       spin_lock_bh(&ctx->encrypt_compl_lock);
-       ctx->async_notify = true;
-       pending = atomic_read(&ctx->encrypt_pending);
-       spin_unlock_bh(&ctx->encrypt_compl_lock);
-
-       if (pending)
-               crypto_wait_req(-EINPROGRESS, &ctx->async_wait);
+       tls_encrypt_async_wait(ctx);
 
        tls_tx_records(sk, -1);
 
@@ -2607,7 +2622,7 @@ static struct tls_sw_context_tx *init_ctx_tx(struct tls_context *ctx, struct soc
        }
 
        crypto_init_wait(&sw_ctx_tx->async_wait);
-       spin_lock_init(&sw_ctx_tx->encrypt_compl_lock);
+       atomic_set(&sw_ctx_tx->encrypt_pending, 1);
        INIT_LIST_HEAD(&sw_ctx_tx->tx_list);
        INIT_DELAYED_WORK(&sw_ctx_tx->tx_work.work, tx_work_handler);
        sw_ctx_tx->tx_work.sk = sk;
@@ -2628,7 +2643,7 @@ static struct tls_sw_context_rx *init_ctx_rx(struct tls_context *ctx)
        }
 
        crypto_init_wait(&sw_ctx_rx->async_wait);
-       spin_lock_init(&sw_ctx_rx->decrypt_compl_lock);
+       atomic_set(&sw_ctx_rx->decrypt_pending, 1);
        init_waitqueue_head(&sw_ctx_rx->wq);
        skb_queue_head_init(&sw_ctx_rx->rx_list);
        skb_queue_head_init(&sw_ctx_rx->async_hold);
index ac1f2bc18fc9685652c26ac3b68f19bfd82f8332..0748e7ea5210e7d597acf87fc6caf1ea2156562e 100644 (file)
@@ -782,19 +782,6 @@ static int unix_seqpacket_sendmsg(struct socket *, struct msghdr *, size_t);
 static int unix_seqpacket_recvmsg(struct socket *, struct msghdr *, size_t,
                                  int);
 
-static int unix_set_peek_off(struct sock *sk, int val)
-{
-       struct unix_sock *u = unix_sk(sk);
-
-       if (mutex_lock_interruptible(&u->iolock))
-               return -EINTR;
-
-       WRITE_ONCE(sk->sk_peek_off, val);
-       mutex_unlock(&u->iolock);
-
-       return 0;
-}
-
 #ifdef CONFIG_PROC_FS
 static int unix_count_nr_fds(struct sock *sk)
 {
@@ -862,7 +849,7 @@ static const struct proto_ops unix_stream_ops = {
        .read_skb =     unix_stream_read_skb,
        .mmap =         sock_no_mmap,
        .splice_read =  unix_stream_splice_read,
-       .set_peek_off = unix_set_peek_off,
+       .set_peek_off = sk_set_peek_off,
        .show_fdinfo =  unix_show_fdinfo,
 };
 
@@ -886,7 +873,7 @@ static const struct proto_ops unix_dgram_ops = {
        .read_skb =     unix_read_skb,
        .recvmsg =      unix_dgram_recvmsg,
        .mmap =         sock_no_mmap,
-       .set_peek_off = unix_set_peek_off,
+       .set_peek_off = sk_set_peek_off,
        .show_fdinfo =  unix_show_fdinfo,
 };
 
@@ -909,7 +896,7 @@ static const struct proto_ops unix_seqpacket_ops = {
        .sendmsg =      unix_seqpacket_sendmsg,
        .recvmsg =      unix_seqpacket_recvmsg,
        .mmap =         sock_no_mmap,
-       .set_peek_off = unix_set_peek_off,
+       .set_peek_off = sk_set_peek_off,
        .show_fdinfo =  unix_show_fdinfo,
 };
 
@@ -1344,13 +1331,11 @@ static void unix_state_double_lock(struct sock *sk1, struct sock *sk2)
                unix_state_lock(sk1);
                return;
        }
-       if (sk1 < sk2) {
-               unix_state_lock(sk1);
-               unix_state_lock_nested(sk2);
-       } else {
-               unix_state_lock(sk2);
-               unix_state_lock_nested(sk1);
-       }
+       if (sk1 > sk2)
+               swap(sk1, sk2);
+
+       unix_state_lock(sk1);
+       unix_state_lock_nested(sk2, U_LOCK_SECOND);
 }
 
 static void unix_state_double_unlock(struct sock *sk1, struct sock *sk2)
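
Sorting the two sockets by address before locking is the standard way to give
two locks of the same class a stable global order, so concurrent A/B and B/A
callers cannot deadlock; the U_LOCK_SECOND subclass then always annotates the
higher-addressed socket for lockdep. The generic form of the idiom (a sketch):

    static void double_lock(spinlock_t *a, spinlock_t *b)
    {
            if (a > b)
                    swap(a, b);     /* impose a global order by address */
            spin_lock(a);
            spin_lock_nested(b, SINGLE_DEPTH_NESTING);
    }
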
@@ -1591,7 +1576,7 @@ restart:
                goto out_unlock;
        }
 
-       unix_state_lock_nested(sk);
+       unix_state_lock_nested(sk, U_LOCK_SECOND);
 
        if (sk->sk_state != st) {
                unix_state_unlock(sk);
index bec09a3a1d44ce56d43e16583fdf3b417cce4033..be19827eca36dbb68ec97b2e9b3c80e22b4fa4be 100644 (file)
@@ -84,7 +84,7 @@ static int sk_diag_dump_icons(struct sock *sk, struct sk_buff *nlskb)
                         * queue lock. With the other's queue locked it's
                         * OK to lock the state.
                         */
-                       unix_state_lock_nested(req);
+                       unix_state_lock_nested(req, U_LOCK_DIAG);
                        peer = unix_sk(req)->peer;
                        buf[i++] = (peer ? sock_i_ino(peer) : 0);
                        unix_state_unlock(req);
index 2405f0f9af31c0ccefe2aa404002cfab8583c090..2a81880dac7b7b464b5ae9443fa3b2863cd76471 100644 (file)
@@ -284,9 +284,17 @@ void unix_gc(void)
         * which are creating the cycle(s).
         */
        skb_queue_head_init(&hitlist);
-       list_for_each_entry(u, &gc_candidates, link)
+       list_for_each_entry(u, &gc_candidates, link) {
                scan_children(&u->sk, inc_inflight, &hitlist);
 
+#if IS_ENABLED(CONFIG_AF_UNIX_OOB)
+               if (u->oob_skb) {
+                       kfree_skb(u->oob_skb);
+                       u->oob_skb = NULL;
+               }
+#endif
+       }
+
        /* not_cycle_list contains those sockets which do not make up a
         * cycle.  Restore these to the inflight list.
         */
index a9ac85e09af37ca8f7d1599e7057f98e7d8200be..10345388ad139f5f9b35025b2336c548bd3344be 100644 (file)
@@ -206,7 +206,6 @@ config CFG80211_KUNIT_TEST
        depends on KUNIT
        depends on CFG80211
        default KUNIT_ALL_TESTS
-       depends on !KERNEL_6_2
        help
          Enable this option to test cfg80211 functions with kunit.
 
index 409d74c57ca0d8c8d36c2260897fce39557620ee..3fb1b637352a9d0b469206d890601031ffd4c68f 100644 (file)
@@ -5,7 +5,7 @@
  * Copyright 2006-2010         Johannes Berg <johannes@sipsolutions.net>
  * Copyright 2013-2014  Intel Mobile Communications GmbH
  * Copyright 2015-2017 Intel Deutschland GmbH
- * Copyright (C) 2018-2023 Intel Corporation
+ * Copyright (C) 2018-2024 Intel Corporation
  */
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
@@ -1661,6 +1661,7 @@ void wiphy_delayed_work_queue(struct wiphy *wiphy,
                              unsigned long delay)
 {
        if (!delay) {
+               del_timer(&dwork->timer);
                wiphy_work_queue(wiphy, &dwork->work);
                return;
        }
index 60877b532993219c6607c28d6b4e0fb6ae2506ad..bd54a928bab4120134711f54e677cb1f60c4ba7b 100644 (file)
@@ -4020,6 +4020,7 @@ static int nl80211_dump_interface(struct sk_buff *skb, struct netlink_callback *
                }
                wiphy_unlock(&rdev->wiphy);
 
+               if_start = 0;
                wp_idx++;
        }
  out:
@@ -4196,6 +4197,8 @@ static int nl80211_set_interface(struct sk_buff *skb, struct genl_info *info)
 
                if (ntype != NL80211_IFTYPE_MESH_POINT)
                        return -EINVAL;
+               if (otype != NL80211_IFTYPE_MESH_POINT)
+                       return -EINVAL;
                if (netif_running(dev))
                        return -EBUSY;
 
index 2249b1a89d1c4cee36bda840d64dda612d367c5f..389a52c29bfc728c2437037b4f0e180b974d12ba 100644 (file)
@@ -1731,6 +1731,61 @@ static void cfg80211_update_hidden_bsses(struct cfg80211_internal_bss *known,
        }
 }
 
+static void cfg80211_check_stuck_ecsa(struct cfg80211_registered_device *rdev,
+                                     struct cfg80211_internal_bss *known,
+                                     const struct cfg80211_bss_ies *old)
+{
+       const struct ieee80211_ext_chansw_ie *ecsa;
+       const struct element *elem_new, *elem_old;
+       const struct cfg80211_bss_ies *new, *bcn;
+
+       if (known->pub.proberesp_ecsa_stuck)
+               return;
+
+       new = rcu_dereference_protected(known->pub.proberesp_ies,
+                                       lockdep_is_held(&rdev->bss_lock));
+       if (WARN_ON(!new))
+               return;
+
+       if (new->tsf - old->tsf < USEC_PER_SEC)
+               return;
+
+       elem_old = cfg80211_find_elem(WLAN_EID_EXT_CHANSWITCH_ANN,
+                                     old->data, old->len);
+       if (!elem_old)
+               return;
+
+       elem_new = cfg80211_find_elem(WLAN_EID_EXT_CHANSWITCH_ANN,
+                                     new->data, new->len);
+       if (!elem_new)
+               return;
+
+       bcn = rcu_dereference_protected(known->pub.beacon_ies,
+                                       lockdep_is_held(&rdev->bss_lock));
+       if (bcn &&
+           cfg80211_find_elem(WLAN_EID_EXT_CHANSWITCH_ANN,
+                              bcn->data, bcn->len))
+               return;
+
+       if (elem_new->datalen != elem_old->datalen)
+               return;
+       if (elem_new->datalen < sizeof(struct ieee80211_ext_chansw_ie))
+               return;
+       if (memcmp(elem_new->data, elem_old->data, elem_new->datalen))
+               return;
+
+       ecsa = (void *)elem_new->data;
+
+       if (!ecsa->mode)
+               return;
+
+       if (ecsa->new_ch_num !=
+           ieee80211_frequency_to_channel(known->pub.channel->center_freq))
+               return;
+
+       known->pub.proberesp_ecsa_stuck = 1;
+}
+
 static bool
 cfg80211_update_known_bss(struct cfg80211_registered_device *rdev,
                          struct cfg80211_internal_bss *known,
@@ -1750,8 +1805,10 @@ cfg80211_update_known_bss(struct cfg80211_registered_device *rdev,
                /* Override possible earlier Beacon frame IEs */
                rcu_assign_pointer(known->pub.ies,
                                   new->pub.proberesp_ies);
-               if (old)
+               if (old) {
+                       cfg80211_check_stuck_ecsa(rdev, known, old);
                        kfree_rcu((struct cfg80211_bss_ies *)old, rcu_head);
+               }
        }
 
        if (rcu_access_pointer(new->pub.beacon_ies)) {
index 9f13aa3353e31f9692ce41db10a977ac2614d7d8..b78c0e095e221fd775b9e1eafa4dd15485915079 100644 (file)
@@ -167,8 +167,10 @@ static int xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
                contd = XDP_PKT_CONTD;
 
        err = __xsk_rcv_zc(xs, xskb, len, contd);
-       if (err || likely(!frags))
-               goto out;
+       if (err)
+               goto err;
+       if (likely(!frags))
+               return 0;
 
        xskb_list = &xskb->pool->xskb_list;
        list_for_each_entry_safe(pos, tmp, xskb_list, xskb_list_node) {
@@ -177,11 +179,13 @@ static int xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
                len = pos->xdp.data_end - pos->xdp.data;
                err = __xsk_rcv_zc(xs, pos, len, contd);
                if (err)
-                       return err;
+                       goto err;
                list_del(&pos->xskb_list_node);
        }
 
-out:
+       return 0;
+err:
+       xsk_buff_free(xdp);
        return err;
 }
 
@@ -718,7 +722,8 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
                        memcpy(vaddr, buffer, len);
                        kunmap_local(vaddr);
 
-                       skb_add_rx_frag(skb, nr_frags, page, 0, len, 0);
+                       skb_add_rx_frag(skb, nr_frags, page, 0, len, PAGE_SIZE);
+                       refcount_add(PAGE_SIZE, &xs->sk.sk_wmem_alloc);
                }
 
                if (first_frag && desc->options & XDP_TX_METADATA) {
index 28711cc44ced216573938f392de3b452f2176410..ce60ecd48a4dc88eed7582bc0701f7c72acc84f5 100644 (file)
@@ -555,6 +555,7 @@ struct xdp_buff *xp_alloc(struct xsk_buff_pool *pool)
 
        xskb->xdp.data = xskb->xdp.data_hard_start + XDP_PACKET_HEADROOM;
        xskb->xdp.data_meta = xskb->xdp.data;
+       xskb->xdp.flags = 0;
 
        if (pool->dma_need_sync) {
                dma_sync_single_range_for_device(pool->dev, xskb->dma, 0,
index 41533c631431493882a7fa427d393c4b6a753e74..e6da7e8495c9cfdc3e81eb408e444a39442a2c9b 100644 (file)
@@ -858,4 +858,5 @@ int xfrm_count_pfkey_enc_supported(void)
 }
 EXPORT_SYMBOL_GPL(xfrm_count_pfkey_enc_supported);
 
+MODULE_DESCRIPTION("XFRM Algorithm interface");
 MODULE_LICENSE("GPL");
index 3784534c918552dc2db6d84b4ba00e1337d63b74..653e51ae39648da177b84c82881932e9987eaa99 100644 (file)
@@ -407,7 +407,7 @@ bool xfrm_dev_offload_ok(struct sk_buff *skb, struct xfrm_state *x)
        struct xfrm_dst *xdst = (struct xfrm_dst *)dst;
        struct net_device *dev = x->xso.dev;
 
-       if (!x->type_offload || x->encap)
+       if (!x->type_offload)
                return false;
 
        if (x->xso.type == XFRM_DEV_OFFLOAD_PACKET ||
index 662c83beb345ed2037d146c976180b1b62c26794..e5722c95b8bb38c528cc518cdc3a05e08a338264 100644 (file)
@@ -704,9 +704,13 @@ int xfrm_output(struct sock *sk, struct sk_buff *skb)
 {
        struct net *net = dev_net(skb_dst(skb)->dev);
        struct xfrm_state *x = skb_dst(skb)->xfrm;
+       int family;
        int err;
 
-       switch (x->outer_mode.family) {
+       family = (x->xso.type != XFRM_DEV_OFFLOAD_PACKET) ? x->outer_mode.family
+               : skb_dst(skb)->ops->family;
+
+       switch (family) {
        case AF_INET:
                memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
                IPCB(skb)->flags |= IPSKB_XFRM_TRANSFORMED;
index 1b7e75159727791ef5ed03299729711ed775a16e..da6ecc6b3e153db74765a500afe3b4a255fdba44 100644 (file)
@@ -2694,7 +2694,9 @@ static struct dst_entry *xfrm_bundle_create(struct xfrm_policy *policy,
                        if (xfrm[i]->props.smark.v || xfrm[i]->props.smark.m)
                                mark = xfrm_smark_get(fl->flowi_mark, xfrm[i]);
 
-                       family = xfrm[i]->props.family;
+                       if (xfrm[i]->xso.type != XFRM_DEV_OFFLOAD_PACKET)
+                               family = xfrm[i]->props.family;
+
                        oif = fl->flowi_oif ? : fl->flowi_l3mdev;
                        dst = xfrm_dst_lookup(xfrm[i], tos, oif,
                                              &saddr, &daddr, family, mark);
@@ -3416,7 +3418,7 @@ decode_session4(const struct xfrm_flow_keys *flkeys, struct flowi *fl, bool reve
        }
 
        fl4->flowi4_proto = flkeys->basic.ip_proto;
-       fl4->flowi4_tos = flkeys->ip.tos;
+       fl4->flowi4_tos = flkeys->ip.tos & ~INET_ECN_MASK;
 }
 
 #if IS_ENABLED(CONFIG_IPV6)
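
Only the six DSCP bits of the ToS byte are meaningful for route and policy
lookup; the low two bits (INET_ECN_MASK, value 3) are per-packet ECN transport
state and must not vary the flow key. For example, assuming a tos of 0xb9
(DSCP 0xb8 plus ECN ECT(1)):

    fl4->flowi4_tos = tos & ~INET_ECN_MASK;     /* 0xb9 & ~3 == 0xb8 */
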
index ad01997c3aa9dd851a3fa4ad6dd6c877eaaddd36..912c1189ba41c1cdca51f0212f7ea1ae293f9370 100644 (file)
@@ -2017,6 +2017,9 @@ static int copy_to_user_tmpl(struct xfrm_policy *xp, struct sk_buff *skb)
        if (xp->xfrm_nr == 0)
                return 0;
 
+       if (xp->xfrm_nr > XFRM_MAX_DEPTH)
+               return -ENOBUFS;
+
        for (i = 0; i < xp->xfrm_nr; i++) {
                struct xfrm_user_tmpl *up = &vec[i];
                struct xfrm_tmpl *kp = &xp->xfrm_vec[i];
@@ -3888,5 +3891,6 @@ static void __exit xfrm_user_exit(void)
 
 module_init(xfrm_user_init);
 module_exit(xfrm_user_exit);
+MODULE_DESCRIPTION("XFRM User interface");
 MODULE_LICENSE("GPL");
 MODULE_ALIAS_NET_PF_PROTO(PF_NETLINK, NETLINK_XFRM);
index 7048bb3594d65be6d132d4103ee801fadf087b7e..634e81d83efd9577337e37f1ce42911574b0adf9 100644 (file)
@@ -4,14 +4,14 @@
 #define __ASM_GOTO_WORKAROUND_H
 
 /*
- * This will bring in asm_volatile_goto and asm_inline macro definitions
+ * This will bring in asm_goto_output and asm_inline macro definitions
  * if enabled by compiler and config options.
  */
 #include <linux/types.h>
 
-#ifdef asm_volatile_goto
-#undef asm_volatile_goto
-#define asm_volatile_goto(x...) asm volatile("invalid use of asm_volatile_goto")
+#ifdef asm_goto_output
+#undef asm_goto_output
+#define asm_goto_output(x...) asm volatile("invalid use of asm_goto_output")
 #endif
 
 /*
diff --git a/samples/cgroup/.gitignore b/samples/cgroup/.gitignore
new file mode 100644 (file)
index 0000000..3a01611
--- /dev/null
@@ -0,0 +1,3 @@
+/cgroup_event_listener
+/memcg_event_listener
+
index 5a84b6443875c47013348bfed85ebefe8c6da4db..3ee8ecfb8c044c3bf65461e81af5a9e95391fa44 100644 (file)
@@ -33,7 +33,7 @@ ld-option = $(success,$(LD) -v $(1))
 
 # $(as-instr,<instr>)
 # Return y if the assembler supports <instr>, n otherwise
-as-instr = $(success,printf "%b\n" "$(1)" | $(CC) $(CLANG_FLAGS) -c -x assembler-with-cpp -o /dev/null -)
+as-instr = $(success,printf "%b\n" "$(1)" | $(CC) $(CLANG_FLAGS) -Wa$(comma)--fatal-warnings -c -x assembler-with-cpp -o /dev/null -)
 
 # check if $(CC) and $(LD) exist
 $(error-if,$(failure,command -v $(CC)),C compiler '$(CC)' not found)
index 8fcb427405a6f17f61655a6d0881c433f22e1dd6..92be0c9a13eeb51beca06abe15bfe22c6e72bfcb 100644 (file)
@@ -38,7 +38,7 @@ as-option = $(call try-run,\
 # Usage: aflags-y += $(call as-instr,instr,option1,option2)
 
 as-instr = $(call try-run,\
-       printf "%b\n" "$(1)" | $(CC) -Werror $(CLANG_FLAGS) $(KBUILD_AFLAGS) -c -x assembler-with-cpp -o "$$TMP" -,$(2),$(3))
+       printf "%b\n" "$(1)" | $(CC) -Werror $(CLANG_FLAGS) $(KBUILD_AFLAGS) -Wa$(comma)--fatal-warnings -c -x assembler-with-cpp -o "$$TMP" -,$(2),$(3))
 
 # __cc-option
 # Usage: MY_CFLAGS += $(call __cc-option,$(CC),$(MY_CFLAGS),-march=winchip-c6,-march=i586)
index ab271b2051a2459cc83d05111ba1be0558cdc954..226ea3df3b4b4caf70a8b7cc1c7ead71def9af7c 100644 (file)
@@ -9,8 +9,8 @@
 # Input config fragments without '.config' suffix
 define merge_into_defconfig
        $(Q)$(CONFIG_SHELL) $(srctree)/scripts/kconfig/merge_config.sh \
-               -m -O $(objtree) $(srctree)/arch/$(ARCH)/configs/$(1) \
-               $(foreach config,$(2),$(srctree)/arch/$(ARCH)/configs/$(config).config)
+               -m -O $(objtree) $(srctree)/arch/$(SRCARCH)/configs/$(1) \
+               $(foreach config,$(2),$(srctree)/arch/$(SRCARCH)/configs/$(config).config)
        +$(Q)$(MAKE) -f $(srctree)/Makefile olddefconfig
 endef
 
@@ -23,7 +23,7 @@ endef
 # Input config fragments without '.config' suffix
 define merge_into_defconfig_override
        $(Q)$(CONFIG_SHELL) $(srctree)/scripts/kconfig/merge_config.sh \
-               -Q -m -O $(objtree) $(srctree)/arch/$(ARCH)/configs/$(1) \
-               $(foreach config,$(2),$(srctree)/arch/$(ARCH)/configs/$(config).config)
+               -Q -m -O $(objtree) $(srctree)/arch/$(SRCARCH)/configs/$(1) \
+               $(foreach config,$(2),$(srctree)/arch/$(SRCARCH)/configs/$(config).config)
        +$(Q)$(MAKE) -f $(srctree)/Makefile olddefconfig
 endef
index 9b7a37ae28a8818a41dada5d1dc12c5c65c791f9..a9e552a1e9105b5efb559a23e4a2943c102b12a2 100644 (file)
@@ -97,7 +97,6 @@ KBUILD_CFLAGS += $(call cc-option, -Wunused-const-variable)
 KBUILD_CFLAGS += $(call cc-option, -Wpacked-not-aligned)
 KBUILD_CFLAGS += $(call cc-option, -Wformat-overflow)
 KBUILD_CFLAGS += $(call cc-option, -Wformat-truncation)
-KBUILD_CFLAGS += $(call cc-option, -Wstringop-overflow)
 KBUILD_CFLAGS += $(call cc-option, -Wstringop-truncation)
 
 KBUILD_CPPFLAGS += -Wundef
@@ -113,7 +112,6 @@ KBUILD_CFLAGS += $(call cc-disable-warning, restrict)
 KBUILD_CFLAGS += $(call cc-disable-warning, packed-not-aligned)
 KBUILD_CFLAGS += $(call cc-disable-warning, format-overflow)
 KBUILD_CFLAGS += $(call cc-disable-warning, format-truncation)
-KBUILD_CFLAGS += $(call cc-disable-warning, stringop-overflow)
 KBUILD_CFLAGS += $(call cc-disable-warning, stringop-truncation)
 
 ifdef CONFIG_CC_IS_CLANG
index 61b7dddedc461e2ece91a7b25bcf14987fc98886..0669bac5e900e134c45a025697bae3b6251c09b1 100755 (executable)
@@ -513,7 +513,7 @@ eBPF programs can have an associated license, passed along with the bytecode
 instructions to the kernel when the programs are loaded. The format for that
 string is identical to the one in use for kernel modules (Dual licenses, such
 as "Dual BSD/GPL", may be used). Some helper functions are only accessible to
-programs that are compatible with the GNU Privacy License (GPL).
+programs that are compatible with the GNU General Public License (GNU GPL).
 
 In order to use such helpers, the eBPF program must be loaded with the correct
 license string passed (via **attr**) to the **bpf**\\ () system call, and this
index 5dea4479240bc02226828f13f3fec2dc3acf36c6..e4fb686dfaa9f0ee49fadb43c6e4404fcd5ac8f3 100755 (executable)
@@ -170,7 +170,7 @@ def process_line(root_directory, command_prefix, file_path):
     # escape the pound sign '#', either as '\#' or '$(pound)' (depending on the
     # kernel version). The compile_commands.json file is not interpreted
     # by Make, so this code replaces the escaped version with '#'.
-    prefix = command_prefix.replace('\#', '#').replace('$(pound)', '#')
+    prefix = command_prefix.replace(r'\#', '#').replace('$(pound)', '#')
 
     # Return the canonical path, eliminating any symbolic links encountered in the path.
     abs_path = os.path.realpath(os.path.join(root_directory, file_path))
index c8047f4441e60ea944ae6fccedf0e4d0d7632dcd..e8316beb17a714588fa5358a8b514a9d383fecbf 100644 (file)
@@ -82,7 +82,7 @@ lx-symbols command."""
         self.module_files_updated = True
 
     def _get_module_file(self, module_name):
-        module_pattern = ".*/{0}\.ko(?:.debug)?$".format(
+        module_pattern = r".*/{0}\.ko(?:.debug)?$".format(
             module_name.replace("_", r"[_\-]"))
         for name in self.module_files:
             if re.match(module_pattern, name) and os.path.exists(name):
index 3e808528aaeab2625424b56247eed97fa107232d..e9e9fb8d86746460c893a51a2989ca4061fc8e52 100644 (file)
@@ -345,6 +345,8 @@ void sym_calc_value(struct symbol *sym)
 
        oldval = sym->curr;
 
+       newval.tri = no;
+
        switch (sym->type) {
        case S_INT:
                newval.val = "0";
@@ -357,7 +359,7 @@ void sym_calc_value(struct symbol *sym)
                break;
        case S_BOOLEAN:
        case S_TRISTATE:
-               newval = symbol_no.curr;
+               newval.val = "n";
                break;
        default:
                sym->curr.val = sym->name;
index a432b171be826a9c4ff362c518f97024ab90a811..7862a81017477daec1f702fdf46166e381bd396d 100755 (executable)
@@ -135,8 +135,13 @@ gen_btf()
        ${OBJCOPY} --only-section=.BTF --set-section-flags .BTF=alloc,readonly \
                --strip-all ${1} ${2} 2>/dev/null
        # Change e_type to ET_REL so that it can be used to link final vmlinux.
-       # Unlike GNU ld, lld does not allow an ET_EXEC input.
-       printf '\1' | dd of=${2} conv=notrunc bs=1 seek=16 status=none
+       # GNU ld 2.35+ and lld do not allow an ET_EXEC input.
+       if is_enabled CONFIG_CPU_BIG_ENDIAN; then
+               et_rel='\0\1'
+       else
+               et_rel='\1\0'
+       fi
+       printf "${et_rel}" | dd of=${2} conv=notrunc bs=1 seek=16 status=none
 }
 
 # Create ${2} .S file with all symbols from the ${1} object file
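
e_type is the 16-bit field at byte offset 16 in both ELF32 and ELF64 headers
(immediately after the 16-byte e_ident), and ET_REL is the value 1, so the
byte pair written must follow the object's own endianness. A user-space C
equivalent of the dd invocation (a sketch):

    #include <elf.h>
    #include <stdio.h>

    /* Patch e_type to ET_REL in the file's own byte order. */
    static int set_et_rel(const char *path, int big_endian)
    {
            unsigned char et_rel[2] = { ET_REL, 0 };    /* '\1\0' */
            FILE *f = fopen(path, "r+b");

            if (!f)
                    return -1;
            if (big_endian) {                           /* '\0\1' */
                    et_rel[0] = 0;
                    et_rel[1] = ET_REL;
            }
            fseek(f, 16, SEEK_SET);     /* offsetof(ElfN_Ehdr, e_type) */
            fwrite(et_rel, 1, sizeof(et_rel), f);
            return fclose(f);
    }
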
index 9ba1c9da0a40f28a4efe22856868ac745ab68b56..57ff5656d566fbc659801bdb4fd445b7b9ef2b86 100755 (executable)
@@ -48,17 +48,8 @@ ${NM} -n ${1} | sed >${2} -e "
 / __kvm_nvhe_\\$/d
 / __kvm_nvhe_\.L/d
 
-# arm64 lld
-/ __AArch64ADRPThunk_/d
-
-# arm lld
-/ __ARMV5PILongThunk_/d
-/ __ARMV7PILongThunk_/d
-/ __ThumbV7PILongThunk_/d
-
-# mips lld
-/ __LA25Thunk_/d
-/ __microLA25Thunk_/d
+# lld arm/aarch64/mips thunks
+/ __[[:alnum:]]*Thunk_/d
 
 # CFI type identifiers
 / __kcfi_typeid_/d
index 795b21154446df9d6cd37920e4b5183e0cbddeef..267b9a0a3abcd849fe4f0bae4cddd8a287d26184 100644 (file)
@@ -70,9 +70,7 @@ void modpost_log(enum loglevel loglevel, const char *fmt, ...)
                break;
        case LOG_ERROR:
                fprintf(stderr, "ERROR: ");
-               break;
-       case LOG_FATAL:
-               fprintf(stderr, "FATAL: ");
+               error_occurred = true;
                break;
        default: /* invalid loglevel, ignore */
                break;
@@ -83,16 +81,8 @@ void modpost_log(enum loglevel loglevel, const char *fmt, ...)
        va_start(arglist, fmt);
        vfprintf(stderr, fmt, arglist);
        va_end(arglist);
-
-       if (loglevel == LOG_FATAL)
-               exit(1);
-       if (loglevel == LOG_ERROR)
-               error_occurred = true;
 }
 
-void __attribute__((alias("modpost_log")))
-modpost_log_noret(enum loglevel loglevel, const char *fmt, ...);
-
 static inline bool strends(const char *str, const char *postfix)
 {
        if (strlen(str) < strlen(postfix))
@@ -806,7 +796,8 @@ static void check_section(const char *modname, struct elf_info *elf,
 
 #define DATA_SECTIONS ".data", ".data.rel"
 #define TEXT_SECTIONS ".text", ".text.*", ".sched.text", \
-               ".kprobes.text", ".cpuidle.text", ".noinstr.text"
+               ".kprobes.text", ".cpuidle.text", ".noinstr.text", \
+               ".ltext", ".ltext.*"
 #define OTHER_TEXT_SECTIONS ".ref.text", ".head.text", ".spinlock.text", \
                ".fixup", ".entry.text", ".exception.text", \
                ".coldtext", ".softirqentry.text"
index 835cababf1b09eb2353f8777f934dfabf3731454..ee43c795063682b440818cd3a81ba5355afba456 100644 (file)
@@ -194,15 +194,11 @@ void *sym_get_data(const struct elf_info *info, const Elf_Sym *sym);
 enum loglevel {
        LOG_WARN,
        LOG_ERROR,
-       LOG_FATAL
 };
 
 void __attribute__((format(printf, 2, 3)))
 modpost_log(enum loglevel loglevel, const char *fmt, ...);
 
-void __attribute__((format(printf, 2, 3), noreturn))
-modpost_log_noret(enum loglevel loglevel, const char *fmt, ...);
-
 /*
  * warn - show the given message, then let modpost continue running, still
  *        allowing modpost to exit successfully. This should be used when
@@ -218,4 +214,4 @@ modpost_log_noret(enum loglevel loglevel, const char *fmt, ...);
  */
 #define warn(fmt, args...)     modpost_log(LOG_WARN, fmt, ##args)
 #define error(fmt, args...)    modpost_log(LOG_ERROR, fmt, ##args)
-#define fatal(fmt, args...)    modpost_log_noret(LOG_FATAL, fmt, ##args)
+#define fatal(fmt, args...)    do { error(fmt, ##args); exit(1); } while (1)
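
Note the deliberate do { ... } while (1) instead of the usual while (0):
exit(1) never returns, and the unreachable back-edge lets the compiler treat
every fatal() call site as noreturn without a function attribute. The idiom in
isolation (a sketch):

    #include <stdio.h>
    #include <stdlib.h>

    /* Statement-like macro the compiler can prove never falls through. */
    #define die(fmt, args...) \
            do { fprintf(stderr, fmt, ##args); exit(1); } while (1)
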
index 31066bfdba04e30abffa2d4f3088760a0a3bc753..dc4878502276ce94bc7d3d8a213934a21751078a 100644 (file)
@@ -326,7 +326,12 @@ static int parse_source_files(const char *objfile, struct md4_ctx *md)
 
        /* Sum all files in the same dir or subdirs. */
        while ((line = get_line(&pos))) {
-               char* p = line;
+               char* p;
+
+               /* trim the leading spaces away */
+               while (isspace(*line))
+                       line++;
+               p = line;
 
                if (strncmp(line, "source_", sizeof("source_")-1) == 0) {
                        p = strrchr(line, ' ');
index 89298983a16941a20ccbd72330af1e168652c3f4..f58726671fb37424308c678126e5edd4a22dc5f3 100644 (file)
@@ -55,12 +55,12 @@ patch -p1 < %{SOURCE2}
 %{make} %{makeflags} KERNELRELEASE=%{KERNELRELEASE} KBUILD_BUILD_VERSION=%{release}
 
 %install
-mkdir -p %{buildroot}/boot
-cp $(%{make} %{makeflags} -s image_name) %{buildroot}/boot/vmlinuz-%{KERNELRELEASE}
+mkdir -p %{buildroot}/lib/modules/%{KERNELRELEASE}
+cp $(%{make} %{makeflags} -s image_name) %{buildroot}/lib/modules/%{KERNELRELEASE}/vmlinuz
 %{make} %{makeflags} INSTALL_MOD_PATH=%{buildroot} modules_install
 %{make} %{makeflags} INSTALL_HDR_PATH=%{buildroot}/usr headers_install
-cp System.map %{buildroot}/boot/System.map-%{KERNELRELEASE}
-cp .config %{buildroot}/boot/config-%{KERNELRELEASE}
+cp System.map %{buildroot}/lib/modules/%{KERNELRELEASE}
+cp .config %{buildroot}/lib/modules/%{KERNELRELEASE}/config
 ln -fns /usr/src/kernels/%{KERNELRELEASE} %{buildroot}/lib/modules/%{KERNELRELEASE}/build
 %if %{with_devel}
 %{make} %{makeflags} run-command KBUILD_RUN_COMMAND='${srctree}/scripts/package/install-extmod-build %{buildroot}/usr/src/kernels/%{KERNELRELEASE}'
@@ -70,13 +70,14 @@ ln -fns /usr/src/kernels/%{KERNELRELEASE} %{buildroot}/lib/modules/%{KERNELRELEA
 rm -rf %{buildroot}
 
 %post
-if [ -x /sbin/installkernel -a -r /boot/vmlinuz-%{KERNELRELEASE} -a -r /boot/System.map-%{KERNELRELEASE} ]; then
-cp /boot/vmlinuz-%{KERNELRELEASE} /boot/.vmlinuz-%{KERNELRELEASE}-rpm
-cp /boot/System.map-%{KERNELRELEASE} /boot/.System.map-%{KERNELRELEASE}-rpm
-rm -f /boot/vmlinuz-%{KERNELRELEASE} /boot/System.map-%{KERNELRELEASE}
-/sbin/installkernel %{KERNELRELEASE} /boot/.vmlinuz-%{KERNELRELEASE}-rpm /boot/.System.map-%{KERNELRELEASE}-rpm
-rm -f /boot/.vmlinuz-%{KERNELRELEASE}-rpm /boot/.System.map-%{KERNELRELEASE}-rpm
+if [ -x /usr/bin/kernel-install ]; then
+       /usr/bin/kernel-install add %{KERNELRELEASE} /lib/modules/%{KERNELRELEASE}/vmlinuz
 fi
+for file in vmlinuz System.map config; do
+       if ! cmp --silent "/lib/modules/%{KERNELRELEASE}/${file}" "/boot/${file}-%{KERNELRELEASE}"; then
+               cp "/lib/modules/%{KERNELRELEASE}/${file}" "/boot/${file}-%{KERNELRELEASE}"
+       fi
+done
 
 %preun
 if [ -x /sbin/new-kernel-pkg ]; then
@@ -94,7 +95,6 @@ fi
 %defattr (-, root, root)
 /lib/modules/%{KERNELRELEASE}
 %exclude /lib/modules/%{KERNELRELEASE}/build
-/boot/*
 
 %files headers
 %defattr (-, root, root)
index 7717354ce0950af9627939b787efa98b4e50621c..9a3dcaafb5b1ee20c4d2d5d355d81302ead8d427 100644 (file)
@@ -469,8 +469,10 @@ static int apparmor_file_open(struct file *file)
         * Cache permissions granted by the previous exec check, with
         * implicit read and executable mmap which are required to
         * actually execute the image.
+        *
+        * Illogically, FMODE_EXEC is in f_flags, not f_mode.
         */
-       if (current->in_execve) {
+       if (file->f_flags & __FMODE_EXEC) {
                fctx->allow = MAY_EXEC | MAY_READ | AA_EXEC_MMAP;
                return 0;
        }
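
This hunk and the tomoyo one below make the same switch: instead of consulting
global task state (current->in_execve), an exec-time open is recognised from
the file itself via __FMODE_EXEC, which lives in f_flags rather than f_mode.
In isolation (a sketch):

    /* Recognise an open issued by execve()/uselib() from the file alone. */
    static bool opened_for_exec(const struct file *file)
    {
            return file->f_flags & __FMODE_EXEC;
    }
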
@@ -782,7 +784,7 @@ static int apparmor_getselfattr(unsigned int attr, struct lsm_ctx __user *lx,
        int error = -ENOENT;
        struct aa_task_ctx *ctx = task_ctx(current);
        struct aa_label *label = NULL;
-       char *value;
+       char *value = NULL;
 
        switch (attr) {
        case LSM_ATTR_CURRENT:
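
Initialising value to NULL (the selinux hunk below makes the same fix for its
val) protects the error paths: if the getter fails before assigning the
pointer, later cleanup may still kfree() or inspect it, and kfree(NULL) is a
defined no-op while an indeterminate pointer is not. The defensive shape
(getter name hypothetical):

    char *value = NULL;
    int len = lsm_getattr(attr, &value);    /* may fail before assignment */

    if (len < 0) {
            kfree(value);                   /* kfree(NULL) is a no-op */
            return len;
    }
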
index df387de29bfa54bf4bf0f7607a6837ec589ebfd5..45c3e5dda355e23f823086816d00e78071b1c15c 100644 (file)
@@ -179,7 +179,8 @@ static int __init integrity_add_key(const unsigned int id, const void *data,
                                   KEY_ALLOC_NOT_IN_QUOTA);
        if (IS_ERR(key)) {
                rc = PTR_ERR(key);
-               pr_err("Problem loading X.509 certificate %d\n", rc);
+               if (id != INTEGRITY_KEYRING_MACHINE)
+                       pr_err("Problem loading X.509 certificate %d\n", rc);
        } else {
                pr_notice("Loaded X.509 cert '%s'\n",
                          key_ref_to_ptr(key)->description);
index 76f55dd13cb801078ba71079bf7e1c58eb2ada3b..8af2136069d239129c2994e5ee0f3e9b696ed7ea 100644 (file)
@@ -237,10 +237,6 @@ static int datablob_parse(char *datablob, const char **format,
                        break;
                }
                *decrypted_data = strsep(&datablob, " \t");
-               if (!*decrypted_data) {
-                       pr_info("encrypted_key: decrypted_data is missing\n");
-                       break;
-               }
                ret = 0;
                break;
        case Opt_load:
index fc520a06f9af107310aa81050b8ad21accc6640d..0171f7eb6ee15d384835a6cd68a2afcff1f3b653 100644 (file)
@@ -737,8 +737,8 @@ static int current_check_refer_path(struct dentry *const old_dentry,
        bool allow_parent1, allow_parent2;
        access_mask_t access_request_parent1, access_request_parent2;
        struct path mnt_dir;
-       layer_mask_t layer_masks_parent1[LANDLOCK_NUM_ACCESS_FS],
-               layer_masks_parent2[LANDLOCK_NUM_ACCESS_FS];
+       layer_mask_t layer_masks_parent1[LANDLOCK_NUM_ACCESS_FS] = {},
+                    layer_masks_parent2[LANDLOCK_NUM_ACCESS_FS] = {};
 
        if (!dom)
                return 0;
index 0144a98d3712e66733462a37a47518beb2eab743..7035ee35a393020303304741092e6b016e02669c 100644 (file)
@@ -29,6 +29,7 @@
 #include <linux/backing-dev.h>
 #include <linux/string.h>
 #include <linux/msg.h>
+#include <linux/overflow.h>
 #include <net/flow.h>
 
 /* How many LSMs were built into the kernel? */
@@ -4015,6 +4016,7 @@ int security_setselfattr(unsigned int attr, struct lsm_ctx __user *uctx,
        struct security_hook_list *hp;
        struct lsm_ctx *lctx;
        int rc = LSM_RET_DEFAULT(setselfattr);
+       u64 required_len;
 
        if (flags)
                return -EINVAL;
@@ -4027,8 +4029,9 @@ int security_setselfattr(unsigned int attr, struct lsm_ctx __user *uctx,
        if (IS_ERR(lctx))
                return PTR_ERR(lctx);
 
-       if (size < lctx->len || size < lctx->ctx_len + sizeof(*lctx) ||
-           lctx->len < lctx->ctx_len + sizeof(*lctx)) {
+       if (size < lctx->len ||
+           check_add_overflow(sizeof(*lctx), lctx->ctx_len, &required_len) ||
+           lctx->len < required_len) {
                rc = -EINVAL;
                goto free_out;
        }
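
The rewritten check computes sizeof(*lctx) + lctx->ctx_len through
check_add_overflow(), so a huge attacker-supplied ctx_len cannot wrap the sum
and slip past the comparison. The same guard using the builtin that the kernel
macro wraps on gcc/clang (a sketch):

    /* Reject unless hdr + ctx_len is computable and both lengths agree. */
    static bool lctx_len_ok(u64 size, u64 hdr, u64 ctx_len, u64 total)
    {
            u64 required;

            if (__builtin_add_overflow(hdr, ctx_len, &required))
                    return false;           /* the sum wrapped */
            return size >= total && total >= required;
    }
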
@@ -4255,7 +4258,19 @@ EXPORT_SYMBOL(security_inode_setsecctx);
  */
 int security_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen)
 {
-       return call_int_hook(inode_getsecctx, -EOPNOTSUPP, inode, ctx, ctxlen);
+       struct security_hook_list *hp;
+       int rc;
+
+       /*
+        * Only one module will provide a security context.
+        */
+       hlist_for_each_entry(hp, &security_hook_heads.inode_getsecctx, list) {
+               rc = hp->hook.inode_getsecctx(inode, ctx, ctxlen);
+               if (rc != LSM_RET_DEFAULT(inode_getsecctx))
+                       return rc;
+       }
+
+       return LSM_RET_DEFAULT(inode_getsecctx);
 }
 EXPORT_SYMBOL(security_inode_getsecctx);
 
@@ -4612,8 +4627,20 @@ EXPORT_SYMBOL(security_sock_rcv_skb);
 int security_socket_getpeersec_stream(struct socket *sock, sockptr_t optval,
                                      sockptr_t optlen, unsigned int len)
 {
-       return call_int_hook(socket_getpeersec_stream, -ENOPROTOOPT, sock,
-                            optval, optlen, len);
+       struct security_hook_list *hp;
+       int rc;
+
+       /*
+        * Only one module will provide a security context.
+        */
+       hlist_for_each_entry(hp, &security_hook_heads.socket_getpeersec_stream,
+                            list) {
+               rc = hp->hook.socket_getpeersec_stream(sock, optval, optlen,
+                                                      len);
+               if (rc != LSM_RET_DEFAULT(socket_getpeersec_stream))
+                       return rc;
+       }
+       return LSM_RET_DEFAULT(socket_getpeersec_stream);
 }
 
 /**
@@ -4633,8 +4660,19 @@ int security_socket_getpeersec_stream(struct socket *sock, sockptr_t optval,
 int security_socket_getpeersec_dgram(struct socket *sock,
                                     struct sk_buff *skb, u32 *secid)
 {
-       return call_int_hook(socket_getpeersec_dgram, -ENOPROTOOPT, sock,
-                            skb, secid);
+       struct security_hook_list *hp;
+       int rc;
+
+       /*
+        * Only one module will provide a security context.
+        */
+       hlist_for_each_entry(hp, &security_hook_heads.socket_getpeersec_dgram,
+                            list) {
+               rc = hp->hook.socket_getpeersec_dgram(sock, skb, secid);
+               if (rc != LSM_RET_DEFAULT(socket_getpeersec_dgram))
+                       return rc;
+       }
+       return LSM_RET_DEFAULT(socket_getpeersec_dgram);
 }
 EXPORT_SYMBOL(security_socket_getpeersec_dgram);
 
index a6bf90ace84c74bdb11330d7bb278183dfb13275..338b023a8c3edb5918d59a2bc07e23221507ff45 100644 (file)
@@ -6559,7 +6559,7 @@ static int selinux_getselfattr(unsigned int attr, struct lsm_ctx __user *ctx,
                               size_t *size, u32 flags)
 {
        int rc;
-       char *val;
+       char *val = NULL;
        int val_len;
 
        val_len = selinux_lsm_getattr(attr, current, &val);
index 57ee70ae50f24ac771a7bd74d224c17b1ff03d25..ea3140d510ecbfee06666df588a795b9f5bfc5ce 100644 (file)
@@ -2649,13 +2649,14 @@ ssize_t tomoyo_write_control(struct tomoyo_io_buffer *head,
 {
        int error = buffer_len;
        size_t avail_len = buffer_len;
-       char *cp0 = head->write_buf;
+       char *cp0;
        int idx;
 
        if (!head->write)
                return -EINVAL;
        if (mutex_lock_interruptible(&head->io_sem))
                return -EINTR;
+       cp0 = head->write_buf;
        head->read_user_buf_avail = 0;
        idx = tomoyo_read_lock();
        /* Read a line and dispatch it to the policy handler. */
index 3c3af149bf1c12a94c318d188984ab4bda4a2edc..04a92c3d65d44de5502dd5955146e58cba4f4978 100644 (file)
@@ -328,7 +328,8 @@ static int tomoyo_file_fcntl(struct file *file, unsigned int cmd,
 static int tomoyo_file_open(struct file *f)
 {
        /* Don't check read permission here if called from execve(). */
-       if (current->in_execve)
+       /* Illogically, FMODE_EXEC is in f_flags, not f_mode. */
+       if (f->f_flags & __FMODE_EXEC)
                return 0;
        return tomoyo_check_open_permission(tomoyo_domain(), &f->f_path,
                                            f->f_flags);
index a6b444ee283264ca60e8d5673c4378f6175f128c..f6526b33713756071c14d4da2c7e051d1ae17bf9 100644 (file)
@@ -32,7 +32,6 @@ snd-ump-objs      := ump.o
 snd-ump-$(CONFIG_SND_UMP_LEGACY_RAWMIDI) += ump_convert.o
 snd-timer-objs    := timer.o
 snd-hrtimer-objs  := hrtimer.o
-snd-rtctimer-objs := rtctimer.o
 snd-hwdep-objs    := hwdep.o
 snd-seq-device-objs := seq_device.o
 
index a09f0154e6a7029c72fb3f0d6d8bd36f202b72c8..d0788126cbab10a2ef8daaab9201f366f27d8c63 100644 (file)
@@ -211,6 +211,10 @@ static const char * const snd_pcm_format_names[] = {
        FORMAT(DSD_U32_LE),
        FORMAT(DSD_U16_BE),
        FORMAT(DSD_U32_BE),
+       FORMAT(S20_LE),
+       FORMAT(S20_BE),
+       FORMAT(U20_LE),
+       FORMAT(U20_BE),
 };
 
 /**
index f5ff00f99788a80135e2832f0cdb64650e3122d7..21baf6bf7e25a048e64d37331b0109f2ad7ec354 100644 (file)
@@ -486,6 +486,11 @@ static int fixup_unreferenced_params(struct snd_pcm_substream *substream,
                i = hw_param_interval_c(params, SNDRV_PCM_HW_PARAM_SAMPLE_BITS);
                if (snd_interval_single(i))
                        params->msbits = snd_interval_value(i);
+               m = hw_param_mask_c(params, SNDRV_PCM_HW_PARAM_FORMAT);
+               if (snd_mask_single(m)) {
+                       snd_pcm_format_t format = (__force snd_pcm_format_t)snd_mask_min(m);
+                       params->msbits = snd_pcm_format_width(format);
+               }
        }
 
        if (params->msbits) {
index 3bef1944e955ff24940c1cba0d0a82f79db33d6e..fe7911498cc4325a866a87328087c54a0a6c791a 100644 (file)
@@ -985,7 +985,7 @@ static int snd_ump_legacy_open(struct snd_rawmidi_substream *substream)
        struct snd_ump_endpoint *ump = substream->rmidi->private_data;
        int dir = substream->stream;
        int group = ump->legacy_mapping[substream->number];
-       int err;
+       int err = 0;
 
        mutex_lock(&ump->open_mutex);
        if (ump->legacy_substreams[dir][group]) {
@@ -1009,7 +1009,7 @@ static int snd_ump_legacy_open(struct snd_rawmidi_substream *substream)
        spin_unlock_irq(&ump->legacy_locks[dir]);
  unlock:
        mutex_unlock(&ump->open_mutex);
-       return 0;
+       return err;
 }
 
 static int snd_ump_legacy_close(struct snd_rawmidi_substream *substream)
index a13c0b408aadfcc6d2f1f588cdd96a733961bf2a..7be17bca257f0ddba4799875c9250f96ba4e4b8a 100644 (file)
@@ -951,7 +951,7 @@ static int generate_tx_packet_descs(struct amdtp_stream *s, struct pkt_desc *des
                                // to the reason.
                                unsigned int safe_cycle = increment_ohci_cycle_count(next_cycle,
                                                                IR_JUMBO_PAYLOAD_MAX_SKIP_CYCLES);
-                               lost = (compare_ohci_cycle_count(safe_cycle, cycle) > 0);
+                               lost = (compare_ohci_cycle_count(safe_cycle, cycle) < 0);
                        }
                        if (lost) {
                                dev_err(&s->unit->device, "Detect discontinuity of cycle: %d %d\n",
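The fix inverts a wrapped-counter comparison: a packet is considered lost only when the end of the jumbo-payload skip window (safe_cycle) is still earlier than the cycle being waited for. As a hedged sketch (not the driver's actual helper; the modulus is assumed for illustration), a wrap-safe ordering on cycle counters can be written as:

#define CYCLE_MODULUS	(128u * 8000u)	/* assumed: 128 s x 8000 cycles/s */

/* <0 if lhs is earlier than rhs, 0 if equal, >0 if later, treating the
 * counters as points on a circle of CYCLE_MODULUS positions.
 */
static int compare_cycles(unsigned int lhs, unsigned int rhs)
{
	/* Both inputs are assumed to already be in [0, CYCLE_MODULUS). */
	unsigned int fwd = (rhs >= lhs) ? (rhs - lhs)
					: (rhs + CYCLE_MODULUS - lhs);

	if (fwd == 0)
		return 0;
	return (fwd < CYCLE_MODULUS / 2) ? -1 : 1;
}

With such an ordering, compare(safe_cycle, cycle) < 0 reads "the skip window ends before the expected cycle", which is the discontinuity case.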
index 21a90b3c4cc7300e0abfc301e34338170dd6b911..8e0ff70fb6101ff9b94aab0d79c4b6f7cf90498d 100644 (file)
@@ -156,7 +156,7 @@ config SND_HDA_SCODEC_CS35L56_I2C
        depends on I2C
        depends on ACPI || COMPILE_TEST
        depends on SND_SOC
-       select CS_DSP
+       select FW_CS_DSP
        select SND_HDA_GENERIC
        select SND_SOC_CS35L56_SHARED
        select SND_HDA_SCODEC_CS35L56
@@ -171,7 +171,7 @@ config SND_HDA_SCODEC_CS35L56_SPI
        depends on SPI_MASTER
        depends on ACPI || COMPILE_TEST
        depends on SND_SOC
-       select CS_DSP
+       select FW_CS_DSP
        select SND_HDA_GENERIC
        select SND_SOC_CS35L56_SHARED
        select SND_HDA_SCODEC_CS35L56
index 35277ce890a46fb9204cc80a55e812cb64c842d5..e436d4dab317f05c20a8a806f14e07580bfcb2de 100644 (file)
@@ -76,11 +76,14 @@ static const struct cs35l41_config cs35l41_config_table[] = {
        { "10431533", 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 1, -1, 1000, 4500, 24 },
        { "10431573", 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, 2, 0, 1000, 4500, 24 },
        { "10431663", 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, -1, 0, 1000, 4500, 24 },
+       { "10431683", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 1, -1, 0, 0, 0 },
+       { "104316A3", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, 2, 0, 0, 0, 0 },
        { "104316D3", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, 2, 0, 0, 0, 0 },
        { "104316F3", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, 2, 0, 0, 0, 0 },
        { "104317F3", 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 1, -1, 1000, 4500, 24 },
        { "10431863", 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, 2, 0, 1000, 4500, 24 },
        { "104318D3", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 1, -1, 0, 0, 0 },
+       { "10431A83", 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 1, -1, 1000, 4500, 24 },
        { "10431C9F", 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, 2, 0, 1000, 4500, 24 },
        { "10431CAF", 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, 2, 0, 1000, 4500, 24 },
        { "10431CCF", 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, 2, 0, 1000, 4500, 24 },
@@ -89,10 +92,14 @@ static const struct cs35l41_config cs35l41_config_table[] = {
        { "10431D1F", 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 1, -1, 1000, 4500, 24 },
        { "10431DA2", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, 2, 0, 0, 0, 0 },
        { "10431E02", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, 2, 0, 0, 0, 0 },
+       { "10431E12", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 1, -1, 0, 0, 0 },
        { "10431EE2", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, -1, -1, 0, 0, 0 },
        { "10431F12", 2, INTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 1, -1, 1000, 4500, 24 },
        { "10431F1F", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, -1, 0, 0, 0, 0 },
        { "10431F62", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 1, 2, 0, 0, 0, 0 },
+       { "17AA386F", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, -1, -1, 0, 0, 0 },
+       { "17AA38A9", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 2, -1, 0, 0, 0 },
+       { "17AA38AB", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 2, -1, 0, 0, 0 },
        { "17AA38B4", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 1, -1, 0, 0, 0 },
        { "17AA38B5", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 1, -1, 0, 0, 0 },
        { "17AA38B6", 2, EXTERNAL, { CS35L41_LEFT, CS35L41_RIGHT, 0, 0 }, 0, 1, -1, 0, 0, 0 },
@@ -410,11 +417,14 @@ static const struct cs35l41_prop_model cs35l41_prop_model_table[] = {
        { "CSC3551", "10431533", generic_dsd_config },
        { "CSC3551", "10431573", generic_dsd_config },
        { "CSC3551", "10431663", generic_dsd_config },
+       { "CSC3551", "10431683", generic_dsd_config },
+       { "CSC3551", "104316A3", generic_dsd_config },
        { "CSC3551", "104316D3", generic_dsd_config },
        { "CSC3551", "104316F3", generic_dsd_config },
        { "CSC3551", "104317F3", generic_dsd_config },
        { "CSC3551", "10431863", generic_dsd_config },
        { "CSC3551", "104318D3", generic_dsd_config },
+       { "CSC3551", "10431A83", generic_dsd_config },
        { "CSC3551", "10431C9F", generic_dsd_config },
        { "CSC3551", "10431CAF", generic_dsd_config },
        { "CSC3551", "10431CCF", generic_dsd_config },
@@ -423,10 +433,14 @@ static const struct cs35l41_prop_model cs35l41_prop_model_table[] = {
        { "CSC3551", "10431D1F", generic_dsd_config },
        { "CSC3551", "10431DA2", generic_dsd_config },
        { "CSC3551", "10431E02", generic_dsd_config },
+       { "CSC3551", "10431E12", generic_dsd_config },
        { "CSC3551", "10431EE2", generic_dsd_config },
        { "CSC3551", "10431F12", generic_dsd_config },
        { "CSC3551", "10431F1F", generic_dsd_config },
        { "CSC3551", "10431F62", generic_dsd_config },
+       { "CSC3551", "17AA386F", generic_dsd_config },
+       { "CSC3551", "17AA38A9", generic_dsd_config },
+       { "CSC3551", "17AA38AB", generic_dsd_config },
        { "CSC3551", "17AA38B4", generic_dsd_config },
        { "CSC3551", "17AA38B5", generic_dsd_config },
        { "CSC3551", "17AA38B6", generic_dsd_config },
index b61e1de8c4bf905a6bddd8bc859da2dc9b8cd408..75a14ba54fcd1c270459b47be66ecb6aa799aa33 100644 (file)
   *  ASP1_RX_WL = 24 bits per sample
   *  ASP1_TX_WL = 24 bits per sample
   *  ASP1_RXn_EN 1..3 and ASP1_TXn_EN 1..4 disabled
+  *
+  * Override any Windows-specific mixer settings applied by the firmware.
   */
 static const struct reg_sequence cs35l56_hda_dai_config[] = {
        { CS35L56_ASP1_CONTROL1,        0x00000021 },
        { CS35L56_ASP1_CONTROL2,        0x20200200 },
        { CS35L56_ASP1_CONTROL3,        0x00000003 },
+       { CS35L56_ASP1_FRAME_CONTROL1,  0x03020100 },
+       { CS35L56_ASP1_FRAME_CONTROL5,  0x00020100 },
        { CS35L56_ASP1_DATA_CONTROL5,   0x00000018 },
        { CS35L56_ASP1_DATA_CONTROL1,   0x00000018 },
        { CS35L56_ASP1_ENABLES1,        0x00000000 },
+       { CS35L56_ASP1TX1_INPUT,        0x00000018 },
+       { CS35L56_ASP1TX2_INPUT,        0x00000019 },
+       { CS35L56_ASP1TX3_INPUT,        0x00000020 },
+       { CS35L56_ASP1TX4_INPUT,        0x00000028 },
+
 };
 
 static void cs35l56_hda_play(struct cs35l56_hda *cs35l56)
@@ -133,6 +142,10 @@ static int cs35l56_hda_runtime_resume(struct device *dev)
                }
        }
 
+       ret = cs35l56_force_sync_asp1_registers_from_cache(&cs35l56->base);
+       if (ret)
+               goto err;
+
        return 0;
 
 err:
@@ -384,7 +397,7 @@ static const struct cs_dsp_client_ops cs35l56_hda_client_ops = {
 
 static int cs35l56_hda_request_firmware_file(struct cs35l56_hda *cs35l56,
                                             const struct firmware **firmware, char **filename,
-                                            const char *dir, const char *system_name,
+                                            const char *base_name, const char *system_name,
                                             const char *amp_name,
                                             const char *filetype)
 {
@@ -392,17 +405,13 @@ static int cs35l56_hda_request_firmware_file(struct cs35l56_hda *cs35l56,
        int ret = 0;
 
        if (system_name && amp_name)
-               *filename = kasprintf(GFP_KERNEL, "%scs35l56%s-%02x-dsp1-misc-%s-%s.%s", dir,
-                                     cs35l56->base.secured ? "s" : "", cs35l56->base.rev,
+               *filename = kasprintf(GFP_KERNEL, "%s-%s-%s.%s", base_name,
                                      system_name, amp_name, filetype);
        else if (system_name)
-               *filename = kasprintf(GFP_KERNEL, "%scs35l56%s-%02x-dsp1-misc-%s.%s", dir,
-                                     cs35l56->base.secured ? "s" : "", cs35l56->base.rev,
+               *filename = kasprintf(GFP_KERNEL, "%s-%s.%s", base_name,
                                      system_name, filetype);
        else
-               *filename = kasprintf(GFP_KERNEL, "%scs35l56%s-%02x-dsp1-misc.%s", dir,
-                                     cs35l56->base.secured ? "s" : "", cs35l56->base.rev,
-                                     filetype);
+               *filename = kasprintf(GFP_KERNEL, "%s.%s", base_name, filetype);
 
        if (!*filename)
                return -ENOMEM;
@@ -435,8 +444,8 @@ static int cs35l56_hda_request_firmware_file(struct cs35l56_hda *cs35l56,
        return 0;
 }
 
-static const char cirrus_dir[] = "cirrus/";
 static void cs35l56_hda_request_firmware_files(struct cs35l56_hda *cs35l56,
+                                              unsigned int preloaded_fw_ver,
                                               const struct firmware **wmfw_firmware,
                                               char **wmfw_filename,
                                               const struct firmware **coeff_firmware,
@@ -444,55 +453,73 @@ static void cs35l56_hda_request_firmware_files(struct cs35l56_hda *cs35l56,
 {
        const char *system_name = cs35l56->system_name;
        const char *amp_name = cs35l56->amp_name;
+       char base_name[37];
        int ret;
 
+       if (preloaded_fw_ver) {
+               snprintf(base_name, sizeof(base_name),
+                        "cirrus/cs35l56-%02x%s-%06x-dsp1-misc",
+                        cs35l56->base.rev,
+                        cs35l56->base.secured ? "-s" : "",
+                        preloaded_fw_ver & 0xffffff);
+       } else {
+               snprintf(base_name, sizeof(base_name),
+                        "cirrus/cs35l56-%02x%s-dsp1-misc",
+                        cs35l56->base.rev,
+                        cs35l56->base.secured ? "-s" : "");
+       }
+
        if (system_name && amp_name) {
                if (!cs35l56_hda_request_firmware_file(cs35l56, wmfw_firmware, wmfw_filename,
-                                                      cirrus_dir, system_name, amp_name, "wmfw")) {
+                                                      base_name, system_name, amp_name, "wmfw")) {
                        cs35l56_hda_request_firmware_file(cs35l56, coeff_firmware, coeff_filename,
-                                                         cirrus_dir, system_name, amp_name, "bin");
+                                                         base_name, system_name, amp_name, "bin");
                        return;
                }
        }
 
        if (system_name) {
                if (!cs35l56_hda_request_firmware_file(cs35l56, wmfw_firmware, wmfw_filename,
-                                                      cirrus_dir, system_name, NULL, "wmfw")) {
+                                                      base_name, system_name, NULL, "wmfw")) {
                        if (amp_name)
                                cs35l56_hda_request_firmware_file(cs35l56,
                                                                  coeff_firmware, coeff_filename,
-                                                                 cirrus_dir, system_name,
+                                                                 base_name, system_name,
                                                                  amp_name, "bin");
                        if (!*coeff_firmware)
                                cs35l56_hda_request_firmware_file(cs35l56,
                                                                  coeff_firmware, coeff_filename,
-                                                                 cirrus_dir, system_name,
+                                                                 base_name, system_name,
                                                                  NULL, "bin");
                        return;
                }
+
+               /*
+                * Check for system-specific bin files without a wmfw before
+                * falling back to generic firmware.
+                */
+               if (amp_name)
+                       cs35l56_hda_request_firmware_file(cs35l56, coeff_firmware, coeff_filename,
+                                                         base_name, system_name, amp_name, "bin");
+               if (!*coeff_firmware)
+                       cs35l56_hda_request_firmware_file(cs35l56, coeff_firmware, coeff_filename,
+                                                         base_name, system_name, NULL, "bin");
+
+               if (*coeff_firmware)
+                       return;
        }
 
        ret = cs35l56_hda_request_firmware_file(cs35l56, wmfw_firmware, wmfw_filename,
-                                               cirrus_dir, NULL, NULL, "wmfw");
+                                               base_name, NULL, NULL, "wmfw");
        if (!ret) {
                cs35l56_hda_request_firmware_file(cs35l56, coeff_firmware, coeff_filename,
-                                                 cirrus_dir, NULL, NULL, "bin");
+                                                 base_name, NULL, NULL, "bin");
                return;
        }
 
-       /* When a firmware file is not found must still search for the coeff files */
-       if (system_name) {
-               if (amp_name)
-                       cs35l56_hda_request_firmware_file(cs35l56, coeff_firmware, coeff_filename,
-                                                         cirrus_dir, system_name, amp_name, "bin");
-               if (!*coeff_firmware)
-                       cs35l56_hda_request_firmware_file(cs35l56, coeff_firmware, coeff_filename,
-                                                         cirrus_dir, system_name, NULL, "bin");
-       }
-
        if (!*coeff_firmware)
                cs35l56_hda_request_firmware_file(cs35l56, coeff_firmware, coeff_filename,
-                                                 cirrus_dir, NULL, NULL, "bin");
+                                                 base_name, NULL, NULL, "bin");
 }
 
 static void cs35l56_hda_release_firmware_files(const struct firmware *wmfw_firmware,
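For illustration, with hypothetical values (rev 0xb0, preloaded firmware version 0x12a34b, system name "104316d3", amp name "amp1"), the new base_name scheme yields candidate filenames such as:

	cirrus/cs35l56-b0-12a34b-dsp1-misc-104316d3-amp1.wmfw	(patched fw, system + amp)
	cirrus/cs35l56-b0-s-12a34b-dsp1-misc-104316d3.bin	(secured part, system only)
	cirrus/cs35l56-b0-dsp1-misc.wmfw			(ROM firmware, generic fallback)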
@@ -526,7 +553,8 @@ static int cs35l56_hda_fw_load(struct cs35l56_hda *cs35l56)
        const struct firmware *wmfw_firmware = NULL;
        char *coeff_filename = NULL;
        char *wmfw_filename = NULL;
-       unsigned int firmware_missing;
+       unsigned int preloaded_fw_ver;
+       bool firmware_missing;
        int ret = 0;
 
        /* Prepare for a new DSP power-up */
@@ -537,24 +565,21 @@ static int cs35l56_hda_fw_load(struct cs35l56_hda *cs35l56)
 
        pm_runtime_get_sync(cs35l56->base.dev);
 
-       ret = regmap_read(cs35l56->base.regmap, CS35L56_PROTECTION_STATUS, &firmware_missing);
-       if (ret) {
-               dev_err(cs35l56->base.dev, "Failed to read PROTECTION_STATUS: %d\n", ret);
+       /*
+        * The firmware can only be upgraded if it is currently running
+        * from the built-in ROM. If not, the wmfw/bin must be for the
+        * version of firmware that is running on the chip.
+        */
+       ret = cs35l56_read_prot_status(&cs35l56->base, &firmware_missing, &preloaded_fw_ver);
+       if (ret)
                goto err_pm_put;
-       }
 
-       firmware_missing &= CS35L56_FIRMWARE_MISSING;
+       if (firmware_missing)
+               preloaded_fw_ver = 0;
 
-       /*
-        * Firmware can only be downloaded if the CS35L56 is secured or is
-        * running from the built-in ROM. If it is secured the BIOS will have
-        * downloaded firmware, and the wmfw/bin files will only contain
-        * tunings that are safe to download with the firmware running.
-        */
-       if (cs35l56->base.secured || firmware_missing) {
-               cs35l56_hda_request_firmware_files(cs35l56, &wmfw_firmware, &wmfw_filename,
-                                                  &coeff_firmware, &coeff_filename);
-       }
+       cs35l56_hda_request_firmware_files(cs35l56, preloaded_fw_ver,
+                                          &wmfw_firmware, &wmfw_filename,
+                                          &coeff_firmware, &coeff_filename);
 
        /*
         * If the BIOS didn't patch the firmware a bin file is mandatory to
@@ -569,12 +594,12 @@ static int cs35l56_hda_fw_load(struct cs35l56_hda *cs35l56)
        mutex_lock(&cs35l56->base.irq_lock);
 
        /*
-        * When the device is running in secure mode the firmware files can
-        * only contain insecure tunings and therefore we do not need to
-        * shutdown the firmware to apply them and can use the lower cost
-        * reinit sequence instead.
+        * If the firmware hasn't been patched, it must be shut down before
+        * doing a full patch and reset afterwards. If it is already
+        * running a patched version, the firmware files only contain
+        * tunings and we can use the lower-cost reinit sequence instead.
         */
-       if (!cs35l56->base.secured && (wmfw_firmware || coeff_firmware)) {
+       if (firmware_missing && (wmfw_firmware || coeff_firmware)) {
                ret = cs35l56_firmware_shutdown(&cs35l56->base);
                if (ret)
                        goto err;
@@ -593,7 +618,7 @@ static int cs35l56_hda_fw_load(struct cs35l56_hda *cs35l56)
        if (coeff_filename)
                dev_dbg(cs35l56->base.dev, "Loaded Coefficients: %s\n", coeff_filename);
 
-       if (cs35l56->base.secured) {
+       if (!firmware_missing) {
                ret = cs35l56_mbox_send(&cs35l56->base, CS35L56_MBOX_CMD_AUDIO_REINIT);
                if (ret)
                        goto err_powered_up;
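A compact restatement of the reworked decision that replaces the old "secured" test (comment form only, not driver code):

/*
 * firmware_missing  -> the chip is running its built-in ROM: a full
 *                      wmfw patch is allowed, bracketed by a firmware
 *                      shutdown and a reset.
 * !firmware_missing -> a patched firmware is already live: the files
 *                      may only carry tunings, so apply them and send
 *                      CS35L56_MBOX_CMD_AUDIO_REINIT.
 */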
@@ -976,6 +1001,9 @@ int cs35l56_hda_common_probe(struct cs35l56_hda *cs35l56, int id)
 
        regmap_multi_reg_write(cs35l56->base.regmap, cs35l56_hda_dai_config,
                               ARRAY_SIZE(cs35l56_hda_dai_config));
+       ret = cs35l56_force_sync_asp1_registers_from_cache(&cs35l56->base);
+       if (ret)
+               goto err;
 
        /*
         * By default only enable one ASP1TXn, where n=amplifier index,
@@ -1035,16 +1063,6 @@ const struct dev_pm_ops cs35l56_hda_pm_ops = {
 };
 EXPORT_SYMBOL_NS_GPL(cs35l56_hda_pm_ops, SND_HDA_SCODEC_CS35L56);
 
-#if IS_ENABLED(CONFIG_SND_HDA_SCODEC_CS35L56_KUNIT_TEST)
-/* Hooks to export static function to KUnit test */
-
-int cs35l56_hda_test_hook_get_speaker_id(struct device *dev, int amp_index, int num_amps)
-{
-       return cs35l56_hda_get_speaker_id(dev, amp_index, num_amps);
-}
-EXPORT_SYMBOL_NS_GPL(cs35l56_hda_test_hook_get_speaker_id, SND_HDA_SCODEC_CS35L56);
-#endif
-
 MODULE_DESCRIPTION("CS35L56 HDA Driver");
 MODULE_IMPORT_NS(SND_HDA_CIRRUS_SCODEC);
 MODULE_IMPORT_NS(SND_HDA_CS_DSP_CONTROLS);
index 3e7bfeee84fdadf136108212e286fd16eb3c819d..efe98f6f19a373b9c343861cf9bdc2039885d7e2 100644 (file)
@@ -1207,6 +1207,9 @@ int azx_probe_codecs(struct azx *chip, unsigned int max_slots)
                                dev_warn(chip->card->dev,
                                         "Codec #%d probe error; disabling it...\n", c);
                                bus->codec_mask &= ~(1 << c);
+                               /* no codecs */
+                               if (bus->codec_mask == 0)
+                                       break;
                                /* Worse, accessing a non-existing
                                 * codec often screws up the controller chip,
                                 * and disturbs further communications.
index 2276adc8447840a232eb493e5bac54af0cb35682..1b550c42db092739135e5917a74914894e254454 100644 (file)
@@ -1729,9 +1729,11 @@ static int default_bdl_pos_adj(struct azx *chip)
        /* some exceptions: Atoms seem problematic with value 1 */
        if (chip->pci->vendor == PCI_VENDOR_ID_INTEL) {
                switch (chip->pci->device) {
-               case 0x0f04: /* Baytrail */
-               case 0x2284: /* Braswell */
+               case PCI_DEVICE_ID_INTEL_HDA_BYT:
+               case PCI_DEVICE_ID_INTEL_HDA_BSW:
                        return 32;
+               case PCI_DEVICE_ID_INTEL_HDA_APL:
+                       return 64;
                }
        }
 
index e8819e8a98763cb5a4760768be387a4a387c384d..e8209178d87bbcc88551e6773c90c9fec7d98af2 100644 (file)
@@ -344,6 +344,7 @@ enum {
        CXT_FIXUP_HP_ZBOOK_MUTE_LED,
        CXT_FIXUP_HEADSET_MIC,
        CXT_FIXUP_HP_MIC_NO_PRESENCE,
+       CXT_PINCFG_SWS_JS201D,
 };
 
 /* for hda_fixup_thinkpad_acpi() */
@@ -841,6 +842,17 @@ static const struct hda_pintbl cxt_pincfg_lemote[] = {
        {}
 };
 
+/* SuoWoSi/South-holding JS201D with sn6140 */
+static const struct hda_pintbl cxt_pincfg_sws_js201d[] = {
+       { 0x16, 0x03211040 }, /* hp out */
+       { 0x17, 0x91170110 }, /* SPK/Class_D */
+       { 0x18, 0x95a70130 }, /* Internal mic */
+       { 0x19, 0x03a11020 }, /* Headset Mic */
+       { 0x1a, 0x40f001f0 }, /* Not used */
+       { 0x21, 0x40f001f0 }, /* Not used */
+       {}
+};
+
 static const struct hda_fixup cxt_fixups[] = {
        [CXT_PINCFG_LENOVO_X200] = {
                .type = HDA_FIXUP_PINS,
@@ -996,6 +1008,10 @@ static const struct hda_fixup cxt_fixups[] = {
                .chained = true,
                .chain_id = CXT_FIXUP_HEADSET_MIC,
        },
+       [CXT_PINCFG_SWS_JS201D] = {
+               .type = HDA_FIXUP_PINS,
+               .v.pins = cxt_pincfg_sws_js201d,
+       },
 };
 
 static const struct snd_pci_quirk cxt5045_fixups[] = {
@@ -1069,6 +1085,7 @@ static const struct snd_pci_quirk cxt5066_fixups[] = {
        SND_PCI_QUIRK(0x103c, 0x8457, "HP Z2 G4 mini", CXT_FIXUP_HP_MIC_NO_PRESENCE),
        SND_PCI_QUIRK(0x103c, 0x8458, "HP Z2 G4 mini premium", CXT_FIXUP_HP_MIC_NO_PRESENCE),
        SND_PCI_QUIRK(0x1043, 0x138d, "Asus", CXT_FIXUP_HEADPHONE_MIC_PIN),
+       SND_PCI_QUIRK(0x14f1, 0x0265, "SWS JS201D", CXT_PINCFG_SWS_JS201D),
        SND_PCI_QUIRK(0x152d, 0x0833, "OLPC XO-1.5", CXT_FIXUP_OLPC_XO),
        SND_PCI_QUIRK(0x17aa, 0x20f2, "Lenovo T400", CXT_PINCFG_LENOVO_TP410),
        SND_PCI_QUIRK(0x17aa, 0x215e, "Lenovo T410", CXT_PINCFG_LENOVO_TP410),
@@ -1109,6 +1126,7 @@ static const struct hda_model_fixup cxt5066_fixup_models[] = {
        { .id = CXT_FIXUP_HP_ZBOOK_MUTE_LED, .name = "hp-zbook-mute-led" },
        { .id = CXT_FIXUP_HP_MIC_NO_PRESENCE, .name = "hp-mic-fix" },
        { .id = CXT_PINCFG_LENOVO_NOTEBOOK, .name = "lenovo-20149" },
+       { .id = CXT_PINCFG_SWS_JS201D, .name = "sws-js201d" },
        {}
 };
 
index 627899959ffe8c34c76211d51824e1d1e93d33a8..e41316e2e98338a5d69245ed9e9db1b979a2fdf2 100644 (file)
@@ -1371,6 +1371,7 @@ void dolphin_fixups(struct hda_codec *codec, const struct hda_fixup *fix, int ac
                spec->scodecs[CS8409_CODEC1] = &dolphin_cs42l42_1;
                spec->scodecs[CS8409_CODEC1]->codec = codec;
                spec->num_scodecs = 2;
+               spec->gen.suppress_vmaster = 1;
 
                codec->patch_ops = cs8409_dolphin_patch_ops;
 
index f6f16622f9cc78a1ac8ca0de8b82e915f580f7fd..a1facdb98d9a00a4236ff06c0f64700a0facd1cc 100644 (file)
@@ -439,6 +439,10 @@ static void alc_fill_eapd_coef(struct hda_codec *codec)
                alc_update_coef_idx(codec, 0x67, 0xf000, 0x3000);
                fallthrough;
        case 0x10ec0215:
+       case 0x10ec0285:
+       case 0x10ec0289:
+               alc_update_coef_idx(codec, 0x36, 1<<13, 0);
+               fallthrough;
        case 0x10ec0230:
        case 0x10ec0233:
        case 0x10ec0235:
@@ -452,9 +456,7 @@ static void alc_fill_eapd_coef(struct hda_codec *codec)
        case 0x10ec0283:
        case 0x10ec0286:
        case 0x10ec0288:
-       case 0x10ec0285:
        case 0x10ec0298:
-       case 0x10ec0289:
        case 0x10ec0300:
                alc_update_coef_idx(codec, 0x10, 1<<9, 0);
                break;
@@ -3682,6 +3684,7 @@ static void alc285_hp_init(struct hda_codec *codec)
        int i, val;
        int coef38, coef0d, coef36;
 
+       alc_write_coefex_idx(codec, 0x58, 0x00, 0x1888); /* write default value */
        alc_update_coef_idx(codec, 0x4a, 1<<15, 1<<15); /* Reset HP JD */
        coef38 = alc_read_coef_idx(codec, 0x38); /* Amp control */
        coef0d = alc_read_coef_idx(codec, 0x0d); /* Digital Misc control */
@@ -7442,6 +7445,7 @@ enum {
        ALC287_FIXUP_LEGION_15IMHG05_AUTOMUTE,
        ALC287_FIXUP_YOGA7_14ITL_SPEAKERS,
        ALC298_FIXUP_LENOVO_C940_DUET7,
+       ALC287_FIXUP_LENOVO_14IRP8_DUETITL,
        ALC287_FIXUP_13S_GEN2_SPEAKERS,
        ALC256_FIXUP_SET_COEF_DEFAULTS,
        ALC256_FIXUP_SYSTEM76_MIC_NO_PRESENCE,
@@ -7493,6 +7497,26 @@ static void alc298_fixup_lenovo_c940_duet7(struct hda_codec *codec,
        __snd_hda_apply_fixup(codec, id, action, 0);
 }
 
+/* A special fixup for Lenovo Slim/Yoga Pro 9 14IRP8 and Yoga DuetITL 2021:
+ * the 14IRP8 PCI SSID is mistakenly matched against the DuetITL codec SSID,
+ * so we need to apply a different fixup in this case. The only DuetITL codec
+ * SSID reported so far is 17aa:3802, while the 14IRP8 has 17aa:38be
+ * and 17aa:38bf. If it weren't for the PCI SSID, the 14IRP8 models would
+ * have been matched correctly by their codec SSIDs.
+ */
+static void alc287_fixup_lenovo_14irp8_duetitl(struct hda_codec *codec,
+                                             const struct hda_fixup *fix,
+                                             int action)
+{
+       int id;
+
+       if (codec->core.subsystem_id == 0x17aa3802)
+               id = ALC287_FIXUP_YOGA7_14ITL_SPEAKERS; /* DuetITL */
+       else
+               id = ALC287_FIXUP_TAS2781_I2C; /* 14IRP8 */
+       __snd_hda_apply_fixup(codec, id, action, 0);
+}
+
 static const struct hda_fixup alc269_fixups[] = {
        [ALC269_FIXUP_GPIO2] = {
                .type = HDA_FIXUP_FUNC,
@@ -9377,6 +9401,10 @@ static const struct hda_fixup alc269_fixups[] = {
                .type = HDA_FIXUP_FUNC,
                .v.func = alc298_fixup_lenovo_c940_duet7,
        },
+       [ALC287_FIXUP_LENOVO_14IRP8_DUETITL] = {
+               .type = HDA_FIXUP_FUNC,
+               .v.func = alc287_fixup_lenovo_14irp8_duetitl,
+       },
        [ALC287_FIXUP_13S_GEN2_SPEAKERS] = {
                .type = HDA_FIXUP_VERBS,
                .v.verbs = (const struct hda_verb[]) {
@@ -9577,13 +9605,13 @@ static const struct hda_fixup alc269_fixups[] = {
                .type = HDA_FIXUP_FUNC,
                .v.func = cs35l41_fixup_i2c_two,
                .chained = true,
-               .chain_id = ALC269_FIXUP_THINKPAD_ACPI,
+               .chain_id = ALC285_FIXUP_THINKPAD_NO_BASS_SPK_HEADSET_JACK,
        },
        [ALC287_FIXUP_TAS2781_I2C] = {
                .type = HDA_FIXUP_FUNC,
                .v.func = tas2781_fixup_i2c,
                .chained = true,
-               .chain_id = ALC269_FIXUP_THINKPAD_ACPI,
+               .chain_id = ALC285_FIXUP_THINKPAD_HEADSET_JACK,
        },
        [ALC287_FIXUP_YOGA7_14ARB7_I2C] = {
                .type = HDA_FIXUP_FUNC,
@@ -9604,6 +9632,8 @@ static const struct hda_fixup alc269_fixups[] = {
        [ALC287_FIXUP_THINKPAD_I2S_SPK] = {
                .type = HDA_FIXUP_FUNC,
                .v.func = alc287_fixup_bind_dacs,
+               .chained = true,
+               .chain_id = ALC285_FIXUP_THINKPAD_NO_BASS_SPK_HEADSET_JACK,
        },
        [ALC287_FIXUP_MG_RTKC_CSAMP_CS35L41_I2C_THINKPAD] = {
                .type = HDA_FIXUP_FUNC,
@@ -9653,6 +9683,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
        SND_PCI_QUIRK(0x1025, 0x1247, "Acer vCopperbox", ALC269VC_FIXUP_ACER_VCOPPERBOX_PINS),
        SND_PCI_QUIRK(0x1025, 0x1248, "Acer Veriton N4660G", ALC269VC_FIXUP_ACER_MIC_NO_PRESENCE),
        SND_PCI_QUIRK(0x1025, 0x1269, "Acer SWIFT SF314-54", ALC256_FIXUP_ACER_HEADSET_MIC),
+       SND_PCI_QUIRK(0x1025, 0x126a, "Acer Swift SF114-32", ALC256_FIXUP_ACER_MIC_NO_PRESENCE),
        SND_PCI_QUIRK(0x1025, 0x128f, "Acer Veriton Z6860G", ALC286_FIXUP_ACER_AIO_HEADSET_MIC),
        SND_PCI_QUIRK(0x1025, 0x1290, "Acer Veriton Z4860G", ALC286_FIXUP_ACER_AIO_HEADSET_MIC),
        SND_PCI_QUIRK(0x1025, 0x1291, "Acer Veriton Z4660G", ALC286_FIXUP_ACER_AIO_HEADSET_MIC),
@@ -9732,12 +9763,16 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
        SND_PCI_QUIRK(0x1028, 0x0b71, "Dell Inspiron 16 Plus 7620", ALC295_FIXUP_DELL_INSPIRON_TOP_SPEAKERS),
        SND_PCI_QUIRK(0x1028, 0x0beb, "Dell XPS 15 9530 (2023)", ALC289_FIXUP_DELL_CS35L41_SPI_2),
        SND_PCI_QUIRK(0x1028, 0x0c03, "Dell Precision 5340", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE),
+       SND_PCI_QUIRK(0x1028, 0x0c0b, "Dell Oasis 14 RPL-P", ALC289_FIXUP_RTK_AMP_DUAL_SPK),
+       SND_PCI_QUIRK(0x1028, 0x0c0d, "Dell Oasis", ALC289_FIXUP_RTK_AMP_DUAL_SPK),
+       SND_PCI_QUIRK(0x1028, 0x0c0e, "Dell Oasis 16", ALC289_FIXUP_RTK_AMP_DUAL_SPK),
        SND_PCI_QUIRK(0x1028, 0x0c19, "Dell Precision 3340", ALC236_FIXUP_DELL_DUAL_CODECS),
        SND_PCI_QUIRK(0x1028, 0x0c1a, "Dell Precision 3340", ALC236_FIXUP_DELL_DUAL_CODECS),
        SND_PCI_QUIRK(0x1028, 0x0c1b, "Dell Precision 3440", ALC236_FIXUP_DELL_DUAL_CODECS),
        SND_PCI_QUIRK(0x1028, 0x0c1c, "Dell Precision 3540", ALC236_FIXUP_DELL_DUAL_CODECS),
        SND_PCI_QUIRK(0x1028, 0x0c1d, "Dell Precision 3440", ALC236_FIXUP_DELL_DUAL_CODECS),
        SND_PCI_QUIRK(0x1028, 0x0c1e, "Dell Precision 3540", ALC236_FIXUP_DELL_DUAL_CODECS),
+       SND_PCI_QUIRK(0x1028, 0x0c28, "Dell Inspiron 16 Plus 7630", ALC295_FIXUP_DELL_INSPIRON_TOP_SPEAKERS),
        SND_PCI_QUIRK(0x1028, 0x0c4d, "Dell", ALC287_FIXUP_CS35L41_I2C_4),
        SND_PCI_QUIRK(0x1028, 0x0cbd, "Dell Oasis 13 CS MTL-U", ALC289_FIXUP_DELL_CS35L41_SPI_2),
        SND_PCI_QUIRK(0x1028, 0x0cbe, "Dell Oasis 13 2-IN-1 MTL-U", ALC289_FIXUP_DELL_CS35L41_SPI_2),
@@ -9852,6 +9887,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
        SND_PCI_QUIRK(0x103c, 0x8786, "HP OMEN 15", ALC285_FIXUP_HP_MUTE_LED),
        SND_PCI_QUIRK(0x103c, 0x8787, "HP OMEN 15", ALC285_FIXUP_HP_MUTE_LED),
        SND_PCI_QUIRK(0x103c, 0x8788, "HP OMEN 15", ALC285_FIXUP_HP_MUTE_LED),
+       SND_PCI_QUIRK(0x103c, 0x87b7, "HP Laptop 14-fq0xxx", ALC236_FIXUP_HP_MUTE_LED_COEFBIT2),
        SND_PCI_QUIRK(0x103c, 0x87c8, "HP", ALC287_FIXUP_HP_GPIO_LED),
        SND_PCI_QUIRK(0x103c, 0x87e5, "HP ProBook 440 G8 Notebook PC", ALC236_FIXUP_HP_GPIO_LED),
        SND_PCI_QUIRK(0x103c, 0x87e7, "HP ProBook 450 G8 Notebook PC", ALC236_FIXUP_HP_GPIO_LED),
@@ -9893,6 +9929,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
        SND_PCI_QUIRK(0x103c, 0x8973, "HP EliteBook 860 G9", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
        SND_PCI_QUIRK(0x103c, 0x8974, "HP EliteBook 840 Aero G9", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
        SND_PCI_QUIRK(0x103c, 0x8975, "HP EliteBook x360 840 Aero G9", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
+       SND_PCI_QUIRK(0x103c, 0x897d, "HP mt440 Mobile Thin Client U74", ALC236_FIXUP_HP_GPIO_LED),
        SND_PCI_QUIRK(0x103c, 0x8981, "HP Elite Dragonfly G3", ALC245_FIXUP_CS35L41_SPI_4),
        SND_PCI_QUIRK(0x103c, 0x898e, "HP EliteBook 835 G9", ALC287_FIXUP_CS35L41_I2C_2),
        SND_PCI_QUIRK(0x103c, 0x898f, "HP EliteBook 835 G9", ALC287_FIXUP_CS35L41_I2C_2),
@@ -9918,16 +9955,20 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
        SND_PCI_QUIRK(0x103c, 0x8aa3, "HP ProBook 450 G9 (MB 8AA1)", ALC236_FIXUP_HP_GPIO_LED),
        SND_PCI_QUIRK(0x103c, 0x8aa8, "HP EliteBook 640 G9 (MB 8AA6)", ALC236_FIXUP_HP_GPIO_LED),
        SND_PCI_QUIRK(0x103c, 0x8aab, "HP EliteBook 650 G9 (MB 8AA9)", ALC236_FIXUP_HP_GPIO_LED),
+       SND_PCI_QUIRK(0x103c, 0x8ab9, "HP EliteBook 840 G8 (MB 8AB8)", ALC285_FIXUP_HP_GPIO_LED),
        SND_PCI_QUIRK(0x103c, 0x8abb, "HP ZBook Firefly 14 G9", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
        SND_PCI_QUIRK(0x103c, 0x8ad1, "HP EliteBook 840 14 inch G9 Notebook PC", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
        SND_PCI_QUIRK(0x103c, 0x8ad2, "HP EliteBook 860 16 inch G9 Notebook PC", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
+       SND_PCI_QUIRK(0x103c, 0x8b0f, "HP Elite mt645 G7 Mobile Thin Client U81", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF),
        SND_PCI_QUIRK(0x103c, 0x8b2f, "HP 255 15.6 inch G10 Notebook PC", ALC236_FIXUP_HP_MUTE_LED_COEFBIT2),
+       SND_PCI_QUIRK(0x103c, 0x8b3f, "HP mt440 Mobile Thin Client U91", ALC236_FIXUP_HP_GPIO_LED),
        SND_PCI_QUIRK(0x103c, 0x8b42, "HP", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
        SND_PCI_QUIRK(0x103c, 0x8b43, "HP", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
        SND_PCI_QUIRK(0x103c, 0x8b44, "HP", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
        SND_PCI_QUIRK(0x103c, 0x8b45, "HP", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
        SND_PCI_QUIRK(0x103c, 0x8b46, "HP", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
        SND_PCI_QUIRK(0x103c, 0x8b47, "HP", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
+       SND_PCI_QUIRK(0x103c, 0x8b59, "HP Elite mt645 G7 Mobile Thin Client U89", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF),
        SND_PCI_QUIRK(0x103c, 0x8b5d, "HP", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF),
        SND_PCI_QUIRK(0x103c, 0x8b5e, "HP", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF),
        SND_PCI_QUIRK(0x103c, 0x8b63, "HP Elite Dragonfly 13.5 inch G4", ALC245_FIXUP_CS35L41_SPI_4_HP_GPIO_LED),
@@ -9955,8 +9996,14 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
        SND_PCI_QUIRK(0x103c, 0x8c70, "HP EliteBook 835 G11", ALC287_FIXUP_CS35L41_I2C_2_HP_GPIO_LED),
        SND_PCI_QUIRK(0x103c, 0x8c71, "HP EliteBook 845 G11", ALC287_FIXUP_CS35L41_I2C_2_HP_GPIO_LED),
        SND_PCI_QUIRK(0x103c, 0x8c72, "HP EliteBook 865 G11", ALC287_FIXUP_CS35L41_I2C_2_HP_GPIO_LED),
+       SND_PCI_QUIRK(0x103c, 0x8c8a, "HP EliteBook 630", ALC236_FIXUP_HP_GPIO_LED),
+       SND_PCI_QUIRK(0x103c, 0x8c8c, "HP EliteBook 660", ALC236_FIXUP_HP_GPIO_LED),
+       SND_PCI_QUIRK(0x103c, 0x8c90, "HP EliteBook 640", ALC236_FIXUP_HP_GPIO_LED),
+       SND_PCI_QUIRK(0x103c, 0x8c91, "HP EliteBook 660", ALC236_FIXUP_HP_GPIO_LED),
        SND_PCI_QUIRK(0x103c, 0x8c96, "HP", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF),
        SND_PCI_QUIRK(0x103c, 0x8c97, "HP ZBook", ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF),
+       SND_PCI_QUIRK(0x103c, 0x8ca1, "HP ZBook Power", ALC236_FIXUP_HP_GPIO_LED),
+       SND_PCI_QUIRK(0x103c, 0x8ca2, "HP ZBook Power", ALC236_FIXUP_HP_GPIO_LED),
        SND_PCI_QUIRK(0x103c, 0x8ca4, "HP ZBook Fury", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
        SND_PCI_QUIRK(0x103c, 0x8ca7, "HP ZBook Fury", ALC245_FIXUP_CS35L41_SPI_2_HP_GPIO_LED),
        SND_PCI_QUIRK(0x103c, 0x8cf5, "HP ZBook Studio 16", ALC245_FIXUP_CS35L41_SPI_4_HP_GPIO_LED),
@@ -9994,6 +10041,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
        SND_PCI_QUIRK(0x1043, 0x1662, "ASUS GV301QH", ALC294_FIXUP_ASUS_DUAL_SPK),
        SND_PCI_QUIRK(0x1043, 0x1663, "ASUS GU603ZI/ZJ/ZQ/ZU/ZV", ALC285_FIXUP_ASUS_HEADSET_MIC),
        SND_PCI_QUIRK(0x1043, 0x1683, "ASUS UM3402YAR", ALC287_FIXUP_CS35L41_I2C_2),
+       SND_PCI_QUIRK(0x1043, 0x16a3, "ASUS UX3402VA", ALC245_FIXUP_CS35L41_SPI_2),
        SND_PCI_QUIRK(0x1043, 0x16b2, "ASUS GU603", ALC289_FIXUP_ASUS_GA401),
        SND_PCI_QUIRK(0x1043, 0x16d3, "ASUS UX5304VA", ALC245_FIXUP_CS35L41_SPI_2),
        SND_PCI_QUIRK(0x1043, 0x16e3, "ASUS UX50", ALC269_FIXUP_STEREO_DMIC),
@@ -10037,14 +10085,12 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
        SND_PCI_QUIRK(0x1043, 0x1d4e, "ASUS TM420", ALC256_FIXUP_ASUS_HPE),
        SND_PCI_QUIRK(0x1043, 0x1da2, "ASUS UP6502ZA/ZD", ALC245_FIXUP_CS35L41_SPI_2),
        SND_PCI_QUIRK(0x1043, 0x1e02, "ASUS UX3402ZA", ALC245_FIXUP_CS35L41_SPI_2),
-       SND_PCI_QUIRK(0x1043, 0x16a3, "ASUS UX3402VA", ALC245_FIXUP_CS35L41_SPI_2),
-       SND_PCI_QUIRK(0x1043, 0x1f62, "ASUS UX7602ZM", ALC245_FIXUP_CS35L41_SPI_2),
        SND_PCI_QUIRK(0x1043, 0x1e11, "ASUS Zephyrus G15", ALC289_FIXUP_ASUS_GA502),
-       SND_PCI_QUIRK(0x1043, 0x1e12, "ASUS UM6702RA/RC", ALC287_FIXUP_CS35L41_I2C_2),
+       SND_PCI_QUIRK(0x1043, 0x1e12, "ASUS UM3402", ALC287_FIXUP_CS35L41_I2C_2),
        SND_PCI_QUIRK(0x1043, 0x1e51, "ASUS Zephyrus M15", ALC294_FIXUP_ASUS_GU502_PINS),
        SND_PCI_QUIRK(0x1043, 0x1e5e, "ASUS ROG Strix G513", ALC294_FIXUP_ASUS_G513_PINS),
        SND_PCI_QUIRK(0x1043, 0x1e8e, "ASUS Zephyrus G15", ALC289_FIXUP_ASUS_GA401),
-       SND_PCI_QUIRK(0x1043, 0x1ee2, "ASUS UM3402", ALC287_FIXUP_CS35L41_I2C_2),
+       SND_PCI_QUIRK(0x1043, 0x1ee2, "ASUS UM6702RA/RC", ALC287_FIXUP_CS35L41_I2C_2),
        SND_PCI_QUIRK(0x1043, 0x1c52, "ASUS Zephyrus G15 2022", ALC289_FIXUP_ASUS_GA401),
        SND_PCI_QUIRK(0x1043, 0x1f11, "ASUS Zephyrus G14", ALC289_FIXUP_ASUS_GA401),
        SND_PCI_QUIRK(0x1043, 0x1f12, "ASUS UM5302", ALC287_FIXUP_CS35L41_I2C_2),
@@ -10235,7 +10281,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
        SND_PCI_QUIRK(0x17aa, 0x31af, "ThinkCentre Station", ALC623_FIXUP_LENOVO_THINKSTATION_P340),
        SND_PCI_QUIRK(0x17aa, 0x334b, "Lenovo ThinkCentre M70 Gen5", ALC283_FIXUP_HEADSET_MIC),
        SND_PCI_QUIRK(0x17aa, 0x3801, "Lenovo Yoga9 14IAP7", ALC287_FIXUP_YOGA9_14IAP7_BASS_SPK_PIN),
-       SND_PCI_QUIRK(0x17aa, 0x3802, "Lenovo Yoga DuetITL 2021", ALC287_FIXUP_YOGA7_14ITL_SPEAKERS),
+       SND_PCI_QUIRK(0x17aa, 0x3802, "Lenovo Yoga Pro 9 14IRP8 / DuetITL 2021", ALC287_FIXUP_LENOVO_14IRP8_DUETITL),
        SND_PCI_QUIRK(0x17aa, 0x3813, "Legion 7i 15IMHG05", ALC287_FIXUP_LEGION_15IMHG05_SPEAKERS),
        SND_PCI_QUIRK(0x17aa, 0x3818, "Lenovo C940 / Yoga Duet 7", ALC298_FIXUP_LENOVO_C940_DUET7),
        SND_PCI_QUIRK(0x17aa, 0x3819, "Lenovo 13s Gen2 ITL", ALC287_FIXUP_13S_GEN2_SPEAKERS),
@@ -10251,6 +10297,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
        SND_PCI_QUIRK(0x17aa, 0x3853, "Lenovo Yoga 7 15ITL5", ALC287_FIXUP_YOGA7_14ITL_SPEAKERS),
        SND_PCI_QUIRK(0x17aa, 0x3855, "Legion 7 16ITHG6", ALC287_FIXUP_LEGION_16ITHG6),
        SND_PCI_QUIRK(0x17aa, 0x3869, "Lenovo Yoga7 14IAL7", ALC287_FIXUP_YOGA9_14IAP7_BASS_SPK_PIN),
+       SND_PCI_QUIRK(0x17aa, 0x386f, "Legion 7i 16IAX7", ALC287_FIXUP_CS35L41_I2C_2),
        SND_PCI_QUIRK(0x17aa, 0x3870, "Lenovo Yoga 7 14ARB7", ALC287_FIXUP_YOGA7_14ARB7_I2C),
        SND_PCI_QUIRK(0x17aa, 0x387d, "Yoga S780-16 pro Quad AAC", ALC287_FIXUP_TAS2781_I2C),
        SND_PCI_QUIRK(0x17aa, 0x387e, "Yoga S780-16 pro Quad YC", ALC287_FIXUP_TAS2781_I2C),
@@ -10260,6 +10307,8 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
        SND_PCI_QUIRK(0x17aa, 0x3886, "Y780 VECO DUAL", ALC287_FIXUP_TAS2781_I2C),
        SND_PCI_QUIRK(0x17aa, 0x38a7, "Y780P AMD YG dual", ALC287_FIXUP_TAS2781_I2C),
        SND_PCI_QUIRK(0x17aa, 0x38a8, "Y780P AMD VECO dual", ALC287_FIXUP_TAS2781_I2C),
+       SND_PCI_QUIRK(0x17aa, 0x38a9, "Thinkbook 16P", ALC287_FIXUP_CS35L41_I2C_2),
+       SND_PCI_QUIRK(0x17aa, 0x38ab, "Thinkbook 16P", ALC287_FIXUP_CS35L41_I2C_2),
        SND_PCI_QUIRK(0x17aa, 0x38b4, "Legion Slim 7 16IRH8", ALC287_FIXUP_CS35L41_I2C_2),
        SND_PCI_QUIRK(0x17aa, 0x38b5, "Legion Slim 7 16IRH8", ALC287_FIXUP_CS35L41_I2C_2),
        SND_PCI_QUIRK(0x17aa, 0x38b6, "Legion Slim 7 16APH8", ALC287_FIXUP_CS35L41_I2C_2),
@@ -10322,6 +10371,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
        SND_PCI_QUIRK(0x1d72, 0x1945, "Redmi G", ALC256_FIXUP_ASUS_HEADSET_MIC),
        SND_PCI_QUIRK(0x1d72, 0x1947, "RedmiBook Air", ALC255_FIXUP_XIAOMI_HEADSET_MIC),
        SND_PCI_QUIRK(0x2782, 0x0232, "CHUWI CoreBook XPro", ALC269VB_FIXUP_CHUWI_COREBOOK_XPRO),
+       SND_PCI_QUIRK(0x2782, 0x1707, "Vaio VJFE-ADL", ALC298_FIXUP_SPK_VOLUME),
        SND_PCI_QUIRK(0x8086, 0x2074, "Intel NUC 8", ALC233_FIXUP_INTEL_NUC8_DMIC),
        SND_PCI_QUIRK(0x8086, 0x2080, "Intel NUC 8 Rugged", ALC256_FIXUP_INTEL_NUC8_RUGGED),
        SND_PCI_QUIRK(0x8086, 0x2081, "Intel NUC 10", ALC256_FIXUP_INTEL_NUC10),
@@ -10951,6 +11001,8 @@ static const struct snd_hda_pin_quirk alc269_pin_fixup_tbl[] = {
  *   at most one tbl is allowed to define for the same vendor and same codec
  */
 static const struct snd_hda_pin_quirk alc269_fallback_pin_fixup_tbl[] = {
+       SND_HDA_PIN_QUIRK(0x10ec0256, 0x1025, "Acer", ALC2XX_FIXUP_HEADSET_MIC,
+               {0x19, 0x40000000}),
        SND_HDA_PIN_QUIRK(0x10ec0289, 0x1028, "Dell", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE,
                {0x19, 0x40000000},
                {0x1b, 0x40000000}),
@@ -11640,8 +11692,7 @@ static void alc897_hp_automute_hook(struct hda_codec *codec,
 
        snd_hda_gen_hp_automute(codec, jack);
        vref = spec->gen.hp_jack_present ? (PIN_HP | AC_PINCTL_VREF_100) : PIN_HP;
-       snd_hda_codec_write(codec, 0x1b, 0, AC_VERB_SET_PIN_WIDGET_CONTROL,
-                           vref);
+       snd_hda_set_pin_ctl(codec, 0x1b, vref);
 }
 
 static void alc897_fixup_lenovo_headset_mic(struct hda_codec *codec,
@@ -11650,6 +11701,10 @@ static void alc897_fixup_lenovo_headset_mic(struct hda_codec *codec,
        struct alc_spec *spec = codec->spec;
        if (action == HDA_FIXUP_ACT_PRE_PROBE) {
                spec->gen.hp_automute_hook = alc897_hp_automute_hook;
+               spec->no_shutup_pins = 1;
+       }
+       if (action == HDA_FIXUP_ACT_PROBE) {
+               snd_hda_set_pin_ctl_cache(codec, 0x1a, PIN_IN | AC_PINCTL_VREF_100);
        }
 }
 
index 2dd809de62e5a4ac049a49752f447b0a8b4a3be6..1bfb00102a77a4d49695efeaf7016eded961345b 100644 (file)
@@ -710,7 +710,7 @@ static int tas2781_hda_bind(struct device *dev, struct device *master,
 
        strscpy(comps->name, dev_name(dev), sizeof(comps->name));
 
-       ret = tascodec_init(tas_hda->priv, codec, tasdev_fw_ready);
+       ret = tascodec_init(tas_hda->priv, codec, THIS_MODULE, tasdev_fw_ready);
        if (!ret)
                comps->playback_hook = tas2781_hda_playback_hook;
 
index c90ec3419247797a0628bfa5207eaf3afcb8d012..504d1b8c4cbb4f104a8b8e70adf10894f211ce6c 100644 (file)
@@ -505,6 +505,13 @@ static int acp_card_rt5682s_hw_params(struct snd_pcm_substream *substream,
 
        clk_set_rate(drvdata->wclk, srate);
        clk_set_rate(drvdata->bclk, srate * ch * format);
+       if (!drvdata->soc_mclk) {
+               ret = acp_clk_enable(drvdata, srate, ch * format);
+               if (ret < 0) {
+                       dev_err(rtd->card->dev, "Failed to enable HS clk: %d\n", ret);
+                       return ret;
+               }
+       }
 
        return 0;
 }
@@ -1464,8 +1471,13 @@ int acp_sofdsp_dai_links_create(struct snd_soc_card *card)
        if (drv_data->amp_cpu_id == I2S_SP) {
                links[i].name = "acp-amp-codec";
                links[i].id = AMP_BE_ID;
-               links[i].cpus = sof_sp_virtual;
-               links[i].num_cpus = ARRAY_SIZE(sof_sp_virtual);
+               if (drv_data->platform == RENOIR) {
+                       links[i].cpus = sof_sp;
+                       links[i].num_cpus = ARRAY_SIZE(sof_sp);
+               } else {
+                       links[i].cpus = sof_sp_virtual;
+                       links[i].num_cpus = ARRAY_SIZE(sof_sp_virtual);
+               }
                links[i].platforms = sof_component;
                links[i].num_platforms = ARRAY_SIZE(sof_component);
                links[i].dpcm_playback = 1;
index 2a9fd3275e42f5fa1086d10baf4a8cc4cc2b69b1..20b94814a0462147258fe94cd4219b772afa45f0 100644 (file)
@@ -48,6 +48,7 @@ static struct acp_card_drvdata sof_rt5682s_rt1019_data = {
        .hs_codec_id = RT5682S,
        .amp_codec_id = RT1019,
        .dmic_codec_id = DMIC,
+       .platform = RENOIR,
        .tdm_mode = false,
 };
 
@@ -58,6 +59,7 @@ static struct acp_card_drvdata sof_rt5682s_max_data = {
        .hs_codec_id = RT5682S,
        .amp_codec_id = MAX98360A,
        .dmic_codec_id = DMIC,
+       .platform = RENOIR,
        .tdm_mode = false,
 };
 
@@ -68,6 +70,7 @@ static struct acp_card_drvdata sof_nau8825_data = {
        .hs_codec_id = NAU8825,
        .amp_codec_id = MAX98360A,
        .dmic_codec_id = DMIC,
+       .platform = REMBRANDT,
        .soc_mclk = true,
        .tdm_mode = false,
 };
@@ -79,6 +82,7 @@ static struct acp_card_drvdata sof_rt5682s_hs_rt1019_data = {
        .hs_codec_id = RT5682S,
        .amp_codec_id = RT1019,
        .dmic_codec_id = DMIC,
+       .platform = REMBRANDT,
        .soc_mclk = true,
        .tdm_mode = false,
 };
index f85b85ea4be9c28cf6ba1f29b4c7290908bc842d..2b0aa270a3e9d75c8ebd47aaf12e6fc0b73c75b3 100644 (file)
@@ -354,6 +354,14 @@ static const struct dmi_system_id acp3x_es83xx_dmi_table[] = {
                },
                .driver_data = (void *)(ES83XX_ENABLE_DMIC|ES83XX_48_MHZ_MCLK),
        },
+       {
+               .matches = {
+                       DMI_EXACT_MATCH(DMI_BOARD_VENDOR, "HUAWEI"),
+                       DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "HVY-WXX9"),
+                       DMI_EXACT_MATCH(DMI_PRODUCT_VERSION, "M1010"),
+               },
+               .driver_data = (void *)(ES83XX_ENABLE_DMIC),
+       },
        {
                .matches = {
                        DMI_EXACT_MATCH(DMI_BOARD_VENDOR, "HUAWEI"),
index d83cb6e4c62aecc6e54a700e5d22f136253e42fb..90360f8b3e81b9374680058256f59efd96ad822d 100644 (file)
@@ -199,6 +199,20 @@ static const struct dmi_system_id yc_acp_quirk_table[] = {
                        DMI_MATCH(DMI_PRODUCT_NAME, "21HY"),
                }
        },
+       {
+               .driver_data = &acp6x_card,
+               .matches = {
+                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
+                       DMI_MATCH(DMI_PRODUCT_NAME, "21J2"),
+               }
+       },
+       {
+               .driver_data = &acp6x_card,
+               .matches = {
+                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
+                       DMI_MATCH(DMI_PRODUCT_NAME, "21J0"),
+               }
+       },
        {
                .driver_data = &acp6x_card,
                .matches = {
@@ -234,6 +248,13 @@ static const struct dmi_system_id yc_acp_quirk_table[] = {
                        DMI_MATCH(DMI_PRODUCT_NAME, "82UG"),
                }
        },
+       {
+               .driver_data = &acp6x_card,
+               .matches = {
+                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
+                       DMI_MATCH(DMI_PRODUCT_NAME, "82UU"),
+               }
+       },
        {
                .driver_data = &acp6x_card,
                .matches = {
@@ -248,6 +269,13 @@ static const struct dmi_system_id yc_acp_quirk_table[] = {
                        DMI_MATCH(DMI_PRODUCT_NAME, "82YM"),
                }
        },
+       {
+               .driver_data = &acp6x_card,
+               .matches = {
+                       DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
+                       DMI_MATCH(DMI_PRODUCT_NAME, "83AS"),
+               }
+       },
        {
                .driver_data = &acp6x_card,
                .matches = {
@@ -297,6 +325,13 @@ static const struct dmi_system_id yc_acp_quirk_table[] = {
                        DMI_MATCH(DMI_PRODUCT_NAME, "Bravo 15 B7ED"),
                }
        },
+       {
+               .driver_data = &acp6x_card,
+               .matches = {
+                       DMI_MATCH(DMI_BOARD_VENDOR, "Micro-Star International Co., Ltd."),
+                       DMI_MATCH(DMI_PRODUCT_NAME, "Bravo 15 C7VF"),
+               }
+       },
        {
                .driver_data = &acp6x_card,
                .matches = {
@@ -381,6 +416,13 @@ static const struct dmi_system_id yc_acp_quirk_table[] = {
                        DMI_MATCH(DMI_BOARD_NAME, "8B2F"),
                }
        },
+       {
+               .driver_data = &acp6x_card,
+               .matches = {
+                       DMI_MATCH(DMI_BOARD_VENDOR, "HP"),
+                       DMI_MATCH(DMI_BOARD_NAME, "8BD6"),
+               }
+       },
        {
                .driver_data = &acp6x_card,
                .matches = {
index 7af6a349b1d41fb60d9450a312927e4774889a34..694b8e31390248b88e4bc8aba843316244f0bdb4 100644 (file)
@@ -162,6 +162,7 @@ static int snd_acp6x_probe(struct pci_dev *pci,
        /* Yellow Carp device check */
        switch (pci->revision) {
        case 0x60:
+       case 0x63:
        case 0x6f:
                break;
        default:
index 44c221745c3b255c7e69a4ab092ad09581a9b1f9..2392c6effed857c32326ee032f776dcc577ed0fb 100644 (file)
@@ -184,7 +184,7 @@ static int cs35l45_activate_ctl(struct snd_soc_component *component,
        else
                snprintf(name, SNDRV_CTL_ELEM_ID_NAME_MAXLEN, "%s", ctl_name);
 
-       kcontrol = snd_soc_card_get_kcontrol(component->card, name);
+       kcontrol = snd_soc_card_get_kcontrol_locked(component->card, name);
        if (!kcontrol) {
                dev_err(component->dev, "Can't find kcontrol %s\n", name);
                return -EINVAL;
index 953ba066bab1e30dfc22ea94bead73ce2f91c0fe..cb4e83126b085228fa196aeb7e635537c39cfc24 100644 (file)
@@ -5,6 +5,7 @@
 // Copyright (C) 2023 Cirrus Logic, Inc. and
 //                    Cirrus Logic International Semiconductor Ltd.
 
+#include <linux/gpio/consumer.h>
 #include <linux/regmap.h>
 #include <linux/regulator/consumer.h>
 #include <linux/types.h>
 #include "cs35l56.h"
 
 static const struct reg_sequence cs35l56_patch[] = {
+       /*
+        * Firmware can change these to non-defaults to satisfy SDCA.
+        * Ensure that they are at known defaults.
+        */
+       { CS35L56_SWIRE_DP3_CH1_INPUT,          0x00000018 },
+       { CS35L56_SWIRE_DP3_CH2_INPUT,          0x00000019 },
+       { CS35L56_SWIRE_DP3_CH3_INPUT,          0x00000029 },
+       { CS35L56_SWIRE_DP3_CH4_INPUT,          0x00000028 },
+
        /* These are not reset by a soft-reset, so patch to defaults. */
        { CS35L56_MAIN_RENDER_USER_MUTE,        0x00000000 },
        { CS35L56_MAIN_RENDER_USER_VOLUME,      0x00000000 },
@@ -34,15 +44,13 @@ static const struct reg_default cs35l56_reg_defaults[] = {
        { CS35L56_ASP1_FRAME_CONTROL5,          0x00020100 },
        { CS35L56_ASP1_DATA_CONTROL1,           0x00000018 },
        { CS35L56_ASP1_DATA_CONTROL5,           0x00000018 },
-       { CS35L56_ASP1TX1_INPUT,                0x00000018 },
-       { CS35L56_ASP1TX2_INPUT,                0x00000019 },
-       { CS35L56_ASP1TX3_INPUT,                0x00000020 },
-       { CS35L56_ASP1TX4_INPUT,                0x00000028 },
+
+       /* no defaults for ASP1TX mixer */
+
        { CS35L56_SWIRE_DP3_CH1_INPUT,          0x00000018 },
        { CS35L56_SWIRE_DP3_CH2_INPUT,          0x00000019 },
        { CS35L56_SWIRE_DP3_CH3_INPUT,          0x00000029 },
        { CS35L56_SWIRE_DP3_CH4_INPUT,          0x00000028 },
-       { CS35L56_IRQ1_CFG,                     0x00000000 },
        { CS35L56_IRQ1_MASK_1,                  0x83ffffff },
        { CS35L56_IRQ1_MASK_2,                  0xffff7fff },
        { CS35L56_IRQ1_MASK_4,                  0xe0ffffff },
@@ -195,6 +203,47 @@ static bool cs35l56_volatile_reg(struct device *dev, unsigned int reg)
        }
 }
 
+/*
+ * The firmware boot sequence can overwrite the ASP1 config registers so that
+ * they don't match regmap's view of their values. Rewrite the values from the
+ * regmap cache into the hardware registers.
+ */
+int cs35l56_force_sync_asp1_registers_from_cache(struct cs35l56_base *cs35l56_base)
+{
+       struct reg_sequence asp1_regs[] = {
+               { .reg = CS35L56_ASP1_ENABLES1 },
+               { .reg = CS35L56_ASP1_CONTROL1 },
+               { .reg = CS35L56_ASP1_CONTROL2 },
+               { .reg = CS35L56_ASP1_CONTROL3 },
+               { .reg = CS35L56_ASP1_FRAME_CONTROL1 },
+               { .reg = CS35L56_ASP1_FRAME_CONTROL5 },
+               { .reg = CS35L56_ASP1_DATA_CONTROL1 },
+               { .reg = CS35L56_ASP1_DATA_CONTROL5 },
+       };
+       int i, ret;
+
+       /* Read values from regmap cache into a write sequence */
+       for (i = 0; i < ARRAY_SIZE(asp1_regs); ++i) {
+               ret = regmap_read(cs35l56_base->regmap, asp1_regs[i].reg, &asp1_regs[i].def);
+               if (ret)
+                       goto err;
+       }
+
+       /* Write the values cache-bypassed so that they will be written to silicon */
+       ret = regmap_multi_reg_write_bypassed(cs35l56_base->regmap, asp1_regs,
+                                             ARRAY_SIZE(asp1_regs));
+       if (ret)
+               goto err;
+
+       return 0;
+
+err:
+       dev_err(cs35l56_base->dev, "Failed to sync ASP1 registers: %d\n", ret);
+
+       return ret;
+}
+EXPORT_SYMBOL_NS_GPL(cs35l56_force_sync_asp1_registers_from_cache, SND_SOC_CS35L56_SHARED);
+
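A usage sketch mirroring the HDA call sites earlier in this merge ('base' and 'dai_config' stand in for the driver's real cs35l56_base and reg_sequence table): after writing the cached ASP1 configuration, or on runtime resume, force the cache back out in case the firmware boot sequence rewrote those registers.

	regmap_multi_reg_write(base->regmap, dai_config, ARRAY_SIZE(dai_config));
	ret = cs35l56_force_sync_asp1_registers_from_cache(base);
	if (ret)
		return ret;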
 int cs35l56_mbox_send(struct cs35l56_base *cs35l56_base, unsigned int command)
 {
        unsigned int val;
@@ -286,6 +335,7 @@ void cs35l56_wait_min_reset_pulse(void)
 EXPORT_SYMBOL_NS_GPL(cs35l56_wait_min_reset_pulse, SND_SOC_CS35L56_SHARED);
 
 static const struct reg_sequence cs35l56_system_reset_seq[] = {
+       REG_SEQ0(CS35L56_DSP1_HALO_STATE, 0),
        REG_SEQ0(CS35L56_DSP_VIRTUAL1_MBOX_1, CS35L56_MBOX_CMD_SYSTEM_RESET),
 };
 
@@ -400,17 +450,6 @@ int cs35l56_is_fw_reload_needed(struct cs35l56_base *cs35l56_base)
        unsigned int val;
        int ret;
 
-       /* Nothing to re-patch if we haven't done any patching yet. */
-       if (!cs35l56_base->fw_patched)
-               return false;
-
-       /*
-        * If we have control of RESET we will have asserted it so the firmware
-        * will need re-patching.
-        */
-       if (cs35l56_base->reset_gpio)
-               return true;
-
        /*
         * In secure mode FIRMWARE_MISSING is cleared by the BIOS loader so
         * can't be used here to test for memory retention.
@@ -590,10 +629,35 @@ void cs35l56_init_cs_dsp(struct cs35l56_base *cs35l56_base, struct cs_dsp *cs_ds
 }
 EXPORT_SYMBOL_NS_GPL(cs35l56_init_cs_dsp, SND_SOC_CS35L56_SHARED);
 
+int cs35l56_read_prot_status(struct cs35l56_base *cs35l56_base,
+                            bool *fw_missing, unsigned int *fw_version)
+{
+       unsigned int prot_status;
+       int ret;
+
+       ret = regmap_read(cs35l56_base->regmap, CS35L56_PROTECTION_STATUS, &prot_status);
+       if (ret) {
+               dev_err(cs35l56_base->dev, "Get PROTECTION_STATUS failed: %d\n", ret);
+               return ret;
+       }
+
+       *fw_missing = !!(prot_status & CS35L56_FIRMWARE_MISSING);
+
+       ret = regmap_read(cs35l56_base->regmap, CS35L56_DSP1_FW_VER, fw_version);
+       if (ret) {
+               dev_err(cs35l56_base->dev, "Get FW VER failed: %d\n", ret);
+               return ret;
+       }
+
+       return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cs35l56_read_prot_status, SND_SOC_CS35L56_SHARED);
+
 int cs35l56_hw_init(struct cs35l56_base *cs35l56_base)
 {
        int ret;
-       unsigned int devid, revid, otpid, secured;
+       unsigned int devid, revid, otpid, secured, fw_ver;
+       bool fw_missing;
 
        /*
         * When the system is not using a reset_gpio ensure the device is
@@ -652,8 +716,13 @@ int cs35l56_hw_init(struct cs35l56_base *cs35l56_base)
                return ret;
        }
 
-       dev_info(cs35l56_base->dev, "Cirrus Logic CS35L56%s Rev %02X OTP%d\n",
-                cs35l56_base->secured ? "s" : "", cs35l56_base->rev, otpid);
+       ret = cs35l56_read_prot_status(cs35l56_base, &fw_missing, &fw_ver);
+       if (ret)
+               return ret;
+
+       dev_info(cs35l56_base->dev, "Cirrus Logic CS35L56%s Rev %02X OTP%d fw:%d.%d.%d (patched=%u)\n",
+                cs35l56_base->secured ? "s" : "", cs35l56_base->rev, otpid,
+                fw_ver >> 16, (fw_ver >> 8) & 0xff, fw_ver & 0xff, !fw_missing);
 
        /* Wake source and *_BLOCKED interrupts default to unmasked, so mask them */
        regmap_write(cs35l56_base->regmap, CS35L56_IRQ1_MASK_20, 0xffffffff);
@@ -668,6 +737,41 @@ int cs35l56_hw_init(struct cs35l56_base *cs35l56_base)
 }
 EXPORT_SYMBOL_NS_GPL(cs35l56_hw_init, SND_SOC_CS35L56_SHARED);
 
+int cs35l56_get_speaker_id(struct cs35l56_base *cs35l56_base)
+{
+       struct gpio_descs *descs;
+       int speaker_id;
+       int i, ret;
+
+       /* Read the speaker type qualifier from the motherboard GPIOs */
+       descs = gpiod_get_array_optional(cs35l56_base->dev, "spk-id", GPIOD_IN);
+       if (!descs) {
+               return -ENOENT;
+       } else if (IS_ERR(descs)) {
+               ret = PTR_ERR(descs);
+               return dev_err_probe(cs35l56_base->dev, ret, "Failed to get spk-id-gpios\n");
+       }
+
+       speaker_id = 0;
+       for (i = 0; i < descs->ndescs; i++) {
+               ret = gpiod_get_value_cansleep(descs->desc[i]);
+               if (ret < 0) {
+                       dev_err_probe(cs35l56_base->dev, ret, "Failed to read spk-id[%d]\n", i);
+                       goto err;
+               }
+
+               speaker_id |= (ret << i);
+       }
+
+       dev_dbg(cs35l56_base->dev, "Speaker ID = %d\n", speaker_id);
+       ret = speaker_id;
+err:
+       gpiod_put_array(descs);
+
+       return ret;
+}
+EXPORT_SYMBOL_NS_GPL(cs35l56_get_speaker_id, SND_SOC_CS35L56_SHARED);
+
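
cs35l56_get_speaker_id() assembles the ID LSB-first, so GPIO index i contributes bit i of the result. A sketch of just the packing step (helper name hypothetical; the driver reads the levels with gpiod_get_value_cansleep() as above):

/* Pack spk-id GPIO levels LSB-first, mirroring the loop above. */
static int example_pack_spk_id(const int *levels, int n)
{
	int i, id = 0;

	for (i = 0; i < n; i++)
		id |= levels[i] << i;	/* GPIO i -> bit i */

	return id;	/* e.g. levels {1, 0} -> 1; {1, 1} -> 3 */
}
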
 static const u32 cs35l56_bclk_valid_for_pll_freq_table[] = {
        [0x0C] = 128000,
        [0x0F] = 256000,
index 45b4de3eff94ffef7ba6ef8ffcccf11747f74415..6dd0319bc843cf5d1740222e57c380881b4b5b05 100644 (file)
@@ -5,6 +5,7 @@
 // Copyright (C) 2023 Cirrus Logic, Inc. and
 //                    Cirrus Logic International Semiconductor Ltd.
 
+#include <linux/acpi.h>
 #include <linux/completion.h>
 #include <linux/debugfs.h>
 #include <linux/delay.h>
@@ -15,6 +16,7 @@
 #include <linux/module.h>
 #include <linux/pm.h>
 #include <linux/pm_runtime.h>
+#include <linux/property.h>
 #include <linux/regmap.h>
 #include <linux/regulator/consumer.h>
 #include <linux/slab.h>
@@ -59,6 +61,131 @@ static int cs35l56_dspwait_put_volsw(struct snd_kcontrol *kcontrol,
        return snd_soc_put_volsw(kcontrol, ucontrol);
 }
 
+static const unsigned short cs35l56_asp1_mixer_regs[] = {
+       CS35L56_ASP1TX1_INPUT, CS35L56_ASP1TX2_INPUT,
+       CS35L56_ASP1TX3_INPUT, CS35L56_ASP1TX4_INPUT,
+};
+
+static const char * const cs35l56_asp1_mux_control_names[] = {
+       "ASP1 TX1 Source", "ASP1 TX2 Source", "ASP1 TX3 Source", "ASP1 TX4 Source"
+};
+
+static int cs35l56_sync_asp1_mixer_widgets_with_firmware(struct cs35l56_private *cs35l56)
+{
+       struct snd_soc_dapm_context *dapm = snd_soc_component_get_dapm(cs35l56->component);
+       const char *prefix = cs35l56->component->name_prefix;
+       char full_name[SNDRV_CTL_ELEM_ID_NAME_MAXLEN];
+       const char *name;
+       struct snd_kcontrol *kcontrol;
+       struct soc_enum *e;
+       unsigned int val[4];
+       int i, item, ret;
+
+       if (cs35l56->asp1_mixer_widgets_initialized)
+               return 0;
+
+       /*
+        * Resume so we can read the registers from silicon if the regmap
+        * cache has not yet been populated.
+        */
+       ret = pm_runtime_resume_and_get(cs35l56->base.dev);
+       if (ret < 0)
+               return ret;
+
+       /* Wait for firmware download and reboot */
+       cs35l56_wait_dsp_ready(cs35l56);
+
+       ret = regmap_bulk_read(cs35l56->base.regmap, CS35L56_ASP1TX1_INPUT,
+                              val, ARRAY_SIZE(val));
+
+       pm_runtime_mark_last_busy(cs35l56->base.dev);
+       pm_runtime_put_autosuspend(cs35l56->base.dev);
+
+       if (ret) {
+               dev_err(cs35l56->base.dev, "Failed to read ASP1 mixer regs: %d\n", ret);
+               return ret;
+       }
+
+       for (i = 0; i < ARRAY_SIZE(cs35l56_asp1_mux_control_names); ++i) {
+               name = cs35l56_asp1_mux_control_names[i];
+
+               if (prefix) {
+                       snprintf(full_name, sizeof(full_name), "%s %s", prefix, name);
+                       name = full_name;
+               }
+
+               kcontrol = snd_soc_card_get_kcontrol_locked(dapm->card, name);
+               if (!kcontrol) {
+                       dev_warn(cs35l56->base.dev, "Could not find control %s\n", name);
+                       continue;
+               }
+
+               e = (struct soc_enum *)kcontrol->private_value;
+               item = snd_soc_enum_val_to_item(e, val[i] & CS35L56_ASP_TXn_SRC_MASK);
+               snd_soc_dapm_mux_update_power(dapm, kcontrol, item, e, NULL);
+       }
+
+       cs35l56->asp1_mixer_widgets_initialized = true;
+
+       return 0;
+}
+
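
One detail of the sync function above: kcontrols are registered under the card's name_prefix, so the card-level lookup must prepend it. A small sketch of that step, assuming a hypothetical "AMP1" prefix:

/* Build the prefixed name used for snd_soc_card_get_kcontrol_locked(). */
static void example_prefixed_name(char *buf, size_t len,
				  const char *prefix, const char *name)
{
	if (prefix)
		snprintf(buf, len, "%s %s", prefix, name); /* "AMP1 ASP1 TX1 Source" */
	else
		strscpy(buf, name, len);
}
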
+static int cs35l56_dspwait_asp1tx_get(struct snd_kcontrol *kcontrol,
+                                     struct snd_ctl_elem_value *ucontrol)
+{
+       struct snd_soc_component *component = snd_soc_dapm_kcontrol_component(kcontrol);
+       struct cs35l56_private *cs35l56 = snd_soc_component_get_drvdata(component);
+       struct soc_enum *e = (struct soc_enum *)kcontrol->private_value;
+       int index = e->shift_l;
+       unsigned int addr, val;
+       int ret;
+
+       ret = cs35l56_sync_asp1_mixer_widgets_with_firmware(cs35l56);
+       if (ret)
+               return ret;
+
+       addr = cs35l56_asp1_mixer_regs[index];
+       ret = regmap_read(cs35l56->base.regmap, addr, &val);
+       if (ret)
+               return ret;
+
+       val &= CS35L56_ASP_TXn_SRC_MASK;
+       ucontrol->value.enumerated.item[0] = snd_soc_enum_val_to_item(e, val);
+
+       return 0;
+}
+
+static int cs35l56_dspwait_asp1tx_put(struct snd_kcontrol *kcontrol,
+                                     struct snd_ctl_elem_value *ucontrol)
+{
+       struct snd_soc_component *component = snd_soc_dapm_kcontrol_component(kcontrol);
+       struct snd_soc_dapm_context *dapm = snd_soc_dapm_kcontrol_dapm(kcontrol);
+       struct cs35l56_private *cs35l56 = snd_soc_component_get_drvdata(component);
+       struct soc_enum *e = (struct soc_enum *)kcontrol->private_value;
+       int item = ucontrol->value.enumerated.item[0];
+       int index = e->shift_l;
+       unsigned int addr, val;
+       bool changed;
+       int ret;
+
+       ret = cs35l56_sync_asp1_mixer_widgets_with_firmware(cs35l56);
+       if (ret)
+               return ret;
+
+       addr = cs35l56_asp1_mixer_regs[index];
+       val = snd_soc_enum_item_to_val(e, item);
+
+       ret = regmap_update_bits_check(cs35l56->base.regmap, addr,
+                                      CS35L56_ASP_TXn_SRC_MASK, val, &changed);
+       if (ret)
+               return ret;
+
+       if (changed)
+               snd_soc_dapm_mux_update_power(dapm, kcontrol, item, e, NULL);
+
+       return changed;
+}
+
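
Because the enums are re-declared on SND_SOC_NOPM below, the hardware register can no longer come from the enum itself; e->shift_l is repurposed as an index (0..3) into cs35l56_asp1_mixer_regs, which both handlers above use to locate the real ASP1TXn_INPUT register:

/* Sketch: shift_l is an array index here, not a bit shift. */
static unsigned int example_asp1_mux_reg(const struct soc_enum *e)
{
	/* 0 -> ASP1TX1_INPUT ... 3 -> ASP1TX4_INPUT */
	return cs35l56_asp1_mixer_regs[e->shift_l];
}
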
 static DECLARE_TLV_DB_SCALE(vol_tlv, -10000, 25, 0);
 
 static const struct snd_kcontrol_new cs35l56_controls[] = {
@@ -77,40 +204,44 @@ static const struct snd_kcontrol_new cs35l56_controls[] = {
 };
 
 static SOC_VALUE_ENUM_SINGLE_DECL(cs35l56_asp1tx1_enum,
-                                 CS35L56_ASP1TX1_INPUT,
-                                 0, CS35L56_ASP_TXn_SRC_MASK,
+                                 SND_SOC_NOPM,
+                                 0, 0,
                                  cs35l56_tx_input_texts,
                                  cs35l56_tx_input_values);
 
 static const struct snd_kcontrol_new asp1_tx1_mux =
-       SOC_DAPM_ENUM("ASP1TX1 SRC", cs35l56_asp1tx1_enum);
+       SOC_DAPM_ENUM_EXT("ASP1TX1 SRC", cs35l56_asp1tx1_enum,
+                         cs35l56_dspwait_asp1tx_get, cs35l56_dspwait_asp1tx_put);
 
 static SOC_VALUE_ENUM_SINGLE_DECL(cs35l56_asp1tx2_enum,
-                                 CS35L56_ASP1TX2_INPUT,
-                                 0, CS35L56_ASP_TXn_SRC_MASK,
+                                 SND_SOC_NOPM,
+                                 1, 0,
                                  cs35l56_tx_input_texts,
                                  cs35l56_tx_input_values);
 
 static const struct snd_kcontrol_new asp1_tx2_mux =
-       SOC_DAPM_ENUM("ASP1TX2 SRC", cs35l56_asp1tx2_enum);
+       SOC_DAPM_ENUM_EXT("ASP1TX2 SRC", cs35l56_asp1tx2_enum,
+                         cs35l56_dspwait_asp1tx_get, cs35l56_dspwait_asp1tx_put);
 
 static SOC_VALUE_ENUM_SINGLE_DECL(cs35l56_asp1tx3_enum,
-                                 CS35L56_ASP1TX3_INPUT,
-                                 0, CS35L56_ASP_TXn_SRC_MASK,
+                                 SND_SOC_NOPM,
+                                 2, 0,
                                  cs35l56_tx_input_texts,
                                  cs35l56_tx_input_values);
 
 static const struct snd_kcontrol_new asp1_tx3_mux =
-       SOC_DAPM_ENUM("ASP1TX3 SRC", cs35l56_asp1tx3_enum);
+       SOC_DAPM_ENUM_EXT("ASP1TX3 SRC", cs35l56_asp1tx3_enum,
+                         cs35l56_dspwait_asp1tx_get, cs35l56_dspwait_asp1tx_put);
 
 static SOC_VALUE_ENUM_SINGLE_DECL(cs35l56_asp1tx4_enum,
-                                 CS35L56_ASP1TX4_INPUT,
-                                 0, CS35L56_ASP_TXn_SRC_MASK,
+                                 SND_SOC_NOPM,
+                                 3, 0,
                                  cs35l56_tx_input_texts,
                                  cs35l56_tx_input_values);
 
 static const struct snd_kcontrol_new asp1_tx4_mux =
-       SOC_DAPM_ENUM("ASP1TX4 SRC", cs35l56_asp1tx4_enum);
+       SOC_DAPM_ENUM_EXT("ASP1TX4 SRC", cs35l56_asp1tx4_enum,
+                         cs35l56_dspwait_asp1tx_get, cs35l56_dspwait_asp1tx_put);
 
 static SOC_VALUE_ENUM_SINGLE_DECL(cs35l56_sdw1tx1_enum,
                                CS35L56_SWIRE_DP3_CH1_INPUT,
@@ -148,6 +279,21 @@ static SOC_VALUE_ENUM_SINGLE_DECL(cs35l56_sdw1tx4_enum,
 static const struct snd_kcontrol_new sdw1_tx4_mux =
        SOC_DAPM_ENUM("SDW1TX4 SRC", cs35l56_sdw1tx4_enum);
 
+static int cs35l56_asp1_cfg_event(struct snd_soc_dapm_widget *w,
+                                 struct snd_kcontrol *kcontrol, int event)
+{
+       struct snd_soc_component *component = snd_soc_dapm_to_component(w->dapm);
+       struct cs35l56_private *cs35l56 = snd_soc_component_get_drvdata(component);
+
+       switch (event) {
+       case SND_SOC_DAPM_PRE_PMU:
+               /* Override register values set by firmware boot */
+               return cs35l56_force_sync_asp1_registers_from_cache(&cs35l56->base);
+       default:
+               return 0;
+       }
+}
+
 static int cs35l56_play_event(struct snd_soc_dapm_widget *w,
                              struct snd_kcontrol *kcontrol, int event)
 {
@@ -184,6 +330,9 @@ static const struct snd_soc_dapm_widget cs35l56_dapm_widgets[] = {
        SND_SOC_DAPM_REGULATOR_SUPPLY("VDD_B", 0, 0),
        SND_SOC_DAPM_REGULATOR_SUPPLY("VDD_AMP", 0, 0),
 
+       SND_SOC_DAPM_SUPPLY("ASP1 CFG", SND_SOC_NOPM, 0, 0, cs35l56_asp1_cfg_event,
+                           SND_SOC_DAPM_PRE_PMU),
+
        SND_SOC_DAPM_SUPPLY("PLAY", SND_SOC_NOPM, 0, 0, cs35l56_play_event,
                            SND_SOC_DAPM_PRE_PMU | SND_SOC_DAPM_POST_PMU | SND_SOC_DAPM_POST_PMD),
 
@@ -251,6 +400,9 @@ static const struct snd_soc_dapm_route cs35l56_audio_map[] = {
        { "AMP", NULL, "VDD_B" },
        { "AMP", NULL, "VDD_AMP" },
 
+       { "ASP1 Playback", NULL, "ASP1 CFG" },
+       { "ASP1 Capture", NULL, "ASP1 CFG" },
+
        { "ASP1 Playback", NULL, "PLAY" },
        { "SDW1 Playback", NULL, "PLAY" },
 
@@ -650,7 +802,7 @@ static struct snd_soc_dai_driver cs35l56_dai[] = {
        }
 };
 
-static void cs35l56_secure_patch(struct cs35l56_private *cs35l56)
+static void cs35l56_reinit_patch(struct cs35l56_private *cs35l56)
 {
        int ret;
 
@@ -662,19 +814,10 @@ static void cs35l56_secure_patch(struct cs35l56_private *cs35l56)
                cs35l56_mbox_send(&cs35l56->base, CS35L56_MBOX_CMD_AUDIO_REINIT);
 }
 
-static void cs35l56_patch(struct cs35l56_private *cs35l56)
+static void cs35l56_patch(struct cs35l56_private *cs35l56, bool firmware_missing)
 {
-       unsigned int firmware_missing;
        int ret;
 
-       ret = regmap_read(cs35l56->base.regmap, CS35L56_PROTECTION_STATUS, &firmware_missing);
-       if (ret) {
-               dev_err(cs35l56->base.dev, "Failed to read PROTECTION_STATUS: %d\n", ret);
-               return;
-       }
-
-       firmware_missing &= CS35L56_FIRMWARE_MISSING;
-
        /*
         * Disable SoundWire interrupts to prevent race with IRQ work.
         * Setting sdw_irq_no_unmask prevents the handler re-enabling
@@ -747,23 +890,51 @@ static void cs35l56_dsp_work(struct work_struct *work)
        struct cs35l56_private *cs35l56 = container_of(work,
                                                       struct cs35l56_private,
                                                       dsp_work);
+       unsigned int firmware_version;
+       bool firmware_missing;
+       int ret;
 
        if (!cs35l56->base.init_done)
                return;
 
        pm_runtime_get_sync(cs35l56->base.dev);
 
+       ret = cs35l56_read_prot_status(&cs35l56->base, &firmware_missing, &firmware_version);
+       if (ret)
+               goto err;
+
+       /* Populate fw file qualifier with the revision and security state */
+       kfree(cs35l56->dsp.fwf_name);
+       if (firmware_missing) {
+               cs35l56->dsp.fwf_name = kasprintf(GFP_KERNEL, "%02x-dsp1", cs35l56->base.rev);
+       } else {
+               /* Firmware files must match the running firmware version */
+               cs35l56->dsp.fwf_name = kasprintf(GFP_KERNEL,
+                                                 "%02x%s-%06x-dsp1",
+                                                 cs35l56->base.rev,
+                                                 cs35l56->base.secured ? "-s" : "",
+                                                 firmware_version);
+       }
+
+       if (!cs35l56->dsp.fwf_name)
+               goto err;
+
+       dev_dbg(cs35l56->base.dev, "DSP fwf name: '%s' system name: '%s'\n",
+               cs35l56->dsp.fwf_name, cs35l56->dsp.system_name);
+
        /*
-        * When the device is running in secure mode the firmware files can
-        * only contain insecure tunings and therefore we do not need to
-        * shutdown the firmware to apply them and can use the lower cost
-        * reinit sequence instead.
+        * The firmware cannot be patched if it is already running from
+        * patch RAM. In this case the firmware files are versioned to
+        * match the running firmware version and will only contain
+        * tunings. We do not need to shut down the firmware to apply
+        * tunings, so we can use the lower-cost reinit sequence instead.
         */
-       if (cs35l56->base.secured)
-               cs35l56_secure_patch(cs35l56);
+       if (!firmware_missing)
+               cs35l56_reinit_patch(cs35l56);
        else
-               cs35l56_patch(cs35l56);
+               cs35l56_patch(cs35l56, firmware_missing);
 
+err:
        pm_runtime_mark_last_busy(cs35l56->base.dev);
        pm_runtime_put_autosuspend(cs35l56->base.dev);
 }
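
The two kasprintf() formats above produce firmware-file qualifiers like these (rev and version values hypothetical): "b0-dsp1" while firmware is missing and patchable, "b0-123456-dsp1" for a part already running firmware 0x123456, and "b0-s-123456-dsp1" for the secured variant. A condensed sketch of the same logic:

/* Illustrative helper: caller must kfree() the result. */
static char *example_fwf_name(u8 rev, bool secured, bool missing, u32 ver)
{
	if (missing)
		return kasprintf(GFP_KERNEL, "%02x-dsp1", rev);

	return kasprintf(GFP_KERNEL, "%02x%s-%06x-dsp1",
			 rev, secured ? "-s" : "", ver);
}
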
@@ -778,10 +949,19 @@ static int cs35l56_component_probe(struct snd_soc_component *component)
 
        if (!cs35l56->dsp.system_name &&
            (snd_soc_card_get_pci_ssid(component->card, &vendor, &device) == 0)) {
-               cs35l56->dsp.system_name = devm_kasprintf(cs35l56->base.dev,
-                                                         GFP_KERNEL,
-                                                         "%04x%04x",
-                                                         vendor, device);
+               /* Append a speaker qualifier if there is a speaker ID */
+               if (cs35l56->speaker_id >= 0) {
+                       cs35l56->dsp.system_name = devm_kasprintf(cs35l56->base.dev,
+                                                                 GFP_KERNEL,
+                                                                 "%04x%04x-spkid%d",
+                                                                 vendor, device,
+                                                                 cs35l56->speaker_id);
+               } else {
+                       cs35l56->dsp.system_name = devm_kasprintf(cs35l56->base.dev,
+                                                                 GFP_KERNEL,
+                                                                 "%04x%04x",
+                                                                 vendor, device);
+               }
                if (!cs35l56->dsp.system_name)
                        return -ENOMEM;
        }
@@ -799,6 +979,13 @@ static int cs35l56_component_probe(struct snd_soc_component *component)
        debugfs_create_bool("can_hibernate", 0444, debugfs_root, &cs35l56->base.can_hibernate);
        debugfs_create_bool("fw_patched", 0444, debugfs_root, &cs35l56->base.fw_patched);
 
+       /*
+        * The widgets for the ASP1TX mixer can't be initialized
+        * until the firmware has been downloaded and rebooted.
+        */
+       regcache_drop_region(cs35l56->base.regmap, CS35L56_ASP1TX1_INPUT, CS35L56_ASP1TX4_INPUT);
+       cs35l56->asp1_mixer_widgets_initialized = false;
+
        queue_work(cs35l56->dsp_wq, &cs35l56->dsp_work);
 
        return 0;
@@ -809,6 +996,16 @@ static void cs35l56_component_remove(struct snd_soc_component *component)
        struct cs35l56_private *cs35l56 = snd_soc_component_get_drvdata(component);
 
        cancel_work_sync(&cs35l56->dsp_work);
+
+       if (cs35l56->dsp.cs_dsp.booted)
+               wm_adsp_power_down(&cs35l56->dsp);
+
+       wm_adsp2_component_remove(&cs35l56->dsp, component);
+
+       kfree(cs35l56->dsp.fwf_name);
+       cs35l56->dsp.fwf_name = NULL;
+
+       cs35l56->component = NULL;
 }
 
 static int cs35l56_set_bias_level(struct snd_soc_component *component,
@@ -1050,7 +1247,13 @@ static int cs35l56_get_firmware_uid(struct cs35l56_private *cs35l56)
        if (ret < 0)
                return 0;
 
-       cs35l56->dsp.system_name = devm_kstrdup(dev, prop, GFP_KERNEL);
+       /* Append a speaker qualifier if there is a speaker ID */
+       if (cs35l56->speaker_id >= 0)
+               cs35l56->dsp.system_name = devm_kasprintf(dev, GFP_KERNEL, "%s-spkid%d",
+                                                         prop, cs35l56->speaker_id);
+       else
+               cs35l56->dsp.system_name = devm_kstrdup(dev, prop, GFP_KERNEL);
+
        if (cs35l56->dsp.system_name == NULL)
                return -ENOMEM;
 
@@ -1059,12 +1262,101 @@ static int cs35l56_get_firmware_uid(struct cs35l56_private *cs35l56)
        return 0;
 }
 
+/*
+ * Some SoundWire laptops have a spk-id-gpios property, but it points to
+ * the wrong ACPI Device node, so it can't be used to get the GPIO. Try to
+ * find the SDCA node containing the GpioIo resource and add a GPIO
+ * mapping to it.
+ */
+static const struct acpi_gpio_params cs35l56_af01_first_gpio = { 0, 0, false };
+static const struct acpi_gpio_mapping cs35l56_af01_spkid_gpios_mapping[] = {
+       { "spk-id-gpios", &cs35l56_af01_first_gpio, 1 },
+       { }
+};
+
+static void cs35l56_acpi_dev_release_driver_gpios(void *adev)
+{
+       acpi_dev_remove_driver_gpios(adev);
+}
+
+static int cs35l56_try_get_broken_sdca_spkid_gpio(struct cs35l56_private *cs35l56)
+{
+       struct fwnode_handle *af01_fwnode;
+       const union acpi_object *obj;
+       struct gpio_desc *desc;
+       int ret;
+
+       /* Find the SDCA node containing the GpioIo */
+       af01_fwnode = device_get_named_child_node(cs35l56->base.dev, "AF01");
+       if (!af01_fwnode) {
+               dev_dbg(cs35l56->base.dev, "No AF01 node\n");
+               return -ENOENT;
+       }
+
+       ret = acpi_dev_get_property(ACPI_COMPANION(cs35l56->base.dev),
+                                   "spk-id-gpios", ACPI_TYPE_PACKAGE, &obj);
+       if (ret) {
+               dev_dbg(cs35l56->base.dev, "Could not get spk-id-gpios package: %d\n", ret);
+               return -ENOENT;
+       }
+
+       /* The only broken layout we can handle is a 4-element package (one GPIO) */
+       if (obj->package.count != 4) {
+               dev_warn(cs35l56->base.dev, "Unexpected spk-id element count %d\n",
+                        obj->package.count);
+               return -ENOENT;
+       }
+
+       /* Add a GPIO mapping if it doesn't already have one */
+       if (!fwnode_property_present(af01_fwnode, "spk-id-gpios")) {
+               struct acpi_device *adev = to_acpi_device_node(af01_fwnode);
+
+               /*
+                * Can't use devm_acpi_dev_add_driver_gpios() because the
+                * mapping isn't being added to the node pointed to by
+                * ACPI_COMPANION().
+                */
+               ret = acpi_dev_add_driver_gpios(adev, cs35l56_af01_spkid_gpios_mapping);
+               if (ret) {
+                       return dev_err_probe(cs35l56->base.dev, ret,
+                                            "Failed to add gpio mapping to AF01\n");
+               }
+
+               ret = devm_add_action_or_reset(cs35l56->base.dev,
+                                              cs35l56_acpi_dev_release_driver_gpios,
+                                              adev);
+               if (ret)
+                       return ret;
+
+               dev_dbg(cs35l56->base.dev, "Added spk-id-gpios mapping to AF01\n");
+       }
+
+       desc = fwnode_gpiod_get_index(af01_fwnode, "spk-id", 0, GPIOD_IN, NULL);
+       if (IS_ERR(desc)) {
+               ret = PTR_ERR(desc);
+               return dev_err_probe(cs35l56->base.dev, ret, "Get GPIO from AF01 failed\n");
+       }
+
+       ret = gpiod_get_value_cansleep(desc);
+       gpiod_put(desc);
+
+       if (ret < 0) {
+               dev_err_probe(cs35l56->base.dev, ret, "Error reading spk-id GPIO\n");
+               return ret;
+       }
+
+       dev_info(cs35l56->base.dev, "Got spk-id from AF01\n");
+
+       return ret;
+}
+
 int cs35l56_common_probe(struct cs35l56_private *cs35l56)
 {
        int ret;
 
        init_completion(&cs35l56->init_completion);
        mutex_init(&cs35l56->base.irq_lock);
+       cs35l56->speaker_id = -ENOENT;
 
        dev_set_drvdata(cs35l56->base.dev, cs35l56);
 
@@ -1101,6 +1393,15 @@ int cs35l56_common_probe(struct cs35l56_private *cs35l56)
                gpiod_set_value_cansleep(cs35l56->base.reset_gpio, 1);
        }
 
+       ret = cs35l56_get_speaker_id(&cs35l56->base);
+       if (ACPI_COMPANION(cs35l56->base.dev) && cs35l56->sdw_peripheral && (ret == -ENOENT))
+               ret = cs35l56_try_get_broken_sdca_spkid_gpio(cs35l56);
+
+       if ((ret < 0) && (ret != -ENOENT))
+               goto err;
+
+       cs35l56->speaker_id = ret;
+
        ret = cs35l56_get_firmware_uid(cs35l56);
        if (ret != 0)
                goto err;
@@ -1152,11 +1453,9 @@ int cs35l56_init(struct cs35l56_private *cs35l56)
        if (ret < 0)
                return ret;
 
-       /* Populate the DSP information with the revision and security state */
-       cs35l56->dsp.part = devm_kasprintf(cs35l56->base.dev, GFP_KERNEL, "cs35l56%s-%02x",
-                                          cs35l56->base.secured ? "s" : "", cs35l56->base.rev);
-       if (!cs35l56->dsp.part)
-               return -ENOMEM;
+       ret = cs35l56_set_patch(&cs35l56->base);
+       if (ret)
+               return ret;
 
        if (!cs35l56->base.reset_gpio) {
                dev_dbg(cs35l56->base.dev, "No reset gpio: using soft reset\n");
@@ -1190,10 +1489,6 @@ post_soft_reset:
        if (ret)
                return ret;
 
-       ret = cs35l56_set_patch(&cs35l56->base);
-       if (ret)
-               return ret;
-
        /* Registers could be dirty after soft reset or SoundWire enumeration */
        regcache_sync(cs35l56->base.regmap);
 
index 8159c3e217d936c02baf88c5659a99e4f3159ddd..b000e7365e4065eaffc3b3ea2aa2714b9acccdfd 100644 (file)
@@ -44,12 +44,14 @@ struct cs35l56_private {
        bool sdw_attached;
        struct completion init_completion;
 
+       int speaker_id;
        u32 rx_mask;
        u32 tx_mask;
        u8 asp_slot_width;
        u8 asp_slot_count;
        bool tdm_mode;
        bool sysclk_set;
+       bool asp1_mixer_widgets_initialized;
        u8 old_sdw_clock_scale;
 };
 
index 6a64681767de8122d06bee23b5a21674d930bf57..a97ccb512deba86305ff433fa5b639fdfbb8dcbe 100644 (file)
@@ -2257,7 +2257,10 @@ static int cs42l43_codec_probe(struct platform_device *pdev)
        pm_runtime_use_autosuspend(priv->dev);
        pm_runtime_set_active(priv->dev);
        pm_runtime_get_noresume(priv->dev);
-       devm_pm_runtime_enable(priv->dev);
+
+       ret = devm_pm_runtime_enable(priv->dev);
+       if (ret)
+               goto err_pm;
 
        for (i = 0; i < ARRAY_SIZE(cs42l43_irqs); i++) {
                ret = cs42l43_request_irq(priv, dom, cs42l43_irqs[i].name,
@@ -2333,8 +2336,47 @@ static int cs42l43_codec_runtime_resume(struct device *dev)
        return 0;
 }
 
-static DEFINE_RUNTIME_DEV_PM_OPS(cs42l43_codec_pm_ops, NULL,
-                                cs42l43_codec_runtime_resume, NULL);
+static int cs42l43_codec_suspend(struct device *dev)
+{
+       struct cs42l43 *cs42l43 = dev_get_drvdata(dev);
+
+       disable_irq(cs42l43->irq);
+
+       return 0;
+}
+
+static int cs42l43_codec_suspend_noirq(struct device *dev)
+{
+       struct cs42l43 *cs42l43 = dev_get_drvdata(dev);
+
+       enable_irq(cs42l43->irq);
+
+       return 0;
+}
+
+static int cs42l43_codec_resume(struct device *dev)
+{
+       struct cs42l43 *cs42l43 = dev_get_drvdata(dev);
+
+       enable_irq(cs42l43->irq);
+
+       return 0;
+}
+
+static int cs42l43_codec_resume_noirq(struct device *dev)
+{
+       struct cs42l43 *cs42l43 = dev_get_drvdata(dev);
+
+       disable_irq(cs42l43->irq);
+
+       return 0;
+}
+
+static const struct dev_pm_ops cs42l43_codec_pm_ops = {
+       SYSTEM_SLEEP_PM_OPS(cs42l43_codec_suspend, cs42l43_codec_resume)
+       NOIRQ_SYSTEM_SLEEP_PM_OPS(cs42l43_codec_suspend_noirq, cs42l43_codec_resume_noirq)
+       RUNTIME_PM_OPS(NULL, cs42l43_codec_runtime_resume, NULL)
+};
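
Read together, the four callbacks above give the cs42l43 IRQ this lifecycle across a system sleep (ordering per the PM core; the rationale is an interpretation, not stated in the patch):

/*
 * suspend()        -> disable_irq()  no handling while devices quiesce
 * suspend_noirq()  -> enable_irq()   IRQ re-armed for the sleep itself
 *     ...system asleep...
 * resume_noirq()   -> disable_irq()  held off until devices are ready
 * resume()         -> enable_irq()   normal operation resumes
 */
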
 
 static const struct platform_device_id cs42l43_codec_id_table[] = {
        { "cs42l43-codec", },
old mode 100755 (executable)
new mode 100644 (file)
index fa890f6..cbcd02e
@@ -45,6 +45,82 @@ struct es8326_priv {
        int jack_remove_retry;
 };
 
+static int es8326_crosstalk1_get(struct snd_kcontrol *kcontrol,
+               struct snd_ctl_elem_value *ucontrol)
+{
+       struct snd_soc_component *component = snd_kcontrol_chip(kcontrol);
+       struct es8326_priv *es8326 = snd_soc_component_get_drvdata(component);
+       unsigned int crosstalk_h, crosstalk_l;
+       unsigned int crosstalk;
+
+       regmap_read(es8326->regmap, ES8326_DAC_RAMPRATE, &crosstalk_h);
+       regmap_read(es8326->regmap, ES8326_DAC_CROSSTALK, &crosstalk_l);
+       crosstalk_h &= 0x20;
+       crosstalk_l &= 0xf0;
+       crosstalk = crosstalk_h >> 1 | crosstalk_l >> 4;
+       ucontrol->value.integer.value[0] = crosstalk;
+
+       return 0;
+}
+
+static int es8326_crosstalk1_set(struct snd_kcontrol *kcontrol,
+               struct snd_ctl_elem_value *ucontrol)
+{
+       struct snd_soc_component *component = snd_kcontrol_chip(kcontrol);
+       struct es8326_priv *es8326 = snd_soc_component_get_drvdata(component);
+       unsigned int crosstalk_h, crosstalk_l;
+       unsigned int crosstalk;
+
+       crosstalk = ucontrol->value.integer.value[0];
+       regmap_read(es8326->regmap, ES8326_DAC_CROSSTALK, &crosstalk_l);
+       crosstalk_h = (crosstalk & 0x10) << 1;
+       crosstalk_l &= 0x0f;
+       crosstalk_l |= (crosstalk & 0x0f) << 4;
+       regmap_update_bits(es8326->regmap, ES8326_DAC_RAMPRATE,
+                       0x20, crosstalk_h);
+       regmap_write(es8326->regmap, ES8326_DAC_CROSSTALK, crosstalk_l);
+
+       return 0;
+}
+
+static int es8326_crosstalk2_get(struct snd_kcontrol *kcontrol,
+               struct snd_ctl_elem_value *ucontrol)
+{
+       struct snd_soc_component *component = snd_kcontrol_chip(kcontrol);
+       struct es8326_priv *es8326 = snd_soc_component_get_drvdata(component);
+       unsigned int crosstalk_h, crosstalk_l;
+       unsigned int crosstalk;
+
+       regmap_read(es8326->regmap, ES8326_DAC_RAMPRATE, &crosstalk_h);
+       regmap_read(es8326->regmap, ES8326_DAC_CROSSTALK, &crosstalk_l);
+       crosstalk_h &= 0x10;
+       crosstalk_l &= 0x0f;
+       crosstalk = crosstalk_h | crosstalk_l;
+       ucontrol->value.integer.value[0] = crosstalk;
+
+       return 0;
+}
+
+static int es8326_crosstalk2_set(struct snd_kcontrol *kcontrol,
+               struct snd_ctl_elem_value *ucontrol)
+{
+       struct snd_soc_component *component = snd_kcontrol_chip(kcontrol);
+       struct es8326_priv *es8326 = snd_soc_component_get_drvdata(component);
+       unsigned int crosstalk_h, crosstalk_l;
+       unsigned int crosstalk;
+
+       crosstalk = ucontrol->value.integer.value[0];
+       regmap_read(es8326->regmap, ES8326_DAC_CROSSTALK, &crosstalk_l);
+       crosstalk_h = crosstalk & 0x10;
+       crosstalk_l &= 0xf0;
+       crosstalk_l |= crosstalk & 0x0f;
+       regmap_update_bits(es8326->regmap, ES8326_DAC_RAMPRATE,
+                       0x10, crosstalk_h);
+       regmap_write(es8326->regmap, ES8326_DAC_CROSSTALK, crosstalk_l);
+
+       return 0;
+}
+
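
Each CROSSTALK control above is a 5-bit value split across two registers; only the placement of bit 4 differs between the two controls. A sketch of the CROSSTALK1 packing with a worked (hypothetical) value:

/* Mirror of the CROSSTALK1 put handler's bit packing. */
static unsigned int example_ct1_ramprate_bits(unsigned int ct)
{
	return (ct & 0x10) << 1;	/* control bit 4 -> DAC_RAMPRATE bit 5 */
}

static unsigned int example_ct1_crosstalk_bits(unsigned int ct)
{
	return (ct & 0x0f) << 4;	/* control bits 3..0 -> DAC_CROSSTALK bits 7..4 */
}

/*
 * e.g. ct = 0x15: DAC_RAMPRATE gets 0x20, DAC_CROSSTALK high nibble 0x50.
 * CROSSTALK2 instead keeps bit 4 at DAC_RAMPRATE bit 4 and the low
 * nibble at DAC_CROSSTALK bits 3..0.
 */
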
 static const SNDRV_CTL_TLVD_DECLARE_DB_SCALE(dac_vol_tlv, -9550, 50, 0);
 static const SNDRV_CTL_TLVD_DECLARE_DB_SCALE(adc_vol_tlv, -9550, 50, 0);
 static const SNDRV_CTL_TLVD_DECLARE_DB_SCALE(adc_analog_pga_tlv, 0, 300, 0);
@@ -102,6 +178,10 @@ static const struct snd_kcontrol_new es8326_snd_controls[] = {
        SOC_SINGLE_TLV("ALC Capture Target Level", ES8326_ALC_LEVEL,
                        0, 0x0f, 0, drc_target_tlv),
 
+       SOC_SINGLE_EXT("CROSSTALK1", SND_SOC_NOPM, 0, 31, 0,
+                       es8326_crosstalk1_get, es8326_crosstalk1_set),
+       SOC_SINGLE_EXT("CROSSTALK2", SND_SOC_NOPM, 0, 31, 0,
+                       es8326_crosstalk2_get, es8326_crosstalk2_set),
 };
 
 static const struct snd_soc_dapm_widget es8326_dapm_widgets[] = {
@@ -117,12 +197,6 @@ static const struct snd_soc_dapm_widget es8326_dapm_widgets[] = {
        SND_SOC_DAPM_AIF_OUT("I2S OUT", "I2S1 Capture", 0, SND_SOC_NOPM, 0, 0),
        SND_SOC_DAPM_AIF_IN("I2S IN", "I2S1 Playback", 0, SND_SOC_NOPM, 0, 0),
 
-       /* ADC Digital Mute */
-       SND_SOC_DAPM_PGA("ADC L1", ES8326_ADC_MUTE, 0, 1, NULL, 0),
-       SND_SOC_DAPM_PGA("ADC R1", ES8326_ADC_MUTE, 1, 1, NULL, 0),
-       SND_SOC_DAPM_PGA("ADC L2", ES8326_ADC_MUTE, 2, 1, NULL, 0),
-       SND_SOC_DAPM_PGA("ADC R2", ES8326_ADC_MUTE, 3, 1, NULL, 0),
-
        /* Analog Power Supply */
        SND_SOC_DAPM_DAC("Right DAC", NULL, ES8326_ANA_PDN, 0, 1),
        SND_SOC_DAPM_DAC("Left DAC", NULL, ES8326_ANA_PDN, 1, 1),
@@ -142,15 +216,10 @@ static const struct snd_soc_dapm_widget es8326_dapm_widgets[] = {
 };
 
 static const struct snd_soc_dapm_route es8326_dapm_routes[] = {
-       {"ADC L1", NULL, "MIC1"},
-       {"ADC R1", NULL, "MIC2"},
-       {"ADC L2", NULL, "MIC3"},
-       {"ADC R2", NULL, "MIC4"},
-
-       {"ADC L", NULL, "ADC L1"},
-       {"ADC R", NULL, "ADC R1"},
-       {"ADC L", NULL, "ADC L2"},
-       {"ADC R", NULL, "ADC R2"},
+       {"ADC L", NULL, "MIC1"},
+       {"ADC R", NULL, "MIC2"},
+       {"ADC L", NULL, "MIC3"},
+       {"ADC R", NULL, "MIC4"},
 
        {"I2S OUT", NULL, "ADC L"},
        {"I2S OUT", NULL, "ADC R"},
@@ -440,10 +509,16 @@ static int es8326_mute(struct snd_soc_dai *dai, int mute, int direction)
        unsigned int offset_l, offset_r;
 
        if (mute) {
-               regmap_write(es8326->regmap, ES8326_HP_CAL, ES8326_HP_OFF);
-               regmap_update_bits(es8326->regmap, ES8326_DAC_MUTE,
-                               ES8326_MUTE_MASK, ES8326_MUTE);
-               regmap_write(es8326->regmap, ES8326_HP_DRIVER, 0xf0);
+               if (direction == SNDRV_PCM_STREAM_PLAYBACK) {
+                       regmap_write(es8326->regmap, ES8326_HP_CAL, ES8326_HP_OFF);
+                       regmap_update_bits(es8326->regmap, ES8326_DAC_MUTE,
+                                       ES8326_MUTE_MASK, ES8326_MUTE);
+                       regmap_update_bits(es8326->regmap, ES8326_HP_DRIVER_REF,
+                                       0x30, 0x00);
+               } else {
+                       regmap_update_bits(es8326->regmap, ES8326_ADC_MUTE,
+                                       0x0F, 0x0F);
+               }
        } else {
                if (!es8326->calibrated) {
                        regmap_write(es8326->regmap, ES8326_HP_CAL, ES8326_HP_FORCE_CAL);
@@ -456,11 +531,22 @@ static int es8326_mute(struct snd_soc_dai *dai, int mute, int direction)
                        regmap_write(es8326->regmap, ES8326_HPR_OFFSET_INI, offset_r);
                        es8326->calibrated = true;
                }
-               regmap_write(es8326->regmap, ES8326_HP_DRIVER, 0xa1);
-               regmap_write(es8326->regmap, ES8326_HP_VOL, 0x91);
-               regmap_write(es8326->regmap, ES8326_HP_CAL, ES8326_HP_ON);
-               regmap_update_bits(es8326->regmap, ES8326_DAC_MUTE,
-                               ES8326_MUTE_MASK, ~(ES8326_MUTE));
+               if (direction == SNDRV_PCM_STREAM_PLAYBACK) {
+                       regmap_update_bits(es8326->regmap, ES8326_DAC_DSM, 0x01, 0x01);
+                       usleep_range(1000, 5000);
+                       regmap_update_bits(es8326->regmap, ES8326_DAC_DSM, 0x01, 0x00);
+                       usleep_range(1000, 5000);
+                       regmap_update_bits(es8326->regmap, ES8326_HP_DRIVER_REF, 0x30, 0x20);
+                       regmap_update_bits(es8326->regmap, ES8326_HP_DRIVER_REF, 0x30, 0x30);
+                       regmap_write(es8326->regmap, ES8326_HP_DRIVER, 0xa1);
+                       regmap_write(es8326->regmap, ES8326_HP_CAL, ES8326_HP_ON);
+                       regmap_update_bits(es8326->regmap, ES8326_DAC_MUTE,
+                                       ES8326_MUTE_MASK, ~(ES8326_MUTE));
+               } else {
+                       msleep(300);
+                       regmap_update_bits(es8326->regmap, ES8326_ADC_MUTE,
+                                       0x0F, 0x00);
+               }
        }
        return 0;
 }
@@ -477,23 +563,20 @@ static int es8326_set_bias_level(struct snd_soc_component *codec,
                if (ret)
                        return ret;
 
-               regmap_update_bits(es8326->regmap, ES8326_DAC_DSM, 0x01, 0x00);
+               regmap_update_bits(es8326->regmap, ES8326_RESET, 0x02, 0x02);
+               usleep_range(5000, 10000);
                regmap_write(es8326->regmap, ES8326_INTOUT_IO, es8326->interrupt_clk);
                regmap_write(es8326->regmap, ES8326_SDINOUT1_IO,
                            (ES8326_IO_DMIC_CLK << ES8326_SDINOUT1_SHIFT));
-               regmap_write(es8326->regmap, ES8326_VMIDSEL, 0x0E);
                regmap_write(es8326->regmap, ES8326_PGA_PDN, 0x40);
                regmap_write(es8326->regmap, ES8326_ANA_PDN, 0x00);
                regmap_update_bits(es8326->regmap,  ES8326_CLK_CTL, 0x20, 0x20);
-
-               regmap_update_bits(es8326->regmap, ES8326_RESET,
-                               ES8326_CSM_ON, ES8326_CSM_ON);
+               regmap_update_bits(es8326->regmap, ES8326_RESET, 0x02, 0x00);
                break;
        case SND_SOC_BIAS_PREPARE:
                break;
        case SND_SOC_BIAS_STANDBY:
                regmap_write(es8326->regmap, ES8326_ANA_PDN, 0x3b);
-               regmap_write(es8326->regmap, ES8326_VMIDSEL, 0x00);
                regmap_update_bits(es8326->regmap, ES8326_CLK_CTL, 0x20, 0x00);
                regmap_write(es8326->regmap, ES8326_SDINOUT1_IO, ES8326_IO_INPUT);
                break;
@@ -513,7 +596,7 @@ static const struct snd_soc_dai_ops es8326_ops = {
        .set_fmt = es8326_set_dai_fmt,
        .set_sysclk = es8326_set_dai_sysclk,
        .mute_stream = es8326_mute,
-       .no_capture_mute = 1,
+       .no_capture_mute = 0,
 };
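
This flag change works together with the es8326_mute() rework above: with no_capture_mute cleared, ASoC now calls mute_stream for capture streams too, and the new capture branch toggles the four ADC channels via ES8326_ADC_MUTE bits 3..0. Minimal sketch of that capture path (helper name hypothetical; mask and values are from the hunk above):

/* Mute or unmute all four ES8326 ADC channels. */
static void example_es8326_adc_mute(struct regmap *map, bool mute)
{
	regmap_update_bits(map, ES8326_ADC_MUTE, 0x0F, mute ? 0x0F : 0x00);
}
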
 
 static struct snd_soc_dai_driver es8326_dai = {
@@ -672,6 +755,8 @@ static void es8326_jack_detect_handler(struct work_struct *work)
                        es8326->hp = 0;
                }
                regmap_update_bits(es8326->regmap, ES8326_HPDET_TYPE, 0x03, 0x01);
+               regmap_write(es8326->regmap, ES8326_SYS_BIAS, 0x0a);
+               regmap_update_bits(es8326->regmap, ES8326_HP_DRIVER_REF, 0x0f, 0x03);
                /*
                 * Invert the HPJACK_POL bit to trigger one IRQ and double-check the HP removal event
                 */
@@ -695,8 +780,11 @@ static void es8326_jack_detect_handler(struct work_struct *work)
                         * Don't report jack status.
                         */
                        regmap_update_bits(es8326->regmap, ES8326_HPDET_TYPE, 0x03, 0x01);
+                       es8326_enable_micbias(es8326->component);
                        usleep_range(50000, 70000);
                        regmap_update_bits(es8326->regmap, ES8326_HPDET_TYPE, 0x03, 0x00);
+                       regmap_write(es8326->regmap, ES8326_SYS_BIAS, 0x1f);
+                       regmap_update_bits(es8326->regmap, ES8326_HP_DRIVER_REF, 0x0f, 0x08);
                        queue_delayed_work(system_wq, &es8326->jack_detect_work,
                                        msecs_to_jiffies(400));
                        es8326->hp = 1;
@@ -736,13 +824,10 @@ exit:
 static irqreturn_t es8326_irq(int irq, void *dev_id)
 {
        struct es8326_priv *es8326 = dev_id;
-       struct snd_soc_component *comp = es8326->component;
 
        if (!es8326->jack)
                goto out;
 
-       es8326_enable_micbias(comp);
-
        if (es8326->jack->status & SND_JACK_HEADSET)
                queue_delayed_work(system_wq, &es8326->jack_detect_work,
                                   msecs_to_jiffies(10));
@@ -766,14 +851,14 @@ static int es8326_calibrate(struct snd_soc_component *component)
        if ((es8326->version == ES8326_VERSION_B) && (es8326->calibrated == false)) {
                dev_dbg(component->dev, "ES8326_VERSION_B, calibrating\n");
                regmap_write(es8326->regmap, ES8326_CLK_INV, 0xc0);
-               regmap_write(es8326->regmap, ES8326_CLK_DIV1, 0x01);
+               regmap_write(es8326->regmap, ES8326_CLK_DIV1, 0x03);
                regmap_write(es8326->regmap, ES8326_CLK_DLL, 0x30);
                regmap_write(es8326->regmap, ES8326_CLK_MUX, 0xed);
                regmap_write(es8326->regmap, ES8326_CLK_DAC_SEL, 0x08);
                regmap_write(es8326->regmap, ES8326_CLK_TRI, 0xc1);
                regmap_write(es8326->regmap, ES8326_DAC_MUTE, 0x03);
                regmap_write(es8326->regmap, ES8326_ANA_VSEL, 0x7f);
-               regmap_write(es8326->regmap, ES8326_VMIDLOW, 0x03);
+               regmap_write(es8326->regmap, ES8326_VMIDLOW, 0x23);
                regmap_write(es8326->regmap, ES8326_DAC2HPMIX, 0x88);
                usleep_range(15000, 20000);
                regmap_write(es8326->regmap, ES8326_HP_OFFSET_CAL, 0x8c);
@@ -814,13 +899,13 @@ static int es8326_resume(struct snd_soc_component *component)
        /* reset internal clock state */
        regmap_write(es8326->regmap, ES8326_RESET, 0x1f);
        regmap_write(es8326->regmap, ES8326_VMIDSEL, 0x0E);
+       regmap_write(es8326->regmap, ES8326_ANA_LP, 0xf0);
        usleep_range(10000, 15000);
        regmap_write(es8326->regmap, ES8326_HPJACK_TIMER, 0xe9);
-       regmap_write(es8326->regmap, ES8326_ANA_MICBIAS, 0x4b);
+       regmap_write(es8326->regmap, ES8326_ANA_MICBIAS, 0xcb);
        /* set headphone default type and detect pin */
        regmap_write(es8326->regmap, ES8326_HPDET_TYPE, 0x83);
        regmap_write(es8326->regmap, ES8326_CLK_RESAMPLE, 0x05);
-       regmap_write(es8326->regmap, ES8326_HP_MISC, 0x30);
 
        /* set internal oscillator as clock source of headphone cp */
        regmap_write(es8326->regmap, ES8326_CLK_DIV_CPC, 0x89);
@@ -828,14 +913,15 @@ static int es8326_resume(struct snd_soc_component *component)
        /* clock manager reset release */
        regmap_write(es8326->regmap, ES8326_RESET, 0x17);
        /* set headphone detection as half scan mode */
-       regmap_write(es8326->regmap, ES8326_HP_MISC, 0x30);
+       regmap_write(es8326->regmap, ES8326_HP_MISC, 0x3d);
        regmap_write(es8326->regmap, ES8326_PULLUP_CTL, 0x00);
 
        /* enable headphone driver */
+       regmap_write(es8326->regmap, ES8326_HP_VOL, 0xc4);
        regmap_write(es8326->regmap, ES8326_HP_DRIVER, 0xa7);
        usleep_range(2000, 5000);
-       regmap_write(es8326->regmap, ES8326_HP_DRIVER_REF, 0xa3);
-       regmap_write(es8326->regmap, ES8326_HP_DRIVER_REF, 0xb3);
+       regmap_write(es8326->regmap, ES8326_HP_DRIVER_REF, 0x23);
+       regmap_write(es8326->regmap, ES8326_HP_DRIVER_REF, 0x33);
        regmap_write(es8326->regmap, ES8326_HP_DRIVER, 0xa1);
 
        regmap_write(es8326->regmap, ES8326_CLK_INV, 0x00);
@@ -844,6 +930,8 @@ static int es8326_resume(struct snd_soc_component *component)
        regmap_write(es8326->regmap, ES8326_CLK_CAL_TIME, 0x00);
        /* calibrate for B version */
        es8326_calibrate(component);
+       regmap_write(es8326->regmap, ES8326_DAC_CROSSTALK, 0xaa);
+       regmap_write(es8326->regmap, ES8326_DAC_RAMPRATE, 0x00);
        /* turn off headphone out */
        regmap_write(es8326->regmap, ES8326_HP_CAL, 0x00);
        /* set ADC and DAC in low power mode */
@@ -856,6 +944,14 @@ static int es8326_resume(struct snd_soc_component *component)
        regmap_write(es8326->regmap, ES8326_DAC_DSM, 0x08);
        regmap_write(es8326->regmap, ES8326_DAC_VPPSCALE, 0x15);
 
+       regmap_write(es8326->regmap, ES8326_HPDET_TYPE, 0x80 |
+                       ((es8326->version == ES8326_VERSION_B) ?
+                       (ES8326_HP_DET_SRC_PIN9 | es8326->jack_pol) :
+                       (ES8326_HP_DET_SRC_PIN9 | es8326->jack_pol | 0x04)));
+       usleep_range(5000, 10000);
+       es8326_enable_micbias(es8326->component);
+       usleep_range(50000, 70000);
+       regmap_update_bits(es8326->regmap, ES8326_HPDET_TYPE, 0x03, 0x00);
        regmap_write(es8326->regmap, ES8326_INT_SOURCE,
                    (ES8326_INT_SRC_PIN9 | ES8326_INT_SRC_BUTTON));
        regmap_write(es8326->regmap, ES8326_INTOUT_IO,
@@ -864,7 +960,7 @@ static int es8326_resume(struct snd_soc_component *component)
                    (ES8326_IO_DMIC_CLK << ES8326_SDINOUT1_SHIFT));
        regmap_write(es8326->regmap, ES8326_SDINOUT23_IO, ES8326_IO_INPUT);
 
-       regmap_write(es8326->regmap, ES8326_ANA_PDN, 0x3b);
+       regmap_write(es8326->regmap, ES8326_ANA_PDN, 0x00);
        regmap_write(es8326->regmap, ES8326_RESET, ES8326_CSM_ON);
        regmap_update_bits(es8326->regmap, ES8326_PGAGAIN, ES8326_MIC_SEL_MASK,
                           ES8326_MIC1_SEL);
@@ -872,11 +968,7 @@ static int es8326_resume(struct snd_soc_component *component)
        regmap_update_bits(es8326->regmap, ES8326_DAC_MUTE, ES8326_MUTE_MASK,
                           ES8326_MUTE);
 
-       regmap_write(es8326->regmap, ES8326_HPDET_TYPE, 0x80 |
-                       ((es8326->version == ES8326_VERSION_B) ?
-                       (ES8326_HP_DET_SRC_PIN9 | es8326->jack_pol) :
-                       (ES8326_HP_DET_SRC_PIN9 | es8326->jack_pol | 0x04)));
-       regmap_write(es8326->regmap, ES8326_HP_VOL, 0x11);
+       regmap_write(es8326->regmap, ES8326_ADC_MUTE, 0x0f);
 
        es8326->jack_remove_retry = 0;
        es8326->hp = 0;
index 90a08351d6acd043b42a6ebed574591a3defc6a0..4234bbb900c4530c6746fab5f75fcb0b049e4fdb 100644 (file)
@@ -72,6 +72,7 @@
 #define ES8326_DAC_VOL         0x50
 #define ES8326_DRC_RECOVERY    0x53
 #define ES8326_DRC_WINSIZE     0x54
+#define ES8326_DAC_CROSSTALK   0x55
 #define ES8326_HPJACK_TIMER    0x56
 #define ES8326_HPDET_TYPE      0x57
 #define ES8326_INT_SOURCE      0x58
 #define ES8326_MUTE (3 << 0)
 
 /* ES8326_CLK_CTL */
-#define ES8326_CLK_ON (0x7f << 0)
+#define ES8326_CLK_ON (0x7e << 0)
 #define ES8326_CLK_OFF (0 << 0)
 
 /* ES8326_CLK_INV */
index 7e21cec3c2fb97a9be518b4316cdeafae2cf0776..6ce309980cd10e200dc62a1941b07f6f7728d3cd 100644 (file)
@@ -1584,7 +1584,6 @@ static int wsa_macro_enable_interpolator(struct snd_soc_dapm_widget *w,
        u16 gain_reg;
        u16 reg;
        int val;
-       int offset_val = 0;
        struct wsa_macro *wsa = snd_soc_component_get_drvdata(component);
 
        if (w->shift == WSA_MACRO_COMP1) {
@@ -1623,10 +1622,8 @@ static int wsa_macro_enable_interpolator(struct snd_soc_dapm_widget *w,
                                        CDC_WSA_RX1_RX_PATH_MIX_SEC0,
                                        CDC_WSA_RX_PGA_HALF_DB_MASK,
                                        CDC_WSA_RX_PGA_HALF_DB_ENABLE);
-                       offset_val = -2;
                }
                val = snd_soc_component_read(component, gain_reg);
-               val += offset_val;
                snd_soc_component_write(component, gain_reg, val);
                wsa_macro_config_ear_spkr_gain(component, wsa,
                                                event, gain_reg);
@@ -1654,10 +1651,6 @@ static int wsa_macro_enable_interpolator(struct snd_soc_dapm_widget *w,
                                        CDC_WSA_RX1_RX_PATH_MIX_SEC0,
                                        CDC_WSA_RX_PGA_HALF_DB_MASK,
                                        CDC_WSA_RX_PGA_HALF_DB_DISABLE);
-                       offset_val = 2;
-                       val = snd_soc_component_read(component, gain_reg);
-                       val += offset_val;
-                       snd_soc_component_write(component, gain_reg, val);
                }
                wsa_macro_config_ear_spkr_gain(component, wsa,
                                                event, gain_reg);
index b9f19fbd2911453803f07413fa33d2c947730ae2..b24d6472ad5fc91b43179a41b4935b53ca44c95f 100644 (file)
@@ -3884,7 +3884,7 @@ static inline int madera_set_fll_clks(struct madera_fll *fll, int base, bool ena
        return madera_set_fll_clks_reg(fll, ena,
                                       base + MADERA_FLL_CONTROL_6_OFFS,
                                       MADERA_FLL1_REFCLK_SRC_MASK,
-                                      MADERA_FLL1_REFCLK_DIV_SHIFT);
+                                      MADERA_FLL1_REFCLK_SRC_SHIFT);
 }
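
The one-line madera fix above matters because these helpers position the value inside the masked field. A plausible shape for madera_set_fll_clks_reg() (assumed; not shown in this hunk) makes the hazard clear: pairing the SRC mask with the DIV shift would have written the source select into the divider bits.

/* Assumed shape of the field update: the shift must match the mask. */
static int example_set_refclk_src(struct regmap *map, unsigned int reg,
				  unsigned int src)
{
	return regmap_update_bits(map, reg, MADERA_FLL1_REFCLK_SRC_MASK,
				  src << MADERA_FLL1_REFCLK_SRC_SHIFT);
}
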
 
 static inline int madera_set_fllao_clks(struct madera_fll *fll, int base, bool ena)
index 5150d6ee374810f34a881dd33d4ea57187e13522..20191a4473c2d2c7b4ae6bddf34109a403b84e3d 100644 (file)
@@ -3317,6 +3317,7 @@ static void rt5645_jack_detect_work(struct work_struct *work)
                                    report, SND_JACK_HEADPHONE);
                snd_soc_jack_report(rt5645->mic_jack,
                                    report, SND_JACK_MICROPHONE);
+               mutex_unlock(&rt5645->jd_mutex);
                return;
        case 4:
                val = snd_soc_component_read(rt5645->component, RT5645_A_JD_CTRL1) & 0x0020;
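
The mutex_unlock() added above plugs a lock leak: the early return after reporting the jack status previously left jd_mutex held. A sketch of the invariant being restored (names and predicate are illustrative):

/* Every exit from the locked region must drop the lock. */
static void example_locked_path(struct mutex *lock, bool early_exit)
{
	mutex_lock(lock);

	if (early_exit) {
		mutex_unlock(lock);	/* the line this hunk adds */
		return;
	}

	/* ... remaining work under the lock ... */
	mutex_unlock(lock);
}
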
@@ -3692,6 +3693,11 @@ static const struct rt5645_platform_data jd_mode3_monospk_platform_data = {
        .mono_speaker = true,
 };
 
+static const struct rt5645_platform_data jd_mode3_inv_data = {
+       .jd_mode = 3,
+       .inv_jd1_1 = true,
+};
+
 static const struct rt5645_platform_data jd_mode3_platform_data = {
        .jd_mode = 3,
 };
@@ -3837,6 +3843,16 @@ static const struct dmi_system_id dmi_platform_data[] = {
                  DMI_EXACT_MATCH(DMI_BOARD_VENDOR, "AMI Corporation"),
                  DMI_EXACT_MATCH(DMI_BOARD_NAME, "Cherry Trail CR"),
                  DMI_EXACT_MATCH(DMI_BOARD_VERSION, "Default string"),
+                 /*
+                  * Above strings are too generic, LattePanda BIOS versions for
+                  * all 4 hw revisions are:
+                  * DF-BI-7-S70CR100-*
+                  * DF-BI-7-S70CR110-*
+                  * DF-BI-7-S70CR200-*
+                  * LP-BS-7-S70CR700-*
+                  * Do a partial match for S70CR to avoid false positive matches.
+                  */
+                 DMI_MATCH(DMI_BIOS_VERSION, "S70CR"),
                },
                .driver_data = (void *)&lattepanda_board_platform_data,
        },
@@ -3871,6 +3887,16 @@ static const struct dmi_system_id dmi_platform_data[] = {
                },
                .driver_data = (void *)&intel_braswell_platform_data,
        },
+       {
+               .ident = "Meegopad T08",
+               .matches = {
+                       DMI_MATCH(DMI_SYS_VENDOR, "Default string"),
+                       DMI_MATCH(DMI_PRODUCT_NAME, "Default string"),
+                       DMI_MATCH(DMI_BOARD_NAME, "T3 MRD"),
+                       DMI_MATCH(DMI_BOARD_VERSION, "V1.1"),
+               },
+               .driver_data = (void *)&jd_mode3_inv_data,
+       },
        { }
 };
 
index b7e56ceb1acff9f41e6a26f09e1c4395ff537553..5d0e5348b361a568475fd1fde3a299a57926b365 100644 (file)
@@ -267,6 +267,7 @@ void tas2781_reset(struct tasdevice_priv *tas_dev)
 EXPORT_SYMBOL_GPL(tas2781_reset);
 
 int tascodec_init(struct tasdevice_priv *tas_priv, void *codec,
+       struct module *module,
        void (*cont)(const struct firmware *fw, void *context))
 {
        int ret = 0;
@@ -280,7 +281,7 @@ int tascodec_init(struct tasdevice_priv *tas_priv, void *codec,
                tas_priv->dev_name, tas_priv->ndev);
        crc8_populate_msb(tas_priv->crc8_lkp_tbl, TASDEVICE_CRC8_POLYNOMIAL);
        tas_priv->codec = codec;
-       ret = request_firmware_nowait(THIS_MODULE, FW_ACTION_UEVENT,
+       ret = request_firmware_nowait(module, FW_ACTION_UEVENT,
                tas_priv->rca_binaryname, tas_priv->dev, GFP_KERNEL, tas_priv,
                cont);
        if (ret)
index 32913bd1a623381ee6e8d3d72c3f8e49d60ff0f7..b5abff230e43701f0f00c7b19f895469305747d7 100644 (file)
@@ -566,7 +566,7 @@ static int tasdevice_codec_probe(struct snd_soc_component *codec)
 {
        struct tasdevice_priv *tas_priv = snd_soc_component_get_drvdata(codec);
 
-       return tascodec_init(tas_priv, codec, tasdevice_fw_ready);
+       return tascodec_init(tas_priv, codec, THIS_MODULE, tasdevice_fw_ready);
 }
 
 static void tasdevice_deinit(void *context)
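
The tascodec_init() signature change threads the caller's module through to request_firmware_nowait(), so the asynchronous firmware load pins the codec driver that actually consumes the firmware rather than the shared tas2781 library module. The updated I2C caller is in the hunk above; any other consumer would do the same (sketch reusing names from that hunk):

/* Each consumer passes its own THIS_MODULE to the shared library. */
static int example_tas_codec_probe(struct snd_soc_component *codec)
{
	struct tasdevice_priv *tas_priv = snd_soc_component_get_drvdata(codec);

	return tascodec_init(tas_priv, codec, THIS_MODULE, tasdevice_fw_ready);
}
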
index 43c648efd0d938db5e0cb470a625617b8dc1860f..deb15b95992d5cc494562a91f13adbc348e2dd31 100644 (file)
@@ -3033,7 +3033,6 @@ static int wcd9335_codec_enable_mix_path(struct snd_soc_dapm_widget *w,
 {
        struct snd_soc_component *comp = snd_soc_dapm_to_component(w->dapm);
        u16 gain_reg;
-       int offset_val = 0;
        int val = 0;
 
        switch (w->reg) {
@@ -3073,7 +3072,6 @@ static int wcd9335_codec_enable_mix_path(struct snd_soc_dapm_widget *w,
        switch (event) {
        case SND_SOC_DAPM_POST_PMU:
                val = snd_soc_component_read(comp, gain_reg);
-               val += offset_val;
                snd_soc_component_write(comp, gain_reg, val);
                break;
        case SND_SOC_DAPM_POST_PMD:
@@ -3294,7 +3292,6 @@ static int wcd9335_codec_enable_interpolator(struct snd_soc_dapm_widget *w,
        u16 gain_reg;
        u16 reg;
        int val;
-       int offset_val = 0;
 
        if (!(snd_soc_dapm_widget_name_cmp(w, "RX INT0 INTERP"))) {
                reg = WCD9335_CDC_RX0_RX_PATH_CTL;
@@ -3337,7 +3334,6 @@ static int wcd9335_codec_enable_interpolator(struct snd_soc_dapm_widget *w,
        case SND_SOC_DAPM_POST_PMU:
                wcd9335_config_compander(comp, w->shift, event);
                val = snd_soc_component_read(comp, gain_reg);
-               val += offset_val;
                snd_soc_component_write(comp, gain_reg, val);
                break;
        case SND_SOC_DAPM_POST_PMD:
index 1b6e376f3833cbc5f59034e88f90a4ee3845632a..6813268e6a19f3048877c5ae0ee55ae227543c04 100644 (file)
@@ -13,7 +13,6 @@
 #include <linux/of.h>
 #include <linux/platform_device.h>
 #include <linux/regmap.h>
-#include <linux/regulator/consumer.h>
 #include <linux/slab.h>
 #include <linux/slimbus.h>
 #include <sound/pcm_params.h>
index faf8d3f9b3c5d929d935ad4c5c63fe45371e4edc..6021aa5a56891969b04db64ac019bafb0766c701 100644 (file)
@@ -210,7 +210,7 @@ struct wcd938x_priv {
 };
 
 static const SNDRV_CTL_TLVD_DECLARE_DB_MINMAX(ear_pa_gain, 600, -1800);
-static const DECLARE_TLV_DB_SCALE(line_gain, -3000, 150, -3000);
+static const DECLARE_TLV_DB_SCALE(line_gain, -3000, 150, 0);
 static const SNDRV_CTL_TLVD_DECLARE_DB_MINMAX(analog_gain, 0, 3000);
 
 struct wcd938x_mbhc_zdet_param {
@@ -3587,10 +3587,8 @@ static int wcd938x_probe(struct platform_device *pdev)
        mutex_init(&wcd938x->micb_lock);
 
        ret = wcd938x_populate_dt_data(wcd938x, dev);
-       if (ret) {
-               dev_err(dev, "%s: Fail to obtain platform data\n", __func__);
-               return -EINVAL;
-       }
+       if (ret)
+               return ret;
 
        ret = wcd938x_add_slave_components(wcd938x, dev, &match);
        if (ret)
index fb90ae6a8a344acade2ef3cdee183491b218159d..7c6ed29831285f6bd08ceba939684e8958ea21cc 100644 (file)
@@ -2229,6 +2229,9 @@ SND_SOC_DAPM_PGA_E("HPOUT", SND_SOC_NOPM, 0, 0, NULL, 0, hp_event,
 
 SND_SOC_DAPM_OUTPUT("HPOUTL"),
 SND_SOC_DAPM_OUTPUT("HPOUTR"),
+
+SND_SOC_DAPM_PGA("SPKOUTL Output", WM8962_CLASS_D_CONTROL_1, 6, 0, NULL, 0),
+SND_SOC_DAPM_PGA("SPKOUTR Output", WM8962_CLASS_D_CONTROL_1, 7, 0, NULL, 0),
 };
 
 static const struct snd_soc_dapm_widget wm8962_dapm_spk_mono_widgets[] = {
@@ -2236,7 +2239,6 @@ SND_SOC_DAPM_MIXER("Speaker Mixer", WM8962_MIXER_ENABLES, 1, 0,
                   spkmixl, ARRAY_SIZE(spkmixl)),
 SND_SOC_DAPM_MUX_E("Speaker PGA", WM8962_PWR_MGMT_2, 4, 0, &spkoutl_mux,
                   out_pga_event, SND_SOC_DAPM_POST_PMU),
-SND_SOC_DAPM_PGA("Speaker Output", WM8962_CLASS_D_CONTROL_1, 7, 0, NULL, 0),
 SND_SOC_DAPM_OUTPUT("SPKOUT"),
 };
 
@@ -2251,9 +2253,6 @@ SND_SOC_DAPM_MUX_E("SPKOUTL PGA", WM8962_PWR_MGMT_2, 4, 0, &spkoutl_mux,
 SND_SOC_DAPM_MUX_E("SPKOUTR PGA", WM8962_PWR_MGMT_2, 3, 0, &spkoutr_mux,
                   out_pga_event, SND_SOC_DAPM_POST_PMU),
 
-SND_SOC_DAPM_PGA("SPKOUTR Output", WM8962_CLASS_D_CONTROL_1, 7, 0, NULL, 0),
-SND_SOC_DAPM_PGA("SPKOUTL Output", WM8962_CLASS_D_CONTROL_1, 6, 0, NULL, 0),
-
 SND_SOC_DAPM_OUTPUT("SPKOUTL"),
 SND_SOC_DAPM_OUTPUT("SPKOUTR"),
 };
@@ -2366,12 +2365,18 @@ static const struct snd_soc_dapm_route wm8962_spk_mono_intercon[] = {
        { "Speaker PGA", "Mixer", "Speaker Mixer" },
        { "Speaker PGA", "DAC", "DACL" },
 
-       { "Speaker Output", NULL, "Speaker PGA" },
-       { "Speaker Output", NULL, "SYSCLK" },
-       { "Speaker Output", NULL, "TOCLK" },
-       { "Speaker Output", NULL, "TEMP_SPK" },
+       { "SPKOUTL Output", NULL, "Speaker PGA" },
+       { "SPKOUTL Output", NULL, "SYSCLK" },
+       { "SPKOUTL Output", NULL, "TOCLK" },
+       { "SPKOUTL Output", NULL, "TEMP_SPK" },
 
-       { "SPKOUT", NULL, "Speaker Output" },
+       { "SPKOUTR Output", NULL, "Speaker PGA" },
+       { "SPKOUTR Output", NULL, "SYSCLK" },
+       { "SPKOUTR Output", NULL, "TOCLK" },
+       { "SPKOUTR Output", NULL, "TEMP_SPK" },
+
+       { "SPKOUT", NULL, "SPKOUTL Output" },
+       { "SPKOUT", NULL, "SPKOUTR Output" },
 };
 
 static const struct snd_soc_dapm_route wm8962_spk_stereo_intercon[] = {
@@ -2914,8 +2919,12 @@ static int wm8962_set_fll(struct snd_soc_component *component, int fll_id, int s
        switch (fll_id) {
        case WM8962_FLL_MCLK:
        case WM8962_FLL_BCLK:
+               fll1 |= (fll_id - 1) << WM8962_FLL_REFCLK_SRC_SHIFT;
+               break;
        case WM8962_FLL_OSC:
                fll1 |= (fll_id - 1) << WM8962_FLL_REFCLK_SRC_SHIFT;
+               snd_soc_component_update_bits(component, WM8962_PLL2,
+                                             WM8962_OSC_ENA, WM8962_OSC_ENA);
                break;
        case WM8962_FLL_INT:
                snd_soc_component_update_bits(component, WM8962_FLL_CONTROL_1,
@@ -2924,7 +2933,7 @@ static int wm8962_set_fll(struct snd_soc_component *component, int fll_id, int s
                                    WM8962_FLL_FRC_NCO, WM8962_FLL_FRC_NCO);
                break;
        default:
-               dev_err(component->dev, "Unknown FLL source %d\n", ret);
+               dev_err(component->dev, "Unknown FLL source %d\n", source);
                return -EINVAL;
        }
 
index c01e31175015cc2f354175dec019fac591a98b4b..36ea0dcdc7ab0033eb48e393d783e4f7d4df9854 100644 (file)
@@ -739,19 +739,25 @@ static int wm_adsp_request_firmware_file(struct wm_adsp *dsp,
                                         const char *filetype)
 {
        struct cs_dsp *cs_dsp = &dsp->cs_dsp;
+       const char *fwf;
        char *s, c;
        int ret = 0;
 
+       if (dsp->fwf_name)
+               fwf = dsp->fwf_name;
+       else
+               fwf = dsp->cs_dsp.name;
+
        if (system_name && asoc_component_prefix)
                *filename = kasprintf(GFP_KERNEL, "%s%s-%s-%s-%s-%s.%s", dir, dsp->part,
-                                     dsp->fwf_name, wm_adsp_fw[dsp->fw].file, system_name,
+                                     fwf, wm_adsp_fw[dsp->fw].file, system_name,
                                      asoc_component_prefix, filetype);
        else if (system_name)
                *filename = kasprintf(GFP_KERNEL, "%s%s-%s-%s-%s.%s", dir, dsp->part,
-                                     dsp->fwf_name, wm_adsp_fw[dsp->fw].file, system_name,
+                                     fwf, wm_adsp_fw[dsp->fw].file, system_name,
                                      filetype);
        else
-               *filename = kasprintf(GFP_KERNEL, "%s%s-%s-%s.%s", dir, dsp->part, dsp->fwf_name,
+               *filename = kasprintf(GFP_KERNEL, "%s%s-%s-%s.%s", dir, dsp->part, fwf,
                                      wm_adsp_fw[dsp->fw].file, filetype);
 
        if (*filename == NULL)
@@ -823,6 +829,23 @@ static int wm_adsp_request_firmware_files(struct wm_adsp *dsp,
                }
        }
 
+       /* Check system-specific bin without wmfw before falling back to generic */
+       if (dsp->wmfw_optional && system_name) {
+               if (asoc_component_prefix)
+                       wm_adsp_request_firmware_file(dsp, coeff_firmware, coeff_filename,
+                                                     cirrus_dir, system_name,
+                                                     asoc_component_prefix, "bin");
+
+               if (!*coeff_firmware)
+                       wm_adsp_request_firmware_file(dsp, coeff_firmware, coeff_filename,
+                                                     cirrus_dir, system_name,
+                                                     NULL, "bin");
+
+               if (*coeff_firmware)
+                       return 0;
+       }
+
+       /* Check legacy location */
        if (!wm_adsp_request_firmware_file(dsp, wmfw_firmware, wmfw_filename,
                                           "", NULL, NULL, "wmfw")) {
                wm_adsp_request_firmware_file(dsp, coeff_firmware, coeff_filename,
@@ -830,62 +853,28 @@ static int wm_adsp_request_firmware_files(struct wm_adsp *dsp,
                return 0;
        }
 
+       /* Fall back to generic wmfw and optional matching bin */
        ret = wm_adsp_request_firmware_file(dsp, wmfw_firmware, wmfw_filename,
                                            cirrus_dir, NULL, NULL, "wmfw");
-       if (!ret) {
+       if (!ret || dsp->wmfw_optional) {
                wm_adsp_request_firmware_file(dsp, coeff_firmware, coeff_filename,
                                              cirrus_dir, NULL, NULL, "bin");
                return 0;
        }
 
-       if (dsp->wmfw_optional) {
-               if (system_name) {
-                       if (asoc_component_prefix)
-                               wm_adsp_request_firmware_file(dsp, coeff_firmware, coeff_filename,
-                                                             cirrus_dir, system_name,
-                                                             asoc_component_prefix, "bin");
-
-                       if (!*coeff_firmware)
-                               wm_adsp_request_firmware_file(dsp, coeff_firmware, coeff_filename,
-                                                             cirrus_dir, system_name,
-                                                             NULL, "bin");
-               }
-
-               if (!*coeff_firmware)
-                       wm_adsp_request_firmware_file(dsp, coeff_firmware, coeff_filename,
-                                                     "", NULL, NULL, "bin");
-
-               if (!*coeff_firmware)
-                       wm_adsp_request_firmware_file(dsp, coeff_firmware, coeff_filename,
-                                                     cirrus_dir, NULL, NULL, "bin");
-
-               return 0;
-       }
-
        adsp_err(dsp, "Failed to request firmware <%s>%s-%s-%s<-%s<%s>>.wmfw\n",
-                cirrus_dir, dsp->part, dsp->fwf_name, wm_adsp_fw[dsp->fw].file,
-                system_name, asoc_component_prefix);
+                cirrus_dir, dsp->part,
+                dsp->fwf_name ? dsp->fwf_name : dsp->cs_dsp.name,
+                wm_adsp_fw[dsp->fw].file, system_name, asoc_component_prefix);
 
        return -ENOENT;
 }
 
 static int wm_adsp_common_init(struct wm_adsp *dsp)
 {
-       char *p;
-
        INIT_LIST_HEAD(&dsp->compr_list);
        INIT_LIST_HEAD(&dsp->buffer_list);
 
-       if (!dsp->fwf_name) {
-               p = devm_kstrdup(dsp->cs_dsp.dev, dsp->cs_dsp.name, GFP_KERNEL);
-               if (!p)
-                       return -ENOMEM;
-
-               dsp->fwf_name = p;
-               for (; *p != 0; ++p)
-                       *p = tolower(*p);
-       }
-
        return 0;
 }
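
Taken together, these wm_adsp hunks reorder the firmware lookup so a system-specific .bin can be used without a wmfw before any generic fallback, and drop the devm-duplicated, lowercased fwf_name in favour of falling back to cs_dsp.name at request time. A simplified sketch of the resulting search order; all helper names below are hypothetical:

    /* first hit wins; steps 1-2 apply only when dsp->wmfw_optional */
    if (system_name && dsp->wmfw_optional &&
        (try_bin(system_name, prefix) || try_bin(system_name, NULL)))
            return 0;               /* 1-2: system-specific bin, no wmfw */
    if (try_wmfw_and_bin(""))
            return 0;               /* 3: legacy location */
    if (try_wmfw_and_bin(cirrus_dir) || dsp->wmfw_optional)
            return 0;               /* 4: generic wmfw and/or bin */
    return -ENOENT;
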
 
index cb83c569e18d6aef70b23a56198dbf5ccd5ef2d8..a2e86ef7d18f5981b4604372e4b20930695aa5c4 100644 (file)
@@ -1098,7 +1098,11 @@ static int wsa_dev_mode_put(struct snd_kcontrol *kcontrol,
        return 1;
 }
 
-static const DECLARE_TLV_DB_SCALE(pa_gain, -300, 150, -300);
+static const SNDRV_CTL_TLVD_DECLARE_DB_RANGE(pa_gain,
+       0, 14, TLV_DB_SCALE_ITEM(-300, 0, 0),
+       15, 29, TLV_DB_SCALE_ITEM(-300, 150, 0),
+       30, 31, TLV_DB_SCALE_ITEM(1800, 0, 0),
+);
 
 static int wsa883x_get_swr_port(struct snd_kcontrol *kcontrol,
                                struct snd_ctl_elem_value *ucontrol)
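
The replacement TLV above describes the PA gain as three segments instead of one linear ramp. Decoded, with values in centi-dB (the helper below is illustrative only, not part of the driver):

    /* 0..14  -> -300                 (pinned at -3 dB)
     * 15..29 -> -300 + (v - 15)*150  (-3 dB .. +18 dB in 1.5 dB steps)
     * 30..31 -> 1800                 (pinned at +18 dB)
     */
    static int pa_gain_cdb(unsigned int v)
    {
            if (v < 15)
                    return -300;
            if (v < 30)
                    return -300 + (v - 15) * 150;
            return 1800;
    }
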
index f0fb33d719c25135722014f9763c65df3289ed7e..c46f64557a7ffd268716e852dfe4411251da08cc 100644 (file)
@@ -174,7 +174,9 @@ static int fsl_xcvr_activate_ctl(struct snd_soc_dai *dai, const char *name,
        struct snd_kcontrol *kctl;
        bool enabled;
 
-       kctl = snd_soc_card_get_kcontrol(card, name);
+       lockdep_assert_held(&card->snd_card->controls_rwsem);
+
+       kctl = snd_soc_card_get_kcontrol_locked(card, name);
        if (kctl == NULL)
                return -ENOENT;
 
@@ -576,10 +578,14 @@ static int fsl_xcvr_startup(struct snd_pcm_substream *substream,
        xcvr->streams |= BIT(substream->stream);
 
        if (!xcvr->soc_data->spdif_only) {
+               struct snd_soc_card *card = dai->component->card;
+
                /* Disable XCVR controls if there is stream started */
+               down_read(&card->snd_card->controls_rwsem);
                fsl_xcvr_activate_ctl(dai, fsl_xcvr_mode_kctl.name, false);
                fsl_xcvr_activate_ctl(dai, fsl_xcvr_arc_mode_kctl.name, false);
                fsl_xcvr_activate_ctl(dai, fsl_xcvr_earc_capds_kctl.name, false);
+               up_read(&card->snd_card->controls_rwsem);
        }
 
        return 0;
@@ -598,11 +604,15 @@ static void fsl_xcvr_shutdown(struct snd_pcm_substream *substream,
        /* Enable XCVR controls if there is no stream started */
        if (!xcvr->streams) {
                if (!xcvr->soc_data->spdif_only) {
+                       struct snd_soc_card *card = dai->component->card;
+
+                       down_read(&card->snd_card->controls_rwsem);
                        fsl_xcvr_activate_ctl(dai, fsl_xcvr_mode_kctl.name, true);
                        fsl_xcvr_activate_ctl(dai, fsl_xcvr_arc_mode_kctl.name,
                                                (xcvr->mode == FSL_XCVR_MODE_ARC));
                        fsl_xcvr_activate_ctl(dai, fsl_xcvr_earc_capds_kctl.name,
                                                (xcvr->mode == FSL_XCVR_MODE_EARC));
+                       up_read(&card->snd_card->controls_rwsem);
                }
                ret = regmap_update_bits(xcvr->regmap, FSL_XCVR_EXT_IER0,
                                         FSL_XCVR_IRQ_EARC_ALL, 0);
index 59c3793f65df0c5573ec6e7f873ef33d31f46371..db78eb2f0108071736b6bafa5e77f874b15f3375 100644 (file)
@@ -477,6 +477,9 @@ static int avs_pci_probe(struct pci_dev *pci, const struct pci_device_id *id)
        return 0;
 
 err_i915_init:
+       pci_free_irq(pci, 0, adev);
+       pci_free_irq(pci, 0, bus);
+       pci_free_irq_vectors(pci);
        pci_clear_master(pci);
        pci_set_drvdata(pci, NULL);
 err_acquire_irq:
index 778236d3fd2806912120ab0eb953f96a6c79c90b..48b3c67c91032c97b7da54e9d822876f0e66b994 100644 (file)
@@ -857,7 +857,7 @@ assign_copier_gtw_instance(struct snd_soc_component *comp, struct avs_tplg_modcf
        }
 
        /* If topology sets value don't overwrite it */
-       if (cfg->copier.vindex.i2s.instance)
+       if (cfg->copier.vindex.val)
                return;
 
        mach = dev_get_platdata(comp->card->dev);
index 10a84a2c1036e9ce67751702a047795a5eabf9b7..c014d85a08b24755682f3faf97e3e60cd171dc7a 100644 (file)
@@ -241,7 +241,8 @@ static int snd_byt_cht_cx2072x_probe(struct platform_device *pdev)
 
        /* fix index of codec dai */
        for (i = 0; i < ARRAY_SIZE(byt_cht_cx2072x_dais); i++) {
-               if (!strcmp(byt_cht_cx2072x_dais[i].codecs->name,
+               if (byt_cht_cx2072x_dais[i].codecs->name &&
+                   !strcmp(byt_cht_cx2072x_dais[i].codecs->name,
                            "i2c-14F10720:00")) {
                        dai_index = i;
                        break;
index 7e5eea690023dff7bbdea996c739428c28445cf9..f4ac3ddd148b83757881426a2522adacd3d966d3 100644 (file)
@@ -245,7 +245,8 @@ static int bytcht_da7213_probe(struct platform_device *pdev)
 
        /* fix index of codec dai */
        for (i = 0; i < ARRAY_SIZE(dailink); i++) {
-               if (!strcmp(dailink[i].codecs->name, "i2c-DLGS7213:00")) {
+               if (dailink[i].codecs->name &&
+                   !strcmp(dailink[i].codecs->name, "i2c-DLGS7213:00")) {
                        dai_index = i;
                        break;
                }
index 1564a88a885efa1838317f453bf3d514a529d7e8..2fcec2e02bb53b403350ee76cb124ed99882087e 100644 (file)
@@ -546,7 +546,8 @@ static int snd_byt_cht_es8316_mc_probe(struct platform_device *pdev)
 
        /* fix index of codec dai */
        for (i = 0; i < ARRAY_SIZE(byt_cht_es8316_dais); i++) {
-               if (!strcmp(byt_cht_es8316_dais[i].codecs->name,
+               if (byt_cht_es8316_dais[i].codecs->name &&
+                   !strcmp(byt_cht_es8316_dais[i].codecs->name,
                            "i2c-ESSX8316:00")) {
                        dai_index = i;
                        break;
index 42466b4b1ca45e159ea40c42809018929e0ed0dc..05f38d1f7d824dc0f46c5262e0f6616dce1b7411 100644 (file)
@@ -685,6 +685,18 @@ static const struct dmi_system_id byt_rt5640_quirk_table[] = {
                                        BYT_RT5640_SSP0_AIF1 |
                                        BYT_RT5640_MCLK_EN),
        },
+       {       /* Chuwi Vi8 dual-boot (CWI506) */
+               .matches = {
+                       DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Insyde"),
+                       DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "i86"),
+                       /* The above are too generic, also match BIOS info */
+                       DMI_MATCH(DMI_BIOS_VERSION, "CHUWI2.D86JHBNR02"),
+               },
+               .driver_data = (void *)(BYTCR_INPUT_DEFAULTS |
+                                       BYT_RT5640_MONO_SPEAKER |
+                                       BYT_RT5640_SSP0_AIF1 |
+                                       BYT_RT5640_MCLK_EN),
+       },
        {
                /* Chuwi Vi10 (CWI505) */
                .matches = {
@@ -1652,7 +1664,8 @@ static int snd_byt_rt5640_mc_probe(struct platform_device *pdev)
 
        /* fix index of codec dai */
        for (i = 0; i < ARRAY_SIZE(byt_rt5640_dais); i++) {
-               if (!strcmp(byt_rt5640_dais[i].codecs->name,
+               if (byt_rt5640_dais[i].codecs->name &&
+                   !strcmp(byt_rt5640_dais[i].codecs->name,
                            "i2c-10EC5640:00")) {
                        dai_index = i;
                        break;
index f9fe8414f454ff481b7e1b84f5377bb4d8835161..80c841b000a311229c310fec3ba91264696e6025 100644 (file)
@@ -910,7 +910,8 @@ static int snd_byt_rt5651_mc_probe(struct platform_device *pdev)
 
        /* fix index of codec dai */
        for (i = 0; i < ARRAY_SIZE(byt_rt5651_dais); i++) {
-               if (!strcmp(byt_rt5651_dais[i].codecs->name,
+               if (byt_rt5651_dais[i].codecs->name &&
+                   !strcmp(byt_rt5651_dais[i].codecs->name,
                            "i2c-10EC5651:00")) {
                        dai_index = i;
                        break;
index 6978ebde669357fc7a25abc9961aaafc278b1789..cccb5e90c0fefc6a888ac302a63ed50a9423342d 100644 (file)
@@ -605,7 +605,8 @@ static int snd_byt_wm5102_mc_probe(struct platform_device *pdev)
 
        /* find index of codec dai */
        for (i = 0; i < ARRAY_SIZE(byt_wm5102_dais); i++) {
-               if (!strcmp(byt_wm5102_dais[i].codecs->name,
+               if (byt_wm5102_dais[i].codecs->name &&
+                   !strcmp(byt_wm5102_dais[i].codecs->name,
                            "wm5102-codec")) {
                        dai_index = i;
                        break;
index c952a96cde7ebe27ba6f61ed6d417d0f063d0e1e..eb41b7115d01dd38685d5a10cd393e46eb4106a4 100644 (file)
@@ -40,7 +40,6 @@ struct cht_acpi_card {
 struct cht_mc_private {
        struct snd_soc_jack jack;
        struct cht_acpi_card *acpi_card;
-       char codec_name[SND_ACPI_I2C_ID_LEN];
        struct clk *mclk;
 };
 
@@ -567,14 +566,14 @@ static int snd_cht_mc_probe(struct platform_device *pdev)
        }
 
        card->dev = &pdev->dev;
-       sprintf(drv->codec_name, "i2c-%s:00", drv->acpi_card->codec_id);
 
        /* set correct codec name */
        for (i = 0; i < ARRAY_SIZE(cht_dailink); i++)
-               if (!strcmp(card->dai_link[i].codecs->name,
+               if (cht_dailink[i].codecs->name &&
+                   !strcmp(cht_dailink[i].codecs->name,
                            "i2c-10EC5645:00")) {
-                       card->dai_link[i].codecs->name = drv->codec_name;
                        dai_index = i;
+                       break;
                }
 
        /* fixup codec name based on HID */
index 8cf0b33cc02eb5763acbb572ab0387efbe8da325..be2d1a8dbca807dd1f4af070382d2d2f169c9e27 100644 (file)
@@ -466,7 +466,8 @@ static int snd_cht_mc_probe(struct platform_device *pdev)
 
        /* find index of codec dai */
        for (i = 0; i < ARRAY_SIZE(cht_dailink); i++) {
-               if (!strcmp(cht_dailink[i].codecs->name, RT5672_I2C_DEFAULT)) {
+               if (cht_dailink[i].codecs->name &&
+                   !strcmp(cht_dailink[i].codecs->name, RT5672_I2C_DEFAULT)) {
                        dai_index = i;
                        break;
                }
index 48b03e60e3a3d760c9d872d9ffc3527b0aed40fe..8106c586f68a4ec456ae2be11f37b1c2c8cd806c 100644 (file)
@@ -259,7 +259,7 @@ static int lpass_cdc_dma_daiops_trigger(struct snd_pcm_substream *substream,
                                    int cmd, struct snd_soc_dai *dai)
 {
        struct snd_soc_pcm_runtime *soc_runtime = snd_soc_substream_to_rtd(substream);
-       struct lpaif_dmactl *dmactl;
+       struct lpaif_dmactl *dmactl = NULL;
        int ret = 0, id;
 
        switch (cmd) {
index 052e40cb38feca032752784545a4a07332369054..00bbd291be5cea4b5f43ee06b85b55dfb8eb91c2 100644 (file)
@@ -123,7 +123,7 @@ static struct snd_pcm_hardware q6apm_dai_hardware_playback = {
        .fifo_size =            0,
 };
 
-static void event_handler(uint32_t opcode, uint32_t token, uint32_t *payload, void *priv)
+static void event_handler(uint32_t opcode, uint32_t token, void *payload, void *priv)
 {
        struct q6apm_dai_rtd *prtd = priv;
        struct snd_pcm_substream *substream = prtd->substream;
@@ -157,7 +157,7 @@ static void event_handler(uint32_t opcode, uint32_t token, uint32_t *payload, vo
 }
 
 static void event_handler_compr(uint32_t opcode, uint32_t token,
-                               uint32_t *payload, void *priv)
+                               void *payload, void *priv)
 {
        struct q6apm_dai_rtd *prtd = priv;
        struct snd_compr_stream *substream = prtd->cstream;
@@ -352,7 +352,7 @@ static int q6apm_dai_open(struct snd_soc_component *component,
 
        spin_lock_init(&prtd->lock);
        prtd->substream = substream;
-       prtd->graph = q6apm_graph_open(dev, (q6apm_cb)event_handler, prtd, graph_id);
+       prtd->graph = q6apm_graph_open(dev, event_handler, prtd, graph_id);
        if (IS_ERR(prtd->graph)) {
                dev_err(dev, "%s: Could not allocate memory\n", __func__);
                ret = PTR_ERR(prtd->graph);
@@ -496,7 +496,7 @@ static int q6apm_dai_compr_open(struct snd_soc_component *component,
                return -ENOMEM;
 
        prtd->cstream = stream;
-       prtd->graph = q6apm_graph_open(dev, (q6apm_cb)event_handler_compr, prtd, graph_id);
+       prtd->graph = q6apm_graph_open(dev, event_handler_compr, prtd, graph_id);
        if (IS_ERR(prtd->graph)) {
                ret = PTR_ERR(prtd->graph);
                kfree(prtd);
index ed4bb551bfbb92c965eba25c048ce4fa12648283..b7fd503a166668d3fe41500bf52e58de129c92de 100644 (file)
@@ -32,12 +32,14 @@ static int sc8280xp_snd_init(struct snd_soc_pcm_runtime *rtd)
        case WSA_CODEC_DMA_RX_0:
        case WSA_CODEC_DMA_RX_1:
                /*
-                * set limit of 0dB on Digital Volume for Speakers,
-                * this can prevent damage of speakers to some extent without
-                * active speaker protection
+                * Set limit of -3 dB on Digital Volume and 0 dB on PA Volume
+                * to reduce the risk of speaker damage until we have active
+                * speaker protection in place.
                 */
-               snd_soc_limit_volume(card, "WSA_RX0 Digital Volume", 84);
-               snd_soc_limit_volume(card, "WSA_RX1 Digital Volume", 84);
+               snd_soc_limit_volume(card, "WSA_RX0 Digital Volume", 81);
+               snd_soc_limit_volume(card, "WSA_RX1 Digital Volume", 81);
+               snd_soc_limit_volume(card, "SpkrLeft PA Volume", 17);
+               snd_soc_limit_volume(card, "SpkrRight PA Volume", 17);
                break;
        default:
                break;
index 230c48648af359381223eda3a5e6a4b1793bc5f2..afd69c6eb6544cc21c9d533a03cbabc755c15764 100644 (file)
@@ -111,6 +111,13 @@ static u32 rsnd_adg_ssi_ws_timing_gen2(struct rsnd_dai_stream *io)
                        ws = 7;
                        break;
                }
+       } else {
+               /*
+                * SSI8 is not connected to ADG.
+                * Thus SSI9 is using ws = 8
+                */
+               if (id == 9)
+                       ws = 8;
        }
 
        return (0x6 + ws) << 8;
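
On the hunk above: with SSI8 absent from the ADG inputs, SSI9 takes ws = 8, so for SSI9 the function now returns (0x6 + 8) << 8 = 0xe00. As a one-line check:

    /* SSI9 after this change: ws = 8  ->  (0x6 + 8) << 8 == 0xe00 */
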
index 285ab4c9c7168314ae34bead44a5229bc5d8b96b..8a2f163da6bc9e8e61fc1f196f3b89556815cafe 100644 (file)
@@ -5,6 +5,9 @@
 // Copyright (C) 2019 Renesas Electronics Corp.
 // Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
 //
+
+#include <linux/lockdep.h>
+#include <linux/rwsem.h>
 #include <sound/soc.h>
 #include <sound/jack.h>
 
@@ -26,12 +29,15 @@ static inline int _soc_card_ret(struct snd_soc_card *card,
        return ret;
 }
 
-struct snd_kcontrol *snd_soc_card_get_kcontrol(struct snd_soc_card *soc_card,
-                                              const char *name)
+struct snd_kcontrol *snd_soc_card_get_kcontrol_locked(struct snd_soc_card *soc_card,
+                                                     const char *name)
 {
        struct snd_card *card = soc_card->snd_card;
        struct snd_kcontrol *kctl;
 
+       /* must be held read or write */
+       lockdep_assert_held(&card->controls_rwsem);
+
        if (unlikely(!name))
                return NULL;
 
@@ -40,6 +46,20 @@ struct snd_kcontrol *snd_soc_card_get_kcontrol(struct snd_soc_card *soc_card,
                        return kctl;
        return NULL;
 }
+EXPORT_SYMBOL_GPL(snd_soc_card_get_kcontrol_locked);
+
+struct snd_kcontrol *snd_soc_card_get_kcontrol(struct snd_soc_card *soc_card,
+                                              const char *name)
+{
+       struct snd_card *card = soc_card->snd_card;
+       struct snd_kcontrol *kctl;
+
+       down_read(&card->controls_rwsem);
+       kctl = snd_soc_card_get_kcontrol_locked(soc_card, name);
+       up_read(&card->controls_rwsem);
+
+       return kctl;
+}
 EXPORT_SYMBOL_GPL(snd_soc_card_get_kcontrol);
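
The split above follows the usual _locked-suffix convention: the plain snd_soc_card_get_kcontrol() now takes controls_rwsem itself, while callers that already hold the lock (as in the fsl_xcvr hunks earlier) call the _locked variant and satisfy the lockdep assertion. The caller side, sketched for a batch of lookups with hypothetical control names:

    down_read(&card->snd_card->controls_rwsem);
    kctl_a = snd_soc_card_get_kcontrol_locked(card, "Ctl A");
    kctl_b = snd_soc_card_get_kcontrol_locked(card, "Ctl B");
    up_read(&card->snd_card->controls_rwsem);
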
 
 static int jack_new(struct snd_soc_card *card, const char *id, int type,
index f8524b5bfb330652afb48091b7afab12b1a70d6e..516350533e73f8ee1084164e44068f825eb7fafe 100644 (file)
@@ -1037,7 +1037,7 @@ component_dai_empty:
        return -EINVAL;
 }
 
-#define MAX_DEFAULT_CH_MAP_SIZE 7
+#define MAX_DEFAULT_CH_MAP_SIZE 8
 static struct snd_soc_dai_link_ch_map default_ch_map_sync[MAX_DEFAULT_CH_MAP_SIZE] = {
        { .cpu = 0, .codec = 0 },
        { .cpu = 1, .codec = 1 },
@@ -1046,6 +1046,7 @@ static struct snd_soc_dai_link_ch_map default_ch_map_sync[MAX_DEFAULT_CH_MAP_SIZ
        { .cpu = 4, .codec = 4 },
        { .cpu = 5, .codec = 5 },
        { .cpu = 6, .codec = 6 },
+       { .cpu = 7, .codec = 7 },
 };
 static struct snd_soc_dai_link_ch_map default_ch_map_1cpu[MAX_DEFAULT_CH_MAP_SIZE] = {
        { .cpu = 0, .codec = 0 },
@@ -1055,6 +1056,7 @@ static struct snd_soc_dai_link_ch_map default_ch_map_1cpu[MAX_DEFAULT_CH_MAP_SIZ
        { .cpu = 0, .codec = 4 },
        { .cpu = 0, .codec = 5 },
        { .cpu = 0, .codec = 6 },
+       { .cpu = 0, .codec = 7 },
 };
 static struct snd_soc_dai_link_ch_map default_ch_map_1codec[MAX_DEFAULT_CH_MAP_SIZE] = {
        { .cpu = 0, .codec = 0 },
@@ -1064,6 +1066,7 @@ static struct snd_soc_dai_link_ch_map default_ch_map_1codec[MAX_DEFAULT_CH_MAP_S
        { .cpu = 4, .codec = 0 },
        { .cpu = 5, .codec = 0 },
        { .cpu = 6, .codec = 0 },
+       { .cpu = 7, .codec = 0 },
 };
 static int snd_soc_compensate_channel_connection_map(struct snd_soc_card *card,
                                                     struct snd_soc_dai_link *dai_link)
index 2743f07a5e0811912722d174bd4096950e788a62..b44b1b1adb6ed9e913c857168902f00949abcb86 100644 (file)
@@ -188,11 +188,13 @@ irqreturn_t acp_sof_ipc_irq_thread(int irq, void *context)
 
        dsp_ack = snd_sof_dsp_read(sdev, ACP_DSP_BAR, ACP_SCRATCH_REG_0 + dsp_ack_write);
        if (dsp_ack) {
+               spin_lock_irq(&sdev->ipc_lock);
                /* handle immediate reply from DSP core */
                acp_dsp_ipc_get_reply(sdev);
                snd_sof_ipc_reply(sdev, 0);
                /* set the done bit */
                acp_dsp_ipc_dsp_done(sdev);
+               spin_unlock_irq(&sdev->ipc_lock);
                ipc_irq = true;
        }
 
index 32a741fcb84fffcc3988e4deabf4e99ef4863d49..07632ae6ccf5ec058565d50f1992475795c490d2 100644 (file)
@@ -355,21 +355,20 @@ static irqreturn_t acp_irq_thread(int irq, void *context)
        unsigned int count = ACP_HW_SEM_RETRY_COUNT;
 
        spin_lock_irq(&sdev->ipc_lock);
-       while (snd_sof_dsp_read(sdev, ACP_DSP_BAR, desc->hw_semaphore_offset)) {
-               /* Wait until acquired HW Semaphore lock or timeout */
-               count--;
-               if (!count) {
-                       dev_err(sdev->dev, "%s: Failed to acquire HW lock\n", __func__);
-                       spin_unlock_irq(&sdev->ipc_lock);
-                       return IRQ_NONE;
-               }
+       /* Wait until the HW semaphore is acquired or the retries run out */
+       while (snd_sof_dsp_read(sdev, ACP_DSP_BAR, desc->hw_semaphore_offset) && --count)
+               ;
+       spin_unlock_irq(&sdev->ipc_lock);
+
+       if (!count) {
+               dev_err(sdev->dev, "%s: Failed to acquire HW lock\n", __func__);
+               return IRQ_NONE;
        }
 
        sof_ops(sdev)->irq_thread(irq, sdev);
        /* Unlock or Release HW Semaphore */
        snd_sof_dsp_write(sdev, ACP_DSP_BAR, desc->hw_semaphore_offset, 0x0);
 
-       spin_unlock_irq(&sdev->ipc_lock);
        return IRQ_HANDLED;
 };
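
The rewrite above flattens the retry loop and narrows the ipc_lock critical section to the semaphore poll alone; the dispatch and the semaphore release now run unlocked. The loop's exit condition, restated (helper name hypothetical):

    /* exits when the read returns 0 (lock acquired) or the retry
     * budget runs out; !count afterwards means we gave up */
    while (read_hw_sem() && --count)
            ;
    if (!count)
            return IRQ_NONE;
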
 
index 78a57eb9cbc377c0b525827a0539baea6850ca6f..b26ffe767fab553467897b4facc13931668b27fe 100644 (file)
@@ -36,7 +36,7 @@ static const struct sof_dev_desc lnl_desc = {
                [SOF_IPC_TYPE_4] = "intel/sof-ipc4/lnl",
        },
        .default_tplg_path = {
-               [SOF_IPC_TYPE_4] = "intel/sof-ace-tplg",
+               [SOF_IPC_TYPE_4] = "intel/sof-ipc4-tplg",
        },
        .default_fw_filename = {
                [SOF_IPC_TYPE_4] = "sof-lnl.ri",
index 0660d4b2ac96b66da5997d1aed60ddb91341571a..a361ee9d1107f5ed7533d1f71400f95ae4ad34af 100644 (file)
@@ -33,18 +33,18 @@ static const struct sof_dev_desc tgl_desc = {
        .dspless_mode_supported = true,         /* Only supported for HDaudio */
        .default_fw_path = {
                [SOF_IPC_TYPE_3] = "intel/sof",
-               [SOF_IPC_TYPE_4] = "intel/avs/tgl",
+               [SOF_IPC_TYPE_4] = "intel/sof-ipc4/tgl",
        },
        .default_lib_path = {
-               [SOF_IPC_TYPE_4] = "intel/avs-lib/tgl",
+               [SOF_IPC_TYPE_4] = "intel/sof-ipc4-lib/tgl",
        },
        .default_tplg_path = {
                [SOF_IPC_TYPE_3] = "intel/sof-tplg",
-               [SOF_IPC_TYPE_4] = "intel/avs-tplg",
+               [SOF_IPC_TYPE_4] = "intel/sof-ipc4-tplg",
        },
        .default_fw_filename = {
                [SOF_IPC_TYPE_3] = "sof-tgl.ri",
-               [SOF_IPC_TYPE_4] = "dsp_basefw.bin",
+               [SOF_IPC_TYPE_4] = "sof-tgl.ri",
        },
        .nocodec_tplg_filename = "sof-tgl-nocodec.tplg",
        .ops = &sof_tgl_ops,
@@ -66,18 +66,18 @@ static const struct sof_dev_desc tglh_desc = {
        .dspless_mode_supported = true,         /* Only supported for HDaudio */
        .default_fw_path = {
                [SOF_IPC_TYPE_3] = "intel/sof",
-               [SOF_IPC_TYPE_4] = "intel/avs/tgl-h",
+               [SOF_IPC_TYPE_4] = "intel/sof-ipc4/tgl-h",
        },
        .default_lib_path = {
-               [SOF_IPC_TYPE_4] = "intel/avs-lib/tgl-h",
+               [SOF_IPC_TYPE_4] = "intel/sof-ipc4-lib/tgl-h",
        },
        .default_tplg_path = {
                [SOF_IPC_TYPE_3] = "intel/sof-tplg",
-               [SOF_IPC_TYPE_4] = "intel/avs-tplg",
+               [SOF_IPC_TYPE_4] = "intel/sof-ipc4-tplg",
        },
        .default_fw_filename = {
                [SOF_IPC_TYPE_3] = "sof-tgl-h.ri",
-               [SOF_IPC_TYPE_4] = "dsp_basefw.bin",
+               [SOF_IPC_TYPE_4] = "sof-tgl-h.ri",
        },
        .nocodec_tplg_filename = "sof-tgl-nocodec.tplg",
        .ops = &sof_tgl_ops,
@@ -98,18 +98,18 @@ static const struct sof_dev_desc ehl_desc = {
        .dspless_mode_supported = true,         /* Only supported for HDaudio */
        .default_fw_path = {
                [SOF_IPC_TYPE_3] = "intel/sof",
-               [SOF_IPC_TYPE_4] = "intel/avs/ehl",
+               [SOF_IPC_TYPE_4] = "intel/sof-ipc4/ehl",
        },
        .default_lib_path = {
-               [SOF_IPC_TYPE_4] = "intel/avs-lib/ehl",
+               [SOF_IPC_TYPE_4] = "intel/sof-ipc4-lib/ehl",
        },
        .default_tplg_path = {
                [SOF_IPC_TYPE_3] = "intel/sof-tplg",
-               [SOF_IPC_TYPE_4] = "intel/avs-tplg",
+               [SOF_IPC_TYPE_4] = "intel/sof-ipc4-tplg",
        },
        .default_fw_filename = {
                [SOF_IPC_TYPE_3] = "sof-ehl.ri",
-               [SOF_IPC_TYPE_4] = "dsp_basefw.bin",
+               [SOF_IPC_TYPE_4] = "sof-ehl.ri",
        },
        .nocodec_tplg_filename = "sof-ehl-nocodec.tplg",
        .ops = &sof_tgl_ops,
@@ -131,18 +131,18 @@ static const struct sof_dev_desc adls_desc = {
        .dspless_mode_supported = true,         /* Only supported for HDaudio */
        .default_fw_path = {
                [SOF_IPC_TYPE_3] = "intel/sof",
-               [SOF_IPC_TYPE_4] = "intel/avs/adl-s",
+               [SOF_IPC_TYPE_4] = "intel/sof-ipc4/adl-s",
        },
        .default_lib_path = {
-               [SOF_IPC_TYPE_4] = "intel/avs-lib/adl-s",
+               [SOF_IPC_TYPE_4] = "intel/sof-ipc4-lib/adl-s",
        },
        .default_tplg_path = {
                [SOF_IPC_TYPE_3] = "intel/sof-tplg",
-               [SOF_IPC_TYPE_4] = "intel/avs-tplg",
+               [SOF_IPC_TYPE_4] = "intel/sof-ipc4-tplg",
        },
        .default_fw_filename = {
                [SOF_IPC_TYPE_3] = "sof-adl-s.ri",
-               [SOF_IPC_TYPE_4] = "dsp_basefw.bin",
+               [SOF_IPC_TYPE_4] = "sof-adl-s.ri",
        },
        .nocodec_tplg_filename = "sof-adl-nocodec.tplg",
        .ops = &sof_tgl_ops,
@@ -164,18 +164,18 @@ static const struct sof_dev_desc adl_desc = {
        .dspless_mode_supported = true,         /* Only supported for HDaudio */
        .default_fw_path = {
                [SOF_IPC_TYPE_3] = "intel/sof",
-               [SOF_IPC_TYPE_4] = "intel/avs/adl",
+               [SOF_IPC_TYPE_4] = "intel/sof-ipc4/adl",
        },
        .default_lib_path = {
-               [SOF_IPC_TYPE_4] = "intel/avs-lib/adl",
+               [SOF_IPC_TYPE_4] = "intel/sof-ipc4-lib/adl",
        },
        .default_tplg_path = {
                [SOF_IPC_TYPE_3] = "intel/sof-tplg",
-               [SOF_IPC_TYPE_4] = "intel/avs-tplg",
+               [SOF_IPC_TYPE_4] = "intel/sof-ipc4-tplg",
        },
        .default_fw_filename = {
                [SOF_IPC_TYPE_3] = "sof-adl.ri",
-               [SOF_IPC_TYPE_4] = "dsp_basefw.bin",
+               [SOF_IPC_TYPE_4] = "sof-adl.ri",
        },
        .nocodec_tplg_filename = "sof-adl-nocodec.tplg",
        .ops = &sof_tgl_ops,
@@ -197,18 +197,18 @@ static const struct sof_dev_desc adl_n_desc = {
        .dspless_mode_supported = true,         /* Only supported for HDaudio */
        .default_fw_path = {
                [SOF_IPC_TYPE_3] = "intel/sof",
-               [SOF_IPC_TYPE_4] = "intel/avs/adl-n",
+               [SOF_IPC_TYPE_4] = "intel/sof-ipc4/adl-n",
        },
        .default_lib_path = {
-               [SOF_IPC_TYPE_4] = "intel/avs-lib/adl-n",
+               [SOF_IPC_TYPE_4] = "intel/sof-ipc4-lib/adl-n",
        },
        .default_tplg_path = {
                [SOF_IPC_TYPE_3] = "intel/sof-tplg",
-               [SOF_IPC_TYPE_4] = "intel/avs-tplg",
+               [SOF_IPC_TYPE_4] = "intel/sof-ipc4-tplg",
        },
        .default_fw_filename = {
                [SOF_IPC_TYPE_3] = "sof-adl-n.ri",
-               [SOF_IPC_TYPE_4] = "dsp_basefw.bin",
+               [SOF_IPC_TYPE_4] = "sof-adl-n.ri",
        },
        .nocodec_tplg_filename = "sof-adl-nocodec.tplg",
        .ops = &sof_tgl_ops,
@@ -230,18 +230,18 @@ static const struct sof_dev_desc rpls_desc = {
        .dspless_mode_supported = true,         /* Only supported for HDaudio */
        .default_fw_path = {
                [SOF_IPC_TYPE_3] = "intel/sof",
-               [SOF_IPC_TYPE_4] = "intel/avs/rpl-s",
+               [SOF_IPC_TYPE_4] = "intel/sof-ipc4/rpl-s",
        },
        .default_lib_path = {
-               [SOF_IPC_TYPE_4] = "intel/avs-lib/rpl-s",
+               [SOF_IPC_TYPE_4] = "intel/sof-ipc4-lib/rpl-s",
        },
        .default_tplg_path = {
                [SOF_IPC_TYPE_3] = "intel/sof-tplg",
-               [SOF_IPC_TYPE_4] = "intel/avs-tplg",
+               [SOF_IPC_TYPE_4] = "intel/sof-ipc4-tplg",
        },
        .default_fw_filename = {
                [SOF_IPC_TYPE_3] = "sof-rpl-s.ri",
-               [SOF_IPC_TYPE_4] = "dsp_basefw.bin",
+               [SOF_IPC_TYPE_4] = "sof-rpl-s.ri",
        },
        .nocodec_tplg_filename = "sof-rpl-nocodec.tplg",
        .ops = &sof_tgl_ops,
@@ -263,18 +263,18 @@ static const struct sof_dev_desc rpl_desc = {
        .dspless_mode_supported = true,         /* Only supported for HDaudio */
        .default_fw_path = {
                [SOF_IPC_TYPE_3] = "intel/sof",
-               [SOF_IPC_TYPE_4] = "intel/avs/rpl",
+               [SOF_IPC_TYPE_4] = "intel/sof-ipc4/rpl",
        },
        .default_lib_path = {
-               [SOF_IPC_TYPE_4] = "intel/avs-lib/rpl",
+               [SOF_IPC_TYPE_4] = "intel/sof-ipc4-lib/rpl",
        },
        .default_tplg_path = {
                [SOF_IPC_TYPE_3] = "intel/sof-tplg",
-               [SOF_IPC_TYPE_4] = "intel/avs-tplg",
+               [SOF_IPC_TYPE_4] = "intel/sof-ipc4-tplg",
        },
        .default_fw_filename = {
                [SOF_IPC_TYPE_3] = "sof-rpl.ri",
-               [SOF_IPC_TYPE_4] = "dsp_basefw.bin",
+               [SOF_IPC_TYPE_4] = "sof-rpl.ri",
        },
        .nocodec_tplg_filename = "sof-rpl-nocodec.tplg",
        .ops = &sof_tgl_ops,
index a8832a1c1a2442c8af573943787e3b6768b5e36f..d47698f4be2deb6f5bb943e2de2611923c45c387 100644 (file)
@@ -2360,27 +2360,16 @@ static int sof_tear_down_left_over_pipelines(struct snd_sof_dev *sdev)
        return 0;
 }
 
-/*
- * For older firmware, this function doesn't free widgets for static pipelines during suspend.
- * It only resets use_count for all widgets.
- */
-static int sof_ipc3_tear_down_all_pipelines(struct snd_sof_dev *sdev, bool verify)
+static int sof_ipc3_free_widgets_in_list(struct snd_sof_dev *sdev, bool include_scheduler,
+                                        bool *dyn_widgets, bool verify)
 {
        struct sof_ipc_fw_version *v = &sdev->fw_ready.version;
        struct snd_sof_widget *swidget;
-       struct snd_sof_route *sroute;
-       bool dyn_widgets = false;
        int ret;
 
-       /*
-        * This function is called during suspend and for one-time topology verification during
-        * first boot. In both cases, there is no need to protect swidget->use_count and
-        * sroute->setup because during suspend all running streams are suspended and during
-        * topology loading the sound card unavailable to open PCMs.
-        */
        list_for_each_entry(swidget, &sdev->widget_list, list) {
                if (swidget->dynamic_pipeline_widget) {
-                       dyn_widgets = true;
+                       *dyn_widgets = true;
                        continue;
                }
 
@@ -2395,11 +2384,49 @@ static int sof_ipc3_tear_down_all_pipelines(struct snd_sof_dev *sdev, bool verif
                        continue;
                }
 
+               if (include_scheduler && swidget->id != snd_soc_dapm_scheduler)
+                       continue;
+
+               if (!include_scheduler && swidget->id == snd_soc_dapm_scheduler)
+                       continue;
+
                ret = sof_widget_free(sdev, swidget);
                if (ret < 0)
                        return ret;
        }
 
+       return 0;
+}
+
+/*
+ * For older firmware, this function doesn't free widgets for static pipelines during suspend.
+ * It only resets use_count for all widgets.
+ */
+static int sof_ipc3_tear_down_all_pipelines(struct snd_sof_dev *sdev, bool verify)
+{
+       struct sof_ipc_fw_version *v = &sdev->fw_ready.version;
+       struct snd_sof_widget *swidget;
+       struct snd_sof_route *sroute;
+       bool dyn_widgets = false;
+       int ret;
+
+       /*
+        * This function is called during suspend and for one-time topology verification during
+        * first boot. In both cases, there is no need to protect swidget->use_count and
+        * sroute->setup because during suspend all running streams are suspended and during
+        * topology loading the sound card is unavailable to open PCMs. Do not free the scheduler
+        * widgets yet so that the secondary cores do not get powered down before all the widgets
+        * associated with the scheduler are freed.
+        */
+       ret = sof_ipc3_free_widgets_in_list(sdev, false, &dyn_widgets, verify);
+       if (ret < 0)
+               return ret;
+
+       /* free all the scheduler widgets now */
+       ret = sof_ipc3_free_widgets_in_list(sdev, true, &dyn_widgets, verify);
+       if (ret < 0)
+               return ret;
+
        /*
         * Tear down all pipelines associated with PCMs that did not get suspended
         * and unset the prepare flag so that they can be set up again during resume.
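
The refactor above exists so the same list walk can run twice with opposite filters: pass one frees every non-scheduler widget, pass two frees the schedulers, keeping secondary DSP cores powered until the widgets that need them are gone. Reduced to its skeleton (error handling and the dynamic-pipeline cases omitted):

    /* pass 0: everything except scheduler widgets
     * pass 1: scheduler widgets last, so the cores they pin stay up */
    for (pass = 0; pass < 2; pass++)
            list_for_each_entry(swidget, &sdev->widget_list, list)
                    if ((swidget->id == snd_soc_dapm_scheduler) == pass)
                            sof_widget_free(sdev, swidget);
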
index fb40378ad0840255a9b35cb74f99c9b80d5271b9..c03dd513fbff142fee51a9b0233920024d691f1d 100644 (file)
@@ -1067,7 +1067,7 @@ static void sof_ipc3_rx_msg(struct snd_sof_dev *sdev)
                return;
        }
 
-       if (hdr.size < sizeof(hdr)) {
+       if (hdr.size < sizeof(hdr) || hdr.size > SOF_IPC_MSG_MAX_SIZE) {
                dev_err(sdev->dev, "The received message size is invalid\n");
                return;
        }
index 85d3f390e4b290774687086f37b2a73473117e54..07eb5c6d4adf3246877e4881c1927a6a4f8c39ee 100644 (file)
@@ -413,7 +413,18 @@ skip_pause_transition:
        ret = sof_ipc4_set_multi_pipeline_state(sdev, state, trigger_list);
        if (ret < 0) {
                dev_err(sdev->dev, "failed to set final state %d for all pipelines\n", state);
-               goto free;
+               /*
+                * Workaround: if the firmware crashed while the pipelines
+                * were being put into the reset state, we must ignore the
+                * error code and reset it to 0.
+                * Since the firmware has crashed, no more IPC messages will
+                * be sent and errors will be printed, but the widget state
+                * will be correct for the next boot.
+                */
+               if (sdev->fw_state != SOF_FW_CRASHED || state != SOF_IPC4_PIPE_RESET)
+                       goto free;
+
+               ret = 0;
        }
 
        /* update RUNNING/RESET state for all pipelines that were just triggered */
index 702386823d17263ffa6acacb6d0bd71adb7c83d9..f41c309558579f1c3c4b1d0e7bcca1b2e64d8747 100644 (file)
@@ -577,6 +577,11 @@ static const struct of_device_id sun4i_spdif_of_match[] = {
                .compatible = "allwinner,sun50i-h6-spdif",
                .data = &sun50i_h6_spdif_quirks,
        },
+       {
+               .compatible = "allwinner,sun50i-h616-spdif",
+               /* Essentially the same as the H6, but without RX */
+               .data = &sun50i_h6_spdif_quirks,
+       },
        { /* sentinel */ }
 };
 MODULE_DEVICE_TABLE(of, sun4i_spdif_of_match);
index 33db334e6556674414047b1a1d660ec3e8083100..60fcb872a80b6c1f79afcec88e959df33a04a4da 100644 (file)
@@ -261,6 +261,8 @@ static int __uac_clock_find_source(struct snd_usb_audio *chip,
        int ret, i, cur, err, pins, clock_id;
        const u8 *sources;
        int proto = fmt->protocol;
+       bool readable, writeable;
+       u32 bmControls;
 
        entity_id &= 0xff;
 
@@ -292,11 +294,27 @@ static int __uac_clock_find_source(struct snd_usb_audio *chip,
                sources = GET_VAL(selector, proto, baCSourceID);
                cur = 0;
 
+               if (proto == UAC_VERSION_3)
+                       bmControls = le32_to_cpu(*(__le32 *)(&selector->v3.baCSourceID[0] + pins));
+               else
+                       bmControls = *(__u8 *)(&selector->v2.baCSourceID[0] + pins);
+
+               readable = uac_v2v3_control_is_readable(bmControls,
+                                                       UAC2_CX_CLOCK_SELECTOR);
+               writeable = uac_v2v3_control_is_writeable(bmControls,
+                                                         UAC2_CX_CLOCK_SELECTOR);
+
                if (pins == 1) {
                        ret = 1;
                        goto find_source;
                }
 
+               /* for now just warn about buggy device */
+               if (!readable)
+                       usb_audio_warn(chip,
+                               "%s(): clock selector control is not readable, id %d\n",
+                               __func__, clock_id);
+
                /* the entity ID we are looking at is a selector.
                 * find out what it currently selects */
                ret = uac_clock_selector_get_val(chip, clock_id);
@@ -325,17 +343,29 @@ static int __uac_clock_find_source(struct snd_usb_audio *chip,
                                              visited, validate);
                if (ret > 0) {
                        /* Skip setting clock selector again for some devices */
-                       if (chip->quirk_flags & QUIRK_FLAG_SKIP_CLOCK_SELECTOR)
+                       if (chip->quirk_flags & QUIRK_FLAG_SKIP_CLOCK_SELECTOR ||
+                           !writeable)
                                return ret;
                        err = uac_clock_selector_set_val(chip, entity_id, cur);
-                       if (err < 0)
+                       if (err < 0) {
+                               if (pins == 1) {
+                                       usb_audio_dbg(chip,
+                                                     "%s(): selector returned an error, "
+                                                     "assuming a firmware bug, id %d, ret %d\n",
+                                                     __func__, clock_id, err);
+                                       return ret;
+                               }
                                return err;
+                       }
                }
 
                if (!validate || ret > 0 || !chip->autoclock)
                        return ret;
 
        find_others:
+               if (!writeable)
+                       return -ENXIO;
+
                /* The current clock source is invalid, try others. */
                for (i = 1; i <= pins; i++) {
                        if (i == cur)
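
The readable/writeable tests above lean on the UAC2/UAC3 rule that each control occupies two bits of bmControls: the low bit of the pair means the control can be read, the high bit that the host may program it. The predicates assumed here, sketched:

    /* control numbers are 1-based, e.g. UAC2_CX_CLOCK_SELECTOR */
    static inline bool ctl_is_readable(u32 bmControls, u8 control)
    {
            return (bmControls >> ((control - 1) * 2)) & 0x1;
    }

    static inline bool ctl_is_writeable(u32 bmControls, u8 control)
    {
            return (bmControls >> ((control - 1) * 2)) & 0x2;
    }

Also visible in the hunk: UAC3 keeps bmControls as a __le32 after the source pins, whereas UAC2 stores a single byte, hence the two different reads.
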
index ab5fed9f55b60ec8b255448a9cbb435f9e04d96b..3b45d0ee769389aafb3e752cec5b96223c52077b 100644 (file)
@@ -470,9 +470,11 @@ static int validate_sample_rate_table_v2v3(struct snd_usb_audio *chip,
                                           int clock)
 {
        struct usb_device *dev = chip->dev;
+       struct usb_host_interface *alts;
        unsigned int *table;
        unsigned int nr_rates;
        int i, err;
+       u32 bmControls;
 
        /* performing the rate verification may lead to unexpected USB bus
         * behavior afterwards by some unknown reason.  Do this only for the
@@ -481,6 +483,24 @@ static int validate_sample_rate_table_v2v3(struct snd_usb_audio *chip,
        if (!(chip->quirk_flags & QUIRK_FLAG_VALIDATE_RATES))
                return 0; /* don't perform the validation as default */
 
+       alts = snd_usb_get_host_interface(chip, fp->iface, fp->altsetting);
+       if (!alts)
+               return 0;
+
+       if (fp->protocol == UAC_VERSION_3) {
+               struct uac3_as_header_descriptor *as = snd_usb_find_csint_desc(
+                               alts->extra, alts->extralen, NULL, UAC_AS_GENERAL);
+               bmControls = le32_to_cpu(as->bmControls);
+       } else {
+               struct uac2_as_header_descriptor *as = snd_usb_find_csint_desc(
+                               alts->extra, alts->extralen, NULL, UAC_AS_GENERAL);
+               bmControls = as->bmControls;
+       }
+
+       if (!uac_v2v3_control_is_readable(bmControls,
+                               UAC2_AS_VAL_ALT_SETTINGS))
+               return 0;
+
        table = kcalloc(fp->nr_rates, sizeof(*table), GFP_KERNEL);
        if (!table)
                return -ENOMEM;
index 6b0993258e039b052b9196f3a2f43f720623f2ce..c1f2e5a03de969af932b3f8394332eb6f80983a3 100644 (file)
@@ -1742,50 +1742,44 @@ static void snd_usbmidi_get_port_info(struct snd_rawmidi *rmidi, int number,
        }
 }
 
-static struct usb_midi_in_jack_descriptor *find_usb_in_jack_descriptor(
-                                       struct usb_host_interface *hostif, uint8_t jack_id)
+/* return iJack for the corresponding jackID */
+static int find_usb_ijack(struct usb_host_interface *hostif, uint8_t jack_id)
 {
        unsigned char *extra = hostif->extra;
        int extralen = hostif->extralen;
+       struct usb_descriptor_header *h;
+       struct usb_midi_out_jack_descriptor *outjd;
+       struct usb_midi_in_jack_descriptor *injd;
+       size_t sz;
 
        while (extralen > 4) {
-               struct usb_midi_in_jack_descriptor *injd =
-                               (struct usb_midi_in_jack_descriptor *)extra;
+               h = (struct usb_descriptor_header *)extra;
+               if (h->bDescriptorType != USB_DT_CS_INTERFACE)
+                       goto next;
 
+               outjd = (struct usb_midi_out_jack_descriptor *)h;
+               if (h->bLength >= sizeof(*outjd) &&
+                   outjd->bDescriptorSubtype == UAC_MIDI_OUT_JACK &&
+                   outjd->bJackID == jack_id) {
+                       sz = USB_DT_MIDI_OUT_SIZE(outjd->bNrInputPins);
+                       if (outjd->bLength < sz)
+                               goto next;
+                       return *(extra + sz - 1);
+               }
+
+               injd = (struct usb_midi_in_jack_descriptor *)h;
                if (injd->bLength >= sizeof(*injd) &&
-                   injd->bDescriptorType == USB_DT_CS_INTERFACE &&
                    injd->bDescriptorSubtype == UAC_MIDI_IN_JACK &&
-                               injd->bJackID == jack_id)
-                       return injd;
-               if (!extra[0])
-                       break;
-               extralen -= extra[0];
-               extra += extra[0];
-       }
-       return NULL;
-}
-
-static struct usb_midi_out_jack_descriptor *find_usb_out_jack_descriptor(
-                                       struct usb_host_interface *hostif, uint8_t jack_id)
-{
-       unsigned char *extra = hostif->extra;
-       int extralen = hostif->extralen;
+                   injd->bJackID == jack_id)
+                       return injd->iJack;
 
-       while (extralen > 4) {
-               struct usb_midi_out_jack_descriptor *outjd =
-                               (struct usb_midi_out_jack_descriptor *)extra;
-
-               if (outjd->bLength >= sizeof(*outjd) &&
-                   outjd->bDescriptorType == USB_DT_CS_INTERFACE &&
-                   outjd->bDescriptorSubtype == UAC_MIDI_OUT_JACK &&
-                               outjd->bJackID == jack_id)
-                       return outjd;
+next:
                if (!extra[0])
                        break;
                extralen -= extra[0];
                extra += extra[0];
        }
-       return NULL;
+       return 0;
 }
 
 static void snd_usbmidi_init_substream(struct snd_usb_midi *umidi,
@@ -1796,13 +1790,10 @@ static void snd_usbmidi_init_substream(struct snd_usb_midi *umidi,
        const char *name_format;
        struct usb_interface *intf;
        struct usb_host_interface *hostif;
-       struct usb_midi_in_jack_descriptor *injd;
-       struct usb_midi_out_jack_descriptor *outjd;
        uint8_t jack_name_buf[32];
        uint8_t *default_jack_name = "MIDI";
        uint8_t *jack_name = default_jack_name;
        uint8_t iJack;
-       size_t sz;
        int res;
 
        struct snd_rawmidi_substream *substream =
@@ -1816,21 +1807,7 @@ static void snd_usbmidi_init_substream(struct snd_usb_midi *umidi,
        intf = umidi->iface;
        if (intf && jack_id >= 0) {
                hostif = intf->cur_altsetting;
-               iJack = 0;
-               if (stream != SNDRV_RAWMIDI_STREAM_OUTPUT) {
-                       /* in jacks connect to outs */
-                       outjd = find_usb_out_jack_descriptor(hostif, jack_id);
-                       if (outjd) {
-                               sz = USB_DT_MIDI_OUT_SIZE(outjd->bNrInputPins);
-                               if (outjd->bLength >= sz)
-                                       iJack = *(((uint8_t *) outjd) + sz - sizeof(uint8_t));
-                       }
-               } else {
-                       /* and out jacks connect to ins */
-                       injd = find_usb_in_jack_descriptor(hostif, jack_id);
-                       if (injd)
-                               iJack = injd->iJack;
-               }
+               iJack = find_usb_ijack(hostif, jack_id);
                if (iJack != 0) {
                        res = usb_string(umidi->dev, iJack, jack_name_buf,
                          ARRAY_SIZE(jack_name_buf));
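
find_usb_ijack() above merges the two one-off walkers into the standard class-specific descriptor scan: check the descriptor type, validate bLength against the subtype's minimum size before touching its fields, and advance by bLength with a zero-length guard so a malformed descriptor cannot loop forever. In outline, with hypothetical names:

    while (len > 4) {
            struct usb_descriptor_header *h = (void *)p;

            if (h->bDescriptorType == USB_DT_CS_INTERFACE &&
                h->bLength >= min_len_for(h))
                    inspect(h);     /* safe: length validated first */
            if (!p[0])              /* zero bLength: malformed, stop */
                    break;
            len -= p[0];
            p += p[0];
    }
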
index 1ec177fe284eddd7eb431d56083e82886c858550..820d3e4b672ab603b6f2cb91ba95d12b60d519f5 100644 (file)
@@ -1085,7 +1085,7 @@ int snd_usb_midi_v2_create(struct snd_usb_audio *chip,
        }
        if ((quirk && quirk->type != QUIRK_MIDI_STANDARD_INTERFACE) ||
            iface->num_altsetting < 2) {
-               usb_audio_info(chip, "Quirk or no altest; falling back to MIDI 1.0\n");
+               usb_audio_info(chip, "Quirk or no altset; falling back to MIDI 1.0\n");
                goto fallback_to_midi1;
        }
        hostif = &iface->altsetting[1];
index 07cc6a201579aa864f6ebd113d638cf4a36153d8..09712e61c606ef21c2b39bb80b8906a70f6ed4ff 100644 (file)
@@ -2031,10 +2031,14 @@ static const struct usb_audio_quirk_flags_table quirk_flags_table[] = {
                   QUIRK_FLAG_CTL_MSG_DELAY_1M | QUIRK_FLAG_IGNORE_CTL_ERROR),
        DEVICE_FLG(0x0499, 0x1509, /* Steinberg UR22 */
                   QUIRK_FLAG_GENERIC_IMPLICIT_FB),
+       DEVICE_FLG(0x0499, 0x3108, /* Yamaha YIT-W12TX */
+                  QUIRK_FLAG_GET_SAMPLE_RATE),
        DEVICE_FLG(0x04d8, 0xfeea, /* Benchmark DAC1 Pre */
                   QUIRK_FLAG_GET_SAMPLE_RATE),
        DEVICE_FLG(0x04e8, 0xa051, /* Samsung USBC Headset (AKG) */
                   QUIRK_FLAG_SKIP_CLOCK_SELECTOR | QUIRK_FLAG_CTL_MSG_DELAY_5M),
+       DEVICE_FLG(0x0525, 0xa4ad, /* Hamedal C20 USB camera */
+                  QUIRK_FLAG_IFACE_SKIP_CLOSE),
        DEVICE_FLG(0x054c, 0x0b8c, /* Sony WALKMAN NW-A45 DAC */
                   QUIRK_FLAG_SET_IFACE_FIRST),
        DEVICE_FLG(0x0556, 0x0014, /* Phoenix Audio TMX320VC */
@@ -2073,14 +2077,22 @@ static const struct usb_audio_quirk_flags_table quirk_flags_table[] = {
                   QUIRK_FLAG_GENERIC_IMPLICIT_FB),
        DEVICE_FLG(0x0763, 0x2031, /* M-Audio Fast Track C600 */
                   QUIRK_FLAG_GENERIC_IMPLICIT_FB),
+       DEVICE_FLG(0x07fd, 0x000b, /* MOTU M Series 2nd hardware revision */
+                  QUIRK_FLAG_CTL_MSG_DELAY_1M),
        DEVICE_FLG(0x08bb, 0x2702, /* LineX FM Transmitter */
                   QUIRK_FLAG_IGNORE_CTL_ERROR),
        DEVICE_FLG(0x0951, 0x16ad, /* Kingston HyperX */
                   QUIRK_FLAG_CTL_MSG_DELAY_1M),
        DEVICE_FLG(0x0b0e, 0x0349, /* Jabra 550a */
                   QUIRK_FLAG_CTL_MSG_DELAY_1M),
+       DEVICE_FLG(0x0ecb, 0x205c, /* JBL Quantum610 Wireless */
+                  QUIRK_FLAG_FIXED_RATE),
+       DEVICE_FLG(0x0ecb, 0x2069, /* JBL Quantum810 Wireless */
+                  QUIRK_FLAG_FIXED_RATE),
        DEVICE_FLG(0x0fd9, 0x0008, /* Hauppauge HVR-950Q */
                   QUIRK_FLAG_SHARE_MEDIA_DEVICE | QUIRK_FLAG_ALIGN_TRANSFER),
+       DEVICE_FLG(0x1224, 0x2a25, /* Jieli Technology USB PHY 2.0 */
+                  QUIRK_FLAG_GET_SAMPLE_RATE),
        DEVICE_FLG(0x1395, 0x740a, /* Sennheiser DECT */
                   QUIRK_FLAG_GET_SAMPLE_RATE),
        DEVICE_FLG(0x1397, 0x0507, /* Behringer UMC202HD */
@@ -2113,6 +2125,10 @@ static const struct usb_audio_quirk_flags_table quirk_flags_table[] = {
                   QUIRK_FLAG_ITF_USB_DSD_DAC | QUIRK_FLAG_CTL_MSG_DELAY),
        DEVICE_FLG(0x1901, 0x0191, /* GE B850V3 CP2114 audio interface */
                   QUIRK_FLAG_GET_SAMPLE_RATE),
+       DEVICE_FLG(0x19f7, 0x0035, /* RODE NT-USB+ */
+                  QUIRK_FLAG_GET_SAMPLE_RATE),
+       DEVICE_FLG(0x1bcf, 0x2283, /* NexiGo N930AF FHD Webcam */
+                  QUIRK_FLAG_GET_SAMPLE_RATE),
        DEVICE_FLG(0x2040, 0x7200, /* Hauppauge HVR-950Q */
                   QUIRK_FLAG_SHARE_MEDIA_DEVICE | QUIRK_FLAG_ALIGN_TRANSFER),
        DEVICE_FLG(0x2040, 0x7201, /* Hauppauge HVR-950Q-MXL */
@@ -2155,6 +2171,12 @@ static const struct usb_audio_quirk_flags_table quirk_flags_table[] = {
                   QUIRK_FLAG_IGNORE_CTL_ERROR),
        DEVICE_FLG(0x2912, 0x30c8, /* Audioengine D1 */
                   QUIRK_FLAG_GET_SAMPLE_RATE),
+       DEVICE_FLG(0x2b53, 0x0023, /* Fiero SC-01 (firmware v1.0.0 @ 48 kHz) */
+                  QUIRK_FLAG_GENERIC_IMPLICIT_FB),
+       DEVICE_FLG(0x2b53, 0x0024, /* Fiero SC-01 (firmware v1.0.0 @ 96 kHz) */
+                  QUIRK_FLAG_GENERIC_IMPLICIT_FB),
+       DEVICE_FLG(0x2b53, 0x0031, /* Fiero SC-01 (firmware v1.1.0) */
+                  QUIRK_FLAG_GENERIC_IMPLICIT_FB),
        DEVICE_FLG(0x30be, 0x0101, /* Schiit Hel */
                   QUIRK_FLAG_IGNORE_CTL_ERROR),
        DEVICE_FLG(0x413c, 0xa506, /* Dell AE515 sound bar */
@@ -2163,22 +2185,6 @@ static const struct usb_audio_quirk_flags_table quirk_flags_table[] = {
                   QUIRK_FLAG_ALIGN_TRANSFER),
        DEVICE_FLG(0x534d, 0x2109, /* MacroSilicon MS2109 */
                   QUIRK_FLAG_ALIGN_TRANSFER),
-       DEVICE_FLG(0x1224, 0x2a25, /* Jieli Technology USB PHY 2.0 */
-                  QUIRK_FLAG_GET_SAMPLE_RATE),
-       DEVICE_FLG(0x2b53, 0x0023, /* Fiero SC-01 (firmware v1.0.0 @ 48 kHz) */
-                  QUIRK_FLAG_GENERIC_IMPLICIT_FB),
-       DEVICE_FLG(0x2b53, 0x0024, /* Fiero SC-01 (firmware v1.0.0 @ 96 kHz) */
-                  QUIRK_FLAG_GENERIC_IMPLICIT_FB),
-       DEVICE_FLG(0x2b53, 0x0031, /* Fiero SC-01 (firmware v1.1.0) */
-                  QUIRK_FLAG_GENERIC_IMPLICIT_FB),
-       DEVICE_FLG(0x0525, 0xa4ad, /* Hamedal C20 usb camero */
-                  QUIRK_FLAG_IFACE_SKIP_CLOSE),
-       DEVICE_FLG(0x0ecb, 0x205c, /* JBL Quantum610 Wireless */
-                  QUIRK_FLAG_FIXED_RATE),
-       DEVICE_FLG(0x0ecb, 0x2069, /* JBL Quantum810 Wireless */
-                  QUIRK_FLAG_FIXED_RATE),
-       DEVICE_FLG(0x1bcf, 0x2283, /* NexiGo N930AF FHD Webcam */
-                  QUIRK_FLAG_GET_SAMPLE_RATE),
 
        /* Vendor matches */
        VENDOR_FLG(0x045e, /* MS Lifecam */
index e2847c040f750f98a77cb0bfe4ec2a548f5eb691..b158c3cb8e5f5fce75e22306c7707935465bc57f 100644 (file)
@@ -91,8 +91,6 @@ static void virtsnd_event_notify_cb(struct virtqueue *vqueue)
                        virtsnd_event_dispatch(snd, event);
                        virtsnd_event_send(vqueue, event, true, GFP_ATOMIC);
                }
-               if (unlikely(virtqueue_is_broken(vqueue)))
-                       break;
        } while (!virtqueue_enable_cb(vqueue));
        spin_unlock_irqrestore(&queue->lock, flags);
 }
index 18dc5aca2e0c5b2a1e6c0d4391b34865b797995e..9dabea01277f845726ee2a908b9288c9f0e5e918 100644 (file)
@@ -303,8 +303,6 @@ void virtsnd_ctl_notify_cb(struct virtqueue *vqueue)
                virtqueue_disable_cb(vqueue);
                while ((msg = virtqueue_get_buf(vqueue, &length)))
                        virtsnd_ctl_msg_complete(msg);
-               if (unlikely(virtqueue_is_broken(vqueue)))
-                       break;
        } while (!virtqueue_enable_cb(vqueue));
        spin_unlock_irqrestore(&queue->lock, flags);
 }
index 542446c4c7ba8e4da2d7dd5b701c829e45c24084..8c32efaf4c5294e6aba0adcfb8a40a22d3a0d261 100644 (file)
@@ -358,8 +358,6 @@ static inline void virtsnd_pcm_notify_cb(struct virtio_snd_queue *queue)
                virtqueue_disable_cb(queue->vqueue);
                while ((msg = virtqueue_get_buf(queue->vqueue, &written_bytes)))
                        virtsnd_pcm_msg_complete(msg, written_bytes);
-               if (unlikely(virtqueue_is_broken(queue->vqueue)))
-                       break;
        } while (!virtqueue_enable_cb(queue->vqueue));
        spin_unlock_irqrestore(&queue->lock, flags);
 }
index f4542d2718f4f635ce8879da123764e72e9af47b..29cb275a219d7fb38fa0d16e6ba48e91c9d032b4 100644 (file)
 #define X86_FEATURE_CAT_L3             ( 7*32+ 4) /* Cache Allocation Technology L3 */
 #define X86_FEATURE_CAT_L2             ( 7*32+ 5) /* Cache Allocation Technology L2 */
 #define X86_FEATURE_CDP_L3             ( 7*32+ 6) /* Code and Data Prioritization L3 */
+#define X86_FEATURE_TDX_HOST_PLATFORM  ( 7*32+ 7) /* Platform supports being a TDX host */
 #define X86_FEATURE_HW_PSTATE          ( 7*32+ 8) /* AMD HW-PState */
 #define X86_FEATURE_PROC_FEEDBACK      ( 7*32+ 9) /* AMD ProcFeedbackInterface */
 #define X86_FEATURE_XCOMPACTED         ( 7*32+10) /* "" Use compacted XSTATE (XSAVES or XSAVEC) */
 #define X86_FEATURE_SMBA               (11*32+21) /* "" Slow Memory Bandwidth Allocation */
 #define X86_FEATURE_BMEC               (11*32+22) /* "" Bandwidth Monitoring Event Configuration */
 #define X86_FEATURE_USER_SHSTK         (11*32+23) /* Shadow stack support for user mode applications */
-
 #define X86_FEATURE_SRSO               (11*32+24) /* "" AMD BTB untrain RETs */
 #define X86_FEATURE_SRSO_ALIAS         (11*32+25) /* "" AMD BTB untrain RETs through aliasing */
 #define X86_FEATURE_IBPB_ON_VMEXIT     (11*32+26) /* "" Issue an IBPB only on VMEXIT */
+#define X86_FEATURE_APIC_MSRS_FENCE    (11*32+27) /* "" IA32_TSC_DEADLINE and X2APIC MSRs need fencing */
+#define X86_FEATURE_ZEN2               (11*32+28) /* "" CPU based on Zen2 microarchitecture */
+#define X86_FEATURE_ZEN3               (11*32+29) /* "" CPU based on Zen3 microarchitecture */
+#define X86_FEATURE_ZEN4               (11*32+30) /* "" CPU based on Zen4 microarchitecture */
+#define X86_FEATURE_ZEN1               (11*32+31) /* "" CPU based on Zen1 microarchitecture */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX_VNNI           (12*32+ 4) /* AVX VNNI instructions */
 #define X86_BUG_EIBRS_PBRSB            X86_BUG(28) /* EIBRS is vulnerable to Post Barrier RSB Predictions */
 #define X86_BUG_SMT_RSB                        X86_BUG(29) /* CPU is vulnerable to Cross-Thread Return Address Predictions */
 #define X86_BUG_GDS                    X86_BUG(30) /* CPU is affected by Gather Data Sampling */
+#define X86_BUG_TDX_PW_MCE             X86_BUG(31) /* CPU may incur #MC if non-TD software does partial write to TDX private memory */
 
 /* BUG word 2 */
 #define X86_BUG_SRSO                   X86_BUG(1*32 + 0) /* AMD SRSO bug */
index 1d51e1850ed03d46e84c71de0c451067d0baac5b..f1bd7b91b3c63735738825f15cd3c82fca7579ce 100644 (file)
 #define LBR_INFO_CYCLES                        0xffff
 #define LBR_INFO_BR_TYPE_OFFSET                56
 #define LBR_INFO_BR_TYPE               (0xfull << LBR_INFO_BR_TYPE_OFFSET)
+#define LBR_INFO_BR_CNTR_OFFSET                32
+#define LBR_INFO_BR_CNTR_NUM           4
+#define LBR_INFO_BR_CNTR_BITS          2
+#define LBR_INFO_BR_CNTR_MASK          GENMASK_ULL(LBR_INFO_BR_CNTR_BITS - 1, 0)
+#define LBR_INFO_BR_CNTR_FULL_MASK     GENMASK_ULL(LBR_INFO_BR_CNTR_NUM * LBR_INFO_BR_CNTR_BITS - 1, 0)
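The new defines pack four 2-bit per-branch event counters into bits 39:32 of
an LBR_INFO value (LBR_INFO_BR_CNTR_NUM counters of LBR_INFO_BR_CNTR_BITS
each, starting at LBR_INFO_BR_CNTR_OFFSET). A hedged sketch of extracting
counter i with these masks; the helper name is illustrative, not from the
patch:

    static inline unsigned int lbr_info_br_cntr(u64 info, unsigned int i)
    {
            /* Shift counter i down, then isolate its two bits. */
            return (info >> (LBR_INFO_BR_CNTR_OFFSET +
                             i * LBR_INFO_BR_CNTR_BITS)) &
                   LBR_INFO_BR_CNTR_MASK;
    }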
 
 #define MSR_ARCH_LBR_CTL               0x000014ce
 #define ARCH_LBR_CTL_LBREN             BIT(0)
 #define MSR_RELOAD_PMC0                        0x000014c1
 #define MSR_RELOAD_FIXED_CTR0          0x00001309
 
+/* KeyID partitioning between MKTME and TDX */
+#define MSR_IA32_MKTME_KEYID_PARTITIONING      0x00000087
+
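A hedged sketch of how the new MSR would be consumed; the bit layout (MKTME
KeyIDs in the low 32 bits, TDX private KeyIDs in the high 32 bits) is assumed
from the TDX host enabling series and is not shown in this hunk:

    u64 keyid_part;
    u32 nr_mktme_keyids, nr_tdx_keyids;

    rdmsrl(MSR_IA32_MKTME_KEYID_PARTITIONING, keyid_part);
    nr_mktme_keyids = keyid_part & 0xffffffff;  /* assumed: low 32 bits */
    nr_tdx_keyids = keyid_part >> 32;           /* assumed: high 32 bits */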
 /*
  * AMD64 MSRs. Not complete. See the architecture manual for a more
  * complete list.
index 11ff975242cac7cff4dfaab3a4591dc4cb82eb1d..e2ff22b379a44c584b7325249c48db9b3368c7d8 100644 (file)
@@ -4,7 +4,7 @@
 
 #define __GEN_RMWcc(fullop, var, cc, ...)                              \
 do {                                                                   \
-       asm_volatile_goto (fullop "; j" cc " %l[cc_label]"              \
+       asm goto (fullop "; j" cc " %l[cc_label]"               \
                        : : "m" (var), ## __VA_ARGS__                   \
                        : "memory" : cc_label);                         \
        return 0;                                                       \
index 1a6a1f98794967d260e2898b0dbb62f830d45664..a448d0964fc06ebd0c15cd0b550e3c2cefbf57bf 100644 (file)
@@ -562,4 +562,7 @@ struct kvm_pmu_event_filter {
 /* x86-specific KVM_EXIT_HYPERCALL flags. */
 #define KVM_EXIT_HYPERCALL_LONG_MODE   BIT(0)
 
+#define KVM_X86_DEFAULT_VM     0
+#define KVM_X86_SW_PROTECTED_VM        1
+
 #endif /* _ASM_X86_KVM_H */
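The VM type is the argument to KVM_CREATE_VM. A minimal userspace sketch,
with error handling omitted; KVM_CAP_VM_TYPES (added in the kvm.h hunk
further below) reports which types the host supports:

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    int kvm_fd = open("/dev/kvm", O_RDWR);
    /* KVM_X86_DEFAULT_VM (0) keeps the historical behaviour. */
    int vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, KVM_X86_SW_PROTECTED_VM);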
index d055b82d22ccd083975a874e5a96abbeff8f496b..59cf6f9065aa84d8a4a6a92999f3d3d1f3367681 100644 (file)
@@ -1,11 +1,11 @@
 /* SPDX-License-Identifier: GPL-2.0-only */
 /* Copyright 2002 Andi Kleen */
 
+#include <linux/export.h>
 #include <linux/linkage.h>
 #include <asm/errno.h>
 #include <asm/cpufeatures.h>
 #include <asm/alternative.h>
-#include <asm/export.h>
 
 .section .noinstr.text, "ax"
 
@@ -39,7 +39,7 @@ SYM_TYPED_FUNC_START(__memcpy)
 SYM_FUNC_END(__memcpy)
 EXPORT_SYMBOL(__memcpy)
 
-SYM_FUNC_ALIAS(memcpy, __memcpy)
+SYM_FUNC_ALIAS_MEMFUNC(memcpy, __memcpy)
 EXPORT_SYMBOL(memcpy)
 
 SYM_FUNC_START_LOCAL(memcpy_orig)
index 7c59a704c4584bf7ef3e6a50f2021c31e6f15029..0199d56cb479d88ce0bc6556c092ea87ae9ceb3b 100644 (file)
@@ -1,10 +1,10 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 /* Copyright 2002 Andi Kleen, SuSE Labs */
 
+#include <linux/export.h>
 #include <linux/linkage.h>
 #include <asm/cpufeatures.h>
 #include <asm/alternative.h>
-#include <asm/export.h>
 
 .section .noinstr.text, "ax"
 
@@ -40,7 +40,7 @@ SYM_FUNC_START(__memset)
 SYM_FUNC_END(__memset)
 EXPORT_SYMBOL(__memset)
 
-SYM_FUNC_ALIAS(memset, __memset)
+SYM_FUNC_ALIAS_MEMFUNC(memset, __memset)
 EXPORT_SYMBOL(memset)
 
 SYM_FUNC_START_LOCAL(memset_orig)
index 2fd551915c2025ee7d7adc53f30e44e7b6bf01c1..cdd2fd078027afc99a21a7edb2fa097caf6e4a92 100644 (file)
@@ -105,9 +105,9 @@ static inline u32 get_unaligned_le24(const void *p)
 
 static inline void __put_unaligned_be24(const u32 val, u8 *p)
 {
-       *p++ = val >> 16;
-       *p++ = val >> 8;
-       *p++ = val;
+       *p++ = (val >> 16) & 0xff;
+       *p++ = (val >> 8) & 0xff;
+       *p++ = val & 0xff;
 }
 
 static inline void put_unaligned_be24(const u32 val, void *p)
@@ -117,9 +117,9 @@ static inline void put_unaligned_be24(const u32 val, void *p)
 
 static inline void __put_unaligned_le24(const u32 val, u8 *p)
 {
-       *p++ = val;
-       *p++ = val >> 8;
-       *p++ = val >> 16;
+       *p++ = val & 0xff;
+       *p++ = (val >> 8) & 0xff;
+       *p++ = (val >> 16) & 0xff;
 }
 
 static inline void put_unaligned_le24(const u32 val, void *p)
@@ -129,12 +129,12 @@ static inline void put_unaligned_le24(const u32 val, void *p)
 
 static inline void __put_unaligned_be48(const u64 val, u8 *p)
 {
-       *p++ = val >> 40;
-       *p++ = val >> 32;
-       *p++ = val >> 24;
-       *p++ = val >> 16;
-       *p++ = val >> 8;
-       *p++ = val;
+       *p++ = (val >> 40) & 0xff;
+       *p++ = (val >> 32) & 0xff;
+       *p++ = (val >> 24) & 0xff;
+       *p++ = (val >> 16) & 0xff;
+       *p++ = (val >> 8) & 0xff;
+       *p++ = val & 0xff;
 }
 
 static inline void put_unaligned_be48(const u64 val, void *p)
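The added masks are behaviour-neutral, since the u8 stores already truncate;
the explicit `& 0xff` documents the byte extraction and placates checkers
that flag implicit truncation. A hedged userspace analog of the 24-bit
helper, runnable outside the kernel:

    #include <assert.h>
    #include <stdint.h>

    /* Userspace analog of __put_unaligned_be24() above. */
    static void put_be24(uint32_t val, uint8_t *p)
    {
            *p++ = (val >> 16) & 0xff;
            *p++ = (val >> 8) & 0xff;
            *p++ = val & 0xff;
    }

    int main(void)
    {
            uint8_t buf[3];

            put_be24(0x123456, buf);
            assert(buf[0] == 0x12 && buf[1] == 0x34 && buf[2] == 0x56);
            return 0;
    }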
index 1bdd834bdd57198059c91222036314403191cdbc..d09f9dc172a486875e2e62cf8550a69f24c9beed 100644 (file)
@@ -36,8 +36,8 @@
 #include <linux/compiler-gcc.h>
 #endif
 
-#ifndef asm_volatile_goto
-#define asm_volatile_goto(x...) asm goto(x)
+#ifndef asm_goto_output
+#define asm_goto_output(x...) asm goto(x)
 #endif
 
 #endif /* __LINUX_COMPILER_TYPES_H */
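These hunks retire the asm_volatile_goto() wrapper in favour of plain
`asm goto`, keeping asm_goto_output() as the spelling for the form that has
output operands. A minimal sketch of that construct, illustrative rather
than kernel code, assuming a compiler that supports asm goto with outputs:

    /* Returns *v on the fall-through path, or -1 if *v was zero. */
    static inline int read_nonzero(int *v)
    {
            int old;

            asm goto("movl %1, %0\n\t"
                     "testl %0, %0\n\t"
                     "jz %l[was_zero]"
                     : "=r" (old)
                     : "m" (*v)
                     : "cc"
                     : was_zero);
            return old;
    was_zero:
            return -1;
    }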
index 756b013fb8324bd7a320e60cebec2ca692faa149..75f00965ab1586cd64d00928217596de5034bd25 100644 (file)
@@ -829,8 +829,21 @@ __SYSCALL(__NR_futex_wait, sys_futex_wait)
 #define __NR_futex_requeue 456
 __SYSCALL(__NR_futex_requeue, sys_futex_requeue)
 
+#define __NR_statmount   457
+__SYSCALL(__NR_statmount, sys_statmount)
+
+#define __NR_listmount   458
+__SYSCALL(__NR_listmount, sys_listmount)
+
+#define __NR_lsm_get_self_attr 459
+__SYSCALL(__NR_lsm_get_self_attr, sys_lsm_get_self_attr)
+#define __NR_lsm_set_self_attr 460
+__SYSCALL(__NR_lsm_set_self_attr, sys_lsm_set_self_attr)
+#define __NR_lsm_list_modules 461
+__SYSCALL(__NR_lsm_list_modules, sys_lsm_list_modules)
+
 #undef __NR_syscalls
-#define __NR_syscalls 457
+#define __NR_syscalls 462
 
 /*
  * 32 bit systems traditionally used different
index de723566c5ae82382192923e17478209f7c94f41..16122819edfeff872b91d989d1f6267640ae1391 100644 (file)
@@ -713,7 +713,8 @@ struct drm_gem_open {
 /**
  * DRM_CAP_ASYNC_PAGE_FLIP
  *
- * If set to 1, the driver supports &DRM_MODE_PAGE_FLIP_ASYNC.
+ * If set to 1, the driver supports &DRM_MODE_PAGE_FLIP_ASYNC for legacy
+ * page-flips.
  */
 #define DRM_CAP_ASYNC_PAGE_FLIP                0x7
 /**
@@ -773,6 +774,13 @@ struct drm_gem_open {
  * :ref:`drm_sync_objects`.
  */
 #define DRM_CAP_SYNCOBJ_TIMELINE       0x14
+/**
+ * DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP
+ *
+ * If set to 1, the driver supports &DRM_MODE_PAGE_FLIP_ASYNC for atomic
+ * commits.
+ */
+#define DRM_CAP_ATOMIC_ASYNC_PAGE_FLIP 0x15
 
 /* DRM_IOCTL_GET_CAP ioctl argument type */
 struct drm_get_cap {
@@ -842,6 +850,31 @@ struct drm_get_cap {
  */
 #define DRM_CLIENT_CAP_WRITEBACK_CONNECTORS    5
 
+/**
+ * DRM_CLIENT_CAP_CURSOR_PLANE_HOTSPOT
+ *
+ * Drivers for para-virtualized hardware (e.g. vmwgfx, qxl, virtio and
+ * virtualbox) have additional restrictions for cursor planes (thus
+ * making cursor planes on those drivers not truly universal), e.g.
+ * they need cursor planes to act like one would expect from a mouse
+ * cursor and to have correctly set hotspot properties.
+ * If this client cap is not set, the DRM core will hide the cursor
+ * plane on those virtualized drivers, because not setting it implies
+ * that the client is not capable of dealing with those extra
+ * restrictions.
+ * Clients which do set cursor hotspot and treat the cursor plane
+ * like a mouse cursor should set this property.
+ * The client must enable &DRM_CLIENT_CAP_ATOMIC first.
+ *
+ * Setting this property on drivers which do not special case
+ * cursor planes (i.e. non-virtualized drivers) will return
+ * EOPNOTSUPP, which userspace can use to gauge the
+ * requirements of the hardware/drivers it is running on.
+ *
+ * This capability is always supported for atomic-capable virtualized
+ * drivers starting from kernel version 6.6.
+ */
+#define DRM_CLIENT_CAP_CURSOR_PLANE_HOTSPOT    6
+
 /* DRM_IOCTL_SET_CLIENT_CAP ioctl argument type */
 struct drm_set_client_cap {
        __u64 capability;
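A hedged sketch of a client opting in after enabling DRM_CLIENT_CAP_ATOMIC,
per the comment above; the struct's value member sits outside this hunk, and
drm_fd is assumed to be an already-open DRM device:

    struct drm_set_client_cap cap = {
            .capability = DRM_CLIENT_CAP_CURSOR_PLANE_HOTSPOT,
            .value = 1,
    };

    /* EOPNOTSUPP here tells userspace it is not on a virtualized driver. */
    ioctl(drm_fd, DRM_IOCTL_SET_CLIENT_CAP, &cap);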
@@ -893,6 +926,7 @@ struct drm_syncobj_transfer {
 #define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL (1 << 0)
 #define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT (1 << 1)
 #define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE (1 << 2) /* wait for time point to become available */
+#define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE (1 << 3) /* set fence deadline to deadline_nsec */
 struct drm_syncobj_wait {
        __u64 handles;
        /* absolute timeout */
@@ -901,6 +935,14 @@ struct drm_syncobj_wait {
        __u32 flags;
        __u32 first_signaled; /* only valid when not waiting all */
        __u32 pad;
+       /**
+        * @deadline_nsec - fence deadline hint
+        *
+        * Deadline hint, in absolute CLOCK_MONOTONIC, to set on backing
+        * fence(s) if the DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE flag is
+        * set.
+        */
+       __u64 deadline_nsec;
 };
 
 struct drm_syncobj_timeline_wait {
@@ -913,6 +955,14 @@ struct drm_syncobj_timeline_wait {
        __u32 flags;
        __u32 first_signaled; /* only valid when not waiting all */
        __u32 pad;
+       /**
+        * @deadline_nsec - fence deadline hint
+        *
+        * Deadline hint, in absolute CLOCK_MONOTONIC, to set on backing
+        * fence(s) if the DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE flag is
+        * set.
+        */
+       __u64 deadline_nsec;
 };
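A hedged sketch of using the new deadline hint with a syncobj wait;
count_handles and timeout_nsec are existing struct drm_syncobj_wait members
that fall outside this hunk:

    #include <stdint.h>
    #include <time.h>
    #include <sys/ioctl.h>
    #include <drm/drm.h>

    static int wait_with_deadline(int drm_fd, uint32_t handle, int64_t budget_ns)
    {
            struct timespec now;
            struct drm_syncobj_wait wait = {
                    .handles = (uint64_t)(uintptr_t)&handle,
                    .count_handles = 1,
                    .flags = DRM_SYNCOBJ_WAIT_FLAGS_WAIT_DEADLINE,
            };

            clock_gettime(CLOCK_MONOTONIC, &now);
            wait.timeout_nsec = now.tv_sec * 1000000000LL + now.tv_nsec + budget_ns;
            /* Hint drivers to try to signal the fence by the same point. */
            wait.deadline_nsec = wait.timeout_nsec;

            return ioctl(drm_fd, DRM_IOCTL_SYNCOBJ_WAIT, &wait);
    }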
 
 /**
@@ -1218,6 +1268,26 @@ extern "C" {
 
 #define DRM_IOCTL_SYNCOBJ_EVENTFD      DRM_IOWR(0xCF, struct drm_syncobj_eventfd)
 
+/**
+ * DRM_IOCTL_MODE_CLOSEFB - Close a framebuffer.
+ *
+ * This closes a framebuffer previously added via ADDFB/ADDFB2. The IOCTL
+ * argument is a framebuffer object ID.
+ *
+ * This IOCTL is similar to &DRM_IOCTL_MODE_RMFB, except it doesn't disable
+ * planes and CRTCs. As long as the framebuffer is used by a plane, it's kept
+ * alive. When the plane no longer uses the framebuffer (because the
+ * framebuffer is replaced with another one, or the plane is disabled), the
+ * framebuffer is cleaned up.
+ *
+ * This is useful to implement flicker-free transitions between two processes.
+ *
+ * Depending on the threat model, user-space may want to ensure that the
+ * framebuffer doesn't expose any sensitive user information: closed
+ * framebuffers attached to a plane can be read back by the next DRM master.
+ */
+#define DRM_IOCTL_MODE_CLOSEFB         DRM_IOWR(0xD0, struct drm_mode_closefb)
+
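A hedged sketch of the flicker-free handover described above: the outgoing
DRM master closes its framebuffer without disturbing the scanout (struct
drm_mode_closefb and its fb_id member live in drm_mode.h, outside this hunk):

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <drm/drm.h>

    static int close_fb_keep_scanout(int drm_fd, uint32_t fb_id)
    {
            struct drm_mode_closefb closefb = { .fb_id = fb_id };

            /* Unlike DRM_IOCTL_MODE_RMFB, planes and CRTCs stay enabled;
             * the fb is released once no plane references it. */
            return ioctl(drm_fd, DRM_IOCTL_MODE_CLOSEFB, &closefb);
    }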
 /*
  * Device specific ioctls should only be in their respective headers
  * The device specific ioctl range is from 0x40 to 0x9f.
index 218edb0a96f8c043df13a5bf25f85ec754ee449a..fd4f9574d177a269b2cdbe5a36b3b30f2addbc94 100644 (file)
@@ -693,7 +693,7 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_HAS_EXEC_FENCE       44
 
 /* Query whether DRM_I915_GEM_EXECBUFFER2 supports the ability to capture
- * user specified bufffers for post-mortem debugging of GPU hangs. See
+ * user-specified buffers for post-mortem debugging of GPU hangs. See
  * EXEC_OBJECT_CAPTURE.
  */
 #define I915_PARAM_HAS_EXEC_CAPTURE     45
@@ -1606,7 +1606,7 @@ struct drm_i915_gem_busy {
         * is accurate.
         *
         * The returned dword is split into two fields to indicate both
-        * the engine classess on which the object is being read, and the
+        * the engine classes on which the object is being read, and the
         * engine class on which it is currently being written (if any).
         *
         * The low word (bits 0:15) indicate if the object is being written
@@ -1815,7 +1815,7 @@ struct drm_i915_gem_madvise {
        __u32 handle;
 
        /* Advice: either the buffer will be needed again in the near future,
-        *         or wont be and could be discarded under memory pressure.
+        *         or won't be and could be discarded under memory pressure.
         */
        __u32 madv;
 
@@ -3246,7 +3246,7 @@ struct drm_i915_query_topology_info {
  *     // enough to hold our array of engines. The kernel will fill out the
  *     // item.length for us, which is the number of bytes we need.
  *     //
- *     // Alternatively a large buffer can be allocated straight away enabling
+ *     // Alternatively a large buffer can be allocated straightaway enabling
  *     // querying in one pass, in which case item.length should contain the
  *     // length of the provided buffer.
  *     err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
@@ -3256,7 +3256,7 @@ struct drm_i915_query_topology_info {
  *     // Now that we allocated the required number of bytes, we call the ioctl
  *     // again, this time with the data_ptr pointing to our newly allocated
  *     // blob, which the kernel can then populate with info on all engines.
- *     item.data_ptr = (uintptr_t)&info,
+ *     item.data_ptr = (uintptr_t)&info;
  *
  *     err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
  *     if (err) ...
@@ -3286,7 +3286,7 @@ struct drm_i915_query_topology_info {
 /**
  * struct drm_i915_engine_info
  *
- * Describes one engine and it's capabilities as known to the driver.
+ * Describes one engine and its capabilities as known to the driver.
  */
 struct drm_i915_engine_info {
        /** @engine: Engine class and instance. */
index 6c80f96049bd07d1aa527c103acb07fe52bfd617..282e90aeb163c0288590995b38fe011b19e85111 100644 (file)
 #define AT_HANDLE_FID          AT_REMOVEDIR    /* file handle is needed to
                                        compare object identity and may not
                                        be usable to open_by_handle_at(2) */
+#if defined(__KERNEL__)
+#define AT_GETATTR_NOSEC       0x80000000
+#endif
 
 #endif /* _UAPI_LINUX_FCNTL_H */
index 211b86de35ac53f6457bbd2fae8c973ce6b3a968..c3308536482bdb2bfb1279279325faf5430a3356 100644 (file)
 
 #define KVM_API_VERSION 12
 
-/* *** Deprecated interfaces *** */
-
-#define KVM_TRC_SHIFT           16
-
-#define KVM_TRC_ENTRYEXIT       (1 << KVM_TRC_SHIFT)
-#define KVM_TRC_HANDLER         (1 << (KVM_TRC_SHIFT + 1))
-
-#define KVM_TRC_VMENTRY         (KVM_TRC_ENTRYEXIT + 0x01)
-#define KVM_TRC_VMEXIT          (KVM_TRC_ENTRYEXIT + 0x02)
-#define KVM_TRC_PAGE_FAULT      (KVM_TRC_HANDLER + 0x01)
-
-#define KVM_TRC_HEAD_SIZE       12
-#define KVM_TRC_CYCLE_SIZE      8
-#define KVM_TRC_EXTRA_MAX       7
-
-#define KVM_TRC_INJ_VIRQ         (KVM_TRC_HANDLER + 0x02)
-#define KVM_TRC_REDELIVER_EVT    (KVM_TRC_HANDLER + 0x03)
-#define KVM_TRC_PEND_INTR        (KVM_TRC_HANDLER + 0x04)
-#define KVM_TRC_IO_READ          (KVM_TRC_HANDLER + 0x05)
-#define KVM_TRC_IO_WRITE         (KVM_TRC_HANDLER + 0x06)
-#define KVM_TRC_CR_READ          (KVM_TRC_HANDLER + 0x07)
-#define KVM_TRC_CR_WRITE         (KVM_TRC_HANDLER + 0x08)
-#define KVM_TRC_DR_READ          (KVM_TRC_HANDLER + 0x09)
-#define KVM_TRC_DR_WRITE         (KVM_TRC_HANDLER + 0x0A)
-#define KVM_TRC_MSR_READ         (KVM_TRC_HANDLER + 0x0B)
-#define KVM_TRC_MSR_WRITE        (KVM_TRC_HANDLER + 0x0C)
-#define KVM_TRC_CPUID            (KVM_TRC_HANDLER + 0x0D)
-#define KVM_TRC_INTR             (KVM_TRC_HANDLER + 0x0E)
-#define KVM_TRC_NMI              (KVM_TRC_HANDLER + 0x0F)
-#define KVM_TRC_VMMCALL          (KVM_TRC_HANDLER + 0x10)
-#define KVM_TRC_HLT              (KVM_TRC_HANDLER + 0x11)
-#define KVM_TRC_CLTS             (KVM_TRC_HANDLER + 0x12)
-#define KVM_TRC_LMSW             (KVM_TRC_HANDLER + 0x13)
-#define KVM_TRC_APIC_ACCESS      (KVM_TRC_HANDLER + 0x14)
-#define KVM_TRC_TDP_FAULT        (KVM_TRC_HANDLER + 0x15)
-#define KVM_TRC_GTLB_WRITE       (KVM_TRC_HANDLER + 0x16)
-#define KVM_TRC_STLB_WRITE       (KVM_TRC_HANDLER + 0x17)
-#define KVM_TRC_STLB_INVAL       (KVM_TRC_HANDLER + 0x18)
-#define KVM_TRC_PPC_INSTR        (KVM_TRC_HANDLER + 0x19)
-
-struct kvm_user_trace_setup {
-       __u32 buf_size;
-       __u32 buf_nr;
-};
-
-#define __KVM_DEPRECATED_MAIN_W_0x06 \
-       _IOW(KVMIO, 0x06, struct kvm_user_trace_setup)
-#define __KVM_DEPRECATED_MAIN_0x07 _IO(KVMIO, 0x07)
-#define __KVM_DEPRECATED_MAIN_0x08 _IO(KVMIO, 0x08)
-
-#define __KVM_DEPRECATED_VM_R_0x70 _IOR(KVMIO, 0x70, struct kvm_assigned_irq)
-
-struct kvm_breakpoint {
-       __u32 enabled;
-       __u32 padding;
-       __u64 address;
-};
-
-struct kvm_debug_guest {
-       __u32 enabled;
-       __u32 pad;
-       struct kvm_breakpoint breakpoints[4];
-       __u32 singlestep;
-};
-
-#define __KVM_DEPRECATED_VCPU_W_0x87 _IOW(KVMIO, 0x87, struct kvm_debug_guest)
-
-/* *** End of deprecated interfaces *** */
-
-
 /* for KVM_SET_USER_MEMORY_REGION */
 struct kvm_userspace_memory_region {
        __u32 slot;
@@ -95,6 +25,19 @@ struct kvm_userspace_memory_region {
        __u64 userspace_addr; /* start of the userspace allocated memory */
 };
 
+/* for KVM_SET_USER_MEMORY_REGION2 */
+struct kvm_userspace_memory_region2 {
+       __u32 slot;
+       __u32 flags;
+       __u64 guest_phys_addr;
+       __u64 memory_size;
+       __u64 userspace_addr;
+       __u64 guest_memfd_offset;
+       __u32 guest_memfd;
+       __u32 pad1;
+       __u64 pad2[14];
+};
+
 /*
  * The bit 0 ~ bit 15 of kvm_userspace_memory_region::flags are visible for
  * userspace, other bits are reserved for kvm internal use which are defined
@@ -102,6 +45,7 @@ struct kvm_userspace_memory_region {
  */
 #define KVM_MEM_LOG_DIRTY_PAGES        (1UL << 0)
 #define KVM_MEM_READONLY       (1UL << 1)
+#define KVM_MEM_GUEST_MEMFD    (1UL << 2)
 
 /* for KVM_IRQ_LINE */
 struct kvm_irq_level {
@@ -265,6 +209,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_RISCV_CSR        36
 #define KVM_EXIT_NOTIFY           37
 #define KVM_EXIT_LOONGARCH_IOCSR  38
+#define KVM_EXIT_MEMORY_FAULT     39
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -518,6 +463,13 @@ struct kvm_run {
 #define KVM_NOTIFY_CONTEXT_INVALID     (1 << 0)
                        __u32 flags;
                } notify;
+               /* KVM_EXIT_MEMORY_FAULT */
+               struct {
+#define KVM_MEMORY_EXIT_FLAG_PRIVATE   (1ULL << 3)
+                       __u64 flags;
+                       __u64 gpa;
+                       __u64 size;
+               } memory_fault;
                /* Fix the size of the union. */
                char padding[256];
        };
@@ -945,9 +897,6 @@ struct kvm_ppc_resize_hpt {
  */
 #define KVM_GET_VCPU_MMAP_SIZE    _IO(KVMIO,   0x04) /* in bytes */
 #define KVM_GET_SUPPORTED_CPUID   _IOWR(KVMIO, 0x05, struct kvm_cpuid2)
-#define KVM_TRACE_ENABLE          __KVM_DEPRECATED_MAIN_W_0x06
-#define KVM_TRACE_PAUSE           __KVM_DEPRECATED_MAIN_0x07
-#define KVM_TRACE_DISABLE         __KVM_DEPRECATED_MAIN_0x08
 #define KVM_GET_EMULATED_CPUID   _IOWR(KVMIO, 0x09, struct kvm_cpuid2)
 #define KVM_GET_MSR_FEATURE_INDEX_LIST    _IOWR(KVMIO, 0x0a, struct kvm_msr_list)
 
@@ -1201,6 +1150,11 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE 228
 #define KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES 229
 #define KVM_CAP_ARM_SUPPORTED_REG_MASK_RANGES 230
+#define KVM_CAP_USER_MEMORY2 231
+#define KVM_CAP_MEMORY_FAULT_INFO 232
+#define KVM_CAP_MEMORY_ATTRIBUTES 233
+#define KVM_CAP_GUEST_MEMFD 234
+#define KVM_CAP_VM_TYPES 235
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1291,6 +1245,7 @@ struct kvm_x86_mce {
 #define KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL       (1 << 4)
 #define KVM_XEN_HVM_CONFIG_EVTCHN_SEND         (1 << 5)
 #define KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG        (1 << 6)
+#define KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE        (1 << 7)
 
 struct kvm_xen_hvm_config {
        __u32 flags;
@@ -1483,6 +1438,8 @@ struct kvm_vfio_spapr_tce {
                                        struct kvm_userspace_memory_region)
 #define KVM_SET_TSS_ADDR          _IO(KVMIO,   0x47)
 #define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO,  0x48, __u64)
+#define KVM_SET_USER_MEMORY_REGION2 _IOW(KVMIO, 0x49, \
+                                        struct kvm_userspace_memory_region2)
 
 /* enable ucontrol for s390 */
 struct kvm_s390_ucas_mapping {
@@ -1507,20 +1464,8 @@ struct kvm_s390_ucas_mapping {
                        _IOW(KVMIO,  0x67, struct kvm_coalesced_mmio_zone)
 #define KVM_UNREGISTER_COALESCED_MMIO \
                        _IOW(KVMIO,  0x68, struct kvm_coalesced_mmio_zone)
-#define KVM_ASSIGN_PCI_DEVICE     _IOR(KVMIO,  0x69, \
-                                      struct kvm_assigned_pci_dev)
 #define KVM_SET_GSI_ROUTING       _IOW(KVMIO,  0x6a, struct kvm_irq_routing)
-/* deprecated, replaced by KVM_ASSIGN_DEV_IRQ */
-#define KVM_ASSIGN_IRQ            __KVM_DEPRECATED_VM_R_0x70
-#define KVM_ASSIGN_DEV_IRQ        _IOW(KVMIO,  0x70, struct kvm_assigned_irq)
 #define KVM_REINJECT_CONTROL      _IO(KVMIO,   0x71)
-#define KVM_DEASSIGN_PCI_DEVICE   _IOW(KVMIO,  0x72, \
-                                      struct kvm_assigned_pci_dev)
-#define KVM_ASSIGN_SET_MSIX_NR    _IOW(KVMIO,  0x73, \
-                                      struct kvm_assigned_msix_nr)
-#define KVM_ASSIGN_SET_MSIX_ENTRY _IOW(KVMIO,  0x74, \
-                                      struct kvm_assigned_msix_entry)
-#define KVM_DEASSIGN_DEV_IRQ      _IOW(KVMIO,  0x75, struct kvm_assigned_irq)
 #define KVM_IRQFD                 _IOW(KVMIO,  0x76, struct kvm_irqfd)
 #define KVM_CREATE_PIT2                  _IOW(KVMIO,  0x77, struct kvm_pit_config)
 #define KVM_SET_BOOT_CPU_ID       _IO(KVMIO,   0x78)
@@ -1537,9 +1482,6 @@ struct kvm_s390_ucas_mapping {
 *  KVM_CAP_VM_TSC_CONTROL to set defaults for a VM */
 #define KVM_SET_TSC_KHZ           _IO(KVMIO,  0xa2)
 #define KVM_GET_TSC_KHZ           _IO(KVMIO,  0xa3)
-/* Available with KVM_CAP_PCI_2_3 */
-#define KVM_ASSIGN_SET_INTX_MASK  _IOW(KVMIO,  0xa4, \
-                                      struct kvm_assigned_pci_dev)
 /* Available with KVM_CAP_SIGNAL_MSI */
 #define KVM_SIGNAL_MSI            _IOW(KVMIO,  0xa5, struct kvm_msi)
 /* Available with KVM_CAP_PPC_GET_SMMU_INFO */
@@ -1592,8 +1534,6 @@ struct kvm_s390_ucas_mapping {
 #define KVM_SET_SREGS             _IOW(KVMIO,  0x84, struct kvm_sregs)
 #define KVM_TRANSLATE             _IOWR(KVMIO, 0x85, struct kvm_translation)
 #define KVM_INTERRUPT             _IOW(KVMIO,  0x86, struct kvm_interrupt)
-/* KVM_DEBUG_GUEST is no longer supported, use KVM_SET_GUEST_DEBUG instead */
-#define KVM_DEBUG_GUEST           __KVM_DEPRECATED_VCPU_W_0x87
 #define KVM_GET_MSRS              _IOWR(KVMIO, 0x88, struct kvm_msrs)
 #define KVM_SET_MSRS              _IOW(KVMIO,  0x89, struct kvm_msrs)
 #define KVM_SET_CPUID             _IOW(KVMIO,  0x8a, struct kvm_cpuid)
@@ -2267,4 +2207,24 @@ struct kvm_s390_zpci_op {
 /* flags for kvm_s390_zpci_op->u.reg_aen.flags */
 #define KVM_S390_ZPCIOP_REGAEN_HOST    (1 << 0)
 
+/* Available with KVM_CAP_MEMORY_ATTRIBUTES */
+#define KVM_SET_MEMORY_ATTRIBUTES              _IOW(KVMIO,  0xd2, struct kvm_memory_attributes)
+
+struct kvm_memory_attributes {
+       __u64 address;
+       __u64 size;
+       __u64 attributes;
+       __u64 flags;
+};
+
+#define KVM_MEMORY_ATTRIBUTE_PRIVATE           (1ULL << 3)
+
+#define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO,  0xd4, struct kvm_create_guest_memfd)
+
+struct kvm_create_guest_memfd {
+       __u64 size;
+       __u64 flags;
+       __u64 reserved[6];
+};
+
 #endif /* __LINUX_KVM_H */
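A hedged sketch tying the guest_memfd pieces together: create a
guest-private memory fd on the VM, then bind it to a slot with
KVM_SET_USER_MEMORY_REGION2 (slot number and sizes are illustrative, error
handling is minimal):

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    static int map_private_slot(int vm_fd, uint64_t gpa, uint64_t size,
                                uint64_t uaddr)
    {
            struct kvm_create_guest_memfd gmem = { .size = size };
            struct kvm_userspace_memory_region2 region = {
                    .slot = 0,
                    .flags = KVM_MEM_GUEST_MEMFD,
                    .guest_phys_addr = gpa,
                    .memory_size = size,
                    .userspace_addr = uaddr,
            };
            int gmem_fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);

            if (gmem_fd < 0)
                    return gmem_fd;

            region.guest_memfd = gmem_fd;
            return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &region);
    }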
index bb242fdcfe6b29bf96e287023701dd8629042969..ad5478dbad007341f70a8816aa506216ffea89ec 100644 (file)
@@ -138,4 +138,74 @@ struct mount_attr {
 /* List of all mount_attr versions. */
 #define MOUNT_ATTR_SIZE_VER0   32 /* sizeof first published struct */
 
+
+/*
+ * Structure for getting mount/superblock/filesystem info with statmount(2).
+ *
+ * The interface is similar to statx(2): individual fields or groups can be
+ * selected with the @mask argument of statmount().  The kernel will set the
+ * @mask field according to the supported fields.
+ *
+ * If string fields are selected, then the caller needs to pass a buffer that
+ * has space after the fixed part of the structure.  NUL-terminated strings are
+ * copied there and offsets relative to @str are stored in the relevant fields.
+ * If the buffer is too small, then EOVERFLOW is returned.  The size actually
+ * used is returned in @size.
+ */
+struct statmount {
+       __u32 size;             /* Total size, including strings */
+       __u32 __spare1;
+       __u64 mask;             /* What results were written */
+       __u32 sb_dev_major;     /* Device ID */
+       __u32 sb_dev_minor;
+       __u64 sb_magic;         /* ..._SUPER_MAGIC */
+       __u32 sb_flags;         /* SB_{RDONLY,SYNCHRONOUS,DIRSYNC,LAZYTIME} */
+       __u32 fs_type;          /* [str] Filesystem type */
+       __u64 mnt_id;           /* Unique ID of mount */
+       __u64 mnt_parent_id;    /* Unique ID of parent (for root == mnt_id) */
+       __u32 mnt_id_old;       /* Reused IDs used in proc/.../mountinfo */
+       __u32 mnt_parent_id_old;
+       __u64 mnt_attr;         /* MOUNT_ATTR_... */
+       __u64 mnt_propagation;  /* MS_{SHARED,SLAVE,PRIVATE,UNBINDABLE} */
+       __u64 mnt_peer_group;   /* ID of shared peer group */
+       __u64 mnt_master;       /* Mount receives propagation from this ID */
+       __u64 propagate_from;   /* Propagation from in current namespace */
+       __u32 mnt_root;         /* [str] Root of mount relative to root of fs */
+       __u32 mnt_point;        /* [str] Mountpoint relative to current root */
+       __u64 __spare2[50];
+       char str[];             /* Variable size part containing strings */
+};
+
+/*
+ * Structure for passing mount ID and miscellaneous parameters to statmount(2)
+ * and listmount(2).
+ *
+ * For statmount(2) @param represents the request mask.
+ * For listmount(2) @param represents the last listed mount id (or zero).
+ */
+struct mnt_id_req {
+       __u32 size;
+       __u32 spare;
+       __u64 mnt_id;
+       __u64 param;
+};
+
+/* List of all mnt_id_req versions. */
+#define MNT_ID_REQ_SIZE_VER0   24 /* sizeof first published struct */
+
+/*
+ * @mask bits for statmount(2)
+ */
+#define STATMOUNT_SB_BASIC             0x00000001U     /* Want/got sb_... */
+#define STATMOUNT_MNT_BASIC            0x00000002U     /* Want/got mnt_... */
+#define STATMOUNT_PROPAGATE_FROM       0x00000004U     /* Want/got propagate_from */
+#define STATMOUNT_MNT_ROOT             0x00000008U     /* Want/got mnt_root  */
+#define STATMOUNT_MNT_POINT            0x00000010U     /* Want/got mnt_point */
+#define STATMOUNT_FS_TYPE              0x00000020U     /* Want/got fs_type */
+
+/*
+ * Special @mnt_id values that can be passed to listmount
+ */
+#define LSMT_ROOT              0xffffffffffffffff      /* root mount */
+
 #endif /* _UAPI_LINUX_MOUNT_H */
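A hedged end-to-end sketch: fetch the unique mount ID of "/" with the new
STATX_MNT_ID_UNIQUE flag (added in the statx.h hunk below), then hand it to
statmount(2) via the __NR_statmount number from the unistd.h hunk above.
This assumes a glibc with the statx() wrapper and kernel headers new enough
to define the new constants:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <linux/mount.h>

    int main(void)
    {
            char buf[4096];
            struct statmount *sm = (struct statmount *)buf;
            struct statx sx;
            struct mnt_id_req req = {
                    .size = MNT_ID_REQ_SIZE_VER0,
                    .param = STATMOUNT_FS_TYPE | STATMOUNT_MNT_POINT,
            };

            /* STATX_MNT_ID_UNIQUE yields the 64-bit ID statmount() expects. */
            if (statx(AT_FDCWD, "/", 0, STATX_MNT_ID_UNIQUE, &sx) < 0)
                    return 1;
            req.mnt_id = sx.stx_mnt_id;

            if (syscall(__NR_statmount, &req, sm, sizeof(buf), 0) < 0)
                    return 1;
            if (sm->mask & STATMOUNT_FS_TYPE)
                    printf("fstype:  %s\n", sm->str + sm->fs_type);
            if (sm->mask & STATMOUNT_MNT_POINT)
                    printf("mounted: %s\n", sm->str + sm->mnt_point);
            return 0;
    }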
index 7cab2c65d3d7fce9210d2fb6d02012233b9923cf..2f2ee82d55175d052c0214a7e29da5d6ce2738ab 100644 (file)
@@ -154,6 +154,7 @@ struct statx {
 #define STATX_BTIME            0x00000800U     /* Want/got stx_btime */
 #define STATX_MNT_ID           0x00001000U     /* Got stx_mnt_id */
 #define STATX_DIOALIGN         0x00002000U     /* Want/got direct I/O alignment info */
+#define STATX_MNT_ID_UNIQUE    0x00004000U     /* Want/got extended stx_mount_id */
 
 #define STATX__RESERVED                0x80000000U     /* Reserved for future struct statx expansion */
 
index c82a7f41b31c571c09c3dd454a7db3ebbb8a1e60..45e49671ae87b01fac48b2877f46403cf9ee1c36 100644 (file)
@@ -466,6 +466,8 @@ ynl_gemsg_start_dump(struct ynl_sock *ys, __u32 id, __u8 cmd, __u8 version)
 
 int ynl_recv_ack(struct ynl_sock *ys, int ret)
 {
+       struct ynl_parse_arg yarg = { .ys = ys, };
+
        if (!ret) {
                yerr(ys, YNL_ERROR_EXPECT_ACK,
                     "Expecting an ACK but nothing received");
@@ -478,7 +480,7 @@ int ynl_recv_ack(struct ynl_sock *ys, int ret)
                return ret;
        }
        return mnl_cb_run(ys->rx_buf, ret, ys->seq, ys->portid,
-                         ynl_cb_null, ys);
+                         ynl_cb_null, &yarg);
 }
 
 int ynl_cb_null(const struct nlmsghdr *nlh, void *data)
@@ -521,6 +523,7 @@ ynl_get_family_info_mcast(struct ynl_sock *ys, const struct nlattr *mcasts)
                                ys->mcast_groups[i].name[GENL_NAMSIZ - 1] = 0;
                        }
                }
+               i++;
        }
 
        return 0;
@@ -586,7 +589,13 @@ static int ynl_sock_read_family(struct ynl_sock *ys, const char *family_name)
                return err;
        }
 
-       return ynl_recv_ack(ys, err);
+       err = ynl_recv_ack(ys, err);
+       if (err < 0) {
+               free(ys->mcast_groups);
+               return err;
+       }
+
+       return 0;
 }
 
 struct ynl_sock *
@@ -741,11 +750,14 @@ err_free:
 
 static int ynl_ntf_trampoline(const struct nlmsghdr *nlh, void *data)
 {
-       return ynl_ntf_parse((struct ynl_sock *)data, nlh);
+       struct ynl_parse_arg *yarg = data;
+
+       return ynl_ntf_parse(yarg->ys, nlh);
 }
 
 int ynl_ntf_check(struct ynl_sock *ys)
 {
+       struct ynl_parse_arg yarg = { .ys = ys, };
        ssize_t len;
        int err;
 
@@ -767,7 +779,7 @@ int ynl_ntf_check(struct ynl_sock *ys)
                        return len;
 
                err = mnl_cb_run2(ys->rx_buf, len, ys->seq, ys->portid,
-                                 ynl_ntf_trampoline, ys,
+                                 ynl_ntf_trampoline, &yarg,
                                  ynl_cb_array, NLMSG_MIN_TYPE);
                if (err < 0)
                        return err;
index 1b90575ee3c84eb206f9291e8fd05d43c70b2f9c..3b12595193c9f49a78a490b888f22d615b92dc43 100644 (file)
@@ -47,6 +47,10 @@ Print PMU events and metrics limited to the specific PMU name.
 --json::
 Output in JSON format.
 
+-o::
+--output=::
	Output file name. By default, output is written to stdout.
+
 [[EVENT_MODIFIERS]]
 EVENT MODIFIERS
 ---------------
index 27e7c478880fdecd10761fc07d4249bf1581d9c0..f8774a9b1377a3e98b98543a66b4f8aea6fb6837 100644 (file)
@@ -236,6 +236,16 @@ else
   SHELLCHECK := $(shell which shellcheck 2> /dev/null)
 endif
 
+# shellcheck is used in tools/perf/tests/Build with the options -a/--check-sourced
+# (introduced in v0.4.7) and -S/--severity (introduced in v0.6.0), so require
+# v0.6.0 as the minimum shellcheck version.
+ifneq ($(SHELLCHECK),)
+  ifeq ($(shell expr $(shell $(SHELLCHECK) --version | grep version: | \
+        sed -e 's/.\+ \([0-9]\+\).\([0-9]\+\).\([0-9]\+\)/\1\2\3/g') \< 060), 1)
+    SHELLCHECK :=
+  endif
+endif
+
 export srctree OUTPUT RM CC CXX LD AR CFLAGS CXXFLAGS V BISON FLEX AWK
 export HOSTCC HOSTLD HOSTAR HOSTCFLAGS SHELLCHECK
 
index 61c2c96cc0701b886d7c1daecd92cfa25581d1f7..e27a1b1288c29ffe96ce871bc5bab76c8a67c8b7 100644 (file)
@@ -30,6 +30,8 @@
  * functions.
  */
 struct print_state {
+       /** @fp: File to write output to. */
+       FILE *fp;
        /**
         * @pmu_glob: Optionally restrict PMU and metric matching to PMU or
         * debugfs subsystem name.
@@ -66,13 +68,15 @@ static void default_print_start(void *ps)
 {
        struct print_state *print_state = ps;
 
-       if (!print_state->name_only && pager_in_use())
-               printf("\nList of pre-defined events (to be used in -e or -M):\n\n");
+       if (!print_state->name_only && pager_in_use()) {
+               fprintf(print_state->fp,
+                       "\nList of pre-defined events (to be used in -e or -M):\n\n");
+       }
 }
 
 static void default_print_end(void *print_state __maybe_unused) {}
 
-static void wordwrap(const char *s, int start, int max, int corr)
+static void wordwrap(FILE *fp, const char *s, int start, int max, int corr)
 {
        int column = start;
        int n;
@@ -82,10 +86,10 @@ static void wordwrap(const char *s, int start, int max, int corr)
                int wlen = strcspn(s, " \t\n");
 
                if ((column + wlen >= max && column > start) || saw_newline) {
-                       printf("\n%*s", start, "");
+                       fprintf(fp, "\n%*s", start, "");
                        column = start + corr;
                }
-               n = printf("%s%.*s", column > start ? " " : "", wlen, s);
+               n = fprintf(fp, "%s%.*s", column > start ? " " : "", wlen, s);
                if (n <= 0)
                        break;
                saw_newline = s[wlen] == '\n';
@@ -104,6 +108,7 @@ static void default_print_event(void *ps, const char *pmu_name, const char *topi
 {
        struct print_state *print_state = ps;
        int pos;
+       FILE *fp = print_state->fp;
 
        if (deprecated && !print_state->deprecated)
                return;
@@ -119,30 +124,30 @@ static void default_print_event(void *ps, const char *pmu_name, const char *topi
 
        if (print_state->name_only) {
                if (event_alias && strlen(event_alias))
-                       printf("%s ", event_alias);
+                       fprintf(fp, "%s ", event_alias);
                else
-                       printf("%s ", event_name);
+                       fprintf(fp, "%s ", event_name);
                return;
        }
 
        if (strcmp(print_state->last_topic, topic ?: "")) {
                if (topic)
-                       printf("\n%s:\n", topic);
+                       fprintf(fp, "\n%s:\n", topic);
                zfree(&print_state->last_topic);
                print_state->last_topic = strdup(topic ?: "");
        }
 
        if (event_alias && strlen(event_alias))
-               pos = printf("  %s OR %s", event_name, event_alias);
+               pos = fprintf(fp, "  %s OR %s", event_name, event_alias);
        else
-               pos = printf("  %s", event_name);
+               pos = fprintf(fp, "  %s", event_name);
 
        if (!topic && event_type_desc) {
                for (; pos < 53; pos++)
-                       putchar(' ');
-               printf("[%s]\n", event_type_desc);
+                       fputc(' ', fp);
+               fprintf(fp, "[%s]\n", event_type_desc);
        } else
-               putchar('\n');
+               fputc('\n', fp);
 
        if (desc && print_state->desc) {
                char *desc_with_unit = NULL;
@@ -155,22 +160,22 @@ static void default_print_event(void *ps, const char *pmu_name, const char *topi
                                              ? "%s. Unit: %s" : "%s Unit: %s",
                                            desc, pmu_name);
                }
-               printf("%*s", 8, "[");
-               wordwrap(desc_len > 0 ? desc_with_unit : desc, 8, pager_get_columns(), 0);
-               printf("]\n");
+               fprintf(fp, "%*s", 8, "[");
+               wordwrap(fp, desc_len > 0 ? desc_with_unit : desc, 8, pager_get_columns(), 0);
+               fprintf(fp, "]\n");
                free(desc_with_unit);
        }
        long_desc = long_desc ?: desc;
        if (long_desc && print_state->long_desc) {
-               printf("%*s", 8, "[");
-               wordwrap(long_desc, 8, pager_get_columns(), 0);
-               printf("]\n");
+               fprintf(fp, "%*s", 8, "[");
+               wordwrap(fp, long_desc, 8, pager_get_columns(), 0);
+               fprintf(fp, "]\n");
        }
 
        if (print_state->detailed && encoding_desc) {
-               printf("%*s", 8, "");
-               wordwrap(encoding_desc, 8, pager_get_columns(), 0);
-               putchar('\n');
+               fprintf(fp, "%*s", 8, "");
+               wordwrap(fp, encoding_desc, 8, pager_get_columns(), 0);
+               fputc('\n', fp);
        }
 }
 
@@ -184,6 +189,7 @@ static void default_print_metric(void *ps,
                                const char *unit __maybe_unused)
 {
        struct print_state *print_state = ps;
+       FILE *fp = print_state->fp;
 
        if (print_state->event_glob &&
            (!print_state->metrics || !name || !strglobmatch(name, print_state->event_glob)) &&
@@ -192,27 +198,27 @@ static void default_print_metric(void *ps,
 
        if (!print_state->name_only && !print_state->last_metricgroups) {
                if (print_state->metricgroups) {
-                       printf("\nMetric Groups:\n");
+                       fprintf(fp, "\nMetric Groups:\n");
                        if (!print_state->metrics)
-                               putchar('\n');
+                               fputc('\n', fp);
                } else {
-                       printf("\nMetrics:\n\n");
+                       fprintf(fp, "\nMetrics:\n\n");
                }
        }
        if (!print_state->last_metricgroups ||
            strcmp(print_state->last_metricgroups, group ?: "")) {
                if (group && print_state->metricgroups) {
                        if (print_state->name_only)
-                               printf("%s ", group);
+                               fprintf(fp, "%s ", group);
                        else if (print_state->metrics) {
                                const char *gdesc = describe_metricgroup(group);
 
                                if (gdesc)
-                                       printf("\n%s: [%s]\n", group, gdesc);
+                                       fprintf(fp, "\n%s: [%s]\n", group, gdesc);
                                else
-                                       printf("\n%s:\n", group);
+                                       fprintf(fp, "\n%s:\n", group);
                        } else
-                               printf("%s\n", group);
+                               fprintf(fp, "%s\n", group);
                }
                zfree(&print_state->last_metricgroups);
                print_state->last_metricgroups = strdup(group ?: "");
@@ -223,53 +229,59 @@ static void default_print_metric(void *ps,
        if (print_state->name_only) {
                if (print_state->metrics &&
                    !strlist__has_entry(print_state->visited_metrics, name)) {
-                       printf("%s ", name);
+                       fprintf(fp, "%s ", name);
                        strlist__add(print_state->visited_metrics, name);
                }
                return;
        }
-       printf("  %s\n", name);
+       fprintf(fp, "  %s\n", name);
 
        if (desc && print_state->desc) {
-               printf("%*s", 8, "[");
-               wordwrap(desc, 8, pager_get_columns(), 0);
-               printf("]\n");
+               fprintf(fp, "%*s", 8, "[");
+               wordwrap(fp, desc, 8, pager_get_columns(), 0);
+               fprintf(fp, "]\n");
        }
        if (long_desc && print_state->long_desc) {
-               printf("%*s", 8, "[");
-               wordwrap(long_desc, 8, pager_get_columns(), 0);
-               printf("]\n");
+               fprintf(fp, "%*s", 8, "[");
+               wordwrap(fp, long_desc, 8, pager_get_columns(), 0);
+               fprintf(fp, "]\n");
        }
        if (expr && print_state->detailed) {
-               printf("%*s", 8, "[");
-               wordwrap(expr, 8, pager_get_columns(), 0);
-               printf("]\n");
+               fprintf(fp, "%*s", 8, "[");
+               wordwrap(fp, expr, 8, pager_get_columns(), 0);
+               fprintf(fp, "]\n");
        }
        if (threshold && print_state->detailed) {
-               printf("%*s", 8, "[");
-               wordwrap(threshold, 8, pager_get_columns(), 0);
-               printf("]\n");
+               fprintf(fp, "%*s", 8, "[");
+               wordwrap(fp, threshold, 8, pager_get_columns(), 0);
+               fprintf(fp, "]\n");
        }
 }
 
 struct json_print_state {
+       /** @fp: File to write output to. */
+       FILE *fp;
        /** Should a separator be printed prior to the next item? */
        bool need_sep;
 };
 
-static void json_print_start(void *print_state __maybe_unused)
+static void json_print_start(void *ps)
 {
-       printf("[\n");
+       struct json_print_state *print_state = ps;
+       FILE *fp = print_state->fp;
+
+       fprintf(fp, "[\n");
 }
 
 static void json_print_end(void *ps)
 {
        struct json_print_state *print_state = ps;
+       FILE *fp = print_state->fp;
 
-       printf("%s]\n", print_state->need_sep ? "\n" : "");
+       fprintf(fp, "%s]\n", print_state->need_sep ? "\n" : "");
 }
 
-static void fix_escape_printf(struct strbuf *buf, const char *fmt, ...)
+static void fix_escape_fprintf(FILE *fp, struct strbuf *buf, const char *fmt, ...)
 {
        va_list args;
 
@@ -318,7 +330,7 @@ static void fix_escape_printf(struct strbuf *buf, const char *fmt, ...)
                }
        }
        va_end(args);
-       fputs(buf->buf, stdout);
+       fputs(buf->buf, fp);
 }
 
 static void json_print_event(void *ps, const char *pmu_name, const char *topic,
@@ -330,60 +342,71 @@ static void json_print_event(void *ps, const char *pmu_name, const char *topic,
 {
        struct json_print_state *print_state = ps;
        bool need_sep = false;
+       FILE *fp = print_state->fp;
        struct strbuf buf;
 
        strbuf_init(&buf, 0);
-       printf("%s{\n", print_state->need_sep ? ",\n" : "");
+       fprintf(fp, "%s{\n", print_state->need_sep ? ",\n" : "");
        print_state->need_sep = true;
        if (pmu_name) {
-               fix_escape_printf(&buf, "\t\"Unit\": \"%S\"", pmu_name);
+               fix_escape_fprintf(fp, &buf, "\t\"Unit\": \"%S\"", pmu_name);
                need_sep = true;
        }
        if (topic) {
-               fix_escape_printf(&buf, "%s\t\"Topic\": \"%S\"", need_sep ? ",\n" : "", topic);
+               fix_escape_fprintf(fp, &buf, "%s\t\"Topic\": \"%S\"",
+                                  need_sep ? ",\n" : "",
+                                  topic);
                need_sep = true;
        }
        if (event_name) {
-               fix_escape_printf(&buf, "%s\t\"EventName\": \"%S\"", need_sep ? ",\n" : "",
-                                 event_name);
+               fix_escape_fprintf(fp, &buf, "%s\t\"EventName\": \"%S\"",
+                                  need_sep ? ",\n" : "",
+                                  event_name);
                need_sep = true;
        }
        if (event_alias && strlen(event_alias)) {
-               fix_escape_printf(&buf, "%s\t\"EventAlias\": \"%S\"", need_sep ? ",\n" : "",
-                                 event_alias);
+               fix_escape_fprintf(fp, &buf, "%s\t\"EventAlias\": \"%S\"",
+                                  need_sep ? ",\n" : "",
+                                  event_alias);
                need_sep = true;
        }
        if (scale_unit && strlen(scale_unit)) {
-               fix_escape_printf(&buf, "%s\t\"ScaleUnit\": \"%S\"", need_sep ? ",\n" : "",
-                                 scale_unit);
+               fix_escape_fprintf(fp, &buf, "%s\t\"ScaleUnit\": \"%S\"",
+                                  need_sep ? ",\n" : "",
+                                  scale_unit);
                need_sep = true;
        }
        if (event_type_desc) {
-               fix_escape_printf(&buf, "%s\t\"EventType\": \"%S\"", need_sep ? ",\n" : "",
-                                 event_type_desc);
+               fix_escape_fprintf(fp, &buf, "%s\t\"EventType\": \"%S\"",
+                                  need_sep ? ",\n" : "",
+                                  event_type_desc);
                need_sep = true;
        }
        if (deprecated) {
-               fix_escape_printf(&buf, "%s\t\"Deprecated\": \"%S\"", need_sep ? ",\n" : "",
-                                 deprecated ? "1" : "0");
+               fix_escape_fprintf(fp, &buf, "%s\t\"Deprecated\": \"%S\"",
+                                  need_sep ? ",\n" : "",
+                                  deprecated ? "1" : "0");
                need_sep = true;
        }
        if (desc) {
-               fix_escape_printf(&buf, "%s\t\"BriefDescription\": \"%S\"", need_sep ? ",\n" : "",
-                                 desc);
+               fix_escape_fprintf(fp, &buf, "%s\t\"BriefDescription\": \"%S\"",
+                                  need_sep ? ",\n" : "",
+                                  desc);
                need_sep = true;
        }
        if (long_desc) {
-               fix_escape_printf(&buf, "%s\t\"PublicDescription\": \"%S\"", need_sep ? ",\n" : "",
-                                 long_desc);
+               fix_escape_fprintf(fp, &buf, "%s\t\"PublicDescription\": \"%S\"",
+                                  need_sep ? ",\n" : "",
+                                  long_desc);
                need_sep = true;
        }
        if (encoding_desc) {
-               fix_escape_printf(&buf, "%s\t\"Encoding\": \"%S\"", need_sep ? ",\n" : "",
-                                 encoding_desc);
+               fix_escape_fprintf(fp, &buf, "%s\t\"Encoding\": \"%S\"",
+                                  need_sep ? ",\n" : "",
+                                  encoding_desc);
                need_sep = true;
        }
-       printf("%s}", need_sep ? "\n" : "");
+       fprintf(fp, "%s}", need_sep ? "\n" : "");
        strbuf_release(&buf);
 }
 
@@ -394,43 +417,53 @@ static void json_print_metric(void *ps __maybe_unused, const char *group,
 {
        struct json_print_state *print_state = ps;
        bool need_sep = false;
+       FILE *fp = print_state->fp;
        struct strbuf buf;
 
        strbuf_init(&buf, 0);
-       printf("%s{\n", print_state->need_sep ? ",\n" : "");
+       fprintf(fp, "%s{\n", print_state->need_sep ? ",\n" : "");
        print_state->need_sep = true;
        if (group) {
-               fix_escape_printf(&buf, "\t\"MetricGroup\": \"%S\"", group);
+               fix_escape_fprintf(fp, &buf, "\t\"MetricGroup\": \"%S\"", group);
                need_sep = true;
        }
        if (name) {
-               fix_escape_printf(&buf, "%s\t\"MetricName\": \"%S\"", need_sep ? ",\n" : "", name);
+               fix_escape_fprintf(fp, &buf, "%s\t\"MetricName\": \"%S\"",
+                                  need_sep ? ",\n" : "",
+                                  name);
                need_sep = true;
        }
        if (expr) {
-               fix_escape_printf(&buf, "%s\t\"MetricExpr\": \"%S\"", need_sep ? ",\n" : "", expr);
+               fix_escape_fprintf(fp, &buf, "%s\t\"MetricExpr\": \"%S\"",
+                                  need_sep ? ",\n" : "",
+                                  expr);
                need_sep = true;
        }
        if (threshold) {
-               fix_escape_printf(&buf, "%s\t\"MetricThreshold\": \"%S\"", need_sep ? ",\n" : "",
-                                 threshold);
+               fix_escape_fprintf(fp, &buf, "%s\t\"MetricThreshold\": \"%S\"",
+                                  need_sep ? ",\n" : "",
+                                  threshold);
                need_sep = true;
        }
        if (unit) {
-               fix_escape_printf(&buf, "%s\t\"ScaleUnit\": \"%S\"", need_sep ? ",\n" : "", unit);
+               fix_escape_fprintf(fp, &buf, "%s\t\"ScaleUnit\": \"%S\"",
+                                  need_sep ? ",\n" : "",
+                                  unit);
                need_sep = true;
        }
        if (desc) {
-               fix_escape_printf(&buf, "%s\t\"BriefDescription\": \"%S\"", need_sep ? ",\n" : "",
-                                 desc);
+               fix_escape_fprintf(fp, &buf, "%s\t\"BriefDescription\": \"%S\"",
+                                  need_sep ? ",\n" : "",
+                                  desc);
                need_sep = true;
        }
        if (long_desc) {
-               fix_escape_printf(&buf, "%s\t\"PublicDescription\": \"%S\"", need_sep ? ",\n" : "",
-                                 long_desc);
+               fix_escape_fprintf(fp, &buf, "%s\t\"PublicDescription\": \"%S\"",
+                                  need_sep ? ",\n" : "",
+                                  long_desc);
                need_sep = true;
        }
-       printf("%s}", need_sep ? "\n" : "");
+       fprintf(fp, "%s}", need_sep ? "\n" : "");
        strbuf_release(&buf);
 }
 
@@ -449,8 +482,12 @@ static bool default_skip_duplicate_pmus(void *ps)
 int cmd_list(int argc, const char **argv)
 {
        int i, ret = 0;
-       struct print_state default_ps = {};
-       struct print_state json_ps = {};
+       struct print_state default_ps = {
+               .fp = stdout,
+       };
+       struct print_state json_ps = {
+               .fp = stdout,
+       };
        void *ps = &default_ps;
        struct print_callbacks print_cb = {
                .print_start = default_print_start,
@@ -461,6 +498,7 @@ int cmd_list(int argc, const char **argv)
        };
        const char *cputype = NULL;
        const char *unit_name = NULL;
+       const char *output_path = NULL;
        bool json = false;
        struct option list_options[] = {
                OPT_BOOLEAN(0, "raw-dump", &default_ps.name_only, "Dump raw events"),
@@ -471,6 +509,7 @@ int cmd_list(int argc, const char **argv)
                            "Print longer event descriptions."),
                OPT_BOOLEAN(0, "details", &default_ps.detailed,
                            "Print information on the perf event names and expressions used internally by events."),
+               OPT_STRING('o', "output", &output_path, "file", "output file name"),
                OPT_BOOLEAN(0, "deprecated", &default_ps.deprecated,
                            "Print deprecated events."),
                OPT_STRING(0, "cputype", &cputype, "cpu type",
@@ -497,6 +536,11 @@ int cmd_list(int argc, const char **argv)
        argc = parse_options(argc, argv, list_options, list_usage,
                             PARSE_OPT_STOP_AT_NON_OPTION);
 
+       if (output_path) {
+               default_ps.fp = fopen(output_path, "w");
+               json_ps.fp = default_ps.fp;
+       }
+
        setup_pager();
 
        if (!default_ps.name_only)
@@ -618,5 +662,8 @@ out:
        free(default_ps.last_topic);
        free(default_ps.last_metricgroups);
        strlist__delete(default_ps.visited_metrics);
+       if (output_path)
+               fclose(default_ps.fp);
+
        return ret;
 }
index 91e6828c38cc2ef4c6b4d28d842309ce4e475f8d..86c91012517267c5355d7fedebdeed42e9cfb675 100644 (file)
@@ -4080,8 +4080,8 @@ int cmd_record(int argc, const char **argv)
        }
 
        if (rec->switch_output.num_files) {
-               rec->switch_output.filenames = calloc(sizeof(char *),
-                                                     rec->switch_output.num_files);
+               rec->switch_output.filenames = calloc(rec->switch_output.num_files,
+                                                     sizeof(char *));
                if (!rec->switch_output.filenames) {
                        err = -EINVAL;
                        goto out_opts;
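calloc(3) is declared as calloc(size_t nmemb, size_t size), so the element
count belongs first; newer compilers warn when a sizeof expression is passed
as the first argument (GCC 14 added -Wcalloc-transposed-args for this). The
corrected shape, with num_files standing in for the real count:

    /* count first, element size second */
    char **filenames = calloc(num_files, sizeof(char *));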
index baf1ab083436e3f980157cb5d3646d6ccc59a40c..5301d1badd435906ddf152511e6935b73236c034 100644 (file)
@@ -357,7 +357,7 @@ static void perf_top__print_sym_table(struct perf_top *top)
 
 static void prompt_integer(int *target, const char *msg)
 {
-       char *buf = malloc(0), *p;
+       char *buf = NULL, *p;
        size_t dummy = 0;
        int tmp;
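The fix relies on the getline(3) contract: when the line pointer is NULL and
the size is zero, getline() allocates the buffer itself, so seeding it with
malloc(0) was never needed. A hedged standalone illustration:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
            char *buf = NULL;   /* getline() allocates when buf is NULL */
            size_t dummy = 0;

            if (getline(&buf, &dummy, stdin) > 0)
                    printf("read: %s", buf);
            free(buf);
            return 0;
    }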
 
index 35124a4ddcb2bd547d190b40cdbb2c81fd5f5841..bbfa3883e53384f563427e1b7567e679e1d1465f 100644 (file)
     },
     {
         "BriefDescription": "Counts the number of issue slots  that were not consumed by the backend due to certain allocation restrictions.",
-        "MetricExpr": "TOPDOWN_BE_BOUND.ALLOC_RESTRICTIONS / tma_info_core_slots",
+        "MetricExpr": "cpu_atom@TOPDOWN_BE_BOUND.ALLOC_RESTRICTIONS@ / tma_info_core_slots",
         "MetricGroup": "TopdownL3;tma_L3_group;tma_resource_bound_group",
         "MetricName": "tma_alloc_restriction",
         "MetricThreshold": "tma_alloc_restriction > 0.1",
     {
         "BriefDescription": "Counts the total number of issue slots  that were not consumed by the backend due to backend stalls",
         "DefaultMetricgroupName": "TopdownL1",
-        "MetricExpr": "TOPDOWN_BE_BOUND.ALL / tma_info_core_slots",
+        "MetricExpr": "cpu_atom@TOPDOWN_BE_BOUND.ALL@ / tma_info_core_slots",
         "MetricGroup": "Default;TopdownL1;tma_L1_group",
         "MetricName": "tma_backend_bound",
         "MetricThreshold": "tma_backend_bound > 0.1",
     },
     {
         "BriefDescription": "Counts the number of issue slots  that were not delivered by the frontend due to BACLEARS, which occurs when the Branch Target Buffer (BTB) prediction or lack thereof, was corrected by a later branch predictor in the frontend",
-        "MetricExpr": "TOPDOWN_FE_BOUND.BRANCH_DETECT / tma_info_core_slots",
+        "MetricExpr": "cpu_atom@TOPDOWN_FE_BOUND.BRANCH_DETECT@ / tma_info_core_slots",
         "MetricGroup": "TopdownL3;tma_L3_group;tma_fetch_latency_group",
         "MetricName": "tma_branch_detect",
         "MetricThreshold": "tma_branch_detect > 0.05",
     },
     {
         "BriefDescription": "Counts the number of issue slots  that were not consumed by the backend due to branch mispredicts.",
-        "MetricExpr": "TOPDOWN_BAD_SPECULATION.MISPREDICT / tma_info_core_slots",
+        "MetricExpr": "cpu_atom@TOPDOWN_BAD_SPECULATION.MISPREDICT@ / tma_info_core_slots",
         "MetricGroup": "TopdownL2;tma_L2_group;tma_bad_speculation_group",
         "MetricName": "tma_branch_mispredicts",
         "MetricThreshold": "tma_branch_mispredicts > 0.05",
     },
     {
         "BriefDescription": "Counts the number of issue slots  that were not delivered by the frontend due to BTCLEARS, which occurs when the Branch Target Buffer (BTB) predicts a taken branch.",
-        "MetricExpr": "TOPDOWN_FE_BOUND.BRANCH_RESTEER / tma_info_core_slots",
+        "MetricExpr": "cpu_atom@TOPDOWN_FE_BOUND.BRANCH_RESTEER@ / tma_info_core_slots",
         "MetricGroup": "TopdownL3;tma_L3_group;tma_fetch_latency_group",
         "MetricName": "tma_branch_resteer",
         "MetricThreshold": "tma_branch_resteer > 0.05",
     },
     {
         "BriefDescription": "Counts the number of issue slots  that were not delivered by the frontend due to the microcode sequencer (MS).",
-        "MetricExpr": "TOPDOWN_FE_BOUND.CISC / tma_info_core_slots",
+        "MetricExpr": "cpu_atom@TOPDOWN_FE_BOUND.CISC@ / tma_info_core_slots",
         "MetricGroup": "TopdownL3;tma_L3_group;tma_fetch_bandwidth_group",
         "MetricName": "tma_cisc",
         "MetricThreshold": "tma_cisc > 0.05",
     },
     {
         "BriefDescription": "Counts the number of issue slots  that were not delivered by the frontend due to decode stalls.",
-        "MetricExpr": "TOPDOWN_FE_BOUND.DECODE / tma_info_core_slots",
+        "MetricExpr": "cpu_atom@TOPDOWN_FE_BOUND.DECODE@ / tma_info_core_slots",
         "MetricGroup": "TopdownL3;tma_L3_group;tma_fetch_bandwidth_group",
         "MetricName": "tma_decode",
         "MetricThreshold": "tma_decode > 0.05",
     },
     {
         "BriefDescription": "Counts the number of cycles the core is stalled due to a demand load miss which hit in DRAM or MMIO (Non-DRAM).",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "cpu_atom@MEM_BOUND_STALLS.LOAD_DRAM_HIT@ / tma_info_core_clks - max((cpu_atom@MEM_BOUND_STALLS.LOAD@ - cpu_atom@LD_HEAD.L1_MISS_AT_RET@) / tma_info_core_clks, 0) * cpu_atom@MEM_BOUND_STALLS.LOAD_DRAM_HIT@ / cpu_atom@MEM_BOUND_STALLS.LOAD@",
         "MetricGroup": "TopdownL3;tma_L3_group;tma_memory_bound_group",
         "MetricName": "tma_dram_bound",
     },
     {
         "BriefDescription": "Counts the number of issue slots  that were not consumed by the backend due to a machine clear classified as a fast nuke due to memory ordering, memory disambiguation and memory renaming.",
-        "MetricExpr": "TOPDOWN_BAD_SPECULATION.FASTNUKE / tma_info_core_slots",
+        "MetricExpr": "cpu_atom@TOPDOWN_BAD_SPECULATION.FASTNUKE@ / tma_info_core_slots",
         "MetricGroup": "TopdownL3;tma_L3_group;tma_machine_clears_group",
         "MetricName": "tma_fast_nuke",
         "MetricThreshold": "tma_fast_nuke > 0.05",
     },
     {
         "BriefDescription": "Counts the number of issue slots  that were not delivered by the frontend due to frontend bandwidth restrictions due to decode, predecode, cisc, and other limitations.",
-        "MetricExpr": "TOPDOWN_FE_BOUND.FRONTEND_BANDWIDTH / tma_info_core_slots",
+        "MetricExpr": "cpu_atom@TOPDOWN_FE_BOUND.FRONTEND_BANDWIDTH@ / tma_info_core_slots",
         "MetricGroup": "TopdownL2;tma_L2_group;tma_frontend_bound_group",
         "MetricName": "tma_fetch_bandwidth",
         "MetricThreshold": "tma_fetch_bandwidth > 0.1",
     },
     {
         "BriefDescription": "Counts the number of issue slots  that were not delivered by the frontend due to frontend bandwidth restrictions due to decode, predecode, cisc, and other limitations.",
-        "MetricExpr": "TOPDOWN_FE_BOUND.FRONTEND_LATENCY / tma_info_core_slots",
+        "MetricExpr": "cpu_atom@TOPDOWN_FE_BOUND.FRONTEND_LATENCY@ / tma_info_core_slots",
         "MetricGroup": "TopdownL2;tma_L2_group;tma_frontend_bound_group",
         "MetricName": "tma_fetch_latency",
         "MetricThreshold": "tma_fetch_latency > 0.15",
     },
     {
         "BriefDescription": "Counts the number of floating point divide operations per uop.",
-        "MetricExpr": "UOPS_RETIRED.FPDIV / tma_info_core_slots",
+        "MetricExpr": "cpu_atom@UOPS_RETIRED.FPDIV@ / tma_info_core_slots",
         "MetricGroup": "TopdownL3;tma_L3_group;tma_base_group",
         "MetricName": "tma_fpdiv_uops",
         "MetricThreshold": "tma_fpdiv_uops > 0.2",
     },
     {
         "BriefDescription": "Counts the number of issue slots  that were not consumed by the backend due to frontend stalls.",
         "DefaultMetricgroupName": "TopdownL1",
-        "MetricExpr": "TOPDOWN_FE_BOUND.ALL / tma_info_core_slots",
+        "MetricExpr": "cpu_atom@TOPDOWN_FE_BOUND.ALL@ / tma_info_core_slots",
         "MetricGroup": "Default;TopdownL1;tma_L1_group",
         "MetricName": "tma_frontend_bound",
         "MetricThreshold": "tma_frontend_bound > 0.2",
     },
     {
         "BriefDescription": "Counts the number of issue slots  that were not delivered by the frontend due to instruction cache misses.",
-        "MetricExpr": "TOPDOWN_FE_BOUND.ICACHE / tma_info_core_slots",
+        "MetricExpr": "cpu_atom@TOPDOWN_FE_BOUND.ICACHE@ / tma_info_core_slots",
         "MetricGroup": "TopdownL3;tma_L3_group;tma_fetch_latency_group",
         "MetricName": "tma_icache_misses",
         "MetricThreshold": "tma_icache_misses > 0.05",
     },
     {
         "BriefDescription": "Instructions Per Cycle",
-        "MetricExpr": "INST_RETIRED.ANY / tma_info_core_clks",
+        "MetricExpr": "cpu_atom@INST_RETIRED.ANY@ / tma_info_core_clks",
         "MetricName": "tma_info_core_ipc",
         "Unit": "cpu_atom"
     },
     {
         "BriefDescription": "Uops Per Instruction",
-        "MetricExpr": "UOPS_RETIRED.ALL / INST_RETIRED.ANY",
+        "MetricExpr": "cpu_atom@UOPS_RETIRED.ALL@ / INST_RETIRED.ANY",
         "MetricName": "tma_info_core_upi",
         "Unit": "cpu_atom"
     },
     {
         "BriefDescription": "Ratio of all branches which mispredict",
-        "MetricExpr": "BR_MISP_RETIRED.ALL_BRANCHES / BR_INST_RETIRED.ALL_BRANCHES",
+        "MetricExpr": "cpu_atom@BR_MISP_RETIRED.ALL_BRANCHES@ / BR_INST_RETIRED.ALL_BRANCHES",
         "MetricName": "tma_info_inst_mix_branch_mispredict_ratio",
         "Unit": "cpu_atom"
     },
     {
         "BriefDescription": "Ratio between Mispredicted branches and unknown branches",
-        "MetricExpr": "BR_MISP_RETIRED.ALL_BRANCHES / BACLEARS.ANY",
+        "MetricExpr": "cpu_atom@BR_MISP_RETIRED.ALL_BRANCHES@ / BACLEARS.ANY",
         "MetricName": "tma_info_inst_mix_branch_mispredict_to_unknown_branch_ratio",
         "Unit": "cpu_atom"
     },
     {
         "BriefDescription": "Instructions per Branch (lower number means higher occurrence rate)",
-        "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.ALL_BRANCHES",
+        "MetricExpr": "cpu_atom@INST_RETIRED.ANY@ / BR_INST_RETIRED.ALL_BRANCHES",
         "MetricName": "tma_info_inst_mix_ipbranch",
         "Unit": "cpu_atom"
     },
     {
         "BriefDescription": "Instruction per (near) call (lower number means higher occurrence rate)",
-        "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.CALL",
+        "MetricExpr": "cpu_atom@INST_RETIRED.ANY@ / BR_INST_RETIRED.CALL",
         "MetricName": "tma_info_inst_mix_ipcall",
         "Unit": "cpu_atom"
     },
     {
         "BriefDescription": "Instructions per Far Branch",
-        "MetricExpr": "INST_RETIRED.ANY / (cpu_atom@BR_INST_RETIRED.FAR_BRANCH@ / 2)",
+        "MetricExpr": "cpu_atom@INST_RETIRED.ANY@ / (cpu_atom@BR_INST_RETIRED.FAR_BRANCH@ / 2)",
         "MetricName": "tma_info_inst_mix_ipfarbranch",
         "Unit": "cpu_atom"
     },
     {
         "BriefDescription": "Instructions per Load",
-        "MetricExpr": "INST_RETIRED.ANY / MEM_UOPS_RETIRED.ALL_LOADS",
+        "MetricExpr": "cpu_atom@INST_RETIRED.ANY@ / MEM_UOPS_RETIRED.ALL_LOADS",
         "MetricName": "tma_info_inst_mix_ipload",
         "Unit": "cpu_atom"
     },
     {
         "BriefDescription": "Instructions per retired conditional Branch Misprediction where the branch was not taken",
-        "MetricExpr": "INST_RETIRED.ANY / (cpu_atom@BR_MISP_RETIRED.COND@ - cpu_atom@BR_MISP_RETIRED.COND_TAKEN@)",
+        "MetricExpr": "cpu_atom@INST_RETIRED.ANY@ / (cpu_atom@BR_MISP_RETIRED.COND@ - cpu_atom@BR_MISP_RETIRED.COND_TAKEN@)",
         "MetricName": "tma_info_inst_mix_ipmisp_cond_ntaken",
         "Unit": "cpu_atom"
     },
     {
         "BriefDescription": "Instructions per retired conditional Branch Misprediction where the branch was taken",
-        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.COND_TAKEN",
+        "MetricExpr": "cpu_atom@INST_RETIRED.ANY@ / BR_MISP_RETIRED.COND_TAKEN",
         "MetricName": "tma_info_inst_mix_ipmisp_cond_taken",
         "Unit": "cpu_atom"
     },
     {
         "BriefDescription": "Instructions per retired indirect call or jump Branch Misprediction",
-        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.INDIRECT",
+        "MetricExpr": "cpu_atom@INST_RETIRED.ANY@ / BR_MISP_RETIRED.INDIRECT",
         "MetricName": "tma_info_inst_mix_ipmisp_indirect",
         "Unit": "cpu_atom"
     },
     {
         "BriefDescription": "Instructions per retired return Branch Misprediction",
-        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.RETURN",
+        "MetricExpr": "cpu_atom@INST_RETIRED.ANY@ / BR_MISP_RETIRED.RETURN",
         "MetricName": "tma_info_inst_mix_ipmisp_ret",
         "Unit": "cpu_atom"
     },
     {
         "BriefDescription": "Instructions per retired Branch Misprediction",
-        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES",
+        "MetricExpr": "cpu_atom@INST_RETIRED.ANY@ / BR_MISP_RETIRED.ALL_BRANCHES",
         "MetricName": "tma_info_inst_mix_ipmispredict",
         "Unit": "cpu_atom"
     },
     {
         "BriefDescription": "Instructions per Store",
-        "MetricExpr": "INST_RETIRED.ANY / MEM_UOPS_RETIRED.ALL_STORES",
+        "MetricExpr": "cpu_atom@INST_RETIRED.ANY@ / MEM_UOPS_RETIRED.ALL_STORES",
         "MetricName": "tma_info_inst_mix_ipstore",
         "Unit": "cpu_atom"
     },
     {
         "BriefDescription": "Cycle cost per DRAM hit",
-        "MetricExpr": "MEM_BOUND_STALLS.LOAD_DRAM_HIT / MEM_LOAD_UOPS_RETIRED.DRAM_HIT",
+        "MetricExpr": "cpu_atom@MEM_BOUND_STALLS.LOAD_DRAM_HIT@ / MEM_LOAD_UOPS_RETIRED.DRAM_HIT",
         "MetricName": "tma_info_memory_cycles_per_demand_load_dram_hit",
         "Unit": "cpu_atom"
     },
     {
         "BriefDescription": "Cycle cost per L2 hit",
-        "MetricExpr": "MEM_BOUND_STALLS.LOAD_L2_HIT / MEM_LOAD_UOPS_RETIRED.L2_HIT",
+        "MetricExpr": "cpu_atom@MEM_BOUND_STALLS.LOAD_L2_HIT@ / MEM_LOAD_UOPS_RETIRED.L2_HIT",
         "MetricName": "tma_info_memory_cycles_per_demand_load_l2_hit",
         "Unit": "cpu_atom"
     },
     {
         "BriefDescription": "Cycle cost per LLC hit",
-        "MetricExpr": "MEM_BOUND_STALLS.LOAD_LLC_HIT / MEM_LOAD_UOPS_RETIRED.L3_HIT",
+        "MetricExpr": "cpu_atom@MEM_BOUND_STALLS.LOAD_LLC_HIT@ / MEM_LOAD_UOPS_RETIRED.L3_HIT",
         "MetricName": "tma_info_memory_cycles_per_demand_load_l3_hit",
         "Unit": "cpu_atom"
     },
     {
         "BriefDescription": "Average CPU Utilization",
-        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / TSC",
+        "MetricExpr": "cpu_atom@CPU_CLK_UNHALTED.REF_TSC@ / TSC",
         "MetricName": "tma_info_system_cpu_utilization",
         "Unit": "cpu_atom"
     },
     {
         "BriefDescription": "Counts the number of issue slots  that were not delivered by the frontend due to Instruction Table Lookaside Buffer (ITLB) misses.",
-        "MetricExpr": "TOPDOWN_FE_BOUND.ITLB / tma_info_core_slots",
+        "MetricExpr": "cpu_atom@TOPDOWN_FE_BOUND.ITLB@ / tma_info_core_slots",
         "MetricGroup": "TopdownL3;tma_L3_group;tma_fetch_latency_group",
         "MetricName": "tma_itlb_misses",
         "MetricThreshold": "tma_itlb_misses > 0.05",
     },
     {
         "BriefDescription": "Counts the number of cycles that the oldest load of the load buffer is stalled at retirement due to a load block.",
-        "MetricExpr": "LD_HEAD.L1_BOUND_AT_RET / tma_info_core_clks",
+        "MetricExpr": "cpu_atom@LD_HEAD.L1_BOUND_AT_RET@ / tma_info_core_clks",
         "MetricGroup": "TopdownL3;tma_L3_group;tma_memory_bound_group",
         "MetricName": "tma_l1_bound",
         "MetricThreshold": "tma_l1_bound > 0.1",
     },
     {
         "BriefDescription": "Counts the number of cycles a core is stalled due to a demand load which hit in the L2 Cache.",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "cpu_atom@MEM_BOUND_STALLS.LOAD_L2_HIT@ / tma_info_core_clks - max((cpu_atom@MEM_BOUND_STALLS.LOAD@ - cpu_atom@LD_HEAD.L1_MISS_AT_RET@) / tma_info_core_clks, 0) * cpu_atom@MEM_BOUND_STALLS.LOAD_L2_HIT@ / cpu_atom@MEM_BOUND_STALLS.LOAD@",
         "MetricGroup": "TopdownL3;tma_L3_group;tma_memory_bound_group",
         "MetricName": "tma_l2_bound",
     },
     {
         "BriefDescription": "Counts the number of cycles a core is stalled due to a demand load which hit in the Last Level Cache (LLC) or other core with HITE/F/M.",
-        "MetricConstraint": "NO_GROUP_EVENTS_NMI",
         "MetricExpr": "cpu_atom@MEM_BOUND_STALLS.LOAD_LLC_HIT@ / tma_info_core_clks - max((cpu_atom@MEM_BOUND_STALLS.LOAD@ - cpu_atom@LD_HEAD.L1_MISS_AT_RET@) / tma_info_core_clks, 0) * cpu_atom@MEM_BOUND_STALLS.LOAD_LLC_HIT@ / cpu_atom@MEM_BOUND_STALLS.LOAD@",
         "MetricGroup": "TopdownL3;tma_L3_group;tma_memory_bound_group",
         "MetricName": "tma_l3_bound",
     },
     {
         "BriefDescription": "Counts the total number of issue slots that were not consumed by the backend because allocation is stalled due to a machine clear (nuke) of any kind including memory ordering and memory disambiguation.",
-        "MetricExpr": "TOPDOWN_BAD_SPECULATION.MACHINE_CLEARS / tma_info_core_slots",
+        "MetricExpr": "cpu_atom@TOPDOWN_BAD_SPECULATION.MACHINE_CLEARS@ / tma_info_core_slots",
         "MetricGroup": "TopdownL2;tma_L2_group;tma_bad_speculation_group",
         "MetricName": "tma_machine_clears",
         "MetricThreshold": "tma_machine_clears > 0.05",
     },
     {
         "BriefDescription": "Counts the number of issue slots  that were not consumed by the backend due to memory reservation stalls in which a scheduler is not able to accept uops.",
-        "MetricExpr": "TOPDOWN_BE_BOUND.MEM_SCHEDULER / tma_info_core_slots",
+        "MetricExpr": "cpu_atom@TOPDOWN_BE_BOUND.MEM_SCHEDULER@ / tma_info_core_slots",
         "MetricGroup": "TopdownL3;tma_L3_group;tma_resource_bound_group",
         "MetricName": "tma_mem_scheduler",
         "MetricThreshold": "tma_mem_scheduler > 0.1",
     },
     {
         "BriefDescription": "Counts the number of cycles the core is stalled due to stores or loads.",
-        "MetricExpr": "min(cpu_atom@TOPDOWN_BE_BOUND.ALL@ / tma_info_core_slots, cpu_atom@LD_HEAD.ANY_AT_RET@ / tma_info_core_clks + tma_store_bound)",
+        "MetricExpr": "min(tma_backend_bound, cpu_atom@LD_HEAD.ANY_AT_RET@ / tma_info_core_clks + tma_store_bound)",
         "MetricGroup": "TopdownL2;tma_L2_group;tma_backend_bound_group",
         "MetricName": "tma_memory_bound",
         "MetricThreshold": "tma_memory_bound > 0.2",
     },
     {
         "BriefDescription": "Counts the number of uops that are from the complex flows issued by the micro-sequencer (MS)",
-        "MetricExpr": "UOPS_RETIRED.MS / tma_info_core_slots",
+        "MetricExpr": "cpu_atom@UOPS_RETIRED.MS@ / tma_info_core_slots",
         "MetricGroup": "TopdownL2;tma_L2_group;tma_retiring_group",
         "MetricName": "tma_ms_uops",
         "MetricThreshold": "tma_ms_uops > 0.05",
     },
     {
         "BriefDescription": "Counts the number of issue slots  that were not consumed by the backend due to IEC or FPC RAT stalls, which can be due to FIQ or IEC reservation stalls in which the integer, floating point or SIMD scheduler is not able to accept uops.",
-        "MetricExpr": "TOPDOWN_BE_BOUND.NON_MEM_SCHEDULER / tma_info_core_slots",
+        "MetricExpr": "cpu_atom@TOPDOWN_BE_BOUND.NON_MEM_SCHEDULER@ / tma_info_core_slots",
         "MetricGroup": "TopdownL3;tma_L3_group;tma_resource_bound_group",
         "MetricName": "tma_non_mem_scheduler",
         "MetricThreshold": "tma_non_mem_scheduler > 0.1",
     },
     {
         "BriefDescription": "Counts the number of issue slots  that were not consumed by the backend due to a machine clear (slow nuke).",
-        "MetricExpr": "TOPDOWN_BAD_SPECULATION.NUKE / tma_info_core_slots",
+        "MetricExpr": "cpu_atom@TOPDOWN_BAD_SPECULATION.NUKE@ / tma_info_core_slots",
         "MetricGroup": "TopdownL3;tma_L3_group;tma_machine_clears_group",
         "MetricName": "tma_nuke",
         "MetricThreshold": "tma_nuke > 0.05",
     },
     {
         "BriefDescription": "Counts the number of issue slots  that were not delivered by the frontend due to other common frontend stalls not categorized.",
-        "MetricExpr": "TOPDOWN_FE_BOUND.OTHER / tma_info_core_slots",
+        "MetricExpr": "cpu_atom@TOPDOWN_FE_BOUND.OTHER@ / tma_info_core_slots",
         "MetricGroup": "TopdownL3;tma_L3_group;tma_fetch_bandwidth_group",
         "MetricName": "tma_other_fb",
         "MetricThreshold": "tma_other_fb > 0.05",
     },
     {
         "BriefDescription": "Counts the number of cycles that the oldest load of the load buffer is stalled at retirement due to a number of other load blocks.",
-        "MetricExpr": "LD_HEAD.OTHER_AT_RET / tma_info_core_clks",
+        "MetricExpr": "cpu_atom@LD_HEAD.OTHER_AT_RET@ / tma_info_core_clks",
         "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group",
         "MetricName": "tma_other_l1",
         "MetricThreshold": "tma_other_l1 > 0.05",
     },
     {
         "BriefDescription": "Counts the number of issue slots  that were not delivered by the frontend due to wrong predecodes.",
-        "MetricExpr": "TOPDOWN_FE_BOUND.PREDECODE / tma_info_core_slots",
+        "MetricExpr": "cpu_atom@TOPDOWN_FE_BOUND.PREDECODE@ / tma_info_core_slots",
         "MetricGroup": "TopdownL3;tma_L3_group;tma_fetch_bandwidth_group",
         "MetricName": "tma_predecode",
         "MetricThreshold": "tma_predecode > 0.05",
     },
     {
         "BriefDescription": "Counts the number of issue slots  that were not consumed by the backend due to the physical register file unable to accept an entry (marble stalls).",
-        "MetricExpr": "TOPDOWN_BE_BOUND.REGISTER / tma_info_core_slots",
+        "MetricExpr": "cpu_atom@TOPDOWN_BE_BOUND.REGISTER@ / tma_info_core_slots",
         "MetricGroup": "TopdownL3;tma_L3_group;tma_resource_bound_group",
         "MetricName": "tma_register",
         "MetricThreshold": "tma_register > 0.1",
     },
     {
         "BriefDescription": "Counts the number of issue slots  that were not consumed by the backend due to the reorder buffer being full (ROB stalls).",
-        "MetricExpr": "TOPDOWN_BE_BOUND.REORDER_BUFFER / tma_info_core_slots",
+        "MetricExpr": "cpu_atom@TOPDOWN_BE_BOUND.REORDER_BUFFER@ / tma_info_core_slots",
         "MetricGroup": "TopdownL3;tma_L3_group;tma_resource_bound_group",
         "MetricName": "tma_reorder_buffer",
         "MetricThreshold": "tma_reorder_buffer > 0.1",
     },
     {
         "BriefDescription": "Counts the number of issue slots  that result in retirement slots.",
         "DefaultMetricgroupName": "TopdownL1",
-        "MetricExpr": "TOPDOWN_RETIRING.ALL / tma_info_core_slots",
+        "MetricExpr": "cpu_atom@TOPDOWN_RETIRING.ALL@ / tma_info_core_slots",
         "MetricGroup": "Default;TopdownL1;tma_L1_group",
         "MetricName": "tma_retiring",
         "MetricThreshold": "tma_retiring > 0.75",
     },
     {
         "BriefDescription": "Counts the number of issue slots  that were not consumed by the backend due to scoreboards from the instruction queue (IQ), jump execution unit (JEU), or microcode sequencer (MS).",
-        "MetricExpr": "TOPDOWN_BE_BOUND.SERIALIZATION / tma_info_core_slots",
+        "MetricExpr": "cpu_atom@TOPDOWN_BE_BOUND.SERIALIZATION@ / tma_info_core_slots",
         "MetricGroup": "TopdownL3;tma_L3_group;tma_resource_bound_group",
         "MetricName": "tma_serialization",
         "MetricThreshold": "tma_serialization > 0.1",
     },
     {
         "BriefDescription": "Counts the number of cycles that the oldest load of the load buffer is stalled at retirement due to a first level TLB miss.",
-        "MetricExpr": "LD_HEAD.DTLB_MISS_AT_RET / tma_info_core_clks",
+        "MetricExpr": "cpu_atom@LD_HEAD.DTLB_MISS_AT_RET@ / tma_info_core_clks",
         "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group",
         "MetricName": "tma_stlb_hit",
         "MetricThreshold": "tma_stlb_hit > 0.05",
     },
     {
         "BriefDescription": "Counts the number of cycles that the oldest load of the load buffer is stalled at retirement due to a second level TLB miss requiring a page walk.",
-        "MetricExpr": "LD_HEAD.PGWALK_AT_RET / tma_info_core_clks",
+        "MetricExpr": "cpu_atom@LD_HEAD.PGWALK_AT_RET@ / tma_info_core_clks",
         "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group",
         "MetricName": "tma_stlb_miss",
         "MetricThreshold": "tma_stlb_miss > 0.05",
     },
     {
         "BriefDescription": "Counts the number of cycles that the oldest load of the load buffer is stalled at retirement due to a store forward block.",
-        "MetricConstraint": "NO_GROUP_EVENTS_NMI",
-        "MetricExpr": "LD_HEAD.ST_ADDR_AT_RET / tma_info_core_clks",
+        "MetricExpr": "cpu_atom@LD_HEAD.ST_ADDR_AT_RET@ / tma_info_core_clks",
         "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group",
         "MetricName": "tma_store_fwd_blk",
         "MetricThreshold": "tma_store_fwd_blk > 0.05",
     },
     {
         "BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to Branch Resteers",
-        "MetricExpr": "INT_MISC.CLEAR_RESTEER_CYCLES / tma_info_thread_clks + tma_unknown_branches",
+        "MetricExpr": "cpu_core@INT_MISC.CLEAR_RESTEER_CYCLES@ / tma_info_thread_clks + tma_unknown_branches",
         "MetricGroup": "FetchLat;TopdownL3;tma_L3_group;tma_fetch_latency_group",
         "MetricName": "tma_branch_resteers",
         "MetricThreshold": "tma_branch_resteers > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15)",
     },
     {
         "BriefDescription": "This metric estimates fraction of cycles while the memory subsystem was handling synchronizations due to contested accesses",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "(25 * tma_info_system_average_frequency * (cpu_core@MEM_LOAD_L3_HIT_RETIRED.XSNP_FWD@ * (cpu_core@OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM@ / (cpu_core@OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM@ + cpu_core@OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_WITH_FWD@))) + 24 * tma_info_system_average_frequency * cpu_core@MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS@) * (1 + cpu_core@MEM_LOAD_RETIRED.FB_HIT@ / cpu_core@MEM_LOAD_RETIRED.L1_MISS@ / 2) / tma_info_thread_clks",
         "MetricGroup": "DataSharing;Offcore;Snoop;TopdownL4;tma_L4_group;tma_issueSyncxn;tma_l3_bound_group",
         "MetricName": "tma_contested_accesses",
     },
     {
         "BriefDescription": "This metric estimates fraction of cycles while the memory subsystem was handling synchronizations due to data-sharing accesses",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "24 * tma_info_system_average_frequency * (cpu_core@MEM_LOAD_L3_HIT_RETIRED.XSNP_NO_FWD@ + cpu_core@MEM_LOAD_L3_HIT_RETIRED.XSNP_FWD@ * (1 - cpu_core@OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM@ / (cpu_core@OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM@ + cpu_core@OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_WITH_FWD@))) * (1 + cpu_core@MEM_LOAD_RETIRED.FB_HIT@ / cpu_core@MEM_LOAD_RETIRED.L1_MISS@ / 2) / tma_info_thread_clks",
         "MetricGroup": "Offcore;Snoop;TopdownL4;tma_L4_group;tma_issueSyncxn;tma_l3_bound_group",
         "MetricName": "tma_data_sharing",
     },
     {
         "BriefDescription": "This metric represents fraction of cycles where the Divider unit was active",
-        "MetricExpr": "ARITH.DIV_ACTIVE / tma_info_thread_clks",
+        "MetricExpr": "cpu_core@ARITH.DIV_ACTIVE@ / tma_info_thread_clks",
         "MetricGroup": "TopdownL3;tma_L3_group;tma_core_bound_group",
         "MetricName": "tma_divider",
         "MetricThreshold": "tma_divider > 0.2 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2)",
     },
     {
         "BriefDescription": "This metric estimates how often the CPU was stalled on accesses to external memory (DRAM) by loads",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "cpu_core@MEMORY_ACTIVITY.STALLS_L3_MISS@ / tma_info_thread_clks",
         "MetricGroup": "MemoryBound;TmaL3mem;TopdownL3;tma_L3_group;tma_memory_bound_group",
         "MetricName": "tma_dram_bound",
     },
     {
         "BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to switches from DSB to MITE pipelines",
-        "MetricExpr": "DSB2MITE_SWITCHES.PENALTY_CYCLES / tma_info_thread_clks",
+        "MetricExpr": "cpu_core@DSB2MITE_SWITCHES.PENALTY_CYCLES@ / tma_info_thread_clks",
         "MetricGroup": "DSBmiss;FetchLat;TopdownL3;tma_L3_group;tma_fetch_latency_group;tma_issueFB",
         "MetricName": "tma_dsb_switches",
         "MetricThreshold": "tma_dsb_switches > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15)",
     },
     {
         "BriefDescription": "This metric does a *rough estimation* of how often L1D Fill Buffer unavailability limited additional L1D miss memory access requests to proceed",
-        "MetricExpr": "L1D_PEND_MISS.FB_FULL / tma_info_thread_clks",
+        "MetricExpr": "cpu_core@L1D_PEND_MISS.FB_FULL@ / tma_info_thread_clks",
         "MetricGroup": "MemoryBW;TopdownL4;tma_L4_group;tma_issueBW;tma_issueSL;tma_issueSmSt;tma_l1_bound_group",
         "MetricName": "tma_fb_full",
         "MetricThreshold": "tma_fb_full > 0.3",
     },
     {
         "BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to instruction cache misses",
-        "MetricExpr": "ICACHE_DATA.STALLS / tma_info_thread_clks",
+        "MetricExpr": "cpu_core@ICACHE_DATA.STALLS@ / tma_info_thread_clks",
         "MetricGroup": "BigFoot;FetchLat;IcMiss;TopdownL3;tma_L3_group;tma_fetch_latency_group",
         "MetricName": "tma_icache_misses",
         "MetricThreshold": "tma_icache_misses > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15)",
     },
     {
         "BriefDescription": "Branch Misprediction Cost: Fraction of TMA slots wasted per non-speculative branch misprediction (retired JEClear)",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "(tma_branch_mispredicts + tma_fetch_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)) * tma_info_thread_slots / BR_MISP_RETIRED.ALL_BRANCHES",
         "MetricGroup": "Bad;BrMispredicts;tma_issueBM",
         "MetricName": "tma_info_bad_spec_branch_misprediction_cost",
     },
     {
         "BriefDescription": "Instructions per retired mispredicts for conditional non-taken branches (lower number means higher occurrence rate).",
-        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.COND_NTAKEN",
+        "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / BR_MISP_RETIRED.COND_NTAKEN",
         "MetricGroup": "Bad;BrMispredicts",
         "MetricName": "tma_info_bad_spec_ipmisp_cond_ntaken",
         "MetricThreshold": "tma_info_bad_spec_ipmisp_cond_ntaken < 200",
     },
     {
         "BriefDescription": "Instructions per retired mispredicts for conditional taken branches (lower number means higher occurrence rate).",
-        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.COND_TAKEN",
+        "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / BR_MISP_RETIRED.COND_TAKEN",
         "MetricGroup": "Bad;BrMispredicts",
         "MetricName": "tma_info_bad_spec_ipmisp_cond_taken",
         "MetricThreshold": "tma_info_bad_spec_ipmisp_cond_taken < 200",
     },
     {
         "BriefDescription": "Instructions per retired mispredicts for return branches (lower number means higher occurrence rate).",
-        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.RET",
+        "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / BR_MISP_RETIRED.RET",
         "MetricGroup": "Bad;BrMispredicts",
         "MetricName": "tma_info_bad_spec_ipmisp_ret",
         "MetricThreshold": "tma_info_bad_spec_ipmisp_ret < 500",
     },
     {
         "BriefDescription": "Number of Instructions per non-speculative Branch Misprediction (JEClear) (lower number means higher occurrence rate)",
-        "MetricExpr": "INST_RETIRED.ANY / BR_MISP_RETIRED.ALL_BRANCHES",
+        "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / BR_MISP_RETIRED.ALL_BRANCHES",
         "MetricGroup": "Bad;BadSpec;BrMispredicts",
         "MetricName": "tma_info_bad_spec_ipmispredict",
         "MetricThreshold": "tma_info_bad_spec_ipmispredict < 200",
     },
     {
         "BriefDescription": "Probability of Core Bound bottleneck hidden by SMT-profiling artifacts",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "(100 * (1 - tma_core_bound / tma_ports_utilization if tma_core_bound < tma_ports_utilization else 1) if tma_info_system_smt_2t_utilization > 0.5 else 0)",
         "MetricGroup": "Cor;SMT",
         "MetricName": "tma_info_botlnk_l0_core_bound_likely",
     },
     {
         "BriefDescription": "Total pipeline cost of DSB (uop cache) misses - subset of the Instruction_Fetch_BW Bottleneck",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "100 * (tma_fetch_latency * tma_dsb_switches / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) + tma_fetch_bandwidth * tma_mite / (tma_dsb + tma_lsd + tma_mite))",
         "MetricGroup": "DSBmiss;Fed;tma_issueFB",
         "MetricName": "tma_info_botlnk_l2_dsb_misses",
     },
     {
         "BriefDescription": "Total pipeline cost of Instruction Cache misses - subset of the Big_Code Bottleneck",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "100 * (tma_fetch_latency * tma_icache_misses / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches))",
         "MetricGroup": "Fed;FetchLat;IcMiss;tma_issueFL",
         "MetricName": "tma_info_botlnk_l2_ic_misses",
     },
     {
         "BriefDescription": "Total pipeline cost of instruction fetch related bottlenecks by large code footprint programs (i-side cache; TLB and BTB misses)",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "100 * tma_fetch_latency * (tma_itlb_misses + tma_icache_misses + tma_unknown_branches) / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)",
         "MetricGroup": "BigFoot;Fed;Frontend;IcMiss;MemoryTLB;tma_issueBC",
         "MetricName": "tma_info_bottleneck_big_code",
     },
     {
         "BriefDescription": "Total pipeline cost of instruction fetch bandwidth related bottlenecks",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "100 * (tma_frontend_bound - tma_fetch_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)) - tma_info_bottleneck_big_code",
         "MetricGroup": "Fed;FetchBW;Frontend",
         "MetricName": "tma_info_bottleneck_instruction_fetch_bw",
     },
     {
         "BriefDescription": "Total pipeline cost of (external) Memory Bandwidth related bottlenecks",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "100 * tma_memory_bound * (tma_dram_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_latency)) + tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_sq_full / (tma_contested_accesses + tma_data_sharing + tma_l3_hit_latency + tma_sq_full))) + tma_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_fb_full / (tma_dtlb_load + tma_fb_full + tma_lock_latency + tma_split_loads + tma_store_fwd_blk))",
         "MetricGroup": "Mem;MemoryBW;Offcore;tma_issueBW",
         "MetricName": "tma_info_bottleneck_memory_bandwidth",
     },
     {
         "BriefDescription": "Total pipeline cost of Memory Address Translation related bottlenecks (data-side TLBs)",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "100 * tma_memory_bound * (tma_l1_bound / max(tma_memory_bound, tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_dtlb_load / max(tma_l1_bound, tma_dtlb_load + tma_fb_full + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)) + tma_store_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_dtlb_store / (tma_dtlb_store + tma_false_sharing + tma_split_stores + tma_store_latency + tma_streaming_stores)))",
         "MetricGroup": "Mem;MemoryTLB;Offcore;tma_issueTLB",
         "MetricName": "tma_info_bottleneck_memory_data_tlbs",
     },
     {
         "BriefDescription": "Total pipeline cost of Memory Latency related bottlenecks (external memory and off-core caches)",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "100 * tma_memory_bound * (tma_dram_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) + tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound) * (tma_l3_hit_latency / (tma_contested_accesses + tma_data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_l2_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_store_bound))",
         "MetricGroup": "Mem;MemoryLat;Offcore;tma_issueLat",
         "MetricName": "tma_info_bottleneck_memory_latency",
     },
     {
         "BriefDescription": "Total pipeline cost of Branch Misprediction related bottlenecks",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "100 * (tma_branch_mispredicts + tma_fetch_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches))",
         "MetricGroup": "Bad;BadSpec;BrMispredicts;tma_issueBM",
         "MetricName": "tma_info_bottleneck_mispredictions",
     },
     {
         "BriefDescription": "Fraction of branches that are non-taken conditionals",
-        "MetricExpr": "BR_INST_RETIRED.COND_NTAKEN / BR_INST_RETIRED.ALL_BRANCHES",
+        "MetricExpr": "cpu_core@BR_INST_RETIRED.COND_NTAKEN@ / BR_INST_RETIRED.ALL_BRANCHES",
         "MetricGroup": "Bad;Branches;CodeGen;PGO",
         "MetricName": "tma_info_branches_cond_nt",
         "Unit": "cpu_core"
     },
     {
         "BriefDescription": "Fraction of branches that are taken conditionals",
-        "MetricExpr": "BR_INST_RETIRED.COND_TAKEN / BR_INST_RETIRED.ALL_BRANCHES",
+        "MetricExpr": "cpu_core@BR_INST_RETIRED.COND_TAKEN@ / BR_INST_RETIRED.ALL_BRANCHES",
         "MetricGroup": "Bad;Branches;CodeGen;PGO",
         "MetricName": "tma_info_branches_cond_tk",
         "Unit": "cpu_core"
     },
     {
         "BriefDescription": "Instructions Per Cycle across hyper-threads (per physical core)",
-        "MetricExpr": "INST_RETIRED.ANY / tma_info_core_core_clks",
+        "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / tma_info_core_core_clks",
         "MetricGroup": "Ret;SMT;TmaL1;tma_L1_group",
         "MetricName": "tma_info_core_coreipc",
         "Unit": "cpu_core"
     },
     {
         "BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is execution) per-core",
-        "MetricExpr": "UOPS_EXECUTED.THREAD / (cpu_core@UOPS_EXECUTED.CORE_CYCLES_GE_1@ / 2 if #SMT_on else cpu_core@UOPS_EXECUTED.CORE_CYCLES_GE_1@)",
+        "MetricExpr": "cpu_core@UOPS_EXECUTED.THREAD@ / (cpu_core@UOPS_EXECUTED.CORE_CYCLES_GE_1@ / 2 if #SMT_on else cpu_core@UOPS_EXECUTED.CORE_CYCLES_GE_1@)",
         "MetricGroup": "Backend;Cor;Pipeline;PortsUtil",
         "MetricName": "tma_info_core_ilp",
         "Unit": "cpu_core"
     },
     {
         "BriefDescription": "Fraction of Uops delivered by the DSB (aka Decoded ICache; or Uop Cache)",
-        "MetricExpr": "IDQ.DSB_UOPS / cpu_core@UOPS_ISSUED.ANY@",
+        "MetricExpr": "cpu_core@IDQ.DSB_UOPS@ / cpu_core@UOPS_ISSUED.ANY@",
         "MetricGroup": "DSB;Fed;FetchBW;tma_issueFB",
         "MetricName": "tma_info_frontend_dsb_coverage",
         "MetricThreshold": "tma_info_frontend_dsb_coverage < 0.7 & tma_info_thread_ipc / 6 > 0.35",
     },
     {
         "BriefDescription": "Average number of cycles of a switch from the DSB fetch-unit to MITE fetch unit - see DSB_Switches tree node for details.",
-        "MetricExpr": "DSB2MITE_SWITCHES.PENALTY_CYCLES / cpu_core@DSB2MITE_SWITCHES.PENALTY_CYCLES\\,cmask\\=1\\,edge@",
+        "MetricExpr": "cpu_core@DSB2MITE_SWITCHES.PENALTY_CYCLES@ / cpu_core@DSB2MITE_SWITCHES.PENALTY_CYCLES\\,cmask\\=1\\,edge@",
         "MetricGroup": "DSBmiss",
         "MetricName": "tma_info_frontend_dsb_switch_cost",
         "Unit": "cpu_core"
     },
     {
         "BriefDescription": "Average number of Uops issued by front-end when it issued something",
-        "MetricExpr": "UOPS_ISSUED.ANY / cpu_core@UOPS_ISSUED.ANY\\,cmask\\=1@",
+        "MetricExpr": "cpu_core@UOPS_ISSUED.ANY@ / cpu_core@UOPS_ISSUED.ANY\\,cmask\\=1@",
         "MetricGroup": "Fed;FetchBW",
         "MetricName": "tma_info_frontend_fetch_upc",
         "Unit": "cpu_core"
     },
     {
         "BriefDescription": "Average Latency for L1 instruction cache misses",
-        "MetricExpr": "ICACHE_DATA.STALLS / cpu_core@ICACHE_DATA.STALLS\\,cmask\\=1\\,edge@",
+        "MetricExpr": "cpu_core@ICACHE_DATA.STALLS@ / cpu_core@ICACHE_DATA.STALLS\\,cmask\\=1\\,edge@",
         "MetricGroup": "Fed;FetchLat;IcMiss",
         "MetricName": "tma_info_frontend_icache_miss_latency",
         "Unit": "cpu_core"
     },
     {
         "BriefDescription": "Instructions per non-speculative DSB miss (lower number means higher occurrence rate)",
-        "MetricExpr": "INST_RETIRED.ANY / FRONTEND_RETIRED.ANY_DSB_MISS",
+        "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / FRONTEND_RETIRED.ANY_DSB_MISS",
         "MetricGroup": "DSBmiss;Fed",
         "MetricName": "tma_info_frontend_ipdsb_miss_ret",
         "MetricThreshold": "tma_info_frontend_ipdsb_miss_ret < 50",
     },
     {
         "BriefDescription": "Fraction of Uops delivered by the LSD (Loop Stream Detector; aka Loop Cache)",
-        "MetricExpr": "LSD.UOPS / cpu_core@UOPS_ISSUED.ANY@",
+        "MetricExpr": "cpu_core@LSD.UOPS@ / cpu_core@UOPS_ISSUED.ANY@",
         "MetricGroup": "Fed;LSD",
         "MetricName": "tma_info_frontend_lsd_coverage",
         "Unit": "cpu_core"
     },
     {
         "BriefDescription": "Branch instructions per taken branch.",
-        "MetricExpr": "BR_INST_RETIRED.ALL_BRANCHES / BR_INST_RETIRED.NEAR_TAKEN",
+        "MetricExpr": "cpu_core@BR_INST_RETIRED.ALL_BRANCHES@ / BR_INST_RETIRED.NEAR_TAKEN",
         "MetricGroup": "Branches;Fed;PGO",
         "MetricName": "tma_info_inst_mix_bptkbranch",
         "Unit": "cpu_core"
     },
     {
         "BriefDescription": "Instructions per FP Arithmetic instruction (lower number means higher occurrence rate)",
-        "MetricExpr": "INST_RETIRED.ANY / (cpu_core@FP_ARITH_INST_RETIRED.SCALAR_SINGLE\\,umask\\=0x03@ + cpu_core@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE\\,umask\\=0x3c@)",
+        "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / (cpu_core@FP_ARITH_INST_RETIRED.SCALAR_SINGLE\\,umask\\=0x03@ + cpu_core@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE\\,umask\\=0x3c@)",
         "MetricGroup": "Flops;InsType",
         "MetricName": "tma_info_inst_mix_iparith",
         "MetricThreshold": "tma_info_inst_mix_iparith < 10",
     },
     {
         "BriefDescription": "Instructions per FP Arithmetic AVX/SSE 128-bit instruction (lower number means higher occurrence rate)",
-        "MetricExpr": "INST_RETIRED.ANY / (cpu_core@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE@ + cpu_core@FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE@)",
+        "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / (cpu_core@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE@ + cpu_core@FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE@)",
         "MetricGroup": "Flops;FpVector;InsType",
         "MetricName": "tma_info_inst_mix_iparith_avx128",
         "MetricThreshold": "tma_info_inst_mix_iparith_avx128 < 10",
     },
     {
         "BriefDescription": "Instructions per FP Arithmetic AVX* 256-bit instruction (lower number means higher occurrence rate)",
-        "MetricExpr": "INST_RETIRED.ANY / (cpu_core@FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE@ + cpu_core@FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE@)",
+        "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / (cpu_core@FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE@ + cpu_core@FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE@)",
         "MetricGroup": "Flops;FpVector;InsType",
         "MetricName": "tma_info_inst_mix_iparith_avx256",
         "MetricThreshold": "tma_info_inst_mix_iparith_avx256 < 10",
     },
     {
         "BriefDescription": "Instructions per FP Arithmetic Scalar Double-Precision instruction (lower number means higher occurrence rate)",
-        "MetricExpr": "INST_RETIRED.ANY / FP_ARITH_INST_RETIRED.SCALAR_DOUBLE",
+        "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / FP_ARITH_INST_RETIRED.SCALAR_DOUBLE",
         "MetricGroup": "Flops;FpScalar;InsType",
         "MetricName": "tma_info_inst_mix_iparith_scalar_dp",
         "MetricThreshold": "tma_info_inst_mix_iparith_scalar_dp < 10",
     },
     {
         "BriefDescription": "Instructions per FP Arithmetic Scalar Single-Precision instruction (lower number means higher occurrence rate)",
-        "MetricExpr": "INST_RETIRED.ANY / FP_ARITH_INST_RETIRED.SCALAR_SINGLE",
+        "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / FP_ARITH_INST_RETIRED.SCALAR_SINGLE",
         "MetricGroup": "Flops;FpScalar;InsType",
         "MetricName": "tma_info_inst_mix_iparith_scalar_sp",
         "MetricThreshold": "tma_info_inst_mix_iparith_scalar_sp < 10",
     },
     {
         "BriefDescription": "Instructions per Branch (lower number means higher occurrence rate)",
-        "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.ALL_BRANCHES",
+        "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / BR_INST_RETIRED.ALL_BRANCHES",
         "MetricGroup": "Branches;Fed;InsType",
         "MetricName": "tma_info_inst_mix_ipbranch",
         "MetricThreshold": "tma_info_inst_mix_ipbranch < 8",
     },
     {
         "BriefDescription": "Instructions per (near) call (lower number means higher occurrence rate)",
-        "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_CALL",
+        "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / BR_INST_RETIRED.NEAR_CALL",
         "MetricGroup": "Branches;Fed;PGO",
         "MetricName": "tma_info_inst_mix_ipcall",
         "MetricThreshold": "tma_info_inst_mix_ipcall < 200",
     },
     {
         "BriefDescription": "Instructions per Floating Point (FP) Operation (lower number means higher occurrence rate)",
-        "MetricExpr": "INST_RETIRED.ANY / (cpu_core@FP_ARITH_INST_RETIRED.SCALAR_SINGLE@ + cpu_core@FP_ARITH_INST_RETIRED.SCALAR_DOUBLE@ + 2 * cpu_core@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE@ + 4 * (cpu_core@FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE@ + cpu_core@FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE@) + 8 * cpu_core@FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE@)",
+        "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / (cpu_core@FP_ARITH_INST_RETIRED.SCALAR_SINGLE@ + cpu_core@FP_ARITH_INST_RETIRED.SCALAR_DOUBLE@ + 2 * cpu_core@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE@ + 4 * (cpu_core@FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE@ + cpu_core@FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE@) + 8 * cpu_core@FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE@)",
         "MetricGroup": "Flops;InsType",
         "MetricName": "tma_info_inst_mix_ipflop",
         "MetricThreshold": "tma_info_inst_mix_ipflop < 10",
     },
     {
         "BriefDescription": "Instructions per Load (lower number means higher occurrence rate)",
-        "MetricExpr": "INST_RETIRED.ANY / MEM_INST_RETIRED.ALL_LOADS",
+        "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / MEM_INST_RETIRED.ALL_LOADS",
         "MetricGroup": "InsType",
         "MetricName": "tma_info_inst_mix_ipload",
         "MetricThreshold": "tma_info_inst_mix_ipload < 3",
     },
     {
         "BriefDescription": "Instructions per Store (lower number means higher occurrence rate)",
-        "MetricExpr": "INST_RETIRED.ANY / MEM_INST_RETIRED.ALL_STORES",
+        "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / MEM_INST_RETIRED.ALL_STORES",
         "MetricGroup": "InsType",
         "MetricName": "tma_info_inst_mix_ipstore",
         "MetricThreshold": "tma_info_inst_mix_ipstore < 8",
     },
     {
         "BriefDescription": "Instructions per Software prefetch instruction (of any type: NTA/T0/T1/T2/Prefetch) (lower number means higher occurrence rate)",
-        "MetricExpr": "INST_RETIRED.ANY / cpu_core@SW_PREFETCH_ACCESS.T0\\,umask\\=0xF@",
+        "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / cpu_core@SW_PREFETCH_ACCESS.T0\\,umask\\=0xF@",
         "MetricGroup": "Prefetches",
         "MetricName": "tma_info_inst_mix_ipswpf",
         "MetricThreshold": "tma_info_inst_mix_ipswpf < 100",
     },
     {
         "BriefDescription": "Instruction per taken branch",
-        "MetricExpr": "INST_RETIRED.ANY / BR_INST_RETIRED.NEAR_TAKEN",
+        "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / BR_INST_RETIRED.NEAR_TAKEN",
         "MetricGroup": "Branches;Fed;FetchBW;Frontend;PGO;tma_issueFB",
         "MetricName": "tma_info_inst_mix_iptb",
         "MetricThreshold": "tma_info_inst_mix_iptb < 13",
     },
     {
         "BriefDescription": "Actual Average Latency for L1 data-cache miss demand load operations (in core cycles)",
-        "MetricExpr": "L1D_PEND_MISS.PENDING / MEM_LOAD_COMPLETED.L1_MISS_ANY",
+        "MetricExpr": "cpu_core@L1D_PEND_MISS.PENDING@ / MEM_LOAD_COMPLETED.L1_MISS_ANY",
         "MetricGroup": "Mem;MemoryBound;MemoryLat",
         "MetricName": "tma_info_memory_load_miss_real_latency",
         "Unit": "cpu_core"
     },
     {
         "BriefDescription": "Memory-Level-Parallelism (average number of L1 miss demand load when there is at least one such miss",
-        "MetricExpr": "L1D_PEND_MISS.PENDING / L1D_PEND_MISS.PENDING_CYCLES",
+        "MetricExpr": "cpu_core@L1D_PEND_MISS.PENDING@ / L1D_PEND_MISS.PENDING_CYCLES",
         "MetricGroup": "Mem;MemoryBW;MemoryBound",
         "MetricName": "tma_info_memory_mlp",
         "PublicDescription": "Memory-Level-Parallelism (average number of L1 miss demand load when there is at least one such miss. Per-Logical Processor)",
     },
     {
         "BriefDescription": "Average Parallel L2 cache miss data reads",
-        "MetricExpr": "OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD / OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD",
+        "MetricExpr": "cpu_core@OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD@ / OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD",
         "MetricGroup": "Memory_BW;Offcore",
         "MetricName": "tma_info_memory_oro_data_l2_mlp",
         "Unit": "cpu_core"
     },
     {
         "BriefDescription": "Average Latency for L2 cache miss demand Loads",
-        "MetricExpr": "OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD / OFFCORE_REQUESTS.DEMAND_DATA_RD",
+        "MetricExpr": "cpu_core@OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD@ / OFFCORE_REQUESTS.DEMAND_DATA_RD",
         "MetricGroup": "Memory_Lat;Offcore",
         "MetricName": "tma_info_memory_oro_load_l2_miss_latency",
         "Unit": "cpu_core"
     },
     {
         "BriefDescription": "Average Parallel L2 cache miss demand Loads",
-        "MetricExpr": "OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD / cpu_core@OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD\\,cmask\\=1@",
+        "MetricExpr": "cpu_core@OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD@ / cpu_core@OFFCORE_REQUESTS_OUTSTANDING.DEMAND_DATA_RD\\,cmask\\=1@",
         "MetricGroup": "Memory_BW;Offcore",
         "MetricName": "tma_info_memory_oro_load_l2_mlp",
         "Unit": "cpu_core"
     },
     {
         "BriefDescription": "Average Latency for L3 cache miss demand Loads",
-        "MetricExpr": "OFFCORE_REQUESTS_OUTSTANDING.L3_MISS_DEMAND_DATA_RD / OFFCORE_REQUESTS.L3_MISS_DEMAND_DATA_RD",
+        "MetricExpr": "cpu_core@OFFCORE_REQUESTS_OUTSTANDING.L3_MISS_DEMAND_DATA_RD@ / OFFCORE_REQUESTS.L3_MISS_DEMAND_DATA_RD",
         "MetricGroup": "Memory_Lat;Offcore",
         "MetricName": "tma_info_memory_oro_load_l3_miss_latency",
         "Unit": "cpu_core"
     },
     {
         "BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is execution) per-thread",
-        "MetricExpr": "UOPS_EXECUTED.THREAD / cpu_core@UOPS_EXECUTED.THREAD\\,cmask\\=1@",
+        "MetricExpr": "cpu_core@UOPS_EXECUTED.THREAD@ / cpu_core@UOPS_EXECUTED.THREAD\\,cmask\\=1@",
         "MetricGroup": "Cor;Pipeline;PortsUtil;SMT",
         "MetricName": "tma_info_pipeline_execute",
         "Unit": "cpu_core"
     },
     {
         "BriefDescription": "Instructions per a microcode Assist invocation",
-        "MetricExpr": "INST_RETIRED.ANY / cpu_core@ASSISTS.ANY\\,umask\\=0x1B@",
+        "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / cpu_core@ASSISTS.ANY\\,umask\\=0x1B@",
         "MetricGroup": "Pipeline;Ret;Retire",
         "MetricName": "tma_info_pipeline_ipassist",
         "MetricThreshold": "tma_info_pipeline_ipassist < 100e3",
     },
     {
         "BriefDescription": "Estimated fraction of retirement-cycles dealing with repeat instructions",
-        "MetricExpr": "INST_RETIRED.REP_ITERATION / cpu_core@UOPS_RETIRED.SLOTS\\,cmask\\=1@",
+        "MetricExpr": "cpu_core@INST_RETIRED.REP_ITERATION@ / cpu_core@UOPS_RETIRED.SLOTS\\,cmask\\=1@",
         "MetricGroup": "Pipeline;Ret",
         "MetricName": "tma_info_pipeline_strings_cycles",
         "MetricThreshold": "tma_info_pipeline_strings_cycles > 0.1",
     },
     {
         "BriefDescription": "Average CPU Utilization",
-        "MetricExpr": "CPU_CLK_UNHALTED.REF_TSC / TSC",
+        "MetricExpr": "cpu_core@CPU_CLK_UNHALTED.REF_TSC@ / TSC",
         "MetricGroup": "HPC;Summary",
         "MetricName": "tma_info_system_cpu_utilization",
         "Unit": "cpu_core"
     },
     {
         "BriefDescription": "Instructions per Far Branch ( Far Branches apply upon transition from application to operating system, handling interrupts, exceptions) [lower number means higher occurrence rate]",
-        "MetricExpr": "INST_RETIRED.ANY / cpu_core@BR_INST_RETIRED.FAR_BRANCH@u",
+        "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / cpu_core@BR_INST_RETIRED.FAR_BRANCH@u",
         "MetricGroup": "Branches;OS",
         "MetricName": "tma_info_system_ipfarbranch",
         "MetricThreshold": "tma_info_system_ipfarbranch < 1e6",
     },
     {
         "BriefDescription": "Average latency of data read request to external memory (in nanoseconds)",
+        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "(UNC_ARB_TRK_OCCUPANCY.RD + UNC_ARB_DAT_OCCUPANCY.RD) / UNC_ARB_TRK_REQUESTS.RD",
         "MetricGroup": "Mem;MemoryLat;SoC",
         "MetricName": "tma_info_system_mem_read_latency",
     },
     {
         "BriefDescription": "Average latency of all requests to external memory (in Uncore cycles)",
+        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "(UNC_ARB_TRK_OCCUPANCY.ALL + UNC_ARB_DAT_OCCUPANCY.RD) / UNC_ARB_TRK_REQUESTS.ALL",
         "MetricGroup": "Mem;SoC",
         "MetricName": "tma_info_system_mem_request_latency",
     },
     {
         "BriefDescription": "The ratio of Executed- by Issued-Uops",
-        "MetricExpr": "UOPS_EXECUTED.THREAD / UOPS_ISSUED.ANY",
+        "MetricExpr": "cpu_core@UOPS_EXECUTED.THREAD@ / UOPS_ISSUED.ANY",
         "MetricGroup": "Cor;Pipeline",
         "MetricName": "tma_info_thread_execute_per_issue",
         "PublicDescription": "The ratio of Executed- by Issued-Uops. Ratio > 1 suggests high rate of uop micro-fusions. Ratio < 1 suggest high rate of \"execute\" at rename stage.",
     },
     {
         "BriefDescription": "Instructions Per Cycle (per Logical Processor)",
-        "MetricExpr": "INST_RETIRED.ANY / tma_info_thread_clks",
+        "MetricExpr": "cpu_core@INST_RETIRED.ANY@ / tma_info_thread_clks",
         "MetricGroup": "Ret;Summary",
         "MetricName": "tma_info_thread_ipc",
         "Unit": "cpu_core"
     },
     {
         "BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to Instruction TLB (ITLB) misses",
-        "MetricExpr": "ICACHE_TAG.STALLS / tma_info_thread_clks",
+        "MetricExpr": "cpu_core@ICACHE_TAG.STALLS@ / tma_info_thread_clks",
         "MetricGroup": "BigFoot;FetchLat;MemoryTLB;TopdownL3;tma_L3_group;tma_fetch_latency_group",
         "MetricName": "tma_itlb_misses",
         "MetricThreshold": "tma_itlb_misses > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15)",
     },
     {
         "BriefDescription": "This metric estimates how often the CPU was stalled due to L2 cache accesses by loads",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "(cpu_core@MEMORY_ACTIVITY.STALLS_L1D_MISS@ - cpu_core@MEMORY_ACTIVITY.STALLS_L2_MISS@) / tma_info_thread_clks",
         "MetricGroup": "CacheMisses;MemoryBound;TmaL3mem;TopdownL3;tma_L3_group;tma_memory_bound_group",
         "MetricName": "tma_l2_bound",
     },
     {
         "BriefDescription": "This metric estimates how often the CPU was stalled due to loads accesses to L3 cache or contended with a sibling Core",
-        "MetricConstraint": "NO_GROUP_EVENTS_NMI",
         "MetricExpr": "(cpu_core@MEMORY_ACTIVITY.STALLS_L2_MISS@ - cpu_core@MEMORY_ACTIVITY.STALLS_L3_MISS@) / tma_info_thread_clks",
         "MetricGroup": "CacheMisses;MemoryBound;TmaL3mem;TopdownL3;tma_L3_group;tma_memory_bound_group",
         "MetricName": "tma_l3_bound",
     },
     {
         "BriefDescription": "This metric represents fraction of cycles CPU was stalled due to Length Changing Prefixes (LCPs)",
-        "MetricExpr": "DECODE.LCP / tma_info_thread_clks",
+        "MetricExpr": "cpu_core@DECODE.LCP@ / tma_info_thread_clks",
         "MetricGroup": "FetchLat;TopdownL3;tma_L3_group;tma_fetch_latency_group;tma_issueFB",
         "MetricName": "tma_lcp",
         "MetricThreshold": "tma_lcp > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15)",
     },
     {
         "BriefDescription": "This metric represents Core fraction of cycles CPU dispatched uops on execution port for Load operations",
-        "MetricExpr": "UOPS_DISPATCHED.PORT_2_3_10 / (3 * tma_info_core_core_clks)",
+        "MetricExpr": "cpu_core@UOPS_DISPATCHED.PORT_2_3_10@ / (3 * tma_info_core_core_clks)",
         "MetricGroup": "TopdownL5;tma_L5_group;tma_ports_utilized_3m_group",
         "MetricName": "tma_load_op_utilization",
         "MetricThreshold": "tma_load_op_utilization > 0.6",
     },
     {
         "BriefDescription": "This metric estimates the fraction of cycles where the Second-level TLB (STLB) was missed by load accesses, performing a hardware page walk",
-        "MetricExpr": "DTLB_LOAD_MISSES.WALK_ACTIVE / tma_info_thread_clks",
+        "MetricExpr": "cpu_core@DTLB_LOAD_MISSES.WALK_ACTIVE@ / tma_info_thread_clks",
         "MetricGroup": "MemoryTLB;TopdownL5;tma_L5_group;tma_dtlb_load_group",
         "MetricName": "tma_load_stlb_miss",
         "MetricThreshold": "tma_load_stlb_miss > 0.05 & (tma_dtlb_load > 0.1 & (tma_l1_bound > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2)))",
     },
     {
         "BriefDescription": "This metric represents fraction of cycles the CPU spent handling cache misses due to lock operations",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "(16 * max(0, cpu_core@MEM_INST_RETIRED.LOCK_LOADS@ - cpu_core@L2_RQSTS.ALL_RFO@) + cpu_core@MEM_INST_RETIRED.LOCK_LOADS@ / cpu_core@MEM_INST_RETIRED.ALL_STORES@ * (10 * cpu_core@L2_RQSTS.RFO_HIT@ + min(cpu_core@CPU_CLK_UNHALTED.THREAD@, cpu_core@OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_RFO@))) / tma_info_thread_clks",
         "MetricGroup": "Offcore;TopdownL4;tma_L4_group;tma_issueRFO;tma_l1_bound_group",
         "MetricName": "tma_lock_latency",
     },
     {
         "BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to LFENCE Instructions.",
+        "MetricConstraint": "NO_GROUP_EVENTS_NMI",
         "MetricExpr": "13 * cpu_core@MISC2_RETIRED.LFENCE@ / tma_info_thread_clks",
         "MetricGroup": "TopdownL6;tma_L6_group;tma_serializing_operation_group",
         "MetricName": "tma_memory_fence",
     },
     {
         "BriefDescription": "This metric represents fraction of slots where the CPU was retiring memory operations -- uops for memory load or store accesses.",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "tma_light_operations * cpu_core@MEM_UOP_RETIRED.ANY@ / (tma_retiring * tma_info_thread_slots)",
         "MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operations_group",
         "MetricName": "tma_memory_operations",
     },
     {
         "BriefDescription": "This metric represents fraction of slots the CPU was retiring uops fetched by the Microcode Sequencer (MS) unit",
-        "MetricExpr": "UOPS_RETIRED.MS / tma_info_thread_slots",
+        "MetricExpr": "cpu_core@UOPS_RETIRED.MS@ / tma_info_thread_slots",
         "MetricGroup": "MicroSeq;TopdownL3;tma_L3_group;tma_heavy_operations_group;tma_issueMC;tma_issueMS",
         "MetricName": "tma_microcode_sequencer",
         "MetricThreshold": "tma_microcode_sequencer > 0.05 & tma_heavy_operations > 0.1",
     },
     {
         "BriefDescription": "This metric represents the remaining light uops fraction the CPU has executed - remaining means not covered by other sibling nodes",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "max(0, tma_light_operations - (tma_fp_arith + tma_int_operations + tma_memory_operations + tma_fused_instructions + tma_non_fused_branches + tma_nop_instructions))",
         "MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operations_group",
         "MetricName": "tma_other_light_ops",
     },
     {
         "BriefDescription": "This metric represents Core fraction of cycles CPU dispatched uops on execution port 0 ([SNB+] ALU; [HSW+] ALU and 2nd branch)",
-        "MetricExpr": "UOPS_DISPATCHED.PORT_0 / tma_info_core_core_clks",
+        "MetricExpr": "cpu_core@UOPS_DISPATCHED.PORT_0@ / tma_info_core_core_clks",
         "MetricGroup": "Compute;TopdownL6;tma_L6_group;tma_alu_op_utilization_group;tma_issue2P",
         "MetricName": "tma_port_0",
         "MetricThreshold": "tma_port_0 > 0.6",
     },
     {
         "BriefDescription": "This metric represents Core fraction of cycles CPU dispatched uops on execution port 1 (ALU)",
-        "MetricExpr": "UOPS_DISPATCHED.PORT_1 / tma_info_core_core_clks",
+        "MetricExpr": "cpu_core@UOPS_DISPATCHED.PORT_1@ / tma_info_core_core_clks",
         "MetricGroup": "TopdownL6;tma_L6_group;tma_alu_op_utilization_group;tma_issue2P",
         "MetricName": "tma_port_1",
         "MetricThreshold": "tma_port_1 > 0.6",
     },
     {
         "BriefDescription": "This metric represents Core fraction of cycles CPU dispatched uops on execution port 6 ([HSW+]Primary Branch and simple ALU)",
-        "MetricExpr": "UOPS_DISPATCHED.PORT_6 / tma_info_core_core_clks",
+        "MetricExpr": "cpu_core@UOPS_DISPATCHED.PORT_6@ / tma_info_core_core_clks",
         "MetricGroup": "TopdownL6;tma_L6_group;tma_alu_op_utilization_group;tma_issue2P",
         "MetricName": "tma_port_6",
         "MetricThreshold": "tma_port_6 > 0.6",
     },
     {
         "BriefDescription": "This metric represents fraction of cycles where the CPU executed total of 1 uop per cycle on all execution ports (Logical Processor cycles since ICL, Physical Core cycles otherwise)",
-        "MetricExpr": "EXE_ACTIVITY.1_PORTS_UTIL / tma_info_thread_clks",
+        "MetricExpr": "cpu_core@EXE_ACTIVITY.1_PORTS_UTIL@ / tma_info_thread_clks",
         "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_issueL1;tma_ports_utilization_group",
         "MetricName": "tma_ports_utilized_1",
         "MetricThreshold": "tma_ports_utilized_1 > 0.2 & (tma_ports_utilization > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))",
     },
     {
         "BriefDescription": "This metric represents fraction of cycles CPU executed total of 2 uops per cycle on all execution ports (Logical Processor cycles since ICL, Physical Core cycles otherwise)",
-        "MetricExpr": "EXE_ACTIVITY.2_PORTS_UTIL / tma_info_thread_clks",
+        "MetricConstraint": "NO_GROUP_EVENTS_NMI",
+        "MetricExpr": "cpu_core@EXE_ACTIVITY.2_PORTS_UTIL@ / tma_info_thread_clks",
         "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_issue2P;tma_ports_utilization_group",
         "MetricName": "tma_ports_utilized_2",
         "MetricThreshold": "tma_ports_utilized_2 > 0.15 & (tma_ports_utilization > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))",
     },
     {
         "BriefDescription": "This metric represents fraction of cycles CPU executed total of 3 or more uops per cycle on all execution ports (Logical Processor cycles since ICL, Physical Core cycles otherwise)",
-        "MetricExpr": "UOPS_EXECUTED.CYCLES_GE_3 / tma_info_thread_clks",
+        "MetricConstraint": "NO_GROUP_EVENTS_NMI",
+        "MetricExpr": "cpu_core@UOPS_EXECUTED.CYCLES_GE_3@ / tma_info_thread_clks",
         "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_ports_utilization_group",
         "MetricName": "tma_ports_utilized_3m",
         "MetricThreshold": "tma_ports_utilized_3m > 0.7 & (tma_ports_utilization > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))",
     },
     {
         "BriefDescription": "This metric represents fraction of cycles the CPU issue-pipeline was stalled due to serializing operations",
-        "MetricExpr": "RESOURCE_STALLS.SCOREBOARD / tma_info_thread_clks",
+        "MetricExpr": "cpu_core@RESOURCE_STALLS.SCOREBOARD@ / tma_info_thread_clks",
         "MetricGroup": "PortsUtil;TopdownL5;tma_L5_group;tma_issueSO;tma_ports_utilized_0_group",
         "MetricName": "tma_serializing_operation",
         "MetricThreshold": "tma_serializing_operation > 0.1 & (tma_ports_utilized_0 > 0.2 & (tma_ports_utilization > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2)))",
     },
     {
         "BriefDescription": "This metric represents Shuffle (cross \"vector lane\" data transfers) uops fraction the CPU has retired.",
-        "MetricExpr": "INT_VEC_RETIRED.SHUFFLES / (tma_retiring * tma_info_thread_slots)",
+        "MetricExpr": "cpu_core@INT_VEC_RETIRED.SHUFFLES@ / (tma_retiring * tma_info_thread_slots)",
         "MetricGroup": "HPC;Pipeline;TopdownL4;tma_L4_group;tma_int_operations_group",
         "MetricName": "tma_shuffles",
         "MetricThreshold": "tma_shuffles > 0.1 & (tma_int_operations > 0.1 & tma_light_operations > 0.6)",
     },
     {
         "BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to PAUSE Instructions",
-        "MetricExpr": "CPU_CLK_UNHALTED.PAUSE / tma_info_thread_clks",
+        "MetricConstraint": "NO_GROUP_EVENTS_NMI",
+        "MetricExpr": "cpu_core@CPU_CLK_UNHALTED.PAUSE@ / tma_info_thread_clks",
         "MetricGroup": "TopdownL6;tma_L6_group;tma_serializing_operation_group",
         "MetricName": "tma_slow_pause",
         "MetricThreshold": "tma_slow_pause > 0.05 & (tma_serializing_operation > 0.1 & (tma_ports_utilized_0 > 0.2 & (tma_ports_utilization > 0.15 & (tma_core_bound > 0.1 & tma_backend_bound > 0.2))))",
     },
     {
         "BriefDescription": "This metric represents rate of split store accesses",
-        "MetricConstraint": "NO_GROUP_EVENTS_NMI",
-        "MetricExpr": "MEM_INST_RETIRED.SPLIT_STORES / tma_info_core_core_clks",
+        "MetricExpr": "cpu_core@MEM_INST_RETIRED.SPLIT_STORES@ / tma_info_core_core_clks",
         "MetricGroup": "TopdownL4;tma_L4_group;tma_issueSpSt;tma_store_bound_group",
         "MetricName": "tma_split_stores",
         "MetricThreshold": "tma_split_stores > 0.2 & (tma_store_bound > 0.2 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))",
     },
     {
         "BriefDescription": "This metric estimates how often CPU was stalled  due to RFO store memory accesses; RFO store issue a read-for-ownership request before the write",
-        "MetricExpr": "EXE_ACTIVITY.BOUND_ON_STORES / tma_info_thread_clks",
+        "MetricExpr": "cpu_core@EXE_ACTIVITY.BOUND_ON_STORES@ / tma_info_thread_clks",
         "MetricGroup": "MemoryBound;TmaL3mem;TopdownL3;tma_L3_group;tma_memory_bound_group",
         "MetricName": "tma_store_bound",
         "MetricThreshold": "tma_store_bound > 0.2 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2)",
     },
     {
         "BriefDescription": "This metric roughly estimates fraction of cycles when the memory subsystem had loads blocked since they could not forward data from earlier (in program order) overlapping stores",
-        "MetricConstraint": "NO_GROUP_EVENTS_NMI",
         "MetricExpr": "13 * cpu_core@LD_BLOCKS.STORE_FORWARD@ / tma_info_thread_clks",
         "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group",
         "MetricName": "tma_store_fwd_blk",
     },
     {
         "BriefDescription": "This metric estimates the fraction of cycles where the STLB was missed by store accesses, performing a hardware page walk",
-        "MetricExpr": "DTLB_STORE_MISSES.WALK_ACTIVE / tma_info_core_core_clks",
+        "MetricExpr": "cpu_core@DTLB_STORE_MISSES.WALK_ACTIVE@ / tma_info_core_core_clks",
         "MetricGroup": "MemoryTLB;TopdownL5;tma_L5_group;tma_dtlb_store_group",
         "MetricName": "tma_store_stlb_miss",
         "MetricThreshold": "tma_store_stlb_miss > 0.05 & (tma_dtlb_store > 0.05 & (tma_store_bound > 0.2 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2)))",
     },
     {
         "BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to new branch address clears",
-        "MetricExpr": "INT_MISC.UNKNOWN_BRANCH_CYCLES / tma_info_thread_clks",
+        "MetricExpr": "cpu_core@INT_MISC.UNKNOWN_BRANCH_CYCLES@ / tma_info_thread_clks",
         "MetricGroup": "BigFoot;FetchLat;TopdownL4;tma_L4_group;tma_branch_resteers_group",
         "MetricName": "tma_unknown_branches",
         "MetricThreshold": "tma_unknown_branches > 0.05 & (tma_branch_resteers > 0.05 & (tma_fetch_latency > 0.1 & tma_frontend_bound > 0.15))",
index c150c14ac6ed9925888fb9afc5e15e8dabdc1d8f..a35edf7d86a97e20570cf3f053d1b314be37fd20 100644 (file)
     },
     {
         "BriefDescription": "Counts the number of cycles the core is stalled due to a demand load miss which hit in DRAM or MMIO (Non-DRAM).",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "MEM_BOUND_STALLS.LOAD_DRAM_HIT / tma_info_core_clks - max((MEM_BOUND_STALLS.LOAD - LD_HEAD.L1_MISS_AT_RET) / tma_info_core_clks, 0) * MEM_BOUND_STALLS.LOAD_DRAM_HIT / MEM_BOUND_STALLS.LOAD",
         "MetricGroup": "TopdownL3;tma_L3_group;tma_memory_bound_group",
         "MetricName": "tma_dram_bound",
     },
     {
         "BriefDescription": "Counts the number of cycles a core is stalled due to a demand load which hit in the L2 Cache.",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "MEM_BOUND_STALLS.LOAD_L2_HIT / tma_info_core_clks - max((MEM_BOUND_STALLS.LOAD - LD_HEAD.L1_MISS_AT_RET) / tma_info_core_clks, 0) * MEM_BOUND_STALLS.LOAD_L2_HIT / MEM_BOUND_STALLS.LOAD",
         "MetricGroup": "TopdownL3;tma_L3_group;tma_memory_bound_group",
         "MetricName": "tma_l2_bound",
     },
     {
         "BriefDescription": "Counts the number of cycles a core is stalled due to a demand load which hit in the Last Level Cache (LLC) or other core with HITE/F/M.",
-        "MetricConstraint": "NO_GROUP_EVENTS_NMI",
         "MetricExpr": "MEM_BOUND_STALLS.LOAD_LLC_HIT / tma_info_core_clks - max((MEM_BOUND_STALLS.LOAD - LD_HEAD.L1_MISS_AT_RET) / tma_info_core_clks, 0) * MEM_BOUND_STALLS.LOAD_LLC_HIT / MEM_BOUND_STALLS.LOAD",
         "MetricGroup": "TopdownL3;tma_L3_group;tma_memory_bound_group",
         "MetricName": "tma_l3_bound",
     },
     {
         "BriefDescription": "Counts the number of cycles that the oldest load of the load buffer is stalled at retirement due to a store forward block.",
-        "MetricConstraint": "NO_GROUP_EVENTS_NMI",
         "MetricExpr": "LD_HEAD.ST_ADDR_AT_RET / tma_info_core_clks",
         "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group",
         "MetricName": "tma_store_fwd_blk",
index e31a4aac9f205e4d462b43472dbfc02c2ffd91c1..56e54babcc26f16aee88abae0716675e3ab97c83 100644 (file)
     },
     {
         "BriefDescription": "This metric estimates fraction of cycles while the memory subsystem was handling synchronizations due to contested accesses",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "(76 * tma_info_system_average_frequency * (MEM_LOAD_L3_HIT_RETIRED.XSNP_FWD * (OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM / (OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM + OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_WITH_FWD))) + 75.5 * tma_info_system_average_frequency * MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS) * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS / 2) / tma_info_thread_clks",
         "MetricGroup": "DataSharing;Offcore;Snoop;TopdownL4;tma_L4_group;tma_issueSyncxn;tma_l3_bound_group",
         "MetricName": "tma_contested_accesses",
     },
     {
         "BriefDescription": "This metric estimates fraction of cycles while the memory subsystem was handling synchronizations due to data-sharing accesses",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "75.5 * tma_info_system_average_frequency * (MEM_LOAD_L3_HIT_RETIRED.XSNP_NO_FWD + MEM_LOAD_L3_HIT_RETIRED.XSNP_FWD * (1 - OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM / (OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM + OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_WITH_FWD))) * (1 + MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.L1_MISS / 2) / tma_info_thread_clks",
         "MetricGroup": "Offcore;Snoop;TopdownL4;tma_L4_group;tma_issueSyncxn;tma_l3_bound_group",
         "MetricName": "tma_data_sharing",
     },
     {
         "BriefDescription": "This metric estimates how often the CPU was stalled on accesses to external memory (DRAM) by loads",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "(MEMORY_ACTIVITY.STALLS_L3_MISS / tma_info_thread_clks - tma_pmm_bound if #has_pmem > 0 else MEMORY_ACTIVITY.STALLS_L3_MISS / tma_info_thread_clks)",
         "MetricGroup": "MemoryBound;TmaL3mem;TopdownL3;tma_L3_group;tma_memory_bound_group",
         "MetricName": "tma_dram_bound",
     },
     {
         "BriefDescription": "Branch Misprediction Cost: Fraction of TMA slots wasted per non-speculative branch misprediction (retired JEClear)",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "(tma_branch_mispredicts + tma_fetch_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)) * tma_info_thread_slots / BR_MISP_RETIRED.ALL_BRANCHES",
         "MetricGroup": "Bad;BrMispredicts;tma_issueBM",
         "MetricName": "tma_info_bad_spec_branch_misprediction_cost",
     },
     {
         "BriefDescription": "Probability of Core Bound bottleneck hidden by SMT-profiling artifacts",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "(100 * (1 - tma_core_bound / tma_ports_utilization if tma_core_bound < tma_ports_utilization else 1) if tma_info_system_smt_2t_utilization > 0.5 else 0)",
         "MetricGroup": "Cor;SMT",
         "MetricName": "tma_info_botlnk_l0_core_bound_likely",
     },
     {
         "BriefDescription": "Total pipeline cost of DSB (uop cache) misses - subset of the Instruction_Fetch_BW Bottleneck",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "100 * (tma_fetch_latency * tma_dsb_switches / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches) + tma_fetch_bandwidth * tma_mite / (tma_dsb + tma_mite))",
         "MetricGroup": "DSBmiss;Fed;tma_issueFB",
         "MetricName": "tma_info_botlnk_l2_dsb_misses",
     },
     {
         "BriefDescription": "Total pipeline cost of Instruction Cache misses - subset of the Big_Code Bottleneck",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "100 * (tma_fetch_latency * tma_icache_misses / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches))",
         "MetricGroup": "Fed;FetchLat;IcMiss;tma_issueFL",
         "MetricName": "tma_info_botlnk_l2_ic_misses",
     },
     {
         "BriefDescription": "Total pipeline cost of instruction fetch related bottlenecks by large code footprint programs (i-side cache; TLB and BTB misses)",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "100 * tma_fetch_latency * (tma_itlb_misses + tma_icache_misses + tma_unknown_branches) / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)",
         "MetricGroup": "BigFoot;Fed;Frontend;IcMiss;MemoryTLB;tma_issueBC",
         "MetricName": "tma_info_bottleneck_big_code",
     },
     {
         "BriefDescription": "Total pipeline cost of instruction fetch bandwidth related bottlenecks",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "100 * (tma_frontend_bound - tma_fetch_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches)) - tma_info_bottleneck_big_code",
         "MetricGroup": "Fed;FetchBW;Frontend",
         "MetricName": "tma_info_bottleneck_instruction_fetch_bw",
     },
     {
         "BriefDescription": "Total pipeline cost of (external) Memory Bandwidth related bottlenecks",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "100 * tma_memory_bound * (tma_dram_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_pmm_bound + tma_store_bound) * (tma_mem_bandwidth / (tma_mem_bandwidth + tma_mem_latency)) + tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_pmm_bound + tma_store_bound) * (tma_sq_full / (tma_contested_accesses + tma_data_sharing + tma_l3_hit_latency + tma_sq_full))) + tma_l1_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_pmm_bound + tma_store_bound) * (tma_fb_full / (tma_dtlb_load + tma_fb_full + tma_lock_latency + tma_split_loads + tma_store_fwd_blk))",
         "MetricGroup": "Mem;MemoryBW;Offcore;tma_issueBW",
         "MetricName": "tma_info_bottleneck_memory_bandwidth",
     },
     {
         "BriefDescription": "Total pipeline cost of Memory Address Translation related bottlenecks (data-side TLBs)",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "100 * tma_memory_bound * (tma_l1_bound / max(tma_memory_bound, tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_pmm_bound + tma_store_bound) * (tma_dtlb_load / max(tma_l1_bound, tma_dtlb_load + tma_fb_full + tma_lock_latency + tma_split_loads + tma_store_fwd_blk)) + tma_store_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_pmm_bound + tma_store_bound) * (tma_dtlb_store / (tma_dtlb_store + tma_false_sharing + tma_split_stores + tma_store_latency + tma_streaming_stores)))",
         "MetricGroup": "Mem;MemoryTLB;Offcore;tma_issueTLB",
         "MetricName": "tma_info_bottleneck_memory_data_tlbs",
     },
     {
         "BriefDescription": "Total pipeline cost of Memory Latency related bottlenecks (external memory and off-core caches)",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "100 * tma_memory_bound * (tma_dram_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_pmm_bound + tma_store_bound) * (tma_mem_latency / (tma_mem_bandwidth + tma_mem_latency)) + tma_l3_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_pmm_bound + tma_store_bound) * (tma_l3_hit_latency / (tma_contested_accesses + tma_data_sharing + tma_l3_hit_latency + tma_sq_full)) + tma_l2_bound / (tma_dram_bound + tma_l1_bound + tma_l2_bound + tma_l3_bound + tma_pmm_bound + tma_store_bound))",
         "MetricGroup": "Mem;MemoryLat;Offcore;tma_issueLat",
         "MetricName": "tma_info_bottleneck_memory_latency",
     },
     {
         "BriefDescription": "Total pipeline cost of Branch Misprediction related bottlenecks",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "100 * (tma_branch_mispredicts + tma_fetch_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_icache_misses + tma_itlb_misses + tma_lcp + tma_ms_switches))",
         "MetricGroup": "Bad;BadSpec;BrMispredicts;tma_issueBM",
         "MetricName": "tma_info_bottleneck_mispredictions",
     },
     {
         "BriefDescription": "Average latency of data read request to external memory (in nanoseconds)",
+        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "1e9 * (UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD / UNC_CHA_TOR_INSERTS.IA_MISS_DRD) / (tma_info_system_socket_clks / duration_time)",
         "MetricGroup": "Mem;MemoryLat;SoC",
         "MetricName": "tma_info_system_mem_read_latency",
     },
     {
         "BriefDescription": "This metric estimates how often the CPU was stalled due to L2 cache accesses by loads",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "(MEMORY_ACTIVITY.STALLS_L1D_MISS - MEMORY_ACTIVITY.STALLS_L2_MISS) / tma_info_thread_clks",
         "MetricGroup": "CacheMisses;MemoryBound;TmaL3mem;TopdownL3;tma_L3_group;tma_memory_bound_group",
         "MetricName": "tma_l2_bound",
     },
     {
         "BriefDescription": "This metric estimates how often the CPU was stalled due to loads accesses to L3 cache or contended with a sibling Core",
-        "MetricConstraint": "NO_GROUP_EVENTS_NMI",
         "MetricExpr": "(MEMORY_ACTIVITY.STALLS_L2_MISS - MEMORY_ACTIVITY.STALLS_L3_MISS) / tma_info_thread_clks",
         "MetricGroup": "CacheMisses;MemoryBound;TmaL3mem;TopdownL3;tma_L3_group;tma_memory_bound_group",
         "MetricName": "tma_l3_bound",
     },
     {
         "BriefDescription": "This metric represents fraction of cycles the CPU spent handling cache misses due to lock operations",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "(16 * max(0, MEM_INST_RETIRED.LOCK_LOADS - L2_RQSTS.ALL_RFO) + MEM_INST_RETIRED.LOCK_LOADS / MEM_INST_RETIRED.ALL_STORES * (10 * L2_RQSTS.RFO_HIT + min(CPU_CLK_UNHALTED.THREAD, OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_RFO))) / tma_info_thread_clks",
         "MetricGroup": "Offcore;TopdownL4;tma_L4_group;tma_issueRFO;tma_l1_bound_group",
         "MetricName": "tma_lock_latency",
     },
     {
         "BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to LFENCE Instructions.",
+        "MetricConstraint": "NO_GROUP_EVENTS_NMI",
         "MetricExpr": "13 * MISC2_RETIRED.LFENCE / tma_info_thread_clks",
         "MetricGroup": "TopdownL6;tma_L6_group;tma_serializing_operation_group",
         "MetricName": "tma_memory_fence",
     },
     {
         "BriefDescription": "This metric represents fraction of slots where the CPU was retiring memory operations -- uops for memory load or store accesses.",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "tma_light_operations * MEM_UOP_RETIRED.ANY / (tma_retiring * tma_info_thread_slots)",
         "MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operations_group",
         "MetricName": "tma_memory_operations",
     },
     {
         "BriefDescription": "This metric represents the remaining light uops fraction the CPU has executed - remaining means not covered by other sibling nodes",
-        "MetricConstraint": "NO_GROUP_EVENTS",
         "MetricExpr": "max(0, tma_light_operations - (tma_fp_arith + tma_int_operations + tma_memory_operations + tma_fused_instructions + tma_non_fused_branches + tma_nop_instructions))",
         "MetricGroup": "Pipeline;TopdownL3;tma_L3_group;tma_light_operations_group",
         "MetricName": "tma_other_light_ops",
     },
     {
         "BriefDescription": "This metric represents fraction of cycles CPU executed total of 2 uops per cycle on all execution ports (Logical Processor cycles since ICL, Physical Core cycles otherwise)",
+        "MetricConstraint": "NO_GROUP_EVENTS_NMI",
         "MetricExpr": "EXE_ACTIVITY.2_PORTS_UTIL / tma_info_thread_clks",
         "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_issue2P;tma_ports_utilization_group",
         "MetricName": "tma_ports_utilized_2",
     },
     {
         "BriefDescription": "This metric represents fraction of cycles CPU executed total of 3 or more uops per cycle on all execution ports (Logical Processor cycles since ICL, Physical Core cycles otherwise)",
+        "MetricConstraint": "NO_GROUP_EVENTS_NMI",
         "MetricExpr": "UOPS_EXECUTED.CYCLES_GE_3 / tma_info_thread_clks",
         "MetricGroup": "PortsUtil;TopdownL4;tma_L4_group;tma_ports_utilization_group",
         "MetricName": "tma_ports_utilized_3m",
     },
     {
         "BriefDescription": "This metric represents fraction of cycles the CPU was stalled due to PAUSE Instructions",
+        "MetricConstraint": "NO_GROUP_EVENTS_NMI",
         "MetricExpr": "CPU_CLK_UNHALTED.PAUSE / tma_info_thread_clks",
         "MetricGroup": "TopdownL6;tma_L6_group;tma_serializing_operation_group",
         "MetricName": "tma_slow_pause",
     },
     {
         "BriefDescription": "This metric represents rate of split store accesses",
-        "MetricConstraint": "NO_GROUP_EVENTS_NMI",
         "MetricExpr": "MEM_INST_RETIRED.SPLIT_STORES / tma_info_core_core_clks",
         "MetricGroup": "TopdownL4;tma_L4_group;tma_issueSpSt;tma_store_bound_group",
         "MetricName": "tma_split_stores",
     },
     {
         "BriefDescription": "This metric roughly estimates fraction of cycles when the memory subsystem had loads blocked since they could not forward data from earlier (in program order) overlapping stores",
-        "MetricConstraint": "NO_GROUP_EVENTS_NMI",
         "MetricExpr": "13 * LD_BLOCKS.STORE_FORWARD / tma_info_thread_clks",
         "MetricGroup": "TopdownL4;tma_L4_group;tma_l1_bound_group",
         "MetricName": "tma_store_fwd_blk",
index 4c598cfc5afa14816f128da39f86cc7f18b4c68a..e5fa8d6f9eb1fdad3a5d700da9fa788c771a31ad 100755 (executable)
@@ -414,16 +414,30 @@ EOF
        # start daemon
        daemon_start ${config} test
 
-       # send 2 signals
-       perf daemon signal --config ${config} --session test
-       perf daemon signal --config ${config}
-
-       # stop daemon
-       daemon_exit ${config}
-
-       # count is 2 perf.data for signals and 1 for perf record finished
-       count=`ls ${base}/session-test/*perf.data* | wc -l`
-       if [ ${count} -ne 3 ]; then
+       # send 2 signals then exit. Do this in a loop watching the number of
+       # files to avoid races. If the loop retries more than 600 times then
+       # give up.
+       local retries=0
+       local signals=0
+       local success=0
+       while [ ${retries} -lt 600 ] && [ ${success} -eq 0 ]; do
+               local files
+               files=`ls ${base}/session-test/*perf.data* 2> /dev/null | wc -l`
+               if [ ${signals} -eq 0 ]; then
+                       perf daemon signal --config ${config} --session test
+                       signals=1
+               elif [ ${signals} -eq 1 ] && [ $files -ge 1 ]; then
+                       perf daemon signal --config ${config}
+                       signals=2
+               elif [ ${signals} -eq 2 ] && [ $files -ge 2 ]; then
+                       daemon_exit ${config}
+                       signals=3
+               elif [ ${signals} -eq 3 ] && [ $files -ge 3 ]; then
+                       success=1
+               fi
+               retries=$((${retries} + 1))
+       done
+       if [ ${success} -eq 0 ]; then
                error=1
                echo "FAILED: perf data no generated"
        fi
index 22b004f2b23ec6bb2c7748c6b804c292c276daf6..8a868ae64560e1842019155d4f1f519e5b8eecef 100755 (executable)
@@ -3,17 +3,32 @@
 # SPDX-License-Identifier: GPL-2.0
 
 set -e
-err=0
 
 shelldir=$(dirname "$0")
 # shellcheck source=lib/setup_python.sh
 . "${shelldir}"/lib/setup_python.sh
 
+list_output=$(mktemp /tmp/__perf_test.list_output.json.XXXXX)
+
+cleanup() {
+  rm -f "${list_output}"
+
+  trap - EXIT TERM INT
+}
+
+trap_cleanup() {
+  cleanup
+  exit 1
+}
+trap trap_cleanup EXIT TERM INT
+
 test_list_json() {
   echo "Json output test"
-  perf list -j | $PYTHON -m json.tool
+  perf list -j -o "${list_output}"
+  $PYTHON -m json.tool "${list_output}"
   echo "Json output test [Success]"
 }
 
 test_list_json
-exit $err
+cleanup
+exit 0
index 5ae7bd0031a8226ab7e1f38ed4869e9058f0cf1d..fa4d71e2e72a6146485859883541023c9ac9e772 100755 (executable)
@@ -36,8 +36,7 @@ test_db()
        echo "DB test"
 
        # Check if python script is supported
-       libpython=$(perf version --build-options | grep python | grep -cv OFF)
-       if [ "${libpython}" != "1" ] ; then
+       if perf version --build-options | grep python | grep -q OFF; then
                echo "SKIP: python scripting is not supported"
                err=2
                return
@@ -54,7 +53,14 @@ def sample_table(*args):
 def call_path_table(*args):
     print(f'call_path_table({args}')
 _end_of_file_
-       perf record -g -o "${perfdatafile}" true
+       case $(uname -m)
+       in s390x)
+               cmd_flags="--call-graph dwarf -e cpu-clock";;
+       *)
+               cmd_flags="-g";;
+       esac
+
+       perf record $cmd_flags -o "${perfdatafile}" true
        perf script -i "${perfdatafile}" -s "${db_test}"
        echo "DB test [Success]"
 }
index 5f5320f7c6e27d17a944e7196cfa1605c66357be..dc5943a6352d91dc67bafedfad09bdc40da1e135 100644 (file)
@@ -67,6 +67,7 @@ size_t syscall_arg__scnprintf_statx_mask(char *bf, size_t size, struct syscall_a
        P_FLAG(BTIME);
        P_FLAG(MNT_ID);
        P_FLAG(DIOALIGN);
+       P_FLAG(MNT_ID_UNIQUE);
 
 #undef P_FLAG
 
index 95f25e9fb994ab2a5190c40f91e9bbe3d5f884be..55a300a0977b416e60e90819ad1a9feefcdcbc84 100644 (file)
@@ -103,7 +103,14 @@ struct evlist *evlist__new_default(void)
        err = parse_event(evlist, can_profile_kernel ? "cycles:P" : "cycles:Pu");
        if (err) {
                evlist__delete(evlist);
-               evlist = NULL;
+               return NULL;
+       }
+
+       if (evlist->core.nr_entries > 1) {
+               struct evsel *evsel;
+
+               evlist__for_each_entry(evlist, evsel)
+                       evsel__set_sample_id(evsel, /*can_sample_identifier=*/false);
        }
 
        return evlist;
index 0888b7163b7cc25c33724f4e61099ab16a2ab60a..fa359180ebf8fc45e1248e4241543817e0660260 100644 (file)
@@ -491,8 +491,8 @@ static int hist_entry__init(struct hist_entry *he,
        }
 
        if (symbol_conf.res_sample) {
-               he->res_samples = calloc(sizeof(struct res_sample),
-                                       symbol_conf.res_sample);
+               he->res_samples = calloc(symbol_conf.res_sample,
+                                       sizeof(struct res_sample));
                if (!he->res_samples)
                        goto err_srcline;
        }
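The calloc() hunk above, and the two like it later in this series (metricgroup and synthetic-events), swap the argument order rather than change the amount allocated. A minimal sketch of why the order matters; the struct and function here are illustrative, not from the patch:

#include <stdlib.h>

/*
 * calloc(3) is declared as void *calloc(size_t nmemb, size_t size), so
 * the element count comes first. Transposed arguments allocate the same
 * number of bytes, but recent compilers (e.g. GCC 14's
 * -Wcalloc-transposed-args) warn about them, since keeping sizeof() in
 * the size slot is what lets overflow checking reason about elements.
 */
struct sample_rec { int cpu; long period; };	/* illustrative fields */

static struct sample_rec *alloc_samples(size_t n)
{
	return calloc(n, sizeof(struct sample_rec));	/* nmemb, then size */
}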
index 75e2248416f55f6792563a2614b211a36281222f..178b00205fe6a7b2d75f3a9d68b7bad99ccd82af 100644 (file)
        SYM_ALIAS(alias, name, SYM_T_FUNC, SYM_L_WEAK)
 #endif
 
+#ifndef SYM_FUNC_ALIAS_MEMFUNC
+#define SYM_FUNC_ALIAS_MEMFUNC SYM_FUNC_ALIAS
+#endif
+
 // In the kernel sources (include/linux/cfi_types.h), this has a different
 // definition when CONFIG_CFI_CLANG is used, for tools/ just use the !clang
 // definition:
index ca3e0404f18720d7a3cc2376896195f55cf1192d..966cca5a3e88cd94b78c27ce8429f820f1fa86b9 100644 (file)
@@ -286,7 +286,7 @@ static int setup_metric_events(const char *pmu, struct hashmap *ids,
        *out_metric_events = NULL;
        ids_size = hashmap__size(ids);
 
-       metric_events = calloc(sizeof(void *), ids_size + 1);
+       metric_events = calloc(ids_size + 1, sizeof(void *));
        if (!metric_events)
                return -ENOMEM;
 
index b0fc48be623f31bcfd478a79b66b5295708a5a15..9e47712507cc265d46c7cf5d66033185006ba2f3 100644 (file)
@@ -66,7 +66,7 @@ void print_tracepoint_events(const struct print_callbacks *print_cb __maybe_unus
 
        put_tracing_file(events_path);
        if (events_fd < 0) {
-               printf("Error: failed to open tracing events directory\n");
+               pr_err("Error: failed to open tracing events directory\n");
                return;
        }
 
index 3712186353fb94109e327195d1aee6d2177763ec..2a0289c149599927f1ee4023e530866d8da15a71 100644 (file)
@@ -1055,11 +1055,11 @@ int perf_event__synthesize_threads(struct perf_tool *tool,
        if (thread_nr > n)
                thread_nr = n;
 
-       synthesize_threads = calloc(sizeof(pthread_t), thread_nr);
+       synthesize_threads = calloc(thread_nr, sizeof(pthread_t));
        if (synthesize_threads == NULL)
                goto free_dirent;
 
-       args = calloc(sizeof(*args), thread_nr);
+       args = calloc(thread_nr, sizeof(*args));
        if (args == NULL)
                goto free_threads;
 
index d9d9923af85c2e60ca9b2161fd93867df3346d1b..a4b902f9e1c486801a7c14072e796a1b1f8e92ad 100644 (file)
@@ -15,7 +15,7 @@ LIBS = -L../ -L$(OUTPUT) -lm -lcpupower
 OBJS = $(OUTPUT)main.o $(OUTPUT)parse.o $(OUTPUT)system.o $(OUTPUT)benchmark.o
 endif
 
-CFLAGS += -D_GNU_SOURCE -I../lib -DDEFAULT_CONFIG_FILE=\"$(confdir)/cpufreq-bench.conf\"
+override CFLAGS += -D_GNU_SOURCE -I../lib -DDEFAULT_CONFIG_FILE=\"$(confdir)/cpufreq-bench.conf\"
 
 $(OUTPUT)%.o : %.c
        $(ECHO) "  CC      " $@
index 0b12c36902d82ddf23d8d71ac5067e282f1b1564..030b388800f05191e5b6492b97ebf96e9d45dcf3 100644 (file)
@@ -13,6 +13,7 @@ ldflags-y += --wrap=cxl_hdm_decode_init
 ldflags-y += --wrap=cxl_dvsec_rr_decode
 ldflags-y += --wrap=devm_cxl_add_rch_dport
 ldflags-y += --wrap=cxl_rcd_component_reg_phys
+ldflags-y += --wrap=cxl_endpoint_parse_cdat
 
 DRIVERS := ../../../drivers
 CXL_SRC := $(DRIVERS)/cxl
@@ -65,4 +66,6 @@ cxl_core-y += config_check.o
 cxl_core-y += cxl_core_test.o
 cxl_core-y += cxl_core_exports.o
 
+KBUILD_CFLAGS := $(filter-out -Wmissing-prototypes -Wmissing-declarations, $(KBUILD_CFLAGS))
+
 obj-m += test/
index 61d5f7bcddf9a6ef9d5df5d0c4346bd93f7181f9..6b192789785612d810c6ff577b1ac47aadd9e9b3 100644 (file)
@@ -8,3 +8,5 @@ obj-m += cxl_mock_mem.o
 cxl_test-y := cxl.o
 cxl_mock-y := mock.o
 cxl_mock_mem-y := mem.o
+
+KBUILD_CFLAGS := $(filter-out -Wmissing-prototypes -Wmissing-declarations, $(KBUILD_CFLAGS))
index a3cdbb2be038c45e27326925d81ba43294b56c31..908e0d0839369c2e41f090bddc2e9a9b9121b4c9 100644 (file)
@@ -15,6 +15,8 @@
 
 static int interleave_arithmetic;
 
+#define FAKE_QTG_ID    42
+
 #define NR_CXL_HOST_BRIDGES 2
 #define NR_CXL_SINGLE_HOST 1
 #define NR_CXL_RCH 1
@@ -209,7 +211,7 @@ static struct {
                        .granularity = 4,
                        .restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
                                        ACPI_CEDT_CFMWS_RESTRICT_VOLATILE,
-                       .qtg_id = 0,
+                       .qtg_id = FAKE_QTG_ID,
                        .window_size = SZ_256M * 4UL,
                },
                .target = { 0 },
@@ -224,7 +226,7 @@ static struct {
                        .granularity = 4,
                        .restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
                                        ACPI_CEDT_CFMWS_RESTRICT_VOLATILE,
-                       .qtg_id = 1,
+                       .qtg_id = FAKE_QTG_ID,
                        .window_size = SZ_256M * 8UL,
                },
                .target = { 0, 1, },
@@ -239,7 +241,7 @@ static struct {
                        .granularity = 4,
                        .restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
                                        ACPI_CEDT_CFMWS_RESTRICT_PMEM,
-                       .qtg_id = 2,
+                       .qtg_id = FAKE_QTG_ID,
                        .window_size = SZ_256M * 4UL,
                },
                .target = { 0 },
@@ -254,7 +256,7 @@ static struct {
                        .granularity = 4,
                        .restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
                                        ACPI_CEDT_CFMWS_RESTRICT_PMEM,
-                       .qtg_id = 3,
+                       .qtg_id = FAKE_QTG_ID,
                        .window_size = SZ_256M * 8UL,
                },
                .target = { 0, 1, },
@@ -269,7 +271,7 @@ static struct {
                        .granularity = 4,
                        .restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
                                        ACPI_CEDT_CFMWS_RESTRICT_PMEM,
-                       .qtg_id = 4,
+                       .qtg_id = FAKE_QTG_ID,
                        .window_size = SZ_256M * 4UL,
                },
                .target = { 2 },
@@ -284,7 +286,7 @@ static struct {
                        .granularity = 4,
                        .restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
                                        ACPI_CEDT_CFMWS_RESTRICT_VOLATILE,
-                       .qtg_id = 5,
+                       .qtg_id = FAKE_QTG_ID,
                        .window_size = SZ_256M,
                },
                .target = { 3 },
@@ -301,7 +303,7 @@ static struct {
                        .granularity = 4,
                        .restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
                                        ACPI_CEDT_CFMWS_RESTRICT_PMEM,
-                       .qtg_id = 0,
+                       .qtg_id = FAKE_QTG_ID,
                        .window_size = SZ_256M * 8UL,
                },
                .target = { 0, },
@@ -317,7 +319,7 @@ static struct {
                        .granularity = 0,
                        .restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
                                        ACPI_CEDT_CFMWS_RESTRICT_PMEM,
-                       .qtg_id = 1,
+                       .qtg_id = FAKE_QTG_ID,
                        .window_size = SZ_256M * 8UL,
                },
                .target = { 0, 1, },
@@ -333,7 +335,7 @@ static struct {
                        .granularity = 0,
                        .restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
                                        ACPI_CEDT_CFMWS_RESTRICT_PMEM,
-                       .qtg_id = 0,
+                       .qtg_id = FAKE_QTG_ID,
                        .window_size = SZ_256M * 16UL,
                },
                .target = { 0, 1, 0, 1, },
@@ -976,6 +978,48 @@ static int mock_cxl_port_enumerate_dports(struct cxl_port *port)
        return 0;
 }
 
+/*
+ * Fake the cxl_dpa_perf for the memdev when appropriate.
+ */
+static void dpa_perf_setup(struct cxl_port *endpoint, struct range *range,
+                          struct cxl_dpa_perf *dpa_perf)
+{
+       dpa_perf->qos_class = FAKE_QTG_ID;
+       dpa_perf->dpa_range = *range;
+       dpa_perf->coord.read_latency = 500;
+       dpa_perf->coord.write_latency = 500;
+       dpa_perf->coord.read_bandwidth = 1000;
+       dpa_perf->coord.write_bandwidth = 1000;
+}
+
+static void mock_cxl_endpoint_parse_cdat(struct cxl_port *port)
+{
+       struct cxl_root *cxl_root __free(put_cxl_root) =
+               find_cxl_root(port);
+       struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev);
+       struct cxl_dev_state *cxlds = cxlmd->cxlds;
+       struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
+       struct range pmem_range = {
+               .start = cxlds->pmem_res.start,
+               .end = cxlds->pmem_res.end,
+       };
+       struct range ram_range = {
+               .start = cxlds->ram_res.start,
+               .end = cxlds->ram_res.end,
+       };
+
+       if (!cxl_root)
+               return;
+
+       if (range_len(&ram_range))
+               dpa_perf_setup(port, &ram_range, &mds->ram_perf);
+
+       if (range_len(&pmem_range))
+               dpa_perf_setup(port, &pmem_range, &mds->pmem_perf);
+
+       cxl_memdev_update_perf(cxlmd);
+}
+
 static struct cxl_mock_ops cxl_mock_ops = {
        .is_mock_adev = is_mock_adev,
        .is_mock_bridge = is_mock_bridge,
@@ -989,6 +1033,7 @@ static struct cxl_mock_ops cxl_mock_ops = {
        .devm_cxl_setup_hdm = mock_cxl_setup_hdm,
        .devm_cxl_add_passthrough_decoder = mock_cxl_add_passthrough_decoder,
        .devm_cxl_enumerate_decoders = mock_cxl_enumerate_decoders,
+       .cxl_endpoint_parse_cdat = mock_cxl_endpoint_parse_cdat,
        .list = LIST_HEAD_INIT(cxl_mock_ops.list),
 };
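The mock_cxl_endpoint_parse_cdat() hunk above relies on the kernel's scope-based cleanup helpers to drop the cxl_root reference on every exit path. A hedged sketch of that pattern, assuming the DEFINE_FREE()/__free() machinery from linux/cleanup.h; example() is an illustrative caller:

/*
 * DEFINE_FREE() binds a destructor expression to a name; a variable
 * declared with __free(name) then runs that destructor automatically
 * when it goes out of scope, on every return path.
 */
DEFINE_FREE(put_cxl_root, struct cxl_root *, if (_T) put_cxl_root(_T))

static void example(struct cxl_port *port)
{
	struct cxl_root *cxl_root __free(put_cxl_root) = find_cxl_root(port);

	if (!cxl_root)
		return;	/* the if (_T) guard skips the put for NULL */

	/* ... use cxl_root; the reference is released automatically ... */
}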
 
index 1a61e68e30950ba623b52c5920a7874e3d97b9cd..6f737941dc0e164b9611e9dac91cb9e55b69e715 100644 (file)
@@ -285,6 +285,20 @@ resource_size_t __wrap_cxl_rcd_component_reg_phys(struct device *dev,
 }
 EXPORT_SYMBOL_NS_GPL(__wrap_cxl_rcd_component_reg_phys, CXL);
 
+void __wrap_cxl_endpoint_parse_cdat(struct cxl_port *port)
+{
+       int index;
+       struct cxl_mock_ops *ops = get_cxl_mock_ops(&index);
+       struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev);
+
+       if (ops && ops->is_mock_dev(cxlmd->dev.parent))
+               ops->cxl_endpoint_parse_cdat(port);
+       else
+               cxl_endpoint_parse_cdat(port);
+       put_cxl_mock_ops(index);
+}
+EXPORT_SYMBOL_NS_GPL(__wrap_cxl_endpoint_parse_cdat, CXL);
+
 MODULE_LICENSE("GPL v2");
 MODULE_IMPORT_NS(ACPI);
 MODULE_IMPORT_NS(CXL);
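The __wrap_cxl_endpoint_parse_cdat() shim pairs with the ldflags-y += --wrap=cxl_endpoint_parse_cdat line added earlier: the linker's --wrap option is what redirects calls from the objects under test into the mock. A generic sketch of the mechanism, with foo() and the predicates standing in as hypothetical names:

/*
 * Linking with -Wl,--wrap=foo rewires symbol resolution so that calls
 * to foo() bind to __wrap_foo(), while __real_foo() still reaches the
 * original definition. A test harness can then interpose selectively.
 */
int __real_foo(int arg);		/* resolved by the linker */

int __wrap_foo(int arg)
{
	if (is_mock_target(arg))	/* hypothetical predicate */
		return mock_foo(arg);	/* hypothetical fake */
	return __real_foo(arg);
}

(The CXL mock above calls the exported original directly rather than through a __real_ alias; judging from the Kbuild hunks, that works because the wrap flags apply only to the objects under test, not to the mock module itself.)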
index a94223750346c8d897197f3171091459982d29ac..d1b0271d282203b7bccac68aedef0646d1391d59 100644 (file)
@@ -25,6 +25,7 @@ struct cxl_mock_ops {
        int (*devm_cxl_add_passthrough_decoder)(struct cxl_port *port);
        int (*devm_cxl_enumerate_decoders)(
                struct cxl_hdm *hdm, struct cxl_endpoint_dvsec_info *info);
+       void (*cxl_endpoint_parse_cdat)(struct cxl_port *port);
 };
 
 void register_cxl_mock_ops(struct cxl_mock_ops *ops);
index 0b6488efed47ac309c977fd0d3249c150de8219e..7254c110ff23ab9a661ed1ca4d9dc789b8d8762f 100644 (file)
@@ -146,6 +146,7 @@ class LinuxSourceTreeOperationsUml(LinuxSourceTreeOperations):
                """Runs the Linux UML binary. Must be named 'linux'."""
                linux_bin = os.path.join(build_dir, 'linux')
                params.extend(['mem=1G', 'console=tty', 'kunit_shutdown=halt'])
+               print('Running tests with:\n$', linux_bin, ' '.join(shlex.quote(arg) for arg in params))
                return subprocess.Popen([linux_bin] + params,
                                           stdin=subprocess.PIPE,
                                           stdout=subprocess.PIPE,
index 8153251ea389a7dcff59d13f258cee0b066c7dbf..91a3627f301a79b90036c2cfd82217819b9bb757 100644 (file)
@@ -82,4 +82,6 @@ libnvdimm-$(CONFIG_NVDIMM_KEYS) += $(NVDIMM_SRC)/security.o
 libnvdimm-y += libnvdimm_test.o
 libnvdimm-y += config_check.o
 
+KBUILD_CFLAGS := $(filter-out -Wmissing-prototypes -Wmissing-declarations, $(KBUILD_CFLAGS))
+
 obj-m += test/
index 15b6a111c3beaa180af9a1c81db2bed024b2da9c..cd9ae576bfde3fcce5d4ae8733483d59d5bf13f6 100644 (file)
@@ -67,6 +67,7 @@ TARGETS += nsfs
 TARGETS += perf_events
 TARGETS += pidfd
 TARGETS += pid_namespace
+TARGETS += power_supply
 TARGETS += powerpc
 TARGETS += prctl
 TARGETS += proc
@@ -78,6 +79,7 @@ TARGETS += riscv
 TARGETS += rlimits
 TARGETS += rseq
 TARGETS += rtc
+TARGETS += rust
 TARGETS += seccomp
 TARGETS += sgx
 TARGETS += sigaltstack
@@ -236,6 +238,7 @@ ifdef INSTALL_PATH
        install -m 744 kselftest/module.sh $(INSTALL_PATH)/kselftest/
        install -m 744 kselftest/runner.sh $(INSTALL_PATH)/kselftest/
        install -m 744 kselftest/prefix.pl $(INSTALL_PATH)/kselftest/
+       install -m 744 kselftest/ktap_helpers.sh $(INSTALL_PATH)/kselftest/
        install -m 744 run_kselftest.sh $(INSTALL_PATH)/
        rm -f $(TEST_LIST)
        @ret=1; \
index bf84d4a1d9ae2c68ceeac9f25373fd9df01b6935..3c440370c1f0f2b9cc67a754da8087e447efa625 100644 (file)
@@ -193,6 +193,7 @@ static void subtest_task_iters(void)
        ASSERT_EQ(skel->bss->procs_cnt, 1, "procs_cnt");
        ASSERT_EQ(skel->bss->threads_cnt, thread_num + 1, "threads_cnt");
        ASSERT_EQ(skel->bss->proc_threads_cnt, thread_num + 1, "proc_threads_cnt");
+       ASSERT_EQ(skel->bss->invalid_cnt, 0, "invalid_cnt");
        pthread_mutex_unlock(&do_nothing_mutex);
        for (int i = 0; i < thread_num; i++)
                ASSERT_OK(pthread_join(thread_ids[i], &ret), "pthread_join");
diff --git a/tools/testing/selftests/bpf/prog_tests/read_vsyscall.c b/tools/testing/selftests/bpf/prog_tests/read_vsyscall.c
new file mode 100644 (file)
index 0000000..3405923
--- /dev/null
@@ -0,0 +1,57 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2024. Huawei Technologies Co., Ltd */
+#include "test_progs.h"
+#include "read_vsyscall.skel.h"
+
+#if defined(__x86_64__)
+/* For VSYSCALL_ADDR */
+#include <asm/vsyscall.h>
+#else
+/* To prevent build failure on non-x86 arch */
+#define VSYSCALL_ADDR 0UL
+#endif
+
+struct read_ret_desc {
+       const char *name;
+       int ret;
+} all_read[] = {
+       { .name = "probe_read_kernel", .ret = -ERANGE },
+       { .name = "probe_read_kernel_str", .ret = -ERANGE },
+       { .name = "probe_read", .ret = -ERANGE },
+       { .name = "probe_read_str", .ret = -ERANGE },
+       { .name = "probe_read_user", .ret = -EFAULT },
+       { .name = "probe_read_user_str", .ret = -EFAULT },
+       { .name = "copy_from_user", .ret = -EFAULT },
+       { .name = "copy_from_user_task", .ret = -EFAULT },
+};
+
+void test_read_vsyscall(void)
+{
+       struct read_vsyscall *skel;
+       unsigned int i;
+       int err;
+
+#if !defined(__x86_64__)
+       test__skip();
+       return;
+#endif
+       skel = read_vsyscall__open_and_load();
+       if (!ASSERT_OK_PTR(skel, "read_vsyscall open_load"))
+               return;
+
+       skel->bss->target_pid = getpid();
+       err = read_vsyscall__attach(skel);
+       if (!ASSERT_EQ(err, 0, "read_vsyscall attach"))
+               goto out;
+
+       /* user space may not have a vsyscall page due to
+        * LEGACY_VSYSCALL_NONE, but that doesn't affect the returned
+        * error codes.
+        */
+       skel->bss->user_ptr = (void *)VSYSCALL_ADDR;
+       usleep(1);
+
+       for (i = 0; i < ARRAY_SIZE(all_read); i++)
+               ASSERT_EQ(skel->bss->read_ret[i], all_read[i].ret, all_read[i].name);
+out:
+       read_vsyscall__destroy(skel);
+}
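The expected errors in all_read[] encode the property under test: kernel-side probe helpers refuse the execute-only vsyscall page up front (-ERANGE), while the user-space helpers reach the fault path and return -EFAULT. A rough sketch of the kind of arch-side guard involved; the check shown is an assumption for illustration, not code from this patch:

/*
 * Hook consulted by copy_from_kernel_nofault(); returning false here is
 * what surfaces as -ERANGE to bpf_probe_read_kernel() and friends.
 */
bool copy_from_kernel_nofault_allowed(const void *unsafe_src, size_t size)
{
	unsigned long vaddr = (unsigned long)unsafe_src;

	/* the vsyscall page is execute-only; refuse reads outright */
	if (vaddr >= VSYSCALL_ADDR && vaddr < VSYSCALL_ADDR + PAGE_SIZE)
		return false;

	return true;
}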
index 760ad96b4be099ed74779d8895165df5d212f091..d66687f1ee6a8df52cb228010a293a3d4d102216 100644 (file)
@@ -4,10 +4,29 @@
 #include "timer.skel.h"
 #include "timer_failure.skel.h"
 
+#define NUM_THR 8
+
+static void *spin_lock_thread(void *arg)
+{
+       int i, err, prog_fd = *(int *)arg;
+       LIBBPF_OPTS(bpf_test_run_opts, topts);
+
+       for (i = 0; i < 10000; i++) {
+               err = bpf_prog_test_run_opts(prog_fd, &topts);
+               if (!ASSERT_OK(err, "test_run_opts err") ||
+                   !ASSERT_OK(topts.retval, "test_run_opts retval"))
+                       break;
+       }
+
+       pthread_exit(arg);
+}
+
 static int timer(struct timer *timer_skel)
 {
-       int err, prog_fd;
+       int i, err, prog_fd;
        LIBBPF_OPTS(bpf_test_run_opts, topts);
+       pthread_t thread_id[NUM_THR];
+       void *ret;
 
        err = timer__attach(timer_skel);
        if (!ASSERT_OK(err, "timer_attach"))
@@ -43,6 +62,20 @@ static int timer(struct timer *timer_skel)
        /* check that code paths completed */
        ASSERT_EQ(timer_skel->bss->ok, 1 | 2 | 4, "ok");
 
+       prog_fd = bpf_program__fd(timer_skel->progs.race);
+       for (i = 0; i < NUM_THR; i++) {
+               err = pthread_create(&thread_id[i], NULL,
+                                    &spin_lock_thread, &prog_fd);
+               if (!ASSERT_OK(err, "pthread_create"))
+                       break;
+       }
+
+       while (i) {
+               err = pthread_join(thread_id[--i], &ret);
+               if (ASSERT_OK(err, "pthread_join"))
+                       ASSERT_EQ(ret, (void *)&prog_fd, "pthread_join");
+       }
+
        return 0;
 }
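The join loop just added only reaps threads that were actually created: i stops advancing at the first pthread_create() failure, and while (i) walks back down from there. The same idiom as a standalone sketch (names are illustrative; error handling condensed):

#include <pthread.h>

static int run_workers(pthread_t *tid, int n, void *(*fn)(void *), void *arg)
{
	int i, err = 0;

	for (i = 0; i < n; i++) {
		if (pthread_create(&tid[i], NULL, fn, arg)) {
			err = -1;	/* stop at the first failure */
			break;
		}
	}

	/* join only the i threads that were created, newest first */
	while (i)
		pthread_join(tid[--i], NULL);

	return err;
}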
 
index c3b45745cbccd71b089d23a4ee3f4d1a296c2de6..6d8b54124cb359697bcb85b369c9bcd7457e71b4 100644 (file)
@@ -511,7 +511,7 @@ static void test_xdp_bonding_features(struct skeletons *skeletons)
        if (!ASSERT_OK(err, "bond bpf_xdp_query"))
                goto out;
 
-       if (!ASSERT_EQ(query_opts.feature_flags, NETDEV_XDP_ACT_MASK,
+       if (!ASSERT_EQ(query_opts.feature_flags, 0,
                       "bond query_opts.feature_flags"))
                goto out;
 
@@ -601,7 +601,7 @@ static void test_xdp_bonding_features(struct skeletons *skeletons)
        if (!ASSERT_OK(err, "bond bpf_xdp_query"))
                goto out;
 
-       ASSERT_EQ(query_opts.feature_flags, NETDEV_XDP_ACT_MASK,
+       ASSERT_EQ(query_opts.feature_flags, 0,
                  "bond query_opts.feature_flags");
 out:
        bpf_link__destroy(link);
index c9b4055cd410ae6378066e28e2f41c9b23d47ab1..e4d53e40ff2086112dff757581ef37f8fdcbe272 100644 (file)
@@ -10,7 +10,7 @@
 char _license[] SEC("license") = "GPL";
 
 pid_t target_pid;
-int procs_cnt, threads_cnt, proc_threads_cnt;
+int procs_cnt, threads_cnt, proc_threads_cnt, invalid_cnt;
 
 void bpf_rcu_read_lock(void) __ksym;
 void bpf_rcu_read_unlock(void) __ksym;
@@ -26,6 +26,16 @@ int iter_task_for_each_sleep(void *ctx)
        procs_cnt = threads_cnt = proc_threads_cnt = 0;
 
        bpf_rcu_read_lock();
+       bpf_for_each(task, pos, NULL, ~0U) {
+               /* Below instructions shouldn't be executed for invalid flags */
+               invalid_cnt++;
+       }
+
+       bpf_for_each(task, pos, NULL, BPF_TASK_ITER_PROC_THREADS) {
+               /* Below instructions shouldn't be executed: a NULL
+                * task__nullable is invalid with BPF_TASK_ITER_PROC_THREADS
+                */
+               invalid_cnt++;
+       }
+
        bpf_for_each(task, pos, NULL, BPF_TASK_ITER_ALL_PROCS)
                if (pos->pid == target_pid)
                        procs_cnt++;
diff --git a/tools/testing/selftests/bpf/progs/read_vsyscall.c b/tools/testing/selftests/bpf/progs/read_vsyscall.c
new file mode 100644 (file)
index 0000000..986f966
--- /dev/null
@@ -0,0 +1,45 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2024. Huawei Technologies Co., Ltd */
+#include <linux/types.h>
+#include <bpf/bpf_helpers.h>
+
+#include "bpf_misc.h"
+
+int target_pid = 0;
+void *user_ptr = 0;
+int read_ret[8];
+
+char _license[] SEC("license") = "GPL";
+
+SEC("fentry/" SYS_PREFIX "sys_nanosleep")
+int do_probe_read(void *ctx)
+{
+       char buf[8];
+
+       if ((bpf_get_current_pid_tgid() >> 32) != target_pid)
+               return 0;
+
+       read_ret[0] = bpf_probe_read_kernel(buf, sizeof(buf), user_ptr);
+       read_ret[1] = bpf_probe_read_kernel_str(buf, sizeof(buf), user_ptr);
+       read_ret[2] = bpf_probe_read(buf, sizeof(buf), user_ptr);
+       read_ret[3] = bpf_probe_read_str(buf, sizeof(buf), user_ptr);
+       read_ret[4] = bpf_probe_read_user(buf, sizeof(buf), user_ptr);
+       read_ret[5] = bpf_probe_read_user_str(buf, sizeof(buf), user_ptr);
+
+       return 0;
+}
+
+SEC("fentry.s/" SYS_PREFIX "sys_nanosleep")
+int do_copy_from_user(void *ctx)
+{
+       char buf[8];
+
+       if ((bpf_get_current_pid_tgid() >> 32) != target_pid)
+               return 0;
+
+       read_ret[6] = bpf_copy_from_user(buf, sizeof(buf), user_ptr);
+       read_ret[7] = bpf_copy_from_user_task(buf, sizeof(buf), user_ptr,
+                                             bpf_get_current_task_btf(), 0);
+
+       return 0;
+}
index 8b946c8188c65d10ed86de886ea9aead585de86d..f615da97df26382f4dd758015dd0aaf2cd359614 100644 (file)
@@ -51,7 +51,8 @@ struct {
        __uint(max_entries, 1);
        __type(key, int);
        __type(value, struct elem);
-} abs_timer SEC(".maps"), soft_timer_pinned SEC(".maps"), abs_timer_pinned SEC(".maps");
+} abs_timer SEC(".maps"), soft_timer_pinned SEC(".maps"), abs_timer_pinned SEC(".maps"),
+       race_array SEC(".maps");
 
 __u64 bss_data;
 __u64 abs_data;
@@ -390,3 +391,34 @@ int BPF_PROG2(test5, int, a)
 
        return 0;
 }
+
+static int race_timer_callback(void *race_array, int *race_key, struct bpf_timer *timer)
+{
+       bpf_timer_start(timer, 1000000, 0);
+       return 0;
+}
+
+SEC("syscall")
+int race(void *ctx)
+{
+       struct bpf_timer *timer;
+       int err, race_key = 0;
+       struct elem init;
+
+       __builtin_memset(&init, 0, sizeof(struct elem));
+       bpf_map_update_elem(&race_array, &race_key, &init, BPF_ANY);
+
+       timer = bpf_map_lookup_elem(&race_array, &race_key);
+       if (!timer)
+               return 1;
+
+       err = bpf_timer_init(timer, &race_array, CLOCK_MONOTONIC);
+       if (err && err != -EBUSY)
+               return 1;
+
+       bpf_timer_set_callback(timer, race_timer_callback);
+       bpf_timer_start(timer, 0, 0);
+       bpf_timer_cancel(timer);
+
+       return 0;
+}
index 5905e036e0eaca6aa7c38e06412aabf0e2b1b4b9..a955a6358206eac8a4b5065b531e0171d4cc2a62 100644 (file)
@@ -239,4 +239,74 @@ int bpf_loop_iter_limit_nested(void *unused)
        return 1000 * a + b + c;
 }
 
+struct iter_limit_bug_ctx {
+       __u64 a;
+       __u64 b;
+       __u64 c;
+};
+
+static __naked void iter_limit_bug_cb(void)
+{
+       /* This is the same as C code below, but written
+        * in assembly to control which branches are fall-through.
+        *
+        *   switch (bpf_get_prandom_u32()) {
+        *   case 1:  ctx->a = 42; break;
+        *   case 2:  ctx->b = 42; break;
+        *   default: ctx->c = 42; break;
+        *   }
+        */
+       asm volatile (
+       "r9 = r2;"
+       "call %[bpf_get_prandom_u32];"
+       "r1 = r0;"
+       "r2 = 42;"
+       "r0 = 0;"
+       "if r1 == 0x1 goto 1f;"
+       "if r1 == 0x2 goto 2f;"
+       "*(u64 *)(r9 + 16) = r2;"
+       "exit;"
+       "1: *(u64 *)(r9 + 0) = r2;"
+       "exit;"
+       "2: *(u64 *)(r9 + 8) = r2;"
+       "exit;"
+       :
+       : __imm(bpf_get_prandom_u32)
+       : __clobber_all
+       );
+}
+
+SEC("tc")
+__failure
+__flag(BPF_F_TEST_STATE_FREQ)
+int iter_limit_bug(struct __sk_buff *skb)
+{
+       struct iter_limit_bug_ctx ctx = { 7, 7, 7 };
+
+       bpf_loop(2, iter_limit_bug_cb, &ctx, 0);
+
+       /* This is the same as C code below,
+        * written in assembly to guarantee the order of the checks.
+        *
+        *   if (ctx.a == 42 && ctx.b == 42 && ctx.c == 7)
+        *     asm volatile("r1 /= 0;":::"r1");
+        */
+       asm volatile (
+       "r1 = *(u64 *)%[ctx_a];"
+       "if r1 != 42 goto 1f;"
+       "r1 = *(u64 *)%[ctx_b];"
+       "if r1 != 42 goto 1f;"
+       "r1 = *(u64 *)%[ctx_c];"
+       "if r1 != 7 goto 1f;"
+       "r1 /= 0;"
+       "1:"
+       :
+       : [ctx_a]"m"(ctx.a),
+         [ctx_b]"m"(ctx.b),
+         [ctx_c]"m"(ctx.c)
+       : "r1"
+       );
+       return 0;
+}
+
 char _license[] SEC("license") = "GPL";
index 534576f06df1cc78f63619d873f77ad0390f45e5..c59e4adb905df61494db41d99355fbac5d742bab 100644 (file)
@@ -12,6 +12,7 @@
 #include <syscall.h>
 #include <unistd.h>
 #include <sys/resource.h>
+#include <linux/close_range.h>
 
 #include "../kselftest_harness.h"
 #include "../clone3/clone3_selftests.h"
index c54d1697f439a47908f360e75b2760861a9d7939..9a3d3c389dadda07d1e8d499fea65e307c656056 100755 (executable)
@@ -62,6 +62,8 @@ prio_test()
 
        # create bond
        bond_reset "${param}"
+       # set active_slave to primary eth1 specifically
+       ip -n ${s_ns} link set bond0 type bond active_slave eth1
 
        # check bonding member prio value
        ip -n ${s_ns} link set eth0 type bond_slave prio 0
@@ -162,7 +164,7 @@ prio_arp()
        local mode=$1
 
        for primary_reselect in 0 1 2; do
-               prio_test "mode active-backup arp_interval 100 arp_ip_target ${g_ip4} primary eth1 primary_reselect $primary_reselect"
+               prio_test "mode $mode arp_interval 100 arp_ip_target ${g_ip4} primary eth1 primary_reselect $primary_reselect"
                log_test "prio" "$mode arp_ip_target primary_reselect $primary_reselect"
        done
 }
@@ -178,7 +180,7 @@ prio_ns()
        fi
 
        for primary_reselect in 0 1 2; do
-               prio_test "mode active-backup arp_interval 100 ns_ip6_target ${g_ip6} primary eth1 primary_reselect $primary_reselect"
+               prio_test "mode $mode arp_interval 100 ns_ip6_target ${g_ip6} primary eth1 primary_reselect $primary_reselect"
                log_test "prio" "$mode ns_ip6_target primary_reselect $primary_reselect"
        done
 }
@@ -194,9 +196,9 @@ prio()
 
        for mode in $modes; do
                prio_miimon $mode
-               prio_arp $mode
-               prio_ns $mode
        done
+       prio_arp "active-backup"
+       prio_ns "active-backup"
 }
 
 arp_validate_test()
index 2a268b17b61f515b5c50a1fdbe3d7ae21af00578..dbdd736a41d394c9a6e2897d971eb31e728eae34 100644 (file)
@@ -48,6 +48,17 @@ test_LAG_cleanup()
        ip link add mv0 link "$name" up address "$ucaddr" type macvlan
        # Used to test dev->mc handling
        ip address add "$addr6" dev "$name"
+
+       # Check that addresses were added as expected
+       (grep_bridge_fdb "$ucaddr" bridge fdb show dev dummy1 ||
+               grep_bridge_fdb "$ucaddr" bridge fdb show dev dummy2) >/dev/null
+       check_err $? "macvlan unicast address not found on a slave"
+
+       # mcaddr is added asynchronously by addrconf_dad_work(), use busywait
+       (busywait 10000 grep_bridge_fdb "$mcaddr" bridge fdb show dev dummy1 ||
+               grep_bridge_fdb "$mcaddr" bridge fdb show dev dummy2) >/dev/null
+       check_err $? "IPv6 solicited-node multicast mac address not found on a slave"
+
        ip link set dev "$name" down
        ip link del "$name"
 
index 6091b45d226baf192c2d380ba893be15592f323d..79b65bdf05db6586726cc76d3313f12368d21dc5 100644 (file)
@@ -1 +1 @@
-timeout=120
+timeout=1200
index 4855ef597a152135979694fb3e9145f1db4e8bcf..f98435c502f61aa665cefc39bea873e66a273ade 100755 (executable)
@@ -270,6 +270,7 @@ for port in 0 1; do
        echo 1 > $NSIM_DEV_SYS/new_port
     fi
     NSIM_NETDEV=`get_netdev_name old_netdevs`
+    ifconfig $NSIM_NETDEV up
 
     msg="new NIC device created"
     exp0=( 0 0 0 0 )
@@ -431,6 +432,7 @@ for port in 0 1; do
     fi
 
     echo $port > $NSIM_DEV_SYS/new_port
+    NSIM_NETDEV=`get_netdev_name old_netdevs`
     ifconfig $NSIM_NETDEV up
 
     overflow_table0 "overflow NIC table"
@@ -488,6 +490,7 @@ for port in 0 1; do
     fi
 
     echo $port > $NSIM_DEV_SYS/new_port
+    NSIM_NETDEV=`get_netdev_name old_netdevs`
     ifconfig $NSIM_NETDEV up
 
     overflow_table0 "overflow NIC table"
@@ -544,6 +547,7 @@ for port in 0 1; do
     fi
 
     echo $port > $NSIM_DEV_SYS/new_port
+    NSIM_NETDEV=`get_netdev_name old_netdevs`
     ifconfig $NSIM_NETDEV up
 
     overflow_table0 "destroy NIC"
@@ -573,6 +577,7 @@ for port in 0 1; do
     fi
 
     echo $port > $NSIM_DEV_SYS/new_port
+    NSIM_NETDEV=`get_netdev_name old_netdevs`
     ifconfig $NSIM_NETDEV up
 
     msg="create VxLANs v6"
@@ -633,6 +638,7 @@ for port in 0 1; do
     fi
 
     echo $port > $NSIM_DEV_SYS/new_port
+    NSIM_NETDEV=`get_netdev_name old_netdevs`
     ifconfig $NSIM_NETDEV up
 
     echo 110 > $NSIM_DEV_DFS/ports/$port/udp_ports_inject_error
@@ -688,6 +694,7 @@ for port in 0 1; do
     fi
 
     echo $port > $NSIM_DEV_SYS/new_port
+    NSIM_NETDEV=`get_netdev_name old_netdevs`
     ifconfig $NSIM_NETDEV up
 
     msg="create VxLANs v6"
@@ -747,6 +754,7 @@ for port in 0 1; do
     fi
 
     echo $port > $NSIM_DEV_SYS/new_port
+    NSIM_NETDEV=`get_netdev_name old_netdevs`
     ifconfig $NSIM_NETDEV up
 
     msg="create VxLANs v6"
@@ -877,6 +885,7 @@ msg="re-add a port"
 
 echo 2 > $NSIM_DEV_SYS/del_port
 echo 2 > $NSIM_DEV_SYS/new_port
+NSIM_NETDEV=`get_netdev_name old_netdevs`
 check_tables
 
 msg="replace VxLAN in overflow table"
index 265b6882cc21ed0c285ae9f37f9282bfb2e440d1..b5e3a3aad4bfbb5f1d77b4fd1bd4ae566f6394a1 100644 (file)
@@ -1,3 +1,5 @@
+CONFIG_DUMMY=y
+CONFIG_IPV6=y
+CONFIG_MACVLAN=y
 CONFIG_NET_TEAM=y
 CONFIG_NET_TEAM_MODE_LOADBALANCE=y
-CONFIG_MACVLAN=y
index 62dc00ee4978a46b1001bad62597d7589d261a75..2d33ee9e9b71ac2e0436abf8e35fe29af92a3cca 100644 (file)
@@ -4,7 +4,7 @@ ifneq ($(PY3),)
 
 TEST_PROGS := test_unprobed_devices.sh
 TEST_GEN_FILES := compatible_list
-TEST_FILES := compatible_ignore_list ktap_helpers.sh
+TEST_FILES := compatible_ignore_list
 
 include ../lib.mk
 
index b07af2a4c4de0b680f37d510337d38b23691a478..2d7e70c5ad2d36d0849a4c923fff64156e3841ca 100755 (executable)
 
 DIR="$(dirname $(readlink -f "$0"))"
 
-source "${DIR}"/ktap_helpers.sh
+source "${DIR}"/../kselftest/ktap_helpers.sh
 
 PDT=/proc/device-tree/
 COMPAT_LIST="${DIR}"/compatible_list
 IGNORE_LIST="${DIR}"/compatible_ignore_list
 
-KSFT_PASS=0
-KSFT_FAIL=1
-KSFT_SKIP=4
-
 ktap_print_header
 
 if [[ ! -d "${PDT}" ]]; then
@@ -33,8 +29,8 @@ if [[ ! -d "${PDT}" ]]; then
 fi
 
 nodes_compatible=$(
-       for node_compat in $(find ${PDT} -name compatible); do
-               node=$(dirname "${node_compat}")
+       for node in $(find ${PDT} -type d); do
+               [ ! -f "${node}"/compatible ] && continue
                # Check if node is available
                if [[ -e "${node}"/status ]]; then
                        status=$(tr -d '\000' < "${node}"/status)
@@ -46,10 +42,11 @@ nodes_compatible=$(
 
 nodes_dev_bound=$(
        IFS=$'\n'
-       for uevent in $(find /sys/devices -name uevent); do
-               if [[ -d "$(dirname "${uevent}")"/driver ]]; then
-                       grep '^OF_FULLNAME=' "${uevent}" | sed -e 's|OF_FULLNAME=||'
-               fi
+       for dev_dir in $(find /sys/devices -type d); do
+               [ ! -f "${dev_dir}"/uevent ] && continue
+               [ ! -d "${dev_dir}"/driver ] && continue
+
+               grep '^OF_FULLNAME=' "${dev_dir}"/uevent | sed -e 's|OF_FULLNAME=||'
        done
        )
 
index e19ab0e857091381948f1bdeb75f08c0f829b38a..759f86e7d263e43bcd7438b5979cc9815c2d4c2f 100644 (file)
@@ -10,7 +10,6 @@
 #include <linux/mount.h>
 #include <sys/syscall.h>
 #include <sys/stat.h>
-#include <sys/mount.h>
 #include <sys/mman.h>
 #include <sched.h>
 #include <fcntl.h>
@@ -32,7 +31,11 @@ static int sys_fsmount(int fd, unsigned int flags, unsigned int attr_flags)
 {
        return syscall(__NR_fsmount, fd, flags, attr_flags);
 }
-
+static int sys_mount(const char *src, const char *tgt, const char *fst,
+               unsigned long flags, const void *data)
+{
+       return syscall(__NR_mount, src, tgt, fst, flags, data);
+}
 static int sys_move_mount(int from_dfd, const char *from_pathname,
                          int to_dfd, const char *to_pathname,
                          unsigned int flags)
@@ -166,8 +169,7 @@ int main(int argc, char **argv)
                ksft_test_result_skip("unable to create a new mount namespace\n");
                return 1;
        }
-
-       if (mount(NULL, "/", NULL, MS_SLAVE | MS_REC, NULL) == -1) {
+       if (sys_mount(NULL, "/", NULL, MS_SLAVE | MS_REC, NULL) == -1) {
                pr_perror("mount");
                return 1;
        }
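
The hunk above swaps the libc mount() call for a raw syscall wrapper, matching the removal of <sys/mount.h> further up; mixing that header with <linux/mount.h> is a known source of redefinition conflicts. A minimal standalone sketch of the same pattern (the MS_* values are spelled out here only so no mount header is needed; they match the uapi definitions):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    #define MS_REC   16384        /* matches <linux/mount.h> */
    #define MS_SLAVE (1 << 19)

    static int sys_mount(const char *src, const char *tgt, const char *fst,
                         unsigned long flags, const void *data)
    {
            return syscall(__NR_mount, src, tgt, fst, flags, data);
    }

    int main(void)
    {
            /* Needs CAP_SYS_ADMIN; mirrors the test's recursive-slave remount. */
            if (sys_mount(NULL, "/", NULL, MS_SLAVE | MS_REC, NULL) == -1) {
                    perror("mount");
                    return 1;
            }
            return 0;
    }
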
index c778d4dcc17e58523193ffad46a79212b3e721cd..25d4e0fca385ca3733d17cadb63c235403a0161f 100755 (executable)
@@ -504,7 +504,7 @@ prlog "# of undefined(test bug): " `echo $UNDEFINED_CASES | wc -w`
 if [ "$KTAP" = "1" ]; then
   echo -n "# Totals:"
   echo -n " pass:"`echo $PASSED_CASES | wc -w`
-  echo -n " faii:"`echo $FAILED_CASES | wc -w`
+  echo -n " fail:"`echo $FAILED_CASES | wc -w`
   echo -n " xfail:"`echo $XFAILED_CASES | wc -w`
   echo -n " xpass:0"
   echo -n " skip:"`echo $UNTESTED_CASES $UNSUPPORTED_CASES | wc -w`
index add7d5bf585de51a86450997debc174ef9cbf85a..c45094d1e1d2db2e812f7a7480256f581b63b3b8 100644 (file)
@@ -1,6 +1,6 @@
 #!/bin/sh
 # SPDX-License-Identifier: GPL-2.0
-# description: Test file and directory owership changes for eventfs
+# description: Test file and directory ownership changes for eventfs
 
 original_group=`stat -c "%g" .`
 original_owner=`stat -c "%u" .`
diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/func_hotplug.tc b/tools/testing/selftests/ftrace/test.d/ftrace/func_hotplug.tc
new file mode 100644 (file)
index 0000000..ccfbfde
--- /dev/null
@@ -0,0 +1,42 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0-or-later
+# description: ftrace - function trace across cpu hotplug
+# requires: function:tracer
+
+if ! which nproc ; then
+  nproc() {
+    ls -d /sys/devices/system/cpu/cpu[0-9]* | wc -l
+  }
+fi
+
+NP=`nproc`
+
+if [ $NP -eq 1 ] ;then
+  echo "We cannot test cpu hotplug in UP environment"
+  exit_unresolved
+fi
+
+# Find online cpu
+for i in /sys/devices/system/cpu/cpu[1-9]*; do
+       if [ -f $i/online ] && [ "$(cat $i/online)" = "1" ]; then
+               cpu=$i
+               break
+       fi
+done
+
+if [ -z "$cpu" ]; then
+       echo "We cannot test cpu hotplug with a single cpu online"
+       exit_unresolved
+fi
+
+echo 0 > tracing_on
+echo > trace
+
+: "Set $(basename $cpu) offline/online with function tracer enabled"
+echo function > current_tracer
+echo 1 > tracing_on
+(echo 0 > $cpu/online)
+(echo "forked"; sleep 1)
+(echo 1 > $cpu/online)
+echo 0 > tracing_on
+echo nop > current_tracer
index 4562e13cb26bcde4b643084375c22c1a7c3edfea..717898894ef76f81e1a7304826698e48e68d5d7a 100644 (file)
@@ -40,7 +40,7 @@ grep "id: \(unknown_\|sys_\)" events/raw_syscalls/sys_exit/hist > /dev/null || \
 
 reset_trigger
 
-echo "Test histgram with log2 modifier"
+echo "Test histogram with log2 modifier"
 
 echo 'hist:keys=bytes_req.log2' > events/kmem/kmalloc/trigger
 for i in `seq 1 10` ; do ( echo "forked" > /dev/null); done
index 1ee5518ee6b7f9ffae5ae092c895d9decedc3c92..7f3ca5c78df12968f0aa14168f7dc78001b8fff8 100644 (file)
@@ -17,6 +17,8 @@
  *
  *****************************************************************************/
 
+#define _GNU_SOURCE
+
 #include <errno.h>
 #include <limits.h>
 #include <pthread.h>
@@ -358,6 +360,7 @@ out:
 
 int main(int argc, char *argv[])
 {
+       const char *test_name;
        int c, ret;
 
        while ((c = getopt(argc, argv, "bchlot:v:")) != -1) {
@@ -397,6 +400,14 @@ int main(int argc, char *argv[])
                "\tArguments: broadcast=%d locked=%d owner=%d timeout=%ldns\n",
                broadcast, locked, owner, timeout_ns);
 
+       ret = asprintf(&test_name,
+                      "%s broadcast=%d locked=%d owner=%d timeout=%ldns",
+                      TEST_NAME, broadcast, locked, owner, timeout_ns);
+       if (ret < 0) {
+               ksft_print_msg("Failed to generate test name\n");
+               test_name = TEST_NAME;
+       }
+
        /*
         * FIXME: unit_test is obsolete now that we parse options and the
         * various style of runs are done by run.sh - simplify the code and move
@@ -404,6 +415,6 @@ int main(int argc, char *argv[])
         */
        ret = unit_test(broadcast, locked, owner, timeout_ns);
 
-       print_result(TEST_NAME, ret);
+       print_result(test_name, ret);
        return ret;
 }
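
asprintf(3) returns -1 and leaves its output pointer unspecified on allocation failure, which is why the hunk above falls back to the static TEST_NAME instead of trusting the pointer. A standalone sketch of that fallback pattern (the label text is illustrative):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
            const char *fallback = "futex-wait-timeout";  /* stand-in for TEST_NAME */
            const char *name;
            char *dynamic;

            /* asprintf() returns -1 on allocation failure and leaves the
             * output pointer unspecified, so only use it on success. */
            if (asprintf(&dynamic, "%s broadcast=%d timeout=%ldns",
                         fallback, 1, 5000L) < 0) {
                    dynamic = NULL;
                    name = fallback;
            } else {
                    name = dynamic;
            }

            puts(name);
            free(dynamic);        /* free(NULL) is a no-op on the failure path */
            return 0;
    }
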
index 352fc39f3c6c160bfdcd2b3c655bfe319f00892c..b62c7dba6777f975dd9158f6788a6177307bc9e4 100644 (file)
@@ -880,8 +880,8 @@ class TestDTH2452Tablet(test_multitouch.BaseTest.TestMultitouch, TouchTabletTest
         does not overlap with other contacts. The value of `t` may be
         incremented over time to move the point along a linear path.
         """
-        x = 50 + 10 * contact_id + t
-        y = 100 + 100 * contact_id + t
+        x = 50 + 10 * contact_id + t * 11
+        y = 100 + 100 * contact_id + t * 11
         return test_multitouch.Touch(contact_id, x, y)
 
     def make_contacts(self, n, t=0):
@@ -902,8 +902,8 @@ class TestDTH2452Tablet(test_multitouch.BaseTest.TestMultitouch, TouchTabletTest
         tracking_id = contact_ids.tracking_id
         slot_num = contact_ids.slot_num
 
-        x = 50 + 10 * contact_id + t
-        y = 100 + 100 * contact_id + t
+        x = 50 + 10 * contact_id + t * 11
+        y = 100 + 100 * contact_id + t * 11
 
         # If the data isn't supposed to be stored in any slots, there is
         # nothing we can check for in the evdev stream.
index 6c4f901d6fed3c200bbcb40a6ba7dd22c0b2e2bd..110d73917615d177d5d7a891f08d523619c404f3 100644 (file)
@@ -1,2 +1,3 @@
-CONFIG_IOMMUFD
-CONFIG_IOMMUFD_TEST
+CONFIG_IOMMUFD=y
+CONFIG_FAULT_INJECTION=y
+CONFIG_IOMMUFD_TEST=y
index 1a881e7a21d1b26ce7ad19de1cc5ea07d3773ff9..edf1c99c9936c8549e8a2938a2ff11875197b3d4 100644 (file)
@@ -12,6 +12,7 @@
 static unsigned long HUGEPAGE_SIZE;
 
 #define MOCK_PAGE_SIZE (PAGE_SIZE / 2)
+#define MOCK_HUGE_PAGE_SIZE (512 * MOCK_PAGE_SIZE)
 
 static unsigned long get_huge_page_size(void)
 {
@@ -1716,10 +1717,12 @@ FIXTURE(iommufd_dirty_tracking)
 FIXTURE_VARIANT(iommufd_dirty_tracking)
 {
        unsigned long buffer_size;
+       bool hugepages;
 };
 
 FIXTURE_SETUP(iommufd_dirty_tracking)
 {
+       int mmap_flags;
        void *vrc;
        int rc;
 
@@ -1732,25 +1735,41 @@ FIXTURE_SETUP(iommufd_dirty_tracking)
                           variant->buffer_size, rc);
        }
 
+       mmap_flags = MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED;
+       if (variant->hugepages) {
+               /*
+                * MAP_POPULATE will cause the kernel to fail mmap if THPs are
+                * not available.
+                */
+               mmap_flags |= MAP_HUGETLB | MAP_POPULATE;
+       }
        assert((uintptr_t)self->buffer % HUGEPAGE_SIZE == 0);
        vrc = mmap(self->buffer, variant->buffer_size, PROT_READ | PROT_WRITE,
-                  MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+                  mmap_flags, -1, 0);
        assert(vrc == self->buffer);
 
        self->page_size = MOCK_PAGE_SIZE;
        self->bitmap_size =
                variant->buffer_size / self->page_size / BITS_PER_BYTE;
 
-       /* Provision with an extra (MOCK_PAGE_SIZE) for the unaligned case */
+       /* Provision with an extra (PAGE_SIZE) for the unaligned case */
        rc = posix_memalign(&self->bitmap, PAGE_SIZE,
-                           self->bitmap_size + MOCK_PAGE_SIZE);
+                           self->bitmap_size + PAGE_SIZE);
        assert(!rc);
        assert(self->bitmap);
        assert((uintptr_t)self->bitmap % PAGE_SIZE == 0);
 
        test_ioctl_ioas_alloc(&self->ioas_id);
-       test_cmd_mock_domain(self->ioas_id, &self->stdev_id, &self->hwpt_id,
-                            &self->idev_id);
+       /* Enable 1M mock IOMMU hugepages */
+       if (variant->hugepages) {
+               test_cmd_mock_domain_flags(self->ioas_id,
+                                          MOCK_FLAGS_DEVICE_HUGE_IOVA,
+                                          &self->stdev_id, &self->hwpt_id,
+                                          &self->idev_id);
+       } else {
+               test_cmd_mock_domain(self->ioas_id, &self->stdev_id,
+                                    &self->hwpt_id, &self->idev_id);
+       }
 }
 
 FIXTURE_TEARDOWN(iommufd_dirty_tracking)
@@ -1784,12 +1803,26 @@ FIXTURE_VARIANT_ADD(iommufd_dirty_tracking, domain_dirty128M)
        .buffer_size = 128UL * 1024UL * 1024UL,
 };
 
+FIXTURE_VARIANT_ADD(iommufd_dirty_tracking, domain_dirty128M_huge)
+{
+       /* 4K bitmap (128M IOVA range) */
+       .buffer_size = 128UL * 1024UL * 1024UL,
+       .hugepages = true,
+};
+
 FIXTURE_VARIANT_ADD(iommufd_dirty_tracking, domain_dirty256M)
 {
        /* 8K bitmap (256M IOVA range) */
        .buffer_size = 256UL * 1024UL * 1024UL,
 };
 
+FIXTURE_VARIANT_ADD(iommufd_dirty_tracking, domain_dirty256M_huge)
+{
+       /* 8K bitmap (256M IOVA range) */
+       .buffer_size = 256UL * 1024UL * 1024UL,
+       .hugepages = true,
+};
+
 TEST_F(iommufd_dirty_tracking, enforce_dirty)
 {
        uint32_t ioas_id, stddev_id, idev_id;
@@ -1849,65 +1882,80 @@ TEST_F(iommufd_dirty_tracking, device_dirty_capability)
 
 TEST_F(iommufd_dirty_tracking, get_dirty_bitmap)
 {
-       uint32_t stddev_id;
+       uint32_t page_size = MOCK_PAGE_SIZE;
        uint32_t hwpt_id;
        uint32_t ioas_id;
 
+       if (variant->hugepages)
+               page_size = MOCK_HUGE_PAGE_SIZE;
+
        test_ioctl_ioas_alloc(&ioas_id);
        test_ioctl_ioas_map_fixed_id(ioas_id, self->buffer,
                                     variant->buffer_size, MOCK_APERTURE_START);
 
        test_cmd_hwpt_alloc(self->idev_id, ioas_id,
                            IOMMU_HWPT_ALLOC_DIRTY_TRACKING, &hwpt_id);
-       test_cmd_mock_domain(hwpt_id, &stddev_id, NULL, NULL);
 
        test_cmd_set_dirty_tracking(hwpt_id, true);
 
        test_mock_dirty_bitmaps(hwpt_id, variant->buffer_size,
-                               MOCK_APERTURE_START, self->page_size,
+                               MOCK_APERTURE_START, self->page_size, page_size,
                                self->bitmap, self->bitmap_size, 0, _metadata);
 
        /* PAGE_SIZE unaligned bitmap */
        test_mock_dirty_bitmaps(hwpt_id, variant->buffer_size,
-                               MOCK_APERTURE_START, self->page_size,
+                               MOCK_APERTURE_START, self->page_size, page_size,
                                self->bitmap + MOCK_PAGE_SIZE,
                                self->bitmap_size, 0, _metadata);
 
-       test_ioctl_destroy(stddev_id);
+       /* u64 unaligned bitmap */
+       test_mock_dirty_bitmaps(hwpt_id, variant->buffer_size,
+                               MOCK_APERTURE_START, self->page_size, page_size,
+                               self->bitmap + 0xff1, self->bitmap_size, 0,
+                               _metadata);
+
        test_ioctl_destroy(hwpt_id);
 }
 
 TEST_F(iommufd_dirty_tracking, get_dirty_bitmap_no_clear)
 {
-       uint32_t stddev_id;
+       uint32_t page_size = MOCK_PAGE_SIZE;
        uint32_t hwpt_id;
        uint32_t ioas_id;
 
+       if (variant->hugepages)
+               page_size = MOCK_HUGE_PAGE_SIZE;
+
        test_ioctl_ioas_alloc(&ioas_id);
        test_ioctl_ioas_map_fixed_id(ioas_id, self->buffer,
                                     variant->buffer_size, MOCK_APERTURE_START);
 
        test_cmd_hwpt_alloc(self->idev_id, ioas_id,
                            IOMMU_HWPT_ALLOC_DIRTY_TRACKING, &hwpt_id);
-       test_cmd_mock_domain(hwpt_id, &stddev_id, NULL, NULL);
 
        test_cmd_set_dirty_tracking(hwpt_id, true);
 
        test_mock_dirty_bitmaps(hwpt_id, variant->buffer_size,
-                               MOCK_APERTURE_START, self->page_size,
+                               MOCK_APERTURE_START, self->page_size, page_size,
                                self->bitmap, self->bitmap_size,
                                IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR,
                                _metadata);
 
        /* Unaligned bitmap */
        test_mock_dirty_bitmaps(hwpt_id, variant->buffer_size,
-                               MOCK_APERTURE_START, self->page_size,
+                               MOCK_APERTURE_START, self->page_size, page_size,
                                self->bitmap + MOCK_PAGE_SIZE,
                                self->bitmap_size,
                                IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR,
                                _metadata);
 
-       test_ioctl_destroy(stddev_id);
+       /* u64 unaligned bitmap */
+       test_mock_dirty_bitmaps(hwpt_id, variant->buffer_size,
+                               MOCK_APERTURE_START, self->page_size, page_size,
+                               self->bitmap + 0xff1, self->bitmap_size,
+                               IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR,
+                               _metadata);
+
        test_ioctl_destroy(hwpt_id);
 }
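
The MAP_POPULATE comment in the fixture setup above is the crux of the new hugepages variants: with population forced at mmap() time, a machine without huge pages available fails fast instead of faulting mid-test. A standalone probe built on the same idea (the 2 MiB length is an assumption for x86-64-style huge pages):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
            size_t len = 2UL * 1024 * 1024;   /* one huge page on x86-64 */
            void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB | MAP_POPULATE,
                           -1, 0);

            if (p == MAP_FAILED) {
                    perror("mmap(MAP_HUGETLB | MAP_POPULATE)");
                    return 1;     /* no huge pages: a fixture would skip here */
            }
            munmap(p, len);
            puts("huge pages available");
            return 0;
    }
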
 
index c646264aa41fdc1871c60bba6dc25841767f399b..8d2b46b2114da814f75740992c0dc4b1be14d33b 100644 (file)
@@ -344,16 +344,19 @@ static int _test_cmd_mock_domain_set_dirty(int fd, __u32 hwpt_id, size_t length,
                                                  page_size, bitmap, nr))
 
 static int _test_mock_dirty_bitmaps(int fd, __u32 hwpt_id, size_t length,
-                                   __u64 iova, size_t page_size, __u64 *bitmap,
+                                   __u64 iova, size_t page_size,
+                                   size_t pte_page_size, __u64 *bitmap,
                                    __u64 bitmap_size, __u32 flags,
                                    struct __test_metadata *_metadata)
 {
-       unsigned long i, nbits = bitmap_size * BITS_PER_BYTE;
-       unsigned long nr = nbits / 2;
+       unsigned long npte = pte_page_size / page_size, pteset = 2 * npte;
+       unsigned long nbits = bitmap_size * BITS_PER_BYTE;
+       unsigned long j, i, nr = nbits / pteset ?: 1;
        __u64 out_dirty = 0;
 
        /* Mark all even bits as dirty in the mock domain */
-       for (i = 0; i < nbits; i += 2)
+       memset(bitmap, 0, bitmap_size);
+       for (i = 0; i < nbits; i += pteset)
                set_bit(i, (unsigned long *)bitmap);
 
        test_cmd_mock_domain_set_dirty(fd, hwpt_id, length, iova, page_size,
@@ -365,8 +368,12 @@ static int _test_mock_dirty_bitmaps(int fd, __u32 hwpt_id, size_t length,
        test_cmd_get_dirty_bitmap(fd, hwpt_id, length, iova, page_size, bitmap,
                                  flags);
        /* Beware ASSERT_EQ() is two statements -- braces are not redundant! */
-       for (i = 0; i < nbits; i++) {
-               ASSERT_EQ(!(i % 2), test_bit(i, (unsigned long *)bitmap));
+       for (i = 0; i < nbits; i += pteset) {
+               for (j = 0; j < pteset; j++) {
+                       ASSERT_EQ(j < npte,
+                                 test_bit(i + j, (unsigned long *)bitmap));
+               }
+               ASSERT_EQ(!(i % pteset), test_bit(i, (unsigned long *)bitmap));
        }
 
        memset(bitmap, 0, bitmap_size);
@@ -374,19 +381,23 @@ static int _test_mock_dirty_bitmaps(int fd, __u32 hwpt_id, size_t length,
                                  flags);
 
        /* It was read already -- expect all zeroes */
-       for (i = 0; i < nbits; i++) {
-               ASSERT_EQ(!(i % 2) && (flags &
-                                      IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR),
-                         test_bit(i, (unsigned long *)bitmap));
+       for (i = 0; i < nbits; i += pteset) {
+               for (j = 0; j < pteset; j++) {
+                       ASSERT_EQ(
+                               (j < npte) &&
+                                       (flags &
+                                        IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR),
+                               test_bit(i + j, (unsigned long *)bitmap));
+               }
        }
 
        return 0;
 }
-#define test_mock_dirty_bitmaps(hwpt_id, length, iova, page_size, bitmap,      \
-                               bitmap_size, flags, _metadata)                 \
+#define test_mock_dirty_bitmaps(hwpt_id, length, iova, page_size, pte_size,\
+                               bitmap, bitmap_size, flags, _metadata)     \
        ASSERT_EQ(0, _test_mock_dirty_bitmaps(self->fd, hwpt_id, length, iova, \
-                                             page_size, bitmap, bitmap_size,  \
-                                             flags, _metadata))
+                                             page_size, pte_size, bitmap,     \
+                                             bitmap_size, flags, _metadata))
 
 static int _test_cmd_create_access(int fd, unsigned int ioas_id,
                                   __u32 *access_id, unsigned int flags)
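
The npte/pteset arithmetic above is easiest to see with the mock constants plugged in: on a 4 KiB host, MOCK_PAGE_SIZE is 2 KiB and MOCK_HUGE_PAGE_SIZE is 512 times that, so each huge PTE covers 512 bitmap bits and the helper dirties every other PTE-sized run. A worked sketch of those values:

    #include <stdio.h>

    int main(void)
    {
            unsigned long page_size = 4096 / 2;             /* MOCK_PAGE_SIZE */
            unsigned long pte_page_size = 512 * page_size;  /* MOCK_HUGE_PAGE_SIZE */
            unsigned long npte = pte_page_size / page_size;
            unsigned long pteset = 2 * npte;

            /* 512 consecutive dirty bits, then 512 clean ones, repeating. */
            printf("npte=%lu pteset=%lu\n", npte, pteset);
            return 0;
    }
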
similarity index 66%
rename from tools/testing/selftests/dt/ktap_helpers.sh
rename to tools/testing/selftests/kselftest/ktap_helpers.sh
index 8dfae51bb4e2e034b8d0e55ce120c41171bb981e..f2fbb914e058d2bae0f8e8e455cc20a1626c988e 100644 (file)
@@ -9,14 +9,27 @@ KTAP_CNT_PASS=0
 KTAP_CNT_FAIL=0
 KTAP_CNT_SKIP=0
 
+KSFT_PASS=0
+KSFT_FAIL=1
+KSFT_XFAIL=2
+KSFT_XPASS=3
+KSFT_SKIP=4
+
+KSFT_NUM_TESTS=0
+
 ktap_print_header() {
        echo "TAP version 13"
 }
 
+ktap_print_msg()
+{
+       echo "#" $@
+}
+
 ktap_set_plan() {
-       num_tests="$1"
+       KSFT_NUM_TESTS="$1"
 
-       echo "1..$num_tests"
+       echo "1..$KSFT_NUM_TESTS"
 }
 
 ktap_skip_all() {
@@ -65,6 +78,34 @@ ktap_test_fail() {
        KTAP_CNT_FAIL=$((KTAP_CNT_FAIL+1))
 }
 
+ktap_test_result() {
+       description="$1"
+       shift
+
+       if $@; then
+               ktap_test_pass "$description"
+       else
+               ktap_test_fail "$description"
+       fi
+}
+
+ktap_exit_fail_msg() {
+       echo "Bail out! " $@
+       ktap_print_totals
+
+       exit "$KSFT_FAIL"
+}
+
+ktap_finished() {
+       ktap_print_totals
+
+       if [ $(("$KTAP_CNT_PASS" + "$KTAP_CNT_SKIP")) -eq "$KSFT_NUM_TESTS" ]; then
+               exit "$KSFT_PASS"
+       else
+               exit "$KSFT_FAIL"
+       fi
+}
+
 ktap_print_totals() {
        echo "# Totals: pass:$KTAP_CNT_PASS fail:$KTAP_CNT_FAIL xfail:0 xpass:0 skip:$KTAP_CNT_SKIP error:0"
 }
index 274b8465b42a5aa1d210ced625db7ce42799e693..2cb8dd1f8275fb0a83e1a4cb605e7b63c46ef221 100644 (file)
@@ -248,7 +248,7 @@ static void *test_vcpu_run(void *arg)
                REPORT_GUEST_ASSERT(uc);
                break;
        default:
-               TEST_FAIL("Unexpected guest exit\n");
+               TEST_FAIL("Unexpected guest exit");
        }
 
        return NULL;
@@ -287,7 +287,7 @@ static int test_migrate_vcpu(unsigned int vcpu_idx)
 
        /* Allow the error where the vCPU thread is already finished */
        TEST_ASSERT(ret == 0 || ret == ESRCH,
-                   "Failed to migrate the vCPU:%u to pCPU: %u; ret: %d\n",
+                   "Failed to migrate the vCPU:%u to pCPU: %u; ret: %d",
                    vcpu_idx, new_pcpu, ret);
 
        return ret;
@@ -326,12 +326,12 @@ static void test_run(struct kvm_vm *vm)
 
        pthread_mutex_init(&vcpu_done_map_lock, NULL);
        vcpu_done_map = bitmap_zalloc(test_args.nr_vcpus);
-       TEST_ASSERT(vcpu_done_map, "Failed to allocate vcpu done bitmap\n");
+       TEST_ASSERT(vcpu_done_map, "Failed to allocate vcpu done bitmap");
 
        for (i = 0; i < (unsigned long)test_args.nr_vcpus; i++) {
                ret = pthread_create(&pt_vcpu_run[i], NULL, test_vcpu_run,
                                     (void *)(unsigned long)i);
-               TEST_ASSERT(!ret, "Failed to create vCPU-%d pthread\n", i);
+               TEST_ASSERT(!ret, "Failed to create vCPU-%d pthread", i);
        }
 
        /* Spawn a thread to control the vCPU migrations */
@@ -340,7 +340,7 @@ static void test_run(struct kvm_vm *vm)
 
                ret = pthread_create(&pt_vcpu_migration, NULL,
                                        test_vcpu_migration, NULL);
-               TEST_ASSERT(!ret, "Failed to create the migration pthread\n");
+               TEST_ASSERT(!ret, "Failed to create the migration pthread");
        }
 
 
@@ -384,7 +384,7 @@ static struct kvm_vm *test_vm_create(void)
                if (kvm_has_cap(KVM_CAP_COUNTER_OFFSET))
                        vm_ioctl(vm, KVM_ARM_SET_COUNTER_OFFSET, &test_args.offset);
                else
-                       TEST_FAIL("no support for global offset\n");
+                       TEST_FAIL("no support for global offset");
        }
 
        for (i = 0; i < nr_vcpus; i++)
index 31f66ba97228babe46bf2c46640077065ded7241..27c10e7a7e0124f01e0e636dfb25944c0d7abc95 100644 (file)
@@ -175,18 +175,18 @@ static void test_fw_regs_before_vm_start(struct kvm_vcpu *vcpu)
                /* First 'read' should be an upper limit of the features supported */
                vcpu_get_reg(vcpu, reg_info->reg, &val);
                TEST_ASSERT(val == FW_REG_ULIMIT_VAL(reg_info->max_feat_bit),
-                       "Expected all the features to be set for reg: 0x%lx; expected: 0x%lx; read: 0x%lx\n",
+                       "Expected all the features to be set for reg: 0x%lx; expected: 0x%lx; read: 0x%lx",
                        reg_info->reg, FW_REG_ULIMIT_VAL(reg_info->max_feat_bit), val);
 
                /* Test a 'write' by disabling all the features of the register map */
                ret = __vcpu_set_reg(vcpu, reg_info->reg, 0);
                TEST_ASSERT(ret == 0,
-                       "Failed to clear all the features of reg: 0x%lx; ret: %d\n",
+                       "Failed to clear all the features of reg: 0x%lx; ret: %d",
                        reg_info->reg, errno);
 
                vcpu_get_reg(vcpu, reg_info->reg, &val);
                TEST_ASSERT(val == 0,
-                       "Expected all the features to be cleared for reg: 0x%lx\n", reg_info->reg);
+                       "Expected all the features to be cleared for reg: 0x%lx", reg_info->reg);
 
                /*
                 * Test enabling a feature that's not supported.
@@ -195,7 +195,7 @@ static void test_fw_regs_before_vm_start(struct kvm_vcpu *vcpu)
                if (reg_info->max_feat_bit < 63) {
                        ret = __vcpu_set_reg(vcpu, reg_info->reg, BIT(reg_info->max_feat_bit + 1));
                        TEST_ASSERT(ret != 0 && errno == EINVAL,
-                       "Unexpected behavior or return value (%d) while setting an unsupported feature for reg: 0x%lx\n",
+                       "Unexpected behavior or return value (%d) while setting an unsupported feature for reg: 0x%lx",
                        errno, reg_info->reg);
                }
        }
@@ -216,7 +216,7 @@ static void test_fw_regs_after_vm_start(struct kvm_vcpu *vcpu)
                 */
                vcpu_get_reg(vcpu, reg_info->reg, &val);
                TEST_ASSERT(val == 0,
-                       "Expected all the features to be cleared for reg: 0x%lx\n",
+                       "Expected all the features to be cleared for reg: 0x%lx",
                        reg_info->reg);
 
                /*
@@ -226,7 +226,7 @@ static void test_fw_regs_after_vm_start(struct kvm_vcpu *vcpu)
                 */
                ret = __vcpu_set_reg(vcpu, reg_info->reg, FW_REG_ULIMIT_VAL(reg_info->max_feat_bit));
                TEST_ASSERT(ret != 0 && errno == EBUSY,
-               "Unexpected behavior or return value (%d) while setting a feature while VM is running for reg: 0x%lx\n",
+               "Unexpected behavior or return value (%d) while setting a feature while VM is running for reg: 0x%lx",
                errno, reg_info->reg);
        }
 }
@@ -265,7 +265,7 @@ static void test_guest_stage(struct kvm_vm **vm, struct kvm_vcpu **vcpu)
        case TEST_STAGE_HVC_IFACE_FALSE_INFO:
                break;
        default:
-               TEST_FAIL("Unknown test stage: %d\n", prev_stage);
+               TEST_FAIL("Unknown test stage: %d", prev_stage);
        }
 }
 
@@ -294,7 +294,7 @@ static void test_run(void)
                        REPORT_GUEST_ASSERT(uc);
                        break;
                default:
-                       TEST_FAIL("Unexpected guest exit\n");
+                       TEST_FAIL("Unexpected guest exit");
                }
        }
 
index 08a5ca5bed56a9f602c01c024b19771a5cc9e219..53fddad57cbbc689231fd81cd1947d27f1c84807 100644 (file)
@@ -414,10 +414,10 @@ static bool punch_hole_in_backing_store(struct kvm_vm *vm,
        if (fd != -1) {
                ret = fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                                0, paging_size);
-               TEST_ASSERT(ret == 0, "fallocate failed\n");
+               TEST_ASSERT(ret == 0, "fallocate failed");
        } else {
                ret = madvise(hva, paging_size, MADV_DONTNEED);
-               TEST_ASSERT(ret == 0, "madvise failed\n");
+               TEST_ASSERT(ret == 0, "madvise failed");
        }
 
        return true;
@@ -501,7 +501,7 @@ static bool handle_cmd(struct kvm_vm *vm, int cmd)
 
 void fail_vcpu_run_no_handler(int ret)
 {
-       TEST_FAIL("Unexpected vcpu run failure\n");
+       TEST_FAIL("Unexpected vcpu run failure");
 }
 
 void fail_vcpu_run_mmio_no_syndrome_handler(int ret)
index f4ceae9c89257d211bab149be149510898e02290..2d189f3da228cdb74b8a8976c0736abde18ce9b5 100644 (file)
@@ -178,7 +178,7 @@ static void expect_call_denied(struct kvm_vcpu *vcpu)
        struct ucall uc;
 
        if (get_ucall(vcpu, &uc) != UCALL_SYNC)
-               TEST_FAIL("Unexpected ucall: %lu\n", uc.cmd);
+               TEST_FAIL("Unexpected ucall: %lu", uc.cmd);
 
        TEST_ASSERT(uc.args[1] == SMCCC_RET_NOT_SUPPORTED,
                    "Unexpected SMCCC return code: %lu", uc.args[1]);
index 9d51b56913496ed38f0413183b49380aba017ba3..5f9713364693b4eacd175bad2b1531e6a91f5504 100644 (file)
@@ -517,11 +517,11 @@ static void test_create_vpmu_vm_with_pmcr_n(uint64_t pmcr_n, bool expect_fail)
 
        if (expect_fail)
                TEST_ASSERT(pmcr_orig == pmcr,
-                           "PMCR.N modified by KVM to a larger value (PMCR: 0x%lx) for pmcr_n: 0x%lx\n",
+                           "PMCR.N modified by KVM to a larger value (PMCR: 0x%lx) for pmcr_n: 0x%lx",
                            pmcr, pmcr_n);
        else
                TEST_ASSERT(pmcr_n == get_pmcr_n(pmcr),
-                           "Failed to update PMCR.N to %lu (received: %lu)\n",
+                           "Failed to update PMCR.N to %lu (received: %lu)",
                            pmcr_n, get_pmcr_n(pmcr));
 }
 
@@ -594,12 +594,12 @@ static void run_pmregs_validity_test(uint64_t pmcr_n)
                 */
                vcpu_get_reg(vcpu, KVM_ARM64_SYS_REG(set_reg_id), &reg_val);
                TEST_ASSERT((reg_val & (~valid_counters_mask)) == 0,
-                           "Initial read of set_reg: 0x%llx has unimplemented counters enabled: 0x%lx\n",
+                           "Initial read of set_reg: 0x%llx has unimplemented counters enabled: 0x%lx",
                            KVM_ARM64_SYS_REG(set_reg_id), reg_val);
 
                vcpu_get_reg(vcpu, KVM_ARM64_SYS_REG(clr_reg_id), &reg_val);
                TEST_ASSERT((reg_val & (~valid_counters_mask)) == 0,
-                           "Initial read of clr_reg: 0x%llx has unimplemented counters enabled: 0x%lx\n",
+                           "Initial read of clr_reg: 0x%llx has unimplemented counters enabled: 0x%lx",
                            KVM_ARM64_SYS_REG(clr_reg_id), reg_val);
 
                /*
@@ -611,12 +611,12 @@ static void run_pmregs_validity_test(uint64_t pmcr_n)
 
                vcpu_get_reg(vcpu, KVM_ARM64_SYS_REG(set_reg_id), &reg_val);
                TEST_ASSERT((reg_val & (~valid_counters_mask)) == 0,
-                           "Read of set_reg: 0x%llx has unimplemented counters enabled: 0x%lx\n",
+                           "Read of set_reg: 0x%llx has unimplemented counters enabled: 0x%lx",
                            KVM_ARM64_SYS_REG(set_reg_id), reg_val);
 
                vcpu_get_reg(vcpu, KVM_ARM64_SYS_REG(clr_reg_id), &reg_val);
                TEST_ASSERT((reg_val & (~valid_counters_mask)) == 0,
-                           "Read of clr_reg: 0x%llx has unimplemented counters enabled: 0x%lx\n",
+                           "Read of clr_reg: 0x%llx has unimplemented counters enabled: 0x%lx",
                            KVM_ARM64_SYS_REG(clr_reg_id), reg_val);
        }
 
index 09c116a82a8499d7b14dc60bdcb088dd6c6914c7..bf3609f718544fb2b5d6c116d8eec4b28c29ab98 100644 (file)
@@ -45,10 +45,10 @@ static void vcpu_worker(struct memstress_vcpu_args *vcpu_args)
 
        /* Let the guest access its memory */
        ret = _vcpu_run(vcpu);
-       TEST_ASSERT(ret == 0, "vcpu_run failed: %d\n", ret);
+       TEST_ASSERT(ret == 0, "vcpu_run failed: %d", ret);
        if (get_ucall(vcpu, NULL) != UCALL_SYNC) {
                TEST_ASSERT(false,
-                           "Invalid guest sync status: exit_reason=%s\n",
+                           "Invalid guest sync status: exit_reason=%s",
                            exit_reason_str(run->exit_reason));
        }
 
index d374dbcf9a535dbd9efc7316e9c63c4152010e7a..504f6fe980e8fd7bf57ae105ed574fd3bf62d583 100644 (file)
@@ -88,9 +88,9 @@ static void vcpu_worker(struct memstress_vcpu_args *vcpu_args)
                ret = _vcpu_run(vcpu);
                ts_diff = timespec_elapsed(start);
 
-               TEST_ASSERT(ret == 0, "vcpu_run failed: %d\n", ret);
+               TEST_ASSERT(ret == 0, "vcpu_run failed: %d", ret);
                TEST_ASSERT(get_ucall(vcpu, NULL) == UCALL_SYNC,
-                           "Invalid guest sync status: exit_reason=%s\n",
+                           "Invalid guest sync status: exit_reason=%s",
                            exit_reason_str(run->exit_reason));
 
                pr_debug("Got sync event from vCPU %d\n", vcpu_idx);
index 6cbecf4997676f327095a399a75dcfa514fca13c..eaad5b20854ccf095a4447245554f8ecd48e0506 100644 (file)
@@ -262,7 +262,7 @@ static void default_after_vcpu_run(struct kvm_vcpu *vcpu, int ret, int err)
                    "vcpu run failed: errno=%d", err);
 
        TEST_ASSERT(get_ucall(vcpu, NULL) == UCALL_SYNC,
-                   "Invalid guest sync status: exit_reason=%s\n",
+                   "Invalid guest sync status: exit_reason=%s",
                    exit_reason_str(run->exit_reason));
 
        vcpu_handle_sync_stop();
@@ -376,7 +376,10 @@ static void dirty_ring_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot,
 
        cleared = kvm_vm_reset_dirty_ring(vcpu->vm);
 
-       /* Cleared pages should be the same as collected */
+       /*
+        * Cleared pages should be the same as collected, as KVM is supposed to
+        * clear only the entries that have been harvested.
+        */
        TEST_ASSERT(cleared == count, "Reset dirty pages (%u) mismatch "
                    "with collected (%u)", cleared, count);
 
@@ -410,17 +413,11 @@ static void dirty_ring_after_vcpu_run(struct kvm_vcpu *vcpu, int ret, int err)
                pr_info("vcpu continues now.\n");
        } else {
                TEST_ASSERT(false, "Invalid guest sync status: "
-                           "exit_reason=%s\n",
+                           "exit_reason=%s",
                            exit_reason_str(run->exit_reason));
        }
 }
 
-static void dirty_ring_before_vcpu_join(void)
-{
-       /* Kick another round of vcpu just to make sure it will quit */
-       sem_post(&sem_vcpu_cont);
-}
-
 struct log_mode {
        const char *name;
        /* Return true if this mode is supported, otherwise false */
@@ -433,7 +430,6 @@ struct log_mode {
                                     uint32_t *ring_buf_idx);
        /* Hook to call when after each vcpu run */
        void (*after_vcpu_run)(struct kvm_vcpu *vcpu, int ret, int err);
-       void (*before_vcpu_join) (void);
 } log_modes[LOG_MODE_NUM] = {
        {
                .name = "dirty-log",
@@ -452,7 +448,6 @@ struct log_mode {
                .supported = dirty_ring_supported,
                .create_vm_done = dirty_ring_create_vm_done,
                .collect_dirty_pages = dirty_ring_collect_dirty_pages,
-               .before_vcpu_join = dirty_ring_before_vcpu_join,
                .after_vcpu_run = dirty_ring_after_vcpu_run,
        },
 };
@@ -513,14 +508,6 @@ static void log_mode_after_vcpu_run(struct kvm_vcpu *vcpu, int ret, int err)
                mode->after_vcpu_run(vcpu, ret, err);
 }
 
-static void log_mode_before_vcpu_join(void)
-{
-       struct log_mode *mode = &log_modes[host_log_mode];
-
-       if (mode->before_vcpu_join)
-               mode->before_vcpu_join();
-}
-
 static void generate_random_array(uint64_t *guest_array, uint64_t size)
 {
        uint64_t i;
@@ -719,6 +706,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
        struct kvm_vm *vm;
        unsigned long *bmap;
        uint32_t ring_buf_idx = 0;
+       int sem_val;
 
        if (!log_mode_supported()) {
                print_skip("Log mode '%s' not supported",
@@ -788,12 +776,22 @@ static void run_test(enum vm_guest_mode mode, void *arg)
        /* Start the iterations */
        iteration = 1;
        sync_global_to_guest(vm, iteration);
-       host_quit = false;
+       WRITE_ONCE(host_quit, false);
        host_dirty_count = 0;
        host_clear_count = 0;
        host_track_next_count = 0;
        WRITE_ONCE(dirty_ring_vcpu_ring_full, false);
 
+       /*
+        * Ensure the previous iteration didn't leave a dangling semaphore, i.e.
+        * that the main task and vCPU worker were synchronized and completed
+        * verification of all iterations.
+        */
+       sem_getvalue(&sem_vcpu_stop, &sem_val);
+       TEST_ASSERT_EQ(sem_val, 0);
+       sem_getvalue(&sem_vcpu_cont, &sem_val);
+       TEST_ASSERT_EQ(sem_val, 0);
+
        pthread_create(&vcpu_thread, NULL, vcpu_worker, vcpu);
 
        while (iteration < p->iterations) {
@@ -819,15 +817,21 @@ static void run_test(enum vm_guest_mode mode, void *arg)
                assert(host_log_mode == LOG_MODE_DIRTY_RING ||
                       atomic_read(&vcpu_sync_stop_requested) == false);
                vm_dirty_log_verify(mode, bmap);
-               sem_post(&sem_vcpu_cont);
 
-               iteration++;
+               /*
+                * Set host_quit before sem_vcpu_cont in the final iteration to
+                * ensure that the vCPU worker doesn't resume the guest.  As
+                * above, the dirty ring test may stop and wait even when not
+                * explicitly requested to do so, i.e. would hang waiting for a
+                * "continue" if it's allowed to resume the guest.
+                */
+               if (++iteration == p->iterations)
+                       WRITE_ONCE(host_quit, true);
+
+               sem_post(&sem_vcpu_cont);
                sync_global_to_guest(vm, iteration);
        }
 
-       /* Tell the vcpu thread to quit */
-       host_quit = true;
-       log_mode_before_vcpu_join();
        pthread_join(vcpu_thread, NULL);
 
        pr_info("Total bits checked: dirty (%"PRIu64"), clear (%"PRIu64"), "
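
The sem_getvalue() assertions added above encode the invariant this fix restores: between iterations both semaphores must be fully drained, otherwise a stray post from a previous round lets the vCPU worker run ahead of verification. A minimal standalone version of that drained-state check (link with -pthread on older glibc):

    #include <assert.h>
    #include <semaphore.h>
    #include <stdio.h>

    int main(void)
    {
            sem_t vcpu_stop, vcpu_cont;
            int val;

            sem_init(&vcpu_stop, 0, 0);
            sem_init(&vcpu_cont, 0, 0);

            /* A nonzero count here would mean a post went unconsumed in
             * the previous producer/consumer round. */
            sem_getvalue(&vcpu_stop, &val);
            assert(val == 0);
            sem_getvalue(&vcpu_cont, &val);
            assert(val == 0);

            puts("semaphores drained");
            return 0;
    }
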
index 8274ef04301f6704528293206603efca9690b29c..91f05f78e8249124332e9a28fef53daf4e746fd1 100644 (file)
@@ -152,7 +152,7 @@ static void check_supported(struct vcpu_reg_list *c)
                        continue;
 
                __TEST_REQUIRE(kvm_has_cap(s->capability),
-                              "%s: %s not available, skipping tests\n",
+                              "%s: %s not available, skipping tests",
                               config_name(c), s->name);
        }
 }
index 41230b74619023d447f48fc8555936e194ce9387..3502caa3590c6488442a015ff40e5fa1027e41e6 100644 (file)
@@ -98,7 +98,7 @@ static void ucall_abort(const char *assert_msg, const char *expected_assert_msg)
        int offset = len_str - len_substr;
 
        TEST_ASSERT(len_substr <= len_str,
-                   "Expected '%s' to be a substring of '%s'\n",
+                   "Expected '%s' to be a substring of '%s'",
                    assert_msg, expected_assert_msg);
 
        TEST_ASSERT(strcmp(&assert_msg[offset], expected_assert_msg) == 0,
@@ -116,7 +116,7 @@ static void run_test(struct kvm_vcpu *vcpu, const char *expected_printf,
                vcpu_run(vcpu);
 
                TEST_ASSERT(run->exit_reason == UCALL_EXIT_REASON,
-                           "Unexpected exit reason: %u (%s),\n",
+                           "Unexpected exit reason: %u (%s),",
                            run->exit_reason, exit_reason_str(run->exit_reason));
 
                switch (get_ucall(vcpu, &uc)) {
@@ -161,11 +161,11 @@ static void test_limits(void)
        vcpu_run(vcpu);
 
        TEST_ASSERT(run->exit_reason == UCALL_EXIT_REASON,
-                   "Unexpected exit reason: %u (%s),\n",
+                   "Unexpected exit reason: %u (%s),",
                    run->exit_reason, exit_reason_str(run->exit_reason));
 
        TEST_ASSERT(get_ucall(vcpu, &uc) == UCALL_ABORT,
-                   "Unexpected ucall command: %lu,  Expected: %u (UCALL_ABORT)\n",
+                   "Unexpected ucall command: %lu,  Expected: %u (UCALL_ABORT)",
                    uc.cmd, UCALL_ABORT);
 
        kvm_vm_free(vm);
index f5d59b9934f184163e3e1e48578508d2a5255eae..decc521fc7603b1440cbb414f94acdf042ce1c5a 100644 (file)
@@ -41,7 +41,7 @@ static void *run_vcpu(void *arg)
 
        vcpu_run(vcpu);
 
-       TEST_ASSERT(false, "%s: exited with reason %d: %s\n",
+       TEST_ASSERT(false, "%s: exited with reason %d: %s",
                    __func__, run->exit_reason,
                    exit_reason_str(run->exit_reason));
        pthread_exit(NULL);
@@ -55,7 +55,7 @@ static void *sleeping_thread(void *arg)
                fd = open("/dev/null", O_RDWR);
                close(fd);
        }
-       TEST_ASSERT(false, "%s: exited\n", __func__);
+       TEST_ASSERT(false, "%s: exited", __func__);
        pthread_exit(NULL);
 }
 
@@ -118,7 +118,7 @@ static void run_test(uint32_t run)
        for (i = 0; i < VCPU_NUM; ++i)
                check_join(threads[i], &b);
        /* Should not be reached */
-       TEST_ASSERT(false, "%s: [%d] child escaped the ninja\n", __func__, run);
+       TEST_ASSERT(false, "%s: [%d] child escaped the ninja", __func__, run);
 }
 
 void wait_for_child_setup(pid_t pid)
index 71a41fa924b7d09cb1a3aaf9bcc779d7d3311110..50a5e31ba8da1bc904d7cb04be2571bf786b847e 100644 (file)
@@ -195,4 +195,6 @@ __printf(3, 4) int guest_snprintf(char *buf, int n, const char *fmt, ...);
 
 char *strdup_printf(const char *fmt, ...) __attribute__((format(printf, 1, 2), nonnull(1)));
 
+char *sys_get_cur_clocksource(void);
+
 #endif /* SELFTEST_KVM_TEST_UTIL_H */
index a84863503fcb46cda532840f3be4512cf35061c3..5bca8c947c8253819dd07ed6266a03194a0c1857 100644 (file)
@@ -1271,4 +1271,6 @@ void virt_map_level(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
 #define PFERR_GUEST_PAGE_MASK  BIT_ULL(PFERR_GUEST_PAGE_BIT)
 #define PFERR_IMPLICIT_ACCESS  BIT_ULL(PFERR_IMPLICIT_ACCESS_BIT)
 
+bool sys_clocksource_is_based_on_tsc(void);
+
 #endif /* SELFTEST_KVM_PROCESSOR_H */
index 31b3cb24b9a75cdf0cc9e71f1f0cf820ec20edae..b9e23265e4b3833a4fa07acca1461fb35cd8297f 100644 (file)
@@ -65,7 +65,7 @@ int main(int argc, char *argv[])
 
                        int r = setrlimit(RLIMIT_NOFILE, &rl);
                        __TEST_REQUIRE(r >= 0,
-                                      "RLIMIT_NOFILE hard limit is too low (%d, wanted %d)\n",
+                                      "RLIMIT_NOFILE hard limit is too low (%d, wanted %d)",
                                       old_rlim_max, nr_fds_wanted);
                } else {
                        TEST_ASSERT(!setrlimit(RLIMIT_NOFILE, &rl), "setrlimit() failed!");
index e37dc9c21888f4bc4ed06bf3bacff89e28c68b5d..e0ba97ac1c5611a386981caf679b12910b600d8b 100644 (file)
@@ -204,9 +204,9 @@ static void *vcpu_worker(void *data)
                ret = _vcpu_run(vcpu);
                ts_diff = timespec_elapsed(start);
 
-               TEST_ASSERT(ret == 0, "vcpu_run failed: %d\n", ret);
+               TEST_ASSERT(ret == 0, "vcpu_run failed: %d", ret);
                TEST_ASSERT(get_ucall(vcpu, NULL) == UCALL_SYNC,
-                           "Invalid guest sync status: exit_reason=%s\n",
+                           "Invalid guest sync status: exit_reason=%s",
                            exit_reason_str(vcpu->run->exit_reason));
 
                pr_debug("Got sync event from vCPU %d\n", vcpu->id);
index 41c776b642c0cd0be722e4bad1e0e9cc1f0cff80..43b9a72833602e072ad7ba5e7e69c341fb8469a6 100644 (file)
@@ -398,7 +398,7 @@ void vcpu_args_set(struct kvm_vcpu *vcpu, unsigned int num, ...)
        int i;
 
        TEST_ASSERT(num >= 1 && num <= 8, "Unsupported number of args,\n"
-                   "  num: %u\n", num);
+                   "  num: %u", num);
 
        va_start(ap, num);
 
index b5f28d21a947704456b3e8cc9a0e86825d386c2f..184378d593e9a8e45ca7d46e6116ca692d710abd 100644 (file)
@@ -38,7 +38,7 @@ int vgic_v3_setup(struct kvm_vm *vm, unsigned int nr_vcpus, uint32_t nr_irqs,
        struct list_head *iter;
        unsigned int nr_gic_pages, nr_vcpus_created = 0;
 
-       TEST_ASSERT(nr_vcpus, "Number of vCPUs cannot be empty\n");
+       TEST_ASSERT(nr_vcpus, "Number of vCPUs cannot be empty");
 
        /*
         * Make sure that the caller is in fact calling this
@@ -47,7 +47,7 @@ int vgic_v3_setup(struct kvm_vm *vm, unsigned int nr_vcpus, uint32_t nr_irqs,
        list_for_each(iter, &vm->vcpus)
                nr_vcpus_created++;
        TEST_ASSERT(nr_vcpus == nr_vcpus_created,
-                       "Number of vCPUs requested (%u) doesn't match with the ones created for the VM (%u)\n",
+                       "Number of vCPUs requested (%u) doesn't match with the ones created for the VM (%u)",
                        nr_vcpus, nr_vcpus_created);
 
        /* Distributor setup */
index 266f3876e10aff98955b6301d4e0ddaa3236e26e..f34d926d9735913f5ba826a8bee07aa0b8169d43 100644 (file)
@@ -184,7 +184,7 @@ void kvm_vm_elf_load(struct kvm_vm *vm, const char *filename)
                                "Seek to program segment offset failed,\n"
                                "  program header idx: %u errno: %i\n"
                                "  offset_rv: 0x%jx\n"
-                               "  expected: 0x%jx\n",
+                               "  expected: 0x%jx",
                                n1, errno, (intmax_t) offset_rv,
                                (intmax_t) phdr.p_offset);
                        test_read(fd, addr_gva2hva(vm, phdr.p_vaddr),
index e066d584c65611b4da45b0312734c3fab7b3dcd6..1b197426f29fcd1e6faf4abe3069044dfe64b9ca 100644 (file)
@@ -27,7 +27,8 @@ int open_path_or_exit(const char *path, int flags)
        int fd;
 
        fd = open(path, flags);
-       __TEST_REQUIRE(fd >= 0, "%s not available (errno: %d)", path, errno);
+       __TEST_REQUIRE(fd >= 0 || errno != ENOENT, "Cannot open %s: %s", path, strerror(errno));
+       TEST_ASSERT(fd >= 0, "Failed to open '%s'", path);
 
        return fd;
 }
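
The split above distinguishes an absent file (test requirement not met, so skip) from any other open() failure (a genuine bug, so assert). A standalone sketch of that triage, reusing the KSFT_SKIP=4 convention from the ktap helpers earlier in this series (the path is illustrative):

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    #define KSFT_SKIP 4

    int main(void)
    {
            int fd = open("/dev/kvm", O_RDWR);

            if (fd < 0 && errno == ENOENT) {
                    puts("skip: /dev/kvm not present");
                    return KSFT_SKIP;   /* requirement missing, not a failure */
            }
            if (fd < 0) {
                    perror("open");     /* anything else is a real error */
                    return 1;
            }
            close(fd);
            return 0;
    }
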
@@ -320,7 +321,7 @@ static uint64_t vm_nr_pages_required(enum vm_guest_mode mode,
        uint64_t nr_pages;
 
        TEST_ASSERT(nr_runnable_vcpus,
-                   "Use vm_create_barebones() for VMs that _never_ have vCPUs\n");
+                   "Use vm_create_barebones() for VMs that _never_ have vCPUs");
 
        TEST_ASSERT(nr_runnable_vcpus <= kvm_check_cap(KVM_CAP_MAX_VCPUS),
                    "nr_vcpus = %d too large for host, max-vcpus = %d",
@@ -491,7 +492,7 @@ void kvm_pin_this_task_to_pcpu(uint32_t pcpu)
        CPU_ZERO(&mask);
        CPU_SET(pcpu, &mask);
        r = sched_setaffinity(0, sizeof(mask), &mask);
-       TEST_ASSERT(!r, "sched_setaffinity() failed for pCPU '%u'.\n", pcpu);
+       TEST_ASSERT(!r, "sched_setaffinity() failed for pCPU '%u'.", pcpu);
 }
 
 static uint32_t parse_pcpu(const char *cpu_str, const cpu_set_t *allowed_mask)
@@ -499,7 +500,7 @@ static uint32_t parse_pcpu(const char *cpu_str, const cpu_set_t *allowed_mask)
        uint32_t pcpu = atoi_non_negative("CPU number", cpu_str);
 
        TEST_ASSERT(CPU_ISSET(pcpu, allowed_mask),
-                   "Not allowed to run on pCPU '%d', check cgroups?\n", pcpu);
+                   "Not allowed to run on pCPU '%d', check cgroups?", pcpu);
        return pcpu;
 }
 
@@ -529,7 +530,7 @@ void kvm_parse_vcpu_pinning(const char *pcpus_string, uint32_t vcpu_to_pcpu[],
        int i, r;
 
        cpu_list = strdup(pcpus_string);
-       TEST_ASSERT(cpu_list, "strdup() allocation failed.\n");
+       TEST_ASSERT(cpu_list, "strdup() allocation failed.");
 
        r = sched_getaffinity(0, sizeof(allowed_mask), &allowed_mask);
        TEST_ASSERT(!r, "sched_getaffinity() failed");
@@ -538,7 +539,7 @@ void kvm_parse_vcpu_pinning(const char *pcpus_string, uint32_t vcpu_to_pcpu[],
 
        /* 1. Get all pcpus for vcpus. */
        for (i = 0; i < nr_vcpus; i++) {
-               TEST_ASSERT(cpu, "pCPU not provided for vCPU '%d'\n", i);
+               TEST_ASSERT(cpu, "pCPU not provided for vCPU '%d'", i);
                vcpu_to_pcpu[i] = parse_pcpu(cpu, &allowed_mask);
                cpu = strtok(NULL, delim);
        }
@@ -1057,7 +1058,7 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
        TEST_ASSERT(ret == 0, "KVM_SET_USER_MEMORY_REGION2 IOCTL failed,\n"
                "  rc: %i errno: %i\n"
                "  slot: %u flags: 0x%x\n"
-               "  guest_phys_addr: 0x%lx size: 0x%lx guest_memfd: %d\n",
+               "  guest_phys_addr: 0x%lx size: 0x%lx guest_memfd: %d",
                ret, errno, slot, flags,
                guest_paddr, (uint64_t) region->region.memory_size,
                region->region.guest_memfd);
@@ -1222,7 +1223,7 @@ void vm_guest_mem_fallocate(struct kvm_vm *vm, uint64_t base, uint64_t size,
                len = min_t(uint64_t, end - gpa, region->region.memory_size - offset);
 
                ret = fallocate(region->region.guest_memfd, mode, fd_offset, len);
-               TEST_ASSERT(!ret, "fallocate() failed to %s at %lx (len = %lu), fd = %d, mode = %x, offset = %lx\n",
+               TEST_ASSERT(!ret, "fallocate() failed to %s at %lx (len = %lu), fd = %d, mode = %x, offset = %lx",
                            punch_hole ? "punch hole" : "allocate", gpa, len,
                            region->region.guest_memfd, mode, fd_offset);
        }
@@ -1265,7 +1266,7 @@ struct kvm_vcpu *__vm_vcpu_add(struct kvm_vm *vm, uint32_t vcpu_id)
        struct kvm_vcpu *vcpu;
 
        /* Confirm a vcpu with the specified id doesn't already exist. */
-       TEST_ASSERT(!vcpu_exists(vm, vcpu_id), "vCPU%d already exists\n", vcpu_id);
+       TEST_ASSERT(!vcpu_exists(vm, vcpu_id), "vCPU%d already exists", vcpu_id);
 
        /* Allocate and initialize new vcpu structure. */
        vcpu = calloc(1, sizeof(*vcpu));
index d05487e5a371df1d17c96d4fbec1b1f6b2e60c0f..cf2c739713080f3f55e1383fb5d0a9e65f5d1dc7 100644 (file)
@@ -192,7 +192,7 @@ struct kvm_vm *memstress_create_vm(enum vm_guest_mode mode, int nr_vcpus,
        TEST_ASSERT(guest_num_pages < region_end_gfn,
                    "Requested more guest memory than address space allows.\n"
                    "    guest pages: %" PRIx64 " max gfn: %" PRIx64
-                   " nr_vcpus: %d wss: %" PRIx64 "]\n",
+                   " nr_vcpus: %d wss: %" PRIx64 "]",
                    guest_num_pages, region_end_gfn - 1, nr_vcpus, vcpu_memory_bytes);
 
        args->gpa = (region_end_gfn - guest_num_pages - 1) * args->guest_page_size;
index 7ca736fb4194046072bf69b3210f0fefd8ce0834..2bb33a8ac03c25f622ec6dc21430529b8b128a9d 100644 (file)
@@ -327,7 +327,7 @@ void vcpu_args_set(struct kvm_vcpu *vcpu, unsigned int num, ...)
        int i;
 
        TEST_ASSERT(num >= 1 && num <= 8, "Unsupported number of args,\n"
-                   "  num: %u\n", num);
+                   "  num: %u", num);
 
        va_start(ap, num);
 
index 15945121daf17dc46bf38cf8f2a24d23b8f073f3..f6d227892cbcfc88cc97abcc9a0d78b2aa38f459 100644 (file)
@@ -198,7 +198,7 @@ void vcpu_args_set(struct kvm_vcpu *vcpu, unsigned int num, ...)
        int i;
 
        TEST_ASSERT(num >= 1 && num <= 5, "Unsupported number of args,\n"
-                   "  num: %u\n",
+                   "  num: %u",
                    num);
 
        va_start(ap, num);
index 5d7f28b02d73bab79891b808bb46c095c52bd505..5a8f8becb12984ff52a16f01281b5d396ce318be 100644 (file)
@@ -392,3 +392,28 @@ char *strdup_printf(const char *fmt, ...)
 
        return str;
 }
+
+#define CLOCKSOURCE_PATH "/sys/devices/system/clocksource/clocksource0/current_clocksource"
+
+char *sys_get_cur_clocksource(void)
+{
+       char *clk_name;
+       struct stat st;
+       FILE *fp;
+
+       fp = fopen(CLOCKSOURCE_PATH, "r");
+       TEST_ASSERT(fp, "failed to open clocksource file, errno: %d", errno);
+
+       TEST_ASSERT(!fstat(fileno(fp), &st), "failed to stat clocksource file, errno: %d",
+                   errno);
+
+       clk_name = malloc(st.st_size);
+       TEST_ASSERT(clk_name, "failed to allocate buffer to read file");
+
+       TEST_ASSERT(fgets(clk_name, st.st_size, fp), "failed to read clocksource file: %d",
+                   ferror(fp));
+
+       fclose(fp);
+
+       return clk_name;
+}
index 271f6389158122a973fa597d68ee147f7169374c..f4eef6eb2dc2cc2bdefc90cbe5ac845be3a18b53 100644 (file)
@@ -69,7 +69,7 @@ static void *uffd_handler_thread_fn(void *arg)
                if (pollfd[1].revents & POLLIN) {
                        r = read(pollfd[1].fd, &tmp_chr, 1);
                        TEST_ASSERT(r == 1,
-                                   "Error reading pipefd in UFFD thread\n");
+                                   "Error reading pipefd in UFFD thread");
                        break;
                }
 
index d8288374078e4b3ce888bed569c7e59192d43e7c..f639b3e062e3a328165bfb6ed32f606cf1f48e76 100644 (file)
@@ -170,10 +170,10 @@ static uint64_t *virt_create_upper_pte(struct kvm_vm *vm,
                 * this level.
                 */
                TEST_ASSERT(current_level != target_level,
-                           "Cannot create hugepage at level: %u, vaddr: 0x%lx\n",
+                           "Cannot create hugepage at level: %u, vaddr: 0x%lx",
                            current_level, vaddr);
                TEST_ASSERT(!(*pte & PTE_LARGE_MASK),
-                           "Cannot create page table at level: %u, vaddr: 0x%lx\n",
+                           "Cannot create page table at level: %u, vaddr: 0x%lx",
                            current_level, vaddr);
        }
        return pte;
@@ -220,7 +220,7 @@ void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, int level)
        /* Fill in page table entry. */
        pte = virt_get_pte(vm, pde, vaddr, PG_LEVEL_4K);
        TEST_ASSERT(!(*pte & PTE_PRESENT_MASK),
-                   "PTE already present for 4k page at vaddr: 0x%lx\n", vaddr);
+                   "PTE already present for 4k page at vaddr: 0x%lx", vaddr);
        *pte = PTE_PRESENT_MASK | PTE_WRITABLE_MASK | (paddr & PHYSICAL_PAGE_MASK);
 }
 
@@ -253,7 +253,7 @@ static bool vm_is_target_pte(uint64_t *pte, int *level, int current_level)
        if (*pte & PTE_LARGE_MASK) {
                TEST_ASSERT(*level == PG_LEVEL_NONE ||
                            *level == current_level,
-                           "Unexpected hugepage at level %d\n", current_level);
+                           "Unexpected hugepage at level %d", current_level);
                *level = current_level;
        }
 
@@ -825,7 +825,7 @@ void vcpu_args_set(struct kvm_vcpu *vcpu, unsigned int num, ...)
        struct kvm_regs regs;
 
        TEST_ASSERT(num >= 1 && num <= 6, "Unsupported number of args,\n"
-                   "  num: %u\n",
+                   "  num: %u",
                    num);
 
        va_start(ap, num);
@@ -1299,3 +1299,14 @@ void kvm_selftest_arch_init(void)
        host_cpu_is_intel = this_cpu_is_intel();
        host_cpu_is_amd = this_cpu_is_amd();
 }
+
+bool sys_clocksource_is_based_on_tsc(void)
+{
+       char *clk_name = sys_get_cur_clocksource();
+       bool ret = !strcmp(clk_name, "tsc\n") ||
+                  !strcmp(clk_name, "hyperv_clocksource_tsc_page\n");
+
+       free(clk_name);
+
+       return ret;
+}
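
Note that sys_get_cur_clocksource() hands back the fgets() buffer with its trailing newline intact, which is why the comparisons above match "tsc\n" rather than "tsc". A standalone sketch of the same sysfs read, without the harness asserts:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
            char buf[64];
            FILE *fp = fopen("/sys/devices/system/clocksource/clocksource0/current_clocksource", "r");

            if (!fp) {
                    perror("fopen");
                    return 1;
            }
            if (!fgets(buf, sizeof(buf), fp)) {
                    fprintf(stderr, "empty clocksource file\n");
                    fclose(fp);
                    return 1;
            }
            fclose(fp);

            /* fgets() keeps the '\n', hence the "tsc\n" comparison above. */
            printf("clocksource: %s", buf);
            return strcmp(buf, "tsc\n") != 0;
    }
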
index 59d97531c9b17f7b0c1893d1aa5a63575e3fdb5c..089b8925b6b22d9929bee551435ebdca0fa24764 100644 (file)
@@ -54,7 +54,7 @@ int vcpu_enable_evmcs(struct kvm_vcpu *vcpu)
        /* KVM should return supported EVMCS version range */
        TEST_ASSERT(((evmcs_ver >> 8) >= (evmcs_ver & 0xff)) &&
                    (evmcs_ver & 0xff) > 0,
-                   "Incorrect EVMCS version range: %x:%x\n",
+                   "Incorrect EVMCS version range: %x:%x",
                    evmcs_ver & 0xff, evmcs_ver >> 8);
 
        return evmcs_ver;
@@ -387,10 +387,10 @@ static void nested_create_pte(struct kvm_vm *vm,
                 * this level.
                 */
                TEST_ASSERT(current_level != target_level,
-                           "Cannot create hugepage at level: %u, nested_paddr: 0x%lx\n",
+                           "Cannot create hugepage at level: %u, nested_paddr: 0x%lx",
                            current_level, nested_paddr);
                TEST_ASSERT(!pte->page_size,
-                           "Cannot create page table at level: %u, nested_paddr: 0x%lx\n",
+                           "Cannot create page table at level: %u, nested_paddr: 0x%lx",
                            current_level, nested_paddr);
        }
 }
index 9855c41ca811fa69a77f41f212ddc6086d47467c..1563619666123fed1da83a2786d309c51cde25d3 100644 (file)
@@ -45,7 +45,7 @@ static void vcpu_worker(struct memstress_vcpu_args *vcpu_args)
        /* Let the guest access its memory until a stop signal is received */
        while (!READ_ONCE(memstress_args.stop_vcpus)) {
                ret = _vcpu_run(vcpu);
-               TEST_ASSERT(ret == 0, "vcpu_run failed: %d\n", ret);
+               TEST_ASSERT(ret == 0, "vcpu_run failed: %d", ret);
 
                if (get_ucall(vcpu, NULL) == UCALL_SYNC)
                        continue;
index 8698d1ab60d00f72399571b62b49314f3a8c401f..579a64f97333b8d0ebd4533fdfdee1a172ef731a 100644 (file)
@@ -175,11 +175,11 @@ static void wait_for_vcpu(void)
        struct timespec ts;
 
        TEST_ASSERT(!clock_gettime(CLOCK_REALTIME, &ts),
-                   "clock_gettime() failed: %d\n", errno);
+                   "clock_gettime() failed: %d", errno);
 
        ts.tv_sec += 2;
        TEST_ASSERT(!sem_timedwait(&vcpu_ready, &ts),
-                   "sem_timedwait() failed: %d\n", errno);
+                   "sem_timedwait() failed: %d", errno);
 }
 
 static void *vm_gpa2hva(struct vm_data *data, uint64_t gpa, uint64_t *rempages)
@@ -336,7 +336,7 @@ static bool prepare_vm(struct vm_data *data, int nslots, uint64_t *maxslots,
 
                gpa = vm_phy_pages_alloc(data->vm, npages, guest_addr, slot);
                TEST_ASSERT(gpa == guest_addr,
-                           "vm_phy_pages_alloc() failed\n");
+                           "vm_phy_pages_alloc() failed");
 
                data->hva_slots[slot - 1] = addr_gpa2hva(data->vm, guest_addr);
                memset(data->hva_slots[slot - 1], 0, npages * guest_page_size);
index 6652108816db462160230a17c4da32fa078526dc..6435e7a6564252fae5e1cbd2275a6fc0c3a7f12f 100644 (file)
@@ -49,15 +49,42 @@ bool filter_reg(__u64 reg)
        case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_SVPBMT:
        case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZBA:
        case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZBB:
+       case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZBC:
+       case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZBKB:
+       case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZBKC:
+       case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZBKX:
        case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZBS:
+       case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZFA:
+       case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZFH:
+       case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZFHMIN:
        case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZICBOM:
        case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZICBOZ:
        case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZICNTR:
        case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZICOND:
        case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZICSR:
        case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZIFENCEI:
+       case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZIHINTNTL:
        case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZIHINTPAUSE:
        case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZIHPM:
+       case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZKND:
+       case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZKNE:
+       case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZKNH:
+       case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZKR:
+       case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZKSED:
+       case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZKSH:
+       case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZKT:
+       case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZVBB:
+       case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZVBC:
+       case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZVFH:
+       case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZVFHMIN:
+       case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZVKB:
+       case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZVKG:
+       case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZVKNED:
+       case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZVKNHA:
+       case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZVKNHB:
+       case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZVKSED:
+       case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZVKSH:
+       case KVM_REG_RISCV_ISA_EXT | KVM_REG_RISCV_ISA_SINGLE | KVM_RISCV_ISA_EXT_ZVKT:
        /*
         * Like ISA_EXT registers, SBI_EXT registers are only visible when the
         * host supports them and disabling them does not affect the visibility
@@ -150,7 +177,7 @@ void finalize_vcpu(struct kvm_vcpu *vcpu, struct vcpu_reg_list *c)
 
                /* Double check whether the desired extension was enabled */
                __TEST_REQUIRE(vcpu_has_ext(vcpu, feature),
-                              "%s not available, skipping tests\n", s->name);
+                              "%s not available, skipping tests", s->name);
        }
 }
 
@@ -394,15 +421,42 @@ static const char *isa_ext_single_id_to_str(__u64 reg_off)
                KVM_ISA_EXT_ARR(SVPBMT),
                KVM_ISA_EXT_ARR(ZBA),
                KVM_ISA_EXT_ARR(ZBB),
+               KVM_ISA_EXT_ARR(ZBC),
+               KVM_ISA_EXT_ARR(ZBKB),
+               KVM_ISA_EXT_ARR(ZBKC),
+               KVM_ISA_EXT_ARR(ZBKX),
                KVM_ISA_EXT_ARR(ZBS),
+               KVM_ISA_EXT_ARR(ZFA),
+               KVM_ISA_EXT_ARR(ZFH),
+               KVM_ISA_EXT_ARR(ZFHMIN),
                KVM_ISA_EXT_ARR(ZICBOM),
                KVM_ISA_EXT_ARR(ZICBOZ),
                KVM_ISA_EXT_ARR(ZICNTR),
                KVM_ISA_EXT_ARR(ZICOND),
                KVM_ISA_EXT_ARR(ZICSR),
                KVM_ISA_EXT_ARR(ZIFENCEI),
+               KVM_ISA_EXT_ARR(ZIHINTNTL),
                KVM_ISA_EXT_ARR(ZIHINTPAUSE),
                KVM_ISA_EXT_ARR(ZIHPM),
+               KVM_ISA_EXT_ARR(ZKND),
+               KVM_ISA_EXT_ARR(ZKNE),
+               KVM_ISA_EXT_ARR(ZKNH),
+               KVM_ISA_EXT_ARR(ZKR),
+               KVM_ISA_EXT_ARR(ZKSED),
+               KVM_ISA_EXT_ARR(ZKSH),
+               KVM_ISA_EXT_ARR(ZKT),
+               KVM_ISA_EXT_ARR(ZVBB),
+               KVM_ISA_EXT_ARR(ZVBC),
+               KVM_ISA_EXT_ARR(ZVFH),
+               KVM_ISA_EXT_ARR(ZVFHMIN),
+               KVM_ISA_EXT_ARR(ZVKB),
+               KVM_ISA_EXT_ARR(ZVKG),
+               KVM_ISA_EXT_ARR(ZVKNED),
+               KVM_ISA_EXT_ARR(ZVKNHA),
+               KVM_ISA_EXT_ARR(ZVKNHB),
+               KVM_ISA_EXT_ARR(ZVKSED),
+               KVM_ISA_EXT_ARR(ZVKSH),
+               KVM_ISA_EXT_ARR(ZVKT),
        };
 
        if (reg_off >= ARRAY_SIZE(kvm_isa_ext_reg_name))
@@ -888,15 +942,42 @@ KVM_ISA_EXT_SIMPLE_CONFIG(svnapot, SVNAPOT);
 KVM_ISA_EXT_SIMPLE_CONFIG(svpbmt, SVPBMT);
 KVM_ISA_EXT_SIMPLE_CONFIG(zba, ZBA);
 KVM_ISA_EXT_SIMPLE_CONFIG(zbb, ZBB);
+KVM_ISA_EXT_SIMPLE_CONFIG(zbc, ZBC);
+KVM_ISA_EXT_SIMPLE_CONFIG(zbkb, ZBKB);
+KVM_ISA_EXT_SIMPLE_CONFIG(zbkc, ZBKC);
+KVM_ISA_EXT_SIMPLE_CONFIG(zbkx, ZBKX);
 KVM_ISA_EXT_SIMPLE_CONFIG(zbs, ZBS);
+KVM_ISA_EXT_SIMPLE_CONFIG(zfa, ZFA);
+KVM_ISA_EXT_SIMPLE_CONFIG(zfh, ZFH);
+KVM_ISA_EXT_SIMPLE_CONFIG(zfhmin, ZFHMIN);
 KVM_ISA_EXT_SUBLIST_CONFIG(zicbom, ZICBOM);
 KVM_ISA_EXT_SUBLIST_CONFIG(zicboz, ZICBOZ);
 KVM_ISA_EXT_SIMPLE_CONFIG(zicntr, ZICNTR);
 KVM_ISA_EXT_SIMPLE_CONFIG(zicond, ZICOND);
 KVM_ISA_EXT_SIMPLE_CONFIG(zicsr, ZICSR);
 KVM_ISA_EXT_SIMPLE_CONFIG(zifencei, ZIFENCEI);
+KVM_ISA_EXT_SIMPLE_CONFIG(zihintntl, ZIHINTNTL);
 KVM_ISA_EXT_SIMPLE_CONFIG(zihintpause, ZIHINTPAUSE);
 KVM_ISA_EXT_SIMPLE_CONFIG(zihpm, ZIHPM);
+KVM_ISA_EXT_SIMPLE_CONFIG(zknd, ZKND);
+KVM_ISA_EXT_SIMPLE_CONFIG(zkne, ZKNE);
+KVM_ISA_EXT_SIMPLE_CONFIG(zknh, ZKNH);
+KVM_ISA_EXT_SIMPLE_CONFIG(zkr, ZKR);
+KVM_ISA_EXT_SIMPLE_CONFIG(zksed, ZKSED);
+KVM_ISA_EXT_SIMPLE_CONFIG(zksh, ZKSH);
+KVM_ISA_EXT_SIMPLE_CONFIG(zkt, ZKT);
+KVM_ISA_EXT_SIMPLE_CONFIG(zvbb, ZVBB);
+KVM_ISA_EXT_SIMPLE_CONFIG(zvbc, ZVBC);
+KVM_ISA_EXT_SIMPLE_CONFIG(zvfh, ZVFH);
+KVM_ISA_EXT_SIMPLE_CONFIG(zvfhmin, ZVFHMIN);
+KVM_ISA_EXT_SIMPLE_CONFIG(zvkb, ZVKB);
+KVM_ISA_EXT_SIMPLE_CONFIG(zvkg, ZVKG);
+KVM_ISA_EXT_SIMPLE_CONFIG(zvkned, ZVKNED);
+KVM_ISA_EXT_SIMPLE_CONFIG(zvknha, ZVKNHA);
+KVM_ISA_EXT_SIMPLE_CONFIG(zvknhb, ZVKNHB);
+KVM_ISA_EXT_SIMPLE_CONFIG(zvksed, ZVKSED);
+KVM_ISA_EXT_SIMPLE_CONFIG(zvksh, ZVKSH);
+KVM_ISA_EXT_SIMPLE_CONFIG(zvkt, ZVKT);
 
 struct vcpu_reg_list *vcpu_configs[] = {
        &config_sbi_base,
@@ -914,14 +995,41 @@ struct vcpu_reg_list *vcpu_configs[] = {
        &config_svpbmt,
        &config_zba,
        &config_zbb,
+       &config_zbc,
+       &config_zbkb,
+       &config_zbkc,
+       &config_zbkx,
        &config_zbs,
+       &config_zfa,
+       &config_zfh,
+       &config_zfhmin,
        &config_zicbom,
        &config_zicboz,
        &config_zicntr,
        &config_zicond,
        &config_zicsr,
        &config_zifencei,
+       &config_zihintntl,
        &config_zihintpause,
        &config_zihpm,
+       &config_zknd,
+       &config_zkne,
+       &config_zknh,
+       &config_zkr,
+       &config_zksed,
+       &config_zksh,
+       &config_zkt,
+       &config_zvbb,
+       &config_zvbc,
+       &config_zvfh,
+       &config_zvfhmin,
+       &config_zvkb,
+       &config_zvkg,
+       &config_zvkned,
+       &config_zvknha,
+       &config_zvknhb,
+       &config_zvksed,
+       &config_zvksh,
+       &config_zvkt,
 };
 int vcpu_configs_n = ARRAY_SIZE(vcpu_configs);
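
The RISC-V get-reg-list additions follow a fixed pattern: every new extension is wired up in four places, all visible in the hunks above. Using Zbc as the example:

    /*
     * 1. filter_reg():               accept the register when the host
     *    lacks the extension:
     *        case ... | KVM_RISCV_ISA_EXT_ZBC:
     * 2. isa_ext_single_id_to_str(): map the register offset to a name:
     *        KVM_ISA_EXT_ARR(ZBC),
     * 3. Config generation (the macro's expansion is defined earlier in
     *    get-reg-list.c and is not shown in this diff):
     *        KVM_ISA_EXT_SIMPLE_CONFIG(zbc, ZBC);
     * 4. vcpu_configs[]:             register the config for iteration:
     *        &config_zbc,
     */

Extensions that carry extra tunable registers (here Zicbom and Zicboz, presumably for their block sizes) use KVM_ISA_EXT_SUBLIST_CONFIG instead of the SIMPLE variant.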
index f74e76d03b7e306667221ee9c5411bda2af417d5..28f97fb520441476c374dffc0bcf24bf6553a021 100644 (file)
@@ -245,7 +245,7 @@ int main(int argc, char *argv[])
                } while (snapshot != atomic_read(&seq_cnt));
 
                TEST_ASSERT(rseq_cpu == cpu,
-                           "rseq CPU = %d, sched CPU = %d\n", rseq_cpu, cpu);
+                           "rseq CPU = %d, sched CPU = %d", rseq_cpu, cpu);
        }
 
        /*
@@ -256,7 +256,7 @@ int main(int argc, char *argv[])
         * migrations given the 1us+ delay in the migration task.
         */
        TEST_ASSERT(i > (NR_TASK_MIGRATIONS / 2),
-                   "Only performed %d KVM_RUNs, task stalled too much?\n", i);
+                   "Only performed %d KVM_RUNs, task stalled too much?", i);
 
        pthread_join(migration_thread, NULL);
 
index e41e2cb8ffa9797c470fb061a34fb106b687a47d..357943f2bea87fff66384bcf381cf9100a5a3a6b 100644 (file)
@@ -78,7 +78,7 @@ static void assert_noirq(struct kvm_vcpu *vcpu)
         * (notably, the emergency call interrupt we have injected) should
         * be cleared by the resets, so this should be 0.
         */
-       TEST_ASSERT(irqs >= 0, "Could not fetch IRQs: errno %d\n", errno);
+       TEST_ASSERT(irqs >= 0, "Could not fetch IRQs: errno %d", errno);
        TEST_ASSERT(!irqs, "IRQ pending");
 }
 
@@ -199,7 +199,7 @@ static void inject_irq(struct kvm_vcpu *vcpu)
        irq->type = KVM_S390_INT_EMERGENCY;
        irq->u.emerg.code = vcpu->id;
        irqs = __vcpu_ioctl(vcpu, KVM_S390_SET_IRQ_STATE, &irq_state);
-       TEST_ASSERT(irqs >= 0, "Error injecting EMERGENCY IRQ errno %d\n", errno);
+       TEST_ASSERT(irqs >= 0, "Error injecting EMERGENCY IRQ errno %d", errno);
 }
 
 static struct kvm_vm *create_vm(struct kvm_vcpu **vcpu)
index 636a70ddac1ea36151cb57a0dfd74ddcca33de14..43fb25ddc3eca3a83ba14c1905972243c4bda793 100644 (file)
@@ -39,13 +39,13 @@ static void guest_code(void)
 #define REG_COMPARE(reg) \
        TEST_ASSERT(left->reg == right->reg, \
                    "Register " #reg \
-                   " values did not match: 0x%llx, 0x%llx\n", \
+                   " values did not match: 0x%llx, 0x%llx", \
                    left->reg, right->reg)
 
 #define REG_COMPARE32(reg) \
        TEST_ASSERT(left->reg == right->reg, \
                    "Register " #reg \
-                   " values did not match: 0x%x, 0x%x\n", \
+                   " values did not match: 0x%x, 0x%x", \
                    left->reg, right->reg)
 
 
@@ -82,14 +82,14 @@ void test_read_invalid(struct kvm_vcpu *vcpu)
        run->kvm_valid_regs = INVALID_SYNC_FIELD;
        rv = _vcpu_run(vcpu);
        TEST_ASSERT(rv < 0 && errno == EINVAL,
-                   "Invalid kvm_valid_regs did not cause expected KVM_RUN error: %d\n",
+                   "Invalid kvm_valid_regs did not cause expected KVM_RUN error: %d",
                    rv);
        run->kvm_valid_regs = 0;
 
        run->kvm_valid_regs = INVALID_SYNC_FIELD | TEST_SYNC_FIELDS;
        rv = _vcpu_run(vcpu);
        TEST_ASSERT(rv < 0 && errno == EINVAL,
-                   "Invalid kvm_valid_regs did not cause expected KVM_RUN error: %d\n",
+                   "Invalid kvm_valid_regs did not cause expected KVM_RUN error: %d",
                    rv);
        run->kvm_valid_regs = 0;
 }
@@ -103,14 +103,14 @@ void test_set_invalid(struct kvm_vcpu *vcpu)
        run->kvm_dirty_regs = INVALID_SYNC_FIELD;
        rv = _vcpu_run(vcpu);
        TEST_ASSERT(rv < 0 && errno == EINVAL,
-                   "Invalid kvm_dirty_regs did not cause expected KVM_RUN error: %d\n",
+                   "Invalid kvm_dirty_regs did not cause expected KVM_RUN error: %d",
                    rv);
        run->kvm_dirty_regs = 0;
 
        run->kvm_dirty_regs = INVALID_SYNC_FIELD | TEST_SYNC_FIELDS;
        rv = _vcpu_run(vcpu);
        TEST_ASSERT(rv < 0 && errno == EINVAL,
-                   "Invalid kvm_dirty_regs did not cause expected KVM_RUN error: %d\n",
+                   "Invalid kvm_dirty_regs did not cause expected KVM_RUN error: %d",
                    rv);
        run->kvm_dirty_regs = 0;
 }
@@ -125,12 +125,12 @@ void test_req_and_verify_all_valid_regs(struct kvm_vcpu *vcpu)
        /* Request and verify all valid register sets. */
        run->kvm_valid_regs = TEST_SYNC_FIELDS;
        rv = _vcpu_run(vcpu);
-       TEST_ASSERT(rv == 0, "vcpu_run failed: %d\n", rv);
+       TEST_ASSERT(rv == 0, "vcpu_run failed: %d", rv);
        TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_S390_SIEIC);
        TEST_ASSERT(run->s390_sieic.icptcode == 4 &&
                    (run->s390_sieic.ipa >> 8) == 0x83 &&
                    (run->s390_sieic.ipb >> 16) == 0x501,
-                   "Unexpected interception code: ic=%u, ipa=0x%x, ipb=0x%x\n",
+                   "Unexpected interception code: ic=%u, ipa=0x%x, ipb=0x%x",
                    run->s390_sieic.icptcode, run->s390_sieic.ipa,
                    run->s390_sieic.ipb);
 
@@ -161,7 +161,7 @@ void test_set_and_verify_various_reg_values(struct kvm_vcpu *vcpu)
        }
 
        rv = _vcpu_run(vcpu);
-       TEST_ASSERT(rv == 0, "vcpu_run failed: %d\n", rv);
+       TEST_ASSERT(rv == 0, "vcpu_run failed: %d", rv);
        TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_S390_SIEIC);
        TEST_ASSERT(run->s.regs.gprs[11] == 0xBAD1DEA + 1,
                    "r11 sync regs value incorrect 0x%llx.",
@@ -193,7 +193,7 @@ void test_clear_kvm_dirty_regs_bits(struct kvm_vcpu *vcpu)
        run->s.regs.gprs[11] = 0xDEADBEEF;
        run->s.regs.diag318 = 0x4B1D;
        rv = _vcpu_run(vcpu);
-       TEST_ASSERT(rv == 0, "vcpu_run failed: %d\n", rv);
+       TEST_ASSERT(rv == 0, "vcpu_run failed: %d", rv);
        TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_S390_SIEIC);
        TEST_ASSERT(run->s.regs.gprs[11] != 0xDEADBEEF,
                    "r11 sync regs value incorrect 0x%llx.",
index 075b80dbe2370d2ff472685f4b02b4e1243d7123..06b43ed23580b67c060aeaadea11b06641a629c3 100644 (file)
@@ -98,11 +98,11 @@ static void wait_for_vcpu(void)
        struct timespec ts;
 
        TEST_ASSERT(!clock_gettime(CLOCK_REALTIME, &ts),
-                   "clock_gettime() failed: %d\n", errno);
+                   "clock_gettime() failed: %d", errno);
 
        ts.tv_sec += 2;
        TEST_ASSERT(!sem_timedwait(&vcpu_ready, &ts),
-                   "sem_timedwait() failed: %d\n", errno);
+                   "sem_timedwait() failed: %d", errno);
 
        /* Wait for the vCPU thread to reenter the guest. */
        usleep(100000);
@@ -302,7 +302,7 @@ static void test_delete_memory_region(void)
        if (run->exit_reason == KVM_EXIT_INTERNAL_ERROR)
                TEST_ASSERT(regs.rip >= final_rip_start &&
                            regs.rip < final_rip_end,
-                           "Bad rip, expected 0x%lx - 0x%lx, got 0x%llx\n",
+                           "Bad rip, expected 0x%lx - 0x%lx, got 0x%llx",
                            final_rip_start, final_rip_end, regs.rip);
 
        kvm_vm_free(vm);
@@ -367,11 +367,21 @@ static void test_invalid_memory_region_flags(void)
        }
 
        if (supported_flags & KVM_MEM_GUEST_MEMFD) {
+               int guest_memfd = vm_create_guest_memfd(vm, MEM_REGION_SIZE, 0);
+
                r = __vm_set_user_memory_region2(vm, 0,
                                                 KVM_MEM_LOG_DIRTY_PAGES | KVM_MEM_GUEST_MEMFD,
-                                                0, MEM_REGION_SIZE, NULL, 0, 0);
+                                                0, MEM_REGION_SIZE, NULL, guest_memfd, 0);
                TEST_ASSERT(r && errno == EINVAL,
                            "KVM_SET_USER_MEMORY_REGION2 should have failed, dirty logging private memory is unsupported");
+
+               r = __vm_set_user_memory_region2(vm, 0,
+                                                KVM_MEM_READONLY | KVM_MEM_GUEST_MEMFD,
+                                                0, MEM_REGION_SIZE, NULL, guest_memfd, 0);
+               TEST_ASSERT(r && errno == EINVAL,
+                           "KVM_SET_USER_MEMORY_REGION2 should have failed, read-only GUEST_MEMFD memslots are unsupported");
+
+               close(guest_memfd);
        }
 }
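
A plausible reading of this fixup: the old call passed guest_memfd fd 0 alongside KVM_MEM_GUEST_MEMFD, so the ioctl could have failed on the bogus descriptor rather than on the flag combination under test. Creating a real guest_memfd pins the expected EINVAL on the flags themselves, and the same fd is reused for the new read-only case:

    /* Combinations exercised above, both expected to fail with EINVAL:
     *   KVM_MEM_LOG_DIRTY_PAGES | KVM_MEM_GUEST_MEMFD
     *       (dirty logging of private memory is unsupported)
     *   KVM_MEM_READONLY        | KVM_MEM_GUEST_MEMFD
     *       (read-only guest_memfd memslots are unsupported)
     */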
 
index 7f5b330b6a1b182f7a5890d3f7b67b7d2535accc..513d421a9bff85a96e619f187310769de7e48490 100644 (file)
@@ -108,7 +108,7 @@ static void enter_guest(struct kvm_vcpu *vcpu)
                        handle_abort(&uc);
                        return;
                default:
-                       TEST_ASSERT(0, "unhandled ucall %ld\n",
+                       TEST_ASSERT(0, "unhandled ucall %ld",
                                    get_ucall(vcpu, &uc));
                }
        }
index 11329e5ff945eb1c383c7237ebc349712734dd50..eae521f050e09fd6f69e896c03cf381ac05c4d71 100644 (file)
@@ -221,7 +221,7 @@ int main(int argc, char *argv[])
        vm_vaddr_t amx_cfg, tiledata, xstate;
        struct ucall uc;
        u32 amx_offset;
-       int stage, ret;
+       int ret;
 
        /*
         * Note, all off-by-default features must be enabled before anything
@@ -263,7 +263,7 @@ int main(int argc, char *argv[])
        memset(addr_gva2hva(vm, xstate), 0, PAGE_SIZE * DIV_ROUND_UP(XSAVE_SIZE, PAGE_SIZE));
        vcpu_args_set(vcpu, 3, amx_cfg, tiledata, xstate);
 
-       for (stage = 1; ; stage++) {
+       for (;;) {
                vcpu_run(vcpu);
                TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
 
@@ -296,7 +296,7 @@ int main(int argc, char *argv[])
                                void *tiles_data = (void *)addr_gva2hva(vm, tiledata);
                                /* Only check TMM0 register, 1 tile */
                                ret = memcmp(amx_start, tiles_data, TILE_SIZE);
-                               TEST_ASSERT(ret == 0, "memcmp failed, ret=%d\n", ret);
+                               TEST_ASSERT(ret == 0, "memcmp failed, ret=%d", ret);
                                kvm_x86_state_cleanup(state);
                                break;
                        case 9:
index 3b34d8156d1c97a879d497a12eefe4f110628973..8c579ce714e9a7ce3982123b089856c3b5963d43 100644 (file)
@@ -84,7 +84,7 @@ static void compare_cpuids(const struct kvm_cpuid2 *cpuid1,
 
                TEST_ASSERT(e1->function == e2->function &&
                            e1->index == e2->index && e1->flags == e2->flags,
-                           "CPUID entries[%d] mismtach: 0x%x.%d.%x vs. 0x%x.%d.%x\n",
+                           "CPUID entries[%d] mismatch: 0x%x.%d.%x vs. 0x%x.%d.%x",
                            i, e1->function, e1->index, e1->flags,
                            e2->function, e2->index, e2->flags);
 
@@ -170,7 +170,7 @@ static void test_get_cpuid2(struct kvm_vcpu *vcpu)
 
        vcpu_ioctl(vcpu, KVM_GET_CPUID2, cpuid);
        TEST_ASSERT(cpuid->nent == vcpu->cpuid->nent,
-                   "KVM didn't update nent on success, wanted %u, got %u\n",
+                   "KVM didn't update nent on success, wanted %u, got %u",
                    vcpu->cpuid->nent, cpuid->nent);
 
        for (i = 0; i < vcpu->cpuid->nent; i++) {
index 634c6bfcd5720717e0d4cc93e9cdf92ae96fb1d0..ee3b384b991c8be2957bdcf56fa52aa6a4a00a26 100644 (file)
@@ -92,7 +92,6 @@ static void run_test(enum vm_guest_mode mode, void *unused)
        uint64_t host_num_pages;
        uint64_t pages_per_slot;
        int i;
-       uint64_t total_4k_pages;
        struct kvm_page_stats stats_populated;
        struct kvm_page_stats stats_dirty_logging_enabled;
        struct kvm_page_stats stats_dirty_pass[ITERATIONS];
@@ -107,6 +106,9 @@ static void run_test(enum vm_guest_mode mode, void *unused)
        guest_num_pages = vm_adjust_num_guest_pages(mode, guest_num_pages);
        host_num_pages = vm_num_host_pages(mode, guest_num_pages);
        pages_per_slot = host_num_pages / SLOTS;
+       TEST_ASSERT_EQ(host_num_pages, pages_per_slot * SLOTS);
+       TEST_ASSERT(!(host_num_pages % 512),
+                   "Number of pages '%lu' is not a multiple of 2MiB", host_num_pages);
 
        bitmaps = memstress_alloc_bitmaps(SLOTS, pages_per_slot);
 
@@ -165,10 +167,8 @@ static void run_test(enum vm_guest_mode mode, void *unused)
        memstress_free_bitmaps(bitmaps, SLOTS);
        memstress_destroy_vm(vm);
 
-       /* Make assertions about the page counts. */
-       total_4k_pages = stats_populated.pages_4k;
-       total_4k_pages += stats_populated.pages_2m * 512;
-       total_4k_pages += stats_populated.pages_1g * 512 * 512;
+       TEST_ASSERT_EQ((stats_populated.pages_2m * 512 +
+                       stats_populated.pages_1g * 512 * 512), host_num_pages);
 
        /*
         * Check that all huge pages were split. Since large pages can only
@@ -180,19 +180,22 @@ static void run_test(enum vm_guest_mode mode, void *unused)
         */
        if (dirty_log_manual_caps) {
                TEST_ASSERT_EQ(stats_clear_pass[0].hugepages, 0);
-               TEST_ASSERT_EQ(stats_clear_pass[0].pages_4k, total_4k_pages);
+               TEST_ASSERT(stats_clear_pass[0].pages_4k >= host_num_pages,
+                           "Expected at least '%lu' 4KiB pages, found only '%lu'",
+                           host_num_pages, stats_clear_pass[0].pages_4k);
                TEST_ASSERT_EQ(stats_dirty_logging_enabled.hugepages, stats_populated.hugepages);
        } else {
                TEST_ASSERT_EQ(stats_dirty_logging_enabled.hugepages, 0);
-               TEST_ASSERT_EQ(stats_dirty_logging_enabled.pages_4k, total_4k_pages);
+               TEST_ASSERT(stats_dirty_logging_enabled.pages_4k >= host_num_pages,
+                           "Expected at least '%lu' 4KiB pages, found only '%lu'",
+                           host_num_pages, stats_dirty_logging_enabled.pages_4k);
        }
 
        /*
         * Once dirty logging is disabled and the vCPUs have touched all their
-        * memory again, the page counts should be the same as they were
+        * memory again, the hugepage counts should be the same as they were
         * right after initial population of memory.
         */
-       TEST_ASSERT_EQ(stats_populated.pages_4k, stats_repopulated.pages_4k);
        TEST_ASSERT_EQ(stats_populated.pages_2m, stats_repopulated.pages_2m);
        TEST_ASSERT_EQ(stats_populated.pages_1g, stats_repopulated.pages_1g);
 }
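
The 512 factors in the new assertions follow directly from the x86 page-size ratios; a short worked check:

    /* With 4KiB base pages:
     *   2MiB / 4KiB = 512            -> one 2M mapping splits into 512
     *   1GiB / 4KiB = 512 * 512      -> one 1G mapping splits into 262144
     * Hence the preconditions above: host_num_pages must be a multiple of
     * 512, and pages_2m * 512 + pages_1g * 512 * 512 must equal
     * host_num_pages when the region is fully hugepage-backed. After
     * splitting, the 4KiB count is checked with ">=" rather than "==",
     * presumably because mappings outside the test slots can contribute
     * additional 4KiB pages.
     */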
index 0a1573d52882b7b127307a829b0d1dc4d1af6560..37b1a9f5286447a1bb4a8c8f7a69c807ac0d6ced 100644 (file)
@@ -41,7 +41,7 @@ static inline void handle_flds_emulation_failure_exit(struct kvm_vcpu *vcpu)
 
        insn_bytes = run->emulation_failure.insn_bytes;
        TEST_ASSERT(insn_bytes[0] == 0xd9 && insn_bytes[1] == 0,
-                   "Expected 'flds [eax]', opcode '0xd9 0x00', got opcode 0x%02x 0x%02x\n",
+                   "Expected 'flds [eax]', opcode '0xd9 0x00', got opcode 0x%02x 0x%02x",
                    insn_bytes[0], insn_bytes[1]);
 
        vcpu_regs_get(vcpu, &regs);
index f5e1e98f04f9ef0a00f3d80ba8a0bd94f90feffd..e058bc676cd6930d54593093579fff15e1930d91 100644 (file)
@@ -212,6 +212,7 @@ int main(void)
        int stage;
 
        TEST_REQUIRE(kvm_has_cap(KVM_CAP_HYPERV_TIME));
+       TEST_REQUIRE(sys_clocksource_is_based_on_tsc());
 
        vm = vm_create_with_one_vcpu(&vcpu, guest_main);
 
@@ -220,7 +221,7 @@ int main(void)
        tsc_page_gva = vm_vaddr_alloc_page(vm);
        memset(addr_gva2hva(vm, tsc_page_gva), 0x0, getpagesize());
        TEST_ASSERT((addr_gva2gpa(vm, tsc_page_gva) & (getpagesize() - 1)) == 0,
-               "TSC page has to be page aligned\n");
+               "TSC page has to be page aligned");
        vcpu_args_set(vcpu, 2, tsc_page_gva, addr_gva2gpa(vm, tsc_page_gva));
 
        host_check_tsc_msr_rdtsc(vcpu);
@@ -237,7 +238,7 @@ int main(void)
                        break;
                case UCALL_DONE:
                        /* Keep in sync with guest_main() */
-                       TEST_ASSERT(stage == 11, "Testing ended prematurely, stage %d\n",
+                       TEST_ASSERT(stage == 11, "Testing ended prematurely, stage %d",
                                    stage);
                        goto out;
                default:
index 4f4193fc74ffa29454c193ed741603c5fef1004b..b923a285e96f9492108ac17c21a070a4dd4ffa61 100644 (file)
@@ -454,7 +454,7 @@ static void guest_test_msrs_access(void)
                case 44:
                        /* MSR is not available when CPUID feature bit is unset */
                        if (!has_invtsc)
-                               continue;
+                               goto next_stage;
                        msr->idx = HV_X64_MSR_TSC_INVARIANT_CONTROL;
                        msr->write = false;
                        msr->fault_expected = true;
@@ -462,7 +462,7 @@ static void guest_test_msrs_access(void)
                case 45:
                        /* MSR is available when CPUID feature bit is set */
                        if (!has_invtsc)
-                               continue;
+                               goto next_stage;
                        vcpu_set_cpuid_feature(vcpu, HV_ACCESS_TSC_INVARIANT);
                        msr->idx = HV_X64_MSR_TSC_INVARIANT_CONTROL;
                        msr->write = false;
@@ -471,7 +471,7 @@ static void guest_test_msrs_access(void)
                case 46:
                        /* Writing bits other than 0 is forbidden */
                        if (!has_invtsc)
-                               continue;
+                               goto next_stage;
                        msr->idx = HV_X64_MSR_TSC_INVARIANT_CONTROL;
                        msr->write = true;
                        msr->write_val = 0xdeadbeef;
@@ -480,7 +480,7 @@ static void guest_test_msrs_access(void)
                case 47:
                        /* Setting bit 0 enables the feature */
                        if (!has_invtsc)
-                               continue;
+                               goto next_stage;
                        msr->idx = HV_X64_MSR_TSC_INVARIANT_CONTROL;
                        msr->write = true;
                        msr->write_val = 1;
@@ -513,6 +513,7 @@ static void guest_test_msrs_access(void)
                        return;
                }
 
+next_stage:
                stage++;
                kvm_vm_free(vm);
        }
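
The switch from continue to goto next_stage matters because stage++ and kvm_vm_free(vm) now sit under the label at the bottom of the loop: continue jumped straight back to the top, so without invariant TSC the test apparently re-entered case 44 forever, skipping kvm_vm_free() along the way. The bug in miniature (hypothetical names, not the real test):

    #include <stdbool.h>

    static void run_stages(bool has_invtsc)
    {
            int stage = 44;

            while (stage < 48) {
                    if (!has_invtsc) {
                            /* Old code did `continue;` here, which skipped
                             * the increment below and looped forever. */
                            goto next_stage;
                    }
                    /* ... per-stage MSR checks ... */
    next_stage:
                    stage++;
            }
    }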
index 65e5f4c05068a8fff78caf386f76ef107ac407f2..f1617762c22fecaff954824c3ca09cebec1eb78d 100644 (file)
@@ -289,7 +289,7 @@ int main(int argc, char *argv[])
                switch (get_ucall(vcpu[0], &uc)) {
                case UCALL_SYNC:
                        TEST_ASSERT(uc.args[1] == stage,
-                                   "Unexpected stage: %ld (%d expected)\n",
+                                   "Unexpected stage: %ld (%d expected)",
                                    uc.args[1], stage);
                        break;
                case UCALL_DONE:
index c4443f71f8dd01f6aafde337c618d414c61c1ce3..05b56095cf76f6b8857f7f83bccdf68b3787d759 100644 (file)
@@ -658,7 +658,7 @@ int main(int argc, char *argv[])
                switch (get_ucall(vcpu[0], &uc)) {
                case UCALL_SYNC:
                        TEST_ASSERT(uc.args[1] == stage,
-                                   "Unexpected stage: %ld (%d expected)\n",
+                                   "Unexpected stage: %ld (%d expected)",
                                    uc.args[1], stage);
                        break;
                case UCALL_ABORT:
index 1778704360a6634ee7df3dfacd8f63fd3d236cab..5bc12222d87af696dc2f35e8874c4eb3f58a4f7e 100644 (file)
@@ -92,7 +92,7 @@ static void setup_clock(struct kvm_vm *vm, struct test_case *test_case)
                                break;
                } while (errno == EINTR);
 
-               TEST_ASSERT(!r, "clock_gettime() failed: %d\n", r);
+               TEST_ASSERT(!r, "clock_gettime() failed: %d", r);
 
                data.realtime = ts.tv_sec * NSEC_PER_SEC;
                data.realtime += ts.tv_nsec;
@@ -127,47 +127,11 @@ static void enter_guest(struct kvm_vcpu *vcpu)
                        handle_abort(&uc);
                        return;
                default:
-                       TEST_ASSERT(0, "unhandled ucall: %ld\n", uc.cmd);
+                       TEST_ASSERT(0, "unhandled ucall: %ld", uc.cmd);
                }
        }
 }
 
-#define CLOCKSOURCE_PATH "/sys/devices/system/clocksource/clocksource0/current_clocksource"
-
-static void check_clocksource(void)
-{
-       char *clk_name;
-       struct stat st;
-       FILE *fp;
-
-       fp = fopen(CLOCKSOURCE_PATH, "r");
-       if (!fp) {
-               pr_info("failed to open clocksource file: %d; assuming TSC.\n",
-                       errno);
-               return;
-       }
-
-       if (fstat(fileno(fp), &st)) {
-               pr_info("failed to stat clocksource file: %d; assuming TSC.\n",
-                       errno);
-               goto out;
-       }
-
-       clk_name = malloc(st.st_size);
-       TEST_ASSERT(clk_name, "failed to allocate buffer to read file\n");
-
-       if (!fgets(clk_name, st.st_size, fp)) {
-               pr_info("failed to read clocksource file: %d; assuming TSC.\n",
-                       ferror(fp));
-               goto out;
-       }
-
-       TEST_ASSERT(!strncmp(clk_name, "tsc\n", st.st_size),
-                   "clocksource not supported: %s", clk_name);
-out:
-       fclose(fp);
-}
-
 int main(void)
 {
        struct kvm_vcpu *vcpu;
@@ -179,7 +143,7 @@ int main(void)
        flags = kvm_check_cap(KVM_CAP_ADJUST_CLOCK);
        TEST_REQUIRE(flags & KVM_CLOCK_REALTIME);
 
-       check_clocksource();
+       TEST_REQUIRE(sys_clocksource_is_based_on_tsc());
 
        vm = vm_create_with_one_vcpu(&vcpu, guest_main);
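
Beyond deduplication, this is a behavioral fix: the removed check_clocksource() failed the test outright (TEST_ASSERT) on a non-TSC clocksource, whereas TEST_REQUIRE skips it, the appropriate outcome for an environmental limitation:

    /* Old: TEST_ASSERT(!strncmp(clk_name, "tsc\n", ...))  -> hard failure
     * New: TEST_REQUIRE(sys_clocksource_is_based_on_tsc()) -> clean skip
     * (TEST_REQUIRE exiting with the kselftest skip code is assumed from
     * its use here; its definition is not part of this diff.)
     */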
 
index 83e25bccc139decff79249e99a336ef2cb8cc820..17bbb96fc4dfcbc25e2ed62e5010d2d71a9d1746 100644 (file)
@@ -257,9 +257,9 @@ int main(int argc, char **argv)
        TEST_REQUIRE(kvm_has_cap(KVM_CAP_VM_DISABLE_NX_HUGE_PAGES));
 
        __TEST_REQUIRE(token == MAGIC_TOKEN,
-                      "This test must be run with the magic token %d.\n"
-                      "This is done by nx_huge_pages_test.sh, which\n"
-                      "also handles environment setup for the test.", MAGIC_TOKEN);
+                      "This test must be run with the magic token via '-t %d'.\n"
+                      "Running via nx_huge_pages_test.sh, which also handles "
+                      "environment setup, is strongly recommended.", MAGIC_TOKEN);
 
        run_test(reclaim_period_ms, false, reboot_permissions);
        run_test(reclaim_period_ms, true, reboot_permissions);
index c9a07963d68aaedcc6b45d4e00e932e5981645b0..87011965dc41664d46bb285fa5dd408876fe20c2 100644 (file)
@@ -44,7 +44,7 @@ static void test_msr_platform_info_enabled(struct kvm_vcpu *vcpu)
 
        get_ucall(vcpu, &uc);
        TEST_ASSERT(uc.cmd == UCALL_SYNC,
-                       "Received ucall other than UCALL_SYNC: %lu\n", uc.cmd);
+                       "Received ucall other than UCALL_SYNC: %lu", uc.cmd);
        TEST_ASSERT((uc.args[1] & MSR_PLATFORM_INFO_MAX_TURBO_RATIO) ==
                MSR_PLATFORM_INFO_MAX_TURBO_RATIO,
                "Expected MSR_PLATFORM_INFO to have max turbo ratio mask: %i.",
index 283cc55597a4fe28c02197ea18c269b9294a16b2..a3bd54b925abaf10331886385f983001c47ebb06 100644 (file)
@@ -866,7 +866,7 @@ static void __test_fixed_counter_bitmap(struct kvm_vcpu *vcpu, uint8_t idx,
         * userspace doesn't set any pmu filter.
         */
        count = run_vcpu_to_sync(vcpu);
-       TEST_ASSERT(count, "Unexpected count value: %ld\n", count);
+       TEST_ASSERT(count, "Unexpected count value: %ld", count);
 
        for (i = 0; i < BIT(nr_fixed_counters); i++) {
                bitmap = BIT(i);
index c7ef97561038e8bc297324387160990f170f53a9..a49828adf2949464b668f530a977e9efd9d75888 100644 (file)
@@ -91,7 +91,7 @@ static void sev_migrate_from(struct kvm_vm *dst, struct kvm_vm *src)
        int ret;
 
        ret = __sev_migrate_from(dst, src);
-       TEST_ASSERT(!ret, "Migration failed, ret: %d, errno: %d\n", ret, errno);
+       TEST_ASSERT(!ret, "Migration failed, ret: %d, errno: %d", ret, errno);
 }
 
 static void test_sev_migrate_from(bool es)
@@ -113,7 +113,7 @@ static void test_sev_migrate_from(bool es)
        /* Migrate the guest back to the original VM. */
        ret = __sev_migrate_from(src_vm, dst_vms[NR_MIGRATE_TEST_VMS - 1]);
        TEST_ASSERT(ret == -1 && errno == EIO,
-                   "VM that was migrated from should be dead. ret %d, errno: %d\n", ret,
+                   "VM that was migrated from should be dead. ret %d, errno: %d", ret,
                    errno);
 
        kvm_vm_free(src_vm);
@@ -172,7 +172,7 @@ static void test_sev_migrate_parameters(void)
        vm_no_sev = aux_vm_create(true);
        ret = __sev_migrate_from(vm_no_vcpu, vm_no_sev);
        TEST_ASSERT(ret == -1 && errno == EINVAL,
-                   "Migrations require SEV enabled. ret %d, errno: %d\n", ret,
+                   "Migrations require SEV enabled. ret %d, errno: %d", ret,
                    errno);
 
        if (!have_sev_es)
@@ -187,25 +187,25 @@ static void test_sev_migrate_parameters(void)
        ret = __sev_migrate_from(sev_vm, sev_es_vm);
        TEST_ASSERT(
                ret == -1 && errno == EINVAL,
-               "Should not be able migrate to SEV enabled VM. ret: %d, errno: %d\n",
+               "Should not be able to migrate to SEV enabled VM. ret: %d, errno: %d",
                ret, errno);
 
        ret = __sev_migrate_from(sev_es_vm, sev_vm);
        TEST_ASSERT(
                ret == -1 && errno == EINVAL,
-               "Should not be able migrate to SEV-ES enabled VM. ret: %d, errno: %d\n",
+               "Should not be able to migrate to SEV-ES enabled VM. ret: %d, errno: %d",
                ret, errno);
 
        ret = __sev_migrate_from(vm_no_vcpu, sev_es_vm);
        TEST_ASSERT(
                ret == -1 && errno == EINVAL,
-               "SEV-ES migrations require same number of vCPUS. ret: %d, errno: %d\n",
+               "SEV-ES migrations require same number of vCPUS. ret: %d, errno: %d",
                ret, errno);
 
        ret = __sev_migrate_from(vm_no_vcpu, sev_es_vm_no_vmsa);
        TEST_ASSERT(
                ret == -1 && errno == EINVAL,
-               "SEV-ES migrations require UPDATE_VMSA. ret %d, errno: %d\n",
+               "SEV-ES migrations require UPDATE_VMSA. ret %d, errno: %d",
                ret, errno);
 
        kvm_vm_free(sev_vm);
@@ -227,7 +227,7 @@ static void sev_mirror_create(struct kvm_vm *dst, struct kvm_vm *src)
        int ret;
 
        ret = __sev_mirror_create(dst, src);
-       TEST_ASSERT(!ret, "Copying context failed, ret: %d, errno: %d\n", ret, errno);
+       TEST_ASSERT(!ret, "Copying context failed, ret: %d, errno: %d", ret, errno);
 }
 
 static void verify_mirror_allowed_cmds(int vm_fd)
@@ -259,7 +259,7 @@ static void verify_mirror_allowed_cmds(int vm_fd)
                ret = __sev_ioctl(vm_fd, cmd_id, NULL, &fw_error);
                TEST_ASSERT(
                        ret == -1 && errno == EINVAL,
-                       "Should not be able call command: %d. ret: %d, errno: %d\n",
+                       "Should not be able to call command: %d. ret: %d, errno: %d",
                        cmd_id, ret, errno);
        }
 
@@ -301,18 +301,18 @@ static void test_sev_mirror_parameters(void)
        ret = __sev_mirror_create(sev_vm, sev_vm);
        TEST_ASSERT(
                ret == -1 && errno == EINVAL,
-               "Should not be able copy context to self. ret: %d, errno: %d\n",
+               "Should not be able to copy context to self. ret: %d, errno: %d",
                ret, errno);
 
        ret = __sev_mirror_create(vm_no_vcpu, vm_with_vcpu);
        TEST_ASSERT(ret == -1 && errno == EINVAL,
-                   "Copy context requires SEV enabled. ret %d, errno: %d\n", ret,
+                   "Copy context requires SEV enabled. ret %d, errno: %d", ret,
                    errno);
 
        ret = __sev_mirror_create(vm_with_vcpu, sev_vm);
        TEST_ASSERT(
                ret == -1 && errno == EINVAL,
-               "SEV copy context requires no vCPUS on the destination. ret: %d, errno: %d\n",
+               "SEV copy context requires no vCPUS on the destination. ret: %d, errno: %d",
                ret, errno);
 
        if (!have_sev_es)
@@ -322,13 +322,13 @@ static void test_sev_mirror_parameters(void)
        ret = __sev_mirror_create(sev_vm, sev_es_vm);
        TEST_ASSERT(
                ret == -1 && errno == EINVAL,
-               "Should not be able copy context to SEV enabled VM. ret: %d, errno: %d\n",
+               "Should not be able to copy context to SEV enabled VM. ret: %d, errno: %d",
                ret, errno);
 
        ret = __sev_mirror_create(sev_es_vm, sev_vm);
        TEST_ASSERT(
                ret == -1 && errno == EINVAL,
-               "Should not be able copy context to SEV-ES enabled VM. ret: %d, errno: %d\n",
+               "Should not be able to copy context to SEV-ES enabled VM. ret: %d, errno: %d",
                ret, errno);
 
        kvm_vm_free(sev_es_vm);
index 06edf00a97d61dc3ca265926c2043d839daddeeb..1a46dd7bb39136e51b1e021be08667b84c2fe001 100644 (file)
@@ -74,7 +74,7 @@ int main(int argc, char *argv[])
                                    MEM_REGION_SIZE / PAGE_SIZE, 0);
        gpa = vm_phy_pages_alloc(vm, MEM_REGION_SIZE / PAGE_SIZE,
                                 MEM_REGION_GPA, MEM_REGION_SLOT);
-       TEST_ASSERT(gpa == MEM_REGION_GPA, "Failed vm_phy_pages_alloc\n");
+       TEST_ASSERT(gpa == MEM_REGION_GPA, "Failed vm_phy_pages_alloc");
        virt_map(vm, MEM_REGION_GVA, MEM_REGION_GPA, 1);
        hva = addr_gpa2hva(vm, MEM_REGION_GPA);
        memset(hva, 0, PAGE_SIZE);
@@ -102,7 +102,7 @@ int main(int argc, char *argv[])
        case UCALL_DONE:
                break;
        default:
-               TEST_FAIL("Unrecognized ucall: %lu\n", uc.cmd);
+               TEST_FAIL("Unrecognized ucall: %lu", uc.cmd);
        }
 
        kvm_vm_free(vm);
index 00965ba33f730c2a443773dc89e41d465c39beeb..a91b5b145fa35ac81dfaa225ff350e4186a229d0 100644 (file)
@@ -46,7 +46,7 @@ static void compare_regs(struct kvm_regs *left, struct kvm_regs *right)
 #define REG_COMPARE(reg) \
        TEST_ASSERT(left->reg == right->reg, \
                    "Register " #reg \
-                   " values did not match: 0x%llx, 0x%llx\n", \
+                   " values did not match: 0x%llx, 0x%llx", \
                    left->reg, right->reg)
        REG_COMPARE(rax);
        REG_COMPARE(rbx);
@@ -230,14 +230,14 @@ int main(int argc, char *argv[])
        run->kvm_valid_regs = INVALID_SYNC_FIELD;
        rv = _vcpu_run(vcpu);
        TEST_ASSERT(rv < 0 && errno == EINVAL,
-                   "Invalid kvm_valid_regs did not cause expected KVM_RUN error: %d\n",
+                   "Invalid kvm_valid_regs did not cause expected KVM_RUN error: %d",
                    rv);
        run->kvm_valid_regs = 0;
 
        run->kvm_valid_regs = INVALID_SYNC_FIELD | TEST_SYNC_FIELDS;
        rv = _vcpu_run(vcpu);
        TEST_ASSERT(rv < 0 && errno == EINVAL,
-                   "Invalid kvm_valid_regs did not cause expected KVM_RUN error: %d\n",
+                   "Invalid kvm_valid_regs did not cause expected KVM_RUN error: %d",
                    rv);
        run->kvm_valid_regs = 0;
 
@@ -245,14 +245,14 @@ int main(int argc, char *argv[])
        run->kvm_dirty_regs = INVALID_SYNC_FIELD;
        rv = _vcpu_run(vcpu);
        TEST_ASSERT(rv < 0 && errno == EINVAL,
-                   "Invalid kvm_dirty_regs did not cause expected KVM_RUN error: %d\n",
+                   "Invalid kvm_dirty_regs did not cause expected KVM_RUN error: %d",
                    rv);
        run->kvm_dirty_regs = 0;
 
        run->kvm_dirty_regs = INVALID_SYNC_FIELD | TEST_SYNC_FIELDS;
        rv = _vcpu_run(vcpu);
        TEST_ASSERT(rv < 0 && errno == EINVAL,
-                   "Invalid kvm_dirty_regs did not cause expected KVM_RUN error: %d\n",
+                   "Invalid kvm_dirty_regs did not cause expected KVM_RUN error: %d",
                    rv);
        run->kvm_dirty_regs = 0;
 
index 0ed32ec903d03548ce11fa5bcc42eba329808506..dcbb3c29fb8e9f82b9dce0b222111f8a8eb4776a 100644 (file)
@@ -143,7 +143,7 @@ static void run_vcpu_expect_gp(struct kvm_vcpu *vcpu)
 
        TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
        TEST_ASSERT(get_ucall(vcpu, &uc) == UCALL_SYNC,
-                   "Expect UCALL_SYNC\n");
+                   "Expect UCALL_SYNC");
        TEST_ASSERT(uc.args[1] == SYNC_GP, "#GP is expected.");
        printf("vCPU received GP in guest.\n");
 }
@@ -188,7 +188,7 @@ static void *run_ucna_injection(void *arg)
 
        TEST_ASSERT_KVM_EXIT_REASON(params->vcpu, KVM_EXIT_IO);
        TEST_ASSERT(get_ucall(params->vcpu, &uc) == UCALL_SYNC,
-                   "Expect UCALL_SYNC\n");
+                   "Expect UCALL_SYNC");
        TEST_ASSERT(uc.args[1] == SYNC_FIRST_UCNA, "Injecting first UCNA.");
 
        printf("Injecting first UCNA at %#x.\n", FIRST_UCNA_ADDR);
@@ -198,7 +198,7 @@ static void *run_ucna_injection(void *arg)
 
        TEST_ASSERT_KVM_EXIT_REASON(params->vcpu, KVM_EXIT_IO);
        TEST_ASSERT(get_ucall(params->vcpu, &uc) == UCALL_SYNC,
-                   "Expect UCALL_SYNC\n");
+                   "Expect UCALL_SYNC");
        TEST_ASSERT(uc.args[1] == SYNC_SECOND_UCNA, "Injecting second UCNA.");
 
        printf("Injecting second UCNA at %#x.\n", SECOND_UCNA_ADDR);
@@ -208,7 +208,7 @@ static void *run_ucna_injection(void *arg)
 
        TEST_ASSERT_KVM_EXIT_REASON(params->vcpu, KVM_EXIT_IO);
        if (get_ucall(params->vcpu, &uc) == UCALL_ABORT) {
-               TEST_ASSERT(false, "vCPU assertion failure: %s.\n",
+               TEST_ASSERT(false, "vCPU assertion failure: %s.",
                            (const char *)uc.args[0]);
        }
 
index 255c50b0dc32675dfef64e65e1a0c6b164d0ca39..9481cbcf284f69b662a2f9f1a5fb3ff304ab0605 100644 (file)
@@ -71,7 +71,7 @@ int main(int argc, char *argv[])
                        break;
 
                TEST_ASSERT(run->io.port == 0x80,
-                           "Expected I/O at port 0x80, got port 0x%x\n", run->io.port);
+                           "Expected I/O at port 0x80, got port 0x%x", run->io.port);
 
                /*
                 * Modify the rep string count in RCX: 2 => 1 and 3 => 8192.
index 2bed5fb3a0d6e51aa63f9732d77dac330378d2c8..a81a24761aac072a0359305826712a917ed06bba 100644 (file)
@@ -99,7 +99,7 @@ int main(int argc, char *argv[])
                        TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_INTERNAL_ERROR);
                        TEST_ASSERT(run->internal.suberror ==
                                    KVM_INTERNAL_ERROR_EMULATION,
-                                   "Got internal suberror other than KVM_INTERNAL_ERROR_EMULATION: %u\n",
+                                   "Got internal suberror other than KVM_INTERNAL_ERROR_EMULATION: %u",
                                    run->internal.suberror);
                        break;
                }
index e4ad5fef52ffc5b08ec8d3f445b518b52229e672..7f6f5f23fb9b67fcb186a0e9c9ad00aaced6d2d3 100644 (file)
@@ -128,17 +128,17 @@ int main(int argc, char *argv[])
                         */
                        kvm_vm_get_dirty_log(vm, TEST_MEM_SLOT_INDEX, bmap);
                        if (uc.args[1]) {
-                               TEST_ASSERT(test_bit(0, bmap), "Page 0 incorrectly reported clean\n");
-                               TEST_ASSERT(host_test_mem[0] == 1, "Page 0 not written by guest\n");
+                               TEST_ASSERT(test_bit(0, bmap), "Page 0 incorrectly reported clean");
+                               TEST_ASSERT(host_test_mem[0] == 1, "Page 0 not written by guest");
                        } else {
-                               TEST_ASSERT(!test_bit(0, bmap), "Page 0 incorrectly reported dirty\n");
-                               TEST_ASSERT(host_test_mem[0] == 0xaaaaaaaaaaaaaaaaULL, "Page 0 written by guest\n");
+                               TEST_ASSERT(!test_bit(0, bmap), "Page 0 incorrectly reported dirty");
+                               TEST_ASSERT(host_test_mem[0] == 0xaaaaaaaaaaaaaaaaULL, "Page 0 written by guest");
                        }
 
-                       TEST_ASSERT(!test_bit(1, bmap), "Page 1 incorrectly reported dirty\n");
-                       TEST_ASSERT(host_test_mem[4096 / 8] == 0xaaaaaaaaaaaaaaaaULL, "Page 1 written by guest\n");
-                       TEST_ASSERT(!test_bit(2, bmap), "Page 2 incorrectly reported dirty\n");
-                       TEST_ASSERT(host_test_mem[8192 / 8] == 0xaaaaaaaaaaaaaaaaULL, "Page 2 written by guest\n");
+                       TEST_ASSERT(!test_bit(1, bmap), "Page 1 incorrectly reported dirty");
+                       TEST_ASSERT(host_test_mem[4096 / 8] == 0xaaaaaaaaaaaaaaaaULL, "Page 1 written by guest");
+                       TEST_ASSERT(!test_bit(2, bmap), "Page 2 incorrectly reported dirty");
+                       TEST_ASSERT(host_test_mem[8192 / 8] == 0xaaaaaaaaaaaaaaaaULL, "Page 2 written by guest");
                        break;
                case UCALL_DONE:
                        done = true;
index a9b827c69f32c5f96548d040c72d979166041f19..fad3634fd9eb62e34ded1c3f59b1d38a0b61d03a 100644 (file)
@@ -28,7 +28,7 @@ static void __run_vcpu_with_invalid_state(struct kvm_vcpu *vcpu)
 
        TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_INTERNAL_ERROR);
        TEST_ASSERT(run->emulation_failure.suberror == KVM_INTERNAL_ERROR_EMULATION,
-                   "Expected emulation failure, got %d\n",
+                   "Expected emulation failure, got %d",
                    run->emulation_failure.suberror);
 }
 
index e710b6e7fb384aac124ad0c7f646a872b347ad51..1759fa5cb3f29c337a09a5d15569ff028067e2db 100644 (file)
@@ -116,23 +116,6 @@ static void l1_guest_code(struct vmx_pages *vmx_pages)
        GUEST_DONE();
 }
 
-static bool system_has_stable_tsc(void)
-{
-       bool tsc_is_stable;
-       FILE *fp;
-       char buf[4];
-
-       fp = fopen("/sys/devices/system/clocksource/clocksource0/current_clocksource", "r");
-       if (fp == NULL)
-               return false;
-
-       tsc_is_stable = fgets(buf, sizeof(buf), fp) &&
-                       !strncmp(buf, "tsc", sizeof(buf));
-
-       fclose(fp);
-       return tsc_is_stable;
-}
-
 int main(int argc, char *argv[])
 {
        struct kvm_vcpu *vcpu;
@@ -148,7 +131,7 @@ int main(int argc, char *argv[])
 
        TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_VMX));
        TEST_REQUIRE(kvm_has_cap(KVM_CAP_TSC_CONTROL));
-       TEST_REQUIRE(system_has_stable_tsc());
+       TEST_REQUIRE(sys_clocksource_is_based_on_tsc());
 
        /*
         * We set L1's scale factor to be a random number from 2 to 10.
index 67ac2a3292efd4e5a4ff24073af25849ce375dd9..725c206ba0b92bc9d073dcbc7d8403cb7d2f0bd1 100644 (file)
@@ -216,7 +216,7 @@ static void *vcpu_thread(void *arg)
                            "Halting vCPU halted %lu times, woke %lu times, received %lu IPIs.\n"
                            "Halter TPR=%#x PPR=%#x LVR=%#x\n"
                            "Migrations attempted: %lu\n"
-                           "Migrations completed: %lu\n",
+                           "Migrations completed: %lu",
                            vcpu->id, (const char *)uc.args[0],
                            params->data->ipis_sent, params->data->hlt_count,
                            params->data->wake_count,
@@ -288,7 +288,7 @@ void do_migrations(struct test_data_page *data, int run_secs, int delay_usecs,
        }
 
        TEST_ASSERT(nodes > 1,
-                   "Did not find at least 2 numa nodes. Can't do migration\n");
+                   "Did not find at least 2 NUMA nodes. Can't do migration");
 
        fprintf(stderr, "Migrating amongst %d nodes found\n", nodes);
 
@@ -347,7 +347,7 @@ void do_migrations(struct test_data_page *data, int run_secs, int delay_usecs,
                                    wake_count != data->wake_count,
                                    "IPI, HLT and wake count have not increased "
                                    "in the last %lu seconds. "
-                                   "HLTer is likely hung.\n", interval_secs);
+                                   "HLTer is likely hung.", interval_secs);
 
                        ipis_sent = data->ipis_sent;
                        hlt_count = data->hlt_count;
@@ -381,7 +381,7 @@ void get_cmdline_args(int argc, char *argv[], int *run_secs,
                                    "-m adds calls to migrate_pages while vCPUs are running."
                                    " Default is no migrations.\n"
                                    "-d <delay microseconds> - delay between migrate_pages() calls."
-                                   " Default is %d microseconds.\n",
+                                   " Default is %d microseconds.",
                                    DEFAULT_RUN_SECS, DEFAULT_DELAY_USECS);
                }
        }
index dc6217440db3ae193fd9bfbcfbaadea223e57c03..25a0b0db5c3c9dfac6819de37e8c7d0a541935fb 100644 (file)
@@ -116,7 +116,7 @@ int main(int argc, char *argv[])
                vcpu_run(vcpu);
 
                TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
-                           "Unexpected exit reason: %u (%s),\n",
+                           "Unexpected exit reason: %u (%s)",
                            run->exit_reason,
                            exit_reason_str(run->exit_reason));
 
index e0ddf47362e773fbba85169b6c517ac505a3077b..167c97abff1b816dd9792b19f71c58847f433f78 100644 (file)
@@ -29,7 +29,7 @@ int main(int argc, char *argv[])
 
        xss_val = vcpu_get_msr(vcpu, MSR_IA32_XSS);
        TEST_ASSERT(xss_val == 0,
-                   "MSR_IA32_XSS should be initialized to zero\n");
+                   "MSR_IA32_XSS should be initialized to zero");
 
        vcpu_set_msr(vcpu, MSR_IA32_XSS, xss_val);
 
index 5b79758cae627593c68b9fd465451efcf7b75f9f..e64bbdf0e86eac8bf1751ee287508aca1a7ed27c 100644 (file)
@@ -9,6 +9,7 @@
 
 #include <errno.h>
 #include <linux/landlock.h>
+#include <linux/securebits.h>
 #include <sys/capability.h>
 #include <sys/socket.h>
 #include <sys/syscall.h>
@@ -115,11 +116,16 @@ static void _init_caps(struct __test_metadata *const _metadata, bool drop_all)
                /* clang-format off */
                CAP_DAC_OVERRIDE,
                CAP_MKNOD,
+               CAP_NET_ADMIN,
+               CAP_NET_BIND_SERVICE,
                CAP_SYS_ADMIN,
                CAP_SYS_CHROOT,
-               CAP_NET_BIND_SERVICE,
                /* clang-format on */
        };
+       const unsigned int noroot = SECBIT_NOROOT | SECBIT_NOROOT_LOCKED;
+
+       if ((cap_get_secbits() & noroot) != noroot)
+               EXPECT_EQ(0, cap_set_secbits(noroot));
 
        cap_p = cap_get_proc();
        EXPECT_NE(NULL, cap_p)
@@ -137,6 +143,8 @@ static void _init_caps(struct __test_metadata *const _metadata, bool drop_all)
                        TH_LOG("Failed to cap_set_flag: %s", strerror(errno));
                }
        }
+
+       /* Automatically resets ambient capabilities. */
        EXPECT_NE(-1, cap_set_proc(cap_p))
        {
                TH_LOG("Failed to cap_set_proc: %s", strerror(errno));
@@ -145,6 +153,9 @@ static void _init_caps(struct __test_metadata *const _metadata, bool drop_all)
        {
                TH_LOG("Failed to cap_free: %s", strerror(errno));
        }
+
+       /* Quickly checks that ambient capabilities are cleared. */
+       EXPECT_NE(-1, cap_get_ambient(caps[0]));
 }
 
 /* We cannot put such helpers in a library because of kselftest_harness.h . */
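
For context on the new securebits call in _init_caps(): per capabilities(7), SECBIT_NOROOT stops execve() from granting the full capability set to uid-0 processes, and SECBIT_NOROOT_LOCKED prevents the bit from being cleared again. Presumably this keeps helpers spawned by the tests from silently regaining privileges just because they run as root, so the ambient-capability plumbing below is the only way capabilities propagate:

    /* Securebits set in _init_caps() above (see capabilities(7)):
     *   SECBIT_NOROOT        -> execve() no longer grants full caps to
     *                           uid-0 processes
     *   SECBIT_NOROOT_LOCKED -> the bit can no longer be cleared
     */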
@@ -158,8 +169,9 @@ static void __maybe_unused drop_caps(struct __test_metadata *const _metadata)
        _init_caps(_metadata, true);
 }
 
-static void _effective_cap(struct __test_metadata *const _metadata,
-                          const cap_value_t caps, const cap_flag_value_t value)
+static void _change_cap(struct __test_metadata *const _metadata,
+                       const cap_flag_t flag, const cap_value_t cap,
+                       const cap_flag_value_t value)
 {
        cap_t cap_p;
 
@@ -168,7 +180,7 @@ static void _effective_cap(struct __test_metadata *const _metadata,
        {
                TH_LOG("Failed to cap_get_proc: %s", strerror(errno));
        }
-       EXPECT_NE(-1, cap_set_flag(cap_p, CAP_EFFECTIVE, 1, &caps, value))
+       EXPECT_NE(-1, cap_set_flag(cap_p, flag, 1, &cap, value))
        {
                TH_LOG("Failed to cap_set_flag: %s", strerror(errno));
        }
@@ -183,15 +195,35 @@ static void _effective_cap(struct __test_metadata *const _metadata,
 }
 
 static void __maybe_unused set_cap(struct __test_metadata *const _metadata,
-                                  const cap_value_t caps)
+                                  const cap_value_t cap)
 {
-       _effective_cap(_metadata, caps, CAP_SET);
+       _change_cap(_metadata, CAP_EFFECTIVE, cap, CAP_SET);
 }
 
 static void __maybe_unused clear_cap(struct __test_metadata *const _metadata,
-                                    const cap_value_t caps)
+                                    const cap_value_t cap)
+{
+       _change_cap(_metadata, CAP_EFFECTIVE, cap, CAP_CLEAR);
+}
+
+static void __maybe_unused
+set_ambient_cap(struct __test_metadata *const _metadata, const cap_value_t cap)
+{
+       _change_cap(_metadata, CAP_INHERITABLE, cap, CAP_SET);
+
+       EXPECT_NE(-1, cap_set_ambient(cap, CAP_SET))
+       {
+               TH_LOG("Failed to set ambient capability %d: %s", cap,
+                      strerror(errno));
+       }
+}
+
+static void __maybe_unused clear_ambient_cap(
+       struct __test_metadata *const _metadata, const cap_value_t cap)
 {
-       _effective_cap(_metadata, caps, CAP_CLEAR);
+       EXPECT_EQ(1, cap_get_ambient(cap));
+       _change_cap(_metadata, CAP_INHERITABLE, cap, CAP_CLEAR);
+       EXPECT_EQ(0, cap_get_ambient(cap));
 }
 
 /* Receives an FD from a UNIX socket. Returns the received FD, or -errno. */
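
The ambient variants raise the capability in the inheritable set first because the kernel only honors PR_CAP_AMBIENT_RAISE for capabilities already present in both the permitted and inheritable sets. Usage, from setup_loopback() in net.c further down:

    set_ambient_cap(_metadata, CAP_NET_ADMIN);
    ASSERT_EQ(0, system("ip link set dev lo up"));
    clear_ambient_cap(_metadata, CAP_NET_ADMIN);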
index 50818904397c577e6953b7fd66bbfe894827b120..2d6d9b43d958cfb7c247e2cfa1fdbdf7a48c4c08 100644 (file)
@@ -241,9 +241,11 @@ struct mnt_opt {
        const char *const data;
 };
 
-const struct mnt_opt mnt_tmp = {
+#define MNT_TMP_DATA "size=4m,mode=700"
+
+static const struct mnt_opt mnt_tmp = {
        .type = "tmpfs",
-       .data = "size=4m,mode=700",
+       .data = MNT_TMP_DATA,
 };
 
 static int mount_opt(const struct mnt_opt *const mnt, const char *const target)
@@ -4632,7 +4634,10 @@ FIXTURE_VARIANT(layout3_fs)
 /* clang-format off */
 FIXTURE_VARIANT_ADD(layout3_fs, tmpfs) {
        /* clang-format on */
-       .mnt = mnt_tmp,
+       .mnt = {
+               .type = "tmpfs",
+               .data = MNT_TMP_DATA,
+       },
        .file_path = file1_s1d1,
 };
 
index ea5f727dd25778df7def21365eae073981ee08fe..936cfc879f1d2c419195338a8af04c095fe770f8 100644 (file)
@@ -17,6 +17,7 @@
 #include <string.h>
 #include <sys/prctl.h>
 #include <sys/socket.h>
+#include <sys/syscall.h>
 #include <sys/un.h>
 
 #include "common.h"
@@ -54,6 +55,11 @@ struct service_fixture {
        };
 };
 
+static pid_t sys_gettid(void)
+{
+       return syscall(__NR_gettid);
+}
+
 static int set_service(struct service_fixture *const srv,
                       const struct protocol_variant prot,
                       const unsigned short index)
@@ -88,7 +94,7 @@ static int set_service(struct service_fixture *const srv,
        case AF_UNIX:
                srv->unix_addr.sun_family = prot.domain;
                sprintf(srv->unix_addr.sun_path,
-                       "_selftests-landlock-net-tid%d-index%d", gettid(),
+                       "_selftests-landlock-net-tid%d-index%d", sys_gettid(),
                        index);
                srv->unix_addr_len = SUN_LEN(&srv->unix_addr);
                srv->unix_addr.sun_path[0] = '\0';
@@ -101,8 +107,11 @@ static void setup_loopback(struct __test_metadata *const _metadata)
 {
        set_cap(_metadata, CAP_SYS_ADMIN);
        ASSERT_EQ(0, unshare(CLONE_NEWNET));
-       ASSERT_EQ(0, system("ip link set dev lo up"));
        clear_cap(_metadata, CAP_SYS_ADMIN);
+
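+       /*
+        * system() runs "ip" in a forked child that execve()s a shell; the
+        * effective set raised by set_cap() would be cleared there, while
+        * an ambient capability survives into the child.
+        */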
+       set_ambient_cap(_metadata, CAP_NET_ADMIN);
+       ASSERT_EQ(0, system("ip link set dev lo up"));
+       clear_ambient_cap(_metadata, CAP_NET_ADMIN);
 }
 
 static bool is_restricted(const struct protocol_variant *const prot,
index aa646e0661f36cac428395667f47ed11013ef083..286ce0ee102b539a0db797a4327efaee1af924a9 100644 (file)
@@ -58,7 +58,8 @@ TEST_GEN_PROGS := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_PROGS))
 TEST_GEN_PROGS_EXTENDED := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_PROGS_EXTENDED))
 TEST_GEN_FILES := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_FILES))
 
-all: $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) $(TEST_GEN_FILES)
+all: $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) $(TEST_GEN_FILES) \
+       $(if $(TEST_GEN_MODS_DIR),gen_mods_dir)
 
 define RUN_TESTS
        BASE_DIR="$(selfdir)";                  \
@@ -71,8 +72,8 @@ endef
 
 run_tests: all
 ifdef building_out_of_srctree
-       @if [ "X$(TEST_PROGS)$(TEST_PROGS_EXTENDED)$(TEST_FILES)" != "X" ]; then \
-               rsync -aq --copy-unsafe-links $(TEST_PROGS) $(TEST_PROGS_EXTENDED) $(TEST_FILES) $(OUTPUT); \
+       @if [ "X$(TEST_PROGS)$(TEST_PROGS_EXTENDED)$(TEST_FILES)$(TEST_GEN_MODS_DIR)" != "X" ]; then \
+               rsync -aq --copy-unsafe-links $(TEST_PROGS) $(TEST_PROGS_EXTENDED) $(TEST_FILES) $(TEST_GEN_MODS_DIR) $(OUTPUT); \
        fi
        @if [ "X$(TEST_PROGS)" != "X" ]; then \
                $(call RUN_TESTS, $(TEST_GEN_PROGS) $(TEST_CUSTOM_PROGS) \
@@ -84,11 +85,22 @@ else
        @$(call RUN_TESTS, $(TEST_GEN_PROGS) $(TEST_CUSTOM_PROGS) $(TEST_PROGS))
 endif
 
+gen_mods_dir:
+       $(Q)$(MAKE) -C $(TEST_GEN_MODS_DIR)
+
+clean_mods_dir:
+       $(Q)$(MAKE) -C $(TEST_GEN_MODS_DIR) clean
+
 define INSTALL_SINGLE_RULE
        $(if $(INSTALL_LIST),@mkdir -p $(INSTALL_PATH))
        $(if $(INSTALL_LIST),rsync -a --copy-unsafe-links $(INSTALL_LIST) $(INSTALL_PATH)/)
 endef
 
+define INSTALL_MODS_RULE
+       $(if $(INSTALL_LIST),@mkdir -p $(INSTALL_PATH)/$(INSTALL_LIST))
+       $(if $(INSTALL_LIST),rsync -a --copy-unsafe-links $(INSTALL_LIST)/*.ko $(INSTALL_PATH)/$(INSTALL_LIST))
+endef
+
 define INSTALL_RULE
        $(eval INSTALL_LIST = $(TEST_PROGS)) $(INSTALL_SINGLE_RULE)
        $(eval INSTALL_LIST = $(TEST_PROGS_EXTENDED)) $(INSTALL_SINGLE_RULE)
@@ -97,6 +109,7 @@ define INSTALL_RULE
        $(eval INSTALL_LIST = $(TEST_CUSTOM_PROGS)) $(INSTALL_SINGLE_RULE)
        $(eval INSTALL_LIST = $(TEST_GEN_PROGS_EXTENDED)) $(INSTALL_SINGLE_RULE)
        $(eval INSTALL_LIST = $(TEST_GEN_FILES)) $(INSTALL_SINGLE_RULE)
+       $(eval INSTALL_LIST = $(notdir $(TEST_GEN_MODS_DIR))) $(INSTALL_MODS_RULE)
        $(eval INSTALL_LIST = $(wildcard config settings)) $(INSTALL_SINGLE_RULE)
 endef
 
@@ -122,7 +135,7 @@ define CLEAN
        $(RM) -r $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) $(TEST_GEN_FILES) $(EXTRA_CLEAN)
 endef
 
-clean:
+clean: $(if $(TEST_GEN_MODS_DIR),clean_mods_dir)
        $(CLEAN)
 
 # Enables to extend CFLAGS and LDFLAGS from command line, e.g.
@@ -153,4 +166,4 @@ $(OUTPUT)/%:%.S
        $(LINK.S) $^ $(LDLIBS) -o $@
 endif
 
-.PHONY: run_tests all clean install emit_tests
+.PHONY: run_tests all clean install emit_tests gen_mods_dir clean_mods_dir
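
With these hooks, any selftest that sets TEST_GEN_MODS_DIR (as the livepatch
Makefile below does) gets its module directory built and cleaned
automatically; a sketch of the resulting invocations from the kernel source
tree, assuming an in-tree build:

  % make -C tools/testing/selftests TARGETS=livepatch        # builds via gen_mods_dir
  % make -C tools/testing/selftests TARGETS=livepatch clean  # cleans via clean_mods_dir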
diff --git a/tools/testing/selftests/livepatch/.gitignore b/tools/testing/selftests/livepatch/.gitignore
new file mode 100644 (file)
index 0000000..f1e9c2a
--- /dev/null
@@ -0,0 +1 @@
+test_klp-call_getpid
index 02fadc9d55e0f576b22b0053189739d51ed6fba5..35418a4790be4be0461044d2a9430adea6366791 100644 (file)
@@ -1,5 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 
+TEST_GEN_FILES := test_klp-call_getpid
+TEST_GEN_MODS_DIR := test_modules
 TEST_PROGS_EXTENDED := functions.sh
 TEST_PROGS := \
        test-livepatch.sh \
@@ -7,7 +9,8 @@ TEST_PROGS := \
        test-shadow-vars.sh \
        test-state.sh \
        test-ftrace.sh \
-       test-sysfs.sh
+       test-sysfs.sh \
+       test-syscall.sh
 
 TEST_FILES := settings
 
index 0942dd5826f87d546b804d78a0b82f14329f6240..d2035dd64a2bea04cc55e1413ef76671ef6f7b0f 100644 (file)
@@ -13,23 +13,36 @@ the message buffer for only the duration of each individual test.)
 Config
 ------
 
-Set these config options and their prerequisites:
+Set the CONFIG_LIVEPATCH=y option and its prerequisites.
 
-CONFIG_LIVEPATCH=y
-CONFIG_TEST_LIVEPATCH=m
 
+Building the tests
+------------------
+
+To only build the tests without running them, run:
+
+  % make -C tools/testing/selftests/livepatch
+
+The command above will compile all test modules and test programs, making them
+ready to be packaged if so desired.
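+
+If the build directory of the kernel under test is elsewhere, it can be
+selected with KDIR, which defaults to /lib/modules/$(uname -r)/build (a
+sketch; the variable is propagated down to the module build):
+
+  % make -C tools/testing/selftests/livepatch KDIR=/path/to/kernel/build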
 
 Running the tests
 -----------------
 
-Test kernel modules are built as part of lib/ (make modules) and need to
-be installed (make modules_install) as the test scripts will modprobe
-them.
+Test kernel modules are built before running the livepatch selftests.  The
+modules are located under the test_modules directory, and are built as
+out-of-tree modules.  This is especially useful since the same sources can be
+built and tested on systems with different kABIs, ensuring the tests are
+backwards compatible.  The modules will be loaded by the test scripts using
+insmod.
 
 To run the livepatch selftests, from the top of the kernel source tree:
 
   % make -C tools/testing/selftests TARGETS=livepatch run_tests
 
+or
+
+  % make kselftest TARGETS=livepatch
+
 
 Adding tests
 ------------
index ad23100cb27c84da75727efb714244487008c046..e88bf518a23ab3ccff9dfa1da9656e61e4f2a316 100644 (file)
@@ -1,3 +1,2 @@
 CONFIG_LIVEPATCH=y
 CONFIG_DYNAMIC_DEBUG=y
-CONFIG_TEST_LIVEPATCH=m
index c8416c54b4637b1380810f0c9f71bc99e1710b8e..fc4c6a016d3853a6b6903387ecc146a3dba86420 100644 (file)
@@ -34,6 +34,18 @@ function is_root() {
        fi
 }
 
+# Check if we can compile the modules before loading them
+function has_kdir() {
+       if [ -z "$KDIR" ]; then
+               KDIR="/lib/modules/$(uname -r)/build"
+       fi
+
+       if [ ! -d "$KDIR" ]; then
+               echo "skip all tests: KDIR ($KDIR) not available to compile modules."
+               exit $ksft_skip
+       fi
+}
+
 # die(msg) - game over, man
 #      msg - dying words
 function die() {
@@ -42,17 +54,6 @@ function die() {
        exit 1
 }
 
-# save existing dmesg so we can detect new content
-function save_dmesg() {
-       SAVED_DMESG=$(mktemp --tmpdir -t klp-dmesg-XXXXXX)
-       dmesg > "$SAVED_DMESG"
-}
-
-# cleanup temporary dmesg file from save_dmesg()
-function cleanup_dmesg_file() {
-       rm -f "$SAVED_DMESG"
-}
-
 function push_config() {
        DYNAMIC_DEBUG=$(grep '^kernel/livepatch' /sys/kernel/debug/dynamic_debug/control | \
                        awk -F'[: ]' '{print "file " $1 " line " $2 " " $4}')
@@ -99,7 +100,6 @@ function set_ftrace_enabled() {
 
 function cleanup() {
        pop_config
-       cleanup_dmesg_file
 }
 
 # setup_config - save the current config and set a script exit trap that
@@ -108,6 +108,7 @@ function cleanup() {
 #               the ftrace_enabled sysctl.
 function setup_config() {
        is_root
+       has_kdir
        push_config
        set_dynamic_debug
        set_ftrace_enabled 1
@@ -127,16 +128,14 @@ function loop_until() {
        done
 }
 
-function assert_mod() {
-       local mod="$1"
-
-       modprobe --dry-run "$mod" &>/dev/null
-}
-
 function is_livepatch_mod() {
        local mod="$1"
 
-       if [[ $(modinfo "$mod" | awk '/^livepatch:/{print $NF}') == "Y" ]]; then
+       if [[ ! -f "test_modules/$mod.ko" ]]; then
+               die "Can't find \"test_modules/$mod.ko\", try \"make\""
+       fi
+
+       if [[ $(modinfo "test_modules/$mod.ko" | awk '/^livepatch:/{print $NF}') == "Y" ]]; then
                return 0
        fi
 
@@ -146,9 +145,9 @@ function is_livepatch_mod() {
 function __load_mod() {
        local mod="$1"; shift
 
-       local msg="% modprobe $mod $*"
+       local msg="% insmod test_modules/$mod.ko $*"
        log "${msg%% }"
-       ret=$(modprobe "$mod" "$@" 2>&1)
+       ret=$(insmod "test_modules/$mod.ko" "$@" 2>&1)
        if [[ "$ret" != "" ]]; then
                die "$ret"
        fi
@@ -161,13 +160,10 @@ function __load_mod() {
 
 # load_mod(modname, params) - load a kernel module
 #      modname - module name to load
-#      params  - module parameters to pass to modprobe
+#      params  - module parameters to pass to insmod
 function load_mod() {
        local mod="$1"; shift
 
-       assert_mod "$mod" ||
-               skip "unable to load module ${mod}, verify CONFIG_TEST_LIVEPATCH=m and run self-tests as root"
-
        is_livepatch_mod "$mod" &&
                die "use load_lp() to load the livepatch module $mod"
 
@@ -177,13 +173,10 @@ function load_mod() {
 # load_lp_nowait(modname, params) - load a kernel module with a livepatch
 #                      but do not wait on until the transition finishes
 #      modname - module name to load
-#      params  - module parameters to pass to modprobe
+#      params  - module parameters to pass to insmod
 function load_lp_nowait() {
        local mod="$1"; shift
 
-       assert_mod "$mod" ||
-               skip "unable to load module ${mod}, verify CONFIG_TEST_LIVEPATCH=m and run self-tests as root"
-
        is_livepatch_mod "$mod" ||
                die "module $mod is not a livepatch"
 
@@ -196,7 +189,7 @@ function load_lp_nowait() {
 
 # load_lp(modname, params) - load a kernel module with a livepatch
 #      modname - module name to load
-#      params  - module parameters to pass to modprobe
+#      params  - module parameters to pass to insmod
 function load_lp() {
        local mod="$1"; shift
 
@@ -209,13 +202,13 @@ function load_lp() {
 
 # load_failing_mod(modname, params) - load a kernel module, expect to fail
 #      modname - module name to load
-#      params  - module parameters to pass to modprobe
+#      params  - module parameters to pass to insmod
 function load_failing_mod() {
        local mod="$1"; shift
 
-       local msg="% modprobe $mod $*"
+       local msg="% insmod test_modules/$mod.ko $*"
        log "${msg%% }"
-       ret=$(modprobe "$mod" "$@" 2>&1)
+       ret=$(insmod "test_modules/$mod.ko" "$@" 2>&1)
        if [[ "$ret" == "" ]]; then
                die "$mod unexpectedly loaded"
        fi
@@ -280,7 +273,15 @@ function set_pre_patch_ret {
 function start_test {
        local test="$1"
 
-       save_dmesg
+       # Dump something unique into the dmesg log, then stash the entry
+       # in LAST_DMESG.  The check_result() function will use it to
+       # find new kernel messages since the test started.
+       local last_dmesg_msg="livepatch kselftest timestamp: $(date --rfc-3339=ns)"
+       log "$last_dmesg_msg"
+       loop_until 'dmesg | grep -q "$last_dmesg_msg"' ||
+               die "buffer busy? can't find canary dmesg message: $last_dmesg_msg"
+       LAST_DMESG=$(dmesg | grep "$last_dmesg_msg")
+
        echo -n "TEST: $test ... "
        log "===== TEST: $test ====="
 }
@@ -291,23 +292,24 @@ function check_result {
        local expect="$*"
        local result
 
-       # Note: when comparing dmesg output, the kernel log timestamps
-       # help differentiate repeated testing runs.  Remove them with a
-       # post-comparison sed filter.
-
-       result=$(dmesg | comm --nocheck-order -13 "$SAVED_DMESG" - | \
+       # Test results include any new dmesg entry since LAST_DMESG, then:
+       # - include lines matching keywords
+       # - exclude lines matching keywords
+       # - filter out dmesg timestamp prefixes
+       result=$(dmesg | awk -v last_dmesg="$LAST_DMESG" 'p; $0 == last_dmesg { p=1 }' | \
                 grep -e 'livepatch:' -e 'test_klp' | \
                 grep -v '\(tainting\|taints\) kernel' | \
                 sed 's/^\[[ 0-9.]*\] //')
 
        if [[ "$expect" == "$result" ]] ; then
                echo "ok"
+       elif [[ "$result" == "" ]] ; then
+               echo -e "not ok\n\nbuffer overrun? can't find canary dmesg entry: $LAST_DMESG\n"
+               die "livepatch kselftest(s) failed"
        else
                echo -e "not ok\n\n$(diff -upr --label expected --label result <(echo "$expect") <(echo "$result"))\n"
                die "livepatch kselftest(s) failed"
        fi
-
-       cleanup_dmesg_file
 }
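
For reference, a toy illustration of the canary idiom check_result() uses
(hypothetical input; same awk program): everything up to and including the
stashed entry is dropped, everything after it is kept:

  % printf 'old line\nCANARY\nnew line 1\nnew line 2\n' | \
        awk -v last_dmesg="CANARY" 'p; $0 == last_dmesg { p=1 }'
  new line 1
  new line 2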
 
 # check_sysfs_rights(modname, rel_path, expected_rights) - check sysfs
index 90b26dbb26262c6f21ac02741cec6c316636fff2..32b150e25b10b8ab8e3cbc8a20005b2e883f8b50 100755 (executable)
@@ -34,9 +34,9 @@ disable_lp $MOD_LIVEPATCH
 unload_lp $MOD_LIVEPATCH
 unload_mod $MOD_TARGET
 
-check_result "% modprobe $MOD_TARGET
+check_result "% insmod test_modules/$MOD_TARGET.ko
 $MOD_TARGET: ${MOD_TARGET}_init
-% modprobe $MOD_LIVEPATCH
+% insmod test_modules/$MOD_LIVEPATCH.ko
 livepatch: enabling patch '$MOD_LIVEPATCH'
 livepatch: '$MOD_LIVEPATCH': initializing patching transition
 $MOD_LIVEPATCH: pre_patch_callback: vmlinux
@@ -81,7 +81,7 @@ disable_lp $MOD_LIVEPATCH
 unload_lp $MOD_LIVEPATCH
 unload_mod $MOD_TARGET
 
-check_result "% modprobe $MOD_LIVEPATCH
+check_result "% insmod test_modules/$MOD_LIVEPATCH.ko
 livepatch: enabling patch '$MOD_LIVEPATCH'
 livepatch: '$MOD_LIVEPATCH': initializing patching transition
 $MOD_LIVEPATCH: pre_patch_callback: vmlinux
@@ -89,7 +89,7 @@ livepatch: '$MOD_LIVEPATCH': starting patching transition
 livepatch: '$MOD_LIVEPATCH': completing patching transition
 $MOD_LIVEPATCH: post_patch_callback: vmlinux
 livepatch: '$MOD_LIVEPATCH': patching complete
-% modprobe $MOD_TARGET
+% insmod test_modules/$MOD_TARGET.ko
 livepatch: applying patch '$MOD_LIVEPATCH' to loading module '$MOD_TARGET'
 $MOD_LIVEPATCH: pre_patch_callback: $MOD_TARGET -> [MODULE_STATE_COMING] Full formed, running module_init
 $MOD_LIVEPATCH: post_patch_callback: $MOD_TARGET -> [MODULE_STATE_COMING] Full formed, running module_init
@@ -129,9 +129,9 @@ unload_mod $MOD_TARGET
 disable_lp $MOD_LIVEPATCH
 unload_lp $MOD_LIVEPATCH
 
-check_result "% modprobe $MOD_TARGET
+check_result "% insmod test_modules/$MOD_TARGET.ko
 $MOD_TARGET: ${MOD_TARGET}_init
-% modprobe $MOD_LIVEPATCH
+% insmod test_modules/$MOD_LIVEPATCH.ko
 livepatch: enabling patch '$MOD_LIVEPATCH'
 livepatch: '$MOD_LIVEPATCH': initializing patching transition
 $MOD_LIVEPATCH: pre_patch_callback: vmlinux
@@ -177,7 +177,7 @@ unload_mod $MOD_TARGET
 disable_lp $MOD_LIVEPATCH
 unload_lp $MOD_LIVEPATCH
 
-check_result "% modprobe $MOD_LIVEPATCH
+check_result "% insmod test_modules/$MOD_LIVEPATCH.ko
 livepatch: enabling patch '$MOD_LIVEPATCH'
 livepatch: '$MOD_LIVEPATCH': initializing patching transition
 $MOD_LIVEPATCH: pre_patch_callback: vmlinux
@@ -185,7 +185,7 @@ livepatch: '$MOD_LIVEPATCH': starting patching transition
 livepatch: '$MOD_LIVEPATCH': completing patching transition
 $MOD_LIVEPATCH: post_patch_callback: vmlinux
 livepatch: '$MOD_LIVEPATCH': patching complete
-% modprobe $MOD_TARGET
+% insmod test_modules/$MOD_TARGET.ko
 livepatch: applying patch '$MOD_LIVEPATCH' to loading module '$MOD_TARGET'
 $MOD_LIVEPATCH: pre_patch_callback: $MOD_TARGET -> [MODULE_STATE_COMING] Full formed, running module_init
 $MOD_LIVEPATCH: post_patch_callback: $MOD_TARGET -> [MODULE_STATE_COMING] Full formed, running module_init
@@ -219,7 +219,7 @@ load_lp $MOD_LIVEPATCH
 disable_lp $MOD_LIVEPATCH
 unload_lp $MOD_LIVEPATCH
 
-check_result "% modprobe $MOD_LIVEPATCH
+check_result "% insmod test_modules/$MOD_LIVEPATCH.ko
 livepatch: enabling patch '$MOD_LIVEPATCH'
 livepatch: '$MOD_LIVEPATCH': initializing patching transition
 $MOD_LIVEPATCH: pre_patch_callback: vmlinux
@@ -254,9 +254,9 @@ load_mod $MOD_TARGET
 load_failing_mod $MOD_LIVEPATCH pre_patch_ret=-19
 unload_mod $MOD_TARGET
 
-check_result "% modprobe $MOD_TARGET
+check_result "% insmod test_modules/$MOD_TARGET.ko
 $MOD_TARGET: ${MOD_TARGET}_init
-% modprobe $MOD_LIVEPATCH pre_patch_ret=-19
+% insmod test_modules/$MOD_LIVEPATCH.ko pre_patch_ret=-19
 livepatch: enabling patch '$MOD_LIVEPATCH'
 livepatch: '$MOD_LIVEPATCH': initializing patching transition
 test_klp_callbacks_demo: pre_patch_callback: vmlinux
@@ -265,7 +265,7 @@ livepatch: failed to enable patch '$MOD_LIVEPATCH'
 livepatch: '$MOD_LIVEPATCH': canceling patching transition, going to unpatch
 livepatch: '$MOD_LIVEPATCH': completing unpatching transition
 livepatch: '$MOD_LIVEPATCH': unpatching complete
-modprobe: ERROR: could not insert '$MOD_LIVEPATCH': No such device
+insmod: ERROR: could not insert module test_modules/$MOD_LIVEPATCH.ko: No such device
 % rmmod $MOD_TARGET
 $MOD_TARGET: ${MOD_TARGET}_exit"
 
@@ -295,7 +295,7 @@ load_failing_mod $MOD_TARGET
 disable_lp $MOD_LIVEPATCH
 unload_lp $MOD_LIVEPATCH
 
-check_result "% modprobe $MOD_LIVEPATCH
+check_result "% insmod test_modules/$MOD_LIVEPATCH.ko
 livepatch: enabling patch '$MOD_LIVEPATCH'
 livepatch: '$MOD_LIVEPATCH': initializing patching transition
 $MOD_LIVEPATCH: pre_patch_callback: vmlinux
@@ -304,12 +304,12 @@ livepatch: '$MOD_LIVEPATCH': completing patching transition
 $MOD_LIVEPATCH: post_patch_callback: vmlinux
 livepatch: '$MOD_LIVEPATCH': patching complete
 % echo -19 > /sys/module/$MOD_LIVEPATCH/parameters/pre_patch_ret
-% modprobe $MOD_TARGET
+% insmod test_modules/$MOD_TARGET.ko
 livepatch: applying patch '$MOD_LIVEPATCH' to loading module '$MOD_TARGET'
 $MOD_LIVEPATCH: pre_patch_callback: $MOD_TARGET -> [MODULE_STATE_COMING] Full formed, running module_init
 livepatch: pre-patch callback failed for object '$MOD_TARGET'
 livepatch: patch '$MOD_LIVEPATCH' failed for module '$MOD_TARGET', refusing to load module '$MOD_TARGET'
-modprobe: ERROR: could not insert '$MOD_TARGET': No such device
+insmod: ERROR: could not insert module test_modules/$MOD_TARGET.ko: No such device
 % echo 0 > /sys/kernel/livepatch/$MOD_LIVEPATCH/enabled
 livepatch: '$MOD_LIVEPATCH': initializing unpatching transition
 $MOD_LIVEPATCH: pre_unpatch_callback: vmlinux
@@ -340,11 +340,11 @@ disable_lp $MOD_LIVEPATCH
 unload_lp $MOD_LIVEPATCH
 unload_mod $MOD_TARGET_BUSY
 
-check_result "% modprobe $MOD_TARGET_BUSY block_transition=N
+check_result "% insmod test_modules/$MOD_TARGET_BUSY.ko block_transition=N
 $MOD_TARGET_BUSY: ${MOD_TARGET_BUSY}_init
 $MOD_TARGET_BUSY: busymod_work_func enter
 $MOD_TARGET_BUSY: busymod_work_func exit
-% modprobe $MOD_LIVEPATCH
+% insmod test_modules/$MOD_LIVEPATCH.ko
 livepatch: enabling patch '$MOD_LIVEPATCH'
 livepatch: '$MOD_LIVEPATCH': initializing patching transition
 $MOD_LIVEPATCH: pre_patch_callback: vmlinux
@@ -354,7 +354,7 @@ livepatch: '$MOD_LIVEPATCH': completing patching transition
 $MOD_LIVEPATCH: post_patch_callback: vmlinux
 $MOD_LIVEPATCH: post_patch_callback: $MOD_TARGET_BUSY -> [MODULE_STATE_LIVE] Normal state
 livepatch: '$MOD_LIVEPATCH': patching complete
-% modprobe $MOD_TARGET
+% insmod test_modules/$MOD_TARGET.ko
 livepatch: applying patch '$MOD_LIVEPATCH' to loading module '$MOD_TARGET'
 $MOD_LIVEPATCH: pre_patch_callback: $MOD_TARGET -> [MODULE_STATE_COMING] Full formed, running module_init
 $MOD_LIVEPATCH: post_patch_callback: $MOD_TARGET -> [MODULE_STATE_COMING] Full formed, running module_init
@@ -421,16 +421,16 @@ disable_lp $MOD_LIVEPATCH
 unload_lp $MOD_LIVEPATCH
 unload_mod $MOD_TARGET_BUSY
 
-check_result "% modprobe $MOD_TARGET_BUSY block_transition=Y
+check_result "% insmod test_modules/$MOD_TARGET_BUSY.ko block_transition=Y
 $MOD_TARGET_BUSY: ${MOD_TARGET_BUSY}_init
 $MOD_TARGET_BUSY: busymod_work_func enter
-% modprobe $MOD_LIVEPATCH
+% insmod test_modules/$MOD_LIVEPATCH.ko
 livepatch: enabling patch '$MOD_LIVEPATCH'
 livepatch: '$MOD_LIVEPATCH': initializing patching transition
 $MOD_LIVEPATCH: pre_patch_callback: vmlinux
 $MOD_LIVEPATCH: pre_patch_callback: $MOD_TARGET_BUSY -> [MODULE_STATE_LIVE] Normal state
 livepatch: '$MOD_LIVEPATCH': starting patching transition
-% modprobe $MOD_TARGET
+% insmod test_modules/$MOD_TARGET.ko
 livepatch: applying patch '$MOD_LIVEPATCH' to loading module '$MOD_TARGET'
 $MOD_LIVEPATCH: pre_patch_callback: $MOD_TARGET -> [MODULE_STATE_COMING] Full formed, running module_init
 $MOD_TARGET: ${MOD_TARGET}_init
@@ -467,7 +467,7 @@ disable_lp $MOD_LIVEPATCH
 unload_lp $MOD_LIVEPATCH2
 unload_lp $MOD_LIVEPATCH
 
-check_result "% modprobe $MOD_LIVEPATCH
+check_result "% insmod test_modules/$MOD_LIVEPATCH.ko
 livepatch: enabling patch '$MOD_LIVEPATCH'
 livepatch: '$MOD_LIVEPATCH': initializing patching transition
 $MOD_LIVEPATCH: pre_patch_callback: vmlinux
@@ -475,7 +475,7 @@ livepatch: '$MOD_LIVEPATCH': starting patching transition
 livepatch: '$MOD_LIVEPATCH': completing patching transition
 $MOD_LIVEPATCH: post_patch_callback: vmlinux
 livepatch: '$MOD_LIVEPATCH': patching complete
-% modprobe $MOD_LIVEPATCH2
+% insmod test_modules/$MOD_LIVEPATCH2.ko
 livepatch: enabling patch '$MOD_LIVEPATCH2'
 livepatch: '$MOD_LIVEPATCH2': initializing patching transition
 $MOD_LIVEPATCH2: pre_patch_callback: vmlinux
@@ -523,7 +523,7 @@ disable_lp $MOD_LIVEPATCH2
 unload_lp $MOD_LIVEPATCH2
 unload_lp $MOD_LIVEPATCH
 
-check_result "% modprobe $MOD_LIVEPATCH
+check_result "% insmod test_modules/$MOD_LIVEPATCH.ko
 livepatch: enabling patch '$MOD_LIVEPATCH'
 livepatch: '$MOD_LIVEPATCH': initializing patching transition
 $MOD_LIVEPATCH: pre_patch_callback: vmlinux
@@ -531,7 +531,7 @@ livepatch: '$MOD_LIVEPATCH': starting patching transition
 livepatch: '$MOD_LIVEPATCH': completing patching transition
 $MOD_LIVEPATCH: post_patch_callback: vmlinux
 livepatch: '$MOD_LIVEPATCH': patching complete
-% modprobe $MOD_LIVEPATCH2 replace=1
+% insmod test_modules/$MOD_LIVEPATCH2.ko replace=1
 livepatch: enabling patch '$MOD_LIVEPATCH2'
 livepatch: '$MOD_LIVEPATCH2': initializing patching transition
 $MOD_LIVEPATCH2: pre_patch_callback: vmlinux
index 825540a5194d472a122963fdf1b1327aac4e6217..730218bce99c5006336c1c08e4a1d5a74054bda8 100755 (executable)
@@ -35,7 +35,7 @@ disable_lp $MOD_LIVEPATCH
 unload_lp $MOD_LIVEPATCH
 
 check_result "livepatch: kernel.ftrace_enabled = 0
-% modprobe $MOD_LIVEPATCH
+% insmod test_modules/$MOD_LIVEPATCH.ko
 livepatch: enabling patch '$MOD_LIVEPATCH'
 livepatch: '$MOD_LIVEPATCH': initializing patching transition
 livepatch: failed to register ftrace handler for function 'cmdline_proc_show' (-16)
@@ -44,9 +44,9 @@ livepatch: failed to enable patch '$MOD_LIVEPATCH'
 livepatch: '$MOD_LIVEPATCH': canceling patching transition, going to unpatch
 livepatch: '$MOD_LIVEPATCH': completing unpatching transition
 livepatch: '$MOD_LIVEPATCH': unpatching complete
-modprobe: ERROR: could not insert '$MOD_LIVEPATCH': Device or resource busy
+insmod: ERROR: could not insert module test_modules/$MOD_LIVEPATCH.ko: Device or resource busy
 livepatch: kernel.ftrace_enabled = 1
-% modprobe $MOD_LIVEPATCH
+% insmod test_modules/$MOD_LIVEPATCH.ko
 livepatch: enabling patch '$MOD_LIVEPATCH'
 livepatch: '$MOD_LIVEPATCH': initializing patching transition
 livepatch: '$MOD_LIVEPATCH': starting patching transition
index 5fe79ac34be12de172ca12ef4295f18223331f7c..e3455a6b11589eed3df64e818b5d5bae1fd97b54 100755 (executable)
@@ -31,7 +31,7 @@ if [[ "$(cat /proc/cmdline)" == "$MOD_LIVEPATCH: this has been live patched" ]]
        die "livepatch kselftest(s) failed"
 fi
 
-check_result "% modprobe $MOD_LIVEPATCH
+check_result "% insmod test_modules/$MOD_LIVEPATCH.ko
 livepatch: enabling patch '$MOD_LIVEPATCH'
 livepatch: '$MOD_LIVEPATCH': initializing patching transition
 livepatch: '$MOD_LIVEPATCH': starting patching transition
@@ -75,14 +75,14 @@ unload_lp $MOD_LIVEPATCH
 grep 'live patched' /proc/cmdline > /dev/kmsg
 grep 'live patched' /proc/meminfo > /dev/kmsg
 
-check_result "% modprobe $MOD_LIVEPATCH
+check_result "% insmod test_modules/$MOD_LIVEPATCH.ko
 livepatch: enabling patch '$MOD_LIVEPATCH'
 livepatch: '$MOD_LIVEPATCH': initializing patching transition
 livepatch: '$MOD_LIVEPATCH': starting patching transition
 livepatch: '$MOD_LIVEPATCH': completing patching transition
 livepatch: '$MOD_LIVEPATCH': patching complete
 $MOD_LIVEPATCH: this has been live patched
-% modprobe $MOD_REPLACE replace=0
+% insmod test_modules/$MOD_REPLACE.ko replace=0
 livepatch: enabling patch '$MOD_REPLACE'
 livepatch: '$MOD_REPLACE': initializing patching transition
 livepatch: '$MOD_REPLACE': starting patching transition
@@ -135,14 +135,14 @@ unload_lp $MOD_REPLACE
 grep 'live patched' /proc/cmdline > /dev/kmsg
 grep 'live patched' /proc/meminfo > /dev/kmsg
 
-check_result "% modprobe $MOD_LIVEPATCH
+check_result "% insmod test_modules/$MOD_LIVEPATCH.ko
 livepatch: enabling patch '$MOD_LIVEPATCH'
 livepatch: '$MOD_LIVEPATCH': initializing patching transition
 livepatch: '$MOD_LIVEPATCH': starting patching transition
 livepatch: '$MOD_LIVEPATCH': completing patching transition
 livepatch: '$MOD_LIVEPATCH': patching complete
 $MOD_LIVEPATCH: this has been live patched
-% modprobe $MOD_REPLACE replace=1
+% insmod test_modules/$MOD_REPLACE.ko replace=1
 livepatch: enabling patch '$MOD_REPLACE'
 livepatch: '$MOD_REPLACE': initializing patching transition
 livepatch: '$MOD_REPLACE': starting patching transition
index e04cb354f56b2059359f0aa3fa941a5cb6d8b220..1218c155bffeaba5735b0b127348bbe569a9017a 100755 (executable)
@@ -16,7 +16,7 @@ start_test "basic shadow variable API"
 load_mod $MOD_TEST
 unload_mod $MOD_TEST
 
-check_result "% modprobe $MOD_TEST
+check_result "% insmod test_modules/$MOD_TEST.ko
 $MOD_TEST: klp_shadow_get(obj=PTR1, id=0x1234) = PTR0
 $MOD_TEST:   got expected NULL result
 $MOD_TEST: shadow_ctor: PTR3 -> PTR2
index 38656721c958b48814f58f0f9b7741a1482db3db..10a52ac061850561f7d2fb9201755c55fdd9c7c7 100755 (executable)
@@ -19,7 +19,7 @@ load_lp $MOD_LIVEPATCH
 disable_lp $MOD_LIVEPATCH
 unload_lp $MOD_LIVEPATCH
 
-check_result "% modprobe $MOD_LIVEPATCH
+check_result "% insmod test_modules/$MOD_LIVEPATCH.ko
 livepatch: enabling patch '$MOD_LIVEPATCH'
 livepatch: '$MOD_LIVEPATCH': initializing patching transition
 $MOD_LIVEPATCH: pre_patch_callback: vmlinux
@@ -51,7 +51,7 @@ unload_lp $MOD_LIVEPATCH
 disable_lp $MOD_LIVEPATCH2
 unload_lp $MOD_LIVEPATCH2
 
-check_result "% modprobe $MOD_LIVEPATCH
+check_result "% insmod test_modules/$MOD_LIVEPATCH.ko
 livepatch: enabling patch '$MOD_LIVEPATCH'
 livepatch: '$MOD_LIVEPATCH': initializing patching transition
 $MOD_LIVEPATCH: pre_patch_callback: vmlinux
@@ -61,7 +61,7 @@ livepatch: '$MOD_LIVEPATCH': completing patching transition
 $MOD_LIVEPATCH: post_patch_callback: vmlinux
 $MOD_LIVEPATCH: fix_console_loglevel: fixing console_loglevel
 livepatch: '$MOD_LIVEPATCH': patching complete
-% modprobe $MOD_LIVEPATCH2
+% insmod test_modules/$MOD_LIVEPATCH2.ko
 livepatch: enabling patch '$MOD_LIVEPATCH2'
 livepatch: '$MOD_LIVEPATCH2': initializing patching transition
 $MOD_LIVEPATCH2: pre_patch_callback: vmlinux
@@ -96,7 +96,7 @@ disable_lp $MOD_LIVEPATCH2
 unload_lp $MOD_LIVEPATCH2
 unload_lp $MOD_LIVEPATCH3
 
-check_result "% modprobe $MOD_LIVEPATCH2
+check_result "% insmod test_modules/$MOD_LIVEPATCH2.ko
 livepatch: enabling patch '$MOD_LIVEPATCH2'
 livepatch: '$MOD_LIVEPATCH2': initializing patching transition
 $MOD_LIVEPATCH2: pre_patch_callback: vmlinux
@@ -106,7 +106,7 @@ livepatch: '$MOD_LIVEPATCH2': completing patching transition
 $MOD_LIVEPATCH2: post_patch_callback: vmlinux
 $MOD_LIVEPATCH2: fix_console_loglevel: fixing console_loglevel
 livepatch: '$MOD_LIVEPATCH2': patching complete
-% modprobe $MOD_LIVEPATCH3
+% insmod test_modules/$MOD_LIVEPATCH3.ko
 livepatch: enabling patch '$MOD_LIVEPATCH3'
 livepatch: '$MOD_LIVEPATCH3': initializing patching transition
 $MOD_LIVEPATCH3: pre_patch_callback: vmlinux
@@ -117,7 +117,7 @@ $MOD_LIVEPATCH3: post_patch_callback: vmlinux
 $MOD_LIVEPATCH3: fix_console_loglevel: taking over the console_loglevel change
 livepatch: '$MOD_LIVEPATCH3': patching complete
 % rmmod $MOD_LIVEPATCH2
-% modprobe $MOD_LIVEPATCH2
+% insmod test_modules/$MOD_LIVEPATCH2.ko
 livepatch: enabling patch '$MOD_LIVEPATCH2'
 livepatch: '$MOD_LIVEPATCH2': initializing patching transition
 $MOD_LIVEPATCH2: pre_patch_callback: vmlinux
@@ -149,7 +149,7 @@ load_failing_mod $MOD_LIVEPATCH
 disable_lp $MOD_LIVEPATCH2
 unload_lp $MOD_LIVEPATCH2
 
-check_result "% modprobe $MOD_LIVEPATCH2
+check_result "% insmod test_modules/$MOD_LIVEPATCH2.ko
 livepatch: enabling patch '$MOD_LIVEPATCH2'
 livepatch: '$MOD_LIVEPATCH2': initializing patching transition
 $MOD_LIVEPATCH2: pre_patch_callback: vmlinux
@@ -159,9 +159,9 @@ livepatch: '$MOD_LIVEPATCH2': completing patching transition
 $MOD_LIVEPATCH2: post_patch_callback: vmlinux
 $MOD_LIVEPATCH2: fix_console_loglevel: fixing console_loglevel
 livepatch: '$MOD_LIVEPATCH2': patching complete
-% modprobe $MOD_LIVEPATCH
+% insmod test_modules/$MOD_LIVEPATCH.ko
 livepatch: Livepatch patch ($MOD_LIVEPATCH) is not compatible with the already installed livepatches.
-modprobe: ERROR: could not insert '$MOD_LIVEPATCH': Invalid argument
+insmod: ERROR: could not insert module test_modules/$MOD_LIVEPATCH.ko: Invalid parameters
 % echo 0 > /sys/kernel/livepatch/$MOD_LIVEPATCH2/enabled
 livepatch: '$MOD_LIVEPATCH2': initializing unpatching transition
 $MOD_LIVEPATCH2: pre_unpatch_callback: vmlinux
diff --git a/tools/testing/selftests/livepatch/test-syscall.sh b/tools/testing/selftests/livepatch/test-syscall.sh
new file mode 100755 (executable)
index 0000000..b76a881
--- /dev/null
@@ -0,0 +1,53 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2023 SUSE
+# Author: Marcos Paulo de Souza <mpdesouza@suse.com>
+
+. $(dirname $0)/functions.sh
+
+MOD_SYSCALL=test_klp_syscall
+
+setup_config
+
+# - Start one process per online CPU calling getpid and load a livepatch to
+#   patch the getpid syscall. Check if all the processes transitioned to the
+#   livepatched state.
+
+start_test "patch getpid syscall while being heavily hammered"
+
+for i in $(seq 1 $(getconf _NPROCESSORS_ONLN)); do
+       ./test_klp-call_getpid &
+       pids[$i]="$!"
+done
+
+pid_list=$(echo ${pids[@]} | tr ' ' ',')
+load_lp $MOD_SYSCALL klp_pids=$pid_list
+
+# wait for all tasks to transition to patched state
+loop_until 'grep -q '^0$' /sys/kernel/test_klp_syscall/npids'
+
+pending_pids=$(cat /sys/kernel/test_klp_syscall/npids)
+log "$MOD_SYSCALL: Remaining not livepatched processes: $pending_pids"
+
+for pid in ${pids[@]}; do
+       kill $pid || true
+done
+
+disable_lp $MOD_SYSCALL
+unload_lp $MOD_SYSCALL
+
+check_result "% insmod test_modules/$MOD_SYSCALL.ko klp_pids=$pid_list
+livepatch: enabling patch '$MOD_SYSCALL'
+livepatch: '$MOD_SYSCALL': initializing patching transition
+livepatch: '$MOD_SYSCALL': starting patching transition
+livepatch: '$MOD_SYSCALL': completing patching transition
+livepatch: '$MOD_SYSCALL': patching complete
+$MOD_SYSCALL: Remaining not livepatched processes: 0
+% echo 0 > /sys/kernel/livepatch/$MOD_SYSCALL/enabled
+livepatch: '$MOD_SYSCALL': initializing unpatching transition
+livepatch: '$MOD_SYSCALL': starting unpatching transition
+livepatch: '$MOD_SYSCALL': completing unpatching transition
+livepatch: '$MOD_SYSCALL': unpatching complete
+% rmmod $MOD_SYSCALL"
+
+exit 0
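
Once the modules are built, the test can also be run on its own, as root,
from the livepatch selftests directory (a sketch):

  % cd tools/testing/selftests/livepatch
  % make && ./test-syscall.sh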
index 7f76f280189a0c341517e641612387ae1ec63143..6c646afa7395e5324c24c0aeaa6ac606745ea4f4 100755 (executable)
@@ -27,7 +27,7 @@ disable_lp $MOD_LIVEPATCH
 
 unload_lp $MOD_LIVEPATCH
 
-check_result "% modprobe $MOD_LIVEPATCH
+check_result "% insmod test_modules/$MOD_LIVEPATCH.ko
 livepatch: enabling patch '$MOD_LIVEPATCH'
 livepatch: '$MOD_LIVEPATCH': initializing patching transition
 livepatch: '$MOD_LIVEPATCH': starting patching transition
@@ -56,7 +56,7 @@ check_sysfs_value  "$MOD_LIVEPATCH" "$MOD_TARGET/patched" "0"
 disable_lp $MOD_LIVEPATCH
 unload_lp $MOD_LIVEPATCH
 
-check_result "% modprobe test_klp_callbacks_demo
+check_result "% insmod test_modules/test_klp_callbacks_demo.ko
 livepatch: enabling patch 'test_klp_callbacks_demo'
 livepatch: 'test_klp_callbacks_demo': initializing patching transition
 test_klp_callbacks_demo: pre_patch_callback: vmlinux
@@ -64,7 +64,7 @@ livepatch: 'test_klp_callbacks_demo': starting patching transition
 livepatch: 'test_klp_callbacks_demo': completing patching transition
 test_klp_callbacks_demo: post_patch_callback: vmlinux
 livepatch: 'test_klp_callbacks_demo': patching complete
-% modprobe test_klp_callbacks_mod
+% insmod test_modules/test_klp_callbacks_mod.ko
 livepatch: applying patch 'test_klp_callbacks_demo' to loading module 'test_klp_callbacks_mod'
 test_klp_callbacks_demo: pre_patch_callback: test_klp_callbacks_mod -> [MODULE_STATE_COMING] Full formed, running module_init
 test_klp_callbacks_demo: post_patch_callback: test_klp_callbacks_mod -> [MODULE_STATE_COMING] Full formed, running module_init
diff --git a/tools/testing/selftests/livepatch/test_klp-call_getpid.c b/tools/testing/selftests/livepatch/test_klp-call_getpid.c
new file mode 100644 (file)
index 0000000..ce321a2
--- /dev/null
@@ -0,0 +1,44 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2023 SUSE
+ * Authors: Libor Pechacek <lpechacek@suse.cz>
+ *          Marcos Paulo de Souza <mpdesouza@suse.com>
+ */
+
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <signal.h>
+
+static int stop;
+static int sig_int;
+
+void hup_handler(int signum)
+{
+       stop = 1;
+}
+
+void int_handler(int signum)
+{
+       stop = 1;
+       sig_int = 1;
+}
+
+int main(int argc, char *argv[])
+{
+       long count = 0;
+
+       signal(SIGHUP, &hup_handler);
+       signal(SIGINT, &int_handler);
+
+       while (!stop) {
+               (void)syscall(SYS_getpid);
+               count++;
+       }
+
+       if (sig_int)
+               printf("%ld iterations done\n", count);
+
+       return 0;
+}
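
On its own, the helper simply calls getpid(2) in a tight loop until it is
signalled: SIGHUP stops it silently, while SIGINT also reports the iteration
count (a sketch):

  % ./test_klp-call_getpid &
  % kill -INT $!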
diff --git a/tools/testing/selftests/livepatch/test_modules/Makefile b/tools/testing/selftests/livepatch/test_modules/Makefile
new file mode 100644 (file)
index 0000000..e6e638c
--- /dev/null
@@ -0,0 +1,26 @@
+TESTMODS_DIR := $(realpath $(dir $(abspath $(lastword $(MAKEFILE_LIST)))))
+KDIR ?= /lib/modules/$(shell uname -r)/build
+
+obj-m += test_klp_atomic_replace.o \
+       test_klp_callbacks_busy.o \
+       test_klp_callbacks_demo.o \
+       test_klp_callbacks_demo2.o \
+       test_klp_callbacks_mod.o \
+       test_klp_livepatch.o \
+       test_klp_state.o \
+       test_klp_state2.o \
+       test_klp_state3.o \
+       test_klp_shadow_vars.o \
+       test_klp_syscall.o
+
+# Ensure that KDIR exists, otherwise skip the compilation
+modules:
+ifneq ("$(wildcard $(KDIR))", "")
+       $(Q)$(MAKE) -C $(KDIR) modules KBUILD_EXTMOD=$(TESTMODS_DIR)
+endif
+
+# Ensure that KDIR exists, otherwise skip the clean target
+clean:
+ifneq ("$(wildcard $(KDIR))", "")
+       $(Q)$(MAKE) -C $(KDIR) clean KBUILD_EXTMOD=$(TESTMODS_DIR)
+endif
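
A built module can be checked for the livepatch flag the same way
is_livepatch_mod() does it, e.g. via modinfo (a sketch):

  % modinfo -F livepatch test_modules/test_klp_livepatch.ko
  Y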
diff --git a/tools/testing/selftests/livepatch/test_modules/test_klp_syscall.c b/tools/testing/selftests/livepatch/test_modules/test_klp_syscall.c
new file mode 100644 (file)
index 0000000..dd80278
--- /dev/null
@@ -0,0 +1,116 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2017-2023 SUSE
+ * Authors: Libor Pechacek <lpechacek@suse.cz>
+ *          Nicolai Stange <nstange@suse.de>
+ *          Marcos Paulo de Souza <mpdesouza@suse.com>
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/livepatch.h>
+
+#if defined(__x86_64__)
+#define FN_PREFIX __x64_
+#elif defined(__s390x__)
+#define FN_PREFIX __s390x_
+#elif defined(__aarch64__)
+#define FN_PREFIX __arm64_
+#else
+/* powerpc does not select ARCH_HAS_SYSCALL_WRAPPER */
+#define FN_PREFIX
+#endif
+
+/* Protects klp_pids */
+static DEFINE_MUTEX(kpid_mutex);
+
+static unsigned int npids, npids_pending;
+static int klp_pids[NR_CPUS];
+module_param_array(klp_pids, int, &npids_pending, 0);
+MODULE_PARM_DESC(klp_pids, "Array of pids to be transitioned to livepatched state.");
+
+static ssize_t npids_show(struct kobject *kobj, struct kobj_attribute *attr,
+                         char *buf)
+{
+       return sprintf(buf, "%u\n", npids_pending);
+}
+
+static struct kobj_attribute klp_attr = __ATTR_RO(npids);
+static struct kobject *klp_kobj;
+
+static asmlinkage long lp_sys_getpid(void)
+{
+       int i;
+
+       mutex_lock(&kpid_mutex);
+       if (npids_pending > 0) {
+               for (i = 0; i < npids; i++) {
+                       if (current->pid == klp_pids[i]) {
+                               klp_pids[i] = 0;
+                               npids_pending--;
+                               break;
+                       }
+               }
+       }
+       mutex_unlock(&kpid_mutex);
+
+       return task_tgid_vnr(current);
+}
+
+static struct klp_func vmlinux_funcs[] = {
+       {
+               .old_name = __stringify(FN_PREFIX) "sys_getpid",
+               .new_func = lp_sys_getpid,
+       }, {}
+};
+
+static struct klp_object objs[] = {
+       {
+               /* name being NULL means vmlinux */
+               .funcs = vmlinux_funcs,
+       }, {}
+};
+
+static struct klp_patch patch = {
+       .mod = THIS_MODULE,
+       .objs = objs,
+};
+
+static int livepatch_init(void)
+{
+       int ret;
+
+       klp_kobj = kobject_create_and_add("test_klp_syscall", kernel_kobj);
+       if (!klp_kobj)
+               return -ENOMEM;
+
+       ret = sysfs_create_file(klp_kobj, &klp_attr.attr);
+       if (ret) {
+               kobject_put(klp_kobj);
+               return ret;
+       }
+
+       /*
+        * Save the number of pids to transition to the livepatched state
+        * before the count of pending pids is decremented.
+        */
+       npids = npids_pending;
+
+       return klp_enable_patch(&patch);
+}
+
+static void livepatch_exit(void)
+{
+       kobject_put(klp_kobj);
+}
+
+module_init(livepatch_init);
+module_exit(livepatch_exit);
+MODULE_LICENSE("GPL");
+MODULE_INFO(livepatch, "Y");
+MODULE_AUTHOR("Libor Pechacek <lpechacek@suse.cz>");
+MODULE_AUTHOR("Nicolai Stange <nstange@suse.de>");
+MODULE_AUTHOR("Marcos Paulo de Souza <mpdesouza@suse.com>");
+MODULE_DESCRIPTION("Livepatch test: syscall transition");
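
Loaded by hand, the module live patches getpid and reports through sysfs how
many of the given pids have not yet made the patched call (a hypothetical
session; the pids are made up):

  % insmod test_modules/test_klp_syscall.ko klp_pids=1234,5678
  % cat /sys/kernel/test_klp_syscall/npids
  2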
index 0899019a7fcb4b04bcedca44227f2c2dd5a83597..e14bdd4455f2d2798077b8a701790bcee0732e90 100755 (executable)
@@ -1,4 +1,4 @@
-#!/bin/sh
+#!/bin/bash
 # SPDX-License-Identifier: GPL-2.0
 
 # Kselftest framework requirement - SKIP code is 4.
index 380b691d3eb9fbe9c1070937d9561b732343aec0..b748c48908d9d4af9ba31fe7d2443329c13c3dc2 100644 (file)
@@ -566,7 +566,7 @@ static int ksm_merge_hugepages_time(int merge_type, int mapping, int prot,
        if (map_ptr_orig == MAP_FAILED)
                err(2, "initial mmap");
 
-       if (madvise(map_ptr, len + HPAGE_SIZE, MADV_HUGEPAGE))
+       if (madvise(map_ptr, len, MADV_HUGEPAGE))
                err(2, "MADV_HUGEPAGE");
 
        pagemap_fd = open("/proc/self/pagemap", O_RDONLY);
index 193281560b61be23d3b55857030ea07b4ad3f95d..86e8f2048a409028b28ece3f755d06f535726c47 100644 (file)
@@ -15,6 +15,7 @@
 #include <unistd.h>
 #include <sys/mman.h>
 #include <fcntl.h>
+#include "vm_util.h"
 
 #define LENGTH (256UL*1024*1024)
 #define PROTECTION (PROT_READ | PROT_WRITE)
@@ -58,10 +59,16 @@ int main(int argc, char **argv)
 {
        void *addr;
        int ret;
+       size_t hugepage_size;
        size_t length = LENGTH;
        int flags = FLAGS;
        int shift = 0;
 
+       hugepage_size = default_huge_page_size();
+       /* munmap with fail if the length is not page aligned */
+       if (hugepage_size > length)
+               length = hugepage_size;
+
        if (argc > 1)
                length = atol(argv[1]) << 20;
        if (argc > 2) {
index 1d4c1589c3055d3bb22eebe2c02fa7b015e4a665..2f8b991f78cb4cade90dc05f502a647a955fb582 100644 (file)
@@ -360,7 +360,8 @@ static long long remap_region(struct config c, unsigned int threshold_mb,
                              char pattern_seed)
 {
        void *addr, *src_addr, *dest_addr, *dest_preamble_addr;
-       unsigned long long i;
+       int d;
+       unsigned long long t;
        struct timespec t_start = {0, 0}, t_end = {0, 0};
        long long  start_ns, end_ns, align_mask, ret, offset;
        unsigned long long threshold;
@@ -378,8 +379,8 @@ static long long remap_region(struct config c, unsigned int threshold_mb,
 
        /* Set byte pattern for source block. */
        srand(pattern_seed);
-       for (i = 0; i < threshold; i++)
-               memset((char *) src_addr + i, (char) rand(), 1);
+       for (t = 0; t < threshold; t++)
+               memset((char *) src_addr + t, (char) rand(), 1);
 
        /* Mask to zero out lower bits of address for alignment */
        align_mask = ~(c.dest_alignment - 1);
@@ -420,8 +421,8 @@ static long long remap_region(struct config c, unsigned int threshold_mb,
 
                /* Set byte pattern for the dest preamble block. */
                srand(pattern_seed);
-               for (i = 0; i < c.dest_preamble_size; i++)
-                       memset((char *) dest_preamble_addr + i, (char) rand(), 1);
+               for (d = 0; d < c.dest_preamble_size; d++)
+                       memset((char *) dest_preamble_addr + d, (char) rand(), 1);
        }
 
        clock_gettime(CLOCK_MONOTONIC, &t_start);
@@ -437,14 +438,14 @@ static long long remap_region(struct config c, unsigned int threshold_mb,
 
        /* Verify byte pattern after remapping */
        srand(pattern_seed);
-       for (i = 0; i < threshold; i++) {
+       for (t = 0; t < threshold; t++) {
                char c = (char) rand();
 
-               if (((char *) dest_addr)[i] != c) {
+               if (((char *) dest_addr)[t] != c) {
                        ksft_print_msg("Data after remap doesn't match at offset %llu\n",
-                                      i);
+                                      t);
                        ksft_print_msg("Expected: %#x\t Got: %#x\n", c & 0xff,
-                                       ((char *) dest_addr)[i] & 0xff);
+                                       ((char *) dest_addr)[t] & 0xff);
                        ret = -1;
                        goto clean_up_dest;
                }
@@ -453,14 +454,14 @@ static long long remap_region(struct config c, unsigned int threshold_mb,
        /* Verify the dest preamble byte pattern after remapping */
        if (c.dest_preamble_size) {
                srand(pattern_seed);
-               for (i = 0; i < c.dest_preamble_size; i++) {
+               for (d = 0; d < c.dest_preamble_size; d++) {
                        char c = (char) rand();
 
-                       if (((char *) dest_preamble_addr)[i] != c) {
+                       if (((char *) dest_preamble_addr)[d] != c) {
                                ksft_print_msg("Preamble data after remap doesn't match at offset %d\n",
-                                              i);
+                                              d);
                                ksft_print_msg("Expected: %#x\t Got: %#x\n", c & 0xff,
-                                              ((char *) dest_preamble_addr)[i] & 0xff);
+                                              ((char *) dest_preamble_addr)[d] & 0xff);
                                ret = -1;
                                goto clean_up_dest;
                        }
index cce90a10515ad2fe78fe68147d26732d717c3bb6..2b9f8cc52639d1942238b41a1ad55edc6bd406ed 100644 (file)
@@ -1517,6 +1517,12 @@ int main(int argc, char *argv[])
                                continue;
 
                        uffd_test_start("%s on %s", test->name, mem_type->name);
+                       if ((mem_type->mem_flag == MEM_HUGETLB ||
+                           mem_type->mem_flag == MEM_HUGETLB_PRIVATE) &&
+                           (default_huge_page_size() == 0)) {
+                               uffd_test_skip("huge page size is 0, feature missing?");
+                               continue;
+                       }
                        if (!uffd_feature_supported(test)) {
                                uffd_test_skip("feature missing");
                                continue;
index 45cae7cab27e12705c59cc56f6fdf5e675805f92..a0a75f3029043727b96bdb59728ed80d4d5cd9c0 100755 (executable)
@@ -29,9 +29,15 @@ check_supported_x86_64()
        # See man 1 gzip under '-f'.
        local pg_table_levels=$(gzip -dcfq "${config}" | grep PGTABLE_LEVELS | cut -d'=' -f 2)
 
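+       # "la57" in the cpuinfo flags indicates 5-level page table support;
+       # the awk script prints 0 when the flag is present and 1 otherwise.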
+       local cpu_supports_pl5=$(awk '/^flags/ {if (/la57/) {print 0;}
+               else {print 1}; exit}' /proc/cpuinfo 2>/dev/null)
+
        if [[ "${pg_table_levels}" -lt 5 ]]; then
                echo "$0: PGTABLE_LEVELS=${pg_table_levels}, must be >= 5 to run this test"
                exit $ksft_skip
+       elif [[ "${cpu_supports_pl5}" -ne 0 ]]; then
+               echo "$0: CPU does not have the necessary la57 flag to support page table level 5"
+               exit $ksft_skip
        fi
 }
 
index 70a02301f4c276ba6313c3baa1ab3b5058a68b0c..3d2d2eb9d6fff077cca24fd82a2a4990c34706d1 100755 (executable)
@@ -1,4 +1,4 @@
-#!/bin/sh
+#!/bin/bash
 # SPDX-License-Identifier: GPL-2.0
 
 set -e
index 50ed5d475dd1317ede706a90d68ec3db06c82e32..bcf51d785a3712934e9e6d2833f6bac00cbe68a2 100644 (file)
@@ -218,7 +218,7 @@ static bool move_mount_set_group_supported(void)
        if (mount(NULL, SET_GROUP_FROM, NULL, MS_SHARED, 0))
                return -1;
 
-       ret = syscall(SYS_move_mount, AT_FDCWD, SET_GROUP_FROM,
+       ret = syscall(__NR_move_mount, AT_FDCWD, SET_GROUP_FROM,
                      AT_FDCWD, SET_GROUP_TO, MOVE_MOUNT_SET_GROUP);
        umount2("/tmp", MNT_DETACH);
 
@@ -363,7 +363,7 @@ TEST_F(move_mount_set_group, complex_sharing_copying)
                       CLONE_VM | CLONE_FILES); ASSERT_GT(pid, 0);
        ASSERT_EQ(wait_for_pid(pid), 0);
 
-       ASSERT_EQ(syscall(SYS_move_mount, ca_from.mntfd, "",
+       ASSERT_EQ(syscall(__NR_move_mount, ca_from.mntfd, "",
                          ca_to.mntfd, "", MOVE_MOUNT_SET_GROUP
                          | MOVE_MOUNT_F_EMPTY_PATH | MOVE_MOUNT_T_EMPTY_PATH),
                  0);
diff --git a/tools/testing/selftests/mqueue/setting b/tools/testing/selftests/mqueue/setting
new file mode 100644 (file)
index 0000000..a953c96
--- /dev/null
@@ -0,0 +1 @@
+timeout=180
index 50818075e566e1abf1f2f9e587951e5abed238fc..211753756bdee87daf7ebb1af06dbb4c1f6ee383 100644 (file)
@@ -53,8 +53,7 @@ TEST_PROGS += bind_bhash.sh
 TEST_PROGS += ip_local_port_range.sh
 TEST_PROGS += rps_default_mask.sh
 TEST_PROGS += big_tcp.sh
-TEST_PROGS_EXTENDED := in_netns.sh setup_loopback.sh setup_veth.sh
-TEST_PROGS_EXTENDED += toeplitz_client.sh toeplitz.sh lib.sh
+TEST_PROGS_EXTENDED := toeplitz_client.sh toeplitz.sh
 TEST_GEN_FILES =  socket nettest
 TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy reuseport_addr_any
 TEST_GEN_FILES += tcp_mmap tcp_inq psock_snd txring_overwrite
@@ -84,6 +83,7 @@ TEST_PROGS += sctp_vrf.sh
 TEST_GEN_FILES += sctp_hello
 TEST_GEN_FILES += csum
 TEST_GEN_FILES += nat6to4.o
+TEST_GEN_FILES += xdp_dummy.o
 TEST_GEN_FILES += ip_local_port_range
 TEST_GEN_FILES += bind_wildcard
 TEST_PROGS += test_vxlan_mdb.sh
@@ -95,6 +95,7 @@ TEST_PROGS += fq_band_pktlimit.sh
 TEST_PROGS += vlan_hw_filter.sh
 
 TEST_FILES := settings
+TEST_FILES += in_netns.sh lib.sh net_helper.sh setup_loopback.sh setup_veth.sh
 
 include ../lib.mk
 
@@ -104,7 +105,7 @@ $(OUTPUT)/tcp_inq: LDLIBS += -lpthread
 $(OUTPUT)/bind_bhash: LDLIBS += -lpthread
 $(OUTPUT)/io_uring_zerocopy_tx: CFLAGS += -I../../../include/
 
-# Rules to generate bpf obj nat6to4.o
+# Rules to generate bpf objs
 CLANG ?= clang
 SCRATCH_DIR := $(OUTPUT)/tools
 BUILD_DIR := $(SCRATCH_DIR)/build
@@ -139,7 +140,7 @@ endif
 
 CLANG_SYS_INCLUDES = $(call get_sys_includes,$(CLANG),$(CLANG_TARGET_ARCH))
 
-$(OUTPUT)/nat6to4.o: nat6to4.c $(BPFOBJ) | $(MAKE_DIRS)
+$(OUTPUT)/nat6to4.o $(OUTPUT)/xdp_dummy.o: $(OUTPUT)/%.o : %.c $(BPFOBJ) | $(MAKE_DIRS)
        $(CLANG) -O2 --target=bpf -c $< $(CCINCLUDE) $(CLANG_SYS_INCLUDES) -o $@
 
 $(BPFOBJ): $(wildcard $(BPFDIR)/*.[ch] $(BPFDIR)/Makefile)                    \
index cde9a91c479716e178c569cac18bedc9698357b6..2db9d15cd45feafc007669ae358372249c74151a 100755 (executable)
@@ -122,7 +122,9 @@ do_netperf() {
        local netns=$1
 
        [ "$NF" = "6" ] && serip=$SERVER_IP6
-       ip net exec $netns netperf -$NF -t TCP_STREAM -H $serip 2>&1 >/dev/null
+
+       # use a large write to be sure to generate big TCP packets
+       ip net exec $netns netperf -$NF -t TCP_STREAM -l 1 -H $serip -- -m 262144 2>&1 >/dev/null
 }
 
 do_test() {
index f30bd57d5e38744de09a86cab47ed4999ba46e25..8bc23fb4c82b71c88c6a0f292e8735b831a5d03f 100755 (executable)
@@ -89,7 +89,7 @@ for ovr in setsock cmsg both diff; do
        check_result $? 0 "TCLASS $prot $ovr - pass"
 
        while [ -d /proc/$BG ]; do
-           $NSEXE ./cmsg_sender -6 -p u $TGT6 1234
+           $NSEXE ./cmsg_sender -6 -p $p $m $((TOS2)) $TGT6 1234
        done
 
        tcpdump -r $TMPF -v 2>&1 | grep "class $TOS2" >> /dev/null
@@ -126,7 +126,7 @@ for ovr in setsock cmsg both diff; do
        check_result $? 0 "HOPLIMIT $prot $ovr - pass"
 
        while [ -d /proc/$BG ]; do
-           $NSEXE ./cmsg_sender -6 -p u $TGT6 1234
+           $NSEXE ./cmsg_sender -6 -p $p $m $LIM $TGT6 1234
        done
 
        tcpdump -r $TMPF -v 2>&1 | grep "hlim $LIM[^0-9]" >> /dev/null
index 8da562a9ae87e445a7e3003c4e07f589e3f85f0d..5e4390cac17eda2c96d35c5cdcdf6899610e4367 100644 (file)
@@ -1,5 +1,6 @@
 CONFIG_USER_NS=y
 CONFIG_NET_NS=y
+CONFIG_BONDING=m
 CONFIG_BPF_SYSCALL=y
 CONFIG_TEST_BPF=m
 CONFIG_NUMA=y
@@ -14,30 +15,74 @@ CONFIG_VETH=y
 CONFIG_NET_IPVTI=y
 CONFIG_IPV6_VTI=y
 CONFIG_DUMMY=y
+CONFIG_BRIDGE_VLAN_FILTERING=y
 CONFIG_BRIDGE=y
+CONFIG_CRYPTO_CHACHA20POLY1305=m
 CONFIG_VLAN_8021Q=y
+CONFIG_GENEVE=m
 CONFIG_IFB=y
+CONFIG_INET_DIAG=y
+CONFIG_INET_ESP=y
+CONFIG_INET_ESP_OFFLOAD=y
+CONFIG_NET_FOU=y
+CONFIG_NET_FOU_IP_TUNNELS=y
+CONFIG_IP_GRE=m
 CONFIG_NETFILTER=y
 CONFIG_NETFILTER_ADVANCED=y
 CONFIG_NF_CONNTRACK=m
+CONFIG_IPV6_SIT=y
+CONFIG_IP_DCCP=m
 CONFIG_NF_NAT=m
 CONFIG_IP6_NF_IPTABLES=m
 CONFIG_IP_NF_IPTABLES=m
 CONFIG_IP6_NF_NAT=m
+CONFIG_IP6_NF_RAW=m
 CONFIG_IP_NF_NAT=m
+CONFIG_IP_NF_RAW=m
+CONFIG_IP_NF_TARGET_TTL=m
+CONFIG_IPV6_GRE=m
+CONFIG_IPV6_SEG6_LWTUNNEL=y
+CONFIG_L2TP_ETH=m
+CONFIG_L2TP_IP=m
+CONFIG_L2TP=m
+CONFIG_L2TP_V3=y
+CONFIG_MACSEC=m
+CONFIG_MACVLAN=y
+CONFIG_MACVTAP=y
+CONFIG_MPLS=y
+CONFIG_MPTCP=y
 CONFIG_NF_TABLES=m
 CONFIG_NF_TABLES_IPV6=y
 CONFIG_NF_TABLES_IPV4=y
 CONFIG_NFT_NAT=m
+CONFIG_NETFILTER_XT_MATCH_LENGTH=m
+CONFIG_NET_ACT_CSUM=m
+CONFIG_NET_ACT_CT=m
+CONFIG_NET_ACT_GACT=m
+CONFIG_NET_ACT_PEDIT=m
+CONFIG_NET_CLS_BASIC=m
+CONFIG_NET_CLS_BPF=m
+CONFIG_NET_CLS_MATCHALL=m
+CONFIG_NET_CLS_U32=m
+CONFIG_NET_IPGRE_DEMUX=m
+CONFIG_NET_IPGRE=m
+CONFIG_NET_IPIP=y
+CONFIG_NET_SCH_FQ_CODEL=m
+CONFIG_NET_SCH_HTB=m
 CONFIG_NET_SCH_FQ=m
 CONFIG_NET_SCH_ETF=m
 CONFIG_NET_SCH_NETEM=y
+CONFIG_NET_SCH_PRIO=m
+CONFIG_NFT_COMPAT=m
+CONFIG_NF_FLOW_TABLE=m
+CONFIG_PSAMPLE=m
+CONFIG_TCP_MD5SIG=y
 CONFIG_TEST_BLACKHOLE_DEV=m
 CONFIG_KALLSYMS=y
+CONFIG_TLS=m
 CONFIG_TRACEPOINTS=y
 CONFIG_NET_DROP_MONITOR=m
 CONFIG_NETDEVSIM=m
-CONFIG_NET_FOU=m
 CONFIG_MPLS_ROUTING=m
 CONFIG_MPLS_IPTUNNEL=m
 CONFIG_NET_SCH_INGRESS=m
@@ -48,7 +93,10 @@ CONFIG_BAREUDP=m
 CONFIG_IPV6_IOAM6_LWTUNNEL=y
 CONFIG_CRYPTO_SM4_GENERIC=y
 CONFIG_AMT=m
+CONFIG_TUN=y
 CONFIG_VXLAN=m
 CONFIG_IP_SCTP=m
 CONFIG_NETFILTER_XT_MATCH_POLICY=m
 CONFIG_CRYPTO_ARIA=y
+CONFIG_XFRM_INTERFACE=m
+CONFIG_XFRM_USER=m
index 452693514be4b06842dbe32088c5495c2c933f0b..4de92632f48360c0002900260af9b35d34186f4b 100644 (file)
@@ -112,7 +112,7 @@ TEST_PROGS = bridge_fdb_learning_limit.sh \
        vxlan_symmetric_ipv6.sh \
        vxlan_symmetric.sh
 
-TEST_PROGS_EXTENDED := devlink_lib.sh \
+TEST_FILES := devlink_lib.sh \
        ethtool_lib.sh \
        fib_offload_lib.sh \
        forwarding.config.sample \
index 9af9f6964808baee53e69c51ad1adb4128fc700a..c62331b2e006069e8812dedf797968c24726493d 100755 (executable)
@@ -327,10 +327,10 @@ locked_port_mab_redirect()
        RET=0
        check_port_mab_support || return 0
 
-       bridge link set dev $swp1 learning on locked on mab on
        tc qdisc add dev $swp1 clsact
        tc filter add dev $swp1 ingress protocol all pref 1 handle 101 flower \
                action mirred egress redirect dev $swp2
+       bridge link set dev $swp1 learning on locked on mab on
 
        ping_do $h1 192.0.2.2
        check_err $? "Ping did not work with redirection"
@@ -349,8 +349,8 @@ locked_port_mab_redirect()
        check_err $? "Locked entry not created after deleting filter"
 
        bridge fdb del `mac_get $h1` vlan 1 dev $swp1 master
-       tc qdisc del dev $swp1 clsact
        bridge link set dev $swp1 learning off locked off mab off
+       tc qdisc del dev $swp1 clsact
 
        log_test "Locked port MAB redirect"
 }
index 61348f71728cd54537f49e17b192e08514323b4c..d9d587454d207931a539f59be15cbc63d471888f 100755 (executable)
@@ -329,7 +329,7 @@ __cfg_test_port_ip_star_g()
 
        bridge -d -s mdb get dev br0 grp $grp vid 10 | grep -q " 0.00"
        check_err $? "(*, G) \"permanent\" entry has a pending group timer"
-       bridge -d -s mdb get dev br0 grp $grp vid 10 | grep -q "\/0.00"
+       bridge -d -s mdb get dev br0 grp $grp vid 10 | grep -q "/0.00"
        check_err $? "\"permanent\" source entry has a pending source timer"
 
        bridge mdb del dev br0 port $swp1 grp $grp vid 10
@@ -346,7 +346,7 @@ __cfg_test_port_ip_star_g()
 
        bridge -d -s mdb get dev br0 grp $grp vid 10 | grep -q " 0.00"
        check_fail $? "(*, G) EXCLUDE entry does not have a pending group timer"
-       bridge -d -s mdb get dev br0 grp $grp vid 10 | grep -q "\/0.00"
+       bridge -d -s mdb get dev br0 grp $grp vid 10 | grep -q "/0.00"
        check_err $? "\"blocked\" source entry has a pending source timer"
 
        bridge mdb del dev br0 port $swp1 grp $grp vid 10
@@ -363,7 +363,7 @@ __cfg_test_port_ip_star_g()
 
        bridge -d -s mdb get dev br0 grp $grp vid 10 | grep -q " 0.00"
        check_err $? "(*, G) INCLUDE entry has a pending group timer"
-       bridge -d -s mdb get dev br0 grp $grp vid 10 | grep -q "\/0.00"
+       bridge -d -s mdb get dev br0 grp $grp vid 10 | grep -q "/0.00"
        check_fail $? "Source entry does not have a pending source timer"
 
        bridge mdb del dev br0 port $swp1 grp $grp vid 10
@@ -1252,14 +1252,17 @@ fwd_test()
        echo
        log_info "# Forwarding tests"
 
+       # Set the Max Response Delay to 100 centiseconds (1 second) so that the
+       # bridge will start forwarding according to its MDB soon after a
+       # multicast querier is enabled.
+       ip link set dev br0 type bridge mcast_query_response_interval 100
+
        # Forwarding according to MDB entries only takes place when the bridge
        # detects that there is a valid querier in the network. Set the bridge
        # as the querier and assign it a valid IPv6 link-local address to be
        # used as the source address for MLD queries.
        ip -6 address add fe80::1/64 nodad dev br0
        ip link set dev br0 type bridge mcast_querier 1
-       # Wait the default Query Response Interval (10 seconds) for the bridge
-       # to determine that there are no other queriers in the network.
        sleep 10
 
        fwd_test_host
@@ -1267,6 +1270,7 @@ fwd_test()
 
        ip link set dev br0 type bridge mcast_querier 0
        ip -6 address del fe80::1/64 dev br0
+       ip link set dev br0 type bridge mcast_query_response_interval 1000
 }
 
 ctrl_igmpv3_is_in_test()
index b0f5e55d2d0b2584aefacc135ffe6b2d2cab34fc..58962963650227bcc942354a052d8bf2bd95aa13 100755 (executable)
@@ -235,9 +235,6 @@ mirred_egress_to_ingress_tcp_test()
        check_err $? "didn't mirred redirect ICMP"
        tc_check_packets "dev $h1 ingress" 102 10
        check_err $? "didn't drop mirred ICMP"
-       local overlimits=$(tc_rule_stats_get ${h1} 101 egress .overlimits)
-       test ${overlimits} = 10
-       check_err $? "wrong overlimits, expected 10 got ${overlimits}"
 
        tc filter del dev $h1 egress protocol ip pref 100 handle 100 flower
        tc filter del dev $h1 egress protocol ip pref 101 handle 101 flower
index 20a7cb7222b8baa4062a769f2f0d27daf2b00cfd..c2420bb72c128119f005de590e36952dc5960f36 100755 (executable)
@@ -209,14 +209,17 @@ test_l2_miss_multicast()
        # both registered and unregistered multicast traffic.
        bridge link set dev $swp2 mcast_router 2
 
+       # Set the Max Response Delay to 100 centiseconds (1 second) so that the
+       # bridge will start forwarding according to its MDB soon after a
+       # multicast querier is enabled.
+       ip link set dev br1 type bridge mcast_query_response_interval 100
+
        # Forwarding according to MDB entries only takes place when the bridge
        # detects that there is a valid querier in the network. Set the bridge
        # as the querier and assign it a valid IPv6 link-local address to be
        # used as the source address for MLD queries.
        ip link set dev br1 type bridge mcast_querier 1
        ip -6 address add fe80::1/64 nodad dev br1
-       # Wait the default Query Response Interval (10 seconds) for the bridge
-       # to determine that there are no other queriers in the network.
        sleep 10
 
        test_l2_miss_multicast_ipv4
@@ -224,6 +227,7 @@ test_l2_miss_multicast()
 
        ip -6 address del fe80::1/64 dev br1
        ip link set dev br1 type bridge mcast_querier 0
+       ip link set dev br1 type bridge mcast_query_response_interval 1000
        bridge link set dev $swp2 mcast_router 1
 }
 
index 19352f106c1dff1b316ef9991e32c9d78142ac45..02c21ff4ca81fddc89ca697fe3d3f04a5dc792c8 100755 (executable)
@@ -31,6 +31,11 @@ run_test() {
       1>>log.txt
     wait "${server_pid}"
     exit_code=$?
+    if [[ ${test} == "large" && -n "${KSFT_MACHINE_SLOW}" && \
+          ${exit_code} -ne 0 ]]; then
+        echo "Ignoring errors due to slow environment" 1>&2
+        exit_code=0
+    fi
     if [[ "${exit_code}" -eq 0 ]]; then
         break;
     fi
index fe59ca3e5596bfe3abfbb477dad2d8bcbb608a56..12491850ae985a779b069662ccba312a3dc1964e 100755 (executable)
@@ -367,14 +367,12 @@ run_test()
   local desc=$2
   local node_src=$3
   local node_dst=$4
-  local ip6_src=$5
-  local ip6_dst=$6
-  local if_dst=$7
-  local trace_type=$8
-  local ioam_ns=$9
-
-  ip netns exec $node_dst ./ioam6_parser $if_dst $name $ip6_src $ip6_dst \
-         $trace_type $ioam_ns &
+  local ip6_dst=$5
+  local trace_type=$6
+  local ioam_ns=$7
+  local type=$8
+
+  ip netns exec $node_dst ./ioam6_parser $name $trace_type $ioam_ns $type &
   local spid=$!
   sleep 0.1
 
@@ -489,7 +487,7 @@ out_undef_ns()
          trace prealloc type 0x800000 ns 0 size 4 dev veth0
 
   run_test ${FUNCNAME[0]} "${desc} ($1 mode)" $ioam_node_alpha $ioam_node_beta \
-         db01::2 db01::1 veth0 0x800000 0
+         db01::1 0x800000 0 $1
 
   [ "$1" = "encap" ] && ip -netns $ioam_node_beta link set ip6tnl0 down
 }
@@ -509,7 +507,7 @@ out_no_room()
          trace prealloc type 0xc00000 ns 123 size 4 dev veth0
 
   run_test ${FUNCNAME[0]} "${desc} ($1 mode)" $ioam_node_alpha $ioam_node_beta \
-         db01::2 db01::1 veth0 0xc00000 123
+         db01::1 0xc00000 123 $1
 
   [ "$1" = "encap" ] && ip -netns $ioam_node_beta link set ip6tnl0 down
 }
@@ -543,14 +541,14 @@ out_bits()
       if [ $cmd_res != 0 ]
       then
         npassed=$((npassed+1))
-        log_test_passed "$descr"
+        log_test_passed "$descr ($1 mode)"
       else
         nfailed=$((nfailed+1))
-        log_test_failed "$descr"
+        log_test_failed "$descr ($1 mode)"
       fi
     else
        run_test "out_bit$i" "$descr ($1 mode)" $ioam_node_alpha \
-           $ioam_node_beta db01::2 db01::1 veth0 ${bit2type[$i]} 123
+           $ioam_node_beta db01::1 ${bit2type[$i]} 123 $1
     fi
   done
 
@@ -574,7 +572,7 @@ out_full_supp_trace()
          trace prealloc type 0xfff002 ns 123 size 100 dev veth0
 
   run_test ${FUNCNAME[0]} "${desc} ($1 mode)" $ioam_node_alpha $ioam_node_beta \
-         db01::2 db01::1 veth0 0xfff002 123
+         db01::1 0xfff002 123 $1
 
   [ "$1" = "encap" ] && ip -netns $ioam_node_beta link set ip6tnl0 down
 }
@@ -604,7 +602,7 @@ in_undef_ns()
          trace prealloc type 0x800000 ns 0 size 4 dev veth0
 
   run_test ${FUNCNAME[0]} "${desc} ($1 mode)" $ioam_node_alpha $ioam_node_beta \
-         db01::2 db01::1 veth0 0x800000 0
+         db01::1 0x800000 0 $1
 
   [ "$1" = "encap" ] && ip -netns $ioam_node_beta link set ip6tnl0 down
 }
@@ -624,7 +622,7 @@ in_no_room()
          trace prealloc type 0xc00000 ns 123 size 4 dev veth0
 
   run_test ${FUNCNAME[0]} "${desc} ($1 mode)" $ioam_node_alpha $ioam_node_beta \
-         db01::2 db01::1 veth0 0xc00000 123
+         db01::1 0xc00000 123 $1
 
   [ "$1" = "encap" ] && ip -netns $ioam_node_beta link set ip6tnl0 down
 }
@@ -651,7 +649,7 @@ in_bits()
            dev veth0
 
     run_test "in_bit$i" "${desc/<n>/$i} ($1 mode)" $ioam_node_alpha \
-           $ioam_node_beta db01::2 db01::1 veth0 ${bit2type[$i]} 123
+           $ioam_node_beta db01::1 ${bit2type[$i]} 123 $1
   done
 
   [ "$1" = "encap" ] && ip -netns $ioam_node_beta link set ip6tnl0 down
@@ -679,7 +677,7 @@ in_oflag()
          trace prealloc type 0xc00000 ns 123 size 4 dev veth0
 
   run_test ${FUNCNAME[0]} "${desc} ($1 mode)" $ioam_node_alpha $ioam_node_beta \
-         db01::2 db01::1 veth0 0xc00000 123
+         db01::1 0xc00000 123 $1
 
   [ "$1" = "encap" ] && ip -netns $ioam_node_beta link set ip6tnl0 down
 
@@ -703,7 +701,7 @@ in_full_supp_trace()
          trace prealloc type 0xfff002 ns 123 size 80 dev veth0
 
   run_test ${FUNCNAME[0]} "${desc} ($1 mode)" $ioam_node_alpha $ioam_node_beta \
-         db01::2 db01::1 veth0 0xfff002 123
+         db01::1 0xfff002 123 $1
 
   [ "$1" = "encap" ] && ip -netns $ioam_node_beta link set ip6tnl0 down
 }
@@ -731,7 +729,7 @@ fwd_full_supp_trace()
          trace prealloc type 0xfff002 ns 123 size 244 via db01::1 dev veth0
 
   run_test ${FUNCNAME[0]} "${desc} ($1 mode)" $ioam_node_alpha $ioam_node_gamma \
-         db01::2 db02::2 veth0 0xfff002 123
+         db02::2 0xfff002 123 $1
 
   [ "$1" = "encap" ] && ip -netns $ioam_node_gamma link set ip6tnl0 down
 }
index d9d1d41901267439aac832166e46410c85f44111..895e5bb5044bb126dc9894cf35a68d7cf1c79ec7 100644 (file)
@@ -8,7 +8,6 @@
 #include <errno.h>
 #include <limits.h>
 #include <linux/const.h>
-#include <linux/if_ether.h>
 #include <linux/ioam6.h>
 #include <linux/ipv6.h>
 #include <stdlib.h>
@@ -512,14 +511,6 @@ static int str2id(const char *tname)
        return -1;
 }
 
-static int ipv6_addr_equal(const struct in6_addr *a1, const struct in6_addr *a2)
-{
-       return ((a1->s6_addr32[0] ^ a2->s6_addr32[0]) |
-               (a1->s6_addr32[1] ^ a2->s6_addr32[1]) |
-               (a1->s6_addr32[2] ^ a2->s6_addr32[2]) |
-               (a1->s6_addr32[3] ^ a2->s6_addr32[3])) == 0;
-}
-
 static int get_u32(__u32 *val, const char *arg, int base)
 {
        unsigned long res;
@@ -603,70 +594,80 @@ static int (*func[__TEST_MAX])(int, struct ioam6_trace_hdr *, __u32, __u16) = {
 
 int main(int argc, char **argv)
 {
-       int fd, size, hoplen, tid, ret = 1;
-       struct in6_addr src, dst;
+       int fd, size, hoplen, tid, ret = 1, on = 1;
        struct ioam6_hdr *opt;
-       struct ipv6hdr *ip6h;
-       __u8 buffer[400], *p;
-       __u16 ioam_ns;
+       struct cmsghdr *cmsg;
+       struct msghdr msg;
+       struct iovec iov;
+       __u8 buffer[512];
        __u32 tr_type;
+       __u16 ioam_ns;
+       __u8 *ptr;
 
-       if (argc != 7)
+       if (argc != 5)
                goto out;
 
-       tid = str2id(argv[2]);
+       tid = str2id(argv[1]);
        if (tid < 0 || !func[tid])
                goto out;
 
-       if (inet_pton(AF_INET6, argv[3], &src) != 1 ||
-           inet_pton(AF_INET6, argv[4], &dst) != 1)
+       if (get_u32(&tr_type, argv[2], 16) ||
+           get_u16(&ioam_ns, argv[3], 0))
                goto out;
 
-       if (get_u32(&tr_type, argv[5], 16) ||
-           get_u16(&ioam_ns, argv[6], 0))
+       fd = socket(PF_INET6, SOCK_RAW,
+                   !strcmp(argv[4], "encap") ? IPPROTO_IPV6 : IPPROTO_ICMPV6);
+       if (fd < 0)
                goto out;
 
-       fd = socket(AF_PACKET, SOCK_DGRAM, __cpu_to_be16(ETH_P_IPV6));
-       if (!fd)
-               goto out;
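+       /* Deliver IPv6 Hop-by-Hop options (where the IOAM data lives) as cmsg */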
+       setsockopt(fd, IPPROTO_IPV6, IPV6_RECVHOPOPTS,  &on, sizeof(on));
 
-       if (setsockopt(fd, SOL_SOCKET, SO_BINDTODEVICE,
-                      argv[1], strlen(argv[1])))
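+       /* A 1-byte payload is enough: the IOAM trace arrives as ancillary data */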
+       iov.iov_len = 1;
+       iov.iov_base = malloc(CMSG_SPACE(sizeof(buffer)));
+       if (!iov.iov_base)
                goto close;
-
 recv:
-       size = recv(fd, buffer, sizeof(buffer), 0);
+       memset(&msg, 0, sizeof(msg));
+       msg.msg_iov = &iov;
+       msg.msg_iovlen = 1;
+       msg.msg_control = buffer;
+       msg.msg_controllen = CMSG_SPACE(sizeof(buffer));
+
+       size = recvmsg(fd, &msg, 0);
        if (size <= 0)
                goto close;
 
-       ip6h = (struct ipv6hdr *)buffer;
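+       /* Walk the ancillary data, looking for the Hop-by-Hop options header */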
+       for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
+               if (cmsg->cmsg_level != IPPROTO_IPV6 ||
+                   cmsg->cmsg_type != IPV6_HOPOPTS ||
+                   cmsg->cmsg_len < sizeof(struct ipv6_hopopt_hdr))
+                       continue;
 
-       if (!ipv6_addr_equal(&ip6h->saddr, &src) ||
-           !ipv6_addr_equal(&ip6h->daddr, &dst))
-               goto recv;
+               ptr = (__u8 *)CMSG_DATA(cmsg);
 
-       if (ip6h->nexthdr != IPPROTO_HOPOPTS)
-               goto close;
+               hoplen = (ptr[1] + 1) << 3;
+               ptr += sizeof(struct ipv6_hopopt_hdr);
 
-       p = buffer + sizeof(*ip6h);
-       hoplen = (p[1] + 1) << 3;
-       p += sizeof(struct ipv6_hopopt_hdr);
+               while (hoplen > 0) {
+                       opt = (struct ioam6_hdr *)ptr;
 
-       while (hoplen > 0) {
-               opt = (struct ioam6_hdr *)p;
+                       if (opt->opt_type == IPV6_TLV_IOAM &&
+                           opt->type == IOAM6_TYPE_PREALLOC) {
+                               ptr += sizeof(*opt);
+                               ret = func[tid](tid,
+                                               (struct ioam6_trace_hdr *)ptr,
+                                               tr_type, ioam_ns);
+                               goto close;
+                       }
 
-               if (opt->opt_type == IPV6_TLV_IOAM &&
-                   opt->type == IOAM6_TYPE_PREALLOC) {
-                       p += sizeof(*opt);
-                       ret = func[tid](tid, (struct ioam6_trace_hdr *)p,
-                                          tr_type, ioam_ns);
-                       break;
+                       ptr += opt->opt_len + 2;
+                       hoplen -= opt->opt_len + 2;
                }
-
-               p += opt->opt_len + 2;
-               hoplen -= opt->opt_len + 2;
        }
+
+       goto recv;
 close:
+       free(iov.iov_base);
        close(fd);
 out:
        return ret;
index 0f217a1cc837de22de278d15f7cfb6bbdb372c50..6ebd58869a637227319524c0bdc69fe2495a37c9 100644 (file)
 #define IP_LOCAL_PORT_RANGE 51
 #endif
 
+#ifndef IPPROTO_MPTCP
+#define IPPROTO_MPTCP 262
+#endif
+
 static __u32 pack_port_range(__u16 lo, __u16 hi)
 {
        return (hi << 16) | (lo << 0);
index dca549443801135cf8db35f9545885a6ab772504..f9fe182dfbd44e9de0f0caa27b409281c0584081 100644 (file)
@@ -4,6 +4,9 @@
 ##############################################################################
 # Defines
 
+WAIT_TIMEOUT=${WAIT_TIMEOUT:=20}
+BUSYWAIT_TIMEOUT=$((WAIT_TIMEOUT * 1000)) # ms
+
 # Kselftest framework requirement - SKIP code is 4.
 ksft_skip=4
 # namespace list created by setup_ns
@@ -48,7 +51,7 @@ cleanup_ns()
 
        for ns in "$@"; do
                ip netns delete "${ns}" &> /dev/null
-               if ! busywait 2 ip netns list \| grep -vq "^$ns$" &> /dev/null; then
+               if ! busywait $BUSYWAIT_TIMEOUT ip netns list \| grep -vq "^$ns$" &> /dev/null; then
                        echo "Warn: Failed to remove namespace $ns"
                        ret=1
                fi
index e317c2e44dae840149fad7fe14a3a41d699b063e..4f80014cae4940a3f56ebb313349baa8540c0a0a 100644 (file)
@@ -22,8 +22,11 @@ CONFIG_NFT_TPROXY=m
 CONFIG_NFT_SOCKET=m
 CONFIG_IP_ADVANCED_ROUTER=y
 CONFIG_IP_MULTIPLE_TABLES=y
+CONFIG_IP_NF_FILTER=m
+CONFIG_IP_NF_MANGLE=m
 CONFIG_IP_NF_TARGET_REJECT=m
 CONFIG_IPV6_MULTIPLE_TABLES=y
+CONFIG_IP6_NF_FILTER=m
 CONFIG_NET_ACT_CSUM=m
 CONFIG_NET_ACT_PEDIT=m
 CONFIG_NET_CLS_ACT=y
index 04fcb8a077c995c768d222171c045caf2ab3c3b3..75fc95675e2dcaa7a4593d0b9bd2ccaccdb12921 100755 (executable)
@@ -20,7 +20,7 @@ flush_pids()
 
        ip netns pids "${ns}" | xargs --no-run-if-empty kill -SIGUSR1 &>/dev/null
 
-       for _ in $(seq 10); do
+       for _ in $(seq $((timeout_poll * 10))); do
                [ -z "$(ip netns pids "${ns}")" ] && break
                sleep 0.1
        done
@@ -62,14 +62,14 @@ __chk_nr()
        nr=$(eval $command)
 
        printf "%-50s" "$msg"
-       if [ $nr != $expected ]; then
-               if [ $nr = "$skip" ] && ! mptcp_lib_expect_all_features; then
+       if [ "$nr" != "$expected" ]; then
+               if [ "$nr" = "$skip" ] && ! mptcp_lib_expect_all_features; then
                        echo "[ skip ] Feature probably not supported"
                        mptcp_lib_result_skip "${msg}"
                else
                        echo "[ fail ] expected $expected found $nr"
                        mptcp_lib_result_fail "${msg}"
-                       ret=$test_cnt
+                       ret=${KSFT_FAIL}
                fi
        else
                echo "[  ok  ]"
@@ -91,6 +91,15 @@ chk_msk_nr()
        __chk_msk_nr "grep -c token:" "$@"
 }
 
+chk_listener_nr()
+{
+       local expected=$1
+       local msg="$2"
+
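+       # ss flags: -n numeric, -l listening, -H no header, -O one line,
+       # -N <netns>; -M selects MPTCP sockets, -t selects TCP subflows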
+       __chk_nr "ss -nlHMON $ns | wc -l" "$expected" "$msg - mptcp" 0
+       __chk_nr "ss -nlHtON $ns | wc -l" "$expected" "$msg - subflows"
+}
+
 wait_msk_nr()
 {
        local condition="grep -c token:"
@@ -115,11 +124,11 @@ wait_msk_nr()
        if [ $i -ge $timeout ]; then
                echo "[ fail ] timeout while expecting $expected max $max last $nr"
                mptcp_lib_result_fail "${msg} # timeout"
-               ret=$test_cnt
+               ret=${KSFT_FAIL}
        elif [ $nr != $expected ]; then
                echo "[ fail ] expected $expected found $nr"
                mptcp_lib_result_fail "${msg} # unexpected result"
-               ret=$test_cnt
+               ret=${KSFT_FAIL}
        else
                echo "[  ok  ]"
                mptcp_lib_result_pass "${msg}"
@@ -166,9 +175,13 @@ chk_msk_listen()
 chk_msk_inuse()
 {
        local expected=$1
-       local msg="$2"
+       local msg="....chk ${2:-${expected}} msk in use"
        local listen_nr
 
+       if [ "${expected}" -eq 0 ]; then
+               msg+=" after flush"
+       fi
+
        listen_nr=$(ss -N "${ns}" -Ml | grep -c LISTEN)
        expected=$((expected + listen_nr))
 
@@ -179,16 +192,21 @@ chk_msk_inuse()
                sleep 0.1
        done
 
-       __chk_nr get_msk_inuse $expected "$msg" 0
+       __chk_nr get_msk_inuse $expected "${msg}" 0
 }
 
 # $1: cestab nr
 chk_msk_cestab()
 {
-       local cestab=$1
+       local expected=$1
+       local msg="....chk ${2:-${expected}} cestab"
+
+       if [ "${expected}" -eq 0 ]; then
+               msg+=" after flush"
+       fi
 
        __chk_nr "mptcp_lib_get_counter ${ns} MPTcpExtMPCurrEstab" \
-                "${cestab}" "....chk ${cestab} cestab" ""
+                "${expected}" "${msg}" ""
 }
 
 wait_connected()
@@ -227,12 +245,12 @@ wait_connected $ns 10000
 chk_msk_nr 2 "after MPC handshake "
 chk_msk_remote_key_nr 2 "....chk remote_key"
 chk_msk_fallback_nr 0 "....chk no fallback"
-chk_msk_inuse 2 "....chk 2 msk in use"
+chk_msk_inuse 2
 chk_msk_cestab 2
 flush_pids
 
-chk_msk_inuse 0 "....chk 0 msk in use after flush"
-chk_msk_cestab 0
+chk_msk_inuse 0 "2->0"
+chk_msk_cestab 0 "2->0"
 
 echo "a" | \
        timeout ${timeout_test} \
@@ -247,12 +265,12 @@ echo "b" | \
                                127.0.0.1 >/dev/null &
 wait_connected $ns 10001
 chk_msk_fallback_nr 1 "check fallback"
-chk_msk_inuse 1 "....chk 1 msk in use"
+chk_msk_inuse 1
 chk_msk_cestab 1
 flush_pids
 
-chk_msk_inuse 0 "....chk 0 msk in use after flush"
-chk_msk_cestab 0
+chk_msk_inuse 0 "1->0"
+chk_msk_cestab 0 "1->0"
 
 NR_CLIENTS=100
 for I in `seq 1 $NR_CLIENTS`; do
@@ -273,12 +291,28 @@ for I in `seq 1 $NR_CLIENTS`; do
 done
 
 wait_msk_nr $((NR_CLIENTS*2)) "many msk socket present"
-chk_msk_inuse $((NR_CLIENTS*2)) "....chk many msk in use"
-chk_msk_cestab $((NR_CLIENTS*2))
+chk_msk_inuse $((NR_CLIENTS*2)) "many"
+chk_msk_cestab $((NR_CLIENTS*2)) "many"
 flush_pids
 
-chk_msk_inuse 0 "....chk 0 msk in use after flush"
-chk_msk_cestab 0
+chk_msk_inuse 0 "many->0"
+chk_msk_cestab 0 "many->0"
+
+chk_listener_nr 0 "no listener sockets"
+NR_SERVERS=100
+for I in $(seq 1 $NR_SERVERS); do
+       ip netns exec $ns ./mptcp_connect -p $((I + 20001)) \
+               -t ${timeout_poll} -l 0.0.0.0 >/dev/null 2>&1 &
+done
+mptcp_lib_wait_local_port_listen $ns $((NR_SERVERS + 20001))
+
+chk_listener_nr $NR_SERVERS "many listener sockets"
+
+# graceful termination
+for I in $(seq 1 $NR_SERVERS); do
+       echo a | ip netns exec $ns ./mptcp_connect -p $((I + 20001)) 127.0.0.1 >/dev/null 2>&1 &
+done
+flush_pids
 
 mptcp_lib_result_print_all_tap
 exit $ret
index 3a5b630261910b1cfdd001e05c97c07335b61827..e4581b0dfb967723e36b1847c512f02f4bc87a45 100755 (executable)
@@ -161,6 +161,11 @@ check_tools()
                exit $ksft_skip
        fi
 
+       if ! ss -h | grep -q MPTCP; then
+               echo "SKIP: ss tool does not support MPTCP"
+               exit $ksft_skip
+       fi
+
        # Use the legacy version if available to support old kernel versions
        if iptables-legacy -V &> /dev/null; then
                iptables="iptables-legacy"
@@ -643,13 +648,6 @@ kill_events_pids()
        mptcp_lib_kill_wait $evts_ns2_pid
 }
 
-kill_tests_wait()
-{
-       #shellcheck disable=SC2046
-       kill -SIGUSR1 $(ip netns pids $ns2) $(ip netns pids $ns1)
-       wait
-}
-
 pm_nl_set_limits()
 {
        local ns=$1
@@ -3340,16 +3338,17 @@ userspace_pm_rm_sf()
 {
        local evts=$evts_ns1
        local t=${3:-1}
-       local ip=4
+       local ip
        local tk da dp sp
        local cnt
 
        [ "$1" == "$ns2" ] && evts=$evts_ns2
-       if mptcp_lib_is_v6 $2; then ip=6; fi
+       [ -n "$(mptcp_lib_evts_get_info "saddr4" "$evts" $t)" ] && ip=4
+       [ -n "$(mptcp_lib_evts_get_info "saddr6" "$evts" $t)" ] && ip=6
        tk=$(mptcp_lib_evts_get_info token "$evts")
-       da=$(mptcp_lib_evts_get_info "daddr$ip" "$evts" $t)
-       dp=$(mptcp_lib_evts_get_info dport "$evts" $t)
-       sp=$(mptcp_lib_evts_get_info sport "$evts" $t)
+       da=$(mptcp_lib_evts_get_info "daddr$ip" "$evts" $t $2)
+       dp=$(mptcp_lib_evts_get_info dport "$evts" $t $2)
+       sp=$(mptcp_lib_evts_get_info sport "$evts" $t $2)
 
        cnt=$(rm_sf_count ${1})
        ip netns exec $1 ./pm_nl_ctl dsf lip $2 lport $sp \
@@ -3436,24 +3435,27 @@ userspace_tests()
        if reset_with_events "userspace pm add & remove address" &&
           continue_if mptcp_lib_has_file '/proc/sys/net/mptcp/pm_type'; then
                set_userspace_pm $ns1
-               pm_nl_set_limits $ns2 1 1
+               pm_nl_set_limits $ns2 2 2
                speed=5 \
                        run_tests $ns1 $ns2 10.0.1.1 &
                local tests_pid=$!
                wait_mpj $ns1
                userspace_pm_add_addr $ns1 10.0.2.1 10
-               chk_join_nr 1 1 1
-               chk_add_nr 1 1
-               chk_mptcp_info subflows 1 subflows 1
-               chk_subflows_total 2 2
-               chk_mptcp_info add_addr_signal 1 add_addr_accepted 1
+               userspace_pm_add_addr $ns1 10.0.3.1 20
+               chk_join_nr 2 2 2
+               chk_add_nr 2 2
+               chk_mptcp_info subflows 2 subflows 2
+               chk_subflows_total 3 3
+               chk_mptcp_info add_addr_signal 2 add_addr_accepted 2
                userspace_pm_rm_addr $ns1 10
                userspace_pm_rm_sf $ns1 "::ffff:10.0.2.1" $SUB_ESTABLISHED
-               chk_rm_nr 1 1 invert
+               userspace_pm_rm_addr $ns1 20
+               userspace_pm_rm_sf $ns1 10.0.3.1 $SUB_ESTABLISHED
+               chk_rm_nr 2 2 invert
                chk_mptcp_info subflows 0 subflows 0
                chk_subflows_total 1 1
                kill_events_pids
-               wait $tests_pid
+               mptcp_lib_kill_wait $tests_pid
        fi
 
        # userspace pm create destroy subflow
@@ -3475,7 +3477,7 @@ userspace_tests()
                chk_mptcp_info subflows 0 subflows 0
                chk_subflows_total 1 1
                kill_events_pids
-               wait $tests_pid
+               mptcp_lib_kill_wait $tests_pid
        fi
 
        # userspace pm create id 0 subflow
@@ -3494,7 +3496,7 @@ userspace_tests()
                chk_mptcp_info subflows 1 subflows 1
                chk_subflows_total 2 2
                kill_events_pids
-               wait $tests_pid
+               mptcp_lib_kill_wait $tests_pid
        fi
 
        # userspace pm remove initial subflow
@@ -3518,7 +3520,7 @@ userspace_tests()
                chk_mptcp_info subflows 1 subflows 1
                chk_subflows_total 1 1
                kill_events_pids
-               wait $tests_pid
+               mptcp_lib_kill_wait $tests_pid
        fi
 
        # userspace pm send RM_ADDR for ID 0
@@ -3544,7 +3546,7 @@ userspace_tests()
                chk_mptcp_info subflows 1 subflows 1
                chk_subflows_total 1 1
                kill_events_pids
-               wait $tests_pid
+               mptcp_lib_kill_wait $tests_pid
        fi
 }
 
@@ -3558,7 +3560,8 @@ endpoint_tests()
                pm_nl_set_limits $ns2 2 2
                pm_nl_add_endpoint $ns1 10.0.2.1 flags signal
                speed=slow \
-                       run_tests $ns1 $ns2 10.0.1.1 2>/dev/null &
+                       run_tests $ns1 $ns2 10.0.1.1 &
+               local tests_pid=$!
 
                wait_mpj $ns1
                pm_nl_check_endpoint "creation" \
@@ -3573,7 +3576,7 @@ endpoint_tests()
                pm_nl_add_endpoint $ns2 10.0.2.2 flags signal
                pm_nl_check_endpoint "modif is allowed" \
                        $ns2 10.0.2.2 id 1 flags signal
-               kill_tests_wait
+               mptcp_lib_kill_wait $tests_pid
        fi
 
        if reset "delete and re-add" &&
@@ -3582,7 +3585,8 @@ endpoint_tests()
                pm_nl_set_limits $ns2 1 1
                pm_nl_add_endpoint $ns2 10.0.2.2 id 2 dev ns2eth2 flags subflow
                test_linkfail=4 speed=20 \
-                       run_tests $ns1 $ns2 10.0.1.1 2>/dev/null &
+                       run_tests $ns1 $ns2 10.0.1.1 &
+               local tests_pid=$!
 
                wait_mpj $ns2
                chk_subflow_nr "before delete" 2
@@ -3597,7 +3601,7 @@ endpoint_tests()
                wait_mpj $ns2
                chk_subflow_nr "after re-add" 2
                chk_mptcp_info subflows 1 subflows 1
-               kill_tests_wait
+               mptcp_lib_kill_wait $tests_pid
        fi
 }
 
index 022262a2cfe0ee59976d398f665c8057dfaea0d7..3777d66fc56d36a4770b164fd781af298cd4eb70 100644 (file)
@@ -6,7 +6,7 @@ readonly KSFT_FAIL=1
 readonly KSFT_SKIP=4
 
 # shellcheck disable=SC2155 # declare and assign separately
-readonly KSFT_TEST=$(basename "${0}" | sed 's/\.sh$//g')
+readonly KSFT_TEST="${MPTCP_LIB_KSFT_TEST:-$(basename "${0}" .sh)}"
 
 MPTCP_LIB_SUBTESTS=()
 
@@ -213,9 +213,9 @@ mptcp_lib_get_info_value() {
        grep "${2}" | sed -n 's/.*\('"${1}"':\)\([0-9a-f:.]*\).*$/\2/p;q'
 }
 
-# $1: info name ; $2: evts_ns ; $3: event type
+# $1: info name ; $2: evts_ns ; [$3: event type; [$4: addr]]
 mptcp_lib_evts_get_info() {
-       mptcp_lib_get_info_value "${1}" "^type:${3:-1}," < "${2}"
+       grep "${4:-}" "${2}" | mptcp_lib_get_info_value "${1}" "^type:${3:-1},"
 }
 
 # $1: PID
index 8f4ff123a7eb92646845a5dea4caf28483057085..71899a3ffa7a9d7831c61f08b7f3b9c20aaed58e 100755 (executable)
@@ -183,7 +183,7 @@ check "ip netns exec $ns1 ./pm_nl_ctl dump" "id 1 flags \
 subflow 10.0.1.1" "          (nobackup)"
 
 # fullmesh support has been added later
-ip netns exec $ns1 ./pm_nl_ctl set id 1 flags fullmesh
+ip netns exec $ns1 ./pm_nl_ctl set id 1 flags fullmesh 2>/dev/null
 if ip netns exec $ns1 ./pm_nl_ctl dump | grep -q "fullmesh" ||
    mptcp_lib_expect_all_features; then
        check "ip netns exec $ns1 ./pm_nl_ctl dump" "id 1 flags \
@@ -194,6 +194,12 @@ subflow 10.0.1.1" "          (nofullmesh)"
        ip netns exec $ns1 ./pm_nl_ctl set id 1 flags backup,fullmesh
        check "ip netns exec $ns1 ./pm_nl_ctl dump" "id 1 flags \
 subflow,backup,fullmesh 10.0.1.1" "          (backup,fullmesh)"
+else
+       for st in fullmesh nofullmesh backup,fullmesh; do
+               st="          (${st})"
+               printf "%-50s%s\n" "${st}" "[SKIP]"
+               mptcp_lib_result_skip "${st}"
+       done
 fi
 
 mptcp_lib_result_print_all_tap
index 79b65bdf05db6586726cc76d3313f12368d21dc5..abc5648b59abde537dca90791404691050c759e2 100644 (file)
@@ -1 +1 @@
-timeout=1200
+timeout=1800
index ae8ad5d6fb9dac680573b4207a67781e18773c09..8f9ddb3ad4fe83501f54a1ac5e62047108eea910 100755 (executable)
@@ -250,7 +250,8 @@ run_test()
                [ $bail -eq 0 ] || exit $ret
        fi
 
-       printf "%-60s" "$msg - reverse direction"
+       msg+=" - reverse direction"
+       printf "%-60s" "${msg}"
        do_transfer $large $small $time
        lret=$?
        mptcp_lib_result_code "${lret}" "${msg}"
@@ -284,12 +285,12 @@ done
 
 setup
 run_test 10 10 0 0 "balanced bwidth"
-run_test 10 10 1 50 "balanced bwidth with unbalanced delay"
+run_test 10 10 1 25 "balanced bwidth with unbalanced delay"
 
 # we still need some additional infrastructure to pass the following test-cases
-run_test 30 10 0 0 "unbalanced bwidth"
-run_test 30 10 1 50 "unbalanced bwidth with unbalanced delay"
-run_test 30 10 50 1 "unbalanced bwidth with opposed, unbalanced delay"
+run_test 10 3 0 0 "unbalanced bwidth"
+run_test 10 3 1 25 "unbalanced bwidth with unbalanced delay"
+run_test 10 3 25 1 "unbalanced bwidth with opposed, unbalanced delay"
 
 mptcp_lib_result_print_all_tap
 exit $ret
index 6167837f48e17ef8ba0d41ba541f73da765f945e..1b94a75604fee98788ba5792b384ca4870bdafbb 100755 (executable)
@@ -75,7 +75,7 @@ print_test()
 {
        test_name="${1}"
 
-       _printf "%-63s" "${test_name}"
+       _printf "%-68s" "${test_name}"
 }
 
 print_results()
@@ -542,7 +542,7 @@ verify_subflow_events()
        local remid
        local info
 
-       info="${e_saddr} (${e_from}) => ${e_daddr} (${e_to})"
+       info="${e_saddr} (${e_from}) => ${e_daddr}:${e_dport} (${e_to})"
 
        if [ "$e_type" = "$SUB_ESTABLISHED" ]
        then
old mode 100755 (executable)
new mode 100644 (file)
index 4fe0bef..6596fe0
@@ -8,13 +8,16 @@ wait_local_port_listen()
        local listener_ns="${1}"
        local port="${2}"
        local protocol="${3}"
-       local port_hex
+       local pattern
        local i
 
-       port_hex="$(printf "%04X" "${port}")"
+       pattern=":$(printf "%04X" "${port}") "
+
+       # for the tcp protocol, additionally check the socket state ("0A" == TCP_LISTEN)
+       [ ${protocol} = "tcp" ] && pattern="${pattern}0A"
        for i in $(seq 10); do
-               if ip netns exec "${listener_ns}" cat /proc/net/"${protocol}"* | \
-                  grep -q "${port_hex}"; then
+               if ip netns exec "${listener_ns}" awk '{print $2" "$4}' \
+                  /proc/net/"${protocol}"* | grep -q "${pattern}"; then
                        break
                fi
                sleep 0.1
index f8499d4c87f3f763e774619666e00ce6a17d333b..36e40256ab92a696de62339dd7c7342df3468372 100755 (executable)
@@ -502,7 +502,20 @@ test_netlink_checks () {
            wc -l) == 2 ] || \
              return 1
 
+       info "Checking clone depth"
        ERR_MSG="Flow actions may not be safe on all matching packets"
+       PRE_TEST=$(dmesg | grep -c "${ERR_MSG}")
+       ovs_add_flow "test_netlink_checks" nv0 \
+               'in_port(1),eth(),eth_type(0x800),ipv4()' \
+               'clone(clone(clone(clone(clone(clone(clone(clone(clone(clone(clone(clone(clone(clone(clone(clone(clone(drop)))))))))))))))))' \
+               >/dev/null 2>&1 && return 1
+       POST_TEST=$(dmesg | grep -c "${ERR_MSG}")
+
+       if [ "$PRE_TEST" == "$POST_TEST" ]; then
+               info "failed - clone depth too large"
+               return 1
+       fi
+
        PRE_TEST=$(dmesg | grep -c "${ERR_MSG}")
        ovs_add_flow "test_netlink_checks" nv0 \
                'in_port(1),eth(),eth_type(0x0806),arp()' 'drop(0),2' \
index b97e621face958838f1ad5cf2d12dfc5875db4f4..5e0e539a323d55a44802495ca356661ef8de783a 100644 (file)
@@ -299,7 +299,7 @@ class ovsactions(nla):
         ("OVS_ACTION_ATTR_PUSH_NSH", "none"),
         ("OVS_ACTION_ATTR_POP_NSH", "flag"),
         ("OVS_ACTION_ATTR_METER", "none"),
-        ("OVS_ACTION_ATTR_CLONE", "none"),
+        ("OVS_ACTION_ATTR_CLONE", "recursive"),
         ("OVS_ACTION_ATTR_CHECK_PKT_LEN", "none"),
         ("OVS_ACTION_ATTR_ADD_MPLS", "none"),
         ("OVS_ACTION_ATTR_DEC_TTL", "none"),
@@ -465,29 +465,42 @@ class ovsactions(nla):
                     print_str += "pop_mpls"
             else:
                 datum = self.get_attr(field[0])
-                print_str += datum.dpstr(more)
+                if field[0] == "OVS_ACTION_ATTR_CLONE":
+                    print_str += "clone("
+                    print_str += datum.dpstr(more)
+                    print_str += ")"
+                else:
+                    print_str += datum.dpstr(more)
 
         return print_str
 
     def parse(self, actstr):
+        totallen = len(actstr)
         while len(actstr) != 0:
             parsed = False
+            parencount = 0
             if actstr.startswith("drop"):
                 # If no reason is provided, the implicit drop is used (i.e no
                 # action). If some reason is given, an explicit action is used.
-                actstr, reason = parse_extract_field(
-                    actstr,
-                    "drop(",
-                    "([0-9]+)",
-                    lambda x: int(x, 0),
-                    False,
-                    None,
-                )
+                reason = None
+                if actstr.startswith("drop("):
+                    parencount += 1
+
+                    actstr, reason = parse_extract_field(
+                        actstr,
+                        "drop(",
+                        "([0-9]+)",
+                        lambda x: int(x, 0),
+                        False,
+                        None,
+                    )
+
                 if reason is not None:
                     self["attrs"].append(["OVS_ACTION_ATTR_DROP", reason])
                     parsed = True
                 else:
-                    return
+                    actstr = actstr[len("drop"):]
+                    return (totallen - len(actstr))
 
             elif parse_starts_block(actstr, "^(\d+)", False, True):
                 actstr, output = parse_extract_field(
@@ -504,6 +517,7 @@ class ovsactions(nla):
                     False,
                     0,
                 )
+                parencount += 1
                 self["attrs"].append(["OVS_ACTION_ATTR_RECIRC", recircid])
                 parsed = True
 
@@ -516,12 +530,22 @@ class ovsactions(nla):
 
             for flat_act in parse_flat_map:
                 if parse_starts_block(actstr, flat_act[0], False):
-                    actstr += len(flat_act[0])
+                    actstr = actstr[len(flat_act[0]):]
                     self["attrs"].append([flat_act[1]])
                     actstr = actstr[strspn(actstr, ", ") :]
                     parsed = True
 
-            if parse_starts_block(actstr, "ct(", False):
+            if parse_starts_block(actstr, "clone(", False):
+                parencount += 1
+                subacts = ovsactions()
+                actstr = actstr[len("clone("):]
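+                # parse() recursively consumes the nested action list and
+                # returns the number of characters it parsed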
+                parsedLen = subacts.parse(actstr)
+                self["attrs"].append(("OVS_ACTION_ATTR_CLONE", subacts))
+                actstr = actstr[parsedLen:]
+                parsed = True
+            elif parse_starts_block(actstr, "ct(", False):
+                parencount += 1
                 actstr = actstr[len("ct(") :]
                 ctact = ovsactions.ctact()
 
@@ -553,6 +577,7 @@ class ovsactions(nla):
                         natact = ovsactions.ctact.natattr()
 
                         if actstr.startswith("("):
+                            parencount += 1
                             t = None
                             actstr = actstr[1:]
                             if actstr.startswith("src"):
@@ -607,15 +632,29 @@ class ovsactions(nla):
                                     actstr = actstr[strspn(actstr, ", ") :]
 
                         ctact["attrs"].append(["OVS_CT_ATTR_NAT", natact])
-                        actstr = actstr[strspn(actstr, ",) ") :]
+                        actstr = actstr[strspn(actstr, ", ") :]
 
                 self["attrs"].append(["OVS_ACTION_ATTR_CT", ctact])
                 parsed = True
 
-            actstr = actstr[strspn(actstr, "), ") :]
+            actstr = actstr[strspn(actstr, ", ") :]
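+            # unwind the parenthesized blocks opened while parsing this action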
+            while parencount > 0:
+                parencount -= 1
+                actstr = actstr[strspn(actstr, " "):]
+                if len(actstr) and actstr[0] != ")":
+                    raise ValueError("Action str: '%s' unbalanced" % actstr)
+                actstr = actstr[1:]
+
+            if len(actstr) and actstr[0] == ")":
+                return (totallen - len(actstr))
+
+            actstr = actstr[strspn(actstr, ", ") :]
+
             if not parsed:
                 raise ValueError("Action str: '%s' not supported" % actstr)
 
+        return (totallen - len(actstr))
+
 
 class ovskey(nla):
     nla_flags = NLA_F_NESTED
@@ -2111,6 +2150,8 @@ def main(argv):
     ovsflow = OvsFlow()
     ndb = NDB()
 
+    sys.setrecursionlimit(100000)
+
     if hasattr(args, "showdp"):
         found = False
         for iface in ndb.interfaces:
index f10879788f61ba4f4c01d6cb60a929048ec56540..cfc84958025a61e7ee24a3675a0969e0e3c7cd52 100755 (executable)
 #      Same as above but with IPv6
 
 source lib.sh
+source net_helper.sh
 
 PAUSE_ON_FAIL=no
 VERBOSE=0
@@ -707,23 +708,23 @@ setup_xfrm6() {
 }
 
 setup_xfrm4udp() {
-       setup_xfrm 4 ${veth4_a_addr} ${veth4_b_addr} "encap espinudp 4500 4500 0.0.0.0"
-       setup_nettest_xfrm 4 4500
+       setup_xfrm 4 ${veth4_a_addr} ${veth4_b_addr} "encap espinudp 4500 4500 0.0.0.0" && \
+               setup_nettest_xfrm 4 4500
 }
 
 setup_xfrm6udp() {
-       setup_xfrm 6 ${veth6_a_addr} ${veth6_b_addr} "encap espinudp 4500 4500 0.0.0.0"
-       setup_nettest_xfrm 6 4500
+       setup_xfrm 6 ${veth6_a_addr} ${veth6_b_addr} "encap espinudp 4500 4500 0.0.0.0" && \
+               setup_nettest_xfrm 6 4500
 }
 
 setup_xfrm4udprouted() {
-       setup_xfrm 4 ${prefix4}.${a_r1}.1 ${prefix4}.${b_r1}.1 "encap espinudp 4500 4500 0.0.0.0"
-       setup_nettest_xfrm 4 4500
+       setup_xfrm 4 ${prefix4}.${a_r1}.1 ${prefix4}.${b_r1}.1 "encap espinudp 4500 4500 0.0.0.0" && \
+               setup_nettest_xfrm 4 4500
 }
 
 setup_xfrm6udprouted() {
-       setup_xfrm 6 ${prefix6}:${a_r1}::1 ${prefix6}:${b_r1}::1 "encap espinudp 4500 4500 0.0.0.0"
-       setup_nettest_xfrm 6 4500
+       setup_xfrm 6 ${prefix6}:${a_r1}::1 ${prefix6}:${b_r1}::1 "encap espinudp 4500 4500 0.0.0.0" && \
+               setup_nettest_xfrm 6 4500
 }
 
 setup_routing_old() {
@@ -1335,12 +1336,14 @@ test_pmtu_ipvX_over_bridged_vxlanY_or_geneveY_exception() {
                else
                        TCPDST="TCP:[${dst}]:50000"
                fi
-               ${ns_b} socat -T 3 -u -6 TCP-LISTEN:50000 STDOUT > $tmpoutfile &
+               ${ns_b} socat -T 3 -u -6 TCP-LISTEN:50000,reuseaddr STDOUT > $tmpoutfile &
+               local socat_pid=$!
 
-               sleep 1
+               wait_local_port_listen ${NS_B} 50000 tcp
 
-               dd if=/dev/zero of=/dev/stdout status=none bs=1M count=1 | ${target} socat -T 3 -u STDIN $TCPDST,connect-timeout=3
+               dd if=/dev/zero status=none bs=1M count=1 | ${target} socat -T 3 -u STDIN $TCPDST,connect-timeout=3
 
+               wait ${socat_pid}
                size=$(du -sb $tmpoutfile)
                size=${size%%/tmp/*}
 
@@ -1954,6 +1957,13 @@ check_command() {
        return 0
 }
 
+check_running() {
+       pid=${1}
+       cmd=${2}
+
+       [ "$(cat /proc/${pid}/cmdline 2>/dev/null | tr -d '\0')" = "${cmd}" ]
+}
+
 test_cleanup_vxlanX_exception() {
        outer="${1}"
        encap="vxlan"
@@ -1984,11 +1994,12 @@ test_cleanup_vxlanX_exception() {
 
        ${ns_a} ip link del dev veth_A-R1 &
        iplink_pid=$!
-       sleep 1
-       if [ "$(cat /proc/${iplink_pid}/cmdline 2>/dev/null | tr -d '\0')" = "iplinkdeldevveth_A-R1" ]; then
-               err "  can't delete veth device in a timely manner, PMTU dst likely leaked"
-               return 1
-       fi
+       for i in $(seq 1 20); do
+               check_running ${iplink_pid} "iplinkdeldevveth_A-R1" || return 0
+               sleep 0.1
+       done
+       err "  can't delete veth device in a timely manner, PMTU dst likely leaked"
+       return 1
 }
 
 test_cleanup_ipv6_exception() {
index a26c5624429fb1a029d1d472921154e73a7ea86b..4287a85298907969dbd7df7da0e1969494f7857e 100755 (executable)
@@ -1,4 +1,4 @@
-#!/bin/sh
+#!/bin/bash
 # SPDX-License-Identifier: GPL-2.0
 
 readonly ksft_skip=4
@@ -33,6 +33,10 @@ chk_rps() {
 
        rps_mask=$($cmd /sys/class/net/$dev_name/queues/rx-0/rps_cpus)
        printf "%-60s" "$msg"
+
+       # In case there are more than 32 CPUs, we need to remove the commas from the masks
+       rps_mask=${rps_mask//,}
+       expected_rps_mask=${expected_rps_mask//,}
        if [ $rps_mask -eq $expected_rps_mask ]; then
                echo "[ ok ]"
        else
index 4667d74579d135eb74d701d79baa3036c88f2635..874a2952aa8ee16b1841bdd7e2930e54a1c99303 100755 (executable)
@@ -440,7 +440,6 @@ kci_test_encap_vxlan()
        local ret=0
        vxlan="test-vxlan0"
        vlan="test-vlan0"
-       testns="$1"
        run_cmd ip -netns "$testns" link add "$vxlan" type vxlan id 42 group 239.1.1.1 \
                dev "$devdummy" dstport 4789
        if [ $? -ne 0 ]; then
@@ -485,7 +484,6 @@ kci_test_encap_fou()
 {
        local ret=0
        name="test-fou"
-       testns="$1"
        run_cmd_grep 'Usage: ip fou' ip fou help
        if [ $? -ne 0 ];then
                end_test "SKIP: fou: iproute2 too old"
@@ -526,8 +524,8 @@ kci_test_encap()
        run_cmd ip -netns "$testns" link set lo up
        run_cmd ip -netns "$testns" link add name "$devdummy" type dummy
        run_cmd ip -netns "$testns" link set "$devdummy" up
-       run_cmd kci_test_encap_vxlan "$testns"
-       run_cmd kci_test_encap_fou "$testns"
+       run_cmd kci_test_encap_vxlan
+       run_cmd kci_test_encap_fou
 
        ip netns del "$testns"
        return $ret
old mode 100755 (executable)
new mode 100644 (file)
index a9a1759e035ca875ac22036fa52984b32ada76f8..1f78a87f6f37eaab8dc41850e1a906f9aef6315f 100644 (file)
@@ -11,7 +11,7 @@ setup_veth_ns() {
        local -r ns_mac="$4"
 
        [[ -e /var/run/netns/"${ns_name}" ]] || ip netns add "${ns_name}"
-       echo 100000 > "/sys/class/net/${ns_dev}/gro_flush_timeout"
+       echo 1000000 > "/sys/class/net/${ns_dev}/gro_flush_timeout"
        ip link set dev "${ns_dev}" netns "${ns_name}" mtu 65535
        ip -netns "${ns_name}" link set dev "${ns_dev}" up
 
index a148181641026e18e7d9c138ab7b6dde89cc90fd..e9fa14e1073226829e883d7c6621ae9e9a2ce173 100644 (file)
@@ -3,19 +3,16 @@
 #define _GNU_SOURCE
 #include <sched.h>
 
+#include <fcntl.h>
+
 #include <netinet/in.h>
 #include <sys/socket.h>
 #include <sys/sysinfo.h>
 
 #include "../kselftest_harness.h"
 
-#define CLIENT_PER_SERVER      32 /* More sockets, more reliable */
-#define NR_SERVER              self->nproc
-#define NR_CLIENT              (CLIENT_PER_SERVER * NR_SERVER)
-
 FIXTURE(so_incoming_cpu)
 {
-       int nproc;
        int *servers;
        union {
                struct sockaddr addr;
@@ -56,12 +53,47 @@ FIXTURE_VARIANT_ADD(so_incoming_cpu, after_all_listen)
        .when_to_set = AFTER_ALL_LISTEN,
 };
 
+static void write_sysctl(struct __test_metadata *_metadata,
+                        char *filename, char *string)
+{
+       int fd, len, ret;
+
+       fd = open(filename, O_WRONLY);
+       ASSERT_NE(fd, -1);
+
+       len = strlen(string);
+       ret = write(fd, string, len);
+       ASSERT_EQ(ret, len);
+
+       close(fd);
+}
+
+static void setup_netns(struct __test_metadata *_metadata)
+{
+       ASSERT_EQ(unshare(CLONE_NEWNET), 0);
+       ASSERT_EQ(system("ip link set lo up"), 0);
+
+       write_sysctl(_metadata, "/proc/sys/net/ipv4/ip_local_port_range", "10000 60001");
+       write_sysctl(_metadata, "/proc/sys/net/ipv4/tcp_tw_reuse", "0");
+}
+
+#define NR_PORT                                (60001 - 10000 - 1)
+#define NR_CLIENT_PER_SERVER_DEFAULT   32
+static int nr_client_per_server, nr_server, nr_client;
+
 FIXTURE_SETUP(so_incoming_cpu)
 {
-       self->nproc = get_nprocs();
-       ASSERT_LE(2, self->nproc);
+       setup_netns(_metadata);
+
+       nr_server = get_nprocs();
+       ASSERT_LE(2, nr_server);
+
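+       /* Clamp the client count so that every connection fits into the
+        * local port range restricted by setup_netns().
+        */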
+       if (NR_CLIENT_PER_SERVER_DEFAULT * nr_server < NR_PORT)
+               nr_client_per_server = NR_CLIENT_PER_SERVER_DEFAULT;
+       else
+               nr_client_per_server = NR_PORT / nr_server;
+
+       nr_client = nr_client_per_server * nr_server;
 
-       self->servers = malloc(sizeof(int) * NR_SERVER);
+       self->servers = malloc(sizeof(int) * nr_server);
        ASSERT_NE(self->servers, NULL);
 
        self->in_addr.sin_family = AF_INET;
@@ -74,7 +106,7 @@ FIXTURE_TEARDOWN(so_incoming_cpu)
 {
        int i;
 
-       for (i = 0; i < NR_SERVER; i++)
+       for (i = 0; i < nr_server; i++)
                close(self->servers[i]);
 
        free(self->servers);
@@ -110,10 +142,10 @@ int create_server(struct __test_metadata *_metadata,
        if (variant->when_to_set == BEFORE_LISTEN)
                set_so_incoming_cpu(_metadata, fd, cpu);
 
-       /* We don't use CLIENT_PER_SERVER here not to block
+       /* We don't use nr_client_per_server here so as not to block
         * this test at connect() if SO_INCOMING_CPU is broken.
         */
-       ret = listen(fd, NR_CLIENT);
+       ret = listen(fd, nr_client);
        ASSERT_EQ(ret, 0);
 
        if (variant->when_to_set == AFTER_LISTEN)
@@ -128,7 +160,7 @@ void create_servers(struct __test_metadata *_metadata,
 {
        int i, ret;
 
-       for (i = 0; i < NR_SERVER; i++) {
+       for (i = 0; i < nr_server; i++) {
                self->servers[i] = create_server(_metadata, self, variant, i);
 
                if (i == 0) {
@@ -138,7 +170,7 @@ void create_servers(struct __test_metadata *_metadata,
        }
 
        if (variant->when_to_set == AFTER_ALL_LISTEN) {
-               for (i = 0; i < NR_SERVER; i++)
+               for (i = 0; i < nr_server; i++)
                        set_so_incoming_cpu(_metadata, self->servers[i], i);
        }
 }
@@ -149,7 +181,7 @@ void create_clients(struct __test_metadata *_metadata,
        cpu_set_t cpu_set;
        int i, j, fd, ret;
 
-       for (i = 0; i < NR_SERVER; i++) {
+       for (i = 0; i < nr_server; i++) {
                CPU_ZERO(&cpu_set);
 
                CPU_SET(i, &cpu_set);
@@ -162,7 +194,7 @@ void create_clients(struct __test_metadata *_metadata,
                ret = sched_setaffinity(0, sizeof(cpu_set), &cpu_set);
                ASSERT_EQ(ret, 0);
 
-               for (j = 0; j < CLIENT_PER_SERVER; j++) {
+               for (j = 0; j < nr_client_per_server; j++) {
                        fd  = socket(AF_INET, SOCK_STREAM, 0);
                        ASSERT_NE(fd, -1);
 
@@ -180,8 +212,8 @@ void verify_incoming_cpu(struct __test_metadata *_metadata,
        int i, j, fd, cpu, ret, total = 0;
        socklen_t len = sizeof(int);
 
-       for (i = 0; i < NR_SERVER; i++) {
-               for (j = 0; j < CLIENT_PER_SERVER; j++) {
+       for (i = 0; i < nr_server; i++) {
+               for (j = 0; j < nr_client_per_server; j++) {
                        /* If we see -EAGAIN here, SO_INCOMING_CPU is broken */
                        fd = accept(self->servers[i], &self->addr, &self->addrlen);
                        ASSERT_NE(fd, -1);
@@ -195,7 +227,7 @@ void verify_incoming_cpu(struct __test_metadata *_metadata,
                }
        }
 
-       ASSERT_EQ(total, NR_CLIENT);
+       ASSERT_EQ(total, nr_client);
        TH_LOG("SO_INCOMING_CPU is very likely to be "
               "working correctly with %d sockets.", total);
 }
index 3f06f4d286a988f15ccbc006f8015b24a07be3f0..5e861ad32a42e11b680236b4a016ed952caf5914 100755 (executable)
@@ -5,6 +5,7 @@
 
 set -e
 
+readonly ksft_skip=4
 readonly DEV="veth0"
 readonly BIN="./so_txtime"
 
@@ -46,7 +47,7 @@ ip -netns "${NS2}" addr add 192.168.1.2/24 dev "${DEV}"
 ip -netns "${NS1}" addr add       fd::1/64 dev "${DEV}" nodad
 ip -netns "${NS2}" addr add       fd::2/64 dev "${DEV}" nodad
 
-do_test() {
+run_test() {
        local readonly IP="$1"
        local readonly CLOCK="$2"
        local readonly TXARGS="$3"
@@ -64,12 +65,25 @@ do_test() {
        fi
 
        local readonly START="$(date +%s%N --date="+ 0.1 seconds")"
+
        ip netns exec "${NS2}" "${BIN}" -"${IP}" -c "${CLOCK}" -t "${START}" -S "${SADDR}" -D "${DADDR}" "${RXARGS}" -r &
        ip netns exec "${NS1}" "${BIN}" -"${IP}" -c "${CLOCK}" -t "${START}" -S "${SADDR}" -D "${DADDR}" "${TXARGS}"
        wait "$!"
 }
 
+do_test() {
+       run_test "$@"
+       [ $? -ne 0 ] && ret=1
+}
+
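+# Same as do_test(), but the case only passes when the transfer fails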
+do_fail_test() {
+       run_test "$@"
+       [ $? -eq 0 ] && ret=1
+}
+
 ip netns exec "${NS1}" tc qdisc add dev "${DEV}" root fq
+set +e
+ret=0
 do_test 4 mono a,-1 a,-1
 do_test 6 mono a,0 a,0
 do_test 6 mono a,10 a,10
@@ -77,13 +91,20 @@ do_test 4 mono a,10,b,20 a,10,b,20
 do_test 6 mono a,20,b,10 b,20,a,20
 
 if ip netns exec "${NS1}" tc qdisc replace dev "${DEV}" root etf clockid CLOCK_TAI delta 400000; then
-       ! do_test 4 tai a,-1 a,-1
-       ! do_test 6 tai a,0 a,0
+       do_fail_test 4 tai a,-1 a,-1
+       do_fail_test 6 tai a,0 a,0
        do_test 6 tai a,10 a,10
        do_test 4 tai a,10,b,20 a,10,b,20
        do_test 6 tai a,20,b,10 b,10,a,20
 else
        echo "tc ($(tc -V)) does not support qdisc etf. skipping"
+       [ $ret -eq 0 ] && ret=$ksft_skip
 fi
 
-echo OK. All tests passed
+if [ $ret -eq 0 ]; then
+       echo OK. All tests passed
+elif [[ $ret -ne $ksft_skip && -n "$KSFT_MACHINE_SLOW" ]]; then
+       echo "Ignoring errors due to slow environment" 1>&2
+       ret=0
+fi
+exit $ret
diff --git a/tools/testing/selftests/net/tcp_ao/config b/tools/testing/selftests/net/tcp_ao/config
new file mode 100644 (file)
index 0000000..d3277a9
--- /dev/null
@@ -0,0 +1,10 @@
+CONFIG_CRYPTO_HMAC=y
+CONFIG_CRYPTO_RMD160=y
+CONFIG_CRYPTO_SHA1=y
+CONFIG_IPV6_MULTIPLE_TABLES=y
+CONFIG_IPV6=y
+CONFIG_NET_L3_MASTER_DEV=y
+CONFIG_NET_VRF=y
+CONFIG_TCP_AO=y
+CONFIG_TCP_MD5SIG=y
+CONFIG_VETH=m
index c48b4970ca17e07220813192fadbc553cdc89250..24e62120b7924d3a1555a7e42097f9e55338b4db 100644 (file)
@@ -417,9 +417,9 @@ struct test_key {
                matches_vrf             : 1,
                is_current              : 1,
                is_rnext                : 1,
-               used_on_handshake       : 1,
-               used_after_accept       : 1,
-               used_on_client          : 1;
+               used_on_server_tx       : 1,
+               used_on_client_tx       : 1,
+               skip_counters_checks    : 1;
 };
 
 struct key_collection {
@@ -609,16 +609,14 @@ static int key_collection_socket(bool server, unsigned int port)
                                addr = &this_ip_dest;
                        sndid = key->client_keyid;
                        rcvid = key->server_keyid;
-                       set_current = key->is_current;
-                       set_rnext = key->is_rnext;
+                       key->used_on_client_tx = set_current = key->is_current;
+                       key->used_on_server_tx = set_rnext = key->is_rnext;
                }
 
                if (test_add_key_cr(sk, key->password, key->len,
                                    *addr, vrf, sndid, rcvid, key->maclen,
                                    key->alg, set_current, set_rnext))
                        test_key_error("setsockopt(TCP_AO_ADD_KEY)", key);
-               if (set_current || set_rnext)
-                       key->used_on_handshake = 1;
 #ifdef DEBUG
                test_print("%s [%u/%u] key: { %s, %u:%u, %u, %u:%u:%u:%u (%u)}",
                           server ? "server" : "client", i, collection.nr_keys,
@@ -640,22 +638,22 @@ static void verify_counters(const char *tst_name, bool is_listen_sk, bool server
        for (i = 0; i < collection.nr_keys; i++) {
                struct test_key *key = &collection.keys[i];
                uint8_t sndid, rcvid;
-               bool was_used;
+               bool rx_cnt_expected;
 
+               if (key->skip_counters_checks)
+                       continue;
                if (server) {
                        sndid = key->server_keyid;
                        rcvid = key->client_keyid;
-                       if (is_listen_sk)
-                               was_used = key->used_on_handshake;
-                       else
-                               was_used = key->used_after_accept;
+                       rx_cnt_expected = key->used_on_client_tx;
                } else {
                        sndid = key->client_keyid;
                        rcvid = key->server_keyid;
-                       was_used = key->used_on_client;
+                       rx_cnt_expected = key->used_on_server_tx;
                }
 
-               test_tcp_ao_key_counters_cmp(tst_name, a, b, was_used,
+               test_tcp_ao_key_counters_cmp(tst_name, a, b,
+                                            rx_cnt_expected ? TEST_CNT_KEY_GOOD : 0,
                                             sndid, rcvid);
        }
        test_tcp_ao_counters_free(a);
@@ -843,7 +841,7 @@ static void end_server(const char *tst_name, int sk,
        synchronize_threads(); /* 4: verified => closed */
        close(sk);
 
-       verify_counters(tst_name, true, false, begin, &end);
+       verify_counters(tst_name, false, true, begin, &end);
        synchronize_threads(); /* 5: counters */
 }
 
@@ -916,9 +914,8 @@ static int run_client(const char *tst_name, unsigned int port,
                current_index = nr_keys - 1;
        if (rnext_index < 0)
                rnext_index = nr_keys - 1;
-       collection.keys[current_index].used_on_handshake = 1;
-       collection.keys[rnext_index].used_after_accept = 1;
-       collection.keys[rnext_index].used_on_client = 1;
+       collection.keys[current_index].used_on_client_tx = 1;
+       collection.keys[rnext_index].used_on_server_tx = 1;
 
        synchronize_threads(); /* 3: accepted => send data */
        if (test_client_verify(sk, msg_sz, msg_nr, TEST_TIMEOUT_SEC)) {
@@ -1059,7 +1056,16 @@ static void check_current_back(const char *tst_name, unsigned int port,
                test_error("Can't change the current key");
        if (test_client_verify(sk, msg_len, nr_packets, TEST_TIMEOUT_SEC))
                test_fail("verify failed");
-       collection.keys[rotate_to_index].used_after_accept = 1;
+       /* There is a race here: between setting the current_key with
+        * setsockopt(TCP_AO_INFO) and starting to send some data - there
+        * might have been a segment received with the desired
+        * RNext_key set. In turn that would mean that the first outgoing
+        * segment will have the desired current_key (flipped back).
+        * Which is what the user/test wants. As it's racy, skip checking
+        * the counters, but still check the resulting current/rnext
+        * keys on both sides.
+        */
+       collection.keys[rotate_to_index].skip_counters_checks = 1;
 
        end_client(tst_name, sk, nr_keys, current_index, rnext_index, &tmp);
 }
@@ -1089,7 +1095,7 @@ static void roll_over_keys(const char *tst_name, unsigned int port,
                }
                verify_current_rnext(tst_name, sk, -1,
                                     collection.keys[i].server_keyid);
-               collection.keys[i].used_on_client = 1;
+               collection.keys[i].used_on_server_tx = 1;
                synchronize_threads(); /* verify current/rnext */
        }
        end_client(tst_name, sk, nr_keys, current_index, rnext_index, &tmp);
index c75d82885a2e1aa40f463bbdc65999c05c6a063d..15aeb0963058fdf645451206b3015dd707aa0c13 100644 (file)
@@ -62,7 +62,9 @@ int test_wait_fd(int sk, time_t sec, bool write)
                return -ETIMEDOUT;
        }
 
-       if (getsockopt(sk, SOL_SOCKET, SO_ERROR, &ret, &slen) || ret)
+       if (getsockopt(sk, SOL_SOCKET, SO_ERROR, &ret, &slen))
+               return -errno;
+       if (ret)
                return -ret;
        return 0;
 }
@@ -584,9 +586,11 @@ int test_client_verify(int sk, const size_t msg_len, const size_t nr,
 {
        size_t buf_sz = msg_len * nr;
        char *buf = alloca(buf_sz);
+       ssize_t ret;
 
        randomize_buffer(buf, buf_sz);
-       if (test_client_loop(sk, buf, buf_sz, msg_len, timeout_sec) != buf_sz)
-               return -1;
-       return 0;
+       ret = test_client_loop(sk, buf, buf_sz, msg_len, timeout_sec);
+       if (ret < 0)
+               return (int)ret;
+       return ret != buf_sz ? -1 : 0;
 }
index ac06009a7f5f65ddf0095aa6d7044e98abf032cf..7df8b8700e39e96292f8eafdf105ee0314a65497 100644 (file)
@@ -1,10 +1,33 @@
 // SPDX-License-Identifier: GPL-2.0
-/* Author: Dmitry Safonov <dima@arista.com> */
+/*
+ * The test checks that both active and passive reset have correct TCP-AO
+ * signature. An "active" reset (abort) here is procured from closing
+ * listen() socket with non-accepted connections in the queue:
+ * inet_csk_listen_stop() => inet_child_forget() =>
+ *                        => tcp_disconnect() => tcp_send_active_reset()
+ *
+ * The passive reset is quite hard to get on established TCP connections.
+ * It could be procured from non-established states, but synchronizing
+ * that from userspace in order to reliably get an RST is not easy.
+ * So, instead it's procured by corrupting the SEQ number in TIMED-WAIT state.
+ *
+ * It's important to test both passive and active RST as they go through
+ * different code-paths:
+ * - tcp_send_active_reset() makes no-data skb, sends it with tcp_transmit_skb()
+ * - tcp_v*_send_reset() create their reply skbs and send them with
+ *   ip_send_unicast_reply()
+ *
+ * In both cases TCP-AO signatures have to be correct, which is verified by
+ * (1) checking that the TCP-AO connection was reset and (2) TCP-AO counters.
+ *
+ * Author: Dmitry Safonov <dima@arista.com>
+ */
 #include <inttypes.h>
 #include "../../../../include/linux/kernel.h"
 #include "aolib.h"
 
 const size_t quota = 1000;
+const size_t packet_sz = 100;
 /*
  * Backlog == 0 means 1 connection in queue, see:
  * commit 64a146513f8f ("[NET]: Revert incorrect accept queue...")
@@ -59,26 +82,6 @@ static void close_forced(int sk)
        close(sk);
 }
 
-static int test_wait_for_exception(int sk, time_t sec)
-{
-       struct timeval tv = { .tv_sec = sec };
-       struct timeval *ptv = NULL;
-       fd_set efds;
-       int ret;
-
-       FD_ZERO(&efds);
-       FD_SET(sk, &efds);
-
-       if (sec)
-               ptv = &tv;
-
-       errno = 0;
-       ret = select(sk + 1, NULL, NULL, &efds, ptv);
-       if (ret < 0)
-               return -errno;
-       return ret ? sk : 0;
-}
-
 static void test_server_active_rst(unsigned int port)
 {
        struct tcp_ao_counters cnt1, cnt2;
@@ -155,17 +158,16 @@ static void test_server_passive_rst(unsigned int port)
                        test_fail("server returned %zd", bytes);
        }
 
-       synchronize_threads(); /* 3: chekpoint/restore the connection */
+       synchronize_threads(); /* 3: checkpoint the client */
+       synchronize_threads(); /* 4: close the server, creating twsk */
        if (test_get_tcp_ao_counters(sk, &ao2))
                test_error("test_get_tcp_ao_counters()");
-
-       synchronize_threads(); /* 4: terminate server + send more on client */
-       bytes = test_server_run(sk, quota, TEST_RETRANSMIT_SEC);
        close(sk);
+
+       synchronize_threads(); /* 5: restore the socket, send more data */
        test_tcp_ao_counters_cmp("passive RST server", &ao1, &ao2, TEST_CNT_GOOD);
 
-       synchronize_threads(); /* 5: verified => closed */
-       close(sk);
+       synchronize_threads(); /* 6: server exits */
 }
 
 static void *server_fn(void *arg)
@@ -284,7 +286,7 @@ static void test_client_active_rst(unsigned int port)
                test_error("test_wait_fds(): %d", err);
 
        synchronize_threads(); /* 3: close listen socket */
-       if (test_client_verify(sk[0], 100, quota / 100, TEST_TIMEOUT_SEC))
+       if (test_client_verify(sk[0], packet_sz, quota / packet_sz, TEST_TIMEOUT_SEC))
                test_fail("Failed to send data on connected socket");
        else
                test_ok("Verified established tcp connection");
@@ -323,7 +325,6 @@ static void test_client_passive_rst(unsigned int port)
        struct tcp_sock_state img;
        sockaddr_af saddr;
        int sk, err;
-       socklen_t slen = sizeof(err);
 
        sk = socket(test_family, SOCK_STREAM, IPPROTO_TCP);
        if (sk < 0)
@@ -337,18 +338,51 @@ static void test_client_passive_rst(unsigned int port)
                test_error("failed to connect()");
 
        synchronize_threads(); /* 2: accepted => send data */
-       if (test_client_verify(sk, 100, quota / 100, TEST_TIMEOUT_SEC))
+       if (test_client_verify(sk, packet_sz, quota / packet_sz, TEST_TIMEOUT_SEC))
                test_fail("Failed to send data on connected socket");
        else
                test_ok("Verified established tcp connection");
 
-       synchronize_threads(); /* 3: chekpoint/restore the connection */
+       synchronize_threads(); /* 3: checkpoint the client */
        test_enable_repair(sk);
        test_sock_checkpoint(sk, &img, &saddr);
        test_ao_checkpoint(sk, &ao_img);
-       test_kill_sk(sk);
+       test_disable_repair(sk);
 
-       img.out.seq += quota;
+       synchronize_threads(); /* 4: close the server, creating twsk */
+
+       /*
+        * The "corruption" in SEQ has to be small enough to fit into TCP
+        * window, see tcp_timewait_state_process() for out-of-window
+        * segments.
+        */
+       img.out.seq += 5; /* 5 is more noticeable in tcpdump than 1 */
+
+       /*
+        * FIXME: This is kind-of ugly and dirty, but it works.
+        *
+        * At this moment, the server has already close()'ed the socket.
+        * The passive RST that is being targeted here is new data after
+        * half-duplex close, see tcp_timewait_state_process() => TCP_TW_RST
+        *
+        * What is needed here is:
+        * (1) wait for FIN from the server
+        * (2) make sure that the ACK from the client went out
+        * (3) make sure that the ACK was received and processed by the server
+        *
+        * Otherwise, the data sent from the "repaired" socket after the
+        * SEQ corruption may reach the server before it's in
+        * TCP_FIN_WAIT2.
+        *
+        * (1) is easy with select()/poll()
+        * (2) is possible by polling tcpi_state from TCP_INFO
+        * (3) is quite complex: as the server's socket was already closed,
+        *     the way to do it would probably be tcp-diag.
+        */
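+       /*
+        * Step (1) alone would look roughly like this (illustrative only):
+        *
+        *   struct pollfd pfd = { .fd = sk, .events = POLLIN };
+        *   poll(&pfd, 1, timeout_ms);        // the FIN makes sk readable
+        *   // recv(sk, &c, 1, MSG_PEEK) returning 0 then confirms the EOF,
+        *   // but only once all in-flight data has been drained.
+        *
+        * The sleep() below stands in for all three steps.
+        */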
+       sleep(TEST_RETRANSMIT_SEC);
+
+       synchronize_threads(); /* 5: restore the socket, send more data */
+       test_kill_sk(sk);
 
        sk = socket(test_family, SOCK_STREAM, IPPROTO_TCP);
        if (sk < 0)
@@ -366,25 +400,33 @@ static void test_client_passive_rst(unsigned int port)
        test_disable_repair(sk);
        test_sock_state_free(&img);
 
-       synchronize_threads(); /* 4: terminate server + send more on client */
-       if (test_client_verify(sk, 100, quota / 100, 2 * TEST_TIMEOUT_SEC))
-               test_ok("client connection broken post-seq-adjust");
-       else
-               test_fail("client connection still works post-seq-adjust");
-
-       test_wait_for_exception(sk, TEST_TIMEOUT_SEC);
-
-       if (getsockopt(sk, SOL_SOCKET, SO_ERROR, &err, &slen))
-               test_error("getsockopt()");
-       if (err != ECONNRESET && err != EPIPE)
-               test_fail("client connection was not reset: %d", err);
+       /*
+        * This is how "passive reset" is acquired in this test from TCP_TW_RST:
+        *
+        * IP 10.0.254.1.7011 > 10.0.1.1.59772: Flags [P.], seq 901:1001, ack 1001, win 249,
+        *    options [tcp-ao keyid 100 rnextkeyid 100 mac 0x10217d6c36a22379086ef3b1], length 100
+        * IP 10.0.254.1.7011 > 10.0.1.1.59772: Flags [F.], seq 1001, ack 1001, win 249,
+        *    options [tcp-ao keyid 100 rnextkeyid 100 mac 0x104ffc99b98c10a5298cc268], length 0
+        * IP 10.0.1.1.59772 > 10.0.254.1.7011: Flags [.], ack 1002, win 251,
+        *    options [tcp-ao keyid 100 rnextkeyid 100 mac 0xe496dd4f7f5a8a66873c6f93,nop,nop,sack 1 {1001:1002}], length 0
+        * IP 10.0.1.1.59772 > 10.0.254.1.7011: Flags [P.], seq 1006:1106, ack 1001, win 251,
+        *    options [tcp-ao keyid 100 rnextkeyid 100 mac 0x1b5f3330fb23fbcd0c77d0ca], length 100
+        * IP 10.0.254.1.7011 > 10.0.1.1.59772: Flags [R], seq 3215596252, win 0,
+        *    options [tcp-ao keyid 100 rnextkeyid 100 mac 0x0bcfbbf497bce844312304b2], length 0
+        */
+       err = test_client_verify(sk, packet_sz, quota / packet_sz, 2 * TEST_TIMEOUT_SEC);
+       /* Make sure that the connection was reset, not timed out */
+       if (err && err == -ECONNRESET)
+               test_ok("client sock was passively reset post-seq-adjust");
+       else if (err)
+               test_fail("client sock was not reset post-seq-adjust: %d", err);
        else
-               test_ok("client connection was reset");
+               test_fail("client sock is yet connected post-seq-adjust");
 
        if (test_get_tcp_ao_counters(sk, &ao2))
                test_error("test_get_tcp_ao_counters()");
 
-       synchronize_threads(); /* 5: verified => closed */
+       synchronize_threads(); /* 6: server exits */
        close(sk);
        test_tcp_ao_counters_cmp("client passive RST", &ao1, &ao2, TEST_CNT_GOOD);
 }
@@ -410,6 +452,6 @@ static void *client_fn(void *arg)
 
 int main(int argc, char *argv[])
 {
-       test_init(15, server_fn, client_fn);
+       test_init(14, server_fn, client_fn);
        return 0;
 }
diff --git a/tools/testing/selftests/net/tcp_ao/settings b/tools/testing/selftests/net/tcp_ao/settings
new file mode 100644 (file)
index 0000000..6091b45
--- /dev/null
@@ -0,0 +1 @@
+timeout=120
index c5b568cd7d901ce19d26cc0228dc7089581bb7f1..6b59a652159f7754417471c066a06bd4eb511a41 100644 (file)
@@ -110,9 +110,9 @@ static void try_accept(const char *tst_name, unsigned int port,
                test_tcp_ao_counters_cmp(tst_name, &ao_cnt1, &ao_cnt2, cnt_expected);
 
 out:
-       synchronize_threads(); /* close() */
+       synchronize_threads(); /* test_kill_sk() */
        if (sk > 0)
-               close(sk);
+               test_kill_sk(sk);
 }
 
 static void server_add_routes(void)
@@ -302,10 +302,10 @@ static void try_connect(const char *tst_name, unsigned int port,
                test_ok("%s: connected", tst_name);
 
 out:
-       synchronize_threads(); /* close() */
+       synchronize_threads(); /* test_kill_sk() */
        /* _test_connect_socket() cleans up on failure */
        if (ret > 0)
-               close(sk);
+               test_kill_sk(sk);
 }
 
 #define PREINSTALL_MD5_FIRST   BIT(0)
@@ -486,10 +486,10 @@ static void try_to_add(const char *tst_name, unsigned int port,
        }
 
 out:
-       synchronize_threads(); /* close() */
+       synchronize_threads(); /* test_kill_sk() */
        /* _test_connect_socket() cleans up on failure */
        if (ret > 0)
-               close(sk);
+               test_kill_sk(sk);
 }
 
 static void client_add_ip(union tcp_addr *client, const char *ip)
index 70a7d87ba2d21cecf6d76f7d184e0902e7b6d3e9..1b3f89e2b86e6aac2f9d631bb9bb22265c3f1734 100755 (executable)
@@ -124,6 +124,16 @@ tc_check_packets()
        [[ $pkts == $count ]]
 }
 
+bridge_link_check()
+{
+       local ns=$1; shift
+       local dev=$1; shift
+       local state=$1; shift
+
+       bridge -n $ns -d -j link show dev $dev | \
+               jq -e ".[][\"state\"] == \"$state\"" &> /dev/null
+}
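+# Used with busywait, e.g.:
+#   busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 forwarding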
+
 ################################################################################
 # Setup
 
@@ -259,6 +269,7 @@ backup_port()
        log_test $? 0 "No forwarding out of vx0"
 
        run_cmd "ip -n $sw1 link set dev swp1 carrier off"
+       busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 disabled
        log_test $? 0 "swp1 carrier off"
 
        run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1"
@@ -268,6 +279,7 @@ backup_port()
        log_test $? 0 "No forwarding out of vx0"
 
        run_cmd "ip -n $sw1 link set dev swp1 carrier on"
+       busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 forwarding
        log_test $? 0 "swp1 carrier on"
 
        # Configure vx0 as the backup port of swp1 and check that packets are
@@ -284,6 +296,7 @@ backup_port()
        log_test $? 0 "No forwarding out of vx0"
 
        run_cmd "ip -n $sw1 link set dev swp1 carrier off"
+       busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 disabled
        log_test $? 0 "swp1 carrier off"
 
        run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1"
@@ -293,6 +306,7 @@ backup_port()
        log_test $? 0 "Forwarding out of vx0"
 
        run_cmd "ip -n $sw1 link set dev swp1 carrier on"
+       busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 forwarding
        log_test $? 0 "swp1 carrier on"
 
        run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1"
@@ -314,6 +328,7 @@ backup_port()
        log_test $? 0 "No forwarding out of vx0"
 
        run_cmd "ip -n $sw1 link set dev swp1 carrier off"
+       busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 disabled
        log_test $? 0 "swp1 carrier off"
 
        run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1"
@@ -369,6 +384,7 @@ backup_nhid()
        log_test $? 0 "No forwarding out of vx0"
 
        run_cmd "ip -n $sw1 link set dev swp1 carrier off"
+       busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 disabled
        log_test $? 0 "swp1 carrier off"
 
        run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1"
@@ -382,6 +398,7 @@ backup_nhid()
        log_test $? 0 "Forwarding using VXLAN FDB entry"
 
        run_cmd "ip -n $sw1 link set dev swp1 carrier on"
+       busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 forwarding
        log_test $? 0 "swp1 carrier on"
 
        # Configure nexthop ID 10 as the backup nexthop ID of swp1 and check
@@ -398,6 +415,7 @@ backup_nhid()
        log_test $? 0 "No forwarding out of vx0"
 
        run_cmd "ip -n $sw1 link set dev swp1 carrier off"
+       busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 disabled
        log_test $? 0 "swp1 carrier off"
 
        run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1"
@@ -411,6 +429,7 @@ backup_nhid()
        log_test $? 0 "No forwarding using VXLAN FDB entry"
 
        run_cmd "ip -n $sw1 link set dev swp1 carrier on"
+       busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 forwarding
        log_test $? 0 "swp1 carrier on"
 
        run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1"
@@ -441,6 +460,7 @@ backup_nhid()
        log_test $? 0 "No forwarding using VXLAN FDB entry"
 
        run_cmd "ip -n $sw1 link set dev swp1 carrier off"
+       busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 disabled
        log_test $? 0 "swp1 carrier off"
 
        run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1"
@@ -497,6 +517,7 @@ backup_nhid_invalid()
        log_test $? 0 "Valid nexthop as backup nexthop"
 
        run_cmd "ip -n $sw1 link set dev swp1 carrier off"
+       busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 disabled
        log_test $? 0 "swp1 carrier off"
 
        run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1"
@@ -604,7 +625,9 @@ backup_nhid_ping()
        run_cmd "bridge -n $sw2 link set dev swp1 backup_nhid 10"
 
        run_cmd "ip -n $sw1 link set dev swp1 carrier off"
+       busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 disabled
        run_cmd "ip -n $sw2 link set dev swp1 carrier off"
+       busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw2 swp1 disabled
 
        run_cmd "ip netns exec $sw1 ping -i 0.1 -c 10 -w $PING_TIMEOUT 192.0.2.66"
        log_test $? 0 "Ping with backup nexthop ID"
index 7799e042a9719cda33ea7d004d2ae4a2ec608a4f..b95c249f81c254dae9160b42ec595b3d2daf6679 100644 (file)
@@ -1002,12 +1002,12 @@ TEST_F(tls, recv_partial)
 
        memset(recv_mem, 0, sizeof(recv_mem));
        EXPECT_EQ(send(self->fd, test_str, send_len, 0), send_len);
-       EXPECT_NE(recv(self->cfd, recv_mem, strlen(test_str_first),
-                      MSG_WAITALL), -1);
+       EXPECT_EQ(recv(self->cfd, recv_mem, strlen(test_str_first),
+                      MSG_WAITALL), strlen(test_str_first));
        EXPECT_EQ(memcmp(test_str_first, recv_mem, strlen(test_str_first)), 0);
        memset(recv_mem, 0, sizeof(recv_mem));
-       EXPECT_NE(recv(self->cfd, recv_mem, strlen(test_str_second),
-                      MSG_WAITALL), -1);
+       EXPECT_EQ(recv(self->cfd, recv_mem, strlen(test_str_second),
+                      MSG_WAITALL), strlen(test_str_second));
        EXPECT_EQ(memcmp(test_str_second, recv_mem, strlen(test_str_second)),
                  0);
 }
@@ -1485,6 +1485,51 @@ TEST_F(tls, control_msg)
        EXPECT_EQ(memcmp(buf, test_str, send_len), 0);
 }
 
+TEST_F(tls, control_msg_nomerge)
+{
+       char *rec1 = "1111";
+       char *rec2 = "2222";
+       int send_len = 5;
+       char buf[15];
+
+       if (self->notls)
+               SKIP(return, "no TLS support");
+
+       EXPECT_EQ(tls_send_cmsg(self->fd, 100, rec1, send_len, 0), send_len);
+       EXPECT_EQ(tls_send_cmsg(self->fd, 100, rec2, send_len, 0), send_len);
+
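+       /* Two same-type control records must not be merged: every recv(),
+        * peeking or not, should return exactly one 5-byte record. */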
+       EXPECT_EQ(tls_recv_cmsg(_metadata, self->cfd, 100, buf, sizeof(buf), MSG_PEEK), send_len);
+       EXPECT_EQ(memcmp(buf, rec1, send_len), 0);
+
+       EXPECT_EQ(tls_recv_cmsg(_metadata, self->cfd, 100, buf, sizeof(buf), MSG_PEEK), send_len);
+       EXPECT_EQ(memcmp(buf, rec1, send_len), 0);
+
+       EXPECT_EQ(tls_recv_cmsg(_metadata, self->cfd, 100, buf, sizeof(buf), 0), send_len);
+       EXPECT_EQ(memcmp(buf, rec1, send_len), 0);
+
+       EXPECT_EQ(tls_recv_cmsg(_metadata, self->cfd, 100, buf, sizeof(buf), 0), send_len);
+       EXPECT_EQ(memcmp(buf, rec2, send_len), 0);
+}
+
+TEST_F(tls, data_control_data)
+{
+       char *rec1 = "1111";
+       char *rec2 = "2222";
+       char *rec3 = "3333";
+       int send_len = 5;
+       char buf[15];
+
+       if (self->notls)
+               SKIP(return, "no TLS support");
+
+       EXPECT_EQ(send(self->fd, rec1, send_len, 0), send_len);
+       EXPECT_EQ(tls_send_cmsg(self->fd, 100, rec2, send_len, 0), send_len);
+       EXPECT_EQ(send(self->fd, rec3, send_len, 0), send_len);
+
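+       /* The control record between rec1 and rec3 is a hard read boundary:
+        * repeated peeks must keep returning only rec1's 5 bytes. */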
+       EXPECT_EQ(recv(self->cfd, buf, sizeof(buf), MSG_PEEK), send_len);
+       EXPECT_EQ(recv(self->cfd, buf, sizeof(buf), MSG_PEEK), send_len);
+}
+
 TEST_F(tls, shutdown)
 {
        char const *test_str = "test_read";
@@ -1874,13 +1919,13 @@ TEST_F(tls_err, poll_partial_rec_async)
                /* Child should sleep in poll(), never get a wake */
                pfd.fd = self->cfd2;
                pfd.events = POLLIN;
-               EXPECT_EQ(poll(&pfd, 1, 5), 0);
+               EXPECT_EQ(poll(&pfd, 1, 20), 0);
 
                EXPECT_EQ(write(p[1], &token, 1), 1); /* Barrier #1 */
 
                pfd.fd = self->cfd2;
                pfd.events = POLLIN;
-               EXPECT_EQ(poll(&pfd, 1, 5), 1);
+               EXPECT_EQ(poll(&pfd, 1, 20), 1);
 
                exit(!_metadata->passed);
        }
index af5dc57c8ce935907fd93279077c0d326205415e..8802604148dda1c2565fdb0d5b0aaabb0cad1427 100755 (executable)
@@ -7,7 +7,7 @@ source net_helper.sh
 
 readonly PEER_NS="ns-peer-$(mktemp -u XXXXXX)"
 
-BPF_FILE="../bpf/xdp_dummy.bpf.o"
+BPF_FILE="xdp_dummy.o"
 
 # set global exit status, but never reset nonzero one.
 check_err()
@@ -197,7 +197,7 @@ run_all() {
 }
 
 if [ ! -f ${BPF_FILE} ]; then
-       echo "Missing ${BPF_FILE}. Build bpf selftest first"
+       echo "Missing ${BPF_FILE}. Run 'make' first"
        exit -1
 fi
 
index cb664679b4342992a16694a182c7d0b3a7e9d80b..7080eae5312b2f9fa13c41868337fd4433fb0de6 100755 (executable)
@@ -7,7 +7,7 @@ source net_helper.sh
 
 readonly PEER_NS="ns-peer-$(mktemp -u XXXXXX)"
 
-BPF_FILE="../bpf/xdp_dummy.bpf.o"
+BPF_FILE="xdp_dummy.o"
 
 cleanup() {
        local -r jobs="$(jobs -p)"
@@ -84,7 +84,7 @@ run_all() {
 }
 
 if [ ! -f ${BPF_FILE} ]; then
-       echo "Missing ${BPF_FILE}. Build bpf selftest first"
+       echo "Missing ${BPF_FILE}. Run 'make' first"
        exit -1
 fi
 
index dd47fa96f6b3e5ea1cf1f750a4fd55d7a0c4592b..e1ff645bd3d1c7b0b8ba177ee73ce595a91f3808 100755 (executable)
@@ -7,7 +7,7 @@ source net_helper.sh
 
 readonly PEER_NS="ns-peer-$(mktemp -u XXXXXX)"
 
-BPF_FILE="../bpf/xdp_dummy.bpf.o"
+BPF_FILE="xdp_dummy.o"
 
 cleanup() {
        local -r jobs="$(jobs -p)"
@@ -85,12 +85,12 @@ run_all() {
 }
 
 if [ ! -f ${BPF_FILE} ]; then
-       echo "Missing ${BPF_FILE}. Build bpf selftest first"
+       echo "Missing ${BPF_FILE}. Run 'make' first"
        exit -1
 fi
 
 if [ ! -f nat6to4.o ]; then
-       echo "Missing nat6to4 helper. Build bpf nat6to4.o selftest first"
+       echo "Missing nat6to4 helper. Run 'make' first"
        exit -1
 fi
 
index c079565add39224eb99e011f941b6f0a11c1648c..9cd5e885e91f74b01007cf14bbdb9808aa04c632 100755 (executable)
@@ -1,7 +1,9 @@
 #!/bin/bash
 # SPDX-License-Identifier: GPL-2.0
 
-BPF_FILE="../bpf/xdp_dummy.bpf.o"
+source net_helper.sh
+
+BPF_FILE="xdp_dummy.o"
 readonly BASE="ns-$(mktemp -u XXXXXX)"
 readonly SRC=2
 readonly DST=1
@@ -37,6 +39,10 @@ create_ns() {
        for ns in $NS_SRC $NS_DST; do
                ip netns add $ns
                ip -n $ns link set dev lo up
+
+               # disable route solicitations to decrease 'noise' traffic
+               ip netns exec $ns sysctl -qw net.ipv6.conf.default.router_solicitations=0
+               ip netns exec $ns sysctl -qw net.ipv6.conf.all.router_solicitations=0
        done
 
        ip link add name veth$SRC type veth peer name veth$DST
@@ -78,6 +84,12 @@ create_vxlan_pair() {
                create_vxlan_endpoint $BASE$ns veth$ns $BM_NET_V6$((3 - $ns)) vxlan6$ns 6
                ip -n $BASE$ns addr add dev vxlan6$ns $OL_NET_V6$ns/24 nodad
        done
+
+       # preload the neighbour cache to avoid some noisy traffic
+       local addr_dst=$(ip -j -n $BASE$DST link show dev vxlan6$DST  |jq -r '.[]["address"]')
+       local addr_src=$(ip -j -n $BASE$SRC link show dev vxlan6$SRC  |jq -r '.[]["address"]')
+       ip -n $BASE$DST neigh add dev vxlan6$DST lladdr $addr_src $OL_NET_V6$SRC
+       ip -n $BASE$SRC neigh add dev vxlan6$SRC lladdr $addr_dst $OL_NET_V6$DST
 }
 
 is_ipv6() {
@@ -117,9 +129,9 @@ run_test() {
        # not enable GRO
        ip netns exec $NS_DST $ipt -A INPUT -p udp --dport 4789
        ip netns exec $NS_DST $ipt -A INPUT -p udp --dport 8000
-       ip netns exec $NS_DST ./udpgso_bench_rx -C 1000 -R 10 -n 10 -l 1300 $rx_args &
+       ip netns exec $NS_DST ./udpgso_bench_rx -C 2000 -R 100 -n 10 -l 1300 $rx_args &
        local spid=$!
-       sleep 0.1
+       wait_local_port_listen "$NS_DST" 8000 udp
        ip netns exec $NS_SRC ./udpgso_bench_tx $family -M 1 -s 13000 -S 1300 -D $dst
        local retc=$?
        wait $spid
@@ -166,9 +178,9 @@ run_bench() {
        # bind the sender and the receiver to different CPUs to try
        # get reproducible results
        ip netns exec $NS_DST bash -c "echo 2 > /sys/class/net/veth$DST/queues/rx-0/rps_cpus"
-       ip netns exec $NS_DST taskset 0x2 ./udpgso_bench_rx -C 1000 -R 10  &
+       ip netns exec $NS_DST taskset 0x2 ./udpgso_bench_rx -C 2000 -R 100  &
        local spid=$!
-       sleep 0.1
+       wait_local_port_listen "$NS_DST" 8000 udp
        ip netns exec $NS_SRC taskset 0x1 ./udpgso_bench_tx $family -l 3 -S 1300 -D $dst
        local retc=$?
        wait $spid
index f35a924d4a3030780447f2cc137f6ff373ed693c..1cbadd267c963c0c067308d3fb16493625e8f1b7 100644 (file)
@@ -375,7 +375,7 @@ static void do_recv(void)
                        do_flush_udp(fd);
 
                tnow = gettimeofday_ms();
-               if (tnow > treport) {
+               if (!cfg_expected_pkt_nr && tnow > treport) {
                        if (packets)
                                fprintf(stderr,
                                        "%s rx: %6lu MB/s %8lu calls/s\n",
index 2d073595c620210254bc372bc428b05121e9b26b..5ae85def07395b50c07600f4a31b7ff69578bb9f 100755 (executable)
@@ -1,7 +1,7 @@
 #!/bin/sh
 # SPDX-License-Identifier: GPL-2.0
 
-BPF_FILE="../bpf/xdp_dummy.bpf.o"
+BPF_FILE="xdp_dummy.o"
 readonly STATS="$(mktemp -p /tmp ns-XXXXXX)"
 readonly BASE=`basename $STATS`
 readonly SRC=2
@@ -218,7 +218,7 @@ while getopts "hs:" option; do
 done
 
 if [ ! -f ${BPF_FILE} ]; then
-       echo "Missing ${BPF_FILE}. Build bpf selftest first"
+       echo "Missing ${BPF_FILE}. Run 'make' first"
        exit 1
 fi
 
@@ -246,6 +246,20 @@ ip netns exec $NS_DST ethtool -K veth$DST rx-udp-gro-forwarding on
 chk_gro "        - aggregation with TSO off" 1
 cleanup
 
+create_ns
+ip -n $NS_DST link set dev veth$DST up
+ip -n $NS_DST link set dev veth$DST xdp object ${BPF_FILE} section xdp
+chk_gro_flag "gro vs xdp while down - gro flag on" $DST on
+ip -n $NS_DST link set dev veth$DST down
+chk_gro_flag "                      - after down" $DST on
+ip -n $NS_DST link set dev veth$DST xdp off
+chk_gro_flag "                      - after xdp off" $DST off
+ip -n $NS_DST link set dev veth$DST up
+chk_gro_flag "                      - after up" $DST off
+ip -n $NS_SRC link set dev veth$SRC xdp object ${BPF_FILE} section xdp
+chk_gro_flag "                      - after peer xdp" $DST off
+cleanup
+
 create_ns
 chk_channels "default channels" $DST 1 1
 
diff --git a/tools/testing/selftests/net/xdp_dummy.c b/tools/testing/selftests/net/xdp_dummy.c
new file mode 100644 (file)
index 0000000..d988b2e
--- /dev/null
@@ -0,0 +1,13 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define KBUILD_MODNAME "xdp_dummy"
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+SEC("xdp")
+int xdp_dummy_prog(struct xdp_md *ctx)
+{
+       return XDP_PASS;
+}
+
+char _license[] SEC("license") = "GPL";
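+
+/*
+ * Assumed build/attach recipe (mirroring how the scripts load it):
+ *
+ *   clang -O2 -target bpf -c xdp_dummy.c -o xdp_dummy.o
+ *   ip link set dev <dev> xdp object xdp_dummy.o section xdp
+ */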
index db27153eb4a02c1db3f0f9dc55445558fbb5d5ea..936c3085bb8373ea74036a6870cb67f5b103f0ae 100644 (file)
@@ -7,7 +7,8 @@ TEST_PROGS := nft_trans_stress.sh nft_fib.sh nft_nat.sh bridge_brouter.sh \
        nft_queue.sh nft_meta.sh nf_nat_edemux.sh \
        ipip-conntrack-mtu.sh conntrack_tcp_unreplied.sh \
        conntrack_vrf.sh nft_synproxy.sh rpath.sh nft_audit.sh \
-       conntrack_sctp_collision.sh xt_string.sh
+       conntrack_sctp_collision.sh xt_string.sh \
+       bridge_netfilter.sh
 
 HOSTPKG_CONFIG := pkg-config
 
diff --git a/tools/testing/selftests/netfilter/bridge_netfilter.sh b/tools/testing/selftests/netfilter/bridge_netfilter.sh
new file mode 100644 (file)
index 0000000..659b3ab
--- /dev/null
@@ -0,0 +1,188 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Test bridge netfilter + conntrack, a combination that doesn't really work,
+# with multicast/broadcast packets racing for hash table insertion.
+
+#           eth0    br0     eth0
+# setup is: ns1 <->,ns0 <-> ns3
+#           ns2 <-'    `'-> ns4
+
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+ret=0
+
+sfx=$(mktemp -u "XXXXXXXX")
+ns0="ns0-$sfx"
+ns1="ns1-$sfx"
+ns2="ns2-$sfx"
+ns3="ns3-$sfx"
+ns4="ns4-$sfx"
+
+ebtables -V > /dev/null 2>&1
+if [ $? -ne 0 ];then
+       echo "SKIP: Could not run test without ebtables"
+       exit $ksft_skip
+fi
+
+ip -Version > /dev/null 2>&1
+if [ $? -ne 0 ];then
+       echo "SKIP: Could not run test without ip tool"
+       exit $ksft_skip
+fi
+
+for i in $(seq 0 4); do
+  eval ip netns add \$ns$i
+done
+
+cleanup() {
+  for i in $(seq 0 4); do eval ip netns del \$ns$i;done
+}
+
+trap cleanup EXIT
+
+do_ping()
+{
+       fromns="$1"
+       dstip="$2"
+
+       ip netns exec $fromns ping -c 1 -q $dstip > /dev/null
+       if [ $? -ne 0 ]; then
+               echo "ERROR: ping from $fromns to $dstip"
+               ip netns exec ${ns0} nft list ruleset
+               ret=1
+       fi
+}
+
+bcast_ping()
+{
+       fromns="$1"
+       dstip="$2"
+
+       for i in $(seq 1 1000); do
+               ip netns exec $fromns ping -q -f -b -c 1 $dstip > /dev/null 2>&1
+               if [ $? -ne 0 ]; then
+                       echo "ERROR: ping -b from $fromns to $dstip"
+                       ip netns exec ${ns0} nft list ruleset
+               fi
+       done
+}
+
+ip link add veth1 netns ${ns0} type veth peer name eth0 netns ${ns1}
+if [ $? -ne 0 ]; then
+       echo "SKIP: Can't create veth device"
+       exit $ksft_skip
+fi
+
+ip link add veth2 netns ${ns0} type veth peer name eth0 netns $ns2
+ip link add veth3 netns ${ns0} type veth peer name eth0 netns $ns3
+ip link add veth4 netns ${ns0} type veth peer name eth0 netns $ns4
+
+ip -net ${ns0} link set lo up
+
+for i in $(seq 1 4); do
+  ip -net ${ns0} link set veth$i up
+done
+
+ip -net ${ns0} link add br0 type bridge stp_state 0 forward_delay 0 nf_call_iptables 1 nf_call_ip6tables 1 nf_call_arptables 1
+if [ $? -ne 0 ]; then
+       echo "SKIP: Can't create bridge br0"
+       exit $ksft_skip
+fi
+
+# make veth0,1,2 part of bridge.
+for i in $(seq 1 3); do
+  ip -net ${ns0} link set veth$i master br0
+done
+
+# add a macvlan on top of the bridge.
+MACVLAN_ADDR=ba:f3:13:37:42:23
+ip -net ${ns0} link add link br0 name macvlan0 type macvlan mode private
+ip -net ${ns0} link set macvlan0 address ${MACVLAN_ADDR}
+ip -net ${ns0} link set macvlan0 up
+ip -net ${ns0} addr add 10.23.0.1/24 dev macvlan0
+
+# add a macvlan on top of veth4.
+MACVLAN_ADDR=ba:f3:13:37:42:24
+ip -net ${ns0} link add link veth4 name macvlan4 type macvlan mode vepa
+ip -net ${ns0} link set macvlan4 address ${MACVLAN_ADDR}
+ip -net ${ns0} link set macvlan4 up
+
+# make the macvlan part of the bridge.
+# veth4 is not a bridge port, only the macvlan on top of it.
+ip -net ${ns0} link set macvlan4 master br0
+
+ip -net ${ns0} link set br0 up
+ip -net ${ns0} addr add 10.0.0.1/24 dev br0
+ip netns exec ${ns0} sysctl -q net.bridge.bridge-nf-call-iptables=1
+ret=$?
+if [ $ret -ne 0 ] ; then
+       echo "SKIP: bridge netfilter not available"
+       ret=$ksft_skip
+fi
+
+# for testing, so namespaces will reply to ping -b probes.
+ip netns exec ${ns0} sysctl -q net.ipv4.icmp_echo_ignore_broadcasts=0
+
+# enable conntrack in ns0 and drop broadcast packets in forward to
+# keep them from being confirmed in the postrouting hook before the
+# cloned skb is passed up the stack.
+ip netns exec ${ns0} nft -f - <<EOF
+table ip filter {
+       chain input {
+               type filter hook input priority 1; policy accept
+               iifname br0 counter
+               ct state new accept
+       }
+}
+
+table bridge filter {
+       chain forward {
+               type filter hook forward priority 0; policy accept
+               meta pkttype broadcast ip protocol icmp counter drop
+       }
+}
+EOF
+
+# place 1, 2 & 3 in same subnet, connected via ns0:br0.
+# ns4 is placed in same subnet as well, but its not
+# part of the bridge: the corresponding veth4 is not
+# part of the bridge, only its macvlan interface.
+for i in $(seq 1 4); do
+  eval ip -net \$ns$i link set lo up
+  eval ip -net \$ns$i link set eth0 up
+done
+for i in $(seq 1 2); do
+  eval ip -net \$ns$i addr add 10.0.0.1$i/24 dev eth0
+done
+
+ip -net ${ns3} addr add 10.23.0.13/24 dev eth0
+ip -net ${ns4} addr add 10.23.0.14/24 dev eth0
+
+# test basic connectivity
+do_ping ${ns1} 10.0.0.12
+do_ping ${ns3} 10.23.0.1
+do_ping ${ns4} 10.23.0.1
+
+if [ $ret -eq 0 ];then
+       echo "PASS: netns connectivity: ns1 can reach ns2, ns3 and ns4 can reach ns0"
+fi
+
+bcast_ping ${ns1} 10.0.0.255
+
+# This should deliver broadcast to macvlan0, which is on top of ns0:br0.
+bcast_ping ${ns3} 10.23.0.255
+
+# same, this time via veth4:macvlan4.
+bcast_ping ${ns4} 10.23.0.255
+
+read t < /proc/sys/kernel/tainted
+
+if [ $t -eq 0 ];then
+       echo PASS: kernel not tainted
+else
+       echo ERROR: kernel is tainted
+       ret=1
+fi
+
+exit $ret
index f18c6db13bbff402202f6bd4796b4581803e2d73..b11ea8ee67194604de4a7dcbda7539dbffe7b7a1 100644 (file)
@@ -13,7 +13,7 @@
 #include "../kselftest_harness.h"
 
 #define TEST_ZONE_ID 123
-#define CTA_FILTER_F_CTA_TUPLE_ZONE (1 << 2)
+#define NF_CT_DEFAULT_ZONE_ID 0
 
 static int reply_counter;
 
@@ -336,6 +336,9 @@ FIXTURE_SETUP(conntrack_dump_flush)
        ret = conntrack_data_generate_v4(self->sock, 0xf4f4f4f4, 0xf5f5f5f5,
                                         TEST_ZONE_ID + 2);
        EXPECT_EQ(ret, 0);
+       ret = conntrack_data_generate_v4(self->sock, 0xf6f6f6f6, 0xf7f7f7f7,
+                                        NF_CT_DEFAULT_ZONE_ID);
+       EXPECT_EQ(ret, 0);
 
        src = (struct in6_addr) {{
                .__u6_addr32 = {
@@ -395,6 +398,26 @@ FIXTURE_SETUP(conntrack_dump_flush)
                                         TEST_ZONE_ID + 2);
        EXPECT_EQ(ret, 0);
 
+       src = (struct in6_addr) {{
+               .__u6_addr32 = {
+                       0xb80d0120,
+                       0x00000000,
+                       0x00000000,
+                       0x07000000
+               }
+       }};
+       dst = (struct in6_addr) {{
+               .__u6_addr32 = {
+                       0xb80d0120,
+                       0x00000000,
+                       0x00000000,
+                       0x08000000
+               }
+       }};
+       ret = conntrack_data_generate_v6(self->sock, src, dst,
+                                        NF_CT_DEFAULT_ZONE_ID);
+       EXPECT_EQ(ret, 0);
+
        ret = conntracK_count_zone(self->sock, TEST_ZONE_ID);
        EXPECT_GE(ret, 2);
        if (ret > 2)
@@ -425,6 +448,24 @@ TEST_F(conntrack_dump_flush, test_flush_by_zone)
        EXPECT_EQ(ret, 2);
        ret = conntracK_count_zone(self->sock, TEST_ZONE_ID + 2);
        EXPECT_EQ(ret, 2);
+       ret = conntracK_count_zone(self->sock, NF_CT_DEFAULT_ZONE_ID);
+       EXPECT_EQ(ret, 2);
+}
+
+TEST_F(conntrack_dump_flush, test_flush_by_zone_default)
+{
+       int ret;
+
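+       /* Flushing the default zone must remove only its two entries (one
+        * IPv4, one IPv6) and leave TEST_ZONE_ID..TEST_ZONE_ID+2 untouched. */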
+       ret = conntrack_flush_zone(self->sock, NF_CT_DEFAULT_ZONE_ID);
+       EXPECT_EQ(ret, 0);
+       ret = conntracK_count_zone(self->sock, TEST_ZONE_ID);
+       EXPECT_EQ(ret, 2);
+       ret = conntracK_count_zone(self->sock, TEST_ZONE_ID + 1);
+       EXPECT_EQ(ret, 2);
+       ret = conntracK_count_zone(self->sock, TEST_ZONE_ID + 2);
+       EXPECT_EQ(ret, 2);
+       ret = conntracK_count_zone(self->sock, NF_CT_DEFAULT_ZONE_ID);
+       EXPECT_EQ(ret, 0);
 }
 
 TEST_HARNESS_MAIN
index 0930e2411dfb9a1337988e4cb815e14bfa4464b5..cd51d547b751db39547d4e25b8db44f25dd23780 100644 (file)
@@ -5,6 +5,7 @@
 #include <fcntl.h>
 #include <limits.h>
 #include <linux/types.h>
+#include <poll.h>
 #include <sched.h>
 #include <signal.h>
 #include <stdio.h>
@@ -129,6 +130,7 @@ FIXTURE(child)
         * When it is closed, the child will exit.
         */
        int sk;
+       bool ignore_child_result;
 };
 
 FIXTURE_SETUP(child)
@@ -165,10 +167,14 @@ FIXTURE_SETUP(child)
 
 FIXTURE_TEARDOWN(child)
 {
+       int ret;
+
        EXPECT_EQ(0, close(self->pidfd));
        EXPECT_EQ(0, close(self->sk));
 
-       EXPECT_EQ(0, wait_for_pid(self->pid));
+       ret = wait_for_pid(self->pid);
+       if (!self->ignore_child_result)
+               EXPECT_EQ(0, ret);
 }
 
 TEST_F(child, disable_ptrace)
@@ -235,6 +241,29 @@ TEST(flags_set)
        EXPECT_EQ(errno, EINVAL);
 }
 
+TEST_F(child, no_strange_EBADF)
+{
+       struct pollfd fds;
+
+       self->ignore_child_result = true;
+
+       fds.fd = self->pidfd;
+       fds.events = POLLIN;
+
+       ASSERT_EQ(kill(self->pid, SIGKILL), 0);
+       ASSERT_EQ(poll(&fds, 1, 5000), 1);
+
+       /*
+        * It used to be that pidfd_getfd() could race with the exiting thread
+        * between exit_files() and release_task(), and get a non-null task
+        * with a NULL files struct, and you'd get EBADF, which was slightly
+        * confusing.
+        */
+       errno = 0;
+       EXPECT_EQ(sys_pidfd_getfd(self->pidfd, self->remote_fd, 0), -1);
+       EXPECT_EQ(errno, ESRCH);
+}
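+
+/*
+ * sys_pidfd_getfd() is assumed to be the usual raw-syscall shim, since
+ * pidfd_getfd(2) has no libc wrapper here; roughly:
+ *
+ *   static int sys_pidfd_getfd(int pidfd, int fd, unsigned int flags)
+ *   {
+ *           return syscall(__NR_pidfd_getfd, pidfd, fd, flags);
+ *   }
+ */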
+
 #if __NR_pidfd_getfd == -1
 int main(void)
 {
diff --git a/tools/testing/selftests/power_supply/Makefile b/tools/testing/selftests/power_supply/Makefile
new file mode 100644 (file)
index 0000000..44f0658
--- /dev/null
@@ -0,0 +1,4 @@
+TEST_PROGS := test_power_supply_properties.sh
+TEST_FILES := helpers.sh
+
+include ../lib.mk
diff --git a/tools/testing/selftests/power_supply/helpers.sh b/tools/testing/selftests/power_supply/helpers.sh
new file mode 100644 (file)
index 0000000..1ec90d7
--- /dev/null
@@ -0,0 +1,178 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (c) 2022, 2024 Collabora Ltd
+SYSFS_SUPPLIES=/sys/class/power_supply
+
+calc() {
+       awk "BEGIN { print $* }";
+}
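+# e.g. "calc 3500000/1000000" prints 3.5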
+
+test_sysfs_prop() {
+       PROP="$1"
+       VALUE="$2" # optional
+
+       PROP_PATH="$SYSFS_SUPPLIES"/"$DEVNAME"/"$PROP"
+       TEST_NAME="$DEVNAME".sysfs."$PROP"
+
+       if [ -z "$VALUE" ]; then
+               ktap_test_result "$TEST_NAME" [ -f "$PROP_PATH" ]
+       else
+               ktap_test_result "$TEST_NAME" grep -q "$VALUE" "$PROP_PATH"
+       fi
+}
+
+to_human_readable_unit() {
+       VALUE="$1"
+       UNIT="$2"
+
+       case "$VALUE" in
+               *[!0-9]* ) return ;; # Not a number
+       esac
+
+       if [ "$UNIT" = "uA" ]; then
+               new_unit="mA"
+               div=1000
+       elif [ "$UNIT" = "uV" ]; then
+               new_unit="V"
+               div=1000000
+       elif [ "$UNIT" = "uAh" ]; then
+               new_unit="Ah"
+               div=1000000
+       elif [ "$UNIT" = "uW" ]; then
+               new_unit="mW"
+               div=1000
+       elif [ "$UNIT" = "uWh" ]; then
+               new_unit="Wh"
+               div=1000000
+       else
+               return
+       fi
+
+       value_converted=$(calc "$VALUE"/"$div")
+       echo "$value_converted" "$new_unit"
+}
+
+_check_sysfs_prop_available() {
+       PROP=$1
+
+       PROP_PATH="$SYSFS_SUPPLIES"/"$DEVNAME"/"$PROP"
+       TEST_NAME="$DEVNAME".sysfs."$PROP"
+
+       if [ ! -e "$PROP_PATH" ] ; then
+               ktap_test_skip "$TEST_NAME"
+               return 1
+       fi
+
+       if ! cat "$PROP_PATH" >/dev/null; then
+               ktap_print_msg "Failed to read"
+               ktap_test_fail "$TEST_NAME"
+               return 1
+       fi
+
+       return 0
+}
+
+test_sysfs_prop_optional() {
+       PROP=$1
+       UNIT=$2 # optional
+
+       TEST_NAME="$DEVNAME".sysfs."$PROP"
+
+       _check_sysfs_prop_available "$PROP" || return
+       DATA=$(cat "$SYSFS_SUPPLIES"/"$DEVNAME"/"$PROP")
+
+       ktap_print_msg "Reported: '$DATA' $UNIT ($(to_human_readable_unit "$DATA" "$UNIT"))"
+       ktap_test_pass "$TEST_NAME"
+}
+
+test_sysfs_prop_optional_range() {
+       PROP=$1
+       MIN=$2
+       MAX=$3
+       UNIT=$4 # optional
+
+       TEST_NAME="$DEVNAME".sysfs."$PROP"
+
+       _check_sysfs_prop_available "$PROP" || return
+       DATA=$(cat "$SYSFS_SUPPLIES"/"$DEVNAME"/"$PROP")
+
+       if [ "$DATA" -lt "$MIN" ] || [ "$DATA" -gt "$MAX" ]; then
+               ktap_print_msg "'$DATA' is out of range (min=$MIN, max=$MAX)"
+               ktap_test_fail "$TEST_NAME"
+       else
+               ktap_print_msg "Reported: '$DATA' $UNIT ($(to_human_readable_unit "$DATA" "$UNIT"))"
+               ktap_test_pass "$TEST_NAME"
+       fi
+}
+
+test_sysfs_prop_optional_list() {
+       PROP=$1
+       LIST=$2
+
+       TEST_NAME="$DEVNAME".sysfs."$PROP"
+
+       _check_sysfs_prop_available "$PROP" || return
+       DATA=$(cat "$SYSFS_SUPPLIES"/"$DEVNAME"/"$PROP")
+
+       valid=0
+
+       OLDIFS=$IFS
+       IFS=","
+       for item in $LIST; do
+               if [ "$DATA" = "$item" ]; then
+                       valid=1
+                       break
+               fi
+       done
+       if [ "$valid" -eq 1 ]; then
+               ktap_print_msg "Reported: '$DATA'"
+               ktap_test_pass "$TEST_NAME"
+       else
+               ktap_print_msg "'$DATA' is not a valid value for this property"
+               ktap_test_fail "$TEST_NAME"
+       fi
+       IFS=$OLDIFS
+}
+
+dump_file() {
+       FILE="$1"
+       while read -r line; do
+               ktap_print_msg "$line"
+       done < "$FILE"
+}
+
+__test_uevent_prop() {
+       PROP="$1"
+       OPTIONAL="$2"
+       VALUE="$3" # optional
+
+       UEVENT_PATH="$SYSFS_SUPPLIES"/"$DEVNAME"/uevent
+       TEST_NAME="$DEVNAME".uevent."$PROP"
+
+       if ! grep -q "POWER_SUPPLY_$PROP=" "$UEVENT_PATH"; then
+               if [ "$OPTIONAL" -eq 1 ]; then
+                       ktap_test_skip "$TEST_NAME"
+               else
+                       ktap_print_msg "Missing property"
+                       ktap_test_fail "$TEST_NAME"
+               fi
+               return
+       fi
+
+       if ! grep -q "POWER_SUPPLY_$PROP=$VALUE" "$UEVENT_PATH"; then
+               ktap_print_msg "Invalid value for uevent property, dumping..."
+               dump_file "$UEVENT_PATH"
+               ktap_test_fail "$TEST_NAME"
+       else
+               ktap_test_pass "$TEST_NAME"
+       fi
+}
+
+test_uevent_prop() {
+       __test_uevent_prop "$1" 0 "$2"
+}
+
+test_uevent_prop_optional() {
+       __test_uevent_prop "$1" 1 "$2"
+}
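+
+# e.g. test_uevent_prop NAME "$DEVNAME" checks that the uevent file
+# contains "POWER_SUPPLY_NAME=<devname>"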
diff --git a/tools/testing/selftests/power_supply/test_power_supply_properties.sh b/tools/testing/selftests/power_supply/test_power_supply_properties.sh
new file mode 100755 (executable)
index 0000000..df272df
--- /dev/null
@@ -0,0 +1,114 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (c) 2022, 2024 Collabora Ltd
+#
+# This test validates the power supply uAPI: namely, the files in sysfs and
+# lines in uevent that expose the power supply properties.
+#
+# By default all power supplies available are tested. Optionally the name of a
+# power supply can be passed as a parameter to test only that one instead.
+DIR="$(dirname "$(readlink -f "$0")")"
+
+. "${DIR}"/../kselftest/ktap_helpers.sh
+
+. "${DIR}"/helpers.sh
+
+count_tests() {
+       SUPPLIES=$1
+
+       # This needs to be updated every time a new test is added.
+       NUM_TESTS=33
+
+       total_tests=0
+
+       for i in $SUPPLIES; do
+               total_tests=$(("$total_tests" + "$NUM_TESTS"))
+       done
+
+       echo "$total_tests"
+}
+
+ktap_print_header
+
+SYSFS_SUPPLIES=/sys/class/power_supply/
+
+if [ $# -eq 0 ]; then
+       supplies=$(ls "$SYSFS_SUPPLIES")
+else
+       supplies=$1
+fi
+
+ktap_set_plan "$(count_tests "$supplies")"
+
+for DEVNAME in $supplies; do
+       ktap_print_msg Testing device "$DEVNAME"
+
+       if [ ! -d "$SYSFS_SUPPLIES"/"$DEVNAME" ]; then
+               ktap_test_fail "$DEVNAME".exists
+               ktap_exit_fail_msg Device does not exist
+       fi
+
+       ktap_test_pass "$DEVNAME".exists
+
+       test_uevent_prop NAME "$DEVNAME"
+
+       test_sysfs_prop type
+       SUPPLY_TYPE=$(cat "$SYSFS_SUPPLIES"/"$DEVNAME"/type)
+       # This fails on kernels < 5.8 (needs 2ad3d74e3c69f)
+       test_uevent_prop TYPE "$SUPPLY_TYPE"
+
+       test_sysfs_prop_optional usb_type
+
+       test_sysfs_prop_optional_range online 0 2
+       test_sysfs_prop_optional_range present 0 1
+
+       test_sysfs_prop_optional_list status "Unknown","Charging","Discharging","Not charging","Full"
+
+       # Capacity is reported as a percentage, so any value less than 0 or
+       # greater than 100 is not allowed.
+       test_sysfs_prop_optional_range capacity 0 100 "%"
+
+       test_sysfs_prop_optional_list capacity_level "Unknown","Critical","Low","Normal","High","Full"
+
+       test_sysfs_prop_optional model_name
+       test_sysfs_prop_optional manufacturer
+       test_sysfs_prop_optional serial_number
+       test_sysfs_prop_optional_list technology "Unknown","NiMH","Li-ion","Li-poly","LiFe","NiCd","LiMn"
+
+       test_sysfs_prop_optional cycle_count
+
+       test_sysfs_prop_optional_list scope "Unknown","System","Device"
+
+       test_sysfs_prop_optional input_current_limit "uA"
+       test_sysfs_prop_optional input_voltage_limit "uV"
+
+       # Technically the power-supply class does not limit reported values.
+       # E.g. one could expose an RTC backup battery that goes below 1.5V,
+       # or an electric vehicle battery with over 300V. But most devices do not
+       # have a step-up capable regulator behind the battery and operate with
+       # voltages considered safe to touch, so we limit the allowed range to
+       # 1.8V-60V to catch drivers reporting incorrectly scaled values. E.g. a
+       # common mistake is reporting data in mV instead of µV.
+       test_sysfs_prop_optional_range voltage_now 1800000 60000000 "uV"
+       test_sysfs_prop_optional_range voltage_min 1800000 60000000 "uV"
+       test_sysfs_prop_optional_range voltage_max 1800000 60000000 "uV"
+       test_sysfs_prop_optional_range voltage_min_design 1800000 60000000 "uV"
+       test_sysfs_prop_optional_range voltage_max_design 1800000 60000000 "uV"
+
+       # current based systems
+       test_sysfs_prop_optional current_now "uA"
+       test_sysfs_prop_optional current_max "uA"
+       test_sysfs_prop_optional charge_now "uAh"
+       test_sysfs_prop_optional charge_full "uAh"
+       test_sysfs_prop_optional charge_full_design "uAh"
+
+       # power based systems
+       test_sysfs_prop_optional power_now "uW"
+       test_sysfs_prop_optional energy_now "uWh"
+       test_sysfs_prop_optional energy_full "uWh"
+       test_sysfs_prop_optional energy_full_design "uWh"
+       test_sysfs_prop_optional energy_full_design "uWh"
+done
+
+ktap_finished
index 7b1addd504209fadb59f908a843b23a9d0218f3f..8a64f63e37ce215e4aeff2675b7114704075ae48 100644 (file)
@@ -18,6 +18,7 @@
 #include <pthread.h>
 
 #include "utils.h"
+#include "fpu.h"
 
 /* Number of times each thread should receive the signal */
 #define ITERATIONS 10
@@ -27,9 +28,7 @@
  */
 #define THREAD_FACTOR 8
 
-__thread double darray[] = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
-                    1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0,
-                    2.1};
+__thread double darray[32];
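+/* One slot per FP register (f0..f31); f30/f31 serve as scratch regs */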
 
 bool bad_context;
 int threads_starting;
@@ -43,9 +42,9 @@ void signal_fpu_sig(int sig, siginfo_t *info, void *context)
        ucontext_t *uc = context;
        mcontext_t *mc = &uc->uc_mcontext;
 
-       /* Only the non volatiles were loaded up */
-       for (i = 14; i < 32; i++) {
-               if (mc->fp_regs[i] != darray[i - 14]) {
+       // Don't check f30/f31, they're used as scratches in check_all_fprs()
+       for (i = 0; i < 30; i++) {
+               if (mc->fp_regs[i] != darray[i]) {
                        bad_context = true;
                        break;
                }
@@ -54,7 +53,6 @@ void signal_fpu_sig(int sig, siginfo_t *info, void *context)
 
 void *signal_fpu_c(void *p)
 {
-       int i;
        long rc;
        struct sigaction act;
        act.sa_sigaction = signal_fpu_sig;
@@ -64,9 +62,7 @@ void *signal_fpu_c(void *p)
                return p;
 
        srand(pthread_self());
-       for (i = 0; i < 21; i++)
-               darray[i] = rand();
-
+       randomise_darray(darray, ARRAY_SIZE(darray));
        rc = preempt_fpu(darray, &threads_starting, &running);
 
        return (void *) rc;
index 98cbb9109ee6e8e1e6047c1640c8cb0f4a2b5625..505294da1b9fb5e7bd07aac4a119164900c8f2e6 100644 (file)
@@ -263,10 +263,10 @@ static int papr_vpd_system_loc_code(void)
        off_t size;
        int fd;
 
-       SKIP_IF_MSG(get_system_loc_code(&lc),
-                   "Cannot determine system location code");
        SKIP_IF_MSG(devfd < 0 && errno == ENOENT,
                    DEVPATH " not present");
+       SKIP_IF_MSG(get_system_loc_code(&lc),
+                   "Cannot determine system location code");
 
        FAIL_IF(devfd < 0);
 
index bcbca356d56a8d7ddd449af2bc0b01a69e960624..1b339d6bbff1c945fdcfac6acafd908ae1bd96c8 100644 (file)
 #include <stdint.h>
 #include "resctrl.h"
 
-struct read_format {
-       __u64 nr;                       /* The number of events */
-       struct {
-               __u64 value;            /* The value of the event */
-       } values[2];
-};
-
-static struct perf_event_attr pea_llc_miss;
-static struct read_format rf_cqm;
-static int fd_lm;
 char llc_occup_path[1024];
 
-static void initialize_perf_event_attr(void)
+void perf_event_attr_initialize(struct perf_event_attr *pea, __u64 config)
 {
-       pea_llc_miss.type = PERF_TYPE_HARDWARE;
-       pea_llc_miss.size = sizeof(struct perf_event_attr);
-       pea_llc_miss.read_format = PERF_FORMAT_GROUP;
-       pea_llc_miss.exclude_kernel = 1;
-       pea_llc_miss.exclude_hv = 1;
-       pea_llc_miss.exclude_idle = 1;
-       pea_llc_miss.exclude_callchain_kernel = 1;
-       pea_llc_miss.inherit = 1;
-       pea_llc_miss.exclude_guest = 1;
-       pea_llc_miss.disabled = 1;
-}
-
-static void ioctl_perf_event_ioc_reset_enable(void)
-{
-       ioctl(fd_lm, PERF_EVENT_IOC_RESET, 0);
-       ioctl(fd_lm, PERF_EVENT_IOC_ENABLE, 0);
-}
-
-static int perf_event_open_llc_miss(pid_t pid, int cpu_no)
-{
-       fd_lm = perf_event_open(&pea_llc_miss, pid, cpu_no, -1,
-                               PERF_FLAG_FD_CLOEXEC);
-       if (fd_lm == -1) {
-               perror("Error opening leader");
-               ctrlc_handler(0, NULL, NULL);
-               return -1;
-       }
-
-       return 0;
-}
-
-static void initialize_llc_perf(void)
-{
-       memset(&pea_llc_miss, 0, sizeof(struct perf_event_attr));
-       memset(&rf_cqm, 0, sizeof(struct read_format));
-
-       /* Initialize perf_event_attr structures for HW_CACHE_MISSES */
-       initialize_perf_event_attr();
-
-       pea_llc_miss.config = PERF_COUNT_HW_CACHE_MISSES;
-
-       rf_cqm.nr = 1;
+       memset(pea, 0, sizeof(*pea));
+       pea->type = PERF_TYPE_HARDWARE;
+       pea->size = sizeof(*pea);
+       pea->read_format = PERF_FORMAT_GROUP;
+       pea->exclude_kernel = 1;
+       pea->exclude_hv = 1;
+       pea->exclude_idle = 1;
+       pea->exclude_callchain_kernel = 1;
+       pea->inherit = 1;
+       pea->exclude_guest = 1;
+       pea->disabled = 1;
+       pea->config = config;
 }
 
-static int reset_enable_llc_perf(pid_t pid, int cpu_no)
+/* Start counters to log values */
+int perf_event_reset_enable(int pe_fd)
 {
-       int ret = 0;
+       int ret;
 
-       ret = perf_event_open_llc_miss(pid, cpu_no);
+       ret = ioctl(pe_fd, PERF_EVENT_IOC_RESET, 0);
        if (ret < 0)
                return ret;
 
-       /* Start counters to log values */
-       ioctl_perf_event_ioc_reset_enable();
+       ret = ioctl(pe_fd, PERF_EVENT_IOC_ENABLE, 0);
+       if (ret < 0)
+               return ret;
 
        return 0;
 }
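+
+/*
+ * Sketch of the intended call order for these helpers:
+ * perf_event_attr_initialize() with e.g. PERF_COUNT_HW_CACHE_MISSES,
+ * then perf_open() against the benchmark pid (which also resets and
+ * enables the counter), run the workload, and finally
+ * perf_event_measure() disables the event, reads it and logs the value.
+ */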
 
-/*
- * get_llc_perf:       llc cache miss through perf events
- * @llc_perf_miss:     LLC miss counter that is filled on success
- *
- * Perf events like HW_CACHE_MISSES could be used to validate number of
- * cache lines allocated.
- *
- * Return: =0 on success.  <0 on failure.
- */
-static int get_llc_perf(unsigned long *llc_perf_miss)
+void perf_event_initialize_read_format(struct perf_event_read *pe_read)
 {
-       __u64 total_misses;
-       int ret;
-
-       /* Stop counters after one span to get miss rate */
+       memset(pe_read, 0, sizeof(*pe_read));
+       pe_read->nr = 1;
+}
 
-       ioctl(fd_lm, PERF_EVENT_IOC_DISABLE, 0);
+int perf_open(struct perf_event_attr *pea, pid_t pid, int cpu_no)
+{
+       int pe_fd;
 
-       ret = read(fd_lm, &rf_cqm, sizeof(struct read_format));
-       if (ret == -1) {
-               perror("Could not get llc misses through perf");
+       pe_fd = perf_event_open(pea, pid, cpu_no, -1, PERF_FLAG_FD_CLOEXEC);
+       if (pe_fd == -1) {
+               ksft_perror("Error opening leader");
                return -1;
        }
 
-       total_misses = rf_cqm.values[0].value;
-       *llc_perf_miss = total_misses;
+       perf_event_reset_enable(pe_fd);
 
-       return 0;
+       return pe_fd;
 }
 
 /*
@@ -124,12 +77,12 @@ static int get_llc_occu_resctrl(unsigned long *llc_occupancy)
 
        fp = fopen(llc_occup_path, "r");
        if (!fp) {
-               perror("Failed to open results file");
+               ksft_perror("Failed to open results file");
 
-               return errno;
+               return -1;
        }
        if (fscanf(fp, "%lu", llc_occupancy) <= 0) {
-               perror("Could not get llc occupancy");
+               ksft_perror("Could not get llc occupancy");
                fclose(fp);
 
                return -1;
@@ -146,163 +99,91 @@ static int get_llc_occu_resctrl(unsigned long *llc_occupancy)
  * @llc_value:         perf miss value /
  *                     llc occupancy value reported by resctrl FS
  *
- * Return:             0 on success. non-zero on failure.
+ * Return:             0 on success, < 0 on error.
  */
-static int print_results_cache(char *filename, int bm_pid,
-                              unsigned long llc_value)
+static int print_results_cache(const char *filename, int bm_pid, __u64 llc_value)
 {
        FILE *fp;
 
        if (strcmp(filename, "stdio") == 0 || strcmp(filename, "stderr") == 0) {
-               printf("Pid: %d \t LLC_value: %lu\n", bm_pid,
-                      llc_value);
+               printf("Pid: %d \t LLC_value: %llu\n", bm_pid, llc_value);
        } else {
                fp = fopen(filename, "a");
                if (!fp) {
-                       perror("Cannot open results file");
+                       ksft_perror("Cannot open results file");
 
-                       return errno;
+                       return -1;
                }
-               fprintf(fp, "Pid: %d \t llc_value: %lu\n", bm_pid, llc_value);
+               fprintf(fp, "Pid: %d \t llc_value: %llu\n", bm_pid, llc_value);
                fclose(fp);
        }
 
        return 0;
 }
 
-int measure_cache_vals(struct resctrl_val_param *param, int bm_pid)
+/*
+ * perf_event_measure - Measure perf events
+ * @filename:  Filename for writing the results
+ * @bm_pid:    PID that runs the benchmark
+ *
+ * Measures perf events (e.g., cache misses) and writes the results into
+ * @filename. @bm_pid is written to the results file along with the measured
+ * value.
+ *
+ * Return: =0 on success. <0 on failure.
+ */
+int perf_event_measure(int pe_fd, struct perf_event_read *pe_read,
+                      const char *filename, int bm_pid)
 {
-       unsigned long llc_perf_miss = 0, llc_occu_resc = 0, llc_value = 0;
        int ret;
 
-       /*
-        * Measure cache miss from perf.
-        */
-       if (!strncmp(param->resctrl_val, CAT_STR, sizeof(CAT_STR))) {
-               ret = get_llc_perf(&llc_perf_miss);
-               if (ret < 0)
-                       return ret;
-               llc_value = llc_perf_miss;
-       }
+       /* Stop counters after one span to get miss rate */
+       ret = ioctl(pe_fd, PERF_EVENT_IOC_DISABLE, 0);
+       if (ret < 0)
+               return ret;
 
-       /*
-        * Measure llc occupancy from resctrl.
-        */
-       if (!strncmp(param->resctrl_val, CMT_STR, sizeof(CMT_STR))) {
-               ret = get_llc_occu_resctrl(&llc_occu_resc);
-               if (ret < 0)
-                       return ret;
-               llc_value = llc_occu_resc;
+       ret = read(pe_fd, pe_read, sizeof(*pe_read));
+       if (ret == -1) {
+               ksft_perror("Could not get perf value");
+               return -1;
        }
-       ret = print_results_cache(param->filename, bm_pid, llc_value);
-       if (ret)
-               return ret;
 
-       return 0;
+       return print_results_cache(filename, bm_pid, pe_read->values[0].value);
 }
 
 /*
- * cache_val:          execute benchmark and measure LLC occupancy resctrl
- * and perf cache miss for the benchmark
- * @param:             parameters passed to cache_val()
- * @span:              buffer size for the benchmark
+ * measure_llc_resctrl - Measure resctrl LLC value from resctrl
+ * @filename:  Filename for writing the results
+ * @bm_pid:    PID that runs the benchmark
  *
- * Return:             0 on success. non-zero on failure.
+ * Measures LLC occupancy from resctrl and writes the results into @filename.
+ * @bm_pid is written to the results file along with the measured value.
+ *
+ * Return: =0 on success. <0 on failure.
  */
-int cat_val(struct resctrl_val_param *param, size_t span)
+int measure_llc_resctrl(const char *filename, int bm_pid)
 {
-       int memflush = 1, operation = 0, ret = 0;
-       char *resctrl_val = param->resctrl_val;
-       pid_t bm_pid;
-
-       if (strcmp(param->filename, "") == 0)
-               sprintf(param->filename, "stdio");
-
-       bm_pid = getpid();
-
-       /* Taskset benchmark to specified cpu */
-       ret = taskset_benchmark(bm_pid, param->cpu_no);
-       if (ret)
-               return ret;
+       unsigned long llc_occu_resc = 0;
+       int ret;
 
-       /* Write benchmark to specified con_mon grp, mon_grp in resctrl FS*/
-       ret = write_bm_pid_to_resctrl(bm_pid, param->ctrlgrp, param->mongrp,
-                                     resctrl_val);
-       if (ret)
+       ret = get_llc_occu_resctrl(&llc_occu_resc);
+       if (ret < 0)
                return ret;
 
-       initialize_llc_perf();
-
-       /* Test runs until the callback setup() tells the test to stop. */
-       while (1) {
-               ret = param->setup(param);
-               if (ret == END_OF_TESTS) {
-                       ret = 0;
-                       break;
-               }
-               if (ret < 0)
-                       break;
-               ret = reset_enable_llc_perf(bm_pid, param->cpu_no);
-               if (ret)
-                       break;
-
-               if (run_fill_buf(span, memflush, operation, true)) {
-                       fprintf(stderr, "Error-running fill buffer\n");
-                       ret = -1;
-                       goto pe_close;
-               }
-
-               sleep(1);
-               ret = measure_cache_vals(param, bm_pid);
-               if (ret)
-                       goto pe_close;
-       }
-
-       return ret;
-
-pe_close:
-       close(fd_lm);
-       return ret;
+       return print_results_cache(filename, bm_pid, llc_occu_resc);
 }
 
 /*
- * show_cache_info:    show cache test result information
- * @sum_llc_val:       sum of LLC cache result data
- * @no_of_bits:                number of bits
- * @cache_span:                cache span in bytes for CMT or in lines for CAT
- * @max_diff:          max difference
- * @max_diff_percent:  max difference percentage
- * @num_of_runs:       number of runs
- * @platform:          show test information on this platform
- * @cmt:               CMT test or CAT test
- *
- * Return:             0 on success. non-zero on failure.
+ * show_cache_info - Show generic cache test information
+ * @no_of_bits:                Number of bits
+ * @avg_llc_val:       Average of LLC cache result data
+ * @cache_span:                Cache span
+ * @lines:             @cache_span in lines or bytes
  */
-int show_cache_info(unsigned long sum_llc_val, int no_of_bits,
-                   size_t cache_span, unsigned long max_diff,
-                   unsigned long max_diff_percent, unsigned long num_of_runs,
-                   bool platform, bool cmt)
+void show_cache_info(int no_of_bits, __u64 avg_llc_val, size_t cache_span, bool lines)
 {
-       unsigned long avg_llc_val = 0;
-       float diff_percent;
-       long avg_diff = 0;
-       int ret;
-
-       avg_llc_val = sum_llc_val / num_of_runs;
-       avg_diff = (long)abs(cache_span - avg_llc_val);
-       diff_percent = ((float)cache_span - avg_llc_val) / cache_span * 100;
-
-       ret = platform && abs((int)diff_percent) > max_diff_percent &&
-             (cmt ? (abs(avg_diff) > max_diff) : true);
-
-       ksft_print_msg("%s Check cache miss rate within %lu%%\n",
-                      ret ? "Fail:" : "Pass:", max_diff_percent);
-
-       ksft_print_msg("Percent diff=%d\n", abs((int)diff_percent));
        ksft_print_msg("Number of bits: %d\n", no_of_bits);
-       ksft_print_msg("Average LLC val: %lu\n", avg_llc_val);
-       ksft_print_msg("Cache span (%s): %zu\n", cmt ? "bytes" : "lines",
+       ksft_print_msg("Average LLC val: %llu\n", avg_llc_val);
+       ksft_print_msg("Cache span (%s): %zu\n", lines ? "lines" : "bytes",
                       cache_span);
-
-       return ret;
 }
index 224ba8544d8afc88910b01e77e77a35c8e2cfdc2..4cb991be8e31be37032d9811a070fc68dce27f66 100644 (file)
 #include "resctrl.h"
 #include <unistd.h>
 
-#define RESULT_FILE_NAME1      "result_cat1"
-#define RESULT_FILE_NAME2      "result_cat2"
+#define RESULT_FILE_NAME       "result_cat"
 #define NUM_OF_RUNS            5
-#define MAX_DIFF_PERCENT       4
-#define MAX_DIFF               1000000
 
 /*
- * Change schemata. Write schemata to specified
- * con_mon grp, mon_grp in resctrl FS.
- * Run 5 times in order to get average values.
+ * The minimum difference in LLC misses between a test with n+1 CBM bits and
+ * a test with n bits is MIN_DIFF_PERCENT_PER_BIT * (n - 1). With e.g. 5 vs 4
+ * bits in the CBM mask, the minimum difference must be at least
+ * MIN_DIFF_PERCENT_PER_BIT * (4 - 1) = 3 percent.
+ *
+ * The relationship between the number of used CBM bits and the difference in
+ * LLC misses is not expected to be linear. With a small number of bits, the
+ * margin is smaller than with a larger number of bits. For selftest purposes,
+ * however, a linear approach is enough because ultimately only a pass/fail
+ * decision has to be made and the distinction between a strong and a stronger
+ * signal is irrelevant.
  */
-static int cat_setup(struct resctrl_val_param *p)
+#define MIN_DIFF_PERCENT_PER_BIT       1UL
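
For illustration, a minimal standalone sketch (not from the patch) of how this
per-bit threshold scales; it only reuses the MIN_DIFF_PERCENT_PER_BIT
definition above:

#include <stdio.h>

#define MIN_DIFF_PERCENT_PER_BIT        1UL

int main(void)
{
        int bits;

        /* n+1 bits vs n bits must differ by at least (n - 1) percent */
        for (bits = 2; bits <= 8; bits++)
                printf("%d vs %d bits: min diff %lu%%\n", bits + 1, bits,
                       MIN_DIFF_PERCENT_PER_BIT * (bits - 1UL));
        return 0;
}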
+
+static int show_results_info(__u64 sum_llc_val, int no_of_bits,
+                            unsigned long cache_span,
+                            unsigned long min_diff_percent,
+                            unsigned long num_of_runs, bool platform,
+                            __s64 *prev_avg_llc_val)
 {
-       char schemata[64];
+       __u64 avg_llc_val = 0;
+       float avg_diff;
        int ret = 0;
 
-       /* Run NUM_OF_RUNS times */
-       if (p->num_of_runs >= NUM_OF_RUNS)
-               return END_OF_TESTS;
+       avg_llc_val = sum_llc_val / num_of_runs;
+       if (*prev_avg_llc_val) {
+               float delta = (__s64)(avg_llc_val - *prev_avg_llc_val);
+
+               avg_diff = delta / *prev_avg_llc_val;
+               ret = platform && (avg_diff * 100) < (float)min_diff_percent;
+
+               ksft_print_msg("%s Check cache miss rate changed more than %.1f%%\n",
+                              ret ? "Fail:" : "Pass:", (float)min_diff_percent);
 
-       if (p->num_of_runs == 0) {
-               sprintf(schemata, "%lx", p->mask);
-               ret = write_schemata(p->ctrlgrp, schemata, p->cpu_no,
-                                    p->resctrl_val);
+               ksft_print_msg("Percent diff=%.1f\n", avg_diff * 100);
        }
-       p->num_of_runs++;
+       *prev_avg_llc_val = avg_llc_val;
+
+       show_cache_info(no_of_bits, avg_llc_val, cache_span, true);
 
        return ret;
 }
 
-static int check_results(struct resctrl_val_param *param, size_t span)
+/* Remove the highest bit from CBM */
+static unsigned long next_mask(unsigned long current_mask)
+{
+       return current_mask & (current_mask >> 1);
+}
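
A quick illustrative check (not part of the patch) of how next_mask() behaves
on a contiguous CBM, dropping the highest bit each step until the mask is
empty:

#include <assert.h>

static unsigned long next_mask(unsigned long current_mask)
{
        return current_mask & (current_mask >> 1);
}

int main(void)
{
        /* 0xf0 -> 0x70 -> 0x30 -> 0x10 -> 0 */
        assert(next_mask(0xf0) == 0x70);
        assert(next_mask(0x70) == 0x30);
        assert(next_mask(0x30) == 0x10);
        assert(next_mask(0x10) == 0);
        return 0;
}

Note that the AND-with-shift only equals "remove the highest bit" for
contiguous masks, which is what the CAT test iterates over.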
+
+static int check_results(struct resctrl_val_param *param, const char *cache_type,
+                        unsigned long cache_total_size, unsigned long full_cache_mask,
+                        unsigned long current_mask)
 {
        char *token_array[8], temp[512];
-       unsigned long sum_llc_perf_miss = 0;
-       int runs = 0, no_of_bits = 0;
+       __u64 sum_llc_perf_miss = 0;
+       __s64 prev_avg_llc_val = 0;
+       unsigned long alloc_size;
+       int runs = 0;
+       int fail = 0;
+       int ret;
        FILE *fp;
 
        ksft_print_msg("Checking for pass/fail\n");
        fp = fopen(param->filename, "r");
        if (!fp) {
-               perror("# Cannot open file");
+               ksft_perror("Cannot open file");
 
-               return errno;
+               return -1;
        }
 
        while (fgets(temp, sizeof(temp), fp)) {
                char *token = strtok(temp, ":\t");
                int fields = 0;
+               int bits;
 
                while (token) {
                        token_array[fields++] = token;
                        token = strtok(NULL, ":\t");
                }
-               /*
-                * Discard the first value which is inaccurate due to monitoring
-                * setup transition phase.
-                */
-               if (runs > 0)
-                       sum_llc_perf_miss += strtoul(token_array[3], NULL, 0);
+
+               sum_llc_perf_miss += strtoull(token_array[3], NULL, 0);
                runs++;
+
+               if (runs < NUM_OF_RUNS)
+                       continue;
+
+               if (!current_mask) {
+                       ksft_print_msg("Unexpected empty cache mask\n");
+                       break;
+               }
+
+               alloc_size = cache_portion_size(cache_total_size, current_mask, full_cache_mask);
+
+               bits = count_bits(current_mask);
+
+               ret = show_results_info(sum_llc_perf_miss, bits,
+                                       alloc_size / 64,
+                                       MIN_DIFF_PERCENT_PER_BIT * (bits - 1),
+                                       runs, get_vendor() == ARCH_INTEL,
+                                       &prev_avg_llc_val);
+               if (ret)
+                       fail = 1;
+
+               runs = 0;
+               sum_llc_perf_miss = 0;
+               current_mask = next_mask(current_mask);
        }
 
        fclose(fp);
-       no_of_bits = count_bits(param->mask);
 
-       return show_cache_info(sum_llc_perf_miss, no_of_bits, span / 64,
-                              MAX_DIFF, MAX_DIFF_PERCENT, runs - 1,
-                              get_vendor() == ARCH_INTEL, false);
+       return fail;
 }
 
 void cat_test_cleanup(void)
 {
-       remove(RESULT_FILE_NAME1);
-       remove(RESULT_FILE_NAME2);
+       remove(RESULT_FILE_NAME);
 }
 
-int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
+/*
+ * cat_test - Execute CAT benchmark and measure cache misses
+ * @test:              Test information structure
+ * @uparams:           User supplied parameters
+ * @param:             Parameters passed to cat_test()
+ * @span:              Buffer size for the benchmark
+ * @current_mask:      Start mask for the first iteration
+ *
+ * Run CAT selftest by varying the allocated cache portion and comparing the
+ * impact on cache misses (the result analysis is done in check_results()
+ * and show_results_info(), not in this function).
+ *
+ * One bit is removed from the CAT allocation bit mask (in current_mask) for
+ * each subsequent test, which keeps reducing the size of the allocated cache
+ * portion. A single test flushes the buffer, reads it to warm up the cache,
+ * and reads the buffer again. The cache misses are measured during the last
+ * read pass.
+ *
+ * Return:             0 when the test was run, < 0 on error.
+ */
+static int cat_test(const struct resctrl_test *test,
+                   const struct user_params *uparams,
+                   struct resctrl_val_param *param,
+                   size_t span, unsigned long current_mask)
 {
-       unsigned long l_mask, l_mask_1;
-       int ret, pipefd[2], sibling_cpu_no;
-       unsigned long cache_size = 0;
-       unsigned long long_mask;
-       char cbm_mask[256];
+       char *resctrl_val = param->resctrl_val;
+       struct perf_event_read pe_read;
+       struct perf_event_attr pea;
+       cpu_set_t old_affinity;
+       unsigned char *buf;
+       char schemata[64];
+       int ret, i, pe_fd;
+       pid_t bm_pid;
+
+       if (strcmp(param->filename, "") == 0)
+               sprintf(param->filename, "stdio");
+
+       bm_pid = getpid();
+
+       /* Taskset benchmark to specified cpu */
+       ret = taskset_benchmark(bm_pid, uparams->cpu, &old_affinity);
+       if (ret)
+               return ret;
+
+       /* Write benchmark to specified con_mon grp, mon_grp in resctrl FS */
+       ret = write_bm_pid_to_resctrl(bm_pid, param->ctrlgrp, param->mongrp,
+                                     resctrl_val);
+       if (ret)
+               goto reset_affinity;
+
+       perf_event_attr_initialize(&pea, PERF_COUNT_HW_CACHE_MISSES);
+       perf_event_initialize_read_format(&pe_read);
+       pe_fd = perf_open(&pea, bm_pid, uparams->cpu);
+       if (pe_fd < 0) {
+               ret = -1;
+               goto reset_affinity;
+       }
+
+       buf = alloc_buffer(span, 1);
+       if (!buf) {
+               ret = -1;
+               goto pe_close;
+       }
+
+       while (current_mask) {
+               snprintf(schemata, sizeof(schemata), "%lx", param->mask & ~current_mask);
+               ret = write_schemata("", schemata, uparams->cpu, test->resource);
+               if (ret)
+                       goto free_buf;
+               snprintf(schemata, sizeof(schemata), "%lx", current_mask);
+               ret = write_schemata(param->ctrlgrp, schemata, uparams->cpu, test->resource);
+               if (ret)
+                       goto free_buf;
+
+               for (i = 0; i < NUM_OF_RUNS; i++) {
+                       mem_flush(buf, span);
+                       fill_cache_read(buf, span, true);
+
+                       ret = perf_event_reset_enable(pe_fd);
+                       if (ret)
+                               goto free_buf;
+
+                       fill_cache_read(buf, span, true);
+
+                       ret = perf_event_measure(pe_fd, &pe_read, param->filename, bm_pid);
+                       if (ret)
+                               goto free_buf;
+               }
+               current_mask = next_mask(current_mask);
+       }
+
+free_buf:
+       free(buf);
+pe_close:
+       close(pe_fd);
+reset_affinity:
+       taskset_restore(bm_pid, &old_affinity);
+
+       return ret;
+}
+
+static int cat_run_test(const struct resctrl_test *test, const struct user_params *uparams)
+{
+       unsigned long long_mask, start_mask, full_cache_mask;
+       unsigned long cache_total_size = 0;
+       int n = uparams->bits;
+       unsigned int start;
        int count_of_bits;
-       char pipe_message;
        size_t span;
+       int ret;
 
-       /* Get default cbm mask for L3/L2 cache */
-       ret = get_cbm_mask(cache_type, cbm_mask);
+       ret = get_full_cbm(test->resource, &full_cache_mask);
+       if (ret)
+               return ret;
+       /* Get the largest contiguous exclusive portion of the cache */
+       ret = get_mask_no_shareable(test->resource, &long_mask);
        if (ret)
                return ret;
-
-       long_mask = strtoul(cbm_mask, NULL, 16);
 
        /* Get L3/L2 cache size */
-       ret = get_cache_size(cpu_no, cache_type, &cache_size);
+       ret = get_cache_size(uparams->cpu, test->resource, &cache_total_size);
        if (ret)
                return ret;
-       ksft_print_msg("Cache size :%lu\n", cache_size);
+       ksft_print_msg("Cache size :%lu\n", cache_total_size);
 
-       /* Get max number of bits from default-cabm mask */
-       count_of_bits = count_bits(long_mask);
+       count_of_bits = count_contiguous_bits(long_mask, &start);
 
        if (!n)
                n = count_of_bits / 2;
@@ -123,89 +269,124 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type)
                               count_of_bits - 1);
                return -1;
        }
-
-       /* Get core id from same socket for running another thread */
-       sibling_cpu_no = get_core_sibling(cpu_no);
-       if (sibling_cpu_no < 0)
-               return -1;
+       start_mask = create_bit_mask(start, n);
 
        struct resctrl_val_param param = {
                .resctrl_val    = CAT_STR,
-               .cpu_no         = cpu_no,
-               .setup          = cat_setup,
+               .ctrlgrp        = "c1",
+               .filename       = RESULT_FILE_NAME,
+               .num_of_runs    = 0,
        };
+       param.mask = long_mask;
+       span = cache_portion_size(cache_total_size, start_mask, full_cache_mask);
 
-       l_mask = long_mask >> n;
-       l_mask_1 = ~l_mask & long_mask;
+       remove(param.filename);
 
-       /* Set param values for parent thread which will be allocated bitmask
-        * with (max_bits - n) bits
-        */
-       span = cache_size * (count_of_bits - n) / count_of_bits;
-       strcpy(param.ctrlgrp, "c2");
-       strcpy(param.mongrp, "m2");
-       strcpy(param.filename, RESULT_FILE_NAME2);
-       param.mask = l_mask;
-       param.num_of_runs = 0;
-
-       if (pipe(pipefd)) {
-               perror("# Unable to create pipe");
-               return errno;
-       }
+       ret = cat_test(test, uparams, &param, span, start_mask);
+       if (ret)
+               goto out;
 
-       fflush(stdout);
-       bm_pid = fork();
+       ret = check_results(&param, test->resource,
+                           cache_total_size, full_cache_mask, start_mask);
+out:
+       cat_test_cleanup();
 
-       /* Set param values for child thread which will be allocated bitmask
-        * with n bits
-        */
-       if (bm_pid == 0) {
-               param.mask = l_mask_1;
-               strcpy(param.ctrlgrp, "c1");
-               strcpy(param.mongrp, "m1");
-               span = cache_size * n / count_of_bits;
-               strcpy(param.filename, RESULT_FILE_NAME1);
-               param.num_of_runs = 0;
-               param.cpu_no = sibling_cpu_no;
+       return ret;
+}
+
+static int noncont_cat_run_test(const struct resctrl_test *test,
+                               const struct user_params *uparams)
+{
+       unsigned long full_cache_mask, cont_mask, noncont_mask;
+       unsigned int eax, ebx, ecx, edx, sparse_masks;
+       int bit_center, ret;
+       char schemata[64];
+
+       /* Check to compare sparse_masks content to CPUID output. */
+       ret = resource_info_unsigned_get(test->resource, "sparse_masks", &sparse_masks);
+       if (ret)
+               return ret;
+
+       if (!strcmp(test->resource, "L3"))
+               __cpuid_count(0x10, 1, eax, ebx, ecx, edx);
+       else if (!strcmp(test->resource, "L2"))
+               __cpuid_count(0x10, 2, eax, ebx, ecx, edx);
+       else
+               return -EINVAL;
+
+       if (sparse_masks != ((ecx >> 3) & 1)) {
+               ksft_print_msg("CPUID output doesn't match 'sparse_masks' file content!\n");
+               return 1;
        }
 
-       remove(param.filename);
+       /* Write checks initialization. */
+       ret = get_full_cbm(test->resource, &full_cache_mask);
+       if (ret < 0)
+               return ret;
+       bit_center = count_bits(full_cache_mask) / 2;
 
-       ret = cat_val(&param, span);
-       if (ret == 0)
-               ret = check_results(&param, span);
-
-       if (bm_pid == 0) {
-               /* Tell parent that child is ready */
-               close(pipefd[0]);
-               pipe_message = 1;
-               if (write(pipefd[1], &pipe_message, sizeof(pipe_message)) <
-                   sizeof(pipe_message))
-                       /*
-                        * Just print the error message.
-                        * Let while(1) run and wait for itself to be killed.
-                        */
-                       perror("# failed signaling parent process");
-
-               close(pipefd[1]);
-               while (1)
-                       ;
-       } else {
-               /* Parent waits for child to be ready. */
-               close(pipefd[1]);
-               pipe_message = 0;
-               while (pipe_message != 1) {
-                       if (read(pipefd[0], &pipe_message,
-                                sizeof(pipe_message)) < sizeof(pipe_message)) {
-                               perror("# failed reading from child process");
-                               break;
-                       }
-               }
-               close(pipefd[0]);
-               kill(bm_pid, SIGKILL);
+       /*
+        * The bit_center needs to be at least 3 to properly calculate the CBM
+        * hole in the noncont_mask. If it's smaller, return an error since the
+        * cache mask is too short, which shouldn't happen.
+        */
+       if (bit_center < 3)
+               return -EINVAL;
+       cont_mask = full_cache_mask >> bit_center;
+
+       /* Contiguous mask write check. */
+       snprintf(schemata, sizeof(schemata), "%lx", cont_mask);
+       ret = write_schemata("", schemata, uparams->cpu, test->resource);
+       if (ret) {
+               ksft_print_msg("Write of contiguous CBM failed\n");
+               return 1;
        }
 
-       cat_test_cleanup();
+       /*
+        * Non-contiguous mask write check. CBM has a 0xf hole approximately in the middle.
+        * Output is compared with support information to catch any edge case errors.
+        */
+       noncont_mask = ~(0xfUL << (bit_center - 2)) & full_cache_mask;
+       snprintf(schemata, sizeof(schemata), "%lx", noncont_mask);
+       ret = write_schemata("", schemata, uparams->cpu, test->resource);
+       if (ret && sparse_masks)
+               ksft_print_msg("Non-contiguous CBMs supported but write of non-contiguous CBM failed\n");
+       else if (ret && !sparse_masks)
+               ksft_print_msg("Non-contiguous CBMs not supported and write of non-contiguous CBM failed as expected\n");
+       else if (!ret && !sparse_masks)
+               ksft_print_msg("Non-contiguous CBMs not supported but write of non-contiguous CBM succeeded\n");
+
+       return !ret == !sparse_masks;
+}
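
The final return expression is terse; as an illustrative standalone sketch
(not part of the patch), the test fails exactly when the write outcome
disagrees with the advertised sparse_masks support:

#include <stdio.h>

int main(void)
{
        int ret, sparse_masks;

        /*
         * ret != 0: the non-contiguous CBM write failed.
         * sparse_masks != 0: non-contiguous CBMs are supported.
         * !ret == !sparse_masks evaluates to 1 (test failure) only
         * when write success and support disagree.
         */
        for (ret = 0; ret <= 1; ret++)
                for (sparse_masks = 0; sparse_masks <= 1; sparse_masks++)
                        printf("ret=%d sparse_masks=%d -> %s\n",
                               ret, sparse_masks,
                               (!ret == !sparse_masks) ? "fail" : "pass");
        return 0;
}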
 
-       return ret;
+static bool noncont_cat_feature_check(const struct resctrl_test *test)
+{
+       if (!resctrl_resource_exists(test->resource))
+               return false;
+
+       return resource_info_file_exists(test->resource, "sparse_masks");
 }
+
+struct resctrl_test l3_cat_test = {
+       .name = "L3_CAT",
+       .group = "CAT",
+       .resource = "L3",
+       .feature_check = test_resource_feature_check,
+       .run_test = cat_run_test,
+};
+
+struct resctrl_test l3_noncont_cat_test = {
+       .name = "L3_NONCONT_CAT",
+       .group = "CAT",
+       .resource = "L3",
+       .feature_check = noncont_cat_feature_check,
+       .run_test = noncont_cat_run_test,
+};
+
+struct resctrl_test l2_noncont_cat_test = {
+       .name = "L2_NONCONT_CAT",
+       .group = "CAT",
+       .resource = "L2",
+       .feature_check = noncont_cat_feature_check,
+       .run_test = noncont_cat_run_test,
+};
index 50bdbce9fba95f9b16c0526b98f64229450d0058..a81f91222a89a820b14117e133fc6b5d596d7b72 100644 (file)
@@ -16,7 +16,9 @@
 #define MAX_DIFF               2000000
 #define MAX_DIFF_PERCENT       15
 
-static int cmt_setup(struct resctrl_val_param *p)
+static int cmt_setup(const struct resctrl_test *test,
+                    const struct user_params *uparams,
+                    struct resctrl_val_param *p)
 {
        /* Run NUM_OF_RUNS times */
        if (p->num_of_runs >= NUM_OF_RUNS)
@@ -27,6 +29,33 @@ static int cmt_setup(struct resctrl_val_param *p)
        return 0;
 }
 
+static int show_results_info(unsigned long sum_llc_val, int no_of_bits,
+                            unsigned long cache_span, unsigned long max_diff,
+                            unsigned long max_diff_percent, unsigned long num_of_runs,
+                            bool platform)
+{
+       unsigned long avg_llc_val = 0;
+       float diff_percent;
+       long avg_diff = 0;
+       int ret;
+
+       avg_llc_val = sum_llc_val / num_of_runs;
+       avg_diff = (long)abs(cache_span - avg_llc_val);
+       diff_percent = ((float)cache_span - avg_llc_val) / cache_span * 100;
+
+       ret = platform && abs((int)diff_percent) > max_diff_percent &&
+             abs(avg_diff) > max_diff;
+
+       ksft_print_msg("%s Check cache miss rate within %lu%%\n",
+                      ret ? "Fail:" : "Pass:", max_diff_percent);
+
+       ksft_print_msg("Percent diff=%d\n", abs((int)diff_percent));
+
+       show_cache_info(no_of_bits, avg_llc_val, cache_span, false);
+
+       return ret;
+}
+
 static int check_results(struct resctrl_val_param *param, size_t span, int no_of_bits)
 {
        char *token_array[8], temp[512];
@@ -37,9 +66,9 @@ static int check_results(struct resctrl_val_param *param, size_t span, int no_of
        ksft_print_msg("Checking for pass/fail\n");
        fp = fopen(param->filename, "r");
        if (!fp) {
-               perror("# Error in opening file\n");
+               ksft_perror("Error in opening file");
 
-               return errno;
+               return -1;
        }
 
        while (fgets(temp, sizeof(temp), fp)) {
@@ -58,9 +87,8 @@ static int check_results(struct resctrl_val_param *param, size_t span, int no_of
        }
        fclose(fp);
 
-       return show_cache_info(sum_llc_occu_resc, no_of_bits, span,
-                              MAX_DIFF, MAX_DIFF_PERCENT, runs - 1,
-                              true, true);
+       return show_results_info(sum_llc_occu_resc, no_of_bits, span,
+                                MAX_DIFF, MAX_DIFF_PERCENT, runs - 1, true);
 }
 
 void cmt_test_cleanup(void)
@@ -68,28 +96,26 @@ void cmt_test_cleanup(void)
        remove(RESULT_FILE_NAME);
 }
 
-int cmt_resctrl_val(int cpu_no, int n, const char * const *benchmark_cmd)
+static int cmt_run_test(const struct resctrl_test *test, const struct user_params *uparams)
 {
-       const char * const *cmd = benchmark_cmd;
+       const char * const *cmd = uparams->benchmark_cmd;
        const char *new_cmd[BENCHMARK_ARGS];
-       unsigned long cache_size = 0;
+       unsigned long cache_total_size = 0;
+       int n = uparams->bits ? : 5;
        unsigned long long_mask;
        char *span_str = NULL;
-       char cbm_mask[256];
        int count_of_bits;
        size_t span;
        int ret, i;
 
-       ret = get_cbm_mask("L3", cbm_mask);
+       ret = get_full_cbm("L3", &long_mask);
        if (ret)
                return ret;
 
-       long_mask = strtoul(cbm_mask, NULL, 16);
-
-       ret = get_cache_size(cpu_no, "L3", &cache_size);
+       ret = get_cache_size(uparams->cpu, "L3", &cache_total_size);
        if (ret)
                return ret;
-       ksft_print_msg("Cache size :%lu\n", cache_size);
+       ksft_print_msg("Cache size :%lu\n", cache_total_size);
 
        count_of_bits = count_bits(long_mask);
 
@@ -103,19 +129,18 @@ int cmt_resctrl_val(int cpu_no, int n, const char * const *benchmark_cmd)
                .resctrl_val    = CMT_STR,
                .ctrlgrp        = "c1",
                .mongrp         = "m1",
-               .cpu_no         = cpu_no,
                .filename       = RESULT_FILE_NAME,
                .mask           = ~(long_mask << n) & long_mask,
                .num_of_runs    = 0,
                .setup          = cmt_setup,
        };
 
-       span = cache_size * n / count_of_bits;
+       span = cache_portion_size(cache_total_size, param.mask, long_mask);
 
        if (strcmp(cmd[0], "fill_buf") == 0) {
                /* Duplicate the command to be able to replace span in it */
-               for (i = 0; benchmark_cmd[i]; i++)
-                       new_cmd[i] = benchmark_cmd[i];
+               for (i = 0; uparams->benchmark_cmd[i]; i++)
+                       new_cmd[i] = uparams->benchmark_cmd[i];
                new_cmd[i] = NULL;
 
                ret = asprintf(&span_str, "%zu", span);
@@ -127,11 +152,13 @@ int cmt_resctrl_val(int cpu_no, int n, const char * const *benchmark_cmd)
 
        remove(RESULT_FILE_NAME);
 
-       ret = resctrl_val(cmd, &param);
+       ret = resctrl_val(test, uparams, cmd, &param);
        if (ret)
                goto out;
 
        ret = check_results(&param, span, n);
+       if (ret && (get_vendor() == ARCH_INTEL))
+               ksft_print_msg("Intel CMT may be inaccurate when Sub-NUMA Clustering is enabled. Check BIOS configuration.\n");
 
 out:
        cmt_test_cleanup();
@@ -139,3 +166,16 @@ out:
 
        return ret;
 }
+
+static bool cmt_feature_check(const struct resctrl_test *test)
+{
+       return test_resource_feature_check(test) &&
+              resctrl_mon_feature_exists("L3_MON", "llc_occupancy");
+}
+
+struct resctrl_test cmt_test = {
+       .name = "CMT",
+       .resource = "L3",
+       .feature_check = cmt_feature_check,
+       .run_test = cmt_run_test,
+};
index 0d425f26583a9596479ae8a017c70d3523fa25a8..ae120f1735c0bc0d9e12bd8c38fabf9d75711b08 100644 (file)
@@ -38,7 +38,7 @@ static void cl_flush(void *p)
 #endif
 }
 
-static void mem_flush(unsigned char *buf, size_t buf_size)
+void mem_flush(unsigned char *buf, size_t buf_size)
 {
        unsigned char *cp = buf;
        size_t i = 0;
@@ -51,39 +51,38 @@ static void mem_flush(unsigned char *buf, size_t buf_size)
        sb();
 }
 
-static void *malloc_and_init_memory(size_t buf_size)
-{
-       void *p = NULL;
-       uint64_t *p64;
-       size_t s64;
-       int ret;
-
-       ret = posix_memalign(&p, PAGE_SIZE, buf_size);
-       if (ret < 0)
-               return NULL;
-
-       p64 = (uint64_t *)p;
-       s64 = buf_size / sizeof(uint64_t);
-
-       while (s64 > 0) {
-               *p64 = (uint64_t)rand();
-               p64 += (CL_SIZE / sizeof(uint64_t));
-               s64 -= (CL_SIZE / sizeof(uint64_t));
-       }
-
-       return p;
-}
+/*
+ * Buffer index step advance to work around HW prefetching interfering with
+ * the measurements.
+ *
+ * Must be a prime to step through all indexes of the buffer.
+ *
+ * Some primes work better than others on some architectures (from MBA/MBM
+ * result stability point of view).
+ */
+#define FILL_IDX_MULT  23
 
 static int fill_one_span_read(unsigned char *buf, size_t buf_size)
 {
-       unsigned char *end_ptr = buf + buf_size;
-       unsigned char sum, *p;
-
-       sum = 0;
-       p = buf;
-       while (p < end_ptr) {
-               sum += *p;
-               p += (CL_SIZE / 2);
+       unsigned int size = buf_size / (CL_SIZE / 2);
+       unsigned int i, idx = 0;
+       unsigned char sum = 0;
+
+       /*
+        * Read the buffer in an order that is unexpected by HW prefetching
+        * optimizations to prevent them from interfering with the caching
+        * pattern.
+        *
+        * The read order is (in terms of halves of cachelines):
+        *      i * FILL_IDX_MULT % size
+        * The formula is open-coded below to avoid the modulo inside the loop
+        * as it improves MBA/MBM result stability on some architectures.
+        */
+       for (i = 0; i < size; i++) {
+               sum += buf[idx * (CL_SIZE / 2)];
+
+               idx += FILL_IDX_MULT;
+               while (idx >= size)
+                       idx -= size;
        }
 
        return sum;
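
Why a prime multiplier steps through every index: as long as the buffer size
is not a multiple of FILL_IDX_MULT, the stride is coprime with the size, so
the walk is a permutation of all indexes. A standalone sketch (not part of
the patch) demonstrating both cases:

#include <assert.h>
#include <stdbool.h>

#define FILL_IDX_MULT   23
#define MAX_SIZE        4096

/* True if stepping by FILL_IDX_MULT modulo size visits every index once */
static bool visits_all(unsigned int size)
{
        bool seen[MAX_SIZE] = { false };
        unsigned int i, idx = 0;

        for (i = 0; i < size; i++) {
                seen[idx] = true;
                idx += FILL_IDX_MULT;
                while (idx >= size)
                        idx -= size;
        }
        for (i = 0; i < size; i++)
                if (!seen[i])
                        return false;
        return true;
}

int main(void)
{
        assert(visits_all(MAX_SIZE));   /* 4096 is coprime with 23 */
        assert(!visits_all(23 * 100));  /* multiple of 23: indexes repeat */
        return 0;
}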
@@ -101,10 +100,9 @@ static void fill_one_span_write(unsigned char *buf, size_t buf_size)
        }
 }
 
-static int fill_cache_read(unsigned char *buf, size_t buf_size, bool once)
+void fill_cache_read(unsigned char *buf, size_t buf_size, bool once)
 {
        int ret = 0;
-       FILE *fp;
 
        while (1) {
                ret = fill_one_span_read(buf, buf_size);
@@ -113,67 +111,59 @@ static int fill_cache_read(unsigned char *buf, size_t buf_size, bool once)
        }
 
        /* Consume read result so that reading memory is not optimized out. */
-       fp = fopen("/dev/null", "w");
-       if (!fp) {
-               perror("Unable to write to /dev/null");
-               return -1;
-       }
-       fprintf(fp, "Sum: %d ", ret);
-       fclose(fp);
-
-       return 0;
+       *value_sink = ret;
 }
 
-static int fill_cache_write(unsigned char *buf, size_t buf_size, bool once)
+static void fill_cache_write(unsigned char *buf, size_t buf_size, bool once)
 {
        while (1) {
                fill_one_span_write(buf, buf_size);
                if (once)
                        break;
        }
-
-       return 0;
 }
 
-static int fill_cache(size_t buf_size, int memflush, int op, bool once)
+unsigned char *alloc_buffer(size_t buf_size, int memflush)
 {
-       unsigned char *buf;
+       void *buf = NULL;
+       uint64_t *p64;
+       size_t s64;
        int ret;
 
-       buf = malloc_and_init_memory(buf_size);
-       if (!buf)
-               return -1;
-
-       /* Flush the memory before using to avoid "cache hot pages" effect */
-       if (memflush)
-               mem_flush(buf, buf_size);
-
-       if (op == 0)
-               ret = fill_cache_read(buf, buf_size, once);
-       else
-               ret = fill_cache_write(buf, buf_size, once);
+       ret = posix_memalign(&buf, PAGE_SIZE, buf_size);
+       if (ret < 0)
+               return NULL;
 
-       free(buf);
+       /* Initialize the buffer */
+       p64 = buf;
+       s64 = buf_size / sizeof(uint64_t);
 
-       if (ret) {
-               printf("\n Error in fill cache read/write...\n");
-               return -1;
+       while (s64 > 0) {
+               *p64 = (uint64_t)rand();
+               p64 += (CL_SIZE / sizeof(uint64_t));
+               s64 -= (CL_SIZE / sizeof(uint64_t));
        }
 
+       /* Flush the memory before use to avoid the "cache hot pages" effect */
+       if (memflush)
+               mem_flush(buf, buf_size);
 
-       return 0;
+       return buf;
 }
 
-int run_fill_buf(size_t span, int memflush, int op, bool once)
+int run_fill_buf(size_t buf_size, int memflush, int op, bool once)
 {
-       size_t cache_size = span;
-       int ret;
+       unsigned char *buf;
 
-       ret = fill_cache(cache_size, memflush, op, once);
-       if (ret) {
-               printf("\n Error in fill cache\n");
+       buf = alloc_buffer(buf_size, memflush);
+       if (!buf)
                return -1;
-       }
+
+       if (op == 0)
+               fill_cache_read(buf, buf_size, once);
+       else
+               fill_cache_write(buf, buf_size, once);
+       free(buf);
 
        return 0;
 }
index d3bf4368341ece023dff1a96a4c795122baf8464..7946e32e85c83fb9422caf1ab3168f4994f21048 100644 (file)
@@ -22,7 +22,9 @@
  * con_mon grp, mon_grp in resctrl FS.
  * For each allocation, run 5 times in order to get average values.
  */
-static int mba_setup(struct resctrl_val_param *p)
+static int mba_setup(const struct resctrl_test *test,
+                    const struct user_params *uparams,
+                    struct resctrl_val_param *p)
 {
        static int runs_per_allocation, allocation = 100;
        char allocation_str[64];
@@ -40,8 +42,7 @@ static int mba_setup(struct resctrl_val_param *p)
 
        sprintf(allocation_str, "%d", allocation);
 
-       ret = write_schemata(p->ctrlgrp, allocation_str, p->cpu_no,
-                            p->resctrl_val);
+       ret = write_schemata(p->ctrlgrp, allocation_str, uparams->cpu, test->resource);
        if (ret < 0)
                return ret;
 
@@ -109,9 +110,9 @@ static int check_results(void)
 
        fp = fopen(output, "r");
        if (!fp) {
-               perror(output);
+               ksft_perror(output);
 
-               return errno;
+               return -1;
        }
 
        runs = 0;
@@ -141,13 +142,12 @@ void mba_test_cleanup(void)
        remove(RESULT_FILE_NAME);
 }
 
-int mba_schemata_change(int cpu_no, const char * const *benchmark_cmd)
+static int mba_run_test(const struct resctrl_test *test, const struct user_params *uparams)
 {
        struct resctrl_val_param param = {
                .resctrl_val    = MBA_STR,
                .ctrlgrp        = "c1",
                .mongrp         = "m1",
-               .cpu_no         = cpu_no,
                .filename       = RESULT_FILE_NAME,
                .bw_report      = "reads",
                .setup          = mba_setup
@@ -156,7 +156,7 @@ int mba_schemata_change(int cpu_no, const char * const *benchmark_cmd)
 
        remove(RESULT_FILE_NAME);
 
-       ret = resctrl_val(benchmark_cmd, &param);
+       ret = resctrl_val(test, uparams, uparams->benchmark_cmd, &param);
        if (ret)
                goto out;
 
@@ -167,3 +167,17 @@ out:
 
        return ret;
 }
+
+static bool mba_feature_check(const struct resctrl_test *test)
+{
+       return test_resource_feature_check(test) &&
+              resctrl_mon_feature_exists("L3_MON", "mbm_local_bytes");
+}
+
+struct resctrl_test mba_test = {
+       .name = "MBA",
+       .resource = "MB",
+       .vendor_specific = ARCH_INTEL,
+       .feature_check = mba_feature_check,
+       .run_test = mba_run_test,
+};
index 741533f2b075884d8983e8635cc454f05d29ea6c..d67ffa3ec63a321416855d2405f06c3e2c16af05 100644 (file)
@@ -59,9 +59,9 @@ static int check_results(size_t span)
 
        fp = fopen(output, "r");
        if (!fp) {
-               perror(output);
+               ksft_perror(output);
 
-               return errno;
+               return -1;
        }
 
        runs = 0;
@@ -86,7 +86,9 @@ static int check_results(size_t span)
        return ret;
 }
 
-static int mbm_setup(struct resctrl_val_param *p)
+static int mbm_setup(const struct resctrl_test *test,
+                    const struct user_params *uparams,
+                    struct resctrl_val_param *p)
 {
        int ret = 0;
 
@@ -95,9 +97,8 @@ static int mbm_setup(struct resctrl_val_param *p)
                return END_OF_TESTS;
 
        /* Set up schemata with 100% allocation on the first run. */
-       if (p->num_of_runs == 0 && validate_resctrl_feature_request("MB", NULL))
-               ret = write_schemata(p->ctrlgrp, "100", p->cpu_no,
-                                    p->resctrl_val);
+       if (p->num_of_runs == 0 && resctrl_resource_exists("MB"))
+               ret = write_schemata(p->ctrlgrp, "100", uparams->cpu, test->resource);
 
        p->num_of_runs++;
 
@@ -109,13 +110,12 @@ void mbm_test_cleanup(void)
        remove(RESULT_FILE_NAME);
 }
 
-int mbm_bw_change(int cpu_no, const char * const *benchmark_cmd)
+static int mbm_run_test(const struct resctrl_test *test, const struct user_params *uparams)
 {
        struct resctrl_val_param param = {
                .resctrl_val    = MBM_STR,
                .ctrlgrp        = "c1",
                .mongrp         = "m1",
-               .cpu_no         = cpu_no,
                .filename       = RESULT_FILE_NAME,
                .bw_report      = "reads",
                .setup          = mbm_setup
@@ -124,14 +124,30 @@ int mbm_bw_change(int cpu_no, const char * const *benchmark_cmd)
 
        remove(RESULT_FILE_NAME);
 
-       ret = resctrl_val(benchmark_cmd, &param);
+       ret = resctrl_val(test, uparams, uparams->benchmark_cmd, &param);
        if (ret)
                goto out;
 
        ret = check_results(DEFAULT_SPAN);
+       if (ret && (get_vendor() == ARCH_INTEL))
+               ksft_print_msg("Intel MBM may be inaccurate when Sub-NUMA Clustering is enabled. Check BIOS configuration.\n");
 
 out:
        mbm_test_cleanup();
 
        return ret;
 }
+
+static bool mbm_feature_check(const struct resctrl_test *test)
+{
+       return resctrl_mon_feature_exists("L3_MON", "mbm_total_bytes") &&
+              resctrl_mon_feature_exists("L3_MON", "mbm_local_bytes");
+}
+
+struct resctrl_test mbm_test = {
+       .name = "MBM",
+       .resource = "MB",
+       .vendor_specific = ARCH_INTEL,
+       .feature_check = mbm_feature_check,
+       .run_test = mbm_run_test,
+};
index a33f414f60199ac543412589780b1e554d148ac8..2051bd135e0d04d0d403e28b3c9b393a7d043926 100644 (file)
 #define PHYS_ID_PATH           "/sys/devices/system/cpu/cpu"
 #define INFO_PATH              "/sys/fs/resctrl/info"
 
+/*
+ * CPU vendor IDs
+ *
+ * Defined as bits because they're used for the vendor_specific bitmask in
+ * struct resctrl_test.
+ */
 #define ARCH_INTEL     1
 #define ARCH_AMD       2
 
 
 #define DEFAULT_SPAN           (250 * MB)
 
-#define PARENT_EXIT(err_msg)                   \
+#define PARENT_EXIT()                          \
        do {                                    \
-               perror(err_msg);                \
                kill(ppid, SIGKILL);            \
                umount_resctrlfs();             \
                exit(EXIT_FAILURE);             \
        } while (0)
 
+/*
+ * user_params:                User supplied parameters
+ * @cpu:               CPU number to which the benchmark will be bound
+ * @bits:              Number of bits used for cache allocation size
+ * @benchmark_cmd:     Benchmark command to run during (some of the) tests
+ */
+struct user_params {
+       int cpu;
+       int bits;
+       const char *benchmark_cmd[BENCHMARK_ARGS];
+};
+
+/*
+ * resctrl_test:       resctrl test definition
+ * @name:              Test name
+ * @group:             Test group - a common name for tests that share some characteristic
+ *                     (e.g., L3 CAT test belongs to the CAT group). Can be NULL
+ * @resource:          Resource to test (e.g., MB, L3, L2, etc.)
+ * @vendor_specific:   Bitmask for vendor-specific tests (can be 0 for universal tests)
+ * @disabled:          Test is disabled
+ * @feature_check:     Callback to check required resctrl features
+ * @run_test:          Callback to run the test
+ */
+struct resctrl_test {
+       const char      *name;
+       const char      *group;
+       const char      *resource;
+       unsigned int    vendor_specific;
+       bool            disabled;
+       bool            (*feature_check)(const struct resctrl_test *test);
+       int             (*run_test)(const struct resctrl_test *test,
+                                   const struct user_params *uparams);
+};
+
 /*
  * resctrl_val_param:  resctrl test parameters
  * @resctrl_val:       Resctrl feature (Eg: mbm, mba.. etc)
  * @ctrlgrp:           Name of the control monitor group (con_mon grp)
  * @mongrp:            Name of the monitor group (mon grp)
- * @cpu_no:            CPU number to which the benchmark would be binded
  * @filename:          Name of file to which the o/p should be written
  * @bw_report:         Bandwidth report type (reads vs writes)
  * @setup:             Call back function to setup test environment
@@ -59,12 +97,20 @@ struct resctrl_val_param {
        char            *resctrl_val;
        char            ctrlgrp[64];
        char            mongrp[64];
-       int             cpu_no;
        char            filename[64];
        char            *bw_report;
        unsigned long   mask;
        int             num_of_runs;
-       int             (*setup)(struct resctrl_val_param *param);
+       int             (*setup)(const struct resctrl_test *test,
+                                const struct user_params *uparams,
+                                struct resctrl_val_param *param);
+};
+
+struct perf_event_read {
+       __u64 nr;                       /* The number of events */
+       struct {
+               __u64 value;            /* The value of the event */
+       } values[2];
 };
 
 #define MBM_STR                        "mbm"
@@ -72,6 +118,13 @@ struct resctrl_val_param {
 #define CMT_STR                        "cmt"
 #define CAT_STR                        "cat"
 
+/*
+ * Memory location that consumes values the compiler must not optimize away.
+ * Volatile ensures writes to this location cannot be optimized away by the
+ * compiler.
+ */
+extern volatile int *value_sink;
+
 extern pid_t bm_pid, ppid;
 
 extern char llc_occup_path[1024];
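
A minimal standalone sketch (not part of the patch) of the pattern value_sink
enables: storing a computed result to a volatile location keeps the reads
from being optimized out, replacing the old fprintf-to-/dev/null trick:

static volatile int sink_target;
static volatile int *value_sink = &sink_target;

/* Sum the buffer; without the volatile store the loop could be elided */
static void touch_buffer(const unsigned char *buf, unsigned long size)
{
        unsigned char sum = 0;
        unsigned long i;

        for (i = 0; i < size; i++)
                sum += buf[i];

        *value_sink = sum;      /* consume the result */
}

int main(void)
{
        unsigned char buf[64] = { 1, 2, 3 };

        touch_buffer(buf, sizeof(buf));
        return sink_target == 6 ? 0 : 1;
}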
@@ -79,42 +132,84 @@ extern char llc_occup_path[1024];
 int get_vendor(void);
 bool check_resctrlfs_support(void);
 int filter_dmesg(void);
-int get_resource_id(int cpu_no, int *resource_id);
+int get_domain_id(const char *resource, int cpu_no, int *domain_id);
 int mount_resctrlfs(void);
 int umount_resctrlfs(void);
 int validate_bw_report_request(char *bw_report);
-bool validate_resctrl_feature_request(const char *resource, const char *feature);
+bool resctrl_resource_exists(const char *resource);
+bool resctrl_mon_feature_exists(const char *resource, const char *feature);
+bool resource_info_file_exists(const char *resource, const char *file);
+bool test_resource_feature_check(const struct resctrl_test *test);
 char *fgrep(FILE *inf, const char *str);
-int taskset_benchmark(pid_t bm_pid, int cpu_no);
-int write_schemata(char *ctrlgrp, char *schemata, int cpu_no,
-                  char *resctrl_val);
+int taskset_benchmark(pid_t bm_pid, int cpu_no, cpu_set_t *old_affinity);
+int taskset_restore(pid_t bm_pid, cpu_set_t *old_affinity);
+int write_schemata(char *ctrlgrp, char *schemata, int cpu_no, const char *resource);
 int write_bm_pid_to_resctrl(pid_t bm_pid, char *ctrlgrp, char *mongrp,
                            char *resctrl_val);
 int perf_event_open(struct perf_event_attr *hw_event, pid_t pid, int cpu,
                    int group_fd, unsigned long flags);
-int run_fill_buf(size_t span, int memflush, int op, bool once);
-int resctrl_val(const char * const *benchmark_cmd, struct resctrl_val_param *param);
-int mbm_bw_change(int cpu_no, const char * const *benchmark_cmd);
+unsigned char *alloc_buffer(size_t buf_size, int memflush);
+void mem_flush(unsigned char *buf, size_t buf_size);
+void fill_cache_read(unsigned char *buf, size_t buf_size, bool once);
+int run_fill_buf(size_t buf_size, int memflush, int op, bool once);
+int resctrl_val(const struct resctrl_test *test,
+               const struct user_params *uparams,
+               const char * const *benchmark_cmd,
+               struct resctrl_val_param *param);
 void tests_cleanup(void);
 void mbm_test_cleanup(void);
-int mba_schemata_change(int cpu_no, const char * const *benchmark_cmd);
 void mba_test_cleanup(void);
-int get_cbm_mask(char *cache_type, char *cbm_mask);
-int get_cache_size(int cpu_no, char *cache_type, unsigned long *cache_size);
+unsigned long create_bit_mask(unsigned int start, unsigned int len);
+unsigned int count_contiguous_bits(unsigned long val, unsigned int *start);
+int get_full_cbm(const char *cache_type, unsigned long *mask);
+int get_mask_no_shareable(const char *cache_type, unsigned long *mask);
+int get_cache_size(int cpu_no, const char *cache_type, unsigned long *cache_size);
+int resource_info_unsigned_get(const char *resource, const char *filename, unsigned int *val);
 void ctrlc_handler(int signum, siginfo_t *info, void *ptr);
 int signal_handler_register(void);
 void signal_handler_unregister(void);
-int cat_val(struct resctrl_val_param *param, size_t span);
 void cat_test_cleanup(void);
-int cat_perf_miss_val(int cpu_no, int no_of_bits, char *cache_type);
-int cmt_resctrl_val(int cpu_no, int n, const char * const *benchmark_cmd);
 unsigned int count_bits(unsigned long n);
 void cmt_test_cleanup(void);
-int get_core_sibling(int cpu_no);
-int measure_cache_vals(struct resctrl_val_param *param, int bm_pid);
-int show_cache_info(unsigned long sum_llc_val, int no_of_bits,
-                   size_t cache_span, unsigned long max_diff,
-                   unsigned long max_diff_percent, unsigned long num_of_runs,
-                   bool platform, bool cmt);
+
+void perf_event_attr_initialize(struct perf_event_attr *pea, __u64 config);
+void perf_event_initialize_read_format(struct perf_event_read *pe_read);
+int perf_open(struct perf_event_attr *pea, pid_t pid, int cpu_no);
+int perf_event_reset_enable(int pe_fd);
+int perf_event_measure(int pe_fd, struct perf_event_read *pe_read,
+                      const char *filename, int bm_pid);
+int measure_llc_resctrl(const char *filename, int bm_pid);
+void show_cache_info(int no_of_bits, __u64 avg_llc_val, size_t cache_span, bool lines);
+
+/*
+ * cache_portion_size - Calculate the size of a cache portion
+ * @cache_size:                Total cache size in bytes
+ * @portion_mask:      Cache portion mask
+ * @full_cache_mask:   Full Cache Bit Mask (CBM) for the cache
+ *
+ * Return: The size of the cache portion in bytes.
+ */
+static inline unsigned long cache_portion_size(unsigned long cache_size,
+                                              unsigned long portion_mask,
+                                              unsigned long full_cache_mask)
+{
+       unsigned int bits = count_bits(full_cache_mask);
+
+       /*
+        * With no bits the full CBM, assume cache cannot be split into
+        * smaller portions. To avoid divide by zero, return cache_size.
+        */
+       if (!bits)
+               return cache_size;
+
+       return cache_size * count_bits(portion_mask) / bits;
+}
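
A worked example of cache_portion_size() with illustrative numbers
(standalone copy, using the compiler's popcount builtin in place of
count_bits()):

#include <assert.h>

static unsigned long cache_portion_size(unsigned long cache_size,
                                        unsigned long portion_mask,
                                        unsigned long full_cache_mask)
{
        unsigned int bits = __builtin_popcountl(full_cache_mask);

        if (!bits)
                return cache_size;

        return cache_size * __builtin_popcountl(portion_mask) / bits;
}

int main(void)
{
        /* 8 MiB cache, 12-bit full CBM: a 4-bit portion is one third */
        assert(cache_portion_size(8UL * 1024 * 1024, 0xf, 0xfff) ==
               8UL * 1024 * 1024 * 4 / 12);
        return 0;
}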
+
+extern struct resctrl_test mbm_test;
+extern struct resctrl_test mba_test;
+extern struct resctrl_test cmt_test;
+extern struct resctrl_test l3_cat_test;
+extern struct resctrl_test l3_noncont_cat_test;
+extern struct resctrl_test l2_noncont_cat_test;
 
 #endif /* RESCTRL_H */
index 2bbe3045a018e2c09edb5785ca9e79d589c7d0f8..f3dc1b9696e71a47e91495c4d046cfadbee46c24 100644 (file)
  */
 #include "resctrl.h"
 
+/* Volatile memory sink to prevent compiler optimizations */
+static volatile int sink_target;
+volatile int *value_sink = &sink_target;
+
+static struct resctrl_test *resctrl_tests[] = {
+       &mbm_test,
+       &mba_test,
+       &cmt_test,
+       &l3_cat_test,
+       &l3_noncont_cat_test,
+       &l2_noncont_cat_test,
+};
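
A minimal standalone sketch (all names hypothetical) of the table-driven
dispatch this array sets up: iterate the table, skip disabled entries, and
run each test through its callback:

#include <stdbool.h>
#include <stdio.h>

struct test {
        const char *name;
        bool disabled;
        int (*run)(void);
};

static int dummy_run(void) { return 0; }

static struct test tests[] = {
        { .name = "MBM", .run = dummy_run },
        { .name = "CAT", .run = dummy_run, .disabled = true },
};

int main(void)
{
        unsigned int i;

        for (i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) {
                if (tests[i].disabled)
                        continue;
                printf("%s: %s\n", tests[i].name,
                       tests[i].run() ? "fail" : "pass");
        }
        return 0;
}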
+
 static int detect_vendor(void)
 {
        FILE *inf = fopen("/proc/cpuinfo", "r");
@@ -49,11 +62,20 @@ int get_vendor(void)
 
 static void cmd_help(void)
 {
+       int i;
+
        printf("usage: resctrl_tests [-h] [-t test list] [-n no_of_bits] [-b benchmark_cmd [option]...]\n");
        printf("\t-b benchmark_cmd [option]...: run specified benchmark for MBM, MBA and CMT\n");
        printf("\t   default benchmark is builtin fill_buf\n");
-       printf("\t-t test list: run tests specified in the test list, ");
+       printf("\t-t test list: run tests/groups specified by the list, ");
        printf("e.g. -t mbm,mba,cmt,cat\n");
+       printf("\t\tSupported tests (group):\n");
+       for (i = 0; i < ARRAY_SIZE(resctrl_tests); i++) {
+               if (resctrl_tests[i]->group)
+                       printf("\t\t\t%s (%s)\n", resctrl_tests[i]->name, resctrl_tests[i]->group);
+               else
+                       printf("\t\t\t%s\n", resctrl_tests[i]->name);
+       }
        printf("\t-n no_of_bits: run cache tests using specified no of bits in cache bit mask\n");
        printf("\t-p cpu_no: specify CPU number to run the test. 1 is default\n");
        printf("\t-h: help\n");
@@ -92,116 +114,63 @@ static void test_cleanup(void)
        signal_handler_unregister();
 }
 
-static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no)
+static bool test_vendor_specific_check(const struct resctrl_test *test)
 {
-       int res;
-
-       ksft_print_msg("Starting MBM BW change ...\n");
-
-       if (test_prepare()) {
-               ksft_exit_fail_msg("Abnormal failure when preparing for the test\n");
-               return;
-       }
-
-       if (!validate_resctrl_feature_request("L3_MON", "mbm_total_bytes") ||
-           !validate_resctrl_feature_request("L3_MON", "mbm_local_bytes") ||
-           (get_vendor() != ARCH_INTEL)) {
-               ksft_test_result_skip("Hardware does not support MBM or MBM is disabled\n");
-               goto cleanup;
-       }
-
-       res = mbm_bw_change(cpu_no, benchmark_cmd);
-       ksft_test_result(!res, "MBM: bw change\n");
-       if ((get_vendor() == ARCH_INTEL) && res)
-               ksft_print_msg("Intel MBM may be inaccurate when Sub-NUMA Clustering is enabled. Check BIOS configuration.\n");
+       if (!test->vendor_specific)
+               return true;
 
-cleanup:
-       test_cleanup();
+       return get_vendor() & test->vendor_specific;
 }
 
-static void run_mba_test(const char * const *benchmark_cmd, int cpu_no)
+static void run_single_test(const struct resctrl_test *test, const struct user_params *uparams)
 {
-       int res;
-
-       ksft_print_msg("Starting MBA Schemata change ...\n");
+       int ret;
 
-       if (test_prepare()) {
-               ksft_exit_fail_msg("Abnormal failure when preparing for the test\n");
+       if (test->disabled)
                return;
-       }
 
-       if (!validate_resctrl_feature_request("MB", NULL) ||
-           !validate_resctrl_feature_request("L3_MON", "mbm_local_bytes") ||
-           (get_vendor() != ARCH_INTEL)) {
-               ksft_test_result_skip("Hardware does not support MBA or MBA is disabled\n");
-               goto cleanup;
+       if (!test_vendor_specific_check(test)) {
+               ksft_test_result_skip("Hardware does not support %s\n", test->name);
+               return;
        }
 
-       res = mba_schemata_change(cpu_no, benchmark_cmd);
-       ksft_test_result(!res, "MBA: schemata change\n");
-
-cleanup:
-       test_cleanup();
-}
-
-static void run_cmt_test(const char * const *benchmark_cmd, int cpu_no)
-{
-       int res;
-
-       ksft_print_msg("Starting CMT test ...\n");
+       ksft_print_msg("Starting %s test ...\n", test->name);
 
        if (test_prepare()) {
                ksft_exit_fail_msg("Abnormal failure when preparing for the test\n");
                return;
        }
 
-       if (!validate_resctrl_feature_request("L3_MON", "llc_occupancy") ||
-           !validate_resctrl_feature_request("L3", NULL)) {
-               ksft_test_result_skip("Hardware does not support CMT or CMT is disabled\n");
+       if (!test->feature_check(test)) {
+               ksft_test_result_skip("Hardware does not support %s or %s is disabled\n",
+                                     test->name, test->name);
                goto cleanup;
        }
 
-       res = cmt_resctrl_val(cpu_no, 5, benchmark_cmd);
-       ksft_test_result(!res, "CMT: test\n");
-       if ((get_vendor() == ARCH_INTEL) && res)
-               ksft_print_msg("Intel CMT may be inaccurate when Sub-NUMA Clustering is enabled. Check BIOS configuration.\n");
+       ret = test->run_test(test, uparams);
+       ksft_test_result(!ret, "%s: test\n", test->name);
 
 cleanup:
        test_cleanup();
 }
 
-static void run_cat_test(int cpu_no, int no_of_bits)
+static void init_user_params(struct user_params *uparams)
 {
-       int res;
-
-       ksft_print_msg("Starting CAT test ...\n");
-
-       if (test_prepare()) {
-               ksft_exit_fail_msg("Abnormal failure when preparing for the test\n");
-               return;
-       }
-
-       if (!validate_resctrl_feature_request("L3", NULL)) {
-               ksft_test_result_skip("Hardware does not support CAT or CAT is disabled\n");
-               goto cleanup;
-       }
+       memset(uparams, 0, sizeof(*uparams));
 
-       res = cat_perf_miss_val(cpu_no, no_of_bits, "L3");
-       ksft_test_result(!res, "CAT: test\n");
-
-cleanup:
-       test_cleanup();
+       uparams->cpu = 1;
+       uparams->bits = 0;
 }
 
 int main(int argc, char **argv)
 {
-       bool mbm_test = true, mba_test = true, cmt_test = true;
-       const char *benchmark_cmd[BENCHMARK_ARGS] = {};
-       int c, cpu_no = 1, i, no_of_bits = 0;
+       int tests = ARRAY_SIZE(resctrl_tests);
+       bool test_param_seen = false;
+       struct user_params uparams;
        char *span_str = NULL;
-       bool cat_test = true;
-       int tests = 0;
-       int ret;
+       int ret, c, i;
+
+       init_user_params(&uparams);
 
        while ((c = getopt(argc, argv, "ht:b:n:p:")) != -1) {
                char *token;
@@ -219,32 +188,35 @@ int main(int argc, char **argv)
 
                        /* Extract benchmark command from command line. */
                        for (i = 0; i < argc - optind; i++)
-                               benchmark_cmd[i] = argv[i + optind];
-                       benchmark_cmd[i] = NULL;
+                               uparams.benchmark_cmd[i] = argv[i + optind];
+                       uparams.benchmark_cmd[i] = NULL;
 
                        goto last_arg;
                case 't':
                        token = strtok(optarg, ",");
 
-                       mbm_test = false;
-                       mba_test = false;
-                       cmt_test = false;
-                       cat_test = false;
+                       if (!test_param_seen) {
+                               for (i = 0; i < ARRAY_SIZE(resctrl_tests); i++)
+                                       resctrl_tests[i]->disabled = true;
+                               tests = 0;
+                               test_param_seen = true;
+                       }
                        while (token) {
-                               if (!strncmp(token, MBM_STR, sizeof(MBM_STR))) {
-                                       mbm_test = true;
-                                       tests++;
-                               } else if (!strncmp(token, MBA_STR, sizeof(MBA_STR))) {
-                                       mba_test = true;
-                                       tests++;
-                               } else if (!strncmp(token, CMT_STR, sizeof(CMT_STR))) {
-                                       cmt_test = true;
-                                       tests++;
-                               } else if (!strncmp(token, CAT_STR, sizeof(CAT_STR))) {
-                                       cat_test = true;
-                                       tests++;
-                               } else {
-                                       printf("invalid argument\n");
+                               bool found = false;
+
+                               for (i = 0; i < ARRAY_SIZE(resctrl_tests); i++) {
+                                       if (!strcasecmp(token, resctrl_tests[i]->name) ||
+                                           (resctrl_tests[i]->group &&
+                                            !strcasecmp(token, resctrl_tests[i]->group))) {
+                                               if (resctrl_tests[i]->disabled)
+                                                       tests++;
+                                               resctrl_tests[i]->disabled = false;
+                                               found = true;
+                                       }
+                               }
+
+                               if (!found) {
+                                       printf("invalid test: %s\n", token);
 
                                        return -1;
                                }
@@ -252,11 +224,11 @@ int main(int argc, char **argv)
                        }
                        break;
                case 'p':
-                       cpu_no = atoi(optarg);
+                       uparams.cpu = atoi(optarg);
                        break;
                case 'n':
-                       no_of_bits = atoi(optarg);
-                       if (no_of_bits <= 0) {
+                       uparams.bits = atoi(optarg);
+                       if (uparams.bits <= 0) {
                                printf("Bail out! invalid argument for no_of_bits\n");
                                return -1;
                        }
@@ -291,32 +263,23 @@ last_arg:
 
        filter_dmesg();
 
-       if (!benchmark_cmd[0]) {
+       if (!uparams.benchmark_cmd[0]) {
                /* If no benchmark is given by "-b" argument, use fill_buf. */
-               benchmark_cmd[0] = "fill_buf";
+               uparams.benchmark_cmd[0] = "fill_buf";
                ret = asprintf(&span_str, "%u", DEFAULT_SPAN);
                if (ret < 0)
                        ksft_exit_fail_msg("Out of memory!\n");
-               benchmark_cmd[1] = span_str;
-               benchmark_cmd[2] = "1";
-               benchmark_cmd[3] = "0";
-               benchmark_cmd[4] = "false";
-               benchmark_cmd[5] = NULL;
+               uparams.benchmark_cmd[1] = span_str;
+               uparams.benchmark_cmd[2] = "1";
+               uparams.benchmark_cmd[3] = "0";
+               uparams.benchmark_cmd[4] = "false";
+               uparams.benchmark_cmd[5] = NULL;
        }
 
-       ksft_set_plan(tests ? : 4);
-
-       if (mbm_test)
-               run_mbm_test(benchmark_cmd, cpu_no);
-
-       if (mba_test)
-               run_mba_test(benchmark_cmd, cpu_no);
-
-       if (cmt_test)
-               run_cmt_test(benchmark_cmd, cpu_no);
+       ksft_set_plan(tests);
 
-       if (cat_test)
-               run_cat_test(cpu_no, no_of_bits);
+       for (i = 0; i < ARRAY_SIZE(resctrl_tests); i++)
+               run_single_test(resctrl_tests[i], &uparams);
 
        free(span_str);
        ksft_finished();
index 88789678917b6b9d799bd7ee54f4e9c9ecf7d311..5a49f07a6c857daee0d8639880bafb77af3acf6e 100644 (file)
@@ -156,12 +156,12 @@ static int read_from_imc_dir(char *imc_dir, int count)
        sprintf(imc_counter_type, "%s%s", imc_dir, "type");
        fp = fopen(imc_counter_type, "r");
        if (!fp) {
-               perror("Failed to open imc counter type file");
+               ksft_perror("Failed to open iMC counter type file");
 
                return -1;
        }
        if (fscanf(fp, "%u", &imc_counters_config[count][READ].type) <= 0) {
-               perror("Could not get imc type");
+               ksft_perror("Could not get iMC type");
                fclose(fp);
 
                return -1;
@@ -175,12 +175,12 @@ static int read_from_imc_dir(char *imc_dir, int count)
        sprintf(imc_counter_cfg, "%s%s", imc_dir, READ_FILE_NAME);
        fp = fopen(imc_counter_cfg, "r");
        if (!fp) {
-               perror("Failed to open imc config file");
+               ksft_perror("Failed to open iMC config file");
 
                return -1;
        }
        if (fscanf(fp, "%s", cas_count_cfg) <= 0) {
-               perror("Could not get imc cas count read");
+               ksft_perror("Could not get iMC cas count read");
                fclose(fp);
 
                return -1;
@@ -193,12 +193,12 @@ static int read_from_imc_dir(char *imc_dir, int count)
        sprintf(imc_counter_cfg, "%s%s", imc_dir, WRITE_FILE_NAME);
        fp = fopen(imc_counter_cfg, "r");
        if (!fp) {
-               perror("Failed to open imc config file");
+               ksft_perror("Failed to open iMC config file");
 
                return -1;
        }
        if  (fscanf(fp, "%s", cas_count_cfg) <= 0) {
-               perror("Could not get imc cas count write");
+               ksft_perror("Could not get iMC cas count write");
                fclose(fp);
 
                return -1;
@@ -262,12 +262,12 @@ static int num_of_imcs(void)
                }
                closedir(dp);
                if (count == 0) {
-                       perror("Unable find iMC counters!\n");
+                       ksft_print_msg("Unable to find iMC counters\n");
 
                        return -1;
                }
        } else {
-               perror("Unable to open PMU directory!\n");
+               ksft_perror("Unable to open PMU directory");
 
                return -1;
        }
@@ -339,14 +339,14 @@ static int get_mem_bw_imc(int cpu_no, char *bw_report, float *bw_imc)
 
                if (read(r->fd, &r->return_value,
                         sizeof(struct membw_read_format)) == -1) {
-                       perror("Couldn't get read b/w through iMC");
+                       ksft_perror("Couldn't get read b/w through iMC");
 
                        return -1;
                }
 
                if (read(w->fd, &w->return_value,
                         sizeof(struct membw_read_format)) == -1) {
-                       perror("Couldn't get write bw through iMC");
+                       ksft_perror("Couldn't get write bw through iMC");
 
                        return -1;
                }
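
The perror() to ksft_perror() conversions running through this file (and the rest of the resctrl changes) swap raw stderr output for KTAP-conformant diagnostics, which is also why the hand-written "# " prefixes disappear from the message strings: the helper adds the prefix itself and appends strerror(errno). Conceptually it behaves like the approximation below (see kselftest.h for the real helper; this sketch is not the verbatim header code):

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    /* Approximation of kselftest.h's ksft_perror(): a TAP "# "
     * diagnostic line carrying the message plus strerror(errno). */
    static void ksft_perror_sketch(const char *msg)
    {
            printf("# %s: %s (%d)\n", msg, strerror(errno), errno);
    }

    int main(void)
    {
            if (!fopen("/nonexistent", "r"))
                    ksft_perror_sketch("Failed to open file");
            return 0;
    }
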
@@ -387,20 +387,20 @@ static int get_mem_bw_imc(int cpu_no, char *bw_report, float *bw_imc)
        return 0;
 }
 
-void set_mbm_path(const char *ctrlgrp, const char *mongrp, int resource_id)
+void set_mbm_path(const char *ctrlgrp, const char *mongrp, int domain_id)
 {
        if (ctrlgrp && mongrp)
                sprintf(mbm_total_path, CON_MON_MBM_LOCAL_BYTES_PATH,
-                       RESCTRL_PATH, ctrlgrp, mongrp, resource_id);
+                       RESCTRL_PATH, ctrlgrp, mongrp, domain_id);
        else if (!ctrlgrp && mongrp)
                sprintf(mbm_total_path, MON_MBM_LOCAL_BYTES_PATH, RESCTRL_PATH,
-                       mongrp, resource_id);
+                       mongrp, domain_id);
        else if (ctrlgrp && !mongrp)
                sprintf(mbm_total_path, CON_MBM_LOCAL_BYTES_PATH, RESCTRL_PATH,
-                       ctrlgrp, resource_id);
+                       ctrlgrp, domain_id);
        else if (!ctrlgrp && !mongrp)
                sprintf(mbm_total_path, MBM_LOCAL_BYTES_PATH, RESCTRL_PATH,
-                       resource_id);
+                       domain_id);
 }
 
 /*
@@ -413,23 +413,23 @@ void set_mbm_path(const char *ctrlgrp, const char *mongrp, int resource_id)
 static void initialize_mem_bw_resctrl(const char *ctrlgrp, const char *mongrp,
                                      int cpu_no, char *resctrl_val)
 {
-       int resource_id;
+       int domain_id;
 
-       if (get_resource_id(cpu_no, &resource_id) < 0) {
-               perror("Could not get resource_id");
+       if (get_domain_id("MB", cpu_no, &domain_id) < 0) {
+               ksft_print_msg("Could not get domain ID\n");
                return;
        }
 
        if (!strncmp(resctrl_val, MBM_STR, sizeof(MBM_STR)))
-               set_mbm_path(ctrlgrp, mongrp, resource_id);
+               set_mbm_path(ctrlgrp, mongrp, domain_id);
 
        if (!strncmp(resctrl_val, MBA_STR, sizeof(MBA_STR))) {
                if (ctrlgrp)
                        sprintf(mbm_total_path, CON_MBM_LOCAL_BYTES_PATH,
-                               RESCTRL_PATH, ctrlgrp, resource_id);
+                               RESCTRL_PATH, ctrlgrp, domain_id);
                else
                        sprintf(mbm_total_path, MBM_LOCAL_BYTES_PATH,
-                               RESCTRL_PATH, resource_id);
+                               RESCTRL_PATH, domain_id);
        }
 }
 
@@ -449,12 +449,12 @@ static int get_mem_bw_resctrl(unsigned long *mbm_total)
 
        fp = fopen(mbm_total_path, "r");
        if (!fp) {
-               perror("Failed to open total bw file");
+               ksft_perror("Failed to open total bw file");
 
                return -1;
        }
        if (fscanf(fp, "%lu", mbm_total) <= 0) {
-               perror("Could not get mbm local bytes");
+               ksft_perror("Could not get mbm local bytes");
                fclose(fp);
 
                return -1;
@@ -495,7 +495,7 @@ int signal_handler_register(void)
        if (sigaction(SIGINT, &sigact, NULL) ||
            sigaction(SIGTERM, &sigact, NULL) ||
            sigaction(SIGHUP, &sigact, NULL)) {
-               perror("# sigaction");
+               ksft_perror("sigaction");
                ret = -1;
        }
        return ret;
@@ -515,7 +515,7 @@ void signal_handler_unregister(void)
        if (sigaction(SIGINT, &sigact, NULL) ||
            sigaction(SIGTERM, &sigact, NULL) ||
            sigaction(SIGHUP, &sigact, NULL)) {
-               perror("# sigaction");
+               ksft_perror("sigaction");
        }
 }
 
@@ -526,7 +526,7 @@ void signal_handler_unregister(void)
  * @bw_imc:            perf imc counter value
  * @bw_resc:           memory bandwidth value
  *
- * Return:             0 on success. non-zero on failure.
+ * Return:             0 on success, < 0 on error.
  */
 static int print_results_bw(char *filename,  int bm_pid, float bw_imc,
                            unsigned long bw_resc)
@@ -540,16 +540,16 @@ static int print_results_bw(char *filename,  int bm_pid, float bw_imc,
        } else {
                fp = fopen(filename, "a");
                if (!fp) {
-                       perror("Cannot open results file");
+                       ksft_perror("Cannot open results file");
 
-                       return errno;
+                       return -1;
                }
                if (fprintf(fp, "Pid: %d \t Mem_BW_iMC: %f \t Mem_BW_resc: %lu \t Difference: %lu\n",
                            bm_pid, bw_imc, bw_resc, diff) <= 0) {
+                       ksft_print_msg("Could not log results\n");
                        fclose(fp);
-                       perror("Could not log results.");
 
-                       return errno;
+                       return -1;
                }
                fclose(fp);
        }
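
Returning -1 instead of errno here is more than cosmetics to match the "< 0 on error" convention: errno is positive, so a caller checking for a negative return would miss the failure, and the fclose() before the return can clobber errno anyway. A hedged illustration of the hazard:

    #include <errno.h>
    #include <stdio.h>

    /* Fragile pattern: fclose() (like any libc call) may overwrite
     * errno before it is returned, and a positive errno never
     * satisfies a caller's "ret < 0" failure check. */
    static int fragile_log(FILE *fp)
    {
            if (fprintf(fp, "result line\n") <= 0) {
                    fclose(fp);     /* may reset errno to 0 */
                    return errno;   /* error can be silently lost */
            }
            fclose(fp);
            return 0;
    }

    int main(void)
    {
            fragile_log(stdout);
            return 0;
    }
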
@@ -582,19 +582,20 @@ static void set_cmt_path(const char *ctrlgrp, const char *mongrp, char sock_num)
 static void initialize_llc_occu_resctrl(const char *ctrlgrp, const char *mongrp,
                                        int cpu_no, char *resctrl_val)
 {
-       int resource_id;
+       int domain_id;
 
-       if (get_resource_id(cpu_no, &resource_id) < 0) {
-               perror("# Unable to resource_id");
+       if (get_domain_id("L3", cpu_no, &domain_id) < 0) {
+               ksft_print_msg("Could not get domain ID\n");
                return;
        }
 
        if (!strncmp(resctrl_val, CMT_STR, sizeof(CMT_STR)))
-               set_cmt_path(ctrlgrp, mongrp, resource_id);
+               set_cmt_path(ctrlgrp, mongrp, domain_id);
 }
 
-static int
-measure_vals(struct resctrl_val_param *param, unsigned long *bw_resc_start)
+static int measure_vals(const struct user_params *uparams,
+                       struct resctrl_val_param *param,
+                       unsigned long *bw_resc_start)
 {
        unsigned long bw_resc, bw_resc_end;
        float bw_imc;
@@ -607,7 +608,7 @@ measure_vals(struct resctrl_val_param *param, unsigned long *bw_resc_start)
         * Compare the two values to validate resctrl value.
         * It takes 1sec to measure the data.
         */
-       ret = get_mem_bw_imc(param->cpu_no, param->bw_report, &bw_imc);
+       ret = get_mem_bw_imc(uparams->cpu, param->bw_report, &bw_imc);
        if (ret < 0)
                return ret;
 
@@ -647,20 +648,24 @@ static void run_benchmark(int signum, siginfo_t *info, void *ucontext)
         * stdio (console)
         */
        fp = freopen("/dev/null", "w", stdout);
-       if (!fp)
-               PARENT_EXIT("Unable to direct benchmark status to /dev/null");
+       if (!fp) {
+               ksft_perror("Unable to direct benchmark status to /dev/null");
+               PARENT_EXIT();
+       }
 
        if (strcmp(benchmark_cmd[0], "fill_buf") == 0) {
                /* Execute default fill_buf benchmark */
                span = strtoul(benchmark_cmd[1], NULL, 10);
                memflush =  atoi(benchmark_cmd[2]);
                operation = atoi(benchmark_cmd[3]);
-               if (!strcmp(benchmark_cmd[4], "true"))
+               if (!strcmp(benchmark_cmd[4], "true")) {
                        once = true;
-               else if (!strcmp(benchmark_cmd[4], "false"))
+               } else if (!strcmp(benchmark_cmd[4], "false")) {
                        once = false;
-               else
-                       PARENT_EXIT("Invalid once parameter");
+               } else {
+                       ksft_print_msg("Invalid once parameter\n");
+                       PARENT_EXIT();
+               }
 
                if (run_fill_buf(span, memflush, operation, once))
                        fprintf(stderr, "Error in running fill buffer\n");
@@ -668,22 +673,28 @@ static void run_benchmark(int signum, siginfo_t *info, void *ucontext)
                /* Execute specified benchmark */
                ret = execvp(benchmark_cmd[0], benchmark_cmd);
                if (ret)
-                       perror("wrong\n");
+                       ksft_perror("execvp");
        }
 
        fclose(stdout);
-       PARENT_EXIT("Unable to run specified benchmark");
+       ksft_print_msg("Unable to run specified benchmark\n");
+       PARENT_EXIT();
 }
 
 /*
  * resctrl_val:        execute benchmark and measure memory bandwidth on
  *                     the benchmark
+ * @test:              test information structure
+ * @uparams:           user supplied parameters
  * @benchmark_cmd:     benchmark command and its arguments
  * @param:             parameters passed to resctrl_val()
  *
- * Return:             0 on success. non-zero on failure.
+ * Return:             0 when the test was run, < 0 on error.
  */
-int resctrl_val(const char * const *benchmark_cmd, struct resctrl_val_param *param)
+int resctrl_val(const struct resctrl_test *test,
+               const struct user_params *uparams,
+               const char * const *benchmark_cmd,
+               struct resctrl_val_param *param)
 {
        char *resctrl_val = param->resctrl_val;
        unsigned long bw_resc_start = 0;
@@ -709,7 +720,7 @@ int resctrl_val(const char * const *benchmark_cmd, struct resctrl_val_param *par
        ppid = getpid();
 
        if (pipe(pipefd)) {
-               perror("# Unable to create pipe");
+               ksft_perror("Unable to create pipe");
 
                return -1;
        }
@@ -721,7 +732,7 @@ int resctrl_val(const char * const *benchmark_cmd, struct resctrl_val_param *par
        fflush(stdout);
        bm_pid = fork();
        if (bm_pid == -1) {
-               perror("# Unable to fork");
+               ksft_perror("Unable to fork");
 
                return -1;
        }
@@ -738,15 +749,17 @@ int resctrl_val(const char * const *benchmark_cmd, struct resctrl_val_param *par
                sigact.sa_flags = SA_SIGINFO;
 
                /* Register for "SIGUSR1" signal from parent */
-               if (sigaction(SIGUSR1, &sigact, NULL))
-                       PARENT_EXIT("Can't register child for signal");
+               if (sigaction(SIGUSR1, &sigact, NULL)) {
+                       ksft_perror("Can't register child for signal");
+                       PARENT_EXIT();
+               }
 
                /* Tell parent that child is ready */
                close(pipefd[0]);
                pipe_message = 1;
                if (write(pipefd[1], &pipe_message, sizeof(pipe_message)) <
                    sizeof(pipe_message)) {
-                       perror("# failed signaling parent process");
+                       ksft_perror("Failed signaling parent process");
                        close(pipefd[1]);
                        return -1;
                }
@@ -755,7 +768,8 @@ int resctrl_val(const char * const *benchmark_cmd, struct resctrl_val_param *par
                /* Suspend child until delivery of "SIGUSR1" from parent */
                sigsuspend(&sigact.sa_mask);
 
-               PARENT_EXIT("Child is done");
+               ksft_print_msg("Child is done\n");
+               PARENT_EXIT();
        }
 
        ksft_print_msg("Benchmark PID: %d\n", bm_pid);
@@ -769,7 +783,7 @@ int resctrl_val(const char * const *benchmark_cmd, struct resctrl_val_param *par
        value.sival_ptr = (void *)benchmark_cmd;
 
        /* Taskset benchmark to specified cpu */
-       ret = taskset_benchmark(bm_pid, param->cpu_no);
+       ret = taskset_benchmark(bm_pid, uparams->cpu, NULL);
        if (ret)
                goto out;
 
@@ -786,17 +800,17 @@ int resctrl_val(const char * const *benchmark_cmd, struct resctrl_val_param *par
                        goto out;
 
                initialize_mem_bw_resctrl(param->ctrlgrp, param->mongrp,
-                                         param->cpu_no, resctrl_val);
+                                         uparams->cpu, resctrl_val);
        } else if (!strncmp(resctrl_val, CMT_STR, sizeof(CMT_STR)))
                initialize_llc_occu_resctrl(param->ctrlgrp, param->mongrp,
-                                           param->cpu_no, resctrl_val);
+                                           uparams->cpu, resctrl_val);
 
        /* Parent waits for child to be ready. */
        close(pipefd[1]);
        while (pipe_message != 1) {
                if (read(pipefd[0], &pipe_message, sizeof(pipe_message)) <
                    sizeof(pipe_message)) {
-                       perror("# failed reading message from child process");
+                       ksft_perror("Failed reading message from child process");
                        close(pipefd[0]);
                        goto out;
                }
@@ -805,8 +819,8 @@ int resctrl_val(const char * const *benchmark_cmd, struct resctrl_val_param *par
 
        /* Signal child to start benchmark */
        if (sigqueue(bm_pid, SIGUSR1, value) == -1) {
-               perror("# sigqueue SIGUSR1 to child");
-               ret = errno;
+               ksft_perror("sigqueue SIGUSR1 to child");
+               ret = -1;
                goto out;
        }
 
@@ -815,7 +829,7 @@ int resctrl_val(const char * const *benchmark_cmd, struct resctrl_val_param *par
 
        /* Test runs until the callback setup() tells the test to stop. */
        while (1) {
-               ret = param->setup(param);
+               ret = param->setup(test, uparams, param);
                if (ret == END_OF_TESTS) {
                        ret = 0;
                        break;
@@ -825,12 +839,12 @@ int resctrl_val(const char * const *benchmark_cmd, struct resctrl_val_param *par
 
                if (!strncmp(resctrl_val, MBM_STR, sizeof(MBM_STR)) ||
                    !strncmp(resctrl_val, MBA_STR, sizeof(MBA_STR))) {
-                       ret = measure_vals(param, &bw_resc_start);
+                       ret = measure_vals(uparams, param, &bw_resc_start);
                        if (ret)
                                break;
                } else if (!strncmp(resctrl_val, CMT_STR, sizeof(CMT_STR))) {
                        sleep(1);
-                       ret = measure_cache_vals(param, bm_pid);
+                       ret = measure_llc_resctrl(param->filename, bm_pid);
                        if (ret)
                                break;
                }
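
Underneath these signature changes, resctrl_val()'s control flow is unchanged: the parent forks the benchmark child, waits on a pipe until the child reports ready, then releases it with SIGUSR1 and enters the measurement loop. A stripped-down sketch of that handshake, with most error handling elided (an illustration of the flow under those simplifying assumptions, not the full function):

    #include <signal.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static void start_benchmark(int sig, siginfo_t *info, void *uctx)
    {
            /* The real selftest runs or exec()s the benchmark here. */
    }

    int main(void)
    {
            union sigval sv = { .sival_int = 0 };
            int pipefd[2], msg = 0;
            pid_t pid;

            if (pipe(pipefd))
                    return EXIT_FAILURE;

            pid = fork();
            if (pid == 0) {
                    struct sigaction sa = { 0 };

                    sa.sa_sigaction = start_benchmark;
                    sa.sa_flags = SA_SIGINFO;
                    sigemptyset(&sa.sa_mask);
                    sigaction(SIGUSR1, &sa, NULL);

                    /* Child: tell the parent it is ready, then wait. */
                    close(pipefd[0]);
                    msg = 1;
                    write(pipefd[1], &msg, sizeof(msg));
                    close(pipefd[1]);
                    sigsuspend(&sa.sa_mask);        /* until SIGUSR1 */
                    _exit(0);
            }

            /* Parent: wait for readiness, then release the child. */
            close(pipefd[1]);
            while (msg != 1)
                    read(pipefd[0], &msg, sizeof(msg));
            close(pipefd[0]);
            sigqueue(pid, SIGUSR1, sv);
            /* ... the measurement loop would run here ... */
            wait(NULL);
            return EXIT_SUCCESS;
    }
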
index 5ebd436838769561bfd426a8b9c398347dc210e5..1cade75176eb1d01bf43113414e24a5d7b3190fa 100644 (file)
@@ -20,7 +20,7 @@ static int find_resctrl_mount(char *buffer)
 
        mounts = fopen("/proc/mounts", "r");
        if (!mounts) {
-               perror("/proc/mounts");
+               ksft_perror("/proc/mounts");
                return -ENXIO;
        }
        while (!feof(mounts)) {
@@ -56,7 +56,7 @@ static int find_resctrl_mount(char *buffer)
  * Mounts resctrl FS. Fails if resctrl FS is already mounted to avoid
  * pre-existing settings interfering with the test results.
  *
- * Return: 0 on success, non-zero on failure
+ * Return: 0 on success, < 0 on error.
  */
 int mount_resctrlfs(void)
 {
@@ -69,7 +69,7 @@ int mount_resctrlfs(void)
        ksft_print_msg("Mounting resctrl to \"%s\"\n", RESCTRL_PATH);
        ret = mount("resctrl", RESCTRL_PATH, "resctrl", 0, NULL);
        if (ret)
-               perror("# mount");
+               ksft_perror("mount");
 
        return ret;
 }
@@ -86,41 +86,67 @@ int umount_resctrlfs(void)
                return ret;
 
        if (umount(mountpoint)) {
-               perror("# Unable to umount resctrl");
+               ksft_perror("Unable to umount resctrl");
 
-               return errno;
+               return -1;
        }
 
        return 0;
 }
 
 /*
- * get_resource_id - Get socket number/l3 id for a specified CPU
+ * get_cache_level - Convert cache level from string to integer
+ * @cache_type:                Cache level as string
+ *
+ * Return: cache level as integer or -1 if @cache_type is invalid.
+ */
+static int get_cache_level(const char *cache_type)
+{
+       if (!strcmp(cache_type, "L3"))
+               return 3;
+       if (!strcmp(cache_type, "L2"))
+               return 2;
+
+       ksft_print_msg("Invalid cache level\n");
+       return -1;
+}
+
+static int get_resource_cache_level(const char *resource)
+{
+       /* "MB" use L3 (LLC) as resource */
+       if (!strcmp(resource, "MB"))
+               return 3;
+       return get_cache_level(resource);
+}
+
+/*
+ * get_domain_id - Get resctrl domain ID for a specified CPU
+ * @resource:  resource name
  * @cpu_no:    CPU number
- * @resource_id: Socket number or l3_id
+ * @domain_id: domain ID (cache ID; for MB, L3 cache ID)
  *
  * Return: >= 0 on success, < 0 on failure.
  */
-int get_resource_id(int cpu_no, int *resource_id)
+int get_domain_id(const char *resource, int cpu_no, int *domain_id)
 {
        char phys_pkg_path[1024];
+       int cache_num;
        FILE *fp;
 
-       if (get_vendor() == ARCH_AMD)
-               sprintf(phys_pkg_path, "%s%d/cache/index3/id",
-                       PHYS_ID_PATH, cpu_no);
-       else
-               sprintf(phys_pkg_path, "%s%d/topology/physical_package_id",
-                       PHYS_ID_PATH, cpu_no);
+       cache_num = get_resource_cache_level(resource);
+       if (cache_num < 0)
+               return cache_num;
+
+       sprintf(phys_pkg_path, "%s%d/cache/index%d/id", PHYS_ID_PATH, cpu_no, cache_num);
 
        fp = fopen(phys_pkg_path, "r");
        if (!fp) {
-               perror("Failed to open physical_package_id");
+               ksft_perror("Failed to open cache id file");
 
                return -1;
        }
-       if (fscanf(fp, "%d", resource_id) <= 0) {
-               perror("Could not get socket number or l3 id");
+       if (fscanf(fp, "%d", domain_id) <= 0) {
+               ksft_perror("Could not get domain ID");
                fclose(fp);
 
                return -1;
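
With this change every resource resolves its domain through sysfs cache IDs: L2 maps to .../cache/index2/id, while L3 and MB both map to .../cache/index3/id. For example, the path built for CPU 5 and the MB resource would look like the output below (the PHYS_ID_PATH value is an assumption for illustration; the selftest headers define the real prefix):

    #include <stdio.h>

    /* Assumed sysfs per-CPU prefix, standing in for PHYS_ID_PATH. */
    #define PHYS_ID_PATH "/sys/devices/system/cpu/cpu"

    int main(void)
    {
            char path[256];

            /* MB uses the L3 (LLC) cache instance, i.e. index3. */
            snprintf(path, sizeof(path), "%s%d/cache/index%d/id",
                     PHYS_ID_PATH, 5, 3);
            puts(path); /* /sys/devices/system/cpu/cpu5/cache/index3/id */
            return 0;
    }
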
@@ -138,31 +164,26 @@ int get_resource_id(int cpu_no, int *resource_id)
  *
  * Return: = 0 on success, < 0 on failure.
  */
-int get_cache_size(int cpu_no, char *cache_type, unsigned long *cache_size)
+int get_cache_size(int cpu_no, const char *cache_type, unsigned long *cache_size)
 {
        char cache_path[1024], cache_str[64];
        int length, i, cache_num;
        FILE *fp;
 
-       if (!strcmp(cache_type, "L3")) {
-               cache_num = 3;
-       } else if (!strcmp(cache_type, "L2")) {
-               cache_num = 2;
-       } else {
-               perror("Invalid cache level");
-               return -1;
-       }
+       cache_num = get_cache_level(cache_type);
+       if (cache_num < 0)
+               return cache_num;
 
        sprintf(cache_path, "/sys/bus/cpu/devices/cpu%d/cache/index%d/size",
                cpu_no, cache_num);
        fp = fopen(cache_path, "r");
        if (!fp) {
-               perror("Failed to open cache size");
+               ksft_perror("Failed to open cache size");
 
                return -1;
        }
        if (fscanf(fp, "%s", cache_str) <= 0) {
-               perror("Could not get cache_size");
+               ksft_perror("Could not get cache_size");
                fclose(fp);
 
                return -1;
@@ -196,30 +217,29 @@ int get_cache_size(int cpu_no, char *cache_type, unsigned long *cache_size)
 #define CORE_SIBLINGS_PATH     "/sys/bus/cpu/devices/cpu"
 
 /*
- * get_cbm_mask - Get cbm mask for given cache
- * @cache_type:        Cache level L2/L3
- * @cbm_mask:  cbm_mask returned as a string
+ * get_bit_mask - Get bit mask from given file
+ * @filename:  File containing the mask
+ * @mask:      The bit mask returned as unsigned long
  *
  * Return: = 0 on success, < 0 on failure.
  */
-int get_cbm_mask(char *cache_type, char *cbm_mask)
+static int get_bit_mask(const char *filename, unsigned long *mask)
 {
-       char cbm_mask_path[1024];
        FILE *fp;
 
-       if (!cbm_mask)
+       if (!filename || !mask)
                return -1;
 
-       sprintf(cbm_mask_path, "%s/%s/cbm_mask", INFO_PATH, cache_type);
-
-       fp = fopen(cbm_mask_path, "r");
+       fp = fopen(filename, "r");
        if (!fp) {
-               perror("Failed to open cache level");
-
+               ksft_print_msg("Failed to open bit mask file '%s': %s\n",
+                              filename, strerror(errno));
                return -1;
        }
-       if (fscanf(fp, "%s", cbm_mask) <= 0) {
-               perror("Could not get max cbm_mask");
+
+       if (fscanf(fp, "%lx", mask) <= 0) {
+               ksft_print_msg("Could not read bit mask file '%s': %s\n",
+                              filename, strerror(errno));
                fclose(fp);
 
                return -1;
@@ -230,64 +250,200 @@ int get_cbm_mask(char *cache_type, char *cbm_mask)
 }
 
 /*
- * get_core_sibling - Get sibling core id from the same socket for given CPU
- * @cpu_no:    CPU number
+ * resource_info_unsigned_get - Read an unsigned value from
+ * /sys/fs/resctrl/info/@resource/@filename
+ * @resource:  Resource name that matches directory name in
+ *             /sys/fs/resctrl/info
+ * @filename:  File in /sys/fs/resctrl/info/@resource
+ * @val:       Contains read value on success.
  *
- * Return:     > 0 on success, < 0 on failure.
+ * Return: = 0 on success, < 0 on failure. On success the read
+ * value is saved into @val.
  */
-int get_core_sibling(int cpu_no)
+int resource_info_unsigned_get(const char *resource, const char *filename,
+                              unsigned int *val)
 {
-       char core_siblings_path[1024], cpu_list_str[64];
-       int sibling_cpu_no = -1;
+       char file_path[PATH_MAX];
        FILE *fp;
 
-       sprintf(core_siblings_path, "%s%d/topology/core_siblings_list",
-               CORE_SIBLINGS_PATH, cpu_no);
+       snprintf(file_path, sizeof(file_path), "%s/%s/%s", INFO_PATH, resource,
+                filename);
 
-       fp = fopen(core_siblings_path, "r");
+       fp = fopen(file_path, "r");
        if (!fp) {
-               perror("Failed to open core siblings path");
-
+               ksft_print_msg("Error opening %s: %m\n", file_path);
                return -1;
        }
-       if (fscanf(fp, "%s", cpu_list_str) <= 0) {
-               perror("Could not get core_siblings list");
-               fclose(fp);
 
+       if (fscanf(fp, "%u", val) <= 0) {
+               ksft_print_msg("Could not get contents of %s: %m\n", file_path);
+               fclose(fp);
                return -1;
        }
+
        fclose(fp);
+       return 0;
+}
 
-       char *token = strtok(cpu_list_str, "-,");
+/*
+ * create_bit_mask - Create bit mask from start, len pair
+ * @start:     LSB of the mask
+ * @len:       Number of bits in the mask
+ */
+unsigned long create_bit_mask(unsigned int start, unsigned int len)
+{
+       return ((1UL << len) - 1UL) << start;
+}
 
-       while (token) {
-               sibling_cpu_no = atoi(token);
-               /* Skipping core 0 as we don't want to run test on core 0 */
-               if (sibling_cpu_no != 0 && sibling_cpu_no != cpu_no)
-                       break;
-               token = strtok(NULL, "-,");
+/*
+ * count_contiguous_bits - Find the longest contiguous run of set bits in a bit mask
+ * @val:       A bit mask
+ * @start:     The location of the least-significant bit of the longest run
+ *
+ * Return:     The length of the longest contiguous run of set bits
+ */
+unsigned int count_contiguous_bits(unsigned long val, unsigned int *start)
+{
+       unsigned long last_val;
+       unsigned int count = 0;
+
+       while (val) {
+               last_val = val;
+               val &= (val >> 1);
+               count++;
+       }
+
+       if (start) {
+               if (count)
+                       *start = ffsl(last_val) - 1;
+               else
+                       *start = 0;
        }
 
-       return sibling_cpu_no;
+       return count;
+}
+
+/*
+ * get_full_cbm - Get full Cache Bit Mask (CBM)
+ * @cache_type:        Cache type as "L2" or "L3"
+ * @mask:      Full cache bit mask representing the maximal portion of cache
+ *             available for allocation, returned as unsigned long.
+ *
+ * Return: = 0 on success, < 0 on failure.
+ */
+int get_full_cbm(const char *cache_type, unsigned long *mask)
+{
+       char cbm_path[PATH_MAX];
+       int ret;
+
+       if (!cache_type)
+               return -1;
+
+       snprintf(cbm_path, sizeof(cbm_path), "%s/%s/cbm_mask",
+                INFO_PATH, cache_type);
+
+       ret = get_bit_mask(cbm_path, mask);
+       if (ret || !*mask)
+               return -1;
+
+       return 0;
+}
+
+/*
+ * get_shareable_mask - Get shareable mask from shareable_bits
+ * @cache_type:                Cache type as "L2" or "L3"
+ * @shareable_mask:    Shareable mask returned as unsigned long
+ *
+ * Return: = 0 on success, < 0 on failure.
+ */
+static int get_shareable_mask(const char *cache_type, unsigned long *shareable_mask)
+{
+       char mask_path[PATH_MAX];
+
+       if (!cache_type)
+               return -1;
+
+       snprintf(mask_path, sizeof(mask_path), "%s/%s/shareable_bits",
+                INFO_PATH, cache_type);
+
+       return get_bit_mask(mask_path, shareable_mask);
+}
+
+/*
+ * get_mask_no_shareable - Get Cache Bit Mask (CBM) without shareable bits
+ * @cache_type:                Cache type as "L2" or "L3"
+ * @mask:              The largest exclusive portion of the cache out of the
+ *                     full CBM, returned as unsigned long
+ *
+ * Parts of a cache may be shared with other devices such as GPU. This function
+ * calculates the largest exclusive portion of the cache where no other devices
+ * besides CPU have access to the cache portion.
+ *
+ * Return: = 0 on success, < 0 on failure.
+ */
+int get_mask_no_shareable(const char *cache_type, unsigned long *mask)
+{
+       unsigned long full_mask, shareable_mask;
+       unsigned int start, len;
+
+       if (get_full_cbm(cache_type, &full_mask) < 0)
+               return -1;
+       if (get_shareable_mask(cache_type, &shareable_mask) < 0)
+               return -1;
+
+       len = count_contiguous_bits(full_mask & ~shareable_mask, &start);
+       if (!len)
+               return -1;
+
+       *mask = create_bit_mask(start, len);
+
+       return 0;
 }
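
The three mask helpers compose: get_full_cbm() reads cbm_mask, get_shareable_mask() reads shareable_bits, and get_mask_no_shareable() keeps only the longest contiguous run of non-shareable bits. The bit math can be exercised standalone; a worked example using an illustrative 12-bit CBM whose top two bits are shareable (the helpers are condensed from the functions above):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <strings.h>        /* ffsl() */

    static unsigned long create_bit_mask(unsigned int start, unsigned int len)
    {
            return ((1UL << len) - 1UL) << start;
    }

    static unsigned int count_contiguous_bits(unsigned long val,
                                              unsigned int *start)
    {
            unsigned long last_val = 0;
            unsigned int count = 0;

            /* Each AND with the shifted value trims every run by one
             * bit; the iteration count is the longest run's length. */
            while (val) {
                    last_val = val;
                    val &= (val >> 1);
                    count++;
            }
            if (start)
                    *start = count ? ffsl(last_val) - 1 : 0;
            return count;
    }

    int main(void)
    {
            /* Full CBM 0xfff with bits 10-11 marked shareable. */
            unsigned long full = 0xfff, shareable = 0xc00;
            unsigned int start, len;

            len = count_contiguous_bits(full & ~shareable, &start);
            printf("exclusive CBM: 0x%lx\n", create_bit_mask(start, len));
            /* prints: exclusive CBM: 0x3ff (bits 0-9) */
            return 0;
    }
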
 
 /*
  * taskset_benchmark - Taskset PID (i.e. benchmark) to a specified cpu
- * @bm_pid:    PID that should be binded
- * @cpu_no:    CPU number at which the PID would be binded
+ * @bm_pid:            PID that should be bound
+ * @cpu_no:            CPU number to which the PID will be bound
+ * @old_affinity:      When not NULL, set to old CPU affinity
  *
- * Return: 0 on success, non-zero on failure
+ * Return: 0 on success, < 0 on error.
  */
-int taskset_benchmark(pid_t bm_pid, int cpu_no)
+int taskset_benchmark(pid_t bm_pid, int cpu_no, cpu_set_t *old_affinity)
 {
        cpu_set_t my_set;
 
+       if (old_affinity) {
+               CPU_ZERO(old_affinity);
+               if (sched_getaffinity(bm_pid, sizeof(*old_affinity),
+                                     old_affinity)) {
+                       ksft_perror("Unable to read CPU affinity");
+                       return -1;
+               }
+       }
+
        CPU_ZERO(&my_set);
        CPU_SET(cpu_no, &my_set);
 
        if (sched_setaffinity(bm_pid, sizeof(cpu_set_t), &my_set)) {
-               perror("Unable to taskset benchmark");
+               ksft_perror("Unable to taskset benchmark");
+
+               return -1;
+       }
+
+       return 0;
+}
 
+/*
+ * taskset_restore - Taskset PID to the earlier CPU affinity
+ * @bm_pid:            PID that should be reset
+ * @old_affinity:      The old CPU affinity to restore
+ *
+ * Return: 0 on success, < 0 on error.
+ */
+int taskset_restore(pid_t bm_pid, cpu_set_t *old_affinity)
+{
+       if (sched_setaffinity(bm_pid, sizeof(*old_affinity), old_affinity)) {
+               ksft_perror("Unable to restore CPU affinity");
                return -1;
        }
 
@@ -300,7 +456,7 @@ int taskset_benchmark(pid_t bm_pid, int cpu_no)
  * @grp:       Full path and name of the group
  * @parent_grp:        Full path and name of the parent group
  *
- * Return: 0 on success, non-zero on failure
+ * Return: 0 on success, < 0 on error.
  */
 static int create_grp(const char *grp_name, char *grp, const char *parent_grp)
 {
@@ -325,7 +481,7 @@ static int create_grp(const char *grp_name, char *grp, const char *parent_grp)
                }
                closedir(dp);
        } else {
-               perror("Unable to open resctrl for group");
+               ksft_perror("Unable to open resctrl for group");
 
                return -1;
        }
@@ -333,7 +489,7 @@ static int create_grp(const char *grp_name, char *grp, const char *parent_grp)
        /* Requested grp doesn't exist, hence create it */
        if (found_grp == 0) {
                if (mkdir(grp, 0) == -1) {
-                       perror("Unable to create group");
+                       ksft_perror("Unable to create group");
 
                        return -1;
                }
@@ -348,12 +504,12 @@ static int write_pid_to_tasks(char *tasks, pid_t pid)
 
        fp = fopen(tasks, "w");
        if (!fp) {
-               perror("Failed to open tasks file");
+               ksft_perror("Failed to open tasks file");
 
                return -1;
        }
        if (fprintf(fp, "%d\n", pid) < 0) {
-               perror("Failed to wr pid to tasks file");
+               ksft_print_msg("Failed to write pid to tasks file\n");
                fclose(fp);
 
                return -1;
@@ -376,7 +532,7 @@ static int write_pid_to_tasks(char *tasks, pid_t pid)
  * pid is not written, this means that pid is in con_mon grp and hence
  * should consult con_mon grp's mon_data directory for results.
  *
- * Return: 0 on success, non-zero on failure
+ * Return: 0 on success, < 0 on error.
  */
 int write_bm_pid_to_resctrl(pid_t bm_pid, char *ctrlgrp, char *mongrp,
                            char *resctrl_val)
@@ -420,7 +576,7 @@ int write_bm_pid_to_resctrl(pid_t bm_pid, char *ctrlgrp, char *mongrp,
 out:
        ksft_print_msg("Writing benchmark parameters to resctrl FS\n");
        if (ret)
-               perror("# writing to resctrlfs");
+               ksft_print_msg("Failed writing to resctrlfs\n");
 
        return ret;
 }
@@ -430,23 +586,17 @@ out:
  * @ctrlgrp:           Name of the con_mon grp
  * @schemata:          Schemata that should be updated to
 * @cpu_no:            CPU number that the benchmark PID is bound to
- * @resctrl_val:       Resctrl feature (Eg: mbm, mba.. etc)
+ * @resource:          Resctrl resource (Eg: MB, L3, L2, etc.)
  *
- * Update schemata of a con_mon grp *only* if requested resctrl feature is
+ * Update schemata of a con_mon grp *only* if requested resctrl resource is
  * allocation type
  *
- * Return: 0 on success, non-zero on failure
+ * Return: 0 on success, < 0 on error.
  */
-int write_schemata(char *ctrlgrp, char *schemata, int cpu_no, char *resctrl_val)
+int write_schemata(char *ctrlgrp, char *schemata, int cpu_no, const char *resource)
 {
        char controlgroup[1024], reason[128], schema[1024] = {};
-       int resource_id, fd, schema_len = -1, ret = 0;
-
-       if (strncmp(resctrl_val, MBA_STR, sizeof(MBA_STR)) &&
-           strncmp(resctrl_val, MBM_STR, sizeof(MBM_STR)) &&
-           strncmp(resctrl_val, CAT_STR, sizeof(CAT_STR)) &&
-           strncmp(resctrl_val, CMT_STR, sizeof(CMT_STR)))
-               return -ENOENT;
+       int domain_id, fd, schema_len, ret = 0;
 
        if (!schemata) {
                ksft_print_msg("Skipping empty schemata update\n");
@@ -454,8 +604,8 @@ int write_schemata(char *ctrlgrp, char *schemata, int cpu_no, char *resctrl_val)
                return -1;
        }
 
-       if (get_resource_id(cpu_no, &resource_id) < 0) {
-               sprintf(reason, "Failed to get resource id");
+       if (get_domain_id(resource, cpu_no, &domain_id) < 0) {
+               sprintf(reason, "Failed to get domain ID");
                ret = -1;
 
                goto out;
@@ -466,14 +616,8 @@ int write_schemata(char *ctrlgrp, char *schemata, int cpu_no, char *resctrl_val)
        else
                sprintf(controlgroup, "%s/schemata", RESCTRL_PATH);
 
-       if (!strncmp(resctrl_val, CAT_STR, sizeof(CAT_STR)) ||
-           !strncmp(resctrl_val, CMT_STR, sizeof(CMT_STR)))
-               schema_len = snprintf(schema, sizeof(schema), "%s%d%c%s\n",
-                                     "L3:", resource_id, '=', schemata);
-       if (!strncmp(resctrl_val, MBA_STR, sizeof(MBA_STR)) ||
-           !strncmp(resctrl_val, MBM_STR, sizeof(MBM_STR)))
-               schema_len = snprintf(schema, sizeof(schema), "%s%d%c%s\n",
-                                     "MB:", resource_id, '=', schemata);
+       schema_len = snprintf(schema, sizeof(schema), "%s:%d=%s\n",
+                             resource, domain_id, schemata);
        if (schema_len < 0 || schema_len >= sizeof(schema)) {
                snprintf(reason, sizeof(reason),
                         "snprintf() failed with return value : %d", schema_len);
@@ -564,20 +708,16 @@ char *fgrep(FILE *inf, const char *str)
 }
 
 /*
- * validate_resctrl_feature_request - Check if requested feature is valid.
- * @resource:  Required resource (e.g., MB, L3, L2, L3_MON, etc.)
- * @feature:   Required monitor feature (in mon_features file). Can only be
- *             set for L3_MON. Must be NULL for all other resources.
+ * resctrl_resource_exists - Check if a resource is supported.
+ * @resource:  Resctrl resource (e.g., MB, L3, L2, L3_MON, etc.)
  *
- * Return: True if the resource/feature is supported, else false. False is
+ * Return: True if the resource is supported, else false. False is
  *         also returned if resctrl FS is not mounted.
  */
-bool validate_resctrl_feature_request(const char *resource, const char *feature)
+bool resctrl_resource_exists(const char *resource)
 {
        char res_path[PATH_MAX];
        struct stat statbuf;
-       char *res;
-       FILE *inf;
        int ret;
 
        if (!resource)
@@ -592,8 +732,25 @@ bool validate_resctrl_feature_request(const char *resource, const char *feature)
        if (stat(res_path, &statbuf))
                return false;
 
-       if (!feature)
-               return true;
+       return true;
+}
+
+/*
+ * resctrl_mon_feature_exists - Check if requested monitoring feature is valid.
+ * @resource:  Resource that uses the mon_features file. Currently only L3_MON
+ *             is valid.
+ * @feature:   Required monitor feature (in mon_features file).
+ *
+ * Return: True if the feature is supported, else false.
+ */
+bool resctrl_mon_feature_exists(const char *resource, const char *feature)
+{
+       char res_path[PATH_MAX];
+       char *res;
+       FILE *inf;
+
+       if (!feature || !resource)
+               return false;
 
        snprintf(res_path, sizeof(res_path), "%s/%s/mon_features", INFO_PATH, resource);
        inf = fopen(res_path, "r");
@@ -607,6 +764,36 @@ bool validate_resctrl_feature_request(const char *resource, const char *feature)
        return !!res;
 }
 
+/*
+ * resource_info_file_exists - Check if a file is present inside
+ * /sys/fs/resctrl/info/@resource.
+ * @resource:  Required resource (Eg: MB, L3, L2, etc.)
+ * @file:      Required file.
+ *
+ * Return: True if the /sys/fs/resctrl/info/@resource/@file exists, else false.
+ */
+bool resource_info_file_exists(const char *resource, const char *file)
+{
+       char res_path[PATH_MAX];
+       struct stat statbuf;
+
+       if (!file || !resource)
+               return false;
+
+       snprintf(res_path, sizeof(res_path), "%s/%s/%s", INFO_PATH, resource,
+                file);
+
+       if (stat(res_path, &statbuf))
+               return false;
+
+       return true;
+}
+
+bool test_resource_feature_check(const struct resctrl_test *test)
+{
+       return resctrl_resource_exists(test->resource);
+}
+
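
After the split, callers ask one question per helper: resctrl_resource_exists() checks for the info directory, resctrl_mon_feature_exists() greps mon_features. The sketch below re-implements both in simplified form to show how a CMT-style capability check composes them (simplified stand-ins, not the selftest's exact code):

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>

    #define INFO_PATH "/sys/fs/resctrl/info"

    static bool resource_exists(const char *resource)
    {
            char path[512];
            struct stat sb;

            snprintf(path, sizeof(path), "%s/%s", INFO_PATH, resource);
            return !stat(path, &sb);
    }

    static bool mon_feature_exists(const char *resource, const char *feature)
    {
            char path[512], line[256];
            bool found = false;
            FILE *fp;

            snprintf(path, sizeof(path), "%s/%s/mon_features",
                     INFO_PATH, resource);
            fp = fopen(path, "r");
            if (!fp)
                    return false;
            while (fgets(line, sizeof(line), fp))
                    if (!strncmp(line, feature, strlen(feature)))
                            found = true;
            fclose(fp);
            return found;
    }

    int main(void)
    {
            /* CMT needs L3 monitoring plus the llc_occupancy event. */
            printf("CMT supported: %d\n",
                   resource_exists("L3_MON") &&
                   mon_feature_exists("L3_MON", "llc_occupancy"));
            return 0;
    }
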
 int filter_dmesg(void)
 {
        char line[1024];
@@ -617,7 +804,7 @@ int filter_dmesg(void)
 
        ret = pipe(pipefds);
        if (ret) {
-               perror("pipe");
+               ksft_perror("pipe");
                return ret;
        }
        fflush(stdout);
@@ -626,13 +813,13 @@ int filter_dmesg(void)
                close(pipefds[0]);
                dup2(pipefds[1], STDOUT_FILENO);
                execlp("dmesg", "dmesg", NULL);
-               perror("executing dmesg");
+               ksft_perror("Executing dmesg");
                exit(1);
        }
        close(pipefds[1]);
        fp = fdopen(pipefds[0], "r");
        if (!fp) {
-               perror("fdopen(pipe)");
+               ksft_perror("fdopen(pipe)");
                kill(pid, SIGTERM);
 
                return -1;
index 88754296196870a5d0ef3afb52373c8d40cbc598..2348d2c20d0a1aaf3a05a1c7005983f442708b3c 100644 (file)
@@ -24,6 +24,11 @@ bool rseq_validate_cpu_id(void)
 {
        return rseq_mm_cid_available();
 }
+static
+bool rseq_use_cpu_index(void)
+{
+       return false;   /* Use mm_cid */
+}
 #else
 # define RSEQ_PERCPU   RSEQ_PERCPU_CPU_ID
 static
@@ -36,6 +41,11 @@ bool rseq_validate_cpu_id(void)
 {
        return rseq_current_cpu_raw() >= 0;
 }
+static
+bool rseq_use_cpu_index(void)
+{
+       return true;    /* Use cpu_id as index. */
+}
 #endif
 
 struct percpu_lock_entry {
@@ -274,7 +284,7 @@ void test_percpu_list(void)
        /* Generate list entries for every usable cpu. */
        sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus);
        for (i = 0; i < CPU_SETSIZE; i++) {
-               if (!CPU_ISSET(i, &allowed_cpus))
+               if (rseq_use_cpu_index() && !CPU_ISSET(i, &allowed_cpus))
                        continue;
                for (j = 1; j <= 100; j++) {
                        struct percpu_list_node *node;
@@ -299,7 +309,7 @@ void test_percpu_list(void)
        for (i = 0; i < CPU_SETSIZE; i++) {
                struct percpu_list_node *node;
 
-               if (!CPU_ISSET(i, &allowed_cpus))
+               if (rseq_use_cpu_index() && !CPU_ISSET(i, &allowed_cpus))
                        continue;
 
                while ((node = __percpu_list_pop(&list, i))) {
index 20403d58345cd523186b9423750ea7ad669cdd96..2f37961240caa7cc43f142fac32fd7f9c9c211d4 100644 (file)
@@ -288,6 +288,11 @@ bool rseq_validate_cpu_id(void)
 {
        return rseq_mm_cid_available();
 }
+static
+bool rseq_use_cpu_index(void)
+{
+       return false;   /* Use mm_cid */
+}
 # ifdef TEST_MEMBARRIER
 /*
  * Membarrier does not currently support targeting a mm_cid, so
@@ -312,6 +317,11 @@ bool rseq_validate_cpu_id(void)
 {
        return rseq_current_cpu_raw() >= 0;
 }
+static
+bool rseq_use_cpu_index(void)
+{
+       return true;    /* Use cpu_id as index. */
+}
 # ifdef TEST_MEMBARRIER
 static
 int rseq_membarrier_expedited(int cpu)
@@ -715,7 +725,7 @@ void test_percpu_list(void)
        /* Generate list entries for every usable cpu. */
        sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus);
        for (i = 0; i < CPU_SETSIZE; i++) {
-               if (!CPU_ISSET(i, &allowed_cpus))
+               if (rseq_use_cpu_index() && !CPU_ISSET(i, &allowed_cpus))
                        continue;
                for (j = 1; j <= 100; j++) {
                        struct percpu_list_node *node;
@@ -752,7 +762,7 @@ void test_percpu_list(void)
        for (i = 0; i < CPU_SETSIZE; i++) {
                struct percpu_list_node *node;
 
-               if (!CPU_ISSET(i, &allowed_cpus))
+               if (rseq_use_cpu_index() && !CPU_ISSET(i, &allowed_cpus))
                        continue;
 
                while ((node = __percpu_list_pop(&list, i))) {
@@ -902,7 +912,7 @@ void test_percpu_buffer(void)
        /* Generate list entries for every usable cpu. */
        sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus);
        for (i = 0; i < CPU_SETSIZE; i++) {
-               if (!CPU_ISSET(i, &allowed_cpus))
+               if (rseq_use_cpu_index() && !CPU_ISSET(i, &allowed_cpus))
                        continue;
                /* Worst case is every item in the same CPU. */
                buffer.c[i].array =
@@ -952,7 +962,7 @@ void test_percpu_buffer(void)
        for (i = 0; i < CPU_SETSIZE; i++) {
                struct percpu_buffer_node *node;
 
-               if (!CPU_ISSET(i, &allowed_cpus))
+               if (rseq_use_cpu_index() && !CPU_ISSET(i, &allowed_cpus))
                        continue;
 
                while ((node = __percpu_buffer_pop(&buffer, i))) {
@@ -1113,7 +1123,7 @@ void test_percpu_memcpy_buffer(void)
        /* Generate list entries for every usable cpu. */
        sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus);
        for (i = 0; i < CPU_SETSIZE; i++) {
-               if (!CPU_ISSET(i, &allowed_cpus))
+               if (rseq_use_cpu_index() && !CPU_ISSET(i, &allowed_cpus))
                        continue;
                /* Worst case is every item in the same CPU. */
                buffer.c[i].array =
@@ -1160,7 +1170,7 @@ void test_percpu_memcpy_buffer(void)
        for (i = 0; i < CPU_SETSIZE; i++) {
                struct percpu_memcpy_buffer_node item;
 
-               if (!CPU_ISSET(i, &allowed_cpus))
+               if (rseq_use_cpu_index() && !CPU_ISSET(i, &allowed_cpus))
                        continue;
 
                while (__percpu_memcpy_buffer_pop(&buffer, &item, i)) {
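
rseq_use_cpu_index() captures the difference between the two ID schemes: raw cpu_id values index the per-CPU slots directly, so CPUs outside the affinity mask must be skipped, whereas mm_cid concurrency IDs are already compacted by the kernel into a dense [0, nr_threads) range, so every slot may be used and the CPU_ISSET() filter must not apply. The loop pattern in miniature:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* false when indexing by mm_cid: concurrency IDs are dense, so no
     * slot maps to an unusable CPU and nothing needs to be skipped.
     * true when indexing by raw cpu_id. */
    static bool use_cpu_index = true;

    int main(void)
    {
            cpu_set_t allowed_cpus;
            int populated = 0;

            sched_getaffinity(0, sizeof(allowed_cpus), &allowed_cpus);
            for (int i = 0; i < CPU_SETSIZE; i++) {
                    if (use_cpu_index && !CPU_ISSET(i, &allowed_cpus))
                            continue;
                    populated++;    /* ... populate per-index slot i ... */
            }
            printf("slots populated: %d\n", populated);
            return 0;
    }
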
diff --git a/tools/testing/selftests/rust/Makefile b/tools/testing/selftests/rust/Makefile
new file mode 100644 (file)
index 0000000..fce1584
--- /dev/null
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0
+TEST_PROGS += test_probe_samples.sh
+
+include ../lib.mk
diff --git a/tools/testing/selftests/rust/config b/tools/testing/selftests/rust/config
new file mode 100644 (file)
index 0000000..b4002ac
--- /dev/null
@@ -0,0 +1,5 @@
+CONFIG_RUST=y
+CONFIG_SAMPLES=y
+CONFIG_SAMPLES_RUST=y
+CONFIG_SAMPLE_RUST_MINIMAL=m
+CONFIG_SAMPLE_RUST_PRINT=m
\ No newline at end of file
diff --git a/tools/testing/selftests/rust/test_probe_samples.sh b/tools/testing/selftests/rust/test_probe_samples.sh
new file mode 100755 (executable)
index 0000000..ad0397e
--- /dev/null
@@ -0,0 +1,41 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (c) 2023 Collabora Ltd
+#
+# This script tests whether the rust sample modules can
+# be added and removed correctly.
+#
+DIR="$(dirname "$(readlink -f "$0")")"
+
+KTAP_HELPERS="${DIR}/../kselftest/ktap_helpers.sh"
+if [ -e "$KTAP_HELPERS" ]; then
+    source "$KTAP_HELPERS"
+else
+    echo "$KTAP_HELPERS file not found [SKIP]"
+    exit 4
+fi
+
+rust_sample_modules=("rust_minimal" "rust_print")
+
+ktap_print_header
+
+for sample in "${rust_sample_modules[@]}"; do
+    if ! /sbin/modprobe -n -q "$sample"; then
+        ktap_skip_all "module $sample is not found in /lib/modules/$(uname -r)"
+        exit "$KSFT_SKIP"
+    fi
+done
+
+ktap_set_plan "${#rust_sample_modules[@]}"
+
+for sample in "${rust_sample_modules[@]}"; do
+    if /sbin/modprobe -q "$sample"; then
+        /sbin/modprobe -q -r "$sample"
+        ktap_test_pass "$sample"
+    else
+        ktap_test_fail "$sample"
+    fi
+done
+
+ktap_finished
index 7ba057154343fbdbcef82ca95c3cedfd561866b7..62fba7356af2ef0e8e824209f668ba6e8dce6a62 100644 (file)
@@ -276,7 +276,7 @@ int main(int argc, char *argv[])
        if (setpgid(0, 0) != 0)
                handle_error("process group");
 
-       printf("\n## Create a thread/process/process group hiearchy\n");
+       printf("\n## Create a thread/process/process group hierarchy\n");
        create_processes(num_processes, num_threads, procs);
        need_cleanup = 1;
        disp_processes(num_processes, procs);
index 5b5c9d558dee07bc1f7afd7df280e1189858451e..97b86980b768f4fa09da58f16d71ba42f42d2c8d 100644 (file)
@@ -38,10 +38,10 @@ unsigned long long timing(clockid_t clk_id, unsigned long long samples)
        i *= 1000000000ULL;
        i += finish.tv_nsec - start.tv_nsec;
 
-       printf("%lu.%09lu - %lu.%09lu = %llu (%.1fs)\n",
-               finish.tv_sec, finish.tv_nsec,
-               start.tv_sec, start.tv_nsec,
-               i, (double)i / 1000000000.0);
+       ksft_print_msg("%lu.%09lu - %lu.%09lu = %llu (%.1fs)\n",
+                      finish.tv_sec, finish.tv_nsec,
+                      start.tv_sec, start.tv_nsec,
+                      i, (double)i / 1000000000.0);
 
        return i;
 }
@@ -53,7 +53,7 @@ unsigned long long calibrate(void)
        pid_t pid, ret;
        int seconds = 15;
 
-       printf("Calibrating sample size for %d seconds worth of syscalls ...\n", seconds);
+       ksft_print_msg("Calibrating sample size for %d seconds worth of syscalls ...\n", seconds);
 
        samples = 0;
        pid = getpid();
@@ -98,24 +98,36 @@ bool le(int i_one, int i_two)
 }
 
 long compare(const char *name_one, const char *name_eval, const char *name_two,
-            unsigned long long one, bool (*eval)(int, int), unsigned long long two)
+            unsigned long long one, bool (*eval)(int, int), unsigned long long two,
+            bool skip)
 {
        bool good;
 
-       printf("\t%s %s %s (%lld %s %lld): ", name_one, name_eval, name_two,
-              (long long)one, name_eval, (long long)two);
+       if (skip) {
+               ksft_test_result_skip("%s %s %s\n", name_one, name_eval,
+                                     name_two);
+               return 0;
+       }
+
+       ksft_print_msg("\t%s %s %s (%lld %s %lld): ", name_one, name_eval, name_two,
+                      (long long)one, name_eval, (long long)two);
        if (one > INT_MAX) {
-               printf("Miscalculation! Measurement went negative: %lld\n", (long long)one);
-               return 1;
+               ksft_print_msg("Miscalculation! Measurement went negative: %lld\n", (long long)one);
+               good = false;
+               goto out;
        }
        if (two > INT_MAX) {
-               printf("Miscalculation! Measurement went negative: %lld\n", (long long)two);
-               return 1;
+               ksft_print_msg("Miscalculation! Measurement went negative: %lld\n", (long long)two);
+               good = false;
+               goto out;
        }
 
        good = eval(one, two);
        printf("%s\n", good ? "✔️" : "❌");
 
+out:
+       ksft_test_result(good, "%s %s %s\n", name_one, name_eval, name_two);
+
        return good ? 0 : 1;
 }
 
@@ -142,15 +154,22 @@ int main(int argc, char *argv[])
        unsigned long long samples, calc;
        unsigned long long native, filter1, filter2, bitmap1, bitmap2;
        unsigned long long entry, per_filter1, per_filter2;
+       bool skip = false;
 
        setbuf(stdout, NULL);
 
-       printf("Running on:\n");
+       ksft_print_header();
+       ksft_set_plan(7);
+
+       ksft_print_msg("Running on:\n");
+       ksft_print_msg("");
        system("uname -a");
 
-       printf("Current BPF sysctl settings:\n");
+       ksft_print_msg("Current BPF sysctl settings:\n");
        /* Avoid using "sysctl" which may not be installed. */
+       ksft_print_msg("");
        system("grep -H . /proc/sys/net/core/bpf_jit_enable");
+       ksft_print_msg("");
        system("grep -H . /proc/sys/net/core/bpf_jit_harden");
 
        if (argc > 1)
@@ -158,11 +177,11 @@ int main(int argc, char *argv[])
        else
                samples = calibrate();
 
-       printf("Benchmarking %llu syscalls...\n", samples);
+       ksft_print_msg("Benchmarking %llu syscalls...\n", samples);
 
        /* Native call */
        native = timing(CLOCK_PROCESS_CPUTIME_ID, samples) / samples;
-       printf("getpid native: %llu ns\n", native);
+       ksft_print_msg("getpid native: %llu ns\n", native);
 
        ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
        assert(ret == 0);
@@ -172,35 +191,37 @@ int main(int argc, char *argv[])
        assert(ret == 0);
 
        bitmap1 = timing(CLOCK_PROCESS_CPUTIME_ID, samples) / samples;
-       printf("getpid RET_ALLOW 1 filter (bitmap): %llu ns\n", bitmap1);
+       ksft_print_msg("getpid RET_ALLOW 1 filter (bitmap): %llu ns\n", bitmap1);
 
        /* Second filter resulting in a bitmap */
        ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &bitmap_prog);
        assert(ret == 0);
 
        bitmap2 = timing(CLOCK_PROCESS_CPUTIME_ID, samples) / samples;
-       printf("getpid RET_ALLOW 2 filters (bitmap): %llu ns\n", bitmap2);
+       ksft_print_msg("getpid RET_ALLOW 2 filters (bitmap): %llu ns\n", bitmap2);
 
        /* Third filter, can no longer be converted to bitmap */
        ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
        assert(ret == 0);
 
        filter1 = timing(CLOCK_PROCESS_CPUTIME_ID, samples) / samples;
-       printf("getpid RET_ALLOW 3 filters (full): %llu ns\n", filter1);
+       ksft_print_msg("getpid RET_ALLOW 3 filters (full): %llu ns\n", filter1);
 
        /* Fourth filter, can not be converted to bitmap because of filter 3 */
        ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &bitmap_prog);
        assert(ret == 0);
 
        filter2 = timing(CLOCK_PROCESS_CPUTIME_ID, samples) / samples;
-       printf("getpid RET_ALLOW 4 filters (full): %llu ns\n", filter2);
+       ksft_print_msg("getpid RET_ALLOW 4 filters (full): %llu ns\n", filter2);
 
        /* Estimations */
 #define ESTIMATE(fmt, var, what)       do {                    \
                var = (what);                                   \
-               printf("Estimated " fmt ": %llu ns\n", var);    \
-               if (var > INT_MAX)                              \
-                       goto more_samples;                      \
+               ksft_print_msg("Estimated " fmt ": %llu ns\n", var);    \
+               if (var > INT_MAX) {                            \
+                       skip = true;                            \
+                       ret |= 1;                               \
+               }                                               \
        } while (0)
 
        ESTIMATE("total seccomp overhead for 1 bitmapped filter", calc,
@@ -218,31 +239,34 @@ int main(int argc, char *argv[])
        ESTIMATE("seccomp per-filter overhead (filters / 4)", per_filter2,
                 (filter2 - native - entry) / 4);
 
-       printf("Expectations:\n");
-       ret |= compare("native", "≤", "1 bitmap", native, le, bitmap1);
-       bits = compare("native", "≤", "1 filter", native, le, filter1);
+       ksft_print_msg("Expectations:\n");
+       ret |= compare("native", "≤", "1 bitmap", native, le, bitmap1,
+                      skip);
+       bits = compare("native", "≤", "1 filter", native, le, filter1,
+                      skip);
        if (bits)
-               goto more_samples;
+               skip = true;
 
        ret |= compare("per-filter (last 2 diff)", "≈", "per-filter (filters / 4)",
-                       per_filter1, approx, per_filter2);
+                      per_filter1, approx, per_filter2, skip);
 
        bits = compare("1 bitmapped", "≈", "2 bitmapped",
-                       bitmap1 - native, approx, bitmap2 - native);
+                      bitmap1 - native, approx, bitmap2 - native, skip);
        if (bits) {
-               printf("Skipping constant action bitmap expectations: they appear unsupported.\n");
-               goto out;
+               ksft_print_msg("Skipping constant action bitmap expectations: they appear unsupported.\n");
+               skip = true;
        }
 
-       ret |= compare("entry", "≈", "1 bitmapped", entry, approx, bitmap1 - native);
-       ret |= compare("entry", "≈", "2 bitmapped", entry, approx, bitmap2 - native);
+       ret |= compare("entry", "≈", "1 bitmapped", entry, approx,
+                      bitmap1 - native, skip);
+       ret |= compare("entry", "≈", "2 bitmapped", entry, approx,
+                      bitmap2 - native, skip);
        ret |= compare("native + entry + (per filter * 4)", "≈", "4 filters total",
-                       entry + (per_filter1 * 4) + native, approx, filter2);
-       if (ret == 0)
-               goto out;
+                      entry + (per_filter1 * 4) + native, approx, filter2,
+                      skip);
 
-more_samples:
-       printf("Saw unexpected benchmark result. Try running again with more samples?\n");
-out:
-       return 0;
+       if (ret)
+               ksft_print_msg("Saw unexpected benchmark result. Try running again with more samples?\n");
+
+       ksft_finished();
 }
diff --git a/tools/testing/selftests/thermal/intel/power_floor/.gitignore b/tools/testing/selftests/thermal/intel/power_floor/.gitignore
new file mode 100644 (file)
index 0000000..1b9a764
--- /dev/null
@@ -0,0 +1 @@
+power_floor_test
diff --git a/tools/testing/selftests/thermal/intel/workload_hint/.gitignore b/tools/testing/selftests/thermal/intel/workload_hint/.gitignore
new file mode 100644 (file)
index 0000000..d697b03
--- /dev/null
@@ -0,0 +1 @@
+workload_hint_test
diff --git a/tools/testing/selftests/uevent/.gitignore b/tools/testing/selftests/uevent/.gitignore
new file mode 100644 (file)
index 0000000..382afb7
--- /dev/null
@@ -0,0 +1 @@
+uevent_filtering
index 2456a399eb9ae1ce2a3c90c10ce1403dd23f62d1..afd18c678ff5a584a92e13ae8ee30bcba90f21ca 100644 (file)
@@ -28,10 +28,15 @@ FOPTS       :=      -flto=auto -ffat-lto-objects -fexceptions -fstack-protector-strong \
                -fasynchronous-unwind-tables -fstack-clash-protection
 WOPTS  :=      -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -Wno-maybe-uninitialized
 
+ifeq ($(CC),clang)
+  FOPTS := $(filter-out -ffat-lto-objects, $(FOPTS))
+  WOPTS := $(filter-out -Wno-maybe-uninitialized, $(WOPTS))
+endif
+
 TRACEFS_HEADERS        := $$($(PKG_CONFIG) --cflags libtracefs)
 
 CFLAGS :=      -O -g -DVERSION=\"$(VERSION)\" $(FOPTS) $(MOPTS) $(WOPTS) $(TRACEFS_HEADERS) $(EXTRA_CFLAGS)
-LDFLAGS        :=      -ggdb $(EXTRA_LDFLAGS)
+LDFLAGS        :=      -flto=auto -ggdb $(EXTRA_LDFLAGS)
 LIBS   :=      $$($(PKG_CONFIG) --libs libtracefs)
 
 SRC    :=      $(wildcard src/*.c)
index 8f81fa007364890dd4303b047c484786a56390f2..01870d50942a19a242f6444c7b11f019b460ec4e 100644 (file)
@@ -135,8 +135,7 @@ static void osnoise_hist_update_multiple(struct osnoise_tool *tool, int cpu,
        if (params->output_divisor)
                duration = duration / params->output_divisor;
 
-       if (data->bucket_size)
-               bucket = duration / data->bucket_size;
+       bucket = duration / data->bucket_size;
 
        total_duration = duration * count;
 
@@ -480,7 +479,11 @@ static void osnoise_hist_usage(char *usage)
 
        for (i = 0; msg[i]; i++)
                fprintf(stderr, "%s\n", msg[i]);
-       exit(1);
+
+       if (usage)
+               exit(EXIT_FAILURE);
+
+       exit(EXIT_SUCCESS);
 }
 
 /*
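
The usage() fix above recurs in osnoise_top, timerlat_hist and timerlat_top below: a NULL argument means the user explicitly asked for help (-h), which should exit successfully, while a non-NULL string is an error explanation that warrants a failing status. A condensed sketch of the convention:

    /* Condensed sketch of the rtla usage() convention. */
    #include <stdio.h>
    #include <stdlib.h>

    static void tool_usage(char *usage)
    {
            static const char * const msg[] = {
                    "  usage: tool [-h] [-d duration]",
                    NULL,
            };
            int i;

            if (usage)
                    fprintf(stderr, "%s\n", usage);

            for (i = 0; msg[i]; i++)
                    fprintf(stderr, "%s\n", msg[i]);

            if (usage)
                    exit(EXIT_FAILURE);     /* reached usage via an error */

            exit(EXIT_SUCCESS);             /* user explicitly asked for help */
    }

    int main(int argc, char **argv)
    {
            tool_usage(argc > 1 ? "unknown option" : NULL);
    }
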
index f7c959be8677799788eda3e577246cd3d2ec444a..457360db07673191fbc034c9fdb3193b91c4499c 100644 (file)
@@ -331,7 +331,11 @@ static void osnoise_top_usage(struct osnoise_top_params *params, char *usage)
 
        for (i = 0; msg[i]; i++)
                fprintf(stderr, "%s\n", msg[i]);
-       exit(1);
+
+       if (usage)
+               exit(EXIT_FAILURE);
+
+       exit(EXIT_SUCCESS);
 }
 
 /*
index 47d3d8b53cb2177fe7db4c39a21e630aca22fa23..dbf154082f958c146bed6537dc527f83e57993d4 100644 (file)
@@ -178,8 +178,7 @@ timerlat_hist_update(struct osnoise_tool *tool, int cpu,
        if (params->output_divisor)
                latency = latency / params->output_divisor;
 
-       if (data->bucket_size)
-               bucket = latency / data->bucket_size;
+       bucket = latency / data->bucket_size;
 
        if (!context) {
                hist = data->hist[cpu].irq;
@@ -546,7 +545,11 @@ static void timerlat_hist_usage(char *usage)
 
        for (i = 0; msg[i]; i++)
                fprintf(stderr, "%s\n", msg[i]);
-       exit(1);
+
+       if (usage)
+               exit(EXIT_FAILURE);
+
+       exit(EXIT_SUCCESS);
 }
 
 /*
index 1640f121baca50d99b94621309522d3fb824bc33..3e9af2c3868880197dc3075b74d94a15bea07d38 100644 (file)
@@ -375,7 +375,11 @@ static void timerlat_top_usage(char *usage)
 
        for (i = 0; msg[i]; i++)
                fprintf(stderr, "%s\n", msg[i]);
-       exit(1);
+
+       if (usage)
+               exit(EXIT_FAILURE);
+
+       exit(EXIT_SUCCESS);
 }
 
 /*
index c769d7b3842c0967e85f7dc1d8c6c705edd1f2dd..9ac71a66840c1bec2e944f3a9db0f427f3c7edfb 100644 (file)
@@ -238,12 +238,6 @@ static inline int sched_setattr(pid_t pid, const struct sched_attr *attr,
        return syscall(__NR_sched_setattr, pid, attr, flags);
 }
 
-static inline int sched_getattr(pid_t pid, struct sched_attr *attr,
-                               unsigned int size, unsigned int flags)
-{
-       return syscall(__NR_sched_getattr, pid, attr, size, flags);
-}
-
 int __set_sched_attr(int pid, struct sched_attr *attr)
 {
        int flags = 0;
@@ -479,13 +473,13 @@ int parse_prio(char *arg, struct sched_attr *sched_param)
                if (prio == INVALID_VAL)
                        return -1;
 
-               if (prio < sched_get_priority_min(SCHED_OTHER))
+               if (prio < MIN_NICE)
                        return -1;
-               if (prio > sched_get_priority_max(SCHED_OTHER))
+               if (prio > MAX_NICE)
                        return -1;
 
                sched_param->sched_policy   = SCHED_OTHER;
-               sched_param->sched_priority = prio;
+               sched_param->sched_nice = prio;
                break;
        default:
                return -1;
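
SCHED_OTHER tasks have no sched_priority; their tunable is the nice value, carried in sched_attr.sched_nice. Note the kernel's own bounds are MIN_NICE -20 and MAX_NICE 19 (include/linux/sched/prio.h), whereas utils.h below defines -19/20 as its parse-time limits. A minimal userspace sketch, assuming the UAPI struct sched_attr from <linux/sched/types.h> rather than rtla's local definition:

    /* Minimal sketch: set the nice value of a SCHED_OTHER task via
     * sched_setattr(2). Assumes the UAPI struct sched_attr.
     */
    #include <sched.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <linux/sched/types.h>

    static int set_nice(pid_t pid, int nice)
    {
            struct sched_attr attr = {
                    .size           = sizeof(attr),
                    .sched_policy   = SCHED_OTHER,
                    .sched_nice     = nice,         /* kernel range: -20..19 */
            };

            return syscall(__NR_sched_setattr, pid, &attr, 0);
    }

    int main(void)
    {
            return set_nice(0, 5) ? 1 : 0;          /* 0 == current task */
    }
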
@@ -536,7 +530,7 @@ int set_cpu_dma_latency(int32_t latency)
  */
 static const int find_mount(const char *fs, char *mp, int sizeof_mp)
 {
-       char mount_point[MAX_PATH];
+       char mount_point[MAX_PATH+1];
        char type[100];
        int found = 0;
        FILE *fp;
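
The MAX_PATH+1 change accounts for the string terminator: a scanf conversion with field width N stores up to N characters plus the trailing NUL, so the destination must hold N+1 bytes. An illustration, with the width hardcoded for clarity:

    /* A width-N %s conversion writes up to N chars plus '\0'. */
    #include <stdio.h>

    #define MAX_PATH 1024

    int main(void)
    {
            char mount_point[MAX_PATH + 1];

            if (scanf("%1024s", mount_point) == 1)
                    printf("%s\n", mount_point);

            return 0;
    }
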
index 04ed1e650495a357daabfe40653c1eab5e89b005..d44513e6c66a01a5fc75f472dcfb9ebbfbd57bfb 100644 (file)
@@ -9,6 +9,8 @@
  */
 #define BUFF_U64_STR_SIZE      24
 #define MAX_PATH               1024
+#define MAX_NICE               20
+#define MIN_NICE               -19
 
 #define container_of(ptr, type, member)({                      \
        const typeof(((type *)0)->member) *__mptr = (ptr);      \
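
utils.h also carries the kernel's classic container_of(): given a pointer to a member, subtract the member's offset to recover the enclosing structure. A self-contained usage sketch (typeof is a GCC/clang extension):

    /* Self-contained container_of() demo. */
    #include <stdio.h>
    #include <stddef.h>

    #define container_of(ptr, type, member)({                      \
            const typeof(((type *)0)->member) *__mptr = (ptr);     \
            (type *)((char *)__mptr - offsetof(type, member)); })

    struct node {
            int key;
            int value;
    };

    int main(void)
    {
            struct node n = { .key = 1, .value = 2 };
            int *vp = &n.value;
            struct node *np = container_of(vp, struct node, value);

            printf("key=%d\n", np->key);    /* prints key=1 */
            return 0;
    }
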
index 3d0f3888a58c66816fca24b5a9d7ab2106f0e992..485f8aeddbe033f227faf32139630d2a616cbd66 100644 (file)
@@ -28,10 +28,15 @@ FOPTS       :=      -flto=auto -ffat-lto-objects -fexceptions -fstack-protector-strong \
                -fasynchronous-unwind-tables -fstack-clash-protection
 WOPTS  :=      -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -Wno-maybe-uninitialized
 
+ifeq ($(CC),clang)
+  FOPTS := $(filter-out -ffat-lto-objects, $(FOPTS))
+  WOPTS := $(filter-out -Wno-maybe-uninitialized, $(WOPTS))
+endif
+
 TRACEFS_HEADERS        := $$($(PKG_CONFIG) --cflags libtracefs)
 
 CFLAGS :=      -O -g -DVERSION=\"$(VERSION)\" $(FOPTS) $(MOPTS) $(WOPTS) $(TRACEFS_HEADERS) $(EXTRA_CFLAGS) -I include
-LDFLAGS        :=      -ggdb $(EXTRA_LDFLAGS)
+LDFLAGS        :=      -flto=auto -ggdb $(EXTRA_LDFLAGS)
 LIBS   :=      $$($(PKG_CONFIG) --libs libtracefs)
 
 SRC    :=      $(wildcard src/*.c)
index ad28582bcf2b1ca6b6c9ba9e5d09b0bb5fbe63c0..f04479ecc96c0b75af1afb2e7855cf1cf2491970 100644 (file)
@@ -210,9 +210,9 @@ static char *ikm_read_reactor(char *monitor_name)
 static char *ikm_get_current_reactor(char *monitor_name)
 {
        char *reactors = ikm_read_reactor(monitor_name);
+       char *curr_reactor = NULL;
        char *start;
        char *end;
-       char *curr_reactor;
 
        if (!reactors)
                return NULL;
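
The hunk above initializes curr_reactor, which could previously be returned uninitialized on the early "not found" exit paths. A distilled sketch of the bug class, with illustrative names:

    /* A pointer that any exit path may return must be initialized. */
    #include <stdlib.h>
    #include <string.h>

    static char *find_token(const char *buf, const char *tok)
    {
            char *curr = NULL;      /* was the missing initialization */
            const char *p = strstr(buf, tok);

            if (!p)
                    return curr;    /* garbage here without the = NULL */

            curr = strdup(p);
            return curr;
    }

    int main(void)
    {
            char *t = find_token("a b c", "b");
            int ret = t ? 0 : 1;

            free(t);
            return ret;
    }
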
index 10bfc88a69f72b6a0e310cca043fb04882e24eb1..0f50960b0e3a89215757163ad3b458c92670f4de 100644 (file)
@@ -1615,7 +1615,13 @@ static int check_memory_region_flags(struct kvm *kvm,
                valid_flags &= ~KVM_MEM_LOG_DIRTY_PAGES;
 
 #ifdef __KVM_HAVE_READONLY_MEM
-       valid_flags |= KVM_MEM_READONLY;
+       /*
+        * GUEST_MEMFD is incompatible with read-only memslots, as writes to
+        * read-only memslots have emulated MMIO, not page fault, semantics,
+        * and KVM doesn't allow emulated MMIO for private memory.
+        */
+       if (!(mem->flags & KVM_MEM_GUEST_MEMFD))
+               valid_flags |= KVM_MEM_READONLY;
 #endif
 
        if (mem->flags & ~valid_flags)
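
Seen from userspace, the effect of the hunk above is that a memslot cannot be both read-only and guest_memfd-backed: KVM_SET_USER_MEMORY_REGION2 now rejects the combination with EINVAL. A hypothetical caller, error handling trimmed and fields per the 6.8+ UAPI:

    /* Hypothetical demonstration: this flag combination now fails. */
    #include <linux/kvm.h>
    #include <sys/ioctl.h>

    static int set_ro_guest_memfd_slot(int vm_fd, int guest_memfd)
    {
            struct kvm_userspace_memory_region2 region = {
                    .slot            = 0,
                    .flags           = KVM_MEM_READONLY | KVM_MEM_GUEST_MEMFD,
                    .guest_phys_addr = 0,
                    .memory_size     = 0x10000,
                    .guest_memfd     = guest_memfd,
            };

            /* With the check above, this returns -1 with errno EINVAL. */
            return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &region);
    }
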