Merge tag 'drm-intel-gt-next-2022-06-29' of git://anongit.freedesktop.org/drm/drm...
author Dave Airlie <airlied@redhat.com>
Fri, 1 Jul 2022 04:14:52 +0000 (14:14 +1000)
committer Dave Airlie <airlied@redhat.com>
Fri, 1 Jul 2022 04:14:52 +0000 (14:14 +1000)
UAPI Changes:

- Expose per tile media freq factor in sysfs (Ashutosh Dixit, Dale B Stimson)
- Document memory residency and Flat-CCS capability of obj (Ramalingam C)
- Disable GETPARAM lookups of I915_PARAM_[SUB]SLICE_MASK on Xe_HP+ (Matt Roper)

Cross-subsystem Changes:

- Rename intel-gtt symbols (Lucas De Marchi)

Core Changes:

Driver Changes:

- Support programming the EU priority in the GuC descriptor (DG2) (Matthew Brost)
- DG2 HuC loading support (Daniele Ceraolo Spurio)
- Fix build error without CONFIG_PM (YueHaibing)
- Enable THP on Icelake and beyond (Tvrtko Ursulin)
- Only setup private tmpfs mount when needed and fix logging (Tvrtko Ursulin)
- Make __guc_reset_context aware of guilty engines (Umesh Nerlige Ramappa)
- DG2 small bar memory probing fixes (Nirmoy Das)
- Remove unnecessary GuC err capture noise (Alan Previn)
- Fix i915_gem_object_ggtt_pin_ww regression on old platforms (Maarten Lankhorst)
- Fix undefined behavior in GuC backend due to shift overflowing the constant (Borislav Petkov)
- New DG2 workarounds (Swathi Dhanavanthri, Anshuman Gupta)
- Report no hwconfig support on ADL-N (Balasubramani Vivekanandan)
- Fix error_state_read ptr + offset use (Alan Previn)
- Expose per tile media freq factor in sysfs (Ashutosh Dixit, Dale B Stimson)
- Fix memory leaks in per-gt sysfs (Ashutosh Dixit)
- Fix dma_resv fence handling in multi-batch execbuf (Nirmoy Das)
- Add extra registers to GPU error dump on Gen11+ (Stuart Summers)
- More PVC+DG2 workarounds (Matt Roper)
- Improve user experience and driver robustness under SIGINT or similar (Tvrtko Ursulin)
- Don't show engine classes not present (Tvrtko Ursulin)
- Improve on suspend / resume time with VT-d enabled (Thomas Hellström)
- Add missing else (katrinzhou)
- Don't leak lmem mapping in vma_evict (Juha-Pekka Heikkila)
- Add smem fallback allocation for dpt (Juha-Pekka Heikkila)
- Tweak the ordering in cpu_write_needs_clflush (Matthew Auld)
- Do not access rq->engine without a reference (Niranjana Vishwanathapura)
- Revert "drm/i915: Hold reference to intel_context over life of i915_request" (Niranjana Vishwanathapura)
- Don't update engine busyness stats too frequently (Alan Previn)
- Add additional steps for Wa_22011802037 for execlist backend (Umesh Nerlige Ramappa)
- Fix a lockdep warning at error capture (Nirmoy Das)

- Ponte Vecchio prep work and new blitter engines (Matt Roper, John Harrison, Lucas De Marchi)
- Read correct RP_STATE_CAP register (PVC) (Matt Roper)
- Define MOCS table for PVC (Ayaz A Siddiqui)
- Driver refactor and support Ponte Vecchio forcewake handling (Matt Roper)
- Remove additional 3D flags from PIPE_CONTROL (Ponte Vecchio) (Stuart Summers)
- XEHPSDV and PVC do not use HuC (Daniele Ceraolo Spurio)
- Extract stepping information from PCI revid (Ponte Vecchio) (Matt Roper)
- Add initial PVC workarounds (Stuart Summers)
- SSEU handling driver refactor and Ponte Vecchio support (Matt Roper)
- GuC depriv applies to PVC (Matt Roper)
- Add register steering (Ponte Vecchio) (Matt Roper)
- Add recommended MMIO setting (Ponte Vecchio) (Matt Roper)

- Move multicast register handling to a dedicated file (Matt Roper)
- Cleanup interface for MCR operations (Matt Roper)
- Extend i915_vma_pin_iomap() (CQ Tang)
- Re-do the intel-gtt split (Lucas De Marchi)
- Correct duplicated/misplaced GT register definitions (Matt Roper)
- Prefer "XEHP_" prefix for registers (Matt Roper)

- Don't use DRM_DEBUG_WARN_ON for unexpected l3bank/mslice config (Tvrtko Ursulin)
- Don't use DRM_DEBUG_WARN_ON for ring unexpectedly not idle (Tvrtko Ursulin)
- Make drop_pages() return bool (Lucas De Marchi)
- Fix CFI violation with show_dynamic_id() (Nathan Chancellor)
- Use i915_probe_error instead of drm_error in GuC code (Vinay Belgaumkar)
- Fix use of static in macro mismatch (Andi Shyti)
- Update tiled blits selftest (Bommu Krishnaiah)
- Future-proof platform checks (Matt Roper)
- Only include what's needed (Jani Nikula)
- remove accidental static from a local variable (Jani Nikula)
- Add global forcewake request to drpc (Vinay Belgaumkar)
- Fix spelling typo in comment (pengfuyuan)
- Increase timeout for live_parallel_switch selftest (Akeem G Abodunrin)
- Use non-blocking H2G for waitboost (Vinay Belgaumkar)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YrwtLM081SQUG1Dc@tursulin-desk
97 files changed:
Documentation/gpu/i915.rst
drivers/char/agp/intel-gtt.c
drivers/gpu/drm/i915/Makefile
drivers/gpu/drm/i915/display/intel_dpt.c
drivers/gpu/drm/i915/gem/i915_gem_context.c
drivers/gpu/drm/i915/gem/i915_gem_domain.c
drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
drivers/gpu/drm/i915/gem/i915_gem_shmem.c
drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
drivers/gpu/drm/i915/gem/i915_gem_stolen.c
drivers/gpu/drm/i915/gem/i915_gem_tiling.c
drivers/gpu/drm/i915/gem/i915_gemfs.c
drivers/gpu/drm/i915/gem/i915_gemfs.h
drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
drivers/gpu/drm/i915/gt/gen8_engine_cs.c
drivers/gpu/drm/i915/gt/intel_context.c
drivers/gpu/drm/i915/gt/intel_context.h
drivers/gpu/drm/i915/gt/intel_context_types.h
drivers/gpu/drm/i915/gt/intel_engine.h
drivers/gpu/drm/i915/gt/intel_engine_cs.c
drivers/gpu/drm/i915/gt/intel_engine_regs.h
drivers/gpu/drm/i915/gt/intel_engine_types.h
drivers/gpu/drm/i915/gt/intel_execlists_submission.c
drivers/gpu/drm/i915/gt/intel_ggtt.c
drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c [new file with mode: 0644]
drivers/gpu/drm/i915/gt/intel_ggtt_gmch.h [new file with mode: 0644]
drivers/gpu/drm/i915/gt/intel_gpu_commands.h
drivers/gpu/drm/i915/gt/intel_gt.c
drivers/gpu/drm/i915/gt/intel_gt.h
drivers/gpu/drm/i915/gt/intel_gt_debugfs.c
drivers/gpu/drm/i915/gt/intel_gt_gmch.c [deleted file]
drivers/gpu/drm/i915/gt/intel_gt_gmch.h [deleted file]
drivers/gpu/drm/i915/gt/intel_gt_irq.c
drivers/gpu/drm/i915/gt/intel_gt_mcr.c [new file with mode: 0644]
drivers/gpu/drm/i915/gt/intel_gt_mcr.h [new file with mode: 0644]
drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c
drivers/gpu/drm/i915/gt/intel_gt_regs.h
drivers/gpu/drm/i915/gt/intel_gt_sysfs.c
drivers/gpu/drm/i915/gt/intel_gt_sysfs.h
drivers/gpu/drm/i915/gt/intel_gt_sysfs_pm.c
drivers/gpu/drm/i915/gt/intel_gt_types.h
drivers/gpu/drm/i915/gt/intel_gtt.h
drivers/gpu/drm/i915/gt/intel_lrc.h
drivers/gpu/drm/i915/gt/intel_mocs.c
drivers/gpu/drm/i915/gt/intel_region_lmem.c
drivers/gpu/drm/i915/gt/intel_ring_submission.c
drivers/gpu/drm/i915/gt/intel_rps.c
drivers/gpu/drm/i915/gt/intel_sseu.c
drivers/gpu/drm/i915/gt/intel_sseu.h
drivers/gpu/drm/i915/gt/intel_sseu_debugfs.c
drivers/gpu/drm/i915/gt/intel_workarounds.c
drivers/gpu/drm/i915/gt/selftest_hangcheck.c
drivers/gpu/drm/i915/gt/uc/abi/guc_actions_slpc_abi.h
drivers/gpu/drm/i915/gt/uc/intel_guc.c
drivers/gpu/drm/i915/gt/uc/intel_guc.h
drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
drivers/gpu/drm/i915/gt/uc/intel_guc_capture.c
drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
drivers/gpu/drm/i915/gt/uc/intel_guc_hwconfig.c
drivers/gpu/drm/i915/gt/uc/intel_guc_rc.c
drivers/gpu/drm/i915/gt/uc/intel_guc_reg.h
drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.c
drivers/gpu/drm/i915/gt/uc/intel_guc_slpc.h
drivers/gpu/drm/i915/gt/uc/intel_guc_slpc_types.h
drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
drivers/gpu/drm/i915/gt/uc/intel_huc.c
drivers/gpu/drm/i915/gt/uc/intel_huc.h
drivers/gpu/drm/i915/gt/uc/intel_huc_fw.c
drivers/gpu/drm/i915/gt/uc/intel_uc.c
drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
drivers/gpu/drm/i915/gt/uc/intel_uc_fw.h
drivers/gpu/drm/i915/gt/uc/intel_uc_fw_abi.h
drivers/gpu/drm/i915/gvt/cmd_parser.c
drivers/gpu/drm/i915/i915_driver.c
drivers/gpu/drm/i915/i915_drm_client.c
drivers/gpu/drm/i915/i915_drm_client.h
drivers/gpu/drm/i915/i915_drv.h
drivers/gpu/drm/i915/i915_getparam.c
drivers/gpu/drm/i915/i915_gpu_error.c
drivers/gpu/drm/i915/i915_gpu_error.h
drivers/gpu/drm/i915/i915_pci.c
drivers/gpu/drm/i915/i915_query.c
drivers/gpu/drm/i915/i915_reg.h
drivers/gpu/drm/i915/i915_request.c
drivers/gpu/drm/i915/i915_request.h
drivers/gpu/drm/i915/i915_sysfs.c
drivers/gpu/drm/i915/i915_vma.c
drivers/gpu/drm/i915/intel_device_info.h
drivers/gpu/drm/i915/intel_pm.c
drivers/gpu/drm/i915/intel_step.c
drivers/gpu/drm/i915/intel_step.h
drivers/gpu/drm/i915/intel_uncore.c
drivers/gpu/drm/i915/intel_uncore.h
drivers/gpu/drm/i915/selftests/intel_uncore.c
include/drm/intel-gtt.h
include/uapi/drm/i915_drm.h

diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
index 54060cd6c419474ef09ae6dc9ab74f7c328dfee1..4e59db1cfb00ef7a07b5d8c6ed9598735a43d0c7 100644
@@ -246,6 +246,18 @@ Display State Buffer
 .. kernel-doc:: drivers/gpu/drm/i915/display/intel_dsb.c
    :internal:
 
+GT Programming
+==============
+
+Multicast/Replicated (MCR) Registers
+------------------------------------
+
+.. kernel-doc:: drivers/gpu/drm/i915/gt/intel_gt_mcr.c
+   :doc: GT Multicast/Replicated (MCR) Register Support
+
+.. kernel-doc:: drivers/gpu/drm/i915/gt/intel_gt_mcr.c
+   :internal:
+
 Memory Management and Command Submission
 ========================================
 
diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
index 79a1b65527c2c51f3d9b225308c0df8d0a15298a..fe7e2105e7667a530ee20696b111f025729b8f89 100644
@@ -744,7 +744,7 @@ static void i830_write_entry(dma_addr_t addr, unsigned int entry,
        writel_relaxed(addr | pte_flags, intel_private.gtt + entry);
 }
 
-bool intel_enable_gtt(void)
+bool intel_gmch_enable_gtt(void)
 {
        u8 __iomem *reg;
 
@@ -787,7 +787,7 @@ bool intel_enable_gtt(void)
 
        return true;
 }
-EXPORT_SYMBOL(intel_enable_gtt);
+EXPORT_SYMBOL(intel_gmch_enable_gtt);
 
 static int i830_setup(void)
 {
@@ -821,8 +821,8 @@ static int intel_fake_agp_free_gatt_table(struct agp_bridge_data *bridge)
 
 static int intel_fake_agp_configure(void)
 {
-       if (!intel_enable_gtt())
-           return -EIO;
+       if (!intel_gmch_enable_gtt())
+               return -EIO;
 
        intel_private.clear_fake_agp = true;
        agp_bridge->gart_bus_addr = intel_private.gma_bus_addr;
@@ -844,20 +844,20 @@ static bool i830_check_flags(unsigned int flags)
        return false;
 }
 
-void intel_gtt_insert_page(dma_addr_t addr,
-                          unsigned int pg,
-                          unsigned int flags)
+void intel_gmch_gtt_insert_page(dma_addr_t addr,
+                               unsigned int pg,
+                               unsigned int flags)
 {
        intel_private.driver->write_entry(addr, pg, flags);
        readl(intel_private.gtt + pg);
        if (intel_private.driver->chipset_flush)
                intel_private.driver->chipset_flush();
 }
-EXPORT_SYMBOL(intel_gtt_insert_page);
+EXPORT_SYMBOL(intel_gmch_gtt_insert_page);
 
-void intel_gtt_insert_sg_entries(struct sg_table *st,
-                                unsigned int pg_start,
-                                unsigned int flags)
+void intel_gmch_gtt_insert_sg_entries(struct sg_table *st,
+                                     unsigned int pg_start,
+                                     unsigned int flags)
 {
        struct scatterlist *sg;
        unsigned int len, m;
@@ -879,13 +879,13 @@ void intel_gtt_insert_sg_entries(struct sg_table *st,
        if (intel_private.driver->chipset_flush)
                intel_private.driver->chipset_flush();
 }
-EXPORT_SYMBOL(intel_gtt_insert_sg_entries);
+EXPORT_SYMBOL(intel_gmch_gtt_insert_sg_entries);
 
 #if IS_ENABLED(CONFIG_AGP_INTEL)
-static void intel_gtt_insert_pages(unsigned int first_entry,
-                                  unsigned int num_entries,
-                                  struct page **pages,
-                                  unsigned int flags)
+static void intel_gmch_gtt_insert_pages(unsigned int first_entry,
+                                       unsigned int num_entries,
+                                       struct page **pages,
+                                       unsigned int flags)
 {
        int i, j;
 
@@ -905,7 +905,7 @@ static int intel_fake_agp_insert_entries(struct agp_memory *mem,
        if (intel_private.clear_fake_agp) {
                int start = intel_private.stolen_size / PAGE_SIZE;
                int end = intel_private.gtt_mappable_entries;
-               intel_gtt_clear_range(start, end - start);
+               intel_gmch_gtt_clear_range(start, end - start);
                intel_private.clear_fake_agp = false;
        }
 
@@ -934,12 +934,12 @@ static int intel_fake_agp_insert_entries(struct agp_memory *mem,
                if (ret != 0)
                        return ret;
 
-               intel_gtt_insert_sg_entries(&st, pg_start, type);
+               intel_gmch_gtt_insert_sg_entries(&st, pg_start, type);
                mem->sg_list = st.sgl;
                mem->num_sg = st.nents;
        } else
-               intel_gtt_insert_pages(pg_start, mem->page_count, mem->pages,
-                                      type);
+               intel_gmch_gtt_insert_pages(pg_start, mem->page_count, mem->pages,
+                                           type);
 
 out:
        ret = 0;
@@ -949,7 +949,7 @@ out_err:
 }
 #endif
 
-void intel_gtt_clear_range(unsigned int first_entry, unsigned int num_entries)
+void intel_gmch_gtt_clear_range(unsigned int first_entry, unsigned int num_entries)
 {
        unsigned int i;
 
@@ -959,7 +959,7 @@ void intel_gtt_clear_range(unsigned int first_entry, unsigned int num_entries)
        }
        wmb();
 }
-EXPORT_SYMBOL(intel_gtt_clear_range);
+EXPORT_SYMBOL(intel_gmch_gtt_clear_range);
 
 #if IS_ENABLED(CONFIG_AGP_INTEL)
 static int intel_fake_agp_remove_entries(struct agp_memory *mem,
@@ -968,7 +968,7 @@ static int intel_fake_agp_remove_entries(struct agp_memory *mem,
        if (mem->page_count == 0)
                return 0;
 
-       intel_gtt_clear_range(pg_start, mem->page_count);
+       intel_gmch_gtt_clear_range(pg_start, mem->page_count);
 
        if (intel_private.needs_dmar) {
                intel_gtt_unmap_memory(mem->sg_list, mem->num_sg);
@@ -1431,22 +1431,22 @@ int intel_gmch_probe(struct pci_dev *bridge_pdev, struct pci_dev *gpu_pdev,
 }
 EXPORT_SYMBOL(intel_gmch_probe);
 
-void intel_gtt_get(u64 *gtt_total,
-                  phys_addr_t *mappable_base,
-                  resource_size_t *mappable_end)
+void intel_gmch_gtt_get(u64 *gtt_total,
+                       phys_addr_t *mappable_base,
+                       resource_size_t *mappable_end)
 {
        *gtt_total = intel_private.gtt_total_entries << PAGE_SHIFT;
        *mappable_base = intel_private.gma_bus_addr;
        *mappable_end = intel_private.gtt_mappable_entries << PAGE_SHIFT;
 }
-EXPORT_SYMBOL(intel_gtt_get);
+EXPORT_SYMBOL(intel_gmch_gtt_get);
 
-void intel_gtt_chipset_flush(void)
+void intel_gmch_gtt_flush(void)
 {
        if (intel_private.driver->chipset_flush)
                intel_private.driver->chipset_flush();
 }
-EXPORT_SYMBOL(intel_gtt_chipset_flush);
+EXPORT_SYMBOL(intel_gmch_gtt_flush);
 
 void intel_gmch_remove(void)
 {
diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index c84a9cd8440d358c9d6e4caf05f7aef413be7c66..522ef9b4aff329625086f8ef5cf9ed9053a2f716 100644
@@ -103,6 +103,7 @@ gt-y += \
        gt/intel_gt_debugfs.o \
        gt/intel_gt_engines_debugfs.o \
        gt/intel_gt_irq.o \
+       gt/intel_gt_mcr.o \
        gt/intel_gt_pm.o \
        gt/intel_gt_pm_debugfs.o \
        gt/intel_gt_pm_irq.o \
@@ -129,7 +130,7 @@ gt-y += \
        gt/shmem_utils.o \
        gt/sysfs_engines.o
 # x86 intel-gtt module support
-gt-$(CONFIG_X86) += gt/intel_gt_gmch.o
+gt-$(CONFIG_X86) += gt/intel_ggtt_gmch.o
 # autogenerated null render state
 gt-y += \
        gt/gen6_renderstate.o \
diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c b/drivers/gpu/drm/i915/display/intel_dpt.c
index fb0e7e79e0cdacd9e49d34e4b1e9b784963fa5f1..ac587647e1f5015d393270a8aeeb7b360ef236b3 100644
@@ -4,6 +4,7 @@
  */
 
 #include "gem/i915_gem_domain.h"
+#include "gem/i915_gem_internal.h"
 #include "gt/gen8_ppgtt.h"
 
 #include "i915_drv.h"
@@ -127,8 +128,12 @@ struct i915_vma *intel_dpt_pin(struct i915_address_space *vm)
        struct i915_vma *vma;
        void __iomem *iomem;
        struct i915_gem_ww_ctx ww;
+       u64 pin_flags = 0;
        int err;
 
+       if (i915_gem_object_is_stolen(dpt->obj))
+               pin_flags |= PIN_MAPPABLE;
+
        wakeref = intel_runtime_pm_get(&i915->runtime_pm);
        atomic_inc(&i915->gpu_error.pending_fb_pin);
 
@@ -138,7 +143,7 @@ struct i915_vma *intel_dpt_pin(struct i915_address_space *vm)
                        continue;
 
                vma = i915_gem_object_ggtt_pin_ww(dpt->obj, &ww, NULL, 0, 4096,
-                                                 HAS_LMEM(i915) ? 0 : PIN_MAPPABLE);
+                                                 pin_flags);
                if (IS_ERR(vma)) {
                        err = PTR_ERR(vma);
                        continue;
@@ -248,10 +253,13 @@ intel_dpt_create(struct intel_framebuffer *fb)
 
        size = round_up(size * sizeof(gen8_pte_t), I915_GTT_PAGE_SIZE);
 
-       if (HAS_LMEM(i915))
-               dpt_obj = i915_gem_object_create_lmem(i915, size, I915_BO_ALLOC_CONTIGUOUS);
-       else
+       dpt_obj = i915_gem_object_create_lmem(i915, size, I915_BO_ALLOC_CONTIGUOUS);
+       if (IS_ERR(dpt_obj) && i915_ggtt_has_aperture(to_gt(i915)->ggtt))
                dpt_obj = i915_gem_object_create_stolen(i915, size);
+       if (IS_ERR(dpt_obj) && !HAS_LMEM(i915)) {
+               drm_dbg_kms(&i915->drm, "Allocating dpt from smem\n");
+               dpt_obj = i915_gem_object_create_internal(i915, size);
+       }
        if (IS_ERR(dpt_obj))
                return ERR_CAST(dpt_obj);
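Reviewer note: the new allocation order in intel_dpt_create() can be read as a three-step fallback chain. The sketch below is a userspace model of that chain only, not driver code. The struct, enum, and function names are hypothetical, and each boolean stands in for the result of the kernel call named in its comment.

```c
#include <stdbool.h>

/* Hypothetical model of where the DPT object ends up being allocated. */
enum dpt_backing { DPT_LMEM, DPT_STOLEN, DPT_SMEM_INTERNAL, DPT_FAIL };

/* Each flag stands in for a platform query or an allocation result. */
struct dpt_env {
	bool has_lmem;          /* HAS_LMEM(i915) */
	bool lmem_alloc_ok;     /* i915_gem_object_create_lmem() succeeded */
	bool has_aperture;      /* i915_ggtt_has_aperture(to_gt(i915)->ggtt) */
	bool stolen_alloc_ok;   /* i915_gem_object_create_stolen() succeeded */
	bool internal_alloc_ok; /* i915_gem_object_create_internal() succeeded */
};

/* Mirrors the order of the IS_ERR() checks in the patched function. */
static enum dpt_backing dpt_pick_backing(const struct dpt_env *e)
{
	if (e->lmem_alloc_ok)
		return DPT_LMEM;
	if (e->has_aperture && e->stolen_alloc_ok)
		return DPT_STOLEN;
	if (!e->has_lmem && e->internal_alloc_ok)
		return DPT_SMEM_INTERNAL;
	return DPT_FAIL;
}

/* intel_dpt_pin() now requests PIN_MAPPABLE only for stolen-backed objects. */
static bool dpt_needs_mappable(enum dpt_backing backing)
{
	return backing == DPT_STOLEN;
}
```

In this model, an integrated part whose stolen allocation fails lands in DPT_SMEM_INTERNAL, which corresponds to the "Allocating dpt from smem" debug message added by the patch.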
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index ab4c5ab28e4d9fbcd666d341a0d6934c3d2f96dd..dabdfe09f5e51438c7d9b203194316f48b71b1d9 100644
@@ -933,8 +933,9 @@ static int set_proto_ctx_param(struct drm_i915_file_private *fpriv,
        case I915_CONTEXT_PARAM_PERSISTENCE:
                if (args->size)
                        ret = -EINVAL;
-               ret = proto_context_set_persistence(fpriv->dev_priv, pc,
-                                                   args->value);
+               else
+                       ret = proto_context_set_persistence(fpriv->dev_priv, pc,
+                                                           args->value);
                break;
 
        case I915_CONTEXT_PARAM_PROTECTED_CONTENT:
@@ -1367,7 +1368,8 @@ static struct intel_engine_cs *active_engine(struct intel_context *ce)
        return engine;
 }
 
-static void kill_engines(struct i915_gem_engines *engines, bool ban)
+static void
+kill_engines(struct i915_gem_engines *engines, bool exit, bool persistent)
 {
        struct i915_gem_engines_iter it;
        struct intel_context *ce;
@@ -1381,9 +1383,15 @@ static void kill_engines(struct i915_gem_engines *engines, bool ban)
         */
        for_each_gem_engine(ce, engines, it) {
                struct intel_engine_cs *engine;
+               bool skip = false;
 
-               if (ban && intel_context_ban(ce, NULL))
-                       continue;
+               if (exit)
+                       skip = intel_context_set_exiting(ce);
+               else if (!persistent)
+                       skip = intel_context_exit_nonpersistent(ce, NULL);
+
+               if (skip)
+                       continue; /* Already marked. */
 
                /*
                 * Check the current active state of this context; if we
@@ -1395,7 +1403,7 @@ static void kill_engines(struct i915_gem_engines *engines, bool ban)
                engine = active_engine(ce);
 
                /* First attempt to gracefully cancel the context */
-               if (engine && !__cancel_engine(engine) && ban)
+               if (engine && !__cancel_engine(engine) && (exit || !persistent))
                        /*
                         * If we are unable to send a preemptive pulse to bump
                         * the context from the GPU, we have to resort to a full
@@ -1407,8 +1415,6 @@ static void kill_engines(struct i915_gem_engines *engines, bool ban)
 
 static void kill_context(struct i915_gem_context *ctx)
 {
-       bool ban = (!i915_gem_context_is_persistent(ctx) ||
-                   !ctx->i915->params.enable_hangcheck);
        struct i915_gem_engines *pos, *next;
 
        spin_lock_irq(&ctx->stale.lock);
@@ -1421,7 +1427,8 @@ static void kill_context(struct i915_gem_context *ctx)
 
                spin_unlock_irq(&ctx->stale.lock);
 
-               kill_engines(pos, ban);
+               kill_engines(pos, !ctx->i915->params.enable_hangcheck,
+                            i915_gem_context_is_persistent(ctx));
 
                spin_lock_irq(&ctx->stale.lock);
                GEM_BUG_ON(i915_sw_fence_signaled(&pos->fence));
@@ -1467,7 +1474,8 @@ static void engines_idle_release(struct i915_gem_context *ctx,
 
 kill:
        if (list_empty(&engines->link)) /* raced, already closed */
-               kill_engines(engines, true);
+               kill_engines(engines, true,
+                            i915_gem_context_is_persistent(ctx));
 
        i915_sw_fence_commit(&engines->fence);
 }
@@ -1875,6 +1883,7 @@ i915_gem_user_to_context_sseu(struct intel_gt *gt,
 {
        const struct sseu_dev_info *device = &gt->info.sseu;
        struct drm_i915_private *i915 = gt->i915;
+       unsigned int dev_subslice_mask = intel_sseu_get_hsw_subslices(device, 0);
 
        /* No zeros in any field. */
        if (!user->slice_mask || !user->subslice_mask ||
@@ -1901,7 +1910,7 @@ i915_gem_user_to_context_sseu(struct intel_gt *gt,
        if (user->slice_mask & ~device->slice_mask)
                return -EINVAL;
 
-       if (user->subslice_mask & ~device->subslice_mask[0])
+       if (user->subslice_mask & ~dev_subslice_mask)
                return -EINVAL;
 
        if (user->max_eus_per_subslice > device->max_eus_per_subslice)
@@ -1915,7 +1924,7 @@ i915_gem_user_to_context_sseu(struct intel_gt *gt,
        /* Part specific restrictions. */
        if (GRAPHICS_VER(i915) == 11) {
                unsigned int hw_s = hweight8(device->slice_mask);
-               unsigned int hw_ss_per_s = hweight8(device->subslice_mask[0]);
+               unsigned int hw_ss_per_s = hweight8(dev_subslice_mask);
                unsigned int req_s = hweight8(context->slice_mask);
                unsigned int req_ss = hweight8(context->subslice_mask);
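Reviewer note: in the kill_engines() hunks above, the single "ban" flag is split into two inputs. The sketch below is a small userspace model, with hypothetical names, of which marking helper is selected and of the condition that guards the full-reset fallback. The parameter is called "exiting" here (the patch calls it "exit") to avoid shadowing the C library function. The model does not attempt to reproduce what the intel_context_* helpers do internally.

```c
#include <stdbool.h>

/* Hypothetical labels for the helper kill_engines() calls per context. */
enum mark_action {
	MARK_NOTHING,            /* neither helper is called, skip stays false */
	MARK_SET_EXITING,        /* intel_context_set_exiting(ce) */
	MARK_EXIT_NONPERSISTENT  /* intel_context_exit_nonpersistent(ce, NULL) */
};

/* Selection logic after the patch. */
static enum mark_action pick_mark(bool exiting, bool persistent)
{
	if (exiting)
		return MARK_SET_EXITING;
	if (!persistent)
		return MARK_EXIT_NONPERSISTENT;
	return MARK_NOTHING;
}

/* Pre-patch kill_context(): one combined flag. */
static bool old_ban_flag(bool persistent, bool enable_hangcheck)
{
	return !persistent || !enable_hangcheck;
}

/* Post-patch guard on the reset fallback inside kill_engines(). */
static bool new_reset_fallback(bool exiting, bool persistent)
{
	return exiting || !persistent;
}
```

With kill_context() passing !enable_hangcheck as the first argument, new_reset_fallback() evaluates to the same value as the old flag for all four input combinations, so under this model the reset fallback condition is unchanged and only the marking step differs.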
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
index 3e5d6057b3ef91cf4a4a8d77cf13f06f1cf60361..1674b0c5802bf0500c6cce1e010f6fb59c8318fe 100644
@@ -35,12 +35,12 @@ bool i915_gem_cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
        if (obj->cache_dirty)
                return false;
 
-       if (!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE))
-               return true;
-
        if (IS_DGFX(i915))
                return false;
 
+       if (!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE))
+               return true;
+
        /* Currently in use by HW (display engine)? Keep flushed. */
        return i915_gem_object_is_framebuffer(obj);
 }
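Reviewer note: the reordering above only matters for one input combination. The sketch below models both orderings in plain C so the difference can be checked directly. The struct and function names are hypothetical, and the booleans stand in for the fields and helpers named in the comments.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the object state the real function consults. */
struct obj_model {
	bool cache_dirty;        /* obj->cache_dirty */
	bool coherent_for_write; /* I915_BO_CACHE_COHERENT_FOR_WRITE is set */
	bool is_framebuffer;     /* i915_gem_object_is_framebuffer(obj) */
};

/* Ordering before the patch: coherency bit checked before IS_DGFX. */
static bool needs_clflush_old(const struct obj_model *o, bool is_dgfx)
{
	if (o->cache_dirty)
		return false;
	if (!o->coherent_for_write)
		return true;
	if (is_dgfx)
		return false;
	return o->is_framebuffer;
}

/* Ordering after the patch: IS_DGFX short-circuits first. */
static bool needs_clflush_new(const struct obj_model *o, bool is_dgfx)
{
	if (o->cache_dirty)
		return false;
	if (is_dgfx)
		return false;
	if (!o->coherent_for_write)
		return true;
	return o->is_framebuffer;
}
```

The two functions disagree only for a clean object that is not write-coherent on a discrete part: the old ordering asks for a clflush and the new one does not.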
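diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index c326bd2b444fc50cb350a2e6d3ff8184e117d97c..30fe847c6664d3d9c70ce4fd8e8fed675191764f 100644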
@@ -999,7 +999,8 @@ static int eb_validate_vmas(struct i915_execbuffer *eb)
                        }
                }
 
-               err = dma_resv_reserve_fences(vma->obj->base.resv, 1);
+               /* Reserve enough slots to accommodate composite fences */
+               err = dma_resv_reserve_fences(vma->obj->base.resv, eb->num_batches);
                if (err)
                        return err;
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
index 2e16e91a5a56b03dab1178614c010b062722ce30..4eed3dd90ba8bc75aaa030ae752d2b7403593090 100644
@@ -670,17 +670,10 @@ fail:
 
 static int init_shmem(struct intel_memory_region *mem)
 {
-       int err;
-
-       err = i915_gemfs_init(mem->i915);
-       if (err) {
-               DRM_NOTE("Unable to create a private tmpfs mount, hugepage support will be disabled(%d).\n",
-                        err);
-       }
-
+       i915_gemfs_init(mem->i915);
        intel_memory_region_set_name(mem, "system");
 
-       return 0; /* Don't error, we can simply fallback to the kernel mnt */
+       return 0; /* We have fallback to the kernel mnt if gemfs init failed. */
 }
 
 static int release_shmem(struct intel_memory_region *mem)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
index 6a6ff98a87462b388f8af950fcc1b0282e656ade..1030053571a20aa8169faf428bc4ebe0152b2382 100644
@@ -36,7 +36,7 @@ static bool can_release_pages(struct drm_i915_gem_object *obj)
        return swap_available() || obj->mm.madv == I915_MADV_DONTNEED;
 }
 
-static int drop_pages(struct drm_i915_gem_object *obj,
+static bool drop_pages(struct drm_i915_gem_object *obj,
                       unsigned long shrink, bool trylock_vm)
 {
        unsigned long flags;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c
index 47b5e0e342abd550d575d8eed958e07977e61859..166d0a4b9e8c0c394c386960d731eac182795cdb 100644
@@ -13,6 +13,8 @@
 #include "gem/i915_gem_lmem.h"
 #include "gem/i915_gem_region.h"
 #include "gt/intel_gt.h"
+#include "gt/intel_gt_mcr.h"
+#include "gt/intel_gt_regs.h"
 #include "gt/intel_region_lmem.h"
 #include "i915_drv.h"
 #include "i915_gem_stolen.h"
@@ -834,8 +836,8 @@ i915_gem_stolen_lmem_setup(struct drm_i915_private *i915, u16 type,
        } else {
                resource_size_t lmem_range;
 
-               lmem_range = intel_gt_read_register(&i915->gt0, XEHPSDV_TILE0_ADDR_RANGE) & 0xFFFF;
-               lmem_size = lmem_range >> XEHPSDV_TILE_LMEM_RANGE_SHIFT;
+               lmem_range = intel_gt_mcr_read_any(&i915->gt0, XEHP_TILE0_ADDR_RANGE) & 0xFFFF;
+               lmem_size = lmem_range >> XEHP_TILE_LMEM_RANGE_SHIFT;
                lmem_size *= SZ_1G;
        }
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_tiling.c b/drivers/gpu/drm/i915/gem/i915_gem_tiling.c
index 80ac0db1ae8caca6fffd65525de62d29ed952446..85518b28cd721bd4086c616444f3e5ba32b3397f 100644
@@ -114,7 +114,7 @@ u32 i915_gem_fence_alignment(struct drm_i915_private *i915, u32 size,
        return i915_gem_fence_size(i915, size, tiling, stride);
 }
 
-/* Check pitch constriants for all chips & tiling formats */
+/* Check pitch constraints for all chips & tiling formats */
 static bool
 i915_tiling_ok(struct drm_i915_gem_object *obj,
               unsigned int tiling, unsigned int stride)
diff --git a/drivers/gpu/drm/i915/gem/i915_gemfs.c b/drivers/gpu/drm/i915/gem/i915_gemfs.c
index ee87874e59dcc2150bd3295590ffd92ce6311cdd..46b9a17d6abc6e64a2e46f2e825746022faa842d 100644
 #include "i915_gemfs.h"
 #include "i915_utils.h"
 
-int i915_gemfs_init(struct drm_i915_private *i915)
+void i915_gemfs_init(struct drm_i915_private *i915)
 {
        char huge_opt[] = "huge=within_size"; /* r/w */
        struct file_system_type *type;
        struct vfsmount *gemfs;
-       char *opts;
-
-       type = get_fs_type("tmpfs");
-       if (!type)
-               return -ENODEV;
 
        /*
         * By creating our own shmemfs mountpoint, we can pass in
@@ -28,30 +23,35 @@ int i915_gemfs_init(struct drm_i915_private *i915)
         *
         * One example, although it is probably better with a per-file
         * control, is selecting huge page allocations ("huge=within_size").
-        * However, we only do so to offset the overhead of iommu lookups
-        * due to bandwidth issues (slow reads) on Broadwell+.
+        * However, we only do so on platforms which benefit from it, or to
+        * offset the overhead of iommu lookups, where with the latter it is a net
+        * win even on platforms which would otherwise see some performance
+        * regressions such as the slow reads issue on Broadwell and Skylake.
         */
 
-       opts = NULL;
-       if (i915_vtd_active(i915)) {
-               if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
-                       opts = huge_opt;
-                       drm_info(&i915->drm,
-                                "Transparent Hugepage mode '%s'\n",
-                                opts);
-               } else {
-                       drm_notice(&i915->drm,
-                                  "Transparent Hugepage support is recommended for optimal performance when IOMMU is enabled!\n");
-               }
-       }
-
-       gemfs = vfs_kern_mount(type, SB_KERNMOUNT, type->name, opts);
+       if (GRAPHICS_VER(i915) < 11 && !i915_vtd_active(i915))
+               return;
+
+       if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
+               goto err;
+
+       type = get_fs_type("tmpfs");
+       if (!type)
+               goto err;
+
+       gemfs = vfs_kern_mount(type, SB_KERNMOUNT, type->name, huge_opt);
        if (IS_ERR(gemfs))
-               return PTR_ERR(gemfs);
+               goto err;
 
        i915->mm.gemfs = gemfs;
-
-       return 0;
+       drm_info(&i915->drm, "Using Transparent Hugepages\n");
+       return;
+
+err:
+       drm_notice(&i915->drm,
+                  "Transparent Hugepage support is recommended for optimal performance%s\n",
+                  GRAPHICS_VER(i915) >= 11 ? " on this platform!" :
+                                             " when IOMMU is enabled!");
 }
 
 void i915_gemfs_fini(struct drm_i915_private *i915)
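Reviewer note: the rewritten i915_gemfs_init() has three possible outcomes. The sketch below models that decision in userspace C. The enum and function names are hypothetical, and mount_ok collapses the two steps that can fail in the real code (get_fs_type("tmpfs") and vfs_kern_mount()) into one flag.

```c
#include <stdbool.h>

/* Hypothetical labels for the three ways the patched function can end. */
enum gemfs_outcome {
	GEMFS_SKIPPED,   /* early return, no private mount, nothing logged */
	GEMFS_THP_MOUNT, /* private tmpfs mounted with huge=within_size */
	GEMFS_NOTICE     /* "err" label: drm_notice() recommending THP */
};

static enum gemfs_outcome gemfs_decide(int graphics_ver, bool vtd_active,
				       bool thp_enabled, bool mount_ok)
{
	/* Pre-Gen11 platforms only want the private mount when VT-d is on. */
	if (graphics_ver < 11 && !vtd_active)
		return GEMFS_SKIPPED;

	/* IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) is false. */
	if (!thp_enabled)
		return GEMFS_NOTICE;

	/* tmpfs type lookup or the kernel mount failed. */
	if (!mount_ok)
		return GEMFS_NOTICE;

	return GEMFS_THP_MOUNT;
}

/* Suffix appended to the notice message, as in the patch. */
static const char *gemfs_notice_suffix(int graphics_ver)
{
	return graphics_ver >= 11 ? " on this platform!"
				  : " when IOMMU is enabled!";
}
```

Under this model a Gen12 part without VT-d now gets the huge-page mount, which matches the "Enable THP on Icelake and beyond" entry in the commit message, while a Gen9 part without VT-d skips the mount without logging anything.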
diff --git a/drivers/gpu/drm/i915/gem/i915_gemfs.h b/drivers/gpu/drm/i915/gem/i915_gemfs.h
index 2a1e59af3e4a1616b26964290e594b20acc8eee6..5d835e44c4f6eba6691557b14dad3f0945445194 100644
@@ -9,8 +9,7 @@
 
 struct drm_i915_private;
 
-int i915_gemfs_init(struct drm_i915_private *i915);
-
+void i915_gemfs_init(struct drm_i915_private *i915);
 void i915_gemfs_fini(struct drm_i915_private *i915);
 
 #endif
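diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c
index ddd0772fd82864e3af437df60bd183c691266c73..3cfc621ef363d2e097dc863b4941be578ff337e9 100644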
@@ -6,6 +6,7 @@
 #include "i915_selftest.h"
 
 #include "gt/intel_context.h"
+#include "gt/intel_engine_regs.h"
 #include "gt/intel_engine_user.h"
 #include "gt/intel_gpu_commands.h"
 #include "gt/intel_gt.h"
 #include "huge_gem_object.h"
 #include "mock_context.h"
 
+#define OW_SIZE 16                      /* in bytes */
+#define F_SUBTILE_SIZE 64               /* in bytes */
+#define F_TILE_WIDTH 128                /* in bytes */
+#define F_TILE_HEIGHT 32                /* in pixels */
+#define F_SUBTILE_WIDTH  OW_SIZE        /* in bytes */
+#define F_SUBTILE_HEIGHT 4              /* in pixels */
+
+static int linear_x_y_to_ftiled_pos(int x, int y, u32 stride, int bpp)
+{
+       int tile_base;
+       int tile_x, tile_y;
+       int swizzle, subtile;
+       int pixel_size = bpp / 8;
+       int pos;
+
+       /*
+        * Subtile remapping for F tile. Note that map[a]==b implies map[b]==a
+        * so we can use the same table to tile and untile.
+        */
+       static const u8 f_subtile_map[] = {
+                0,  1,  2,  3,  8,  9, 10, 11,
+                4,  5,  6,  7, 12, 13, 14, 15,
+               16, 17, 18, 19, 24, 25, 26, 27,
+               20, 21, 22, 23, 28, 29, 30, 31,
+               32, 33, 34, 35, 40, 41, 42, 43,
+               36, 37, 38, 39, 44, 45, 46, 47,
+               48, 49, 50, 51, 56, 57, 58, 59,
+               52, 53, 54, 55, 60, 61, 62, 63
+       };
+
+       x *= pixel_size;
+       /*
+        * Where does the 4k tile start (in bytes)?  This is the same for Y and
+        * F so we can use the Y-tile algorithm to get to that point.
+        */
+       tile_base =
+               y / F_TILE_HEIGHT * stride * F_TILE_HEIGHT +
+               x / F_TILE_WIDTH * 4096;
+
+       /* Find pixel within tile */
+       tile_x = x % F_TILE_WIDTH;
+       tile_y = y % F_TILE_HEIGHT;
+
+       /* And figure out the subtile within the 4k tile */
+       subtile = tile_y / F_SUBTILE_HEIGHT * 8 + tile_x / F_SUBTILE_WIDTH;
+
+       /* Swizzle the subtile number according to the bspec diagram */
+       swizzle = f_subtile_map[subtile];
+
+       /* Calculate new position */
+       pos = tile_base +
+               swizzle * F_SUBTILE_SIZE +
+               tile_y % F_SUBTILE_HEIGHT * OW_SIZE +
+               tile_x % F_SUBTILE_WIDTH;
+
+       GEM_BUG_ON(!IS_ALIGNED(pos, pixel_size));
+
+       return pos / pixel_size * 4;
+}
+
 enum client_tiling {
        CLIENT_TILING_LINEAR,
        CLIENT_TILING_X,
        CLIENT_TILING_Y,
+       CLIENT_TILING_4,
        CLIENT_NUM_TILING_TYPES
 };
 
@@ -45,6 +107,36 @@ struct tiled_blits {
        u32 height;
 };
 
+static bool supports_x_tiling(const struct drm_i915_private *i915)
+{
+       int gen = GRAPHICS_VER(i915);
+
+       if (gen < 12)
+               return true;
+
+       if (!HAS_LMEM(i915) || IS_DG1(i915))
+               return false;
+
+       return true;
+}
+
+static bool fast_blit_ok(const struct blit_buffer *buf)
+{
+       int gen = GRAPHICS_VER(buf->vma->vm->i915);
+
+       if (gen < 9)
+               return false;
+
+       if (gen < 12)
+               return true;
+
+       /* filter out platforms that lack X-tile support in fast blit */
+       if (buf->tiling == CLIENT_TILING_X && !supports_x_tiling(buf->vma->vm->i915))
+               return false;
+
+       return true;
+}
+
 static int prepare_blit(const struct tiled_blits *t,
                        struct blit_buffer *dst,
                        struct blit_buffer *src,
@@ -59,51 +151,103 @@ static int prepare_blit(const struct tiled_blits *t,
        if (IS_ERR(cs))
                return PTR_ERR(cs);
 
-       *cs++ = MI_LOAD_REGISTER_IMM(1);
-       *cs++ = i915_mmio_reg_offset(BCS_SWCTRL);
-       cmd = (BCS_SRC_Y | BCS_DST_Y) << 16;
-       if (src->tiling == CLIENT_TILING_Y)
-               cmd |= BCS_SRC_Y;
-       if (dst->tiling == CLIENT_TILING_Y)
-               cmd |= BCS_DST_Y;
-       *cs++ = cmd;
-
-       cmd = MI_FLUSH_DW;
-       if (ver >= 8)
-               cmd++;
-       *cs++ = cmd;
-       *cs++ = 0;
-       *cs++ = 0;
-       *cs++ = 0;
-
-       cmd = XY_SRC_COPY_BLT_CMD | BLT_WRITE_RGBA | (8 - 2);
-       if (ver >= 8)
-               cmd += 2;
-
-       src_pitch = t->width * 4;
-       if (src->tiling) {
-               cmd |= XY_SRC_COPY_BLT_SRC_TILED;
-               src_pitch /= 4;
-       }
+       if (fast_blit_ok(dst) && fast_blit_ok(src)) {
+               struct intel_gt *gt = t->ce->engine->gt;
+               u32 src_tiles = 0, dst_tiles = 0;
+               u32 src_4t = 0, dst_4t = 0;
+
+               /* BLIT_CCTL must be programmed before XY_FAST_COPY_BLT is
+                * used, so program it here in case that has not happened yet.
+                */
+               *cs++ = MI_LOAD_REGISTER_IMM(1);
+               *cs++ = i915_mmio_reg_offset(BLIT_CCTL(t->ce->engine->mmio_base));
+               *cs++ = (BLIT_CCTL_SRC_MOCS(gt->mocs.uc_index) |
+                        BLIT_CCTL_DST_MOCS(gt->mocs.uc_index));
+
+               src_pitch = t->width; /* in dwords */
+               if (src->tiling == CLIENT_TILING_4) {
+                       src_tiles = XY_FAST_COPY_BLT_D0_SRC_TILE_MODE(YMAJOR);
+                       src_4t = XY_FAST_COPY_BLT_D1_SRC_TILE4;
+               } else if (src->tiling == CLIENT_TILING_Y) {
+                       src_tiles = XY_FAST_COPY_BLT_D0_SRC_TILE_MODE(YMAJOR);
+               } else if (src->tiling == CLIENT_TILING_X) {
+                       src_tiles = XY_FAST_COPY_BLT_D0_SRC_TILE_MODE(TILE_X);
+               } else {
+                       src_pitch *= 4; /* in bytes */
+               }
 
-       dst_pitch = t->width * 4;
-       if (dst->tiling) {
-               cmd |= XY_SRC_COPY_BLT_DST_TILED;
-               dst_pitch /= 4;
-       }
+               dst_pitch = t->width; /* in dwords */
+               if (dst->tiling == CLIENT_TILING_4) {
+                       dst_tiles = XY_FAST_COPY_BLT_D0_DST_TILE_MODE(YMAJOR);
+                       dst_4t = XY_FAST_COPY_BLT_D1_DST_TILE4;
+               } else if (dst->tiling == CLIENT_TILING_Y) {
+                       dst_tiles = XY_FAST_COPY_BLT_D0_DST_TILE_MODE(YMAJOR);
+               } else if (dst->tiling == CLIENT_TILING_X) {
+                       dst_tiles = XY_FAST_COPY_BLT_D0_DST_TILE_MODE(TILE_X);
+               } else {
+                       dst_pitch *= 4; /* in bytes */
+               }
 
-       *cs++ = cmd;
-       *cs++ = BLT_DEPTH_32 | BLT_ROP_SRC_COPY | dst_pitch;
-       *cs++ = 0;
-       *cs++ = t->height << 16 | t->width;
-       *cs++ = lower_32_bits(dst->vma->node.start);
-       if (use_64b_reloc)
+               *cs++ = GEN9_XY_FAST_COPY_BLT_CMD | (10 - 2) |
+                       src_tiles | dst_tiles;
+               *cs++ = src_4t | dst_4t | BLT_DEPTH_32 | dst_pitch;
+               *cs++ = 0;
+               *cs++ = t->height << 16 | t->width;
+               *cs++ = lower_32_bits(dst->vma->node.start);
                *cs++ = upper_32_bits(dst->vma->node.start);
-       *cs++ = 0;
-       *cs++ = src_pitch;
-       *cs++ = lower_32_bits(src->vma->node.start);
-       if (use_64b_reloc)
+               *cs++ = 0;
+               *cs++ = src_pitch;
+               *cs++ = lower_32_bits(src->vma->node.start);
                *cs++ = upper_32_bits(src->vma->node.start);
+       } else {
+               if (ver >= 6) {
+                       *cs++ = MI_LOAD_REGISTER_IMM(1);
+                       *cs++ = i915_mmio_reg_offset(BCS_SWCTRL);
+                       cmd = (BCS_SRC_Y | BCS_DST_Y) << 16;
+                       if (src->tiling == CLIENT_TILING_Y)
+                               cmd |= BCS_SRC_Y;
+                       if (dst->tiling == CLIENT_TILING_Y)
+                               cmd |= BCS_DST_Y;
+                       *cs++ = cmd;
+
+                       cmd = MI_FLUSH_DW;
+                       if (ver >= 8)
+                               cmd++;
+                       *cs++ = cmd;
+                       *cs++ = 0;
+                       *cs++ = 0;
+                       *cs++ = 0;
+               }
+
+               cmd = XY_SRC_COPY_BLT_CMD | BLT_WRITE_RGBA | (8 - 2);
+               if (ver >= 8)
+                       cmd += 2;
+
+               src_pitch = t->width * 4;
+               if (src->tiling) {
+                       cmd |= XY_SRC_COPY_BLT_SRC_TILED;
+                       src_pitch /= 4;
+               }
+
+               dst_pitch = t->width * 4;
+               if (dst->tiling) {
+                       cmd |= XY_SRC_COPY_BLT_DST_TILED;
+                       dst_pitch /= 4;
+               }
+
+               *cs++ = cmd;
+               *cs++ = BLT_DEPTH_32 | BLT_ROP_SRC_COPY | dst_pitch;
+               *cs++ = 0;
+               *cs++ = t->height << 16 | t->width;
+               *cs++ = lower_32_bits(dst->vma->node.start);
+               if (use_64b_reloc)
+                       *cs++ = upper_32_bits(dst->vma->node.start);
+               *cs++ = 0;
+               *cs++ = src_pitch;
+               *cs++ = lower_32_bits(src->vma->node.start);
+               if (use_64b_reloc)
+                       *cs++ = upper_32_bits(src->vma->node.start);
+       }
 
        *cs++ = MI_BATCH_BUFFER_END;
 
@@ -181,7 +325,13 @@ static int tiled_blits_create_buffers(struct tiled_blits *t,
 
                t->buffers[i].vma = vma;
                t->buffers[i].tiling =
-                       i915_prandom_u32_max_state(CLIENT_TILING_Y + 1, prng);
+                       i915_prandom_u32_max_state(CLIENT_NUM_TILING_TYPES, prng);
+
+               /* Platforms support either TileY or Tile4, not both */
+               if (HAS_4TILE(i915) && t->buffers[i].tiling == CLIENT_TILING_Y)
+                       t->buffers[i].tiling = CLIENT_TILING_4;
+               else if (!HAS_4TILE(i915) && t->buffers[i].tiling == CLIENT_TILING_4)
+                       t->buffers[i].tiling = CLIENT_TILING_Y;
        }
 
        return 0;
@@ -206,7 +356,8 @@ static u64 swizzle_bit(unsigned int bit, u64 offset)
 static u64 tiled_offset(const struct intel_gt *gt,
                        u64 v,
                        unsigned int stride,
-                       enum client_tiling tiling)
+                       enum client_tiling tiling,
+                       int x_pos, int y_pos)
 {
        unsigned int swizzle;
        u64 x, y;
@@ -216,7 +367,12 @@ static u64 tiled_offset(const struct intel_gt *gt,
 
        y = div64_u64_rem(v, stride, &x);
 
-       if (tiling == CLIENT_TILING_X) {
+       if (tiling == CLIENT_TILING_4) {
+               v = linear_x_y_to_ftiled_pos(x_pos, y_pos, stride, 32);
+
+               /* no swizzling for f-tiling */
+               swizzle = I915_BIT_6_SWIZZLE_NONE;
+       } else if (tiling == CLIENT_TILING_X) {
                v = div64_u64_rem(y, 8, &y) * stride * 8;
                v += y * 512;
                v += div64_u64_rem(x, 512, &x) << 12;
@@ -259,6 +415,7 @@ static const char *repr_tiling(enum client_tiling tiling)
        case CLIENT_TILING_LINEAR: return "linear";
        case CLIENT_TILING_X: return "X";
        case CLIENT_TILING_Y: return "Y";
+       case CLIENT_TILING_4: return "F";
        default: return "unknown";
        }
 }
@@ -284,7 +441,7 @@ static int verify_buffer(const struct tiled_blits *t,
        } else {
                u64 v = tiled_offset(buf->vma->vm->gt,
                                     p * 4, t->width * 4,
-                                    buf->tiling);
+                                    buf->tiling, x, y);
 
                if (vaddr[v / sizeof(*vaddr)] != buf->start_val + p)
                        ret = -EINVAL;
@@ -504,6 +661,9 @@ static int tiled_blits_bounce(struct tiled_blits *t, struct rnd_state *prng)
        if (err)
                return err;
 
+       /* Simulating GTT eviction of the same buffer / layout */
+       t->buffers[2].tiling = t->buffers[0].tiling;
+
        /* Reposition so that we overlap the old addresses, and slightly off */
        err = tiled_blit(t,
                         &t->buffers[2], t->hole + t->align,
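The comment in linear_x_y_to_ftiled_pos() states that the subtile map is its own inverse, which is why one table serves both tiling and untiling. The following is a minimal standalone sketch that checks this property in userspace C. The table is copied from the hunk above; the helper names f_subtile_swizzle() and f_subtile_map_is_involution() are hypothetical and not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Subtile remapping table, copied from linear_x_y_to_ftiled_pos(). */
static const uint8_t f_subtile_map[64] = {
	 0,  1,  2,  3,  8,  9, 10, 11,
	 4,  5,  6,  7, 12, 13, 14, 15,
	16, 17, 18, 19, 24, 25, 26, 27,
	20, 21, 22, 23, 28, 29, 30, 31,
	32, 33, 34, 35, 40, 41, 42, 43,
	36, 37, 38, 39, 44, 45, 46, 47,
	48, 49, 50, 51, 56, 57, 58, 59,
	52, 53, 54, 55, 60, 61, 62, 63
};

/* Hypothetical helper: look up the swizzled index of one subtile. */
static unsigned int f_subtile_swizzle(unsigned int subtile)
{
	assert(subtile < sizeof(f_subtile_map));
	return f_subtile_map[subtile];
}

/* Hypothetical helper: returns 1 if map[map[i]] == i for every index. */
static int f_subtile_map_is_involution(void)
{
	unsigned int i;

	for (i = 0; i < sizeof(f_subtile_map); i++)
		if (f_subtile_swizzle(f_subtile_swizzle(i)) != i)
			return 0;

	return 1;
}
```

Because every entry either maps to itself or swaps with exactly one partner (for example 4 and 8), applying the lookup twice returns the original subtile index.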
index 93a67422ca3b133cba32398d75616720d481714b..c6ad67b90e8af2657f6b0c61011fd30bdecaa7d8 100644 (file)
@@ -212,7 +212,7 @@ static int __live_parallel_switch1(void *data)
 
                        i915_request_add(rq);
                }
-               if (i915_request_wait(rq, 0, HZ / 5) < 0)
+               if (i915_request_wait(rq, 0, HZ) < 0)
                        err = -ETIME;
                i915_request_put(rq);
                if (err)
index 3e13960615bdecedc5507a5ac5c29fa8cad0435a..98645797962f59799a6863b2ef3e3307df78809d 100644 (file)
@@ -197,8 +197,10 @@ int gen12_emit_flush_rcs(struct i915_request *rq, u32 mode)
 
                flags |= PIPE_CONTROL_CS_STALL;
 
-               if (engine->class == COMPUTE_CLASS)
-                       flags &= ~PIPE_CONTROL_3D_FLAGS;
+               if (!HAS_3D_PIPELINE(engine->i915))
+                       flags &= ~PIPE_CONTROL_3D_ARCH_FLAGS;
+               else if (engine->class == COMPUTE_CLASS)
+                       flags &= ~PIPE_CONTROL_3D_ENGINE_FLAGS;
 
                cs = intel_ring_begin(rq, 6);
                if (IS_ERR(cs))
@@ -227,8 +229,10 @@ int gen12_emit_flush_rcs(struct i915_request *rq, u32 mode)
 
                flags |= PIPE_CONTROL_CS_STALL;
 
-               if (engine->class == COMPUTE_CLASS)
-                       flags &= ~PIPE_CONTROL_3D_FLAGS;
+               if (!HAS_3D_PIPELINE(engine->i915))
+                       flags &= ~PIPE_CONTROL_3D_ARCH_FLAGS;
+               else if (engine->class == COMPUTE_CLASS)
+                       flags &= ~PIPE_CONTROL_3D_ENGINE_FLAGS;
 
                if (!HAS_FLAT_CCS(rq->engine->i915))
                        count = 8 + 4;
@@ -272,7 +276,8 @@ int gen12_emit_flush_xcs(struct i915_request *rq, u32 mode)
                if (!HAS_FLAT_CCS(rq->engine->i915) &&
                    (rq->engine->class == VIDEO_DECODE_CLASS ||
                     rq->engine->class == VIDEO_ENHANCEMENT_CLASS)) {
-                       aux_inv = rq->engine->mask & ~BIT(BCS0);
+                       aux_inv = rq->engine->mask &
+                               ~GENMASK(_BCS(I915_MAX_BCS - 1), BCS0);
                        if (aux_inv)
                                cmd += 4;
                }
@@ -716,8 +721,10 @@ u32 *gen12_emit_fini_breadcrumb_rcs(struct i915_request *rq, u32 *cs)
                /* Wa_1409600907 */
                flags |= PIPE_CONTROL_DEPTH_STALL;
 
-       if (rq->engine->class == COMPUTE_CLASS)
-               flags &= ~PIPE_CONTROL_3D_FLAGS;
+       if (!HAS_3D_PIPELINE(rq->engine->i915))
+               flags &= ~PIPE_CONTROL_3D_ARCH_FLAGS;
+       else if (rq->engine->class == COMPUTE_CLASS)
+               flags &= ~PIPE_CONTROL_3D_ENGINE_FLAGS;
 
        cs = gen12_emit_ggtt_write_rcs(cs,
                                       rq->fence.seqno,
index 4070cb5711d88f672ee3505d98177d74cc0af98c..654a092ed3d69c547704799a877b8d6095343670 100644 (file)
@@ -601,6 +601,30 @@ u64 intel_context_get_avg_runtime_ns(struct intel_context *ce)
        return avg;
 }
 
+bool intel_context_ban(struct intel_context *ce, struct i915_request *rq)
+{
+       bool ret = intel_context_set_banned(ce);
+
+       trace_intel_context_ban(ce);
+
+       if (ce->ops->revoke)
+               ce->ops->revoke(ce, rq,
+                               INTEL_CONTEXT_BANNED_PREEMPT_TIMEOUT_MS);
+
+       return ret;
+}
+
+bool intel_context_exit_nonpersistent(struct intel_context *ce,
+                                     struct i915_request *rq)
+{
+       bool ret = intel_context_set_exiting(ce);
+
+       if (ce->ops->revoke)
+               ce->ops->revoke(ce, rq, ce->engine->props.preempt_timeout_ms);
+
+       return ret;
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftest_context.c"
 #endif
index b7d3214d2cdd8a4d933308159101797720ecbf35..8e2d70630c49e82c456bf9499b0da5deb952ceda 100644 (file)
@@ -25,6 +25,8 @@
                     ##__VA_ARGS__);                                    \
 } while (0)
 
+#define INTEL_CONTEXT_BANNED_PREEMPT_TIMEOUT_MS (1)
+
 struct i915_gem_ww_ctx;
 
 void intel_context_init(struct intel_context *ce,
@@ -309,18 +311,27 @@ static inline bool intel_context_set_banned(struct intel_context *ce)
        return test_and_set_bit(CONTEXT_BANNED, &ce->flags);
 }
 
-static inline bool intel_context_ban(struct intel_context *ce,
-                                    struct i915_request *rq)
+bool intel_context_ban(struct intel_context *ce, struct i915_request *rq);
+
+static inline bool intel_context_is_schedulable(const struct intel_context *ce)
 {
-       bool ret = intel_context_set_banned(ce);
+       return !test_bit(CONTEXT_EXITING, &ce->flags) &&
+              !test_bit(CONTEXT_BANNED, &ce->flags);
+}
 
-       trace_intel_context_ban(ce);
-       if (ce->ops->ban)
-               ce->ops->ban(ce, rq);
+static inline bool intel_context_is_exiting(const struct intel_context *ce)
+{
+       return test_bit(CONTEXT_EXITING, &ce->flags);
+}
 
-       return ret;
+static inline bool intel_context_set_exiting(struct intel_context *ce)
+{
+       return test_and_set_bit(CONTEXT_EXITING, &ce->flags);
 }
 
+bool intel_context_exit_nonpersistent(struct intel_context *ce,
+                                     struct i915_request *rq);
+
 static inline bool
 intel_context_force_single_submission(const struct intel_context *ce)
 {
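With CONTEXT_EXITING added, a context is schedulable only when it is neither banned nor exiting. The following is a simplified, single-threaded sketch of that flag logic. It assumes a non-atomic stand-in for test_and_set_bit(), and the bit position used for the banned flag is an assumption, since only CONTEXT_EXITING (13) appears in this diff. All sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_CONTEXT_BANNED	6	/* assumed bit position, not shown in this diff */
#define SKETCH_CONTEXT_EXITING	13	/* matches CONTEXT_EXITING in this diff */

struct sketch_context {
	unsigned long flags;
};

/* Non-atomic stand-in for test_and_set_bit(): returns the previous bit value. */
static bool sketch_test_and_set_bit(unsigned int bit, unsigned long *flags)
{
	unsigned long mask;
	bool old;

	assert(bit < 8 * sizeof(*flags));
	mask = 1UL << bit;
	old = (*flags & mask) != 0;
	*flags |= mask;

	return old;
}

/* Mirrors intel_context_is_schedulable(): neither exiting nor banned. */
static bool sketch_is_schedulable(const struct sketch_context *ce)
{
	return !(ce->flags & (1UL << SKETCH_CONTEXT_EXITING)) &&
	       !(ce->flags & (1UL << SKETCH_CONTEXT_BANNED));
}
```

The return value of the set helper reports whether the flag was already set, which is what intel_context_ban() and intel_context_exit_nonpersistent() hand back to their callers in the diff.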
index 09f82545789f194b305cbadd42305d82d66b22d4..d2d75d9c0c8dd5e832f5e0015da9bdff5360a266 100644 (file)
@@ -40,7 +40,8 @@ struct intel_context_ops {
 
        int (*alloc)(struct intel_context *ce);
 
-       void (*ban)(struct intel_context *ce, struct i915_request *rq);
+       void (*revoke)(struct intel_context *ce, struct i915_request *rq,
+                      unsigned int preempt_timeout_ms);
 
        int (*pre_pin)(struct intel_context *ce, struct i915_gem_ww_ctx *ww, void **vaddr);
        int (*pin)(struct intel_context *ce, void *vaddr);
@@ -122,6 +123,7 @@ struct intel_context {
 #define CONTEXT_GUC_INIT               10
 #define CONTEXT_PERMA_PIN              11
 #define CONTEXT_IS_PARKING             12
+#define CONTEXT_EXITING                        13
 
        struct {
                u64 timeout_us;
index 1431f1e9dbee76b475ce4c93f96ae1dc3210fd6c..04e435bce79bdfc731b0f4bf834e3bd06381e2da 100644 (file)
@@ -201,6 +201,8 @@ int intel_ring_submission_setup(struct intel_engine_cs *engine);
 int intel_engine_stop_cs(struct intel_engine_cs *engine);
 void intel_engine_cancel_stop_cs(struct intel_engine_cs *engine);
 
+void intel_engine_wait_for_pending_mi_fw(struct intel_engine_cs *engine);
+
 void intel_engine_set_hwsp_writemask(struct intel_engine_cs *engine, u32 mask);
 
 u64 intel_engine_get_active_head(const struct intel_engine_cs *engine);
index 14c6ddbbfde8b65586da0030669db5fa266cac01..283870c659911b8b8cbf8f73f9bb5e6789663b45 100644 (file)
@@ -21,8 +21,9 @@
 #include "intel_engine_user.h"
 #include "intel_execlists_submission.h"
 #include "intel_gt.h"
-#include "intel_gt_requests.h"
+#include "intel_gt_mcr.h"
 #include "intel_gt_pm.h"
+#include "intel_gt_requests.h"
 #include "intel_lrc.h"
 #include "intel_lrc_reg.h"
 #include "intel_reset.h"
@@ -71,6 +72,62 @@ static const struct engine_info intel_engines[] = {
                        { .graphics_ver = 6, .base = BLT_RING_BASE }
                },
        },
+       [BCS1] = {
+               .class = COPY_ENGINE_CLASS,
+               .instance = 1,
+               .mmio_bases = {
+                       { .graphics_ver = 12, .base = XEHPC_BCS1_RING_BASE }
+               },
+       },
+       [BCS2] = {
+               .class = COPY_ENGINE_CLASS,
+               .instance = 2,
+               .mmio_bases = {
+                       { .graphics_ver = 12, .base = XEHPC_BCS2_RING_BASE }
+               },
+       },
+       [BCS3] = {
+               .class = COPY_ENGINE_CLASS,
+               .instance = 3,
+               .mmio_bases = {
+                       { .graphics_ver = 12, .base = XEHPC_BCS3_RING_BASE }
+               },
+       },
+       [BCS4] = {
+               .class = COPY_ENGINE_CLASS,
+               .instance = 4,
+               .mmio_bases = {
+                       { .graphics_ver = 12, .base = XEHPC_BCS4_RING_BASE }
+               },
+       },
+       [BCS5] = {
+               .class = COPY_ENGINE_CLASS,
+               .instance = 5,
+               .mmio_bases = {
+                       { .graphics_ver = 12, .base = XEHPC_BCS5_RING_BASE }
+               },
+       },
+       [BCS6] = {
+               .class = COPY_ENGINE_CLASS,
+               .instance = 6,
+               .mmio_bases = {
+                       { .graphics_ver = 12, .base = XEHPC_BCS6_RING_BASE }
+               },
+       },
+       [BCS7] = {
+               .class = COPY_ENGINE_CLASS,
+               .instance = 7,
+               .mmio_bases = {
+                       { .graphics_ver = 12, .base = XEHPC_BCS7_RING_BASE }
+               },
+       },
+       [BCS8] = {
+               .class = COPY_ENGINE_CLASS,
+               .instance = 8,
+               .mmio_bases = {
+                       { .graphics_ver = 12, .base = XEHPC_BCS8_RING_BASE }
+               },
+       },
        [VCS0] = {
                .class = VIDEO_DECODE_CLASS,
                .instance = 0,
@@ -334,6 +391,14 @@ static u32 get_reset_domain(u8 ver, enum intel_engine_id id)
                static const u32 engine_reset_domains[] = {
                        [RCS0]  = GEN11_GRDOM_RENDER,
                        [BCS0]  = GEN11_GRDOM_BLT,
+                       [BCS1]  = XEHPC_GRDOM_BLT1,
+                       [BCS2]  = XEHPC_GRDOM_BLT2,
+                       [BCS3]  = XEHPC_GRDOM_BLT3,
+                       [BCS4]  = XEHPC_GRDOM_BLT4,
+                       [BCS5]  = XEHPC_GRDOM_BLT5,
+                       [BCS6]  = XEHPC_GRDOM_BLT6,
+                       [BCS7]  = XEHPC_GRDOM_BLT7,
+                       [BCS8]  = XEHPC_GRDOM_BLT8,
                        [VCS0]  = GEN11_GRDOM_MEDIA,
                        [VCS1]  = GEN11_GRDOM_MEDIA2,
                        [VCS2]  = GEN11_GRDOM_MEDIA3,
@@ -610,8 +675,8 @@ static void engine_mask_apply_compute_fuses(struct intel_gt *gt)
        if (GRAPHICS_VER_FULL(i915) < IP_VER(12, 50))
                return;
 
-       ccs_mask = intel_slicemask_from_dssmask(intel_sseu_get_compute_subslices(&info->sseu),
-                                               ss_per_ccs);
+       ccs_mask = intel_slicemask_from_xehp_dssmask(info->sseu.compute_subslice_mask,
+                                                    ss_per_ccs);
        /*
         * If all DSS in a quadrant are fused off, the corresponding CCS
         * engine is not available for use.
@@ -622,6 +687,34 @@ static void engine_mask_apply_compute_fuses(struct intel_gt *gt)
        }
 }
 
+static void engine_mask_apply_copy_fuses(struct intel_gt *gt)
+{
+       struct drm_i915_private *i915 = gt->i915;
+       struct intel_gt_info *info = &gt->info;
+       unsigned long meml3_mask;
+       unsigned long quad;
+
+       meml3_mask = intel_uncore_read(gt->uncore, GEN10_MIRROR_FUSE3);
+       meml3_mask = REG_FIELD_GET(GEN12_MEML3_EN_MASK, meml3_mask);
+
+       /*
+        * Link Copy engines may be fused off according to meml3_mask. Each
+        * bit is a quad that houses two Link Copy and two Sub Copy engines.
+        */
+       for_each_clear_bit(quad, &meml3_mask, GEN12_MAX_MSLICES) {
+               unsigned int instance = quad * 2 + 1;
+               intel_engine_mask_t mask = GENMASK(_BCS(instance + 1),
+                                                  _BCS(instance));
+
+               if (mask & info->engine_mask) {
+                       drm_dbg(&i915->drm, "bcs%u fused off\n", instance);
+                       drm_dbg(&i915->drm, "bcs%u fused off\n", instance + 1);
+
+                       info->engine_mask &= ~mask;
+               }
+       }
+}
+
 /*
  * Determine which engines are fused off in our particular hardware.
  * Note that we have a catch-22 situation where we need to be able to access
@@ -704,6 +797,7 @@ static intel_engine_mask_t init_engine_mask(struct intel_gt *gt)
        GEM_BUG_ON(vebox_mask != VEBOX_MASK(gt));
 
        engine_mask_apply_compute_fuses(gt);
+       engine_mask_apply_copy_fuses(gt);
 
        return info->engine_mask;
 }
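The quad-to-engine mapping in engine_mask_apply_copy_fuses() can be illustrated with a standalone sketch. It assumes the engine id layout from this series (RCS0 = 0, then BCS0..BCS8), a local 32-bit stand-in for the kernel's GENMASK(), and four quads for GEN12_MAX_MSLICES; that last value is not shown in this diff. All SKETCH_* and sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Engine ids in the order used by enum intel_engine_id in this series. */
enum sketch_engine_id {
	SKETCH_RCS0 = 0,
	SKETCH_BCS0,	/* BCS1..BCS8 follow contiguously */
};

#define SKETCH_BCS(n)		(SKETCH_BCS0 + (n))
#define SKETCH_MAX_MSLICES	4	/* assumed value of GEN12_MAX_MSLICES */

/* Local stand-in for GENMASK(h, l), valid for 0 <= l <= h < 32. */
#define SKETCH_GENMASK(h, l)	((~0u >> (31 - (h))) & (~0u << (l)))

/*
 * For every quad whose bit is clear in meml3_mask, drop the two copy
 * engines housed in that quad: instances quad * 2 + 1 and quad * 2 + 2.
 * BCS0 is never affected.
 */
static uint32_t sketch_apply_copy_fuses(uint32_t engine_mask, uint32_t meml3_mask)
{
	unsigned int quad;

	for (quad = 0; quad < SKETCH_MAX_MSLICES; quad++) {
		unsigned int instance = quad * 2 + 1;

		if (meml3_mask & (1u << quad))
			continue;

		assert(SKETCH_BCS(instance + 1) < 32);
		engine_mask &= ~SKETCH_GENMASK(SKETCH_BCS(instance + 1),
					       SKETCH_BCS(instance));
	}

	return engine_mask;
}
```

With all nine copy engines present (mask 0x3fe) and only quad 0 fused off (meml3_mask 0xe), the sketch clears BCS1 and BCS2 and leaves 0x3f2.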
@@ -1282,10 +1376,10 @@ static int __intel_engine_stop_cs(struct intel_engine_cs *engine,
        intel_uncore_write_fw(uncore, mode, _MASKED_BIT_ENABLE(STOP_RING));
 
        /*
-        * Wa_22011802037 : gen12, Prior to doing a reset, ensure CS is
+        * Wa_22011802037 : gen11, gen12, Prior to doing a reset, ensure CS is
         * stopped, set ring stop bit and prefetch disable bit to halt CS
         */
-       if (GRAPHICS_VER(engine->i915) == 12)
+       if (IS_GRAPHICS_VER(engine->i915, 11, 12))
                intel_uncore_write_fw(uncore, RING_MODE_GEN7(engine->mmio_base),
                                      _MASKED_BIT_ENABLE(GEN12_GFX_PREFETCH_DISABLE));
 
@@ -1308,6 +1402,18 @@ int intel_engine_stop_cs(struct intel_engine_cs *engine)
                return -ENODEV;
 
        ENGINE_TRACE(engine, "\n");
+       /*
+        * TODO: Find out why occasionally stopping the CS times out. Seen
+        * especially with gem_eio tests.
+        *
+        * Stopping the CS occasionally times out, but this does not adversely
+        * affect functionality. The timeout is a config parameter that
+        * defaults to 100ms. In most cases the follow-up operation is to wait
+        * for pending MI_FORCE_WAKEUPs. The assumption is that this timeout is
+        * sufficient for any pending MI_FORCE_WAKEUPs to complete. Once the
+        * timeout is root-caused, callers must check and handle the return
+        * value of this function.
+        */
        if (__intel_engine_stop_cs(engine, 1000, stop_timeout(engine))) {
                ENGINE_TRACE(engine,
                             "timed out on STOP_RING -> IDLE; HEAD:%04x, TAIL:%04x\n",
@@ -1334,12 +1440,76 @@ void intel_engine_cancel_stop_cs(struct intel_engine_cs *engine)
        ENGINE_WRITE_FW(engine, RING_MI_MODE, _MASKED_BIT_DISABLE(STOP_RING));
 }
 
-static u32
-read_subslice_reg(const struct intel_engine_cs *engine,
-                 int slice, int subslice, i915_reg_t reg)
+static u32 __cs_pending_mi_force_wakes(struct intel_engine_cs *engine)
+{
+       static const i915_reg_t _reg[I915_NUM_ENGINES] = {
+               [RCS0] = MSG_IDLE_CS,
+               [BCS0] = MSG_IDLE_BCS,
+               [VCS0] = MSG_IDLE_VCS0,
+               [VCS1] = MSG_IDLE_VCS1,
+               [VCS2] = MSG_IDLE_VCS2,
+               [VCS3] = MSG_IDLE_VCS3,
+               [VCS4] = MSG_IDLE_VCS4,
+               [VCS5] = MSG_IDLE_VCS5,
+               [VCS6] = MSG_IDLE_VCS6,
+               [VCS7] = MSG_IDLE_VCS7,
+               [VECS0] = MSG_IDLE_VECS0,
+               [VECS1] = MSG_IDLE_VECS1,
+               [VECS2] = MSG_IDLE_VECS2,
+               [VECS3] = MSG_IDLE_VECS3,
+               [CCS0] = MSG_IDLE_CS,
+               [CCS1] = MSG_IDLE_CS,
+               [CCS2] = MSG_IDLE_CS,
+               [CCS3] = MSG_IDLE_CS,
+       };
+       u32 val;
+
+       if (!_reg[engine->id].reg) {
+               drm_err(&engine->i915->drm,
+                       "MSG IDLE undefined for engine id %u\n", engine->id);
+               return 0;
+       }
+
+       val = intel_uncore_read(engine->uncore, _reg[engine->id]);
+
+       /* bits[29:25] & bits[13:9] >> shift */
+       return (val & (val >> 16) & MSG_IDLE_FW_MASK) >> MSG_IDLE_FW_SHIFT;
+}
+
+static void __gpm_wait_for_fw_complete(struct intel_gt *gt, u32 fw_mask)
 {
-       return intel_uncore_read_with_mcr_steering(engine->uncore, reg,
-                                                  slice, subslice);
+       int ret;
+
+       /* Ensure GPM receives fw up/down after CS is stopped */
+       udelay(1);
+
+       /* Wait for forcewake request to complete in GPM */
+       ret = __intel_wait_for_register_fw(gt->uncore,
+                                          GEN9_PWRGT_DOMAIN_STATUS,
+                                          fw_mask, fw_mask, 5000, 0, NULL);
+
+       /* Ensure CS receives fw ack from GPM */
+       udelay(1);
+
+       if (ret)
+               GT_TRACE(gt, "Failed to complete pending forcewake %d\n", ret);
+}
+
+/*
+ * Wa_22011802037:gen12: In addition to stopping the cs, we need to wait for any
+ * pending MI_FORCE_WAKEUP requests that the CS has initiated to complete. The
+ * pending status is indicated by bits[13:9] (masked by bits[29:25]) in the
+ * MSG_IDLE register. There's one MSG_IDLE register per reset domain. Since we
+ * are concerned only with the gt reset here, we use a logical OR of pending
+ * forcewakeups from all reset domains and then wait for them to complete by
+ * querying PWRGT_DOMAIN_STATUS.
+ */
+void intel_engine_wait_for_pending_mi_fw(struct intel_engine_cs *engine)
+{
+       u32 fw_pending = __cs_pending_mi_force_wakes(engine);
+
+       if (fw_pending)
+               __gpm_wait_for_fw_complete(engine->gt, fw_pending);
 }
 
 /* NB: please notice the memset */
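The bit manipulation in __cs_pending_mi_force_wakes() is easier to follow with concrete values. The sketch below assumes the field layout stated in the comment: pending bits in [13:9], qualified by mask bits in [29:25]. The constants are derived from that comment rather than copied from the register header, and the SKETCH_* and sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed from the comment "bits[29:25] & bits[13:9] >> shift". */
#define SKETCH_MSG_IDLE_FW_MASK		0x3e00u	/* bits 13:9 */
#define SKETCH_MSG_IDLE_FW_SHIFT	9

/*
 * A pending forcewake is reported only when its status bit (13:9) and the
 * matching mask bit (29:25) are both set. Shifting right by 16 lines the
 * mask bits up with the status bits before the AND.
 */
static uint32_t sketch_pending_mi_force_wakes(uint32_t val)
{
	uint32_t pending;

	pending = (val & (val >> 16) & SKETCH_MSG_IDLE_FW_MASK) >>
		  SKETCH_MSG_IDLE_FW_SHIFT;

	/* The result is a 5-bit field. */
	assert(pending <= 0x1fu);

	return pending;
}
```

A status bit without its mask bit yields zero, so intel_engine_wait_for_pending_mi_fw() would skip the wait in that case.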
@@ -1375,28 +1545,33 @@ void intel_engine_get_instdone(const struct intel_engine_cs *engine,
                if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50)) {
                        for_each_instdone_gslice_dss_xehp(i915, sseu, iter, slice, subslice) {
                                instdone->sampler[slice][subslice] =
-                                       read_subslice_reg(engine, slice, subslice,
-                                                         GEN7_SAMPLER_INSTDONE);
+                                       intel_gt_mcr_read(engine->gt,
+                                                         GEN7_SAMPLER_INSTDONE,
+                                                         slice, subslice);
                                instdone->row[slice][subslice] =
-                                       read_subslice_reg(engine, slice, subslice,
-                                                         GEN7_ROW_INSTDONE);
+                                       intel_gt_mcr_read(engine->gt,
+                                                         GEN7_ROW_INSTDONE,
+                                                         slice, subslice);
                        }
                } else {
                        for_each_instdone_slice_subslice(i915, sseu, slice, subslice) {
                                instdone->sampler[slice][subslice] =
-                                       read_subslice_reg(engine, slice, subslice,
-                                                         GEN7_SAMPLER_INSTDONE);
+                                       intel_gt_mcr_read(engine->gt,
+                                                         GEN7_SAMPLER_INSTDONE,
+                                                         slice, subslice);
                                instdone->row[slice][subslice] =
-                                       read_subslice_reg(engine, slice, subslice,
-                                                         GEN7_ROW_INSTDONE);
+                                       intel_gt_mcr_read(engine->gt,
+                                                         GEN7_ROW_INSTDONE,
+                                                         slice, subslice);
                        }
                }
 
                if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 55)) {
                        for_each_instdone_gslice_dss_xehp(i915, sseu, iter, slice, subslice)
                                instdone->geom_svg[slice][subslice] =
-                                       read_subslice_reg(engine, slice, subslice,
-                                                         XEHPG_INSTDONE_GEOM_SVG);
+                                       intel_gt_mcr_read(engine->gt,
+                                                         XEHPG_INSTDONE_GEOM_SVG,
+                                                         slice, subslice);
                }
        } else if (GRAPHICS_VER(i915) >= 7) {
                instdone->instdone =
index 75a0c55c5aa5dd134bfc237399f1ec267be0235d..889f0df3940b8f5c7c5628c9777faba4c9af94a6 100644 (file)
@@ -8,6 +8,7 @@
 
 #include "i915_reg_defs.h"
 
+#define RING_EXCC(base)                                _MMIO((base) + 0x28)
 #define RING_TAIL(base)                                _MMIO((base) + 0x30)
 #define   TAIL_ADDR                            0x001FFFF8
 #define RING_HEAD(base)                                _MMIO((base) + 0x34)
                (REG_FIELD_PREP(BLIT_CCTL_DST_MOCS_MASK, (dst) << 1) | \
                 REG_FIELD_PREP(BLIT_CCTL_SRC_MOCS_MASK, (src) << 1))
 
+#define RING_CSCMDOP(base)                     _MMIO((base) + 0x20c)
+
 /*
  * CMD_CCTL read/write fields take a MOCS value and _not_ a table index.
  * The lsb of each can be considered a separate enabling bit for encryption.
                 REG_FIELD_PREP(CMD_CCTL_READ_OVERRIDE_MASK, (read) << 1))
 
 #define RING_PREDICATE_RESULT(base)            _MMIO((base) + 0x3b8) /* gen12+ */
+
 #define MI_PREDICATE_RESULT_2(base)            _MMIO((base) + 0x3bc)
 #define   LOWER_SLICE_ENABLED                  (1 << 0)
 #define   LOWER_SLICE_DISABLED                 (0 << 0)
 #define          CTX_CTRL_ENGINE_CTX_SAVE_INHIBIT      REG_BIT(2)
 #define          CTX_CTRL_INHIBIT_SYN_CTX_SWITCH       REG_BIT(3)
 #define          GEN12_CTX_CTRL_OAR_CONTEXT_ENABLE     REG_BIT(8)
+#define RING_CTX_SR_CTL(base)                  _MMIO((base) + 0x244)
 #define RING_SEMA_WAIT_POLL(base)              _MMIO((base) + 0x24c)
 #define GEN8_RING_PDP_UDW(base, n)             _MMIO((base) + 0x270 + (n) * 8 + 4)
 #define GEN8_RING_PDP_LDW(base, n)             _MMIO((base) + 0x270 + (n) * 8)
 #define RING_CTX_TIMESTAMP(base)               _MMIO((base) + 0x3a8) /* gen8+ */
 #define RING_PREDICATE_RESULT(base)            _MMIO((base) + 0x3b8)
 #define RING_FORCE_TO_NONPRIV(base, i)         _MMIO(((base) + 0x4D0) + (i) * 4)
+#define   RING_FORCE_TO_NONPRIV_DENY           REG_BIT(30)
 #define   RING_FORCE_TO_NONPRIV_ADDRESS_MASK   REG_GENMASK(25, 2)
 #define   RING_FORCE_TO_NONPRIV_ACCESS_RW      (0 << 28)    /* CFL+ & Gen11+ */
 #define   RING_FORCE_TO_NONPRIV_ACCESS_RD      (1 << 28)
 #define   RING_FORCE_TO_NONPRIV_RANGE_64       (3 << 0)
 #define   RING_FORCE_TO_NONPRIV_RANGE_MASK     (3 << 0)
 #define   RING_FORCE_TO_NONPRIV_MASK_VALID     \
-       (RING_FORCE_TO_NONPRIV_RANGE_MASK | RING_FORCE_TO_NONPRIV_ACCESS_MASK)
+       (RING_FORCE_TO_NONPRIV_RANGE_MASK | \
+        RING_FORCE_TO_NONPRIV_ACCESS_MASK | \
+        RING_FORCE_TO_NONPRIV_DENY)
 #define   RING_MAX_NONPRIV_SLOTS  12
 
 #define RING_EXECLIST_SQ_CONTENTS(base)                _MMIO((base) + 0x510)
index 298f2cc7a879f402fb6bf732179e1aab905ad5b6..2286f96f5f8772aff10d3d8c78e46f6fdd2f2a33 100644
@@ -35,7 +35,7 @@
 #define OTHER_CLASS            4
 #define COMPUTE_CLASS          5
 #define MAX_ENGINE_CLASS       5
-#define MAX_ENGINE_INSTANCE    7
+#define MAX_ENGINE_INSTANCE    8
 
 #define I915_MAX_SLICES        3
 #define I915_MAX_SUBSLICES 8
@@ -99,6 +99,7 @@ struct i915_ctx_workarounds {
 #define I915_MAX_SFC   (I915_MAX_VCS / 2)
 #define I915_MAX_CCS   4
 #define I915_MAX_RCS   1
+#define I915_MAX_BCS   9
 
 /*
  * Engine IDs definitions.
@@ -107,6 +108,15 @@ struct i915_ctx_workarounds {
 enum intel_engine_id {
        RCS0 = 0,
        BCS0,
+       BCS1,
+       BCS2,
+       BCS3,
+       BCS4,
+       BCS5,
+       BCS6,
+       BCS7,
+       BCS8,
+#define _BCS(n) (BCS0 + (n))
        VCS0,
        VCS1,
        VCS2,
index aa0d2bbbbcc41e3011a8baabf1098f00b912a136..4b909cb88cdfb7b7422bd5cc1629f536f9422032 100644
@@ -480,9 +480,9 @@ __execlists_schedule_in(struct i915_request *rq)
 
        if (unlikely(intel_context_is_closed(ce) &&
                     !intel_engine_has_heartbeat(engine)))
-               intel_context_set_banned(ce);
+               intel_context_set_exiting(ce);
 
-       if (unlikely(intel_context_is_banned(ce) || bad_request(rq)))
+       if (unlikely(!intel_context_is_schedulable(ce) || bad_request(rq)))
                reset_active(rq, engine);
 
        if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
@@ -661,6 +661,16 @@ static inline void execlists_schedule_out(struct i915_request *rq)
        i915_request_put(rq);
 }
 
+static u32 map_i915_prio_to_lrc_desc_prio(int prio)
+{
+       if (prio > I915_PRIORITY_NORMAL)
+               return GEN12_CTX_PRIORITY_HIGH;
+       else if (prio < I915_PRIORITY_NORMAL)
+               return GEN12_CTX_PRIORITY_LOW;
+       else
+               return GEN12_CTX_PRIORITY_NORMAL;
+}
+
 static u64 execlists_update_context(struct i915_request *rq)
 {
        struct intel_context *ce = rq->context;
@@ -669,7 +679,7 @@ static u64 execlists_update_context(struct i915_request *rq)
 
        desc = ce->lrc.desc;
        if (rq->engine->flags & I915_ENGINE_HAS_EU_PRIORITY)
-               desc |= lrc_desc_priority(rq_prio(rq));
+               desc |= map_i915_prio_to_lrc_desc_prio(rq_prio(rq));
 
        /*
         * WaIdleLiteRestore:bdw,skl
@@ -1233,7 +1243,7 @@ static unsigned long active_preempt_timeout(struct intel_engine_cs *engine,
 
        /* Force a fast reset for terminated contexts (ignoring sysfs!) */
        if (unlikely(intel_context_is_banned(rq->context) || bad_request(rq)))
-               return 1;
+               return INTEL_CONTEXT_BANNED_PREEMPT_TIMEOUT_MS;
 
        return READ_ONCE(engine->props.preempt_timeout_ms);
 }
@@ -2958,6 +2968,13 @@ static void execlists_reset_prepare(struct intel_engine_cs *engine)
        ring_set_paused(engine, 1);
        intel_engine_stop_cs(engine);
 
+       /*
+        * Wa_22011802037:gen11/gen12: In addition to stopping the cs, we need
+        * to wait for any pending mi force wakeups
+        */
+       if (IS_GRAPHICS_VER(engine->i915, 11, 12))
+               intel_engine_wait_for_pending_mi_fw(engine);
+
        engine->execlists.reset_ccid = active_ccid(engine);
 }
 
index e6b2eb122ad7ed60c5b6a0bb92dbaa7d9a86453c..15a915bb4088e9e7e5601688791c8c0043e84049 100644
@@ -3,16 +3,18 @@
  * Copyright © 2020 Intel Corporation
  */
 
-#include <linux/types.h>
 #include <asm/set_memory.h>
 #include <asm/smp.h>
+#include <linux/types.h>
+#include <linux/stop_machine.h>
 
 #include <drm/i915_drm.h>
+#include <drm/intel-gtt.h>
 
 #include "gem/i915_gem_lmem.h"
 
+#include "intel_ggtt_gmch.h"
 #include "intel_gt.h"
-#include "intel_gt_gmch.h"
 #include "intel_gt_regs.h"
 #include "i915_drv.h"
 #include "i915_scatterlist.h"
 #include "intel_gtt.h"
 #include "gen8_ppgtt.h"
 
+static inline bool suspend_retains_ptes(struct i915_address_space *vm)
+{
+       return GRAPHICS_VER(vm->i915) >= 8 &&
+               !HAS_LMEM(vm->i915) &&
+               vm->is_ggtt;
+}
+
 static void i915_ggtt_color_adjust(const struct drm_mm_node *node,
                                   unsigned long color,
                                   u64 *start,
@@ -93,6 +102,23 @@ int i915_ggtt_init_hw(struct drm_i915_private *i915)
        return 0;
 }
 
+/*
+ * Return the value of the last GGTT pte cast to an u64, if
+ * the system is supposed to retain ptes across resume. 0 otherwise.
+ */
+static u64 read_last_pte(struct i915_address_space *vm)
+{
+       struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
+       gen8_pte_t __iomem *ptep;
+
+       if (!suspend_retains_ptes(vm))
+               return 0;
+
+       GEM_BUG_ON(GRAPHICS_VER(vm->i915) < 8);
+       ptep = (typeof(ptep))ggtt->gsm + (ggtt_total_entries(ggtt) - 1);
+       return readq(ptep);
+}
+
 /**
  * i915_ggtt_suspend_vm - Suspend the memory mappings for a GGTT or DPT VM
  * @vm: The VM to suspend the mappings for
@@ -156,7 +182,10 @@ retry:
                i915_gem_object_unlock(obj);
        }
 
-       vm->clear_range(vm, 0, vm->total);
+       if (!suspend_retains_ptes(vm))
+               vm->clear_range(vm, 0, vm->total);
+       else
+               i915_vm_to_ggtt(vm)->probed_pte = read_last_pte(vm);
 
        vm->skip_pte_rewrite = save_skip_rewrite;
 
@@ -181,7 +210,7 @@ void gen6_ggtt_invalidate(struct i915_ggtt *ggtt)
        spin_unlock_irq(&uncore->lock);
 }
 
-void gen8_ggtt_invalidate(struct i915_ggtt *ggtt)
+static void gen8_ggtt_invalidate(struct i915_ggtt *ggtt)
 {
        struct intel_uncore *uncore = ggtt->vm.gt->uncore;
 
@@ -218,11 +247,232 @@ u64 gen8_ggtt_pte_encode(dma_addr_t addr,
        return pte;
 }
 
+static void gen8_set_pte(void __iomem *addr, gen8_pte_t pte)
+{
+       writeq(pte, addr);
+}
+
+static void gen8_ggtt_insert_page(struct i915_address_space *vm,
+                                 dma_addr_t addr,
+                                 u64 offset,
+                                 enum i915_cache_level level,
+                                 u32 flags)
+{
+       struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
+       gen8_pte_t __iomem *pte =
+               (gen8_pte_t __iomem *)ggtt->gsm + offset / I915_GTT_PAGE_SIZE;
+
+       gen8_set_pte(pte, gen8_ggtt_pte_encode(addr, level, flags));
+
+       ggtt->invalidate(ggtt);
+}
+
+static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
+                                    struct i915_vma_resource *vma_res,
+                                    enum i915_cache_level level,
+                                    u32 flags)
+{
+       const gen8_pte_t pte_encode = gen8_ggtt_pte_encode(0, level, flags);
+       struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
+       gen8_pte_t __iomem *gte;
+       gen8_pte_t __iomem *end;
+       struct sgt_iter iter;
+       dma_addr_t addr;
+
+       /*
+        * Note that we ignore PTE_READ_ONLY here. The caller must be careful
+        * not to allow the user to override access to a read only page.
+        */
+
+       gte = (gen8_pte_t __iomem *)ggtt->gsm;
+       gte += vma_res->start / I915_GTT_PAGE_SIZE;
+       end = gte + vma_res->node_size / I915_GTT_PAGE_SIZE;
+
+       for_each_sgt_daddr(addr, iter, vma_res->bi.pages)
+               gen8_set_pte(gte++, pte_encode | addr);
+       GEM_BUG_ON(gte > end);
+
+       /* Fill the allocated but "unused" space beyond the end of the buffer */
+       while (gte < end)
+               gen8_set_pte(gte++, vm->scratch[0]->encode);
+
+       /*
+        * We want to flush the TLBs only after we're certain all the PTE
+        * updates have finished.
+        */
+       ggtt->invalidate(ggtt);
+}
+
+static void gen6_ggtt_insert_page(struct i915_address_space *vm,
+                                 dma_addr_t addr,
+                                 u64 offset,
+                                 enum i915_cache_level level,
+                                 u32 flags)
+{
+       struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
+       gen6_pte_t __iomem *pte =
+               (gen6_pte_t __iomem *)ggtt->gsm + offset / I915_GTT_PAGE_SIZE;
+
+       iowrite32(vm->pte_encode(addr, level, flags), pte);
+
+       ggtt->invalidate(ggtt);
+}
+
+/*
+ * Binds an object into the global gtt with the specified cache level.
+ * The object will be accessible to the GPU via commands whose operands
+ * reference offsets within the global GTT as well as accessible by the GPU
+ * through the GMADR mapped BAR (i915->mm.gtt->gtt).
+ */
+static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
+                                    struct i915_vma_resource *vma_res,
+                                    enum i915_cache_level level,
+                                    u32 flags)
+{
+       struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
+       gen6_pte_t __iomem *gte;
+       gen6_pte_t __iomem *end;
+       struct sgt_iter iter;
+       dma_addr_t addr;
+
+       gte = (gen6_pte_t __iomem *)ggtt->gsm;
+       gte += vma_res->start / I915_GTT_PAGE_SIZE;
+       end = gte + vma_res->node_size / I915_GTT_PAGE_SIZE;
+
+       for_each_sgt_daddr(addr, iter, vma_res->bi.pages)
+               iowrite32(vm->pte_encode(addr, level, flags), gte++);
+       GEM_BUG_ON(gte > end);
+
+       /* Fill the allocated but "unused" space beyond the end of the buffer */
+       while (gte < end)
+               iowrite32(vm->scratch[0]->encode, gte++);
+
+       /*
+        * We want to flush the TLBs only after we're certain all the PTE
+        * updates have finished.
+        */
+       ggtt->invalidate(ggtt);
+}
+
+static void nop_clear_range(struct i915_address_space *vm,
+                           u64 start, u64 length)
+{
+}
+
+static void gen8_ggtt_clear_range(struct i915_address_space *vm,
+                                 u64 start, u64 length)
+{
+       struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
+       unsigned int first_entry = start / I915_GTT_PAGE_SIZE;
+       unsigned int num_entries = length / I915_GTT_PAGE_SIZE;
+       const gen8_pte_t scratch_pte = vm->scratch[0]->encode;
+       gen8_pte_t __iomem *gtt_base =
+               (gen8_pte_t __iomem *)ggtt->gsm + first_entry;
+       const int max_entries = ggtt_total_entries(ggtt) - first_entry;
+       int i;
+
+       if (WARN(num_entries > max_entries,
+                "First entry = %d; Num entries = %d (max=%d)\n",
+                first_entry, num_entries, max_entries))
+               num_entries = max_entries;
+
+       for (i = 0; i < num_entries; i++)
+               gen8_set_pte(&gtt_base[i], scratch_pte);
+}
+
+static void bxt_vtd_ggtt_wa(struct i915_address_space *vm)
+{
+       /*
+        * Make sure the internal GAM fifo has been cleared of all GTT
+        * writes before exiting stop_machine(). This guarantees that
+        * any aperture accesses waiting to start in another process
+        * cannot back up behind the GTT writes causing a hang.
+        * The register can be any arbitrary GAM register.
+        */
+       intel_uncore_posting_read_fw(vm->gt->uncore, GFX_FLSH_CNTL_GEN6);
+}
+
+struct insert_page {
+       struct i915_address_space *vm;
+       dma_addr_t addr;
+       u64 offset;
+       enum i915_cache_level level;
+};
+
+static int bxt_vtd_ggtt_insert_page__cb(void *_arg)
+{
+       struct insert_page *arg = _arg;
+
+       gen8_ggtt_insert_page(arg->vm, arg->addr, arg->offset, arg->level, 0);
+       bxt_vtd_ggtt_wa(arg->vm);
+
+       return 0;
+}
+
+static void bxt_vtd_ggtt_insert_page__BKL(struct i915_address_space *vm,
+                                         dma_addr_t addr,
+                                         u64 offset,
+                                         enum i915_cache_level level,
+                                         u32 unused)
+{
+       struct insert_page arg = { vm, addr, offset, level };
+
+       stop_machine(bxt_vtd_ggtt_insert_page__cb, &arg, NULL);
+}
+
+struct insert_entries {
+       struct i915_address_space *vm;
+       struct i915_vma_resource *vma_res;
+       enum i915_cache_level level;
+       u32 flags;
+};
+
+static int bxt_vtd_ggtt_insert_entries__cb(void *_arg)
+{
+       struct insert_entries *arg = _arg;
+
+       gen8_ggtt_insert_entries(arg->vm, arg->vma_res, arg->level, arg->flags);
+       bxt_vtd_ggtt_wa(arg->vm);
+
+       return 0;
+}
+
+static void bxt_vtd_ggtt_insert_entries__BKL(struct i915_address_space *vm,
+                                            struct i915_vma_resource *vma_res,
+                                            enum i915_cache_level level,
+                                            u32 flags)
+{
+       struct insert_entries arg = { vm, vma_res, level, flags };
+
+       stop_machine(bxt_vtd_ggtt_insert_entries__cb, &arg, NULL);
+}
+
+static void gen6_ggtt_clear_range(struct i915_address_space *vm,
+                                 u64 start, u64 length)
+{
+       struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
+       unsigned int first_entry = start / I915_GTT_PAGE_SIZE;
+       unsigned int num_entries = length / I915_GTT_PAGE_SIZE;
+       gen6_pte_t scratch_pte, __iomem *gtt_base =
+               (gen6_pte_t __iomem *)ggtt->gsm + first_entry;
+       const int max_entries = ggtt_total_entries(ggtt) - first_entry;
+       int i;
+
+       if (WARN(num_entries > max_entries,
+                "First entry = %d; Num entries = %d (max=%d)\n",
+                first_entry, num_entries, max_entries))
+               num_entries = max_entries;
+
+       scratch_pte = vm->scratch[0]->encode;
+       for (i = 0; i < num_entries; i++)
+               iowrite32(scratch_pte, &gtt_base[i]);
+}
+
 void intel_ggtt_bind_vma(struct i915_address_space *vm,
-                         struct i915_vm_pt_stash *stash,
-                         struct i915_vma_resource *vma_res,
-                         enum i915_cache_level cache_level,
-                         u32 flags)
+                        struct i915_vm_pt_stash *stash,
+                        struct i915_vma_resource *vma_res,
+                        enum i915_cache_level cache_level,
+                        u32 flags)
 {
        u32 pte_flags;
 
@@ -243,7 +493,7 @@ void intel_ggtt_bind_vma(struct i915_address_space *vm,
 }
 
 void intel_ggtt_unbind_vma(struct i915_address_space *vm,
-                           struct i915_vma_resource *vma_res)
+                          struct i915_vma_resource *vma_res)
 {
        vm->clear_range(vm, vma_res->start, vma_res->vma_size);
 }
@@ -299,6 +549,8 @@ static int init_ggtt(struct i915_ggtt *ggtt)
        struct drm_mm_node *entry;
        int ret;
 
+       ggtt->pte_lost = true;
+
        /*
         * GuC requires all resources that we're sharing with it to be placed in
         * non-WOPCM memory. If GuC is not present or not in use we still need a
@@ -560,12 +812,326 @@ void i915_ggtt_driver_late_release(struct drm_i915_private *i915)
        dma_resv_fini(&ggtt->vm._resv);
 }
 
-struct resource intel_pci_resource(struct pci_dev *pdev, int bar)
+static unsigned int gen6_get_total_gtt_size(u16 snb_gmch_ctl)
+{
+       snb_gmch_ctl >>= SNB_GMCH_GGMS_SHIFT;
+       snb_gmch_ctl &= SNB_GMCH_GGMS_MASK;
+       return snb_gmch_ctl << 20;
+}
+
+static unsigned int gen8_get_total_gtt_size(u16 bdw_gmch_ctl)
+{
+       bdw_gmch_ctl >>= BDW_GMCH_GGMS_SHIFT;
+       bdw_gmch_ctl &= BDW_GMCH_GGMS_MASK;
+       if (bdw_gmch_ctl)
+               bdw_gmch_ctl = 1 << bdw_gmch_ctl;
+
+#ifdef CONFIG_X86_32
+       /* Limit 32b platforms to a 2GB GGTT: 4 << 20 / pte size * I915_GTT_PAGE_SIZE */
+       if (bdw_gmch_ctl > 4)
+               bdw_gmch_ctl = 4;
+#endif
+
+       return bdw_gmch_ctl << 20;
+}
+
+static unsigned int chv_get_total_gtt_size(u16 gmch_ctrl)
+{
+       gmch_ctrl >>= SNB_GMCH_GGMS_SHIFT;
+       gmch_ctrl &= SNB_GMCH_GGMS_MASK;
+
+       if (gmch_ctrl)
+               return 1 << (20 + gmch_ctrl);
+
+       return 0;
+}
+
+static unsigned int gen6_gttmmadr_size(struct drm_i915_private *i915)
+{
+       /*
+        * GEN6: GTTMMADR size is 4MB and GTTADR starts at 2MB offset
+        * GEN8: GTTMMADR size is 16MB and GTTADR starts at 8MB offset
+        */
+       GEM_BUG_ON(GRAPHICS_VER(i915) < 6);
+       return (GRAPHICS_VER(i915) < 8) ? SZ_4M : SZ_16M;
+}
+
+static unsigned int gen6_gttadr_offset(struct drm_i915_private *i915)
+{
+       return gen6_gttmmadr_size(i915) / 2;
+}
+
+static int ggtt_probe_common(struct i915_ggtt *ggtt, u64 size)
+{
+       struct drm_i915_private *i915 = ggtt->vm.i915;
+       struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
+       phys_addr_t phys_addr;
+       u32 pte_flags;
+       int ret;
+
+       GEM_WARN_ON(pci_resource_len(pdev, 0) != gen6_gttmmadr_size(i915));
+       phys_addr = pci_resource_start(pdev, 0) + gen6_gttadr_offset(i915);
+
+       /*
+        * On BXT+/ICL+ writes larger than 64 bit to the GTT pagetable range
+        * will be dropped. For WC mappings in general we have 64 byte burst
+        * writes when the WC buffer is flushed, so we can't use it, but have to
+        * resort to an uncached mapping. The WC issue is easily caught by the
+        * readback check when writing GTT PTE entries.
+        */
+       if (IS_GEN9_LP(i915) || GRAPHICS_VER(i915) >= 11)
+               ggtt->gsm = ioremap(phys_addr, size);
+       else
+               ggtt->gsm = ioremap_wc(phys_addr, size);
+       if (!ggtt->gsm) {
+               drm_err(&i915->drm, "Failed to map the ggtt page table\n");
+               return -ENOMEM;
+       }
+
+       kref_init(&ggtt->vm.resv_ref);
+       ret = setup_scratch_page(&ggtt->vm);
+       if (ret) {
+               drm_err(&i915->drm, "Scratch setup failed\n");
+               /* iounmap will also get called at remove, but meh */
+               iounmap(ggtt->gsm);
+               return ret;
+       }
+
+       pte_flags = 0;
+       if (i915_gem_object_is_lmem(ggtt->vm.scratch[0]))
+               pte_flags |= PTE_LM;
+
+       ggtt->vm.scratch[0]->encode =
+               ggtt->vm.pte_encode(px_dma(ggtt->vm.scratch[0]),
+                                   I915_CACHE_NONE, pte_flags);
+
+       return 0;
+}
+
+static void gen6_gmch_remove(struct i915_address_space *vm)
+{
+       struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
+
+       iounmap(ggtt->gsm);
+       free_scratch(vm);
+}
+
+static struct resource pci_resource(struct pci_dev *pdev, int bar)
 {
        return (struct resource)DEFINE_RES_MEM(pci_resource_start(pdev, bar),
                                               pci_resource_len(pdev, bar));
 }
 
+static int gen8_gmch_probe(struct i915_ggtt *ggtt)
+{
+       struct drm_i915_private *i915 = ggtt->vm.i915;
+       struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
+       unsigned int size;
+       u16 snb_gmch_ctl;
+
+       if (!HAS_LMEM(i915)) {
+               ggtt->gmadr = pci_resource(pdev, 2);
+               ggtt->mappable_end = resource_size(&ggtt->gmadr);
+       }
+
+       pci_read_config_word(pdev, SNB_GMCH_CTRL, &snb_gmch_ctl);
+       if (IS_CHERRYVIEW(i915))
+               size = chv_get_total_gtt_size(snb_gmch_ctl);
+       else
+               size = gen8_get_total_gtt_size(snb_gmch_ctl);
+
+       ggtt->vm.alloc_pt_dma = alloc_pt_dma;
+       ggtt->vm.alloc_scratch_dma = alloc_pt_dma;
+       ggtt->vm.lmem_pt_obj_flags = I915_BO_ALLOC_PM_EARLY;
+
+       ggtt->vm.total = (size / sizeof(gen8_pte_t)) * I915_GTT_PAGE_SIZE;
+       ggtt->vm.cleanup = gen6_gmch_remove;
+       ggtt->vm.insert_page = gen8_ggtt_insert_page;
+       ggtt->vm.clear_range = nop_clear_range;
+       if (intel_scanout_needs_vtd_wa(i915))
+               ggtt->vm.clear_range = gen8_ggtt_clear_range;
+
+       ggtt->vm.insert_entries = gen8_ggtt_insert_entries;
+
+       /*
+        * Serialize GTT updates with aperture access on BXT if VT-d is on,
+        * and always on CHV.
+        */
+       if (intel_vm_no_concurrent_access_wa(i915)) {
+               ggtt->vm.insert_entries = bxt_vtd_ggtt_insert_entries__BKL;
+               ggtt->vm.insert_page    = bxt_vtd_ggtt_insert_page__BKL;
+
+               /*
+                * Calling stop_machine() version of GGTT update function
+                * at error capture/reset path will raise lockdep warning.
+                * Allow calling gen8_ggtt_insert_* directly at reset path
+                * which is safe from parallel GGTT updates.
+                */
+               ggtt->vm.raw_insert_page = gen8_ggtt_insert_page;
+               ggtt->vm.raw_insert_entries = gen8_ggtt_insert_entries;
+
+               ggtt->vm.bind_async_flags =
+                       I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND;
+       }
+
+       ggtt->invalidate = gen8_ggtt_invalidate;
+
+       ggtt->vm.vma_ops.bind_vma    = intel_ggtt_bind_vma;
+       ggtt->vm.vma_ops.unbind_vma  = intel_ggtt_unbind_vma;
+
+       ggtt->vm.pte_encode = gen8_ggtt_pte_encode;
+
+       setup_private_pat(ggtt->vm.gt->uncore);
+
+       return ggtt_probe_common(ggtt, size);
+}
+
+static u64 snb_pte_encode(dma_addr_t addr,
+                         enum i915_cache_level level,
+                         u32 flags)
+{
+       gen6_pte_t pte = GEN6_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
+
+       switch (level) {
+       case I915_CACHE_L3_LLC:
+       case I915_CACHE_LLC:
+               pte |= GEN6_PTE_CACHE_LLC;
+               break;
+       case I915_CACHE_NONE:
+               pte |= GEN6_PTE_UNCACHED;
+               break;
+       default:
+               MISSING_CASE(level);
+       }
+
+       return pte;
+}
+
+static u64 ivb_pte_encode(dma_addr_t addr,
+                         enum i915_cache_level level,
+                         u32 flags)
+{
+       gen6_pte_t pte = GEN6_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
+
+       switch (level) {
+       case I915_CACHE_L3_LLC:
+               pte |= GEN7_PTE_CACHE_L3_LLC;
+               break;
+       case I915_CACHE_LLC:
+               pte |= GEN6_PTE_CACHE_LLC;
+               break;
+       case I915_CACHE_NONE:
+               pte |= GEN6_PTE_UNCACHED;
+               break;
+       default:
+               MISSING_CASE(level);
+       }
+
+       return pte;
+}
+
+static u64 byt_pte_encode(dma_addr_t addr,
+                         enum i915_cache_level level,
+                         u32 flags)
+{
+       gen6_pte_t pte = GEN6_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
+
+       if (!(flags & PTE_READ_ONLY))
+               pte |= BYT_PTE_WRITEABLE;
+
+       if (level != I915_CACHE_NONE)
+               pte |= BYT_PTE_SNOOPED_BY_CPU_CACHES;
+
+       return pte;
+}
+
+static u64 hsw_pte_encode(dma_addr_t addr,
+                         enum i915_cache_level level,
+                         u32 flags)
+{
+       gen6_pte_t pte = HSW_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
+
+       if (level != I915_CACHE_NONE)
+               pte |= HSW_WB_LLC_AGE3;
+
+       return pte;
+}
+
+static u64 iris_pte_encode(dma_addr_t addr,
+                          enum i915_cache_level level,
+                          u32 flags)
+{
+       gen6_pte_t pte = HSW_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
+
+       switch (level) {
+       case I915_CACHE_NONE:
+               break;
+       case I915_CACHE_WT:
+               pte |= HSW_WT_ELLC_LLC_AGE3;
+               break;
+       default:
+               pte |= HSW_WB_ELLC_LLC_AGE3;
+               break;
+       }
+
+       return pte;
+}
+
+static int gen6_gmch_probe(struct i915_ggtt *ggtt)
+{
+       struct drm_i915_private *i915 = ggtt->vm.i915;
+       struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
+       unsigned int size;
+       u16 snb_gmch_ctl;
+
+       ggtt->gmadr = pci_resource(pdev, 2);
+       ggtt->mappable_end = resource_size(&ggtt->gmadr);
+
+       /*
+        * 64/512MB is the current min/max we actually know of, but this is
+        * just a coarse sanity check.
+        */
+       if (ggtt->mappable_end < (64 << 20) ||
+           ggtt->mappable_end > (512 << 20)) {
+               drm_err(&i915->drm, "Unknown GMADR size (%pa)\n",
+                       &ggtt->mappable_end);
+               return -ENXIO;
+       }
+
+       pci_read_config_word(pdev, SNB_GMCH_CTRL, &snb_gmch_ctl);
+
+       size = gen6_get_total_gtt_size(snb_gmch_ctl);
+       ggtt->vm.total = (size / sizeof(gen6_pte_t)) * I915_GTT_PAGE_SIZE;
+
+       ggtt->vm.alloc_pt_dma = alloc_pt_dma;
+       ggtt->vm.alloc_scratch_dma = alloc_pt_dma;
+
+       ggtt->vm.clear_range = nop_clear_range;
+       if (!HAS_FULL_PPGTT(i915) || intel_scanout_needs_vtd_wa(i915))
+               ggtt->vm.clear_range = gen6_ggtt_clear_range;
+       ggtt->vm.insert_page = gen6_ggtt_insert_page;
+       ggtt->vm.insert_entries = gen6_ggtt_insert_entries;
+       ggtt->vm.cleanup = gen6_gmch_remove;
+
+       ggtt->invalidate = gen6_ggtt_invalidate;
+
+       if (HAS_EDRAM(i915))
+               ggtt->vm.pte_encode = iris_pte_encode;
+       else if (IS_HASWELL(i915))
+               ggtt->vm.pte_encode = hsw_pte_encode;
+       else if (IS_VALLEYVIEW(i915))
+               ggtt->vm.pte_encode = byt_pte_encode;
+       else if (GRAPHICS_VER(i915) >= 7)
+               ggtt->vm.pte_encode = ivb_pte_encode;
+       else
+               ggtt->vm.pte_encode = snb_pte_encode;
+
+       ggtt->vm.vma_ops.bind_vma    = intel_ggtt_bind_vma;
+       ggtt->vm.vma_ops.unbind_vma  = intel_ggtt_unbind_vma;
+
+       return ggtt_probe_common(ggtt, size);
+}
+
 static int ggtt_probe_hw(struct i915_ggtt *ggtt, struct intel_gt *gt)
 {
        struct drm_i915_private *i915 = gt->i915;
@@ -576,12 +1142,13 @@ static int ggtt_probe_hw(struct i915_ggtt *ggtt, struct intel_gt *gt)
        ggtt->vm.dma = i915->drm.dev;
        dma_resv_init(&ggtt->vm._resv);
 
-       if (GRAPHICS_VER(i915) <= 5)
-               ret = intel_gt_gmch_gen5_probe(ggtt);
-       else if (GRAPHICS_VER(i915) < 8)
-               ret = intel_gt_gmch_gen6_probe(ggtt);
+       if (GRAPHICS_VER(i915) >= 8)
+               ret = gen8_gmch_probe(ggtt);
+       else if (GRAPHICS_VER(i915) >= 6)
+               ret = gen6_gmch_probe(ggtt);
        else
-               ret = intel_gt_gmch_gen8_probe(ggtt);
+               ret = intel_ggtt_gmch_probe(ggtt);
+
        if (ret) {
                dma_resv_fini(&ggtt->vm._resv);
                return ret;
@@ -635,7 +1202,10 @@ int i915_ggtt_probe_hw(struct drm_i915_private *i915)
 
 int i915_ggtt_enable_hw(struct drm_i915_private *i915)
 {
-       return intel_gt_gmch_gen5_enable_hw(i915);
+       if (GRAPHICS_VER(i915) < 6)
+               return intel_ggtt_gmch_enable_hw(i915);
+
+       return 0;
 }
 
 void i915_ggtt_enable_guc(struct i915_ggtt *ggtt)
@@ -675,11 +1245,20 @@ bool i915_ggtt_resume_vm(struct i915_address_space *vm)
 {
        struct i915_vma *vma;
        bool write_domain_objs = false;
+       bool retained_ptes;
 
        drm_WARN_ON(&vm->i915->drm, !vm->is_ggtt && !vm->is_dpt);
 
-       /* First fill our portion of the GTT with scratch pages */
-       vm->clear_range(vm, 0, vm->total);
+       /*
+        * First fill our portion of the GTT with scratch pages if
+        * they were not retained across suspend.
+        */
+       retained_ptes = suspend_retains_ptes(vm) &&
+               !i915_vm_to_ggtt(vm)->pte_lost &&
+               !GEM_WARN_ON(i915_vm_to_ggtt(vm)->probed_pte != read_last_pte(vm));
+
+       if (!retained_ptes)
+               vm->clear_range(vm, 0, vm->total);
 
        /* clflush objects bound into the GGTT and rebind them. */
        list_for_each_entry(vma, &vm->bound_list, vm_link) {
@@ -688,9 +1267,10 @@ bool i915_ggtt_resume_vm(struct i915_address_space *vm)
                        atomic_read(&vma->flags) & I915_VMA_BIND_MASK;
 
                GEM_BUG_ON(!was_bound);
-               vma->ops->bind_vma(vm, NULL, vma->resource,
-                                  obj ? obj->cache_level : 0,
-                                  was_bound);
+               if (!retained_ptes)
+                       vma->ops->bind_vma(vm, NULL, vma->resource,
+                                          obj ? obj->cache_level : 0,
+                                          was_bound);
                if (obj) { /* only used during resume => exclusive access */
                        write_domain_objs |= fetch_and_zero(&obj->write_domain);
                        obj->read_domains |= I915_GEM_DOMAIN_GTT;
@@ -718,3 +1298,8 @@ void i915_ggtt_resume(struct i915_ggtt *ggtt)
 
        intel_ggtt_restore_fences(ggtt);
 }
+
+void i915_ggtt_mark_pte_lost(struct drm_i915_private *i915, bool val)
+{
+       to_gt(i915)->ggtt->pte_lost = val;
+}
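Note on the suspend/resume hunks above: taken together, they skip the GGTT clear and rebind on resume when the PTEs are believed to have survived suspend. The sketch below models that decision in plain C as an illustration only. The `demo_*` names and the `struct demo_ggtt` layout are hypothetical, and the `last_pte` field stands in for the `readq()` of the final PTE in the real code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, flattened model of the state the diff consults. */
struct demo_ggtt {
	int graphics_ver;
	bool has_lmem;
	bool is_ggtt;
	bool pte_lost;       /* set when the platform reports PTEs were lost */
	uint64_t probed_pte; /* last PTE value recorded at suspend */
	uint64_t last_pte;   /* stand-in for reading the last PTE from hardware */
};

/* Mirrors suspend_retains_ptes(): gen8+, no local memory, GGTT only. */
static bool demo_suspend_retains_ptes(const struct demo_ggtt *g)
{
	return g->graphics_ver >= 8 && !g->has_lmem && g->is_ggtt;
}

/* Mirrors read_last_pte(): 0 when PTEs are not expected to be retained. */
static uint64_t demo_read_last_pte(const struct demo_ggtt *g)
{
	return demo_suspend_retains_ptes(g) ? g->last_pte : 0;
}

/* At suspend, either clear the range (not modelled) or record the last PTE. */
static void demo_suspend(struct demo_ggtt *g)
{
	if (demo_suspend_retains_ptes(g))
		g->probed_pte = demo_read_last_pte(g);
}

/* At resume, rebinding is skipped only if every check passes. */
static bool demo_resume_can_skip_rebind(const struct demo_ggtt *g)
{
	return demo_suspend_retains_ptes(g) &&
	       !g->pte_lost &&
	       g->probed_pte == demo_read_last_pte(g);
}
```

In the diff, `init_ggtt()` starts with `pte_lost = true`, so the fast path is only taken after something calls `i915_ggtt_mark_pte_lost()` with `false`. The last-PTE comparison acts as a cheap sanity check on top of that flag.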
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c b/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.c
new file mode 100644
index 0000000..4e2163a
--- /dev/null
@@ -0,0 +1,132 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+#include "intel_ggtt_gmch.h"
+
+#include <drm/intel-gtt.h>
+#include <drm/i915_drm.h>
+
+#include <linux/agp_backend.h>
+
+#include "i915_drv.h"
+#include "i915_utils.h"
+#include "intel_gtt.h"
+#include "intel_gt_regs.h"
+#include "intel_gt.h"
+
+static void gmch_ggtt_insert_page(struct i915_address_space *vm,
+                                 dma_addr_t addr,
+                                 u64 offset,
+                                 enum i915_cache_level cache_level,
+                                 u32 unused)
+{
+       unsigned int flags = (cache_level == I915_CACHE_NONE) ?
+               AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
+
+       intel_gmch_gtt_insert_page(addr, offset >> PAGE_SHIFT, flags);
+}
+
+static void gmch_ggtt_insert_entries(struct i915_address_space *vm,
+                                    struct i915_vma_resource *vma_res,
+                                    enum i915_cache_level cache_level,
+                                    u32 unused)
+{
+       unsigned int flags = (cache_level == I915_CACHE_NONE) ?
+               AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
+
+       intel_gmch_gtt_insert_sg_entries(vma_res->bi.pages, vma_res->start >> PAGE_SHIFT,
+                                        flags);
+}
+
+static void gmch_ggtt_invalidate(struct i915_ggtt *ggtt)
+{
+       intel_gmch_gtt_flush();
+}
+
+static void gmch_ggtt_clear_range(struct i915_address_space *vm,
+                                 u64 start, u64 length)
+{
+       intel_gmch_gtt_clear_range(start >> PAGE_SHIFT, length >> PAGE_SHIFT);
+}
+
+static void gmch_ggtt_remove(struct i915_address_space *vm)
+{
+       intel_gmch_remove();
+}
+
+/*
+ * Certain Gen5 chipsets require idling the GPU before unmapping anything from
+ * the GTT when VT-d is enabled.
+ */
+static bool needs_idle_maps(struct drm_i915_private *i915)
+{
+       /*
+        * Query intel_iommu to see if we need the workaround. Presumably that
+        * was loaded first.
+        */
+       if (!i915_vtd_active(i915))
+               return false;
+
+       if (GRAPHICS_VER(i915) == 5 && IS_MOBILE(i915))
+               return true;
+
+       return false;
+}
+
+int intel_ggtt_gmch_probe(struct i915_ggtt *ggtt)
+{
+       struct drm_i915_private *i915 = ggtt->vm.i915;
+       phys_addr_t gmadr_base;
+       int ret;
+
+       ret = intel_gmch_probe(i915->bridge_dev, to_pci_dev(i915->drm.dev), NULL);
+       if (!ret) {
+               drm_err(&i915->drm, "failed to set up gmch\n");
+               return -EIO;
+       }
+
+       intel_gmch_gtt_get(&ggtt->vm.total, &gmadr_base, &ggtt->mappable_end);
+
+       ggtt->gmadr =
+               (struct resource)DEFINE_RES_MEM(gmadr_base, ggtt->mappable_end);
+
+       ggtt->vm.alloc_pt_dma = alloc_pt_dma;
+       ggtt->vm.alloc_scratch_dma = alloc_pt_dma;
+
+       if (needs_idle_maps(i915)) {
+               drm_notice(&i915->drm,
+                          "Flushing DMA requests before IOMMU unmaps; performance may be degraded\n");
+               ggtt->do_idle_maps = true;
+       }
+
+       ggtt->vm.insert_page = gmch_ggtt_insert_page;
+       ggtt->vm.insert_entries = gmch_ggtt_insert_entries;
+       ggtt->vm.clear_range = gmch_ggtt_clear_range;
+       ggtt->vm.cleanup = gmch_ggtt_remove;
+
+       ggtt->invalidate = gmch_ggtt_invalidate;
+
+       ggtt->vm.vma_ops.bind_vma    = intel_ggtt_bind_vma;
+       ggtt->vm.vma_ops.unbind_vma  = intel_ggtt_unbind_vma;
+
+       if (unlikely(ggtt->do_idle_maps))
+               drm_notice(&i915->drm,
+                          "Applying Ironlake quirks for intel_iommu\n");
+
+       return 0;
+}
+
+int intel_ggtt_gmch_enable_hw(struct drm_i915_private *i915)
+{
+       if (!intel_gmch_enable_gtt())
+               return -EIO;
+
+       return 0;
+}
+
+void intel_ggtt_gmch_flush(void)
+{
+       intel_gmch_gtt_flush();
+}
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.h b/drivers/gpu/drm/i915/gt/intel_ggtt_gmch.h
new file mode 100644
index 0000000..370bf32
--- /dev/null
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+#ifndef __INTEL_GGTT_GMCH_H__
+#define __INTEL_GGTT_GMCH_H__
+
+#include "intel_gtt.h"
+
+/* For x86 platforms */
+#if IS_ENABLED(CONFIG_X86)
+
+void intel_ggtt_gmch_flush(void);
+int intel_ggtt_gmch_enable_hw(struct drm_i915_private *i915);
+int intel_ggtt_gmch_probe(struct i915_ggtt *ggtt);
+
+/* Stubs for non-x86 platforms */
+#else
+
+static inline void intel_ggtt_gmch_flush(void) { }
+static inline int intel_ggtt_gmch_enable_hw(struct drm_i915_private *i915) { return -ENODEV; }
+static inline int intel_ggtt_gmch_probe(struct i915_ggtt *ggtt) { return -ENODEV; }
+
+#endif
+
+#endif /* __INTEL_GGTT_GMCH_H__ */
index 556bca3be80403ee4d35deb86aa04f07ba02d5c7..d4e9702d3c8e7f06c007bc8cfb031672aa81c9bb 100644
 #define   XY_FAST_COLOR_BLT_DW         16
 #define   XY_FAST_COLOR_BLT_MOCS_MASK  GENMASK(27, 21)
 #define   XY_FAST_COLOR_BLT_MEM_TYPE_SHIFT 31
+
+#define   XY_FAST_COPY_BLT_D0_SRC_TILING_MASK     REG_GENMASK(21, 20)
+#define   XY_FAST_COPY_BLT_D0_DST_TILING_MASK     REG_GENMASK(14, 13)
+#define   XY_FAST_COPY_BLT_D0_SRC_TILE_MODE(mode)  \
+       REG_FIELD_PREP(XY_FAST_COPY_BLT_D0_SRC_TILING_MASK, mode)
+#define   XY_FAST_COPY_BLT_D0_DST_TILE_MODE(mode)  \
+       REG_FIELD_PREP(XY_FAST_COPY_BLT_D0_DST_TILING_MASK, mode)
+#define     LINEAR                             0
+#define     TILE_X                             0x1
+#define     XMAJOR                             0x1
+#define     YMAJOR                             0x2
+#define     TILE_64                    0x3
+#define   XY_FAST_COPY_BLT_D1_SRC_TILE4        REG_BIT(31)
+#define   XY_FAST_COPY_BLT_D1_DST_TILE4        REG_BIT(30)
+#define BLIT_CCTL_SRC_MOCS_MASK  REG_GENMASK(6, 0)
+#define BLIT_CCTL_DST_MOCS_MASK  REG_GENMASK(14, 8)
+/* Note:  MOCS value = (index << 1) */
+#define BLIT_CCTL_SRC_MOCS(idx) \
+       REG_FIELD_PREP(BLIT_CCTL_SRC_MOCS_MASK, (idx) << 1)
+#define BLIT_CCTL_DST_MOCS(idx) \
+       REG_FIELD_PREP(BLIT_CCTL_DST_MOCS_MASK, (idx) << 1)
+
 #define SRC_COPY_BLT_CMD               (2 << 29 | 0x43 << 22)
 #define GEN9_XY_FAST_COPY_BLT_CMD      (2 << 29 | 0x42 << 22)
 #define XY_SRC_COPY_BLT_CMD            (2 << 29 | 0x53 << 22)
 #define   PIPE_CONTROL_DEPTH_CACHE_FLUSH               (1<<0)
 #define   PIPE_CONTROL_GLOBAL_GTT (1<<2) /* in addr dword */
 
-/* 3D-related flags can't be set on compute engine */
-#define PIPE_CONTROL_3D_FLAGS (\
+/*
+ * 3D-related flags that can't be set on _engines_ that lack access to the 3D
+ * pipeline (i.e., CCS engines).
+ */
+#define PIPE_CONTROL_3D_ENGINE_FLAGS (\
                PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH | \
                PIPE_CONTROL_DEPTH_CACHE_FLUSH | \
                PIPE_CONTROL_TILE_CACHE_FLUSH | \
                PIPE_CONTROL_VF_CACHE_INVALIDATE | \
                PIPE_CONTROL_GLOBAL_SNAPSHOT_RESET)
 
+/* 3D-related flags that can't be set on _platforms_ that lack a 3D pipeline */
+#define PIPE_CONTROL_3D_ARCH_FLAGS ( \
+               PIPE_CONTROL_3D_ENGINE_FLAGS | \
+               PIPE_CONTROL_INDIRECT_STATE_DISABLE | \
+               PIPE_CONTROL_FLUSH_ENABLE | \
+               PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE | \
+               PIPE_CONTROL_DC_FLUSH_ENABLE)
+
 #define MI_MATH(x)                     MI_INSTR(0x1a, (x) - 1)
 #define MI_MATH_INSTR(opcode, op1, op2) ((opcode) << 20 | (op1) << 10 | (op2))
 /* Opcodes for MI_MATH_INSTR */
index 53307ca0eed0c873286a768bdf3ad94d32750349..8da3314bb6bf605267d493bb2093ecb02f14a883 100644
@@ -4,6 +4,7 @@
  */
 
 #include <drm/drm_managed.h>
+#include <drm/intel-gtt.h>
 
 #include "gem/i915_gem_internal.h"
 #include "gem/i915_gem_lmem.h"
 #include "i915_drv.h"
 #include "intel_context.h"
 #include "intel_engine_regs.h"
+#include "intel_ggtt_gmch.h"
 #include "intel_gt.h"
 #include "intel_gt_buffer_pool.h"
 #include "intel_gt_clock_utils.h"
 #include "intel_gt_debugfs.h"
-#include "intel_gt_gmch.h"
+#include "intel_gt_mcr.h"
 #include "intel_gt_pm.h"
 #include "intel_gt_regs.h"
 #include "intel_gt_requests.h"
@@ -102,78 +104,13 @@ int intel_gt_assign_ggtt(struct intel_gt *gt)
        return gt->ggtt ? 0 : -ENOMEM;
 }
 
-static const char * const intel_steering_types[] = {
-       "L3BANK",
-       "MSLICE",
-       "LNCF",
-};
-
-static const struct intel_mmio_range icl_l3bank_steering_table[] = {
-       { 0x00B100, 0x00B3FF },
-       {},
-};
-
-static const struct intel_mmio_range xehpsdv_mslice_steering_table[] = {
-       { 0x004000, 0x004AFF },
-       { 0x00C800, 0x00CFFF },
-       { 0x00DD00, 0x00DDFF },
-       { 0x00E900, 0x00FFFF }, /* 0xEA00 - OxEFFF is unused */
-       {},
-};
-
-static const struct intel_mmio_range xehpsdv_lncf_steering_table[] = {
-       { 0x00B000, 0x00B0FF },
-       { 0x00D800, 0x00D8FF },
-       {},
-};
-
-static const struct intel_mmio_range dg2_lncf_steering_table[] = {
-       { 0x00B000, 0x00B0FF },
-       { 0x00D880, 0x00D8FF },
-       {},
-};
-
-static u16 slicemask(struct intel_gt *gt, int count)
-{
-       u64 dss_mask = intel_sseu_get_subslices(&gt->info.sseu, 0);
-
-       return intel_slicemask_from_dssmask(dss_mask, count);
-}
-
 int intel_gt_init_mmio(struct intel_gt *gt)
 {
-       struct drm_i915_private *i915 = gt->i915;
-
        intel_gt_init_clock_frequency(gt);
 
        intel_uc_init_mmio(&gt->uc);
        intel_sseu_info_init(gt);
-
-       /*
-        * An mslice is unavailable only if both the meml3 for the slice is
-        * disabled *and* all of the DSS in the slice (quadrant) are disabled.
-        */
-       if (HAS_MSLICES(i915))
-               gt->info.mslice_mask =
-                       slicemask(gt, GEN_DSS_PER_MSLICE) |
-                       (intel_uncore_read(gt->uncore, GEN10_MIRROR_FUSE3) &
-                        GEN12_MEML3_EN_MASK);
-
-       if (IS_DG2(i915)) {
-               gt->steering_table[MSLICE] = xehpsdv_mslice_steering_table;
-               gt->steering_table[LNCF] = dg2_lncf_steering_table;
-       } else if (IS_XEHPSDV(i915)) {
-               gt->steering_table[MSLICE] = xehpsdv_mslice_steering_table;
-               gt->steering_table[LNCF] = xehpsdv_lncf_steering_table;
-       } else if (GRAPHICS_VER(i915) >= 11 &&
-                  GRAPHICS_VER_FULL(i915) < IP_VER(12, 50)) {
-               gt->steering_table[L3BANK] = icl_l3bank_steering_table;
-               gt->info.l3bank_mask =
-                       ~intel_uncore_read(gt->uncore, GEN10_MIRROR_FUSE3) &
-                       GEN10_L3BANK_MASK;
-       } else if (HAS_MSLICES(i915)) {
-               MISSING_CASE(INTEL_INFO(i915)->platform);
-       }
+       intel_gt_mcr_init(gt);
 
        return intel_engines_init_mmio(gt);
 }
@@ -451,7 +388,7 @@ void intel_gt_chipset_flush(struct intel_gt *gt)
 {
        wmb();
        if (GRAPHICS_VER(gt->i915) < 6)
-               intel_gt_gmch_gen5_chipset_flush(gt);
+               intel_ggtt_gmch_flush();
 }
 
 void intel_gt_driver_register(struct intel_gt *gt)
@@ -785,6 +722,7 @@ void intel_gt_driver_unregister(struct intel_gt *gt)
 {
        intel_wakeref_t wakeref;
 
+       intel_gt_sysfs_unregister(gt);
        intel_rps_driver_unregister(&gt->rps);
        intel_gsc_fini(&gt->gsc);
 
@@ -834,200 +772,6 @@ void intel_gt_driver_late_release_all(struct drm_i915_private *i915)
        }
 }
 
-/**
- * intel_gt_reg_needs_read_steering - determine whether a register read
- *     requires explicit steering
- * @gt: GT structure
- * @reg: the register to check steering requirements for
- * @type: type of multicast steering to check
- *
- * Determines whether @reg needs explicit steering of a specific type for
- * reads.
- *
- * Returns false if @reg does not belong to a register range of the given
- * steering type, or if the default (subslice-based) steering IDs are suitable
- * for @type steering too.
- */
-static bool intel_gt_reg_needs_read_steering(struct intel_gt *gt,
-                                            i915_reg_t reg,
-                                            enum intel_steering_type type)
-{
-       const u32 offset = i915_mmio_reg_offset(reg);
-       const struct intel_mmio_range *entry;
-
-       if (likely(!intel_gt_needs_read_steering(gt, type)))
-               return false;
-
-       for (entry = gt->steering_table[type]; entry->end; entry++) {
-               if (offset >= entry->start && offset <= entry->end)
-                       return true;
-       }
-
-       return false;
-}
-
-/**
- * intel_gt_get_valid_steering - determines valid IDs for a class of MCR steering
- * @gt: GT structure
- * @type: multicast register type
- * @sliceid: Slice ID returned
- * @subsliceid: Subslice ID returned
- *
- * Determines sliceid and subsliceid values that will steer reads
- * of a specific multicast register class to a valid value.
- */
-static void intel_gt_get_valid_steering(struct intel_gt *gt,
-                                       enum intel_steering_type type,
-                                       u8 *sliceid, u8 *subsliceid)
-{
-       switch (type) {
-       case L3BANK:
-               GEM_DEBUG_WARN_ON(!gt->info.l3bank_mask); /* should be impossible! */
-
-               *sliceid = 0;           /* unused */
-               *subsliceid = __ffs(gt->info.l3bank_mask);
-               break;
-       case MSLICE:
-               GEM_DEBUG_WARN_ON(!gt->info.mslice_mask); /* should be impossible! */
-
-               *sliceid = __ffs(gt->info.mslice_mask);
-               *subsliceid = 0;        /* unused */
-               break;
-       case LNCF:
-               GEM_DEBUG_WARN_ON(!gt->info.mslice_mask); /* should be impossible! */
-
-               /*
-                * An LNCF is always present if its mslice is present, so we
-                * can safely just steer to LNCF 0 in all cases.
-                */
-               *sliceid = __ffs(gt->info.mslice_mask) << 1;
-               *subsliceid = 0;        /* unused */
-               break;
-       default:
-               MISSING_CASE(type);
-               *sliceid = 0;
-               *subsliceid = 0;
-       }
-}
-
-/**
- * intel_gt_read_register_fw - reads a GT register with support for multicast
- * @gt: GT structure
- * @reg: register to read
- *
- * This function will read a GT register.  If the register is a multicast
- * register, the read will be steered to a valid instance (i.e., one that
- * isn't fused off or powered down by power gating).
- *
- * Returns the value from a valid instance of @reg.
- */
-u32 intel_gt_read_register_fw(struct intel_gt *gt, i915_reg_t reg)
-{
-       int type;
-       u8 sliceid, subsliceid;
-
-       for (type = 0; type < NUM_STEERING_TYPES; type++) {
-               if (intel_gt_reg_needs_read_steering(gt, reg, type)) {
-                       intel_gt_get_valid_steering(gt, type, &sliceid,
-                                                   &subsliceid);
-                       return intel_uncore_read_with_mcr_steering_fw(gt->uncore,
-                                                                     reg,
-                                                                     sliceid,
-                                                                     subsliceid);
-               }
-       }
-
-       return intel_uncore_read_fw(gt->uncore, reg);
-}
-
-/**
- * intel_gt_get_valid_steering_for_reg - get a valid steering for a register
- * @gt: GT structure
- * @reg: register for which the steering is required
- * @sliceid: return variable for slice steering
- * @subsliceid: return variable for subslice steering
- *
- * This function returns a slice/subslice pair that is guaranteed to work for
- * read steering of the given register. Note that a value will be returned even
- * if the register is not replicated and therefore does not actually require
- * steering.
- */
-void intel_gt_get_valid_steering_for_reg(struct intel_gt *gt, i915_reg_t reg,
-                                        u8 *sliceid, u8 *subsliceid)
-{
-       int type;
-
-       for (type = 0; type < NUM_STEERING_TYPES; type++) {
-               if (intel_gt_reg_needs_read_steering(gt, reg, type)) {
-                       intel_gt_get_valid_steering(gt, type, sliceid,
-                                                   subsliceid);
-                       return;
-               }
-       }
-
-       *sliceid = gt->default_steering.groupid;
-       *subsliceid = gt->default_steering.instanceid;
-}
-
-u32 intel_gt_read_register(struct intel_gt *gt, i915_reg_t reg)
-{
-       int type;
-       u8 sliceid, subsliceid;
-
-       for (type = 0; type < NUM_STEERING_TYPES; type++) {
-               if (intel_gt_reg_needs_read_steering(gt, reg, type)) {
-                       intel_gt_get_valid_steering(gt, type, &sliceid,
-                                                   &subsliceid);
-                       return intel_uncore_read_with_mcr_steering(gt->uncore,
-                                                                  reg,
-                                                                  sliceid,
-                                                                  subsliceid);
-               }
-       }
-
-       return intel_uncore_read(gt->uncore, reg);
-}
-
-static void report_steering_type(struct drm_printer *p,
-                                struct intel_gt *gt,
-                                enum intel_steering_type type,
-                                bool dump_table)
-{
-       const struct intel_mmio_range *entry;
-       u8 slice, subslice;
-
-       BUILD_BUG_ON(ARRAY_SIZE(intel_steering_types) != NUM_STEERING_TYPES);
-
-       if (!gt->steering_table[type]) {
-               drm_printf(p, "%s steering: uses default steering\n",
-                          intel_steering_types[type]);
-               return;
-       }
-
-       intel_gt_get_valid_steering(gt, type, &slice, &subslice);
-       drm_printf(p, "%s steering: sliceid=0x%x, subsliceid=0x%x\n",
-                  intel_steering_types[type], slice, subslice);
-
-       if (!dump_table)
-               return;
-
-       for (entry = gt->steering_table[type]; entry->end; entry++)
-               drm_printf(p, "\t0x%06x - 0x%06x\n", entry->start, entry->end);
-}
-
-void intel_gt_report_steering(struct drm_printer *p, struct intel_gt *gt,
-                             bool dump_table)
-{
-       drm_printf(p, "Default steering: sliceid=0x%x, subsliceid=0x%x\n",
-                  gt->default_steering.groupid,
-                  gt->default_steering.instanceid);
-
-       if (HAS_MSLICES(gt->i915)) {
-               report_steering_type(p, gt, MSLICE, dump_table);
-               report_steering_type(p, gt, LNCF, dump_table);
-       }
-}
-
 static int intel_gt_tile_setup(struct intel_gt *gt, phys_addr_t phys_addr)
 {
        int ret;
index 44c6cb63ccbc8592f5f7a61260e15a9d1a6754da..82d6f248d876256f1831369951665ec526a826aa 100644
 struct drm_i915_private;
 struct drm_printer;
 
-struct insert_entries {
-       struct i915_address_space *vm;
-       struct i915_vma_resource *vma_res;
-       enum i915_cache_level level;
-       u32 flags;
-};
-
 #define GT_TRACE(gt, fmt, ...) do {                                    \
        const struct intel_gt *gt__ __maybe_unused = (gt);              \
        GEM_TRACE("%s " fmt, dev_name(gt__->i915->drm.dev),             \
@@ -93,21 +86,6 @@ static inline bool intel_gt_is_wedged(const struct intel_gt *gt)
        return unlikely(test_bit(I915_WEDGED, &gt->reset.flags));
 }
 
-static inline bool intel_gt_needs_read_steering(struct intel_gt *gt,
-                                               enum intel_steering_type type)
-{
-       return gt->steering_table[type];
-}
-
-void intel_gt_get_valid_steering_for_reg(struct intel_gt *gt, i915_reg_t reg,
-                                        u8 *sliceid, u8 *subsliceid);
-
-u32 intel_gt_read_register_fw(struct intel_gt *gt, i915_reg_t reg);
-u32 intel_gt_read_register(struct intel_gt *gt, i915_reg_t reg);
-
-void intel_gt_report_steering(struct drm_printer *p, struct intel_gt *gt,
-                             bool dump_table);
-
 int intel_gt_probe_all(struct drm_i915_private *i915);
 int intel_gt_tiles_init(struct drm_i915_private *i915);
 void intel_gt_release_all(struct drm_i915_private *i915);
@@ -125,6 +103,4 @@ void intel_gt_watchdog_work(struct work_struct *work);
 
 void intel_gt_invalidate_tlbs(struct intel_gt *gt);
 
-struct resource intel_pci_resource(struct pci_dev *pdev, int bar);
-
 #endif /* __INTEL_GT_H__ */
index d886fdc2c694bd3359252528bc048dae8f764985..dd53641f36372d77ee38da71ec1ffae7f2bdc945 100644
@@ -9,6 +9,7 @@
 #include "intel_gt.h"
 #include "intel_gt_debugfs.h"
 #include "intel_gt_engines_debugfs.h"
+#include "intel_gt_mcr.h"
 #include "intel_gt_pm_debugfs.h"
 #include "intel_sseu_debugfs.h"
 #include "pxp/intel_pxp_debugfs.h"
@@ -64,7 +65,7 @@ static int steering_show(struct seq_file *m, void *data)
        struct drm_printer p = drm_seq_file_printer(m);
        struct intel_gt *gt = m->private;
 
-       intel_gt_report_steering(&p, gt, true);
+       intel_gt_mcr_report_steering(&p, gt, true);
 
        return 0;
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_gmch.c b/drivers/gpu/drm/i915/gt/intel_gt_gmch.c
deleted file mode 100644
index 18e4886..0000000
+++ /dev/null
@@ -1,654 +0,0 @@
-// SPDX-License-Identifier: MIT
-/*
- * Copyright © 2022 Intel Corporation
- */
-
-#include <drm/intel-gtt.h>
-#include <drm/i915_drm.h>
-
-#include <linux/agp_backend.h>
-#include <linux/stop_machine.h>
-
-#include "i915_drv.h"
-#include "intel_gt_gmch.h"
-#include "intel_gt_regs.h"
-#include "intel_gt.h"
-#include "i915_utils.h"
-
-#include "gen8_ppgtt.h"
-
-struct insert_page {
-       struct i915_address_space *vm;
-       dma_addr_t addr;
-       u64 offset;
-       enum i915_cache_level level;
-};
-
-static void gen8_set_pte(void __iomem *addr, gen8_pte_t pte)
-{
-       writeq(pte, addr);
-}
-
-static void nop_clear_range(struct i915_address_space *vm,
-                           u64 start, u64 length)
-{
-}
-
-static u64 snb_pte_encode(dma_addr_t addr,
-                         enum i915_cache_level level,
-                         u32 flags)
-{
-       gen6_pte_t pte = GEN6_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
-
-       switch (level) {
-       case I915_CACHE_L3_LLC:
-       case I915_CACHE_LLC:
-               pte |= GEN6_PTE_CACHE_LLC;
-               break;
-       case I915_CACHE_NONE:
-               pte |= GEN6_PTE_UNCACHED;
-               break;
-       default:
-               MISSING_CASE(level);
-       }
-
-       return pte;
-}
-
-static u64 ivb_pte_encode(dma_addr_t addr,
-                         enum i915_cache_level level,
-                         u32 flags)
-{
-       gen6_pte_t pte = GEN6_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
-
-       switch (level) {
-       case I915_CACHE_L3_LLC:
-               pte |= GEN7_PTE_CACHE_L3_LLC;
-               break;
-       case I915_CACHE_LLC:
-               pte |= GEN6_PTE_CACHE_LLC;
-               break;
-       case I915_CACHE_NONE:
-               pte |= GEN6_PTE_UNCACHED;
-               break;
-       default:
-               MISSING_CASE(level);
-       }
-
-       return pte;
-}
-
-static u64 byt_pte_encode(dma_addr_t addr,
-                         enum i915_cache_level level,
-                         u32 flags)
-{
-       gen6_pte_t pte = GEN6_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
-
-       if (!(flags & PTE_READ_ONLY))
-               pte |= BYT_PTE_WRITEABLE;
-
-       if (level != I915_CACHE_NONE)
-               pte |= BYT_PTE_SNOOPED_BY_CPU_CACHES;
-
-       return pte;
-}
-
-static u64 hsw_pte_encode(dma_addr_t addr,
-                         enum i915_cache_level level,
-                         u32 flags)
-{
-       gen6_pte_t pte = HSW_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
-
-       if (level != I915_CACHE_NONE)
-               pte |= HSW_WB_LLC_AGE3;
-
-       return pte;
-}
-
-static u64 iris_pte_encode(dma_addr_t addr,
-                          enum i915_cache_level level,
-                          u32 flags)
-{
-       gen6_pte_t pte = HSW_PTE_ADDR_ENCODE(addr) | GEN6_PTE_VALID;
-
-       switch (level) {
-       case I915_CACHE_NONE:
-               break;
-       case I915_CACHE_WT:
-               pte |= HSW_WT_ELLC_LLC_AGE3;
-               break;
-       default:
-               pte |= HSW_WB_ELLC_LLC_AGE3;
-               break;
-       }
-
-       return pte;
-}
-
-static void gen5_ggtt_insert_page(struct i915_address_space *vm,
-                                 dma_addr_t addr,
-                                 u64 offset,
-                                 enum i915_cache_level cache_level,
-                                 u32 unused)
-{
-       unsigned int flags = (cache_level == I915_CACHE_NONE) ?
-               AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
-
-       intel_gtt_insert_page(addr, offset >> PAGE_SHIFT, flags);
-}
-
-static void gen6_ggtt_insert_page(struct i915_address_space *vm,
-                                 dma_addr_t addr,
-                                 u64 offset,
-                                 enum i915_cache_level level,
-                                 u32 flags)
-{
-       struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
-       gen6_pte_t __iomem *pte =
-               (gen6_pte_t __iomem *)ggtt->gsm + offset / I915_GTT_PAGE_SIZE;
-
-       iowrite32(vm->pte_encode(addr, level, flags), pte);
-
-       ggtt->invalidate(ggtt);
-}
-
-static void gen8_ggtt_insert_page(struct i915_address_space *vm,
-                                 dma_addr_t addr,
-                                 u64 offset,
-                                 enum i915_cache_level level,
-                                 u32 flags)
-{
-       struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
-       gen8_pte_t __iomem *pte =
-               (gen8_pte_t __iomem *)ggtt->gsm + offset / I915_GTT_PAGE_SIZE;
-
-       gen8_set_pte(pte, gen8_ggtt_pte_encode(addr, level, flags));
-
-       ggtt->invalidate(ggtt);
-}
-
-static void gen5_ggtt_insert_entries(struct i915_address_space *vm,
-                                    struct i915_vma_resource *vma_res,
-                                    enum i915_cache_level cache_level,
-                                    u32 unused)
-{
-       unsigned int flags = (cache_level == I915_CACHE_NONE) ?
-               AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
-
-       intel_gtt_insert_sg_entries(vma_res->bi.pages, vma_res->start >> PAGE_SHIFT,
-                                   flags);
-}
-
-/*
- * Binds an object into the global gtt with the specified cache level.
- * The object will be accessible to the GPU via commands whose operands
- * reference offsets within the global GTT as well as accessible by the GPU
- * through the GMADR mapped BAR (i915->mm.gtt->gtt).
- */
-static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
-                                    struct i915_vma_resource *vma_res,
-                                    enum i915_cache_level level,
-                                    u32 flags)
-{
-       struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
-       gen6_pte_t __iomem *gte;
-       gen6_pte_t __iomem *end;
-       struct sgt_iter iter;
-       dma_addr_t addr;
-
-       gte = (gen6_pte_t __iomem *)ggtt->gsm;
-       gte += vma_res->start / I915_GTT_PAGE_SIZE;
-       end = gte + vma_res->node_size / I915_GTT_PAGE_SIZE;
-
-       for_each_sgt_daddr(addr, iter, vma_res->bi.pages)
-               iowrite32(vm->pte_encode(addr, level, flags), gte++);
-       GEM_BUG_ON(gte > end);
-
-       /* Fill the allocated but "unused" space beyond the end of the buffer */
-       while (gte < end)
-               iowrite32(vm->scratch[0]->encode, gte++);
-
-       /*
-        * We want to flush the TLBs only after we're certain all the PTE
-        * updates have finished.
-        */
-       ggtt->invalidate(ggtt);
-}
-
-static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
-                                    struct i915_vma_resource *vma_res,
-                                    enum i915_cache_level level,
-                                    u32 flags)
-{
-       const gen8_pte_t pte_encode = gen8_ggtt_pte_encode(0, level, flags);
-       struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
-       gen8_pte_t __iomem *gte;
-       gen8_pte_t __iomem *end;
-       struct sgt_iter iter;
-       dma_addr_t addr;
-
-       /*
-        * Note that we ignore PTE_READ_ONLY here. The caller must be careful
-        * not to allow the user to override access to a read only page.
-        */
-
-       gte = (gen8_pte_t __iomem *)ggtt->gsm;
-       gte += vma_res->start / I915_GTT_PAGE_SIZE;
-       end = gte + vma_res->node_size / I915_GTT_PAGE_SIZE;
-
-       for_each_sgt_daddr(addr, iter, vma_res->bi.pages)
-               gen8_set_pte(gte++, pte_encode | addr);
-       GEM_BUG_ON(gte > end);
-
-       /* Fill the allocated but "unused" space beyond the end of the buffer */
-       while (gte < end)
-               gen8_set_pte(gte++, vm->scratch[0]->encode);
-
-       /*
-        * We want to flush the TLBs only after we're certain all the PTE
-        * updates have finished.
-        */
-       ggtt->invalidate(ggtt);
-}
-
-static void bxt_vtd_ggtt_wa(struct i915_address_space *vm)
-{
-       /*
-        * Make sure the internal GAM fifo has been cleared of all GTT
-        * writes before exiting stop_machine(). This guarantees that
-        * any aperture accesses waiting to start in another process
-        * cannot back up behind the GTT writes causing a hang.
-        * The register can be any arbitrary GAM register.
-        */
-       intel_uncore_posting_read_fw(vm->gt->uncore, GFX_FLSH_CNTL_GEN6);
-}
-
-static int bxt_vtd_ggtt_insert_page__cb(void *_arg)
-{
-       struct insert_page *arg = _arg;
-
-       gen8_ggtt_insert_page(arg->vm, arg->addr, arg->offset, arg->level, 0);
-       bxt_vtd_ggtt_wa(arg->vm);
-
-       return 0;
-}
-
-static void bxt_vtd_ggtt_insert_page__BKL(struct i915_address_space *vm,
-                                         dma_addr_t addr,
-                                         u64 offset,
-                                         enum i915_cache_level level,
-                                         u32 unused)
-{
-       struct insert_page arg = { vm, addr, offset, level };
-
-       stop_machine(bxt_vtd_ggtt_insert_page__cb, &arg, NULL);
-}
-
-static int bxt_vtd_ggtt_insert_entries__cb(void *_arg)
-{
-       struct insert_entries *arg = _arg;
-
-       gen8_ggtt_insert_entries(arg->vm, arg->vma_res, arg->level, arg->flags);
-       bxt_vtd_ggtt_wa(arg->vm);
-
-       return 0;
-}
-
-static void bxt_vtd_ggtt_insert_entries__BKL(struct i915_address_space *vm,
-                                            struct i915_vma_resource *vma_res,
-                                            enum i915_cache_level level,
-                                            u32 flags)
-{
-       struct insert_entries arg = { vm, vma_res, level, flags };
-
-       stop_machine(bxt_vtd_ggtt_insert_entries__cb, &arg, NULL);
-}
-
-void intel_gt_gmch_gen5_chipset_flush(struct intel_gt *gt)
-{
-       intel_gtt_chipset_flush();
-}
-
-static void gmch_ggtt_invalidate(struct i915_ggtt *ggtt)
-{
-       intel_gtt_chipset_flush();
-}
-
-static void gen5_ggtt_clear_range(struct i915_address_space *vm,
-                                        u64 start, u64 length)
-{
-       intel_gtt_clear_range(start >> PAGE_SHIFT, length >> PAGE_SHIFT);
-}
-
-static void gen6_ggtt_clear_range(struct i915_address_space *vm,
-                                 u64 start, u64 length)
-{
-       struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
-       unsigned int first_entry = start / I915_GTT_PAGE_SIZE;
-       unsigned int num_entries = length / I915_GTT_PAGE_SIZE;
-       gen6_pte_t scratch_pte, __iomem *gtt_base =
-               (gen6_pte_t __iomem *)ggtt->gsm + first_entry;
-       const int max_entries = ggtt_total_entries(ggtt) - first_entry;
-       int i;
-
-       if (WARN(num_entries > max_entries,
-                "First entry = %d; Num entries = %d (max=%d)\n",
-                first_entry, num_entries, max_entries))
-               num_entries = max_entries;
-
-       scratch_pte = vm->scratch[0]->encode;
-       for (i = 0; i < num_entries; i++)
-               iowrite32(scratch_pte, &gtt_base[i]);
-}
-
-static void gen8_ggtt_clear_range(struct i915_address_space *vm,
-                                 u64 start, u64 length)
-{
-       struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
-       unsigned int first_entry = start / I915_GTT_PAGE_SIZE;
-       unsigned int num_entries = length / I915_GTT_PAGE_SIZE;
-       const gen8_pte_t scratch_pte = vm->scratch[0]->encode;
-       gen8_pte_t __iomem *gtt_base =
-               (gen8_pte_t __iomem *)ggtt->gsm + first_entry;
-       const int max_entries = ggtt_total_entries(ggtt) - first_entry;
-       int i;
-
-       if (WARN(num_entries > max_entries,
-                "First entry = %d; Num entries = %d (max=%d)\n",
-                first_entry, num_entries, max_entries))
-               num_entries = max_entries;
-
-       for (i = 0; i < num_entries; i++)
-               gen8_set_pte(&gtt_base[i], scratch_pte);
-}
-
-static void gen5_gmch_remove(struct i915_address_space *vm)
-{
-       intel_gmch_remove();
-}
-
-static void gen6_gmch_remove(struct i915_address_space *vm)
-{
-       struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
-
-       iounmap(ggtt->gsm);
-       free_scratch(vm);
-}
-
-/*
- * Certain Gen5 chipsets require idling the GPU before
- * unmapping anything from the GTT when VT-d is enabled.
- */
-static bool needs_idle_maps(struct drm_i915_private *i915)
-{
-       /*
-        * Query intel_iommu to see if we need the workaround. Presumably that
-        * was loaded first.
-        */
-       if (!i915_vtd_active(i915))
-               return false;
-
-       if (GRAPHICS_VER(i915) == 5 && IS_MOBILE(i915))
-               return true;
-
-       if (GRAPHICS_VER(i915) == 12)
-               return true; /* XXX DMAR fault reason 7 */
-
-       return false;
-}
-
-static unsigned int gen6_gttmmadr_size(struct drm_i915_private *i915)
-{
-       /*
-        * GEN6: GTTMMADR size is 4MB and GTTADR starts at 2MB offset
-        * GEN8: GTTMMADR size is 16MB and GTTADR starts at 8MB offset
-        */
-       GEM_BUG_ON(GRAPHICS_VER(i915) < 6);
-       return (GRAPHICS_VER(i915) < 8) ? SZ_4M : SZ_16M;
-}
-
-static unsigned int gen6_get_total_gtt_size(u16 snb_gmch_ctl)
-{
-       snb_gmch_ctl >>= SNB_GMCH_GGMS_SHIFT;
-       snb_gmch_ctl &= SNB_GMCH_GGMS_MASK;
-       return snb_gmch_ctl << 20;
-}
-
-static unsigned int gen8_get_total_gtt_size(u16 bdw_gmch_ctl)
-{
-       bdw_gmch_ctl >>= BDW_GMCH_GGMS_SHIFT;
-       bdw_gmch_ctl &= BDW_GMCH_GGMS_MASK;
-       if (bdw_gmch_ctl)
-               bdw_gmch_ctl = 1 << bdw_gmch_ctl;
-
-#ifdef CONFIG_X86_32
-       /* Limit 32b platforms to a 2GB GGTT: 4 << 20 / pte size * I915_GTT_PAGE_SIZE */
-       if (bdw_gmch_ctl > 4)
-               bdw_gmch_ctl = 4;
-#endif
-
-       return bdw_gmch_ctl << 20;
-}
-
-static unsigned int gen6_gttadr_offset(struct drm_i915_private *i915)
-{
-       return gen6_gttmmadr_size(i915) / 2;
-}
-
-static int ggtt_probe_common(struct i915_ggtt *ggtt, u64 size)
-{
-       struct drm_i915_private *i915 = ggtt->vm.i915;
-       struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
-       phys_addr_t phys_addr;
-       u32 pte_flags;
-       int ret;
-
-       GEM_WARN_ON(pci_resource_len(pdev, 0) != gen6_gttmmadr_size(i915));
-       phys_addr = pci_resource_start(pdev, 0) + gen6_gttadr_offset(i915);
-
-       /*
-        * On BXT+/ICL+ writes larger than 64 bit to the GTT pagetable range
-        * will be dropped. For WC mappings in general we have 64 byte burst
-        * writes when the WC buffer is flushed, so we can't use it, but have to
-        * resort to an uncached mapping. The WC issue is easily caught by the
-        * readback check when writing GTT PTE entries.
-        */
-       if (IS_GEN9_LP(i915) || GRAPHICS_VER(i915) >= 11)
-               ggtt->gsm = ioremap(phys_addr, size);
-       else
-               ggtt->gsm = ioremap_wc(phys_addr, size);
-       if (!ggtt->gsm) {
-               drm_err(&i915->drm, "Failed to map the ggtt page table\n");
-               return -ENOMEM;
-       }
-
-       kref_init(&ggtt->vm.resv_ref);
-       ret = setup_scratch_page(&ggtt->vm);
-       if (ret) {
-               drm_err(&i915->drm, "Scratch setup failed\n");
-               /* iounmap will also get called at remove, but meh */
-               iounmap(ggtt->gsm);
-               return ret;
-       }
-
-       pte_flags = 0;
-       if (i915_gem_object_is_lmem(ggtt->vm.scratch[0]))
-               pte_flags |= PTE_LM;
-
-       ggtt->vm.scratch[0]->encode =
-               ggtt->vm.pte_encode(px_dma(ggtt->vm.scratch[0]),
-                                   I915_CACHE_NONE, pte_flags);
-
-       return 0;
-}
-
-int intel_gt_gmch_gen5_probe(struct i915_ggtt *ggtt)
-{
-       struct drm_i915_private *i915 = ggtt->vm.i915;
-       phys_addr_t gmadr_base;
-       int ret;
-
-       ret = intel_gmch_probe(i915->bridge_dev, to_pci_dev(i915->drm.dev), NULL);
-       if (!ret) {
-               drm_err(&i915->drm, "failed to set up gmch\n");
-               return -EIO;
-       }
-
-       intel_gtt_get(&ggtt->vm.total, &gmadr_base, &ggtt->mappable_end);
-
-       ggtt->gmadr =
-               (struct resource)DEFINE_RES_MEM(gmadr_base, ggtt->mappable_end);
-
-       ggtt->vm.alloc_pt_dma = alloc_pt_dma;
-       ggtt->vm.alloc_scratch_dma = alloc_pt_dma;
-
-       if (needs_idle_maps(i915)) {
-               drm_notice(&i915->drm,
-                          "Flushing DMA requests before IOMMU unmaps; performance may be degraded\n");
-               ggtt->do_idle_maps = true;
-       }
-
-       ggtt->vm.insert_page = gen5_ggtt_insert_page;
-       ggtt->vm.insert_entries = gen5_ggtt_insert_entries;
-       ggtt->vm.clear_range = gen5_ggtt_clear_range;
-       ggtt->vm.cleanup = gen5_gmch_remove;
-
-       ggtt->invalidate = gmch_ggtt_invalidate;
-
-       ggtt->vm.vma_ops.bind_vma    = intel_ggtt_bind_vma;
-       ggtt->vm.vma_ops.unbind_vma  = intel_ggtt_unbind_vma;
-
-       if (unlikely(ggtt->do_idle_maps))
-               drm_notice(&i915->drm,
-                          "Applying Ironlake quirks for intel_iommu\n");
-
-       return 0;
-}
-
-int intel_gt_gmch_gen6_probe(struct i915_ggtt *ggtt)
-{
-       struct drm_i915_private *i915 = ggtt->vm.i915;
-       struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
-       unsigned int size;
-       u16 snb_gmch_ctl;
-
-       ggtt->gmadr = intel_pci_resource(pdev, 2);
-       ggtt->mappable_end = resource_size(&ggtt->gmadr);
-
-       /*
-        * 64/512MB is the current min/max we actually know of, but this is
-        * just a coarse sanity check.
-        */
-       if (ggtt->mappable_end < (64<<20) || ggtt->mappable_end > (512<<20)) {
-               drm_err(&i915->drm, "Unknown GMADR size (%pa)\n",
-                       &ggtt->mappable_end);
-               return -ENXIO;
-       }
-
-       pci_read_config_word(pdev, SNB_GMCH_CTRL, &snb_gmch_ctl);
-
-       size = gen6_get_total_gtt_size(snb_gmch_ctl);
-       ggtt->vm.total = (size / sizeof(gen6_pte_t)) * I915_GTT_PAGE_SIZE;
-
-       ggtt->vm.alloc_pt_dma = alloc_pt_dma;
-       ggtt->vm.alloc_scratch_dma = alloc_pt_dma;
-
-       ggtt->vm.clear_range = nop_clear_range;
-       if (!HAS_FULL_PPGTT(i915) || intel_scanout_needs_vtd_wa(i915))
-               ggtt->vm.clear_range = gen6_ggtt_clear_range;
-       ggtt->vm.insert_page = gen6_ggtt_insert_page;
-       ggtt->vm.insert_entries = gen6_ggtt_insert_entries;
-       ggtt->vm.cleanup = gen6_gmch_remove;
-
-       ggtt->invalidate = gen6_ggtt_invalidate;
-
-       if (HAS_EDRAM(i915))
-               ggtt->vm.pte_encode = iris_pte_encode;
-       else if (IS_HASWELL(i915))
-               ggtt->vm.pte_encode = hsw_pte_encode;
-       else if (IS_VALLEYVIEW(i915))
-               ggtt->vm.pte_encode = byt_pte_encode;
-       else if (GRAPHICS_VER(i915) >= 7)
-               ggtt->vm.pte_encode = ivb_pte_encode;
-       else
-               ggtt->vm.pte_encode = snb_pte_encode;
-
-       ggtt->vm.vma_ops.bind_vma    = intel_ggtt_bind_vma;
-       ggtt->vm.vma_ops.unbind_vma  = intel_ggtt_unbind_vma;
-
-       return ggtt_probe_common(ggtt, size);
-}
-
-static unsigned int chv_get_total_gtt_size(u16 gmch_ctrl)
-{
-       gmch_ctrl >>= SNB_GMCH_GGMS_SHIFT;
-       gmch_ctrl &= SNB_GMCH_GGMS_MASK;
-
-       if (gmch_ctrl)
-               return 1 << (20 + gmch_ctrl);
-
-       return 0;
-}
-
-int intel_gt_gmch_gen8_probe(struct i915_ggtt *ggtt)
-{
-       struct drm_i915_private *i915 = ggtt->vm.i915;
-       struct pci_dev *pdev = to_pci_dev(i915->drm.dev);
-       unsigned int size;
-       u16 snb_gmch_ctl;
-
-       /* TODO: We're not aware of mappable constraints on gen8 yet */
-       if (!HAS_LMEM(i915)) {
-               ggtt->gmadr = intel_pci_resource(pdev, 2);
-               ggtt->mappable_end = resource_size(&ggtt->gmadr);
-       }
-
-       pci_read_config_word(pdev, SNB_GMCH_CTRL, &snb_gmch_ctl);
-       if (IS_CHERRYVIEW(i915))
-               size = chv_get_total_gtt_size(snb_gmch_ctl);
-       else
-               size = gen8_get_total_gtt_size(snb_gmch_ctl);
-
-       ggtt->vm.alloc_pt_dma = alloc_pt_dma;
-       ggtt->vm.alloc_scratch_dma = alloc_pt_dma;
-       ggtt->vm.lmem_pt_obj_flags = I915_BO_ALLOC_PM_EARLY;
-
-       ggtt->vm.total = (size / sizeof(gen8_pte_t)) * I915_GTT_PAGE_SIZE;
-       ggtt->vm.cleanup = gen6_gmch_remove;
-       ggtt->vm.insert_page = gen8_ggtt_insert_page;
-       ggtt->vm.clear_range = nop_clear_range;
-       if (intel_scanout_needs_vtd_wa(i915))
-               ggtt->vm.clear_range = gen8_ggtt_clear_range;
-
-       ggtt->vm.insert_entries = gen8_ggtt_insert_entries;
-
-       /*
-        * Serialize GTT updates with aperture access on BXT if VT-d is on,
-        * and always on CHV.
-        */
-       if (intel_vm_no_concurrent_access_wa(i915)) {
-               ggtt->vm.insert_entries = bxt_vtd_ggtt_insert_entries__BKL;
-               ggtt->vm.insert_page    = bxt_vtd_ggtt_insert_page__BKL;
-               ggtt->vm.bind_async_flags =
-                       I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND;
-       }
-
-       ggtt->invalidate = gen8_ggtt_invalidate;
-
-       ggtt->vm.vma_ops.bind_vma    = intel_ggtt_bind_vma;
-       ggtt->vm.vma_ops.unbind_vma  = intel_ggtt_unbind_vma;
-
-       ggtt->vm.pte_encode = gen8_ggtt_pte_encode;
-
-       setup_private_pat(ggtt->vm.gt->uncore);
-
-       return ggtt_probe_common(ggtt, size);
-}
-
-int intel_gt_gmch_gen5_enable_hw(struct drm_i915_private *i915)
-{
-       if (GRAPHICS_VER(i915) < 6 && !intel_enable_gtt())
-               return -EIO;
-
-       return 0;
-}
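The GGTT size arithmetic in the removed gen6/gen8 probe functions above can be sketched as a standalone fragment. This is a minimal sketch, not the driver's code: the shift and mask values, the 4 KiB GTT page size, and the 4-byte and 8-byte PTE sizes are assumptions made for illustration, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field layout; the authoritative values live in the i915 headers. */
#define SNB_GGMS_SHIFT	8
#define SNB_GGMS_MASK	0x3
#define BDW_GGMS_SHIFT	6
#define BDW_GGMS_MASK	0x3
#define GTT_PAGE_SIZE	4096u	/* assumed GTT page size */

/* Gen6/7 style: the field is the size of the PTE array in MiB. */
static uint32_t gen6_pte_bytes(uint16_t gmch_ctl)
{
	uint32_t ggms = (gmch_ctl >> SNB_GGMS_SHIFT) & SNB_GGMS_MASK;

	return ggms << 20;
}

/* Gen8 style: the field is a power-of-two exponent; 0 means no GTT. */
static uint32_t gen8_pte_bytes(uint16_t gmch_ctl)
{
	uint32_t ggms = (gmch_ctl >> BDW_GGMS_SHIFT) & BDW_GGMS_MASK;

	if (ggms)
		ggms = 1u << ggms;

	return ggms << 20;
}

/* Each PTE maps one GTT page, so the address space is entries * page size. */
static uint64_t ggtt_total(uint32_t pte_bytes, uint32_t pte_size)
{
	assert(pte_size == 4 || pte_size == 8);

	return (uint64_t)(pte_bytes / pte_size) * GTT_PAGE_SIZE;
}
```

Under these assumptions, a gen6 field value of 2 gives 2 MiB of 4-byte PTEs, which maps 2 GiB; a gen8 field value of 3 gives 8 MiB of 8-byte PTEs, which maps 4 GiB.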
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_gmch.h b/drivers/gpu/drm/i915/gt/intel_gt_gmch.h
deleted file mode 100644 (file)
index 75ed55c..0000000
+++ /dev/null
@@ -1,46 +0,0 @@
-/* SPDX-License-Identifier: MIT */
-/*
- * Copyright © 2022 Intel Corporation
- */
-
-#ifndef __INTEL_GT_GMCH_H__
-#define __INTEL_GT_GMCH_H__
-
-#include "intel_gtt.h"
-
-/* For x86 platforms */
-#if IS_ENABLED(CONFIG_X86)
-void intel_gt_gmch_gen5_chipset_flush(struct intel_gt *gt);
-int intel_gt_gmch_gen6_probe(struct i915_ggtt *ggtt);
-int intel_gt_gmch_gen8_probe(struct i915_ggtt *ggtt);
-int intel_gt_gmch_gen5_probe(struct i915_ggtt *ggtt);
-int intel_gt_gmch_gen5_enable_hw(struct drm_i915_private *i915);
-
-/* Stubs for non-x86 platforms */
-#else
-static inline void intel_gt_gmch_gen5_chipset_flush(struct intel_gt *gt)
-{
-}
-static inline int intel_gt_gmch_gen5_probe(struct i915_ggtt *ggtt)
-{
-       /* No HW should be probed for this case yet, return fail */
-       return -ENODEV;
-}
-static inline int intel_gt_gmch_gen6_probe(struct i915_ggtt *ggtt)
-{
-       /* No HW should be probed for this case yet, return fail */
-       return -ENODEV;
-}
-static inline int intel_gt_gmch_gen8_probe(struct i915_ggtt *ggtt)
-{
-       /* No HW should be probed for this case yet, return fail */
-       return -ENODEV;
-}
-static inline int intel_gt_gmch_gen5_enable_hw(struct drm_i915_private *i915)
-{
-       /* No HW should be enabled for this case yet, return fail */
-       return -ENODEV;
-}
-#endif
-
-#endif /* __INTEL_GT_GMCH_H__ */
index 88b4becfcb175b11d2813f1685ffbf83cbf18c58..3a72d4fd0214e41fa60924b45697265b8143b1bd 100644 (file)
@@ -193,6 +193,14 @@ void gen11_gt_irq_reset(struct intel_gt *gt)
        /* Restore masks irqs on RCS, BCS, VCS and VECS engines. */
        intel_uncore_write(uncore, GEN11_RCS0_RSVD_INTR_MASK,   ~0);
        intel_uncore_write(uncore, GEN11_BCS_RSVD_INTR_MASK,    ~0);
+       if (HAS_ENGINE(gt, BCS1) || HAS_ENGINE(gt, BCS2))
+               intel_uncore_write(uncore, XEHPC_BCS1_BCS2_INTR_MASK, ~0);
+       if (HAS_ENGINE(gt, BCS3) || HAS_ENGINE(gt, BCS4))
+               intel_uncore_write(uncore, XEHPC_BCS3_BCS4_INTR_MASK, ~0);
+       if (HAS_ENGINE(gt, BCS5) || HAS_ENGINE(gt, BCS6))
+               intel_uncore_write(uncore, XEHPC_BCS5_BCS6_INTR_MASK, ~0);
+       if (HAS_ENGINE(gt, BCS7) || HAS_ENGINE(gt, BCS8))
+               intel_uncore_write(uncore, XEHPC_BCS7_BCS8_INTR_MASK, ~0);
        intel_uncore_write(uncore, GEN11_VCS0_VCS1_INTR_MASK,   ~0);
        intel_uncore_write(uncore, GEN11_VCS2_VCS3_INTR_MASK,   ~0);
        if (HAS_ENGINE(gt, VCS4) || HAS_ENGINE(gt, VCS5))
@@ -248,6 +256,14 @@ void gen11_gt_irq_postinstall(struct intel_gt *gt)
        /* Unmask irqs on RCS, BCS, VCS and VECS engines. */
        intel_uncore_write(uncore, GEN11_RCS0_RSVD_INTR_MASK, ~smask);
        intel_uncore_write(uncore, GEN11_BCS_RSVD_INTR_MASK, ~smask);
+       if (HAS_ENGINE(gt, BCS1) || HAS_ENGINE(gt, BCS2))
+               intel_uncore_write(uncore, XEHPC_BCS1_BCS2_INTR_MASK, ~dmask);
+       if (HAS_ENGINE(gt, BCS3) || HAS_ENGINE(gt, BCS4))
+               intel_uncore_write(uncore, XEHPC_BCS3_BCS4_INTR_MASK, ~dmask);
+       if (HAS_ENGINE(gt, BCS5) || HAS_ENGINE(gt, BCS6))
+               intel_uncore_write(uncore, XEHPC_BCS5_BCS6_INTR_MASK, ~dmask);
+       if (HAS_ENGINE(gt, BCS7) || HAS_ENGINE(gt, BCS8))
+               intel_uncore_write(uncore, XEHPC_BCS7_BCS8_INTR_MASK, ~dmask);
        intel_uncore_write(uncore, GEN11_VCS0_VCS1_INTR_MASK, ~dmask);
        intel_uncore_write(uncore, GEN11_VCS2_VCS3_INTR_MASK, ~dmask);
        if (HAS_ENGINE(gt, VCS4) || HAS_ENGINE(gt, VCS5))
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_mcr.c b/drivers/gpu/drm/i915/gt/intel_gt_mcr.c
new file mode 100644 (file)
index 0000000..777025d
--- /dev/null
@@ -0,0 +1,497 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+#include "i915_drv.h"
+
+#include "intel_gt_mcr.h"
+#include "intel_gt_regs.h"
+
+/**
+ * DOC: GT Multicast/Replicated (MCR) Register Support
+ *
+ * Some GT registers are designed as "multicast" or "replicated" registers:
+ * multiple instances of the same register share a single MMIO offset.  MCR
+ * registers are generally used when the hardware needs to potentially track
+ * independent values of a register per hardware unit (e.g., per-subslice,
+ * per-L3bank, etc.).  The specific types of replication that exist vary
+ * per-platform.
+ *
+ * MMIO accesses to MCR registers are controlled according to the settings
+ * programmed in the platform's MCR_SELECTOR register(s).  MMIO writes to MCR
+ * registers can be either multicast (i.e., a single write updates all
+ * instances of the register to the same value) or unicast (a write updates only
+ * one specific instance).  Reads of MCR registers always operate in a unicast
+ * manner regardless of how the multicast/unicast bit is set in MCR_SELECTOR.
+ * Selection of a specific MCR instance for unicast operations is referred to
+ * as "steering."
+ *
+ * If MCR register operations are steered toward a hardware unit that is
+ * fused off or currently powered down due to power gating, the MMIO operation
+ * is "terminated" by the hardware.  Terminated read operations will return a
+ * value of zero and terminated unicast write operations will be silently
+ * ignored.
+ */
+
+#define HAS_MSLICE_STEERING(dev_priv)  (INTEL_INFO(dev_priv)->has_mslice_steering)
+
+static const char * const intel_steering_types[] = {
+       "L3BANK",
+       "MSLICE",
+       "LNCF",
+       "INSTANCE 0",
+};
+
+static const struct intel_mmio_range icl_l3bank_steering_table[] = {
+       { 0x00B100, 0x00B3FF },
+       {},
+};
+
+static const struct intel_mmio_range xehpsdv_mslice_steering_table[] = {
+       { 0x004000, 0x004AFF },
+       { 0x00C800, 0x00CFFF },
+       { 0x00DD00, 0x00DDFF },
+       { 0x00E900, 0x00FFFF }, /* 0xEA00 - 0xEFFF is unused */
+       {},
+};
+
+static const struct intel_mmio_range xehpsdv_lncf_steering_table[] = {
+       { 0x00B000, 0x00B0FF },
+       { 0x00D800, 0x00D8FF },
+       {},
+};
+
+static const struct intel_mmio_range dg2_lncf_steering_table[] = {
+       { 0x00B000, 0x00B0FF },
+       { 0x00D880, 0x00D8FF },
+       {},
+};
+
+/*
+ * We have several types of MCR registers on PVC where steering to (0,0)
+ * will always provide us with a non-terminated value.  We'll stick them
+ * all in the same table for simplicity.
+ */
+static const struct intel_mmio_range pvc_instance0_steering_table[] = {
+       { 0x004000, 0x004AFF },         /* HALF-BSLICE */
+       { 0x008800, 0x00887F },         /* CC */
+       { 0x008A80, 0x008AFF },         /* TILEPSMI */
+       { 0x00B000, 0x00B0FF },         /* HALF-BSLICE */
+       { 0x00B100, 0x00B3FF },         /* L3BANK */
+       { 0x00C800, 0x00CFFF },         /* HALF-BSLICE */
+       { 0x00D800, 0x00D8FF },         /* HALF-BSLICE */
+       { 0x00DD00, 0x00DDFF },         /* BSLICE */
+       { 0x00E900, 0x00E9FF },         /* HALF-BSLICE */
+       { 0x00EC00, 0x00EEFF },         /* HALF-BSLICE */
+       { 0x00F000, 0x00FFFF },         /* HALF-BSLICE */
+       { 0x024180, 0x0241FF },         /* HALF-BSLICE */
+       {},
+};
+
+void intel_gt_mcr_init(struct intel_gt *gt)
+{
+       struct drm_i915_private *i915 = gt->i915;
+
+       /*
+        * An mslice is unavailable only if the meml3 for the slice is
+        * disabled *and* all of the DSS in the slice (quadrant) are disabled.
+        */
+       if (HAS_MSLICE_STEERING(i915)) {
+               gt->info.mslice_mask =
+                       intel_slicemask_from_xehp_dssmask(gt->info.sseu.subslice_mask,
+                                                         GEN_DSS_PER_MSLICE);
+               gt->info.mslice_mask |=
+                       (intel_uncore_read(gt->uncore, GEN10_MIRROR_FUSE3) &
+                        GEN12_MEML3_EN_MASK);
+
+               if (!gt->info.mslice_mask) /* should be impossible! */
+                       drm_warn(&i915->drm, "mslice mask all zero!\n");
+       }
+
+       if (IS_PONTEVECCHIO(i915)) {
+               gt->steering_table[INSTANCE0] = pvc_instance0_steering_table;
+       } else if (IS_DG2(i915)) {
+               gt->steering_table[MSLICE] = xehpsdv_mslice_steering_table;
+               gt->steering_table[LNCF] = dg2_lncf_steering_table;
+       } else if (IS_XEHPSDV(i915)) {
+               gt->steering_table[MSLICE] = xehpsdv_mslice_steering_table;
+               gt->steering_table[LNCF] = xehpsdv_lncf_steering_table;
+       } else if (GRAPHICS_VER(i915) >= 11 &&
+                  GRAPHICS_VER_FULL(i915) < IP_VER(12, 50)) {
+               gt->steering_table[L3BANK] = icl_l3bank_steering_table;
+               gt->info.l3bank_mask =
+                       ~intel_uncore_read(gt->uncore, GEN10_MIRROR_FUSE3) &
+                       GEN10_L3BANK_MASK;
+               if (!gt->info.l3bank_mask) /* should be impossible! */
+                       drm_warn(&i915->drm, "L3 bank mask is all zero!\n");
+       } else if (GRAPHICS_VER(i915) >= 11) {
+               /*
+                * We expect all modern platforms to have at least some
+                * type of steering that needs to be initialized.
+                */
+               MISSING_CASE(INTEL_INFO(i915)->platform);
+       }
+}
+
+/*
+ * rw_with_mcr_steering_fw - Access a register with specific MCR steering
+ * @uncore: pointer to struct intel_uncore
+ * @reg: register being accessed
+ * @rw_flag: FW_REG_READ for read access or FW_REG_WRITE for write access
+ * @group: group number (documented as "sliceid" on older platforms)
+ * @instance: instance number (documented as "subsliceid" on older platforms)
+ * @value: register value to be written (ignored for read)
+ *
+ * Return: 0 for write access, the register value for read access.
+ *
+ * Caller needs to make sure the relevant forcewake wells are up.
+ */
+static u32 rw_with_mcr_steering_fw(struct intel_uncore *uncore,
+                                  i915_reg_t reg, u8 rw_flag,
+                                  int group, int instance, u32 value)
+{
+       u32 mcr_mask, mcr_ss, mcr, old_mcr, val = 0;
+
+       lockdep_assert_held(&uncore->lock);
+
+       if (GRAPHICS_VER(uncore->i915) >= 11) {
+               mcr_mask = GEN11_MCR_SLICE_MASK | GEN11_MCR_SUBSLICE_MASK;
+               mcr_ss = GEN11_MCR_SLICE(group) | GEN11_MCR_SUBSLICE(instance);
+
+               /*
+                * Wa_22013088509
+                *
+                * The setting of the multicast/unicast bit usually wouldn't
+                * matter for read operations (which always return the value
+                * from a single register instance regardless of how that bit
+                * is set), but some platforms have a workaround requiring us
+                * to remain in multicast mode for reads.  There's no real
+                * downside to this, so we'll just go ahead and do so on all
+                * platforms; we'll only clear the multicast bit from the mask
+                * when explicitly doing a write operation.
+                */
+               if (rw_flag == FW_REG_WRITE)
+                       mcr_mask |= GEN11_MCR_MULTICAST;
+       } else {
+               mcr_mask = GEN8_MCR_SLICE_MASK | GEN8_MCR_SUBSLICE_MASK;
+               mcr_ss = GEN8_MCR_SLICE(group) | GEN8_MCR_SUBSLICE(instance);
+       }
+
+       old_mcr = mcr = intel_uncore_read_fw(uncore, GEN8_MCR_SELECTOR);
+
+       mcr &= ~mcr_mask;
+       mcr |= mcr_ss;
+       intel_uncore_write_fw(uncore, GEN8_MCR_SELECTOR, mcr);
+
+       if (rw_flag == FW_REG_READ)
+               val = intel_uncore_read_fw(uncore, reg);
+       else
+               intel_uncore_write_fw(uncore, reg, value);
+
+       mcr &= ~mcr_mask;
+       mcr |= old_mcr & mcr_mask;
+
+       intel_uncore_write_fw(uncore, GEN8_MCR_SELECTOR, mcr);
+
+       return val;
+}
+
+static u32 rw_with_mcr_steering(struct intel_uncore *uncore,
+                               i915_reg_t reg, u8 rw_flag,
+                               int group, int instance,
+                               u32 value)
+{
+       enum forcewake_domains fw_domains;
+       u32 val;
+
+       fw_domains = intel_uncore_forcewake_for_reg(uncore, reg,
+                                                   rw_flag);
+       fw_domains |= intel_uncore_forcewake_for_reg(uncore,
+                                                    GEN8_MCR_SELECTOR,
+                                                    FW_REG_READ | FW_REG_WRITE);
+
+       spin_lock_irq(&uncore->lock);
+       intel_uncore_forcewake_get__locked(uncore, fw_domains);
+
+       val = rw_with_mcr_steering_fw(uncore, reg, rw_flag, group, instance, value);
+
+       intel_uncore_forcewake_put__locked(uncore, fw_domains);
+       spin_unlock_irq(&uncore->lock);
+
+       return val;
+}
+
+/**
+ * intel_gt_mcr_read - read a specific instance of an MCR register
+ * @gt: GT structure
+ * @reg: the MCR register to read
+ * @group: the MCR group
+ * @instance: the MCR instance
+ *
+ * Returns the value read from an MCR register after steering toward a specific
+ * group/instance.
+ */
+u32 intel_gt_mcr_read(struct intel_gt *gt,
+                     i915_reg_t reg,
+                     int group, int instance)
+{
+       return rw_with_mcr_steering(gt->uncore, reg, FW_REG_READ, group, instance, 0);
+}
+
+/**
+ * intel_gt_mcr_unicast_write - write a specific instance of an MCR register
+ * @gt: GT structure
+ * @reg: the MCR register to write
+ * @value: value to write
+ * @group: the MCR group
+ * @instance: the MCR instance
+ *
+ * Write an MCR register in unicast mode after steering toward a specific
+ * group/instance.
+ */
+void intel_gt_mcr_unicast_write(struct intel_gt *gt, i915_reg_t reg, u32 value,
+                               int group, int instance)
+{
+       rw_with_mcr_steering(gt->uncore, reg, FW_REG_WRITE, group, instance, value);
+}
+
+/**
+ * intel_gt_mcr_multicast_write - write a value to all instances of an MCR register
+ * @gt: GT structure
+ * @reg: the MCR register to write
+ * @value: value to write
+ *
+ * Write an MCR register in multicast mode to update all instances.
+ */
+void intel_gt_mcr_multicast_write(struct intel_gt *gt,
+                               i915_reg_t reg, u32 value)
+{
+       intel_uncore_write(gt->uncore, reg, value);
+}
+
+/**
+ * intel_gt_mcr_multicast_write_fw - write a value to all instances of an MCR register
+ * @gt: GT structure
+ * @reg: the MCR register to write
+ * @value: value to write
+ *
+ * Write an MCR register in multicast mode to update all instances.  This
+ * function assumes the caller is already holding any necessary forcewake
+ * domains; use intel_gt_mcr_multicast_write() in cases where forcewake should
+ * be obtained automatically.
+ */
+void intel_gt_mcr_multicast_write_fw(struct intel_gt *gt, i915_reg_t reg, u32 value)
+{
+       intel_uncore_write_fw(gt->uncore, reg, value);
+}
+
+/*
+ * reg_needs_read_steering - determine whether a register read requires
+ *     explicit steering
+ * @gt: GT structure
+ * @reg: the register to check steering requirements for
+ * @type: type of multicast steering to check
+ *
+ * Determines whether @reg needs explicit steering of a specific type for
+ * reads.
+ *
+ * Returns false if @reg does not belong to a register range of the given
+ * steering type, or if the default (subslice-based) steering IDs are suitable
+ * for @type steering too.
+ */
+static bool reg_needs_read_steering(struct intel_gt *gt,
+                                   i915_reg_t reg,
+                                   enum intel_steering_type type)
+{
+       const u32 offset = i915_mmio_reg_offset(reg);
+       const struct intel_mmio_range *entry;
+
+       if (likely(!gt->steering_table[type]))
+               return false;
+
+       for (entry = gt->steering_table[type]; entry->end; entry++) {
+               if (offset >= entry->start && offset <= entry->end)
+                       return true;
+       }
+
+       return false;
+}
+
+/*
+ * get_nonterminated_steering - determines valid IDs for a class of MCR steering
+ * @gt: GT structure
+ * @type: multicast register type
+ * @group: Group ID returned
+ * @instance: Instance ID returned
+ *
+ * Determines group and instance values that will steer reads of the specified
+ * MCR class to a non-terminated instance.
+ */
+static void get_nonterminated_steering(struct intel_gt *gt,
+                                      enum intel_steering_type type,
+                                      u8 *group, u8 *instance)
+{
+       switch (type) {
+       case L3BANK:
+               *group = 0;             /* unused */
+               *instance = __ffs(gt->info.l3bank_mask);
+               break;
+       case MSLICE:
+               GEM_WARN_ON(!HAS_MSLICE_STEERING(gt->i915));
+               *group = __ffs(gt->info.mslice_mask);
+               *instance = 0;  /* unused */
+               break;
+       case LNCF:
+               /*
+                * An LNCF is always present if its mslice is present, so we
+                * can safely just steer to LNCF 0 in all cases.
+                */
+               GEM_WARN_ON(!HAS_MSLICE_STEERING(gt->i915));
+               *group = __ffs(gt->info.mslice_mask) << 1;
+               *instance = 0;  /* unused */
+               break;
+       case INSTANCE0:
+               /*
+                * There are a lot of MCR types for which instance (0, 0)
+                * will always provide a non-terminated value.
+                */
+               *group = 0;
+               *instance = 0;
+               break;
+       default:
+               MISSING_CASE(type);
+               *group = 0;
+               *instance = 0;
+       }
+}
+
+/**
+ * intel_gt_mcr_get_nonterminated_steering - find group/instance values that
+ *    will steer a register to a non-terminated instance
+ * @gt: GT structure
+ * @reg: register for which the steering is required
+ * @group: return variable for group steering
+ * @instance: return variable for instance steering
+ *
+ * This function returns a group/instance pair that is guaranteed to work for
+ * read steering of the given register. Note that a value will be returned even
+ * if the register is not replicated and therefore does not actually require
+ * steering.
+ */
+void intel_gt_mcr_get_nonterminated_steering(struct intel_gt *gt,
+                                            i915_reg_t reg,
+                                            u8 *group, u8 *instance)
+{
+       int type;
+
+       for (type = 0; type < NUM_STEERING_TYPES; type++) {
+               if (reg_needs_read_steering(gt, reg, type)) {
+                       get_nonterminated_steering(gt, type, group, instance);
+                       return;
+               }
+       }
+
+       *group = gt->default_steering.groupid;
+       *instance = gt->default_steering.instanceid;
+}
+
+/**
+ * intel_gt_mcr_read_any_fw - reads one instance of an MCR register
+ * @gt: GT structure
+ * @reg: register to read
+ *
+ * Reads a GT MCR register.  The read will be steered to a non-terminated
+ * instance (i.e., one that isn't fused off or powered down by power gating).
+ * This function assumes the caller is already holding any necessary forcewake
+ * domains; use intel_gt_mcr_read_any() in cases where forcewake should be
+ * obtained automatically.
+ *
+ * Returns the value from a non-terminated instance of @reg.
+ */
+u32 intel_gt_mcr_read_any_fw(struct intel_gt *gt, i915_reg_t reg)
+{
+       int type;
+       u8 group, instance;
+
+       for (type = 0; type < NUM_STEERING_TYPES; type++) {
+               if (reg_needs_read_steering(gt, reg, type)) {
+                       get_nonterminated_steering(gt, type, &group, &instance);
+                       return rw_with_mcr_steering_fw(gt->uncore, reg,
+                                                      FW_REG_READ,
+                                                      group, instance, 0);
+               }
+       }
+
+       return intel_uncore_read_fw(gt->uncore, reg);
+}
+
+/**
+ * intel_gt_mcr_read_any - reads one instance of an MCR register
+ * @gt: GT structure
+ * @reg: register to read
+ *
+ * Reads a GT MCR register.  The read will be steered to a non-terminated
+ * instance (i.e., one that isn't fused off or powered down by power gating).
+ *
+ * Returns the value from a non-terminated instance of @reg.
+ */
+u32 intel_gt_mcr_read_any(struct intel_gt *gt, i915_reg_t reg)
+{
+       int type;
+       u8 group, instance;
+
+       for (type = 0; type < NUM_STEERING_TYPES; type++) {
+               if (reg_needs_read_steering(gt, reg, type)) {
+                       get_nonterminated_steering(gt, type, &group, &instance);
+                       return rw_with_mcr_steering(gt->uncore, reg,
+                                                   FW_REG_READ,
+                                                   group, instance, 0);
+               }
+       }
+
+       return intel_uncore_read(gt->uncore, reg);
+}
+
+static void report_steering_type(struct drm_printer *p,
+                                struct intel_gt *gt,
+                                enum intel_steering_type type,
+                                bool dump_table)
+{
+       const struct intel_mmio_range *entry;
+       u8 group, instance;
+
+       BUILD_BUG_ON(ARRAY_SIZE(intel_steering_types) != NUM_STEERING_TYPES);
+
+       if (!gt->steering_table[type]) {
+               drm_printf(p, "%s steering: uses default steering\n",
+                          intel_steering_types[type]);
+               return;
+       }
+
+       get_nonterminated_steering(gt, type, &group, &instance);
+       drm_printf(p, "%s steering: group=0x%x, instance=0x%x\n",
+                  intel_steering_types[type], group, instance);
+
+       if (!dump_table)
+               return;
+
+       for (entry = gt->steering_table[type]; entry->end; entry++)
+               drm_printf(p, "\t0x%06x - 0x%06x\n", entry->start, entry->end);
+}
+
+void intel_gt_mcr_report_steering(struct drm_printer *p, struct intel_gt *gt,
+                                 bool dump_table)
+{
+       drm_printf(p, "Default steering: group=0x%x, instance=0x%x\n",
+                  gt->default_steering.groupid,
+                  gt->default_steering.instanceid);
+
+       if (IS_PONTEVECCHIO(gt->i915)) {
+               report_steering_type(p, gt, INSTANCE0, dump_table);
+       } else if (HAS_MSLICE_STEERING(gt->i915)) {
+               report_steering_type(p, gt, MSLICE, dump_table);
+               report_steering_type(p, gt, LNCF, dump_table);
+       }
+}
+
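The table walk in reg_needs_read_steering() and the group selection in get_nonterminated_steering() above can be tried outside the kernel with a small sketch. The struct and helper names below are hypothetical stand-ins, first_set_bit() is a portable substitute for the kernel's __ffs(), and only the two ranges copied from xehpsdv_lncf_steering_table come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct intel_mmio_range. */
struct mmio_range {
	uint32_t start;
	uint32_t end;
};

/* Ranges copied from xehpsdv_lncf_steering_table in the patch. */
static const struct mmio_range lncf_table[] = {
	{ 0x00B000, 0x00B0FF },
	{ 0x00D800, 0x00D8FF },
	{ 0, 0 },	/* sentinel: end == 0 terminates the walk */
};

/* True if @offset falls in any inclusive range of a sentinel-terminated table. */
static bool offset_in_table(const struct mmio_range *table, uint32_t offset)
{
	const struct mmio_range *entry;

	if (!table)
		return false;	/* no table: default steering is good enough */

	for (entry = table; entry->end; entry++)
		if (offset >= entry->start && offset <= entry->end)
			return true;

	return false;
}

/* Portable substitute for __ffs(): index of the lowest set bit. */
static unsigned int first_set_bit(uint32_t mask)
{
	unsigned int bit = 0;

	assert(mask != 0);	/* undefined for an empty mask, as with __ffs() */
	while (!(mask & 1u)) {
		mask >>= 1;
		bit++;
	}

	return bit;
}

/* LNCF steering as in the patch: group is the first present mslice, doubled. */
static unsigned int lncf_group(uint32_t mslice_mask)
{
	return first_set_bit(mslice_mask) << 1;
}
```

The zero-terminated table keeps the lookup free of a separate length field, at the cost that a range ending at offset 0 cannot be expressed.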
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_mcr.h b/drivers/gpu/drm/i915/gt/intel_gt_mcr.h
new file mode 100644 (file)
index 0000000..506b0cb
--- /dev/null
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2022 Intel Corporation
+ */
+
+#ifndef __INTEL_GT_MCR__
+#define __INTEL_GT_MCR__
+
+#include "intel_gt_types.h"
+
+void intel_gt_mcr_init(struct intel_gt *gt);
+
+u32 intel_gt_mcr_read(struct intel_gt *gt,
+                     i915_reg_t reg,
+                     int group, int instance);
+u32 intel_gt_mcr_read_any_fw(struct intel_gt *gt, i915_reg_t reg);
+u32 intel_gt_mcr_read_any(struct intel_gt *gt, i915_reg_t reg);
+
+void intel_gt_mcr_unicast_write(struct intel_gt *gt,
+                               i915_reg_t reg, u32 value,
+                               int group, int instance);
+void intel_gt_mcr_multicast_write(struct intel_gt *gt,
+                                 i915_reg_t reg, u32 value);
+void intel_gt_mcr_multicast_write_fw(struct intel_gt *gt,
+                                    i915_reg_t reg, u32 value);
+
+void intel_gt_mcr_get_nonterminated_steering(struct intel_gt *gt,
+                                            i915_reg_t reg,
+                                            u8 *group, u8 *instance);
+
+void intel_gt_mcr_report_steering(struct drm_printer *p, struct intel_gt *gt,
+                                 bool dump_table);
+
+#endif /* __INTEL_GT_MCR__ */
index 90a4408650378ab7e7601852821475f278cc4af3..40bdd4cb629feadcd70506d0de871582b688c7b4 100644 (file)
@@ -100,14 +100,16 @@ static int vlv_drpc(struct seq_file *m)
 {
        struct intel_gt *gt = m->private;
        struct intel_uncore *uncore = gt->uncore;
-       u32 rcctl1, pw_status;
+       u32 rcctl1, pw_status, mt_fwake_req;
 
+       mt_fwake_req = intel_uncore_read_fw(uncore, FORCEWAKE_MT);
        pw_status = intel_uncore_read(uncore, VLV_GTLC_PW_STATUS);
        rcctl1 = intel_uncore_read(uncore, GEN6_RC_CONTROL);
 
        seq_printf(m, "RC6 Enabled: %s\n",
                   str_yes_no(rcctl1 & (GEN7_RC_CTL_TO_MODE |
                                        GEN6_RC_CTL_EI_MODE(1))));
+       seq_printf(m, "Multi-threaded Forcewake Request: 0x%x\n", mt_fwake_req);
        seq_printf(m, "Render Power Well: %s\n",
                   (pw_status & VLV_GTLC_PW_RENDER_STATUS_MASK) ? "Up" : "Down");
        seq_printf(m, "Media Power Well: %s\n",
@@ -124,9 +126,10 @@ static int gen6_drpc(struct seq_file *m)
        struct intel_gt *gt = m->private;
        struct drm_i915_private *i915 = gt->i915;
        struct intel_uncore *uncore = gt->uncore;
-       u32 gt_core_status, rcctl1, rc6vids = 0;
+       u32 gt_core_status, mt_fwake_req, rcctl1, rc6vids = 0;
        u32 gen9_powergate_enable = 0, gen9_powergate_status = 0;
 
+       mt_fwake_req = intel_uncore_read_fw(uncore, FORCEWAKE_MT);
        gt_core_status = intel_uncore_read_fw(uncore, GEN6_GT_CORE_STATUS);
 
        rcctl1 = intel_uncore_read(uncore, GEN6_RC_CONTROL);
@@ -178,6 +181,7 @@ static int gen6_drpc(struct seq_file *m)
 
        seq_printf(m, "Core Power Down: %s\n",
                   str_yes_no(gt_core_status & GEN6_CORE_CPD_STATE_MASK));
+       seq_printf(m, "Multi-threaded Forcewake Request: 0x%x\n", mt_fwake_req);
        if (GRAPHICS_VER(i915) >= 9) {
                seq_printf(m, "Render Power Well: %s\n",
                           (gen9_powergate_status &
index a0a49c16babd36b3f577dfb5e33229e7a2807e6e..37c1095d8603b42bb5dadc145a2d59bfa74601ea 100644 (file)
 #define FF_SLICE_CS_CHICKEN2                   _MMIO(0x20e4)
 #define   GEN9_TSG_BARRIER_ACK_DISABLE         (1 << 8)
 #define   GEN9_POOLED_EU_LOAD_BALANCING_FIX_DISABLE    (1 << 10)
+#define   GEN12_PERF_FIX_BALANCING_CFE_DISABLE REG_BIT(15)
 
 #define GEN9_CS_DEBUG_MODE1                    _MMIO(0x20ec)
 #define   FF_DOP_CLOCK_GATE_DISABLE            REG_BIT(1)
 
 #define GEN12_PAT_INDEX(index)                 _MMIO(0x4800 + (index) * 4)
 
-#define XEHPSDV_FLAT_CCS_BASE_ADDR             _MMIO(0x4910)
-#define   XEHPSDV_CCS_BASE_SHIFT               8
+#define XEHP_TILE0_ADDR_RANGE                  _MMIO(0x4900)
+#define   XEHP_TILE_LMEM_RANGE_SHIFT           8
+
+#define XEHP_FLAT_CCS_BASE_ADDR                        _MMIO(0x4910)
+#define   XEHP_CCS_BASE_SHIFT                  8
 
 #define GAMTARBMODE                            _MMIO(0x4a08)
 #define   ARB_MODE_BWGTLB_DISABLE              (1 << 9)
 #define   GEN11_GT_VEBOX_DISABLE_MASK          (0x0f << GEN11_GT_VEBOX_DISABLE_SHIFT)
 
 #define GEN12_GT_COMPUTE_DSS_ENABLE            _MMIO(0x9144)
+#define XEHPC_GT_COMPUTE_DSS_ENABLE_EXT                _MMIO(0x9148)
 
 #define GEN6_UCGCTL1                           _MMIO(0x9400)
 #define   GEN6_GAMUNIT_CLOCK_GATE_DISABLE      (1 << 22)
 /* GEN11 changed all bit defs except for FULL & RENDER */
 #define   GEN11_GRDOM_FULL                     GEN6_GRDOM_FULL
 #define   GEN11_GRDOM_RENDER                   GEN6_GRDOM_RENDER
-#define   GEN11_GRDOM_BLT                      (1 << 2)
-#define   GEN11_GRDOM_GUC                      (1 << 3)
-#define   GEN11_GRDOM_MEDIA                    (1 << 5)
-#define   GEN11_GRDOM_MEDIA2                   (1 << 6)
-#define   GEN11_GRDOM_MEDIA3                   (1 << 7)
-#define   GEN11_GRDOM_MEDIA4                   (1 << 8)
-#define   GEN11_GRDOM_MEDIA5                   (1 << 9)
-#define   GEN11_GRDOM_MEDIA6                   (1 << 10)
-#define   GEN11_GRDOM_MEDIA7                   (1 << 11)
-#define   GEN11_GRDOM_MEDIA8                   (1 << 12)
-#define   GEN11_GRDOM_VECS                     (1 << 13)
-#define   GEN11_GRDOM_VECS2                    (1 << 14)
-#define   GEN11_GRDOM_VECS3                    (1 << 15)
-#define   GEN11_GRDOM_VECS4                    (1 << 16)
-#define   GEN11_GRDOM_SFC0                     (1 << 17)
-#define   GEN11_GRDOM_SFC1                     (1 << 18)
-#define   GEN11_GRDOM_SFC2                     (1 << 19)
-#define   GEN11_GRDOM_SFC3                     (1 << 20)
+#define   XEHPC_GRDOM_BLT8                     REG_BIT(31)
+#define   XEHPC_GRDOM_BLT7                     REG_BIT(30)
+#define   XEHPC_GRDOM_BLT6                     REG_BIT(29)
+#define   XEHPC_GRDOM_BLT5                     REG_BIT(28)
+#define   XEHPC_GRDOM_BLT4                     REG_BIT(27)
+#define   XEHPC_GRDOM_BLT3                     REG_BIT(26)
+#define   XEHPC_GRDOM_BLT2                     REG_BIT(25)
+#define   XEHPC_GRDOM_BLT1                     REG_BIT(24)
+#define   GEN11_GRDOM_SFC3                     REG_BIT(20)
+#define   GEN11_GRDOM_SFC2                     REG_BIT(19)
+#define   GEN11_GRDOM_SFC1                     REG_BIT(18)
+#define   GEN11_GRDOM_SFC0                     REG_BIT(17)
+#define   GEN11_GRDOM_VECS4                    REG_BIT(16)
+#define   GEN11_GRDOM_VECS3                    REG_BIT(15)
+#define   GEN11_GRDOM_VECS2                    REG_BIT(14)
+#define   GEN11_GRDOM_VECS                     REG_BIT(13)
+#define   GEN11_GRDOM_MEDIA8                   REG_BIT(12)
+#define   GEN11_GRDOM_MEDIA7                   REG_BIT(11)
+#define   GEN11_GRDOM_MEDIA6                   REG_BIT(10)
+#define   GEN11_GRDOM_MEDIA5                   REG_BIT(9)
+#define   GEN11_GRDOM_MEDIA4                   REG_BIT(8)
+#define   GEN11_GRDOM_MEDIA3                   REG_BIT(7)
+#define   GEN11_GRDOM_MEDIA2                   REG_BIT(6)
+#define   GEN11_GRDOM_MEDIA                    REG_BIT(5)
+#define   GEN11_GRDOM_GUC                      REG_BIT(3)
+#define   GEN11_GRDOM_BLT                      REG_BIT(2)
 #define   GEN11_VCS_SFC_RESET_BIT(instance)    (GEN11_GRDOM_SFC0 << ((instance) >> 1))
 #define   GEN11_VECS_SFC_RESET_BIT(instance)   (GEN11_GRDOM_SFC0 << (instance))
 
 
 #define GEN7_MISCCPCTL                         _MMIO(0x9424)
 #define   GEN7_DOP_CLOCK_GATE_ENABLE           (1 << 0)
+#define   GEN12_DOP_CLOCK_GATE_RENDER_ENABLE   REG_BIT(1)
 #define   GEN8_DOP_CLOCK_GATE_CFCLK_ENABLE     (1 << 2)
 #define   GEN8_DOP_CLOCK_GATE_GUC_ENABLE       (1 << 4)
 #define   GEN8_DOP_CLOCK_GATE_MEDIA_ENABLE     (1 << 6)
 #define   GEN6_AGGRESSIVE_TURBO                        (0 << 15)
 #define   GEN9_SW_REQ_UNSLICE_RATIO_SHIFT      23
 #define   GEN9_IGNORE_SLICE_RATIO              (0 << 0)
+#define   GEN12_MEDIA_FREQ_RATIO               REG_BIT(13)
 
 #define GEN6_RC_VIDEO_FREQ                     _MMIO(0xa00c)
 #define   GEN6_RC_CTL_RC6pp_ENABLE             (1 << 16)
 #define XEHP_L3SCQREG7                         _MMIO(0xb188)
 #define   BLEND_FILL_CACHING_OPT_DIS           REG_BIT(3)
 
+#define XEHPC_L3SCRUB                          _MMIO(0xb18c)
+#define   SCRUB_CL_DWNGRADE_SHARED             REG_BIT(12)
+#define   SCRUB_RATE_PER_BANK_MASK             REG_GENMASK(2, 0)
+#define   SCRUB_RATE_4B_PER_CLK                        REG_FIELD_PREP(SCRUB_RATE_PER_BANK_MASK, 0x6)
+
 #define L3SQCREG1_CCS0                         _MMIO(0xb200)
 #define   FLUSHALLNONCOH                       REG_BIT(5)
 
 #define   GEN9_ENABLE_GPGPU_PREEMPTION         REG_BIT(2)
 
 #define GEN10_CACHE_MODE_SS                    _MMIO(0xe420)
-#define   ENABLE_PREFETCH_INTO_IC              REG_BIT(3)
+#define   ENABLE_EU_COUNT_FOR_TDL_FLUSH                REG_BIT(10)
+#define   DISABLE_ECC                          REG_BIT(5)
 #define   FLOAT_BLEND_OPTIMIZATION_ENABLE      REG_BIT(4)
+#define   ENABLE_PREFETCH_INTO_IC              REG_BIT(3)
 
 #define EU_PERF_CNTL0                          _MMIO(0xe458)
 #define EU_PERF_CNTL4                          _MMIO(0xe45c)
 #define   GEN11_KCR                            (19)
 #define   GEN11_GTPM                           (16)
 #define   GEN11_BCS                            (15)
+#define   XEHPC_BCS1                           (14)
+#define   XEHPC_BCS2                           (13)
+#define   XEHPC_BCS3                           (12)
+#define   XEHPC_BCS4                           (11)
+#define   XEHPC_BCS5                           (10)
+#define   XEHPC_BCS6                           (9)
+#define   XEHPC_BCS7                           (8)
+#define   XEHPC_BCS8                           (23)
 #define   GEN12_CCS3                           (7)
 #define   GEN12_CCS2                           (6)
 #define   GEN12_CCS1                           (5)
 #define GEN11_GUNIT_CSME_INTR_MASK             _MMIO(0x1900f4)
 #define GEN12_CCS0_CCS1_INTR_MASK              _MMIO(0x190100)
 #define GEN12_CCS2_CCS3_INTR_MASK              _MMIO(0x190104)
+#define XEHPC_BCS1_BCS2_INTR_MASK              _MMIO(0x190110)
+#define XEHPC_BCS3_BCS4_INTR_MASK              _MMIO(0x190114)
+#define XEHPC_BCS5_BCS6_INTR_MASK              _MMIO(0x190118)
+#define XEHPC_BCS7_BCS8_INTR_MASK              _MMIO(0x19011c)
 
 #define GEN12_SFC_DONE(n)                      _MMIO(0x1cc000 + (n) * 0x1000)
 
index 8ec8bc660c8c2bbf88c551bcfac3294753b0123b..9e4ebf53379bcaf8a17afd24ec921b492b136267 100644 (file)
@@ -24,7 +24,7 @@ bool is_object_gt(struct kobject *kobj)
 
 static struct intel_gt *kobj_to_gt(struct kobject *kobj)
 {
-       return container_of(kobj, struct kobj_gt, base)->gt;
+       return container_of(kobj, struct intel_gt, sysfs_gt);
 }
 
 struct intel_gt *intel_gt_sysfs_get_drvdata(struct device *dev,
@@ -72,9 +72,9 @@ static struct attribute *id_attrs[] = {
 };
 ATTRIBUTE_GROUPS(id);
 
+/* A kobject needs a release() method even if it does nothing */
 static void kobj_gt_release(struct kobject *kobj)
 {
-       kfree(kobj);
 }
 
 static struct kobj_type kobj_gt_type = {
@@ -85,8 +85,6 @@ static struct kobj_type kobj_gt_type = {
 
 void intel_gt_sysfs_register(struct intel_gt *gt)
 {
-       struct kobj_gt *kg;
-
        /*
         * We need to make things right with the
         * ABI compatibility. The files were originally
@@ -98,25 +96,22 @@ void intel_gt_sysfs_register(struct intel_gt *gt)
        if (gt_is_root(gt))
                intel_gt_sysfs_pm_init(gt, gt_get_parent_obj(gt));
 
-       kg = kzalloc(sizeof(*kg), GFP_KERNEL);
-       if (!kg)
+       /* init and xfer ownership to sysfs tree */
+       if (kobject_init_and_add(&gt->sysfs_gt, &kobj_gt_type,
+                                gt->i915->sysfs_gt, "gt%d", gt->info.id))
                goto exit_fail;
 
-       kobject_init(&kg->base, &kobj_gt_type);
-       kg->gt = gt;
-
-       /* xfer ownership to sysfs tree */
-       if (kobject_add(&kg->base, gt->i915->sysfs_gt, "gt%d", gt->info.id))
-               goto exit_kobj_put;
-
-       intel_gt_sysfs_pm_init(gt, &kg->base);
+       intel_gt_sysfs_pm_init(gt, &gt->sysfs_gt);
 
        return;
 
-exit_kobj_put:
-       kobject_put(&kg->base);
-
 exit_fail:
+       kobject_put(&gt->sysfs_gt);
        drm_warn(&gt->i915->drm,
                 "failed to initialize gt%d sysfs root\n", gt->info.id);
 }
+
+void intel_gt_sysfs_unregister(struct intel_gt *gt)
+{
+       kobject_put(&gt->sysfs_gt);
+}
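
Note on the hunk above: the kobject is now embedded in struct intel_gt rather than allocated separately, so kobj_to_gt() can recover the owning structure by pointer arithmetic alone. A minimal userspace sketch of that pattern follows. The fake_kobject and fake_gt types and the simplified container_of() macro are illustrative assumptions, not the kernel's definitions.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct kobject and struct intel_gt. */
struct fake_kobject {
	const char *name;
};

struct fake_gt {
	int id;
	struct fake_kobject sysfs_gt;	/* embedded, not allocated separately */
};

/* Simplified userspace version of container_of(); no type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the enclosing fake_gt from a pointer to its embedded kobject. */
static struct fake_gt *kobj_to_gt(struct fake_kobject *kobj)
{
	return container_of(kobj, struct fake_gt, sysfs_gt);
}
```

Because the kobject's storage belongs to the enclosing structure, a release() callback in this arrangement has nothing of its own to free, which is consistent with the now-empty kobj_gt_release() in the diff.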
index 9471b26752cfcfeccffc5c159d551925a3778fb6..a99aa7e8b01a631f85080f7e9da13421362887c4 100644 (file)
 
 struct intel_gt;
 
-struct kobj_gt {
-       struct kobject base;
-       struct intel_gt *gt;
-};
-
 bool is_object_gt(struct kobject *kobj);
 
 struct drm_i915_private *kobj_to_i915(struct kobject *kobj);
@@ -28,6 +23,7 @@ intel_gt_create_kobj(struct intel_gt *gt,
                     const char *name);
 
 void intel_gt_sysfs_register(struct intel_gt *gt);
+void intel_gt_sysfs_unregister(struct intel_gt *gt);
 struct intel_gt *intel_gt_sysfs_get_drvdata(struct device *dev,
                                            const char *name);
 
index f76b6cf8040ec86ace1285356b1639836118b014..73a8b46e0234430ec08515c93fdad7c4547a38e2 100644 (file)
@@ -14,6 +14,7 @@
 #include "intel_gt_regs.h"
 #include "intel_gt_sysfs.h"
 #include "intel_gt_sysfs_pm.h"
+#include "intel_pcode.h"
 #include "intel_rc6.h"
 #include "intel_rps.h"
 
@@ -558,6 +559,174 @@ static const struct attribute *freq_attrs[] = {
        NULL
 };
 
+/*
+ * Scaling for multipliers (aka frequency factors).
+ * The format of the value in the register is u8.8.
+ *
+ * The presentation to userspace is inspired by the perf event framework.
+ * See:
+ *   Documentation/ABI/testing/sysfs-bus-event_source-devices-events
+ * for description of:
+ *   /sys/bus/event_source/devices/<pmu>/events/<event>.scale
+ *
+ * Summary: Expose two sysfs files for each multiplier.
+ *
+ * 1. File <attr> contains a raw hardware value.
+ * 2. File <attr>.scale contains the multiplicative scale factor to be
+ *    used by userspace to compute the actual value.
+ *
+ * So userspace knows that to get the frequency_factor it multiplies the
+ * provided value by the specified scale factor and vice-versa.
+ *
+ * That way there is no precision loss in the kernel interface and API
+ * is future proof should one day the hardware register change to u16.u16,
+ * on some platform. (Or any other fixed point representation.)
+ *
+ * Example:
+ * File <attr> contains the value 2.5, represented as u8.8 0x0280, which
+ * is comprised of:
+ * - an integer part of 2
+ * - a fractional part of 0x80 (representing 0x80 / 2^8 == 0x80 / 256).
+ * File <attr>.scale contains a string representation of floating point
+ * value 0.00390625 (which is (1 / 256)).
+ * Userspace computes the actual value:
+ *   0x0280 * 0.00390625 -> 2.5
+ * or converts an actual value to the value to be written into <attr>:
+ *   2.5 / 0.00390625 -> 0x0280
+ */
+
+#define U8_8_VAL_MASK           0xffff
+#define U8_8_SCALE_TO_VALUE     "0.00390625"
+
+static ssize_t freq_factor_scale_show(struct device *dev,
+                                     struct device_attribute *attr,
+                                     char *buff)
+{
+       return sysfs_emit(buff, "%s\n", U8_8_SCALE_TO_VALUE);
+}
+
+static u32 media_ratio_mode_to_factor(u32 mode)
+{
+       /* 0 -> 0, 1 -> 256, 2 -> 128 */
+       return !mode ? mode : 256 / mode;
+}
+
+static ssize_t media_freq_factor_show(struct device *dev,
+                                     struct device_attribute *attr,
+                                     char *buff)
+{
+       struct intel_gt *gt = intel_gt_sysfs_get_drvdata(dev, attr->attr.name);
+       struct intel_guc_slpc *slpc = &gt->uc.guc.slpc;
+       intel_wakeref_t wakeref;
+       u32 mode;
+
+       /*
+        * Retrieve media_ratio_mode from GEN6_RPNSWREQ bit 13 set by
+        * GuC. GEN6_RPNSWREQ:13 value 0 represents 1:2 and 1 represents 1:1
+        */
+       if (IS_XEHPSDV(gt->i915) &&
+           slpc->media_ratio_mode == SLPC_MEDIA_RATIO_MODE_DYNAMIC_CONTROL) {
+               /*
+                * For XEHPSDV dynamic mode GEN6_RPNSWREQ:13 does not contain
+                * the media_ratio_mode, just return the cached media ratio
+                */
+               mode = slpc->media_ratio_mode;
+       } else {
+               with_intel_runtime_pm(gt->uncore->rpm, wakeref)
+                       mode = intel_uncore_read(gt->uncore, GEN6_RPNSWREQ);
+               mode = REG_FIELD_GET(GEN12_MEDIA_FREQ_RATIO, mode) ?
+                       SLPC_MEDIA_RATIO_MODE_FIXED_ONE_TO_ONE :
+                       SLPC_MEDIA_RATIO_MODE_FIXED_ONE_TO_TWO;
+       }
+
+       return sysfs_emit(buff, "%u\n", media_ratio_mode_to_factor(mode));
+}
+
+static ssize_t media_freq_factor_store(struct device *dev,
+                                      struct device_attribute *attr,
+                                      const char *buff, size_t count)
+{
+       struct intel_gt *gt = intel_gt_sysfs_get_drvdata(dev, attr->attr.name);
+       struct intel_guc_slpc *slpc = &gt->uc.guc.slpc;
+       u32 factor, mode;
+       int err;
+
+       err = kstrtou32(buff, 0, &factor);
+       if (err)
+               return err;
+
+       for (mode = SLPC_MEDIA_RATIO_MODE_DYNAMIC_CONTROL;
+            mode <= SLPC_MEDIA_RATIO_MODE_FIXED_ONE_TO_TWO; mode++)
+               if (factor == media_ratio_mode_to_factor(mode))
+                       break;
+
+       if (mode > SLPC_MEDIA_RATIO_MODE_FIXED_ONE_TO_TWO)
+               return -EINVAL;
+
+       err = intel_guc_slpc_set_media_ratio_mode(slpc, mode);
+       if (!err) {
+               slpc->media_ratio_mode = mode;
+               DRM_DEBUG("Set slpc->media_ratio_mode to %d", mode);
+       }
+       return err ?: count;
+}
+
+static ssize_t media_RP0_freq_mhz_show(struct device *dev,
+                                      struct device_attribute *attr,
+                                      char *buff)
+{
+       struct intel_gt *gt = intel_gt_sysfs_get_drvdata(dev, attr->attr.name);
+       u32 val;
+       int err;
+
+       err = snb_pcode_read_p(gt->uncore, XEHP_PCODE_FREQUENCY_CONFIG,
+                              PCODE_MBOX_FC_SC_READ_FUSED_P0,
+                              PCODE_MBOX_DOMAIN_MEDIAFF, &val);
+
+       if (err)
+               return err;
+
+       /* Fused media RP0 read from pcode is in units of 50 MHz */
+       val *= GT_FREQUENCY_MULTIPLIER;
+
+       return sysfs_emit(buff, "%u\n", val);
+}
+
+static ssize_t media_RPn_freq_mhz_show(struct device *dev,
+                                      struct device_attribute *attr,
+                                      char *buff)
+{
+       struct intel_gt *gt = intel_gt_sysfs_get_drvdata(dev, attr->attr.name);
+       u32 val;
+       int err;
+
+       err = snb_pcode_read_p(gt->uncore, XEHP_PCODE_FREQUENCY_CONFIG,
+                              PCODE_MBOX_FC_SC_READ_FUSED_PN,
+                              PCODE_MBOX_DOMAIN_MEDIAFF, &val);
+
+       if (err)
+               return err;
+
+       /* Fused media RPn read from pcode is in units of 50 MHz */
+       val *= GT_FREQUENCY_MULTIPLIER;
+
+       return sysfs_emit(buff, "%u\n", val);
+}
+
+static DEVICE_ATTR_RW(media_freq_factor);
+static struct device_attribute dev_attr_media_freq_factor_scale =
+       __ATTR(media_freq_factor.scale, 0444, freq_factor_scale_show, NULL);
+static DEVICE_ATTR_RO(media_RP0_freq_mhz);
+static DEVICE_ATTR_RO(media_RPn_freq_mhz);
+
+static const struct attribute *media_perf_power_attrs[] = {
+       &dev_attr_media_freq_factor.attr,
+       &dev_attr_media_freq_factor_scale.attr,
+       &dev_attr_media_RP0_freq_mhz.attr,
+       &dev_attr_media_RPn_freq_mhz.attr,
+       NULL
+};
+
 static int intel_sysfs_rps_init(struct intel_gt *gt, struct kobject *kobj,
                                const struct attribute * const *attrs)
 {
@@ -599,4 +768,12 @@ void intel_gt_sysfs_pm_init(struct intel_gt *gt, struct kobject *kobj)
                drm_warn(&gt->i915->drm,
                         "failed to create gt%u throttle sysfs files (%pe)",
                         gt->info.id, ERR_PTR(ret));
+
+       if (HAS_MEDIA_RATIO_MODE(gt->i915) && intel_uc_uses_guc_slpc(&gt->uc)) {
+               ret = sysfs_create_files(kobj, media_perf_power_attrs);
+               if (ret)
+                       drm_warn(&gt->i915->drm,
+                                "failed to create gt%u media_perf_power_attrs sysfs (%pe)\n",
+                                gt->info.id, ERR_PTR(ret));
+       }
 }
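
Note on the u8.8 scaling comment in this file: the conversion it describes can be sketched in userspace as below. The helper names u8_8_to_value(), value_to_u8_8() and mode_to_factor() are hypothetical; only the 0.00390625 scale and the mode-to-factor mapping (0 -> 0, 1 -> 256, 2 -> 128) come from the patch.

```c
#include <stdint.h>

/* 1 / 256, the string exposed through the <attr>.scale file. */
#define U8_8_SCALE 0.00390625

/* Raw u8.8 register value -> actual multiplier. */
static double u8_8_to_value(uint16_t raw)
{
	return raw * U8_8_SCALE;
}

/* Actual multiplier -> raw u8.8 value, rounded to the nearest step. */
static uint16_t value_to_u8_8(double value)
{
	return (uint16_t)(value / U8_8_SCALE + 0.5);
}

/* Same arithmetic as media_ratio_mode_to_factor() in the patch. */
static uint32_t mode_to_factor(uint32_t mode)
{
	return !mode ? mode : 256 / mode;
}
```

With these helpers, the raw value 0x0280 maps to 2.5 as in the comment's example, and the factors 256 and 128 correspond to media frequency ratios of 1.0 and 0.5.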
index b06611c1d4ada19860d07be87ec5a4411bfde81e..df708802889dfc00e17863295b11bba5bf7d2e90 100644 (file)
@@ -59,6 +59,13 @@ enum intel_steering_type {
        MSLICE,
        LNCF,
 
+       /*
+        * On some platforms there are multiple types of MCR registers that
+        * will always return a non-terminated value at instance (0, 0).  We'll
+        * lump those all into a single category to keep things simple.
+        */
+       INSTANCE0,
+
        NUM_STEERING_TYPES
 };
 
@@ -221,9 +228,13 @@ struct intel_gt {
 
        struct {
                u8 uc_index;
+               u8 wb_index; /* Only used on HAS_L3_CCS_READ() platforms */
        } mocs;
 
        struct intel_pxp pxp;
+
+       /* gt/gtN sysfs */
+       struct kobject sysfs_gt;
 };
 
 enum intel_gt_scratch_field {
index a40d928b38884d2e12ef9e717ddc3da059e50448..e639434e97fdb40dcfa2af0b0bb84d2de9ff5617 100644 (file)
@@ -306,6 +306,15 @@ struct i915_address_space {
                               struct i915_vma_resource *vma_res,
                               enum i915_cache_level cache_level,
                               u32 flags);
+       void (*raw_insert_page)(struct i915_address_space *vm,
+                               dma_addr_t addr,
+                               u64 offset,
+                               enum i915_cache_level cache_level,
+                               u32 flags);
+       void (*raw_insert_entries)(struct i915_address_space *vm,
+                                  struct i915_vma_resource *vma_res,
+                                  enum i915_cache_level cache_level,
+                                  u32 flags);
        void (*cleanup)(struct i915_address_space *vm);
 
        void (*foreach)(struct i915_address_space *vm,
@@ -345,6 +354,19 @@ struct i915_ggtt {
 
        bool do_idle_maps;
 
+       /**
+        * @pte_lost: Are ptes lost on resume?
+        *
+        * Whether the system was recently restored from hibernate and
+        * thus may have lost pte content.
+        */
+       bool pte_lost;
+
+       /**
+        * @probed_pte: Probed pte value on suspend. Re-checked on resume.
+        */
+       u64 probed_pte;
+
        int mtrr;
 
        /** Bit 6 swizzling required for X tiling */
@@ -548,14 +570,13 @@ i915_page_dir_dma_addr(const struct i915_ppgtt *ppgtt, const unsigned int n)
 
 void ppgtt_init(struct i915_ppgtt *ppgtt, struct intel_gt *gt,
                unsigned long lmem_pt_obj_flags);
-
 void intel_ggtt_bind_vma(struct i915_address_space *vm,
-                         struct i915_vm_pt_stash *stash,
-                         struct i915_vma_resource *vma_res,
-                         enum i915_cache_level cache_level,
-                         u32 flags);
+                        struct i915_vm_pt_stash *stash,
+                        struct i915_vma_resource *vma_res,
+                        enum i915_cache_level cache_level,
+                        u32 flags);
 void intel_ggtt_unbind_vma(struct i915_address_space *vm,
-                           struct i915_vma_resource *vma_res);
+                          struct i915_vma_resource *vma_res);
 
 int i915_ggtt_probe_hw(struct drm_i915_private *i915);
 int i915_ggtt_init_hw(struct drm_i915_private *i915);
@@ -581,6 +602,17 @@ bool i915_ggtt_resume_vm(struct i915_address_space *vm);
 void i915_ggtt_suspend(struct i915_ggtt *gtt);
 void i915_ggtt_resume(struct i915_ggtt *ggtt);
 
+/**
+ * i915_ggtt_mark_pte_lost - Mark ggtt ptes as lost or clear such a marking
+ * @i915 The device private.
+ * @val whether the ptes should be marked as lost.
+ *
+ * In some cases pte content is retained across suspend, but typically lost
+ * across hibernate. Typically they should be marked as lost on
+ * hibernation restore and such marking cleared on suspend.
+ */
+void i915_ggtt_mark_pte_lost(struct drm_i915_private *i915, bool val);
+
 void
 fill_page_dma(struct drm_i915_gem_object *p, const u64 val, unsigned int count);
 
@@ -627,7 +659,6 @@ release_pd_entry(struct i915_page_directory * const pd,
                 struct i915_page_table * const pt,
                 const struct drm_i915_gem_object * const scratch);
 void gen6_ggtt_invalidate(struct i915_ggtt *ggtt);
-void gen8_ggtt_invalidate(struct i915_ggtt *ggtt);
 
 void ppgtt_bind_vma(struct i915_address_space *vm,
                    struct i915_vm_pt_stash *stash,
index 31be734010db3bc5cd7234caff95a42268e035a3..a390f0813c8b64e5b0ee08a48a9ca7e875ad9e94 100644 (file)
@@ -111,16 +111,6 @@ enum {
 #define XEHP_SW_COUNTER_SHIFT                  58
 #define XEHP_SW_COUNTER_WIDTH                  6
 
-static inline u32 lrc_desc_priority(int prio)
-{
-       if (prio > I915_PRIORITY_NORMAL)
-               return GEN12_CTX_PRIORITY_HIGH;
-       else if (prio < I915_PRIORITY_NORMAL)
-               return GEN12_CTX_PRIORITY_LOW;
-       else
-               return GEN12_CTX_PRIORITY_NORMAL;
-}
-
 static inline void lrc_runtime_start(struct intel_context *ce)
 {
        struct intel_context_stats *stats = &ce->stats;
index c4c37585ae8cc7ce9a8a2bd915b544631b33d6ec..c6ebe2781076465260a51e944d0e424c06843e9c 100644 (file)
@@ -23,6 +23,7 @@ struct drm_i915_mocs_table {
        unsigned int n_entries;
        const struct drm_i915_mocs_entry *table;
        u8 uc_index;
+       u8 wb_index; /* Only used on HAS_L3_CCS_READ() platforms */
        u8 unused_entries_index;
 };
 
@@ -47,6 +48,7 @@ struct drm_i915_mocs_table {
 
 /* Helper defines */
 #define GEN9_NUM_MOCS_ENTRIES  64  /* 63-64 are reserved, but configured. */
+#define PVC_NUM_MOCS_ENTRIES   3
 
 /* (e)LLC caching options */
 /*
@@ -394,6 +396,17 @@ static const struct drm_i915_mocs_entry dg2_mocs_table_g10_ax[] = {
        MOCS_ENTRY(3, 0, L3_3_WB | L3_LKUP(1)),
 };
 
+static const struct drm_i915_mocs_entry pvc_mocs_table[] = {
+       /* Error */
+       MOCS_ENTRY(0, 0, L3_3_WB),
+
+       /* UC */
+       MOCS_ENTRY(1, 0, L3_1_UC),
+
+       /* WB */
+       MOCS_ENTRY(2, 0, L3_3_WB),
+};
+
 enum {
        HAS_GLOBAL_MOCS = BIT(0),
        HAS_ENGINE_MOCS = BIT(1),
@@ -423,7 +436,14 @@ static unsigned int get_mocs_settings(const struct drm_i915_private *i915,
        memset(table, 0, sizeof(struct drm_i915_mocs_table));
 
        table->unused_entries_index = I915_MOCS_PTE;
-       if (IS_DG2(i915)) {
+       if (IS_PONTEVECCHIO(i915)) {
+               table->size = ARRAY_SIZE(pvc_mocs_table);
+               table->table = pvc_mocs_table;
+               table->n_entries = PVC_NUM_MOCS_ENTRIES;
+               table->uc_index = 1;
+               table->wb_index = 2;
+               table->unused_entries_index = 2;
+       } else if (IS_DG2(i915)) {
                if (IS_DG2_GRAPHICS_STEP(i915, G10, STEP_A0, STEP_B0)) {
                        table->size = ARRAY_SIZE(dg2_mocs_table_g10_ax);
                        table->table = dg2_mocs_table_g10_ax;
@@ -622,6 +642,8 @@ void intel_set_mocs_index(struct intel_gt *gt)
 
        get_mocs_settings(gt->i915, &table);
        gt->mocs.uc_index = table.uc_index;
+       if (HAS_L3_CCS_READ(gt->i915))
+               gt->mocs.wb_index = table.wb_index;
 }
 
 void intel_mocs_init(struct intel_gt *gt)
index f5111c0a006058c2c67dec1c8ee95fa74adb9e51..d09b996a975944c0cbf78e9d96654c644bbdf12e 100644 (file)
@@ -12,6 +12,7 @@
 #include "gem/i915_gem_region.h"
 #include "gem/i915_gem_ttm.h"
 #include "gt/intel_gt.h"
+#include "gt/intel_gt_mcr.h"
 #include "gt/intel_gt_regs.h"
 
 static int
@@ -101,14 +102,24 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt)
                return ERR_PTR(-ENODEV);
 
        if (HAS_FLAT_CCS(i915)) {
+               resource_size_t lmem_range;
                u64 tile_stolen, flat_ccs_base;
 
-               lmem_size = pci_resource_len(pdev, 2);
-               flat_ccs_base = intel_gt_read_register(gt, XEHPSDV_FLAT_CCS_BASE_ADDR);
-               flat_ccs_base = (flat_ccs_base >> XEHPSDV_CCS_BASE_SHIFT) * SZ_64K;
+               lmem_range = intel_gt_mcr_read_any(&i915->gt0, XEHP_TILE0_ADDR_RANGE) & 0xFFFF;
+               lmem_size = lmem_range >> XEHP_TILE_LMEM_RANGE_SHIFT;
+               lmem_size *= SZ_1G;
+
+               flat_ccs_base = intel_gt_mcr_read_any(gt, XEHP_FLAT_CCS_BASE_ADDR);
+               flat_ccs_base = (flat_ccs_base >> XEHP_CCS_BASE_SHIFT) * SZ_64K;
+
+               /* FIXME: Remove this when we have small-bar enabled */
+               if (pci_resource_len(pdev, 2) < lmem_size) {
+                       drm_err(&i915->drm, "System requires small-BAR support, which is currently unsupported on this kernel\n");
+                       return ERR_PTR(-EINVAL);
+               }
 
                if (GEM_WARN_ON(lmem_size < flat_ccs_base))
-                       return ERR_PTR(-ENODEV);
+                       return ERR_PTR(-EIO);
 
                tile_stolen = lmem_size - flat_ccs_base;
 
@@ -131,7 +142,7 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt)
        io_start = pci_resource_start(pdev, 2);
        io_size = min(pci_resource_len(pdev, 2), lmem_size);
        if (!io_size)
-               return ERR_PTR(-ENODEV);
+               return ERR_PTR(-EIO);
 
        min_page_size = HAS_64K_PAGES(i915) ? I915_GTT_PAGE_SIZE_64K :
                                                I915_GTT_PAGE_SIZE_4K;
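
Note on the setup_lmem() hunk above: the local memory size and the flat-CCS base are both decoded from raw register values with a mask, a shift and a unit multiply. A minimal sketch of that arithmetic, assuming the register contents are passed in as plain integers, is below. The function names and the locally defined constants are illustrative; the mask, shifts and units mirror the patch.

```c
#include <stdint.h>

#define SZ_64K		(UINT64_C(1) << 16)
#define SZ_1G		(UINT64_C(1) << 30)
#define RANGE_SHIFT	8	/* mirrors XEHP_TILE_LMEM_RANGE_SHIFT */
#define CCS_BASE_SHIFT	8	/* mirrors XEHP_CCS_BASE_SHIFT */

/* Low 16 bits of the range register, shifted down, counted in GiB. */
static uint64_t lmem_size_from_range(uint32_t reg)
{
	uint64_t range = reg & 0xFFFF;

	return (range >> RANGE_SHIFT) * SZ_1G;
}

/* Flat-CCS base address, expressed in 64 KiB units after the shift. */
static uint64_t flat_ccs_base_from_reg(uint32_t reg)
{
	return ((uint64_t)reg >> CCS_BASE_SHIFT) * SZ_64K;
}
```

The patch then derives tile_stolen as lmem_size minus flat_ccs_base, which is why it warns and bails out when lmem_size is smaller than flat_ccs_base.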
index 5423bfd301adf8f8c7f9fc474cb2981f6e07bf81..d5d6f1fadcae397ff1d060f47ce4a178f484aa73 100644 (file)
@@ -117,7 +117,9 @@ static void flush_cs_tlb(struct intel_engine_cs *engine)
                return;
 
        /* ring should be idle before issuing a sync flush*/
-       GEM_DEBUG_WARN_ON((ENGINE_READ(engine, RING_MI_MODE) & MODE_IDLE) == 0);
+       if ((ENGINE_READ(engine, RING_MI_MODE) & MODE_IDLE) == 0)
+               drm_warn(&engine->i915->drm, "%s not idle before sync flush!\n",
+                        engine->name);
 
        ENGINE_WRITE_FW(engine, RING_INSTPM,
                        _MASKED_BIT_ENABLE(INSTPM_TLB_INVALIDATE |
@@ -596,8 +598,9 @@ static void ring_context_reset(struct intel_context *ce)
        clear_bit(CONTEXT_VALID_BIT, &ce->flags);
 }
 
-static void ring_context_ban(struct intel_context *ce,
-                            struct i915_request *rq)
+static void ring_context_revoke(struct intel_context *ce,
+                               struct i915_request *rq,
+                               unsigned int preempt_timeout_ms)
 {
        struct intel_engine_cs *engine;
 
@@ -632,7 +635,7 @@ static const struct intel_context_ops ring_context_ops = {
 
        .cancel_request = ring_context_cancel_request,
 
-       .ban = ring_context_ban,
+       .revoke = ring_context_revoke,
 
        .pre_pin = ring_context_pre_pin,
        .pin = ring_context_pin,
index 9b991df2cfbb417c648b91c82c1886f9df5a21fa..fb3f57ee450bc935d1ead4ec97ce0079a13d76f7 100644 (file)
@@ -1075,7 +1075,9 @@ static u32 intel_rps_read_state_cap(struct intel_rps *rps)
        struct drm_i915_private *i915 = rps_to_i915(rps);
        struct intel_uncore *uncore = rps_to_uncore(rps);
 
-       if (IS_XEHPSDV(i915))
+       if (IS_PONTEVECCHIO(i915))
+               return intel_uncore_read(uncore, PVC_RP_STATE_CAP);
+       else if (IS_XEHPSDV(i915))
                return intel_uncore_read(uncore, XEHPSDV_RP_STATE_CAP);
        else if (IS_GEN9_LP(i915))
                return intel_uncore_read(uncore, BXT_RP_STATE_CAP);
index fdd25691beda0207389c5f49759080025a4fd531..c6d3050604c89d4f71a770175379e72b245a31aa 100644 (file)
@@ -16,11 +16,6 @@ void intel_sseu_set_info(struct sseu_dev_info *sseu, u8 max_slices,
        sseu->max_slices = max_slices;
        sseu->max_subslices = max_subslices;
        sseu->max_eus_per_subslice = max_eus_per_subslice;
-
-       sseu->ss_stride = GEN_SSEU_STRIDE(sseu->max_subslices);
-       GEM_BUG_ON(sseu->ss_stride > GEN_MAX_SUBSLICE_STRIDE);
-       sseu->eu_stride = GEN_SSEU_STRIDE(sseu->max_eus_per_subslice);
-       GEM_BUG_ON(sseu->eu_stride > GEN_MAX_EU_STRIDE);
 }
 
 unsigned int
@@ -28,152 +23,240 @@ intel_sseu_subslice_total(const struct sseu_dev_info *sseu)
 {
        unsigned int i, total = 0;
 
-       for (i = 0; i < ARRAY_SIZE(sseu->subslice_mask); i++)
-               total += hweight8(sseu->subslice_mask[i]);
+       if (sseu->has_xehp_dss)
+               return bitmap_weight(sseu->subslice_mask.xehp,
+                                    XEHP_BITMAP_BITS(sseu->subslice_mask));
+
+       for (i = 0; i < ARRAY_SIZE(sseu->subslice_mask.hsw); i++)
+               total += hweight8(sseu->subslice_mask.hsw[i]);
 
        return total;
 }
 
-static u32
-sseu_get_subslices(const struct sseu_dev_info *sseu,
-                  const u8 *subslice_mask, u8 slice)
+unsigned int
+intel_sseu_get_hsw_subslices(const struct sseu_dev_info *sseu, u8 slice)
 {
-       int i, offset = slice * sseu->ss_stride;
-       u32 mask = 0;
-
-       GEM_BUG_ON(slice >= sseu->max_slices);
-
-       for (i = 0; i < sseu->ss_stride; i++)
-               mask |= (u32)subslice_mask[offset + i] << i * BITS_PER_BYTE;
+       WARN_ON(sseu->has_xehp_dss);
+       if (WARN_ON(slice >= sseu->max_slices))
+               return 0;
 
-       return mask;
+       return sseu->subslice_mask.hsw[slice];
 }
 
-u32 intel_sseu_get_subslices(const struct sseu_dev_info *sseu, u8 slice)
+static u16 sseu_get_eus(const struct sseu_dev_info *sseu, int slice,
+                       int subslice)
 {
-       return sseu_get_subslices(sseu, sseu->subslice_mask, slice);
+       if (sseu->has_xehp_dss) {
+               WARN_ON(slice > 0);
+               return sseu->eu_mask.xehp[subslice];
+       } else {
+               return sseu->eu_mask.hsw[slice][subslice];
+       }
 }
 
-static u32 sseu_get_geometry_subslices(const struct sseu_dev_info *sseu)
+static void sseu_set_eus(struct sseu_dev_info *sseu, int slice, int subslice,
+                        u16 eu_mask)
 {
-       return sseu_get_subslices(sseu, sseu->geometry_subslice_mask, 0);
+       GEM_WARN_ON(eu_mask && __fls(eu_mask) >= sseu->max_eus_per_subslice);
+       if (sseu->has_xehp_dss) {
+               GEM_WARN_ON(slice > 0);
+               sseu->eu_mask.xehp[subslice] = eu_mask;
+       } else {
+               sseu->eu_mask.hsw[slice][subslice] = eu_mask;
+       }
 }
 
-u32 intel_sseu_get_compute_subslices(const struct sseu_dev_info *sseu)
+static u16 compute_eu_total(const struct sseu_dev_info *sseu)
 {
-       return sseu_get_subslices(sseu, sseu->compute_subslice_mask, 0);
-}
+       int s, ss, total = 0;
 
-void intel_sseu_set_subslices(struct sseu_dev_info *sseu, int slice,
-                             u8 *subslice_mask, u32 ss_mask)
-{
-       int offset = slice * sseu->ss_stride;
+       for (s = 0; s < sseu->max_slices; s++)
+               for (ss = 0; ss < sseu->max_subslices; ss++)
+                       if (sseu->has_xehp_dss)
+                               total += hweight16(sseu->eu_mask.xehp[ss]);
+                       else
+                               total += hweight16(sseu->eu_mask.hsw[s][ss]);
 
-       memcpy(&subslice_mask[offset], &ss_mask, sseu->ss_stride);
+       return total;
 }
 
-unsigned int
-intel_sseu_subslices_per_slice(const struct sseu_dev_info *sseu, u8 slice)
+/**
+ * intel_sseu_copy_eumask_to_user - Copy EU mask into a userspace buffer
+ * @to: Pointer to userspace buffer to copy to
+ * @sseu: SSEU structure containing EU mask to copy
+ *
+ * Copies the EU mask to a userspace buffer in the format expected by
+ * the query ioctl's topology queries.
+ *
+ * Returns the result of the copy_to_user() operation.
+ */
+int intel_sseu_copy_eumask_to_user(void __user *to,
+                                  const struct sseu_dev_info *sseu)
 {
-       return hweight32(intel_sseu_get_subslices(sseu, slice));
-}
+       u8 eu_mask[GEN_SS_MASK_SIZE * GEN_MAX_EU_STRIDE] = {};
+       int eu_stride = GEN_SSEU_STRIDE(sseu->max_eus_per_subslice);
+       int len = sseu->max_slices * sseu->max_subslices * eu_stride;
+       int s, ss, i;
 
-static int sseu_eu_idx(const struct sseu_dev_info *sseu, int slice,
-                      int subslice)
-{
-       int slice_stride = sseu->max_subslices * sseu->eu_stride;
+       for (s = 0; s < sseu->max_slices; s++) {
+               for (ss = 0; ss < sseu->max_subslices; ss++) {
+                       int uapi_offset =
+                               s * sseu->max_subslices * eu_stride +
+                               ss * eu_stride;
+                       u16 mask = sseu_get_eus(sseu, s, ss);
+
+                       for (i = 0; i < eu_stride; i++)
+                               eu_mask[uapi_offset + i] =
+                                       (mask >> (BITS_PER_BYTE * i)) & 0xff;
+               }
+       }
 
-       return slice * slice_stride + subslice * sseu->eu_stride;
+       return copy_to_user(to, eu_mask, len);
 }
 
-static u16 sseu_get_eus(const struct sseu_dev_info *sseu, int slice,
-                       int subslice)
+/**
+ * intel_sseu_copy_ssmask_to_user - Copy subslice mask into a userspace buffer
+ * @to: Pointer to userspace buffer to copy to
+ * @sseu: SSEU structure containing subslice mask to copy
+ *
+ * Copies the subslice mask to a userspace buffer in the format expected by
+ * the query ioctl's topology queries.
+ *
+ * Returns the result of the copy_to_user() operation.
+ */
+int intel_sseu_copy_ssmask_to_user(void __user *to,
+                                  const struct sseu_dev_info *sseu)
 {
-       int i, offset = sseu_eu_idx(sseu, slice, subslice);
-       u16 eu_mask = 0;
+       u8 ss_mask[GEN_SS_MASK_SIZE] = {};
+       int ss_stride = GEN_SSEU_STRIDE(sseu->max_subslices);
+       int len = sseu->max_slices * ss_stride;
+       int s, ss, i;
 
-       for (i = 0; i < sseu->eu_stride; i++)
-               eu_mask |=
-                       ((u16)sseu->eu_mask[offset + i]) << (i * BITS_PER_BYTE);
+       for (s = 0; s < sseu->max_slices; s++) {
+               for (ss = 0; ss < sseu->max_subslices; ss++) {
+                       i = s * ss_stride * BITS_PER_BYTE + ss;
 
-       return eu_mask;
-}
+                       if (!intel_sseu_has_subslice(sseu, s, ss))
+                               continue;
 
-static void sseu_set_eus(struct sseu_dev_info *sseu, int slice, int subslice,
-                        u16 eu_mask)
-{
-       int i, offset = sseu_eu_idx(sseu, slice, subslice);
+                       ss_mask[i / BITS_PER_BYTE] |= BIT(i % BITS_PER_BYTE);
+               }
+       }
 
-       for (i = 0; i < sseu->eu_stride; i++)
-               sseu->eu_mask[offset + i] =
-                       (eu_mask >> (BITS_PER_BYTE * i)) & 0xff;
+       return copy_to_user(to, ss_mask, len);
 }
 
-static u16 compute_eu_total(const struct sseu_dev_info *sseu)
+static void gen11_compute_sseu_info(struct sseu_dev_info *sseu,
+                                   u32 ss_en, u16 eu_en)
 {
-       u16 i, total = 0;
+       u32 valid_ss_mask = GENMASK(sseu->max_subslices - 1, 0);
+       int ss;
 
-       for (i = 0; i < ARRAY_SIZE(sseu->eu_mask); i++)
-               total += hweight8(sseu->eu_mask[i]);
+       sseu->slice_mask |= BIT(0);
+       sseu->subslice_mask.hsw[0] = ss_en & valid_ss_mask;
 
-       return total;
+       for (ss = 0; ss < sseu->max_subslices; ss++)
+               if (intel_sseu_has_subslice(sseu, 0, ss))
+                       sseu_set_eus(sseu, 0, ss, eu_en);
+
+       sseu->eu_per_subslice = hweight16(eu_en);
+       sseu->eu_total = compute_eu_total(sseu);
 }
 
-static u32 get_ss_stride_mask(struct sseu_dev_info *sseu, u8 s, u32 ss_en)
+static void xehp_compute_sseu_info(struct sseu_dev_info *sseu,
+                                  u16 eu_en)
 {
-       u32 ss_mask;
+       int ss;
 
-       ss_mask = ss_en >> (s * sseu->max_subslices);
-       ss_mask &= GENMASK(sseu->max_subslices - 1, 0);
+       sseu->slice_mask |= BIT(0);
 
-       return ss_mask;
+       bitmap_or(sseu->subslice_mask.xehp,
+                 sseu->compute_subslice_mask.xehp,
+                 sseu->geometry_subslice_mask.xehp,
+                 XEHP_BITMAP_BITS(sseu->subslice_mask));
+
+       for (ss = 0; ss < sseu->max_subslices; ss++)
+               if (intel_sseu_has_subslice(sseu, 0, ss))
+                       sseu_set_eus(sseu, 0, ss, eu_en);
+
+       sseu->eu_per_subslice = hweight16(eu_en);
+       sseu->eu_total = compute_eu_total(sseu);
 }
 
-static void gen11_compute_sseu_info(struct sseu_dev_info *sseu, u8 s_en,
-                                   u32 g_ss_en, u32 c_ss_en, u16 eu_en)
+static void
+xehp_load_dss_mask(struct intel_uncore *uncore,
+                  intel_sseu_ss_mask_t *ssmask,
+                  int numregs,
+                  ...)
 {
-       int s, ss;
+       va_list argp;
+       u32 fuse_val[I915_MAX_SS_FUSE_REGS] = {};
+       int i;
 
-       /* g_ss_en/c_ss_en represent entire subslice mask across all slices */
-       GEM_BUG_ON(sseu->max_slices * sseu->max_subslices >
-                  sizeof(g_ss_en) * BITS_PER_BYTE);
+       if (WARN_ON(numregs > I915_MAX_SS_FUSE_REGS))
+               numregs = I915_MAX_SS_FUSE_REGS;
 
-       for (s = 0; s < sseu->max_slices; s++) {
-               if ((s_en & BIT(s)) == 0)
-                       continue;
+       va_start(argp, numregs);
+       for (i = 0; i < numregs; i++)
+               fuse_val[i] = intel_uncore_read(uncore, va_arg(argp, i915_reg_t));
+       va_end(argp);
 
-               sseu->slice_mask |= BIT(s);
-
-               /*
-                * XeHP introduces the concept of compute vs geometry DSS. To
-                * reduce variation between GENs around subslice usage, store a
-                * mask for both the geometry and compute enabled masks since
-                * userspace will need to be able to query these masks
-                * independently.  Also compute a total enabled subslice count
-                * for the purposes of selecting subslices to use in a
-                * particular GEM context.
-                */
-               intel_sseu_set_subslices(sseu, s, sseu->compute_subslice_mask,
-                                        get_ss_stride_mask(sseu, s, c_ss_en));
-               intel_sseu_set_subslices(sseu, s, sseu->geometry_subslice_mask,
-                                        get_ss_stride_mask(sseu, s, g_ss_en));
-               intel_sseu_set_subslices(sseu, s, sseu->subslice_mask,
-                                        get_ss_stride_mask(sseu, s,
-                                                           g_ss_en | c_ss_en));
+       bitmap_from_arr32(ssmask->xehp, fuse_val, numregs * 32);
+}
 
-               for (ss = 0; ss < sseu->max_subslices; ss++)
-                       if (intel_sseu_has_subslice(sseu, s, ss))
-                               sseu_set_eus(sseu, s, ss, eu_en);
+static void xehp_sseu_info_init(struct intel_gt *gt)
+{
+       struct sseu_dev_info *sseu = &gt->info.sseu;
+       struct intel_uncore *uncore = gt->uncore;
+       u16 eu_en = 0;
+       u8 eu_en_fuse;
+       int num_compute_regs, num_geometry_regs;
+       int eu;
+
+       if (IS_PONTEVECCHIO(gt->i915)) {
+               num_geometry_regs = 0;
+               num_compute_regs = 2;
+       } else {
+               num_geometry_regs = 1;
+               num_compute_regs = 1;
        }
-       sseu->eu_per_subslice = hweight16(eu_en);
-       sseu->eu_total = compute_eu_total(sseu);
+
+       /*
+        * The concept of slice has been removed in Xe_HP.  To be compatible
+        * with prior generations, assume a single slice across the entire
+        * device. Then calculate out the DSS for each workload type within
+        * that software slice.
+        */
+       intel_sseu_set_info(sseu, 1,
+                           32 * max(num_geometry_regs, num_compute_regs),
+                           HAS_ONE_EU_PER_FUSE_BIT(gt->i915) ? 8 : 16);
+       sseu->has_xehp_dss = 1;
+
+       xehp_load_dss_mask(uncore, &sseu->geometry_subslice_mask,
+                          num_geometry_regs,
+                          GEN12_GT_GEOMETRY_DSS_ENABLE);
+       xehp_load_dss_mask(uncore, &sseu->compute_subslice_mask,
+                          num_compute_regs,
+                          GEN12_GT_COMPUTE_DSS_ENABLE,
+                          XEHPC_GT_COMPUTE_DSS_ENABLE_EXT);
+
+       eu_en_fuse = intel_uncore_read(uncore, XEHP_EU_ENABLE) & XEHP_EU_ENA_MASK;
+
+       if (HAS_ONE_EU_PER_FUSE_BIT(gt->i915))
+               eu_en = eu_en_fuse;
+       else
+               for (eu = 0; eu < sseu->max_eus_per_subslice / 2; eu++)
+                       if (eu_en_fuse & BIT(eu))
+                               eu_en |= BIT(eu * 2) | BIT(eu * 2 + 1);
+
+       xehp_compute_sseu_info(sseu, eu_en);
 }
 
 static void gen12_sseu_info_init(struct intel_gt *gt)
 {
        struct sseu_dev_info *sseu = &gt->info.sseu;
        struct intel_uncore *uncore = gt->uncore;
-       u32 g_dss_en, c_dss_en = 0;
+       u32 g_dss_en;
        u16 eu_en = 0;
        u8 eu_en_fuse;
        u8 s_en;
@@ -183,43 +266,28 @@ static void gen12_sseu_info_init(struct intel_gt *gt)
         * Gen12 has Dual-Subslices, which behave similarly to 2 gen11 SS.
         * Instead of splitting these, provide userspace with an array
         * of DSS to more closely represent the hardware resource.
-        *
-        * In addition, the concept of slice has been removed in Xe_HP.
-        * To be compatible with prior generations, assume a single slice
-        * across the entire device. Then calculate out the DSS for each
-        * workload type within that software slice.
         */
-       if (IS_DG2(gt->i915) || IS_XEHPSDV(gt->i915))
-               intel_sseu_set_info(sseu, 1, 32, 16);
-       else
-               intel_sseu_set_info(sseu, 1, 6, 16);
+       intel_sseu_set_info(sseu, 1, 6, 16);
 
        /*
-        * As mentioned above, Xe_HP does not have the concept of a slice.
-        * Enable one for software backwards compatibility.
+        * Although gen12 architecture supported multiple slices, TGL, RKL,
+        * DG1, and ADL only had a single slice.
         */
-       if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 50))
-               s_en = 0x1;
-       else
-               s_en = intel_uncore_read(uncore, GEN11_GT_SLICE_ENABLE) &
-                      GEN11_GT_S_ENA_MASK;
+       s_en = intel_uncore_read(uncore, GEN11_GT_SLICE_ENABLE) &
+               GEN11_GT_S_ENA_MASK;
+       drm_WARN_ON(&gt->i915->drm, s_en != 0x1);
 
        g_dss_en = intel_uncore_read(uncore, GEN12_GT_GEOMETRY_DSS_ENABLE);
-       if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 50))
-               c_dss_en = intel_uncore_read(uncore, GEN12_GT_COMPUTE_DSS_ENABLE);
 
        /* one bit per pair of EUs */
-       if (GRAPHICS_VER_FULL(gt->i915) >= IP_VER(12, 50))
-               eu_en_fuse = intel_uncore_read(uncore, XEHP_EU_ENABLE) & XEHP_EU_ENA_MASK;
-       else
-               eu_en_fuse = ~(intel_uncore_read(uncore, GEN11_EU_DISABLE) &
-                              GEN11_EU_DIS_MASK);
+       eu_en_fuse = ~(intel_uncore_read(uncore, GEN11_EU_DISABLE) &
+                      GEN11_EU_DIS_MASK);
 
        for (eu = 0; eu < sseu->max_eus_per_subslice / 2; eu++)
                if (eu_en_fuse & BIT(eu))
                        eu_en |= BIT(eu * 2) | BIT(eu * 2 + 1);
 
-       gen11_compute_sseu_info(sseu, s_en, g_dss_en, c_dss_en, eu_en);
+       gen11_compute_sseu_info(sseu, g_dss_en, eu_en);
 
        /* TGL only supports slice-level power gating */
        sseu->has_slice_pg = 1;
@@ -238,14 +306,20 @@ static void gen11_sseu_info_init(struct intel_gt *gt)
        else
                intel_sseu_set_info(sseu, 1, 8, 8);
 
+       /*
+        * Although gen11 architecture supported multiple slices, ICL and
+        * EHL/JSL only had a single slice in practice.
+        */
        s_en = intel_uncore_read(uncore, GEN11_GT_SLICE_ENABLE) &
                GEN11_GT_S_ENA_MASK;
+       drm_WARN_ON(&gt->i915->drm, s_en != 0x1);
+
        ss_en = ~intel_uncore_read(uncore, GEN11_GT_SUBSLICE_DISABLE);
 
        eu_en = ~(intel_uncore_read(uncore, GEN11_EU_DISABLE) &
                  GEN11_EU_DIS_MASK);
 
-       gen11_compute_sseu_info(sseu, s_en, ss_en, 0, eu_en);
+       gen11_compute_sseu_info(sseu, ss_en, eu_en);
 
        /* ICL has no power gating restrictions. */
        sseu->has_slice_pg = 1;
@@ -257,7 +331,6 @@ static void cherryview_sseu_info_init(struct intel_gt *gt)
 {
        struct sseu_dev_info *sseu = &gt->info.sseu;
        u32 fuse;
-       u8 subslice_mask = 0;
 
        fuse = intel_uncore_read(gt->uncore, CHV_FUSE_GT);
 
@@ -271,8 +344,8 @@ static void cherryview_sseu_info_init(struct intel_gt *gt)
                        (((fuse & CHV_FGT_EU_DIS_SS0_R1_MASK) >>
                          CHV_FGT_EU_DIS_SS0_R1_SHIFT) << 4);
 
-               subslice_mask |= BIT(0);
-               sseu_set_eus(sseu, 0, 0, ~disabled_mask);
+               sseu->subslice_mask.hsw[0] |= BIT(0);
+               sseu_set_eus(sseu, 0, 0, ~disabled_mask & 0xFF);
        }
 
        if (!(fuse & CHV_FGT_DISABLE_SS1)) {
@@ -282,12 +355,10 @@ static void cherryview_sseu_info_init(struct intel_gt *gt)
                        (((fuse & CHV_FGT_EU_DIS_SS1_R1_MASK) >>
                          CHV_FGT_EU_DIS_SS1_R1_SHIFT) << 4);
 
-               subslice_mask |= BIT(1);
-               sseu_set_eus(sseu, 0, 1, ~disabled_mask);
+               sseu->subslice_mask.hsw[0] |= BIT(1);
+               sseu_set_eus(sseu, 0, 1, ~disabled_mask & 0xFF);
        }
 
-       intel_sseu_set_subslices(sseu, 0, sseu->subslice_mask, subslice_mask);
-
        sseu->eu_total = compute_eu_total(sseu);
 
        /*
@@ -342,8 +413,7 @@ static void gen9_sseu_info_init(struct intel_gt *gt)
                        /* skip disabled slice */
                        continue;
 
-               intel_sseu_set_subslices(sseu, s, sseu->subslice_mask,
-                                        subslice_mask);
+               sseu->subslice_mask.hsw[s] = subslice_mask;
 
                eu_disable = intel_uncore_read(uncore, GEN9_EU_DISABLE(s));
                for (ss = 0; ss < sseu->max_subslices; ss++) {
@@ -356,7 +426,7 @@ static void gen9_sseu_info_init(struct intel_gt *gt)
 
                        eu_disabled_mask = (eu_disable >> (ss * 8)) & eu_mask;
 
-                       sseu_set_eus(sseu, s, ss, ~eu_disabled_mask);
+                       sseu_set_eus(sseu, s, ss, ~eu_disabled_mask & eu_mask);
 
                        eu_per_ss = sseu->max_eus_per_subslice -
                                hweight8(eu_disabled_mask);
@@ -400,8 +470,8 @@ static void gen9_sseu_info_init(struct intel_gt *gt)
        sseu->has_eu_pg = sseu->eu_per_subslice > 2;
 
        if (IS_GEN9_LP(i915)) {
-#define IS_SS_DISABLED(ss)     (!(sseu->subslice_mask[0] & BIT(ss)))
-               info->has_pooled_eu = hweight8(sseu->subslice_mask[0]) == 3;
+#define IS_SS_DISABLED(ss)     (!(sseu->subslice_mask.hsw[0] & BIT(ss)))
+               info->has_pooled_eu = hweight8(sseu->subslice_mask.hsw[0]) == 3;
 
                sseu->min_eu_in_pool = 0;
                if (info->has_pooled_eu) {
@@ -455,8 +525,7 @@ static void bdw_sseu_info_init(struct intel_gt *gt)
                        /* skip disabled slice */
                        continue;
 
-               intel_sseu_set_subslices(sseu, s, sseu->subslice_mask,
-                                        subslice_mask);
+               sseu->subslice_mask.hsw[s] = subslice_mask;
 
                for (ss = 0; ss < sseu->max_subslices; ss++) {
                        u8 eu_disabled_mask;
@@ -469,7 +538,7 @@ static void bdw_sseu_info_init(struct intel_gt *gt)
                        eu_disabled_mask =
                                eu_disable[s] >> (ss * sseu->max_eus_per_subslice);
 
-                       sseu_set_eus(sseu, s, ss, ~eu_disabled_mask);
+                       sseu_set_eus(sseu, s, ss, ~eu_disabled_mask & 0xFF);
 
                        n_disabled = hweight8(eu_disabled_mask);
 
@@ -553,8 +622,7 @@ static void hsw_sseu_info_init(struct intel_gt *gt)
                            sseu->eu_per_subslice);
 
        for (s = 0; s < sseu->max_slices; s++) {
-               intel_sseu_set_subslices(sseu, s, sseu->subslice_mask,
-                                        subslice_mask);
+               sseu->subslice_mask.hsw[s] = subslice_mask;
 
                for (ss = 0; ss < sseu->max_subslices; ss++) {
                        sseu_set_eus(sseu, s, ss,
@@ -574,18 +642,20 @@ void intel_sseu_info_init(struct intel_gt *gt)
 {
        struct drm_i915_private *i915 = gt->i915;
 
-       if (IS_HASWELL(i915))
-               hsw_sseu_info_init(gt);
-       else if (IS_CHERRYVIEW(i915))
-               cherryview_sseu_info_init(gt);
-       else if (IS_BROADWELL(i915))
-               bdw_sseu_info_init(gt);
-       else if (GRAPHICS_VER(i915) == 9)
-               gen9_sseu_info_init(gt);
-       else if (GRAPHICS_VER(i915) == 11)
-               gen11_sseu_info_init(gt);
+       if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50))
+               xehp_sseu_info_init(gt);
        else if (GRAPHICS_VER(i915) >= 12)
                gen12_sseu_info_init(gt);
+       else if (GRAPHICS_VER(i915) >= 11)
+               gen11_sseu_info_init(gt);
+       else if (GRAPHICS_VER(i915) >= 9)
+               gen9_sseu_info_init(gt);
+       else if (IS_BROADWELL(i915))
+               bdw_sseu_info_init(gt);
+       else if (IS_CHERRYVIEW(i915))
+               cherryview_sseu_info_init(gt);
+       else if (IS_HASWELL(i915))
+               hsw_sseu_info_init(gt);
 }
 
 u32 intel_sseu_make_rpcs(struct intel_gt *gt,
@@ -641,7 +711,7 @@ u32 intel_sseu_make_rpcs(struct intel_gt *gt,
         */
        if (GRAPHICS_VER(i915) == 11 &&
            slices == 1 &&
-           subslices > min_t(u8, 4, hweight8(sseu->subslice_mask[0]) / 2)) {
+           subslices > min_t(u8, 4, hweight8(sseu->subslice_mask.hsw[0]) / 2)) {
                GEM_BUG_ON(subslices & 1);
 
                subslice_pg = false;
@@ -707,14 +777,29 @@ void intel_sseu_dump(const struct sseu_dev_info *sseu, struct drm_printer *p)
 {
        int s;
 
-       drm_printf(p, "slice total: %u, mask=%04x\n",
-                  hweight8(sseu->slice_mask), sseu->slice_mask);
-       drm_printf(p, "subslice total: %u\n", intel_sseu_subslice_total(sseu));
-       for (s = 0; s < sseu->max_slices; s++) {
-               drm_printf(p, "slice%d: %u subslices, mask=%08x\n",
-                          s, intel_sseu_subslices_per_slice(sseu, s),
-                          intel_sseu_get_subslices(sseu, s));
+       if (sseu->has_xehp_dss) {
+               drm_printf(p, "subslice total: %u\n",
+                          intel_sseu_subslice_total(sseu));
+               drm_printf(p, "geometry dss mask=%*pb\n",
+                          XEHP_BITMAP_BITS(sseu->geometry_subslice_mask),
+                          sseu->geometry_subslice_mask.xehp);
+               drm_printf(p, "compute dss mask=%*pb\n",
+                          XEHP_BITMAP_BITS(sseu->compute_subslice_mask),
+                          sseu->compute_subslice_mask.xehp);
+       } else {
+               drm_printf(p, "slice total: %u, mask=%04x\n",
+                          hweight8(sseu->slice_mask), sseu->slice_mask);
+               drm_printf(p, "subslice total: %u\n",
+                          intel_sseu_subslice_total(sseu));
+
+               for (s = 0; s < sseu->max_slices; s++) {
+                       u8 ss_mask = sseu->subslice_mask.hsw[s];
+
+                       drm_printf(p, "slice%d: %u subslices, mask=%08x\n",
+                                  s, hweight8(ss_mask), ss_mask);
+               }
        }
+
        drm_printf(p, "EU total: %u\n", sseu->eu_total);
        drm_printf(p, "EU per subslice: %u\n", sseu->eu_per_subslice);
        drm_printf(p, "has slice power gating: %s\n",
@@ -731,9 +816,10 @@ static void sseu_print_hsw_topology(const struct sseu_dev_info *sseu,
        int s, ss;
 
        for (s = 0; s < sseu->max_slices; s++) {
+               u8 ss_mask = sseu->subslice_mask.hsw[s];
+
                drm_printf(p, "slice%d: %u subslice(s) (0x%08x):\n",
-                          s, intel_sseu_subslices_per_slice(sseu, s),
-                          intel_sseu_get_subslices(sseu, s));
+                          s, hweight8(ss_mask), ss_mask);
 
                for (ss = 0; ss < sseu->max_subslices; ss++) {
                        u16 enabled_eus = sseu_get_eus(sseu, s, ss);
@@ -747,16 +833,14 @@ static void sseu_print_hsw_topology(const struct sseu_dev_info *sseu,
 static void sseu_print_xehp_topology(const struct sseu_dev_info *sseu,
                                     struct drm_printer *p)
 {
-       u32 g_dss_mask = sseu_get_geometry_subslices(sseu);
-       u32 c_dss_mask = intel_sseu_get_compute_subslices(sseu);
        int dss;
 
        for (dss = 0; dss < sseu->max_subslices; dss++) {
                u16 enabled_eus = sseu_get_eus(sseu, 0, dss);
 
                drm_printf(p, "DSS_%02d: G:%3s C:%3s, %2u EUs (0x%04hx)\n", dss,
-                          str_yes_no(g_dss_mask & BIT(dss)),
-                          str_yes_no(c_dss_mask & BIT(dss)),
+                          str_yes_no(test_bit(dss, sseu->geometry_subslice_mask.xehp)),
+                          str_yes_no(test_bit(dss, sseu->compute_subslice_mask.xehp)),
                           hweight16(enabled_eus), enabled_eus);
        }
 }
@@ -774,20 +858,44 @@ void intel_sseu_print_topology(struct drm_i915_private *i915,
        }
 }
 
-u16 intel_slicemask_from_dssmask(u64 dss_mask, int dss_per_slice)
+void intel_sseu_print_ss_info(const char *type,
+                             const struct sseu_dev_info *sseu,
+                             struct seq_file *m)
 {
-       u16 slice_mask = 0;
+       int s;
+
+       if (sseu->has_xehp_dss) {
+               seq_printf(m, "  %s Geometry DSS: %u\n", type,
+                          bitmap_weight(sseu->geometry_subslice_mask.xehp,
+                                        XEHP_BITMAP_BITS(sseu->geometry_subslice_mask)));
+               seq_printf(m, "  %s Compute DSS: %u\n", type,
+                          bitmap_weight(sseu->compute_subslice_mask.xehp,
+                                        XEHP_BITMAP_BITS(sseu->compute_subslice_mask)));
+       } else {
+               for (s = 0; s < fls(sseu->slice_mask); s++)
+                       seq_printf(m, "  %s Slice%i subslices: %u\n", type,
+                                  s, hweight8(sseu->subslice_mask.hsw[s]));
+       }
+}
+
+u16 intel_slicemask_from_xehp_dssmask(intel_sseu_ss_mask_t dss_mask,
+                                     int dss_per_slice)
+{
+       intel_sseu_ss_mask_t per_slice_mask = {};
+       unsigned long slice_mask = 0;
        int i;
 
-       WARN_ON(sizeof(dss_mask) * 8 / dss_per_slice > 8 * sizeof(slice_mask));
+       WARN_ON(DIV_ROUND_UP(XEHP_BITMAP_BITS(dss_mask), dss_per_slice) >
+               8 * sizeof(slice_mask));
 
-       for (i = 0; dss_mask; i++) {
-               if (dss_mask & GENMASK(dss_per_slice - 1, 0))
+       bitmap_fill(per_slice_mask.xehp, dss_per_slice);
+       for (i = 0; !bitmap_empty(dss_mask.xehp, XEHP_BITMAP_BITS(dss_mask)); i++) {
+               if (bitmap_intersects(dss_mask.xehp, per_slice_mask.xehp, dss_per_slice))
                        slice_mask |= BIT(i);
 
-               dss_mask >>= dss_per_slice;
+               bitmap_shift_right(dss_mask.xehp, dss_mask.xehp, dss_per_slice,
+                                  XEHP_BITMAP_BITS(dss_mask));
        }
 
        return slice_mask;
 }
-
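Note on intel_slicemask_from_xehp_dssmask() above: it walks the DSS bitmap in groups of dss_per_slice bits and sets one slice bit for every group that contains at least one enabled DSS. The sketch below shows the same grouping idea in plain userspace C. It is an illustration only, not the driver code: it assumes the DSS mask fits in a single uint64_t (the kernel version uses a linux/bitmap.h bitmap precisely because the mask may grow beyond that), and the function name slicemask_from_dssmask_u64 is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace sketch: derive a slice mask from a DSS mask that
 * is assumed to fit in 64 bits. Bit i of the result is set when any DSS in
 * group i (bits [i * dss_per_slice, (i + 1) * dss_per_slice)) is enabled.
 */
static uint16_t slicemask_from_dssmask_u64(uint64_t dss_mask, int dss_per_slice)
{
	uint64_t group;
	uint16_t slice_mask = 0;
	int i;

	/* The shift below is only defined for group sizes under 64 bits. */
	assert(dss_per_slice > 0 && dss_per_slice < 64);
	/* Every possible group must have a bit available in the 16-bit result. */
	assert((64 + dss_per_slice - 1) / dss_per_slice <= 16);

	group = (1ULL << dss_per_slice) - 1;

	for (i = 0; dss_mask; i++) {
		if (dss_mask & group)
			slice_mask |= (uint16_t)(1u << i);

		/* Drop the group just examined and look at the next one. */
		dss_mask >>= dss_per_slice;
	}

	return slice_mask;
}
```

With dss_per_slice set to 8, a mask with DSS 0 and DSS 8 enabled yields slices 0 and 1. The loop ends as soon as no enabled DSS remains, so trailing empty groups cost nothing.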
index 5c078df4729cbac3219291a5bb40ea8ec75c4ce6..aa87d3832d60d92539413d6cc5f840873c65696a 100644 (file)
@@ -25,12 +25,16 @@ struct drm_printer;
 /*
  * Maximum number of subslices that can exist within a HSW-style slice.  This
  * is only relevant to pre-Xe_HP platforms (Xe_HP and beyond use the
- * GEN_MAX_DSS value below).
+ * I915_MAX_SS_FUSE_BITS value below).
  */
 #define GEN_MAX_SS_PER_HSW_SLICE       6
 
-/* Maximum number of DSS on newer platforms (Xe_HP and beyond). */
-#define GEN_MAX_DSS                    32
+/*
+ * Maximum number of 32-bit registers used by hardware to express the
+ * enabled/disabled subslices.
+ */
+#define I915_MAX_SS_FUSE_REGS  2
+#define I915_MAX_SS_FUSE_BITS  (I915_MAX_SS_FUSE_REGS * 32)
 
 /* Maximum number of EUs that can exist within a subslice or DSS. */
 #define GEN_MAX_EUS_PER_SS             16
@@ -38,7 +42,7 @@ struct drm_printer;
 #define SSEU_MAX(a, b)                 ((a) > (b) ? (a) : (b))
 
 /* The maximum number of bits needed to express each subslice/DSS independently */
-#define GEN_SS_MASK_SIZE               SSEU_MAX(GEN_MAX_DSS, \
+#define GEN_SS_MASK_SIZE               SSEU_MAX(I915_MAX_SS_FUSE_BITS, \
                                                 GEN_MAX_HSW_SLICES * GEN_MAX_SS_PER_HSW_SLICE)
 
 #define GEN_SSEU_STRIDE(max_entries)   DIV_ROUND_UP(max_entries, BITS_PER_BYTE)
@@ -49,15 +53,28 @@ struct drm_printer;
 #define GEN_DSS_PER_CSLICE     8
 #define GEN_DSS_PER_MSLICE     8
 
-#define GEN_MAX_GSLICES                (GEN_MAX_DSS / GEN_DSS_PER_GSLICE)
-#define GEN_MAX_CSLICES                (GEN_MAX_DSS / GEN_DSS_PER_CSLICE)
+#define GEN_MAX_GSLICES                (I915_MAX_SS_FUSE_BITS / GEN_DSS_PER_GSLICE)
+#define GEN_MAX_CSLICES                (I915_MAX_SS_FUSE_BITS / GEN_DSS_PER_CSLICE)
+
+typedef union {
+       u8 hsw[GEN_MAX_HSW_SLICES];
+
+       /* Bitmap compatible with linux/bitmap.h; may exceed size of u64 */
+       unsigned long xehp[BITS_TO_LONGS(I915_MAX_SS_FUSE_BITS)];
+} intel_sseu_ss_mask_t;
+
+#define XEHP_BITMAP_BITS(mask) ((int)BITS_PER_TYPE(typeof(mask.xehp)))
 
 struct sseu_dev_info {
        u8 slice_mask;
-       u8 subslice_mask[GEN_SS_MASK_SIZE];
-       u8 geometry_subslice_mask[GEN_SS_MASK_SIZE];
-       u8 compute_subslice_mask[GEN_SS_MASK_SIZE];
-       u8 eu_mask[GEN_SS_MASK_SIZE * GEN_MAX_EU_STRIDE];
+       intel_sseu_ss_mask_t subslice_mask;
+       intel_sseu_ss_mask_t geometry_subslice_mask;
+       intel_sseu_ss_mask_t compute_subslice_mask;
+       union {
+               u16 hsw[GEN_MAX_HSW_SLICES][GEN_MAX_SS_PER_HSW_SLICE];
+               u16 xehp[I915_MAX_SS_FUSE_BITS];
+       } eu_mask;
+
        u16 eu_total;
        u8 eu_per_subslice;
        u8 min_eu_in_pool;
@@ -66,14 +83,16 @@ struct sseu_dev_info {
        u8 has_slice_pg:1;
        u8 has_subslice_pg:1;
        u8 has_eu_pg:1;
+       /*
+        * For Xe_HP and beyond, the hardware no longer has traditional slices
+        * so we just report the entire DSS pool under a fake "slice 0."
+        */
+       u8 has_xehp_dss:1;
 
        /* Topology fields */
        u8 max_slices;
        u8 max_subslices;
        u8 max_eus_per_subslice;
-
-       u8 ss_stride;
-       u8 eu_stride;
 };
 
 /*
@@ -91,7 +110,7 @@ intel_sseu_from_device_info(const struct sseu_dev_info *sseu)
 {
        struct intel_sseu value = {
                .slice_mask = sseu->slice_mask,
-               .subslice_mask = sseu->subslice_mask[0],
+               .subslice_mask = sseu->subslice_mask.hsw[0],
                .min_eus_per_subslice = sseu->max_eus_per_subslice,
                .max_eus_per_subslice = sseu->max_eus_per_subslice,
        };
@@ -103,18 +122,28 @@ static inline bool
 intel_sseu_has_subslice(const struct sseu_dev_info *sseu, int slice,
                        int subslice)
 {
-       u8 mask;
-       int ss_idx = subslice / BITS_PER_BYTE;
-
        if (slice >= sseu->max_slices ||
            subslice >= sseu->max_subslices)
                return false;
 
-       GEM_BUG_ON(ss_idx >= sseu->ss_stride);
-
-       mask = sseu->subslice_mask[slice * sseu->ss_stride + ss_idx];
+       if (sseu->has_xehp_dss)
+               return test_bit(subslice, sseu->subslice_mask.xehp);
+       else
+               return sseu->subslice_mask.hsw[slice] & BIT(subslice);
+}
 
-       return mask & BIT(subslice % BITS_PER_BYTE);
+/*
+ * Used to obtain the index of the first DSS.  Can start searching from the
+ * beginning of a specific dss group (e.g., gslice, cslice, etc.) if
+ * groupsize and groupnum are non-zero.
+ */
+static inline unsigned int
+intel_sseu_find_first_xehp_dss(const struct sseu_dev_info *sseu, int groupsize,
+                              int groupnum)
+{
+       return find_next_bit(sseu->subslice_mask.xehp,
+                            XEHP_BITMAP_BITS(sseu->subslice_mask),
+                            groupnum * groupsize);
 }
 
 void intel_sseu_set_info(struct sseu_dev_info *sseu, u8 max_slices,
@@ -124,14 +153,10 @@ unsigned int
 intel_sseu_subslice_total(const struct sseu_dev_info *sseu);
 
 unsigned int
-intel_sseu_subslices_per_slice(const struct sseu_dev_info *sseu, u8 slice);
+intel_sseu_get_hsw_subslices(const struct sseu_dev_info *sseu, u8 slice);
 
-u32 intel_sseu_get_subslices(const struct sseu_dev_info *sseu, u8 slice);
-
-u32 intel_sseu_get_compute_subslices(const struct sseu_dev_info *sseu);
-
-void intel_sseu_set_subslices(struct sseu_dev_info *sseu, int slice,
-                             u8 *subslice_mask, u32 ss_mask);
+intel_sseu_ss_mask_t
+intel_sseu_get_compute_subslices(const struct sseu_dev_info *sseu);
 
 void intel_sseu_info_init(struct intel_gt *gt);
 
@@ -143,6 +168,15 @@ void intel_sseu_print_topology(struct drm_i915_private *i915,
                               const struct sseu_dev_info *sseu,
                               struct drm_printer *p);
 
-u16 intel_slicemask_from_dssmask(u64 dss_mask, int dss_per_slice);
+u16 intel_slicemask_from_xehp_dssmask(intel_sseu_ss_mask_t dss_mask, int dss_per_slice);
+
+int intel_sseu_copy_eumask_to_user(void __user *to,
+                                  const struct sseu_dev_info *sseu);
+int intel_sseu_copy_ssmask_to_user(void __user *to,
+                                  const struct sseu_dev_info *sseu);
+
+void intel_sseu_print_ss_info(const char *type,
+                             const struct sseu_dev_info *sseu,
+                             struct seq_file *m);
 
 #endif /* __INTEL_SSEU_H__ */
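
Note on the header change above: intel_sseu_find_first_xehp_dss() searches the Xe_HP DSS bitmap from bit groupnum * groupsize onward. The following is a minimal userspace sketch of that search, not the kernel implementation. The model_* names, the 128-bit width and the two-word layout are assumptions made for illustration; the kernel uses find_next_bit() on its own bitmap type.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_DSS_BITS 128u /* hypothetical bitmap width for this sketch */

/* Hypothetical stand-in for the xehp member of the subslice mask union. */
struct model_dss_mask {
	uint64_t words[MODEL_DSS_BITS / 64];
};

/* Return the index of the first set bit at or after 'start', or nbits if none. */
static unsigned int model_find_next_bit(const struct model_dss_mask *mask,
					unsigned int nbits, unsigned int start)
{
	for (unsigned int i = start; i < nbits; i++) {
		if ((mask->words[i / 64] >> (i % 64)) & 1u)
			return i;
	}
	return nbits;
}

/*
 * Start the search at the beginning of DSS group 'groupnum'.
 * With groupsize == 0 and groupnum == 0 the whole bitmap is searched.
 */
static unsigned int model_find_first_dss(const struct model_dss_mask *mask,
					 int groupsize, int groupnum)
{
	return model_find_next_bit(mask, MODEL_DSS_BITS,
				   (unsigned int)(groupnum * groupsize));
}
```

The returned index is absolute within the bitmap, which is why callers in this series reduce it with a division or modulo by the group size to obtain steering values. An empty mask yields the bitmap size in this sketch, so a caller would need to treat that value as "not found".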
index 2d5d011e01db273cd86987aaee9aa45f0eb837c9..c2ee5e1826b5dcbd8b99b5acb71e7270f26591dc 100644 (file)
@@ -4,6 +4,7 @@
  * Copyright © 2020 Intel Corporation
  */
 
+#include <linux/bitmap.h>
 #include <linux/string_helpers.h>
 
 #include "i915_drv.h"
 #include "intel_gt_regs.h"
 #include "intel_sseu_debugfs.h"
 
-static void sseu_copy_subslices(const struct sseu_dev_info *sseu,
-                               int slice, u8 *to_mask)
-{
-       int offset = slice * sseu->ss_stride;
-
-       memcpy(&to_mask[offset], &sseu->subslice_mask[offset], sseu->ss_stride);
-}
-
 static void cherryview_sseu_device_status(struct intel_gt *gt,
                                          struct sseu_dev_info *sseu)
 {
@@ -41,7 +34,7 @@ static void cherryview_sseu_device_status(struct intel_gt *gt,
                        continue;
 
                sseu->slice_mask = BIT(0);
-               sseu->subslice_mask[0] |= BIT(ss);
+               sseu->subslice_mask.hsw[0] |= BIT(ss);
                eu_cnt = ((sig1[ss] & CHV_EU08_PG_ENABLE) ? 0 : 2) +
                         ((sig1[ss] & CHV_EU19_PG_ENABLE) ? 0 : 2) +
                         ((sig1[ss] & CHV_EU210_PG_ENABLE) ? 0 : 2) +
@@ -92,7 +85,7 @@ static void gen11_sseu_device_status(struct intel_gt *gt,
                        continue;
 
                sseu->slice_mask |= BIT(s);
-               sseu_copy_subslices(&info->sseu, s, sseu->subslice_mask);
+               sseu->subslice_mask.hsw[s] = info->sseu.subslice_mask.hsw[s];
 
                for (ss = 0; ss < info->sseu.max_subslices; ss++) {
                        unsigned int eu_cnt;
@@ -147,21 +140,17 @@ static void gen9_sseu_device_status(struct intel_gt *gt,
                sseu->slice_mask |= BIT(s);
 
                if (IS_GEN9_BC(gt->i915))
-                       sseu_copy_subslices(&info->sseu, s,
-                                           sseu->subslice_mask);
+                       sseu->subslice_mask.hsw[s] = info->sseu.subslice_mask.hsw[s];
 
                for (ss = 0; ss < info->sseu.max_subslices; ss++) {
                        unsigned int eu_cnt;
-                       u8 ss_idx = s * info->sseu.ss_stride +
-                                   ss / BITS_PER_BYTE;
 
                        if (IS_GEN9_LP(gt->i915)) {
                                if (!(s_reg[s] & (GEN9_PGCTL_SS_ACK(ss))))
                                        /* skip disabled subslice */
                                        continue;
 
-                               sseu->subslice_mask[ss_idx] |=
-                                       BIT(ss % BITS_PER_BYTE);
+                               sseu->subslice_mask.hsw[s] |= BIT(ss);
                        }
 
                        eu_cnt = eu_reg[2 * s + ss / 2] & eu_mask[ss % 2];
@@ -188,8 +177,7 @@ static void bdw_sseu_device_status(struct intel_gt *gt,
        if (sseu->slice_mask) {
                sseu->eu_per_subslice = info->sseu.eu_per_subslice;
                for (s = 0; s < fls(sseu->slice_mask); s++)
-                       sseu_copy_subslices(&info->sseu, s,
-                                           sseu->subslice_mask);
+                       sseu->subslice_mask.hsw[s] = info->sseu.subslice_mask.hsw[s];
                sseu->eu_total = sseu->eu_per_subslice *
                                 intel_sseu_subslice_total(sseu);
 
@@ -208,7 +196,6 @@ static void i915_print_sseu_info(struct seq_file *m,
                                 const struct sseu_dev_info *sseu)
 {
        const char *type = is_available_info ? "Available" : "Enabled";
-       int s;
 
        seq_printf(m, "  %s Slice Mask: %04x\n", type,
                   sseu->slice_mask);
@@ -216,10 +203,7 @@ static void i915_print_sseu_info(struct seq_file *m,
                   hweight8(sseu->slice_mask));
        seq_printf(m, "  %s Subslice Total: %u\n", type,
                   intel_sseu_subslice_total(sseu));
-       for (s = 0; s < fls(sseu->slice_mask); s++) {
-               seq_printf(m, "  %s Slice%i subslices: %u\n", type,
-                          s, intel_sseu_subslices_per_slice(sseu, s));
-       }
+       intel_sseu_print_ss_info(type, sseu, m);
        seq_printf(m, "  %s EU Total: %u\n", type,
                   sseu->eu_total);
        seq_printf(m, "  %s EU Per Subslice: %u\n", type,
index a05c4b99b3fbc059f9efcc1e1b0dfd0346df18b3..3213c593a55f45a057c70a2a0b756cdc9b8fad9a 100644 (file)
@@ -9,6 +9,7 @@
 #include "intel_engine_regs.h"
 #include "intel_gpu_commands.h"
 #include "intel_gt.h"
+#include "intel_gt_mcr.h"
 #include "intel_gt_regs.h"
 #include "intel_ring.h"
 #include "intel_workarounds.h"
@@ -776,7 +777,9 @@ __intel_engine_init_ctx_wa(struct intel_engine_cs *engine,
        if (engine->class != RENDER_CLASS)
                goto done;
 
-       if (IS_DG2(i915))
+       if (IS_PONTEVECCHIO(i915))
+               ; /* noop; none at this time */
+       else if (IS_DG2(i915))
                dg2_ctx_workarounds_init(engine, wal);
        else if (IS_XEHPSDV(i915))
                ; /* noop; none at this time */
@@ -948,8 +951,8 @@ gen9_wa_init_mcr(struct drm_i915_private *i915, struct i915_wa_list *wal)
         * on s/ss combo, the read should be done with read_subslice_reg.
         */
        slice = ffs(sseu->slice_mask) - 1;
-       GEM_BUG_ON(slice >= ARRAY_SIZE(sseu->subslice_mask));
-       subslice = ffs(intel_sseu_get_subslices(sseu, slice));
+       GEM_BUG_ON(slice >= ARRAY_SIZE(sseu->subslice_mask.hsw));
+       subslice = ffs(intel_sseu_get_hsw_subslices(sseu, slice));
        GEM_BUG_ON(!subslice);
        subslice--;
 
@@ -1080,18 +1083,17 @@ static void __add_mcr_wa(struct intel_gt *gt, struct i915_wa_list *wal,
        gt->default_steering.instanceid = subslice;
 
        if (drm_debug_enabled(DRM_UT_DRIVER))
-               intel_gt_report_steering(&p, gt, false);
+               intel_gt_mcr_report_steering(&p, gt, false);
 }
 
 static void
 icl_wa_init_mcr(struct intel_gt *gt, struct i915_wa_list *wal)
 {
        const struct sseu_dev_info *sseu = &gt->info.sseu;
-       unsigned int slice, subslice;
+       unsigned int subslice;
 
        GEM_BUG_ON(GRAPHICS_VER(gt->i915) < 11);
        GEM_BUG_ON(hweight8(sseu->slice_mask) > 1);
-       slice = 0;
 
        /*
         * Although a platform may have subslices, we need to always steer
@@ -1102,7 +1104,7 @@ icl_wa_init_mcr(struct intel_gt *gt, struct i915_wa_list *wal)
         * one of the higher subslices, we run the risk of reading back 0's or
         * random garbage.
         */
-       subslice = __ffs(intel_sseu_get_subslices(sseu, slice));
+       subslice = __ffs(intel_sseu_get_hsw_subslices(sseu, 0));
 
        /*
         * If the subslice we picked above also steers us to a valid L3 bank,
@@ -1112,7 +1114,7 @@ icl_wa_init_mcr(struct intel_gt *gt, struct i915_wa_list *wal)
        if (gt->info.l3bank_mask & BIT(subslice))
                gt->steering_table[L3BANK] = NULL;
 
-       __add_mcr_wa(gt, wal, slice, subslice);
+       __add_mcr_wa(gt, wal, 0, subslice);
 }
 
 static void
@@ -1120,7 +1122,6 @@ xehp_init_mcr(struct intel_gt *gt, struct i915_wa_list *wal)
 {
        const struct sseu_dev_info *sseu = &gt->info.sseu;
        unsigned long slice, subslice = 0, slice_mask = 0;
-       u64 dss_mask = 0;
        u32 lncf_mask = 0;
        int i;
 
@@ -1151,8 +1152,8 @@ xehp_init_mcr(struct intel_gt *gt, struct i915_wa_list *wal)
         */
 
        /* Find the potential gslice candidates */
-       dss_mask = intel_sseu_get_subslices(sseu, 0);
-       slice_mask = intel_slicemask_from_dssmask(dss_mask, GEN_DSS_PER_GSLICE);
+       slice_mask = intel_slicemask_from_xehp_dssmask(sseu->subslice_mask,
+                                                      GEN_DSS_PER_GSLICE);
 
        /*
         * Find the potential LNCF candidates.  Either LNCF within a valid
@@ -1177,9 +1178,8 @@ xehp_init_mcr(struct intel_gt *gt, struct i915_wa_list *wal)
        }
 
        slice = __ffs(slice_mask);
-       subslice = __ffs(dss_mask >> (slice * GEN_DSS_PER_GSLICE));
-       WARN_ON(subslice > GEN_DSS_PER_GSLICE);
-       WARN_ON(dss_mask >> (slice * GEN_DSS_PER_GSLICE) == 0);
+       subslice = intel_sseu_find_first_xehp_dss(sseu, GEN_DSS_PER_GSLICE, slice) %
+               GEN_DSS_PER_GSLICE;
 
        __add_mcr_wa(gt, wal, slice, subslice);
 
@@ -1196,6 +1196,20 @@ xehp_init_mcr(struct intel_gt *gt, struct i915_wa_list *wal)
        __set_mcr_steering(wal, SF_MCR_SELECTOR, 0, 2);
 }
 
+static void
+pvc_init_mcr(struct intel_gt *gt, struct i915_wa_list *wal)
+{
+       unsigned int dss;
+
+       /*
+        * Setup implicit steering for COMPUTE and DSS ranges to the first
+        * non-fused-off DSS.  All other types of MCR registers will be
+        * explicitly steered.
+        */
+       dss = intel_sseu_find_first_xehp_dss(&gt->info.sseu, 0, 0);
+       __add_mcr_wa(gt, wal, dss / GEN_DSS_PER_CSLICE, dss % GEN_DSS_PER_CSLICE);
+}
+
 static void
 icl_gt_workarounds_init(struct intel_gt *gt, struct i915_wa_list *wal)
 {
@@ -1487,6 +1501,18 @@ dg2_gt_workarounds_init(struct intel_gt *gt, struct i915_wa_list *wal)
         * performance guide section.
         */
        wa_write_or(wal, GEN12_SQCM, EN_32B_ACCESS);
+
+       /* Wa_14015795083 */
+       wa_write_clr(wal, GEN7_MISCCPCTL, GEN12_DOP_CLOCK_GATE_RENDER_ENABLE);
+}
+
+static void
+pvc_gt_workarounds_init(struct intel_gt *gt, struct i915_wa_list *wal)
+{
+       pvc_init_mcr(gt, wal);
+
+       /* Wa_14015795083 */
+       wa_write_clr(wal, GEN7_MISCCPCTL, GEN12_DOP_CLOCK_GATE_RENDER_ENABLE);
 }
 
 static void
@@ -1494,7 +1520,9 @@ gt_init_workarounds(struct intel_gt *gt, struct i915_wa_list *wal)
 {
        struct drm_i915_private *i915 = gt->i915;
 
-       if (IS_DG2(i915))
+       if (IS_PONTEVECCHIO(i915))
+               pvc_gt_workarounds_init(gt, wal);
+       else if (IS_DG2(i915))
                dg2_gt_workarounds_init(gt, wal);
        else if (IS_XEHPSDV(i915))
                xehpsdv_gt_workarounds_init(gt, wal);
@@ -1596,13 +1624,13 @@ wa_list_apply(struct intel_gt *gt, const struct i915_wa_list *wal)
                u32 val, old = 0;
 
                /* open-coded rmw due to steering */
-               old = wa->clr ? intel_gt_read_register_fw(gt, wa->reg) : 0;
+               old = wa->clr ? intel_gt_mcr_read_any_fw(gt, wa->reg) : 0;
                val = (old & ~wa->clr) | wa->set;
                if (val != old || !wa->clr)
                        intel_uncore_write_fw(uncore, wa->reg, val);
 
                if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
-                       wa_verify(wa, intel_gt_read_register_fw(gt, wa->reg),
+                       wa_verify(wa, intel_gt_mcr_read_any_fw(gt, wa->reg),
                                  wal->name, "application");
        }
 
@@ -1633,7 +1661,7 @@ static bool wa_list_verify(struct intel_gt *gt,
 
        for (i = 0, wa = wal->list; i < wal->count; i++, wa++)
                ok &= wa_verify(wa,
-                               intel_gt_read_register_fw(gt, wa->reg),
+                               intel_gt_mcr_read_any_fw(gt, wa->reg),
                                wal->name, from);
 
        intel_uncore_forcewake_put__locked(uncore, fw);
@@ -1924,6 +1952,32 @@ static void dg2_whitelist_build(struct intel_engine_cs *engine)
        }
 }
 
+static void blacklist_trtt(struct intel_engine_cs *engine)
+{
+       struct i915_wa_list *w = &engine->whitelist;
+
+       /*
+        * Prevent read/write access to [0x4400, 0x4600) which covers
+        * the TRTT range across all engines. Note that normally userspace
+        * cannot access the other engines' trtt control, but for simplicity
+        * we cover the entire range on each engine.
+        */
+       whitelist_reg_ext(w, _MMIO(0x4400),
+                         RING_FORCE_TO_NONPRIV_DENY |
+                         RING_FORCE_TO_NONPRIV_RANGE_64);
+       whitelist_reg_ext(w, _MMIO(0x4500),
+                         RING_FORCE_TO_NONPRIV_DENY |
+                         RING_FORCE_TO_NONPRIV_RANGE_64);
+}
+
+static void pvc_whitelist_build(struct intel_engine_cs *engine)
+{
+       allow_read_ctx_timestamp(engine);
+
+       /* Wa_16014440446:pvc */
+       blacklist_trtt(engine);
+}
+
 void intel_engine_init_whitelist(struct intel_engine_cs *engine)
 {
        struct drm_i915_private *i915 = engine->i915;
@@ -1931,7 +1985,9 @@ void intel_engine_init_whitelist(struct intel_engine_cs *engine)
 
        wa_init_start(w, "whitelist", engine->name);
 
-       if (IS_DG2(i915))
+       if (IS_PONTEVECCHIO(i915))
+               pvc_whitelist_build(engine);
+       else if (IS_DG2(i915))
                dg2_whitelist_build(engine);
        else if (IS_XEHPSDV(i915))
                xehpsdv_whitelist_build(engine);
@@ -1994,27 +2050,44 @@ void intel_engine_apply_whitelist(struct intel_engine_cs *engine)
 static void
 engine_fake_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
 {
-       u8 mocs;
+       u8 mocs_w, mocs_r;
 
        /*
-        * RING_CMD_CCTL are need to be programed to un-cached
-        * for memory writes and reads outputted by Command
-        * Streamers on Gen12 onward platforms.
+        * RING_CMD_CCTL specifies the default MOCS entry that will be used
+        * by the command streamer when executing commands that don't have
+        * a way to explicitly specify a MOCS setting.  The default should
+        * usually reference whichever MOCS entry corresponds to uncached
+        * behavior, although use of a WB cached entry is recommended by the
+        * spec in certain circumstances on specific platforms.
         */
        if (GRAPHICS_VER(engine->i915) >= 12) {
-               mocs = engine->gt->mocs.uc_index;
+               mocs_r = engine->gt->mocs.uc_index;
+               mocs_w = engine->gt->mocs.uc_index;
+
+               if (HAS_L3_CCS_READ(engine->i915) &&
+                   engine->class == COMPUTE_CLASS) {
+                       mocs_r = engine->gt->mocs.wb_index;
+
+                       /*
+                        * Even on the few platforms where MOCS 0 is a
+                        * legitimate table entry, it's never the correct
+                        * setting to use here; we can assume the MOCS init
+                        * just forgot to initialize wb_index.
+                        */
+                       drm_WARN_ON(&engine->i915->drm, mocs_r == 0);
+               }
+
                wa_masked_field_set(wal,
                                    RING_CMD_CCTL(engine->mmio_base),
                                    CMD_CCTL_MOCS_MASK,
-                                   CMD_CCTL_MOCS_OVERRIDE(mocs, mocs));
+                                   CMD_CCTL_MOCS_OVERRIDE(mocs_w, mocs_r));
        }
 }
 
 static bool needs_wa_1308578152(struct intel_engine_cs *engine)
 {
-       u64 dss_mask = intel_sseu_get_subslices(&engine->gt->info.sseu, 0);
-
-       return (dss_mask & GENMASK(GEN_DSS_PER_GSLICE - 1, 0)) == 0;
+       return intel_sseu_find_first_xehp_dss(&engine->gt->info.sseu, 0, 0) >=
+               GEN_DSS_PER_GSLICE;
 }
 
 static void
@@ -2023,9 +2096,6 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
        struct drm_i915_private *i915 = engine->i915;
 
        if (IS_DG2(i915)) {
-               /* Wa_14015227452:dg2 */
-               wa_masked_en(wal, GEN9_ROW_CHICKEN4, XEHP_DIS_BBL_SYSPIPE);
-
                /* Wa_1509235366:dg2 */
                wa_write_or(wal, GEN12_GAMCNTRL_CTRL, INVALIDATION_BROADCAST_MODE_DIS |
                            GLOBAL_INVALIDATION_MODE);
@@ -2036,12 +2106,6 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
                 * performance guide section.
                 */
                wa_write_or(wal, XEHP_L3SCQREG7, BLEND_FILL_CACHING_OPT_DIS);
-
-               /* Wa_18018781329:dg2 */
-               wa_write_or(wal, RENDER_MOD_CTRL, FORCE_MISS_FTLB);
-               wa_write_or(wal, COMP_MOD_CTRL, FORCE_MISS_FTLB);
-               wa_write_or(wal, VDBX_MOD_CTRL, FORCE_MISS_FTLB);
-               wa_write_or(wal, VEBX_MOD_CTRL, FORCE_MISS_FTLB);
        }
 
        if (IS_DG2_GRAPHICS_STEP(i915, G11, STEP_A0, STEP_B0)) {
@@ -2160,6 +2224,16 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
                wa_write_or(wal, GEN12_MERT_MOD_CTRL, FORCE_MISS_FTLB);
        }
 
+       if (IS_DG2_GRAPHICS_STEP(i915, G11, STEP_B0, STEP_FOREVER) ||
+           IS_DG2_G10(i915)) {
+               /* Wa_22014600077:dg2 */
+               wa_add(wal, GEN10_CACHE_MODE_SS, 0,
+                      _MASKED_BIT_ENABLE(ENABLE_EU_COUNT_FOR_TDL_FLUSH),
+                      0 /* Wa_14012342262 :write-only reg, so skip
+                           verification */,
+                      true);
+       }
+
        if (IS_DG1_GRAPHICS_STEP(i915, STEP_A0, STEP_B0) ||
            IS_TGL_UY_GRAPHICS_STEP(i915, STEP_A0, STEP_B0)) {
                /*
@@ -2583,6 +2657,15 @@ xcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
        }
 }
 
+static void
+ccs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
+{
+       if (IS_PVC_CT_STEP(engine->i915, STEP_A0, STEP_C0)) {
+               /* Wa_14014999345:pvc */
+               wa_masked_en(wal, GEN10_CACHE_MODE_SS, DISABLE_ECC);
+       }
+}
+
 /*
  * The workarounds in this function apply to shared registers in
  * the general render reset domain that aren't tied to a
@@ -2597,6 +2680,15 @@ general_render_compute_wa_init(struct intel_engine_cs *engine, struct i915_wa_li
 {
        struct drm_i915_private *i915 = engine->i915;
 
+       if (IS_PONTEVECCHIO(i915)) {
+               /*
+                * The following is not actually a "workaround" but rather
+                * a recommended tuning setting documented in the bspec's
+                * performance guide section.
+                */
+               wa_write(wal, XEHPC_L3SCRUB, SCRUB_CL_DWNGRADE_SHARED | SCRUB_RATE_4B_PER_CLK);
+       }
+
        if (IS_XEHPSDV(i915)) {
                /* Wa_1409954639 */
                wa_masked_en(wal,
@@ -2629,9 +2721,21 @@ general_render_compute_wa_init(struct intel_engine_cs *engine, struct i915_wa_li
                                GLOBAL_INVALIDATION_MODE);
        }
 
-       if (IS_DG2(i915)) {
-               /* Wa_22014226127:dg2 */
+       if (IS_DG2(i915) || IS_PONTEVECCHIO(i915)) {
+               /* Wa_14015227452:dg2,pvc */
+               wa_masked_en(wal, GEN9_ROW_CHICKEN4, XEHP_DIS_BBL_SYSPIPE);
+
+               /* Wa_22014226127:dg2,pvc */
                wa_write_or(wal, LSC_CHICKEN_BIT_0, DISABLE_D8_D16_COASLESCE);
+
+               /* Wa_16015675438:dg2,pvc */
+               wa_masked_en(wal, FF_SLICE_CS_CHICKEN2, GEN12_PERF_FIX_BALANCING_CFE_DISABLE);
+
+               /* Wa_18018781329:dg2,pvc */
+               wa_write_or(wal, RENDER_MOD_CTRL, FORCE_MISS_FTLB);
+               wa_write_or(wal, COMP_MOD_CTRL, FORCE_MISS_FTLB);
+               wa_write_or(wal, VDBX_MOD_CTRL, FORCE_MISS_FTLB);
+               wa_write_or(wal, VEBX_MOD_CTRL, FORCE_MISS_FTLB);
        }
 }
 
@@ -2651,7 +2755,9 @@ engine_init_workarounds(struct intel_engine_cs *engine, struct i915_wa_list *wal
        if (engine->flags & I915_ENGINE_FIRST_RENDER_COMPUTE)
                general_render_compute_wa_init(engine, wal);
 
-       if (engine->class == RENDER_CLASS)
+       if (engine->class == COMPUTE_CLASS)
+               ccs_engine_wa_init(engine, wal);
+       else if (engine->class == RENDER_CLASS)
                rcs_engine_wa_init(engine, wal);
        else
                xcs_engine_wa_init(engine, wal);
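
Note on wa_list_apply() in the file above: the open-coded read-modify-write reads the register only when the workaround has clear bits, and skips the write when nothing would change. The sketch below models only that arithmetic in userspace. The struct and function names are hypothetical, and register access (including the MCR steering done by intel_gt_mcr_read_any_fw()) is replaced by a plain value passed in.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reduced workaround entry: bits to clear and bits to set. */
struct model_wa {
	uint32_t clr;
	uint32_t set;
};

/* The "old" value is only read back when there are bits to clear. */
static uint32_t model_wa_old(const struct model_wa *wa, uint32_t current)
{
	return wa->clr ? current : 0;
}

/* Value that would be written: old with clr bits removed, then set bits added. */
static uint32_t model_wa_value(const struct model_wa *wa, uint32_t current)
{
	return (model_wa_old(wa, current) & ~wa->clr) | wa->set;
}

/*
 * A write is issued if the value changes, or unconditionally when clr is
 * zero (no readback happened, so there is nothing to compare against).
 */
static bool model_wa_needs_write(const struct model_wa *wa, uint32_t current)
{
	return model_wa_value(wa, current) != model_wa_old(wa, current) ||
	       !wa->clr;
}
```

For example, with clr = 0x0f and set = 0x05, a register holding 0xf3 becomes 0xf5 and is written, while a register already holding 0xf5 is left alone.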
index 83ff4c2e57c5039928ad6eec6a208c9407285b7f..6493265d5f6426a514a7eed2fc308b5aceac2c55 100644 (file)
@@ -976,6 +976,7 @@ static int __igt_reset_engines(struct intel_gt *gt,
 {
        struct i915_gpu_error *global = &gt->i915->gpu_error;
        struct intel_engine_cs *engine, *other;
+       struct active_engine *threads;
        enum intel_engine_id id, tmp;
        struct hang h;
        int err = 0;
@@ -996,8 +997,11 @@ static int __igt_reset_engines(struct intel_gt *gt,
                        h.ctx->sched.priority = 1024;
        }
 
+       threads = kmalloc_array(I915_NUM_ENGINES, sizeof(*threads), GFP_KERNEL);
+       if (!threads)
+               return -ENOMEM;
+
        for_each_engine(engine, gt, id) {
-               struct active_engine threads[I915_NUM_ENGINES] = {};
                unsigned long device = i915_reset_count(global);
                unsigned long count = 0, reported;
                bool using_guc = intel_engine_uses_guc(engine);
@@ -1016,7 +1020,7 @@ static int __igt_reset_engines(struct intel_gt *gt,
                        break;
                }
 
-               memset(threads, 0, sizeof(threads));
+               memset(threads, 0, sizeof(*threads) * I915_NUM_ENGINES);
                for_each_engine(other, gt, tmp) {
                        struct task_struct *tsk;
 
@@ -1236,6 +1240,7 @@ unwind:
                        break;
                }
        }
+       kfree(threads);
 
        if (intel_gt_is_wedged(gt))
                err = -EIO;
index 62cb4254a77af37b327f42cfbb48de286af8dddd..4c840a2639dc58663501aa22e5a0795693604cf3 100644 (file)
@@ -122,6 +122,12 @@ enum slpc_param_id {
        SLPC_MAX_PARAM = 32,
 };
 
+enum slpc_media_ratio_mode {
+       SLPC_MEDIA_RATIO_MODE_DYNAMIC_CONTROL = 0,
+       SLPC_MEDIA_RATIO_MODE_FIXED_ONE_TO_ONE = 1,
+       SLPC_MEDIA_RATIO_MODE_FIXED_ONE_TO_TWO = 2,
+};
+
 enum slpc_event_id {
        SLPC_EVENT_RESET = 0,
        SLPC_EVENT_SHUTDOWN = 1,
index 2c4ad4a65089936b393dc9852baff81b0e405698..2706a8c650900e661a4a178f999e9e464443d885 100644 (file)
@@ -310,8 +310,8 @@ static u32 guc_ctl_wa_flags(struct intel_guc *guc)
        if (IS_DG2(gt->i915))
                flags |= GUC_WA_DUAL_QUEUE;
 
-       /* Wa_22011802037: graphics version 12 */
-       if (GRAPHICS_VER(gt->i915) == 12)
+       /* Wa_22011802037: graphics version 11/12 */
+       if (IS_GRAPHICS_VER(gt->i915, 11, 12))
                flags |= GUC_WA_PRE_PARSER;
 
        /* Wa_16011777198:dg2 */
@@ -327,6 +327,10 @@ static u32 guc_ctl_wa_flags(struct intel_guc *guc)
            IS_DG2_GRAPHICS_STEP(gt->i915, G11, STEP_A0, STEP_FOREVER))
                flags |= GUC_WA_CONTEXT_ISOLATION;
 
+       /* Wa_16015675438 */
+       if (!RCS_MASK(gt))
+               flags |= GUC_WA_RCS_REGS_IN_CCS_REGS_LIST;
+
        return flags;
 }
 
index 966e69a8b1c124d3b9ab107b9f65f774589c2b10..d0d99f178f2d4eab4ebd39f08100f14145858ef5 100644 (file)
@@ -230,6 +230,14 @@ struct intel_guc {
                 * @shift: Right shift value for the gpm timestamp
                 */
                u32 shift;
+
+               /**
+                * @last_stat_jiffies: jiffies at last actual stats collection time
+                * We use this timestamp to ensure we don't oversample the
+                * stats because runtime power management events can trigger
+                * stats collection at much higher rates than required.
+                */
+               unsigned long last_stat_jiffies;
        } timestamp;
 
 #ifdef CONFIG_DRM_I915_SELFTEST
index 3eabf4cf8eec3ee9b484d6b014a3741fc4746dd8..ba7541f3ca6104e0156bb2e3bd3e649e6d9fb09a 100644 (file)
@@ -7,6 +7,7 @@
 
 #include "gt/intel_engine_regs.h"
 #include "gt/intel_gt.h"
+#include "gt/intel_gt_mcr.h"
 #include "gt/intel_gt_regs.h"
 #include "gt/intel_lrc.h"
 #include "gt/shmem_utils.h"
@@ -313,7 +314,7 @@ static long __must_check guc_mmio_reg_add(struct intel_gt *gt,
         * tracking, it is easier to just program the default steering for all
         * regs that don't need a non-default one.
         */
-       intel_gt_get_valid_steering_for_reg(gt, reg, &group, &inst);
+       intel_gt_mcr_get_nonterminated_steering(gt, reg, &group, &inst);
        entry.flags |= GUC_REGSET_STEERING(group, inst);
 
        slot = __mmio_reg_add(regset, &entry);
@@ -457,7 +458,7 @@ static void fill_engine_enable_masks(struct intel_gt *gt,
 {
        info_map_write(info_map, engine_enabled_masks[GUC_RENDER_CLASS], RCS_MASK(gt));
        info_map_write(info_map, engine_enabled_masks[GUC_COMPUTE_CLASS], CCS_MASK(gt));
-       info_map_write(info_map, engine_enabled_masks[GUC_BLITTER_CLASS], 1);
+       info_map_write(info_map, engine_enabled_masks[GUC_BLITTER_CLASS], BCS_MASK(gt));
        info_map_write(info_map, engine_enabled_masks[GUC_VIDEO_CLASS], VDBOX_MASK(gt));
        info_map_write(info_map, engine_enabled_masks[GUC_VIDEOENHANCE_CLASS], VEBOX_MASK(gt));
 }
index c4e25966d3e9f6fc7870a3fa6dcf96bcab21439c..97a32e610c303a3e64b0218221a3d6c92783f3e5 100644 (file)
@@ -420,72 +420,6 @@ guc_capture_get_device_reglist(struct intel_guc *guc)
        return default_lists;
 }
 
-static const char *
-__stringify_owner(u32 owner)
-{
-       switch (owner) {
-       case GUC_CAPTURE_LIST_INDEX_PF:
-               return "PF";
-       case GUC_CAPTURE_LIST_INDEX_VF:
-               return "VF";
-       default:
-               return "unknown";
-       }
-
-       return "";
-}
-
-static const char *
-__stringify_type(u32 type)
-{
-       switch (type) {
-       case GUC_CAPTURE_LIST_TYPE_GLOBAL:
-               return "Global";
-       case GUC_CAPTURE_LIST_TYPE_ENGINE_CLASS:
-               return "Class";
-       case GUC_CAPTURE_LIST_TYPE_ENGINE_INSTANCE:
-               return "Instance";
-       default:
-               return "unknown";
-       }
-
-       return "";
-}
-
-static const char *
-__stringify_engclass(u32 class)
-{
-       switch (class) {
-       case GUC_RENDER_CLASS:
-               return "Render";
-       case GUC_VIDEO_CLASS:
-               return "Video";
-       case GUC_VIDEOENHANCE_CLASS:
-               return "VideoEnhance";
-       case GUC_BLITTER_CLASS:
-               return "Blitter";
-       case GUC_COMPUTE_CLASS:
-               return "Compute";
-       default:
-               return "unknown";
-       }
-
-       return "";
-}
-
-static void
-guc_capture_warn_with_list_info(struct drm_i915_private *i915, char *msg,
-                               u32 owner, u32 type, u32 classid)
-{
-       if (type == GUC_CAPTURE_LIST_TYPE_GLOBAL)
-               drm_dbg(&i915->drm, "GuC-capture: %s for %s %s-Registers.\n", msg,
-                       __stringify_owner(owner), __stringify_type(type));
-       else
-               drm_dbg(&i915->drm, "GuC-capture: %s for %s %s-Registers on %s-Engine\n", msg,
-                       __stringify_owner(owner), __stringify_type(type),
-                       __stringify_engclass(classid));
-}
-
 static int
 guc_capture_list_init(struct intel_guc *guc, u32 owner, u32 type, u32 classid,
                      struct guc_mmio_reg *ptr, u16 num_entries)
@@ -501,11 +435,8 @@ guc_capture_list_init(struct intel_guc *guc, u32 owner, u32 type, u32 classid,
                return -ENODEV;
 
        match = guc_capture_get_one_list(reglists, owner, type, classid);
-       if (!match) {
-               guc_capture_warn_with_list_info(i915, "Missing register list init", owner, type,
-                                               classid);
+       if (!match)
                return -ENODATA;
-       }
 
        for (i = 0; i < num_entries && i < match->num_regs; ++i) {
                ptr[i].offset = match->list[i].reg.reg;
@@ -556,7 +487,6 @@ int
 intel_guc_capture_getlistsize(struct intel_guc *guc, u32 owner, u32 type, u32 classid,
                              size_t *size)
 {
-       struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
        struct intel_guc_state_capture *gc = guc->capture;
        struct __guc_capture_ads_cache *cache = &gc->ads_cache[owner][type][classid];
        int num_regs;
@@ -570,11 +500,8 @@ intel_guc_capture_getlistsize(struct intel_guc *guc, u32 owner, u32 type, u32 cl
        }
 
        num_regs = guc_cap_list_num_regs(gc, owner, type, classid);
-       if (!num_regs) {
-               guc_capture_warn_with_list_info(i915, "Missing register list size",
-                                               owner, type, classid);
+       if (!num_regs)
                return -ENODATA;
-       }
 
        *size = PAGE_ALIGN((sizeof(struct guc_debug_capture_list)) +
                           (num_regs * sizeof(struct guc_mmio_reg)));
index 42cb7a9a6199c79f8350efd259bbeb0fd46b57c3..b3c9a9327f7648ee84d5aa1df71dfae16a8a8d25 100644 (file)
 #define   GUC_WA_PRE_PARSER            BIT(14)
 #define   GUC_WA_HOLD_CCS_SWITCHOUT    BIT(17)
 #define   GUC_WA_POLLCS                        BIT(18)
+#define   GUC_WA_RCS_REGS_IN_CCS_REGS_LIST     BIT(21)
 
 #define GUC_CTL_FEATURE                        2
 #define   GUC_CTL_ENABLE_SLPC          BIT(2)
index 79c66b6b51a3f77fe7eebf9660e3685c32e3fc1d..4781fccc2687d6d407643bb99e4abb6e624aa113 100644 (file)
@@ -94,9 +94,9 @@ static int guc_hwconfig_fill_buffer(struct intel_guc *guc, struct intel_hwconfig
 
 static bool has_table(struct drm_i915_private *i915)
 {
-       if (IS_ALDERLAKE_P(i915))
+       if (IS_ALDERLAKE_P(i915) && !IS_ADLP_N(i915))
                return true;
-       if (IS_DG2(i915))
+       if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 55))
                return true;
 
        return false;
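
Note on has_table() above: replacing IS_DG2() with a GRAPHICS_VER_FULL() >= IP_VER(12, 55) comparison makes the check apply to later IP versions as well. The sketch below assumes the packing used by i915's IP_VER macro, with the version in the upper bits and the release in the low byte. The model_* names and the boolean platform flags are hypothetical stand-ins for the driver's platform macros.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed packing: major version shifted left by 8, release in the low byte. */
static uint32_t model_ip_ver(uint32_t ver, uint32_t rel)
{
	return (ver << 8) | rel;
}

/*
 * Mirrors the shape of the updated has_table(): ADL-P reports a table
 * unless it is the ADL-N variant, and any IP at or above 12.55 does too.
 */
static bool model_has_table(uint32_t ver_full, bool is_adl_p, bool is_adl_n)
{
	if (is_adl_p && !is_adl_n)
		return true;
	if (ver_full >= model_ip_ver(12, 55))
		return true;

	return false;
}
```

Because the release occupies the low byte, an ordinary integer comparison orders 12.10 < 12.55 < 12.60 as expected.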
index e00661fb0853169b7e5c4b3d4de4089f255f9b17..8f8dd05835c5aaf766e0f70ccbba6a3c85904dda 100644 (file)
@@ -49,7 +49,6 @@ static int guc_action_control_gucrc(struct intel_guc *guc, bool enable)
 static int __guc_rc_control(struct intel_guc *guc, bool enable)
 {
        struct intel_gt *gt = guc_to_gt(guc);
-       struct drm_device *drm = &guc_to_gt(guc)->i915->drm;
        int ret;
 
        if (!intel_uc_uses_guc_rc(&gt->uc))
@@ -60,8 +59,8 @@ static int __guc_rc_control(struct intel_guc *guc, bool enable)
 
        ret = guc_action_control_gucrc(guc, enable);
        if (ret) {
-               drm_err(drm, "Failed to %s GuC RC (%pe)\n",
-                       str_enable_disable(enable), ERR_PTR(ret));
+               i915_probe_error(guc_to_gt(guc)->i915, "Failed to %s GuC RC (%pe)\n",
+                                str_enable_disable(enable), ERR_PTR(ret));
                return ret;
        }
 
index ad570fa002a6b9e7429c6ae19a5140daa04a43a0..8dc063f087eb15add9f2a3155eea46f0832f9d85 100644 (file)
@@ -96,6 +96,7 @@
 
 #define GUC_SHIM_CONTROL2              _MMIO(0xc068)
 #define   GUC_IS_PRIVILEGED            (1<<29)
+#define   GSC_LOADS_HUC                        (1<<30)
 
 #define GUC_SEND_INTERRUPT             _MMIO(0xc4c8)
 #define   GUC_SEND_TRIGGER               (1<<0)
index 1db833da42df31c1ee5bcfe1fea76161c31d84cd..ec9c4ca0f615bc909eb41fd559d670d6e159019a 100644
@@ -98,6 +98,30 @@ static u32 slpc_get_state(struct intel_guc_slpc *slpc)
        return data->header.global_state;
 }
 
+static int guc_action_slpc_set_param_nb(struct intel_guc *guc, u8 id, u32 value)
+{
+       u32 request[] = {
+               GUC_ACTION_HOST2GUC_PC_SLPC_REQUEST,
+               SLPC_EVENT(SLPC_EVENT_PARAMETER_SET, 2),
+               id,
+               value,
+       };
+       int ret;
+
+       ret = intel_guc_send_nb(guc, request, ARRAY_SIZE(request), 0);
+
+       return ret > 0 ? -EPROTO : ret;
+}
+
+static int slpc_set_param_nb(struct intel_guc_slpc *slpc, u8 id, u32 value)
+{
+       struct intel_guc *guc = slpc_to_guc(slpc);
+
+       GEM_BUG_ON(id >= SLPC_MAX_PARAM);
+
+       return guc_action_slpc_set_param_nb(guc, id, value);
+}
+
 static int guc_action_slpc_set_param(struct intel_guc *guc, u8 id, u32 value)
 {
        u32 request[] = {
@@ -208,12 +232,14 @@ static int slpc_force_min_freq(struct intel_guc_slpc *slpc, u32 freq)
         */
 
        with_intel_runtime_pm(&i915->runtime_pm, wakeref) {
-               ret = slpc_set_param(slpc,
-                                    SLPC_PARAM_GLOBAL_MIN_GT_UNSLICE_FREQ_MHZ,
-                                    freq);
+               /* Non-blocking request will avoid stalls */
+               ret = slpc_set_param_nb(slpc,
+                                       SLPC_PARAM_GLOBAL_MIN_GT_UNSLICE_FREQ_MHZ,
+                                       freq);
                if (ret)
-                       i915_probe_error(i915, "Unable to force min freq to %u: %d",
-                                        freq, ret);
+                       drm_notice(&i915->drm,
+                                  "Failed to send set_param for min freq(%d): (%d)\n",
+                                  freq, ret);
        }
 
        return ret;
@@ -222,6 +248,7 @@ static int slpc_force_min_freq(struct intel_guc_slpc *slpc, u32 freq)
 static void slpc_boost_work(struct work_struct *work)
 {
        struct intel_guc_slpc *slpc = container_of(work, typeof(*slpc), boost_work);
+       int err;
 
        /*
         * Raise min freq to boost. It's possible that
@@ -231,8 +258,9 @@ static void slpc_boost_work(struct work_struct *work)
         */
        mutex_lock(&slpc->lock);
        if (atomic_read(&slpc->num_waiters)) {
-               slpc_force_min_freq(slpc, slpc->boost_freq);
-               slpc->num_boosts++;
+               err = slpc_force_min_freq(slpc, slpc->boost_freq);
+               if (!err)
+                       slpc->num_boosts++;
        }
        mutex_unlock(&slpc->lock);
 }
@@ -260,6 +288,7 @@ int intel_guc_slpc_init(struct intel_guc_slpc *slpc)
        slpc->boost_freq = 0;
        atomic_set(&slpc->num_waiters, 0);
        slpc->num_boosts = 0;
+       slpc->media_ratio_mode = SLPC_MEDIA_RATIO_MODE_DYNAMIC_CONTROL;
 
        mutex_init(&slpc->lock);
        INIT_WORK(&slpc->boost_work, slpc_boost_work);
@@ -506,6 +535,22 @@ int intel_guc_slpc_get_min_freq(struct intel_guc_slpc *slpc, u32 *val)
        return ret;
 }
 
+int intel_guc_slpc_set_media_ratio_mode(struct intel_guc_slpc *slpc, u32 val)
+{
+       struct drm_i915_private *i915 = slpc_to_i915(slpc);
+       intel_wakeref_t wakeref;
+       int ret = 0;
+
+       if (!HAS_MEDIA_RATIO_MODE(i915))
+               return -ENODEV;
+
+       with_intel_runtime_pm(&i915->runtime_pm, wakeref)
+               ret = slpc_set_param(slpc,
+                                    SLPC_PARAM_MEDIA_FF_RATIO_MODE,
+                                    val);
+       return ret;
+}
+
 void intel_guc_pm_intrmsk_enable(struct intel_gt *gt)
 {
        u32 pm_intrmsk_mbz = 0;
@@ -654,6 +699,9 @@ int intel_guc_slpc_enable(struct intel_guc_slpc *slpc)
                return ret;
        }
 
+       /* Set cached media freq ratio mode */
+       intel_guc_slpc_set_media_ratio_mode(slpc, slpc->media_ratio_mode);
+
        return 0;
 }
 
index 0caa8fee3c040ec7cdccc68ab8292cda6f4a85f2..82a98f78f96c3f71accd8dba6e3d424315f79680 100644
@@ -38,6 +38,7 @@ int intel_guc_slpc_set_boost_freq(struct intel_guc_slpc *slpc, u32 val);
 int intel_guc_slpc_get_max_freq(struct intel_guc_slpc *slpc, u32 *val);
 int intel_guc_slpc_get_min_freq(struct intel_guc_slpc *slpc, u32 *val);
 int intel_guc_slpc_print_info(struct intel_guc_slpc *slpc, struct drm_printer *p);
+int intel_guc_slpc_set_media_ratio_mode(struct intel_guc_slpc *slpc, u32 val);
 void intel_guc_pm_intrmsk_enable(struct intel_gt *gt);
 void intel_guc_slpc_boost(struct intel_guc_slpc *slpc);
 void intel_guc_slpc_dec_waiters(struct intel_guc_slpc *slpc);
index bf5b9a563c09e0a9a7549787cfcd6268c36e4c4b..73d208123528ff740a8b25d1defa75148b04b3dc 100644
@@ -29,6 +29,9 @@ struct intel_guc_slpc {
        u32 min_freq_softlimit;
        u32 max_freq_softlimit;
 
+       /* cached media ratio mode */
+       u32 media_ratio_mode;
+
        /* Protects set/reset of boost freq
         * and value of num_waiters
         */
index 1726f0f19901097551de288ec80bdf37a4744a52..40f726c61e95183be1d11a07641b72c728821a3d 100644
@@ -1314,6 +1314,8 @@ static void __update_guc_busyness_stats(struct intel_guc *guc)
        unsigned long flags;
        ktime_t unused;
 
+       guc->timestamp.last_stat_jiffies = jiffies;
+
        spin_lock_irqsave(&guc->timestamp.lock, flags);
 
        guc_update_pm_timestamp(guc, &unused);
@@ -1386,6 +1388,17 @@ void intel_guc_busyness_park(struct intel_gt *gt)
                return;
 
        cancel_delayed_work(&guc->timestamp.work);
+
+       /*
+        * Before parking, we should sample engine busyness stats if we need to.
+        * We can skip it if we are less than half a ping from the last time we
+        * sampled the busyness stats.
+        */
+       if (guc->timestamp.last_stat_jiffies &&
+           !time_after(jiffies, guc->timestamp.last_stat_jiffies +
+                       (guc->timestamp.ping_delay / 2)))
+               return;
+
        __update_guc_busyness_stats(guc);
 }
 
@@ -1527,87 +1540,18 @@ static void guc_reset_state(struct intel_context *ce, u32 head, bool scrub)
        lrc_update_regs(ce, engine, head);
 }
 
-static u32 __cs_pending_mi_force_wakes(struct intel_engine_cs *engine)
-{
-       static const i915_reg_t _reg[I915_NUM_ENGINES] = {
-               [RCS0] = MSG_IDLE_CS,
-               [BCS0] = MSG_IDLE_BCS,
-               [VCS0] = MSG_IDLE_VCS0,
-               [VCS1] = MSG_IDLE_VCS1,
-               [VCS2] = MSG_IDLE_VCS2,
-               [VCS3] = MSG_IDLE_VCS3,
-               [VCS4] = MSG_IDLE_VCS4,
-               [VCS5] = MSG_IDLE_VCS5,
-               [VCS6] = MSG_IDLE_VCS6,
-               [VCS7] = MSG_IDLE_VCS7,
-               [VECS0] = MSG_IDLE_VECS0,
-               [VECS1] = MSG_IDLE_VECS1,
-               [VECS2] = MSG_IDLE_VECS2,
-               [VECS3] = MSG_IDLE_VECS3,
-               [CCS0] = MSG_IDLE_CS,
-               [CCS1] = MSG_IDLE_CS,
-               [CCS2] = MSG_IDLE_CS,
-               [CCS3] = MSG_IDLE_CS,
-       };
-       u32 val;
-
-       if (!_reg[engine->id].reg)
-               return 0;
-
-       val = intel_uncore_read(engine->uncore, _reg[engine->id]);
-
-       /* bits[29:25] & bits[13:9] >> shift */
-       return (val & (val >> 16) & MSG_IDLE_FW_MASK) >> MSG_IDLE_FW_SHIFT;
-}
-
-static void __gpm_wait_for_fw_complete(struct intel_gt *gt, u32 fw_mask)
-{
-       int ret;
-
-       /* Ensure GPM receives fw up/down after CS is stopped */
-       udelay(1);
-
-       /* Wait for forcewake request to complete in GPM */
-       ret =  __intel_wait_for_register_fw(gt->uncore,
-                                           GEN9_PWRGT_DOMAIN_STATUS,
-                                           fw_mask, fw_mask, 5000, 0, NULL);
-
-       /* Ensure CS receives fw ack from GPM */
-       udelay(1);
-
-       if (ret)
-               GT_TRACE(gt, "Failed to complete pending forcewake %d\n", ret);
-}
-
-/*
- * Wa_22011802037:gen12: In addition to stopping the cs, we need to wait for any
- * pending MI_FORCE_WAKEUP requests that the CS has initiated to complete. The
- * pending status is indicated by bits[13:9] (masked by bits[ 29:25]) in the
- * MSG_IDLE register. There's one MSG_IDLE register per reset domain. Since we
- * are concerned only with the gt reset here, we use a logical OR of pending
- * forcewakeups from all reset domains and then wait for them to complete by
- * querying PWRGT_DOMAIN_STATUS.
- */
 static void guc_engine_reset_prepare(struct intel_engine_cs *engine)
 {
-       u32 fw_pending;
-
-       if (GRAPHICS_VER(engine->i915) != 12)
+       if (!IS_GRAPHICS_VER(engine->i915, 11, 12))
                return;
 
-       /*
-        * Wa_22011802037
-        * TODO: Occasionally trying to stop the cs times out, but does not
-        * adversely affect functionality. The timeout is set as a config
-        * parameter that defaults to 100ms. Assuming that this timeout is
-        * sufficient for any pending MI_FORCEWAKEs to complete, ignore the
-        * timeout returned here until it is root caused.
-        */
        intel_engine_stop_cs(engine);
 
-       fw_pending = __cs_pending_mi_force_wakes(engine);
-       if (fw_pending)
-               __gpm_wait_for_fw_complete(engine->gt, fw_pending);
+       /*
+        * Wa_22011802037:gen11/gen12: In addition to stopping the cs, we need
+        * to wait for any pending mi force wakeups
+        */
+       intel_engine_wait_for_pending_mi_fw(engine);
 }
 
 static void guc_reset_nop(struct intel_engine_cs *engine)
@@ -2394,6 +2338,26 @@ static int guc_context_policy_init(struct intel_context *ce, bool loop)
        return ret;
 }
 
+static u32 map_guc_prio_to_lrc_desc_prio(u8 prio)
+{
+       /*
+        * this matches the mapping we do in map_i915_prio_to_guc_prio()
+        * (e.g. prio < I915_PRIORITY_NORMAL maps to GUC_CLIENT_PRIORITY_NORMAL)
+        */
+       switch (prio) {
+       default:
+               MISSING_CASE(prio);
+               fallthrough;
+       case GUC_CLIENT_PRIORITY_KMD_NORMAL:
+               return GEN12_CTX_PRIORITY_NORMAL;
+       case GUC_CLIENT_PRIORITY_NORMAL:
+               return GEN12_CTX_PRIORITY_LOW;
+       case GUC_CLIENT_PRIORITY_HIGH:
+       case GUC_CLIENT_PRIORITY_KMD_HIGH:
+               return GEN12_CTX_PRIORITY_HIGH;
+       }
+}
+
 static void prepare_context_registration_info(struct intel_context *ce,
                                              struct guc_ctxt_registration_info *info)
 {
@@ -2420,6 +2384,8 @@ static void prepare_context_registration_info(struct intel_context *ce,
         */
        info->hwlrca_lo = lower_32_bits(ce->lrc.lrca);
        info->hwlrca_hi = upper_32_bits(ce->lrc.lrca);
+       if (engine->flags & I915_ENGINE_HAS_EU_PRIORITY)
+               info->hwlrca_lo |= map_guc_prio_to_lrc_desc_prio(ce->guc_state.prio);
        info->flags = CONTEXT_REGISTRATION_FLAG_KMD;
 
        /*
@@ -2768,7 +2734,9 @@ static void __guc_context_set_preemption_timeout(struct intel_guc *guc,
        __guc_context_set_context_policies(guc, &policy, true);
 }
 
-static void guc_context_ban(struct intel_context *ce, struct i915_request *rq)
+static void
+guc_context_revoke(struct intel_context *ce, struct i915_request *rq,
+                  unsigned int preempt_timeout_ms)
 {
        struct intel_guc *guc = ce_to_guc(ce);
        struct intel_runtime_pm *runtime_pm =
@@ -2807,7 +2775,8 @@ static void guc_context_ban(struct intel_context *ce, struct i915_request *rq)
                 * gets kicked off the HW ASAP.
                 */
                with_intel_runtime_pm(runtime_pm, wakeref) {
-                       __guc_context_set_preemption_timeout(guc, guc_id, 1);
+                       __guc_context_set_preemption_timeout(guc, guc_id,
+                                                            preempt_timeout_ms);
                        __guc_context_sched_disable(guc, ce, guc_id);
                }
        } else {
@@ -2815,7 +2784,7 @@ static void guc_context_ban(struct intel_context *ce, struct i915_request *rq)
                        with_intel_runtime_pm(runtime_pm, wakeref)
                                __guc_context_set_preemption_timeout(guc,
                                                                     ce->guc_id.id,
-                                                                    1);
+                                                                    preempt_timeout_ms);
                spin_unlock_irqrestore(&ce->guc_state.lock, flags);
        }
 }
@@ -3168,7 +3137,7 @@ static const struct intel_context_ops guc_context_ops = {
        .unpin = guc_context_unpin,
        .post_unpin = guc_context_post_unpin,
 
-       .ban = guc_context_ban,
+       .revoke = guc_context_revoke,
 
        .cancel_request = guc_context_cancel_request,
 
@@ -3417,7 +3386,7 @@ static const struct intel_context_ops virtual_guc_context_ops = {
        .unpin = guc_virtual_context_unpin,
        .post_unpin = guc_context_post_unpin,
 
-       .ban = guc_context_ban,
+       .revoke = guc_context_revoke,
 
        .cancel_request = guc_context_cancel_request,
 
@@ -3506,7 +3475,7 @@ static const struct intel_context_ops virtual_parent_context_ops = {
        .unpin = guc_parent_context_unpin,
        .post_unpin = guc_context_post_unpin,
 
-       .ban = guc_context_ban,
+       .revoke = guc_context_revoke,
 
        .cancel_request = guc_context_cancel_request,
 
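The half-ping check added to intel_guc_busyness_park() earlier in this file relies on a wraparound-safe tick comparison. The following is a minimal userspace sketch of that condition, not the driver code: time_after_ul() is a stand-in written here for the kernel's time_after() macro, and can_skip_park_sample() is a hypothetical helper name that does not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* The signed-difference trick below assumes both types have the same width. */
static_assert(sizeof(long) == sizeof(unsigned long),
	      "signed and unsigned long must have the same width");

/*
 * Userspace stand-in for the kernel's time_after(a, b): true when tick
 * count a is later than b, even if the counter wrapped in between.
 * Assumes the usual two's complement conversion to long.
 */
static bool time_after_ul(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Hypothetical helper mirroring the condition in the hunk: sampling can be
 * skipped when a previous sample exists (last_stat != 0) and no more than
 * half a ping period has elapsed since it was taken.
 */
static bool can_skip_park_sample(unsigned long now, unsigned long last_stat,
				 unsigned long ping_delay)
{
	return last_stat &&
	       !time_after_ul(now, last_stat + ping_delay / 2);
}
```

Because the comparison is done on the signed difference, the result stays correct when last_stat + ping_delay / 2 wraps past the maximum tick value, provided the two timestamps are less than half the counter range apart.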
index 556829de9c17287dc8f2ee598627fed91caf123d..3bb8838e325a4121ce478c526eb7348d4b4f62d3 100644
@@ -6,6 +6,7 @@
 #include <linux/types.h>
 
 #include "gt/intel_gt.h"
+#include "intel_guc_reg.h"
 #include "intel_huc.h"
 #include "i915_drv.h"
 
  * capabilities by adding HuC specific commands to batch buffers.
  *
  * The kernel driver is only responsible for loading the HuC firmware and
- * triggering its security authentication, which is performed by the GuC. For
- * The GuC to correctly perform the authentication, the HuC binary must be
- * loaded before the GuC one. Loading the HuC is optional; however, not using
- * the HuC might negatively impact power usage and/or performance of media
- * workloads, depending on the use-cases.
+ * triggering its security authentication, which is performed by the GuC on
+ * older platforms and by the GSC on newer ones. For the GuC to correctly
+ * perform the authentication, the HuC binary must be loaded before the GuC one.
+ * Loading the HuC is optional; however, not using the HuC might negatively
+ * impact power usage and/or performance of media workloads, depending on the
+ * use-cases.
+ * HuC must be reloaded on events that cause the WOPCM to lose its contents
+ * (S3/S4, FLR); GuC-authenticated HuC must also be reloaded on GuC/GT reset,
+ * while GSC-managed HuC will survive that.
  *
  * See https://github.com/intel/media-driver for the latest details on HuC
  * functionality.
@@ -54,11 +59,51 @@ void intel_huc_init_early(struct intel_huc *huc)
        }
 }
 
+#define HUC_LOAD_MODE_STRING(x) (x ? "GSC" : "legacy")
+static int check_huc_loading_mode(struct intel_huc *huc)
+{
+       struct intel_gt *gt = huc_to_gt(huc);
+       bool fw_needs_gsc = intel_huc_is_loaded_by_gsc(huc);
+       bool hw_uses_gsc = false;
+
+       /*
+        * The fuse for HuC load via GSC is only valid on platforms that have
+        * GuC deprivilege.
+        */
+       if (HAS_GUC_DEPRIVILEGE(gt->i915))
+               hw_uses_gsc = intel_uncore_read(gt->uncore, GUC_SHIM_CONTROL2) &
+                             GSC_LOADS_HUC;
+
+       if (fw_needs_gsc != hw_uses_gsc) {
+               drm_err(&gt->i915->drm,
+                       "mismatch between HuC FW (%s) and HW (%s) load modes\n",
+                       HUC_LOAD_MODE_STRING(fw_needs_gsc),
+                       HUC_LOAD_MODE_STRING(hw_uses_gsc));
+               return -ENOEXEC;
+       }
+
+       /* make sure we can access the GSC via the mei driver if we need it */
+       if (!(IS_ENABLED(CONFIG_INTEL_MEI_PXP) && IS_ENABLED(CONFIG_INTEL_MEI_GSC)) &&
+           fw_needs_gsc) {
+               drm_info(&gt->i915->drm,
+                        "Can't load HuC due to missing MEI modules\n");
+               return -EIO;
+       }
+
+       drm_dbg(&gt->i915->drm, "GSC loads huc=%s\n", str_yes_no(fw_needs_gsc));
+
+       return 0;
+}
+
 int intel_huc_init(struct intel_huc *huc)
 {
        struct drm_i915_private *i915 = huc_to_gt(huc)->i915;
        int err;
 
+       err = check_huc_loading_mode(huc);
+       if (err)
+               goto out;
+
        err = intel_uc_fw_init(&huc->fw);
        if (err)
                goto out;
@@ -68,7 +113,7 @@ int intel_huc_init(struct intel_huc *huc)
        return 0;
 
 out:
-       i915_probe_error(i915, "failed with %d\n", err);
+       drm_info(&i915->drm, "HuC init failed with %d\n", err);
        return err;
 }
 
@@ -96,17 +141,20 @@ int intel_huc_auth(struct intel_huc *huc)
        struct intel_guc *guc = &gt->uc.guc;
        int ret;
 
-       GEM_BUG_ON(intel_huc_is_authenticated(huc));
-
        if (!intel_uc_fw_is_loaded(&huc->fw))
                return -ENOEXEC;
 
+       /* GSC will do the auth */
+       if (intel_huc_is_loaded_by_gsc(huc))
+               return -ENODEV;
+
        ret = i915_inject_probe_error(gt->i915, -ENXIO);
        if (ret)
                goto fail;
 
-       ret = intel_guc_auth_huc(guc,
-                                intel_guc_ggtt_offset(guc, huc->fw.rsa_data));
+       GEM_BUG_ON(intel_uc_fw_is_running(&huc->fw));
+
+       ret = intel_guc_auth_huc(guc, intel_guc_ggtt_offset(guc, huc->fw.rsa_data));
        if (ret) {
                DRM_ERROR("HuC: GuC did not ack Auth request %d\n", ret);
                goto fail;
@@ -133,6 +181,18 @@ fail:
        return ret;
 }
 
+static bool huc_is_authenticated(struct intel_huc *huc)
+{
+       struct intel_gt *gt = huc_to_gt(huc);
+       intel_wakeref_t wakeref;
+       u32 status = 0;
+
+       with_intel_runtime_pm(gt->uncore->rpm, wakeref)
+               status = intel_uncore_read(gt->uncore, huc->status.reg);
+
+       return (status & huc->status.mask) == huc->status.value;
+}
+
 /**
  * intel_huc_check_status() - check HuC status
  * @huc: intel_huc structure
@@ -150,10 +210,6 @@ fail:
  */
 int intel_huc_check_status(struct intel_huc *huc)
 {
-       struct intel_gt *gt = huc_to_gt(huc);
-       intel_wakeref_t wakeref;
-       u32 status = 0;
-
        switch (__intel_uc_fw_status(&huc->fw)) {
        case INTEL_UC_FIRMWARE_NOT_SUPPORTED:
                return -ENODEV;
@@ -167,10 +223,17 @@ int intel_huc_check_status(struct intel_huc *huc)
                break;
        }
 
-       with_intel_runtime_pm(gt->uncore->rpm, wakeref)
-               status = intel_uncore_read(gt->uncore, huc->status.reg);
+       return huc_is_authenticated(huc);
+}
 
-       return (status & huc->status.mask) == huc->status.value;
+void intel_huc_update_auth_status(struct intel_huc *huc)
+{
+       if (!intel_uc_fw_is_loadable(&huc->fw))
+               return;
+
+       if (huc_is_authenticated(huc))
+               intel_uc_fw_change_status(&huc->fw,
+                                         INTEL_UC_FIRMWARE_RUNNING);
 }
 
 /**
index 73ec670800f2b55f6266370f5173dae17f546e93..d7e25b6e879eb7e8249d6fad7898256566d55b31 100644
@@ -27,6 +27,7 @@ int intel_huc_init(struct intel_huc *huc);
 void intel_huc_fini(struct intel_huc *huc);
 int intel_huc_auth(struct intel_huc *huc);
 int intel_huc_check_status(struct intel_huc *huc);
+void intel_huc_update_auth_status(struct intel_huc *huc);
 
 static inline int intel_huc_sanitize(struct intel_huc *huc)
 {
@@ -50,9 +51,9 @@ static inline bool intel_huc_is_used(struct intel_huc *huc)
        return intel_uc_fw_is_available(&huc->fw);
 }
 
-static inline bool intel_huc_is_authenticated(struct intel_huc *huc)
+static inline bool intel_huc_is_loaded_by_gsc(const struct intel_huc *huc)
 {
-       return intel_uc_fw_is_running(&huc->fw);
+       return huc->fw.loaded_via_gsc;
 }
 
 void intel_huc_load_status(struct intel_huc *huc, struct drm_printer *p);
index e5ef509c70e8944712a67431c1a21b0f084ae099..9d6ab1e016395f2feeb91f98787b2f1258d0711c 100644
@@ -8,7 +8,7 @@
 #include "i915_drv.h"
 
 /**
- * intel_huc_fw_upload() - load HuC uCode to device
+ * intel_huc_fw_upload() - load HuC uCode to device via DMA transfer
  * @huc: intel_huc structure
  *
  * Called from intel_uc_init_hw() during driver load, resume from sleep and
@@ -21,6 +21,9 @@
  */
 int intel_huc_fw_upload(struct intel_huc *huc)
 {
+       if (intel_huc_is_loaded_by_gsc(huc))
+               return -ENODEV;
+
        /* HW doesn't look at destination address for HuC, so set it to 0 */
        return intel_uc_fw_upload(&huc->fw, 0, HUC_UKERNEL);
 }
index e8f099360e010ff1ebd40492de6674a3f57966a1..f2e7c82985efd00213c096169b47714df19abdf6 100644
@@ -45,6 +45,10 @@ static void uc_expand_default_options(struct intel_uc *uc)
 
        /* Default: enable HuC authentication and GuC submission */
        i915->params.enable_guc = ENABLE_GUC_LOAD_HUC | ENABLE_GUC_SUBMISSION;
+
+       /* XEHPSDV and PVC do not use HuC */
+       if (IS_XEHPSDV(i915) || IS_PONTEVECCHIO(i915))
+               i915->params.enable_guc &= ~ENABLE_GUC_LOAD_HUC;
 }
 
 /* Reset GuC providing us with fresh state for both GuC and HuC.
@@ -323,17 +327,10 @@ static int __uc_init(struct intel_uc *uc)
        if (ret)
                return ret;
 
-       if (intel_uc_uses_huc(uc)) {
-               ret = intel_huc_init(huc);
-               if (ret)
-                       goto out_guc;
-       }
+       if (intel_uc_uses_huc(uc))
+               intel_huc_init(huc);
 
        return 0;
-
-out_guc:
-       intel_guc_fini(guc);
-       return ret;
 }
 
 static void __uc_fini(struct intel_uc *uc)
@@ -509,7 +506,16 @@ static int __uc_init_hw(struct intel_uc *uc)
        if (ret)
                goto err_log_capture;
 
-       intel_huc_auth(huc);
+       /*
+        * GSC-loaded HuC is authenticated by the GSC, so we don't need to
+        * trigger the auth here. However, given that the HuC loaded this way
+        * survives GT reset, we still need to update our SW bookkeeping to make
+        * sure it reflects the correct HW status.
+        */
+       if (intel_huc_is_loaded_by_gsc(huc))
+               intel_huc_update_auth_status(huc);
+       else
+               intel_huc_auth(huc);
 
        if (intel_uc_uses_guc_submission(uc))
                intel_guc_submission_enable(guc);
index d078f884b5e3263258836d718843ca452255ecc9..c06e83872c34d07d79b1d643a5ea7dc91777c251 100644
@@ -156,7 +156,7 @@ __uc_fw_auto_select(struct drm_i915_private *i915, struct intel_uc_fw *uc_fw)
                [INTEL_UC_FW_TYPE_GUC] = { blobs_guc, ARRAY_SIZE(blobs_guc) },
                [INTEL_UC_FW_TYPE_HUC] = { blobs_huc, ARRAY_SIZE(blobs_huc) },
        };
-       static const struct uc_fw_platform_requirement *fw_blobs;
+       const struct uc_fw_platform_requirement *fw_blobs;
        enum intel_platform p = INTEL_INFO(i915)->platform;
        u32 fw_count;
        u8 rev = INTEL_REVID(i915);
@@ -301,45 +301,31 @@ static void __force_fw_fetch_failures(struct intel_uc_fw *uc_fw, int e)
        }
 }
 
-/**
- * intel_uc_fw_fetch - fetch uC firmware
- * @uc_fw: uC firmware
- *
- * Fetch uC firmware into GEM obj.
- *
- * Return: 0 on success, a negative errno code on failure.
- */
-int intel_uc_fw_fetch(struct intel_uc_fw *uc_fw)
+static int check_gsc_manifest(const struct firmware *fw,
+                             struct intel_uc_fw *uc_fw)
 {
-       struct drm_i915_private *i915 = __uc_fw_to_gt(uc_fw)->i915;
-       struct device *dev = i915->drm.dev;
-       struct drm_i915_gem_object *obj;
-       const struct firmware *fw = NULL;
-       struct uc_css_header *css;
-       size_t size;
-       int err;
+       u32 *dw = (u32 *)fw->data;
+       u32 version = dw[HUC_GSC_VERSION_DW];
 
-       GEM_BUG_ON(!i915->wopcm.size);
-       GEM_BUG_ON(!intel_uc_fw_is_enabled(uc_fw));
-
-       err = i915_inject_probe_error(i915, -ENXIO);
-       if (err)
-               goto fail;
+       uc_fw->major_ver_found = FIELD_GET(HUC_GSC_MAJOR_VER_MASK, version);
+       uc_fw->minor_ver_found = FIELD_GET(HUC_GSC_MINOR_VER_MASK, version);
 
-       __force_fw_fetch_failures(uc_fw, -EINVAL);
-       __force_fw_fetch_failures(uc_fw, -ESTALE);
+       return 0;
+}
 
-       err = request_firmware(&fw, uc_fw->path, dev);
-       if (err)
-               goto fail;
+static int check_ccs_header(struct drm_i915_private *i915,
+                           const struct firmware *fw,
+                           struct intel_uc_fw *uc_fw)
+{
+       struct uc_css_header *css;
+       size_t size;
 
        /* Check the size of the blob before examining buffer contents */
        if (unlikely(fw->size < sizeof(struct uc_css_header))) {
                drm_warn(&i915->drm, "%s firmware %s: invalid size: %zu < %zu\n",
                         intel_uc_fw_type_repr(uc_fw->type), uc_fw->path,
                         fw->size, sizeof(struct uc_css_header));
-               err = -ENODATA;
-               goto fail;
+               return -ENODATA;
        }
 
        css = (struct uc_css_header *)fw->data;
@@ -352,8 +338,7 @@ int intel_uc_fw_fetch(struct intel_uc_fw *uc_fw)
                         "%s firmware %s: unexpected header size: %zu != %zu\n",
                         intel_uc_fw_type_repr(uc_fw->type), uc_fw->path,
                         fw->size, sizeof(struct uc_css_header));
-               err = -EPROTO;
-               goto fail;
+               return -EPROTO;
        }
 
        /* uCode size must calculated from other sizes */
@@ -368,8 +353,7 @@ int intel_uc_fw_fetch(struct intel_uc_fw *uc_fw)
                drm_warn(&i915->drm, "%s firmware %s: invalid size: %zu < %zu\n",
                         intel_uc_fw_type_repr(uc_fw->type), uc_fw->path,
                         fw->size, size);
-               err = -ENOEXEC;
-               goto fail;
+               return -ENOEXEC;
        }
 
        /* Sanity check whether this fw is not larger than whole WOPCM memory */
@@ -378,8 +362,7 @@ int intel_uc_fw_fetch(struct intel_uc_fw *uc_fw)
                drm_warn(&i915->drm, "%s firmware %s: invalid size: %zu > %zu\n",
                         intel_uc_fw_type_repr(uc_fw->type), uc_fw->path,
                         size, (size_t)i915->wopcm.size);
-               err = -E2BIG;
-               goto fail;
+               return -E2BIG;
        }
 
        /* Get version numbers from the CSS header */
@@ -388,6 +371,49 @@ int intel_uc_fw_fetch(struct intel_uc_fw *uc_fw)
        uc_fw->minor_ver_found = FIELD_GET(CSS_SW_VERSION_UC_MINOR,
                                           css->sw_version);
 
+       if (uc_fw->type == INTEL_UC_FW_TYPE_GUC)
+               uc_fw->private_data_size = css->private_data_size;
+
+       return 0;
+}
+
+/**
+ * intel_uc_fw_fetch - fetch uC firmware
+ * @uc_fw: uC firmware
+ *
+ * Fetch uC firmware into GEM obj.
+ *
+ * Return: 0 on success, a negative errno code on failure.
+ */
+int intel_uc_fw_fetch(struct intel_uc_fw *uc_fw)
+{
+       struct drm_i915_private *i915 = __uc_fw_to_gt(uc_fw)->i915;
+       struct device *dev = i915->drm.dev;
+       struct drm_i915_gem_object *obj;
+       const struct firmware *fw = NULL;
+       int err;
+
+       GEM_BUG_ON(!i915->wopcm.size);
+       GEM_BUG_ON(!intel_uc_fw_is_enabled(uc_fw));
+
+       err = i915_inject_probe_error(i915, -ENXIO);
+       if (err)
+               goto fail;
+
+       __force_fw_fetch_failures(uc_fw, -EINVAL);
+       __force_fw_fetch_failures(uc_fw, -ESTALE);
+
+       err = request_firmware(&fw, uc_fw->path, dev);
+       if (err)
+               goto fail;
+
+       if (uc_fw->loaded_via_gsc)
+               err = check_gsc_manifest(fw, uc_fw);
+       else
+               err = check_ccs_header(i915, fw, uc_fw);
+       if (err)
+               goto fail;
+
        if (uc_fw->major_ver_found != uc_fw->major_ver_wanted ||
            uc_fw->minor_ver_found < uc_fw->minor_ver_wanted) {
                drm_notice(&i915->drm, "%s firmware %s: unexpected version: %u.%u != %u.%u\n",
@@ -400,9 +426,6 @@ int intel_uc_fw_fetch(struct intel_uc_fw *uc_fw)
                }
        }
 
-       if (uc_fw->type == INTEL_UC_FW_TYPE_GUC)
-               uc_fw->private_data_size = css->private_data_size;
-
        if (HAS_LMEM(i915)) {
                obj = i915_gem_object_create_lmem_from_data(i915, fw->data, fw->size);
                if (!IS_ERR(obj))
@@ -470,7 +493,10 @@ static void uc_fw_bind_ggtt(struct intel_uc_fw *uc_fw)
        if (i915_gem_object_is_lmem(obj))
                pte_flags |= PTE_LM;
 
-       ggtt->vm.insert_entries(&ggtt->vm, dummy, I915_CACHE_NONE, pte_flags);
+       if (ggtt->vm.raw_insert_entries)
+               ggtt->vm.raw_insert_entries(&ggtt->vm, dummy, I915_CACHE_NONE, pte_flags);
+       else
+               ggtt->vm.insert_entries(&ggtt->vm, dummy, I915_CACHE_NONE, pte_flags);
 }
 
 static void uc_fw_unbind_ggtt(struct intel_uc_fw *uc_fw)
index 3229018877d3dbc63fa52e395ca647a606dd2a78..4f169035f5041b64b46cc076d2a1dabc930d8019 100644
@@ -102,6 +102,8 @@ struct intel_uc_fw {
        u32 ucode_size;
 
        u32 private_data_size;
+
+       bool loaded_via_gsc;
 };
 
 #ifdef CONFIG_DRM_I915_DEBUG_GUC
index e41ffc7a7fbcb0aceedd079819a3f27405748733..b05e0e35b734c434e4960e52c75f21c4fc3f5a62 100644
  * 3. Length info of each component can be found in header, in dwords.
  * 4. Modulus and exponent key are not required by driver. They may not appear
  *    in fw. So driver will load a truncated firmware in this case.
+ *
+ * Starting from DG2, the HuC is loaded by the GSC instead of i915. The GSC
+ * firmware performs all the required integrity checks, we just need to check
+ * the version. Note that the header for GSC-managed blobs is different from the
+ * CSS used for dma-loaded firmwares.
  */
 
 struct uc_css_header {
@@ -78,4 +83,8 @@ struct uc_css_header {
 } __packed;
 static_assert(sizeof(struct uc_css_header) == 128);
 
+#define HUC_GSC_VERSION_DW             44
+#define   HUC_GSC_MAJOR_VER_MASK       (0xFF << 0)
+#define   HUC_GSC_MINOR_VER_MASK       (0xFF << 16)
+
 #endif /* _INTEL_UC_FW_ABI_H */
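The HUC_GSC_* definitions above describe where a GSC-managed HuC blob keeps its version: dword 44, with the major version in bits 7:0 and the minor version in bits 23:16. The following is a minimal userspace sketch of that decoding, not the driver code: field_get() is a stand-in written here for the kernel's FIELD_GET(), the huc_gsc_version struct and huc_gsc_read_version() are hypothetical names, and the length check is an addition of this sketch rather than something the patch performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the hunk above (unsigned suffix added for this sketch). */
#define HUC_GSC_VERSION_DW      44
#define HUC_GSC_MAJOR_VER_MASK  (0xFFu << 0)
#define HUC_GSC_MINOR_VER_MASK  (0xFFu << 16)

/*
 * Userspace stand-in for the kernel's FIELD_GET(): mask the value, then
 * shift it down by the position of the mask's lowest set bit.
 */
static uint32_t field_get(uint32_t mask, uint32_t value)
{
	unsigned int shift = 0;

	if (!mask)
		return 0;

	while (!((mask >> shift) & 1u))
		shift++;

	return (value & mask) >> shift;
}

/* Hypothetical container for the decoded version. */
struct huc_gsc_version {
	uint32_t major;
	uint32_t minor;
};

/*
 * Hypothetical helper: decode the version from a blob viewed as an array
 * of n_dw dwords. Returns 0 on success, -1 if the blob is too short to
 * contain the version dword.
 */
static int huc_gsc_read_version(const uint32_t *dw, size_t n_dw,
				struct huc_gsc_version *out)
{
	assert(dw != NULL && out != NULL);

	if (n_dw <= HUC_GSC_VERSION_DW)
		return -1;

	out->major = field_get(HUC_GSC_MAJOR_VER_MASK, dw[HUC_GSC_VERSION_DW]);
	out->minor = field_get(HUC_GSC_MINOR_VER_MASK, dw[HUC_GSC_VERSION_DW]);
	return 0;
}
```

With a version dword of (3 << 16) | 7, the helper yields major 7 and minor 3, which is the pair the fetch path then compares against the wanted version.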
index b9eb75a2b4002a4463e3d6584a2f8fe2797d8635..0ba2a3455d99327f7a551b800e4c84e19dfe880b 100644
@@ -428,7 +428,7 @@ struct cmd_info {
 #define R_VECS BIT(VECS0)
 #define R_ALL (R_RCS | R_VCS | R_BCS | R_VECS)
        /* rings that support this cmd: BLT/RCS/VCS/VECS */
-       u16 rings;
+       intel_engine_mask_t rings;
 
        /* devices that support this cmd: SNB/IVB/HSW/... */
        u16 devices;
index b47746152d97fc2cbec410beaebfe9e69fdf5dcf..0e224761d0ed8cfa261cf64f6dd5af13999533b0 100644
 #include "intel_region_ttm.h"
 #include "vlv_suspend.h"
 
+/* Intel Rapid Start Technology ACPI device name */
+static const char irst_name[] = "INT3392";
+
 static const struct drm_driver i915_drm_driver;
 
 static int i915_get_bridge_dev(struct drm_i915_private *dev_priv)
@@ -520,6 +523,22 @@ mask_err:
        return ret;
 }
 
+static int i915_pcode_init(struct drm_i915_private *i915)
+{
+       struct intel_gt *gt;
+       int id, ret;
+
+       for_each_gt(gt, i915, id) {
+               ret = intel_pcode_init(gt->uncore);
+               if (ret) {
+                       drm_err(&gt->i915->drm, "gt%d: intel_pcode_init failed %d\n", id, ret);
+                       return ret;
+               }
+       }
+
+       return 0;
+}
+
 /**
  * i915_driver_hw_probe - setup state requiring device access
  * @dev_priv: device private
@@ -629,7 +648,7 @@ static int i915_driver_hw_probe(struct drm_i915_private *dev_priv)
 
        intel_opregion_setup(dev_priv);
 
-       ret = intel_pcode_init(&dev_priv->uncore);
+       ret = i915_pcode_init(dev_priv);
        if (ret)
                goto err_msi;
 
@@ -1251,7 +1270,7 @@ static int i915_drm_resume(struct drm_device *dev)
 
        disable_rpm_wakeref_asserts(&dev_priv->runtime_pm);
 
-       ret = intel_pcode_init(&dev_priv->uncore);
+       ret = i915_pcode_init(dev_priv);
        if (ret)
                return ret;
 
@@ -1425,6 +1444,8 @@ static int i915_pm_suspend(struct device *kdev)
                return -ENODEV;
        }
 
+       i915_ggtt_mark_pte_lost(i915, false);
+
        if (i915->drm.switch_power_state == DRM_SWITCH_POWER_OFF)
                return 0;
 
@@ -1477,6 +1498,14 @@ static int i915_pm_resume(struct device *kdev)
        if (i915->drm.switch_power_state == DRM_SWITCH_POWER_OFF)
                return 0;
 
+       /*
+        * If IRST is enabled, or if we can't detect whether it's enabled,
+        * then we must assume we lost the GGTT page table entries, since
+        * they are not retained if IRST decided to enter S4.
+        */
+       if (!IS_ENABLED(CONFIG_ACPI) || acpi_dev_present(irst_name, NULL, -1))
+               i915_ggtt_mark_pte_lost(i915, true);
+
        return i915_drm_resume(&i915->drm);
 }
 
@@ -1536,6 +1565,9 @@ static int i915_pm_restore_early(struct device *kdev)
 
 static int i915_pm_restore(struct device *kdev)
 {
+       struct drm_i915_private *i915 = kdev_to_i915(kdev);
+
+       i915_ggtt_mark_pte_lost(i915, true);
        return i915_pm_resume(kdev);
 }
 
index 18d38cb59923d01b760cd5ae53064bf82b93c6d2..b09d1d3865740378006cc497da2dd3b3bddf60ff 100644
@@ -116,8 +116,9 @@ show_client_class(struct seq_file *m,
                total += busy_add(ctx, class);
        rcu_read_unlock();
 
-       seq_printf(m, "drm-engine-%s:\t%llu ns\n",
-                  uabi_class_names[class], total);
+       if (capacity)
+               seq_printf(m, "drm-engine-%s:\t%llu ns\n",
+                          uabi_class_names[class], total);
 
        if (capacity > 1)
                seq_printf(m, "drm-engine-capacity-%s:\t%u\n",
index f796c5e8e06066889cf71a5d101f8f8bb220cc86..69496af996d9e94ac1903bd834563d4ef80a2151 100644
@@ -11,7 +11,7 @@
 #include <linux/spinlock.h>
 #include <linux/xarray.h>
 
-#include "gt/intel_engine_types.h"
+#include <uapi/drm/i915_drm.h>
 
 #define I915_LAST_UABI_ENGINE_CLASS I915_ENGINE_CLASS_COMPUTE
 
index 4d57609d619a516b82f6d3eb19da9e6b5800fd20..c22f29c3faa0e2e665ad6131be7504e8447ec7f4 100644
@@ -879,6 +879,7 @@ static inline struct intel_gt *to_gt(struct drm_i915_private *i915)
 #define INTEL_DISPLAY_STEP(__i915) (RUNTIME_INFO(__i915)->step.display_step)
 #define INTEL_GRAPHICS_STEP(__i915) (RUNTIME_INFO(__i915)->step.graphics_step)
 #define INTEL_MEDIA_STEP(__i915) (RUNTIME_INFO(__i915)->step.media_step)
+#define INTEL_BASEDIE_STEP(__i915) (RUNTIME_INFO(__i915)->step.basedie_step)
 
 #define IS_DISPLAY_STEP(__i915, since, until) \
        (drm_WARN_ON(&(__i915)->drm, INTEL_DISPLAY_STEP(__i915) == STEP_NONE), \
@@ -892,6 +893,10 @@ static inline struct intel_gt *to_gt(struct drm_i915_private *i915)
        (drm_WARN_ON(&(__i915)->drm, INTEL_MEDIA_STEP(__i915) == STEP_NONE), \
         INTEL_MEDIA_STEP(__i915) >= (since) && INTEL_MEDIA_STEP(__i915) < (until))
 
+#define IS_BASEDIE_STEP(__i915, since, until) \
+       (drm_WARN_ON(&(__i915)->drm, INTEL_BASEDIE_STEP(__i915) == STEP_NONE), \
+        INTEL_BASEDIE_STEP(__i915) >= (since) && INTEL_BASEDIE_STEP(__i915) < (until))
+
 static __always_inline unsigned int
 __platform_mask_index(const struct intel_runtime_info *info,
                      enum intel_platform p)
@@ -1144,6 +1149,14 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
        (IS_DG2(__i915) && \
         IS_DISPLAY_STEP(__i915, since, until))
 
+#define IS_PVC_BD_STEP(__i915, since, until) \
+       (IS_PONTEVECCHIO(__i915) && \
+        IS_BASEDIE_STEP(__i915, since, until))
+
+#define IS_PVC_CT_STEP(__i915, since, until) \
+       (IS_PONTEVECCHIO(__i915) && \
+        IS_GRAPHICS_STEP(__i915, since, until))
+
 #define IS_LP(dev_priv)                (INTEL_INFO(dev_priv)->is_lp)
 #define IS_GEN9_LP(dev_priv)   (GRAPHICS_VER(dev_priv) == 9 && IS_LP(dev_priv))
 #define IS_GEN9_BC(dev_priv)   (GRAPHICS_VER(dev_priv) == 9 && !IS_LP(dev_priv))
@@ -1159,6 +1172,8 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
 })
 #define RCS_MASK(gt) \
        ENGINE_INSTANCES_MASK(gt, RCS0, I915_MAX_RCS)
+#define BCS_MASK(gt) \
+       ENGINE_INSTANCES_MASK(gt, BCS0, I915_MAX_BCS)
 #define VDBOX_MASK(gt) \
        ENGINE_INSTANCES_MASK(gt, VCS0, I915_MAX_VCS)
 #define VEBOX_MASK(gt) \
@@ -1267,9 +1282,6 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
 #define HAS_RUNTIME_PM(dev_priv) (INTEL_INFO(dev_priv)->has_runtime_pm)
 #define HAS_64BIT_RELOC(dev_priv) (INTEL_INFO(dev_priv)->has_64bit_reloc)
 
-#define HAS_MSLICES(dev_priv) \
-       (INTEL_INFO(dev_priv)->has_mslices)
-
 /*
  * Set this flag, when platform requires 64K GTT page sizes or larger for
  * device local memory access.
@@ -1308,6 +1320,8 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
 
 #define HAS_LSPCON(dev_priv) (IS_DISPLAY_VER(dev_priv, 9, 10))
 
+#define HAS_L3_CCS_READ(i915) (INTEL_INFO(i915)->has_l3_ccs_read)
+
 /* DPF == dynamic parity feature */
 #define HAS_L3_DPF(dev_priv) (INTEL_INFO(dev_priv)->has_l3_dpf)
 #define NUM_L3_SLICES(dev_priv) (IS_HSW_GT3(dev_priv) ? \
@@ -1341,6 +1355,10 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
 
 #define HAS_MBUS_JOINING(i915) (IS_ALDERLAKE_P(i915))
 
+#define HAS_3D_PIPELINE(i915)  (INTEL_INFO(i915)->has_3d_pipeline)
+
+#define HAS_ONE_EU_PER_FUSE_BIT(i915)  (INTEL_INFO(i915)->has_one_eu_per_fuse_bit)
+
 /* i915_gem.c */
 void i915_gem_init_early(struct drm_i915_private *dev_priv);
 void i915_gem_cleanup_early(struct drm_i915_private *dev_priv);
index c12a0adefda539498151f3f1573b6508fae9618b..6fd15b39570c108a5c354f3f95e903852e525ab9 100644
@@ -148,14 +148,21 @@ int i915_getparam_ioctl(struct drm_device *dev, void *data,
                value = intel_engines_has_context_isolation(i915);
                break;
        case I915_PARAM_SLICE_MASK:
+               /* Not supported from Xe_HP onward; use topology queries */
+               if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50))
+                       return -EINVAL;
+
                value = sseu->slice_mask;
                if (!value)
                        return -ENODEV;
                break;
        case I915_PARAM_SUBSLICE_MASK:
+               /* Not supported from Xe_HP onward; use topology queries */
+               if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 50))
+                       return -EINVAL;
+
                /* Only copy bits from the first slice */
-               memcpy(&value, sseu->subslice_mask,
-                      min(sseu->ss_stride, (u8)sizeof(value)));
+               value = intel_sseu_get_hsw_subslices(sseu, 0);
                if (!value)
                        return -ENODEV;
                break;
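
The two cases above reject the query on Xe_HP and later before looking at the mask, and only then fall back to -ENODEV for an empty mask. The ordering can be sketched outside the kernel as follows. This assumes the full graphics version is packed as `(ver << 8) | rel`, which is how `IP_VER()` is understood to work in i915; `SKETCH_IP_VER` and `sketch_slice_mask_param` are hypothetical names, not driver API.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed packing of version and release into one comparable integer. */
#define SKETCH_IP_VER(ver, rel) (((ver) << 8) | (rel))

/*
 * Mirror the control flow of the SLICE_MASK case: refuse on 12.50+,
 * report -ENODEV for an empty mask, otherwise return the mask.
 */
static int sketch_slice_mask_param(uint32_t graphics_ver_full,
				   uint8_t slice_mask, int *value)
{
	if (graphics_ver_full >= SKETCH_IP_VER(12, 50))
		return -EINVAL;

	if (!slice_mask)
		return -ENODEV;

	*value = slice_mask;
	return 0;
}
```

Userspace that sees -EINVAL here is expected, per the comment in the hunk, to use the topology queries instead.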
index 0512c66fa4f3f1fd48dcd4e60c454deb51fa6ea2..f9b1969ed7ed28727912cfa283297584a5f332f0 100644
@@ -581,6 +581,15 @@ static void error_print_engine(struct drm_i915_error_state_buf *m,
                err_printf(m, "  RC PSMI: 0x%08x\n", ee->rc_psmi);
                err_printf(m, "  FAULT_REG: 0x%08x\n", ee->fault_reg);
        }
+       if (GRAPHICS_VER(m->i915) >= 11) {
+               err_printf(m, "  NOPID: 0x%08x\n", ee->nopid);
+               err_printf(m, "  EXCC: 0x%08x\n", ee->excc);
+               err_printf(m, "  CMD_CCTL: 0x%08x\n", ee->cmd_cctl);
+               err_printf(m, "  CSCMDOP: 0x%08x\n", ee->cscmdop);
+               err_printf(m, "  CTX_SR_CTL: 0x%08x\n", ee->ctx_sr_ctl);
+               err_printf(m, "  DMA_FADDR_HI: 0x%08x\n", ee->dma_faddr_hi);
+               err_printf(m, "  DMA_FADDR_LO: 0x%08x\n", ee->dma_faddr_lo);
+       }
        if (HAS_PPGTT(m->i915)) {
                err_printf(m, "  GFX_MODE: 0x%08x\n", ee->vm_info.gfx_mode);
 
@@ -1095,8 +1104,12 @@ i915_vma_coredump_create(const struct intel_gt *gt,
 
                for_each_sgt_daddr(dma, iter, vma_res->bi.pages) {
                        mutex_lock(&ggtt->error_mutex);
-                       ggtt->vm.insert_page(&ggtt->vm, dma, slot,
-                                            I915_CACHE_NONE, 0);
+                       if (ggtt->vm.raw_insert_page)
+                               ggtt->vm.raw_insert_page(&ggtt->vm, dma, slot,
+                                                        I915_CACHE_NONE, 0);
+                       else
+                               ggtt->vm.insert_page(&ggtt->vm, dma, slot,
+                                                    I915_CACHE_NONE, 0);
                        mb();
 
                        s = io_mapping_map_wc(&ggtt->iomap, slot, PAGE_SIZE);
@@ -1224,6 +1237,16 @@ static void engine_record_registers(struct intel_engine_coredump *ee)
                ee->ipehr = ENGINE_READ(engine, IPEHR);
        }
 
+       if (GRAPHICS_VER(i915) >= 11) {
+               ee->cmd_cctl = ENGINE_READ(engine, RING_CMD_CCTL);
+               ee->cscmdop = ENGINE_READ(engine, RING_CSCMDOP);
+               ee->ctx_sr_ctl = ENGINE_READ(engine, RING_CTX_SR_CTL);
+               ee->dma_faddr_hi = ENGINE_READ(engine, RING_DMA_FADD_UDW);
+               ee->dma_faddr_lo = ENGINE_READ(engine, RING_DMA_FADD);
+               ee->nopid = ENGINE_READ(engine, RING_NOPID);
+               ee->excc = ENGINE_READ(engine, RING_EXCC);
+       }
+
        intel_engine_get_instdone(engine, &ee->instdone);
 
        ee->instpm = ENGINE_READ(engine, RING_INSTPM);
index a611abacd9c2c089e3d1278e6d4eac24a6a148f9..55a143b92d10ec0fe58cf596bed5c17695aba4bf 100644
@@ -84,6 +84,13 @@ struct intel_engine_coredump {
        u32 fault_reg;
        u64 faddr;
        u32 rc_psmi; /* sleep state */
+       u32 nopid;
+       u32 excc;
+       u32 cmd_cctl;
+       u32 cscmdop;
+       u32 ctx_sr_ctl;
+       u32 dma_faddr_hi;
+       u32 dma_faddr_lo;
        struct intel_instdone instdone;
 
        /* GuC matched capture-lists info */
index 47b2a2631c5bd80756aad26a3ffdf33c4848b753..d6d875b2d3799dec9183e35a8759496cb02a9b94 100644
        .display.overlay_needs_physical = 1, \
        .display.has_gmch = 1, \
        .gpu_reset_clobbers_display = true, \
+       .has_3d_pipeline = 1, \
        .hws_needs_physical = 1, \
        .unfenced_needs_alignment = 1, \
        .platform_engine_mask = BIT(RCS0), \
        .display.has_overlay = 1, \
        .display.overlay_needs_physical = 1, \
        .display.has_gmch = 1, \
+       .has_3d_pipeline = 1, \
        .gpu_reset_clobbers_display = true, \
        .hws_needs_physical = 1, \
        .unfenced_needs_alignment = 1, \
@@ -232,6 +234,7 @@ static const struct intel_device_info i865g_info = {
        .display.has_gmch = 1, \
        .gpu_reset_clobbers_display = true, \
        .platform_engine_mask = BIT(RCS0), \
+       .has_3d_pipeline = 1, \
        .has_snoop = true, \
        .has_coherent_ggtt = true, \
        .dma_mask_size = 32, \
@@ -323,6 +326,7 @@ static const struct intel_device_info pnv_m_info = {
        .display.has_gmch = 1, \
        .gpu_reset_clobbers_display = true, \
        .platform_engine_mask = BIT(RCS0), \
+       .has_3d_pipeline = 1, \
        .has_snoop = true, \
        .has_coherent_ggtt = true, \
        .dma_mask_size = 36, \
@@ -374,6 +378,7 @@ static const struct intel_device_info gm45_info = {
        .display.cpu_transcoder_mask = BIT(TRANSCODER_A) | BIT(TRANSCODER_B), \
        .display.has_hotplug = 1, \
        .platform_engine_mask = BIT(RCS0) | BIT(VCS0), \
+       .has_3d_pipeline = 1, \
        .has_snoop = true, \
        .has_coherent_ggtt = true, \
        /* ilk does support rc6, but we do not implement [power] contexts */ \
@@ -405,6 +410,7 @@ static const struct intel_device_info ilk_m_info = {
        .display.has_hotplug = 1, \
        .display.fbc_mask = BIT(INTEL_FBC_A), \
        .platform_engine_mask = BIT(RCS0) | BIT(VCS0) | BIT(BCS0), \
+       .has_3d_pipeline = 1, \
        .has_coherent_ggtt = true, \
        .has_llc = 1, \
        .has_rc6 = 1, \
@@ -456,6 +462,7 @@ static const struct intel_device_info snb_m_gt2_info = {
        .display.has_hotplug = 1, \
        .display.fbc_mask = BIT(INTEL_FBC_A), \
        .platform_engine_mask = BIT(RCS0) | BIT(VCS0) | BIT(BCS0), \
+       .has_3d_pipeline = 1, \
        .has_coherent_ggtt = true, \
        .has_llc = 1, \
        .has_rc6 = 1, \
@@ -692,6 +699,7 @@ static const struct intel_device_info skl_gt4_info = {
        .display.cpu_transcoder_mask = BIT(TRANSCODER_A) | BIT(TRANSCODER_B) | \
                BIT(TRANSCODER_C) | BIT(TRANSCODER_EDP) | \
                BIT(TRANSCODER_DSI_A) | BIT(TRANSCODER_DSI_C), \
+       .has_3d_pipeline = 1, \
        .has_64bit_reloc = 1, \
        .display.has_ddi = 1, \
        .display.has_fpga_dbg = 1, \
@@ -1005,6 +1013,7 @@ static const struct intel_device_info adl_p_info = {
        .graphics.rel = 50, \
        XE_HP_PAGE_SIZES, \
        .dma_mask_size = 46, \
+       .has_3d_pipeline = 1, \
        .has_64bit_reloc = 1, \
        .has_flat_ccs = 1, \
        .has_global_mocs = 1, \
@@ -1012,7 +1021,7 @@ static const struct intel_device_info adl_p_info = {
        .has_llc = 1, \
        .has_logical_ring_contexts = 1, \
        .has_logical_ring_elsq = 1, \
-       .has_mslices = 1, \
+       .has_mslice_steering = 1, \
        .has_rc6 = 1, \
        .has_reset_engine = 1, \
        .has_rps = 1, \
@@ -1079,7 +1088,12 @@ static const struct intel_device_info ats_m_info = {
 
 #define XE_HPC_FEATURES \
        XE_HP_FEATURES, \
-       .dma_mask_size = 52
+       .dma_mask_size = 52, \
+       .has_3d_pipeline = 0, \
+       .has_guc_deprivilege = 1, \
+       .has_l3_ccs_read = 1, \
+       .has_mslice_steering = 0, \
+       .has_one_eu_per_fuse_bit = 1
 
 __maybe_unused
 static const struct intel_device_info pvc_info = {
index 7584cec53d5da18ae3e8111368fb4fce8419f011..0094f67c63f2bfc3c32990f9dd9cd0ba1af8ea51 100644
@@ -31,10 +31,12 @@ static int copy_query_item(void *query_hdr, size_t query_sz,
 
 static int fill_topology_info(const struct sseu_dev_info *sseu,
                              struct drm_i915_query_item *query_item,
-                             const u8 *subslice_mask)
+                             intel_sseu_ss_mask_t subslice_mask)
 {
        struct drm_i915_query_topology_info topo;
        u32 slice_length, subslice_length, eu_length, total_length;
+       int ss_stride = GEN_SSEU_STRIDE(sseu->max_subslices);
+       int eu_stride = GEN_SSEU_STRIDE(sseu->max_eus_per_subslice);
        int ret;
 
        BUILD_BUG_ON(sizeof(u8) != sizeof(sseu->slice_mask));
@@ -43,8 +45,8 @@ static int fill_topology_info(const struct sseu_dev_info *sseu,
                return -ENODEV;
 
        slice_length = sizeof(sseu->slice_mask);
-       subslice_length = sseu->max_slices * sseu->ss_stride;
-       eu_length = sseu->max_slices * sseu->max_subslices * sseu->eu_stride;
+       subslice_length = sseu->max_slices * ss_stride;
+       eu_length = sseu->max_slices * sseu->max_subslices * eu_stride;
        total_length = sizeof(topo) + slice_length + subslice_length +
                       eu_length;
 
@@ -59,9 +61,9 @@ static int fill_topology_info(const struct sseu_dev_info *sseu,
        topo.max_eus_per_subslice = sseu->max_eus_per_subslice;
 
        topo.subslice_offset = slice_length;
-       topo.subslice_stride = sseu->ss_stride;
+       topo.subslice_stride = ss_stride;
        topo.eu_offset = slice_length + subslice_length;
-       topo.eu_stride = sseu->eu_stride;
+       topo.eu_stride = eu_stride;
 
        if (copy_to_user(u64_to_user_ptr(query_item->data_ptr),
                         &topo, sizeof(topo)))
@@ -71,15 +73,15 @@ static int fill_topology_info(const struct sseu_dev_info *sseu,
                         &sseu->slice_mask, slice_length))
                return -EFAULT;
 
-       if (copy_to_user(u64_to_user_ptr(query_item->data_ptr +
-                                        sizeof(topo) + slice_length),
-                        subslice_mask, subslice_length))
+       if (intel_sseu_copy_ssmask_to_user(u64_to_user_ptr(query_item->data_ptr +
+                                                          sizeof(topo) + slice_length),
+                                          sseu))
                return -EFAULT;
 
-       if (copy_to_user(u64_to_user_ptr(query_item->data_ptr +
-                                        sizeof(topo) +
-                                        slice_length + subslice_length),
-                        sseu->eu_mask, eu_length))
+       if (intel_sseu_copy_eumask_to_user(u64_to_user_ptr(query_item->data_ptr +
+                                                          sizeof(topo) +
+                                                          slice_length + subslice_length),
+                                          sseu))
                return -EFAULT;
 
        return total_length;
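
With this change the strides are computed locally from the maximum counts rather than read from the sseu structure. The length arithmetic can be sketched as below, assuming `GEN_SSEU_STRIDE()` rounds a bit count up to whole bytes (that definition is not visible in this diff, so treat it as an assumption). All `sketch_*` names are hypothetical.

```c
#include <stdint.h>

/* Assumed behaviour of GEN_SSEU_STRIDE(): bytes needed to hold N bits. */
static uint32_t sketch_stride(uint32_t max_entries)
{
	return (max_entries + 7) / 8;
}

struct sketch_topo_lengths {
	uint32_t slice;    /* bytes for the slice mask */
	uint32_t subslice; /* bytes for all per-slice subslice masks */
	uint32_t eu;       /* bytes for all per-subslice EU masks */
};

/* Same formulas as fill_topology_info(), with the slice mask being one u8. */
static struct sketch_topo_lengths
sketch_compute_lengths(uint32_t max_slices, uint32_t max_subslices,
		       uint32_t max_eus_per_subslice)
{
	struct sketch_topo_lengths len;

	len.slice = sizeof(uint8_t);
	len.subslice = max_slices * sketch_stride(max_subslices);
	len.eu = max_slices * max_subslices *
		 sketch_stride(max_eus_per_subslice);
	return len;
}
```

The subslice data then starts at offset `slice` and the EU data at `slice + subslice`, matching `topo.subslice_offset` and `topo.eu_offset` in the hunk.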
index 616164fa2e320ed1e5f36ca8b09182225dd79c1f..643d7f020a4a4cd9feb2eb5ab2dd9ea34c44b852 100644
 #define GEN12_COMPUTE2_RING_BASE       0x1e000
 #define GEN12_COMPUTE3_RING_BASE       0x26000
 #define BLT_RING_BASE          0x22000
+#define XEHPC_BCS1_RING_BASE   0x3e0000
+#define XEHPC_BCS2_RING_BASE   0x3e2000
+#define XEHPC_BCS3_RING_BASE   0x3e4000
+#define XEHPC_BCS4_RING_BASE   0x3e6000
+#define XEHPC_BCS5_RING_BASE   0x3e8000
+#define XEHPC_BCS6_RING_BASE   0x3ea000
+#define XEHPC_BCS7_RING_BASE   0x3ec000
+#define XEHPC_BCS8_RING_BASE   0x3ee000
 #define DG1_GSC_HECI1_BASE     0x00258000
 #define DG1_GSC_HECI2_BASE     0x00259000
 #define DG2_GSC_HECI1_BASE     0x00373000
 #define BXT_RP_STATE_CAP        _MMIO(0x138170)
 #define GEN9_RP_STATE_LIMITS   _MMIO(0x138148)
 #define XEHPSDV_RP_STATE_CAP   _MMIO(0x250014)
+#define PVC_RP_STATE_CAP       _MMIO(0x281014)
 
 #define GT0_PERF_LIMIT_REASONS         _MMIO(0x1381a8)
 #define   GT0_PERF_LIMIT_REASONS_MASK  0xde3
 #define     DG1_UNCORE_GET_INIT_STATUS         0x0
 #define     DG1_UNCORE_INIT_STATUS_COMPLETE    0x1
 #define GEN12_PCODE_READ_SAGV_BLOCK_TIME_US    0x23
+#define   XEHP_PCODE_FREQUENCY_CONFIG          0x6e    /* xehpsdv, pvc */
+/* XEHP_PCODE_FREQUENCY_CONFIG sub-commands (param1) */
+#define     PCODE_MBOX_FC_SC_READ_FUSED_P0     0x0
+#define     PCODE_MBOX_FC_SC_READ_FUSED_PN     0x1
+/* PCODE_MBOX_DOMAIN_* - mailbox domain IDs */
+/*   XEHP_PCODE_FREQUENCY_CONFIG param2 */
+#define     PCODE_MBOX_DOMAIN_NONE             0x0
+#define     PCODE_MBOX_DOMAIN_MEDIAFF          0x3
 #define GEN6_PCODE_DATA                                _MMIO(0x138128)
 #define   GEN6_PCODE_FREQ_IA_RATIO_SHIFT       8
 #define   GEN6_PCODE_FREQ_RING_RATIO_SHIFT     16
@@ -8328,23 +8345,6 @@ enum skl_power_gate {
 #define   SGGI_DIS                     REG_BIT(15)
 #define   SGR_DIS                      REG_BIT(13)
 
-#define XEHPSDV_TILE0_ADDR_RANGE       _MMIO(0x4900)
-#define   XEHPSDV_TILE_LMEM_RANGE_SHIFT  8
-
-#define XEHPSDV_FLAT_CCS_BASE_ADDR     _MMIO(0x4910)
-#define   XEHPSDV_CCS_BASE_SHIFT       8
-
-/* gamt regs */
-#define GEN8_L3_LRA_1_GPGPU _MMIO(0x4dd4)
-#define   GEN8_L3_LRA_1_GPGPU_DEFAULT_VALUE_BDW  0x67F1427F /* max/min for LRA1/2 */
-#define   GEN8_L3_LRA_1_GPGPU_DEFAULT_VALUE_CHV  0x5FF101FF /* max/min for LRA1/2 */
-#define   GEN9_L3_LRA_1_GPGPU_DEFAULT_VALUE_SKL  0x67F1427F /*    "        " */
-#define   GEN9_L3_LRA_1_GPGPU_DEFAULT_VALUE_BXT  0x5FF101FF /*    "        " */
-
-#define MMCD_MISC_CTRL         _MMIO(0x4ddc) /* skl+ */
-#define  MMCD_PCLA             (1 << 31)
-#define  MMCD_HOTSPOT_EN       (1 << 27)
-
 #define _ICL_PHY_MISC_A                0x64C00
 #define _ICL_PHY_MISC_B                0x64C04
 #define _DG2_PHY_MISC_TC1      0x64C14 /* TC1="PHY E" but offset as if "PHY F" */
index 73d5195146b0b82665f9f5ca7d23e3380d1f4e6b..62fad16a55e84f15ba3784a298e1155349f64297 100644
@@ -60,7 +60,7 @@ static struct kmem_cache *slab_execute_cbs;
 
 static const char *i915_fence_get_driver_name(struct dma_fence *fence)
 {
-       return dev_name(to_request(fence)->engine->i915->drm.dev);
+       return dev_name(to_request(fence)->i915->drm.dev);
 }
 
 static const char *i915_fence_get_timeline_name(struct dma_fence *fence)
@@ -134,17 +134,42 @@ static void i915_fence_release(struct dma_fence *fence)
        i915_sw_fence_fini(&rq->semaphore);
 
        /*
-        * Keep one request on each engine for reserved use under mempressure,
+        * Keep one request on each engine for reserved use under mempressure
         * do not use with virtual engines as this really is only needed for
         * kernel contexts.
+        *
+        * We do not hold a reference to the engine here and so have to be
+        * very careful in what rq->engine we poke. The virtual engine is
+        * referenced via the rq->context and we released that ref during
+        * i915_request_retire(), ergo we must not dereference a virtual
+        * engine here. Not that we would want to, as the only consumer of
+        * the reserved engine->request_pool is the power management parking,
+        * which must-not-fail, and that is only run on the physical engines.
+        *
+        * Since the request must have been executed to have completed,
+        * we know that it will have been processed by the HW and will
+        * not be unsubmitted again, so rq->engine and rq->execution_mask
+        * at this point are stable. rq->execution_mask will be a single
+        * bit if the last and _only_ engine it could execute on was a
+        * physical engine; if it's multiple bits then it started on and
+        * could still be on a virtual engine. Thus if the mask is not a
+        * power-of-two we assume that rq->engine may still be a virtual
+        * engine and so a dangling invalid pointer that we cannot dereference.
+        *
+        * For example, consider the flow of a bonded request through a virtual
+        * engine. The request is created with a wide engine mask (all engines
+        * that we might execute on). On processing the bond, the request mask
+        * is reduced to one or more engines. If the request is subsequently
+        * bound to a single engine, it will then be constrained to only
+        * execute on that engine and never returned to the virtual engine
+        * after timeslicing away, see __unwind_incomplete_requests(). Thus we
+        * know that if the rq->execution_mask is a single bit, rq->engine
+        * can be a physical engine with the exact corresponding mask.
         */
        if (!intel_engine_is_virtual(rq->engine) &&
-           !cmpxchg(&rq->engine->request_pool, NULL, rq)) {
-               intel_context_put(rq->context);
+           is_power_of_2(rq->execution_mask) &&
+           !cmpxchg(&rq->engine->request_pool, NULL, rq))
                return;
-       }
-
-       intel_context_put(rq->context);
 
        kmem_cache_free(slab_requests, rq);
 }
@@ -611,7 +636,7 @@ bool __i915_request_submit(struct i915_request *request)
                goto active;
        }
 
-       if (unlikely(intel_context_is_banned(request->context)))
+       if (unlikely(!intel_context_is_schedulable(request->context)))
                i915_request_set_error_once(request, -EIO);
 
        if (unlikely(fatal_error(request->fence.error)))
@@ -921,22 +946,11 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
                }
        }
 
-       /*
-        * Hold a reference to the intel_context over life of an i915_request.
-        * Without this an i915_request can exist after the context has been
-        * destroyed (e.g. request retired, context closed, but user space holds
-        * a reference to the request from an out fence). In the case of GuC
-        * submission + virtual engine, the engine that the request references
-        * is also destroyed which can trigger bad pointer dref in fence ops
-        * (e.g. i915_fence_get_driver_name). We could likely change these
-        * functions to avoid touching the engine but let's just be safe and
-        * hold the intel_context reference. In execlist mode the request always
-        * eventually points to a physical engine so this isn't an issue.
-        */
-       rq->context = intel_context_get(ce);
+       rq->context = ce;
        rq->engine = ce->engine;
        rq->ring = ce->ring;
        rq->execution_mask = ce->engine->mask;
+       rq->i915 = ce->engine->i915;
 
        ret = intel_timeline_get_seqno(tl, rq, &seqno);
        if (ret)
@@ -1008,7 +1022,6 @@ err_unwind:
        GEM_BUG_ON(!list_empty(&rq->sched.waiters_list));
 
 err_free:
-       intel_context_put(ce);
        kmem_cache_free(slab_requests, rq);
 err_unreserve:
        intel_context_unpin(ce);
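
The release path in this file now dereferences rq->engine only when rq->execution_mask has exactly one bit set. A minimal standalone sketch of that single-bit test follows; `sketch_mask_is_single_engine` is a hypothetical name, and its body follows the usual definition of the kernel's `is_power_of_2()` helper (non-zero, and clearing the lowest set bit leaves zero).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True when exactly one engine bit is set, i.e. the request can only
 * have run on one physical engine. A zero mask is not a power of two.
 */
static bool sketch_mask_is_single_engine(uint32_t execution_mask)
{
	return execution_mask != 0 &&
	       (execution_mask & (execution_mask - 1)) == 0;
}
```

A mask such as 0x6 (two candidate engines) fails the test, which is the case where the comment treats rq->engine as a possibly dangling virtual engine pointer.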
index 28b1f9db54875944d59d3a59b1e85c6ff16d4aa6..47041ec68df8982eb9be0d0269cb77b8ac44dfd7 100644
@@ -196,6 +196,8 @@ struct i915_request {
        struct dma_fence fence;
        spinlock_t lock;
 
+       struct drm_i915_private *i915;
+
        /**
         * Context and ring buffer related to this request
         * Contexts are refcounted, so when this request is associated with a
index 8521daba212a79f740ec1e1966cdfc47e10bea13..1e2750210831308d1edd4c1c4bde4dfa3ebd77cc 100644
@@ -166,7 +166,14 @@ static ssize_t error_state_read(struct file *filp, struct kobject *kobj,
        struct device *kdev = kobj_to_dev(kobj);
        struct drm_i915_private *i915 = kdev_minor_to_i915(kdev);
        struct i915_gpu_coredump *gpu;
-       ssize_t ret;
+       ssize_t ret = 0;
+
+       /*
+        * FIXME: Concurrent clients triggering resets and reading + clearing
+        * dumps can cause inconsistent sysfs reads when a user calls in with a
+        * non-zero offset to complete a prior partial read but the
+        * gpu_coredump has been cleared or replaced.
+        */
 
        gpu = i915_first_error_state(i915);
        if (IS_ERR(gpu)) {
@@ -178,8 +185,10 @@ static ssize_t error_state_read(struct file *filp, struct kobject *kobj,
                const char *str = "No error state collected\n";
                size_t len = strlen(str);
 
-               ret = min_t(size_t, count, len - off);
-               memcpy(buf, str + off, ret);
+               if (off < len) {
+                       ret = min_t(size_t, count, len - off);
+                       memcpy(buf, str + off, ret);
+               }
        }
 
        return ret;
@@ -259,4 +268,6 @@ void i915_teardown_sysfs(struct drm_i915_private *dev_priv)
 
        device_remove_bin_file(kdev,  &dpf_attrs_1);
        device_remove_bin_file(kdev,  &dpf_attrs);
+
+       kobject_put(dev_priv->sysfs_gt);
 }
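
The error_state_read() fix in this file guards the copy with `off < len`. Without the guard, an offset past the end of the string would presumably make `len - off` wrap to a very large unsigned value, so the clamp would pick `count` and the copy would read beyond the string. A standalone sketch of the guarded pattern, with the hypothetical name `sketch_read_at`:

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy up to count bytes of str starting at off into buf.
 * Returns the number of bytes copied, or 0 when off is at or past the end.
 */
static size_t sketch_read_at(char *buf, size_t count,
			     const char *str, size_t off)
{
	size_t len = strlen(str);
	size_t n = 0;

	if (off < len) {
		n = len - off;
		if (n > count)
			n = count;
		memcpy(buf, str + off, n);
	}

	return n;
}
```

Returning 0 for an out-of-range offset matches the patch initialising `ret` to 0, which a reader of the sysfs file sees as end of file.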
index 4f6db539571aa64d3327e577cb6cc4e026ff2f49..5d5828b9a24268e69659f0eaa8184d82d3df7863 100644
@@ -23,6 +23,7 @@
  */
 
 #include <linux/sched/mm.h>
+#include <linux/dma-fence-array.h>
 #include <drm/drm_gem.h>
 
 #include "display/intel_frontbuffer.h"
@@ -550,13 +551,6 @@ void __iomem *i915_vma_pin_iomap(struct i915_vma *vma)
        if (WARN_ON_ONCE(vma->obj->flags & I915_BO_ALLOC_GPU_ONLY))
                return IOMEM_ERR_PTR(-EINVAL);
 
-       if (!i915_gem_object_is_lmem(vma->obj)) {
-               if (GEM_WARN_ON(!i915_vma_is_map_and_fenceable(vma))) {
-                       err = -ENODEV;
-                       goto err;
-               }
-       }
-
        GEM_BUG_ON(!i915_vma_is_ggtt(vma));
        GEM_BUG_ON(!i915_vma_is_bound(vma, I915_VMA_GLOBAL_BIND));
        GEM_BUG_ON(i915_vma_verify_bind_complete(vma));
@@ -569,20 +563,33 @@ void __iomem *i915_vma_pin_iomap(struct i915_vma *vma)
                 * of pages, that way we can also drop the
                 * I915_BO_ALLOC_CONTIGUOUS when allocating the object.
                 */
-               if (i915_gem_object_is_lmem(vma->obj))
+               if (i915_gem_object_is_lmem(vma->obj)) {
                        ptr = i915_gem_object_lmem_io_map(vma->obj, 0,
                                                          vma->obj->base.size);
-               else
+               } else if (i915_vma_is_map_and_fenceable(vma)) {
                        ptr = io_mapping_map_wc(&i915_vm_to_ggtt(vma->vm)->iomap,
                                                vma->node.start,
                                                vma->node.size);
+               } else {
+                       ptr = (void __iomem *)
+                               i915_gem_object_pin_map(vma->obj, I915_MAP_WC);
+                       if (IS_ERR(ptr)) {
+                               err = PTR_ERR(ptr);
+                               goto err;
+                       }
+                       ptr = page_pack_bits(ptr, 1);
+               }
+
                if (ptr == NULL) {
                        err = -ENOMEM;
                        goto err;
                }
 
                if (unlikely(cmpxchg(&vma->iomap, NULL, ptr))) {
-                       io_mapping_unmap(ptr);
+                       if (page_unmask_bits(ptr))
+                               __i915_gem_object_release_map(vma->obj);
+                       else
+                               io_mapping_unmap(ptr);
                        ptr = vma->iomap;
                }
        }
@@ -596,7 +603,7 @@ void __iomem *i915_vma_pin_iomap(struct i915_vma *vma)
        i915_vma_set_ggtt_write(vma);
 
        /* NB Access through the GTT requires the device to be awake. */
-       return ptr;
+       return page_mask_bits(ptr);
 
 err_unpin:
        __i915_vma_unpin(vma);
@@ -614,6 +621,8 @@ void i915_vma_unpin_iomap(struct i915_vma *vma)
 {
        GEM_BUG_ON(vma->iomap == NULL);
 
+       /* XXX We keep the mapping until __i915_vma_unbind()/evict() */
+
        i915_vma_flush_writes(vma);
 
        i915_vma_unpin_fence(vma);
@@ -1762,7 +1771,10 @@ static void __i915_vma_iounmap(struct i915_vma *vma)
        if (vma->iomap == NULL)
                return;
 
-       io_mapping_unmap(vma->iomap);
+       if (page_unmask_bits(vma->iomap))
+               __i915_gem_object_release_map(vma->obj);
+       else
+               io_mapping_unmap(vma->iomap);
        vma->iomap = NULL;
 }
 
@@ -1823,6 +1835,21 @@ int _i915_vma_move_to_active(struct i915_vma *vma,
        if (unlikely(err))
                return err;
 
+       /*
+        * Reserve fences slot early to prevent an allocation after preparing
+        * the workload and associating fences with dma_resv.
+        */
+       if (fence && !(flags & __EXEC_OBJECT_NO_RESERVE)) {
+               struct dma_fence *curr;
+               int idx;
+
+               dma_fence_array_for_each(curr, idx, fence)
+                       ;
+               err = dma_resv_reserve_fences(vma->obj->base.resv, idx);
+               if (unlikely(err))
+                       return err;
+       }
+
        if (flags & EXEC_OBJECT_WRITE) {
                struct intel_frontbuffer *front;
 
@@ -1832,31 +1859,23 @@ int _i915_vma_move_to_active(struct i915_vma *vma,
                                i915_active_add_request(&front->write, rq);
                        intel_frontbuffer_put(front);
                }
+       }
 
-               if (!(flags & __EXEC_OBJECT_NO_RESERVE)) {
-                       err = dma_resv_reserve_fences(vma->obj->base.resv, 1);
-                       if (unlikely(err))
-                               return err;
-               }
+       if (fence) {
+               struct dma_fence *curr;
+               enum dma_resv_usage usage;
+               int idx;
 
-               if (fence) {
-                       dma_resv_add_fence(vma->obj->base.resv, fence,
-                                          DMA_RESV_USAGE_WRITE);
+               obj->read_domains = 0;
+               if (flags & EXEC_OBJECT_WRITE) {
+                       usage = DMA_RESV_USAGE_WRITE;
                        obj->write_domain = I915_GEM_DOMAIN_RENDER;
-                       obj->read_domains = 0;
-               }
-       } else {
-               if (!(flags & __EXEC_OBJECT_NO_RESERVE)) {
-                       err = dma_resv_reserve_fences(vma->obj->base.resv, 1);
-                       if (unlikely(err))
-                               return err;
+               } else {
+                       usage = DMA_RESV_USAGE_READ;
                }
 
-               if (fence) {
-                       dma_resv_add_fence(vma->obj->base.resv, fence,
-                                          DMA_RESV_USAGE_READ);
-                       obj->write_domain = 0;
-               }
+               dma_fence_array_for_each(curr, idx, fence)
+                       dma_resv_add_fence(vma->obj->base.resv, curr, usage);
        }
 
        if (flags & EXEC_OBJECT_NEEDS_FENCE && vma->fence)
@@ -1899,9 +1918,11 @@ struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async)
                /* release the fence reg _after_ flushing */
                i915_vma_revoke_fence(vma);
 
-               __i915_vma_iounmap(vma);
                clear_bit(I915_VMA_CAN_FENCE_BIT, __i915_vma_flags(vma));
        }
+
+       __i915_vma_iounmap(vma);
+
        GEM_BUG_ON(vma->fence);
        GEM_BUG_ON(i915_vma_has_userfault(vma));
 
index f414144eadf8d72602e9b2b9bb9bd9041ad2d88f..08341174ee0a544e2f87affcbd4c59905c8afaf2 100644
@@ -143,6 +143,7 @@ enum intel_ppgtt_type {
        func(needs_compact_pt); \
        func(gpu_reset_clobbers_display); \
        func(has_reset_engine); \
+       func(has_3d_pipeline); \
        func(has_4tile); \
        func(has_flat_ccs); \
        func(has_global_mocs); \
@@ -150,12 +151,14 @@ enum intel_ppgtt_type {
        func(has_heci_pxp); \
        func(has_heci_gscfi); \
        func(has_guc_deprivilege); \
+       func(has_l3_ccs_read); \
        func(has_l3_dpf); \
        func(has_llc); \
        func(has_logical_ring_contexts); \
        func(has_logical_ring_elsq); \
        func(has_media_ratio_mode); \
-       func(has_mslices); \
+       func(has_mslice_steering); \
+       func(has_one_eu_per_fuse_bit); \
        func(has_pooled_eu); \
        func(has_pxp); \
        func(has_rc6); \
index 3355486a0b2036151f62a8f2b9b94ac4b6907741..9b7e93ca1ff9edfd22843395b9bac301b877ec1f 100644
@@ -7634,10 +7634,9 @@ static void xehpsdv_init_clock_gating(struct drm_i915_private *dev_priv)
 
 static void dg2_init_clock_gating(struct drm_i915_private *i915)
 {
-       /* Wa_22010954014:dg2_g10 */
-       if (IS_DG2_G10(i915))
-               intel_uncore_rmw(&i915->uncore, XEHP_CLOCK_GATE_DIS, 0,
-                                SGSI_SIDECLK_DIS);
+       /* Wa_22010954014:dg2 */
+       intel_uncore_rmw(&i915->uncore, XEHP_CLOCK_GATE_DIS, 0,
+                        SGSI_SIDECLK_DIS);
 
        /*
         * Wa_14010733611:dg2_g10
@@ -7648,6 +7647,17 @@ static void dg2_init_clock_gating(struct drm_i915_private *i915)
                                 SGR_DIS | SGGI_DIS);
 }
 
+static void pvc_init_clock_gating(struct drm_i915_private *dev_priv)
+{
+       /* Wa_14012385139:pvc */
+       if (IS_PVC_BD_STEP(dev_priv, STEP_A0, STEP_B0))
+               intel_uncore_rmw(&dev_priv->uncore, XEHP_CLOCK_GATE_DIS, 0, SGR_DIS);
+
+       /* Wa_22010954014:pvc */
+       if (IS_PVC_BD_STEP(dev_priv, STEP_A0, STEP_B0))
+               intel_uncore_rmw(&dev_priv->uncore, XEHP_CLOCK_GATE_DIS, 0, SGSI_SIDECLK_DIS);
+}
+
 static void cnp_init_clock_gating(struct drm_i915_private *dev_priv)
 {
        if (!HAS_PCH_CNP(dev_priv))
@@ -8064,6 +8074,7 @@ static const struct drm_i915_clock_gating_funcs platform##_clock_gating_funcs =
        .init_clock_gating = platform##_init_clock_gating,              \
 }
 
+CG_FUNCS(pvc);
 CG_FUNCS(dg2);
 CG_FUNCS(xehpsdv);
 CG_FUNCS(adlp);
@@ -8102,7 +8113,9 @@ CG_FUNCS(nop);
  */
 void intel_init_clock_gating_hooks(struct drm_i915_private *dev_priv)
 {
-       if (IS_DG2(dev_priv))
+       if (IS_PONTEVECCHIO(dev_priv))
+               dev_priv->clock_gating_funcs = &pvc_clock_gating_funcs;
+       else if (IS_DG2(dev_priv))
                dev_priv->clock_gating_funcs = &dg2_clock_gating_funcs;
        else if (IS_XEHPSDV(dev_priv))
                dev_priv->clock_gating_funcs = &xehpsdv_clock_gating_funcs;
index 74e8e4680028a2ccfb04761408aba2e80e5edb10..42b3133d8387aba0330e3e246b6aef8647261df8 100644
@@ -135,6 +135,8 @@ static const struct intel_step_info adlp_n_revids[] = {
        [0x0] = { COMMON_GT_MEDIA_STEP(A0), .display_step = STEP_D0 },
 };
 
+static void pvc_step_init(struct drm_i915_private *i915, int pci_revid);
+
 void intel_step_init(struct drm_i915_private *i915)
 {
        const struct intel_step_info *revids = NULL;
@@ -142,7 +144,10 @@ void intel_step_init(struct drm_i915_private *i915)
        int revid = INTEL_REVID(i915);
        struct intel_step_info step = {};
 
-       if (IS_DG2_G10(i915)) {
+       if (IS_PONTEVECCHIO(i915)) {
+               pvc_step_init(i915, revid);
+               return;
+       } else if (IS_DG2_G10(i915)) {
                revids = dg2_g10_revid_step_tbl;
                size = ARRAY_SIZE(dg2_g10_revid_step_tbl);
        } else if (IS_DG2_G11(i915)) {
@@ -235,6 +240,69 @@ void intel_step_init(struct drm_i915_private *i915)
        RUNTIME_INFO(i915)->step = step;
 }
 
+#define PVC_BD_REVID   GENMASK(5, 3)
+#define PVC_CT_REVID   GENMASK(2, 0)
+
+static const int pvc_bd_subids[] = {
+       [0x0] = STEP_A0,
+       [0x3] = STEP_B0,
+       [0x4] = STEP_B1,
+       [0x5] = STEP_B3,
+};
+
+static const int pvc_ct_subids[] = {
+       [0x3] = STEP_A0,
+       [0x5] = STEP_B0,
+       [0x6] = STEP_B1,
+       [0x7] = STEP_C0,
+};
+
+static int
+pvc_step_lookup(struct drm_i915_private *i915, const char *type,
+               const int *table, int size, int subid)
+{
+       if (subid < size && table[subid] != STEP_NONE)
+               return table[subid];
+
+       drm_warn(&i915->drm, "Unknown %s id 0x%02x\n", type, subid);
+
+       /*
+        * As on other platforms, try to use the next higher ID if we land on a
+        * gap in the table.
+        */
+       while (subid < size && table[subid] == STEP_NONE)
+               subid++;
+
+       if (subid < size) {
+               drm_dbg(&i915->drm, "Using steppings for %s id 0x%02x\n",
+                       type, subid);
+               return table[subid];
+       }
+
+       drm_dbg(&i915->drm, "Using future steppings\n");
+       return STEP_FUTURE;
+}
+
+/*
+ * PVC needs special handling since we don't look up the
+ * revid in a table, but rather specific bitfields within
+ * the revid for various components.
+ */
+static void pvc_step_init(struct drm_i915_private *i915, int pci_revid)
+{
+       int ct_subid, bd_subid;
+
+       bd_subid = FIELD_GET(PVC_BD_REVID, pci_revid);
+       ct_subid = FIELD_GET(PVC_CT_REVID, pci_revid);
+
+       RUNTIME_INFO(i915)->step.basedie_step =
+               pvc_step_lookup(i915, "Base Die", pvc_bd_subids,
+                               ARRAY_SIZE(pvc_bd_subids), bd_subid);
+       RUNTIME_INFO(i915)->step.graphics_step =
+               pvc_step_lookup(i915, "Compute Tile", pvc_ct_subids,
+                               ARRAY_SIZE(pvc_ct_subids), ct_subid);
+}
+
 #define STEP_NAME_CASE(name)   \
        case STEP_##name:       \
                return #name;
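The PVC stepping decode added in the hunk above can be illustrated outside the driver. What follows is a minimal standalone sketch, not the driver code: `enum step`, `step_lookup()`, `decode_basedie()` and `decode_compute_tile()` are hypothetical stand-ins, and the warning/debug logging is omitted. It assumes, as the patch shows, that bits 5:3 of the PCI revision ID select the base die, bits 2:0 select the compute tile, and a gap in a table falls through to the next higher populated ID.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the driver's STEP_* values; STEP_NONE must be 0
 * so that unlisted table slots read as "no stepping". */
enum step {
	STEP_NONE = 0,
	STEP_A0, STEP_B0, STEP_B1, STEP_B3, STEP_C0,
	STEP_FUTURE,
};

#define ARRAY_SIZE(a)	(sizeof(a) / sizeof((a)[0]))

/* Same ID-to-stepping mapping as the tables in the patch. */
static const int bd_subids[] = {
	[0x0] = STEP_A0, [0x3] = STEP_B0, [0x4] = STEP_B1, [0x5] = STEP_B3,
};

static const int ct_subids[] = {
	[0x3] = STEP_A0, [0x5] = STEP_B0, [0x6] = STEP_B1, [0x7] = STEP_C0,
};

/* Return the stepping for subid; on a gap, use the next higher populated
 * entry, and past the end of the table report STEP_FUTURE. */
static int step_lookup(const int *table, size_t size, unsigned int subid)
{
	while (subid < size && table[subid] == STEP_NONE)
		subid++;

	return subid < size ? table[subid] : STEP_FUTURE;
}

/* Bits 5:3 of the revision ID carry the base die ID. */
static int decode_basedie(unsigned int revid)
{
	return step_lookup(bd_subids, ARRAY_SIZE(bd_subids), (revid >> 3) & 0x7);
}

/* Bits 2:0 of the revision ID carry the compute tile ID. */
static int decode_compute_tile(unsigned int revid)
{
	return step_lookup(ct_subids, ARRAY_SIZE(ct_subids), revid & 0x7);
}
```

For example, a revision ID of 0x1d splits into base die ID 0x3 and compute tile ID 0x5, which both map to B0 in the tables above.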
index d71a99bd51793d2c3f2b4f11a9f46fe81f77cf5c..a6b12bfa9744a27a0d4c939f453fd9070598dc5f 100644
 struct drm_i915_private;
 
 struct intel_step_info {
-       u8 graphics_step;
+       u8 graphics_step;       /* Represents the compute tile on Xe_HPC */
        u8 display_step;
        u8 media_step;
+       u8 basedie_step;
 };
 
 #define STEP_ENUM_VAL(name)  STEP_##name,
@@ -25,6 +26,7 @@ struct intel_step_info {
        func(B0)                        \
        func(B1)                        \
        func(B2)                        \
+       func(B3)                        \
        func(C0)                        \
        func(C1)                        \
        func(D0)                        \
index 83517a703eb6e8421658b7c3248f40e4246b2f75..a852c471d1b38576f2f0740edb7105db79f66949 100644
@@ -938,36 +938,32 @@ find_fw_domain(struct intel_uncore *uncore, u32 offset)
        return entry->domains;
 }
 
-#define GEN_FW_RANGE(s, e, d) \
-       { .start = (s), .end = (e), .domains = (d) }
-
-/* *Must* be sorted by offset ranges! See intel_fw_table_check(). */
-static const struct intel_forcewake_range __vlv_fw_ranges[] = {
-       GEN_FW_RANGE(0x2000, 0x3fff, FORCEWAKE_RENDER),
-       GEN_FW_RANGE(0x5000, 0x7fff, FORCEWAKE_RENDER),
-       GEN_FW_RANGE(0xb000, 0x11fff, FORCEWAKE_RENDER),
-       GEN_FW_RANGE(0x12000, 0x13fff, FORCEWAKE_MEDIA),
-       GEN_FW_RANGE(0x22000, 0x23fff, FORCEWAKE_MEDIA),
-       GEN_FW_RANGE(0x2e000, 0x2ffff, FORCEWAKE_RENDER),
-       GEN_FW_RANGE(0x30000, 0x3ffff, FORCEWAKE_MEDIA),
-};
-
-#define __fwtable_reg_read_fw_domains(uncore, offset) \
-({ \
-       enum forcewake_domains __fwd = 0; \
-       if (NEEDS_FORCE_WAKE((offset))) \
-               __fwd = find_fw_domain(uncore, offset); \
-       __fwd; \
-})
+/*
+ * Shadowed register tables describe special register ranges that i915 is
+ * allowed to write to without acquiring forcewake.  If these registers' power
+ * wells are down, the hardware will save values written by i915 to a shadow
+ * copy and automatically transfer them into the real register the next time
+ * the power well is woken up.  Shadowing only applies to writes; forcewake
+ * must still be acquired when reading from registers in these ranges.
+ *
+ * The documentation for shadowed registers is somewhat spotty on older
+ * platforms.  However missing registers from these lists is non-fatal; it just
+ * means we'll wake up the hardware for some register accesses where we didn't
+ * really need to.
+ *
+ * The ranges listed in these tables must be sorted by offset.
+ *
+ * When adding new tables here, please also add them to
+ * intel_shadow_table_check() in selftests/intel_uncore.c so that they will be
+ * scanned for obvious mistakes or typos by the selftests.
+ */
 
-/* *Must* be sorted by offset! See intel_shadow_table_check(). */
 static const struct i915_range gen8_shadowed_regs[] = {
        { .start =  0x2030, .end =  0x2030 },
        { .start =  0xA008, .end =  0xA00C },
        { .start = 0x12030, .end = 0x12030 },
        { .start = 0x1a030, .end = 0x1a030 },
        { .start = 0x22030, .end = 0x22030 },
-       /* TODO: Other registers are not yet used */
 };
 
 static const struct i915_range gen11_shadowed_regs[] = {
@@ -1080,6 +1076,45 @@ static const struct i915_range dg2_shadowed_regs[] = {
        { .start = 0x1F8510, .end = 0x1F8550 },
 };
 
+static const struct i915_range pvc_shadowed_regs[] = {
+       { .start =   0x2030, .end =   0x2030 },
+       { .start =   0x2510, .end =   0x2550 },
+       { .start =   0xA008, .end =   0xA00C },
+       { .start =   0xA188, .end =   0xA188 },
+       { .start =   0xA278, .end =   0xA278 },
+       { .start =   0xA540, .end =   0xA56C },
+       { .start =   0xC4C8, .end =   0xC4C8 },
+       { .start =   0xC4E0, .end =   0xC4E0 },
+       { .start =   0xC600, .end =   0xC600 },
+       { .start =   0xC658, .end =   0xC658 },
+       { .start =  0x22030, .end =  0x22030 },
+       { .start =  0x22510, .end =  0x22550 },
+       { .start = 0x1C0030, .end = 0x1C0030 },
+       { .start = 0x1C0510, .end = 0x1C0550 },
+       { .start = 0x1C4030, .end = 0x1C4030 },
+       { .start = 0x1C4510, .end = 0x1C4550 },
+       { .start = 0x1C8030, .end = 0x1C8030 },
+       { .start = 0x1C8510, .end = 0x1C8550 },
+       { .start = 0x1D0030, .end = 0x1D0030 },
+       { .start = 0x1D0510, .end = 0x1D0550 },
+       { .start = 0x1D4030, .end = 0x1D4030 },
+       { .start = 0x1D4510, .end = 0x1D4550 },
+       { .start = 0x1D8030, .end = 0x1D8030 },
+       { .start = 0x1D8510, .end = 0x1D8550 },
+       { .start = 0x1E0030, .end = 0x1E0030 },
+       { .start = 0x1E0510, .end = 0x1E0550 },
+       { .start = 0x1E4030, .end = 0x1E4030 },
+       { .start = 0x1E4510, .end = 0x1E4550 },
+       { .start = 0x1E8030, .end = 0x1E8030 },
+       { .start = 0x1E8510, .end = 0x1E8550 },
+       { .start = 0x1F0030, .end = 0x1F0030 },
+       { .start = 0x1F0510, .end = 0x1F0550 },
+       { .start = 0x1F4030, .end = 0x1F4030 },
+       { .start = 0x1F4510, .end = 0x1F4550 },
+       { .start = 0x1F8030, .end = 0x1F8030 },
+       { .start = 0x1F8510, .end = 0x1F8550 },
+};
+
 static int mmio_range_cmp(u32 key, const struct i915_range *range)
 {
        if (key < range->start)
@@ -1107,11 +1142,70 @@ gen6_reg_write_fw_domains(struct intel_uncore *uncore, i915_reg_t reg)
        return FORCEWAKE_RENDER;
 }
 
+#define __fwtable_reg_read_fw_domains(uncore, offset) \
+({ \
+       enum forcewake_domains __fwd = 0; \
+       if (NEEDS_FORCE_WAKE((offset))) \
+               __fwd = find_fw_domain(uncore, offset); \
+       __fwd; \
+})
+
+#define __fwtable_reg_write_fw_domains(uncore, offset) \
+({ \
+       enum forcewake_domains __fwd = 0; \
+       const u32 __offset = (offset); \
+       if (NEEDS_FORCE_WAKE((__offset)) && !is_shadowed(uncore, __offset)) \
+               __fwd = find_fw_domain(uncore, __offset); \
+       __fwd; \
+})
+
+#define GEN_FW_RANGE(s, e, d) \
+       { .start = (s), .end = (e), .domains = (d) }
+
+/*
+ * All platforms' forcewake tables below must be sorted by offset ranges.
+ * Furthermore, new forcewake tables added should be "watertight" and have
+ * no gaps between ranges.
+ *
+ * When there are multiple consecutive ranges listed in the bspec with
+ * the same forcewake domain, it is customary to combine them into a single
+ * row in the tables below to keep the tables small and lookups fast.
+ * Likewise, reserved/unused ranges may be combined with the preceding and/or
+ * following ranges since the driver will never be making MMIO accesses in
+ * those ranges.
+ *
+ * For example, if the bspec were to list:
+ *
+ *    ...
+ *    0x1000 - 0x1fff:  GT
+ *    0x2000 - 0x2cff:  GT
+ *    0x2d00 - 0x2fff:  unused/reserved
+ *    0x3000 - 0xffff:  GT
+ *    ...
+ *
+ * these could all be represented by a single line in the code:
+ *
+ *   GEN_FW_RANGE(0x1000, 0xffff, FORCEWAKE_GT)
+ *
+ * When adding new forcewake tables here, please also add them to
+ * intel_uncore_mock_selftests in selftests/intel_uncore.c so that they will be
+ * scanned for obvious mistakes or typos by the selftests.
+ */
+
 static const struct intel_forcewake_range __gen6_fw_ranges[] = {
        GEN_FW_RANGE(0x0, 0x3ffff, FORCEWAKE_RENDER),
 };
 
-/* *Must* be sorted by offset ranges! See intel_fw_table_check(). */
+static const struct intel_forcewake_range __vlv_fw_ranges[] = {
+       GEN_FW_RANGE(0x2000, 0x3fff, FORCEWAKE_RENDER),
+       GEN_FW_RANGE(0x5000, 0x7fff, FORCEWAKE_RENDER),
+       GEN_FW_RANGE(0xb000, 0x11fff, FORCEWAKE_RENDER),
+       GEN_FW_RANGE(0x12000, 0x13fff, FORCEWAKE_MEDIA),
+       GEN_FW_RANGE(0x22000, 0x23fff, FORCEWAKE_MEDIA),
+       GEN_FW_RANGE(0x2e000, 0x2ffff, FORCEWAKE_RENDER),
+       GEN_FW_RANGE(0x30000, 0x3ffff, FORCEWAKE_MEDIA),
+};
+
 static const struct intel_forcewake_range __chv_fw_ranges[] = {
        GEN_FW_RANGE(0x2000, 0x3fff, FORCEWAKE_RENDER),
        GEN_FW_RANGE(0x4000, 0x4fff, FORCEWAKE_RENDER | FORCEWAKE_MEDIA),
@@ -1131,16 +1225,6 @@ static const struct intel_forcewake_range __chv_fw_ranges[] = {
        GEN_FW_RANGE(0x30000, 0x37fff, FORCEWAKE_MEDIA),
 };
 
-#define __fwtable_reg_write_fw_domains(uncore, offset) \
-({ \
-       enum forcewake_domains __fwd = 0; \
-       const u32 __offset = (offset); \
-       if (NEEDS_FORCE_WAKE((__offset)) && !is_shadowed(uncore, __offset)) \
-               __fwd = find_fw_domain(uncore, __offset); \
-       __fwd; \
-})
-
-/* *Must* be sorted by offset ranges! See intel_fw_table_check(). */
 static const struct intel_forcewake_range __gen9_fw_ranges[] = {
        GEN_FW_RANGE(0x0, 0xaff, FORCEWAKE_GT),
        GEN_FW_RANGE(0xb00, 0x1fff, 0), /* uncore range */
@@ -1176,7 +1260,6 @@ static const struct intel_forcewake_range __gen9_fw_ranges[] = {
        GEN_FW_RANGE(0x30000, 0x3ffff, FORCEWAKE_MEDIA),
 };
 
-/* *Must* be sorted by offset ranges! See intel_fw_table_check(). */
 static const struct intel_forcewake_range __gen11_fw_ranges[] = {
        GEN_FW_RANGE(0x0, 0x1fff, 0), /* uncore range */
        GEN_FW_RANGE(0x2000, 0x26ff, FORCEWAKE_RENDER),
@@ -1215,14 +1298,6 @@ static const struct intel_forcewake_range __gen11_fw_ranges[] = {
        GEN_FW_RANGE(0x1d4000, 0x1dbfff, 0)
 };
 
-/*
- * *Must* be sorted by offset ranges! See intel_fw_table_check().
- *
- * Note that the spec lists several reserved/unused ranges that don't
- * actually contain any registers.  In the table below we'll combine those
- * reserved ranges with either the preceding or following range to keep the
- * table small and lookups fast.
- */
 static const struct intel_forcewake_range __gen12_fw_ranges[] = {
        GEN_FW_RANGE(0x0, 0x1fff, 0), /*
                0x0   -  0xaff: reserved
@@ -1327,8 +1402,6 @@ static const struct intel_forcewake_range __gen12_fw_ranges[] = {
 /*
  * Graphics IP version 12.55 brings a slight change to the 0xd800 range,
  * switching it from the GT domain to the render domain.
- *
- * *Must* be sorted by offset ranges! See intel_fw_table_check().
  */
 #define XEHP_FWRANGES(FW_RANGE_D800)                                   \
        GEN_FW_RANGE(0x0, 0x1fff, 0), /*                                        \
@@ -1490,6 +1563,103 @@ static const struct intel_forcewake_range __dg2_fw_ranges[] = {
        XEHP_FWRANGES(FORCEWAKE_RENDER)
 };
 
+static const struct intel_forcewake_range __pvc_fw_ranges[] = {
+       GEN_FW_RANGE(0x0, 0xaff, 0),
+       GEN_FW_RANGE(0xb00, 0xbff, FORCEWAKE_GT),
+       GEN_FW_RANGE(0xc00, 0xfff, 0),
+       GEN_FW_RANGE(0x1000, 0x1fff, FORCEWAKE_GT),
+       GEN_FW_RANGE(0x2000, 0x26ff, FORCEWAKE_RENDER),
+       GEN_FW_RANGE(0x2700, 0x2fff, FORCEWAKE_GT),
+       GEN_FW_RANGE(0x3000, 0x3fff, FORCEWAKE_RENDER),
+       GEN_FW_RANGE(0x4000, 0x813f, FORCEWAKE_GT), /*
+               0x4000 - 0x4aff: gt
+               0x4b00 - 0x4fff: reserved
+               0x5000 - 0x51ff: gt
+               0x5200 - 0x52ff: reserved
+               0x5300 - 0x53ff: gt
+               0x5400 - 0x7fff: reserved
+               0x8000 - 0x813f: gt */
+       GEN_FW_RANGE(0x8140, 0x817f, FORCEWAKE_RENDER),
+       GEN_FW_RANGE(0x8180, 0x81ff, 0),
+       GEN_FW_RANGE(0x8200, 0x94cf, FORCEWAKE_GT), /*
+               0x8200 - 0x82ff: gt
+               0x8300 - 0x84ff: reserved
+               0x8500 - 0x887f: gt
+               0x8880 - 0x8a7f: reserved
+               0x8a80 - 0x8aff: gt
+               0x8b00 - 0x8fff: reserved
+               0x9000 - 0x947f: gt
+               0x9480 - 0x94cf: reserved */
+       GEN_FW_RANGE(0x94d0, 0x955f, FORCEWAKE_RENDER),
+       GEN_FW_RANGE(0x9560, 0x967f, 0), /*
+               0x9560 - 0x95ff: always on
+               0x9600 - 0x967f: reserved */
+       GEN_FW_RANGE(0x9680, 0x97ff, FORCEWAKE_RENDER), /*
+               0x9680 - 0x96ff: render
+               0x9700 - 0x97ff: reserved */
+       GEN_FW_RANGE(0x9800, 0xcfff, FORCEWAKE_GT), /*
+               0x9800 - 0xb4ff: gt
+               0xb500 - 0xbfff: reserved
+               0xc000 - 0xcfff: gt */
+       GEN_FW_RANGE(0xd000, 0xd3ff, 0),
+       GEN_FW_RANGE(0xd400, 0xdbff, FORCEWAKE_GT),
+       GEN_FW_RANGE(0xdc00, 0xdcff, FORCEWAKE_RENDER),
+       GEN_FW_RANGE(0xdd00, 0xde7f, FORCEWAKE_GT), /*
+               0xdd00 - 0xddff: gt
+               0xde00 - 0xde7f: reserved */
+       GEN_FW_RANGE(0xde80, 0xe8ff, FORCEWAKE_RENDER), /*
+               0xde80 - 0xdeff: render
+               0xdf00 - 0xe1ff: reserved
+               0xe200 - 0xe7ff: render
+               0xe800 - 0xe8ff: reserved */
+       GEN_FW_RANGE(0xe900, 0x11fff, FORCEWAKE_GT), /*
+                0xe900 -  0xe9ff: gt
+                0xea00 -  0xebff: reserved
+                0xec00 -  0xffff: gt
+               0x10000 - 0x11fff: reserved */
+       GEN_FW_RANGE(0x12000, 0x12fff, 0), /*
+               0x12000 - 0x127ff: always on
+               0x12800 - 0x12fff: reserved */
+       GEN_FW_RANGE(0x13000, 0x23fff, FORCEWAKE_GT), /*
+               0x13000 - 0x135ff: gt
+               0x13600 - 0x147ff: reserved
+               0x14800 - 0x153ff: gt
+               0x15400 - 0x19fff: reserved
+               0x1a000 - 0x1ffff: gt
+               0x20000 - 0x21fff: reserved
+               0x22000 - 0x23fff: gt */
+       GEN_FW_RANGE(0x24000, 0x2417f, 0), /*
+               0x24000 - 0x2407f: always on
+               0x24080 - 0x2417f: reserved */
+       GEN_FW_RANGE(0x24180, 0x3ffff, FORCEWAKE_GT), /*
+               0x24180 - 0x241ff: gt
+               0x24200 - 0x251ff: reserved
+               0x25200 - 0x252ff: gt
+               0x25300 - 0x25fff: reserved
+               0x26000 - 0x27fff: gt
+               0x28000 - 0x2ffff: reserved
+               0x30000 - 0x3ffff: gt */
+       GEN_FW_RANGE(0x40000, 0x1bffff, 0),
+       GEN_FW_RANGE(0x1c0000, 0x1c3fff, FORCEWAKE_MEDIA_VDBOX0), /*
+               0x1c0000 - 0x1c2bff: VD0
+               0x1c2c00 - 0x1c2cff: reserved
+               0x1c2d00 - 0x1c2dff: VD0
+               0x1c2e00 - 0x1c3eff: reserved
+               0x1c3f00 - 0x1c3fff: VD0 */
+       GEN_FW_RANGE(0x1c4000, 0x1cffff, FORCEWAKE_MEDIA_VDBOX1), /*
+               0x1c4000 - 0x1c6aff: VD1
+               0x1c6b00 - 0x1c7eff: reserved
+               0x1c7f00 - 0x1c7fff: VD1
+               0x1c8000 - 0x1cffff: reserved */
+       GEN_FW_RANGE(0x1d0000, 0x23ffff, FORCEWAKE_MEDIA_VDBOX2), /*
+               0x1d0000 - 0x1d2aff: VD2
+               0x1d2b00 - 0x1d3eff: reserved
+               0x1d3f00 - 0x1d3fff: VD2
+               0x1d4000 - 0x23ffff: reserved */
+       GEN_FW_RANGE(0x240000, 0x3dffff, 0),
+       GEN_FW_RANGE(0x3e0000, 0x3effff, FORCEWAKE_GT),
+};
+
 static void
 ilk_dummy_write(struct intel_uncore *uncore)
 {
@@ -2125,7 +2295,11 @@ static int uncore_forcewake_init(struct intel_uncore *uncore)
 
        ASSIGN_READ_MMIO_VFUNCS(uncore, fwtable);
 
-       if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 55)) {
+       if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 60)) {
+               ASSIGN_FW_DOMAINS_TABLE(uncore, __pvc_fw_ranges);
+               ASSIGN_SHADOW_TABLE(uncore, pvc_shadowed_regs);
+               ASSIGN_WRITE_MMIO_VFUNCS(uncore, fwtable);
+       } else if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 55)) {
                ASSIGN_FW_DOMAINS_TABLE(uncore, __dg2_fw_ranges);
                ASSIGN_SHADOW_TABLE(uncore, dg2_shadowed_regs);
                ASSIGN_WRITE_MMIO_VFUNCS(uncore, fwtable);
@@ -2470,118 +2644,6 @@ intel_uncore_forcewake_for_reg(struct intel_uncore *uncore,
        return fw_domains;
 }
 
-/**
- * uncore_rw_with_mcr_steering_fw - Access a register after programming
- *                                 the MCR selector register.
- * @uncore: pointer to struct intel_uncore
- * @reg: register being accessed
- * @rw_flag: FW_REG_READ for read access or FW_REG_WRITE for write access
- * @slice: slice number (ignored for multi-cast write)
- * @subslice: sub-slice number (ignored for multi-cast write)
- * @value: register value to be written (ignored for read)
- *
- * Return: 0 for write access. register value for read access.
- *
- * Caller needs to make sure the relevant forcewake wells are up.
- */
-static u32 uncore_rw_with_mcr_steering_fw(struct intel_uncore *uncore,
-                                         i915_reg_t reg, u8 rw_flag,
-                                         int slice, int subslice, u32 value)
-{
-       u32 mcr_mask, mcr_ss, mcr, old_mcr, val = 0;
-
-       lockdep_assert_held(&uncore->lock);
-
-       if (GRAPHICS_VER(uncore->i915) >= 11) {
-               mcr_mask = GEN11_MCR_SLICE_MASK | GEN11_MCR_SUBSLICE_MASK;
-               mcr_ss = GEN11_MCR_SLICE(slice) | GEN11_MCR_SUBSLICE(subslice);
-
-               /*
-                * Wa_22013088509
-                *
-                * The setting of the multicast/unicast bit usually wouldn't
-                * matter for read operations (which always return the value
-                * from a single register instance regardless of how that bit
-                * is set), but some platforms have a workaround requiring us
-                * to remain in multicast mode for reads.  There's no real
-                * downside to this, so we'll just go ahead and do so on all
-                * platforms; we'll only clear the multicast bit from the mask
-                * when exlicitly doing a write operation.
-                */
-               if (rw_flag == FW_REG_WRITE)
-                       mcr_mask |= GEN11_MCR_MULTICAST;
-       } else {
-               mcr_mask = GEN8_MCR_SLICE_MASK | GEN8_MCR_SUBSLICE_MASK;
-               mcr_ss = GEN8_MCR_SLICE(slice) | GEN8_MCR_SUBSLICE(subslice);
-       }
-
-       old_mcr = mcr = intel_uncore_read_fw(uncore, GEN8_MCR_SELECTOR);
-
-       mcr &= ~mcr_mask;
-       mcr |= mcr_ss;
-       intel_uncore_write_fw(uncore, GEN8_MCR_SELECTOR, mcr);
-
-       if (rw_flag == FW_REG_READ)
-               val = intel_uncore_read_fw(uncore, reg);
-       else
-               intel_uncore_write_fw(uncore, reg, value);
-
-       mcr &= ~mcr_mask;
-       mcr |= old_mcr & mcr_mask;
-
-       intel_uncore_write_fw(uncore, GEN8_MCR_SELECTOR, mcr);
-
-       return val;
-}
-
-static u32 uncore_rw_with_mcr_steering(struct intel_uncore *uncore,
-                                      i915_reg_t reg, u8 rw_flag,
-                                      int slice, int subslice,
-                                      u32 value)
-{
-       enum forcewake_domains fw_domains;
-       u32 val;
-
-       fw_domains = intel_uncore_forcewake_for_reg(uncore, reg,
-                                                   rw_flag);
-       fw_domains |= intel_uncore_forcewake_for_reg(uncore,
-                                                    GEN8_MCR_SELECTOR,
-                                                    FW_REG_READ | FW_REG_WRITE);
-
-       spin_lock_irq(&uncore->lock);
-       intel_uncore_forcewake_get__locked(uncore, fw_domains);
-
-       val = uncore_rw_with_mcr_steering_fw(uncore, reg, rw_flag,
-                                            slice, subslice, value);
-
-       intel_uncore_forcewake_put__locked(uncore, fw_domains);
-       spin_unlock_irq(&uncore->lock);
-
-       return val;
-}
-
-u32 intel_uncore_read_with_mcr_steering_fw(struct intel_uncore *uncore,
-                                          i915_reg_t reg, int slice, int subslice)
-{
-       return uncore_rw_with_mcr_steering_fw(uncore, reg, FW_REG_READ,
-                                             slice, subslice, 0);
-}
-
-u32 intel_uncore_read_with_mcr_steering(struct intel_uncore *uncore,
-                                       i915_reg_t reg, int slice, int subslice)
-{
-       return uncore_rw_with_mcr_steering(uncore, reg, FW_REG_READ,
-                                          slice, subslice, 0);
-}
-
-void intel_uncore_write_with_mcr_steering(struct intel_uncore *uncore,
-                                         i915_reg_t reg, u32 value,
-                                         int slice, int subslice)
-{
-       uncore_rw_with_mcr_steering(uncore, reg, FW_REG_WRITE,
-                                   slice, subslice, value);
-}
-
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftests/mock_uncore.c"
 #include "selftests/intel_uncore.c"
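The forcewake-table rules stated in the comment block of this file's diff (ranges sorted by offset, no gaps between rows, adjacent same-domain and reserved ranges merged) can be sketched in isolation. The code below is an illustration under assumptions, not the driver's implementation: `struct fw_range`, the `FW_*` bit values, `fw_table_is_watertight()` and `fw_find_domains()` are hypothetical names, and the binary search is simply one way to exploit a sorted table.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical domain bits; the driver's FORCEWAKE_* values differ. */
#define FW_NONE		0u
#define FW_GT		(1u << 0)
#define FW_RENDER	(1u << 1)

#define TABLE_SIZE(t)	(sizeof(t) / sizeof((t)[0]))

struct fw_range {
	uint32_t start;
	uint32_t end;		/* inclusive */
	unsigned int domains;
};

/* The 0x1000-0xffff row merges the GT and reserved ranges from the comment's
 * bspec example into a single entry. */
static const struct fw_range example_table[] = {
	{ 0x0,     0xfff,   FW_NONE },
	{ 0x1000,  0xffff,  FW_GT },
	{ 0x10000, 0x1ffff, FW_RENDER },
};

/* Deliberately broken: nothing covers 0x1000-0x1fff. */
static const struct fw_range gapped_table[] = {
	{ 0x0,    0xfff,  FW_NONE },
	{ 0x2000, 0x2fff, FW_GT },
};

/* Return 1 if every row starts exactly one past the previous row's end,
 * which implies the table is sorted, non-overlapping and gap-free. */
static int fw_table_is_watertight(const struct fw_range *t, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (t[i].end < t[i].start)
			return 0;
		if (i > 0 && t[i].start != t[i - 1].end + 1)
			return 0;
	}
	return 1;
}

/* Binary search over the sorted table; offsets outside it need no domain. */
static unsigned int fw_find_domains(const struct fw_range *t, size_t n,
				    uint32_t offset)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (offset < t[mid].start)
			hi = mid;
		else if (offset > t[mid].end)
			lo = mid + 1;
		else
			return t[mid].domains;
	}
	return FW_NONE;
}
```

Merging the reserved 0x2d00-0x2fff range into its GT neighbours is harmless in this model because a lookup at such an offset merely reports the GT domain, and per the comment the driver never issues MMIO accesses there.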
index 52fe3d89dd2b8ac7b8ef6a343d52c9c3bc190b49..b1fa912a65e75db0eaad8e0206b576893adfc409 100644
@@ -210,14 +210,6 @@ intel_uncore_has_fifo(const struct intel_uncore *uncore)
        return uncore->flags & UNCORE_HAS_FIFO;
 }
 
-u32 intel_uncore_read_with_mcr_steering_fw(struct intel_uncore *uncore,
-                                          i915_reg_t reg,
-                                          int slice, int subslice);
-u32 intel_uncore_read_with_mcr_steering(struct intel_uncore *uncore,
-                                       i915_reg_t reg, int slice, int subslice);
-void intel_uncore_write_with_mcr_steering(struct intel_uncore *uncore,
-                                         i915_reg_t reg, u32 value,
-                                         int slice, int subslice);
 void
 intel_uncore_mmio_debug_init_early(struct intel_uncore_mmio_debug *mmio_debug);
 void intel_uncore_init_early(struct intel_uncore *uncore,
index cdd196783535af05263c437f793bafe03bff0df1..fda9bb79c049d5a97c8dbdcb7d592331b28deac0 100644
@@ -69,6 +69,7 @@ static int intel_shadow_table_check(void)
                { gen11_shadowed_regs, ARRAY_SIZE(gen11_shadowed_regs) },
                { gen12_shadowed_regs, ARRAY_SIZE(gen12_shadowed_regs) },
                { dg2_shadowed_regs, ARRAY_SIZE(dg2_shadowed_regs) },
+               { pvc_shadowed_regs, ARRAY_SIZE(pvc_shadowed_regs) },
        };
        const struct i915_range *range;
        unsigned int i, j;
@@ -115,6 +116,7 @@ int intel_uncore_mock_selftests(void)
                { __gen11_fw_ranges, ARRAY_SIZE(__gen11_fw_ranges), true },
                { __gen12_fw_ranges, ARRAY_SIZE(__gen12_fw_ranges), true },
                { __xehp_fw_ranges, ARRAY_SIZE(__xehp_fw_ranges), true },
+               { __pvc_fw_ranges, ARRAY_SIZE(__pvc_fw_ranges), true },
        };
        int err, i;
 
index 67530bfef129c87a1dbd0b461ee42314cfe1df3a..cb0d5b7200c7fe51b5d28e2bb9cd003b47878f20 100644
@@ -10,24 +10,24 @@ struct agp_bridge_data;
 struct pci_dev;
 struct sg_table;
 
-void intel_gtt_get(u64 *gtt_total,
-                  phys_addr_t *mappable_base,
-                  resource_size_t *mappable_end);
+void intel_gmch_gtt_get(u64 *gtt_total,
+                       phys_addr_t *mappable_base,
+                       resource_size_t *mappable_end);
 
 int intel_gmch_probe(struct pci_dev *bridge_pdev, struct pci_dev *gpu_pdev,
                     struct agp_bridge_data *bridge);
 void intel_gmch_remove(void);
 
-bool intel_enable_gtt(void);
+bool intel_gmch_enable_gtt(void);
 
-void intel_gtt_chipset_flush(void);
-void intel_gtt_insert_page(dma_addr_t addr,
-                          unsigned int pg,
-                          unsigned int flags);
-void intel_gtt_insert_sg_entries(struct sg_table *st,
-                                unsigned int pg_start,
-                                unsigned int flags);
-void intel_gtt_clear_range(unsigned int first_entry, unsigned int num_entries);
+void intel_gmch_gtt_flush(void);
+void intel_gmch_gtt_insert_page(dma_addr_t addr,
+                               unsigned int pg,
+                               unsigned int flags);
+void intel_gmch_gtt_insert_sg_entries(struct sg_table *st,
+                                     unsigned int pg_start,
+                                     unsigned int flags);
+void intel_gmch_gtt_clear_range(unsigned int first_entry, unsigned int num_entries);
 
 /* Special gtt memory types */
 #define AGP_DCACHE_MEMORY      1
index a2def7b270097836327d738e9eb3240fdf248ab4..de49b68b4fc87d47ae5f05a6d72c19c0ac56ac48 100644
@@ -3443,6 +3443,22 @@ struct drm_i915_gem_create_ext {
  * At which point we get the object handle in &drm_i915_gem_create_ext.handle,
  * along with the final object size in &drm_i915_gem_create_ext.size, which
  * should account for any rounding up, if required.
+ *
+ * Note that userspace has no means of knowing the current backing region
+ * for objects where @num_regions is larger than one. The kernel will only
+ * ensure that the priority order of the @regions array is honoured, either
+ * when initially placing the object, or when moving memory around due to
+ * memory pressure.
+ *
+ * On Flat-CCS capable HW, compression is supported for objects residing
+ * in I915_MEMORY_CLASS_DEVICE. If such a (compressed) object also lists
+ * another memory class in @regions and is migrated (by i915, due to memory
+ * constraints) to a non-I915_MEMORY_CLASS_DEVICE region, i915 would need to
+ * decompress the content. However, i915 doesn't have the information
+ * required to decompress objects that userspace compressed.
+ *
+ * So i915 supports Flat-CCS only on objects which can reside exclusively in
+ * I915_MEMORY_CLASS_DEVICE regions.
  */
 struct drm_i915_gem_create_ext_memory_regions {
        /** @base: Extension link. See struct i915_user_extension. */