User-side Base APIs

Modules

 User-side Base Memory APIs
 
 User-side Base Job Dispatcher APIs
 
 User-side Base GPU Property Query APIs
 
 User-side Base core APIs
 

Macros

#define GPU_MAX_JOB_SLOTS   16
 

Detailed Description

User-side Base GPU Property Query API

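The User-side Base GPU Property Query API encapsulates two sub-modules:

  • Dynamic GPU Properties
  • Platform Config Compile-time Properties

There is a related third module outside of Base, which is owned by the MIDG module:

  • Midgard Compile-time Properties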

Base only deals with properties that vary between different Midgard implementations: the Dynamic GPU properties and the Platform Config properties. For properties that are constant for the Midgard Architecture, refer to the MIDG module. However, we will discuss their relevance here just to provide background information.

About the GPU Properties in Base and MIDG modules

The compile-time properties (Platform Config, Midgard Compile-time properties) are exposed as pre-processor macros.

Complementing the compile-time properties are the Dynamic GPU Properties, which act as a conduit for the Midgard Configuration Discovery.

In general, the dynamic properties are present to verify that the platform has been configured correctly with the right set of Platform Config Compile-time Properties.

As a consistent guide across the entire DDK, the choice between a dynamic and a compile-time property should consider the following, in order:

  1. Can the code be written so that it doesn't need to know the implementation limits at all?
  2. If you need the limits, get the information from the Dynamic Property lookup. This should be done once as you fetch the context, and then cached as part of the context data structure, so it's cheap to access.
  3. If there's a clear and arguable inefficiency in using Dynamic Properties, then use a Compile-Time Property (Platform Config, or Midgard Compile-time property). Examples of where this might be sensible follow:
    • Part of a critical inner-loop
    • Frequent re-use throughout the driver, causing significant extra load instructions or control flow that would be worthwhile optimizing out.
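
The guidance above can be sketched in C as follows. This is a minimal illustration, not DDK code: struct example_gpu_props, struct example_context, the num_job_slots field and both functions are hypothetical names invented for this sketch. Only GPU_MAX_JOB_SLOTS comes from this page. The dynamic value is looked up once when the context is set up and cached, while the compile-time macro is used only where a constant is needed (sizing an array).

```c
#include <stdint.h>

#define GPU_MAX_JOB_SLOTS 16 /* compile-time upper bound, as in the Macros list */

/* Hypothetical stand-in for the dynamic property structure. */
struct example_gpu_props {
    uint32_t num_job_slots; /* assumed field: slots actually implemented */
};

/* Hypothetical context that caches the limit when it is created. */
struct example_context {
    const struct example_gpu_props *props;
    uint32_t num_job_slots;                    /* cached dynamic value */
    uint32_t jobs_per_slot[GPU_MAX_JOB_SLOTS]; /* sized by the compile-time bound */
};

static void example_context_init(struct example_context *ctx,
                                 const struct example_gpu_props *props)
{
    uint32_t slot;

    ctx->props = props;
    /* One lookup at context creation; clamp so array indexing stays safe. */
    ctx->num_job_slots = props->num_job_slots < GPU_MAX_JOB_SLOTS
                             ? props->num_job_slots
                             : GPU_MAX_JOB_SLOTS;
    for (slot = 0; slot < GPU_MAX_JOB_SLOTS; slot++)
        ctx->jobs_per_slot[slot] = 0;
}

/* Hot path: reads only the cached copy and never re-queries the properties. */
static uint32_t example_total_jobs(const struct example_context *ctx)
{
    uint32_t slot, total = 0;

    for (slot = 0; slot < ctx->num_job_slots; slot++)
        total += ctx->jobs_per_slot[slot];
    return total;
}
```

With this shape, a single binary adapts to the number of slots reported at run time, and the compile-time constant only fixes the worst-case storage.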

We cannot provide an exhaustive set of examples, neither can we provide a rule for every possible situation. Use common sense, and think about: what the rest of the driver will be doing; how the compiler might represent the value if it is a compile-time constant; whether an OEM shipping multiple devices would benefit much more from a single DDK binary, instead of insignificant micro-optimizations.

Dynamic GPU Properties

Dynamic GPU properties are presented in two sets:

  1. The commonly used properties in base_gpu_props, which have been unpacked from GPU register bitfields.
  2. The full set of raw, unprocessed properties in gpu_raw_gpu_props (also a member of base_gpu_props). All of these are in packed form, exactly as presented by the GPU registers themselves.
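
The relationship between the two sets can be sketched as below. The register layout, the field names and every example_ identifier are invented for illustration and do not describe a real Midgard register or the real base_gpu_props layout. The sketch only shows the pattern: the raw register value is kept as-is, and selected bitfields are also stored in unpacked form.

```c
#include <stdint.h>

/* Hypothetical raw set: register values stored exactly as read. */
struct example_raw_props {
    uint32_t example_features; /* packed register value */
};

/* Hypothetical processed set, with the raw set kept as a member. */
struct example_props {
    uint32_t log2_line_size;  /* unpacked from bits 0..7 (invented layout) */
    uint32_t log2_cache_size; /* unpacked from bits 16..23 (invented layout) */
    struct example_raw_props raw;
};

/* Extract `width` bits starting at `shift`; width must be below 32. */
static uint32_t example_bits(uint32_t reg, unsigned shift, unsigned width)
{
    return (reg >> shift) & ((UINT32_C(1) << width) - 1u);
}

/* Fill both the raw copy and the unpacked fields from one register value. */
static void example_unpack(struct example_props *props, uint32_t reg)
{
    props->raw.example_features = reg;
    props->log2_line_size = example_bits(reg, 0, 8);
    props->log2_cache_size = example_bits(reg, 16, 8);
}
```

Driver code would normally read the unpacked fields, while a host-side tool could decode the raw member on its own.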

The raw properties in gpu_raw_gpu_props are necessary to allow a user of the Mali Tools (e.g. PAT) to determine "Why is this device behaving differently?". In this case, all information about the configuration is potentially useful, but it does not need to be processed by the driver. Instead, the raw registers can be processed by the Mali Tools software on the host PC.

The properties returned extend the Midgard Configuration Discovery registers. For example, GPU clock speed is not specified in the Midgard Architecture, but is necessary for OpenCL's clGetDeviceInfo() function.

The GPU properties are obtained by a call to _mali_base_get_gpu_props(). This simply returns a pointer to a const base_gpu_props structure, which is constant for the life of a base context. Multiple calls to _mali_base_get_gpu_props() on the same base context return the same pointer to a constant structure. This avoids cache pollution of the common data.

This pointer must not be freed, because it does not point to the start of a region allocated by the memory allocator; instead, just close the base_context.
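
One way to picture this ownership rule is the sketch below. It assumes, purely for illustration, that the property structure is embedded inside the context object. All example_ names are hypothetical, and the real signature of _mali_base_get_gpu_props() is not shown on this page, so a stand-in getter is used instead.

```c
#include <stdlib.h>

/* Hypothetical property structure. */
struct example_gpu_props {
    unsigned core_count;
};

/* Hypothetical context: the properties live inside it, not in their own
 * allocation, so a pointer to them is not the start of a heap block. */
struct example_base_context {
    int handle;
    struct example_gpu_props props;
};

static struct example_base_context *example_context_open(unsigned core_count)
{
    struct example_base_context *ctx = calloc(1, sizeof(*ctx));

    if (ctx != NULL)
        ctx->props.core_count = core_count;
    return ctx;
}

/* Stand-in getter: every call returns the same pointer for a given context. */
static const struct example_gpu_props *
example_get_gpu_props(const struct example_base_context *ctx)
{
    return &ctx->props;
}

/* Closing the context is the only way the properties are released. */
static void example_context_close(struct example_base_context *ctx)
{
    free(ctx);
}
```

Under that assumption, passing the returned pointer to free() would hand the allocator an address in the middle of a block, which is why only the context itself is closed.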

Platform Config Compile-time Properties

The Platform Config File sets up GPU properties that are specific to a certain platform. Properties that are 'Implementation Defined' in the Midgard Architecture spec are placed here.

Note
Reference configurations are provided for Midgard Implementations, such as the Mali-T600 family. The customer need not repeat this information, and can select one of these reference configurations. For example, VA_BITS, PA_BITS and the maximum number of samples per pixel might vary between Midgard Implementations, but not for platforms using the Mali-T604. This information is placed in the reference configuration files.

The System Integrator creates the following structure:

They then edit plat_config.h, using the example plat_config.h files as a guide.

At the very least, the customer must set CONFIG_GPU_CORE_TYPE, and will receive a helpful #error message if they do not do this correctly. This selects the Reference Configuration for the Midgard Implementation. The rationale behind this decision (against asking the customer to write #include <gpus/mali_t600.h> in their plat_config.h) is as follows:

However, there is nothing to prevent the customer using #include to organize their own configuration files hierarchically.
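
A rough sketch of how a CONFIG_GPU_CORE_TYPE check with an #error could select a reference configuration is given below. Only the name CONFIG_GPU_CORE_TYPE comes from this page. The core-type identifiers, the EXAMPLE_VA_BITS macro and its values are placeholders invented for this sketch and are not real hardware figures. The real mechanism would pull in a reference configuration header rather than define values inline.

```c
/* Hypothetical core-type identifiers (placeholders, not DDK values). */
#define EXAMPLE_GPU_CORE_TYPE_A 1
#define EXAMPLE_GPU_CORE_TYPE_B 2

/* The one line the integrator is required to provide in plat_config.h. */
#define CONFIG_GPU_CORE_TYPE EXAMPLE_GPU_CORE_TYPE_A

/* Selection logic: fail loudly if the setting is missing or unknown. */
#if !defined(CONFIG_GPU_CORE_TYPE)
#error "plat_config.h must define CONFIG_GPU_CORE_TYPE"
#elif CONFIG_GPU_CORE_TYPE == EXAMPLE_GPU_CORE_TYPE_A
#define EXAMPLE_VA_BITS 32 /* placeholder value for illustration only */
#elif CONFIG_GPU_CORE_TYPE == EXAMPLE_GPU_CORE_TYPE_B
#define EXAMPLE_VA_BITS 48 /* placeholder value for illustration only */
#else
#error "CONFIG_GPU_CORE_TYPE names an unknown core type"
#endif

/* Code elsewhere sees only the resulting compile-time property. */
static int example_va_bits(void)
{
    return EXAMPLE_VA_BITS;
}
```

Removing the CONFIG_GPU_CORE_TYPE definition from this sketch stops compilation at the first #error, which is the behaviour the text describes.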

The mechanism for the header file processing is as follows:

Kernel Operation

During Base Context Create time, user-side makes a single kernel call:

The kernel side fills in the entire processed base_gpu_props structure that is provided, because this information is required on both the user and kernel sides; it does not make sense to decode it twice.

Coherency groups must be derived from the bitmasks, but this can be done kernel-side, and just once at kernel startup. Coherency groups must already be known kernel-side to support chains that specify an 'Only Coherent Group' SW requirement or an 'Only Coherent Group with Tiler' SW requirement.

Coherency Group calculation

Creation of the coherent group data is done at device-driver startup, and so is one-time. This will most likely involve a loop with CLZ, shifting, and bit clearing on the L2_PRESENT mask, depending on whether the system is L2 Coherent. The number of shader cores is determined by a population count, since faulty cores may be disabled during production, producing a non-contiguous mask.

The memory requirements for this algorithm can be determined either by a u64 population count on the L2_PRESENT mask (a LUT helper is already required for the above), or by the simple assumption that there can be no more than 16 coherent groups, since core groups are typically 4 cores.
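
The calculation described above might look like the following sketch. It rests on stated assumptions rather than on the driver source: each set bit in the L2_PRESENT mask is taken to mark the lowest core index of one group, the system is assumed to be L2 coherent, and all example_ names are hypothetical. It also isolates the lowest set bit instead of using CLZ, and uses a simple loop in place of a LUT-based population count.

```c
#include <stdint.h>

#define EXAMPLE_MAX_COHERENT_GROUPS 16 /* the upper bound suggested above */

/* Hypothetical record for one coherent group. */
struct example_coherent_group {
    uint64_t core_mask; /* shader cores present in this group */
    unsigned num_cores; /* population count of core_mask */
};

/* Population count by repeatedly clearing the lowest set bit. */
static unsigned example_popcount64(uint64_t value)
{
    unsigned count = 0;

    while (value != 0) {
        value &= value - 1;
        count++;
    }
    return count;
}

/* Walk the set bits of l2_present from the lowest upwards. Each group
 * spans from its own start bit up to, but not including, the next one. */
static unsigned example_build_groups(uint64_t l2_present,
                                     uint64_t shader_present,
                                     struct example_coherent_group *groups,
                                     unsigned max_groups)
{
    uint64_t remaining = l2_present;
    unsigned count = 0;

    while (remaining != 0 && count < max_groups) {
        uint64_t start = remaining & (~remaining + 1); /* lowest set bit */
        uint64_t next, span, mask;

        remaining &= remaining - 1;             /* clear that bit */
        next = remaining & (~remaining + 1);    /* 0 when no group follows */
        span = (next - 1) & ~(start - 1);       /* bits in [start, next) */
        mask = shader_present & span;           /* drop disabled cores */
        if (mask != 0) {
            groups[count].core_mask = mask;
            groups[count].num_cores = example_popcount64(mask);
            count++;
        }
    }
    return count;
}
```

For storage, a caller could either size the array from example_popcount64(l2_present) or use a fixed array of EXAMPLE_MAX_COHERENT_GROUPS entries, mirroring the two options given above.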