My Project
Modules
  User-side Base Memory APIs
  User-side Base Job Dispatcher APIs
  User-side Base GPU Property Query APIs
  User-side Base core APIs
Macros
  #define GPU_MAX_JOB_SLOTS 16
The compile-time properties (Platform Config, Midgard Compile-time properties) are exposed as pre-processor macros.
Complementing the compile-time properties are the Dynamic GPU Properties, which act as a conduit for the Midgard Configuration Discovery.
In general, the dynamic properties are present to verify that the platform has been configured correctly with the right set of Platform Config Compile-time Properties.
As a consistent guide across the entire DDK, the choice for dynamic or compile-time should consider the following, in order:
We cannot provide an exhaustive set of examples, nor can we provide a rule for every possible situation. Use common sense, and think about: what the rest of the driver will be doing; how the compiler might represent the value if it is a compile-time constant; and whether an OEM shipping multiple devices would benefit much more from a single DDK binary than from insignificant micro-optimizations.
Dynamic GPU properties are presented in two sets:
The raw properties in gpu_raw_gpu_props are necessary to allow a user of the Mali Tools (e.g. PAT) to determine "Why is this device behaving differently?". In this case, all information about the configuration is potentially useful, but it does not need to be processed by the driver. Instead, the raw registers can be processed by the Mali Tools software on the host PC.
The properties returned extend the Midgard Configuration Discovery registers. For example, GPU clock speed is not specified in the Midgard Architecture, but is necessary for OpenCL's clGetDeviceInfo() function.
The GPU properties are obtained by calling _mali_base_get_gpu_props(), which simply returns a pointer to a const base_gpu_props structure. The structure is constant for the life of a base context: multiple calls to _mali_base_get_gpu_props() on the same base context return the same pointer to the same constant structure. This avoids cache pollution of the common data.
This pointer must not be freed, because it does not point to the start of a region allocated by the memory allocator; instead, just close the base_context.
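The pointer lifetime rules above can be illustrated with a minimal, self-contained sketch. Every name in it (example_base_context, example_gpu_props, example_get_gpu_props and the field names) is a hypothetical stand-in invented for illustration; none of it is the real DDK API, and the real base_gpu_props layout is not shown here. The sketch only assumes that the properties live inside the context, which is one way the returned pointer could fail to be the start of an allocated region.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration only; NOT the real DDK types. */
typedef struct example_gpu_props {
    uint32_t max_freq_khz;   /* placeholder field, not a real property name */
    uint64_t shader_present; /* placeholder field, not a real property name */
} example_gpu_props;

typedef struct example_base_context {
    int id;                  /* other context state comes first ...          */
    example_gpu_props props; /* ... so &props is not the allocation's start  */
} example_base_context;

/* Creates a context and fills the properties exactly once (placeholder values). */
example_base_context *example_context_create(void)
{
    example_base_context *ctx = calloc(1, sizeof(*ctx));
    if (ctx != NULL) {
        ctx->props.max_freq_khz = 500000u; /* made-up value */
        ctx->props.shader_present = 0xFu;  /* made-up value */
    }
    return ctx;
}

/* Returns the same const pointer on every call for a given context. */
const example_gpu_props *example_get_gpu_props(const example_base_context *ctx)
{
    assert(ctx != NULL);
    return &ctx->props;
}

/* The properties are released only by closing the context that owns them. */
void example_context_close(example_base_context *ctx)
{
    free(ctx);
}
```

In this sketch a caller never passes the result of example_get_gpu_props() to free(); it calls example_context_close() on the owning context, mirroring the rule stated above.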
The Platform Config File sets up GPU properties that are specific to a certain platform. Properties that are 'Implementation Defined' in the Midgard Architecture spec are placed here.
The System Integrator creates the following structure:
They then edit plat_config.h, using the example plat_config.h files as a guide.
At the very least, the customer must set CONFIG_GPU_CORE_TYPE, and will receive a helpful #error message if they do not do this correctly. This selects the Reference Configuration for the Midgard Implementation. The rationale behind this decision (against asking the customer to write #include <gpus/mali_t600.h> in their plat_config.h) is as follows:
However, there is nothing to prevent the customer from using #include to organize their own configuration files hierarchically.
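As a rough illustration of the mandatory CONFIG_GPU_CORE_TYPE setting and the #error diagnostic mentioned above, a configuration fragment might look like the sketch below. Only CONFIG_GPU_CORE_TYPE comes from the text; the EXAMPLE_GPU_CORE_TYPE_* values, the error wording, and the placement of the check are assumptions made for illustration and may differ from the actual DDK headers.

```c
/* plat_config.h (illustrative fragment; the EXAMPLE_* names are hypothetical) */
#define EXAMPLE_GPU_CORE_TYPE_A 1
#define EXAMPLE_GPU_CORE_TYPE_B 2

/* The System Integrator selects the core type for this platform. */
#define CONFIG_GPU_CORE_TYPE EXAMPLE_GPU_CORE_TYPE_A

/* Sketch of a consumer-side check that rejects a missing or unknown setting. */
#if !defined(CONFIG_GPU_CORE_TYPE)
#error "CONFIG_GPU_CORE_TYPE must be set in plat_config.h"
#elif (CONFIG_GPU_CORE_TYPE != EXAMPLE_GPU_CORE_TYPE_A) && \
      (CONFIG_GPU_CORE_TYPE != EXAMPLE_GPU_CORE_TYPE_B)
#error "CONFIG_GPU_CORE_TYPE is set to an unknown core type"
#endif

/* Exposes the selection as a constant so the fragment is a complete translation unit. */
enum { EXAMPLE_CORE_TYPE_SELECTED = CONFIG_GPU_CORE_TYPE };
```

Removing the CONFIG_GPU_CORE_TYPE definition from this fragment makes compilation stop at the first #error, which is the kind of early, readable failure the text describes.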
The mechanism for the header file processing is as follows:
During Base Context Create time, user-side makes a single kernel call:
The kernel side fills the provided buffer with the entire processed base_gpu_props structure, because this information is required on both the user and kernel sides; it does not make sense to decode it twice.
Coherency groups must be derived from the bitmasks, but this can be done kernel-side, and just once at kernel startup. Coherency groups must already be known kernel-side to support chains that specify an 'Only Coherent Group' SW requirement or an 'Only Coherent Group with Tiler' SW requirement.
Creation of the coherent group data is done at device-driver startup, and so is a one-time cost. This will most likely involve a loop with CLZ, shifting, and bit clearing on the L2_PRESENT mask, depending on whether the system is L2 Coherent. The number of shader cores is determined by a population count, since faulty cores may be disabled during production, producing a non-contiguous mask.
The memory requirements for this algorithm can be determined either by a u64 population count on the L2_PRESENT mask (a LUT helper is already required for the above), or by the simple assumption that there can be no more than 16 coherent groups, since core groups are typically 4 cores.
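The bit-manipulation primitives mentioned above (population count on a possibly non-contiguous mask, and a CLZ loop with bit clearing) can be sketched as follows. This is not the driver's implementation: the function names and EXAMPLE_MAX_GROUPS are hypothetical, the loops are portable C rather than a LUT or hardware CLZ instruction, and the actual assignment of cores to coherent groups (which depends on whether the system is L2 Coherent) is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_MAX_GROUPS 16 /* upper bound suggested in the text */

/* Population count: each iteration clears the lowest set bit. */
unsigned example_popcount64(uint64_t mask)
{
    unsigned count = 0;
    while (mask != 0) {
        mask &= mask - 1;
        count++;
    }
    return count;
}

/* Portable count-leading-zeros for 64-bit values; returns 64 for zero. */
unsigned example_clz64(uint64_t mask)
{
    unsigned n = 0;
    if (mask == 0)
        return 64;
    while ((mask & (UINT64_C(1) << 63)) == 0) {
        mask <<= 1;
        n++;
    }
    return n;
}

/*
 * Walks a "present" mask from the highest set bit down, storing each bit
 * index in out[] and clearing it from the mask. Returns the number of
 * indices written, which never exceeds capacity.
 */
size_t example_enumerate_present(uint64_t present, unsigned out[], size_t capacity)
{
    size_t n = 0;
    assert(out != NULL);
    while (present != 0 && n < capacity) {
        unsigned bit = 63u - example_clz64(present);
        out[n++] = bit;
        present &= ~(UINT64_C(1) << bit);
    }
    return n;
}
```

With a fixed array of EXAMPLE_MAX_GROUPS entries, the enumeration needs no dynamic allocation; alternatively, example_popcount64() on the mask gives the exact number of entries to reserve, matching the two sizing options described above.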