GB2639005A

GB2639005A - Translation of identifiers

Info

Publication number: GB2639005A
Application number: GB2403233.6A
Authority: GB
Inventors: Maria Andreozzi Matteo; Stuart Mackenzie Morse James; Owen Hawkins Peter
Original assignee: ARM Ltd; Advanced Risc Machines Ltd
Current assignee: ARM Ltd
Priority date: 2024-03-06
Filing date: 2024-03-06
Publication date: 2025-09-10
Also published as: WO2025186532A8; WO2025186532A1; GB202403233D0

Abstract

Apparatus 30 is provided with translation circuitry 34 configured to translate an input resource access request received from a first domain 32 to an output resource access request for a second domain 36, the input resource access request comprising a first identifier 40, e.g. a partition identifier PartID, indicating a requestor process in the first domain that generated the input resource access request, and the output resource access request comprising a second identifier 44 translated from the first identifier based on translation information 42 defining a mapping between identifiers in the first domain and the second domain. The first identifier is omitted from the output resource access request. Resource access requests may be issued by virtual machines (VMs), operating systems (OSs) or applications in the first domain. Providing translation circuitry addresses the problem that the first identifier may not be in a format interpretable in the second domain. It also allows scalability with multiple domains. Translation information may comprise look-up tables (LUTs). Identifiers in each respective domain may be used for resource allocation control. The translation circuitry may set an output security state of the output translation request based on an input memory access request specifying a security state identifier.

Description

TRANSLATION OF IDENTIFIERS

The present disclosure relates to data processing. Furthermore, the present disclosure relates to an apparatus, a method, and a computer program.

Some apparatuses use identifiers to indicate a requestor of a resource access request. The identifier allows the requestor to be identified from the resource access request and may be used for the purpose of resource allocation, e.g., resource partitioning.

In some configurations there is provided an apparatus comprising: translation circuitry configured to translate an input resource access request received from a first domain to an output resource access request for a second domain, the input resource access request comprising a first identifier indicating a requestor process in the first domain that generated the input resource access request, and the output resource access request comprising a second identifier translated from the first identifier based on translation information defining a mapping between identifiers in the first domain and the second domain, wherein the first identifier is omitted from the output resource access request.

In some configurations there is provided a system comprising: the apparatus; first processing circuitry configured to implement the first domain; and second processing circuitry configured to implement the second domain, wherein the translation circuitry is configured at an interface between the first domain and the second domain.

In some configurations there is provided a method comprising: translating an input resource access request received from a first domain to an output resource access request for a second domain, the input resource access request comprising a first identifier indicating a requestor process in the first domain that generated the input resource access request, and the output resource access request comprising a second identifier translated from the first identifier based on translation information defining a mapping between identifiers in the first domain and the second domain, wherein the first identifier is omitted from the output resource access request.

In some configurations there is provided a computer program for controlling a host data processing apparatus to provide an instruction execution environment, the computer program comprising: translation program logic configured to translate an input resource access request received from a first domain to an output resource access request for a second domain, the input resource access request comprising a first identifier indicating a requestor process in the first domain that generated the input resource access request, and the output resource access request comprising a second identifier translated from the first identifier based on translation information defining a mapping between identifiers in the first domain and the second domain, wherein the first identifier is omitted from the output resource access request.

In some configurations the computer program is stored on a computer readable storage medium. In some configurations the computer readable storage medium is a non-transitory computer readable storage medium.

The present techniques will be described further, by way of example only, with reference to configurations thereof as illustrated in the accompanying drawings, in which:
Figure 1 schematically illustrates an apparatus according to some configurations of the present techniques;
Figure 2 schematically illustrates processing circuitry according to some configurations of the present techniques;
Figure 3 schematically illustrates a CPU cluster according to some configurations of the present techniques;
Figure 4 schematically illustrates an apparatus according to some configurations of the present techniques;
Figure 5 schematically illustrates an apparatus according to some configurations of the present techniques;
Figure 6 schematically illustrates an apparatus according to some configurations of the present techniques;
Figure 7 schematically illustrates use of a look up table according to some configurations of the present techniques;
Figure 8 schematically illustrates translation of an identifier according to some configurations of the present techniques;
Figure 9 schematically illustrates translation of an identifier according to some configurations of the present techniques;
Figure 10 schematically illustrates a sequence of steps carried out by an apparatus according to some configurations of the present techniques; and
Figure 11 schematically illustrates simulator code according to some configurations of the present techniques.

At least some configurations provide an apparatus comprising translation circuitry configured to translate an input resource access request received from a first domain to an output resource access request for a second domain, the input resource access request comprising a first identifier indicating a requestor process in the first domain that generated the input resource access request, and the output resource access request comprising a second identifier translated from the first identifier based on translation information defining a mapping between identifiers in the first domain and the second domain, wherein the first identifier is omitted from the output resource access request.

Resource access requests may be issued from a requestor process. The requestor process may be, for example, a virtual machine (VM), an operating system (OS), or an application executed within a first domain of the apparatus. The requestor process may be a process running on a processor in a multiprocessor environment. In order to allow the requestor to be identified, for example, by a process or device receiving the resource access requests, the resource access requests comprise a first identifier identifying the requestor process in the first domain. The first identifier is defined within the first domain so that any process or device within the first domain that receives multiple resource access requests within that domain can identify resource access requests that have originated from (been requested by) a same requestor process.

Provision for the first identifier in the first domain of the apparatus may be defined during manufacturing and may vary between different domains. For example, a first identifier in the first domain may not be defined in a second domain (where the second domain is a different domain to the first domain and may have been manufactured to use alternately configured identifiers) and/or may be alternately assigned in the second domain.

The inventors have recognised that where two such domains are required to communicate with one another, there may be difficulties associated with the use of different identifiers in the first domain and the second domain. For example, if two domains were to be coupled to one another within the apparatus and a requestor process in the first domain was to issue a request for access to a resource in the second domain, then the first identifier comprised in the resource access request may not be in a format that can be interpreted in the second domain. Whilst it may be possible to implement an apparatus in which a second identifier, assigned in the second domain, is comprised in the resource access request in addition to the first identifier, such an implementation would require a greater amount of data to be transmitted potentially increasing overheads and power consumption. Furthermore, this approach would not scale well with multiple domains. The apparatus is therefore provided with translation circuitry to translate a first identifier received as part of an input resource access request in the first domain to a second identifier comprised in an output resource access request in the second domain. The translation between the first identifier and the second identifier is handled within the translation circuitry and the translation circuitry is configured to omit (e.g., to not include and/or to withhold) the first identifier from the output resource access request. The provision of such translation circuitry reduces the overheads associated with alternate identifier provisions in each of a first domain and a second domain and improves scalability.
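By way of illustration only, the translation described above can be modelled in software as follows. This is a minimal sketch rather than the disclosed circuitry: the `ResourceAccessRequest` class, the `translate_request` function, the `address` field, and the example mapping values are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceAccessRequest:
    """Hypothetical model of a resource access request."""
    identifier: int  # requestor identifier, valid only in the issuing domain
    address: int     # target of the access, left unchanged by translation


def translate_request(request, translation_info):
    """Build the output request for the second domain."""
    second_identifier = translation_info[request.identifier]
    # Only the second identifier is carried forward; the first identifier
    # is not copied into the output request.
    return ResourceAccessRequest(identifier=second_identifier,
                                 address=request.address)


# Hypothetical mapping from first-domain to second-domain identifiers.
translation_info = {0x12: 0x3, 0x47: 0x1}

input_request = ResourceAccessRequest(identifier=0x47, address=0x8000)
output_request = translate_request(input_request, translation_info)
```

In this sketch the output request has the same size as the input request, which reflects the point made above that carrying both identifiers would increase the amount of data transmitted.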

In some configurations a number of identifiers available for assignment in the first identifier space and the second identifier space are defined independently from one another. The independence in the definition of the identifiers available in each domain means that the number of identifiers provided in one domain was determined without making use of knowledge of the number of identifiers provided in the other domain. For example, the identifiers may be assigned at a time of manufacture by different manufacturers without those manufacturers communicating with one another relating to the number of identifiers. Alternatively, the assignment of identifiers in the first domain and the second domain may be based on different implementation criteria without consideration of the implementation criteria in the other domain. The number of identifiers that are available in the first domain and the number of identifiers that are available in the second domain may be different to one another. The first domain may, for example, be a large domain comprising a number of processors, whilst the second domain may be a much smaller domain comprising a single processor and supporting far fewer identifiers than the first domain. For example, the identifiers in the first domain could be represented using an 8-bit number, whilst the identifiers in the second domain may be represented using a 4-bit number. Alternatively, the number of identifiers that are available in the first domain and the second domain may, coincidentally, be the same as one another. However, where this is the case the assignment and application of these identifiers may be different in each of the first domain and the second domain.

In some configurations the translation information is dynamically configurable. Whilst the provision of a fixed translation between the first domain and the second domain may offer the reduced overheads and improved scalability discussed above, such an approach requires a priori knowledge of the identifiers available in both the first domain and the second domain. Such information may not be available at the time of system assembly, for example, because the first domain was constructed by a first manufacturer and the second domain was constructed independently by a second manufacturer with the domains being combined by a third manufacturer. By allowing the translation information to be dynamically configurable, a greater flexibility can be provided and incompatibility between domains, e.g., manufactured by different manufacturers, can be reduced.

In some configurations the translation circuitry comprises one or more registers and the translation information is defined based on data stored in the one or more registers. The registers may, for example, identify one or more rules that allow for the translation of identifiers between the first domain and the second domain. The registers may be configurable by one or more processes executing in the first domain and/or in the second domain, allowing either of the first domain and the second domain to configure the translation circuitry.

In some configurations the one or more registers are memory mapped registers. The memory mapped registers may be mapped to a region of memory that is accessible by processes running in one or both of the domains. The memory mapped registers allow the processes wishing to modify the memory mapped registers to write data to the memory mapped location (e.g., to the addresses in memory that correspond to the memory mapped registers). This approach provides an efficient way to provide dynamic configurability of the translation information.
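As a purely illustrative software model of memory mapped registers, the sketch below treats each register as holding the second identifier for one first identifier, and models a store to the mapped address range as an update of the translation information. The class name `TranslationRegisters`, the base address, and the register width are assumptions made for the example and are not taken from the disclosure.

```python
class TranslationRegisters:
    """Hypothetical memory mapped register bank, one mapping per register."""

    def __init__(self, base_address, count, width_bytes=4):
        self.base_address = base_address
        self.width_bytes = width_bytes
        self.values = [0] * count

    def write(self, address, value):
        """Model a store to an address that corresponds to a register."""
        offset = address - self.base_address
        index, remainder = divmod(offset, self.width_bytes)
        if remainder or not 0 <= index < len(self.values):
            raise ValueError("address does not correspond to a register")
        self.values[index] = value

    def translate(self, first_identifier):
        """Look up the second identifier currently configured."""
        return self.values[first_identifier]


BASE = 0x4000_0000  # assumed base address of the mapped region
registers = TranslationRegisters(BASE, count=16)

# A process reconfigures the mapping for first identifier 5 with one store.
registers.write(BASE + 5 * 4, 2)
```

After the store, `registers.translate(5)` returns the newly configured second identifier without any other part of the model being rebuilt.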

In some configurations the apparatus comprises control circuitry configured to maintain the translation information by modifying the one or more registers. The control circuitry may be provided in addition to the use of memory mapped registers, thereby providing multiple approaches by which the registers can be dynamically updated. The control circuitry may be configured to respond to one or more architecturally defined instructions that may form part of an instruction set architecture.

Whilst the translation data can be stored in any format, in some configurations the translation information comprises one or more translation lookup tables. The translation lookup tables may be arranged as a direct mapped storage structure (e.g., a direct mapped cache), a set associative storage structure, or a fully associative storage structure. The translation tables may be arranged in a hierarchical structure requiring a translation table walk to be performed to map between the first identifier and the second identifier. The translation tables may be arranged to translate one portion of the identifier, e.g., a most significant portion of the identifier, with a further portion of the identifier, e.g., a least significant portion of the identifier, being mapped unchanged between the first identifier and the second identifier.
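The partial translation described above, in which a table translates the most significant portion and the least significant portion passes through unchanged, can be sketched as follows. The bit widths (an 8-bit first identifier split into two 4-bit portions) and the table contents are assumptions made for the example.

```python
LOW_BITS = 4  # assumed width of the portion that is mapped unchanged

# Hypothetical lookup table for the most significant portion only.
msb_table = {0b1010: 0b01, 0b0011: 0b10}


def translate_identifier(first_identifier):
    """Translate the upper portion via the table and keep the lower portion."""
    high = first_identifier >> LOW_BITS
    low = first_identifier & ((1 << LOW_BITS) - 1)
    return (msb_table[high] << LOW_BITS) | low


second_identifier = translate_identifier(0b1010_0110)  # 0b01_0110
```

Because only the upper portion is tabulated, the table in this sketch needs at most 16 entries rather than 256.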

In some configurations the translation information comprises one or more translation functions configured to map between identifiers in the first domain and identifiers in the second domain. The translation functions may comprise one or more hash functions, one or more bit reducing functions, and/or one or more bit permuting functions. The translation functions may be provided in combination with one or more translation tables, with the translation functions being provided to map between a most significant portion of the first identifier and a most significant portion of the second identifier, and the translation tables being provided to map between a least significant portion of the first identifier and a least significant portion of the second identifier.

Alternatively, the translation tables may be provided to map between the most significant portions, with the translation functions mapping between least significant portions. Furthermore, in some configurations, the combination of translation functions, translation tables, and direct mapping may be used to map different portions of the identifiers. In some configurations, the translation tables may be used to identify a function from a plurality of possible functions that is used to perform the identifier translation.
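For illustration, the sketch below shows one possible bit reducing function, one possible bit permuting function, and a table that selects between them. The particular functions (an XOR fold and a bit reversal), the use of the most significant bit as the selector, and all names are assumptions; the disclosure does not specify which functions are used.

```python
def fold_reduce(identifier):
    """Bit reducing function: XOR-fold an 8-bit identifier into 4 bits."""
    return ((identifier >> 4) ^ identifier) & 0xF


def reverse_low_nibble(identifier):
    """Bit permuting function: reverse the four least significant bits."""
    low = identifier & 0xF
    return int(f"{low:04b}"[::-1], 2)


# Hypothetical table identifying which function performs the translation.
function_table = {0: fold_reduce, 1: reverse_low_nibble}


def translate_identifier(first_identifier):
    """Select a function using the most significant bit, then apply it."""
    selector = (first_identifier >> 7) & 0x1
    return function_table[selector](first_identifier)
```

With these assumptions, `translate_identifier(0x26)` applies the fold and `translate_identifier(0xA3)` applies the bit reversal.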

In some configurations the translation circuitry is configured to be provided at a boundary between the first domain and the second domain. Where there are multiple communication pathways between the first domain and the second domain, multiple instances of the translation circuitry may be provided, e.g., one instance of the translation circuitry at each of the communication pathways.

In some configurations the boundary is an external boundary of both of the first domain and the second domain. In alternative configurations, the boundary may be an internal boundary of at least one of the first domain and the second domain.

In some configurations identifiers in each respective domain of the first domain and the second domain are assigned by a respective instance of management software running in the respective domain. In such configurations, the management software in the second domain may be ignorant of the identifiers assigned by the management software operating in the first domain. The management software running in the first domain may be able to communicate with the management software running in the second domain to define regions of identifier space in one domain that can be utilised by requestor devices in the other domain. Where one domain has a greater number of available identifiers than the other, a subset of those identifiers may be mapped to identifiers in the other domain for inter-domain resource access requests, with the remaining identifiers reserved for use for intra-domain resource access requests.
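The reservation of a subset of identifiers for inter-domain requests can be sketched as below. The window base, the number of identifiers in the second domain, and the choice to return `None` for identifiers outside the window are assumptions made for the example; the disclosure does not state how an intra-domain identifier arriving at the boundary is handled.

```python
INTER_DOMAIN_BASE = 0xE0        # assumed start of the inter-domain window
SECOND_DOMAIN_IDENTIFIERS = 16  # assumed size of the second identifier space


def translate_inter_domain(first_identifier):
    """Map identifiers inside the window; others are intra-domain only."""
    offset = first_identifier - INTER_DOMAIN_BASE
    if not 0 <= offset < SECOND_DOMAIN_IDENTIFIERS:
        return None  # reserved for intra-domain use in this sketch
    return offset
```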

In some configurations the identifiers in each respective domain of the first domain and the second domain are used for resource allocation control for regulating the level of performance seen by resource accesses in the respective domain. In some configurations resource allocation may include resource partitioning. The identifiers may be used, for example, to help provide a desired Quality of Service (QoS) for the requestor process, and/or to help ensure a fair usage policy between different requestor processes. In some configurations, the identifiers are partition identifiers (PARTIDs), otherwise referred to as partition numbers, which reference a particular partition within a Partition ID Space. The numerical value of a partition number has no inherent meaning. As one use case, a unique partition identifier may be assigned to different VMs, OSs, or applications by the relevant PARTID space manager. Alternatively, or in addition, the identifier may include a Partition Monitoring Group (PMG) which may be used to group access requests and may provide a more coarse-grained grouping than the use of a PARTID, for example, two or more requestor devices could share a same PMG identifier and each assign their own PARTIDs. A QoS could then be provided for the two or more requestors based on the PMG identifier, the PARTID, or a combination of the two. In addition, the PMG may allow for a more fine-grained partitioning with multiple requestor devices in different PMGs being able to use the same PARTID, thereby providing a greater range of PARTIDs available to an individual requestor. It will be readily apparent to the skilled person that the PARTID and the PMG are examples of identifiers and that alternative identifiers could be used in alternative configurations.

In some configurations the first identifier is independent of a memory address specified in the input resource access request. The first identifier (e.g., a PARTID) may be used as a label to distinguish memory requests issued on behalf of different execution environments (e.g. software execution environments executed by processing circuitry). The first identifier does not influence which addresses in memory are allowed to be accessed by a particular execution environment, but may be used for resource allocation control for regulating the level of performance seen for memory accesses issued by a particular execution environment.

In some configurations the input memory access request specifies a security state identifier indicative of an input security state associated with the input memory access request; and the translation circuitry is configured to set an output security state of the output translation request based on the input security state. The first domain may support operating in a number of different security states, which may be associated with different rights to access information stored in registers or memory, for example. The resource access request may specify a security state identifier indicative of a security state associated with the resource access request. The translation circuitry may translate the security state identifier when performing the identifier translation.

In some configurations, the translation circuitry is configured to set the output security state in dependence on security translation information defined in the translation information. The security translation may be provided by a separate set of tables and/or functions to the identifier translation. In some configurations, a number of security states in the first domain may be different to a number of security states provided in a second domain. In some configurations a different set of translation tables and/or translation functions may be provided for each of the security domains. For example, where the first domain provides multiple security domains, each of the security domains may map onto a different part of an identifier space in a second domain that only provides a single security domain.
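As one hypothetical illustration of the final example above, the sketch below maps two security states of a first domain onto disjoint halves of a 4-bit identifier space in a second domain that has a single security state. The state names, the base values, and the use of a 3-bit mask to reduce the first identifier are assumptions made for the example.

```python
# Hypothetical base of the identifier range assigned to each security state.
SECURITY_BASE = {"secure": 0, "non_secure": 8}


def translate(security_state, first_identifier):
    """Place the translated identifier in the range for its security state."""
    # Assumed reduction: keep the three least significant bits.
    return SECURITY_BASE[security_state] + (first_identifier & 0x7)
```

Under these assumptions the same first identifier issued from different security states yields different second identifiers, so the second domain can still distinguish the two requestors.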

In some configurations the translation circuitry is configured to perform access request translations input from the second domain to be output to the first domain. In other words, the translation circuitry may be capable of performing two-way translation with input resource access requests received from either of the domains and output into the other of the two domains.

According to some configurations there is provided a system comprising: the apparatus as described above; first processing circuitry configured to implement the first domain; and second processing circuitry configured to implement the second domain, wherein the translation circuitry is configured at an interface between the first domain and the second domain. The first domain and the second domain may include any component(s) of a data processing system. For example, each of the first domain and the second domain may comprise any number of CPUs, GPUs, caches, data processing pipelines, etc. In some configurations the system comprises third processing circuitry configured to implement a third domain; and second translation circuitry arranged at a boundary between the third domain and at least one of the first domain and the second domain, wherein the translation circuitry is unaware of identifiers assigned in the third domain. In this way three or more domains may be connected to one another with each independently assigning their own identifiers as discussed above. Translation circuitry is provided between the first domain and the second domain, and second translation circuitry is provided between the second domain and the third domain. In some configurations, third translation circuitry may be provided between the first domain and the third domain. The provision of multiple translation circuits, each capable of translating between the two domains to which they are connected, provides a highly flexible system which does not require any additional overhead as the number of domains increases.
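The chaining of translation circuitry across three domains can be sketched as a sequence of independent table lookups, where each table relates only the two domains it connects. The function name `forward` and the table contents are hypothetical.

```python
def forward(identifier, translators):
    """Pass an identifier through each boundary translation in turn."""
    for table in translators:
        identifier = table[identifier]
    return identifier


# Each table knows only the two domains it sits between.
first_to_second = {0x47: 0x1}
second_to_third = {0x1: 0x2A}

third_identifier = forward(0x47, [first_to_second, second_to_third])
```

In this sketch `first_to_second` holds no information about the third domain, which mirrors the statement that the translation circuitry is unaware of identifiers assigned in the third domain.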

Particular configurations will now be described with reference to the figures.

Figure 1 schematically illustrates an apparatus 30 according to some configurations of the present techniques. The apparatus 30 comprises translation circuitry 34 which is arranged between a first domain 32 and a second domain 36. The translation circuitry 34 stores translation information 42 indicative of translations between input resource access requests, issued from one of the first domain 32 and the second domain 36, and output resource access requests issued to the other of the first domain 32 and the second domain 36.

In the illustrated configuration the first domain 32 comprises plural central processing units (CPUs) 38, each of which is an example of processing circuitry. Processes, e.g., virtual machines or operating systems, running on one or more of the CPUs 38 may issue resource access requests. The resource access requests are each assigned an identifier indicative of the requestor process, e.g., the virtual machine or operating system that generated (initiated) the resource access request. Where the resource access request is fulfilled within the first domain 32, this can be done using the identifier to ensure that the requestor process that generated the resource access request receives a predefined or dynamically configurable quality of service.

In the illustrated configuration, the second domain 36 comprises a single CPU 46. The CPU 46 may be responsive to resource access requests to fulfil those resource access requests based on identifiers contained in those resource access requests. The identifiers assignable by the first domain 32 and by the second domain 36 may be independently defined, for example, at a time of manufacture or at a time that those domains are designed. As such, the identifiers that can be handled by each of the two domains may not be compatible with one another.

Where the resource access request is issued by the first domain 32 and is to be fulfilled by the second domain 36, the resource access request is passed as an input resource access request 40 from the first domain 32 to the second domain 36 via the translation circuitry 34. The translation circuitry 34 receives the input resource access request 40, which in the illustrated configuration comprises a 12-bit identifier, and performs a translation to translate the identifier of the input resource access request 40 to an identifier in an output resource access request 44, which in the illustrated configuration is a 4-bit identifier. The translation is performed by the translation circuitry 34 based on the translation information 42. The output resource access request 44 is then passed to the second domain 36 for that resource access request to be fulfilled. The 12-bit identifier of the first domain 32 is omitted from the output resource access request 44.

Whilst the specific examples set out in the figures make reference to PARTIDs as the identifiers, it will be readily apparent to the skilled person that the use of the PARTID is for example purposes only and that any identifier indicative of a requestor process may be used. For example, the identifier could be a PARTID, a PMG, a combination of a PARTID and a PMG, or any other identifier that can be used to identify the requestor process.

Figure 2 schematically illustrates an example of a data processing apparatus 2, which may be an example of the first domain 32 and/or an example of the second domain 36. The apparatus 2 is provided with processing circuitry 4 and a cache 6. For example, the cache 6 could be an instruction cache for caching instructions, a data cache for caching data, or a shared cache which can cache both instructions to be fetched for processing and data accessed from memory in response to load/store instructions processed by the processing circuitry 4. The cache 6 can be at any level of a cache hierarchy. For example, the cache 6 could be a level 1, level 2, level 3, or system cache. Although the cache 6 is shown separate from the processing circuitry 4, in some implementations (especially if the cache is a level 1 or level 2 cache) the cache 6 can be regarded as part of the processing circuitry. If the cache 6 is at a level of the cache hierarchy other than level 1, then there may also be a higher-level cache 8 accessible to the processing circuitry 4, where information can be evicted from the higher-level cache 8 to the cache 6 at a lower level of the cache hierarchy.

The processing circuitry 4 includes fetch circuitry 10 to fetch instructions from the cache 6, 8 or memory, decode circuitry 12 to decode the fetched instructions, and execute circuitry 14 to execute the instructions to perform data processing operations. Operands for the instructions may be read by the execute circuitry 14 from registers 16, and results of executed instructions may be written to the registers 16.

The registers 16 include one or more partition identifier control registers 18 used to set a partition identifier which is specified by a cache request 19 (an example of a resource access request) sent to the cache 6 by the processing circuitry 4 to request access to information that may be stored in the cache 6. The processing circuitry 4 has partition identifier selection circuitry 17 which selects which partition identifier is specified by the cache request 19, based on the information stored in the one or more partition identifier control registers 18. The partition identifier (PARTID) acts as a label to distinguish resource requests issued on behalf of different execution environments (e.g. software execution environments executed by the processing circuitry 4). The partition identifier does not influence which addresses in memory are allowed to be accessed by a particular execution environment, but is used for resource allocation control for regulating the level of performance seen for memory accesses issued by a particular execution environment.

As an illustrative example, the cache 6 comprises storage circuitry 20 for storing cached information and related tags (used for determining on a cache lookup whether a cache entry relates to the target address of the cache request). The cache 6 also has cache replacement control circuitry 22 for controlling replacement of cache entries in the storage circuitry 20. The cache 6 may use the partition identifier to influence the cache replacement policy used by the cache replacement control circuitry 22 to select victim cache entries to be reallocated for a new address to be allocated in the cache, but the partition identifier can also be used for other aspects of resource allocation such as controlling the amount of memory system bandwidth which a particular execution environment is allowed to use, or setting a maximum fraction of cache capacity that a given execution environment is allowed to allocate for its own information. Such resource allocation controls can be useful to prevent a "noisy" execution environment (which generates frequent memory access requests) monopolizing a significant fraction of the available memory system resource (which may otherwise harm performance for other execution environments with less frequent requests which might not be able to gain sufficient usage of memory system resource if the amount of resource used by the "noisy" execution environment was not limited). It will be appreciated that the partition identifier may be used for other purposes and may be attached to resource access requests other than cache access requests. As discussed in relation to Figure 1, the resource access requests may be serviced in a domain other than the domain in which the resource access request originated.
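As a hedged illustration of setting a maximum fraction of cache capacity per partition identifier, the sketch below selects a cache entry to allocate for a requesting partition. The policy shown (replace one of the requestor's own entries once its limit is reached, otherwise prefer a free entry, otherwise take an entry owned by another partition) is an assumption made for the example and is not the replacement policy of the disclosure; the function and parameter names are hypothetical.

```python
def select_victim(entries, requester_partid, max_fraction):
    """Return the index of the cache entry to allocate for the requester.

    entries[i] is the partition identifier owning entry i, or None if free.
    max_fraction maps a partition identifier to its capacity limit (0..1).
    """
    owned = [i for i, owner in enumerate(entries) if owner == requester_partid]
    limit = int(max_fraction[requester_partid] * len(entries))
    if owned and len(owned) >= limit:
        # At its limit: the requester may only displace its own information.
        return owned[0]
    for i, owner in enumerate(entries):
        if owner is None:
            return i  # below its limit: use a free entry if one exists
    for i, owner in enumerate(entries):
        if owner != requester_partid:
            return i  # otherwise displace another partition's entry
    return 0


entries = [1, 1, 2, None]
limits = {1: 0.5, 2: 0.5}
```

With these values, partition 1 already holds half of the entries, so `select_victim(entries, 1, limits)` returns one of its own entries, whereas partition 2 is below its limit and is given the free entry.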

In the specific case of software execution environments executed by processing circuitry 4, each software execution environment could be a different process or thread executed by the processing circuitry 4 or a sub-portion of instructions executed within such a process or thread (hence in some examples different parts of the same process or thread could be allocated different partition identifiers). The way in which the set of software to be executed by the processing circuitry 4 is partitioned into different software execution environments allocated different partition identifiers is controlled by the software itself, by setting the partition identifier control information in one or more partition identifier control registers 18.

Partition identifiers could also be assigned to particular hardware execution environments in the system. For example, resource access requests initiated from different hardware units can be assigned different partition identifiers.

In some examples, the allocation of partition identifiers can be fixed, selected by hardware. For example, the partition identifiers used for requests initiated from different hardware execution environments can be hardwired in the circuit design (for example, during manufacture of the circuit), or the partition identifiers used for particular software execution environments could be derived from software execution environment identifiers such as thread identifiers or process identifiers in a manner which does not allow the software itself to vary the partition identifier used.

It can be useful to offer the ability for software to program which partition identifiers are used for particular execution environments. Hence, the partition identifier control registers 18 can be provided to allow software to configure information used to control the selection of the partition identifier used for a particular memory access request 19.

The partition identifier control registers 18 may include a single register to which a partition identifier can be written by software. In such implementations, memory access requests such as cache access request 19 issued by the processing circuitry 4 specify the partition identifier currently specified in the register 18. When switching between different portions of software requiring their resource access requests to be distinguished from each other for resource control purposes (e.g. on a context switch), software updates the partition identifier control register 18 to specify the partition identifier for the new software to be executed after the switch, and then subsequent resource access requests will specify the new partition identifier.

Other examples could implement multiple partition identifier control registers 18 specifying partition identifiers associated with different operating states (e.g. privilege levels or exception levels associated with the processing circuitry 4), and the current operating state of the processing circuitry 4 at the time a resource access request, e.g., the cache request 19, is issued may be used to select which partition identifier control register 18 is selected by the partition identifier selection circuitry 17, and hence which partition identifier is specified in the resource access request. For example, this can be useful to avoid software needing to rewrite partition identifier control registers 18 each time there is a supervisor call or exception taken to a more privileged operating state or an exception return back to a less privileged operating state, which may be relatively frequent events.
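As a minimal sketch of the selection just described, the following models one partition identifier control register per operating state. The names PartitionIdSelector and issue_request, and the state labels "EL0" and "EL1", are hypothetical and chosen for illustration only.

```python
class PartitionIdSelector:
    """Models one partition identifier control register per operating state."""

    def __init__(self):
        # Maps operating state (e.g. an exception level label) -> partition identifier.
        self.registers = {}

    def write_register(self, state, part_id):
        # Software programs the register associated with a given operating state.
        self.registers[state] = part_id

    def select(self, current_state):
        # The current operating state picks which register supplies the identifier.
        return self.registers[current_state]


def issue_request(selector, current_state, address):
    # The request carries the target address and the selected partition identifier.
    return {"address": address, "part_id": selector.select(current_state)}
```

Because each state has its own register, a request issued after returning to a less privileged state picks up the earlier identifier again without any register being rewritten.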

Some implementations may provide an architectural mechanism for enabling different partition identifiers to be specified for different classes of memory access request issued in the same software execution environment. For example, there may be fields within the partition identifier control registers 18 for specifying different partition identifiers for data cache requests issued in response to load/store instructions executed by the execute circuitry 14, instruction fetch cache requests issued in response to instruction fetch requests made by the fetch circuitry 10, and/or page table walk cache requests issued by the processing circuitry 4 to request access to page table information used to translate addresses of cache/memory access requests.
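One possible way to picture such per-class fields is sketched below. The 8-bit field width, the field positions and the class labels are assumptions made for this example; the text does not fix any particular register layout.

```python
FIELD_BITS = 8  # assumed width of each partition identifier field
FIELD_MASK = (1 << FIELD_BITS) - 1
# Assumed bit position of the field used for each class of request.
FIELD_SHIFT = {"data": 0, "instruction": 8, "page_table_walk": 16}


def pack_register(**fields):
    """Build a register value from per-class partition identifiers."""
    value = 0
    for name, part_id in fields.items():
        if not 0 <= part_id <= FIELD_MASK:
            raise ValueError("partition identifier does not fit in its field")
        value |= part_id << FIELD_SHIFT[name]
    return value


def select_part_id(register_value, request_class):
    """Extract the partition identifier for one class of request."""
    return (register_value >> FIELD_SHIFT[request_class]) & FIELD_MASK
```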

Also, in some cases the partition identifier specified in a memory access request may not be exactly the same as the partition identifier value stored in the partition identifier control register 18. Some implementations of the partition identifier selection circuitry 17 may support a partition identifier virtualisation scheme where a virtual partition identifier written by software to the partition identifier control registers 18 is remapped to a physical partition identifier appended to the cache request 19, based on partition identifier remapping information which can be defined by software. This can allow a number of different pieces of less privileged software (e.g. operating systems) to coexist on the system while independently setting the partition identifiers to be used for different software execution environments managed by the less privileged software, with more privileged software (e.g. a hypervisor) defining the partition identifier remapping information so that conflicting partition identifiers set by different operating systems can be mapped to different partition identifiers as seen by the cache 6.
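The virtualisation scheme can be illustrated by the following sketch, in which remapping information is kept per guest. The class RemapTable and the guest labels are hypothetical; the sketch only shows how identical virtual partition identifiers chosen by two operating systems can resolve to distinct physical partition identifiers.

```python
class RemapTable:
    """Per-guest remapping of virtual to physical partition identifiers."""

    def __init__(self):
        # Maps guest label -> {virtual partition id: physical partition id}.
        self.maps = {}

    def set_mapping(self, guest, virtual_id, physical_id):
        # In the scheme described, more privileged software defines these entries.
        self.maps.setdefault(guest, {})[virtual_id] = physical_id

    def remap(self, guest, virtual_id):
        # Applied when a request is issued: the virtual identifier written by
        # the guest is replaced by the physical identifier seen by the cache.
        return self.maps[guest][virtual_id]
```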

Hence, it will be appreciated that there are a wide variety of ways in which the partition identifier of a resource access request could be determined by the partition identifier selection circuitry 17, but in some examples the processing circuitry 4 has circuitry to select the partition identifier to be associated with the memory access request, based on information specified by software in at least one software-writable architectural register 18.

In some implementations, the processing circuitry 4 also supports operating in different security states, which may be associated with different access rights to execute instructions and/or access information in memory, the cache 6, 8 or registers 16. A security state identifier associated with a current security state may also be specified by the cache request 19. The cache request 19 also specifies the target address of the information to be accessed in the cache.

It will be readily apparent to the skilled person that the circuitry illustrated in figure 2 may form part of either the first domain 32, the second domain 36 or may be split with some circuitry, e.g., the processing circuitry 4 being provided in the first domain 32 and other circuitry, e.g., the cache 6 being provided in the second domain.

In such configurations, the translation circuitry 34 provided between the first domain and the second domain is used to translate partition identifiers identified in resource access requests issued by the first domain into partition identifiers that can be recognized/managed by the second domain.

Figure 3 schematically illustrates details of a further example of a data processing apparatus, which may be an example of the first domain 32 and/or an example of the second domain. The system comprises a CPU (central processing unit) cluster 200 and a GPU 202. This example also has a DMA (direct memory access) controller 210 for performing memory accesses based on configuration data set by software executing on the CPU cluster 200 or GPU 202. It will be appreciated that other processing circuits not shown could also be included (e.g. a neural processing unit (NPU) used for accelerating neural network processing, or other type of hardware accelerator).

The CPU cluster 200 comprises a number of CPUs 201, each CPU 201 having processing circuitry 4 and at least one higher-level (e.g. level 1 and/or level 2) cache 8 as mentioned above. While Figure 3 shows an example where there are two CPUs 201 in the cluster 200, other examples could only have a single CPU 201 or could have more than two CPUs 201. While Figure 3 shows a single CPU cluster 200, other examples could have more than one CPU cluster 200. While not shown in the example of Figure 3, in addition to any caches 8 private to a particular CPU 201, there could also be a further level of cache shared between the CPUs 201 of the cluster 200, but which is not accessible to the GPU 202 or to other CPU clusters.

The GPU 202 also has processing circuitry 4 and at least one cache 8 similar to those mentioned earlier. The architecture and micro-architecture of the processing circuitry 4 in the GPU 202 may differ from the architecture and micro-architecture of the processing circuitry 4 in the CPUs 201, e.g. the GPU may support different instructions and have a different hardware design targeting parallel processing of graphics threads. While Figure 3 shows a single GPU 202, other examples could have more than one GPU 202.

The CPU cluster 200 and GPU 202 share access to a shared memory system including a shared cache 6. For example, the shared cache 6 can be a system cache which is part of a system interconnect 204 used to manage communications between the CPU cluster 200, GPU 202 and memory 206, or alternatively the shared system cache could be separate from the interconnect 204. The interconnect 204 can be a coherent interconnect which applies a coherency protocol to manage coherency of data cached at the respective caches 8 of the CPU cluster 200 and GPU 202.

The processing circuitry 4 in each CPU 201 and GPU 202 assigns a partition identifier to each outgoing memory access request sent to the interconnect 204, with the partition identifier being selected by partition identifier selection circuitry 17 based on the information stored in the partition identifier control registers 18 as mentioned above. The partition identifier flows through the memory system along with the request, to any memory system node that has resource allocation circuitry for making resource allocation decisions based on the partition identifier. Hence, cache requests made to the system cache 6 also specify the partition identifier that was selected by the one of the CPUs 201 and GPU 202 from which the corresponding memory access request originated.

Accesses from the DMA controller 210 to the system cache 6 can similarly be labelled with partition identifiers selected by partition identifier selection circuitry 17 based on information in at least one partition identifier control register 18, but in the case of the DMA controller 210 (which does not itself execute instructions), the information specified in the partition identifier control registers 18 of the DMA controller 210 is set based on instructions executed by the processing circuitry 4 running on the CPU cluster 200 or GPU 202, rather than on the DMA controller 210 itself. Alternatively, DMA accesses could be assigned a fixed partition identifier selected in hardware, which is not configurable based on the software executed by the CPU cluster 200 or GPU 202.

As discussed in relation to the apparatus 2 illustrated in figure 2, the apparatus set out in figure 3 could be provided in the first domain 32, the second domain 36, or may be split between plural domains. For example, the CPU cluster 200 could be provided as a first domain capable of managing allocation of its own partition identifiers, and the GPU 202 could be provided as a second domain capable of managing allocation of its own identifiers. Advantageously, this would allow the architect of the CPU cluster 200 and the architect of the GPU 202 to each independently determine the format of the partition identifiers that are used by the respective domain. Translation circuitry may then be provided at the interface between the CPU cluster 200 and the interconnect 204, and at the interface between the GPU 202 and the interconnect 204. It will be readily apparent to the skilled person that the example apparatuses set out in figures 2 and 3 could contain any number of sub-domains designed independently and each operating using their own mechanism for defining and assigning partition identifiers. The provision of translation circuitry at the interfaces to these domains eliminates the need for designers of components within the system to agree on a standard approach for partition identifiers, allowing each to tailor the approach to the circuitry being provided.

Figure 4 schematically illustrates an apparatus 50 according to some configurations of the present techniques. The apparatus 50 is provided with translation circuitry 54 to perform translations between partition identifiers received from one of the first domain 52 and the second domain 56 to be output in the other of the first domain 52 and the second domain 56.

The first domain 52 comprises hardware running a management process 58 configured to receive resource access requests 62 and, in response to receipt of a resource access request, to select an N-bit partition identifier using partition identifier selection circuitry 64. Here, N is any positive integer. The management process 58 assigns the partition identifier based, for example, on the techniques described in relation to figure 2 and outputs the resource access request having the N-bit partition identifier 70.

The second domain 56 comprises hardware running a management process 60 configured to receive resource access requests 66 and, in response to receipt of a resource access request, to select an M-bit partition identifier using partition identifier selection circuitry 68. Here, M is any positive integer. The management process 60 assigns the partition identifier based, for example, on the techniques described in relation to figure 2 and outputs the resource access request having the M-bit partition identifier 72.

Where the resource access request in the first domain 52, having the N-bit partition identifier 70, requires access to a resource in the second domain 56, or where the resource access request in the second domain 56, having the M-bit identifier 72, requires access to a resource in the first domain 52, the partition identifier assigned to that resource access request requires translation in order that it can be compatible with the other domain. The translation circuitry 54 comprises translation information 74 which includes a mapping between at least a subset of the N-bit partition identifiers and at least a subset of the M-bit partition identifiers.

On receipt of the resource access request having the N-bit partition identifier 70 from the first domain 52, the translation circuitry 54 performs a translation of the N-bit partition identifier using the translation information 74 and outputs the resource access request to the second domain 56 with an M-bit identifier obtained from the mapping stored in the translation information 74. Similarly, on receipt of the resource access request having the M-bit partition identifier 72 from the second domain 56, the translation circuitry 54 performs a translation of the M-bit partition identifier using the translation information 74 and outputs the resource access request to the first domain 52 with an N-bit identifier obtained from the mapping stored in the translation information 74.
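The two directions of translation can be sketched as follows. Keeping separate forward and reverse tables is an assumption of this sketch (it allows several N-bit identifiers to share one M-bit identifier); the class name BidirectionalTranslator and the dictionary-based request format are likewise hypothetical.

```python
class BidirectionalTranslator:
    """Translates partition identifiers between an N-bit and an M-bit domain."""

    def __init__(self, n_bits, m_bits, forward, reverse):
        # forward: N-bit id -> M-bit id; reverse: M-bit id -> N-bit id.
        for n_id, m_id in list(forward.items()) + [(n, m) for m, n in reverse.items()]:
            if not 0 <= n_id < (1 << n_bits):
                raise ValueError("identifier does not fit in N bits")
            if not 0 <= m_id < (1 << m_bits):
                raise ValueError("identifier does not fit in M bits")
        self.forward = dict(forward)
        self.reverse = dict(reverse)

    def to_second_domain(self, request):
        # Replace the N-bit identifier; the rest of the request is unchanged.
        return {**request, "part_id": self.forward[request["part_id"]]}

    def to_first_domain(self, request):
        # Replace the M-bit identifier; the rest of the request is unchanged.
        return {**request, "part_id": self.reverse[request["part_id"]]}
```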

Figure 5 schematically illustrates details of an apparatus 80 according to some configurations of the present techniques. The apparatus 80 is provided with translation circuitry 84 configured to perform identifier translation between a first domain 82 and a second domain 86. The translation of identifiers by the translation circuitry may take place as described above. The translation is performed with reference to translation information stored in registers 92 comprised in the translation circuitry 84. The registers 92 are memory mapped registers mapped to a region of memory 90 that is comprised in the first domain 82. The translation information can be modified by a control process 88 in the first domain 82 which can modify the translation information by writing to the mapped region of memory 90. The mapped region of memory can also be modified by a control process 94 running in the second domain 86. The control process 94 in the second domain 86 can modify the mapped region of memory 90 in the first domain by sending a memory access request 102. The memory access request 102 is received by a partition identifier management process 98 in the second domain which makes a selection of a partition identifier using partition identifier selection circuitry 100. The selected partition identifier, which is an M-bit partition identifier, is associated with the access request 102 and passed as an input resource access request 96 to the translation circuitry 84. The translation circuitry translates the partition identifier from the M-bit partition identifier for the second domain 86 to an N-bit partition identifier in the first domain 82 which then allows the control process 94 to access the mapped region of memory 90 in order to modify the translation information stored in the memory mapped registers 92 comprised in the translation circuitry 84.

It will be readily apparent to the skilled person that, in alternative configurations, the mapped region of memory may be hosted in the second domain 86 rather than the first domain 82 or may comprise regions of memory in both the first domain 82 and the second domain 86.
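A minimal software model of translation information held in memory mapped registers is given below. The 4-byte register stride, the one-entry-per-register layout and the base address used in the usage note are assumptions for illustration; the text does not specify the register map.

```python
REG_BYTES = 4  # assumed register stride in bytes


class TranslationRegisters:
    """Translation table exposed as a window of memory mapped registers."""

    def __init__(self, base_address, num_entries):
        self.base_address = base_address
        # Assumed layout: entry i holds the second identifier for first identifier i.
        self.entries = [0] * num_entries

    def write(self, address, value):
        # A control process updates the mapping by writing to the mapped region.
        offset = address - self.base_address
        if offset < 0 or offset % REG_BYTES or offset // REG_BYTES >= len(self.entries):
            raise ValueError("address is outside the mapped register region")
        self.entries[offset // REG_BYTES] = value

    def translate(self, first_id):
        # The translation path reads the same storage that the writes update.
        return self.entries[first_id]
```

For instance, with a hypothetical base address of 0x80000000, a write of the value 2 to address 0x80000014 would cause first identifier 5 to translate to second identifier 2.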

Figure 6 schematically illustrates a system 110 comprising a plurality of domains 112 each connected to two others by translation circuitry 114. The system comprises a first domain 112(1), which is provided with a set of CPUs 116; a second domain 112(2), which is provided with a set of CPUs 118; a third domain 112(3), which is provided with a set of CPUs 120; and a fourth domain 112(4), which is provided with a set of CPUs 122. The first domain 112(1) is coupled (connected) to the second domain 112(2) by translation circuitry 114(1-2), the second domain 112(2) is coupled to the fourth domain 112(4) by translation circuitry 114(2-4), the fourth domain 112(4) is coupled to the third domain 112(3) by the translation circuitry 114(3-4), and the third domain 112(3) is coupled to the first domain 112(1) by the translation circuitry 114(1-3). Each of the domains 112 is able to independently assign partition identifiers with translation between the different identifiers handled by the respective translation circuitry 114 as described above.

It will be readily apparent to the skilled person that additional translation circuitry could be provided between the first domain 112(1) and the fourth domain 112(4), and/or between the second domain 112(2) and the third domain 112(3).
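The arrangement of several domains joined by per-boundary translation circuitry can be modelled loosely as below. Each boundary holds only the mapping for its own pair of domains. Forwarding a request across more than one boundary in sequence is an assumption made purely for this sketch, as are the class name DomainFabric and the example identifier values.

```python
class DomainFabric:
    """Per-boundary identifier translation between independently managed domains."""

    def __init__(self):
        # Maps (source domain, destination domain) -> {source id: destination id}.
        self.boundaries = {}

    def connect(self, src, dst, mapping):
        # Each boundary only knows the identifiers of the two domains it joins.
        self.boundaries[(src, dst)] = dict(mapping)

    def send(self, path, part_id):
        # Translate the identifier at every boundary crossed along the path.
        for src, dst in zip(path, path[1:]):
            part_id = self.boundaries[(src, dst)][part_id]
        return part_id
```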

Furthermore, whilst each of the illustrated domains 112 comprises a set of four CPUs, the arrangement of each domain may vary both in terms of the size of that domain and the type and arrangement of circuitry provided.

Figure 7 schematically illustrates the translation of partition identifiers according to some configurations of the present techniques. The translation circuitry receives an input access request 130 from a first domain specifying a first partition identifier. The translation circuitry stores a set of look up tables (LUTs) 134 indicative of a translation between the first partition identifier defined in the first domain and a second partition identifier in the second domain. The translation circuitry performs a lookup in the lookup tables 134 based on the first partition identifier. The lookup may be performed, for example, by indexing into the lookup tables 134 based on the partition identifier. The translation circuitry identifies the second partition identifier from an entry in the lookup table and outputs the access request as an output access request 132 having the second partition identifier.
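The direct indexing described above may be sketched as follows. Using a flat list indexed by the first partition identifier, and filling unprogrammed entries with a default value, are assumptions of this example; the function names are hypothetical.

```python
def build_lut(size, mapping, default=0):
    """Create a lookup table with one entry per possible first identifier."""
    lut = [default] * size  # assumed default for entries that are not programmed
    for first_id, second_id in mapping.items():
        lut[first_id] = second_id
    return lut


def lookup(lut, first_id):
    """Index into the table using the first partition identifier."""
    return lut[first_id]
```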

Figure 8 schematically illustrates an arrangement of the lookup tables according to some configurations of the present techniques. The lookup table is arranged as a set associative storage structure 146. The translation circuitry receives a memory access request 140 comprising a memory address and a first partition identifier. The partition identifier indicated in the memory access request 140 is fed into hash circuitry 144 which generates a hash of the first partition identifier. The hash of the first partition identifier is used to index into the set associative storage structure 146. Each unique index of the set associative storage structure 146 identifies a set of entries. In the illustrated configuration, the set comprises two entries and each of the entries comprises a first partition identifier and a second partition identifier. Once the set of entries is identified based on the hash of the first partition identifier, the identified set 148 is passed to tag comparison circuitry 150. The tag comparison circuitry 150 compares the partition identifier in the memory access request 140 to the first partition identifiers stored in each entry of the identified set 148. Where the tag comparison circuitry 150 identifies a match between the first partition identifier included in the memory access request 140 and a first partition identifier included in one of the entries in the identified set 148, the tag comparison circuitry 150 signals multiplexors 152 or 154 to forward the second partition identifier from the entry having the matching first partition identifier. The second partition identifier is included in the output memory access request 142 along with the memory address specified in the input memory access request 140.

It will be readily apparent to the skilled person that, in alternative configurations, the set associative storage structure 146 may contain a greater number of entries (ways) per set. Furthermore, the translation may be based on only a subset of bits of the first partition identifier with the remaining bits being forwarded without translation or omitted from the translation.
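The set associative lookup can be illustrated in software as follows. The XOR-fold hash, the oldest-first replacement within a set and the behaviour on a miss (raising an error) are assumptions of this sketch, since the text does not specify them; the names SetAssociativeTable and translate are hypothetical.

```python
class SetAssociativeTable:
    """Set associative store of (first id, second id) pairs."""

    def __init__(self, num_sets, ways=2):
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [[] for _ in range(num_sets)]

    def _index(self, first_id):
        # Assumed hash: fold upper bits onto lower bits, then select a set.
        return (first_id ^ (first_id >> 4)) % self.num_sets

    def insert(self, first_id, second_id):
        entries = self.sets[self._index(first_id)]
        for i, (tag, _) in enumerate(entries):
            if tag == first_id:
                entries[i] = (first_id, second_id)  # update an existing entry
                return
        if len(entries) >= self.ways:
            entries.pop(0)  # assumed policy: replace the oldest entry in the set
        entries.append((first_id, second_id))

    def lookup(self, first_id):
        # Tag comparison: match the full first identifier against each way.
        for tag, second_id in self.sets[self._index(first_id)]:
            if tag == first_id:
                return second_id
        return None


def translate(table, request):
    second_id = table.lookup(request["part_id"])
    if second_id is None:
        raise KeyError("no translation for this partition identifier")
    # The memory address is forwarded unchanged with the second identifier.
    return {"address": request["address"], "part_id": second_id}
```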

Figure 9 schematically illustrates an arrangement of translation circuitry according to some configurations of the present techniques. The translation circuitry receives an input memory access request 160 from a first domain specifying a first partition identifier. The first partition identifier is passed to identifier conversion circuitry 168 which stores a logical function 166 to convert a most significant portion of the first identifier to a most significant portion of the second identifier, for example, through the combination of two or more bits of the first partition identifier. The least significant portion of the first identifier is passed, without conversion, to be used as the least significant portion of the second partition identifier. The second partition identifier is output to be combined with the resource access request to generate the output resource access request 162 which is passed to the second domain.
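A function-based conversion of this kind might look like the sketch below. The 8-bit first identifier, the 4-bit split point and the particular logical function (OR of adjacent bit pairs, reducing four bits to two) are all assumptions chosen for illustration only.

```python
LOW_BITS = 4  # assumed width of the portion that is passed through unchanged
LOW_MASK = (1 << LOW_BITS) - 1


def convert_upper(upper):
    """Assumed logical function: OR adjacent bit pairs, mapping 4 bits to 2 bits."""
    bit0 = (upper & 1) | ((upper >> 1) & 1)
    bit1 = ((upper >> 2) & 1) | ((upper >> 3) & 1)
    return (bit1 << 1) | bit0


def translate_id(first_id):
    """Convert an assumed 8-bit first identifier to a 6-bit second identifier."""
    lower = first_id & LOW_MASK   # least significant portion, not converted
    upper = first_id >> LOW_BITS  # most significant portion, converted
    return (convert_upper(upper) << LOW_BITS) | lower
```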

Figure 10 schematically illustrates a sequence of steps carried out by translation circuitry according to some configurations of the present techniques. Flow begins at step S100 where it is determined if an input resource access request is received from a first domain specifying a first identifier. If, at step S100, it is determined that an input resource access request specifying a first identifier has not been received, then flow remains at step S100. If, at step S100, it is determined that an input resource access request specifying a first identifier has been received, then flow proceeds to step S102. At step S102, a translation of a first identifier indicating a requestor process in the first domain to a second identifier based on translation information defining a mapping between identifiers in the first domain and the second domain is performed. Flow then proceeds to step S104 where the translated access request including the second identifier is output. The output translated access request omits the first identifier.
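The sequence of steps S100 to S104 can be sketched as a simple software loop. The dictionary-based request format, the field names "first_id" and "second_id" and the function name are hypothetical. Unlike the flow described, which remains at step S100 until a request arrives, this sketch returns once its input queue is empty.

```python
from collections import deque


def run_translation(input_queue, translation_info):
    """Process queued requests following steps S100, S102 and S104."""
    outputs = []
    while input_queue:                    # S100: has an input request been received?
        request = input_queue.popleft()
        # S102: translate the first identifier using the translation information.
        second_id = translation_info[request["first_id"]]
        # S104: output the request with the second identifier; the first
        # identifier is omitted from the output request.
        out = {k: v for k, v in request.items() if k != "first_id"}
        out["second_id"] = second_id
        outputs.append(out)
    return outputs
```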

Figure 11 illustrates a simulator implementation that may be used. Whilst the earlier described embodiments implement the present invention in terms of apparatus and methods for operating specific processing hardware supporting the techniques concerned, it is also possible to provide an instruction execution environment in accordance with the embodiments described herein which is implemented through the use of a computer program. Such computer programs are often referred to as simulators, insofar as they provide a software based implementation of a hardware architecture.

Varieties of simulator computer programs include emulators, virtual machines, models, and binary translators, including dynamic binary translators. Typically, a simulator implementation may run on a host processor 730, optionally running a host operating system 720, supporting the simulator program 710. In some arrangements, there may be multiple layers of simulation between the hardware and the provided instruction execution environment, and/or multiple distinct instruction execution environments provided on the same host processor. Historically, powerful processors have been required to provide simulator implementations which execute at a reasonable speed, but such an approach may be justified in certain circumstances, such as when there is a desire to run code native to another processor for compatibility or re-use reasons. For example, the simulator implementation may provide an instruction execution environment with additional functionality which is not supported by the host processor hardware, or provide an instruction execution environment typically associated with a different hardware architecture. An overview of simulation is given in "Some Efficient Architecture Simulation Techniques", Robert Bedichek, Winter 1990 USENIX Conference, Pages 53-63.

To the extent that embodiments have previously been described with reference to particular hardware constructs or features, in a simulated embodiment, equivalent functionality may be provided by suitable software constructs or features. For example, particular circuitry may be implemented in a simulated embodiment as computer program logic. Similarly, memory hardware, such as a register or cache, may be implemented in a simulated embodiment as a software data structure. In arrangements where one or more of the hardware elements referenced in the previously described embodiments are present on the host hardware (for example, host processor 730), some simulated embodiments may make use of the host hardware, where suitable.

The simulator program 710 may be stored on a computer-readable storage medium (which may be a non-transitory medium), and provides a program interface (instruction execution environment) to the target code 700 (which may include applications, operating systems and a hypervisor) which is the same as the interface of the hardware architecture being modelled by the simulator program 710. Thus, the program instructions of the target code 700 may be executed from within the instruction execution environment using the simulator program 710, so that a host computer 730 which does not actually have the hardware features of the apparatus 2 discussed above can emulate these features. In some configurations, the simulator code comprises translation program logic 740 to emulate the features of the translation circuitry as described above.

In brief overall summary there is provided an apparatus, a method, and a computer program. The apparatus is provided with translation circuitry configured to translate an input resource access request received from a first domain to an output resource access request for a second domain, the input resource access request comprising a first identifier indicating a requestor process in the first domain that generated the input resource access request, and the output resource access request comprising a second identifier translated from the first identifier based on translation information defining a mapping between identifiers in the first domain and the second domain. The first identifier is omitted from the output resource access request.

In the present application, the words "configured to..." are used to mean that an element of an apparatus has a configuration able to carry out the defined operation. In this context, a "configuration" means an arrangement or manner of interconnection of hardware or software. For example, the apparatus may have dedicated hardware which provides the defined operation, or a processor or other processing device may be programmed to perform the function. "Configured to" does not imply that the apparatus element needs to be changed in any way in order to provide the defined operation.

In the present application, lists of features preceded with the phrase "at least one of" mean that any one or more of those features can be provided either individually or in combination. For example, "at least one of [A], [B] and [C]" encompasses any of the following options: A alone (without B or C), B alone (without A or C), C alone (without A or B), A and B in combination (without C), A and C in combination (without B), B and C in combination (without A), or A, B and C in combination.

Although illustrative configurations of the invention have been described in detail herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise configurations, and that various changes, additions and modifications can be effected therein by one skilled in the art without departing from the scope of the invention as defined by the appended claims. For example, various combinations of the features of the dependent claims could be made with the features of the independent claims without departing from the scope of the present invention.

Claims

WE CLAIM:
1. An apparatus comprising: translation circuitry configured to translate an input resource access request received from a first domain to an output resource access request for a second domain, the input resource access request comprising a first identifier indicating a requestor process in the first domain that generated the input resource access request, and the output resource access request comprising a second identifier translated from the first identifier based on translation information defining a mapping between identifiers in the first domain and the second domain, wherein the first identifier is omitted from the output resource access request.
2. The apparatus of claim 1, wherein a number of identifiers available for assignment in the first identifier space and the second identifier space are defined independently from one another.
3. The apparatus of claim 1 or claim 2, wherein the translation information is dynamically configurable.
4. The apparatus of any preceding claim, wherein the translation circuitry comprises one or more registers and the translation information is defined based on data stored in the one or more registers.
5. The apparatus of claim 4, wherein the one or more registers are memory mapped registers.
6. The apparatus of claim 4 or claim 5, wherein the apparatus comprises control circuitry configured to maintain the translation information by modifying the one or more registers.
7. The apparatus of any preceding claim, wherein the translation information comprises one or more translation lookup tables.
8. The apparatus of any preceding claim, wherein the translation information comprises one or more translation functions configured to map between identifiers in the first domain and identifiers in the second domain.
9. The apparatus of any preceding claim, wherein the translation circuitry is configured to be provided at a boundary between the first domain and the second domain.
10. The apparatus of claim 9, wherein the boundary is an external boundary of both of the first domain and the second domain.
11. The apparatus of any preceding claim, wherein identifiers in each respective domain of the first domain and the second domain are assigned by a respective instance of management software running in the respective domain.
12. The apparatus of any preceding claim, wherein the identifiers in each respective domain of the first domain and the second domain are used for resource allocation control for regulating the level of performance seen by resource accesses in the respective domain.
13. The apparatus of any preceding claim, wherein the first identifier is independent of a memory address specified in the input resource access request.
14. The apparatus of any preceding claim, wherein: the input memory access request specifies a security state identifier indicative of an input security state associated with the input memory access request and the translation circuitry is configured to set an output security state of the output translation request based on the input security state.
15. The apparatus of claim 14, wherein the translation circuitry is configured to set the output security state in dependence on security translation information defined in the translation information.
16. The apparatus of any preceding claim, wherein the translation circuitry is further configured to translate resource access requests received from the second domain into resource access requests to be output to the first domain.
17. A system comprising: the apparatus of any preceding claim; first processing circuitry configured to implement the first domain; and second processing circuitry configured to implement the second domain, wherein the translation circuitry is configured at an interface between the first domain and the second domain.
18. The system of claim 17, comprising: third processing circuitry configured to implement a third domain; and second translation circuitry arranged at a boundary between the third domain and at least one of the first domain and the second domain, wherein the translation circuitry is unaware of identifiers assigned in the third domain.
19. A method comprising: translating an input resource access request received from a first domain to an output resource access request for a second domain, the input resource access request comprising a first identifier indicating a requestor process in the first domain that generated the input resource access request, and the output resource access request comprising a second identifier translated from the first identifier based on translation information defining a mapping between identifiers in the first domain and the second domain, wherein the first identifier is omitted from the output resource access request.
20. A computer program for controlling a host data processing apparatus to provide an instruction execution environment, the computer program comprising: translation program logic configured to translate an input resource access request received from a first domain to an output resource access request for a second domain, the input resource access request comprising a first identifier indicating a requestor process in the first domain that generated the input resource access request, and the output resource access request comprising a second identifier translated from the first identifier based on translation information defining a mapping between identifiers in the first domain and the second domain, wherein the first identifier is omitted from the output resource access request.
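By way of illustration only, the translation recited in claims 7, 8, 14, 15 and 19 might be modelled in software along the following lines. This is a minimal sketch under stated assumptions, not the claimed circuitry: the names AccessRequest, IdTranslator, part_id, fallback and security_map are hypothetical, the security state is reduced to a single boolean, and the look-up table is an ordinary dictionary standing in for register-backed translation information.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class AccessRequest:
    """Hypothetical model of a resource access request."""
    address: int          # target address, left untouched by the translation
    part_id: int          # requestor identifier, meaningful only in its own domain
    secure: bool = False  # assumption: security state simplified to one flag


class IdTranslator:
    """Hypothetical model of translation circuitry at a domain boundary."""

    def __init__(
        self,
        lut: Dict[int, int],
        fallback: Optional[Callable[[int], int]] = None,
        security_map: Optional[Dict[bool, bool]] = None,
    ) -> None:
        self.lut = dict(lut)                         # translation look-up table
        self.fallback = fallback                     # optional translation function
        self.security_map = dict(security_map or {})  # security translation info

    def translate(self, req: AccessRequest) -> AccessRequest:
        # Prefer the look-up table; fall back to the translation function.
        if req.part_id in self.lut:
            out_id = self.lut[req.part_id]
        elif self.fallback is not None:
            out_id = self.fallback(req.part_id)
        else:
            raise KeyError(f"no mapping for first-domain identifier {req.part_id}")

        # Output security state is derived from the input security state;
        # with no entry in the map, the state passes through unchanged.
        out_secure = self.security_map.get(req.secure, req.secure)

        # The output request carries only the second-domain identifier:
        # the first identifier is not copied into it.
        return AccessRequest(address=req.address, part_id=out_id, secure=out_secure)


if __name__ == "__main__":
    translator = IdTranslator({3: 17, 4: 18}, fallback=lambda i: i + 100)
    print(translator.translate(AccessRequest(address=0x8000, part_id=3)))
```

In this sketch the identifier mapping does not depend on the address, and a translation in the opposite direction could be modelled by a second IdTranslator instance holding the inverse mapping.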