WO2024194593A1 - Storing coalesced memory address translations - Google Patents

Storing coalesced memory address translations

Info

Publication number
WO2024194593A1
Authority
WO
WIPO (PCT)
Prior art keywords
coalesced
format
address
translation
entry
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/GB2024/050276
Other languages
French (fr)
Inventor
Abdel Hadi Moustafa
Guillaume Bolbenes
Albin Pierrick Tonnerre
Paolo Monti
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ARM Ltd
Original Assignee
ARM Ltd
Advanced Risc Machines Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ARM Ltd, Advanced Risc Machines Ltd
Priority to CN202480017762.4A (CN120883198A)
Publication of WO2024194593A1

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10 Address translation
    • G06F12/1009 Address translation using page tables, e.g. page table structures
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0864 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using pseudo-associative means, e.g. set-associative or hashing
    • G06F12/0877 Cache access modes
    • G06F12/0886 Variable-length word access
    • G06F12/1027 Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB]
    • G06F12/1036 Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB] for multiple virtual address spaces, e.g. segmentation
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/10 Providing a specific technical effect
    • G06F2212/1028 Power efficiency
    • G06F2212/1041 Resource optimization
    • G06F2212/1044 Space efficiency improvement
    • G06F2212/50 Control mechanisms for virtual memory, cache or TLB
    • G06F2212/502 Control mechanisms for virtual memory, cache or TLB using adaptive policy
    • G06F2212/65 Details of virtual memory and virtual address translation
    • G06F2212/651 Multi-level translation tables
    • G06F2212/652 Page size control
    • G06F2212/657 Virtual address space management
    • G06F2212/68 Details of translation look-aside buffer [TLB]
    • G06F2212/683 Invalidation
    • G06F2212/684 TLB miss handling

Definitions

  • The present invention relates to data processing. More particularly, the present invention relates to an apparatus, a method, and a non-transitory computer-readable storage medium.
  • Some data processing apparatuses are provided with a translation lookaside buffer to store memory address translations between a block of input memory addresses in an input memory address space and a block of output memory addresses in an output memory address space. By storing memory address translations in this way the apparatus is able to translate between addresses in the input memory address space and the output memory address space without needing to request translation data from memory, provided the translation lookaside buffer stores the required memory address translation.
  • An apparatus comprising: a translation lookaside buffer comprising a plurality of entries each capable of storing translation data comprising one or more memory address translations, the one or more memory address translations each defining an address translation between a block of input memory addresses in an input memory address space and a block of output memory addresses in an output memory address space, wherein the translation lookaside buffer is configured to select, when allocating the translation data for storage within a given entry, a format used to store the translation data within the given entry, and the format is selected from a format group comprising a plurality of coalesced formats; and control circuitry configured to maintain coalesced format information identifying one or more active coalesced formats of the plurality of coalesced formats currently available for use by the translation lookaside buffer; wherein: each of the plurality of coalesced formats defines an input address range size and an output address range size; and each entry formatted using a given coalesced format of the plurality of coalesced formats is capable of identifying a plurality of memory address translations between blocks of input memory addresses, located within a same input address range having the input address range size defined in the given coalesced format, and blocks of output memory addresses, located within a same output address range having the output address range size defined in the given coalesced format.
  • A method comprising: storing, in a plurality of entries of a translation lookaside buffer, translation data comprising one or more memory address translations, the one or more memory address translations each defining an address translation between a block of input memory addresses in an input memory address space and a block of output memory addresses in an output memory address space, wherein the translation lookaside buffer is configured to select, when allocating the translation data for storage within a given entry, a format used to store the translation data within the given entry, and the format is selected from a format group comprising a plurality of coalesced formats; and maintaining coalesced format information identifying one or more active coalesced formats of the plurality of coalesced formats currently available for use by the translation lookaside buffer, wherein: each of the plurality of coalesced formats defines an input address range size and an output address range size; and each entry formatted using a given coalesced format of the plurality of coalesced formats is capable of identifying a plurality of memory address translations between blocks of input memory addresses, located within a same input address range having the input address range size defined in the given coalesced format, and blocks of output memory addresses, located within a same output address range having the output address range size defined in the given coalesced format.
  • A computer readable storage medium to store computer-readable code for fabrication of an apparatus comprising: a translation lookaside buffer comprising a plurality of entries each capable of storing translation data comprising one or more memory address translations, the one or more memory address translations each defining an address translation between a block of input memory addresses in an input memory address space and a block of output memory addresses in an output memory address space, wherein the translation lookaside buffer is configured to select, when allocating the translation data for storage within a given entry, a format used to store the translation data within the given entry, and the format is selected from a format group comprising a plurality of coalesced formats; and control circuitry configured to maintain coalesced format information identifying one or more active coalesced formats of the plurality of coalesced formats currently available for use by the translation lookaside buffer; wherein: each of the plurality of coalesced formats defines an input address range size and an output address range size; and each entry formatted using a given coalesced format of the plurality of coalesced formats is capable of identifying a plurality of memory address translations between blocks of input memory addresses, located within a same input address range having the input address range size defined in the given coalesced format, and blocks of output memory addresses, located within a same output address range having the output address range size defined in the given coalesced format.
  • Such computer-readable code can be disposed in any known transitory computer-readable medium (such as wired or wireless transmission of code over a network) or non-transitory computer-readable medium such as semiconductor, magnetic disk, or optical disc.
  • Figure 1 schematically illustrates an apparatus according to various configurations of the present techniques
  • Figure 2 schematically illustrates an apparatus according to various configurations of the present techniques
  • Figure 3 schematically illustrates a sequence of steps performed in response to a request for a translation according to various configurations of the present techniques
  • Figure 4 schematically illustrates a page table walk carried out in response to a request for a translation according to various configurations of the present techniques
  • Figure 5 schematically illustrates a sequence of steps carried out when performing a page table walk according to various configurations of the present techniques
  • Figure 6 schematically illustrates a mapping between virtual addresses and physical addresses according to various configurations of the present techniques
  • Figure 7 schematically illustrates a mapping between virtual addresses and physical addresses according to various configurations of the present techniques
  • Figure 8 schematically illustrates a mapping between virtual addresses and physical addresses according to various configurations of the present techniques
  • Figure 9 schematically illustrates a mapping between virtual addresses and physical addresses according to various configurations of the present techniques
  • Figure 10 schematically illustrates a mapping between virtual addresses and physical addresses according to various configurations of the present techniques
  • Figure 11 schematically illustrates a mapping between virtual addresses and physical addresses using a range of different formats according to various configurations of the present techniques
  • Figure 12 schematically illustrates details of an apparatus according to various configurations of the present techniques
  • Figure 13 schematically illustrates a sequence of steps carried out for performing a lookup in response to an input address, in accordance with various configurations of the present techniques
  • Figure 14 schematically illustrates a sequence of steps carried out when allocating memory address translations in a translation lookaside buffer, according to various configurations of the present techniques.
  • Figure 15 schematically illustrates fabrication of an apparatus according to various configurations of the present techniques.
  • Some processing apparatuses maintain addresses in two or more different address spaces that are used to identify locations at which data values are stored.
  • the apparatus stores translation data that defines a mapping between input addresses in an input address space and output addresses in an output address space.
  • some apparatuses are provided with a translation lookaside buffer that is arranged to store (cache) at least some of the translation data.
  • the apparatus can query (i.e., perform a lookup in) the translation lookaside buffer and, if there is a hit, perform the translation without having to retrieve the translation data from memory resulting in a faster translation.
  • entries of the translation lookaside buffer may be formatted using a coalesced format in which two or more translations that occur within a same portion of the memory address spaces can be combined into a single entry.
  • an apparatus comprising a translation lookaside buffer comprising a plurality of entries each capable of storing translation data comprising one or more memory address translations, the one or more memory address translations each defining an address translation between a block of input memory addresses in an input memory address space and a block of output memory addresses in an output memory address space.
  • the translation lookaside buffer is configured to select, when allocating the translation data for storage within a given entry, a format used to store the translation data within the given entry, and the format is selected from a format group comprising a plurality of coalesced formats.
  • the apparatus is also provided with control circuitry configured to maintain coalesced format information identifying one or more active coalesced formats of the plurality of coalesced formats currently available for use by the translation lookaside buffer.
  • Each of the plurality of coalesced formats defines an input address range size and an output address range size.
  • each entry formatted using a given coalesced format of the plurality of coalesced formats is capable of identifying a plurality of memory address translations between blocks of input memory addresses, located within a same input address range having the input address range size defined in the given coalesced format, and blocks of output memory addresses, located within a same output address range having the output address range size defined in the given coalesced format.
  • the inventors have realised that a coalesced format that is used at system start up, or following a defragmentation process, may become less useful as the memory spaces fragment.
  • In order to improve the overall coverage of the translation lookaside buffer, plural (two or more) different coalesced formats are supported. Each of the coalesced formats provides a different input address range size and a different output address range size. Therefore, if the output address space becomes fragmented, a coalesced format having a larger output address range size can be selected so that blocks of output addresses that are further apart in the output address space can be coalesced into a single coalesced entry. In this way, the potential adverse effect of fragmentation on entries of the translation lookaside buffer can be reduced and the coverage of the translation lookaside buffer for fragmented memory address spaces can be increased.
  • the apparatus is also provided with control circuitry which is configured to maintain coalesced format information in order to identify which of the plural coalesced formats are currently actively supported by the translation lookaside buffer.
  • the translation lookaside buffer can therefore be arranged to support (i.e., be capable of supporting) plural coalesced formats, but the number of those coalesced formats that are active at any one time may be fewer than the total number of coalesced formats that are supported.
  • the format used by a given entry of the translation lookaside buffer is initially determined at a time of allocation of translation data into that given entry of the translation lookaside buffer. In some configurations this format is retained so long as at least one memory address translation stored in the entry remains valid. In other configurations, the format may be changed after allocation as will be discussed in detail below.
  • the translation data that is stored in the translation lookaside buffer is a translation from a block of input memory addresses to a block (group) of output memory addresses.
  • each block identifies an aligned contiguous range of memory address space.
  • the translation data that is stored in the translation lookaside buffer may be provided at a coarser level than the individual address level.
  • the aligned contiguous range of memory address space is identified by excluding a number of least significant bits of an input address and of an output address from the translation data and mapping input addresses within a corresponding input address block to output address within a corresponding output address block by performing a fixed mapping between the least significant bits of the input address and the least significant bits of the output address.
  • the least significant bits of the input address are the least significant bits of the output address.
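  • The block-level mapping described above can be sketched as follows. This is a minimal illustration only: the 12-bit block offset (4 KiB blocks), the dictionary standing in for the cached translation data, and the names `BLOCK_BITS` and `translate` are assumptions made for the example and do not come from the specification.

```python
from typing import Dict, Optional

BLOCK_BITS = 12  # assumption: 4 KiB blocks, so the 12 least significant bits pass through


def translate(input_addr: int, block_map: Dict[int, int]) -> Optional[int]:
    """Translate an input address using a map from input block number to output block number."""
    in_block = input_addr >> BLOCK_BITS               # bits held in the translation data
    offset = input_addr & ((1 << BLOCK_BITS) - 1)     # bits excluded from the translation data
    out_block = block_map.get(in_block)
    if out_block is None:
        return None                                   # no translation stored for this block
    # Fixed mapping: the low bits of the output address equal those of the input address.
    return (out_block << BLOCK_BITS) | offset


print(hex(translate(0x12345, {0x12: 0x80})))  # 0x80345
```

  • Only the block numbers need to be stored; the offset within the block is copied unchanged, which is why the translation data can be kept at a coarser granularity than individual addresses.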
  • the apparatus is provided with various circuitry blocks, i.e., the translation lookaside buffer and the control circuitry. These circuitry blocks may be provided as discrete circuits fabricated as a same or as different integrated circuits. Alternatively, the circuitry blocks may be provided as one or more combined circuitry blocks that each work together to perform the functionality of the described circuitry blocks.
  • the control circuitry may be comprised in the translation lookaside buffer. In alternative configurations, the control circuitry may be external to the translation lookaside buffer.
  • each of the coalesced formats can be provided in a variety of ways.
  • a number of address translations that can be identified per entry is dependent on the output address range size defined in that coalesced format.
  • Each entry using one of the coalesced formats can be used to store data indicative of translations between two or more blocks of input addresses and two or more blocks of output addresses. The closer together these addresses are in the output memory address space, the more the address translations can be compressed within an entry. Therefore, there is a trade-off between the output address range size and the number of address translations that can be identified per entry, given the fixed size of each entry. The closer together the two or more blocks of output addresses, the fewer the bits needed to store address translations related to those blocks of output addresses.
  • the plurality of coalesced formats comprises at least a first coalesced format and a second coalesced format.
  • the output address range size defined in the second coalesced format is greater than the output address range size defined in the first coalesced format.
  • the number of address translations that can be identified per entry formatted using the second coalesced format is defined such that a second number of bits required for the entry formatted using the second coalesced format is fewer than or equal to a first number of bits required by an entry formatted using the first coalesced format.
  • the first coalesced format is defined as a legacy coalesced format available to translation lookaside buffers that are capable of supporting only a single coalesced format.
  • the second coalesced format is then defined such that a greater output address range size can be spanned for entries formatted using the second coalesced format with a number of address translations that can be accommodated in entries formatted using the second coalesced format being smaller than a number of address translations that can be accommodated in entries formatted using the first coalesced format.
  • the translation data that is present in the second coalesced format can be represented without requiring any additional bits to be provided beyond those that were already present in an entry of the translation lookaside buffer to support the first coalesced format.
  • the number of address translations is equal to a power of two.
  • the number of redundant bits that are present for each of the coalesced formats can be reduced. This results in an increased information density in the entries of the translation lookaside buffer.
  • because each coalesced entry may be generated based on data returned from a page table walk, e.g., an entire cache line of address translations, which typically contains a number of entries that is also equal to a power of two, there is a closer correspondence between the coalesced formats and the data returned from the page table walk.
  • each coalesced entry can be aligned to a boundary of an input address range having the input address range size. This results in a more efficient implementation.
  • the format group comprises a non-coalesced format comprising a single memory address translation between a single block of input memory addresses and a single block of output memory addresses.
  • the translation lookaside buffer is configured to store, for each of the plurality of entries, coalesced entry indicating information identifying whether the one or more memory address translations stored in that entry are represented using the non-coalesced format or one of the plurality of coalesced formats.
  • the translation lookaside buffer may be capable of storing translation data in entries using a non-coalesced format.
  • the translation lookaside buffer stores coalesced entry indicating information identifying, for each entry, whether that entry is in the non-coalesced format or whether that entry is in the coalesced format.
  • the coalesced entry indicating information is stored as part of the translation entry, for example, as a single bit at a known bit position which takes a first value when the entry is a coalesced entry and takes a second value when the entry is a non-coalesced entry.
  • the coalesced entry indicating information could be implicit.
  • an entry formatted using a coalesced format may comprise validity bits indicating a validity of each of the memory address translations where the coalesced entry indicating information is inferred based on whether or not there is more than one valid memory address translation in the entry.
  • the coalesced entry indicating information is stored in a separate storage structure.
  • the translation lookaside buffer may comprise two or more separate storage structures, one for coalesced entries and one for non-coalesced entries, with the coalesced entry indicating information being implicitly determined based on the storage structure that is used to store the translation data.
  • each of the plurality of coalesced formats defines the input address range size as covering $2^n$ blocks of input memory addresses, and defines the output address range size as covering $2^m$ blocks of output memory addresses.
  • each entry formatted using the given coalesced format stores a base input memory address, a base output memory address, and a plurality of fields each capable of storing a mapping between an n-bit offset in the input memory address space and an m-bit offset in the output memory address space.
  • the blocks of input and output memory addresses are identified, respectively, using an input address identifier and an output address identifier which exclude a number of least significant bits from the input and output addresses.
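  • One way to picture an entry of this kind is sketched below. It is an illustrative model rather than the described hardware layout: the class name `CoalescedEntry`, its field names, and the use of a dictionary (where a missing key stands in for a cleared validity bit) are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CoalescedEntry:
    n: int         # input address range covers 2**n blocks
    m: int         # output address range covers 2**m blocks
    base_in: int   # base input block number (assumed aligned to 2**n)
    base_out: int  # base output block number (assumed aligned to 2**m)
    # Maps an n-bit input offset to an m-bit output offset.
    offsets: Dict[int, int] = field(default_factory=dict)

    def add(self, in_block: int, out_block: int) -> bool:
        """Try to coalesce one block translation into this entry."""
        in_off = in_block - self.base_in
        out_off = out_block - self.base_out
        if not (0 <= in_off < (1 << self.n) and 0 <= out_off < (1 << self.m)):
            return False  # outside the ranges that this format can span
        self.offsets[in_off] = out_off
        return True

    def lookup(self, in_block: int) -> Optional[int]:
        """Return the output block number, or None if no valid translation is held."""
        out_off = self.offsets.get(in_block - self.base_in)
        return None if out_off is None else self.base_out + out_off


narrow = CoalescedEntry(n=2, m=3, base_in=0x100, base_out=0x400)
wide = CoalescedEntry(n=2, m=4, base_in=0x100, base_out=0x400)
print(narrow.add(0x102, 0x409), wide.add(0x102, 0x409))  # False True
```

  • In the usage lines, the same translation is rejected by the format with the smaller output address range and accepted by the one with the larger range, which mirrors the motivation for supporting several coalesced formats when the output address space fragments.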
  • each entry formatted using a coalesced format stores the following information:
  • $N_i$ offset fields, where $N_i$ is at most $2^n$, each having m bits in which to store an m-bit offset
  • $N_i$ validity bits, each indicating a validity of a corresponding one of the $N_i$ offsets.
  • the total number of additional bits required to support this information is given by the number of additional sets of translation offsets (the total number of translation offsets minus the one that would already be there for a non-coalesced entry), i.e., $(N_i - 1)$, multiplied by the number of bits per offset, i.e., $(m + 1)$, where the additional +1 is for the validity bits, giving $(N_i - 1)(m + 1)$ additional bits.
  • one or more additional bits may be provided to identify the translation format that has been used.
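  • The bit-count relationship above (the number of translations minus one, multiplied by m + 1) can be checked with a short calculation. The two formats used here, 8 translations with 3-bit offsets and 4 translations with 6-bit offsets, are hypothetical values chosen for illustration and are not taken from the specification.

```python
def additional_bits(num_translations: int, m: int) -> int:
    """Extra bits a coalesced entry needs beyond a non-coalesced entry.

    Each additional translation costs an m-bit offset plus one validity bit.
    """
    return (num_translations - 1) * (m + 1)


first = additional_bits(num_translations=8, m=3)   # narrower output range, more translations
second = additional_bits(num_translations=4, m=6)  # wider output range, fewer translations
print(first, second)  # 28 21
```

  • With these example values the second format spans an output range eight times larger while needing no more bits than the first, which illustrates the trade-off between output address range size and the number of translations per entry.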
  • the active coalesced formats may be configurable and, in some configurations, the control circuitry is responsive to removal of a previously active coalesced format from the active coalesced formats, to initiate a scrubbing procedure to identify entries formatted using the previously active coalesced format and to remove the identified entries.
  • the control circuitry is arranged to scrub the translation lookaside buffer to identify (determine) which entries are formatted using the previously active coalesced format.
  • the scrubbing procedure comprises disabling all or part of the translation lookaside buffer whilst scrubbing is being performed.
  • the scrubbing procedure can take any form that removes entries formatted using the previously active coalesced format.
  • the scrubbing procedure comprises invalidating at least one of the identified entries.
  • the scrubbing procedure comprises allocating at least one reformatted entry identifying at least one memory address translation comprised in one of the identified entries rewritten using a currently active coalesced format identified in the coalesced format information.
  • the scrubbing procedure can use a combination of invalidation and rewriting (reallocation) of entries dependent on one or more criteria.
  • control circuitry may be responsive to identification of an entry formatted using the previously active format, to determine whether any of the plurality of memory address translations stored in that entry can be coalesced into an entry formatted using a currently active coalesced format. If so then the entry can be reallocated and if not then the entry is invalidated.
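  • The combined invalidate-or-reallocate behaviour can be sketched as below. The sketch makes several assumptions that are not in the specification: formats are named by letters in a hypothetical `FORMATS` table of (n, m) pairs, an entry is a (format name, list of block translations) tuple, and an entry is reallocated only if all of its translations fit the new format (the text also allows coalescing a subset).

```python
from typing import Dict, List, Sequence, Tuple

Translation = Tuple[int, int]            # (input block number, output block number)
Entry = Tuple[str, List[Translation]]    # (format name, translations held)

# Hypothetical format table: name -> (n, m)
FORMATS: Dict[str, Tuple[int, int]] = {"A": (3, 3), "B": (2, 6)}


def fits(translations: Sequence[Translation], n: int, m: int) -> bool:
    """True if all translations lie in one aligned 2**n input range and one aligned 2**m output range."""
    in_ranges = {i >> n for i, _ in translations}
    out_ranges = {o >> m for _, o in translations}
    return len(in_ranges) == 1 and len(out_ranges) == 1


def scrub(entries: List[Entry], removed: str, active: Sequence[str]) -> List[Entry]:
    """Remove entries in the deactivated format, rewriting them in an active format where possible."""
    kept: List[Entry] = []
    for fmt, translations in entries:
        if fmt != removed:
            kept.append((fmt, translations))          # unaffected entry
            continue
        new_fmt = next((f for f in active if fits(translations, *FORMATS[f])), None)
        if new_fmt is not None:
            kept.append((new_fmt, translations))      # reallocated in an active format
        # otherwise the entry is invalidated, i.e. dropped
    return kept


tlb = [
    ("B", [(0x10, 0x100), (0x11, 0x13F)]),  # output blocks too far apart for format A
    ("B", [(0x20, 0x200), (0x21, 0x201)]),  # fits format A
    ("A", [(0x30, 0x300)]),
]
print(scrub(tlb, removed="B", active=["A"]))
```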
  • the translation data is allocated to a corresponding entry of the plurality of entries based on an index derived from one or more indexing bits of a common address portion of the one or more blocks of input memory addresses associated with the one or more memory address translations comprised in the translation data, and the one or more indexing bits are dependent on the format used to store the translation data.
  • the index is the one or more indexing bits.
  • a mapping function may be applied to the one or more indexing bits to derive the index
  • the mapping function may be a hash function.
  • the indexing bits may comprise one or more least significant bits of the common portion of the address.
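  • A sketch of format-dependent indexing follows. It assumes 4 KiB blocks, a 16-set buffer, and that the index is taken directly from the least significant bits of the common address portion; the names `BLOCK_BITS`, `INDEX_BITS` and `set_index` are hypothetical, and a non-coalesced entry is modelled as n = 0.

```python
BLOCK_BITS = 12  # assumption: 4 KiB blocks
INDEX_BITS = 4   # assumption: 16 sets


def set_index(input_addr: int, n: int) -> int:
    """Index derived from the low bits of the address portion shared by all 2**n blocks."""
    common = input_addr >> (BLOCK_BITS + n)      # common address portion for this format
    return common & ((1 << INDEX_BITS) - 1)      # least significant bits of that portion


addr = 0x12345678
print(set_index(addr, 0), set_index(addr, 2), set_index(addr, 3))  # 5 1 8
```

  • Because the n block-select bits are shifted out before the index is taken, every block that can share a coalesced entry maps to the same set, while the same address may map to a different set under a different format.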
  • the translation lookaside buffer is formulated as a set associative cache comprising a plurality of sets each identified by a corresponding index and each comprising a plurality of set entries of the plurality of entries.
  • the control circuitry is configured to determine one or more skew bits identifying, for given translation data, which of the plurality of set entries can be used to store the given translation data formatted using a given format of the format group.
  • a first subset of the set entries may be identified by the one or more skew bits as being suitable for storing the translation data if it is in a first format of the format group (e.g., one of the coalesced formats) and a second subset of the set entries may be identified as being suitable for storing the translation data if it is in a second format of the format group (e.g., another of the coalesced formats or the non-coalesced format).
  • the one or more skew bits may be a single bit with the first subset of entries being indicated by a first value and the second subset of entries being indicated by the second value.
  • the first subset of entries and the second subset of entries are the same size. For example, a skew bit having a logical 1 may indicate that even set entries are being used for the first format and that odd set entries are being used for the second format.
  • the sizes of the first subset and the second subset may be different.
  • the one or more skew bits may comprise a plurality of skew bits identifying a different subset of the plurality of entry sets that could be used for each format identified in the format group. This approach provides a great deal of flexibility in terms of which entry formats are used for different entry sets and can be used, for example, if it is expected that some entry formats are likely to occur less frequently than others.
  • although the bits that are used as the one or more skew bits may be independent of the active coalesced formats, in some configurations the one or more skew bits are determined based on the coalesced format information. In other words, the bits identified as the one or more skew bits can change in response to a change in the active coalesced formats identified in the coalesced format information.
  • the one or more skew bits may comprise one or more least significant bits (e.g., the least significant bit) of the bits of the input memory address that are not comprised in the indexing bits of any of the active coalesced formats.
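  • The selection of a single skew bit and its use to partition the set entries can be sketched as follows. The sketch assumes 4 KiB blocks, four indexing bits per format, a four-way set, and that a skew bit of 0 simply swaps the even/odd assignment given for a skew bit of 1; these choices and the function names are assumptions for illustration only.

```python
from typing import Iterable, List

BLOCK_BITS = 12  # assumption: 4 KiB blocks
INDEX_BITS = 4   # assumption: each format uses 4 indexing bits


def skew_bit_position(active_ns: Iterable[int]) -> int:
    """Lowest common-portion bit that no active coalesced format uses as an indexing bit.

    active_ns holds the n value (input range of 2**n blocks) of each active format
    and must not be empty.
    """
    ns = list(active_ns)
    used = set()
    for n in ns:
        start = BLOCK_BITS + n
        used.update(range(start, start + INDEX_BITS))
    bit = BLOCK_BITS + max(ns)  # lowest bit common to all blocks of the largest range
    while bit in used:
        bit += 1
    return bit


def candidate_ways(skew_bit: int, first_format: bool, num_ways: int = 4) -> List[int]:
    """Even ways for the first format when the skew bit is 1, odd ways otherwise (and vice versa)."""
    even = list(range(0, num_ways, 2))
    odd = list(range(1, num_ways, 2))
    use_even = (skew_bit == 1) == first_format
    return even if use_even else odd


print(skew_bit_position([2]), skew_bit_position([2, 3]))  # 18 19
print(candidate_ways(1, True), candidate_ways(1, False))  # [0, 2] [1, 3]
```

  • Changing the set of active formats changes the bit position returned by `skew_bit_position`, which is the situation in which previously allocated entries may need to be moved.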
  • the one or more skew bits identify a first group of one or more set entries that can be used to store the given translation data formatted using the non-coalesced format and a second group of one or more set entries that can be used to store the given translation data formatted using one of the one or more active coalesced formats.
  • control circuitry is responsive to a change in bits identified as the one or more skew bits, to perform a reallocation procedure to reallocate entries storing the translation data formatted using the non-coalesced format.
  • the reallocation procedure may comprise identifying entries of the translation lookaside buffer that use the non-coalesced format, comparing a value of the previous skew bit(s) to a value of the new skew bit(s) and, if the values are different, reallocating translation data to a different set entry.
  • if the values are the same, the reallocation procedure can skip that entry and move on to identifying another entry that is formatted using the non-coalesced format.
  • for example, if the previously allocated skew bits comprised a single bit, i.e., bit 16, and the newly allocated skew bits comprised a single bit, i.e., bit 18, then the bit at bit position 16 and the bit at bit position 18 would be compared to determine whether the value of the skew bit has changed. This comparison is performed on a per entry basis as it is dependent on the precise value of the input address bit(s) used as the skew bit(s).
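The per-entry comparison described for the reallocation procedure can be sketched as follows. This is a minimal illustration only: the helper names `skew_value` and `entries_to_reallocate` are hypothetical, and the entries are modelled simply as the input addresses of non-coalesced entries rather than as real TLB state.

```python
def skew_value(address: int, skew_bit: int) -> int:
    """Return the value (0 or 1) of the address bit used as the skew bit."""
    return (address >> skew_bit) & 1


def entries_to_reallocate(non_coalesced_addresses, old_skew_bit, new_skew_bit):
    """Return the addresses whose skew value differs between old and new skew bit.

    Entries whose skew value is unchanged are skipped, as in the text.
    """
    return [
        addr
        for addr in non_coalesced_addresses
        if skew_value(addr, old_skew_bit) != skew_value(addr, new_skew_bit)
    ]


# Hypothetical input addresses of entries held in the non-coalesced format.
addresses = [0x00000, 0x10000, 0x40000, 0x50000]
moved = entries_to_reallocate(addresses, old_skew_bit=16, new_skew_bit=18)
```

With bit 16 as the old skew bit and bit 18 as the new one, only the addresses in which exactly one of those two bits is set need to move to a different set entry.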
  • the bits determined to be the one or more skew bits comprise at least one bit belonging to the common address portion and different to the one or more indexing bits for each of the one or more active coalesced formats.
  • the input memory address can therefore be subdivided into the following sections [tag, index, LSBs] where the LSBs are a portion of the input memory address that is mapped directly, i.e., without performing a lookup, to the output memory address.
  • the index bits are used as index generating bits to identify a location in the translation lookaside buffer.
  • the tag portion of the input memory address is used for comparison to bits of input memory addresses stored in each set entry in order to determine if the set entry is a match.
  • the one or more skew bits comprise one or more of the bits (e.g., a single bit) of the tag portion.
  • the one or more skew bits can be any of the tag bits that are not used as index bits in any of the active coalesced formats identified in the coalesced format information or as index bits for the non-coalesced format.
  • the one or more skew bits comprise the least significant one or more bits of the tag bits that are not used as an index in any of the active coalesced formats identified in the coalesced format information or as index bits for the non-coalesced format.
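One way to picture the selection of a single skew bit is the sketch below. It assumes that, for every format, the tag bits are the address bits above that format's highest index bit, and it picks the lowest bit that is a tag bit in all formats. The function name `select_skew_bit`, the format names and the bit ranges are all hypothetical and are not taken from the source.

```python
def select_skew_bit(index_bits_by_format, address_width):
    """Return the least significant address bit that is a tag bit in every format.

    index_bits_by_format maps a format name to the address bit positions
    that the format uses to generate its index (an assumed representation).
    """
    common_tag_bits = set(range(address_width))
    for index_bits in index_bits_by_format.values():
        # Assumption: the tag of a format is everything above its index bits.
        tag_bits = set(range(max(index_bits) + 1, address_width))
        common_tag_bits &= tag_bits
    if not common_tag_bits:
        raise ValueError("no tag bit is free to act as a skew bit")
    return min(common_tag_bits)


# Hypothetical index ranges for a 48-bit input address.
only_non_coalesced = {"non_coalesced": range(12, 16)}
with_coalesced = {"non_coalesced": range(12, 16), "coalesced_x4": range(14, 18)}

before = select_skew_bit(only_non_coalesced, 48)
after = select_skew_bit(with_coalesced, 48)
```

In this made-up configuration, activating a coalesced format whose index reaches up to bit 17 moves the skew bit from bit 16 to bit 18, which is the kind of change that would trigger the reallocation procedure.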
  • the coalesced format information identifies a single active coalesced format.
  • the coalesced format information can be stored globally for all entries in the translation lookaside buffer.
  • the global storage may be provided as one or more dedicated control bits that identify which of the plurality of coalesced formats is the single active coalesced format at any given time.
  • the one or more dedicated control bits may be provided as part of the translation lookaside buffer, the control circuitry, or separately from the translation lookaside buffer and the control circuitry, for example, as part of one or more control registers.
  • the one or more control bits also identify a previous coalesced format to facilitate identification and reallocation/invalidation of entries that are formatted using the previous coalesced format.
  • the coalesced format information identifies a plurality of active coalesced formats, and each of the plurality of entries stores information identifying which of the plurality of active coalesced formats is used to represent translation data stored in that entry.
  • the information may comprise one or more additional bits added to (encoded in) each entry of the translation lookaside buffer.
  • the information may be encoded using a plurality of bits including at least one bit identifying whether the entry is a coalesced or a non-coalesced entry.
  • each entry may be provided with two bits to identify the four possible formats (non-coalesced format, or one of the three coalesced formats).
  • the plurality of active coalesced formats comprises all of the plurality of coalesced formats.
  • the plurality of active coalesced formats comprises a subset of the coalesced formats.
  • three coalesced formats may be available with only two active coalesced formats at any given time.
  • the information identifying which of the plurality of active coalesced formats is used to represent the translation data stored in each entry may be provided as a table, separate from the translation lookaside buffer.
  • a lookup in the translation lookaside buffer for translation data of a particular format could be avoided and/or curtailed if the table identifies that, for a given index determined given that particular format, there are no entries formatted using the particular format present in any of the set entries identified by that given index.
  • the apparatus comprises address processing circuitry responsive to an input lookup address in the input memory address space and for each given active coalesced format of the active coalesced formats: to generate an index based on at least a portion of the input lookup address, wherein the portion is dependent on the given active coalesced format, to determine whether an identified entry corresponding to the input lookup address and formatted using the given active coalesced format is present in the translation lookaside buffer at a location identified by the index, and in response to a determination that the identified entry is present in the translation lookaside buffer formatted in the given coalesced format, to determine a translated output address based on identified translation data stored in the identified entry.
  • the address processing circuitry may be provided as a dedicated circuitry block or may be provided as part of one or more of the control circuitry and the translation lookaside buffer.
  • the address processing circuitry is configured to perform lookups for at least two formats (e.g. at least two active coalesced formats or one or more active coalesced formats and the non-coalesced format) in parallel.
  • each lookup may be performed sequentially.
  • the determination that the identified entry is present may comprise comparing a tag portion of the input lookup address to tag portions stored in each entry of the translation lookaside buffer identified by the index.
  • the address processing circuitry is responsive to the input lookup address: to generate a non-coalesced index based on at least a further portion of the input lookup address, to determine whether a further identified entry corresponding to the input lookup address and formatted using the non-coalesced format is present in the translation lookaside buffer at a location identified by the non-coalesced index, and in response to a determination that the further identified entry is present in the translation lookaside buffer formatted in the non-coalesced format, to determine the translated output address based on further translation data stored in the further identified entry.
  • the entry or entries identified by the non-coalesced index may, for the same input memory address, be different to those identified by the index for a coalesced entry.
  • the non-coalesced lookup and the coalesced lookup may be performed either in parallel or sequentially.
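The format-dependent lookup can be illustrated with a small software model. The sketch assumes a set-associative structure held in a dictionary, a per-entry format field, and three made-up formats that differ only in how many low address bits one entry covers. `INDEX_BITS`, `REGION_SHIFT` and the function names are hypothetical, and the lookups are performed sequentially rather than in parallel.

```python
from dataclasses import dataclass

INDEX_BITS = 4  # hypothetical: 16 sets

# Hypothetical formats: number of low address bits covered by one entry.
# 12 bits = one 4 KiB block, 14 bits = 4 blocks, 15 bits = 8 blocks.
REGION_SHIFT = {"non_coalesced": 12, "coalesced_x4": 14, "coalesced_x8": 15}


@dataclass
class Entry:
    fmt: str   # format used to represent the translation data
    tag: int   # tag portion of the input address
    data: str  # stands in for the stored translation data


def index_and_tag(address, fmt):
    """Derive the index and tag from the address bits relevant to this format."""
    region = address >> REGION_SHIFT[fmt]
    return region & ((1 << INDEX_BITS) - 1), region >> INDEX_BITS


def allocate(tlb, address, fmt, data):
    index, tag = index_and_tag(address, fmt)
    tlb.setdefault(index, []).append(Entry(fmt, tag, data))


def lookup(tlb, address, formats):
    """Try each format in turn; return the matching entry or None."""
    for fmt in formats:
        index, tag = index_and_tag(address, fmt)
        for entry in tlb.get(index, []):
            if entry.fmt == fmt and entry.tag == tag:
                return entry
    return None


tlb = {}
allocate(tlb, 0x8000, "coalesced_x8", "cluster A")
allocate(tlb, 0x23000, "non_coalesced", "page B")
formats = ["non_coalesced", "coalesced_x4", "coalesced_x8"]
hit_cluster = lookup(tlb, 0xF123, formats)
hit_page = lookup(tlb, 0x23456, formats)
miss = lookup(tlb, 0x24000, formats)
```

Because the index is derived from different address bits for each format, the same input address can select a different set for the non-coalesced lookup than for a coalesced lookup, as noted in the text.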
  • the address processing circuitry is responsive to the translation lookaside buffer not storing the required memory address translation, to: trigger a page table walk using the input lookup address to determine the translated output memory address from a plurality of page tables, and allocate new translation data to one of the plurality of entries in the translation lookaside buffer, the new translation data representing translation of at least the input lookup address to the translated output address and being stored in the one of the plurality of entries in a format chosen from the format group.
  • the translated output memory address may be determined from a page table walk comprising plural accesses to page tables stored in memory.
  • the translated output memory address may be allocated in the translation lookaside buffer according to an allocation policy which may identify, for example, when the translation lookaside buffer is full (at capacity), whether existing entries of the translation lookaside buffer are to be replaced by the translated output memory address.
  • the page table walk returns a plurality of output memory addresses including the translated output memory address.
  • the address translation circuitry is responsive to a determination that the plurality of output memory addresses can be coalesced into a single entry using one of the active coalesced formats, to represent the new translation data using one of the active coalesced formats.
  • the address translation circuitry is responsive to a determination that the plurality of output memory addresses cannot be coalesced into the single entry using one of the active coalesced formats, to represent the new translation data using the non-coalesced format.
  • a page table walk will return an entire cache line of translations corresponding to consecutive blocks of input memory addresses.
  • the address translation circuitry can be configured to identify, for each active coalesced format, which of the plurality of output memory addresses can be grouped using that active coalesced format. The determination may be based on whether the output memory addresses are located within a same output address range having the output address range size defined in the given coalesced format and whether the input memory addresses are located within a same input address range as the input memory address, the input address range having the input address range size defined in the given coalesced format.
  • the address processing circuitry may be responsive to a determination that the plurality of output memory addresses can be coalesced into a single entry using one of the active coalesced formats, to generate the new translation data by generating a plurality of candidate sets of translation data each corresponding to one of the plurality of active coalesced formats and to select the new translation data out of the plurality of candidate sets of translation data based on a number of active translations associated with each of the plurality of candidate sets of translation data.
  • the candidate set of translation data selected may be candidate translation data having the most active translations or, where there is no single set of candidate translation data having the most active translations, the selected candidate set of translation data may be selected based on another criterion, for example the candidate set of translation data associated with the largest/smallest range of the output address space.
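The candidate generation and selection described above can be sketched as follows. The sketch works on block numbers rather than full addresses and describes each active format by a pair of hypothetical clustering factors (number of input blocks, number of output blocks). Breaking ties in favour of the smaller output range, and falling back to the non-coalesced format when fewer than two translations can be grouped, are assumptions made for the example; the text leaves both choices open.

```python
def build_candidate(translations, requested_va, va_cf, pa_cf):
    """Collect the translations that fit one coalesced entry of the given format."""
    requested_pa = translations[requested_va]
    va_base = requested_va - requested_va % va_cf  # aligned input range
    pa_base = requested_pa - requested_pa % pa_cf  # aligned output range
    return {
        va: pa
        for va, pa in translations.items()
        if va_base <= va < va_base + va_cf and pa_base <= pa < pa_base + pa_cf
    }


def choose_format(translations, requested_va, active_formats):
    """Pick the format whose candidate holds the most active translations."""
    best_name, best = None, {}
    for name, (va_cf, pa_cf) in active_formats.items():
        candidate = build_candidate(translations, requested_va, va_cf, pa_cf)
        more = len(candidate) > len(best)
        # Assumed tie-break: prefer the smaller output address range.
        tie = (best_name is not None and len(candidate) == len(best)
               and pa_cf < active_formats[best_name][1])
        if more or tie:
            best_name, best = name, candidate
    if len(best) < 2:  # nothing to coalesce (assumed threshold)
        return "non_coalesced", {requested_va: translations[requested_va]}
    return best_name, best


# Hypothetical active formats: (input blocks, output blocks) per entry.
active = {"va8_pa8": (8, 8), "va4_pa16": (4, 16)}
walk_result = {0x8800: 0xF003, 0x8801: 0xF004, 0x8802: 0xF005,
               0x8803: 0xF006, 0x8804: 0xF007, 0x8807: 0xF000}
name, chosen = choose_format(walk_result, 0x8800, active)
fragmented_name, _ = choose_format({0x8800: 0xF003, 0x8801: 0xF00C}, 0x8800, active)
```

In the second call the two output blocks are too far apart for the format with the smaller output range, so the format with the larger output range is the only one that can still coalesce them.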
  • coalesced format identifying the smallest output address range size.
  • Such a coalesced format may not allow as many output memory addresses to be coalesced but may still allow coalesced entries to be generated as the memory address space fragments, thereby increasing the number of address translations that can be allocated to the translation lookaside buffer as fragmentation increases.
  • the input memory address space and the output memory address space can be any memory address spaces.
  • the block of input memory addresses is a block of virtual memory addresses and the input memory address space is a virtual memory address space; and the block of output memory addresses is a block of physical memory addresses and the output memory address space is a physical memory address space.
  • one of the blocks of input or output memory addresses is a block of intermediate physical memory addresses and one of the input or output memory address spaces is an intermediate physical memory address space such that the translation lookaside buffer stores one or more of virtual address to physical address translations, virtual address to intermediate physical address translations, and intermediate physical address to physical address translations. It would be readily apparent to the skilled person that a translation lookaside buffer could be provided to translate between blocks of memory addresses in any memory address space in addition to those identified herein.
  • Concepts described herein may be embodied in computer-readable code for fabrication of an apparatus that embodies the described concepts.
  • the computer-readable code can be used at one or more stages of a semiconductor design and fabrication process, including an electronic design automation (EDA) stage, to fabricate an integrated circuit comprising the apparatus embodying the concepts.
  • the above computer-readable code may additionally or alternatively enable the definition, modelling, simulation, verification and/or testing of an apparatus embodying the concepts described herein.
  • the computer-readable code for fabrication of an apparatus embodying the concepts described herein can be embodied in code defining a hardware description language (HDL) representation of the concepts.
  • the code may define a register-transfer-level (RTL) abstraction of one or more logic circuits for defining an apparatus embodying the concepts.
  • the code may define a HDL representation of the one or more logic circuits embodying the apparatus in Verilog, SystemVerilog, Chisel, or VHDL (Very High-Speed Integrated Circuit Hardware Description Language) as well as intermediate representations such as FIRRTL.
  • Computer-readable code may provide definitions embodying the concept using system-level modelling languages such as SystemC and SystemVerilog or other behavioural representations of the concepts that can be interpreted by a computer to enable simulation, functional and/or formal verification, and testing of the concepts.
  • the computer-readable code may define a low-level description of integrated circuit components that embody concepts described herein, such as one or more netlists or integrated circuit layout definitions, including representations such as GDSII.
  • the one or more netlists or other computer-readable representation of integrated circuit components may be generated by applying one or more logic synthesis processes to an RTL representation to generate definitions for use in fabrication of an apparatus embodying the invention.
  • the one or more logic synthesis processes can generate from the computer-readable code a bitstream to be loaded into a field programmable gate array (FPGA) to configure the FPGA to embody the described concepts.
  • the FPGA may be deployed for the purposes of verification and test of the concepts prior to fabrication in an integrated circuit or the FPGA may be deployed in a product directly.
  • the computer-readable code may comprise a mix of code representations for fabrication of an apparatus, for example including a mix of one or more of an RTL representation, a netlist representation, or another computer-readable definition to be used in a semiconductor design and fabrication process to fabricate an apparatus embodying the invention.
  • the concept may be defined in a combination of a computer-readable definition to be used in a semiconductor design and fabrication process to fabricate an apparatus and computer-readable code defining instructions which are to be executed by the defined apparatus once fabricated.
  • Such computer-readable code can be disposed in any known transitory computer-readable medium (such as wired or wireless transmission of code over a network) or non-transitory computer-readable medium such as semiconductor, magnetic disk, or optical disc.
  • An integrated circuit fabricated using the computer-readable code may comprise components such as one or more of a central processing unit, graphics processing unit, neural processing unit, digital signal processor or other components that individually or collectively embody the concept.
  • FIG. 1 schematically illustrates an example of a data processing apparatus comprising: one or more processing elements (PE) 100, an interconnect circuit 110, a dynamic random access memory (DRAM) 120 and a DRAM controller 130.
  • Each of the processing elements 100 can access at least some of the memory locations in the DRAM 120. In principle this access could be directly via actual (physical) memory addresses.
  • the processing elements 100 refer to memory addresses by virtual memory addresses. These require translation into output or physical memory addresses to access real (physical) memory locations in the DRAM 120.
  • translation apparatus 115 such as a Memory Management Unit (MMU).
  • This arrangement therefore provides an example of data processing apparatus comprising: a memory 120 accessible according to physical memory addresses; one or more processing elements 100 to generate virtual memory addresses for accessing the memory; and memory address translation apparatus 115 to provide a translation of the initial memory addresses generated by the one or more processing elements to physical memory addresses provided to the memory.
  • the virtual memory addresses may be considered as input memory addresses and the physical memory addresses as output memory addresses.
  • address translation can (from the point of view of a processing element 100) be performed by a translation lookaside buffer (TLB) 105 associated with that processing element.
  • the TLB 105 stores or buffers recently-used translations between virtual memory addresses and physical memory addresses.
  • the processing element 100 refers a virtual memory address to the TLB 105
  • the virtual memory address is translated to a physical memory address which then forms part of a memory access to the DRAM 120.
  • the TLB has limited size and cannot store every single possible memory address translation which may be called upon by the processing element 100.
  • the TLB refers the request to the translation apparatus 115, for example forming part of the interconnect circuitry 110.
  • the translation apparatus operates to provide or otherwise obtain the required translation and pass it back to the TLB 105 where it can be stored and used to translate a virtual memory address into a physical memory address.
  • FIG. 2 schematically illustrates the use of a translation lookaside buffer (TLB) 105 and control circuitry 103.
  • VA virtual address
  • FIG. 3 which is a schematic flowchart illustrating operations of the TLB 105
  • supply of a VA 102 to the TLB 105 forms a request for a translation to be performed to determine a corresponding output physical address (PA) 104 for the VA 102 (shown in Figure 3 as a step 200).
  • the TLB 105 contains a cache or store of translations between VA and PA.
  • the criteria by which the TLB 105 stores particular VA to PA translations can be established according to known techniques for the operation of a TLB and will be discussed further below.
  • the cached translations might include recently used translations, frequently used translations and/or translations which are expected to be required soon (such as translations relating to VAs which are close to recently-accessed VAs).
  • the TLB contains a cache of a subset of the set of all possible VA to PA translations, such that when a particular VA to PA translation is required, it may be found that the translation is already held in the cache at the TLB.
  • the TLB 105 detects whether the required translation is indeed currently cached by the TLB. If the answer is yes, then control passes to a step 240 in which the required translation is applied to the VA 102 to generate the PA 104. However, if the answer is no, then control passes to a step 220 at which the TLB 105 sends a request, comprising the required VA 222, to the MMU 115.
  • the MMU 115 derives the required VA to PA translation (using techniques to be discussed below) and sends at least the PA 232 corresponding to the VA 222 back to the TLB 105 where it is stored at a step 230.
  • the TLB 105 applies the translation stored at the TLB 105 to provide the output PA 104.
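The request, hit/miss detection, MMU request and store steps can be modelled in software as below. This is only a behavioural sketch: the class name `SimpleTLB`, the 4 KiB page size and the `walk` callback standing in for the MMU 115 are assumptions, and no capacity limit or replacement policy is modelled.

```python
PAGE_SHIFT = 12  # assumed 4 KiB pages


class SimpleTLB:
    """Behavioural model of a TLB that falls back to an MMU on a miss."""

    def __init__(self, walk):
        self.cache = {}   # cached VA page -> PA page translations
        self.walk = walk  # stands in for the MMU deriving a translation
        self.misses = 0

    def translate(self, va):
        # Step 200: a VA is supplied as a translation request.
        page = va >> PAGE_SHIFT
        offset = va & ((1 << PAGE_SHIFT) - 1)
        if page not in self.cache:
            # Step 220: send the VA to the MMU; step 230: store the result.
            self.misses += 1
            self.cache[page] = self.walk(page)
        # Step 240: apply the cached translation to produce the PA.
        return (self.cache[page] << PAGE_SHIFT) | offset


# Hypothetical MMU behaviour: map page N to page N + 0x100.
tlb = SimpleTLB(walk=lambda page: page + 0x100)
first = tlb.translate(0x1234)
second = tlb.translate(0x1238)  # same page, served from the cache
```

The second request falls in the same page as the first, so it is served from the cached translation and the miss counter stays at one.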
  • the TLB 105 is capable of storing translation data within entries that are formatted using one of a plurality of possible coalesced formats where each entry formatted using a given one of the coalesced formats is capable of identifying a plurality of memory address translations between blocks of input memory addresses, located within a same input address range having the input address range size defined in the given coalesced format, and blocks of output memory addresses, located within a same output address range having the output address range size defined in the given coalesced format.
  • the TLB 105 is provided with control circuitry 103 which is arranged to maintain coalesced format information identifying which of the coalesced formats are currently available for use.
  • the information identifying the coalesced formats may be stored in the TLB 105, the control circuitry 103, or as part of one or more sets of control information stored external to the TLB 105 and the control circuitry 103.
  • Figure 4 schematically illustrates an example of a stage 1 page table walk (PTW) process
  • Figure 5 is a schematic flowchart illustrating a PTW process.
  • a VA 222 which requires translation is formed as a 48-bit value.
  • the techniques are applicable to addresses of various lengths, and indeed the length of a VA need not necessarily be the same as the length of a PA. Different portions of the VA 222 are used at different stages in the PTW process.
  • a base address stored in a base address register 300 (Figure 4) is obtained at a step 400 (Figure 5).
  • a first portion 312 of the VA 222 is added to the base address as an offset, at a step 410, so as to provide the PA 314 of an entry in a level 0 table 310.
  • the relevant page table entry is looked up in physical memory, or in any intervening cache (e.g., a level 2 cache 50) if the relevant page is cached, at a step 430.
  • a detection is made as to whether a final level (level 3 in the example of Figure 4) of the page table walk has been reached in the page table hierarchy. If not, as in the present case, control passes to a step 450 at which the retrieved page table entry is used as a base address of a next table in the hierarchy. The page table entry therefore provides the base address of the next level table in the hierarchy, a "level 1 table" 320. Control returns to the step 410.
  • a further part 322 of the VA 222, being the next 9 bits [38:30] of the VA 222, forms an offset from the base address of the table 320 in order to provide the PA of an entry 324 in the table 320. This then provides the base address of a "level 2 table" 330 which in turn (by the same process) provides the base address of a "level 3 table" 340.
  • the answer to the detection at the step 440 is "yes".
  • the page table entry indicated by the PA 344 provides a page address and access permissions relating to a physical memory page.
  • the remaining portion 352 of the VA 222 namely the least significant 12 bits [11:0] provides a page offset within the memory page defined by the page table entry at the PA 344, though in an example system which stores information as successive four byte (for example 32 bit) portions, it may be that the portion [11:2] provides the required offset to the address of the appropriate 32 bit word.
  • the combination (at a step 460) of the least significant portion of the VA 222 and the final page table entry (in this case, from the "level 3 table" 340) provides (at a step 470) the PA 232 as a translation of the VA 222.
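The walk can be sketched in a few lines. The text gives the 48-bit VA, the 9-bit field [38:30] and the 12-bit page offset [11:0]; the sketch assumes that the other three lookups also use 9-bit fields ([47:39], [29:21] and [20:12]), that each page table entry occupies 8 bytes, and that memory is a flat dictionary keyed by physical address. All table addresses in the example are made up.

```python
LEVEL_SHIFTS = (39, 30, 21, 12)  # assumed fields [47:39], [38:30], [29:21], [20:12]
ENTRY_SIZE = 8                   # assumed size of a page table entry in bytes
PAGE_OFFSET_MASK = 0xFFF         # least significant 12 bits [11:0]


def page_table_walk(memory, base_address, va):
    """Follow the table hierarchy and return the PA for the given VA."""
    table_base = base_address                     # step 400: obtain base address
    for shift in LEVEL_SHIFTS:
        index = (va >> shift) & 0x1FF             # 9-bit portion of the VA
        entry_pa = table_base + index * ENTRY_SIZE  # step 410: base plus offset
        table_base = memory[entry_pa]             # step 430: look up the entry
        # Step 450: the entry becomes the base address of the next table,
        # or, after the last iteration, the page address.
    return table_base | (va & PAGE_OFFSET_MASK)   # steps 460/470: combine


# Hypothetical page tables placed at made-up physical addresses.
memory = {
    0x1000 + 1 * ENTRY_SIZE: 0x2000,
    0x2000 + 2 * ENTRY_SIZE: 0x3000,
    0x3000 + 3 * ENTRY_SIZE: 0x4000,
    0x4000 + 4 * ENTRY_SIZE: 0xF0000,
}
va = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0xABC
pa = page_table_walk(memory, 0x1000, va)
```

Each iteration corresponds to one pass through the steps 410 to 450, and the final combination with the page offset yields the translated PA.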
  • multiple stage MMUs are used in some situations. In this arrangement, two levels of translation are in fact used.
  • a virtual address (VA) required by an executing program or other system module such as a graphics processing unit (GPU) is translated to an intermediate physical address (IPA) by a first MMU stage.
  • the IPA is translated to a physical address (PA) by a second MMU stage.
  • One reason why multiple stage translation is used is for security of information handling when multiple operating systems (OS) may be in use on respective “virtual machines” running on the same processor.
  • a particular OS is exposed to the VA to IPA translation, whereas only a hypervisor (software which oversees the running of the virtual machines) has oversight of the stage 2 (IPA to PA) translation.
  • the VA may be considered as the input memory address and the IPA as the output memory address.
  • the IPA may be considered as the input memory address and the PA as the output memory address.
  • the returned PA is a single PA
  • an entire cache line of physical addresses is returned with the required physical address identified by one or more of the bits of the 12-bit page offset 352.
  • the TLB 105 is responsive to the returned cache line of physical addresses to determine whether one or more of the translations identified in the cache line can be allocated, in addition to the returned PA, as a coalesced entry in the TLB 105 and, if so, the TLB 105 or address processing circuitry associated therewith is arranged to allocate a coalesced entry in the TLB 105.
  • FIG. 6 schematically illustrates a coalesced (clustered) TLB entry 600 according to various configurations of the present techniques.
  • the coalesced TLB entry 600 contains translation data mapping each of the virtual addresses 602 to each of the physical addresses 604.
  • the coalesced TLB entry 600 comprises an aligned virtual address (VA aligned), an aligned physical address (PA aligned), a validity map, and a plurality of clustered PA offset fields (PA cluster).
  • 8 virtual address blocks are clustered into the entry so the aligned virtual address is aligned to an address boundary of a region having a size of 8 virtual address blocks.
  • the aligned physical address is aligned to an address boundary of a region of the physical address space sized to include all possible physical address blocks that can be identified by the PA offsets.
  • the PA offsets are each 3 bits identifying 1 of 8 possible physical address blocks and the aligned PA is aligned to a physical address boundary of a physical address region having a size of 8 physical address blocks.
  • the validity map indicates which PA offsets of the PA offsets identify a valid translation and the PA offsets each identify a corresponding PA offset to be added to the aligned PA address in order to identify a particular address translation.
  • the VAs 0x8800 to 0x8804 are mapped, respectively, to the physical addresses 0xF003 to 0xF007, VAs 0x8805 and 0x8806 are unmapped (no valid translation) and VA 0x8807 is mapped to PA 0xF000.
  • the address mappings are identified by recording the aligned VA (i.e., the portion of the VA that is common to each of the VAs 602), the aligned PA (i.e., the portion of the PA that is common to each of the PAs 604), and each of the PA offsets in order of the VA to which they correspond.
  • The translation from VA 0x8800 to PA 0xF003 is identified in the coalesced TLB entry 600 through the aligned VA 0x8800, the aligned PA 0xF000, a set validity bit in the zeroth position (the right most position) of the validity map, and a PA offset of 3 in the zeroth position (the right most position) of the PA offsets.
  • The translation from VA 0x8801 to PA 0xF004 is identified in the coalesced TLB entry 600 through the aligned VA 0x8800, the aligned PA 0xF000, a set validity bit in the first position of the validity map, and a PA offset of 4 in the first position of the PA offsets.
  • The translation from VA 0x8802 to PA 0xF005 is identified in the coalesced TLB entry 600 through the aligned VA 0x8800, the aligned PA 0xF000, a set validity bit in the second position of the validity map, and a PA offset of 5 in the second position of the PA offsets.
  • The translation from VA 0x8803 to PA 0xF006 is identified in the coalesced TLB entry 600 through the aligned VA 0x8800, the aligned PA 0xF000, a set validity bit in the third position of the validity map, and a PA offset of 6 in the third position of the PA offsets.
  • The translation from VA 0x8804 to PA 0xF007 is identified in the coalesced TLB entry 600 through the aligned VA 0x8800, the aligned PA 0xF000, a set validity bit in the fourth position of the validity map, and a PA offset of 7 in the fourth position of the PA offsets.
  • the invalid translations for VAs 0x8805 and 0x8806 are identified in the coalesced TLB entry 600 as invalid entries through a clear (not set) validity bit in the fifth and sixth positions of the validity map.
  • the translation from VA 0x8807 to PA 0xF000 is identified in the coalesced TLB entry 600 through the aligned VA 0x8800, the aligned PA 0xF000, a set validity bit in the seventh position of the validity map, and a PA offset of 0 in the seventh position of the PA offsets.
  • the TLB entry 600 will be identified in response to a lookup using any of the VAs 0x8800 to 0x8807 and a corresponding PA (where the translation is valid) can be returned by identifying the specific PA offset within the TLB entry 600.
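The worked example for the coalesced TLB entry 600 can be reproduced with a small data structure. The class and function names are hypothetical, and, as in the figure, the values are treated as block numbers rather than full byte addresses.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CoalescedEntry:
    va_aligned: int        # portion of the VA common to all clustered VAs
    pa_aligned: int        # portion of the PA common to all clustered PAs
    validity: int          # bit i set => position i holds a valid translation
    pa_offsets: List[int]  # pa_offsets[i] is added to pa_aligned for position i


def translate(entry: CoalescedEntry, va: int) -> Optional[int]:
    """Return the PA for va, or None if the entry holds no valid translation."""
    position = va - entry.va_aligned
    if not 0 <= position < len(entry.pa_offsets):
        return None  # the VA lies outside the range covered by the entry
    if not (entry.validity >> position) & 1:
        return None  # validity bit clear: no translation at this position
    return entry.pa_aligned + entry.pa_offsets[position]


# Entry 600: positions 0 to 4 and 7 are valid, positions 5 and 6 are not.
entry_600 = CoalescedEntry(
    va_aligned=0x8800,
    pa_aligned=0xF000,
    validity=0b10011111,
    pa_offsets=[3, 4, 5, 6, 7, 0, 0, 0],
)
results = [translate(entry_600, 0x8800 + i) for i in range(8)]
```

The offsets stored at positions 5 and 6 are placeholders; they are never read because the corresponding validity bits are clear.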
  • Example implementations of the present disclosure relate to a TLB that is capable of supporting plural coalesced formats in which a size of a region of an input address space from which memory address translations can be coalesced into a single coalesced entry and/or a size of a region of output address space from which memory address translations can be coalesced into the single coalesced entry are different between different coalesced formats.
  • n bits can be used to express the VA offset values
  • m bits can be used to express the PA offset values, in which n does not necessarily equal m.
  • Figure 7 schematically illustrates the translation between a VA and a PA for a case where n does not equal m.
  • a VA is defined by a VA base address 900 plus a VA offset 910 of n bits (the combination of the VA base address 900 and the VA offset 910 providing the same information as one of the VAs 602 of Figure 6), plus LSBs 920 of which one or more least significant bits 930 may be set to 0 so that each VA and each PA refers to a word boundary.
  • the size of the VA space is not equal to the size of the PA space.
  • the PA space may be a 40 bit address space and the VA space may be a 48 bit address space.
  • the total number of bits provided for the VA base address is larger (by a margin 943) than the total number of bits provided for the PA base address.
  • the VA is mapped by a memory address translation to a corresponding PA defined by a PA base address 940 plus a PA offset 950 of m bits, where in the illustrated configuration m > n which also implies that the number of least significant bits that are stored for the PA base address 940 is smaller (by a margin 942) than the bit length of the VA base address 900, assuming the size of each VA and PA is the same.
  • the PA base address 940 plus the PA offset 950 is concatenated with LSBs 960 which correspond to LSBs 920 to form the whole translated PA 970.
  • FIG. 8 schematically illustrates a further example of a coalesced TLB entry 700 using a first coalesced format that may be supported by a TLB 105 according to various configurations of the present techniques.
  • the arrangement of the mappings between the VAs 702 and the PAs 704 that are recorded in the entry 700 using the first format is the same as that described in relation to figure 6 and, for reasons of conciseness, will not be repeated.
  • the TLB entry 700 formatted using the first coalesced format is capable of coalescing up to 8 memory address translations and differs from the TLB entry 600 formatted using the general coalesced format in that the TLB entry 700 formatted using the first format is able to support a greater range of PAs due to additional bits being provided for each of the PA offsets.
  • the TLB entry 700 formatted using the first coalesced format is able to store a translation 706 from VA 0x0_00F0_8805 to PA 0x000_F00F by storing the aligned VA 0x0_00F0_8800, the aligned PA 0x000_F000, a set validity bit in the fifth position of the validity map and a PA offset of F in the fifth position of the PA offsets.
  • the first coalesced format of TLB entry 700 illustrated in figure 8 uses 42 bits, in addition to the bits that would be used to identify a single memory address translation, to identify the translation data.
  • additional offsets in addition to the one offset required for a single translation
  • an additional 7 validity bits in addition to the one validity bit required for a single translation
  • One or more further bits may also be provided to indicate the format of the TLB entry 700.
  • the range of the virtual address space that can be covered by an entry in a given coalesced entry format is defined by a virtual address clustering factor (VA_CF) which specifies a number of blocks of virtual address space that can be coalesced into an entry for the given coalesced format.
  • VA_CF virtual address clustering factor
  • PA_CF physical address clustering factor
  • FIG. 9 schematically illustrates an example of a second coalesced format used by the TLB entry 800 to identify memory address translations from the VAs 802 to PAs 804.
  • the TLB entry 800 formatted using the second coalesced format is capable of identifying up to 4 memory address translations between VAs, that are within an aligned range of the virtual memory address space having a size equal to four blocks of memory addresses, and their corresponding PAs.
  • the second coalesced format of TLB entry 800 uses the same number of additional bits as the first coalesced format, i.e., 42 bits in addition to the bits that would be used to identify a single memory address translation.
  • the second coalesced format can therefore coalesce up to four translations provided that the PAs of each of the four translations are contained within a same region of the physical address space having a size equal to 2^13 times the size of a block of physical addresses.
  • the memory address translations are defined from VA 0x0_00F0_AB00 to PA 0xF0F_F001, from VA 0x0_00F0_AB01 to PA 0xF0F_FFFF, from VA 0x0_00F0_AB02 to PA 0xF0F_F00F, and from VA 0x0_00F0_AB03 to PA 0xF0F_E000.
  • The translation from VA 0x0_00F0_AB00 to PA 0xF0F_F001 is identified through the aligned VA 0x0_00F0_AB00, the aligned PA 0xF0F_E000, a set validity bit in the zeroth position of the validity map, and a PA offset of 1001 in the zeroth position of the PA offsets.
  • the PA offset has been represented using four hexadecimal values; this is for representation purposes only, and the second coalesced format provides 13 bits (three hexadecimal values and 1 additional bit) for each PA offset.
  • a PA offset, in hexadecimal notation, of 1XXX (where X is any hexadecimal value) is to be interpreted as a binary value of 1 ZZZZ ZZZZ ZZZZ (where Z is any binary value) and a PA offset, in hexadecimal notation, of 0XXX is to be interpreted as a binary value of 0 ZZZZ ZZZZ ZZZZ.
  • The translation from VA 0x0_00F0_AB01 to PA 0xF0F_FFFF is identified through the aligned VA 0x0_00F0_AB00, the aligned PA 0xF0F_E000, a set validity bit in the first position of the validity map, and a PA offset of 1FFF in the first position of the PA offsets.
  • The translation from VA 0x0_00F0_AB02 to PA 0xF0F_F00F is identified through the aligned VA 0x0_00F0_AB00, the aligned PA 0xF0F_E000, a set validity bit in the second position of the validity map, and a PA offset of 100F in the second position of the PA offsets.
  • The translation from VA 0x0_00F0_AB03 to PA 0xF0F_E000 is identified through the aligned VA 0x0_00F0_AB00, the aligned PA 0xF0F_E000, a set validity bit in the third position of the validity map, and a PA offset of 0000 in the third position of the PA offsets.
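The way an entry in the second coalesced format identifies these four translations can be sketched in software. This is a minimal model under stated assumptions: the `CoalescedEntry` class, its field names and the `lookup` method are hypothetical and only mirror the aligned VA, aligned PA, validity map and 13-bit PA offsets described for Figure 9; addresses are treated as block numbers.

```python
from dataclasses import dataclass
from typing import List, Optional

PA_OFFSET_BITS = 13  # per-translation PA offset width of the second format


@dataclass
class CoalescedEntry:
    aligned_va: int        # first VA block covered by the entry
    aligned_pa: int        # PA block aligned to 2**PA_OFFSET_BITS blocks
    valid: List[bool]      # validity map, one bit per VA block
    pa_offsets: List[int]  # one PA offset per VA block

    def lookup(self, va_block: int) -> Optional[int]:
        """Return the PA block for va_block, or None if the entry has no
        valid translation for it."""
        index = va_block - self.aligned_va
        if not 0 <= index < len(self.valid) or not self.valid[index]:
            return None
        offset = self.pa_offsets[index]
        if offset >= (1 << PA_OFFSET_BITS):
            raise ValueError("PA offset wider than the format allows")
        return self.aligned_pa | offset


entry = CoalescedEntry(
    aligned_va=0x0_00F0_AB00,
    aligned_pa=0xF0F_E000,
    valid=[True, True, True, True],
    pa_offsets=[0x1001, 0x1FFF, 0x100F, 0x0000],
)
```

Because the aligned PA has its low 13 bits clear, a bitwise OR with the offset gives the same result as an addition.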
  • FIG 10 schematically illustrates an example of a third coalesced format used by the TLB entry 900 to identify memory address translations from the VAs 902 to PAs 904.
  • the TLB entry 900 formatted using the third coalesced format is capable of identifying up to 2 memory address translations between VAs, that are within an aligned range of the virtual memory address space having a size equal to two blocks of memory addresses, and their corresponding PAs.
  • the third coalesced format of TLB entry 900 uses the same number of additional bits as the first coalesced format, i.e., 42 bits in addition to the bits that would be used to identify a single memory address translation.
  • the third coalesced format can therefore coalesce up to two translations provided that the PAs of each of the two translations are contained within a same region of the physical address space having a size equal to 2^41 times the size of a block of physical addresses.
  • the physical address space is a 40 bit address space and the third coalesced format is therefore able to coalesce entries from anywhere within the physical address space.
  • two memory address translations are defined. Specifically, the memory address translations are defined from VA 0x1_FFF0_1100 to PA 0xA93_1C0F, and from VA 0x1_FFF0_1101 to PA 0x03F_05B0.
  • The translation from VA 0x1_FFF0_1100 to PA 0xA93_1C0F is identified through the aligned VA 0x1_FFF0_1100, the aligned PA 0x000_0000, a set validity bit in the zeroth position of the validity map, and a PA offset of 0xA93_1C0F in the zeroth position of the PA offsets.
  • The translation from VA 0x1_FFF0_1101 to PA 0x03F_05B0 is identified through the aligned VA 0x1_FFF0_1100, the aligned PA 0x000_0000, a set validity bit in the first position of the validity map, and a PA offset of 0x03F_05B0 in the first position of the PA offsets.
  • the three coalesced formats illustrated in figures 8-10 each identify a different number of memory address translations with the number of memory address translations traded off against the range of the physical address space that can be coalesced into a single entry.
  • Figure 11 schematically illustrates a sequence of conversions from virtual addresses in a virtual address space 1000 to physical addresses in a physical address space 1002.
  • the translation data identifies five active (valid) translations.
  • the VA 0x8800 translates to the PA 0x0F00
  • the VA 0x8801 translates to the PA 0xFFFF
  • the VA 0x8802 translates to the PA 0x0F1F
  • the VA 0x8803 translates to the PA 0x1000
  • the VA 0x8807 translates to the PA 0x0F01.
  • There are various ways that these translations could be stored in the TLB dependent on a format used for an entry storing the translation data.
  • the translations 0x8800 to 0x0F00 and 0x8807 to 0x0F01 can be coalesced into a single entry.
  • the second coalesced format 1006 comprising up to 4 VA-PA translations
  • the translations 0x8800 to 0x0F00, 0x8802 to 0x0F1F and 0x8803 to 0x1000 can be coalesced into a single entry.
  • the third coalesced format 1008 comprising up to 2 VA-PA translations, the translations 0x8800 to 0x0F00 and 0x8801 to 0xFFFF can be coalesced into a single entry.
  • the address allocation circuitry may be configured to select the new address translation to allocate from the first coalesced format 1004, the second coalesced format 1006, and the third coalesced format 1008.
  • the second coalesced format 1006 provides the greatest number of translations per entry and may therefore be selected for a new address translation.
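The grouping described for Figure 11 can be reproduced with a short sketch. Several values here are assumptions rather than figures stated for Figure 11: the output range of 16 blocks for the first format, the 2^16-block range standing in for "the whole physical address space" of the abbreviated 16-bit addresses, the choice of 0x8800 as the newly translated VA, and the function and table names.

```python
# (VA clustering factor, output range size in blocks) per format.
# The output range sizes for "first" and "third" are assumed for illustration.
FORMATS = {
    "first": (8, 16),
    "second": (4, 1 << 13),
    "third": (2, 1 << 16),
}

TRANSLATIONS = {
    0x8800: 0x0F00,
    0x8801: 0xFFFF,
    0x8802: 0x0F1F,
    0x8803: 0x1000,
    0x8807: 0x0F01,
}


def coalescable(va, translations, va_cf, pa_range):
    """VAs that can share an entry with `va`: same aligned VA range and
    same aligned PA range as the translation for `va`."""
    pa = translations[va]
    va_base = va - va % va_cf
    pa_base = pa - pa % pa_range
    return sorted(
        v for v, p in translations.items()
        if va_base <= v < va_base + va_cf and pa_base <= p < pa_base + pa_range
    )


groups = {name: coalescable(0x8800, TRANSLATIONS, *params)
          for name, params in FORMATS.items()}
best = max(groups, key=lambda name: len(groups[name]))
```

Under these assumptions the sketch yields the same three groupings as the figure, with the second format holding the most translations.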
  • Figure 12 schematically illustrates the translation of a virtual address 1202 to a physical address 1232 using an apparatus 1200 according to some configurations of the present technique.
  • the apparatus 1200 is provided with a translation lookaside buffer 1208, first hash generating circuitry 1204, second hash generating circuitry 1206, tag comparison circuitry for a coalesced translation 1226, tag comparison circuitry for a non-coalesced (single) translation 1228, and address forwarding circuitry 1230.
  • the translation lookaside buffer 1208 is arranged as a set associative cache comprising four set entries (ways) per index. In response to receipt of the virtual address 1202, two lookups in the translation lookaside buffer 1208 are performed.
  • the first lookup is a non-coalesced lookup at a location in the translation lookaside buffer 1208 identified using a non-coalesced index generated by first hash generating circuitry 1204 to identify a plurality of set entries 1210.
  • the second lookup is a coalesced lookup at a location in the translation lookaside buffer 1208 identified using a coalesced index generated by the second hash generating circuitry 1206.
  • the second hash generating circuitry generates the coalesced index based on one or more indexing bits of the virtual address 1202 that are defined by a currently active coalesced format identified in the coalesced format information 1214. Because the one or more indexing bits for the coalesced index are dependent on the currently active coalesced format, different indexing bits of the virtual address 1202 may be used for generation of the coalesced index and for generation of the non-coalesced index. As a result, a second plurality of set entries 1212 is identified based on the coalesced index.
  • a tag comparison is made using the tag comparison circuitry for the non-coalesced translation 1228 and the tag comparison circuitry for the coalesced translation 1226. Whilst this could, in principle, be performed based on a tag comparison of each of the set entries 1210 and 1212 against a tag portion of the virtual address 1202, this would require 8 tag comparisons for the 4 way set associative translation lookaside buffer 1208. The number of tag comparisons made is reduced by eliminating half of the set entries of each of the first plurality of set entries 1210 and the second plurality of set entries 1212.
  • a skew bit 1216 is identified from the virtual address 1202.
  • the choice of skew bit 1216 is dependent on the coalesced format information 1214 and is the least significant bit of the virtual address 1202 that is not used in the generation of the coalesced index or the non-coalesced index.
  • the skew bit is fed into the first coalesced entry selection circuitry 1218, the second coalesced entry selection circuitry 1220, the first non-coalesced entry selection circuitry 1222 and the second non-coalesced entry selection circuitry 1224.
  • the first coalesced entry selection circuitry 1218 selects way 11 of the second plurality of set entries 1212 to be forwarded to the tag comparison circuitry for the coalesced translation 1226 when the skew bit has a value of 1, and selects way 10 of the second plurality of set entries 1212 to be forwarded to the tag comparison circuitry for the coalesced translation 1226 when the skew bit has a value of 0.
  • the second coalesced entry selection circuitry 1220 selects way 01 of the second plurality of set entries 1212 to be forwarded to the tag comparison circuitry for the coalesced translation 1226 when the skew bit has a value of 1, and selects way 00 of the second plurality of set entries 1212 to be forwarded to the tag comparison circuitry for the coalesced translation 1226 when the skew bit has a value of 0.
  • the first non-coalesced entry selection circuitry 1222 selects way 11 of the first plurality of set entries 1210 to be forwarded to the tag comparison circuitry for the non-coalesced translation 1228 when the skew bit has a value of 0, and selects way 10 of the first plurality of set entries 1210 to be forwarded to the tag comparison circuitry for the non-coalesced translation 1228 when the skew bit has a value of 1.
  • the second non-coalesced entry selection circuitry 1224 selects way 01 of the first plurality of set entries 1210 to be forwarded to the tag comparison circuitry for the non-coalesced translation 1228 when the skew bit has a value of 0, and selects way 00 of the first plurality of set entries 1210 to be forwarded to the tag comparison circuitry for the non-coalesced translation 1228 when the skew bit has a value of 1.
  • when the skew bit has a value of 1, the tag comparison circuitry for the coalesced translation 1226 receives ways 11 and 01 of the plurality of entry sets identified by the coalesced index, whilst the tag comparison circuitry for the non-coalesced translation 1228 receives ways 10 and 00 of the plurality of entry sets identified by the non-coalesced index.
  • when the skew bit has a value of 0, the tag comparison circuitry for the coalesced translation 1226 receives ways 10 and 00 of the plurality of entry sets identified by the coalesced index, whilst the tag comparison circuitry for the non-coalesced translation 1228 receives ways 11 and 01 of the plurality of entry sets identified by the non-coalesced index.
  • the total number of tag comparisons for the four way set associative translation lookaside buffer 1208 is therefore equal to four.
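The way selection driven by the skew bit can be summarised in a few lines. This is a behavioural sketch only; the function name `ways_to_compare` is hypothetical, and the hardware performs the selection with the four entry selection circuits 1218 to 1224 rather than in software.

```python
def ways_to_compare(skew_bit):
    """Return (coalesced ways, non-coalesced ways) for a given skew bit.

    Ways are numbered 0b00..0b11. The low bit of the way number is chosen by
    the skew bit for coalesced lookups and by its inverse for non-coalesced
    lookups, so each lookup type compares against two of the four ways.
    """
    if skew_bit not in (0, 1):
        raise ValueError("skew bit must be 0 or 1")
    inverse = skew_bit ^ 1
    coalesced = [0b10 | skew_bit, 0b00 | skew_bit]  # circuitry 1218, 1220
    non_coalesced = [0b10 | inverse, 0b00 | inverse]  # circuitry 1222, 1224
    return coalesced, non_coalesced


for bit in (0, 1):
    coalesced, non_coalesced = ways_to_compare(bit)
    print(bit, [format(w, "02b") for w in coalesced],
          [format(w, "02b") for w in non_coalesced])
```

For either skew value the two lists are disjoint and contain four ways in total, which is where the figure of four tag comparisons comes from.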
  • the tag comparison circuitry for the non-coalesced translation 1228 performs a tag comparison between a tag portion of the virtual address 1202 and a tag portion of stored virtual addresses in the ways forwarded by the first non-coalesced entry selection circuitry 1222 and the second non-coalesced entry selection circuitry 1224. When a tag match is identified, the physical address of the matching entry is passed to address forwarding circuitry 1230.
  • the tag comparison circuitry for the coalesced translation 1226 performs a tag comparison between a tag portion of the virtual address 1202 and tag portions of stored virtual addresses in the set entries forwarded by the first coalesced entry selection circuitry 1218 and the second coalesced entry selection circuitry 1220.
  • the tag comparison circuitry for the coalesced translation 1226 receives the coalesced format information 1214 identifying bits of the virtual address 1202 that are to be used to identify whether a base portion of the virtual address matches a stored base portion of one of the set entries and identifying bits of the virtual address 1202 that are used to identify whether a valid offset portion of a physical address is present in the matching entry.
  • the physical address of the matching entry is passed to address forwarding circuitry 1230.
  • the address forwarding circuitry 1230 is responsive to receipt of a physical address from one of the tag comparison circuitry for the coalesced translation and the tag comparison circuitry for the non-coalesced translation to output a final physical address 1232.
  • the address forwarding circuitry 1230 is responsive to an indication that neither the tag comparison circuitry for the coalesced translation 1226 nor the tag comparison circuitry for the non-coalesced translation 1228 has returned a matching set entry, to signal a TLB miss to trigger a page table walk to be performed to identify the corresponding physical address.
  • Figure 13 schematically illustrates a sequence of steps performed by address processing circuitry according to various configurations of the present techniques. The steps begin at step 1300 where an input lookup address is received by the address processing circuitry. Flow then proceeds to step 1302 where the address processing circuitry retrieves the active coalesced formats. Flow then proceeds to step 1304 where the address processing circuitry selects a candidate coalesced format from the retrieved active coalesced formats. Flow then proceeds to step 1306 where the address processing circuitry triggers a lookup in the translation lookaside buffer using the candidate coalesced format.
  • the lookup may comprise any of the steps described above including generating an index from an indexing portion of the input address defined in the candidate coalesced format, identifying a subset of set entries (ways) using one or more skew bits defined based on the active coalesced formats, and performing a tag comparison on one or more set entries (ways) at the identified index.
  • At step 1308 it is determined whether there is a hit in the translation lookaside buffer. If no hit is found then flow proceeds to step 1310 where it is determined if there are any more coalesced formats to consider. If, at step 1310, it is determined that there are more coalesced formats to consider then flow returns to step 1304.
  • If, at step 1310, it was determined that there are no more coalesced formats to consider, then flow proceeds to step 1312 where a non-coalesced lookup is performed in the translation lookaside buffer based on a non-coalesced (single address translation) format. If, at step 1312, the lookup misses (fails to identify a corresponding entry) in the translation lookaside buffer, then flow proceeds to step 1314 where a page table walk is triggered to determine the output memory address. If, at step 1308, it was determined that there was a hit for a candidate coalesced format in the TLB, then flow proceeds to step 1316 where it is determined whether the entry is in the candidate coalesced format.
  • The determination at step 1316 is performed, for example, by inspecting one or more control bits that are present in the entry to determine if those control bits match values that are expected for the current candidate coalesced format. If, at step 1316, there is no match, then flow proceeds to step 1310. If, at step 1316, there is a match then flow proceeds to step 1318 where the address processing circuitry outputs the output memory address identified in the candidate coalesced format. If, at step 1312, it was determined that there was a hit in the translation lookaside buffer in the non-coalesced format, then flow proceeds to step 1318 where the output memory address identified using the non-coalesced format is output.
  • step 1304, step 1306, step 1308 and step 1316 are performed for two or more of the coalesced formats in parallel.
  • step 1304, step 1306, step 1308, and step 1316 may be performed for one or more of the coalesced formats in parallel to performing step 1312.
  • steps 1306 and 1312 may be adapted dependent on whether or not the particular implementation implements skewing (i.e., whether one or more skew bits are used to identify the particular ways that can be used for different formats) and dependent on the layout of the translation lookaside buffer (i.e., whether it is a set associative cache, a direct mapped cache, or a fully associative cache).
  • skewing i.e., whether one or more skew bits are used to identify the particular ways that can be used for different formats
  • layout of the translation lookaside buffer i.e., whether it is a set associative cache, a direct mapped cache, or a fully associative cache.
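The sequential form of the Figure 13 flow can be sketched as a single function. This is an illustrative model, not the circuitry: the function name `translate`, the three callables passed in, and the dictionary representation of an entry (with hypothetical `"format"` and `"pa"` keys) are all assumptions made for the example.

```python
def translate(va, active_formats, coalesced_lookup, single_lookup,
              page_table_walk):
    """Try each active coalesced format, then the non-coalesced format,
    then fall back to a page table walk."""
    for fmt in active_formats:                # steps 1304 and 1310
        entry = coalesced_lookup(va, fmt)     # steps 1306 and 1308
        # Step 1316: a hit only counts if the entry's control bits say it
        # was stored in the candidate format.
        if entry is not None and entry["format"] == fmt:
            return entry["pa"]                # step 1318
    pa = single_lookup(va)                    # step 1312
    if pa is not None:
        return pa                             # step 1318
    return page_table_walk(va)                # step 1314


# Toy stand-ins for the TLB and the page tables.
coalesced_tlb = {
    (0x8802, "second"): {"format": "second", "pa": 0x0F1F},
    (0x8805, "second"): {"format": "first", "pa": 0xDEAD},  # wrong format
}
single_tlb = {0x8801: 0xFFFF, 0x8805: 0x2222}
page_tables = {0x9000: 0x1234}


def lookup(va):
    return translate(
        va,
        ["first", "second", "third"],
        lambda v, f: coalesced_tlb.get((v, f)),
        single_tlb.get,
        page_tables.get,
    )
```

The description notes that the per-format lookups and the non-coalesced lookup may instead run in parallel; the loop here only models the sequential ordering.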
  • Figure 14 schematically illustrates a sequence of steps carried out by address processing circuitry when allocating an entry in the translation lookaside buffer.
  • Flow begins at step 1400 where an output address is determined using, for example, a page table walk. The page table walk may be triggered, for example, in response to an input address missing in the translation lookaside buffer.
  • Flow then proceeds to step 1402 where the active coalesced formats are retrieved by the translation lookaside buffer.
  • At step 1404 a candidate coalesced format is selected from the active coalesced formats.
  • Flow proceeds to step 1408 where it is determined whether there are any more coalesced formats.
  • If, at step 1408, it is determined that there are more coalesced formats then flow returns to step 1404. If, at step 1408, it is determined that there are no more coalesced formats, i.e., a set of candidate translation data has been prepared for each of the coalesced formats, then flow proceeds to step 1410 where it is determined whether the address translation data can be represented using one of the candidate coalesced formats. In other words, it is determined if any of the sets of candidate translation data includes a plurality of valid memory address translations.
  • If, at step 1410, it is determined that none of the candidate sets of translation data includes a plurality of valid memory address translations, then flow proceeds to step 1412 where a new memory address translation, including translation data identifying a translation between the input memory address that triggered the page table walk and the returned output memory address, is allocated in the translation lookaside buffer. If, at step 1410, it is determined that one or more of the candidate sets of translation data includes a plurality of valid memory address translations, then flow proceeds to step 1414 where a new entry is allocated in the translation lookaside buffer to store a chosen one of the candidate sets of translation data that includes a plurality of valid memory address translations.
  • the set of candidate translation data is selected for entry in the translation lookaside buffer. If there are a plurality of sets of candidate translation data that include plural valid memory address translations, then the set of candidate translation data having the greatest number of valid translations may, for example, be selected. Where there are multiple sets of candidate translation data having the greatest number of valid translations, then the set of candidate translation data for allocation as an entry in the translation lookaside buffer may, for example, be selected based on predetermined selection criteria. For example, the predetermined selection criteria may indicate that the set of candidate translation data using a format that spans the greatest range of the output memory address space is selected.
  • step 1404 and step 1406 can be performed for two or more of the retrieved coalesced formats in parallel.
  • the predetermined selection criteria may be any criteria with which a unique set of candidate translation data can be identified. Such predetermined selection criteria may include selecting based on one or more of the following criteria: the candidate translation data having memory address translations that span a smallest/greatest region of the output memory address space, the candidate translation data formatted using a coalesced format capable of storing the most/fewest memory address translations, and/or the candidate translation data having the fewest invalid entries.
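One possible allocation decision for steps 1410 to 1414 can be sketched as follows. It implements just one of the example policies mentioned above (most valid translations, with ties broken by the greatest output range); the function name, the tuple layout and the range sizes in the usage lines are hypothetical.

```python
def choose_allocation(candidates):
    """Pick the candidate set of translation data to allocate.

    `candidates` is a list of (format_name, valid_count, output_range_size)
    tuples, one per active coalesced format. Returns the chosen format name,
    or None when no candidate holds more than one valid translation, in which
    case a single (non-coalesced) translation would be allocated (step 1412).
    """
    plural = [c for c in candidates if c[1] >= 2]       # step 1410
    if not plural:
        return None
    # Step 1414: most valid translations first, then the greatest output range.
    return max(plural, key=lambda c: (c[1], c[2]))[0]


# Illustrative candidate sets; the range sizes are assumed values.
clear_winner = [("first", 2, 16), ("second", 3, 1 << 13), ("third", 2, 1 << 16)]
tie = [("first", 2, 16), ("third", 2, 1 << 16)]
nothing_to_coalesce = [("first", 1, 16), ("second", 1, 1 << 13)]
```

Any other criteria would serve equally well provided they always identify a unique candidate, for example preferring the smallest output range or the fewest invalid entries.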
  • Figure 15 schematically illustrates a non-transitory computer-readable medium comprising computer readable code for fabrication of an apparatus according to various configurations of the present techniques. Fabrication is carried out based on computer readable code 1002 that is stored on a non-transitory computer-readable medium 1000.
  • the computer-readable code can be used at one or more stages of a semiconductor design and fabrication process, including an electronic design automation (EDA) stage, to fabricate an integrated circuit comprising the apparatus embodying the concepts.
  • EDA electronic design automation
  • the fabrication process involves the application of the computer readable code 1002 either directly into one or more programmable hardware units such as a field programmable gate array (FPGA) to configure the FPGA to embody the configurations described hereinabove or to facilitate the fabrication of an apparatus implemented as one or more integrated circuits or otherwise that embody the configurations described hereinabove.
  • the fabricated design 1004 may in one example implementation comprise the control circuitry 103, and the translation lookaside buffer 105 described in reference to figure 2. However, the fabricated design may comprise any of the implementations described in reference to any of figures 1, 2, and/or 12 arranged to carry out any of the processes described in figures 3-11 and 13-14.
  • an apparatus comprising a translation lookaside buffer (TLB) comprising plural entries capable of storing translation data.
  • TLB translation lookaside buffer
  • the TLB is configured to select, when allocating the translation data for storage within a given entry, a format used to store the translation data within the given entry, and the format is selected from a format group comprising plural coalesced formats.
  • the apparatus is provided with control circuitry to maintain coalesced format information identifying active coalesced formats.
  • Each coalesced format defines an input address range size and an output address range size
  • each entry formatted using a coalesced format is capable of identifying plural address translations between input address blocks, located within an input address range having the input address range size defined in that coalesced format, and output address blocks, located within an output address range having the output address range size defined in that coalesced format.
  • the words “configured to...” are used to mean that an element of an apparatus has a configuration able to carry out the defined operation.
  • a “configuration” means an arrangement or manner of interconnection of hardware or software.
  • the apparatus may have dedicated hardware which provides the defined operation, or a processor or other processing device may be programmed to perform the function. “Configured to” does not imply that the apparatus element needs to be changed in any way in order to provide the defined operation.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

There is provided an apparatus comprising a translation lookaside buffer (TLB) comprising plural entries capable of storing translation data. The TLB is configured to select, when allocating the translation data for storage within a given entry, a format used to store the translation data within the given entry, and the format is selected from a format group comprising plural coalesced formats. The apparatus is provided with control circuitry to maintain coalesced format information identifying active coalesced formats. Each coalesced format defines an input address range size and an output address range size, and each entry formatted using a coalesced format is capable of identifying plural address translations between input address blocks, located within an input address range having the input address range size defined in that coalesced format, and output address blocks, located within an output address range having the output address range size defined in that coalesced format.

Description

STORING COALESCED MEMORY ADDRESS TRANSLATIONS
The present invention relates to data processing. More particularly the present invention relates to an apparatus, method, and non-transitory computer readable storage medium.
Some data processing apparatuses are provided with a translation lookaside buffer to store memory address translations between a block of input memory addresses in an input memory address space and a block of output memory addresses in an output memory address space. By storing memory address translations in this way the apparatus is able to translate between addresses in the input memory address space and the output memory address space without needing to request translation data from memory, provided the translation lookaside buffer stores the required memory address translation.
In some configurations there is provided an apparatus comprising: a translation lookaside buffer comprising a plurality of entries each capable of storing translation data comprising one or more memory address translations, the one or more memory address translations each defining an address translation between a block of input memory addresses in an input memory address space and a block of output memory addresses in an output memory address space, wherein the translation lookaside buffer is configured to select, when allocating the translation data for storage within a given entry, a format used to store the translation data within the given entry, and the format is selected from a format group comprising a plurality of coalesced formats; and control circuitry configured to maintain coalesced format information identifying one or more active coalesced formats of the plurality of coalesced formats currently available for use by the translation lookaside buffer; wherein: each of the plurality of coalesced formats defines an input address range size and an output address range size; and each entry formatted using a given coalesced format of the plurality of coalesced formats is capable of identifying a plurality of memory address translations between blocks of input memory addresses, located within a same input address range having the input address range size defined in the given coalesced format, and blocks of output memory addresses, located within a same output address range having the output address range size defined in the given coalesced format.
In some configurations there is provided a method comprising: storing, in a plurality of entries of a translation lookaside buffer, translation data comprising one or more memory address translations, the one or more memory address translations each defining an address translation between a block of input memory addresses in an input memory address space and a block of output memory addresses in an output memory address space, wherein the translation lookaside buffer is configured to select, when allocating the translation data for storage within a given entry, a format used to store the translation data within the given entry, and the format is selected from a format group comprising a plurality of coalesced formats; and maintaining coalesced format information identifying one or more active coalesced formats of the plurality of coalesced formats currently available for use by the translation lookaside buffer, wherein: each of the plurality of coalesced formats defines an input address range size and an output address range size; and each entry formatted using a given coalesced format of the plurality of coalesced formats is capable of identifying a plurality of memory address translations between blocks of input memory addresses, located within a same input address range having the input address range size defined in the given coalesced format, and blocks of output memory addresses, located within a same output address range having the output address range size defined in the given coalesced format.
In some configurations there is provided a computer readable storage medium to store computer-readable code for fabrication of an apparatus comprising: a translation lookaside buffer comprising a plurality of entries each capable of storing translation data comprising one or more memory address translations, the one or more memory address translations each defining an address translation between a block of input memory addresses in an input memory address space and a block of output memory addresses in an output memory address space, wherein the translation lookaside buffer is configured to select, when allocating the translation data for storage within a given entry, a format used to store the translation data within the given entry, and the format is selected from a format group comprising a plurality of coalesced formats; and control circuitry configured to maintain coalesced format information identifying one or more active coalesced formats of the plurality of coalesced formats currently available for use by the translation lookaside buffer; wherein: each of the plurality of coalesced formats defines an input address range size and an output address range size; and each entry formatted using a given coalesced format of the plurality of coalesced formats is capable of identifying a plurality of memory address translations between blocks of input memory addresses, located within a same input address range having the input address range size defined in the given coalesced format, and blocks of output memory addresses, located within a same output address range having the output address range size defined in the given coalesced format.
Such computer-readable code can be disposed in any known transitory computer-readable medium (such as wired or wireless transmission of code over a network) or non-transitory computer-readable medium such as semiconductor, magnetic disk, or optical disc.
The present techniques will be described further, by way of example only, with reference to configurations thereof as illustrated in the accompanying drawings, in which:
Figure 1 schematically illustrates an apparatus according to various configurations of the present techniques;
Figure 2 schematically illustrates an apparatus according to various configurations of the present techniques;
Figure 3 schematically illustrates a sequence of steps performed in response to a request for a translation according to various configurations of the present techniques;
Figure 4 schematically illustrates a page table walk carried out in response to a request for a translation according to various configurations of the present techniques;
Figure 5 schematically illustrates a sequence of steps carried out when performing a page table walk according to various configurations of the present techniques;
Figure 6 schematically illustrates a mapping between virtual addresses and physical addresses according to various configurations of the present techniques;
Figure 7 schematically illustrates a mapping between virtual addresses and physical addresses according to various configurations of the present techniques;
Figure 8 schematically illustrates a mapping between virtual addresses and physical addresses according to various configurations of the present techniques;
Figure 9 schematically illustrates a mapping between virtual addresses and physical addresses according to various configurations of the present techniques;
Figure 10 schematically illustrates a mapping between virtual addresses and physical addresses according to various configurations of the present techniques;
Figure 11 schematically illustrates a mapping between virtual addresses and physical addresses using a range of different formats according to various configurations of the present techniques;
Figure 12 schematically illustrates details of an apparatus according to various configurations of the present techniques;
Figure 13 schematically illustrates a sequence of steps carried out for performing a lookup in response to an input address, in accordance with various configurations of the present techniques;
Figure 14 schematically illustrates a sequence of steps carried out when allocating memory address translations in a translation lookaside buffer, according to various configurations of the present techniques; and
Figure 15 schematically illustrates fabrication of an apparatus according to various configurations of the present techniques.
Some processing apparatuses maintain addresses in two or more different address spaces that are used to identify locations at which data values are stored. In order to identify addresses using two or more address spaces the apparatus stores translation data that defines a mapping between input addresses in an input address space and output addresses in an output address space. In order to facilitate rapid translation between the input address space and the output address space, some apparatuses are provided with a translation lookaside buffer that is arranged to store (cache) at least some of the translation data. When an input address is received, the apparatus can query (i.e., perform a lookup in) the translation lookaside buffer and, if there is a hit, perform the translation without having to retrieve the translation data from memory resulting in a faster translation. The more translations that are stored in the translation lookaside buffer, the greater the hit rate when a lookup is performed and the less time spent performing the address translations. One approach to increasing the hit rate of the translation lookaside buffer would be to provide a larger translation lookaside buffer. However, this approach incurs a cost in terms of circuit area and power consumption which may not be desirable. Alternatively, rather than providing translations between a single block of input addresses and a single block of output addresses, entries of the translation lookaside buffer may be formatted using a coalesced format in which two or more translations that occur within a same portion of the memory address spaces can be combined into a single entry. 
This approach can improve the hit rate associated with lookups in a translation lookaside buffer due to the coalesced entries increasing the effective capacity of the translation lookaside buffer. However, during normal system use, address space fragmentation occurs due to reassignment of different portions of the address spaces. As a result, addresses that were originally within a same portion of the memory address space can become separated and the benefit of using the coalesced format for entries of the translation lookaside buffer may reduce.
In some configurations there is provided an apparatus comprising a translation lookaside buffer comprising a plurality of entries each capable of storing translation data comprising one or more memory address translations, the one or more memory address translations each defining an address translation between a block of input memory addresses in an input memory address space and a block of output memory addresses in an output memory address space. The translation lookaside buffer is configured to select, when allocating the translation data for storage within a given entry, a format used to store the translation data within the given entry, and the format is selected from a format group comprising a plurality of coalesced formats. The apparatus is also provided with control circuitry configured to maintain coalesced format information identifying one or more active coalesced formats of the plurality of coalesced formats currently available for use by the translation lookaside buffer. Each of the plurality of coalesced formats defines an input address range size and an output address range size.
In addition, each entry formatted using a given coalesced format of the plurality of coalesced formats is capable of identifying a plurality of memory address translations between blocks of input memory addresses, located within a same input address range having the input address range size defined in the given coalesced format, and blocks of output memory addresses, located within a same output address range having the output address range size defined in the given coalesced format. The inventors have realised that a coalesced format that is used at system start up, or following a defragmentation process, may become less useful as the memory spaces fragment. In order to improve the overall coverage of the translation lookaside buffer, plural (two or more) different coalesced formats are supported. Each of the coalesced formats provides a different input address range size and a different output address range size. Therefore, if the output address space becomes fragmented then a coalesced format having a larger output address range size could be selected so that blocks of output addresses that are further apart in the output address space can be coalesced into a single coalesced entry. In this way, the potential adverse effect of fragmentation on entries of the translation lookaside buffer can be reduced and the coverage of the translation lookaside buffer for fragmented memory address spaces can be increased.
The apparatus is also provided with control circuitry which is configured to maintain coalesced format information in order to identify which of the plural coalesced formats are currently actively supported by the translation lookaside buffer. The translation lookaside buffer can therefore be arranged to support (i.e., be capable of supporting) plural coalesced formats, but the number of those coalesced formats that are active at any one time may be fewer than the total number of coalesced formats that are supported. The format used by a given entry of the translation lookaside buffer is initially determined at a time of allocation of translation data into that given entry of the translation lookaside buffer. In some configurations this format is retained so long as at least one memory address translation stored in the entry remains valid. In other configurations, the format may be changed after allocation as will be discussed in detail below.
The translation data that is stored in the translation lookaside buffer is a translation from a block of input memory addresses to a block (group) of output memory addresses. In one example implementation, each block identifies an aligned contiguous range of memory address space. In other words, the translation data that is stored in the translation lookaside buffer may be provided at a coarser level than the individual address level. In some configurations, the aligned contiguous range of memory address space is identified by excluding a number of least significant bits of an input address and of an output address from the translation data and mapping input addresses within a corresponding input address block to output addresses within a corresponding output address block by performing a fixed mapping between the least significant bits of the input address and the least significant bits of the output address. In some configurations, the least significant bits of the input address are the least significant bits of the output address. The apparatus is provided with various circuitry blocks, i.e., the translation lookaside buffer and the control circuitry. These circuitry blocks may be provided as discrete circuits fabricated as a same or as different integrated circuits. Alternatively, the circuitry blocks may be provided as one or more combined circuitry blocks that each work together to perform the functionality of the described circuitry blocks. In some configurations, the control circuitry may be comprised in the translation lookaside buffer. In alternative configurations, the control circuitry may be external to the translation lookaside buffer.
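By way of illustration only, the block-level mapping described above can be sketched as follows. The 12-bit block shift (4 KiB blocks), the dictionary standing in for stored translation data, and the function name are assumptions made for the example and are not taken from the present techniques.

```python
from typing import Dict, Optional

# Assumption for illustration: 4 KiB blocks, so the 12 least significant
# bits are excluded from the translation data and copied across unchanged.
BLOCK_SHIFT = 12


def translate(input_addr: int, block_map: Dict[int, int]) -> Optional[int]:
    """Translate an input address using a block-level mapping.

    block_map maps input block identifiers to output block identifiers.
    Returns None when no translation for the block is held.
    """
    input_block = input_addr >> BLOCK_SHIFT
    low_bits = input_addr & ((1 << BLOCK_SHIFT) - 1)
    output_block = block_map.get(input_block)
    if output_block is None:
        return None
    # Fixed mapping: the low bits of the output equal the low bits of the input.
    return (output_block << BLOCK_SHIFT) | low_bits
```

In this sketch only the block identifiers are stored, which corresponds to the case in which the least significant bits of the input address are the least significant bits of the output address.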
The storage of translation data in each of the coalesced formats can be provided in a variety of ways. In some configurations, for each of the plurality of coalesced formats, a number of address translations that can be identified per entry is dependent on the output address range size defined in that coalesced format. Each entry using one of the coalesced formats can be used to store data indicative of translations between two or more blocks of input addresses and two or more blocks of output addresses. The closer together these addresses are in the output memory address space, the more the address translations can be compressed within an entry. Therefore, there is a trade-off between the output address range size and the number of address translations that can be identified per entry, given the fixed size of each entry. The closer together the two or more blocks of output addresses, the fewer the bits needed to store address translations related to those blocks of output addresses.
Whilst it may be possible to increase the number of bits available for each entry of the translation lookaside buffer, this results in increased overheads in terms of circuit area and power consumption. In some configurations the plurality of coalesced formats comprises at least a first coalesced format and a second coalesced format. The output address range size defined in the second coalesced format is greater than the output address range size defined in the first coalesced format. Furthermore, the number of address translations that can be identified per entry formatted using the second coalesced format is defined such that a second number of bits required for the entry formatted using the second coalesced format is fewer than or equal to a first number of bits required by an entry formatted using the first coalesced format. In some configurations, the first coalesced format is defined as a legacy coalesced format available to translation lookaside buffers that are capable of supporting only a single coalesced format. The second coalesced format is then defined such that a greater output address range size can be spanned for entries formatted using the second coalesced format with a number of address translations that can be accommodated in entries formatted using the second coalesced format being smaller than a number of address translations that can be accommodated in entries formatted using the first coalesced format. In other words, the translation data that is present in the second coalesced format can be represented without requiring any additional bits to be provided beyond those that were already present in an entry of the translation lookaside buffer to support the first coalesced format.
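The trade-off between output address range size and the number of translations per entry can be illustrated with a simple bit budget. The sketch below assumes a cost model in which each translation slot needs one m-bit offset and one validity bit, and the two example formats (their names and their n and m values) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoalescedFormat:
    name: str
    n: int  # assumed: an entry holds 2**n translations
    m: int  # assumed: the output range covers 2**m blocks (m-bit offsets)

    @property
    def translations_per_entry(self) -> int:
        return 1 << self.n

    @property
    def payload_bits(self) -> int:
        # Assumed cost model: one m-bit offset plus one validity bit per slot.
        return self.translations_per_entry * (self.m + 1)


def fits_in_existing_entry(new: CoalescedFormat, legacy: CoalescedFormat) -> bool:
    """True when the new format needs no more bits than the legacy format."""
    return new.payload_bits <= legacy.payload_bits


# Hypothetical formats: the second spans a wider output range with fewer slots.
first = CoalescedFormat("first", n=3, m=3)    # 8 translations, 8-block range
second = CoalescedFormat("second", n=2, m=7)  # 4 translations, 128-block range
```

With these example values both formats need 32 payload bits, so the second format widens the output address range without enlarging the entry.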
In some configurations, for each of the plurality of coalesced formats, the number of address translations is equal to a power of two. By defining the number of address translations as being a power of two, the number of redundant bits that are present for each of the coalesced formats can be reduced. This results in an increased information density in the entries of the translation lookaside buffer. In addition, as each coalesced entry may be generated based on data returned from a page table walk, e.g., an entire cache line of address translations which typically contains a number of entries that is also equal to a power of two, there is a closer correspondence between the coalesced formats and the data returned from the page table walk. Finally, by defining the number of address translations as being a power of two, each coalesced entry can be aligned to a boundary of an input address range having the input address range size. This results in a more efficient implementation.
In some configurations the format group comprises a non-coalesced format comprising a single memory address translation between a single block of input memory addresses and a single block of output memory addresses. In addition, the translation lookaside buffer is configured to store, for each of the plurality of entries, coalesced entry indicating information identifying whether the one or more memory address translations stored in that entry are represented using the non-coalesced format or one of the plurality of coalesced formats. In addition to storing translation data in entries using the one or more active coalesced formats, the translation lookaside buffer may be capable of storing translation data in entries using a non-coalesced format. As a result, any memory address translation can potentially be stored in the translation lookaside buffer even when it cannot be coalesced with another memory address translation. In order to identify which of the entries in the translation lookaside buffer are coalesced entries and which are non-coalesced entries, the translation lookaside buffer stores coalesced entry indicating information identifying, for each entry, whether that entry is in the non-coalesced format or whether that entry is in the coalesced format. In some configurations the coalesced entry indicating information is stored as part of the translation entry, for example, as a single bit at a known bit position which takes a first value when the entry is a coalesced entry and takes a second value when the entry is a non-coalesced entry. Alternatively, the coalesced entry indicating information could be implicit. For example, an entry formatted using a coalesced format may comprise validity bits indicating a validity of each of the memory address translations where the coalesced entry indicating information is inferred based on whether or not there is more than one valid memory address translation in the entry.
In some configurations, the coalesced entry indicating information is stored in a separate storage structure. In a further configuration the translation lookaside buffer may comprise two or more separate storage structures, one for coalesced entries and one for non-coalesced entries, with the coalesced entry indicating information being implicitly determined based on the storage structure that is used to store the translation data.
In some configurations each of the plurality of coalesced formats defines the input address range size as covering 2^n blocks of input memory addresses, and defines the output address range size as covering 2^m blocks of output memory addresses. In addition, each entry formatted using the given coalesced format stores a base input memory address, a base output memory address, and a plurality of fields each capable of storing a mapping between an n-bit offset in the input memory address space and an m-bit offset in the output memory address space. The blocks of input and output memory addresses are identified, respectively, using an input address identifier and an output address identifier which exclude a number of least significant bits from the input and output addresses. These least significant bits may be directly mapped (e.g., copied) from the input address to the output address rather than being stored in the translation lookaside buffer. In some configurations, by way of example, each entry formatted using a coalesced format stores the following information:
• The base input address;
• A base output address;
• N_i = 2^n offset fields each having m bits in which to store an m-bit offset; and
• N_i validity bits each indicating a validity of a corresponding one of the N_i offsets.
For a given input address identifier (comprising a base input address and an n-bit input offset), a translation can be performed by taking the base output address specified in an entry identified using the base input address and appending one of the N_i = 2^n offsets to the base output address, where the offset is identified by the least significant n bits of the input address identifier, i.e., the n-bit offset. The total number of additional bits required to support this information is given by the number of additional sets of translation offsets (the total number of translation offsets minus the one that would already be there for a non-coalesced entry), i.e., (N_i - 1), multiplied by the number of bits per offset, i.e., (m + 1), where the additional +1 is for the validity bits. In some configurations one or more additional bits may be provided to identify the translation format that has been used. By varying the number of bits that are available to identify the offset, the range of output address space that can be stored in a single entry of the translation lookaside buffer can be varied.
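The entry layout and translation step described above can be sketched as follows. This is a minimal software model rather than a description of any particular circuit: the class name, field names and function names are hypothetical, and block identifiers are assumed to already exclude the directly mapped least significant bits.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CoalescedEntry:
    n: int               # input range covers 2**n blocks
    m: int               # output range covers 2**m blocks
    base_input: int      # input block identifier without its low n bits
    base_output: int     # output block identifier without its low m bits
    offsets: List[int]   # 2**n fields, each holding an m-bit output offset
    valid: List[bool]    # 2**n validity bits, one per offset field


def lookup(entry: CoalescedEntry, input_block: int) -> Optional[int]:
    """Return the output block identifier, or None on a miss."""
    slot = input_block & ((1 << entry.n) - 1)       # the n-bit input offset
    if (input_block >> entry.n) != entry.base_input:
        return None                                 # different input range
    if not entry.valid[slot]:
        return None                                 # slot holds no translation
    # Append the selected m-bit offset to the base output address.
    return (entry.base_output << entry.m) | entry.offsets[slot]


def additional_bits(n: int, m: int) -> int:
    """Extra bits relative to a non-coalesced entry: (N_i - 1) * (m + 1)."""
    return ((1 << n) - 1) * (m + 1)
```

For example, with n = 2 and m = 3 an entry holds four offsets of three bits each, and `additional_bits(2, 3)` evaluates to 12.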
The active coalesced formats may be configurable and, in some configurations, the control circuitry is responsive to removal of a previously active coalesced format from the active coalesced formats, to initiate a scrubbing procedure to identify entries formatted using the previously active coalesced format and to remove the identified entries. When the previously active coalesced format is removed, any entries of the translation lookaside buffer that are formatted using the previously active coalesced format may no longer be considered valid. In such a situation the control circuitry is arranged to scrub the translation lookaside buffer to identify (determine) which entries are formatted using the previously active coalesced format. In some configurations, the scrubbing procedure comprises disabling all or part of the translation lookaside buffer whilst scrubbing is being performed.
The scrubbing procedure can take any form that removes entries formatted using the previously active coalesced format. In some configurations the scrubbing procedure comprises invalidating at least one of the identified entries. In some configurations the scrubbing procedure comprises allocating at least one reformatted entry identifying at least one memory address translation comprised in one of the identified entries rewritten using a currently active coalesced format identified in the coalesced format information. The scrubbing procedure can use a combination of invalidation and rewriting (reallocation) of entries dependent on one or more criteria. For example, the control circuitry may be responsive to identification of an entry formatted using the previously active format, to determine whether any of the plurality of memory address translations stored in that entry can be coalesced into an entry formatted using a currently active coalesced format. If so then the entry can be reallocated and if not then the entry is invalidated.
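One possible shape for a scrubbing procedure that combines invalidation and reallocation is sketched below. The data model (a list of entries, each a format plus a dictionary of block translations), the requirement of at least two translations per rewritten entry, and all names are assumptions made for the example. A real design would also need to resolve the case in which two rewritten groups share one input address range.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Format = Tuple[int, int]                 # (n, m) as defined for a coalesced format
Entry = Tuple[Format, Dict[int, int]]    # format plus {input block: output block}


def regroup(translations: Dict[int, int], fmt: Format) -> List[Dict[int, int]]:
    """Group translations that share an input range and an output range."""
    n, m = fmt
    groups = defaultdict(dict)
    for in_blk, out_blk in translations.items():
        groups[(in_blk >> n, out_blk >> m)][in_blk] = out_blk
    # Assumed criterion: only groups of two or more are worth coalescing.
    return [g for g in groups.values() if len(g) >= 2]


def scrub(tlb: List[Entry], removed: Format, active: Format) -> List[Entry]:
    """Remove entries in the removed format, rewriting them where possible."""
    kept: List[Entry] = []
    for fmt, translations in tlb:
        if fmt != removed:
            kept.append((fmt, translations))   # untouched by the scrub
            continue
        for group in regroup(translations, active):
            kept.append((active, group))       # reallocated in the active format
        # Translations that could not be regrouped are dropped (invalidated).
    return kept
```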
In some configurations the translation data is allocated to a corresponding entry of the plurality of entries based on an index derived from one or more indexing bits of a common address portion of the one or more blocks of input memory addresses associated with the one or more memory address translations comprised in the translation data, and the one or more indexing bits are dependent on the format used to store the translation data. In some configurations the index is the one or more indexing bits. In alternative configurations, a mapping function may be applied to the one or more indexing bits to derive the index. The mapping function may be a hash function. In some configurations the indexing bits may comprise one or more least significant bits of the common portion of the address.
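The dependence of the indexing bits on the format can be sketched as follows, for the case in which the index is simply the least significant bits of the common address portion. The block shift, the number of index bits, and the use of n = 0 to denote a non-coalesced entry are assumptions made for illustration.

```python
BLOCK_SHIFT = 12   # assumed: 4 KiB blocks
INDEX_BITS = 6     # assumed: 64 sets


def set_index(input_addr: int, n: int) -> int:
    """Index for an entry whose format covers 2**n input blocks.

    The n low bits of the block identifier select a slot within the entry,
    so the common address portion starts n bits higher for larger formats.
    """
    common = input_addr >> (BLOCK_SHIFT + n)
    return common & ((1 << INDEX_BITS) - 1)
```

Under these assumptions the same input address can map to different sets for different formats: for the address 0x12345678 the index is 5 for n = 0 and 17 for n = 2.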
In some configurations the translation lookaside buffer is formulated as a set associative cache comprising a plurality of sets each identified by a corresponding index and each comprising a plurality of set entries of the plurality of entries. In addition, the control circuitry is configured to determine one or more skew bits identifying, for given translation data, which of the plurality of set entries can be used to store the given translation data formatted using a given format of the format group. In other words, for a given index (identified from given indexing bits of the given translation data) rather than being able to store the given translation data in any of the plurality of set entries, a first subset of the set entries (ways) may be identified by the one or more skew bits as being suitable for storing the translation data if it is in a first format of the format group (e.g., one of the coalesced formats) and a second subset of the set entries may be identified as being suitable for storing the translation data if it is in a second format of the format group (e.g., another of the coalesced formats or the non-coalesced format). In some configurations, the one or more skew bits may be a single bit with the first subset of entries being indicated by a first value and the second subset of entries being indicated by the second value. In some configurations, the first subset of entries and the second subset of entries are the same size, for example, a skew bit having a logical 1 may indicate that even set entries are being used for the first format and that odd set entries are being used for the second format. However, this need not be the case and, in some configurations, the sizes of the first subset and the second subset may be different. In addition, the one or more skew bits may comprise a plurality of skew bits identifying a different subset of the plurality of entry sets that could be used for each format identified in the format group.
This approach provides a great deal of flexibility in terms of which entry formats are used for different entry sets and can be used, for example, if it is expected that some entry formats are likely to occur less frequently than others.
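The single skew bit example given above (even set entries for the first format when the skew bit is a logical 1, odd set entries for the second format) can be sketched as follows. The four-way associativity, the function name, and the behaviour when the skew bit is 0 (the two subsets swap) are assumptions made for the example.

```python
from typing import List


def candidate_ways(input_addr: int, first_format: bool,
                   skew_bit_pos: int, num_ways: int = 4) -> List[int]:
    """Return the set entries (ways) that may hold the given translation data.

    first_format selects between the two formats sharing the set.
    skew_bit_pos is the input address bit currently used as the skew bit.
    """
    skew = (input_addr >> skew_bit_pos) & 1
    # Skew bit 1: first format uses even ways, second format uses odd ways.
    # Skew bit 0 (assumed): the assignment is swapped.
    use_even = (skew == 1) == first_format
    start = 0 if use_even else 1
    return list(range(start, num_ways, 2))
```

Only half of the ways need to be checked for each format in this sketch, and the two subsets never overlap for a given input address.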
Whilst the bits that are used as the one or more skew bits may be independent of the active coalesced formats, in some configurations the one or more skew bits are determined based on the coalesced format information. In other words, the bits identified as the one or more skew bits can change in response to a change in the active coalesced formats identified in the coalesced format information. For example, the one or more skew bits may comprise one or more least significant bits (e.g., the least significant bit) of the bits of the input memory address that are not comprised in the indexing bits of any of the active coalesced formats.
In some configurations, where the format group includes a non-coalesced format, the one or more skew bits identify a first group of one or more set entries that can be used to store the given translation data formatted using the non-coalesced format and a second group of one or more set entries that can be used to store the given translation data formatted using one of the one or more active coalesced formats. By assigning different groups of one or more set entries to the coalesced and non-coalesced entries, a total number of set entries that need to be checked for each format can be reduced. In some configurations the first group and the second group are non-overlapping groups.
In some configurations the control circuitry is responsive to a change in bits identified as the one or more skew bits, to perform a reallocation procedure to reallocate entries storing the translation data formatted using the non-coalesced format. The reallocation procedure may comprise identifying entries of the translation lookaside buffer that use the non-coalesced format, comparing a value of the previous skew bit(s) to a value of the new skew bit(s) and, if the values are different, reallocating translation data to a different set entry. Where the values of the previous skew bit(s) and the values of the new skew bit(s) are the same (i.e., where the previously allocated skew bits are set to a particular binary value and the newly allocated skew bits are, e.g., by coincidence, set to the same particular binary value), the reallocation procedure can skip that entry and move onto identifying another entry that is formatted using the non-coalesced format. By way of example, if the previously allocated skew bits comprised a single bit, i.e., bit 16, and the newly allocated skew bits comprised a single bit, i.e., bit 18, then the bit at bit position 16 and the bit at bit position 18 would be compared to determine whether the value of the skew bit has changed. This comparison is performed on a per entry basis as it is dependent on the precise value of the input address bit(s) used as the skew bit(s).
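The per-entry comparison in the bit 16 and bit 18 example can be sketched as follows. The representation of entries as (input address, is-coalesced) pairs and the function names are assumptions made for illustration.

```python
from typing import List, Tuple


def needs_reallocation(input_addr: int, old_pos: int, new_pos: int) -> bool:
    """True when the skew value seen by this entry changes with the skew bit."""
    old_value = (input_addr >> old_pos) & 1
    new_value = (input_addr >> new_pos) & 1
    return old_value != new_value


def entries_to_reallocate(entries: List[Tuple[int, bool]],
                          old_pos: int, new_pos: int) -> List[int]:
    """Input addresses of non-coalesced entries that must move to another way."""
    return [addr for addr, is_coalesced in entries
            if not is_coalesced and needs_reallocation(addr, old_pos, new_pos)]
```

An entry whose input address has the same value at bit 16 and bit 18 is skipped, since its set entry assignment is unaffected by the change.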
Whilst any of the bits of the input memory address can be used as the one or more skew bits, in some configurations the bits determined to be the one or more skew bits comprise at least one bit belonging to the common address portion and different to the one or more indexing bits for each of the one or more active coalesced formats. The input memory address can therefore be subdivided into the following sections [tag, index, LSBs] where the LSBs are a portion of the input memory address that is mapped directly, i.e., without performing a lookup, to the output memory address. The index bits are used as index generating bits to identify a location in the translation lookaside buffer. The tag portion of the input memory address is used for comparison to bits of input memory addresses stored in each set entry in order to determine if the set entry is a match. The one or more skew bits comprise one or more of the bits (e.g., a single bit) of the tag portion. The one or more skew bits can be any of the tag bits that are not used as index bits in any of the active coalesced formats identified in the coalesced format information or as index bits for the non-coalesced format. In some configurations the one or more skew bits comprise the least significant one or more bits of the tag bits that are not used as an index in any of the active coalesced formats identified in the coalesced format information or as index bits for the non-coalesced format.
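One way of selecting a single skew bit as the least significant tag bit that is not used as an index bit by any format is sketched below. The block shift, the number of index bits, and the identification of each active coalesced format by its n value are assumptions made for the example, and the bit positions it yields are specific to those assumed parameters.

```python
from typing import List

BLOCK_SHIFT = 12   # assumed: 4 KiB blocks
INDEX_BITS = 6     # assumed: 64 sets


def skew_bit_position(active_ns: List[int]) -> int:
    """Lowest input address bit usable as a skew bit.

    active_ns lists the n value of each active coalesced format; the
    non-coalesced format is always considered and corresponds to n = 0.
    """
    formats = [0, *active_ns]
    used = set()
    for n in formats:
        low = BLOCK_SHIFT + n              # first index bit for this format
        used.update(range(low, low + INDEX_BITS))
    # Start at the lowest bit that lies in the common portion of every format.
    bit = BLOCK_SHIFT + max(formats)
    while bit in used:
        bit += 1
    return bit
```

Because the result depends on the active formats, a change to the coalesced format information can move the skew bit in this sketch, for example from bit 18 with no active coalesced format to bit 20 when a format with n = 2 becomes active.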
Whilst some configurations may support multiple active coalesced formats, in some configurations the coalesced format information identifies a single active coalesced format. In such configurations, the coalesced format information can be stored globally for all entries in the translation lookaside buffer. The global storage may be provided as one or more dedicated control bits that identify which of the plurality of coalesced formats is the single active coalesced format at any given time. The one or more dedicated control bits may be provided as part of the translation lookaside buffer, the control circuitry, or separately from the translation lookaside buffer and the control circuitry, for example, as part of one or more control registers. In some configurations, the one or more control bits also identify a previous coalesced format to facilitate identification and reallocation/invalidation of entries that are formatted using the previous coalesced format. In some configurations the coalesced format information identifies a plurality of active coalesced formats, and each of the plurality of entries stores information identifying which of the plurality of active coalesced formats is used to represent translation data stored in that entry. The information may comprise one or more additional bits added to (encoded in) each entry of the translation lookaside buffer. In some configurations, the information may be encoded using a plurality of bits including at least one bit identifying whether the entry is a coalesced or a non-coalesced entry. For example, where three coalesced formats can be supported by the translation lookaside buffer at any given time, each entry may be provided with two bits to identify the four possible formats (non-coalesced format, or one of the three coalesced formats). In some configurations, the plurality of active coalesced formats comprises all of the plurality of coalesced formats.
In other configurations, the plurality of active coalesced formats comprises a subset of the coalesced formats. For example, three coalesced formats may be available with only two active coalesced formats at any given time. In some alternative configurations the information identifying which of the plurality of active coalesced formats is used to represent the translation data stored in each entry may be provided as a table, separate from the translation lookaside buffer. In such configurations, a lookup in the translation lookaside buffer for translation data of a particular format could be avoided and/or curtailed if the table identifies that, for a given index determined given that particular format, there are no entries formatted using the particular format present in any of the set entries identified by that given index.
In some configurations the apparatus comprises address processing circuitry responsive to an input lookup address in the input memory address space and for each given active coalesced format of the active coalesced formats: to generate an index based on at least a portion of the input lookup address, wherein the portion is dependent on the given active coalesced format, to determine whether an identified entry corresponding to the input lookup address and formatted using the given active coalesced format is present in the translation lookaside buffer at a location identified by the index, and in response to a determination that the identified entry is present in the translation lookaside buffer formatted in the given coalesced format, to determine a translated output address based on identified translation data stored in the identified entry. The address processing circuitry may be provided as a dedicated circuitry block or may be provided as part of one or more of the control circuitry and the translation lookaside buffer. In some configurations, the address processing circuitry is configured to perform lookups for at least two formats (e.g. at least two active coalesced formats or one or more active coalesced formats and the non-coalesced format) in parallel. In some configurations, each lookup may be performed sequentially. The determination that the identified entry is present may comprise comparing a tag portion of the input lookup address to tag portions stored in each entry of the translation lookaside buffer identified by the index.
In some configurations the address processing circuitry is responsive to the input lookup address: to generate a non-coalesced index based on at least a further portion of the input lookup address, to determine whether a further identified entry corresponding to the input lookup address and formatted using the non-coalesced format is present in the translation lookaside buffer at a location identified by the non-coalesced index, and in response to a determination that the further identified entry is present in the translation lookaside buffer formatted in the non-coalesced format, to determine the translated output address based on further translation data stored in the further identified entry. Because the indexing bits used for entries formatted using the coalesced format may be different from the indexing bits used for entries formatted using the non-coalesced formats, the entry or entries identified by the non-coalesced index may, for the same input memory address, be different to those identified by the index for a coalesced entry. The non-coalesced lookup and the coalesced lookup may be performed either in parallel or sequentially.
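The format-dependent indexing described above can be sketched in software as follows. This is a minimal Python model under stated assumptions, not the claimed circuitry: the set count `NUM_SETS`, the `Format` and `Entry` classes, and the modulo-based split into index and tag are all hypothetical. Addresses are expressed as block numbers, and the non-coalesced format is modelled as a format with zero offset bits. The loop tries the formats sequentially, whereas hardware may perform the per-format lookups in parallel.

```python
from dataclasses import dataclass
from typing import List, Optional

NUM_SETS = 16  # hypothetical number of sets in the TLB


@dataclass(frozen=True)
class Format:
    name: str
    va_offset_bits: int  # n; 0 models the non-coalesced format


@dataclass
class Entry:
    fmt: Format
    tag: int
    aligned_pa: int        # PA block number with the offset bits cleared
    valid: List[bool]      # one validity bit per coalesced position
    pa_offsets: List[int]  # one PA offset per coalesced position


def index_and_tag(va_block: int, fmt: Format):
    """Derive the set index and tag from the format-dependent VA portion."""
    base = va_block >> fmt.va_offset_bits  # drop the bits covered by one entry
    return base % NUM_SETS, base // NUM_SETS


def lookup(tlb, va_block: int, formats) -> Optional[int]:
    """Try each format in turn; return the PA block number or None on a miss."""
    for fmt in formats:
        index, tag = index_and_tag(va_block, fmt)
        for entry in tlb[index]:
            if entry.fmt == fmt and entry.tag == tag:
                pos = va_block & ((1 << fmt.va_offset_bits) - 1)
                if entry.valid[pos]:
                    return entry.aligned_pa + entry.pa_offsets[pos]
    return None


# Example contents: one 8-way coalesced entry and one non-coalesced entry.
COALESCED_8 = Format("coalesced-8", 3)
NON_COALESCED = Format("non-coalesced", 0)
tlb = [[] for _ in range(NUM_SETS)]

index, tag = index_and_tag(0x8800, COALESCED_8)
tlb[index].append(Entry(COALESCED_8, tag, 0xF000,
                        [True] * 5 + [False, False, True],
                        [3, 4, 5, 6, 7, 0, 0, 0]))
index, tag = index_and_tag(0x1234, NON_COALESCED)
tlb[index].append(Entry(NON_COALESCED, tag, 0xBEEF, [True], [0]))
```

Because the index is computed after discarding the offset bits of the format in question, the same input address generally yields a different index and tag for each format, which is why a separate lookup is needed per format.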
In some configurations the address processing circuitry is responsive to the translation lookaside buffer not storing the required memory address translation, to: trigger a page table walk using the input lookup address to determine the translated output memory address from a plurality of page tables, and allocate new translation data to one of the plurality of entries in the translation lookaside buffer, the new translation data representing translation of at least the input lookup address to the translated output address and being stored in the one of the plurality of entries in a format chosen from the format group. The translated output memory address may be determined from a page table walk comprising plural accesses to page tables stored in memory. The translated output memory address may be allocated in the translation lookaside buffer according to an allocation policy which may identify, for example, when the translation lookaside buffer is full (at capacity), whether existing entries of the translation lookaside buffer are to be replaced by the translated output memory address.
In some configurations the page table walk returns a plurality of output memory addresses including the translated output memory address. The address translation circuitry is responsive to a determination that the plurality of output memory addresses can be coalesced into a single entry using one of the active coalesced formats, to represent the new translation data using one of the active coalesced formats. In addition, the address translation circuitry is responsive to a determination that the plurality of output memory addresses cannot be coalesced into the single entry using one of the active coalesced formats, to represent the new translation data using the non-coalesced format. Typically, a page table walk will return an entire cache line of translations corresponding to consecutive blocks of input memory addresses. The address translation circuitry can be configured to identify, for each active coalesced format, which of the plurality of output memory addresses can be grouped using that active coalesced format. The determination may be based on whether the output memory addresses are located within a same output address range having the output address range size defined in the given coalesced format and whether the input memory addresses are located within a same input address range as the input memory address, the input address range having the input address range size defined in the given coalesced format.
In some configurations where the coalesced format information identifies a plurality of active coalesced formats, the address processing circuitry may be responsive to a determination that the plurality of output memory addresses can be coalesced into a single entry using one of the active coalesced formats, to generate the new translation data by generating a plurality of candidate sets of translation data each corresponding to one of the plurality of active coalesced formats and to select the new translation data out of the plurality of candidate sets of translation data based on a number of active translations associated with each of the plurality of candidate sets of translation data.
The candidate set of translation data selected may be candidate translation data having the most active translations or, where there is no single set of candidate translation data having the most active translations, the selected candidate set of translation data may be selected based on another criterion, for example the candidate set of translation data associated with the largest/smallest range of the output address space. By generating the new translation data in this way, the number of address translations that are allocated to the translation lookaside buffer can be increased whilst allowing for fragmentation of the memory address spaces. For example, when there is little or no fragmentation of the memory address spaces, the plurality of output memory addresses are likely to be located within a same output address range having a smallest output address range size amongst the sizes defined by the various coalesced formats. However, as the memory address space fragments (e.g., due to reallocation of address translations during normal processing), it becomes less likely that the plurality of output memory addresses are located within a same output address range having that smallest output address range size, reducing the use of the coalesced format identifying the smallest output address range size. However, it may still be possible to allocate a coalesced entry using one of the coalesced formats identifying an output address range size that is larger than the smallest output address range size. Such a coalesced format may not allow as many output memory addresses to be coalesced but may still allow coalesced entries to be generated as the memory address space fragments, thereby increasing the number of address translations that can be allocated to the translation lookaside buffer as fragmentation increases.
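The selection between candidate sets of translation data can be illustrated with a short Python sketch. It is a simplified model resting on several assumptions: the page table walk result is represented as a dictionary `line` from VA block numbers to PA block numbers, each active format is a hypothetical `(name, n, m)` tuple giving the VA and PA offset widths, and a tie is resolved in favour of the earlier-listed format (the text leaves the tie-break criterion open). The function names are illustrative only.

```python
def build_candidate(va_block, line, n, m):
    """Coalesce the translations in `line` around `va_block` for one format.

    Returns (aligned_va, aligned_pa, offsets), where `offsets` maps each
    position that can be coalesced to its PA offset.
    """
    aligned_va = (va_block >> n) << n
    aligned_pa = (line[va_block] >> m) << m
    offsets = {}
    for pos in range(1 << n):
        pa = line.get(aligned_va + pos)
        # Keep only translations whose PA lies in the same aligned PA range.
        if pa is not None and (pa >> m) << m == aligned_pa:
            offsets[pos] = pa - aligned_pa
    return aligned_va, aligned_pa, offsets


def select_format(va_block, line, active_formats):
    """Pick the active format whose candidate holds the most translations.

    Falls back to ("non-coalesced", None) when no format can group the
    requested translation with at least one other translation.
    """
    best_name, best = "non-coalesced", None
    for name, n, m in active_formats:
        candidate = build_candidate(va_block, line, n, m)
        count = len(candidate[2])
        # Strict '>' keeps the earlier-listed format on a tie (an assumption).
        if count > 1 and (best is None or count > len(best[2])):
            best_name, best = name, candidate
    return best_name, best


ACTIVE = [("first", 3, 5), ("second", 2, 13)]  # hypothetical active formats

# Unfragmented: eight consecutive PAs fit the smaller PA range.
contiguous = {0x8800 + i: 0xF000 + i for i in range(8)}
# Fragmented: PAs are spread over a range too large for 5-bit offsets.
fragmented = {0xAB00: 0xF0FF001, 0xAB01: 0xF0FFFFF,
              0xAB02: 0xF0FF00F, 0xAB03: 0xF0FE000}
```

With these inputs the unfragmented line is best served by the format with the smaller output range (eight translations against four), while the fragmented line can only be fully coalesced by the format with the larger output range (four translations against two).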
The input memory address space and the output memory address space can be any memory address spaces. In some configurations, the block of input memory addresses is a block of virtual memory addresses and the input memory address space is a virtual memory address space; and the block of output memory addresses is a block of physical memory addresses and the output memory address space is a physical memory address space. In alternative configurations one of the blocks of input or output memory addresses is a block of intermediate physical memory addresses and one of the input or output memory address spaces is an intermediate physical memory address space such that the translation lookaside buffer stores one or more of virtual address to physical address translations, virtual address to intermediate physical address translations, and intermediate physical address to physical address translations. It would be readily apparent to the skilled person that a translation lookaside buffer could be provided to translate between blocks of memory addresses in any memory address space in addition to those identified herein.
Concepts described herein may be embodied in computer-readable code for fabrication of an apparatus that embodies the described concepts. For example, the computer-readable code can be used at one or more stages of a semiconductor design and fabrication process, including an electronic design automation (EDA) stage, to fabricate an integrated circuit comprising the apparatus embodying the concepts. The above computer-readable code may additionally or alternatively enable the definition, modelling, simulation, verification and/or testing of an apparatus embodying the concepts described herein. For example, the computer-readable code for fabrication of an apparatus embodying the concepts described herein can be embodied in code defining a hardware description language (HDL) representation of the concepts. For example, the code may define a register-transfer-level (RTL) abstraction of one or more logic circuits for defining an apparatus embodying the concepts. The code may define a HDL representation of the one or more logic circuits embodying the apparatus in Verilog, SystemVerilog, Chisel, or VHDL (Very High-Speed Integrated Circuit Hardware Description Language) as well as intermediate representations such as FIRRTL. Computer-readable code may provide definitions embodying the concept using system-level modelling languages such as SystemC and SystemVerilog or other behavioural representations of the concepts that can be interpreted by a computer to enable simulation, functional and/or formal verification, and testing of the concepts.
Additionally or alternatively, the computer-readable code may define a low-level description of integrated circuit components that embody concepts described herein, such as one or more netlists or integrated circuit layout definitions, including representations such as GDSII. The one or more netlists or other computer-readable representation of integrated circuit components may be generated by applying one or more logic synthesis processes to an RTL representation to generate definitions for use in fabrication of an apparatus embodying the invention. Alternatively or additionally, the one or more logic synthesis processes can generate from the computer-readable code a bitstream to be loaded into a field programmable gate array (FPGA) to configure the FPGA to embody the described concepts. The FPGA may be deployed for the purposes of verification and test of the concepts prior to fabrication in an integrated circuit or the FPGA may be deployed in a product directly.
The computer-readable code may comprise a mix of code representations for fabrication of an apparatus, for example including a mix of one or more of an RTL representation, a netlist representation, or another computer-readable definition to be used in a semiconductor design and fabrication process to fabricate an apparatus embodying the invention. Alternatively or additionally, the concept may be defined in a combination of a computer-readable definition to be used in a semiconductor design and fabrication process to fabricate an apparatus and computer-readable code defining instructions which are to be executed by the defined apparatus once fabricated. Such computer-readable code can be disposed in any known transitory computer-readable medium (such as wired or wireless transmission of code over a network) or non-transitory computer-readable medium such as semiconductor, magnetic disk, or optical disc. An integrated circuit fabricated using the computer-readable code may comprise components such as one or more of a central processing unit, graphics processing unit, neural processing unit, digital signal processor or other components that individually or collectively embody the concept.
Particular configurations of the invention will now be described with reference to the accompanying figures.
Figure 1 schematically illustrates an example of a data processing apparatus comprising: one or more processing elements (PE) 100, an interconnect circuit 110, a dynamic random access memory (DRAM) 120 and a DRAM controller 130. Each of the processing elements 100 can access at least some of the memory locations in the DRAM 120. In principle this access could be directly via actual (physical) memory addresses. However, in order to provide partitioning and a degree of security between memory accesses by different processing elements (or in some cases different operating systems running on the processing elements 100), the processing elements 100 refer to memory addresses by virtual memory addresses. These require translation into output or physical memory addresses to access real (physical) memory locations in the DRAM 120. Such translations are handled by translation apparatus 115 such as a Memory Management Unit (MMU).
This arrangement therefore provides an example of data processing apparatus comprising: a memory 120 accessible according to physical memory addresses; one or more processing elements 100 to generate virtual memory addresses for accessing the memory; and memory address translation apparatus 115 to provide a translation of the initial memory addresses generated by the one or more processing elements to physical memory addresses provided to the memory. In the context of such a translation, the virtual memory addresses may be considered as input memory addresses and the physical memory addresses as output memory addresses.
However, address translation can (from the point of view of a processing element 100) be performed by a translation lookaside buffer (TLB) 105 associated with that processing element. The TLB 105 stores or buffers recently-used translations between virtual memory addresses and physical memory addresses. In operation, the processing element 100 refers a virtual memory address to the TLB 105. Assuming the translation is stored at the TLB 105, the virtual memory address is translated to a physical memory address which then forms part of a memory access to the DRAM 120. However, the TLB has limited size and cannot store every single possible memory address translation which may be called upon by the processing element 100. In the case that a required translation is not present in the TLB 105, the TLB refers the request to the translation apparatus 115, for example forming part of the interconnect circuitry 110. The translation apparatus operates to provide or otherwise obtain the required translation and pass it back to the TLB 105 where it can be stored and used to translate a virtual memory address into a physical memory address.
Figure 2 schematically illustrates the use of a translation lookaside buffer (TLB) 105 and control circuitry 103. For the purposes of Figure 2, other items relating to the data communication between the TLB 105 and the MMU 115 are omitted for clarity of the diagram. As part of the operation of the processing element (or other module or arrangement with which the TLB 105 is associated), the TLB 105 receives a virtual address (VA) 102 relating to a required memory access. This could of course be a read or a write memory access. It is immaterial to the present discussion which type of memory access is underway. Referring also to Figure 3 (which is a schematic flowchart illustrating operations of the TLB 105), supply of a VA 102 to the TLB 105 forms a request for a translation to be performed to determine a corresponding output physical address (PA) 104 for the VA 102 (shown in Figure 3 as a step 200).
The TLB 105 contains a cache or store of translations between VA and PA. The criteria by which the TLB 105 stores particular VA to PA translations can be established according to known techniques for the operation of a TLB and will be discussed further below. The cached translations might include recently used translations, frequently used translations and/or translations which are expected to be required soon (such as translations relating to VAs which are close to recently-accessed VAs). Overall, the situation is that the TLB contains a cache of a subset of the set of all possible VA to PA translations, such that when a particular VA to PA translation is required, it may be found that the translation is already held in the cache at the TLB.
Accordingly, at a next step 210, the TLB 105 detects whether the required translation is indeed currently cached by the TLB. If the answer is yes, then control passes to a step 240 in which the required translation is applied to the VA 102 to generate the PA 104. However, if the answer is no, then control passes to a step 220 at which the TLB 105 sends a request, comprising the required VA 222, to the MMU 115. The MMU 115 derives the required VA to PA translation (using techniques to be discussed below) and sends at least the PA 232 corresponding to the VA 222 back to the TLB 105 where it is stored at a step 230.
Finally, at the step 240, the TLB 105 applies the translation stored at the TLB 105 to provide the output PA 104.
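The flow of steps 200 to 240 can be summarised in a few lines of Python. This is only a behavioural sketch: the TLB is modelled as a dictionary with no capacity limit or replacement policy, and `mmu_walk` is a hypothetical stand-in for the request to the MMU 115 with an arbitrary mapping chosen for illustration.

```python
def translate(tlb_cache, mmu_walk, va):
    """Return the PA for `va`, filling the TLB from the MMU on a miss."""
    if va not in tlb_cache:            # step 210: is the translation cached?
        tlb_cache[va] = mmu_walk(va)   # steps 220 and 230: request and store
    return tlb_cache[va]               # step 240: apply the stored translation


walks = []  # records each request that reaches the (hypothetical) MMU


def mmu_walk(va):
    walks.append(va)
    return va + 0x1000  # arbitrary illustrative mapping, not from the text


cache = {}
first = translate(cache, mmu_walk, 0x10)   # miss: the MMU is consulted
second = translate(cache, mmu_walk, 0x10)  # hit: served from the TLB
```

The second call returns the same PA without a further request to the MMU, which is the benefit the TLB provides.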
The TLB 105 is capable of storing translation data within entries that are formatted using one of a plurality of possible coalesced formats where each entry formatted using a given one of the coalesced formats is capable of identifying a plurality of memory address translations between blocks of input memory addresses, located within a same input address range having the input address range size defined in the given coalesced format, and blocks of output memory addresses, located within a same output address range having the output address range size defined in the given coalesced format. The TLB 105 is provided with control circuitry 103 which is arranged to maintain coalesced format information identifying which of the coalesced formats are currently available for use. The information identifying the coalesced formats may be stored in the TLB 105, the control circuitry 103, or as part of one or more sets of control information stored external to the TLB 105 and the control circuitry 103.
An example of the operation of the MMU 115 to obtain a required translation of the VA 222 to the PA 232 will now be described. Figure 4 schematically illustrates an example of a stage 1 page table walk (PTW) process, and Figure 5 is a schematic flowchart illustrating a PTW process.
In this example, a VA 222 which requires translation is formed as a 48-bit value. However, it will be appreciated that the techniques are applicable to addresses of various lengths, and indeed that the length of a VA need not necessarily be the same as the length of a PA. Different portions of the VA 222 are used at different stages in the PTW process.
To obtain a first entry in the page table hierarchy, in a "level 0 table" 310, a base address stored in a base address register 300 (Figure 4) is obtained at a step 400 (Figure 5). A first portion 312 of the VA 222, being the 9 most significant bits, is added to the base address as an offset, at a step 410, so as to provide the PA 314 of an entry in the level 0 table 310. The relevant page table entry is looked up in physical memory, or in any intervening cache (e.g. a level 2 cache 50) if the relevant page is cached, at a step 430.
At a step 440, a detection is made as to whether a final level (level 3 in the example of figure 4) of the page table walk has been reached in the page table hierarchy. If not, as in the present case, control passes to a step 450 at which the retrieved page table entry is used as a base address of a next table in the hierarchy. The page table entry acts as the next level table in the hierarchy, a "level 1 table" 320. Control returns to the step 410.
At the second iteration of the step 410, a further part 322 of the VA 222, being the next 9 bits [38:30] of the VA 222, forms an offset from the base address of the table 320 in order to provide the PA of an entry 324 in the table 320. This then provides the base address of a "level 2 table" 330 which in turn (by the same process) provides the base address of a "level 3 table" 340.
When the final level of the page table walk (level 3 in the example of figure 4) has been accessed, the answer to the detection at the step 440 is "yes". The page table entry indicated by the PA 344 provides a page address and access permissions relating to a physical memory page. The remaining portion 352 of the VA 222, namely the least significant 12 bits [11:0] provides a page offset within the memory page defined by the page table entry at the PA 344, though in an example system which stores information as successive four byte (for example 32 bit) portions, it may be that the portion [11:2] provides the required offset to the address of the appropriate 32 bit word.
Therefore, the combination (at a step 460) of the least significant portion of the VA 222 and the final page table entry (in this case, from the "level 3 table" 340) provides (at a step 470) the PA 232 as a translation of the VA 222.
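The walk of Figures 4 and 5 can be modelled as below. The sketch makes several simplifying assumptions: memory is a dictionary keyed by entry address, the index is added to the table base without scaling by the entry size, access permissions are omitted, and the final entry is taken to hold a page-aligned base address. The bit ranges for levels 2 and 3 ([29:21] and [20:12]) are inferred from the 48-bit VA, the 9-bit portions and the 12-bit page offset given in the text.

```python
LEVEL_SHIFTS = (39, 30, 21, 12)  # VA bits [47:39], [38:30], [29:21], [20:12]
INDEX_MASK = 0x1FF               # 9 bits per level
PAGE_OFFSET_MASK = 0xFFF         # VA bits [11:0]


def split_va(va):
    """Split a 48-bit VA into four table indices and a page offset."""
    indices = [(va >> shift) & INDEX_MASK for shift in LEVEL_SHIFTS]
    return indices, va & PAGE_OFFSET_MASK


def page_table_walk(memory, base_register, va):
    """Follow levels 0 to 3 and combine the result with the page offset."""
    indices, page_offset = split_va(va)
    entry = base_register                # step 400
    for index in indices:
        entry = memory[entry + index]    # steps 410 and 430, then step 450
    return entry | page_offset           # steps 460 and 470


# Hypothetical table contents: one path through the four levels.
memory = {
    0x1000 + 1: 0x2000,       # level 0 entry -> base of level 1 table
    0x2000 + 2: 0x3000,       # level 1 entry -> base of level 2 table
    0x3000 + 3: 0x4000,       # level 2 entry -> base of level 3 table
    0x4000 + 4: 0xABCDE000,   # level 3 entry -> physical page address
}
va = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x123
pa = page_table_walk(memory, 0x1000, va)
```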
Note that multiple stage MMUs are used in some situations. In this arrangement, two levels of translation are in fact used. A virtual address (VA) required by an executing program or other system module such as a graphics processing unit (GPU) is translated to an intermediate physical address (IPA) by a first MMU stage. The IPA is translated to a physical address (PA) by a second MMU stage. One reason why multiple stage translation is used is for security of information handling when multiple operating systems (OS) may be in use on respective “virtual machines” running on the same processor. A particular OS is exposed to the VA to IPA translation, whereas only a hypervisor (software which oversees the running of the virtual machines) has oversight of the stage 2 (IPA to PA) translation. In a multiple stage MMU, for a VA to IPA translation, the VA may be considered as the input memory address and the IPA as the output memory address. For an IPA to PA translation, the IPA may be considered as the input memory address and the PA as the output memory address.
Whilst in the illustrated example of figures 4 and 5 the returned PA is a single PA, in some alternative configurations, an entire cache line of physical addresses is returned with the required physical address identified by one or more of the bits of the 12-bit page offset 352. In such configurations, the TLB 105 is responsive to the returned cache line of physical addresses to determine whether one or more of the translations identified in the cache line can be allocated, in addition to the returned PA, as a coalesced entry in the TLB 105 and, if so, the TLB 105 or address processing circuitry associated therewith is arranged to allocate a coalesced entry in the TLB 105.
Figure 6 schematically illustrates a coalesced (clustered) TLB entry 600 according to various configurations of the present techniques. The coalesced TLB entry 600 contains translation data mapping each of the virtual addresses 602 to each of the physical addresses 604. The coalesced TLB entry 600 comprises an aligned virtual address (VA aligned), an aligned physical address (PA aligned), a validity map, and a plurality of clustered PA offset fields (PA cluster). The aligned virtual address is aligned to an address boundary of a region of virtual address space sized to include each virtual address that is coalesced into that entry. In the illustrated example, 8 virtual address blocks are clustered into the entry so the aligned virtual address is aligned to an address boundary of a region having a size of 8 virtual address blocks. The aligned physical address is aligned to an address boundary of a region of the physical address space sized to include all possible physical address blocks that can be identified by the PA offsets. In the illustrated configuration, the PA offsets are each 3 bits identifying 1 of 8 possible physical address blocks and the aligned PA is aligned to a physical address boundary of a physical address region having a size of 8 physical address blocks. The validity map indicates which PA offsets of the PA offsets identify a valid translation and the PA offsets each identify a corresponding PA offset to be added to the aligned PA address in order to identify a particular address translation.
In the illustrated configuration the VAs 0x8800 to 0x8804 are mapped, respectively, to the physical addresses 0xF003 to 0xF007, VAs 0x8805 and 0x8806 are unmapped (no valid translation) and VA 0x8807 is mapped to PA 0xF000. In the TLB entry 600, the address mappings are identified by recording the aligned VA (i.e., the portion of the VA that is common to each of the VAs 602), the aligned PA (i.e., the portion of the PA that is common to each of the PAs 604), and each of the PA offsets in order of the VA to which they correspond.
The translation from VA 0x8800 to PA 0xF003 is identified in the coalesced entry 600 through the aligned VA 0x8800, the aligned PA 0xF000, a set validity bit in the zeroth position (the right most position) of the validity map, and a PA offset of 3 in the zeroth position (the right most position) of the PA offsets.
The translation from VA 0x8801 to PA 0xF004 is identified in the coalesced TLB entry 600 through the aligned VA 0x8800, the aligned PA 0xF000, a set validity bit in the first position of the validity map, and a PA offset of 4 in the first position of the PA offsets.
The translation from VA 0x8802 to PA 0xF005 is identified in the coalesced TLB entry 600 through the aligned VA 0x8800, the aligned PA 0xF000, a set validity bit in the second position of the validity map, and a PA offset of 5 in the second position of the PA offsets.
The translation from VA 0x8803 to PA 0xF006 is identified in the coalesced TLB entry 600 through the aligned VA 0x8800, the aligned PA 0xF000, a set validity bit in the third position of the validity map, and a PA offset of 6 in the third position of the PA offsets.
The translation from VA 0x8804 to PA 0xF007 is identified in the coalesced TLB entry 600 through the aligned VA 0x8800, the aligned PA 0xF000, a set validity bit in the fourth position of the validity map, and a PA offset of 7 in the fourth position of the PA offsets.
The invalid translations for VAs 0x8805 and 0x8806 are identified in the coalesced TLB entry 600 as invalid entries through a clear (not set) validity bit in the fifth and sixth positions of the validity map.
The translation from VA 0x8807 to PA 0xF000 is identified in the coalesced TLB entry 600 through the aligned VA 0x8800, the aligned PA 0xF000, a set validity bit in the seventh position of the validity map, and a PA offset of 0 in the seventh position of the PA offsets. The TLB entry 600 will be identified in response to a lookup using any of the VAs 0x8800 to 0x8807 and a corresponding PA (where the translation is valid) can be returned by identifying the specific PA offset within the TLB entry 600.
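The encoding of the entry 600 can be reproduced with a small Python sketch that packs the validity map and the 3-bit PA offsets into integers and then decodes them on lookup. The tuple layout and the function names `pack_entry` and `translate` are assumptions made for illustration; only the field values come from the example of Figure 6.

```python
OFFSET_BITS = 3  # each PA offset selects 1 of 8 physical address blocks
POSITIONS = 8    # virtual address blocks coalesced into one entry


def pack_entry(aligned_va, aligned_pa, offsets):
    """Build (aligned VA, aligned PA, validity map, packed PA offsets).

    `offsets` lists one PA offset per position, or None where unmapped.
    Position 0 occupies the least significant (right most) bits.
    """
    validity = 0
    cluster = 0
    for pos, offset in enumerate(offsets):
        if offset is not None:
            validity |= 1 << pos
            cluster |= offset << (OFFSET_BITS * pos)
    return aligned_va, aligned_pa, validity, cluster


def translate(entry, va):
    """Return the PA for `va`, or None if the entry holds no valid mapping."""
    aligned_va, aligned_pa, validity, cluster = entry
    pos = va - aligned_va
    if not 0 <= pos < POSITIONS or not (validity >> pos) & 1:
        return None
    offset = (cluster >> (OFFSET_BITS * pos)) & ((1 << OFFSET_BITS) - 1)
    return aligned_pa | offset


# The mappings of Figure 6: positions 5 and 6 are unmapped.
entry_600 = pack_entry(0x8800, 0xF000, [3, 4, 5, 6, 7, None, None, 0])
```

Here the validity map evaluates to binary 10011111, reflecting set bits in positions 0 to 4 and 7, and a lookup of VA 0x8807 recovers PA 0xF000 from the offset stored in position 7.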
Example implementations of the present disclosure relate to a TLB that is capable of supporting plural coalesced formats in which a size of a region of an input address space from which memory address translations can be coalesced into a single coalesced entry and/or a size of a region of output address space from which memory address translations can be coalesced into the single coalesced entry are different between different coalesced formats. For such formats, n bits can be used to express the VA offset values and m bits can be used to express the PA offset values, in which n does not necessarily equal m. Figure 7 schematically illustrates the translation between a VA and a PA for a case where n does not equal m. A VA is defined by a VA base address 900 plus a VA offset 910 of n bits (the combination of the VA base address 900 and the VA offset 910 providing the same information as one of the VAs 602 of Figure 6), plus LSBs 920 of which one or more least significant bits 930 may be set to 0 so that each VA and each PA refers to a word boundary. In general the size of the VA space is not equal to the size of the PA space. For example, the PA space may be a 40 bit address space and the VA space may be a 48 bit address space. Neglecting for a moment any difference in the size of the VA and PA offsets, in the illustrated example the total number of bits provided for the VA base address is larger (by a margin 943) than the total number of bits provided for the PA base address. In addition, the VA is mapped by a memory address translation to a corresponding PA defined by a PA base address 940 plus a PA offset 950 of m bits, where in the illustrated configuration m > n which also implies that the number of least significant bits that are stored for the PA base address 940 is smaller (by a margin 942) than the bit length of the VA base address 900, assuming the size of each VA and PA is the same.
The PA base address 940 plus the PA offset 950 is concatenated with LSBs 960 which correspond to LSBs 920 to form the whole translated PA 970.
Figure 8 schematically illustrates a further example of a coalesced TLB entry 700 using a first coalesced format that may be supported by a TLB 105 according to various configurations of the present techniques. The arrangement of the mappings between the VAs 702 and the PAs 704 that are recorded in the entry 700 using the first format are the same as those described in relation to figure 6 and, for reasons of conciseness, will not be repeated. The TLB entry 700 formatted using the first coalesced format is capable of coalescing up to 8 memory address translations and differs from the TLB entry 600 formatted using the general coalesced format in that the TLB entry 700 formatted using the first format is able to support a greater range of PAs due to additional bits being provided for each of the PA offsets. In addition to the memory address translations that could be identified using the general coalesced format of figure 6, the TLB entry 700 formatted using the first coalesced format is able to store a translation 706 from VA 0x0_00F0_8805 to PA 0x000_F00F by storing the aligned VA 0x0_00F0_8800, the aligned PA 0x000_F000, a set validity bit in the fifth position of the validity map and a PA offset of F in the fifth position of the PA offsets.
The first coalesced format of TLB entry 700 illustrated in figure 8 uses 42 bits, in addition to the bits that would be used to identify a single memory address translation, to identify the translation data. In particular seven additional offsets (in addition to the one offset required for a single translation) each having 5 bits are provided and an additional 7 validity bits (in addition to the one validity bit required for a single translation) are provided. One or more further bits (not illustrated) may also be provided to indicate the format of the TLB entry 700.
Expressed in another way the range of the virtual address space that can be covered by an entry in a given coalesced entry format is defined by a virtual address clustering factor (VA_CF) which specifies a number of blocks of virtual address space that can be coalesced into an entry for the given coalesced format. Similarly, the range of the physical address space that can be covered by an entry of a given coalesced format is defined by a physical address clustering factor (PA_CF) which specifies a number of blocks of physical address space that can be coalesced into an entry for the given coalesced format. For an m-bit physical address offset, $\text{PA\_CF} = 2^m$, and for an n-bit virtual address offset, $\text{VA\_CF} = 2^n$. When the TLB also supports a non-coalesced entry, the number of additional bits that are required to incorporate the additional memory address translation into an entry formatted using the given coalesced format is given by:
$(\text{VA\_CF} - 1) + (\text{VA\_CF} - 1) \times \log_2(\text{PA\_CF}) + 1$, where the first term $(\text{VA\_CF} - 1)$ is the number of additional validity bits required, the second term $(\text{VA\_CF} - 1) \times \log_2(\text{PA\_CF})$ is the number of additional bits required to store the physical address offsets, and the third term (+1) is an additional bit used to distinguish whether the entry is in the coalesced format or the non-coalesced format. In configurations where the TLB supports more than one active coalesced format, further bits may be added to distinguish between the active coalesced formats.
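As a worked check of this expression, the following Python sketch evaluates it for given clustering factors. The function name and the `format_bit` switch are illustrative assumptions; both clustering factors are assumed to be powers of two, as follows from their definition in terms of offset widths.

```python
def additional_bits(va_cf, pa_cf, format_bit=True):
    """Additional bits needed by a coalesced entry over a single translation.

    va_cf and pa_cf are the clustering factors (assumed powers of two).
    """
    offset_width = pa_cf.bit_length() - 1  # log2(PA_CF) for a power of two
    validity_bits = va_cf - 1              # first term
    offset_bits = (va_cf - 1) * offset_width  # second term
    return validity_bits + offset_bits + (1 if format_bit else 0)


# First coalesced format of Figure 8: VA_CF = 8 and 5-bit offsets (PA_CF = 32).
with_format_bit = additional_bits(8, 32)
without_format_bit = additional_bits(8, 32, format_bit=False)
```

For VA_CF = 8 and PA_CF = 32 the first two terms give 7 + 35 = 42 bits, matching the figure quoted for the first coalesced format, and the format-distinguishing bit brings the total to 43.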
Figure 9 schematically illustrates an example of a second coalesced format used by the TLB entry 800 to identify memory address translations from the VAs 802 to PAs 804. The TLB entry 800 formatted using the second coalesced format is capable of identifying up to 4 memory address translations between VAs, that are within an aligned range of the virtual memory address space having a size equal to four blocks of memory addresses, and their corresponding PAs. The second coalesced format of TLB entry 800 uses the same number of additional bits as the first coalesced format, i.e., 42 bits in addition to the bits that would be used to identify a single memory address translation. Because four fewer translations are recorded in the entry 800, there are a number of bits that can be repurposed from the first coalesced entry format to increase the range of the offsets provided in the second coalesced entry format without having to increase a total number of bits provided for the entry. In particular, four lots of five offset bits and four validity bits can be repurposed from their use in the first coalesced format to be used for storing PA offsets in the second coalesced format. These 24 bits are split equally between the remaining additional PA offsets (i.e., the three PA offsets that would not be present in a single non-coalesced translation), adding 8 bits to each 5-bit offset and allowing a total of 13 bits for each PA offset. The second coalesced format can therefore coalesce up to four translations provided that the PAs of each of the four translations are contained within a same region of the physical address space having a size equal to $2^{13}$ times the size of a block of physical addresses.
In figure 9, four memory address translations are defined. Specifically, the memory address translations are defined from VA 0x0_00F0_AB00 to PA 0xF0F_F001, from VA 0x0_00F0_AB01 to PA 0xF0F_FFFF, from VA 0x0_00F0_AB02 to PA 0xF0F_F00F, and from VA 0x0_00F0_AB03 to PA 0xF0F_E000.
The translation from VA 0x0_00F0_AB00 to PA 0xF0F_F001 is identified through the aligned VA 0x0_00F0_AB00, the aligned PA 0xF0F_E000, a set validity bit in the zeroth position of the validity map, and a PA offset of 1001 in the zeroth position of the PA offsets. It is noted that, whilst the PA offset has been represented using four hexadecimal values, this is for representation purposes only and the second coalesced format provides 13 bits (three hexadecimal values and 1 additional bit) for each PA offset. As such, a PA offset, in hexadecimal notation, of 1XXX (where X is any hexadecimal value) is to be interpreted as a binary value of 1 ZZZZ ZZZZ ZZZZ (where Z is any binary value) and a PA offset, in hexadecimal notation, of 0XXX is to be interpreted as a binary value of 0 ZZZZ ZZZZ ZZZZ.
The translation from VA 0x0_00F0_AB01 to PA 0xF0F_FFFF is identified through the aligned VA 0x0_00F0_AB00, the aligned PA 0xF0F_E000, a set validity bit in the first position of the validity map, and a PA offset of 1FFF in the first position of the PA offsets.
The translation from VA 0x0_00F0_AB02 to PA 0xF0F_F00F is identified through the aligned VA 0x0_00F0_AB00, the aligned PA 0xF0F_E000, a set validity bit in the second position of the validity map, and a PA offset of 100F in the second position of the PA offsets.
The translation from VA 0x0_00F0_AB03 to PA 0xF0F_E000 is identified through the aligned VA 0x0_00F0_AB00, the aligned PA 0xF0F_E000, a set validity bit in the third position of the validity map, and a PA offset of 0000 in the third position of the PA offsets.
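The four translations of figure 9 can be reproduced with the following Python sketch, which models an entry in the second coalesced format and reconstructs each PA from the aligned PA and the stored offset. The class CoalescedEntry, the function translate and the constants are hypothetical names introduced for illustration only. Addresses are treated as block numbers, and the sketch assumes that the two least significant VA bits select the position within the entry.

```python
from dataclasses import dataclass
from typing import List, Optional

OFFSET_BITS = 13    # PA offset width in the second coalesced format
SLOT_BITS = 2       # four blocks per entry, so two VA bits select the position


@dataclass
class CoalescedEntry:
    aligned_va: int         # VA with the position-select bits cleared
    aligned_pa: int         # PA with the low OFFSET_BITS bits cleared
    valid: List[bool]       # validity map, one bit per position
    pa_offsets: List[int]   # one OFFSET_BITS-wide offset per position


def translate(entry: CoalescedEntry, va: int) -> Optional[int]:
    """Return the PA for va, or None if this entry does not translate it."""
    slot = va & ((1 << SLOT_BITS) - 1)
    if (va >> SLOT_BITS) << SLOT_BITS != entry.aligned_va:
        return None                      # VA lies outside the aligned VA range
    if not entry.valid[slot]:
        return None                      # no translation stored at this position
    return entry.aligned_pa | entry.pa_offsets[slot]


figure9 = CoalescedEntry(
    aligned_va=0x0_00F0_AB00,
    aligned_pa=0xF0F_E000,
    valid=[True, True, True, True],
    pa_offsets=[0x1001, 0x1FFF, 0x100F, 0x0000],
)
```

For example, translate(figure9, 0x0_00F0_AB02) combines the aligned PA 0xF0F_E000 with the offset 100F to give 0xF0F_F00F.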
Figure 10 schematically illustrates an example of a third coalesced format used by the TLB entry 900 to identify memory address translations from the VAs 902 to PAs 904. The TLB entry 900 formatted using the third coalesced format is capable of identifying up to 2 memory address translations between VAs, that are within an aligned range of the virtual memory address space having a size equal to two blocks of memory addresses, and their corresponding PAs. The third coalesced format of TLB entry 900 uses the same number of additional bits as the first coalesced format, i.e., 42 bits in addition to the bits that would be used to identify a single memory address translation. Because two fewer translations are recorded in the entry 900, there are two lots of thirteen offset bits and two validity bits that can be repurposed from their use in the second coalesced format to be used for storing PA offsets in the third coalesced format. These 28 bits are used for the remaining additional PA offset (i.e., the one PA offset that would not be present in a single non-coalesced translation), allowing a total of 41 bits for each PA offset. The third coalesced format can therefore coalesce up to two translations provided that the PAs of each of the two translations are contained within a same region of the physical address space having a size equal to 2^41 times the size of a block of physical addresses. In some configurations, the physical address space is a 40-bit address space and the third coalesced format is therefore able to coalesce entries from anywhere within the physical address space.
In figure 10, two memory address translations are defined. Specifically, the memory address translations are defined from VA 0x1_FFF0_1100 to PA 0xA93_1C0F, and from VA 0x1_FFF0_1101 to PA 0x03F_05B0.
The translation from VA 0x1_FFF0_1100 to PA 0xA93_1C0F is identified through the aligned VA 0x1_FFF0_1100, the aligned PA 0x000_0000, a set validity bit in the zeroth position of the validity map, and a PA offset of 0xA93_1C0F in the zeroth position of the PA offsets.
The translation from VA 0x1_FFF0_1101 to PA 0x03F_05B0 is identified through the aligned VA 0x1_FFF0_1100, the aligned PA 0x000_0000, a set validity bit in the first position of the validity map, and a PA offset of 0x03F_05B0 in the first position of the PA offsets.
The three coalesced formats illustrated in figures 8-10 each identify a different number of memory address translations with the number of memory address translations traded off against the range of the physical address space that can be coalesced into a single entry.
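This trade-off can be illustrated with a short Python sketch that tests whether a group of PAs falls within a single aligned region for a given offset width. The function fits_format is a hypothetical helper and addresses are treated as block numbers. The PA values are those given for figures 9 and 10, and the offset widths of 5, 13 and 41 bits are those described for the first, second and third coalesced formats.

```python
from typing import Iterable


def fits_format(pas: Iterable[int], offset_bits: int) -> bool:
    """True if every PA shares the same aligned region of 2**offset_bits blocks."""
    # Two PAs lie in the same aligned region exactly when the bits above the
    # offset field are identical.
    return len({pa >> offset_bits for pa in pas}) == 1


figure9_pas = [0xF0F_F001, 0xF0F_FFFF, 0xF0F_F00F, 0xF0F_E000]
figure10_pas = [0xA93_1C0F, 0x03F_05B0]
```

With these values, the PAs of figure 9 fit within a 13-bit offset but not within a 5-bit offset, whereas the PAs of figure 10 require the 41-bit offset of the third coalesced format.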
Figure 11 schematically illustrates a sequence of conversions from virtual addresses in a virtual address space 1000 to physical addresses in a physical address space 1002. The translation data identifies five active (valid) translations. The VA 0x8800 translates to the PA 0x0F00, the VA 0x8801 translates to the PA 0xFFFF, the VA 0x8802 translates to the PA 0x0F1F, the VA 0x8803 translates to the PA 0x1000, and the VA 0x8807 translates to the PA 0x0F01. There are various ways that these translations could be stored in the TLB dependent on a format used for an entry storing the translation data. Using the first coalesced format 1004 comprising up to 8 VA-PA translations, the translations 0x8800 to 0x0F00 and 0x8807 to 0x0F01 can be coalesced into a single entry. Using the second coalesced format 1006 comprising up to 4 VA-PA translations, the translations 0x8800 to 0x0F00, 0x8802 to 0x0F1F and 0x8803 to 0x1000 can be coalesced into a single entry. Using the third coalesced format 1008 comprising up to 2 VA-PA translations, the translations 0x8800 to 0x0F00 and 0x8801 to 0xFFFF can be coalesced into a single entry. In terms of allocation, where the translation data of figure 11 corresponds to translations returned in a single cache line, the address allocation circuitry may be configured to select the new address translation to allocate from the first coalesced format 1004, the second coalesced format 1006, and the third coalesced format 1008. In the illustrated configuration, the second coalesced format 1006 provides the greatest number of translations per entry and may therefore be selected for a new address translation.
Figure 12 schematically illustrates the translation of a virtual address 1202 to a physical address 1232 using an apparatus 1200 according to some configurations of the present technique. The apparatus 1200 is provided with a translation lookaside buffer 1208, first hash generating circuitry 1204, second hash generating circuitry 1206, tag comparison circuitry for a coalesced translation 1226, tag comparison circuitry for a non-coalesced (single) translation 1228, and address forwarding circuitry 1230.
The translation lookaside buffer 1208 is arranged as a set associative cache comprising four set entries (ways) per index. In response to receipt of the virtual address 1202, two lookups in the translation lookaside buffer 1208 are performed. The first lookup is a non-coalesced lookup at a location in the translation lookaside buffer 1208 identified using a non-coalesced index generated by the first hash generating circuitry 1204 to identify a first plurality of set entries 1210. The second lookup is a coalesced lookup at a location in the translation lookaside buffer 1208 identified using a coalesced index generated by the second hash generating circuitry 1206. The second hash generating circuitry 1206 generates the coalesced index based on one or more indexing bits of the virtual address 1202 that are defined by a currently active coalesced format identified in the coalesced format information 1214. Because the one or more indexing bits for the coalesced index are dependent on the currently active coalesced format, different indexing bits of the virtual address 1202 may be used for generation of the coalesced index and for generation of the non-coalesced index. As a result, a second plurality of set entries 1212 is identified based on the coalesced index.
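The dependence of the indexing bits on the format can be sketched in Python as follows. The hash functions themselves are not specified in this description, so a plain modulo over an assumed number of sets stands in for them. NUM_SETS, the function names and the parameter slot_bits (the number of low VA bits that select a position within a coalesced entry) are all hypothetical, and addresses are treated as block numbers.

```python
NUM_SETS = 64  # assumed number of sets; not taken from the description


def non_coalesced_index(va: int) -> int:
    """Index for a single translation: uses the lowest bits of the VA."""
    return va % NUM_SETS


def coalesced_index(va: int, slot_bits: int) -> int:
    """Index for a coalesced entry in a format covering 2**slot_bits blocks."""
    # The position-select bits are skipped so that every VA in the same
    # aligned range maps to the same set.
    return (va >> slot_bits) % NUM_SETS


# The four VAs of figure 9, looked up against a four-block coalesced format.
vas = [0x0_00F0_AB00, 0x0_00F0_AB01, 0x0_00F0_AB02, 0x0_00F0_AB03]
coalesced = {coalesced_index(va, 2) for va in vas}
single = {non_coalesced_index(va) for va in vas}
```

In this sketch the four VAs share one coalesced index but have four distinct non-coalesced indices, which is why the two lookups may identify different sets.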
In order to identify whether one of the first plurality of set entries 1210 and the second plurality of set entries 1212 corresponds to the virtual address 1202, a tag comparison is made using the tag comparison circuitry for the non-coalesced translation 1228 and the tag comparison circuitry for the coalesced translation 1226. Whilst this could, in principle, be performed based on a tag comparison of each of the set entries 1210 and 1212 against a tag portion of the virtual address 1202, this would require 8 tag comparisons for the 4 way set associative translation lookaside buffer 1208. The number of tag comparisons made is reduced by eliminating half of the set entries of each of the first plurality of set entries 1210 and the second plurality of set entries 1212.
In the illustrated configuration, a skew bit 1216 is identified from the virtual address 1202. The choice of skew bit 1216 is dependent on the coalesced format information 1214 and is the least significant bit of the virtual address 1202 that is not used in the generation of the coalesced index or the non-coalesced index. The skew bit is fed into the first coalesced entry selection circuitry 1218, the second coalesced entry selection circuitry 1220, the first non-coalesced entry selection circuitry 1222 and the second non-coalesced entry selection circuitry 1224.
The first coalesced entry selection circuitry 1218 selects way 11 of the second plurality of set entries 1212 to be forwarded to the tag comparison circuitry for the coalesced translation 1226 when the skew bit has a value of 1, and selects way 10 of the second plurality of set entries 1212 to be forwarded to the tag comparison circuitry for the coalesced translation 1226 when the skew bit has a value of 0.
The second coalesced entry selection circuitry 1220 selects way 01 of the second plurality of set entries 1212 to be forwarded to the tag comparison circuitry for the coalesced translation 1226 when the skew bit has a value of 1, and selects way 00 of the second plurality of set entries 1212 to be forwarded to the tag comparison circuitry for the coalesced translation 1226 when the skew bit has a value of 0.
The first non-coalesced entry selection circuitry 1222 selects way 11 of the first plurality of set entries 1210 to be forwarded to the tag comparison circuitry for the non-coalesced translation 1228 when the skew bit has a value of 0, and selects way 10 of the first plurality of set entries 1210 to be forwarded to the tag comparison circuitry for the non-coalesced translation 1228 when the skew bit has a value of 1.
The second non-coalesced entry selection circuitry 1224 selects way 01 of the first plurality of set entries 1210 to be forwarded to the tag comparison circuitry for the non-coalesced translation 1228 when the skew bit has a value of 0, and selects way 00 of the first plurality of set entries 1210 to be forwarded to the tag comparison circuitry for the non-coalesced translation 1228 when the skew bit has a value of 1.
As a result, when the skew bit 1216 takes a value of 1, the tag comparison circuitry for the coalesced translation 1226 receives ways 11 and 01 of the plurality of entry sets identified by the coalesced index, whilst the tag comparison circuitry for the non-coalesced translation 1228 receives ways 10 and 00 of the plurality of entry sets identified by the non-coalesced index. Furthermore, when the skew bit 1216 takes a value of 0, the tag comparison circuitry for the coalesced translation 1226 receives ways 10 and 00 of the plurality of entry sets identified by the coalesced index, whilst the tag comparison circuitry for the non-coalesced translation 1228 receives ways 11 and 01 of the plurality of entry sets identified by the non-coalesced index. The total number of tag comparisons for the four way set associative translation lookaside buffer 1208 is therefore equal to four.
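The way selection described above can be summarised by the following Python sketch. The function name ways_to_compare and the dictionary keys are hypothetical, and the sketch models only the selection behaviour of the entry selection circuitry 1218 to 1224, not its implementation.

```python
from typing import Dict, List


def ways_to_compare(skew_bit: int) -> Dict[str, List[int]]:
    """Ways forwarded to each tag comparison circuitry for a given skew bit."""
    if skew_bit == 1:
        return {
            "coalesced": [0b11, 0b01],      # from the set at the coalesced index
            "non_coalesced": [0b10, 0b00],  # from the set at the non-coalesced index
        }
    return {
        "coalesced": [0b10, 0b00],
        "non_coalesced": [0b11, 0b01],
    }
```

For either value of the skew bit, two ways are compared per lookup, giving four tag comparisons in total rather than eight.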
The tag comparison circuitry for the non-coalesced translation 1228 performs a tag comparison between a tag portion of the virtual address 1202 and a tag portion of stored virtual addresses in the ways forwarded by the first non-coalesced entry selection circuitry 1222 and the second non-coalesced entry selection circuitry 1224. When a tag match is identified, the physical address of the matching entry is passed to address forwarding circuitry 1230.
The tag comparison circuitry for the coalesced translation 1226 performs a tag comparison between a tag portion of the virtual address 1202 and tag portions of stored virtual addresses in the set entries forwarded by the first coalesced entry selection circuitry 1218 and the second coalesced entry selection circuitry 1220. In order to identify the physical address of the matching entry, the tag comparison circuitry for the coalesced translation 1226 receives the coalesced format information 1214 identifying bits of the virtual address 1202 that are to be used to identify whether a base portion of the virtual address matches a stored base portion of one of the set entries and identifying bits of the virtual address 1202 that are used to identify whether a valid offset portion of a physical address is present in the matching entry. When a tag match is identified, the physical address of the matching entry is passed to address forwarding circuitry 1230. The address forwarding circuitry 1230 is responsive to receipt of a physical address from one of the tag comparison circuitry for the coalesced translation 1226 and the tag comparison circuitry for the non-coalesced translation 1228 to output a final physical address 1232. The address forwarding circuitry 1230 is responsive to an indication that neither the tag comparison circuitry for the coalesced translation 1226 nor the tag comparison circuitry for the non-coalesced translation 1228 has returned a matching set entry, to signal a TLB miss to trigger a page table walk to be performed to identify the corresponding physical address.
Figure 13 schematically illustrates a sequence of steps performed by address processing circuitry according to various configurations of the present techniques. The steps begin at step 1300 where an input lookup address is received by the address processing circuitry. Flow then proceeds to step 1302 where the address processing circuitry retrieves the active coalesced formats. Flow then proceeds to step 1304 where the address processing circuitry selects a candidate coalesced format from the retrieved active coalesced formats. Flow then proceeds to step 1306 where the address processing circuitry triggers a lookup in the translation lookaside buffer using the candidate coalesced format. The lookup may comprise any of the steps described above including generating an index from an indexing portion of the input address defined in the candidate coalesced format, identifying a subset of set entries (ways) using one or more skew bits defined based on the active coalesced formats, and performing a tag comparison on one or more set entries (ways) at the identified index. At step 1308 it is determined whether there is a hit in the translation lookaside buffer. If no hit is found then flow proceeds to step 1310 where it is determined if there are any more coalesced formats to consider. If, at step 1310, it is determined that there are more coalesced formats to consider then flow returns to step 1304. If, at step 1310, it is determined that there are no more coalesced formats to consider, then flow proceeds to step 1312 where a non-coalesced lookup is performed in the translation lookaside buffer based on a non-coalesced (single address translation) format. If, at step 1312, the lookup misses (fails to identify a corresponding entry) in the translation lookaside buffer, then flow proceeds to step 1314 where a page table walk is triggered to determine the output memory address.
If, at step 1308, it was determined that there was a hit for a candidate coalesced format in the TLB, then flow proceeds to step 1316 where it is determined whether the entry is in the candidate coalesced format. This is performed, for example, by inspecting one or more control bits that are present in the entry to determine if those control bits match values that are expected for the current candidate coalesced format. If, at step 1316, there is no match, then flow proceeds to step 1310. If, at step 1316, there is a match then flow proceeds to step 1318 where the address processing circuitry outputs the output memory address identified in the candidate coalesced format. If, at step 1312, it was determined that there was a hit in the translation lookaside buffer in the non-coalesced format, then flow proceeds to step 1318 where the output memory address identified using the non-coalesced format is output.
It would be appreciated by the person skilled in the art that the steps set out in figure 13 are illustrated sequentially for example purposes only and that many of these steps could be implemented in parallel. For example, in some configurations, step 1304, step 1306, step 1308 and step 1316 are performed for two or more of the coalesced formats in parallel. Similarly, step 1304, step 1306, step 1308, and step 1316 may be performed for one or more of the coalesced formats in parallel with performing step 1312. Furthermore, the person skilled in the art would appreciate that the steps performed during the lookups of steps 1306 and 1312 may be adapted dependent on whether or not the particular implementation implements skewing (i.e., whether one or more skew bits are used to identify the particular ways that can be used for different formats) and dependent on the layout of the translation lookaside buffer (i.e., whether it is a set associative cache, a direct mapped cache, or a fully associative cache).
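The sequential form of the figure 13 flow can be sketched in Python as follows. This is a simplified software model under stated assumptions and not the described circuitry: the translation lookaside buffer is modelled as a flat list searched in full (i.e., a fully associative layout without skewing), each entry already holds complete output addresses rather than a base and offsets, and a coalesced entry whose position for the lookup address is invalid is treated as a miss for that format. The names Entry, lookup, active_formats and page_table_walk are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (format name or None for non-coalesced, VA tag, {position: output address})
Entry = Tuple[Optional[str], int, Dict[int, int]]


def lookup(va: int,
           active_formats: Dict[str, int],   # format name -> position-select bits
           tlb: List[Entry],
           page_table_walk: Callable[[int], int]) -> int:
    # Steps 1304 to 1310: consider each active coalesced format in turn.
    for name, slot_bits in active_formats.items():
        tag = va >> slot_bits
        slot = va & ((1 << slot_bits) - 1)
        for fmt, entry_tag, slots in tlb:
            hit = entry_tag == tag                 # step 1308
            in_candidate_format = fmt == name      # step 1316 (control bits)
            if hit and in_candidate_format and slot in slots:
                return slots[slot]                 # step 1318
    # Step 1312: non-coalesced lookup.
    for fmt, entry_tag, slots in tlb:
        if fmt is None and entry_tag == va:
            return slots[0]                        # step 1318
    # Step 1314: miss in every format.
    return page_table_walk(va)


tlb: List[Entry] = [
    ("second", 0x0_00F0_AB00 >> 2,
     {0: 0xF0F_F001, 1: 0xF0F_FFFF, 2: 0xF0F_F00F, 3: 0xF0F_E000}),
    (None, 0x8801, {0: 0xFFFF}),
]
walked: List[int] = []


def walk(va: int) -> int:
    walked.append(va)   # record that a page table walk was triggered
    return 0x1234       # placeholder output address for illustration


formats = {"second": 2}
```

The format check corresponding to step 1316 prevents a non-coalesced entry whose stored tag happens to equal a coalesced tag from being interpreted in the wrong format.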
Figure 14 schematically illustrates a sequence of steps carried out by address processing circuitry when allocating an entry in the translation lookaside buffer. Flow begins at step 1400 where an output address is determined using, for example, a page table walk. The page table walk may be triggered, for example, in response to an input address missing in the translation lookaside buffer. Flow then proceeds to step 1402 where the active coalesced formats are retrieved by the translation lookaside buffer. Flow then proceeds to step 1404 where a candidate coalesced format is selected from the active coalesced formats. Flow then proceeds to step 1406 where candidate translation data is prepared using the candidate coalesced format. Flow then proceeds to step 1408 where it is determined whether there are any more coalesced formats. If, at step 1408, it is determined that there are more coalesced formats then flow returns to step 1404. If, at step 1408, it is determined that there are no more coalesced formats, i.e., a set of candidate translation data has been prepared for each of the coalesced formats, then flow proceeds to step 1410 where it is determined whether the address translation data can be represented using one of the candidate coalesced formats. In other words, it is determined if any of the sets of candidate translation data includes a plurality of valid memory address translations. If, at step 1410, it is determined that none of the candidate sets of translation data includes a plurality of valid memory address translations, then flow proceeds to step 1412 where a new memory address translation, including translation data identifying a translation between the input memory address that triggered the page table walk and the returned output memory address, is allocated in the translation lookaside buffer. 
If, at step 1410, it is determined that one or more of the candidate sets of translation data includes a plurality of valid memory address translations, then flow proceeds to step 1414 where a new entry is allocated in the translation lookaside buffer to store a chosen one of the candidate sets of translation data that includes a plurality of valid memory address translations. Where only a single set of candidate translation data includes a plurality of valid memory address translations, then that set of candidate translation data is selected for entry in the translation lookaside buffer. If there are a plurality of sets of candidate translation data that include plural valid memory address translations, then the set of candidate translation data having the greatest number of valid translations may, for example, be selected. Where there are multiple sets of candidate translation data having the greatest number of valid translations, then the set of candidate translation data for allocation as an entry in the translation lookaside buffer may, for example, be selected based on predetermined selection criteria. For example, the predetermined selection criteria may indicate that the set of candidate translation data using a format that spans the greatest range of the output memory address space is selected.
It would be readily apparent to the skilled person that step 1404 and step 1406 can be performed for two or more of the retrieved coalesced formats in parallel. Furthermore, it would be apparent to the skilled person that the predetermined selection criteria can be any criteria with which a unique set of candidate translation data can be identified. Such predetermined selection criteria may include selecting based on one or more of the following criteria: the candidate translation data having memory address translations that span a smallest/greatest region of the output memory address space, the candidate translation data formatted using a coalesced format capable of storing the most/fewest memory address translations, and/or the candidate translation data having the fewest invalid entries.
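One possible selection policy for steps 1410 to 1414 is sketched below in Python. It implements only the example criteria mentioned above (greatest number of valid translations, with ties resolved in favour of the format spanning the greatest output range) and other criteria could equally be used. The names Candidate and choose_candidate are hypothetical. The candidate counts are those stated for figure 11, and the offset widths are those of the first, second and third coalesced formats.

```python
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    format_name: str
    valid_translations: int   # translations that can be coalesced in this format
    offset_bits: int          # PA offset width; wider spans more output space


def choose_candidate(candidates: List[Candidate]) -> Optional[Candidate]:
    # Step 1410: only candidates with a plurality of valid translations qualify.
    coalescible = [c for c in candidates if c.valid_translations >= 2]
    if not coalescible:
        return None   # step 1412: allocate a single non-coalesced translation
    # Step 1414: prefer the most valid translations, then the widest output range.
    return max(coalescible, key=lambda c: (c.valid_translations, c.offset_bits))


figure11 = [
    Candidate("first", 2, 5),
    Candidate("second", 3, 13),
    Candidate("third", 2, 41),
]
chosen = choose_candidate(figure11)
```

With the figure 11 counts, the second coalesced format is chosen because it holds three translations, consistent with the selection described for that figure.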
Figure 15 schematically illustrates a non-transitory computer-readable medium comprising computer readable code for fabrication of an apparatus according to various configurations of the present techniques. Fabrication is carried out based on computer readable code 1002 that is stored on a non-transitory computer-readable medium 1000. The computer-readable code can be used at one or more stages of a semiconductor design and fabrication process, including an electronic design automation (EDA) stage, to fabricate an integrated circuit comprising the apparatus embodying the concepts. The fabrication process involves the application of the computer readable code 1002 either directly into one or more programmable hardware units such as a field programmable gate array (FPGA) to configure the FPGA to embody the configurations described hereinabove or to facilitate the fabrication of an apparatus implemented as one or more integrated circuits or otherwise that embody the configurations described hereinabove. The fabricated design 1004 may in one example implementation comprise the control circuitry 103, and the translation lookaside buffer 105 described in reference to figure 2. However, the fabricated design may comprise any of the implementations described in reference to any of figures 1, 2, and/or 12 arranged to carry out any of the processes described in figures 3-11 and 13-14.
In brief overall summary there is provided an apparatus comprising a translation lookaside buffer (TLB) comprising plural entries capable of storing translation data. The TLB is configured to select, when allocating the translation data for storage within a given entry, a format used to store the translation data within the given entry, and the format is selected from a format group comprising plural coalesced formats. The apparatus is provided with control circuitry to maintain coalesced format information identifying active coalesced formats. Each coalesced format defines an input address range size and an output address range size, and each entry formatted using a coalesced format is capable of identifying plural address translations between input address blocks, located within an input address range having the input address range size defined in that coalesced format, and output address blocks, located within an output address range having the output address range size defined in that coalesced format.
In the present application, the words “configured to...” are used to mean that an element of an apparatus has a configuration able to carry out the defined operation. In this context, a “configuration” means an arrangement or manner of interconnection of hardware or software. For example, the apparatus may have dedicated hardware which provides the defined operation, or a processor or other processing device may be programmed to perform the function. “Configured to” does not imply that the apparatus element needs to be changed in any way in order to provide the defined operation.
In the present application, lists of features preceded with the phrase “at least one of” mean that any one or more of those features can be provided either individually or in combination. For example, “at least one of: [A], [B] and [C]” encompasses any of the following options: A alone (without B or C), B alone (without A or C), C alone (without A or B), A and B in combination (without C), A and C in combination (without B), B and C in combination (without A), or A, B and C in combination.
Although illustrative configurations have been described in detail herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise configurations, and that various changes, additions and modifications can be effected therein by one skilled in the art without departing from the scope and spirit of the invention as defined by the appended claims. For example, various combinations of the features of the dependent claims could be made with the features of the independent claims without departing from the scope of the present invention.

Claims

1. An apparatus comprising: a translation lookaside buffer comprising a plurality of entries each capable of storing translation data comprising one or more memory address translations, the one or more memory address translations each defining an address translation between a block of input memory addresses in an input memory address space and a block of output memory addresses in an output memory address space, wherein the translation lookaside buffer is configured to select, when allocating the translation data for storage within a given entry, a format used to store the translation data within the given entry, and the format is selected from a format group comprising a plurality of coalesced formats; and control circuitry configured to maintain coalesced format information identifying one or more active coalesced formats of the plurality of coalesced formats currently available for use by the translation lookaside buffer; wherein: each of the plurality of coalesced formats defines an input address range size and an output address range size; and each entry formatted using a given coalesced format of the plurality of coalesced formats is capable of identifying a plurality of memory address translations between blocks of input memory addresses, located within a same input address range having the input address range size defined in the given coalesced format, and blocks of output memory addresses, located within a same output address range having the output address range size defined in the given coalesced format.
2. The apparatus of claim 1, wherein for each of the plurality of coalesced formats, a number of address translations that can be identified per entry is dependent on the output address range size defined in that coalesced format.
3. The apparatus of claim 2, wherein: the plurality of coalesced formats comprises at least a first coalesced format and a second coalesced format; the output address range size defined in the second coalesced format is greater than the output address range size defined in the first coalesced format; and the number of address translations that can be identified per entry formatted using the second coalesced format is defined such that a second number of bits required for the entry formatted using the second coalesced format is fewer than or equal to a first number of bits required by an entry formatted using the first coalesced format.
4. The apparatus of claim 2 or claim 3, wherein for each of the plurality of coalesced formats, the number of address translations is equal to a power of two.
5. The apparatus of any preceding claim, wherein: the format group comprises a non-coalesced format comprising a single memory address translation between a single block of input memory addresses and a single block of output memory addresses; and the translation lookaside buffer is configured to store, for each of the plurality of entries, coalesced entry indicating information identifying whether the one or more memory address translations stored in that entry are represented using the non-coalesced format or one of the plurality of coalesced formats.
6. The apparatus of any preceding claim, wherein: each of the plurality of coalesced formats defines the input address range size as covering 2^n blocks of input memory addresses, and defines the output address range size as covering 2^m blocks of output memory addresses; and each entry formatted using the given coalesced format stores a base input memory address, a base output memory address, and a plurality of fields each capable of storing a mapping between an n-bit offset in the input memory address space and an m-bit offset in the output memory address space.
7. The apparatus of any preceding claim, wherein the control circuitry is responsive to removal of a previously active coalesced format from the active coalesced formats, to initiate a scrubbing procedure to identify entries formatted using the previously active coalesced format and to remove the identified entries.
8. The apparatus of claim 7, wherein the scrubbing procedure comprises invalidating at least one of the identified entries.
9. The apparatus of claim 7 or claim 8, wherein the scrubbing procedure comprises allocating at least one reformatted entry identifying at least one memory address translation comprised in one of the identified entries rewritten using a currently active coalesced format identified in the coalesced format information.
10. The apparatus of any preceding claim, wherein the translation data is allocated to a corresponding entry of the plurality of entries based on an index derived from one or more indexing bits of a common address portion of the one or more blocks of input memory addresses associated with the one or more memory address translations comprised in the translation data, and the one or more indexing bits are dependent on the format used to store the translation data.
11. The apparatus of claim 10, wherein: the translation lookaside buffer is formulated as a set associative cache comprising a plurality of sets each identified by a corresponding index and each comprising a plurality of set entries of the plurality of entries; the control circuitry is configured to determine one or more skew bits identifying, for given translation data, which of the plurality of set entries can be used to store the given translation data formatted using a given format of the format group.
12. The apparatus of claim 11, wherein the one or more skew bits are determined based on the coalesced format information.
13. The apparatus of claim 11 or claim 12, when dependent on claim 5, wherein the one or more skew bits identify a first group of one or more set entries that can be used to store the given translation data formatted using the non-coalesced format and a second group of one or more set entries that can be used to store the given translation data formatted using one of the one or more active coalesced formats.
14. The apparatus of claim 13, wherein the control circuitry is responsive to a change in bits identified as the one or more skew bits, to perform a reallocation procedure to reallocate entries storing the translation data formatted using the non-coalesced format.
15. The apparatus of any of claims 11 to 14, wherein the bits determined to be the one or more skew bits comprise at least one bit belonging to the common address portion and different to the one or more indexing bits for each of the one or more active coalesced formats.
16. The apparatus of any preceding claim, wherein the coalesced format information identifies a single active coalesced format.
17. The apparatus of any of claims 1 to 15, wherein the coalesced format information identifies a plurality of active coalesced formats, and each of the plurality of entries stores information identifying which of the plurality of active coalesced formats is used to represent translation data stored in that entry.
18. The apparatus of any preceding claim, comprising address processing circuitry responsive to an input lookup address in the input memory address space and for each given active coalesced format of the active coalesced formats: to generate an index based on at least a portion of the input lookup address, wherein the portion is dependent on the given active coalesced format; to determine whether an identified entry corresponding to the input lookup address and formatted using the given active coalesced format is present in the translation lookaside buffer at a location identified by the index; and in response to a determination that the identified entry is present in the translation lookaside buffer formatted in the given coalesced format, to determine a translated output address based on identified translation data stored in the identified entry.
19. The apparatus of claim 18, when dependent on claim 5, wherein the address processing circuitry is responsive to the input lookup address: to generate a non-coalesced index based on at least a further portion of the input lookup address; to determine whether a further identified entry corresponding to the input lookup address and formatted using the non-coalesced format is present in the translation lookaside buffer at a location identified by the non-coalesced index; and in response to a determination that the further identified entry is present in the translation lookaside buffer formatted in the non-coalesced format, to determine the translated output address based on further translation data stored in the further identified entry.
20. The apparatus of claim 19, wherein the address processing circuitry is responsive to the translation lookaside buffer not storing the required memory address translation, to: trigger a page table walk using the input lookup address to determine the translated output memory address from a plurality of page tables; and allocate new translation data to one of the plurality of entries in the translation lookaside buffer, the new translation data representing translation of at least the input lookup address to the translated output address and being stored in the one of the plurality of entries in a format chosen from the format group.
21. The apparatus of claim 20, wherein: the page table walk returns a plurality of output memory addresses including the translated output memory address; the address processing circuitry is responsive to a determination that the plurality of output memory addresses can be coalesced into a single entry using one of the active coalesced formats, to represent the new translation data using one of the active coalesced formats; and the address processing circuitry is responsive to a determination that the plurality of output memory addresses cannot be coalesced into the single entry using one of the active coalesced formats, to represent the new translation data using the non-coalesced format.
22. The apparatus of claim 21, wherein: the coalesced format information identifies a plurality of active coalesced formats; and the address processing circuitry is responsive to a determination that the plurality of output memory addresses can be coalesced into a single entry using one of the active coalesced formats, to generate the new translation data by generating a plurality of candidate sets of translation data each corresponding to one of the plurality of active coalesced formats and to select the new translation data out of the plurality of candidate sets of translation data based on a number of active translations associated with each of the plurality of candidate sets of translation data.
23. The apparatus of any preceding claim, wherein: the block of input memory addresses is a block of virtual memory addresses and the input memory address space is a virtual memory address space; and the block of output memory addresses is a block of physical memory addresses and the output memory address space is a physical memory address space.
24. A method comprising: storing, in a plurality of entries of a translation lookaside buffer, translation data comprising one or more memory address translations, the one or more memory address translations each defining an address translation between a block of input memory addresses in an input memory address space and a block of output memory addresses in an output memory address space, wherein the translation lookaside buffer is configured to select, when allocating the translation data for storage within a given entry, a format used to store the translation data within the given entry, and the format is selected from a format group comprising a plurality of coalesced formats; and maintaining coalesced format information identifying one or more active coalesced formats of the plurality of coalesced formats currently available for use by the translation lookaside buffer, wherein: each of the plurality of coalesced formats defines an input address range size and an output address range size; and each entry formatted using a given coalesced format of the plurality of coalesced formats is capable of identifying a plurality of memory address translations between blocks of input memory addresses, located within a same input address range having the input address range size defined in the given coalesced format, and blocks of output memory addresses, located within a same output address range having the output address range size defined in the given coalesced format.
25. A non-transitory computer readable storage medium to store computer-readable code for fabrication of an apparatus comprising: a translation lookaside buffer comprising a plurality of entries each capable of storing translation data comprising one or more memory address translations, the one or more memory address translations each defining an address translation between a block of input memory addresses in an input memory address space and a block of output memory addresses in an output memory address space, wherein the translation lookaside buffer is configured to select, when allocating the translation data for storage within a given entry, a format used to store the translation data within the given entry, and the format is selected from a format group comprising a plurality of coalesced formats; and control circuitry configured to maintain coalesced format information identifying one or more active coalesced formats of the plurality of coalesced formats currently available for use by the translation lookaside buffer; wherein: each of the plurality of coalesced formats defines an input address range size and an output address range size; and each entry formatted using a given coalesced format of the plurality of coalesced formats is capable of identifying a plurality of memory address translations between blocks of input memory addresses, located within a same input address range having the input address range size defined in the given coalesced format, and blocks of output memory addresses, located within a same output address range having the output address range size defined in the given coalesced format.
PCT/GB2024/050276 2023-03-21 2024-02-01 Storing coalesced memory address translations Ceased WO2024194593A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202480017762.4A CN120883198A (en) 2023-03-21 2024-02-01 Memory address translation for storage merging

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB2304115.5 2023-03-21
GB2304115.5A GB2628371B (en) 2023-03-21 2023-03-21 Storing coalesced memory address translations

Publications (1)

Publication Number Publication Date
WO2024194593A1 (en) 2024-09-26

Family

ID=89900771

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2024/050276 Ceased WO2024194593A1 (en) 2023-03-21 2024-02-01 Storing coalesced memory address translations

Country Status (3)

Country Link
CN (1) CN120883198A (en)
GB (1) GB2628371B (en)
WO (1) WO2024194593A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170315927A1 (en) * 2016-04-27 2017-11-02 Ati Technologies Ulc Method and apparatus for translation lookaside buffer with multiple compressed encodings
US20180101480A1 (en) * 2016-10-11 2018-04-12 Arm Limited Apparatus and method for maintaining address translation data within an address translation cache
US20190188149A1 (en) * 2017-12-20 2019-06-20 Arm Limited Technique for determining address translation data to be stored within an address translation cache

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10372621B2 (en) * 2018-01-05 2019-08-06 Intel Corporation Mechanism to support variable size page translations

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHAO YU ET AL: "Enabling Large-Reach TLBs for High-Throughput Processors by Exploiting Memory Subregion Contiguity", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 16 October 2021 (2021-10-16), XP091077605 *
GUILHERME COX ET AL: "Efficient Address Translation for Architectures with Multiple Page Sizes", ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS, ACM, 2 PENN PLAZA, SUITE 701 NEW YORK NY 10121-0701 USA, 4 April 2017 (2017-04-04), pages 435 - 448, XP058326945, ISBN: 978-1-4503-4465-4, DOI: 10.1145/3037697.3037704 *

Also Published As

Publication number Publication date
GB2628371B (en) 2025-05-07
GB2628371A (en) 2024-09-25
CN120883198A (en) 2025-10-31

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 24704540; Country of ref document: EP; Kind code of ref document: A1
WWE WIPO information: entry into national phase. Ref document number: 202480017762.4; Country of ref document: CN
NENP Non-entry into the national phase. Ref country code: DE
WWP WIPO information: published in national office. Ref document number: 202480017762.4; Country of ref document: CN