HK1180795B

HK1180795B - Method for facilitating management of system memory of a computing environment

Info

Publication number: HK1180795B
Application number: HK13108052.3A
Authority: HK
Inventors: D. Craddock; T. Gregg; C. Lais
Original assignee: International Business Machines Corporation
Priority date: 2010-06-23
Filing date: 2010-11-08
Publication date: 2017-03-17

Description

Method of facilitating management of system memory of a computing environment

Technical Field

The present invention relates generally to managing system memory of a computing environment, and more particularly to facilitating the provision of address spaces in system memory and, if needed, of address translation tables used in accessing system memory.

Background

The system memory may be accessed by read and write requests. These requests may come from various components of the computing environment, including the central processing unit and the adapters. Each request includes an address for accessing system memory. However, the address typically does not have a one-to-one correspondence with a physical location in system memory. Thus, address translation is performed.

Address translation is used to translate addresses provided in a form that cannot be used directly when accessing system memory to another form that can be used directly when accessing physical locations in system memory. For example, a virtual address included in a request provided by a central processing unit is translated to a real or absolute address in system memory. As yet another example, a Peripheral Component Interconnect (PCI) address provided in a request from an adapter may be translated to an absolute address in system memory.

To perform address translation, one or more address translation tables are used. The tables are configured hierarchically, and bits of the address provided in the request are used to locate an entry in the highest-level table. That entry then points either to another translation table or to the page itself to be accessed.

U.S. Patent Application Publication No. 2008/0114906 A1, published May 15, 2008, Hummel et al., "Efficiently Controlling Special Memory Mapped System Accesses," describes in one embodiment an input/output memory management unit (IOMMU) comprising: a control register configured to store a base address of a set of translation tables; and control logic coupled to the control register. The control logic is configured to respond to an input/output (I/O) device-initiated request having an address within an address range of an address space corresponding to the peripheral interconnect. One or more operations other than a memory operation are associated with the address range, and the control logic is configured to translate the address to a second address outside the address range if the translation tables specify a translation from the address to the second address, thereby performing the memory operation in response to the request in place of the one or more operations associated with the address range.

U.S. Patent Application Publication No. 2007/0168636 A1, published July 19, 2007, Hummel et al., "Chained Hybrid IOMMU," describes in one embodiment an input/output (I/O) node comprising an I/O memory management unit (IOMMU) configured to translate memory requests. The I/O node is configured to couple to, and operate as a tunnel on, an interconnect, and the IOMMU is configured to translate memory requests passing through the tunnel in the upstream direction. In another embodiment, a system comprises another I/O node configured to bridge another interconnect to the interconnect, wherein the I/O node is a tunnel for the other I/O node.

U.S. Patent Application Publication No. 2006/0288130 A1, published December 21, 2006, Madukkarumukumana et al., "Address Window Support for Direct Memory Access Transfer," discloses a device. The device includes remapping circuitry to facilitate access by one or more I/O devices to a memory device for a direct memory access (DMA) transaction. The remapping circuitry includes a translation mechanism to perform memory address translation for the I/O DMA transaction via address-window-based translation.

Disclosure of Invention

The shortcomings of the prior art are overcome and advantages are provided through the provision of a method as claimed in claim 1, and corresponding system and computer program product for facilitating management of system memory of a computing environment.

Drawings

One or more aspects of the present invention are particularly pointed out and distinctly claimed as examples in the claims at the conclusion of the specification. The above and other objects, features and advantages of the present invention will become apparent from the following detailed description when taken in conjunction with the accompanying drawings, in which:

FIG. 1 depicts one embodiment of a computing environment to incorporate and use one or more aspects of the present invention;

FIG. 2A depicts one embodiment of further details of the system memory and input/output (I/O) hub of FIG. 1, in accordance with an aspect of the present invention;

FIG. 2B illustrates an example of assigning multiple address spaces to an adapter function in accordance with an aspect of the present invention;

FIG. 3A depicts one embodiment of an overview of the logic to register a direct memory access (DMA) address space for an adapter, in accordance with an aspect of the present invention;

FIG. 3B depicts one embodiment of various details of registering a DMA address space for an adapter, in accordance with an aspect of the present invention;

FIG. 4 depicts one embodiment of the logic to process a DMA operation, in accordance with an aspect of the present invention;

FIG. 5A illustrates one example of translation levels used when an entire address is used to index into an address translation table to translate addresses and access a page;

FIG. 5B illustrates one example of translation levels used when ignoring a portion of an address when indexing into an address translation table in accordance with an aspect of the present invention;

FIG. 5C illustrates examples of various CPU DAT-compatible formats that may be used in accordance with one or more aspects of the present invention;

FIG. 5D illustrates examples of various I/O extended address translation formats that may be used in accordance with one or more aspects of the present invention;

FIG. 6A depicts one embodiment of a Modify PCI Function Controls instruction, used in accordance with an aspect of the present invention;

FIG. 6B depicts one embodiment of fields used by the Modify PCI Function Controls instruction of FIG. 6A, in accordance with an aspect of the present invention;

FIG. 6C depicts one embodiment of another field used by the Modify PCI Function Controls instruction of FIG. 6A, in accordance with an aspect of the present invention;

FIG. 6D depicts one embodiment of the contents of a Function Information Block (FIB), used in accordance with an aspect of the present invention;

FIG. 7 depicts one embodiment of an overview of the logic to modify PCI function controls, in accordance with an aspect of the present invention;

FIG. 8 depicts one embodiment of the logic associated with the register I/O address translation parameter operation specified by the Modify PCI Function Controls instruction, in accordance with an aspect of the present invention;

FIG. 9 depicts one embodiment of the logic associated with a deregister I/O address translation parameter operation specified by the Modify PCI Function Controls instruction, in accordance with an aspect of the present invention;

FIG. 10A depicts one embodiment of a Call Logical Processor instruction, used in accordance with an aspect of the present invention;

FIG. 10B depicts one embodiment of a request block used by the Call Logical Processor instruction of FIG. 10A, in accordance with an aspect of the present invention;

FIG. 10C depicts one embodiment of a response block provided by the Call Logical Processor instruction of FIG. 10A, in accordance with an aspect of the present invention;

FIG. 11 depicts one embodiment of the logic to enable a PCI function, in accordance with an aspect of the present invention;

FIG. 12A depicts one embodiment of a request block used by the Call Logical Processor instruction of FIG. 10A for a query group operation, in accordance with an aspect of the present invention;

FIG. 12B depicts one embodiment of a response block for the query group operation of FIG. 12A, in accordance with an aspect of the present invention;

FIG. 13 depicts one embodiment of a computer program product incorporating one or more aspects of the present invention;

FIG. 14 depicts one embodiment of a host computer system to incorporate and use one or more aspects of the present invention;

FIG. 15 illustrates a further example of a computer system incorporating and using one or more aspects of the present invention;

FIG. 16 illustrates another example of a computer system comprising a computer network that incorporates and uses one or more aspects of the present invention;

FIG. 17 depicts one embodiment of various elements of a computer system incorporating and using one or more aspects of the present invention;

FIG. 18A depicts one embodiment of an execution unit of the computer system of FIG. 17 to incorporate and use one or more aspects of the present invention;

FIG. 18B depicts one embodiment of a branching unit of the computer system of FIG. 17 incorporating and using one or more aspects of the present invention;

FIG. 18C depicts one embodiment of a load/store unit of the computer system of FIG. 17 incorporating and using one or more aspects of the present invention; and

FIG. 19 illustrates one embodiment of an emulated host computer system incorporating and using one or more aspects of the present invention.

Detailed Description

According to an aspect of the invention, an adapter is associated with a plurality of address spaces. This enables multiple address translation formats to be used by the adapter to access system memory and, if needed or desired, multiple sets of address translation tables to be used in translating addresses into addresses usable to access system memory. In particular, in one example, an adapter includes one or more adapter functions, and a plurality of address spaces are allocated to at least one of the adapter functions.

Moreover, as used herein, the term adapter includes any type of adapter (e.g., storage adapter, network adapter, processing adapter, PCI adapter, cryptographic adapter, other types of input/output adapters, etc.). In one embodiment, an adapter includes one adapter function. However, in other embodiments, an adapter may include multiple adapter functions. One or more aspects of the present invention may be applied regardless of whether an adapter includes one adapter function or multiple adapter functions. Further, in the examples presented herein, the term adapter is used interchangeably with adapter function (e.g., PCI function) unless otherwise noted.

One embodiment of a computing environment to incorporate and use one or more aspects of the present invention is described with reference to FIG. 1. In one example, computing environment 100 is a server offered by International Business Machines Corporation. The server is based on the z/Architecture offered by International Business Machines Corporation. Details regarding the z/Architecture are described in an IBM publication entitled "z/Architecture Principles of Operation," IBM Publication No. SA22-7832-07, February 2009. IBM and z/Architecture are registered trademarks of International Business Machines Corporation, Armonk, New York. Other names used herein may be registered trademarks, trademarks or product names of International Business Machines Corporation or other companies.

In one example, computing environment 100 includes one or more central processing units (CPUs) 102 coupled to a system memory 104 (also referred to as main memory) via a memory controller 106. To access system memory 104, a central processing unit 102 issues a read or write request that includes an address used to access system memory. The address included in the request is typically not directly usable to access system memory, and therefore, it is translated to an address that is directly usable to access system memory. The address is translated via a translation mechanism (XLATE) 108. For example, addresses are translated from virtual addresses to real or absolute addresses using Dynamic Address Translation (DAT).

The request, including the translated address, is received by memory controller 106. In one example, memory controller 106 is comprised of hardware and is used to arbitrate for access to system memory and to maintain memory coherency. This arbitration is performed for requests received from CPUs 102, as well as for requests received from one or more adapters 110. Like a central processing unit, an adapter issues requests to system memory 104 to gain access to the system memory.

In one example, adapter 110 is a Peripheral Component Interconnect (PCI) or PCI express (PCIe) adapter that includes one or more PCI functions. The PCI function issues a request to access system memory. The request is routed to an input/output hub 112 (e.g., a PCI hub) via one or more switches (e.g., PCIe switches) 114. In one example, an input/output hub includes hardware including one or more state machines.

The input/output hub includes, for example, a root complex 116 that receives requests from the switches. A request typically includes an input/output address that needs to be translated, and thus the root complex provides the address to an address translation and protection unit 118. This unit is, for example, a hardware unit that translates, if needed, the I/O address to an address directly usable to access system memory 104, as described in detail below.

Requests originating from the adapter, including the address (translated, or untranslated if translation is not required), are provided to memory controller 106 via, for example, an I/O-to-memory bus 120. The memory controller performs its arbitration and, at the appropriate time, forwards the request with the translated address (or the original address, if not translated) to system memory.

Further details regarding the system memory and input/output hub are described with reference to FIG. 2A. In this embodiment, the memory controller is not shown. However, the I/O hub may be coupled to system memory directly or via a memory controller. In one example, system memory 104 includes one or more address spaces 200. An address space is a particular portion of system memory that has been allocated to a particular component of the computing environment, such as a particular adapter or adapter function. In one example, the address space is accessed by direct memory access (DMA) initiated by the adapter (or adapter function), and thus, in this example, the address space is referred to as a DMA address space. However, in other examples, direct memory access is not used to access the address space.

Also, in one example, system memory 104 includes an address translation table 202 that is used to translate addresses that are not directly usable to access system memory to directly usable addresses. In one embodiment, one or more address translation tables are assigned to a DMA address space, and these one or more address translation tables are configured based on, for example, the size of the address space to which they are assigned, the size of the address translation tables themselves, and/or the size of the page (or other unit of memory) to be accessed.

In one example, there is a hierarchy of address translation tables. For example, as shown in FIG. 2A, there is a first-level table 202a (e.g., a segment table) pointed to by an IOAT pointer 218 (described below), and a second, lower-level table 202b (e.g., a page table) pointed to by an entry 206a of the first-level table. One or more bits of the received address 204 are used to index into table 202a to locate a particular entry 206a, which indicates a particular lower-level table 202b. One or more other bits of address 204 are then used to locate a particular entry 206b in that table. In this example, entry 206b provides the address used to locate the correct page, and still other bits of address 204 are used to locate a particular location 208 in the page at which to perform the data transfer. That is, the address in entry 206b and selected bits of the received PCI address 204 are used to provide an address that is directly usable to access system memory. For example, the directly usable address is formed by concatenating the high-order bits of the address in entry 206b (e.g., bits 63:12 in the 4k page example) with selected low-order bits of the received PCI address (e.g., bits 11:0 for a 4k page).
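The two-level walk and the final concatenation can be sketched as follows. This is a minimal model, not the hardware implementation: it assumes 4k pages and 4k tables of 512 eight-byte entries, borrows the index bit positions (29:21 and 20:12) from the worked example later in this description, and represents system memory as a hypothetical dictionary that maps each table's origin address to its list of entries. Valid bits and protection checks are omitted.

```python
from __future__ import annotations

PAGE_SHIFT = 12                            # 4k pages: byte offset is bits 11:0
INDEX_BITS = 9                             # 4k table / 8-byte entries = 512 entries
OFFSET_MASK = (1 << PAGE_SHIFT) - 1
INDEX_MASK = (1 << INDEX_BITS) - 1
HIGH_MASK = ((1 << 64) - 1) ^ OFFSET_MASK  # bits 63:12 of a 64-bit address


def translate(pci_addr: int, ioat_pointer: int, memory: dict[int, list[int]]) -> int:
    """Walk a first-level table and a page table; return a directly usable address.

    `memory` maps a table origin to that table's entries (a hypothetical model).
    """
    first_index = (pci_addr >> (PAGE_SHIFT + INDEX_BITS)) & INDEX_MASK  # bits 29:21
    page_index = (pci_addr >> PAGE_SHIFT) & INDEX_MASK                  # bits 20:12
    page_table_origin = memory[ioat_pointer][first_index]  # entry 206a -> table 202b
    page_address = memory[page_table_origin][page_index]   # entry 206b -> page
    # Concatenate bits 63:12 of entry 206b with bits 11:0 of the PCI address.
    return (page_address & HIGH_MASK) | (pci_addr & OFFSET_MASK)
```

For instance, if entry 1 of a first-level table at 0x1000 points to a page table at 0x2000, and entry 3 of that page table holds the page address 0x75000, the PCI address (1 << 21) | (3 << 12) | 0xABC translates to 0x75ABC.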

According to an aspect of the invention, multiple address spaces may be assigned to a particular component, such as a particular adapter (or adapter function). For example, as shown in FIG. 2B, two or more address spaces 200a … 200n of system memory 104 are allocated to adapter function 220a. In this example, two address spaces are shown, but in other examples, more than two address spaces are allocated. Assigning multiple address spaces to a particular adapter function allows the operating system to separate its DMA address spaces. For example, one address space may be used for control information and queues (e.g., SCSI control data blocks), and another address space may be used for data transfers (e.g., SCSI blocks). Other examples are also possible. Moreover, each address space may be smaller than a single large address space would be, thereby providing improved translation efficiency and finer-grained protection.

In one embodiment, each DMA address space allocated to an adapter function may be associated with a different translation format (e.g., bypass, no fetch, CPU DAT-compatible, I/O extended address translation (described below), etc.). Further, if the translation format uses translation tables, a set of one or more address translation tables 250a-250n is assigned to the address space. Each set of one or more address translation tables assigned to an address space has a particular format (e.g., a CPU DAT-compatible format or an I/O extended address translation format). The format of one set of translation tables may be the same as or different from that of another set.

In one example, the operating system assigns one or more DMA address spaces to a particular adapter. This assignment is performed via a registration process that causes initialization (e.g., via trusted software) of one or more device table entries 210 (FIG. 2A) for the adapter. The registration process also associates an address space identifier (e.g., one or more bits of a PCI address) with each address space, as described in more detail below.

Each device table entry is located in a device table 211 in the I/O hub 112. For example, the device table 211 is located within an address translation and protection unit of the I/O hub.

In one example, Device Table Entry (DTE) 210 includes a plurality of fields, such as the following:

format 212: this field includes a number of bits to indicate various information, including, for example, the address translation format of the highest-level table of the address translation tables. The address translation format indicates the level of that table (e.g., the first-level table in the above example) and the selected address translation format (also referred to as the translation format) used in providing addresses directly usable to access system memory (e.g., CPU DAT-compatible, I/O extended address translation, bypass, no fetch, etc.);

page size 213: this field indicates the size of the page (or other unit of memory) to be accessed;

PCI base address 214 and PCI bound 216: these values provide a range for defining the DMA address space and verifying that the received address (e.g., PCI address) is valid;

IOAT (input/output address translation) pointer 218: this field includes a pointer to the highest-level address translation table for the DMA address space;

enable 219: this field indicates whether the DTE is enabled; and

key 221: a storage key used for storage protection when performing DMA operations in system memory.

In other embodiments, the DTE may include more, less, or different information.
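The fields listed above can be pictured as a record like the following. It is an illustrative sketch only: the hardware layout and field widths are not given here, the numeric encodings of the formats are hypothetical, and the level of the highest table is modeled as a separate field rather than as bits within format 212.

```python
from dataclasses import dataclass
from enum import Enum


class TranslationFormat(Enum):
    # Formats named in the description; the numeric values are hypothetical.
    BYPASS = 0
    NO_FETCH = 1
    CPU_DAT_COMPATIBLE = 2
    IO_EXTENDED = 3


@dataclass
class DeviceTableEntry:
    fmt: TranslationFormat   # format 212: selected translation format
    top_level: int           # format 212: level of the highest table (hypothetical split)
    page_size: int           # page size 213, in bytes
    pci_base: int            # PCI base address 214: lowest valid PCI address
    pci_limit: int           # PCI limit 216: highest valid PCI address (inclusive)
    ioat_pointer: int        # IOAT pointer 218: highest-level table, or the page for no-fetch
    enabled: bool = False    # enable 219
    key: int = 0             # key 221: storage key used for storage protection
```

Keeping the format as an enumeration rather than raw bits makes the later dispatch on bypass, no-fetch and table-based formats explicit.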

According to an aspect of the present invention, there is one device table entry per address space, and thus, there may be a plurality of device table entries per adapter (or adapter function). In one embodiment, a requester identifier (RID) (and/or a portion of an address) and an address space identifier are used to locate the device table entry used in a particular translation. A requester ID (e.g., a 16-bit value specifying, for example, a bus number, a device number, and a function number) is included in a request issued by a PCI function 220 associated with an adapter. The address space identifier is one or more bits of the I/O address included in the request; the particular bit or bits used as the address space identifier are defined in advance. The request, including the RID and the I/O address (including the address space identifier), is provided, for example, to a content addressable memory (CAM) 230 via, for example, switch 114. The CAM is used to provide an index value that is used to index into device table 211 to locate a particular device table entry 210. For example, the CAM includes a plurality of entries, each entry corresponding to an index in the device table. Each CAM entry includes a RID value and an address space identifier. If the received RID and address space identifier match the values contained in an entry of the CAM, the corresponding device table index is used to locate the device table entry. If there is no match, the received packet is discarded without accessing system memory. (In other embodiments, no CAM or other lookup is required, and the RID and address space identifier are used directly as an index.)
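A minimal sketch of this lookup follows. The CAM is modeled as a dictionary keyed by (RID, address space identifier); placing the identifier in PCI address bits 63:62 is a hypothetical choice, since the description says only that the bits are defined in advance. The RID packing uses the conventional PCI layout of an 8-bit bus number, a 5-bit device number and a 3-bit function number.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical placement of the address space identifier: PCI address bits 63:62.
ASID_SHIFT = 62
ASID_MASK = 0b11


def make_rid(bus: int, device: int, function: int) -> int:
    """Pack a 16-bit requester ID: bus (8 bits), device (5 bits), function (3 bits)."""
    return (bus & 0xFF) << 8 | (device & 0x1F) << 3 | (function & 0x7)


class DeviceTableCam:
    """Maps (RID, address space identifier) to a device table index."""

    def __init__(self) -> None:
        self._entries: dict[tuple[int, int], int] = {}

    def program(self, rid: int, asid: int, dt_index: int) -> None:
        self._entries[(rid, asid)] = dt_index

    def lookup(self, rid: int, pci_addr: int) -> Optional[int]:
        """Return the device table index, or None (the packet is then discarded)."""
        asid = (pci_addr >> ASID_SHIFT) & ASID_MASK
        return self._entries.get((rid, asid))
```

With this model, two address spaces of the same PCI function differ only in the identifier bits of their PCI addresses, so each resolves to its own device table entry.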

The fields in the device table entry are then used to ensure the validity of the address and of the address translation table configuration (if any). For example, the inbound address in the request is checked by the I/O hub's hardware (e.g., the address translation and protection unit) to ensure that it is within the bounds defined by PCI base address 214 and PCI limit 216 stored in the device table entry located using the RID and address space identifier of the request that provided the address. This ensures that the address is within the previously registered range, for which the address translation tables (if any) have been validly configured.

One embodiment of the registration process is described with reference to FIGS. 3A-3B. In this example, the registration process is performed for each address space allocated to the adapter (or, more specifically, the adapter function). As an example, this logic is executed by one of the central processing units coupled to the system memory in response to an operating system request.

Initially, referring to FIG. 3A, the size and location of the address space to be accessed by the adapter are determined, STEP 300. In one example, the size of the address space is determined by the PCI base address and PCI limit set by the operating system. The operating system uses one or more criteria to define the base and limit. For example, if the operating system wishes to have PCI addresses map directly to CPU virtual addresses, the base and limit are set accordingly. In yet another example, if additional isolation between adapters and/or operating system images is desired, the addresses used are selected to provide non-overlapping, disjoint address spaces. The location is also specified by the operating system and is based on, for example, the characteristics of the adapter.
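When disjoint address spaces are wanted for isolation, an operating system could check candidate base/limit pairs with a helper like the one below. It is a hypothetical sketch, not part of the described interface; it treats each limit as inclusive, matching the range check later performed on incoming DMA requests.

```python
from __future__ import annotations


def ranges_disjoint(ranges: list[tuple[int, int]]) -> bool:
    """Return True if no two inclusive [base, limit] PCI address ranges overlap."""
    ordered = sorted(ranges)  # sort by base address
    # Once sorted by base, any overlap shows up between neighbouring ranges.
    return all(prev_limit < next_base
               for (_, prev_limit), (next_base, _) in zip(ordered, ordered[1:]))
```

Sorting first keeps the check to one pass over neighbouring pairs instead of comparing every pair of ranges.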

Further, as part of the registration process, it is determined which address translation format to register for the adapter function, STEP 301. That is, a determination is made as to which format is to be used to provide the adapter function with addresses directly usable to access system memory.

In one embodiment, multiple address translation formats are available, and based on such multiple formats, the operating system selects one format for the adapter function. This selection is based on, for example, the configuration of the address space, the adapter type, etc. The various possible formats include:

(a) bypass format, in which address translation is bypassed. This format is used when the adapter performing the registration is a trusted adapter. An adapter is considered a trusted adapter if, for example, the hardware design of the adapter is robust enough and protected so that addresses are not corrupted. For example, an internally developed adapter or an adapter managed by trusted firmware that provides its own translation and protection mechanisms may be considered a trusted adapter.

As used herein, firmware includes, for example, the microcode, millicode and/or macrocode of the processor. It includes, for example, the hardware-level instructions and/or data structures used in the implementation of higher-level machine code. In one embodiment, it includes, for example, proprietary code that is typically delivered as microcode that includes trusted software or microcode specific to the underlying hardware and controls operating system access to the system hardware.

By way of example, on the server described above, native attachment of I/O adapters employs I/O address translation (IOAT) to provide protection and isolation of DMA accesses by the adapters to system memory. However, there are multiple instances, including on that server, where this particular level of protection is not required. Thus, for those adapters, the bypass format may be selected;

(b) no-fetch format, in which the address included in the initial request from the adapter is usable without fetching any translation tables. This format may be selected when the memory is contiguous, the page size is known, and the addresses are confined to a bounded region (e.g., a 4k or 1M page) that does not require fetching any translation tables from system memory. The address used to access system memory (i.e., the address that results when the no-fetch format is selected) is derived from the address in the IOAT pointer. For example, for a 4k page size, the low-order bits of the PCI address (e.g., bits 11:0) are concatenated with the high-order 52 bits of the IOAT pointer to obtain an address usable to access system memory;

(c) CPU DAT-compatible format, in which the translation tables used to translate I/O addresses are compatible with the translation tables used for CPU DAT translation. That is, address translation tables similar to, and compatible with, those already used for CPU dynamic address translation are used. This facilitates use by the operating system, which already uses these types of tables; enables tables to be shared between the CPU and the I/O adapter; and provides certain efficiencies for an operating system that manages its pageable guests in the DMA space (e.g., a DMA space in which a program runs). Various CPU DAT-compatible formats are available, as described in more detail below with reference to FIG. 5C;

(d) I/O extended address translation format, in which extended address translation tables are used for I/O address translation. With this format, the address translation tables are dedicated to I/O operations and may be larger than those typically used for CPU address translation. For example, there may be page tables and/or other translation tables of 1M or even larger. Further, the sizes of the different levels of translation tables, including page tables, may differ from one another and from the size of the pages themselves. Sizes larger than conventional ones reduce bus transactions and help improve I/O translation caching. The sizes of the page table and other translation tables, as well as the size of the page, determine how many levels of translation are required. Examples of different I/O extended address translation formats are described in detail below with reference to FIG. 5D.
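The relationship between table size, page size and the number of translation levels can be made concrete with a little arithmetic. This sketch assumes power-of-two sizes, 8-byte table entries (as in the 4k page table example) and the same table size at every level, even though the text notes that sizes may differ from level to level.

```python
ENTRY_SIZE = 8  # bytes per translation-table entry (64-bit entries)


def log2_exact(size: int) -> int:
    """log2 of a power-of-two size, e.g. 4096 -> 12."""
    if size <= 0 or size & (size - 1):
        raise ValueError(f"{size} is not a power of two")
    return size.bit_length() - 1


def translation_levels(address_bits: int, page_size: int, table_size: int) -> int:
    """Number of table levels needed to translate an address_bits-wide PCI address."""
    offset_bits = log2_exact(page_size)                # bits resolved inside the page
    index_bits = log2_exact(table_size // ENTRY_SIZE)  # bits resolved by each table level
    remaining = max(address_bits - offset_bits, 0)
    return -(-remaining // index_bits)                 # ceiling division
```

Under these assumptions, a 30-bit PCI address space with 4k pages and 4k tables needs two levels, whereas 1M pages and 1M tables cover the same space with a single level, which is the reduction in table fetches that larger sizes aim for.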

Thereafter, one or more address translation tables are created to cover the DMA address space, STEP 302. In one example, the creation includes building a table and placing the appropriate address in the table entry. As an example, one translation table is a 4k page table with 512 64-bit entries, and each entry includes a 4k page address compatible with the allocated address space.
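Building the 4k page table described above might look like the following sketch. It packs 512 64-bit entries into a 4096-byte image; the big-endian byte order and the use of zero for unused entries are assumptions made for illustration, not details given in the description.

```python
from __future__ import annotations

import struct

PAGE_SIZE = 4096
ENTRY_COUNT = PAGE_SIZE // 8   # 512 64-bit entries per 4k table


def build_page_table(page_addresses: list[int]) -> bytes:
    """Return a 4096-byte page-table image whose entries hold 4k page addresses."""
    if len(page_addresses) > ENTRY_COUNT:
        raise ValueError("a 4k page table holds at most 512 entries")
    for addr in page_addresses:
        if addr % PAGE_SIZE:
            raise ValueError(f"page address {addr:#x} is not 4k-aligned")
    # Unused slots are zero-filled (assumed here to mean "no page").
    entries = list(page_addresses) + [0] * (ENTRY_COUNT - len(page_addresses))
    return struct.pack(f">{ENTRY_COUNT}Q", *entries)  # big-endian byte order (assumed)
```

Rejecting unaligned page addresses keeps bits 11:0 of every entry free, so they can be replaced by the offset bits of the PCI address during translation.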

Thereafter, the DMA address space is registered for the adapter (or adapter function), STEP 304, as described in detail with reference to FIG. 3B. In this example, it is assumed that each adapter has one PCI function and, therefore, one requester ID. The logic is executed, for example, by a central processing unit coupled to the system memory in response to an operating system request.

Initially, in one embodiment, an available device table entry corresponding to the adapter's requester ID and address space identifier is selected, STEP 310. That is, the requester ID and address space identifier will be used to locate the device table entry. In one embodiment, firmware of one of the central processing units determines which bits of the address represent the address space identifier and provides this information to the operating system requesting registration (executing on that CPU or another CPU), which uses the information to select the device table entry.

In addition, the PCI base address and the PCI limit are stored in the device table entry, STEP 312. The format of the highest-level address translation table (if any) is also stored in the device table entry, STEP 314. For example, the format field includes a plurality of bits, and one or more of those bits indicate the format of the highest-level table and the selected address translation format (e.g., segment level, CPU DAT-compatible). In other embodiments, one or more bits indicate the highest level and one or more other bits indicate the determined translation format (e.g., bypass, no fetch, a particular CPU DAT-compatible format, a particular I/O extended address translation format, etc.).

In addition, an input/output address translation (IOAT) pointer to the highest-level address translation table (or to the page, for the no-fetch format) is stored in the device table entry, STEP 316. This completes the registration process.
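STEPs 310 through 316 can be summarized in a short sketch. The device table is modeled as a hypothetical Python list whose free slots hold None, each entry as a dictionary, and the CAM as a dictionary keyed by (RID, address space identifier). The enable bit is left off, since the description notes that it may be set by a separate enable-function request.

```python
from __future__ import annotations


def register_dma_space(device_table: list[dict | None], cam: dict[tuple[int, int], int],
                       rid: int, asid: int, pci_base: int, pci_limit: int,
                       fmt: str, top_level: int, ioat_pointer: int) -> int:
    """Hypothetical sketch of STEPs 310-316; returns the device table index used."""
    if pci_base > pci_limit:
        raise ValueError("PCI base must not exceed PCI limit")
    try:
        index = device_table.index(None)            # STEP 310: pick an available entry
    except ValueError:
        raise RuntimeError("no free device table entry") from None
    device_table[index] = {
        "pci_base": pci_base, "pci_limit": pci_limit,   # STEP 312
        "format": fmt, "top_level": top_level,          # STEP 314
        "ioat_pointer": ioat_pointer,                   # STEP 316
        "enabled": False,  # set later, e.g., by an enable-function request
    }
    cam[(rid, asid)] = index   # lets later DMA requests locate this entry
    return index
```

Registering a second address space for the same RID simply uses another free entry and another CAM key, which is how one adapter function ends up with several device table entries.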

In response to performing the registration, the DMA address space and corresponding address translation tables (if any) and device table entries are ready to be used. Details regarding processing a request to access system memory issued by a requestor, such as an adapter, are described with reference to FIG. 4. The processing described below is performed by the I/O hub. In one example, it is the address translation and protection unit that performs this logic.

In one embodiment, initially, a DMA request is received at an input/output hub, STEP 400. For example, a PCI function issues a request that is forwarded to a PCI hub via, e.g., a PCI switch. Using the requester ID and the address space identifier in the request (which is one or more bits of the I/O address in the request), the appropriate device table entry is located, STEP 402. For example, the CAM knows which bits are designated as the address space identifier, and it uses those bits and the RID to provide an index into the device table to select the appropriate device table entry.

Thereafter, a determination is made as to whether the device table entry is valid, INQUIRY 404. In one example, the validity is determined by checking the validity bit of the entry itself. This bit is set, for example, in response to execution of an enable function request by the operating system. If enabled, the bit is set to, for example, 1 (i.e., valid); otherwise, it remains at zero (i.e., invalid). In yet another example, the bit may be set when the registration process is complete.

If the device table entry is invalid, an error is indicated, STEP 405. Otherwise, a further determination is made as to whether the PCI address provided in the request is less than the PCI base address stored in the device table entry, INQUIRY 406. If so, the address is outside the valid range and an error is indicated, STEP 407. However, if the PCI address is greater than or equal to the base address, a further determination is made as to whether the PCI address is greater than the PCI limit in the device table entry, INQUIRY 408. If so, an error is again indicated because the address is outside the valid range, STEP 409. Otherwise, the address is within the valid range and processing continues.
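The validity and range checks of INQUIRIES 404 through 408 can be sketched in Python as follows. This is a minimal illustration, not the I/O hub's actual logic: the `DeviceTableEntry` fields, the `DmaError` exception, and the function name are hypothetical stand-ins for the valid bit, PCI base address, and PCI limit described above.

```python
from dataclasses import dataclass


@dataclass
class DeviceTableEntry:
    """Hypothetical subset of a device table entry (DTE)."""
    valid: bool
    pci_base: int   # PCI base address: lowest address the function may use
    pci_limit: int  # PCI limit: highest address the function may use


class DmaError(Exception):
    """Stands in for the error indicated in STEPs 405, 407 and 409."""


def check_dma_request(dte: DeviceTableEntry, pci_addr: int) -> None:
    """Apply the validity and range checks of INQUIRIES 404, 406 and 408."""
    if not dte.valid:                # INQUIRY 404 -> STEP 405
        raise DmaError("device table entry is not valid")
    if pci_addr < dte.pci_base:      # INQUIRY 406 -> STEP 407
        raise DmaError("address is below the PCI base address")
    if pci_addr > dte.pci_limit:     # INQUIRY 408 -> STEP 409
        raise DmaError("address is above the PCI limit")
    # The address lies in [pci_base, pci_limit]; translation may proceed.
```

Note that both bounds are inclusive, matching the "less than the base" and "greater than the limit" tests in the text.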

In one example, a determination is made as to whether the address translation format specified in the device table entry indicates a bypass translation, INQUIRY 410. If so, the address is transferred directly to the memory controller over the I/O bus to access the memory without fetching any translation entries. The I/O hub continues processing to enable the fetch/store of data at that address, step 426.

Returning to INQUIRY 410, if the format does not indicate bypass, a further determination is made as to whether the format indicates direct access to memory based on the IOAT pointer, without any fetching of address translation tables, INQUIRY 412. If no fetch is indicated, the resulting address is derived from the IOAT pointer, and no address translation table needs to be fetched from system memory, STEP 414. The resulting address is sent to the memory controller and used to locate the page and the particular entry in the page. For example, if the page size is 4k, bits 11:0 of the PCI address are used as the offset from the IOAT pointer. The I/O hub continues processing to enable the fetch/store of data at the page entry, STEP 426.
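A minimal sketch of the two cases that need no table fetch, assuming a power-of-two page size and a page-aligned IOAT pointer; the `TranslationFormat` labels and `resolve_without_tables` are hypothetical names. In the bypass case the PCI address is passed through unchanged, and in the no-fetch case the low-order bits are added to the IOAT pointer as an offset.

```python
from enum import Enum


class TranslationFormat(Enum):
    """Hypothetical labels for the formats named in the text."""
    BYPASS = "bypass"
    NO_FETCH = "no-fetch"
    TABLES = "tables"  # CPU DAT-compatible or I/O extended: tables are fetched


def resolve_without_tables(fmt: TranslationFormat, pci_addr: int, ioat: int,
                           page_size: int = 4096) -> int:
    """Return the memory address for the bypass and no-fetch formats."""
    if fmt is TranslationFormat.BYPASS:
        # INQUIRY 410: the address goes to the memory controller unchanged.
        return pci_addr
    if fmt is TranslationFormat.NO_FETCH:
        # STEP 414: the IOAT pointer designates the page; the low-order bits
        # (bits 11:0 for a 4k page) are the offset from that pointer.
        return ioat + (pci_addr & (page_size - 1))
    raise ValueError("this format requires translation-table fetches")
```

For example, with a 4k page and an IOAT pointer of 0x9000, the PCI address 0x12345 resolves to 0x9345 in the no-fetch case.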

Returning to INQUIRY 412, if instead a translation table needs to be used, the format provided in the device table entry is used to determine the type of translation table (e.g., CPU DAT compatible or I/O extended address translation) and which bits of the PCI address are used for address translation, STEP 416. For example, if the format indicates an I/O extended address translation format with 4k pages and 4k address translation tables (as described below), and the highest level table is a first level table, bits 29:21 of the address are used to index into the first level table, bits 20:12 are used to index into the page table, and bits 11:0 are used to index into the 4k page. The bits used depend on how many bits are needed to index into a page or table of a given size. For example, a 4k page with byte-level addressing contains 4096 bytes, which are addressed using 12 bits, while a 4k page table of 8-byte entries contains 512 entries, which are addressed using 9 bits.
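The bit selection above can be illustrated with a short Python sketch. It assumes, as the text's own bit ranges imply, that bit 0 is the least significant bit of the address; the helper names are hypothetical.

```python
def bits(value: int, high: int, low: int) -> int:
    """Extract bits high:low of value, where bit 0 is the least significant."""
    return (value >> low) & ((1 << (high - low + 1)) - 1)


def index_bits(count: int) -> int:
    """Bits needed to select one of `count` items (count a power of two)."""
    return count.bit_length() - 1


PAGE_OFFSET_BITS = index_bits(4096)        # 4096 bytes in a 4k page -> 12
TABLE_INDEX_BITS = index_bits(4096 // 8)   # 512 8-byte entries -> 9


def split_4k_tables_4k_pages(pci_addr: int) -> tuple[int, int, int]:
    """Indices for a first level table, a page table and a 4k page."""
    return (bits(pci_addr, 29, 21),    # entry in the first level table
            bits(pci_addr, 20, 12),    # entry in the page table
            bits(pci_addr, 11, 0))     # byte offset in the 4k page
```

For instance, the address 0x403010 selects entry 2 of the first level table, entry 3 of the page table, and byte 0x10 of the page.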

Next, the PCI hub fetches the appropriate address translation table entry, STEP 418. For example, the highest level translation table is first located using the IOAT pointer of the device table entry. Then, the address bits selected for that table (e.g., bits 29:21 in the above example) are used to locate a particular entry in the table; the higher-order bits above them are used for range validation rather than translation.

A determination is then made as to whether the located address translation entry has the correct format, based on, for example, the format provided in the device table entry, INQUIRY 420. For example, the format in the device table entry is compared to the format indicated in the address translation entry. If they do not match, an error is indicated, STEP 422. If they match, processing continues to determine whether this is the last table to be processed, INQUIRY 424. That is, it is determined whether additional address translation tables are needed to obtain a real or absolute address, or whether the lowest level entry has been located. This determination is made based on the provided format and the size of the tables that have been processed. If it is not the last table, processing continues with STEP 418 to fetch the entry from the next lower level table. Otherwise, the I/O hub continues processing to enable the fetch or store of data at the translated address, STEP 426. In one example, the I/O hub forwards the translated address to the memory controller, which uses the address to fetch or store data at the DMA location specified by the translated address.
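A sketch of the table walk from STEP 418 onward, under stated assumptions: system memory is modeled as a dictionary from table origin to entries, each entry carries a hypothetical format tag that is compared with the format in the device table entry, and all names are illustrative rather than taken from the hardware design.

```python
from dataclasses import dataclass


@dataclass
class TableEntry:
    """Hypothetical translation-table entry: a format tag and a pointer."""
    fmt: str
    pointer: int  # origin of the next lower table, or of the page


class TranslationError(Exception):
    """Stands in for the error indicated in STEP 422."""


def walk(pci_addr: int, ioat: int, dte_format: str,
         levels: list[tuple[int, int]],
         memory: dict[int, dict[int, TableEntry]]) -> int:
    """Translate pci_addr by walking the tables from the IOAT pointer down.

    levels holds the (high, low) bit range that indexes each table, from the
    highest level table down to the page table. memory maps a table origin
    to its entries and stands in for fetches from system memory.
    """
    origin = ioat  # the IOAT pointer locates the highest level table
    for high, low in levels:
        index = (pci_addr >> low) & ((1 << (high - low + 1)) - 1)
        entry = memory.get(origin, {}).get(index)      # STEP 418: fetch entry
        if entry is None or entry.fmt != dte_format:   # INQUIRY 420
            raise TranslationError("entry missing or format mismatch")
        origin = entry.pointer  # next lower table, or the page itself
    # Lowest level reached: the bits below the last range are the byte offset.
    offset_bits = levels[-1][1]
    return origin + (pci_addr & ((1 << offset_bits) - 1))
```

With levels `[(29, 21), (20, 12)]`, this reproduces the two-table walk of the earlier 4k example; adding ranges simply adds loop iterations, i.e., more fetches.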

In one embodiment, the number of levels of translation, and thus the number of fetches required to perform the translation, is reduced. This is accomplished by, for example, ignoring the high-order bits of the address during translation and using only the low-order bits to traverse the translation tables, where the number of bits used depends on, for example, the size of the DMA address space allocated to the adapter. The use of partial addresses, as opposed to full addresses, is further illustrated in the following examples.

Referring first to FIG. 5A, an example is described in which the entire address is used in address translation/memory access. With this prior technique, six levels of translation tables are required, including the page table. The beginning of the highest level table (e.g., the fifth level table in this example) is pointed to by the IOAT pointer, and the bits of the PCI address are then used to locate the entry in the table. Each translation table entry points to the beginning of a lower level translation table or page (e.g., an entry in the fifth level table points to the beginning of the fourth level table, etc.).

In this example, the DMA address space (DMAAS) is 6M in size, and each table is 4k bytes, with a maximum of 512 entries of 8 bytes (except for the fifth level table, which supports only 128 entries based on the address size). The address is, for example, the 64-bit address FFFFC0000009C600. The start of the fifth level table is pointed to by the IOAT pointer, and bits 63:57 of the PCI address are used to index into the fifth level table to locate the start of the fourth level table; bits 56:48 of the PCI address are used to index into the fourth level table to locate the beginning of the third level table; bits 47:39 are used to index into the third level table to locate the beginning of the second level table; bits 38:30 are used to index into the second level table to locate the beginning of the first level table; bits 29:21 are used to index into the first level table to locate the beginning of the page table; bits 20:12 are used to index into the page table to locate the start of the page; and bits 11:0 are used to locate entries in the 4k page. Thus, in this example, all address bits are used for translation/access.
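The full-address decomposition of FIG. 5A can be checked with a few lines of Python, using the bit ranges listed above (bit 0 being the least significant) and the example address FFFFC0000009C600. The dictionary keys are descriptive labels only.

```python
# (high, low) bit ranges from the text, bit 0 being the least significant.
FULL_LEVELS = {
    "fifth level table": (63, 57),   # 7 bits -> 128 entries
    "fourth level table": (56, 48),
    "third level table": (47, 39),
    "second level table": (38, 30),
    "first level table": (29, 21),
    "page table": (20, 12),
    "byte offset in 4k page": (11, 0),
}


def decompose(addr: int, ranges: dict[str, tuple[int, int]]) -> dict[str, int]:
    """Return the index (or byte offset) that each bit range selects."""
    return {name: (addr >> low) & ((1 << (high - low + 1)) - 1)
            for name, (high, low) in ranges.items()}


ADDR = 0xFFFFC0000009C600
parts = decompose(ADDR, FULL_LEVELS)
```

The seven ranges together cover all 64 bits, so six table fetches precede the data access; for this address the fifth through second level indices are 127, 511, 384 and 0.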

This is in contrast to the example in FIG. 5B, where the address space is the same size (e.g., 6M) and the addresses are the same, but the translation technique ignores some of the address bits in the translation. In this example, bits 63:30 of the address are ignored in the translation. The IOAT pointer points to the beginning of the first level table and bits 29:21 of the PCI address are used to index into the first level table to locate the beginning of the page table; bits 20:12 are used to index into the appropriate page table to locate the start of the page; and bits 11:0 are used to index into the 4k page.

As shown, the first level table 500 includes three entries 502, each entry providing the address of one of three page tables 504. The number of page tables required, and thus the number of other levels of tables, depends on, for example, the size of the DMA address space, the size of the translation tables, and/or the size of the pages. In this example, the DMA address space is 6M, and each page table is 4k, with a maximum of 512 entries. Thus, each page table can map up to 2M of memory (4k × 512 entries), and three page tables are required for the 6M address space. A single first level table holds the three entries, one for each page table, and thus no further levels of address translation tables are required in this example.
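The sizing argument above, and the reduced translation of FIG. 5B, can be sketched as follows. `tables_needed` is a hypothetical helper that assumes 8-byte entries and keeps adding levels until a single table can hold all pointers to the level below; with a 6M address space it yields three page tables under one first level table, matching the example.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division (avoids floating point for large sizes)."""
    return -(-a // b)


def tables_needed(dma_space: int, page_size: int = 4096,
                  table_size: int = 4096, entry_size: int = 8) -> list[int]:
    """Return the number of tables at each level, page tables first.

    Levels are added until one table can hold every pointer for the level
    below; that table is the highest level table designated by the IOAT.
    """
    entries_per_table = table_size // entry_size   # 512 for 4k tables
    units = ceil_div(dma_space, page_size)         # pages to be mapped
    counts = []
    while True:
        tables = ceil_div(units, entries_per_table)
        counts.append(tables)
        if tables == 1:
            return counts
        units = tables  # the next level needs one entry per table below it


MB = 1 << 20
ADDR = 0xFFFFC0000009C600
low = ADDR & ((1 << 30) - 1)              # bits 63:30 are ignored
first_level_index = low >> 21             # bits 29:21
page_table_index = (low >> 12) & 0x1FF    # bits 20:12
byte_offset = low & 0xFFF                 # bits 11:0
```

Only two fetches (first level table, then page table) are needed here, against six for the full-address walk of FIG. 5A.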

Further, as described above, different formats of address translation tables may be used for address translation, and variations may exist within the formats. For example, there may be various CPU DAT-compatible formats, examples of which are described with reference to FIG. 5C. As shown, one CPU DAT-compatible format is a 4k-page CPU DAT-compatible format 550, and another is a 1M-page CPU DAT-compatible format 552. The number of bits shown is the number of address bits used to index into (or locate an entry in) the page or table. For example, 12 bits 554 of the PCI address are used as a byte offset in 4k page 556; 8 bits 558 are used as an index into page table 560; 11 bits 562 are used as an index into segment table 564; and so on. Shown under each address translation table is the maximum size of the address space supported by that table. For example, page table 560 supports a 1M DMA address space, segment table 564 supports a 2G DMA address space, etc. In this figure and in FIG. 5D, K is kilobytes, M is megabytes, G is gigabytes, T is terabytes, P is petabytes, and E is exabytes.

As shown, as the size of the page increases, the number of levels of the translation table decreases. For example, for 4k page 556, a page table is needed, but not for 1M pages. Other examples and variations are possible.
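The capacities listed under the CPU DAT-compatible tables follow from adding each table's index bits to the page offset. The sketch below reproduces the 1M and 2G figures given in the text; the widths of the region tables (11 bits each) are an assumption drawn from z/Architecture dynamic address translation, not from this description.

```python
UNITS = ["", "K", "M", "G", "T", "P", "E"]


def human(nbytes: int) -> str:
    """Format a power-of-two byte count with the figure's K/M/G/T/P/E units."""
    exp = nbytes.bit_length() - 1
    return f"{1 << (exp % 10)}{UNITS[exp // 10]}"


def capacities(offset_bits: int,
               tables: list[tuple[str, int]]) -> dict[str, str]:
    """Map each table, lowest level first, to the DMA space it can translate."""
    covered = offset_bits
    result = {}
    for name, index_bits in tables:
        covered += index_bits        # each level multiplies the reach
        result[name] = human(1 << covered)
    return result


# 4k-page format 550: widths 12, 8 and 11 are from the text; the three
# region-table widths (11 bits each) are assumed from z/Architecture DAT.
FMT_4K = capacities(12, [("page table", 8), ("segment table", 11),
                         ("region third table", 11),
                         ("region second table", 11),
                         ("region first table", 11)])
# 1M-page format 552: a 20-bit byte offset and no page table.
FMT_1M = capacities(20, [("segment table", 11)])
```

Both formats reach 2G at the segment table, which is why the 1M-page format can omit the page table level entirely.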

Various examples of I/O extended address translation formats are shown in FIG. 5D. For example, the following formats are shown: a 4k address translation table 570 having 4k pages; a 1M address translation table 572 having 4k pages; and a 1M address translation table 574 having 1M pages. As with the CPU DAT-compatible formats, the numbers of bits listed are those used to locate an entry in a particular table. For example, at reference numeral 576, the 12 bits are the offset in the 4k page. Similarly, at reference numeral 578, the 9 bits are used to index into the I/O page table, which allows a DMA address space of up to 2M. Many other examples exist.
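For the I/O extended formats, the index width follows from the table size divided by the entry size. The sketch below assumes 8-byte entries in every table (the text states this only for 4k tables), so the 17-bit index width it derives for 1M tables is an inference rather than a figure from FIG. 5D. The function names are hypothetical.

```python
def index_width(table_size: int, entry_size: int = 8) -> int:
    """Bits needed to index a table of table_size bytes (power of two)."""
    return (table_size // entry_size).bit_length() - 1


def io_extended_capacity(table_size: int, page_size: int, levels: int,
                         entry_size: int = 8) -> int:
    """Bytes of DMA address space reachable through `levels` table levels."""
    offset_bits = page_size.bit_length() - 1
    return 1 << (offset_bits + levels * index_width(table_size, entry_size))


MB = 1 << 20
# Format 570 (4k tables, 4k pages): 9 index bits, so one page table maps 2M.
# Formats 572 and 574 (1M tables): 17 index bits, assuming 8-byte entries.
```

Two levels of 4k tables over 4k pages reach 1G (bits 29:0), which is consistent with ignoring bits 63:30 in the FIG. 5B example.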

As described herein, the address translation format of one address space may be different from that of another address space. For example, the formats may be of different types (e.g., a bypass format for one address space and a CPU DAT-compatible format for another, a CPU DAT-compatible format for one and an I/O extended address translation format for another, or any other combination), or may be variants of a particular type of format (e.g., a 4k-page CPU DAT-compatible format for one address space and a 1M-page CPU DAT-compatible format for another, or I/O extended address translation with 4k tables and 4k pages for one and with 1M tables and 4k pages for another, etc.). Address spaces may also have the same format, may have different (or the same) lengths, and may each be identified by a unique address space identifier. Depending on the implementation, more than two address spaces may be supported.

In one particular arrangement, an instruction referred to as the Modify PCI Function Controls (MPFC) instruction is used to register the DMA address space with the adapter. For example, the operating system determines which address translation format it wishes to use, builds the address translation tables for that format, and then issues an MPFC instruction with the format included as an operand of the instruction. In one example, the format and other operands of the instruction are included in a function information block (described below), which is an operand of the instruction. The DTE, and optionally, in one embodiment, a Function Table Entry (FTE) that includes operating parameters of the adapter, is then updated with information from the function information block.

One embodiment of the details associated with this instruction, and in particular the registration process, is described with reference to FIGS. 6A-9. Referring to FIG. 6A, a Modify PCI Function Controls instruction 600 includes, for example, an opcode 602 indicating the Modify PCI Function Controls instruction; a first field 604 specifying a location that includes various information about the adapter function for which the operational parameters are being established; and a second field 606 that specifies the location from which a PCI Function Information Block (FIB) is obtained. The contents of the locations specified by fields 1 and 2 are further described below.

In one embodiment, field 1 specifies a general register that includes various information. As shown in FIG. 6B, the contents of the register include, for example, a function handle 610 that identifies the handle of the adapter function on behalf of which the modify instruction is executed; an address space 612 that designates an address space in system memory associated with the adapter function specified by the function handle; an operation control 614 that specifies the operation to be performed for the adapter function; and a status 616 that provides status regarding the instruction, as a predefined code, when the instruction completes.

In one embodiment, the function handle includes, for example, an enable indicator indicating whether the handle is enabled; a function number that identifies the adapter function (a static identifier that can be used to index into a function table); and an instance number that specifies a particular instance of the function handle. There is a function handle for each adapter function, and it is used to locate a Function Table Entry (FTE) in the function table. Each function table entry includes operating parameters and/or other information related to its adapter function. As an example, the function table entry includes:

Instance number: this field indicates the particular instance of the adapter function handle associated with the function table entry;

Device Table Entry (DTE) index 1…n: there are one or more device table indices, and each index is an index into one of the device tables for locating a Device Table Entry (DTE). Each adapter function has one or more device table entries, and each entry includes information related to its adapter function, including information used to process requests of the adapter function (e.g., DMA requests, MSI requests) and information relating to requests concerning the adapter function (e.g., PCI instructions). Each device table entry is associated with one address space in system memory assigned to the adapter function; an adapter function may have one or more such address spaces;

Busy indicator: this field indicates whether the adapter function is busy;

Persistent error status indicator: this field indicates whether the adapter function is in a persistent error state;

Recovery start indicator: this field indicates whether recovery of the adapter function has been initiated;

Permission indicator: this field indicates whether the operating system attempting to control the adapter function has permission to do so;

Enable indicator: this field indicates whether the adapter function is enabled (e.g., 1 = enabled, 0 = disabled);

Requester Identifier (RID): this is an identifier of the adapter function and includes, for example, a bus number, a device number, and a function number.

In one example, this field is used to access the configuration space of the adapter function. (The memory of the adapter may be defined as address spaces, including, for example, a configuration space, an I/O space, and/or one or more memory spaces.) In one example, the configuration space may be accessed by specifying the configuration space in an instruction issued by the operating system (or other configuration) to the adapter function. Specified in the instruction are an offset into the configuration space and a function handle used to locate the appropriate function table entry, which includes the RID. The firmware receives the instruction and determines that it is for the configuration space. It therefore uses the RID to generate a request to the I/O hub, and the I/O hub creates a request to access the adapter. The adapter function is located based on the RID, and the offset specifies an offset into the configuration space of the adapter function.

Base Address Register (BAR) (1 to n): this field includes a plurality of unsigned integers, designated BAR₀ through BARₙ, which are associated with the originally specified adapter function and whose values are also stored in the base address registers associated with the adapter function. Each BAR specifies the starting address of a memory space or I/O space within the adapter function and also indicates the type of address space, i.e., whether it is, for example, a 64-bit or 32-bit memory space or a 32-bit I/O space;

In one example, the BAR is used to access the memory space and/or I/O space of the adapter function. For example, an offset provided in an instruction that accesses the adapter function is added to the value in the base address register associated with the address space specified in the instruction to obtain the address used to access the adapter function. The address space identifier provided in the instruction identifies the address space within the adapter function to be accessed and the corresponding BAR to be used;

Size 1…n: this field includes a plurality of unsigned integers, designated SIZE₀ through SIZEₙ. The value of a size field, when nonzero, indicates the size of the corresponding address space, and each entry corresponds to the previously described BAR.

Further details regarding the BAR and size fields are as follows.

1. When the BAR is not implemented for the adapter function, both the BAR field and its corresponding size field are stored as zeros.

2. When the BAR field represents an I/O address space or a 32-bit memory address space, the corresponding size field is non-zero and represents the size of the address space.

3. When the BAR field represents a 64-bit memory address space,

a. The BARₙ field indicates the least significant address bits.

b. The next consecutive BARₙ₊₁ field indicates the most significant address bits.

c. The corresponding SIZEₙ field is nonzero and indicates the size of the address space.

d. The corresponding SIZEₙ₊₁ field is not meaningful and is stored as zero.

Internal routing information: this information is used to perform a specific routing to the adapter. It includes, by way of example, node, processor chip and hub addressing information.

Status indication: this provides an indication as to whether, for example, load/store operations are blocked or the adapter is in an error state, among other indications.
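Rules 1 through 3 for the BAR and size fields, and the BAR-plus-offset computation described for the memory and I/O spaces, can be sketched as follows. The sketch assumes the stored BAR values keep the standard PCI encoding (bit 0 set for I/O space; bits 2:1 equal to 10 for a 64-bit memory BAR), which the description implies but does not spell out. It also splits a 16-bit requester ID into the standard 8-bit bus, 5-bit device, and 3-bit function numbers. All function names are hypothetical.

```python
def decode_bars(bars: list[int], sizes: list[int]) -> list[tuple[int, int, int]]:
    """Return (bar_number, base_address, size) for each implemented BAR."""
    spaces = []
    n = 0
    while n < len(bars):
        bar, size = bars[n], sizes[n]
        if bar == 0 and size == 0:           # rule 1: BAR not implemented
            n += 1
        elif bar & 0x1:                      # rule 2: I/O space
            spaces.append((n, bar & ~0x3, size))
            n += 1
        elif (bar >> 1) & 0x3 == 0b10:       # rule 3: 64-bit memory space
            # 3a/3b: BARn holds the low bits, BARn+1 the high bits.
            base = (bars[n + 1] << 32) | (bar & ~0xF)
            spaces.append((n, base, size))   # 3c: SIZEn holds the size
            n += 2                           # 3d: SIZEn+1 is zero; skip it
        else:                                # rule 2: 32-bit memory space
            spaces.append((n, bar & ~0xF, size))
            n += 1
    return spaces


def adapter_address(spaces: list[tuple[int, int, int]], bar_number: int,
                    offset: int) -> int:
    """Add an instruction's offset to the selected BAR's base address."""
    for number, base, size in spaces:
        if number == bar_number:
            if offset >= size:
                raise ValueError("offset is outside the address space")
            return base + offset
    raise ValueError("BAR is not implemented")


def split_rid(rid: int) -> tuple[int, int, int]:
    """Split a 16-bit requester ID into bus, device and function numbers."""
    return rid >> 8, (rid >> 3) & 0x1F, rid & 0x7
```

Skipping index n+1 after a 64-bit BAR reflects rule 3d: that slot carries only the upper address bits and has no size of its own.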

In one example, the busy indicator, persistent error status indicator, and recovery start indicator are set based on monitoring performed by the firmware. Also, the permission indicator is set based on, for example, policy; and the BAR information is set based on configuration information discovered during a bus walk by the processor (e.g., firmware of the processor). Other fields may be set based on configuration, initialization, and/or events. In other embodiments, the function table entry may include more, less, or different information. The information included may depend on the operations supported or enabled by the adapter function.

Referring to FIG. 6C, in one example, field 2 indicates the logical address 620 of the PCI Function Information Block (FIB), which includes information about the relevant adapter function. The function information block is used to update the device table entry and/or function table entry (or other location) associated with the adapter function. This information is stored in the FIB during initialization and/or configuration of the adapter, and/or in response to certain events.

Further details regarding the Function Information Block (FIB) are described with reference to FIG. 6D. In one embodiment, the function information block 650 includes the following fields:

Format 651: this field specifies the format of the FIB;

Interception control 652: this field is used to indicate whether guest execution of specific instructions by a pageable storage mode guest results in instruction interception;

Error indication 654: this field includes the error state indication for direct memory access and adapter interruptions. When the bit is set (e.g., 1), one or more errors have been detected while performing direct memory access or adapter interruption for the adapter function;

Load/store block 656: this field indicates whether load/store operations are blocked;

PCI function valid 658: this field includes an enablement control for the adapter function. When the bit is set (e.g., 1), the adapter function is considered to be enabled for I/O operations;

Address space registration 660: this field includes a direct memory access enablement control for the adapter function. When the field is set (e.g., 1), direct memory access is enabled;

Page size 661: this field indicates the size of the page or other unit of storage to be accessed by a DMA memory access;

PCI Base Address (PBA) 662: this field is the base address of the address space in system memory assigned to the adapter function. It represents the lowest virtual address that the adapter function is allowed to use for direct memory access to the specified DMA address space;

PCI Address Limit (PAL) 664: this field represents the highest virtual address that the adapter function is allowed to access within the specified DMA address space;

Input/output address translation pointer (IOAT) 666: the input/output address translation pointer designates the first of any translation tables used by PCI virtual address translation, or it may directly designate the absolute address of a memory frame that is the result of translation;

Interruption subclass (ISC) 668: this field includes the interruption subclass used to present adapter interruptions for the adapter function;

Number of interrupts (NOI) 670: this field designates the number of distinct interrupt codes accepted for the adapter function. This field also defines the size, in bits, of the adapter interrupt bit vector designated by the adapter interrupt bit vector address and adapter interrupt bit vector offset fields;

Adapter interrupt bit vector address (AIBV) 672: this field specifies the address of the adapter interrupt bit vector for the adapter function. This vector is used in interrupt processing;

Adapter interrupt bit vector offset 674: this field specifies the offset of the first adapter interrupt bit vector bit for the adapter function;

Adapter interrupt summary bit address (AISB) 676: this field provides an address designating the adapter interrupt summary bit, which is optionally used in interrupt processing;

Adapter interrupt summary bit offset 678: this field provides the offset into the adapter interrupt summary bit vector;

Function Measurement Block (FMB) address 680: this field provides the address of a function measurement block used to collect measurements regarding the adapter function;

Function measurement block key 682: this field includes an access key used to access the function measurement block;

Summary bit notification control 684: this field indicates whether a summary bit vector is being used;

Instruction authorization token 686: this field is used to determine whether a pageable storage mode guest is authorized to execute PCI instructions without host intervention.

In one example, a pageable guest is interpretively executed, at level 2 of interpretation, via the Start Interpretive Execution (SIE) instruction. For example, a Logical Partition (LPAR) hypervisor executes the SIE instruction to begin a logical partition in physical, fixed memory. If the operating system in that logical partition is itself a hypervisor, it issues the SIE instruction to execute its guest (virtual) machines in its V=V (virtual) storage. Thus, the LPAR hypervisor uses level-1 SIE, and the hypervisor in the logical partition uses level-2 SIE; and

Address translation format 687: this field indicates the highest level translation table to be used in address translation (e.g., segment table, region third table, etc.) and the selected translation format (e.g., CPU DAT compatible, I/O extended address translation format, bypass format, no-fetch format).

The function information block specified in the Modify PCI Function Controls instruction is used to modify the selected device table entry, function table entry, and/or other firmware controls associated with the adapter function specified in the instruction. By modifying these entries and controls, certain services are provided to the adapter. These services include, for example, adapter interruptions; address translation; resetting the error state; resetting the load/store block; setting function measurement parameters; and setting interception control.

One embodiment of the logic associated with the Modify PCI Function Controls instruction is described with reference to FIG. 7. In one example, the instruction is issued by an operating system (or other configuration) and executed by the processor (e.g., firmware) on which the operating system executes. In the examples herein, the instruction and adapter functions are PCI based. However, in other embodiments, a different adapter architecture and corresponding instructions may be used.

In one example, the operating system provides the following operands to the instruction (e.g., in one or more registers specified by the instruction): the PCI function handle; the DMA address space identifier; the operation control; and the address of the function information block.

Referring to FIG. 7, initially, a determination is made as to whether the facility that allows the Modify PCI Function Controls instruction is installed, INQUIRY 700. This determination is made, for example, by checking an indicator stored in, for example, a control block. If the facility is not installed, an exception condition is provided, STEP 702. Otherwise, a determination is made as to whether the instruction was issued by a pageable storage mode guest (or other guest), INQUIRY 704. If so, the host operating system emulates the operation for that guest, STEP 706.

Otherwise, a determination is made as to whether one or more operands are aligned, INQUIRY 708. For example, it is determined whether the address of the function information block is on a doubleword boundary. In one example, this check is optional. If the operands are not aligned, an exception condition is provided, STEP 710.

Otherwise, a determination is made as to whether the function information block is accessible, INQUIRY 712. If not, an exception condition is provided, STEP 714. Otherwise, a determination is made as to whether the handle provided in the operands of the Modify PCI Function Controls instruction is enabled, INQUIRY 716. In one example, this determination is made by checking the enable indicator in the handle. If the handle is not enabled, an exception condition is provided, STEP 718.

If the handle is enabled, the handle is used to locate the function table entry, STEP 720. That is, at least a portion of the handle is used to index into the function table to locate the function table entry corresponding to the adapter function for which the operating parameters are to be established.

A determination is made as to whether a function table entry is found, INQUIRY 722. If not, an exception condition is provided, STEP 724. Otherwise, if the configuration from which the instruction is issued is a guest, INQUIRY 726, an exception condition is provided (e.g., an interception to the host), STEP 728. If the configuration is not a guest, this inquiry may be ignored, or other authorizations may be checked, if specified.

A determination is then made as to whether the function is enabled, INQUIRY 730. In one example, this determination is made by checking an enable indicator in the function table entry. If it is not enabled, an exception condition is provided, step 732.

If the function is enabled, a determination is made as to whether recovery is active, INQUIRY 734. If recovery is active, as determined by the recovery indicator in the function table entry, an exception condition is provided, STEP 736. However, if recovery is not active, a further determination is made as to whether the function is busy, INQUIRY 738. This determination is made by checking the busy indicator in the function table entry. If the function is busy, a busy condition is provided, STEP 740. With the busy condition, the instruction can be retried instead of being abandoned.

If the function is not busy, a further determination is made as to whether the function information block format is valid, INQUIRY 742. For example, the format field of the FIB is checked to determine whether the format is supported by the system. If it is not valid, an exception condition is provided, STEP 744. If the function information block format is valid, a further determination is made as to whether the operation control specified in the operands of the instruction is valid, INQUIRY 746, that is, whether it is one of the operation controls specified for the instruction. If it is not valid, an exception condition is provided, STEP 748. However, if the operation control is valid, processing continues with the specific operation control specified.
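The sequence of checks in FIG. 7 can be summarized as an ordered series of early returns. In this sketch the request and function table entry fields, the set of supported FIB formats, and the operation-control names are all hypothetical; only the order of the checks follows the description.

```python
from dataclasses import dataclass


@dataclass
class FunctionTableEntry:
    """Hypothetical subset of a function table entry (FTE)."""
    enabled: bool = True
    recovery_active: bool = False
    busy: bool = False


@dataclass
class MpfcRequest:
    """Hypothetical operands and context of one MPFC execution."""
    handle_enabled: bool
    function_number: int          # taken from the function handle
    fib_address: int
    fib_format: int
    operation_control: str
    fib_accessible: bool = True
    pageable_guest: bool = False  # issued by a pageable storage mode guest
    guest_config: bool = False    # issuing configuration is a guest


SUPPORTED_FIB_FORMATS = {0}                                # assumption
OPERATION_CONTROLS = {"register-ioat", "deregister-ioat"}  # illustrative subset


def mpfc_precheck(req: MpfcRequest,
                  function_table: dict[int, FunctionTableEntry],
                  facility_installed: bool = True) -> str:
    """Return the outcome of the FIG. 7 checks, in the order described."""
    if not facility_installed:                           # INQUIRY 700
        return "exception: facility not installed"
    if req.pageable_guest:                               # INQUIRY 704
        return "emulated by host"
    if req.fib_address % 8:                              # INQUIRY 708 (optional)
        return "exception: FIB not on a doubleword boundary"
    if not req.fib_accessible:                           # INQUIRY 712
        return "exception: FIB not accessible"
    if not req.handle_enabled:                           # INQUIRY 716
        return "exception: handle not enabled"
    fte = function_table.get(req.function_number)        # STEP 720
    if fte is None:                                      # INQUIRY 722
        return "exception: no function table entry"
    if req.guest_config:                                 # INQUIRY 726
        return "exception: interception to host"
    if not fte.enabled:                                  # INQUIRY 730
        return "exception: function not enabled"
    if fte.recovery_active:                              # INQUIRY 734
        return "exception: recovery active"
    if fte.busy:                                         # INQUIRY 738
        return "busy: retry"
    if req.fib_format not in SUPPORTED_FIB_FORMATS:      # INQUIRY 742
        return "exception: unsupported FIB format"
    if req.operation_control not in OPERATION_CONTROLS:  # INQUIRY 746
        return "exception: invalid operation control"
    return "proceed"
```

Returning a distinct "busy" outcome, rather than an exception, mirrors the text's point that a busy condition invites a retry.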

One type of operation control that may be specified is a register I/O address translation parameters operation, used to control address translation for the adapter. With this operation, the PCI function parameters relevant to I/O address translation are set in the DTE, FTE, and/or other locations from the appropriate parameters of the FIB, which is an operand of the instruction. These parameters include, for example, the PCI base address; the PCI address limit (also known as the PCI boundary or limit); the address translation format; the page size; and the I/O address translation pointer, which are operands of this operation. There are also implied operands, including a start DMA address (SDMA) and an end DMA address (EDMA), which are stored in a location accessible to the processor executing the instruction.

One embodiment of the logic to establish operating parameters for I/O address translation is described with reference to FIG. 8. Initially, a determination is made as to whether the PCI base address in the FIB is greater than the PCI limit in the FIB, INQUIRY 800. If so, an exception condition is provided, STEP 802. However, if the base address is less than or equal to the limit, a further determination is made as to whether the address translation format and page size are valid, INQUIRY 804. If they are not valid, an exception condition is provided, STEP 806. If they are valid, a further determination is made as to whether the size of the address space (based on the base address and limit) exceeds the translation capability, INQUIRY 808. In one example, the size of the address space is compared to the maximum address translation capability possible based on the format of the highest level table. For example, if the highest level table is a CPU DAT-compatible segment table, the maximum translation capability is 2 gigabytes.

If the size of the address space exceeds the translation capability, an exception condition is provided, STEP 810. Otherwise, a further determination is made as to whether the base address is less than the start DMA address, INQUIRY 812. If so, an exception condition is provided, STEP 814. Otherwise, a further determination is made as to whether the PCI limit is greater than the end DMA address, INQUIRY 816. If so, an exception condition is provided, STEP 818. In one example, the start DMA address and the end DMA address are based on a system-wide policy.

Thereafter, a determination is made as to whether sufficient resources are available (if any resources are needed) to perform the I/O address translation, INQUIRY 820. If not, an exception condition is provided, step 822. Otherwise, a further determination is made as to whether the I/O address translation parameters are already registered in the FTE and DTE, INQUIRY 824. This is determined by examining the values of the parameters in the FTE/DTE. For example, if the value in the FTE/DTE is zero or another defined value, registration has not been performed. To locate the FTE, the handle provided in the instruction is used, and to locate the DTE, the device index in the FTE is used.

If the adapter function has already been registered for address translation, an exception condition is provided, STEP 826. If not, a determination is made as to whether the specified DMA address space is valid (i.e., whether it is an address space for which a DTE is enabled), INQUIRY 828. If not, an exception condition is provided, STEP 830. If all checks are successful, the translation parameters are placed in the device table entry and, optionally, in the corresponding function table entry (or other designated location), STEP 832. For example, PCI function parameters related to I/O address translation are copied from the function information block and placed in the DTE/FTE. These parameters include, for example, the PCI base address, PCI address limit, translation format, page size, and I/O address translation pointer. This operation enables DMA access to the specified DMA address space, and thereby enables I/O address translation for the adapter function.
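The remaining steps of FIG. 8 (INQUIRY 820 through STEP 832) check registration state and then copy the FIB parameters into the DTE and, optionally, the FTE. The sketch below assumes a hypothetical in-memory model in which an entry whose translation parameters are all zero is "not registered"; the field names, including `iota` for the I/O address translation pointer, and the `resources_ok` flag are illustrative, not taken from the source.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical names for the translation parameters copied from the FIB.
TRANSLATION_FIELDS = ("pci_base", "pci_limit", "fmt", "page_size", "iota")


class ExceptionCondition(Exception):
    """Stands in for 'an exception condition is provided'."""


@dataclass
class TableEntry:
    """Hypothetical DTE/FTE model; all-zero parameters mean 'not registered'."""
    enabled: bool = False
    params: dict = field(default_factory=lambda: dict.fromkeys(TRANSLATION_FIELDS, 0))

    def registered(self) -> bool:
        return any(self.params[k] for k in TRANSLATION_FIELDS)


def register(fib: dict, dte: TableEntry, fte: Optional[TableEntry] = None,
             resources_ok: bool = True) -> None:
    if not resources_ok:                  # INQUIRY 820
        raise ExceptionCondition("insufficient resources")
    if dte.registered():                  # INQUIRY 824
        raise ExceptionCondition("already registered for address translation")
    if not dte.enabled:                   # INQUIRY 828: DTE-enabled address space?
        raise ExceptionCondition("DMA address space not valid")
    for k in TRANSLATION_FIELDS:          # STEP 832: copy the parameters from the FIB
        dte.params[k] = fib[k]
        if fte is not None:               # optional copy into the FTE
            fte.params[k] = fib[k]
```

Registering the same entry twice fails at INQUIRY 824, because the copied parameters are now nonzero.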

Another operation control that may be specified by the Modify PCI Function Controls instruction is a deregister I/O address translation parameters operation, an example of which is described with reference to FIG. 9. With this operation, the function parameters related to I/O address translation are reset to zero. This operation prohibits DMA access to the specified DMA address space and causes the purging of I/O translation lookaside buffer entries for that DMA address space. It disables address translation.

Referring to FIG. 9, in one embodiment, a determination is made as to whether the I/O address translation parameters are not registered, INQUIRY 900. In one example, this determination is made by examining the values of the appropriate parameters in the FTE or DTE. If these fields are zero or some specified value, the parameters are not registered, and an exception condition is provided, STEP 902. If they are registered, a determination is made as to whether the DMA address space is valid, INQUIRY 904. If it is not valid, an exception condition is provided, STEP 906. If the DMA address space is valid, the translation parameters in the device table entry and, optionally, in the corresponding function table entry are cleared, STEP 908.
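The deregistration flow of FIG. 9, together with the I/O TLB purge described for this operation, can be sketched as follows. This is a hypothetical model: the I/O TLB is represented as a dictionary keyed by `(dte_index, page_number)`, and "not registered" is modeled as all parameters being zero; neither detail is specified by the source.

```python
from dataclasses import dataclass, field


class ExceptionCondition(Exception):
    """Stands in for 'an exception condition is provided'."""


@dataclass
class Dte:
    enabled: bool = True
    params: dict = field(default_factory=dict)   # all-zero or empty means "not registered"


def deregister(dte_index: int, dtes: dict, iotlb: dict) -> None:
    dte = dtes[dte_index]
    if not any(dte.params.values()):             # INQUIRY 900: nothing registered
        raise ExceptionCondition("I/O address translation not registered")
    if not dte.enabled:                          # INQUIRY 904
        raise ExceptionCondition("DMA address space not valid")
    for k in dte.params:                         # STEP 908: reset parameters to zero
        dte.params[k] = 0
    # Purge cached translations belonging to this DMA address space only.
    for key in [k for k in iotlb if k[0] == dte_index]:
        del iotlb[key]
```

Purging only the entries tagged with this DTE index leaves translations for other DMA address spaces intact.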

In one embodiment, the registration process is performed for each DMA address space allocated to the adapter. As described herein, multiple address spaces may be allocated, and in one particular arrangement, the number of address spaces to be allocated is indicated by the enable function operation of the call logic processor instruction.

One embodiment of this instruction is shown in FIG. 10A. As shown, in one example, the call logic processor instruction 1000 includes an opcode 1002 indicating that it is a call logic processor instruction, and an indication of a command 1004. In one example, this indication is the address of a request block that describes the command to be executed. One embodiment of such a request block is shown in FIG. 10B.

As shown in FIG. 10B, in one example, request block 1020 includes a number of parameters, such as a length field 1022 indicating the length of the request block; a command field 1024 indicating a set PCI function command; a PCI function handle 1026, which is the handle of the function to be enabled or disabled; an operation code 1028 specifying an enable or disable operation; and a DMA address space (DMAAS) number 1030 indicating the requested number of address spaces to be associated with the particular PCI function. More, less or different information may be included in other embodiments. For example, in a virtual environment in which the instruction is issued by a host of a pageable storage mode guest, a guest identity is provided. Other variations are also possible.

In response to issuing and processing the call logic processor instruction, a response block is returned and the information included in the response block depends on the operation to be performed. One embodiment of a response block is shown in FIG. 10C. In one example, the response block 1050 includes: a length field 1052 indicating the length of the response block; a response code 1054 indicating the status of the command; and a PCI function handle 1056 that identifies a PCI function. In response to the enable command, the PCI function handle is an enable handle of the PCI function. Further, upon completion of the disable operation, the PCI function handle is a generic handle that may be enabled by an enabling function in the future.

One embodiment of the logic to enable a PCI function is described with reference to FIG. 11. In one example, the logic is initiated in response to issuing a call logic processor instruction in which the command is the set PCI function command and the operation code specifies enable function. The logic is executed by the processor, for example, in response to the instruction being issued by an operating system, or a device driver of the operating system, that is authorized to execute the logic. In other embodiments, the logic may be executed without the use of the call logic processor instruction.

Referring to FIG. 11, initially, a determination is made as to whether the handle provided in the request block of the call logic processor instruction is a valid handle, INQUIRY 1100. That is, does the handle point to a valid entry in the function table, or is it outside the range of valid entries (e.g., does the function number portion of the handle designate an installed function)? If the handle is not known, a response code is provided indicating that the handle is not recognized, STEP 1102. However, if the handle is known, a further determination is made as to whether the handle is enabled, INQUIRY 1104. This determination is made by examining an enable indicator in the PCI function handle. If the indicator is set, indicating that the handle is already enabled, a response code indicating this is returned, STEP 1106.

However, if the handle is known and not enabled (i.e., valid to be enabled), a determination is made as to whether the requested number of address spaces to be assigned to the PCI function is greater than a maximum value, INQUIRY 1108. To make this determination, the DMA address space number specified in the request block is compared to a maximum value (provided based on policy, in one example). If the number is greater than the maximum value, a response code indicating an invalid value for the DMA address space number is provided, STEP 1110. Otherwise, a determination is made as to whether the requested number of address spaces is available, INQUIRY 1112. This determination is made by checking whether device table entries are available for the requested number of address spaces. If they are not available, a response code indicating insufficient resources is returned, STEP 1114. Otherwise, processing continues to enable the PCI function.
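The preliminary checks of FIG. 11 (INQUIRY 1100 through INQUIRY 1112) can be sketched as a chain of tests that each return a response code. The handle layout here (an enable bit at bit position 31 counting from the low-order end, and a function number in the low eight bits), the response-code strings, and modeling DTE availability as a simple free count are all hypothetical; the source does not define them.

```python
# Hypothetical handle layout, for illustration only.
ENABLE_BIT = 1 << 31          # enable indicator in the handle
FUNC_MASK = 0xFF              # function-number portion of the handle


def precheck_enable(handle: int, installed: set, dmaas_requested: int,
                    dmaas_max: int, free_dtes: int) -> str:
    if (handle & FUNC_MASK) not in installed:   # INQUIRY 1100: handle recognized?
        return "HANDLE_NOT_RECOGNIZED"          # STEP 1102
    if handle & ENABLE_BIT:                     # INQUIRY 1104: already enabled?
        return "HANDLE_ALREADY_ENABLED"         # STEP 1106
    if dmaas_requested > dmaas_max:             # INQUIRY 1108: policy maximum
        return "INVALID_DMAAS_NUMBER"           # STEP 1110
    if dmaas_requested > free_dtes:             # INQUIRY 1112: enough DTEs?
        return "INSUFFICIENT_RESOURCES"         # STEP 1114
    return "CONTINUE"                           # proceed to STEP 1116
```

Returning a code rather than raising matches the instruction's behavior of reporting status in the response block.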

The provided handle is used to locate the function table entry, STEP 1116. For example, one or more specified bits of the handle are used as an index into the function table to locate a particular function table entry. In response to locating the appropriate function table entry, a determination is made as to whether the function is already enabled, INQUIRY 1118. This determination is made by checking the enable indicator in the function table entry. If the function is already enabled (i.e., the indicator is set to 1), a response code is returned indicating that the PCI function is already in the requested state, STEP 1120.

If the function is not already enabled, processing continues with a determination of whether the function is in a permanent error state, INQUIRY 1122. If the permanent error state indicator in the function table entry indicates that it is in a permanent error state, a response code indicating this is returned, STEP 1124. However, if the function is not in a permanent error state, a further determination is made as to whether error recovery has been initiated for the function, INQUIRY 1126. If the recovery initiated indicator in the function table entry is set, a response code indicating that recovery has been initiated is provided, STEP 1128. Otherwise, a further inquiry is made as to whether the PCI function is busy, INQUIRY 1130. Again, if a check of the busy indicator in the function table entry indicates that the PCI function is busy, such an indication is provided, STEP 1132. If, however, the PCI function is not in a permanent error state, recovery has not been initiated, and it is not busy, a further inquiry is made as to whether the operating system is permitted to enable this PCI function, INQUIRY 1134. If, based on a permission indicator in the function table entry, it is not permitted, a response code indicating an unauthorized action is provided, STEP 1136. If, however, all tests are passed successfully, a further determination may be made as to whether there are any DTEs available for this PCI function, INQUIRY 1138. As an example, DTEs may be considered available if they are not currently enabled in the I/O hub. Further, policies may be applied to limit the number of DTEs available to a given operating system or logical partition. Any available DTE that can access the adapter may be assigned. If no DTEs are available, a response code is returned indicating that one or more of the requested DTEs are not available, STEP 1140.

If a DTE is available, then a number of DTEs corresponding to the requested address space number are allocated and enabled, step 1142. In one example, the enabling includes setting an enable indicator in each DTE to be enabled. Further, in this example, the enabling includes establishing a CAM to provide an index to each DTE. For example, for each DTE, the entry in the CAM is loaded with the index.

In addition, the DTEs are associated with the function table entry, STEP 1144. This includes, for example, including each DTE index in the function table entry. The function is then marked as enabled by setting an enable indicator in the function table entry, STEP 1146. In addition, the enable bit in the handle is set and the instance number is updated, STEP 1148. This enabled handle, which allows use of the PCI adapter, is then returned, STEP 1150. Once the function is enabled, for example, registration for address translation and interrupts may be performed, DMA operations may be performed by the PCI function, and/or load, store, and store block instructions may be issued to the function.
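The second half of FIG. 11 (STEP 1116 through STEP 1150) can be sketched as follows. Everything concrete here is an assumption for illustration: the `Fte` indicator fields, the handle layout (enable bit, 8-bit instance number, 8-bit function number), the response-code strings, representing DTE enablement as a list of booleans, and keying the CAM by `(RID, address space identifier)` so that it yields a DTE index.

```python
from dataclasses import dataclass, field

# Hypothetical handle layout: enable bit, 8-bit instance number, 8-bit function number.
ENABLE_BIT = 1 << 31
INSTANCE_SHIFT, INSTANCE_MASK = 8, 0xFF
FUNC_MASK = 0xFF


@dataclass
class Fte:
    enabled: bool = False
    permanent_error: bool = False
    recovery_initiated: bool = False
    busy: bool = False
    os_permitted: bool = True
    dte_indexes: list = field(default_factory=list)


def enable_function(handle, ftes, dte_enabled, cam, rid, n_spaces):
    """Return (response_code, handle); the response codes are illustrative strings."""
    fte = ftes[handle & FUNC_MASK]                  # STEP 1116: index the function table
    if fte.enabled:                                 # INQUIRY 1118
        return "ALREADY_IN_REQUESTED_STATE", handle
    if fte.permanent_error:                         # INQUIRY 1122
        return "PERMANENT_ERROR", handle
    if fte.recovery_initiated:                      # INQUIRY 1126
        return "RECOVERY_INITIATED", handle
    if fte.busy:                                    # INQUIRY 1130
        return "BUSY", handle
    if not fte.os_permitted:                        # INQUIRY 1134
        return "NOT_AUTHORIZED", handle
    free = [i for i, on in enumerate(dte_enabled) if not on]
    if len(free) < n_spaces:                        # INQUIRY 1138
        return "DTE_NOT_AVAILABLE", handle
    for asid, idx in enumerate(free[:n_spaces]):    # STEP 1142: allocate and enable
        dte_enabled[idx] = True
        cam[(rid, asid)] = idx                      # CAM entry yields the DTE index
        fte.dte_indexes.append(idx)                 # STEP 1144: record in the FTE
    fte.enabled = True                              # STEP 1146
    instance = (((handle >> INSTANCE_SHIFT) & INSTANCE_MASK) + 1) & INSTANCE_MASK
    handle &= ~(INSTANCE_MASK << INSTANCE_SHIFT)    # STEP 1148: replace instance number
    handle |= (instance << INSTANCE_SHIFT) | ENABLE_BIT   # and set the enable bit
    return "OK", handle                             # STEP 1150
```

Bumping the instance number means a stale copy of an earlier handle no longer matches the enabled one.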

Each address space is identified by an address space identifier (i.e., one or more bits of an address in a request from the adapter). The specific bits are indicated in the DMA address space mask, which is retrieved by the CLP query PCI function group command. An example of a CLP instruction is described above with reference to FIG. 10A.

One embodiment of a request block for a query PCI function group command is described with reference to FIG. 12A. In one example, request block 1200 includes the following fields:

length field 1202: this field indicates the length of the request block;

the command code 1204: this field indicates the query PCI function group command; and

function group ID 1206: this field specifies the PCI function group identifier for which attributes are to be obtained. In one example, it is obtained from a query function command that provides details about a selected function.

A response block is returned in response to issuing and processing the call logic processor instruction with the query PCI function group command. One embodiment of a response block is shown in FIG. 12B. In one example, response block 1250 includes:

length field 1252: this field indicates the length of the response block;

response code 1254: this field indicates the status of the command;

number of interrupts 1256: this field indicates the maximum number of consecutive MSI vector numbers (i.e., interrupt event indicators) supported by the PCI facility for each PCI function in the specified PCI function group. In one example, the number of interrupts may have a valid value in the range of zero to 2048;

version 1258: this field indicates the version of the PCI specification supported by the PCI facility to which the group of PCI functions designated by the PCI function group identifier is attached;

frame 1262: this field indicates the frame (or page) size supported for I/O address translation;

measurement block update interval 1264: this is a value indicating an approximate time interval (e.g., in milliseconds) to update the PCI function measurement block;

DMA address space mask 1266: this is a value indicating which bits in the PCI address are used to identify the DMA address space. It may also implicitly define the maximum number of DMA address spaces supported: 2 raised to the power of the number of one bits in the mask; and

MSI address 1268: this is the value used for message signaled interrupt requests.
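The DMA address space mask description can be made concrete with a short sketch. The power-of-two rule follows the text; how the selected bits are compacted into an identifier is an assumption (here the masked bits are gathered lowest-first into a dense integer, similar to a bit-extract operation), and the example mask value is hypothetical.

```python
def max_address_spaces(mask: int) -> int:
    """2 raised to the number of one bits in the DMA address space mask."""
    return 1 << bin(mask).count("1")


def address_space_id(pci_addr: int, mask: int) -> int:
    """Gather the address bits selected by mask into a dense identifier (assumed encoding)."""
    asid, out_bit, m = 0, 0, mask
    while m:
        low = m & -m                  # lowest remaining mask bit
        if pci_addr & low:
            asid |= 1 << out_bit
        out_bit += 1
        m &= m - 1                    # clear that mask bit
    return asid


# Hypothetical mask selecting two high-order address bits.
MASK = 0x3 << 60
```

With two one bits in the mask, up to four DMA address spaces can be distinguished.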

The group information is based on the given system I/O architecture and the capabilities of the firmware and I/O hub. This may be stored in the FTE or any other conventional location for later retrieval during query processing. In particular, the query group command retrieves the information and stores it in its response block accessible to the operating system.

The ability to assign multiple DMA address spaces to each adapter, and specifically to each adapter function (which shares the PCI bus with other adapter functions), is described above. Depending on the adapter or adapter function, the use of multiple address spaces enables, if desired, different sized address spaces, different translation formats, and/or different address translation tables. Multiple address spaces are supported by associating a DTE with each address space. The DTE defines the characteristics of the address space to which it relates. The appropriate DTE is selected by a combination of the RID and the address space identifier.
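Selecting a DTE by the combination of RID and address space identifier can be sketched as a keyed lookup. In this hypothetical model a Python dictionary stands in for the CAM, and the DTE fields and format names are placeholders, chosen to show two address spaces of one function with different formats and page sizes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dte:
    translation_format: str     # placeholder format names
    page_size: int
    pci_base: int
    pci_limit: int


class DeviceTable:
    def __init__(self):
        self._cam = {}          # (rid, asid) -> DTE index; stands in for the CAM
        self._entries = []

    def add(self, rid: int, asid: int, dte: Dte) -> None:
        self._cam[(rid, asid)] = len(self._entries)
        self._entries.append(dte)

    def select(self, rid: int, asid: int) -> Dte:
        idx = self._cam.get((rid, asid))
        if idx is None:
            raise LookupError(f"no DTE for RID {rid:#06x}, ASID {asid}")
        return self._entries[idx]


table = DeviceTable()
table.add(0x0100, 0, Dte("format-A", 4096, 0x0, 0xFFFF_FFFF))
table.add(0x0100, 1, Dte("format-B", 1 << 20, 0x1_0000_0000, 0x1_FFFF_FFFF))
```

Because the key includes the address space identifier, one RID can map to several DTEs, each with its own translation characteristics.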

In the embodiment described herein, the adapter is a PCI adapter. As used herein, PCI refers to any adapter implemented according to any PCI-based specification defined by the Peripheral Component Interconnect Special Interest Group (PCI-SIG) (www.pcisig.com/home), including but not limited to PCI or PCIe. In one particular example, Peripheral Component Interconnect Express (PCIe) is a component-level interconnect standard that defines a bi-directional communication protocol for transactions between I/O adapters and host systems. PCIe communications are encapsulated in packets according to the PCIe standard for transmission on a PCIe bus. Transactions originating at the I/O adapter and terminating at the host system are referred to as upbound transactions. Transactions originating at the host system and terminating at the I/O adapter are referred to as downbound transactions. The PCIe topology is based on point-to-point unidirectional links that are paired (e.g., one upbound link, one downbound link) to form the PCIe bus. The PCIe standard is maintained and published by the PCI-SIG.

As will be appreciated by one skilled in the art, the present invention may be embodied as a system, method or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.), or an embodiment combining hardware and software aspects, all of which may generally be referred to herein as a "circuit," "module" or "system." Furthermore, in some embodiments, the invention may also take the form of a computer program product embodied in one or more computer-readable media having computer-readable program code embodied therein.

Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

Referring now to FIG. 13, in one example, a computer program product 1300 includes, for instance, one or more computer-readable storage media 1302 having computer-readable program code means or logic 1304 stored thereon to provide and facilitate one or more aspects of the present invention.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The present invention is described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In addition to the foregoing, one or more aspects of the present invention may be provided, offered, deployed, managed, serviced, etc. by a service provider who offers management of a user's environment. For example, a service provider can create, maintain, support, etc., computer code and/or computer infrastructure that performs one or more aspects of the present invention for one or more users. The service provider, in turn, may accept payment from the user, for example, according to a subscription and/or fee agreement. Additionally or alternatively, the service provider may receive payment from the sale of advertising content to one or more third parties.

In one aspect of the invention, an application may be deployed to perform one or more aspects of the invention. As one example, deploying an application comprises providing a computer infrastructure operable to perform one or more aspects of the present invention.

As yet another aspect of the present invention, a computing infrastructure may be deployed comprising integrating computer-readable code into a computer system, wherein the code in combination with the computing system is capable of performing one or more aspects of the present invention.

As yet another aspect of the present invention, a process for integrating computing infrastructure comprising integrating computer readable code into a computer system may be provided. The computer system includes a computer-readable medium, wherein the computer medium includes one or more aspects of the present invention. The code in combination with the computer system is capable of performing one or more aspects of the present invention.

While various embodiments are described above, these are only examples. For example, computing environments of other architectures may incorporate and use one or more aspects of the present invention. As an example, servers other than those described herein, such as Power Systems servers or other servers offered by International Business Machines Corporation, or servers of other companies, may include, use and/or benefit from one or more aspects of the present invention. Moreover, although in the examples illustrated herein the adapters and PCI hubs are considered to be part of the server, in other embodiments they need not be considered part of the server, but may simply be considered to be coupled to the system memory and/or other components of the computing environment. The computing environment need not be a server. Also, although a translation table is described, any data structure may be used, and the term table is intended to include all such data structures. Moreover, although the adapters are PCI based, one or more aspects of the present invention may be used with other adapters or other I/O components. Adapters and PCI adapters are examples only. Also, other sizes of address spaces, address tables, and/or pages may be used without departing from the scope of the present invention. Also, the DTE may include more, less, or different information. Further, other types of addresses may be translated using one or more aspects of the present invention. In addition, other values may be used for the address space identifier and/or the requestor identifier. Many other variations are possible.

Moreover, other types of computing environments may benefit from one or more aspects of the present invention. By way of example, a data processing system suitable for storing and/or executing program code may be used that includes at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements include, for instance, local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.

Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, DASD, magnetic tape, CDs, DVDs, thumb drives, and other storage media, etc.) can be coupled to the system either directly or through intervening I/O controllers. Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems and Ethernet cards are just a few of the available types of network adapters.

Referring to FIG. 14, representative components of a host computer system 5000 to implement one or more aspects of the present invention are depicted. The representative host computer 5000 includes one or more CPUs 5001 in communication with computer memory (i.e., central storage) 5002, as well as I/O interfaces to storage media devices 5011 and networks 5010 for communicating with other computers or SANs and the like. The CPU 5001 conforms to an architecture having an architected instruction set and architected functionality. The CPU 5001 may have dynamic address translation (DAT) 5003 for transforming program addresses (virtual addresses) into real addresses of memory. A DAT typically includes a translation lookaside buffer (TLB) 5007 for caching translations so that later accesses to a block of computer memory 5002 do not require the delay of address translation. Typically, a cache 5009 is employed between the computer memory 5002 and the processor 5001. The cache 5009 may be hierarchical, having a large cache available to more than one CPU and smaller, faster (lower level) caches between the large cache and each CPU. In some embodiments, the lower level caches are split to provide separate low level caches for instruction fetching and data accesses. In one embodiment, an instruction is fetched from memory 5002 by an instruction fetch unit 5004 via the cache 5009. The instruction is decoded in an instruction decode unit 5006 and dispatched (with other instructions in some embodiments) to one or more instruction execution units 5008. Typically, several execution units 5008 are employed, for example an arithmetic execution unit, a floating point execution unit and a branch instruction execution unit. The instruction is executed by the execution unit, accessing operands from instruction-specified registers or memory as needed. If an operand is to be accessed (loaded or stored) from memory 5002, a load/store unit 5005 typically handles the access under control of the instruction being executed. Instructions may be executed in hardware circuits, in internal microcode (firmware), or by a combination of both.
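The role of the TLB described above, caching translations so that repeated accesses skip the table walk, can be illustrated with a deliberately simplified sketch. It assumes a single-level page table held in a dictionary and a 4 KB page size; real DAT uses multi-level tables, so this shows only the caching idea, not the architected translation process.

```python
PAGE_SIZE = 4096   # assumed page size for illustration


class Dat:
    def __init__(self, page_table: dict):
        self.page_table = page_table   # virtual page number -> real frame number
        self.tlb = {}                  # cached translations
        self.walks = 0                 # number of full table lookups performed

    def translate(self, vaddr: int) -> int:
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        frame = self.tlb.get(vpn)
        if frame is None:              # TLB miss: consult the table, then cache
            self.walks += 1
            frame = self.page_table[vpn]   # KeyError models a translation exception
            self.tlb[vpn] = frame
        return frame * PAGE_SIZE + offset
```

A second access to the same page is satisfied from the TLB, so `walks` does not increase.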

Note that a computer system includes information in local (or main) memory, as well as addressing, protection, and reference and change recording. Some aspects of addressing include the format of addresses, the concept of address spaces, the various types of addresses, and the manner in which one type of address is translated to another type of address. Some main memories include persistently allocated memory locations. Main memory provides the system with directly addressable, fast-access storage of data. Both data and programs must be loaded into main memory (from input devices) before they can be processed.

Main memory may include one or more smaller, faster-access buffer memories, sometimes called caches. A cache is typically physically associated with a CPU or an I/O processor. The effects, except on performance, of the physical construction and use of distinct storage media are generally not observable by the program.

Separate caches may be maintained for instructions and for data operands. Information within a cache is maintained in contiguous bytes on an integral boundary called a cache block or cache line (or line, for short). A model may provide an EXTRACT CACHE ATTRIBUTE instruction, which returns the size of a cache line in bytes. A model may also provide PREFETCH DATA and PREFETCH DATA RELATIVE LONG instructions, which effect the prefetching of storage into the data or instruction cache or the releasing of data from the cache.

The memory is considered to be a long horizontal string of bits. For most operations, accesses to memory are made in left-to-right order. The bit string is subdivided into units of eight bits. The eight-bit unit is called a byte, which is the basic building block for all information formats. Each byte location in memory is identified by a unique non-negative integer, which is the address of the byte location, or simply, the byte address. Adjacent byte positions have consecutive addresses, starting at 0 on the left and proceeding in left to right order. The address is an unsigned binary integer and is 24, 31 or 64 bits.

Information is transferred between the memory and the CPU or channel subsystem one byte or a group of bytes at a time. Unless otherwise specified, e.g. inA group of bytes in memory is addressed by the leftmost byte of the group. The number of bytes in a group may be implied or explicitly specified by the operation to be performed. When used in CPU operations, a group of bytes is called a field. Within each group of bytes, e.g. inIn which bits are numbered in left-to-right order. In thatIn (d), the leftmost bit is sometimes referred to as the "high order" bit and the rightmost bit is referred to as the "low order" bit. However, the number of bits is not a memory address. Only bytes can be addressed. To operate on a single bit of a byte in memory, the entire byte is accessed. The bits on a byte are numbered 0 to 7 from left to right (e.g., inIn (1). Bits in the address are numbered 8-31 or 40-63 for a 24-bit address, or 1-31 or 33-63 for a 31-bit address; they are numbered 0-63 for a 64-bit address. In any other fixed length format of a plurality of bytes, the bits that make up the format are numbered consecutively starting from 0. For error detection, and preferably for correction, one or more check bits may be passed with each byte or group of bytes. Such check bits are automatically generated by the machine and cannot be directly controlled by the program. The storage capacity is expressed in number of bytes. When the length of a memory operand field is implied by the opcode of an instruction, the field is said to have a fixed length, which may be one, two, four, eight, or sixteenA byte. Larger fields may be implied for some instructions. When the length of the memory operand field is not implied but explicitly indicated, the field is said to have a variable length. Variable length operands may be variable in length in increments of one byte (or for some instructions, in multiples of two bytes or other multiples). 
When information is placed in memory, only the contents of those byte locations included in the designated field are replaced, even though the width of the physical path to memory may be greater than the length of the field being stored.

Certain units of information are located on integral boundaries in memory. A boundary is called integral for a unit of information when its memory address is a multiple of the length of the unit in bytes. Special names are given to fields of 2, 4, 8, and 16 bytes on an integral boundary. A halfword is a group of two consecutive bytes on a two-byte boundary and is the basic building block of instructions. A word is a group of four consecutive bytes on a four-byte boundary. A doubleword is a group of eight consecutive bytes on an eight-byte boundary. A quadword is a group of 16 consecutive bytes on a 16-byte boundary. When a memory address designates a halfword, word, doubleword, or quadword, the binary representation of the address contains one, two, three, or four rightmost zero bits, respectively. Instructions must be on two-byte integral boundaries. The memory operands of most instructions do not have boundary-alignment requirements.
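The boundary rule above reduces to a bit test: because each named unit length is a power of two, an address is on the unit's integral boundary exactly when its rightmost log2(length) bits are zero. A minimal sketch, with hypothetical names:

```python
# Unit lengths in bytes, as named in the text.
UNIT_LENGTHS = {"halfword": 2, "word": 4, "doubleword": 8, "quadword": 16}


def on_boundary(address: int, unit: str) -> bool:
    """True if `address` is a multiple of the unit length (an integral boundary)."""
    length = UNIT_LENGTHS[unit]
    # For a power-of-two length, a multiple has all of its low-order bits zero.
    return address & (length - 1) == 0


def rightmost_zero_bits(unit: str) -> int:
    """Number of rightmost zero bits required in an address of this unit."""
    return UNIT_LENGTHS[unit].bit_length() - 1
```

For example, address 0x1008 is on a doubleword boundary but not on a quadword boundary.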

On devices that implement separate caches for instructions and data operands, significant delays may be experienced if a program stores in a cache line and an instruction is subsequently fetched from the cache line, regardless of whether the store alters the subsequently fetched instruction.

In one embodiment, the invention may be implemented by software (sometimes referred to as licensed internal code, firmware, microcode, millicode, picocode, etc., any of which would be consistent with the invention). Referring to fig. 11, software program code embodying the present invention is typically accessible by a processor of the host system 5000 from a long term storage media device 5011, such as a CD-ROM drive, tape drive or hard drive. The software program code may be embodied on any of a variety of known media for use with a data processing system, such as a floppy disk, a hard drive, or a CD-ROM. The code may be distributed on such media, or may be distributed to users of other computer systems from the computer memory 5002 or storage devices of one computer system over the network 5010 for use by users of such other systems.

The software program code includes an operating system which controls the function and interaction of the various computer components and one or more application programs. The program code is typically paged from the storage media device 5011 to the relatively higher speed computer memory 5002 where it is available to the processor 5001. The techniques and methods for embodying software program code in memory, on physical media, and/or distributing software code via networks are well known and will not be discussed further herein. When the program code is created and stored on a tangible medium, including but not limited to an electronic memory module (RAM), flash memory, Compact Discs (CDs), DVDs, tapes, etc., it is often referred to as a "computer program product". The computer program product medium is typically readable by processing circuitry preferably located in a computer system for execution by the processing circuitry.

FIG. 15 illustrates a representative workstation or server hardware system in which the present invention may be implemented. The system 5020 of FIG. 15 comprises a representative base computer system 5021, such as a personal computer, workstation, or server, including optional peripheral devices. The base computer system 5021 includes one or more processors 5026 and a bus employed to connect and enable communication between the processors 5026 and the other components of the system 5021 in accordance with known techniques. The bus connects the processor 5026 to memory 5025 and to long-term storage 5027, which can include, for example, a hard drive (including any of magnetic media, CD, DVD, and flash memory) or a tape drive. The system 5021 may also include a user interface adapter, which connects the microprocessor 5026 via the bus to one or more interface devices, such as a keyboard 5024, a mouse 5023, a printer/scanner 5030, and/or other interface devices, which can be any user interface device, such as a touch-sensitive screen, digitized entry pad, etc. The bus may also connect a display device 5022, such as an LCD screen or monitor, to the microprocessor 5026 via a display adapter.

The system 5021 may communicate with other computers or networks of computers by way of a network adapter 5028 capable of communicating with a network 5029. Exemplary network adapters are communications channels, token ring, Ethernet, or modems. Alternatively, the system 5021 may communicate using a wireless interface, such as a CDPD (cellular digital packet data) card. The system 5021 may be associated with such other computers in a local area network (LAN) or a wide area network (WAN), or the system 5021 can be a client in a client/server arrangement with another computer, etc. All of these configurations, as well as the appropriate communications hardware and software, are known in the art.

Figure 16 illustrates a data processing network 5040 in which the present invention may be implemented. The data processing network 5040 may include a plurality of separate networks, such as wireless and wired networks, each of which may include a plurality of separate workstations 5041, 5042, 5043, 5044. Further, those skilled in the art will appreciate that one or more LANs may be included, wherein a LAN may include a plurality of intelligent workstations coupled to a host processor.

Still referring to FIG. 16, the network may also include mainframe computers or servers, such as a gateway computer (client server 5046) or application server (remote server 5048, which may access a data repository and may also be accessed directly from a workstation 5045). The gateway computer 5046 serves as a point of entry into each individual network. A gateway is needed when connecting one networking protocol to another. The gateway 5046 may preferably be coupled to another network (the Internet 5047, for example) by means of a communications link. The gateway 5046 may also be directly coupled to one or more workstations 5041, 5042, 5043, 5044 using a communications link. The gateway computer may be implemented utilizing an IBM eServer™ server available from International Business Machines Corporation.

Referring concurrently to fig. 15 and 16, software programming code which may embody the present invention may be accessed by the processor 5026 of the system 5020 from long-term storage media 5027, such as a CD-ROM drive or hard drive. The software programming code may be embodied on any of a variety of known media for use with a data processing system, such as a floppy disk, a hard drive, or a CD-ROM. The code may be distributed on such media, or from the memory or storage of one computer system over a network to users 5050, 5051 of other computer systems for use by users of such other systems.

Alternatively, the programming code may be embodied in the memory 5025 and accessed by the processor 5026 using a processor bus. Such programming code includes an operating system which controls the function and interaction of the various computer components and one or more application programs 5032. Program code is typically paged from the storage medium 5027 to high-speed memory 5025 where it is available for processing by the processor 5026. Techniques and methods for embodying software programming code in memory, on physical media, and/or distributing software code via networks are well known and will not be discussed further herein. Program code, when created and stored on tangible media, including but not limited to electronic memory modules (RAM), flash memory, Compact Discs (CDs), DVDs, tapes, etc., is commonly referred to as a "computer program product". The computer program product medium is typically readable by a processing circuit, preferably located in a computer system, for execution by the processing circuit.

The cache most readily used by the processor (which is typically faster and smaller than the other caches of the processor) is the lowest level (L1 or level 1) cache, and main storage (main memory) is the highest level cache (L3 if there are three levels). The lowest level cache is often divided into an instruction cache (I-cache) that holds the machine instructions to be executed, and a data cache (D-cache) that holds the data operands.

Referring to FIG. 17, an exemplary processor embodiment is depicted for the processor 5026. Typically, one or more levels of cache 5053 are employed to buffer memory blocks in order to improve processor performance. The cache 5053 is a high-speed buffer holding cache lines of memory data that are likely to be used. Typical cache lines are 64, 128, or 256 bytes of memory data. Separate caches are often employed for caching instructions and for caching data. Cache coherence (synchronization of copies of lines in memory and the caches) is often provided by various "snoop" algorithms well known in the art. The main memory 5025 of a processor system is often referred to as a cache. In a processor system having 4 levels of cache 5053, main memory 5025 is sometimes referred to as the level 5 (L5) cache, since it is typically faster and holds only a portion of the non-volatile storage (DASD, tape, etc.) that is available to the computer system. Main memory 5025 "caches" pages of data paged in and out of main memory 5025 by the operating system.
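To make the notion of a cache line concrete, the following sketch splits an address into a tag, a set index, and a byte offset within the line. The 256-byte line size is one of the typical sizes mentioned above; the 64-set geometry and all names are hypothetical and are not a description of cache 5053.

```python
LINE_SIZE = 256   # bytes per cache line (one of the typical sizes in the text)
NUM_SETS = 64     # hypothetical number of sets

OFFSET_BITS = LINE_SIZE.bit_length() - 1   # 8 bits select a byte within the line
INDEX_BITS = NUM_SETS.bit_length() - 1     # 6 bits select a set


def split_address(address: int) -> tuple[int, int, int]:
    """Return (tag, set index, byte offset) for a memory address."""
    offset = address & (LINE_SIZE - 1)
    index = (address >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = address >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset
```

All addresses within one 256-byte line share the same tag and set index, which is why a whole line, rather than a single byte, is the unit held in the cache.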

The program counter (instruction counter) 5061 keeps track of the address of the current instruction to be executed. The program counter in the processor is 64 bits and can be truncated to 31 or 24 bits to support prior addressing limits. The program counter is typically embodied in the PSW (program status word) of the computer so that it persists during context switching. Thus, a program in progress, having a program counter value, may be interrupted by, for example, the operating system (a context switch from the program environment to the operating system environment). The PSW of the program maintains the program counter value while the program is not active, and the program counter (in the PSW) of the operating system is used while the operating system is executing. Typically, the program counter is incremented by an amount equal to the number of bytes of the current instruction. RISC (reduced instruction set computing) instructions are typically of fixed length, while CISC (complex instruction set computing) instructions are typically of variable length. In the architecture described here, instructions are CISC instructions having a length of 2, 4, or 6 bytes. The program counter 5061 is modified by, for example, a context switch operation or the branch-taken operation of a branch instruction. In a context switch operation, the current program counter value is saved in the program status word along with other state information about the program being executed (such as condition codes), and a new program counter value is loaded that points to an instruction of the new program module to be executed. A branch-taken operation is performed in order to permit the program to make decisions or loop within the program, by loading the result of the branch instruction into the program counter 5061.
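The program-counter update described above can be sketched as follows. The text states only that instructions are 2, 4, or 6 bytes long; the sketch additionally assumes the z/Architecture encoding in which the two leftmost bits of the first opcode byte give the instruction length, and it models truncation to 24- or 31-bit addressing by masking. Function names are hypothetical.

```python
# Address masks for the three addressing modes mentioned in the text.
ADDRESS_MASKS = {24: (1 << 24) - 1, 31: (1 << 31) - 1, 64: (1 << 64) - 1}


def instruction_length(first_byte: int) -> int:
    """Assumed encoding: leftmost two opcode bits 00 -> 2, 01/10 -> 4, 11 -> 6 bytes."""
    return (2, 4, 4, 6)[(first_byte >> 6) & 0b11]


def next_instruction_address(pc: int, first_byte: int, amode: int = 64) -> int:
    """Advance past the current instruction, wrapping at the addressing-mode limit."""
    return (pc + instruction_length(first_byte)) & ADDRESS_MASKS[amode]
```

In 24-bit mode, for example, a 6-byte instruction at 0xFFFFFE is followed by the instruction at 0x000004, because the incremented address wraps at 2^24.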

Typically, instructions are fetched on behalf of the processor 5026 using an instruction fetch unit 5055. The fetch unit may fetch a "next sequence of instructions," a target instruction of a branch taken instruction, or a first instruction of a context-switched program. Present instruction fetch units typically use prefetch techniques to speculatively prefetch instructions based on the likelihood that the prefetched instructions will be used. For example, the fetch unit may fetch 16 bytes of instructions, including the next sequential instruction and additional bytes of further sequential instructions.

The fetched instructions are then executed by the processor 5026. In one embodiment, the fetched instructions are passed to a dispatch unit 5056 of the fetch unit. The dispatch unit decodes the instructions and forwards information about the decoded instructions to the appropriate units 5057, 5058, 5060. The execution unit 5057 will typically receive information about decoded arithmetic instructions from the instruction fetch unit 5055 and will perform arithmetic operations on operands according to the opcode of the instruction. Operands are preferably provided to the execution unit 5057 from storage 5025, architected registers 5059, or an immediate field of the instruction being executed. Results of the execution, when stored, are stored in storage 5025, registers 5059, or other machine hardware (such as control registers, PSW registers, and the like).

The processor 5026 typically has one or more units 5057, 5058, 5060 for executing the function of the instruction. Referring to FIG. 18A, the execution unit 5057 may communicate with architected general registers 5059, the decode/dispatch unit 5056, the load/store unit 5060, and other processor units 5065 by way of interfacing logic 5071. The execution unit 5057 may employ several register circuits 5067, 5068, 5069 to hold information that the arithmetic logic unit (ALU) 5066 will operate on. The ALU performs arithmetic operations such as add, subtract, multiply, and divide, as well as logical functions such as AND, OR, exclusive-OR (XOR), rotate, and shift. Preferably, the ALU supports specialized operations that are design dependent. Other circuits may provide other architected facilities 5072, including condition codes and recovery support logic, for example. Typically, the result of an ALU operation is held in an output register circuit 5070, which can forward the result to a variety of other processing functions. There are many arrangements of processor units; the present description is intended only to provide a representative understanding of one embodiment.

An ADD instruction, for example, would be executed in an execution unit 5057 having arithmetic and logical functionality, while a floating-point instruction, for example, would be executed in a floating-point execution unit having specialized floating-point capability. Preferably, an execution unit operates on the operands identified by an instruction by performing the function defined by the opcode on those operands. For example, an ADD instruction may be executed by the execution unit 5057 on operands found in two registers 5059 identified by register fields of the instruction.

The execution unit 5057 performs the arithmetic addition on the two operands and stores the result in a third operand, where the third operand may be a third register or one of the two source registers. The execution unit preferably utilizes an arithmetic logic unit (ALU) 5066 that is capable of performing a variety of logical functions, such as shift, rotate, AND, OR, and XOR, as well as a variety of algebraic functions, including any of add, subtract, multiply, and divide. Some ALUs 5066 are designed for scalar operations and some for floating point. Depending on the architecture, data may be big endian (where the least significant byte is at the highest byte address) or little endian (where the least significant byte is at the lowest byte address). The IBM architecture is big endian. Depending on the architecture, signed fields may be sign and magnitude, 1's complement, or 2's complement. A 2's complement number is advantageous in that the ALU does not need a separate subtraction capability, since either a negative or a positive value in 2's complement requires only an addition within the ALU. Numbers are commonly described in shorthand; for example, a 12-bit field defines an address within a 4,096-byte block and is commonly described as a 4 Kbyte block.
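The byte-order and 2's complement points above can be checked with a short Python sketch using the standard `struct` module. The 32-bit register width and the helper names are assumptions made for illustration; the subtraction helper shows that a - b can be computed with only an addition, as a + (NOT b + 1), modulo 2^32.

```python
import struct

# The same 32-bit value laid out in both byte orders.
value = 0x0A0B0C0D
big = struct.pack(">I", value)     # big endian: least significant byte at the highest address
little = struct.pack("<I", value)  # little endian: least significant byte at the lowest address

MASK32 = 0xFFFFFFFF                # assumed 32-bit register width
BLOCK_4K = 1 << 12                 # a 12-bit field spans 4,096 byte offsets


def sub32(a: int, b: int) -> int:
    """Subtract using only addition: a + (2's complement of b), modulo 2**32."""
    return (a + ((~b + 1) & MASK32)) & MASK32


def to_signed32(x: int) -> int:
    """Interpret a 32-bit pattern as a 2's complement signed integer."""
    return x - (1 << 32) if x & 0x80000000 else x
```

For instance, `sub32(5, 7)` yields the bit pattern 0xFFFFFFFE, which `to_signed32` interprets as -2.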

Referring to FIG. 18B, branch instruction information for executing a branch instruction is typically sent to a branch unit 5058, which often employs a branch prediction algorithm, such as a branch history table 5082, to predict the outcome of the branch before other conditional operations are complete. The target of the current branch instruction is fetched and speculatively executed before the conditional operations are complete. When the conditional operations are completed, the speculatively executed branch instructions are either completed or discarded, based on the conditions of the conditional operation and the speculated outcome. A typical branch instruction may test condition codes and branch to a target address if the condition codes meet the branch requirement of the branch instruction. The target address may be calculated from several numbers, including, for example, ones found in register fields or an immediate field of the instruction. The branch unit 5058 may employ an ALU 5074 having a plurality of input register circuits 5075, 5076, 5077 and an output register circuit 5080. The branch unit 5058 may communicate with, for example, general registers 5059, the decode/dispatch unit 5056, or other circuits 5073.
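Two mechanisms mentioned above, the condition-code test and the branch history table, can be sketched as follows. The mask convention (a 4-bit mask whose leftmost bit selects condition code 0) is assumed from the z/Architecture BRANCH ON CONDITION instruction. The table of 2-bit saturating counters is a common textbook predictor and is not necessarily what branch unit 5058 implements. All names are hypothetical.

```python
def branch_taken(mask: int, cc: int) -> bool:
    """Assumed convention: mask bits 0..3 (left to right) select condition codes 0..3."""
    return bool(mask & (0b1000 >> cc))


class BranchHistoryTable:
    """Hypothetical branch history table of 2-bit saturating counters."""

    def __init__(self, entries: int = 1024):
        self.entries = entries
        self.counters = [1] * entries  # 0, 1 predict not taken; 2, 3 predict taken

    def _index(self, address: int) -> int:
        # Instructions are halfword aligned, so the rightmost address bit is dropped.
        return (address >> 1) % self.entries

    def predict(self, address: int) -> bool:
        return self.counters[self._index(address)] >= 2

    def update(self, address: int, taken: bool) -> None:
        """Record the resolved outcome once the conditional operation completes."""
        i = self._index(address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

The 2-bit counter means that one mispredicted iteration of a loop does not immediately flip a strongly established prediction.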

Execution of a group of instructions can be interrupted for a variety of reasons, including, for example, a context switch initiated by an operating system, a program exception or error causing a context switch, an I/O interruption signal causing a context switch, or multithreading activity of a plurality of programs (in a multithreaded environment). Preferably, a context switch action saves state information about the currently executing program and then loads state information about another program being invoked. State information may be saved, for example, in hardware registers or in memory. State information preferably comprises a program counter value pointing to the next instruction to be executed, condition codes, memory translation information, and architected register content. Context switch activities can be implemented by hardware circuits, application programs, operating system programs, or firmware code (microcode, picocode, or licensed internal code (LIC)), alone or in combination.
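The save-then-load sequence described above can be modeled in a few lines. This is a minimal sketch: the `ProgramState` fields mirror the kinds of state information listed in the text, but the class names, the 16-register file, and the `translation_root` placeholder for memory translation information are hypothetical.

```python
from dataclasses import dataclass, field, replace


@dataclass
class ProgramState:
    """Hypothetical saved state of a program."""
    program_counter: int
    condition_code: int = 0
    registers: list[int] = field(default_factory=lambda: [0] * 16)
    translation_root: int = 0  # placeholder for memory translation information


class Processor:
    def __init__(self, state: ProgramState):
        self.state = state

    def context_switch(self, new_state: ProgramState) -> ProgramState:
        """Save the running program's state, then load the state of another program."""
        # Copy the register list so later changes do not alter the saved state.
        saved = replace(self.state, registers=list(self.state.registers))
        self.state = new_state
        return saved
```

Switching back is simply another `context_switch` call with the previously saved state.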

The processor accesses operands according to instruction-defined methods. An instruction may provide an immediate operand using the value of a portion of the instruction, or may provide one or more register fields explicitly pointing to either general-purpose registers or special-purpose registers (floating-point registers, for example). The instruction may utilize implied registers identified by the opcode field as operands. The instruction may utilize memory locations for operands. The memory location of an operand may be provided by a register, an immediate field, or a combination of registers and an immediate field, as illustrated by the long-displacement facility, wherein the instruction defines a base register, an index register, and an immediate field (displacement field) that are added together to provide, for example, the address of the operand in memory. Location herein typically means a location in main memory (main storage), unless otherwise indicated.
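The base + index + displacement computation can be sketched as follows. Two conventions are assumed from z/Architecture rather than stated in this text: a base or index field of 0 means that no register is used, and the long-displacement facility supplies a 20-bit signed displacement. The result is masked to the current addressing mode; all names are hypothetical.

```python
ADDRESS_MASKS = {24: (1 << 24) - 1, 31: (1 << 31) - 1, 64: (1 << 64) - 1}


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a 2's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def effective_address(gprs: list[int], base: int, index: int,
                      disp20: int, amode: int = 64) -> int:
    """Add base register, index register and signed displacement (assumed conventions)."""
    b = gprs[base] if base else 0    # register number 0 assumed to mean "no base"
    x = gprs[index] if index else 0  # register number 0 assumed to mean "no index"
    return (b + x + sign_extend(disp20, 20)) & ADDRESS_MASKS[amode]
```

A displacement field of 0xFFFFF therefore means -1, letting an instruction address the byte just below its base register value.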

Referring to FIG. 18C, the processor accesses memory using the load/store unit 5060. The load/store unit 5060 may perform a load operation by obtaining the address of the target operand in memory 5053 and loading the operand into a register 5059 or another memory 5053 location, or may perform a store operation by obtaining the address of the target operand in memory 5053 and storing data obtained from a register 5059 or another memory 5053 location in the target operand location in memory 5053. The load/store unit 5060 may be speculative and may access memory in a sequence that is out of order relative to instruction sequence; however, the load/store unit 5060 maintains the appearance to programs that instructions were executed in order. The load/store unit 5060 may communicate with general registers 5059, the decode/dispatch unit 5056, the cache/memory interface 5053, or other elements 5083, and comprises various register circuits, ALUs 5085, and control logic 5090 to calculate memory addresses and to provide pipeline sequencing to keep operations in order. Some operations may be out of order, but the load/store unit provides functionality to make the out-of-order operations appear to the program as having been performed in order, as is well known in the art.

Preferably, the addresses that an application program "sees" are referred to as virtual addresses. Virtual addresses are sometimes referred to as "logical addresses" and "effective addresses". These virtual addresses are virtual in that they are redirected to a physical memory location by one of a variety of dynamic address translation (DAT) techniques, including, but not limited to, simply prefixing a virtual address with an offset value, or translating the virtual address via one or more translation tables. The translation tables preferably include at least a segment table and a page table, alone or in combination, with the segment table preferably having an entry pointing to the page table. In one architecture, a translation hierarchy is provided that includes a region first table, a region second table, a region third table, a segment table, and an optional page table. The performance of address translation is often improved by utilizing a translation lookaside buffer (TLB), which comprises entries mapping a virtual address to an associated physical memory location. An entry is created when DAT translates a virtual address using the translation tables. Subsequent use of the virtual address can then utilize the entry of the fast TLB rather than the slow sequential translation table accesses. TLB content may be managed by a variety of replacement algorithms, including LRU (least recently used).
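The following sketch shows a two-level segment-table/page-table walk backed by an LRU-managed TLB. It assumes 4 KB pages and 1 MB segments, omits the region tables, and represents the tables as Python dictionaries; the class and exception names are hypothetical, and frame addresses are assumed to be page aligned.

```python
from collections import OrderedDict

PAGE_SHIFT = 12      # assumed 4 KB pages
SEGMENT_SHIFT = 20   # assumed 1 MB segments (256 pages per segment)


class TranslationException(Exception):
    """Raised when no valid segment or page table entry exists."""


class Dat:
    def __init__(self, segment_table: dict, tlb_size: int = 4):
        # segment_table: segment index -> page table (dict: page index -> frame address)
        self.segment_table = segment_table
        self.tlb = OrderedDict()   # virtual page number -> frame address, in LRU order
        self.tlb_size = tlb_size
        self.table_walks = 0

    def translate(self, vaddr: int) -> int:
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        if vpn in self.tlb:                 # fast path: TLB hit
            self.tlb.move_to_end(vpn)       # mark as most recently used
            return self.tlb[vpn] | offset
        frame = self._walk(vaddr)           # slow path: sequential table accesses
        self.tlb[vpn] = frame
        if len(self.tlb) > self.tlb_size:
            self.tlb.popitem(last=False)    # evict the least recently used entry
        return frame | offset

    def _walk(self, vaddr: int) -> int:
        self.table_walks += 1
        sx = vaddr >> SEGMENT_SHIFT
        px = (vaddr >> PAGE_SHIFT) & ((1 << (SEGMENT_SHIFT - PAGE_SHIFT)) - 1)
        page_table = self.segment_table.get(sx)
        if page_table is None or px not in page_table:
            raise TranslationException(hex(vaddr))
        return page_table[px]
```

A second access to any byte of an already translated page is served from the TLB and does not increment `table_walks`.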

Where the processor is a processor of a multiprocessor system, each processor is responsible for keeping shared resources, such as I/O, caches, TLBs, and memory, interlocked for coherency. Typically, "snoop" technologies will be utilized in maintaining cache coherency. In a snoop environment, each cache line may be marked as being in any one of a shared state, an exclusive state, a changed state, an invalid state, and the like, in order to facilitate sharing.
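The cache-line states listed above can be illustrated with a toy, MESI-style snooping model for a single line, where the "changed" state corresponds to MODIFIED. This is a simplified sketch of a well-known protocol family, not the coherency scheme of any particular machine; write-backs and data transfer are not modeled, and all names are hypothetical.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"   # the "changed" state
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class SnoopingBus:
    """Toy model of one cache line held by several caches on a snooping bus."""

    def __init__(self, n_caches: int):
        self.states = [State.INVALID] * n_caches

    def read(self, cpu: int) -> None:
        if self.states[cpu] is not State.INVALID:
            return  # read hit: no bus transaction needed
        holders = [i for i, s in enumerate(self.states)
                   if i != cpu and s is not State.INVALID]
        for i in holders:
            # Other caches snoop the read; a changed copy would be written back (not modeled).
            self.states[i] = State.SHARED
        self.states[cpu] = State.SHARED if holders else State.EXCLUSIVE

    def write(self, cpu: int) -> None:
        for i in range(len(self.states)):
            if i != cpu:
                self.states[i] = State.INVALID  # snooped invalidation
        self.states[cpu] = State.MODIFIED
```

The exclusive state lets a cache that holds the only copy write to the line later without first broadcasting an invalidation.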

An I/O unit 5054 (FIG. 17) provides the processor with means for attaching to peripheral devices, including, for example, tape, disks, printers, displays, and networks. I/O units are often presented to the computer program by software drivers. In mainframe computers, channel adapters and open system adapters are I/O units that provide communication between the operating system and peripheral devices.

Moreover, other types of computing environments may benefit from one or more aspects of the present invention. By way of example, an environment may include an emulator (e.g., software or other emulation mechanisms), in which a particular architecture (including, for example, instruction execution, architectural functions such as address translation, and architectural registers) or a subset thereof is emulated (e.g., in a native computer system having a processor and memory). In such an environment, one or more emulation functions of the emulator can implement one or more aspects of the present invention, even though the computer executing the emulator may have a different architecture than the capabilities being emulated. As one example, in emulation mode, a particular instruction or operation being emulated is decoded, and the appropriate emulation function is established to implement the single instruction or operation.

In an emulation environment, a host computer includes, for example, memory to store instructions and data; an instruction fetch unit to fetch instructions from memory and, optionally, to provide local buffering for the fetched instructions; an instruction decode unit to receive the fetched instructions and to determine the type of instructions that have been fetched; and an instruction execution unit to execute the instructions. Execution may include loading data from memory into a register; storing data from a register back to memory; or performing some type of arithmetic or logical operation, as determined by the decode unit. In one example, each unit is implemented in software. For instance, the operations being performed by the units are implemented as one or more subroutines within emulator software.
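The software fetch, decode, and execute units described above can be sketched as a small interpreter loop. The guest instruction format is entirely hypothetical: every instruction is two bytes, an opcode byte followed by a byte holding register numbers R1 (high nibble) and R2 (low nibble). Opcode 0x18 copies a register, 0x1A adds two registers, and 0x00 halts.

```python
MASK32 = 0xFFFFFFFF  # hypothetical 32-bit guest registers


class Emulator:
    """Toy emulator in which fetch, decode and execute are software routines."""

    def __init__(self, memory: bytes):
        self.memory = memory
        self.pc = 0                # emulated program counter
        self.regs = [0] * 16
        self.running = True
        # Decode table: opcode -> emulation subroutine for that single instruction.
        self.routines = {
            0x00: self.op_halt,
            0x18: self.op_load_register,
            0x1A: self.op_add_register,
        }

    def fetch(self) -> tuple[int, int]:
        opcode, operands = self.memory[self.pc], self.memory[self.pc + 1]
        self.pc += 2
        return opcode, operands

    def op_halt(self, r1: int, r2: int) -> None:
        self.running = False

    def op_load_register(self, r1: int, r2: int) -> None:
        self.regs[r1] = self.regs[r2]

    def op_add_register(self, r1: int, r2: int) -> None:
        self.regs[r1] = (self.regs[r1] + self.regs[r2]) & MASK32

    def run(self, max_steps: int = 1000) -> None:
        for _ in range(max_steps):
            if not self.running:
                break
            opcode, operands = self.fetch()           # instruction fetch unit
            routine = self.routines.get(opcode)       # instruction decode unit
            if routine is None:
                raise ValueError(f"unsupported opcode {opcode:#04x}")
            routine(operands >> 4, operands & 0x0F)   # instruction execution unit
```

For example, the program `bytes([0x18, 0x12, 0x1A, 0x13, 0x00, 0x00])` copies register 2 into register 1, adds register 3 to it, and halts.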

More particularly, in a mainframe computer, architected machine instructions are used by programmers (usually, today, "C" programmers), often by way of a compiler application. These instructions stored in a storage medium may be executed natively in a server of that architecture, or alternatively in machines executing other architectures. They can be emulated in existing and future mainframe servers and on other machines (e.g., Power Systems servers). They can be executed in machines running Linux on a wide variety of machines using hardware manufactured by AMD™ and others. Besides execution on that hardware, Linux can also be used on machines that use emulation provided by Hercules (see www.hercules-390.org/) or by FSI (Fundamental Software, Inc.) (www.funsoft.com/), where execution is generally in an emulation mode. In emulation mode, emulation software is executed by a native processor to emulate the architecture of an emulated processor.

The native processor typically executes emulation software, comprising either firmware or a native operating system, to perform emulation of the emulated processor. The emulation software is responsible for fetching and executing instructions of the emulated processor architecture. The emulation software maintains an emulated program counter to keep track of instruction boundaries. The emulation software may fetch one or more emulated machine instructions at a time and convert them to a corresponding group of native machine instructions for execution by the native processor. These converted instructions may be cached so that a faster conversion can be accomplished. Nevertheless, the emulation software maintains the architecture rules of the emulated processor architecture so as to ensure that operating systems and applications written for the emulated processor operate correctly. Furthermore, the emulation software provides resources identified by the emulated processor architecture, including, but not limited to, control registers, general-purpose registers, floating-point registers, dynamic address translation functions (including, for example, segment tables and page tables), interrupt mechanisms, context switch mechanisms, time-of-day (TOD) clocks, and architected interfaces to I/O subsystems, such that an operating system or application program designed to run on the emulated processor can be run on the native processor having the emulation software.
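Caching of converted instructions, as described above, can be sketched by translating a block of guest instructions once into a host-level function (here a Python closure standing in for native code) and reusing it on later executions. The two-byte guest format is hypothetical: an opcode byte, then R1/R2 register nibbles, with 0x18 copying a register, 0x1A adding two registers, and 0x00 ending a block. Unknown opcodes are not handled.

```python
MASK32 = 0xFFFFFFFF  # hypothetical 32-bit guest registers


def _load_register(r1: int, r2: int):
    def op(regs):
        regs[r1] = regs[r2]
    return op


def _add_register(r1: int, r2: int):
    def op(regs):
        regs[r1] = (regs[r1] + regs[r2]) & MASK32
    return op


# Hypothetical guest opcodes -> generators of host-level operations.
TRANSLATORS = {0x18: _load_register, 0x1A: _add_register}


class TranslatingEmulator:
    def __init__(self, memory: bytes):
        self.memory = memory
        self.regs = [0] * 16
        self.cache = {}          # guest block address -> translated host function
        self.translations = 0    # number of blocks actually translated

    def _translate(self, pc: int):
        ops = []
        while self.memory[pc] != 0x00:       # 0x00 ends the block
            opcode, operands = self.memory[pc], self.memory[pc + 1]
            ops.append(TRANSLATORS[opcode](operands >> 4, operands & 0x0F))
            pc += 2
        self.translations += 1

        def block(regs):
            for op in ops:
                op(regs)
        return block

    def execute_block(self, pc: int) -> None:
        block = self.cache.get(pc)
        if block is None:                    # miss: translate once, then reuse
            block = self._translate(pc)
            self.cache[pc] = block
        block(self.regs)
```

Executing the same block twice performs only one translation; the second execution runs the cached function directly, which is the speed-up the cache provides.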

A specific instruction being emulated is decoded, and a subroutine is called to perform the function of that individual instruction. An emulation software function emulating a function of an emulated processor is implemented, for example, in a "C" subroutine or driver, or by some other method of providing a driver for the specific hardware, as will be within the skill of those in the art after understanding the description of the preferred embodiment. Various software and hardware emulation patents, including, but not limited to, U.S. Patent No. 5,551,013, entitled "Multiprocessor for Hardware Emulation", by Beausoleil et al.; U.S. Patent No. 6,009,261, entitled "Preprocessing of Stored Target Routines for Emulating Incompatible Instructions on a Target Processor", by Scalzi et al.; U.S. Patent No. 5,574,873, entitled "Decoding Guest Instruction to Directly Access Emulation Routines that Emulate the Guest Instructions", by Davidian et al.; U.S. Patent No. 6,308,255, entitled "Symmetrical Multiprocessing Bus and Chipset Used for Coprocessor Support Allowing Non-Native Code to Run in a System", by Gorishek et al.; U.S. Patent No. 6,463,582, entitled "Dynamic Optimizing Object Code Translator for Architecture Emulation and Dynamic Optimizing Object Code Translation Method", by Lethin et al.; and U.S. Patent No. 5,790,825, entitled "Method for Emulating Guest Instructions on a Host Computer Through Dynamic Recompilation of Host Instructions", by Eric Traut; as well as many others, illustrate a variety of known ways to achieve emulation of an instruction format architected for a different machine for a target machine available to those skilled in the art.

In FIG. 19, an example of an emulated host computer system 5092 is provided that emulates a host computer system 5000' of a host architecture. In the emulated host computer system 5092, the host processor (CPU) 5091 is an emulated host processor (or virtual host processor) and comprises an emulation processor 5093 having a native instruction set architecture different from that of the processor 5091 of the host computer 5000'. The emulated host computer system 5092 has memory 5094 accessible to the emulation processor 5093. In the example embodiment, the memory 5094 is partitioned into a host computer memory 5096 portion and an emulation routines 5097 portion. The host computer memory 5096 is available to programs of the emulated host computer 5092 according to the host computer architecture. The emulation processor 5093 executes native instructions of an architected instruction set of an architecture other than that of the emulated processor 5091, the native instructions being obtained from the emulation routines memory 5097. The emulation processor 5093 may access a host instruction for execution from a program in the host computer memory 5096 by employing one or more instructions obtained in a sequence and access/decode routine, which may decode the accessed host instruction(s) to determine a native instruction execution routine for emulating the function of the accessed host instruction. Other facilities defined for the architecture of the host computer system 5000' may be emulated by architected facilities routines, including such facilities as general-purpose registers, control registers, dynamic address translation, I/O subsystem support, and processor cache, for example. The emulation routines may also take advantage of functions available in the emulation processor 5093 (such as general registers and dynamic translation of virtual addresses) to improve the performance of the emulation routines.
Specialized hardware and offload engines may also be provided to assist the processor 5093 in emulating the functionality of the host computer 5000'.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, and/or components.

The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below, if any, are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

Claims

1. A method of facilitating management of system memory of a computing environment, the method comprising the steps of:

in response to executing a Call Logical Processor (CLP) instruction for enabling an adapter and requesting allocation of a plurality of Direct Memory Access (DMA) address spaces to the adapter, the CLP instruction including a function handle identifying the adapter, the function handle having an adapter-not-enabled indicator, enabling one or more DMA address spaces and returning the function handle having an adapter-enabled indicator;

defining a first DMA address space of the one or more DMA address spaces enabled for the adapter in response to a Modify PCI Function Controls (MPFC) instruction specifying a register address translation parameters operation, wherein the first DMA address space is associated with one or more address translation tables, the one or more address translation tables having a first format;

receiving a request from the adapter to access system memory; and

using a requestor identifier and a DMA address space identifier provided in the request to select a DMA address space for use in the access, the DMA address space being selected from the one or more DMA address spaces enabled for the adapter, wherein the using comprises selecting another DMA address space for the adapter and associating one or more other address translation tables with the other DMA address space, the one or more other address translation tables having a second format different from the first format.
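The sequence of claim 1 (enable the adapter through CLP, register translation parameters through MPFC, then select an address space from the requestor identifier and DMA address space identifier) can be modeled in a few lines of Python. This is a non-authoritative sketch: the `ENABLED` bit position, the in-memory `Adapter` record and the function names `clp_enable`, `mpfc_register` and `select_space` are hypothetical stand-ins, not the architected formats of the CLP or MPFC instructions.

```python
from dataclasses import dataclass, field
from typing import Dict

ENABLED = 1 << 31  # hypothetical position of the "adapter enabled" indicator in a function handle


@dataclass(frozen=True)
class DmaSpace:
    table_origin: int  # hypothetical origin of the top-level address translation table
    fmt: str           # address translation table format used by this space


@dataclass
class Adapter:
    handle: int
    spaces_allowed: int = 0
    spaces: Dict[int, DmaSpace] = field(default_factory=dict)


def clp_enable(adapter: Adapter, requested_spaces: int) -> int:
    """Stand-in for CLP enable: reserve DMA address spaces and return the enabled handle."""
    if adapter.handle & ENABLED:
        raise ValueError("function already enabled")
    adapter.spaces_allowed = requested_spaces
    adapter.handle |= ENABLED
    return adapter.handle


def mpfc_register(adapter: Adapter, dmas: int, table_origin: int, fmt: str) -> None:
    """Stand-in for MPFC register-translation-parameters on one DMA address space."""
    if not adapter.handle & ENABLED:
        raise PermissionError("function not enabled")
    if not 0 <= dmas < adapter.spaces_allowed:
        raise IndexError("DMA address space was not enabled")
    adapter.spaces[dmas] = DmaSpace(table_origin, fmt)


def select_space(adapters: Dict[int, Adapter], rid: int, dmas: int) -> DmaSpace:
    """Select the DMA address space named by (requestor ID, DMA address space ID)."""
    return adapters[rid].spaces[dmas]
```

After `clp_enable(a, 2)`, registering space 0 with one format and space 1 with another lets two requests carrying the same requestor identifier resolve to spaces whose translation tables differ in format.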

2. The method of claim 1, wherein the DMA address space identifier comprises one or more bits, and wherein the method further comprises: determining, in response to executing the CLP instruction for a query group operation, which bit or bits of the address provided by the adapter are the one or more bits of the DMA address space identifier.
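Once the identifier bit positions are known, an adapter-supplied address can be split into the DMA address space identifier and the address used for translation. The Python sketch below assumes, hypothetically, that the identifier occupies one contiguous field described by a shift and a width, and that those bits are cleared before translation; the claim itself states only that one or more address bits carry the identifier.

```python
from typing import Tuple


def split_dma_address(pci_addr: int, dmas_shift: int, dmas_width: int) -> Tuple[int, int]:
    """Return (DMA address space ID, remaining address).

    dmas_shift and dmas_width are hypothetical stand-ins for whatever a
    query group operation reports about which address bits form the identifier.
    """
    field_mask = ((1 << dmas_width) - 1) << dmas_shift
    dmas_id = (pci_addr & field_mask) >> dmas_shift
    offset = pci_addr & ~field_mask  # clear the identifier bits, keep the rest
    return dmas_id, offset
```

With a single identifier bit at position 63, `split_dma_address(0x8000_0000_0000_1000, 63, 1)` yields `(1, 0x1000)`, while the same offset without bit 63 set yields identifier 0.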

3. The method of claim 1, wherein the using step comprises using the requestor identifier and the DMA address space identifier to locate an entry in a data structure associated with the adapter, the entry providing one or more characteristics of the DMA address space.

4. The method of claim 3, wherein the entry is located in a device table of an input/output hub coupled to the adapter and the system memory.
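One way to picture the device table lookup of claims 3 and 4 is as an indexed array held by the I/O hub. In this hypothetical Python sketch, the entry index is derived from the requestor identifier and the DMA address space identifier, and each entry carries characteristics such as a translation format, a table origin and an accepted address range; the field names and the indexing scheme are assumptions for illustration, not the hub's actual layout.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DeviceTableEntry:
    valid: bool
    fmt: str           # address translation format for this DMA address space
    table_origin: int  # origin of the top-level translation table
    pci_base: int      # lowest DMA address accepted for this space
    pci_limit: int     # highest DMA address accepted for this space


class DeviceTable:
    """Device table keyed by (requestor ID, DMA address space ID)."""

    def __init__(self, spaces_per_function: int) -> None:
        self.spaces_per_function = spaces_per_function
        self.entries: Dict[int, DeviceTableEntry] = {}

    def index(self, rid: int, dmas: int) -> int:
        # Hypothetical layout: each requestor ID owns a contiguous block of entries.
        if not 0 <= dmas < self.spaces_per_function:
            raise IndexError("DMA address space identifier out of range")
        return rid * self.spaces_per_function + dmas

    def install(self, rid: int, dmas: int, entry: DeviceTableEntry) -> None:
        self.entries[self.index(rid, dmas)] = entry

    def lookup(self, rid: int, dmas: int, addr: int) -> DeviceTableEntry:
        entry = self.entries.get(self.index(rid, dmas))
        if entry is None or not entry.valid:
            raise LookupError("no valid device table entry")
        if not entry.pci_base <= addr <= entry.pci_limit:
            raise LookupError("address outside the DMA address space")
        return entry
```

Keying the table on both identifiers means that two spaces belonging to the same requestor occupy distinct entries and can carry different characteristics.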

5. The method of claim 1, wherein the first format comprises a first variant of an address translation format and the second format comprises a second variant of an address translation format.

6. The method of claim 1, wherein the second format is of a different type of address translation format than the first format.

7. The method of claim 1, wherein the adapter comprises an adapter function, and wherein the request is received from the adapter function, the adapter function having a plurality of DMA address spaces allocated thereto.

8. The method of claim 1, wherein the DMA address space identifier comprises a bit of an address provided in the request, wherein a first value of the bit in combination with the requestor identifier indicates a first DMA address space and a second value of the bit in combination with the requestor identifier indicates a second DMA address space.

9. The method of claim 1, wherein the DMA address space identifier comprises one or more bits of an address provided in the request.

10. The method of claim 1, wherein the method further comprises:

receiving another request from the adapter; and

using another requestor identifier and another DMA address space identifier provided in the other request to select another DMA address space, wherein the DMA address space has a first address translation format associated therewith, and the other DMA address space has a second address translation format associated therewith, the first address translation format being different from the second address translation format.
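Claim 10 turns on two DMA address spaces whose translation formats differ. The Python sketch below assumes two made-up formats, a single-level table and a two-level hierarchical table, and dispatches on the format recorded for each space; the page size, table widths and dictionary-based tables are hypothetical and stand in for real translation table entry layouts.

```python
from typing import Callable, Dict

PAGE_SHIFT = 12                           # hypothetical 4 KiB pages for both formats
PAGE_OFFSET_MASK = (1 << PAGE_SHIFT) - 1
INDEX_BITS = 9                            # hypothetical index width per level in the two-level format


def translate_one_level(table: Dict[int, int], addr: int) -> int:
    """Hypothetical format 1: a single table maps page number to page frame address."""
    frame = table[addr >> PAGE_SHIFT]
    return frame | (addr & PAGE_OFFSET_MASK)


def translate_two_level(root: Dict[int, Dict[int, int]], addr: int) -> int:
    """Hypothetical format 2: a root table selects a page table, which maps to a frame."""
    page = addr >> PAGE_SHIFT
    upper, lower = page >> INDEX_BITS, page & ((1 << INDEX_BITS) - 1)
    frame = root[upper][lower]
    return frame | (addr & PAGE_OFFSET_MASK)


TRANSLATORS: Dict[str, Callable] = {
    "one-level": translate_one_level,
    "two-level": translate_two_level,
}


def translate(fmt: str, tables, addr: int) -> int:
    """Walk the tables of a DMA address space using that space's own format."""
    return TRANSLATORS[fmt](tables, addr)
```

Because the format is looked up per address space, one adapter can have a small space translated through a flat table and a large space translated through a deeper hierarchy.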

11. A system that facilitates management of system memory of a computing environment, the system comprising:

means for, in response to executing a Call Logical Processor (CLP) instruction for enabling an adapter and requesting allocation of a plurality of Direct Memory Access (DMA) address spaces to the adapter, the CLP instruction including a function handle identifying the adapter, the function handle having an adapter-not-enabled indicator, enabling one or more DMA address spaces and returning the function handle having an adapter-enabled indicator, the DMA address spaces being associated with one or more address translation tables, the one or more address translation tables having a first format;

means for defining a first DMA address space of the enabled one or more DMA address spaces for the adapter in response to a Modify PCI Function Controls (MPFC) instruction specifying a register address translation parameters operation;

means for receiving a request from the adapter to access system memory; and

means for using a requestor identifier and a DMA address space identifier provided in the request to select a DMA address space for use in the access, the DMA address space being selected from the one or more DMA address spaces enabled for the adapter, comprising: means for selecting another DMA address space for the adapter and associating one or more other address translation tables with the other DMA address space, the one or more other address translation tables having a second format different from the first format.

12. The system of claim 11, wherein the DMA address space identifier comprises one or more bits, and wherein the system further comprises: means for determining, in response to executing the CLP instruction for a query group operation, which bit or bits of the address provided by the adapter are the one or more bits of the DMA address space identifier.

13. The system of claim 11, wherein the means for using the requestor identifier and the DMA address space identifier provided in the request to select the DMA address space used in the access comprises: means for using the requestor identifier and the DMA address space identifier to locate an entry in a data structure associated with the adapter, the entry providing one or more characteristics of the DMA address space.

14. The system of claim 13, wherein the entry is located in a device table of an input/output hub coupled to the adapter and the system memory.

15. The system of claim 13, wherein the first format comprises a first variant of an address translation format and the second format comprises a second variant of an address translation format.

16. The system of claim 13, wherein the second format is of a different type of address translation format than the first format.

17. The system of claim 13, wherein the adapter comprises an adapter function, and wherein the request is received from the adapter function, the adapter function having a plurality of DMA address spaces allocated thereto.

18. The system of claim 13, wherein the DMA address space identifier comprises a bit of an address provided in the request, wherein a first value of the bit in combination with the requestor identifier indicates a first DMA address space and a second value of the bit in combination with the requestor identifier indicates a second DMA address space.

19. The system of claim 13, wherein the DMA address space identifier comprises one or more bits of an address provided in the request.

20. The system of claim 13, wherein the system further comprises:

means for receiving another request from the adapter; and

means for using another requestor identifier and another DMA address space identifier provided in the other request to select another DMA address space, wherein the DMA address space has a first address translation format associated therewith, and the other DMA address space has a second address translation format associated therewith, the first address translation format being different from the second address translation format.