
HK1180800B - Converting a message signaled interruption into an I/O adapter event notification to a guest operating system


Info

Publication number
HK1180800B
Authority
HK
Hong Kong
Prior art keywords
adapter
interrupt
guest
indicator
host
Prior art date
Application number
HK13108098.9A
Other languages
Chinese (zh)
Other versions
HK1180800A1 (en)
Inventor
G.斯特曼三世
D.克拉多克
J.伊斯顿
M.法雷尔
T.格雷格
D.L.奥西塞克
F.布赖斯
Original Assignee
International Business Machines Corporation
Priority date
Filing date
Publication date
Priority claimed from US12/821,177 (US8468284B2)
Application filed by International Business Machines Corporation
Publication of HK1180800A1
Publication of HK1180800B


Description

Converting message signaled interrupts to I/O adapter event notifications to guest operating systems
Technical Field
The present invention relates generally to interrupt handling in a computing environment, and more particularly to handling interrupts generated by adapters in a computing environment.
Background
Message Signaled Interrupts (MSI) is a method used by adapter functions, such as Peripheral Component Interconnect (PCI) functions, to generate Central Processing Unit (CPU) interrupts to notify an operating system of an event or the presence of a certain state. The MSI is an alternative to having a dedicated interrupt pin on each device. When an adapter function is configured to use MSI, the function requests an interrupt by performing an MSI write operation that writes a specified number of bytes of data to a special address. The combination of this special address and the unique data value is called an MSI vector.
Some adapter functions support only one MSI vector; other adapter functions support multiple MSI vectors. For functions that support multiple MSI vectors, the same special address is used with different data values.
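The relationship between the special address and the data values might be sketched as follows. This is a minimal illustration rather than part of the described embodiments; the class name MsiVector, the helper make_vectors, the sample address and data values, and the use of consecutive data values are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MsiVector:
    address: int  # the special address targeted by the MSI write
    data: int     # the data value that distinguishes this vector


def make_vectors(address: int, first_data: int, count: int) -> List[MsiVector]:
    """Build `count` vectors sharing one address and differing only in data.

    Consecutive data values are an assumption made for this sketch.
    """
    return [MsiVector(address, first_data + i) for i in range(count)]


# Hypothetical sample values, chosen only for illustration.
vectors = make_vectors(address=0xFEE00000, first_data=0x40, count=3)
```

All three vectors carry the same address, so only the data value tells them apart.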
On many computing platforms, the device driver configures itself as an interrupt handler associated with the MSI vector. This effectively associates the MSI vector with an entry in the CPU interrupt vector. Thus, when an adapter function supports multiple MSI vectors and is configured to use multiple MSI vectors, it consumes a corresponding number of entries in the CPU interrupt vector.
U.S. Publication No. 2007/0271559 A1, entitled "Virtualization of Infiniband Host Channel Adapter Interruptions," by Easton et al., published November 22, 2007, describes a method, system, program product, and computer data structure for providing two levels of server virtualization. A first hypervisor enables multiple logical partitions to share a set of resources and provides a first level of virtualization. A second hypervisor enables multiple independent virtual machines to share the resources assigned to a single logical partition and provides a second level of virtualization. All events for all virtual machines within the single logical partition are grouped into a single partition-owned event queue that receives event notifications from the shared resources of that logical partition. An interruption request is signaled for the grouped events from the partition-owned event queue, so that the machine can demultiplex the grouped events from the partition-owned event queue into the individual virtualized event queues allocated to each virtual machine.
Publication No. WO 96/00940 A1, entitled "PCI to ISA Interrupt Protocol Converter and Selection Mechanism," by Nelson et al., published 10/1, 1996, describes an interrupt handling mechanism for converting PCI agent interrupts (36-38) that conform to the secondary bus standard interrupt protocol. PCI agent interrupts (36-38) are processed by programmable logic (50) to translate PCI-compliant interrupts into, for example, ISA bus standard compliant interrupts (40) for processing by a computer system that implements both a PCI bus (30) and an ISA bus (40). A programmable register (48) selects which ISA interrupts are to be generated by programmable logic (50) in response to PCI agent interrupts (36-38).
U.S. Patent No. 6,772,264, entitled "Enabling a Docking Station for ISA Adapters," by Dayan et al., issued August 3, 2004, describes a docking system for use with a computer system that includes an externally accessible PC Card interface for transmitting signals conforming to the PC Card standard to a docking station housing. The docking station housing includes a PC Card connector that connects to, and passes interface signals between, the PC Card interface of the computer system and the docking station housing. The docking station housing also includes an ISA bus structure conforming to an ISA bus standard. In addition, the docking station housing contains translation logic connected to receive signals from the computer system through the PC Card connector and to translate these received signals into signals for operating the ISA bus structure. The computer system includes conversion logic connected to receive signals from the docking station housing through the PC Card connector and to convert these signals into a system interrupt request. In this manner, one or more ISA adapters may be used in the docking station housing to emulate one or more PC Card functions on a PC Card interface.
Disclosure of Invention
According to aspects of the present invention, a capability is provided that facilitates managing interrupt requests from adapters.
The shortcomings of the prior art are overcome and advantages are provided through the provision of a method as claimed in claim 1 and a computer program product for providing an interrupt to a guest of a computing environment.
Drawings
One or more aspects of the present invention are particularly pointed out and distinctly claimed as examples in the claims at the conclusion of the specification. The above and other objects, features and advantages of the present invention will become apparent from the following detailed description when taken in conjunction with the accompanying drawings, in which:
FIG. 1A depicts one embodiment of a computing environment to incorporate and use one or more aspects of the present invention;
FIG. 1B depicts one embodiment of a central processing complex in which a host executes one or more guests, in accordance with an aspect of the present invention;
FIG. 2A depicts one embodiment of the system memory and I/O hub of FIG. 1A in greater detail, in accordance with an aspect of the present invention;
FIG. 2B depicts another embodiment of the system memory and I/O hub of FIG. 1A in greater detail, in accordance with an aspect of the present invention;
FIG. 2C depicts one embodiment of an entry in a Guest Adapter Interruption Table (GAIT), used in accordance with an aspect of the present invention;
FIG. 2D depicts one embodiment of an entry in an Adapter Interruption Forwarding Table (AIFT), used in accordance with an aspect of the present invention;
FIGS. 3A-3B illustrate examples of allocation of adapter interruption bit vectors, in accordance with an aspect of the present invention;
FIGS. 3C-3D illustrate examples of allocation of adapter interruption summary bits, in accordance with an aspect of the present invention;
FIG. 4 depicts one embodiment of an overview of the logic to be performed at initialization to configure adapter functionality for I/O adapter event notification, in accordance with an aspect of the present invention;
FIG. 5 depicts one embodiment of the logic to perform registration to enable conversion of a Message Signaled Interruption (MSI) to an I/O adapter event notification, in accordance with an aspect of the present invention;
FIG. 6A depicts one embodiment of the logic to convert an MSI request into an I/O adapter event notification, in accordance with an aspect of the present invention;
FIG. 6B depicts one embodiment of the logic to present I/O adapter event notifications to the operating system, in accordance with an aspect of the present invention;
FIG. 7A depicts one embodiment of the logic to perform host initialization in the environment of a host executing a guest, in accordance with an aspect of the present invention;
FIG. 7B depicts one embodiment of the logic to perform guest initialization, in accordance with an aspect of the present invention;
FIG. 7C depicts one embodiment of a set interrupt control instruction, used in accordance with an aspect of the present invention;
FIGS. 7D-7F illustrate examples of the contents of fields used by the Set Interruption Controls instruction of FIG. 7C, in accordance with an aspect of the present invention;
FIG. 7G illustrates one example of an Adapter Interruption Parameter Block (AIPB) used in accordance with an aspect of the present invention;
FIG. 8 depicts one embodiment of the logic to perform registration in a computing environment in which a host executes a guest, in accordance with an aspect of the present invention;
FIG. 9A depicts one embodiment of the logic to convert an MSI request into an I/O adapter event notification to be presented to a guest, in accordance with an aspect of the present invention;
FIG. 9B depicts one embodiment of the logic performed in response to an adapter interruption request, in accordance with an aspect of the present invention;
FIG. 9C depicts one embodiment of the logic to handle an adapter interruption indicator, in accordance with an aspect of the present invention;
FIG. 10A depicts one embodiment of a Modify PCI function controls instruction, used in accordance with an aspect of the present invention;
FIG. 10B depicts one embodiment of the fields used by the Modify PCI function controls instruction of FIG. 10A, in accordance with an aspect of the present invention;
FIG. 10C depicts one embodiment of another field used by the Modify PCI function controls instruction of FIG. 10A, in accordance with an aspect of the present invention;
FIG. 10D depicts one embodiment of the contents of a Function Information Block (FIB), used in accordance with an aspect of the present invention;
FIG. 11 depicts one embodiment of an overview of the logic to modify PCI function control, in accordance with an aspect of the present invention;
FIG. 12 depicts one embodiment of the logic associated with a register adapter interruptions operation specified by the Modify PCI function controls instruction, in accordance with an aspect of the present invention;
FIG. 13 depicts one embodiment of the logic associated with an unregister adapter interruptions operation specified by the Modify PCI function controls instruction, in accordance with an aspect of the present invention;
FIG. 14A depicts one embodiment of a Call logical processor instruction, used in accordance with an aspect of the present invention;
FIG. 14B depicts one embodiment of a request block used by the Call logical processor instruction of FIG. 14A for a list operation, in accordance with an aspect of the present invention;
FIG. 14C depicts one embodiment of a response block for the list operation of FIG. 14B, in accordance with an aspect of the present invention;
FIG. 14D depicts one embodiment of a function list item, used in accordance with an aspect of the present invention;
FIG. 15A depicts one embodiment of a request block used by the Call logical processor instruction of FIG. 14A for a query function operation, in accordance with an aspect of the present invention;
FIG. 15B depicts one embodiment of a response block for the query function operation of FIG. 15A, in accordance with an aspect of the present invention;
FIG. 16A depicts one embodiment of a request block used by the Call logical processor instruction of FIG. 14A for a query group operation, in accordance with an aspect of the present invention;
FIG. 16B depicts one embodiment of a response block for the query group operation of FIG. 16A, in accordance with an aspect of the present invention;
FIG. 17 depicts one embodiment of a computer program product incorporating one or more aspects of the present invention;
FIG. 18 depicts one embodiment of a host computer system to incorporate and use one or more aspects of the present invention;
FIG. 19 depicts another embodiment of a computer system to incorporate and use one or more aspects of the present invention;
FIG. 20 illustrates another example of a computer system including a computer network that incorporates and uses one or more aspects of the present invention;
FIG. 21 depicts one embodiment of the elements of a computer system incorporating and using one or more aspects of the present invention;
FIG. 22A depicts one embodiment of an execution unit of the computer system of FIG. 21 to incorporate and use one or more aspects of the present invention;
FIG. 22B depicts one embodiment of a branching unit of the computer system of FIG. 21 that incorporates and uses one or more aspects of the present invention;
FIG. 22C depicts one embodiment of a load/store unit of the computer system of FIG. 21 incorporating and using one or more aspects of the present invention;
FIG. 23 illustrates one embodiment of an emulated host computer system incorporating and using one or more aspects of the present invention.
Detailed Description
In accordance with aspects of the present invention, a capability is provided to convert a message signaled interruption (MSI) request into an input/output (I/O) adapter event notification. The MSI is requested by an adapter and converted to an adapter event notification, in which one or more specific indicators are set and a request is made to present an interruption to the operating system (or other software, such as other programs; as used herein, the term operating system includes operating system device drivers). In one particular example, not every MSI request results in an interruption request to the operating system; instead, a single interruption request may cover multiple MSI requests. In one particular example, the operating system is a guest (e.g., a guest operating system) executing under a host. In one example, the guest is a pageable storage mode guest.
In one example, in the z/Architecture, a pageable guest is interpretively executed via the Start Interpretive Execution (SIE) instruction at level 2 of interpretation. For example, a logical partition (LPAR) hypervisor executes the SIE instruction to begin a logical partition in physical, fixed memory. If z/VM is the operating system in that logical partition, it issues the SIE instruction to execute its guest (virtual) machines in its V=V (virtual) memory. Thus, the LPAR hypervisor uses level-1 SIE, and the z/VM hypervisor uses level-2 SIE.
As used herein, the term "adapter" includes any type of adapter (e.g., storage adapter, processing adapter, network adapter, cryptographic adapter, PCI adapter, other types of input/output adapters, etc.). In one embodiment, an adapter includes an adapter function. However, in other embodiments, an adapter may include multiple adapter functions. One or more aspects of the present invention may be applied regardless of whether an adapter includes one adapter function or multiple adapter functions. Further, in the examples presented herein, adapters are used interchangeably with adapter functions (e.g., PCI functions) unless otherwise noted.
One embodiment of a computing environment to incorporate and use one or more aspects of the present invention is described with reference to FIG. 1A. In one example, computing environment 100 is a System z server offered by International Business Machines Corporation. System z is based on the z/Architecture offered by International Business Machines Corporation. Details regarding the z/Architecture are described in an IBM publication entitled "z/Architecture Principles of Operation," IBM Publication No. SA22-7832-07, February 2009. IBM, System z and z/Architecture are registered trademarks of International Business Machines Corporation, Armonk, New York. Other names used herein may be registered trademarks, trademarks or product names of International Business Machines Corporation or other companies.
In one example, computing environment 100 includes one or more Central Processing Units (CPUs) 102 coupled to a system memory 104 (also referred to as main memory) via a memory controller 106. To access system memory 104, central processing unit 102 issues a read or write request that includes an address that is used to access system memory. The address included in the request is typically not directly usable to access system memory, and therefore, it is converted to an address that is directly usable to access system memory. The address is translated via a translation mechanism (XLATE) 108. For example, addresses are translated from virtual addresses to real or absolute addresses using, for example, Dynamic Address Translation (DAT).
A request including an address (translated if necessary) is received by memory controller 106. In one example, memory controller 106 comprises hardware and is used to arbitrate for access to system memory and to maintain memory coherency. This arbitration is performed for requests received from CPU 102 as well as for requests received from one or more adapters 110. Like the central processing unit, an adapter issues requests to system memory 104 to gain access to the system memory.
In one example, adapter 110 is a Peripheral Component Interconnect (PCI) or PCI express (PCIe) adapter that includes one or more PCI functions. The PCI function issues a request that is routed to an input/output hub 112 (e.g., a PCI hub) via one or more switches (e.g., PCIe switches) 114. In one example, an input/output hub includes hardware including one or more state machines.
The input/output hub includes, for example, a root complex 116 that receives requests from a switch. The request includes an input/output address that is used, for example, to perform a direct memory access (DMA) or to request a message signaled interruption (MSI). The address is provided to an address translation and protection unit 118, which accesses information used for the DMA or MSI request.
For DMA operations, address translation and protection unit 118 may translate addresses to addresses that may be used to access system memory. The request initiated from the adapter, including the translated address, is then provided to the memory controller 106, e.g., via the I/O to memory bus 120. The memory controller performs its arbitration and forwards the request with the translated address to the system memory at the appropriate time.
For an MSI request, information in address translation and protection unit 118 is obtained to facilitate translation of the MSI request to an I/O adapter event notification.
In another embodiment, in addition to or in place of the one or more central processing units 102, a central processing complex, such as the one shown in FIG. 1B, is coupled to memory controller 106. In this particular example, central processing complex 150 provides virtual machine support. Central processing complex 150 includes, for example, one or more virtual machines 152, one or more central processors (CPs) 154, and at least one hypervisor 156, each of which is described below.
The virtual machine support of the central processing complex provides the ability to operate a large number of virtual machines, each capable of executing a guest operating system 158 such as z/Linux. Each virtual machine 152 can operate as a separate system. That is, each virtual machine may be independently reset, execute a guest operating system, and run a different program. An operating system or application running on a virtual machine appears to have access to a full and complete system, but in practice, only a portion is available.
In this particular example, the model of the virtual machine is a V=V model, in which the memory of the virtual machine is backed by virtual memory rather than real memory. Each virtual machine has a virtual linear memory space. The physical resources are owned by a hypervisor 156, such as a VM hypervisor, and the shared physical resources are dispatched by the hypervisor to the guest operating systems as needed to meet their processing demands. The V=V virtual machine model assumes that the interactions between the guest operating systems and the physically shared machine resources are controlled by the VM hypervisor, since the large number of guests typically precludes the hypervisor from simply partitioning the hardware resources and assigning them to the configured guests. One or more aspects of the V=V model are further described in an IBM publication entitled "z/VM: Running Guest Operating Systems," IBM Publication No. SC24-5997-02, October 2001.
Central processors 154 are physical processor resources that may be assigned to virtual machines. For example, virtual machine 152 includes one or more logical processors, each representing all or a portion of the physical processor resources 154 that may be dynamically allocated to the virtual machine. Virtual machine 152 is managed by hypervisor 156. By way of example, the hypervisor may be implemented in firmware running on processors 154 or may be part of a host operating system running on the machine. In one example, hypervisor 156 is a VM hypervisor, such as z/VM, offered by International Business Machines Corporation, Armonk, New York. One embodiment of z/VM is described in an IBM publication entitled "z/VM: General Information Manual," IBM Publication No. GC24-5991-05, May 2003.
As used herein, firmware includes, for example, the microcode, millicode, and/or macrocode of the processor. It includes, for example, the hardware-level instructions and/or data structures used in the implementation of higher-level machine code. In one embodiment, it includes, for example, proprietary code that is typically delivered as microcode, that contains trusted software or microcode specific to the underlying hardware, and that controls operating system access to the system hardware.
According to an aspect of the present invention, a message signaled interrupt request issued by an adapter (e.g., an adapter function) is converted to an input/output adapter event notification to be presented to a guest (e.g., a pageable storage mode guest; i.e., a V = V guest). Although in this embodiment I/O adapter event notifications are intended for the guest, to facilitate an understanding of one or more aspects of the present invention, the presentation of I/O adapter event notifications to an operating system (or other entity, such as a processor, logical partition, etc.) that is not a guest is discussed first herein, followed by a discussion of the presentation of notifications to the guest.
Further details regarding the I/O hub and system memory as they relate to interrupt processing will be described with reference to FIGS. 2A and 2B. In these figures, the memory controller is not shown, but may be used. The I/O hub may be coupled to the system memory 104 and/or the processor 254 directly or through a memory controller. FIG. 2A illustrates one embodiment of a structure used to present adapter event notifications to an operating system that is not a guest, and FIG. 2B illustrates one embodiment of a structure used to present adapter event notifications to a guest.
Referring to FIG. 2A, in one example, system memory 104 includes one or more data structures that may be used to facilitate interrupt processing. In this example, system memory 104 includes an Adapter Interruption Bit Vector (AIBV) 200 and an optional Adapter Interruption Summary Bit (AISB) 202 associated with a particular adapter. For each adapter, there may be an AIBV and a corresponding AISB.
In one example, adapter interruption bit vector 200 is a one-dimensional array of one or more indicators (e.g., bits) in main memory that is associated with an adapter (e.g., a PCI function). The bits in the adapter interruption bit vector represent MSI vector numbers. A bit set to 1 in the AIBV indicates an event condition or type for the associated adapter. In the example of a PCI function, each bit in the associated AIBV corresponds to an MSI vector. Thus, if a PCI function supports only one MSI vector, its AIBV includes a single bit; if a PCI function supports multiple MSI vectors, its AIBV includes one bit per MSI vector. In the example shown in FIG. 2A, the PCI function supports multiple MSI vectors (e.g., 3), and thus there are multiple bits (e.g., 3) in AIBV 200. Each bit corresponds to a particular event. For example, bit 0 of the AIBV, when set to 1, indicates a completed operation, and bit 1 of the AIBV, when set to 1, indicates an error event. As shown, bit 1 is set in this example.
In one particular example, a command (e.g., a Modify PCI function controls command) is used to specify the AIBV for a PCI function. In particular, the command is issued by the operating system and specifies the identity of the PCI function, the main storage location of the area containing the AIBV, the offset from that location to the first bit of the AIBV, and the number of bits that make up the AIBV.
In one example, the identity of the PCI function is a function handle. The function handle includes, for example, an enable indicator that indicates whether the PCI function handle is enabled; a PCI function number that identifies the function (this is a static identifier and can be used as an index into a function table to locate a particular entry); and an instance number that indicates the particular instance of the function handle. For example, each time the function handle is enabled, the instance number is incremented to provide a new instance number. The function handle is used to locate a function table entry in a function table that contains one or more entries. For example, one or more bits of the function handle are used as an index into the function table to locate a particular function table entry. The function table entry includes information about its associated PCI function. For example, it may include various indicators of the status of its associated adapter function, and it may include one or more device table entry indexes used to locate the device table entries for that adapter function. (To the operating system, in one embodiment, the handle is simply an opaque identifier of the adapter.)
The AIBV may be allocated on any byte boundary and any bit boundary. This gives the operating system the flexibility to pack the AIBVs of multiple adapters into a contiguous range of bits and bytes. For example, as shown in FIG. 3A, the operating system has specified a common storage area at location X to contain five contiguous AIBVs. The adapter associated with each AIBV is identified by the letters A-E. The event that each AIBV bit represents for an adapter is further identified by the numbers 0-n. Unallocated bits are identified by the lower case letter "u".
Another example is shown in FIG. 3B. In this example, the operating system has specified three unique storage areas, at locations X, Y and Z, to contain the AIBVs for five I/O adapters. The storage area at location X contains the AIBVs for adapters A and B, the storage area at location Y contains the AIBV for adapter C only, and the storage area at location Z contains the AIBVs for adapters D and E. The event that each AIBV bit represents for an I/O adapter is further identified by the numbers 0-n. Unallocated bits are identified by the lower case letter "u".
Returning to FIG. 2A, in addition to the AIBV, in this example there is an AISB 202 for the adapter, which contains a single indicator (e.g., bit) associated with the adapter. An AISB having a value of 1 indicates that one or more bits in the AIBV associated with the AISB have been set to 1. The AISB is optional, and there may be one for each adapter, one for each selected adapter, or one for a group of adapters.
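The AIBV of the FIG. 2A example might be modeled as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the AIBV is held in a Python bytearray standing in for main memory, and bit 0 is taken to be the leftmost (most significant) bit of a byte.

```python
from typing import List


def set_aibv_bit(storage: bytearray, bit_offset: int, vector_number: int) -> None:
    """Set the AIBV bit that represents MSI vector `vector_number`.

    `bit_offset` is the position of AIBV bit 0 within `storage`.
    Assumption: bit 0 is the leftmost bit of a byte.
    """
    pos = bit_offset + vector_number
    storage[pos // 8] |= 0x80 >> (pos % 8)


def pending_vectors(storage: bytearray, bit_offset: int, num_bits: int) -> List[int]:
    """Return the MSI vector numbers whose AIBV bits are currently 1."""
    result = []
    for n in range(num_bits):
        pos = bit_offset + n
        if storage[pos // 8] & (0x80 >> (pos % 8)):
            result.append(n)
    return result


aibv = bytearray(1)        # one byte is enough for a 3-bit AIBV
set_aibv_bit(aibv, 0, 1)   # bit 1: the error event in the example above
```

After the call, scanning the three AIBV bits reports only vector number 1 as pending, which matches the state described for FIG. 2A.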
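The handle fields and the function table lookup described above might be sketched as follows. The field widths and the handle encoding are not given in the text, so the sketch keeps the three parts as separate fields; the names FunctionHandle, enable and lookup, and the contents of the sample table entry, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class FunctionHandle:
    enabled: bool         # enable indicator
    function_number: int  # static identifier, usable as a function table index
    instance: int         # changes each time the handle is enabled


def enable(handle: FunctionHandle) -> FunctionHandle:
    """Return an enabled handle with a new (incremented) instance number."""
    return FunctionHandle(True, handle.function_number, handle.instance + 1)


def lookup(function_table: Dict[int, dict], handle: FunctionHandle) -> dict:
    """Locate the function table entry using the function number as the index."""
    return function_table[handle.function_number]


function_table = {5: {"dte_index": 2}}  # hypothetical entry contents
handle = enable(FunctionHandle(enabled=False, function_number=5, instance=0))
```

Because the instance number changes on every enable, a handle saved before a disable and re-enable no longer compares equal to the current one, which is one way stale handles could be detected.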
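Packing several AIBVs into one common storage area, as in the single-area example, might be sketched as follows. The function name pack_aibvs and the per-adapter bit counts are hypothetical (the figure is not reproduced here); the sketch only shows that each AIBV can start at an arbitrary byte and bit position.

```python
from typing import Dict, Tuple


def pack_aibvs(sizes: Dict[str, int]) -> Dict[str, Tuple[int, int, int]]:
    """Assign each adapter a contiguous bit range in one storage area.

    Returns adapter -> (byte offset, bit offset within that byte, bit count).
    """
    layout = {}
    cursor = 0  # next free bit, counted from the start of the storage area
    for adapter, nbits in sizes.items():
        layout[adapter] = (cursor // 8, cursor % 8, nbits)
        cursor += nbits
    return layout


# Hypothetical vector counts for adapters A-E.
layout = pack_aibvs({"A": 3, "B": 1, "C": 5, "D": 2, "E": 4})
```

With these sizes, the AIBV for adapter C starts in the middle of the first byte and spills into the second, so the five AIBVs occupy 15 contiguous bits with no padding between them.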
In one particular implementation of a PCI function, a command (e.g., a Modify PCI function controls command) is used to specify an AISB for the PCI function. In particular, the command is issued by the operating system and specifies the identity (e.g., handle) of the PCI function, the main storage location of the area containing the AISB, an offset from that location to the AISB, and an adapter interruption summary notification enable control indicating that a summary bit is present.
The AISB may be allocated on any byte boundary and any bit boundary. This gives the operating system the flexibility to pack the AISBs of multiple adapters into a contiguous range of bits and bytes. In one example, as shown in FIG. 3C, the operating system has specified a common storage area at location X to contain nine contiguous AISBs. The adapter associated with each AISB is identified by the letters A-I. Unallocated bits are identified by the lower case letter "u".
Another allocation example is shown in FIG. 3D, where the operating system has designated three unique AISB storage locations at locations X, Y and Z to contain the AISBs for each of the three adapters. The adapter associated with each AISB is identified by the letters A-C. Unallocated bits are identified by the lower case letter "u".
In addition, the operating system may also assign a single AISB to multiple PCI functions. This associates multiple AIBVs with a single summary bit. Thus, such an AISB having a value of 1 indicates that the operating system should scan multiple AIBVs.
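The use of a shared summary bit might be sketched as follows: the operating system looks at the AIBVs only when the AISB is 1, and then scans every AIBV associated with that AISB. The function name and the representation of each AIBV as a list of 0/1 values are assumptions made for the sketch.

```python
from typing import Dict, List, Tuple


def scan_on_summary(aisb: int, aibvs: Dict[str, List[int]]) -> List[Tuple[str, int]]:
    """Return (function, vector number) pairs for every AIBV bit that is set.

    `aibvs` maps each PCI function sharing the AISB to its AIBV bits.
    Nothing is scanned when the summary bit is 0.
    """
    if not aisb:
        return []
    return [(fn, n)
            for fn, bits in aibvs.items()
            for n, bit in enumerate(bits)
            if bit]


events = scan_on_summary(1, {"fn0": [0, 1, 0], "fn1": [1, 0]})
```

The summary bit lets the scan be skipped entirely in the common case where no adapter in the group has signaled an event.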
Returning to FIG. 2A, in one example, the AIBV and AISB are pointed to by addresses in a device table entry 206 of a device table 208 located in I/O hub 112. In one example, device table 208 is located within the address translation and protection unit of the I/O hub.
The device table 208 includes one or more entries 206, each of which is assigned to a particular adapter function 210. The device table entry 206 includes several fields that may be populated using the commands described above, for example. The values of one or more fields are based on policy and/or configuration. Examples of fields include:
Interruption subclass (ISC) 214: indicates the interruption subclass for the interruption. The ISC identifies a maskable class of adapter interruptions that may be associated with a priority with which the operating system processes the interruption;
AIBV address 216: provides, for example, an absolute address of the beginning of the storage location that contains the AIBV for the particular adapter function assigned to this device table entry;
AIBV offset 218: an offset from the beginning of the main storage location to the beginning of the AIBV;
AISB address 220: provides, for example, an absolute address of the beginning of the storage location that contains the AISB for this PCI function, if the operating system has specified an AISB;
AISB offset 222: an offset from the beginning of the main storage location to the AISB;
Adapter summary notification enable control (enable) 224: indicates whether an AISB is present;
Number of interruptions (NOI) 226: indicates the maximum number of MSI vectors allowed for this PCI function, with zero indicating that none are allowed.
In other embodiments, the DTE (device table entry) may include more, less or different information.
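The fields listed above might be collected in a record such as the following. This is only a sketch: field widths and encodings are not given in the text, the class and attribute names are hypothetical, and the bounds check in accepts is an assumed reading of the NOI field (a vector number is valid only if it is below the allowed maximum).

```python
from dataclasses import dataclass


@dataclass
class DeviceTableEntry:
    isc: int              # interruption subclass
    aibv_address: int     # start of the storage location holding the AIBV
    aibv_offset: int      # offset from that location to AIBV bit 0
    aisb_address: int     # start of the storage location holding the AISB
    aisb_offset: int      # offset from that location to the AISB
    summary_enable: bool  # whether an AISB is present
    noi: int              # maximum number of MSI vectors allowed (0 = none)

    def accepts(self, vector_number: int) -> bool:
        """Assumed check: the vector number must be below the NOI limit."""
        return 0 <= vector_number < self.noi


# Hypothetical values for a function that is allowed three MSI vectors.
dte = DeviceTableEntry(isc=3, aibv_address=0x1000, aibv_offset=5,
                       aisb_address=0x2000, aisb_offset=1,
                       summary_enable=True, noi=3)
```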
In one embodiment, a Requestor Identifier (RID) (and/or a portion of an address), for example, located in a request issued by an adapter (e.g., PCI function 210) is used to locate a device table entry to be used for a particular interrupt requested by the adapter. The requestor ID (e.g., a 16-bit value specifying, for example, a bus number, a device number, and a function number) and an address to be used for the interrupt are included in the request. The request, including the RID and address, is provided, for example, via a switch to a content addressable memory (CAM 230), for example, and the content addressable memory is used to provide an index value. For example, the CAM includes a plurality of entries, each entry corresponding to an index into a Device Table (DT). Each CAM entry includes the value of the RID. If, for example, the received RID matches a value contained in an entry in the CAM, the corresponding device table index is used to locate a device table entry. That is, the output of the CAM is used to index into the device table 208. If there is no match, the received packet is discarded (in other embodiments, no CAM or other lookup is needed, and the RID is used as an index). As described herein, the located DTE is used to process interrupt requests.
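The CAM-based lookup might be sketched as follows, with a dictionary standing in for the content addressable memory. The function names and table contents are hypothetical; the split of the 16-bit requestor ID into an 8-bit bus number, a 5-bit device number and a 3-bit function number follows the usual PCI convention and is an assumption here, since the text gives the three parts only as an example.

```python
from typing import Dict, List, Optional


def make_rid(bus: int, device: int, function: int) -> int:
    """Pack a 16-bit requestor ID as bus (8 bits), device (5), function (3)."""
    return ((bus & 0xFF) << 8) | ((device & 0x1F) << 3) | (function & 0x07)


def locate_dte(cam: Dict[int, int], device_table: List[str], rid: int) -> Optional[str]:
    """Use the CAM to turn a requestor ID into a device table entry.

    Returns None when no CAM entry matches, in which case the request
    would be discarded.
    """
    index = cam.get(rid)
    if index is None:
        return None
    return device_table[index]


rid = make_rid(bus=1, device=2, function=3)
cam = {rid: 0}                   # CAM entry: RID value -> device table index
device_table = ["DTE for 1:2.3"]  # placeholder entries for the sketch
```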
In one particular example, if the interrupt request is for a guest executing in a particular region or logical partition, the device table entry also includes a region field 228, as shown in FIG. 2B. This field indicates the region to which the guest belongs. In another embodiment, this field is not used, or it may be used even when no guests are provided (e.g., to specify the region or logical partition in which the operating system is running).
To facilitate interrupt processing for guests, other data structures are used, some of which are stored in host memory 270 and others in guest memory 271. Examples of these structures are described below.
In one example, host memory 270 includes a forwarding AISB array 272 and a Guest Adapter Interruption Table (GAIT) 274. Forwarding AISB array 272 is an array of AISBs that is used in conjunction with the guest adapter interruption table to determine whether an MSI request is targeted to a guest or to its host. The forwarding AISB array includes an AISB for each PCI function that the host has assigned to a guest and for which the host, on behalf of the guest, requests adapter event notification interruptions. The array is allocated in host memory by the host of the guest.
The guest adapter interruption table 274, together with the forwarding AISB array, is used to determine whether the target of an MSI is the host or one of its guests, and if a guest, which guest. There is a one-to-one correspondence between the indicators (e.g., bits) in the forwarding AISB array and the GAIT entries. This means that when a bit in the forwarding AISB array is set to 1 and the corresponding GAIT entry contains forwarding information, an adapter event notification is made pending for the adapter, for the guest associated with the AISB indicator (e.g., bit) and the corresponding GAIT entry.
When a GAIT entry is used and contains a defined value (e.g., all zeros), the MSI request is targeted to the host. When a GAIT entry is used and does not contain the defined value, the MSI request is targeted to a guest. In addition, when the MSI request is targeted to a guest, the GAIT entry includes the following information, as shown in FIG. 2C: the host address and guest offset 290 of the guest AISB for the PCI function; the host address 291 of the Guest Interruption State Area (GISA); and the Guest Interruption Subclass (GISC) 292 for the adapter interruption to be generated for the guest.
Returning to FIG. 2B, more details regarding the Guest Interruption State Area (GISA) are provided. In one example, GISA 276 is a control block in which guest adapter interruptions are made pending. Its origin is specified in GAIT 274 and in a state description 280 (e.g., a control block maintained by the host that defines the virtual CPU for the guest to the interpreting hardware/firmware). The GISA includes, for example, an Interrupt Pending Mask (IPM), which is a mask associated with the guest that includes indicators for a plurality of Interrupt Subclasses (ISCs); and an Interrupt Alert Mask (IAM), which is another mask corresponding to the guest. In one example, each bit in the mask corresponds to an ISC enable indicator.
In addition to the above, guest memory, which is pinned in host memory (i.e., fixed so that it is non-pageable), contains a guest AISB array 282 and a guest AIBV array 284. Guest AISB array 282 includes a plurality of indicators 202' (e.g., AISBs), each of which may be associated with an I/O adapter. The AISB for an I/O adapter, when 1, indicates that one or more bits have been set to 1 in the adapter interruption bit vector associated with the I/O adapter.
The AIBV array 284 includes one or more AIBVs 200' (e.g., three in this example), and, as described above with reference to AIBV 200, each AIBV 200' is a one-dimensional array of one or more indicators (e.g., bits) associated with an I/O adapter. Each bit in the AIBV, when it is 1, indicates a condition or type of event for the associated I/O adapter.
In addition to the data structures in host and guest memory, a data structure called an Adapter Interruption Forwarding Table (AIFT) 285 is maintained in secure memory 286, which is accessible neither by the host nor by the guest. The adapter interruption forwarding table is used by system firmware to determine whether an MSI request is targeted to a logical partition in which a host and guests are running. The AIFT is indexed by the region number that identifies the logical partition to which the PCI function is assigned. When an AIFT entry is used and the entry contains a defined value (e.g., all zeros), the adapter event notification is targeted to the operating system running in the specified logical partition. When an AIFT entry is used and the entry does not contain the defined value, the firmware uses the forwarding AISB array and the GAIT to determine whether the adapter event notification is targeted to the host or to a guest running in the logical partition. In one example, as shown in FIG. 2D, an AIFT entry of AIFT 285 includes, for example, address 294 of the forwarding AISB array in the partition's (host) memory; length 295 of the forwarding AISB array in bits and of the GAIT in GAIT entries; address 296 of the GAIT in the partition's memory; and a host Interrupt Subclass (ISC) 297 associated with MSI requests to be forwarded to guests for that partition.
Returning to FIG. 2A and/or FIG. 2B, to request an interrupt, adapter function 210 sends a packet to the I/O hub. The packet has an MSI address 232 and associated data 234. The I/O hub compares at least a portion of the received address to the value in MSI compare register 250. If there is a match, an interrupt (e.g., MSI) is being requested, rather than a DMA operation. The reason for the request (i.e., the type of event that occurred) is indicated in associated data 234. For example, one or more of the low-order bits of the data are used to specify a particular interrupt vector (i.e., MSI vector) indicating the reason (event).
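By way of illustration only, the address comparison and vector extraction may be sketched as follows. The function name classify_request, the compare mask, and the number of low-order data bits treated as the vector are all assumptions made for the example; the text does not specify which portion of the address is compared or how many bits carry the vector.

```python
from typing import Optional, Tuple


def classify_request(address: int, data: int, msi_compare: int,
                     compare_mask: int = 0xFFFFF000,
                     vector_bits: int = 5) -> Tuple[str, Optional[int]]:
    """Return ("msi", vector) when the masked address matches the compare value.

    Otherwise the request is treated as a DMA operation and ("dma", None) is returned.
    compare_mask and vector_bits are hypothetical parameters for this sketch.
    """
    if (address & compare_mask) == (msi_compare & compare_mask):
        vector = data & ((1 << vector_bits) - 1)   # low-order bits select the MSI vector
        return ("msi", vector)
    return ("dma", None)
```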
In accordance with an aspect of the present invention, an interrupt request received from an adapter is converted into an I/O adapter event notification. That is, one or more indicators (e.g., one or more AIBVs and optionally an AISB) are set, and an interrupt is requested to the operating system (host or guest) if one is not already pending. In one embodiment, multiple interrupt requests (e.g., MSIs) from one or more adapters are combined into a single interrupt to the operating system, but with the respective AIBV and AISB indicators set. For example, if the I/O hub has already received an MSI request and has, in turn, provided an interrupt request to a processor, and that interrupt request is still pending (e.g., for one reason or another, such as interrupts being disabled, the interrupt has not yet been presented to the operating system), then if the hub receives one or more further MSIs, it does not request additional interrupts. One interrupt replaces and represents the multiple MSI requests. However, the one or more AIBVs and optionally one or more AISBs are still set.
More details regarding the conversion of an MSI (or other adapter interruption request) to an I/O adapter event notification are described below with reference to the figures. In particular, details regarding converting MSIs and presenting adapter event notifications in systems that do not include guests, or that do not present the notifications to guests, are described with reference to FIGS. 4-6B, and details regarding converting MSIs and presenting adapter event notifications to a guest are described with reference to FIGS. 7A-9C.
Referring initially to FIG. 4, in one example, to convert an MSI request to an I/O adapter event notification, some initialization is performed. During initialization, the operating system performs several steps to configure the adapter for adapter event notification via MSI requests. In this example, PCI functions are configured; in other embodiments, however, other adapters, including other types of adapter functions, may be configured.
Initially, in one embodiment, the PCI functions in the configuration are determined, step 400. In one example, a command issued by the operating system (e.g., a query list command) is used to obtain a list of the PCI functions assigned to the requesting configuration (e.g., assigned to a particular operating system). The information is obtained from a configuration data structure that maintains this information.
Next, at step 402, one of the PCI functions in the list is selected, and at step 406, the MSI address for the PCI function and the number of MSI vectors supported by the PCI function are determined. The MSI address is determined based on the characteristics of the I/O hub and the system in which it is installed. The number of MSI vectors supported is policy based and configurable.
In addition, at step 410, the AIBV and the AISB (if any) are allocated. In one example, the operating system determines the location of the AIBV, typically based on the class of adapter, to allow for efficient processing of one or more adapters. For example, the AIBVs for storage adapters may be located adjacent to each other. The AIBV and AISB are allocated and cleared to zero, and a register adapter interruptions operation is specified. At step 412, this operation registers the AIBV, AISB, ISC, number of interrupts (MSI vectors), and adapter interruption summary notification enablement control, as described in more detail below. In one example, a Modify PCI Function Controls instruction is used to perform the registration operation, as well as other operations described herein.
Thereafter, at step 414, the configuration space for the PCI function is written. In particular, the MSI address and MSI vector count are written into the configuration address space of the adapter function, consistent with the previous registration. (In one example, a PCI function includes multiple address spaces, including, for example, a configuration space, an I/O space, and one or more memory spaces.)
Thereafter, at query 416, it is determined whether additional functions exist in the list. If so, processing continues with step 402. Otherwise, the initialization process is complete.
More details regarding the registration of the various parameters are described with reference to FIG. 5. Initially, a Device Table Entry (DTE) corresponding to the PCI function for which initialization is being performed is selected. This selection is performed, for example, by the management firmware selecting an available DTE from the device table. Thereafter, at step 502, various parameters are stored in the device table entry. For example, the ISC, AIBV address, AIBV offset, AISB address, AISB offset, enable control, and number of interrupts (NOI) are set to values obtained by configuring the function. This completes the registration process.
During operation, when a PCI function wants to generate an MSI, it typically makes some information describing the condition available to the operating system. This causes one or more steps to occur to convert the PCI function's MSI request to an I/O adapter event notification to the operating system. This is described with reference to FIG. 6A.
Referring to FIG. 6A, initially, at step 600, a description of an event for which an interrupt is requested is recorded. For example, the PCI function records a description of the event in one or more adapter-specific event description recording structures stored, for example, in system memory. This may include recording the event type and recording other information. In addition, at step 601, a request is initiated by the PCI function specifying the MSI address and MSI vector number, as well as the requestor ID. At step 602, the request is received by the I/O hub, and in response to receiving the request, the requestor ID in the request is used to locate a device table entry for the PCI function. At query 603, the I/O hub compares at least a portion of the address in the request to the value in the MSI compare register. If they are not equal, the MSI is not requested. However, if they are equal, the MSI address has been specified and, thus, the MSI has been requested, not a direct memory access operation.
Thereafter, at query 604, a determination is made as to whether the MSI vector number specified in the request is less than or equal to the number of interrupts (NOI) allowed for the function. If the MSI vector number is greater than NOI, an error is indicated. Otherwise, the I/O hub issues a set bit function to set the appropriate AIBV bit in memory. The appropriate bit is determined by adding the MSI vector number to the AIBV offset specified in the device table entry and displacing this number of bits from the AIBV address specified in the device table entry, step 605. Further, if an AISB has been specified, the I/O hub uses a set bit function to set the AISB, step 606, using the AISB address and AISB offset in the device table entry.
Next, in one embodiment, a determination is made (e.g., by the CPU or I/O hub) as to whether an interrupt request is already pending. To make this determination, a pending indicator is used. For example, at query 608, a pending indicator 252 (FIGS. 2A and 2B), stored in memory of a processor 254 and accessible by the processors of the computing environment that can handle interrupts (e.g., CPU 102 in FIG. 1), is examined. If it is not set, it is set (e.g., to 1) at step 610. If it is already set, processing is complete and another interrupt request is not made. Thus, subsequent interrupt requests are encompassed by the one request that is already pending.
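The bit selection of steps 604-606 can be sketched as follows. This is a simplified model under stated assumptions: memory is a bytearray, the device table entry is a plain dictionary with hypothetical key names, and bits are numbered from the leftmost bit of each byte (bit 0 is the most significant), which the text does not specify.

```python
def set_bit(memory: bytearray, byte_address: int, bit_offset: int) -> None:
    """Set the bit located bit_offset bits past byte_address (bit 0 = leftmost, assumed)."""
    byte_index = byte_address + bit_offset // 8
    memory[byte_index] |= 0x80 >> (bit_offset % 8)


def convert_msi(memory: bytearray, dte: dict, vector: int) -> bool:
    """Apply the NOI check and set the AIBV bit and, if registered, the AISB.

    Returns False when the vector number is greater than NOI (the error case).
    """
    if vector > dte["noi"]:
        return False
    # AIBV bit = AIBV address displaced by (AIBV offset + MSI vector number) bits
    set_bit(memory, dte["aibv_address"], dte["aibv_offset"] + vector)
    if dte.get("aisb_address") is not None:
        set_bit(memory, dte["aisb_address"], dte["aisb_offset"])
    return True
```

As noted elsewhere in the description, a system that cannot set a single bit may set a whole byte instead; the sketch would then store a full byte at the computed index.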
In one particular example, there may be one pending indicator for each interrupt subclass, and thus, the pending indicator assigned to the interrupt subclass for the requesting function is the indicator that is checked.
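The coalescing behavior of the pending indicator can be illustrated with a small model having one indicator per interrupt subclass. The class name, the number of subclasses, and in particular the clearing of the indicator when the interrupt is presented are assumptions made for this sketch; the text only describes testing and setting the indicator.

```python
class PendingIndicators:
    """One pending indicator per interrupt subclass (subclass count is hypothetical)."""

    def __init__(self, subclasses: int = 8) -> None:
        self._pending = [False] * subclasses

    def request(self, isc: int) -> bool:
        """Return True when a new interrupt request is raised, False when it is coalesced."""
        if self._pending[isc]:
            return False          # already pending: no additional request is made
        self._pending[isc] = True
        return True

    def present(self, isc: int) -> bool:
        """Used by a polling processor: report whether an interrupt was pending and clear it."""
        was_pending = self._pending[isc]
        self._pending[isc] = False   # clearing on presentation is an assumption
        return was_pending
```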
Asynchronously, as shown in FIG. 6B, one or more processors check the pending indicator at query 640. In particular, each processor enabled for the ISC (and, in another embodiment, the region) polls the indicator when, for example, interrupts are enabled for the processor (i.e., for its operating system). If one of the processors determines that the indicator is set, it arbitrates with the other processors enabled for the same ISC (and, in another embodiment, the same region) to present the interrupt, step 642. Returning to query 640, if the pending indicator is not set, the processors enabled for the ISC continue to poll for a set indicator.
In response to the interrupt being presented to the operating system, the operating system determines whether an AISB is registered, query 644. If not, the operating system processes the set AIBVs, as described below, at step 648. Otherwise, the operating system processes any set AISBs and AIBVs at steps 646, 648. For example, it checks whether any AISBs are set. If so, it uses the AISB to determine the location of one or more AIBVs. For example, the operating system remembers the locations of the AISBs and the AIBVs. In addition, it keeps track of which adapter each AISB and AIBV represents. Thus, it may maintain some form of control block or other data structure that includes the locations of the AISBs and AIBVs and the associations between the AISBs, AIBVs and adapter IDs (handles). It uses this control block to facilitate locating the AIBV based on the associated AISB. In another embodiment, the AISB is not used. In this case, the control block is used to locate a particular AIBV.
In response to locating one or more AIBVs, the operating system scans the AIBVs and processes any set AIBV indicators. It processes the interrupt (e.g., provides status) in a manner consistent with the presented event. For example, with a storage adapter, an event may indicate that an operation has completed. This causes the operating system to check the status stored by the adapter to see whether the operation completed successfully and to obtain the details of the operation. In the case of a read operation, this indicates that the data read from the adapter is now available in system memory and can be processed.
In one embodiment, if an error is detected during the conversion, then instead of converting the MSI request to an adapter event notification, an attention is generated to the system firmware and the DTE is placed in an error state.
As described above, in addition to converting MSIs to adapter event notifications to operating systems that are not guests (e.g., pageable guests), in another embodiment adapter event notifications may be presented to guests. Further details regarding converting an MSI request to an adapter event notification and providing the adapter event notification to a guest are described with reference to FIGS. 7A-9C. To convert MSIs and provide adapter event notifications, a variety of tasks are performed by the host and the guest, as described below.
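The operating system's scan of summary bits and bit vectors might look like the following minimal sketch. The per-adapter dictionaries stand in for the control block that associates AISBs, AIBVs and adapter handles; the function name is hypothetical, and resetting each indicator as it is consumed is an assumption, since the text does not say when indicators are cleared.

```python
from typing import Dict, List, Optional, Tuple


def scan_indicators(aibvs: Dict[str, List[int]],
                    aisb: Optional[Dict[str, int]] = None) -> List[Tuple[str, int]]:
    """Return (adapter, event bit index) pairs for every set AIBV bit.

    When an AISB map is supplied, adapters whose summary bit is clear are skipped
    without scanning their AIBV. Consumed indicators are reset to 0 (assumed).
    """
    events = []
    for adapter, bits in aibvs.items():
        if aisb is not None:
            if not aisb.get(adapter):
                continue              # summary bit clear: nothing set for this adapter
            aisb[adapter] = 0
        for index, bit in enumerate(bits):
            if bit:
                bits[index] = 0
                events.append((adapter, index))
    return events
```

The summary bits only pay off when many adapters are registered: a clear AISB lets the scan skip an entire bit vector.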
Referring first to FIG. 7A, during host initialization (or when the first PCI function is assigned to a guest), the host allocates the forwarding AISB array and the GAIT at step 700. The host then registers the locations and lengths of the forwarding AISB array and the GAIT in, for example, the Adapter Interruption Forwarding Table (AIFT), at step 702. In one example, an instruction, such as a set interrupt control instruction, is used to register the locations and lengths of the forwarding AISB array and the GAIT.
Further, at step 704, the host specifies the host Interrupt Subclass (ISC) to be assigned to PCI adapters that are assigned to guests. Again, in one example, an instruction, such as a set interrupt control instruction, is used to specify this information. This information is also retained in the AIFT entry for the partition in which the host is running, step 706. This completes the host initialization.
Referring to FIG. 7B, during guest initialization, the guest performs several tasks to configure its PCI functions for adapter event notification requested by an MSI. In one example, one or more instructions that invoke these functions cause an intercept to the host, and thus, the host takes action for each intercept, as described below.
Initially, at step 720, the guest determines the PCI functions in the configuration to which it has access. In one example, the guest issues a command (e.g., a query list command) to obtain the list of PCI functions, and the command is intercepted by the host. Since the host has already determined, during host initialization, which PCI functions are assigned to the host, in response to intercepting the guest's request for PCI functions, the host constructs and returns to the guest a command response that includes only those PCI functions assigned to the guest.
Thereafter, at step 722, a PCI function in the guest's configuration is selected and certain processing is performed. For example, at step 724, the MSI address to be used for the PCI function and the number of MSI vectors supported by the PCI function are determined. The MSI address is determined based on characteristics of the I/O hub and the system in which the I/O hub is installed. The number of MSI vectors supported by the function is based on the capabilities of the adapter. In one example, to determine the MSI address, the guest uses a command, such as a query group command, to retrieve the characteristics common to a group of adapters, including the MSI address, from a function table entry associated with the adapter or from another location.
Further, at step 726, the AIBV and the AISB (if any) are allocated. The AIBV and AISB are allocated and initialized to zero, and a register adapter interruptions operation is specified. In response to the requested register adapter interruptions operation, the host intercepts the operation and performs the registration, step 728, as described below. After the registration is performed, the configuration space for the PCI function is written, step 730. In particular, the MSI address and MSI vector count are written into the configuration address space of the PCI function, consistent with the previous registration. Thereafter, at query 732, a determination is made as to whether additional functions are present in the list. If so, processing continues with step 722. Otherwise, the guest initialization process is complete.
As described above, during initialization, the set interrupt control instruction is used. One embodiment of the instruction is described with reference to FIGS. 7C-7G. As shown in FIG. 7C, in one example, set interrupt control instruction 750 includes an opcode 752 that specifies that this is a set interrupt control instruction; a first field (field 1) 754 specifying a location (e.g., a register) that contains an operation control 760 (FIG. 7D) for the instruction; a second field (field 2) 756 specifying a location (e.g., a register) that contains an interrupt subclass 770 (FIG. 7E) for the operation control specified by field 1; and a third field (field 3) 758, shown in FIG. 7F, which contains the logical address of the Adapter Interruption Parameter Block (AIPB) 780, as described below.
In one example, operation control 760 may be encoded as follows:
0 - Set all interruptions mode: the adapter interruption suppression facility is set to allow presentation of all adapter interruption requests for the given ISC.
1 - Set single interruption mode: the adapter interruption suppression facility is set to allow presentation of a single adapter interruption request for the given ISC. Subsequent adapter interruption requests for the given ISC are suppressed.
2 - Set adapter event notification interpretation control: the adapter event notification interpretation controls contained in the adapter interruption parameter block specified by field 3 are set.
An example of AIPB 780 is described with reference to FIG. 7G. As shown, AIPB 780 includes, for example:
forwarding AISB array address 782: this field specifies the forwarding AISB array that is used, in conjunction with the Guest Adapter Interruption Table (GAIT) and the specified adapter event notification forwarding interruption subclass (AFI), to determine whether an adapter interruption request signaled by an I/O adapter is targeted to a pageable storage mode guest.
When the forwarding AISB array address is zero, the interrupt request is targeted to the host. When the forwarding AISB array address is not zero, the target of the interrupt request is further determined from the AFI and the GAIT.
Guest Adapter Interrupt Table (GAIT) address 784: this field provides the address of the GAIT that will be used to determine whether the adapter interruption request signaled by the I/O adapter is targeted to a pageable storage mode guest, and if so, the GAIT is also used to set the guest AISB and to pass the adapter interruption request to the guest.
Adapter event notification forwarding interruption subclass (AFI) 786: this field indicates an ISC value. An interrupt that is pending and presentable on that ISC initiates the adapter event notification forwarding process, whereby the contents of the forwarding AISB array and the GAIT are used to further determine the target (host or guest) of interrupt requests from applicable I/O adapters for the corresponding ISC. When an interrupt request is made from an applicable adapter for the ISC specified by the AFI field, the target of the interrupt may be a pageable storage mode guest, and the forwarding AISB array and the GAIT are used to determine the actual target (host or guest) of any adapter event notification indicated in the forwarding AISB array. When an interrupt request is made from an applicable adapter for an ISC other than the one specified by the AFI field, the forwarding AISB array address and the GAIT address are not applicable, and the interrupt request for the corresponding ISC is targeted to the host.
Forward AISB Array Length (FAAL) 788: this field indicates the length of the forwarding AISB array in bits, or the length of the GAIT in GAIT entries.
In response to executing the set interrupt control instruction, one or more interrupt controls are set based on the operation control specified in field 1. When the value of the operation control indicates set all interruptions mode or set single interruption mode, the value contained in field 2 specifies the interrupt subclass for which the interrupt control is to be set.
When the value of the operation control indicates set adapter event notification interpretation control, the second operand address (field 3) is the logical address of an Adapter Interruption Parameter Block (AIPB) that includes the controls to be set. The adapter interruption parameter block is used by the host to facilitate the interpretation (i.e., forwarding) of adapter interruptions originating from I/O adapters associated with the adapter event notification facility for pageable storage mode guests.
In one example, the settings for the operation controls are stored in a location (e.g., a control block) that is accessible by the firmware and operating system.
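For illustration, the effect of the three operation control values can be modeled as below. This is a software sketch under assumptions, not the instruction's actual implementation: the controls location is a plain dictionary, all Python names are hypothetical, and the AIPB is passed as an object rather than through a logical address.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class OperationControl(IntEnum):
    SET_ALL_INTERRUPTIONS_MODE = 0
    SET_SINGLE_INTERRUPTION_MODE = 1
    SET_AEN_INTERPRETATION_CONTROL = 2


@dataclass
class Aipb:
    forwarding_aisb_array_address: int   # field 782
    gait_address: int                    # field 784
    afi: int                             # field 786: forwarding interruption subclass
    faal: int                            # field 788: forwarding AISB array length in bits


def set_interrupt_control(controls: dict, op: int, isc: int = 0,
                          aipb: Optional[Aipb] = None) -> dict:
    """Record the requested control; an unknown op value raises ValueError."""
    operation = OperationControl(op)
    if operation is OperationControl.SET_AEN_INTERPRETATION_CONTROL:
        if aipb is None:
            raise ValueError("operation control 2 requires an AIPB")
        controls["aipb"] = aipb                        # field 2 is not used for this value
    else:
        controls.setdefault("mode", {})[isc] = operation   # per-ISC suppression mode
    return controls
```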
Further details regarding the registration of the various parameters are described with reference to FIG. 8. In one example, at step 800, the host pins the guest AIBV in host memory (i.e., fixes the guest page in host memory, thereby making it non-pageable). In addition, if the guest specifies an AISB, the host also pins the guest AISB in host memory. At step 802, the host assigns an AISB from the forwarding AISB array and, implicitly, the corresponding GAIT entry to the PCI function. Alternatively, if the AISB and ISC specified by the guest are the same AISB and ISC that the guest previously registered (for another PCI function), the host may use the same forwarding AISB and GAIT entry assigned for the previous request. This reduces overhead.
At step 804, the host copies the guest Interrupt Subclass (ISC) into the GAIT entry. If the guest specifies an AISB, the host copies the guest AISB's host address and its offset into the GAIT entry at step 806. At step 808, the host copies the GISA designation (designation) from its state description into the GAIT entry.
The host executes a command on behalf of the guest, such as a Modify PCI Function Controls command, to specify a register adapter interruptions operation and to specify the following information: the host address and guest offset of the guest AIBV; the host address and offset of the host AISB in the forwarding AISB array assigned to the adapter; the host Interrupt Subclass (ISC) for the adapter (this is the same ISC registered by the host in the AIFT when the host issued the set interrupt control instruction at initialization); and the number of MSIs specified by the guest.
In response to executing the Modify PCI Function Controls command, at step 810, a device table entry corresponding to the PCI function for which initialization is being performed is selected, and at step 812, various parameters are stored in the device table entry. For example, the guest AIBV, the forwarding AISB selected by the host, the host ISC, and the number of interrupts are set to the values obtained by configuring the function. This completes the registration process.
During operation, when a PCI function wants to generate an MSI, it typically makes some information describing the condition available to the operating system. This causes one or more steps to occur to convert the PCI function's MSI request to an I/O adapter event notification to the guest operating system. This is described with reference to FIG. 9A.
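The host's assignment of a forwarding AISB and GAIT entry, including reuse for a previously registered guest AISB and ISC, can be sketched as follows. The list-based GAIT, the reuse map, and the function name are hypothetical modeling choices; pinning and the device table update are omitted.

```python
from typing import Dict, List, Optional, Tuple

GuestAisb = Optional[Tuple[int, int]]   # (host address, guest offset), or None if unspecified


def register_guest_function(gait: List[Optional[dict]],
                            reuse: Dict[Tuple[GuestAisb, int], int],
                            guest_aisb: GuestAisb, gisc: int, gisa: int) -> int:
    """Return the index of the forwarding AISB bit and GAIT entry used for the function.

    The same index is returned again when the guest registers the same AISB and ISC
    for another function. list.index raises ValueError when no entry is free.
    """
    key = (guest_aisb, gisc)
    if guest_aisb is not None and key in reuse:
        return reuse[key]                   # reuse the earlier forwarding AISB and GAIT entry
    index = gait.index(None)                # first unassigned entry
    gait[index] = {"gisc": gisc, "guest_aisb": guest_aisb, "gisa": gisa}
    if guest_aisb is not None:
        reuse[key] = index
    return index
```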
Referring to FIG. 9A, initially, at step 900, a description of the event for which an interrupt is requested is recorded. For example, the PCI function records a description of the event in one or more adapter-specific event description recording structures stored, for example, in system memory. Further, at step 901, a request is initiated by the PCI function specifying the MSI address and MSI vector number, as well as the requestor ID. At step 902, the request is received by the I/O hub, and in response to receiving the request, the requestor ID in the request is used to locate the device table entry for the PCI function. At query 903, the I/O hub compares at least a portion of the address in the request to the value in the MSI compare register. If they are not equal, an MSI is not being requested. However, if they are equal, an MSI address has been specified and thus an MSI has been requested, rather than a direct memory access operation.
Thereafter, at query 904, a determination is made as to whether the MSI vector number specified in the request is less than or equal to the number of interrupts (NOI) allowed for the function. If the MSI vector number is greater than NOI, an error is indicated. Otherwise, at step 905, the I/O hub issues a set bit function to set the appropriate AIBV bit in memory. The appropriate bit is determined by adding the MSI vector number to the AIBV offset specified in the device table entry and displacing this number of bits from the AIBV address specified in the device table entry. Based on the manner in which the host set up the registration of the interruption information, the bit that is set is in the guest AIBV that has been pinned in host memory.
Further, if an AISB has been specified, then the I/O hub uses a set bit function to set the AISB, at step 906, using the AISB address and AISB offset in the device table entry, as described above for the AIBV. Again, the bit that is set is the host AISB in the forwarding AISB array in the host memory, based on the manner in which the host sets its interrupt information registration. Note that if the system does not support the setting of a single bit, multiple bits (e.g., bytes) may be set to indicate an adapter event or a summary indication.
Next, in one embodiment, a determination is made by the I/O hub as to whether an interrupt request is already pending. To make this determination, a pending indicator is used. For example, at query 908, pending indicator 252 (FIG. 2A, FIG. 2B), stored in memory of processor 254 and accessible by the processors of the computing environment that can handle interrupts, is examined. If it is not set, it is set (e.g., to 1) at step 910 and processing is complete. If it is already set, processing is complete and another interrupt request is not made. Thus, subsequent interrupt requests are encompassed by the one request that is already pending.
In one particular example, to set the indicator, the ISC and the region number (as the interruption region) in the device table entry are used to determine which pending indicator is to be set. Based on the manner in which the host set up the registration of the interruption information, the ISC used is the host-assigned ISC for converting MSI requests to guest adapter event notifications (i.e., the same ISC that the host registered in the AIFT).
Asynchronously, as shown in FIG. 9B, one or more processors check the pending indicator at query 920. In particular, each processor enabled for the ISC and region polls the indicator when, for example, interrupts are enabled for the processor (i.e., for its operating system). If one of the processors determines that the indicator is set, it arbitrates with the other processors to handle the interrupt. For example, at step 924, the firmware uses the region number specified for the adapter interruption request to locate the AIFT entry for the logical partition (region).
In response to locating the AIFT entry, the firmware checks whether the AIFT entry includes a defined value (e.g., all zeros) at query 926. If the AIFT entry includes the defined value, then there is no host running guests in the logical partition, and at step 928 the adapter interruption is made pending for the logical partition identified by the region number (or for the operating system, if logical partitions are not configured). The interrupt is then processed as described above with reference to FIG. 6B.
Returning to query 926, if the AIFT entry does not include the defined value, meaning that there is a host running guests, processing continues by checking whether the ISC specified as part of the adapter interruption request is equal to the ISC in the AIFT entry, query 930. If the ISC specified as part of the adapter interruption request is not equal to the ISC in the AIFT entry, then the adapter interruption request is not targeted to a guest, and the adapter interruption is made pending for the logical partition (i.e., host) identified by the region number, step 928. The interrupt is then processed as described above with reference to FIG. 6B.
Otherwise, if the ISC specified as part of the adapter interruption request does equal the ISC in the AIFT entry, meaning that the adapter interruption request is targeted to a guest, then at step 932, the firmware scans the forwarding AISB array specified by the host using the forwarding AISB array address and length in the AIFT entry for an indicator (e.g., bit) set to 1. At step 934, for each bit set to 1, the firmware processes the indicator using the information in the corresponding GAIT entry. Further details regarding handling the indicator are described with reference to FIG. 9C.
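The routing decision made at queries 926 and 930 and the scan of step 932 can be sketched as follows. This is a minimal illustration only, not the firmware implementation: the `AiftEntry` class, the `route_request` function, and the use of `None` to model the "all zeros" defined value are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AiftEntry:
    """Hypothetical model of an AIFT entry (names are illustrative)."""
    host_isc: int               # ISC the host registered for forwarding
    forwarding_aisb: List[int]  # one indicator (0 or 1) per GAIT entry


def route_request(entry: Optional[AiftEntry],
                  request_isc: int) -> Tuple[str, List[int]]:
    """Return (target, indices of forwarding indicators set to 1).

    `entry is None` models an AIFT entry holding the defined value
    (e.g., all zeros).
    """
    if entry is None:
        # Query 926: no host is running a guest in this partition.
        return ("partition", [])
    if request_isc != entry.host_isc:
        # Query 930: ISC mismatch, so the request is for the host itself.
        return ("host", [])
    # Step 932: scan the forwarding AISB array for indicators set to 1.
    pending = [i for i, bit in enumerate(entry.forwarding_aisb) if bit == 1]
    return ("guest", pending)
```

Each index returned for the "guest" case would then be handled using the corresponding GAIT entry.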
Referring to FIG. 9C, initially, at query 950, a determination is made as to whether the GAIT entry contains a defined value (e.g., all zeros), which means that the adapter interruption is not targeted to a guest. If the GAIT entry does contain the defined value, then at step 952, an adapter interruption is made pending for the host, with the forwarding indicator in the I/O interruption code set to 1 to signal a host adapter event notification. Processing is then complete.
However, if the GAIT entry does not contain the defined value, meaning that the adapter interruption is targeted to the corresponding guest, then in step 954, steps are performed to complete the forwarding of the adapter event notification to the guest. For example, if the guest AISB address in the GAIT entry does not contain a defined value (e.g., all zeros), then the guest AISB address and the guest AISB offset are used to set the guest AISB to 1 at step 956. Further, in step 958, the guest interrupt subclass and GISA designation in the GAIT entry are used to make the interrupt pending in the GISA for the guest. For example, a bit corresponding to the GISC in an Interrupt Pending Mask (IPM) in the GISA is set to 1. Further, if host alerting is requested for the GISC (e.g., the bit in the Interrupt Alerting Mask (IAM) of the GISA corresponding to the GISC is set to 1), then a host alerting adapter interrupt is made pending, step 960. For example, when the IPM bit is set to 1, the bit specifying adapter interrupts to be processed for the PCI function is set in the adapter interrupt source mask corresponding to the GISC. The setting of the IPM bit is equivalent to a pending indicator for the CPU, since when a guest CPU is enabled for the ISC, interrupts are presented to the guest CPU without host intervention. This completes the forwarding of the adapter event notification to the guest.
In response to the notification, the guest handles any set AISB and/or AIBV indicators, as described above for the operating system.
Thereafter, at step 962, an indicator (e.g., bit) in the forwarding AISB array that triggered the action is set to 0 to indicate that the adapter event notification has been forwarded. In another implementation, the bit may be set to 0 prior to this processing and then checked there.
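The handling of one forwarding indicator (steps 950 to 962) can be sketched as below. This is a simplified model under stated assumptions: the `Gisa` and `GaitEntry` classes, the `forward_indicator` function, and the event strings are hypothetical; bit `1 << gisc` is an arbitrary bit-numbering choice; and the forwarding indicator is cleared only on the guest path, since the text says processing is complete after step 952.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gisa:
    ipm: int = 0  # Interrupt Pending Mask, one bit per GISC
    iam: int = 0  # Interrupt Alerting Mask, one bit per GISC


@dataclass
class GaitEntry:
    gisa: Gisa
    gisc: int
    guest_aisb: Optional[List[int]] = None  # None models an all-zeros address
    guest_aisb_offset: int = 0


def forward_indicator(forwarding_aisb: List[int], index: int,
                      gait: List[Optional[GaitEntry]],
                      host_events: List[str]) -> None:
    entry = gait[index]
    if entry is None:
        # Step 952: GAIT entry holds the defined value; notify the host.
        host_events.append("host_adapter_event")
        return
    if entry.guest_aisb is not None:
        # Step 956: set the guest summary bit.
        entry.guest_aisb[entry.guest_aisb_offset] = 1
    mask = 1 << entry.gisc  # assumed bit numbering
    entry.gisa.ipm |= mask  # step 958: make the interrupt pending
    if entry.gisa.iam & mask:
        # Step 960: host alerting was requested for this GISC.
        host_events.append("host_alert")
    forwarding_aisb[index] = 0  # step 962: mark as forwarded
```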
In one embodiment, while the system firmware is processing the host's forwarding AISB array, it may decide to end processing early and pass responsibility for completing the processing of the forwarding AISB array to the host. In this case, the system firmware makes an adapter interrupt pending for that host, with the host AEN forwarding bit in the I/O interrupt code set to 1. This mechanism prevents the processing of densely populated forwarding AISB arrays from negatively impacting system firmware performance.
Further details regarding the Modify PCI Function Controls instruction used to register adapter interruptions are presented herein. Referring to FIG. 10A, a Modify PCI Function Controls instruction 1000 includes, for example, an opcode 1002 indicating the Modify PCI Function Controls instruction; a first field 1004 specifying a location at which various information is included about the adapter function for which the operating parameters are being established; and a second field 1006 specifying a location from which a PCI Function Information Block (FIB) is obtained. The contents of the locations specified by fields 1 and 2 are further described below.
In one embodiment, field 1 specifies a general register that includes various information. As shown in FIG. 10B, the contents of the register include, for example, a function handle 1010 that identifies the adapter function for which the modify instruction is being performed; an address space 1012 specifying an address space in system memory associated with the adapter function specified by the function handle; an operation control 1014 that specifies the operation to be performed for the adapter function; and a status 1016 that provides, in a predefined code, status regarding the instruction when the instruction completes.
In one embodiment, the function handle includes, for example, an enable indicator indicating whether the handle is enabled; a function number that identifies the adapter function (this is a static identifier and can be used to index into a function table); and an instance number that specifies a particular instance of the function handle. There is one function handle for each adapter function, and it is used to locate a Function Table Entry (FTE) in the function table. Each function table entry includes operating parameters and/or other information related to its adapter function. As an example, the function table entry includes:
Instance number: this field indicates the particular instance of the adapter function handle associated with the function table entry;
Device Table Entry (DTE) index 1..n: there are one or more device table indices, and each index is an index into a device table for locating a Device Table Entry (DTE). Each adapter function has one or more device table entries, and each entry includes information related to its adapter function, including information used to process requests of the adapter function (e.g., DMA requests, MSI requests) and information relating to requests directed to the adapter function (e.g., PCI instructions). Each device table entry is associated with one address space in system memory allocated to the adapter function. The adapter function may have one or more address spaces within system memory allocated to it.
Busy indicator: this field indicates whether the adapter function is busy;
Persistent error state indicator: this field indicates whether the adapter function is in a persistent error state;
Recovery initiated indicator: this field indicates whether recovery of the adapter function has been initiated;
Permission indicator: this field indicates whether the operating system attempting to control the adapter function has permission to do so;
Enable indicator: this field indicates whether the adapter function is enabled (e.g., 1 = enabled, 0 = disabled);
Requester Identifier (RID): this is an identifier of the adapter function and includes, for example, a bus number, a device number, and a function number.
In one example, this field is used to access the configuration space of the adapter function. (The memory of the adapter may be defined as address spaces, including, for example, a configuration space, an I/O space, and/or one or more memory spaces.) In one example, the configuration space may be accessed by specifying the configuration space in an instruction issued by the operating system (or other configuration) to the adapter function. Specified in the instruction are an offset into the configuration space and a function handle used to locate the appropriate function table entry, which includes the RID. The firmware receives the instruction and determines that it is for the configuration space. It therefore uses the RID to generate a request to the I/O hub, and the I/O hub creates a request to access the adapter. The adapter function is located based on the RID, and the offset specifies an offset into the configuration space of the adapter function.
Base Address Register (BAR) (1 to n): this field includes a plurality of unsigned integers, designated BAR0-BARn, which are associated with the originally specified adapter function and whose values are also stored in the base address registers associated with the adapter function. Each BAR specifies the starting address of a memory space or I/O space within the adapter function, and also indicates the type of address space, that is, whether it is, for example, a 64-bit or 32-bit memory space or a 32-bit I/O space;
In one example, it is used to access memory space and/or I/O space of the adapter function. For example, an offset provided in an instruction accessing the adapter function is added to the value in the base address register associated with the address space specified in the instruction to obtain the address used to access the adapter function. An address space identifier provided in the instruction identifies the address space within the adapter function to be accessed and the corresponding BAR to be used;
Size (SIZE) 1..n: this field includes a plurality of unsigned integers, designated SIZE0-SIZEn. The value of a size field, when non-zero, represents the size of the corresponding address space, and each entry corresponds to a previously described BAR.
Further details regarding BAR and Size will be described below.
1. When the BAR is not implemented for the adapter function, both the BAR field and its corresponding size field are stored as zeros.
2. When the BAR field represents an I/O address space or a 32-bit memory address space, the corresponding size field is non-zero and represents the size of the address space.
3. When the BAR field represents a 64-bit memory address space,
a. The BARn field indicates the least significant address bits.
b. The next consecutive BARn+1 field indicates the most significant address bits.
c. The corresponding SIZEn field is non-zero and represents the size of the address space.
d. The corresponding SIZEn+1 field is not meaningful and is stored as zero.
Internal routing information: this information is used to perform a specific routing to the adapter. It includes, by way of example, node, processor chip and hub addressing information.
Status indication: this provides an indication as to whether, for example, a load/store operation is blocked or the adapter is in an error state, among other indications.
In one example, the busy indicator, persistent error state indicator, and recovery initiated indicator are set based on monitoring performed by firmware. Also, the permission indicator is set based on, for example, policy; and the BAR information is set based on configuration information discovered during a bus walk performed by the processor (e.g., firmware of the processor). Other fields may be set based on configuration, initialization, and/or events. In other embodiments, the function table entry may include more, less, or different information. The information included may depend on the operations supported or enabled by the adapter function.
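The BAR and size rules described above for the function table entry can be illustrated with a short sketch. It is not taken from the embodiments: the function names are hypothetical, each BAR field is assumed to be 32 bits wide, any address-space type bits carried in a BAR are ignored, and the bounds check against the size field is an added assumption.

```python
def bar64_base(bar_low: int, bar_high: int) -> int:
    """Combine BARn (least significant bits) with BARn+1 (most significant bits).

    Assumes 32-bit BAR fields; type-indicator bits are not modelled.
    """
    return ((bar_high & 0xFFFFFFFF) << 32) | (bar_low & 0xFFFFFFFF)


def access_address(base: int, size: int, offset: int) -> int:
    """Add the instruction-supplied offset to the BAR value.

    A size of zero models a BAR that is not implemented.
    """
    if size == 0:
        raise ValueError("BAR is not implemented for this adapter function")
    if not 0 <= offset < size:
        raise ValueError("offset lies outside the address space")
    return base + offset
```

For a 64-bit memory space, only the SIZEn value paired with BARn would be passed as `size`; SIZEn+1 is stored as zero and is not used.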
Referring to FIG. 10C, in one example, field 2 indicates the logical address 1020 of the PCI Function Information Block (FIB), which includes information about the adapter function. The function information block is used to update the device table entry and/or function table entry (or other location) associated with the adapter function. This information is stored in the FIB during initialization and/or configuration of the adapter, and/or in response to certain events.
Further details regarding the function information block (FIB) are described with reference to FIG. 10D. In one embodiment, the function information block 1050 includes the following fields:
Format 1051: this field specifies the format of the FIB;
Interception control 1052: this field is used to indicate whether execution of specific instructions by a pageable mode guest results in instruction interception;
Error indication 1054: this field includes the error state indication for direct memory access and adapter interruptions. When the bit is set (e.g., 1), one or more errors have been detected while performing direct memory access or adapter interruption for the adapter function;
Load/store blocked 1056: this field indicates whether load/store operations are blocked;
PCI function valid 1058: this field includes an enable control for the adapter function. When the bit is set (e.g., 1), the adapter function is considered enabled for I/O operations;
Address space registered 1060: this field includes a direct memory access enable control for the adapter function. When this field is set (e.g., 1), direct memory access is enabled;
Page size 1061: this field indicates the size of the page or other unit of storage to be accessed by a DMA memory access;
PCI Base Address (PBA) 1062: this field is the base address for the address space in system memory allocated to the adapter function. It represents the lowest virtual address that the adapter function is allowed to use for direct memory access to the specified DMA address space;
PCI Address Limit (PAL) 1064: this field indicates the highest virtual address that the adapter function is allowed to access within the specified DMA address space;
Input/Output Address Translation pointer (IOAT) 1066: the input/output address translation pointer designates the first of any translation tables used by PCI virtual address translation, or it may directly designate the absolute address of the frame of storage that is the result of translation;
Interruption Subclass (ISC) 1068: this field includes the interruption subclass used to present adapter interruptions for the adapter function;
Number of Interruptions (NOI) 1070: this field specifies the number of distinct interruption codes accepted for the adapter function. This field also defines the size, in bits, of the adapter interrupt bit vector designated by the adapter interrupt bit vector address and adapter interrupt bit vector offset fields;
Adapter interrupt bit vector address (AIBV) 1072: this field specifies the address of the adapter interrupt bit vector for the adapter function. The vector is used in interrupt processing;
Adapter interrupt bit vector offset 1074: this field specifies the offset of the first adapter interrupt bit vector bit for the adapter function;
Adapter interrupt summary bit address (AISB) 1076: this field provides an address designating the adapter interrupt summary bit, which is optionally used in interrupt processing;
Adapter interrupt summary bit offset 1078: this field provides the offset into the adapter interrupt summary bit vector;
Function Measurement Block (FMB) address 1080: this field provides the address of a function measurement block used to collect measurements regarding the adapter function;
Function measurement block key 1082: this field includes an access key used to access the function measurement block;
Summary bit notification control 1084: this field indicates whether a summary bit vector is being used;
Instruction authorization token 1086: this field is used to determine whether a pageable storage mode guest is authorized to execute PCI instructions without host intervention; and
Address translation format 1087: this field indicates the selected format for address translation, that is, the highest-level translation table to be used in translation (e.g., a segment table, a region third table, etc.).
The function information block specified in the Modify PCI Function Controls instruction is used to modify the selected device table entry, function table entry, and/or other firmware controls associated with the adapter function specified in the instruction. By modifying the device table entry, function table entry, and/or other firmware controls, certain services are provided for the adapter. These services include, for example, adapter interruptions; address translation; resetting the error state; resetting load/store blocked; setting function measurement parameters; and setting interception control.
One embodiment of the logic associated with the Modify PCI Function Controls instruction is described with reference to FIG. 11. In one example, the instruction is issued by an operating system (or other configuration) and executed by the processor (e.g., firmware) executing the operating system. In the examples herein, the instruction and adapter functions are PCI based. However, in other embodiments, different adapter architectures and corresponding instructions may be used.
In one example, the operating system provides the following operands to the instruction (e.g., in one or more registers designated by the instruction): the PCI function handle; the DMA address space identifier; an operation control; and the address of the function information block.
Referring to FIG. 11, initially, a determination is made as to whether the facility allowing the Modify PCI Function Controls instruction is installed, INQUIRY 1100. This determination is made, for example, by examining an indicator stored, for example, in a control block. If the facility is not installed, an exception condition is provided, STEP 1102. Otherwise, a determination is made as to whether the instruction was issued by a pageable storage mode guest (or other guest), INQUIRY 1104. If so, the host operating system emulates the operation for that guest, step 1106, as described above.
Otherwise, a determination is made as to whether one or more operands are aligned, INQUIRY 1108. For example, it is determined whether the address of the function information block is on a doubleword boundary. In one example, this check is optional. If the operands are not aligned, an exception condition is provided, STEP 1110. Otherwise, a determination is made as to whether the function information block is accessible, INQUIRY 1112. If not, an exception condition is provided, step 1114. Otherwise, a determination is made as to whether the handle provided in the operands of the Modify PCI Function Controls instruction is enabled, INQUIRY 1116. In one example, this determination is made by examining an enable indicator in the handle. If the handle is not enabled, an exception condition is provided, step 1118.
If the handle is enabled, the handle is used to locate the function table entry, STEP 1120. That is, at least a portion of the handle is used to index into the function table to locate the function table entry corresponding to the adapter function for which the operating parameters are to be established.
A determination is made as to whether a function table entry is found, INQUIRY 1122. If not, an exception condition is provided, step 1124. Otherwise, a determination is made as to whether the configuration issuing the instruction is a guest, INQUIRY 1126. If so, an exception condition (e.g., interception to the host) is provided, STEP 1128. If the configuration is not a guest, this inquiry may be ignored, or other authorizations may be checked, if specified.
A determination is then made as to whether the function is enabled, INQUIRY 1130. In one example, this determination is made by checking an enable indicator in the function table entry. If it is not enabled, an exception condition is provided, step 1132.
If the function is enabled, a determination is made as to whether recovery is active, INQUIRY 1134. If recovery is active, as indicated by the recovery indicator in the function table entry, an exception condition is provided, step 1136. If, however, recovery is not active, a further determination is made as to whether the function is busy, INQUIRY 1138. This determination is made by checking the busy indicator in the function table entry. If the function is busy, a busy condition is provided, step 1140. With the busy condition, the instruction may be retried rather than dropped.
If the function is not busy, a further determination is made as to whether the function information block format is valid, INQUIRY 1142. For example, the format field of the FIB is examined to determine whether the format is supported by the system. If it is not valid, an exception condition is provided, step 1144. If the function information block format is valid, a further determination is made as to whether the operation control specified in the operands of the instruction is valid, INQUIRY 1146. That is, it is determined whether the operation control is one of the operation controls specified for the instruction. If it is not valid, an exception condition is provided, step 1148. However, if the operation control is valid, processing continues with the specific operation control that was specified.
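The ordered checks of FIG. 11 can be summarized as a chain of early exits. The following is a minimal sketch and not the instruction's implementation: the `ModifyRequest` class, its fields, and the returned strings (which merely echo the step numbers) are hypothetical. A doubleword is taken to be 8 bytes, and inquiry 1126 is omitted because, in this simplified model, guest-issued instructions are already diverted to host emulation at step 1106.

```python
from dataclasses import dataclass

DOUBLEWORD = 8  # bytes


@dataclass
class ModifyRequest:
    """Hypothetical snapshot of the state examined by the checks."""
    facility_installed: bool = True
    issued_by_guest: bool = False
    fib_address: int = 0x1000
    fib_accessible: bool = True
    handle_enabled: bool = True
    fte_found: bool = True
    function_enabled: bool = True
    recovery_active: bool = False
    busy: bool = False
    fib_format_valid: bool = True
    operation_valid: bool = True


def check_modify_request(req: ModifyRequest) -> str:
    if not req.facility_installed:
        return "exception:1102"
    if req.issued_by_guest:
        return "emulate:1106"      # host emulates the operation
    if req.fib_address % DOUBLEWORD != 0:
        return "exception:1110"    # FIB not on a doubleword boundary
    if not req.fib_accessible:
        return "exception:1114"
    if not req.handle_enabled:
        return "exception:1118"
    if not req.fte_found:
        return "exception:1124"
    if not req.function_enabled:
        return "exception:1132"
    if req.recovery_active:
        return "exception:1136"
    if req.busy:
        return "busy:1140"         # caller may retry the instruction
    if not req.fib_format_valid:
        return "exception:1144"
    if not req.operation_valid:
        return "exception:1148"
    return "proceed"
```

The order of the tests matters: for example, a function that is both in recovery and busy reports the recovery exception, not the busy condition.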
In one example, the operation control is a register adapter interruption operation, which is used to control adapter interruptions. In response to this operational control, adapter function parameters associated with the adapter interruption are set in the device table entry based on the appropriate contents of the function information block.
One embodiment of the logic associated with this operation is described with reference to FIG. 12. The operands for this operation, which are obtained from the function information block, include, for example: the Interrupt Subclass (ISC); the number of interrupts allowed (NOI); an Adapter Interrupt Bit Vector Offset (AIBVO); a summary notification (S); an adapter interruption summary bit vector offset (ABVSO); an Adapter Interrupt Bit Vector (AIBV) address; and an adapter interrupt summary bit vector (AISB) address.
Referring to FIG. 12, initially, at query 1200, it is determined whether the number of interruptions (NOI) specified in the FIB is greater than a model-dependent maximum value. If so, then at step 1202, an exception condition is provided. However, if the number of interruptions is not greater than the model-dependent maximum, then at query 1204, a further determination is made as to whether the number of interruptions plus the adapter interrupt bit vector offset (NOI + AIBVO) is greater than the model-dependent maximum. If so, then in step 1206, an exception condition is provided. If NOI plus AIBVO is not greater than the model-dependent maximum, then at query 1208, it is further determined whether the AIBV address plus NOI crosses a 4K boundary. If it does cross a 4K boundary, then at step 1210, an exception condition is provided. Otherwise, at step 1212, it is determined whether sufficient resources are available for any resources that are required. If there are not enough resources, then at step 1214, an exception condition is provided.
Otherwise, at step 1216, a determination is made as to whether an adapter interruption has been registered for the function. In one embodiment, this may be determined by examining one or more parameters (e.g., in the DTE/FTE). In particular, parameters related to the interruption, such as NOI, are checked. If the field is filled, the adapter is registered for interruption. If the adapter has been registered, then at step 1218 an exception condition is provided. Otherwise, the interrupt parameter is retrieved from the FIB and placed in the function table entry (or other specified location) and the corresponding Device Table Entry (DTE). Further, at step 1220, an MSI enable indicator is set in the DTE. That is, PCI function parameters associated with the adapter interruption are set in the DTE and optionally the FTE based on information retrieved from the function information block. These parameters include, for example, ISC, NOI, AIBVO, S, AIBVSO, AIBV address, and AISB address.
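The registration checks of FIG. 12 can be sketched as follows. This is an illustration under several assumptions that are not stated in the text: `MAX_NOI` is a placeholder for the model-dependent maximum, the DTE and FIB are modelled as plain dictionaries with hypothetical key names, and the 4K boundary test treats the bit vector as occupying the bytes from the AIBV address through the byte holding bit AIBVO + NOI - 1.

```python
MAX_NOI = 2048  # placeholder for the model-dependent maximum
PAGE = 4096     # 4K boundary

# Hypothetical key names for the interrupt parameters copied from the FIB.
PARAMS = ("isc", "noi", "aibvo", "s", "summary_bit_offset",
          "aibv_address", "aisb_address")


def register_interruptions(dte: dict, fib: dict,
                           resources_available: bool = True) -> str:
    noi, aibvo, aibv = fib["noi"], fib["aibvo"], fib["aibv_address"]
    if noi > MAX_NOI:
        return "exception:1202"
    if noi + aibvo > MAX_NOI:
        return "exception:1206"
    # Assumed interpretation: convert the last bit index to a byte offset.
    last_byte = aibv + max(aibvo + noi - 1, 0) // 8
    if aibv // PAGE != last_byte // PAGE:
        return "exception:1210"
    if not resources_available:
        return "exception:1214"
    if dte.get("noi"):
        # A populated NOI means interruptions are already registered.
        return "exception:1218"
    for key in PARAMS:
        dte[key] = fib[key]
    dte["msi_enabled"] = True  # step 1220
    return "registered"
```

Unregistering would, under the same model, reset each key in `PARAMS` to 0 after confirming that the function is currently registered.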
In addition to the above, another operation control that can be specified is an unregister adapter interruptions operation, an example of which is described with reference to FIG. 13. With this operation, the adapter function parameters associated with adapter interruptions are reset.
Referring to FIG. 13, initially, at query 1300, a determination is made as to whether the adapter specified by the function handle is registered for an interrupt. If not, then at step 1302, an exception condition is provided. Otherwise, at step 1304, the interrupt parameter in the function table entry (or other location) and the corresponding device table entry is set to 0. In one example, the parameters include ISC, NOI, AIBVO, S, AIBSO, AIBV address, and AISB address.
As described above, in one embodiment, a Call Logical Processor instruction is used to obtain information about the adapter function. One embodiment of this instruction is shown in FIG. 14. As shown, in one example, a Call Logical Processor (CLP) instruction 1400 includes an opcode 1402 indicating that this is a Call Logical Processor instruction; and a command indication 1404. In one example, the indication is the address of a request block describing the command to be executed, and the information in the request block depends on the command. Examples of request blocks and corresponding response blocks for the respective commands are described with reference to FIGS. 14B-16B.
Referring first to FIG. 14B, a request block for a list PCI function command is provided. The list PCI function command is used to obtain a list of PCI functions assigned to the requesting configuration (e.g., the requesting operating system). Request block 1420 includes several parameters, such as:
length field 1422: this field indicates the length of the request block;
command code 1424: this field indicates the list PCI function command; and
recovery token 1426: this field is an integer that is used to start a new list PCI function command or to resume a previous list PCI function command, as will be described in more detail below.
When the recovery token field in the command request block includes a value, for example, zero, then a new list of PCI functions is requested. When the recovery token field includes a non-zero value returned, for example, from the previous list PCI function command, then the request is to continue with the previous list of PCI functions.
A response block is returned in response to issuing and processing the Call Logical Processor instruction for the list PCI function command. One embodiment of a response block is shown in FIG. 14C. In one example, the response block 1450 for the list PCI function command includes:
length field 1452: this field indicates the length of the response block;
response code 1454: this field indicates the status of the command;
PCI function list 1456: this field indicates a list of one or more PCI functions available to the requesting operating system;
recovery token 1458: this field indicates whether continuation of the previous PCI function list is requested. In one example, when the recovery token in the request block and the recovery token in the response block are both zero, all PCI functions assigned to the requesting configuration are represented in the list of PCI functions; if the recovery token in the request block is zero and the recovery token in the response block is non-zero, there may be additional PCI functions assigned to the requesting configuration that are not represented in the list; if the recovery token in the request block is non-zero and the recovery token in the response block is zero, the remaining PCI functions assigned to the requesting configuration, starting from the recovery point, are represented in the list; and when the recovery tokens in both the request and response blocks are non-zero, there may be, beyond the recovery point, additional PCI functions assigned to the requesting configuration that are not represented in any of the associated PCI function lists. After being returned, the recovery token remains valid for an indefinite period of time, but it may be invalidated for a variety of model-dependent reasons (including system load elapsed time).
Model-dependent data 1460: this field includes system dependent data;
number of PCI functions 1462: this field indicates the maximum number of PCI functions supported by the facility; and
entry size 1464: this field indicates the size of each entry in the PCI function list.
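The recovery token semantics described for the request and response blocks amount to a paging loop: start with a token of zero and reissue the command with each returned token until a zero token comes back. The sketch below illustrates this under stated assumptions. The `issue_list_command` callable and the `fake_clp` stand-in are hypothetical and do not represent the actual instruction interface, and token invalidation is not modelled.

```python
from typing import Callable, List, Tuple

# (entries in this response, recovery token from the response block)
ListResponse = Tuple[List[str], int]


def list_all_functions(
        issue_list_command: Callable[[int], ListResponse]) -> List[str]:
    """Collect every PCI function by following recovery tokens."""
    functions: List[str] = []
    token = 0  # zero requests a new list
    while True:
        entries, token = issue_list_command(token)
        functions.extend(entries)
        if token == 0:
            # Zero in the response: nothing remains beyond this list.
            return functions


def fake_clp(token: int) -> ListResponse:
    """Stand-in backend that serves two pages of made-up entries."""
    pages = (["fn-a", "fn-b"], ["fn-c"])
    next_token = token + 1 if token + 1 < len(pages) else 0
    return list(pages[token]), next_token
```

With the stand-in backend, `list_all_functions(fake_clp)` gathers the entries of both pages in order.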
Further details regarding the PCI function list are described with reference to FIG. 14D. In one example, the list of PCI functions includes a plurality of entries and each entry 1456 includes, by way of example, the following information:
device ID 1470: this field indicates the I/O adapter associated with the corresponding PCI function;
vendor ID 1472: this field identifies the manufacturer of the I/O adapter associated with the corresponding PCI function;
function identifier 1474: this field includes the persistent identifier of the PCI function;
function handle 1476: this field identifies the PCI function. The stored PCI function handle is a generic handle when a designated bit of the handle is zero, and it is an enabled handle when that bit is 1. If the PCI function is disabled, the generic PCI function handle is stored. If the PCI function is enabled, the enabled PCI function handle is stored. In one example, the PCI function handle does not persist beyond an IPL, unlike the PCI function ID, which is persistent and set for the lifetime of the I/O configuration definition; and
configuration state 1478: this field indicates the state of the PCI function. When the indicator is, for example, zero, the state is standby, and when the indicator is, for example, 1, the state is configured. When in standby, the PCI function handle is a generic PCI function handle, and when configured, it is a generic or enabled PCI function handle, depending on whether the PCI function is enabled.
After obtaining the list of adapter functions, information regarding the attributes of the selected function specified by the specified PCI function handle may be obtained. This information may be obtained by issuing a CLP instruction with a query function command.
One embodiment of a request block for a query PCI function command is described with reference to FIG. 15A. In one example, request block 1500 includes, for example:
length field 1502: this field indicates the length of the request block;
command code 1504: this field indicates the query PCI function command; and
function handle 1506: this field includes a (e.g., generic or enabled) PCI function handle that specifies the PCI function to be queried.
A response block is returned in response to issuing the Call Logical Processor instruction with the query PCI function command. One embodiment of a response block is shown in FIG. 15B. In one example, response block 1550 includes the following:
length 1552: this field indicates the length of the response block;
response code 1554: this field indicates the status of the command;
function group ID 1556: this field indicates the PCI function group identifier. The PCI function group identifier is used to associate a group of PCI functions with a set of attributes (also referred to herein as properties). Each PCI function having the same PCI function group identifier has the same set of attributes;
function ID 1558: this field is the persistent identifier of the PCI function originally specified by the PCI function handle; it is set for the lifetime of the I/O configuration definition;
physical channel adapter 1560: this value represents a model-dependent identification of the location of the physical I/O adapter corresponding to the PCI function;
base Address Register (BAR) 1.. n 1562: this field includes a plurality of unsigned integers, which are designated BARs0–BARnIt is associated with the initially designated PCI function and its value is also stored in the base register associated with the PCI function. Each BAR specifies the starting address of the memory space or I/O space in the adapter, and also indicates the type of address space, i.e., whether it is a 64-bit or 32-bit memory space, or a 32-bit I/O space, for example;
size 1.. n 1564: this field includes a plurality of unsigned integers, designated SIZE0–SIZEn. When the value of the size field is non-zero, it represents the size of each address space, each entry of which corresponds to the previously described BAR.
start of available DMA 1566: this field includes an address indicating the start of a PCI address range that may be used for DMA operations; and
end of available DMA 1568: this field includes a value indicating the end of the PCI address range that is available for DMA operations.
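The fields of response block 1550 can be pictured as a simple record. The following is a minimal sketch only: the class name, attribute names and helper methods are hypothetical, field widths and byte offsets are not modeled because the text does not give them, and treating the DMA range as inclusive at both ends is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class QueryPciFunctionResponse:
    """Hypothetical in-memory view of response block 1550."""
    length: int                      # 1552: length of the response block
    response_code: int               # 1554: status of the command
    function_group_id: int           # 1556: PCI function group identifier
    function_id: int                 # 1558: persistent PCI function identifier
    physical_channel_adapter: int    # 1560: model-dependent adapter location
    bars: List[int] = field(default_factory=list)   # 1562: BAR0..BARn
    sizes: List[int] = field(default_factory=list)  # 1564: SIZE0..SIZEn
    start_available_dma: int = 0     # 1566: start of usable PCI address range
    end_available_dma: int = 0       # 1568: end of usable PCI address range

    def address_spaces(self) -> List[Tuple[int, int, int]]:
        """Return (index, bar, size) for each BAR whose size entry is non-zero."""
        return [(i, bar, size)
                for i, (bar, size) in enumerate(zip(self.bars, self.sizes))
                if size != 0]

    def dma_range_contains(self, pci_address: int) -> bool:
        """Assumption: both the start and end values are inclusive."""
        return self.start_available_dma <= pci_address <= self.end_available_dma
```

Pairing each BAR with its size entry mirrors the statement that each size entry corresponds to a previously described BAR; entries with a zero size are simply skipped.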
In addition to obtaining attributes for a particular adapter function, attributes for the group containing that function may also be obtained. These common attributes may be obtained by issuing a CLP instruction with a query PCI function group command. The command is used to obtain the set of supported attributes for a group of one or more PCI functions designated by the specified PCI function group identifier. The PCI function group identifier associates a group of PCI functions with the same set of attributes. One embodiment of a request block for the query PCI function group command is described with reference to FIG. 16A. In one example, request block 1600 includes the following:
length field 1602: this field indicates the length of the request block;
command code 1604: this field indicates the query PCI function group command; and
function group ID 1606: this field specifies the PCI function group identifier for which the attributes are acquired.
In response to issuing and processing a Call Logical Processor instruction with a query PCI function group command, a response block is returned. FIG. 16B illustrates one embodiment of a response block. In one embodiment, response block 1650 includes:
length field 1652: this field indicates the length of the response block;
response code 1654: this field indicates the status of the command;
number of interruptions 1656: this field indicates the maximum number of consecutive MSI vector numbers (i.e., interruption event indicators) that are supported by the PCI facility for each PCI function in the specified PCI function group. In one example, the range of possible valid values for the number of interruptions is 0 to 2,048;
version 1658: this field indicates the version of the PCI specification supported by the PCI facility to which the group of PCI functions designated by the specified PCI function group identifier is attached;
frame 1662: this field indicates the supported frame (or page) size for I/O address translation;
measurement block update interval 1664: this is a value indicating the approximate time interval (e.g., in milliseconds) between PCI function measurement block updates;
DMA address space mask 1666: this is a value to indicate which bits in the PCI address are used to identify the DMA address space; and
MSI address 1668: this is a value used for message signaled interruption requests.
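Two of the group attributes lend themselves to a short illustration. The sketch below is not taken from the described embodiment: the function names are hypothetical, the upper bound of 2,048 comes from the example range given for the number of interruptions, and the mask is assumed to be applied with a plain bitwise AND.

```python
MAX_INTERRUPTIONS = 2048  # upper bound of the example range given in the text


def valid_number_of_interruptions(noi: int) -> bool:
    """Check a reported number of interruptions against the example range 0..2,048."""
    return 0 <= noi <= MAX_INTERRUPTIONS


def dma_address_space_bits(pci_address: int, dma_address_space_mask: int) -> int:
    """Keep only the PCI address bits that the mask marks as identifying
    the DMA address space (assumed to be a simple bitwise AND)."""
    return pci_address & dma_address_space_mask
```

For example, with a hypothetical mask of 0xFFFF0000, the PCI address 0xABCD1234 yields 0xABCD0000 as the bits that select the DMA address space.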
The query list and query function commands described above retrieve information from, for example, a function table. At initialization, or after a hot plug of an adapter, the firmware performs a bus walk to determine the location of the adapter and its basic characteristics. The firmware stores this information in a function table entry (FTE) for each adapter. The accessibility of the adapter is determined based on policies set by the system administrator and is also set by the firmware in the FTE. The query list and query function commands may then retrieve this information and store it in their respective response blocks accessible to the operating system.
In addition, the group information is based on the capabilities of the given system I/O architecture infrastructure as well as the firmware and I/O hubs. This may be stored in the FTE or any other convenient location for later retrieval at query processing. In particular, the query group command retrieves this information and stores it in a response block accessible to the operating system. In one example, when a guest issues a query command, the host operating system may reissue the query to determine system capabilities and then may modify the response block based on the capabilities of the host before returning the response block to the guest.
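The last step, in which a host adjusts the response block before returning it to the guest, can be sketched as follows. This is an illustration under stated assumptions: the dictionary representation, the key names and the rule of capping each numeric attribute at the host's limit are all hypothetical, since the text says only that the host may modify the response based on its capabilities.

```python
def adjust_group_response_for_guest(machine_response: dict, host_limits: dict) -> dict:
    """Return a copy of the response block in which every attribute that the
    host limits is capped at the host's value. The machine's own response is
    left unmodified."""
    guest_response = dict(machine_response)
    for key, limit in host_limits.items():
        if key in guest_response:
            guest_response[key] = min(guest_response[key], limit)
    return guest_response
```

Working on a copy keeps the machine-reported values available to the host while the guest sees only the adjusted block.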
The ability to convert a PCI message signaled interruption into an I/O adapter event notification to a guest (e.g., a guest operating system) is described in detail above. This provides a low-latency interrupt request and delivery of MSIs from a relatively large number of PCI functions to the operating system, and it retains the style of MSI vector designation while fitting it into the adapter event notification architecture. It allows the I/O hub to be connected to a relatively large number of PCI functions and eliminates the problem of generating a unique interrupt each time an MSI vector is written.
In accordance with aspects of the present invention, MSIs that are issued by adapter functions and converted to I/O adapter event notifications are delivered to a guest (e.g., pageable storage mode guest) without host intervention during delivery.
In the embodiment described herein, the adapter is a PCI adapter. As used herein, PCI refers to any adapter implemented according to a PCI-based specification defined by the Peripheral Component Interconnect Special Interest Group (PCI-SIG) (www.pcisig.com/home), including but not limited to PCI or PCIe. In one particular example, Peripheral Component Interconnect Express (PCIe) is a component-level interconnect standard that defines a bi-directional communication protocol for transactions between an I/O adapter and a host system. PCIe communications are encapsulated in packets according to the PCIe standard for transmission on a PCIe bus. Transactions originating at the I/O adapter and terminating at the host system are referred to as upbound transactions. Transactions originating at the host system and terminating at the I/O adapter are referred to as downbound transactions. The PCIe topology is based on point-to-point unidirectional links that are paired (e.g., one upbound link, one downbound link) to form a PCIe bus. The PCIe standard is maintained and published by the PCI-SIG.
As will be appreciated by one skilled in the art, the present invention may be embodied as a system, method or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining hardware and software, any of which may generally be referred to herein as a "circuit," "module" or "system." Furthermore, in some embodiments, the invention may also be embodied in the form of a computer program product in one or more computer-readable media having computer-readable program code embodied in the medium.
Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
Referring now to FIG. 17, in one example, a computer program product 1700 includes, for instance, one or more computer-readable storage media 1702 having computer-readable program code means or logic 1704 stored thereon to provide and facilitate one or more aspects of the present invention.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The present invention is described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition to the foregoing, one or more aspects of the present invention may be provided, offered, deployed, managed, serviced, etc. by a service provider who offers management of a user's environment. For example, a service provider can create, maintain, support, etc., computer code and/or computer infrastructure that performs one or more aspects of the present invention for one or more users. The service provider, in turn, may accept payment from the user, for example, according to a subscription and/or fee agreement. Additionally or alternatively, the service provider may receive payment from the sale of advertising content to one or more third parties.
In one aspect of the invention, an application may be deployed to perform one or more aspects of the invention. As one example, deploying an application comprises providing a computer infrastructure operable to perform one or more aspects of the present invention.
As yet another aspect of the present invention, a computing infrastructure may be deployed comprising integrating computer-readable code into a computer system, wherein the code in combination with the computing system is capable of performing one or more aspects of the present invention.
As yet another aspect of the present invention, a process for integrating computing infrastructure comprising integrating computer readable code into a computer system may be provided. The computer system includes a computer-readable medium, wherein the computer medium includes one or more aspects of the present invention. The code in combination with the computer system is capable of performing one or more aspects of the present invention.
While various embodiments are described above, these are only examples. For example, computing environments of other architectures may incorporate and use one or more aspects of the present invention. By way of example, servers other than System z servers, such as Power Systems servers or other servers offered by International Business Machines Corporation, or servers of other companies, may include, use and/or benefit from one or more aspects of the present invention. Moreover, although in the examples illustrated herein the adapters and PCI hub are considered to be part of the server, in other embodiments they need not be considered part of the server, but may simply be considered to be coupled to the system memory and/or other components of the computing environment. The computing environment need not be a server. Moreover, although the adapters are PCI based, one or more aspects of the present invention may be used with other adapters or other I/O components. Adapters and PCI adapters are examples only. Further, one or more aspects of the present invention may be applicable to interruption schemes other than PCI MSI. Further, although bits are set in the described examples, in other embodiments, bytes or other types of indicators may be set. Moreover, the DTE and other structures may include more, less, or different information. Many other variations are possible.
Moreover, other types of computing environments may benefit from one or more aspects of the present invention. By way of example, a data processing system suitable for storing and/or executing program code may be used that includes at least two processors coupled directly or indirectly to memory elements through a system bus. The memory elements include, for instance, local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, DASD, magnetic tape, CDs, DVDs, thumb drives, and other storage media, etc.) can be coupled to the system either directly or through intervening I/O controllers. Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems and Ethernet cards are just a few of the available types of network adapters.
Referring to FIG. 18, representative components of a host computer system 5000 to implement one or more aspects of the present invention are depicted. The representative host computer 5000 includes one or more CPUs 5001 in communication with computer memory (i.e., central storage) 5002, as well as I/O interfaces to storage media devices 5011 and networks 5010 for communicating with other computers or SANs and the like. The CPU 5001 conforms to an architecture having an architected instruction set and architected functionality. The CPU 5001 may have dynamic address translation (DAT) 5003 for translating program addresses (virtual addresses) into real addresses of memory. A DAT typically includes a translation lookaside buffer (TLB) 5007 for caching translations so that later accesses to the same block of computer memory 5002 do not require the delay of address translation. Typically, a cache 5009 is used between the computer memory 5002 and the processor 5001. The cache 5009 may be hierarchical, having a large cache available to more than one CPU, and smaller, faster (lower level) caches between the large cache and each CPU. In some embodiments, the lower level cache is split to provide separate lower level caches for instruction fetching and data accesses. In one embodiment, instructions are fetched from memory 5002 by instruction fetch unit 5004 via cache 5009. The instructions are decoded in the instruction decode unit 5006 and (in some embodiments, with other instructions) sent to the one or more instruction execution units 5008. Typically, several execution units 5008 are used, such as an arithmetic execution unit, a floating point execution unit, and a branch instruction execution unit. The instruction is executed by the execution unit, accessing operands from instruction-specified registers or memory as needed.
If an operand is to be accessed (loaded or stored) from memory 5002, load/store unit 5005 typically handles the access under the control of the instruction being executed. The instructions may be executed in hardware circuitry, or in internal microcode (firmware), or in a combination thereof.
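The role of the TLB in dynamic address translation, as described above, can be illustrated with a small sketch. It is a simplification under stated assumptions: a single-level page table held in a dictionary, a 4,096-byte page size and the class and attribute names are all hypothetical and are not taken from the described CPU.

```python
PAGE_SIZE = 4096  # assumed page size; the text does not specify one


class Translator:
    """Toy model of address translation with a TLB that caches translations."""

    def __init__(self, page_table: dict):
        self.page_table = page_table  # virtual page number -> real frame number
        self.tlb = {}                 # cached translations
        self.table_walks = 0          # counts the slow translations performed

    def translate(self, virtual_address: int) -> int:
        page, offset = divmod(virtual_address, PAGE_SIZE)
        frame = self.tlb.get(page)
        if frame is None:
            # TLB miss: perform the (slow) table lookup and cache the result.
            self.table_walks += 1
            frame = self.page_table[page]  # a KeyError models a translation fault
            self.tlb[page] = frame
        return frame * PAGE_SIZE + offset
```

A second access to the same page is served from the cached entry, which is the delay the TLB is meant to avoid.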
Note that the computer system includes information in local (or main) memory, as well as addressing, protection, and reference and change records. Some aspects of addressing include address format, concept of address space, various types of addresses, and the manner in which one type of address is translated to another type of address. Some main memories include persistently allocated memory locations. The main memory provides the system with fast-access data storage that is directly addressable. Both data and programs will be loaded into main memory (from the input device) before they can be processed.
The main memory may include one or more smaller, faster-access cache memories, sometimes referred to as caches. The cache is typically physically associated with the CPU or I/O processor. The effects of the physical structure and use of different storage media are not typically observed by a program except in terms of performance.
Separate caches for instructions and data operands may be maintained. Information in a cache is maintained in contiguous bytes on an integral boundary called a cache block or cache line (or simply line). A model may provide an EXTRACT CACHE ATTRIBUTE instruction that returns the size of a cache line in bytes. A model may also provide PREFETCH DATA and PREFETCH DATA RELATIVE LONG instructions that prefetch storage into the data or instruction cache, or release data from the cache.
The memory is considered to be a long horizontal string of bits. For most operations, accesses to memory are made in left-to-right order. The bit string is subdivided into units of eight bits. The eight-bit unit is called a byte, which is the basic building block for all information formats. Each byte location in memory is identified by a unique non-negative integer, which is the address of the byte location, or simply, the byte address. Adjacent byte positions have consecutive addresses, starting at 0 on the left and proceeding in left-to-right order. The address is an unsigned binary integer and is 24, 31 or 64 bits.
Information is transferred between the memory and the CPU or channel subsystem one byte or a group of bytes at a time. Unless otherwise specified, a group of bytes in memory is addressed by the leftmost byte of the group. The number of bytes in a group is either implied or explicitly specified by the operation to be performed. When used in CPU operations, a group of bytes is called a field. Within each group of bytes, bits are numbered in left-to-right order. The leftmost bit is sometimes referred to as the "high order" bit and the rightmost bit as the "low order" bit. Bit numbers are not memory addresses, however; only bytes can be addressed. To operate on a single bit of a byte in memory, the entire byte is accessed. The bits in a byte are numbered 0 to 7, from left to right. Bits in an address may be numbered 8-31 or 40-63 for a 24-bit address, or 1-31 or 33-63 for a 31-bit address; they are numbered 0-63 for a 64-bit address. Within any other fixed-length format of multiple bytes, the bits making up the format are numbered consecutively starting from 0. For error detection, and preferably for correction, one or more check bits may be transmitted with each byte or group of bytes. Such check bits are generated automatically by the machine and cannot be directly controlled by the program. Storage capacity is expressed in number of bytes. When the length of a memory operand field is implied by the operation code of an instruction, the field is said to have a fixed length, which may be one, two, four, eight, or sixteen bytes. Larger fields may be implied for some instructions. When the length of a memory operand field is not implied but is stated explicitly, the field is said to have a variable length. Variable-length operands may vary in length in increments of one byte (or, for some instructions, in multiples of two bytes or other multiples).
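Because bits are numbered from the left, bit 0 of a byte is its high-order bit, which is the reverse of the convention many programmers expect. The following minimal sketch makes the numbering concrete; the function names are hypothetical, and the second helper covers only the 40-63, 33-63 and 0-63 numberings, that is, an address right-aligned in a 64-bit field.

```python
def bit_of_byte(byte_value: int, bit_number: int) -> int:
    """Return bit `bit_number` of a byte, where bit 0 is the leftmost
    (high-order) bit and bit 7 is the rightmost (low-order) bit."""
    if not 0 <= byte_value <= 0xFF:
        raise ValueError("byte_value must fit in eight bits")
    if not 0 <= bit_number <= 7:
        raise ValueError("bit_number must be in the range 0..7")
    return (byte_value >> (7 - bit_number)) & 1


def address_bit_range(address_bits: int) -> tuple:
    """Return (first, last) bit numbers occupied by an address of the given
    width when it is right-aligned in a 64-bit field."""
    if address_bits not in (24, 31, 64):
        raise ValueError("address width must be 24, 31 or 64 bits")
    return (64 - address_bits, 63)
```

Note that the whole byte is passed in and a single bit is then selected, matching the statement that only bytes are addressable.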
When information is placed in memory, only the contents of those byte locations that are included in the designated field are replaced, even though the width of the physical path to memory may be greater than the length of the field being stored.
Some units of information are to be located on an integral boundary in memory. A boundary is called integral for a unit of information when its memory address is a multiple of the length of the unit in bytes. Special names are given to fields of 2, 4, 8 and 16 bytes on an integral boundary. A halfword is a group of two consecutive bytes on a two-byte boundary and is the basic building block of instructions. A word is a group of four consecutive bytes on a four-byte boundary. A doubleword is a group of eight consecutive bytes on an eight-byte boundary. A quadword is a group of 16 consecutive bytes on a 16-byte boundary. When a memory address designates a halfword, word, doubleword, or quadword, the binary representation of the address contains one, two, three, or four rightmost zero bits, respectively. Instructions must be on two-byte integral boundaries. The memory operands of most instructions do not have boundary-alignment requirements.
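The boundary rule reduces to a divisibility test, and the count of rightmost zero bits follows from it. The sketch below illustrates both; the names are hypothetical and it covers only the four named units.

```python
UNIT_LENGTHS = {"halfword": 2, "word": 4, "doubleword": 8, "quadword": 16}


def on_integral_boundary(address: int, unit: str) -> bool:
    """True when the address is a multiple of the unit's length in bytes."""
    return address % UNIT_LENGTHS[unit] == 0


def rightmost_zero_bits(address: int) -> int:
    """Count the consecutive zero bits at the right of a non-zero address."""
    if address <= 0:
        raise ValueError("address must be a positive integer")
    count = 0
    while address & 1 == 0:
        address >>= 1
        count += 1
    return count
```

For the address 0x1008, the binary form ends in three zero bits, so it lies on halfword, word and doubleword boundaries but not on a quadword boundary.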
On devices that implement separate caches for instructions and data operands, significant delays may be experienced if a program stores in a cache line and an instruction is subsequently fetched from the cache line, regardless of whether the store alters the subsequently fetched instruction.
In one embodiment, the invention may be implemented by software (sometimes referred to as licensed internal code, firmware, microcode, millicode, picocode, etc., any of which would be consistent with the invention). Referring to FIG. 18, software program code embodying the present invention is typically accessed by a processor of the host system 5000 from a long-term storage media device 5011, such as a CD-ROM drive, tape drive or hard drive. The software program code may be embodied on any of a variety of known media for use with a data processing system, such as a floppy disk, a hard drive, or a CD-ROM. The code may be distributed on such media, or may be distributed to users of other computer systems from the computer memory 5002 or storage devices of one computer system over the network 5010 for use by users of such other systems.
The software program code includes an operating system which controls the function and interaction of the various computer components and one or more application programs. The program code is typically paged from the storage media device 5011 to the relatively higher speed computer memory 5002 where it is available to the processor 5001. The techniques and methods for embodying software program code in memory, on physical media, and/or distributing software code via networks are well known and will not be discussed further herein. When the program code is created and stored on a tangible medium, including but not limited to an electronic memory module (RAM), flash memory, Compact Discs (CDs), DVDs, tapes, etc., it is often referred to as a "computer program product". The computer program product medium is typically readable by processing circuitry preferably located in a computer system for execution by the processing circuitry.
FIG. 19 illustrates a representative workstation or server hardware system in which the present invention may be implemented. The system 5020 of FIG. 19 includes a representative base computer system 5021, such as a personal computer, workstation or server, including optional peripheral devices. The base computer system 5021 includes one or more processors 5026 and a bus used to connect and enable communication between the processor(s) 5026 and the other components of the system 5021 in accordance with known techniques. The bus connects the processor 5026 to memory 5025 and long-term storage 5027, which may include, for example, a hard drive (including any of magnetic media, CD, DVD and flash memory, for example) or a tape drive. The system 5021 may also include a user interface adapter, which connects the microprocessor 5026 via the bus to one or more interface devices, such as a keyboard 5024, a mouse 5023, a printer/scanner 5030 and/or other interface devices, which may be any user interface device, such as a touch-sensitive screen, a digitized entry pad, etc. The bus may also connect a display device 5022, such as an LCD screen or monitor, to the microprocessor 5026 via a display adapter.
The system 5021 may communicate with other computers or networks of computers via a network adapter capable of communicating 5028 with a network 5029. Exemplary network adapters are communications channels, token ring, Ethernet or modems. Alternatively, the system 5021 may communicate using a wireless interface, such as a CDPD (cellular digital packet data) card. The system 5021 can be associated with such other computers in a Local Area Network (LAN) or a Wide Area Network (WAN), or the system 5021 can be a client in a client/server arrangement with another computer, etc. All of these configurations, as well as suitable communication hardware and software, are known in the art.
FIG. 20 illustrates a data processing network 5040 in which the present invention may be implemented. The data processing network 5040 may include a plurality of individual networks, such as a wireless network and a wired network, each of which may include a plurality of individual workstations 5041, 5042, 5043, 5044. Further, those skilled in the art will appreciate that one or more LANs may be included, wherein a LAN may include a plurality of intelligent workstations coupled to a host processor.
Still referring to FIG. 20, the network may also include mainframe computers or servers, such as a gateway computer (client server 5046) or application server (remote server 5048, which may access a data repository and may also be accessed directly from a workstation 5045). The gateway computer 5046 serves as a point of entry into each individual network. A gateway is needed when connecting one networking protocol to another. The gateway 5046 may preferably be coupled to another network (e.g., the Internet 5047) by means of a communications link. The gateway 5046 may also be directly coupled to one or more workstations 5041, 5042, 5043, 5044 using a communications link. The gateway computer may be implemented utilizing an IBM eServer System z server available from International Business Machines Corporation.
Referring concurrently to FIGS. 19 and 20, software programming code which may embody the present invention may be accessed by the processor 5026 of the system 5020 from long-term storage media 5027, such as a CD-ROM drive or hard drive. The software programming code may be embodied on any of a variety of known media for use with a data processing system, such as a floppy disk, a hard drive, or a CD-ROM. The code may be distributed on such media, or may be distributed to users 5050, 5051 from the memory or storage of one computer system over a network to other computer systems for use by users of such other systems.
Alternatively, the programming code may be embodied in the memory 5025 and accessed by the processor 5026 using a processor bus. Such programming code includes an operating system which controls the function and interaction of the various computer components and one or more application programs 5032. Program code is typically paged from the storage medium 5027 to high-speed memory 5025 where it is available for processing by the processor 5026. Techniques and methods for embodying software programming code in memory, on physical media, and/or distributing software code via networks are well known and will not be discussed further herein. Program code, when created and stored on tangible media, including but not limited to electronic memory modules (RAM), flash memory, Compact Discs (CDs), DVDs, tapes, etc., is commonly referred to as a "computer program product". The computer program product medium is typically readable by a processing circuit, preferably located in a computer system, for execution by the processing circuit.
The cache most readily used by the processor (which is typically faster and smaller than the other caches of the processor) is the lowest level (L1 or level 1) cache, and main storage (main memory) is the highest level cache (L3 if there are three levels). The lowest level cache is often divided into an instruction cache (I-cache) that holds the machine instructions to be executed, and a data cache (D-cache) that holds the data operands.
Referring to FIG. 21, an exemplary processor embodiment is shown for the processor 5026. Typically, one or more levels of cache 5053 are used to buffer memory blocks in order to improve processor performance. The cache 5053 is a high-speed buffer that holds cache lines of memory data that are likely to be used. Typical cache lines are 64, 128 or 256 bytes of memory data. Separate caches are often used for caching instructions and for caching data. Cache coherency (synchronization of copies of lines in memory and the caches) is typically provided by various "snoop" algorithms well known in the art. The main memory 5025 of a processor system is often referred to as a cache. In a processor system having 4 levels of cache 5053, main memory 5025 is sometimes referred to as the level 5 (L5) cache, because it is typically faster and holds only a portion of the non-volatile storage (DASD, tape, etc.) that is available to the computer system. Main memory 5025 "caches" pages of data paged in and out of main memory 5025 by the operating system.
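The relationship between a byte address and the cache line that holds it can be sketched briefly. This is an illustration only: the function names are hypothetical, and the sketch assumes that lines start on boundaries that are multiples of a power-of-two line size, such as the 64, 128 or 256 bytes mentioned above.

```python
def cache_line_of(address: int, line_size: int = 256) -> tuple:
    """Return (line_base, offset) for a byte address, assuming lines begin on
    boundaries that are multiples of a power-of-two line size."""
    if line_size <= 0 or line_size & (line_size - 1):
        raise ValueError("line_size must be a positive power of two")
    line_base = address & ~(line_size - 1)
    return line_base, address - line_base


def same_cache_line(a: int, b: int, line_size: int = 256) -> bool:
    """True when both byte addresses fall within the same cache line."""
    return cache_line_of(a, line_size)[0] == cache_line_of(b, line_size)[0]
```

With a 256-byte line, address 0x12F4 falls in the line starting at 0x1200; with a 64-byte line the same address falls in the line starting at 0x12C0.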
Program counter (instruction counter) 5061 keeps track of the address of the current instruction to be executed. The program counter in a z/Architecture processor is 64 bits and may be truncated to 31 or 24 bits to support prior addressing limits. The program counter is typically embodied in the PSW (program status word) of the computer so that it persists during context switching. Thus, a program in progress, having a program counter value, may be interrupted by, for example, the operating system (a context switch from the program environment to the operating system environment). While the program is not active, the PSW of the program maintains the program counter value, and while the operating system is executing, the program counter (in the PSW) of the operating system is used. Typically, the program counter is incremented by an amount equal to the number of bytes of the current instruction. RISC (Reduced Instruction Set Computing) instructions are typically of fixed length, while CISC (Complex Instruction Set Computing) instructions are typically of variable length. Instructions of the IBM z/Architecture are CISC instructions having a length of 2, 4 or 6 bytes. Program counter 5061 is modified by, for example, a context switch operation or a branch taken operation of a branch instruction. In a context switch operation, the current program counter value is saved in the program status word along with other state information about the program being executed (such as condition codes), and a new program counter value is loaded that points to an instruction of the new program module to be executed. A branch taken operation is performed to permit the program to make decisions or to loop within the program by loading the result of the branch instruction into the program counter 5061.
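The program counter behavior described above can be sketched as follows. This is an illustrative model only, not the patented implementation: the function names are hypothetical, and the rule that the two high-order bits of the first opcode byte select a length of 2, 4 or 6 bytes is an assumption about a z/Architecture-style encoding that the text itself does not spell out.

```python
# Illustrative sketch; all names here are hypothetical.

# Masks for the 24-bit, 31-bit and 64-bit addressing limits mentioned in the text.
ADDRESS_MASKS = {
    24: (1 << 24) - 1,
    31: (1 << 31) - 1,
    64: (1 << 64) - 1,
}


def instruction_length(first_byte: int) -> int:
    """Assumed rule: the two high-order bits of the first byte give the length."""
    top_bits = (first_byte >> 6) & 0b11
    if top_bits == 0b00:
        return 2
    if top_bits == 0b11:
        return 6
    return 4  # 0b01 and 0b10


def next_sequential_pc(pc: int, first_byte: int, mode_bits: int = 64) -> int:
    """Advance the program counter by the byte length of the current instruction."""
    return (pc + instruction_length(first_byte)) & ADDRESS_MASKS[mode_bits]


def branch_taken_pc(target: int, mode_bits: int = 64) -> int:
    """A taken branch loads the branch result, truncated to the addressing mode."""
    return target & ADDRESS_MASKS[mode_bits]
```

The masking step models the truncation to 31 or 24 bits: in a 24-bit mode, an increment past the top of the address range wraps around to a low address.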
Typically, instructions are fetched on behalf of the processor 5026 using an instruction fetch unit 5055. The fetch unit may fetch a "next sequence of instructions," a target instruction of a branch taken instruction, or a first instruction of a context-switched program. Present instruction fetch units typically use prefetch techniques to speculatively prefetch instructions based on the likelihood that the prefetched instructions will be used. For example, the fetch unit may fetch 16 bytes of instructions, including the next sequential instruction and additional bytes of further sequential instructions.
The fetched instructions are then executed by the processor 5026. In one embodiment, the fetched instructions are passed to the dispatch unit 5056 of the fetch unit. The dispatch unit decodes the instructions and forwards information about the decoded instructions to the appropriate units 5057, 5058, 5060. The execution unit 5057 will typically receive information from the instruction fetch unit 5055 regarding decoded arithmetic instructions, and will perform arithmetic operations on operands according to the opcode of the instruction. Operands are preferably provided to the execution unit 5057 from storage 5025, architected registers 5059, or from an immediate field of the instruction being executed. The results of the execution, when stored, are stored in storage 5025, registers 5059, or other machine hardware (such as control registers, PSW registers, etc.).
The processor 5026 typically has one or more units 5057, 5058, 5060 for executing the function of an instruction. Referring to FIG. 22A, an execution unit 5057 may communicate with architected general registers 5059, decode/dispatch unit 5056, load/store unit 5060, and other processor units 5065 via interface logic 5071. The execution unit 5057 may use several register circuits 5067, 5068, 5069 to hold information that the Arithmetic Logic Unit (ALU) 5066 is to operate on. The ALU performs arithmetic operations such as add, subtract, multiply, and divide, as well as logical functions such as AND, OR, exclusive OR (XOR), rotate, and shift. Preferably, the ALU also supports specialized operations that are design dependent. Other circuitry may provide other architected facilities 5072, including, for example, condition codes and recovery support logic. Typically, the result of an ALU operation is held in an output register circuit 5070, which may forward the result to a variety of other processing functions. There are many arrangements of processor units, and this description is intended only to provide a representative understanding of one embodiment.
For example, an ADD instruction would be executed in an execution unit 5057 having arithmetic and logical functionality, while a floating point instruction, for example, would be executed in a floating point execution unit having dedicated floating point capabilities. Preferably, the execution unit operates on the operands identified by the instruction by performing the function defined by the opcode on those operands. For example, an ADD instruction may be executed by the execution unit 5057 on operands found in two registers 5059 identified by register fields of the instruction.
The execution unit 5057 performs the arithmetic addition on two operands and stores the result in a third operand, which may be a third register or one of the two source registers. The execution unit preferably utilizes an Arithmetic Logic Unit (ALU) 5066, which can perform a variety of logical functions, such as shift, rotate, AND, OR, and XOR, as well as a variety of algebraic functions, including add, subtract, multiply, and divide. Some ALUs 5066 are designed for scalar operations, and some for floating point. Depending on the architecture, data may be big endian (where the least significant byte is at the highest byte address) or little endian (where the least significant byte is at the lowest byte address). The IBM z/Architecture is big endian. Depending on the architecture, signed fields may be sign and magnitude, 1's complement, or 2's complement. A 2's complement number is advantageous in that the ALU does not need a separate subtraction capability, because either a negative or a positive value in 2's complement requires only an addition within the ALU. Numbers are commonly described in shorthand; for example, a 12-bit field defines an address within a 4,096-byte block, which is commonly described as a 4 Kbyte block.
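A short sketch may make the number-representation points above concrete. It assumes a 32-bit field width, and its helper names are hypothetical rather than taken from the patent. It shows subtraction carried out purely by addition of a 2's complement negation, and the byte order of one value in big endian and little endian layouts.

```python
import struct

WIDTH = 32                      # assumed field width for this sketch
MASK = (1 << WIDTH) - 1


def twos_complement_negate(value: int) -> int:
    """Invert all bits and add one, keeping the result within WIDTH bits."""
    return (~value + 1) & MASK


def subtract_using_add(a: int, b: int) -> int:
    """Compute a - b using only addition, as a 2's complement ALU can."""
    return (a + twos_complement_negate(b)) & MASK


def to_signed(value: int) -> int:
    """Interpret a WIDTH-bit pattern as a signed 2's complement number."""
    return value - (1 << WIDTH) if value & (1 << (WIDTH - 1)) else value


def byte_layout(value: int):
    """Return the (big endian, little endian) byte sequences for a 32-bit value."""
    return struct.pack(">I", value), struct.pack("<I", value)


# A 12-bit field can address every byte of a 4,096-byte (4 Kbyte) block.
BLOCK_SIZE = 1 << 12
```

For instance, `byte_layout(0x11223344)` places the least significant byte `0x44` last in the big endian form and first in the little endian form.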
Referring to FIG. 22B, branch instruction information for executing a branch instruction is typically sent to a branch unit 5058, which often employs a branch prediction algorithm, such as a branch history table 5082, to predict the outcome of the branch before other conditional operations are complete. The target of the current branch instruction will be fetched and speculatively executed before the conditional operations are complete. When the conditional operations are completed, the speculatively executed branch instructions are either completed or discarded based on the conditions of the conditional operation and the speculated outcome. A typical branch instruction may test condition codes and branch to a target address if the condition codes meet the branch requirement of the branch instruction. The target address may be calculated from several numbers, including, for example, numbers found in register fields or an immediate field of the instruction. The branch unit 5058 may employ an ALU 5074 having a plurality of input register circuits 5075, 5076, 5077 and an output register circuit 5080. The branch unit 5058 may communicate with, for example, general registers 5059, decode/dispatch unit 5056, or other circuits 5073.
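As a rough illustration of prediction from a branch history table, the sketch below uses 2-bit saturating counters indexed by branch address. The text names only "a branch history table"; the counter scheme, the table size, the indexing, and the class and method names are all assumptions made for the example.

```python
class BranchHistoryTable:
    """Hypothetical branch history table of 2-bit saturating counters."""

    def __init__(self, entries: int = 1024):
        self.entries = entries
        # Counter values 0-1 predict "not taken"; 2-3 predict "taken".
        self.counters = [1] * entries

    def _index(self, branch_address: int) -> int:
        # Drop the low bit (instructions are assumed halfword aligned).
        return (branch_address >> 1) % self.entries

    def predict_taken(self, branch_address: int) -> bool:
        """Predict the outcome before the condition is resolved."""
        return self.counters[self._index(branch_address)] >= 2

    def update(self, branch_address: int, taken: bool) -> None:
        """Record the resolved outcome so later predictions improve."""
        i = self._index(branch_address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

With this scheme a single mispredicted iteration at the end of a loop does not immediately flip the prediction, which is one reason such counters are commonly chosen.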
Execution of a set of instructions may be interrupted for a number of reasons including, for example, a context switch initiated by the operating system, a program exception or error causing a context switch, an I/O interrupt signal causing a context switch, or multi-threaded activity of multiple programs (in a multi-threaded environment). Preferably, a context switch action saves state information about the currently executing program and then loads state information about another program being invoked. The state information may be saved, for example, in hardware registers or in memory. The state information preferably includes a program counter value pointing to the next instruction to be executed, condition codes, memory translation information, and architected register contents. The context switch activity may be performed by hardware circuits, application programs, operating system programs, or firmware code (microcode, pico-code, or Licensed Internal Code (LIC)), alone or in combination.
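The save-then-load sequence of a context switch can be modeled in a few lines. This is a minimal sketch under stated assumptions: the `ProgramState` fields mirror the state items listed above, while the `Cpu` class, the dictionary used as a save area, and the sixteen-register file are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProgramState:
    """State items the text lists as saved across a context switch."""
    program_counter: int = 0
    condition_code: int = 0
    translation_info: int = 0            # stand-in for memory translation information
    registers: list = field(default_factory=lambda: [0] * 16)


class Cpu:
    def __init__(self, state: ProgramState):
        self.state = state

    def context_switch(self, save_area: dict, current: str, incoming: str) -> None:
        """Save the running program's state, then load the incoming program's state."""
        save_area[current] = self.state
        self.state = save_area.pop(incoming)
```

Usage: with `save_area = {"os": ProgramState(program_counter=0x8000)}`, calling `cpu.context_switch(save_area, "app", "os")` leaves the application's program counter preserved under `"app"` while the processor resumes at the operating system's saved address.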
The processor accesses operands according to methods defined by the instruction. An instruction may provide an immediate operand using the value of a portion of the instruction, or may provide one or more register fields that explicitly point to general purpose registers or special purpose registers (e.g., floating point registers). The instruction may utilize implied registers, identified by an opcode field, as operands. The instruction may utilize memory locations for operands. The memory location of an operand may be provided by a register, an immediate field, or a combination of registers and an immediate field, as exemplified by the z/Architecture long displacement facility, in which the instruction defines a base register, an index register, and an immediate field (displacement field) that are added together to provide, for example, the address of the operand in memory. Location herein typically implies a location in main memory (main storage) unless otherwise indicated.
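The base, index and displacement addition described above can be written out as a worked example. The sketch assumes a signed 20-bit displacement and assumes that a register number of 0 in the base or index field contributes zero rather than the contents of register 0; both are assumptions about a long-displacement-style format, and the function names are hypothetical.

```python
ADDRESS_MASK = (1 << 64) - 1


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a signed 2's complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def operand_address(registers, base: int, index: int, displacement: int,
                    displacement_bits: int = 20) -> int:
    """Add base register, index register and displacement to form the operand address."""
    base_value = registers[base] if base != 0 else 0      # assumed: field 0 means "no base"
    index_value = registers[index] if index != 0 else 0   # assumed: field 0 means "no index"
    offset = sign_extend(displacement, displacement_bits)
    return (base_value + index_value + offset) & ADDRESS_MASK
```

With register 1 holding `0x1000` and register 2 holding `0x20`, a displacement of `0x8` yields address `0x1028`, and a displacement field of `0xFFFFF` (minus one as a signed 20-bit value) yields `0x101F`.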
Referring to FIG. 22C, a processor accesses memory using a load/store unit 5060. The load/store unit 5060 may perform a load operation by obtaining the address of a target operand in memory 5053 and loading the operand into a register 5059 or another memory 5053 location, or may perform a store operation by obtaining the address of a target operand in memory 5053 and storing data obtained from a register 5059 or another memory 5053 location in the target operand location in memory 5053. The load/store unit 5060 may be speculative and may access memory in a sequence that is out of order relative to the instruction sequence, but the load/store unit 5060 will maintain the appearance to programs that instructions were executed in order. The load/store unit 5060 may communicate with general registers 5059, decode/dispatch unit 5056, cache/memory interface 5053, or other elements 5083, and comprises various register circuits, ALUs 5085, and control logic 5090 to calculate storage addresses and to provide pipeline sequencing that keeps operations in order. Some operations may be performed out of order, but the load/store unit provides functionality that makes the out-of-order operations appear to the program to have been performed in order, as is well known in the art.
Preferably, the addresses that an application program "sees" are often referred to as virtual addresses. Virtual addresses are sometimes referred to as "logical addresses" and "effective addresses". These virtual addresses are virtual in that they are redirected to a physical memory location by one of a variety of Dynamic Address Translation (DAT) techniques including, but not limited to, simply prefixing a virtual address with an offset value, or translating the virtual address via one or more translation tables. The translation tables preferably comprise at least a segment table and a page table, alone or in combination, preferably with the segment table having an entry pointing to the page table. In z/Architecture, a hierarchy of translation is provided that includes a region first table, a region second table, a region third table, a segment table, and an optional page table. The performance of address translation is often improved by utilizing a Translation Lookaside Buffer (TLB), which comprises entries mapping a virtual address to an associated physical memory location. The entries are created when DAT translates a virtual address using the translation tables. Subsequent use of the virtual address can then utilize the entry of the fast TLB rather than the slow sequential translation table accesses. TLB content may be managed by a variety of replacement algorithms, including LRU (Least Recently Used).
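The interplay of translation tables and a TLB can be sketched as below. It is a simplified model and not the translation hierarchy of any particular architecture: it assumes 4 Kbyte pages, a two-level segment table and page table lookup, an 8-bit page index, and an LRU-managed TLB. The class name, field names, and sizes are hypothetical.

```python
from collections import OrderedDict

PAGE_BITS = 12          # assumed 4 Kbyte pages
PAGE_INDEX_BITS = 8     # assumed number of page-index bits per segment


class Translator:
    """Hypothetical two-level address translation with an LRU-managed TLB."""

    def __init__(self, segment_table: dict, tlb_capacity: int = 4):
        # segment_table maps segment index -> page table (page index -> frame number)
        self.segment_table = segment_table
        self.tlb = OrderedDict()          # virtual page -> frame, oldest entry first
        self.tlb_capacity = tlb_capacity
        self.table_walks = 0              # counts slow translation-table accesses

    def _walk_tables(self, virtual_page: int) -> int:
        self.table_walks += 1
        segment_index = virtual_page >> PAGE_INDEX_BITS
        page_index = virtual_page & ((1 << PAGE_INDEX_BITS) - 1)
        page_table = self.segment_table[segment_index]   # segment entry points to a page table
        return page_table[page_index]

    def translate(self, virtual_address: int) -> int:
        virtual_page = virtual_address >> PAGE_BITS
        offset = virtual_address & ((1 << PAGE_BITS) - 1)
        if virtual_page in self.tlb:
            self.tlb.move_to_end(virtual_page)            # mark as most recently used
            frame = self.tlb[virtual_page]
        else:
            frame = self._walk_tables(virtual_page)
            self.tlb[virtual_page] = frame                # entry created on table translation
            if len(self.tlb) > self.tlb_capacity:
                self.tlb.popitem(last=False)              # evict the least recently used entry
        return (frame << PAGE_BITS) | offset
```

Repeated translations of the same page leave `table_walks` unchanged, which is the performance benefit the paragraph attributes to the TLB.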
Where the processors are processors of a multi-processor system, each processor has the responsibility of maintaining shared resources, such as I/O, caches, TLBs, and memory, which are interlocked to achieve coherency. Typically, "snooping" techniques will be used to maintain cache coherency. In a snooping environment, each cache line may be marked as being in one of a shared state, an exclusive state, a changed state, an invalid state, etc., to facilitate sharing.
An I/O unit 5054 (FIG. 21) provides the processor with means for attaching to peripheral devices including, for example, tapes, disks, printers, displays, and networks. I/O units are often presented to the computer program by software drivers. In mainframes, such as the System z from IBM, channel adapters and open system adapters are I/O units of the mainframe that provide the communications between the operating system and peripheral devices.
Moreover, other types of computing environments may benefit from one or more aspects of the present invention. By way of example, an environment may include an emulator (e.g., software or other emulation mechanisms), in which a particular architecture (including, for example, instruction execution, architectural functions such as address translation, and architectural registers) or a subset thereof is emulated (e.g., in a native computer system having a processor and memory). In such an environment, one or more emulation functions of the emulator can implement one or more aspects of the present invention, even though the computer executing the emulator may have a different architecture than the capabilities being emulated. As one example, in emulation mode, a particular instruction or operation being emulated is decoded, and the appropriate emulation function is established to implement the single instruction or operation.
In an emulation environment, a host computer includes, for example, memory to store instructions and data; an instruction fetch unit to fetch instructions from memory and, optionally, to provide local buffering of fetched instructions; an instruction decode unit to receive the fetched instruction and determine a type of instruction that has been fetched; and an instruction execution unit to execute the instruction. Execution may include loading data from memory to a register; storing data from the register back to the memory; or perform some type of arithmetic or logical operation as determined by the decode unit. In one example, each unit is implemented in software. For example, the operations performed by the units are implemented as one or more subroutines in emulator software.
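A toy fetch, decode and execute loop illustrates how each unit can be realized as software subroutines. The three-operation instruction set, the tuple encoding of instructions, and every name in the sketch are invented for illustration; they do not correspond to any emulated architecture discussed in the text.

```python
def emulate(memory: list, registers: list, pc: int = 0) -> int:
    """Run a toy program held in `memory`; return the program counter after HALT."""

    # One subroutine per emulated operation (the "instruction execution unit").
    def op_load(reg, address):
        registers[reg] = memory[address]          # memory -> register

    def op_store(reg, address):
        memory[address] = registers[reg]          # register -> memory

    def op_add(reg1, reg2):
        registers[reg1] = registers[reg1] + registers[reg2]

    handlers = {"LOAD": op_load, "STORE": op_store, "ADD": op_add}

    while True:
        opcode, *operands = memory[pc]            # fetch the next instruction
        pc += 1                                   # advance the emulated program counter
        if opcode == "HALT":
            return pc
        handlers[opcode](*operands)               # decode the type, then execute it
```

For example, a program that loads 40 and 2 from memory, adds them, and stores the sum leaves 42 in the target memory location.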
More particularly, in a mainframe, architected machine instructions are used by programmers (today usually "C" programmers), often by way of a compiler application. These instructions, stored in the storage medium, may be executed natively in a z/Architecture IBM server or, alternatively, in machines executing other architectures. They can be emulated in existing and future IBM mainframe servers and on other IBM machines (e.g., Power Systems servers and System x servers). They can be executed in machines running Linux on a wide variety of machines using hardware manufactured by IBM, Intel, AMD, and others. Besides execution on that hardware under z/Architecture, Linux can also be used, as can machines that use emulation provided by Hercules (see www.hercules-390.org/) or FSI (Fundamental Software, Inc.) (see www.funsoft.com/), where execution is generally in an emulation mode. In emulation mode, emulation software is executed by a native processor to emulate the architecture of an emulated processor.
The native processor typically executes emulation software, comprising either firmware or a native operating system, to perform emulation of the emulated processor. The emulation software is responsible for fetching and executing instructions of the emulated processor architecture. The emulation software maintains an emulated program counter to keep track of instruction boundaries. The emulation software may fetch one or more emulated machine instructions at a time and convert them to a corresponding group of native machine instructions for execution by the native processor. These converted instructions may be cached so that a faster conversion can be accomplished. Notwithstanding, the emulation software must maintain the architecture rules of the emulated processor architecture so as to ensure that operating systems and applications written for the emulated processor operate correctly. Furthermore, the emulation software must provide resources identified by the emulated processor architecture, including, but not limited to, control registers, general purpose registers, floating point registers, dynamic address translation functions (including, for example, segment tables and page tables), interrupt mechanisms, context switch mechanisms, Time of Day (TOD) clocks, and architected interfaces to I/O subsystems, such that an operating system or an application program designed to run on the emulated processor can be run on the native processor having the emulation software.
The particular instruction being emulated is decoded, and a subroutine is called to perform the function of that individual instruction. An emulation software function emulating a function of an emulated processor is implemented, for example, in a "C" subroutine or driver, or by some other method of providing a driver for the specific hardware, as will be within the skill of those in the art after understanding the description of the preferred embodiment. Various software and hardware emulation patents, including, but not limited to, U.S. Patent No. 5,551,013, entitled "Multiprocessor for Hardware Emulation", by Beausoleil et al.; U.S. Patent No. 6,009,261, entitled "Preprocessing of Stored Target Routines for Emulating Incompatible Instructions on a Target Processor", by Scalzi et al.; U.S. Patent No. 5,574,873, entitled "Decoding Guest Instruction to Directly Access Emulation Routines that Emulate the Guest Instructions", by Davidian et al.; U.S. Patent No. 6,308,255, entitled "Symmetrical Multiprocessing Bus and Chipset Used for Coprocessor Support Allowing Non-Native Code to Run in a System", by Gorishek et al.; U.S. Patent No. 6,463,582, entitled "Dynamic Optimizing Object Code Translator for Architecture Emulation and Dynamic Optimizing Object Code Translation Method", by Lethin et al.; U.S. Patent No. 5,790,825, entitled "Method for Emulating Guest Instructions on a Host Computer Through Dynamic Recompilation of Host Instructions", by Eric Traut; and many others, illustrate a variety of known ways to achieve emulation of an instruction format architected for a different machine for a target machine available to those skilled in the art.
In FIG. 23, an example of an emulated host computer system 5092 is provided that emulates a host computer system 5000' of a host architecture. In the emulated host computer system 5092, the host processor (CPU) 5091 is an emulated host processor (or virtual host processor) and comprises an emulation processor 5093 having a different native instruction set architecture than that of the processor 5091 of the host computer 5000'. The emulated host computer system 5092 has memory 5094 accessible to the emulation processor 5093. In the example embodiment, the memory 5094 is partitioned into a host computer memory 5096 portion and an emulation routines 5097 portion. The host computer memory 5096 is available to programs of the emulated host computer 5092 according to the host computer architecture. The emulation processor 5093 executes native instructions of an architected instruction set of an architecture other than that of the emulated processor 5091 (the native instructions being obtained from emulation routines memory 5097), and may access a host instruction for execution from a program in host computer memory 5096 by employing one or more instructions obtained in a sequence and access/decode routine, which may decode the host instruction(s) accessed to determine a native instruction execution routine for emulating the function of the host instruction accessed. Other facilities that are defined for the host computer system 5000' architecture may be emulated by architected facilities routines, including such facilities as general purpose registers, control registers, dynamic address translation, I/O subsystem support, and processor cache. The emulation routines may also take advantage of functions available in the emulation processor 5093 (such as general registers and dynamic translation of virtual addresses) to improve the performance of the emulation routines.
Specialized hardware and offload engines may also be provided to assist the processor 5093 in emulating the functionality of the host computer 5000'.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, and/or components.
The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below, if any, are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

Claims (16)

1. A method of providing interrupts to guests in a computing environment, said method comprising:
in response to executing a Modify PCI Function Controls (MPFC) instruction register interruptions operation that includes a function handle for an adapter, specifying a location in memory of a guest Adapter Interrupt Bit Vector (AIBV) and a location in memory of a guest Adapter Interrupt Summary Bit (AISB) for the adapter, the guest AIBV being an AIBV in a guest array of one or more AIBVs;
receiving, by a component of a computing environment having a host forwarding AISB array with an AISB for each adapter that is available to a host and that the host has assigned to a guest, an interrupt request from an adapter;
setting an indicator in the guest AIBV indicating an event from the adapter in response to the received request, and setting an indicator in the host forwarding AISB array indicating that the indicator is set in the guest AIBV;
determining whether an interrupt request is already pending and, in response to determining that no interrupt request is pending, setting a pending indicator;
scanning the host forwarding AISB array for set host AISB indicators and determining whether the interrupt request is targeted to a guest;
in response to determining that the interrupt request is targeted to a guest, determining one or more guest AISBs corresponding to a set AISB of the host forwarding AISB array; and
setting the determined one or more guest AISBs, wherein each guest AISB includes an indicator associated with the adapter, wherein the indicator is an indicator that one or more bits have been set in a guest AIBV associated with the guest AISB,
wherein, in response to setting the pending indicator, it is determined whether the interrupt request is targeted to the guest, wherein the determining checks one or more fields of an entry in an adapter interruption forwarding table to determine whether the interrupt request is targeted to the guest.
2. The method of claim 1, wherein responsive to determining that the interrupt request is targeted to a guest, the interrupt is made pending for the guest.
3. The method of claim 2, wherein making the interrupt pending comprises setting a guest interrupt status area indicator in the guest interrupt status area, the guest interrupt status area indicator corresponding to a guest interrupt subclass of the adapter.
4. The method of claim 1, wherein determining whether the adapter interruption is targeted to a guest comprises:
checking an Adapter Interruption Forwarding Table (AIFT) entry, wherein a predefined value in the AIFT entry indicates that the host is running one or more guests; and
in response to the check indicating that one or more guests are running, an Interruption Subclass (ISC) specified in the adapter interruption request is compared to the ISC in the AIFT entry, where equality indicates that the target of the adapter interruption is a guest.
5. The method of claim 1, wherein the interrupt comprises an adapter-provided Message Signaled Interrupt (MSI) containing an MSI number specifying an event.
6. The method of claim 1, wherein setting an indicator in the guest AIBV comprises setting the indicator in the guest AIBV using one or more parameters of a device table entry corresponding to the adapter, the indicator selected based on the MSI number and an offset to the AIBV obtained from the device table entry.
7. The method of claim 6 wherein setting the host forwarding AISB array comprises using one or more parameters of the device table entry to set the host forwarding AISB array, which serves as the forwarding adapter interrupt indicator for the forwarding adapter interrupt array.
8. The method of claim 1, wherein in response to the received request, setting a pending indicator of the requesting adapter interrupt, the pending indicator observable by the one or more processors.
9. A system for providing interrupts to guests in a computing environment, said system comprising:
means for specifying, in response to executing a Modify PCI Function Controls (MPFC) instruction register interruptions operation that includes a function handle for an adapter, a location in memory of a guest Adapter Interrupt Bit Vector (AIBV), the guest AIBV being an AIBV in a guest array of one or more AIBVs, and a location in memory of a guest Adapter Interrupt Summary Bit (AISB) for the adapter;
means for receiving, by a component of a computing environment having a host forwarding AISB array with an AISB for each adapter that is available to a host and that the host has assigned to a guest, an interrupt request from an adapter;
means for setting an indicator in the guest AIBV indicating an event from the adapter in response to the received request, and setting an indicator in the host forwarding AISB array indicating that the indicator is set in the guest AIBV;
means for determining whether an interrupt request is already pending and, in response to determining that no interrupt request is pending, setting a pending indicator;
means for scanning the host forwarding AISB array for set host AISB indicators and determining whether the interrupt request is targeted to a guest;
means for determining one or more guest AISBs corresponding to a set AISB of the host forwarding AISB array in response to determining that the interrupt request is targeted to a guest; and
means for setting the determined one or more guest AISBs, wherein each guest AISB includes an indicator associated with the adapter, wherein the indicator is an indicator that one or more bits have been set in a guest AIBV associated with the guest AISB,
wherein, in response to setting the pending indicator, it is determined whether the interrupt request is targeted to the guest, wherein the determining checks one or more fields of an entry in an adapter interruption forwarding table to determine whether the interrupt request is targeted to the guest.
10. The system of claim 9, wherein the interrupt is made pending for the guest in response to determining that the interrupt request is targeted to the guest.
11. The system of claim 10, wherein to make the interrupt pending comprises to set a guest interrupt status area indicator in the guest interrupt status area, the guest interrupt status area indicator corresponding to a guest interrupt subclass of the adapter.
12. The system of claim 9, wherein determining whether the adapter interruption is targeted to a guest comprises:
checking an Adapter Interruption Forwarding Table (AIFT) entry, wherein a predefined value in the AIFT entry indicates that the host is running one or more guests; and
in response to the check indicating that one or more guests are running, an Interruption Subclass (ISC) specified in the adapter interruption request is compared to the ISC in the AIFT entry, where equality indicates that the target of the adapter interruption is a guest.
13. The system of claim 9, wherein the interrupt comprises an adapter-provided Message Signaled Interrupt (MSI) containing an MSI number specifying an event.
14. The system of claim 9, wherein setting the indicator in the guest AIBV comprises setting the indicator in the guest AIBV using one or more parameters of a device table entry corresponding to the adapter, the indicator selected based on the MSI number and an offset to the AIBV obtained from the device table entry.
15. The system of claim 14 wherein setting the host forwarding AISB array comprises using one or more parameters of the device table entry to set the host forwarding AISB array, which serves as a forwarding adapter interrupt indicator for the forwarding adapter interrupt array.
16. The system of claim 9, wherein in response to the received request, a pending indicator of the requesting adapter interrupt is set, the pending indicator being observable by the one or more processors.
HK13108098.9A 2010-06-23 2010-11-08 Converting a message signaled interruption into an i/o adapter event notification to a guest operating system HK1180800B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US12/821,177 US8468284B2 (en) 2010-06-23 2010-06-23 Converting a message signaled interruption into an I/O adapter event notification to a guest operating system
US12/821,177 2010-06-23
PCT/EP2010/067021 WO2011160706A1 (en) 2010-06-23 2010-11-08 Converting a message signaled interruption into an i/o adapter event notification to a guest operating system

Publications (2)

Publication Number Publication Date
HK1180800A1 HK1180800A1 (en) 2013-10-25
HK1180800B HK1180800B (en) 2017-02-24
