HK1180796B - Converting a message signaled interruption into an i/o adapter event notification

Info

Publication number: HK1180796B
Application number: HK13108053.2A
Authority: HK
Inventors: G. Sittmann III; D. Craddock; T. Gregg; M. Farrell; J. Easton; E. N. Lais
Original assignee: International Business Machines Corporation
Priority date: 2010-06-23
Filing date: 2010-11-08
Publication date: 2016-04-22

Description

Converting message signaled interrupts to I/O adapter event notifications

Technical Field

The present invention relates generally to interrupt handling in a computing environment, and more particularly to handling interrupts generated by adapters in a computing environment.

Background

Message Signaled Interrupts (MSI) is a method used by adapter functions, such as Peripheral Component Interconnect (PCI) functions, to generate Central Processing Unit (CPU) interrupts to notify an operating system of an event or the presence of a certain state. The MSI is an alternative to having a dedicated interrupt pin on each device. When an adapter function is configured to use MSI, the function requests an interrupt by performing an MSI write operation that writes a specified number of bytes of data to a special address. The combination of this special address and the unique data value is called an MSI vector.

Some adapter functions support only one MSI vector; other adapter functions support multiple MSI vectors. For functions that support multiple MSI vectors, the same special address is used with different data values.

On many computing platforms, the device driver configures itself as an interrupt handler associated with the MSI vector. This effectively associates the MSI vector with an entry in the CPU interrupt vector. Thus, when an adapter function supports multiple MSI vectors and is configured to use multiple MSI vectors, it consumes a corresponding number of entries in the CPU interrupt vector.

U.S. Publication No. 2007/0271559 A1, entitled "Virtualization of Infiniband Host Channel Adapter Interruptions," by Easton et al., published November 22, 2007, describes a method, system, program product, and computer data structure for providing two levels of server virtualization. A first hypervisor enables multiple logical partitions to share a set of resources and provides a first level of virtualization. A second hypervisor enables multiple independent virtual machines to share the resources assigned to a single logical partition and provides a second level of virtualization. All events for all virtual machines within the single logical partition are grouped into a single partition-owned event queue that receives event notifications from the shared resources of that partition. An interrupt request is signaled for the grouped events, and the machine demultiplexes the grouped events from the partition-owned event queue into separate virtualized event queues allocated for each virtual machine.

U.S. Publication No. 2005/0289271 A1, entitled "Circuitry to Selectively Produce MSI Signals," by Martinez et al., published December 29, 2005, describes, in some embodiments, a chip having status register circuitry coupled to conductors to receive interrupt event signals and to provide source signals corresponding to the interrupt event signals. The chip also includes control register circuitry to provide source enable signals for selected ones of the interrupt sources, and rearming logic circuitry coupled to the conductors to receive the interrupt event signals and provide a rearming signal. The chip further includes first logic circuitry to receive the source signals, the source enable signals, and the rearming signal and to provide an initial interrupt signal, and Message Signaled Interrupt (MSI) signal pulse generation logic to receive the initial interrupt signal and provide an MSI signal in response. Other embodiments are described and claimed.

U.S. Patent No. 7,562,366, entitled "Transmit Completion Event Batching," issued to Pope et al. on July 14, 2009, describes a method for managing data transmit queues for use with a host and a network interface device. Briefly, the host writes data buffer descriptors to a transmit descriptor queue, and the network interface device writes events to notify the host when it has finished processing transmit data buffers. Each transmit completion event descriptor notifies the host of the completion of multiple transmit data buffers.

Disclosure of Invention

According to aspects of the present invention, a capability is provided that facilitates managing interrupt requests from adapters.

The shortcomings of the prior art are overcome and advantages are provided through the provision of a method, and corresponding system and computer program product for managing interrupt requests in a computing environment as set forth in claim 1.

Drawings

One or more aspects of the present invention are particularly pointed out and distinctly claimed as examples in the claims at the conclusion of the specification. The above and other objects, features and advantages of the present invention will become apparent from the following detailed description when taken in conjunction with the accompanying drawings, in which:

FIG. 1 depicts one embodiment of a computing environment to incorporate and use one or more aspects of the present invention;

FIG. 2 depicts one embodiment of the system memory and I/O hub of FIG. 1, in more detail, in accordance with an aspect of the present invention;

FIGS. 3A-3B depict examples of allocations of adapter interruption bit vectors, in accordance with an aspect of the present invention;

FIGS. 3C-3D depict examples of allocations of adapter interruption summary bits, in accordance with an aspect of the present invention;

FIG. 4 depicts one embodiment of an overview of the logic to be performed at initialization to configure adapter functionality for I/O adapter event notification, in accordance with an aspect of the present invention;

FIG. 5 depicts one embodiment of the logic to perform registration to enable conversion of a Message Signaled Interruption (MSI) to an I/O adapter event notification, in accordance with an aspect of the present invention;

FIG. 6A depicts one embodiment of the logic to convert an MSI request into an I/O adapter event notification, in accordance with an aspect of the present invention;

FIG. 6B depicts one embodiment of the logic to present I/O adapter event notifications to the operating system, in accordance with an aspect of the present invention;

FIG. 7A depicts one embodiment of a Modify PCI function controls instruction, used in accordance with an aspect of the present invention;

FIG. 7B depicts one embodiment of fields used by the Modify PCI function controls instruction of FIG. 7A, in accordance with an aspect of the present invention;

FIG. 7C depicts one embodiment of another field used by the Modify PCI function controls instruction of FIG. 7A, in accordance with an aspect of the present invention;

FIG. 7D depicts one embodiment of the contents of a Function Information Block (FIB), used in accordance with an aspect of the present invention;

FIG. 8 depicts one embodiment of an overview of the logic to modify PCI function control, in accordance with an aspect of the present invention;

FIG. 9 depicts one embodiment of the logic associated with a register adapter interrupt operation specified by a Modify PCI function controls instruction, in accordance with an aspect of the present invention;

FIG. 10 depicts one embodiment of the logic associated with an unregister adapter interruptions operation specified by the Modify PCI function controls instruction, in accordance with an aspect of the present invention;

FIG. 11A depicts one embodiment of a Call logical processor instruction, used in accordance with an aspect of the present invention;

FIG. 11B depicts one embodiment of a request block used by the Call logical processor instruction of FIG. 11A for a list operation, in accordance with an aspect of the present invention;

FIG. 11C depicts one embodiment of a response block for the list operation of FIG. 11B, in accordance with an aspect of the present invention;

FIG. 11D depicts one embodiment of a function list item, used in accordance with an aspect of the present invention;

FIG. 12A depicts one embodiment of a request block used by the Call logical processor instruction of FIG. 11A for a query function operation, in accordance with an aspect of the present invention;

FIG. 12B depicts one embodiment of a response block for the query function operation of FIG. 12A, in accordance with an aspect of the present invention;

FIG. 13A depicts one embodiment of a request block used by the Call logical processor instruction of FIG. 11A for a query group operation, in accordance with an aspect of the present invention;

FIG. 13B depicts one embodiment of a response block for the query group operation of FIG. 13A, in accordance with an aspect of the present invention;

FIG. 14 depicts one embodiment of a computer program product incorporating one or more aspects of the present invention;

FIG. 15 depicts one embodiment of a host computer system to incorporate and use one or more aspects of the present invention;

FIG. 16 depicts another embodiment of a computer system incorporating and using one or more aspects of the present invention;

FIG. 17 illustrates another example of a computer system including a computer network that incorporates and uses one or more aspects of the present invention;

FIG. 18 depicts one embodiment of various elements of a computer system incorporating and using one or more aspects of the present invention;

FIG. 19A depicts one embodiment of an execution unit of the computer system of FIG. 18 to incorporate and use one or more aspects of the present invention;

FIG. 19B depicts one embodiment of a branching unit of the computer system of FIG. 18 incorporating and using one or more aspects of the present invention;

FIG. 19C depicts one embodiment of a load/store unit of the computer system of FIG. 18 incorporating and using one or more aspects of the present invention;

FIG. 20 illustrates one embodiment of an emulated host computer system incorporating and using one or more aspects of the present invention.

Detailed Description

In accordance with an aspect of the present invention, a capability is provided to convert a Message Signaled Interruption (MSI) request into an input/output (I/O) adapter event notification. An MSI is requested by an adapter and converted to an adapter event notification, in which one or more specific indicators are set and a request is made to present an interrupt to the operating system (or other software, such as other programs; as used herein, the term operating system includes operating system device drivers). In one particular example, each MSI request does not result in a separate interrupt request to the operating system; instead, a single interrupt request may cover multiple MSI requests.

As used herein, the term "adapter" includes any type of adapter (e.g., storage adapter, network adapter, processing adapter, cryptographic adapter, PCI adapter, other types of input/output adapters, etc.). In one embodiment, an adapter includes an adapter function. However, in other embodiments, an adapter may include multiple adapter functions. One or more aspects of the present invention may be applied regardless of whether an adapter includes one adapter function or multiple adapter functions. Further, in the examples presented herein, adapters are used interchangeably with adapter functions (e.g., PCI functions) unless otherwise noted.

One embodiment of a computing environment to incorporate and use one or more aspects of the present invention is described with reference to FIG. 1. In one example, the computing environment 100 is a System z® server offered by International Business Machines Corporation. System z® is based on the z/Architecture® offered by International Business Machines Corporation. Details regarding the z/Architecture® are described in an IBM® publication entitled "z/Architecture Principles of Operation," IBM Publication No. SA22-7832-07, February 2009. IBM®, System z® and z/Architecture® are registered trademarks of International Business Machines Corporation, Armonk, New York. Other names used herein may be registered trademarks, trademarks or product names of International Business Machines Corporation or other companies.

In one example, computing environment 100 includes one or more Central Processing Units (CPUs) 102 coupled to a system memory 104 (also referred to as main memory) via a memory controller 106. To access system memory 104, central processing unit 102 issues a read or write request that includes an address that is used to access system memory. The address included in the request is typically not directly usable to access system memory, and therefore, it is converted to an address that is directly usable to access system memory. The address is translated via a translation mechanism (XLATE) 108. For example, addresses are translated from virtual addresses to real or absolute addresses using, for example, Dynamic Address Translation (DAT).

A request including an address (translated if necessary) is received by memory controller 106. In one example, the memory controller 106 contains hardware and is used to arbitrate for access to system memory and to maintain memory coherency. This arbitration is performed for requests received from the CPU 102 and requests received from one or more adapters 110. Similar to a central processing unit, the adapter issues a request to system memory 104 to obtain access to the system memory.

In one example, adapter 110 is a Peripheral Component Interconnect (PCI) or PCI Express (PCIe) adapter that includes one or more PCI functions. The PCI function issues a request that is routed to an input/output hub 112 (e.g., a PCI hub) via one or more switches (e.g., PCIe switches) 114. In one example, an input/output hub includes hardware including one or more state machines.

The input/output hub includes, for example, a root complex 116 that receives requests from switches. The request includes an input/output address that is used, for example, to perform Direct Memory Access (DMA) or request Message Signaled Interrupts (MSI). The address is provided to an address translation and protection unit 118, which accesses information for the DMA or MSI request.

For DMA operations, address translation and protection unit 118 may translate addresses to addresses that may be used to access system memory. The request initiated from the adapter, including the translated address, is then provided to the memory controller 106, e.g., via the I/O to memory bus 120. The memory controller performs its arbitration and forwards the request with the translated address to the system memory at the appropriate time.

For an MSI request, information in address translation and protection unit 118 is obtained to facilitate conversion of the MSI request to an I/O adapter event notification. Since the embodiments described herein relate to interrupt processing, more details regarding the I/O hub and system memory as they relate to interrupt processing are described with reference to FIG. 2. In FIG. 2, the memory controller is not shown, but may be used. The I/O hub may be coupled to the system memory 104 and/or the processor 254 directly or through a memory controller.

Referring to FIG. 2, in one example, system memory 104 includes one or more data structures that may be used to facilitate interrupt processing. In this example, system memory 104 includes an Adapter Interruption Bit Vector (AIBV) 200 and an optional Adapter Interruption Summary Bit (AISB) 202 associated with a particular adapter. For each adapter, there may be an AIBV and a corresponding AISB.

In one example, adapter interruption bit vector 200 is a one-dimensional array of one or more bits in main memory associated with an adapter (e.g., a PCI function). The bits in the adapter interruption bit vector represent MSI vector numbers. A bit set to 1 in the AIBV indicates an event condition or type for the associated adapter. In the example of a PCI function, each bit in the associated AIBV corresponds to an MSI vector. Thus, if a PCI function supports only one MSI vector, its AIBV includes a single bit; if a PCI function supports multiple MSI vectors, its AIBV includes one bit per MSI vector. In the example shown in FIG. 2, the PCI function supports multiple MSI vectors (e.g., 3), and thus there are multiple bits (e.g., 3) in the AIBV 200. Each bit corresponds to a particular event. For example, when bit 0 of the AIBV is set to 1, it indicates a completed operation; when bit 1 of the AIBV is set to 1, it indicates an error event; and so on. As shown, bit 1 is set in this example.
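By way of illustration only, the correspondence between MSI vector numbers and AIBV bits may be sketched as follows. This is a minimal model, not the described hardware: the `Aibv` class, its method names, and the left-to-right bit numbering (bit 0 as the most significant bit of the first byte) are assumptions made for the example.

```python
class Aibv:
    """Hypothetical model of an adapter interruption bit vector (one bit per MSI vector)."""

    def __init__(self, num_vectors: int) -> None:
        self.num_vectors = num_vectors
        self.bits = bytearray((num_vectors + 7) // 8)  # all bits start at 0

    def set(self, vector: int) -> None:
        # Assumption: bit 0 is the leftmost (most significant) bit of byte 0.
        if not 0 <= vector < self.num_vectors:
            raise IndexError("MSI vector number out of range")
        self.bits[vector // 8] |= 0x80 >> (vector % 8)

    def pending(self) -> list[int]:
        """Return the vector numbers whose bits are currently 1."""
        return [v for v in range(self.num_vectors)
                if self.bits[v // 8] & (0x80 >> (v % 8))]


aibv = Aibv(3)  # a PCI function with three MSI vectors, as in the FIG. 2 example
aibv.set(1)     # bit 1 stands for an error event in that example
```

Here `aibv.pending()` reports only vector 1, matching the state described for FIG. 2.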

In one particular example, a command (e.g., a Modify PCI function controls command) is used to specify the AIBV for a PCI function. In particular, the command is issued by the operating system and specifies the identity of the PCI function, the main storage location of the area containing the AIBV, the offset from that location to the first bit of the AIBV, and the number of bits constituting the AIBV. Using this command, the adapter interruption parameters are copied from a function information block, which stores such information (e.g., as obtained during initialization/configuration), into the adapter's device table entry (described below) and/or function table entry (described below).

In one example, the identity of the PCI function is a function handle. The function handle includes, for example, an enable indicator that indicates whether the PCI function handle is enabled; a PCI function number that identifies the function (this is a static identifier); and an instance number that indicates a particular instance of the function handle. For example, each time the function handle is enabled, the instance number is incremented to provide a new instance number. The function handle is used to locate a function table entry in a function table that contains one or more entries. For example, one or more bits of the function handle are used as an index into the function table to locate a particular function table entry. The function table entry includes information about its associated PCI function. For example, it may include various indicators of the status of its associated adapter function, and it may include one or more device table entry indexes that are used to locate the device table entries for that adapter function. (In one embodiment, to the operating system, the handle is simply an opaque identifier of the adapter.)
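The role of the function handle may be illustrated with the following sketch. The text does not give a bit layout, so the 32-bit layout used here (one enable bit, an 8-bit instance number, a 16-bit function number), along with the names `make_handle`, `enable`, `lookup`, and `FunctionTableEntry`, are hypothetical choices made for the example.

```python
from dataclasses import dataclass

# Hypothetical 32-bit layout: [enable:1][unused:7][instance:8][function number:16]
ENABLE_BIT = 1 << 31


@dataclass
class FunctionTableEntry:
    status: str
    dte_indexes: list[int]  # device table entry index(es) for this function


def make_handle(function_number: int, instance: int, enabled: bool) -> int:
    handle = ((instance & 0xFF) << 16) | (function_number & 0xFFFF)
    return (handle | ENABLE_BIT) if enabled else handle


def enable(handle: int) -> int:
    """Enable the handle and increment its instance number."""
    instance = ((handle >> 16) & 0xFF) + 1
    return make_handle(handle & 0xFFFF, instance, True)


def lookup(function_table: list[FunctionTableEntry], handle: int) -> FunctionTableEntry:
    # Some bits of the handle (here, the function number) index the function table.
    return function_table[handle & 0xFFFF]


table = [FunctionTableEntry("configured", [0]), FunctionTableEntry("configured", [5])]
h0 = make_handle(function_number=1, instance=0, enabled=False)
h1 = enable(h0)
```

Because only the static function number is used as the index in this sketch, `h0` and `h1` locate the same function table entry even though the handle value changes on each enable.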

The AIBV may be located on any byte boundary and any bit boundary. This gives the operating system the flexibility to pack the AIBVs of multiple adapters into a contiguous range of bits and bytes. For example, as shown in FIG. 3A, the operating system has specified a common storage area at location X to contain five contiguous AIBVs. The adapter associated with each AIBV is identified by the letters A-E. The event that each AIBV bit represents for an adapter is further identified by the numbers 0-n. Unallocated bits are identified by the lower case letter "u".

Another example is shown in FIG. 3B. In this example, the operating system has specified three unique storage areas at locations X, Y and Z to contain the AIBVs for five I/O adapters. The storage area at location X contains the AIBVs for adapters A and B, the storage area at location Y contains the AIBV for adapter C only, and the storage area at location Z contains the AIBVs for adapters D and E. The event that each AIBV bit represents for an I/O adapter is further identified by the numbers 0-n. Unallocated bits are identified by the lower case letter "u".
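The packing of several AIBVs into one storage area, as in FIG. 3A, may be sketched as below. The function `pack_aibvs` and the per-adapter vector counts are hypothetical; the figures themselves are not reproduced here, so the counts do not correspond to any figure.

```python
def pack_aibvs(vector_counts: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Assign each adapter a (bit offset, bit count) within one shared storage area."""
    layout = {}
    next_bit = 0
    for adapter, count in vector_counts.items():
        layout[adapter] = (next_bit, count)
        next_bit += count  # the next AIBV starts at the very next bit, with no byte alignment
    return layout


# Five adapters A-E sharing the storage area at one location (counts are illustrative).
layout = pack_aibvs({"A": 3, "B": 1, "C": 5, "D": 2, "E": 4})
```

In this layout the AIBV for adapter C begins at bit offset 4 and crosses a byte boundary, which is permitted because an AIBV may start on any bit boundary.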

Returning to FIG. 2, in addition to the AIBV, in this example, there is an AISB 202 for the adapter, which contains a single bit associated with the adapter. An AISB having a value of 1 indicates that one or more bits in the AIBV associated with the AISB have been set to 1. The AISB is optional and may be one for each adapter, one for each selected adapter, or one for a group of adapters.

In one particular implementation of a PCI function, a command (e.g., a Modify PCI function controls command) is used to specify an AISB for the PCI function. In particular, the command is issued by the operating system and specifies the identity (e.g., handle) of the PCI function, the main storage location of the area containing the AISB, an offset from that location to the AISB, and an adapter interruption summary notification enable control indicating that a summary bit is present.

The AISB may be allocated on any byte boundary and any bit boundary. This gives the operating system the flexibility to pack the AISBs of multiple adapters into a contiguous range of bits and bytes. In one example, as shown in FIG. 3C, the operating system has specified a common storage area at location X to contain nine contiguous AISBs. The adapter associated with each AISB is identified by the letters A-I. Unallocated bits are identified by the lower case letter "u".

Another allocation example is shown in FIG. 3D, where the operating system has designated three unique AISB storage locations at locations X, Y and Z to contain the AISBs for each of the three adapters. The adapter associated with each AISB is identified by the letters A-C. Unallocated bits are identified by the lower case letter "u".

In addition, the program may assign a single AISB to multiple PCI functions. This associates multiple AIBVs with a single summary bit. Thus, a value of 1 in such an AISB indicates that the operating system should scan multiple AIBVs.

Returning to FIG. 2, in one example, the AIBV and AISB are pointed to by addresses in the device table entry 206 of the device table 208 located in the I/O hub 112. In one example, the device table 208 is located within an address translation protection unit of the I/O hub.

The device table 208 includes one or more entries 206, each of which is assigned to a particular adapter function 210. The device table entry 206 includes several fields that may be populated using the commands described above, for example. The values of one or more fields are based on policy and/or configuration. Examples of fields include:

Interruption subclass (ISC) 214: indicates the interruption subclass for the interrupt. The ISC identifies a maskable class of adapter interrupts that may be associated with a priority with which the operating system processes the interrupt;

AIBV address 216: provides, e.g., the absolute address of the start of the storage location containing the AIBV for the particular adapter function assigned to this device table entry;

AIBV offset 218: an offset in the main storage location to the start of the AIBV;

AISB address 220: provides, e.g., the absolute address of the start of the storage location containing the AISB for the PCI function, if the operating system has specified an AISB;

AISB offset 222: an offset in the main storage location to the AISB;

Adapter summary notification enable control (Enable) 224: indicates whether an AISB is present;

Number of interrupts (NOI) 226: indicates the maximum number of MSI vectors allowed for the PCI function, with 0 indicating that no MSI vectors are allowed.

In other embodiments, the DTE (device table entry) may include more, less or different information.
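For illustration, the fields listed above may be gathered into a record such as the following. The class and attribute names are hypothetical, the offsets are assumed to be bit offsets, and the example values are arbitrary; only the set of fields and their reference numerals come from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeviceTableEntry:
    isc: int                     # interruption subclass (214)
    aibv_address: int            # start of the storage area holding the AIBV (216)
    aibv_offset: int             # bit offset to the start of the AIBV (218)
    aisb_address: Optional[int]  # start of the storage area holding the AISB (220), if any
    aisb_offset: int             # bit offset to the AISB (222)
    summary_enabled: bool        # adapter summary notification enable control (224)
    noi: int                     # maximum MSI vectors allowed; 0 means none (226)


# Arbitrary example values for a function that uses three MSI vectors and an AISB.
dte = DeviceTableEntry(isc=3, aibv_address=0x2000, aibv_offset=5,
                       aisb_address=0x3000, aisb_offset=1,
                       summary_enabled=True, noi=3)
```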

In one embodiment, a Requestor Identifier (RID) (and/or a portion of an address) located in a request issued by an adapter (e.g., PCI function 210) is used to locate the device table entry to be used for a particular interrupt requested by the adapter. The requestor ID (e.g., a 16-bit value specifying a bus number, a device number, and a function number) and the address to be used for the interrupt are included in the request. The request, including the RID and address, is provided, e.g., via a switch, to a content addressable memory (CAM) 230, which is used to provide an index value. For example, the CAM includes a plurality of entries, each entry corresponding to an index into the device table (DT). Each CAM entry includes the value of a RID. If the received RID matches a value contained in an entry of the CAM, the corresponding device table index is used to locate the device table entry. That is, the output of the CAM is used to index into the device table 208. If there is no match, the received packet is discarded. (In other embodiments, no CAM or other lookup is needed, and the RID itself is used as the index.) As described herein, the located DTE is used to process the interrupt request.
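The lookup from requestor ID to device table entry may be sketched as follows. The `Cam` class models the content addressable memory as a simple list search, and the device table holds placeholder strings; both are simplifications assumed for the example. The RID packing follows the conventional PCI split of 8 bus bits, 5 device bits, and 3 function bits.

```python
from typing import Optional


def rid(bus: int, device: int, function: int) -> int:
    """Pack a 16-bit requestor ID: 8-bit bus, 5-bit device, 3-bit function."""
    return ((bus & 0xFF) << 8) | ((device & 0x1F) << 3) | (function & 0x7)


class Cam:
    """Hypothetical CAM: entry i holds a RID and corresponds to device table index i."""

    def __init__(self, rids: list[int]) -> None:
        self.rids = rids

    def match(self, requestor_id: int) -> Optional[int]:
        for index, stored in enumerate(self.rids):
            if stored == requestor_id:
                return index
        return None  # no match: the hub discards the packet


cam = Cam([rid(1, 0, 0), rid(1, 0, 1), rid(2, 3, 0)])
device_table = ["DTE for 01:00.0", "DTE for 01:00.1", "DTE for 02:03.0"]


def locate_dte(requestor_id: int) -> Optional[str]:
    index = cam.match(requestor_id)
    return None if index is None else device_table[index]
```

A call such as `locate_dte(rid(2, 3, 0))` returns the third entry, while a RID that is absent from the CAM yields `None`, standing in for a discarded packet.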

To request an interrupt, the adapter function 210 sends a packet to the I/O hub. The packet has an MSI address 232 and associated data 234. The I/O hub compares at least a portion of the received address to the value in MSI compare register 250. If there is a match, an interrupt (e.g., MSI) is being requested, rather than a DMA operation. The reason for the request (i.e., the type of event that occurred) is indicated in the associated data 234. For example, one or more of the low-order bits of the data are used to specify a particular interrupt vector (i.e., an MSI vector) that indicates the reason (event).
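The distinction between an MSI request and a DMA operation may be sketched as below. The compare value, the number of address bits compared, and the number of low-order data bits that carry the vector number are not given in the description; the constants here are hypothetical.

```python
from typing import Optional

MSI_COMPARE_VALUE = 0xFEE  # hypothetical contents of the MSI compare register
MSI_COMPARE_SHIFT = 20     # hypothetical: compare only the address bits above bit 19
VECTOR_BITS = 5            # hypothetical: the low-order 5 data bits select the vector


def classify(address: int, data: int) -> tuple[str, Optional[int]]:
    """Return ("msi", vector) for an interrupt request, or ("dma", None) otherwise."""
    if (address >> MSI_COMPARE_SHIFT) == MSI_COMPARE_VALUE:
        return "msi", data & ((1 << VECTOR_BITS) - 1)
    return "dma", None


kind, vector = classify(0xFEE00000, 0x41)
```

With these constants, the request above is classified as an MSI for vector 1, whereas the same data written to an address that does not match the compare value is treated as a DMA operation.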

In accordance with an aspect of the present invention, an interrupt request received from an adapter is converted into an I/O adapter event notification. That is, one or more indicators (e.g., one or more AIBVs and, optionally, an AISB) are set, and an interrupt to the operating system is requested if one is not already pending. In one embodiment, multiple interrupt requests (e.g., MSIs) from one or more adapters are coalesced into a single interrupt to the operating system, while each request still sets its own AIBV and AISB indicators. For example, suppose the I/O hub has received an MSI request and has in turn provided an interrupt request to a processor, and that interrupt request is still pending (e.g., for one reason or another, such as interrupts being disabled, the interrupt has not yet been presented to the operating system). If the hub then receives one or more further MSIs, it does not request additional interrupts. The one interrupt replaces and represents the multiple MSI requests. However, the one or more AIBVs and, optionally, the one or more AISBs are still set.

More details regarding the conversion of an MSI (or other adapter interrupt request) to an I/O adapter event notification will be described below with reference to FIGS. 4-6B. In particular, FIG. 4 depicts various initializations to be performed; FIG. 5 depicts a registration process; FIG. 6A depicts the logic for converting an MSI to an adapter event notification; FIG. 6B depicts logic to present an I/O adapter event notification to the operating system.

Referring to FIG. 4, in one example, certain initialization is performed so that MSI requests can be converted to I/O adapter event notifications. During initialization, the operating system performs several steps to configure an adapter for adapter event notification via MSI requests. In this example, a PCI function is configured; in other embodiments, however, other adapters, including other types of adapter functions, may be configured.

Initially, in one embodiment, the PCI functions in the configuration are determined, step 400. In one example, a command issued by the operating system (e.g., a query list command) is used to obtain a list of the PCI functions assigned to the requesting configuration (e.g., assigned to a particular operating system). This information is obtained from a configuration data structure that maintains it.

Next, at step 402, one of the PCI functions in the list is selected, and the MSI address for the PCI function and the number of MSI vectors supported by the PCI function are determined. The MSI address is determined based on the characteristics of the I/O hub and the system in which it is installed. The number of MSI vectors supported is policy based and configurable.

In addition, at step 410, the AIBV and the AISB (if any) are allocated. In one example, the operating system determines the location of the AIBV, typically based on the class of adapter, to allow efficient processing of one or more adapters. For example, the AIBVs for storage adapters may be located adjacent to one another. The AIBV and AISB are allocated and cleared to zeros, and a register adapter interruptions operation is specified (e.g., using a Modify PCI function controls instruction). At step 412, this operation registers the AIBV, the AISB, the ISC, the number of interrupts (MSI vectors), and the adapter interruption summary notification enablement control, as described in more detail below. Thereafter, at step 414, the configuration space of the PCI function is read and written. Specifically, the MSI address and MSI vector count are written consistent with the preceding registration.

Thereafter, at query 416, it is determined whether additional functions remain in the list. If so, processing continues with step 402. Otherwise, the initialization process is complete.

More details regarding the registration of the various parameters are described with reference to FIG. 5. Initially, a device table entry (DTE) corresponding to the PCI function for which initialization is being performed is selected. This selection is performed, for example, by management firmware that selects an available DTE from the device table. Thereafter, at step 502, the various parameters are stored in the device table entry. For example, the ISC, AIBV address, AIBV offset, AISB address, AISB offset, enable control, and number of interrupts (NOI) are set to the values obtained from configuring the function. This completes the registration process.

As used herein, firmware includes, for example, the microcode, millicode, and/or macrocode of a processor. It includes, for example, the hardware-level instructions and/or data structures used to implement higher-level machine code. In one embodiment, it includes, for example, proprietary code that is typically delivered as microcode, which includes trusted software or microcode specific to the underlying hardware and which controls operating system access to the system hardware.

During operation, when a PCI function wants to generate an MSI, it typically makes some information describing the condition available to the operating system. This causes one or more steps to occur that convert the PCI function's MSI request into an I/O adapter event notification to the operating system. This is described with reference to FIG. 6A.

Referring to FIG. 6A, initially, at step 600, a description of an event for which an interrupt is requested is recorded. For example, the PCI function records a description of the event in one or more adapter-specific event description recording structures stored, for example, in system memory. This may include recording the event type and recording other information. In addition, at step 601, a request is initiated by the PCI function specifying the MSI address and MSI vector number, as well as the requestor ID. At step 602, the request is received by the I/O hub, and in response to receiving the request, the requestor ID in the request is used to locate a device table entry for the PCI function. At query 603, the I/O hub compares at least a portion of the address in the request to the value in the MSI compare register. If they are not equal, the MSI is not requested. However, if they are equal, the MSI address has been specified and, thus, the MSI has been requested, not a direct memory access operation.

Thereafter, at query 604, a determination is made as to whether the MSI vector number specified in the request is less than or equal to the number of interrupts (NOIs) allowed for the function. If the MSI vector number is greater than NOI, an error is indicated. Otherwise, the I/O hub issues a set bit function to set the appropriate AIBV bit in memory. The appropriate bits are determined by adding the MSI vector number to the AIBV offset specified in the device table entry and shifting this number of bits from the AIBV address specified in the device table entry, STEP 605. Further, if an AISB has been specified, then the I/O hub uses a set bit function to set the AISB, step 606, using the AISB address and AISB offset in the device table entry.
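The bit-setting portion of this conversion (query 604 and steps 605 and 606) may be sketched as follows. This is a software model under stated assumptions, not the hub's hardware logic: the `Dte` record, the `set_bit` helper, the use of a `bytearray` as main storage, and the left-to-right bit numbering are all hypothetical. The bound check follows the comparison as worded above, with an NOI of 0 treated as allowing no vectors.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dte:  # hypothetical subset of a device table entry
    aibv_address: int
    aibv_offset: int
    aisb_address: Optional[int]
    aisb_offset: int
    noi: int


def set_bit(memory: bytearray, byte_address: int, bit_offset: int) -> None:
    """Set one bit, counting bits left to right from the given byte address (an assumption)."""
    byte_address += bit_offset // 8
    memory[byte_address] |= 0x80 >> (bit_offset % 8)


def convert_msi(memory: bytearray, dte: Dte, vector: int) -> None:
    # Query 604, as worded in the text: a vector number greater than NOI is an error.
    # An NOI of 0 means that no MSI vectors are allowed at all.
    if dte.noi == 0 or vector > dte.noi:
        raise ValueError("MSI vector number not allowed for this function")
    # Step 605: the target bit lies (AIBV offset + vector number) bits past the AIBV address.
    set_bit(memory, dte.aibv_address, dte.aibv_offset + vector)
    # Step 606: set the summary bit only if an AISB was registered.
    if dte.aisb_address is not None:
        set_bit(memory, dte.aisb_address, dte.aisb_offset)


memory = bytearray(32)
dte = Dte(aibv_address=4, aibv_offset=6, aisb_address=16, aisb_offset=2, noi=3)
convert_msi(memory, dte, vector=3)
```

In this example the AIBV bit lands 9 bits past byte 4, that is, in the second bit of byte 5, showing that the offset arithmetic may carry across a byte boundary.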

Next, in one embodiment, a determination is made (e.g., by the CPU or I/O hub) as to whether an interrupt request is already pending. To make this determination, a pending indicator is used. For example, at query 608, a pending indicator 252 (FIG. 2) stored in the memory of processor 254 is examined; this memory is accessible by the processors (e.g., CPU 102 of FIG. 1) of the computing environment that can handle the interrupt. If the indicator is not set, it is set (e.g., to 1) at step 610. If it is already set, processing is complete and another interrupt request is not made. Thus, subsequent interrupt requests are covered by the one request that is already pending.

In one particular example, there may be one pending indicator for each interrupt subclass, and thus, the pending indicator assigned to the interrupt subclass for the requesting function is the indicator that is checked.
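The way a pending indicator coalesces repeated requests can be sketched as below. This is only an illustration: the `PendingIndicators` class and its method names are hypothetical, the number of interruption subclasses is passed in rather than taken from the source, and neither concurrent access by several processors nor the resetting of the indicator is modeled.

```python
class PendingIndicators:
    """One pending indicator per interruption subclass (ISC)."""

    def __init__(self, isc_count: int) -> None:
        self._pending = [False] * isc_count

    def request(self, isc: int) -> bool:
        """Query 608 and step 610: return True only if a new request became pending."""
        if self._pending[isc]:
            return False              # covered by the request already pending
        self._pending[isc] = True
        return True

    def is_pending(self, isc: int) -> bool:
        """What a polling processor enabled for this ISC would observe."""
        return self._pending[isc]
```

A second `request` for the same ISC returns `False`, which mirrors the statement that no further interrupt request is made while one is already pending.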

Asynchronously, as shown in FIG. 6B, one or more processors check the pending indicator at query 640. In particular, each processor enabled for the ISC (and the zone, in another embodiment) polls the indicator for that ISC when, for example, interrupts are enabled for the processor (i.e., for its operating system). If one of the processors determines that the indicator is set, it arbitrates with the other processors enabled for the same ISC (and zone, in another embodiment) to present the interrupt, step 642. Returning to query 640, if the pending indicator is not set, the processors enabled for the ISC continue to poll for a set indicator.

In response to presenting the operating system with an interrupt at step 642, the operating system determines whether an AISB is registered, INQUIRY 643. If not, the operating system processes the set AIBV, as described below, at step 645. Otherwise, the operating system processes any set AISBs and AIBVs at steps 644, 645. For example, it checks if any AISBs are set. If so, it uses the AISB to determine the location of one or more AIBVs. For example, the operating system remembers the locations of the AISB and the AIBV. In addition, it keeps track of which adapter each AISB and AIBV represents. Thus, it may maintain some form of control block and other data structure including the locations of the AISB and AIBV and the associations between the AISB, AIBV and adapter IDs. It uses the control block to facilitate locating the AIBV based on the associated AISB. In another embodiment, the AISB is not used. In this case, the control block is used to locate a particular AIBV.

In response to locating one or more AIBVs, the operating system scans the AIBVs and processes any set AIBV bits. It processes the interrupt (e.g., provides status) in a manner consistent with the presented event. For example, with a storage adapter, an event may indicate that an operation has completed. This causes the operating system to check the status stored by the adapter to see whether the operation completed successfully, and to obtain the details of the operation. In the case of a storage read, the event indicates that the data read from the adapter is now available in system memory and can be processed.
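The operating system's scan of summary bits and bit vectors can be sketched as follows. This is a hedged illustration only: the `ControlBlocks` mapping stands in for the control block mentioned above, the leftmost-bit-first numbering is assumed, the `handle` callback is hypothetical, and clearing the bits after processing is not modeled.

```python
from typing import Callable, Dict, List, Tuple


def set_bits(vector: bytes, nbits: int) -> List[int]:
    """Return the indices of set bits, numbering from the leftmost bit (assumed)."""
    return [i for i in range(nbits) if vector[i // 8] & (0x80 >> (i % 8))]


# Hypothetical control block: AISB bit index -> (adapter ID, AIBV bytes, AIBV size in bits)
ControlBlocks = Dict[int, Tuple[str, bytes, int]]


def process_interruption(aisb: bytes, aisb_bits: int, blocks: ControlBlocks,
                         handle: Callable[[str, int], None]) -> None:
    """For every set summary bit, locate the associated AIBV and handle each set bit."""
    for summary_index in set_bits(aisb, aisb_bits):
        adapter_id, aibv, nbits = blocks[summary_index]
        for event_index in set_bits(aibv, nbits):
            handle(adapter_id, event_index)
```

The summary bits let the scan skip every AIBV whose adapter has nothing to report, which is the purpose of the optional AISB described in the text.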

In one embodiment, if an error is detected during the conversion, an attention is raised to the system firmware instead of converting the MSI request to an adapter event notification.

Further details regarding the Modify PCI function controls instruction, which is used to register adapter interruptions, are presented herein. Referring to FIG. 7A, a Modify PCI function controls instruction 700 includes, for example, an opcode 702 indicating the Modify PCI function controls instruction; a first field 704 specifying a location at which various information is included regarding the adapter function for which the operational parameters are being established; and a second field 706 specifying a location from which a PCI Function Information Block (FIB) is obtained. The contents of the locations specified by fields 1 and 2 are further described below.

In one embodiment, field 1 specifies a general register that includes various information. As shown in FIG. 7B, the contents of the register include, for example, a function handle 710 that identifies the adapter function on whose behalf the modify instruction is executed; an address space 712 specifying an address space in system memory associated with the adapter function specified by the function handle; an operation control 714 that specifies the operation to be performed for the adapter function; and a status 716 that provides, as a predefined code, the status of the instruction when the instruction completes.

In one embodiment, the function handle includes, for example, an enable indicator indicating whether the handle is enabled; a function number that identifies the adapter function (this is a static identifier and can be used to index into a function table); and an instance number that specifies a particular instance of the function handle. There is one function handle for each adapter function, and it is used to locate a Function Table Entry (FTE) in the function table. Each function table entry includes operating parameters and/or other information related to its adapter function. As an example, the function table entry includes:

Instance number: this field indicates the particular instance of the adapter function handle associated with the function table entry;

device Table Entry (DTE) index 1 … n: there are one or more device table indices, and each index is an index into one of the device tables for locating a Device Table Entry (DTE). Each adapter function has one or more device table entries, and each entry includes information related to its adapter function, including information for handling requests of the adapter function (e.g., DMA requests, MSI requests) and information related to requests related to the adapter function (e.g., PCI instructions). Each device table entry is associated with an address space in system memory allocated to the adapter function. The adapter function may have one or more address spaces within system memory allocated to the adapter function.

Busy indicator: this field indicates whether the adapter function is busy;

persistent error status indicator: this field indicates whether the adapter function is in a persistent error state;

Recovery initiated indicator: this field indicates whether recovery of the adapter function has been initiated;

permission indicator: this field indicates whether the operating system attempting to control the adapter function has permission to do so;

Enable indicator: this field indicates whether the adapter function is enabled (e.g., 1 = enabled, 0 = disabled);

requester Identifier (RID): this is an identifier of the adapter function and includes, for example, a bus number, a device number, and a function number.

In one example, this field is used to access the configuration space of the adapter function. (The memory of an adapter may be defined as address spaces, including, for example, a configuration space, an I/O space, and/or one or more memory spaces.) In one example, the configuration space is accessed by specifying the configuration space in an instruction issued by the operating system (or other configuration) to the adapter function. Specified in the instruction are an offset into the configuration space and a function handle used to locate the appropriate function table entry, which includes the RID. The firmware receives the instruction and determines that it is for the configuration space. It therefore uses the RID to generate a request to the I/O hub, and the I/O hub creates a request to access the adapter. The adapter function is located based on the RID, and the offset specifies an offset into the configuration space of the adapter function.

Base Address Register (BAR) (1 to n): this field includes a plurality of unsigned integers, designated BAR₀ to BAR_n, which are associated with the originally specified adapter function and whose values are also stored in the base address registers associated with the adapter function. Each BAR specifies the starting address of a memory space or I/O space within the adapter function, and also indicates the type of address space, that is, whether it is, for example, a 64-bit or 32-bit memory space, or a 32-bit I/O space;

In one example, this field is used to access the memory space and/or I/O space of the adapter function. For example, an offset provided in an instruction that accesses the adapter function is added to the value in the base address register associated with the address space specified in the instruction to obtain the address used to access the adapter function. An address space identifier provided in the instruction identifies the address space within the adapter function to be accessed and the corresponding BAR to be used;

Size 1 … n: this field includes a plurality of unsigned integers, designated SIZE₀ to SIZE_n. The value of a size field, when non-zero, represents the size of the corresponding address space, and each entry corresponds to a previously described BAR.

Further details regarding BAR and Size will be described below.

1. When the BAR is not implemented for the adapter function, both the BAR field and its corresponding size field are stored as zeros.

2. When the BAR field represents an I/O address space or a 32-bit memory address space, the corresponding size field is non-zero and represents the size of the address space.

3. When the BAR field represents a 64-bit memory address space,

a. The BAR_n field represents the least significant address bits.

b. The next consecutive BAR_n+1 field represents the most significant address bits.

c. The corresponding SIZE_n field is non-zero and represents the size of the address space.

d. The corresponding SIZE_n+1 field is not meaningful and is stored as zero.

Internal routing information: this information is used to perform a specific routing to the adapter. It includes, by way of example, node, processor chip and hub addressing information.

Status indication: this provides an indication as to whether, for example, load/store operations are blocked or the adapter is in an error state, among other indications.

In one example, the busy indicator, persistent error status indicator, and recovery initiated indicator are set based on monitoring performed by firmware. Further, the permission indicator is set based on, for example, policy; and the BAR information is set based on configuration information discovered during a bus walk by the processor (e.g., firmware of the processor). Other fields may be set based on configuration, initialization, and/or events. In other embodiments, the function table entry may include more, less, or different information. The information included may depend on the operations supported or enabled by the adapter function.
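The BAR and size rules described above, together with the "base plus offset" address calculation, can be sketched as follows. This is an illustration under stated assumptions: the source does not give the encoding by which a BAR indicates its type, so the hypothetical `wide` set simply names the indices of 64-bit BARs; each BAR is assumed to carry 32 address bits; and the bounds check against the size field is an addition made for the example.

```python
from typing import Dict, List, Set, Tuple


def decode_address_spaces(bars: List[int], sizes: List[int],
                          wide: Set[int]) -> Dict[int, Tuple[int, int]]:
    """Map each implemented BAR index to (base address, size)."""
    spaces: Dict[int, Tuple[int, int]] = {}
    n = 0
    while n < len(bars):
        if n in wide:
            # BAR_n holds the least significant bits, BAR_n+1 the most significant;
            # SIZE_n holds the size and SIZE_n+1 is not meaningful.
            spaces[n] = ((bars[n + 1] << 32) | bars[n], sizes[n])
            n += 2
        else:
            if bars[n] or sizes[n]:       # both zero means the BAR is not implemented
                spaces[n] = (bars[n], sizes[n])
            n += 1
    return spaces


def access_address(spaces: Dict[int, Tuple[int, int]],
                   space_id: int, offset: int) -> int:
    """Add the instruction's offset to the base of the identified address space."""
    base, size = spaces[space_id]
    if not 0 <= offset < size:            # assumed check, not stated in the source
        raise ValueError("offset outside the address space")
    return base + offset
```

In this sketch the address space identifier is simply the index of the BAR; a 64-bit space consumes two consecutive BAR slots, so the following index is never reported as a separate space.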

Referring to FIG. 7C, in one example, field 2 indicates the logical address 720 of the PCI Function Information Block (FIB), which includes information about the adapter function. The function information block is used to update the device table entry and/or function table entry (or other location) associated with the adapter function. This information is stored in the FIB during initialization and/or configuration of the adapter, and/or in response to certain events.

Further details regarding the Function Information Block (FIB) are described with reference to FIG. 7D. In one embodiment, the function information block 750 includes the following fields:

format 751: this field specifies the format of the FIB.

Interception control 752: this field is used to indicate whether guest execution of particular instructions by a pageable mode guest results in instruction interception;

error indication 754: this field includes the error state indications for direct memory access and adapter interruptions. When the bit is set (e.g., 1), one or more errors have been detected while performing direct memory access or adapter interruption for the adapter function;

load/store block 756: this field indicates whether the load/store operation is blocked;

PCI function valid 758: this field includes enable controls for the adapter function. When the bit is set (e.g., 1), the adapter function is considered enabled for I/O operations;

address space registration 760: this field includes direct memory access enable control for the adapter function. When this field is set (e.g., 1), direct memory access is enabled;

page size 761: this field indicates the size of the page or other unit of storage to be accessed by the DMA memory access;

PCI Base Address (PBA) 762: this field is the base address for the address space in system memory allocated to the adapter function. It represents the lowest virtual address that the adapter function is allowed to use in direct memory access to the specified DMA address space;

PCI Address Limit (PAL) 764: this field indicates the highest virtual address that the adapter function is allowed to access within the specified DMA address space;

input/output address translation pointer (IOAT) 766: the input/output address translation pointer specifies the first of any translation tables used by PCI virtual address translation, or it may directly specify the absolute address of the memory frame as the result of the translation;

interrupt Subclass (ISC) 768: this field includes an interrupt subclass for giving adapter interrupts for adapter functions;

number of interruptions (NOI) 770: this field specifies the number of different interrupt codes that are acceptable for the adapter's function. This field also defines in bits the size of the adapter interrupt bit vector specified by the adapter interrupt bit vector address and the adapter interrupt bit vector offset field;

adapter interrupt bit vector Address (AIBV) 772: this field specifies the address of the adapter interrupt bit vector for the adapter function. The vector is used in the interrupt processing;

adapter interrupt bit vector offset 774: this field specifies the offset of the first adapter interrupt bit vector bit for the adapter function;

adapter interrupt summary bit Address (AISB) 776: this field provides an address specifying an adapter interrupt summary bit that is optionally used in interrupt processing;

adapter interrupt summary bit offset 778: this field provides an offset into the adapter interrupt summary bit vector;

function Measurement Block (FMB) address 780: this field provides the address of the function measurement block for collecting measurements on the adapter function;

function measurement block key (key) 782: this field includes an access key to access the functional measurement block;

summary bit notification control 784: this field indicates whether there is a summary bit vector being used;

instruction authorization token 786: this field is used to determine whether the pageable storage mode guest is authorized to execute PCI instructions without host intervention.

In one example, in the z/Architecture, a pageable guest is interpretively executed at level 2 of interpretation via the Start Interpretive Execution (SIE) instruction. For example, the logical partition (LPAR) hypervisor executes the SIE instruction to begin the logical partition in physical, fixed memory. If z/VM is the operating system in that logical partition, it issues the SIE instruction to execute its guest (virtual) machines in its V=V (virtual) storage. Thus, the LPAR hypervisor uses level-1 SIE, and the z/VM hypervisor uses level-2 SIE; and

address translation format 787: this field indicates the selected format for address translation of the highest-level translation table to be used in translation (e.g., an indication of a segment table, region third, etc.).

The function information block specified in the Modify PCI function controls instruction is used to modify the selected device table entry, function table entry, and/or other firmware controls associated with the adapter function specified in the instruction. Certain services are provided to the adapter by modifying the device table entry, function table entry, and/or other firmware controls. These services include, for example, adapter interruptions; address translation; resetting the error state; resetting load/store blocked; setting function measurement parameters; and setting interception control.

One embodiment of the logic associated with the Modify PCI function controls instruction is described with reference to FIG. 8. In one example, the instruction is issued by an operating system (or other configuration) and executed by the processor (e.g., firmware) executing the operating system. In the examples herein, the instruction and adapter functions are PCI based. However, in other embodiments, a different adapter architecture and corresponding instructions may be used.

In one example, the operating system provides the following operands to the instruction (e.g., in one or more registers specified by the instruction): the PCI function handle; the DMA address space identifier; an operation control; and the address of the function information block.

Referring to FIG. 8, initially, a determination is made as to whether the facility that allows the Modify PCI function controls instruction is installed, INQUIRY 800. This determination is made, for example, by examining an indicator stored, for example, in a control block. If the facility is not installed, an exception condition is provided, STEP 802. Otherwise, a determination is made as to whether the instruction was issued by a pageable storage mode guest (or other guest), INQUIRY 804. If so, the host operating system emulates the operation for that guest, STEP 806.

Otherwise, a determination is made as to whether one or more operands are aligned, INQUIRY 808. For example, it is determined whether the address of the function information block is on a doubleword boundary. In one example, this check is optional. If the operands are not aligned, an exception condition is provided, STEP 810. Otherwise, a determination is made as to whether the function information block is accessible, INQUIRY 812. If not, an exception condition is provided, STEP 814. Otherwise, a determination is made as to whether the handle provided in the operands of the Modify PCI function controls instruction is enabled, INQUIRY 816. In one example, this determination is made by examining an enable indicator in the handle. If the handle is not enabled, an exception condition is provided, STEP 818.

If the handle is enabled, the handle is used to locate the function table entry, STEP 820. That is, at least a portion of the handle is used to index into the function table to locate the function table entry corresponding to the adapter function for which the operating parameters are to be established.

A determination is made as to whether a function table entry was found, INQUIRY 822. If not, an exception condition is provided, STEP 824. Otherwise, if the configuration issuing the instruction is a guest, INQUIRY 826, an exception condition (e.g., interception to the host) is provided, STEP 828. If the configuration is not a guest, this inquiry may be ignored, or other authorizations may be checked, if specified.

A determination is then made as to whether the function is enabled, INQUIRY 830. In one example, this determination is made by checking an enable indicator in the function table entry. If it is not enabled, an exception condition is provided, step 832.

If the function is enabled, a determination is made as to whether recovery is active, INQUIRY 834. If recovery is active, as determined by the recovery indicator in the function table entry, an exception condition is provided, STEP 836. If, however, recovery is not active, a further determination is made as to whether the function is busy, INQUIRY 838. This determination is made by checking the busy indicator in the function table entry. If the function is busy, a busy condition is provided, STEP 840. With the busy condition, the instruction may be retried rather than dropped.

If the function is not busy, a further determination is made as to whether the function information block format is valid, INQUIRY 842. For example, the format field of the FIB is examined to determine whether the format is supported by the system. If it is not valid, an exception condition is provided, STEP 844. If the function information block format is valid, a further determination is made as to whether the operation control specified in the operands of the instruction is valid, INQUIRY 846. That is, it is determined whether the operation control is one of the operation controls defined for the instruction. If it is not valid, an exception condition is provided, STEP 848. However, if the operation control is valid, processing continues with the specific operation control that was specified.
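The order of the inquiries in FIG. 8 can be summarized in a short sketch. It is not the firmware's implementation: the `Conditions` record and its field names are hypothetical, each field simply states what the corresponding inquiry would find, and the outcomes are reduced to plain strings.

```python
from dataclasses import dataclass


@dataclass
class Conditions:
    """Hypothetical snapshot of what each inquiry of FIG. 8 would find."""
    facility_installed: bool = True
    pageable_guest: bool = False
    operands_aligned: bool = True
    fib_accessible: bool = True
    handle_enabled: bool = True
    fte_found: bool = True
    guest_configuration: bool = False
    function_enabled: bool = True
    recovery_active: bool = False
    function_busy: bool = False
    fib_format_valid: bool = True
    operation_control_valid: bool = True


def modify_pci_function_controls(c: Conditions) -> str:
    """Return the first outcome reached, following the order of FIG. 8."""
    if not c.facility_installed:
        return "exception"                          # STEP 802
    if c.pageable_guest:
        return "host emulates"                      # STEP 806
    ordered_checks = [
        (c.operands_aligned, "exception"),          # STEP 810
        (c.fib_accessible, "exception"),            # STEP 814
        (c.handle_enabled, "exception"),            # STEP 818
        (c.fte_found, "exception"),                 # STEP 824
        (not c.guest_configuration, "intercept"),   # STEP 828
        (c.function_enabled, "exception"),          # STEP 832
        (not c.recovery_active, "exception"),       # STEP 836
        (not c.function_busy, "busy"),              # STEP 840
        (c.fib_format_valid, "exception"),          # STEP 844
        (c.operation_control_valid, "exception"),   # STEP 848
    ]
    for passed, outcome in ordered_checks:
        if not passed:
            return outcome
    return "perform operation"
```

Because recovery is checked before the busy indicator, a function that is both busy and in recovery yields an exception rather than a retryable busy condition.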

In one example, the operation control is a register adapter interruption operation, which is used to control adapter interruptions. In response to this operational control, adapter function parameters associated with the adapter interruption are set in the device table entry based on the appropriate contents of the function information block.

One embodiment of the logic associated with this operation is described with reference to FIG. 9. As an example, the operands for this operation, which are obtained from the function information block, include: the interrupt subclass (ISC); the number of interrupts allowed (NOI); the adapter interrupt bit vector offset (AIBVO); the summary notification (S); the adapter interrupt summary bit vector offset (AIBVSO); the adapter interrupt bit vector (AIBV) address; and the adapter interrupt summary bit (AISB) address.

Referring to FIG. 9, initially, at query 900, a determination is made as to whether the number of interruptions (NOI) specified in the FIB is greater than a model-dependent maximum. If so, then at step 902, an exception condition is provided. However, if the number of interruptions is not greater than the model-dependent maximum, then at query 904, a further determination is made as to whether the number of interruptions plus the adapter interrupt bit vector offset (NOI + AIBVO) is greater than the model-dependent maximum. If so, then at step 906, an exception condition is provided. If the NOI plus the AIBVO is not greater than the model-dependent maximum, then at query 908, a further determination is made as to whether the AIBV address plus the NOI crosses a 4K boundary. If it does cross a 4K boundary, then at step 910, an exception condition is provided. Otherwise, at step 912, a determination is made as to whether sufficient resources are available for any resources that are needed. If there are not sufficient resources, then at step 914, an exception condition is provided.

Otherwise, at step 916, a determination is made as to whether adapter interruptions have already been registered for the function. In one embodiment, this is determined by examining one or more parameters (e.g., in the DTE/FTE). In particular, parameters associated with interruptions, such as the NOI, are checked. If the fields are populated, the adapter is already registered for interruptions. If the adapter is already registered, then at step 918, an exception condition is provided. Otherwise, the interrupt parameters are obtained from the FIB and placed in the device table entry and, optionally, in the corresponding function table entry (FTE) (or other specified location). Further, at step 920, an MSI enable indicator is set in the DTE. That is, the PCI function parameters associated with adapter interruption are set in the DTE, and optionally in the FTE, based on information retrieved from the function information block. These parameters include, for example, the ISC, NOI, AIBVO, S, AIBVSO, AIBV address, and AISB address.
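The checks of FIG. 9 can be sketched as follows. This is an illustration under assumptions, not the described firmware: the device table entry and FIB are modeled as plain dictionaries with hypothetical key names, `model_max` stands for the model-dependent maximum, and `crosses_4k` reflects one possible reading of query 908, namely that the bits used by the function, counted from the AIBV address, must not span two 4K blocks.

```python
PAGE_SIZE = 4096
INTERRUPT_PARAMETERS = ("isc", "noi", "aibvo", "s", "aibvso",
                        "aibv_address", "aisb_address")


def crosses_4k(aibv_address: int, aibvo: int, noi: int) -> bool:
    """Assumed reading of query 908: do the function's AIBV bits span two 4K blocks?"""
    if noi == 0:
        return False
    last_byte = aibv_address + (aibvo + noi - 1) // 8
    return aibv_address // PAGE_SIZE != last_byte // PAGE_SIZE


def register_adapter_interruptions(dte: dict, fib: dict, model_max: int,
                                   resources_available: bool = True) -> None:
    """Validate the FIB parameters, then copy them into the device table entry."""
    if fib["noi"] > model_max:                                       # query 900
        raise ValueError("NOI exceeds the model-dependent maximum")
    if fib["noi"] + fib["aibvo"] > model_max:                        # query 904
        raise ValueError("NOI + AIBVO exceeds the model-dependent maximum")
    if crosses_4k(fib["aibv_address"], fib["aibvo"], fib["noi"]):    # query 908
        raise ValueError("AIBV crosses a 4K boundary")
    if not resources_available:                                      # step 912
        raise RuntimeError("insufficient resources")
    if dte.get("noi"):                                               # step 916
        raise RuntimeError("adapter interruptions already registered")
    for name in INTERRUPT_PARAMETERS:
        dte[name] = fib[name]
    dte["msi_enabled"] = True                                        # step 920
```

Testing the NOI field to detect an existing registration follows the "if the field is populated" rule in the text; a real implementation might examine several parameters.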

In addition to the above, another operation control that can be specified is an unregister adapter interruptions operation, an example of which is described with reference to FIG. 10. With this operation, the adapter function parameters associated with adapter interruption are reset.

Referring to FIG. 10, initially, a determination is made as to whether the adapter specified by the function handle is registered for interruptions, INQUIRY 1000. If not, then at step 1002, an exception condition is provided. Otherwise, at step 1004, the interrupt parameters in the function table entry (or other location) and the corresponding device table entry are set to zero. In one example, the parameters include the ISC, NOI, AIBVO, S, AIBVSO, AIBV address, and AISB address.

As described above, in one embodiment, a Call Logical Processor instruction is used to obtain information about the adapter function. One embodiment of this instruction is shown in FIG. 11. As shown, in one example, a Call Logical Processor (CLP) instruction 1100 includes an opcode 1102, which indicates that this is a Call Logical Processor instruction; and a command indication 1104. In one example, the indication is the address of a request block describing the command to be executed, and the information in the request block depends on the command. Examples of request blocks and corresponding response blocks for the respective commands are described with reference to FIGS. 11B-13B.

Referring first to FIG. 11B, a request block for a list PCI function command is provided. The list PCI function command is used to obtain a list of PCI functions assigned to the requesting configuration (e.g., the requesting operating system). Request block 1120 includes several parameters, such as:

length field 1122: this field indicates the length of the request block;

the command code 1124: this field indicates the list PCI function command; and

recovery token 1126: this field is an integer that is used to start a new list PCI function command or to resume a previous list PCI function command, as will be described in more detail below.

When the recovery token field in the command request block includes a value, for example, zero, then a new list of PCI functions is requested. When the recovery token field includes a non-zero value returned, for example, from the previous list PCI function command, then the request is to continue with the previous list of PCI functions.

The response block is returned in response to issuing and processing the Call Logical Processor instruction for a list PCI function command. One embodiment of a response block is shown in FIG. 11C. In one example, the response block 1150 for the list PCI function command includes:

length field 1152: this field indicates the length of the response block;

response code 1154: this field indicates the status of the command;

PCI function list 1156: this field indicates a list of one or more PCI functions available to the requesting operating system;

recovery token 1158: this field indicates whether continuation of the previous PCI function list is requested. In one example, when the recovery token in the request block and the recovery token in the response block are both zero, all PCI functions assigned to the requesting configuration are represented in the PCI function list; if the recovery token in the request block is zero and the recovery token in the response block is non-zero, there may be additional PCI functions assigned to the requesting configuration that are not represented in the list; if the recovery token in the request block is non-zero and the recovery token in the response block is zero, then, from the recovery point, the remaining PCI functions assigned to the requesting configuration are represented in the list; and when the recovery tokens in both the request and response blocks are non-zero, then, from the recovery point, there may be additional PCI functions assigned to the requesting configuration that are not represented in any of the associated PCI function lists. After being returned, the recovery token remains valid for an indefinite period of time, but it may be invalidated for a variety of model-dependent reasons (including system load elapsed time).

Model-dependent data 1160: this field includes system dependent data;

number of PCI functions 1162: this field indicates the maximum number of PCI functions supported by the facility; and

entry size 1164: this field indicates the size of each entry in the PCI function list.
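The recovery token protocol amounts to a paging loop, which can be sketched as follows. The `issue_list_command` callable is a hypothetical stand-in for issuing the Call Logical Processor instruction with a list PCI function command; it takes the request token and returns the list entries together with the response token. The `make_fake_command` helper exists only for demonstration, and handling of a token that has become invalid is not modeled.

```python
from typing import Callable, List, Tuple

ListCommand = Callable[[int], Tuple[List[str], int]]


def list_all_pci_functions(issue_list_command: ListCommand) -> List[str]:
    """Collect the complete PCI function list by following recovery tokens."""
    functions: List[str] = []
    token = 0                        # zero in the request starts a new list
    while True:
        entries, token = issue_list_command(token)
        functions.extend(entries)
        if token == 0:               # zero in the response: nothing further to fetch
            return functions


def make_fake_command(all_functions: List[str], page_size: int) -> ListCommand:
    """Demonstration stand-in for the firmware; the token is simply the next index."""
    def command(token: int) -> Tuple[List[str], int]:
        page = all_functions[token:token + page_size]
        next_index = token + page_size
        return page, (next_index if next_index < len(all_functions) else 0)
    return command
```

For instance, `list_all_pci_functions(make_fake_command(["f0", "f1", "f2", "f3", "f4"], 2))` issues three commands with tokens 0, 2, and 4 and returns all five entries.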

More details regarding the PCI function list are described with reference to fig. 11D. In one example, the PCI function list includes a plurality of entries and each entry 1156 includes, as examples, the following information:

device ID 1170: this field indicates the I/O adapter associated with the corresponding PCI function;

vendor ID 1172: this field identifies the manufacturer of the I/O adapter associated with the corresponding PCI function;

function identifier 1174: this field includes the persistent identifier of the PCI function;

function handle 1176: this field identifies the PCI function. The stored PCI function handle is a generic handle when a designated bit of the handle is zero, and it is an enabled handle when that bit is 1. If the PCI function is disabled, the generic PCI function handle is stored. If the PCI function is enabled, the enabled PCI function handle is stored. In one example, the PCI function handle does not persist beyond an IPL, unlike the PCI function ID, which is persistent and set for the lifetime of the I/O configuration definition; and

configuration status 1178: this field indicates the state of the PCI function. When the indicator is, for example, zero, the state is standby, and when the indicator is, for example, 1, the state is configured. When in standby, the PCI function handle is the generic PCI function handle, and when configured, it is either the generic or the enabled PCI function handle, depending on whether the PCI function is enabled.

After the list of adapter functions is obtained, information regarding the attributes of a selected function, specified by its PCI function handle, may be obtained. This information may be obtained by issuing a CLP instruction with a query PCI function command.

One embodiment of a request block for a query PCI function command is described with reference to FIG. 12A. In one example, request block 1200 includes, for example:

length field 1202: this field indicates the length of the request block;

the command code 1204: this field indicates the query PCI function command; and

function handle 1206: this field includes a (e.g., generic or enabled) PCI function handle that specifies the PCI function to be queried.

The response block is returned in response to issuing the Call Logical Processor instruction for a query PCI function command. One embodiment of a response block is shown in FIG. 12B. In one example, response block 1250 includes the following:

length 1252: this field indicates the length of the response block;

response code 1254: this field indicates the status of the command;

function group ID 1256: this field indicates the PCI function group identifier. The PCI function group identifier is used to associate a group of PCI functions with a set of attributes (also referred to herein as properties). Each PCI function having the same PCI function group identifier has the same set of attributes;

function ID 1258: PCI function id is a persistent identifier of the PCI function, which is originally specified by the PCI function handle and is set for the lifetime of the I/O configuration definition;

physical channel adapter 1260: this value represents a model-dependent identification of the location of the physical I/O adapter corresponding to the PCI function;

base Address Register (BAR) 1 … n 1262: this field includes a plurality of unsigned integers, designated BAR₀ to BAR_n, which are associated with the originally specified PCI function and whose values are also stored in the base address registers associated with the PCI function. Each BAR specifies the starting address of a memory space or I/O space within the adapter, and also indicates the type of address space, that is, whether it is, for example, a 64-bit or 32-bit memory space, or a 32-bit I/O space;

size 1 … n 1264: this field includes a plurality of unsigned integers, designated SIZE₀–SIZE_n. When the value of the size field is non-zero, it represents the size of each address space, each entry of which corresponds to the previously described BAR.

Start available DMA 1266: this field includes an address indicating the start of a PCI address range that may be used for DMA operations;

terminate available DMA 1268: this field includes a value that indicates the termination of the PCI address range that is available for DMA operations.
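The fields of response block 1250 can be pictured as a simple record. The following is a minimal sketch only: the class name, attribute names, and helper methods are hypothetical, the text does not give field widths or offsets (so none are modeled), and treating the end of the DMA range as inclusive is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class QueryPciFunctionResponse:
    """Hypothetical in-memory view of response block 1250 (layout not modeled)."""
    length: int
    response_code: int
    function_group_id: int
    function_id: int
    physical_channel_adapter: int
    bars: List[int]           # BAR_0 .. BAR_n
    sizes: List[int]          # SIZE_0 .. SIZE_n, one entry per BAR
    start_available_dma: int
    end_available_dma: int

    def address_spaces(self) -> List[Tuple[int, int, int]]:
        """Return (index, bar, size) for every BAR whose size entry is non-zero."""
        return [(i, bar, size)
                for i, (bar, size) in enumerate(zip(self.bars, self.sizes))
                if size != 0]

    def dma_range_contains(self, pci_address: int) -> bool:
        """True if pci_address is within the DMA range (end assumed inclusive)."""
        return self.start_available_dma <= pci_address <= self.end_available_dma
```

Pairing each size entry with the BAR at the same index mirrors the statement that each size entry corresponds to a previously described BAR; entries with a zero size are simply skipped.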

In addition to obtaining attributes for a particular adapter function, attributes for the group containing that function may also be obtained. These common attributes may be obtained by issuing a CLP instruction with a query PCI function group command. The command is used to obtain the set of supported properties for a group of one or more PCI functions designated by the specified PCI function group identifier. The PCI function group identifier is used to associate a group of PCI functions with the same set of properties. One embodiment of a request block for the query PCI function group command is described with reference to FIG. 13A. In one example, request block 1300 includes the following:

length field 1302: this field indicates the length of the request block;

command code 1304: this field indicates the query PCI function group command; and

function group ID 1306: this field specifies the PCI function group identifier for which the attributes are acquired.

In response to issuing and processing the Call Logical Processor instruction with a query PCI function group command, a response block is returned. FIG. 13B illustrates one embodiment of the response block. In one example, response block 1350 includes the following:

length field 1352: this field indicates the length of the response block;

response code 1354: this field indicates the status of the command;

number of interruptions 1356: this field indicates the maximum number of consecutive MSI vector numbers (i.e., interruption event indicators) that are supported by the PCI facility for each PCI function in the specified PCI function group. In one example, the possible valid values of the number of interruptions range from 0 to 2,048;

version 1358: this field indicates the version of the PCI specification supported by the PCI facility to which the group of PCI functions designated by the specified PCI function group identifier is attached;

frame 1362: this field indicates the supported frame (or page) size for I/O address translation;

measurement block update interval 1364: this is a value indicating the approximate time interval (e.g., in milliseconds) between PCI function measurement block updates;

DMA address space mask 1366: this is a value to indicate which bits in the PCI address are used to identify the DMA address space; and

MSI address 1368: this is a value used for message signaled interruption requests.
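Two of the group attributes lend themselves to a short illustration. The sketch below is hypothetical: the function names are invented, MSI vector numbers are assumed to start at 0 and run consecutively, and the mask is assumed to be applied to the PCI address with a bitwise AND.

```python
MAX_INTERRUPTIONS = 2048  # upper bound of the example range given in the text


def msi_vector_supported(vector_number: int, number_of_interruptions: int) -> bool:
    """Check a vector number against the group's 'number of interruptions' value.

    Assumes vector numbers are consecutive and start at 0.
    """
    if not 0 <= number_of_interruptions <= MAX_INTERRUPTIONS:
        raise ValueError("number of interruptions outside the range 0..2048")
    return 0 <= vector_number < number_of_interruptions


def dma_address_space_bits(pci_address: int, dma_address_space_mask: int) -> int:
    """Keep only the PCI address bits that the mask marks as identifying the DMA address space."""
    return pci_address & dma_address_space_mask
```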

The query list and query function commands described above retrieve information from, for example, the function table. At initialization, or after a hot plug of an adapter, the firmware performs a bus walk to determine the location of the adapter and its basic characteristics. The firmware stores this information in a Function Table Entry (FTE) for each adapter. The accessibility of the adapter is determined based on policies set by the system administrator, and it is also recorded by the firmware in the FTE. The query list and query function commands may then retrieve this information and store it in their respective response blocks accessible to the operating system.
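The flow from bus walk to query can be sketched as follows. This is an illustration under stated assumptions, not the firmware's actual logic: the names `FunctionTableEntry`, `bus_walk`, and `query_function` are hypothetical, the administrator policy is reduced to a set of permitted function IDs, and returning nothing for an inaccessible function is an assumed behavior.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set, Tuple


@dataclass
class FunctionTableEntry:
    """Hypothetical FTE holding what the bus walk and the policy produced."""
    function_id: int
    location: str
    accessible: bool


def bus_walk(discovered: Iterable[Tuple[int, str]],
             permitted: Set[int]) -> Dict[int, FunctionTableEntry]:
    """Build one FTE per discovered adapter function and record its accessibility."""
    table = {}
    for function_id, location in discovered:
        table[function_id] = FunctionTableEntry(
            function_id, location, function_id in permitted)
    return table


def query_function(table: Dict[int, FunctionTableEntry],
                   function_id: int) -> Optional[dict]:
    """Copy FTE information into a response; None if unknown or not accessible (assumed)."""
    fte = table.get(function_id)
    if fte is None or not fte.accessible:
        return None
    return {"function_id": fte.function_id, "location": fte.location}
```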

In addition, the group information is based on the capabilities of the given system I/O architecture infrastructure as well as the firmware and I/O hubs. This may be stored in the FTE or any other convenient location for later retrieval at query processing. In particular, the query group command retrieves this information and stores it in a response block accessible to the operating system.

Described in detail above is a capability for converting PCI message signaled interruptions into I/O adapter event notifications to the operating system. This capability provides a low-latency interruption request, delivers MSIs from a relatively large number of PCI functions to the operating system, and retains the character of the MSI vector designation while adapting the MSI to the adapter event notification architecture. It allows the I/O hub to be connected to a relatively large number of PCI functions and eliminates the problem of every write to an MSI vector generating a unique interruption.
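The benefit of not raising a unique interruption per MSI write can be illustrated with a toy model. The class below is a hypothetical sketch, not the mechanism of the embodiments: it assumes one indicator per MSI vector number and a single pending flag that suppresses further interruption requests until the indicators have been processed.

```python
class EventNotifier:
    """Toy model: MSI writes set indicators; at most one interruption is pending."""

    def __init__(self, number_of_vectors: int):
        self.indicators = [False] * number_of_vectors
        self.interruption_pending = False
        self.interruptions_presented = 0

    def msi_write(self, vector_number: int) -> None:
        """Record the event; request an interruption only if none is pending."""
        self.indicators[vector_number] = True
        if not self.interruption_pending:
            self.interruption_pending = True
            self.interruptions_presented += 1

    def drain(self) -> list:
        """Operating-system side: collect and reset all set indicators."""
        fired = [i for i, is_set in enumerate(self.indicators) if is_set]
        self.indicators = [False] * len(self.indicators)
        self.interruption_pending = False
        return fired
```

In this model three MSI writes arriving before the operating system responds yield a single interruption, and the vector numbers that fired are still recoverable from the indicators.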

In the embodiment described herein, the adapter is a PCI adapter. As used herein, PCI refers to any adapter implemented according to a PCI-based specification as defined by the Peripheral Component Interconnect Special Interest Group (PCI-SIG) (www.pcisig.com/home), including but not limited to PCI or PCIe. In one particular example, Peripheral Component Interconnect Express (PCIe) is a component-level interconnect standard that defines a bi-directional communication protocol for transactions between an I/O adapter and a host system. PCIe communications are encapsulated in packets according to the PCIe standard for transmission over a PCIe bus. Transactions originating at the I/O adapter and terminating at the host system are referred to as upbound transactions. Transactions originating at the host system and terminating at the I/O adapter are referred to as downbound transactions. The PCIe topology is based on point-to-point unidirectional links that are paired (e.g., one upbound link, one downbound link) to form a PCIe bus. The PCIe standard is maintained and published by the PCI-SIG.

As will be appreciated by one skilled in the art, the present invention may be embodied as a system, method or computer program product. Accordingly, the present invention may be embodied entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.) or in a combination of hardware and software, any of which may be referred to herein generally as a "circuit," "module" or "system." Furthermore, in some embodiments, the invention may also be embodied in the form of a computer program product in one or more computer-readable media having computer-readable program code embodied in the medium.

Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

Referring now to FIG. 14, in one example, a computer program product 1400 includes, for instance, one or more computer-readable storage media 1402 having computer-readable program code means or logic 1404 stored thereon to provide and facilitate one or more aspects of the present invention.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The present invention is described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instruction means (instructions) which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In addition to the foregoing, one or more aspects of the present invention may be provided, offered, deployed, managed, serviced, etc. by a service provider who offers management of a user's environment. For example, a service provider can create, maintain, support, etc., computer code and/or computer infrastructure that performs one or more aspects of the present invention for one or more users. The service provider, in turn, may accept payment from the user, for example, according to a subscription and/or fee agreement. Additionally or alternatively, the service provider may receive payment from the sale of advertising content to one or more third parties.

In one aspect of the invention, an application may be deployed to perform one or more aspects of the invention. As one example, deploying an application comprises providing a computer infrastructure operable to perform one or more aspects of the present invention.

As yet another aspect of the present invention, a computing infrastructure may be deployed comprising integrating computer-readable code into a computer system, wherein the code in combination with the computing system is capable of performing one or more aspects of the present invention.

As yet another aspect of the present invention, a process for integrating computing infrastructure comprising integrating computer readable code into a computer system may be provided. The computer system includes a computer-readable medium, wherein the computer medium includes one or more aspects of the present invention. The code in combination with the computer system is capable of performing one or more aspects of the present invention.

While various embodiments are described above, these are only examples. For example, computing environments of other architectures may incorporate and use one or more aspects of the present invention. As examples, servers other than System z servers, such as Power Systems servers or other servers offered by International Business Machines Corporation, or servers of other companies, may include, use and/or benefit from one or more aspects of the present invention. Moreover, although in the examples illustrated herein the adapters and PCI hub are considered to be part of the server, in other embodiments they need not be considered part of the server, but may simply be considered to be coupled to the system memory and/or other components of the computing environment. The computing environment need not be a server. Moreover, although the adapters are PCI based, one or more aspects of the present invention may be used with other adapters or other I/O components. Adapters and PCI adapters are examples only. Further, one or more aspects of the present invention may be applicable to interruption schemes other than PCI MSI. Further, although bits are set in the described examples, in other embodiments, bytes or other types of indicators may be set. Also, the DTE may include more, less, or different information. Many other variations are possible.

Moreover, other types of computing environments may benefit from one or more aspects of the present invention. By way of example, a data processing system suitable for storing and/or executing program code is usable that includes at least two processors coupled directly or indirectly to memory elements through a system bus. The memory elements include, for instance, local memory employed during actual execution of the program code, bulk storage, and cache memory which provides temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.

Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, DASD, magnetic tape, CDs, DVDs, thumb drives, and other storage media) can be coupled to the system either directly or through intervening I/O controllers. Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems, and Ethernet cards are just a few of the available types of network adapters.

Referring to FIG. 15, representative components of a host computer system 5000 to implement one or more aspects of the present invention are described. The representative host computer 5000 includes one or more CPUs 5001 in communication with a computer memory (i.e., central storage) 5002, as well as I/O interfaces to storage media devices 5011 and networks 5010 for communicating with other computers or SANs and the like. The CPU 5001 conforms to an architecture having an architected instruction set and architected functionality. The CPU 5001 may have Dynamic Address Translation (DAT) 5003 for translating program addresses (virtual addresses) to real addresses of memory. A DAT typically includes a Translation Lookaside Buffer (TLB) 5007 for caching translations so that later accesses to the same block of computer memory 5002 do not require the delay of address translation. Typically, a cache 5009 is used between the computer memory 5002 and the processor 5001. The cache 5009 may be hierarchical, having a large cache available to more than one CPU, and smaller, faster (lower level) caches between the large cache and each CPU. In some embodiments, the lower level caches are split to provide separate low level caches for instruction fetching and data accesses. In one embodiment, an instruction is fetched from memory 5002 by an instruction fetch unit 5004 via the cache 5009. The instruction is decoded in an instruction decode unit 5006 and dispatched (with other instructions in some embodiments) to one or more instruction execution units 5008. Typically, several execution units 5008 are used, such as an arithmetic execution unit, a floating point execution unit, and a branch instruction execution unit. The instruction is executed by the execution unit, accessing operands from the registers or memory specified by the instruction, as needed. If an operand is to be accessed (loaded or stored) from memory 5002, a load/store unit 5005 typically handles the access under the control of the instruction being executed. Instructions may be executed in hardware circuitry, in internal microcode (firmware), or in a combination of both.
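The role of the TLB in avoiding repeated translation delay can be shown with a toy model. This is a sketch under simplifying assumptions: a single-level page table held in a dictionary, a 4096-byte page, and an unbounded TLB; the class name `ToyDat` and its attributes are hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


class ToyDat:
    """Toy address translation with a TLB that caches completed translations."""

    def __init__(self, page_table: dict):
        self.page_table = dict(page_table)  # virtual page number -> real frame number
        self.tlb = {}                       # cached translations
        self.table_walks = 0                # counts the slow translations performed

    def translate(self, virtual_address: int) -> int:
        page, offset = divmod(virtual_address, PAGE_SIZE)
        if page not in self.tlb:
            # TLB miss: perform the (slow) table lookup and cache the result.
            self.table_walks += 1
            self.tlb[page] = self.page_table[page]
        return self.tlb[page] * PAGE_SIZE + offset
```

A second access to the same page is served from the TLB, so `table_walks` does not increase.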

As noted, a computer system includes information in local (or main) memory, as well as addressing, protection, and reference and change recording. Some aspects of addressing include the format of addresses, the concept of address spaces, the various types of addresses, and the manner in which one type of address is translated to another type of address. Some of main memory includes permanently assigned memory locations. Main memory provides the system with directly addressable, fast-access storage of data. Both data and programs are to be loaded into main memory (from input devices) before they can be processed.

The main memory may include one or more smaller, faster-access cache memories, sometimes referred to as caches. The cache is typically physically associated with the CPU or I/O processor. The effects of the physical structure and use of different storage media are not typically observed by a program except in terms of performance.

Separate caches may be maintained for instructions and for data operands. Information within a cache may be maintained in contiguous bytes on an integral boundary called a cache block or cache line (or simply a line). A model may provide an EXTRACT CACHE ATTRIBUTE instruction that returns the size of a cache line in bytes. A model may also provide PREFETCH DATA and PREFETCH DATA RELATIVE LONG instructions that effect the prefetching of storage into the data or instruction cache, or the releasing of data from the cache.

The memory is considered to be a long horizontal string of bits. For most operations, accesses to memory are made in left-to-right order. The bit string is subdivided into units of eight bits. The eight-bit unit is called a byte, which is the basic building block for all information formats. Each byte location in memory is identified by a unique non-negative integer, which is the address of the byte location, or simply, the byte address. Adjacent byte positions have consecutive addresses, starting at 0 on the left and proceeding in left to right order. The address is an unsigned binary integer and is 24, 31 or 64 bits.

Information is transferred between the memory and the CPU or channel subsystem one byte or a group of bytes at a time. Unless otherwise specified, in, for instance, the z/Architecture, a group of bytes in memory is addressed by the leftmost byte of the group. The number of bytes in a group is either implied or explicitly specified by the operation to be performed. When used in a CPU operation, a group of bytes is called a field. Within each group of bytes, in, for instance, the z/Architecture, bits are numbered in left-to-right order. In the z/Architecture, the leftmost bits are sometimes referred to as the "high-order" bits and the rightmost bits as the "low-order" bits. Bit numbers are not memory addresses, however. Only bytes can be addressed. To operate on individual bits of a byte in memory, the entire byte is accessed. The bits in a byte are numbered 0 to 7, from left to right (in, e.g., the z/Architecture). The bits in an address are numbered 8-31 or 40-63 for a 24-bit address, or 1-31 or 33-63 for a 31-bit address; they are numbered 0-63 for a 64-bit address. Within any other fixed-length format of multiple bytes, the bits making up the format are numbered consecutively starting from 0. For purposes of error detection, and preferably for correction, one or more check bits may be transmitted with each byte or group of bytes. Such check bits are generated automatically by the machine and cannot be directly controlled by the program. Storage capacities are expressed in number of bytes. When the length of a memory operand field is implied by the operation code of an instruction, the field is said to have a fixed length, which may be one, two, four, eight, or sixteen bytes. Larger fields may be implied for some instructions. When the length of a memory operand field is not implied but is stated explicitly, the field is said to have a variable length. Variable-length operands may vary in length in increments of one byte (or, for some instructions, in multiples of two bytes or other multiples). When information is placed in memory, only the contents of those byte locations included in the designated field are replaced, even though the width of the physical path to memory may be greater than the length of the field being stored.
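Because bits are numbered left to right, with bit 0 as the high-order bit, the bit number differs from the shift count most programmers expect. The helpers below are a small sketch with hypothetical names; they also reflect the point that a single bit is reached only by first accessing the whole byte at its byte address.

```python
def get_bit(byte_value: int, bit_number: int) -> int:
    """Return bit `bit_number` of a byte, where bit 0 is the leftmost (high-order) bit."""
    if not 0 <= byte_value <= 0xFF:
        raise ValueError("byte_value must fit in eight bits")
    if not 0 <= bit_number <= 7:
        raise ValueError("bit_number must be 0..7")
    return (byte_value >> (7 - bit_number)) & 1


def bit_in_memory(memory: bytes, byte_address: int, bit_number: int) -> int:
    """Access the entire addressed byte, then select one bit from it."""
    return get_bit(memory[byte_address], bit_number)
```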

Certain units of information are to be located on an integral boundary in memory. A boundary is called integral for a unit of information when its memory address is a multiple of the length of the unit in bytes. Special names are given to fields of 2, 4, 8, and 16 bytes on an integral boundary. A halfword is a group of two consecutive bytes on a two-byte boundary and is the basic building block of instructions. A word is a group of four consecutive bytes on a four-byte boundary. A doubleword is a group of eight consecutive bytes on an eight-byte boundary. A quadword is a group of 16 consecutive bytes on a 16-byte boundary. When a memory address designates a halfword, word, doubleword, or quadword, the binary representation of the address contains one, two, three, or four rightmost zero bits, respectively. Instructions are to be on two-byte integral boundaries. The memory operands of most instructions do not have boundary alignment requirements.
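The integral-boundary rule reduces to a divisibility test, and the count of rightmost zero bits follows from the unit length being a power of two. A brief sketch, with hypothetical function names:

```python
UNIT_LENGTHS = {"halfword": 2, "word": 4, "doubleword": 8, "quadword": 16}


def on_integral_boundary(address: int, unit: str) -> bool:
    """True when the address is a multiple of the unit's length in bytes."""
    return address % UNIT_LENGTHS[unit] == 0


def rightmost_zero_bits(unit: str) -> int:
    """Number of rightmost zero bits in any address aligned for this unit."""
    return UNIT_LENGTHS[unit].bit_length() - 1
```

For example, address 0x1008 is on a doubleword boundary (three rightmost zero bits) but not on a quadword boundary.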

On devices that implement separate caches for instructions and data operands, significant delays may be experienced if a program stores in a cache line and an instruction is subsequently fetched from the cache line, regardless of whether the store alters the subsequently fetched instruction.

In one embodiment, the invention may be implemented by software (sometimes referred to as licensed internal code, firmware, microcode, millicode, picocode, etc., any of which would be consistent with the invention). Referring to FIG. 15, software program code embodying the present invention is typically accessible by a processor of the host system 5000 from a long-term storage media device 5011, such as a CD-ROM drive, tape drive or hard drive. The software program code may be embodied on any of a variety of known media for use with a data processing system, such as a floppy disk, a hard drive, or a CD-ROM. The code may be distributed on such media, or may be distributed to users of other computer systems from the computer memory 5002 or storage devices of one computer system over the network 5010 for use by users of such other systems.

The software program code includes an operating system which controls the function and interaction of the various computer components and one or more application programs. The program code is typically paged from the storage media device 5011 to the relatively higher speed computer memory 5002 where it is available to the processor 5001. Techniques and methods for embodying software program code in memory, on physical media, and/or distributing software code via networks are well known and will not be discussed further herein. When the program code is created and stored on a tangible medium, including but not limited to an electronic memory module (RAM), flash memory, Compact Discs (CDs), DVDs, tapes, etc., it is often referred to as a "computer program product". The computer program product medium is typically readable by processing circuitry preferably located in a computer system for execution by the processing circuitry.

FIG. 16 illustrates a representative workstation or server hardware system in which the present invention may be implemented. The system 5020 of FIG. 16 includes a representative base computer system 5021, such as a personal computer, workstation or server, including optional peripherals. The base computer system 5021 comprises one or more processors 5026 and a bus used to connect and enable communication between the processors 5026 and other components of the system 5021, in accordance with known techniques. The bus connects the processor 5026 to memory 5025 and long-term storage 5027, which may comprise, for example, a hard disk drive (including any of magnetic media, CD, DVD, and flash memory, for example) or a tape drive. The system 5021 can also include a user interface adapter that connects the microprocessor 5026 via the bus to one or more interface devices, such as a keyboard 5024, a mouse 5023, a printer/scanner 5030, and/or other interface devices, which can be any user interface device, such as a touch-sensitive screen, a digitized entry pad, etc. The bus may also connect a display device 5022, such as an LCD screen or monitor, to the microprocessor 5026 via a display adapter.

The system 5021 may communicate with other computers or networks of computers via a network adapter capable of communicating 5028 with a network 5029. Exemplary network adapters are communications channels, token ring, Ethernet or modems. Alternatively, the system 5021 may communicate using a wireless interface, such as a CDPD (cellular digital packet data) card. The system 5021 can be associated with such other computers in a Local Area Network (LAN) or a Wide Area Network (WAN), or the system 5021 can be a client in a client/server arrangement with another computer, etc. All of these configurations, as well as suitable communication hardware and software, are known in the art.

FIG. 17 illustrates a data processing network 5040 in which the present invention may be implemented. The data processing network 5040 may include a plurality of separate networks, such as wireless and wired networks, each of which may include a plurality of separate workstations 5041, 5042, 5043, 5044. Further, those skilled in the art will appreciate that one or more LANs may be included, wherein a LAN may include a plurality of intelligent workstations coupled to a host processor.

Still referring to FIG. 17, the network may also include mainframe computers or servers, such as a gateway computer (client server 5046) or application server (remote server 5048, which may access a data repository and may also be accessed directly from a workstation 5045). The gateway computer 5046 serves as a point of entry into each individual network. A gateway is needed when connecting one networking protocol to another. The gateway 5046 may preferably be coupled to another network (e.g., the Internet 5047) by a communications link. The gateway 5046 may also be directly coupled to one or more workstations 5041, 5042, 5043, 5044 using a communications link. The gateway computer may be implemented utilizing an IBM eServer™ System z server available from International Business Machines Corporation.

Referring concurrently to FIG. 16 and FIG. 17, software programming code which may embody the present invention may be accessed by the processor 5026 of the system 5020 from long-term storage media 5027, such as a CD-ROM drive or hard drive. The software programming code may be embodied on any of a variety of known media for use with a data processing system, such as a floppy disk, a hard drive, or a CD-ROM. The code may be distributed on such media, or may be distributed from the memory or storage of one computer system over a network to users 5050, 5051 of other computer systems for use by users of such other systems.

Alternatively, the programming code may be embodied in the memory 5025 and accessed by the processor 5026 using a processor bus. Such programming code includes an operating system which controls the function and interaction of the various computer components and one or more application programs 5032. Program code is typically paged from the storage medium 5027 to high-speed memory 5025 where it is available for processing by the processor 5026. Techniques and methods for embodying software programming code in memory, on physical media, and/or distributing software code via networks are well known and will not be discussed further herein. Program code, when created and stored on tangible media, including but not limited to electronic memory modules (RAM), flash memory, Compact Discs (CDs), DVDs, tapes, etc., is commonly referred to as a "computer program product". The computer program product medium is typically readable by a processing circuit, preferably located in a computer system, for execution by the processing circuit.

The cache most readily used by the processor (which is typically faster and smaller than the other caches of the processor) is the lowest level (L1 or level 1) cache, and main storage (main memory) is the highest level cache (L3 if there are three levels). The lowest level cache is often divided into an instruction cache (I-cache) that holds the machine instructions to be executed, and a data cache (D-cache) that holds the data operands.

Referring to FIG. 18, an exemplary processor embodiment is shown for the processor 5026. Typically, one or more levels of cache 5053 are used to buffer memory blocks in order to improve processor performance. The cache 5053 is a high-speed buffer that holds cache lines of memory data that are likely to be used. Typical cache lines are 64, 128 or 256 bytes of memory data. Separate caches are often used for caching instructions and for caching data. Cache coherency (synchronization of copies of lines in memory and in the caches) is typically provided by various "snoop" algorithms well known in the art. The main memory 5025 of a processor system is often itself referred to as a cache. In a processor system having 4 levels of cache 5053, main memory 5025 is sometimes referred to as the level 5 (L5) cache, because it is typically faster than, and holds only a portion of, the non-volatile storage (DASD, tape, etc.) that is available to the computer system. Main memory 5025 "caches" pages of data paged in and out of main memory 5025 by the operating system.
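How an address maps onto a cache line can be sketched with simple mask arithmetic. The function names are hypothetical, and the only line sizes accepted are the typical ones listed above (64, 128 or 256 bytes).

```python
def split_address(address: int, line_size: int = 256) -> tuple:
    """Split an address into (line start address, offset within the line)."""
    if line_size not in (64, 128, 256):
        raise ValueError("line_size must be one of the typical sizes 64, 128, 256")
    offset = address & (line_size - 1)  # valid because line_size is a power of two
    return address - offset, offset


def same_cache_line(a: int, b: int, line_size: int = 256) -> bool:
    """True when both addresses fall within the same cache line."""
    return split_address(a, line_size)[0] == split_address(b, line_size)[0]
```

Two accesses that share a line start address are served by one buffered line, which is why nearby data benefits from a single line fill.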

Program counter (instruction counter) 5061 keeps track of the address of the current instruction to be executed. The program counter in a z/Architecture processor is 64 bits and may be truncated to 31 or 24 bits to support prior addressing limits. The program counter is typically embodied in the computer's PSW (program status word) so that it persists during context switching. Thus, an in-progress program having a program counter value may be interrupted by, for example, the operating system (a context switch from the program environment to the operating system environment). When the program is not active, the PSW of the program maintains the program counter value, and while the operating system executes, the program counter (in the PSW) of the operating system is used. Typically, the program counter is incremented by an amount equal to the number of bytes of the current instruction. RISC (reduced instruction set computing) instructions are typically of fixed length, while CISC (complex instruction set computing) instructions are typically of variable length. Instructions of the IBM z/Architecture are CISC instructions having a length of 2, 4 or 6 bytes. Program counter 5061 is modified by, for example, a context switch operation or a branch taken operation of a branch instruction. In a context switch operation, the current program counter value is saved in the program status word along with other state information about the program being executed (such as condition codes), and a new program counter value is loaded that points to an instruction of the new program module to be executed. A branch taken operation is performed to permit the program to make decisions or to loop within the program by loading the result of the branch instruction into the program counter 5061.
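
The sequential update of the program counter described above can be illustrated by a minimal sketch. The sketch assumes the z/Architecture convention, not stated in this description, that the two leftmost bits of the first instruction byte encode the instruction length; the names `instruction_length` and `next_sequential_address` are hypothetical.

```python
# Illustrative sketch only. Assumption (not from this description): the two
# leftmost bits of the first opcode byte select a length of 2, 4 or 6 bytes.

def instruction_length(first_byte: int) -> int:
    """Return the instruction length in bytes from the first opcode byte."""
    top_bits = (first_byte >> 6) & 0b11
    if top_bits == 0b00:
        return 2
    if top_bits == 0b11:
        return 6
    return 4  # 0b01 and 0b10


# Masks that truncate the 64-bit program counter to 31 or 24 bits.
ADDRESS_MASKS = {24: (1 << 24) - 1, 31: (1 << 31) - 1, 64: (1 << 64) - 1}


def next_sequential_address(pc: int, first_byte: int, addressing_mode: int = 64) -> int:
    """Advance the program counter by the length of the current instruction."""
    return (pc + instruction_length(first_byte)) & ADDRESS_MASKS[addressing_mode]
```

A branch taken operation or a context switch would replace the program counter value outright instead of calling `next_sequential_address`.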

Typically, instructions are fetched on behalf of the processor 5026 using an instruction fetch unit 5055. The fetch unit fetches either the "next sequential instructions," the target instructions of a branch taken instruction, or the first instructions of a program following a context switch. Modern instruction fetch units often employ prefetch techniques to speculatively prefetch instructions based on the likelihood that the prefetched instructions will be used. For example, the fetch unit may fetch 16 bytes of instructions, including the next sequential instruction and additional bytes of further sequential instructions.

The fetched instructions are then executed by the processor 5026. In one embodiment, the fetched instructions are passed to the dispatch unit 5056 of the fetch unit. The dispatch unit decodes the instructions and forwards information about the decoded instructions to the appropriate units 5057, 5058, 5060. The execution unit 5057 will typically receive information from the instruction fetch unit 5055 regarding decoded arithmetic instructions, and will perform arithmetic operations on operands according to the opcode of the instruction. Operands are preferably provided to the execution unit 5057 from storage 5025, architected registers 5059, or an immediate field of the instruction being executed. The results of the execution, when stored, are stored in storage 5025, registers 5059, or other machine hardware (such as control registers, PSW registers, etc.).

The processor 5026 typically has one or more units 5057, 5058, 5060 for executing the function of an instruction. Referring to FIG. 19A, an execution unit 5057 may communicate with architected general registers 5059, a decode/dispatch unit 5056, a load/store unit 5060, and other processor units 5065 via interface logic 5071. The execution unit 5057 may use several register circuits 5067, 5068, 5069 to hold information that the Arithmetic Logic Unit (ALU) 5066 is to operate on. The ALU performs arithmetic operations such as add, subtract, multiply, and divide, as well as logical functions such as AND, OR, exclusive OR (XOR), rotate, and shift. Preferably, the ALU supports specialized operations that are design dependent. Other circuitry may provide other architected facilities 5072, including condition codes and recovery support logic, for example. Typically, the result of an ALU operation is held in an output register circuit 5070, which may forward the result to a variety of other processing functions. There are many arrangements of processor units, and this description is intended only to provide a representative understanding of one embodiment.

For example, an ADD instruction would be executed in an execution unit 5057 having arithmetic and logical functionality, while a floating point instruction would be executed in a floating point execution unit having specialized floating point capability. Preferably, the execution unit operates on the operands identified by the instruction by performing the function defined by the opcode on the operands. For example, an ADD instruction may be executed by the execution unit 5057 on operands found in two registers 5059 identified by register fields of the instruction.

The execution unit 5057 performs arithmetic addition on two operands and stores the result in a third operand, which may be a third register or one of the two source registers. The execution unit preferably utilizes an Arithmetic Logic Unit (ALU) 5066, which can perform a variety of logical functions, such as shift, rotate, AND, OR, and XOR, as well as a variety of algebraic functions, including add, subtract, multiply, and divide. Some ALUs 5066 are designed for scalar operations, and some for floating point. Depending on the architecture, data may be big endian (where the least significant byte is at the highest byte address) or little endian (where the least significant byte is at the lowest byte address). The IBM z/Architecture is big endian. Depending on the architecture, signed fields may be sign and magnitude, 1's complement, or 2's complement. A 2's complement number is advantageous in that the ALU does not need a separate subtract capability, since a negative or a positive value in 2's complement requires only addition within the ALU. Numbers are commonly described in shorthand; for example, a 12-bit field defines an address within a 4,096-byte block and is commonly described as a 4 Kbyte (kilobyte) block.
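
As a hedged illustration of the number representations discussed above, the following sketch shows subtraction carried out using only addition on 2's complement values, together with the two byte orders. The 32-bit width and all function names are assumptions chosen for the example and are not taken from the embodiment.

```python
# Illustrative sketch only; WIDTH and the helper names are hypothetical.
WIDTH = 32
MASK = (1 << WIDTH) - 1


def negate(value: int) -> int:
    """2's complement negation: invert every bit, then add one."""
    return ((value ^ MASK) + 1) & MASK


def subtract_via_add(a: int, b: int) -> int:
    """Compute a - b using only the adder, by adding the 2's complement of b."""
    return (a + negate(b)) & MASK


def to_signed(value: int) -> int:
    """Interpret a WIDTH-bit pattern as a signed 2's complement number."""
    return value - (1 << WIDTH) if value & (1 << (WIDTH - 1)) else value


# Byte order: the same 32-bit value laid out in storage both ways.
value = 0x0A0B0C0D
big_endian = value.to_bytes(4, "big")        # least significant byte at highest address
little_endian = value.to_bytes(4, "little")  # least significant byte at lowest address

# Shorthand for sizes: a 12-bit field addresses a 4 Kbyte block.
BLOCK_SIZE = 1 << 12
```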

Referring to FIG. 19B, branch instruction information for executing a branch instruction is typically sent to a branch unit 5058, which often employs a branch prediction algorithm, such as a branch history table 5082, to predict the outcome of the branch before other conditional operations are complete. The target of the current branch instruction will be fetched and speculatively executed before the conditional operations are complete. When the conditional operations are completed, the speculatively executed branch instructions are either completed or discarded based on the conditions of the conditional operation and the speculated outcome. A typical branch instruction may test condition codes and branch to a target address if the condition codes meet the branch requirement of the branch instruction. The target address may be calculated based on several numbers including, for example, ones found in register fields or an immediate field of the instruction. The branch unit 5058 may employ an ALU 5074 having a plurality of input register circuits 5075, 5076, 5077 and an output register circuit 5080. The branch unit 5058 may communicate with, for example, general registers 5059, decode/dispatch unit 5056, or other circuitry 5073.

Execution of a group of instructions may be interrupted for a variety of reasons including, for example, a context switch initiated by the operating system, a program exception or error causing a context switch, an I/O interrupt signal causing a context switch, or multi-threading activity of a plurality of programs (in a multi-threaded environment). Preferably, a context switch action saves state information about the currently executing program and then loads state information about another program being invoked. The state information may be stored, for example, in hardware registers or in memory. The state information preferably comprises a program counter value pointing to the next instruction to be executed, condition codes, memory translation information, and architected register content. A context switch activity may be exercised by hardware circuits, application programs, operating system programs, or firmware code (microcode, pico-code, or Licensed Internal Code (LIC)), alone or in combination.

The processor accesses operands according to instruction-defined methods. An instruction may provide an immediate operand using the value of a portion of the instruction, or may provide one or more register fields that explicitly point to either general purpose registers or special purpose registers (e.g., floating point registers). The instruction may utilize implied registers identified by the opcode field as operands. The instruction may utilize memory locations for operands. A memory location of an operand may be provided by a register, an immediate field, or a combination of registers and an immediate field, as exemplified by the z/Architecture long displacement facility, wherein the instruction defines a base register, an index register, and an immediate field (displacement field) that are added together to provide, for example, the address of the operand in memory. Location herein typically implies a location in main memory (main storage) unless otherwise indicated.
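
The base-plus-index-plus-displacement address formation described above may be sketched as follows. The sketch rests on assumptions that are not part of this description: a signed 20-bit displacement for the long displacement facility, a 64-bit address wrap, and the convention that a base or index field of zero contributes no value. The names `sign_extend` and `effective_address` are hypothetical.

```python
# Illustrative sketch only; field widths and the register-0 convention are
# assumptions, not statements of the embodiment.
ADDRESS_MASK = (1 << 64) - 1


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a signed 2's complement field."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def effective_address(registers, base, index, displacement, displacement_bits=20):
    """Add base register, index register and displacement to form an operand address."""
    base_value = registers[base] if base != 0 else 0     # field 0: no base (assumed)
    index_value = registers[index] if index != 0 else 0  # field 0: no index (assumed)
    offset = sign_extend(displacement, displacement_bits)
    return (base_value + index_value + offset) & ADDRESS_MASK
```

For instance, with register 1 holding 0x1000 and register 2 holding 0x20, a displacement of 0x8 yields the operand address 0x1028.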

Referring to FIG. 19C, the processor accesses memory using the load/store unit 5060. The load/store unit 5060 may perform a load operation by obtaining the address of the target operand in memory 5053 and loading the operand into a register 5059 or another memory 5053 location, or may perform a store operation by obtaining the address of the target operand in memory 5053 and storing data obtained from a register 5059 or another memory 5053 location in the target operand location in memory 5053. The load/store unit 5060 may be speculative and may access memory in a sequence that is out of order relative to the instruction sequence; however, the load/store unit 5060 maintains the appearance to programs that instructions were executed in order. The load/store unit 5060 may communicate with general registers 5059, decode/dispatch unit 5056, cache/memory interface 5053, or other elements 5083, and comprises various register circuits, ALUs 5085, and control logic 5090 to calculate storage addresses and to provide pipeline sequencing to keep operations in order. Some operations may be out of order, but the load/store unit provides functionality to make the out-of-order operations appear to the program as having been performed in order, as is well known in the art.

Preferably, the addresses that an application program "sees" are often referred to as virtual addresses. Virtual addresses are sometimes referred to as "logical addresses" and "effective addresses". These virtual addresses are virtual in that they are redirected to a physical memory location by one of a variety of Dynamic Address Translation (DAT) techniques including, but not limited to, simply prefixing a virtual address with an offset value, or translating the virtual address via one or more translation tables, the translation tables preferably comprising at least a segment table and a page table, alone or in combination, with the segment table preferably having an entry pointing to the page table. In the z/Architecture, a hierarchy of translation is provided that includes a region first table, a region second table, a region third table, a segment table, and an optional page table. The performance of address translation is often improved by utilizing a Translation Lookaside Buffer (TLB), which comprises entries mapping a virtual address to an associated physical memory location. The entries are created when DAT translates a virtual address using the translation tables. Subsequent use of the virtual address can then utilize the entry of the fast TLB rather than the slow sequential translation table accesses. TLB content may be managed by a variety of replacement algorithms, including LRU (least recently used).
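
The interplay between the translation tables and the TLB described above can be sketched as a small cache with LRU replacement. This is a simplified model under stated assumptions: a single-level page table stands in for the multi-level translation hierarchy, the page size is 4 Kbytes, and the class name `TLB` and its fields are hypothetical.

```python
# Illustrative sketch only; a flat dictionary replaces the real table hierarchy.
from collections import OrderedDict

PAGE_SHIFT = 12                      # assumed 4 Kbyte pages
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


class TLB:
    def __init__(self, capacity, page_table):
        self.capacity = capacity
        self.page_table = page_table  # hypothetical: virtual page -> physical frame
        self.entries = OrderedDict()  # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def translate(self, virtual_address):
        page = virtual_address >> PAGE_SHIFT
        offset = virtual_address & OFFSET_MASK
        if page in self.entries:
            self.hits += 1
            self.entries.move_to_end(page)      # mark as most recently used
        else:
            self.misses += 1
            # Slow path: consult the translation table, then create a TLB entry.
            self.entries[page] = self.page_table[page]
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict the least recently used entry
        return (self.entries[page] << PAGE_SHIFT) | offset
```

A second access to the same page is served from `entries` without touching `page_table`, which is the effect the TLB is intended to achieve.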

Where the processors are processors of a multi-processor system, each processor has the responsibility of maintaining shared resources, such as I/O, caches, TLBs, and memory, which are interlocked to achieve coherency. Typically, "snooping" techniques will be used to maintain cache coherency. In a snooping environment, each cache line may be marked as being in one of a shared state, an exclusive state, a changed state, an invalid state, etc., to facilitate sharing.

The I/O unit 5054 (FIG. 18) provides the processor with means for attaching to peripheral devices including, for example, tapes, disks, printers, displays, and networks. I/O units are often presented to the computer program by software drivers. In mainframes, such as the IBM System z, channel adapters and open system adapters are I/O units of the mainframe that provide communication between the operating system and peripheral devices.

Moreover, other types of computing environments may benefit from one or more aspects of the present invention. By way of example, an environment may include an emulator (e.g., software or other emulation mechanisms), in which a particular architecture (including, for example, instruction execution, architectural functions such as address translation, and architectural registers) or a subset thereof is emulated (e.g., in a native computer system having a processor and memory). In such an environment, one or more emulation functions of the emulator can implement one or more aspects of the present invention, even though the computer executing the emulator may have a different architecture than the capabilities being emulated. As one example, in emulation mode, a particular instruction or operation being emulated is decoded, and the appropriate emulation function is established to implement the single instruction or operation.

In an emulation environment, a host computer includes, for example, memory to store instructions and data; an instruction fetch unit to fetch instructions from memory and, optionally, to provide local buffering of fetched instructions; an instruction decode unit to receive the fetched instructions and determine the type of instruction that has been fetched; and an instruction execution unit to execute the instructions. Execution may include loading data from memory into a register; storing data from a register back to memory; or performing some type of arithmetic or logical operation, as determined by the decode unit. In one example, each unit is implemented in software. For example, the operations performed by the units are implemented as one or more subroutines in emulator software.
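
By way of a hedged illustration, the fetch, decode, and execute units described above can each be modeled as software, with one subroutine per emulated operation. The three-instruction toy instruction set, the tuple encoding, and every name in the sketch are hypothetical and are not drawn from any real architecture or from the embodiment.

```python
# Illustrative sketch only; the instruction set below is invented for the example.

def op_load(state, reg, addr):
    """Load data from memory into a register."""
    state["regs"][reg] = state["memory"][addr]


def op_store(state, reg, addr):
    """Store data from a register back to memory."""
    state["memory"][addr] = state["regs"][reg]


def op_add(state, dest, src):
    """Arithmetic operation: add register src into register dest."""
    state["regs"][dest] += state["regs"][src]


# Decode table: maps an opcode to the subroutine that emulates it.
DISPATCH = {"LOAD": op_load, "STORE": op_store, "ADD": op_add}


def run(program, memory):
    state = {"regs": [0] * 4, "memory": memory, "pc": 0}
    while state["pc"] < len(program):
        instruction = program[state["pc"]]  # fetch
        state["pc"] += 1                    # advance the emulated program counter
        opcode, *operands = instruction     # decode
        DISPATCH[opcode](state, *operands)  # execute via the matching subroutine
    return state
```

Running the program `[("LOAD", 0, 0), ("LOAD", 1, 1), ("ADD", 0, 1), ("STORE", 0, 2)]` against a three-word memory leaves the sum of the first two words in the third.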

More particularly, in a mainframe, architected machine instructions are used by programmers (today, usually "C" programmers), often by way of a compiler application. These instructions, stored in the storage medium, may be executed natively in a z/Architecture server, or alternatively in machines executing other architectures. They can be emulated in existing and future IBM mainframe servers and on other machines (e.g., Power Systems servers and System x servers). They can be executed in machines running Linux on a wide variety of machines using hardware manufactured by AMD and others. Besides execution on that hardware under the z/Architecture, Linux can also be used on machines that use emulation provided by Hercules (see www.hercules-390.org/) or FSI (Fundamental Software, Inc) (see www.funsoft.com/), where execution is generally in an emulation mode. In emulation mode, emulation software is executed by a native processor to emulate the architecture of an emulated processor.

The native processor typically executes emulation software, comprising either firmware or a native operating system, to perform emulation of the emulated processor. The emulation software is responsible for fetching and executing instructions of the emulated processor architecture. The emulation software maintains an emulated program counter to keep track of instruction boundaries. The emulation software may fetch one or more emulated machine instructions at a time and convert the one or more emulated machine instructions into a corresponding group of native machine instructions for execution by the native processor. These converted instructions may be cached so that a faster conversion can be accomplished. The emulation software must maintain the architecture rules of the emulated processor architecture so as to ensure that operating systems and applications written for the emulated processor operate correctly. Furthermore, the emulation software must provide resources identified by the emulated processor architecture, including, but not limited to, control registers, general purpose registers, floating point registers, dynamic address translation functions including, for example, segment tables and page tables, interrupt mechanisms, context switch mechanisms, time of day (TOD) clocks, and architected interfaces to I/O subsystems, such that an operating system or an application program designed to run on the emulated processor can be run on the native processor having the emulation software.

The particular instruction being emulated is decoded, and a subroutine is called to perform the function of the individual instruction. An emulation software function emulating a function of an emulated processor is implemented, for example, in a "C" subroutine or driver, or by some other method of providing a driver for the specific hardware, as will be within the skill of those in the art after understanding the description of the preferred embodiment. Various software and hardware emulation patents including, but not limited to, U.S. Patent No. 5,551,013, entitled "Multiprocessor for Hardware Emulation", by Beausoleil et al.; U.S. Patent No. 6,009,261, entitled "Preprocessing of Stored Target Routines for Emulating Incompatible Instructions on a Target Processor", by Scalzi et al.; U.S. Patent No. 5,574,873, entitled "Decoding Guest Instruction to Directly Access Emulation Routines that Emulate the Guest Instructions", by Davidian et al.; U.S. Patent No. 6,308,255, entitled "Symmetrical Multiprocessing Bus and Chipset Used for Coprocessor Support Allowing Non-Native Code to Run in a System", by Gorishek et al.; U.S. Patent No. 6,463,582, entitled "Dynamic Optimizing Object Code Translator for Architecture Emulation and Dynamic Optimizing Object Code Translation Method", by Lethin et al.; U.S. Patent No. 5,790,825, entitled "Method for Emulating Guest Instructions on a Host Computer Through Dynamic Recompilation of Host Instructions", by Eric Traut; and many others, illustrate a variety of known ways to achieve emulation of an instruction format architected for a different machine on a target machine available to those skilled in the art.

In FIG. 20, an example of an emulated host computer system 5092 is provided that emulates a host computer system 5000' of a host architecture. In the emulated host computer system 5092, the host processor (CPU) 5091 is an emulated host processor (or virtual host processor) and comprises an emulation processor 5093 having a different native instruction set architecture than that of the processor 5091 of the host computer 5000'. The emulated host computer system 5092 has memory 5094 accessible to the emulation processor 5093. In the example embodiment, the memory 5094 is partitioned into a host computer memory 5096 portion and an emulation routines 5097 portion. The host computer memory 5096 is available to programs of the emulated host computer 5092 according to the host computer architecture. The emulation processor 5093 executes native instructions of an architected instruction set of an architecture other than that of the emulated processor 5091, the native instructions being obtained from the emulation routines memory 5097, and may access a host instruction for execution from a program in the host computer memory 5096 by employing one or more instructions obtained in a sequence and access/decode routine, which may decode the host instruction(s) accessed to determine a native instruction execution routine for emulating the function of the host instruction accessed. Other facilities that are defined for the host computer system 5000' architecture may be emulated by architected facilities routines, including such facilities as general purpose registers, control registers, dynamic address translation, I/O subsystem support, and processor cache. The emulation routines may also take advantage of functions available in the emulation processor 5093 (such as general registers and dynamic translation of virtual addresses) to improve the performance of the emulation routines. Special hardware and offload engines may also be provided to assist the processor 5093 in emulating the function of the host computer 5000'.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, and/or components.

The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below, if any, are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

Claims

1. A method of managing interrupt requests in a computing environment, comprising the steps of:

in response to executing a register interrupt operation of a Modify PCI Function Controls (MPFC) instruction, wherein the register interrupt operation specifies a function handle of an adapter, a location in memory of an adapter interrupt bit vector (AIBV) of the adapter, the AIBV being contained in an array of one or more AIBVs, and a location in memory of an adapter interrupt summary bit (AISB) in an array of AISBs:

receiving an interrupt request from the adapter; and

in response to receiving the request, setting an indicator in the AIBV indicating an event from the adapter and setting an AISB, the AISB indicating that the indicator is set in the AIBV, wherein setting the indicator in the AIBV comprises:

determining whether a vector number provided in a request is within a number of interrupts allowed by the adapter;

in response to determining that the vector number is within the allowed number of interrupts, setting an indicator in the AIBV according to the vector number, an AIBV offset, and an AIBV address.

2. The method of claim 1, wherein the method further comprises presenting an interrupt to an operating system, the interrupt responsive to the interrupt request.

3. The method of claim 2, wherein the interrupt request represents a plurality of message signaled interrupts, and the interrupt to the operating system is part of an input/output adapter event notification for the operating system.

4. The method of claim 2, wherein the method further comprises, in response to the presenting, obtaining one or more AIBV indications for one or more adapters that specify at least one cause of interruption for each adapter.

5. The method of claim 4, wherein the obtaining further comprises obtaining a plurality of AIBV indications that specify a plurality of reasons for interruption, the plurality of reasons for interruption corresponding to a plurality of interruption requests.

6. The method of claim 5, wherein the obtaining comprises using an AISB in obtaining one or more AIBV indications of one or more AIBVs.

7. The method of claim 1, wherein setting an indicator in the AIBV further comprises:

obtaining a device table entry using an indicator of a request from an adapter, the device table entry including a value specifying a number of interrupts allowed by the adapter;

in response to determining that the vector number is within the allowed number of interrupts, locating the start position of the AIBV using one or more parameters of the device table entry.

8. A system for managing interrupt requests in a computer environment, comprising the following means:

means for executing a register interrupt operation of a Modify PCI Function Controls (MPFC) instruction, wherein the register interrupt operation specifies a function handle of an adapter, a location in memory of an adapter interrupt bit vector (AIBV) of the adapter, the AIBV being contained in an array of one or more AIBVs, and a location in memory of an adapter interrupt summary bit (AISB) in an array of AISBs;

means for receiving an interrupt request from the adapter; and

means for setting, in response to the received request, an indicator in the AIBV indicating an event from the adapter and for setting an AISB indicating that the indicator is set in the AIBV, wherein the means for setting the indicator in the AIBV comprises:

means for determining whether a vector number provided in a request is within a number of interrupts allowed by the adapter;

means for setting an indicator in the AIBV according to the vector number, the AIBV offset, and the AIBV address in response to determining that the vector number is within the allowed number of interrupts.

9. The system of claim 8, wherein the system further comprises: means for presenting an interrupt to an operating system, the interrupt responsive to the interrupt request.

10. The system of claim 9, wherein the interrupt request represents a plurality of message signaled interrupts, and the interrupt to the operating system is part of an input/output adapter event notification for the operating system.

11. The system of claim 9, wherein the system further comprises means for obtaining, in response to the presenting, one or more AIBV indications for one or more adapters, the AIBV indications specifying at least one cause of interruption for each adapter.

12. The system of claim 11, wherein the means for obtaining further comprises means for obtaining a plurality of AIBV indications, the AIBV indications specifying a plurality of reasons for an interrupt, the plurality of reasons for the interrupt corresponding to a plurality of interrupt requests.

13. The system of claim 12, wherein the means for obtaining comprises means for using an AISB in obtaining one or more AIBV indications of one or more AIBVs.

14. The system of claim 8, wherein the means for setting an indicator in the AIBV further comprises:

means for obtaining a device table entry using an indicator of a request from an adapter, the device table entry including a value specifying a number of interrupts allowed by the adapter;

means for locating a start position of the AIBV using one or more parameters of the device table entry in response to determining that the vector number is within the allowed number of interrupts.