HK1201353B

HK1201353B - Method and system for loading data into a register

Info

Publication number: HK1201353B
Application number: HK15101820.7A
Authority: HK
Inventors: J.D. Bradbury; M.K. Gschwind; T. Slegel; E.M. Schwarz; C. Jacobi
Original assignee: International Business Machines Corporation
Priority date: 2012-03-15
Filing date: 2013-03-07
Publication date: 2018-04-13

Description

Method and system for loading data into register

Technical Field

Aspects of the present invention relate generally to data processing, and more particularly to loading data into registers.

Background

Data processing includes various types of processing, including loading data into registers. The data loaded into registers includes, but is not limited to, character data (such as character data strings), integer data, or any other type of data. The loaded data can then be used and/or manipulated.

Current instructions that perform various types of processing, including loading data into registers, tend to be inefficient.

Disclosure of Invention

The shortcomings of the prior art are overcome and advantages are provided through the provision of a computer program product for executing machine instructions. The computer program product includes a computer-readable storage medium readable by a processing circuit and storing instructions for execution by the processing circuit for performing a method. The method includes, for example: obtaining, by a processor, a machine instruction for execution, the machine instruction being defined for computer execution according to a computer architecture, the machine instruction comprising: at least one opcode field providing an opcode, the opcode identifying a load to block boundary operation; a register field used to specify a register, the register including a first operand; and at least one field used to locate a second operand in main memory; and executing the machine instruction, the executing comprising: loading the bytes of the first operand with only the corresponding bytes of the second operand that are within a block of main memory, the block being dynamically determined based on a specified type of block boundary and one or more characteristics of the processor.

Methods and systems relating to one or more aspects of the present invention are also described and claimed herein. Additionally, services relating to one or more aspects of the present invention are also described and may be claimed herein.

Additional features and advantages are realized through the techniques of the present invention. Other embodiments and aspects of the invention are described in detail herein and are considered a part of the claimed invention.

Drawings

One or more aspects of the present invention are particularly pointed out and distinctly claimed as examples in the claims at the conclusion of the specification. The foregoing and other objects, features, and advantages of the invention are apparent from the following detailed description taken in conjunction with the accompanying drawings in which:

FIG. 1 depicts an example of a computing environment incorporating and using one or more aspects of the present invention;

FIG. 2A depicts another example of a computing environment incorporating and using one or more aspects of the present invention;

FIG. 2B depicts further details of the memory of FIG. 2A, in accordance with an aspect of the present invention;

FIG. 3 depicts one embodiment of a format of a "vector load to block boundary" instruction, in accordance with an aspect of the present invention;

FIG. 4 depicts one embodiment of the logic associated with a "vector load to block boundary" instruction, in accordance with an aspect of the present invention;

FIG. 5 depicts one example of data to be loaded into a vector register, in accordance with an aspect of the present invention;

FIG. 6 depicts one example of a register file, in accordance with an aspect of the present invention;

FIG. 7 depicts an embodiment of a computer program product incorporating one or more aspects of the present invention;

FIG. 8 depicts one embodiment of a host computer system incorporating and using one or more aspects of the present invention;

FIG. 9 depicts another example of a computer system incorporating and using one or more aspects of the present invention;

FIG. 10 depicts another example of a computer system, including a computer network, incorporating and using one or more aspects of the present invention;

FIG. 11 depicts one embodiment of various elements of a computer system incorporating and using one or more aspects of the present invention;

FIG. 12A depicts one embodiment of an execution unit of the computer system of FIG. 11 incorporating and using one or more aspects of the present invention;

FIG. 12B depicts one embodiment of a branch unit of the computer system of FIG. 11 incorporating and using one or more aspects of the present invention;

FIG. 12C depicts one embodiment of a load/store unit of the computer system of FIG. 11 incorporating and using one or more aspects of the present invention; and

FIG. 13 depicts one embodiment of an emulated host computer system incorporating and using one or more aspects of the present invention.

Detailed Description

According to one aspect of the present invention, a capability is provided for facilitating the loading of data in a register. By way of example, the data includes character data, integer data, and/or other types of data. Additionally, the registers are vector registers or another type of register.

Character data includes, but is not limited to, alphabetic characters in any language; numeric digits; punctuation marks; and/or other symbols. The character data may or may not be a string of data. Various standards are associated with character data, examples of which include, but are not limited to: ASCII (American Standard Code for Information Interchange); and Unicode, including, but not limited to, UTF (Unicode Transformation Format) 8, UTF 16, and the like.

A vector register (also referred to as a vector) includes one or more elements, and each element is one, two, or four bytes in length, as examples. In addition, a vector operand is, for example, a SIMD (single instruction multiple data) operand having multiple elements. In other embodiments, the elements may be other sizes; and the vector operands need not be SIMD and/or may comprise one element.

In one example, a "vector load to block boundary" instruction is provided that loads a variable number of bytes of data from memory into a vector register while ensuring that a specified boundary of the memory from which the data is being loaded is not crossed. The boundary may be specified explicitly by the instruction (e.g., a variable value in the instruction text, a fixed instruction text value encoded in the opcode, a register-based boundary specified in the instruction, etc.), or the boundary may be determined dynamically by the machine. For example, the instruction specifies that data is to be loaded up to a page or cache boundary, and the machine determines the cache line or page size (e.g., by a lookup in a translation lookaside buffer to determine the page size) and loads up to that point.

As another example, this instruction is also used to align data accesses with selected boundaries.

In one embodiment, the instruction loads the bytes of the vector register (first operand) with only the corresponding bytes of the second operand within a block of main memory (also referred to as main storage) that is dynamically determined by the type of block boundary (e.g., cache line or page) and one or more characteristics of the processor executing the instruction (such as cache line size or page size). As used herein, a block of main memory is any block of memory having a specified size. The specified size is also referred to as the boundary of the block, which is the end of the block.

In another embodiment, other types of registers are loaded. That is, the register being loaded is not a vector register, but another type of register. In this context, the instruction is referred to as a "load to block boundary" instruction, which is used to load data into registers.

One embodiment of a computing environment to incorporate and use one or more aspects of the present invention is described with reference to FIG. 1. The computing environment 100 includes a processor 102 (e.g., a central processing unit), a memory 104 (e.g., a main memory), and/or a plurality of input/output (I/O) devices and/or interfaces 106 coupled to one another via, for example, one or more buses 108 and/or other connections.

In one example, the processor 102 is based on the z/Architecture supplied by International Business Machines Corporation and is part of a server, such as a System z server, which is also supplied by International Business Machines Corporation and implements the z/Architecture. One embodiment of the z/Architecture is described in the publication entitled "z/Architecture Principles of Operation" (Publication No. SA22-7832-08, Ninth Edition, August 2010). In one example, the processor executes an operating system, such as z/OS, also supplied by International Business Machines Corporation. IBM, z/Architecture, and z/OS are registered trademarks of International Business Machines Corporation (Armonk, New York, USA). Other names used herein may be registered trademarks, trademarks, or product names of International Business Machines Corporation or other companies.

In another embodiment, the processor 102 is based on the Power Architecture supplied by International Business Machines Corporation. One embodiment of the Power Architecture is described in "Power ISA™ Version 2.06 Revision B" (International Business Machines Corporation, July 23, 2010). Power Architecture is a registered trademark of International Business Machines Corporation.

In another embodiment, the processor 102 is based on the Intel architecture supplied by Intel Corporation. One embodiment of the Intel architecture is described in "Intel 64 and IA-32 Architectures Developer's Manual, Vol. 2B, Instructions Set Reference, A-L" (Order Number 253666-041US, December 2011) and "Intel 64 and IA-32 Architectures Developer's Manual, Vol. 2B, Instructions Set Reference, M-Z" (Order Number 253667-041US, December 2011). Intel is a registered trademark of Intel Corporation (Santa Clara, California).

Another embodiment of a computing environment to incorporate and use one or more embodiments of the present invention is described with reference to FIG. 2A. In this example, the computing environment 200 includes a local central processing unit 202, a memory 204, and one or more input/output devices and/or interfaces 206 coupled to one another via, for example, one or more buses 208 and/or other connections. By way of example, the computing environment 200 may include: a PowerPC processor, a pSeries server, or an xSeries server supplied by International Business Machines Corporation (Armonk, New York); an HP Superdome with Intel Itanium II processors supplied by Hewlett Packard Co. (Palo Alto, California); and/or other machines based on architectures supplied by International Business Machines Corporation, Hewlett Packard, Intel, Oracle, or others.

The local central processing unit 202 includes one or more local registers 210, such as one or more general purpose registers and/or one or more special purpose registers, used during processing within the environment. These registers contain information that represents the state of the environment at any particular point in time.

In addition, the local central processing unit 202 executes instructions and program code stored in memory 204. In one particular example, the central processing unit executes emulator code 212 stored in memory 204. This code enables a processing environment configured in one architecture to emulate another architecture. By way of example, the emulator code 212 allows machines based on architectures other than the z/Architecture, such as PowerPC processors, pSeries servers, xSeries servers, HP Superdome servers, or others, to emulate the z/Architecture and to execute software and instructions developed based on the z/Architecture.

Further details regarding the emulator code 212 are described with reference to FIG. 2B. Guest instructions 250 comprise software instructions (e.g., machine instructions) developed to be executed in an architecture different from that of the local CPU 202. For example, the guest instruction 250 may have been designed to execute on the z/Architecture processor 102, but instead, the guest instruction 250 is being emulated on the local CPU 202 (which may be, for example, an Intel Itanium II processor). In one example, the emulator code 212 includes an instruction fetch unit 252 to obtain one or more guest instructions 250 from the memory 204 and optionally provide local buffering for the obtained instructions. The emulator code 212 also includes an instruction translation routine 254 to determine the type of guest instruction that has been obtained and translate the guest instruction into one or more corresponding native instructions 256. This translation includes, for example, identifying a function to be performed by the guest instruction and selecting the native instruction(s) to perform the function.

Additionally, the emulator code 212 includes an emulation control routine 260 to cause the native instructions to be executed. The emulation control routine 260 may cause the local CPU 202 to execute a routine of native instructions that emulate one or more previously obtained guest instructions and, upon completion of such execution, return control to the instruction fetch routine to emulate the obtaining of the next guest instruction or group of guest instructions. Execution of the native instructions 256 may include loading data from the memory 204 into a register; storing data from a register back to the memory; or performing some type of arithmetic or logical operation (as determined by the translation routine).

Each routine is implemented, for example, in software that is stored in memory and executed by the local central processing unit 202. In other examples, one or more of the routines or operations are implemented in firmware, hardware, software, or some combination thereof. The registers of the emulated processor may be emulated using registers 210 of the local CPU or by using locations in memory 204. In embodiments, guest instructions 250, native instructions 256, and emulator code 212 may reside in the same memory or may be distributed among different memory devices.
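By way of a non-limiting illustration, the fetch, translate, and execute flow described above may be sketched as follows. This is a minimal sketch under stated assumptions: the three-operation guest instruction set (`LOAD`, `ADD`, `STORE`), the tuple encoding, and the names `translate` and `run` are hypothetical and are not taken from the emulator code 212.

```python
def translate(guest_instruction):
    """Instruction translation: map one guest instruction to native steps.

    The guest instruction set here is hypothetical and exists only to
    illustrate the flow; each native step is a Python callable.
    """
    opcode, *operands = guest_instruction
    if opcode == "LOAD":
        reg, addr = operands

        def native_load(state):
            state["regs"][reg] = state["data"][addr]   # memory -> register
        return [native_load]
    if opcode == "ADD":
        dst, src = operands

        def native_add(state):
            state["regs"][dst] += state["regs"][src]   # arithmetic operation
        return [native_add]
    if opcode == "STORE":
        reg, addr = operands

        def native_store(state):
            state["data"][addr] = state["regs"][reg]   # register -> memory
        return [native_store]
    raise ValueError(f"unknown guest opcode: {opcode!r}")


def run(guest_program, data):
    """Emulation control: fetch, translate, execute, then fetch the next one."""
    # Emulated registers are held in memory of the emulating machine.
    state = {"regs": [0, 0, 0, 0], "data": list(data)}
    pc = 0
    while pc < len(guest_program):
        guest = guest_program[pc]            # instruction fetch
        for native in translate(guest):      # instruction translation
            native(state)                    # execution of native instructions
        pc += 1                              # control returns to fetch
    return state


program = [("LOAD", 0, 0), ("LOAD", 1, 1), ("ADD", 0, 1), ("STORE", 0, 2)]
print(run(program, [2, 3, 0])["data"])
```

In this sketch the emulated registers are kept in a Python list, which corresponds to the option of emulating registers using locations in memory.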

As used herein, firmware includes, for example, the microcode, millicode, and/or macrocode of a processor. It includes, for example, the hardware-level instructions and/or data structures used in the implementation of higher-level machine code. In one embodiment, it includes, for example, proprietary code that is typically delivered as microcode that includes trusted software or microcode specific to the underlying hardware and that controls operating system access to the system hardware.

In one example, the guest instructions 250 that are obtained, translated, and executed are the instructions described herein. An instruction of one architecture (e.g., z/Architecture) is fetched from memory, translated, and represented as a sequence of native instructions 256 of another architecture (e.g., PowerPC, pSeries, xSeries, Intel, etc.). These native instructions are then executed.

In one embodiment, the instructions described herein are vector instructions that are part of a vector facility provided according to an aspect of the present invention. The vector facility provides, for example, fixed-size vectors ranging from one to sixteen elements. Each vector includes data that is operated on by the vector instructions defined in the facility. In one embodiment, if a vector consists of multiple elements, each element is processed in parallel with the other elements. Instruction completion does not occur until processing of all elements is complete.

As described herein, vector instructions may be implemented as part of various architectures including, but not limited to, z/Architecture, Power, Intel, and the like. Although the embodiments described herein are directed to z/Architecture, the vector instructions and one or more embodiments of the present invention may be based on many other architectures. The z/Architecture is merely an example.

In one embodiment in which the vector facility is implemented as part of the z/Architecture, to use the vector registers and instructions, a vector enablement control and a register control in a specified control register (e.g., control register 0) are set to, for example, one. If the vector facility is installed and a vector instruction is executed without the enablement controls set, a data exception is recognized. If the vector facility is not installed and a vector instruction is executed, an operation exception is recognized.

The vector data is presented in storage, for example, in the same left-to-right order as the other data formats. Bits of the data format numbered 0 to 7 constitute the byte in the leftmost (lowest numbered) byte position in the storage, bits 8 to 15 form the byte in the next sequential position, and so on. In another example, the vector data may appear in storage in another order (such as from right to left).

Many of the vector instructions provided with the vector facility have a field of specified bits. This field, referred to as the register extension bits or RXB, includes the most significant bit for each of the operands designated by a vector register. Bits for register designations not specified by the instruction are reserved and set to zero.

In one example, the RXB field includes four bits (e.g., bits 0-3), and the bits are defined as follows:

0 - Most significant bit for the first vector register designation of the instruction.

1 - Most significant bit for the second vector register designation of the instruction, if any.

2 - Most significant bit for the third vector register designation of the instruction, if any.

3 - Most significant bit for the fourth vector register designation of the instruction, if any.

Each bit is set to either a zero or a one depending on the register number, for example, by an assembler. For example, for registers 0-15, the bit is set to 0; for registers 16 to 31, the bit is set to 1, and so on.

In one embodiment, each RXB bit is an extension bit for a particular location in an instruction that includes one or more vector registers. For example, in one or more vector instructions, bit 0 of RXB is an extension bit for location 8-11, which is assigned to, for example, V₁; bit 1 of RXB is an extension bit for location 12-15, which is assigned to, for example, V₂; and so on.

In another embodiment, the RXB field includes additional bits, and more than one bit is used as an extension for each vector or position.

An instruction provided according to one aspect of the present invention that includes an RXB field is a "vector load to block boundary" instruction, an example of which is depicted in FIG. 3. In one example, the "vector load to block boundary" instruction 300 includes: opcode fields 302a (e.g., bits 0-7) and 302b (e.g., bits 40-47) indicating a "vector load to block boundary" operation; a vector register field 304 (e.g., bits 8-11) to specify a vector register (V₁); an index field (X₂) 306 (e.g., bits 12-15); a base field (B₂) 308 (e.g., bits 16-19); a displacement field (D₂) 310 (e.g., bits 20-31); a mask field (M₃) 312 (e.g., bits 32-35); and an RXB field 314 (e.g., bits 36-39). In one example, each of fields 304-314 is separate and independent from the opcode field(s). Additionally, in one embodiment, these fields are separate and independent of each other; however, in other embodiments, more than one field may be combined. Additional information regarding the use of these fields is described below.
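By way of a non-limiting illustration, the example field layout above may be decoded from a 48-bit instruction image as sketched below. The sketch assumes bit 0 is the leftmost (most significant) bit, consistent with the left-to-right bit numbering used herein. The function names and the sample opcode values 0xAA and 0x55 are arbitrary placeholders and do not represent an actual opcode assignment.

```python
INSTRUCTION_BITS = 48  # three halfwords


def field(instruction, first_bit, last_bit):
    """Extract bits first_bit..last_bit (inclusive), bit 0 being leftmost."""
    width = last_bit - first_bit + 1
    shift = (INSTRUCTION_BITS - 1) - last_bit
    return (instruction >> shift) & ((1 << width) - 1)


def decode(instruction):
    """Split a 48-bit instruction image into the example fields of FIG. 3."""
    return {
        "opcode_a": field(instruction, 0, 7),    # opcode field 302a
        "v1": field(instruction, 8, 11),         # vector register field 304
        "x2": field(instruction, 12, 15),        # index field 306
        "b2": field(instruction, 16, 19),        # base field 308
        "d2": field(instruction, 20, 31),        # displacement field 310
        "m3": field(instruction, 32, 35),        # mask field 312
        "rxb": field(instruction, 36, 39),       # RXB field 314
        "opcode_b": field(instruction, 40, 47),  # opcode field 302b
    }


# Placeholder image: opcodes 0xAA/0x55, V1=6, X2=1, B2=2, D2=0x123, M3=0, RXB=0b1000
sample = 0xAA6121230855
print(decode(sample))
```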

In one example, selected bits (e.g., the first two bits) of the opcode designated by opcode field 302a specify the length and format of the instruction. In this particular example, the length is three halfwords, and the format is a vector register-and-index-storage operation with an extended opcode field. The vector (V) field, along with its corresponding extension bit specified by RXB, designates a vector register. In particular, for vector registers, the register containing the operand is specified using, for example, a four-bit register field with the register extension bit (RXB) added as the most significant bit. For example, if the four-bit field is 0110 and the extension bit is 0, then the five-bit field 00110 indicates register number 6.
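By way of a non-limiting illustration, the register designation described above (a four-bit register field with the RXB extension bit as the most significant bit) may be sketched as follows. The function names are hypothetical; the sketch assumes 32 vector registers numbered 0 to 31.

```python
def rxb_bit_for(register_number):
    """Extension bit an assembler would set: 0 for registers 0-15, 1 for 16-31."""
    if not 0 <= register_number <= 31:
        raise ValueError("register number must be in the range 0..31")
    return register_number >> 4


def register_number(four_bit_field, rxb_bit):
    """Form the five-bit register number, the RXB bit being most significant."""
    if not 0 <= four_bit_field <= 0b1111:
        raise ValueError("register field must fit in four bits")
    if rxb_bit not in (0, 1):
        raise ValueError("extension bit must be 0 or 1")
    return (rxb_bit << 4) | four_bit_field


print(register_number(0b0110, 0))  # field 0110, extension bit 0 -> 6
print(register_number(0b0110, 1))  # field 0110, extension bit 1 -> 22
```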

The subscript number associated with a field of the instruction denotes the operand to which that field applies. For example, the subscript number 1 associated with V₁ denotes the first operand, and so on. The length of a register operand is one register, which is, for example, 128 bits.

In one example, in a vector register-and-index-storage operation instruction, the contents of the general registers designated by the X₂ and B₂ fields are added to the contents of the D₂ field to form the second operand address. In one example, the displacement D₂ for the "vector load to block boundary" instruction is treated as a 12-bit unsigned integer.
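By way of a non-limiting illustration, the formation of the second operand address may be sketched as follows. The sketch assumes sixteen 64-bit general registers held in a list and wraps the sum to 64 bits; both are assumptions made for illustration. Any special treatment of a register field value of zero is not described in the text above and is therefore not modeled.

```python
ADDRESS_MASK = (1 << 64) - 1  # assumed 64-bit addressing


def second_operand_address(general_registers, x2, b2, d2):
    """Add the registers designated by X2 and B2 to the 12-bit unsigned D2."""
    if not 0 <= d2 < (1 << 12):
        raise ValueError("displacement must be a 12-bit unsigned integer")
    total = general_registers[x2] + general_registers[b2] + d2
    return total & ADDRESS_MASK


gprs = [0] * 16
gprs[1] = 0x1000   # index register contents
gprs[2] = 0x20     # base register contents
print(hex(second_operand_address(gprs, 1, 2, 0x123)))  # 0x1143
```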

In one or more embodiments, the M₃ field is used to determine the boundary (hereinafter also referred to as the boundary size) in memory up to which data is loaded. For example, in one embodiment in which the boundary is specified by the instruction, the M₃ field specifies a code used to signal to the CPU the block boundary to load to. If a reserved value is specified, a specification exception is recognized. Example codes and corresponding values are as follows:

However, in another embodiment, in which the boundary is dynamically determined by the processor executing the instruction, the M₃ field includes, by way of example, an indication of a boundary type, such as a cache line or page boundary. The processor then determines the boundary size based on the type and one or more processor characteristics, such as the cache line or page size used by the processor. The processor may use a fixed value for the size or may dynamically determine the size. For example, if the M₃ field indicates that the type is a page boundary, the processor may look up the start address in, for example, a translation lookaside buffer to obtain the page size.

In another example, the M₃ field is not provided, and the type is indicated by another field of the instruction or by a control outside the instruction.

In execution of one embodiment of the "vector load to block boundary" (VLBB) instruction, proceeding in one embodiment from left to right, the first operand (specified in the register designated by the V₁ field plus its extension bit) is loaded, starting at the zero-indexed byte element, with bytes from the second operand. The second operand is a memory location designated by the second operand address (also referred to as the start address). The loading starts from that memory location and continues to an end address calculated by the instruction (or processor), as described below. If a boundary condition is encountered, how the rest of the first operand is treated is model-dependent. Access exceptions are not recognized for bytes that are not loaded. In one example, the bytes that are not loaded are unpredictable.

In the example instruction above, the start address is determined by the index register value (X₂) + the base register value (B₂) + the displacement (D₂); however, in other embodiments, the start address is provided by: a register value; an instruction address + an instruction-text-specified offset; a register value + a displacement; or a register value + an index register value (as just a few examples). Additionally, in one embodiment, the instruction does not include an RXB field. Instead, no extension is used, or the extension is provided in another manner, such as from a control external to the instruction or as part of another field of the instruction.

Further details of one embodiment of processing a "vector load to block boundary" instruction are described with reference to FIG. 4. In one example, a processor of the computing environment is executing this logic.

In one embodiment, initially, a start address is calculated that indicates the location in memory from which loading is to begin (step 400). As examples, the start address 402 may be provided by: a register value; an instruction address plus an instruction-text-specified offset; a register value plus a displacement; a register value plus an index register value; or a register value plus an index register value plus a displacement. In the instruction provided herein, the start address is provided by the X₂ field, the B₂ field, and the D₂ field. That is, the contents of the registers designated by X₂ and B₂ are added to the displacement indicated by D₂ to provide the start address. The above-indicated ways of calculating the start address are only examples; other examples are possible.

Thereafter, a determination is made as to whether the boundary is to be dynamically determined (INQUIRY 404). If not, then the value specified in the M₃ field is used as the boundary size (BdySize). Otherwise, the processor dynamically determines the boundary size (step 406). For example, the M₃ field specifies the type of boundary (e.g., cache line or page), and based on the type and one or more characteristics of the processor (e.g., the cache line size of the processor, the page size of the processor, etc.), the processor determines the boundary. As examples, based on the type, the processor uses a fixed size for the boundary (e.g., a predefined fixed cache line or page size of the processor), or, based on the type, the processor determines the boundary dynamically. For example, if the type is a page boundary, the processor looks up the start address in a translation lookaside buffer (TLB) and determines the page boundary therefrom. Other examples exist.
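By way of a non-limiting illustration, the decision between an instruction-specified and a dynamically determined boundary size may be sketched as follows. The names `BoundaryType`, `determine_boundary_size`, and the processor characteristics dictionary are hypothetical, and the sizes shown are illustrative values only. The fixed-size variant of dynamic determination is modeled; a TLB lookup is not.

```python
from enum import Enum


class BoundaryType(Enum):
    CACHE_LINE = "cache line"
    PAGE = "page"


def determine_boundary_size(dynamic, m3_value, processor):
    """Return BdySize from the M3 field or from processor characteristics."""
    if not dynamic:
        # Boundary specified by the instruction: M3 supplies the size.
        return m3_value
    # Boundary determined dynamically: M3 supplies only the boundary type.
    if m3_value is BoundaryType.CACHE_LINE:
        return processor["cache_line_size"]
    if m3_value is BoundaryType.PAGE:
        return processor["page_size"]
    raise ValueError("unsupported boundary type")


# Illustrative processor characteristics (assumed values).
processor = {"cache_line_size": 256, "page_size": 4096}
print(determine_boundary_size(False, 64, processor))
print(determine_boundary_size(True, BoundaryType.PAGE, processor))
```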

After the boundary size is determined, either dynamically or as specified by the instruction, a boundary mask (BdyMask) is created, which is used to determine closeness to the specified boundary (step 410). To create the mask, in one example, the two's complement negation of the boundary size (BdySize) 408 is taken, creating boundary mask 412 (e.g., BdyMask = 0 - BdySize).
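By way of a non-limiting illustration, the mask creation may be sketched as follows. The sketch assumes 64-bit addresses and a boundary size that is a power of two; the function names are hypothetical. Inverting the mask yields the offset of an address within its block, which is one way closeness to the boundary can be measured.

```python
ADDRESS_BITS = 64  # assumed address width
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1


def boundary_mask(bdy_size):
    """BdyMask = 0 - BdySize, taken as a 64-bit two's complement value."""
    if bdy_size <= 0 or bdy_size & (bdy_size - 1):
        raise ValueError("boundary size must be a positive power of two")
    return (0 - bdy_size) & ADDRESS_MASK


def offset_in_block(address, bdy_size):
    """Offset of the address within its block, using the inverted mask."""
    inverted = ~boundary_mask(bdy_size) & ADDRESS_MASK
    return address & inverted


print(hex(boundary_mask(64)))        # 0xffffffffffffffc0
print(offset_in_block(58, 64))       # 58
print(64 - offset_in_block(58, 64))  # 6 bytes remain before the boundary
```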

Next, an end address is calculated, which indicates where to stop loading (step 420). The inputs to this calculation are, for example, boundary size 408, start address 402, vector size 414 (e.g., in bytes; e.g., 16), and boundary mask 412. In one example, the end address 422 is calculated as follows:

Thereafter, starting at indexed byte 0, the first operand (i.e., the designated vector register) is loaded from memory starting at the start address and ending at the end address (step 430). This enables a variable number of bytes to be loaded from memory into the vector without crossing the designated memory boundary. For example, if the memory boundary is at 64 bytes and the start address is at 58 bytes, then the bytes from address 58 up to the boundary at address 64 are loaded into the vector register.
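By way of a non-limiting illustration, steps 420 and 430 may be sketched as follows. The end-address formula is not reproduced in the text above, so the sketch assumes one computation consistent with the listed inputs (boundary size, start address, vector size, and boundary mask): the lesser of the next block boundary and the start address plus the vector size, with the end address treated as exclusive. Bytes that are not loaded are described as unpredictable; zero-filling them here is an assumption made for illustration. The function names are hypothetical.

```python
VECTOR_SIZE = 16  # bytes in a 128-bit vector register


def end_address(start_address, bdy_size, vec_size=VECTOR_SIZE):
    """Assumed end address (exclusive): next boundary or start + vector size."""
    bdy_mask = -bdy_size                          # BdyMask = 0 - BdySize
    offset = start_address & ~bdy_mask            # position inside the block
    to_boundary = bdy_size - offset               # bytes left in the block
    return min(start_address + to_boundary, start_address + vec_size)


def load_to_block_boundary(memory, start_address, bdy_size):
    """Return (vector register image, number of bytes actually loaded)."""
    end = end_address(start_address, bdy_size)
    loaded = memory[start_address:end]
    # Unloaded bytes are unpredictable per the description; zero-filled here.
    padding = bytes(VECTOR_SIZE - len(loaded))
    return loaded + padding, len(loaded)


memory = bytes(range(128))
register, count = load_to_block_boundary(memory, 58, 64)
print(count)           # 6 bytes: addresses 58 through 63
print(register.hex())
```

With a 64-byte boundary and a start address of 0, the same sketch loads a full 16 bytes, since the boundary is farther away than the vector size.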

One example of data to be loaded into a vector register according to an embodiment of the invention is depicted in FIG. 5. As indicated, no data is loaded past the boundary indicated by the dashed vertical line. Locations past the boundary are not accessed, and no exception is taken. In one particular embodiment, the vector is loaded from left to right. However, in another embodiment, the vector may be loaded from right to left. In one embodiment, the direction of the vector (left to right or right to left) is provided at runtime. For example, the instruction accesses a register, status control, or other entity that indicates whether the direction of processing is left to right or right to left. In one embodiment, this direction control is not encoded as part of the instruction, but is provided to the instruction at runtime.

As described herein, the vector registers are loaded with bytes of data within a block from main storage. The boundary of a block is considered the end of the block. The start of the block may be calculated using the start address (StartAddress) and the boundary mask. For example, the beginning of the block is calculated from StartAddress AND BdyMask.
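By way of a non-limiting illustration, the start of the block may be computed as StartAddress AND BdyMask, as sketched below. The sketch assumes a power-of-two boundary size and relies on Python integers behaving as two's complement values under bitwise operations; the function name is hypothetical.

```python
def block_start(start_address, bdy_size):
    """Start of the block containing start_address: StartAddress AND BdyMask."""
    bdy_mask = -bdy_size               # BdyMask = 0 - BdySize
    return start_address & bdy_mask


start = block_start(200, 64)
print(start)        # 192: start of the 64-byte block containing address 200
print(start + 64)   # 256: boundary (end) of that block
```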

The above description is one example of a load instruction. When loading data (such as string data), it is often not known whether the string will end before a page boundary. Software that loads up to the boundary without crossing it typically needs to check for the end of the string first. Some implementations may also incur penalties for crossing boundaries, and software may want to avoid those situations. Thus, the ability to load up to any of a number of boundaries is useful. An instruction is provided that loads a variable number of bytes into a vector register while ensuring that data beyond a specified boundary is not loaded.

In one embodiment, there are 32 vector registers, and other types of registers may map to quadrants of the vector registers. For example, as shown in FIG. 6, if there is a register file 600 that includes 32 vector registers 602 and each register is 128 bits in length, 16 floating point registers 604 of 64 bits in length may overlap these vector registers. Thus, as an example, when floating point register 2 is modified, then vector register 2 is also modified. Other mappings for other types of registers are possible.
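By way of a non-limiting illustration, the overlap of floating point registers with vector registers may be sketched as follows. Which half of a vector register a floating point register overlays is not stated in the text above; the sketch assumes the leftmost 64 bits. The class and method names are hypothetical.

```python
VECTOR_BYTES = 16  # 128-bit vector registers
FPR_BYTES = 8      # 64-bit floating point registers


class RegisterFile:
    """32 vector registers; floating point registers 0-15 overlay them."""

    def __init__(self):
        self.vector_registers = [bytearray(VECTOR_BYTES) for _ in range(32)]

    def write_fpr(self, number, value):
        """Write a floating point register, modifying the overlaid vector register."""
        if not 0 <= number < 16:
            raise ValueError("floating point register number must be 0..15")
        if len(value) != FPR_BYTES:
            raise ValueError("floating point register value must be 8 bytes")
        # Assumption: the FPR occupies the leftmost half of the vector register.
        self.vector_registers[number][:FPR_BYTES] = value

    def read_vr(self, number):
        return bytes(self.vector_registers[number])


register_file = RegisterFile()
register_file.write_fpr(2, bytes([0xFF] * 8))
print(register_file.read_vr(2).hex())  # vector register 2 reflects the write
```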

Herein, memory, main memory, storage, and main storage are used interchangeably unless explicitly noted otherwise or by context.

Additional details regarding the vector facility (including examples of other instructions) are provided as part of this Detailed Description further below.

As will be appreciated by one skilled in the art, one or more embodiments of the present invention may be embodied as a system, method, or computer program product. Accordingly, one or more embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.), or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit," "module," or "system." Furthermore, one or more embodiments of the invention may take the form of a computer program product embodied in one or more computer-readable media having computer-readable program code embodied therein.

Any combination of one or more computer-readable media may be utilized. The computer readable medium may be a computer readable storage medium. For example, a computer readable storage medium may be, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc-read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

Referring now to FIG. 7, in one example, a computer program product 700 includes, for instance, one or more non-transitory computer-readable storage media 702 to store computer-readable program code means or logic 704 thereon to provide and facilitate one or more embodiments of the present invention.

Program code embodied on a computer readable medium may be transmitted using an appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for one or more aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language, assembler or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

One or more aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in these figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of one or more aspects of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In addition to the foregoing, one or more embodiments of the invention may also be provided, provisioned, deployed, managed, serviced, etc. by a service provider that offers management of customer environments. For example, a service provider can create, maintain, support, etc., computer code and/or computer infrastructure that performs one or more embodiments of the invention for one or more customers. In return, the service provider may collect payment from the customer under a subscription and/or fee agreement, as examples. Additionally or alternatively, the service provider may collect payment from the sale of advertising content to one or more third parties.

In one aspect of the invention, an application program may be deployed to perform one or more aspects of the invention. By way of example, deployment of an application includes providing a computer infrastructure operable to perform one or more aspects of the present invention.

As another aspect of the present invention, a computing infrastructure may be deployed comprising integrating computer-readable code into a computing system, wherein the code in combination with the computing system is capable of performing one or more aspects of the present invention.

As another aspect of the present invention, a process for integrating computing infrastructure may be provided that includes integrating computer readable code into a computer system. The computer system includes a computer-readable medium, wherein the computer medium includes one or more aspects of the present invention. The code in combination with the computer system is capable of performing one or more aspects of the present invention.

While various embodiments are described above, these are only examples. For example, computing environments of other architectures may incorporate and use one or more aspects of the present invention. Additionally, other sizes of registers may be used, and changes to the instructions may be made without departing from the spirit of the present invention.

In addition, other types of computing environments may benefit from one or more aspects of the present invention. As an example, a data processing system suitable for storing and/or executing program code is available that includes at least two processors coupled directly or indirectly to memory elements through a system bus. These memory elements include, for instance, local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.

Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, DASD, tapes, CDs, DVDs, drives and other storage media, etc.) can be coupled to the system either directly or through intervening I/O controllers. Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems and Ethernet cards are just a few of the available types of network adapters.

Referring to FIG. 8, representative components of a host computer system 5000 to implement one or more aspects of the present invention are depicted. The representative host computer 5000 comprises one or more CPUs 5001 in communication with computer memory (i.e., central storage) 5002, and I/O interfaces to storage media devices 5011 and networks 5010 for communicating with other computers or SANs and the like. The CPU 5001 conforms to an architecture having an architected instruction set and architected functionality. The CPU 5001 may have Dynamic Address Translation (DAT) 5003 for transforming program addresses (virtual addresses) into real addresses of memory. A DAT typically includes a Translation Lookaside Buffer (TLB) 5007 for caching translations so that later accesses to the same block of computer memory 5002 do not incur the delay of address translation. Typically, a cache 5009 is used between the computer memory 5002 and the processor 5001. The cache 5009 may be hierarchical, having a large cache available to more than one CPU and smaller, faster (lower-level) caches between the large cache and each CPU. In some implementations, the lower-level cache is partitioned to provide separate lower-level caches for instruction fetching and data accesses. In one embodiment, instructions are fetched from the memory 5002 by the instruction fetch unit 5004 via the cache 5009. Instructions are decoded in an instruction decode unit 5006 and (together with other instructions, in some embodiments) are dispatched to one or more instruction execution units 5008. Several execution units 5008 are typically used, such as an arithmetic execution unit, a floating point execution unit, and a branch instruction execution unit. The instructions are executed by the execution units, which access operands as needed from registers or memory specified by the instructions. If operands are to be accessed (loaded or stored) from memory 5002, the load/store unit 5005 typically handles the access under control of the instruction being executed. The instructions may be executed in hardware circuitry, or in internal microcode (firmware), or by a combination of both.
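The role of the TLB described above can be illustrated with a minimal sketch. The 4,096-byte page size, the `Tlb` class, and the dictionary standing in for the translation tables are assumptions made for illustration only; they are not taken from the described embodiment.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class Tlb:
    """Toy translation lookaside buffer caching virtual-page to real-frame mappings."""

    def __init__(self, page_table):
        self.page_table = page_table  # hypothetical stand-in for the translation tables
        self.entries = {}             # translations cached so far
        self.misses = 0               # number of slow-path translations performed

    def translate(self, virtual_address):
        page, offset = divmod(virtual_address, PAGE_SIZE)
        if page not in self.entries:
            # Slow path: consult the translation tables, then cache the result.
            self.misses += 1
            self.entries[page] = self.page_table[page]
        # Fast path: reuse the cached frame number for this page.
        return self.entries[page] * PAGE_SIZE + offset


tlb = Tlb({0: 7, 1: 3})
first = tlb.translate(0x1010)   # first access to page 1: full translation
second = tlb.translate(0x1020)  # same page again: served from the TLB
```

Only the first access to a page pays for the full translation; the second access to the same page reuses the cached entry.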

As noted, the computer system includes information in local (or main) storage, as well as addressing, protection, and reference and change recording. Some aspects of addressing include the format of addresses, the concept of address spaces, the various types of addresses, and the manner in which one type of address is translated to another type of address. Some of main storage includes permanently assigned storage locations. Main storage provides the system with directly addressable, fast-access storage of data. Both data and programs are loaded into main storage (from input devices) before they can be processed.

The main storage may include one or more smaller fast access buffers (sometimes referred to as caches). Cache memories are typically associated with a CPU or an I/O processor. The effects of physical construction and use of different storage media (in addition to the effects on performance) are not generally observable by programs.

Separate caches may be maintained for instructions and for data operands. Information within a cache is maintained in contiguous bytes on an integral boundary called a cache block or cache line (or simply a line). A model may provide an EXTRACT CACHE ATTRIBUTE instruction that returns the size of a cache line in bytes. A model may also provide PREFETCH DATA and PREFETCH DATA RELATIVE LONG instructions that effect the prefetching of storage into the data or instruction cache or the releasing of data from the cache.

Storage is viewed as a long horizontal string of bits. For most operations, accesses to storage proceed in a left-to-right sequence. The string of bits is subdivided into units of eight bits. An eight-bit unit is called a byte, which is the basic building block of all information formats. Each byte location in storage is identified by a unique non-negative integer, which is the address of that byte location or simply the byte address. Adjacent byte locations have consecutive addresses, starting with 0 on the left and proceeding in a left-to-right sequence. Addresses are unsigned binary integers and are 24, 31 or 64 bits.

Information is transferred between storage and the CPU or channel subsystem one byte, or one group of bytes, at a time. Unless otherwise specified, in z/Architecture, for example, a group of bytes in storage is addressed by the leftmost byte of the group. The number of bytes in the group is implicitly or explicitly specified by the operation to be performed. When used in a CPU operation, a group of bytes is called a field. Within each group of bytes, in z/Architecture, for example, bits are numbered in a left-to-right sequence. In z/Architecture, the leftmost bits are sometimes referred to as the "high order" bits, and the rightmost bits as the "low order" bits. Bit numbers are not storage addresses, however. Only bytes can be addressed. To operate on individual bits of a byte in storage, the entire byte is accessed. The bits in a byte are numbered 0 to 7 from left to right (in, for example, z/Architecture). The bits in an address may be numbered 8 to 31 or 40 to 63 for a 24-bit address, or 1 to 31 or 33 to 63 for a 31-bit address; for a 64-bit address, they are numbered 0 to 63. Within any other fixed-length format of multiple bytes, the bits making up the format are numbered consecutively starting from 0. For purposes of error detection, and preferably for correction, one or more check bits may be transmitted with each byte or with a group of bytes. Such check bits are generated automatically by the machine and cannot be directly controlled by the program. Storage capacities are expressed in number of bytes. When the length of a storage-operand field is implied by the operation code of an instruction, the field is said to have a fixed length, which can be one, two, four, eight, or sixteen bytes. Larger fields may be implied for some instructions. When the length of a storage-operand field is not implied but is stated explicitly, the field is said to have a variable length. Variable-length operands can vary in length by increments of one byte (or, for some instructions, in multiples of two bytes or other multiples). When information is placed in storage, the contents of only those byte locations included in the designated field are replaced, even though the width of the physical path to storage may be greater than the length of the field being stored.

Certain units of information must be on an integral boundary in storage. A boundary is called integral for a unit of information when its storage address is a multiple of the length of the unit in bytes. Special names are given to fields of 2, 4, 8, and 16 bytes on an integral boundary. A halfword is a group of two consecutive bytes on a two-byte boundary and is the basic building block of instructions. A word is a group of four consecutive bytes on a four-byte boundary. A doubleword is a group of eight consecutive bytes on an eight-byte boundary. A quadword is a group of 16 consecutive bytes on a 16-byte boundary. When storage addresses designate halfwords, words, doublewords, and quadwords, the binary representation of the address contains one, two, three, or four rightmost zero bits, respectively. Instructions must be on two-byte integral boundaries. The storage operands of most instructions have no boundary alignment requirements.
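The boundary rule above reduces to a single modulo test, sketched below. The function and table names are hypothetical and chosen only for illustration.

```python
# Named units from the text, with their lengths in bytes.
UNITS = {"halfword": 2, "word": 4, "doubleword": 8, "quadword": 16}


def is_on_integral_boundary(address, length):
    """True when the address is a multiple of the unit length in bytes."""
    return address % length == 0


def required_zero_bits(length):
    """Number of rightmost zero address bits implied by a power-of-two unit length."""
    return length.bit_length() - 1


def aligned_units(address):
    """Names of the units for which this address lies on an integral boundary."""
    return [name for name, length in UNITS.items()
            if is_on_integral_boundary(address, length)]
```

For example, address 0x1008 ends in three zero bits, so it is a valid halfword, word, and doubleword address but not a quadword address.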

On devices that implement separate caches for instructions and data operands, a significant delay may be experienced if the program stores into a cache line from which instructions are subsequently fetched, regardless of whether the store alters the instructions that are subsequently fetched.

In one embodiment, the present invention may be practiced by software (sometimes referred to as licensed internal code, firmware, micro-code, milli-code, pico-code and the like, any of which would be consistent with one or more aspects of the present invention). Referring to FIG. 8, software program code embodying one or more aspects of the present invention may be accessed by the processor 5001 of the host system 5000 from a long-term storage media device 5011 (such as a CD-ROM drive, tape drive or hard drive). The software program code may be embodied on any of a variety of known media for use with a data processing system, such as a magnetic diskette, hard drive, or CD-ROM. The code may be distributed on such media, or may be distributed to users from the computer memory 5002 or storage of one computer system to other computer systems via the network 5010, for use by users of such other systems.

The software program code includes an operating system which controls the function and interaction of the various computer components and one or more application programs. Program code is typically paged from storage media device 5011 to relatively high speed computer storage 5002 where it is available for processing by processor 5001. Techniques and methods for embodying software program code in memory, on physical media, and/or distributing software program code via networks are well known and will not be discussed further herein. When the program code is created and stored on tangible media, including, but not limited to, electronic memory modules (RAM), flash memory, Compact Discs (CD), DVDs, magnetic tapes, etc., the program code is often referred to as a "computer program product". The computer program product medium is generally readable by a processing circuit, preferably in a computer system, for execution by the processing circuit.

FIG. 9 illustrates a representative workstation or server hardware system in which one or more aspects of the present invention may be practiced. The system 5020 of fig. 9 comprises a representative base computer system 5021 (such as a personal computer, workstation or server), including optional peripheral devices. The base computer system 5021 comprises one or more processors 5026 and a bus to connect the processor 5026 with the other components of the system 5021 and to enable communication between the processor(s) 5026 and the other components of the system 5021 in accordance with known techniques. The bus connects the processor 5026 to memory 5025 and long-term storage 5027, the long-term storage 5027 may comprise, for example, a hard drive (including, for example, any of magnetic media, CD, DVD, and flash memory) or a tape drive. The system 5021 may also include a user interface adapter that connects the microprocessor 5026 via the bus to one or more interface devices, such as a keyboard 5024, a mouse 5023, a printer/scanner 5030, and/or other interface devices, which may be any user interface device such as a touch-sensitive screen, a digital keypad (entry pad), and the like. The bus also connects a display device 5022, such as an LCD screen or monitor, to the microprocessor 5026 via a display adapter.

The system 5021 may communicate with other computers or networks of computers by way of a network adapter capable of communicating (5028) with a network 5029. Example network adapters are communications channels, token ring, Ethernet or modems. Alternatively, the system 5021 may communicate using a wireless interface, such as a CDPD (cellular digital packet data) card. The system 5021 can be associated with such other computers in a Local Area Network (LAN) or a Wide Area Network (WAN), or the system 5021 can be a client in a client/server arrangement with another computer, or the like. All of these configurations, as well as appropriate communication hardware and software, are known in the art.

Fig. 10 illustrates a data processing network 5040 in which one or more aspects of the present invention may be practiced. Data processing network 5040 may include a plurality of separate networks (such as wireless networks and wired networks), each of which may include a plurality of separate workstations 5041, 5042, 5043, 5044. Additionally, one or more LANs may be included, where a LAN may include a plurality of intelligent workstations coupled to the host processor, as will be appreciated by those skilled in the art.

Still referring to FIG. 10, the network may also include mainframe computers or servers, such as a gateway computer (client server 5046) or application server (remote server 5048, which may access a data repository and may also be accessed directly from a workstation 5045). The gateway computer 5046 serves as a point of entry into each individual network. A gateway is needed when connecting one networking protocol to another. The gateway 5046 may be coupled to another network (e.g., the Internet 5047), preferably by means of a communications link. The gateway 5046 may also be directly coupled to one or more workstations 5041, 5042, 5043, 5044 using a communications link. The gateway computer may be implemented utilizing an IBM eServer™ System z server available from International Business Machines Corporation.

Referring concurrently to fig. 9 and 10, software program code which may embody one or more embodiments of the present invention may be accessed by the processor 5026 of the system 5020 from long-term storage media 5027, such as a CD-ROM drive or hard drive. The software program code may be embodied on any of a variety of known media for use with a data processing system such as a diskette, hard drive, or CD-ROM. Program code may be distributed on such media, or may be distributed to users 5050, 5051 from the memory or storage of one computer system to other computer systems via a network, for use by users of such other systems.

Alternatively, the program code may be embodied in the memory 5025 and accessed by the processor 5026 using a processor bus. This program code includes an operating system which controls the function and interaction of the various computer components and one or more application programs 5032. The program code is typically paged from the storage medium 5027 to high speed memory 5025 where it is available for processing by the processor 5026. Techniques and methods for embodying software program code in memory, on physical media, and/or distributing software program code via networks are well known and will not be discussed further herein. When the program code is created and stored on tangible media, including, but not limited to, electronic memory modules (RAM), flash memory, Compact Discs (CD), DVDs, magnetic tapes, etc., the program code is often referred to as a "computer program product". The computer program product medium is generally readable by a processing circuit, preferably in a computer system, for execution by the processing circuit.

The cache that is most readily available to the processor (typically faster and smaller than the other caches of the processor) is the lowest level (L1 or level one) cache, and the main storage (main memory) is the highest level cache (L3 if there are 3 levels). The lowest level cache is often divided into an instruction cache (I-cache) that holds the machine instructions to be executed and a data cache (D-cache) that holds the data operands.

Referring to FIG. 11, an exemplary processor embodiment is depicted for the processor 5026. Typically, one or more levels of cache 5053 are employed to buffer memory blocks in order to improve processor performance. The cache 5053 is a high-speed buffer holding cache lines of memory data that are likely to be used. Typical cache lines are 64, 128 or 256 bytes of memory data. Separate caches are often employed for caching instructions and for caching data. Cache coherence (synchronization of copies of lines in memory and the caches) is often provided by various "snoop" algorithms well known in the art. The main memory storage 5025 of a processor system is often referred to as a cache. In a processor system having 4 levels of cache 5053, the main storage 5025 is sometimes referred to as the level 5 (L5) cache because it is typically faster and only holds a portion of the non-volatile storage (DASD, tape, etc.) that is available to the computer system. The main storage 5025 "caches" pages of data that are paged in and out of the main storage 5025 by the operating system.
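Because cache lines sit on power-of-two boundaries, a byte address splits into a line base and an offset within the line by simple masking. The sketch below assumes a 256-byte line by default (one of the typical sizes listed above); the function names are hypothetical.

```python
def split_address(address, line_size=256):
    """Return (line_base, offset) for a power-of-two cache line size."""
    offset = address & (line_size - 1)      # low-order bits select the byte within the line
    line_base = address & ~(line_size - 1)  # remaining bits identify the line
    return line_base, offset


def bytes_to_line_end(address, line_size=256):
    """Number of bytes from the address up to the end of its cache line."""
    _, offset = split_address(address, line_size)
    return line_size - offset
```

For address 0x12345 and a 256-byte line, the line starts at 0x12300 and 187 bytes remain before the next line boundary. That remaining count is the kind of quantity an operation bounded by a block boundary has to respect.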

Program counter (instruction counter) 5061 keeps track of the address of the current instruction to be executed. A program counter in a z/Architecture processor is 64 bits and can be truncated to 31 or 24 bits to support prior addressing limits. A program counter is typically embodied in the PSW (program status word) of a computer such that it persists during context switching. Thus, a program in progress, having a program counter value, may be interrupted by, for example, the operating system (a context switch from the program environment to the operating system environment). The PSW of the program maintains the program counter value while the program is not active, and the program counter (in the PSW) of the operating system is used while the operating system is executing. Typically, the program counter is incremented by an amount equal to the number of bytes of the current instruction. RISC (Reduced Instruction Set Computing) instructions are typically of fixed length, while CISC (Complex Instruction Set Computing) instructions are typically of variable length. Instructions of the IBM z/Architecture are CISC instructions having a length of 2, 4, or 6 bytes. The program counter 5061 is modified by, for example, a context switch operation or a branch taken operation of a branch instruction. In a context switch operation, the current program counter value is saved in the program status word along with other state information about the program being executed (such as condition codes), and a new program counter value is loaded pointing to an instruction of a new program module to be executed. A branch taken operation is performed to permit the program to make decisions or loop within the program by loading the result of the branch instruction into the program counter 5061.
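The sequential update of the program counter can be sketched as follows. The mapping from the two leftmost bits of the first opcode byte to a length of 2, 4, or 6 bytes follows the z/Architecture instruction-length convention, which the text above does not spell out, so treat it as an assumption of this sketch. The function names are hypothetical.

```python
def instruction_length(first_opcode_byte):
    """Length in bytes, selected by the two leftmost bits of the first opcode byte."""
    top_two_bits = (first_opcode_byte >> 6) & 0b11
    return (2, 4, 4, 6)[top_two_bits]


def next_sequential_address(pc, first_opcode_byte, address_bits=64):
    """Advance the program counter by the instruction length, truncated to the addressing mode."""
    mask = (1 << address_bits) - 1
    return (pc + instruction_length(first_opcode_byte)) & mask
```

With `address_bits=24`, an address near the top of the 24-bit range wraps around to a low address, which mirrors the truncation to 31 or 24 bits described above.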

Typically, an instruction fetch unit 5055 is employed to fetch instructions on behalf of the processor 5026. The fetch unit fetches either the "next sequential instruction", the target instruction of a branch taken instruction, or the first instruction of a program following a context switch. Modern instruction fetch units often employ prefetch techniques to speculatively prefetch instructions based on the likelihood that the prefetched instructions might be used. For example, a fetch unit may fetch 16 bytes of instruction data that include the next sequential instruction and additional bytes of further sequential instructions.

The fetched instructions are then executed by the processor 5026. In one embodiment, the fetched instruction(s) are passed to a dispatch unit 5056 of the fetch unit. The dispatch unit decodes the instruction(s) and forwards information about the decoded instruction(s) to the appropriate units 5057, 5058, 5060. An execution unit 5057 will typically receive information about decoded arithmetic instructions from the instruction fetch unit 5055 and will perform arithmetic operations on operands according to the opcode of the instruction. Operands are provided to the execution unit 5057 preferably from storage 5025, from architected registers 5059, or from an immediate field of the instruction being executed. Results of the execution, when stored, are stored in storage 5025, registers 5059, or other machine hardware (such as control registers, PSW registers, and the like).

The processor 5026 typically has one or more units 5057, 5058, 5060 for executing the function of the instruction. Referring to FIG. 12A, an execution unit 5057 may communicate with architected general registers 5059, a decode/dispatch unit 5056, a load/store unit 5060, and other processor units 5065 by way of interfacing logic 5071. The execution unit 5057 may employ several register circuits 5067, 5068, 5069 to hold information that the Arithmetic Logic Unit (ALU) 5066 will operate on. The ALU performs arithmetic operations (such as add, subtract, multiply, and divide) as well as logical functions (such as AND, OR, and exclusive-OR (XOR), rotate, and shift). Preferably, the ALU supports specialized operations that are design dependent. Other circuits may provide other architected facilities 5072, including, for example, condition codes and recovery support logic. Typically, the result of an ALU operation is held in an output register circuit 5070, which can forward the result to a variety of other processing functions. There are many arrangements of processor units; the present description is intended only to provide a representative understanding of one embodiment.

An "add" instruction, for example, would be executed in an execution unit 5057 having arithmetic and logical functionality, while a floating point instruction, for example, would be executed in a floating point execution unit having specialized floating point capability. Preferably, an execution unit operates on operands identified by an instruction by performing an opcode-defined function on the operands. For example, an "add" instruction may be executed by the execution unit 5057 on operands found in two registers 5059 identified by register fields of the instruction.

The execution unit 5057 performs the arithmetic addition on two operands and stores the result in a third operand, where the third operand may be a third register or one of the two source registers. The execution unit preferably utilizes an Arithmetic Logic Unit (ALU) 5066 that is capable of performing a variety of logical functions, such as shift, rotate, AND, OR, and XOR, as well as a variety of algebraic functions, including any of add, subtract, multiply, and divide. Some ALUs 5066 are designed for scalar operations and some for floating point operations. Depending on the architecture, data may be Big Endian (where the least significant byte is at the highest byte address) or Little Endian (where the least significant byte is at the lowest byte address). The IBM z/Architecture is Big Endian. Depending on the architecture, signed fields may be sign and magnitude, 1's complement, or 2's complement. A 2's complement number is advantageous in that the ALU does not need to be designed with a subtract capability, since either a negative value or a positive value in 2's complement requires only an addition within the ALU. Numbers are commonly described in shorthand, where a 12-bit field defines an address of a 4,096-byte block and is commonly described as a 4 Kbyte (kilobyte) block, for example.
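The 2's complement and byte-order points above can be checked with a short sketch. The 32-bit width and the function names are assumptions made for illustration.

```python
WIDTH = 32                 # assumed register width for this sketch
MASK = (1 << WIDTH) - 1


def negate(value):
    """2's complement negation: invert all bits, add one, keep WIDTH bits."""
    return (~value + 1) & MASK


def subtract(a, b):
    """Subtraction performed using only an adder: a + (-b), modulo 2**WIDTH."""
    return (a + negate(b)) & MASK


def to_signed(value):
    """Interpret a WIDTH-bit pattern as a signed 2's complement integer."""
    return value - (1 << WIDTH) if value & (1 << (WIDTH - 1)) else value


# Byte order: the same 32-bit value laid out in storage both ways.
big_endian = (0x0A0B0C0D).to_bytes(4, "big")        # most significant byte at the lowest address
little_endian = (0x0A0B0C0D).to_bytes(4, "little")  # least significant byte at the lowest address

# A 12-bit field spans 2**12 = 4,096 distinct byte addresses (a 4 Kbyte block).
block_size = 1 << 12
```

Here `subtract(5, 7)` yields the bit pattern 0xFFFFFFFE, which `to_signed` reads back as -2 without any dedicated subtract circuit being modeled.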

Referring to FIG. 12B, branch instruction information for executing a branch instruction is typically sent to a branch unit 5058, which often employs a branch prediction algorithm (such as a branch history table 5082) to predict the outcome of the branch before other conditional operations are complete. The target of the current branch instruction is fetched and speculatively executed before the conditional operations are complete. When the conditional operations are completed, the speculatively executed branch instructions are either completed or discarded based on the conditions of the conditional operation and the speculated outcome. A typical branch instruction may test condition codes and branch to a target address if the condition codes meet the branch requirement of the branch instruction. The target address may be calculated based on several numbers, including, for example, ones found in register fields or in an immediate field of the instruction. The branch unit 5058 may employ an ALU 5074 having a plurality of input register circuits 5075, 5076, 5077 and an output register circuit 5080. The branch unit 5058 may communicate with, for example, general registers 5059, the decode/dispatch unit 5056, or other circuits 5073.
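The text names a branch history table but does not say how it predicts. The sketch below uses 2-bit saturating counters, a common textbook scheme that is substituted here purely for illustration and is not asserted to be the mechanism of branch history table 5082. The class name, table size, and indexing are likewise hypothetical.

```python
class BranchHistoryTable:
    """Toy predictor: one 2-bit saturating counter per table entry."""

    def __init__(self, entries=1024):
        self.entries = entries
        # Counter values 0-1 predict not taken; 2-3 predict taken.
        self.counters = [1] * entries

    def _index(self, branch_address):
        # Drop the low bit (instructions are halfword aligned), then fold into the table.
        return (branch_address >> 1) % self.entries

    def predict(self, branch_address):
        """True when the branch at this address is predicted taken."""
        return self.counters[self._index(branch_address)] >= 2

    def update(self, branch_address, taken):
        """Move the counter toward the resolved outcome, saturating at 0 and 3."""
        i = self._index(branch_address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

After two taken outcomes for the same branch, a single not-taken outcome does not flip the prediction; that hysteresis is the reason for using two bits instead of one.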

The execution of a group of instructions can be interrupted for a variety of reasons including, for example, a context switch initiated by the operating system, a program exception or error causing a context switch, an I/O interruption signal causing a context switch, or multi-threading activity of a plurality of programs (in a multi-threaded environment). Preferably, a context switch action saves state information about the currently executing program and then loads state information about another program being invoked. State information may be saved in hardware registers or in memory, for example. State information preferably comprises a program counter value pointing to the next instruction to be executed, condition codes, memory translation information, and architected register content. A context switch activity can be exercised by hardware circuits, application programs, operating system programs, or firmware code (microcode, pico-code, or licensed internal code (LIC)), alone or in combination.

The processor accesses operands according to methods defined by the instruction. An instruction may provide an immediate operand using the value of a portion of the instruction, or may provide one or more register fields that explicitly point to general purpose registers or special purpose registers (e.g., floating point registers). The instruction may use implied registers identified as operands by the opcode field. The instruction may use memory locations for operands. The memory location of an operand may be provided by a register, an immediate field, or a combination of registers and an immediate field, as exemplified by the z/Architecture long displacement facility, in which the instruction defines, for example, a base register, an index register, and an immediate field (displacement field) that are added together to provide the address of the operand in memory. Unless otherwise indicated, a location herein typically implies a location in main memory (main storage).
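By way of illustration only, the base-plus-index-plus-displacement address formation described above might be sketched as follows. The function name `effective_address` and the wrap-around to a 64-bit address width are assumptions made for this sketch and are not taken from the description.

```python
def effective_address(base: int, index: int, displacement: int,
                      addr_bits: int = 64) -> int:
    """Add base, index, and displacement to form an operand address.

    Wrapping the sum to `addr_bits` bits is an assumption of this sketch.
    """
    mask = (1 << addr_bits) - 1
    return (base + index + displacement) & mask


if __name__ == "__main__":
    # Hypothetical register contents and displacement.
    print(hex(effective_address(0x1000, 0x20, 0x4)))
```

The three inputs stand for the contents of the base register, the contents of the index register, and the displacement field of the instruction.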

Referring to FIG. 12C, a processor accesses storage using a load/store unit 5060. The load/store unit 5060 may perform a load operation by obtaining the address of the target operand in memory 5053 and loading the operand into a register 5059 or another memory 5053 location, or may perform a store operation by obtaining the address of the target operand in memory 5053 and storing data obtained from a register 5059 or another memory 5053 location in the target operand location in memory 5053. The load/store unit 5060 may be speculative and may access memory in a sequence that is out of order relative to the instruction sequence; however, the load/store unit 5060 maintains the appearance to programs that instructions were executed in order. The load/store unit 5060 may communicate with general registers 5059, decode/dispatch unit 5056, cache/memory interface 5053, or other elements 5083, and comprises various register circuits, ALUs 5085, and control logic 5090 to calculate storage addresses and to provide pipeline sequencing to keep operations in order. Some operations may be out of order, but the load/store unit provides functionality that makes the out-of-order operations appear to the program as having been performed in order, as is well known in the art.

The addresses that an application program "sees" are often referred to as virtual addresses. Virtual addresses are sometimes also called "logical addresses" or "effective addresses". They are virtual in that they are redirected to a physical memory location by one of a variety of Dynamic Address Translation (DAT) techniques including, but not limited to, simply prefixing a virtual address with an offset value, or translating the virtual address via one or more translation tables. The translation tables preferably comprise at least a segment table and a page table, alone or in combination, the segment table preferably having an entry pointing to the page table. In the z/Architecture, a hierarchy of translation is provided, including a region first table, a region second table, a region third table, a segment table, and an optional page table. The performance of address translation is often improved by utilizing a Translation Lookaside Buffer (TLB), which comprises entries mapping a virtual address to an associated physical memory location. The entries are created when DAT translates a virtual address using the translation tables. Subsequent use of the virtual address can then use the entry of the fast TLB rather than the slow sequential translation table accesses. TLB content may be managed by a variety of replacement algorithms, including LRU (Least Recently Used).
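As a non-limiting sketch, a TLB with LRU replacement in front of a single-level page table might look like the following. The class name `Tlb`, the 4096-byte page size, and the use of a plain dictionary as the page table are assumptions made for illustration; the multi-level translation hierarchy described above is not modeled.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size for this sketch


class Tlb:
    """Small TLB with least-recently-used replacement."""

    def __init__(self, capacity: int, page_table: dict):
        self.capacity = capacity
        self.page_table = page_table   # virtual page number -> physical frame number
        self.entries = OrderedDict()   # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr: int) -> int:
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)             # mark as most recently used
        else:
            self.misses += 1
            self.entries[vpn] = self.page_table[vpn]  # slow table access
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)      # evict least recently used
        return self.entries[vpn] * PAGE_SIZE + offset
```

A repeated translation of the same virtual page is then served from `entries` without consulting `page_table` again.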

In the case where the processor is a processor of a multi-processor system, each processor has the responsibility of keeping common resources such as I/O, cache, TLB, and memory interlocked to achieve coherency. Typically, a "snooping" technique will be utilized in maintaining cache coherency. In a snooping environment, each cache line may be marked as being in any one of the following states in order to facilitate sharing: shared state, exclusive state, changed state, invalid state, etc.

An I/O unit 5054 (FIG. 11) provides the processor with a means for attaching to peripheral devices including, for example, tape, disc, printers, displays, and networks. I/O units are often presented to the computer program by software drivers. In a mainframe computer, channel adapters and open system adapters are the I/O units that provide communication between the operating system and peripheral devices.

In addition, other types of computing environments may benefit from one or more aspects of the present invention. As an example, an environment may include an emulator (e.g., software or other emulation mechanisms) in which a particular architecture (including, for example, instruction execution, architected functions such as address translation, and architected registers) or a subset thereof is emulated (e.g., on a native computer system having a processor and memory). In such an environment, one or more emulation functions of the emulator can implement one or more aspects of the present invention, even though the computer executing the emulator may have a different architecture than the capabilities being emulated. As one example, in emulation mode, the specific instruction or operation being emulated is decoded, and an appropriate emulation function is built to implement the individual instruction or operation.

In an emulation environment, a host computer includes, for example: a memory to store instructions and data; an instruction fetch unit to fetch instructions from memory and, optionally, to provide local buffering for the fetched instructions; an instruction decode unit to receive the fetched instructions and to determine the type of instructions that have been fetched; and an instruction execution unit to execute the instructions. Execution may include loading data from memory into a register, storing data from a register back to memory, or performing some type of arithmetic or logical operation, as determined by the decode unit. In one example, each unit is implemented in software. For example, the operations performed by the units are implemented as one or more subroutines within emulator software.
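Purely as an illustrative sketch, the fetch, decode, and execute units described above might be written as subroutines of a toy emulator. The opcodes `LOAD`, `STORE`, and `ADD` and the tuple-based instruction encoding are hypothetical and do not correspond to any real architecture.

```python
def fetch(program, pc):
    """Instruction fetch unit: return the instruction at the program counter."""
    return program[pc]


def decode(instruction):
    """Instruction decode unit: split an instruction into opcode and operands."""
    opcode, *operands = instruction
    return opcode, operands


def execute(opcode, operands, regs, data):
    """Instruction execution unit: load, store, or arithmetic operation."""
    if opcode == "LOAD":        # memory -> register
        reg, addr = operands
        regs[reg] = data[addr]
    elif opcode == "STORE":     # register -> memory
        reg, addr = operands
        data[addr] = regs[reg]
    elif opcode == "ADD":       # register + register -> register
        r1, r2 = operands
        regs[r1] = regs[r1] + regs[r2]
    else:
        raise ValueError(f"unknown opcode {opcode!r}")


def run(program, regs, data):
    pc = 0                      # emulated program counter
    while pc < len(program):
        opcode, operands = decode(fetch(program, pc))
        execute(opcode, operands, regs, data)
        pc += 1
    return regs, data
```

Each unit is an ordinary subroutine here, mirroring the statement that the units may be implemented in software.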

More particularly, in a mainframe, architected machine instructions are used by programmers (today often "C" programmers), typically by way of a compiler application. These instructions, stored in the storage medium, may be executed natively in a z/Architecture server or, alternatively, in machines executing other architectures. They can be emulated in existing and future mainframe servers and on other machines (e.g., Power Systems servers and other servers). They can be executed in machines running Linux on a wide variety of hardware manufactured by AMD™ and others. Besides execution on that hardware under z/Architecture, Linux can be used, as can machines that use emulation by Hercules, UMX, or FSI (Fundamental Software, Inc), where execution is generally in an emulation mode. In emulation mode, emulation software is executed by a native processor to emulate the architecture of an emulated processor.

The native processor typically executes emulation software, comprising either firmware or a native operating system, to perform emulation of the emulated processor. The emulation software is responsible for fetching and executing instructions of the emulated processor architecture. The emulation software maintains an emulated program counter to keep track of instruction boundaries. The emulation software may fetch one or more emulated machine instructions at a time and convert them to a corresponding group of native machine instructions for execution by the native processor. These converted instructions may be cached so that a faster conversion can be accomplished. Notwithstanding, the emulation software must maintain the architecture rules of the emulated processor architecture so as to ensure that operating systems and applications written for the emulated processor operate correctly. Furthermore, the emulation software must provide the resources identified by the emulated processor architecture, including, but not limited to, control registers, general purpose registers, floating point registers, dynamic address translation functions including, for example, segment tables and page tables, interrupt mechanisms, context switch mechanisms, time of day (TOD) clocks, and architected interfaces to I/O subsystems, so that an operating system or an application program designed to run on the emulated processor can run on the native processor having the emulation software.

A specific instruction being emulated is decoded, and a subroutine is called to perform the function of the individual instruction. An emulation software function emulating a function of an emulated processor is implemented, for example, in a "C" subroutine or driver, or by some other method of providing a driver for the specific hardware, as will be within the skill of those in the art after understanding the description of the preferred embodiment. Various software and hardware emulation patents, including but not limited to the following, illustrate a variety of known ways to achieve emulation of an instruction format architected for a different machine on a target machine available to those skilled in the art: U.S. Pat. No. 5,551,013, entitled "Multiprocessor for Hardware Emulation", by Beausoleil et al.; U.S. Pat. No. 6,009,261, entitled "Preprocessing of Stored Target Routines for Emulating Incompatible Instructions on a Target Processor", by Scalzi et al.; U.S. Pat. No. 5,574,873, entitled "Decoding Guest Instruction to Directly Access Emulation Routines that Emulate the Guest Instructions", by Davidian et al.; U.S. Pat. No. 6,308,255, entitled "Symmetrical Multiprocessing Bus and Chipset Used for Coprocessor Support Allowing Non-Native Code to Run in a System", by Gorishek et al.; U.S. Pat. No. 6,463,582, entitled "Dynamic Optimizing Object Code Translator for Architecture Emulation and Dynamic Optimizing Object Code Translation Method", by Lethin et al.; U.S. Pat. No. 5,790,825, entitled "Method for Emulating Guest Instructions on a Host Computer Through Dynamic Recompilation of Host Instructions", by Eric Traut (each of the aforementioned patents is hereby incorporated by reference in its entirety); and many others.

In FIG. 13, an example of an emulated host computer system 5092 is provided that emulates a host computer system 5000' of a host architecture. In the emulated host computer system 5092, the host processor (CPU) 5091 is an emulated host processor (or virtual host processor) and comprises an emulation processor 5093 having a native instruction set architecture different from that of the processor 5091 of the host computer 5000'. The emulated host computer system 5092 has memory 5094 accessible to the emulation processor 5093. In the example embodiment, the memory 5094 is partitioned into a host computer memory 5096 portion and an emulation routines 5097 portion. The host computer memory 5096 is available to programs of the emulated host computer 5092 according to the host computer architecture. The emulation processor 5093 executes native instructions of an architected instruction set of an architecture other than that of the emulated processor 5091, the native instructions being obtained from the emulation routines memory 5097, and may access a host instruction for execution from a program in the host computer memory 5096 by employing one or more instructions obtained in a sequence and access/decode routine, which may decode the host instruction(s) accessed to determine a native instruction execution routine for emulating the function of the host instruction accessed. Other facilities defined for the host computer system 5000' architecture may be emulated by architected facilities routines, including such facilities as general purpose registers, control registers, dynamic address translation, I/O subsystem support, and processor cache, for example. The emulation routines may also take advantage of functions available in the emulation processor 5093 (such as general registers and dynamic translation of virtual addresses) to improve the performance of the emulation routines. Special hardware and off-load engines may also be provided to assist the processor 5093 in emulating the function of the host computer 5000'.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below, if any, are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of one or more aspects of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

Vector String Facility

Instructions

Unless otherwise specified, all operands are vector register operands. A "V" in the assembler syntax indicates a vector operand.

VECTOR FIND ANY EQUAL

Proceeding from left to right, every unsigned binary integer element of the second operand is compared for equality with each unsigned binary integer element of the third operand and, if the Zero Search flag is set in the M₅ field, optionally with zero.

If the Result Type (RT) flag in the M₅ field is zero, then for each element in the second operand that matches any element in the third operand, or optionally zero, the bit positions of the corresponding element in the first operand are set to ones; otherwise they are set to zeros.

If the Result Type (RT) flag in the M₅ field is one, then the byte index of the leftmost element in the second operand that matches an element in the third operand, or zero, is stored in byte seven of the first operand.

Each instruction has an extended mnemonic section that describes the recommended extended mnemonic and its corresponding machine assembler syntax.

Programming note: For all instructions that optionally set the condition code, performance may be degraded if the condition code is set.

If the Result Type (RT) flag in the M₅ field is one and no bytes are found to be equal, or equal to zero if the zero search flag is set, then an index equal to the number of bytes in the vector is stored in byte seven of the first operand.

The M₄ field specifies the element size control (ES). The ES control specifies the size of the elements in the vector register operands. If a reserved value is specified, a specification exception is recognized.

0 - Byte

1 - Halfword

2 - Word

3 to 15 - Reserved

The M₅ field has the following format:

The bits of the M₅ field are defined as follows:

Result Type (RT): if zero, each resulting element is a mask of all range comparisons on that element. If one, a byte index is stored into byte seven of the first operand, and zeros are stored in all other elements.

Zero Search (ZS): if one, then each element of the second operand is also compared to zero.

Condition code setting (CC): if zero, the condition code is not set and remains unchanged. If one, the condition code is set as specified in the following paragraph.
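The RT and ZS behavior described above can be sketched, under stated assumptions, as follows. The function names, the use of 16-byte `bytes` objects for vector registers, and big-endian element extraction are choices made for this illustration; condition code setting is not modeled.

```python
VECTOR_BYTES = 16


def split_elements(vec: bytes, es: int):
    """Split a 16-byte vector into unsigned elements of 1 << es bytes."""
    size = 1 << es
    return [int.from_bytes(vec[i:i + size], "big")
            for i in range(0, VECTOR_BYTES, size)]


def find_any_equal(v2: bytes, v3: bytes, es: int = 0,
                   rt: bool = False, zs: bool = False) -> bytes:
    """Sketch of the find-any-equal result for the first operand."""
    if es not in (0, 1, 2):
        raise ValueError("reserved ES value")    # stands in for the exception
    size = 1 << es
    targets = set(split_elements(v3, es))
    hits = [(x in targets) or (zs and x == 0)
            for x in split_elements(v2, es)]
    if not rt:
        # RT = 0: all ones in each matching element, zeros elsewhere.
        return b"".join((b"\xff" if hit else b"\x00") * size for hit in hits)
    # RT = 1: byte index of the leftmost match (or 16) in byte seven.
    index = next((i * size for i, hit in enumerate(hits) if hit), VECTOR_BYTES)
    result = bytearray(VECTOR_BYTES)
    result[7] = index
    return bytes(result)
```

For example, searching a string for any of a set of delimiter characters places the delimiters in the third operand and reads the index from byte seven of the result.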

Special conditions

A specification exception is recognized and no other action is taken if any of the following occurs:

1. The M₄ field contains a value from 3 to 15.

2. Bit 0 of the M₅ field is not zero.

The resulting condition code:

If the CC flag is zero, the code remains unchanged.

If the CC flag is one, the code is set as follows:

0 If the ZS bit is set, there were no matches in an element with a lower index than the zero element in the second operand.

1 some elements of the second operand match at least one element in the third operand.

2 all elements of the second operand match at least one element in the third operand.

3 no element in the second operand matches any element in the third operand.

Program exceptions:

Data with DXC FE, vector register

Operation if the vector extension facility is not installed

Specification (reserved ES value)

Transaction constraint

Extended mnemonics:

VECTOR FIND ELEMENT EQUAL

Proceeding from left to right, the unsigned binary integer elements of the second operand are compared with the corresponding unsigned binary integer elements of the third operand. If two elements are equal, the byte index of the first byte of the leftmost equal element is placed in byte seven of the first operand. Zeros are stored in the remaining bytes of the first operand. If no bytes are found to be equal, or no bytes are found to be zero if the zero compare is set, then an index equal to the number of bytes in the vector is stored in byte seven of the first operand. Zeros are stored in the remaining bytes.

If the Zero Search (ZS) bit is set in the M₅ field, then each element in the second operand is also compared for equality with zero. If a zero element is found in the second operand before any other elements of the second and third operands are found to be equal, the byte index of the first byte of the element found to be zero is stored in byte seven of the first operand, and zeros are stored in all other byte positions. If the Condition Code Set (CC) flag is one, then the condition code is set to zero.

0 - Byte

1 - Halfword

2 - Word

3 to 15 - Reserved

The M₅ field has the following format:

The bits of the M₅ field are defined as follows:

Reserved: bits 0 to 1 are reserved and must be zero. Otherwise, a specification exception is recognized.

Condition code setting (CC): if zero, the condition code remains unchanged. If one, the condition code is set as specified in the following paragraph.

Special conditions

1. The M₄ field contains a value from 3 to 15.

2. Bits 0 to 1 of the M₅ field are not zero.

The resulting condition code:

If bit 3 of the M₅ field is set to one, the code is set as follows:

0 If the zero compare bit is set, the comparison detected a zero element in the second operand in an element with a smaller index than any equal comparison.

1 The comparison detected a match between the second and third operands in some element. If the zero compare bit is set, this match occurred in an element with an index less than or equal to that of the zero compare element.

2 --

3 No elements compared equal.

If bit 3 of the M₅ field is zero, the code remains unchanged.

Program exceptions:

Data with DXC FE, vector register

Operation if the vector extension facility is not installed

Specification (reserved ES value)

Transaction constraints

Extended mnemonics:

Programming notes:

1. A byte index is always stored into the first operand for any element size. For example, if the element size is set to halfword and the halfword with index 2 compares equal, then a byte index of 4 is stored.

2. The third operand should not contain elements with a value of zero. If the third operand does contain a zero and it matches a zero element in the second operand before any other equal comparison, the condition code is set regardless of the zero compare bit setting.
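A minimal sketch of the element-wise equality search and its condition codes, under stated assumptions, is given below. The function name, the `bytes` representation of vector registers, and the choice to test equality before the zero search within the same element (one reading of condition code 1 above) are assumptions of this illustration.

```python
def find_element_equal(v2: bytes, v3: bytes, es: int = 0, zs: bool = False):
    """Return (first_operand, condition_code) for 16-byte inputs."""
    if es not in (0, 1, 2):
        raise ValueError("reserved ES value")   # stands in for the exception
    size = 1 << es
    index, cc = 16, 3                           # default: no elements compare equal
    for i in range(0, 16, size):
        x = int.from_bytes(v2[i:i + size], "big")
        y = int.from_bytes(v3[i:i + size], "big")
        if x == y:                              # equal comparison found
            index, cc = i, 1
            break
        if zs and x == 0:                       # zero found before any equal element
            index, cc = i, 0
            break
    result = bytearray(16)
    result[7] = index                           # a byte index, whatever the element size
    return bytes(result), cc
```

With halfword elements, a match in the halfword with index 2 yields a byte index of 4, in line with programming note 1.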

VECTOR FIND ELEMENT NOT EQUAL

Proceeding from left to right, the unsigned binary integer elements of the second operand are compared with the corresponding unsigned binary integer elements of the third operand. If two elements are not equal, the byte index of the leftmost unequal element is placed in byte seven of the first operand, and zeros are stored in all other bytes. If the Condition Code Set (CC) bit in the M₅ field is set to one, the condition code is set to indicate which operand is greater. If all elements are equal, a byte index equal to the vector size is placed in byte seven of the first operand, and zeros are placed in all other byte positions. If the CC bit is one, condition code three is set.

If the Zero Search (ZS) bit is set in the M₅ field, each element in the second operand is also compared for equality with zero. If a zero element is found in the second operand before any other element of the second operand is found to be unequal, the byte index of the first byte of the element found to be zero is stored in byte seven of the first operand. Zeros are stored in all other bytes, and condition code 0 is set.

0 - Byte

1 - Halfword

2 - Word

3 to 15 - Reserved

The M₅ field has the following format:

The bits of the M₅ field are defined as follows:

Special conditions

1. The M₄ field contains a value from 3 to 15.

2. Bits 0 to 1 of the M₅ field are not zero.

The resulting condition code:

0 If the zero compare bit is set, the comparison detected a zero element in both operands in an element with a lower index than any unequal comparison.

1 An element mismatch was detected, and the element in VR2 is less than the element in VR3.

2 An element mismatch was detected, and the element in VR2 is greater than the element in VR3.

3 All elements compared equal and, if the zero compare bit is set, no zero elements were found in the second operand.

If bit 3 of the M₅ field is zero, the code remains unchanged.
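The inequality search and the four condition codes listed above might be sketched as follows. The function name and the `bytes` representation are assumptions, as is the ordering of the tests within one element: an unequal comparison is tested first, so the zero search only fires when the element is zero in both operands, which is one reading of condition code 0.

```python
def find_element_not_equal(v2: bytes, v3: bytes, es: int = 0, zs: bool = False):
    """Return (first_operand, condition_code) for 16-byte inputs."""
    if es not in (0, 1, 2):
        raise ValueError("reserved ES value")   # stands in for the exception
    size = 1 << es
    index, cc = 16, 3                           # default: all elements compare equal
    for i in range(0, 16, size):
        x = int.from_bytes(v2[i:i + size], "big")
        y = int.from_bytes(v3[i:i + size], "big")
        if x != y:
            index, cc = i, (1 if x < y else 2)  # which operand is greater
            break
        if zs and x == 0:                       # equal here, and both are zero
            index, cc = i, 0
            break
    result = bytearray(16)
    result[7] = index
    return bytes(result), cc
```

Comparing two zero-terminated strings with ZS set therefore stops either at the first differing character or at the shared terminator, whichever comes first.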

Program exceptions:

Data with DXC FE, vector register

Operation if the vector extension facility is not installed

Specification (reserved ES value)

Transaction constraints

Extended mnemonics:

VECTOR STRING RANGE COMPARE

Proceeding from left to right, the unsigned binary integer elements in the second operand are compared with ranges of values defined by even-odd pairs of elements in the third and fourth operands. The range of comparison to be performed is defined in combination with the control values from the fourth operand. An element is considered a match if it matches any of the ranges specified by the third and fourth operands.

If the Result Type (RT) flag in the M₆ field is zero, the bit positions of the element in the first operand corresponding to the element being compared in the second operand are set to ones if the element matches any of the ranges; otherwise they are set to zeros.

If the Result Type (RT) flag in the M₆ field is set to one, the byte index of the first element in the second operand that matches any of the ranges specified by the third and fourth operands, or a zero comparison if the ZS flag is set to one, is placed in byte seven of the first operand, and zeros are stored in the remaining bytes. If no elements match, an index equal to the number of bytes in the vector is placed in byte seven of the first operand, and zeros are stored in the remaining bytes.

The Zero Search (ZS) flag in the M₆ field, if set to one, adds a comparison of the second operand elements with zero to the ranges provided by the third and fourth operands. If a zero comparison occurs in an element with a lower index than any other true comparison, the condition code is set to zero.

The operands contain elements of the size specified by the element size control in the M₅ field.

The fourth operand element has the following format:

if ES equals 0:

if ES equals 1:

if ES equals 2:

the bits in the fourth operand element are defined as follows:

Equal (EQ): when one, an equality comparison is performed.

Greater Than (GT): when one, a greater than comparison is performed.

Less Than (LT): when one, a less than compare is performed.

All other bits are reserved and should be zero to ensure future compatibility.

These control bits may be used in any combination. If none of these bits are set, the comparison will always produce a false result. If all of these bits are set, the comparison will always produce a true result.
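By way of a hedged sketch, the range comparison driven by the EQ, GT, and LT controls might be modeled as follows. The controls are represented here as Python sets of the strings "EQ", "GT", and "LT" rather than as bit positions, because the bit layout of the fourth operand elements is not reproduced in this text. Requiring both comparisons of an even-odd pair to hold is this sketch's reading of how a pair defines a range. All function names are hypothetical.

```python
def compare(x: int, bound: int, control: set) -> bool:
    """Apply the enabled comparisons; no controls -> False, all three -> True."""
    return (("EQ" in control and x == bound)
            or ("GT" in control and x > bound)
            or ("LT" in control and x < bound))


def matches_any_range(x: int, bounds: list, controls: list) -> bool:
    """True if x satisfies both comparisons of at least one even-odd pair."""
    for even in range(0, len(bounds) - 1, 2):
        odd = even + 1
        if (compare(x, bounds[even], controls[even])
                and compare(x, bounds[odd], controls[odd])):
            return True
    return False


def first_match_byte_index(elements: list, bounds: list, controls: list,
                           size: int = 1, zs: bool = False) -> int:
    """RT = 1 style result: byte index of the first match, else vector length."""
    for i, x in enumerate(elements):
        if matches_any_range(x, bounds, controls) or (zs and x == 0):
            return i * size
    return len(elements) * size
```

For instance, the two ranges 'a' to 'z' and '0' to '9' can be expressed with the pairs (GT or EQ 'a', LT or EQ 'z') and (GT or EQ '0', LT or EQ '9').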

The M₅ field specifies the element size control (ES). The ES control specifies the size of the elements in the vector register operands. If a reserved value is specified, a specification exception is recognized.

0 - Byte

1 - Halfword

2 - Word

3 to 15 - Reserved

The M₆ field has the following format:

The bits of the M₆ field are defined as follows:

Invert Result (IN): if zero, the comparison proceeds with the pairs of values in the control vector. If one, the result of the pairwise comparisons in the ranges is inverted.

Result Type (RT): if zero, each resulting element is a mask of all range comparisons on that element. If one, an index is stored into byte seven of the first operand. Zeros are stored in the remaining bytes.

Special conditions

1. The M₄ field contains a value from 3 to 15.

The resulting condition code:

0 If ZS = 1 and a zero is found in an element with a lower index than any comparison

1 Comparison found

2 --

3 No comparison found

Program exceptions:

Data with DXC FE, vector register

Operation if the vector extension facility is not installed

Specification (reserved ES value)

Transaction constraints

Extended mnemonics:

ES = 1, ZS = 0

VR1(a): result with RT = 0

VR1(b): result with RT = 1

LOAD COUNT TO BLOCK BOUNDARY

A 32-bit unsigned binary integer containing the number of bytes that can be loaded from the second operand location without crossing the specified block boundary, capped at sixteen, is placed in the first operand.

The displacement is treated as a 12-bit unsigned integer.

The second operand address is not used to address data.

The M₃ field specifies a code that signals to the CPU the block boundary size used to compute the number of bytes that could be loaded. If a reserved value is specified, a specification exception is recognized.

The resulting condition code:

0 Operand one is sixteen

1 --

2 --

3 Operand one is less than sixteen

Program exceptions:

Operation if the vector extension facility is not installed

Specification

Programming note: LOAD COUNT TO BLOCK BOUNDARY is expected to be used in conjunction with VECTOR LOAD TO BLOCK BOUNDARY to determine the number of bytes that were loaded.
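The count computation might be sketched as follows, with the caveat that the block size is passed directly as a power-of-two byte count; the mapping from M₃ codes to block sizes is not modeled, and the function name is an assumption of this illustration.

```python
def load_count_to_block_boundary(address: int, block_size: int):
    """Return (count, condition_code) for a 16-byte vector load.

    `block_size` is assumed to be a power of two given in bytes.
    """
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block size must be a power of two")
    bytes_to_boundary = block_size - (address % block_size)
    count = min(16, bytes_to_boundary)      # capped at sixteen
    cc = 0 if count == 16 else 3
    return count, cc


if __name__ == "__main__":
    # Hypothetical address six bytes below a 4096-byte boundary.
    print(load_count_to_block_boundary(0xFFA, 4096))
```

A caller can use the returned count to know how many of the sixteen bytes of a subsequent block-bounded vector load are meaningful.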

VECTOR LOAD GR FROM VR ELEMENT

The element of the third operand that has the size specified by the ES value in the M₄ field and is indexed by the second operand address is placed in the first operand location. The third operand is a vector register. The first operand is a general register. If the index specified by the second operand address is greater than the highest numbered element in the third operand of the specified element size, the data in the first operand is unpredictable.

If the vector register element is smaller than a doubleword, the element is right-aligned in the 64-bit general register and zeros fill the remaining bits.

The second operand address is not used to address data; instead, the rightmost 12 bits of the address are used to specify the index of the element within the second operand.

0 - Byte

1 - Halfword

2 - Word

3 - Doubleword

4 to 15 - Reserved
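A short sketch of the element extraction described above is given below. The function names are hypothetical, the vector register is modeled as 16 bytes in big-endian element order, and an out-of-range index raises an error here only because the architected result (unpredictable data) cannot be modeled meaningfully.

```python
def element_index_from_address(address: int) -> int:
    """The rightmost 12 bits of the second operand address give the index."""
    return address & 0xFFF


def load_gr_from_vr_element(vr: bytes, index: int, es: int) -> int:
    """Return the selected element as a 64-bit value, right-aligned, zero-filled."""
    if es not in (0, 1, 2, 3):
        raise ValueError("reserved ES value")   # stands in for the exception
    size = 1 << es
    if index >= 16 // size:
        raise IndexError("index beyond the last element: result unpredictable")
    # A Python int built from the element bytes is already right-aligned
    # with leading zeros when viewed as a 64-bit quantity.
    return int.from_bytes(vr[index * size:(index + 1) * size], "big")
```

Because the element is converted as an unsigned big-endian integer, the upper bits of the 64-bit result are zero for any element smaller than a doubleword.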

The resulting condition code: the code is not changed.

Program exceptions:

Data with DXC FE, vector register

Operation if the vector extension facility is not installed

Specification (reserved ES value)

Transaction constraints

Extended mnemonics:

VECTOR LOAD TO BLOCK BOUNDARY

The first operand is loaded, starting at the zero-indexed byte element, with bytes from the second operand. If a boundary condition is encountered, the rest of the first operand is unpredictable. Access exceptions are not recognized on bytes that are not loaded.

The displacement for VLBB is treated as a 12-bit unsigned integer.

The M₃ field specifies a code that signals to the CPU the block boundary size to load to. If a reserved value is specified, a specification exception is recognized.

The resulting condition code: the code remains unchanged.

Program exceptions:

Access (fetch, operand 2)

Data with DXC FE, vector register

Operation if the vector extension facility is not installed

Specification (reserved block boundary code)

Transaction constraints

Programming note:

1. In certain circumstances, data may be loaded past the block boundary. However, this occurs only if there are no access exceptions on that data.
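As an illustrative sketch only, a block-bounded 16-byte load might be modeled as follows. Memory is represented as a `bytes` object, the block size is given directly in bytes rather than as an M₃ code, and the bytes that are not loaded are filled with zeros although the description leaves them unpredictable. The function name is hypothetical.

```python
def vector_load_to_block_boundary(memory: bytes, address: int, block_size: int):
    """Return (vector, count): up to 16 bytes, never crossing the block boundary."""
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block size must be a power of two")
    next_boundary = (address // block_size + 1) * block_size
    end = min(address + 16, next_boundary)          # stop at or before the boundary
    loaded = memory[address:end]
    # Architecturally the remainder is unpredictable; zeros are a sketch choice.
    return loaded + bytes(16 - len(loaded)), len(loaded)
```

Only addresses below `end` are touched, which mirrors the statement that access exceptions are not recognized on bytes that are not loaded.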

VECTOR STORE

A 128-bit value in the first operand is stored to a storage location specified by the second operand. The displacement for the VST is treated as a 12-bit unsigned integer.

The resulting condition code: the code remains unchanged.

Program exceptions:

Access (store, operand 2)

Data with DXC FE, vector register

Operation if the vector extension facility is not installed

Transaction constraints

VECTOR STORE WITH LENGTH

Proceeding from left to right, bytes from the first operand are stored at the second operand location. The general register specified by the third operand contains a 32-bit unsigned integer holding a value that represents the highest indexed byte to store. If the third operand contains a value greater than or equal to the highest byte index of the vector, all bytes of the first operand are stored.

Access exceptions are recognized only on the bytes stored.

The displacement for VECTOR STORE WITH LENGTH is treated as a 12-bit unsigned integer.
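The length-limited store might be sketched as follows, with memory modeled as a mutable `bytearray` and a hypothetical function name. The third operand value is the highest byte index to store, so the number of bytes stored is that index plus one, capped at sixteen.

```python
def vector_store_with_length(memory: bytearray, address: int,
                             vr: bytes, highest_index: int) -> int:
    """Store bytes 0..highest_index of vr at address; return the byte count."""
    count = min(highest_index, 15) + 1      # index 15 or above stores all 16 bytes
    memory[address:address + count] = vr[:count]
    return count
```

Bytes of memory beyond the stored count are left untouched, so a partially filled vector can be written at the end of a buffer without overrunning it.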

The resulting condition code: the condition code remains unchanged.

Program exceptions:

Access (store, operand 2)

Data with DXC FE, vector register

Operation if the vector extension facility is not installed

Transaction constraints

Description of RXB

All vector instructions have a field, labeled RXB, in bits 36 through 40 of the instruction. This field contains the most significant bits for all of the vector-register-designated operands. Bits for register designations not specified by the instruction are reserved and should be set to zero; otherwise, the program may not operate compatibly in the future. The most significant bit is concatenated to the left of the four-bit register designation to create the five-bit vector register designation.

These bits are defined as follows:

0 - Most significant bit for the vector register designation in bits 8 through 11 of the instruction.

1 - Most significant bit for the vector register designation in bits 12 through 15 of the instruction.

2 - Most significant bit for the vector register designation in bits 16 through 19 of the instruction.

3 - Most significant bit for the vector register designation in bits 32 through 35 of the instruction.
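The concatenation of an RXB bit with a four-bit register field might be sketched as follows. The function name is hypothetical, and the RXB field is passed as a four-bit integer whose leftmost bit is taken to be RXB bit 0, which is an assumption about how the caller extracted the field.

```python
def vector_register_numbers(rxb: int, fields: list) -> list:
    """Combine RXB bits 0-3 with four 4-bit register fields.

    `fields` holds the values found in instruction bits 8-11, 12-15,
    16-19, and 32-35, in that order.
    """
    numbers = []
    for i, field in enumerate(fields):
        msb = (rxb >> (3 - i)) & 1              # RXB bit i, counting from the left
        numbers.append((msb << 4) | (field & 0xF))
    return numbers
```

With the extra bit, each operand can designate one of thirty-two vector registers instead of sixteen.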

Vector enable control

The vector registers and instructions may be used only if both the vector enablement control (bit 46) and the AFP-register control (bit 45) in control register zero are set to one. If the vector facility is installed and a vector instruction is executed without the enablement bits set, a data exception with DXC FE hex is recognized. If the vector facility is not installed, an operation exception is recognized.

Claims

1. A method for executing a machine instruction in a central processing unit, the method comprising:

obtaining, by a processor, a machine instruction for execution, the machine instruction defined according to a computer architecture for computer execution, the machine instruction comprising:

at least one opcode field providing an opcode, the opcode identifying a load to block boundary operation;

a register field to specify a register, the register including a first operand;

at least one field for locating a second operand in main memory;

a block boundary type indicator to indicate a specified type of block boundary for the second operand; and

executing the machine instruction, the executing comprising:

loading bytes of the first operand only with corresponding bytes of the second operand that are within a block of main memory that is dynamically determined based on the specified type of block boundary and one or more characteristics of the processor, wherein the loading comprises: loading a variable amount of data from the block of the second operand into the first operand while ensuring that only data within the block of the second operand is loaded into the first operand, wherein loading from the block of the second operand begins at a start address in the block of the second operand, the start address being provided by the machine instruction, and wherein the loading ends at or before a determined block boundary of the block of the second operand, wherein the variable amount of data loaded is dynamically determined based on the start address and the determined block boundary, based on the specified type of block boundary and the one or more characteristics of the processor,

wherein the address of the second operand is a start address in memory from which data is to be loaded into the first operand, and wherein the executing further comprises: determining an end address at which the loading is to stop, wherein the loading stops at the end address; and

wherein the determining the end address comprises calculating the end address as follows:

end address = min(start address + (boundary size - (start address AND NOT boundary mask)), start address + register size), where the boundary size is the block boundary, the boundary mask equals 0 - boundary size, and the register size is the specified length of the register.

2. The method of claim 1, wherein the at least one field comprises a displacement field, a base field, and an index field, the base field and the index field for locating general registers having contents to be added to contents of the displacement field to form the address of the second operand.

3. The method of claim 1, wherein the machine instruction further comprises a mask field, the mask field comprising the block boundary type indicator.

4. The method of claim 1, wherein the one or more characteristics comprise one of a cache line size of the processor or a page size of the processor.

5. The method of claim 1, wherein the executing comprises determining a block boundary of the block using an address of the second operand, wherein the address is used in a data structure lookup to determine the block boundary of the block.

6. The method of claim 1, wherein the loading comprises one of: loading the first operand from left to right, or loading the first operand from right to left.

7. The method of claim 6, wherein the direction of the loading is provided at execution time.

8. The method of claim 1, wherein the machine instruction further comprises an extension field to be used in designating one or more registers, and wherein the register field is combined with at least a portion of the extension field to designate the register.

9. A system comprising means adapted for carrying out all the steps of the method according to any preceding method claim.
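As an illustration of the end-address calculation recited in claim 1, the following is a minimal sketch in Python. The function name `end_address` and its parameter names are hypothetical, and the sketch assumes the boundary size is a power of two so that the boundary mask (0 - boundary size) isolates the block-aligned part of an address.

```python
def end_address(start, boundary_size, register_size):
    """Return the address at which a load to block boundary stops.

    Assumes boundary_size is a power of two (for example 64 or 4096).
    """
    if boundary_size <= 0 or boundary_size & (boundary_size - 1):
        raise ValueError("boundary size must be a power of two")
    # Boundary mask equals 0 - boundary size (two's complement).
    boundary_mask = -boundary_size
    # start AND NOT mask is the offset of the start address within its block.
    offset_in_block = start & ~boundary_mask
    bytes_to_boundary = boundary_size - offset_in_block
    # Stop at the block boundary or at a full register, whichever comes first.
    return min(start + bytes_to_boundary, start + register_size)
```

With a 64-byte boundary and a 16-byte register, a start address 6 bytes short of a boundary yields an end address exactly at that boundary, so only 6 bytes are loaded; a block-aligned start address yields a full 16-byte load.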