HK1201355B - Method and system for executing a machine instruction in a central processing unit - Google Patents
Method and system for executing a machine instruction in a central processing unit
- Publication number
- HK1201355B (HK15101822.5A)
- Authority
- HK
- Hong Kong
- Prior art keywords
- operand
- register
- instruction
- elements
- zero
- Prior art date
Description
Technical Field
One aspect of the present invention relates generally to text processing and, more particularly, to facilitating processing associated with character data.
Background
Text processing often requires the comparison of character data, including but not limited to the comparison of character data strings. Typically, instructions used to compare character data compare a single byte of data at a time.
In addition, text processing often requires other types of character data processing, including finding a termination point (e.g., the end of a string), determining the length of the character data, finding a particular character, and so forth. Current instructions to perform these types of processing tend to be inefficient.
Disclosure of Invention
The shortcomings of the prior art are overcome and advantages are provided through the provision of a computer program product for executing machine instructions. The computer program product includes a computer-readable storage medium readable by a processing circuit and storing instructions for execution by the processing circuit for performing a method. The method includes (for example): obtaining, by a processor, a machine instruction for execution, the machine instruction defined according to a computer architecture for computer execution, the machine instruction comprising: at least one opcode field providing an opcode, the opcode identifying a vector string range compare operation; an extension field for use in designating one or more registers; a first register field combined with a first portion of the extension field to specify a first register, the first register including a first operand; a second register field combined with a second portion of the extension field to specify a second register, the second register including a second operand; a third register field combined with a third portion of the extension field to specify a third register, the third register including a third operand; a fourth register field combined with a fourth portion of the extension field to indicate a fourth register, the fourth register including a fourth operand; a mask field comprising one or more controls to be used during execution of the machine instruction; and executing the machine instruction, the executing comprising: comparing elements of the second operand with one or more values of the third operand using one or more controls programmatically provided by the fourth operand to determine whether there is a match defined by the one or more values of the third operand and the one or more controls of the fourth operand; and providing a result in the first operand based on the comparison.
Methods and systems relating to one or more aspects of the present invention are also described and claimed herein. Additionally, services relating to one or more aspects of the present invention are also described and may be claimed herein.
Additional features and advantages are realized through the techniques of the present invention. Other embodiments and aspects of the invention are described in detail herein and are considered a part of the claimed invention.
Drawings
One or more aspects of the present invention are particularly pointed out and distinctly claimed as examples in the claims at the conclusion of the specification. The foregoing and other objects, features, and advantages of the invention are apparent from the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 depicts an example of a computing environment incorporating and using one or more aspects of the present invention;
FIG. 2A depicts another example of a computing environment incorporating and using one or more aspects of the present invention;
FIG. 2B depicts further details of the memory of FIG. 2A, in accordance with an aspect of the present invention;
FIG. 3 depicts one embodiment of the format of a "string range compare" instruction, in accordance with an aspect of the present invention;
FIG. 4 depicts logic associated with a "string range compare" instruction, in accordance with an aspect of the present invention;
FIG. 5 depicts one example of the use of a "vector string range compare" instruction, in accordance with an aspect of the present invention;
FIG. 6 depicts a plurality of vectors using one embodiment of a "vector string range compare" instruction, in accordance with an aspect of the present invention;
FIG. 7 depicts one example of a register file, in accordance with an aspect of the present invention;
FIG. 8 depicts one embodiment of a computer program product incorporating one or more aspects of the present invention;
FIG. 9 depicts one embodiment of a host computer system incorporating and using one or more aspects of the present invention;
FIG. 10 depicts another example of a computer system incorporating and using one or more aspects of the present invention;
FIG. 11 depicts another example of a computer system, including a computer network, incorporating and using one or more aspects of the present invention;
FIG. 12 depicts one embodiment of various elements of a computer system incorporating and using one or more aspects of the present invention;
FIG. 13A depicts one embodiment of an execution unit of the computer system of FIG. 12 incorporating and using one or more aspects of the present invention;
FIG. 13B depicts one embodiment of a branch unit of the computer system of FIG. 12 incorporating and using one or more embodiments of the present invention;
FIG. 13C depicts one embodiment of a load/store unit of the computer system of FIG. 12 incorporating and using one or more embodiments of the present invention; and
FIG. 14 depicts one embodiment of an emulated host computer system incorporating and using one or more aspects of the present invention.
Detailed Description
According to one aspect of the present invention, a capability is provided for facilitating the processing of character data, including but not limited to alphabetic characters in any language, numeric digits, punctuation marks, and/or other symbols. The character data may or may not be a string of data. Standards are associated with character data; examples include, but are not limited to, ASCII (American Standard Code for Information Interchange) and Unicode, including but not limited to UTF (Unicode Transformation Format) 8, UTF 16, and the like.
In one example, a "vector string range compare" instruction is provided that compares each element of a vector register with ranges of values to determine whether there is a match. As used herein, a range of values may be one or more values. For example, the range may include a single value against which the comparison is made (e.g., is H = A), or it may include multiple values on which the comparison is based (e.g., is A < H < Z).
As described herein, the elements of a vector register (also referred to as a vector) are one, two, or four bytes in length; and the vector operands are, for example, SIMD (single instruction multiple data) operands having multiple elements. In other embodiments, the elements may be other sizes; and the vector operands need not be SIMD and/or may comprise one element.
In another embodiment, the same instruction (the "vector string range compare" instruction) also searches for a null element (also referred to as a zero element (e.g., the element contains all zeros)) in the selected vector. Null or zero elements indicate termination of character data; e.g., the end of a particular string of data.
One embodiment of a computing environment to incorporate and use one or more aspects of the present invention is described with reference to FIG. 1. The computing environment 100 includes a processor 102 (e.g., a central processing unit), a memory 104 (e.g., a main memory), and/or a plurality of input/output (I/O) devices and/or interfaces 106 coupled to one another via, for example, one or more buses 108 and/or other connections.
In one example, the processor 102 is based on the z/Architecture supplied by International Business Machines Corporation and is part of a server, such as a System z server that is also supplied by International Business Machines Corporation and implements the z/Architecture. One example of the z/Architecture is described in the publication entitled "z/Architecture Principles of Operation," Publication No. SA22-7832-08, Ninth Edition, August 2010. In one example, the processor executes an operating system, such as z/OS, also supplied by International Business Machines Corporation. z/Architecture and z/OS are registered trademarks of International Business Machines Corporation (Armonk, New York, USA). Other names used herein may be registered trademarks, trademarks or product names of International Business Machines Corporation or other companies.
In another embodiment, the processor 102 is based on the Power Architecture supplied by International Business Machines Corporation. One embodiment of the Power Architecture is described in "Power ISA Version 2.06 Revision B" (International Business Machines Corporation, July 23, 2010). Power Architecture is a registered trademark of International Business Machines Corporation.
In another embodiment, the processor 102 is based on the Intel architecture supplied by Intel Corporation. An embodiment of the Intel architecture is described in "Intel 64 and IA-32 Architectures Developer's Manual, Vol. 2B, Instructions Set Reference, A-L" (Order No. 253666-041US, December 2011) and "Intel 64 and IA-32 Architectures Developer's Manual, Vol. 2B, Instructions Set Reference, M-Z" (Order No. 253667-). Intel is a registered trademark of Intel Corporation (Santa Clara, California).
Another embodiment of a computing environment to incorporate and use one or more aspects of the present invention is described with reference to FIG. 2A. In this example, the computing environment 200 includes a local central processing unit 202, a memory 204, and one or more input/output devices and/or interfaces 206 coupled to each other via, for example, one or more buses 208 and/or other connections. By way of example, the computing environment 200 may include: PowerPC processors, pSeries servers, or xSeries servers supplied by International Business Machines Corporation (Armonk, New York); an HP Superdome with Intel Itanium II processors supplied by Hewlett Packard Co. (Palo Alto, California); and/or other machines based on architectures supplied by International Business Machines Corporation, Hewlett Packard, Intel, Oracle, or others.
The local central processing unit 202 includes one or more local registers 210, such as one or more general purpose registers and/or one or more special purpose registers, used during processing within the environment. These registers contain information that represents the state of the environment at any particular point in time.
In addition, the local central processing unit 202 executes instructions and program code stored in memory 204. In one particular example, the central processing unit executes emulator code 212 stored in memory 204. This code enables a processing environment configured in one architecture to emulate another architecture. By way of example, the emulator code 212 allows machines based on architectures other than the z/Architecture, such as PowerPC processors, pSeries servers, xSeries servers, HP Superdome servers, or others, to emulate the z/Architecture and to execute software and instructions developed based on the z/Architecture.
Further details regarding the emulator code 212 are described with reference to FIG. 2B. Guest instructions 250 comprise software instructions (e.g., machine instructions) developed to be executed in an architecture different from that of the local CPU 202. For example, the guest instruction 250 may have been designed to execute on the z/Architecture processor 102, but instead, the guest instruction 250 is being emulated on the local CPU 202 (which may be, for example, an Intel Itanium II processor). In one example, the emulator code 212 includes an instruction fetch unit 252 to obtain one or more guest instructions 250 from the memory 204 and optionally provide local buffering for the obtained instructions. The emulator code 212 also includes an instruction translation routine 254 to determine the type of guest instruction that has been obtained and translate the guest instruction into one or more corresponding native instructions 256. This translation includes, for example, identifying a function to be performed by the guest instruction and selecting the native instruction(s) to perform the function.
Additionally, the emulator code 212 includes an emulation control routine 260 to cause the native instructions to be executed. The emulation control routine 260 may cause the native CPU 202 to execute a routine of native instructions that emulate one or more previously obtained guest instructions and, upon completion of such execution, return control to the instruction fetch routine to emulate the obtaining of the next guest instruction or group of guest instructions. Execution of the native instructions 256 may include loading data from the memory 204 into a register; storing data from a register back to the memory; or performing some type of arithmetic or logical operation (as determined by the translation routine).
Each routine is implemented, for example, in software that is stored in memory and executed by the local central processing unit 202. In other examples, one or more of the routines or operations are implemented in firmware, hardware, software, or some combination thereof. The registers of the emulated processor may be emulated using registers 210 of the local CPU or by using locations in memory 204. In embodiments, guest instructions 250, native instructions 256, and emulator code 212 may reside in the same memory or may be distributed among different memory devices.
As used herein, firmware includes, for example, the microcode, millicode, and/or macrocode of a processor. It includes, for example, the hardware-level instructions and/or data structures used in the implementation of higher-level machine code. In one embodiment, the firmware includes, for example, proprietary code that is typically delivered as microcode, which includes trusted software or microcode specific to the underlying hardware and which controls operating system access to the system hardware.
In one example, the obtained, translated, and executed guest instructions 250 are instructions described herein. An instruction that is of one Architecture (e.g., z/Architecture) is fetched from memory, translated and represented as a sequence of native instructions 256 of another Architecture (e.g., PowerPC, pSeries, xSeries, Intel, etc.). These native instructions are then executed.
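The fetch, translate, and execute flow of the emulator can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy guest instructions (`LOAD_IMM`, `ADD`), the function names, and the use of Python callables as stand-ins for native instructions are hypothetical and do not come from the described emulator code 212.

```python
# Toy guest program: (mnemonic, operands...) tuples stand in for guest machine instructions.
GUEST_PROGRAM = [("LOAD_IMM", 0, 2), ("LOAD_IMM", 1, 40), ("ADD", 0, 1)]


def fetch(memory, pc):
    """Instruction fetch: obtain the next guest instruction from memory."""
    return memory[pc]


def translate(guest):
    """Instruction translation: map one guest instruction to native operations."""
    op, *args = guest
    if op == "LOAD_IMM":
        reg, value = args
        return [lambda regs: regs.__setitem__(reg, value)]
    if op == "ADD":
        dst, src = args
        return [lambda regs: regs.__setitem__(dst, regs[dst] + regs[src])]
    raise ValueError(f"unknown guest instruction: {op}")


def run(memory, register_count=4):
    """Emulation control: fetch, translate, execute, then move to the next instruction."""
    regs = [0] * register_count  # emulated registers kept in host memory
    pc = 0
    while pc < len(memory):
        for native in translate(fetch(memory, pc)):
            native(regs)
        pc += 1
    return regs


print(run(GUEST_PROGRAM))  # [42, 40, 0, 0]
```

The emulated registers are held in an ordinary list here, which corresponds to the option of emulating registers by using locations in memory.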
In one embodiment, the instructions described herein are vector instructions that are part of a vector facility provided in accordance with an aspect of the present invention. The vector facility provides, for example, fixed-size vectors ranging from one to sixteen elements. Each vector includes data that is operated on by the vector instructions defined in the facility. In one embodiment, if a vector consists of multiple elements, each element is processed in parallel with the other elements. Instruction completion does not occur until processing of all elements is complete.
As described herein, vector instructions may be implemented as part of various architectures including, but not limited to, z/Architecture, Power, Intel, and the like. Although the embodiments described herein are directed to z/Architecture, the vector instructions described herein and one or more embodiments of the present invention may be based on many other architectures. The z/Architecture is merely an example.
In one embodiment in which the vector facility is implemented as part of the z/Architecture, to use the vector registers and instructions, a vector enablement control and a register control in a specified control register (e.g., control register 0) are set to, for example, one. If the vector facility is installed and a vector instruction is executed without the enablement controls set, a data exception is recognized. If the vector facility is not installed and a vector instruction is executed, an operation exception is recognized.
The vector data is presented in storage, for example, in the same left-to-right order as the other data formats. Bits of the data format numbered 0 to 7 constitute the byte in the leftmost (lowest numbered) byte position in the storage, bits 8 to 15 form the byte in the next sequential position, and so on. In another example, the vector data may appear in storage in another order (such as from right to left).
Many of the vector instructions provided with the vector facility have a field of specified bits. This field, referred to as the register extension bits or RXB, includes the most significant bit for each of the vector register designated operands. Bits for register designations not specified by the instruction are reserved and set to zero.
In one example, the RXB field includes four bits (e.g., bits 0-3), and the bits are defined as follows:
0 - Most significant bit for the first vector register designation of the instruction.
1 - Most significant bit for the second vector register designation of the instruction, if any.
2 - Most significant bit for the third vector register designation of the instruction, if any.
3 - Most significant bit for the fourth vector register designation of the instruction, if any.
Each bit is set to either a zero or a one depending on the register number, for example, by an assembler. For example, for registers 0-15, the bit is set to 0; for registers 16 to 31, the bit is set to 1, and so on.
In one embodiment, each RXB bit is an extension bit for a particular location in an instruction that includes one or more vector registers. For example, in one or more vector instructions, bit 0 of RXB is an extension bit for location 8-11, which is assigned to, for example, V1; bit 1 of RXB is an extension bit for location 12-15, which is assigned to, for example, V2; and so on.
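As a minimal sketch of how an assembler might derive the RXB bits from register numbers, the following assumes registers numbered 0 to 31 and assumes that RXB bit 0 is the leftmost (most significant) bit of the four-bit field. The function name `encode_rxb` is hypothetical.

```python
def encode_rxb(v1, v2=0, v3=0, v4=0):
    """Split up to four register numbers (0-31) into 4-bit fields plus an RXB value."""
    regs = (v1, v2, v3, v4)
    for reg in regs:
        if not 0 <= reg <= 31:
            raise ValueError("register number must be in the range 0-31")
    # The low four bits of each register number go into its register field.
    fields = tuple(reg & 0xF for reg in regs)
    rxb = 0
    for position, reg in enumerate(regs):
        # Assumption: RXB bit 0 is the leftmost of the four bits, so it has weight 8.
        rxb |= (reg >> 4) << (3 - position)
    return fields, rxb


fields, rxb = encode_rxb(1, 18, 3, 31)
print(fields, format(rxb, "04b"))  # (1, 2, 3, 15) 0101
```

Registers that an instruction does not designate default to zero here, which leaves their RXB bits at zero as the text requires.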
In another embodiment, the RXB field includes additional bits, and more than one bit is used as an extension for each vector or position.
An instruction provided according to one aspect of the present invention that includes an RXB field is a "vector string range compare" instruction, an example of which is depicted in FIG. 3. In one example, the "vector string range compare" instruction 300 includes: opcode fields 302a (e.g., bits 0-7) and 302b (e.g., bits 40-47) indicating a "vector string range compare" operation; a first vector register field 304 (e.g., bits 8-11) to designate a first vector register (V1); a second vector register field 306 (e.g., bits 12-15) to designate a second vector register (V2); a third vector register field 308 (e.g., bits 16-19) to designate a third vector register (V3); a first mask field (M5) 310 (e.g., bits 20-23); a second mask field (M6) 312 (e.g., bits 24-27); a fourth vector register field 314 (e.g., bits 32-35) to designate a fourth vector register (V4); and an RXB field 316 (e.g., bits 36-39). In one example, each of fields 304-316 is separate and independent from the opcode field(s). Additionally, in one embodiment, these fields are separate and independent of each other; however, in other embodiments, more than one field may be combined. Additional information regarding the use of these fields is described below.
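The field layout above can be illustrated by extracting each field from a 48-bit instruction image. This is a sketch under stated assumptions: bit 0 is taken to be the leftmost bit of the instruction, the helper names are hypothetical, and the opcode bytes `0xAA` and `0xBB` in the example are placeholders rather than the real opcode.

```python
INSTRUCTION_BITS = 48  # three halfwords

# (first bit, last bit) of each field, using the example bit positions from the text.
FIELD_LAYOUT = {
    "opcode_a": (0, 7),
    "v1": (8, 11),
    "v2": (12, 15),
    "v3": (16, 19),
    "m5": (20, 23),
    "m6": (24, 27),
    "v4": (32, 35),
    "rxb": (36, 39),
    "opcode_b": (40, 47),
}


def extract_field(instruction, first, last):
    """Return bits first..last of the instruction, where bit 0 is the leftmost bit."""
    width = last - first + 1
    shift = INSTRUCTION_BITS - 1 - last
    return (instruction >> shift) & ((1 << width) - 1)


def decode_fields(instruction):
    """Decode every named field of a 48-bit instruction image into a dictionary."""
    return {name: extract_field(instruction, first, last)
            for name, (first, last) in FIELD_LAYOUT.items()}


# Placeholder image: V1=1, V2=2, V3=3, M5=0, M6=5, V4=4, RXB=0.
decoded = decode_fields(0xAA12305040BB)
print(decoded)
```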
In one example, selected bits (e.g., the first two bits) of the opcode specified by the opcode field 302a specify the length and format of the instruction. In this particular example, the selected bits indicate that the length is three halfwords and that the format is a vector register-and-register operation with an extended opcode field. Each vector (V) field, together with its corresponding extension bit specified by RXB, designates a vector register. In particular, for vector registers, the register containing an operand is specified using, for example, the four-bit register field with its corresponding register extension bit (RXB) added as the most significant bit. For example, if the four-bit field is 0110 and the extension bit is 0, then the five-bit field 00110 indicates register number 6.
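The register-number example can be checked with a short sketch. The function name `vector_register_number` is hypothetical; the logic only restates the rule that the extension bit becomes the most significant bit of a five-bit register number.

```python
def vector_register_number(register_field, extension_bit):
    """Prepend the RXB extension bit to a 4-bit register field."""
    if not 0 <= register_field <= 0b1111:
        raise ValueError("register field must fit in four bits")
    if extension_bit not in (0, 1):
        raise ValueError("extension bit must be 0 or 1")
    return (extension_bit << 4) | register_field


print(vector_register_number(0b0110, 0))  # 6  (00110)
print(vector_register_number(0b0110, 1))  # 22 (10110)
```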
The subscript number associated with a field of the instruction indicates the operand to which that field applies. For example, the subscript number 1 associated with vector register V1 denotes the first operand, and so on. A register operand is one register in length, which is, for example, 128 bits.
The M5 field, having, for example, four bits (0-3), specifies an element size (ES) control in, for example, bits 1-3. The element size control specifies the size of the elements in the vector register operands. In one example, the element size control may specify a byte, a halfword (e.g., two bytes), or a word (e.g., four bytes). For example, 0 indicates a byte; 1 indicates a halfword; and 2 indicates a word, also called a fullword. If a reserved value is specified, a specification exception is recognized. The operands comprise elements of the size specified by the element size control in the M5 field.
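A minimal sketch of the element size control follows. It assumes that bits 1-3 of the four-bit M5 field are its three low-order bits, that elements are stored left to right (big-endian), and that a reserved value is reported as a Python `ValueError` in place of the specification exception. The function names are hypothetical.

```python
ELEMENT_BYTES = {0: 1, 1: 2, 2: 4}  # ES value -> element size in bytes


def element_size(m5):
    """Decode the element size control from the M5 field."""
    es = m5 & 0b0111  # assumption: bits 1-3 are the low-order three bits
    if es not in ELEMENT_BYTES:
        raise ValueError("reserved element size value")
    return ELEMENT_BYTES[es]


def split_elements(vector, m5):
    """Split a vector register image into unsigned integer elements."""
    size = element_size(m5)
    return [int.from_bytes(vector[i:i + size], "big")
            for i in range(0, len(vector), size)]


register = bytes(range(16))  # 128-bit register image: 00 01 02 ... 0F
print(len(split_elements(register, 0)))  # 16 byte elements
print(len(split_elements(register, 1)))  # 8 halfword elements
print(len(split_elements(register, 2)))  # 4 word elements
```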
The M6 field is, for example, a four-bit field (bits 0-3) including, for example, the following controls:
Invert result (IN, bit 0): If zero, the comparison proceeds with the pairs of values in the control vector. If one, the results of the pair comparisons for the ranges are inverted.
Result type (RT, bit 1): If zero, each resulting element is a mask of all range comparisons for that element. If one, an index is stored in a specified byte (e.g., byte 7) of the first operand. Zeros are stored in the remaining bytes.
Zero search (ZS, bit 2): if one, each element of the second operand is also compared to zero.
Condition code setting (CC, bit 3): if zero, the condition code is not set and remains unchanged. If one, then as an example the condition code is set as specified:
0 - If ZS is one and a zero is found in a lower indexed element than any comparison.
1 - A comparison is found.
2 - --
3 - No comparison is found.
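The four M6 controls can be unpacked as in the sketch below, which assumes that bit 0 (IN) is the leftmost bit of the four-bit field. The class and function names are hypothetical.

```python
from typing import NamedTuple


class M6Controls(NamedTuple):
    invert_result: bool       # IN, bit 0
    result_type: bool         # RT, bit 1
    zero_search: bool         # ZS, bit 2
    set_condition_code: bool  # CC, bit 3


def decode_m6(m6):
    """Unpack the four M6 control bits, with bit 0 taken as the leftmost bit."""
    return M6Controls(
        invert_result=bool(m6 & 0b1000),
        result_type=bool(m6 & 0b0100),
        zero_search=bool(m6 & 0b0010),
        set_condition_code=bool(m6 & 0b0001),
    )


print(decode_m6(0b0111))
```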
In the execution of one embodiment of the "vector string range compare" instruction, which in one embodiment proceeds from left to right, the elements of the second operand (included in the register designated by V2 plus its RXB bit) are compared with ranges of values defined by the third operand (included in the register designated by V3 plus its RXB bit) and the fourth operand (included in the register designated by V4 plus its RXB bit). Even-odd pairs of elements in the third operand and the fourth operand form the ranges of values used in the comparison with each element of the second operand. If an element matches any of the ranges specified by the third operand and the fourth operand, it is considered a match.
If the result type flag in the M6 field is zero, the bit positions of the element in the first operand (included in the register designated by V1 plus its RXB bit) that corresponds to the element being compared in the second operand are set to one if the element matches any of the ranges; otherwise, they are set to zero.
If the result type (RT) flag in the M6 field is set to one, the byte index of the first element (e.g., the index of the first byte of the first element) in the second operand that matches any of the ranges specified by the third operand and the fourth operand is placed in the specified byte (e.g., byte 7) of the first operand, and zeros are stored in the remaining bytes.
The zero search flag in the M6 field, if set to one, adds a comparison of the second operand elements with zero to the ranges provided by the third operand and the fourth operand. If the zero comparison is in a lower indexed element than any other true comparison, the condition code is set to zero. Additionally, if RT = 1, the byte index of the first byte of the leftmost zero element is placed in the specified byte (e.g., byte 7) of the first operand if the zero comparison is at a lower indexed element than any other true comparison. Otherwise, if RT = 0, the zero comparison is reflected in each element of the first operand that corresponds to a zero element in the second operand.
As an example, the controls specified by the fourth operand include, for example, equal to, greater than, and less than. Specifically, in one example, the bits in the fourth operand are defined as follows.
Equal (EQ): when it is one, an equality comparison is performed.
Greater Than (GT): when one, a greater than comparison is performed.
Less Than (LT): when one, a less than comparison is performed.
These control bits may be used in any combination. If none of these bits are set, the comparison will produce a false result. If all of these bits are set, the comparison yields a true result.
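How the three controls combine can be sketched as follows. The text does not give the bit positions of EQ, GT, and LT within a fourth operand element, so this sketch takes them as separate boolean arguments; the function name is hypothetical.

```python
def compare_with_controls(element, value, eq=False, gt=False, lt=False):
    """Compare an element with a value under any combination of EQ, GT, and LT."""
    return bool(
        (eq and element == value)
        or (gt and element > value)
        or (lt and element < value)
    )


h = ord("H")
# "Greater than or equal to A" is EQ combined with GT.
print(compare_with_controls(h, ord("A"), eq=True, gt=True))          # True
# With no control set, the comparison is always false.
print(compare_with_controls(h, ord("A")))                            # False
# With all controls set, the comparison is always true.
print(compare_with_controls(h, ord("Z"), eq=True, gt=True, lt=True))  # True
```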
In one embodiment, the comparison of the elements is performed in parallel. For example, if the vector registers being compared are 16 bytes in length, then 16 bytes are compared in parallel. Additionally, in another embodiment, the direction of the vector, either left-to-right or right-to-left, is provided at runtime. For example, the instruction accesses a register, status control, or other entity that indicates whether the direction of processing is left-to-right or right-to-left. In one embodiment, this direction control is not encoded as part of the instruction, but is provided to the instruction at runtime.
In another embodiment, the instruction does not include an RXB field. Instead, no extension is used, or an extension is provided in another manner, such as from control external to the instruction, or provided as part of another field of the instruction.
Additional details regarding one embodiment of processing a "vector string range compare" instruction are described with reference to FIG. 4. In one example, a processor of the computing environment is executing this logic.
Referring to FIG. 4, initially, a determination is made as to whether a search for a null element (also referred to as a zero element, end of string, terminator, etc.; e.g., an element containing all zeros) is to be performed (INQUIRY 400). If a null search is to be performed, the second operand is searched for the null character, i.e., for a zero element (step 402), and the result is output to the variable zeroidx 403. In one example, the result is the byte index of the first byte of the leftmost zero element. For example, if the element size is a byte and a zero element is found in byte five, the index of the byte in which the zero element was found (5) is placed in zeroidx. Similarly, if the element size is a halfword and there are eight elements (0-7), and element 3 is zero, then 6 (for byte index 6) is placed in zeroidx. Likewise, if the element size is a fullword and there are four elements (0-3), and element 1 is zero, then 4 (for byte index 4) is placed in zeroidx. If no zero element is found, then in one example, the size of the vector (e.g., in bytes; e.g., 16) is placed in zeroidx.
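The zero search can be sketched as a left-to-right scan, shown below. The function name `find_zero_index` and the sample register contents are hypothetical; the return values follow the three examples in the text, and the vector size is returned when no zero element exists.

```python
def find_zero_index(vector, element_size):
    """Return the byte index of the first byte of the leftmost all-zero element."""
    for start in range(0, len(vector), element_size):
        if not any(vector[start:start + element_size]):
            return start
    return len(vector)  # no zero element: report the vector size in bytes


byte_vector = b"Hello\x00World!!!!!"                              # zero in byte 5
halfword_vector = b"\x00\x01" * 3 + b"\x00\x00" + b"\x00\x01" * 4  # element 3 is zero
word_vector = b"\x00\x00\x00\x01" + b"\x00" * 4 + b"\x00\x00\x00\x01" * 2  # element 1 is zero

print(find_zero_index(byte_vector, 1))      # 5
print(find_zero_index(halfword_vector, 2))  # 6
print(find_zero_index(word_vector, 4))      # 4
print(find_zero_index(b"A" * 16, 1))        # 16
```

A zero byte inside a nonzero halfword or word does not count, because the whole element must be zero.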
In one embodiment, to obtain the first byte of an element, the index of any byte of the element is ANDed with a mask, where for an element of size 1 byte, the mask is 11111; for an element of size 2 bytes, the mask is 11110; and for an element of size 4 bytes, the mask is 11100. Thus, for the above example, when the element size is a byte and element 5 is zero, the binary value of 5 (00101) is ANDed with the mask for byte size elements (11111) to obtain the byte index of the first byte of the element (i.e., 00101 AND 11111 = 00101 (5)). Similarly, for an element size of 2 bytes and element 3 being zero, the binary value of 6 or 7 (since element 3 includes bytes 6 and 7) is ANDed with the mask for halfwords (11110) to obtain the byte index of the first byte of the element (i.e., (00110 or 00111) AND 11110 = 00110 (6)). Further, for an element size of 4 bytes and element 1 being zero, the binary value of any byte making up element 1 (bytes 4-7) is ANDed with the mask for fullwords (11100) to obtain the first byte index of element 1 (i.e., (00100 or 00101 or 00110 or 00111) AND 11100 = 00100 (4)).
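The masking step can be reproduced directly. The table and function names below are hypothetical; the mask values are the ones given in the text.

```python
FIRST_BYTE_MASK = {1: 0b11111, 2: 0b11110, 4: 0b11100}  # element size -> mask


def first_byte_index(byte_index, element_size):
    """Clear the low-order bits of a byte index to find the element's first byte."""
    return byte_index & FIRST_BYTE_MASK[element_size]


print(first_byte_index(5, 1))                         # 5
print({first_byte_index(i, 2) for i in (6, 7)})       # {6}
print({first_byte_index(i, 4) for i in range(4, 8)})  # {4}
```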
Additionally, or if no null search is performed, a pair of characters is loaded from the third operand (step 404), and the controls are loaded from the fourth operand (step 406). For example, as shown in FIG. 5, the third operand depicted in column 500 includes the characters A, Z, a, z, 0, and 9; and the fourth operand depicted in column 502 includes controls such as greater than or equal to (GE) and less than or equal to (LE). A pair of characters (such as A, Z) is loaded, along with a pair of controls (such as GE and LE).
Referring to FIGS. 4 and 5, thereafter, each element of the second operand (an example of which is depicted in line 506 of FIG. 5) is compared with the loaded pair of characters from the third operand using the controls from the fourth operand (step 408). The result of the comparison is placed in resultvec[i], where i represents the range i being compared. That is, in this example, there is a resultvec for each range of comparisons specified in the third operand, and each resultvec includes the same number of bits, e.g., 128 bits, as the second operand. Thus, in this example, resultvec is an array of three resultvecs (e.g., AZ, az, 09) for the three comparison ranges, and each has 128 bits. As an example, the letter H from the second operand is compared with A and Z of the third operand using the controls GE and LE from the fourth operand. Because H is greater than or equal to A and less than or equal to Z, the bit in resultvec[i] corresponding to H is set to true (e.g., 1). The comparison is performed for each element of the second operand.
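The per-range comparison can be sketched as below for byte-sized elements. Several things are assumptions made for illustration: the controls are represented by the standard `operator.ge` and `operator.le` functions rather than control bits, each resultvec holds one boolean per element instead of 128 bits, and the six-byte second operand `b"Hi 42!"` is a made-up input, not the contents of FIG. 5.

```python
import operator

# Ranges in the style of FIG. 5: (low value, low control), (high value, high control).
RANGES = [
    ((ord("A"), operator.ge), (ord("Z"), operator.le)),
    ((ord("a"), operator.ge), (ord("z"), operator.le)),
    ((ord("0"), operator.ge), (ord("9"), operator.le)),
]


def range_results(second_operand, ranges):
    """Build one result vector per range, with one boolean per element."""
    resultvecs = []
    for (low, low_control), (high, high_control) in ranges:
        resultvecs.append([
            low_control(element, low) and high_control(element, high)
            for element in second_operand
        ])
    return resultvecs


for vec in range_results(b"Hi 42!", RANGES):
    print(vec)
```

In this input, only `H` falls in the first range, only `i` in the second, and `4` and `2` in the third; the space and `!` match nothing.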
Thereafter, a determination is made as to whether there are more pairs from the third operand to load (INQUIRY 412). If there are more pairs to load, processing continues to step 404 where the next pair of characters and controls is loaded and used in the comparison that compares all elements of the second operand to generate another resultvec (i is incremented by 1). Otherwise, processing continues to step 420, where the results from the various comparisons and/or zero searches are compressed. The inputs to the compression results logic are resultvec [ i ]410 and zeroidx 403.
An example pseudo code for the compression result logic is as follows:
As indicated above, to obtain the result to be placed in the first operand, a mask, referred to herein as TEMP, which is the size of the second operand (e.g., 128 bits), is set to zero. Then, for each pair of characters from the third operand, TEMP is ORed with resultvec[i]. (In this example, the loop repeats vec_length (e.g., 16)/2 times; however, since there are only 3 pairs of values in the third operand, the controls in the fourth operand for the remaining 5 pairs of values are set to false. In another example, the loop repeats only for the number of pairs of values in the third operand.) After the comparisons for the example in FIG. 5, TEMP is equal to FFFFFFFFFF00FFFFFFFFFF0000??????.
Then, if a zero search is performed, TEMP is adjusted to reflect the first zero element (if any). That is, the bit of TEMP corresponding to zeroidx for the element size is set to one. For example, if zeroidx is 12 and the element size is 1 byte. Byte 12 is set to one.
If the inversion result (IN) is 1, then TEMP is negated (e.g., zero becomes one, and one becomes zero). Additionally, the resultidx setting is equal to the number of leading zero bytes in TEMP (e.g., in the example above, it is zero because the first value is FF, but if TEMP is negated, the number of leading zero bytes is 5 for the example above).
Then, operand 1 is set based on the value of RT. For example, if RT = 1, operand 1 is set to zero, except for a specified byte (e.g., byte 7), which is set to the minimum of resultidx and zeroidx, where resultidx is the byte index of the first element of the second operand that matches any of the ranges specified by the third and fourth operands (e.g., resultidx is zero in the example above, because H is a match). If RT = 0, operand 1 is set to TEMP.
Further, the condition code is conditionally set. If the condition code set field is on and (ZS = 1 and zeroidx < resultidx), CC = 0; or, if the condition code set field is on and resultidx is less than the size of the second operand (e.g., 16), CC is 0; otherwise, it is set to 3. If the condition code set field is off, the condition code register is not updated.
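The compress results logic described above may be illustrated, as one non-limiting sketch, in Python. The function name `compress_results`, its parameter names, the fixed 1-byte element size, and the use of 0xFF to represent a byte "set to one" are assumptions made for illustration only and are not part of the described instruction; the condition code setting is omitted.

```python
def compress_results(resultvecs, zeroidx, zs, invert, rt,
                     vec_bytes=16, index_byte=7):
    """Combine per-pair comparison masks into a first-operand value.

    resultvecs: list of byte masks, one per pair of the third operand;
                each mask is a list of vec_bytes values (0x00 or 0xFF).
    zeroidx:    byte index of the first zero element (ignored when zs is false).
    Returns (operand1_bytes, resultidx).
    """
    if not zs:
        zeroidx = vec_bytes  # assumption: no zero search means "no zero found"

    # TEMP starts at zero and is ORed with every resultvec[i].
    temp = [0x00] * vec_bytes
    for rv in resultvecs:
        temp = [t | r for t, r in zip(temp, rv)]

    # Zero search: mark the first zero element, if any.
    if zs and zeroidx < vec_bytes:
        temp[zeroidx] = 0xFF

    # Invert result: every bit of TEMP is negated.
    if invert:
        temp = [b ^ 0xFF for b in temp]

    # resultidx is the number of leading zero bytes in TEMP.
    resultidx = next((i for i, b in enumerate(temp) if b != 0), vec_bytes)

    if rt:
        # RT = 1: operand 1 is zero except for one byte holding the index.
        operand1 = [0] * vec_bytes
        operand1[index_byte] = min(resultidx, zeroidx)
        return operand1, resultidx
    # RT = 0: operand 1 is the mask itself.
    return temp, resultidx
```

With two masks covering bytes 0 to 4 and bytes 6 to 10, this sketch yields a resultidx of zero, and a resultidx of 5 when the invert control is set, consistent with the leading zero byte counts discussed above.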
In another embodiment, the zero search is not a condition, but is performed when the "vector string range compare" instruction is executed. Based on or in response to executing the instruction, a zero search is performed and the position of the zero element (e.g., byte index) or the position of the mask and/or first matching element (e.g., byte index) is returned.
To further explain one or more aspects of one embodiment of a "vector string range compare" instruction, reference is made to FIG. 6 and the following description. Referring to FIG. 6, in one example, four vector registers are used, including V1 (600), V2 (602), V3 (604), and V4 (606). V1 is a first operand comprising a number of first elements of equal size; V2 is a second operand comprising a number of second elements of equal size; V3 is a third operand comprising even/odd pairs of third elements, each element being the size of a second element; and V4 is a fourth operand comprising even/odd pairs of fourth elements, each element being the size of a second element.
The V3 elements are compared, from left to right, with the V2 elements under control of the V4 elements. Thus, in one embodiment, the process is as follows:
A. (1) V3 element 0 (even) 620 is compared with the first V2 element 622 (element 0) under control of V4 element 0 (even) 624, where V4 element 0 specifies a condition, such as greater than, less than, etc. For example, A is compared to F using operator W. In addition, V3 element 1 (odd) 626 is compared with the first V2 element 622 (element 0) under control of V4 element 1 (odd) 628, where V4 element 1 specifies a condition, such as greater than, less than, etc. For example, A is also compared to G using operator X.
If a match is encountered, the operation terminates. For example, if both comparisons match, then element 0 of V2 (A) has a match. Otherwise:
B. For each successive pair of elements of V3 and V4, process A(1) above is repeated for V2 element 0 until a match (if any) is found.
If B does not encounter a match, the operations of processes A and B are performed for the next V2 element in sequence (element 1) until a match (if any) is found; and so on.
A result value dependent on the RT option flag is set in V1 (600).
If ZS is 1, each element of V2 is also compared to a zero value. If a zero value is detected in a V2 element before a match is found, a match condition is indicated, and the operation ends.
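As one illustrative sketch of processes A and B above, the following Python function scans the elements of the second operand from left to right and reports the index of the first element that satisfies both conditions of any even/odd pair. The function name `first_range_match`, the string encoding of the controls ("eq", "gt", "lt", "ge", "le"), and the return of the vector length when nothing matches are assumptions made for illustration; the actual control encoding of the fourth operand is not reproduced here.

```python
import operator

# Hypothetical encoding of the per-element controls of the fourth operand.
OPS = {
    "eq": operator.eq,
    "gt": operator.gt,
    "lt": operator.lt,
    "ge": operator.ge,
    "le": operator.le,
}


def first_range_match(v2, v3, v4, zs=False):
    """Return the index of the first matching element of v2.

    v2: sequence of element values (second operand).
    v3: sequence of comparison values, taken as even/odd pairs (third operand).
    v4: sequence of control names, one per v3 element (fourth operand).
    zs: when true, a zero element in v2 also ends the search as a match.
    Returns len(v2) when no element matches.
    """
    for i, elem in enumerate(v2):
        # Zero search: a zero element found before any match ends the operation.
        if zs and elem == 0:
            return i
        # Process A, repeated for each successive pair (process B).
        for j in range(0, len(v3) - 1, 2):
            even_ok = OPS[v4[j]](elem, v3[j])
            odd_ok = OPS[v4[j + 1]](elem, v3[j + 1])
            if even_ok and odd_ok:
                return i
    return len(v2)
```

For example, with the pair ("A", "Z") and the controls ("ge", "le"), the sketch locates the first uppercase letter of a byte string such as `b"hi There"`.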
In one embodiment, there are 32 vector registers, and other types of registers may map to quadrants of the vector registers. For example, as shown in FIG. 7, if there is a register file 700 that includes 32 vector registers 702 and each register is 128 bits in length, 16 floating point registers 704 of 64 bits in length may overlap these vector registers. Thus, as an example, when the floating-point registers 704 are modified, then the vector registers 702 are also modified. Other mappings for other types of registers are possible.
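The overlap of registers described above may be modeled, purely for illustration, as a single backing store that is viewed in two ways. In the following Python sketch, the class name `RegisterFile`, its method names, and the placement of each floating point register over the leftmost bytes of the like-numbered vector register are assumptions; other mappings are possible.

```python
class RegisterFile:
    """Toy model in which floating point registers alias vector registers."""

    def __init__(self, n_vec=32, vec_bytes=16, n_fp=16, fp_bytes=8):
        self.vec_bytes = vec_bytes
        self.n_fp = n_fp
        self.fp_bytes = fp_bytes
        # One backing store shared by both register views.
        self._storage = bytearray(n_vec * vec_bytes)

    def read_vector(self, n):
        """Return the full contents of vector register n."""
        start = n * self.vec_bytes
        return bytes(self._storage[start:start + self.vec_bytes])

    def write_fp(self, n, value):
        """Write floating point register n; the overlapped vector register changes too."""
        if not 0 <= n < self.n_fp or len(value) != self.fp_bytes:
            raise ValueError("bad floating point register number or width")
        # Assumption: FP register n overlays the leftmost bytes of vector register n.
        start = n * self.vec_bytes
        self._storage[start:start + self.fp_bytes] = value
```

Writing floating point register 3 in this model changes the first eight bytes of vector register 3 and leaves the remaining vector registers untouched.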
One embodiment of a "vector string range compare" instruction is described above. The instruction takes a vector of characters as one input, search values as another input, and control information as a third input to indicate how many search values to use. The instruction may be used, for example, to find a string or a specific class of characters in a character set, such as all numbers, all capitals, etc. According to one aspect of the invention, the instruction may search for exclusive or inclusive ranges.
In one embodiment, a "vector string range compare" instruction is provided that specifies a plurality of elements in a second vector register. The elements of the odd and even pairs in the third and fourth vectors specify a range of values. Each range of values is compared to each element of the second vector register. If the elements of the second vector match the range values, it is considered a "match".
Optionally, if another instruction field is set, the location of the first matching element is placed in the first vector register.
Optionally, if another instruction field is set, the bits of each element of the first vector are set for each matching element of the second vector.
Optionally, if another instruction field is set, each element of the second vector register is additionally compared to a zero value, and a match indicates either a range match or a zero match.
Herein, memory, main memory, storage, and main storage are used interchangeably unless explicitly noted otherwise or by context.
Additional details regarding the vector facility (including examples of other instructions) are provided as part of this description further below.
As will be appreciated by one skilled in the art, one or more embodiments of the present invention may be embodied as a system, method or computer program product. Accordingly, one or more embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit," "module," or "system." Furthermore, one or more embodiments of the invention may take the form of a computer program product embodied in one or more computer-readable media having computer-readable program code embodied therein.
Any combination of one or more computer-readable media may be utilized. The computer readable medium may be a computer readable storage medium. For example, a computer readable storage medium may be, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc-read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
Referring now to fig. 8, in one example, a computer program product 800 comprises, for instance, one or more non-transitory computer-readable storage media 802 to store computer-readable program code means or logic 804 thereon to provide and facilitate one or more embodiments of the present invention.
Program code embodied on a computer readable medium may be transmitted using an appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for one or more aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language, assembler or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
One or more aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in these figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of one or more aspects of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition to the foregoing, one or more embodiments of the invention may also be provided, provisioned, deployed, managed, serviced, etc. by a service provider that offers management of customer environments. For example, a service provider can create, maintain, support, etc., computer code and/or computer infrastructure that performs one or more embodiments of the invention for one or more customers. In return, the service provider may collect payment from the customer under a subscription and/or fee agreement, as examples. Additionally or alternatively, the service provider may collect payment from the sale of advertising content to one or more third parties.
In one aspect of the invention, an application program may be deployed to perform one or more aspects of the invention. By way of example, deployment of an application includes providing a computer infrastructure operable to perform one or more aspects of the present invention.
As another aspect of the present invention, a computing infrastructure may be deployed comprising integrating computer-readable code into a computing system, wherein the code in combination with the computing system is capable of performing one or more aspects of the present invention.
As another aspect of the present invention, a process for integrating computing infrastructure may be provided that includes integrating computer readable code into a computer system. The computer system includes a computer-readable medium, wherein the computer medium includes one or more aspects of the present invention. The code in combination with the computer system is capable of performing one or more aspects of the present invention.
While various embodiments are described above, these are only examples. For example, computing environments of other architectures may incorporate and use one or more aspects of the present invention. Additionally, other sizes of registers may be used, and changes to the instructions may be made without departing from the spirit of the present invention. Further, registers other than vector registers may be used and/or the data may be non-character data, such as integer data or other types of data.
In addition, other types of computing environments may benefit from one or more aspects of the present invention. As an example, a data processing system suitable for storing and/or executing program code is available that includes at least two processors coupled directly or indirectly to memory elements through a system bus. These memory elements include, for instance, local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, DASD, tapes, CDs, DVDs, drive and other storage media, etc.) can be coupled to the system either directly or through intervening I/O controllers. Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modem and Ethernet cards are just a few of the available types of network adapters.
Referring to FIG. 9, representative components of a host computer system 5000 to implement one or more aspects of the present invention are depicted. The representative host computer 5000 comprises one or more CPUs 5001 in communication with computer memory (i.e., central storage) 5002, as well as I/O interfaces to storage media devices 5011 and networks 5010 for communicating with other computers or SANs and the like. The CPU 5001 is compliant with an architecture having an architected instruction set and architected functionality. The CPU 5001 may have dynamic address translation (DAT) 5003 for transforming program addresses (virtual addresses) into real addresses of memory. A DAT typically includes a translation lookaside buffer (TLB) 5007 for caching translations so that later accesses to the block of computer memory 5002 do not incur the delay of address translation. Typically, a cache 5009 is employed between the computer memory 5002 and the processor 5001. The cache 5009 may be hierarchical, having a large cache available to more than one CPU and smaller, faster (lower-level) caches between the large cache and each CPU. In some implementations, the lower-level caches are split to provide separate low-level caches for instruction fetching and data accesses. In one embodiment, an instruction is fetched from the memory 5002 by an instruction fetch unit 5004 via the cache 5009. The instruction is decoded in an instruction decode unit 5006 and dispatched (with other instructions, in some embodiments) to one or more instruction execution units 5008. Typically, several execution units 5008 are employed, such as an arithmetic execution unit, a floating point execution unit, and a branch instruction execution unit. The instruction is executed by the execution unit, accessing operands from instruction-specified registers or memory as needed.
If operands are to be accessed (loaded or stored) from memory 5002, the load/store unit 5005 typically handles the access under control of the instruction being executed. The instructions may be executed in hardware circuitry, or in internal microcode (firmware), or by a combination of both.
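The role of the translation lookaside buffer described above, namely retaining a translation so that later accesses to the same block avoid the translation delay, may be sketched as follows. In this Python illustration, the class name `TLB`, the 4096-byte page size, and the `walk` callback standing in for the slow translation step are hypothetical and are chosen only to make the caching behavior visible.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


class TLB:
    """Minimal cache of virtual-page to real-page translations."""

    def __init__(self, walk):
        self._walk = walk      # hypothetical slow translation: virtual page -> real page
        self._entries = {}     # cached translations
        self.misses = 0        # number of times the slow translation was needed

    def translate(self, vaddr):
        """Translate a virtual address into a real address."""
        page, offset = divmod(vaddr, PAGE_SIZE)
        if page not in self._entries:
            # Miss: perform the slow translation once and retain the result.
            self.misses += 1
            self._entries[page] = self._walk(page)
        return self._entries[page] * PAGE_SIZE + offset
```

A second access to the same page is served from the cached entry, so the miss count does not increase.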
As noted, a computer system includes information in local (or main) storage, as well as addressing, protection, and reference and change recording. Some aspects of addressing include the format of addresses, the concept of address spaces, the various types of addresses, and the manner in which one type of address is translated to another type of address. Some of main storage includes permanently assigned storage locations. Main storage provides the system with directly addressable, fast-access storage of data. Both data and programs are to be loaded into main storage (from input devices) before they can be processed.
Main storage may include one or more smaller, faster-access buffer storages, sometimes called caches. A cache is typically physically associated with a CPU or an I/O processor. The effects, except on performance, of the physical construction and use of distinct storage media are generally not observable by the program.
Separate caches may be maintained for instructions and for data operands. Information within a cache is maintained in contiguous bytes on an integral boundary called a cache block or cache line (or line, for short). A model may provide an EXTRACT CACHE ATTRIBUTE instruction, which returns the size of a cache line in bytes. A model may also provide PREFETCH DATA and PREFETCH DATA RELATIVE LONG instructions, which effect the prefetching of storage into the data or instruction cache or the releasing of data from the cache.
Storage is viewed as a long horizontal string of bits. For most operations, accesses to storage proceed in a left-to-right sequence. The string of bits is subdivided into units of eight bits. An eight-bit unit is called a byte, which is the basic building block of all information formats. Each byte location in storage is identified by a unique nonnegative integer, which is the address of that byte location or, simply, the byte address. Adjacent byte locations have consecutive addresses, starting with 0 on the left and proceeding in a left-to-right sequence. Addresses are unsigned binary integers and are 24, 31, or 64 bits.
Information is transferred between storage and a CPU or a channel subsystem one byte, or a group of bytes, at a time. Unless otherwise specified, in, for instance, the z/Architecture, a group of bytes in storage is addressed by the leftmost byte of the group. The number of bytes in the group is either implied or explicitly specified by the operation to be performed. When used in a CPU operation, a group of bytes is called a field. Within each group of bytes, in, for instance, the z/Architecture, bits are numbered in a left-to-right sequence. In the z/Architecture, the leftmost bits are sometimes referred to as the "high-order" bits and the rightmost bits as the "low-order" bits. Bit numbers are not storage addresses, however. Only bytes can be addressed. To operate on individual bits of a byte in storage, the entire byte is accessed. The bits in a byte are numbered 0 through 7, from left to right (in, e.g., the z/Architecture). The bits in an address may be numbered 8 to 31 or 40 to 63 for 24-bit addresses, or 1 to 31 or 33 to 63 for 31-bit addresses; they are numbered 0 to 63 for 64-bit addresses. Within any other fixed-length format of multiple bytes, the bits making up the format are consecutively numbered starting from 0. For purposes of error detection, and preferably for correction, one or more check bits may be transmitted with each byte or with a group of bytes. Such check bits are generated automatically by the machine and cannot be directly controlled by the program. Storage capacities are expressed in number of bytes. When the length of a storage-operand field is implied by the operation code of an instruction, the field is said to have a fixed length, which can be one, two, four, eight, or sixteen bytes. Larger fields may be implied for some instructions. When the length of a storage-operand field is not implied but is stated explicitly, the field is said to have a variable length.
The length of the variable length operand may be varied in increments of one byte (or for some instructions, in increments of multiples of two bytes or other multiples). When information is placed in storage, the contents of only those byte locations included in the indicated field are replaced, even though the width of the physical path to storage may be greater than the length of the field being stored.
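The addressing and bit numbering conventions described above (a field addressed by its leftmost byte, bits of a byte numbered 0 to 7 from left to right, and a store replacing only the bytes of the designated field) may be illustrated by the following Python sketch. The helper names `get_bit`, `read_field`, and `store_field` are hypothetical and are introduced only for this example.

```python
def get_bit(byte_value, bit_number):
    """Return bit bit_number of a byte, where bit 0 is the leftmost (high-order) bit."""
    if not 0 <= bit_number <= 7:
        raise ValueError("bit number must be 0 through 7")
    return (byte_value >> (7 - bit_number)) & 1


def read_field(storage, address, length):
    """Return a field of the given length, addressed by its leftmost byte."""
    return bytes(storage[address:address + length])


def store_field(storage, address, data):
    """Replace only the byte locations included in the designated field."""
    storage[address:address + len(data)] = data
```

In this sketch, the entire byte is fetched before an individual bit is examined, mirroring the statement that only bytes can be addressed.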
Certain units of information are to be on an integral boundary in storage. A boundary is called integral for a unit of information when its storage address is a multiple of the length of the unit in bytes. Special names are given to fields of 2, 4, 8, and 16 bytes on an integral boundary. A halfword is a group of two consecutive bytes on a two-byte boundary and is the basic building block of instructions. A word is a group of four consecutive bytes on a four-byte boundary. A doubleword is a group of eight consecutive bytes on an eight-byte boundary. A quadword is a group of 16 consecutive bytes on a 16-byte boundary. When storage addresses designate halfwords, words, doublewords, and quadwords, the binary representation of the address contains one, two, three, or four rightmost zero bits, respectively. Instructions are to be on two-byte integral boundaries. The storage operands of most instructions do not have boundary-alignment requirements.
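The notion of an integral boundary may be checked with simple arithmetic, as in the following Python sketch. The names `UNIT_LENGTHS`, `is_integral_boundary`, and `largest_aligned_unit` are hypothetical and are used only for illustration.

```python
# Field names and their lengths in bytes, as given in the description.
UNIT_LENGTHS = {"halfword": 2, "word": 4, "doubleword": 8, "quadword": 16}


def is_integral_boundary(address, unit_length):
    """True when the address is a multiple of the unit length in bytes."""
    return address % unit_length == 0


def largest_aligned_unit(address):
    """Return the name of the largest named unit for which the address is integral."""
    best = None
    for name, length in UNIT_LENGTHS.items():  # dict preserves ascending order
        if is_integral_boundary(address, length):
            best = name
    return best
```

Because the unit lengths are powers of two, an address that is integral for a quadword has four rightmost zero bits, which the expression `address & 0xF == 0` tests equivalently.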
On devices that implement separate caches for instructions and data operands, a significant delay may be experienced if the program stores into a cache line from which instructions are subsequently fetched, regardless of whether the store alters the instructions that are subsequently fetched.
In one embodiment, the present invention may be practiced by software (sometimes referred to as licensed internal code, firmware, micro-code, milli-code, pico-code, and the like, any of which would be consistent with one or more aspects of the present invention). Referring to FIG. 9, software program code embodying one or more aspects of the present invention may be accessed by the processor 5001 of the host system 5000 from a long-term storage media device 5011 (such as a CD-ROM drive, tape drive or hard drive). The software program code may be embodied on any of a variety of known media for use with a data processing system, such as a magnetic diskette, hard drive, or CD-ROM. The code may be distributed on such media, or may be distributed to users from the computer memory 5002 or storage of one computer system over the network 5010 to other computer systems for use by users of such other systems.
The software program code includes an operating system which controls the function and interaction of the various computer components and one or more application programs. Program code is typically paged from storage media device 5011 to relatively high speed computer storage 5002 where it is available for processing by processor 5001. Techniques and methods for embodying software program code in memory, on physical media, and/or distributing software program code via networks are well known and will not be discussed further herein. When the program code is created and stored on tangible media, including, but not limited to, electronic memory modules (RAM), flash memory, Compact Discs (CD), DVDs, magnetic tapes, etc., the program code is often referred to as a "computer program product". The computer program product medium is generally readable by a processing circuit, preferably in a computer system, for execution by the processing circuit.
FIG. 10 illustrates a representative workstation or server hardware system in which one or more aspects of the present invention may be practiced. The system 5020 of fig. 10 comprises a representative base computer system 5021 (such as a personal computer, workstation or server), including optional peripheral devices. The base computer system 5021 comprises one or more processors 5026 and a bus to connect the processor 5026 with the other components of the system 5021 and to enable communication between the processor(s) 5026 and the other components of the system 5021 in accordance with known techniques. The bus connects the processor 5026 to memory 5025 and long-term storage 5027, the long-term storage 5027 may comprise, for example, a hard drive (including, for example, any of magnetic media, CD, DVD, and flash memory) or a tape drive. The system 5021 may also include a user interface adapter that connects the microprocessor 5026 via the bus to one or more interface devices, such as a keyboard 5024, a mouse 5023, a printer/scanner 5030, and/or other interface devices, which may be any user interface device such as a touch-sensitive screen, a digital keypad (entry pad), and the like. The bus also connects a display device 5022, such as an LCD screen or monitor, to the microprocessor 5026 via a display adapter.
The system 5021 may communicate with other computers or networks of computers by way of a network adapter capable of communicating (5028) with a network 5029. Example network adapters are communications channels, token ring, Ethernet or modems. Alternatively, the system 5021 may communicate using a wireless interface, such as a CDPD (cellular digital packet data) card. The system 5021 can be associated with such other computers in a Local Area Network (LAN) or a Wide Area Network (WAN), or the system 5021 can be a client in a client/server arrangement with another computer, or the like. All of these configurations, as well as appropriate communication hardware and software, are known in the art.
Fig. 11 illustrates a data processing network 5040 in which one or more aspects of the present invention may be practiced. Data processing network 5040 may include multiple separate networks (such as wireless networks and wired networks), each of which may include multiple clients 5041, 5042, 5043, 5044. Additionally, one or more LANs may be included, where a LAN may include a plurality of intelligent workstations coupled to the host processor, as will be appreciated by those skilled in the art.
Still referring to FIG. 11, the network may also include mainframe computers or servers, such as a gateway computer (client server 5046) or application server (remote server 5048, which may access a data repository and may also be accessed directly from client 5045). The gateway computer 5046 acts as an entry point to each individual network. A gateway is needed when connecting one networking protocol to another. The gateway 5046 may be coupled to another network (e.g., the Internet 5047), preferably by way of a communication link. The gateway 5046 may also be directly coupled to one or more clients 5041, 5042, 5043, 5044 using a communications link. The gateway computer may be implemented using an IBM eServer System z server available from International Business Machines Corporation.
Referring concurrently to fig. 10 and 11, software program code which may embody one or more embodiments of the present invention may be accessed by the processor 5026 of the system 5020 from long-term storage media 5027, such as a CD-ROM drive or hard drive. The software program code may be embodied on any of a variety of known media for use with a data processing system such as a diskette, hard drive, or CD-ROM. Program code may be distributed on such media, or may be distributed to users 5050, 5051 from the memory or storage of one computer system to other computer systems via a network, for use by users of such other systems.
Alternatively, the program code may be embodied in the memory 5025 and accessed by the processor 5026 using a processor bus. This program code includes an operating system which controls the function and interaction of the various computer components and one or more application programs 5032. The program code is typically paged from the storage medium 5027 to high speed memory 5025 where it is available for processing by the processor 5026. Techniques and methods for embodying software program code in memory, on physical media, and/or distributing software program code via networks are well known and will not be discussed further herein. When the program code is created and stored on tangible media, including, but not limited to, electronic memory modules (RAM), flash memory, Compact Discs (CD), DVDs, magnetic tapes, etc., the program code is often referred to as a "computer program product". The computer program product medium is generally readable by a processing circuit, preferably in a computer system, for execution by the processing circuit.
The cache that is most readily available to the processor (typically faster and smaller than the other caches of the processor) is the lowest level (L1 or level one) cache, and the main storage (main memory) is the highest level cache (L3 if there are 3 levels). The lowest level cache is often divided into an instruction cache (I-cache) that holds the machine instructions to be executed and a data cache (D-cache) that holds the data operands.
Referring to FIG. 12, an exemplary processor embodiment is depicted for the processor 5026. Typically, one or more levels of cache 5053 are employed to buffer memory blocks in order to improve processor performance. The cache 5053 is a high-speed buffer holding cache lines of memory data that are likely to be used. Typical cache lines are 64, 128, or 256 bytes of memory data. Separate caches are often employed for caching instructions and for caching data. Cache coherence (synchronization of copies of lines in memory and the caches) is often provided by various "snoop" algorithms well known in the art. The main memory storage 5025 of a processor system is often referred to as a cache. In a processor system having 4 levels of cache 5053, the main storage 5025 is sometimes referred to as the level 5 (L5) cache, since it is typically faster and holds only a portion of the non-volatile storage (DASD, tape, etc.) that is available to the computer system. The main storage 5025 "caches" pages of data paged in and out of the main storage 5025 by the operating system.
A program counter (instruction counter) 5061 keeps track of the address of the current instruction to be executed. The program counter in a z/Architecture processor is 64 bits and can be truncated to 31 or 24 bits to support prior addressing limits. The program counter is typically embodied in the PSW (program status word) of the computer such that it persists during context switching. Thus, a program in progress, having a program counter value, may be interrupted by, for example, the operating system (a context switch from the program environment to the operating system environment). The PSW of the program maintains the program counter value while the program is not active, and the program counter (in the PSW) of the operating system is used while the operating system is executing. Typically, the program counter is incremented by an amount equal to the number of bytes of the current instruction. RISC (reduced instruction set computing) instructions are typically of fixed length, while CISC (complex instruction set computing) instructions are typically of variable length. Instructions of the IBM z/Architecture are CISC instructions having a length of 2, 4, or 6 bytes. The program counter 5061 is modified by, for example, a context switch operation or a branch taken operation of a branch instruction. In a context switch operation, the current program counter value is saved in the program status word along with other state information about the program being executed (such as condition codes), and a new program counter value is loaded to point to an instruction of a new program module to be executed. A branch taken operation is performed in order to permit the program to make decisions or loop within the program by loading the result of the branch instruction into the program counter 5061.
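The incrementing of the program counter by the length of the current instruction may be sketched as follows. This Python illustration relies on the z/Architecture convention, which is not stated in the passage above, that the two leftmost bits of the first operation code byte indicate the instruction length (00 for 2 bytes, 01 or 10 for 4 bytes, 11 for 6 bytes). The function names and the modeling of truncation as a simple mask are assumptions made for illustration.

```python
def instruction_length(first_byte):
    """Return the instruction length in bytes from the first opcode byte."""
    top_two_bits = (first_byte >> 6) & 0b11
    return {0b00: 2, 0b01: 4, 0b10: 4, 0b11: 6}[top_two_bits]


def next_sequential_address(pc, first_byte, address_bits=64):
    """Advance the program counter past the current instruction.

    address_bits models the 24-, 31-, or 64-bit width of the program
    counter; masking is an assumed simplification of the truncation.
    """
    mask = (1 << address_bits) - 1
    return (pc + instruction_length(first_byte)) & mask
```

For instance, an instruction whose first byte is 0xE7 is six bytes long in this sketch, so a program counter of 0x1000 advances to 0x1006.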
Typically, an instruction fetch unit 5055 is used to fetch instructions on behalf of the processor 5026. The fetch unit fetches either the "next sequential instruction", the target instruction of a branch taken instruction, or the first instruction of a program following a context switch. Modern instruction fetch units often use prefetch techniques to speculatively prefetch instructions based on the likelihood that the prefetched instructions will be used. For example, a fetch unit may fetch 16 bytes of instruction data that include the next sequential instruction and additional bytes of further sequential instructions.
The fetched instructions are then executed by the processor 5026. In one embodiment, the fetched instruction(s) are passed to a dispatch unit 5056 of the fetch unit. The dispatch unit decodes the instruction(s) and forwards information about the decoded instruction(s) to the appropriate units 5057, 5058, 5060. The execution unit 5057 will typically receive information about decoded arithmetic instructions from the instruction fetch unit 5055, and will perform arithmetic operations on operands according to the opcode of the instruction. Operands are provided to the execution unit 5057, preferably from storage 5025, from architected registers 5059, or from an immediate field of the instruction being executed. Results of the execution, when stored, are stored in storage 5025, registers 5059, or other machine hardware (such as control registers, PSW registers, etc.).
Processor 5026 typically has one or more units 5057, 5058, 5060 for executing the function of the instruction. Referring to FIG. 13A, an execution unit 5057 may communicate with architected general registers 5059, a decode/dispatch unit 5056, a load store unit 5060, and other processor units 5065 by way of interfacing logic 5071. The execution unit 5057 may use several register circuits 5067, 5068, 5069 to hold information that the arithmetic logic unit (ALU) 5066 will operate on. The ALU performs arithmetic operations (such as add, subtract, multiply, and divide) as well as logical functions (such as AND, OR, exclusive-OR (XOR), rotate, and shift). Preferably, the ALU supports specialized operations that are design dependent. Other circuits may provide other architected facilities 5072, including, for example, condition codes and recovery support logic. Typically, the result of an ALU operation is held in an output register circuit 5070, which can forward the result to a variety of other processing functions. There are many arrangements of processor units, and the present description is intended only to provide a representative understanding of one embodiment.
An "add" instruction, for example, would be executed in an execution unit 5057 having arithmetic and logical functionality, while a floating point instruction, for example, would be executed in a floating point execution unit having specialized floating point capability. Preferably, an execution unit operates on the operands identified by an instruction by performing an opcode-defined function on the operands. For example, an "add" instruction may be executed by the execution unit 5057 on operands found in two registers 5059 identified by register fields of the instruction.
The execution unit 5057 performs the arithmetic addition on two operands and stores the result in a third operand, where the third operand may be a third register or one of the two source registers. The execution unit preferably utilizes an arithmetic logic unit (ALU) 5066 that is capable of performing a variety of logical functions, such as shift, rotate, AND, OR, and XOR, as well as a variety of algebraic functions, including any of add, subtract, multiply, and divide. Some ALUs 5066 are designed for scalar operations and some for floating point operations. Depending on the architecture, data may be big endian (where the least significant byte is at the highest byte address) or little endian (where the least significant byte is at the lowest byte address). The IBM z/Architecture is big endian. Depending on the architecture, signed fields may be sign and magnitude, 1's complement, or 2's complement. A 2's complement number is advantageous in that the ALU does not need to implement a separate subtract capability, since either a negative value or a positive value in 2's complement requires only an addition within the ALU. Numbers are commonly described in shorthand, where a 12-bit field defines an address of a 4,096-byte block and is commonly described as, for example, a 4 Kbyte block.
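As a minimal sketch of the 2's complement remark above, subtraction can be modeled as the addition of a negated operand within a fixed field width. The function names and the 8-bit default width are hypothetical choices made for the illustration; the last function shows the big endian byte order using Python's built-in `int.to_bytes`.

```python
def twos_complement(value: int, bits: int = 8) -> int:
    """Negate a value within a fixed-width field (invert all bits, then add one)."""
    mask = (1 << bits) - 1
    return (~value + 1) & mask


def subtract_via_add(a: int, b: int, bits: int = 8) -> int:
    """Compute a - b using only an addition, as a 2's complement ALU may do."""
    mask = (1 << bits) - 1
    return (a + twos_complement(b, bits)) & mask


def big_endian_bytes(value: int, length: int) -> bytes:
    """Lay out a value with its most significant byte at the lowest address."""
    return value.to_bytes(length, "big")
```

In an 8-bit field, `subtract_via_add(5, 7)` yields `0xFE`, which is the 2's complement encoding of -2.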
Referring to FIG. 13B, branch instruction information for executing a branch instruction is typically sent to a branch unit 5058, which often employs a branch prediction algorithm (such as a branch history table 5082) to predict the outcome of the branch before other conditional operations are complete. The target of the current branch instruction is fetched and speculatively executed before the conditional operations are complete. When the conditional operations are completed, the speculatively executed branch instructions are either completed or discarded based on the conditions of the conditional operation and the speculated outcome. A typical branch instruction may test condition codes and branch to a target address if the condition codes meet the branch requirement of the branch instruction. The target address may be calculated from several numbers, including, for example, numbers found in register fields or in an immediate field of the instruction. The branch unit 5058 may employ an ALU 5074 having a plurality of input register circuits 5075, 5076, 5077 and an output register circuit 5080. The branch unit 5058 may communicate with, for example, general registers 5059, the decode/dispatch unit 5056, or other circuits 5073.
Execution of a group of instructions may be interrupted for a variety of reasons including, for example: a context switch initiated by the operating system, a program exception or error causing a context switch, an I/O interruption signal causing a context switch, or multi-threading activity of a plurality of programs (in a multi-threaded environment). Preferably, a context switch action saves state information about the currently executing program and then loads state information about another program being invoked. The state information may be saved, for example, in hardware registers or in memory. The state information preferably comprises a program counter value pointing to the next instruction to be executed, condition codes, memory translation information, and architected register content. A context switch activity may be exercised by hardware circuits, application programs, operating system programs, or firmware code (microcode, pico-code, or licensed internal code (LIC)), alone or in combination.
A processor accesses operands according to instruction-defined methods. The instruction may provide an immediate operand using the value of a portion of the instruction, or may provide one or more register fields that explicitly point to either general purpose registers or special purpose registers (e.g., floating point registers). The instruction may utilize implied registers identified by an opcode field as operands. The instruction may utilize memory locations for operands. A memory location of an operand may be provided by a register, an immediate field, or a combination of registers and an immediate field, as exemplified by the z/Architecture long displacement facility, in which the instruction defines, for example, a base register, an index register, and an immediate field (displacement field) that are added together to provide the address of the operand in memory. Unless otherwise indicated, a location herein typically implies a location in main memory (main storage).
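The base, index, and displacement addition described above can be sketched as follows. This is an illustration under stated assumptions, not a definition of the facility: the function name is hypothetical, the register file is modeled as a plain list, and a register field of zero is assumed to contribute zero rather than the content of register 0.

```python
def effective_address(regs, base: int, index: int, displacement: int,
                      addressing_bits: int = 64) -> int:
    """Add base register, index register and displacement to form an operand address.

    Assumption: a base or index field of 0 means "no register" and contributes zero.
    The displacement may be negative (a signed immediate field).
    """
    base_value = regs[base] if base else 0
    index_value = regs[index] if index else 0
    mask = (1 << addressing_bits) - 1
    return (base_value + index_value + displacement) & mask
```

For example, with register 1 holding `0x1000` and register 2 holding `0x20`, a displacement of 8 gives the operand address `0x1028`.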
Referring to FIG. 13C, a processor accesses storage using a load/store unit 5060. The load/store unit 5060 may perform a load operation by obtaining the address of the target operand in memory 5053 and loading the operand into a register 5059 or another memory 5053 location, or may perform a store operation by obtaining the address of the target operand in memory 5053 and storing data obtained from a register 5059 or another memory 5053 location in the target operand location in memory 5053. The load/store unit 5060 may be speculative and may access memory in a sequence that is out of order relative to the instruction sequence; however, the load/store unit 5060 maintains the appearance to programs that instructions were executed in order. The load/store unit 5060 may communicate with general registers 5059, the decode/dispatch unit 5056, the cache/memory interface 5053, or other elements 5083, and comprises various register circuits, ALUs 5085, and control logic 5090 to calculate storage addresses and to provide pipeline sequencing that keeps operations in order. Some operations may be performed out of order, but the load/store unit provides functionality that makes the out-of-order operations appear to the program as having been performed in order, as is well known in the art.
Preferably, the addresses that an application program "sees" are referred to as virtual addresses. Virtual addresses are sometimes also referred to as "logical addresses" and "effective addresses". These addresses are virtual in that they are redirected to a physical memory location by one of a variety of dynamic address translation (DAT) techniques. Such techniques include, but are not limited to, simply prefixing a virtual address with an offset value, and translating the virtual address via one or more translation tables. The translation tables preferably comprise at least a segment table and a page table, alone or in combination, with the segment table preferably having an entry pointing to the page table. In the z/Architecture, a hierarchy of translation is provided, including a region first table, a region second table, a region third table, a segment table, and an optional page table. The performance of address translation is often improved by utilizing a translation lookaside buffer (TLB), which comprises entries that map a virtual address to an associated physical memory location. The entries are created when the DAT translates a virtual address using the translation tables. Subsequent use of the virtual address can then utilize the entry of the fast TLB rather than the slow sequential translation table accesses. TLB content may be managed by a variety of replacement algorithms, including LRU (least recently used).
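The interplay between the translation tables and the TLB can be sketched as follows. This is a simplified illustration under stated assumptions: a single flat page table (a dictionary from virtual page number to physical frame number) stands in for the multi-level translation hierarchy, the page size is assumed to be 4,096 bytes, and the class and attribute names are hypothetical.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size for the illustration


class TLB:
    """A small translation lookaside buffer with LRU replacement."""

    def __init__(self, capacity: int, page_table: dict):
        self.capacity = capacity
        self.page_table = page_table  # stand-in for the slow table walk
        self.entries = OrderedDict()  # virtual page -> physical frame, oldest first
        self.hits = 0
        self.misses = 0

    def translate(self, virtual_address: int) -> int:
        page, offset = divmod(virtual_address, PAGE_SIZE)
        if page in self.entries:
            self.hits += 1
            self.entries.move_to_end(page)  # mark as most recently used
        else:
            self.misses += 1
            self.entries[page] = self.page_table[page]  # create entry after the walk
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict the least recently used entry
        return self.entries[page] * PAGE_SIZE + offset
```

A repeated access to the same page is served from `entries` without consulting `page_table`, which is the performance benefit the paragraph above describes.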
In the case where the processor is a processor of a multi-processor system, each processor has the responsibility of keeping common resources such as I/O, cache, TLB, and memory interlocked to achieve coherency. Typically, a "snooping" technique will be utilized in maintaining cache coherency. In a snooping environment, each cache line may be marked as being in any one of the following states in order to facilitate sharing: shared state, exclusive state, changed state, invalid state, etc.
An I/O unit 5054 (FIG. 12) provides the processor with means for attaching to peripheral devices, including, for example, tapes, optical disks, printers, displays, and networks. I/O units are often presented to the computer program by software drivers. In mainframes (such as System z from IBM), channel adapters and open system adapters are I/O units of the mainframe that provide the communications between the operating system and peripheral devices.
In addition, other types of computing environments may benefit from one or more aspects of the present invention. As an example, an environment may include an emulator (e.g., software or other emulation mechanisms) in which a particular architecture (including, for example, instruction execution, architected functions such as address translation, and architected registers) or a subset thereof is emulated (e.g., on a native computer system having a processor and memory). In such an environment, one or more emulation functions of the emulator can implement one or more aspects of the present invention, even though the computer executing the emulator may have a different architecture than the capabilities being emulated. As one example, in emulation mode, the specific instruction or operation being emulated is decoded, and an appropriate emulation function is built to implement the individual instruction or operation.
In an emulation environment, a host computer includes, for example: a memory that stores instructions and data; an instruction fetch unit that fetches instructions from memory and optionally provides local buffering for the fetched instructions; an instruction decode unit that receives the fetched instructions and determines the type of instructions that have been fetched; and an instruction execution unit that executes the instructions. Execution may include loading data from memory into a register, storing data from a register back to memory, or performing some type of arithmetic or logical operation (as determined by the decode unit). In one example, each unit is implemented in software. For example, the operations being performed by these units are implemented as one or more subroutines within emulator software.
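The fetch, decode, and execute units implemented as subroutines can be sketched as follows. The three-instruction guest instruction set (`LOAD`, `STORE`, `ADD`), the four-register file, and the function name are hypothetical and were chosen only to keep the illustration short; they do not correspond to any architected instruction set.

```python
def emulate(program, memory):
    """Run a toy guest program; each instruction is a tuple (opcode, operand, operand)."""
    regs = [0] * 4
    pc = 0  # emulated program counter, tracks instruction boundaries

    def load(reg, address):
        regs[reg] = memory[address]

    def store(reg, address):
        memory[address] = regs[reg]

    def add(reg1, reg2):
        regs[reg1] = regs[reg1] + regs[reg2]

    # One subroutine per guest instruction, selected by the decode step.
    handlers = {"LOAD": load, "STORE": store, "ADD": add}

    while pc < len(program):
        opcode, *operands = program[pc]  # fetch
        handler = handlers[opcode]       # decode
        handler(*operands)               # execute
        pc += 1
    return regs
```

A program that loads two memory locations, adds them, and stores the sum exercises all three kinds of execution named above.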
More particularly, in a mainframe, architected machine instructions are used by programmers (today usually "C" programmers), often by way of a compiler application. These instructions, stored in the storage medium, may be executed natively in a z/Architecture IBM server, or alternatively in machines executing other architectures. They can be emulated in existing and future IBM mainframe servers and on other IBM machines (e.g., Power Systems servers and System x servers). They can be executed in machines running Linux on a wide variety of machines using hardware manufactured by IBM, Intel, AMD, and others. Besides execution on that hardware under the z/Architecture, Linux can be used, as well as machines that use emulation by Hercules, UMX, or FSI (Fundamental Software, Inc.), where execution is generally in an emulation mode. In emulation mode, emulation software is executed by a native processor to emulate the architecture of an emulated processor.
The native processor typically executes emulation software, comprising either firmware or a native operating system, to perform emulation of the emulated processor. The emulation software is responsible for fetching and executing instructions of the emulated processor architecture. The emulation software maintains an emulated program counter to keep track of instruction boundaries. The emulation software may fetch one or more emulated machine instructions at a time and convert them to a corresponding group of native machine instructions for execution by the native processor. These converted instructions may be cached so that a faster conversion can be accomplished. Notwithstanding, the emulation software must maintain the architecture rules of the emulated processor architecture in order to ensure that operating systems and applications written for the emulated processor operate correctly. Furthermore, the emulation software must provide the resources identified by the emulated processor architecture, including, but not limited to, control registers, general purpose registers, floating point registers, dynamic address translation functions including, for example, segment tables and page tables, interrupt mechanisms, context switch mechanisms, time of day (TOD) clocks, and architected interfaces to I/O subsystems, so that an operating system or an application program designed to run on the emulated processor can run on the native processor having the emulation software.
A specific instruction being emulated is decoded, and a subroutine is called to perform the function of the individual instruction. An emulation software function emulating a function of an emulated processor is implemented, for example, in a "C" subroutine or driver, or by some other method of providing a driver for the specific hardware, as will be within the skill of those in the art after understanding the description of the preferred embodiment. Various software and hardware emulation patents describe a variety of known ways to achieve emulation of an instruction format architected for a different machine on a target machine available to those skilled in the art. These include, but are not limited to: U.S. Letters Patent No. 5,551,013, entitled "Multiprocessor for Hardware Emulation", by Beausoleil et al.; U.S. Letters Patent No. 6,009,261, entitled "Preprocessing of Stored Target Routines for Emulating Incompatible Instructions on a Target Processor", by Scalzi et al.; U.S. Letters Patent No. 5,574,873, entitled "Decoding Guest Instruction to Directly Access Emulation Routines that Emulate the Guest Instructions", by Davidian et al.; U.S. Letters Patent No. 6,308,255, entitled "Symmetrical Multiprocessing Bus and Chipset Used for Coprocessor Support Allowing Non-Native Code to Run in a System", by Gorishek et al.; U.S. Letters Patent No. 6,463,582, entitled "Dynamic Optimizing Object Code Translator for Architecture Emulation and Dynamic Optimizing Object Code Translation Method", by Lethin et al.; U.S. Letters Patent No. 5,790,825, entitled "Method for Emulating Guest Instructions on a Host Computer Through Dynamic Recompilation of Host Instructions", by Eric Traut; and many others.
In FIG. 14, an example of an emulated host computer system 5092 is provided that emulates a host computer system 5000' of a host architecture. In the emulated host computer system 5092, the host processor (CPU) 5091 is an emulated host processor (or virtual host processor) and comprises an emulation processor 5093 having a native instruction set architecture different from that of the processor 5091 of the host computer 5000'. The emulated host computer system 5092 has memory 5094 accessible to the emulation processor 5093. In the example embodiment, the memory 5094 is partitioned into a host computer memory 5096 portion and an emulation routines 5097 portion. The host computer memory 5096 is available to programs of the emulated host computer 5092 according to the host computer architecture. The emulation processor 5093 executes native instructions of an architected instruction set of an architecture other than that of the emulated processor 5091, the native instructions being obtained from the emulation routines memory 5097. The emulation processor 5093 may access a host instruction for execution from a program in the host computer memory 5096 by employing one or more instructions obtained in a sequence and access/decode routine, which may decode the host instruction(s) accessed to determine a native instruction execution routine for emulating the function of the host instruction accessed. Other facilities that are defined for the architecture of the host computer system 5000' may be emulated by architected facilities routines, including such facilities as general purpose registers, control registers, dynamic address translation, I/O subsystem support, and processor cache, for example. The emulation routines may also take advantage of functions available in the emulation processor 5093 (such as general registers and dynamic translation of virtual addresses) to improve the performance of the emulation routines. Special hardware and off-load engines may also be provided to assist the processor 5093 in emulating the function of the host computer 5000'.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below, if any, are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of one or more aspects of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
Vector string instructions
Vector string facility
Instructions
Unless otherwise specified, all operands are vector register operands. A "V" in the assembler syntax indicates a vector operand.
Vector find any equal
Proceeding from left to right, every unsigned binary integer element of the second operand is compared for equality with each unsigned binary integer element of the third operand and, if the Zero Search flag is set in the M5 field, optionally with zero.
If the Result Type (RT) flag in the M5 field is zero, then for each element in the second operand that matches any element in the third operand, or optionally zero, the bit positions of the corresponding element in the first operand are set to ones; otherwise they are set to zeros.
If the Result Type (RT) flag in the M5 field is one, the byte index of the leftmost element in the second operand that matches an element in the third operand, or zero, is stored in byte seven of the first operand.
Each instruction has an extended mnemonic section that describes the recommended extended mnemonic and its corresponding machine assembler syntax.
Programming note: For all instructions that optionally set the condition code, performance may be degraded if the condition code is set.
If the Result Type (RT) flag in the M5 field is one and no bytes are found to be equal, or zero if the Zero Search flag is set, an index equal to the number of bytes in the vector is stored in byte seven of the first operand.
The M4 field specifies the element size control (ES). The ES control specifies the size of the elements in the vector register operands. If a reserved value is specified, a specification exception is recognized.
0 - Byte
1 - Halfword
2 - Word
3 to 15 - Reserved
The M5 field has the following format:
The bits of the M5 field are defined as follows:
Result Type (RT): If zero, each resulting element is a mask of all range comparisons on that element. If one, a byte index is stored into byte seven of the first operand, and zeros are stored in all other elements.
Zero Search (ZS): If one, each element of the second operand is also compared to zero.
Condition Code Set (CC): If zero, the condition code is not set and remains unchanged. If one, the condition code is set as specified in the following section.
Special conditions
A specification exception is recognized and no other action is taken if any of the following occurs:
1. The M4 field contains a value from 3 to 15.
2. Bit 0 of the M5 field is not zero.
Resulting condition code:
If the CC flag is zero, the code remains unchanged.
If the CC flag is one, the code is set as follows:
0 If the ZS bit is set, there were no matches in a lower-indexed element than zero in the second operand.
1 Some elements of the second operand match at least one element in the third operand.
2 All elements of the second operand match at least one element in the third operand.
3 No elements in the second operand match any elements in the third operand.
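The data result of the find-any-equal operation described above can be sketched as follows. This is an illustrative software model under stated assumptions, not the architected implementation: vector registers are modeled as 16-byte `bytes` objects, elements are taken in left-to-right (big endian) order, the function names are hypothetical, and the condition code is deliberately left out of the sketch.

```python
def split_elements(vector: bytes, es: int):
    """Split a vector into elements of 1, 2 or 4 bytes (ES = 0, 1 or 2)."""
    size = 1 << es
    return [vector[i:i + size] for i in range(0, len(vector), size)]


def vector_find_any_equal(op2: bytes, op3: bytes, es: int = 0,
                          rt: bool = True, zs: bool = False) -> bytes:
    """Return the first-operand result for the RT and ZS controls given."""
    if es not in (0, 1, 2):
        raise ValueError("reserved ES value (specification exception)")
    size = 1 << es
    candidates = set(split_elements(op3, es))
    zero = bytes(size)
    # An element matches if it equals any third-operand element, or zero when ZS is set.
    matches = [e in candidates or (zs and e == zero) for e in split_elements(op2, es)]
    if not rt:
        # RT = 0: all bits of a matching element are ones, otherwise zeros.
        return b"".join((b"\xff" if m else b"\x00") * size for m in matches)
    # RT = 1: byte index of the leftmost match, or the vector size if none, in byte seven.
    index = next((i * size for i, m in enumerate(matches) if m), len(op2))
    result = bytearray(len(op2))
    result[7] = index
    return bytes(result)
```

For example, searching the string `hello, world` for a comma with RT set to one leaves 5 in byte seven; with ZS also set and no comma in the third operand, byte seven holds the index of the first zero byte.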
Program exceptions:
Data with DXC FE, vector register
Operation if the vector-extension facility is not installed
Specification (reserved ES value)
Transaction constraint
Extended mnemonics:
Vector find element equal
Proceeding from left to right, the unsigned binary integer elements of the second operand are compared with the corresponding unsigned binary integer elements of the third operand. If two elements are equal, the byte index of the first byte of the leftmost equal element is placed in byte seven of the first operand. Zeros are stored in the remaining bytes of the first operand. If no bytes are found to be equal, or no bytes are found to be zero if the zero compare is set, then an index equal to the number of bytes in the vector is stored in byte seven of the first operand. Zeros are stored in the remaining bytes.
If the Zero Search (ZS) bit is set in the M5 field, each element in the second operand is also compared for equality with zero. If a zero element is found in the second operand before any other elements of the second and third operands are found to be equal, the byte index of the first byte of the element found to be zero is stored in byte seven of the first operand, and zeros are stored in all other byte locations. If the Condition Code Set (CC) flag is one, then the condition code is set to zero.
The M4 field specifies the element size control (ES). The ES control specifies the size of the elements in the vector register operands. If a reserved value is specified, a specification exception is recognized.
0 - Byte
1 - Halfword
2 - Word
3 to 15 - Reserved
The M5 field has the following format:
The bits of the M5 field are defined as follows:
Reserved: Bits 0 to 1 are reserved and must be zero. Otherwise, a specification exception is recognized.
Zero Search (ZS): If one, each element of the second operand is also compared to zero.
Condition Code Set (CC): If zero, the condition code remains unchanged. If one, the condition code is set as specified in the following section.
Special conditions
A specification exception is recognized and no other action is taken if any of the following occurs:
1. The M4 field contains a value from 3 to 15.
2. Bits 0 to 1 of the M5 field are not zero.
Resulting condition code:
If bit 3 of the M5 field is set to one, the code is set as follows:
0 If the zero compare bit is set, the comparison detected a zero element in the second operand in an element with a smaller index than any equal compares.
1 The comparison detected a match between the second and third operands in some element. If the zero compare bit is set, this match occurred in an element with an index less than or equal to that of the zero compare element.
2 --
3 No elements compared equal.
If bit 3 of the M5 field is zero, the code remains unchanged.
Program exceptions:
Data with DXC FE, vector register
Operation if the vector-extension facility is not installed
Specification (reserved ES value)
Transaction constraint
Extended mnemonics:
Programming notes:
1. A byte index is always stored into the first operand for any element size. For example, if the element size is set to halfword and the halfword with index 2 compares equal, a byte index of 4 is stored.
2. The third operand should not contain elements with a value of zero. If the third operand does contain a zero and it matches a zero element in the second operand before any other equal comparison, condition code one is set regardless of the zero compare bit setting.
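The find-element-equal behavior, including the two programming notes, can be sketched as follows. This is an illustrative model under stated assumptions: vectors are 16-byte `bytes` objects, the function name is hypothetical, and the equality test is applied before the zero test at each element so that an equal comparison at the same index as a zero element yields condition code 1, which is one reading of the condition code descriptions above.

```python
def vector_find_element_equal(op2: bytes, op3: bytes, es: int = 0,
                              zs: bool = False, cc: bool = False):
    """Return (first-operand result, condition code or None when CC is zero)."""
    if es not in (0, 1, 2):
        raise ValueError("reserved ES value (specification exception)")
    size = 1 << es
    index, code = len(op2), 3  # defaults: nothing found, index equals the vector size
    for i in range(0, len(op2), size):
        a, b = op2[i:i + size], op3[i:i + size]
        if a == b:
            index, code = i, 1  # leftmost equal element
            break
        if zs and a == bytes(size):
            index, code = i, 0  # zero element found before any equal element
            break
    result = bytearray(len(op2))
    result[7] = index  # a byte index is stored regardless of the element size
    return bytes(result), (code if cc else None)
```

With halfword elements, an equal comparison at element index 2 stores byte index 4, as stated in the first programming note.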
Vector find element not equal
Proceeding from left to right, the unsigned binary integer elements of the second operand are compared with the corresponding unsigned binary integer elements of the third operand. If two elements are not equal, the byte index of the leftmost non-equal element is placed in byte seven of the first operand, and zeros are stored in all other bytes. If the Condition Code Set (CC) bit in the M5 field is set to one, the condition code is set to indicate which operand was greater. If all elements were equal, a byte index equal to the vector size is placed in byte seven of the first operand, and zeros are placed in all other byte locations. If the CC bit is one, condition code three is set.
If the Zero Search (ZS) bit is set in the M5 field, each element in the second operand is also compared for equality with zero. If a zero element is found in the second operand before any other element of the second operand is found to be unequal, the byte index of the first byte of the element found to be zero is stored in byte seven of the first operand. Zeros are stored in all other bytes, and condition code 0 is set.
The M4 field specifies the element size control (ES). The ES control specifies the size of the elements in the vector register operands. If a reserved value is specified, a specification exception is recognized.
0 - Byte
1 - Halfword
2 - Word
3 to 15 - Reserved
The M5 field has the following format:
The bits of the M5 field are defined as follows:
Zero Search (ZS): If one, each element of the second operand is also compared to zero.
Condition Code Set (CC): If zero, the condition code is not set and remains unchanged. If one, the condition code is set as specified in the following section.
Special conditions
A specification exception is recognized and no other action is taken if any of the following occurs:
1. The M4 field contains a value from 3 to 15.
2. Bits 0 to 1 of the M5 field are not zero.
Resulting condition code:
If bit 3 of the M5 field is set to one, the code is set as follows:
0 If the zero compare bit is set, the comparison detected a zero element in both operands in a lower-indexed element than any unequal compares.
1 An element mismatch was detected and the element in VR2 is less than the element in VR3.
2 An element mismatch was detected and the element in VR2 is greater than the element in VR3.
3 All elements compared equal and, if the zero compare bit is set, no zero elements were found in the second operand.
If bit 3 of the M5 field is zero, the code remains unchanged.
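The find-element-not-equal behavior and its condition codes can be sketched as follows, which also shows how the instruction resembles a string comparison that stops at a terminating zero. This is an illustrative model under stated assumptions: vectors are 16-byte `bytes` objects, elements are compared as unsigned big endian integers, the function name is hypothetical, and a mismatch at a given element is assumed to take priority over a zero at that same element.

```python
def vector_find_element_not_equal(op2: bytes, op3: bytes, es: int = 0,
                                  zs: bool = False, cc: bool = False):
    """Return (first-operand result, condition code or None when CC is zero)."""
    if es not in (0, 1, 2):
        raise ValueError("reserved ES value (specification exception)")
    size = 1 << es
    index, code = len(op2), 3  # defaults: all elements equal
    for i in range(0, len(op2), size):
        a = int.from_bytes(op2[i:i + size], "big")
        b = int.from_bytes(op3[i:i + size], "big")
        if a != b:
            index, code = i, (1 if a < b else 2)  # which operand element is larger
            break
        if zs and a == 0:
            index, code = i, 0  # zero in both operands before any mismatch
            break
    result = bytearray(len(op2))
    result[7] = index
    return bytes(result), (code if cc else None)
```

Comparing `abc` with `abd` (each padded with zeros) with ZS set reports byte index 2 and condition code 1, while two identical zero-terminated strings report the index of the terminator and condition code 0.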
Program exceptions:
Data with DXC FE, vector register
Operation if the vector-extension facility is not installed
Specification (reserved ES value)
Transaction constraint
Extended mnemonics:
Vector string range compare
Proceeding from left to right, the unsigned binary integer elements in the second operand are compared to ranges of values defined by even-odd pairs of elements in the third and fourth operands. The values from the third operand, combined with the control values from the fourth operand, define the ranges of comparisons to be performed. If an element matches any of the ranges specified by the third and fourth operands, it is considered to be a match.
If the Result Type (RT) flag in the M6 field is zero, the bit positions of the element in the first operand corresponding to the element being compared in the second operand are set to ones if the element matches any of the ranges; otherwise they are set to zeros.
If the Result Type (RT) flag in the M6 field is set to one, the byte index of the first element in the second operand that matches any of the ranges specified by the third and fourth operands, or a zero comparison if the ZS flag is set to one, is placed in byte seven of the first operand, and zeros are stored in the remaining bytes. If no elements match, an index equal to the number of bytes in the vector is placed in byte seven of the first operand, and zeros are stored in the remaining bytes.
The Zero Search (ZS) flag in the M6 field, if set to one, adds a comparison to zero of the second operand elements to the ranges provided by the third and fourth operands. If a zero comparison occurs in a lower-indexed element than any other true comparison, the condition code is set to zero.
The operands contain elements of the size specified by the element size control in the M5 field.
The fourth operand element has the following format:
If ES equals 0:
If ES equals 1:
If ES equals 2:
The bits in the fourth operand element are defined as follows:
Equal (EQ): when one, an equal comparison is performed.
Greater Than (GT): when one, a greater than comparison is performed.
Less Than (LT): when one, a less than comparison is performed.
All other bits are reserved and should be zero to ensure future compatibility.
These control bits may be used in any combination. If none of these bits are set, the comparison will always produce a false result. If all of these bits are set, the comparison will always produce a true result.
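The effect of combining the control bits can be illustrated with a minimal Python sketch. It is not the architected definition: the function name is hypothetical, the three controls are passed as booleans because the bit layout of the fourth operand element is not reproduced here, and the comparison direction (second operand element relative to the third operand value) is an assumption.

```python
def control_compare(element: int, value: int, eq: bool, gt: bool, lt: bool) -> bool:
    """Compare one second-operand element with one third-operand value.

    eq, gt and lt stand for the EQ, GT and LT control bits of the
    corresponding fourth-operand element (assumed representation).
    """
    return (
        (eq and element == value)
        or (gt and element > value)
        or (lt and element < value)
    )
```

With no control set the function returns False for every input, and with all three set it returns True for every input, which mirrors the two boundary cases described above. Setting GT and EQ together gives a greater-than-or-equal test.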
The M5 field specifies the element size control (ES). The ES control specifies the size of the elements in the vector register operands. If a reserved value is specified, a specification exception is recognized.
0 - Byte
1 - Halfword
2 - Word
3 to 15 - Reserved
The M6 field has the following format:
The bits of the M6 field are defined as follows:
Invert Result (IN): if zero, the comparison proceeds with the pairs of values in the control vector. If one, the result of the pairs of comparisons in the ranges is inverted.
Result Type (RT): if zero, each resulting element is a mask of all range comparisons on that element. If one, an index is stored into byte seven of the first operand. Zeros are stored in the remaining bytes.
Zero Search (ZS): if one, then each element of the second operand is also compared to zero.
Condition code setting (CC): if zero, the condition code is not set and remains unchanged. If one, the condition code is set as specified in the following paragraph.
Special conditions
If any of the following occurs, a specification exception is recognized and no other action is taken:
1. The M4 field contains a value from 3 to 15.
The resulting condition code:
0 if ZS = 1 and a zero is found in a lower-indexed element than any compare
1 comparison found
2 --
3 no comparison found
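The behaviour described above for the range comparison, the RT, ZS and IN flags, and the condition code can be put together in a short Python sketch. It is only an illustration under stated assumptions: the function name `vstrc`, the use of Python lists for operand elements, and the `(EQ, GT, LT)` tuples for fourth operand controls are all hypothetical; a range is assumed to match when both comparisons of an even-odd pair are true; and IN is assumed to invert the per-element outcome.

```python
from typing import List, Tuple

VECTOR_BYTES = 16  # a vector register holds 128 bits

Controls = Tuple[bool, bool, bool]  # (EQ, GT, LT) of one fourth-operand element


def _compare(element: int, value: int, controls: Controls) -> bool:
    eq, gt, lt = controls
    return (eq and element == value) or (gt and element > value) or (lt and element < value)


def vstrc(op2: List[int], op3: List[int], op4: List[Controls], es: int,
          rt: bool = False, zs: bool = False, invert: bool = False) -> Tuple[bytes, int]:
    """Return (first operand as 16 bytes, condition code) for one execution."""
    size = 1 << es  # ES 0, 1, 2 -> 1, 2, 4 bytes per element
    matches = []
    for element in op2:
        # Assumption: a range matches when both halves of an even-odd pair are true.
        hit = any(
            _compare(element, op3[i], op4[i]) and _compare(element, op3[i + 1], op4[i + 1])
            for i in range(0, len(op3), 2)
        )
        matches.append(hit != invert)  # assumption: IN inverts the per-element outcome

    first_match = next((i for i, m in enumerate(matches) if m), None)
    first_zero = next((i for i, e in enumerate(op2) if e == 0), None) if zs else None

    if rt:
        # Byte index of the first match or zero, or the vector size if there is none.
        found = [i for i in (first_match, first_zero) if i is not None]
        index = min(found) * size if found else VECTOR_BYTES
        result = bytes(7) + bytes([index]) + bytes(8)  # index lands in byte seven
    else:
        # Element-wise mask: all ones for a match, all zeros otherwise.
        result = b"".join((b"\xff" if m else b"\x00") * size for m in matches)

    if first_zero is not None and (first_match is None or first_zero < first_match):
        cc = 0
    elif first_match is not None:
        cc = 1
    else:
        cc = 3
    return result, cc
```

For example, a search for uppercase ASCII letters in halfword elements could pass `op3 = [0x41, 0x5A, 0, 0, 0, 0, 0, 0]` with controls `(True, True, False)` and `(True, False, True)` for the first pair and `(False, False, False)` for the unused pairs. The sketch always computes a condition code; a caller modelling the CC flag set to zero would simply discard it.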
Program exceptions:
Data with DXC FE, vector register
Operation if the vector-extension facility is not installed
Specification (reserved ES value)
Transaction constraint
Extended mnemonics:
Example with ES = 1 and ZS = 0:
(a) Result in VR1 with RT = 0
(b) Result in VR1 with RT = 1
Load Count to Block Boundary
A 32-bit unsigned binary integer containing the number of bytes that can be loaded from the second operand location without crossing the specified block boundary, capped at sixteen, is placed in the first operand.
The displacement is treated as a 12-bit unsigned integer.
The second operand address is not used to address data.
The M3 field specifies a code that signals to the CPU the block boundary size used to calculate the number of bytes that can be loaded. If a reserved value is specified, a specification exception is recognized.
The resulting condition code:
0 operand one is sixteen
1 --
2 --
3 operand one is less than sixteen
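The count and condition code described above amount to a small piece of address arithmetic, sketched here in Python. The function name is hypothetical, and the block boundary size is passed directly in bytes because the mapping from M3 codes to sizes is not given in this section.

```python
from typing import Tuple


def load_count_to_block_boundary(address: int, block_size: int) -> Tuple[int, int]:
    """Return (count, condition code) for a second operand address.

    block_size is the boundary size in bytes (assumed to be supplied
    directly rather than decoded from the M3 field).
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    bytes_to_boundary = block_size - (address % block_size)
    count = min(16, bytes_to_boundary)  # the result is capped at sixteen
    cc = 0 if count == 16 else 3
    return count, cc
```

An address six bytes short of a 4096-byte boundary gives a count of 6 with condition code 3, while any address at least sixteen bytes from the boundary gives 16 with condition code 0.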
Program exceptions:
Operation if the vector-extension facility is not installed
Specification
Programming note: LOAD COUNT TO BLOCK BOUNDARY is expected to be used in conjunction with VECTOR LOAD TO BLOCK BOUNDARY to determine the number of bytes that were loaded.
Vector Load GR from VR Element
The element of the third operand that has the size specified by the ES value in the M4 field and that is indexed by the second operand address is placed in the first operand location. The third operand is a vector register. The first operand is a general register. If the index specified by the second operand address is greater than the highest-numbered element of the specified element size in the third operand, the data in the first operand is unpredictable.
If the vector register element is smaller than a doubleword, the element is right-aligned in the 64-bit general register and zeros fill the remaining bits.
The second operand address is not used to address data; instead, the rightmost 12 bits of the address are used to specify the index of the element within the second operand.
The M4 field specifies the element size control (ES). The ES control specifies the size of the elements in the vector register operands. If a reserved value is specified, a specification exception is recognized.
0 - Byte
1 - Halfword
2 - Word
3 - Doubleword
4 to 15 - Reserved
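The element selection described above can be sketched in Python as follows. The function name is hypothetical, the vector register is modelled as 16 bytes, and big-endian byte order within an element is an assumption. Because the result is unpredictable for an out-of-range index, the sketch raises an error in that case rather than modelling any particular value.

```python
def vector_load_gr_from_vr_element(vr: bytes, second_operand_address: int, es: int) -> int:
    """Return the 64-bit general register value for the selected element."""
    if len(vr) != 16:
        raise ValueError("a vector register holds 16 bytes")
    if not 0 <= es <= 3:
        raise ValueError("reserved ES value")  # stands in for the specification exception
    size = 1 << es  # 1, 2, 4 or 8 bytes per element
    index = second_operand_address & 0xFFF  # only the rightmost 12 bits are used
    if index >= 16 // size:
        raise IndexError("index beyond the highest element: result is unpredictable")
    element = vr[index * size:(index + 1) * size]
    # Converting to an integer right-aligns the element and zero-fills the rest.
    return int.from_bytes(element, "big")
```

With a register holding the bytes 0x00 through 0x0F, halfword element 3 would be returned as 0x0607.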
The resulting condition code: the code is not changed.
Program exceptions:
Data with DXC FE, vector register
Operation if the vector-extension facility is not installed
Specification (reserved ES value)
Transaction constraint
Extended mnemonics:
Vector Load to Block Boundary
The first operand is loaded with bytes from the second operand, starting at the zero-indexed byte element. If a boundary condition is encountered, the remainder of the first operand is unpredictable. Access exceptions are not recognized for bytes that are not loaded.
The displacement for VLBB is treated as a 12-bit unsigned integer.
The M3 field specifies a code that signals to the CPU the block boundary size to load up to. If a reserved value is specified, a specification exception is recognized.
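A minimal Python sketch of the load described above is given here. The function name is hypothetical, storage is modelled as a `bytes` object, the block boundary size is passed in bytes instead of as an M3 code, and `None` is used to mark first operand bytes whose contents are unpredictable.

```python
from typing import List, Optional


def vector_load_to_block_boundary(memory: bytes, address: int,
                                  block_size: int) -> List[Optional[int]]:
    """Return the 16 byte elements of the first operand.

    Bytes at or beyond the block boundary are not loaded and are
    reported as None (unpredictable in the description above).
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    limit = min(16, block_size - (address % block_size))
    loaded: List[Optional[int]] = list(memory[address:address + limit])
    return loaded + [None] * (16 - len(loaded))
```

The sketch never reads past the boundary, so it models the stricter case; the description allows an implementation to load past the boundary when no access exception would result.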
The resulting condition code: the code remains unchanged.
Program exceptions:
Access (fetch, operand 2)
Data with DXC FE, vector register
Operation if the vector-extension facility is not installed
Specification (reserved block boundary code)
Transaction constraint
Programming notes:
1. In some cases, data may be loaded past the block boundary. However, this occurs only if there are no access exceptions on that data.
Vector Store
A 128-bit value in the first operand is stored to a storage location specified by the second operand. The displacement for the VST is treated as a 12-bit unsigned integer.
The resulting condition code: the code remains unchanged.
Program exceptions:
Access (store, operand 2)
Data with DXC FE, vector register
Operation if the vector-extension facility is not installed
Transaction constraint
Vector Store with Length
Proceeding from left to right, bytes from the first operand are stored at the second operand location. The general register specified as the third operand contains a 32-bit unsigned integer representing the highest-indexed byte to store. If the third operand contains a value greater than or equal to the highest byte index of the vector, all bytes of the first operand are stored.
Access exceptions are recognized only for the bytes stored.
The displacement for VECTOR STORE WITH LENGTH is treated as a 12-bit unsigned integer.
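The length handling described above can be sketched in Python. The function name is hypothetical and storage is modelled as a `bytearray`; the third operand value is interpreted, as stated above, as the highest byte index to store, so a value of zero stores one byte.

```python
def vector_store_with_length(memory: bytearray, address: int, vr: bytes,
                             highest_index: int) -> int:
    """Store bytes 0..highest_index of vr at address; return the byte count."""
    if len(vr) != 16:
        raise ValueError("a vector register holds 16 bytes")
    if highest_index < 0:
        raise ValueError("the third operand is an unsigned integer")
    count = min(highest_index, 15) + 1  # values of 15 or more store the whole vector
    if address < 0 or address + count > len(memory):
        raise IndexError("store would fall outside the modelled storage")
    memory[address:address + count] = vr[:count]
    return count
```

Only the bytes actually stored are touched, which is why the sketch checks the storage bounds against `count` rather than against the full vector length.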
The resulting condition code: the condition code remains unchanged.
Program exceptions:
Access (store, operand 2)
Data with DXC FE, vector register
Operation if the vector-extension facility is not installed
Transaction constraint
Description of RXB
All vector instructions have a field in bits 36 through 40 of the instruction labeled RXB. This field contains the most significant bits for all of the operands designated by vector registers. Bits for register designations not specified by the instruction are reserved and should be set to zero; otherwise, the program may not operate compatibly in the future. The most significant bit is concatenated to the left of the four-bit register designation to create a five-bit vector register designation.
These bits are defined as follows:
0. Most significant bit for the vector register designation in bits 8 through 11 of the instruction.
1. Most significant bit for the vector register designation in bits 12 through 15 of the instruction.
2. Most significant bit for the vector register designation in bits 16 through 19 of the instruction.
3. Most significant bit for the vector register designation in bits 32 through 35 of the instruction.
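The concatenation of an RXB bit with a four-bit register field can be sketched in Python. The function name is hypothetical, and the sketch assumes the four defined RXB bits are passed as a 4-bit integer in which bit 0 is the leftmost bit.

```python
def vector_register_number(register_field: int, rxb: int, rxb_bit: int) -> int:
    """Combine a four-bit register field with its RXB bit into a five-bit number.

    rxb_bit is 0..3 and selects which RXB bit belongs to this register field
    (0 for instruction bits 8-11, 1 for 12-15, 2 for 16-19, 3 for 32-35).
    """
    if not 0 <= register_field <= 0xF:
        raise ValueError("register field is four bits")
    if not 0 <= rxb <= 0xF:
        raise ValueError("rxb is modelled as four bits")
    if not 0 <= rxb_bit <= 3:
        raise ValueError("rxb_bit must be 0..3")
    msb = (rxb >> (3 - rxb_bit)) & 1  # assumption: bit 0 is the leftmost RXB bit
    return (msb << 4) | register_field
```

A register field of 0101 binary with its RXB bit set designates vector register 21; with the RXB bit clear it designates vector register 5.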
Vector Enablement Control
The vector registers and instructions may be used only if both the vector enablement control (bit 46) and the AFP register control (bit 45) in control register zero are set to one. If the vector facility is installed and a vector instruction is executed without the enablement bits set, a data exception with DXC FE hex is recognized. If the vector facility is not installed, an operation exception is recognized.
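The enablement rule can be summarized as a small decision function in Python. The function name and the returned strings are hypothetical, and the sketch assumes control register zero is a 64-bit value whose bits are numbered from 0 at the leftmost position.

```python
def vector_instruction_outcome(facility_installed: bool, cr0: int) -> str:
    """Return what happens when a vector instruction is attempted."""
    if not facility_installed:
        return "operation exception"
    # Assumption: 64-bit register, bit 0 leftmost, so bit n is at shift 63 - n.
    afp_register_control = (cr0 >> (63 - 45)) & 1
    vector_enablement_control = (cr0 >> (63 - 46)) & 1
    if afp_register_control and vector_enablement_control:
        return "execute"
    return "data exception (DXC FE)"
```

Both controls must be one for the instruction to execute; setting only one of them still leads to the data exception.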
Claims (14)
1. A method for executing a machine instruction in a central processing unit, the method comprising:
obtaining, by a processor, a machine instruction for execution, the machine instruction defined according to a computer architecture for computer execution, the machine instruction comprising:
at least one opcode field providing an opcode, the opcode identifying a vector string range compare operation;
an extension field for use in designating one or more registers;
a first register field combined with a first portion of the extension field to specify a first register, the first register including a first operand;
a second register field combined with a second portion of the extension field to specify a second register, the second register including a second operand;
a third register field combined with a third portion of the extension field to specify a third register, the third register including a third operand;
a fourth register field combined with a fourth portion of the extension field to indicate a fourth register, the fourth register including a fourth operand;
a mask field comprising one or more controls to be used during execution of the machine instruction; and
executing the machine instruction, the executing comprising:
comparing, using one or more controls programmatically provided by a fourth operand, elements of the second operand with one or more values of the third operand to determine whether there is a match defined by one or more values of the third operand and one or more controls of the fourth operand, wherein the third operand comprises a plurality of pairs of elements and the fourth operand comprises a plurality of pairs of elements, each element of the fourth operand corresponding to an element of the third operand and each element of the fourth operand comprising a control to be used in comparing a value of a corresponding element of the third operand with a value of an element of the second operand, and wherein the comparing a value of an element of the second operand comprises comparing a value of an element of the second operand with at least values of pairs of elements of the plurality of pairs of elements of the third operand to determine whether there is a match, and wherein the condition specified in the control of the elements of the fourth operand controls each comparison of the value of the elements of the second operand with the value of the elements of the third operand, the elements of the fourth operand corresponding to the elements of the third operand; and
providing a result in the first operand based on the comparison.
2. The method of claim 1, wherein the method further comprises:
determining whether the mask field includes a zero element control set to indicate a search for a zero element; and
searching the second operand for the zero element based on the mask field including a zero element control set to indicate searching for the zero element.
3. The method of claim 1, wherein the plurality of values of the third operand comprises a pair of values in the third operand.
4. The method of claim 3, wherein the third operand comprises a plurality of pairs of values, and the method further comprises comparing an element of the second operand to each of the plurality of pairs of values.
5. The method of claim 4, wherein the comparing comprises comparing each element of the second operand to each pair of values of the plurality of pairs of values.
6. The method of claim 1, wherein the one or more controls programmatically provided by the fourth operand comprise at least one of greater than, less than, or equal to.
7. The method of claim 1, wherein the mask field includes a result type that defines how the result is provided in the first operand.
8. The method of claim 7, wherein:
based on the result type having a first value, the result is placed in a selected location of the first operand; and
based on the result type having a second value, the result placed in the first operand is a mask that depends on whether the comparison indicates true or false for the element.
9. The method of claim 8, wherein the method further comprises repeating the comparing for a plurality of elements of the second operand, and wherein the result placed at a selected location comprises one of the first byte of the first element in the range or an indication of no match.
10. The method of claim 1, wherein the mask field includes a condition code setting control, and wherein the method further comprises:
determining whether the condition code setting control is set; and
setting a condition code for execution of the machine instruction based on the condition code setting control being set.
11. The method of claim 4, wherein setting a condition code comprises one of:
setting a condition code to a value indicating that a zero element is detected in a lower-indexed element than any comparison;
setting a condition code to a value indicating that a comparison was found; and
setting a condition code to a value indicating that no comparison was found.
12. The method of claim 1, wherein the executing comprises determining, during execution, a direction of the comparison, wherein the direction is one of left-to-right or right-to-left, and wherein determining the direction of the comparison comprises accessing a direction control by the machine instruction to determine the direction.
13. The method of claim 1, wherein the mask field includes one or more indicators to provide one or more controls for executing the machine instruction, the one or more controls for executing the machine instruction including a zero search control to determine whether values of elements of the second operand are to be compared to zeros, and wherein the machine instruction is configured in one execution to search for zeros and compare values of elements of the second operand using one or more values of the third operand.
14. A system comprising means adapted for carrying out all the steps of the method according to any preceding method claim.
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US13/421,560 | 2012-03-15 | ||
| US13/421,560 US9459864B2 (en) | 2012-03-15 | 2012-03-15 | Vector string range compare |
| PCT/EP2013/054614 WO2013135558A1 (en) | 2012-03-15 | 2013-03-07 | Vector string range compare |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1201355A1 (en) | 2015-08-28 |
| HK1201355B (en) | 2018-01-19 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN104169868B (en) | For executing the method and system of machine instruction in CPU | |
| CN104169906B (en) | Method and system for executing machine instructions in a central processing unit | |
| US9715383B2 (en) | Vector find element equal instruction | |
| CN104205067B (en) | instruction that loads data up to the specified memory boundary indicated by the instruction | |
| US9710266B2 (en) | Instruction to compute the distance to a specified memory boundary | |
| US9454366B2 (en) | Copying character data having a termination character from one memory location to another | |
| US20130246762A1 (en) | Instruction to load data up to a dynamically determined memory boundary | |
| CN104169869A (en) | Comparing sets of character data having termination characters | |
| AU2012373736B2 (en) | Instruction to compute the distance to a specified memory boundary | |
| HK1201355B (en) | Method and system for executing a machine instruction in a central processing unit | |
| HK1201353B (en) | Method and system for loading data into a register | |
| HK1201372B (en) | Method and system for executing machine instructions in cpu | |
| HK1201352B (en) | Instruction to load data up to a specified memory boundary indicated by the instruction | |
| HK1201354A1 (en) | Transforming non-contiguous instruction specifiers to contiguous instruction specifiers |