Summary of the invention
In view of this, the invention provides a data processing method for a SIMD structure and a processor, to solve the prior-art problems of wasted power and low execution efficiency when a SIMD structure executes instructions. The specific scheme is as follows:
A data processing method for a single instruction multiple data (SIMD) structure, comprising:
selecting an eligible instruction for processing multiple groups of data streams, the instruction having a predicate field that comprises marker bits and index bits;
decoding the instruction and obtaining the values of the marker bits and the index bits;
using the value of the marker bits to judge whether the instruction is a predicate instruction;
when the instruction is a predicate instruction, looking up the corresponding entry in a preset predicate register file according to the value of the index bits, and reading the predicate in that entry;
distributing the predicate evenly among the multiple groups of data streams;
comparing the value of the marker bits with the predicate corresponding to each group of data streams;
determining that the data streams whose comparison result is "identical" are processable data streams;
executing the instruction to process the processable data streams.
Preferably, the method further comprises:
determining that the data streams whose comparison result is "different" are non-processable data streams;
executing a no-operation instruction on the non-processable data streams.
Preferably, the method further comprises:
when the comparison result is "different", stopping the processing of the corresponding data stream.
Preferably, when the instruction is a non-predicate instruction, the instruction is executed directly to process the multiple groups of data streams.
A SIMD processor, comprising:
an instruction selection unit, configured to select an eligible instruction for processing multiple groups of data streams, the instruction having a predicate field that comprises marker bits and index bits;
a decoding unit, configured to decode the instruction and obtain the values of the marker bits and the index bits;
a judging unit, configured to use the value of the marker bits to judge whether the instruction is a predicate instruction;
a predicate register file, configured to store predicates;
a reading unit, configured to, when the instruction is a predicate instruction, look up the corresponding entry in the preset predicate register file according to the value of the index bits and read the predicate in that entry;
an allocation unit, configured to distribute the predicate evenly among the multiple groups of data streams;
a comparing unit, configured to compare the value of the marker bits with the predicate corresponding to each group of data streams;
a determining unit, configured to determine that the data streams whose comparison result is "identical" are processable data streams;
an execution unit, configured to execute the instruction to process the processable data streams.
Preferably, the determining unit is further configured to determine that the data streams whose comparison result is "different" are non-processable data streams, and the execution unit is further configured to execute a no-operation instruction on the non-processable data streams.
The data processing method for a SIMD structure disclosed by the invention introduces predicated execution. The result of comparing the predicate marker bits with the predicate determines whether the instruction needs to be executed on each group of data streams, and the instruction is then executed only on the data streams whose comparison result is "identical". This avoids the wasted power and low processing efficiency caused by processing data streams that do not need to be processed. Irregular control flow can be handled efficiently, which further widens the range of application of the SIMD structure.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative work fall within the scope of protection of the invention.
Predicated execution is a technique for resolving control dependences between instructions. By introducing logical predicates, the control dependence between a branch and the instructions inside it is converted into a data dependence between the branch condition (that is, the predicate) and those instructions. Conventional predicated execution eliminates the control dependences among multiple basic blocks and, together with the compiler, merges those basic blocks into a single superblock that has no internal control dependences, thereby exposing more instruction-level parallelism. After a branch instruction is predicated, several Boolean predicates are produced, each representing a different branch path. Instructions inside the branch carry a predicate field, and an instruction is executed only when its predicate field satisfies the corresponding predicate condition. Predicated execution thus converts control dependences between instructions into data dependences on predicates.
On the basis of predicated execution, the present invention discloses a data processing method for a SIMD structure. Its embodiments are as follows:
Embodiment one
The flow of the data processing method for a SIMD structure disclosed in this embodiment is shown in Figure 2 and comprises:
Step S21: select an eligible instruction for processing multiple groups of data streams, the instruction having a predicate field that comprises marker bits and index bits;
The predicate field of an instruction is set in advance by the compiler. When the compiler translates the finished program into binary machine code, it automatically assigns a predicate number to each branch and fills this number into the predicate field of the instructions associated with that branch. While the program runs, a computation instruction calculates the value of the corresponding predicate, and a predicate register file write instruction writes the predicate into the preset predicate register file. Each entry of the predicate register file has a write bit. When the write instruction finishes writing an entry, it also sets the write bit of that entry (to 1 or 0, depending on the actual hardware implementation) to indicate that the predicate register file now holds the value of the corresponding predicate. A selected instruction must satisfy two conditions at the same time: its operands are ready, and its corresponding predicate has been calculated and written into the predicate register file.
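The write bit and the two selection conditions described above can be illustrated with a minimal Python sketch. The class name PredicateRegisterFile, the function can_issue and the is_predicated argument are hypothetical names introduced only for illustration; the source does not specify how the selection logic learns whether an instruction is predicated, so that argument is an assumption.

```python
class PredicateRegisterFile:
    """Hypothetical model: each entry holds a predicate value and a write bit."""

    def __init__(self, num_entries):
        self.values = [0] * num_entries
        # Write bit per entry; modelled as a boolean although real hardware
        # may use either 1 or 0 to mean "written".
        self.written = [False] * num_entries

    def write(self, index, predicate):
        # The write instruction stores the predicate and sets the write bit
        # of the same entry at the same time.
        self.values[index] = predicate
        self.written[index] = True

    def is_written(self, index):
        return self.written[index]


def can_issue(operands_ready, is_predicated, index, prf):
    """Return True if the instruction may be selected (assumed selection rule)."""
    if not operands_ready:
        return False
    # A predicated instruction must also wait until its predicate is written.
    return (not is_predicated) or prf.is_written(index)
```

In this sketch an instruction whose predicate has not yet been written simply stays unselected until the write bit of its entry is set.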
Step S22: decode the instruction and obtain the values of the marker bits and the index bits;
Step S23: use the value of the marker bits to judge whether the instruction is a predicate instruction;
Step S24: when the instruction is a predicate instruction, read the predicate in the entry of the preset predicate register file that corresponds to the value of the index bits;
Step S25: distribute the predicate evenly among the multiple groups of data streams;
Step S26: compare whether the value of the marker bits is identical to the predicate corresponding to each group of data streams;
Step S27: determine that the data streams whose comparison result is "identical" are processable data streams;
Step S28: execute the instruction to process the processable data streams.
As the above steps show, the data processing method for a SIMD structure disclosed by the invention introduces predicated execution. The result of comparing the predicate marker bits with the predicate determines whether the instruction needs to be executed on each group of data streams, and the instruction is then executed only on the data streams whose comparison result is "identical". This avoids the wasted power and low processing efficiency caused by processing data streams that do not need to be processed.
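The flow of steps S22 to S28 can be sketched in Python as follows. This is an illustrative model under stated assumptions, not the patented hardware: the function run_simd is a hypothetical name, the predicate register file is modelled as a mapping from an index to a list holding one predicate value per data stream, and the marker values 0b10 and 0b11 in the usage note are hypothetical. Only the value 01 for a non-predicate instruction comes from the description of embodiment two.

```python
NON_PREDICATE_MARK = 0b01  # non-predicate mark given in the text for M = 2


def run_simd(mark, index, prf, streams, op):
    """Apply op to every data stream whose predicate equals the marker bits."""
    if mark == NON_PREDICATE_MARK:
        # Non-predicate instruction: process all data streams directly.
        return [op(s) for s in streams]
    predicates = prf[index]  # S24: read the entry selected by the index bits
    results = []
    for stream, pred in zip(streams, predicates):  # S25: one predicate per stream
        if pred == mark:  # S26/S27: "identical" means processable
            results.append(op(stream))  # S28: execute the instruction
        else:
            results.append(stream)  # not processed, data left unchanged
    return results
```

For example, with prf = {7: [0b10, 0b11, 0b10, 0b11]} and mark 0b10, only the first and third data streams are processed.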
Embodiment two
This embodiment describes the data processing method disclosed by the invention in more detail. Its flow is shown in Figure 3:
Step S31: select an eligible instruction for processing multiple groups of data streams, the instruction having a predicate field that comprises marker bits and index bits;
The instruction set requires that an N-bit predicate field be added to the encoding of every instruction. The structure of the predicate field is shown in Figure 4. The first M bits of the predicate field are the predicate marker bits, which determine two things: 1. whether the statement is a predicate statement; 2. if it is a predicate statement, under which predicate state it is executed. This embodiment takes M = 2 as an example, with the definitions shown in Figure 5. In the present invention, predicates are stored in a preset predicate register file, which is a group of multiport registers holding the corresponding predicate data. The remaining N-M bits of the predicate field are the predicate index bits, which can index 2^(N-M) entries, so the predicate corresponding to an instruction can be found from the index bits in the predicate field of that instruction. The predicate is then divided into groups according to the characteristics of the SIMD structure. For example, if the SIMD structure comprises four execution units and can process four groups of data with a single instruction, the predicate is divided into four groups, each corresponding to one execution unit. The result of comparing each group's predicate with the marker bits controls whether that execution unit works, and thereby controls whether the instruction is executed.
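The field layout described above can be illustrated with a short Python sketch. The function names are hypothetical, and two details are assumptions not fixed by the text: that the "first M bits" are the most significant bits of the field, and that the group for execution unit 0 occupies the least significant bits of a predicate register entry.

```python
def decode_predicate_field(field, n_bits, m_bits):
    """Split an N-bit predicate field into (marker bits, index bits).

    Assumes the marker bits are the most significant M bits of the field.
    """
    index_width = n_bits - m_bits
    mark = field >> index_width
    index = field & ((1 << index_width) - 1)
    return mark, index


def num_entries(n_bits, m_bits):
    """Number of predicate register file entries addressable: 2^(N-M)."""
    return 1 << (n_bits - m_bits)


def split_predicate(entry, units, bits_per_unit):
    """Divide one predicate entry into equal groups, one per execution unit.

    Assumes the group for unit 0 sits in the least significant bits.
    """
    mask = (1 << bits_per_unit) - 1
    return [(entry >> (u * bits_per_unit)) & mask for u in range(units)]
```

With N = 5 and M = 2, for instance, the field 0b10111 decodes to marker bits 0b10 and index 7, and the three index bits address 8 entries.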
Step S32: decode the instruction and obtain the values of the marker bits and the index bits;
Step S33: use the value of the marker bits to judge whether the instruction is a predicate instruction; if so, go to step S34; if not, go to step S311;
Whether the instruction is a predicate instruction is judged from the current marker bits according to the definitions shown in Figure 4.
Step S34: read the predicate in the entry of the preset predicate register file that corresponds to the value of the index bits;
The corresponding entry in the predicate register file is looked up according to the value of the index bits, and the predicate in that entry is read. For example, if the index bits are 111, their value is 7, and the predicate stored in the 7th entry of the predicate register file is read.
Step S35: distribute the predicate evenly among the multiple groups of data streams;
Step S36: compare whether the value of the marker bits is identical to the predicate corresponding to each group of data streams; if so, go to step S37; if not, go to step S39;
The marker bits and the predicate are compared according to a preset rule. The rule can be defined as the situation requires, but it must follow one principle: when the instruction is a predicate instruction and the marker bits in its predicate field are identical to the value of the corresponding predicate indexed in the predicate register file, the comparison result is TRUE; otherwise the comparison result is FALSE. As shown in Figure 6, the result is TRUE only when the predicate value is identical to the predicate marker bits. A non-predicate statement, that is, one whose predicate mark is 01, must be executed in normal instruction order, so its result is TRUE whatever its corresponding predicate is.
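The comparison principle above can be written as a small Python sketch. The function names compare and unit_enables are hypothetical, and the predicate values used in the usage note (0b10 and 0b11) are illustrative only, since the actual encodings are defined in Figures 5 and 6; only the non-predicate mark 01 is taken from the text.

```python
NON_PREDICATE_MARK = 0b01  # predicate mark of a non-predicate statement (M = 2)


def compare(mark, predicate):
    """Return True (TRUE) if the instruction should run for this predicate."""
    if mark == NON_PREDICATE_MARK:
        # Non-predicate statements run in normal order whatever the predicate.
        return True
    # Predicate statements run only when the mark equals the predicate value.
    return mark == predicate


def unit_enables(mark, unit_predicates):
    """Compare the marker bits with the predicate of each execution unit."""
    return [compare(mark, p) for p in unit_predicates]
```

Calling unit_enables(0b10, [0b10, 0b11, 0b10, 0b11]) yields one TRUE or FALSE result per execution unit, which the later steps use to decide whether each data stream is processed.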
Step S37: determine that the data streams whose comparison result is "identical" are processable data streams;
If the comparison result is "identical", the instruction may be executed to process the corresponding data stream, that is, the execution condition is met.
Step S38: execute the instruction to process the processable data streams;
Step S39: determine that the data streams whose comparison result is "different" are non-processable data streams;
If the comparison result is "different", the instruction does not need to be executed on this data stream.
Step S310: execute a no-operation instruction on the non-processable data streams;
If the comparison result is "different", the instruction does not need to be executed on this data stream. Either a no-operation can replace the instruction's processing of the data stream, or a gated clock can be introduced to put the execution unit that handles this data stream to sleep, so that processing of this data stream stops.
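The two ways of handling a non-processable data stream can be contrasted with a minimal Python sketch. The Lane class, the cycles_active counter and the step_lanes function are hypothetical; the counter is only a rough stand-in for clocked activity and does not model real power consumption.

```python
from dataclasses import dataclass


@dataclass
class Lane:
    """Hypothetical execution unit holding one data stream value."""
    data: int
    cycles_active: int = 0  # cycles in which the unit was clocked


def step_lanes(lanes, enables, op, use_clock_gating):
    """Run one instruction across all units according to the comparison results."""
    for lane, enabled in zip(lanes, enables):
        if enabled:
            lane.data = op(lane.data)  # processable data stream
            lane.cycles_active += 1
        elif not use_clock_gating:
            # No-operation: the unit is still clocked but leaves its data alone.
            lane.cycles_active += 1
        # With clock gating the unit sleeps: no clocking and no data change.
```

In both variants the data of a non-processable stream is left unchanged; in this model the difference is only whether the unit counts as clocked during that cycle.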
Step S311: execute the instruction to process all groups of data streams.
This embodiment discloses in detail the steps of using the marker bits to judge the type of an instruction and of using a defined rule to judge whether the instruction is executed. For a data stream whose comparison result is "different", the instruction is replaced with a no-operation instruction, so the data stream is not processed and power is saved. This reduces redundant computation of branch instructions, improves processing efficiency, and lowers energy consumption. Applications with irregular control flow can be handled efficiently, which further widens the range of application of the SIMD structure.
The invention also discloses a SIMD processor. Its structure is shown in Figure 7 and comprises an instruction selection unit 71, a decoding unit 72, a judging unit 73, a predicate register file 74, a reading unit 75, an allocation unit 76, a comparing unit 77, a determining unit 78 and an execution unit 79, wherein:
The instruction selection unit 71 is configured to select an instruction to be processed. The decoding unit 72 is configured to decode the instruction and obtain the values of the marker bits and the index bits. The judging unit 73 is configured to use the value of the marker bits to judge whether the instruction is a predicate instruction. The predicate register file 74 is configured to store predicates. The reading unit 75 is configured to, when the instruction is a predicate instruction, read the predicate in the entry of the preset predicate register file that corresponds to the value of the index bits. The allocation unit 76 is configured to distribute the predicate evenly among the multiple groups of data streams. The comparing unit 77 is configured to compare whether the value of the marker bits is identical to the predicate corresponding to each group of data streams. The determining unit 78 is configured to determine that the data streams whose comparison result is "identical" are processable data streams. The execution unit 79 is configured to execute the instruction to process the processable data streams.
The SIMD processor disclosed in this embodiment has four execution units and can process four groups of data streams at the same time. Each execution unit decides whether to execute the instruction according to the comparison result of the comparing unit. When an execution unit does not need to execute the instruction, a no-operation instruction can be filled directly into the pipeline in place of the current instruction so that the data is not processed, or a gated clock can be introduced to put that execution unit to sleep so that processing of the instruction stops.
The determining unit is further configured to determine that the data streams whose comparison result is "different" are non-processable data streams, and the execution unit is further configured to execute a no-operation instruction on the non-processable data streams.
As the figure shows, the SIMD processor disclosed in this embodiment adds only a predicate register file and control logic to the hardware and does not require changes to the microarchitecture of the original processor. The structure is simple and easy to implement.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar the embodiments may be referred to one another. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant points can be found in the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms according to their functions. Whether these functions are carried out in hardware or in software depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be regarded as going beyond the scope of the present invention.
The steps of the method or algorithm described in connection with the embodiments disclosed herein can be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module can reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein can be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.