CN1297887C - Processor capable of aligning multiple register data across boundaries and method thereof

Info

Publication number: CN1297887C
Application number: CNB2003101188147A
Authority: CN
Inventors: 梁伯嵩
Original assignee: Sunplus Technology Co Ltd
Current assignee: Sunplus Technology Co Ltd
Priority date: 2003-11-28
Filing date: 2003-11-28
Publication date: 2007-01-31
Anticipated expiration: 2023-11-28
Also published as: CN1622031A

Abstract

The invention provides a processor capable of aligning data in multiple registers across boundaries, and a method thereof. A decoding device decodes a multiple shift instruction. A register set has a plurality of registers, each N bits wide. A shifter concatenates the contents output at the first output end and the second output end of the register set into a 2N-bit word, shifts the 2N-bit word by w bits, and outputs the first N bits. A control device configures the register set according to the decoded multiple shift instruction, reads out the contents of the corresponding registers, has the shifter shift the read contents by w bits, and writes the output of the shifter back into the register set.

Description

Processor capable of aligning multiple register data across boundaries and method thereof

Technical field

The invention relates to the technical field of data processing, and in particular to a processor capable of aligning data in multiple registers across boundaries and a method thereof.

Background technology

When a processor processes data, whether the data is aligned affects the performance of many key operations, such as string and array operations. As shown in Figure 1, data to be processed (ABCDEFGHIJKL) often straddles a memory storage boundary. When a processor performs a string or array operation on such data, it must first carry out many extra operations to restore the data to an aligned form before it can use the data.

To deal with unaligned data, one known method is to load the data into the processor and then use various processor instructions to obtain the required data. As shown in Figure 2, the data (ZABC) located at 100h is first loaded into register R16, and R16 is shifted left by 8 bits to remove the unwanted data (Z). The data (DEFG) located at 104h is then loaded into register R17, and R17 is shifted right by 24 bits to remove the unwanted data (EFG). Finally, an OR operation is performed on R16 and R17 and the result is stored in R16, whose content is now the required data (ABCD). Following the same steps, the data EFGH and IJKL are loaded into registers R17 and R18 in turn.
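The shift-and-OR sequence described above can be sketched as follows. This is only an illustrative model, not the code of Figure 2: the helper names `word`, `text` and `load_unaligned_word` are hypothetical, the byte order (first character in the most significant byte) is inferred from the shift directions in the text, and the memory words at 108h and 10Ch are assumed from the later walk-through.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide


def word(chars: str) -> int:
    """Pack four ASCII characters into a 32-bit word, first character in the top byte."""
    return int.from_bytes(chars.encode("ascii"), "big")


def text(value: int) -> str:
    """Unpack a 32-bit word back into four ASCII characters."""
    return value.to_bytes(4, "big").decode("ascii")


# Assumed memory image: the data ABCDEFGHIJKL starts at 101h, one byte past
# the word boundary at 100h. "Z" and "?" stand for unrelated bytes.
MEMORY = {
    0x100: word("ZABC"),
    0x104: word("DEFG"),
    0x108: word("HIJK"),
    0x10C: word("L???"),
}


def load_unaligned_word(addr: int) -> int:
    """Five steps per aligned result: load, shift left, load, shift right, OR."""
    r_a = MEMORY[addr]            # 1. load the word holding the leading bytes
    r_a = (r_a << 8) & MASK32     # 2. shift left 8 bits to drop the unwanted byte
    r_b = MEMORY[addr + 4]        # 3. load the following word
    r_b = r_b >> 24               # 4. shift right 24 bits to keep only its first byte
    return r_a | r_b              # 5. OR the two parts together


aligned = [text(load_unaligned_word(a)) for a in (0x100, 0x104, 0x108)]
print(aligned)
```

Each aligned word costs five operations in this model, which is where the 5n instruction count for n words comes from.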

As the above shows, if the unaligned data to be loaded is n words long (one word being 32 bits), the known method needs 5n instructions to describe the read operation and at least 5n instruction cycles to complete it. This makes the program code lengthy, takes up storage space, and increases the burden on the processor, degrading its performance.

To address the lengthy program code and poor efficiency that result from handling unaligned data with ordinary processor instructions, U.S. Patent No. 4,814,976 aligns the unaligned data as it is loaded, reading data that crosses a boundary in two steps. As shown in Figure 3, the data (ABC) located at 101h to 103h is first loaded into bytes 0, 1 and 2 of register R16; at this point the data in byte 3 of R16 is X (don't care). The data (D) located at 104h is then loaded into byte 3 of R16, whose content is now the required data (ABCD). Following the same steps, the data EFGH and IJKL are loaded into registers R17 and R18 in turn.
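The two-step read described above can be modelled roughly as below. This is a sketch under assumptions: the text does not name the instructions of the cited patent, so `load_leading_bytes` and `load_trailing_bytes` are hypothetical stand-ins, and the memory image is the one implied by Figures 1 to 3.

```python
BASE = 0x100
MEMORY = b"ZABCDEFGHIJKL???"  # bytes at 100h..10Fh; "Z" and "?" are assumed filler


def load_leading_bytes(reg: bytes, addr: int) -> bytes:
    """First read: fill the leading bytes of reg from addr to the end of its aligned word."""
    count = 4 - (addr % 4)
    start = addr - BASE
    return MEMORY[start:start + count] + reg[count:]


def load_trailing_bytes(reg: bytes, addr: int) -> bytes:
    """Second read: fill the remaining trailing bytes of reg, ending at addr inclusive."""
    count = (addr % 4) + 1
    end = addr - BASE + 1
    return reg[:4 - count] + MEMORY[end - count:end]


r16 = b"XXXX"                            # X marks "don't care" bytes
partial = load_leading_bytes(r16, 0x101)  # bytes 0..2 receive ABC, byte 3 is still X
r16 = load_trailing_bytes(partial, 0x104) # byte 3 receives D
print(partial, r16)
```

Both reads target the same register, and consecutive words touch the same memory word twice, which is the repeated access the description goes on to criticise.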

As the above shows, if the unaligned data to be loaded is n words long, 2n instructions are needed to describe the read operation, and at least 2n instruction cycles are needed to complete it. Moreover, because the same memory and register locations are read and written repeatedly, the likelihood of a processor pipeline stall increases. Repeatedly reading the same memory location also wastes bus bandwidth, and the resulting delay is especially noticeable in systems without a cache.

Summary of the invention

The object of the present invention is to provide a processor capable of aligning data in multiple registers across boundaries, and a method thereof, so as to avoid the lengthy program code and storage overhead of the known techniques, and also to avoid wasting bus bandwidth by repeatedly reading the same memory location.

According to one feature of the present invention, a processor device capable of aligning data in multiple registers across boundaries is proposed, which mainly comprises:

a decoding device for decoding a multiple shift instruction;

a register set having a plurality of registers, each register being N bits wide (N is a positive integer), wherein the register set can read registers according to a first address and a second address and output their contents at a first output end and a second output end respectively, and can write one of the plurality of registers via an input end according to a third address;

a shifter, coupled to the first output end and the second output end of the register set, which concatenates the output contents of the first output end and the second output end into a 2N-bit word, shifts the 2N-bit word by w bits according to a shift value w (w is a positive integer), and outputs the first N bits of the 2N-bit word; and

a control device, coupled to the decoding device and the register set, which sets the first address, the second address, the third address and the shift value w according to the decoded multiple shift instruction, reads the contents of the corresponding registers, has the shifter shift the read contents by w bits, and writes the output of the shifter into the register set according to the third address.

The device as described, wherein N is 32.

The device as described, wherein w is one of 8, 16 and 24.

The device as described, wherein the shifter can shift by w bits to the left or to the right.

The device as described, wherein the third address is set to be the same as the first address.

The device as described, wherein the second address is set to the address following the first address.

According to another feature of the present invention, a method for aligning data in multiple registers across boundaries is proposed. The plurality of registers form a register set, each register being N bits wide (N is a positive integer); the register set can read registers according to a first address and a second address and output their contents at a first output end and a second output end respectively, and can write one of the plurality of registers via an input end according to a third address. The method mainly comprises the following steps:

(A) setting the first address, the second address, the third address and a shift value w according to a multiple shift instruction;

(B) reading the contents of the corresponding registers according to the first address and the second address; and

(C) concatenating the register contents read in step (B) into a 2N-bit word, shifting the 2N-bit word by w bits, and writing the first N bits of the shifted 2N-bit word into one of the plurality of registers according to the third address.

The method as described, wherein steps (A) to (C) are executed repeatedly until a predetermined number of registers have all been shifted.

The method as described, wherein N is 32.

The method as described, wherein w is one of 8, 16 and 24.

The method as described, wherein the shift of w bits in step (C) can be to the left or to the right.

The method as described, wherein the third address is set to be the same as the first address.

The method as described, wherein the second address is set to the address following the first address.

Description of drawings

Fig. 1: a schematic diagram of a group of unaligned data arranged in memory.

Fig. 2: program code with which a known technique loads a group of unaligned data.

Fig. 3: program code with which another known technique loads a group of unaligned data, together with a schematic diagram of the registers.

Fig. 4: a block diagram of the processor device of the present invention capable of aligning data in multiple registers across boundaries.

Fig. 5: a detailed circuit diagram of the control device of the present invention.

Fig. 6: a schematic diagram of the operation of the present invention.

Fig. 7: an example application of the present invention.

Embodiment

Fig. 4 shows a block diagram of the processor device of the present invention capable of aligning data in multiple registers across boundaries, which includes a decoding device 100, a control device 200, a register set 300 and a shifter 400. The register set 300 has a plurality of registers 3001, each N bits wide (N is a positive integer); in this embodiment, N is preferably 32. The register set 300 can read registers 3001 according to a first address 301 and a second address 302 and output their contents at a first output end 310 and a second output end 320 respectively, and can write one of the registers 3001 via an input end 330 according to a third address 303.

The decoding device 100 decodes a multiple shift instruction, which is either a multiple left shift instruction (Multiple Left Shift Instruction, MLSI) or a multiple right shift instruction (Multiple Right Shift Instruction, MRSI). The format of the multiple left shift instruction is MLSI Rx, Ry, w, meaning that the contents of the registers in the range x to y are shifted left, as a whole, by w bits. The format of the multiple right shift instruction is MRSI Rx, Ry, w, meaning that the contents of the registers in the range x to y are shifted right, as a whole, by w bits. After decoding a multiple shift instruction, the decoding device 100 produces the x, y, L_R* and w signals and outputs them to the control device 200. The L_R* signal indicates whether the shift is to the left or to the right: when L_R* is 1, the shift is w bits to the left; when L_R* is 0, the shift is w bits to the right.
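As a rough illustration of the decoder's outputs, the sketch below parses the assembly form of the instruction into the x, y, L_R* and w signals. The text does not give the binary encoding, so decoding from an assembly string, the `Decoded` tuple and the `decode` function are all assumptions made for illustration.

```python
import re
from typing import NamedTuple


class Decoded(NamedTuple):
    x: int    # first register of the range
    y: int    # last register of the range
    l_r: int  # models L_R*: 1 = shift left, 0 = shift right
    w: int    # shift amount in bits


# Assumed textual form: "MLSI Rx, Ry, w" or "MRSI Rx, Ry, w".
_PATTERN = re.compile(r"^(MLSI|MRSI)\s+R(\d+)\s*,\s*R(\d+)\s*,\s*(\d+)$")


def decode(instruction: str) -> Decoded:
    """Turn an MLSI/MRSI assembly line into the four control signals."""
    match = _PATTERN.match(instruction.strip())
    if match is None:
        raise ValueError(f"not a multiple shift instruction: {instruction!r}")
    mnemonic, x, y, w = match.groups()
    return Decoded(int(x), int(y), 1 if mnemonic == "MLSI" else 0, int(w))


print(decode("MLSI R16, R19, 8"))
```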

The shifter 400 is coupled to the first output end 310 and the second output end 320 of the register set 300. It concatenates the output contents of the first output end 310 and the second output end 320 into a 64-bit word, shifts the 64-bit word left or right by w bits (w is a positive integer) according to a shift value w and the L_R* signal, and outputs the first 32 bits of the shifted 64-bit word.
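A minimal behavioural model of this shifter is sketched below, assuming that "the first 32 bits" means the most significant half of the 64-bit word and that vacated bit positions are filled with zeros. The function name `shifter` and its parameters are hypothetical, and the right-shift branch simply follows the wording of the text literally.

```python
def shifter(first_out: int, second_out: int, w: int, l_r: int, n: int = 32) -> int:
    """Concatenate two n-bit values, shift by w bits, return the top n bits."""
    mask_2n = (1 << (2 * n)) - 1
    # First output end forms the upper half, second output end the lower half.
    combined = ((first_out << n) | second_out) & mask_2n
    if l_r == 1:
        shifted = (combined << w) & mask_2n  # left shift, zero fill on the right
    else:
        shifted = combined >> w              # right shift, zero fill on the left
    return shifted >> n                      # first (most significant) n bits


# ZABC = 5A 41 42 43, DEFG = 44 45 46 47 in ASCII.
result = shifter(0x5A414243, 0x44454647, w=8, l_r=1)
print(hex(result))
```

With the values above, the left shift by 8 bits drops the byte for Z and pulls in the byte for D, giving the word for ABCD.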

The control device 200 is coupled to the decoding device 100 and the register set 300. According to the decoded x, y, L_R* and w signals, it sets the first address 301, the second address 302, the third address 303 and the shift value w of the register set 300, and reads the contents of register x and register y in the register set 300 through the first output end 310 and the second output end 320 of the register set 300.

Fig. 5 is a detailed circuit diagram of the control device 200, which mainly comprises a multiplexer 210, a comparator 220, a first address register 230, an adder 240 and a second address register 250. The multiplexer 210 selects either the x signal produced by the decoding device 100 or the content of the second address register 250. The output of the multiplexer 210 is written into the first address register 230, whose output drives the first address 301 of the register set 300 to access the register 3001 indicated by the first address 301. The adder 240 adds 1 to the content of the first address register 230 and writes the result into the second address register 250, whose content is used to access the register 3001 indicated by the second address 302. The comparator 220 compares the content of the first address register 230 with the y signal produced by the decoding device 100; if the content of the first address register 230 is greater than or equal to the y signal, it produces a stop signal (stop_signal).
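The address sequencing performed by this control device can be sketched as a simple loop. This is a behavioural approximation, not the circuit of Fig. 5: the function `address_sequence` is hypothetical, and it is assumed that the comparator is evaluated at the start of each cycle before any register is read.

```python
def address_sequence(x: int, y: int) -> list[tuple[int, int]]:
    """List the (first address, second address) pair used in each execution cycle."""
    pairs = []
    first = x                      # multiplexer selects x in the first cycle
    while True:
        if first >= y:             # comparator raises stop_signal
            break
        second = first + 1         # adder output, held in the second address register
        pairs.append((first, second))
        first = second             # multiplexer selects the second address register
    return pairs


print(address_sequence(16, 19))
```

For x = 16 and y = 19 the loop yields three address pairs before the stop condition is met.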

Fig. 6 shows a schematic diagram of the operation of the present invention, in which an MLSI R16, R19, 8 instruction is executed; this instruction shifts the contents of registers R16, R17, R18 and R19 left by 8 bits. When the first execution cycle begins, the decoding device 100 decodes the instruction and produces the signals x=16, y=19, L_R*=1 and w=8. The multiplexer 210 selects the x signal (=16) produced by the decoding device 100, so the control device 200 loads 16 into the first address register 230 and, via the adder 240, loads 17 into the second address register 250. Because the first address register 230 holds 16, which is less than 19, the comparator 220 does not produce the stop signal (stop_signal). The register set 300 therefore reads the content of register R16 (=ZABC) and the content of R17 (=DEFG) according to the first address 301 (=16) and the second address 302 (=17) respectively, and outputs them to the shifter 400 through the first output end 310 and the second output end 320.

The shifter 400 concatenates the content of the first output end 310 (=ZABC) and the content of the second output end 320 (=DEFG) into a 64-bit word (=ZABCDEFG), shifts the 64-bit word left by 8 bits (=ABCDEFG0) according to the shift value w=8 and the signal L_R*=1, and outputs the first 32 bits (=ABCD) of the shifted 64-bit word (=ABCDEFG0). The control device 200 then writes the output of the shifter 400 (=ABCD) into register R16 of the register set 300 according to the third address 303.

When the second execution cycle begins, the multiplexer 210 selects the content of the second address register 250 (=17), so the control device 200 loads 17 into the first address register 230 and, via the adder 240, loads 18 into the second address register 250. Execution proceeds as in the first execution cycle, so when the second execution cycle ends, the content of register R17 is EFGH. Likewise, when the third execution cycle ends, the content of register R18 is IJKL.

When the fourth execution cycle begins, the multiplexer 210 selects the content of the second address register 250 (=19), so the control device 200 loads 19 into the first address register 230. Because the first address register 230 now holds 19, the comparator 220 produces the stop signal (stop_signal) and execution of the instruction stops; that is, only three execution cycles are needed.
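The walk-through above can be replayed with a small simulation. It is a sketch under assumptions: registers are modelled as a dictionary of 32-bit integers, the third address is taken to equal the first address, the initial content of R19 ("L" followed by filler bytes) is assumed, and `mlsi`, `word` and `text` are hypothetical names.

```python
def word(chars: str) -> int:
    """Pack four ASCII characters into a 32-bit word, first character in the top byte."""
    return int.from_bytes(chars.encode("ascii"), "big")


def text(value: int) -> str:
    """Unpack a 32-bit word back into four ASCII characters."""
    return value.to_bytes(4, "big").decode("ascii")


def mlsi(regs: dict[int, int], x: int, y: int, w: int, n: int = 32) -> int:
    """Model MLSI Rx, Ry, w; returns the number of execution cycles that do work."""
    mask = (1 << n) - 1
    first, cycles = x, 0
    while first < y:                               # stop once first address reaches y
        second = first + 1
        combined = (regs[first] << n) | regs[second]   # 2n-bit concatenation
        regs[first] = ((combined << w) >> n) & mask    # top n bits after the left shift
        first = second
        cycles += 1
    return cycles


regs = {16: word("ZABC"), 17: word("DEFG"), 18: word("HIJK"), 19: word("L???")}
cycles = mlsi(regs, x=16, y=19, w=8)
contents = {r: text(v) for r, v in regs.items()}
print(cycles, contents)
```

Each cycle writes only the register at the first address, so the next cycle still reads the original, unshifted content of the following register. R19 itself is never written in this model.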

Fig. 7 shows an example application of the present invention. To load a group of unaligned data, load instructions (LW) are first used to load the unaligned data into registers R16, R17, R18 and R19, and the multiple left shift instruction (MLSI) of the present invention then completes the alignment. As shown in Figure 7, the program code occupies only 5 words.
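The instruction counts quoted in the description can be compared side by side. The 5n and 2n figures come from the background section; the n + 2 figure is an extrapolation from the Fig. 7 example (four LW instructions plus one MLSI for three aligned words) and is an assumption rather than a formula stated in the text.

```python
def instruction_counts(n: int) -> dict[str, int]:
    """Instructions needed to obtain n aligned words from unaligned memory."""
    return {
        "shift_and_or": 5 * n,             # load, shift, load, shift, OR per word
        "two_step_load": 2 * n,            # two partial loads per word
        "loads_plus_mlsi": (n + 1) + 1,    # n + 1 word loads, then one MLSI (assumed)
    }


counts = instruction_counts(3)
print(counts)
```

For the three-word example this gives 15, 6 and 5 instructions respectively, matching the 5 words of program code mentioned for Fig. 7.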

As the above shows, the technique of the present invention solves the problems of lengthy program code and storage overhead in the known techniques, and also avoids wasting bus bandwidth by repeatedly reading the same memory location.

It should be noted that the above embodiments are merely examples given for convenience of explanation. The scope of rights claimed by the present invention shall be determined by the claims, and is not limited to the above embodiments.

Claims

1. A processor device capable of aligning multiple register data across boundaries, mainly comprising:

a decoding device for decoding a multiple shift instruction;

a temporary register group having a plurality of temporary registers, each temporary register being N bits wide, wherein the temporary register group can read temporary registers according to a first address and a second address and output their contents at a first output end and a second output end respectively, and can write one of the plurality of temporary registers through an input end according to a third address, N being a positive integer;

a shifter, coupled to the first output end and the second output end of the temporary register group, which concatenates the output contents of the first output end and the second output end into a 2N-bit word, shifts the 2N-bit word by w bits according to a shift value w, where w is a positive integer, and outputs the first N bits of the 2N-bit word; and

a control device, coupled to the decoding device and the temporary register group, which sets the first address, the second address, the third address and the shift value w according to the decoded multiple shift instruction, reads out the contents of the corresponding temporary registers so that the shifter shifts the read contents by w bits, and writes the output of the shifter into the temporary register group according to the third address.

2. The device according to claim 1, wherein N is 32.

3. The device according to claim 1, wherein w is one of 8, 16, 24.

4. The device according to claim 1, wherein the shifter can shift by w bits to the left or to the right.

5. The device of claim 1, wherein the third address is set to be the same as the first address.

6. The device of claim 1, wherein the second address is set as a subsequent address of the first address.

7. A method for aligning data in multiple temporary registers across boundaries, wherein the plurality of temporary registers form a temporary register group, each temporary register is N bits wide, and the temporary register group can read temporary registers according to a first address and a second address and output their contents at a first output end and a second output end respectively, and can write one of the plurality of temporary registers through an input end according to a third address, N being a positive integer, the method mainly comprising the following steps:

(A) setting the first address, the second address, the third address and a shift value w according to a multiple shift instruction;

(B) reading out the contents of the corresponding temporary registers according to the first address and the second address;

(C) concatenating the contents of the temporary registers read out in step (B) into a 2N-bit word, shifting the 2N-bit word by w bits, and writing the first N bits of the shifted 2N-bit word into one of the plurality of temporary registers according to the third address; and

(D) repeating steps (A) to (C) until a predetermined number of temporary registers have been shifted.

8. The method of claim 7, wherein N is 32.

9. The method according to claim 7, wherein w is one of 8, 16, 24.

10. The method according to claim 7, wherein the shift of w bits in step (C) can be to the left or to the right.

11. The method of claim 7, wherein the third address is set to be the same as the first address.

12. The method of claim 7, wherein the second address is set as a subsequent address of the first address.