TWI344109B

TWI344109B - High efficient pipelined decompression method with no-wait time

Info

Publication number: TWI344109B
Application number: TW96128544A
Authority: TW
Inventors: Yuan Long Jeang
Original assignee: Yuan Long Jeang
Priority date: 2007-08-03
Filing date: 2007-08-03
Publication date: 2011-06-21
Also published as: TW200907797A

Description

[Technical Field of the Invention] The present invention relates to a high-performance pipelined decompression method with no wait cycles. In particular, it relates to a pipelined decompression method in which, when a branch instruction occurs, an indirect surge support unit outputs the first two uncompressed instructions of an indirect jump instruction block to a microprocessor and, otherwise, an instruction from a compressed instruction storage unit is output, thereby improving the performance of a system on chip.

[Prior Art] In systems on chip (SoCs), area, manufacturing cost, and power consumption are three important issues. Program memory typically occupies about 40% to 80% of the chip, so compressing the program memory is a direct and effective way to reduce area, manufacturing cost, and power consumption.
Program code is compressed before the chip is designed, whereas decompression takes place only after the chip has been manufactured. The efficiency of decompression therefore directly affects the performance of the compressed program, and the preparatory work done at compression time also affects decompression performance.

Referring to Fig. 1, a conventional pipelined decompression arrangement comprises a microprocessor 81, an instruction buffer 82, a compressed instruction memory 83, and a decompression circuit 84. To decompress an instruction, the decompression circuit 84 reads a compressed instruction from the compressed instruction memory 83 and stores the decompressed instruction in the instruction buffer 82 so that the microprocessor 81 can execute it.

Fig. 2 shows the branch behavior of the pipelined architecture of Fig. 1. As soon as one instruction finishes the fetch stage and moves to the decode stage, the next instruction begins its fetch stage. However, as shown in Fig. 2, when the first instruction is a branch, the correct branch address is known only after two pipeline stages, so fetching of the next instruction must stall until the result of the first instruction is available. The overlapped execution that a pipeline normally provides is broken, and pipeline performance is lost. Worse, while a branch instruction is being decoded, subsequent fetches have already been issued before it is identified as a branch, so the instructions fetched in the meantime have to be discarded. This substantially reduces the overall performance of the system on chip.
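The cost of the stall described above can be illustrated with a small cycle count. This is only a sketch under stated assumptions: the three-stage depth, the function name, and the idea of charging a fixed number of bubbles per branch are illustrative choices, while the two-stage delay before the branch address is known comes from the text.

```python
def cycles_with_branch_stalls(is_branch, stages=3, penalty=2):
    """Count cycles for a simple in-order pipeline.

    is_branch: list of booleans, one per executed instruction.
    stages:    pipeline depth (assumed value, not given in the source).
    penalty:   bubbles charged per branch; the text states that the
               branch address is known only after two pipeline stages.
    """
    n = len(is_branch)
    if n == 0:
        return 0
    ideal = stages + (n - 1)           # fully overlapped execution
    bubbles = penalty * sum(is_branch)  # fetches lost behind each branch
    return ideal + bubbles


# Four instructions, the second of which is a branch.
print(cycles_with_branch_stalls([False, True, False, False]))
```

With no branches the same four instructions would need only the ideal overlapped count, so the difference between the two results is the loss the invention aims to remove.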
In view of this, the present invention remedies the above drawbacks of the conventional pipelined decompression method. All instructions of a system on chip are first divided into indirect jump instruction blocks and direct jump instruction blocks. The first two uncompressed instructions of each indirect jump instruction block are placed in an indirect surge support unit, and all compressed instructions of the direct jump instruction blocks and of the indirect jump instruction blocks, that is, everything except those first two uncompressed instructions, are held in a compressed instruction storage unit. When a branch detector detects that the microprocessor has output a branch instruction, an output data selection multiplexer selects whether the output comes from the compressed instruction storage unit or from the indirect surge support unit. As a result, the pipelined decompression method of the invention introduces no delay, and no system stall is caused by branch or jump instructions. This improves system performance and in turn reduces circuit area, manufacturing cost, and power consumption.

[Summary of the Invention] The main object of the invention is to provide a high-performance pipelined decompression method with no wait cycles, in which all instructions of a system on chip are divided into indirect jump and direct jump instruction blocks, and a branch detector controls an output data selection multiplexer to select data either from a compressed instruction storage unit or from an indirect surge support unit, so that the invention reduces circuit area, manufacturing cost, and power consumption.

According to the method of the invention, all instructions of a system on chip are divided into several indirect jump instruction blocks and several direct jump instruction blocks; the first two uncompressed instructions of each indirect jump instruction block are placed in an indirect surge support unit; the compressed instructions of the direct and indirect jump instruction blocks are held in a compressed instruction storage unit; and a branch detector detects whether a branch instruction has occurred, whereupon the compressed or uncompressed instruction is output to a microprocessor under the control of an output data selection multiplexer.

[Embodiments] To make the above and other objects, features, and advantages of the invention clearer, a preferred embodiment is described in detail below with reference to the accompanying drawings.

Referring to Figs. 3 and 4, the pipelined decompression method of the preferred embodiment comprises a first step [S11]: dividing all instructions of a system on chip into several indirect jump instruction blocks and several direct jump instruction blocks. The system on chip has an indirect surge support unit 1, a compressed instruction storage unit 2, an instruction decompression unit 3, an output data selection multiplexer 4, an address control unit 5, and a microprocessor 6. The method is applied to decompressing the instructions of the system on chip (not shown). Each instruction block has exactly one entry point and at least one exit point. An entry point is an instruction that is the destination of a jump, a subroutine return point, or [...]. An exit point is a jump instruction, a subroutine entry, or a call instruction, and the instruction that follows it is an entry point. An instruction block may be as short as a single instruction, in which case the next instruction is also an entry point.
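The entry-point and exit-point rule above resembles the usual way of cutting a program into blocks at control transfers. The following is a minimal sketch under stated assumptions: the mnemonics `jump`, `call`, and `ret`, the tuple layout, and the function name are hypothetical and are not taken from the patent.

```python
def split_into_blocks(program):
    """Split a program into instruction blocks.

    program: list of (mnemonic, target_index_or_None) tuples.
    Returns a list of blocks, each a list of instruction indices.
    A new block starts at index 0, at every jump or call target, and
    at the instruction that follows a control transfer.
    """
    if not program:
        return []
    transfers = {"jump", "call", "ret"}   # hypothetical mnemonics
    entry_points = {0}
    for i, (op, target) in enumerate(program):
        if op in transfers:
            if target is not None:
                entry_points.add(target)      # destination of a jump
            if i + 1 < len(program):
                entry_points.add(i + 1)       # instruction after an exit point
    starts = sorted(entry_points)
    ends = starts[1:] + [len(program)]
    return [list(range(s, e)) for s, e in zip(starts, ends)]


program = [("add", None), ("jump", 3), ("sub", None), ("mul", None), ("ret", None)]
print(split_into_blocks(program))
```

The middle block in this example holds a single instruction, which matches the remark that a block may be as short as one instruction.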
The instruction blocks can be distinguished within the system on chip by numbering them. The first two instructions of each indirect jump instruction block are left uncompressed; all other instructions in the block are compressed.

Referring again to Figs. 3 and 4, the method comprises a second step [S12]: placing the first two uncompressed instructions of each indirect jump instruction block in the indirect surge support unit 1. The indirect surge support unit 1 has a first memory 11, preferably a read-only memory (ROM), which stores the first two uncompressed instructions of each indirect jump block.

Referring to Figs. 3 to 5, the method comprises a third step [S13]: holding in the compressed instruction storage unit 2 all compressed instructions of the direct jump instruction blocks and of the indirect jump instruction blocks, that is, everything except the first two uncompressed instructions of each indirect jump instruction block. The compressed instruction storage unit 2 has a first register 21, an accumulator register 22, a multiplexer 23, a second memory 24, and a bit offset register 25. The first register 21 holds the data of the compressed instructions of the direct and indirect jump instruction blocks, while the accumulator register 22 automatically advances the address of the compressed instruction in the first register 21 to the address of the second compressed instruction (the program counter plus 4) so that the data of that instruction can be stored. The multiplexer 23 is a two-to-one multiplexer with two input ports 231, one output port 232, and one select port 233.
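Steps S12 and S13 amount to a placement rule for each block. The sketch below shows one way to express it; the dictionary layout, the `compress` callback, and the names `surge_rom` and `compressed_store` are assumptions made for illustration, and the actual compression scheme is not modeled.

```python
def place_blocks(blocks, compress):
    """Distribute block contents over the two memories.

    blocks:   list of dicts {"kind": "indirect" | "direct", "instrs": [...]}.
    compress: callable applied to each instruction that is stored compressed
              (a stand-in for the real encoder, which is not specified here).
    Returns (surge_rom, compressed_store), both keyed by block number.
    """
    surge_rom = {}
    compressed_store = {}
    for block_id, block in enumerate(blocks):
        instrs = block["instrs"]
        if block["kind"] == "indirect":
            surge_rom[block_id] = instrs[:2]   # first two stay uncompressed
            rest = instrs[2:]
        else:
            rest = instrs                      # direct blocks are fully compressed
        compressed_store[block_id] = [compress(word) for word in rest]
    return surge_rom, compressed_store


blocks = [
    {"kind": "direct", "instrs": ["i0", "i1"]},
    {"kind": "indirect", "instrs": ["i2", "i3", "i4"]},
]
rom, store = place_blocks(blocks, compress=lambda word: ("c", word))
print(rom)
print(store)
```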
The input ports 231 are connected to the respective outputs of the first register 21 and the accumulator register 22 so as to receive their instruction data. The output port 232 is connected to an input of the second memory 24 so that the instruction data of the first register 21 or the accumulator register 22 is passed into the second memory 24. The select port 233 receives a signal indicating whether a branch has occurred, which determines which input port 231 is connected to the output port 232. The second memory 24 is likewise a read-only memory (ROM) and stores the compressed instructions of the direct and indirect jump instruction blocks (that is, excluding the first two uncompressed instructions of each indirect jump instruction block). An input of the bit offset register 25 is connected to an output of the start address mapping logic 56 of the address control unit 5 so as to receive the bit offset value that the logic issues, and an output of the bit offset register 25 is connected to the instruction decompression unit 3, where the value serves as the bit offset.

Referring again to Figs. 3 to 5, the method comprises a fourth step [S14]: sending the compressed instructions of the compressed instruction storage unit 2 to the instruction decompression unit 3 for decompression and then to the microprocessor 6. In other words, an output of the second memory 24, an output of the bit offset register 25, and an output of the branch detector 51 of the address control unit 5 are connected to respective inputs of the instruction decompression unit 3, delivering the compressed instruction and the branch signal for decompression.

As shown in Fig. 5, the instruction decompression unit 3 has a direct jump register group 31, an accumulated instruction register group 32, a logic circuit 33, a multiplexer 34, an accumulator register 35, a carry register 36, a barrel shifter 37, an adder 38, and a codeword table 39. The direct jump register group 31 and the accumulated instruction register group 32 hold compressed addresses of the direct jump instruction blocks in the second memory 24; the accumulated instruction register group 32 holds the address of the current sequential instruction when no jump has occurred. The logic circuit 33 has two input ports and one output port: one input receives the decision of the branch detector 51 of the address control unit 5, and the other receives the signal issued by the carry register 36. The logic circuit 33 is an OR gate, and its output controls the multiplexer 34. The multiplexer 34 is a two-to-one multiplexer whose inputs are the direct jump register group 31 and the accumulated instruction register group 32 and whose output delivers the encoded data to the barrel shifter 37. The select port of the multiplexer 34 is connected to the output port of the logic circuit 33, so the result of the logic circuit 33 decides which of the two register groups is passed on to the barrel shifter 37.
The barrel shifter 37 is a cyclic bit shifter. It uses a value from the accumulator register 35 as the number of bits to shift, and the shifted data is passed to the codeword table 39 for decoding. The adder 38 sends its result to the accumulator register 35 and the carry register 36 for use in the next clock cycle. The codeword table 39 holds the codes and the length of each code.

Referring again to Figs. 3 and 5, the method comprises a fifth and a sixth step. [S15]: the branch detector 51 of the address control unit 5 detects whether the instruction output by the microprocessor 6 is a branch instruction. [S16]: the branch detector 51 supplies its decision to the output data selection multiplexer 4, which then outputs instruction data from either the compressed instruction storage unit 2 or the indirect surge support unit 1. The output data selection multiplexer 4 has several input ports 41, an output port 42, and a select port 43. In this embodiment it is a two-to-one multiplexer with two input ports 41 and one output port 42. The two input ports 41 are connected to the first memory 11 of the indirect surge support unit 1 and to an output of the instruction decompression unit 3, so as to receive the instruction data of the first memory 11 and the second memory 24 respectively. The output port 42 is connected to the microprocessor 6 and delivers that instruction data to it. The select port 43 receives the decision of the branch detector 51, which determines which input port 41 is connected to the output port 42.
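The interplay of bit offset, barrel shifter 37, adder 38, carry register 36, and codeword table 39 can be pictured as variable-length decoding from a bit stream. The sketch below is an assumption-laden illustration: the patent does not give the code itself, so the prefix code in `CODEWORD_TABLE`, the 8-bit `WORD_BITS`, and both function names are hypothetical. String slicing stands in for the shift, the addition of the code length for the adder, and the word-boundary flag for the carry.

```python
# Hypothetical prefix code: codeword bits -> decoded instruction.
CODEWORD_TABLE = {"0": "nop", "10": "add", "110": "load", "111": "jump"}
WORD_BITS = 8  # assumed width of one compressed-memory word


def decode_one(bits, offset):
    """Decode one codeword starting at bit position `offset`.

    Returns (instruction, new_offset, carry), where carry is True when
    the new offset has moved into the next memory word.
    """
    max_len = max(len(code) for code in CODEWORD_TABLE)
    for length in range(1, max_len + 1):
        window = bits[offset:offset + length]      # shift to the current offset
        if len(window) == length and window in CODEWORD_TABLE:
            new_offset = offset + length           # add the code length
            carry = new_offset // WORD_BITS != offset // WORD_BITS
            return CODEWORD_TABLE[window], new_offset, carry
    raise ValueError("no codeword matches at this offset")


def decode_all(bits, count, offset=0):
    """Decode `count` consecutive codewords from `bits`."""
    decoded = []
    for _ in range(count):
        instr, offset, _carry = decode_one(bits, offset)
        decoded.append(instr)
    return decoded


print(decode_all("010110111", 4))
```

Keeping the code length in the table next to each code is what lets the offset for the following instruction be computed in the same step as the decode.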
Within one clock cycle of the microprocessor 6, the output data selection multiplexer 4 decides, according to the decision received at the select port 43, whether the output port 42 currently delivers the instruction data of the first memory 11 or of the second memory 24 to the microprocessor 6.

Referring to Figs. 3 and 4, the method comprises a seventh step [S17]: a decoding director 52 calculates the destination address of an instruction and transmits that address to the compressed instruction storage unit 2 or the indirect surge support unit 1. In addition to the branch detector 51, the decoding director 52, and the start address mapping logic 56, the address control unit 5 has a direct jump sequence register 53, an indirect jump sequence register 54, and a second register 55. The direct jump sequence register 53 and the indirect jump sequence register 54 are connected to the decoding director 52 [...]. The branch detector 51 receives the instruction sent by the microprocessor 6, determines whether it is a branch instruction, such as a jump instruction or a subroutine return instruction, and outputs the result to the output data selection multiplexer 4.

More specifically, the microprocessor 6 contains a program counter (PC, not shown). After the microprocessor 6 processes an instruction, the program counter is incremented by 4 so as to index the address of the next instruction. When a branch instruction occurs, however, execution continues elsewhere and the program counter advances by more than 4. The branch detector 51 uses this to recognize the branch instruction from the output of the microprocessor 6. The decoding director 52 of the address control unit 5 contains a program counter of its own, whose value is kept synchronized with the program counter inside the microprocessor 6. When a branch instruction occurs, the decoding director 52 reloads a new program counter value, namely the current address value, synchronized with the microprocessor 6, plus the address value of the target to which the branch instruction jumps. At the same time, the decoding director 52 determines whether the instruction is a direct jump or an indirect jump.

The start address mapping logic 56 is connected to the second register 55 and the decoding director 52. It receives the new program counter value from the decoding director 52, generates from it the address value of the corresponding block, and loads that address value into both the second register 55 and the compressed instruction storage unit 2. The program code of the system on chip, which has been divided into several blocks, can thus perform the jump and output the instructions of the target block for execution. At the same time, a signal identifying the block as a direct jump block or an indirect jump block is stored in the direct jump sequence register 53 or the indirect jump sequence register 54 to control the instruction signals within the block. The second register 55 can hold the address value from the decoding director 52.
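The detection rule and the output selection described in steps S15 to S17 can be traced on a short sequence of program counter values. This is a simplified sketch, not the patented circuit: the function names and source labels are hypothetical, and the detector here treats any step other than 4 as a branch, which is an assumption (the text speaks of an increment greater than 4). Only the select signal in the cycle where the branch is detected is modeled; how the second uncompressed instruction is sequenced is left out.

```python
PC_STEP = 4  # sequential increment of the program counter, per the text


def branch_detected(prev_pc, pc):
    """Assumed detection rule: the PC did not advance by exactly PC_STEP."""
    return pc - prev_pc != PC_STEP


def select_sources(pc_trace):
    """For each fetch after the first, name the source chosen by the
    output data selection multiplexer (labels are illustrative)."""
    sources = []
    for prev_pc, pc in zip(pc_trace, pc_trace[1:]):
        if branch_detected(prev_pc, pc):
            sources.append("surge_unit")      # uncompressed instruction, no wait
        else:
            sources.append("decompressor")    # normal decompressed stream
    return sources


print(select_sources([0, 4, 8, 40, 44]))
```

Because the uncompressed instructions are available immediately, the decompression path has time to be redirected to the new block while the processor keeps receiving instructions.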

As described above, when the conventional pipelined decompression method of Fig. 1 executes instructions and a branch occurs, the operation of the pipeline is disrupted: subsequent instruction fetches are held up by the calculation of the branch target, which causes delay and lowers the efficiency of the system on chip. The present invention divides all instructions of the system on chip into indirect jump instruction blocks and direct jump instruction blocks so that the branch detector 51 can control the output data selection multiplexer 4 to output either the compressed instruction from the compressed instruction storage unit 2 or the first two uncompressed instructions from the indirect surge support unit 1, which does improve the overall operating speed.

Although the invention has been disclosed by way of the above preferred embodiment, the embodiment is not intended to limit the invention. Changes and modifications made by those skilled in the art without departing from the spirit and scope of the invention remain within its protected technical scope, which is defined by the appended claims.

[Brief Description of the Drawings]
Fig. 1: Schematic of the architecture of the conventional pipelined decompression method.
Fig. 2: Schematic of instruction processing in the conventional pipelined decompression method.
Fig. 3: Schematic of the architecture of the high-performance pipelined decompression method with no wait cycles according to the preferred embodiment of the invention.
Fig. 4: Flow chart of the steps of the method according to the preferred embodiment.
Fig. 5: Schematic of the architecture of the instruction decompression unit of the method according to the preferred embodiment.

[Description of Main Reference Numerals]
1 indirect surge support unit
11 first memory
2 compressed instruction storage unit
21 first register
22 accumulator register
23 multiplexer
24 second memory
25 bit offset register
3 instruction decompression unit
341 indirect jump register
342 accumulated instruction register group
343 logic circuit
344 multiplexer
345 accumulator register
346 carry register
347 barrel shifter
348 adder
349 codeword table
4 output data selection multiplexer
5 address control unit
51 branch detector
52 decoding director
53 direct jump sequence register
54 indirect jump sequence register
55 second register
56 address mapping logic
6 microprocessor
81 microprocessor
82 instruction buffer
83 compressed instruction memory
84 decompression circuit
S11 Divide all instructions of a system on chip into several indirect jump instruction blocks and several direct jump instruction blocks.
S12 Place the first two uncompressed instructions of each indirect jump instruction block in an indirect surge support unit.
S13 Hold in a compressed instruction storage unit all compressed instructions of the direct and indirect jump instruction blocks other than the first two uncompressed instructions of each indirect jump instruction block.
S14 Send the compressed instructions of the compressed instruction storage unit to an instruction decompression unit for decompression, then to a microprocessor.
S15 Use a branch detector of an address control unit to detect whether the instruction output by the microprocessor is a branch instruction.
S16 Have the branch detector provide its decision to an output data selection multiplexer so as to control the multiplexer to output instruction data from one of the compressed instruction storage unit and the indirect surge support unit.
S17 Have a decoding director calculate the destination address of an instruction and transmit that address to the compressed instruction storage unit or the indirect surge support unit.

Claims

1. A high-performance pipelined decompression method with no wait cycles, comprising:
dividing all instructions of a system on chip into several indirect jump instruction blocks and several direct jump instruction blocks;
placing the first two uncompressed instructions of each indirect jump instruction block in an indirect surge support unit;
holding in a compressed instruction storage unit all compressed instructions of the direct jump instruction blocks and the indirect jump instruction blocks other than the first two uncompressed instructions of each indirect jump instruction block;
sending the compressed instructions of the compressed instruction storage unit to an instruction decompression unit for decompression and then to a microprocessor;
using a branch detector of an address control unit to detect whether an instruction output by the microprocessor is a branch instruction;
providing the decision of the branch detector to an output data selection multiplexer so as to control the multiplexer to output instruction data from one of the compressed instruction storage unit and the indirect surge support unit; and
calculating, by a decoding director, the destination address of an instruction and transmitting that address to the compressed instruction storage unit or the indirect surge support unit.

2. The method according to claim [...], wherein each instruction block has one entry point and at least one exit point.

3. The method according to claim [...], wherein the indirect surge support unit is provided with a first memory.

4. The method according to claim [...], wherein the first memory is a read-only memory.

5. The method according to claim 1, wherein the compressed instruction storage unit is provided with a first register, which holds the compressed instructions of the direct jump instruction blocks and the indirect jump instruction blocks, and an accumulator register, which automatically advances the instruction address [...].

6. The method according to claim [...], wherein the compressed instruction storage unit is further provided with a multiplexer and a second memory, the multiplexer having two input ports, one output port, and one select port; the input ports are connected to the outputs of the first register and the accumulator register, the output port is connected to the second memory, and the select port receives a signal indicating whether a branch has occurred.

7. The method according to claim [...], wherein the compressed instruction storage unit is further provided with a bit offset register, an input of which is connected to an output of the address control unit and an output of which is connected to the instruction decompression unit.

8. The method according to claim [...], wherein an output of the second memory of the compressed instruction storage unit and an output of the branch detector of the address control unit are connected to the instruction decompression unit.

[...] generates the address value of a block and loads the address value into the compressed instruction storage unit.

14. The method according to claim 1, wherein, when the branch instruction occurs, the decoding director reloads a new program counter value, the new program counter value being the current address value, synchronized with the microprocessor, plus the address value of the target to which the branch instruction jumps.
