WO2000008555A1 - Data processing device - Google Patents
Data processing device
- Publication number
- WO2000008555A1 (application PCT/EP1999/005520)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- instruction
- stage
- register file
- result
- processing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30025—Format conversion instructions, e.g. Floating-Point to Integer, decimal conversion
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30036—Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30098—Register arrangements
- G06F9/30141—Implementation provisions of register files, e.g. ports
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30181—Instruction operation extension or modification
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3854—Instruction completion, e.g. retiring, committing or graduating
- G06F9/3858—Result writeback, i.e. updating the architectural state or memory
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3867—Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
Definitions
- the invention relates to a data processing device with an instruction execution pipeline.
- PCT patent application No. WO 98/11483 teaches a data processing device with an instruction pipeline.
- the pipeline contains a series of processing stages from a front end to a back end, for performing successive operations during the execution of an instruction.
- the final stage of the back end writes back a processing result to a register file.
- the pipeline can process several instructions in parallel, because the front end processing stages can start executing an instruction before the back end processing stages have produced and written back the result of an earlier instruction.
- the data processing device is described in Claim 1.
- the invention provides for the possibility to write back results from different processing stages in the pipeline directly after such a processing stage completes processing of an instruction, that is, without passing the entire pipeline and before the entire pipeline has had the opportunity to process the instruction.
- a first processing stage might perform an arithmetic operation and a second processing stage might perform a clipping operation on the result of the arithmetic operation.
- one may include two types of instruction in the instruction set of the data processing device: one type for arithmetic operations with clipping and one type for arithmetic operations without clipping. In case of an operation with clipping the result would be written back from the second processing stage (after completion of the clipping operation), and in case of an operation without clipping the result would be written back from the first processing stage (before completion of the clipping operation).
- the data processor may even write the result of both the first and the second stage (e.g. with and without clipping) in response to some instructions. This means that the result is written back to the register file directly after the processing stage produces its result, that is, earlier than if the processor has to wait for a time period corresponding to the time needed by the second processing stage.
- Writing to the register file is normally followed by writing to a register after a predetermined delay, but without deviating from the invention, some types of register file may introduce a variable delay until writing is complete, for example in order to resolve access conflicts.
- the register file is provided with more than one write port, so that results from different stages of the pipeline can be written back in parallel.
- different write ports of the register file are assigned to different processing stages, so that the pipeline is connected to more write ports than needed for writing the result of individual instructions, in order to be able to write results of different instructions in the pipeline from different processing stages in parallel.
- Figure 1 shows an architecture of a data processing device
- Figure 2 shows a functional unit.
- FIG 1 shows the architecture of a data processor.
- the processor contains a register file 10, a number of functional units 12a-f and an instruction issue unit 14.
- the instruction issue unit 14 has instruction issue connections to the functional units 12a-f.
- the functional units 12a-f are connected to the register file 10 via read and write ports.
- a first one of the functional units 12a has two read ports and two write ports connected to the register file 10.
- Figure 2 shows the first one of the functional units 70, with a cascade of a first and second sub-unit 72, 74. An output of the first sub-unit is coupled to an input of the second sub-unit and to a write port of the register file 10.
- the functional unit 70 contains two control units 76, 78 coupled to a control input of the first and second sub-unit 72, 74 respectively.
- An input of the first control unit 76 is coupled to an output of the instruction issue unit for receiving an opcode.
- An output of the first control unit 76 is coupled to an input of the second control unit 78.
- the instruction issue unit 14 fetches successive instruction words from an instruction memory (not shown explicitly). Each instruction word may contain several instructions for different ones of the functional units 12a-f. Normally, each instruction contains fields specifying an opcode, one or more source registers and one result register. When the instruction issue unit 14 has fetched an instruction word from instruction memory, the fields specifying the source registers in a particular instruction are decoded and used to address the register file 10. In response, the register file 10 supplies the content of the source registers to the functional unit 12a-f that will execute the particular instruction.
- the field specifying the opcode and the content of the source registers is supplied to the functional unit 70.
- the functional unit 70 operates in successive processing cycles.
- a control signal for the first sub-unit 72 is generated by the first control unit 76, dependent on the opcode.
- the first sub-unit 72 generates a result which the first sub-unit may write to the register file via the write port (writing depends on the control signal).
- the result (and possible additional information) is passed to the second sub-unit 74.
- a further control signal dependent on the opcode is passed from the first control unit 76 to the second control unit 78.
- the second sub-unit 74 processes the result generated by the first sub-unit 72 under control of the control signal passed by the second control unit 78.
- a second result, generated by the second sub-unit 74, may be written to the register file via a write port (writing depends on the control signal from the second control unit 78).
- the first control unit 76 may already cause the first sub-unit 72 to process a subsequent instruction.
- processors that have two or more functional units that can start processing different instructions in parallel, such as VLIW processors.
- These processors can execute further instructions I3 and I4 that use the results of I1 and I2 respectively. Due to the invention such a processor can start I3 and I4 in the same cycle, which makes processing faster.
- the first sub-unit 72 may be for example an ALU and the second sub-unit 74 may be a clipping unit or a rounding unit.
- the instruction may be for example an "ADD" instruction.
- the first sub-unit 72 adds the source operands and writes the sum to the register file via its write port, i.e. without involvement of the second sub-unit 74; the second sub-unit 74 refrains from writing to its write port if it receives this first type of ADD instruction.
- the first sub-unit 72 adds the source operands, but it refrains from writing the sum to the register file via its write port; the second sub-unit 74 responds to the second type of ADD instruction e.g. by rounding or clipping the sum, which the second sub-unit 74 receives from the first sub-unit 72. Also in response to the second type of ADD instruction the second sub-unit 74 writes the result of its operation on the sum to the write port of the second sub-unit 74.
- adding and rounding or clipping are used here merely by way of example; many other types of operations that produce meaningful intermediate results may be used, e.g. other arithmetic or logic operations or vector operations instead of ADD, and further arithmetic or logic operations on the result of the first sub-unit 72 instead of rounding or clipping.
- the functional unit may respond to some instructions by writing back from both of the sub-units. This leads to the following pipeline table.
- each sub-unit 72, 74 itself may contain one or more further subunits, or pipeline stages which process the instruction in successive processing stages.
- more sub-units for implementing different pipeline stages may be placed in series with the first and second sub-unit 72, 74.
- more than two of such further sub-units may be connected to their own write ports to the register file 10 for writing a result produced at an intermediate stage in the pipeline.
- the pipeline table may be
- forks in the pipeline may be included, where one sub-unit feeds two or more further subunits in parallel, one or more of these sub-units having their own write ports for writing results back to the register file 10.
- one may include one or more sub-units (not shown) in parallel to the first sub-unit 72, each having its own instruction and operand inputs and its own write port for writing to the register file 10.
- these one or more sub-units and the first sub-unit 72 may feed a single second sub-unit 74 in parallel via a multiplexer (not shown), the pipelined instructions determining from which of the sub-units the multiplexer passes results to the second sub-unit 74.
- several instructions may be executed in parallel and a selected one of them may be followed by postprocessing in the second sub-unit 74.
- a compiler for the processor will have to schedule operations in such a way that results are produced timely, without conflicts about the use of functional units 12a-f or registers.
- the compiler can treat the functional unit 70 more or less as two or more conceptually different functional units, one for processing instructions without processing by the second sub-unit 74 and one for processing instructions including processing by the second sub-unit 74. These conceptually different functional units have different latencies.
- the compiler will avoid scheduling instructions simultaneously at the functional unit, but the compiler may schedule the start of a further instruction at a time when the second sub-unit 74 is still working on the previous instruction. Owing to the invention the compiler can schedule instructions that use the result of the further instruction earlier, for example as early as an instruction that uses a result of the previous instruction.
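The write-back behaviour described above can be sketched as a small cycle-level model. This is an illustrative simulation only, not the implementation claimed in the application: the mnemonics `ADD` and `ADDCLIP`, the class names, the signed 8-bit clipping range, and the one-cycle latency per sub-unit are all assumptions made for the example. Operands are read at the start of a cycle and both write ports commit at the end of it.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    op: str    # "ADD" writes back from sub-unit 1, "ADDCLIP" from sub-unit 2 (hypothetical mnemonics)
    dst: str
    src1: str
    src2: str


class FunctionalUnit:
    """Two cascaded sub-units, each with its own write port to the register file."""

    def __init__(self, regs, lo=-128, hi=127):
        self.regs = regs
        self.lo, self.hi = lo, hi   # assumed clipping range (signed 8-bit)
        self.stage2 = None          # (instruction, sum) held for the second sub-unit
        self.log = []               # (cycle, port, destination, value)

    def step(self, cycle, instr=None):
        writes = []
        # Second sub-unit: clip the sum produced by the first sub-unit one cycle earlier.
        if self.stage2 is not None:
            prev, total = self.stage2
            clipped = max(self.lo, min(self.hi, total))
            writes.append(("port2", prev.dst, clipped))
            self.stage2 = None
        # First sub-unit: add, then either write back at once or pass the sum on.
        if instr is not None:
            total = self.regs[instr.src1] + self.regs[instr.src2]
            if instr.op == "ADD":
                writes.append(("port1", instr.dst, total))
            elif instr.op == "ADDCLIP":
                self.stage2 = (instr, total)
            else:
                raise ValueError(f"unknown opcode {instr.op!r}")
        # Both write ports commit in parallel at the end of the cycle.
        for port, dst, value in writes:
            self.regs[dst] = value
            self.log.append((cycle, port, dst, value))


regs = {"r1": 100, "r2": 100, "r3": 0, "r4": 0}
fu = FunctionalUnit(regs)
program = [
    Instr("ADDCLIP", "r3", "r1", "r2"),  # issued in cycle 0, result written in cycle 1
    Instr("ADD", "r4", "r1", "r2"),      # issued in cycle 1, result written in cycle 1
    None,                                # empty issue slot
]
for cycle, instr in enumerate(program):
    fu.step(cycle, instr)

for entry in fu.log:
    print(entry)
```

In this model the clipped result of the first instruction and the unclipped result of the second instruction reach the register file in the same cycle through different write ports, so instructions that consume either result could be scheduled for the following cycle. A compiler could therefore treat the unit as two conceptual units with latencies of one and two cycles, under the assumptions stated above.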
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Advance Control (AREA)
Abstract
The invention relates to a data processing device that has an instruction execution pipeline comprising at least a first and a second processing stage, directly or indirectly in series. The stages perform a first and a second step of instruction execution, a first and a second mutually different number of processing cycles after the instruction enters the pipeline. Both the first and the second stage are connected to a register file, so that a processing result obtained during the first and/or the second step can be written to the register file after the first and the second number of processing cycles, respectively.
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP98202647.8 | 1998-08-06 | ||
| EP98202647 | 1998-08-06 | ||
| EP98203425 | 1998-10-09 | ||
| EP98203425.8 | 1998-10-09 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2000008555A1 (fr) | 2000-02-17 |
Family
ID=26150605
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/EP1999/005520 (Ceased) WO2000008555A1 (fr) | 1998-08-06 | 1999-07-29 | Data processing device |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2000008555A1 (fr) |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US4228497A (en) * | 1977-11-17 | 1980-10-14 | Burroughs Corporation | Template micromemory structure for a pipelined microprogrammable data processing system |
| EP0653703A1 (fr) * | 1993-11-17 | 1995-05-17 | Sun Microsystems, Inc. | Temporary register set for a superpipelined-superscalar processor |
Non-Patent Citations (1)
| Title |
|---|
| "METHOD TO MAINTAIN PIPELINE THROUGHPUT WHILE PIPELINE DEPTH IS ALLOWED TO VARY", IBM TECHNICAL DISCLOSURE BULLETIN,US,IBM CORP. NEW YORK, vol. 39, no. 5, pages 31-32, XP000584045, ISSN: 0018-8689 * |
Cited By (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN100418645C (zh) * | 2003-03-21 | 2008-09-17 | 迪纳帕克压紧设备股份公司 | Adjusting device for adjusting the eccentric moment of the eccentric shaft of a compactor roller |
| EP2866138B1 (fr) * | 2013-10-23 | 2019-08-07 | Teknologian tutkimuskeskus VTT Oy | Floating-point supporting pipeline for emulated shared memory architectures |
| EP2887207B1 (fr) * | 2013-12-19 | 2019-10-16 | Teknologian tutkimuskeskus VTT Oy | Architecture for long latency operations in emulated shared memory architectures |
| JP2021168189A (ja) * | 2020-07-15 | 2021-10-21 | ベイジン バイドゥ ネットコム サイエンス アンド テクノロジー カンパニー リミテッド | Apparatus and method for writing back an instruction execution result, and processing apparatus |
| EP3940531A1 (fr) * | 2020-07-15 | 2022-01-19 | Kunlunxin Technology (Beijing) Company Limited | Apparatus and method for writing back an instruction execution result, and processing apparatus |
| JP7229305B2 (ja) | 2020-07-15 | 2023-02-27 | ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド | Apparatus and method for writing back an instruction execution result, and processing apparatus |
| CN118963839A (zh) * | 2024-07-30 | 2024-11-15 | 中山大学 | Pseudo two-stage pipeline processor based on RV32I instructions and control method thereof |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| US20020169942A1 (en) | VLIW processor | |
| EP1658559B1 (fr) | Instruction controlled data processing device and method | |
| JP2918631B2 (ja) | Decoder | |
| US5404552A (en) | Pipeline risc processing unit with improved efficiency when handling data dependency | |
| US7281119B1 (en) | Selective vertical and horizontal dependency resolution via split-bit propagation in a mixed-architecture system having superscalar and VLIW modes | |
| JP3881763B2 (ja) | Data processing device | |
| CN102063286B (zh) | Program flow control | |
| US6260189B1 (en) | Compiler-controlled dynamic instruction dispatch in pipelined processors | |
| US6145074A (en) | Selecting register or previous instruction result bypass as source operand path based on bypass specifier field in succeeding instruction | |
| JP2002512399A (ja) | RISC processor with context switch register sets accessible by an external coprocessor | |
| JPH11224194A5 (fr) | | |
| US6154828A (en) | Method and apparatus for employing a cycle bit parallel executing instructions | |
| JP2003005958A (ja) | Data processing device and control method thereof | |
| JP3578883B2 (ja) | Data processing device | |
| JP2874351B2 (ja) | Parallel pipeline instruction processing device | |
| WO2000008555A1 (fr) | Data processing device | |
| US7111152B1 (en) | Computer system that operates in VLIW and superscalar modes and has selectable dependency control | |
| US6099585A (en) | System and method for streamlined execution of instructions | |
| JPH08272611A (ja) | Microprocessor | |
| US7302555B2 (en) | Zero overhead branching and looping in time stationary processors | |
| JP3182591B2 (ja) | Microprocessor | |
| US6981130B2 (en) | Forwarding the results of operations to dependent instructions more quickly via multiplexers working in parallel | |
| JP2878792B2 (ja) | Electronic computer | |
| JP3534987B2 (ja) | Information processing device | |
| US6032249A (en) | Method and system for executing a serializing instruction while bypassing a floating point unit pipeline | |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AK | Designated states | Kind code of ref document: A1. Designated state(s): JP |
| | AL | Designated countries for regional patents | Kind code of ref document: A1. Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
| | 122 | Ep: pct application non-entry in european phase | |