TWI861966B - Processor, data processing system, method and design structure for performing secure hash algorithms - Google Patents

Publication number
TWI861966B
Authority
TW
Taiwan
Prior art keywords
message
register
state
circuit
block
Prior art date
Application number
TW112124023A
Other languages
Chinese (zh)
Other versions
TW202409871A (en)
Inventor
曼諾 庫瑪
希薇亞 梅莉塔 穆勒
笛巴普里雅 洽特傑
尼爾斯 弗里克
凱塔莫瑞 艾卡納達
馬丁 J 布爾斯馬
馬丁 迪德 柏克斯
Original Assignee
美商萬國商業機器公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US17/857,627 (US20240015004A1)
Priority claimed from US17/884,704 (US12411996B2)
Application filed by 美商萬國商業機器公司
Publication of TW202409871A
Application granted
Publication of TWI861966B

Landscapes

  • Executing Machine-Instructions (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A processor includes a register file and an execution unit. The execution unit includes a hash circuit including at least a state register, a state update circuit coupled to the state register, and a control circuit. Based on a hash instruction, the hash circuit receives from the register file, and buffers within the state register, a current state of a message being hashed. The state update circuit performs a state update function on the contents of the state register, where performing the state update function includes performing a plurality of iterative rounds of processing on the contents of the state register and returning a result of each of the plurality of iterative rounds of processing to the state register. Following completion of all of the plurality of iterative rounds of processing, the execution unit stores the contents of the state register to the register file as an updated state of the message.

Description

Processor, data processing system, method and design structure for performing secure hash algorithms

The present invention relates generally to data processing and, more particularly, to efficiently executing secure hash algorithms in hardware.

An important aspect of data security is the protection, through encryption, of data at rest (e.g., when stored in a data storage device) or data in transit (e.g., during transmission). In general, encryption involves converting unencrypted data (referred to as plaintext) into encrypted data (referred to as ciphertext) by combining the plaintext with one or more encryption keys using an encryption function. To recover the plaintext from the ciphertext, the ciphertext is processed by a decryption function using one or more decryption keys. Encryption thus provides data security by requiring that a party know an additional secret (i.e., the decryption key) before that party can access the protected plaintext.

In many implementations, data encryption is performed by software executing on a general-purpose processor. While implementing encryption in software offers the advantages of being able to select among different encryption algorithms and of easily adapting the selected algorithm to various data lengths, performing encryption in software has the attendant disadvantage of relatively poor performance. As the volume of data sets continues to increase dramatically in the "Big Data" era, the performance achieved by software-implemented encryption may be unacceptable when encrypting large messages and/or data sets. Concerns about encryption performance also arise from the growing need to run enterprise applications on encrypted data in order to mitigate the consequences of hacking and other cyberattacks and to ensure regulatory compliance. Consequently, it is often desirable to provide support for encryption in hardware to achieve improved performance.

The present invention recognizes that one class of cryptographic algorithms for which hardware support is desirable is hash functions, including but not limited to hash functions belonging to the Secure Hash Algorithm (SHA) family of standards. As is known in the art, the SHA family of standards defines hash algorithms approved by the National Institute of Standards and Technology (NIST) for generating a condensed representation of a message (i.e., a message digest). The SHA family of standards is specified in two Federal Information Processing Standards (FIPS): FIPS 180-4, "Secure Hash Standard," and FIPS 202, "SHA-3 Standard: Permutation-Based Hash and Extendable-Output Functions," which are incorporated herein by reference. FIPS 180-4 specifies seven hash algorithms, namely Secure Hash Algorithm-1 (SHA-1) and the SHA-2 family of hash algorithms, which includes SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, and SHA-512/256. FIPS 202 additionally specifies four SHA-3 hash algorithms having fixed-length outputs (i.e., SHA3-224, SHA3-256, SHA3-384, and SHA3-512) and two closely related extendable-output functions (XOFs) named SHAKE128 and SHAKE256 (where SHAKE is an abbreviation of Secure Hash Algorithm and Keccak). Additional uses of the SHA family of standards (e.g., as a stream cipher, an authenticated encryption system, or a tree hashing scheme) have not yet been adopted as NIST standards.
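As a software point of reference for the algorithm families listed above, the following minimal sketch uses Python's standard `hashlib` module to show the fixed digest lengths of the SHA-1, SHA-2, and SHA-3 functions and the caller-chosen output length of a SHAKE function. It illustrates the standards only and says nothing about the hardware described here; the helper names `digest_bits` and `xof_output` are illustrative assumptions, and SHA-512/224 and SHA-512/256 are left out because their availability in `hashlib` depends on the underlying library build.

```python
import hashlib

# Fixed-length members of the SHA family that hashlib always provides.
FIXED = ["sha1", "sha224", "sha256", "sha384", "sha512",
         "sha3_224", "sha3_256", "sha3_384", "sha3_512"]


def digest_bits(name, message):
    """Return the digest length, in bits, produced by the named algorithm."""
    return len(hashlib.new(name, message).digest()) * 8


def xof_output(name, message, nbytes):
    """Return nbytes of output from a SHAKE extendable-output function."""
    return hashlib.new(name, message).digest(nbytes)


if __name__ == "__main__":
    for name in FIXED:
        print(name, digest_bits(name, b"abc"), "bits")
    # For an XOF, the caller (not the algorithm) chooses the output length.
    print("shake_128", xof_output("shake_128", b"abc", 20).hex())
```

The fixed-length functions always return the same number of bits regardless of message size, which is why a single architected state register can hold their running state.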

Given the wide variety of hash functions and of hash function data sizes (even within the SHA family of standards), broad hardware support for hash functions can cause a large area of the processor layout to be consumed by the circuitry implementing the hash functions. As a result, some hardware solutions implement this circuitry separately from the processor core, for example, in a bus-attached application-specific integrated circuit (ASIC) or accelerator. Although such auxiliary circuits can potentially outperform some software solutions, their use remains subject to bus and memory access latencies and to messaging overhead, which again limits performance relative to what can be achieved within a high-performance processor core. This performance loss is particularly severe for relatively small messages (e.g., messages that fit within a single message block), which make up the majority of SHA messages handled in enterprise servers. The present invention addresses these and other design considerations by efficiently implementing hash functions in the processor.

In one embodiment, a processor includes a register file and an execution unit. The execution unit includes a hash circuit including at least a state register, a state update circuit coupled to the state register, and a control circuit. Based on a hash instruction, the hash circuit receives from the register file, and buffers within the state register, a current state of a message being hashed. The state update circuit performs a state update function on the contents of the state register, where performing the state update function includes performing a plurality of iterative rounds of processing on the contents of the state register and returning a result of each of the plurality of iterative rounds of processing to the state register. Following completion of all of the plurality of iterative rounds of processing, the execution unit stores the contents of the state register to the register file as an updated state of the message.

Such a processor may be incorporated into a data processing system that includes multiple processors, a shared memory, and a system interconnect communicatively coupling the shared memory and the multiple processors. Such a processor may also be tangibly embodied, in a machine-readable storage device, in a design structure for designing, manufacturing, or testing an integrated circuit.

In one embodiment, a method of data processing in a processor includes an instruction fetch unit fetching instructions to be executed by the processor. The instructions include a hash instruction. Based on receiving the hash instruction, an execution unit of the processor executes the hash instruction. Executing the hash instruction includes receiving from a register file, and buffering within a state register of the execution unit, a current state of a message being hashed. Executing the hash instruction also includes performing, in a state update circuit, a state update function on the contents of the state register, where performing the state update function includes performing a plurality of iterative rounds of processing on the contents of the state register and returning a result of each of the plurality of iterative rounds of processing to the state register. Following completion of all of the plurality of iterative rounds of processing, the contents of the state register are stored to the register file as an updated state of the message.

In at least some embodiments, the state update function comprises a Secure Hash Algorithm 3 (SHA3) state permutation function, and the state update circuit performs twenty-four rounds of processing, each round of processing utilizing a respective one of twenty-four round indices as an input.
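The twenty-four-round structure can be sketched in software as follows. This is a plain reference model of the Keccak-f[1600] permutation as specified in FIPS 202, not a description of the patented circuit: each call to `keccak_round` takes the state and a round index, and its result is fed back as the input of the next round, in the way the state register is rewritten after every round. All function names are illustrative assumptions, and `sha3_256_sketch` is included only so that the permutation can be checked against `hashlib`.

```python
import hashlib

MASK64 = (1 << 64) - 1


def rol64(value, shift):
    """Rotate a 64-bit lane left by shift bits."""
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def round_constants():
    """Derive the 24 iota round constants from the FIPS 202 LFSR."""
    constants, lfsr = [], 1
    for _ in range(24):
        rc = 0
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) & 0xFF
            if lfsr & 2:
                rc ^= 1 << ((1 << j) - 1)
        constants.append(rc)
    return constants


RC = round_constants()  # RC[i] is selected by round index i


def keccak_round(lanes, round_index):
    """Apply one round (theta, rho, pi, chi, iota) to a 5x5 array of lanes."""
    # theta: mix each column parity into the neighbouring columns
    c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
         for x in range(5)]
    d = [c[(x + 4) % 5] ^ rol64(c[(x + 1) % 5], 1) for x in range(5)]
    lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
    # rho and pi: rotate each lane and move it to a new position
    x, y = 1, 0
    current = lanes[x][y]
    for t in range(24):
        x, y = y, (2 * x + 3 * y) % 5
        current, lanes[x][y] = lanes[x][y], rol64(current, (t + 1) * (t + 2) // 2)
    # chi: the only non-linear step, applied row by row
    for y in range(5):
        row = [lanes[x][y] for x in range(5)]
        for x in range(5):
            lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
    # iota: the round index selects the constant XORed into lane (0, 0)
    lanes[0][0] ^= RC[round_index]
    return lanes


def keccak_f1600(state):
    """Permute a 200-byte state by running all 24 rounds in sequence."""
    lanes = [[int.from_bytes(state[8 * (x + 5 * y):8 * (x + 5 * y) + 8], "little")
              for y in range(5)] for x in range(5)]
    for round_index in range(24):
        lanes = keccak_round(lanes, round_index)  # result feeds the next round
    out = bytearray(200)
    for x in range(5):
        for y in range(5):
            out[8 * (x + 5 * y):8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")
    return bytes(out)


def sha3_256_sketch(message):
    """SHA3-256 built on keccak_f1600 (rate 136 bytes, domain suffix 0x06)."""
    rate = 136
    padded = bytearray(message) + bytes(rate - len(message) % rate)
    padded[len(message)] ^= 0x06
    padded[-1] ^= 0x80
    state = bytearray(200)
    for offset in range(0, len(padded), rate):
        for i in range(rate):
            state[i] ^= padded[offset + i]
        state = bytearray(keccak_f1600(bytes(state)))
    return bytes(state[:32])
```

Because every round consumes the full 1600-bit output of the previous one, keeping the state in a register next to the round logic avoids a register-file round trip on each of the twenty-four iterations.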

In one embodiment, the execution unit executes the hash instruction in a squeeze phase of a Secure Hash Algorithm and Keccak (SHAKE) hash algorithm.
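The role of the squeeze phase can be illustrated with Python's standard `hashlib.shake_128`. Under FIPS 202, output is read from the state one rate-sized block at a time (168 bytes for SHAKE128), with a further state permutation between blocks, so requesting more output only extends the same stream. The sketch below demonstrates that prefix property; the helper name `squeeze` is an illustrative assumption.

```python
import hashlib


def squeeze(message, nbytes):
    """Return the first nbytes of SHAKE128 output for message."""
    return hashlib.shake_128(message).digest(nbytes)


short_output = squeeze(b"abc", 16)
# 400 bytes spans more than two 168-byte SHAKE128 output blocks.
long_output = squeeze(b"abc", 400)
print(long_output[:16] == short_output)
```

Each additional output block corresponds to one more application of the state permutation, which is the repeated work a hash instruction executed in the squeeze phase would perform.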

In at least some embodiments, the state update function comprises a Secure Hash Algorithm 2 (SHA2) block hash function. In at least some embodiments, the hash circuit further includes an adder configured to add the contents of the state register to the current state and to return the resulting sum to the register file. In some embodiments, the execution unit further includes a message block register for buffering a message block of the message and a message scheduling round circuit coupled to the message block register. The message scheduling round circuit performs a plurality of iterative rounds of processing on the contents of the message block register and returns a result of each of the plurality of iterative rounds of processing to the message block register. In some embodiments, the state update circuit includes a data path for data words having a first data width, and the execution unit is configured, based on the hash instruction indicating a second data width narrower than the first data width, to expand the data words of a message block of the message to the first data width before the data words are processed in the state update circuit.
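The SHA2 flow described above (a message schedule, iterated state update rounds, and a final addition of the working state to the incoming state) can be sketched in software for the 32-bit SHA-256 case. This is a reference model that follows FIPS 180-4, not the patented circuit. Two simplifications are assumptions of the sketch: the whole 64-word schedule is computed up front rather than being updated round by round in a message block register, and the round constants are derived arithmetically from the first 64 primes instead of being tabulated. All names are illustrative.

```python
import hashlib
import math

MASK32 = 0xFFFFFFFF


def first_primes(count):
    """Return the first count primes by trial division."""
    primes, n = [], 2
    while len(primes) < count:
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return primes


def icbrt(n):
    """Floor of the cube root of a non-negative integer (binary search)."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


PRIMES = first_primes(64)
# First 32 fractional bits of the cube roots / square roots of the primes.
K = [icbrt(p << 96) & MASK32 for p in PRIMES]
H0 = [math.isqrt(p << 64) & MASK32 for p in PRIMES[:8]]


def rotr(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def message_schedule(block):
    """Expand a 64-byte message block into 64 schedule words."""
    w = [int.from_bytes(block[4 * i:4 * i + 4], "big") for i in range(16)]
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK32)
    return w


def block_hash(state, block):
    """Run 64 update rounds, then add the working state to the input state."""
    w = message_schedule(block)
    a, b, c, d, e, f, g, h = state
    for t in range(64):
        sigma1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + sigma1 + ch + K[t] + w[t]) & MASK32
        sigma0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (sigma0 + maj) & MASK32
        a, b, c, d, e, f, g, h = ((t1 + t2) & MASK32, a, b, c,
                                  (d + t1) & MASK32, e, f, g)
    # Final word-wise addition of the working state to the incoming state.
    return [(s + v) & MASK32 for s, v in zip(state, [a, b, c, d, e, f, g, h])]


def sha256_sketch(message):
    """SHA-256 over a byte string, built from block_hash."""
    padded = message + b"\x80"
    padded += bytes((-len(padded) - 8) % 64)
    padded += (len(message) * 8).to_bytes(8, "big")
    state = list(H0)
    for offset in range(0, len(padded), 64):
        state = block_hash(state, padded[offset:offset + 64])
    return b"".join(v.to_bytes(4, "big") for v in state)
```

The last line of `block_hash` is the software counterpart of the adder mentioned above: eight independent word-wise additions, which is why a SIMD adder is a natural fit for that step.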

100: Data processing system
102: Processor
104: Processor core
106: Cache memory
110: System interconnect
112: Memory controller
114: System memory
116: Input/output (I/O) adapter
118: Non-volatile storage system
120: Network adapter
200: Processor core
202: Instruction fetch unit
204: Instruction decode unit
206: Branch processing unit
210: Mapper circuit
216: Dispatch circuit
218: Issue queue
220: Fixed-point unit
222: Floating-point unit
224: Load-store unit
226: Vector-scalar unit
230: Storage
300: Architected register file
301: 256-bit registers r0 to rS
302: Functional unit/arithmetic logic unit/rotate unit
304: Functional unit/multiply unit
306: Functional unit/divide unit
308: Functional unit/encryption unit
310: Functional unit/permute unit
312: Functional unit/binary-coded decimal (BCD) unit
314: Accelerator unit
316: Wide vector register file
317: Registers R0 and R1
320: Transfer unit
400: SHA3/SHAKE hash circuit
402: SHA2 hash circuit
404: Single-instruction multiple-data (SIMD) exclusive OR (XOR) circuit
406: Data transfer circuit
500: Process
502: Message
504: SHA3 absorb phase
506: SHA3/SHAKE squeeze phase
508: Message digest
600: Block
602: 1600-bit extended message block
604: SHA3 state permutation function
606: 1600-bit bitwise XOR function
610: 1600-bit final absorb state
702: Round index 0 specified by the SHA-3 standard
704: SHA3 round function
800: Result block 1
802: Truncation function
804: Truncation function
900: SHA3 hash instruction
902: Opcode field
904: Register field
906: Register field
1000: Bitwise XOR instruction
1002: Opcode field
1004: Register field
1006: Register field
1008: Register field
1100a: 1024-bit two-input multiplexer
1100b: 1024-bit two-input multiplexer
1102a: 1024-bit state register
1102b: 1024-bit state register
1106: SHA3 round circuit
1108: Output multiplexer
1110: Control circuit/control logic
1200: Block
1202: Block
1204: Block
1206: Block
1208: Block
1210: Block
1214: Block
1216: Block
1300: SHA2 hash function
1302: Message
1304: Block
1306: 16×w-bit message block/message block 1
1308: 8×w-bit initial state
1310: SHA2 block hash function
1312: Truncation function
1314: Message digest
1400: Message scheduling round function
1402: w-bit round key
1404: Round function/initial update round
1406: Block
1410: 8×w-bit carry-propagate addition function
1420: Block/16×w-bit initialization
1500: SHA2-224 or SHA2-256 input message
1502: 32-bit word
1504: Output message
1506: 64-bit doubleword
1508: 32-bit zero word
1600: SHA2 hash instruction
1602: Opcode field
1604: Operand register field
1606: Operand register field
1608: Mode field
1702a: 512-bit two-input state multiplexer
1702b: 1024-bit two-input message multiplexer
1704a: 512-bit state register
1704b: 1024-bit message block register
1708: Update working state circuit
1710: Message scheduling round circuit
1712: Single-instruction multiple-data (SIMD) adder
1720: Control circuit
1800: Block/input state
1802: SHA2 sigma 0 circuit
1804: SHA2 MA circuit
1806: SHA2 sigma 1 circuit
1808: SHA2 CH circuit
1810: 64-bit adder
1812: 64-bit adder/modular adder
1814: 64-bit adder/modular adder
1816: Result state
1900: SHA2 sigma circuit
1902: 64-bit input variable
1904a: 64-bit rotate circuit
1904b: 64-bit rotate circuit
1906a: 32-bit rotate circuit
1906b: 32-bit rotate circuit
1908a: 64-bit rotate/shift circuit
1908b: 32-bit rotate/shift circuit
1910a: Multiplexer
1910b: Multiplexer
1910c: Multiplexer
1912: Three-input 64-bit bitwise XOR circuit/3-way bitwise XOR circuit
1914: 64-bit output
2000: Block
2002: Block
2004: Block
2006: Block
2010: Block
2012: Block
2014: Block
2016: Block
2100: Unpadded message
2300: Last message block
2300': Last message block
2300": Last message block
2302: EOM padding byte
2304: EOB padding byte
2306: Last message byte
2308: EOM/EOB padding byte
2310: Message block
2312: Additional zeroed message block
2400: Last message block
2400': Last message block
2400": Last message block
2402: EOM padding byte
2404: EOB padding word
2406: Last message byte
2408: Message block
2410: Complete message block
2412: Message block
2500: Padding instruction
2502: Opcode field
2504: Register 1 field
2506: Register 2 field
2508: Mode field
2510: Hash identifier (HID) subfield
2512: Block length (BL) subfield
2514: Expand (E) subfield
2600: Padding circuit
2602: Select EOM circuit
2604: Select EOB circuit
2606: Select BL size circuit
2608: EOB enable circuit
2610: Comparator
2612: Decoder
2614: Bitwise AND circuit
2620: EOM enable circuit/select circuit
2622: Comparator
2624: Decoder
2626: Bitwise AND circuit
2630: Conditional OR circuit
2700: OR gate
2702: Two-input AND gate
2704: Two-input AND gate
2800: Block
2802: Block
2804: Block
2806: Block
2808: Block
2810: Block
2812: Block
2814: Block
2816: Block
2818: Block
2820: Block
2822: Block
2900: Design flow
2910: Design process
2920: Design structure
2930: Library elements
2940: Design specifications
2950: Characterization data
2960: Verification data
2980: Netlist
2985: Test data files
2990: Design rules/second design structure
2995: Stage

FIG. 1 is a high-level block diagram of a data processing system including a processor according to one embodiment;
FIG. 2 is a high-level block diagram of a processor core according to one embodiment;
FIG. 3 is a high-level block diagram of an exemplary execution unit of a processor core according to one embodiment;
FIG. 4 is a more detailed block diagram of an accelerator unit within a processor core according to one embodiment;
FIG. 5 is a time-space diagram of message hashing according to the SHA-3 standard;
FIG. 6 is a time-space diagram of the absorb phase depicted in FIG. 5;
FIG. 7A is a time-space diagram of the SHA3 permutation function illustrated in FIG. 6;
FIG. 7B is a time-space diagram of the SHA3 round function depicted in FIG. 7A;
FIG. 8 is a time-space diagram of the SHA3/SHAKE squeeze phase illustrated in FIG. 5;
FIGS. 9 and 10 illustrate exemplary formats for a SHA3 hash instruction and a bitwise exclusive OR (XOR) instruction, respectively, according to one embodiment;
FIG. 11 is a high-level block diagram of an exemplary SHA3/SHAKE hash circuit according to one embodiment;
FIG. 12 is a high-level logical flowchart of an exemplary process by which a processor executes a SHA3 hash instruction according to one embodiment;
FIG. 13 depicts a time-space diagram of message hashing according to the SHA-2 standard;
FIG. 14 is a time-space diagram of the SHA2 block hash function illustrated in FIG. 13;
FIG. 15 illustrates message expansion for a SHA2 hash function having 32-bit words according to an exemplary embodiment;
FIG. 16 depicts an exemplary format for a SHA2 hash instruction according to one embodiment;
FIG. 17 is a high-level block diagram of an exemplary SHA2 hash circuit according to one embodiment;
FIG. 18 is a high-level block diagram of the exemplary update working state circuit from FIG. 17 according to one embodiment;
FIG. 19 is a high-level block diagram of an exemplary embodiment of the SHA2 sigma circuit shown in FIG. 18;
FIG. 20 is a high-level logical flowchart of an exemplary process by which a processor executes a SHA2 hash instruction according to one embodiment;
FIG. 21A depicts an exemplary unpadded message;
FIG. 21B illustrates an exemplary padded message;
FIGS. 22A and 22B depict assembling chunks of a message block in a narrower first register file and transferring the message block to a wider second register file;
FIGS. 23A to 23D illustrate various padding scenarios for SHA3/SHAKE messages;
FIGS. 24A to 24D depict various padding scenarios for SHA2 messages;
FIG. 25 illustrates an exemplary padding instruction according to one embodiment;
FIG. 26 depicts an exemplary padding circuit according to one embodiment;
FIG. 27 illustrates an exemplary circuit for combining end-of-block (EOB) and end-of-message (EOM) bytes with a message according to one embodiment;
FIG. 28 is a high-level logical flowchart of an exemplary process for padding a message block according to one embodiment; and
FIG. 29 depicts an exemplary design process according to one embodiment.

With reference now to the figures, and in particular to FIG. 1, there is illustrated a high-level block diagram of a data processing system 100 according to one embodiment. In some implementations, data processing system 100 may be, for example, a server computer system (such as one of the POWER series of servers available from International Business Machines Corporation), a mainframe computer system, a mobile computing device (such as a smartphone or tablet), a laptop or desktop personal computer system, or an embedded processor system.

As shown, data processing system 100 includes one or more processors 102 that process instructions and data. As is known in the art, each processor 102 may be realized as a respective integrated circuit having a semiconductor substrate in which integrated circuitry is formed. In at least some embodiments, processors 102 may generally implement any of a number of commercially available processor architectures, for example, POWER, ARM, Intel x86, NVidia, Apple silicon, etc. In the depicted example, each processor 102 includes one or more processor cores 104 and cache memory 106, which provides low-latency access to instructions and operands likely to be read and/or written by processor cores 104. Processors 102 are coupled for communication by a system interconnect 110, which in various implementations may include one or more buses, switches, bridges, and/or hybrid interconnects.

Data processing system 100 may additionally include a number of other components coupled to system interconnect 110. These components may include, for example, a memory controller 112 that controls access by processors 102 and other components of data processing system 100 to a system memory 114. In addition, data processing system 100 may include an input/output (I/O) adapter 116 for coupling one or more I/O devices to system interconnect 110, a non-volatile storage system 118, and a network adapter 120 for coupling data processing system 100 to a communication network (e.g., a wired or wireless local area network and/or the Internet).

Those skilled in the art will additionally appreciate that data processing system 100 shown in FIG. 1 may include many additional components that are not illustrated. Because such additional components are not necessary for an understanding of the described embodiments, they are not illustrated in FIG. 1 or discussed further herein. It should also be understood, however, that the enhancements described herein are applicable to data processing systems and processors of diverse architectures and are in no way limited to the generalized data processing system architecture illustrated in FIG. 1.

Referring now to FIG. 2, there is depicted a high-level block diagram of an exemplary processor core 200 according to one embodiment. Processor core 200 may be utilized to implement any of processor cores 104 of FIG. 1.

In the depicted example, processor core 200 includes an instruction fetch unit 202 for fetching instructions within one or more instruction streams from storage 230 (which may include, for example, cache memory 106 and/or system memory 114 of FIG. 1). In a typical implementation, each instruction has a format defined by the instruction set architecture of processor core 200 and includes at least an operation code (opcode) field specifying an operation to be performed by processor core 200 (e.g., a fixed-point or floating-point arithmetic operation, vector operation, matrix operation, logical operation, branch operation, memory access operation, cryptographic operation, etc.). Certain instructions may additionally include one or more operand fields that directly specify operands or that implicitly or explicitly reference one or more registers storing source operands to be used in executing the instruction and one or more registers for storing destination operands generated by executing the instruction. Instruction decode unit 204, which in some embodiments may be merged with instruction fetch unit 202, decodes the instructions retrieved from storage 230 by instruction fetch unit 202 and forwards branch instructions that control the flow of execution to branch processing unit 206. In some embodiments, the processing of branch instructions performed by branch processing unit 206 may include speculating the outcome of conditional branch instructions. The results of branch processing (both speculative and non-speculative) by branch processing unit 206 may, in turn, be utilized to redirect one or more streams of instruction fetching by instruction fetch unit 202.
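The opcode and operand fields described above can be illustrated by a small decoder. The field layout used here (a 32-bit instruction word with a 6-bit opcode followed by three 5-bit register fields) is a hypothetical layout chosen only for illustration; it is not the format of the hash instructions shown in the figures, and the names `Decoded`, `encode`, and `decode` are likewise assumptions.

```python
from typing import NamedTuple


class Decoded(NamedTuple):
    """Fields of a hypothetical 32-bit instruction word."""
    opcode: int  # bits 31..26: selects the operation
    rt: int      # bits 25..21: destination register
    ra: int      # bits 20..16: first source register
    rb: int      # bits 15..11: second source register


def encode(opcode, rt, ra, rb):
    """Pack the fields into a 32-bit instruction word (hypothetical layout)."""
    return ((opcode & 0x3F) << 26) | ((rt & 0x1F) << 21) \
        | ((ra & 0x1F) << 16) | ((rb & 0x1F) << 11)


def decode(word):
    """Extract the opcode and register fields from an instruction word."""
    return Decoded(
        opcode=(word >> 26) & 0x3F,
        rt=(word >> 21) & 0x1F,
        ra=(word >> 16) & 0x1F,
        rb=(word >> 11) & 0x1F,
    )


word = encode(opcode=31, rt=3, ra=4, rb=5)
print(hex(word), decode(word))
```

A decode stage performs this kind of field extraction in hardware and uses the opcode to steer the instruction to the appropriate processing unit.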

Instruction decode unit 204 forwards instructions that are not branch instructions (often referred to as "sequential instructions") to mapper circuit 210. Mapper circuit 210 is responsible for assigning physical registers within the register files of processor core 200 to instructions as needed to support instruction execution. Mapper circuit 210 preferably implements register renaming. Thus, for at least some classes of instructions, mapper circuit 210 establishes transient mappings between a set of logical (or architected) registers referenced by the instructions and a larger set of physical registers within the register files of processor core 200. As a result, processor core 200 can avoid unnecessary serialization of instructions that are not data dependent, which might otherwise occur because instructions close together in program order reuse the limited set of architected registers.
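Register renaming as described above can be sketched with a small mapping table. The model below is a deliberately simplified assumption: it never returns physical registers to the free list and ignores recovery after mis-speculation, both of which a real mapper circuit must handle. The class and method names are illustrative.

```python
class RenameMap:
    """Minimal model of a logical-to-physical register rename table."""

    def __init__(self, num_logical, num_physical):
        # Initially logical register i lives in physical register i.
        self.table = list(range(num_logical))
        # Remaining physical registers are free for new destinations.
        self.free = list(range(num_logical, num_physical))

    def rename(self, dest, sources):
        """Map sources through the table, then give dest a fresh register."""
        physical_sources = [self.table[s] for s in sources]
        new_dest = self.free.pop(0)
        self.table[dest] = new_dest
        return new_dest, physical_sources


mapper = RenameMap(num_logical=4, num_physical=8)
# Two writes to logical r1 receive different physical registers, so the
# second write need not wait for readers of the first value.
first = mapper.rename(dest=1, sources=[2, 3])
second = mapper.rename(dest=1, sources=[0])
# A later reader of r1 is pointed at the most recent physical copy.
reader = mapper.rename(dest=2, sources=[1])
print(first, second, reader)
```

Because the two writes to r1 land in different physical registers, the only ordering constraint that remains is the true data dependence of the final reader on the second write.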

Still referring to FIG. 2, processor core 200 additionally includes a dispatch circuit 216 configured to ensure that any data dependencies between instructions are observed and to dispatch sequential instructions as they become ready for execution. Instructions dispatched by dispatch circuit 216 are temporarily buffered in an issue queue 218 until the execution units of processor core 200 have resources available to execute the dispatched instructions. As the appropriate execution resources become available, issue queue 218 issues instructions to the execution units of processor core 200 opportunistically and possibly out of order with respect to the original program order of the instructions.

In the depicted example, processor core 200 includes several different types of execution units for executing respective different classes of instructions. In this example, the execution units include: one or more fixed-point units 220 for executing instructions that access fixed-point operands; one or more floating-point units 222 for executing instructions that access floating-point operands; one or more load-store units 224 for loading data from and storing data to storage 230; and one or more vector-scalar units 226 for executing instructions that access vector and/or scalar operands. In a typical embodiment, each execution unit is implemented as a multi-stage pipeline in which multiple instructions can be processed simultaneously at different stages of execution. Each execution unit preferably includes, or is coupled to access, at least one register file including a plurality of physical registers for temporarily buffering operands accessed in or generated by instruction execution.

Those skilled in the art will appreciate that processor core 200 may include additional unillustrated components, such as logic configured to manage the completion and retirement of instructions for which execution by execution units 220-226 has finished. Because these additional components are not necessary for an understanding of the described embodiments, they are not illustrated in FIG. 2 or discussed further herein.

Referring now to FIG. 3, there is illustrated a high-level block diagram of an exemplary execution unit of a processor 102 in accordance with one embodiment. In this example, a vector-scalar unit 226 of processor core 200 is shown in greater detail. In the embodiment of FIG. 3, vector-scalar unit 226 is configured to execute multiple different classes of instructions that operate on different types of operands and generate different types of operands. For example, vector-scalar unit 226 is configured to execute a first class of instructions that operate on vector and scalar source operands and that generate vector and scalar destination operands. Vector-scalar unit 226 executes instructions in this first class in functional units 302-312, which in the depicted embodiment include: an arithmetic logic unit/rotate unit 302 for performing addition, subtraction, and rotate operations; a multiply unit 304 for performing binary multiplication; a divide unit 306 for performing binary division; a cryptographic unit 308 for performing cryptographic functions; a permute unit 310 for performing operand permutations; and a binary-coded decimal (BCD) unit 312 for performing decimal mathematical operations. The vector and scalar source operands on which these operations are performed, and the vector and scalar destination operands generated by these operations, are buffered in the physical registers of an architected register file 300.

In this example, vector-scalar unit 226 is additionally configured to execute a second class of instructions that cause hash functions to be performed. Vector-scalar unit 226 executes instructions in this second class in an accelerator unit 314. The operands on which these hash functions are performed, and the operands generated by these hash functions, are buffered and accumulated in a wide vector register file 316, which may include, for example, 1024-bit wide physical registers.

In operation, vector-scalar unit 226 receives instructions from issue queue 218. If an instruction is in the first class of instructions (e.g., vector-scalar instructions), the relevant source operands for the instruction are accessed in architected register file 300 utilizing the mapping between logical and physical registers established by mapper circuit 210 and are then forwarded, together with the instruction, to the relevant one of functional units 302-312 for execution. The destination operands generated by that execution are then stored back to the physical registers of architected register file 300 determined by the mapping established by mapper circuit 210. If, on the other hand, the instruction is in the second class of instructions (e.g., hash instructions), the instruction is forwarded to accelerator unit 314 for execution with respect to operands buffered in specified registers of wide vector register file 316.

Referring now to FIG. 4, there is depicted a more detailed block diagram of accelerator unit 314 of FIG. 3 in accordance with one embodiment. Accelerator unit 314 includes circuitry for performing a variety of hash functions in hardware, including, for example, one or more hash functions defined by the SHA family of standards. In the depicted example, the hash circuitry of accelerator unit 314 includes at least a SHA3/SHAKE hash circuit 400, described in greater detail below with reference to FIG. 11, and a SHA2 hash circuit 402, described in greater detail below with reference to FIG. 17. Accelerator unit 314 additionally includes a single-instruction, multiple-data (SIMD) exclusive OR (XOR) circuit 404 employed in performing SHA3/SHAKE hashing of messages, as discussed further below. Finally, accelerator unit 314 includes a data transfer circuit 406 that transfers data (e.g., messages to be hashed and message digests) between the memory system (e.g., cache memories 106 and system memory 114) and wide vector register file 316.

Referring now to FIG. 5, there is illustrated a time-space diagram of a process 500 for message hashing in accordance with the SHA-3 standard. As is known in the art, the SHA-3 standard (i.e., FIPS 202) employs a sponge construction based on a wide random function or random permutation. In accordance with this sponge construction, a message 502 of any arbitrary length (possibly many megabytes) is first processed in an input phase, referred to in sponge terminology as SHA3 absorption phase 504. SHA3 absorption phase 504, which is described in greater detail below with reference to FIG. 6, is the same for both the SHA3 hash functions and the SHAKE hash functions. SHA3 absorption phase 504 produces a 1600-bit final absorption state 610, which is then processed in an output phase, referred to in sponge terminology as SHA3/SHAKE squeeze phase 506, to generate a message digest 508. SHA3/SHAKE squeeze phase 506, which is described in detail below with reference to FIG. 8, operates differently for the SHA3 hash functions and the SHAKE hash functions. In particular, SHA3/SHAKE squeeze phase 506 generates a fixed-length message digest 508 for each of the SHA3 hash functions, but generates a variable-length message digest 508 for the SHAKE hash functions.

Table I below summarizes the attributes of the four SHA3 hash functions and the two SHAKE hash functions defined by the SHA-3 standard, which are listed in the first column. The second column of Table I gives the size in bits (r) of the message blocks into which SHA3 absorption phase 504 subdivides variable-length message 502. The message block size r is an integer multiple of the byte length, and the first message block of each message is byte-aligned. The third column of Table I gives the size in bits (d) of the message digest 508 output by SHA3/SHAKE squeeze phase 506. It should again be noted that, unlike the SHA3 hash functions, SHAKE-128 and SHAKE-256 generate variable-length digests of length d'. As indicated in the fourth column of Table I, final absorption state 610 is 1600 bits long for every hash function specified by the SHA-3 standard. The fifth column of Table I specifies the differing values of c, the number of low-order bits passed between iterations of the SHA3 state permutation function during SHA3/SHAKE squeeze phase 506 (see, e.g., FIG. 8). Finally, the sixth column of Table I specifies that each iteration of the SHA3 state permutation function employs 24 rounds of permutation per message block (see, e.g., FIG. 7A). In updates to the SHA-3 standard or in non-standard implementations, the number of rounds of permutation may be varied, for example, by reducing the number of rounds required (e.g., to 12).

Figure 112124023-A0305-02-0015-1
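Because Table I survives here only as an image reference, the relationships it summarizes can be checked with the short Python sketch below. The numeric values are the ones published in FIPS 202, not a transcription of the image, and the names `SHA3_PARAMS` and `check_params` are hypothetical. The keys use the FIPS 202 spellings SHAKE128 and SHAKE256.

```python
STATE_BITS = 1600   # size of the final absorption state for every function
ROUNDS = 24         # rounds of permutation per iteration of the state permutation

# name -> (r, c, d) in bits; d is None for the variable-length SHAKE functions.
# Values as published in FIPS 202 (assumed to agree with Table I).
SHA3_PARAMS = {
    "SHA3-224": (1152, 448, 224),
    "SHA3-256": (1088, 512, 256),
    "SHA3-384": (832, 768, 384),
    "SHA3-512": (576, 1024, 512),
    "SHAKE128": (1344, 256, None),
    "SHAKE256": (1088, 512, None),
}


def check_params(params):
    """Verify the invariants stated in the text for every listed function."""
    for name, (r, c, d) in params.items():
        assert r + c == STATE_BITS, name   # r + c = 1600
        assert r % 8 == 0, name            # r is a whole number of bytes
        if d is not None:
            assert c == 2 * d, name        # fixed-length SHA3 functions: c = 2d
    return True


check_params(SHA3_PARAMS)
for name, (r, c, d) in SHA3_PARAMS.items():
    print(f"{name}: block = {r // 8} bytes, c = {c} bits, d = {d}")
```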

Referring now to FIG. 6, there is depicted a time-space diagram of the SHA3 absorption phase 504 depicted in FIG. 5. As shown, SHA3 absorption phase 504 receives as input a message 502 of any arbitrary length. As shown at block 600, message 502 is padded to obtain a length that is an integer multiple of r bits. In many prior art implementations, this padding is performed via a high-latency, computationally expensive memory-to-memory move of the entire message 502. In some other prior art implementations, a SHA hashing software routine pads a message block after loading it into SIMD registers utilizing a sequence of conventional SIMD instructions. Although these prior art techniques may be utilized herein to perform padding, the padding can alternatively be performed efficiently in hardware within processor registers (e.g., wide vector register file 316) through execution of a padding instruction in accordance with the disclosed invention, as described in detail below with reference to FIGS. 21A-27. Padding message 502 through execution of a padding instruction also allows the padding to be applied to the end of message 502 in a manner that overlaps in time with the processing of message blocks in SHA3 absorption phase 504.
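For illustration only, the effect of the padding at block 600 on the final, partial message block can be sketched in Python as follows. The sketch assumes a byte-aligned message and uses the byte-level convention commonly derived from FIPS 202 (a domain-separation byte of 0x06 for SHA3 or 0x1F for SHAKE, then zeros, then a final 0x80 bit). The function name `sha3_pad_last_block` is hypothetical and does not model the padding instruction itself.

```python
def sha3_pad_last_block(tail, r_bytes, shake=False):
    """Pad the trailing (possibly empty) part of a message to one full block.

    tail    : the last len(message) % r_bytes bytes of the message
    r_bytes : message block size r in bytes (e.g. 136 for SHA3-256)
    shake   : select the SHAKE domain-separation suffix instead of the SHA3 one
    """
    if len(tail) >= r_bytes:
        raise ValueError("tail must be shorter than one message block")
    block = bytearray(r_bytes)                      # zero-filled block
    block[:len(tail)] = tail                        # copy the message bytes
    block[len(tail)] ^= 0x1F if shake else 0x06     # suffix bits plus first pad bit
    block[-1] ^= 0x80                               # final pad bit of pad10*1
    return bytes(block)


# A message whose length is an exact multiple of r still gets a full padding block.
print(sha3_pad_last_block(b"", 136).hex()[:4], "...")
# With one free byte left, both pad bits share that byte (0x06 ^ 0x80 = 0x86).
print(hex(sha3_pad_last_block(bytes(135), 136)[-1]))
```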

In SHA3 absorption phase 504, each of the n (where n is a positive integer) message blocks of length r composing the padded message is extracted and then zero-extended in the trailing low-order bits to form n 1600-bit extended message blocks 602. The first message block, message block 1 602, forms the input of a SHA3 state permutation function 604 defined by the SHA-3 standard. As described below with reference to FIGS. 9 and 11, in accordance with one aspect of the disclosed invention, SHA3 state permutation function 604 is performed in hardware through execution of a SHA3 hash instruction. The 1600-bit state output of SHA3 state permutation function 604 forms the first input of a 1600-bit bitwise XOR function 606, which takes the next 1600-bit extended message block 602 of the padded message as its second input. The result of bitwise XOR function 606 forms the input of the next iteration of SHA3 state permutation function 604. As shown, this process continues iteratively for each of message blocks 602 until the final iteration of SHA3 state permutation function 604 generates and outputs the 1600-bit final absorption state 610 previously noted in the description of FIG. 5.

Referring now to FIG. 7A, there is illustrated a time-space diagram of the SHA3 state permutation function 604 illustrated in FIG. 6. SHA3 state permutation function 604 accepts a 1600-bit input and then processes that 1600-bit input, in conjunction with the SHA-3 standard-specified round index 0 702, in the first of 24 rounds of a SHA3 round function 704. This process continues iteratively, with each subsequent round of processing by SHA3 round function 704 receiving as inputs the 1600-bit output of the previous SHA3 round function 704 and the relevant SHA-3 standard-specified round index 702 (which is a constant). Following completion of the 24 rounds of processing within SHA3 state permutation function 604, SHA3 state permutation function 604 outputs a 1600-bit state, which either serves as an input to bitwise XOR function 606 or, in the case of the final iteration of SHA3 state permutation function 604 within SHA3 absorption phase 504, constitutes the final absorption state 610 that serves as the input to SHA3/SHAKE squeeze phase 506.

Referring now to FIG. 7B, there is depicted a time-space diagram of the SHA3 round function 704 depicted in FIG. 7A. As shown, SHA3 round function 704 includes a sequence of functions specified by the SHA-3 standard, namely, in order, the five functions designated in the SHA-3 standard by the Greek letters θ (theta), ρ (rho), π (pi), χ (chi), and ι (iota). The θ function receives and processes the 1600-bit input of round function 704, and the output of each function other than the ι function feeds the next function in sequence. Finally, the ι function processes the output of the χ function and the relevant round index 702 to produce the 1600-bit output of the given iteration of SHA3 round function 704. In the prior art, performing round function 704 utilizing two single-instruction, multiple-data (SIMD) vector pipelines can take up to 80 cycles. In accordance with one aspect of the invention disclosed herein, round function 704 can be completed in a single cycle of processor core 104 utilizing the SHA3/SHAKE hash circuit 400 of FIG. 11, described below.
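For illustration only, the five-step round function and the 24-round state permutation described above can be modeled in software as follows. This Python sketch follows the step definitions of FIPS 202 with the state held as a 5x5 array of 64-bit lanes; it merges ρ and π into one function for brevity and derives the ι round constants from the standard's LFSR rather than listing them. All function names are hypothetical, and the sketch says nothing about how the described circuit is organized internally.

```python
MASK = (1 << 64) - 1


def rol(v, n):
    """Rotate a 64-bit lane left by n bits."""
    n %= 64
    return ((v << n) | (v >> (64 - n))) & MASK


def round_constants():
    """Derive the 24 iota constants from the LFSR defined in FIPS 202."""
    rcs, lfsr = [], 1
    for _ in range(24):
        rc = 0
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                rc ^= 1 << ((1 << j) - 1)
        rcs.append(rc)
    return rcs


RC = round_constants()


def theta(a):
    c = [a[x][0] ^ a[x][1] ^ a[x][2] ^ a[x][3] ^ a[x][4] for x in range(5)]
    d = [c[(x + 4) % 5] ^ rol(c[(x + 1) % 5], 1) for x in range(5)]
    return [[a[x][y] ^ d[x] for y in range(5)] for x in range(5)]


def rho_pi(a):
    """rho (lane rotations) and pi (lane permutation), done in one pass."""
    b = [col[:] for col in a]
    x, y = 1, 0
    current = b[x][y]
    for t in range(24):
        x, y = y, (2 * x + 3 * y) % 5
        current, b[x][y] = b[x][y], rol(current, (t + 1) * (t + 2) // 2)
    return b


def chi(a):
    return [[a[x][y] ^ (~a[(x + 1) % 5][y] & a[(x + 2) % 5][y])
             for y in range(5)] for x in range(5)]


def iota(a, round_index):
    b = [col[:] for col in a]
    b[0][0] ^= RC[round_index]      # only lane (0, 0) depends on the round index
    return b


def sha3_round(a, round_index):
    return iota(chi(rho_pi(theta(a))), round_index)


def keccak_f1600(a):
    for round_index in range(24):
        a = sha3_round(a, round_index)
    return a


zero_state = [[0] * 5 for _ in range(5)]
print(hex(keccak_f1600(zero_state)[0][0]))
```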

Referring now to FIG. 8, there is illustrated a time-space diagram of the SHA3/SHAKE squeeze phase 506 illustrated in FIG. 5. As previously described, SHA3/SHAKE squeeze phase 506 receives as input the 1600-bit final absorption state 610 produced by SHA3 absorption phase 504. To produce a message digest 508 for any of the SHA3 functions defined by the SHA-3 standard, SHA3/SHAKE squeeze phase 506 first extracts the r high-order bits of final absorption state 610 to form result block 1 800. A truncation function 802 then truncates the r bits of result block 1 800 to retain the d high-order bits, which form message digest 508.

To produce a message digest for one of the SHAKE functions defined by the SHA-3 standard, the r bits of result block 1 800 form the r high-order bits of the input of a truncation function 804. These r high-order bits are concatenated with n-1 additional r-bit result blocks 800, each of which is formed from the r high-order bits of the output of an iteration of SHA3 state permutation function 604, as previously described with respect to FIG. 7A. Each SHA3 state permutation function 604 of SHA3/SHAKE squeeze phase 506 receives a 1600-bit input (i.e., r+c=1600) and generates a 1600-bit output that, except in the case of the last iteration of SHA3 state permutation function 604, feeds the subsequent iteration of SHA3 state permutation function 604. Truncation function 804 truncates the r×n input bits to obtain a message digest 508 having a user-specified length of d' bits.
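The squeeze behavior described for the SHA3 and SHAKE cases can be sketched, purely for illustration, as a single Python routine. The state is modeled as 200 bytes whose leading bytes hold the r high-order bits, and `permute` is an assumed caller-supplied callable standing in for SHA3 state permutation function 604; the name `squeeze` and its parameters are hypothetical.

```python
def squeeze(state, r_bytes, out_len, permute):
    """Collect r-byte result blocks until out_len bytes are available.

    state   : 200-byte final absorption state
    r_bytes : block size r in bytes
    out_len : digest length in bytes (d for SHA3, user-chosen d' for SHAKE)
    permute : callable mapping a 200-byte state to the next 200-byte state
    """
    out = bytearray()
    while True:
        out += state[:r_bytes]              # result block: leading r bytes of state
        if len(out) >= out_len:
            return bytes(out[:out_len])     # truncation function
        state = permute(state)              # only needed when more output is required


# Toy permutation used only to make the data flow visible; it is NOT Keccak-f.
def toy(s):
    return bytes((b + 1) % 256 for b in s)


# SHA3-style use: d <= r, so no permutation is applied during the squeeze.
print(squeeze(bytes(range(200)), 136, 32, toy).hex())
# SHAKE-style use: 200 bytes requested with r = 168 bytes needs a second block.
print(len(squeeze(bytes(200), 168, 200, toy)))
```

For the fixed-length SHA3 functions d never exceeds r, so the loop body runs once and the routine reduces to a plain truncation.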

Referring now to FIGS. 9 and 10, there are illustrated exemplary formats for a SHA3 hash instruction 900 and a bitwise exclusive OR (XOR) instruction 1000, respectively, in accordance with one embodiment. In an exemplary embodiment, accelerator unit 314 is configured, in response to receipt of SHA3 hash instruction 900, to perform the SHA3/SHAKE state permutation function in hardware utilizing SHA3/SHAKE hash circuit 400 and, in response to receipt of bitwise XOR instruction 1000, to perform a 1024-bit bitwise XOR of the specified operands utilizing SIMD XOR circuit 404.

In the illustrated embodiment, SHA3 hash instruction 900 includes an opcode field 902 that specifies a particular architecture-specific operation code for the SHA3/SHAKE permutation function. SHA3 hash instruction 900 additionally includes one or more register fields 904, 906 for specifying the registers within wide vector register file 316 that hold the source and destination operands of the SHA3/SHAKE state permutation function. For example, in one implementation, SHA3 hash instruction 900 includes a single register field 904 that specifies the first of a pair of adjacent 1024-bit registers that buffers the 1600-bit source operand and, following completion of the SHA3/SHAKE permutation function, buffers the 1600-bit destination operand (which overwrites the source operand). In an alternative implementation, SHA3 hash instruction 900 includes two register fields 904, 906 for specifying separate pairs of 1024-bit source and destination registers (in which case the destination operand does not overwrite the source operand).

As noted above, in future updates to the SHA-3 standard or in non-standard implementations, it may be desirable to control the number of rounds of permutation applied by SHA3 state permutation function 604. In such embodiments, SHA3 hash instruction 900 may include a field that directly sets the number of rounds of permutation or that references a register specifying the number of rounds of permutation.

FIG. 10 depicts an exemplary embodiment in which bitwise XOR instruction 1000 includes an opcode field 1002 that specifies a particular architecture-specific operation code for a 1024-bit bitwise XOR function. Bitwise XOR instruction 1000 additionally includes three register fields 1004, 1006, and 1008 for separately specifying the 1024-bit registers within wide vector register file 316 that buffer the two 1024-bit source operands and the one 1024-bit destination operand.

Now that the SHA3 and SHAKE hash functions and exemplary instructions for implementing portions of these hash functions have been explained, pseudocode for performing an exemplary SHA3 hash function in hardware is presented. In the following pseudocode, reference is made to the following registers:

Rr ← block length in bytes

RL ← message length in bytes // assume RL ≥ Rr and that the first block is not padded

Ra ← starting address of the message

Rb ← address of the message digest produced by the hash function

Rd ← message digest length in bytes

Xs ← SHA3 state // wide vector register pair

Xm ← message block // wide vector register pair

Given these registers, pseudocode for any of the SHA3 (non-SHAKE) hash functions can be expressed as follows:

Xs = loadlength(Ra, Rr) // load the first message block of the message and initialize the state

Xs = sha3hash(Xs) // execute the SHA3 hash instruction to perform the permutation for the first message block

RL -= Rr // decrement the length of the unprocessed portion of the message

Ra += Rr // advance the pointer to the next message block in the message

While (RL >= Rr) // enter a loop to process each remaining message block other than the last message block of the message

{

Xm = loadlength(Ra, Rr) // load the next message block

Xs = wide_xor(Xs, Xm) // execute the bitwise XOR instruction to combine the state and the current message block

Xs = sha3hash(Xs) // execute the SHA3 hash instruction to perform the permutation for the current message block

RL -= Rr // decrement the length of the unprocessed portion of the message

Ra += Rr // advance the pointer to the next message block

}

Xm = loadlength(Ra, RL) // load the last message block, if any (RL may be zero)

Xm = sha3_padding(Xm, RL, sha3-type) // execute the padding instruction to pad the message based on the remaining message length and the SHA3 function

Xs = wide_xor(Xs, Xm) // execute the bitwise XOR instruction to combine the state and the last message block

Xs = sha3hash(Xs) // execute the SHA3 hash instruction to perform the permutation for the last message block and produce the final absorption state

Store_length(Xs, Rb, Rd) // in the SHA3 squeeze phase, truncate the final absorption state to form the message digest by storing the leading Rd bytes of Xs to memory at address Rb
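As a non-limiting software illustration, the pseudocode above can be transcribed into Python almost line for line. In this sketch each wide vector register pair is modeled as a 200-byte value, `sha3hash` is a compact software stand-in for the SHA3 hash instruction (Keccak-f[1600] as defined in FIPS 202), and `sha3_padding` takes the block length instead of a SHA3-type code. These modeling choices, and the byte-level padding convention, are assumptions of the sketch rather than details of the described instructions.

```python
def _rol(v, n):
    n %= 64
    return ((v << n) | (v >> (64 - n))) & (2**64 - 1)


def sha3hash(xs):
    """Software stand-in for the SHA3 hash instruction: 24 rounds on 200 bytes."""
    a = [[int.from_bytes(xs[8 * (x + 5 * y):8 * (x + 5 * y) + 8], "little")
          for y in range(5)] for x in range(5)]
    lfsr = 1
    for _ in range(24):
        c = [a[x][0] ^ a[x][1] ^ a[x][2] ^ a[x][3] ^ a[x][4] for x in range(5)]
        d = [c[(x + 4) % 5] ^ _rol(c[(x + 1) % 5], 1) for x in range(5)]
        a = [[a[x][y] ^ d[x] for y in range(5)] for x in range(5)]        # theta
        x, y, cur = 1, 0, a[1][0]
        for t in range(24):                                               # rho, pi
            x, y = y, (2 * x + 3 * y) % 5
            cur, a[x][y] = a[x][y], _rol(cur, (t + 1) * (t + 2) // 2)
        for y in range(5):                                                # chi
            row = [a[x][y] for x in range(5)]
            for x in range(5):
                a[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        for j in range(7):                                                # iota
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                a[0][0] ^= 1 << ((1 << j) - 1)
    return b"".join(a[x][y].to_bytes(8, "little") for y in range(5) for x in range(5))


def loadlength(msg, ra, n):
    """Load n bytes starting at offset ra, zero-extended to the 200-byte state."""
    return msg[ra:ra + n].ljust(200, b"\0")


def wide_xor(a, b):
    return bytes(p ^ q for p, q in zip(a, b))


def sha3_padding(xm, rl, rr):
    """Pad a block holding rl message bytes (rl < rr); SHA3 suffix assumed."""
    xm = bytearray(xm)
    xm[rl] ^= 0x06
    xm[rr - 1] ^= 0x80
    return bytes(xm)


def sha3(msg, rr, rd):
    ra, rl = 0, len(msg)
    if rl < rr:
        raise ValueError("the pseudocode assumes RL >= Rr")
    xs = sha3hash(loadlength(msg, ra, rr))       # first block initializes the state
    rl -= rr
    ra += rr
    while rl >= rr:                              # remaining full blocks
        xs = sha3hash(wide_xor(xs, loadlength(msg, ra, rr)))
        rl -= rr
        ra += rr
    xm = sha3_padding(loadlength(msg, ra, rl), rl, rr)   # last block, rl may be 0
    xs = sha3hash(wide_xor(xs, xm))
    return xs[:rd]                               # Store_length: leading Rd bytes


print(sha3(bytes(200), 136, 32).hex())           # SHA3-256 of 200 zero bytes
```

With Rr = 136 and Rd = 32 the routine computes SHA3-256, and its output can be compared against `hashlib.sha3_256` from the Python standard library.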

Referring now to FIG. 11, there is illustrated a high-level block diagram of an exemplary SHA3/SHAKE hash circuit 400 suitable for executing SHA3 hash instruction 900 in accordance with one embodiment. As shown, SHA3/SHAKE hash circuit 400 includes two 1024-bit two-input multiplexers 1100a, 1100b, two 1024-bit state registers 1102a, 1102b, a SHA3 round circuit 1106, and a control circuit 1110 that controls operation of SHA3/SHAKE hash circuit 400 in response to SHA3 hash instruction 900.

Input multiplexer 1100a has a first input coupled to receive the high-order 1024 bits of the 1600-bit input state from the first register of the register pair in wide vector register file 316 identified by SHA3 hash instruction 900 and a second input coupled to receive the high-order 1024 bits of the 1600-bit round feedback from SHA3 round circuit 1106. Input multiplexer 1100b is similarly structured, having a first input coupled to receive, from the second register of the instruction-specified register pair in wide vector register file 316, a 1024-bit value including the low-order 576 bits of the 1600-bit input state and a second input coupled to SHA3 round circuit 1106 to receive a 1024-bit value including the low-order 576 bits of the 1600-bit round feedback. Control logic 1110 within SHA3/SHAKE hash circuit 400 provides unillustrated select signals to input multiplexers 1100a, 1100b to cause input multiplexers 1100a, 1100b to select the values present at their first inputs prior to SHA3 round 0 and to select the values present at their second inputs following each of SHA3 round 0 through SHA3 round 23. The values output by input multiplexers 1100a, 1100b, which are buffered in state registers 1102a, 1102b, respectively, together form the 1600-bit round input value of SHA3 round circuit 1106, which is configured to perform SHA3 round function 704 on the round input value, as previously described with reference to FIGS. 7A-7B.
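For illustration only, the division of the 1600-bit state across the register pair can be sketched in Python with the state modeled as an integer. The sketch assumes that the low-order 576 bits occupy the high-order end of the second 1024-bit register, with the remaining 448 low-order bits zero; this alignment is an assumption of the sketch, and the names `split_state` and `join_state` are hypothetical.

```python
REG_BITS = 1024                       # width of one wide vector register
STATE_BITS = 1600                     # width of the SHA3 state
LOW_BITS = STATE_BITS - REG_BITS      # 576 state bits held by the second register
PAD_BITS = REG_BITS - LOW_BITS        # 448 zero bits filling the second register


def split_state(state):
    """Split a 1600-bit state into two 1024-bit register values (hi, lo)."""
    hi = state >> LOW_BITS                                  # high-order 1024 bits
    lo = (state & ((1 << LOW_BITS) - 1)) << PAD_BITS        # low 576 bits, zero-extended
    return hi, lo


def join_state(hi, lo):
    """Rebuild the 1600-bit state from the register pair."""
    return (hi << LOW_BITS) | (lo >> PAD_BITS)


state = (1 << STATE_BITS) - 1          # all-ones state as a simple test pattern
hi, lo = split_state(state)
print(hi.bit_length(), lo.bit_length(), join_state(hi, lo) == state)
```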

Control circuit 1110 is further configured to sequence SHA3 round circuit 1106 through each of the 24 rounds required by the SHA-3 standard, utilizing the correct round index specified by the SHA-3 standard. After round 23 (the last of the 24 rounds) completes, state registers 1102a, 1102b hold the high-order 1024 bits and the low-order 576 bits, respectively, of the 1600-bit output state. Control circuit 1110 is further configured, once the output state is obtained, to assert unillustrated select signals to cause output multiplexer 1108 to write the high-order bits and the low-order bits of the 1600-bit output state from state registers 1102a, 1102b, respectively, to the instruction-specified register pair in wide vector register file 316 in two consecutive cycles (assuming that wide vector register file 316 has a single write port).

Referring now to FIG. 12, there is depicted a high-level logical flowchart of an exemplary process for executing SHA3 hash instruction 900 in accordance with one embodiment. For ease of understanding, the process of FIG. 12 is described with reference to the exemplary SHA3/SHAKE hash circuit 400 of FIG. 11.

The process of FIG. 12 begins at block 1200 and then proceeds to block 1202, which illustrates SHA3/SHAKE hash circuit 400 receiving a SHA3 hash instruction 900 specifying an operand register pair within wide vector register file 316. In response to receipt of SHA3 hash instruction 900, control circuit 1110 causes the contents of the operand register pair to be read out of wide vector register file 316 and loaded into state registers 1102a, 1102b via input multiplexers 1100a, 1100b (block 1204). Control circuit 1110 additionally initializes an internal round counter to 0 (block 1206).

The process then proceeds from block 1206 to block 1208, which illustrates control circuit 1110 directing SHA3 round circuit 1106 to perform an iteration of SHA3 round function 704 utilizing the round input buffered in state registers 1102a, 1102b and the appropriate round index specified by the SHA-3 standard. Control circuit 1110 additionally increments the round counter (block 1208). The result of the processing by SHA3 round circuit 1106 is returned to state registers 1102a, 1102b via input multiplexers 1100a, 1100b. As indicated at block 1210, control logic 1110 causes SHA3 round circuit 1106 to perform the 24 rounds of processing specified by the SHA-3 standard utilizing the appropriate round indices. When the 24 rounds of processing are complete, control circuit 1110 asserts the appropriate select signals to cause output multiplexer 1108 to store the 1600-bit state buffered in state registers 1102a, 1102b (zero-extended in the low-order bits to form two 1024-bit values) into the operand register pair within wide vector register file 316 specified by SHA3 hash instruction 900 (block 1214). Thereafter, the process of FIG. 12 ends at block 1216.
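The control flow of FIG. 12 can be summarized, purely as an illustrative software model, by the Python sketch below. The register file is modeled as a list, the operand pair as two adjacent entries, and `round_fn` is an assumed caller-supplied callable standing in for SHA3 round circuit 1106; the function name and its parameters are hypothetical, and the toy round function in the usage lines is not the SHA3 round function.

```python
def execute_sha3_hash(regfile, first, round_fn, rounds=24):
    """Model of the flow of FIG. 12 for one SHA3 hash instruction.

    regfile  : list modeling the wide vector register file
    first    : index of the first register of the adjacent operand pair
    round_fn : callable (state_a, state_b, round_index) -> (state_a, state_b)
    rounds   : number of rounds to run (24 per the SHA-3 standard)
    """
    # Block 1204: read the operand register pair into the two state registers.
    state_a, state_b = regfile[first], regfile[first + 1]
    # Block 1206: initialize the internal round counter.
    round_counter = 0
    # Blocks 1208-1210: one round per iteration, result fed back to the state
    # registers, until all rounds are done.
    while round_counter < rounds:
        state_a, state_b = round_fn(state_a, state_b, round_counter)
        round_counter += 1
    # Block 1214: write the final state back to the same register pair.
    regfile[first], regfile[first + 1] = state_a, state_b
    return round_counter


# Toy round function used only to make the sequencing observable.
regs = [0, 0, 7, 7]
done = execute_sha3_hash(regs, 0, lambda a, b, i: (a + i, b + 1))
print(done, regs)
```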

Referring now to FIG. 13, there is illustrated a time-space diagram of the hashing of a message in accordance with the SHA-2 standard (FIPS 180-4), as performed by SHA2 hash circuit 402 in the embodiment of FIG. 4. Table II below summarizes the attributes of the six SHA2 hash functions defined by the SHA-2 standard, which are listed in the first column. In Table II, the second column gives the message block size (r) in bits. The message block size r is an integer multiple of the byte length, and the first message block of a message is byte-aligned. The third column of Table II gives the fixed size (d), in bits, of the message digest produced by each SHA2 hash function. The fourth column specifies the size in bits of the state of each SHA2 hash function, and the fifth column indicates the number of rounds of processing employed by each SHA2 hash function (i.e., 64 or 80) (see, e.g., FIG. 14). Finally, the sixth column specifies the word size, in bits, used by each SHA2 hash function. It should be noted that, for all variants, the state size is 8 times the word size (i.e., the state comprises 8 words), and the message block size is 16 times the word size (i.e., a message block comprises 16 words). As described below with reference to FIG. 15, in accordance with one aspect of the disclosed inventions, SHA2 hash functions employing a 32-bit word size and SHA2 hash functions employing a 64-bit word size are processed along the same data flow by virtue of an expansion of the message words that is applied for the SHA2-224 and SHA2-256 hash functions.

Table II

Figure 112124023-A0305-02-0024-2
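
Table II survives here only as an image reference. As a rough cross-check of the relationships stated in the text (state = 8 words, block = 16 words), the following sketch lists the parameters that FIPS 180-4 publishes for the six SHA-2 functions. The `Sha2Params` class, the dictionary name, and the `SHA2-xxx` labels are illustrative assumptions and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sha2Params:
    block_bits: int   # message block size r
    digest_bits: int  # message digest size d
    state_bits: int   # hash state size
    rounds: int       # number of rounds of processing
    word_bits: int    # word size w


# Values as published in FIPS 180-4 (hypothetical container, illustrative names).
SHA2_VARIANTS = {
    "SHA2-224":     Sha2Params(512, 224, 256, 64, 32),
    "SHA2-256":     Sha2Params(512, 256, 256, 64, 32),
    "SHA2-384":     Sha2Params(1024, 384, 512, 80, 64),
    "SHA2-512":     Sha2Params(1024, 512, 512, 80, 64),
    "SHA2-512/224": Sha2Params(1024, 224, 512, 80, 64),
    "SHA2-512/256": Sha2Params(1024, 256, 512, 80, 64),
}


def check_invariants(p: Sha2Params) -> bool:
    """State is 8 words and a message block is 16 words, for every variant."""
    return p.state_bits == 8 * p.word_bits and p.block_bits == 16 * p.word_bits
```

Calling `check_invariants` on every entry is a quick way to confirm the 8×w and 16×w relationships described above.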

As shown in FIG. 13, SHA2 hash function 1300 receives as one input a message 1302 of arbitrary length (e.g., possibly megabytes in length). As shown at block 1304, message 1302 is padded to obtain a length that is an integer multiple of r bits. As discussed above with reference to FIG. 6, this padding can be performed efficiently in hardware in processor registers (e.g., wide vector register file 316) through execution of a padding instruction rather than through memory moves. Padding message 1302, and in particular the last message block of message 1302, through execution of a padding instruction also allows the padding to be applied to the end of message 1302 in a manner that overlaps in time with the processing of message blocks by SHA2 hash function 1300. Each of the n (n being a positive integer) message blocks of length r (where r = 16×w) that make up the padded message produced at block 1304 is extracted to form one of n 16×w-bit message blocks 1306.

In addition to message 1302, SHA2 hash function 1300 also receives as an input a SHA-2-specified constant value of 8×w bits. As is known in the art, this constant value, which can be accessed from architected register file 300, varies between SHA2 hash functions and forms the 8×w-bit initial state 1308. Initial state 1308 and the first message block (i.e., message block 1 1306) form the two inputs of SHA2 block hash function 1 1310, which is defined by the SHA-2 standard. As described below with reference to FIGS. 16 and 17, in accordance with one aspect of the disclosed inventions, SHA2 block hash function 1310 is performed in hardware through execution of a SHA2 hash instruction. The 8×w-bit state output by SHA2 block hash function 1 1310 forms the first input of SHA2 block hash function 2 1310, which takes the next 16×w-bit message block 2 1306 as its second input. The result of SHA2 block hash function 2 1310 forms an input of the next iteration of SHA2 block hash function 1310. As shown, this process continues iteratively for each of message blocks 602 until the final n-th iteration of SHA2 block hash function 1310 generates and outputs an 8×w-bit final state, which is truncated by truncation function 1312 to produce a message digest 1314 having d bits.

Referring now to FIG. 14, there is depicted a time-space diagram of the SHA2 block hash function 1310 illustrated in FIG. 13. SHA2 block hash function 1310 accepts a 16×w-bit message block 1306 and, as shown at block 1420, initializes a 16×w-bit message schedule for message block 1306. SHA2 block hash function 1310 then processes the 16×w-bit message schedule through n rounds of processing in message schedule round function 1400, where the 16×w-bit output of each of rounds 1 to n-2 serves as the input to the next round of message schedule processing.
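
The 16-word message schedule described above can be modeled in software as a sliding window. The sketch below is one possible reading for 64-bit words, using the small-sigma functions of FIPS 180-4; the function names and the window representation are assumptions made for illustration, and the hardware organization of the disclosure may differ.

```python
MASK64 = (1 << 64) - 1


def rotr(x, n):
    """Rotate a 64-bit value right by n bits."""
    return ((x >> n) | (x << (64 - n))) & MASK64


def small_sigma0(x):
    # FIPS 180-4 sigma_0 for 64-bit words
    return rotr(x, 1) ^ rotr(x, 8) ^ (x >> 7)


def small_sigma1(x):
    # FIPS 180-4 sigma_1 for 64-bit words
    return rotr(x, 19) ^ rotr(x, 61) ^ (x >> 6)


def schedule_round(window):
    """One schedule round: drop the oldest word and append the next schedule word."""
    new = (small_sigma1(window[14]) + window[9]
           + small_sigma0(window[1]) + window[0]) & MASK64
    return window[1:] + [new]


def rolling_schedule(block_words, rounds=80):
    """Emit one word per round from the high-order end of the 16-word window."""
    window = list(block_words)
    out = []
    for _ in range(rounds):
        out.append(window[0])          # word consumed by the update round
        window = schedule_round(window)
    return out


def reference_schedule(block_words, rounds=80):
    """Textbook expansion W[0..rounds-1], used only for comparison."""
    w = list(block_words)
    for t in range(16, rounds):
        w.append((small_sigma1(w[t - 2]) + w[t - 7]
                  + small_sigma0(w[t - 15]) + w[t - 16]) & MASK64)
    return w
```

Keeping only sixteen words live at a time is what allows the schedule to sit in a single 16×w-bit register and be fed back round by round.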

As shown, SHA2 block hash function 1310 also receives as an input an 8×w-bit current hash state (i.e., initial state 1308 or the output of the preceding SHA2 block hash function 1310). As indicated at block 1406, SHA2 block hash function 1310 splits this 8×w-bit current hash state into eight w-bit variables a to h. SHA2 block hash function 1310 then processes the current hash state through n rounds of processing by update round function 1404. Initial update round 0 1404 takes as additional inputs the SHA-2-specified w-bit round key 0 1402 and the w high-order bits of the 16×w-bit initialized message schedule 1420. Each subsequent iteration of update round function 1404 takes as inputs the state generated by the preceding iteration of update round function 1404, the w high-order bits of the 16×w-bit output of the corresponding iteration of message schedule round function 1400, and a SHA-2-specified w-bit round key 1402. The hash state output by update round function n-1 1404 is added to the input hash state by 8×w-bit carry-propagate addition function 1410 to generate the next hash state.

Referring now to FIG. 15, there is illustrated message expansion for SHA2 hash functions in accordance with an exemplary embodiment. As noted above with reference to Table II and FIG. 13, embodiments of the present invention preferably support processing of SHA2 hash functions of differing word sizes w along a common data path by expanding the message words and initial hash state of those SHA2 hash functions that employ the smaller word size. This expansion can be performed, for example, at blocks 1304 and 1308 of FIG. 13. FIG. 15 illustrates a specific example in which each of the sixteen 32-bit words 1502 of a SHA2-224 or SHA2-256 input message 1500 is expanded to form a corresponding one of sixteen 64-bit double words 1506 of an output message 1504. In this example, each 64-bit double word 1506 is formed by concatenating a 32-bit word of input message 1500, placed in the high-order half of the double word 1506, with a 32-bit zero word 1508 placed in the low-order half. The resulting output message 1504 can then be processed by the SHA2 hash circuit in the same manner as a message employing 64-bit words.
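
The expansion just described can be sketched in a few lines. The function name and the use of big-endian byte order (as in FIPS 180-4) are assumptions of this illustration.

```python
import struct


def expand_block_32_to_64(block: bytes) -> list:
    """Expand sixteen big-endian 32-bit words into sixteen 64-bit double words.

    Each message word lands in the high-order half of its double word and the
    low-order half is zero, as described for FIG. 15.
    """
    if len(block) != 64:
        raise ValueError("expected a 64-byte (16 x 32-bit) message block")
    words32 = struct.unpack(">16I", block)
    return [w << 32 for w in words32]
```

For example, a block whose first four bytes are 00 01 02 03 yields a first double word of 0x0001020300000000.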

Referring now to FIG. 16, there is depicted an exemplary format of a SHA2 hash instruction 1600 in accordance with one embodiment. In an exemplary embodiment, accelerator unit 314 is configured to perform SHA2 block hash function 1310 in hardware utilizing SHA2 hash circuit 402 in response to receipt of SHA2 hash instruction 1600.

In the illustrated embodiment, SHA2 hash instruction 1600 includes an opcode field 1602 that specifies a particular architecture-specific operation code for the SHA2 block hash function. SHA2 hash instruction 1600 additionally includes one or more operand register fields 1604, 1606 for specifying the operand registers within wide vector register file 316 that hold the source and destination operands of the SHA2 block hash function. For example, in one implementation, SHA2 hash instruction 1600 includes a register field 1604 that specifies a 1024-bit register that buffers the input current hash state and, after completion of the SHA2 block hash function, buffers the output hash state (which overwrites the input current hash state). In addition, SHA2 hash instruction 1600 includes a register field 1606 that specifies the register buffering the current message block to be processed. SHA2 hash instruction 1600 further includes a mode field 1608 that indicates whether the SHA2 hash function to be performed employs 32-bit words or 64-bit words.

Now that the SHA2 hash functions and an exemplary instruction for implementing a portion of the SHA2 hash functions have been explained, pseudocode for performing an exemplary SHA2 hash function (i.e., SHA2-512) in hardware is presented. In the SHA2-512 hash function, each message block is 1024 bits in length, and the hash state and message digest are each 512 bits in length. The following pseudocode references the following registers:

Rl←message length in bits

RL←message length in bytes; assumed [Figure 112124023-A0305-02-0027-4] 128 bytes, so the first message block is not padded

Ra←starting address of the message

Ri←address of the initial state

Rb←address of the message digest produced by the hash function

Rd←message digest length in bytes

Xs←SHA2 state //wide vector register

Xm←current message block //wide vector register

Given these registers, pseudocode for performing the SHA2-512 hash function can be expressed as follows:

Xs=load(Ri,64) //load the 64-byte initial state

Xm=load(Ra,128) //load the first (full) message block

Xs=sha2hash(Xs,Xm,64-bit) //execute SHA2 hash instruction to perform the block hash function

RL-=128 //decrement the message length remaining to be processed

Ra+=128 //advance the pointer to the next message block

While(RL>=128) //loop through the remaining message blocks, except the last message block

{Xm=load(Ra,128) //load the next message block (full size)

Xs=sha2hash(Xs,Xm,64-bit) //execute SHA2 hash instruction to perform the block hash function

RL-=128 //decrement the message length remaining to be processed

Ra+=128 //advance the pointer to the next message block

}

Xm=loadlength(Ra,RL) //load the last message block, if any (RL may be zero)

Xm=sha2_EOM_pad(Xm,RL) //append the SHA2 EOM byte to the end of the message block

If(RL>111)then //if the padding spans two message blocks, then

{Xs=sha2hash(Xs,Xm,64-bit) //execute SHA2 hash instruction to perform the block

Xm=force-to-zero //hash function and zero the last message block

}

Xm=sha2_EOB_pad(Xm,Rl) //insert the EOB into the last block of the padded message

Xs=sha2hash(Xs,Xm,64-bit) //execute SHA2 hash instruction to perform the block hash function on the last message block

Store(Xs,Rb,64) //truncate the state to the leading 64 bytes of Xs to obtain the message digest and store it to memory at address Rb
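
The pseudocode above can be approximated by a software model. In the sketch below, `sha2hash` is a plain-Python stand-in for the SHA2 hash instruction, written from FIPS 180-4 (SHA-512 compression), with the round keys and initial state derived from cube and square roots of primes as that standard defines them. All function names, the derivation helpers, and the interpretation of the EOM marker as a 0x80 byte and the EOB marker as the 128-bit length field are assumptions of this sketch, not statements about the disclosed hardware.

```python
import hashlib
from math import isqrt

MASK = (1 << 64) - 1


def _primes(n):
    """First n primes by trial division."""
    found, c = [], 2
    while len(found) < n:
        if all(c % p for p in found):
            found.append(c)
        c += 1
    return found


def _icbrt(x):
    """Integer cube root (floor) by binary search."""
    lo, hi = 0, 1 << ((x.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= x:
            lo = mid
        else:
            hi = mid - 1
    return lo


# FIPS 180-4: fractional parts of cube roots (round keys) and square roots (IV).
K = [_icbrt(p << 192) & MASK for p in _primes(80)]
IV = [isqrt(p << 128) & MASK for p in _primes(8)]


def rotr(x, n):
    return ((x >> n) | (x << (64 - n))) & MASK


def sha2hash(state, block):
    """Software stand-in for the block hash function (64-bit mode)."""
    w = [int.from_bytes(block[i:i + 8], "big") for i in range(0, 128, 8)]
    for t in range(16, 80):
        s0 = rotr(w[t - 15], 1) ^ rotr(w[t - 15], 8) ^ (w[t - 15] >> 7)
        s1 = rotr(w[t - 2], 19) ^ rotr(w[t - 2], 61) ^ (w[t - 2] >> 6)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for t in range(80):
        t1 = (h + (rotr(e, 14) ^ rotr(e, 18) ^ rotr(e, 41))
              + ((e & f) ^ (~e & g)) + K[t] + w[t]) & MASK
        t2 = ((rotr(a, 28) ^ rotr(a, 34) ^ rotr(a, 39))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]


def sha512_model(message: bytes) -> bytes:
    """Follow the pseudocode step by step for a message of at least 128 bytes."""
    if len(message) < 128:
        raise ValueError("the pseudocode assumes a full first message block")
    xs = list(IV)                               # Xs = load(Ri, 64)
    ra, rl_bytes = 0, len(message)              # Ra, RL
    bit_len = 8 * len(message)                  # Rl
    while True:                                 # first block, then remaining full blocks
        xs = sha2hash(xs, message[ra:ra + 128])
        rl_bytes -= 128
        ra += 128
        if rl_bytes < 128:
            break
    xm = bytearray(128)
    xm[:rl_bytes] = message[ra:ra + rl_bytes]   # loadlength (RL may be zero)
    xm[rl_bytes] = 0x80                         # sha2_EOM_pad
    if rl_bytes > 111:                          # padding spans two blocks
        xs = sha2hash(xs, bytes(xm))
        xm = bytearray(128)                     # force-to-zero
    xm[112:] = bit_len.to_bytes(16, "big")      # sha2_EOB_pad
    xs = sha2hash(xs, bytes(xm))
    return b"".join(x.to_bytes(8, "big") for x in xs)   # Store(Xs, Rb, 64)


def matches_hashlib(message: bytes) -> bool:
    """Compare the model against Python's built-in SHA-512."""
    return sha512_model(message) == hashlib.sha512(message).digest()
```

Message lengths whose final partial block holds 111 and 112 bytes exercise the single-block and two-block padding paths, respectively.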

Referring now to FIG. 17, there is illustrated a high-level block diagram of an exemplary embodiment of SHA2 hash circuit 402 of FIG. 4 that is suitable for executing SHA2 hash instruction 1600. As shown, SHA2 hash circuit 402 includes a 512-bit two-input state multiplexer 1702a, a 1024-bit two-input message multiplexer 1702b, a 512-bit state register 1704a, a 1024-bit message block register 1704b, an update working state circuit 1708, a message schedule round circuit 1710, and a control circuit 1720 that controls operation of SHA2 hash circuit 402 in response to SHA2 hash instruction 1600.

In this example, a first input of state multiplexer 1702a is coupled to receive, from the register in wide vector register file 316 specified by register field 1604 of SHA2 hash instruction 1600, the current hash state held in the 512 high-order bits of that register. A second input of state multiplexer 1702a is coupled to the output of update working state circuit 1708. Message multiplexer 1702b is similarly configured, having a first input coupled to receive a message block from the register in wide vector register file 316 specified by register field 1606 of SHA2 hash instruction 1600 and a second input coupled to receive 1024-bit round feedback from message schedule round circuit 1710. Control logic 1720 within SHA2 hash circuit 400 provides unillustrated select signals to multiplexers 1702a, 1702b to cause multiplexers 1702a, 1702b to select the values present at their first inputs prior to update round 0 function 1404 and to select the values present at their second inputs following each of update round 0 function through SHA2 block hash n function. The values output by multiplexers 1702a, 1702b are temporarily buffered in state register 1704a and message block register 1704b, respectively. The message block buffered in message block register 1704b forms the input of message schedule round circuit 1710, which implements message schedule round function 1400 of FIG. 14. The 64 high-order bits of message block register 1704b and the 512-bit state in state register 1704a form the two inputs of update working state circuit 1708, which is configured to perform update round function 1404 as previously described with reference to FIG. 14.

Control circuit 1720 is further configured to sequence update working state circuit 1708 through each of the n rounds utilizing the correct round indices specified by the SHA-2 standard. Upon conclusion of the final round n-1, state register 1704a will hold a 512-bit hash state. Control circuit 1720 is further configured, once this output hash state is obtained, to cause single-instruction multiple-data (SIMD) adder 1712 to add the hash state from state register 1704a to the input hash state read from wide vector register file 316 and to store the result back to wide vector register file 316 as the next hash state, as described above with respect to addition function 1410 of FIG. 14. Those skilled in the art will appreciate that, in different implementations, SIMD adder 1712 can be implemented as a dedicated component of SHA2 hash circuit 402 or as a separate pipeline that can be shared, for example, by multiple hash circuits.

Referring now to FIG. 18, there is depicted a more detailed block diagram of the exemplary update working state circuit 1708 of FIG. 17 in accordance with one embodiment. In this embodiment, the 512-bit state buffered in state register 1704a, which is received as one input of update working state circuit 1708, is split into eight 64-bit variables, referred to in the SHA-2 standard as variables a to h, as shown at block 1800. Update working state circuit 1708 includes two sigma function circuits, namely, SHA2 sigma 0 circuit 1802 and SHA2 sigma 1 circuit 1806, as well as a SHA2 MA circuit 1804 and a SHA2 CH circuit 1808, each of which performs a respective function defined by the SHA-2 standard. Update working state circuit 1708 additionally includes three 64-bit adders 1810, 1812, and 1814. SHA2 sigma 0 circuit 1802 applies a sigma function with n(n1,n2,n3)=(28,34,39) and m(m1,m2,m3)=(2,13,22) to variable a to produce a first input of adder 1812. Variables a, b, and c are processed by SHA2 MA circuit 1804 to produce a second input of adder 1812. SHA2 sigma 1 circuit 1806 applies a sigma function with n(n1,n2,n3)=(14,18,41) and m(m1,m2,m3)=(6,13,22) to variable e to produce a first of five inputs of adder 1810. Variables e, f, and g are processed by SHA2 CH circuit 1808 to produce a second input of adder 1810. Adder 1810 adds the relevant round key, round message block, and variable d to these two inputs to produce a sum that forms a first input of adder 1814 and a third input of adder 1812.

Update working state circuit 1708 produces a 512-bit result state 1816 composed of eight 64-bit variables a' to h'. Variable a' of result state 1816 is formed by the output of adder 1812, variables b', c', and d' are formed by variables a, b, and c of input state 1800, respectively, and variables f', g', and h' are formed by variables e, f, and g of input state 1800, respectively. The remaining variable e' is formed by the sum of the output of adder 1810 and variable d of input state 1800.
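
A single update round for 64-bit words can be sketched as follows. This sketch follows the round definition of FIPS 180-4, in which the five-input sum takes variable h as its fifth addend; the function and variable names are illustrative assumptions, and the comments map the sums onto the three adders only loosely.

```python
MASK64 = (1 << 64) - 1


def rotr64(x, n):
    """Rotate a 64-bit value right by n bits."""
    return ((x >> n) | (x << (64 - n))) & MASK64


def update_round(state, round_key, msg_word):
    """One working-state update on eight 64-bit variables a..h (FIPS 180-4)."""
    a, b, c, d, e, f, g, h = state
    big_sigma0 = rotr64(a, 28) ^ rotr64(a, 34) ^ rotr64(a, 39)
    big_sigma1 = rotr64(e, 14) ^ rotr64(e, 18) ^ rotr64(e, 41)
    maj = (a & b) ^ (a & c) ^ (b & c)      # MA function
    ch = (e & f) ^ (~e & g)                # CH function
    t1 = (h + big_sigma1 + ch + round_key + msg_word) & MASK64  # five-input sum
    a_new = (t1 + big_sigma0 + maj) & MASK64                    # three-input sum
    e_new = (d + t1) & MASK64                                   # two-input sum
    # b', c', d' come from a, b, c and f', g', h' come from e, f, g.
    return (a_new, a, b, c, e_new, e, f, g)
```

Six of the eight output variables are plain copies of input variables, so only a' and e' require arithmetic in each round.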

It should be noted that the 32-bit to 64-bit expansion of the words of a SHA-2 message described above with reference to FIG. 15 does not affect (i.e., is transparent to) the design of SHA2 MA circuit 1804, SHA2 CH circuit 1808, and modular adders 1812, 1814. The trailing-zero expansion of SHA2 messages employing 32-bit words affects only SHA2 sigma circuits 1802, 1806, as described in greater detail below with reference to FIG. 19.

FIG. 19 is a more detailed block diagram of an exemplary embodiment of a SHA2 sigma circuit 1900, which can be utilized to implement SHA2 sigma 0 circuit 1802 and SHA2 sigma 1 circuit 1806 of FIG. 18. SHA2 sigma circuit 1900 receives a 64-bit input variable 1902 including 32 high-order bits (bits 0 to 31) and 32 low-order bits (bits 32 to 63).

SHA2 sigma circuit 1900 includes a 64-bit rotate circuit 1904a that rotates 64-bit input variable 1902 by n1 bits (i.e., 28 bits for SHA2 sigma 0 circuit 1802 and 14 bits for SHA2 sigma 1 circuit 1806) to obtain a first 64-bit input of multiplexer 1910a. SHA2 sigma circuit 1900 additionally includes a 32-bit rotate circuit 1906a that rotates the 32 high-order bits of input variable 1902 by m1 bits (i.e., 2 bits for SHA2 sigma 0 circuit 1802 and 6 bits for SHA2 sigma 1 circuit 1806) to obtain, when concatenated with the 32 low-order bits of input variable 1902, a second 64-bit input of multiplexer 1910a. Multiplexer 1910a selects between its first and second inputs based on a mode signal determined by mode field 1608 of the relevant SHA2 hash instruction 1600. That is, multiplexer 1910a selects its first input if the mode signal indicates that mode field 1608 is set to indicate a SHA2 hash function utilizing 64-bit words and selects its second input if the mode signal indicates that mode field 1608 is set to indicate a SHA2 hash function utilizing 32-bit words.

SHA2 sigma circuit 1900 additionally includes a 64-bit rotate circuit 1904b that rotates 64-bit input variable 1902 by n2 bits (i.e., 34 bits for SHA2 sigma 0 circuit 1802 and 18 bits for SHA2 sigma 1 circuit 1806) to obtain a first 64-bit input of multiplexer 1910b. SHA2 sigma circuit 1900 also includes a 32-bit rotate circuit 1906b that rotates the 32 high-order bits of input variable 1902 by m2 bits (i.e., 13 bits for both SHA2 sigma 0 circuit 1802 and SHA2 sigma 1 circuit 1806) to obtain, when concatenated with the 32 low-order bits of input variable 1902, a second 64-bit input of multiplexer 1910b. Multiplexer 1910b selects between its first and second inputs based on the mode signal. Specifically, multiplexer 1910b selects its first input if the mode signal indicates that mode field 1608 is set to indicate a SHA2 hash function utilizing 64-bit words and selects its second input if the mode signal indicates that mode field 1608 is set to indicate a SHA2 hash function utilizing 32-bit words.

SHA2 sigma circuit 1900 also includes a 64-bit rotate/shift circuit 1908a that rotates and shifts the 64-bit input variable by n3 bits (i.e., 39 bits for SHA2 sigma 0 circuit 1802 and 41 bits for SHA2 sigma 1 circuit 1806) to obtain a first 64-bit input of multiplexer 1910c. SHA2 sigma circuit 1900 additionally includes a 32-bit rotate/shift circuit 1908b that rotates and shifts the 32 high-order bits of input variable 1902 by m3 bits (i.e., 22 bits for both SHA2 sigma 0 circuit 1802 and SHA2 sigma 1 circuit 1806) to obtain, when concatenated with the 32 low-order bits of input variable 1902, a second 64-bit input of multiplexer 1910c. Multiplexer 1910c selects between its first and second inputs based on the mode signal. Like multiplexers 1910a, 1910b, multiplexer 1910c selects its first input if the mode signal indicates that mode field 1608 is set to indicate a SHA2 hash function utilizing 64-bit words and selects its second input if the mode signal indicates that mode field 1608 is set to indicate a SHA2 hash function utilizing 32-bit words.

The 64-bit outputs of multiplexers 1910a, 1910b, and 1910c form the inputs of a three-input 64-bit bitwise XOR circuit 1912, which performs a bitwise XOR of its three inputs to produce a 64-bit output 1914. Those skilled in the art will appreciate that, in some embodiments of SHA2 sigma circuit 1900, the functions of rotate circuits 1904a to 1904b and 1906a to 1906b and of rotate/shift circuits 1908a to 1908b can be realized by appropriate wiring, thus allowing SHA2 sigma circuit 1900 to be implemented with three multiplexers 1910a to 1910c and three-way bitwise XOR circuit 1912 and without explicit rotate and shift circuitry.
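
The mode-selected sigma datapath can be modeled as three multiplexed rotations followed by a three-way XOR. The sketch below treats all three units as pure rotations, which matches the sigma 0 function of FIPS 180-4 with the amounts (28, 34, 39) and (2, 13, 22) quoted above; the function names and the `mode64` flag are assumptions made for illustration.

```python
MASK32 = (1 << 32) - 1


def rotr(x, n, width):
    """Rotate a width-bit value right by n bits."""
    n %= width
    return ((x >> n) | (x << (width - n))) & ((1 << width) - 1)


def sigma(x, n, m, mode64):
    """Model of the sigma datapath on a 64-bit input.

    n: rotation amounts applied to the full 64-bit value (64-bit mode).
    m: rotation amounts applied to the 32 high-order bits only (32-bit mode),
       with the 32 low-order bits passed through unchanged.
    """
    hi, lo = x >> 32, x & MASK32
    terms = []
    for n_i, m_i in zip(n, m):
        wide = rotr(x, n_i, 64)
        narrow = (rotr(hi, m_i, 32) << 32) | lo
        terms.append(wide if mode64 else narrow)   # per-term multiplexer
    return terms[0] ^ terms[1] ^ terms[2]          # three-input bitwise XOR
```

In 32-bit mode the three terms share the same low-order half, so a zero low half in the input stays zero in the output and the result is the 32-bit sigma value held in the high-order half.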

Referring now to FIG. 20, there is depicted a high-level logical flowchart of an exemplary process for executing a SHA2 hash instruction 1600 in accordance with one embodiment. For ease of understanding, the process of FIG. 20 is described with reference to the exemplary embodiment of SHA2 hash circuit 402 illustrated in FIGS. 17 to 19.

The process of FIG. 20 begins at block 2000 and then proceeds to block 2002, which illustrates SHA2 hash circuit 402 receiving a SHA2 hash instruction 1600 that specifies a particular SHA2 mode (i.e., 32-bit or 64-bit word size) as well as a state register and a message block register within wide vector register file 316. In response to receiving SHA2 hash instruction 1600, control circuit 1720 causes the 512-bit state and the 1024-bit message block to be read from wide vector register file 316 and loaded into state register 1704a and message block register 1704b via multiplexers 1702a and 1702b, respectively (block 2002). Control circuit 1720 additionally initializes an internal round counter to 0 (block 2004).

The process then proceeds from block 2004 to block 2006, which illustrates control circuit 1720 directing message schedule round circuit 1710 to perform an iteration of message schedule round function 1400 utilizing the message block buffered in message block register 1704b. In addition, control circuit 1720 directs update working state circuit 1708 to perform an iteration of update round function 1404 based on the appropriate round index, the 64 high-order bits of message block register 1704b, and the input hash state from state register 1704a. The results of the processing by update working state circuit 1708 and message schedule round circuit 1710 are returned to registers 1704a and 1704b by multiplexers 1702a and 1702b, respectively. Control circuit 1110 additionally advances the round counter. At block 2010, control logic 1720 determines, by reference to the round counter, whether SHA2 hash circuit 402 has performed the final round of processing specified by the SHA-2 standard. As noted in Table II, SHA2 hash circuit 402 performs 64 rounds of processing for SHA2 hash functions employing 32-bit words and 80 rounds of processing for SHA2 hash functions employing 64-bit words. If control circuit 1720 determines at block 2010 that at least one additional round of processing remains to be performed, the process returns to block 2006, which has been described. In response to determining at block 2010 that all rounds of processing are complete, however, control circuit 1720 causes the prior state to again be read from wide vector register file 316 and added by SIMD adder 1712 to the final state buffered in state register 1704a (block 2012). Control circuit 1720 then directs storage of the resulting next state back into wide vector register file 316 (block 2014). Thereafter, the process of FIG. 20 ends at block 2016.

As discussed above with reference to block 600 of FIG. 6 and block 1304 of FIG. 13, messages processed by the SHA2 and SHA3 hash functions are padded to produce a message whose length is an integer multiple of the block length of r bits. FIG. 21A depicts an exemplary unpadded message 2100 having a total length of L bits and including n message blocks. Of these, the first n-1 message blocks each include r bits, but the final message block n includes k bits, where k < r. As shown in FIG. 21B, in the general case, message 2100 is padded by appending r-k padding bits to the end of message block n, thus producing n message blocks all having a length of r bits.
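The general case of FIG. 21B reduces to simple arithmetic on L and r. The following minimal sketch computes n, k, and the number of padding bits; the function name general_case_padding is an illustrative assumption, and the sketch deliberately rejects the case in which L is already a multiple of r, because that case calls for an additional message block rather than r-k padding bits.

```python
def general_case_padding(total_bits, r):
    """Return (n, k, pad_bits) for an L-bit message split into r-bit blocks.

    n        number of message blocks in the unpadded message
    k        number of message bits in the final block (k < r)
    pad_bits number of padding bits appended to the final block (r - k)
    """
    if total_bits <= 0 or r <= 0:
        raise ValueError("lengths must be positive")
    k = total_bits % r
    if k == 0:
        # Final block is already full: not the general case of FIG. 21B.
        raise ValueError("message already fills its last block")
    n = total_bits // r + 1
    return n, k, r - k
```

For example, a 2500-bit message with r = 1088 has n = 3 blocks, k = 324 message bits in the final block, and 764 padding bits.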

The content of the padding bits appended to obtain the padded message can vary depending on the hash function under consideration. For example, in the SHA2 and SHA3/SHAKE hash algorithms discussed herein, the padding bits will include bytes marking both the end of the unpadded portion of the message (i.e., an end-of-message (EOM) marker) and the end of the last block of the padded message (i.e., an end-of-block (EOB) marker). As explained further below, in some cases the padding bits including the EOM and EOB markers can all be included within the message block containing the final message byte; in other cases, the addition of the padding bits may require an additional message block to be appended to the message. In either case, the disclosed inventions preferably perform message padding in processor registers through execution of one or more instructions rather than through high-latency memory move operations that transfer the message between two locations in memory.

In at least some architectures, load-store unit 224, memory controller 112, and/or system interconnect 110 are not constructed to support data transfers of lengthy data objects (e.g., complete r-bit SHA3/SHAKE and SHA2 message blocks) between system memory 114 and wide vector register file 116. In such architectures, a message block is transferred in multiple smaller chunks into a narrower register file and is then transferred from the narrower register file into one or more wide vector registers in wide vector register file 316. For example, FIG. 22A illustrates an example of assembling a SHA3/SHAKE message block n in an architected register file 300 including 256-bit registers r0 to rS 301. In this example, load length instructions are executed, for example, by load-store unit 224 of FIG. 2 to load the five 256-bit chunks of a 1152-bit SHA3-224 message block n into registers r0 to r7 and to zero any register bytes not containing message data. Given the 1152-bit length of message blocks in SHA3-224, the message bytes within message block n will at most completely fill registers r0 to r3 plus the leading 128 bits of register r4 (of course, the final message block of an unpadded message may contain fewer than r bits). At least the remaining 128 bits of register r4 and all of registers r5 to r7 can be zeroed either automatically by execution of the load length instructions or by execution of standard load instructions. (Zero-filling registers r6 and r7 is required only for a generic SHA3 function that works for any of the supported message block lengths.) Additional data transfer instructions can then be executed by data transfer circuit 406 or transfer unit 320 to transfer the contents of registers r0 to r7 into registers R0 and R1 317 of wide vector register file 316, which includes registers R0 to RT each having an exemplary width of 1024 bits as discussed above. In an alternative implementation, the same result can be achieved by loading four registers 301 within architected register file 300 to buffer chunks n1 to n4 and then reusing the same registers 301 on a subsequent cycle to buffer chunks n5 to n8.
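The two-step staging of FIG. 22A can be modeled in a few lines. The sketch below is an assumption-laden illustration, not the described instructions: stage_block stands in for the load length instructions (eight 32-byte registers, zero fill of unused bytes), and transfer_to_wide stands in for the data transfer instructions that concatenate the narrow registers into 128-byte wide vector registers. Both function names and the byte ordering within registers are hypothetical.

```python
def stage_block(block, reg_bytes=32, num_regs=8):
    """Split a message block into 256-bit chunks, zero-filling unused bytes."""
    if len(block) > reg_bytes * num_regs:
        raise ValueError("block does not fit in the staged registers")
    regs = []
    for i in range(num_regs):
        chunk = block[i * reg_bytes:(i + 1) * reg_bytes]
        # ljust pads a short or empty chunk with zero bytes
        regs.append(chunk.ljust(reg_bytes, b"\x00"))
    return regs

def transfer_to_wide(regs, wide_bytes=128):
    """Concatenate narrow registers into 1024-bit wide vector registers."""
    flat = b"".join(regs)
    return [flat[i:i + wide_bytes] for i in range(0, len(flat), wide_bytes)]
```

With a 144-byte (1152-bit) SHA3-224 block, r0 to r3 are completely filled, r4 holds 16 message bytes followed by 16 zero bytes, r5 to r7 are all zero, and the transfer yields two wide registers R0 and R1.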

FIG. 22B depicts a similar example showing the transfer of a 1024-bit SHA2 message block n into a wide vector register 317 in wide vector register file 316 after the message block has been assembled in registers 301 of architected register file 300. In this example, load length instructions are executed, for example, by load-store unit 224 of FIG. 2 to load the four 256-bit chunks of SHA2 message block n into registers r2 to r5 of architected register file 300 and to zero any register bytes not containing message data. Additional data transfer instructions can then be executed by data transfer circuit 406 to transfer the contents of registers r2 to r5 into register R0 317 of wide vector register file 316. In an alternative implementation, the same result can be achieved by loading two registers 301 within architected register file 300 to buffer chunks n1 and n2 and then reusing the same registers 301 on a subsequent cycle to buffer chunks n3 and n4.

In at least some preferred embodiments, the process for loading message blocks into wide vector register file 316 given in FIGS. 22A to 22B is performed for all message blocks of a SHA3/SHAKE or SHA2 message, including message block n, which is the last message block of the unpadded message. As explained below, the end of the message can then be padded at least partially within wide vector register file 316 through execution of one or more instructions.

FIGS. 23A to 23D depict various padding cases for SHA3/SHAKE messages of various lengths. According to the SHA-3 standard, each message must include EOM padding marking the EOM. Under the SHA-3 standard, the EOM padding has a fixed value of x06 for SHA3 hash functions and a fixed value of x1F for SHAKE hash functions. The location of the EOM padding within the padded message varies depending on the message length, which is often unknown at compile time. The SHA-3 standard further mandates that the last byte of each padded message be a fixed-value EOB padding byte.

As shown in FIG. 23A, if the last message block 2300 of a SHA3/SHAKE message includes more than two bytes that do not contain message data, EOM padding byte 2302 is inserted into the zeroed byte of the relevant wide vector register 317 immediately following the last message byte 2306, and EOB padding byte 2304 is inserted into a zeroed byte of the wide vector register 317 as the last byte of the padded message block.

FIG. 23B illustrates a similar second case in which the last message block 2300' of a SHA3/SHAKE message includes exactly two zeroed bytes that do not contain message data. In this case, the last two zeroed bytes of the last message block 2300' are replaced with EOM padding byte 2302 followed by EOB padding byte 2304.

FIG. 23C depicts a third case in which the last message block 2300" of a SHA3/SHAKE message includes only a single zeroed byte following the last message byte 2306. In this case, execution of a padding instruction as described below causes the EOM and EOB padding values to be logically ORed together and inserted into the final byte of padded message block 2300" as EOM/EOB padding byte 2308.

FIG. 23D illustrates a final case in which the last message byte 2306 of a SHA3/SHAKE message is the last byte of a message block 2310. Because message block 2310 in this case has no capacity for the required EOM and EOB padding, an additional zeroed message block 2312 is appended to the message (e.g., through execution of a load length instruction). EOM padding byte 2302 is inserted into this zeroed message block 2312 as the first byte, and EOB padding byte 2304 is inserted into this zeroed message block 2312 as the last byte. It should be noted that in each of the four cases depicted in FIGS. 23A to 23D, both the EOM padding and the EOB padding can advantageously be applied by a single padding instruction because the EOM padding and the EOB padding always fall within the same message block. It should also be appreciated that although FIGS. 23A to 23D depict padding applied to messages including an integer number of message bytes, padding can similarly be applied to bit messages that do not include an integer number of bytes.
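The four SHA3/SHAKE cases of FIGS. 23A to 23D can be captured by one short routine, sketched below under stated assumptions. The function name pad_sha3_last_block is hypothetical. The EOM values x06 and x1F come from the text; the EOB value 0x80 is not given in the text and is assumed here from the byte-oriented form of the SHA-3 padding rule. The routine operates on a byte string rather than on a wide vector register.

```python
def pad_sha3_last_block(tail, rate_bytes, eom=0x06, eob=0x80):
    """Pad the final (possibly full) block of a byte-aligned SHA3/SHAKE message.

    tail       message bytes of the last block (0..rate_bytes bytes)
    rate_bytes block length r in bytes
    eom        0x06 for SHA3, 0x1F for SHAKE
    eob        assumed fixed end-of-block value
    Returns a list with one padded block, or two when the last block was full.
    """
    if len(tail) > rate_bytes:
        raise ValueError("tail longer than one block")
    blocks = []
    if len(tail) == rate_bytes:
        # FIG. 23D: no room for padding, so an extra zeroed block is appended.
        blocks.append(bytes(tail))
        tail = b""
    buf = bytearray(bytes(tail).ljust(rate_bytes, b"\x00"))
    buf[len(tail)] |= eom   # EOM lands right after the last message byte
    buf[-1] |= eob          # EOB lands in the last byte; ORs with EOM in FIG. 23C
    blocks.append(bytes(buf))
    return blocks
```

Using OR for both insertions means the single-free-byte case of FIG. 23C needs no special handling: the two values simply combine in the final byte.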

In one embodiment, padding of a SHA3/SHAKE message of arbitrary length as shown in FIGS. 23A to 23D can be accomplished utilizing three instructions. These instructions include: (1) a load length instruction that stages the final message block of the padded message in specified registers 301 in architected register file 300; (2) a transfer instruction that transfers the message block from the registers 301 in architected register file 300 to one or more wide vector registers 317 in wide vector register file 316 as shown in FIG. 22A; and (3) a padding instruction that inserts the EOM and EOB padding at the appropriate byte positions in the final message block of the padded SHA3/SHAKE message held in wide vector register(s) 317. Of course, in alternative implementations it is possible to insert the EOM padding and the EOB padding into the final message block utilizing two different instructions. However, for single-block messages, such as those commonly used in post-quantum cryptography schemes, adding an additional padding instruction increases latency and undesirably reduces hashing performance.

FIGS. 24A to 24D depict various padding cases for SHA2 messages of various lengths. According to the SHA-2 standard, each message must include one EOM padding byte having the value x80 in the byte immediately following the last message byte. The location of the EOM padding byte within the padded message therefore varies depending on the message length. The SHA-2 standard further mandates that the last two words (i.e., either two 32-bit words or two 64-bit words, depending on the SHA2 hash function in question (see Table II)) contain EOB padding specifying the length of the unpadded message in bits.

In the first case, illustrated in FIG. 24A, the last message block 2400 of a SHA2 message includes more than two words plus one byte that do not contain message data. In this case, the last message block 2400 is padded by inserting EOM padding byte 2402 into the zeroed byte of the relevant wide vector register 317 immediately following the last message byte 2406 and by inserting two EOB padding words 2404 as the last two words of the last message block 2400.

FIG. 24B illustrates a similar second case in which the last message block 2400' of a SHA2 message includes exactly two words plus one byte that do not contain message data. In this case, the last message block 2400' is padded by inserting EOM padding byte 2402 into the zeroed byte of the relevant wide vector register 317 immediately following the last message byte 2406 and by inserting two EOB padding words 2404 as the last two words of the last message block 2400.

FIG. 24C depicts a third case in which the last message block 2400" of an unpadded SHA2 message includes too few bytes without message data to accommodate both EOM padding byte 2402 and the two EOB padding words 2404. In this case, the SHA2 message is padded by inserting EOM padding byte 2402 into the zeroed byte of the relevant wide vector register 317 immediately following the last message byte 2406. Because EOB padding words 2404 do not fit within message block 2400", an additional zeroed message block 2408 is appended to the message (e.g., through execution of a load length instruction). EOB padding words 2404 are then inserted as the last two words of message block 2408.

FIG. 24D illustrates a fourth case in which the last message byte 2406 of a SHA2 message forms the last byte of a complete message block 2410. Because message block 2410 has no capacity for either EOM or EOB padding, an additional zeroed message block 2412 is appended to the SHA2 message. The additional message block 2412 includes EOM padding byte 2402 as the first byte of message block 2412, followed by multiple zeroed bytes, and finally the two EOB padding words 2404 at the end of message block 2412.
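The four SHA2 cases of FIGS. 24A to 24D follow from one rule: append the x80 EOM byte, zero-fill, and place the bit length in the last two words of the final block. The sketch below models this on byte strings; the function name pad_sha2_tail and its parameters are illustrative assumptions, and the block size of sixteen words with a big-endian length field is taken from the SHA-2 standard rather than from the text.

```python
def pad_sha2_tail(tail, total_len_bytes, word_bytes=4):
    """Pad the final (possibly full) block of a byte-aligned SHA2 message.

    tail            message bytes of the last block
    total_len_bytes length of the whole unpadded message in bytes
    word_bytes      4 for 32-bit-word functions, 8 for 64-bit-word functions
    Returns a list of one or two padded blocks.
    """
    block_bytes = 16 * word_bytes   # a block is sixteen words
    len_field = 2 * word_bytes      # EOB padding occupies the last two words
    if len(tail) > block_bytes:
        raise ValueError("tail longer than one block")
    data = bytearray(tail) + b"\x80"                 # EOM byte
    while (len(data) + len_field) % block_bytes:     # zero fill
        data.append(0)
    data += (8 * total_len_bytes).to_bytes(len_field, "big")  # EOB words
    return [bytes(data[i:i + block_bytes])
            for i in range(0, len(data), block_bytes)]
```

With 32-bit words (64-byte blocks), a 55-byte tail leaves exactly two words plus one byte free and still fits in one block (FIG. 24B), a 56-byte tail pushes the EOB words into a second block (FIG. 24C), and a full 64-byte tail puts both the EOM byte and the EOB words in the second block (FIG. 24D).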

In one embodiment, padding of a SHA2 message of arbitrary length can be accomplished utilizing as few as four instructions. These instructions include: (1) a load length instruction that places the final message block of the SHA2 message in specified registers 301 in architected register file 300 and zeroes any register bytes not containing message bytes; (2) an insert word instruction that places the two EOB padding words 2404 in the appropriate bytes of a register 301 in architected register file 300 to mark the end of the padded message; (3) a transfer instruction that transfers the contents of the registers 301 buffering the message block from architected register file 300 to a wide vector register 317 in wide vector register file 316; and (4) a padding instruction that inserts EOM padding byte 2402 at the appropriate position in the wide vector register 317. In this embodiment, execution of the padding instruction inserts EOM padding byte 2402 but not EOB padding words 2404, both because EOM padding byte 2402 and EOB padding words 2404 may reside in different message blocks and because EOB padding words 2404 can be efficiently positioned in the appropriate register 301 within architected register file 300 utilizing an existing insert word instruction. Of course, in an alternative embodiment, both EOM padding byte 2402 and EOB padding words 2404 can be applied to the SHA2 message block in registers 301 of architected register file 300.

Referring now to FIG. 25, an exemplary padding instruction 2500 in accordance with one embodiment is illustrated. In at least one embodiment, exemplary padding instruction 2500 can be executed by accelerator unit 314 within data transfer circuit 406 to perform padding for both SHA3/SHAKE message blocks and SHA2 message blocks.

In the illustrated example, padding instruction 2500 includes an opcode field 2502 that specifies an architecture-specific operation code for a message padding instruction. The padding instruction additionally includes two register fields 2504, 2506 for specifying the storage locations of the source and destination operands of the padding operation. For example, register 1 field 2504 can identify the target wide vector register 317 within wide vector register file 316 that buffers the message block to be padded, and register 2 field 2506 can specify the register 301 in architected register file 300 that holds the remaining message length in bytes.

Padding instruction 2500 further includes a mode field 2508 that provides information used to pad the message. In one exemplary embodiment, mode field 2508 includes at least three subfields, including a hash identifier (HID) subfield 2510, a block length (BL) subfield 2512, and an extension (E) subfield 2514. HID subfield 2510 indicates the type of hash function being applied to the message block. For example, in one implementation, HID subfield 2510 can include two bits specifying one of the following hash types: SHA3, SHAKE, SHA2 (64-bit words), and SHA2 (32-bit words). BL subfield 2512 indicates (possibly when interpreted together with HID subfield 2510) the length of the message block in bytes. E subfield 2514 indicates whether the wide vector register 317 specified by register 1 field 2504 holds a leading segment S0 or a trailing segment S1 of the message block. For example, in an embodiment in which wide vector registers 317 are 1024 bits wide, E subfield 2514 can have a value of b0 if the wide vector register 317 specified by register 1 field 2504 does not hold a trailing segment of the message block and a value of b1 if the specified wide vector register 317 holds a trailing segment of the message block. Of course, in other embodiments in which wide vector registers 317 have a different width (e.g., 512 bits), E subfield 2514 can include additional bits to specify additional register segments.

Referring now to FIG. 26, an exemplary padding circuit 2600 in accordance with one embodiment is illustrated. Padding circuit 2600, which can be implemented, for example, as part of data transfer circuit 406 of accelerator unit 314, pads a message segment S held in a target wide vector register in response to execution of a padding instruction 2500 as shown in FIG. 25. The illustrated example assumes that wide vector register file 316 has 1024-bit wide vector registers 317.

In this exemplary embodiment, padding circuit 2600 includes a select EOM circuit 2602 that selects the value of EOM padding byte 2302 or 2402 (i.e., eom_byte) based on the hash function specified by HID subfield 2510 of padding instruction 2500. Padding circuit 2600 also includes a select EOB circuit 2604 that similarly selects, based on HID subfield 2510, the value of the EOB padding byte (i.e., eob_byte) to be inserted by padding instruction 2500. In the described embodiment, for SHA3/SHAKE hash functions, select EOB circuit 2604 selects the fixed eob_byte value specified by the SHA-3 standard, which value is contained in the register indicated by register 2 field 2506. For SHA2 hash functions, select EOB circuit 2604 selects a zero eob_byte because EOB padding words 2404 are inserted by a separate instruction in this embodiment. Padding circuit 2600 further includes a select BL size circuit 2606 that selects and outputs an 8-bit block length value based on HID subfield 2510 and BL subfield 2512 of padding instruction 2500.
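The selections made by the select EOM and select EOB circuits amount to a small lookup keyed by hash type, sketched below. The HashId enumeration and both function names are hypothetical, since the text does not give the bit encoding of the HID subfield. The EOM values come from the text; the SHA3/SHAKE eob_byte value of 0x80 is an assumption drawn from the byte-oriented form of the SHA-3 padding rule, because the text says only that the value is fixed.

```python
from enum import Enum

class HashId(Enum):
    """Hash types named in the text; the numeric encoding is hypothetical."""
    SHA3 = 0
    SHAKE = 1
    SHA2_64 = 2   # SHA2 with 64-bit words
    SHA2_32 = 3   # SHA2 with 32-bit words

def select_eom(hid):
    """Model of the select EOM circuit: EOM padding byte per hash type."""
    return {
        HashId.SHA3: 0x06,
        HashId.SHAKE: 0x1F,
        HashId.SHA2_64: 0x80,
        HashId.SHA2_32: 0x80,
    }[hid]

def select_eob(hid):
    """Model of the select EOB circuit.

    SHA3/SHAKE use a fixed end-of-block byte (assumed 0x80 here); SHA2 uses
    zero because its EOB padding words are inserted by a separate instruction.
    """
    return 0x80 if hid in (HashId.SHA3, HashId.SHAKE) else 0x00
```

Selecting a zero eob_byte for SHA2 lets the same downstream OR logic serve all four hash types, since ORing in zero leaves the register contents unchanged.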

The 8-bit block length value output by select BL size circuit 2606 is received by an EOB enable circuit 2608, which includes a comparator 2610, a decoder 2612, and a bitwise AND circuit 2614. The high-order bit of the 8-bit block length value indicates whether the length of the message block exceeds the width of a 1024-bit wide vector register 317 (as would be the case, for example, for SHA3-224, SHAKE-128, and SHAKE-256). The low-order 7 bits of the block length form a block length size (bl_size), which indicates the number of bytes comprising the segment of the message block buffered in the target wide vector register 317 identified by register 1 field 2504. Decoder 2612 decodes the 7-bit bl_size value to obtain a 128-bit representation of the position of the end of the message block within the target wide vector register 317. Comparator 2610 compares the high-order bit of the 8-bit block length with E subfield 2514 of padding instruction 2500 to form a 1-bit indication of whether EOB padding is to be added to the segment of the message block buffered in the target wide vector register (i.e., whether the target wide vector register 317 buffers the trailing segment S1 of the message block). This 1-bit indication is then logically combined with the decoder output by bitwise AND circuit 2614 to generate a 128-bit EOB enable signal (eob_en(0:127)), which identifies the byte (if any) of the message segment buffered in the target wide vector register 317 into which EOB padding is to be inserted.

Still referring to FIG. 26, padding circuit 2600 further includes an EOM enable circuit 2620, which includes a selection circuit 2620, a comparator 2622, a decoder 2624, and a bitwise AND circuit 2626. In the depicted example, selection circuit 2620 is implemented by a two-input multiplexer having a first input coupled to receive an 8-bit indication of the message length and a second input coupled to receive an expanded message length applicable to SHA-2 hash functions employing 32-bit words. The expanded message length value at the second input doubles the original length of the message according to the equation EX_LEN = 4*(LEN/4) + LEN by inserting b0 between bits 5 and 6 of the original length. This technique preserves the bit positions of original bits 6:7, which indicate the byte position, within the 32 high-order bits of the expanded message block, of the final message byte (if any). Selection circuit 2620 selects the first of its two 8-bit inputs if HID subfield 2510 indicates that the hash function is a SHA3/SHAKE hash function or a SHA2 hash function employing 64-bit words, and instead selects the second of its two inputs if HID subfield 2510 indicates that the hash function is a SHA2 hash function employing 32-bit words.
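The equivalence between the bit insertion and the equation EX_LEN = 4*(LEN/4) + LEN can be checked with a short sketch. Two assumptions are made and should be read as such: the division in the equation is integer division, and bits are numbered from the most significant end of the 8-bit value, so that bits 6:7 are the two least significant bits. The function names are illustrative.

```python
def expand_len_bit_insert(length):
    """Insert a 0 bit between bits 5 and 6 (MSB-first numbering) of length.

    The two low-order bits (byte position within a 32-bit word) are kept in
    place; the word index above them is shifted left by one extra position.
    """
    return ((length >> 2) << 3) | (length & 0b11)

def expand_len_formula(length):
    """Expanded length per the equation EX_LEN = 4*(LEN/4) + LEN."""
    return 4 * (length // 4) + length
```

For example, LEN = 5 (one full word plus one byte) gives EX_LEN = 9: the word index doubles from 4 to 8 bytes while the byte offset of 1 within the word is preserved.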

The 8-bit length value output by selection circuit 2620 includes a high-order bit indicating whether the block length exceeds the width of the 1024-bit wide vector register file 316 and seven low-order bits indicating the number of bytes comprising the segment of the message block buffered in the target wide vector register 317 identified by register 1 field 2504. Decoder 2624 decodes the seven low-order bits to obtain a 128-bit representation of the byte position (if any) within the target wide vector register 317 at which the end of the message bytes is to be marked. Comparator 2622 compares the high-order bit of the length value output by selection circuit 2620 with E subfield 2514 of padding instruction 2500 to form a 1-bit indication of whether EOM padding is to be added to the segment of the message block buffered in the target wide vector register 317. This 1-bit indication is then logically combined with the decoder output by bitwise AND circuit 2626 to generate a 128-bit EOM enable signal (eom_en(0:127)), which identifies the byte (if any) of the message segment buffered in the target wide vector register 317 into which EOM padding is to be inserted.

The EOB enable signal eob_en(0:127), the EOM enable signal eom_en(0:127), eom_byte, eob_byte, and the message segment from the target wide vector register 317 are all passed to a conditional OR circuit 2630, which conditionally inserts EOM and/or EOB padding into the message segment to obtain a padded message segment Sp. The padded message segment Sp is then stored back to the target wide vector register 317 specified in register 1 field 2504.

Referring now to FIG. 27, an exemplary embodiment of conditional OR circuit 2630 of FIG. 26 is illustrated. In this example, each of the 128 bytes of the message segment has a respective associated OR gate 2700 having three 8-bit inputs. A first input of OR gate 2700 is coupled to receive the respective byte of message segment S. A second input of OR gate 2700 is coupled to the output of a two-input AND gate 2702, which qualifies eom_byte with eom_en() for the given byte of message segment S. A third input of OR gate 2700 is coupled to the output of a two-input AND gate 2704, which qualifies eob_byte with eob_en() for the given byte of message segment S. OR gate 2700 performs a logical OR operation on these three inputs and writes the resulting byte of padded message segment Sp to the target wide vector register 317 in wide vector register file 316. Thus, if neither eom_en() nor eob_en() is asserted for a given byte of message segment S, the relevant OR gate 2700 simply writes the byte of the input message segment S to the corresponding byte of padded message segment Sp. If, however, one or both of eom_en() and eob_en() are asserted for a given byte of message segment S, the relevant OR gate 2700 writes eom_byte, eob_byte, or their logical combination into the corresponding byte of padded message segment Sp, as indicated by the enable signals eom_en() and eob_en().
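The per-byte behavior of the conditional OR circuit can be modeled directly. In the sketch below, the enable signals are represented as sequences of booleans (one per byte) and the function name conditional_or is an illustrative assumption; the gating and OR steps mirror the AND gates and OR gate described above. The example values 0x06 and 0x80 used afterward are sample inputs, not values fixed by this paragraph.

```python
def conditional_or(segment, eom_en, eob_en, eom_byte, eob_byte):
    """Return the padded segment Sp for a message segment S.

    segment  bytes of the message segment S (e.g., 128 bytes)
    eom_en   per-byte enables for the EOM padding byte
    eob_en   per-byte enables for the EOB padding byte
    """
    if not (len(segment) == len(eom_en) == len(eob_en)):
        raise ValueError("enable signals must match the segment length")
    out = bytearray(len(segment))
    for i, s in enumerate(segment):
        eom_term = eom_byte if eom_en[i] else 0   # AND gate qualifying eom_byte
        eob_term = eob_byte if eob_en[i] else 0   # AND gate qualifying eob_byte
        out[i] = s | eom_term | eob_term          # three-input OR gate
    return bytes(out)
```

When both enables are asserted for the same zeroed byte, the output byte is the OR of the two padding values, which corresponds to the combined EOM/EOB byte of FIG. 23C; when neither is asserted, the byte passes through unchanged.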

Referring now to FIG. 28, a high-level logical flowchart of an exemplary process for padding a message block in accordance with one embodiment is depicted. The illustrated process can be performed by accelerator unit 314 in response to receipt of a padding instruction 2500. For ease of understanding, the process is described below with reference to the exemplary padding circuit depicted in FIGS. 26 to 27.

The process of FIG. 28 begins at block 2800 and then proceeds to block 2802, which illustrates accelerator unit 314 receiving a padding instruction 2500 for execution. In response to receipt of padding instruction 2500, accelerator unit 314 first accesses the source operands specified by register fields 2504, 2506 of padding instruction 2500 (block 2804). In particular, accelerator unit 314 reads a message segment S from the target wide vector register 317 in wide vector register file 316 specified by register1 field 2504, reads the unpadded message length from the register 301 in architected register file 300 specified by register2 field 2506, and passes these operands to padding circuit 2600 of FIG. 26, which, as noted above, can be implemented within data transfer circuit 406. At block 2806, padding circuit 2600 utilizes mode field 2508 of padding instruction 2500 to select the parameters of the padding operation. In particular, select EOM circuit 2602 selects the value of the EOM padding byte (eom_byte) 2302 or 2402 based on the hash function specified by mode field 2508; select EOB circuit 2604 selects the value of the EOB padding byte (eob_byte) to be inserted by padding instruction 2500 (i.e., a fixed value for SHA3/SHAKE and a zero byte for SHA2, because the EOB padding word 2404 is applied for SHA2 by a separate instruction); and select BL size circuit 2606 selects the block length based on HID subfield 2510 and BL subfield 2512. The eom_byte selected by select EOM circuit 2602 and the eob_byte selected by select EOB circuit 2604 form inputs to conditional OR circuit 2630.
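The parameter selection performed at block 2806 can be illustrated with a small software model. The sketch below is only an assumption-laden illustration: the mode names used as dictionary keys, the `PadParams` type, and the `select_pad_params` function are hypothetical, and the byte values and block lengths are taken from the public FIPS 180-4 and FIPS 202 standards rather than from the encodings of mode field 2508, which are not reproduced in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PadParams:
    eom_byte: int   # byte OR-ed in immediately after the last message byte
    eob_byte: int   # byte OR-ed into the last byte of the block (0 = none)
    block_len: int  # block length in bytes


# Hypothetical mode names; values follow FIPS 180-4 (SHA2) and FIPS 202 (SHA3/SHAKE).
PAD_PARAMS = {
    "SHA2-224": PadParams(0x80, 0x00, 64),
    "SHA2-256": PadParams(0x80, 0x00, 64),
    "SHA2-384": PadParams(0x80, 0x00, 128),
    "SHA2-512": PadParams(0x80, 0x00, 128),
    "SHA3-224": PadParams(0x06, 0x80, 144),
    "SHA3-256": PadParams(0x06, 0x80, 136),
    "SHA3-384": PadParams(0x06, 0x80, 104),
    "SHA3-512": PadParams(0x06, 0x80, 72),
    "SHAKE128": PadParams(0x1F, 0x80, 168),
    "SHAKE256": PadParams(0x1F, 0x80, 136),
}


def select_pad_params(mode: str) -> PadParams:
    """Software stand-in for the select EOM, select EOB, and select BL size circuits."""
    return PAD_PARAMS[mode]
```

In this model the SHA2 entries carry a zero eob_byte, mirroring the statement above that the SHA2 end-of-block padding word is applied by a separate instruction.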

At block 2808, select circuit 2620 determines, based on HID subfield 2510 of mode field 2508, whether the hash function applied to the message is one of the SHA2-224 or SHA2-256 hash functions, which employ 32-bit words. If not, select circuit 2620 selects and outputs the message length read from the register 301 identified by register2 field 2506 as the length of the message, and the process of FIG. 28 proceeds to block 2812, which is described below. If, however, select circuit 2620 determines at block 2808 that HID subfield 2510 of padding instruction 2500 indicates a SHA2 hash function employing 32-bit words, select circuit 2620 selects and outputs a doubled length for the SHA2 message to account for the message expansion described above with reference to FIG. 15 (block 2810). In one implementation, the expanded SHA2 message length can be conveniently computed as 4*(LEN/4)+LEN. The process then proceeds from block 2810 to block 2812.
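The length adjustment can be worked through numerically. The sketch below assumes that the division in 4*(LEN/4)+LEN is an integer (truncating) division, which the text does not state explicitly; the function name and its boolean parameter are hypothetical. Under that assumption, any LEN that is a multiple of 4 is exactly doubled, and a trailing partial word contributes only its own byte count.

```python
def effective_message_length(length: int, uses_32bit_words: bool) -> int:
    """Return the length used for padding, in bytes.

    For hash functions with 32-bit words the message is assumed to have been
    expanded so that each complete 4-byte word occupies 8 bytes.
    """
    if not uses_32bit_words:
        return length
    # Assumption: LEN/4 truncates toward zero.
    return 4 * (length // 4) + length
```

For example, a 64-byte SHA2-256 block yields 128, while a 10-byte message yields 4*2+10 = 18.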

Block 2812 illustrates EOM enable circuit 2620 determining whether EOM padding is to be placed in the current message segment. If not, the EOM enable vector eom_en(0:127) generated by EOM enable circuit 2620 is all zeros, and no EOM padding is inserted into message segment S. Accordingly, the process passes to block 2816, which is described below. If, however, EOM enable circuit 2620 determines at block 2812 that EOM padding is to be inserted into message segment S, EOM enable circuit 2620 generates an EOM enable vector eom_en(0:127) that identifies the byte of message segment S at which the EOM padding byte is to be inserted, and the EOM padding byte is inserted into the specified byte of padded message segment Sp by conditional OR circuit 2630 (block 2814). The process proceeds from block 2814 to block 2816.

At block 2816, select BL size circuit 2606 and EOB enable circuit 2608 determine whether the hash function specified by mode field 2508 of padding instruction 2500 is a SHA3 or SHAKE hash function and an EOB padding byte is to be inserted into message segment S. If not, the EOB enable vector eob_en(0:127) generated by EOB enable circuit 2608 is all zeros, and no EOB padding is inserted into message segment S. Accordingly, the process passes from block 2816 to block 2820, which is described below. If, however, BL size circuit 2606 and EOB enable circuit 2620 determine at block 2816 that the hash function specified by mode field 2508 is a SHA3 or SHAKE hash function and EOB padding is to be inserted into message segment S, EOB enable circuit 2608 generates an EOB enable vector eob_en(0:127) that identifies the byte of message segment S at which the EOB padding byte is to be inserted, and the EOB padding byte is inserted into the specified byte of padded message segment Sp by conditional OR circuit 2630 (block 2818). The process then passes to block 2820.
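The interplay of the two enable vectors and the conditional OR can be modeled in software as follows. This is a rough sketch under several assumptions that are not confirmed by the text: a message segment is taken to be 128 bytes (one enable bit per byte, matching the index range 0:127), the bytes of the segment beyond the end of the message are taken to be zero, and the position of the segment within the block is passed in as a hypothetical `seg_offset` parameter. All function and parameter names are illustrative.

```python
SEG_BYTES = 128  # assumed segment size: one enable bit per byte, as in eom_en(0:127)


def enable_vector(position: int, seg_offset: int) -> list:
    """One-hot enable flags for this segment; all zeros if position lies outside it."""
    return [1 if seg_offset + i == position else 0 for i in range(SEG_BYTES)]


def pad_segment(segment: bytes, seg_offset: int, msg_len: int,
                block_len: int, eom_byte: int, eob_byte: int) -> bytes:
    """Return padded segment Sp for one segment S of the final message block."""
    assert len(segment) == SEG_BYTES
    # EOM padding goes at the first byte after the message within the block.
    eom_en = enable_vector(msg_len % block_len, seg_offset)
    # EOB padding (SHA3/SHAKE only in this model) goes at the last byte of the block.
    eob_en = enable_vector(block_len - 1, seg_offset) if eob_byte else [0] * SEG_BYTES
    # Conditional OR: each padding byte is OR-ed in only where its enable bit is set.
    return bytes(s | (eom_byte if m else 0) | (eob_byte if b else 0)
                 for s, m, b in zip(segment, eom_en, eob_en))
```

Using an OR rather than a multiplexer means that when both padding bytes land on the same position (for SHA3-256, a message that ends one byte short of the 136-byte block), the result is the combined byte 0x06 | 0x80 = 0x86, which is what the FIPS 202 padding rule requires.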

Block 2820 illustrates data transfer circuit 406 writing the resulting padded message segment Sp into the target wide vector register 317 specified by register1 field 2504. Thereafter, the process of FIG. 28 ends at block 2822.

With reference now to FIG. 29, there is depicted a block diagram of an exemplary design flow 2900 used, for example, in semiconductor IC logic design, simulation, test, layout, and manufacture. Design flow 2900 includes processes, machines, and/or mechanisms for processing design structures or devices to generate logically or otherwise functionally equivalent representations of the design structures and/or devices described above and shown herein. The design structures processed and/or generated by design flow 2900 may be encoded on machine-readable transmission or storage media to include data and/or instructions that, when executed or otherwise processed on a data processing system, generate a logically, structurally, mechanically, or otherwise functionally equivalent representation of hardware components, circuits, devices, or systems. Machines include, but are not limited to, any machine used in an IC design process, such as designing, manufacturing, or simulating a circuit, component, device, or system. For example, machines may include: lithography machines, machines and/or equipment for generating masks (e.g., e-beam writers), computers or equipment for simulating design structures, any apparatus used in the manufacturing or test process, or any machines for programming functionally equivalent representations of the design structures into any medium (e.g., a machine for programming a programmable gate array).

Design flow 2900 may vary depending on the type of representation being designed. For example, a design flow 2900 for building an application specific IC (ASIC) may differ from a design flow 2900 for designing a standard component or from a design flow 2900 for instantiating the design into a programmable array, for example, a programmable gate array (PGA) or a field programmable gate array (FPGA) offered by Altera® Inc. or Xilinx® Inc.

FIG. 29 illustrates multiple such design structures, including an input design structure 2920 that is preferably processed by a design process 2910. Design structure 2920 may be a logical simulation design structure generated and processed by design process 2910 to produce a logically equivalent functional representation of a hardware device. Design structure 2920 may also or alternatively comprise data and/or program instructions that, when processed by design process 2910, generate a functional representation of the physical structure of a hardware device. Whether representing functional and/or structural design features, design structure 2920 may be generated using electronic computer-aided design (ECAD) such as implemented by a core developer/designer. When encoded on a machine-readable data transmission, gate array, or storage medium, design structure 2920 may be accessed and processed by one or more hardware and/or software modules within design process 2910 to simulate or otherwise functionally represent an electronic component, circuit, electronic or logic module, apparatus, device, or system such as those shown herein. As such, design structure 2920 may comprise files or other data structures, including human and/or machine-readable source code, compiled structures, and computer-executable code structures, that when processed by a design or simulation data processing system functionally simulate or otherwise represent circuits or other levels of hardware logic design. Such data structures may include hardware description language (HDL) design entities or other data structures conforming to and/or compatible with lower-level HDL design languages such as Verilog and VHDL, and/or higher-level design languages such as C or C++.

Design process 2910 preferably employs and incorporates hardware and/or software modules for synthesizing, translating, or otherwise processing a design/simulation functional equivalent of the components, circuits, devices, or logic structures shown herein to generate a netlist 2980, which may contain design structures such as design structure 2920. Netlist 2980 may comprise, for example, compiled or otherwise processed data structures representing a list of wires, discrete components, logic gates, control circuits, I/O devices, models, etc. that describes the connections to other elements and circuits in an integrated circuit design. Netlist 2980 may be synthesized using an iterative process in which netlist 2980 is resynthesized one or more times depending on design specifications and parameters for the device. As with other design structure types described herein, netlist 2980 may be recorded on a machine-readable storage medium or programmed into a programmable gate array. The medium may be a non-volatile storage medium such as a magnetic or optical disk drive, a programmable gate array, a compact flash, or other flash memory. Additionally, or in the alternative, the medium may be a system or cache memory, or buffer space.

Design process 2910 may include hardware and software modules for processing a variety of input data structure types, including netlist 2980. Such data structure types may reside, for example, within library elements 2930 and include a set of commonly used elements, circuits, and devices, including models, layouts, and symbolic representations, for a given manufacturing technology (e.g., different technology nodes: 32nm, 45nm, 290nm, etc.). The data structure types may further include design specifications 2940, characterization data 2950, verification data 2960, design rules 2990, and test data files 2985, which may include input test patterns, output test results, and other testing information. Design process 2910 may further include, for example, standard mechanical design processes such as stress analysis, thermal analysis, mechanical event simulation, and process simulation for operations such as casting, molding, and die press forming. One of ordinary skill in the art of mechanical design can appreciate the extent of possible mechanical design tools and applications used in design process 2910 without deviating from the scope and spirit of the invention. Design process 2910 may also include modules for performing standard circuit design processes such as timing analysis, verification, design rule checking, and place and route operations.

Design process 2910 employs and incorporates logic and physical design tools, such as HDL compilers and simulation model build tools, to process design structure 2920 together with some or all of the depicted supporting data structures, along with any additional mechanical design or data (if applicable), to generate a second design structure 2990. Design structure 2990 resides on a storage medium or programmable gate array in a data format used for the exchange of data of mechanical devices and structures (e.g., information stored in an IGES, DXF, Parasolid XT, JT, DRG, or any other suitable format for storing or rendering such mechanical design structures). Similar to design structure 2920, design structure 2990 preferably comprises one or more files, data structures, or other computer-encoded data or instructions that reside on transmission or data storage media and that, when processed by an ECAD system, generate a logically or otherwise functionally equivalent form of one or more of the embodiments of the invention shown herein. In one embodiment, design structure 2990 may comprise a compiled, executable HDL simulation model that functionally simulates the devices shown herein.

Design structure 2990 may also employ a data format used for the exchange of layout data of integrated circuits and/or a symbolic data format (e.g., information stored in GDSII (GDS2), GL1, OASIS, map files, or any other suitable format for storing such design data structures). Design structure 2990 may comprise information such as, for example, symbolic data, map files, test data files, design content files, manufacturing data, layout parameters, wires, levels of metal, vias, shapes, data for routing through the manufacturing line, and any other data required by a manufacturer or other designer/developer to produce a device or structure as described above and shown herein. Design structure 2990 may then proceed to a stage 2995 where, for example, design structure 2990 proceeds to tape-out, is released to manufacturing, is released to a mask house, is sent to another design house, is sent back to the customer, etc.

As has been described, in at least one embodiment, a processor includes a register file and an execution unit. The execution unit includes a hash circuit including at least a state register, a state update circuit coupled to the state register, and a control circuit. Based on a hash instruction, the hash circuit receives from the register file and buffers within the state register a current state of a message being hashed. The state update circuit performs a state update function on contents of the state register, where performing the state update function includes performing a plurality of iterative rounds of processing on contents of the state register and returning a result of each of the plurality of iterative rounds of processing to the state register. Following completion of all of the plurality of iterative rounds of processing, the execution unit stores contents of the state register to the register file as an updated state of the message.

In at least some embodiments, the state update function includes a Secure Hash Algorithm 3 (SHA3) state permutation function, and the state update circuit performs twenty-four rounds of processing, each round of processing utilizing a respective one of twenty-four round indices as an input.
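For readers who want to see the twenty-four indexed rounds concretely, the following is a software sketch of the Keccak-f[1600] permutation as specified in FIPS 202, not a description of the patented circuit. The names `keccak_round`, `keccak_f1600`, and `sha3_256` are illustrative; the round index is used only to select the round constant for the final (iota) step, and the state is written back after every round, loosely mirroring the state register described above. The `sha3_256` wrapper exists only so the permutation can be checked against Python's `hashlib`.

```python
import hashlib

MASK64 = (1 << 64) - 1


def rol64(value, shift):
    """Rotate a 64-bit lane left by `shift` bits."""
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def round_constants():
    """Derive the 24 iota round constants from the 8-bit LFSR defined in FIPS 202."""
    constants, lfsr = [], 1
    for _ in range(24):
        rc = 0
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                rc ^= 1 << ((1 << j) - 1)
        constants.append(rc)
    return constants


RC = round_constants()


def keccak_round(lanes, round_index):
    """One round (theta, rho, pi, chi, iota); lanes[x][y] are 64-bit integers."""
    # theta: mix each column parity into neighboring columns
    c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
         for x in range(5)]
    d = [c[(x + 4) % 5] ^ rol64(c[(x + 1) % 5], 1) for x in range(5)]
    lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
    # rho and pi: rotate each lane and move it to its new position
    x, y = 1, 0
    current = lanes[x][y]
    for t in range(24):
        x, y = y, (2 * x + 3 * y) % 5
        current, lanes[x][y] = lanes[x][y], rol64(current, (t + 1) * (t + 2) // 2)
    # chi: nonlinear step along each row
    for y in range(5):
        row = [lanes[x][y] for x in range(5)]
        for x in range(5):
            lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
    # iota: the round index selects the constant XOR-ed into lane (0, 0)
    lanes[0][0] ^= RC[round_index]
    return lanes


def keccak_f1600(state):
    """Apply all 24 rounds to a 200-byte state and return the new 200-byte state."""
    lanes = [[int.from_bytes(state[8 * (x + 5 * y): 8 * (x + 5 * y) + 8], "little")
              for y in range(5)] for x in range(5)]
    for round_index in range(24):
        lanes = keccak_round(lanes, round_index)
    out = bytearray(200)
    for x in range(5):
        for y in range(5):
            out[8 * (x + 5 * y): 8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")
    return out


def sha3_256(message):
    """Minimal SHA3-256 sponge (rate 136 bytes) built on keccak_f1600."""
    rate = 136
    padded = bytearray(message) + b"\x06"
    padded += bytes(-len(padded) % rate)
    padded[-1] |= 0x80
    state = bytearray(200)
    for offset in range(0, len(padded), rate):
        for i in range(rate):
            state[i] ^= padded[offset + i]
        state = keccak_f1600(state)
    return bytes(state[:32])
```

A quick check is to compare `sha3_256(b"abc")` with `hashlib.sha3_256(b"abc").digest()`; the two should agree if the permutation is implemented correctly.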

In one embodiment, the execution unit executes the hash instruction in a squeeze phase of a Secure Hash Algorithm and Keccak (SHAKE) hash algorithm.
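The squeeze phase can be observed from software using the SHAKE128 implementation in Python's standard `hashlib` module. The short sketch below is an illustration only and says nothing about how the hardware sequences its instructions; the helper name `shake128_squeeze` is hypothetical. It relies on the extendable-output property of SHAKE: requesting more output continues the same output stream, and once the request exceeds the 168-byte rate of SHAKE128, producing the extra bytes requires further applications of the state permutation.

```python
import hashlib


def shake128_squeeze(message: bytes, out_len: int) -> bytes:
    """Absorb `message`, then squeeze `out_len` bytes of SHAKE128 output."""
    return hashlib.shake_128(message).digest(out_len)


short_output = shake128_squeeze(b"abc", 32)
# 400 bytes exceeds the 168-byte rate, so squeezing spans several permutations.
long_output = shake128_squeeze(b"abc", 400)
assert long_output[:32] == short_output
```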

In at least some embodiments, the state update function includes a Secure Hash Algorithm 2 (SHA2) block hash function. In at least some embodiments, the hash circuit further includes an adder configured to add contents of the state register to the current state and return a resulting sum to the register file. In some embodiments, the execution unit further includes a message block register for buffering a message block of the message and a message schedule round circuit coupled to the message block register. The message schedule round circuit performs a plurality of iterative rounds of processing on contents of the message block register and returns a result of each of the plurality of iterative rounds of processing to the message block register. In some embodiments, the state update circuit includes a data path for data words having a first data width, and the execution unit is configured, based on the hash instruction indicating a second data width narrower than the first data width, to expand data words of a message block of the message to the first data width prior to processing the data words of the message in the state update circuit.
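As a point of reference, the SHA-256 block hash function of FIPS 180-4 can be sketched in software with the same three ingredients named above: a round loop that updates the working state, a message schedule that updates a 16-word window in place each round, and a final addition of the working state to the incoming state. This is a sketch of the public algorithm under illustrative names (`sha256_block`, `window`, `sha256_single_block`), not a model of the circuit; the constants are derived from the cube and square roots of the first primes, as the standard defines them, rather than typed in by hand.

```python
import hashlib
from math import isqrt

MASK32 = 0xFFFFFFFF


def first_primes(count):
    primes, candidate = [], 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def icbrt(n):
    """Largest integer r with r**3 <= n (binary search)."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


PRIMES = first_primes(64)
# First 32 fractional bits of the cube roots (K) and square roots (H0) of the primes.
K = [icbrt(p << 96) & MASK32 for p in PRIMES]
H0 = [isqrt(p << 64) & MASK32 for p in PRIMES[:8]]


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK32


def sha256_block(state, block):
    """Hash one 64-byte block into `state` (eight 32-bit words); returns new state."""
    window = [int.from_bytes(block[4 * i: 4 * i + 4], "big") for i in range(16)]
    a, b, c, d, e, f, g, h = state
    for t in range(64):
        w_t = window[0]
        # State update round.
        s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + s1 + ch + K[t] + w_t) & MASK32
        s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (s0 + maj) & MASK32
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK32, a, b, c, (d + t1) & MASK32, e, f, g
        # Message schedule round: shift the window and append the next word.
        sig0 = rotr(window[1], 7) ^ rotr(window[1], 18) ^ (window[1] >> 3)
        sig1 = rotr(window[14], 17) ^ rotr(window[14], 19) ^ (window[14] >> 10)
        window = window[1:] + [(w_t + sig0 + window[9] + sig1) & MASK32]
    # Final add: working state plus the state the block started from.
    return [(s + v) & MASK32 for s, v in zip(state, (a, b, c, d, e, f, g, h))]


def sha256_single_block(message):
    """SHA-256 of a message short enough (<= 55 bytes) to fit in one padded block."""
    assert len(message) <= 55
    block = (message + b"\x80" + bytes(55 - len(message))
             + (8 * len(message)).to_bytes(8, "big"))
    return b"".join(word.to_bytes(4, "big") for word in sha256_block(H0, block))
```

Comparing `sha256_single_block(b"abc")` with `hashlib.sha256(b"abc").digest()` is a convenient sanity check on the round loop, the schedule window, and the final addition together.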

While various embodiments have been particularly shown and described, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the appended claims, and that these alternate implementations all fall within the scope of the appended claims. For example, although the invention has been described with specific reference to the SHA family of standards, those skilled in the art will appreciate that the disclosed inventions are also applicable to other hash algorithms (e.g., the general Keccak function, among others). In addition, although illustrative numbers of bits and bytes have been discussed herein for ease of understanding, it should be appreciated that the specific numbers of bits and bytes employed in hash algorithms can and do change over time, and that the principles of the disclosed inventions are applicable to cryptographic algorithms regardless of the specific numbers of bits and bytes in a given implementation.

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the Figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.

Further, although aspects have been described with respect to a computer system executing program code that directs the functions of the present invention, it should be understood that the present invention may alternatively be implemented as a program product including a computer-readable storage device storing program code that can be processed by a data processing system. The computer-readable storage device can include volatile or non-volatile memory, an optical or magnetic disk, or the like. However, as employed herein, a "storage device" is specifically defined to include only statutory articles of manufacture and to exclude signal media per se, transitory propagating signals per se, and energy per se.

The program product may include data and/or instructions that, when executed or otherwise processed on a data processing system, generate a logically, structurally, or otherwise functionally equivalent representation (including a simulation model) of hardware components, circuits, devices, or systems disclosed herein. Such data and/or instructions may include hardware description language (HDL) design entities or other data structures conforming to and/or compatible with lower-level HDL design languages such as Verilog and VHDL, and/or higher-level design languages such as C or C++. Furthermore, the data and/or instructions may also employ a data format used for the exchange of layout data of integrated circuits and/or a symbolic data format (e.g., information stored in GDSII (GDS2), GL1, OASIS, map files, or any other suitable format for storing such design data structures).

2800: block

2802: block

2804: block

2806: block

2808: block

2810: block

2812: block

2814: block

2816: block

2818: block

2820: block

2822: block

Claims (10)

一種處理器,其包含:一指令提取單元,其提取待執行之指令;一暫存器檔案,其包括用於儲存源及目的地運算元之複數個暫存器;及一執行單元,其用於執行一雜湊指令,其中該執行單元包括:一雜湊電路,其包括至少一狀態暫存器;一狀態更新電路,其耦接至該狀態暫存器;及一控制電路,其中該執行單元基於該雜湊指令經組態以執行以下操作:自該暫存器檔案接收並在該狀態暫存器內緩衝正被雜湊之一訊息之一當前狀態;在該狀態更新電路中對該狀態暫存器之內容執行一狀態更新函數,其中執行該狀態更新函數包括對該狀態暫存器之內容執行複數個反覆回合之處理,及將該複數個反覆回合處理中之每一者之一結果傳回至該狀態暫存器;及在完成所有該複數個反覆回合之處理之後,將該狀態暫存器之內容儲存至該暫存器檔案作為該訊息之一經更新狀態,其中:該狀態更新函數包含一安全雜湊演算法2(SHA2)塊雜湊函數;且該雜湊電路進一步包括一加法器,該加法器經組態以將該狀態暫存器之內容加至該當前狀態且將一所得總和傳回至該暫存器檔案。 A processor comprises: an instruction fetch unit, which fetches instructions to be executed; a register file, which includes a plurality of registers for storing source and destination operands; and an execution unit, which is used to execute a hash instruction, wherein the execution unit includes: a hash circuit, which includes at least one state register; a state update circuit, which is coupled to the state register; and a control circuit, wherein the execution unit is configured to perform the following operations based on the hash instruction: receiving a current state of a message being hashed from the register file and buffering it in the state register; performing a state update on the content of the state register in the state update circuit; A new function, wherein executing the state update function includes performing a plurality of iterative rounds of processing on the content of the state register, and returning a result of each of the plurality of iterative rounds of processing to the state register; and after completing all of the plurality of iterative rounds of processing, storing the content of the state register to the register file as an updated state of the message, wherein: the state update function includes a secure hash algorithm 2 (SHA2) block hash function; and the hash circuit further includes an adder, which is configured to add the content of the state register to the current state and return a resulting sum to the register file. 
如請求項1之處理器,其中:該執行單元進一步包括用於緩衝該訊息之一訊息塊之一訊息塊暫存器及耦接至該訊息塊暫存器之一訊息排程回合電路;且執行一狀態更新函數包括藉由該訊息排程回合電路對該訊息塊暫存器之內容執行複數個反覆回合之處理,且將該複數個反覆回合之處理中之每一者的一結果傳回至該訊息塊暫存器。 A processor as claimed in claim 1, wherein: the execution unit further includes a message block register for buffering a message block of the message and a message scheduling circuit coupled to the message block register; and executing a state update function includes executing a plurality of repeated rounds of processing on the content of the message block register by the message scheduling circuit, and returning a result of each of the plurality of repeated rounds of processing to the message block register. 如請求項1之處理器,其中:該狀態更新電路包括用於具有一第一資料寬度之資料字的一資料路徑;且該執行單元經組態以基於指示比該第一資料寬度窄之一第二資料寬度的該雜湊指令,在處理該狀態更新電路中之該訊息的該等資料字之前,將該訊息之一訊息塊的資料字擴展至該第一資料寬度。 A processor as claimed in claim 1, wherein: the state update circuit includes a data path for data words having a first data width; and the execution unit is configured to expand the data words of a message block of the message to the first data width before processing the data words of the message in the state update circuit based on the hash instruction indicating a second data width narrower than the first data width. 一種資料處理系統,其包含:多個處理器,包括如請求項1之處理器;一共用記憶體;及一系統互連件,其以通信方式耦接該共用記憶體及該多個處理器。 A data processing system comprising: a plurality of processors, including the processor of claim 1; a shared memory; and a system interconnect coupling the shared memory and the plurality of processors in a communication manner. 
一種在一處理器中進行資料處理之方法,該方法包含:藉由一指令提取單元提取待由該處理器執行之指令,其中該等指令包括一雜湊指令;及基於接收到該雜湊指令,該處理器之一執行單元執行該雜湊指令, 其中該執行單元包括:一雜湊電路,其包括至少一狀態暫存器;一狀態更新電路,其耦接至該狀態暫存器;及一控制電路,其中該執行包括:自一暫存器檔案接收並在該狀態暫存器內緩衝正被雜湊之一訊息之一當前狀態;在該狀態更新電路中對該狀態暫存器之內容執行一狀態更新函數,其中執行該狀態更新函數包括對該狀態暫存器之內容執行複數個反覆回合之處理,及將該複數個反覆回合處理中之每一者之一結果傳回至該狀態暫存器;及在完成所有該複數個反覆回合之處理之後,將該狀態暫存器之內容儲存至該暫存器檔案作為該訊息之一經更新狀態,其中:該狀態更新函數包含一安全雜湊演算法2(SHA2)塊雜湊函數;且該雜湊電路進一步包括一加法器,該加法器經組態以將該狀態暫存器之內容加至該當前狀態且將一所得總和傳回至該暫存器檔案。 A method for performing data processing in a processor, the method comprising: extracting instructions to be executed by the processor by an instruction extraction unit, wherein the instructions include a hash instruction; and based on receiving the hash instruction, an execution unit of the processor executes the hash instruction, wherein the execution unit includes: a hash circuit, which includes at least one state register; a state update circuit, which is coupled to the state register; and a control circuit, wherein the execution includes: receiving a current state of a message being hashed from a register file and buffering it in the state register; performing a state update on the content of the state register in the state update circuit. A new function, wherein executing the state update function includes performing a plurality of iterative rounds of processing on the content of the state register, and returning a result of each of the plurality of iterative rounds of processing to the state register; and after completing all of the plurality of iterative rounds of processing, storing the content of the state register to the register file as an updated state of the message, wherein: the state update function includes a secure hash algorithm 2 (SHA2) block hash function; and the hash circuit further includes an adder, which is configured to add the content of the state register to the current state and return a resulting sum to the register file. 
如請求項5之方法,其中:該執行單元進一步包括用於緩衝該訊息之一訊息塊之一訊息塊暫存器及耦接至該訊息塊暫存器之一訊息排程回合電路;且執行一狀態更新函數包括在該訊息排程回合電路中對該訊息塊暫存器之內容執行複數個反覆回合之處理,且將該複數個反覆回合之處理中之每一者的一結果傳回至該訊息塊暫存器。 The method of claim 5, wherein: the execution unit further includes a message block register for buffering a message block of the message and a message scheduling circuit coupled to the message block register; and executing a state update function includes executing a plurality of repeated rounds of processing on the content of the message block register in the message scheduling circuit, and returning a result of each of the plurality of repeated rounds of processing to the message block register. 如請求項5之方法,其中: 該狀態更新電路包括用於具有一第一資料寬度之資料字的一資料路徑;且該方法進一步包含:基於指示比該第一資料寬度窄之一第二資料寬度的該雜湊指令,在處理該狀態更新電路中之該訊息的該等資料字之前,將該訊息之一訊息塊的資料字擴展至該第一資料寬度。 The method of claim 5, wherein: The state update circuit includes a data path for data words having a first data width; and the method further includes: based on the hash instruction indicating a second data width narrower than the first data width, before processing the data words of the message in the state update circuit, expanding the data words of a message block of the message to the first data width. 
A design structure tangibly embodied in a machine-readable storage device for designing, manufacturing, or testing an integrated circuit, the design structure comprising: a processor, including: an instruction fetch unit that fetches instructions to be executed; a register file including a plurality of registers for storing source and destination operands; and an execution unit for executing a hash instruction, wherein the execution unit includes a hash circuit including at least a state register, a state update circuit coupled to the state register, and a control circuit, and wherein the execution unit is configured, based on the hash instruction, to perform the following operations: receiving from the register file and buffering within the state register a current state of a message being hashed; performing, in the state update circuit, a state update function on contents of the state register, wherein performing the state update function includes performing a plurality of iterative rounds of processing on the contents of the state register and returning a result of each of the plurality of iterative rounds of processing to the state register; and following completion of all of the plurality of iterative rounds of processing, storing the contents of the state register to the register file as an updated state of the message, wherein: the state update function comprises a Secure Hash Algorithm 2 (SHA2) block hash function; and the hash circuit further includes an adder configured to add the contents of the state register to the current state and return a resulting sum to the register file.

The design structure of claim 8, wherein: the execution unit further includes a message block register for buffering a message block of the message and a message schedule round circuit coupled to the message block register; and performing the state update function includes performing, by the message schedule round circuit, a plurality of iterative rounds of processing on contents of the message block register and returning a result of each of the plurality of iterative rounds of processing to the message block register.

The design structure of claim 8, wherein: the state update circuit includes a data path for data words having a first data width; and the execution unit is configured, based on the hash instruction indicating a second data width narrower than the first data width, to expand data words of a message block of the message to the first data width prior to processing the data words of the message in the state update circuit.
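The data flow recited in these claims can be illustrated in software. The following is a minimal sketch, not the claimed hardware: it assumes SHA-256 (32-bit words, 64 rounds) as the SHA2 variant, and every name in it (`compress`, `reg`, `w`, `sha256`) is hypothetical and chosen only for illustration. The list `reg` stands in for the state register, `w` for the message block register, and the final line of `compress` for the adder that adds the state register contents to the current state. The width-expansion feature of the last claim is not modeled.

```python
import hashlib
import struct
from math import isqrt

MASK = 0xFFFFFFFF  # 32-bit words: assumes the SHA-256 member of the SHA2 family


def _first_primes(n):
    """Return the first n primes by trial division."""
    primes, c = [], 2
    while len(primes) < n:
        if all(c % p for p in primes):
            primes.append(c)
        c += 1
    return primes


def _icbrt(n):
    """Integer cube root by binary search (avoids floating-point rounding)."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


_P = _first_primes(64)
# Standard SHA-256 constants: fractional bits of square/cube roots of primes.
H0 = [isqrt(p << 64) & MASK for p in _P[:8]]
K = [_icbrt(p << 96) & MASK for p in _P]


def _rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def compress(current_state, block):
    """One block hash: load state, iterate rounds, then add back the input."""
    w = list(struct.unpack(">16I", block))  # models the message block register
    reg = list(current_state)               # models the state register
    for t in range(64):
        a, b, c, d, e, f, g, h = reg
        t1 = (h + (_rotr(e, 6) ^ _rotr(e, 11) ^ _rotr(e, 25))
              + ((e & f) ^ ((e ^ MASK) & g)) + K[t] + w[0]) & MASK
        t2 = ((_rotr(a, 2) ^ _rotr(a, 13) ^ _rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        # The result of each state update round is written back to the register.
        reg = [(t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g]
        # Message schedule round: shift the 16-word window, append a new word.
        s0 = _rotr(w[1], 7) ^ _rotr(w[1], 18) ^ (w[1] >> 3)
        s1 = _rotr(w[14], 17) ^ _rotr(w[14], 19) ^ (w[14] >> 10)
        w = w[1:] + [(w[0] + s0 + w[9] + s1) & MASK]
    # Models the adder: state register contents plus the incoming current state.
    return [(x + y) & MASK for x, y in zip(current_state, reg)]


def sha256(message):
    """Pad the message and chain compress() over its 64-byte blocks."""
    padded = (message + b"\x80" + b"\x00" * ((55 - len(message)) % 64)
              + struct.pack(">Q", 8 * len(message)))
    state = H0
    for i in range(0, len(padded), 64):
        state = compress(state, padded[i:i + 64])
    return struct.pack(">8I", *state)


if __name__ == "__main__":
    data = b"abc" * 100
    assert sha256(data) == hashlib.sha256(data).digest()
```

In this sketch one call to `compress` corresponds to one execution of the hash instruction on a single message block; padding and block chaining are left to the caller, here the helper `sha256`, which is checked against Python's `hashlib`.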
TW112124023A 2022-07-05 2023-06-28 Processor, data processing system, method and design structure for performing secure hash algorithms TWI861966B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US17/857,627 2022-07-05
US17/857,627 US20240015004A1 (en) 2022-07-05 2022-07-05 Hardware-based key generation and storage for cryptographic function
US17/884,704 US12411996B2 (en) 2022-08-10 2022-08-10 Hardware-based implementation of secure hash algorithms
US17/884,704 2022-08-10

Publications (2)

Publication Number Publication Date
TW202409871A TW202409871A (en) 2024-03-01
TWI861966B TWI861966B (en) 2024-11-11

Family

ID=91228239

Family Applications (1)

Application Number Title Priority Date Filing Date
TW112124023A TWI861966B (en) 2022-07-05 2023-06-28 Processor, data processing system, method and design structure for performing secure hash algorithms

Country Status (1)

Country Link
TW (1) TWI861966B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW201519623A (en) * 2013-08-08 2015-05-16 Intel Corp Instruction and logic to provide a secure cipher hash round functionality
TW201717573A (en) * 2015-11-05 2017-05-16 英特爾公司 Hardware accelerator for cryptographic hash operations
US20190319782A1 (en) * 2019-06-28 2019-10-17 Intel Corporation Combined sha2 and sha3 based xmss hardware accelerator
CN112152785A (en) * 2019-06-28 2020-12-29 英特尔公司 XMSS hardware accelerator based on SHA2 and SHA3 combination

Also Published As

Publication number Publication date
TW202409871A (en) 2024-03-01

Similar Documents

Publication Publication Date Title
US12411996B2 (en) Hardware-based implementation of secure hash algorithms
US10928847B2 (en) Apparatuses and methods for frequency scaling a message scheduler data path of a hashing accelerator
JP6449349B2 (en) Instruction set for SHA1 round processing in multiple 128-bit data paths
CN106575215B (en) Systems, devices, methods, processors, media and electronic devices for processing instructions
CN103975302B (en) Matrix multiply accumulate instruction
CN105190535B (en) Perform the instruction that pseudo random number produces operation
TWI518589B (en) Instructions to perform groestl hashing
CN110347634A (en) Hardware accelerator and method for high-performance authenticated encryption
CN105204820B (en) Instructions and logic for providing generic GF(256) SIMD cryptographic arithmetic functions
CN105190534A (en) Instruction for performing pseudorandom number seed operation
EP4569404A1 (en) Hardware-based galois multiplication
EP3507933A1 (en) Hybrid aes-sms4 hardware accelerator
JP2025528781A (en) Data processing method using hash function, processor
CN104011709B (en) The instruction of JH keyed hash is performed in 256 bit datapaths
TW202407540A (en) Hardware-based key generation and storage for cryptographic function
TWI861966B (en) Processor, data processing system, method and design structure for performing secure hash algorithms
TWI900865B (en) A processor and a method for processing data in the processor
TWI857674B (en) Hardware-based galois multiplication
CN104012031B (en) Instruction for performing JH keyed hash