
CN1592189A - Microprocessor and method with optimized block cipher function - Google Patents

Publication number
CN1592189A
Authority
CN
China
Prior art keywords
crypto
block
cryptographic
execution
input characters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2004100831177A
Other languages
Chinese (zh)
Other versions
CN100527664C (en)
Inventor
汤玛斯A·克里斯宾
G·葛兰亨利
泰瑞帕德斯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Via Technologies Inc
Original Assignee
Via Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Via Technologies Inc
Publication of CN1592189A
Application granted
Publication of CN100527664C
Anticipated expiration
Expired - Lifetime

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Storage Device Security (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The present invention provides an apparatus and method for performing cryptographic operations on a plurality of input data blocks in a processor. In one embodiment, an apparatus for performing cryptographic operations includes a cryptographic instruction and translation logic. The cryptographic instruction is received by a computing device as part of an instruction stream and prescribes one of the cryptographic operations. The translation logic translates the cryptographic instruction into microinstructions that direct the computing device to load a second input text block and perform the cryptographic operation on it before directing the computing device to store the output text block corresponding to a first input text block. The output text block can therefore be stored while the cryptographic operation is being performed on the second input text block.

Figure 200410083117

Description

Microprocessor and method with optimized block cipher function

Technical field

The present invention relates to the field of microelectronics, and more particularly to an apparatus and method for performing cryptographic operations in a computing device using an optimized sequence of microinstructions, thereby increasing the throughput of the computing device.

Background art

Early computer systems operated independently of other computer systems, in the sense that the input data required by an application program running on such a system was either stored on that system or supplied by the application programmer at run time. The output data produced by executing the application program was generally a paper printout or a file written to magnetic tape, disk, or another type of storage device belonging to the computer system. The output file could then serve as an input file for an application program subsequently executed on the same computer system or, if the output data had been stored as a file on a removable or transportable storage device, it could be provided to application programs on a different but compatible computer system. Even on these early systems, the need to protect sensitive information was recognized and, among other information security measures, cryptographic application programs were developed and used to prevent unauthorized disclosure of sensitive information. These cryptographic programs typically scrambled and unscrambled the output data that was stored as files on storage devices.

Not many years later, users began to discover the benefits of networking computers together to provide shared access to information. Network architectures, operating systems, and data transmission protocols consequently evolved to the point where the ability to access shared data was not merely supported but prominently featured. For example, the user of a computer workstation can access files on a different workstation or on a network file server, use the Internet to obtain news and other information, send and receive electronic messages (such as email) to and from hundreds of other computers, connect to a vendor's computer system and provide credit card or banking information to purchase products, or use a wireless network in a restaurant, airport, or other public place to perform any of these activities. The need to protect sensitive data and transmissions from unauthorized disclosure has therefore grown dramatically, and the number of situations in which a user is obliged to protect his or her sensitive data has increased substantially. News headlines regularly push computer information security issues such as spam, hacking, identity theft, reverse engineering, spoofing, and credit card fraud to the forefront of public concern. Because the motives behind these invasions of privacy range from innocent mistakes to premeditated cyber attacks, the responsible agencies have responded with new laws, stringent enforcement, and public education programs. None of these responses, however, has effectively stemmed the tide of compromised computer information. What was once the exclusive concern of governments, financial institutions, the military, and spies has now become a significant issue for ordinary people who read their email or access their checking account transactions from a home computer. On the business front, one skilled in the art will appreciate that corporations from small to large currently devote a considerable portion of their resources to protecting proprietary information.

The field of information security that provides techniques and means for encoding data so that it can be decoded only by specified individuals is known as cryptography. When applied specifically to protecting information stored on or transmitted between computers, cryptography is most often used to transform sensitive information (known as "plaintext" or "cleartext") into an unintelligible form (known as "ciphertext"). The process of converting plaintext into ciphertext is called encryption (also enciphering or ciphering), and the reverse process of converting ciphertext back into plaintext is called decryption (also deciphering or inverse ciphering).

Within the field of cryptography, several procedures and protocols have been developed that allow users to perform cryptographic operations without great knowledge or effort, and to transmit or otherwise provide their information products in encrypted form to other users. Along with the encrypted information, the sender typically provides the recipient with a "cryptographic key" that enables the recipient to decipher the encrypted information and thus recover or otherwise gain access to the unencrypted original. One skilled in the art will appreciate that these procedures and protocols generally take the form of password protection, mathematical algorithms, and application programs specifically designed to encrypt and decrypt sensitive information.

Several classes of algorithms are currently used to encrypt and decrypt data. Algorithms in one such class (public key cryptographic algorithms, of which the RSA algorithm is an instance) employ two cryptographic keys, a public key and a private key, to encrypt or decrypt data. According to some public key algorithms, the sender uses the recipient's public key to encrypt data for transmission to the recipient. Because a mathematical relationship exists between a user's public and private keys, the recipient must use his or her private key to decrypt the transmission and recover the data. Although this class of cryptographic algorithms is in widespread use today, its encryption and decryption operations are exceedingly slow even on small amounts of data. A second class of algorithms, known as symmetric key algorithms, provides a commensurate level of data security and can be executed much faster. These algorithms are called symmetric key algorithms because they use a single cryptographic key both to encrypt and to decrypt information. In the public sector there are currently three prevailing single-key cryptographic algorithms: the Data Encryption Standard (DES), Triple DES, and the Advanced Encryption Standard (AES). Because of the strength with which these algorithms protect sensitive data, they are currently used by U.S. government agencies, and those skilled in the art anticipate that one or more of them will become the standard for commercial and private transactions in the near future. According to all of these symmetric key algorithms, plaintext and ciphertext are divided into blocks of a specified size for encryption and decryption. For example, AES performs cryptographic operations on blocks 128 bits in size and uses cryptographic key sizes of 128, 192, and 256 bits. Other symmetric key algorithms, such as the Rijndael cipher, also allow 192-bit and 256-bit data blocks. Accordingly, for a block encryption operation, a 1024-bit plaintext message is encrypted as eight 128-bit blocks.
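The block arithmetic in the last sentence can be checked with a short Python sketch. It only splits a message into fixed-size blocks and performs no encryption; the function name `split_into_blocks` is a hypothetical helper introduced here for illustration, and the message is assumed to be an exact multiple of the block size.

```python
from typing import List


def split_into_blocks(message: bytes, block_bits: int = 128) -> List[bytes]:
    """Split a message into consecutive blocks of block_bits bits each.

    Hypothetical helper for illustration only. Padding is out of scope,
    so the message length must be an exact multiple of the block size.
    """
    block_bytes = block_bits // 8
    if len(message) % block_bytes != 0:
        raise ValueError("message length must be a multiple of the block size")
    return [message[i:i + block_bytes]
            for i in range(0, len(message), block_bytes)]


# A 1024-bit message is 128 bytes long.
message = bytes(range(128))
blocks = split_into_blocks(message)
print(len(blocks))  # 8
```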

All symmetric key algorithms use the same type of sub-operations to encrypt a block of plaintext. According to many of the more commonly used symmetric key algorithms, an initial cryptographic key is expanded into a plurality of keys (a "key schedule"), each of which is used when a corresponding cryptographic "round" of sub-operations is performed on the plaintext block. For example, the first key of the key schedule is used to perform a first cryptographic round of sub-operations on the plaintext block. The result of the first round is the input to a second round, which uses the second key of the key schedule to produce a second result, and a specified number of subsequent rounds are performed to yield a final round result, which is the ciphertext itself. According to the AES algorithm, the sub-operations within each round are referred to in the literature as SubBytes (or S-box), ShiftRows, MixColumns, and AddRoundKey. Decryption of a block of ciphertext is accomplished similarly, except that the ciphertext is the input to the inverse cipher, inverse sub-operations (for example, Inverse MixColumns and Inverse ShiftRows) are performed during each round, and the final result of the rounds is a block of plaintext.
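The round structure described above can be sketched in Python under heavy simplifying assumptions. The key schedule and the round function below are toy stand-ins chosen only because they are easy to invert; they are not the AES key expansion or the AES sub-operations and provide no security. All function names are hypothetical. The round count of 10 matches what FIPS-197 specifies for 128-bit keys.

```python
from typing import List

ROUNDS = 10        # AES with a 128-bit key uses 10 rounds; the rounds below are NOT AES
BLOCK_BYTES = 16   # 128-bit block


def expand_key(key: bytes, rounds: int = ROUNDS) -> List[bytes]:
    """Toy key schedule: derive one round key per round from the initial key."""
    if len(key) != BLOCK_BYTES:
        raise ValueError("this sketch assumes a 16-byte key")
    return [bytes((b + r) % 256 for b in key) for r in range(rounds)]


def round_forward(state: bytes, round_key: bytes) -> bytes:
    """Toy round: mix in the round key with XOR, then rotate left by one byte."""
    mixed = bytes(s ^ k for s, k in zip(state, round_key))
    return mixed[1:] + mixed[:1]


def round_inverse(state: bytes, round_key: bytes) -> bytes:
    """Inverse toy round: rotate right by one byte, then remove the round key."""
    unrotated = state[-1:] + state[:-1]
    return bytes(s ^ k for s, k in zip(unrotated, round_key))


def encrypt_block(block: bytes, key: bytes) -> bytes:
    """Feed each round's result into the next round, one round key per round."""
    if len(block) != BLOCK_BYTES:
        raise ValueError("this sketch assumes a 16-byte block")
    state = block
    for round_key in expand_key(key):
        state = round_forward(state, round_key)
    return state


def decrypt_block(block: bytes, key: bytes) -> bytes:
    """Apply the inverse rounds with the key schedule in reverse order."""
    if len(block) != BLOCK_BYTES:
        raise ValueError("this sketch assumes a 16-byte block")
    state = block
    for round_key in reversed(expand_key(key)):
        state = round_inverse(state, round_key)
    return state
```

Decryption walks the same key schedule backwards and applies the inverse of each round, which mirrors the description of inverse sub-operations in the paragraph above.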

DES and Triple DES use different specific sub-operations, but these are analogous to the AES sub-operations because they are employed in a similar fashion to transform a block of plaintext into a block of ciphertext.

To perform cryptographic operations on multiple successive blocks of text, all symmetric key algorithms employ the same types of modes. These modes include electronic codebook (ECB) mode, cipher block chaining (CBC) mode, cipher feedback (CFB) mode, and output feedback (OFB) mode. Some of these modes use an additional initialization vector during performance of the sub-operations, and some use the ciphertext output of a first set of cryptographic rounds performed on a first block of plaintext as an additional input to a second set of cryptographic rounds performed on a second block of plaintext. A detailed discussion of each cryptographic algorithm and sub-operation employed by present-day symmetric key cryptographic algorithms is beyond the scope of this application. For specific implementation standards, the reader is directed to Federal Information Processing Standards Publication 46-3 (FIPS-46-3), dated October 25, 1999, for a detailed discussion of DES and Triple DES, and to Federal Information Processing Standards Publication 197 (FIPS-197), dated November 26, 2001, for a detailed discussion of AES. Both standards are issued and maintained by the National Institute of Standards and Technology (NIST) and are incorporated herein by reference for all intents and purposes. In addition to these standards, tutorials, white papers, toolkits, and resource articles can be obtained from NIST's Computer Security Resource Center (CSRC) over the Internet at http://csrc.nist.gov/.
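As a rough illustration of how two of these modes differ, the following Python sketch contrasts ECB with CBC. The per-block cipher is a toy byte-wise addition chosen so the example stays short; it is an assumption made for illustration, is not DES or AES, and offers no security. All function names are hypothetical.

```python
from typing import List


def toy_encrypt(block: bytes, key: bytes) -> bytes:
    """Toy stand-in for a block cipher: byte-wise addition modulo 256."""
    return bytes((b + k) % 256 for b, k in zip(block, key))


def toy_decrypt(block: bytes, key: bytes) -> bytes:
    """Inverse of toy_encrypt: byte-wise subtraction modulo 256."""
    return bytes((b - k) % 256 for b, k in zip(block, key))


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def ecb_encrypt(blocks: List[bytes], key: bytes) -> List[bytes]:
    """ECB: every block is encrypted independently of the others."""
    return [toy_encrypt(block, key) for block in blocks]


def cbc_encrypt(blocks: List[bytes], key: bytes, iv: bytes) -> List[bytes]:
    """CBC: each plaintext block is XORed with the previous ciphertext block
    (or with the initialization vector for the first block) before encryption."""
    out, previous = [], iv
    for block in blocks:
        cipher_block = toy_encrypt(xor(block, previous), key)
        out.append(cipher_block)
        previous = cipher_block
    return out


def cbc_decrypt(blocks: List[bytes], key: bytes, iv: bytes) -> List[bytes]:
    """Inverse of cbc_encrypt."""
    out, previous = [], iv
    for cipher_block in blocks:
        out.append(xor(toy_decrypt(cipher_block, key), previous))
        previous = cipher_block
    return out


key = b"K" * 16
iv = bytes(range(16))
plaintext = [b"A" * 16, b"A" * 16]   # two identical plaintext blocks

ecb = ecb_encrypt(plaintext, key)
cbc = cbc_encrypt(plaintext, key, iv)
print(ecb[0] == ecb[1])  # True: ECB maps equal plaintext blocks to equal ciphertext blocks
print(cbc[0] == cbc[1])  # False: CBC chaining makes the second block depend on the first
```

In CBC encryption the second block cannot be started until the first ciphertext block exists, whereas ECB blocks are independent of one another.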

One skilled in the art will appreciate that numerous application programs capable of performing cryptographic operations (for example, encryption and decryption) are available for execution on a computer system. In fact, some operating systems (for example, Microsoft Windows XP and Linux) provide direct encryption/decryption services in the form of cryptographic primitives, cryptographic application program interfaces, and the like. The present inventors, however, have observed that present-day computer cryptography techniques are deficient in several respects. These deficiencies are highlighted and discussed below with reference to FIG. 1.

FIG. 1 is a block diagram 100 illustrating present-day computer cryptography applications. The block diagram 100 depicts a first computer workstation 101 connected to a local area network 105. Also connected to the local area network 105 are a second computer workstation 102, a network file storage device 106, a first router 107 or other form of interface to a wide area network 110 (for example, the Internet), and a wireless network router 108, such as one compliant with IEEE 802.11. A notebook computer 104 interfaces to the wireless router 108 over a wireless network 109. At another point on the wide area network 110, a second router 111 provides an interface for a third computer workstation 103.

As outlined above, a present-day user is confronted with computer information security issues many times during a work session. For example, under the control of a present-day multi-tasking operating system, the user of workstation 101 can be performing several tasks simultaneously, each of which requires cryptographic operations. The user of workstation 101 must run an encryption/decryption application 112 (either provided as part of the operating system or invoked by the operating system) to store a local file on the network file storage device 106. Concurrently with the file storage, the user can transmit an encrypted message to a second user at workstation 102, which also requires executing an instance of the encryption/decryption application 112. The encrypted message may be real-time (for example, an instant message) or non-real-time (for example, email). In addition, the user can be accessing or providing his or her financial data (for example, credit card numbers and financial transactions) or other forms of sensitive data over the wide area network 110 from workstation 103. Workstation 103 could also represent a home office or other remote computer 103 that the user of workstation 101 employs when out of the office to access any of the shared resources 101, 102, 106, 107, 108, and 109 on the local area network 105. Each of these activities requires that a corresponding instance of the encryption/decryption application 112 be invoked. Furthermore, wireless networks 109 are now routinely provided in coffee shops, airports, schools, and other public venues, so the user of the notebook computer 104 needs to encrypt and decrypt not only the messages exchanged with other users but also all communications over the wireless network 109 to the wireless router 108.

One skilled in the art will therefore appreciate that every activity requiring cryptographic operations at a given workstation 101-104 carries a corresponding requirement to invoke an instance of the encryption/decryption application 112. Hence, a computer 101-104 could in the near future be performing hundreds of concurrent cryptographic operations.

The present inventors have noted several limitations of the above approach, in which a computer system 101-104 performs cryptographic operations by invoking one or more instances of the encryption/decryption application 112. For example, performing a prescribed function in programmed software is slower than performing the same function in hardware. Each time the encryption/decryption application 112 is required, the task currently executing on the computer 101-104 must be suspended, and the parameters of the cryptographic operation (for example, plaintext, ciphertext, mode, and key) must be passed through the operating system to the instance of the encryption/decryption application 112 that is invoked to accomplish the operation. And because cryptographic algorithms necessarily involve many rounds of sub-operations on a given block of data, execution of the encryption/decryption application 112 involves the execution of numerous computer instructions, to the extent that overall system processing speed is adversely affected. One skilled in the art will appreciate that sending a small encrypted email message in Microsoft Outlook can take five times as long as sending an unencrypted email message.

In addition, current techniques are limited by the delays associated with operating system intervention. Most application programs do not provide integral key generation or encryption/decryption components; they rely on components of the operating system or on plug-in applications to accomplish these tasks. Operating systems, moreover, are diverted by interrupts and by the demands of other currently executing application programs.

Furthermore, the present inventors have noted that performing cryptographic operations on a present-day computer system 101-104 is closely analogous to performing floating point mathematical operations before microprocessors had dedicated floating point units. Early floating point operations were performed in software and therefore executed very slowly. Like those floating point operations, cryptographic operations performed in software are extremely slow. As floating point technology evolved, floating point instructions were provided for execution on floating point coprocessors. These coprocessors executed floating point operations much faster than software implementations, but they added cost to a system. Likewise, cryptographic coprocessors exist today in the form of add-on boards or external devices that interface to a host processor through parallel ports or other interface buses (for example, USB). These coprocessors accomplish cryptographic operations much faster than pure software implementations. However, cryptographic coprocessors add cost to a system configuration, require extra power, and decrease the overall reliability of a system. Cryptographic coprocessor implementations are also vulnerable to snooping because the data channel is not on the same die as the host microprocessor.

The present inventors therefore recognize a need for cryptographic hardware within a present-day microprocessor, such that an application program requiring a cryptographic operation can direct the microprocessor to perform it by means of a single, atomic cryptographic instruction. The present inventors also recognize that this capability should be provided in a way that limits the requirements for operating system intervention and management. It is also desirable that the cryptographic instruction be usable at an application program's privilege level, that the cryptographic hardware comport with the prevailing architectures of present-day microprocessors, and that the cryptographic hardware and associated cryptographic instruction support compatibility with legacy operating systems and applications. It is further desirable to provide an apparatus and method for performing cryptographic operations that resists unauthorized observation; that supports, and is programmable with respect to, multiple cryptographic algorithms; that supports verification and testing of the particular cryptographic algorithm embodied in it; that allows for user-provided keys as well as self-generated keys; that supports multiple data block sizes and key sizes; that provides efficient pipelining of multiple data blocks; and that provides programmable block encryption/decryption modes such as ECB, CBC, CFB, and OFB.

Summary of the invention

The present invention is directed to solving the above problems and disadvantages of the prior art. The present invention provides a superior technique for performing cryptographic operations within a microprocessor. In one embodiment, an apparatus for performing cryptographic operations is provided. The apparatus includes a cryptographic instruction and translation logic. The cryptographic instruction is received by a computing device as part of an instruction stream executing on the computing device, and it prescribes one of the cryptographic operations. The translation logic is operatively coupled to the cryptographic instruction and translates it into microinstructions that direct the computing device to load a second input text block and perform the cryptographic operation on the second input text block before directing the computing device to store an output text block corresponding to a first input text block. The output text block can therefore be stored while the cryptographic operation is being performed on the second input text block.

The present invention also provides an apparatus for performing cryptographic operations that includes translation logic configured to translate a cryptographic instruction into a sequence of microinstructions. The sequence of microinstructions includes a first microinstruction and a second microinstruction. The first microinstruction directs that a second input text block be loaded and that a cryptographic operation be performed on the second input text block. The second microinstruction directs that a first output text block be stored, the first output text block corresponding to a first input text block according to the cryptographic operation performed. The translation logic issues the first microinstruction before it issues the second microinstruction.

The present invention further provides a method for performing cryptographic operations in a device. The method includes translating a cryptographic instruction into a first microinstruction and a second microinstruction, where the cryptographic instruction prescribes performance of one of the cryptographic operations. The first microinstruction directs the device to load a second input text block and perform the cryptographic operation on the second input text block. The second microinstruction directs the device to store a first output text block, the first output text block corresponding to a first input text block according to the cryptographic operation performed. The method also includes issuing the first microinstruction to a cryptography unit before issuing the second microinstruction to the cryptography unit, so that the first output text block can be stored while the cryptographic operation is being performed on the second input text block.
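The issue order described in this summary can be pictured with a small Python sketch that only generates an ordered list of labels; it does not model hardware. The labels XLOAD and XSTOR are borrowed from the microinstruction names that appear in the figure descriptions. Treating XLOAD as "load an input block and start the cryptographic operation on it" and XSTOR as "store an output block" is an assumption made for illustration, and the function name is hypothetical.

```python
from typing import List, Tuple


def microinstruction_sequence(num_blocks: int) -> List[Tuple[str, int]]:
    """Return an issue order in which the load for block i+1 precedes the store of block i.

    Each entry is (label, block_number). "XLOAD" is assumed to mean "load the
    input block and begin the cryptographic operation on it"; "XSTOR" is
    assumed to mean "store the output block for that input block".
    """
    if num_blocks < 1:
        raise ValueError("at least one block is required")
    sequence = [("XLOAD", 1)]
    for block in range(2, num_blocks + 1):
        sequence.append(("XLOAD", block))       # start work on the next block first
        sequence.append(("XSTOR", block - 1))   # then store the previous block's output
    sequence.append(("XSTOR", num_blocks))      # the final output has nothing to overlap with
    return sequence


for label, block in microinstruction_sequence(3):
    print(label, block)
```

Under these assumptions, the store of each output block is issued only after the next block's load, so the store can proceed while the cryptography unit is busy with the following block.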

Brief description of the drawings

FIG. 1 is a block diagram illustrating present-day cryptography applications;

FIG. 2 is a block diagram depicting techniques for performing cryptographic operations;

FIG. 3 is a block diagram of a microprocessor apparatus according to the present invention for performing cryptographic operations;

FIG. 4 is a block diagram of one embodiment of an atomic cryptographic instruction according to the present invention;

FIG. 5 is a table of exemplary block cipher mode field values for the atomic cryptographic instruction of FIG. 4;

FIG. 6 is a block diagram of a cryptography unit within an x86-compatible microprocessor according to the present invention;

FIG. 7 is a block diagram of the fields of an exemplary microinstruction for directing cryptographic sub-operations within the microprocessor of FIG. 6;

FIG. 8 is a table of register field values for an XLOAD microinstruction according to the format of FIG. 7;

FIG. 9 is a table of register field values for an XSTOR microinstruction according to the format of FIG. 7;

FIG. 10 is a block diagram of an exemplary control word format for prescribing the parameters of a cryptographic operation according to the present invention;

FIG. 11 is a block diagram of a preferred embodiment of a cryptography unit according to the present invention;

FIG. 12 is a block diagram of an embodiment of block cipher logic according to the present invention for performing cryptographic operations in accordance with the Advanced Encryption Standard (AES) algorithm;

FIG. 13 is a table showing one embodiment of a microinstruction flow according to the present invention for a single-stage embodiment of the cryptography unit;

FIG. 14 is a table showing another embodiment of a microinstruction flow according to the present invention for a single-stage embodiment of the cryptography unit;

FIG. 15 is a table showing one embodiment of a microinstruction flow according to the present invention for a two-stage embodiment of the cryptography unit; and

FIG. 16 is a table showing another embodiment of a microinstruction flow according to the present invention for a two-stage embodiment of the cryptography unit.

Detailed Description

Some embodiments of the present invention are described in detail below. The invention may, however, be practiced widely in embodiments other than those described, and its scope is not limited by them; it is defined by the claims that follow. In addition, to give a clearer description and make the invention easier to understand, parts of the drawings are not drawn to relative scale: some dimensions are exaggerated with respect to others, and irrelevant details are not fully drawn, in order to keep the drawings simple.

In view of the above background on cryptographic operations and the techniques that present-day computer systems use to encrypt and decrypt data, those techniques and their limitations are discussed further with reference to FIG. 2. The present invention is then discussed with reference to FIGS. 3 through 16. The present invention provides an apparatus and method for performing cryptographic operations in a present-day computer system that exhibit superior performance characteristics over prevailing mechanisms and that also satisfy the goals mentioned above: limited operating system intervention, compatibility with legacy architectures, programmability of algorithm and mode, efficient pipelined processing of multiple data blocks, resistance to hacking, and testability.

Referring to FIG. 2, a block diagram 200 depicts techniques for performing cryptographic operations in a present-day computer system. The block diagram 200 includes a microprocessor 201 that fetches instructions and accesses application-related data from an area of system memory called application memory 203. Program control and access to data within the application memory 203 are generally managed by operating system software 202, which resides in a protected area of system memory. As noted above, when an executing application (for example, an e-mail program or a file storage program) requires that a cryptographic operation be performed, it must accomplish the operation by directing the microprocessor 201 to execute a substantial number of instructions. These instructions may be subroutines of the executing application itself, plug-in applications linked to it, or services provided by the operating system 202. Regardless of their association, one skilled in the art will appreciate that the instructions reside in some designated or allocated area of memory.

For purposes of discussion, these memory areas are shown within application memory 203. They include a cryptographic key generation application 204, which generates or accepts a cryptographic key and expands it into a key schedule 205 for use in cryptographic round operations. For a multi-block encryption operation, a block encryption application 206 is invoked. The block encryption application 206 executes instructions that access plaintext blocks 210, the key schedule 205, and cryptographic parameters 209 that further specify the particular operation, such as the mode and the location of the key schedule. When the specified mode requires it, the encryption application 206 also accesses an initialization vector 208. The encryption application 206 executes its instructions to generate corresponding ciphertext blocks 211. Similarly, a block decryption application 207 is invoked to perform block decryption operations. It executes instructions that access the ciphertext blocks 211, the key schedule 205, the cryptographic parameters 209 and, when the specified mode requires it, the initialization vector 208, and it generates the corresponding plaintext blocks 210.

It is noteworthy that a substantial number of instructions must be executed to generate cryptographic keys and to encrypt or decrypt blocks of text. The FIPS specifications mentioned above contain many examples of pseudocode whose implementation requires a substantial number of instructions. One skilled in the art will therefore appreciate that even a simple encryption operation requires hundreds of instructions, each of which must be executed by the microprocessor 201 to complete the requested cryptographic operation. Moreover, executing these instructions is generally extraneous to the primary purpose of the running application (for example, file management, instant messaging, e-mail, remote file access, or credit card transactions), so the user perceives the running application as performing poorly. For stand-alone or plug-in encryption and decryption applications 206 and 207, invocation and management of the applications are also subject to the other demands on the operating system 202, such as servicing interrupts, exceptions, and similar events, which further aggravates the problem. In addition, for every concurrent cryptographic operation required of the computer system, a separate instance of the applications 204, 206, and 207 must be allocated in memory 203, and the number of concurrent cryptographic operations that the microprocessor 201 is required to perform is expected to grow over time.

The present inventors have noted the problems and limitations of current cryptographic techniques in computer systems and have recognized the need for an apparatus and method for performing cryptographic operations within a microprocessor. Accordingly, the present invention provides a microprocessor and an associated method that perform cryptographic operations through a cryptographic unit within the microprocessor, where the cryptographic unit is programmed to perform a cryptographic operation by a single cryptographic instruction. The present invention is now discussed with reference to FIGS. 3 through 12.

Referring to FIG. 3, a block diagram 300 depicts a microprocessor apparatus according to the present invention for performing cryptographic operations. The block diagram 300 shows a microprocessor 301 that is coupled to a system memory 321 via a memory bus 319. The microprocessor 301 includes translation logic 303 that receives instructions from an instruction register. The translation logic 303 comprises logic, circuits, devices, or microcode (that is, microinstructions or native instructions), or a combination of these, or equivalent elements used to translate instructions into associated sequences of microinstructions. The elements that perform translation within the translation logic 303 may be shared with other circuits or microcode that perform other functions within the microprocessor 301. Within the scope of this application, microcode is a term that refers to one or more microinstructions. A microinstruction (also called a native instruction) is an instruction at the level at which a unit executes. For example, microinstructions are directly executed by a reduced instruction set computer (RISC) microprocessor. In a complex instruction set computer (CISC) microprocessor, such as an x86-compatible microprocessor, x86 instructions are translated into associated microinstructions, which are directly executed by one or more units within the CISC microprocessor.

The translation logic 303 is coupled to a microinstruction queue 304 that has a plurality of entries 305, 306. Microinstructions are provided from the microinstruction queue 304 to register stage logic that includes a register file 307. The register file 307 has a plurality of registers 308-313 whose contents are established before a specified cryptographic operation is performed. The registers 308-313 point to corresponding locations 323-327 in memory 321 that contain the data needed to perform the specified cryptographic operation. The register stage is coupled to load logic 314, which interfaces with a data cache 315 to retrieve that data. The data cache 315 is coupled to the memory 321 via the memory bus 319. Execution logic 328 is coupled to the load logic 314 and performs the operations prescribed by microinstructions passed down from the preceding stages. The execution logic 328 comprises logic, circuits, devices, or microcode, or a combination of these, or equivalent elements used to perform the operations prescribed by instructions. The elements that perform operations within the execution logic 328 may be shared with other circuits or microcode that perform other functions within the microprocessor 301.

The execution logic 328 includes a cryptographic unit 316, which receives from the load logic 314 the data required to perform the specified cryptographic operation. Microinstructions direct the cryptographic unit 316 to perform the specified cryptographic operation on a plurality of input text blocks 326 to generate a corresponding plurality of output text blocks 327. The cryptographic unit 316 comprises logic, circuits, devices, or microcode, or a combination of these, or equivalent elements used to perform cryptographic operations, and these elements may be shared with other circuits or microcode that perform other functions within the microprocessor 301. In one embodiment, the cryptographic unit 316 operates in parallel with other execution units (not shown) within the execution logic 328, such as an integer unit and a floating-point unit. Within the scope of this application, a "unit" comprises logic, circuits, devices, or microcode, or a combination of these, or equivalent elements used to perform specified functions or operations, and the elements of a given unit may be shared with other circuits or microcode that perform other functions within the microprocessor 301. For example, in one embodiment an integer unit comprises the elements that execute integer instructions and a floating-point unit comprises the elements that execute floating-point instructions, and the elements that execute integer instructions may be shared with the circuits or microcode that execute floating-point instructions. In one embodiment compatible with the x86 architecture, the cryptographic unit 316 operates in parallel with an x86 integer unit, an x86 floating-point unit, an x86 MMX unit, and an x86 streaming SIMD extensions (SSE) unit. Within the scope of this application, an embodiment is compatible with the x86 architecture if it can correctly execute a majority of the application programs designed to run on an x86 microprocessor, where an application program executes correctly if it obtains its expected results. Alternative x86-compatible embodiments contemplate the cryptographic unit operating in parallel with a subset of the x86 execution units mentioned above.

The cryptographic unit 316 is coupled to store logic 317 and provides it with the corresponding plurality of output text blocks 327. The store logic 317 is also coupled to the data cache 315, which routes the output text data 327 to system memory 321 for storage. The store logic 317 is coupled to write-back logic 318, which updates the registers 308-313 in the register file 307 as the specified cryptographic operation completes. In one embodiment, microinstructions proceed through each of the logic stages 302, 303, 304, 307, 314, 316-318 in synchronization with a clock signal (not shown), so that operations are performed concurrently, much like operations on an assembly line.

Within the system memory 321, an application program that requires a specified cryptographic operation can direct the microprocessor 301 to perform the operation with a single cryptographic instruction 322, referred to here for illustration as an XCRYPT instruction 322. In a CISC embodiment, the XCRYPT instruction 322 comprises a macroinstruction that prescribes the cryptographic operation. In one embodiment, the XCRYPT instruction 322 uses a spare or otherwise unused instruction opcode within an existing instruction set architecture. In one x86-compatible embodiment, the XCRYPT instruction 322 is a 4-byte instruction consisting of an x86 REP prefix (0xF3), a two-byte unused x86 opcode (for example, 0x0FA7), and one byte that designates the block cipher mode to be used in performing the specified cryptographic operation. In one embodiment, the XCRYPT instruction 322 according to the present invention can be executed at the privilege level afforded to application programs, and it can therefore be placed in the instruction stream supplied to the microprocessor 301 either directly by an application program or under control of an operating system 320. Because only one instruction 322 is needed to direct the microprocessor 301 to perform the specified cryptographic operation, completion of the operation is transparent to the operating system 320.
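The 4-byte layout described above can be sketched as follows. This is a minimal illustration rather than an assembler: the helper name `encode_xcrypt` and the mode table are hypothetical, and the mode byte values are the example values listed for the block cipher mode field in the table of FIG. 5.

```python
# Hypothetical helper that assembles the 4-byte XCRYPT encoding of the
# x86-compatible embodiment: REP prefix, two-byte opcode, mode byte.
REP_PREFIX = 0xF3                 # x86 REP prefix
XCRYPT_OPCODE = (0x0F, 0xA7)      # two-byte opcode 0x0FA7
BLOCK_CIPHER_MODES = {            # example values from the table of FIG. 5
    "ECB": 0xC8,
    "CBC": 0xD0,
    "CFB": 0xE0,
    "OFB": 0xE8,
}


def encode_xcrypt(mode: str) -> bytes:
    """Return the instruction bytes for the requested block cipher mode."""
    try:
        mode_byte = BLOCK_CIPHER_MODES[mode.upper()]
    except KeyError:
        # All other mode values are reserved, so refuse to encode them.
        raise ValueError(f"unsupported block cipher mode: {mode!r}") from None
    return bytes([REP_PREFIX, *XCRYPT_OPCODE, mode_byte])
```

Under these assumptions, `encode_xcrypt("ECB").hex()` yields `f30fa7c8`.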

In operation, the operating system 320 invokes an application program for execution on the microprocessor 301. As part of the instruction flow during execution of the application, an XCRYPT instruction 322 is provided from memory 321 to fetch logic 302. Before the XCRYPT instruction 322 executes, however, instructions in the program flow direct the microprocessor 301 to initialize the contents of registers 308-312 so that they point to locations 323-327 in memory 321. These locations contain a cryptographic control word 323, an initial cryptographic key 324 or a key schedule 324, an initialization vector 325 (if required), the input text 326 for the operation, and the output text 327. The registers 308-312 must be initialized before the XCRYPT instruction 322 executes because the instruction implicitly references them, along with an additional register 313 that holds a block count, that is, the number of data blocks within the input text area 326 to be encrypted or decrypted.

The translation logic 303 thus retrieves the XCRYPT instruction from the fetch logic 302 and translates it into a corresponding sequence of microinstructions that directs the microprocessor 301 to perform the specified cryptographic operation. A first plurality of microinstructions 305-306 within the sequence directs the cryptographic unit 316 to load data provided by the load logic 314, to begin executing a specified number of cryptographic rounds to produce a corresponding block of output data, and to provide that block to the store logic 317 for storage, via the data cache 315, in the output text area 327 of memory 321. A second plurality of microinstructions (not shown) within the sequence directs other execution units (not shown) within the microprocessor 301 to perform the other operations needed to complete the specified cryptographic operation, such as managing non-architectural registers (not shown) that hold temporary results and counters, updating the input and output pointer registers 311-312, updating the initialization vector pointer register 310 (if required) after an input text block 326 is encrypted or decrypted, and processing pending interrupts. In one embodiment, the registers 308-313 are architectural registers. An architectural register 308-313 is a register defined within the instruction set architecture of the particular microprocessor implemented.

In one embodiment, the cryptographic unit 316 is divided into a plurality of stages, which allows successive input text blocks 326 to be pipelined. An alternative embodiment is a single-stage cryptographic unit 316. A third embodiment contemplates a two-stage cryptographic unit 316 that can pipeline two successive input text blocks 326. In all embodiments, the cryptographic unit 316 is configured to buffer microinstructions and input text blocks 326, and to perform the specified cryptographic operation on a following input text block 326 while the output text block 327 corresponding to the preceding input text block 326 is being stored. To maximize the throughput of text blocks 326-327 through the cryptographic unit, the microinstructions 305-306 are therefore ordered so that the microinstruction that loads a following input text block and performs the specified cryptographic operation on it is issued before the microinstruction that stores the output text block 327 corresponding to the preceding input text block 326. This ordering allows the text blocks 326-327 to be pipelined efficiently and is discussed in more detail below.
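The ordering described above, in which the load of a following block is issued before the store of the preceding block's result, can be sketched as a simple sequence generator. This is an illustrative model only: the function name and the `in[n]`/`out[n]` labels are hypothetical, XLOAD and XSTOR are used merely as the names of the load-and-execute and store microinstructions, and the sequence is not a reproduction of the flows tabulated in FIGS. 13 through 16.

```python
from typing import List


def microinstruction_flow(block_count: int) -> List[str]:
    """List microinstructions so each load is issued before the prior store."""
    flow: List[str] = []
    for n in range(block_count):
        # Load input block n and start the cryptographic operation on it.
        flow.append(f"XLOAD in[{n}]")
        if n > 0:
            # Store the previous result while block n is still being processed.
            flow.append(f"XSTOR out[{n - 1}]")
    if block_count > 0:
        # Drain the pipeline: store the result of the final block.
        flow.append(f"XSTOR out[{block_count - 1}]")
    return flow
```

For three blocks this model produces `XLOAD in[0]`, `XLOAD in[1]`, `XSTOR out[0]`, `XLOAD in[2]`, `XSTOR out[1]`, `XSTOR out[2]`, so every store except the last overlaps with work on the next block.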

The block diagram 300 of FIG. 3 shows the elements needed to teach the present invention, and much of the logic found in a present-day microprocessor 301 is omitted for clarity. One skilled in the art will appreciate, however, that a particular present-day microprocessor 301 includes many stages and logic elements, some of which are combined here for simplicity. For example, the load logic 314 could embody an address generation stage followed by a cache interface stage, followed in turn by a cache line alignment stage. What is important to note is that, according to the present invention, a complete cryptographic operation on a plurality of input text blocks 326 is prescribed by a single instruction 322 whose operation is transparent to the operating system 320, and that execution of the single instruction 322 is accomplished by a cryptographic unit 316 that operates in parallel and in coordination with the other execution units within the microprocessor 301. In its alternative embodiment configurations, the cryptographic unit 316 of the present invention is analogous to the dedicated floating-point unit hardware found in microprocessors of earlier years. Operation of the cryptographic unit 316 and the associated XCRYPT instruction 322 is fully compatible with the concurrent operation of legacy operating systems and applications, as is discussed in more detail below.

Referring to FIG. 4, a block diagram depicts an embodiment of an atomic cryptographic instruction 400 according to the present invention. The cryptographic instruction 400 includes an optional prefix field 401, a repeat prefix field 402, an opcode field 403, and a block cipher mode field 404. In one embodiment, the contents of fields 401-404 conform to the x86 instruction set architecture. Alternative embodiments contemplate compatibility with other instruction set architectures.

Operationally, the optional prefix 401 is used in many instruction set architectures to enable or disable certain processing features of the host microprocessor, such as directing 16-bit or 32-bit operation or directing processing of or access to particular memory segments. The repeat prefix 402 indicates that the cryptographic operation prescribed by the cryptographic instruction 400 is to be performed on a plurality of input data blocks (that is, plaintext or ciphertext blocks). The repeat prefix 402 also implicitly directs a conforming microprocessor to use the contents of a plurality of its architectural registers as pointers to locations in system memory that contain the parameters needed to perform the specified cryptographic operation. As noted above, in an x86-compatible embodiment the value of the repeat prefix 402 is 0xF3, and, in keeping with x86 architectural conventions, the cryptographic instruction is very similar in form to an x86 repeat string instruction such as REP.MOVS. For example, when executed by an x86-compatible microprocessor embodiment of the present invention, the repeat prefix implicitly references a block count variable stored in architectural register ECX, a source address pointer stored in register ESI (pointing to the input data for the cryptographic operation), and a destination address pointer stored in register EDI (pointing to the output data in memory). In an x86-compatible embodiment, the present invention further extends the conventional repeat string instruction concept to reference a control word pointer stored in register EDX, a cryptographic key pointer stored in register EBX, and a pointer to an initialization vector stored in register EAX (if required by the specified cipher mode).
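The implicit register conventions described above can be modeled roughly as follows. This is a sketch under several stated assumptions, not a description of the hardware: the 16-byte block size is assumed from AES, `toy_block_op` is a stand-in XOR rather than a real cipher, the control word and initialization vector pointers are carried but not interpreted, and decrementing ECX while advancing ESI and EDI is assumed by analogy with x86 repeat string instructions.

```python
from dataclasses import dataclass

BLOCK_SIZE = 16  # assumption: 128-bit blocks, as in AES


@dataclass
class Registers:
    eax: int  # pointer to the initialization vector (if the mode needs one)
    ebx: int  # pointer to the cryptographic key
    ecx: int  # block count
    edx: int  # pointer to the control word
    esi: int  # source pointer (input text)
    edi: int  # destination pointer (output text)


def toy_block_op(block: bytes, key: bytes) -> bytes:
    """Placeholder for the real block cipher; a plain XOR, NOT secure."""
    return bytes(b ^ k for b, k in zip(block, key))


def rep_xcrypt_model(regs: Registers, memory: bytearray) -> None:
    """Process ECX blocks from [ESI] to [EDI], updating the registers."""
    key = bytes(memory[regs.ebx:regs.ebx + BLOCK_SIZE])
    while regs.ecx > 0:
        block = bytes(memory[regs.esi:regs.esi + BLOCK_SIZE])
        memory[regs.edi:regs.edi + BLOCK_SIZE] = toy_block_op(block, key)
        # Advance both pointers and count down, as a repeat string
        # instruction would.
        regs.esi += BLOCK_SIZE
        regs.edi += BLOCK_SIZE
        regs.ecx -= 1
```

A caller would place the key, input text, and output area in one `bytearray`, load the offsets into a `Registers` instance, and invoke `rep_xcrypt_model` once; on return ECX is zero and ESI and EDI point just past the processed data.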

The opcode field 403 directs the microprocessor to perform a cryptographic operation that is further specified by a control word stored in memory, which is implicitly referenced through the control word pointer. The present invention contemplates that the preferred choice of opcode value is a spare or unused opcode value within an existing instruction set architecture, so that a conforming microprocessor retains compatibility with legacy operating systems and application software. For example, as noted above, the opcode field 403 of an x86-compatible embodiment uses the value 0x0FA7 to direct execution of the specified cryptographic operation. The block cipher mode field 404 designates the particular block cipher mode to be used for the specified cryptographic operation, as discussed with reference to FIG. 5.

FIG. 5 is a table 500 of example values of the block cipher mode field of the atomic cryptographic instruction of FIG. 4. The value 0xC8 directs that the cryptographic operation be performed in electronic codebook (ECB) mode; the value 0xD0 directs cipher block chaining (CBC) mode; the value 0xE0 directs cipher feedback (CFB) mode; and the value 0xE8 directs output feedback (OFB) mode. All other values of the block cipher mode field 404 are reserved. These modes are described in the FIPS documents mentioned above.
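For readers unfamiliar with the four modes named in the table, the following sketch contrasts how each one chains blocks. It is written under loud assumptions: `toy_encrypt` and `toy_decrypt` are an invertible placeholder (XOR with the key, then a one-byte rotation) standing in for a real block cipher such as AES and offer no security, CFB is shown in its full-block form, and the function names are hypothetical.

```python
from typing import List, Optional


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt(block: bytes, key: bytes) -> bytes:
    """Placeholder block cipher: XOR with key, rotate left one byte."""
    mixed = xor(block, key)
    return mixed[1:] + mixed[:1]


def toy_decrypt(block: bytes, key: bytes) -> bytes:
    """Inverse of toy_encrypt: rotate right one byte, XOR with key."""
    return xor(block[-1:] + block[:-1], key)


def encrypt(mode: str, blocks: List[bytes], key: bytes,
            iv: Optional[bytes] = None) -> List[bytes]:
    out, prev = [], iv
    for p in blocks:
        if mode == "ECB":      # each block is enciphered independently
            c = toy_encrypt(p, key)
        elif mode == "CBC":    # plaintext is XORed with previous ciphertext
            c = prev = toy_encrypt(xor(p, prev), key)
        elif mode == "CFB":    # previous ciphertext is enciphered, then XORed
            c = prev = xor(p, toy_encrypt(prev, key))
        elif mode == "OFB":    # keystream is independent of the text
            prev = toy_encrypt(prev, key)
            c = xor(p, prev)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        out.append(c)
    return out


def decrypt(mode: str, blocks: List[bytes], key: bytes,
            iv: Optional[bytes] = None) -> List[bytes]:
    out, prev = [], iv
    for c in blocks:
        if mode == "ECB":
            p = toy_decrypt(c, key)
        elif mode == "CBC":
            p, prev = xor(toy_decrypt(c, key), prev), c
        elif mode == "CFB":    # CFB and OFB use only the encrypt direction
            p, prev = xor(c, toy_encrypt(prev, key)), c
        elif mode == "OFB":
            prev = toy_encrypt(prev, key)
            p = xor(c, prev)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        out.append(p)
    return out
```

The sketch makes visible why only some modes need the initialization vector: ECB never touches `iv`, whereas CBC, CFB, and OFB all seed their chaining value from it.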

Referring to FIG. 6, a block diagram shows a more detailed embodiment of a cryptography unit 617 within an x86-compatible microprocessor 600 according to the present invention. The microprocessor 600 includes fetch logic 601 that fetches instructions from memory (not shown) for execution. The fetch logic 601 is coupled to translation logic 602. The translation logic 602 comprises logic, circuits, devices, or microcode (i.e., microinstructions or native instructions), or a combination of these, or equivalent elements that translate instructions into associated sequences of microinstructions. The elements that perform translation within the translation logic 602 may be shared with circuits or microcode that perform other functions within the microprocessor 600. The translation logic 602 includes a translator 603 that is coupled to a microcode ROM 604. Interrupt logic 626 is coupled to the translation logic 602 via bus 628. A plurality of software and hardware interrupt signals 627 are processed by the interrupt logic 626, which indicates pending interrupts to the translation logic 602. The translation logic 602 is coupled to successive stages of the microprocessor 600: a register stage 605, an address stage 606, a load stage 607, an execute stage 608, a store stage 618, and a write-back stage 619. Each successive stage includes logic that performs specific functions related to executing the instructions provided by the fetch logic 601, as discussed earlier for the like-named elements of the microprocessor of FIG. 3. The x86-compatible embodiment 600 of FIG. 6 features execution logic 632 within the execute stage 608 that includes parallel execution units 610, 612, 614, 616, 617. An integer unit 610 receives integer microinstructions for execution from microinstruction queue 609; a floating-point unit 612 receives floating-point microinstructions from microinstruction queue 611; an MMX unit 614 receives MMX microinstructions from microinstruction queue 613; and an SSE unit 616 receives SSE microinstructions from microinstruction queue 615. In the x86 embodiment shown, a cryptography unit 617 is coupled to the SSE unit 616 via a load bus 620, a stall signal 621, and a store bus 622. The cryptography unit 617 shares the SSE unit's microinstruction queue 615. An alternative embodiment operates the cryptography unit 617 stand-alone and in parallel, like units 610, 612, and 614. The integer unit 610 is coupled to an x86 EFLAGS register 624, which includes an X bit 625 whose state indicates whether a cryptographic operation is in progress. In one embodiment, the X bit 625 is bit 30 of the x86 EFLAGS register 624. In addition, the integer unit 610 accesses a machine specific register to evaluate the state of an E bit 629, which indicates whether the cryptography unit 617 is present in the microprocessor 600. The integer unit 610 also accesses a D bit 631 in a feature control register 630 to enable or disable the cryptography unit 617. As with the microprocessor embodiment 301 of FIG. 3, the microprocessor 600 of FIG. 6 shows only the elements needed to teach an x86-compatible embodiment of the invention; other elements of the microprocessor are combined or omitted for clarity. Those skilled in the art will appreciate that other elements required for a complete interface, such as a data cache, a bus interface unit, and clock generation and distribution logic, are not shown.

In operation, instructions are fetched from memory (not shown) by the fetch logic 601 and provided to the translation logic 602 in synchronization with a clock signal (not shown). The translation logic 602 translates each instruction into a corresponding sequence of microinstructions, which are provided sequentially, in synchronization with the clock signal, to the subsequent stages 605-608, 618, 619 of the microprocessor 600. Each microinstruction in a sequence directs execution of a sub-operation required to accomplish the overall operation prescribed by the corresponding instruction, such as generation of an address by the address stage 606, addition within the integer unit 610 of two operands retrieved from prescribed registers (not shown) in the register stage 605, or storage in memory by store logic 618 of a result generated by one of the execution units 610, 612, 614, 616, 617. Depending on the instruction being translated, the translation logic 602 uses the translator 603 to generate the sequence of microinstructions directly, fetches the sequence from the microcode ROM 604, or uses the translator 603 to generate part of the sequence and fetches the remainder from the microcode ROM 604. The microinstructions proceed sequentially through the successive stages 605-608, 618, 619 of the microprocessor 600 in synchronization with the clock. When a microinstruction reaches the execute stage 608, the execution logic 632 routes it, along with its operands (retrieved from registers in the register stage 605, generated by logic in the address stage 606, or retrieved from the data cache by the load logic 607), to a designated execution unit 610, 612, 614, 616, 617 by placing it in the corresponding microinstruction queue 609, 611, 613, 615. The execution units 610, 612, 614, 616, 617 execute the microinstructions and provide results to the store stage 618. In one embodiment, the microinstructions include fields indicating whether they can be executed in parallel with other operations.

In response to fetching an XCRYPT instruction as described above, the translation logic 602 generates associated microinstructions that direct logic within the subsequent stages 605-608, 618, 619 of the microprocessor 600 to perform the prescribed cryptographic operation. A first plurality of the associated microinstructions is routed directly to the cryptography unit 617 and directs the unit 617 to load data provided over the load bus 620, or to load a block of input data and begin executing the prescribed number of cryptographic rounds to produce a block of output data, or to provide a generated block of output data over the store bus 622 for storage in memory by the store logic 618. As described earlier with reference to FIG. 3, this first plurality of associated microinstructions is ordered to exploit the characteristics of the cryptography unit 617 so that multiple data blocks are pipelined efficiently. More precisely, the first plurality of associated microinstructions ensures that a successive input text block is loaded before the output text block corresponding to the preceding input text block is stored. The prescribed cryptographic operation can therefore be performed on the successive input text block while that output text block is being stored.

A second plurality of the associated microinstructions is routed to the other execution units 610, 612, 614, 616 to perform other sub-operations needed to complete the prescribed cryptographic operation, such as testing the E bit 629, enabling the D bit 631, setting the X bit 625 to indicate that a cryptographic operation is in progress, updating registers in the register stage 605 (e.g., a count register, an input text pointer register, an output text pointer register), and processing interrupts 627 indicated by the interrupt logic 626. The associated microinstructions are ordered to provide optimal execution of the prescribed cryptographic operation on multiple blocks of input data by interlacing integer unit microinstructions within the sequence of cryptography unit microinstructions, so that integer operations are accomplished in parallel with cryptography unit operations. Microinstructions are included among the associated microinstructions to allow for, and recover from, pending interrupts 627. Because all pointers to cryptographic parameters and data are held in x86 architectural registers, their state is saved when an interrupt is processed and restored upon return from the interrupt. Upon return from an interrupt, microinstructions test the state of the X bit 625 to determine whether a cryptographic operation was in progress. If so, the operation is repeated on the particular block of input data that was being processed when the interrupt occurred. The associated microinstructions are ordered so that the pointer registers and intermediate results of a sequence of cryptographic operations on a sequence of input text blocks are updated before an interrupt 627 is processed.
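The interrupt-return check described above can be sketched as follows. Only the position of the X bit (bit 30 of EFLAGS in one embodiment) comes from the text; the function name `block_to_repeat` and the idea of passing a block index are assumptions made for illustration.

```python
from typing import Optional

# X bit 625: bit 30 of the x86 EFLAGS register in the described embodiment.
X_BIT_MASK = 1 << 30


def block_to_repeat(eflags: int, block_in_flight: int) -> Optional[int]:
    """Return the index of the block to reprocess after an interrupt.

    If the X bit is set, a cryptographic operation was in progress and the
    block being processed when the interrupt occurred is repeated.
    Otherwise nothing needs to be repeated and None is returned.
    """
    if eflags & X_BIT_MASK:
        return block_in_flight
    return None
```

Because the pointer registers are updated before the interrupt is taken, repeating a single block is enough in this model; earlier blocks are already accounted for.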

Referring to FIG. 7, a block diagram shows the fields of an exemplary microinstruction 700 that directs cryptographic sub-operations within the microprocessor of FIG. 6. The microinstruction 700 includes a micro opcode field 701, a data register field 702, and a register field 703. The micro opcode field 701 specifies a particular sub-operation to be performed and designates logic within one or more stages of the microprocessor 600 to perform it. Specific values of the micro opcode field 701 designate that a microinstruction is directed for execution by a cryptography unit according to the present invention. In one embodiment, there are two such values. A first value (XLOAD) specifies that data is to be retrieved from a memory location whose address is given by the contents of the architectural register named by the data register field 702. The data is loaded into the register within the cryptography unit that is specified by the register field 703. The retrieved data (e.g., cryptographic key data, control word, input text data, initialization vector) is provided to the cryptography unit. A second value (XSTOR) specifies that data generated by the cryptography unit is to be stored in a memory location whose address is given by the contents of the architectural register named by the data register field 702. In a multi-stage embodiment of the cryptography unit, the register field 703 specifies which of a plurality of output data blocks is stored in memory. The output data block is provided by the cryptography unit in a data field 704 for access by the store logic. The XLOAD and XSTOR microinstructions executed by the cryptography unit are discussed in more detail with reference to FIGS. 8 and 9.

Referring to FIG. 8, a table shows values of the register field 703 for an XLOAD microinstruction in the format 700 of FIG. 7. As noted above, a sequence of microinstructions is generated in response to translation of an XCRYPT instruction. The sequence includes a first plurality of microinstructions directed for execution by the cryptography unit and a second plurality executed by one or more parallel functional units in the microprocessor other than the cryptography unit. The second plurality directs sub-operations such as updating counters, temporary registers, and architectural registers, and testing and setting status bits in machine specific registers. The first plurality provides key data, cryptographic parameters, and input data to the cryptography unit, and directs it to generate a key schedule (or to load a key schedule retrieved from memory), to load and encrypt (or decrypt) input text data, and to store output text data. An XLOAD microinstruction is provided to the cryptography unit to load control word data, to load a cryptographic key or key schedule, to load initialization vector data, to load input text data, or to load input text data and direct the cryptography unit to begin a prescribed cryptographic operation. A register field 703 value of 0b010 in an XLOAD microinstruction directs the cryptography unit to load a control word into its internal control word register. As this microinstruction proceeds down the pipeline, an architectural control word pointer register is accessed in the register stage to obtain the memory address at which the control word is stored. Address logic translates this address into a physical address for the memory access. Load logic retrieves the control word from cache and passes it to the cryptography unit. Likewise, register field value 0b100 directs the cryptography unit to load input text data provided in the data field 704 and, after the load, to begin the prescribed cryptographic operation. As with the control word, the input data is accessed through a pointer stored in an architectural register. Value 0b101 directs that the input data provided in the data field 704 be loaded into internal register IN-1. Data loaded into the IN-1 register is either input text data (when pipelining) or an initialization vector. Values 0b110 and 0b111 direct the cryptography unit to load the lower and upper bits, respectively, of a cryptographic key or of one of the keys in a user-generated key schedule. In this application, a user is defined as that which performs a specified function or operation; a user can be embodied as an application program, an operating system, a machine, or a person.

In one embodiment, register field values 0b100 and 0b101 contemplate a cryptography unit with two stages, through which successive blocks of input text data can be pipelined. To pipeline successive blocks of input data, a first XLOAD microinstruction provides a first block of input text data to IN-1, and a second XLOAD microinstruction then provides a second block of input text data to IN-0 and directs the cryptography unit to begin the prescribed cryptographic operation.

When a user-generated key schedule is used for the cryptographic operation, a number of XLOAD microinstructions equal to the number of keys in the schedule is routed to the cryptography unit, directing it to load each round key of the key schedule.

All other values of the register field 703 in an XLOAD microinstruction are reserved.
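The XLOAD register field encodings discussed above can be summarized in a small lookup. This sketch reads 0b100 and 0b101 as the IN-0 and IN-1 selectors, consistent with the two-stage embodiment described above. The target labels and function names are hypothetical.

```python
# Register field 703 values for an XLOAD microinstruction (sketch).
XLOAD_TARGETS = {
    0b010: "control word register",
    0b100: "IN-0 (load input text, then start the operation)",
    0b101: "IN-1 (input text when pipelining, or initialization vector)",
    0b110: "key, lower bits",
    0b111: "key, upper bits",
}


def xload_target(register_field: int) -> str:
    """Return the destination selected by an XLOAD register field value."""
    if register_field not in XLOAD_TARGETS:
        raise ValueError(f"reserved XLOAD register field: {register_field:#05b}")
    return XLOAD_TARGETS[register_field]


def starts_operation(register_field: int) -> bool:
    """Only the load into IN-0 also starts the prescribed operation."""
    return register_field == 0b100
```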

Referring to FIG. 9, a table shows values of the register field 703 for an XSTOR microinstruction in the format 700 of FIG. 7. An XSTOR microinstruction is issued to the cryptography unit to direct it to provide a generated output text block to the store logic for storage in memory at the address provided in the address field 702. Accordingly, the translation logic of the present invention issues the XSTOR microinstruction for a particular output text block after it has issued the XLOAD microinstruction for the corresponding input text block. A register field 703 value of 0b100 directs the cryptography unit to provide the contents of its internal OUT-0 register to the store logic for storage. The contents of OUT-0 correspond to the input text block provided to IN-0. Likewise, the internal OUT-1 register, selected by register field value 0b101, corresponds to the input text data provided to IN-1. Thus, once the key and control word data are loaded, a plurality of input text blocks can be pipelined through the cryptography unit by issuing cryptographic microinstructions in the order XLOAD.IN-1, XLOAD.IN-0 (which also directs the cryptography unit to begin the cryptographic operation), XSTOR.OUT-1, XSTOR.OUT-0, XLOAD.IN-1, XLOAD.IN-0 (which begins the operation on the next two input text blocks), and so on.
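The issue order listed above can be reproduced with a short generator, together with a check that every store follows the load of its matching input register. This is a minimal sketch; both function names are hypothetical, and the microinstructions are represented only as strings.

```python
from typing import List


def pipelined_microinstructions(block_pairs: int) -> List[str]:
    """Emit the XLOAD/XSTOR order for `block_pairs` pairs of input blocks."""
    sequence: List[str] = []
    for _ in range(block_pairs):
        sequence += [
            "XLOAD.IN-1",   # first block of the pair
            "XLOAD.IN-0",   # second block; also starts the operation
            "XSTOR.OUT-1",  # output for the block loaded into IN-1
            "XSTOR.OUT-0",  # output for the block loaded into IN-0
        ]
    return sequence


def stores_follow_loads(sequence: List[str]) -> bool:
    """Check that each XSTOR.OUT-n is preceded by an unmatched XLOAD.IN-n."""
    pending = {"0": 0, "1": 0}
    for uop in sequence:
        kind, register = uop.split(".")
        index = register[-1]
        if kind == "XLOAD":
            pending[index] += 1
        elif pending[index] == 0:
            return False
        else:
            pending[index] -= 1
    return True
```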

Referring to FIG. 10, a block diagram shows the format of an exemplary control word 1000 that prescribes the parameters of a cryptographic operation according to the present invention. The control word 1000 is programmed into memory by the user, and a pointer to it is placed in an architectural register of a conforming microprocessor before the cryptographic operation is performed. Accordingly, as part of the sequence of microinstructions corresponding to an XCRYPT instruction, an XLOAD microinstruction is issued that directs the microprocessor to read the architectural register containing the pointer, to retrieve the control word 1000 from memory (cache), and to load the control word 1000 into the cryptography unit's internal control word register. The control word 1000 includes a reserved RSVD field 1001, a key size KSIZE field 1002, an encryption/decryption E/D field 1003, an intermediate result IRSLT field 1004, a key generation KGEN field 1005, an algorithm ALG field 1006, and a round count RCNT field 1007.

All values of the reserved field 1001 are reserved. The contents of the KSIZE field 1002 specify the size of the cryptographic key used for encryption or decryption. In one embodiment, the KSIZE field 1002 specifies a 128-bit key, a 192-bit key, or a 256-bit key. The E/D field 1003 specifies whether the cryptographic operation is an encryption or a decryption. The KGEN field 1005 indicates whether memory holds a user-generated key schedule or a single cryptographic key; for a single key, microinstructions are issued to the cryptography unit along with the key, directing the unit to expand the key into a key schedule according to the cryptographic algorithm specified by the ALG field 1006. In one embodiment, specific values of the ALG field 1006 specify the DES algorithm, the Triple-DES algorithm, or the AES algorithm, as discussed earlier. Alternative embodiments contemplate other cryptographic algorithms, such as the Rijndael cipher, the Twofish cipher, etc. The contents of the RCNT field 1007 prescribe the number of cryptographic rounds performed on each input text block under the specified algorithm. Although the standards mentioned above prescribe a fixed number of rounds per input text block, the RCNT field 1007 lets a programmer vary the number of rounds from the number the standard prescribes. In one embodiment, the programmer can specify from 0 to 15 rounds per block. Finally, the IRSLT field 1004 specifies whether encryption/decryption of an input text block is performed for the number of rounds given by RCNT 1007 according to the algorithm specified by ALG 1006, or is performed for that number of rounds with the final round producing an intermediate result rather than a final result. Those skilled in the art will appreciate that many cryptographic algorithms perform the same sub-operations in every round except the final round. Programming the IRSLT field 1004 to yield intermediate results rather than final results therefore lets a programmer verify intermediate steps of the implemented algorithm. For example, incremental intermediate results can be obtained by performing one round of encryption on a text block, then two rounds on the same text block, then three rounds, and so on. Programmable round counts and intermediate results let users verify cryptographic performance, troubleshoot, and explore varied key structures and round counts.
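The control word fields can be modeled as a small validated record, which also makes the incremental-round check described above easy to express. This is a minimal sketch: the patent text here does not give bit positions, so no packed encoding is attempted, and the class, attribute, and function names are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class ControlWord:
    ksize_bits: int          # KSIZE 1002: 128, 192, or 256
    decrypt: bool            # E/D 1003: False = encrypt, True = decrypt
    intermediate: bool       # IRSLT 1004: final round yields intermediate result
    user_key_schedule: bool  # KGEN 1005: True = schedule supplied in memory
    algorithm: str           # ALG 1006: "DES", "3DES", or "AES" in this sketch
    rounds: int              # RCNT 1007: 0 to 15 rounds per block

    def __post_init__(self) -> None:
        if self.ksize_bits not in (128, 192, 256):
            raise ValueError("KSIZE must be 128, 192, or 256 bits")
        if self.algorithm not in ("DES", "3DES", "AES"):
            raise ValueError("unsupported ALG value in this sketch")
        if not 0 <= self.rounds <= 15:
            raise ValueError("RCNT must be between 0 and 15")


def incremental_round_checks(base: ControlWord, max_rounds: int) -> List[ControlWord]:
    """Control words for 1, 2, ..., max_rounds rounds with IRSLT set."""
    return [
        replace(base, rounds=n, intermediate=True)
        for n in range(1, max_rounds + 1)
    ]
```

Running the same input block under each returned control word would give the round-by-round intermediate values that the text suggests using for verification.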

Referring to FIG. 11, a block diagram shows a preferred embodiment of a cryptography unit 1100 according to the present invention. The cryptography unit 1100 includes a micro opcode register 1103 that receives cryptographic microinstructions (i.e., XLOAD and XSTOR microinstructions) via a microinstruction bus 1114. The cryptography unit 1100 also includes a control word register 1104, an input-0 register 1105, an input-1 register 1106, a key-0 register 1107, and a key-1 register 1108. Data is provided to the registers 1104-1108 via a load bus 1111, as prescribed by the contents of an XLOAD microinstruction in the microinstruction register 1103. The input-0 and input-1 registers 1105-1106 are configured to buffer subsequent input text blocks while the cryptographic operation is being performed on the current input text block. The cryptography unit 1100 also includes block cipher logic 1101, which is coupled to all of the registers 1103-1108 and to a cryptographic key RAM 1102. The block cipher logic 1101 provides a stall signal 1113 and provides block results to an output-0 register 1109 and an output-1 register 1110. The output registers 1109-1110 route their contents to successive stages in a conforming microprocessor via a store bus 1112. The cryptography unit 1100 is configured so that data from the output registers 1109-1110 can be stored while the cryptographic operation proceeds on subsequent input text blocks. In one embodiment, the microinstruction register 1103 is 32 bits wide and each of the remaining registers 1104-1110 is a 128-bit register.

In operation, cryptographic microinstructions are provided sequentially to the microinstruction register 1103 along with data designated for the control word register 1104, for one of the input registers 1105-1106, or for one of the key registers 1107-1108. In the embodiment discussed with reference to FIGS. 8 and 9, a control word is loaded into the control word register 1104 by an XLOAD microinstruction. The cryptographic key or key schedule is then loaded by subsequent XLOAD microinstructions. When a 128-bit cryptographic key is loaded, an XLOAD microinstruction designating the KEY-0 register 1107 is provided, along with an XLOAD microinstruction designating the KEY-1 register 1108. When a user-generated key schedule is loaded, successive XLOAD microinstructions designating the KEY-0 register 1107 are provided. Each key of the key schedule is loaded and placed, in order, in the key RAM 1102 for use during its corresponding cryptographic round. Following this, input text data (if an initialization vector is not required) is loaded into the IN-1 register 1106; if an initialization vector is required, it is loaded into the IN-1 register 1106 by an XLOAD microinstruction. An XLOAD microinstruction designating the IN-0 register 1105 directs the cryptography unit to load input text data into the IN-0 register 1105 and to begin performing cryptographic rounds on the input text data in IN-0, using the initialization vector in IN-1, or on the data in both input registers 1105-1106 (when input data is pipelined), according to the parameters in the control word register 1104. On receiving the XLOAD microinstruction designating IN-0, the block cipher logic 1101 begins the cryptographic operation prescribed by the control word. If a single cryptographic key must be expanded, the block cipher logic 1101 generates each key of the key schedule and stores it in the key RAM 1102. Whether the key schedule is generated by the block cipher logic 1101 or loaded from memory, the key for the first round is cached within the block cipher logic 1101 so that the first block cipher round can proceed without accessing the key RAM 1102. Once initiated, the block cipher logic 1101 continues the prescribed cryptographic operation on one or more input text blocks until the operation is complete, successively fetching round keys from the key RAM 1102 as required by the algorithm in use. The cryptography unit 1100 performs a prescribed block cipher operation on a designated input text block, and successive input text blocks are encrypted or decrypted through corresponding successive XLOAD and XSTOR microinstructions. When an XSTOR microinstruction executes and the designated output data (i.e., OUT-0 or OUT-1) has not yet been completely generated, the block cipher logic 1101 asserts the stall signal 1113. Once the output data has been generated and placed in the corresponding output register 1109-1110, the contents of that register are transferred to the store bus 1112. Although the stall signal 1113 is asserted whenever the designated output data is not ready, the input registers 1105-1106 buffer input text blocks, so data blocks are pipelined efficiently through the cryptography unit 1100 by ordering the load and store microinstructions such that the cryptographic operation on the following input text block is always in progress when a store from the output registers 1109-1110 is requested.
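The stall behavior can be illustrated with a toy timing model: a store requested before the output block is ready waits, and a store requested afterward proceeds at once. The one-round-per-clock timing, the single output register, and the class name `OutputRegisterModel` are assumptions of this sketch, not details from the patent.

```python
class OutputRegisterModel:
    """Toy model of one output register and the stall signal."""

    def __init__(self, rounds_per_block: int) -> None:
        if rounds_per_block < 1:
            raise ValueError("this model needs at least one round per block")
        self.rounds_per_block = rounds_per_block
        self.rounds_left = 0
        self.has_result = False

    def xload_start(self) -> None:
        """Model an XLOAD into IN-0: a new block begins its rounds."""
        self.rounds_left = self.rounds_per_block
        self.has_result = False

    def clock(self) -> None:
        """Advance one clock; one round completes per clock (assumed)."""
        if self.rounds_left > 0:
            self.rounds_left -= 1
            if self.rounds_left == 0:
                self.has_result = True

    def xstor(self) -> int:
        """Model an XSTOR; return the number of clocks spent stalled."""
        if self.rounds_left == 0 and not self.has_result:
            raise RuntimeError("XSTOR issued with no operation started")
        stalled = 0
        while not self.has_result:  # stall signal asserted
            self.clock()
            stalled += 1
        self.has_result = False     # contents handed to the store bus
        return stalled
```

For example, with ten rounds per block, a store issued four clocks after the load stalls for six clocks, while a store issued after ten clocks does not stall.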

Referring to FIG. 12, a block diagram shows an embodiment of block cipher logic 1200 according to the present invention for performing cryptographic operations under the Advanced Encryption Standard (AES) algorithm. The block cipher logic 1200 includes a round engine 1220 that is coupled to a round engine controller 1210 via buses 1211-1214 and buses 1216-1218. The round engine controller 1210 includes store logic 1230 and accesses a microinstruction register 1201, a control word register 1202, a KEY-0 register 1203, and a KEY-1 register 1204 to obtain key data, microinstructions, and the parameters of the directed cryptographic operation. The contents of the input registers 1205-1206 are provided to the round engine 1220, and the round engine 1220 provides the corresponding output text to the output registers 1207-1208. The output registers 1207-1208 are also coupled to the round engine controller 1210 via buses 1216-1217 so that the controller can access the result of each successive cryptographic round; that result is provided to the round engine 1220 for the next cryptographic round via the NEXTIN bus 1218. Cryptographic keys in the key RAM (not shown) are accessed via bus 1215. The ENC/DEC signal 1211 directs the round engine to use sub-operations for either encryption (e.g., S-Box) or decryption (e.g., inverse S-Box). The contents of the RNDCON bus 1212 direct the round engine 1220 to perform a first AES round, an intermediate AES round, or a final AES round. The key bus 1213 provides each round key to the round engine 1220 when its corresponding round is executed.

The round engine 1220 includes first key XOR logic 1221 that is coupled to a first register REG-0 1222. The first register 1222 is coupled to S-Box logic 1223, which is coupled to Shift Row logic 1224. The Shift Row logic 1224 is coupled to a second register REG-1 1225, which is coupled to Mix Column logic 1226, and the Mix Column logic 1226 is coupled to a third register REG-2 1227. The first key logic 1221, S-Box logic 1223, Shift Row logic 1224, and Mix Column logic 1226 are configured to perform sub-operations on input text data as specified in the AES FIPS standard discussed previously. The Mix Column logic 1226 is additionally configured to perform the AES XOR function on input data during intermediate rounds, as required, using round keys provided via the key bus 1213. The first key logic 1221, S-Box logic 1223, Shift Row logic 1224, and Mix Column logic 1226 are also configured to perform their corresponding inverse AES sub-operations during decryption, as directed by the state of ENC/DEC 1211. One skilled in the art will appreciate that intermediate round data is fed back to the round engine 1220 according to the particular block cipher mode specified by the contents of the control word register 1202. Initialization vector data (if required) is provided to the round engine 1220 via the NEXTIN bus 1218.
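For orientation, the distinction between first, intermediate, and final AES rounds can be illustrated with the transformation order defined in FIPS 197. The following Python sketch only lists which transformations apply in each round of an encryption; it is not a model of the round engine 1220, and the function name is hypothetical. How the RNDCON bus 1212 encodes these cases is not stated in the text, so the sketch follows the standard alone.

```python
def aes_encrypt_schedule(num_rounds):
    """List the FIPS 197 transformations applied to one block, round by round.

    num_rounds is 10, 12, or 14 for 128-, 192-, or 256-bit keys.
    Hypothetical helper for illustration; not taken from the specification.
    """
    full_round = ["SubBytes", "ShiftRows", "MixColumns", "AddRoundKey"]
    schedule = [("initial", ["AddRoundKey"])]  # key XOR applied before round 1
    for r in range(1, num_rounds):
        schedule.append((f"round {r}", list(full_round)))
    # The last round omits MixColumns.
    schedule.append((f"round {num_rounds}", ["SubBytes", "ShiftRows", "AddRoundKey"]))
    return schedule


for label, steps in aes_encrypt_schedule(10):
    print(f"{label:>8}: {' -> '.join(steps)}")
```

Under this reading, the initial key XOR would plausibly correspond to the first key XOR logic 1221 and the per-round key XOR to the XOR function attributed to the Mix Column logic 1226, but that mapping is an assumption.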

In the embodiment shown in FIG. 12, the round engine is divided into two stages: a first stage between REG-0 1222 and REG-1 1225, and a second stage between REG-1 1225 and REG-2 1227. Intermediate round data is pipelined between the stages in synchronization with a clock signal (not shown). When the cryptographic operation on a block of input data is complete, the associated output data is placed into the corresponding output register 1207-1208. In response to an XSTOR microinstruction, the store logic 1230 asserts the STORE signal 1214 to inform the round engine 1220 that the contents of the specified output register 1207-1208 are being provided to the store bus (not shown). The output registers 1207-1208 can be stored once a subsequent input text block has been buffered in the input registers 1205-1206 and while the round engine 1220 is processing that subsequent block. How load and store microinstructions are sequenced according to the present invention to pipeline multiple data blocks efficiently is discussed in more detail with reference to FIGS. 13 through 16.

Referring to FIG. 13, a table 1300 is presented showing one embodiment of a microinstruction flow according to the present invention for a single-stage embodiment of the cryptographic unit. As noted above, a single-stage cryptographic unit can process one input text block at a time. However, the single-stage embodiment is configured in the same manner as the multi-stage embodiments (a two-stage embodiment is shown and discussed in detail with reference to FIG. 12): the input registers allow a subsequent block of input data to be buffered while the round engine performs the specified cryptographic operation on the current input data, and the output registers and store logic allow the output block corresponding to the current input block to be stored while the specified cryptographic operation is performed on the subsequent input block. The microinstruction flow of table 1300 does not exploit these advantageous features of the single-stage cryptographic unit.

For purposes of teaching the present invention, execution of a load microinstruction LD.IN-0 requires two pipeline clock cycles. Once the input data is loaded into input register 0, the round engine starts automatically. For comparison purposes, the round engine requires 20 clock cycles to generate a corresponding output block, during which time a store instruction ST.OUT-0 is stalled. Like the load instruction LD.IN-0, the store operation specified by the store instruction ST.OUT-0 requires two clock cycles. Accordingly, when a first load instruction LD.IN-0 is provided to the cryptographic unit at cycle 0, the input data is loaded two cycles later and the round engine begins execution, producing the corresponding output data block at cycle 22. The corresponding store instruction ST.OUT-0 is stalled until that output data block is ready, so the store completes at cycle 24. A following load instruction LD.IN-0 is stalled behind the previous store instruction ST.OUT-0 until the store completes, so the following input text block is not loaded until cycle 26.

As noted above, this load-store-load-store ordering of microinstructions does not exploit the previously mentioned features of the cryptographic unit. As a result, performing cryptographic operations on multiple data blocks requires 24 cycles per block.
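The 24-cycle figure follows directly from the cycle counts assumed for this flow. A minimal Python sketch of that arithmetic is given below; the function and parameter names are illustrative only and do not appear in the specification.

```python
def serial_timeline(load=2, compute=20, store=2):
    """Cycle at which each step of one load-compute-store pass completes.

    Defaults are the illustrative counts used for FIG. 13; the names are
    hypothetical and only model the arithmetic in the text.
    """
    load_done = load                    # input block is in the input register
    output_ready = load_done + compute  # round engine has produced the output block
    store_done = output_ready + store   # output block has been stored
    return {"load_done": load_done, "output_ready": output_ready, "store_done": store_done}


def serial_total(num_blocks, **counts):
    """Total cycles when every load waits for the previous store to finish."""
    return num_blocks * serial_timeline(**counts)["store_done"]


print(serial_timeline())  # {'load_done': 2, 'output_ready': 22, 'store_done': 24}
print(serial_total(4))    # 96
```

Because nothing overlaps in this ordering, the cost grows by the full 24 cycles for every additional block.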

Referring to FIG. 14, a table 1400 is presented showing an alternative embodiment of a microinstruction flow according to the present invention for the single-stage embodiment of the cryptographic unit. In contrast to the flow discussed with reference to FIG. 13, this alternative flow exploits the advantageous features of the single-stage cryptographic unit. For comparison purposes, the numbers of clock cycles required to execute the load instruction LD.IN-0, the store instruction ST.OUT-0, and the cryptographic operation in the round engine are the same as in the embodiment discussed with reference to FIG. 13.

According to this alternative flow, a first load instruction LD.IN-0 is provided to the cryptographic unit at cycle 0; two cycles later the input data is loaded and the round engine begins execution, producing the corresponding output data block at cycle 22. Because input data can be buffered, however, the translation logic issues a second load instruction LD.IN-0, which completes at cycle 4, to load a subsequent input text block. The cryptographic operation on the subsequent input text block is stalled until the output text block corresponding to the first input text block has been generated (cycle 22), but because the subsequent block has been buffered since cycle 4, its cryptographic operation can begin at cycle 23 and complete at cycle 42. The store instruction ST.OUT-0 for the output text corresponding to the first input block is provided by the translation logic after the load instruction LD.IN-0 for the subsequent block. This store instruction is stalled until the corresponding output data block is ready at cycle 22, and the store completes at cycle 24. A following load instruction LD.IN-0 is stalled behind the previous store instruction ST.OUT-0 until the store completes, so the following input text block is not loaded until cycle 26. By that point, the round engine is already two cycles into processing the subsequent input text block. By performing two loads at the outset, this ordering of microinstructions exploits the previously mentioned features of the cryptographic unit, improving throughput for multiple blocks to 20 cycles per block. The two clock cycles required to store an output block are effectively absorbed into execution of the cryptographic operation on the subsequent input text block. Likewise, the two cycles required to load a following input text block are absorbed into execution of the cryptographic operation on the current input text block.
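The difference between the two single-stage orderings can be sketched as a pair of totals. The Python below is a simplified model, not the patented scheduling logic: it assumes that every load after the first and every store before the last is completely hidden behind round-engine work, which holds only when a load plus a store take no longer than one cryptographic operation. All names are hypothetical.

```python
def serial_cycles(n, load=2, compute=20, store=2):
    """Load-store-load-store ordering: nothing overlaps."""
    return n * (load + compute + store)


def overlapped_cycles(n, load=2, compute=20, store=2):
    """Load-load-store ordering under the full-overlap assumption:
    only the first load and the last store are exposed."""
    if n == 0:
        return 0
    return load + n * compute + store


for n in (1, 2, 100):
    print(n, serial_cycles(n), overlapped_cycles(n))
```

For two blocks the model gives 44 cycles, which agrees with a second output block ready at cycle 42 plus a two-cycle store; for long runs the cost approaches 20 cycles per block.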

Referring to FIG. 15, a table 1500 is presented showing one embodiment of a microinstruction flow according to the present invention for a two-stage embodiment of the cryptographic unit. The two-stage embodiment is discussed in detail with reference to FIG. 12 and can process two successive input data blocks during a single pass through the round engine. Like the single-stage flow of table 1300, the flow of table 1500 does not exploit the features of the cryptographic unit that allow clock cycles to be overlapped. For comparison purposes, the numbers of clock cycles required to execute the load instruction LD.IN-0, the store instruction ST.OUT-0, and the cryptographic operation in the round engine are the same as in the embodiments discussed with reference to FIGS. 13 and 14. As noted above, executing a load instruction LD.IN-1 merely loads input data into input register 1, whereas executing a load instruction LD.IN-0 loads input text data into input register 0 and initiates processing by the round engine of the input data in input registers 0 and 1. Because the round engine is staged, only 20 clock cycles are required to complete encryption or decryption of the input data in both input registers.

Accordingly, the translation logic issues an LD.IN-1 microinstruction followed by an LD.IN-0 microinstruction. LD.IN-1 completes at cycle 2 and LD.IN-0 completes at cycle 4; the round engine begins processing the two input text blocks at cycle 5 and finishes at cycle 24. The two following store instructions, ST.OUT-1 and ST.OUT-0, are stalled until processing of their corresponding input text blocks completes at cycle 24, at which point the stall is released and the stores complete at cycle 28. Because no other input data has been buffered, the two following load instructions, LD.IN-0 and LD.IN-1, are stalled until the stores complete. The following input text blocks are therefore loaded during cycles 29-32 and processed by the round engine during cycles 33-52.

Like the load-store-load-store ordering discussed with reference to FIG. 13 for the single-stage cryptographic unit, the load-load-store-store-load-load-store-store ordering of table 1500 does not take advantage of the features of the cryptographic unit that support efficient processing of data blocks. As a result, performing cryptographic operations on multiple data blocks in the two-stage cryptographic unit requires 28 cycles for every two blocks.
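The cycle windows quoted for this flow can be reproduced by laying the phases end to end. The sketch below is illustrative Python with hypothetical names; it assumes inclusive cycle windows numbered from cycle 1, which matches the figures given above for this flow (engine busy in cycles 5-24, next loads in cycles 29-32).

```python
def two_stage_serial_windows(passes, load=2, compute=20, store=2, blocks=2):
    """Inclusive (first, last) cycle windows for each phase of each pass
    when loads, round-engine work, and stores never overlap."""
    windows = []
    cycle = 0  # last cycle consumed so far
    for _ in range(passes):
        phases = {}
        for name, length in (
            ("loads", blocks * load),
            ("engine", compute),
            ("stores", blocks * store),
        ):
            phases[name] = (cycle + 1, cycle + length)
            cycle += length
        windows.append(phases)
    return windows


for i, phases in enumerate(two_stage_serial_windows(2)):
    print(i, phases)
```

Each pass occupies 28 cycles for two blocks, so the second pass begins loading at cycle 29 and occupies the round engine during cycles 33-52.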

Referring to FIG. 16, a table 1600 is presented showing an alternative embodiment of a microinstruction flow according to the present invention for the two-stage embodiment of the cryptographic unit. In contrast to the flow discussed with reference to FIG. 15, the alternative flow of table 1600 exploits the advantageous features of the two-stage cryptographic unit. For comparison purposes, the numbers of clock cycles required to execute the load instruction LD.IN-0, the store instruction ST.OUT-0, and the cryptographic operation in the round engine are the same as in the embodiment discussed with reference to FIG. 15.

According to this alternative flow, a first load instruction LD.IN-1 is provided to the cryptographic unit at cycle 0, followed by a second load instruction LD.IN-0; four cycles later the input data is loaded and the round engine begins execution, producing the corresponding output data blocks at cycle 24. Because input data can be buffered, however, the translation logic issues a second set of load instructions LD.IN-1, LD.IN-0 for two further input text blocks, and these loads complete at cycle 8. The cryptographic operations on the two further input text blocks are stalled until the two output text blocks corresponding to the first two input text blocks have been generated (cycle 24), but because the two further blocks have been buffered since cycle 8, their cryptographic operations can begin at cycle 25 and complete at cycle 44. The store instructions ST.OUT-1, ST.OUT-0 for the two output text blocks corresponding to the first two input text blocks are provided by the translation logic after the load instructions LD.IN-1, LD.IN-0 for the following blocks. These store instructions are stalled until the corresponding output data blocks are ready at cycle 24, and the stores complete at cycle 28. By that point, the round engine is already four cycles into processing the following input text blocks. By performing four loads at the outset, this ordering of microinstructions exploits the previously mentioned features of the cryptographic unit, improving throughput for multiple blocks to 20 cycles for every two blocks. The four clock cycles required to store the output blocks are effectively absorbed into execution of the cryptographic operations on the two following input text blocks. Likewise, the four cycles required to load the next two input text blocks are absorbed into execution of the cryptographic operations on the two current input text blocks.
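Taking the completion cycles quoted for this flow (24 for the first pair of blocks, 44 for the second), the steady-state cost is 20 cycles per pair. The four flows of FIGS. 13 through 16 can then be compared side by side. The Python below is a hedged summary model with hypothetical names; it assumes that, in the overlapped flows, all loads and stores are hidden behind round-engine work.

```python
def steady_state_per_block(blocks_per_pass, overlapped, load=2, compute=20, store=2):
    """Steady-state cycles per block for one of the four flows.

    overlapped=False: each pass pays for its loads, the round engine, and its stores.
    overlapped=True:  loads and stores are assumed fully hidden behind the round engine.
    """
    if overlapped:
        per_pass = compute
    else:
        per_pass = blocks_per_pass * (load + store) + compute
    return per_pass / blocks_per_pass


flows = {
    "FIG. 13 (single-stage, serial)": (1, False),
    "FIG. 14 (single-stage, overlapped)": (1, True),
    "FIG. 15 (two-stage, serial)": (2, False),
    "FIG. 16 (two-stage, overlapped)": (2, True),
}
for name, (blocks, overlapped) in flows.items():
    print(f"{name}: {steady_state_per_block(blocks, overlapped):.0f} cycles per block")
```

Under these assumptions the per-block cost falls from 24 to 20 cycles in the single-stage unit and from 14 to 10 cycles in the two-stage unit.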

Although the present invention and its objects, features, and advantages have been described in detail, other embodiments are also encompassed by the invention. For example, the present invention has been discussed at length in terms of embodiments compatible with the x86 architecture. These discussions have been presented in that manner because the x86 architecture is widely understood and therefore provides a sufficient vehicle for teaching the present invention. The invention nevertheless encompasses embodiments that comport with other instruction set architectures, such as PowerPC, MIPS, and the like, as well as entirely new instruction set architectures.

The present invention further encompasses the execution of cryptographic operations by elements of a computer system other than the microprocessor itself. For example, the cryptographic instruction according to the present invention could readily be applied to an embodiment of a cryptographic unit that is not part of the same integrated circuit as a microprocessor but that operates as part of the computer system. Such an embodiment could be incorporated into a chipset surrounding the microprocessor (e.g., north bridge, south bridge), or into a processor dedicated to performing cryptographic operations, to which the cryptographic instruction is handed off by a host microprocessor. The present invention is also applicable to embedded controllers, industrial controllers, signal processors, array processors, and any similar devices that process data. The present invention further encompasses an embodiment comprising only those elements essential to performing cryptographic operations. A device embodied in this way would provide a low-cost, low-power alternative that performs only cryptographic operations, for example as an encryption/decryption processor in a communications system. For clarity, these alternative processing elements are referred to herein as processors as well.

In addition, although the present invention has been described in terms of 128-bit blocks, various other block sizes can be accommodated by changing the sizes of the registers that carry input data, output data, keys, and control words.

Furthermore, although DES, Triple DES, and AES feature prominently in this application, the present invention also encompasses lesser-known block cipher algorithms such as the MARS cipher, the Rijndael cipher, the Twofish cipher, the Blowfish cipher, the Serpent cipher, and the RC6 cipher. It is sufficient to understand that the present invention provides apparatus and supporting methods for block cipher cryptography within a microprocessor, where atomic block cipher operations can be invoked through execution of a single instruction.

Also, although the present invention has been characterized herein in terms of block cipher algorithms and associated techniques for performing block cipher functions, forms of cryptography other than block ciphers also fall within its scope. It is sufficient to observe that a single instruction is provided by which a user can direct a conforming microprocessor to perform a cryptographic operation such as encryption or decryption, where the microprocessor includes a cryptographic unit that carries out the cryptographic function specified by the instruction.

Moreover, although the round engine discussed herein provides a two-stage apparatus that can pipeline two blocks of input data, other embodiments with more than two stages are also contemplated. It is anticipated that stage division to support pipelining of more input data blocks will evolve in concert with the division of other stages within a conforming microprocessor.

Finally, although the present invention has been specifically discussed in terms of a single cryptographic unit that supports a plurality of algorithms, the invention also contemplates multiple cryptographic units operatively coupled in parallel with other execution units in a conforming microprocessor, where each of the multiple cryptographic units is configured to perform a specific cryptographic algorithm. For example, a first unit is configured to perform AES, a second unit is configured to perform DES, and so on.

The foregoing describes only preferred embodiments of the present invention and is not intended to limit the scope of the claims; all equivalent changes or modifications made without departing from the spirit disclosed by the present invention shall fall within the scope of the claims set forth below.

Claims (29)

1. An apparatus for performing cryptographic operations, characterized in that the apparatus comprises:
a cryptographic instruction, received by a computing device as part of an instruction stream executing on the computing device, wherein the cryptographic instruction specifies a cryptographic operation;
translation logic, operatively coupled to the cryptographic instruction and configured to translate the cryptographic instruction into microinstructions, wherein the microinstructions direct the computing device to load a second input text block and to perform the cryptographic operation on the second input text block before directing the computing device to store an output text block corresponding to a first input text block;
whereby the output text block can be stored while the cryptographic operation is being performed on the second input text block.
2. The apparatus for performing cryptographic operations as recited in claim 1, characterized in that the cryptographic operation comprises:
an encryption operation, the encryption operation comprising encryption of a plurality of plaintext blocks to generate a corresponding plurality of ciphertext blocks;
wherein the plurality of plaintext blocks comprises:
the first and second input text blocks; and
wherein the corresponding plurality of ciphertext blocks comprises:
the output text block.
3. The apparatus for performing cryptographic operations as recited in claim 1, characterized in that the cryptographic operation comprises:
a decryption operation, the decryption operation comprising decryption of a plurality of ciphertext blocks to generate a corresponding plurality of plaintext blocks;
wherein the plurality of ciphertext blocks comprises:
the first and second input text blocks; and
wherein the corresponding plurality of plaintext blocks comprises:
the output text block.
4. The apparatus for performing cryptographic operations as recited in claim 1, characterized in that the apparatus further comprises:
execution logic, operatively coupled to receive the microinstructions and configured to store the output text block while the cryptographic operation is performed on the second input text block.
5. The apparatus for performing cryptographic operations as recited in claim 4, characterized in that the execution logic comprises a cryptographic unit.
6. The apparatus for performing cryptographic operations as recited in claim 5, characterized in that the cryptographic unit is configured to perform the cryptographic operation according to the Advanced Encryption Standard.
7. The apparatus for performing cryptographic operations as recited in claim 5, characterized in that the cryptographic unit comprises:
a two-stage round engine, configured to pipeline execution on the first and second input text blocks.
8. The apparatus for performing cryptographic operations as recited in claim 1, characterized in that the microinstructions comprise:
a load microinstruction, configured to direct the computing device to load the second input text block and to perform the cryptographic operation on the second input text block; and
a store microinstruction, configured to direct the computing device to store the output text block.
9. The apparatus for performing cryptographic operations as recited in claim 1, characterized in that the cryptographic instruction is specified according to the x86 instruction format.
10. The apparatus for performing cryptographic operations as recited in claim 1, characterized in that the cryptographic instruction implicitly references a plurality of registers within the computing device.
11. The apparatus for performing cryptographic operations as recited in claim 10, characterized in that the plurality of registers comprises:
a first register, wherein the contents of the first register comprise a first pointer to a first memory address, the first memory address specifying a first location in memory for accessing a plurality of input text blocks upon which the cryptographic operation is to be performed, and wherein the plurality of input text blocks comprises the first and second input text blocks.
12. The apparatus for performing cryptographic operations as recited in claim 10, characterized in that the plurality of registers comprises:
a second register, wherein the contents of the second register comprise a second pointer to a second memory address, the second memory address specifying a second location in memory for storing a corresponding plurality of output text blocks, the corresponding plurality of output text blocks being the result of performing the cryptographic operation on a plurality of input text blocks, and wherein the plurality of output text blocks comprises the output text block.
13. The apparatus for performing cryptographic operations as recited in claim 10, characterized in that the plurality of registers comprises:
a third register, wherein the contents of the third register indicate the number of text blocks in a plurality of input text blocks.
14. The apparatus for performing cryptographic operations as recited in claim 10, characterized in that the plurality of registers comprises:
a fourth register, wherein the contents of the fourth register comprise a third pointer to a third memory address, the third memory address specifying a third location in memory for accessing cryptographic key data used to perform the cryptographic operation.
15. The apparatus for performing cryptographic operations as recited in claim 10, characterized in that the plurality of registers comprises:
a fifth register, wherein the contents of the fifth register comprise a fourth pointer to a fourth memory address, the fourth memory address specifying a fourth location in memory, the fourth location comprising an initialization vector location whose contents comprise an initialization vector or initialization vector equivalent used to perform the cryptographic operation.
16. The apparatus for performing cryptographic operations as recited in claim 10, characterized in that the plurality of registers comprises:
a sixth register, wherein the contents of the sixth register comprise a fifth pointer to a fifth memory address, the fifth memory address specifying a fifth location in memory for accessing a control word used to perform the cryptographic operation, wherein the control word specifies cryptographic parameters for the cryptographic operation.
17. An apparatus for performing cryptographic operations, characterized in that the apparatus comprises:
translation logic, configured to translate a cryptographic instruction into a sequence of microinstructions, the sequence of microinstructions comprising:
a first microinstruction, directing that a second input text block be loaded and that a cryptographic operation be performed on the second input text block; and
a second microinstruction, directing that a first output text block be stored, the first output text block corresponding to performance of the cryptographic operation on a first input text block;
wherein the translation logic issues the first microinstruction before issuing the second microinstruction;
whereby the output text block can be stored while the cryptographic operation is being performed on the second input text block.
18. The apparatus for performing cryptographic operations as recited in claim 17, characterized in that the cryptographic operation comprises:
an encryption operation, the encryption operation comprising encryption of a plurality of plaintext blocks to generate a corresponding plurality of ciphertext blocks;
wherein the plurality of plaintext blocks comprises:
the first and second input text blocks; and
wherein the corresponding plurality of ciphertext blocks comprises:
the output text block.
19. The apparatus for performing cryptographic operations as recited in claim 17, characterized in that the cryptographic operation comprises:
a decryption operation, the decryption operation comprising decryption of a plurality of ciphertext blocks to generate a corresponding plurality of plaintext blocks;
wherein the plurality of ciphertext blocks comprises:
the first and second input text blocks; and
wherein the corresponding plurality of plaintext blocks comprises:
the output text block.
20. The apparatus for performing cryptographic operations as recited in claim 17, characterized in that the apparatus further comprises:
a cryptographic unit, operatively coupled to receive the microinstructions and configured to store the output text block while the cryptographic operation is performed on the second input text block.
21. The apparatus for performing cryptographic operations as recited in claim 20, characterized in that the cryptographic unit is configured to perform the cryptographic operation according to the Advanced Encryption Standard.
22. The apparatus for performing cryptographic operations as recited in claim 20, characterized in that the cryptographic unit comprises:
a two-stage round engine, configured to pipeline execution on the first and second input text blocks.
23. The apparatus for performing cryptographic operations as recited in claim 17, characterized in that the cryptographic instruction is specified according to the x86 instruction format.
24. A method for performing a cryptographic operation in a device, characterized in that the method comprises:
Translating a cryptographic instruction into a first micro instruction and a second micro instruction, wherein the cryptographic instruction specifies a cryptographic operation, the first micro instruction directs the device to load a second input text block and to execute the cryptographic operation on the second input text block, and the second micro instruction directs the device to store a first output text block, the first output text block resulting from execution of the cryptographic operation on a first input text block; and
Issuing the first micro instruction to a cryptography unit and thereafter issuing the second micro instruction to the cryptography unit;
Whereby the output text block can be stored while the cryptographic operation is being executed on the second input text block.
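The ordering recited in claim 24 (load and execute on the next block first, then store the previous result) can be sketched as a micro instruction list. The following is an illustration under assumptions only: the micro-op names `LOAD_EXEC` and `STORE`, the functions `translate` and `run`, and the single-byte XOR used as a stand-in for the block cipher are all hypothetical and are not taken from the patent.

```python
def translate(num_blocks):
    """Expand one cryptographic instruction into an ordered micro-op list.

    Hypothetical micro-ops: ("LOAD_EXEC", i) loads input block i and starts
    the operation on it; ("STORE", i) writes out the result for block i.
    The store for block i-1 is issued right after the load/execute for
    block i, so the two can overlap in the cryptography unit.
    """
    uops = []
    for i in range(num_blocks):
        uops.append(("LOAD_EXEC", i))
        if i > 0:
            uops.append(("STORE", i - 1))
    if num_blocks:
        uops.append(("STORE", num_blocks - 1))   # drain the last result
    return uops

def run(uops, blocks, key):
    """Functional model only; XOR with `key` is a placeholder, not AES."""
    results = {}
    out = [None] * len(blocks)
    for op, i in uops:
        if op == "LOAD_EXEC":
            results[i] = bytes(b ^ key for b in blocks[i])
        else:
            out[i] = results.pop(i)
    return out

uops = translate(3)
print(uops)
print(run(uops, [b"\x01\x02", b"\x03\x04", b"\x05\x06"], 0xFF))
```

The model executes the micro-ops one after another, so it shows only the issue order; the actual overlap of a store with the following block's execution would depend on the hardware.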
25. The method for performing a cryptographic operation in a device as claimed in claim 24, wherein the translating comprises:
Specifying, by the first micro instruction, that an encryption operation be executed on the second text block to produce a corresponding second ciphertext block.
26. The method for performing a cryptographic operation in a device as claimed in claim 24, characterized in that the translating comprises:
Specifying, by the first micro instruction, that a decryption operation be executed on the second text block to produce a corresponding second plaintext block.
27. The method for performing a cryptographic operation in a device as claimed in claim 24, characterized in that it further comprises:
Executing the first and second micro instructions in a cryptography unit, wherein the executing comprises:
Storing the output text block while the cryptographic operation is being executed on the second input text block.
28. The method for performing a cryptographic operation in a device as claimed in claim 24, characterized in that the cryptographic instruction specifies that the cryptographic operation be executed according to the Advanced Encryption Standard.
29. The method for performing a cryptographic operation in a device as claimed in claim 24, characterized in that it further comprises:
Executing the first and second micro instructions in a cryptography unit, wherein the executing comprises pipelining the first and second input text blocks through a two-stage round engine.
CNB2004100831177A 2003-09-29 2004-09-29 Microprocessor and method with optimized block cipher function Expired - Lifetime CN100527664C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US50697103P 2003-09-29 2003-09-29
US60/506,971 2003-09-29

Publications (2)

Publication Number Publication Date
CN1592189A (en) 2005-03-09
CN100527664C (en) 2009-08-12

Family

ID=34619303

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2004100831177A Expired - Lifetime CN100527664C (en) 2003-09-29 2004-09-29 Microprocessor and method with optimized block cipher function

Country Status (2)

Country Link
CN (1) CN100527664C (en)
TW (1) TWI253268B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107330552A (en) * 2017-06-28 2017-11-07 无锡井通网络科技有限公司 An intelligent trade matching method for digital assets in a distributed system

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104683093B (en) * 2013-11-27 2018-01-26 财团法人资讯工业策进会 Block encryption device, block encryption method, block decryption device, and block decryption method capable of integrity verification

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4250546A (en) * 1978-07-31 1981-02-10 Motorola, Inc. Fast interrupt method
US6118870A (en) * 1996-10-09 2000-09-12 Lsi Logic Corp. Microprocessor having instruction set extensions for decryption and multimedia applications
AU5730200A (en) * 1999-06-08 2000-12-28 General Instrument Corporation Cryptographic processing system
US6983374B2 (en) * 2000-02-14 2006-01-03 Kabushiki Kaisha Toshiba Tamper resistant microprocessor
TWI282066B (en) * 2002-08-22 2007-06-01 Ip First Llc Apparatus and method for extending data modes in a microprocessor

Also Published As

Publication number Publication date
CN100527664C (en) 2009-08-12
TWI253268B (en) 2006-04-11
TW200513084A (en) 2005-04-01

Similar Documents

Publication Publication Date Title
CN1655496B (en) Device and method for generating cipher key schedule
US7321910B2 (en) Microprocessor apparatus and method for performing block cipher cryptographic functions
EP1596530B1 (en) Apparatus and method for employing cryptographic functions to generate a message digest
EP1538510B1 (en) Microprocessor apparatus and method for performing block cipher cryptographic functions
EP1496421B1 (en) Apparatus and method for performing transparent block cipher cryptographic functions
EP1519509B1 (en) Apparatus and method for providing user-generated key schedule in a microprocessor cryptographic engine
US7392400B2 (en) Microprocessor apparatus and method for optimizing block cipher cryptographic functions
US7502943B2 (en) Microprocessor apparatus and method for providing configurable cryptographic block cipher round results
US7536560B2 (en) Microprocessor apparatus and method for providing configurable cryptographic key size
US7529368B2 (en) Apparatus and method for performing transparent output feedback mode cryptographic functions
US7900055B2 (en) Microprocessor apparatus and method for employing configurable block cipher cryptographic algorithms
US7542566B2 (en) Apparatus and method for performing transparent cipher block chaining mode cryptographic functions
US7519833B2 (en) Microprocessor apparatus and method for enabling configurable data block size in a cryptographic engine
US7529367B2 (en) Apparatus and method for performing transparent cipher feedback mode cryptographic functions
CN1592189A (en) Microprocessor and method with optimized block cipher function
CN1658548B (en) Microprocessor and method for allocating data blocks of a cryptographic engine
CN1661958A (en) Microprocessor and method for block cipher function
CN1607763A (en) Microprocessor device and method for executing configuration block cryptographic algorithm
CN1652163B (en) Method and device for performing transparent output feedback mode cryptographic functions
CN1684408B (en) Microprocessor apparatus and method for providing configurable encryption block encryption

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CX01 Expiry of patent term

Granted publication date: 20090812
