
CN102520903B - Length-configurable vector maximum/minimum network supporting reconfigurable fixed floating points - Google Patents


Info

Publication number
CN102520903B
Authority
CN
China
Prior art keywords
floating
data
bit
maximum
vector
Legal status
Active
Application number
CN201110415155.8A
Other languages
Chinese (zh)
Other versions
CN102520903A (en)
Inventor
王东琳
汪涛
尹磊祖
谢少林
Current Assignee
Shanghai Silam Technology Co., Ltd.
Original Assignee
Institute of Automation of Chinese Academy of Science
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN201110415155.8A
Publication of CN102520903A
Application granted
Publication of CN102520903B
Legal status: Active
Anticipated expiration

Landscapes

  • Complex Calculations (AREA)

Abstract

The invention discloses a length-configurable vector maximum/minimum network that supports reconfigurable fixed-point and floating-point formats. It comprises: a parallel floating-point data preprocessing unit, which analyzes the format of the received 512-bit vector data A, processes each data format accordingly, outputs the processed floating-point data to the reconfigurable comparator network, and outputs the resulting flag bits to the result selection unit; a Mask register, which controls which data take part in the maximum/minimum operation; a reconfigurable comparator network, which takes the floating-point data from the preprocessing unit and the value of the Mask register as inputs, compares the vector data stage by stage, and outputs the maximum/minimum result to the result selection unit; and a result selection unit, which receives the output of the reconfigurable comparator network and, based on the flag bits from the preprocessing unit, produces the final vector maximum/minimum result.

Description

Length-configurable vector maximum/minimum network supporting reconfigurable fixed and floating point

Technical field

The invention relates to the field of high-performance digital signal processors, and in particular to a length-configurable vector maximum/minimum network that supports reconfigurable fixed-point and floating-point formats.

Background

Digital signal processor (DSP) technology emerged with the rapid development of computing and information science and has advanced dramatically over the past 40 years. In a DSP, every operation, however complex, is ultimately carried out by the arithmetic unit, which is therefore the core component of the whole DSP. In recent years, applications have driven DSP development, and DSPs tailored to specific domains and requirements have become the main direction of that development.

Digital signal processing involves a large number of maximum/minimum operations, for example median filtering, extraction of maximum/minimum pixels, Viterbi decoding, threshold detection, and precision detection. Traditional DSP processors implement maximum/minimum operations by reusing the existing fixed-point and floating-point arithmetic logic units (ALUs). This saves chip area but has the following limitations:

1) Low efficiency. A typical DSP max/min instruction compares only two data items, so finding the maximum/minimum of a large data set takes many instructions.

2) Limited data granularity. Usually only one fixed-point or one floating-point format is supported. Take the ADI TS20XS series DSP as an example: a 32-bit fixed-point datum can be configured as one 32-bit, two 16-bit, or four 8-bit fixed-point values, so 8-, 16-, and 32-bit fixed-point formats are supported. In 8/16-bit mode, however, the instruction only takes the maximum/minimum of each corresponding pair of lanes; it cannot take the maximum/minimum across all eight 8-bit or all four 16-bit values held in two 32-bit operands.

3) The number of data items is not configurable. The maximum/minimum can only be taken from two data items; the operand count cannot be configured to let many data items take part in one maximum/minimum operation.

Modern radar signal processing, spaceborne satellite image processing, image compression, high-definition video, and similar fields involve large volumes of variable-size, high-density computation, which places ever higher demands on the arithmetic unit; maximum/minimum operations are becoming one of its major bottlenecks. Some existing patents and publications optimize maximum/minimum operations, but only at the level of reusing the ALU in scalar processors. In those designs, fixed-point and floating-point maximum/minimum operations are kept completely separate, and max/min networks specific to vector processors have not been studied further.

It is therefore necessary to analyze the similarity between fixed-point and floating-point data at the arithmetic level, use reconfigurable techniques to compare fixed-point data of different granularities, add extra control circuitry to the fixed-point datapath so that it can also compare floating-point data, set the number of data items taking part in the maximum/minimum operation through a dedicated register, and dedicate a set of configurable resources to vector maximum/minimum operations. The problem this invention addresses is to provide a vector maximum/minimum network that supports different granularities, data formats, and data counts, meeting the demand for intensive vector maximum/minimum computation in specific domains.

Note: throughout this document, the "/" in expressions such as "maximum/minimum", "1/2/4", "32/16/8", and "8/16/32" means "or"; this is not repeated below.

Summary of the invention

(1) Technical problem to be solved

In view of the above, the main object of the invention is to provide a length-configurable vector maximum/minimum network with reconfigurable fixed and floating point. It supports 8/16/32-bit signed/unsigned fixed-point data and simplified IEEE 754 32-bit single-precision floating-point data, lets a register configure how many data items take part in the vector maximum/minimum, and executes vector maximum/minimum operations directly. This speeds up maximum/minimum operations over large data sets and meets the demand for intensive vector maximum/minimum computation in specific domains.

(2) Technical solution

To achieve this object, the invention provides a length-configurable vector maximum/minimum network with reconfigurable fixed and floating point, comprising: a parallel floating-point data preprocessing unit 100, which analyzes the format of the received 512-bit vector data A, processes each data format accordingly, outputs the processed floating-point data to the reconfigurable comparator network 300, and outputs the resulting flag bits to the result selection unit 400; a Mask register 200, a configurable 64-bit register that controls which data take part in the maximum/minimum operation; a reconfigurable comparator network 300, which takes the floating-point data from the preprocessing unit 100 and the value of the Mask register 200 as inputs, compares the vector data stage by stage according to the Opcode, the FBS data-format option, the U option, the M option, and the Mask register value, and outputs the resulting maximum/minimum to the result selection unit 400; and a result selection unit 400, which receives the output of the reconfigurable comparator network 300 and produces the final vector maximum/minimum result according to the flag bits received from the preprocessing unit 100.

In the above scheme, the parallel floating-point data preprocessing unit 100 analyzes the format of the received 512-bit vector data A and handles each format separately. When A is in floating-point format, the unit checks the floating-point values for special cases, producing the not-a-number flag NaNFlag, the positive-infinity flag PosInfFlag, and the negative-infinity flag NegInfFlag, and bitwise-inverts negative floating-point values. When A is in fixed-point format, the fixed-point data are passed through unchanged.

In the above scheme, the Mask register 200 is a configurable 64-bit register that directly controls the 512-bit vector data A: each bit of the Mask register 200 governs one byte of A. When the M option is present, only the elements whose corresponding Mask bits are 1 take part in the maximum/minimum operation; when the M option is absent, the operation ignores the Mask register and all of A takes part.

In the above scheme, the reconfigurable comparator network 300 consists of cascaded 8-, 16-, and 32-bit comparators; each comparator outputs the maximum or the minimum of its inputs according to the input opcode.

In the above scheme, the reconfigurable comparator network 300 consists of multiple 32-bit comparators, one 16-bit comparator, and one 8-bit comparator. Besides its data inputs, each comparator receives the control signals U, M, and FBS. In 32-bit single-precision floating-point or 32-bit fixed-point mode, the 512-bit vector data pass through four levels of 32-bit comparators, yielding a 32-bit maximum/minimum. In 16-bit halfword fixed-point mode, the output of the fourth-level 32-bit comparator enters a 16-bit comparator, yielding a 16-bit maximum/minimum. In 8-bit byte mode, the output of the fifth-level 16-bit comparator enters an 8-bit comparator, yielding an 8-bit maximum/minimum. By adding the corresponding control signals to the comparator resources already required for the 8-bit fixed-point format, the same hardware is reconfigured to handle 16/32-bit fixed-point and simplified IEEE 754 32-bit single-precision floating-point data.

In the above scheme, the 4-to-1 multiplexer MUX3 614 in the result selection unit 400 is controlled by the FBS option. When FBS = 2'b00, the network operates in 32-bit fixed-point mode and MUX3 614 outputs the 32-bit result of the reconfigurable comparator network 300 directly. When FBS = 2'b10, the network operates in 8-bit fixed-point mode and MUX3 614 outputs the 8-bit result directly. When FBS = 2'b11, the network operates in 16-bit fixed-point mode and MUX3 614 outputs the 16-bit result directly. When FBS = 2'b01, the network operates on simplified 32-bit single-precision floating-point data, and the final vector maximum/minimum is selected according to the floating-point flags: if NaNFlag = 1, at least one of the 16 floating-point values is NaN, and the network outputs 32'hFFFF_FFFF; if PosInfFlag = 1 and Opcode = 1, the data contain positive infinity and the network is in Max mode, so it outputs positive infinity, 32'h7F80_0000; if NegInfFlag = 1 and Opcode = 0, the data contain negative infinity and the network is in Min mode, so it outputs negative infinity, 32'hFF80_0000; in all other cases, it outputs the 32-bit floating-point result of the reconfigurable comparator network 300.
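The selection rules above can be modelled in software as a short priority chain. This is a minimal Python sketch, not the patent's circuit: the function and constant names are hypothetical, the comparator outputs are passed in as already-computed integers, and testing NaN before the two infinity cases is an assumption about priority (the text lists the cases but does not say which wins when several flags are set).

```python
# Hypothetical behavioural model of result selection unit 400 (MUX3 614).
FBS_FIX32, FBS_FLOAT32, FBS_FIX8, FBS_FIX16 = 0b00, 0b01, 0b10, 0b11
OPCODE_MIN, OPCODE_MAX = 0, 1

NAN_OUT = 0xFFFF_FFFF   # output when any element is NaN
POS_INF = 0x7F80_0000   # IEEE 754 single-precision +infinity
NEG_INF = 0xFF80_0000   # IEEE 754 single-precision -infinity


def select_result(fbs, opcode, nan_flag, pos_inf_flag, neg_inf_flag,
                  res32, res16, res8):
    """Return the final scalar B for one Max/Min instruction."""
    # Fixed-point modes: pass the comparator result of matching width through.
    if fbs == FBS_FIX32:
        return res32
    if fbs == FBS_FIX8:
        return res8
    if fbs == FBS_FIX16:
        return res16
    # FBS == 0b01 (float): special-value flags override the comparator result.
    if nan_flag:
        return NAN_OUT
    if pos_inf_flag and opcode == OPCODE_MAX:
        return POS_INF
    if neg_inf_flag and opcode == OPCODE_MIN:
        return NEG_INF
    return res32
```

In fixed-point modes the floating-point flags are simply ignored, which matches the text: only FBS = 2'b01 consults them.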

(3) Beneficial effects

The length-configurable vector maximum/minimum network with reconfigurable fixed and floating point provided by the invention uses reconfigurable techniques to compare fixed-point data of different granularities, adds extra control logic to the fixed-point datapath to compare floating-point data, lets a register configure how many vector elements take part in the maximum/minimum operation, and executes vector maximum/minimum operations directly. It supports 8/16/32-bit signed/unsigned fixed-point data and simplified IEEE 754 32-bit single-precision floating-point data, with the effective vector length set by the Mask register. This speeds up intensive vector maximum/minimum computation in specific domains, reduces software programming complexity, increases code density, and improves the efficiency and flexibility of maximum/minimum operations on the processor.

Description of drawings

Fig. 1 is a schematic diagram of the vector maximum/minimum network with reconfigurable fixed and floating point and configurable data length according to an embodiment of the invention.

Fig. 2 shows the internal structure of the parallel floating-point data preprocessing unit 100 according to an embodiment of the invention.

Fig. 3 shows the internal structure of the reconfigurable comparator network 300 according to an embodiment of the invention.

Fig. 4 shows the internal structure of an 8-bit comparator supporting different data formats according to an embodiment of the invention.

Fig. 5 shows the internal structure of a reconfigurable 32-bit comparator supporting different data formats according to an embodiment of the invention.

Fig. 6 shows the internal structure of the result selection unit 400 according to an embodiment of the invention.

Detailed description of the embodiments

To make the objects, technical solution, and advantages of the invention clearer, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings.

The main features of the invention are a reconfigurable data format and a configurable data length. The following notation is used throughout. The max/min network instruction is written B = Max/Min A {(M)} {(U)} {(FBS)}, where B is 32-bit scalar data and A is 512-bit vector data. Opcode is a 1-bit operation code: 0 selects the minimum (Min) and 1 selects the maximum (Max). Mask is a configurable 64-bit register in which each bit governs one 8-bit byte of vector register A. M means the max/min operation is governed by the Mask register; when the M option is absent, the Mask register has no effect on the operation. U is the unsigned option. FBS is the 2-bit data-format field: "00" denotes 32-bit fixed point, "01" simplified 32-bit single-precision floating point, "10" 8-bit byte, and "11" 16-bit halfword. OpaValid/OpbValid indicate that operand Opa/Opb is valid and depend on the M option: when M is present, OpaValid/OpbValid = 1 marks Opa/Opb as valid; when M is absent, OpaValid/OpbValid are ignored and Opa/Opb are always valid.

In this embodiment A is assumed to be 512-bit vector data, but the invention applies whenever the width of A is a multiple of 32 bits. The width of the Mask register relates to the length of vector A by LengthMask = LengthA / 8.
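As a quick check on this width relationship, the following Python sketch derives the Mask width and the number of elements per FBS mode from the length of A. The helper and table names are hypothetical; the FBS encoding is the one given above. For the 512-bit case it yields a 64-bit Mask and 16, 32, or 64 elements for 32-, 16-, and 8-bit data.

```python
# FBS code -> (data kind, element width in bits), per the notation above.
FBS_MODES = {
    0b00: ("fixed", 32),
    0b01: ("float", 32),
    0b10: ("fixed", 8),
    0b11: ("fixed", 16),
}


def lane_geometry(length_a_bits, fbs):
    """Return (mask_bits, element_width, element_count) for vector A."""
    if length_a_bits % 32 != 0:
        raise ValueError("vector A must be a multiple of 32 bits wide")
    _, width = FBS_MODES[fbs]
    mask_bits = length_a_bits // 8          # LengthMask = LengthA / 8
    return mask_bits, width, length_a_bits // width
```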

Fig. 1 is a schematic diagram of the vector maximum/minimum network with reconfigurable fixed and floating point and configurable data length according to an embodiment of the invention. As shown there, the network comprises the parallel floating-point data preprocessing unit 100, the Mask register 200, the reconfigurable comparator network 300, and the result selection unit 400, connected in sequence, where:

The parallel floating-point data preprocessing unit 100 analyzes the format of the received 512-bit vector data A. When A is in floating-point format (FBS = 2'b01), it checks the floating-point values for special cases, producing special flags such as the not-a-number flag (NaNFlag), the positive-infinity flag (PosInfFlag), and the negative-infinity flag (NegInfFlag), and inverts negative floating-point values. When A is in fixed-point format, unit 100 passes the fixed-point data through unchanged. Unit 100 sends the processed data to the reconfigurable comparator network 300 and the special flags to the result selection unit 400.

The Mask register 200 is a configurable 64-bit register that controls which data take part in the maximum/minimum operation. It directly governs the 512-bit vector data A, each of its bits controlling one byte of A. When the M option is present, only the elements whose corresponding Mask bits are 1 take part in the operation; when the M option is absent, the operation ignores the Mask register and all 512 bits of vector data take part.
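The text specifies one Mask bit per byte but not how the bits of a 16-bit or 32-bit element combine into a single valid signal (OpaValid/OpbValid). The Python sketch below assumes that an element takes part only when all Mask bits covering its bytes are 1, and that Mask bit i maps to byte i of A counting from the least significant byte. Both are illustrative assumptions, and `element_valid` is a hypothetical name.

```python
def element_valid(mask, m_option, width, length_a_bits=512):
    """Return one valid flag per width-bit element of vector A.

    Assumptions: Mask bit i governs byte i of A (bit 0 = lowest byte), and a
    multi-byte element is valid only if every Mask bit over its bytes is 1.
    """
    bytes_per_elem = width // 8
    n_elems = (length_a_bits // 8) // bytes_per_elem
    if not m_option:
        # Without the (M) option every element takes part.
        return [True] * n_elems
    elem_bits = (1 << bytes_per_elem) - 1
    return [((mask >> (i * bytes_per_elem)) & elem_bits) == elem_bits
            for i in range(n_elems)]
```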

The reconfigurable comparator network 300 takes the data processed by unit 100 and the value of the Mask register as inputs, compares the vector data stage by stage according to the Opcode, the FBS data-format option, the U option, the M option, and the Mask register value to obtain the maximum/minimum, and outputs the result to the result selection unit 400.

The result selection unit 400 receives the output of the reconfigurable comparator network 300 and produces the final vector maximum/minimum result according to flags such as NaNFlag, PosInfFlag, and NegInfFlag received from the parallel floating-point data preprocessing unit 100.

The max/min network with reconfigurable fixed and floating point and configurable data length is described in detail below with reference to Figs. 2 to 6. The implementation combines parallel, reconfigurable, and configurable design: the parallel floating-point data preprocessing unit 100 achieves parallelism through 16 identical hardware instances; four 8-bit comparators can be reconfigured as one 32-bit, two 16-bit, or four 8-bit comparators; and the value of the Mask register sets the vector length.
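The internal structure of the 8-bit and 32-bit comparators (Figs. 4 and 5) is not described in this section. One common way to let four 8-bit slice comparators act as one 32-bit, two 16-bit, or four 8-bit comparators is to chain each slice's (greater, equal) outputs from the least significant slice upward within a lane, and to flip the sign bit of each lane's top slice for signed comparison. The Python sketch below models that technique under those assumptions; it is not the patent's circuit, and the function names are hypothetical.

```python
def slice_compare(a8, b8):
    """One unsigned 8-bit slice comparator: returns (a_greater, equal)."""
    return a8 > b8, a8 == b8


def reconfigurable_greater(a, b, width, signed):
    """Lane-wise a > b for 32-bit words split into `width`-bit lanes.

    Built only from four 8-bit slice comparisons. Returns one bool per lane,
    lane 0 being the least significant.
    """
    slices = [slice_compare((a >> (8 * i)) & 0xFF, (b >> (8 * i)) & 0xFF)
              for i in range(4)]
    per_lane = width // 8
    results = []
    for lane in range(32 // width):
        lo = lane * per_lane
        hi = lo + per_lane - 1              # most significant slice of lane
        if signed:
            # Flipping the sign bit maps two's complement to offset binary,
            # so an unsigned compare of the top slice gives signed order.
            ta = ((a >> (8 * hi)) & 0xFF) ^ 0x80
            tb = ((b >> (8 * hi)) & 0xFF) ^ 0x80
            lane_slices = slices[lo:hi] + [slice_compare(ta, tb)]
        else:
            lane_slices = slices[lo:hi + 1]
        greater = False
        # A higher slice that differs overrides every slice below it.
        for gt, eq in lane_slices:
            greater = gt or (eq and greater)
        results.append(greater)
    return results
```

For example, `reconfigurable_greater(0x0180_7F00, 0x017F_8000, 8, True)` compares four signed bytes and reports that only byte 1 (0x7F versus 0x80, that is 127 versus -128) is greater.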

Fig. 2 shows the internal structure of the parallel floating-point preprocessing unit according to an embodiment of the invention. The unit comprises, connected in sequence, a vector decomposition unit 110, 16 identical floating-point flag generation units 120, and a vector floating-point result flag generation unit 140.

The vector decomposition unit 110 splits the input 512-bit vector data A into 16 32-bit scalar floating-point values A_0 to A_15 and sends them to the 16 identical floating-point flag generation units 120.

Each floating-point flag generation unit 120 analyzes one 32-bit single-precision floating-point value, determines whether it is a special case such as NaN or infinity, and inverts negative values. Inside unit 120, the sign/exponent/mantissa separation unit 121 splits the 32-bit value into its sign bit, exponent, and mantissa. The exponent goes to the exponent comparator 122, which outputs Exp_0 = 1 when the exponent is 0 and Exp_255 = 1 when the exponent is 255; for any other exponent both Exp_0 and Exp_255 are 0. The mantissa comparator 123 receives the 23-bit mantissa from unit 121 (the stored fraction, without the implicit leading 1) and outputs Mant_0 = 1 when the mantissa is 0 and Mant_0 = 0 otherwise.

At the same time, the 31-bit exponent-and-mantissa field is zero-extended to 32 bits and sent along a separate path to the inversion circuit 124 and the selector MUX0 128. MUX0 128 is controlled by the floating-point sign bit: when the sign bit is 1, it outputs the inverted exponent and mantissa; otherwise it outputs them unchanged. The selector MUX1 129 takes the output of MUX0 128 and the constant 0 as inputs and is controlled by the output Exp_0 of the exponent comparator 122: when Exp_0 = 1, the value is treated as 0 and MUX1 129 outputs 0; otherwise it outputs the normal, nonzero 32-bit data. The output of MUX1 129 is the preprocessed 32-bit floating-point value DisFloat_0. The signals Exp_255 and Mant_0 enter the NaN decision logic 130 and the infinity decision logic 126: when Exp_255 = 1 and Mant_0 = 0, logic 130 outputs NaN_0 = 1, meaning the value is NaN; when Exp_255 = 1 and Mant_0 = 1, logic 126 outputs 1, meaning the value is infinite. The positive-infinity decision logic 131 and the negative-infinity decision logic 132 take the output of logic 126 and the floating-point sign bit as inputs and generate the positive-infinity flag PosInf_0 and the negative-infinity flag NegInf_0. At this point the special-value flags of each floating-point value have been generated and the preprocessed floating-point data obtained.
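The preprocessing just described turns each IEEE 754 single-precision value into a 32-bit pattern that orders correctly when compared as a signed two's-complement integer: positive values keep their exponent-and-mantissa bits, negative values become the bitwise inverse (so larger magnitudes become more negative), and any value with exponent 0 becomes 0. Below is a minimal Python model of one unit 120. The function names are hypothetical, and `struct` is used only to obtain the bit pattern of a Python float.

```python
import struct


def float_bits(x):
    """IEEE 754 single-precision bit pattern of a Python float."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def preprocess_float(bits):
    """Model of one flag generation unit 120.

    Returns (dis_float, nan, pos_inf, neg_inf); dis_float is a 32-bit
    pattern intended to be compared as a signed integer.
    """
    sign = bits >> 31
    exp = (bits >> 23) & 0xFF
    mant = bits & 0x7F_FFFF                       # 23-bit stored fraction
    exp_0, exp_255, mant_0 = exp == 0, exp == 255, mant == 0
    body = bits & 0x7FFF_FFFF                     # exponent+mantissa, zero-extended
    mux0 = (~body & 0xFFFF_FFFF) if sign else body    # invert when negative
    dis_float = 0 if exp_0 else mux0              # zero/subnormal -> 0 (MUX1)
    nan = exp_255 and not mant_0
    inf = exp_255 and mant_0
    return dis_float, nan, inf and not sign, inf and bool(sign)
```

Interpreted as signed integers, the `dis_float` values of -3.5, -1.0, 0.0, 0.25, 2.0 come out strictly increasing, which is why the fixed-point comparator tree can be reused for floating-point data.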

The vector floating-point result flag generation unit 140 derives the flags for the whole vector, and the vector floating-point data, from the per-value flags and preprocessed values. The vector NaN flag generation unit 141 is a 16-input OR gate that takes the per-value flags NaN_0 to NaN_15 and outputs the vector NaN flag NaNFlag. The vector positive-infinity flag generation unit 142 and the vector negative-infinity flag generation unit 143 are likewise 16-input OR gates; they take PosInf_0 to PosInf_15 and NegInf_0 to NegInf_15 respectively and produce the vector flags PosInfFlag and NegInfFlag. The vector combination unit 144 concatenates the preprocessed values DisFloat_0 to DisFloat_15 into 512-bit vector data.
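The vector-level steps (decomposition unit 110, the three 16-input OR gates 141 to 143, and combination unit 144) reduce to bit slicing and an OR over the per-lane flags. In this Python sketch the function names are hypothetical, and the lane order is an assumption: A_0 is taken to be the least significant 32 bits of A.

```python
def split_vector(a, n_lanes=16, width=32):
    """Unit 110 (assumed order): lane i holds bits [32*i + 31 : 32*i] of A."""
    lane_mask = (1 << width) - 1
    return [(a >> (width * i)) & lane_mask for i in range(n_lanes)]


def combine_vector(lanes, width=32):
    """Unit 144: concatenate lanes back into one wide vector."""
    out = 0
    for i, lane in enumerate(lanes):
        out |= (lane & ((1 << width) - 1)) << (width * i)
    return out


def vector_flags(per_lane_flags):
    """Units 141-143: OR each (nan, pos_inf, neg_inf) flag across all lanes."""
    nan = any(f[0] for f in per_lane_flags)
    pos_inf = any(f[1] for f in per_lane_flags)
    neg_inf = any(f[2] for f in per_lane_flags)
    return nan, pos_inf, neg_inf
```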

This completes floating-point vector preprocessing: the vector's flags and the preprocessed floating-point data are available. In fixed-point mode, the floating-point data preprocessing unit routes the data through a separate path, so the fixed-point vector data reach the next unit without any preprocessing.

The Mask register 200 is a 64-bit user-configurable register. Each of its 64 bits governs one 8-bit byte of the 512-bit vector data. When the Mask register is in effect (that is, the M option is present), only vector data whose corresponding Mask bits are 1 take part in the maximum/minimum operation; otherwise all vector data take part.

FIG. 3 is an internal structure diagram of the reconfigurable comparator network 300 according to an embodiment of the present invention. The network 300 consists of cascaded 8/16/32-bit comparators. Each comparator outputs the maximum or the minimum of its inputs according to the input operation code (maximum/minimum).

- Four stages of 32-bit comparators yield the 32-bit maximum/minimum.
- A fifth stage with one 16-bit comparator yields the 16-bit maximum/minimum.
- A sixth stage with one 8-bit comparator yields the 8-bit maximum/minimum.

A single set of comparator resources therefore supports comparison of 8/16/32-bit signed/unsigned fixed-point data and 32-bit reduced single-precision floating-point data, and produces the final maximum/minimum.

The comparator network 300 consists of multiple 32-bit comparators, one 16-bit comparator and one 8-bit comparator. Besides its data inputs, each comparator also receives control signals such as U (signed/unsigned), M (mask) and FBS (data format).

- In 32-bit single-precision floating-point or 32-bit fixed-point mode, the 512-bit vector data pass through four layers of 32-bit comparators, yielding a 32-bit maximum/minimum.
- In 16-bit half-word fixed-point mode, the output of the fourth-layer 32-bit comparator enters one 16-bit comparator, yielding a 16-bit maximum/minimum.
- In 8-bit byte mode, the output of the fifth-layer 16-bit comparator enters one 8-bit comparator, yielding an 8-bit maximum/minimum.

The comparator resources needed for the 8-bit fixed-point format, plus the corresponding control signals, make the network reconfigurable across several data formats. These include 16/32-bit fixed point and 32-bit IEEE 754 reduced single-precision floating point.
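The stage structure can be checked with a behavioral model. This sketch illustrates the idea and is not the patent's netlist. It treats each 32-bit comparator as independent lane-wise comparisons: one 32-bit lane, two 16-bit lanes, or four 8-bit lanes. Four stages then reduce sixteen words to one, and the fifth and sixth stages fold the remaining half-words and bytes. Ties keep Opa, matching the Sel = "11" convention described for the 8-bit comparator. Floating-point mode would correspond to `width=32, signed=True` on preprocessed words, but that mapping is an inference. All function names are hypothetical.

```python
from typing import List


def to_signed(x: int, bits: int) -> int:
    """Interpret an unsigned bit pattern as a two's-complement value."""
    return x - (1 << bits) if (x >> (bits - 1)) & 1 else x


def simd_select(a: int, b: int, container: int, width: int,
                signed: bool, opcode_max: bool) -> int:
    """One comparator: independent max/min in each width-bit lane of a
    container-bit word. Ties keep Opa (a)."""
    lane_mask = (1 << width) - 1
    out = 0
    for shift in range(0, container, width):
        x, y = (a >> shift) & lane_mask, (b >> shift) & lane_mask
        kx = to_signed(x, width) if signed else x
        ky = to_signed(y, width) if signed else y
        keep_a = kx >= ky if opcode_max else kx <= ky
        out |= (x if keep_a else y) << shift
    return out


def network_max_min(words: List[int], width: int, signed: bool,
                    opcode_max: bool) -> int:
    """Reduce sixteen 32-bit words (one 512-bit vector) to one width-bit result."""
    assert len(words) == 16 and width in (8, 16, 32)
    level = list(words)
    while len(level) > 1:  # stages 1-4: 8, 4, 2, then 1 32-bit comparators
        level = [simd_select(level[i], level[i + 1], 32, width, signed, opcode_max)
                 for i in range(0, len(level), 2)]
    word = level[0]
    if width <= 16:  # stage 5: one 16-bit comparator, low half vs high half
        word = simd_select(word & 0xFFFF, word >> 16, 16, width, signed, opcode_max)
    if width == 8:   # stage 6: one 8-bit comparator, low byte vs high byte
        word = simd_select(word & 0xFF, word >> 8, 8, 8, signed, opcode_max)
    return word
```

The same fifteen 32-bit comparators serve every format. Only the lane width changes, and the extra 16- and 8-bit stages are used only when the element is narrower than 32 bits.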

The 16/32-bit comparators are built by cascading basic 8-bit comparators. FIG. 4 is an internal structure diagram of the 8-bit comparator supporting different data formats according to an embodiment of the present invention.

- The adder 412 computes the difference Opa - Opb of the inputs Opa and Opb.
- The first logic circuit 413 receives the adder output and the signed/unsigned data option U. It generates the flags of the Opa - Opb difference: carry flag (CF), overflow flag (OF) and negative flag (NF).
- The second logic circuit 414 receives CF, OF and NF together with the control signals U, M, Opcode, OpaValid and OpbValid, and generates the 2-bit result selection signal Sel[1:0].
- Sel[1:0] is encoded as follows: "00" means the output is invalid, "01" means Opa should be output, and "10" means Opb should be output. "11" means the two data are equal, in which case Opa is output.
- Sel[0] drives the MUX selector (417) as the control signal that selects the comparator output.
- In parallel, OpaValid, OpbValid and the M option reach the comparator result valid generation unit (416) through another path. This unit produces the comparator result valid control signal ResultValid, where $\mathit{ResultValid} = \mathit{OpaValid} \lor \mathit{OpbValid} \lor \overline{M}$.

The second logic circuit 414 generates the result selection signal Sel through combinational logic. Sel satisfies the following truth table:

Table 1: Truth table for Sel generation in the 8-bit comparator

Note: Opcode = 0 selects the minimum and Opcode = 1 selects the maximum; X means don't care.
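Table 1 is not reproduced in this text. This Python sketch therefore derives Sel[1:0] from the signal meanings stated above and does not transcribe the table. The following are assumptions, not taken from the source:

- the difference is formed as Opa + ~Opb + 1;
- equality is detected with a zero test on the difference (the text lists only CF, OF and NF);
- the M/valid cases take priority over the comparison;
- the name `cmp8` is hypothetical.

```python
def cmp8(opa: int, opb: int, unsigned: bool, opcode: int,
         opa_valid: bool = True, opb_valid: bool = True, m: bool = False):
    """Return (Sel[1:0], ResultValid, output byte) for one 8-bit comparator."""
    # Adder 412: Opa - Opb as Opa + ~Opb + 1, kept to 9 bits so the carry survives.
    total = opa + ((~opb) & 0xFF) + 1
    res = total & 0xFF
    cf = (total >> 8) & 1                          # 1 = no borrow, i.e. Opa >= Opb unsigned
    nf = (res >> 7) & 1                            # sign bit of the difference
    of = (((opa ^ opb) & (opa ^ res)) >> 7) & 1    # signed overflow of the subtraction
    zero = res == 0                                # assumption: explicit equality test
    a_lt_b = (not cf) if unsigned else bool(nf ^ of)

    if m and not opa_valid and not opb_valid:
        sel = 0b00                                 # neither operand participates
    elif m and not opb_valid:
        sel = 0b01                                 # only Opa participates
    elif m and not opa_valid:
        sel = 0b10                                 # only Opb participates
    elif zero:
        sel = 0b11                                 # equal: Opa is still output
    elif opcode == 1:                              # maximum
        sel = 0b10 if a_lt_b else 0b01
    else:                                          # minimum
        sel = 0b01 if a_lt_b else 0b10
    result_valid = opa_valid or opb_valid or not m  # unit 416
    out = opa if sel & 1 else opb                  # MUX 417 driven by Sel[0]; don't-care if invalid
    return sel, result_valid, out
```

The signed test `NF xor OF` is the standard two's-complement "less than" condition, and `not CF` is its unsigned counterpart. This is why the same adder serves both settings of U.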

FIG. 5 is an internal structure diagram of a reconfigurable 32-bit comparator supporting different data formats according to an embodiment of the present invention. The 32-bit comparator consists of four of the basic 8-bit comparators shown in FIG. 4 plus some control logic.

- The four basic 8-bit comparators (511, 512, 513, 514) operate in parallel.
- The fourth logic circuit 516 concatenates their ResultValid signals (ResultValid0-ResultValid3) into the 4-bit result-valid signal of the 32-bit comparator, ResultValid[3:0] = {ResultValid3, ResultValid2, ResultValid1, ResultValid0}.
- The third logic circuit 515 receives the Sel signals of the four basic 8-bit comparators (Sel0-Sel3) and the FBS option, and generates the selection signal Sel[3:0] of the 32-bit comparator.
- Each bit of Sel[3:0] indicates whether the corresponding byte of the 32-bit comparator output comes from Opa (bit = 1) or from Opb (bit = 0).

Sel[3:0] is generated according to the following truth table:

Table 2: Logic table for Sel generation in the 32-bit comparator
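Table 2 is also not reproduced. The Python sketch below shows one plausible way to assemble Sel[3:0] and is not the table itself. Each lane is decided by its most significant differing byte. A lane is four bytes for FBS = 2'b00/2'b01, two bytes for 2'b11 and one byte for 2'b10. Only the top byte of a lane is compared as signed. The FBS encodings follow the result selection unit description (FIG. 6). `sel32` and `merge32` are hypothetical names, and mask handling is omitted.

```python
FBS_FIX32, FBS_FLOAT32, FBS_FIX8, FBS_FIX16 = 0b00, 0b01, 0b10, 0b11


def byte_order(a: int, b: int, signed: bool) -> int:
    """-1, 0 or +1 for a < b, a == b, a > b on a single byte."""
    if signed:
        a, b = a - 256 * (a >> 7), b - 256 * (b >> 7)
    return (a > b) - (a < b)


def sel32(opa: int, opb: int, fbs: int, unsigned: bool, opcode_max: bool) -> int:
    """Sel[3:0]: bit i = 1 takes byte i from Opa, 0 takes it from Opb."""
    lane = {FBS_FIX32: 4, FBS_FLOAT32: 4, FBS_FIX16: 2, FBS_FIX8: 1}[fbs]
    sel = 0
    for start in range(0, 4, lane):
        order = 0
        for i in reversed(range(start, start + lane)):  # most significant byte first
            top = i == start + lane - 1
            # Assumption: only the lane's top byte carries the sign; preprocessed
            # floats compare as signed 32-bit integers regardless of U.
            signed = top and (fbs == FBS_FLOAT32 or not unsigned)
            order = byte_order((opa >> (8 * i)) & 0xFF, (opb >> (8 * i)) & 0xFF, signed)
            if order:
                break
        keep_a = order >= 0 if opcode_max else order <= 0  # ties keep Opa
        if keep_a:
            sel |= ((1 << lane) - 1) << start
    return sel


def merge32(opa: int, opb: int, sel: int) -> int:
    """Byte i of the output comes from Opa when Sel[i] = 1, else from Opb."""
    out = 0
    for i in range(4):
        src = opa if (sel >> i) & 1 else opb
        out |= src & (0xFF << (8 * i))
    return out
```

In 8-bit mode each byte is decided independently, so a single 32-bit comparator acts as four byte comparators working side by side.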

FIG. 6 is an internal structure diagram of the result selection unit 400 according to an embodiment of the present invention. The result selection unit 400 derives the final vector maximum/minimum from two inputs: the floating-point special-case flags NaNFlag, PosInfFlag and NegInfFlag, and the 8/16/32-bit maximum/minimum results of the reconfigurable comparator network. Referring to FIG. 1, the unit selects the final vector maximum/minimum from the 8/16/32-bit results output by the reconfigurable comparator 300 and the floating-point flag signals (NaNFlag, PosInfFlag, NegInfFlag) output by the parallel floating-point data preprocessing unit 100.
The 4-to-1 selector MUX3 614 is controlled by the FBS option:

- FBS=2'b00: the network operates in 32-bit fixed-point mode, and MUX3 614 directly outputs the 32-bit result of the reconfigurable comparator 300.
- FBS=2'b10: the network operates in 8-bit fixed-point mode, and MUX3 614 directly outputs the 8-bit result.
- FBS=2'b11: the network operates in 16-bit fixed-point mode, and MUX3 614 directly outputs the 16-bit result.
- FBS=2'b01: the network operates in the 32-bit IEEE 754 reduced single-precision floating-point format, and selects the final vector maximum/minimum according to the floating-point flag signals.

In floating-point mode the output is chosen as follows:

- If NaNFlag=1, at least one of the 16 floating-point elements is NaN, and the network outputs 32'hFFFF_FFFF.
- If PosInfFlag=1 and Opcode=1, the data contain positive infinity and the network is in maximum (Max) mode, so it outputs positive infinity 32'h7F80_0000.
- If NegInfFlag=1 and Opcode=0, the data contain negative infinity and the network is in minimum (Min) mode, so it outputs negative infinity 32'hFF80_0000.
- Otherwise, it outputs the 32-bit floating-point result of the reconfigurable comparator 300.
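The selection rules for MUX3 614 reduce to a small priority function. This Python sketch evaluates the floating-point checks in the order the text lists them: NaN, then +inf in Max mode, then -inf in Min mode. That priority is an assumption, as is the name `select_result`. The 8- and 16-bit results are passed through without extension.

```python
FBS_FIX32, FBS_FLOAT32, FBS_FIX8, FBS_FIX16 = 0b00, 0b01, 0b10, 0b11
NAN_OUT, POS_INF, NEG_INF = 0xFFFFFFFF, 0x7F800000, 0xFF800000


def select_result(fbs: int, opcode: int, nan_flag: bool, pos_inf_flag: bool,
                  neg_inf_flag: bool, r8: int, r16: int, r32: int) -> int:
    """Behavioral model of MUX3 614 plus the floating-point special cases."""
    if fbs == FBS_FIX32:
        return r32
    if fbs == FBS_FIX8:
        return r8
    if fbs == FBS_FIX16:
        return r16
    # FBS = 2'b01: floating point; special values override the comparator result.
    if nan_flag:
        return NAN_OUT
    if pos_inf_flag and opcode == 1:   # Max mode and some element is +inf
        return POS_INF
    if neg_inf_flag and opcode == 0:   # Min mode and some element is -inf
        return NEG_INF
    return r32
```

Infinities are handled only when they can win. A +inf element in Min mode falls through to the comparator result, because the comparator orders it like any other value.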

Based on the vector maximum/minimum network shown in FIGS. 1 to 6, which supports fixed/floating-point reconfiguration and a configurable vector length, the present invention further provides a comparison method that is fixed-point reconfigurable and data-length configurable. The method is characterized in that it comprises the following steps:

8/16/32-bit fixed-point reconfiguration: 8-bit fixed-point data form the basic unit. Two 8-bit units plus the corresponding control logic are recombined into 16-bit fixed-point data, and four 8-bit units plus the corresponding control logic are recombined into 32-bit fixed-point data;

Fixed/floating-point reconfiguration: the sign bit determines whether a floating-point datum is inverted. When the sign bit is 1, the exponent and the mantissa are each inverted while the sign bit stays 1, and the sign, exponent and mantissa form new 32-bit data. When the sign bit is 0, the floating-point datum is left unchanged. After this sign-bit processing, floating-point data can reuse the fixed-point data path;
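This transform lets floating-point data reuse the fixed-point path for a simple reason. After it, the 32-bit patterns, read as two's-complement integers, are ordered the same way as the (non-NaN, distinct) floating-point values they encode. The Python sketch below checks that property. `preprocess` is a hypothetical name. XOR with 0x7FFFFFFF is one way to invert the exponent and mantissa while keeping the sign.

```python
import struct


def float_bits(x: float) -> int:
    """IEEE 754 single-precision bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def preprocess(bits: int) -> int:
    """Sign 1: invert exponent and mantissa (bits 0-30), keep the sign; sign 0: unchanged."""
    return bits ^ 0x7FFFFFFF if bits >> 31 else bits


def as_int32(u: int) -> int:
    """Read a 32-bit pattern as a signed (two's-complement) integer."""
    return u - (1 << 32) if u >> 31 else u
```

For a negative value with exponent/mantissa field e, the result reads as the integer -1 - e. Larger magnitudes therefore become more negative, which is exactly the floating-point order.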

Configurable data length, implemented through the Mask register: each bit of the Mask register controls one bit field of the data. The length of the data taking part in the operation is configured by setting the Mask register value.

The specific embodiments above further detail the objectives, technical solutions and beneficial effects of the present invention. These are merely specific embodiments and are not intended to limit the invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention falls within its scope of protection.

Claims (12)

1. A length-configurable vector maximum/minimum network supporting fixed/floating-point reconfiguration, characterized in that it comprises:
a parallel floating-point data preprocessing unit (100), configured to analyze the format of the received 512-bit vector data A, process each data format accordingly, output the resulting floating-point data to a reconfigurable comparator network (300), and output the resulting flag bits to a result selection unit (400);
a Mask register (200), which is a 64-bit configurable Mask register for controlling which data take part in the maximum/minimum comparison;
the reconfigurable comparator network (300), configured to take as inputs the floating-point data received from the parallel floating-point data preprocessing unit (100) and the value received from the Mask register (200); to compare the vector data successively according to the Opcode operation code, the FBS data-format option, the U option, the M option and the value of the Mask register; and to output the resulting maximum/minimum value to the result selection unit (400); and
the result selection unit (400), configured to receive the output of the reconfigurable comparator network (300) and to output the final vector maximum/minimum result according to the flag bits received from the parallel floating-point data preprocessing unit (100).
2. The length-configurable vector maximum/minimum network supporting fixed/floating-point reconfiguration according to claim 1, characterized in that the parallel floating-point data preprocessing unit (100) analyzing the format of the received 512-bit vector data A and processing each data format accordingly comprises:
the parallel floating-point data preprocessing unit (100) analyzes the format of the received 512-bit vector data A. When the 512-bit vector data A is in floating-point format, the unit performs special-value analysis on the floating-point data to obtain the NaN flag NaNFlag, the positive-infinity flag PosInfFlag and the negative-infinity flag NegInfFlag, and inverts negative floating-point data. When the 512-bit vector data A is in fixed-point format, the unit outputs the fixed-point data directly.
3. The length-configurable vector maximum/minimum network supporting fixed/floating-point reconfiguration according to claim 2, characterized in that the parallel floating-point data preprocessing unit (100) comprises, connected in sequence, a vector decomposition unit (110), 16 identical floating-point flag generation units (120) and a vector floating-point result flag generation unit (140), wherein:
the vector decomposition unit (110) decomposes the input 512-bit vector data A into sixteen 32-bit scalar floating-point data A_0-A_15 and delivers them to the 16 identical floating-point flag generation units (120), respectively;
each floating-point flag generation unit (120) analyzes one 32-bit single-precision floating-point datum, determines whether it is NaN or infinity, and inverts negative floating-point data;
the vector floating-point result flag generation unit (140) obtains the vector-wide floating-point flags and the vector floating-point data from the flags and the processed data of each floating-point element.
4. The length-configurable vector maximum/minimum network supporting fixed/floating-point reconfiguration according to claim 3, characterized in that, in the floating-point flag generation unit (120), a sign/exponent/mantissa separation unit (121) splits the 32-bit floating-point datum into sign bit, exponent and mantissa. The exponent is sent to an exponent comparator (122), which outputs Exp_0=1 when the exponent is 0 and Exp_255=1 when the exponent is 255; for any other exponent value, both Exp_0 and Exp_255 are 0. A mantissa comparator (123) receives the 23-bit mantissa output by the separation unit (121) and outputs Manti_0=1 when the 23-bit mantissa is 0, and Manti_0=0 otherwise. At the same time, the 31-bit exponent and mantissa, zero-extended at the high end to 32 bits, are routed through another path into an inversion circuit (124) and a MUX0 selector (128). The control signal of the MUX0 selector (128) is the floating-point sign bit: when the sign bit is 1, MUX0 outputs the inverted exponent and mantissa; otherwise it outputs the exponent and mantissa before inversion. A MUX1 selector (129) takes the output of the MUX0 selector (128) and 0 as its inputs, and its control signal is the output Exp_0 of the exponent comparator (122). When Exp_0=1, the floating-point datum is treated as 0 and MUX1 (129) outputs 0; otherwise it outputs the normal 32-bit non-zero value. The MUX1 selector (129) thus yields the preprocessed 32-bit floating-point datum DisFloat_0. The signals Exp_255 and Manti_0 enter a NaN decision logic unit (130) and an infinity decision logic unit (126). When Exp_255=1 and Manti_0=0, the NaN decision logic unit (130) outputs NaN_0=1, indicating that the floating-point datum is NaN. When Exp_255=1 and Manti_0=1, the infinity decision logic unit (126) outputs 1, indicating that the floating-point datum is infinite. A positive-infinity decision logic unit (131) and a negative-infinity decision logic unit (132) receive the output of the infinity decision logic unit (126) and the floating-point sign bit as inputs, and generate the positive-infinity flag PosInf_0 and the negative-infinity flag NegInf_0, respectively. At this point, the special-value flags of each floating-point datum have been generated and the preprocessed floating-point datum has been obtained.
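The single-element datapath of claim 4 can be modeled bit for bit in Python. Zero-extending the 31 exponent/mantissa bits and then inverting all 32 bits sets bit 31 to 1. This reproduces the "sign bit stays 1" behavior of the method. The function name and dictionary keys are hypothetical; only the field tests and multiplexer choices come from the claim.

```python
from typing import Dict


def element_flags(x_bits: int) -> Dict[str, int]:
    """Behavioral model of one floating-point flag generation unit (120)."""
    sign = (x_bits >> 31) & 1            # unit 121: split sign, exponent, mantissa
    exp = (x_bits >> 23) & 0xFF
    mant = x_bits & 0x7FFFFF
    exp_0 = int(exp == 0)                # exponent comparator 122
    exp_255 = int(exp == 255)
    manti_0 = int(mant == 0)             # mantissa comparator 123
    low31 = x_bits & 0x7FFFFFFF          # exponent + mantissa, zero-extended to 32 bits
    inverted = (~low31) & 0xFFFFFFFF     # inversion circuit 124 (bit 31 becomes 1)
    mux0 = inverted if sign else low31   # MUX0 128, controlled by the sign bit
    dis_float = 0 if exp_0 else mux0     # MUX1 129: zero exponent treated as 0
    nan = int(exp_255 and not manti_0)   # NaN decision logic 130
    inf = int(exp_255 and manti_0)       # infinity decision logic 126
    return {
        "DisFloat": dis_float,
        "NaN": nan,
        "PosInf": int(inf and not sign),  # positive-infinity logic 131
        "NegInf": int(inf and sign),      # negative-infinity logic 132
    }
```

Because MUX1 forces any exponent-zero input to 0, both signed zeros and subnormals collapse to the same key.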
5. The length-configurable vector maximum/minimum network supporting fixed/floating-point reconfiguration according to claim 3, characterized in that, in the vector floating-point result flag generation unit (140), the vector NaN flag generation unit (141) is a 16-input OR gate that receives the per-element flags NaN_0-NaN_15 and outputs the vector NaN flag NaNFlag. The vector positive-infinity flag generation unit (142) and the vector negative-infinity flag generation unit (143) are 16-input OR gates that receive PosInf_0-PosInf_15 and NegInf_0-NegInf_15, respectively, and output the vector positive-infinity flag PosInfFlag and the vector negative-infinity flag NegInfFlag. The vector combination unit (144) combines the preprocessed floating-point data DisFloat_0-DisFloat_15 into 512-bit vector data.
6. The length-configurable vector maximum/minimum network supporting fixed/floating-point reconfiguration according to claim 1, characterized in that the Mask register (200) controlling which data take part in the maximum/minimum operation comprises:
the Mask register (200) is a 64-bit configurable register that directly controls the 512-bit vector data A, each bit of the Mask register (200) controlling one byte of the 512-bit vector data A. When the M option is valid, only the units whose corresponding Mask bit is 1 take part in the maximum/minimum operation. When the M option is absent, the Mask register does not affect the maximum/minimum operation, and all of the 512-bit vector data A take part.
7. The length-configurable vector maximum/minimum network supporting fixed/floating-point reconfiguration according to claim 1, characterized in that the reconfigurable comparator network (300) consists of cascaded 8/16/32-bit comparators, each comparator obtaining the corresponding maximum/minimum value according to the input operation code.
8. The length-configurable vector maximum/minimum network supporting fixed/floating-point reconfiguration according to claim 7, characterized in that, in the reconfigurable comparator network (300), four stages of 32-bit comparators yield a 32-bit maximum/minimum value. Adding one 16-bit comparator as a fifth stage yields a 16-bit maximum/minimum value, and adding one 8-bit comparator as a sixth stage yields an 8-bit maximum/minimum value.
9. The length-configurable vector maximum/minimum network supporting fixed/floating-point reconfiguration according to claim 7, characterized in that, in the reconfigurable comparator network (300), a single set of comparator resources supports comparison of 8/16/32-bit signed/unsigned fixed-point data and 32-bit IEEE 754 reduced single-precision floating-point data, and obtains the final maximum/minimum value.
10. The length-configurable vector maximum/minimum network supporting fixed/floating-point reconfiguration according to claim 1, characterized in that the reconfigurable comparator network (300) consists of multiple 32-bit comparators, one 16-bit comparator and one 8-bit comparator. Besides its data inputs, each comparator also receives the control signals U, M and FBS. In 32-bit single-precision floating-point or 32-bit fixed-point mode, the 512-bit vector data pass through four layers of 32-bit comparators to obtain a 32-bit maximum/minimum value. In 16-bit half-word fixed-point mode, the output of the fourth-layer 32-bit comparator enters one 16-bit comparator to obtain a 16-bit maximum/minimum value. In 8-bit byte mode, the output of the fifth-layer 16-bit comparator enters one 8-bit comparator to obtain an 8-bit maximum/minimum value. The comparator resources required for the 8-bit fixed-point data format, plus the corresponding control signals, make the network reconfigurable across multiple data formats, including 16/32-bit fixed point and 32-bit IEEE 754 reduced single-precision floating point.
11. The length-configurable vector maximum/minimum network supporting fixed/floating-point reconfiguration according to claim 10, characterized in that the 16/32-bit comparators are formed by cascading 8-bit comparators. An adder (412) computes the difference between the inputs Opa and Opb. A first logic circuit (413) receives the adder output and the signed/unsigned data option U, and generates the flags of the Opa - Opb difference: carry flag CF, overflow flag OF and negative flag NF. A second logic circuit (414) receives the flags CF, OF and NF and the control signals U, M, Opcode, OpaValid and OpbValid, and generates the 2-bit result selection signal Sel[1:0]. In Sel[1:0], "00" indicates that the output is invalid, "01" that Opa should be output, "10" that Opb should be output, and "11" that the two data are equal, in which case Opa is output. Sel[0] enters a MUX selector (417) as the control signal that selects the comparator output. At the same time, OpaValid, OpbValid and the M option enter, through another path, a comparator result valid generation unit (416), which produces the comparator result valid control signal ResultValid, where $\mathit{ResultValid} = \mathit{OpaValid} \lor \mathit{OpbValid} \lor \overline{M}$.
12. The length-configurable vector maximum/minimum network supporting fixed/floating-point reconfiguration according to claim 1, characterized in that, in the result selection unit (400), a 4-to-1 selector MUX3 (614) is controlled by the FBS option. When FBS=2'b00, the network operates in 32-bit fixed-point mode and MUX3 (614) directly outputs the 32-bit result of the reconfigurable comparator (300). When FBS=2'b10, the network operates in 8-bit fixed-point mode and MUX3 (614) directly outputs the 8-bit result of the reconfigurable comparator (300). When FBS=2'b11, the network operates in 16-bit fixed-point mode and MUX3 (614) directly outputs the 16-bit result of the reconfigurable comparator (300). When FBS=2'b01, the network operates in the 32-bit reduced single-precision floating-point data format and selects the final vector maximum/minimum output according to the floating-point flag signals, as follows. If NaNFlag=1, at least one of the 16 floating-point data is NaN, and the network finally outputs 32'hFFFF_FFFF. If PosInfFlag=1 and Opcode=1, the floating-point data contain positive infinity and the network operates in maximum (Max) mode, so it outputs positive infinity 32'h7F80_0000. If NegInfFlag=1 and Opcode=0, the floating-point data contain negative infinity and the network operates in minimum (Min) mode, so it outputs negative infinity 32'hFF80_0000. In all other cases, it outputs the 32-bit floating-point data of the reconfigurable comparator network (300).
CN201110415155.8A 2011-12-13 2011-12-13 Length-configurable vector maximum/minimum network supporting reconfigurable fixed floating points Active CN102520903B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110415155.8A CN102520903B (en) 2011-12-13 2011-12-13 Length-configurable vector maximum/minimum network supporting reconfigurable fixed floating points

Publications (2)

Publication Number Publication Date
CN102520903A CN102520903A (en) 2012-06-27
CN102520903B true CN102520903B (en) 2014-07-23

Family

ID=46291846

Country Status (1)

Country Link
CN (1) CN102520903B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015097494A1 (en) * 2013-12-23 2015-07-02 Intel Corporation Instruction and logic for identifying instructions for retirement in a multi-strand out-of-order processor
CN105511836A (en) * 2016-01-22 2016-04-20 成都三零嘉微电子有限公司 High-speed and multimode modulo addition operation circuit
CN106775579B (en) * 2016-11-29 2019-06-04 北京时代民芯科技有限公司 Floating-point arithmetic acceleration unit based on configurable technology
CN107301031B (en) * 2017-06-15 2020-08-04 西安微电子技术研究所 Normalized floating point data screening circuit
CN107340992B (en) * 2017-06-15 2020-07-28 西安微电子技术研究所 Fixed point data screening circuit
CN111381805A (en) * 2018-12-28 2020-07-07 上海寒武纪信息科技有限公司 Data comparator, data processing method, chip and electronic device
CN111381875B (en) * 2018-12-28 2022-12-09 上海寒武纪信息科技有限公司 Data comparator, data processing method, chip and electronic device
CN111260044B (en) * 2018-11-30 2023-06-20 上海寒武纪信息科技有限公司 Data comparator, data processing method, chip and electronic equipment
CN111381802B (en) * 2018-12-28 2022-12-09 上海寒武纪信息科技有限公司 Data comparator, data processing method, chip and electronic equipment
CN111381806A (en) * 2018-12-28 2020-07-07 上海寒武纪信息科技有限公司 Data comparator, data processing method, chip and electronic equipment
CN117724676A (en) * 2018-12-28 2024-03-19 上海寒武纪信息科技有限公司 Data comparator, data processing method, chip and electronic device
CN117519637A (en) * 2018-12-28 2024-02-06 上海寒武纪信息科技有限公司 Data comparator, data processing method, chip and electronic device
CN110888992A (en) * 2019-11-15 2020-03-17 北京三快在线科技有限公司 Multimedia data processing method and device, computer equipment and readable storage medium
CN113094020B (en) * 2021-03-15 2023-03-28 西安交通大学 Hardware device and method for quickly searching maximum or minimum N values of data set

Citations (3)

Publication number Priority date Publication date Assignee Title
US5301137A (en) * 1990-07-23 1994-04-05 Mitsubishi Denki Kabushiki Kaisha Circuit for fixed point or floating point arithmetic operations
KR20090117451A (en) * 2008-05-09 2009-11-12 연세대학교 산학협력단 Reconfigurable compute unit that performs fixed-point or floating-point arithmetic based on the format of the input data
CN101847087A (en) * 2010-04-28 2010-09-29 中国科学院自动化研究所 Reconfigurable transverse summing network structure for supporting fixed and floating points

Non-Patent Citations (2)

Title
Ma Hong et al., "Design of a High-Performance Reconfigurable Adder in Digital Signal Processors," Computer Engineering, vol. 35, no. 12, 2009-06-20, pp. 1-12 *

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20171129

Address after: Room 402, 4th Floor, Building 11, No. 1 Yanfu Road, Yancun Town, Fangshan District, Beijing 102412

Patentee after: Beijing Si Lang science and Technology Co.,Ltd.

Address before: No. 95 Zhongguancun East Road, Beijing 100190

Patentee before: Institute of Automation, Chinese Academy of Sciences

TR01 Transfer of patent right
CP03 Change of name, title or address

Address after: 201306 building C, No. 888, Huanhu West 2nd Road, Lingang New District, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai

Patentee after: Shanghai Silang Technology Co.,Ltd.

Address before: 102412 room 402, 4th floor, building 11, No. 1, Yanfu Road, Yancun Town, Fangshan District, Beijing

Patentee before: Beijing Si Lang science and Technology Co.,Ltd.

CP03 Change of name, title or address

Address after: 201306 building C, No. 888, Huanhu West 2nd Road, Lingang New District, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai

Patentee after: Shanghai Silam Technology Co., Ltd.

Country or region after: China

Address before: 201306 building C, No. 888, Huanhu West 2nd Road, Lingang New District, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai

Patentee before: Shanghai Silang Technology Co.,Ltd.

Country or region before: China

CP03 Change of name, title or address