
CN114612309A - Fully on-chip dynamically reconfigurable super-resolution device - Google Patents

Fully on-chip dynamically reconfigurable super-resolution device

Info

Publication number
CN114612309A
Authority
CN
China
Prior art keywords
convolution
data
circuit
length
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210512559.7A
Other languages
Chinese (zh)
Other versions
CN114612309B (en)
Inventor
常亮
赵鑫
周军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202210512559.7A
Publication of CN114612309A
Application granted
Publication of CN114612309B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00: Geometric image transformations in the plane of the image
    • G06T3/40: Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053: Scaling of whole images or parts thereof, e.g. expanding or contracting, based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/90: Determination of colour characteristics
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)

Abstract

A fully on-chip dynamically reconfigurable super-resolution device, belonging to the technical field of image processing. The device includes a preprocessing circuit, an arithmetic operation circuit, an interpolation circuit and a post-processing circuit. The preprocessing circuit includes a weight buffer, an input buffer and an input image color space conversion circuit. The arithmetic operation circuit includes a data redistribution circuit, a convolution calculation block, a shared addition tree circuit and an inter-layer buffer. The interpolation circuit includes a nearest neighbor interpolation circuit and a temporary buffer. The post-processing circuit includes an output shaping circuit and an output image color space conversion circuit. The invention adopts a mapping strategy of convolution compression, convolution decomposition and PE remapping, together with a convolution calculation block composed of multiple dynamically reconfigurable PE calculation units. This greatly reduces the amount of computation in the deconvolution operation, improves the efficiency of the deconvolution operation, effectively eliminates invalid calculations, and avoids unbalanced computing loads.

Description

Fully on-chip dynamically reconfigurable super-resolution device

Technical field

The invention belongs to the technical field of image processing, and in particular relates to a fully on-chip dynamically reconfigurable super-resolution device.

Background

With the development of artificial intelligence algorithms, super-resolution networks based on deep learning achieve better image reconstruction and therefore have good application prospects in key industries such as old photo restoration, medical inspection and security monitoring. However, super-resolution networks require a large amount of computation and a large amount of data access, which places very high demands on hardware. The main cause is the heavy computation of the deconvolution process. The conventional deconvolution calculation contains a large number of invalid calculations that contribute nothing to the result. To address this problem, Chang et al. (Chang, Jung-Woo, Keon-Woo Kang, and Suk-Ju Kang. "An energy-efficient FPGA-based deconvolutional neural networks accelerator for single image super-resolution." IEEE Transactions on Circuits and Systems for Video Technology 30.1 (2018): 281-295.) proposed the TDC (transposed convolution conversion) method, which converts deconvolution into convolution and redistributes the computing tasks, effectively reducing the amount of computation in deconvolution. However, the utilization of the PEs (Processing Elements) remains low and the computing load remains unbalanced, so the potential speed-up of deconvolution is not fully exploited.

Summary of the invention

The purpose of the present invention is to provide a fully on-chip dynamically reconfigurable super-resolution device that addresses the problems described in the background.

To achieve this purpose, the present invention adopts the following technical solution:

A fully on-chip dynamically reconfigurable super-resolution device, comprising a preprocessing circuit, an arithmetic operation circuit, an interpolation circuit and a post-processing circuit;

Wherein the preprocessing circuit includes a weight buffer, an input buffer and an input image color space conversion circuit; the weight buffer caches the weight data of the super-resolution network; the input buffer caches the input image data; the input image color space conversion circuit reads the input image data from the input buffer and converts it from RGB format to YCbCr format; the resulting Y channel data is input to the data redistribution circuit, and the Cb and Cr channel data are input to the nearest neighbor interpolation circuit;
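The color space split performed by the preprocessing circuit can be illustrated in software. The patent does not state which RGB to YCbCr coefficients the circuit uses, so the sketch below assumes the full-range BT.601 (JFIF) variant; the function names `rgb_to_ycbcr` and `split_channels` are hypothetical and serve only this illustration.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to YCbCr.

    Assumption: full-range BT.601 (JFIF) coefficients; the patent does not
    specify the conversion matrix used by the circuit.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def split_channels(image):
    """Split an image (list of rows of (r, g, b) tuples) into Y, Cb, Cr planes.

    In the device, the Y plane goes to the data redistribution circuit and
    the Cb/Cr planes go to the nearest neighbor interpolation circuit.
    """
    y_plane, cb_plane, cr_plane = [], [], []
    for row in image:
        converted = [rgb_to_ycbcr(*pixel) for pixel in row]
        y_plane.append([c[0] for c in converted])
        cb_plane.append([c[1] for c in converted])
        cr_plane.append([c[2] for c in converted])
    return y_plane, cb_plane, cr_plane
```

Only the Y plane passes through the network; a hardware implementation would most likely use fixed-point coefficients rather than the floating-point values shown here.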

The arithmetic operation circuit includes a data redistribution circuit, a convolution calculation block, a shared addition tree circuit and an inter-layer buffer; the data redistribution circuit reads the weight data from the weight buffer and the Y channel data output by the input image color space conversion circuit, and redistributes the weight data and the Y channel data according to the scaling factor using the specified mapping strategy, obtaining the redistributed data; the convolution calculation block receives the redistributed data and performs convolution operations on it, obtaining the convolution results; the shared addition tree circuit receives the convolution results and accumulates them, obtaining the output feature map data of the current layer of the super-resolution network; the inter-layer buffer receives and stores the output feature map data of the current layer; if the last layer of the super-resolution network has not been reached, the output feature map data is fed back to the data redistribution circuit as the input feature map of the next layer; once the last layer has been reached, the output feature map data is input to the output shaping circuit;

The interpolation circuit includes a nearest neighbor interpolation circuit and a temporary buffer; the nearest neighbor interpolation circuit interpolates the received Cb and Cr channel data using the nearest neighbor interpolation strategy, obtaining the interpolated feature maps; the temporary buffer receives and caches the interpolated feature maps;
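Nearest neighbor interpolation of the Cb and Cr planes amounts to repeating each sample along both axes. The following is a minimal sketch under the assumption that the scaling factor is an integer (the text names the factors 2, 3 and 4); `nearest_neighbor_upscale` is a hypothetical name.

```python
def nearest_neighbor_upscale(plane, scale):
    """Upscale a 2-D plane (list of rows) by an integer factor.

    Every sample is repeated `scale` times horizontally and every widened
    row is repeated `scale` times vertically.
    """
    if scale < 1:
        raise ValueError("scale must be a positive integer")
    out = []
    for row in plane:
        wide = [value for value in row for _ in range(scale)]
        out.extend(list(wide) for _ in range(scale))
    return out
```

For example, `nearest_neighbor_upscale([[1, 2], [3, 4]], 2)` produces a 4×4 plane in which each input sample fills a 2×2 block.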

The post-processing circuit includes an output shaping circuit and an output image color space conversion circuit; the output shaping circuit reads the output feature map data from the inter-layer buffer and rearranges it, obtaining the Y channel data in sequential order; the output image color space conversion circuit reads the sequentially ordered Y channel data and the interpolated feature maps, converts them into RGB format data, and outputs the result.

Further, the input image data consists of multiple sub-images obtained by dividing the original image.

Further, the weight data is obtained by dividing the original image into multiple sub-images and then training on the sub-images as the training data set.

Further, the mapping strategy includes convolution compression, convolution decomposition and PE remapping. Convolution compression compresses the weight data and the Y channel data according to the scaling factor, removing the zero values in the Y channel data together with the weight data that correspond to those zero values; convolution decomposition splits the compressed data into convolutions of different lengths; PE remapping combines the convolutions of different lengths into fixed-length convolutions and inputs them into the convolution calculation block.
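The sub-kernel sizes that appear after convolution compression can be reproduced with a standard phase decomposition of a strided transposed convolution: for scaling factor s, each output phase only ever meets every s-th kernel tap, and all other taps are multiplied by inserted zeros. The sketch below assumes this view matches the patent's compression step, which is supported by the fact that it yields the same sizes the text lists for a 9×9 kernel. The function names are hypothetical.

```python
from collections import Counter


def phase_lengths(kernel_size, scale):
    """Number of non-zero taps seen by each of the `scale` output phases (1-D)."""
    return [len(range(phase, kernel_size, scale)) for phase in range(scale)]


def compressed_kernel_shapes(kernel_size, scale):
    """Count the 2-D sub-kernel shapes left after dropping zero-valued taps."""
    taps = phase_lengths(kernel_size, scale)
    return Counter((h, w) for h in taps for w in taps)


if __name__ == "__main__":
    for s in (2, 3, 4):
        print(s, dict(compressed_kernel_shapes(9, s)))
```

For a 9×9 kernel this gives one each of 5×5, 5×4, 4×5 and 4×4 at factor 2, nine 3×3 kernels at factor 3, and one 3×3, three 3×2, three 2×3 and nine 2×2 kernels at factor 4. In every case the sub-kernels together hold all 81 taps.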

Further, when the scaling factor is 2, the device can perform [formula image 001] deconvolutions of size 9×9 in parallel, where the data redistribution circuit and the convolution calculation block operate as follows:

1) Convolution compression:

The input Y channel data and weight data are compressed to obtain [formula image 002] convolutions of size 5×5, [formula image 003] convolutions of size 5×4, [formula image 002] convolutions of size 4×5 and [formula image 002] convolutions of size 4×4, where m and n are positive integers;

2) Convolution decomposition:

The [formula image 002] convolutions of size 5×5 are decomposed into [formula image 004] convolutions of length 9 and [formula image 003] convolutions of length 7; the [formula image 002] convolutions of size 5×4 are decomposed into [formula image 004] convolutions of length 9 and [formula image 003] convolutions of length 2; the [formula image 002] convolutions of size 4×5 are decomposed into [formula image 004] convolutions of length 9 and [formula image 003] convolutions of length 2; the [formula image 002] convolutions of size 4×4 are decomposed into [formula image 002] convolutions of length 9 and [formula image 003] convolutions of length 7;

3) PE remapping:

The convolutions of length 7 are combined with the convolutions of length 2 to obtain [formula image 004] convolutions of length 9, which are then input into the convolution calculation block together with the remaining [formula image 005] convolutions of length 9;

4) Deconvolution operation:

The mn input convolutions are sent to the convolution calculation block for convolution calculation, yielding mn convolution results.
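The tap accounting behind these steps can be checked for a single 9×9 kernel at scaling factor 2. The sketch below is not the patent's circuit. It assumes each sub-kernel is flattened and cut into chunks of 9 taps plus one remainder, which reproduces the piece lengths 9, 7 and 2 named in the text. `PE_WIDTH`, `split_taps` and `pe_loads_scale2` are hypothetical names.

```python
from collections import Counter

PE_WIDTH = 9  # each PE calculation unit has 9 multipliers


def split_taps(height, width, pe_width=PE_WIDTH):
    """Cut a flattened height x width kernel into full chunks plus a remainder."""
    full, rest = divmod(height * width, pe_width)
    return [pe_width] * full + ([rest] if rest else [])


def pe_loads_scale2():
    """Return the per-PE piece lengths for one 9x9 kernel at scaling factor 2."""
    counts = Counter()
    for h, w in [(5, 5), (5, 4), (4, 5), (4, 4)]:  # sub-kernel sizes from the text
        counts.update(split_taps(h, w))
    # PE remapping: every length-7 piece is paired with a length-2 piece.
    pairs = min(counts[7], counts[2])
    assert pairs == counts[7] == counts[2], "remainders should pair up exactly"
    return [(9,)] * counts[9] + [(7, 2)] * pairs


if __name__ == "__main__":
    loads = pe_loads_scale2()
    print(len(loads), "PEs, all fully used:", all(sum(l) == 9 for l in loads))
```

Under this assumption one kernel yields seven length-9 pieces and two (7, 2) pairs, so it occupies exactly nine PEs with no idle multipliers. This is consistent with the patent's statement that efficiency is highest when m×n is a multiple of 9.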

Further, when the scaling factor is 3, the device can perform mn deconvolutions of size 9×9 in parallel, where the data redistribution circuit and the convolution calculation block operate as follows:

1) Convolution compression:

The input Y channel data and weight data are compressed to obtain mn convolutions of size 3×3;

2) Convolution decomposition:

The mn convolutions of size 3×3 are decomposed into mn convolutions of length 9;

3) PE remapping:

The mn convolutions of length 9 are input into the convolution calculation block;

4) Deconvolution operation:

The mn input convolutions are sent to the convolution calculation block for convolution calculation, yielding mn convolution results.

Further, when the scaling factor is 4, the device can perform [formula image 006] deconvolutions of size 9×9 in parallel, where the data redistribution circuit and the convolution calculation block operate as follows:

1) Convolution compression:

The input Y channel data and weight data are compressed to obtain [formula image 002] convolutions of size 3×3, [formula image 007] convolutions of size 3×2, [formula image 008] convolutions of size 2×3 and mn convolutions of size 2×2;

2) Convolution decomposition:

The [formula image 002] convolutions of size 3×3 are decomposed into [formula image 002] convolutions of length 9; the [formula image 007] convolutions of size 3×2 are decomposed into [formula image 009] convolutions of length 3; the [formula image 010] convolutions of size 2×3 are decomposed into [formula image 011] convolutions of length 3 and [formula image 009] convolutions of length 2; the mn convolutions of size 2×2 are decomposed into [formula image 012] convolutions of length 4 and [formula image 013] convolutions of length 2;

3) PE remapping:

The convolutions of length 4, the convolutions of length 3 and the convolutions of length 2 are combined to obtain [formula image 012] convolutions of length 9, which are then input into the convolution calculation block together with the remaining [formula image 002] convolutions of length 9;

4) Deconvolution operation:

The mn input convolutions are sent to the convolution calculation block for convolution calculation, yielding mn convolution results.

Preferably, the computational efficiency of the device is highest when m×n is a multiple of 9.

Wherein the convolution calculation block includes m×n dynamically reconfigurable PE calculation units, and each dynamically reconfigurable PE calculation unit includes the 1st to 9th pixel points, the 1st to 9th weight data points, the 1st to 9th multipliers, the 1st to 8th adders, a first data selector and a second data selector; the product of the 1st pixel point A1 and the 1st weight data point W1 and the product of the 2nd pixel point A2 and the 2nd weight data point W2 are added in the 1st adder to obtain the 1st data; the product of the 3rd pixel point A3 and the 3rd weight data point W3 and the product of the 4th pixel point A4 and the 4th weight data point W4 are added in the 3rd adder to obtain the 2nd data; the 1st data and the 2nd data are added in the 2nd adder, and the resulting 3rd data is fed to the input of the first data selector; one output of the first data selector is connected to the first input of the 4th adder, and the other output serves as the first output of the dynamically reconfigurable PE calculation unit; the product of the 6th pixel point A6 and the 6th weight data point W6 and the product of the 7th pixel point A7 and the 7th weight data point W7 are added in the 6th adder to obtain the 4th data; the product of the 5th pixel point A5 and the 5th weight data point W5 is added to the 4th data in the 5th adder to obtain the 5th data; the 5th data and the data output by the first data selector are added in the 4th adder to obtain the 6th data; the product of the 8th pixel point A8 and the 8th weight data point W8 and the product of the 9th pixel point A9 and the 9th weight data point W9 are added in the 8th adder to obtain the 7th data; the 7th data is fed to the input of the second data selector; the data at one output of the second data selector is added to the 6th data in the 7th adder to give the second output of the dynamically reconfigurable PE calculation unit; the data at the other output of the second data selector serves as the third output of the dynamically reconfigurable PE calculation unit.

Further, each dynamically reconfigurable PE calculation unit supports three working modes:

Mode 0: outputs the result of one convolution of length 9;

Mode 1: outputs the results of one convolution of length 7 and one convolution of length 2;

Mode 2: outputs the results of one convolution of length 4, one convolution of length 3 and one convolution of length 2.

The first output carries the length-4 convolution result in mode 2; the second output carries the length-9 convolution result in mode 0, the length-7 convolution result in mode 1, or the length-3 convolution result in mode 2; the third output carries the length-2 convolution result in mode 1 or in mode 2.
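The adder wiring and the three modes can be cross-checked with a small behavioral model. It is a sketch, not the patent's circuit. It assumes that a data selector drives zero into the adder it is not routed to, and it reports an inactive output as `None`. Neither detail is stated in the text, and `pe_compute` is a hypothetical name.

```python
def pe_compute(pixels, weights, mode):
    """Behavioral model of one dynamically reconfigurable PE.

    Returns (first_output, second_output, third_output); outputs that are not
    active in the chosen mode are None (an assumption of this sketch).
    """
    if len(pixels) != 9 or len(weights) != 9:
        raise ValueError("a PE takes exactly 9 pixels and 9 weights")
    if mode not in (0, 1, 2):
        raise ValueError("mode must be 0, 1 or 2")

    p = [a * w for a, w in zip(pixels, weights)]  # multipliers 1..9
    data3 = (p[0] + p[1]) + (p[2] + p[3])         # adders 1, 3, then 2
    data5 = p[4] + (p[5] + p[6])                  # adders 6, then 5
    data7 = p[7] + p[8]                           # adder 8

    sel1_to_adder4 = mode in (0, 1)               # first data selector
    sel2_to_adder7 = mode == 0                    # second data selector

    data6 = data5 + (data3 if sel1_to_adder4 else 0)         # adder 4
    first = None if sel1_to_adder4 else data3                # length 4
    second = data6 + (data7 if sel2_to_adder7 else 0)        # adder 7: length 9, 7 or 3
    third = None if sel2_to_adder7 else data7                # length 2
    return first, second, third
```

With pixels 1 to 9 and unit weights, mode 0 sums all nine products on the second output, mode 1 splits them into the first seven and the last two, and mode 2 splits them into groups of four, three and two, matching the output assignment described above.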

Compared with the prior art, the beneficial effects of the present invention are:

1. The fully on-chip dynamically reconfigurable super-resolution device provided by the present invention adopts a mapping strategy of convolution compression, convolution decomposition and PE remapping, together with a convolution calculation block composed of multiple dynamically reconfigurable PE calculation units. This greatly reduces the amount of computation in the deconvolution operation, improves its efficiency, effectively eliminates invalid calculations, and avoids unbalanced computing loads.

2. In the fully on-chip dynamically reconfigurable super-resolution device provided by the present invention, the input image data and the weight data are obtained by dividing the original image into multiple sub-images and training on them. This greatly reduces the amount of data between layers, avoids communication between the intermediate network layers and off-chip memory, achieves fully on-chip storage, and improves the throughput of the device.

Description of drawings

FIG. 1 is a schematic structural diagram of the fully on-chip dynamically reconfigurable super-resolution device provided by the present invention;

FIG. 2 is a schematic structural diagram of the dynamically reconfigurable PE calculation unit in the fully on-chip dynamically reconfigurable super-resolution device provided by the present invention.

Detailed description

The technical solution of the present invention is described in detail below with reference to the accompanying drawings and an embodiment.

Example

FIG. 1 is a schematic structural diagram of the fully on-chip dynamically reconfigurable super-resolution device provided by the present invention; the device includes a preprocessing circuit, an arithmetic operation circuit, an interpolation circuit and a post-processing circuit;

Wherein the preprocessing circuit includes a weight buffer, an input buffer and an input image color space conversion circuit; the weight buffer caches the weight data of the super-resolution network; the input buffer caches the input image data; the input image color space conversion circuit reads the input image data from the input buffer and converts it from RGB format to YCbCr format; the resulting Y channel data is input to the data redistribution circuit, and the Cb and Cr channel data are input to the nearest neighbor interpolation circuit;

The arithmetic operation circuit includes a data redistribution circuit, a convolution calculation block, a shared addition tree circuit and an inter-layer buffer; the data redistribution circuit reads the weight data from the weight buffer and the Y channel data output by the input image color space conversion circuit, and redistributes the weight data and the Y channel data according to the scaling factor using the specified mapping strategy, obtaining the redistributed data; the convolution calculation block receives the redistributed data and performs convolution operations on it, obtaining the convolution results; the shared addition tree circuit receives the convolution results and accumulates them, obtaining the output feature map data of the current layer of the super-resolution network; the inter-layer buffer receives and stores the output feature map data of the current layer; if the last layer of the super-resolution network has not been reached, the output feature map data is fed back to the data redistribution circuit as the input feature map of the next layer; once the last layer has been reached, the output feature map data is input to the output shaping circuit;

The interpolation circuit includes a nearest neighbor interpolation circuit and a temporary buffer; the nearest neighbor interpolation circuit interpolates the received Cb and Cr channel data using the nearest neighbor interpolation strategy, obtaining the interpolated feature maps; the temporary buffer receives and caches the interpolated feature maps;

The post-processing circuit includes an output shaping circuit and an output image color space conversion circuit; the output shaping circuit reads the output feature map data from the inter-layer buffer and rearranges it, obtaining the Y channel data in sequential order; the output image color space conversion circuit reads the sequentially ordered Y channel data and the interpolated feature maps, converts them into RGB format data, and outputs the result.

Wherein the input image data consists of multiple RGB images of size 54×36 obtained by dividing an original RGB image of size 1080×720.
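The tiling of the embodiment can be sketched as follows. The text gives only the image size and the tile size, so the sketch assumes a non-overlapping grid that covers the image exactly; `tile_origins` is a hypothetical name.

```python
def tile_origins(width, height, tile_w, tile_h):
    """Top-left corners of a non-overlapping tile grid (assumed layout)."""
    if width % tile_w or height % tile_h:
        raise ValueError("image size must be a multiple of the tile size")
    return [
        (x, y)
        for y in range(0, height, tile_h)
        for x in range(0, width, tile_w)
    ]


if __name__ == "__main__":
    origins = tile_origins(1080, 720, 54, 36)
    print(len(origins), "tiles")
```

Under this assumption the 1080×720 image splits into a 20×20 grid, that is 400 sub-images of 54×36 pixels, each small enough that its intermediate feature maps can plausibly remain in the inter-layer buffer.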

Wherein the weight data is obtained by dividing the original image into multiple sub-images and then training on the sub-images.

The convolution calculation block comprises 3×3 dynamically reconfigurable PE calculation units. Each dynamically reconfigurable PE calculation unit comprises the 1st to 9th pixel points, the 1st to 9th weight data points, the 1st to 9th multipliers, the 1st to 8th adders, a first data selector and a second data selector, as shown in Figure 2. The product of the 1st pixel point A1 and the 1st weight data point W1 and the product of the 2nd pixel point A2 and the 2nd weight data point W2 are added in the 1st adder to obtain the 1st data; the product of the 3rd pixel point A3 and the 3rd weight data point W3 and the product of the 4th pixel point A4 and the 4th weight data point W4 are added in the 3rd adder to obtain the 2nd data; the 1st data and the 2nd data are added in the 2nd adder, and the resulting 3rd data is fed to the input of the first data selector; one output of the first data selector is connected to the first input of the 4th adder, and its other output serves as the first output of the dynamically reconfigurable PE calculation unit. The product of the 6th pixel point A6 and the 6th weight data point W6 and the product of the 7th pixel point A7 and the 7th weight data point W7 are added in the 6th adder to obtain the 4th data; the product of the 5th pixel point A5 and the 5th weight data point W5 is added to the 4th data in the 5th adder to obtain the 5th data; the 5th data and the data output by the first data selector are added in the 4th adder to obtain the 6th data. The product of the 8th pixel point A8 and the 8th weight data point W8 and the product of the 9th pixel point A9 and the 9th weight data point W9 are added in the 8th adder to obtain the 7th data; the 7th data is fed to the input of the second data selector; the data at one output of the second data selector is added to the 6th data in the 7th adder to obtain the second output of the dynamically reconfigurable PE calculation unit; and the data at the other output of the second data selector serves as the third output of the dynamically reconfigurable PE calculation unit.
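As an illustration only, the datapath described above can be modelled in a few lines of Python. The function name `pe_unit`, the control names `merge_first` and `merge_second`, and the convention that an unselected data selector output carries 0 are assumptions made for this sketch; the source specifies only the wiring of the multipliers, adders and data selectors.

```python
def pe_unit(pixels, weights, merge_first, merge_second):
    """Model one dynamically reconfigurable PE as described in the text.

    merge_first / merge_second are hypothetical names for the controls of
    the first and second data selectors. An unselected selector output is
    assumed to be 0. Returns (first_output, second_output, third_output).
    """
    if len(pixels) != 9 or len(weights) != 9:
        raise ValueError("a PE takes exactly 9 pixel points and 9 weight points")

    # 1st to 9th multipliers; p[0] is a dummy so indices match the text.
    p = [0] + [a * w for a, w in zip(pixels, weights)]

    d1 = p[1] + p[2]                     # 1st adder -> 1st data
    d2 = p[3] + p[4]                     # 3rd adder -> 2nd data
    d3 = d1 + d2                         # 2nd adder -> 3rd data
    # First data selector: route the 3rd data to the 4th adder or to output 1.
    to_adder4 = d3 if merge_first else 0
    first_output = 0 if merge_first else d3

    d4 = p[6] + p[7]                     # 6th adder -> 4th data
    d5 = p[5] + d4                       # 5th adder -> 5th data
    d6 = d5 + to_adder4                  # 4th adder -> 6th data

    d7 = p[8] + p[9]                     # 8th adder -> 7th data
    # Second data selector: route the 7th data to the 7th adder or to output 3.
    to_adder7 = d7 if merge_second else 0
    third_output = 0 if merge_second else d7

    second_output = d6 + to_adder7       # 7th adder
    return first_output, second_output, third_output
```

Under these assumptions, setting both controls produces a single length-9 result on the second output; clearing both produces a length-4, a length-3 and a length-2 result on the first, second and third outputs; and setting only `merge_first` produces a length-7 result and a length-2 result.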

When the scaling factor is set to 4, 16 deconvolutions of size 9×9 can be computed in parallel. The data redistribution circuit and the convolution calculation block then operate as follows:

1) Convolution compression:

Compress the input Y-channel data and weight data to obtain 1 convolution of size 3×3, 3 convolutions of size 3×2, 3 convolutions of size 2×3 and 9 convolutions of size 2×2;

2) Convolution decomposition:

Decompose the 1 convolution of size 3×3 into 1 convolution of length 9; the 3 convolutions of size 3×2 into 6 convolutions of length 3; the 3 convolutions of size 2×3 into 2 convolutions of length 3 and 6 convolutions of length 2; and the 9 convolutions of size 2×2 into 8 convolutions of length 4 and 2 convolutions of length 2;

3) PE remapping:

Combine the convolutions of length 4, length 3 and length 2 to obtain 8 convolutions of length 9, then input them into the convolution calculation block together with the remaining 1 convolution of length 9;

4) Deconvolution operation:

The 9 input convolutions are sent into the convolution calculation block for convolution calculation, yielding 9 convolution operation results; the convolution calculation block comprises 9 dynamically reconfigurable PE calculation units arranged as 3×3.
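The counts in steps 1) to 3) can be checked with a short Python sketch. It assumes that convolution compression keeps, for each of the 4×4 output phases, only the kernel taps that do not fall on inserted zeros (taps r, r+4, r+8 along a 9-tap axis); this reading is an assumption, chosen because it reproduces the shape counts stated above. The helper name `sub_kernel_shapes` and the layout of the `decomposition` table are likewise illustrative.

```python
from collections import Counter


def sub_kernel_shapes(kernel_size, scale):
    """Count sub-kernel shapes left after dropping taps that hit inserted zeros.

    Assumed model: for output phase r along one axis, only taps
    r, r + scale, r + 2*scale, ... of the kernel meet non-zero inputs.
    """
    taps = [len(range(r, kernel_size, scale)) for r in range(scale)]
    return Counter((rows, cols) for rows in taps for cols in taps)


# Step 1: a 9x9 deconvolution kernel with scaling factor 4.
shapes = sub_kernel_shapes(9, 4)

# Step 2: decomposition as stated in the text.
# shape -> {piece length: number of pieces over all sub-kernels of that shape}
decomposition = {
    (3, 3): {9: 1},
    (3, 2): {3: 6},
    (2, 3): {3: 2, 2: 6},
    (2, 2): {4: 8, 2: 2},
}

pieces = Counter()
for (rows, cols), parts in decomposition.items():
    # Every weight of every sub-kernel must land in exactly one piece.
    total_weights = shapes[(rows, cols)] * rows * cols
    assert total_weights == sum(length * n for length, n in parts.items())
    pieces.update(parts)

# Step 3: one length-4, one length-3 and one length-2 piece fill one PE.
combined = min(pieces[4], pieces[3], pieces[2])
length9_total = pieces[9] + combined

print(dict(shapes))
print(dict(pieces))
print(length9_total)
```

The final total equals the number of PE calculation units in the 3×3 block, so each unit receives exactly one length-9 workload per pass.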

The above are only preferred embodiments of the present invention and do not limit the present invention in any form. Although the present invention has been disclosed above by way of preferred embodiments, these are not intended to limit it. Any person skilled in the art may, without departing from the scope of the technical solution of the present invention, use the technical content disclosed above to make minor changes or modifications that amount to equivalent embodiments. Any simple modification, equivalent replacement or improvement made to the above embodiments in accordance with the technical essence of the present invention, within its spirit and principles and without departing from the content of its technical solution, still falls within the protection scope of the technical solution of the present invention.

Claims (6)

1. A fully on-chip dynamically reconfigurable super-resolution device, characterized by comprising a pre-processing circuit, an arithmetic operation circuit, an interpolation circuit and a post-processing circuit;
wherein the pre-processing circuit comprises a weight buffer, an input buffer and an input image color space conversion circuit; the weight buffer caches the weight data of the super-resolution network, the input buffer caches the input image data, and the input image color space conversion circuit reads the input image data from the input buffer and converts it from RGB format into YCbCr format; the Y-channel data obtained after conversion is input into the data redistribution circuit, and the Cb and Cr channel data are input into the nearest neighbor interpolation circuit;
the arithmetic operation circuit comprises a data redistribution circuit, a convolution calculation block, a shared addition tree circuit and an interlayer buffer; the data redistribution circuit reads the weight data from the weight buffer and the Y-channel data output by the input image color space conversion circuit and, according to the scaling factor, redistributes the weight data and the Y-channel data following a designated mapping strategy to obtain redistributed data; the convolution calculation block receives the redistributed data and performs a convolution operation on it to obtain convolution operation results; the shared addition tree circuit receives the convolution operation results and accumulates them to obtain the output feature map data of the current layer of the super-resolution network; the interlayer buffer receives and stores the output feature map data of the current layer; when the maximum number of layers of the super-resolution network has not been reached, the output feature map data is input into the data redistribution circuit as the input feature map of the next layer, and when the maximum number of layers has been reached, the output feature map data is input into the output shaping circuit;
the interpolation circuit comprises a nearest neighbor interpolation circuit and a temporary buffer; the nearest neighbor interpolation circuit interpolates the received Cb and Cr channel data based on a nearest neighbor interpolation strategy to obtain an interpolated feature map; the temporary buffer receives and buffers the interpolated feature map;
the post-processing circuit comprises an output shaping circuit and an output image color space conversion circuit; the output shaping circuit reads the output feature map data from the interlayer buffer and rearranges it to obtain sequentially ordered Y-channel output data; the output image color space conversion circuit reads the sequentially ordered Y-channel output data and the interpolated feature map, and converts them into RGB format data for output.
2. The fully on-chip dynamically reconfigurable super-resolution device according to claim 1, wherein the mapping strategy comprises convolution compression, convolution decomposition and PE remapping; the convolution compression compresses the weight data and the Y-channel data according to the scaling factor, removing the 0 values in the Y-channel data and the weight data corresponding to those 0 values; the convolution decomposition decomposes the compressed data into convolutions of different lengths; and the PE remapping combines the convolutions of different lengths into convolutions of fixed length and inputs them into the convolution calculation block.
3. The fully on-chip dynamically reconfigurable super-resolution device according to claim 1, wherein when the scaling factor is 2, the processing procedures of the data redistribution circuit and the convolution calculation block are as follows:
1) Convolution compression:
compressing the input Y-channel data and the weight data to obtain [formula image 001] convolutions of size 5×5, [formula image 002] convolutions of size 5×4, [formula image 001] convolutions of size 4×5 and [formula image 001] convolutions of size 4×4, wherein m and n are positive integers;
2) Convolution decomposition:
decomposing the [formula image 001] convolutions of size 5×5 into [formula image 003] convolutions of length 9 and [formula image 002] convolutions of length 7; decomposing the [formula image 001] convolutions of size 5×4 into [formula image 003] convolutions of length 9 and [formula image 002] convolutions of length 2; decomposing the [formula image 001] convolutions of size 4×5 into [formula image 003] convolutions of length 9 and [formula image 002] convolutions of length 2; and decomposing the [formula image 001] convolutions of size 4×4 into [formula image 001] convolutions of length 9 and [formula image 002] convolutions of length 7;
3) PE remapping:
combining the convolutions of length 7 with the convolutions of length 2 to obtain [formula image 003] convolutions of length 9, and then inputting them into the convolution calculation block together with the remaining [formula image 004] convolutions of length 9;
4) Deconvolution operation:
the input mn convolutions are sent into the convolution calculation block for convolution calculation to obtain mn convolution operation results.
4. The fully on-chip dynamically reconfigurable super-resolution device according to claim 1, wherein when the scaling factor is 3, the processing procedures of the data redistribution circuit and the convolution calculation block are as follows:
1) Convolution compression:
compressing the input Y-channel data and the weight data to obtain mn convolutions of size 3×3;
2) Convolution decomposition:
decomposing the mn convolutions of size 3×3 into mn convolutions of length 9;
3) PE remapping:
inputting the mn convolutions of length 9 into the convolution calculation block;
4) Deconvolution operation:
the input mn convolutions are sent into the convolution calculation block for convolution calculation to obtain mn convolution operation results.
5. The fully on-chip dynamically reconfigurable super-resolution device according to claim 1, wherein when the scaling factor is 4, the processing procedures of the data redistribution circuit and the convolution calculation block are as follows:
1) Convolution compression:
compressing the input Y-channel data and the weight data to obtain [formula image 001] convolutions of size 3×3, [formula image 005] convolutions of size 3×2, [formula image 006] convolutions of size 2×3 and mn convolutions of size 2×2;
2) Convolution decomposition:
decomposing the [formula image 001] convolutions of size 3×3 into [formula image 001] convolutions of length 9; decomposing the [formula image 005] convolutions of size 3×2 into [formula image 007] convolutions of length 3; decomposing the [formula image 008] convolutions of size 2×3 into [formula image 009] convolutions of length 3 and [formula image 007] convolutions of length 2; and decomposing the mn convolutions of size 2×2 into [formula image 010] convolutions of length 4 and [formula image 011] convolutions of length 2;
3) PE remapping:
combining the convolutions of length 4, the convolutions of length 3 and the convolutions of length 2 to obtain [formula image 010] convolutions of length 9, and then inputting them into the convolution calculation block together with the remaining [formula image 001] convolutions of length 9;
4) Deconvolution operation:
the input mn convolutions are sent into the convolution calculation block for convolution calculation to obtain mn convolution operation results.
6. The fully on-chip dynamically reconfigurable super-resolution device according to claim 1, wherein the convolution calculation block comprises m×n dynamically reconfigurable PE computing units, and each dynamically reconfigurable PE computing unit comprises the 1st to 9th pixel points, the 1st to 9th weight data points, the 1st to 9th multipliers, the 1st to 8th adders, a first data selector and a second data selector; the product of the 1st pixel point A1 and the 1st weight data point W1 and the product of the 2nd pixel point A2 and the 2nd weight data point W2 are added in the 1st adder to obtain the 1st data; the product of the 3rd pixel point A3 and the 3rd weight data point W3 and the product of the 4th pixel point A4 and the 4th weight data point W4 are added in the 3rd adder to obtain the 2nd data; the 1st data and the 2nd data are added in the 2nd adder to obtain the 3rd data, which is input into the input end of the first data selector; one output end of the first data selector is connected to the first input end of the 4th adder, and the other output end serves as the first output of the dynamically reconfigurable PE computing unit; the product of the 6th pixel point A6 and the 6th weight data point W6 and the product of the 7th pixel point A7 and the 7th weight data point W7 are added in the 6th adder to obtain the 4th data; the product of the 5th pixel point A5 and the 5th weight data point W5 is added to the 4th data in the 5th adder to obtain the 5th data; the 5th data and the data output by the first data selector are added in the 4th adder to obtain the 6th data; the product of the 8th pixel point A8 and the 8th weight data point W8 and the product of the 9th pixel point A9 and the 9th weight data point W9 are added in the 8th adder to obtain the 7th data; the 7th data is input into the input end of the second data selector, and the data at one output end of the second data selector is added to the 6th data in the 7th adder to obtain the second output of the dynamically reconfigurable PE computing unit; and the data at the other output end of the second data selector serves as the third output of the dynamically reconfigurable PE computing unit.
CN202210512559.7A 2022-05-12 2022-05-12 Fully on-chip dynamically reconfigurable super-resolution device Active CN114612309B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210512559.7A CN114612309B (en) 2022-05-12 2022-05-12 Fully on-chip dynamically reconfigurable super-resolution device

Publications (2)

Publication Number Publication Date
CN114612309A true CN114612309A (en) 2022-06-10
CN114612309B CN114612309B (en) 2022-10-14

Family

ID=81870355

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210512559.7A Active CN114612309B (en) 2022-05-12 2022-05-12 Fully on-chip dynamically reconfigurable super-resolution device

Country Status (1)

Country Link
CN (1) CN114612309B (en)

Also Published As

Publication number Publication date
CN114612309B (en) 2022-10-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant